IBM Education Assistance for z/OS V2R1
Item: Unicode Services Exploitation
Element/Component: z/OS UNIX System Services
Material is current as of March 2013 © 2013 IBM Corporation Filename: zOS V2R1 UNIX Unicode
Agenda
■ Trademarks
■ Presentation Objectives
■ Overview
■ Usage & Invocation
■ Interactions & Dependencies
■ Migration & Coexistence Considerations
■ Presentation Summary
■ Appendix
Trademarks
■ See url http://www.ibm.com/legal/copytrade.shtml for a list of trademarks.
■ Additional trademarks: Unicode is a registered trademark of Unicode, Inc. in the United States and other countries.
Presentation Objectives
■ Describe how z/OS UNIX System Services exploits Unicode Services and why this is needed.
■ Describe the external interfaces.
■ Show examples of usage.
Overview
■ Problem: A file can reside on a different system or in a different geographic location from the one where it originated, so conversion between code pages may be necessary.
■ Solution:
1) Allow z/OS UNIX files to be tagged with any code page.
2) Allow automatic conversion to occur when a program issues file I/O.
z/OS UNIX will use z/OS Unicode Services to allow a program to read/write a file tagged with a code page and convert file data to/from the program's code page.
Benefit/Value: z/OS UNIX now fully participates in the text conversion arena and supports auto conversion for all IBM code pages.
Overview
■ Design is based on the z/OS V1R2 Enhanced ASCII function:
1) Each program or thread has a code page (default ccsid = 1047).
2) UNIX files can have a code page (i.e., a file tag).
3) The run-time environment can be enabled for automatic conversion.
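The three design points above can be combined into a simple decision rule. The C sketch below is only an illustration of that rule, not the kernel's actual logic. The names `autocvt_mode` and `needs_conversion` are hypothetical. Treating CCSID 0 as "untagged", requiring the tag's text flag, and limiting ON to the 819/1047 Enhanced ASCII pair are assumptions based on Enhanced ASCII behavior rather than statements from this presentation.

```c
#include <stdbool.h>

/* Hypothetical names for the three AUTOCVT settings (OFF | ON | ALL). */
enum autocvt_mode { AUTOCVT_OFF, AUTOCVT_ON, AUTOCVT_ALL };

/* Assumption: ON covers only the Enhanced ASCII pair, ISO8859-1 (819)
 * and IBM-1047 (1047), in either direction. */
static bool is_enhanced_ascii_pair(unsigned short a, unsigned short b)
{
    return (a == 819 && b == 1047) || (a == 1047 && b == 819);
}

/* Decide whether a read or write would be converted. */
bool needs_conversion(enum autocvt_mode mode,
                      unsigned short file_ccsid, bool file_is_text,
                      unsigned short pgm_ccsid)
{
    if (mode == AUTOCVT_OFF)
        return false;                 /* environment not enabled */
    if (file_ccsid == 0 || !file_is_text)
        return false;                 /* assumption: untagged or non-text file */
    if (file_ccsid == pgm_ccsid)
        return false;                 /* CCSIDs match, nothing to convert */
    if (mode == AUTOCVT_ON)
        return is_enhanced_ascii_pair(file_ccsid, pgm_ccsid);
    return true;                      /* ALL: any differing pair */
}
```

Under these assumptions, a UTF-8 file (ccsid 1208) read by a default program (ccsid 1047) is converted only when the mode is ALL.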
Usage & Invocation
■ Assign a ccsid (numeric ID for a code page) to a thread/program:
➢ Default = 1047
➢ _BPXK_PCCSID=ccsid
➢ Compile C program with ASCII option (ccsid = 819)
➢ fcntl(fd,F_CONTROL_CVT,...) can assign the program ccsid for an open file
■ Tag a file:
➢ Default = no tag, i.e., no conversion
➢ /bin/chtag -tc 1208 file1
➢ /bin/cp file1 file2 (tag propagation)
➢ Compile C program with FILETAG(...,AUTOTAG) run-time option
➢ fcntl(fd,F_SETTAG,&tag); /* assign file ccsid for this file */
➢ fcntl(fd,F_CONTROL_CVT,...); /* just for life of this open */
Usage & Invocation
■ Enable the conversion environment:
➢ For System:
    ➢ SYS1.PARMLIB(BPXPRMxx): AUTOCVT(ALL)
    ➢ SETOMVS AUTOCVT=ALL
    ➢ SET OMVS=(xx)
    ➢ D OMVS,OPTIONS displays the Parmlib values
➢ For Session or Program:
    ➢ export _BPXK_AUTOCVT=ALL
    ➢ setenv("_BPXK_AUTOCVT","ALL",1);
➢ For single open file before 1st read or write:
    ➢ fcntl(fd,F_CONTROL_CVT,...) enables only for this open file
Example: > _BPXK_AUTOCVT=ALL mypgm mytaggedfile
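A program can also enable conversion for itself before it opens any file. The following is a minimal sketch using the standard `setenv()` call with the two environment variable names from this presentation. The helper name `enable_autocvt_all` is hypothetical. On systems other than z/OS the variables are simply set and have no effect.

```c
#include <stdlib.h>

/* Enable automatic conversion for this program and, optionally, set the
 * program ccsid. Call this before the first read or write.
 * Returns 0 on success, -1 if a setenv() call fails. */
int enable_autocvt_all(const char *pgm_ccsid)
{
    if (setenv("_BPXK_AUTOCVT", "ALL", 1) != 0)
        return -1;
    /* The program ccsid is optional; NULL keeps the default (1047). */
    if (pgm_ccsid != NULL && setenv("_BPXK_PCCSID", pgm_ccsid, 1) != 0)
        return -1;
    return 0;
}
```

For example, `enable_autocvt_all("1047")` would be called at the start of `main()`, ahead of any `open()` or `fopen()`.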
Details
■ Function resides in the Logical File System (LFS).
■ Standard read/write I/O is supported; special v_rdwr is not supported.
■ Unlike Enhanced ASCII, character special file conversion is not allowed.
➢ Existing ASCII programs using std streams need AUTOCVT=ON
■ REXX syscalls are also supported:
➢ f_settag, f_control_cvt, environment()
■ Environment variables:
➢ _BPXK_AUTOCVT = OFF | ON | ALL
➢ _BPXK_PCCSID = ccsid
➢ _BPXK_UNICODE_TECHNIQUE = (LMREC0-9)
➢ _BPXK_UNICODE_SUB = YES | NO (substitute character action)
➢ _BPXK_UNICODE_MAL = YES | NO (malformed character action)
➢ Tip: Set these up before the first read/write
Details
■ Pass-through of Unicode Services conversion errors:
➢ z/OS UNIX reason codes E400ccrr and E401ccrr indicate a Unicode Services error with return code cc and reason code rr.
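The reason code layout described above can be unpacked with a few shifts and masks. This is a small sketch that assumes the reason code is available as a 32-bit value; the function name `decode_unicode_reason` is hypothetical and is not part of any z/OS API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Split a z/OS UNIX reason code of the form E400ccrr or E401ccrr into the
 * Unicode Services return code (cc) and reason code (rr).
 * Returns false, leaving the outputs untouched, for any other prefix. */
bool decode_unicode_reason(uint32_t reason, unsigned *cc, unsigned *rr)
{
    uint32_t prefix = reason >> 16;          /* high halfword: E400 or E401 */

    if (prefix != 0xE400u && prefix != 0xE401u)
        return false;
    *cc = (reason >> 8) & 0xFFu;             /* third byte: return code */
    *rr = reason & 0xFFu;                    /* low byte: reason code */
    return true;
}
```

For instance, a reason code of 0xE4000803 would decode to return code 8 and reason code 3, which can then be looked up in the Unicode Services documentation.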
■ LFS gets three large buffers to hold and convert the data.
➢ For reads, any extra data is discarded when the I/O completes.
➢ For reads and writes, any ending partial character is cached for the next read or write. close() can cause loss of a cached character.
➢ Only converted data is supplied to the program or file.
➢ The read/write return value reflects the bytes supplied to/from the program, not the amount read from or written to the file. The cursor reflects the actual amount LFS reads from or writes to the physical file system.
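To make the "ending partial character" case concrete, the sketch below checks whether a buffer ends in the middle of a multibyte character, using UTF-8 (ccsid 1208) as the example encoding. It illustrates the idea only and is not the LFS implementation; the function name `utf8_incomplete_tail` is hypothetical.

```c
#include <stddef.h>

/* Return how many bytes at the end of buf form an incomplete UTF-8
 * character (0 if the buffer ends on a character boundary or the tail is
 * malformed). A converter would hold those bytes back until more data
 * arrives. */
size_t utf8_incomplete_tail(const unsigned char *buf, size_t len)
{
    size_t back = 0;
    size_t need;
    unsigned char lead;

    /* Step back over at most three continuation bytes (10xxxxxx). */
    while (back < len && back < 3 && (buf[len - 1 - back] & 0xC0) == 0x80)
        back++;
    if (back == len)
        return 0;                      /* empty, or no lead byte in view */

    lead = buf[len - 1 - back];
    if ((lead & 0x80) == 0x00)      need = 1;   /* single-byte character */
    else if ((lead & 0xE0) == 0xC0) need = 2;
    else if ((lead & 0xF0) == 0xE0) need = 3;
    else if ((lead & 0xF8) == 0xF0) need = 4;
    else return 0;                     /* not a valid lead byte */

    /* Incomplete only if fewer bytes are present than the lead announces. */
    return (back + 1 < need) ? back + 1 : 0;
}
```

If a write ends with such a tail and the file is then closed, there is nothing left to complete the character, which is how a cached character can be lost.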
Details
■ The internal buffers used for conversion are persistent kernel storage above the bar (64-bit).
➢ Excessive usage can cause significant paging and virtual 31 storage accumulation.
➢ Parmlib keyword MAXIOBUFUSER(nnnnn) limits each user's storage for conversion to nnnnnM (Mbytes).
    ➢ MAXIOBUFUSER(2G) = 2P (petabytes) of storage, because the value is counted in Mbytes
    ➢ MAXIOBUFUSER(2048) = 2G is the default
    ➢ SETOMVS MAXIOBUFUSER=nnnnn is supported
■ Multi-threaded effects:
➢ Each thread sharing a single open file must have the same program ccsid.
➢ Simultaneous reads or writes resulting in partial characters can cause an error.
➢ Additional threads use temporary above the bar storage for buffers.
Details
■ lseek problems:
1) lseek to a non-character boundary, followed by a read or write.
2) lseek from one code page to another (MBCS files).
3) lseek to a non-character boundary causing a false positive conversion.
■ An lseek to any position other than the current position or the beginning of the file results in a conversion error on subsequent I/O for DBCS/MBCS files.
■ Bypass: replace
    lseek(fd,position,SEEK_SET);
with
    lseek(fd,0,SEEK_SET);
    read(fd,buf,position);
■ Character boundary problems only occur with DBCS/MBCS code pages.
■ Sequential reading/writing is the preferred I/O method when converting.
Interactions & Dependencies
■ Software Dependencies
➢ None
■ Hardware Dependencies
➢ None
■ Exploiters
➢ UNIX Shell & Utilities (e.g., /bin/cat, …)
➢ C Run-Time Library (e.g., fopen, fread, fwrite, fcntl)
Presentation Summary
■ z/OS UNIX has extended the Enhanced ASCII function to provide conversion from any code page to any code page.
■ z/OS UNIX is exploiting z/OS Unicode Services to provide this function.
■ Read and write conversion occurs when the environment is enabled and the program CCSID and file CCSID do not match.
Appendix
■ z/OS UNIX System Services Planning (GA22-7800)
■ z/OS UNIX System Services File System Interface Reference (SA23-2285)
■ z/OS UNIX System Services Command Reference (SA23-2280)
■ z/OS Using REXX and z/OS UNIX System Services (SA23-2283)
■ z/OS MVS System Commands (SA38-0666)
■ z/OS MVS Initialization and Tuning Reference (SA23-1380)
■ z/OS XL C/C++ Run-Time Library Reference
■ z/OS Unicode Services User's Guide & Reference (SA38-0689)
■ Character Data Representation Architecture (SC09-2190) and http://www-01.ibm.com/software/globalization/cdra