
The NLS SETUP Application and TRABASE.SAS: Easy ways to customise character conversions using the SAS System

Manfred Kiefer, SAS Institute European Headquarters

Abstract

All software used in a multinational environment must account for the differences in character sets and encoding schemes. It must also accommodate differences in conventional usage between languages, such as the differing usage of upper and lower case. The SAS System provides several features to ensure that applications can be written to use local conventions and provide national language support (NLS). Areas of concern include the following:

• moving data and applications between hosts
• management of text strings
• displaying and printing national characters other than the standard upper- and lowercase A-Z, as they are encoded by various ASCII and EBCDIC formats.

Historically, the SAS System has provided internal translation tables, called TRANTAB entries, that convert one standard to another. Currently shipping as a sample application with the Orlando Release, NLSsetup fully automates the creation of TRANTABs, key maps, and device maps, and provides an easy point-and-click interface for users to transparently specify language features.

The Problem for SAS System Users

As shown in Table 1, each host or platform on which the SAS System runs uses different standards for encoding characters. As a result, you must convert or map characters when you move data across platforms.

Table 1: Operating Systems (Hosts) Grouped by Character-Encoding Standard

EBCDIC hosts: • CMS • MVS • VSE

ASCII-ISO hosts (those that use character set(s) that are defined by the ISO 8859 standard): • AIX-RS/6000 • Convex • DG/UX • HP-UX • Intel ABI • MIPS ABI • OpenVMS-VAX • OpenVMS-AXP • Digital UNIX • Solaris 2 • SunOS 4.1 • ULTRIX

ASCII-ANSI hosts (those that use the MS-Windows ANSI character set; this is essentially ISO 8859, but it is called ASCII-ANSI because it was originally based on an ANSI draft standard): • Windows 3.1 • Windows 32s • Windows NT • Windows 95

ASCII-MAC hosts (those that use character set(s) that are defined by the vendor-specific Apple Macintosh character set): • Macintosh System 7.5 for Motorola 68020-, 68030-, and 68040-based systems • PowerPC-based Macintosh systems

ASCII-OEM host (vendor-specific IBM PC-ASCII character set): • OS/2

When you transfer data with the "standard" A-Z characters, character conversion from one encoding standard to another is not a problem. You simply rely on the default conversion mechanisms. However, when you have data with national characters such as the æ, ø, and å in Danish; the ä, ö, and ü in German; and the accented characters, such as á, é, ú, and ñ in Spanish, different conversion mechanisms are involved, and unique character encoding standards are used on each platform. The NLSsetup application enables you to adapt conversion (translation) tables for each language. Although this is typically a system administrator's task, you should understand the process so that you can add your own customised tables or modify existing ones.

The problem for a SAS system administrator is to enable users to transfer data and applications from one host to another, or transparently access data on one host from another host without concern about character conversion from one coded character set to another.

The SAS System provides a number of ways of transporting data and applications across hosts. However, the processes and trantabs that are involved differ, depending on the mechanisms you use. The REMOTE engine feature of SAS/CONNECT and SAS/SHARE software uses host-to-host trantabs. PROCs UPLOAD, DOWNLOAD, CPORT, and CIMPORT use transport-format trantabs. Each of these mechanisms is explained in the following sections.

Host-to-Host Trantabs: Transporting Data via the REMOTE Engine

The REMOTE engine is a feature of SAS/CONNECT and SAS/SHARE software that allows you to access remote data. When you move data across platforms, the REMOTE engine translates character sets directly from the source platform's encoding standard to the target platform's encoding standard, as shown in the following diagram.

source                                      target
platform  <------>  translation  <------>  platform
               (host-to-host trantabs)

For example, if you are using the REMOTE engine to access data on an MVS host, which uses EBCDIC encoding, from a PC client, which uses IBM PC-ASCII encoding, characters are translated directly from EBCDIC to IBM PC-ASCII and vice-versa. Table 2 shows the trantabs that the SAS System provides for direct host-to-host character-set translation.

Table 2: SAS Host-to-Host Trantabs

Trantab Name  Entry and Function                        Specific Hosts
----------------------------------------------------------------------------------
On EBCDIC hosts (IBM mainframes)

_0000030      (0) import from ASCII-ISO to EBCDIC       connecting MVS, CMS, or VSE to
              (1) export from EBCDIC to ASCII-ISO       OpenVMS or UNIX systems

_0000060      (0) import from ASCII-ANSI to EBCDIC      connecting MVS, CMS, or VSE to
              (1) export from EBCDIC to ASCII-ANSI      Windows

_00000A0      (0) import from ASCII-OEM to EBCDIC       connecting MVS, CMS, or VSE to
              (1) export from EBCDIC to ASCII-OEM       OS/2

_0000120      (0) import from ASCII-MAC to EBCDIC       connecting MVS, CMS, or VSE to
              (1) export from EBCDIC to ASCII-MAC       MAC
----------------------------------------------------------------------------------
On Windows hosts

- Windows in ANSI mode:

_0000050      (0) import from ASCII-ISO to ASCII-ANSI   connecting Windows to
              (1) export from ASCII-ANSI to ASCII-ISO   OpenVMS and UNIX systems

_0000060      (0) import from EBCDIC to ASCII-ANSI      connecting Windows to MVS,
              (1) export from ASCII-ANSI to EBCDIC      CMS, or VSE

_00000C0      (0) import from ASCII-OEM to ASCII-ANSI   connecting Windows to OS/2
              (1) export from ASCII-ANSI to ASCII-OEM   or to Windows in ASCII-OEM mode

_0000140      (0) import from ASCII-MAC to ASCII-ANSI   connecting Windows to MAC
              (1) export from ASCII-ANSI to ASCII-MAC

- Windows in OEM mode:

_0000090      (0) import from ASCII-ISO to ASCII-OEM    connecting Windows to
              (1) export from ASCII-OEM to ASCII-ISO    OpenVMS or UNIX systems

_00000A0      (0) import from EBCDIC to ASCII-OEM       connecting Windows to
              (1) export from ASCII-OEM to EBCDIC       MVS, CMS, or VSE

_00000C0      (0) import from ASCII-ANSI to ASCII-OEM   connecting Windows to
              (1) export from ASCII-OEM to ASCII-ANSI   Windows in ASCII-ANSI mode

_0000180      (0) import from ASCII-MAC to ASCII-OEM    connecting Windows to MAC
              (1) export from ASCII-OEM to ASCII-MAC
----------------------------------------------------------------------------------
On OS/2

_0000090      (0) import from ASCII-ISO to ASCII-OEM    connecting OS/2 to OpenVMS
              (1) export from ASCII-OEM to ASCII-ISO    or UNIX systems

_00000A0      (0) import from EBCDIC to ASCII-OEM       connecting OS/2 to MVS,
              (1) export from ASCII-OEM to EBCDIC       CMS, or VSE

_00000C0      (0) import from ASCII-ANSI to ASCII-OEM   connecting OS/2 to Windows
              (1) export from ASCII-OEM to ASCII-ANSI

_0000180      (0) import from ASCII-MAC to ASCII-OEM    connecting OS/2 to MAC
              (1) export from ASCII-OEM to ASCII-MAC
----------------------------------------------------------------------------------
On MAC

_0000110      (0) import from ASCII-ISO to ASCII-MAC    connecting MAC to OpenVMS
              (1) export from ASCII-MAC to ASCII-ISO    or UNIX systems

_0000120      (0) import from EBCDIC to ASCII-MAC       connecting MAC to MVS,
              (1) export from ASCII-MAC to EBCDIC       CMS, or VSE

_0000140      (0) import from ASCII-ANSI to ASCII-MAC   connecting MAC to Windows
              (1) export from ASCII-MAC to ASCII-ANSI

_0000180      (0) import from ASCII-OEM to ASCII-MAC    connecting MAC to OS/2
              (1) export from ASCII-MAC to ASCII-OEM
----------------------------------------------------------------------------------
On OpenVMS and UNIX hosts

_0000030      (0) import from EBCDIC to ASCII-ISO       connecting OpenVMS or UNIX to
              (1) export from ASCII-ISO to EBCDIC       MVS, CMS, or VSE

_0000050      (0) import from ASCII-ANSI to ASCII-ISO   connecting OpenVMS or UNIX
              (1) export from ASCII-ISO to ASCII-ANSI   to Windows

_0000090      (0) import from ASCII-OEM to ASCII-ISO    connecting OpenVMS or UNIX
              (1) export from ASCII-ISO to ASCII-OEM    to OS/2

_0000110      (0) import from ASCII-MAC to ASCII-ISO    connecting OpenVMS or UNIX
              (1) export from ASCII-ISO to ASCII-MAC    to MAC
----------------------------------------------------------------------------------

The same trantabs are used for all connectivity mechanisms, including APPC, TCP/IP, NETBIOS, and DECnet.

As you can see from Tables 1 and 2, the conversion subsystem distinguishes the following character architectures:

• EBCDIC
• ASCII-ISO (ISO 8859)
• ASCII-OEM (which, for our purposes, includes only the IBM PC-ASCII standard)
• Microsoft's ASCII-ANSI
• Apple's ASCII-MAC.
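The trantab names in Table 2 appear to follow a simple pattern: each character architecture corresponds to one hexadecimal flag, and a trantab's name combines the flags of the two architectures it connects. This is an observation drawn from the table, not a documented SAS rule. The following Python sketch (Python is used here purely for illustration; the flag table and the function name host_to_host_trantab are hypothetical) reproduces the names from Table 2 on that assumption.

```python
# Flags inferred from the trantab names in Table 2.
# This is an assumption made for illustration, not a SAS-documented rule.
ARCH_FLAGS = {
    "ASCII-ISO": 0x010,
    "EBCDIC": 0x020,
    "ASCII-ANSI": 0x040,
    "ASCII-OEM": 0x080,
    "ASCII-MAC": 0x100,
}


def host_to_host_trantab(local_arch, remote_arch):
    """Return the name of the trantab that connects two architectures."""
    if local_arch == remote_arch:
        raise ValueError("identical architectures need no host-to-host trantab")
    flags = ARCH_FLAGS[local_arch] | ARCH_FLAGS[remote_arch]
    # Names are an underscore followed by seven uppercase hexadecimal digits.
    return "_{:07X}".format(flags)


if __name__ == "__main__":
    print(host_to_host_trantab("EBCDIC", "ASCII-ISO"))
    print(host_to_host_trantab("ASCII-OEM", "ASCII-MAC"))
```

The order of the two arguments does not matter: the same trantab serves both hosts, and only the roles of its import and export entries are reversed.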

Each host-to-host trantab actually consists of two ordered halves, or "entries":

• entry 0 (for importing)
• entry 1 (for exporting).

For example, on UNIX hosts, the _0000030 (EBCDIC to ASCII-ISO) trantab is shown in Table 3 as it appears when both halves are listed by the TRANTAB procedure.

Table 3: The _0000030 Trantab

Table name is _0000030.

       0 1 2 3 4 5 6 7 8 9 A B C D E F
   00 '000102039C09867F978D8E0B0C0D0E0F'x
   10 '101112139D8508871819928F1C1D1E1F'x
   20 '80818283840A171B88898A8B8C050607'x
   30 '909116939495960498999A9B14159E1A'x
   40 '20A0A1A2A3A4A5A6A7A8D52E3C282B7C'x
-> 50 '26A9AAABACADAEAFB0B121242A293B5E'x
   60 '2D2FB2B3B4B5B6B7B8B9E52C255F3E3F'x
   70 'BABBBCBDBEBFC0C1C2603A2340273D22'x
   80 'C3616263646566676869C4C5C6C7C8C9'x
   90 'CA6A6B6C6D6E6F707172CBCCCDCECFD0'x
   A0 'D17E737475767778797AD2D3D45BD6D7'x
   B0 'D8D9DADBDCDDDEDFE0E1E2E3E45DE6E7'x
   C0 '7B414243444546474849E8E9EAEBECED'x
   D0 '7D4A4B4C4D4E4F505152EEEFF0F1F2F3'x
   E0 '5C9F535455565758595AF4F5F6F7F8F9'x
   F0 '30313233343536373839FAFBFCFDFEFF'x

       0 1 2 3 4 5 6 7 8 9 A B C D E F
   00 '00010203372D2E2F1605250B0C0D0E0F'x
   10 '101112133C3D322618193F271C1D1E1F'x
-> 20 '405A7F7B5B6C507D4D5D5C4E6B604B61'x
   30 'F0F1F2F3F4F5F6F7F8F97A5E4C7E6E6F'x
   40 '7CC1C2C3C4C5C6C7C8C9D1D2D3D4D5D6'x
   50 'D7D8D9E2E3E4E5E6E7E8E9ADE0BD5F6D'x
   60 '79818283848586878889919293949596'x
   70 '979899A2A3A4A5A6A7A8A9C04FD0A107'x
   80 '202122232415061728292A2B2C090A1B'x
   90 '30311A333435360838393A3B04143EE1'x
   A0 '41424344454647484951525354555657'x
   B0 '58596263646566676869707172737475'x
   C0 '767778808A8B8C8D8E8F909A9B9C9D9E'x
   D0 '9FA0AAABAC4AAEAFB0B1B2B3B4B5B6B7'x
   E0 'B8B9BABBBC6ABEBFCACBCCCDCECFDADB'x
   F0 'DCDDDEDFEAEBECEDEEEFFAFBFCFDFEFF'x

Each cell in the trantab "maps" an ASCII code point to a corresponding EBCDIC code point, or vice-versa. For example, the ampersand (&) is '50'x in EBCDIC and '26'x in ASCII-ISO (as in all ASCII standards). Therefore, in the first half of the trantab in Table 3 (the EBCDIC to ASCII-ISO half), the cell that represents EBCDIC code point 50 contains the value 26. In the second half of the table (the ASCII-ISO to EBCDIC half), cell 26 contains the value 50.
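The cell lookup described above can be sketched in a few lines of Python, used here only as an illustration of the mechanism and not as SAS code. The two 256-byte tables are stand-ins built from Python's built-in cp037 and latin-1 codecs; they agree with the SAS-supplied _0000030 trantab for the ampersand, but other cells may differ. The names EBCDIC_TO_ISO, ISO_TO_EBCDIC, and translate are hypothetical.

```python
# A translation table is a sequence of 256 cells: cell N holds the code point
# that code point N is converted to.
ALL_BYTES = bytes(range(256))

# Stand-in for the first half (EBCDIC to ASCII-ISO), built from Python codecs.
# Assumption: the SAS-supplied table is not identical to this one in every cell.
EBCDIC_TO_ISO = ALL_BYTES.decode("cp037").encode("latin-1")

# Stand-in for the second half (ASCII-ISO to EBCDIC).
ISO_TO_EBCDIC = ALL_BYTES.decode("latin-1").encode("cp037")


def translate(data, table):
    """Replace every byte of data by the content of the matching table cell."""
    return data.translate(table)


if __name__ == "__main__":
    print("cell 50 of the first half:  {:02X}".format(EBCDIC_TO_ISO[0x50]))
    print("cell 26 of the second half: {:02X}".format(ISO_TO_EBCDIC[0x26]))
    print(translate("SAS & NLS".encode("cp037"), EBCDIC_TO_ISO))
```

Because the two stand-in halves are exact inverses of each other, a value that is translated in one direction and then back again is unchanged.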

In all cases, the EBCDIC trantabs (_0000030, _0000060, _00000A0, _0000120) are accurate for the U.S. English EBCDIC (CECP 037), and for the first 128 ASCII code points. The upper 128 "ASCII" code points, which are used for national characters, vary from one 8-bit ASCII extension to another, or from one code page to another. Therefore, international users who want to preserve their national characters must always customise these trantabs. This is discussed in "Customising Host-to-Host Trantabs."

Table 4 shows an ASCII trantab, trantab _0000090 (ASCII-OEM to ASCII-ISO).

Table 4: The _0000090 Trantab

Table name is _0000090.

       0 1 2 3 4 5 6 7 8 9 A B C D E F
   00 '000102030405060708090A0B0C0D0E0F'x
   10 '101112131415161718191A1B1C1D1E1F'x
   20 '202122232425262728292A2B2C2D2E2F'x
   30 '303132333435363738393A3B3C3D3E3F'x
   40 '404142434445464748494A4B4C4D4E4F'x
   50 '505152535455565758595A5B5C5D5E5F'x
   60 '606162636465666768696A6B6C6D6E6F'x
   70 '707172737475767778797A7B7C7D7E7F'x
   80 '808182838485868788898A8B8C8D8E8F'x
   90 '909192939495969798999A9B9C9D9E9F'x
   A0 'A0A1A2A3A4A5A6A7A8A9AAABACADAEAF'x
   B0 'B0B1B2B3B4B5B6B7B8B9BABBBCBDBEBF'x
   C0 'C0C1C2C3C4C5C6C7C8C9CACBCCCDCECF'x
   D0 'D0D1D2D3D4D5D6D7D8D9DADBDCDDDEDF'x
   E0 'E0E1E2E3E4E5E6E7E8E9EAEBECEDEEEF'x
   F0 'F0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF'x

As you can see, this half of the trantab, as well as the other half, which is not shown, is an identity table, in which each code point is mapped to itself. For example, cell 00 contains the value 00, cell 01 contains the value 01, and so on. Again, this is valid for the first 128 code points, which are the same in all ASCII-based standards. International users must customise the upper half of the trantab to reflect the correct mappings for the national characters that they want to preserve.

Customising Host-to-Host Trantabs

You can use the TRANTAB procedure to modify a host-to-host trantab. As previously stated, customisation of these trantabs (except _0000050, where Windows ANSI corresponds exactly to ISO 8859) is always necessary for international users who want to preserve their national characters when they use the REMOTE engine to transfer data across hosts.
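To see why an identity table such as the default ASCII-OEM to ASCII-ISO trantab is not enough for national characters, consider the following Python sketch (an illustration only, not SAS code). It assumes that code page 437 represents the ASCII-OEM host and ISO 8859-1 the ASCII-ISO host, and it uses Python's cp437 and latin-1 codecs to derive a customised upper half; the function name oem_to_iso_table is hypothetical.

```python
IDENTITY = bytes(range(256))


def oem_to_iso_table():
    """Build a customised table: identity for the first 128 cells, remapped
    upper cells wherever the OEM character also exists in ISO 8859-1."""
    cells = bytearray(IDENTITY)
    for code in range(0x80, 0x100):
        char = bytes([code]).decode("cp437")
        try:
            cells[code] = char.encode("latin-1")[0]
        except UnicodeEncodeError:
            # No ISO 8859-1 equivalent (for example, box-drawing characters):
            # the cell keeps its identity value in this sketch.
            pass
    return bytes(cells)


if __name__ == "__main__":
    oem_text = "Größe".encode("cp437")
    print(repr(oem_text.translate(IDENTITY).decode("latin-1")))
    print(repr(oem_text.translate(oem_to_iso_table()).decode("latin-1")))
```

With the identity table the German word arrives with the wrong characters; with the customised table it is preserved. Note that the table built here is not reversible in every cell, because cells that are left at their identity value can collide with remapped ones.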

For example, suppose that from an OpenVMS session you want to modify a SAS data set in CMS that contains German national characters. The national characters that you want to preserve are ä, ö, ü, ß, Ä, Ö, and Ü.

As Table 2 shows, the trantab that handles character conversion between OpenVMS and CMS is _0000030. Therefore, you customise by mapping the code point for each national character from the German EBCDIC code page (CECP 273) to the corresponding code point on the ISO 8859-1 code page. You do this by performing the following steps.

1. From your OpenVMS session, use PROC TRANTAB to modify the appropriate trantab. For data conversion, only the trantabs in the user's SAS session are used. Do not update a trantab in a user session while you are connected to a foreign host that needs to use the trantab.

proc trantab table=_0000030;
   /* Customize import entry: */
   /* convert EBCDIC to ASCII-ISO */
   rep 'c0'x 'e4'x; /* a diaeresis (ä) */
   rep '6a'x 'f6'x; /* o diaeresis (ö) */
   rep 'd0'x 'fc'x; /* u diaeresis (ü) */
   rep 'a1'x 'df'x; /* s sharp (ß) */
   rep '4a'x 'c4'x; /* A diaeresis (Ä) */
   rep 'e0'x 'd6'x; /* O diaeresis (Ö) */
   rep '5a'x 'dc'x; /* U diaeresis (Ü) */
   swap;
   /* Customize export entry: */
   /* convert ASCII-ISO to EBCDIC */
   rep 'e4'x 'c0'x; /* a diaeresis (ä) */
   rep 'f6'x '6a'x; /* o diaeresis (ö) */
   rep 'fc'x 'd0'x; /* u diaeresis (ü) */
   rep 'df'x 'a1'x; /* s sharp (ß) */
   rep 'c4'x '4a'x; /* A diaeresis (Ä) */
   rep 'd6'x 'e0'x; /* O diaeresis (Ö) */
   rep 'dc'x '5a'x; /* U diaeresis (Ü) */
   swap;
   save;
quit;

2. The custom table is written to your SASUSER.PROFILE catalog. If it needs to be generally accessible, copy it to the SASHELP.HOST catalog. By default, the SAS System tries first to locate translation tables in SASUSER.PROFILE, and then in SASHELP.HOST.

3. To start using the modified trantab, you must first close and re-start your SAS session on OpenVMS.

4. Now sign on to CMS. For example:

options comamid=tcp remote=your_serverid; filename rlink 'your_communication_script'; signon;

5. Assign a libname:

libname test 'file-type file-mode' server=your_serverid;

6. Now you can use the FSEDIT procedure, for instance, to update the data set on the remote host, and keep national characters correct.

If you had not modified the table, the default character conversion would apply. This means, for instance, that 'A1'x would be translated to '7E'x. Therefore, the word "Straße" would appear as "Stra~e" in your OpenVMS session.
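The effect of the customisation can be traced with a short Python sketch (an illustration of the table logic only, not SAS code). The default table below is the EBCDIC to ASCII-ISO half of _0000030 as listed in Table 3, and the overrides are the seven German replacements from step 1. The names DEFAULT_IMPORT, GERMAN_REPS, and customise are hypothetical.

```python
# EBCDIC to ASCII-ISO half of _0000030, copied from Table 3.
DEFAULT_IMPORT = bytes.fromhex(
    "000102039C09867F978D8E0B0C0D0E0F"
    "101112139D8508871819928F1C1D1E1F"
    "80818283840A171B88898A8B8C050607"
    "909116939495960498999A9B14159E1A"
    "20A0A1A2A3A4A5A6A7A8D52E3C282B7C"
    "26A9AAABACADAEAFB0B121242A293B5E"
    "2D2FB2B3B4B5B6B7B8B9E52C255F3E3F"
    "BABBBCBDBEBFC0C1C2603A2340273D22"
    "C3616263646566676869C4C5C6C7C8C9"
    "CA6A6B6C6D6E6F707172CBCCCDCECFD0"
    "D17E737475767778797AD2D3D45BD6D7"
    "D8D9DADBDCDDDEDFE0E1E2E3E45DE6E7"
    "7B414243444546474849E8E9EAEBECED"
    "7D4A4B4C4D4E4F505152EEEFF0F1F2F3"
    "5C9F535455565758595AF4F5F6F7F8F9"
    "30313233343536373839FAFBFCFDFEFF"
)

# German replacements from step 1: CECP 273 code point -> ISO 8859-1 code point.
GERMAN_REPS = {
    0xC0: 0xE4,  # a diaeresis
    0x6A: 0xF6,  # o diaeresis
    0xD0: 0xFC,  # u diaeresis
    0xA1: 0xDF,  # s sharp
    0x4A: 0xC4,  # A diaeresis
    0xE0: 0xD6,  # O diaeresis
    0x5A: 0xDC,  # U diaeresis
}


def customise(table, reps):
    """Return a copy of table in which each listed cell holds its new value."""
    cells = bytearray(table)
    for position, value in reps.items():
        cells[position] = value
    return bytes(cells)


if __name__ == "__main__":
    # "Straße" as stored on the EBCDIC host under CECP 273.
    stored = bytes.fromhex("E2A39981A185")
    print(stored.translate(DEFAULT_IMPORT).decode("latin-1"))
    print(stored.translate(customise(DEFAULT_IMPORT, GERMAN_REPS)).decode("latin-1"))
```

The first line printed shows the tilde that the default table produces for the sharp s; the second shows the word as the customised table delivers it.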

You can de-activate the customised character conversion by renaming the customised trantab, de-assigning the library, and assigning it again.

Transport-Format Trantabs: Transporting Data via PROCs UPLOAD, DOWNLOAD, CPORT, and CIMPORT

Both the UPLOAD/DOWNLOAD procedures and the CPORT/CIMPORT procedures use an intermediate transport format when transporting files from one host to another. PROCs UPLOAD and DOWNLOAD are part of SAS/CONNECT software. See SAS/CONNECT Software: Usage and Reference, Version 6, 2nd Edition for details. (Appendix 4 lists the default ASCII/EBCDIC translation tables.) The process is illustrated in the following diagram:

              translation                 translation
          (local-to-transport         (transport-to-local
               trantab)                    trantab)
                  |                           |
                  V                           V
source                     transport                     target
platform     <------>       format       <------>       platform

When you are converting from a character-encoding standard to transport format or vice-versa, you use the SAS transport-format trantabs shown in Table 5.

Table 5: SAS Transport-format Trantabs

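---------------------------------------------------------------
Trantab name   Function
---------------------------------------------------------------
SASXPT         controls local-to-transport-format translation
SASLCL         controls transport-to-local-format translation
---------------------------------------------------------------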

For example, if you are transporting SAS data from an MVS host, which uses the EBCDIC standard, to an OS/2 system, which uses IBM PC-ASCII, the SASXPT trantab on MVS is used for the conversion from EBCDIC to transport format, and the SASLCL trantab is used for the conversion from transport format to IBM PC-ASCII.

The character transport format is an extended ASCII representation. You can visualise transport format as an 8-bit code page in which the first 128 code points are the same as they are for all ASCII-based standards, and the upper 128 code points are initially unassigned. The upper 128 code points are simply used for mapping the national characters from EBCDIC or from any 8-bit ASCII encoding standard. On any host that uses an ASCII-based standard, SASXPT and SASLCL are identity tables, similar to many of the host-to-host trantabs. If you want to preserve national characters on hosts that use an 8-bit ASCII standard, then you must modify the default mapping of the upper 128 cells to fit the particular ASCII standard and code page that you are using.

On EBCDIC hosts, SASXPT is the same as the second half of the EBCDIC host-to-host trantabs _0000030, _0000060, or _00000A0. SASLCL is the same as the first half of those trantabs. To preserve national characters, you must customise SASXPT and SASLCL, just as you would customise the two halves of the EBCDIC host-to-host trantabs. See "Customizing Transport-Format Trantabs," for more information.
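The two-step conversion through transport format can be sketched as follows. This is a Python illustration under stated assumptions, not SAS code: the upper half of transport format is treated as ISO 8859-1, code page 437 stands for the OS/2 host, and the MVS table is a stand-in built from Python's cp037 codec. All names (SASXPT_MVS, SASLCL_OS2_DEFAULT, customised_saslcl_os2, transport) are hypothetical.

```python
IDENTITY = bytes(range(256))

# Stand-in for SASXPT on the EBCDIC host (local format -> transport format).
# Assumption: upper-half transport code points follow ISO 8859-1.
SASXPT_MVS = IDENTITY.decode("cp037").encode("latin-1")

# On an ASCII-based host the default SASLCL is an identity table.
SASLCL_OS2_DEFAULT = IDENTITY


def customised_saslcl_os2(national_chars):
    """Return a transport -> local table that remaps the given characters
    to their code page 437 positions and leaves all other cells unchanged."""
    cells = bytearray(IDENTITY)
    for char in national_chars:
        cells[char.encode("latin-1")[0]] = char.encode("cp437")[0]
    return bytes(cells)


def transport(data, sasxpt, saslcl):
    """Source host applies SASXPT, then the target host applies SASLCL."""
    return data.translate(sasxpt).translate(saslcl)


if __name__ == "__main__":
    on_mvs = "niño".encode("cp037")
    by_default = transport(on_mvs, SASXPT_MVS, SASLCL_OS2_DEFAULT)
    customised = transport(on_mvs, SASXPT_MVS, customised_saslcl_os2("ñÑáéíóú"))
    print(by_default.decode("cp437"))
    print(customised.decode("cp437"))
```

The 7-bit characters survive in both cases; only the customised target table delivers the national character at the code point where the OS/2 code page expects it.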

Note: To transport SAS files from one host to another via tape or shared DASD, you use the CPORT/CIMPORT procedures, just as you would if you were transporting the files via communications software. The process is virtually identical to that described previously. Only the transport medium is different.

Customising Transport-Format Trantabs

Transport-format (SASXPT, SASLCL) trantabs often must be customised to accommodate national language character sets other than U.S. English. There are three ways of customising these tables:

• with the NLSsetup application
• with the TRABASE program
• directly with the TRANTAB procedure.

The TRABASE program actually uses the TRANTAB procedure to create a number of customised trantabs for you. The following sections explain how to use the TRABASE program and how to use PROC TRANTAB separately to create your own customised trantabs.

Building Customised Trantabs with the TRABASE Program

The TRABASE program, which builds transport-format and character-operations trantabs for a number of languages and operating systems, is part of the SAS sample library. It does not create tables for all possible combinations, but it can easily be adapted to specific needs.

When you look at the TRABASE program, you will see that it creates a macro, BTABLE, with the single parameter COUNTRY. When you supply an appropriate country name, BTABLE creates a set of trantabs (corresponding to some or all of the SASXPT, SASLCL, and other default trantabs) to handle the translation of that country's national characters.

The names of the trantabs that are created by TRABASE follow a naming convention. For the local-to-transport-format and the transport-to-local-format tables, SPAETA and SPAATE are typical trantab names, where "SPA" is an abbreviation for Spanish, "ETA" stands for "EBCDIC to ASCII," and "ATE" represents "ASCII to EBCDIC."

Note: In this context, "ASCII" means IBM PC-ASCII.

The following naming convention for the local-to-transport and the transport-to-local entries was used:

• EBCDIC <-> OEM (PC-ASCII): eta, ate
• ISO <-> OEM (PC-ASCII): ita, ati
• EBCDIC <-> ISO: eti, ite
• ISO <-> MAC (Apple): itm, mti
• ISO <-> ANSI (MS-Windows): itw, wti
• OEM <-> MAC: atm, mta
• EBCDIC <-> MAC: etm, mte

The suffix is prefixed with a country abbreviation, where country is one of the following:

• dan: Denmark/Norway
• fre: France
• ger: Germany
• hun: Hungary
• ita: Italy
• pol: Poland
• spa: Spain
• swe: Sweden/Finland
• swi: Switzerland (German/French)
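As a small illustration of the naming convention, the following Python sketch composes trantab names from a country abbreviation and a pair of encoding standards. It is not part of TRABASE; the function names trantab_pair and trantab_option are hypothetical, and the sketch only restates the lists above.

```python
# Direction suffixes from the naming convention: (from, to) -> suffix.
SUFFIXES = {
    ("EBCDIC", "OEM"): "eta", ("OEM", "EBCDIC"): "ate",
    ("ISO", "OEM"): "ita", ("OEM", "ISO"): "ati",
    ("EBCDIC", "ISO"): "eti", ("ISO", "EBCDIC"): "ite",
    ("ISO", "MAC"): "itm", ("MAC", "ISO"): "mti",
    ("ISO", "ANSI"): "itw", ("ANSI", "ISO"): "wti",
    ("OEM", "MAC"): "atm", ("MAC", "OEM"): "mta",
    ("EBCDIC", "MAC"): "etm", ("MAC", "EBCDIC"): "mte",
}

COUNTRIES = {"dan", "fre", "ger", "hun", "ita", "pol", "spa", "swe", "swi"}


def trantab_pair(country, local, remote):
    """Return (local-to-transport name, transport-to-local name) for the host
    on which the TRANTAB= option is set."""
    if country not in COUNTRIES:
        raise ValueError("unknown country abbreviation: " + country)
    return (country + SUFFIXES[(local, remote)],
            country + SUFFIXES[(remote, local)])


def trantab_option(country, local, remote):
    """Format the pair as an OPTIONS statement."""
    return "options trantab=({},{});".format(*trantab_pair(country, local, remote))


if __name__ == "__main__":
    print(trantab_option("spa", "EBCDIC", "OEM"))
    print(trantab_pair("pol", "OEM", "ISO"))
```

A pair that is not in the list, such as EBCDIC and ANSI, raises a KeyError in this sketch, which mirrors the fact that TRABASE does not create tables for all possible combinations.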

See the text of the TRABASE program for further information.

Examples: Using Customised Transport-Format Trantabs

Suppose you are using an OS/2 PC. You have data that contain Spanish characters and you want to use PROC DOWNLOAD to download that data from MVS (EBCDIC) to OS/2 (ASCII, or, more specifically, IBM PC-ASCII or what SAS classifies as ASCII-OEM).

1. First, use the TRABASE program to create the customized transport-format trantabs SPAETA (EBCDIC to ASCII) and SPAATE (ASCII to EBCDIC).

2. To specify that SAS should use these trantabs instead of the default transport-format trantabs (SASXPT and SASLCL), specify the following OPTIONS statement on MVS, because PROC DOWNLOAD (like PROC UPLOAD) is executed on the remote host, which is the MVS mainframe in this case:

options trantab=(spaeta,spaate);

The SPAETA trantab handles the correct host-to-transport format translation, and SPAATE takes care of the transport-to-host format translation. As stated earlier, character translation depends on which platforms are involved. If you want to translate characters between the two 8-bit ASCII extensions of OS/2 and UNIX, you need to create a new set of transport-format trantabs. Most UNIX derivatives use the ISO 8859 standard. In accordance with the naming convention used above, you call a Polish SASXPT trantab for OS/2 POLATI (ASCII-to-ISO). A modified SASLCL trantab would be called POLITA (ISO to ASCII).

Note: The TRABASE program that generates the POLATI trantab in addition to numerous other customised trantabs is a recent program modification. If you don't find this trantab in your version of TRABASE, you could use PROC TRANTAB to create the table as follows:

PROC TRANTAB table=SASXPT nls;
   rep '98'x 'b6'x; /* s acute */
   ......
   save table=polati;
quit;

A new table is written to your SASUSER.PROFILE catalogue. If a table needs to be generally accessible, copy it to the SASHELP.HOST catalogue.

NLS SETUP APPLICATION

The NLSsetup application that creates all the necessary trantabs for users (both host-to-host and transport format) is shipped with Release 6.11 in the BASE sample source library. You can also use this to generate devmaps and keymaps just by selecting a country name from a listbox.

To access the NLSsetup application on Windows or OS/2:

• Assign a libname: libname libn '!SASROOT\core\sample';
• Issue: af c=libn.nlssetup.nlssetup.frame

To access the NLSsetup application on UNIX:

• Assign a libname: libname libn '!SASROOT/samples/base';
• Issue: af c=libn.nlssetup.nlssetup.frame

The primary window for the NLSsetup application is shown in the following figure.

The elements of the figure are described as follows.

• SELECT ONE allows a user to select the country for which the default tables will be generated.
• RESET REMOTE ENGINE TABLES resets the REMOTE engine tables to the default tables initially shipped with the SAS System.
• OK causes the tables to be generated and stored in SASUSER.PROFILE as TRANTAB entries. If you have write access to SASHELP.HOST, it copies the TRANTAB entries from SASUSER.PROFILE to SASHELP.HOST. It also copies the appropriate key map and device map to DEFAULT.KEYMAP and DEFAULT.DEVMAP in GFONT0.FONTS. If you have write access to SASHELP.FONTS, it copies these entries to SASHELP.FONTS.

Further Details

Customised character translation tables are created, which are used for the REMOTE engine as well as some or all of the following TRANTAB entries:

• Local-to-transport format
• Transport-to-local format
• Uppercase-to-lowercase
• Lowercase-to-uppercase
• Character Classification
• Scanner Translation
• Sort Tables

You are free to rename the entries according to your needs. For example, since Danish and Norwegian users both make use of the danxxx tables, Norwegian users may wish to rename the tables to norxxx.

NLSsetup creates TRANTAB entries for various configurations. You must select the proper TRANTAB entries to be used in an OPTIONS statement. For example, if you frequently upload or download from OS/2 to a mainframe and vice versa, then you need to use the xxxETA and xxxATE tables.

In order to use the customised tables properly, you specify the appropriate TRANTAB system option. The easiest way to make the tables available to all SAS users is to add the -TRANTAB option to the CONFIG.SAS file. The arguments to the TRANTAB option are positional and identify the table entry in SASUSER.PROFILE or SASHELP.HOST by name. The example below specifies custom local-to-transport-format and transport-to-local-format translation tables while leaving the other tables unchanged.

-TRANTAB (sweeta,sweate)

The custom host-to-host trantabs are written to SASUSER.PROFILE, and, if you have write access, copied to SASHELP.HOST, which overwrites the default TRANTAB entries there. You can reset them with the RESET REMOTE ENGINE TABLES button.

Conclusion

The SAS System helps to compensate for numerous incompatible character encoding standards by providing internal translation tables that convert from one character encoding standard to another. Key maps and device maps compensate for the differences in the character encodings of SAS System graphics, on the one hand, and the character encodings of host systems and output devices on the other. The NLSsetup application provides an easy point-and-click interface that allows you to set up these features transparently.


SAS is a registered trademark or trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

NLS SETUP and TRABASE

Easy ways to customize character conversions

Manfred Kiefer SAS Institute

Overview

• Problems for SAS users
• Host-to-Host Trantabs: Transporting via the REMOTE Engine
• Transport-Format Trantabs: Transporting Data via PROCs UPLOAD/DOWNLOAD, CPORT/CIMPORT
• The NLS SETUP Application

Some Terminology

• (coded) character set = encoding: unambiguous mapping of the items of a character set (letters, digits, etc.) to numeric code values
• ASCII: American Standard Code for Information Interchange, a 7-bit code that includes the (upper- and lowercase) letters A-Z, digits, punctuation and control characters
• EBCDIC: Extended Binary Coded Decimal Interchange Code, a family of 8-bit codes
• national character: a character specific to a particular nation or group of nations (ä, è, ð, ñ, ø), or any letter other than upper- and lowercase A-Z
• character conversion = mapping: changing the representation of data by using one coded character set in place of another
• trantab = translation table: a SAS catalog entry that translates from one character set to another

Character Encoding standards

ANSI X3.4-1977: 7-bit ASCII
ISO 646-1983: 7-bit ASCII (IRV)
ANSI X3.4-1986: 7-bit ASCII
ISO 8859-1:1987: 8-bit
IBM EBCDIC CECP 037
IBM CP 437
DEC Multinational Character Set
HP Roman 8
...

“In hindsight, what we have done is invented a communications Tower of Babel.” Edwin Hart

Easy character conversion?

“The mishmash of character encoding standards makes it hard for users to share data and for programmers to create worldwide software. Trying to pass data from different encodings across networks or between operating systems involves a gantlet of mappings, conversions, fonts and general headaches.”

Nadine Kano, Asmus Freytag

Problems for SAS Users

• character conversion (mapping) when moving data across platforms
• default conversion when transferring data with “standard” (A-Z) characters
• be careful when dealing with national characters
• different conversion mechanisms, and different character encoding standards on different platforms.

SAS enables users to:

• transparently access data on one host from another host, or
• transfer data and applications from one host to another
• without having to worry about character conversions from one coded character set to another.

Transporting Data via the REMOTE Engine

• Direct character translation
• the same trantabs are used regardless of the connectivity mechanism
• source platform > target platform

SAS Host-to-Host Trantabs

• EBCDIC/ASCII-ISO: _0000030
• ASCII-ISO/ASCII-ANSI: _0000050
• EBCDIC/ASCII-ANSI: _0000060
• ASCII-ISO/ASCII-OEM: _0000090
• EBCDIC/ASCII-OEM: _00000A0
• ASCII-ANSI/ASCII-OEM: _00000C0
• ASCII-MAC/ASCII-ISO: _0000110
• ASCII-MAC/EBCDIC: _0000120
• ASCII-MAC/ASCII-ANSI: _0000140
• ASCII-MAC/ASCII-OEM: _0000180

• Each host-to-host trantab consists of two halves, or “entries”
• ordered
• entry 0 (for importing)
• entry 1 (for exporting)

_0000030 Import Entry: EBCDIC/ASCII

Table name is _0000030.

       0 1 2 3 4 5 6 7 8 9 A B C D E F
   00 '000102039C09867F978D8E0B0C0D0E0F'x
   10 '101112139D8508871819928F1C1D1E1F'x
   20 '80818283840A171B88898A8B8C050607'x
   30 '909116939495960498999A9B14159E1A'x
   40 '20A0A1A2A3A4A5A6A7A8D52E3C282B7C'x
   50 '26A9AAABACADAEAFB0B121242A293B5E'x
   60 '2D2FB2B3B4B5B6B7B8B9E52C255F3E3F'x
   70 'BABBBCBDBEBFC0C1C2603A2340273D22'x
   80 'C3616263646566676869C4C5C6C7C8C9'x
   90 'CA6A6B6C6D6E6F707172CBCCCDCECFD0'x
   A0 'D17E737475767778797AD2D3D45BD6D7'x
   B0 'D8D9DADBDCDDDEDFE0E1E2E3E45DE6E7'x
   C0 '7B414243444546474849E8E9EAEBECED'x
   D0 '7D4A4B4C4D4E4F505152EEEFF0F1F2F3'x
   E0 '5C9F535455565758595AF4F5F6F7F8F9'x
   F0 '30313233343536373839FAFBFCFDFEFF'x

Which ?

The answer depends on ...

• the encoding.
• In any case ...
• EBCDIC trantabs are accurate for the U.S. EBCDIC code page, and for the first 128 ASCII code positions
• the upper 128 code positions vary from one ASCII extension to another
• international users need to customize these trantabs.

Character conversion without customization

• German text stored in EBCDIC encoding (CECP 273):
• “Blüht die Rose noch so schön, läßt sie doch die Dornen sehn.”
• Displayed under UNIX (ISO 8859-1):
• “Bl}ht die Rose noch so sch|n, l{~t sie doch die Dornen sehn.”

Customize a Host-to-Host Character Conversion

• Customize trantabs on local host, e.g.

proc trantab table = _0000030;
   rep 'C0'x 'E4'x; /* a diaeresis */
   ...
   swap;
   rep 'E4'x 'C0'x; /* a diaeresis */
   ...
   swap;
   save;
quit;

• The custom table is written to your SASUSER.PROFILE catalog.
• If it needs to be generally accessible, copy it to the SASHELP.HOST catalog.
• By default, the SAS System tries first to locate translation tables in SASUSER.PROFILE, and then in SASHELP.HOST.
• Sign on to the remote host, e.g.

options comamid=tcp remote=your_serverid;
filename rlink 'your_communication_script';
signon;

• Assign a libname, e.g.

libname test 'myid.nls.data';

• Use PROC FSEDIT to update the data on the remote host, and keep national characters correct.

Character conversion with customization

• German text stored in EBCDIC encoding (CECP 273):
• “Blüht die Rose noch so schön, läßt sie doch die Dornen sehn.”
• Displayed under UNIX (ISO 8859-1):
• “Blüht die Rose noch so schön, läßt sie doch die Dornen sehn.”

Transporting Data via the PROCs UPLOAD/DOWNLOAD, CPORT/CIMPORT

• intermediate transport format
• default trantabs can be overridden via the TRANTAB= system option
• source > transport format > target

SAS Transport-Format Trantabs

• SASXPT: controls local-to-transport-format translation
• SASLCL: controls transport-to-local-format translation
• On EBCDIC hosts, SASXPT is the same as the second half (export entry) of the host-to-host trantabs
• ... SASLCL is the same as the first half (import entry) of the host-to-host trantabs
• On hosts that use an ASCII-based standard SASXPT and SASLCL are identity tables

SASXPT: EBCDIC to ASCII

       0 1 2 3 4 5 6 7 8 9 A B C D E F
   00 '000102039C09867F978D8E0B0C0D0E0F'x
   10 '101112139D8508871819928F1C1D1E1F'x
   20 '80818283840A171B88898A8B8C050607'x
   30 '909116939495960498999A9B14159E1A'x
   40 '20A0A1A2A3A4A5A6A7A8D52E3C282B7C'x
   50 '26A9AAABACADAEAFB0B121242A293B5E'x
   60 '2D2FB2B3B4B5B6B7B8B9E52C255F3E3F'x
   70 'BABBBCBDBEBFC0C1C2603A2340273D22'x
   80 'C3616263646566676869C4C5C6C7C8C9'x
   90 'CA6A6B6C6D6E6F707172CBCCCDCECFD0'x
   A0 'D17E737475767778797AD2D3D45BD6D7'x
   B0 'D8D9DADBDCDDDEDFE0E1E2E3E45DE6E7'x
   C0 '7B414243444546474849E8E9EAEBECED'x
   D0 '7D4A4B4C4D4E4F505152EEEFF0F1F2F3'x
   E0 '5C9F535455565758595AF4F5F6F7F8F9'x
   F0 '30313233343536373839FAFBFCFDFEFF'x

Customize Transport-Format Trantabs

• with the TRANTAB procedure

proc trantab table=SASXPT NLS;
   rep 'C0'x 'E4'x; /* a diaeresis */
   ...
   save table = ... ;

• with the TRABASE program
• with the NLS Setup Application

Customized Trantabs with TRABASE

• part of the SAS sample library
• builds trantabs for a number of countries and operating systems
• can easily be adapted to specific needs
• creates a macro with the single parameter COUNTRY
• names of the trantabs follow a naming convention

TRABASE naming convention

• EBCDIC/ASCII-OEM: ETA/ATE
• ASCII-ISO/ASCII-OEM: ITA/ATI
• EBCDIC/ASCII-ISO: ETI/ITE
• ASCII-ISO/ASCII-MAC: ITM/MTI
• ASCII-ISO/ASCII-ANSI: ITW/WTI
• ASCII-OEM/ASCII-MAC: ATM/MTA
• EBCDIC/ASCII-MAC: ETM/MTE

• DAN: Denmark/Norway
• FRE: France
• GER: Germany/Austria
• HUN: Hungary
• ITA: Italy
• POL: Poland
• SPA: Spain
• SWE: Sweden/Finland
• SWI: Switzerland (Belgium)

• Trabase copies and modifies default trantabs
• gereta and gerate are typical names where
• ger is an abbreviation for “German”
• eta stands for “EBCDIC to ASCII”
• ate represents “ASCII to EBCDIC”
• you are free to rename the trantabs
• but use a “telling name”

Using customized Trantabs

• use trabase to create customized transport-format trantabs
• these are used instead of the default (sasxpt, saslcl) via the TRANTAB= system option, e.g.

options trantab=(gereta,gerate);

• gereta handles correct host-to-transport format translation
• gerate takes care of transport-to-host format translation

The NLS SETUP Application

• creates all necessary trantabs for users (both host-to-host and transport-format)
• also generates devmaps and keymaps
• is shipped with Orlando in the BASE sample source library
• is fully customizable

• on Windows or OS/2:
• assign a libname libn '!SASROOT\core\sample';
• issue: af c=libn.nlssetup.nlssetup.frame
• on UNIX:
• assign a libname libn '!SASROOT/samples/base';
• issue: af c=libn.nlssetup.nlssetup.frame
• easy to use: just select a country from the listbox
• provides online help
• will be further enhanced

The NLS SETUP Application: Future

• 6.12: enhanced version with more countries, revised help
• 6.14: production version

The NLS SETUP Application: Demo

NLS SETUP and TRABASE: Questions?

Thank you for your attention