
Portable SAS®: Language and Platform Considerations

Robert A. Cruz, Info-Mation Systems, Hollister, CA

Portable SAS®: Language and Platform Considerations
   Abstract
   Audience
   O. Introduction
   I. The Roots of Portability Issues
      I.A Platforms
      I.B What Do We Mean by “Portable” Software?
   II. Platform Differences
      II.A Hardware Differences
      II.B Software Differences
   III. Base Language Considerations
      III.A Instruction Set Considerations
      III.B Internal Memory Considerations
      III.C Character Set Considerations
      III.D Numeric Considerations
      III.E INFORMAT and OUTFORMAT Considerations
      III.F Macro Language Considerations
      III.G PROC Considerations
      III.H System, Statement, and PROC Options Considerations
   IV. Operating System Considerations
      IV.A External Executables Considerations
      IV.B Other Platform-Specific Features
      IV.C Isolating System Dependencies
   V. Creating Portable SAS Programs
      V.A The Three-Pronged Strategy to Create a Portable SAS Program
   VI. Conclusion
   Acknowledgements
   Recommended Reading
   Contact Information
   Trademarks, Brand and Product Names
   Appendix A: SAS Language Elements with Portability Considerations
   Endnotes

ABSTRACT

Techniques for creating portable SAS programs will be discussed. Portable mainline code can be executed unchanged on multiple platforms. Requirements for Windows, Unix, and mainframe systems will be presented. Considerations include language features and coding techniques to use or avoid. Issues to be dealt with include the internal representation of characters and numbers, and techniques for addressing the peculiarities of each platform will be presented. Interfacing to sequential files and databases will be covered, as will considerations for system commands and sort utilities. Automated platform identification and adaptation by macros will also be covered.

Search Keywords: Portable SAS, Windows, Unix, MVS, CMS, PC, Server, Mainframe, Collation
Operating System: ALL
Applicable SAS Products: Base SAS

AUDIENCE

This presentation is of interest to Beginner, Intermediate, and Expert SAS users who must deal with portability issues.

O. INTRODUCTION

The objective of this paper is to familiarize the reader with those issues that impact the ability of a SAS program to function in the same manner when run on different computing platforms. Techniques for creating a portable program will be illustrated. These techniques include choices of base language elements, and an approach to structuring a SAS program to improve portability.

I. THE ROOTS OF PORTABILITY ISSUES

I.A PLATFORMS

A computing platform is the combination of hardware and software that an application is executed on. This combination identifies the environment in which the program will run. The platform may be identified by a particular industry term, such as “Wintel” (the Windows operating system running on an Intel-compatible processor), or by implication. For example, citing “CMS” as the platform implies an operating system in the VM/CMS family running on IBM System/370-family hardware. Note that these are historical references, as the current version of this operating system is z/VM and the current hardware is the System z10.

Often, the required environment must be specified in more detail, giving a minimum amount of memory, a processor class, and/or a storage capacity. Just as often, a particular release of an operating system or other corequisite software is required, because the application exploits features available only in that version.

Examples of platforms are:
• 32-bit Windows running on an Intel-compatible CPU
• Windows running on an Alpha processor
• Linux running on an x86-class CPU
• Linux running on server-class hardware (it may also be necessary to specify which manufacturer’s hardware is in use)
• Linux running on an IBM mainframe (such as System z)
• MVS-family operating system (such as z/OS) running on an IBM mainframe (such as System z)
• CMS-family operating system (such as z/VM) running on an IBM mainframe (such as System z)

As you can see, neither the hardware nor the operating system alone determines the platform. In SAS documentation, the equivalent term for platform is operating environment. SAS publishes a series of “Companion” manuals, one for each platform[1].

I.B WHAT DO WE MEAN BY “PORTABLE” SOFTWARE?

The concept of “portability” in computing refers to the ability to move an application program from one computing platform to another without having to change it. Portable programs either do not involve platform aspects that differ from one platform to another, or take them into account. Some languages were created with portability as a design objective. An early example of this was the “P-code” system used by the Pascal language in the early to mid-1970s[2]. Chief among these today is Java, which runs in a virtual machine that isolates the Java program from its platform. These virtual machines are themselves not portable, but they enable the Java applications which run within them to be portable.

II. PLATFORM DIFFERENCES

II.A HARDWARE DIFFERENCES

II.A.1 Executable Instructions

Hardware design concerns itself with data representation and instruction implementation. Hardware design has produced a number of differing approaches to instruction encoding, including CISC (Complex Instruction Set Computer), RISC (Reduced Instruction Set Computer), and VLIW (Very Long Instruction Word). It is not necessary to know the details of these instruction implementations to realize that they are quite different.

II.A.2 Character Data

II.A.2.i Character Sets

Data representations, too, can be significantly different. Characters in text strings cannot be stored directly in digital computers, as there is no internal hardware for “a” or “B”, only for 0 or 1. In order to store character data, a numeric value has to be assigned to each character (letter, digit, or special character). The mapping of characters to numeric values is referred to as a character set or character encoding. Over time, two distinct major character sets have evolved. One of these is EBCDIC (Extended Binary Coded Decimal Interchange Code), which evolved in the IBM mainframe environment. The other is ASCII (American Standard Code for Information Interchange), which evolved from origins in teletype equipment. There were some other character sets, such as BCDIC[3] and Baudot[4], which are no longer in use. In recent years, Unicode has been standardized in an effort to extend ASCII for use with alphabets in existence worldwide. Due to their separate evolution, ASCII and EBCDIC are not identical: they share some characters, but others are unique to one character set or the other.

In addition to visible characters (graphemes), these character sets include control characters, which direct the output device to take some action. Control characters are assigned a two- or three-letter mnemonic when used in discussions. Examples of control characters that originated with teletype devices include BEL (ring the bell on the output device), HT (jump forward to the next Horizontal Tab position), BS (backspace), CR (Carriage Return: position to the beginning of the line), and LF (Line Feed: move down to the next line). Some control characters were assigned for communications functions, such as ACK (acknowledge transmission).

There are variations of ASCII, including Extended ASCII, ISO-8 (an 8-bit international version), and, ultimately, Unicode. There are also variations of EBCDIC, created for different national markets. These variations are identified by their “code page”. Even the definition of a code page can change over time. For the purposes of this document, “ASCII” will mean the 7-bit US ASCII character set, and “EBCDIC” will refer to the characters shown in IBM Publication GX20-1850-3, as this version of EBCDIC is widely supported by peripherals and software on IBM mainframe hardware. For a look at the modern EBCDIC Code Page 037, see SA22-7871-05, “z/Architecture Reference Summary”[5].

In this paper, “EBCDIC” may be used as a shorthand to designate platforms that support this character set, such as IBM z/OS and IBM z/VM. “ASCII” may be used as a shorthand for platforms that support that character set, such as Windows and Unix. Finally, please note that “blank” is synonymous with “space” when referring to that character.

II.A.2.ii Contrasting the Major Character Sets

There are two major character sets in use today: EBCDIC, which is used on IBM mainframes and midrange systems, and ASCII, which is used on nearly all other platforms, including PCs, workstations, and servers, as well as some embedded systems. It is worthwhile to note that there is no intrinsic reason why an IBM mainframe must use EBCDIC: z/Linux runs using ASCII. Unicode is also being adopted widely, but I will not discuss it here, because (for the most part) our SAS programs will only need to deal with ASCII, which is a proper subset of Unicode[6].
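As a quick check of which character set a given SAS session is using, the native code points of a few characters can be written to the log. The following is a minimal sketch rather than part of the original paper; the variable names are arbitrary, and RANK() is used here purely for inspection, not as something to build program logic on.

```sas
/* Sketch: display the native code point of a few characters.        */
/* In 7-bit ASCII, "A" is code point 65 ('41'X); in EBCDIC, "A" is   */
/* code point 193 ('C1'X).                                           */
data _null_ ;
   length ch $ 1 ;
   do ch = 'A', 'a', '3', ' ' ;
      code = rank( ch ) ;              /* position in the native character set */
      putlog ch= code= code=hex2. ;    /* decimal, then hexadecimal            */
   end ;
run ;
```

Running the same step on a Windows or Unix host and on a z/OS host should show different code points for every one of these characters, which is the root of the pitfalls discussed later.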

ASCII and EBCDIC have many display characters in common. These include uppercase letters (A-Z), lowercase letters (a-z), digits (0-9), and the special characters exclamation point (!), commercial at sign (@), hash mark or pound sign or number sign (#), dollar sign ($), percent sign (%), ampersand (&), asterisk (*), left parenthesis (“(”), right parenthesis (“)”), underscore (_), plus sign (+), minus sign or hyphen or dash (-), equal sign (=), left brace ({), right brace (}), left bracket ([), right bracket (]), colon (:), semicolon (;), quote ("), apostrophe ('), less-than sign (<), greater-than sign (>), question mark (?), comma (,), period (.), forward slash (/), backslash (\), grave accent (`), tilde (~), and space.

US-ASCII includes one display character which is not supported in EBCDIC: the circumflex accent (^), which also represents an “up arrowhead”, or a “caret”. Conversely, EBCDIC contains display characters which are not found in US-ASCII. These are the cent sign (¢), the not sign (¬), and the broken vertical line (¦). Note that although ASCII code point '7C'X is shown as a broken vertical line on many keyboards and in some fonts, the standard describes it as an (unbroken) vertical line; therefore it is not equated with the EBCDIC broken vertical line ('6A'X), but rather with the EBCDIC vertical bar ('4F'X).

A similar situation exists with respect to control characters: some are in both character sets, while some are in one or the other. There are twice as many control characters in EBCDIC (code points 0-63) as in ASCII (code points 0-31). Few control characters are of interest to us in SAS programming, but one exception is the HT (horizontal tab) character. HT has code point 5 ('05'X) in EBCDIC, and 9 ('09'X) in ASCII. Text editors use the Tab key, which generates this character, to position the cursor at the next designated position (called a tab stop).
When doing so, the editor may either insert sufficient spaces to fill in the empty space, or it may leave the HT character as part of the file instead. For historical reasons, many text editors use a default setting of one tab stop every eight positions. There are two situations in which the HT character is of interest: when we are reading source code created by the SAS editor (or certain other text editors) with the “convert tabs to spaces” option turned off, and when we are reading spreadsheets that have been exported in “tab-delimited” format.

EBCDIC control characters (code points 0 to 63), the US alphabet (upper- and lowercase), and basic punctuation are portable among EBCDIC platforms. Other characters may vary from one code page to another. Similarly, only ASCII values from 0 to 127 are guaranteed portable across all ASCII platforms; the higher-valued code points may vary (according to the code page in use). Only values from 32 to 126 are printable.

II.A.3 Integer Data

Integers, of course, are the essence of computing, being used for everything from counters to memory addresses. Integers can be stored in either one of two representations: ones’ complement or two’s complement. While non-negative values look the same in both formats, negative numbers are represented differently. In addition to this consideration, integers may be stored in various lengths: 8 bits (one byte), 16 bits (referred to as “short” or “halfword” integers), and 32 bits (“long” or “fullword” integers). Newer 64-bit machines also support 64-bit integers.

Finally, in addition to considerations of length and representation (format), there are differences in how machines store integers. Some store data “forwords”, with the high-order bits (in the Most Significant Byte, or MSB) at a lower storage address, and the low-order bits at the higher storage address. Thus, the data is stored in memory in the same arrangement as it would appear in an arithmetic register for computation. This arrangement is also referred to as “big-endian”, and it is a typical convention for processors whose architecture supports accessing memory one word at a time. However, some machines were designed to access memory one byte at a time, and such designs can lead to a method of storing data referred to as “backwords”. In such an implementation, the byte at the lowest storage address contains the least significant portion of the integer, while increasingly higher storage addresses hold increasingly significant portions of the integer. This arrangement, where the Least Significant Byte (LSB) is stored first (at the lower address), is also referred to as “little-endian”.


This example shows how the value +517 would be stored in “forwords” (big-endian, or MSB first) and “backwords” (little-endian, or LSB first) sequences.

                   2^24     2^16     2^8      2^0
Big-Endian      .--------.--------.--------.--------.
or “Forwords”   |00000000|00000000|00000010|00000101|
                '--------'--------'--------'--------'
Address:            N       N+1      N+2      N+3

                   2^0      2^8      2^16     2^24
Little-Endian   .--------.--------.--------.--------.
or “Backwords”  |00000101|00000010|00000000|00000000|
                '--------'--------'--------'--------'
Address:            N       N+1      N+2      N+3

Be sure to read “Byte Ordering for Integer Binary Data on Big Endian and Little Endian Platforms” on page 75 of SAS® 9.1.3 Language Reference: Dictionary, Third Edition.

II.A.4 Decimal Data

In addition to binary representation of numbers, some hardware can also store and perform computations in a pseudo-decimal format, sometimes called packed decimal. Intel refers to this format as packed Binary Coded Decimal (BCD). Decimal computations can be essential to business applications, where pennies must be correct, even on multi-million dollar numbers, and rounding can raise the suspicion of auditors. In terms of machine resources, conversions between character representation and packed decimal are much more economical than conversions between character and integer formats.

In packed decimal format, one decimal digit is stored in four bits (0000 to 1001), and two decimal digits occupy one byte (one digit each in the high-order and low-order half of the byte). These half-bytes are sometimes called nibbles. While these conventions are common, implementations differ in the representation of the sign of the number, both in which value(s) are used for negative or positive, and in the placement of the sign within the stored value. There are also differences in the maximum number of digits which can be stored as a single number.

This example shows how the value +517 would be stored in decimal format by an Intel 386 (or successor), and by IBM System/360-class machines.

Value:             0    5    1    7
                .----+----.----+----.
Intel 386       |0000 0101|0001 0111|
                '----+----'----+----'
Address:             N        N+1

Value:             5    1    7    +
                .----+----.----+----.
IBM S/360       |0101 0001|0111 1100|
                '----+----'----+----'
Address:             N        N+1

Note: the Intel 386 implementation of decimal arithmetic is rudimentary. Multiple instructions are required to perform simple operations, and only four digits can be processed at a time. There are no signs, leaving the tracking of the correct sign as an exercise for the programmer. All implementations rely on the programmer to track the position of the decimal point for any non-integer values.

Note: IBM packed decimal recognizes hex C, A, E, and F as plus signs, and D and B as minus signs. When performing computations, the hardware generates a result with one of the preferred signs, C (+) or D (-).

Note: The IBM POWER architecture (a RISC architecture used in IBM’s System i, System p, and Power Blade servers) does not support a packed decimal data representation.

II.A.5 Floating-Point Data

Floating-point format is used to represent rational numbers. This format allows the representation of a wide range of values, from very small to very large, both positive and negative. In order to accomplish this, the value in question is encoded in scientific notation: $s \times b^{e}$, where $s$ is the significand (or coefficient or mantissa), $b$ is the base (or radix), and $e$ is an integer exponent of the base. The base used is implicit in the format, and is commonly 2 or 16, with some instances of 10 being available. The coefficient provides the significant digits for the value.

The value 258.25 would be written in scientific notation as $2.5825 \times 10^{+2}$. If our radix was base 10, this would mean that the coefficient is 2.5825, and the exponent is 2. These two values (2 and 2.5825) would be stored as the floating point number. For a base 16 representation, the number would be written as $1.024_{16} \times 16^{+2}$. In this case, the 2 and $1.024_{16}$ would be stored. Finally, a radix of 2 would result in $1.0000001001_{2} \times 2^{+8}$, with values of 8 and $.0000001001_{2}$ being stored. (Where’d the leading 1 go? In most binary floating point implementations, the high-order 1 is implicit under ordinary circumstances. This ingenious trick effectively extends the number of bits in the significand by 1 bit.)

Floating-point numbers are stored in several sizes; most implementations have up to three. In general, the smallest size is 32 bits, and is referred to as either short or single-precision. The intermediate size is 64 bits long, and is referred to as either long or double-precision. The length of the largest size varies, and can be anywhere from 80 to 128 bits. This size is referred to as extended. The IEEE[7] standards also provide for a half format, which is 16 bits long. The size of the storage used for the floating-point number determines the number of significant digits (the length of the mantissa). In some implementations (notably those based on the IEEE-754 standard), the amount of storage used also determines the maximum value for the exponent, thereby affecting the range of values as well.

There are three standard representations for floating point numbers in use today: the IEEE 754-1985 standard[8], which is a base 2 (binary) format; the IBM S/360 floating point format, which is a base 16 (hexadecimal) format; and the IEEE 754-2008 standard. There is also another format used by Cray on its SV1 machines[9], as well as most of its older models, and an old IEEE 854-1987 radix-independent standard, which never saw wide use. The IEEE 754-2008 standard has a binary format and two decimal formats[10]. The IBM POWER6 server architecture was the first hardware implementation of this standard[11]. The IEEE 754-2008 standard was subsequently implemented by the IBM System z9 and z10 hardware[12].

Other differences between various implementations have to do with the nature of rounding, and so-called guard digits, which are additional digits of precision present only during internal computations. The IEEE standard provides for special values, such as infinities, not-a-number (NaN) values, sub-normal values, and negative zero. The IBM hexadecimal format can encounter subnormal values, but will not generate them (an exception occurs instead).
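The difference between the base 16 and base 2 layouts can be inspected from within SAS. The sketch below is an illustration under stated assumptions rather than code from the original paper: it relies on the HEX16. format displaying the host's own floating-point bits, and on the S370FRB8. and IEEE8. formats producing the IBM and IEEE layouts regardless of host. The variable names are arbitrary.

```sas
/* Sketch: show how 258.25 is encoded natively and in the two        */
/* portable floating-point layouts that SAS formats can produce.     */
data _null_ ;
   value  = 258.25 ;
   native = put( value, hex16. ) ;                       /* host representation   */
   ibm    = put( put( value, s370frb8. ), $hex16. ) ;    /* IBM base 16 layout    */
   ieee   = put( put( value, ieee8. ),    $hex16. ) ;    /* IEEE base 2 layout    */
   putlog native= ibm= ieee= ;
run ;
```

On any one platform, the native string would be expected to match one of the other two, which is a simple way to confirm which floating-point scheme the host uses.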

II.A.5.i Summary of Fixed-Point Implementations

Property          Intel 386            IBM System z              IBM POWER6

BINARY INTEGER DATA TYPE
Format:           two’s complement     two’s complement          two’s complement
Sizes:            8, 16, 32, 64-bit    16, 32, 64-bit            32, 64-bit
Ordering:         little-endian        big-endian                big-endian
Special values:   None                 None                      None

PACKED DECIMAL DATA TYPE
Sizes:            2 to 18 digits       1 to 15 digits            Not Applicable
Format:           1 digit/nibble,      1 digit/nibble,
                  no sign              trailing nibble is sign
Ordering:         big-endian           big-endian
Special values:   None                 None
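The fixed-point layouts summarized above can also be examined from a SAS session. This is a hedged sketch, not part of the original paper: it assumes that the IB4. and PD4. formats write the host's native binary integer and packed decimal forms, while S370FIB4. and S370FPD4. write the IBM mainframe forms on every platform. Variable names are arbitrary.

```sas
/* Sketch: compare native and IBM mainframe fixed-point encodings of 517. */
data _null_ ;
   value      = 517 ;
   native_int = put( put( value, ib4. ),      $hex8. ) ;  /* host byte order         */
   ibm_int    = put( put( value, s370fib4. ), $hex8. ) ;  /* big-endian              */
   native_pd  = put( put( value, pd4. ),      $hex8. ) ;  /* host packed decimal     */
   ibm_pd     = put( put( value, s370fpd4. ), $hex8. ) ;  /* IBM packed decimal      */
   putlog native_int= ibm_int= native_pd= ibm_pd= ;
run ;
```

On a little-endian host the bytes of native_int should appear in the reverse order of ibm_int; on a mainframe the two should be identical. Writing data with the S370 family of formats (and reading it with the matching informats) is one way to keep a binary file layout independent of the platform that produced it.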

II.A.5.ii Summary of Floating-Point Data Type Implementations

Property            Intel 386           IBM System z        IBM System z        IBM POWER6

Format:             IEEE-754-1985       IBM base 16         IEEE-754-2008       IEEE-754-2008
                    (base 2)                                base 2              base 2
Sizes:              32 (short),         32 (short),         32 (short),         32 (short),
                    64 (long),          64 (long),          64 (long),          64 (long),
                    80-bit (extended)   128-bit (extended)  128-bit (extended)  128-bit (extended)
Ordering:           little-endian       big-endian          big-endian          big-endian

Significant digits
  Short:            7 decimal           6-7 decimal         7 decimal           7 decimal
                    24 binary           21-24 binary        24 binary           24 binary
  Long:             15 decimal          16 decimal          15 decimal          15 decimal
                    53 binary           53-56 binary        53 binary           53 binary
  Extended:         19 decimal          33 decimal          33 decimal          33 decimal
                    64 binary           109-112 binary      113 binary          113 binary

Value range (normalized)
  Short:            +1.18x10^-38 to     All precisions:     -3.37x10^-38 to     -3.37x10^-38 to
                    +3.40x10^+38        +8.6x10^-78 to      +3.37x10^+38        +3.37x10^+38
  Long:             +2.2x10^-308 to     +7.2x10^+75         -1.67x10^308 to     -1.67x10^308 to
                    +1.79x10^+308                           +1.67x10^308        +1.67x10^308
  Extended:         +3.4x10^-4932 to                        -1.2x10^4932 to     -1.2x10^4932 to
                    +8.4x10^4933                            +1.2x10^4932        +1.2x10^4932

Special values:     Yes                 No                  Yes                 Yes

As you can see, there are some noteworthy differences between the IBM base 16 and IEEE base 2 schemes. I have not included the IEEE-754-2008 Decimal Floating Point format because it is not used by SAS. The IBM base 16 design produces the same range for all precisions, whereas the range for the IEEE base 2 design varies according to the storage size. Interestingly, the IBM base 16 representation results in a more asymmetrical negative vs. positive exponent range than the IEEE base 2 design. The most interesting similarity between these standards is that the short and long formats provide roughly the same precision. The Intel 386 implementation is based on the 1985 version of the IEEE-754 standard, and has an extended precision of 80 bits[13], whereas the IBM IEEE base 2 implementation is based on the 2008 version of the IEEE-754 standard, has an extended precision of 128 bits, and provides precision comparable to the IBM base 16 format.

II.B SOFTWARE DIFFERENCES

Software differences between platforms derive from the features of the respective operating systems. These differences may include the manner in which data is stored and accessed, and the syntax and function of commands.

II.B.1 Commands

Operating systems may support one or more distinct sets of commands. Unix-based systems may provide one or more of the C shell, Korn shell (ksh), and the “Bourne again” shell (bash). IBM MVS-family operating systems have one language for running batch jobs (JCL) and another for interactive use (TSO Commands). Commands may not be consistent across operating systems from the same source. For example, the commands for IBM’s TSO are significantly different from those for IBM’s CMS, even though they are both interactive systems. There are even variations in commands between Linux and BSD Unix, which both have their origins in Bell Labs’ Unix. Another general trend is that new commands are added with each release of an OS. Current Windows users have commands at their disposal that were not available in previous generations of that OS.
In addition, existing commands often evolve from one iteration of an OS to the next, sprouting new features or behaviors under particular conditions. Naturally, Windows has its own set of commands (which trace their lineage back to PC-DOS).

II.B.2 Data Storage

The manner in which data is stored on various platforms can be quite different. In Windows and Unix, text files (such as a SAS report) are stored as strings of characters with special character(s) signifying the end of a line or file. Because these sequences of special characters have that meaning, they cannot be used as data within the file. This results in a different method of access for binary files than for text files. On the IBM mainframe legacy operating systems, collections of data, known as records, are stored according to either a pre-arranged length or along with metadata specifying the length of each record. The distinction between text lines and binary data is not a consideration in these systems, because any data values can be stored in a record without causing processing errors. The consequence of this situation is that blindly transferring data from an IBM mainframe platform to a Unix or PC platform may inadvertently introduce end-of-line or end-of-file markers that would result in a different interpretation of the data when it is read on the target platform.

II.B.3 Access Methods

Some operating systems support methods of storing and accessing data that others don’t. For example, IBM MVS-family operating systems provide an indexed file organization named VSAM which allows a program to access data via a character key. The DEC VMS OS also supports a native indexed file organization. There is no direct analog for this file organization in Windows or Unix. Indexed files, while supported by SAS Base Language, present special portability issues, which must be addressed on a case-by-case basis.

II.B.4 Database Access

Access to databases from SAS programs is provided through ODBC[14]. This works consistently across platforms, and does not usually present portability issues.

III. BASE LANGUAGE CONSIDERATIONS

We have seen that there may be significant variations between platforms. How can we possibly code SAS programs that are immune to these differences? Our strategy is three-pronged:
• let SAS shield us from many platform differences
• write our programs in a manner that avoids code which is prone to being affected by platform differences
• when necessary, isolate platform-specific code from the portable portion of the program

III.A INSTRUCTION SET CONSIDERATIONS

It was previously pointed out that different processors use machine instruction sets that can be dramatically different. We don’t need to concern ourselves with this, as the SAS System for a given platform produces object code appropriate to that platform.

III.B INTERNAL MEMORY CONSIDERATIONS

Depending on the platform in question, a SAS program might be able to access and possibly alter instructions and/or data in memory through use of the PEEK, PEEKC, PEEKLONG, and PEEKCLONG functions and/or the POKE and POKELONG routines.


Use of these functions will almost invariably lead to a program that is not portable, because the typical use of them is to access hardware- or OS-specific control structures. Such control structures will vary from one platform to another. Even SAS cautions against their use[15]:

   CAUTION: The CALL POKE routine is intended only for experienced programmers in specific cases. If you plan to use this routine, use extreme care both in your programming and in your typing. Writing directly into memory can cause devastating problems. This routine bypasses the normal safeguards that prevent you from destroying a vital element in your SAS session or in another piece of software that is active at the time.

Naturally, this warning applies to all the routines in the POKE family. Even if memory containing raw data is accessed, there will be issues, such as the representation of the data on the given platform, and storage order (big-endian vs. little-endian).

III.C CHARACTER SET CONSIDERATIONS

III.C.1 Pitfalls

There are many pitfalls in dealing with ASCII and EBCDIC character sets[16].

III.C.1.i Pitfall: Embedded Characters in Hex, Octal, or Decimal Notation

In order for your code to be portable, you do not want to use any non-graphic representation of characters, such as hexadecimal or octal literals, or conversions from decimal values. When characters are coded in such a manner, they are not automatically converted to the target platform’s native character set when they are transferred from one platform to another. The following are all ways in which the ASCII character “a” can be coded in a SAS program:
• 'a'
• "a"
• '61'X
• BYTE(97)

However, only the first two are portable. If a program containing these were transferred to an EBCDIC platform, only the first two would be translated; the last two would remain essentially ASCII, even in the EBCDIC environment.

There are situations where you cannot use a character string to represent a character you need. One of these is control codes, which have no corresponding glyph. Another would be when you want to represent a character that is not available on the platform you are using to write your code. In order for your application to be portable under these circumstances, you will have to create code that can choose the proper code point for the current platform.

Suppose we are reading a spreadsheet stored in tab-delimited form. The best approach would be to use the INFILE statement:

   INFILE SPRDSHT DSD DLM=horizontal_tab ;

Note that the input file has a fileref of SPRDSHT, and that the DSD option has been specified to cause SAS to treat two consecutive delimiters as a missing value and remove quotation marks from character values. The issue now becomes how to code the horizontal_tab. For an ASCII platform, we could code:

   INFILE SPRDSHT DSD DLM='09'X ;

But this will fail as soon as the code is executed on a platform that uses EBCDIC, where HT is '05'X. Conversely, if the code was written explicitly for EBCDIC, it wouldn’t work on an ASCII platform. The solution is to have the application detect the character set in use, and set the proper value for the horizontal tab. The solution could look like this:

   ATTRIB HT LENGTH= $1 ;
   IF RANK( '3' ) = 51           /* 51 in ASCII, 243 in EBCDIC */
      THEN /*ASCII */ HT = '09'X ;
      ELSE /*EBCDIC*/ HT = '05'X ;
   INFILE SPRDSHT DSD DLM=HT ;

This example assumes that you have no choice regarding the format used to export the spreadsheet. If you can specify the manner in which the spreadsheet is exported, the “comma-delimited” form will avoid these platform-specific complications.

There is one additional trap. Perl regular expressions (PRX) provide a means of coding a character in octal notation. This is done by a backslash followed by up to three octal digits. Examples include "\5" (EBCDIC HT, '05'X), "\11" (ASCII HT, '09'X), and "\132" (ASCII “Z”, EBCDIC “!”, '5A'X). You must be on the lookout for these hidden time bombs!


Generally, you want to avoid those functions which convert between numeric and character forms. These are RANK() and BYTE(). In addition, such conversions can occur by using INPUT(), INPUTC(), INPUTN(), PUT(), PUTN(), or PUTC() functions with INFORMATs and OUTFORMATs such as $HEX, $OCTAL, HEX, and OCTAL.
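For completeness, the tab-detection fragment shown earlier can be placed in a full DATA step. The following is a minimal sketch under stated assumptions: the physical file name, the output data set name, and the two input variables (name and amount) are hypothetical and stand in for whatever the exported spreadsheet actually contains. RANK() appears here only to detect the character set, which is the one use of it this paper treats as acceptable.

```sas
/* Sketch: read a tab-delimited file with the delimiter chosen at run time. */
filename sprdsht 'sprdsht.txt' ;        /* hypothetical physical file name  */

data work.sheet ;                       /* hypothetical output data set     */
   length ht $ 1 name $ 20 ;
   if rank( '3' ) = 51                  /* 51 in ASCII, 243 in EBCDIC       */
      then ht = '09'x ;                 /* ASCII horizontal tab             */
      else ht = '05'x ;                 /* EBCDIC horizontal tab            */
   infile sprdsht dsd dlm=ht truncover ;
   input name $ amount ;                /* hypothetical columns             */
   drop ht ;
run ;
```

Because DLM= names a variable rather than a literal, the value is picked up when the INFILE statement executes, so the assignment to HT must come before it in the step.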

III.C.1.ii Pitfall: Assuming That a Given Set of Characters is Contiguous

Suppose that you need to generate a SAS data set with one observation for each of the letters A through Z. It is tempting to start with the internal representation for “A”, and add 1 to it to generate each of the subsequent letters. The code to do so would look like this:

   DATA Letters_A_Z ;
      ATTRIB letter LENGTH= $1 ;
      RETAIN ii ;
      DO ii = RANK( "A" ) TO RANK( "Z" ) ;
         letter = BYTE( ii ) ;
         OUTPUT ;
      END ;
      STOP ;
   RUN ;
   PROC PRINT ;
   RUN ;

This would work on an ASCII platform, but not on an EBCDIC platform. Why? Because the assumption that these letters have contiguous code points is false for EBCDIC. In EBCDIC, the uppercase letters are split into three separate ranges (A-I, J-R, and S-Z), as are the lowercase letters. In fact, in EBCDIC, the tilde (~) falls between lowercase “r” and lowercase “s”! This logic can be coded in a platform-independent manner by not relying on the specific characteristics of the codes assigned to these letters.

   DATA Letters_A_Z ;
      ATTRIB letter   LENGTH= $1 ;
      ATTRIB capitals LENGTH= $26 ;
      ATTRIB ii       LENGTH= 8 ;
      capitals = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" ;
      DO ii = 1 TO LENGTH( capitals ) ;
         letter = SUBSTR( capitals, ii, 1 ) ;
         OUTPUT ;
      END ;
      STOP ;
   RUN ;
   PROC PRINT ;
   RUN ;

At first blush, it might seem that we could use COLLATE( RANK( "A" ), RANK( "Z" ) ) as a more convenient way to generate "ABCDE...XYZ". However, this will not work, because COLLATE generates all the characters between the first and last code points given. On an ASCII platform, “A” is code point 65, and “Z” is code point 90, giving us 90 – 65 + 1 = 26 characters, just as we expect. However, on an EBCDIC platform, “A” is code point 193, and “Z” is code point 233, giving us 233 – 193 + 1 = 41 characters, which is just the same result we would have gotten by incrementing the internal values ourselves, as in the non-portable code example. You should avoid performing any arithmetic computations on character values.
Consequently, you should avoid functions that perform such computations, such as COLLATE(), as well as functions that enable such computations, including RANK() and BYTE(). Instead, use character functions, such as UPCASE(), LOWCASE(), and TRANSLATE(), to transform characters.
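As an illustration of working with character classes without touching code points, the following sketch classifies a character by looking it up in explicitly spelled-out character lists rather than by testing a range such as "A" <= c <= "Z". The variable names (upper_set, lower_set, digit_set, kind) are my own, chosen only for this example.

```sas
/* Minimal sketch: classify characters by membership in explicit lists, */
/* so the result does not depend on ASCII or EBCDIC code point order.   */
DATA _NULL_ ;
   LENGTH c $1 kind $9 ;
   upper_set = "ABCDEFGHIJKLMNOPQRSTUVWXYZ" ;
   lower_set = "abcdefghijklmnopqrstuvwxyz" ;
   digit_set = "0123456789" ;
   DO c = "Q", "q", "7", "~" ;
      IF      INDEX( upper_set, c ) > 0 THEN kind = "uppercase" ;
      ELSE IF INDEX( lower_set, c ) > 0 THEN kind = "lowercase" ;
      ELSE IF INDEX( digit_set, c ) > 0 THEN kind = "digit" ;
      ELSE                                   kind = "other" ;
      PUTLOG c= kind= ;
   END ;
RUN ;
```

Because membership is decided by INDEX() against a literal list, no arithmetic or ordering comparison on character values is involved.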

III.C.1.iii Pitfalls of Character Comparison

An important difference between ASCII and EBCDIC is their collating sequence. Collating sequence refers to the numeric ordering of the characters within the character set. This ordering is important when we perform comparisons other than equal or not equal. Although control characters precede display characters and space is the first (lowest) display character in both character sets, the rest differs.

The characters in EBCDIC have code points that start with the control characters, followed by the space character (usually referred to as “blank” by mainframe programmers), and place most special characters before lowercase letters, followed by uppercase letters, and finally numeric digits. Less common special characters, which were added after the initial design, can be found between the lower- and upper-case letters, and even in between some letters.

Characters in ASCII start with the control characters, followed by the space character, followed by special characters, followed by digits, then some more special characters, then uppercase letters, then more special characters, and lowercase letters followed by (you guessed it!) more special characters. This means that a statement like:


    IF "A" > "a" THEN PUTLOG "A>a" ; ELSE PUTLOG "A<=a" ;

will give different results when run on an ASCII platform than on an EBCDIC platform. The only comparisons that yield the same results for both character sets involve only one of these subsets of characters:

• all uppercase letters
• all lowercase letters
• all digits

Does this matter? Maybe! The first principle here is to avoid any assumptions about the ordering of characters between these groups.

Due to the different ordering of characters within the two character sets, a file that was sorted on one platform might not be considered sorted when transferred to another platform. For this reason, it is best to assume that a file transferred from a foreign platform, or of unknown origin, is not sorted.

A match-merge operation is implemented by using the MERGE statement in conjunction with a BY statement. This operation is dependent on the ordering of two or more SAS data sets. Comparing the BY-variables from two or more data sets is subject to the same pitfalls as other comparisons. For a match-merge to work properly, all of the character BY-variables in the input data sets must be of the same character set, and sorted in the order (ascending or descending) specified in the BY statement.

III.C.2 Character Functions

In addition to the character functions listed above, there are several functions whose default values differ according to the platform they run on.

Function or Subroutine            Considerations
--------------------------------  ----------------------------------------------
ANYPUNCT(string <,start>)         The results of the ANYPUNCT and NOTPUNCT
NOTPUNCT(string <,start>)         functions depend directly on the translation
                                  table that is in effect (see “TRANTAB= System
                                  Option” in the SAS Language Reference:
                                  Dictionary) and indirectly on the ENCODING
                                  and LOCALE system options.

BYTE(n)                           NOT portable

COLLATE(start-position            NOT portable
  [,{end-position|, length}])

RANK(c)                           NOT portable: Returns the position of a
                                  character in the collating sequence for the
                                  native character set

CALL SCAN(string, n, position,    The default delimiters supplied for these
  length<,delimiters>);           functions differ between ASCII and EBCDIC
CALL SCANQ(string, n, position,   platforms. To ensure consistent operation, be
  length<, delimiters>);          sure to explicitly specify these parameters.
SCAN(string, n<, delimiter(s)>)
SCANQ(string, n<, delimiter(s)>)

TRANSLATE(source, to-1, from-1,   Portable, but see notes in Appendix A
  <... to-n, from-n>)
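The advice in the table about SCAN delimiters can be sketched as follows. This is only an illustration: the sample record and the choice of comma, semicolon, and blank as delimiters are assumptions made for the example.

```sas
/* Minimal sketch: always pass the delimiter list to SCAN explicitly, */
/* so the platform-specific default delimiters never come into play.  */
DATA _NULL_ ;
   LENGTH record $40 field $12 ;
   record = "alpha,beta;gamma delta" ;
   delims = ",; " ;                 /* comma, semicolon, and blank */
   DO ii = 1 TO 4 ;
      field = SCAN( record, ii, delims ) ;
      PUTLOG ii= field= ;
   END ;
RUN ;
```

Keeping the delimiter list in one variable (or one macro variable) also documents, in a single place, which characters the program treats as separators.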

III.C.3 Characters in Your SAS Program

Aside from characters contained in character constants (literals), there are considerations regarding which characters are used in the code itself. SAS programs are generally composed of characters which exist in both the ASCII and EBCDIC character sets. The exception to this observation is the character used as the NOT symbol. SAS permits several characters to be used as the NOT symbol. These are: the EBCDIC NOT symbol (¬), the ASCII caret, hat, or circumflex symbol (^), and the tilde (~). The EBCDIC NOT symbol (¬) is not recognized by the SAS compiler on ASCII platforms. The ASCII circumflex symbol (^) is not recognized by the SAS compiler on EBCDIC platforms. Fortunately, the tilde (~) is recognized on all platforms. I strongly recommend that you use tilde as your NOT symbol.

Some programming languages allow <> or >< as the not-equal operator. Don’t make the mistake of doing this in SAS, as these operators are interpreted as MAX and MIN, respectively. An alternative to using tilde as the NOT symbol would be to use mnemonics wherever a NOT operation was required: NE for ~=, and NOT in place of standalone ~ for Boolean operations.

There are no difficulties with the vertical bar (|) character. Although it appears on some keyboards and fonts as a broken vertical line (¦), it will transfer to EBCDIC systems as the vertical bar (code point '4F'X), and its interpretation

as the OR operator is universal. This is true despite the fact that EBCDIC includes a broken vertical line (¦) character at code point '6A'X.

The final characters that we must deal with are the left bracket ([) and the right bracket (]). A historical problem exists with these characters in EBCDIC. No code points were assigned to these characters in the original EBCDIC specification. Once there were output devices that could print these characters, code points were assigned to them for this limited purpose ('AD'X and 'BD'X). Those code points are part of the current definition for EBCDIC Code Page 1047. However, the code points assigned as part of EBCDIC Code Page 037 are different ('BA'X and 'BB'X). This second set of code points may have originated with a different output device (display tubes). The net result is that transfer and display of these characters is problematic. What is the responsible SAS programmer to do in the face of this? It’s simple: don’t use brackets. They are not a necessity in SAS (as they are, for example, in C, C++, Java, et al). Instead, I recommend that you use braces ( { and } ) or simply parentheses.

III.D NUMERIC CONSIDERATIONS

III.D.1 Binary Numeric Data

We have seen a variety of numeric data types (binary integer, decimal fixed-point, and hexadecimal/binary/decimal floating-point), in a range of sizes (8 to 128 bits). We must honor these representations when reading and writing binary data on the various platforms. If the requirement to deal with these data types is imposed by external sources, you must code properly for them, and there are suitable SAS INFORMATs and OUTFORMATs for this purpose. However, if you have the flexibility to choose, design your SAS program so as to avoid all binary formats in files that might be passed from one platform to another.
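One way to follow the advice on avoiding binary formats is to write numeric values as ordinary text, which can be translated along with the rest of the file when it moves between platforms. The sketch below assumes a fileref named xfer and a file name of my own invention; the FILENAME statement itself is platform-specific and would normally be supplied outside the portable code.

```sas
/* Minimal sketch: exchange numbers as text rather than in a binary form. */
FILENAME xfer "transfer.txt" ;      /* hypothetical file; platform-specific line */

DATA _NULL_ ;
   FILE xfer ;
   id     = 1001 ;
   amount = 1234.5678 ;
   PUT id 8. +1 amount 16.4 ;       /* character digits, no binary fields */
RUN ;

DATA readback ;
   INFILE xfer ;
   INPUT id 8. +1 amount 16. ;      /* decimal point is taken from the data */
RUN ;
```

The width and number of decimal places (16.4 here) are illustrative; choose them to hold the full range and precision your data requires.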
III.D.2 Computational Accuracy

Clearly, we want our programs to produce the same results regardless of the platform they are run under. Yet there is a wide range of precisions available. SAS performs its computations using floating-point representation. Floating-point precision can be anywhere from 6 to 33 digits. If SAS chooses to use different precisions on different platforms, the results from a run on one platform could be different from another. Single-precision computations on any one platform could produce different results than double-precision calculations on a different, or even the same, platform. The other possibility we need to consider is the difference between IEEE and IBM hexadecimal floating-point hardware: even at the same nominal precision, could the results differ?

You can specify the amount of storage to be used for a numeric variable, and thereby its precision, using the LENGTH attribute. This can be done either by means of the ATTRIB statement, or the LENGTH statement. For example:

    ATTRIB short LENGTH=4 ;
    LENGTH long 8 ;

SAS gives the possible lengths[17]:

    For numeric variables, 2 to 8 or 3 to 8, depending on your operating environment.

We see from this that SAS does not support any form of extended precision. Here is SAS’ statement on the data format they use on IBM mainframe hardware[18]:

    To store numbers of large magnitude and to perform computations that require many digits of precision to the right of the decimal point, SAS stores all numeric values in 8-byte floating-point (real binary) representation.

On platforms which employ the IEEE standard for floating-point numbers, such as Windows, SAS states[19]:

    The default length of numeric variables in SAS data sets is 8 bytes. … In SAS under Windows, the Windows data type of numeric values that have a length of 8 is LONG REAL. The precision of floating-point values is always accurate to [at least] 15 digits.
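To see the effect of a shortened LENGTH in practice, a small experiment along the following lines may help. It is only a sketch: the data set name lens is my own, and the size of the difference it reports depends on the platform's floating-point representation, so no particular value should be expected.

```sas
/* Minimal sketch: a numeric variable stored with LENGTH 4 loses precision */
/* when it is written to a data set; the same value at LENGTH 8 does not.  */
DATA lens ;
   LENGTH short 4 long 8 ;
   short = 1 / 3 ;
   long  = 1 / 3 ;
RUN ;

DATA _NULL_ ;
   SET lens ;                       /* truncation took place when lens was written */
   diff = long - short ;
   PUTLOG short= BEST32. long= BEST32. diff= E12. ;
RUN ;
```

A length of 4 was chosen because it is permitted on every platform named above; the truncation occurs on output to the data set, not while the value is held in the DATA step.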
All of this brings us to the best choice: let SAS use the default length of 8 for numeric values. This gives us at least 15 digits of precision on all platforms. It is also the least amount of effort, as we do not have to explicitly declare a non-standard length for all our numeric variables. Ordinarily, we will not have to do anything to assure portability of our programs, but when the LENGTH of a numeric variable is given explicitly, we must remember to code it as 8. Changes in length for input and output can be handled by the length specified on the INFORMAT and OUTFORMAT used for the variable.

Although this technique ensures the most precision available, and similar precision on all platforms, there is still the issue of range. In theory, a value greater than roughly 10^75 could cause a problem on IBM mainframes. I personally have not encountered this situation, and I believe it to be highly improbable. However, you know your data better than I, so you will know when this is a possibility. Values less than about 10^-78 in magnitude will become zero on IBM mainframe systems. This may, or may not, affect your results. Once again, whether or not this is a realistic possibility, and

whether or not it will have any impact, can only be determined by you, based on the data and the processing algorithm.

The bottom line is that, due to different floating-point implementations on different platforms, there may be differences in computed values. This is nearly impossible between platforms that both use the same floating-point representation (for example IEEE floating-point on a PC and on a server), and possible but unlikely between machines that have different floating-point representations (for example IBM mainframe hexadecimal floating-point and IEEE floating-point on either a PC or a server). Such differences should be small.

III.E INFORMAT AND OUTFORMAT CONSIDERATIONS

INFORMATs and OUTFORMATs can be bane or boon for portability. When you are reading an input file from, or are creating output for, a specific platform, you will have to use INFORMATs and OUTFORMATs appropriate to that platform. Some key INFORMATs and OUTFORMATs used for such conversions are:

• $ASCIIw.
    o On EBCDIC systems, $ASCIIw. converts EBCDIC character data to ASCII.
    o On all other systems, $ASCIIw. behaves like the $CHARw. format.
• $EBCDICw.
    o On ASCII systems, $EBCDICw. converts ASCII character data to EBCDIC.
    o On all other systems, $EBCDICw. behaves like the $CHARw. format.
• IBRw.d, PIBRw.d – converts SAS numbers to/from DEC/Intel (“backwords”) Integer Binary form
• IEEEw.d – converts SAS numbers to/from IEEE floating-point form
• S370FIBw.d, S370FIBUw.d, S370FPIBw.d – converts SAS numbers to/from IBM System/370 Integer (Fixed) Binary form
• S370FPDw.d, S370FPDUw.d – converts SAS numbers to/from IBM System/370 Packed Decimal form
• S370FRBw.d – converts SAS numbers to/from IBM System/370 Floating Point form
• S370FZDw.d, S370FZDLw.d, S370FZDTw.d, S370FZDUw.d – converts SAS numbers to/from IBM System/370 Zoned Decimal format

Some formats pose a hazard, because of their differing behavior on different platforms. These formats produce output that is in a form specific to the platform the SAS program is being run on. Such representations are referred to as native. As long as the data is written and read on the same platform, these formats can be used safely. In addition, they may be more efficient than formats which write data in a non-native representation.

• FLOAT4. – converts SAS numbers to/from the native single-precision, floating-point value
• HEXw. – converts SAS numbers to/from the hexadecimal representation for their native integer or floating-point binary representation
• $HEXw. – converts SAS characters to/from the hexadecimal representation for their native representation
• IBw.d, PIBw.d – converts SAS numbers to/from native Integer Binary form
• OCTALw. – converts SAS numbers to/from the octal representation of their native integer representation
• $OCTALw. – converts SAS characters to/from the octal representation of their native representation
• PDw.d – converts SAS numbers to/from native Packed Decimal format
• RBw.d – converts SAS numbers to/from native Floating Point (Real Binary)
• ZDw.d – converts SAS numbers to/from native Zoned Decimal format

Consult the SAS Companion for a given platform to learn about its native formats.
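As a sketch of how the platform-specific (non-native) INFORMATs listed above might be used, the following step reads a fixed-length record that was created on an IBM mainframe. The record layout, the variable names, and the fileref mfdata are all hypothetical, and the example assumes the file was transferred in binary mode, without any character translation along the way.

```sas
/* Minimal sketch: read mainframe-format fields the same way on any platform. */
/* Assumed layout: cols 1-10 EBCDIC text, 11-15 packed decimal, 16-19 binary. */
DATA payroll ;
   INFILE mfdata RECFM=F LRECL=19 ;     /* fileref assigned outside this code */
   INPUT @1  name   $EBCDIC10.          /* EBCDIC characters                  */
         @11 salary S370FPD5.2          /* System/370 packed decimal          */
         @16 count  S370FIB4. ;         /* System/370 integer binary          */
RUN ;
```

Because each INFORMAT names the source representation explicitly, the same INPUT statement is intended to give the same values whether the step runs on the mainframe itself or on an ASCII platform.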
III.F MACRO LANGUAGE CONSIDERATIONS

Like base code, macro language is also susceptible to the following considerations:

• comparisons affected by the character set’s collating sequence
• the default delimiters supplied for %SCAN and %QSCAN differ between ASCII and EBCDIC platforms. To ensure consistent operation, be sure to explicitly specify these parameters.

For more detailed information, see “Macro Language Elements with System Dependencies”, on page 144 of SAS 9.1 Macro Language Reference. Refer to Appendix A for additional automatic macro variables (beginning with “&”) and statements and autocall macros (both of which begin with “%”).

III.G PROC CONSIDERATIONS

Comparison is at the heart of the SORT and MERGE operations. Consequently, there are portability considerations for PROC SORT and match-merging operations. Sorting or merging on numeric values alone should not impact portability. However, sorting based on character strings, unless those keys solely contain unsigned fixed-point, decimal-aligned numbers, will produce varying results from one platform to another. This difference in ordering does not impact the portability of your program as long as the data is maintained solely within one platform. However, if data is transferred from one platform to another, it may no longer be considered sorted on the target platform. For data which will be moved from one platform to another, there are two possible

approaches: (a) re-sort the data after it has been transferred to the target platform, or (b) pre-sort the data on the sending platform. You can specify the ASCII or EBCDIC option of PROC SORT (equivalently, SORTSEQ=ASCII or SORTSEQ=EBCDIC) to force sorting by the collating sequence of a particular character set, regardless of the sorting platform’s native character set. WARNING: do not use SORTSEQ= if you have already converted the data to a particular character set by use of the $ASCII. or $EBCDIC. OUTFORMAT.

Other PROCs with platform dependencies include CIMPORT, CPORT, IMPORT, EXPORT, CONVERT, SOURCE, CALENDAR (when used with ODS), CATALOG, CHART (when used with ODS), CONTENTS, COPY (when used with the XPORT engine), DATASETS, FORMAT, MIGRATE, OPTIONS, PLOT (when used with ODS), PRINTTO, and REPORT. Often, it is the uncommon features of these PROCs that present portability issues, but for any PROC that you will use on more than one platform, you should read the appropriate section of the Base SAS® 9.1.3 Procedures Guide. In addition, you should read the platform companion documentation for PROCs CATALOG, DATASETS, FONTREG (when used under z/OS), FORMAT, OPTIONS, PRINTTO, and REPORT.

I strongly recommend that you avoid PROC PMENU. If you cannot, research it thoroughly, and check its operation carefully on each platform.

Note that carriage control creates portability issues, no matter which PROC or statement it is used with. If you are dealing with carriage control, be sure to research any of the following that you will be using: PROC PRINTTO, the FSLIST command, the FILE, INFILE, FILENAME, INPUT, & PUT statements, the FAPPEND, FOPEN, FWRITE, & MOPEN functions, and the SKIP= system option. Check both the base language and platform companion manuals.

All PROCs which perform computations, such as CORR, MEANS, SUMMARY, TABULATE, UNIVARIATE, etc., are subject to floating-point considerations, such as range and accuracy (covered above).
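Returning to the sorting approaches described earlier in this section, approach (a) might be sketched as follows. The two small data sets are hypothetical stand-ins for data received from another platform; the point is only that both inputs are re-sorted on the platform where the match-merge runs.

```sas
/* Hypothetical stand-ins for two data sets received from another platform */
DATA customers ;
   LENGTH cust_id $4 name $10 ;
   INPUT cust_id $ name $ ;
   DATALINES ;
b002 Baker
A001 Adams
1003 Clark
;
RUN ;

DATA orders ;
   LENGTH cust_id $4 ;
   INPUT cust_id $ amount ;
   DATALINES ;
A001 250
1003 75
b002 120
;
RUN ;

/* Approach (a): re-sort in the native sequence of the platform running this code */
PROC SORT DATA=customers ; BY cust_id ; RUN ;
PROC SORT DATA=orders    ; BY cust_id ; RUN ;

DATA combined ;
   MERGE customers orders ;
   BY cust_id ;
RUN ;
```

For approach (b), the corresponding PROC SORT steps would instead run on the sending platform, with the ASCII or EBCDIC option naming the target platform's collating sequence. The key values above deliberately mix digits, uppercase, and lowercase, since those are the cases where the two sequences disagree.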
For a list of PROCs which will only work on particular platforms, see Appendix 2, “Operating Environment-Specific Procedures” in Base SAS® 9.1.3 Procedures Guide.

Many PROCs deal with printing. Printing is implemented differently on different platforms. For information related to printing, consult SAS Language Reference: Concepts. Additional information may be available in the SAS companion documentation for your operating environment.

II.H SYSTEM, STATEMENT, AND PROC OPTIONS CONSIDERATIONS

There is a hidden trap that can catch you unawares as you move your program from one platform to another: System Options. The reason for this is that not only can default settings vary from one platform to another, but when SAS was installed on each platform, it may have been customized with different system options by the administrator who performed the installation! To be prepared to deal with problems that arise that are linked to system options, place a PROC OPTIONS step at the start of your portable programs. Something as simple as a difference in the L= option can produce surprising (and unpleasant) compilation errors; run-time variations are even more troublesome. The best response to this hazard is to include an OPTIONS statement with a complete set of explicit options for every setting relevant to your program.

II.H.1 Character Encoding Option Considerations

The term encoding refers to the process of assigning code points from a given code page to represent a text string. For example, text encoded in EBCDIC will be represented internally differently than the same text encoded in Unicode. Encoding is controlled by the ENCODING= data set option, the ENCODING= option of the FILE, FILENAME, and INFILE statements, and the INENCODING= and OUTENCODING= options of the LIBNAME statement. The $UCS* family of FORMATs and INFORMATs is affected by encoding considerations.
For the necessary understanding of encoding options, see “The ENCODING data set option” and “Encoding Values in SAS Language Elements” in SAS National Language Support (NLS): User’s Guide.

II.H.2 Transcoding Option Considerations

The process of transforming text from one character encoding to another is called transcoding. SAS transcoding is controlled by the TRANTAB= system option. This option affects the operation of the character searching functions ANYALNUM, ANYALPHA, ANYGRAPH, ANYLOWER, ANYPRINT, ANYPUNCT, NOTALNUM, NOTALPHA, NOTFIRST, NOTGRAPH, NOTLOWER, NOTPRINT, NOTPUNCT, NOTSPACE, and NOTUPPER, as well as the TRANTAB and URLDECODE functions. For details, see “The TRANTAB= system option” in SAS National Language Support (NLS): User’s Guide.

II.H.3 Locale Option Considerations

The locale of your platform determines things such as the character that is used to designate monetary units (such as the dollar sign, pounds sterling sign, euro sign, etc.) as well as date format. Formats affected by your locale include NLDATEw., NLDATEMNw., NLDATEWw., NLDATEWNw., NLDATMw., NLDATMAPw., NLDATMTMw., NLTIMAPw., NLTIMEw., NLMNYw.d, NLMNYIw.d, NLNUMw.d, NLNUMIw.d, NLPCTw.d, and NLPCTIw.d. See “LOCALE System Option: OpenVMS, UNIX, Windows, and z/OS” in SAS National Language Support (NLS): User’s Guide.
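The recommendation at the start of this section, to set options explicitly and to record the options in effect, might look like the sketch below at the top of a portable program. The particular options and values shown are illustrative assumptions; the list you need depends on which settings your program actually relies on.

```sas
/* Minimal sketch: state the options the program depends on, then log them all. */
OPTIONS LINESIZE=132 PAGESIZE=60 MISSING="." NOCENTER ;

PROC OPTIONS ;      /* writes the current option settings to the SAS log */
RUN ;
```

With the settings captured in the log on every platform, a difference in behavior between two runs can be checked against a difference in options before anything else is suspected.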


IV. OPERATING SYSTEM CONSIDERATIONS

IV.A EXTERNAL EXECUTABLES CONSIDERATIONS

IV.A.1 OS Command Considerations

Through use of the system() function, SAS programs can issue a command to the Operating System. OS commands are inherently dependent on the OS for their syntax and function, and, except possibly for some trivial cases, not portable. In addition to the system() function, OS commands may be issued by using the X (eXecute) statement, and the %SYSEXEC macro statement. I strongly recommend against use of these facilities.

IV.A.2 User-Written Module Considerations

The use of the CALL MODULE subroutines can be portable, but the user must port these routines to each target platform, and possibly tailor the tables which describe the interface between SAS and the user module as well. This facility is also supported by the %SYSCALL macro statement.

IV.B OTHER PLATFORM-SPECIFIC FEATURES

Under some platforms, it is possible for SAS to access hardware features that may not exist on other platforms. An example of this can be found in the SAS Companion for Windows, where an illustration is given of how to read data from a COM: (RS-232 serial) port[20]. Any code that accesses hardware features which are not available on all target platforms will not qualify as portable.

Likewise, exploiting platform-specific software features can just as readily undermine portability. Accessing or altering hardware or OS internal data via PEEK and POKE, issuing commands via the system() function, executing routines in DLLs (or their analog on non-Windows systems), and interacting with objects using OLE or DDE (in Windows), or with ISPF (on MVS TSO), must all be avoided. An example of logic that exploits MVS data structures is given by SAS for “Listing ASCB Bytes” and “Creating a DATA Step View” of the TIOT (Task Input/Output Table), both on the z/OS platform[21].

Another way to interact with the OS is the SYSGET() function.
This function operates somewhat differently on each platform, and may undermine the portability of your program. SAS provides a number of ways in which to obtain information about the computing environment that are independent of the platform. The most useful of these are the Portable Automatic Macro Variables. These macro variables will contain analogous information for each given platform. They are[22]:

Portable Automatic
Macro Variable Name   Contents
--------------------  ----------------------------------------------------------
&SYSDEVIC             The name of the current graphics device on DEVICE=

&SYSENV               The mode of execution (“FORE” or “BACK”)

&SYSJOBID             The name of the currently executing batch job. On *nix
                      platforms, this is the PID (Process ID); on MVS platforms,
                      this is the job name.

&SYSRC                The last return code generated by your host environment
                      (in response to a SYSTEM() invocation, the X statement, or
                      the %SYSEXEC, %TSO or %CMS macro statements).

&SYSSCP               System Control Program (Operating System). Possible
                      values[12] include:
                        • VMS for VAX VMS
                        • OS2 for IBM OS/2
                        • WIN for Microsoft Windows
                        • CMS for IBM VM/CMS

&SYSSCPL              System Control Program (Operating System) detail. Possible
                      values[12] include:
                        • WIN_32S for 32-bit Windows (generic)
                        • WIN_95 for Windows 95
                        • WIN_NT for Windows NT

&SYSPARM              Retrieve a character string that was passed to SAS by the
                      SYSPARM= system option

There are other automatic macro variables whose value is determined completely internally, and are safe to use on all platforms. These are listed in Appendix A.

IV.C ISOLATING SYSTEM DEPENDENCIES

Inevitably, there are situations where platform-dependent statements are a necessity. The obvious case is the identification of input and output files. The need to do this varies from one platform to another. The IBM mainframe platforms, MVS, CMS, and VSE[23], and their various incarnations (currently z/OS, z/VM, and z/VSE) use system commands external to the SAS program to associate files with DDNAMEs (Data Definition Names). Due to the fact that SAS originated in these environments, a DDNAME automatically becomes a SAS fileref.

The beauty of IBM’s method of divorcing the identification of the files to be used for input and output to a program from the program itself, is that the program can be run multiple times with multiple inputs and outputs without the program itself ever having to be modified. A program can be run with magnetic tape input and disk output, or vice

versa. Output can go to a printer one time, or to a file for subsequent FTP the next time the program is run. Once again, all this is done without any need to change the program.

On Windows and Unix platforms, SAS programs identify external input and output files by using the FILENAME statement. The names of the files to be read and written are coded explicitly in the program. This means that whenever an input or output file name changes, the program code has to be changed. This will cause both portability problems and maintenance complications.
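One way to keep this file-identification dependency in a single place is a small macro that consults &SYSSCP, one of the portable automatic macro variables described earlier in this section. The sketch below is illustrative only: the macro name AssignFiles, the file paths, and the assumption that every platform other than MVS, CMS, and Windows uses Unix-style paths are all mine.

```sas
/* Record the environment in the log before doing anything platform-specific */
%PUT NOTE: SYSSCP=&SYSSCP SYSSCPL=&SYSSCPL SYSENV=&SYSENV SYSJOBID=&SYSJOBID ;

/* Minimal sketch: issue FILENAME statements only where they are needed */
%MACRO AssignFiles ;
   %IF "&SYSSCP" = "OS" | "&SYSSCP" = "CMS" %THEN %DO ;
      /* Filerefs come from DD statements or FILEDEF commands outside SAS */
      %PUT NOTE: Relying on externally allocated filerefs. ;
   %END ;
   %ELSE %IF "&SYSSCP" = "WIN" %THEN %DO ;
      FILENAME InptFile "C:\Data\Weekly_Data.DAT" ;    /* hypothetical path */
      FILENAME OutptFil "C:\Data\Weekly_Sumry.DAT" ;   /* hypothetical path */
   %END ;
   %ELSE %DO ;
      /* Assumed to be a Unix-style platform; paths are hypothetical */
      FILENAME InptFile "/data/weekly_data.dat" ;
      FILENAME OutptFil "/data/weekly_sumry.dat" ;
   %END ;
%MEND AssignFiles ;

%AssignFiles
```

The filerefs are kept to eight characters so that the same names can also serve as DDNAMEs on the mainframe.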

V. CREATING PORTABLE SAS PROGRAMS

V.A THE THREE-PRONGED STRATEGY TO CREATE A PORTABLE SAS PROGRAM

I recommend a three-pronged strategy for creating portable SAS programs: avoid, externalize, and encapsulate[24]. The tactics are listed in their order of priority.

V.A.1 Avoid Non-Portable Language Elements

This is straightforward: if you find yourself using a non-portable SAS language element, look for an alternative means to accomplish your goal, or ask yourself if what you are doing is truly necessary. In programming, simplicity is a virtue (and sometimes harder to achieve than complexity)! See Appendix A for recommendations on specific language elements.

Some SAS functions that should be avoided include PEEK, PEEKC, PEEKLONG, PEEKCLONG, POKE, POKELONG, RANK, BYTE, and COLLATE. The FILEMAP, COMMPORT, and PIPE keywords of the FILENAME statement should be avoided. All SAS Windowing commands must be avoided. INFORMATs and OUTFORMATs $HEX., $OCTAL., HEX., and OCTAL. must be used with great care (although they’re OK for debugging). Finally, the CALL MODULE family (MODULE, MODULEN, MODULEC, MODULEI, MODULEIN, MODULEIC) presents special problems, because the modules executed would have to be ported as well, along with their interface definition tables.

V.A.2 Externalize Non-Portable Language Elements

This tactic removes platform-specific elements from the core logic of your program, and moves them to an outer shell which can perform prologue and epilogue processing specific to the environment. It then becomes possible to have a small piece of code which is platform-dependent and a large module whose code is common to all platforms. You will then have one version of core logic, along with several different versions of the platform-specific code. For example, if we have code like the following:

    DATA Processed_Data ;
       FILE "C:\Data\Weekly_Sumry_2009-05-12.DAT" ;   /* For output from PUT stmt */
       INFILE "C:\Data\Weekly_Data_2009-05-12.DAT" ;  /* External input */
       . . .
    RUN ;

The file specifications are specific to the platform (Windows in this case). We can decouple the filespecs from the FILE and INFILE statements by using the FILENAME statement:

    FILENAME InptFile "C:\Data\Weekly_Data_2009-05-12.DAT" ;
    FILENAME OutptFil "C:\Data\Weekly_Sumry_2009-05-12.DAT" ;
    DATA Processed_Data ;
       FILE OutptFil ;     /* For output from PUT statement */
       INFILE InptFile ;   /* External input */
       . . .
    RUN ;

This step makes maintenance easier, but does not make the program portable. What we must do is create two separate SAS programs:

    /* The Windows-specific part of the code */
    FILENAME InptFile "C:\Data\Weekly_Data_2009-05-12.DAT" ;
    FILENAME OutptFil "C:\Data\Weekly_Sumry_2009-05-12.DAT" ;
    %INCLUDE APPLIB(WklySmry) /* Common code */ ;

Stored separately as WklySmry in the application code repository:

    /* The platform-independent part of the code */
    DATA Processed_Data ;
       FILE OutptFil ;     /* For output from PUT statement */
       INFILE InptFile ;   /* External input */
       . . .
    RUN ;
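A Unix counterpart to the Windows-specific wrapper might look like the sketch below. All of the paths are hypothetical, and the sketch assumes that the APPLIB fileref points at a directory in which the common code is stored as a file named WklySmry.sas.

```sas
/* The Unix-specific part of the code (all paths are hypothetical) */
FILENAME InptFile "/data/Weekly_Data_2009-05-12.DAT" ;
FILENAME OutptFil "/data/Weekly_Sumry_2009-05-12.DAT" ;
FILENAME APPLIB   "/apps/sas/code" ;    /* directory assumed to hold WklySmry.sas */
%INCLUDE APPLIB(WklySmry) /* Common code */ ;
```

Only the three FILENAME statements differ from one platform's wrapper to the next; the %INCLUDE and the common code it brings in stay the same.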


Now that we have this structure, it would be easy to create platform-specific code for Unix, MVS, etc., modeled on the Windows-specific code.

V.A.3 Encapsulate Non-Portable Language Elements

The final tactic applies to those platform-dependent language features that are absolutely necessary, but cannot be externalized. These situations should be rare, as macros can generally be used to externalize such features.

Take, for example, the problem of causing a SAS program to pause for one minute. Under MVS and CMS, this would be achieved by CALL SLEEP( 60000 ) ;, whereas on a Windows machine, you would have to code dummy = SLEEP( 60 ) ; . Further suppose this action must be taken at some point during the course of a DATA step, and cannot be externalized to a prologue or epilogue. To accomplish this delay, you could code:

    SELECT ( "&SYSSCP" ) ;
       WHEN ( "OS", "CMS" )   /*MVS|CMS*/
          CALL SLEEP( 60000 ) ;
       WHEN ( "WIN" )         /*Windows*/
          dummy = SLEEP( 60 ) ;
       OTHERWISE              /*Unknown platform*/
          ABORT ;
    END /* Case of operating system */ ;

Note that the comparison is case-sensitive, so the Windows value must be coded in uppercase as "WIN", matching the value of &SYSSCP.

It would be even better to create a macro to generate this code. The following example uses seconds as the units for its parameter, but you could code it differently:

    %MACRO TakeaNap( duration_in_seconds ) ;
       SELECT ( "&SYSSCP" ) ;
          WHEN ( "OS" /*MVS*/, "CMS" /*CMS*/ )
             CALL SLEEP( ( &duration_in_seconds ) * 1000 ) ;
          WHEN ( "WIN" ) /*Win*/
             dummy = SLEEP( &duration_in_seconds ) ;
          OTHERWISE /*Unknown platform*/
             ABORT ;
       END /* Case of operating system */ ;
    %MEND TakeaNap ;

The user would then code %TakeaNap( 60 ) to sleep for one minute. This macro generates code that must make the decision of what to execute during run time. You could take advantage of the fact that &SYSSCP is known at macro time to generate only the code to be executed, without any run-time decision overhead.
Such a macro would look like:

    %MACRO TakeaNap( duration_in_seconds ) ;
       %IF "&SYSSCP" = "OS" /*MVS*/ | "&SYSSCP" = "CMS" /*CMS*/ %THEN
          %DO ;
             CALL SLEEP( ( &duration_in_seconds ) * 1000 ) ;
          %END ;
       %ELSE %IF "&SYSSCP" = "WIN" /*Windows*/ | "&SYSSCP" = "WIN_SRV" /*Windows Servers*/ %THEN
          %DO ;
             dummy = SLEEP( &duration_in_seconds ) ;
          %END ;
       %ELSE /*Unknown platform*/
          %DO ;
             ABORT ;
          %END ;
    %MEND TakeaNap ;

Now that this platform-dependency is encapsulated in a macro, the macro can be externalized, and shared with other programs as needed.

Note: I used ABORT in the last example, so that if the expanded code were never executed, it would not cause the program to fail. Depending on your philosophy of these things, you could use %ABORT to ensure that the code is never executed on an unsupported platform.
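The %ABORT alternative mentioned in the note could be sketched as follows. The macro name TakeaNapStrict is hypothetical, and the message text is my own; the structure otherwise follows the macro-time version of TakeaNap.

```sas
/* Minimal sketch: stop at macro time when the platform is not supported */
%MACRO TakeaNapStrict( duration_in_seconds ) ;
   %IF "&SYSSCP" = "OS" | "&SYSSCP" = "CMS" %THEN %DO ;
      CALL SLEEP( ( &duration_in_seconds ) * 1000 ) ;
   %END ;
   %ELSE %IF "&SYSSCP" = "WIN" %THEN %DO ;
      dummy = SLEEP( &duration_in_seconds ) ;
   %END ;
   %ELSE %DO ;
      %PUT ERROR: TakeaNapStrict does not support platform &SYSSCP.. ;
      %ABORT ;      /* coded without ABEND or RETURN, as Appendix A recommends */
   %END ;
%MEND TakeaNapStrict ;

/* Usage: the macro call expands to one statement inside the DATA step */
DATA _NULL_ ;
   PUTLOG "Starting the pause" ;
   %TakeaNapStrict( 5 )
   PUTLOG "Pause complete" ;
RUN ;
```

The trade-off is that this version refuses to generate the DATA step at all on an unknown platform, whereas the ABORT version fails only if the sleep statement is actually reached.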

VI. CONCLUSION The ability to run the same SAS program on any one of several platforms allows an organization to lower costs and increase flexibility. This enables the enterprise to best leverage its computing resources.

ACKNOWLEDGEMENTS I wish to acknowledge the invaluable assistance of Mr. Donald Weimer, as well as constructive review feedback from Ivan Padilla, Mike Whitaker, and one other.


RECOMMENDED READING

The fine points of character encoding in code pages, transcoding between code pages, and the code pages themselves can be found in SAS National Language Support (NLS): User’s Guide. You may also want to peruse “Processing Data Using Cross-Environment Data Access (CEDA)” in SAS Language Reference: Concepts. I strongly recommend that you review the SAS companion manual for each platform you will be porting to.

CONTACT INFORMATION Your comments and questions are valued and encouraged. Contact the author at: Robert Cruz, Info-Mation Systems 1691 El Camino de Vida Hollister, CA 95023 Work Phone: 831-207-9132 Fax: 413-771-053 E-mail: [email protected]

TRADEMARKS, BRAND AND PRODUCT NAMES SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are trademarks of their respective companies.

APPENDIX A: SAS LANGUAGE ELEMENTS WITH PORTABILITY CONSIDERATIONS

For information on options, see Section II.H, “System, Statement, and PROC Options Considerations”, above.

Statement: %ABORT
Considerations: The effect of this statement varies from one platform to another, and even with operating methods (foreground vs. background or batch) on a given platform.
Portability Recommendation: Code this statement without any parameters (neither ABEND nor RETURN) to ensure portability.

Format: $ASCIIw.
Considerations: Platform independent. Produces output or reads input in the ASCII character set, regardless of the platform’s native character set.
Portability Recommendation: Use this format to read data from, or produce data for, ASCII platforms, or to ensure portability of the data.

Format: $EBCDICw.
Considerations: Platform independent. Produces output or reads input in the EBCDIC character set, regardless of the platform’s native character set.
Portability Recommendation: Use this format to read data from, or produce data for, EBCDIC platforms, or to ensure portability of the data.

Format: FLOAT4.
Converts SAS numbers to/from the native single-precision, floating-point value.
Considerations: This format produces output that is in a form specific to the platform the SAS program is being run on.
Portability Recommendation: As long as the data is written and read on the same platform, this format can be used safely. In order to have a uniform floating-point format across all platforms, use either the IEEEw.d or the S370FRBw.d format consistently.

Format: HEXw.
Converts SAS numbers to/from the hexadecimal representation for their native integer or floating-point binary representation.
Considerations: This format produces output that is in a form specific to the platform the SAS program is being run on.
Portability Recommendation: As long as the data is written and read on the same platform, this format can be used safely.

Format: $HEXw.
Converts SAS characters to/from the hexadecimal representation for their native representation.
Considerations: This format produces output that is in a form specific to the platform the SAS program is being run on.
Portability Recommendation: As long as the data is written and read on the same platform, this format can be used safely.

Format: IBw.d, PIBw.d
Converts SAS numbers to/from native Integer Binary form.
Considerations: These formats produce output that is in a form specific to the platform the SAS program is being run on.
Portability Recommendation: As long as the data is written and read on the same platform, these formats can be used safely. In order to have a uniform integer binary format across all platforms, use the S370FIBw.d format consistently.


Format: IBRw.d, PIBRw.d
Converts SAS numbers to/from DEC/Intel (“backwords”) Integer Binary form.
Considerations: Platform independent.
Portability Recommendation: Use this format to read data from, or produce data for, little-endian platforms, or to ensure portability of the data. Or, you could use the S370FIBw.d format to ensure portability.

Format: IEEEw.d
Converts SAS numbers to/from IEEE floating-point form.
Considerations: Platform independent.
Portability Recommendation: Use this format to read data from, or produce data for, platforms that use the IEEE floating-point representation, or to ensure portability of the data. Or, you could use the S370FRBw.d format to ensure portability.

Function: MCIPISTR(MCI-string-command)
Submits an MCI string command to a piece of multimedia equipment.
Considerations: Only available on the Windows platform.
Portability Recommendation: NOT Portable. Avoid.

Function: MOD(argument-1, argument-2)
Returns the remainder from the division of the first argument by the second argument, fuzzed to avoid most unexpected floating-point results.
Function: MODZ(argument-1, argument-2)
Returns the remainder from the division of the first argument by the second argument, without fuzzing.
Considerations: Arguments must be integer values, even though they are stored as floating point numbers. The differences in floating-point implementations mean that some integer values might be represented on one platform, but not another.
Portability Recommendation: Avoid using the MOD function with large arguments. Do not exceed 2**53-1.

Function: MOPEN(directory-id,member-name<,open-mode<,record-length<,record-format>>>)
Opens a file by directory id and member name, and returns the file identifier or a 0.
Considerations: The permissible format for member-name values varies by platform. A value of “P” for record-format specifies a platform-dependent file.
Portability Recommendation: NOT Portable. Avoid. If this function is necessary, try to externalize it. Alternatively, you might be able to restrict member-name to a subset common to all platforms. Try to avoid a record-format of “P”.

Format: OCTALw.
Converts SAS numbers to/from the octal representation for their native integer binary representation.
Considerations: This format produces output that is in a form specific to the platform the SAS program is being run on.
Portability Recommendation: As long as the data is written and read on the same platform, this format can be used safely.

Format: $OCTALw.
Converts SAS characters to/from the octal representation for their native representation.
Considerations: This format produces output that is in a form specific to the platform the SAS program is being run on.
Portability Recommendation: As long as the data is written and read on the same platform, this format can be used safely.

Function: PATHNAME((fileref | libref) <, search-opt>)
Returns the physical name of a SAS data library or of an external file, or returns a blank.
Considerations: Pathnames vary in format and length from one platform to another.
Portability Recommendation: Limit your use of this. It can be beneficial to list the names of input and/or output files used by a SAS program, but avoid processing the pathnames the function returns.

Format: PDw.d
Converts SAS numbers to/from native Packed Decimal format.
Considerations: This format produces output that is in a form specific to the platform the SAS program is being run on.
Portability Recommendation: As long as the data is written and read on the same platform, this format can be used safely. For a uniform packed decimal format across all platforms, use the S370FPDw.d format consistently.

Function: PEEK(address<, length>)
Stores the contents of a memory address into a numeric variable on a 32-bit platform.
Function: PEEKC(address<, length>)
Stores the contents of a memory address in a char variable on a 32-bit platform.
Considerations: A SAS program could potentially access memory owned by either the OS (if the OS designates it as R/W) or owned by SAS itself. There is no commonality in the placement, content or format of data or instructions in memory from one platform to another.
Portability Recommendation: Avoid. Depending on what information you are looking for, try to find a SAS facility (such as the automatic macro variables or FINFO function) that will provide it to you.


Function: PEEKCLONG(address<, length>)
Stores the contents of a memory address in a character variable on 32-bit and 64-bit platforms.
Function: PEEKLONG(address<, length>)
Stores the contents of a memory address in a numeric variable on 32-bit and 64-bit platforms.
Considerations: Same as PEEK().
Portability Recommendation: Same as PEEK().

Subroutine: CALL POKE(source, pointer<, length>);
Writes a value directly into memory on a 32-bit platform.
Subroutine: CALL POKELONG(source, pointer<, length>);
Writes a value directly into memory on 32-bit and 64-bit platforms.
Considerations: A SAS program could potentially access memory owned by either the OS (if the OS designates it as R/W) or owned by SAS itself. There is no commonality in the placement, content or format of data or instructions in memory from one platform to another.
Portability Recommendation: Avoid. Period. Modifying system or application storage can lead to disastrous results.

Tabled Distribution of RAND function: RAND(’TABLE’, p1, p2, ...)
Considerations: The maximum number of probability parameters depends on your platform, but is at least 32,767.
Portability Recommendation: Code no more than 32,767 probability parameters.

Format: RBw.d
Converts SAS numbers to/from native Floating Point (Real Binary).
Considerations: This format produces output that is in a form specific to the platform the SAS program is being run on.
Portability Recommendation: As long as the data is written and read on the same platform, this format can be used safely. In order to have a uniform cross-platform floating-point format, use either the IEEEw.d or the S370FRBw.d format consistently.

Format: S370FIBw.d, S370FIBUw.d, S370FPIBw.d
Converts SAS numbers to/from IBM System/370 Integer (Fixed) Binary form.
Considerations: Platform independent.
Portability Recommendation: Use this format to read data from, or produce data for, platforms that use the IBM fixed-point format, or to ensure portability of the data. Or, you could use the IBRw.d format to ensure portability.

Format: S370FPDw.d, S370FPDUw.d
Converts SAS numbers to/from IBM System/370 Packed Decimal form.
Considerations: Platform independent.
Portability Recommendation: Use this format to read data from, or produce data for, platforms that use the IBM packed decimal representation, or to ensure portability of the data.

Format: S370FRBw.d
Converts SAS numbers to/from IBM System/370 Floating Point form.
Considerations: Platform independent.
Portability Recommendation: Use this format to read data from, or produce data for, platforms that use the IBM hex floating-point representation, or to ensure portability of the data. Or, you could use the IEEEw.d format to ensure portability.

Format: S370FZDw.d, S370FZDLw.d, S370FZDTw.d, S370FZDUw.d
Converts SAS numbers to/from IBM System/370 Zoned Decimal format.
Considerations: Platform independent.
Portability Recommendation: Use this format to read data from, or produce data for, platforms that use the IBM zoned decimal representation, or to ensure portability of the data.

Subroutine: CALL SCAN(string,n,position,length<,delimiters>)
Subroutine: CALL SCANQ(string,n,position,length<,delimiters>)
Function: SCAN(string,n<,delimiters>)
Function: SCANQ(string,n<,delimiters>)
Function: %SCAN(string,n<,delimiters>)
Function: %QSCAN(string, n<, delimiters>)
Considerations: The default value for the delimiters parameter is dependent on the platform:
  EBCDIC: blank . < ( + | & ! $ * ) ; ¬ / , % ¦ ¢
  ASCII code pages with circumflex: blank . < ( + & ! $ * ) ; ^ - / , % |
  ASCII code pages without circumflex: blank . < ( + & ! $ * ) ; ~ - / , % |
Portability Recommendation: Always specify an explicit value for the delimiter parameter.


Subroutine: CALL SOUND(frequency, duration);
Considerations: Windows only.
Portability Recommendation: NOT portable. Avoid, or if necessary, encapsulate.

Automatic variable: &SYSCC
Considerations: The condition code that SAS returns to your operating environment.
Portability Recommendation: Preferably, avoid this variable. If it is a necessity, you will need to code platform-sensitive logic.

Automatic variable: &SYSCMD
Considerations: Last unrecognized command from the command line of a macro window.
Portability Recommendation: Avoid. If you are following the injunction against all windowed commands, you won’t need this.

Automatic variable: &SYSDEVIC
Considerations: Contains the name of the current graphics device. These names are platform-dependent.
Portability Recommendation: Portable, but may give different results on each platform.

Automatic variable: &SYSENV
Considerations: Reports whether SAS is running interactively.
Portability Recommendation: Portable.

Automatic variable: &SYSFILRC
Considerations: Contains the return code from the last FILENAME statement.
Portability Recommendation: Portable if you only test for zero or non-zero, rather than specific non-zero values.

Function: SYSGET(operating-environment-variable)
Function: %SYSGET(operating-environment-variable)
Returns the value of the specified operating environment variable.
Considerations: The method of setting environment variables prior to invoking SAS differs on each platform.
Portability Recommendation: To use this in a portable fashion, you would have to ensure that the same environment variables were properly set in all the platforms you run SAS on.

Automatic variable: &SYSINFO
Considerations: Contains return codes provided by some SAS procedures.
Portability Recommendation: Portable.

Automatic variable: &SYSJOBID
Contains the name of the current batch job or user ID.
Considerations: The length and format of a User ID may vary from one platform to another.
Portability Recommendation: Portable.

Automatic variable: &SYSLAST
Considerations: Contains the name of the SAS data file created most recently.
Portability Recommendation: Portable.

Automatic variable: &SYSLCKRC
Considerations: Contains the return code from the most recent LOCK statement.
Portability Recommendation: Portable if you only test for zero or non-zero, rather than specific non-zero values.

Automatic macro variable: &SYSLIBRC
Considerations: Contains the return code from the most recent LIBNAME statement.
Portability Recommendation: Portable if you only test for zero or non-zero, rather than specific non-zero values.

Statement: %SYSLPUT
Considerations: Exchange macro variable values between local and remote systems.
Portability Recommendation: Portable.

Automatic variable: &SYSMACRONAME
Considerations: Contains the name of the currently executing macro.
Portability Recommendation: Portable.

Automatic variable: &SYSMENV
Considerations: Contains the invocation status of the macro that is currently executing.
Portability Recommendation: Portable.

Automatic variable: &SYSMSG
Considerations: Contains the text to display in the message area of a macro window.
Portability Recommendation: Portable.

Automatic variable: &SYSNCPU
Considerations: Contains the number of processors available to SAS for computation.
Portability Recommendation: Portable.

Automatic variable: &SYSPARM
Considerations: Contains a character string that can be passed from the operating environment to SAS program steps.
Portability Recommendation: Portable.

Automatic variable: &SYSPBUFF
Considerations: Contains text supplied as macro parameter values.
Portability Recommendation: Portable.

Automatic variable: &SYSPROCESSID
Considerations: Contains the process id of the current SAS process.
Portability Recommendation: Portable.

Automatic variable: &SYSPROCESSNAME
Considerations: Contains the process name of the current SAS process.
Portability Recommendation: Portable.

Automatic variable: &SYSPROCNAME
Considerations: Contains the name of the procedure (or “DATASTEP” for data steps) currently being processed by the SAS language processor.
Portability Recommendation: Portable.

Function: %SYSPROD(product)
Considerations: Reports whether a SAS software product is licensed at the site.
Portability Recommendation: Portable.


Autocall macro: %SYSRC(character-string)
Considerations: Returns a value corresponding to an error condition.
Portability Recommendation: Portable. Note: do not confuse this with the automatic variable of the same name.

Automatic variable: &SYSRC
Considerations: Contains the last return code generated by your operating system. Note: do not confuse this with the autocall macro of the same name.
Portability Recommendation: NOT portable. You shouldn’t have any need for this automatic variable if you are following the recommendation against issuing OS commands.

Statement: %SYSRPUT
Considerations: Exchange macro variable values between local and remote systems.
Portability Recommendation: Portable.

Automatic variable: &SYSSCP
Contains a value which identifies the operating system.
Considerations: For a list of possible values, see Table 13.3, SYSSCP and SYSSCPL Values, in “SYSSCP and SYSSCPL Automatic Macro Variables” in SAS 9.1 Macro Language: Reference.
Portability Recommendation: Portable. Can be used to determine the current operating system family for logic which encapsulates platform-specific logic, for instance Unix vs. z/OS.

Automatic variable: &SYSSCPL
Contains a value which is either blank or further identifies the operating system.
Considerations: For a list of possible values, see Table 13.3, SYSSCP and SYSSCPL Values, in “SYSSCP and SYSSCPL Automatic Macro Variables” in SAS 9.1 Macro Language: Reference.
Portability Recommendation: Portable. Can be used to determine the current operating system variant for logic which encapsulates platform-specific logic, for example Windows 95 vs. Windows NT.

Automatic variable: &SYSSITE
Contains the number assigned to your site.
Considerations: None.
Portability Recommendation: Portable.

Automatic variable: &SYSSTARTID
Contains the ID generated from the last STARTSAS statement.
Considerations: May be blank.
Portability Recommendation: Portable.

Automatic variable: &SYSSTARTNAME
Contains the process name generated from the last STARTSAS statement.
Considerations: May be blank.
Portability Recommendation: Portable.

Subroutine: CALL SYSTEM(command);
Statement: X < command >;
Function: SYSTEM(command)
Function: %SYSTEM(command)
Statement: %SYSEXEC < command >;
Runs command in the OS environment.
Considerations: The value of the command parameter is platform-dependent. The results of command are platform-dependent.
Portability Recommendation: NOT portable. Alternatives depend on the intended purpose of the command.

Automatic variable: &SYSTIME
Contains the time a SAS job or session began executing.
Considerations: None.
Portability Recommendation: Portable.

Automatic variable: &SYSUSERID
Contains the user ID or login of the current SAS process.
Considerations: The length and format of a User ID may vary from one platform to another.
Portability Recommendation: Portable.

Automatic variable: &SYSVER
Contains the release number of SAS software.
Considerations: None.
Portability Recommendation: Portable.

Automatic variable: &SYSVLONG
Contains the release number and maintenance level of SAS software.
Considerations: None.
Portability Recommendation: Portable.

Function: TRANSLATE(source, to-1, from-1<, …to-n, from-n>)
See the SAS documentation for your operating environment for more information.
Considerations: You must have pairs of to and from arguments on some operating environments. On other operating environments, a segment of the collating sequence replaces null from arguments.
Portability Recommendation: Portable if you specify complete to-from pairs. Only one pair is ever necessary, and I recommend you restrict yourself to that form.

Function: URLDECODE(argument)
Returns a string that was decoded using the URL escape syntax.
Considerations: Characters specified by an escape sequence are assumed to be in ASCII encoding. On an EBCDIC platform, SAS uses the transport-to-local translation table to convert these characters to their corresponding EBCDIC characters.
Portability Recommendation: Use with care. For more information see “TRANTAB= System Option” on page 1639 of SAS 9.1.3 Language Reference: Dictionary, Third Edition.


Function: WAKEUP(datetime)
Specifies the day and time that execution will ensue.
Considerations: Supported only on the Windows platform.
Portability Recommendation: Avoid. You might be able to replicate this function on other platforms by using the SLEEP function (with appropriate calculations), but SLEEP itself is problematic.

Commands: Windowing commands
Considerations: All SAS Windowing commands act differently on various platforms, due to varying implementations of the interactive environment.
Portability Recommendation: Avoid.

Format: ZDw.d
Considerations: Exact layout is platform-dependent. Consult the companion documentation for the operating environment in question.
Portability Recommendation: As long as the data is written and read on the same platform, this format can be used safely. In order to have a uniform format for zoned decimal across all platforms, use the S370FZDw.d format.
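The appendix's advice on numeric formats (prefer S370FIBw.d, S370FPDw.d, and IEEEw.d over the native IBw.d, PDw.d, and RBw.d) might be applied as in the following minimal sketch. The fileref PORTDATA, the file name, the data set name READBACK, and the variable names are all hypothetical. In a real program the FILENAME statement would normally sit in a platform-specific prologue rather than in mainline code.

```sas
/* Sketch only: write numeric fields in platform-independent layouts, */
/* then read them back with the matching informats.                   */
/* The fileref, file name, and variable names are hypothetical.       */
FILENAME portdata "portable.dat" RECFM=F LRECL=18 ;

DATA _NULL_ ;
  FILE portdata ;
  count  = 123456 ;
  amount = 1234.56 ;
  ratio  = 0.125 ;
  /* 4-byte IBM integer + 6-byte IBM packed decimal + 8-byte IEEE */
  PUT count  S370FIB4.
      amount S370FPD6.2
      ratio  IEEE8. ;
RUN ;

DATA readback ;
  INFILE portdata ;
  /* The same layouts as informats, whatever the native format is */
  INPUT count  S370FIB4.
        amount S370FPD6.2
        ratio  IEEE8. ;
RUN ;
```

Because the record layout is fixed by the format names and not by the host, the same file could in principle be written on one platform and read on another, provided it is transferred as binary data.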

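Three other recommendations from the appendix can be sketched together: testing &SYSFILRC only for zero or non-zero, always passing explicit delimiters to SCAN, and restricting TRANSLATE to a single to/from pair. The fileref INDATA, the file name, the macro name CheckFileref, and the sample string are hypothetical and serve only as illustration.

```sas
/* Sketch only: the fileref, file name, and macro name are hypothetical. */
FILENAME indata "input.txt" ;

%MACRO CheckFileref ;
  /* Test only for zero vs. non-zero; specific codes vary by platform */
  %IF &SYSFILRC NE 0 %THEN
    %PUT ERROR: The FILENAME statement for INDATA failed. ;
%MEND CheckFileref ;
%CheckFileref

DATA _NULL_ ;
  line = "alpha,beta gamma" ;
  /* Explicit delimiters: do not rely on the platform default list */
  second = SCAN( line, 2, " ," ) ;
  /* A single to/from pair: each blank becomes a hyphen */
  dashed = TRANSLATE( line, "-", " " ) ;
  PUT second= dashed= ;
RUN ;
```

Spelling out the delimiter list costs a few characters, but it removes any dependence on whether the session runs under an EBCDIC or an ASCII code page.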
ENDNOTES

1 SAS Publications can be downloaded gratis from http://support.sas.com/documentation/onlinedoc/base/index.html ; the Companion manuals are listed at http://support.sas.com/documentation/onlinedoc/base/index.html#companion
2 Wikipedia, the free encyclopedia. “P-code machine”. http://en.wikipedia.org/wiki/P-code_machine (09 Jul 2009)
3 BCDIC was also used by CDC, prior to, and following, the creation of EBCDIC (see http://en.wikipedia.org/wiki/CDC_3000), and also by the HP 3000 MPE/ix-series computers (see http://docs.hp.com/en/32212-90008/apcs03.html), as well as other computers.
4 Wikipedia, the free encyclopedia. “Baudot Code”. http://en.wikipedia.org/wiki/Baudot_code (15 Jun 2009)
5 You can obtain IBM publications in hardcopy (for a fee) and many of them at no charge in softcopy (PDF and/or Bookmaster) format from http://www.elink.ibmlink.ibm.com/publications/servlet/pbi.wss?CTY=US, by clicking on “Search for Publications”
6 Unicode Consortium (1991-2009). “C0 Controls and Basic Latin”. http://www.unicode.org/charts/PDF/U0000.pdf (10 Jul 2009)
7 IEEE is a professional organization for the advancement of technology, see http://www.ieee.org/web/aboutus/home/index.html
8 Wikipedia, the free encyclopedia. “IEEE 754-1985”. http://en.wikipedia.org/wiki/IEEE_754-1985 (09 Jul 2009)
9 Wikipedia, the free encyclopedia. “Cray SV1”. http://en.wikipedia.org/wiki/Cray_SV1 (15 Jul 2009)
10 Wikipedia, the free encyclopedia. “Floating Point”. http://en.wikipedia.org/wiki/Floating_point (09 Jul 2009)
11 Wikipedia, the free encyclopedia. “POWER6”. http://en.wikipedia.org/wiki/POWER6 (09 Jul 2009)
12 Wikipedia, the free encyclopedia. “Z10”. http://en.wikipedia.org/wiki/IBM_System_z10#Decimal_Floating_Point (09 Jul 2009)
13 Morse, Isaacson, Albert (1987). The 80386/387 Architecture. NY, NY: John Wiley & Sons.
14 Wikipedia, the free encyclopedia. “Open Database Connectivity”. http://en.wikipedia.org/wiki/Odbc (15 Jun 2009)
15 SAS Institute Inc (2005). “CALL POKE Routine”, pg 354, SAS® 9.1.3 Language Reference: Dictionary, Third Edition. Cary, NC: SAS Institute.
16 IBM Corp. “ASCII to EBCDIC conversion”. http://www-03.ibm.com/servers/eserver/zseries/zos/unix/bpxa1p03.html (09 Jul 2009)
17 SAS Institute Inc (2005). “LENGTH Statement”, pg 1294, SAS® 9.1.3 Language Reference: Dictionary, Third Edition. Cary, NC: SAS Institute.
18 SAS Institute Inc (2004). “Representation of Numeric Variables”, pg 207, SAS® 9.1.3 Companion for z/OS. Cary, NC: SAS Institute.
19 SAS Institute Inc (2004). “Length and Precision of Variables”, pg 579, SAS® 9.1.3 Companion for Windows. Cary, NC: SAS Institute.
20 SAS Institute Inc (2004). “Using Reserved Operating System Physical Names”, pg 155, SAS® 9.1.3 Companion for Windows. Cary, NC: SAS Institute.
21 SAS Institute Inc (2005). “CALL PEEKC Function”, pg 713, SAS® 9.1.3 Language Reference: Dictionary, Third Edition. Cary, NC: SAS Institute.
22 Check the exact value of this variable with the SAS companion for your particular platform.
23 IBM Corp. “About VSE”. http://www-03.ibm.com/servers/eserver/zseries/zvse/about/ (15 Jul 2009)
24 I wish I had a catchy mnemonic for this… any suggestions? Send them to me at the e-mail address above.
