An Overview of the SAS/C® Compiler and Library
Tim Hunter, SAS Institute Inc., Cary, NC

ABSTRACT

This paper describes the major features of the SAS/C® compiler and library and discusses the development history of the product. The enhancements in the current production release are outlined. Some possible enhancements for future releases are described.

INTRODUCTION

The SAS/C product has had four production releases in its five years of development. Each release included many new features and enhancements of existing features. Not only is it the only compiler used for Version 6 of the SAS® System on IBM® mainframe hosts, it is also the leading C compiler in this market. Although the compiler and library are heavily oriented to use in large software systems, they are an efficient tool for any software project that is written in the C language.

The primary elements of the SAS/C product are the compiler and run-time library. However, the product includes a number of utility programs, as well as several configurations of the run-time library for specialized environments. Because the documentation is as important as the software itself, the SAS/C product is accompanied by a complete and detailed set of manuals and technical reports.

This description begins with a history of SAS/C development that highlights the features of the first three releases. Following this is an overview of the product as it is today. The last part is a discussion of what directions the product may take in the future. Throughout, features exclusive to the SAS/C product are highlighted, as are those features that were implemented in response to requests from our users.

HISTORY OF THE SAS/C COMPILER AND LIBRARY

In mid-1984, SAS Institute Inc. decided to develop its own C compiler. This project was assigned to the Institute's Language Systems Department. The goal of this group was to make Lattice®, Inc.'s compiler work on the IBM 370 architecture under the MVS/370, MVS/XA™, and VM/CMS operating systems. A little over a year later, the progenitor of the SAS/C product was put into production.

The following sections describe each production release of the SAS/C product. Specific features are mentioned in the section covering the release in which the features appeared. Of course, all of the features have continued to be supported in newer releases.

Release 2.10C

The first production release was known as the Lattice Native Compiler as Modified by SAS Institute Inc. for IBM 370 Systems. The compiler was based on Version 2.00 of the Lattice C compiler. The run-time library was a new implementation of the standard C library, with emphasis on both IBM suitability and compatibility with UNIX® implementations.

The Compiler

The compiler implemented the C language as described in the then-definitive book on C, Kernighan and Ritchie's The C Programming Language. Because the compiler was implemented for IBM 370s, it included many features that were thought necessary for that architecture and for programmers who were used to working in that environment. The following list describes some of these features:

• The generated code is fully reentrant, even allowing modification of external variables.

• The compiler supports a number of built-in functions. These are functions for which the compiler generates instructions directly into the instruction stream, rather than generating a call to a separately linked version of the function. In this release, the following functions were implemented as built-in functions:

     strlen    memcmpp   abs
     strcpy    memcpyp   ceil
     memcpy    memxlt    fabs
     memset    strxlt    floor
     memcmp    modf      ldexp

• The compiler produces a source listing file, including macro expansion and cross-reference.

• C programs can call, and be called by, programs written in IBM 370 assembly language using a very simple interface. Many assembler subroutines require no modification.

• The INDEP compiler option generates code that allows programs to execute without the need for the run-time library, and that can be called from other high-level languages such as PL/I, COBOL, and FORTRAN. (Exclusive.)

The compiler also supports several extensions that were deemed important to the usability of the C language in the IBM 370 environment. Some of these extensions are described in the following list:

• Characters such as {, }, [, ], and the circumflex are commonly used in C programming, but are not always available on IBM terminals and printers. Therefore, the compiler supports digraphs (two-character sequences) as substitute representations of these characters. You can specify up to four representations of each character, two for use in the source file and two for use in the listing file. The alternate representations may be modified on-site.

• In order to allow the generation of call-by-address parameter lists, the compiler supports the @ special operator. When applied to a function argument, this operator, like the & operator, returns the address of the argument. Unlike the & operator, the @ operator accepts an argument that is not an lvalue, in which case the @ operator forces creation of a temporary copy of the argument and returns the address of the copy. (Exclusive.)

• In order for C programs to call assembly language functions that expect a VL-format parameter list, the compiler supports the __asm prefix for function names. When a function whose name is prefixed by __asm is called, the compiler creates a VL-format parameter list for the function. (Exclusive.)

The Run-Time Library

As mentioned earlier, the design of the run-time library had two goals: suitability for IBM 370 operating systems and compatibility with UNIX implementations. Most of the library functions are standard functions; that is, they are functions that most C implementations support. These functions can be grouped as follows:

• memory allocation functions

• character-type functions

• program control functions

• string functions

• mathematical functions

• date and time functions

• variable argument list functions.

Because the UNIX I/O model (widely used in the existing implementations of the C library) is so different from the IBM 370 model, the library I/O subsystem was implemented only after much thought and hard work. The final result was three basically separate sets of I/O functions: standard, augmented, and UNIX-style. Standard I/O functions are those commonly used, such as fopen, printf, and scanf. Augmented I/O functions (exclusive) are closer to IBM 370 I/O models and include afopen, afread, and afwrite. The UNIX-style I/O functions are close approximations of the UNIX low-level I/O routines open, read, write, and lseek.

The library also contains some functions designed specifically for use in the IBM 370 environment, such as the dynamic loading functions loadm and unloadm. These functions allow C programs to be implemented as several distinct load modules, each of which can be loaded into memory as required and unloaded when it is no longer needed. Other functions are present that follow common IBM 370 idioms (for example, the strxlt and xltable functions for string translation).

In addition, a version of the library that allows execution in other operating systems is provided. This version, called the Generalized Operating System environment, does not use operating system services directly. Instead, the library invokes one of several exits. Each exit is coded for use in the target operating system, and, as such, may take whatever action is appropriate to fill or deny the library's request. (Exclusive.)

Finally, most of the run-time library was packaged as a set of transient load modules. This packaging allowed individual C programs to take up less space on disk, simplified system maintenance, and allowed the library to be installed in a shared area, such as the LPA or a DCSS.

Utilities

Two utility programs, CLINK and OMD370, were included in the Release 2.10C package. CLINK is an object code preprocessor that is used when a reentrant program consists of two or more object decks produced by the compiler. CLINK combines the data needed to initialize external objects and ensures that all references to an external object refer to the same object. OMD370 is an object code disassembler that, given an object deck produced by the compiler, produces a listing similar to that produced by an assembler, optionally interspersing C language source lines at the appropriate points in the listing.

Release 3.00F

Shortly after Release 2.10C went into production, the Language Systems Department began converting Lattice's Version 3.00 compiler. As shown below, the new release contained several important enhancements. Release 3.00F, the first release to use the SAS/C name, went into production 13 months later, in December 1986.

The Compiler

Both Lattice, Inc., and the Institute had been members of the ANSI X3J11 committee (the committee responsible for producing a standard for the C language and library) for some time. Therefore, Release 3.00F contained several new language elements. Some of these elements are

• the void data type

• structure assignment, structure arguments, and functions returning structures

• the enum data type

• function prototypes

• the __LINE__ and __FILE__ pre-defined macro names.

Also, the compiler implemented the register storage class. Up to six general purpose registers may be assigned to auto integer and pointer variables, and up to two floating-point registers may be assigned to floating-point variables.

The Run-Time Library

The run-time library gained new functions in several areas. These functions are

• the string functions memupr, memlwr, strupr, strlwr, strstr, and 14 others.

• a set of system interface functions including cuserid and functions that allow access to operating system information such as the name and release numbers.

• two new utility functions, bsearch and qsort.

This release also defined a subset of the library informally called purelib. This group of functions, including the math and string functions, is independent of operating system services. As such, they were intended for use in programs that execute without the full library.

Utilities

The GENCSEG utility was also added in Release 3.00F. This utility program allows dynamically loadable C load modules to be saved in a sharable VM segment. This is the preferred method of installation for users of large applications. Such applications execute faster and use less memory in the user's virtual machine.
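Several of the Release 3.00F additions (the enum data type, function prototypes, structure assignment, and the bsearch and qsort library functions) later became part of standard C, so they can be illustrated with a short portable sketch. The code below is not taken from the SAS/C documentation. The names release_kind, feature, compare_features, and find_feature are hypothetical, and the sketch uses the const qualifier, which the paper lists as a later (Release 3.01F) addition.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* enum data type: hypothetical labels for the releases in this paper */
enum release_kind { REL_2_10C, REL_3_00F, REL_3_01F, REL_4_00C };

struct feature {
    const char *name;           /* feature name, used as the sort key */
    enum release_kind first;    /* release that introduced the feature */
};

/* Function prototypes: argument types are checked at every call. */
static int compare_features(const void *a, const void *b);
const struct feature *find_feature(struct feature *table, size_t count,
                                   const char *name);

/* Comparison callback shared by qsort and bsearch. */
static int compare_features(const void *a, const void *b)
{
    const struct feature *fa = a;
    const struct feature *fb = b;
    return strcmp(fa->name, fb->name);
}

/* Sort the table by name, then look up one entry with bsearch.
   Returns NULL when the table is empty or the name is not present. */
const struct feature *find_feature(struct feature *table, size_t count,
                                   const char *name)
{
    struct feature key;

    assert(table != NULL && name != NULL);
    if (count == 0)
        return NULL;

    qsort(table, count, sizeof table[0], compare_features);

    key = table[0];     /* structure assignment copies every member */
    key.name = name;    /* only the name matters to the comparison */
    return bsearch(&key, table, count, sizeof table[0], compare_features);
}
```

A caller would fill an array of struct feature in any order and pass it to find_feature; the array is sorted in place as a side effect, which keeps the sketch short but may not suit every program.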

Release 3.01F

Release 3.01F, the third production release, became available in March 1988. SAS/C user feedback grew proportionally with the number of SAS/C users; therefore, an important tradition was established in this release: the inclusion of compiler and library enhancements based on such feedback. Following the Institute's lead of responding quickly to input from users of the SAS System, the SAS/C developers began implementing those items the users requested. For the remainder of this paper, this sort of enhancement will be highlighted.

The Source-Level Debugger

The most popular feedback request was for a source-level debugger, so Release 3.01F included one. Some of the features of the debugger are

• breakpoints at every source line and at every function call, entry, and return

• single-step mode

• ability to display and modify the values of both scalar and aggregate objects

• ability to display a traceback (for example, a list of the functions in the calling sequence and the line number where each function was called)

• ability to resume execution after a program check

• ability to issue an operating system command from the debugger

• ability to operate in tandem with an operating system debugger such as TSO TEST or VM PER (exclusive)

• ability to execute a CLIST or EXEC containing other debugger commands or operating system commands (exclusive)

• ability to list the source code by line number or function name

• I/O exits that allow routing of debugger I/O to and from alternate sources and destinations, or allow the I/O to proceed via a non-standard path. (Exclusive. This facility was added by user request.)

The Compiler

Several new ANSI features were added to the release, as well as a number of language extensions requested by the users.

• non-reentrant code generation (user request).

• the void * data type and the const and volatile qualifiers.

• in-line machine code. This feature allows the C program to specify an exact sequence of IBM 370 machine instructions to be generated directly in the generated code. Almost any instruction can be specified, including exotic instructions that would not normally be generated by C language statements. (Exclusive.)

• enhanced strcpy and strlen built-in functions (user request).

• three new built-in functions, getc, putc, and strcmp.

The Run-Time Library

The run-time library gained five major subsections, as well as numerous other new functions.

• A set of signal functions that go far beyond the level of support required by the ANSI draft standard. Synchronous signals, such as overflow, and asynchronous signals, such as terminal interrupts, can be trapped and handled as required by the C program. Up to sixteen user-defined signals can be added. (Exclusive.) An important extension to signal handling is support for IUCV signals in programs running in VM (exclusive).

• Subcommand processing functions that allow a C program to get input from subcommands contained in a CLIST or EXEC (exclusive).

• Support for Rexx in CMS, including the ability to create a Rexx function package containing C functions that can be called as a Rexx function or subroutine (exclusive); the ability to get, set, or drop Rexx variable values; and the ability to put lines on the stack in either LIFO or FIFO order (user request).

• A set of coprocessing functions that allow a C function to execute as a set of cooperative processes (exclusive).

• CMS low-level I/O functions that interface directly with the CMS file system macros, and a set of XEDIT I/O functions that allow a program to read and write directly to a file in XEDIT.

The ANSI fsetpos and fgetpos functions were added to the library, allowing random access to most file types. These functions were added by the ANSI committee in order to provide a more general method of file positioning than had been allowed by the fseek function. The library I/O functions were enhanced to allow I/O to VSAM files, including ESDS and KSDS file types. In CMS, the I/O functions were enhanced to allow input to come from a file in XEDIT.

Documentation

Release 3.01F was accompanied by a set of four manuals. These manuals remain the standard documentation for the SAS/C products. They are

     SAS/C Compiler and Library User's Guide
     SAS/C Library Reference, Volume 1
     SAS/C Library Reference, Volume 2
     SAS/C Source Level Debugger User's Guide

RELEASE 4.00C

Release 4.00C of the SAS/C product went into production in February of this year. This is the first mainframe release to be entirely developed at the Institute since the acquisition of Lattice, Inc. The Language Systems Department became the C Compiler Development Division. The division was expanded to include developers for both the PC compiler and the mainframe compiler, and includes a complete testing department.

The major enhancements to the product are complete C language conformance to the latest draft ANSI standard, extensive support for communication with other high-level languages, and the addition of a global optimization phase. In addition, a new product was added to the line, the SAS/C® Full-Screen Support Library.

The Compiler

The major change to the compiler was complete conformance to the current draft ANSI standard. In addition, a global optimizer was added to the compiler, as well as several language enhancements.

The Global Optimizer

While previous releases of the compiler have always emitted highly optimized machine code, the compiler was largely limited to optimizing small sequences of statements. However, with the addition of the global optimizer, the flow of control and data through an entire function can be analyzed, resulting in considerable savings in code space and execution time. Some of the optimizations performed are listed below.

• Heavily used variables are automatically allocated to machine registers. The global optimizer will allocate up to six auto integer and pointer variables to general registers, and up to two floating-point variables to floating-point registers. Although the C language allows variables to be given the register storage class, this declaration does not allow the variable to be assigned to a register at one point in a function and not at another point. This is less than optimal if the variable is heavily used at some places and not in others. The global optimizer will change a register's assigned variable in mid-expression, if necessary, in order to keep the most heavily used variables in machine registers throughout the function.

• Assignments that are not referenced later, called dead stores, are eliminated. The global optimizer keeps a record of references to all of the variables in a function. If a value is assigned to a variable, but that variable is never referred to later, then the code to make the assignment is not generated.

• Invariant calculations are moved out of loops. If an expression (for example, an array reference with a constant index) is found in a loop and its value does not change within the loop, it is moved out of the loop.

• Common subexpressions are merged.

• Constants are propagated and any resulting constant expressions are folded. If a variable is assigned a constant value and not modified throughout the function, the global optimizer replaces references to the variable with the constant. If the variable is the wrong type, the global optimizer creates a constant of the correct type. For example, if a constant int variable is only used in float expressions, then the constant will be represented as a float constant.

• Unused code is eliminated. If the global optimizer's analysis of control flow through the function shows a section of code that cannot be executed, it will remove the code.

• Very busy expressions are hoisted. Expressions that are computed along all paths from a given point in the function are called very busy expressions. If the global optimizer detects a very busy expression, it moves (hoists) the computation to a single, common location.

Language Extensions

Several useful extensions to the language were added for Release 4.00C. All of these extensions are exclusive to the SAS/C product. They include

• the ability to declare a union with no tag and no identifier, known as an anonymous union. Anonymous unions can greatly simplify references to objects with complex definitions, as well as allow a simple member to be changed to a union.

• the ability to declare an array with zero members in a structure. The array occupies no storage, but the name of the array can be used in expressions, and the member following the array is correctly aligned for the array type.

• the __asm keyword. This keyword may be used in the declaration of a function or function pointer. It indicates that the compiler should assume that the function is written in assembly language and should generate an appropriate function call and parameter list.

• the ability to declare char, short, or long bit fields. Additionally, the default type (either char, short, or long) of an int bit field may be specified via an option.

The memcpy, memset, and memcmp built-in functions have been enhanced so they generate the smallest code sequence possible based on the type of the length parameter. If the type is char, for example, then only the code necessary to operate on 255 or fewer characters will be generated. (This was added in response to user request.)

Interlanguage Communication

Release 4.00C contains a new feature called Interlanguage Communication, or ILC. This feature is a combination of compiler enhancements, run-time library enhancements, and a new linking utility called ILCLINK. Together, these items allow C programs to call, and be called by, programs written in other high-level languages. The languages supported are FORTRAN, PL/I, COBOL, and Pascal. It is also possible for a site to add support for additional languages.

ILC support allows a programmer to freely mix routines written in any of these languages in a program. Some of the benefits of doing this are

• using existing subroutines written in another language

• using the language most natural for a specific part of the application

• easy conversion of a program from one language to another.

The SAS/C product's Interlanguage Communication feature surpasses the mixed language support found in some other high-level languages, because it allows each language's environment to be responsible for the behavior of that part of the program. For example, PL/I ON conditions are handled by PL/I, and C signals are handled by C. Other high-level languages may require the programmer to disable all exception handling in both languages. If the other language has its own debugger, both it and the SAS/C source-level debugger can be used simultaneously to debug the program.
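Three of the global optimizations described for Release 4.00C (dead store elimination, movement of invariant calculations out of loops, and constant propagation) can be illustrated by writing the same function twice. This is a hand-written sketch in portable C, not output from the SAS/C optimizer, and the function names sum_scaled and sum_scaled_optimized are hypothetical. The second function shows the kind of code an optimizer of this sort might effectively compile the first one into.

```c
#include <assert.h>
#include <stddef.h>

/* The function as a programmer might write it. */
int sum_scaled(const int *values, size_t count, int base)
{
    int scale = 4;      /* assigned a constant and never modified */
    int total;
    size_t i;

    assert(values != NULL || count == 0);

    total = base;       /* dead store: overwritten before it is read */
    total = 0;
    for (i = 0; i < count; i++) {
        /* base * scale + 1 has the same value on every iteration */
        int offset = base * scale + 1;
        total += values[i] * scale + offset;
    }
    return total;
}

/* The same function after applying the transformations by hand. */
int sum_scaled_optimized(const int *values, size_t count, int base)
{
    /* scale replaced by the constant 4; invariant moved out of the loop */
    const int offset = base * 4 + 1;
    int total = 0;      /* the dead store to total is gone */
    size_t i;

    assert(values != NULL || count == 0);

    for (i = 0; i < count; i++) {
        total += values[i] * 4 + offset;
    }
    return total;
}
```

Both versions return the same result for any input that does not overflow; only the amount of work done inside the loop differs.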

The Run-Time Library

The run-time library has been enhanced in many areas, but the two most important are the additions of an all-resident library and the Systems Programming Environment (SPE).

The All-Resident Library

Many users requested that the Institute provide a version of the run-time library that could be linked into a C program. In response, an all-resident library has been provided that supplies this function. Linking the C program with the all-resident library results in a stand-alone load module that contains all of the support routines necessary for the program to execute. Such load modules are easier to transport to another location because the transient library does not have to reside in the new location.

Linking the program with the all-resident library is largely automatic. However, the programmer may choose to tailor the process in order to include necessary parts of the library and exclude those deemed unnecessary. The tailoring process is driven by a C include file named resident.h. When a C source program containing resident.h is compiled and the resulting object code is linked with the program, the selected routines from the all-resident library are included in the resulting load module.

The Systems Programming Environment

The C language started out as both a general programming language and as the systems programming language for UNIX operating systems. While it has been successful as a general-purpose application language on the mainframe, so far little attempt has been made to use it as a replacement for assembly language in systems programming applications. The Systems Programming Environment is intended to support such applications.

SPE is essentially a specialized version of the run-time library. This version may be grouped into five sets of support routines:

• general routines for program start-up, execution, and termination. This includes program entry, command-line processing, stack and heap memory allocation, and program exit.

• synchronous and asynchronous interrupt handling routines. A C function may be entered in response to an interrupt.

• special versions of functions found in the full C run-time library. These functions, such as malloc and free, have been written especially for the SPE library.

• interfaces to operating system services such as SVCs.

• a subset of the functions available in the full C library. These functions are those that depend upon few or no operating system services. They include string functions, math functions, and low-level I/O functions.

Most of the routines are provided in source code and may be modified as necessary for a specific environment or application.

Separate from, but important to, SPE is the DSECT2C utility program. Most systems programming applications require access to system data whose definition is only available via an assembly language DSECT. In the C language, such data should be mapped using a structure. Transforming these DSECTs into C structures is a tedious process. If many DSECTs need to be converted, this transformation is such an overwhelming process that it is easier to write the application in assembly language. Therefore, Release 4.00C of the SAS/C product includes the DSECT2C utility. DSECT2C automatically transforms any assembly language DSECT into the equivalent C structure definition. As part of the transformation,

• assembly language data types are converted into C data types

• objects with overlapping definitions are converted into unions

• assembly language EQU instructions are converted into C language macro names

• C language macro names are produced for each assembly language symbol

• a cross-reference is produced. The cross-reference section shows the assembly language symbol, its offset, length and data type, the associated C data type, and the C identifier.

Other Library Enhancements

In addition to an all-resident library and SPE, two other enhancements to the run-time library deserve special mention:

• New functions have been added. All of the following functions conform to the draft ANSI standard:

     assert    strtok    strerror
     atexit    memchr    vprintf
     setbuf    labs      vsprintf
     setvbuf   div       vfprintf
     memmove   ldiv

• All C I/O functions support the reading and writing of VSAM linear data sets in MVS/XA using the data-in-virtual (DIV) facility.

The Source-Level Debugger

For Release 4.00C, the source-level debugger includes the following enhancements:

• the MONITOR command, used to interrupt the program when a data object, such as a simple variable or structure member, changes value. (This command was implemented in response to user requests.)

• the STORAGE command, used to display memory use summaries and to diagnose memory overlays caused by program bugs.

• the WHATIS command, which displays the type of a variable.

• the size of the debugger symbol table file has been reduced by as much as 30-50%.

• a reference to an object can specify that the type of the object is defined in another function in the calling sequence.

The Full-Screen Support Library

One of the most frequently requested items has been support for full-screen programming. Our response is the Full-Screen Support Library (FSSL), a collection of functions that provide a high-level interface for developing full-screen applications on mainframe systems. Its capabilities include

• an interface to IBM data stream programming for 3270 devices

• generic or abstract screen manipulation

• complete portability between MVS and CMS

• 3270 device independence.

FSSL provides functions to

• initialize and terminate a full-screen session

• define screen viewing areas, known as panels

• define individual fields within the panels

• display and read data in the panels.

Documentation

Release 4.00C is accompanied by four new manuals,

     SAS Technical Report C-106, Changes and Enhancements to the SAS/C C

FUTURE DIRECTIONS

Given that the SAS/C product is now in its fourth production release in its five years of life, it is not hard to predict that more new features will be added in future releases. The task of the C Compiler Development Division is to select those additions which will prove the most useful to you. As in the past, your input will help us make these decisions. The Institute has a number of mechanisms in place for this sort of input, the most popular of which is the SASware Ballot®.

Some directions are clearly indicated; others need more input. A number of possibilities are discussed in the following paragraphs. Remember that, unless specifically noted, the mention of a possible feature does not represent a promise on our part to implement the feature. These items are noted solely as points for discussion.

The Compiler

Now that the language accepted by the compiler is defined by the ANSI draft standard, it might seem that this part of the product could become stable. However, the draft standard specifies mechanisms by which implementation-dependent language extensions may be added. For example, the __asm keyword follows those specifications for the addition of new language keywords. Clearly there is room for new language extensions. One continually popular request is for a packed-decimal data type. Other popular SASware Ballot items are vector code generation and assembler source code generation.

The second most popular suggestion in this year's SASware Ballot was for a C++ compiler. The C++ language is becoming extraordinarily popular, and so we will do our best to provide one in a future release.

The Run-Time Library

In the next major release, the library will conform to the draft ANSI standard.

The Institute has received many requests for support for exploitation-mode execution under VM/XA SP. This support will be available in the next major SAS/C release.

The Source-Level Debugger

One of last year's most popular SASware Ballot items was a suggestion that the debugger support expressions in its commands. This enhancement has been added to the next major release.

The top vote-getter in this year's SASware Ballot was for a full-screen debugger. This will be added to SAS/C software in a future release.

Documentation

Table 1 summarizes all of the features discussed in the preceding sections. Each feature is marked with an X under the release in which it was introduced and any releases in which it was enhanced.

REFERENCES

Kernighan, Brian W. and Ritchie, Dennis M. (1978), The C Programming Language, Englewood Cliffs, NJ: Prentice-Hall Inc.

SAS/C and SASware Ballot are registered trademarks of SAS Institute Inc., Cary, NC.

IBM is a registered trademark of International Business Machines Corporation, Armonk, NY.

Lattice is a registered trademark of Lattice, Inc.

MVS/XA is a trademark of International Business Machines Corporation, Armonk, NY.

UNIX is a registered trademark of AT&T.

Table 1  SAS/C Features Listed by First Release

Feature                                  Release introduced (2.10C  3.00F  3.01F  4.00C)

__asm keyword                            x
__LINE__ and __FILE__ predefined macros  x
bsearch and qsort functions              x
const and volatile type qualifiers       x
enum data type                           x
fsetpos and fgetpos functions            x
register storage class                   x
user-added signals                       x
void data type                           x
Access to files in XEDIT                 x
All-resident library                     x
Anonymous unions                         x
ANSI language conformance                x
Augmented I/O functions                  x
Built-in functions                       x  x  x
Call-by-address parameter lists          x
Communication with assembly language     x
Complete C library                       x
Coprocessing functions                   x
Cross-reference listing                  x
CLINK object code preprocessor           x
CMS low-level I/O functions              x
Digraphs for special characters          x
DIV support                              x
Dynamic loading functions                x
DSECT2C                                  x
Full-screen support library              x
Generalized Operating System library     x
Global optimization                      x
GENCSEG                                  x
In-line machine code                     x
Independent execution                    x
Interlanguage communication              x
IUCV support via signal                  x
MONITOR debugger command                 x
Non-int bit-fields                       x
Non-reentrant code                       x
Object Module Disassembler               x
Operating system interface functions     x  x
OS low-level I/O functions               x
Prototypes                               x
purelib library                          x
Reentrant code                           x
Rexx support functions                   x
Signal functions                         x
Source listing                           x
Source-level debugger                    x  x
String translation functions             x
Structure assignment, structure arguments, and structure return values  x
Subcommand functions                     x
Systems Programming Environment library  x
STORAGE debugger command                 x
Transient library                        x
Upper- and lower-case string functions   x
UNIX-style I/O functions                 x
VL-format parameter lists                x
void * data type                         x
VSAM linear data set I/O                 x
VSAM I/O                                 x
WHATIS debugger command                  x
Zero-length arrays                       x
