Writing Portable C Code: the SAS Institute Experience
Total Page:16
File Type:pdf, Size:1020Kb
Writing Portable C Code: The SAS Institute Experience Richard D. Langston SAS Institute Inc. This paper illustrates how SAS Institute Inc. went the VM family, so the emphasis was placed on per about converting a large software system from PL/I formance within OS systems instead of portability. and IBM® 370 assembly language into a new ver Of course, the user community quickly recognized sion written almost entirely in C. The paper briefly the usefulness of the VM family and began an alle discusses the history of so~tware development at giance to those operating systems that precipitated the Institute, then explains the rationale behind the the Institute's need to convert the SAS System to company's decision to convert to a new language. run under VM. From there, the Institute's experience in writing portable software is discussed. In addition, the pa This effort was accomplished while maintaining per touches on the development process the Institute the aforementioned hybrid of languages. Very little used, including the adaptation of software coding PL/I code was added to support VM; most of the standards. system modifications were to the supervisor. There fore, most modifications were m_ade in assembly lan Let us begin by tracing the history of program guage. The conversion goal was to change as small ming language usage at the Institute. SAS® soft amount of procedure code as possible and allow the ware was originally implemented for OS-type oper supervisor to recognize the environment and deal ating systems (MVT, MFT, MVS, VSl). The SAS with its eccentricities, relieving the procedure writ System was implemented using a combination of ers to once again concentrate on their specialties. PL/I and assembly language. The assembly lan guage portion was known as the "supervisor" and However, certain aspects of the VM operating was implemented as such for efficiency, speed, code system certainly made their way to the procedure size, and any other characteristic for which assem writer's level. For example, the MVS operating sys bler is best suited. The PL/I portion of the system tem fills dynamically allocated memory with zeros was that of the procedures (programs written to ac before the application receives the pointer to the cess and create data using the supervisor through a memory. The PLjl programmers were sometimes common set of routines). PLjl was chosen here be unknowingly taking advantage of this fact by not cause th;: procedures were performing a higher-level initializing variables (whose initial values would be set of functions, such as statistical analysis and re zero nonetheless). 0 nly when their procedures were port writing. Procedure writers could spend more tested under VM did this fact arise. This was the time concentrating on the implementation of com first of many occurrences of recognizing the differ plex algorithms instead of concerning themselvses ence an operating system can make on the perfor with how-many base registers to have available. mance of an application that is supposed to be op erating system-independent. This hybrid of language served the Institute well until it was decided to transport the SAS System to The conversion for VfVl was accomplished pn other operating systems. Recall that at the time the marily by OS simulation"; that is, simulating OS system was first implemented, the IBM OS family of SVCs and control blocks when system calls were operating systems was much more widespread than made. However, when the Institute converted to 26 the DOS JVSE system, the system developers rec were available for all the systems. Third, conversion ognized the need for isolating the system-dependent woul-d not be as painful since the procedures were portions of the code so that these could be replaced already in PLjl, albeit using the PL/I Optimizing when moving to a new environment. Once again, Compiler as a base language_ very little of the procedure code had to be changed to function under DOSjVSE. The programming staff quickly discovered that each vendor's implementation of the PLjllanguage The SAS System remained an IBM mainstay was different. As the staff gained experience with through the 82.4 release of the product. The sys the compilers, a set of coding standards was adapted tem did not function on any other computers other to ensure that the code would compile on any of the than those within the 370 family (using the oper machines. Some of the simpler decisions made were ating systems VSjl, MVS, VMjCMS, DOSjVSE). that such characters as '#' and '@' could not be However, just as the Institute recognized a grow used in the names of variables because the ANSI ing force in the user community for the VM family, Standard did not include them. The use of # and @ there was acceptance of the fact that even more in variables was a common practice among the de operating systems seemed likely candidates for con velopers using the more amiable Optimizing Com version. The user community had been asking for piler. Another common practice was incrementing minicomputer support, primarily on Digital Equip pointers by overlaying an integer on the pointer and ment Corp.'s VMS® Data General's AOSjVS, and incrementing the integer. This became verboten in Prime's PRIMOS® operating systems. the portable implementation due to pointer sizes on some machines. These are but two of many eXam But converting to those operating systems would ples of differences that were dealt with. Striving be much different than converting from MVS to for portability across operating systems became a VMjCMS. After all, the IBM operating systems all common goal among all developers, and this virtue use a common assembly language and PL/I com carried on throughout the minicomputer implemen piler (not a completely. true statement but relatively tation and is still apparent in the latest design effort speaking, close enough)_ The three minicomputer using the C language. operating systems all had their own assembly lan guages and different Pljl compilers (all different The minicomputer and mainframe implementa implementations of the ANSI standard), and they tions merged with the advent of the Version 5 SAS were architecturally far removed from the IBM 370 System. The procedure code implemented for the family. A totally new implementation of the SAS minicomputers was also functional on the mainframe System would be needed. And, to simplify this im system_ Only the supervisor was different between plementation, the largest possible subset of the sys the two systems. The supervisor on all three mini tem needed to be implemented in a portable lan computer systems was the same, with the exception guage so that the same code would function on all of small amounts of host-dependent code, necessary three minicomputers_ to perform the low-level tasks demanded of a com plex system. The final outcome of the design was that the su pervisor for the system was wriiten in PLjl. This lan As the Version 5 SAS System made its way into guage was chosen for several reasons_ First, almost the user community, the IBM PC was quickly evolv all of the Institute programming staff was familiar ing into a viable environment for a system as com with it, having written procedures in this language_ plex and expansive as the SAS System. An experi Second, the intention was to once again have the mental project began on this microcomputer, using procedures be as portable as possible across oper C as the implementation language. C was chosen ating systems (IBM mainframes and the minicom because there was not a fully acceptable PLjl com puters). At the time, PLjl was the language with piler available for the machine_ C has many charac the most power and flexibility for which compilers teristics of Pl/l, so its choice was not an unreason- 27 able one. And its functionality on a microcomputer tion's NOS IVE. The Institute also realized that was well recognized. many other operating systems had C compilers, and if a new implementation would function success Once again, the Institute was faced with needs fully on all of the systems currently available to the of the user community that would require massive Institute, new operating systems could be added changes to system implementation. Was it better expediently and with much less "pain" than with to develop a system written in ( solely for the P( VM/(MS, DOS/VSE, or the minicomputers using' environment, similar to the Version 5 PLjl system PL/L written for the minicomputers? It was decided that the best approach was to rewrite the entire system Most developers accepted the ( language quite tn C in such a way that it would be portable across readily. As stated earlier, there are' similarities be all operating systems, so that minimal reimplemen tween PL/I and C. The most obvious are the use tat ion would be necessary when stili more computer of semicolons as statement ddimiters and j* * j as systems became viable as environments for the SAS comment delimiters. My own experience with learn System. ing C was that it was initially a "step down" from PL/I; the string-handling capabilities of ( are min The biggest problem with this decision was com imal, and it has the feel of a low-level language, piler availability. All of the minicomputers on which unlike PL/I. However, as more developers began to . the SAS System ran had good ( compilers, anq convert to the new language, more subroutines and there were several available on the IBM Pc. How macros were created and C began to have the look ever, such was not the case for the IBM mainframe.