PORTABILITY OF LARGE SAS(R) CODE ACROSS OS BATCH, CMS, VMS™ AND THE SAS(R) SYSTEM FOR PERSONAL COMPUTERS

Larry McEvoy, Washington University

ABSTRACT

The SAS(R) Companion for the CMS Operating System (1986 Edition) states on page 1, "Generally speaking, SAS programs and their results are the same, regardless of the host operating system." The purpose of this paper is to determine the extent to which SAS Institute is generally speaking.

In the Department of Psychiatry at Washington University, we have developed a large SAS program of over 1 1 600 statements which scores a standardized psychiatric interview and creates a data set of over 1,700 variables. Other institutions often wish to run this scoring program on machines other than an IBM mainframe. In addition to aiding such users of our own program, it is hoped that this exercise of running large SAS code under different environments will be of more general help to those wanting to write SAS code that is independent of the host operating system.

HISTORICAL DEVELOPMENT OF SCORING PROGRAM

Our scoring program was written with the 1979 edition of SAS. At that time, the SAS User's Guide (1979 Edition) indicated that "SAS runs only on IBM 360/370 computers (and plug-compatible machines such as Amdahl, Itel, CDC Omega, Magnuson, Ryad, etc.) under OS or OS/VS (p.3)." As a result, there was no thought given to writing the program to run on a variety of machines with different operating systems.

The scoring program was designed in sections, since it is possible to limit the interview to selected sections. This was achieved by storing each section of the scoring program as a SAS MACRO. These refer to the old-style MACRO statements which begin with the word MACRO and end with a percent sign. These MACRO statements should not be confused with the MACRO language which was not available in the 1979 release.

We copied these MACROs to tape and then instructed users to concatenate this file to the front of the SYSIN file in their JCL. This was the method for simulating a MACRO library recommended in the 1979 Manual (p.12).

Another feature of SAS79 that later proved important relates to the ARRAY statement. The current Version 5 Manual distinguishes between explicitly and implicitly subscripted ARRAYs. In the 1979 edition there was no such distinction. The only type available in this earlier manual is now described as implicitly subscripted.

After our scoring program was initially developed, it was scheduled to be used in a large scale multi-centered research project. Each of the five centers in the project extensively reviewed and critiqued the code over the course of a year. After this review process was complete, the sentiment was to freeze the code. In this way other users could use the same exact program and compare their findings to those in this large scale project. Such standardization would greatly facilitate cross-study comparisons in a field plagued by inconsistent definitions and widely variant measures.

It was only after SAS Institute greatly expanded the types of machines and operating systems under which SAS software could run that we had to think of how the program could be used in these different environments. It was clear that parts of the program would have to change since some of its conventions are not supported under all operating systems. It was not clear, however, whether all operating systems could handle a program this size, whether performance would be acceptable, and whether the same results could be guaranteed.

TEST RESULTS

Whenever we distribute our scoring program, we include on the tape a file of 34 test cases stored according to the input format specified by the deck and column numbers contained in the interview. Along with the scoring MACROs, there is also a MACRO for the INPUT statement for the test data. Users are instructed to read the test cases with the INPUT MACRO, run the data set through the scoring MACROs, and do a FREQ and PRINT of forty-four variables created by the program. The output of these procedures should match the output we enclose with the tape. For the initial test we attempted to match these test frequencies across the four operating systems.

As an additional test, we ran the program with 3,400 observations to determine how performance under the operating systems would compare with a heavier load. These 3,400 observations were obtained by outputting each of the original cases one hundred times and passing this larger data set to the scoring program. The same PROC FREQ was then performed on the larger data set, but the PROC PRINT was omitted to avoid a lengthy listing.

Table 1 displays the four operating systems used in the testing, the versions of those operating systems, the hardware which was used, and the mode of SAS execution. The test results are described below.

I. OS BATCH

Since the scoring program was written to run under OS, it ran without complaint on an IBM 4341-2. A MACRO library was simulated by concatenating the MACROs to the front of the JCL SYSIN file, and the MACROs were invoked after the DATA and INFILE statements. The DATA step read in the 34 cases and created 1,743 variables: 1,074 variables coming from the interview's INPUT statement and another 669 variables being created from the scoring program. This DATA statement took 52 seconds and used a6aK. The PROC FREQ of the selected forty-four variables took 3 seconds and used 1140K. The PROC PRINT of these same variables also took 3 seconds and 996K.

The data steps necessary to create and score the 3,400 observations took a total of 5 minutes and 47 seconds. The PROC FREQ of this larger data set consumed 20 seconds. A comparison of these CPU times is displayed in Table 2.

II. CMS

The test on the CMS operating system was also run on an IBM 4341 mainframe. Perhaps it is not surprising then that only one modification was needed when changing to this environment. Since there is no JCL in CMS, the MACROs had to be identified with CMS FILEDEFs which could be %INCLUDEd into the job stream. Once this difference in referencing external files was made, the program ran identically to the OS version. The DATA statement took 48 seconds and used 916K. The PROC FREQ used 3 seconds and 1492K. The PROC PRINT used 2 seconds and 1492K.

The data steps necessary to create and score the 3,400 observations took a total of 4 minutes and 42 seconds. The PROC FREQ of this larger data set consumed 22 seconds. A comparison of these CPU times is displayed in Table 3.

III. VMS™

The story becomes more complicated when switching operating systems to VMS and switching hardware to a VAX™ 11/785. Version 5.16 of SAS under VMS supports neither the old-style MACRO statement nor the MACRO language. Thus, a different approach was needed to make the INPUT statement and scoring statements available to the SAS job.

We first tried deleting the word MACRO and the % symbol from each of the MACROs, and then using %INCLUDE to bring in the entire scoring program as a single block. This proved unsatisfactory during the debugging process. Since the code is lengthy, we did not want a dumping of the entire program with each attempted run. This was made an even more significant consideration by the additional lines generated for each line of code by the %INCLUDE statement indicating the logical name of the external file being referenced. Thus, the NOSOURCE2 option, which suppresses the SAS statements from external files in the current %INCLUDE statement, looked desirable. (Curiously, NOSOURCE2 is not available under VMS as an option of the %INCLUDE statement, but is available as a SAS system option.)

Unfortunately, the NOSOURCE2 option makes debugging impossible. Whenever a SAS error is encountered, the offending statement is also suppressed. The log contains error messages but no SAS statements to which they refer. This differs from results obtained with the NOSOURCE SAS system option. With this option, SAS statements with errors are printed to the log while the printing of the other SAS source code is still suppressed. Consequently, we placed the INPUT statement and scoring statements directly into the program and used the NOSOURCE system option.

The problems encountered with compatibility can be grouped into three areas:

A. syntax in the INPUT statement
B. implicitly subscripted ARRAYs
C. size of the program

These areas are discussed below.

A. SYNTAX IN THE INPUT STATEMENT

The original INPUT statement made use of the n* modifier which specifies in format lists that the next format is to be repeated n times (p.137). The footnotes to the INPUT Statement section of the User's Manual, however, indicate that the n* modifier is not available for AOS/VS, PRIMOS, and VMS. The original statement contained expressions such as (X Y Z) (@1 3*1.). These had to be changed to read @1 (X Y Z) (3.). There were 66 such occurrences in the INPUT statement.

B. IMPLICITLY SUBSCRIPTED ARRAYS

As mentioned previously, the only type of ARRAY statement available at the writing of the program is now described as implicitly subscripted. The original program contained the following type of construction:

    ARRAY Z X1-X9;
    DO OVER Z;
      IF Z=1 THEN Y+1;
    END;

Only explicitly subscripted ARRAYs can be processed under VMS. Thus, the above constructions had to be changed to:

    ARRAY Z(*) X1-X9;
    DO I=1 TO DIM(Z);
      IF Z(I)=1 THEN Y+1;
    END;

The program contained 39 ARRAY statements and 31 DO OVER statements which had to be changed in this manner.