THE BEST OF ALL POSSIBLE WORLDS: USING THE SAS SYSTEM® UNDER MVS, PRIMOS AND MS/DOS®

Frederick Pratter, Abt Associates, Inc.

One of the most remarkable, though logical, developments in the evolution of the SAS System has been the success of portable SAS. As recently as 1982, the task of rewriting the SAS System for mini and micro computers, while preserving its ease of use and functionality, seemed nearly impossible. At SUGI 8, in New Orleans in 1983, it was announced that Version 5, the first portable version of SAS, had required the development of over half a million lines of code, most of it from scratch. After five years of further refinement, I am glad to be able to say that most of it works.

This paper is intended to document the experiences of one group of users with Version 5 under MVS and Primos, and Version 6 under MS/DOS, and our efforts to develop a distributed processing system that would make maximum use of the SAS System in these environments. This effort should be of interest both to applications programmers running SAS in a distributed system, and to managers contemplating such a move.

First, however, I need to describe the systems resources that are currently available to my organization. Abt Associates Inc. is a research based consulting firm headquartered in Cambridge, Massachusetts. Our clients include a number of federal government agencies, such as the Health Care Financing Administration, the National Institutes of Health, and the Departments of Health and Human Services, Labor and Justice. In addition, we provide consulting services to some of the nation's largest banks and insurers, computer manufacturers, pharmaceutical firms and several large automobile manufacturers. The data processing staff supplies systems design, database management and statistical programming support to our research analysts. Obviously, in such a multivariate environment, the flexibility and comprehensiveness of the SAS System are extremely useful. About half of all the code currently written at Abt Associates Inc. is in SAS.

Three different computer systems are available (see Figure 1). We maintain accounts at several local service bureaus; most mainframe applications are run on an IBM 3084 using Version 5.08 under MVS/XA. (Version 5.16 will be available shortly.) Version 82.4 is also still available at this installation, and this paper will include some discussion of the relative advantages of the older version. In-house at Abt Associates, we have a Prime Model 9955-II that supports SAS Version 5.04. Finally, PC SAS (Version 6.02) is presently licensed on more than 20 IBM and IBM compatible micro computers. This paper will focus on how we have attempted to find the optimum mix of uses for each of these three environments, in particular on how our initial objectives had to be modified in the light of experience, and what conclusions we have reached regarding the most effective way to take advantage of the flexibility that the SAS System offers.

There are a number of advantages to using local computing resources (the Prime and the micros) as opposed to a remote mainframe. First, it is considerably cheaper. Although we occasionally require the level of processing power and the data storage capacity that only a large mainframe installation can supply, much of our work can readily be done at a lower cost on a smaller machine. Even were cost not an issue, however, there are still advantages to a distributed system. The Prime is particularly well suited to fast turnaround jobs, as for example when a tape comes in, files are merged, and mailing labels generated that same day. Sending the data out to the service bureau and getting output back rarely takes less than a day. Finally, as those of you who have used PC/SAS have surely discovered, the fixed monthly cost of a micro computer (as opposed to the metered use of a time sharing system) encourages the kind of "what-if" exploration that often leads to new insights into the data.

Therefore, our initial objectives in exploring the possibilities of a distributed system were twofold (see Figure 2). First, we anticipated that code could be generated in the relatively low cost local environment, and then ported to run on the mainframe. The theory was that we could get all the bugs out of the software, and output formats finalized, without incurring commercial time sharing rates. Second, for analyses that were appropriate for PC applications, we thought we could use the larger systems as file servers (data storage) for our micros. Incidentally, these considerations apply to both the service bureau mainframe and the in-house Prime, as we are charged for using the latter, albeit at a lower rate than for the mainframe. The Prime is used at Abt Associates as a data entry machine, since it can support multiple users; thus many of the survey data sets that one would like to analyze on one's PC are resident on the Prime.

It should be noted, at this point, that the downfall of many distributed systems is the necessity of maintaining multiple, independent copies of database files. This can lead to situations where edits to one copy of a file are not posted in a timely fashion to all other copies, with the result that different analysts are using different versions of the data. Too often, this can lead to exasperating and time consuming efforts to figure out why the tables don't agree. In a micro computer environment, where everyone has an individual copy of the data, this is particularly critical. The only solutions are either the utmost vigilance on the part of the data base manager, or else some kind of network, in which one copy of the data is maintained on a single file server machine and is equally available to all users. The Prime makes a convenient and powerful node for our micro network, and offers the further advantage that tape datasets can be readily accessed, something that cannot easily be done in all micro computer networks.

Before getting to the results of our experience, it is useful to review the different kinds of physical connections among the three systems. There are four different kinds of communications protocols available:

1. Asynchronous Dialup (Figure 3) - By connecting a modem to the Com port of a PC, it can be converted into an asynch terminal. A terminal emulation software package must be installed in order to use the PC as a terminal, but these are fairly low in cost, and provide a high level of functionality, allowing file transfer as well as online access. The SAS micro-to-mainframe link procedures can be used if the host is an IBM system running TSO or CMS and Version 5.16 of the SAS System. These procedures are designed to provide two-way file transfer. That is, programs resident on the PC can be run on the mainframe (using the RSUBMIT command) and SAS files moved back and forth (using the UPLOAD and DOWNLOAD procedures). The only disadvantages of an asynch link are that it is relatively slow (even at 2400 baud), and that if the telephone circuit goes through a public exchange there is likely to be a significant level of transmission errors. The bit error rate for a voice grade communication line is approximately 1:1000, and the asynch protocol supplies little in the way of error correction. For these reasons, at Abt Associates file transfer is rarely done in this way. It is faster for those users whose PCs are configured in this way just to use the micro as a dumb terminal, log into the mainframe or the Prime, and use the online editor of the host for program development.

2. Bisynchronous 3270 (Figure 4) - For installations communicating with an IBM mainframe via a dedicated data line, another option is to use the PC as an IBM 3270 type terminal connected to a local controller. This requires a bisynch board and the installation of the appropriate terminal emulation software. Such a configuration looks like another 3270 terminal to the control unit and has several advantages over the asynch approach. Foremost of these is that it is faster. In a local system, response is virtually instantaneous, while in a remote link, such as that used at Abt Associates, response time is still very good. Also, this configuration allows the use of full screen editing on the host, which the asynch link typically does not. (Our service bureau has a VTAM emulator that simulates 3270 operation in asynch mode, but at dialup speeds it is just too frustrating to use.) A bisynch connection also allows the use of all the features of the micro-to-mainframe link described above. There are, however, several disadvantages to this approach. For one, the cost of the bisynch board and the terminal emulation software can be more than that for a 3270 type terminal. None of the available emulators can provide all of the functions of a real 3278. In particular, some of the keys must be simulated since they are not present on the standard IBM PC keyboard. For these reasons, most of our staff prefer to use a separate terminal and PC. Finally, using 3270 mode means that data transfer has to compete on the line with the interactive applications, which can degrade response time for the latter noticeably. We have a couple of PCs configured this way, but they are only used for certain special applications.

3. 3780 HASP Emulation (Figure 5) - Having the Prime in-house gives us another option. This is a two step process that offers more flexibility than either of the first two modes. The mainframe and the Prime are connected via a digital data line. This allows files to be moved back and forth at high speed with bit error rates of <1:10,000. The micros are already hooked up to the Prime via direct connections, so data files on the Prime can easily be downloaded to the PC, and vice versa. Since, as noted above, the Prime can be used as a file server for our PC network, the staff are familiar with the latter process. The disadvantages of this two step process, however, are that since only batch mode transmission is possible, the micro-to-mainframe procedures don't work. In addition, it is more time consuming to log onto the mainframe to send a file to the Prime, then onto the Prime to transfer the file to the PC, than a one step process would be. Nevertheless, this is the primary mode of file transfer at Abt Associates, since it is reliable and easy to use.

4. SNA 3270/3777 (Figure 6) - The most sophisticated communication mode available is via an SNA gateway installed on the mini computer, allowing it to function simultaneously as a batch remote and as a controller for attached terminals. All of the software is resident on the mini; the only requirement of the PC is that it can function as a workstation to the mini. The charm of this approach is that one keyboard on the user's desk can alternatively be used as a standalone PC workstation, as a terminal on the mini, and as a remote 3270 terminal on the mainframe. Unfortunately, the micro-to-mainframe procedures will not work in SNA 3270 emulation mode, but files can still be transferred in batch mode, just as in #3 above.
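The difference between the dialup link and the digital line can be made concrete with a little arithmetic. The following Python sketch is an illustration only and was not part of the work reported here. The line speeds (2400 baud and 56K baud) and bit error rates (1:1000 and 1:10,000) are the figures quoted in this paper; everything else is an assumption: the 1,000,000 byte file size is hypothetical, baud is treated as bits per second, the asynch link is assumed to send 10 bits per byte (start and stop bits), and protocol overhead, retransmission and the sharing of the multiplexed line are all ignored.

```python
def transfer_seconds(size_bytes, bits_per_second, bits_per_byte=8):
    """Idealized time to move a file, ignoring retries and protocol overhead."""
    return size_bytes * bits_per_byte / bits_per_second


def expected_bit_errors(size_bytes, bits_per_error, bits_per_byte=8):
    """Expected corrupted bits at a rate of one error per `bits_per_error` bits."""
    return size_bytes * bits_per_byte / bits_per_error


FILE_BYTES = 1_000_000  # hypothetical file size, chosen only for illustration

# Asynch dialup: 2400 baud, 10 bits per byte (assumed), about 1 error per 1,000 bits.
dialup_time = transfer_seconds(FILE_BYTES, 2400, bits_per_byte=10)
dialup_errors = expected_bit_errors(FILE_BYTES, 1_000, bits_per_byte=10)

# Digital line: 56K baud, fewer than 1 error per 10,000 bits (upper bound used).
digital_time = transfer_seconds(FILE_BYTES, 56_000)
digital_errors = expected_bit_errors(FILE_BYTES, 10_000)

print(f"dialup:  {dialup_time / 60:.1f} minutes, about {dialup_errors:.0f} bit errors")
print(f"digital: {digital_time / 60:.1f} minutes, at most about {digital_errors:.0f} bit errors")
```

Under these assumptions the dialup transfer takes over an hour against a few minutes on the digital line, which is consistent with the preference for the two step route through the Prime.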

The obvious question, then, is under what circumstances is it most effective to use each of the three systems available? In order to answer this question, a simple test job was written to benchmark the different configurations (see Figure 7). This job was invented to combine floating point arithmetic with I/O. Five variations were tested:

1. an extremely fast, state-of-the-art micro with all available hardware options to speed processing;

2. a conventional 80286 machine (AT compatible);

3. the Prime 9955-II;

4. the IBM 3084 using SAS version 82.4, the last IBM 370 only SAS release; and

5. the IBM 3084 using a portable version of SAS, release 5.08.

Figure 8 shows, for four different size datasets, the results in processing time, calculated as the sum of CPU seconds as indicated in the SAS log for the three steps. A number of interesting points emerge.

First, the difference between the two micros is startling. The addition of advanced performance features clearly pays off in increased speed. The newer machine is more than twice as fast as the conventional AT compatible.

Second, I was unable to run the 50,000 observation benchmark on any of the micros available. This was not because of CPU limitations, but simply because none of the PCs had enough empty space on the hard disk for the work files. Every effort to run the large benchmark aborted with a "disk full" error message. In the real world, this is likely to be a constraint on all MS/DOS based software. Until SAS can run on machines that use the full capacity of the newer Winchesters, data files of this size will not be practical for PC applications.

The Prime, as expected, is slower than the mainframe (particularly for the big job), but faster than the micros.

As far as the two mainframe versions are concerned, a price has obviously been paid for the portability and the increased functionality of Version 5. The older version runs in 10-20 percent less time and uses less memory as well. Those users who are constrained to run in environments where core is at a premium could well benefit from using the older version for those applications where the enhanced features of Version 5 are not needed.

CPU time is not the whole cost of processing, however. One must also consider programmer labor. Staff time is still cheaper than machine time, on a per second basis, but on a time sharing system elapsed time can be as much as 60 times CPU time, because of paging and other considerations.

Figure 9 shows a slightly simplistic version of this comparison, factoring labor into job cost at a rate of $60 per hour (not an unrealistic rate for contract programming). For the two micro systems, no computer charges are shown; the cost of the machine is assumed to be zero, and only labor costs are illustrated. In the mainframe environment, however, programmer time disappears, and only service bureau charges are shown. Only large jobs on the Prime include both labor and computer charges. There are no circumstances in which it is cost effective to use the SAS System on the mini computer. Clearly, this machine should be relegated to those tasks which it performs well, such as tape handling, printing, communication and network management. It should not be used for analysis, at least not for SAS programming.

The cost breakpoint between mainframe and micro applications is somewhere around 5000 observations. The advanced processor is cheaper than the 3084 even at this level. Only when datasets become too large to fit into MS/DOS partitions does the mainframe become appropriate, and even here one can expect new generations of micro computers to become increasingly more useful.

I cannot leave this discussion without noting the one area in which micro computers are still outshone by their larger cousins. This is the problem of obtaining printed output.

We have linked our micros via parallel connectors to the latest generation of available desktop laser printers, but these still are limited to about 8 pages per minute. A 240 page output consequently takes about a half hour to print. The mini computers currently available support laser copiers that can generate up to 24 pages per minute, reducing the half hour figure to ten minutes. Even this cannot rival the giant laser printers available on the mainframe. Our primary service bureau has four of these, each capable of producing a 240 page report on single or double sided copy in a variety of type fonts in about 45 seconds. In an environment where time is critical, as in our consulting firm, there is no substitute for this kind of service. Nor is it likely that anyone would attach a $250,000 printer to a $2,500 computer. If the costs for staff time spent waiting for output are added in, there is definitely still a place for the large volume output capabilities of a mainframe system.

In conclusion, then, what can be said regarding the initial objectives that prompted the move to a distributed system? Figure 10 summarizes the results of our experience.

First, as long as Version 6 is a subset of mainframe SAS, it will be impractical to develop code on the PC to run on a larger time sharing system. It is unlikely that the limitations restricting micro computer applications will ever vanish completely. For example, programs that address more than 265 variables won't run on the PC. Substantial amounts of programmer labor are expended in dealing with these incompatibilities.

Second, even the most sophisticated file transfer software is still limited by the bandwidth of telecommunications lines. For some applications, it is all right to let the machines run all night to download a file, but in our environment, we usually don't have the time to wait. It is cheaper to trade computer time for labor costs, and run on the mainframe.

What then are the advantages of a distributed system? Foremost of these is that for jobs that do not produce a large volume of printed output, the latest generation of fast PCs can be very cost effective, even for medium sized (5,000 observation) datasets. Much of our work involves survey or clinical trial data files that are smaller than this, which can be very productively analyzed on advanced micro computers.

In addition, the significant virtue of the SAS System is that programmers who know one system can quickly become effective in a new environment. Training time is consequently greatly reduced. Instead of learning a new editor, and a new set of commands and a new [...], our staff can transfer the skills they already have to the new environment. The familiarity of the SAS System on different machines reduces the anxiety associated with moving to a different system.

To summarize our experience, we have reached the somewhat unexpected conclusion that the objectives that initiated the effort to create a distributed system have receded in importance as new advantages became apparent.
I am looking forward to following the future evolution of the SAS System, as it moves forward to integrate the continuing development of data processing concepts and designs with the demonstrated reliability and flexibility of the product.

Figure 8

Comparison of Processing Speeds Using Five SAS* Systems (CPU seconds)

Observations   PC 6.02 (1)   PC 6.02 (2)   Prime 5.04    IBM 82.4      IBM 5.08
      50,000   not run       not run       [illegible]   [illegible]   [illegible]
       5,000   121           265           16.3          1.2           1.4
         500   24            51            3.8           0.5           0.63
          50   15            29            2.1           0.41          0.55

Systems: PC #1 = 10 MHz, floating point co-processor, zero wait state
         PC #2 = AT compatible
         Prime 9955-II
         IBM 3084

* SAS is the registered trademark of SAS Institute Inc., Cary, NC, USA
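The relative speeds discussed in the text can be checked directly from the Figure 8 timings. The short Python sketch below is offered only as a worked example; the CPU seconds are as read from Figure 8, and the function and variable names are hypothetical.

```python
# CPU seconds as read from Figure 8, keyed by number of observations.
cpu_seconds = {
    5000: {"pc1": 121, "pc2": 265, "ibm_82_4": 1.2, "ibm_5_08": 1.4},
    500: {"pc1": 24, "pc2": 51, "ibm_82_4": 0.5, "ibm_5_08": 0.63},
    50: {"pc1": 15, "pc2": 29, "ibm_82_4": 0.41, "ibm_5_08": 0.55},
}


def speedup(slow_seconds, fast_seconds):
    """How many times faster the quicker system completed the same job."""
    return slow_seconds / fast_seconds


def percent_saved(newer_seconds, older_seconds):
    """Percentage of the newer release's CPU time that the older release saves."""
    return 100 * (newer_seconds - older_seconds) / newer_seconds


for n_obs, t in cpu_seconds.items():
    micro_ratio = speedup(t["pc2"], t["pc1"])
    saving = percent_saved(t["ibm_5_08"], t["ibm_82_4"])
    print(f"{n_obs:>5} obs: PC #1 is {micro_ratio:.2f}x PC #2; "
          f"82.4 uses {saving:.1f}% less CPU than 5.08")
```

The micro ratio is close to two at every size, and the saving from the older mainframe release varies with job size, so the percentages quoted in the text are best read as approximate.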

Figure 9

Observations   PC 6.02 (1)   PC 6.02 (2)   Prime 5.04   IBM 82.4   IBM 5.08
      50,000   not run       not run       $30.83       $10.64     $11.66
       5,000   $2.02         $4.42         $4.61        $3.12      $3.41
         500   $.40          $.85          $1.33        $2.34      $2.60
          50   $.25          $.48          $1.28        $2.25      $2.55

Systems: PC #1 = 10 MHz, floating point co-processor, zero wait state
         PC #2 = AT compatible
         Prime 9955-II
         IBM 3084

Symbols: A = Computer   B = Labor @ $60/hr
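The micro columns of Figure 9 can be reproduced from the Figure 8 timings if the programmer's waiting time is taken to equal the CPU seconds reported in the SAS log. That equivalence is an inference from the published numbers rather than something stated in the text, so the Python sketch below should be read as a plausible reconstruction and not as the method actually used; the function name is hypothetical. The mainframe and Prime columns depend on service bureau and in-house charging rates that are not given, so they are not modeled.

```python
LABOR_RATE_PER_HOUR = 60.0  # labor rate quoted in the text


def labor_cost(cpu_seconds, rate_per_hour=LABOR_RATE_PER_HOUR):
    """Labor cost in dollars, assuming waiting time equals CPU seconds (an inference)."""
    return round(cpu_seconds * rate_per_hour / 3600, 2)


# CPU seconds for the two micros, as read from Figure 8.
pc_cpu_seconds = {
    5000: {"PC #1": 121, "PC #2": 265},
    500: {"PC #1": 24, "PC #2": 51},
    50: {"PC #1": 15, "PC #2": 29},
}

for n_obs, timings in pc_cpu_seconds.items():
    costs = ", ".join(f"{name} ${labor_cost(sec):.2f}" for name, sec in timings.items())
    print(f"{n_obs:>5} obs: {costs}")
```

With these inputs the computed values agree with the six micro entries of Figure 9 to the cent, which supports the reading of the legend as labor at $60 per hour.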


Figure 10

DISADVANTAGES

SOFTWARE DEVELOPMENT ON PC LIMITED BY DIFFERENCES BETWEEN VERSION 5 AND 6.

FILE TRANSFER RESTRICTED BECAUSE OF TIME CONSTRAINTS.

ADVANTAGES

USE THE RIGHT TOOL FOR THE JOB.

TRAINING TIME REDUCED.

Figure 1

AVAILABLE COMPUTER RESOURCES

IBM 3084 AT REMOTE SERVICE BUREAU, CONNECTED VIA MULTIPLEXED 56K BAUD DIGITAL LINE TO LOCAL SYSTEMS. RELEASE 82.4 AND 5.08 OF THE SAS SYSTEM.

PRIME 9955-II IN-HOUSE MINI COMPUTER. RELEASE 5.04 SAS.

IBM AND IBM COMPATIBLE MICRO COMPUTERS LINKED TO THE PRIME ON AN INTERNAL LAN. RELEASE 6.02 SAS.

Figure 2

OBJECTIVES

DEVELOP SOFTWARE ON PC TO RUN ON LARGER SYSTEMS.

DOWNLOAD DATABASE OR EXTRACT FILES TO PROCESS ON PC's.

Figure 3

Micro - Host Link via Asynchronous Modem

Figure 4

Micro - Host Link via Control Unit (Bisynchronous)

Figure 5

Micro - Host Link via Distributed System (Batch Transmission Only)

Figure 6

Micro - Host Link via SNA (Batch and On-line Access Available)

Figure 7

Benchmark Program

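    options dquote;                  /* resolve macro variables inside double quotes */
    %let obs = 50;                   /* number of observations to generate */
    data;                            /* floating point arithmetic plus output */
      do y = 1 to &obs;
        x = y**-3;
        z = x*y;
        output;
      end;
    proc sort out=xxx;               /* sort the generated data set */
      by x;
    proc means n mean min max data=xxx;
      title "TEST RUN WITH &obs CASES";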
