SAS REPORT WRITING OVER THE YEARS
An Exploration Into the Difficult Task of Meeting Everyone's Needs
Written by Richard C. Chiofolo, Ph.D.
Kaiser Foundation Health Plan, Oakland, California
5 July 2000

Writing reports is not a very esoteric or intellectual task. It lacks glamour, and the ability to produce reports isn't something people cite on their resumes. However, it may be one of the most useful functions in any language package, because programmer/analysts often need only to put information on a piece of paper as a list or a simple comparison of some business function or event. Whether the data are character fields in a list or data points on a graph, the intent is the same: put information in a format for presentation. Some level of summarization may be desired, but people are rarely willing to look at final statistical results without first seeing some of the detailed information. Communicating the information should be the goal, but in many cases presentation is as important as the information itself. In short, how information is presented counts for a lot.

Programmers and analysts have their biases about how they want to do their work. At my age, a non-graphic PROC CHART is fine; my assistants want to play with Web-ready graphics. We should look at the SAS procedures designed for making reports, but only those that give us the simple ability to present raw or summarized lists and comparisons. How well do they help us produce reports: not esoteric statistics, but the mundane listings of detailed or summarized records we use in the workplace?

SAS was originally designed by statisticians to prepare data for statistical comparisons. In the 1970s, SPSS, BMDP and SAS were strong rivals in the market for statisticians and scientific experiments. By the 1980s, SAS had become a more general package for "users" and other non-statistical applications, competing with FOCUS and CICS. Report generation became a more important component of the overall SAS package.
Although report writing is relatively simple in purpose, SAS Institute's attempts to create a flexible, coherent, easy-to-use and efficient procedure for generating reports have actually been fraught with difficulties and mistakes. Since we are concerned only with "reports," not summary statistics or high-level analysis, we need to define which SAS procedures generate reports. Clearly PROC FREQ is less statistical than PROC REG, yet both are analytical in nature. A report is output that lists detailed or summarized data, either as a simple listing or as a multi-field comparison called a cross-tabulation. It must show actual data points, not just our conclusions about the data, even if the data points are summarized rather than detailed. Cross-tabulation should be included, since it also displays detailed information. Most businesses do not use refined statistics, such as the analysis-of-variance procedures, to test and compare factors affecting their business. Business analysts favor more basic and obvious comparisons of data, so cross-tabulation is a common feature of business analysis. In essence, a report either lists values or compares two or more lists of data values.

Let's look at several SAS procedures as they have evolved since the early stages, and measure them not by what they accomplish but by what affects their use, rating them on these "Report Writer" dimensions. Here are three dimensions we can use to rate SAS procedures as report writers:

1. A procedure is either algorithmic or textual. Different people prefer one type or the other. People with statistical or heavy third-GL backgrounds often prefer very concise, algebraic languages that are terse, with short names. SAS overall is wordy, like early BASIC, as seen in the logic and controls of a DATA step, while still retaining the algebraic logic of FORTRAN and PL/I, the languages SAS was first written in. Fourth GLs are usually wordier than third GLs. SAS has tried to be both.

2. A procedure is either compact and standardized or modular and flexible. Most SAS procedures are standardized packages of code that users never see but that produce consistent, preformatted output, as long as the design covers all the options needed. Users are constrained by the available options but also aided by simple designs that can produce complicated results. Sometimes users want more flexibility for highly customized output, but most procedures follow standard approaches for ease of use. The need for flexibility leads to modular designs that offer more but are harder to master.

3. A procedure may be limited in scope or very comprehensive. Most SAS procedures are limited: SAS is designed as a collection of DATA steps and procedures, each completing a limited task. Over time, SAS has become more comprehensive, more of a fourth GL, with longer, more complicated procedures that do more for you. You may include optional statements that make a report more complex, or you can opt for the simplest, minimal statements. A comprehensive procedure requires more detailed coding and has a steeper learning curve, but it offers more features and controls.

SAS has always provided both types of procedures. In fact, SAS provides all kinds of methods to complete a report, and we shall find procedures that represent every combination of these dimensions. It is up to the user, not the product, to pick what works best. There is almost always more than one procedure and more than one method to generate any report. Let's now look at some of the methods SAS programmers have used over the years.

The Early Years: Using PRINT and NULL Data Steps

The first SAS tool for generating reports was PROC PRINT. This very basic procedure appeared in the earliest versions of SAS and was probably the most used. PRINT is essentially a "dump" of raw data points. Each row of the output is a single record, or observation, in the data.
Each field, or variable, is put on the paper output as a single column. PRINT takes care of many of the features a report needs: automatic spacing between columns, variable labels used as column headings, observation counts, double spacing, and enough room for wide field values. The ID and BY statements provide report breaks, and SUM provides subtotals (with BY) and totals. PAGEBY and SUMBY were added later to control page breaks and to specify which BY variable triggers subtotals. The key was simplicity. PRINT is textual, very standard and limited. You simply list the fields you want, in order, on a VAR statement. SAS takes care of formatting the columns on the paper and adjusts the pages for wider values. You can standardize column widths across the entire report with the UNIFORM option. PRINT is comprehensive even with minimal code, as long as you stay with the simple need to dump the detailed records and fields of a data set. For example, just

   PROC PRINT;

will provide almost everything you need. With no other statement, SAS defaults to dumping the most recently created data set, all fields, in the order in which they appear in the data set. If there are too many fields to fit on a page, SAS automatically "wraps" the records, so that you see something like a multi-panel spreadsheet. PRINT is so useful as a debugging tool that it is essential to writing even the most complicated SAS application. It is, on our dimensions, a comprehensive, standardized, textual procedure. Easy to use, it has filled so many needs that it has become the procedure most taken for granted. It is closest to the tools most analysts already know, EXCEL and LOTUS.

Because PRINT was so standardized, it did not meet many specialized needs. Very often we needed a more customized report. There were ways to accomplish this with PRINT, mostly by preparing additional records for the report BEFORE running PRINT.
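Before turning to such workarounds, it helps to see how far PRINT's standard statements go on their own. The following is a minimal sketch of a PRINT step using the features described above; the data set WORK.SALES and the variables REGION, DEPT, MONTH and AMOUNT are hypothetical names chosen for illustration, not taken from any real application.

```sas
/* Hypothetical sample data: WORK.SALES and its variables are illustrative */
data work.sales;
  input region $ dept $ month amount;
  label amount = 'Sales Amount';
  datalines;
East A 1 100
East A 2 150
East B 1 200
West A 1 120
West B 1 80
West B 2 90
;
run;

/* PRINT with a BY statement expects data sorted by the BY variables */
proc sort data=work.sales;
  by region dept;
run;

/* LABEL: use labels as headings; UNIFORM: same column widths on every page; */
/* N: print observation counts; DOUBLE: double-space the listing             */
proc print data=work.sales label uniform n double;
  by region dept;    /* report breaks for each REGION/DEPT group    */
  id region dept;    /* identify each row by its BY values          */
  var month amount;  /* columns to list, in this order              */
  sum amount;        /* subtotals for BY groups plus a grand total  */
  sumby region;      /* subtotal only when REGION changes, not DEPT */
  pageby region;     /* start a new page when REGION changes        */
run;
```

Because PRINT builds the whole layout itself, changing this report usually means changing only the BY, ID, VAR or SUM lists; what it cannot do is replace the sums with some other statistic.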
To show means instead of sums on the subtotal lines, for example, we:

1. ran PROC SUMMARY first to create subtotal records containing the means;
2. marked the new subtotal records with special keys so they would sort correctly;
3. merged the subtotal records with the original detail records; and
4. printed the resulting file without a SUM statement, since the subtotals already existed.

This method took time and made the programming more complicated, since we had added a SUMMARY procedure to create the subtotal records and a DATA step to merge them with the original detail records. False keys had to be invented just so each subtotal line would sort after the detail lines it belonged to. In sum, PRINT could not be both simple and customizable. Whenever users requested a more precise layout, we turned to an approach that was the opposite on all dimensions: using NULL DATA steps to generate customized reports.

Report Writing à la Carte

Report writing often involves such complex, specialized needs that SAS had a manual dedicated to using DATA step processing to produce custom reports. Using a combination of BY statements to keep track of the current record and some sophisticated features of PUT, such as pointer controls that place the cursor on a specific line and column of the output page, it was possible to produce almost anything COBOL or EASYTRIEVE could. The technique is well described in the SAS Applications Programming manual. It treats the DATA step as a block of output paper: you control directly which record you are reading and how the data are placed on the paper. As in the days when COBOL was king, this technique allows infinite options and direct control over where each value is placed on the output device.
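The four-step workaround for printing means on the subtotal lines can be sketched as follows. This is an illustration under stated assumptions, not the author's original code: WORK.SALES, REGION, MONTH and AMOUNT are hypothetical, as are the helper variables RECTYPE (the invented sort key) and LINETYPE (a label for each row). Step 3 interleaves the two files with SET and BY; a MERGE with the same BY statement would give the same result here.

```sas
/* Hypothetical sample data; names are illustrative only */
data work.sales;
  input region $ month amount;
  datalines;
East 1 100
East 2 150
West 1 120
West 2 90
;
run;

proc sort data=work.sales;
  by region;
run;

/* Step 1: one summary record per REGION holding the mean AMOUNT */
proc summary data=work.sales nway;
  class region;
  var amount;
  output out=work.means (drop=_type_ _freq_) mean=amount;
run;

/* Step 2: mark detail and subtotal records with a hypothetical sort key */
data work.detail;
  length linetype $ 6;
  set work.sales;
  linetype = 'Detail';
  rectype  = 1;          /* 1 = detail line */
run;

data work.subtot;
  length linetype $ 6;
  set work.means;
  linetype = 'Mean';
  rectype  = 2;          /* 2 = subtotal line, sorts after its details */
  month    = .;
run;

/* Step 3: interleave so each mean follows its region's detail lines */
data work.report;
  set work.detail work.subtot;
  by region rectype;
run;

/* Step 4: print with no SUM statement; the subtotals already exist */
proc print data=work.report noobs;
  by region;
  var linetype month amount;
run;
```

RECTYPE is exactly the kind of "false key" the text describes: it carries no information for the reader and exists only so that each mean line sorts after the detail lines it summarizes.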
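A NULL DATA step report of the kind described in "Report Writing à la Carte" might look like the minimal sketch below. The data set and variables are hypothetical, and the report is written to a temporary file named REPORT (an assumption made so the output can be inspected afterward); writing `file print header=pagetop;` instead sends the same lines to the procedure output. The FIRST. and LAST. variables created by the BY statement track where each record falls in its group, the @n pointer places values in fixed columns, and the HEADER= label runs a heading routine at the top of each page.

```sas
/* Hypothetical sample data; REGION, MONTH and AMOUNT are illustrative */
data work.sales;
  input region $ month amount;
  datalines;
East 1 100
East 2 150
West 1 120
West 2 90
;
run;

/* BY-group processing below requires sorted input */
proc sort data=work.sales;
  by region;
run;

/* REPORT is a hypothetical temporary fileref for the finished report */
filename report temp;

data _null_;
  set work.sales end=eof;
  by region;                          /* creates FIRST.REGION and LAST.REGION */
  file report print header=pagetop;   /* PAGETOP runs at the top of each page */

  if first.region then do;
    regtotal = 0;                     /* reset the subtotal for a new region */
    put / @1 'Region: ' region;
  end;

  regtotal + amount;                  /* sum statements retain their values */
  grand + amount;

  /* @n moves the pointer to column n; formats fix the printed width */
  put @5 month 2. @15 amount comma10.;

  if last.region then
    put @15 '----------' / @1 'Total' @15 regtotal comma10.;

  if eof then
    put // @1 'Grand Total' @15 grand comma10.;
  return;

pagetop:                              /* page heading routine */
  put @1 'Monthly Sales by Region' / @5 'Month' @19 'Amount';
  return;
run;
```

Every position, heading and total is spelled out by hand: that is the price of the complete control this technique offers.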