A Strange Affair: An Application Using SAS and Excel

By Richard Chiofolo, Ph.D.

Why Excel?

Yes, it is strange. We learn that SAS is inclusive, “soup-to-nuts,” and can do anything “under the sun.” That is largely true. However, SAS does not have a financial image. This is not a problem for those who use SAS, since we know its almost universal applicability. A quick review of the financial “functions” in SAS proves the point to most intelligent programmers.

However, in Finance departments, regardless of industry, SAS is not the first tool of choice. Financial analysts will tell you “there is one and only one universal standard: Excel.” For those over fifty, Excel is the latest in a progression from MultiPlan, SuperCalc, Lotus 123, Symphony and other applications. Excel was first released for Macs only. As soon as it was available for the PC, it quickly became the dominant spreadsheet package for personal computers. In Finance, it is the standard tool, and has been for over ten years, quite a feat among software products.

Here is my goal: I must design an application in Excel, the product my users understand best.

Learning to Use Excel’s VBA

In most Finance departments today, outputs are in Excel format, or at least in comma delimited (.csv) files easily read by Excel. With the addition of VBA code, a remake of an earlier macro language, one can automate a lot of activities in Excel. You can even design databases in sheets, entry screens, links, and display screens.

The overall capacity exists largely because Excel is embedded in a much larger engine design (Windows), which now includes OOP “objects” usable across applications. For older programmers of BASIC, VBA does not look like BASIC, but it is much richer and more application based than the original language. However, it is not easy to use. One must learn the terms of Windows and each object’s characteristics in each package (Excel, Word, Access, etc.). VBA is a lot more than the usual control structures like “IF” and “DO”. More important, the objects are defined largely within the context of the application package, in this case, Excel. It helps to know the Excel package well before you learn to use the VBA objects.

Normally I would have used SAS, on the mainframe, but in this case, my users recoil from mainframe based
applications. I cannot even use Access, the appropriate Microsoft tool for database analysis and reporting. The users want to manipulate results further (sorting, printing, copying) in a tool they already understand: Excel. I needed to automate the selection of records so that the user had an interface to the data, not simply a dump of records in a sheet. I knew that Excel had pull-down menus, list and combo boxes for selecting values, and option buttons for selecting criteria. I would have to learn how to use VBA code to automate the process behind these Excel “Forms.”

The General Ledger in a Hospital Corporation

Leaving SAS temporarily, I began analyzing Excel’s capacity to automate the selection and subtotaling of records. The data used was a dump of the general ledger in a large HMO corporation. Like most ledgers, it is organized into Locations, Centers and Accounts. Centers are equivalent to “departments,” organization or business units for collecting costs and revenues. Here we are concerned only with “Cost Centers,” not “Revenue Centers.” Each Center is divided into several Accounts that represent “cost functions” such as salaries, taxes paid, medical supplies, repair work, laundry charges, etc. We can also group

We are a very large corporation, subdivided into many locations. This is relevant only because the same center can appear in more than one location. These physical “Locations” or “Facilities” each have various Centers and Accounts.

Finally, a field called MANCAT was created which allows for a very high level rollup of the approximately two hundred thousand records in our full general ledger. These categories have direct meaning to clients, and they most often want to see the locations, centers and accounts associated with one or more of the MANCAT categories.

The full key or “fully keyed account” for the database is thus:

MANCAT, LOCATION, CENTER, and ACCOUNT.

We use alphanumeric data for all fields. In SAS they are character fields, since they are not used in computations. The rest of our record consists of various standard accounting measures, Actual and Budget dollars spent in each fully keyed account, and FTE (full-time equivalent) headcounts. To provide user requested comparisons, we include dollars for the current month, prior month, and the same month last year, with both monthly and year-to-date versions of the monthly figures.

The Excel Application

Now, how do we allow users to review detail records, via automated selections, and various rollups or subtotals of the same data? For example, users want roll ups of all accounts in a Center or all
locations in the same Service Area as a total.

The easy part was implementing selections of records. Excel includes several tools, such as “list” or “combo” and “check” or “option” boxes for selections. Each box is connected to a “cell link” which allows me to see numerically what selection the user made, and a “macro code module” that automates a process for selecting records in a criteria range (see below).

The hard part was figuring out how to summarize the detail records. Excel provides two methods, an older “Subtotals” method, and a newer system called “Pivot Table.” There are drawbacks. Subtotals inserts subtotals directly into the database of detail records, wherever a break in the presorted data occurs. Once the subtotals are present in the data, you cannot make further manipulations. The Pivot Table method adds another sheet to the application and needs user intervention to implement and format. Both can be automated, but I found that the code was becoming too difficult to manage. A path incorrectly chosen was leading me into convoluted fixes.

Advanced Filter: The Chosen Path

After several weeks, a number of assist books on Excel’s VBA, and some extensive experimentation, I found a simpler solution. What is strange is the unintended synthesis between the strengths of SAS and Excel that led me to it. I was experimenting with Excel’s implementation of Advanced Filter, a variation on the “Filter” function. It uses an “input range,” our database, an “output range” of the filtered records, and a “criteria range” that determines how to conduct the filtering of the original data. It is analogous to our SAS procedure PROC SUMMARY together with a WHERE to pick out which summary is desired. The marriage occurred when I realized that SUMMARY could generate all the possible combinations or summaries the user wanted, and Excel’s FILTER function would quickly select the chosen records that match a user’s selection criteria. I would have to pre-summarize the data in SAS first, which added more records, but I could avoid Excel processes that delayed the display of results.

To implement Advanced Filter, we start with a defined input range. As with SUMMARY, this input does not need to be pre-sorted. The output range (which must lie in the same sheet as the input range, necessitating a final “COPY” function to move the data to another “Display” sheet) can be limited. Excel only filters records for output columns that match input columns. There are over a hundred financial measures in the input range, but only two columns (besides the keys) are needed. The user selects which dollar columns to output.

The criteria range was more difficult to figure out, since it must be intelligent and include several options, such as an exact match, or the reciprocal: all records that do not match the specified
value. The criteria range is complex because there are so many options to include. The user might want to see:

1) The detail records: a list of fully keyed accounts. They may want to select only a few of them, by entering a single center for which they want to see all locations and all accounts. They may want the other combinations, all locations and centers for one account, or all centers and accounts for one location or one Service Area (that is, multiple locations in one Service Area).
2) They may also want a custom rollup, of which there are many combinations, but only certain ones are likely, such as:
   a) By Center totals, all accounts and all locations.
   b) By Account totals, all centers and all locations.
   c) By Location totals, all centers and all accounts.
   d) By Center and Accounts, all locations.
   e) By Location and Center, all accounts.
   f) By Location and Account, all centers.

Advanced Filter can grab any or all of these “cuts” of the data, but only if the subtotals are already present in the data. This was the major breakthrough of this application design. By using PROC SUMMARY to create subtotals beforehand, Excel has only to extract them; no rollup occurs in Excel.

The SAS Summaries

Using PROC SUMMARY without the NWAY option, we then use “bit” masks to select which summaries and detail records are needed by the application. For example, here is the SAS code in a data step for SUMMARY’s output file.

    * CLASS MANCAT CENTER LOCATION ACCOUNT;
    * Declare RECTYPE wide enough that CENTACCT is not truncated to 7 characters;
    LENGTH RECTYPE $ 8;
    IF _TYPE_ = '1111'B THEN DO; RECTYPE='DETAILS'; OUTPUT; END;
    ELSE IF _TYPE_ = '1000'B THEN DO; RECTYPE='BYCSA'; OUTPUT; END;
    ELSE IF _TYPE_ = '1001'B THEN DO; RECTYPE='BYACCT'; OUTPUT; END;
    ELSE IF _TYPE_ = '1010'B THEN DO; RECTYPE='BYLOCS'; OUTPUT; END;
    ELSE IF _TYPE_ = '1011'B THEN DO; RECTYPE='ACCTLOC'; OUTPUT; END;
    ELSE IF _TYPE_ = '1100'B THEN DO; RECTYPE='BYCENT'; OUTPUT; END;
    ELSE IF _TYPE_ = '1110'B THEN DO; RECTYPE='CENTLOC'; OUTPUT; END;
    ELSE IF _TYPE_ = '1101'B THEN DO; RECTYPE='CENTACCT'; OUTPUT; END;

The commented CLASS statement (from the SUMMARY) provides the order of fields invoked by the bit-mask. A “1” signifies to include this break, and a “0” indicates to ignore this field, the equivalent of “ALL” records on this field. The first IF statement requests detail records. The others pull in other subtotals that the users wanted to see.
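The SUMMARY step that feeds this data step is not shown in the paper. The following is a minimal sketch of how the two pieces could fit together, assuming a tiny made-up ledger. The data set names LEDGER, ALLSUMS and SELECTED, the measures ACTUAL and BUDGET, and all sample values are hypothetical and are not taken from the production job.

```sas
/* Hypothetical sample ledger: three fully keyed accounts. */
DATA LEDGER;
  LENGTH MANCAT CENTER LOCATION ACCOUNT $ 8;
  INPUT MANCAT $ CENTER $ LOCATION $ ACCOUNT $ ACTUAL BUDGET;
  DATALINES;
PAYROLL C100 LOC1 A10 500 550
PAYROLL C100 LOC2 A10 300 320
PAYROLL C200 LOC1 A20 200 210
;
RUN;

/* Without NWAY, SUMMARY writes records for every combination of the */
/* CLASS variables and tags each record with its _TYPE_ value.       */
PROC SUMMARY DATA=LEDGER;
  CLASS MANCAT CENTER LOCATION ACCOUNT;
  VAR ACTUAL BUDGET;
  OUTPUT OUT=ALLSUMS SUM=;
RUN;

/* Keep only the detail records and the By Center totals. */
DATA SELECTED;
  SET ALLSUMS;
  LENGTH RECTYPE $ 8;
  IF _TYPE_ = '1111'B THEN RECTYPE = 'DETAILS';
  ELSE IF _TYPE_ = '1100'B THEN RECTYPE = 'BYCENT';
  ELSE DELETE;
RUN;
```

With this sample, the '1111'B test keeps the three detail rows, and the '1100'B test keeps two By Center rows (C100 and C200) on which LOCATION and ACCOUNT are blank. Because the CLASS order is MANCAT CENTER LOCATION ACCOUNT, the leftmost bit of the mask corresponds to MANCAT and the rightmost to ACCOUNT.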

After selecting specific records, we alter the actual values of any field that is summarized (the zeros in the bit-mask) to be “ALL”. This is critical, since the value “ALL” will later be used in the Excel criteria range to request records summarized on that field. Here is the code:

    IF CENTNAME = "" THEN CENTNAME = "ALL";
    IF ACCTNAME = "" THEN ACCTNAME = "ALL";
    IF LOCNAME = "" THEN LOCNAME = "ALL";

What we are doing is capitalizing on null-values which appear on records when that field is being summarized. SAS initializes the character fields to blank, since the record includes all possible values for that field. “ALL” was used so that the user sees this word on their screen whenever they want a summary on that field.

We have also generated more records than were in the original data, because we include both the original detail records and all summaries needed by the application. In this case, each dollar is repeated in the application eleven times, once at the fully keyed account level, and ten summaries. Since each summary is a rollup, its record count cannot exceed the original, but when all the summaries are added, the full database can be many times the size of the original records.

Downloading the Results

The SAS generated data must be downloaded from mainframe to PC. To accomplish this, we could design a .CSV file output using PUT statements in SAS, inserting the requisite commas and quotes around string fields, then FTP the file to our PCs. However, since downloading is so commonly needed a feature, we have a macro to perform the work. I will not go into all the details here, since much of the operation takes place outside the SAS environment. Once the file is generated, we have binary code which transmits the output file to LOTUS NOTES directly. We can simply launch the resulting .csv file into Excel (usually opened with Excel) without intercession by the Excel Wizard.

Excel’s Criteria Range

We must now have a method in Excel to select which of the detail or summary records the user wants to see. Here is an example of one selection (without the MANCAT field). Both the criteria (first two rows) and output (last two rows) ranges are included:

    CENTNAME  LOCNAME  ACCTNAME
    ALL       ALL      ALL

    CENTNAME  LOCNAME  ACCTNAME  ATD0304     BTD0304
    ALL       ALL      ALL       50,000,000  55,000,000
    ALL       ALL      ALL       250,000     260,000

In the criteria range, we select only records with “ALL” values listed except for the Payroll and Entity fields. We obtain totals split between Payroll and
Non-Payroll dollars, as displayed in the last two rows.

In order to select for Center totals, we would use:

    CENTNAME    LOCNAME  ACCTNAME
    <>ALL       ALL      ALL

    CENTNAME    LOCNAME  ACCTNAME  ATD0304     BTD0304
    Center One  ALL      ALL       50,000,000  55,000,000
    Center Two  ALL      ALL       250,000     260,000

Here we have used the “ALL” records value in most of the criteria range, but “<>ALL” in the Centname column, allowing for subtotals for each Center. Excel interprets this value (<>ALL) as “all values except the value ALL.”

What is striking is the application’s speed. Excel performs the Advanced Filter function very quickly, faster than either Subtotals or the Pivot table. To use it efficiently, we generated subtotals beforehand. The resulting database size exceeds the original count of detail records. For example, an application with only one MANCAT category generated 797 detail records and 17,273 possible combinations of the seven fields in the CLASS statement. After selecting only the subtotals desired, we were back to 2,246 records. So far, we have designed sheets with as many as 25,000 detail and summary records combined without any noticeable loss of performance.

The last step was to put formulas into the criteria range so that one of three choices is available for each criteria field:

1) One specific value on this field. The user enters the value chosen; or
2) The “ALL” or summary records on this field rolled up; or
3) The “<>ALL” records, that is, all possible detail records on this field.

Some combinations are not credible, and cannot be included. For example, users could select a location and center combination that does not exist. The sheet generates a “beep” and error message, but cannot do more than inform the client that a combination does not occur.

With 4,000 Centers and 150 locations, the application could become unwieldy very quickly if the Ledger included every possible combination.

Even the names of the two amount fields are created with formulas. We use a standard convention of xxxyymm for field names, where xxx is either “ACT” for actual, “BUD” for budgets, “ATD” for year-to-date actual or “BTD” for year-to-date budgets, yy is the year and mm is the month. The user had six options to choose from:

1) Current Month, comparing actual to budget.
2) YTD for the current month, comparing YTD actual to YTD budget.
3) Compare to last month, actual only.
4) Compare to last year, actual only.
5) Compare to last year YTD, actual only.
6) Compare Year-end YTD, actual and budget.

I have not included the Excel code, since we are concerned here mainly with the application of SAS, not Excel. Functionally, the code consists of an invocation of Advanced Filter, references to all the selection criteria values linked to the user selection boxes, and copies from the output range to the “Display” sheet. The user sees only the “Select” sheet with boxes and buttons, and the “Display.”

Advantages and Drawbacks

For an avid SAS programmer, my pursuit of an advanced level of Excel design, using VBA code, was not desired or easy. All the outputs designed could have been achieved directly in PROC REPORT, which both summarizes and displays, with or without the details. The VBA library of objects is so large that I have not explored all the possible methods that could have been used here. My solution is strange because it relies on the strengths of two different packages not normally used together:

1) SAS to generate the summaries needed by the application. Summing large volumes of data is fast, and the bit-mask code, once written, takes little effort. The download is automated.
2) Excel to extract and display the user’s records. Once displayed, the user can sort, print and copy and paste to other Excel applications.

The big advantage is speed. Excel performs the Advanced Filter as quickly as it can read, compare to the criteria and copy the output. The data does not even need pre-sorting. My only complaint is that Excel demands that the input and output range be in the same sheet, thus necessitating a final copy and paste. An alternative would have been to use the database sheet as a display as well, but this solution allows the user to alter the database sheet, a potential disaster. A simple copy and paste took care of this handicap.

In general, there is a rule here: allow SAS, not Excel, to provide much processing beforehand for an Excel application. Simple testing of large sheets with lots of formulas in their cells will prove the point. For example, this application uses cumulative year-to-date dollars as well as monthly dollars. We need both sets of dollars in the database. I could have calculated the year-to-date versions in Excel with SUM functions very easily, but didn’t because formulas take too long to process. SAS can do the calculations instead. With thousands of records times twelve monthly year-to-date buckets times two years times two versions (actual and budget), we have too many cell formulas to calculate.

The same is true of the Subtotals and Pivot table solutions, since they also
involve processing time. Advanced Filter is much faster.

There are some disadvantages as well.

All the possible user requests must be analyzed beforehand. This is essentially a closed or pre-defined system, a drawback in comparison with the Pivot table, but an asset for clients who know only how to navigate, copy and print in Excel and want the sheet to convert their button selections into results.

More important, we are using Excel as a database package. We run into two problems:

1) There is an absolute limit of 65,536 records. The practical limit is less.
2) Excel does not have completed application devices for databases. These features are in Access, but not Excel. For example, Access includes built in Entry screens and a Report Writer.

I got around these limits by sub-setting the data into smaller groups of records, based on user feedback about what they wanted to see. In this case, a security issue (not allowing un-authorized access to the ledger) was solved at the same time as the overall size limit by sending out selected records to specific users only. However, I had to design the user interface for selections and an output sheet for the display. Once done, they are automatic, but a less sophisticated programmer would have looked for already constituted database and report objects in a small DBMS package.

Which introduces another implication, a Catch 22. OOP programming may be rapidly producing a generation of programmers who use very sophisticated objects, but do not know how to design them from scratch. As with calculator use in primary schools, we are in danger of believing that the world is as simple as Windows, but a world none of us can recreate. I used to teach DOS.

Any questions: Please direct to the author, available at rick.chiofolo@kp.org or (510) 987-2645 daytime.