101-31: Using External Data Dictionary Files When Building SQL

SUGI 31, Data Warehousing, Management and Quality

Paper 101-31

Utilizing External Data Dictionaries to Build SQL Queries in Base SAS®

Mike Tangedal, US Bank, St. Paul, MN

ABSTRACT

Documentation of created variables within a data warehouse requires a compromise between definitions in code (Base SAS) and text supplied by analysts. Business rules nestled within code serve no documentation audience other than skilled SAS programmers; they do not serve the management and analysts who ultimately provide the logic for these rules. However, business rules stated purely in terms of the customer are not directly translatable into code without major concessions to both the sophistication of the code and the expertise of the customers. A workable solution is to store each business rule for each unique defined variable in an external file that is readily accessible to the Base SAS language, customers, and analysts. The challenge to the programmer is then to successfully implement these external business rules into a usable end product. The challenge to the analysts is to document all business rules in absolute terms of the data available. Such a solution involves a unique approach from the SAS professional, much of which is discussed in both code examples and concepts. Programming issues discussed include macro routines to verify the existence of required data sets, development of a hierarchical data dictionary, parameter verification within Base SAS, limits of macro variables within SQL, and remote building of a sophisticated SQL query.

PURPOSE

An oft-neglected yet critical component of any successful implementation of a data warehouse is the incorporation of the data dictionary. The data dictionary stores the necessary business rules applied to the source data before that data becomes readily available within any data warehouse cube structure. The phase of applying business rules comes after the loading phase, in which the data updates are physically loaded into the data warehouse structure, and after the metadata and initial quality assurance phase, in which the data is checked for initial validity and conformity with previous loads. Since additional quality assurance is almost always a mandatory precaution before placing data in a dimensional structure available for reporting, a data dictionary containing these established rules is also a necessity. Although the business rule components of such a quality assurance platform need not reside in any rigid structure, the ease of maintenance and implementation warrants segregating each component into an entry in a separate file referred to as a data dictionary.

The methodology employed to implement an external data dictionary file in a SAS SQL query tool involves some macro code sophistication and development of a hierarchical structure for the data dictionary itself.

INTRODUCTION

The most direct methodology for transferring the business rules required to transform the available data residing in the data warehouse to summarized report-level data consisting of dimensional categories and calculated metrics is to place all of this logic within a SAS code module.

[Figure: diagram with elements labeled Data Warehouse, Report, and Customer]

This most direct approach fails in two major ways. First, any business rules included in the process are under the complete control of the programmer. Developing such a 'black box' approach puts all responsibility for maintenance and communication of these business rules on the programmer. Second, the stress put upon an increasingly complex block of code increases the chances of its failure. Maintaining complex logic structures within a single large block of code is not only difficult but dangerous. Given the number of standard dimensions and metrics available for each data warehouse source table, multiplied by the complexity of each business rule accounting for missing and default values, the block of code required can be so large as to be unmanageable in a single program.

An approach that takes into account the future needs of the customer, rather than the most direct solution to the problem, reveals a better solution for both the customer and the programmer. In the most practical terms, the customers for whom the end reports are created are going to discuss the data contained within in terms specific to their business. Academic descriptions of these terms as popularized in data warehouse manuals do not suffice. Your customer database is not going to appear as a typical customer database definition in a data warehouse manual. Your customers need to derive business rules specific to their needs and always in terms of the data available. These terms need to be defined in a centralized location accessible to all pertinent members of your business group. Therein lies the need for a data dictionary separate from any single SAS program but accessible both from SAS and by your business customers.

The job of the SAS programmer savvy to the ways of data warehouse architecture is to create these definitions so that they are both translatable into a programming language (SQL or Base SAS) and relatively easy for the business users to discern. The business rules themselves won't of course be in clear English, but additional descriptive text should be made available to accompany the business rule in its most basic form.

Such a resulting data dictionary can then be used to create reports that are not only of maximum benefit to the end user but also fully documented through the use of the data dictionary as a reference tool. Your customers will be able to understand the business rule definitions completely without the burden and bother of understanding the overall SAS program. Making the data dictionary as collaborative an effort as possible benefits both the customer and the programmer.

[Figure: diagram with elements labeled Lookup Libraries & Format Tables, Data Warehouse, Data Dictionary, Report, Customer, and Customers]

The integrity of a data dictionary is more easily maintained when definitions and descriptions are in the same format. Instead of arbitrarily segmenting such a complex block of logic into modular parts, the best solution is to map each business rule logic block to a separate location and reference them all through mapping within the program. Cross-referencing such a data definition library is best done within a spreadsheet or database. Through the use of the much-improved 'Proc Import' procedure in SAS, referencing and utilizing the contents of these files is simple.

IMPLEMENTATION

Creation of a validated and usable data dictionary containing business rule definitions for each created variable, amongst other entries, is a far greater challenge than creating the code to utilize these business rules in a production SAS program. The main reason this task is so daunting is that creation of each unique business rule requires extensive coordination between all interested parties. Trust in the business rules is earned through the foresight the programmer gains by working with analysts and customers.

For example, a common created variable requiring a business rule may be an average based on the sum from one field divided by the count of another field. The most simplified business rule would appear as follows in Base SAS code:

   Field_avg = field_sum / field_cnt;

Again, this most simple direct solution is not the best. Pray tell, what if one of the fields in the calculation is zero or, heaven forbid, missing? Oh, what a mess you're going to create in the summary file. The best solution is derived from first meeting with the customers to decide how missing and zero values are to be interpreted. Most likely the resulting business rule will then appear as follows:

   If field_cnt in (.,0) then field_avg=.;
   Else if field_sum in (.,0) then field_avg=0;
   Else field_avg = field_sum / field_cnt;

The structure of the data dictionary itself should follow a module in the overall concept of control file hierarchy, as noted in various papers and books by legendary SAS programmer Art Carpenter. Mr. Carpenter explains the concept and the SAS code used to implement this concept far better than I ever could. In brief, one of the main control files contains a list of all other files utilized in the data dictionary. The data dictionary reference file can be as simple as a flat file containing the name of the variable along with the business rule defining this variable. Also handy, if not mandatory, is a list of source files found on the data warehouse and the variables contained in each file. Development of a hierarchical control file structure will ease the utilization of the data dictionary concept significantly.

MACRO PARAMETERS USED TO CREATE PRODUCTION REPORTS FROM DATA DICTIONARY: THE ADHOC PROGRAM

The front-end tool will consist of a simple SAS program called 'AdHoc'. Portions of this code are explained at the end of this paper. The SAS program thoroughly explains all the input parameters available. Parameters will also be available to customize output to a high degree. The AdHoc query tool will be able to create almost any query falling within the context of the standard business rules.

The AdHoc SAS program allows for various user parameters to select the source query file, standard dimensions, and metrics in order to create a resulting data set containing dimensions and metrics from the source query table. The complete list of user parameters available for use with the AdHoc program follows.

... source file key in the data set resulting from the query.
• Extras – The list of additional variables to be added to the resulting data set. Note that the contents of this field are only applied if the 'account' parameter is set to 'Y', and the user is responsible for the proper context of the code appearing in this parameter.
• Dims – The list of dimension variables to be stored in the resulting data set, with the default value being all available dimension variables ('all').
• Mets – The list of metric variables to be stored in the resulting data set, with the default value being all available metric variables ('all').
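
As an illustration of how parameters such as Dims and Mets might drive a query built from a data dictionary, the following is a minimal sketch and not the AdHoc program itself. Everything named here is an assumption made for the example: the data sets work.datadict, work.source, and work.report, the columns varname, rule, and region, and the macro variables dims, mets, and metric_sql. The dictionary is built inline only to keep the sketch self-contained; in practice it would be read from the external file. The business rule for the average is stored as a SQL CASE expression, which is assumed here to be an acceptable equivalent of the IF/ELSE form of the rule.

```sas
/* Hypothetical data dictionary: one row per created variable.   */
/* varname = variable to create, rule = SQL expression for it.   */
data work.datadict;
   length varname $32 rule $200;
   varname = 'field_avg';
   rule    = 'case when field_cnt in (.,0) then . ' ||
             'when field_sum in (.,0) then 0 ' ||
             'else field_sum / field_cnt end';
   output;
run;

/* Hypothetical source table with zero and missing values. */
data work.source;
   input region $ field_sum field_cnt;
   datalines;
East 100 4
West 0 5
North 50 0
South . 3
;
run;

/* Illustrative stand-ins for the Dims and Mets parameters. */
%let dims = region;   /* comma-separated, as PROC SQL expects  */
%let mets = all;      /* 'all' or a space-separated name list  */

/* Turn the selected dictionary rows into a SELECT-list fragment */
/* of the form "<rule> as <varname>, ..." held in a macro var.   */
proc sql noprint;
   select strip(rule) || ' as ' || strip(varname)
      into :metric_sql separated by ', '
      from work.datadict
      where lowcase("&mets") = 'all'
         or indexw(lowcase("&mets"), lowcase(strip(varname))) > 0;
quit;

/* Build the report from the source table using the fragment. */
proc sql;
   create table work.report as
   select &dims, &metric_sql
      from work.source;
quit;
```

Because the rule text lives in the dictionary rather than in the program, changing how missing or zero values are treated means editing one dictionary entry, not the query code. A single macro variable has a maximum length, so a production version with many long rules would need to guard against exceeding it.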
