
Is the SAS® System a Database Management System?

William D. Clifford, SAS Institute Inc., Austin, TX

ABSTRACT

Commercial Database Management Systems (DBMSs) provide applications with fast access to large quantities of data. In addition, many have other capabilities such as data integrity services, data sharing, application-creation tools, and report writing. Version 6 of the SAS® System also contains a number of similar features.

This paper examines the database features of the Version 6 SAS System and compares them to the services offered by several popular DBMSs. The conclusion is that the SAS System can provide a cost-effective alternative to a commercial DBMS for the storage of data.

INTRODUCTION

Database Management Systems have been available for more than two decades and are frequently used as a repository for data. The applications that use this data are often not part of the DBMS and are either purchased from another vendor or developed by the user.

The SAS System is widely used as an application for data analysis. The data may come from a variety of repositories, including a number of DBMSs.

A definition of a DBMS is offered to use as the basis for answering the question posed in the paper's title. An inventory of features found in current DBMSs is provided and this inventory is compared to the DBMS features found in the SAS System.

With this background, an answer to the question of whether or not the SAS System is a DBMS is given. More relevant, however, than the name you call your data repository are the features you really require from it.

An argument is made that the data management facilities in the Version 6 SAS System have matured sufficiently so that it is a viable candidate for your data repository.

Finally, some of the DBMS features planned for future releases of the SAS System are identified.

WHAT IS A DATABASE MANAGEMENT SYSTEM?

A DBMS is a software package that provides a repository for computerized data. The DBMS is responsible for storing the user's data in the repository and making it available upon demand. Users of the data are shielded from the details and peculiarities of the computer software and hardware by the DBMS. That is, a DBMS separates the application from the data. This separation is a key point and will be discussed in more detail.

A database is the term used in this paper for a logical collection of data managed by a DBMS. The terms record, row, and observation are synonyms, as are column, field, and variable.

Data Separation

The objective is to separate the application from the data so that the application can focus on the external or logical aspects of the data such as analysis and presentation. The DBMS focuses on managing the internal or physical aspects of the data such as the type and quantity of storage devices and the bookkeeping necessary to support the data model.

As an example (in a relational data model), the application sees the data as rows and columns. The DBMS translates its internal storage structures into these rows and columns.

The fundamental responsibility of the DBMS, once the data are in the database, is to deliver the data back to an application. Query, selection, and update facilities are manifestations of this responsibility.

Another benefit of data separation is data sharing. Once a database is created, its data can be accessed by multiple applications.

Data Model

The data model defines the relationships that exist among the various data items in the database. Some examples of relationships are:

• field owned by a record

• child record owned by parent record

• physical order of records.

The Database Management System is responsible for supporting the relationships specified by the data model. Prior to DBMSs, this was the application's responsibility.

• Earlier DBMSs made the relationships static when the database was created. The specific relationship was the main focus of these DBMSs as evidenced by the data model they supported. Examples are hierarchies and networks.

• Newer DBMSs allow some of the relationships to be specified dynamically. Their focus is also on the relationships, but in a general, flexible sense instead of a specific, rigid sense. A DBMS that supports the relational data model is an example.

Beyond the Basics

Advancements in computer technology (e.g., more power, lower cost) placed additional burdens on DBMSs (e.g., user-friendly interfaces, improved performance). This brought demand for additional features from the DBMS.

As keepers of the data, DBMSs were required to solve these problems. Automatic query optimization, integrity constraints, high speed transactions, and point-and-click interfaces are a partial list of solutions provided by the DBMS vendors.

Although most DBMSs today have a variety of data presentation and analysis services, such features are not relevant to this discussion. Our focus here is on the storage and management of data.

FEATURES FOUND IN CURRENT DBMSs

In this section, features found in present-day DBMSs are identified. There may not be industry-wide agreement on the categories or definitions used here. This section is intended to serve as a general overview of the facilities available, not a comprehensive survey.

The features are divided into two general categories, basic and advanced. The basic features reflect the core functionality of a DBMS: data separation and data relationships. The more advanced features are built upon the basic ones and reflect additions required by users to keep up with advancements in computer technology. There is no significance to the order of presentation.

Examples of components in Release 6.08 of the SAS System are included with the description of each DBMS feature. The examples used here are not intended to be an exhaustive list of such components of the SAS System.

Basic

file management
To create, populate, delete, and back up databases.

Examples of file management services in the SAS System are the DATA step and the COPY, CIMPORT, CPORT, and SQL procedures.

data inventory services
To list and display information about the existing databases.

The DATASETS and CONTENTS procedures provide data inventory services in the SAS System.

query processing
To retrieve the stored data, including data filtering, that is, selection and projection.

The DATA step, SCL, the WHERE clause, and the PRINT, SQL, REPORT, and FSBROWSE procedures provide query processing in the SAS System.

update processing
To change existing data in a database and add new data.

The DATA step, SCL, and the SQL, APPEND, and FSEDIT procedures can be used for update processing in the SAS System.

relational data model
To provide support for the data model that is most popular for new applications. (However, this is not a requirement for a system to be a DBMS.)

SAS data sets are composed of rows (observations) and columns (variables), and thus are relational tables. The SQL procedure implements the de facto industry
standard data manipulation language for the relational model.

file-level security
To grant or deny a user's access to an entire data file.

All host-level file security features are honored by the SAS System. In addition, data set passwords for read, write, and utility access can be defined.

provide data in sorted order
To physically store the data in sorted order, or to sort data temporarily before they are returned to the application.

The SORT procedure and BY processing can be used to return data to the application in sorted order.

Advanced

row-level security
To grant or deny a user's access to a single row.

The SQL procedure can be used to define views with a WHERE clause to restrict a user's access to certain rows.

portability of applications
To facilitate the movement of applications and data to different platforms.

The MultiVendor Architecture™ of the SAS System is designed to provide portability of applications across heterogeneous platforms.

automatic query optimization
To allow the DBMS to determine the most efficient method of obtaining the requested data. This may include the use of auxiliary data structures such as indexes and hash tables.

Applications can create indexes for SAS data sets that will automatically be considered for WHERE clause optimization. The SQL procedure will also use appropriate indexes for join optimization.

multiple user access to data
To permit multiple users to query and update the same database concurrently. The DBMS, not the application, is responsible for preventing data corruption by coordinating access to the data.

The SAS/SHARE® software product is designed to permit multiple users to read and update the same data set concurrently. The data sharing is transparent to the application.

row-level locking
To allow data sharing by row. This means multiple users can query and update a given database concurrently as long as they do not request the same row. File-level locking, by contrast, permits only one user access to the file at a time.

The SAS System supports row-level locking of a single row in a data set within SAS/SHARE software and for multiple opens of the same data set in a standalone environment.

integrated data dictionary
To provide a database of information, maintained and used by the DBMS, containing data (meta data) about all the databases managed by the DBMS.

Currently the SAS System does not have an integrated data dictionary. SAS/EIS® software supports a non-integrated metabase.

non-integrated integrity constraints
To support data validation checks performed by the application.

The SAS applications programmer can use informats and write validation code in the DATA step, SCL, and the AF and FSP procedures.

integrated integrity constraints
To support data validation checks in a multiple user/application environment. These checks are performed automatically by the DBMS for all applications. Non-integrated data validation techniques can be applied to this environment.

Currently the SAS System does not support integrated integrity constraints.

audit trail
To maintain a time-stamped log of what user made a given update, including the new data values.

No integrated audit trail currently exists for the SAS System. For a given application, the DATA step and SCL support user-written schemes for collecting such data.

rollforward
To permit the recovery of a lost or damaged data set by the application of updates from an audit trail to an archived copy of the database.

The SAS System currently does not support a rollforward mechanism. For a given application, the DATA step and SCL support user-written schemes for collecting such data.

transactions with rollback
To logically bind multiple updates into a single atomic update. That is, either all the updates are successfully applied to the database or none of them are applied. Rollback initiates the removal of pending updates in the atomic unit.

Currently there is no support for transactions in the SAS System.

high volume transactions
To provide very fast response time to a large number of requests, also known as On-Line Transaction Processing (OLTP). Here performance is of key importance. The environment is usually highly interactive with many users. An example is an airline reservation system.

The SAS System has been tailored for fast sequential processing, and therefore is not well-suited to this type of application.

distributed data/distributed processing
To support an environment with applications and data on separate platforms. A given database will reside entirely on a single platform.

SAS/CONNECT® software allows an application to access data from a different platform, and it permits the application to execute on another platform. SAS/ACCESS® software supports access to data on other platforms in some environments.

distributed databases
To store parts of the same database on different platforms.

There is no support in the SAS System for distribution of a single data set across different platforms.

IS THE SAS SYSTEM A DBMS?

If you use the historical definition of a DBMS as a data repository that provides separation of data and applications, then the SAS System is clearly a DBMS.

If you choose a more contemporary definition of a DBMS, then the SAS System falls somewhat short of being a DBMS. It has a number of features found in many commercial DBMSs, but it does not have all of them.

However, this question is really academic. A better question is "What specific requirements do you have for your data repository?" If you have an OLTP environment, the SAS System will probably not satisfy your performance requirements. An Information Database environment that depends upon lots of rapid sequential access to the databases is likely to find the SAS System's performance very good.

WHERE SHOULD YOU STORE YOUR DATA?

DBMS vendors position their product as a data repository. The applications that use the data are usually not provided by the DBMS vendor. The SAS System is positioned as a data analysis and information delivery system. That is, the SAS System is the application that uses the data.

The SAS System has facilities to access data in many different formats and repositories, as has been mentioned earlier. Given that you want to process/analyze your data with the SAS System, the question here is not access to the data but where the data are to be permanently stored.

There are three basic choices for the data repository: flat/unstructured files, a commercial DBMS, or the SAS System. And there are SAS applications and non-SAS applications. With these variables, let's define six simple models:

model   primary application   data repository

1       non-SAS               flat file
2       SAS                   flat file
3       non-SAS               DBMS
4       SAS                   DBMS
5       non-SAS               SAS System
6       SAS                   SAS System

The first two models are quite reasonable and common uses of flat files as data repositories. The SAS System, via the DATA step, has powerful facilities for accessing a wide variety of flat file formats.

Models 3 and 4 are the traditional ones with a DBMS as the data repository and non-DBMS applications as consumers of the data.

In a model 5 environment, the DATA step can provide the data to applications in a wide variety of flat file formats when the original data cannot be read by the applications. The DATA step can produce multiple different flat files, one for each of the different applications. While stored in SAS data sets, the data can be edited (to repair invalid values) and subsetted prior to delivery to the applications.

The main premise of this paper is that model 6 is a viable model and should be carefully considered when deciding upon a data repository for SAS applications. The choice between model 4 and model 6 should be based upon the features you require from your data repository.

Version 6 of the SAS System lacks some features found in commercial DBMSs, as has been described previously. If you do not have any of these requirements for your data repository, then you should seriously consider using the SAS System.

The benefits of using the SAS System for the storage of your data include:

• faster access to the data for SAS applications. The SAS System is optimized to deliver data to its own procedures.

• more cost-effective solution. You don't have the added expense of a DBMS.

• a reduction in the number of vendors involved. Using the SAS System for both data analysis and data storage will eliminate the need for maintenance and system upgrades to another product (the DBMS), and it will provide a single source for problem resolution. Compatibility issues between different versions of the application software and the DBMS software will not exist.

• product consistency across many platforms. The MultiVendor Architecture (MVA)™ of the SAS System provides a portable applications environment independent of the host computer system. There is only one language to learn. SAS applications developed on one platform will run on other platforms. Data can be shared across different platforms. Your data and applications are not tied to a particular computer system.

• the ease of transferring data to non-SAS applications. In many cases, the flexibility of the SAS System for this purpose exceeds that of a traditional DBMS. While most DBMSs do have an export feature, the length and data types of the exported data are often fixed. The DATA step allows you to output flat files exactly as you want them, or as the next application needs them. In fact, the SAS System data management capabilities are often used just to massage data between applications.

FUTURE DIRECTIONS FOR DBMS FEATURES OF THE SAS SYSTEM

The features listed below are under consideration for some future release of the SAS System. No details are given as the research and development is in progress and numerous issues remain to be resolved.

• audit trail, with optional rollforward

• integrated integrity constraints, including referential integrity

• integrated data dictionary

• rollback, multiple record locking, and transactions

• improved distributed data access (libname on different host)
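To make the transaction semantics described earlier concrete (either all of the updates in an atomic unit are applied or none are, with rollback removing the pending ones), the following is a minimal sketch in Python using the standard sqlite3 module as a stand-in DBMS. It is not SAS code and does not represent any existing or planned SAS System interface. The account table, the transfer function, and the balances helper are hypothetical names chosen only for illustration.

```python
import sqlite3


def transfer(conn, src, dst, amount):
    """Apply two updates as one atomic unit (hypothetical example)."""
    try:
        # Used as a context manager, the connection commits if the block
        # finishes normally and rolls back if an exception escapes it.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            (lowest,) = conn.execute(
                "SELECT MIN(balance) FROM account"
            ).fetchone()
            if lowest < 0:
                # Abandon the atomic unit; both pending updates are removed.
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn):
    """Return the current balances as {id: balance}."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


# Illustrative data: two rows in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()
```

In this sketch, transfer(conn, 1, 2, 30) applies both updates together, while transfer(conn, 1, 2, 500) would overdraw the first row, so both pending updates are rolled back and the stored values are left unchanged.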

The goal of these efforts is to expand the DBMS services the SAS System offers you, not to displace current DBMS products in the marketplace. There is considerable use of the DBMS features that have already been implemented and strong interest in those that are on the drawing board.

CONCLUSION

It may be difficult to agree on the exact definition of a DBMS and whether or not the SAS System satisfies that definition. However, it should be clear that the SAS System does support many features found in current DBMS products, and in some cases provides more functionality. In future releases, additional DBMS functionality will be added to the SAS System.

When you are making a decision about what repository to use for your data, the SAS System is a serious candidate. It's the functionality that comes with the product, not the product's classification, that's important.

SAS, SAS/ACCESS, SAS/CONNECT, SAS/EIS, SAS/SHARE, MultiVendor Architecture, and MVA are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are registered trademarks or trademarks of their respective companies.
