Is the SAS® System a Database Management System?
William D. Clifford, SAS Institute Inc., Austin, TX

ABSTRACT

Commercial Database Management Systems (DBMSs) provide applications with fast access to large quantities of data. In addition, many have other capabilities such as data integrity services, data sharing, application-creation tools, and report writing. Version 6 of the SAS® System also contains a number of similar features.

This paper examines the database features of the Version 6 SAS System and compares them to the services offered by several popular DBMSs. The conclusion is that the SAS System can provide a cost-effective alternative to a commercial DBMS for the storage of data.

INTRODUCTION

Database Management Systems have been available for more than two decades and are frequently used as a repository for data. The applications that use this data are often not part of the DBMS and are either purchased from another vendor or developed by the user.

The SAS System is widely used as an application for data analysis. The data may come from a variety of repositories, including a number of DBMSs.

A definition of a DBMS is offered to use as the basis for answering the question posed in the paper's title. An inventory of features found in current DBMSs is provided and this inventory is compared to the DBMS features found in the SAS System.

With this background, an answer to the question of whether or not the SAS System is a DBMS is given. More relevant, however, than the name you call your data repository are the features you really require from it.

An argument is made that the data management facilities in the Version 6 SAS System have matured sufficiently so that it is a viable candidate for your data repository.

Finally some of the DBMS features planned for future releases of the SAS System are identified.

WHAT IS A DATABASE MANAGEMENT SYSTEM?

A DBMS is a software package that provides a repository for computerized data. The DBMS is responsible for storing the user's data in the repository and making it available upon demand. Users of the data are shielded from the details and peculiarities of the computer software and hardware by the DBMS. That is, a DBMS separates the application from the data. This separation is a key point and will be discussed in more detail.

A database is the term used in this paper for a logical collection of data managed by a DBMS. The terms record, row, and observation are synonyms as are column, field, and variable.

Data Separation

The objective is to separate the application from the data so that the application can focus on the external or logical aspects of the data such as analysis and presentation. The DBMS focuses on managing the internal or physical aspects of the data such as the type and quantity of storage devices and the bookkeeping necessary to support the data model.

As an example (in a relational data model), the application sees the data as rows and columns. The DBMS translates its internal storage structures into these rows and columns.

The fundamental responsibility of the DBMS, once the data are in the database, is to deliver the data back to an application. Query, selection, and update facilities are manifestations of this responsibility.

Another benefit of data separation is data sharing. Once a database is created, its data can be accessed by multiple applications.

Data Model

The data model defines the relationships that exist among the various data items in the database. Some examples of relationships are:

• field owned by a record

• child record owned by parent record

• physical order of records.

The Database Management System is responsible for supporting the relationships specified by the data model. Prior to DBMSs, this was the application's responsibility.

• Earlier DBMSs made the relationships static when the database was created. The specific relationship was the main focus of these DBMSs as evidenced by the data model they supported. Examples are hierarchies and networks.

• Newer DBMSs allow some of the relationships to be specified dynamically. Their focus is also on the relationships, but in a general, flexible sense instead of a specific, rigid sense. A DBMS that supports the relational data model is an example.

Beyond the Basics

Advancements in computer technology (e.g., more power, lower cost) placed additional burdens on DBMSs (e.g., user-friendly interfaces, improved performance). This brought demand for additional features from the DBMS.

As keepers of the data, DBMSs were required to solve these problems. Automatic query optimization, integrity constraints, high speed transactions, and point-and-click interfaces are a partial list of solutions provided by the DBMS vendors.

Although most DBMSs today have a variety of data presentation and analysis services, such features are not relevant to this discussion. Our focus here is on the storage and management of data.

FEATURES FOUND IN CURRENT DBMSs

In this section, features found in present-day DBMSs are identified. There may not be industry wide agreement on the categories or definitions used here. This section is intended to serve as a general overview of the facilities available, not a comprehensive survey.

The features are divided into two general categories, basic and advanced. The basic features reflect the core functionality of a DBMS: data separation and data relationships. The more advanced features are built upon the basic ones and reflect additions required by users to keep up with advancements in computer technology. There is no significance to the order of presentation.

Examples of components in Release 6.08 of the SAS System are included with the description of each DBMS feature. The examples used here are not intended to be an exhaustive list of such components of the SAS System.

Basic

file management
To create, populate, delete, and backup databases.

Examples of file management services in the SAS System are the DATA step and the COPY, CIMPORT, CPORT, and SQL procedures.

data inventory services
To list and display information about the existing databases.

The DATASETS and CONTENTS procedures provide data inventory services in the SAS System.

query processing
To retrieve the stored data, including data filtering, that is, selection and projection.

The DATA step, SCL, the WHERE clause, and the PRINT, SQL, REPORT, and FSBROWSE procedures provide query processing in the SAS System.

update processing
To change existing data in a database and add new data.

The DATA step, SCL, the SQL, APPEND, and FSEDIT procedures can be used for update processing in the SAS System.

relational data model
To provide support for the data model that is most popular for new applications. (However, this is not a requirement for a system to be a DBMS.)

SAS data sets are composed of rows (observations) and columns (variables), and thus are relational tables. The SQL procedure implements the de facto industry standard data manipulation language for the relational model.

file-level security
To grant or deny a user's access to an entire data file.

All host-level file security features are honored by the SAS System. In addition, data set passwords to control read, write, and utility access can be defined.

provide data in sorted order
To physically store the data in sorted order, or to sort data temporarily before they are returned to the application.

The SORT procedure and BY processing can be used to return data to the application in sorted order.

Advanced

integrated data dictionary
To provide a database of information, maintained and used by the DBMS, containing data (meta data) about all the databases managed by the DBMS.

Currently the SAS System does not have an integrated data dictionary. SAS/EIS® software supports a non-integrated metabase.

non-integrated integrity constraints
To support data validation checks performed by the application.

The SAS applications programmer can use informats and write validation code in the DATA step, SCL, and the AF and FSP procedures.

integrated integrity constraints
To support data validation checks in a multiple user/application environment. These checks are performed automatically by the DBMS for all applications. Non-integrated data validation techniques can be applied to this environment.

Currently the SAS System does not support integrated integrity constraints [...]

[...] not the application, is responsible for preventing data corruption by coordinating access to the data.

The SAS/SHARE® software product is designed to permit multiple users to read and update the same data set concurrently. The data sharing is transparent to the application.

row-level locking
To allow data sharing by row. This means multiple users can query and update a given database concurrently as long as they do not request the same row. File-level locking, by contrast, permits only one user access to the file at a time.

The SAS System supports row-level locking of a single row in a data set within SAS/SHARE software and for multiple opens of the same data set in a standalone environment.

row-level security
To grant or deny a user's access to a single row.

The SQL procedure can be used to define views with a WHERE clause to restrict a user's access to certain rows.

portability of applications
To facilitate the movement of applications and data to different platforms.

The MultiVendor Architecture™ of the SAS System is designed to provide portability of applications across heterogeneous platforms.

automatic query optimization
To allow the DBMS to determine the most efficient method of obtaining the requested data. This may include the use of auxiliary data structures such as indexes and hash tables.

Applications can create indexes for SAS data sets that will automatically be considered for WHERE clause optimization. The SQL procedure will also use appropriate indexes for join optimization.
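
The idea that an index, once created, is considered automatically for WHERE clause optimization can be illustrated outside the SAS System. The following is a minimal sketch only, and it is not SAS code: it assumes SQLite (through Python's standard sqlite3 module) as a stand-in relational engine, and the table sales, the column region, the index idx_region, and the helper where_plan are all hypothetical names chosen for illustration. It shows the same query text being planned before and after an index exists, with no change to the application's query.

```python
import sqlite3

def where_plan(conn, sql):
    """Return the engine's own description of how it intends to run `sql`."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    # The last column of each plan row is the human-readable detail text.
    return " | ".join(row[-1] for row in rows)

# Hypothetical in-memory table standing in for a data set.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("EAST", 10.0), ("WEST", 20.0), ("EAST", 30.0)],
)

# The application's query never changes.
query = "SELECT amount FROM sales WHERE region = 'EAST'"

plan_before = where_plan(conn, query)  # no index exists: full scan of the table
conn.execute("CREATE INDEX idx_region ON sales (region)")
plan_after = where_plan(conn, query)   # planner is now free to pick the index

# The result of the query is the same either way.
east_total = sum(amount for (amount,) in conn.execute(query))

print(plan_before)
print(plan_after)
print(east_total)
```

The point of the sketch is the division of labor: the application states what rows it wants, and the engine decides whether an auxiliary structure such as an index is the cheaper way to find them.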