A History and Evaluation of System R Donald D
Total Page:16
File Type:pdf, Size:1020Kb
COMPUTING PRACTICES A History and Evaluation of System R Donald D. Chamberlin Thomas G. Price Morton M. Astrahan Franco Putzolu Michael W. Blasgen Patricia Griffiths Selinger James N. Gray Mario Schkolnick W. Frank King Donald R. Slutz Bruce G. Lindsay Irving L. Traiger Raymond Lorie Bradford W. Wade James W. Mehl Robert A. Yost IBM Research Laboratory San Jose, California 1. Introduction Throughout the history of infor- mation storage in computers, one of SUMMARY: System R, an experimental database system, the most readily observable trends was constructed to demonstrate that the usability advantages has been the focus on data indepen- dence. C.J. Date [27] defined data of the relational data model can be realized in a system with independence as "immunity of ap- the complete function and high performance required for plications to change in storage struc- everyday production use. This paper describes the three ture and access strategy." Modern principal phases of the System R project and discusses some database systems offer data indepen- of the lessons learned from System R about the design of dence by providing a high-level user relational systems and database systems in general. interface through which users deal with the information content of their data, rather than the various bits, pointers, arrays, lists, etc. which are representation for the information; sented by connections between the used to represent that information. indeed, the representation of a given relevant part and supplier records. In The system assumes responsibility fact may change over time without such a system, a user frames a ques- for choosing an appropriate internal users being aware of the change. tion, such as "What is the lowest Permission to copy without fee all or part of The relational data model was price for bolts?", by writing a pro- this material is granted provided that the cop- proposed by E.F. Codd [22] in 1970 gram which "navigates" through the ies are not made or distributed for direct as the next logical step in the trend maze of connections until it arrives commercial advantage, the ACM copyright notice and the title of the publication and its toward data independence. Codd ob- at the answer to the question. The date appear, and notice is given that copying served that conventional database user of a "navigational" system has is by permission of the Association for Com- systems store information in two the burden (or opportunity) to spec- puting Machinery. To copy otherwise, or to republish, requires a fee and/or specific per- ways: (1) by the contents of records ify exactly how the query is to be mission. stored in the database, and (2) by the processed; the user's algorithm is Key words and phrases: database manage- ways in which these records are con- then embodied in a program which ment systems, relational model, compilation, locking, recovery, access path selection, au- nected together. Different systems is dependent on the data structure thorization use various names for the connec- that existed at the time the program CR Categories: 3.50, 3.70, 3.72, 4.33, 4.6 tions among records, such as links, was written. Authors' address: D. D. Chamberlin et al., IBM Research Laboratory, 5600 Cottle Road, sets, chains, parents, etc. For exam- Relational database systems, as San Jose, California 95193. ple, in Figure l(a), the fact that sup- proposed by Codd, have two impor- © 1981 ACM 0001-0782/81/1000-0632 75¢. plier Acme supplies bolts is repre- tant properties: (1) all information is 632 Communications October 1981 of Volume 24 the ACM Number 10 represented by data values, never by any sort of "connections" which are visible to the user; (2) the system FF supports a very high-level language in which users can frame requests for data without specifying algorithms for processing the requests. The re- lational representation of the data in Figure l(a) is shown in Figure l(b). Information about parts is kept in a PARTS relation in which each record has a "key" (unique identifier) called PARTNO. Information about suppliers SUPPLIERS is kept in a SUPPLIERS relation keyed by SUPPNO. The information which was formerly represented by connec- tions between records is now con- tained in a third relation, PRICES, in pcF which parts and suppliers are repre- sented by their respective keys. The Fig. l(a). A "Navigational" Database. question "What is the lowest price for bolts?" can be framed in a high- level language like SQL [16] as fol- lows: required for everyday production nisms to protect the integrity of the SELECT MIN(PRICE) FROM PRICES use. database in a concurrent-update en- WHERE PARTNO IN The key goals established for Sys- vironment. (SELECT PARTNO tem R were: (5) To provide a means of re- FROM PARTS. covering the contents of the database WHERE NAME = 'BOLT'); (1) To provide a high-level, to a consistent state after a failure of nonnavigational user interface for A relational system can maintain hardware or software. maximum user productivity and data whatever pointers, indices, or other (6) To provide a flexible mech- independence. access aids it finds appropriate for anism whereby different views of (2) To support different types processing user requests, but the stored data can be defined and var- of database use including pro- user's request is not framed in terms ious users can be authorized to query grammed transactions, ad hoc que- of these access aids and is therefore and update these views. ries, and report generation. not dependent on them. Therefore, (7) To support all of the above (3) To support a rapidly chang- the system may change its data rep- functions with a level of performance ing database environment, in which resentation and access aids periodi- comparable to existing lower-func- tables, indexes, views, transactions, cally to adapt to changing require- tion database systems. ments without disturbing users' ex- and other objects could easily be isting applications. added to and removed from the data- Throughout the System R project, Since Codd's original paper, the base without stopping the system. there has been a strong commitment advantages of the relational data (4) To support a population of to carry the system through to an model in terms of user productivity many concurrent users, with mecha- operationally complete prototype and data independence have become widely recognized. However, as in the early days of high-level program- ming languages, questions are some- PARTS SUPPLIERS PRICES times raised about whether or not an automatic system can choose as ef- PARTNO NAME SUPPNO NAME PARTNO SUPPNO PRICE ficient an algorithm for processing a P107 Bolt $51 Acme P107 $51 .59 complex query as a trained program- P113 Nut $57 Ajax P107 $57 .65 mer would. System R is an experi- P125 Screw $63 Amco P113 $51 .25 mental system constructed at the San P132 Gear P113 $63 .21 Jose IBM Research Laboratory to P125 $63 .15 P132 $57 5.25 demonstrate that a relational data- P132 $63 10.00 base system can incorporate the high performance and complete function Fig. l(b). A Relational Database. 633 Communications October 1981 of Volume 24 the ACM Number 10 COMPUTING tional access method called XRM, by the facilities ofXRM. XRM stores which had been developed by R. relations in the form of "tuples," PRACTICES Lorie at IBM's Cambridge Scientific each of which has a unique 32-bit Center [40]. '(XRM was influenced, "tuple identifier" (TID). Since a TID to some extent, by the "Gamma contains a page number, it is possi- which could be installed and evalu- Zero" interface defined by E.F. ble, given a TID, to fetch the asso- ated in actual user sites. Codd and others at San Jose [11].) ciated tuple in one page reference. The history of System R can be Since XRM is a single-user access However, rather than actual data divided into three phases. "Phase method without locking or recovery values, the tuple contains pointers to Zero" of the project, which occurred capabilities, issues relating to con- the "domains" where the actual data during 1974 and-most of 1975, in- currency and recovery were excluded is stored, as shown in Figure 2. Op- volved the development of the SQL from consideration in Phase Zero. tionally, each domain may have an user interface [14] and a quick im- An interpreter program was writ- "inversion," which associates do- plementation of a subset of SQL for ten in PL/I to execute statements main values (e.g., "Programmer") one user at a time. The Phase Zero in the high-level SQL (formerly with the TIDs of tuples in which the prototype, described in [2], provided SEQUEL) language [14, 16] on top values appear. Using the inversions, valuable insight in several areas, but of XRM. The implemented subset XRM makes it easy to find a list of its code was eventually abandoned. of the SQL language included que- TIDs of tuples which contain a given "Phase One" of the project, which ries and updates of the database, as value. For example, in Figure 2, if took place throughout most of 1976 well as the dynamic creation of inversions exist on both the JOB and and 1977, involved the design and new database relations. The Phase LOCATION domains, XRM provides construction of the full-function, Zero implementation supported the commands to create a list of TIDs of multiuser version of System R. An "subquery" construct of SQL, but employees who are programmers, initial system architecture was pre- not its "join" construct. In effect, this and another list of TIDs of employ- sented in [4] and subsequent updates meant that a query could search ees who work in Evanston.