
Information Retrieval
P. BAXENDALE, Editor

A Relational Model of Data for Large Shared Data Banks

E. F. CODD
IBM Research Laboratory, San Jose, California

Communications of the ACM, Volume 13 / Number 6 / June, 1970

Future users of large data banks must be protected from having to know how the data is organized in the machine (the internal representation). A prompting service which supplies such information is not a satisfactory solution. Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed. Changes in data representation will often be needed as a result of changes in query, update, and report traffic and natural growth in the types of stored information.

Existing noninferential, formatted data systems provide users with tree-structured files or slightly more general network models of the data. In Section 1, inadequacies of these models are discussed. A model based on n-ary relations, a normal form for data base relations, and the concept of a universal data sublanguage are introduced. In Section 2, certain operations on relations (other than logical inference) are discussed and applied to the problems of redundancy and consistency in the user's model.

KEY WORDS AND PHRASES: data bank, data base, data structure, data organization, hierarchies of data, networks of data, relations, derivability, redundancy, consistency, composition, join, retrieval language, predicate calculus, security, data integrity
CR CATEGORIES: 3.70, 3.73, 3.75, 4.20, 4.22, 4.29

1. Relational Model and Normal Form

1.1. INTRODUCTION
This paper is concerned with the application of elementary relation theory to systems which provide shared access to large banks of formatted data. Except for a paper by Childs [1], the principal application of relations to data systems has been to deductive question-answering systems. Levein and Maron [2] provide numerous references to work in this area.

In contrast, the problems treated here are those of data independence (the independence of application programs and terminal activities from growth in data types and changes in data representation) and certain kinds of data inconsistency which are expected to become troublesome even in nondeductive systems.

The relational view (or model) of data described in Section 1 appears to be superior in several respects to the graph or network model [3,4] presently in vogue for noninferential systems. It provides a means of describing data with its natural structure only, that is, without superimposing any additional structure for machine representation purposes. Accordingly, it provides a basis for a high level data language which will yield maximal independence between programs on the one hand and machine representation and organization of data on the other.

A further advantage of the relational view is that it forms a sound basis for treating derivability, redundancy, and consistency of relations; these are discussed in Section 2. The network model, on the other hand, has spawned a number of confusions, not the least of which is mistaking the derivation of connections for the derivation of relations (see remarks in Section 2 on the "connection trap").

Finally, the relational view permits a clearer evaluation of the scope and logical limitations of present formatted data systems, and also the relative merits (from a logical standpoint) of competing representations of data within a single system. Examples of this clearer perspective are cited in various parts of this paper. Implementations of systems to support the relational model are not discussed.

1.2. DATA DEPENDENCIES IN PRESENT SYSTEMS
The provision of data description tables in recently developed information systems represents a major advance toward the goal of data independence [5,6,7]. Such tables facilitate changing certain characteristics of the data representation stored in a data bank. However, the variety of data representation characteristics which can be changed without logically impairing some application programs is still quite limited. Further, the model of data with which users interact is still cluttered with representational properties, particularly in regard to the representation of collections of data (as opposed to individual items). Three of the principal kinds of data dependencies which still need to be removed are: ordering dependence, indexing dependence, and access path dependence. In some systems these dependencies are not clearly separable from one another.

1.2.1. Ordering Dependence. Elements of data in a data bank may be stored in a variety of ways, some involving no concern for ordering, some permitting each element to participate in one ordering only, others permitting each element to participate in several orderings. Let us consider those existing systems which either require or permit data elements to be stored in at least one total ordering which is closely associated with the hardware-determined ordering of addresses. For example, the records of a file concerning parts might be stored in ascending order by part serial number. Such systems normally permit application programs to assume that the order of presentation of records from such a file is identical to (or is a subordering of) the stored ordering. Those application programs which take advantage of the stored ordering of a file are likely to fail to operate correctly if for some reason it becomes necessary to replace that ordering by a different one. Similar remarks hold for a stored ordering implemented by means of pointers.

It is unnecessary to single out any system as an example, because all the well-known information systems that are marketed today fail to make a clear distinction between order of presentation on the one hand and stored ordering on the other. Significant implementation problems must be solved to provide this kind of independence.

1.2.2. Indexing Dependence. In the context of formatted data, an index is usually thought of as a purely performance-oriented component of the data representation. It tends to improve response to queries and updates and, at the same time, slow down response to insertions and deletions. From an informational standpoint, an index is a redundant component of the data representation. If a system uses indices at all and if it is to perform well in an environment with changing patterns of activity on the data bank, an ability to create and destroy indices from time to time will probably be necessary. The question then arises: Can application programs and terminal activities remain invariant as indices come and go?

Present formatted data systems take widely different approaches to indexing. TDMS [7] unconditionally provides indexing on all attributes. The presently released version of IMS [5] provides the user with a choice for each file: a choice between no indexing at all (the hierarchic sequential organization) or indexing on the primary key only (the hierarchic indexed sequential organization). In neither case is the user's application logic dependent on the existence of the unconditionally provided indices. IDS [8], however, permits the file designers to select attributes to be indexed and to incorporate indices into the file structure by means of additional chains. Application programs taking advantage of the performance benefit of these indexing chains must refer to those chains by name. Such programs do not operate correctly if these chains are later removed.

1.2.3. Access Path Dependence. Many of the existing formatted data systems provide users with tree-structured files or slightly more general network models of the data. Application programs developed to work with these systems tend to be logically impaired if the trees or networks are changed in structure. A simple example follows.

Suppose the data bank contains information about parts and projects. For each part, the part number, part name, part description, quantity-on-hand, and quantity-on-order are recorded. For each project, the project number, project name, project description are recorded. Whenever a project makes use of a certain part, the quantity of that part committed to the given project is also recorded. Suppose that the system requires the user or file designer to declare or define the data in terms of tree structures. Then, any one of the hierarchical structures may be adopted for the information mentioned above (see Structures 1-5).

Structure 1. Projects Subordinate to Parts
File  Segment  Fields
F     PART     part #
               part name
               part description
               quantity-on-hand
               quantity-on-order
      PROJECT  project #
               project name
               project description
               quantity committed

Structure 2. Parts Subordinate to Projects
File  Segment  Fields
F     PROJECT  project #
               project name
               project description
      PART     part #
               part name
               part description
               quantity-on-hand
               quantity-on-order
               quantity committed

Structure 3. Parts and Projects as Peers
Commitment Relationship Subordinate to Projects
File  Segment  Fields
F     PART     part #
               part name
               part description
               quantity-on-hand
               quantity-on-order
G     PROJECT  project #
               project name
               project description
      PART     part #
               quantity committed

Structure 4. Parts and Projects as Peers
Commitment Relationship Subordinate to Parts
File  Segment  Fields
F     PART     part #
               part description
               quantity-on-hand
               quantity-on-order
      PROJECT  project #
               quantity committed
G     PROJECT  project #
               project name
               project description

Structure 5. Parts, Projects, and Commitment Relationship as Peers
File  Segment  Fields
F     PART     part #
               part name
               part description
               quantity-on-hand
               quantity-on-order
G     PROJECT  project #
               project name
               project description
H     COMMIT   part #
               project #
               quantity committed

Now, consider the problem of printing out the part number, part name, and quantity committed for every part used in the project whose project name is "alpha." The

ray which represents an n-ary relation R has the following properties:
(1) Each row represents an n-tuple of R.
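
Returning to the parts-and-projects example of Section 1.2.3, the access path dependence it describes can be made concrete with a small sketch. The Python below is illustrative only and does not come from the paper: the nested dictionaries standing in for Structure 1 and Structure 5, the sample part and project values, and the function names are all assumptions, and the description and quantity-on-hand/on-order fields are omitted for brevity. It suggests how a program answering the "alpha" request has to follow a different path in each structure.

```python
# Hypothetical sample data; none of these values appear in the paper.

# Structure 1: projects subordinate to parts (single file F).
structure_1 = [
    {"part_no": 1, "part_name": "bolt", "projects": [
        {"project_no": 10, "project_name": "alpha", "qty_committed": 5},
        {"project_no": 20, "project_name": "beta", "qty_committed": 2},
    ]},
    {"part_no": 2, "part_name": "nut", "projects": [
        {"project_no": 10, "project_name": "alpha", "qty_committed": 7},
    ]},
    {"part_no": 3, "part_name": "gear", "projects": []},
]

# Structure 5: parts, projects, and commitments as peers (files F, G, H).
structure_5 = {
    "F": [
        {"part_no": 1, "part_name": "bolt"},
        {"part_no": 2, "part_name": "nut"},
        {"part_no": 3, "part_name": "gear"},
    ],
    "G": [
        {"project_no": 10, "project_name": "alpha"},
        {"project_no": 20, "project_name": "beta"},
    ],
    "H": [
        {"part_no": 1, "project_no": 10, "qty_committed": 5},
        {"part_no": 1, "project_no": 20, "qty_committed": 2},
        {"part_no": 2, "project_no": 10, "qty_committed": 7},
    ],
}


def parts_for_project_structure_1(parts, project_name):
    """Visit each part, then scan the projects stored beneath it."""
    return sorted(
        (part["part_no"], part["part_name"], proj["qty_committed"])
        for part in parts
        for proj in part["projects"]
        if proj["project_name"] == project_name
    )


def parts_for_project_structure_5(files, project_name):
    """Find the project in G, match commitments in H, look up names in F."""
    project_nos = {
        g["project_no"] for g in files["G"] if g["project_name"] == project_name
    }
    part_names = {f["part_no"]: f["part_name"] for f in files["F"]}
    return sorted(
        (h["part_no"], part_names[h["part_no"]], h["qty_committed"])
        for h in files["H"]
        if h["project_no"] in project_nos
    )


result_1 = parts_for_project_structure_1(structure_1, "alpha")
result_5 = parts_for_project_structure_5(structure_5, "alpha")
```

Both functions return the same rows for this sample data, yet neither can be run against the other structure: the traversal written for Structure 1 assumes that projects are nested under parts, an assumption that no longer holds once the file designer adopts Structure 5.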