CODASYL Data-Base Management Systems

ROBERT W. TAYLOR IBM Research Laboratory, San Jose, California 9519S ond RANDALLL. FRANK Science Department, University of Utah, Salt Lake City, Utah 8~I12

This paper presents in tutorial fashion the concepts, notation, aud data-base languages that were defined by the CODASYL Data Description Language and Committees. Data structure diagram notation is ex- plained, and sample data-base definitionis developed along with several sample programs. " Advanced features of the languages are discussed, together with examples of their use. An extensive bibliography is included. Keywords and Phrases: data base, data-b~e management, data-base definition,data description language, data independence, data structure diagram, , DBTG Report, data-base machines, information structure design CR Categories: 3.51, 4.33, 4.34

NTRODUCTION Data Base Language Task Group of the CODASYL Programming Language Com- The data base management system (DBMS) mittee. In fact, this article uses the syntax specifications, as published in the 1971 of the more recent reports [$2, $3, $4]. Report of the CODASYL Data Base Task However, because the fundamental system Group (DBTG) [S1], are a landmark in the architecture remains essentially the same development of data base technology. These as that specified in the 1971 Report, we specifications have been the subject of much still use the abbreviation DBTG; though debate, both pro and con, and have served not strictly accurate, "DBTG" does reflect as the basis for several commercially avail- popular usage. able systems, as Fry and Sibley noted in This article explains the specifications; it their paper in this issue of COMPUTING will not attempt to debate the merits and/or SURVErS, page 7. Future national and demerits of the specific approaches taken to international standards will certainly be implement them or of the features of the influenced by this report. different approaches. :Such debates have This article presents in tutorial fashion taken place and they will continue to be the concepts, notation, and data-base lan- guages that are defined by the "DBTG held. Michaels, Mittman and Carlson sur- Report." We choose the term DBTG to vey in their paper [see page 125] many of the describe these concepts, even though since points that have been debated. It should 1971 the role of the original Task Group be remembered, however, that the initial has been assumed by the CODASYL Data role of the DBTG was to recommend Description Language Committee and the language and system specification8 for data-

Copyright © 1976, Association for Computing Machinery, Inc. General permission to republish, but not for profit, all or part of this material is granted, provided that ACM's copyright notice is given and that reference is made to the publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery. 68 • R. W. Taylor and R. L. Frank

CONTENTS This paper presents concepts and language statements that are characteristic of DBTG- like systems. Where possible, we point out how a particular feature or option might be used.

I. DESIGN OF A DATA BASE INTRODUCTION 1. DESIGN OF A DATA BASE Two of the most difficult areas of data-base Coueepts of Information Structure Dceign management are the design of an informa- Concepts of Data Structure Dealgn Data Structure Diagrams tion structure and the reduction of that Hierarchies (l-to-n relationships) structure to a data structure which is Many-to-Many Relationships compatible with and managed by the Complex Relationehips Udn£ Data Structure Diagrams Presidential Data Ba~e DBMS. This section deals with both of 2. SAMPLE DATA-BASE APPLICATION these topics, though emphasis is placed on Presidential Data Bage in the DDL Sub-schema of the Presidential Data Bane the data structure design decisions which Sample Retrieval Program must be made. Later on, we introduce and Sample Update Program provide examples of data structure diagrams, Traversing an m:n Relation in the ConoL DML Other ConoL DML Faeilitiec a notation that is widely used to deal with 3. ADVANCED FEATURES data and information. Data-Baee Procedures Areas Location Mode Concepts of Information Structure Design Search Keya Set Selection Currency Indieatore Data-base management systems are tools 4. IMPLEMENTATIONS OF THE DBTG SPECIFICA- to be applied by the users of these systems TIONS to build an accurate and useful model of GUIDE TO FURTHER READING REFERENCES their organization and its information ACKNOWLEDGMENTS needs. To accomplish this, the information structure must accurately define and char- acterize the items of data and the relations among them that are of interest to the users. This is no small task, for it demands a knowledge of the organization and the distribution of information among its various parts. There is currently very little theory base processing in the COBOL programming which can guide a designer in the construc- !,anguage. Data-base applications written tion of this model, though there are several In a host programming language are often guidelines that can be formalized [M9]. associated with both large data bases which We present here a more intuitive formula- contain 108 characters or more and a well tion. known set of applications or transactions, A data-base designer first has the problem perhaps run hundreds of times a day, of identifying all the relevant entities (per- triggered from individual terminals. Such son, place, thing or event) that are of in- extensive processing must be efficient, and terest to his organization. For each entity, the designers of the DBTG system took the relevant attributes must be identified. care that its applications could be tuned to This is not an easy task in practice. Dif- ensure efficiency. Although the DBTG ferent users may call the same entity or recognized the importance of supporting attribute by different names or have dif- other language interfaces for a data base, fering views of it. Some users may call especially "self-contained" languages for different attributes by the same name. It unanticipated queries, it did not directly seems that resolution of such problems is address the problem of other interfaces. primarily a human activity, though some

Computing Surveys, Voi. 8, No. 1, Mareh 1976 CODASYL Data.Base ManagementSyStem • 69 automated help is available in the record- 4) If a one-to-n~ny association exists keeping phase through the use of data between separate entities (for ex- dictionary software to catalog various ample, if there are student entities characteristics of the attributes--name, and dormitory entities, and many length, type, who generates it, who uses it, students reside at a dormitory but a etc. Once the relevant attributes are identi- given student does not reside at fied, the data-base administrator has the more than one dormitory), then problem of grouping attributes together place the identifier of the "one" with into proper entities. Some possible guide- the "many" entity. In our example, lines for doing this are: we would place ithe dormitory identi- 1) Determine those attributes (or con- fier with the student entity. If there catenations of attributes), occurrences is a many-to-many association (for of which identify the entities being example, if a student is enrolled in modeled. Call these attributes identi- many courses and a course has many tiers or candidate keys. For example, if students enrolled), then consider students are the entity being modeled creating a new entity which describes and each student has a unique student this association. This entity will number as well as a social security contain the pair ,of identifiers (namely, number, then both student number the student identifier and course and social security number are identi- identifier) along with any attributes fiers. Group together all identifiers that depend on both identifiers, for for a particular entity. example, the grade received by the 2) Determine those other attributes of student in the course. an entity that describe it, and there The three examples of one-to-one, one-to- will be only one value of this attribute many, and many-to-many associations do for a given entity, but the attribute is not exhaust all possibilities: a person has not part of an identifier. Consider exactly two natural parents (a one-to-two grouping these nonidentifier attributes association); Abrial [MI] has discussed such with the identifier or identifiers of the cases. entity. For example, if students have At the end of the information design a name and are admitted from a given process, the designer should have a full high school, the items name and high specification of those, entities that are of school would be grouped with the interest, their necessary attributes, and the student number and social security names of the entities and attributes; those number. attributes that are entity identifiers; and 3) If, for an identifier, there may be those attributes that ideqtify other en- several values of an attribute, con- tities. Data structure design can then sider whether this "repeating item" commence. may be better modeled as part of a separate entity. For example, if a Concepts of Data Structure Design student is enrolled in several courses, consider whether courses are not Ideally, an information structure can be themselves separate entities worthy handled, as designed, by the DBMS. (Later of being modeled. If they are, then we give examples of situations for which see guideline 4 below. If not (for this is not the case). But there is still much example, if educational degrees are to be done to complete the design. One task considered to describe a student but is to inform the system of the information are not entities in themselves), then structure. This is generally achieved by either allocate a separate attribhte stating the design inn formal computer for each of the finite number of repe- language (the data description language). titions (degree 1, degree 2, etc.), or The data-base definition or schema,1 written associate a dependent, repeating The word schema is used because the definition structure with the entity. is a "schematic" diagram of the data base. 70 • R. W. Taylor and R. L. Frank in the data description language, is nor- istrator may either have to adapt the in- mally compiled into internal tables of the formation structure model so that it can be DBMS. Before this is possible, however, accommodated by the system, or find a other design decisions must be made. In- way to simulate (by using a special pro- formation structure design deals with en- eedure library) the capabilities not directly tities and attributes. In contrast, data-base available in the DBMS. management systems manage records which Clearly, data structure design is a de- are organized as indexed sequential files, tailed, highly technical process which re- hashed or direct access files, inversions, quires the expertise of a professional. During ring structures, or other structures [G1]. the design process, there is an immediate Thus it is necessary to reduce the entity need for a notation that can be used to and attribute level of data-base design to detail entities and their associations. One the world of , e.g., to choose of the notations most widely used to model among storage allocation strategies; to entities and their associations (and the one equate entities with records, perhaps repre- on which the DBTG system is based) is the senting associations with pointer structures; data structure diagram. to decide whether some attributes should be indexed; and to choose between one-way Data Structure Diagrams and two-way lists. A DBMS offers a variety of options during the data definition stage, The data structure diagram was introduced and the data-base administrator must by Bachman [N1]. This graphic notation choose a reasonable (if not optimal) alterna- uses two fundamental components--a rec- tive. Further, if any validity, integrity, or tangle and an arrow. A rectangle enclosing privacy constraints are to be enforced, these a name denotes an entity or record type must also be stated. that is dealt with in the data base. Thus: In later sections we detail some of the options for the design of data structures President (and the accompanying data description language) as specified in the CODASYL Data Description Language Journal of De- indicates that there are occurrences of the velopment [$2]. However, we must make the record type named PRESIDENT. Each following point. Different DBMSs offer record type is composed of data items; but different options for the design of data struc- the particular item names are suppressed tures. Usually options that require more in this description, though they are defined human decisions offer increased control and in the full data-base description. thus the opportunity to tailor data structures It is important to distinguish between a to improve overall operation. On the other record type and an occurrence of a record hand, some s]stems require less of the user, of that type. For example, WASHINGTON for example, those systems that make data and JEFFERSON denote two record oc- structure choices "automatically" (that is, currences within the record type PRESI- according to the algorithms that are in DENT. We use the term record to denote a operation). These systems are generally record occurrence and we use the term "easier to use," but often operate ineffi- record type explicitly. ciently in cases when the structure chosen The second component in a data structure "automatically" is suboptimal. diagram is a directed arrow connecting The associations that the DBMS can make two record types. The record type located among records may not be sufficient to represent the associations at the informa- at the tail of the arrow is called the tion structure level. For example, the in- owner-record type, and the record type formation structure may demand a many- located at the head is called the member- to-many association between entities, while record type. This arrow directed from the system offers only hierarchical associ- owner to member is called a set type and ations. In such cases, the data-base admin- it is named. For example,

Computing Surveyl, Vol. 8. No. 1, March 1978 CODASYL Data-Base Manaoerae~ ~y~em • 71

concept of a set type. Bachman [All has surveyed a number of: possible implementa- ~Native.Sons tion strategies. Another rule of set occurrence is easily I President[ seen by examining the following example. Two occurrences of the NATIVE-SONS set declares that a set type named NATIVE- type are shown in Figure 1, where each SONS exists between the STATE (owner) circle denotes a set occurrence. The owner record is denoted by its STATE-NAME and PRESIDENT (member) record types. item. Member records are denoted by their The declaration that a set type exists speci- PRESIDENT-NAME item. In each set fies that there are associations between records of heterogeneous types in the data occurrence, there is one STATE record and the related PRESIDENT records. Since base. This allows the designer to interrelate there are no Presidents who were born in diverse record types and thus to model asso- Maine, the Maine set occurrence involves ciations between diverse entities in the real world. only the owner record (recall that there is a set occurrence whenever there is an owner- It is also possible to have more than one record occurrence). A set occurrence with member-record type in a set-type declara- no member-record occurrences is called an tion. However, for simplicity of explana- tion, we do not treat that case in detail. "empty set." The reader should realize that sets, as Just as the distinction was drawn be- the term is used here) bear very little re- tween a record type and a record (oc- lationship to the sets used in mathematics. currence), so a distinction is made between For example, the empty set just defined a set type and a set occurrence. The exist- has an element! To avoid possible confu- ence of a set type is declared by naming it, sion, some authors have preferred to call stating its owner-record type (exactly one) these structures "data structure sets" [D1] and its member-record type or types. A set occurrence is one occurrence of the owner- or "owner-coupled sets" [D2]. There is one other rule of set occurrence: record type together with zero or more a given member record may be associated occurrences of each member-record type. with only one set occurrence of a given type. Thus there is an occurrence of a set type A member record cannot sisaultaneously whenever there is an occurrence of its belong to two owner records for the same owner-record type. A set occurrence is an set type. In terms of the circle diagrams of association of one owner record with n (> 0) Figure 1, any overlap among two circles member records. It is this 1-to-n associating of the same type is illegal; it would be il- mechanism that is the basic building block legal to make EISENHOWER a native for relating diverse records. son of both KANSAS and NEBRASKA In every set occurrence, the following (see Figure 2). We give other consequences associations exist among the tenants (i.e., of this rule later, but we emphasize that owner or member records) of that set oc- currence: this rule is fundamental to the understand- • Given an owner record, it is possible to ing of sets. Because of this rule, it is possible process the associated member records of that set occurrence. • Given a member record, it is possible to process the associated owner record of that set occurrence. • Given a member record, it is possible to process other member records in the same set occurrence. Any implementation that satisfies these three rules is a valid implementation of the FIGURE 1. NATIVE-SONS set occurreDees.

Computing Su~ysj Voi. 8.No. 1, March 1970 72 • R. W. Taylor and R. L. Frank

~Native-Sons

J President [ F Administrations-Headed FIGURE 2. Illegal set occurrences. J AdministrationJ to regard a set type as a function with a domain which is the occurrences of the FIGURE 3. Data structure diagram of hierarchy. member-record type or types and with a range which is the occurrences of the owner- record type. 1 .o,000.u,o**0 We now present some examples of data Structure diagrams. Set occurrences are denoted here in terms of one possible im- plementation-ring structures. However, it is important to understand that a set declaration does not imply a method of It Native-Sons implementation. Any implementation that JD Administrations-Headed supports the rules just discussed is possible. FIGURE 4. Occurrence of a hierarchy.

Hierarchies (1-to-n Relationships) Many-to-Many Relationships A hierarchy is a common data structure. A second common structure relates two In this structure, one record owns 0 to n record types which stand in a many-to- occurrences of another record type; each of many relationship to each other. Consider, those occurrences in turn owns 0 to n oc- for example, STUDENT and COURSE currences of a third record type, etc.: no record types. A STUDENT may be taking record owns any record that owns it, either many courses and a COURSE may have directly or indirectly. For example, STATE- many students. To model this, it is not PRESIDENT-ADMINISTRATION, as possible to construct the diagram: shown in Figure 3, is a hierarchy. A possible occurrence of this hierarchy is shown in Student Course Figure 4. In the ring structure shown in Figure 4, =Has-Enrolled = there is an access path directed from the since a COURSE with more than one owner record to its "first" member record, STUDENT would simultaneously be a and from the first member record to the member in two set occurrences of EN- "next" member record. Each member ROLLED-IN, violating the rule of unique record, except the last, has a "next" mem- ownership. This is illustrated in Figure 5. ber record, and each member record, except Similarly, when a student takes more than the first, has a "prior" member record ac- one course, the HAS-ENROLLED set con- cessed by traversing the ring. In addition, dition is violated (see Figure 6). there is an access path directed from each The usual way of representing many-to- member record back to the "owner" record. many relationships is to define a third (Note, however, that a traversal to the record type, as shown in Figure 7. This new "next" member record from the last mem- record type is used to relate the two other ber record or to the "prior" member record record types; it contains any information from the "first" member record yields a that pertains specifically to both STUDENT diagnostic.) We have now satisfied all the and COURSE, e.g., GRADE. requirements which were stated. Figure 8 shows an occurrence of the many-

Computing Surveys, VoL 8, No, 1, March 1976 CODASYL Data-Base Manaoeme~8tt~:i • 73 , i ~: "/~

Enrolled.In Has-Enrolled LI Grade FIGURE 5. Course with more than one student, Two occurrences of ENROLLED-IN with a shared member. FIGURE 7. Representation of many-to-many re- lationships.

possible data items within the CONGRESS- PRES-LINK record type. Complex Relationships Using Data Structure Diagrams The preceding section presented examples FIGUR~ 6. Student enrolled in more than one of cases where a record type was a member course. Two occurrences of HAS-ENROLLED in more than one set type. It is also possible with a shared member. for a given record type to be the owner of more than one set type. Figure 10 illustrates to-many relationship structure. It is pos- such a case. As shown in this figure, each sible to ascertain the classes a given student President won a number of Elections and is enrolled in by following the ENROLLED- headed a number of Administrations. By IN set; whenever a GRADE record is using data structure diagram notation, a found, one switches to the HAS-EN- data-base administrator can define "net- ROLLED set to find the owner record. works," where a record type may serve as Then one returns to the GRADE record, a member in one or more set types and as switches back to the ENROLLED-IN set, owner in one or more other set types. and continues to the next GRADE record, It is also possible to represent "recursive" if any, etc. The retrieval of enrolled stu- structures such as the parts explosion or bill dents given a class record is similar. When of materials structure by using data struc- viewed in this way, it is clear that n-dimen- ture diagram notation. In the parts ex- sional associations are possible when the plosion structure, a part is composed of "intersection record" in the array is a other parts, which in turn are composed of member of n set types. If the set types other parts, etc. While data structure dia- are implemented using ring structures, the gram notation does /got forbid the same structure is like a sparse array; this is an record type from being both owner and appropriate structure in many applications. member in the same set type, this rule has Another example, used in the presidential been adopted by the Data Description data-base example, is shown in Figure 9. Language Journal of Development [52]. For A President may be associated with many example, Congresses and a Congress may serve with more than one President (as a result of a death in office, for example). In this case, a many-to-many relationship exists. Whether is an illegal structure. However, it is still or not the intersection record contains data possible to represent tl~ stl~cture through a meaningful to the user, which is frequently "relationship record type," as shown in the case, the introduction of the third Figure 11. The ASY record type represents record type is generally a necessity. In an assembly of subparts. This structure has this example, the number and dates of an occurrence diagram as shown in Figure speeches addressed to a joint session of a 12 (where horizontal lines are HAS-SUB- Congress by a President are examples of STRUCTURE set occurrences and vertical

co~uti~ S~V~S. ~ 8, ~ I, Mmh 74 R. W. Taylor and R. L. Frank Math English History m+ I+ 'f I ! I I I= I ! I I I n I s ! A~I I | Bob m • I I I I I I I I I I I I I I B I Carol , i 13'• I I I I I I I I I I I I I Ted m • I t l_ '-=

v Enrolled-in ..... ,,~ Has.Enrolled

FIGURE 8. Occurrence of many-to-many relationship.

I PresidentI Congcess I I tonswonlPresientlE Administrations- , I Headed

Congress-ServedL~ I Congress~ IA President-Served !'Election Administration Pres-Link FXGURB 10. Multiple set ownership. FIGURE 9. Many-to-many association in Presi- dential data base. I Part lines are HAS-SUPERSTRUCTURE set occurrences.) Has-Substructure ~ ~ Has-Superstructure Note that as a first approximation, we can consider Figure 11 a special case of the many-to-many relationship. That is, the diagram: FIGURE 11. Parts explosion structure.

to determine how a "parts implosion" tra- versal could be accomplished. Figure 12 does not illustrate one concept: that there is only one occurrence (not two) of a part; that is, there is one bike, one frame, etc. By "folding" the array, we ob- has been "merged" because both sides are tain Figure 13, where there is only one of the same record type. By using the array notation for the occurrence diagram (see occurrence of each part, with one part Figure 12), we can accomplish a "parts description, quantity on hand, etc., no explosion traversal" using the "find next, matter where the part is used. The structure switch sets, and find the owner in the other can also be regarded as an acyclie directed set" traversal. The reader should be able graph, as shown in Figure 14, where the

Computing Surveys, Vol. 8, No. 1, March 1970 CO D A S Y L Data-Base M ano4terne~ 8yatsmt 75 E

Bike Wheel Frame Paint Fenders Body Nuts/Bolts !÷ !'1' !4' !4' !!'t

ASY-I~ASY-2' '' I ;'I I ; 'I '' Bike I I I i i i i '" ; -- b. i ; ! ; i : Wheel ASY-,.,' 1 1 I i" :A~Y"I"J'~1 i Is ; i: Frame I "~SY'2" 1--ASY'2"2 "ASY'2='3 ==1 I li ' ' :i : I L.J li= ..J il Paint I I I I I ~SY-2.2,1 ,, ASY.2.2,2 .~1 Fenders I I ' LJ ~',/ I I Body I.d Ill Has-Substructure Nuts/Bolts Has-Superstructure FIGURE 12. Occurrence of parts explosion.

I I I Bike 'AS~(-1 ~ AYS-2 ASY-3 I I I I I I I I Wheel I ASY- 1.1 ASY. 1.2-,~- I I I I I I I I ASY-2.1-- ASY.2.2--AE~-2.3 I Frame I I I ! I I I I I I I ASY-2.2.1 I I _~sY!zz2.--- Fenders I I I I I I I I Paint -- - I I I I I I I I I Body I I I I Nuts/Bolts n~-- --.,J

km Has-Substructure Has-Superstructure FIGURE 13. Parts explosion with records merged. contents of the ASY records are shown as diagram notation. Record types are PRESI- labels on the edges of the graph. DENT, CONGRESS, ADMINISTRA- TION, STATE, ELEGTION, AND CON- Presidential Data Base GRESS-PRES-LINK. These re~rd types represent the corresponding entities. In Figure 15 shows our example, the Prcsi- addition, the following associations are dential data base, using data structure modeled usingsettypes:

Computi~ ~uurv~y~~.o~L 8, No. 1, Mmrch 1976 76 • R. W. Taylor and R. L. Frank

I Bike ] AS~-I ASI(.2 AS '-3 Wheel I Frame J I AS~o2.1 ASY-2.2 ASY-2.3 ASY-1.2 ASY]I.1 ] Fenders I Body ASY-2.2.2I Paint +

FzGu~ 14. Parts explosion as a graph.

I "System" I s AII-President~SSAdministrations-Headed Elections~Won 1 Congress.SerVed Native-Sons~ Admitted-DuringJl~ / Election Congress-Pres-Link State

All-Elections-SS 4' President-Served4' AlI-States-SS4' I I , I J "System" J

Fi~uR~ 15. Data structure diagram of Presidential data base.

• Each President is associated with the ¢ Each State (except the original thirteen) Elections he won. was admitted during an Administration. • Each President is associated with his Figure 15 also shows three set types Administrations in the ADMINISTRA- where the owner is "the system." These are TIONS-HEADED set type. called singular sets and are discussed in • Each President is associated with a more detail in Section 2, Sample Data-Base number of Congresses in a many-to- Application. Based on the rules regarding many relationship. member-record accessibility, it is sufficient • Each State is associated with a number to note here that the singular set type of Presidents who are its native sons. implies that there are three access paths

Computing Surveys, VoL 8, No. I, March 1976 CODASYL Data.Base Ma~ • ??

i (however encoded physically), an access A description of a ¢h~ b~in the Schema path that passes through all occurrences DDL consists of four major sections: of PRESIDENT (ALL-PRESIDENTS- • an introductory clause SS), an access path that passes through all • one or more AREA clauses occurrences of ELECTION (ALL-ELEC- • one or more RECORD clauses TIONS-SS), and an access path that • one or more SET clauses. passes through all STATE records (ALL- The introductory elau,.~ is used to name the STATES-SS). These singular sets represent data base and to state certain global se- entry points into the data base in the sense curity and integrity constraints. that particular Presidents, Elections, or An area is a logical subdivision of the States may be located by values of their data base, which in many imple~ntations constituent items, with no need to have ac- corresponds to a file or data set in an operat- ceased other records in the data base. ing system. While we usually think of a data base as being a single integrated col- 2. SAMPLE DATA-BASE APPLICATION lection of data, it is often desirable to sub- divide such a data base into multiple logical The DBTG specifications include several subunits, in order to implement special languages which are to be used to describe security and integrity! constraints and to and manipulate data. In this section we provide a mechanism to control the per- present an example of a data base which formance and cost of implementation. uses these languages. The subsection titled Data-base security can be increased by Presidential in the DDL dis- placing highly sensitive data in logically cusses the use of the Schema data description separate areas and by placing special con- language (DDL) to describe a data struc- trols over those areas.: Of course, physical ture. The sub.schema of the Presidential separation of the areas may be used to in- Data Base subsection presents a Sub-schema crease security. Data-base integrity can be language description of that part of the data improved by placing c~ritical data in areas base which is to be processed by an applica- that are safe from harm or are often dupli- tion program~ Subsection Sample Retrieval cated, while high performance areas may Program through subsection Traversing an need to reside on high Ispesd devices. Simi- m:n Relation in the COBOL DML illustrate larly, costs can be minimized by placing the use of the COBOLprogramming language, infrequently used data in areas which reside as augmented by the DBTG data manipu- on less costly devices. Any logical or physical lation language (DML), to access and up- reason for splitting the data can utilize the date the stored data. Some of the more area concept. complex issues, particularly in the DML, The area description in the Schema DDL are not discussed here, though some are allows the data-base admiT, tatar to name presented in Section 3, Advanced Features. these subdivisions of the data base and to specify which of the areas contain which Presidential Data-Base in the DDL record types. The actual mapping of areas to one or more physical storage volumes is At the end of the preceding section we in- under the control of a separate device- troduced the components of the Presi- media control language (DMCL), The dential data base, by using a data structure need for a DMCL waB!noted but ha~ been diagram. We use this sample data base to left unspecified by the DBTG and it6 suc- illustrate the concepts presented in the cessor committees. In !many implementa- remainder of this paper. tions of the DBTG specifications, the func- Here we give a description of the Presi- tions of the DMCL are~ i~porrated in the dential data base in the Schema DDL. The job control or comma~t language of the syntax is that adopted by the 1973 . Several advanced fea- CODASYL Data Description Language tures of areas are described in detail in Sec- (DDL) Journal of Development [$2]. tion 3, Subsection Areats. 78 • R. W. Taylor and R. L. Frank

For every record type in a data base into a Sub-schema (see subsection Sub- there exists a description in the Schema schema of the Presidential Data Base). DDL. A schema record description consists DISPLAY similarly controls the printing of information about the record type, such of the stored schema. LOCKS controls the as its storage and location mechanism, and altering of privacy locks in the schema, information about the area or areas in which which is analogous to controling who may occurrences of the record type may be change the key to the key cabinet. placed. In the preceding example we specify that The record description contains a de- when the Sub-schema processor is requesting scription of all data items that constitute a copy of the schema for use in processing a the record type. A record occurrence in the sub-schema, it must supply the literal stored data base consists of occurrences of 'COPY PASSWORD' as a privacy key. each data-item type that constitute the Similarly, to modify the schema, a person record type. These record occurrences are must satisfy the privacy constraints im- the units of data transfer between the posed by the procedure named CHECK- stored data base and an application "pro- AUTHORIZATION, (which is written by gram. Thus the application programmer the data-base administrator). Such a pro- interface uses a "record at a time" logic cedure is automatically invoked every time (one record occurrence is delivered or stored anyone attempts to modify the stored for each command) in accessing the data schema. Such a procedure is termed a base. data-base procedure (see Section 3, subsection For each set type in a data base, a sepa- Data-Base Procedures). rate set description is written in the Schema Following the introductory section, we DDL. Each set description names the set can intermix descriptions of the areas, type, specifies the owner-record type and records, and sets that make up the data base, member-record type or types, and states subject to the constraint that an area de- detailed information on how occurrences of the set are to be ordered and selected. scription must precede the description of all The introductory section of a schema records that may be placed in that area and description consists of a statement naming the constraint that all records that make the schema, and certain security and up a set must be described before the de- integrity constraints. For our sample data scription of the set using them. For sim- base, this introductory section is: plicity, we present all area descriptions,

SCHEMA NAME IS PPJ!~IDENTIAL; PRIVACY LOCK FOR COPY IS 'COPY PASSWORD' PRIVACY LOCK FOR ALTER IS PROCEDURE CHECK-AUTHORIZATION; This names the schema (PRESIDEN- followed by all record descriptions, followed TIAL) and specifies certain privacy criteria by all set descriptions. (LOCKS) that must be met when attempt- In our example we restrict attention to a ing to access the stored copy of this schema. single area. All record occurrences are in In fact, the schema language provides four this single area, which is described as fol- types of privacy associated with accessing lows: the schema: ALTER, COPY, DISPLAY, and LOCKS. ALTER controls the condi- AREA NAME IS PRIi~IDENTIAL-AREA; tions under which the contents of the stored ON OPEN FOR UPDATE CALL UPDATE-CHECK; schema may be altered. (Note that this is not the same as altering the data base. This names the area as PRESIDENTIAL- Here we are controling the ability to modify AREA and specifies that, if the area is the schema itself, e.g., to add a new record opened for update, a data-base procedure type to the data-base description.) COPY (UPDATE-CHECK) will be invoked. controls who may copy the stored schema As shown in Figure 15 the Presidential

Computin4t Surveys, VoL 8, No. 1, March 1976 CODASYL Data-Base Manayeme~~ • 79 J data base consists of six record types: PRES- The Schema langtmge!'also provides for the IDENT, ADMINISTRATION, STATE, inclusion of many additional data-item CONGRESS , CONGRESS-PRES-LINK, attributes, some of which are discussed in and ELECTION. We illustrate several of Section 3, Advanced ::Features. The PIC- these records in detail and then give a TURE clause describes the number and short description of the remaining records. type of character positions that make up The PRESIDENT record may now be the data item. For example, a picture of defined as: A(10) specifies in a way similar to COBOL or PL/I that the corresponding item is RECORD NAME IS PRESIDENT LOCATION MODE IS CALC USING LAST-NAME, FIRST-NAME, DUPLICATES ARE NOT ALLOWED WITHIN PRESIDENTIAL-AREA O2 PRES-NAME made up of 10 alphabetic positions. A pic- 03 LAST-NAME PIC"A(10)" ture code of X spccites an alphanumeric 03 FIRST-NA~IE PIC"A(10)" 02 PRES-DATE-OF-BIRTH position, and a picture code of 9 specifies a 03 MONTH-B PIC"A(9)" 03 DAY-B PIC"99" numeric position. Data items can be col- 03 YEAR-B PIC"9999" which are 02 PRES-tIE1GHT PIC"X(IO)" lected into groups, collections of 02 PRES-PARTY PIC"A(10)" data items and (optiOnally) other groups 02 PRF~-COLLEGE P1C"A(10)" 02 PRES-AECESTRY P1C "A(10)" that are named. For iexample, the group 02 PRES-RELIG1ON PIC "A(10)" the data 02 PRES-DATE-OF-DEATH PRES-NAME consists of items 03 MONTH-D PIC"A(9)" LAST-NAME and FiRST-NAME. Such 03 DAY-D P1C"99" 03 YEAR-D PIC"9999" grouping provides for good documentation, 02 PRES-CAUSE-DEATH PIC"X(10)" as well as easing the programming task, 02 PRES-FATHER PIC "A(10)" 02 PRES-MOTHER PIC "A(10)" since COBOL provides for primitives for manipulating groups as well as data items. After naming the record PRESIDENT, The ADMINISTRATION record is de- the LOCATION MODE clause specifies fined as follows: RECORD NAME IS ADMINISTRbTION certain information about the way of placing LOCATION MODE lS VIA ADMINISTRATIONS-HEADED SET and retrieving record occurrences. In this WITHIN PRESIDENTIAL-AREA 02 ADMIN-KEY PIC"XXX"' example, CALC is specified. CALC refers 02 ADMIN-INAUGURATION-DATE 03 MONTH PIG "99" to address calculation or hashing. The clause 03 DAY PIC"99" specifies that a PRESIDENT record is 03 YEAR PIC "9999" positioned according to the values of its This record uses a location mode of VIA, data items LAST-NAME and FIRST- wMch specifies that the record occurrences NAME. Note the addition of the DUPLI- are, where possible, to be located near other CATES ARE NOT ALLOWED clause, record occurrences in the ADMINISTRA- which specifies that if an attempt is made to TIONS-HEADED set to which it is linked. store a new (but duplicate) occurrence of Thus all occurrences of She ADMINISTRA- the PRESIDENT record the system should TION record for a particular administration reject the request and notify the application would be placed near each other and the program of the rejection. Of course, this owning PRESIDENT record. The VIA means that the data base would not support specification advises the DBMS of the two presidents with same first and last desirability of clustering record occurrences names ! on secondary storage. Of course, this is The WITHIN clause specifies in which only a request to attempt to cluster the area or areas occurrences of the record may records; the system will follow an imple- be placed: in our sample data base it is menter-defined algorithm. The performance PRESIDENTIAL-AREA, the only area. of this algorithm will depend, in part, on Next, the data items that constitute the storage allocations made by the data-base record are specified. The names given for administrator. .~ the items are self-explanatory. Associated The remaining record des~tions are with each data item name is a picture. presented without further discussion; they

Computi~ 8m'v~s, Vol. $, No. 1, Mlueh1~6 80 • R. W. Taylor and R. L. Frank follow the same format as do the previous clause names the owner-record type. In two. these three examples, a declaration of RECORD NAME IS STATE SYSTEM as owner denotes singular sets. LOCATION MODE IS CALC USING STATE-NAME DUPLICATES ARE NOT ALLOWED The ORDER IS clause specifies the order WITHIN PRESIDENTIAL-AREA in which member-record occurrences may 02 STATE-NAME PIC "X(10)" 02 STATE-YEAR-ADMITTED PIC "9999" be presented sequentially to an application 02 STATE-CAPITAL P1C"X(10)" program. Here we specify the order of the RECORD NAME IS ELECTION LOCATION MODE IS VIA ALL-ELECTIONS--SS set to be sorted based on keys stated as WITHIN PRESIDENTIAL-AREA part of the description of the member record 02 ELECTION-YEAR PIC "9999" 02 ELECTION-WON-ELECTORAL-VOTES PIC"999" (the DEFINED KEYS option). The RECORD NAME IS CONG1RI~$S PERMANENT option (required for sorted LOCATION MODE IS CALC USING CONGRESS-KEY DUPLICATES ARE NOT ALLOWED sets) specifies that an application program WITHIN PRESIDENTIAL-AREA may not make (permanent) alterations to 02 CONGRESS-KEY PIC"XXXX" 02 CONGRI~S-NUM-PARTY--SENATE PIC"999" the order of a set. Thus any ordering changes 02 CONGRESS-NUM-PARTY-HOUSE PIC "999" made by an application program are local RECORD NAME IS CONGRESS-PRES-LINK LOCATION MODE IS VIA CCONGRI~-SERVED SET to the execution of that program and do WITmN PRESIDENTIAL-AREA not permanently affect the data base. When the record types that make up the The DUPLICATES clause specifies data base have been described, the process whether duplicates are permitted for the of ~lating record types through sets can defined keys and, if they are, how they begin. The simpl~st type of set description, should be handled. The DUPLICATES termed a s/ngu/ar set, is one where the ARE LAST option specifies that when an owner of the set is implicitly the "sys~m." attempt is made to store a member record Because such a set has a unique owner, that has duplicated the keys of another there can be only one occurrence of the set member-record occurrence, the system (thus the term singular). One way to use should place the new record occurrence the singular set is to collect all records of a after all existing record occurrences that particular type for sequential access. In have the same key. the Presidential data base the record types The MEMBER subentry names the PRESIDENT, ELECTION, and STATE member-record types that make up this set are to be members of three singular sets. type. In all of the examples from the Presi- The descriptions for these three sets are: dential data base, a set type is composed of an owner-record type and a single mem- SET NAME IS ALL-PRESIDENTS-SS OWNER IS SYSTEM ber-record type, though the specifications OI{DER IS PERMANENT SORTED BY DEFINED KEYS allow for multiple member-record types in a DL'PLICATES ARE LAST MEMBER IS PRESIDENT MANDATORY AUTOMATIC single set type. KEY IS ASCENDING LAST-NAME IN PRES-NAME SET SELECTION IS THRU ALL-PRESIDENTS-SS The record type which acts as a member OWNER IDENTIFIED BY SYSTEM of the set is named in the MEMBER IS SET NAME IS ALL-ELECTIONS-SS OWNER IS SYSTEM entry. This is followed by a statement con- ORDER IS PERMANENT SORTED BY DEFINED KEYS cerning the removal of member-record oc- DUPL1CATI'~ ARE NOT ALLOWED .MEMBEI¢ IS ELECTION MANDATORY AUTOMATIC currences from set occurrences that is al- KEY 1S ASCENDING ELECTION-YEAR SET SELECTION IS THRU ALL-ELECTIONS-SS lowed and a statement specifying how mem- OWNER IDENTIFIED BY SYSTEM ber record occurrences are initially placed SET NAME 1S ALL-STATES--SS in set occurrences. The MANDATORY OWNER IS SYSTEM ORDER 1S PERMANENT SORTED BY DEFINED KEYS specification indicates that once an oc- DUPLICATFES ARE NOT ALLOWED MEMBER IS STATE MANDATORY AUTOMATIC currence of PRESIDENT is placed in KEY IS ASCENDING STATE-NAME SET SELECTION IS THRU ALL-STATES-SS ALL-PRESIDENTS-SS it may not be OWNER IDENTIFIED BY SYSTEM removed from the set occurrence without Since all three set descriptions follow the actually deleting the record occurrence. same format, only the ~st set is described. The AUTOMATIC specification indicates After naming the set type, the OWNER IS that each time a new occurrence of the

Computing Surveys, Voi. 8, No. 1, March 1976 CODASYL Data-Ba~e Ma~If:~em • 81 i PRESIDENT record is stored in the data CALC-KEY of the eon~mmding PRESI- base, it is automatically inserted in the DENT. By referring to the description ALL-PRESIDENTS-SS set. The combined for a PRESIDENT record, we note that effect of MANDATORY AUTOMATIC the CALC-KEY is LAST-NAME, FIRST- is that all occurrences of the PRESIDENT NAME. Therefore, before we attempt to record will be a member of the ALL-PRESI- STORE a new occurrence of ELECTION DENTS-SS. in the data base, we must present the system The SET SELECTION clause allows the with values for the Winning PRESIDENT system to support the AUTOMATIC in- LAST-NAME and FIRST-NAME. Since sertion of member-record occurrences into there is a one-to-one correspondence be- the appropriate set occurrences. Since all tween owner-record occurrences and set sets described have only one occurrence, the occurrences, to identify a PRESIDENT- system may (trivially) select the proper set record occurrence uniquely identifies the occurrence. For singular sets, the SET corresponding ELECTIONS-WON set oc- SELECTION clause is a restatement of currence and allows insertion of the oc- the fact that the set is singular. The other currence of the new ELECTION record set descriptions present more complex cases into this set. of set selection. As discussed in Section 1, Design of a We now describe the ELECTIONS-WON Data Base, there is an m:n relationship be- set: tween PRESIDEN'I~! and CONGRESSES. SET NAME IS ELECTIONS-WON We relate these two records a6 follows by OWNER IS PRESIDENT ORDER IS SORTED PERMANENT BY DEFINED KEYS use of an intermediat~ "link record," which DUPLICATES ARE NOT ALLOWED in this ease is CONGRESS-PRES-LINK, ~IEMBER IS ELECTION MANDATORY AUTOMATIC KEY IS ELECTION-YEAR shown in Display 1 below. SET SELECTION IS THRU ELECTIONS-WON OWNER IDENTIFIED BY CALC-KEY The sole purpose for CONGRESS-PRES- LINK is to act as a link between PRESI- This set description basically follows the DENT and CONGRESS. As such, it is not same format as do the previous descriptions, meaningful to define a sort order, and it is with the exception of the SET SELECTION declared IMMATERIAL; the system may clause. Since there now exists an occurrence keep occurrences in an implementor-defined of ELECTIONS-WON corresponding to order. each occurrence of PRESIDENT, we need For both sets, the SET SELECTION is to tell the system how to relate a new oc- determined by the CALC-KEY of the re- currence of ELECTION to the correspond- spective owner record. For example, when ing occurrence of PRESIDENT. This arises, an application program determines that a for example, when a new ELECTION particular PRESIDENT is to be linked to record is stored, since we have declared (with AUTOMATIC) that each new oc- a particular CONGRESS, the program currence of ELECTION must be placed in must provide a value for the CALC-KEY an occurrence of the ELECTIONS-WON of PRESIDENT (LAST-NAME, FIRST- set. By stating OWNER IDENTIFIED NAME) and CONGRESS (CONGRESS- BY CALC-KEY, the system is told that the KEY). When the program issues a STORE application program will state the necessary operation on CONGRRSS-PRES-LINK Display 1 SET NAME IS CONGRESS-SERVED SET SELECTION IS THRU FItESIDENT-SERVED OWNER IS PRESIDENT OWNER IDENTIFIED BY CAI~.,-KEY ORDER IS PERMANENT IMMATERIAL 51EMBER IS CONGRFESS-PRES-L1NK MANDATORY AUTOMATIC SET SELECTION 1S THRU CONGRESS-SERVED OWNER IDENTIFIED BY CALC-KEY SET NAME IS PRESIDENT-SERVED OWNER IS CONGRESS ORDER IS PER~IANLXT IMMATERIAL MEMBER IS CONGRESS-PRES-LINK MANDATORY AUTOMATIC

Computins ~ Yd. 8, ~ I, ]~eh 1~'8 i 82 • R. W. Taylor and R. L. Frank

(CONGRESS-PRES-LINK is an AUTO- I Administration MATIC member of both CONGRESS- J President mm- SERVED and PRESIDENT-SERVED), ions i~ the record is linked to both sets. Headed The other sets relate the records Native-Son Admitted-During PRESIDENT, ADMINISTRATION, and 1 State STATE:

SET NAME IS ADMINISTRATIONS-HEADED FIGURE 16. Part of Presidential data base. OWNER IS PRESIDENT ORDER IS PERMANENT SORTED BY DEFINED KEYS DUPLICATES ARE NOT ALLOWED whenever a new record occurrence is stored MEMBER IS ADMINISTRATION MANDATORY AUTOMATIC KEY IS ASCENDING ADM1N-KEY in the data base, it will be placed in an oc- SET SELECTION IS THRU ADMINISTRATIONS-HEADED OWNER IDENTIFIED BY CALC-KEY currence of each set in which it is an auto- SET NAME IS ADMiTTED-DURING matic member. However, there is a one-to- OWNER IS ADMINISTRATION ORDER IS PERMANENT SORTED BY DEFINED KEYS one correspondence between set occurrences DUPLICATES ARE NOT ALLOWED and owner-record occurrences. In order for MEMBER IS STATE MANDATORY MANUAL KEY IS ASCENDING STATE-YEAR-ADMITTED a set occurrence to exist, there must be an SET SELECTION FOR ADMITTED-DURING IS THRU ADMINISTRATIONS-HEADED occurrence of its owner. In the context of OWNER IDENTIFIED BY CALC-KEY Figure 16, if all three records are automatic THEN THRU ADMITTED-DURING WHERE OWNER IDENTIFIED BY ADMIN-REY members of their respective sets, it is im- SET NAME IS NATIVE-SON possible to store the first record occurrence OWNER IS STATE ORDER IS PERMANENT SORTED BY DEFINED KEYS of any of the record types (because storing DUPLICATES ARE LAST an automatic member record requires an MEMBER 1S PRI~SIDENT MANDATORY AUTOMATIC KEY 1S ASCENDING LAST-NAME IN PRES-NAME owner). SET SELECTION IS THRU NATIVE-SON OWNER IDENTIFIED BY CALC-KEY The solution to the dilemma, as required by the CODASYL specifications, is the The format for the sets ADMINISTPUt- declaration that one of the member records TIONS-HEADED and NATIVE-SON (in a cyclic structure) is MANUAL. In follows that of the previous sets. In both this example, we declare STATE to be a cases the owners of the respective sets have manual member of ADMITTED-DURING. CALC-KEYS defined for them, and these The implication is that appropriate occur- calc-keys are used to determine an occur- rences of STATE must be stored in the rence of the respective sets when necessary. data base first; then store occurrences of The ADMITTED-DURING set intro- PRESIDENT, and finally occurrences of duces several new concepts. The first of ADMINISTRATION. After the record these is the concept of MANUAL member- occurrences have been stored, the appli- ship. In previous examples all member cation program can traverse the structure records have been declared AUTOMATIC, and can manually link together STATE and meaning that when a new occurrence of the ADMINISTRATION record occurrences. member record is storm in the data base, it The second and more fundamental reason is to be automatically included in an oc- for declaring STATE a manual member currence of the set. There are two important concerns the nature of the data being reasons an AUTOMATIC membership stored. The original thirteen states of the attribute would be improper here. The first Union existed from the beginning of the reason concerns the cyclic structure of the Country, and were not "added" during three sets, as shown by the data structure diagram of Figure 16. any President's Administration. It would If each of the records, PRESIDENT, be improper to require that these States ADMINISTRATION, STATE, shown in be placed in any occurrence of ADMITTED- Figure 16, were defined as automatic mem- DURING; hence STATE must be a manual bers of their respective sets, problems would member of ADMITTED-DURING. In occur. Automatic membership implies that general, when a member record participates

Computing Surveys, Vol. 8, No. 1, March 1976 CODASYL Data-Base Managvm¢~2 ~ • • 83

conditionally in a set, the MANUAL attri- Sub-schema DDL allows a data-base ad- bute is used. ministrator to delimit which portions of a The second major concept introduced by data base (as declared in the Schema) are the ADMITTED-DURING set concerns to be made available to the application its SET SELECTION clause. In all pre- program or programs. It also enables the vious examples, SET SELECTION is data-base administrator to make some either obvious (for singular sets), or ap- changes in the way that the stored data is parent directly through the CALC-key of presented to an application program; for the owner record of the set. If we refer to example, the internal representation of a the description of the ADMINISTRATION data item might be changed from binary record, we note that the location mode for in the data base to decimal when passed the record is VIA the ADMINISTRA- to the application program. The particular TIONS-HEADED set. Thus we may not sub-schema described here is defined for use identify the owner of ADMITTED- with COBOL. CODASYL intends to de- DURING by presenting a CALC-key, since velop sub-schema languages for other ADMINISTRATION is not a "calced" programming and self-contained languages. record. In this case we chose an option The application programs discussed later which specifies that we first select an occur- need only the information in that part of rence of ADMINISTRATIONS-HEADED the data base which is shown in Figure 16. by presenting the CALC-key of PRESI- We therefore define a sub-schema to be DENT. The result of this operation is the used for that information only. The first determination of a set occurrence of part of the definition gives names and ADMINISTRATIONS-HEADED. Then, privacy keys: from among the occurrences of ADMIN- SS PRI~-ADMIN-STATE-II~F0 WITHIN SCHEMA PRESIDENTIAL ISTRATION that participate in the PRIVACY KEY IS *COPY PASSWORD'. ADMINISTRATIONS-HEADED just se- lected, we select an ADMINISTRATION This calls the sub-schema PRES-ADMIN- record by providing its ADMIN-KEY. Once STATE-INFO, and names the privacy key we have identified an occurrence of AD- which allows the sub'schema processor to MINISTRATION, we have also identified access the stored schema; the schema defini- an occurrence of ADMITTED-DURING. tion has a COPY privacy lock. It is important to note that the second phase Next, the necessary areas are defined; we of this process, that of selecting an occur- must select those areas that are required rence of ADMINISTRATION based on a by the application programs that access value for ADMIN-KEY, is performed only the data base through this sub-schema. among those occurrences selected during Due to conflicts with Go~oL reserved words, the first phase. Thus there is no requirement the term AREA in the Schema DDL is that ADMIN-KEY be unique across the called a REALM in ~he sub-schema and data base, but only that it be unique within COBOL DML. Since there is only one area individual occurrences of ADMINISTRA- in the schema, we specify that it is to be TIONS-HEADED. made available: We have now described the Presidential REALM DIVISION RD PRESIDENTIAL-AREA. data base. The next phase is the declaration of one or more sub-schemas. Next we define the sets required in this sub-schema:

A Sub-schema of the Presidential Data Base SET DIVISION. SD ALL-PRESIDEI~TS-SS. The Schema defines the entire data base SD ALL-STATES--SS. SD ADMINISTRATIOI'~S--HEADED. that is stored and available to all users. SD ADMITTED-DURING. But an application program may need to SD NATIVE-SON. view only some parts of the data base, as Finally we name the records and cor- well as to make some simple changes. The responding data items that are part of this

ComputinffSuz'v~ve, VoL 8, No. |, March 1976 84 • R. W. Taylor and R. L. Frank sub-schema: The IDENTIFICATION and EN- VIRONMENT DIVISIONS remain un- RECORD DIVISION. 01 PRESIDENT. changed from those used by standard O2 PRFES-NAME. 03 LAST-NAME PICA(10). COBOL, Within the ENVIRONJvIENT DI- 03 FIRST-NAME PICA(10). VISION, we assign an internal COBOL file 01 ADMINISTRATION. 02 ADMIN-KEY PICXXX. to the actual print file, for output of our 02 ADMIN-INAUGURATION-DATE. query. 03 MONTH PIC99. 03 YEAR PIC9999. The COBOL DATA DIVISION incorpo- 01 STATE. rates the link to the data base: 02 STATE-NAME PICX(10). 02 STATE-YEAR-ADMITTED P1C9999 DATA DIVISION. FILE SEC~rIoN. DB PRES.ADMIN-STATE-INFO WITHIN PRESIDENTIAL. In this record section we not only elimi- FD REPORT-FILE. remainder of data item entries which make nate unnecessary records from the sub- up a standard COBOL file. schema, but include only those data items WORKING-STORAGE SECTION. 77 PRESIDENT-COUNT USAGE COMPUTATIONAL PIC 999. of interest to an application program using 77 DONE PIC 9(5) VALUE "04021". 77 NO-MORE-SONS PICA(5). this sub-schema. 77 NO-MORE-STATES PICA(5). Sample Retrieval Program The DB entry specifie,~ which sub-schema Once the schema and sub-schema are de- and schema this program is referencing. fined, application programs can be written While. not required by the specifications, to store and access data. The DBTG speci- the effect of such a statement in most im- fications do not include a special data-base plementations is to cause the record de- population function; therefore the first scriptions from the sub-schema (augmented program written is normally a data-base by schema information) to be copied into the load program. We shall, however, assume COBOL application program, thus reserving that a data base does exist for our examples. space within the program for each data-base The first program presented here finds record type (and selected items) which this all of the States that have more than one program may access. The record and data President as a native son. Then we print item names declared in the sub-schema are out the name of the State with its number therefore referenceable from the COBOL of Presidents. Queries like this, which in- program. DML verbs cause the DBMS to volve traversing almost the entire data base transfer data to/from the buffers reserved and performing counting are well suited for by copying the sub-schema record descrip- DBTG-type systems. tions into the program. The DML presented in this section is The working storage section remains un- designed to augment the COBOL program- changed; here we define local variables to ruing language. To save space (and to aid be used in the program. PRESIDENT- those who are not COBOL programmers), COUNT is used to keep a count of the na- the examples use English language de- tive sons of a particular state. DONE is scriptions for nondata-base functions such used as a mnemonic device for the status as input/output and computation. For those condition of a data-base. It is initialized who know the COBOL programming lan- to the correct status value. While using a guage, the corresponding code should be DBTG-like system, it might be common obvious. Our COBoL/DML program begins with practice to define a library of common the standard COBOL IDENTIFICATION status codes and to include them in the and ENVIRONMENT DIVISIONS: working storage section by using the COBOL IDENTIFICATION DIVISION. COPY facility. NO-MORE-SONS and NO- PROGRAM-NAME. SAMPLE-QUERY, MORE-STATES are status variables used ENVIRONMENT DIVISION. identification of machine environment and within the program logic. d~|acafion of non-data-base files (i.e., atandatd COBOL filos) The procedures for accessing the data base

Computing Surveys, Vol. 8, No. 1, March 1976 • 85 are specified in the COBOL PROCEDURE an expected error situ~tlon occurs:when we DIVISION: sequentially traverse :thr~l.Jl a st oc- PROCEDURE DIVISION, currence and reach the end of the set oc- DECLARATIVES. EXPECTED-ERROR SECTION. currence. Such an error situation usually USE FOR DATABASE-EXCEPTION ON "04021". means that we should ~ proceed to another EXPECTED-ERROR-HANDLING. EXIT, part of our program to~ continue processing. UNEXPECTED-ERROR SECTION. USE FOR DATABASE-EXCEPTION ON OTHER. An example of an unexpected error condi- UNEXPECTED-ERROR-HANDLING. tion is an input/output error (for example, here we would procc.~s unexpected error conditions. bad parity detected). END DECLARATIVES, In this example the DATABASE- STATUS code of "04021" corresponds to The first part of the PROCEDURE the end-of-set occurrence condition just DIVISION is the DECLARATIVES sec- mentioned. When suchl an error occurs, the tion. In the DECLARATIVES section we processing specifies a return to the state- state that the processing is to take place ment following the DML command which when the DBMS determines that an error caused the error (EXIT). The program has occurred. The DBMS maintains a would then examine DATAB/L_SE.STATUS status location that can be referenced in and, if an end-of-set occurrence condition the program by the name DATABASE- had occurred, could br~eh to a~ther part STATUS. DATABASE-STATUS, upon re- of the program. turn from a DML command, will contain a If any DATABASE.STATIYS code other value of "00000" if the command was suc- than "04021" occurs, control is passed cessfully executed. A nonzero code repre- to UNEXPECTED-ERROR.HANDLING. sents an error condition, with the value Here we would specify the proeea~ing to for the code indicating the nature of the take place for these ul~xpege~i error situ- error. ations. The code can examine DATABASE- In the event that an error code results from a DML operation, control is returned STATUS as well as other status locations in an attempt to determine what caused the to that section within the DECLARATIVES error and how the program should attempt that corresponds to the error code. For to recover from it. every error status code listed explicitly in a Following the DECLARATIVES section USE FOR DATABASE-EXCEPTION ON are the normal procedures: "error status code", control is passed to the processing INITIALIZATION. paragraph that follows the USE statement. READY PRESIDENTIAL.AI~A. OPEN nen-databaee COBOL fil~. In the preceding example, if a DATABASE- MOVE "FALSE" TO NO~-31OltE-STATES. STATUS of "04021" were returned, the FIND FIRST STATE IN ALL4~TATES.$S. PERFORM PROCESS-STATE TIIIRU FINISH-STATE paragraph labeled EXPECTED-ERROR- UNTIL NO-MORE.STAT~ - "TRUE". GO TO FINISH-UP. HANDLING would be invoked. Any PROCFESS~STATE. DATABASE-STATUS codes that are not MOVE 0 TO PRESIDENT-coUNT. IF NATIVE-SON IS EMPTY explicitly listed cause a transfer to the para- MOVE "TRUE" TO NO-MORE.SONS, ELSE MOVE "FALSE~ TO I~,'O-MORE.SOI~;S. graph following the USE FOR DATABASE- PERFORM COUNT-NATIVE-SOI~S EXCEPTION ON OTHER statement. In UNTIL NO-MORE-SONS - "TRUE". GO TO FINISH,STA'I~ the preceding example, control would be COUNT-NATIVE-SONS. FIND NEXT PRESIDEET IN NATIVE-SON. returned to the paragraph UNEXPECTED- IF DATABASESTATUS -- DONE ERROR-HANDLING after an unlisted MOVE "TRUE" TO K0-MORE.80!NS ELSE ADD 1 TO PRI~IDENT-COUR'T. DATABASE-STATUS statement resulted. FINISH.STATE. IF PRESIDENT-COU~,~T IS GIR.I~ATERTHAN 1 As implied by the paragraph and section FIND STATE CURRFE~'T, GET STATE, names in the preceding example, there is write out state rise sn4 I~ ~omllt, FIND NEXT STATE IN AL]~TATES.,~I8 generally a distinction between error situ- IF DATABASE.STATUS' = DO~E ations that we expect to happen as part of MOVE "TRUE" TO NO-MORE-STATES. FINISH-UP. our normal processing, and error situations FINISH PRESIDENTIAL.AREA. CLOSE non.datalmee COBOL filee. that are totally unexpected. An example of STOP RUN.

[ . 86 • R. W. Taylor and R. L. Frank

The initialization of this program involves ceives the value "FALSE". When the set is READYing the PRESIDENTIAL-AREA not empty, we PERFORM the paragraph realm (area in the Schema language), which COUNT-NATIVE-SONS until NO-MORE- makes this realm available to the applica- SONS is set to "TRUE". tion program. Following this, any standard Within COUNT-NATIVE-SONS we COBOL files are opened (for example, the FIND the NEXT occurrence of PRESI- report file). NO-MORE-STATES is used to DENT. NEXT is relative to the current indicate that we have finished processing record occurrence of NATIVE-SON. If the all of the states, and is initialized to current record of NATIVE-SON is the "FALSE". owner record (STATE), "next" is the first Our algorithm used to solve this query member record. When a PRESIDENT is to traverse the STATE-record occur- record is the current record of NATIVE- rences (by sequencing through the singular SON, then the "next" record is the PRESI- set ALL-STATES-SS). As we find each new DENT record which follows (unless the set STATE record, we check to see whether its is now exhausted). NATIVE-SON set is empty (i.e., whether If the program has traversed through all there are no native sons of that state), in occurrences of PRESIDENT within the which case we move to the next state. current occurrence of NATIVE-SON, the Otherwise, we traverse the occurrence of system sets DATABASE-STATUS to indi- NATIVE-SON, counting the PRESIDENT cate this and passes control to the records that participate in the set occur- DECLARATIVES. Within the DECLARA- rence. If the count is equal to or greater than TIVES we have specified that for an end- one, we write out the state name and num- of-set condition return is to be passed back ber of native sons. If not, we continue by to the statement after the FIND command. selecting the next STATE record until we The program then sets NO-MORE-SONS have finished processing all states. to "TRUE", which causes the execution of The FIND statement in the COBOL DML COUNT-NATIVE-SONS to terminate. is used to locate a specified record occur- If we have successfully found another rence in the data base. It does not cause the occurrence of PRESIDENT, we increment contents of the found record to be trans- PRESIDENT-COUNT and continue an- ferred to the working storage of the pro- other iteration through COUNT-NATIVE- gram. For example, the statement FIND SONS. FIRST STATE IN ALL-STATES-SS causes The paragraph FINISH-STATE is en- the system only to locate the record oc- tered when we have finished counting all of currence. The FIND statement changes the the native sons of the current STATE. We status of several currency indicators so that check PRESIDENT-COUNT to see whether they point to the first record occurrence in its value is greater than one; if it is, we ALL-STATES-SS. The member record re-locate or again find the current STATE re- which is "first" depends on the member- cord (FIND STATE CURRENT) and GET record order which was specified in the it. The GET command causes the transfer of Schema description of the set. the record occurrence from secondary stor- By finding an occurrence of a STATE age/system buffers to the internal work record, we have identified an occurrence of space of the program. After GETting the the NATIVE-SON record. We now PER- current STATE record, we write out the FORM (execute) the COBOL paragraph la- state name and the number of native sons. beled PROCESS-STATE through FINISH- We then proceed to select the next STATE STATE until the variable NO-MORE- record within ALL-STATES-SS. If we have STATES is set to the value of "TRUE", at finished processing all STATE records, which point we branch to FINISH-UP. NO-MORE-STATES receives the value Within PROCESS-STATE we initialize "TRUE", which causes the termination of PRESIDENT-COUNT to zero. If the PERFORM PROCESS-STATE THRU current occurrence of NATIVE-SON FINISH-STATE. Otherwise, we continue is empty, we set NO-MORE-SONS to with another iteration beginning at "TRUE"; otherwise, NO-MORE-SONS re- PROCESS-STATE.

Computing Surveys, VoL 8, No. Io March 1976 CODASYL Data-Base Managlrnum~~gystem • 87

When we have finished processing all of must explicitly tell the system which AD- the STATE records, we enter the FINISH- MINISTRATION record occurrence is to UP paragraph; when we FINISH (re- be the owner of the new STATE (in the lease) the realm PRESIDENTIAL-AREA, ADMITTED-DURING set). We first lo- CLOSE, and non-data-base (CoBoL) files, we cate the appropriate PRESIDENT. The terminate the program. FIND ANY PRESIDENT statement speci- fies that the system is to locate (using Sample Update Program CALC) the occurrence of PRESIDENT While the Presidential data base is designed based on its key value (LAST-NAME, primarily for retrieval, its updating is still FIRST-NAME). Then, within the occur- necessary, for example, at election time or rence of ADMINISTRATIONS-HEADED when a new state is admitted to the Union. owned by the PRESIDENT thus selected, We now define a program to admit a new the system is to search for an occurrence of state. Referring to the discussion of selec- ADMINISTRATION with a value for tion criteria for the ADMITTED-DURING ADMIN-KEY equal to that supplied to the set, we recall that in order to enter a new program. We have now identified the neces- occurrence of STATE in the data base, we sary occurrence of ADMINISTRATION. must supply values for both LAST-NAME When we request that the STATE informa- and FIRST-NAME in PRESIDENT, and tion be STOREd in the data base, we com- for ADMIN-KEY in ADMINISTRATION. plete the process by CONNECTing the In this program the IDENTIFICATION, newly stored State record to the current ENVIRONMENT, and DATA DIVISIONs occurrence of ADMITTED-DURING. If are basically the same as before, and there- the State had been: an AUTOMATIC fore we specify only the PROCEDURE member of ADMITTED-DURING, the DIVISION: sequence of FIND statements would have been performed by the DBMS. PROCEDURE DIVISIOi'~. DECLARATIVES. UNEXPECTED-ERROR SECTION. USE FOR DATABASE-EXCEPTiON. Traversing an m:n Relation in the COBOL DML UNEXPECTED-ERROR-HANDLING. here we process unexpected error conditions. END DECLARATIVES. The relationship between CONGRESS and INITIALIZATION. PRESIDENT is a many-to-many rela- READY PRESIDENTIAL-AREA, USAGE-MODE IS UPDATE. tionship (Section 1, subsection Many- OPEN COBOL files. STORE-NEW-STATE. to-Many-Relationships) and necessi- here we read in from standard COBOL input files the values for LAST-NAME and FIRST-NAME in PRESIDENT tated the introduction of a link record ADM1N-KE¥ in ADMIi',~ISTRATION, and values for the new (CONGRESS-PRES-LINK). The introduc- STATE rocord. FIND ANY PRESIDENT. tion of such a link record complicates tra- FIND ADMINISTRATION IN ADMINISTRATIONS-HEADED USING ADMIN-KEY. versals from an occurrence of PRESIDENT STORE STATE, to his CONGRESSes, and vice-versa. We CONNECT STATE TO ADMITTED-DURING. FINISH-UP. present here a program segment designed to FINISH PRESIDENTIAL.AREA. CLOSE COBOL files. traverse such an m:n relation. STOP RUN. We assume that LAST-NAME and Because STATE is a manual member of FIRST-NAME in PRESIDENT have been ADMITTED-DURING (see subsection set to a desired PRESIDENT in Dislay Presidential Data Base in the DDL), we 2 below: Display 2 FIND ANY PRESIDENT. FIND-NEXT-LINK. FIND NEXT CONGRESS-PRES-LINK IN CONGRF, SS-SERVED IF DATABASE.STATUS ffi DONE GO TO COMPLETE-RUN. FIND-CONGRESS. FIND PRESIDENT-SERVED OWNER. here we have located a CCONGRF~Srecord occurrence over which the PRESIDENT served GO TO FIND-NEXT-LINK. COhlPLETE-RUN. continuation of program

i 88 • R. W. Taylor and R. L. Frank

This program segment is designed to use of CONNECT to place a member FIND all CONGRESSes over which a record in a set. By use of the DISCONNECT specified PRESIDENT served. After FIND- verb an optional member record may be ing the desired PRESIDENT, we traverse removed from a set occurrence. Note that through each of the CONGRESS-PRES- DISCONNECT may only be used on op- LINK records owned by this PRESIDENT. tional member records. When we reach a DATABASE-STATUS condition of DONE we have found all 3. ADVANCED FEATURES CONGRESSes over which this PRESI- DENT served. Section 2, Sample Data-Base Applications After locating a CONGRESS-PRES- introduced the basic facilities offered by the LINK, we look for the owner of the record DBTG systems. The paper by Sibley and in the PRESIDENT-SERVED set. The Fry [see page 7] pointed out, however, that owner of PRESIDENT-SERVED is the there is more involved in the design and CONGRESS record. By FINDing each maintenance of a data base than the speci- CONGRESS-PRES-LINK owned by a fication and manipulation of complex, in- particular PRESIDENT within the CON- terrelated data-base structures. In par- GRESS-SERVED set and then by FINDing ticular, any complete system must provide the owner of the link in the PRESIDENT- facilities that enable a data-base admin- SERVED set, we can traverse from a istration staff to write various utility rou- PRESIDENT to all CONGRESSes over tines for loading the data base, to check which he served. We can similarly traverse its validity, to collect statistics about record from a given CONGRESS record to all frequencies and clusterings, and to reallo- PRESIDENTs who served over that cate groups of record occurrences to im- CONGRESS. prove performance. This list is by no means exhaustive. It should come as no surprise Other COBOL DML Facilities that the same logical structure can be realized on a computer in a variety of ways, The examples presented here illustrate some each with varying efficiencies in different of the more important COBOL DML ad- situations. Users occasionally need access to ditious to the COBOLprogramming language. a "low" level of data (one close to bits on Primarily because of the static nature of the storage media); there must be such a the Presidential data base, it is difficult to system interface. create meaningful examples to illustrate An additional requirement in any system several additional COBOL DML facilities. designed to serve a wide community of Several are briefly discussed here. users concerns tuning and tailoring: means In addition to storing a new record oc- must be offered to allow various system currence in the data base, an existing record services the option of either having their occurrence may need to have some of its performance improved for a particular ap- values changed. The MODIFY verb is plication mix, or having their services by- available for this. This operation (which is passed when such usage would lead to un- performed by the system) may be very acceptable performance. Much has already complex because the effect of a key change been said about data independence; cer- may alter the position of the record or even tainly it is generally a major design goal. affect the sets in which it resides (through But data independence ultimately involves set selection). some execution time binding of decisions, A record occurrence which exists in the and the extra computation involved may be data base may be deleted by use of the intolerable. Yet it may still be desirable to ERASE verb; once again, this may be a allow certain programmers who knowingly very complex operation, because the MAN- sacrifice data independence to use the data- DATORY option may cause multiple de- base system. The full recovery services of letions of many member records. the system may be needed; its ability to Finally, the second example showed the manage multiple indexes may be needed; or

Computing SurveyB, Vol. 8, No. 1, March 1976 CODASYL Data-Base Managenumt System • 89 its ability to be used in a "customized" hence record formats and accessing strate- access method written in a procedure-ori- gies are often fixed before processing begins. ented language may be needed. However, any flexible system provides a The DBTG-like system architecture and means whereby some parameters can be language facilities makes such uses possible. set (or certain extra processing can be The proposed facilities can be categorized triggered) during execution of a request. in two parts. First, the DDL provides The DBTG system provides for this with facilities for a data-base administrator to data-base procedures. control various details of the storage strate- A data-base procedure is logically a part gies. For example, a user may declare that of the data-base definition (the schema). the system should create and maintain an It is a procedural augmentation of the index on specified items within a record (largely) declarative schema. The conditions type. The index would speed processing in under which a data-base procedure is to be appropriate situations. Examples of this invoked are given in the schema, either ex- and other DDL declarative facilities are plicitly or implicitly. The procedure often given later in this section. In addition, the operates as an ON condition functions in mechanism known as a data-base procedure PL/I and may accomplish some change to allows the normal system facilities to be the data-base state or control parameters. extended. Extremely detailed control can Data-base procedures can also be used to be attained, if desired, by the data-base collect statistics and to enforce privacy by administrator, and hence performance tun- checking passwords. We use the concept of ing is possible. Second, other facilities data-base procedures in many of the ex- provide a set of data manipulation verbs amples in the remaining subsections. (and a few more DDL options) which allow Data-base procedures are commonly used designated users to access data in a way to derive data values. Frequently, certain that is less data independent. For example, data items are computable from other items there are facilities to obtain a data-base key in the data base. For example, the value of (a logical "pointer" to a record occurrence) the total salaries paid to the people in a and subsequently to use this key to FIND department is equal to the sum of the a record. Examples of some uses of such a separate salaries of each of the employees facility are included later. It is clear that of the department; or a person's age can use of these facilities should be controlled be derived from his birth data and from carefully, since data independence could today's date. Whenever such a functional suffer. A preprocessor or extended compiler relationship exists, the data-base admin- could enforce such installation-defined stand- istrator can provide a data-base procedure ards. to compute the result. But two possibilities The subsections that follow illustrate some still remain. The functional value may be of these advanced features. We augment computed each time the record occurrence the schema developed in Section 2, Sample is retrieved by a program (VIRTUAL), or Data-Base Application, and provide frag- the functional value may be stored ments of programs that use the advanced (ACTUAL) in'the record occurrence and versions of various DML verbs. updated each time one of the values on which it depends is changed. Depending Data-Base Procedures on the situation, performance can vary widely (e.g., where an ACTUAL result de- The previous structural examples are pends on time). The DBTG specifications largely fixed at the time schema definitions provide the data-base administrator with are made. For example, the record types, the facilities for declaring items as either set types, and several of their attributes ACTUAL or VIRTUAL results of other (e.g., AUTOMATIC) are all determined items in the same record, or functions of before data-base processing begins. One items within selected members of sets reason for this is, of course, that run-time owned by the given record. As an example, interpretation generally reduces efficiency; suppose we wished to include, in PRESI-

Compu~og Sutw~y~ Vol. 8, No. 1, March 1976 90 • R. W. Taylor and R. L. Frank

DENT, the maximum number of electoral Areas votes ever obtained in any election of that president. This may be accomplished by The various record occurrences that are including an item in PRESIDENT: the subject of the STORE command must, of course, be placed on actual secondary 02 MAX-ELECT-VOTI~.S; IS ACTUAL RESULT OF storage media. Naturally, if a data-base FIND*MAX-ELECT-VOTES ON MEMBFE1RS OF ELECTIONS-WON. administrator is to have any influence in this placement strategy, there must be con- where FIND-MAX-ELECT-VOTES is a structs in the DDL that allow this policy data-base procedure name. ACTUAL was to be stated. Such control over physical chosen here since updates should be in- placement seems to be necessary for rea- frequent. FIND-MAX-ELECT-VOTES sonable performance, as demonstrated in would be called each time a new ELECTION many environments, including those not record is added to the data base. necessarily involving data bases. For ex- It is also possible to "propagate" items ample, Moler [G2] shows that munerical from an owner to a member record by using analysis programs which took advantage of a SOURCE statement. This statement the clustering of array values on the same names the item within the owner-record page had significantly improved performance type from which the propagated value should in a paged environment; for example, in be drawn; once again, the data-base admin- FORTRAN implementations with column istrator can either save storage (virtual) or major order that scan by columns, not rows, insure common copies of data values among when possible. The same phenomenon is as the various records in the set (actual). For important in a data-base application. example, consider the structure: The DBTG system provides basically two mechanisms for influencing record- occurrence placement: areas and location I PresidentJ mode (which is treated in subsection Loca- tion Mode). "An area is a named subdivi- Elections-Won sion of the data base" [$2, p. 2.23]. As was previously illustrated, each area is named Election I in the schema, and there may be one o1' more areas declared. Many implementors have where the data-base administrator has de- found it convenient to associate each area cided to include the name of the winning declared in the schema with a file (a cata- President as an item in the ELECTION logable entity) in the associated operating record (one reason for doing this might be system. However, this need not be the case; compatibility with a relational view of hence our previous stress that the area is a data). One way of implementing is: logical rather than a physical concept. Areas give the data-base administrator a RECORD NAME IS ELECTION mechanism for clustering or separating different record occurrences (possi[dy of diverse types). A given record occurrence 02 WINNING-PRESIDENT IS VIRTUAL AND SOURCE 1S PRES-NASIE of a given type may reside in only one area, OF OWNER OF ELECTIONS-WON. but other occurrences of the same :record type may reside in different areas, if desired. where PRES-NAME is a qualified name Also, occurrences of different record types of the item that will serve as the SOURCE can coexist in the same area. The area item when ELECTION is fetched. concept allows data-base designers flexi- Data-base procedures provide a number bilities such as the following: of flexible facilities to control the behavior • If certain (or all) occurrences of a of the data base. Additional examples in- record type are known to be archival, volving data-base procedures are given in they can reside in an area which is the following subsections. associated with a less expensive storage

Computing Surveys, Vol. 8, No. 1, March 1979 CO D A S Y L Data-Base M anag~ma~tt By,~t~m • 91

medium. It may be that this area need RECORD STATE i ...... WITRIN EASTERN-AP~A, WI~TgRN-A~EA not always be mounted. AREA-ID IS STATE-AREA • If records of diverse types are often then the variable STATE-AREA must be used together, their occurrences can initialized with the proper alphanumeric be clustered for high performance. area name at time of store by a command • By designating that all occurrences of like: a record type reside in one area, and by reserving that area for that record MOVE 'EASTERN-AREA' TO STATE+AREA. type, the effect of homogeneous files STORE STATE. can be achieved. In this example, the applications pro- • By omitting certain areas from a grammer has control over area placement; subschema, a measure of privacy is the MOVE (CoBoL) statement is used to attained. initialize the AREA-ID variable. If the • Records may be processed consecu- variable is left "null," then the area place- tively within an area (using the FIND ment algorithm is left to the system im- NEXT IN AREA verb), and process- plementor. Another option is to use a data- ing can proceed at essentially se- base procedure: quential speeds, if the implementor RECORD STATE allows areas to be allocated sequen- WITHIN EASTERN-ARF~A,WESTERN-AREA AREA-ID IS STATE-AREA tially. This is because each "page" USING PROCEDURE PICK$TATE-AgEA within an area will usually follow the previous one in terms of r~siding In this case, the procedure PICK-STATE- on the same cylinder on a disk. Such AREA is invoked to load the variable performance can be important, es- STATE-AREA, and the programmer need pecially in utility and certain bulk not know about the multiple areas. "statistical" application programs in Areas, then, give rather explicit control which record ordering is not important. over major subdivisions of the data base • An area may be used as a unit of and possibly their association with storage recovery. It is then possible to vary media. If this control is given to the appli- the checkpoint policy for different cations programmer, there will probably be portions of the data base. Some im- some loss in data independence. However, plementations may allow dual copies the decision concerning whether areas are of designated areas (maintained by or are not transparent:to an application is the system) both to reduce contention in the hands of the data-base administrator. and to serve as added protection Areas play an important role in data against catastrophe. manipulation in the DBTG system. They Since record occurrences are placed in are the basic unit (like files) which is areas, there must be a mechanism for OPENed and CLOSEd. As discussed in specifying in which area a record occurrence Section 2, Sample Data-Base Applications, of a given type must be stored. In Section areas are called REALMs in the COBOL subschema. The operations corresponding to 2, Sample Data-Base Application, we il- OPEN and CLOSE are called READY lustrated the constructs necessary to store and FINISH. The verb form all records in a single area. Since every record type had a unique area to contain FIND NEXT record-nameIN realm-name it, no special action was needed. If, however, provides a means whereby records can be distinct occurrences of the same record type scanned in ascending "physical" order can potentially be stored in more than one within an area. For example, if the trans- area, the situation is more complex. For action that retires erdployees moves each example, if the data-base administrator de- retired employee record to a REALM clares that STATE records may be stored called RETIRED-EMPLOYEES, then a in either of two areas by writing "batch" program to scan sequentially for

Comptiting Sllrv~vs. Vol, ~ No. |. March 1970 92 • R. W. Taylor and R. L. Frank

all the recently retired employees would specifies that the system place the new record be as follows: occurrence as close as possible to its "ap-

READY RETIRED-EMPLOYEF~. propriate" place in the set occurrence in FIND FIRST EMPLOYEE IN RETIRED-EMPLOYEES. which it (potentially) will become a member. check that aS least one exists, then This implies that the system will use the PERFOR:SI PROCESS-EMPLOYEFES UNTIL (DATABASE-STATUS = SET SELECTION clause of the appropriate END-OF-REALM) STOP RUN, set to find the proper owner-record oc- PROCESS-EMPLOYEES, currence. Having done so, the system will Process an Employee record. use the insert properties of the set (first, FIND NEXT EMPLOYEE IN RETIRED-EMPLOYEES. last, ordered, etc.) to place the new record. Location Mode Using the LOCATION MODE VIA mecha- nism, it is possible to achieve a record As illustrated in Section 2, Sample Data- clustering that is efficient for a "depth- Base Application, the DBTG-like systems first" search. For example, consider the allow a data-base administrator to have following data structure diagram: considerable control over the storage strate- gies and interrecord elusterings among records within a given area. This control is I System I achieved through proper use of the LOCA- TION MODE clause, which appears in Counties the declaration of each record type in the schema. It should be emphasized that the LOCATION MODE of a record type i i designates the strategy to be used for initial Cities record placement when a new record oc- currence is STOREd. Knowledge of how a record was initially stored can also be used in subsequently FINDing a record; but as the examples have shown, there are many J~ Streets ways of FINDing a record in the DBTG system. Thus, one should not associate LOCATION MODE with FINDing, but rather with "setting in a particular place" This diagram defines a hierarchical struc- i.e., with the STORE command. As might ture of a geographical region. By specifying be expected, there is a version of the FIND the following statements in the record verb which depends on a knowledge of the declaration section location mode of the record being sought. RECORD COUNTY Improper use of this form could, of course, .LOCATION MODE IS SYSTEM lead to a loss in data independence. WITHIN GEO-AREA There are four LOCATION MODEs de- RECORD CITY LOCATION MODE IS VIA CITIES SET fined. SYSTEM specifies that an imple- WITHIN AREA OF OWNER mentor-defined algorithm be used in storing RECORD STREET the record. Any area control specifications LOCATION ~AIODE IS VIA STREETS SET would, of course, be used by this algorithm. WITHIN AREA OF OWNER LOCATION MODE VIA set-name (where the record type is a set type member) and declaring the sets, it is possible to ef- fect a storage structure: i,dd,esex II Lex,ogtoov l cooco B,, ; ! Cities Streets

C~mputing Surveys, Vol: 8, No. 1, March 1976 CODASYL Data-Base Managemecv~ ,gystem • 93 i Here, we introduce another option of the cellent performance when applied again. For WITHIN clause. This option is meaningful example, a search fora record with location only when the LOCATION MODE of the mode VIA should follow the "clustering record is VIA some set-name. The area hierarchy" used in itslinitial storage. Simi- • decision for a given record occurrence fol- larly, knowledge of the identifier or identi- lows the area decision of the associated fiers used by the CALC routine can, in owner-record occurrence. carefully designed applications, yield per- If a data-base administrator wishes more formance close to one secondary storage explicit control over record placement, then access per record retrieval [G3]. The form

either of the two remaining location modes, FIND record-name IN set-mime USING identifier(s) CALC or DIRECT, may be appropriate. We illustrated, in Section 2, Sample Data- can be used in a hierarchical search. This is Base Application, how LOCATION MODE illustrated by the example shown in Figure CALC may be used to place records ac- 16 which finds the first MAIN street in a cording to an implementor-defined ran- city in MIDDLESEX county. This ex- domizing routine. The data-base admin- ample differs from previous ones because istrator is also free to designate a data-base there is a "den't care" condition on the procedure that can serve as an algorithm CITY record occurrence. for developing "addresses," based on the As illustrated in Section 2, the specifica- value of the identifier or identifiers. Whether tions allow usage of the form such an algorithm attempts to behave FIND ANY record-name pseudorandomly over the space of possible when a location mode of CALC has been identifiers is, of course, determined by the declared for the sought record type. By algorithm developer (the data-base ad- properly initializing the identifier or identi- ministrator). It is therefore possible to in- fiers which were used in initially storing the corporate arbitrary record placement algo- record, that record will be found. This rithms into the system. procedure requires, of course, that pro- It is important to understand the rela- grammers know which record types in the tionship between the location mode of a data base have location mode CALC and record type and the FIND verb. As il- what item or items constitute their re- lustrated in Section 2, there are many ways spective CALC-KEYS. It may therefore be to FIND a record occurrence. These may be difficult to change location modes (or the classified into two types. Either one FINDs declaration of items forming the key) with- a record occurrence based on its participa- out affecting some program and hence tion in a set (possibly a singular set), or introducing a lack of data independence. one FINDs a record occurrence based on There have been suggestions [E4] for avoid- knowledge of how it was initially STOREd. ing this potential problem by using a pre- These two methods provide a number of compiler or data-base procedures. potential flexibilities. For example, by mak- The fourth location mode for a record ing every record type a member in a singular

set and by having programmers FIND MOVE '~IIDDLFESEX' TO COUNTY-NA~IE. MOVE 'MAIN' TO STREET-NAhIE. record occurrences using the form MOVE 'FALSE' TO SUCCF~S. FIND record-name IN singular-set-name USING identifier(s) FIND COUNTY IN COUNTIES USING COUNTY-NAME. IF DATABASE-STATUS = lqOT-FOUND it is possible to provide access to all records Process for county not fouml FIND FIRST CITY IN CITIES. by specifying their partial contents. A PERFORM SEARCH-FOR-STREET UNTIL (SUCCESS = 'TRUE') OR (DATARA.~E-STATUS ~ DONE) programmer need not know the location IF SUCCESS = 'TRUE' Process for city found mode of a record type in order to use this ELSE Process for city not found. verb. In addition, the data-base adminis- SEARCH-FOR-STREET. trator can enhance performance in this case FIND STREET IN CURRENT OF STREETS U~LNG STREET-NAME, IF DATABASE-STATUS m FOUN~ by defining indexes. The details are illus- MOVE 'TRUE' TO SUCCF~S trated in subsection Search Keys. On the ELSE FIND NEXT CITY IN CITIES. ' other hand, a knowledge of the method used in storing a record initially can provide ex- FIOURE 17. Hierarchical search. 94 • R. W. Taylor and R. L. Frank type is DIRECT. In order to present this occurrences of the given record type may mode properly, we must introduce the specify the data-base key which the system notion of a data-base key. A data-base key will try to use. This is in contrast to other is a unique identifier which is associated location modes where the data-base system with every record occurrence in the data determines the data-base key based on base. This data-base key is associated with calculation keys or the proximity to a the record occurrence when it is initially logical insertion point in a set. Assuming stored, and remains the unique identifier the data-base key is not already used, the of the record occurrence throughout its system will use the program-designated key. lifetime in the data base. Although the If that data-base key is already used, the specifications leave the detailed structure of system chooses the next (higher) unused a data-base key to be decided by the im- data-base key. With care, the program may plementor, it is common for a data-base have a great deal of control over record key to have some physical implications; placement. data-base keys are often used in the im- Clearly, DIRECT location mode may be plementation strategies of a given set. much closer to the physical levels of data. For set traversal to be efficient, the mecha- If a high level of data independence is a nism used must have some physical impli- goal, then the usage of this location mode cations. For example, in many implementa- must be carefully controled. However, such tions, a data-base key will designate the a facility can be quite useful, especially area, the page within the area, and the when writing data-base maintenance pro- record number within the page of the record cedures. As an illustration, consider the occurrence. Areas (subdivisions of the data method of loading the data base. One popu- base) are often composed of fixed length lar technique is to let the system, during pages, and it is easy to note the similarity loading, operate under a schema that is to a segment/page addressing scheme in a different from the run-schema. In this load- virtual memory environment. However, in schema, the location mode of the records contrast with most virtual memory schemes, is DIRECT. If the run-schema specifies a the actual placement of records within a location mode of CALC, the load program page is often governed by a local "on-page" can then use the following algorithm: index, which maps the record number 1) For each record to be loaded, invoke portion of the data-base key to a displace- a local copy of the CALC subroutine ment within the page. In this way, space to compute a data-base key using within a page can be garbage collected, and the designated CALC-key items records, whose size can in general vary within the record. with time, can be moved within the page, 2) Sort all records by ascending com- all without disturbing pointers pointing at puted data-base key. the object being moved. Of course, if a 3) Use the load-schema (which has lo- record grows to the extent that it must be cation mode DIRECT) to store the moved to an overflow page, then a small records in one sequential pass over "overflow pointer" must remain on the the area. original page. As long as the number of So-called batch-random operation [G4] in a records on overflow pages is not a great data-base load program has been found to fraction of the total records, this scheme improve performance by as much as four can behave almost like direct address times. Location mode DIRECT has al- pointers while still allowing record move- lowed such a program to be written in a ment. If too many records are moved to high-level language. overflow pages, then the corresponding area can be expanded, and either the page Location mode DIRECT may also be size or the number of pages can be increased. used when direct accessing of records is We now return to the discussion of LO- desired or when building specialized access CATION MODE DIRECT. In the DI- methods on top of the basic system facil- RECT mode, the program which STOREs ities.

Computing Surveys, Vol. 8, No. 1, March 1976 CODASYL Data-Base Managem~2~S~y~,~ • 95

Search Keys operation does not depend on the con- Consider the data structure diagram tinued existence of the index; indexes can be added or deleted aslappropriate. In general, use of the SEARCH KEY , System clause implies that the DBMS will build an All-Elections-SS l index on the member records (of a given Elections-Won] T Native.Sons type) within a set occurrence. Its use im- plies the existence of many indexes, one for each set occurrence. The use in a singular set is a special case. Let us suppose that the data-base admin- Introduction of the SEARCH KEY istrator has determined that the dominant clause allows simulation of the traditional usage of this portion of the data base is to indexed sequential file. By declaring: answer the question: Print the elections won by President X. In this case, it would SET NAME IS ALL-STATES OWNER IS SYSTEM be appropriate to give ELECTION records ORDER IS PERMANENT I~ERTION IS a LOCATION MODE VIA ELECTIONS- SORTED BY DEFINED KEYS MEMBER IS STATE WON set in order to cluster them close to MANDATORY AUTOMATIC the related President. On the other hand, KEY IS ASCENDING STATE-NAME if we consider the question: Find the state DUPLICATES ARE NOT ALLOWED NULL IS NOT ALLOWED whose native son was the winner of the SEARCH KEY IS STATE-NAME USING INDEX election of date X, it is clear that the speed DUPLICATF,~ ARE NOT ALLOWED of response depends primarily on how fast the data-base administrator can: 1) specify the election record of a given date can be an ability to scan sequentially through all found (after that, the answer involves two record occurrences of a given type; plus 2) FIND OWNER statements). The question provide a direct path: (through an index) may then be posed: to a particular occurrence, given a value

FIND ELECTION VIA ALL-ELECTION-SS USING ELECTION-YEAR.

Since an exhaustive search of all ELEC- of its primary key. To obtain a performance TION record occurrences very likely would similar to the traditional indexed sequential be slow, it may be appropriate, if query file, the data-base administrator would volume is sufficient, to define an index on have only one record type in the associated occurrences of the ELECTION record type. area. Thus logically adjacent records would Indexes in the DBTG-like systems are tend to be physically adjacent, as in an specified by using the SEARCH-KEY indexed sequential file. clause. By declaring: SEARCH KEYs can also be used to support an arbitrary number of inversions SET NAME IS ALL-ELECTIONS-SS (secondary indexes) over the members of OWNER IS SYSTEM... MEMBER IS ELECTION a set occurrence. The data-base admin- MANDATORY AUTOMATIC... SEARCH KEY IS ELECTION-YEAR USING INDEX istrator has the option of allowing or dis- DUPLICATES ARE NOT ALLOWED allowing duplicates for items or concatena- tions of items among members of the set the system builds and maintains an index occurrence. Thus, in a Singular set, there is on the ELECTION-YEAR item. When the a mechanism for guaranteeing unique FIND statement is issued, the system uses values across all occurrences of a given this index to speed the search; it uses the record type. ELECTION-YEAR item provided by the program to search the index to find the first (and in this case only) record having the Set Selection given year. This ELECTION record is As explained in Section 2, Sample Data- then accessed. The success of the FIND Base Application, each declared set in the

• :~,.~,~:~.~, . ~ • ::~:.~L~ ¸ ~- 96 • R. W. Taylor and R. L. Frank schema has an associated set (occurrence) clared in the associated subschema, as: selection clause. The basic purpose of this SD STREETS SET SELECTION FOR STREET IS clause is to inform the system how to select a VIA COUNTIES OWNER SYSTEM particular set occurrence of the particular VIA CITIES OWNER VALUE OF COUNTY-NAME IS COUNTY set type. This is necessary, for example, VIA STREETS OWNER VALUE OF when a record type is an AUTOMATIC TOWN-NAME IS TOWN. member in a set type. As previously il- In the set selection specification, the system lustrated, when a new record of this type is instructed to first locate a COUNTY by is stored, the system must automatically going to the singular set COUNTIES and include it in a set occurrence; the set selec- to search forward until a matching county tion clause tells which is the appropriate name (i.e., MIDDLESEX) is found; then one. the system is instructed to descend into The other major use of set selection is in the CITIES set to search for a matching the FIND verb: town; finally the system is instructed to FIND record-name IN set-name USING id-1, id-2,... search for a matching street by the USING phrase on the FIND verb. When this version of the FIND verb is It should be clear that a set selection executed, the set selection clause of the declaration is basically a specification for a named set is invoked to find the set occur- data-base procedure that performs an ef- rence. Searching fol a record of the desig- fectively hierarchical search in order to nated type within the set occurrence then establish a set occurrence. An entry pDint proceeds. If the optional USING clause is is established, either through singular sets, omitted, the first such record is selected. CALC entry points, or currency (see sub- Otherwise, the system selects the first section Currency Indicators); then, if this member of the set where all identifiers (in is not the desired set occurrence, a hier- the record) are equal to the initialized archical traversal can be carried out starting identifiers. If no such record exists, the at this point until the proper set occurence search fails and control is returned to the is found. DECLARATIVES section of the program. One can also note that the only "match The set selection clause, while defined in arguments" currently allowed are item equal- the schema, may be changed or augmented ities or concatenations of item equalities. in the subschema. That is, the particular The "don't care" condition and potential set occurrence selection can vary from one for backtracking, as expressed in subsection subschema to the next, under control of the Location Mode, is not possible. This greatly data-base administrator. The set selection eases the implementation, as opposed to clause in the schema is a default, which will making tree traversals against arbitrary be used unless overridden by the sub- Boolean expressions. To achieve the same schema. effect produced by the example given in As an illustration of these concepts, we subsection Location Mode, a data-base search for the first MAIN street in procedure would be written. The system LEXINGTON in MIDDLESEX county could then be told to use the procedure (see subsection Location Mode). The rele- by the statement: vant piece of program code is: SET SELECTION IS BY PROCEDURE FIND-ANY-MAIN-STREET. MOVE 'MIDDLESEX' TO COUNTY, MOVE 'LEXINGTON' TO TOWN. MOVE 'SIAIN' TO STREET-NAME. Support for the more complex traversals FIND STREET IN STREETS USING STREET-NAME. can be defined by installation using data- This contains considerably less code than base procedures, though of course there is the previous example (though the example only one possible data-base procedure per is slightly different). Clearly, the extra set declaration per subschema. If the tra- searching is performed by the data-base versal is more specific to applications or system with only one FIND command. transactions, it must be specially pro- The specifications for doing this are de- grammed.

Computing Surveys, Vol. 8, No. 1, March 1976 CODASYL Data-Base Managemeyt System • 97 F Currency Indicators the next higher data;base key, and next within a set means tile next record in the The notion of a current record was discussed "forward" direction in that set. briefly in Section 2, Sample Data-Base It is important to understand when Applications, but our examples have not currency indicators change. Currency in- emphasized currency. Normally, the system dicators always change on execution of behaves in an expected way and the pro- some DML verb. Also, the object of certain grammer does not need to take special note actions must be the current record of the of currency. There are certain complex run-unit. The following rules apply: traversals, however, where the application • Only the current record of the run- programmer must be aware of the exact unit may be the subject of the GET, status of the currency indicators. In this ERASE, CONNECT, and DISCON- section, we discuss currency and show NECT verbs. when care must be exercised. • When a new record of any type is There are many currency indicators as- found or stored (except for special ex- sociated with a typical program that exe- ceptions discussed later), it becomes cutes with a subschema (this is called a the current record of the run-unit run-unit): and realm in which it r~sides; it also • one currency indicator designating becomes the current record of its the current record referenced by the type and of all sets in which it par- run-unit; ticipates as either owner or member. • one currency indicator for each record Thus the currency of many sets may type, indicating the current record be affected. of that type; Because the currency indicators of sets • one currency indicator for each set can potentially change when a new record type, indicating the current set oc- of a given type is found, a programmer currence of that type by "pointing" must be careful when doing traversals that at the referenced record occurrence, involve "backtracking:" By backtracking, which is either the owner or a member we mean a situation which (for some reason) of the given set; and makes it necessary that a previously es- • one currency indicator for each realm, tablished position be reestablished. If, in indicating the current record refer- the meantime, a new record has been stored, enced in that realm. the set currency indicators may have po- Thus in the Presidential data base, if a tentially changed, making the reestablish- program can access the entire data base, the ment of position difficult or impossible. system would maintain seventeen currency The programmer must anticipate such indicators--six for the six record types, changes to the currency indicators and nine for the nine sets (including each singu- take steps to avoid such situations. The lar set), one for the single area, and one following example illustrates that, even for the current record of the run-unit. The currency indicators always make the during a pure retrieval, the currency indi- concept of "next" (and prior) well defined, cators must be handled properly. The ex- regardless of how the currency has been ample has been adapted from Date [G5]. established; for example, next within a Given the STUDENT/COURSE data realm means the record in that realm with structure diagram below

Course System

Students Student- I Cou ses Grade ~L Grade

Computing Sur~eyo, V~. 8, No. 1, March 1076-

.... :? . • . 98 • R. W. Taylor and R. L. Frank answer the following question: For each MOVE 'MATH' TO COURSE-NAME. FIND COURSE 1N COURSES USING COURSE-NAME. student taking MATH, find all the other ** We now have the Math record. We a~ume it exists. courses being taken by that student and FIND FIRST GRADE IN COU1L~E-GRADE. PERFORM PRINT-STUDENTS-OTHER-COUleES UNTIL print the student's name and the names of (DATABASE.STATUS - DONE). STOP RUN. the other courses. Note the situation here. PRINT-STUDENTS-OTHER-COURSES. PERFORM FETCH-AND-PRINT-ONE-STUDENT. Given that we have found a student who PERFORM PRINT-THAT-STUDENTS-COURSES. takes MATH (found by locating the MATH ** Now check for more students by trying to ** FIND more grades associated with the given course record and by perforining the "switch and ** (Math in our example). find owner" traversal as illustrated in Sec- FIND NEXT GRADE IN COURSE-GRADE. tion 2), we must reenter the same structure END-OTHER-COURSES. FETCH-AND-PRINT-ONE-STUDENT. to find the courses which are not MATH FIND STUDENT OWNER. GET NAME OF STUDENT. being taken by that student and print Print it. END-ONE-STUDENT. them. But this retraversal of the STU- PRINT-THAT-STUDENTS-COURSES. DENT/COURSE structure wil! destroy ** Note the u~e of the RETAINING CLAUSE Below FIND NEXT GRADE IN STUDENT GRADE the previous currency associated with the RETAINING CURRENCY FOR COURSE-GRADE. IF DATABASE-STATUS ~ DONE COURSE/GRADES set. When we try to GET GRADE find another student taking MATH, the IF COU1LSE IN GRADE ~ COURSE-NAME Print the other course name. results are potentially unpredictable. GO TO PRINT-THAT-STUDENTS-COURSES. To allow a currency saving facility, two END-STUDENTS-COURSES. approaches are provided. One is the AC- FIGURE 18. Illustration of currency retention. CEPT statement. Using Data Structure Diagrams) and the (realm-name 1 relatively uncommon question: Find the ACCEPT identifier FROM tact-name ~ CURRENCY [record-name~ employees who earn more than their man- agers. The reader is encouraged to think moves the designated currency indicator to through these two examples with respect to a run-unit variable. Subsequent use of the currency indicators. FIND verb

FIND recoxd-name; DATABASE-KEY IS identifier 4. IMPLEMENTATIONS OF THE DBTG SPECi- FICATiONS would restore the relevant currency indi- cator. As mentioned in the Introduction, the Another method has also been provided. specifications initially published by the Both FIND and STORE have an optional DBTG are under continuing development RETAINING CURRENCY clause. By and refinement by groups within the using this clause, the currency for desig- CODASYL organization. It is difficult to nated record types, set types, or realms is state whether system "X" does, or does not, not changed by the execution of the verb; follow the specifications. There are, how- that is, the normal currency updates, as ever, .a number of commercially available explained previously, are not performed systems that have used one or more versions (except for the current record of the run- of the specifications as a basis for imple- unit). As an illustration of this, the pro- mentation. While these system may employ gram shown in Figure 18 answers the syntax that is slightly different from our STUDENT/COURSE question already dis- examples, they follow the same basic data cussed. We assume course name is (at least model. Some of the commercially available virtually) part of the GRADE record. The systems which are generally deemed to be reader will note that a GO TO statement is "DBTG type" systems are: used, since it is felt that a simpler program • DBMS/10 (Data Base Management results. System/10), marketed by Digital Two other examples illustrating when Equipment Corp. for use on DEC sPecial treatment of currency indicators is System 10 computers necessary are the parts explosion traversal • DMS/1100 (Data Management Sys- (see section 1, Complex Relationships tem/1100), marketed by U~IvAc

Computing Surveye~ Vol. 8, No. 1, March 1976 CODASYL Data-Base Manaoeme~t Systttm • 90 ! for use on UmvAc Series 1100 cases where a separateisub-schema language computers is provided, many implementors have still • EDMS (Extended Data Management allowed only for the inclusion or exclusion System), marketed by XEROX of entire record types or set types. In a Data Systems for use on XEROX minority of the systems the sub-schema is SIGMA 6, 7 and 9 computers allowed to select only certain data items • IDMS (Integrated Data Management from a record, or to reorder or change the System), marketed by Cullinane data-item type. Corp. for use of IBM System/360 and System/370 computers GUIDE TO FURTHER•READING • IDS/II (/II), marketed by HONEYWELL In- The DBTG class of systems will continue formation Systems for use of the to evolve, as will the state of various im- HONEYWELL 6000 series com- plementations. In an: introductory paper puters such as this, it is impossible to cover all the • PHOLAS, marketed by PHILIPS ELEC- options and characteristics of these DBTG- TROLOGICA. like systems. For these two reasons, it is Though this section is not meant to be de- necessary to read further in the literature tailed, it is worthwhile to consider what for an in-depth understanding of the total parts of the specifications are successfully system architecture, data model concepts, implemented, for such an examination would and data-base design techniques. This sec- indicate which features the implementors tion presents an annotated guide to DBTG have found easy (or difficult) to implement, literature. We concentrate here only on or which features the implementors thought literature whose principal focus is the DBTG would be of use to their customers. specifications, or literature which compares, As a general rule, the implementors have in-depth, the DBTG approach to some al- allowed full generality to be used in the ternative. Discussions of data-base manage- data model presented in Section 1, Design meat in general, or tutorial treatments, or of a Data Base, as well as in most of the references to specific vendor manuals ap- DML functions presented in Section 2, pear in the general biblio~aphy in the com- Sample Data-Base Application, and Sec- panion paper in this ' issue by Fry and tion 3, Advanced Features. The part of the Sibley. data model most frequently omitted is that The most recent developments with of singular sets. The rationale for this respect to the Schema language appear in seems to be that singular sets can be very the CODASYL Data Description Language easily simulated by the user. Journal of Development [$2]. Specifications While the basic data model is maintained for the Data Manipulation Language of in all of the systems, various implementors COBOL appear in the CODASYL COBOL have left out many of the more sophisticated Journal of Development [$3]. These docu- features in the Schema DDL. The facility ments are issued periodically, and announce- for privacy locks and keys is most fre- ments of availability appear in various quently dropped, as are data-base pro- professional journals. There is also a cedures. In addition, none of the available CODASYL Committee that is developing systems provide for VIRTUAL items. Data Manipulation Language specifications In approaches to the sub-schema facility, for FORTRAN [$5, see also $6]. All three systems vary widely. The initial imple- references are primarily language specifi- mentation of some systems did not provide cation manuals; as such they are not written for a separate sub-schema language (indeed, in tutorial fashion, but are intended pri- in 1969 the DBTG specifications did not marily for implementors and as a final include one); instead, the systems relied on arbiter regarding details of program/data- the COBOL program. Such a facility provides base system semantics. However, [$2] does for inclusion or exclusion of certain schema contain sections that describe concepts of record types from the program. Even in the Schema language.

Computi~ krv~y~ VoL 8, No. 1, M~t'0h1975 i 100 • R. W. Taylor and R. L. Frank

On a more general level, a discussion of Model" [C1] outlines the possibility of the the evolution of "navigational" systems, of coexistence of data models. The article which the DBTG systems are a prime ex- "On the Equivalences of Data Based Sys- ample, is given in the ACM Turing Lecture tems" by Sibley [C4] also explores this by Charles W. Bachman IN3]. In [N2], point. Bachman describes how data structure A SHARE Working Conference held in diagram notation can be used to illustrate Montreal, Canada, contains papers de- the organization of lower levels of data-- scribing user experiences with various com- specifically illustrating the access method mercially available implementations of the and storage medium levels. DBTG systems [U3-U5]. A collection of more advanced examples, Some aspects of implementation are based on the 1971 DBTG report has been discussed in [II-I5]. published by Frank and Sibley JEll. An- There have also been a number of papers other example by Sibley, using the 1973 dealing with the features required by syntax, is available as a National Bureau of data-base management systems. Refer- Standards report [E3]. Additional examples ences [M4-M6, MS] are of particular in- appear in vendor manuals, especially [E2]. terest. There has been a continuing debate con- cerning the merits and disadvantages of the CLASSIFICATION OF REFERENCES DBTG architecture. Aspects of this debate S Syntax and System Specifications are covered in the companion paper by N Data Structure Diagram Notation, Naviga- Michaels, Mittman, and Carlson in this tional Systems E Example Schemas, Sub-schemas, and Pro- issue. Comparisons of the DBTG proposal grams relative to the relational model appear in D Critiques and Debate Position Papers many places; one of the most complete R Suggested Revisions to the Specifications C Comparison of the DBTG Model to Other discussions is contained in the proceedings Data Models of a debate [D1, D2] in which C. W. Bach- A Designing Data Bases Using DBTG Systems U User Experience with Commercial Implemen- man and E. F. Codd are the principals. tations There have also been critiques and technical I Implementations evaluations of various aspects of DBTG. M Other Modeling Papers One such critique [D5] was presented when G Referenced Papers the 1971 report was published. In 1975, an IFIP Working Conference was devoted REFERENCES specifically to an in-depth evaluation of This bibliography collects and classifies references various constructs in the Schema language. to various articles concerning the DBTG specifi- cations. It also includes articles which debate the Proceedings of that conference have been merits of various features, articles which discuss published, and various revisions to the DDL implementational aspects, and articles which discuss data-base design in the context of a DBTG are proposed [DS, R1-R5]. The volume also system. The following abbreviations are used in contains articles that illustrate how to use the bibliography. DBTG systems to support a relational view SIGMOD/SIGFIDET The ACM Special Interest Group on the Management and discuss the use of concepts from the of Data (formerly named relational model (e.g., normalization) in the Special Interest Group the context of DBTG systems. Papers on on File Description and Translation) holds an an- the proper design of data bases by using num Conference. The data structure diagrams and on the proper Proceedings of these con- ferences are available use of the Data Manipulation Language from ACM, New York. appear in [A1-AT]. IFIP TC-2 A Special Working Con- There have been discussions of the possi- ference, "An In-Depth Evaluation of Codasyl bility of designing systems which could DDL," was held in Bel- support any data model a user might wish-- gium in January 1975. whether network, relational, hierarchic, or Proceedings of that con- ference are available in other types. Nijssen's article "Data Struc- the book Database De- turing in the DDL and Relational Data scription, B.C.M. Douque

Computing Surveye, Vol. 8, No. 1, March 1976 CODASYL Data-Base Managemen! Syst~ra • 101 i and G. M. Nijssen, Eds. nology, National Bureau of Standards, North-Holland Publ. Co., Washington, D.C.; TAYLOR, R. W., "Data administration lands,Amsterdam'1975.The Nether- [E4] and the DBTG ReporL" Proc. of ACM SIGMOD/SIGFIDET Conf., 1974, ACM, New York, pp. 431--444. (S) Syntax and System Specifications [E51 MANOLA, F. A., Principles of the CODASYL approach to the description of [$1] CODASYL DATA BASE TASK GROUP, dala structures, Report 3068, Naval Re- April 1971 report, ACM, New York. search Laboratory, 1975, Washington, D.C. [$2] CODASYL DATA DESCRIPTION LANGUAGE COMMITTEE, Data Description Language Journal of Development, Document C13.6/ (D) Critiques and Debate Position Papers 2:113, U.S. Government Printing Office, Washington, D.C. [D1] BACHMAN,C. W., "The data structure [$3] CODASYL PROGRAMMINGLANGUAGE COM- set model," in Proc. 1975 ACM-SIGMOD MITTEE, CODASYL COBOL Journal of Debate, "Data Models: Data Structure Set Development, Dept. of Supply and Services, versus Relational," R. Rustin, (Ed.), Government of Canada, Technical Ser- ACM, New York, :PPA 1-10. vices Branch, Ottawa, Ontario, Canada. [D2] CODD,E. F.; AND I)ATE, C. J. "Inter- [$4] CODASYL DATABASE LANGUAGE TASK active support for non-programmers: the GROUP, CODASYL Cobol database fa- relational and network approaches," in cility proposal, 1973, Dept. of Supply Proe. 1975 ACM-~IGMOD Debate, "Data and Services, Government of Canada, Models: Data Structure Set versus Rela- Technical Services Branch, Ottawa, Can- tional," R. Rustin, (Ed.), ACM, New ada. This document proposes revisions to York, pp. 13--41. ,, CODASYL COBOL. The revisions, as ac- [D3] DATE, C. J.; AND CODe, E. F., The cepted, appear in the latest version of the relational and network approaches: com- CODASYL Cobol Journal of Development, parison of the application programming see [$3]. interfaces," in Proc. 1975 ACM-SIGMOD [$5] CODASYL FORTRAN DML, information Debate "Data Models: Data Structure Set on the activities of this committee ' is versus Relational," R. Rustin (Ed.), ACM, available from Chairman, CODASYL New York,pp. 85-113. FORTRAN DML Committee, P.O. Box 124, [D4] EARNEST, C. P., A comparison of the 928 Garden City Drive, Monroeville, Pa., network and relational data structure models, 15146. Technical Report Sciences Corp., El [$6] STACEY,G. M., "A FORTRAN interface Segundo, California, 1974. to the CODASYL Data Base Task Group [D5] ENGELS,R.W., "An analysis of the April specifications," Computer J. 17, 2 (May 1971 DBTG report," in Proc. 1971 ACM 1974), 124-129. SIGFIDET Workshop on Data Description, Access, and Control, ACM, New York, pp. 69-91. (N) Data Structure Diagram Notation, Navi- [D6] METAXmES, A., "Information bearing and non-information bearing sets," in gational Systems Proc. of IFIP TC-~ ~pecial Working Conf., "An In-depth Technical Evaluation of the [N1] BACHMAN,C. W., "Data structure dia- CODASYL DDL~ '~ pp. 363-368. grams," Database 1, 2 (Summer 1969). [D7] OLLE,T.W., "Cut'rent and future trends [N2] BACHMAN, C. W., "The evolution of in database management systems," in storage structures," Comm. ACM 15, 7 Proc. I FI P Congress, Information Process- (July 1972), 628-634. ing 7~, North-Holland Publ. Co., Amster- [N3] BACHMAN,C. W., "The programmer as dam, The Netherlands, pp. 998--1006. navigator," Comm. ACM 16, 11 (Nov. [DS] WAGHORN,W. J., "The DDL as an in- 1973), 653-658. dustry standard?," in Proc. IFIP TC-~ Special Working Conf., "An In-depth Technical Evaluation of the CODASYL (E) Example Schemas, Subschemas, and Pro- DDL," pp. 121-167. grams tEll FRANK,R. L.; AND SIBLEY, E. H., The (R) Suggested Revisions to the .Specifications Data Base Task Group Report: an illustra- tive example, Doc. No. AD-759-267, U.S. JR1] KAY, M. H., "An assessment of the National Technical Information Service. CODASYL DDL for use with a relational [E2] PHILIPS DATA SYSTEMS, An application sub-schema," in Proc. IFIP TC-~ Special example of the CODASYL DBTG Proposal, Working Conf., "An In-depth Technical Pub. No. 5122-991-24151, Philips-Elec- Evaluation of CODASYL DDL," pp. 199- trologica, Apeldoorn, The Netherlands, 214. 1973. [R2] NIJSSEN, G. M., "Set and CODASYL [E3] SIBLEY,E. H., The CODASYL database set or eoset," in Proc. IFIP TC-~ Special approach: a Cobol example of design and Working Conf., "An In-depth Technical use of a personnel file, NBSIR 74-500, Evaluation of CODASYL DDL," pp. 1-71. Institute of Computer Sciences and Tech- [R3] OLLE, T. W., "An analysis of the flaws

ComputiQE Surv~ys, Col. 8, No. 1, March 1976 102 • R. W. Taylor and R. L. Frank

in the Schema DDL and proposed improve- tures," in Proc. Conf., on Data: Abstrac- ments," in Proc. IFIP TC-$ Special Work- tion, Definition, and Structure, ACM ing Conf., "An In-depth Technical Evalu- SIGPLAN/SIGMOD 1976, ACM, New ation of CODASYL DDL," pp. 283-297. York, pp. 73-84. [R4] ROBINSON, K. A., "An analysis of the [A5] GERRITSEN, R., "A preliminary system uses of the CODASYL set concept," in for the design of DBTG data structures," Proc. IFIP TC-e Special Working Conf., Comm. ACM, 18, 10 (Oct. 1975), 551-557. "An In-depth Technical Evaluation of [A6] MITOMA,M. F.; AND IRANI, K.B., "Auto- CODASYL DDL," pp. 169-181. matic database schema design and optimi- [R5] TAYLOR, R. W., "Observations on the zation," in Proe. of the Internatl. Conf. attributes of database sets," in Proc. on Very Large , 1975, ACM, IFIP TC-£ Special Working Conf., New York, pp. 286-321. "An In-depth Technical Evaluation of [AT] TAYLOR,R.W., "When are pointer arrays CODASYL DDL," pp. 73-84. better than chains," in Proc. ACM Na- [R6] HAWLEY, D. A.; KNOWLES, J. S.; AND tional Conf., 1974, ACM, New York, p. TOZER, E. E., "Database consistency 735. and the CODASYL DBTG proposals," Computer J. 18, 3 (1975), 206-212. (U) User Experience With Commercial Imple- (C) Comparison of the DBTG Model to Other mentations Data Models [U1] BANDURSKI, A. E. ; AND JEFFERSON, D.K., "Data description for computer- [C1] NIJSSEN, G. M., "Data structuring in aided design," in Proc. ACM SIGMOD the DDL and relational model," in Data- Internatl. Conf. on Management of Data, base Management, J. W. Klimbie, and K. L. 1975, W. F. King, (Ed.), ACM, New York, Koffeman, (Eds.), North-Holland Publ. pp. 193-202. Co., Amsterdam, The Netherlands, 1974, [U2] CANNING, R. G., "What s happening pp. 363-384. with'CODASYL-type DBMS," EDP Ana- [C2] McGEE, W. C., "A contribution to ~the lyzer, (Oct. 1974). study of data equivalence," in Database [U3] EMERSON, E. J., "DMS 1100 user ex- Management, J. W. Klimbie, and K. L. perience," in Database Management Sys- Koffeman, (Eds.), North-Holland Publ. tems, D. A. Jardine, (Ed.), North-Holland • Co., Amsterdam, The Netherlands, 1974, Publ. Co., Amsterdam, The Netherlands, pp. 123-148. 1974, pp. 35-46. [C3] SENKO, M. E., "Data description lan- [U4] LAVALLEE,P. A.; AND OHAYON, S., "DMS guage in the context of a multi-level applications and experience," in Database structured description: DIAM II with Management Systems, D. A. Jardine, (Ed.), FORAL," in Proc. IFIP TC-~ Special Work- North-Holland, Publ. Co., Amsterdam, ing Conf., "An In-depth Technical Evalu- The Netherlands, 1974, pp. 47-67. . ation of CODASYL DDL," pp 239-257. [U5] VON GOHREN, i..~.~'~ ~.,r "lTma.p.... e Xp erlence,, [CA] SIBLEY,E.H., On the eqmvalences of with integrated data store (IDS), in databased systems," in Proc. 1975 ACM- Database Management Systems, D. A. SIGMOD Debate, "Data Models: Data Jardine, (Ed.), North-Holland Publ. Co., Structure Set versus Relational," R. Rustin, Amsterdam, The Netherlands, 1974, pp. (Ed.), ACM, New York, pp. 45-76. 19-33. [C5] STONEBRAKER,M.; AND HELD, G., "Net- works, hierarchies, and relations in data- base management systems," in Proc. (I) Implementations ACM Pacific 75 Regional Conf., ACM, New York, pp. 1-9. [Ill BACHMAN, C. W., AND WILLIAMS, S. B., "A general purpose programming system for random access memories," in Proc. (A) Designing Data Bases Using DBTG Systems AFIPS 1965 Fall Jr. Computer Conf., Vol. 26, Spartan Books, Baltimore, Mary- [All BACHMAN,C.W., "Implementation tech- land, pp. 411-422. niques for data structure sets," in Database [I2] CANADAY, R. H., et al., "A back-end Management System~, D. A. Jardine, computer for database management," (Ed.), North-Holland Publ. Co., Amster- Comm. ACM 17, 10 (Oct. 1974), 575-582. dam, The Netherlands, 1974, pp. 147-257. lI3] FOSSUM, B. M., "Database integrity as [A2] BAKER, G. J., "The correct use of provided for by a particular database CODASYL DBTG sets," Database Journal management system," in Database Man- 6, 2, pp. 19-21. A. P. Publications Ltd., agement, J. W, Klimbie and K. L. Koffe- London. man, (Eds.), North-Holland Publ. Co., [A3] BROWN,A. P.G., "Modeling a real world Amsterdam, The Netherlands, 1974, pp. system and designing a schema to repre- 271-288. sent it," in Proc. IFIP TC-$ Special [I4] JOHNSON, H. R., "A schema report Working Conf., "An In-depth Technical facility for a CODASYL based data defi- Evaluation of CODASYL DDL," pp. 339- nition language," in Database Description, 347. B. C. M. Douque and G. M. Nijssen, [A4] BUBENKO,J. A., JR., et al., "From in- (Eds.), North-Holland Publ. Co., Amster- formation structures to DBTG data strut- dam, The Netherlands, 1975, pp. 299-328.

Computing Surveys, Vol. 8, No. 1, March 1976 CODASYL Data-Base Mana~rnent Syaera • 103

[ISl SCHENK, H., "Implementational aspects [M8] OLLE, T. W., An asa~sment of how the of the CODASYL DBTG proposal," in CODASYL data base task group proposal Database Management, J. W. Klimbie and meets the UUIDE-Su4~z requtreraents, K. L. Koffeman (Eds.), North-Holland Report 329 Norwegian Computing Center, Publ. Co., Amsterdam, The Netherlands, 1972. 1974, pp. 399-412. [M9] SENKO, M. E., et al., "Data structures [I6] WARREN, THOMAS, Feature analysis of and accessing in database systems." CODASYL database management systems, IBM Systems J. lg, 1, (1973), 30--93. " AD-A014 972/4WC, National Technical [M10] STEEL, T. B., Ja., Database standardi- Information Service, Springfield, Virginia, zation: a status report," in Database 1975. Deseription, B. C. M. Douque and G. M. Nijssen, (Eds.), North-Holland Publ. Co., Amsterdam, The Netherlands, 1975_pp. (M) Other Data Modeling Papers 183-198. Also in Proc. ACM 8IOMOD [M1] ABRIAL, !" R., "Data semantics," in Internatl. Conf. on Management of Data, Database Management, J. W. Klimbie and 1975, W. F. King, (Ed.), ACM, New York, K. L. Koffeman, (Eds.), North-Holland pp. 149-156. Publ. Co., Amsterdam, The Netherlands, [Mll] PARSONS,R. G.; DALE, A. G.; xNn YUR- 1974, pp. 1--60. KANEN, C. V., "Data manipulation lan- [M2] COLLMEYER,A.J., "Implications of data guage requirements for database manage- independence on the architecture of data- ment systems," Computer J. 17, 2 (May base management systems," in Proc. 1974), 99-103. ACM SIGFIDET Workshop on Data Description, Access, and Control, 1972, (G) Referenced Papers A. L. Dean, (Ed.), ACM, New York, pp. 307-321. [G1] DODD, G. G., "Elements of data man- [M3] EARNEST, C. P., "Selection and higher agement systems," Computing ~urveys level structues in networks," in Database • 1, 2 (June 1969), 117-133. Description, B. C. M. Douque and G. M. [G2] MOLER,C., "Matrix computations with Nijssen, (Eds.), North-Holland Puhl. Co., FORTRAN and paging, Comm. ACM 1§, Amsterdam, The Netherlands, 1975, pp. 4 (April 1972), 268-270. 215-237. [G3] SEVERANCE,D. G.; Arm DVHNE, R. A., [M4] EVEREST, G. C.; AND SIBLEY, E. H., "A practitioner's guide to addressing al- "Critique of the GUIDE-SHARE DBMS gorithms," Comm. ACM, (publication requirements," in Proc. ACM SIGFIDET pending). Workshop on Data Description, Access [G4] NIJSSEN,G.M., "Efficient batch updating and Control, 1971, E. F. Codd and A. L. of a random file," in Proc. ACM Dean, (Eds.), ACM, New York, pp. 93-112. SIGFIDET Workshop on Data Description, [M5] GUIDE-SHARE DATABASE REQUIREMENTS Access, and Control, 1971, E. F. Codd GRouP, Database management system re- and A. L. Dean, (Eds.), ACM, New York, quirements, 1970, SHARE, Inc., New York. pp. 173-186. [M6] HUITS, M. H. H., "Requirements for [G5] DATE,C. J., An introduction to database languages in database systems," in Data- systems, Addison-Wesley, Reading, Massa- base Description, B. C. M. Douque and chusetts, 1975. G. M. Nijssen, (Eds.), North-Holland Publ. Co., Amsterdam, The Netherlands, ACKNOWLEDGMENTS 1975, pp. 85-109. ,, [M7] McGEE, W.C., File level operations on The authors are grateful to M. L. O'Connell of the network data structures," in P~oc. ACM CODASYL Data Base Language Task Group for SIGMOD Internatl. Conf. on Management clarifying several points. This research was sup- of Data, 1974, W. F. King, (Ed.), ACM, ported, in part, by the National Science Founda- New York, pp. 32-47. tion under Grants GJ-41829 and GJ-41830.

Compo~ Surv~q~ Voa. 8, No. |, ~ 11176