REFERENTIAL INTEGRITY REVISITED~AN OBJECT-ORIENTED PERSPECTIVE* VICTOR M. ~KOWITZ Lawrence Berkeley Laboratory Computer Science Research and Development Department 1 Cyclotron Road, Berkeley, CA 94720

ABSTRACT represented by tuples, while object connec- Referential integrity underlies the relational tions are represented by references between tuples. representation of objeceoriented structures. The con- Such references are enforced in relational cept of referential integrity in relational databases is by referential integrity constraints [2]. There are two hindered by the confusion surrounding both the con- approaches to the specification of these constraints in cept itself and its implementation by relational data- relational databases. In the Universal Relation (UR) base management systems (RDBMS). Most of this approach [ll], referential integrity constraints are confusion is caused by the diversity of relational implied by associating different relations with com- representations for object-oriented structures. We mon attributes; the referential integrity meaning of examine the relationship between these representa- relations sharing common attributes is defined by a tions and the structure of referential integrity con- set of rules, called UR assumptions. The UR assump- straints, and show that the controversial structures tions make the description of object structures either do not occur or can be avoided in the rela- extremely difficult and not entirely reliable, mainly tional representations of object-oriented structures. becaise they require excessively complex attribute Referential integrity is not supported uniformly name assignments. by RDBMS products. Thus, referential integrity con- A different approach to the specification of straints can be specified in some RDBMSs non- referential integrity constraints is ‘to associate expli- procedurally (declaratively) , while in other RDBMSs citly a foreign-key in one relation with the primary- they must be specified procedurally. Moreover, some key of another relation [5]. Such constraints are a RDBMSs do not allow the specification of certain special case of inclusion dependencies 141. Every referential integrity constraints. We discuss the explicit referential integrity constraint is usually referential integrity capabilities provided by three associated with a referential integrity rule which representative RDBMSs, DB2, SYBASE, and INGRES. defines the behavior of the relations involved in the constraint under insertion, deletion, and . I. INTRODUCTION Explicit referential integrity constraints are easier to The design process involves specifying the specify and understand than the implicit referential objects and object connections relevant to the data- integrity constraints of the UR approach, because base application. In relational databases objects are they are closer to the way users describe object struc- tures. However, the referential integrity concept is * This work was supported by the Office of Health and En- still surrounded by confusion, as illustrated by the vironmental Research Program and the Applied Mathematical Sci- successive modifications of the original definition of ences Research Program, of the Office of Energy Research, U.S. [2] (e.g. see [3], [5]). Thus, certain referential Department of Energy, under Contract DE-AC03-76SF00098. integrity structures have unclear semantics, and therefore must be ‘treated with caution’ [6]. Obvi- ously, a non technical user cannot be expected to manage the complexities of such a task. In this paper we examine the characteristics of referential integrity constraints involved in the rela- tional representation of object-oriented structures. We show that the controversial structures discussed in [6] can be avoided without any effect on the capa- bility of relational schemas to represent object struc- Proceedings of the 16th VLDB Conference tures. We explore the characteristics of referential Brisbane. Australia 1990

578 integrity constraints in the context of relational sche- rules in INGRES) for specifying such constraints pro- mas representing Eztended Entity-Relationship (EER) cedurally. We discuss in this paper the referential object structures. We have selected the EER model integrity capabilities of DB2, SYBASE, and INGRES. because of its widespread use in designing relational We show that some of the restrictions imposed by databases [19]. However, our results apply to any DB2 are too stringent. We compare the SYBASE and object-oriented data model that supports generaliza- INGRES mechanisms for specifying referential tion and aggregation [g]. integrity constraints, and discuss their limitations. We have shown in [15] that an EER schema The paper is organized as follows. In section 2 can be represented by a Boyce-Codd Normal Form we briefly review the relational and EER concepts (BCNF’) relational schema of the form (R, F IJ I ), used in this paper, and the relational representation where R denotes a set of relation-schemes, and F and of EER schemas. In section 3 we examine two contr- I denote sets of key and inclusion dependencies, oversial foreign-key structures in the context of rela- respectively. Informally, relation-schemes represent tional schemas representing EER object structures. object-sets, and inclusion dependencies represent the The semantics of referential integrity rules in the existence dependencies inherent to object connec- context of relational schemes representing EER sche- tions. The inclusion dependencies in these schemas mas, is explored in section 4. In section 5 we exam- are key-based, that is, are referential integrity con- ine the effect of merging relations on the structure of straints, and relation-schemes correspond to either referential integrity constraints. The referential unique or multiple (embedded) object-sets. In [12] we integrity capabilities of DB2, SYBASE, and INGRES have shown that the mapping process involved in are examined in section 6. We conclude with a sum- representing EER schemas by relational schemas, can mary. The procedures for mapping EER schemas be expressed as the composition of (i) the mapping of into relational schemas, and for merging relation- EER schemas into a relational schemas, where every schemes in relational schemas are given in the appen- relation-scheme corresponds to a unique EER object- dix. set, followed by (ii) relation-scheme mergings, that result in relation-schemes representing multiple II. PRELIMINARY DEFINITIONS object-sets. In this section we review briefly the relational and We examine the structure of referential Extended Entity-Relationship (EER) concepts used in integrity constraints involved in relational schemas this paper, and the representation of EER object whose relation-schemes represent unique. EER structures using relational constructs. object-sets. We show that in such schemas referential integrity constraints can be associated with one out 2.1 Relational Concepts. of four possible referential integrity rules. Next, we We use letters from the beginning of the alpha- examine the effect of merging on the structure of bet to denote attributes and letters from the end of referential integrity constraints, and show that merg- the alphabet to denote sets of attributes. We denote ing entails associating some referential integrity con- by t a tuple and by t[ w] the sub-tuple of t straints with an additional, fifth, referential integrity corresponding to the attributes of W. rule. In contrast, seven referential integrity rules are A relational schema is a pair (R,A), where R is defined in [3] and [5]; we show that the two extra a set of relation-schemes and A is a set of dependen- rules are not needed for representing EER object cies over R. We consider relational schemas with structures. A = F u I, where F and I denote sets of functional Currently, several manage- and inclusion dependencies, respectively. A relation- ment systems (RDBMS), notably IBM’s DB2, SYBASE, scheme is a named set of attributes, R,(Xi), where Ri and INGRES, provide support for referential is the relation-scheme name and Xi denotes the set of integrity. Interestingly, these systems provide attributes. Every attribute is assigned a domain, and different capabilities for specifying referential every relation-scheme, Ri(Xi), is assigned a relation integrity constraints. Thus, DB2 [7] allows non- (value), ri. Th e p ro 3‘ec t iota of such a relation, r;, on procedural (declarative) specifications of referential a subset of GUI, W, is denoted zw(ri), and is equal to integrity constraints, but with certain restrictions on {t[W] 1 t E ri>. ‘I’ wo attributes are said to be com- the allowed structures. Conversely, SYBASE [18] patible if they are associated with the same domain, and INGRES [8] do not support declarative and attribute sets X and Y are said to be compatible specifications of referential integrity constraints, and iff there exists .a one-to-one correspondence of compa- provide instead mechanisms (triggers in SYBASE and tible attributes between X and Y.

579 Let Ri(Xi) b e a relation-scheme associated An EER schema can be represented as an acy- with relation ri. A junctional dependency over Ri is clic directed graph, called EER diagram: entity-sets, a statement of the form Ri: Y-Z, where Y and Z relationship-sets, and attributes, are represented by are subsets of Xi; Ri: Y-+Z is satisfied by ri iff for rectangle, diamond, and ellipse shaped vertices, any two tuples of ri, t and t ‘, t[ Y] = t ‘[r] implies respectively; and the connection of EER object-sets t[z] = t ’ [Z]. A key associated with Ri is a subset and attributes is represented by directed edges. Thus, of Xi, Ki, such that Ri : Ki~Xi is satisfied by any there are directed edges: from relationship-sets to the ri associated with Ri and there does not exist any object-sets they associate, labeled by a cardinality proper subset of Ki having this property. A (1 (one) or M (many)); from weak entity-sets to the relation-scheme can be associated with several candi- entity-sets on which they depend for identification, date keys from which one primary-key is chosen. labeled ID; from specialization entity-sets to generic Let R,(Xi) and Ri(Xj) be two relation- entity-sets, labeled ISA; and from object-sets to their schemes associated with relations 7; and rj, respec- attributes. The acyclicity of EER diagrams is tively. An inclusion dependency is a statement of the implied by certain restrictions satisfied by EER sche- form Ri[ YJ _C Rj[Z], where Y and Z are compatible m= WI, P51). A self-explanatory EER diagram subsets of Xi and Xi, respectively; R;[Y] _CRj[Z] is example is shown in figure 1. satisfied by ri and rj iff STY(Vi) C “2 (rj). If Z is 2.3 Relational Representation of Extended the primary-key of R j then Ri[ Y] & R j[Z] is said Entity-Relationship Schemas. to be key-based, and Y is called a foreign-key of Ri. Key-based inclusion dependencies are rejerential Relational schemas representing EER object integrity constraints ((21, [5]). structures are of the form (R, F U I), where R, F, and I denote sets of relation-schemes, functional 2.2 The Extended Entity-Relationship Model. dependencies, and inclusion dependencies, respec- The basic concepts of the Entity-Relationship tively’ [15]. Informally, relation-schemes represent model, (entity, relationship, attribute, entity-set, EER object-sets, inclusion dependencies represent the relationship--set, value-set, entity-identifier, weak existence dependencies inherent to object-set connec- entity-set, relationship cardinality, role) have been tions, and functional dependencies represent entity- repeatedly reviewed (e.g. [19]) since their original identifiers and relationship cardinalities. In [15] we definition in (11. We refer commonly to entities and have shown that an EER schema can be represented relationships as objects. The Extended Entity- by a BCNF’ relational schema, such that every Relationship (EER) model considered in this paper relation-scheme corresponds to a unique object-set, has two additional abstraction mechanisms, generali- functional dependencies are key dependencies, and zation and full aggregation. Generalization ((91, (191) inclusion dependencies are key-based, that is, are is an abstraction mechanism that allows viewing a referential integrity constraints. set of entity-sets as a single generic entity-set. The inverse of generalization is called specialization. The Relation- Primary Object-Set EER Attribute Scheme Key : Attribute full capability of aggregation is provided in the EER PERSON SSN :All model by allowing relationship-sets to associate any RI (A,, ) Al1 DEPARTMENT NAME : A, object-set, rather than only entity-sets. R, (41 1 A21 R&G1 1 A COURSE NR :AS: R, @,.A 1 A: FACULTY RANK: Aa R,(A,’ ) y AS; STUDENT INSTRUCTOR Red) A% 1 R, (A,)G2 1 OFFER j INSTRUCTOR /e AT* Ro (Ael& 1 As TEACH Ro (AolA2) Ao2 ASSIST Referential Integrity Constraints &kJ C 41411 R,l&1l C RIP, Ro Pel I E R4 IA+1 Re I+ 1 E R, I&: R, I+ 1 S R2 lAzl I R,lA,21 E R&a, gL&Tp=& W&J G R&b21 R,lAell C hIA% RoL$l C R&e11 RolAo21 G R7 l-472

Fig.1 An Extended Entity-Relationship Schema. Fig.2 Relational Schema for EER Schema of Fig.1.

580 A procedure for mapping EER schemas into relationships, that is, allowing objects involved in relational schemas based on the algorithms developed relationships to be unknown or inapplicable. Con- in [15], called Rmap, is given in the appendix, and is sider the EER schema of figure 1, where entity-set exemplified in figure 2. Concerning the assignment FACULTY is associated with attribute RANK; the value of names to relational attributes, various techniques of RANK can be unknown for some faculty members. can be employed [16]. In order to keep Rmap If RANK, however, is associated with entity-set PER- independent of a specific name assignment for rela- SON, then for a person who is not a faculty member, tional attributes, we use only symbolic names for the RAM value is inapplicable. attributes (e.g. Ad2 represents the second attribute of Partially specified relationships are needed for the fourth relation). representing embedded real-world associations. For example, suppose that a ternary relationship-set III. PARTLY NULL AND OVERLAPPING KEYS ASSIGNED involves entity-sets FACULTY, DEPART- In [6] Date discusses the problems caused by overlap- MENT, and COURSE, and represents the assignment of pingt (foreign and primary) keys, and partly t faculty members to courses offered by departments; foreign-keys, and points out their obscure semantics. then in ASSIGNED relationships representing courses We show below that in relational schemas represent- that are offered by departments, but that are not ing EER object structures overlapping keys do not assigned to a faculty member, the FACULTY entity is occur, and partly null foreign-keys can be avoided. unknown. Inapplicable attribute values and partially specified relationships can be avoided as follows: 3.1 Overlapping Keys. - if an attribute A associated with entity-set E has As mentioned in the previous section, the rela- inapplicable values for some entities of E, then A tional schema representation of an EER object struc- can be associated with a (possibly newly defined) ture is of the form (R, F U I ), where R, F, and I specialization of E, E ‘, so that A is always applica- denote sets of relation-schemes, functional dependen- ble for the entities of E ‘; for example, if attribute cies, and inclusion dependencies, respectively. The FUNK above is associated with FACULTY (as shown following proposition shows that relational schemas in figure l), then RANK is always applicable. generated by Rmap do not involve overlapping keys. - if a relationship-set includes partially specified Proposition 1. Let RS = (R, F lJ I) be a rela- relationships, then it can be decomposed into tional schema generated by Rmnp. Let Ri(X~) be a independent or related (by aggregation) relation-scheme of R, and let FKi denote the union relationship-sets involving only fully specified rela- of all foreign-keys, Fl(i,, associated with Ri. Then tionships (note that binary relationship-sets consist every foreign-key FKi, associated with Ri satisfies only of fully specified relationships); for example, the following conditions: FKi, is either included in, relationship-set ASSIGNED above can be decom- or disjoint with, the primary-key of R;; and FK;, is posed into relationship-sets OFFER and TEACH (as shown in figure 1) that involve only fully specified either equal to, or disjoint with, every other foreign- relationships. key of Ri. Consequently, only unknown attribute values are Proof: see proposition 4.1 of [14]. n needed for the EER modeling of missing information, 3.2 Partly Null Foreign-Keys. while partially specified relationships and inapplica- ble attribute values can be avoided. Note that for We examine below the EER modeling of miss- convenience entity-identifier attributes are usually ing information, and then show how partly null not allowed to have unknown values. foreign-keys can be avoided in relational schemas representing EER object structures. Relational modeling of missing information employs special purpose null values. The various The EER modeling of missing information meanings associated with nulls are generally involves allowing attributes to have unknown or compressed into two [3]: inapplicable values and unk- inapplicable values, and allowing partially specified nown (but applicable) values. From the discussion concerning the EER modeling of missing informa- t Two sets of attributes, X and Y, are said to be overlap- ping ifi (X - Y), (Y -X), and (X fl Y) are not empty. tion, it follows that nulls representing inapplicable t A foreign-key of a relation-scheme R,, Z, is said to be values can be avoided in databases associated with partly null if subtuples t[Z] of relations associated with R,, are al- schemas representing EER object structures. Nulls lowed to contain both null and non-null values. representing unknown values, however, can represent

581 in such databases either an unknown EER attribute connecting the corresponding object-sets with a ‘v' value or an unknown (relational) foreign-key attri- label (see figure 1). bute value. In relational schemas such as those gen- erated by Rmap, in which every relation-scheme 4.2 Referential Integrity Rules. corresponds to a unique object-set, primary-keys and Let Ri(Xi) and Ri(Xj) be two relation-schemes foreign-keys are not allowed to have null values; associated with relations ri and rj, respectively. A these constraints represent the restrictions of not referential integrity constraint Ri[ y] C Rj[Kj] is allowing object-identifiers to have unknown values, associated with a referential integrity rule consisting and not allowing relationships to be partially of an -rule, a -rule and an update-rule specified. Note that for foreign-keys these con- [5]. There is a unique insert-rule, restricted, which straints are stronger than the rejerential-integrity asserts that inserting a tuple t into ri can be per- constraint specified in [2]. However, in [3] Codd formed only if the tuple of rj referenced by t already states that nulls should not be allowed for foreign- exists. The delete and update rules define the effect keys in most cases, but only when ‘there exists a of deleting (resp. updating the primary-key value in) strong reason to depart from this discipline’. We a tuple t ’ Of ‘j : a restricted delete (resp. update) show in section 5 that such an exception is needed rule asserts that the deletion (resp. update) of t ’ can- only for relations that correspond to multiple, rather not be performed if there exist tuples in ri referenc- than single, object-sets. ing t’; a cascades delete (resp. update) rule asserts that the deletion (resp. update) of t ’ implies deleting IV. REFERENTIAL INTEGRITY RULES (resp. updating the subtuple t[q in) the tuples of ti referencing t ’ ; and a nullifies delete (resp. update) In this section we discuss the existence dependencies rule asserts that the deletion (resp. update) of t ’ inherent to object-oriented structures, and examine implies setting to null the subtuple t[ v in all the the referential integrity rules in the context of rela- tuples’t of Ti referencing t ’ tional schemas representing such structures. Procedures mapping EER sch.emas into rela- 4.1 Existence Dependency Types. tional schemas, such as Rmap, usually assume that Object-oriented structures imply certain existence dependencies are of type block. The seman- existence dependencies that must be satisfied by tics of such existence dependencies is expressed by updates. In an object-oriented environment an ele- associating the corresponding referential integrity mentary update consists of modifying an attribute constraints with restricted insert and delete rules. value, removing an object from an objecbset, or The new type of existence dependency introduced adding a new object to an object-set. In order to above, entails associating referential integrity con- satisfy the existence dependencies, usually an object straints corresponding to trigger existence dependen- z that is existence dependent on another object y, cies with cascadea delete-rules. Finally, the preserva- cannot be added before y is added, and y cannot be tion of existence dependencies under attribute removed before z is removed. Intuitively, y blocks modifications in object-oriented environments, is the addition of z, and z blocks the removal of y; expressed by associating referential integrity con- therefore the existence dependency of z and y is said straints with cascades update-rules. Thus, the to be of type block. We propose an additional, type referential integrity constraints in figure 2, for exam- of existence dependency, called trigger: if an object z ple, must be associated with restricted insert-rules is trigger existence dependent on another object, y, and cascadea update-rules; the delete-rules are re.s- then the removal of y triggers the removal of z tricted for all the constraints with the exception of (instead of z blocking the removal of y). Yote that R8jA8,] C R,[A, 2; and R,IA,2j 2 R(A,i, for which existence dependencies between objects do not affect the delete-rules are cascade.s. attribute modifications, since such modifications are The mapping of EER schema.5 into relational local to the objects. schema.5 can be straightforwardly extended with the The two types of existence dependency above explicit generation of referential integrity rules. are specified for pairs of object-sets, rather than pairs Thus, procedure Rmap is extended by associating of individual objects, and can be represented in EER every referential integrity constraint ref with a res- diagrams as follows: the block existence dependenq tricted insert-rule and a cascades update-rule; the is considered the default ty-pe. and therefore is not delete-rule depends on the type of existence depen- explicitly represented: the riistence dependency of dency corresponding to ref: if the type of the type trigger is represented by associating the edges existence dependency is block then ref is associated

582 with a restricted delete-rule, otherwise (if the type is consisting of all the vertices that are connected to Ri trigger) ref is associated with a cascades delete-rule. in G1 by directed paths consisting of edges that correspond to referential integrity constraints associ- 4.3 Conflicting Existence Dependencies. ated with cascades delete-rules. Then for every Ri of Objects can block or trigger the removal of R, the referential integrity constraints corresponding other objects with which they are involved in to the edges of G, that connect vertices of Casc(Ri) existence dependencies not only directly, but also with Ri or other vertices of Casc(Ri), are not allowed transitively. Thus, an object y can trigger the remo- to be associated with restricted delete-rules. val of an object z if y can trigger the removal of an Proof Sketch . Rmap generates referential integrity object z on which z is trigger existence dependent; constraints associated with a referential integrity conversely, if z is block existent dependent on t then graph, GI, that is isomorphic to a subgraph of the 2 blocks the removal of y. For example, suppose corresponding EER diagram (proposition 4.1 in [Id]). that in the EER schema of figure 1 the existence The condition of the proposition follows from the dependencies of INSTRUCTOR on STUDENT, STUDENT restriction above specified for EER schemas. n on PERSON, and FACULTY on PERSON, are of type trigger, while the existence dependency of INSTRUC- V. THE EFFECT OF MERGING ON REFERENTIAL TOR on FACULTY is of type block; then an INSTRUC- INTEGRITY CONSTRAINTS TOR entity blocks the removal of a PERSON entity (via FACULTY), while a PERSON entity can trigger the In this section we examine the effect of merging rela- removal of an INSTRUCTOR entity (via STUDENT). tions on the structure of foreign-keys and referential integrity constraints. If an object z blocks the removal of an object y which, in turn, can trigger the removal of z, then the Merging brings about the need to allow certain result of removing y is unpredictable. In the example foreign-key attributes to have null values. We have above, for instance, removing a PERSON entity shown in [12] that merging relations requires the depends on the order in which the removal is per- introduction of additional null constraints [lo] for formed (i.e. first via FACULTY, or first via STUDENT). restricting the way in which null values appear in Consequently, objects should not be allowed to block merged relations. A procedure for merging relations the removal of objects that can trigger their removal. in relational databases that preserves the In terms of the EER object structure, this restriction information-capacity and the normal form of the can be stated as follows: corresponding schemas, has been proposed in [12]. The merging procedure developed in [12] may gen- Let Oi be an object-set of an EER schema I%, and erate inclusion dependencies that are not key-based, let Trig(Oi) be the set of object-sets of ES consist- that is, are not referential integrity constraints. We ing of all the objectrsets that are trigger existent consider below a merging procedure that generates dependent (directly or transitively) on Oi t. The.1, only key-baaed inclusion dependencies and involve for every Oi of ES, object-sets of Tr

583 merging can be reconstructed from r,,,, so that the rules. schema generated by Rmerge, RS ’ , has equivalent The effect of merging on the structure of information-capacity with RS. An example of merg- foreign-keys and referential integrity constraints is ing is shown in figure 3, where Rmerge is applied on captured by the following proposition. the relational schema of figure 2 in order to merge Proposition 3. Let RS = (R, F IJ I ) be a rela- relation-schemes R,, Rs and Rg into R ;. tional schema generated by Rmap, and let Following merging, the referential integrity RS’= (R ‘, F’U I’) be the result of applying Rmcrgc constraints involving the relation-schemes of R are on RS. Then (a) RS ‘satisfies the conditions of propo- replaced with referential integrity constraints involv- sitions 1 and 2. (b) In every relation-scheme R i(XI) ing the new relation-scheme, R,, and some of the of R ‘, if a foreign-key FK I, is allowed to have null foreign-keys associated with R, are allowed to have values then FK:, consists of a single attribute that null values. Clearly, the referential integrity con- straints involving R, must be associated with t-e,+ does not appear in any other foreign or ere exists a directed cycle in the t&ted insertrrules and cascades update-rules, like of R ‘,.. (c) If th the corresponding referential integrity constraints referential integrity graph G,a associated with RS: involving relation-schemes of x. Concerning the then at least one of the referential integrity con- delete-rules, Rmcrge assumes that the referential straints corresponding to the edges of this cycle: integrity constraints involving relation-schemes of E (i) involves a foreign-key that is allowed to have null values, and.(ii) is associated with either a restricted are associated with restricted delete-rules, and there- or a nullifies delete-rule. fore the referential integrity constraints involving R, are also associated with restricted delete-rules. If cas- Proof Sketch. (a) The proof follows the specification cades delete-rules are also considered, then Rmerge of Rmerge. (b) See proposition 4.2 in [14]. (c) Rmop must be extended as follows: generates referential integrity constraints associated - if a referential integrity constraint involving R,, with a referential integrity graph, G,, that is iso- ref, is derived from a referential integrity con- morphic to a subgraph of the corresponding EER diagram. Since EER diagrams are acyclic (see [15]) straint associated with a cascades delete-rule, then GI is also acyclic. Cycles in G,a may result from ref is associated either with (a) a nullifies delete- merging relation-schemes in RS, and the proof is rule, if ref involves a foreign-key of R, that is based on the conditions of proposition 1 and the allowed to have null values, or (b) a cascades specification of Rmergc. n delete-rule, otherwise. The proposition above shows that merging does For example, referential integrity constraint not alter the properties of propositions 1 and 2. in the relational schema of figure R %%,I S RI[AIJ Thus, relational schemas undergoing merging are still 3, is associated with a nullifies delete-rule (because it free of the undesirable foreign-key and referential corresponds to a referential integrity constraint asso- integrity structures discussed in sections 3 and 4. It ciated with a cascades delete-rule and As, is allowed can be verified that proposition 3 is valid not only to have null values), while the other referential for the restricted merging procedure considered integrity constraints involving relation-scheme R + in above, but also for extended merging procedures such. this schema are associated with restricted delete- as that defined in [12]. Several remarks concerning the referential Reletion- Primary Referential Integrity integrity rules involved in the relational representa- Scheme Key Constraints tion of object-oriented structures, are in order. Only 4@,l) A11 R,Phl G R,I-%J restricted insert-rules, restricted delete-rules, and R2 64~~ 1 A21 R5 IAsl I _C RI b$ I caseadea update-rules are required for representing R, (Aa1 ) A31 Wbll C &I+1 existence dependencies of type block between objece R, b$A% 1 A4L Re l-h1 I G R5 I+ I sets. Cascades delete-rules are required for R5 (AsI ) A51 R ; l&l I C R2 IA21 I representing existence dependencies of type trigger, ‘We1 1 A% RW21 E R&1l and nullifies delete-rules are required only if relations R; (A,1rh2p A$ A*ol) A72 R;lA,J C R,I4.& are allowed to represent multiple object-sets (e.g. fol- R; h1 I G ReI-Q I lowing merging). The referential integrity con- m : l nulls are allowed for A,, and Aol. straints involved in relational schemas representing object-oriented structures are always associated with Fig. 3 Relational Schema of Fig. 2 after Merging. cascades update-rules, therefore nullifies and restrict

584 update-rules can be discarded. Cascades update- satisfied by the referential integrity constraints of rules become superfluous if updates of primary-key relational schemas representing EER object struc- attributes are not allowed. While such a restriction is tures: unreasonable for regular relational attributes, this G) DB2 does not support the cascades update-rules restriction underlies the definition of surrogate attri- involved in the relational schemas representing butes [2]. Consequently, if surrogate attributes are EER object structures. used as primary and attributes, then cas- cades update-rules are not necessary. (ii) Restriction (2) above treats cycles consisting of single edges differently from cycles consisting of multiple edges. This apparent contradiction does VI. REFERENTIAL INTEGRITY IN RELATIONAL not exist for the referential integrity constraints DATABASE MANAGEMENT SYSTEMS of relational schemas representing EER object Referential integrity is currently supported by structures (see condition (c) of proposition 3). several relational database management systems (iii) Restrictions (2) and (3) above are stronger than (RDBMS), notably IBM’s DB2 [7], SYBASE [18], and the conditions of propositions 1, 2, and 3. First, INGRES [8]. The referential integrity capabilities of note that if every relation-scheme corresponds to these three RDBMSs are briefly examined below. A a unique object-set, then restrictions (2) and (3) more detailed analysis is provided in [13]. above are trivially satisfied. However, if 6.1 IBM's DB2. relation-schemes are allowed to correspond to multiple object-sets (e.g. following mergings), In DB2 referential integrity constraints are then restrictions (2) and (3) above might not be specified non-procedurally (declaratively), but with satisfied. Consider, for example, the relational certain restrictions. We examine below these restric- schemas of figures 4(ii) and 4(iv), generated by tions and their effect on the relational representation Rmap and Rmerge, from the EER schemas of of object-oriented structures. figures 4(i) and 4(iii), respectively. If the two Let RS = (R, F lJ I ) be a relational schema, existence dependencies of ' SUPERVISE on where R, F, and I consist of relation-schemes, key EMPLOYEE in the EER schema of figure 4(i) are dependencies, and referential integrity constraints, both of type block (resp. trigger), then the respectively; let G1 be the referential integrity graph referential integrity constraint in the relational associated with RS. The referential integrity con- schema of figure 4(ii) is associated with a res- straints of I must satisfy the following restrictions in tricted (resp. nullifies) delete-rule; consequently DB2: this schema does not satisfy restriction (2) above, while it satisfies the condition (c) of pro- 1. Every referential integrity constraint of I is con- position 3. sidered to be associated (by default) with a res- t&ted update-rule. (iv) Conversely, the referential integrity constraints of the relational schema of figure 4(iv) satisfy 2. Let I’ be a subset of I that consists of referen- tial integrity constraints corresponding to edges forming a directed cycle in G,. If I’ consists of a single constraint, then this constraint must be associated with a cascades delete-rule. If I’ consists of two or more constraints, then at least two constraints of I’ must be associated with R,[A,J CRIIA,I] (Primary Key: AlI) restricted or nullifies delete-rules. (ii) RI (All, AI121, 3. Let Casc(Ri) be defined as in proposition 2 of section 4. For every pair of vertices of R, Ri and Rj, if Rj is connected in Gf by multiple edges to vertices of ( Casc(Ri) U {Ri}), then the referential integrity constraints corresponding to (iv) R, (All, Aal2 1, R,[A,J C R21A21] (Primary Key: A,> these edges must be associated with identical, R, h1 11 R,lA.+l C R,lA,Il (Primary Key: A2J restricted or cascades, delete-rules. n Notes: All relational attributes correspond to SSN Note that DB2 allows the specification of partly * Nulls are allowed null foreign-keys and overlapping keys. We contrast Fig. 4 Relational Schema Examples. below the DB2 restrictions above with the conditions restriction (2) of the definition above, only if all (ii) While SYBASE triggers are set-oriented (i.e. are the existence dependencies in the EER schema of activated for sets of tuple manipulations), figure 4(iii) are of type block; however, if the INGRES rules are tuple-oriented (i.e. are existence dependencies are of type trigger, then activated for single tuple manipulations); RI&I C %[+I is associated with a nullifies accordingly, INGRES rules present no problem in delete-rule, while R,[A,I] C RIIAII] is associated implementing cascades update-rulea. with a cascades delete-rule; therefore, restriction (iii) INGRES provides a more evolved mechanism for (2) above is not satisfied, while condition (c) of handling errors and messages, and INGRES’s proposition 3 is satisfied. Embedded SQL employed for specifying rules is more flexible than SYBASE’s Transact-SQL SYBASE and INGRES. 8.2 employed for specifying triggers. SYBASE [18] and INGRES (81 do not allow Overall, the INGRES rule mechanism is techni- declarative specifications of referential integrity con- cally superior to the SYBASE trigger mechanism. straints. Instead, they provide mechanisms for speci- However, both the SYBASE trigger and the INGRES fying procedurally such constraints. We examine rule mechanisms are extremely cumbersome. The use below the main characteristics and limitations of of SQL, compounded in SYBASE by certain syntactic these mechanisms. limitations, make trigger and rule procedures very The mechanism provided by SYBASE for large and hard to comprehend. The manual implementing referential integrity constraints specification of trigger and rule procedures is a tedi- involves triggers. Triggers are a special kind of ous and error-prone process that tends to discourage stored procedures that are activated (fired) when a users from specifying them for non-trivial databases. relation is affected by a data manipulation (i.e., an insertion, deletion, or update). A trigger procedure is VII. SUMMARY. associated with a unique relation, say ri, and We have examined the concept of referential employs two system provided relations, called deleted integrity in the context of relational schemas and inserted : the deleted relation consists of tuples representing EER object structures. We have of ri that are going to be deleted or updated; the explored the effect of different relational representa- inserted relation consists of tuples that are going to tions of EER schemas on the structure of referential be inserted into r;, or newly updated tuples of ri. integrity constraints. Our analysis shows that the SYBASE allows the specification of three trigger pro- referential integrity concept can be simplified, and cedures per relation: an insert, a delete, and an that the controversial referential integrity structures update trigger procedure. Given relation ri associ- can be avoided without affecting the capability of ated with relation-scheme Ri, the trigger procedures relational schemas to represent EER object struc- for ri are executed in order to enforce the referential tures. integrity constraints involving Ri. SYBASE imposes certain (rather arbitrary) technical limitations, such We have discussed the referential integrity as the number of levels allowed for nesting triggers. capabilities of three relational database management Another limitation concerns the implementation of systems (RDBMS), DB2, SYBASE, and INGRES. We referential integrity constraints associated with cas- have shown that some restrictions imposed by DB2 cades update-rules: a cascades update-rule can be on the structure of referential integrity constraints implemented only if updates of primary-key attri- limit the capability of defining in DB2 relational sche- butes are restricted to single tuples at a time, that is, mas representing EER object structures. We have only if the inserted and deleted relations consist of compared the mechanisms provided by SYBASE and single tuples. INGRES for the procedural specification of referential integrity constraints. We have shown that although The INGRES rule mechanism [8] is conceptually conceptually similar, these mechanisms have techni- similar to the SYBASE trigger mechanism. The cal differences, with the INGRES rule mechanism differences, which are mainly of a technical nature, being more flexible and less restrictive than the are summarized below: SYBASE trigger mechanism. (i) Unlike SYBASE, INGRES does not restrict the Finally, we have pointed out the difficulty of number of triggers per relation, nor the depth of using SYBASE triggers and INGRES rules. Using rule nesting. these mechanisms is labor-intensive and error-prone, therefore users should be insulated from them by a

586 high-level interface that would allow non-procedural [ll] D. Maier, J.D. Ullman, and M. Vardi, “On the specifications of referential integrity constraints, and Foundations of the Universal Relation Model”, that would detect erroneous referential integrity ACM TODS 9, 2 (June 1984), pp. 283-308. structures. We have implemented such an interface [12] V.M. Markowitz, “Merging relations in rela- as part of a Schema Design and Translation (SDT) tional databases”, Technical Report LBL-27842, tool [17]. SDT generates (i) abstract relational sche- Jan. 1990. mas from EER schemas, and (ii) RDBMS schema 131 V.M. Markowitz, “Relational database manage- definitions from abstract relational schemes. ment system support for referential integrity: A Currently, SDT targets DB2, SYBASE, and INGRES. comparative study”, Technical Report LBL- We have employed SDT for the generation of 2863, July 1990. SYBASE and INGRES schema definitions from EER schemas consisting of approximately twenty object- 141 V.M. Markowitz and J.A. Makowsky, “Identify- sets. The difficulty of specifying SYBASE triggers ing extended entity-relationship object struc- and INGRES rules is illustrated by the amount of tures in relational schemes”, IEEE Trans. on code (over one thousand lines) generated by SDT for Software Engineering, 16, 8, (Aug. 1990). the trigger and rule procedures involved in these [15] V.M. Markowitz and A. Shoshani, “On the definitions. correctness of representing extended entity- relationship structures in the ”, REFERENCES Proc. of the 1989 SIGMOD Conj., pp. 430-439. 111P.P. Chen, “The entity-relationship model- [16] V.M. Markowitz and A. Shoshani, “Name towards a unified of data”, ACM TODS 1, assignment techniques for relational schemas 1 (March 1976), pp. 9-36. representing extended entity-relationship struc- PI E.F. Codd, “Extending the relational database tures”, Proc. of the 8th Int. Conj. on Entity- model to capture more meaning”, ACM TODS Relationship Approach , Canada, 1989. 4, 4 (Dee 1979), pp. 397-434. [17] V.M. Markowitz and W. Fang,.“SDT 3.1. Refer- I31 E.F. Codd, ‘Missing information (applicable and ence manual”, Technical Report LBL-27843, inapplicable) in relational databases”, SIGMOD April 1990. Record 15,4 (Dee 1986), pp. 53-77. [18] Sybase, Inc., “Transact-SQL User’s Guide”, PI S.S. Cosmadakis and P.C. Kanellakis, “Equa- Release 4.0, Emeryville, California, May 1989. tional Theories and Database Constraints”, in [19] T.J. Teorey, D. Yang, and J.P. Fry, “A logical Proc. of the 17th ACM Symp. on Theory of design methodology for relational databases Computing (STOC), pp. 273-284, 1985. using the extended entity-relationship model”, 151C.J. Date, “Referential integrity”, in Relational ACM Computing Surveys 18,2 (June 1986), pp. Database-Selected Writings, Addison-Wesley, 197-222. 1986. I61 C.J. Date, “Referential integrity and foreign APPENDIX keys: Further considerations”, in Relational In this appendix we present two procedures, for map- Database- Writings 1985-l 989, Addison-Wesley, ping EER schemas into relational schemas, and for 1990. merging relation-schemes in relational schemas PI IBM Corporation, “IBM DATABASE 2 Referential representing EER schemas. Integrity Usage Guide”, June 1989. A.1 Mapping EER Schemas. PI Ingres, Inc., “INGRES/SQL Reference Manual”, The procedure for mapping EER schemas into Release 6.3, Alameda, California, Nov. 1989. relational schemas is based on procedures developed PI R. Hull and R. King, “Semantic - in [15]. We refer below to generalization-sources, ing: Survey, applications, and research issues”, which denote entity-sets that are not specializations ACM Computing Surveys 19,3 (Sept. 1987), pp. of other entity-sets, and independent entity-sets, 201-260. which denote generalization-sources that are not [lo] D. Maier, The theory of relational databases, weak entity-sets. We assume that EER schemas Computer Science Press, 1983. satisfy certain welt-definedness properties, such as the acyclicity of EER diagrams and the uniqueness of generalization-sources for specializations; these

587 properties are discussed in (151 (see also (91 for a 4. Specialization Entity-Se&J. Let entity-set Ei be related discussion). the specialization of entity-sets Ei,, 15 j

3. Aggregation Object-Sets. Let object-set Oi be A.2 Merging Relations. the aggregation of (not necessarily distinct) The merging procedure below is a restricted object-sets Oi,, l

588 Definition - Rmerge. Inprt : a relational s_chema RS = (R, F u I), and a subset of R, R, such that (a) the primary-keys of the relation-schemes of E are pairwise compatible; (b) there exists a relation-scheme R,(XP) in R such that for every Ri of R, i # p, Ri[KiI C Rp[Kpl E I; (c) every relation-scheme Ri(Xi) of R, i # p : (i) has exactly one non primary-key attri- bute; (ii) cannot be involved in the right- hand side of any inclusion dependency of I; and (iii) can be involved in the left-hand side of at most two inclusion dependencies, one involving R, (see (b) above), and one of the form Ri[Xi-Ki] C Rj[Kj]. Output: a relational schema RS’= ( R : F’U I’). Rmergc (a) applied on RS generates RS ’ as follows: 1. R ‘results by replacing the relation-schemes of E in R with relation-scheme R,(X,,,), such that K ,,, := KP, X,,, := Km UR,(X,) E K (xi - Ki), where the attributes of (X,-X,) are allowed to have null values; 2. F ’ results by replacing in F all the key dependen- cies involving primary-keys associated with the relation-schemes of E , with key dependency R, : K,,,-*X,,,; 3. I’ results by replacing Ri with R, and Ki with K,,,, in every inclusion dependency of I that involves a relation-scheme Ri of K. Rmergc (E) is associated with two state mappings, 1 and q’, where q maps a state r of RS into a state r ’ of RS ’ , and q ’ maps a state r ’ of RS ’ into a state r of RS, as follows: q is the identity for the relations of r that_are ass* ciated with relation-schemes of (R-R); and maps the set of relations {ri 1 ri E r,ri is associ- ated with Ri E 8) into T, as follows: (i) r,,, := rr; and (ii) -for each Ri of (E-{R,}) q ’ is the identity for every relation of (r ‘-{r ‘,}); and maps relation r ‘,,, of r ’ into relations 6 as follows:

ri- := rename( *IK,,,(x,-K,) (r ‘, 1, Km+Ki), where Ri(Xi) is a relation-scheme of K. n

589