SAFE REFERENTIAL INTEGRITY STRUCTURES IN RELATIONAL DATABASES *

Victor M. Markowitz
Information and Computing Sciences Division
Lawrence Berkeley Laboratory
1 Cyclotron Road, Berkeley, CA 94720

Proceedings of the 17th International Conference on Very Large Data Bases, Barcelona, September, 1991

* Also issued as technical report LBL-28363. This work was supported by the Office of Health and Environmental Research Program and the Applied Mathematical Sciences Research Program, of the Office of Energy Research, U.S. Department of Energy, under Contract DE-AC03-76SF00098.

Abstract
Referential integrity constraints express existence dependencies between tuples in relational databases. Although it is known that certain referential integrity structures may cause data manipulation problems, the nature of these problems has not been explored and the conditions for avoiding them have not been formally developed. In this paper we examine these data manipulation problems and formally develop safeness conditions for avoiding them. Next, we discuss the problem of specifying safe referential integrity constraints in three representative relational database management systems: IBM's DB2, SYBASE, and INGRES.

Key Words: null constraint, referential integrity constraint, relational database management system, safe referential integrity structure.

1. Introduction

Commercial relational database management systems (RDBMS) provide mechanisms for maintaining key and restricted (nulls-not-allowed) null constraints. Several commercial RDBMSs, notably IBM's DB2, SYBASE 4.0, and INGRES 6.3, also provide mechanisms for maintaining referential integrity constraints. Referential integrity constraints are used in relational databases for expressing existence dependencies between tuples [1]: such constraints are specified by associating a foreign-key in one relation with the primary-key of another relation [2]. Referential integrity constraints are usually associated with rules that define the behavior of the relations involved in these constraints under insertion, deletion, and update of tuples.

The concept of referential integrity is still surrounded by confusion, as illustrated by the successive modifications of the original definition of [1] (see [2], [3]). Thus, although it is known that certain referential integrity structures may cause data manipulation problems (e.g. see [3]), the nature of these problems has not been explored and the conditions for avoiding them have not been formally developed. In particular, problems created by the interaction of referential integrity and null constraints have not been investigated. In this paper we examine these data manipulation problems and develop safeness conditions for avoiding them. Note that the approach of this paper differs from that of [7], where it is shown that relational schema translations of object-oriented schemas have referential integrity structures with desirable properties.

The referential integrity mechanisms provided by various RDBMSs differ from one another and are difficult to use. SYBASE 4.0 [10] and INGRES 6.3 [5] provide procedural mechanisms (triggers in SYBASE and rules in INGRES) for maintaining referential integrity constraints. Conversely, DB2 [4] allows non-procedural (declarative) specifications of referential integrity constraints, but imposes restrictions on the structure of these constraints. Problems underlying the use of the referential integrity mechanisms of DB2, SYBASE, and INGRES have been examined in [9]. In this paper we examine these mechanisms in the context of safe referential integrity structures.

In SYBASE 4.0 and INGRES 6.3 the task of correctly specifying referential integrity constraints is left to users: these systems provide no mechanism for detecting unsafe or even ill-defined referential integrity constraints. DB2 has been unique among RDBMSs in addressing the data manipulation problems caused by certain referential integrity structures. DB2 attempts to avoid these problems by restricting the structure of the referential integrity constraints it allows. We compare the DB2 restrictions with the safeness conditions and show that while some DB2 restrictions are implied by the safeness conditions, other restrictions are too stringent or misplaced. Moreover, DB2 allows the specification of certain unsafe referential integrity structures.

The rest of the paper is organized as follows. The relational concepts used in this paper are reviewed in section 2. In section 3 we briefly discuss the SYBASE, INGRES, and DB2 mechanisms for maintaining referential integrity and null constraints. In section 4 we examine the data manipulation problems caused by certain referential integrity and null constraint structures, and develop the safeness conditions required for avoiding these problems. The specification of safe referential integrity structures in SYBASE, INGRES, and DB2 is examined in section 5. The paper concludes with a summary.

2. Preliminary Definitions

We use in this paper some graph-theoretical concepts. We denote by $G = (V, H)$ a directed graph with set of vertices $V$ and set of edges $H$, and by $v_i \rightarrow v_j$ a directed edge, $h$, incident from vertex $v_i$ to vertex $v_j$. A directed path from (start) vertex $v_{i_0}$ to (end) vertex $v_{i_m}$ is a sequence of alternating vertices and edges, $v_{i_0}, h_{i_1}, v_{i_1}, \ldots, h_{i_m}, v_{i_m}$, such that $h_{i_k}$ is incident from $v_{i_{k-1}}$ to $v_{i_k}$, $1 \le k \le m$.

Let $R_i(X_i)$ be a relation-scheme associated with relation $r_i$. The total projection of $r_i$ on a subset $W$ of $X_i$ is denoted $\pi^{\downarrow}_{W}(r_i)$, and is equal to $\{t[W] \mid t \in r_i \text{ and } t[W] \text{ is total}\}$.

Let $R_i(X_i)$ be a relation-scheme associated with relation $r_i$. A key constraint over $R_i$ is a statement of the form $R_i: K_i \rightarrow X_i$, where $K_i$ is a subset of $X_i$, called key; $R_i: K_i \rightarrow X_i$ is satisfied by $r_i$ iff for any two tuples $t$ and $t'$ of $r_i$, $t[K_i] = t'[K_i]$ implies $t = t'$, and there does not exist any proper subset of $K_i$ having this property. A relation-scheme can be associated with several candidate keys, from which one primary-key is chosen.

Let $R_i(X_i)$ and $R_j(X_j)$ be two relation-schemes associated with relations $r_i$ and $r_j$, respectively. A referential integrity constraint is a statement of the form $R_i[Y] \subseteq R_j[K_j]$, where $Y$ and $K_j$ are compatible subsets of $X_i$ and $X_j$, respectively, $K_j$ is the primary-key of $R_j$, and $Y$ is called a foreign-key of $R_i$; $R_i[Y] \subseteq R_j[K_j]$ is satisfied by $r_i$ and $r_j$ iff $\pi^{\downarrow}_{Y}(r_i) \subseteq \pi_{K_j}(r_j)$.

A referential integrity constraint $R_i[Y] \subseteq R_j[K_j]$ is associated with an insert-rule, a delete-rule and an update-rule [2]. There is a unique insert-rule, restricted, which asserts that inserting a tuple $t$ into $r_i$ can be performed only if the tuple of $r_j$ referenced by $t$ already exists.
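The satisfaction conditions for total projections, key constraints, and referential integrity constraints can be checked mechanically. The Python sketch below is our own illustration, not code from the paper: it assumes that a relation is a list of distinct tuples represented as dictionaries and that Python's None stands for a null value, and the helper names (rows, total_projection, satisfies_key, satisfies_ri) are hypothetical. The sample relations form a small hypothetical state assembled only from tuples quoted in the examples of section 4, not the full state of figure 1(iii).

```python
from typing import Dict, List, Sequence, Set, Tuple

Row = Dict[str, object]  # attribute name -> value; None stands for a null value


def rows(attrs: Sequence[str], *values: tuple) -> List[Row]:
    """Build a relation (a list of distinct tuples) from positional values."""
    return [dict(zip(attrs, v)) for v in values]


def is_total(t: Row, w: Sequence[str]) -> bool:
    """t[W] is total iff none of its attribute values is null."""
    return all(t[a] is not None for a in w)


def total_projection(r: List[Row], w: Sequence[str]) -> Set[Tuple]:
    """Total projection of r on W: {t[W] | t in r and t[W] is total}."""
    return {tuple(t[a] for a in w) for t in r if is_total(t, w)}


def satisfies_key(r: List[Row], k: Sequence[str]) -> bool:
    """Uniqueness part of R: K -> X (minimality of K is not checked here)."""
    values = [tuple(t[a] for a in k) for t in r]
    return len(values) == len(set(values))


def satisfies_ri(ri: List[Row], y: Sequence[str], rj: List[Row], kj: Sequence[str]) -> bool:
    """R_i[Y] ⊆ R_j[K_j]: every total Y-value of r_i occurs as a K_j-value of r_j."""
    return total_projection(ri, y) <= {tuple(t[a] for a in kj) for t in rj}


# Hypothetical state built only from tuples quoted in the examples of section 4.
EMP = ("E-SSN", "S-SSN", "M-SSN", "P-NR")
r1 = rows(EMP, (1, 4, 4, "a"), (4, None, None, "a"), (2, None, None, "b"), (3, 2, None, "b"))
r2 = rows(("M-SSN", "P-NR"), (4, "a"))
r3 = rows(("P-NR",), ("a",), ("b",))

# (name, referencing relation, foreign-key Y, referenced relation, primary-key K_j)
RI = [("I1", r2, ["M-SSN"], r1, ["E-SSN"]),
      ("I2", r1, ["S-SSN"], r1, ["E-SSN"]),
      ("I3", r1, ["M-SSN"], r2, ["M-SSN"]),
      ("I4", r2, ["P-NR"], r3, ["P-NR"]),
      ("I5", r1, ["P-NR"], r3, ["P-NR"])]

if __name__ == "__main__":
    print(all(satisfies_ri(*c[1:]) for c in RI))  # prints: True
    bad = r1 + rows(EMP, (5, 9, None, "a"))  # S-SSN 9 references no employee
    print(satisfies_ri(bad, ["S-SSN"], bad, ["E-SSN"]))  # prints: False
```

Because the check uses the total projection, a tuple with a null foreign-key value, such as (2 - - b) with a null S-SSN, can never violate $I_2$.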

has the form $R_i: \emptyset \rightsquigarrow Z$; $R_i: \emptyset \rightsquigarrow Z$ is satisfied by $r_i$ iff for every tuple $t$ of $r_i$, the subtuple $t[Z]$ is total.

An example of a relational schema involving key, referential integrity, and null constraints is shown in figure 1(i); the referential integrity graph corresponding to this schema is shown in figure 1(ii), and a database state that satisfies the constraints involved in this schema is shown in figure 1(iii).

   i. Relation-Schemes (primary keys: E-SSN, M-SSN, and P-NR, respectively)
      ($R_1$) EMPLOYEE (E-SSN, S-SSN, M-SSN, P-NR)
      ($R_2$) MANAGER (M-SSN, P-NR)
      ($R_3$) PROJECT (P-NR)
   Null Constraints (Nulls-Not-Allowed)
      EMPLOYEE: $\emptyset \rightsquigarrow$ E-SSN
      MANAGER: $\emptyset \rightsquigarrow$ M-SSN
      PROJECT: $\emptyset \rightsquigarrow$ P-NR
   Referential Integrity Constraints
      ($I_1$) MANAGER [M-SSN] $\subseteq$ EMPLOYEE [E-SSN]
      ($I_2$) EMPLOYEE [S-SSN] $\subseteq$ EMPLOYEE [E-SSN]
      ($I_3$) EMPLOYEE [M-SSN] $\subseteq$ MANAGER [M-SSN]
      ($I_4$) MANAGER [P-NR] $\subseteq$ PROJECT [P-NR]
      ($I_5$) EMPLOYEE [P-NR] $\subseteq$ PROJECT [P-NR]
   Rules                    insert       delete       update
      ($I_1$, $I_3$, $I_4$)    restricted   restricted   restricted
      ($I_2$, $I_5$)           restricted   nullifies    restricted
   ii. Referential Integrity Graph [diagram not reproduced]
   iii. Database State: relations $r_1$, $r_2$, $r_3$ [tables not reproduced]
   Abbr.: E(MPLOYEE), M(ANAGER), P(ROJECT), S(UPERVISOR)
Figure 1. A Relational Database Example.

3. Referential Integrity and Null Constraints in DB2, SYBASE and INGRES

Relational database management systems (RDBMS) support the specification of relation-schemes, keys, and nulls-not-allowed constraints. In this section we briefly overview the mechanisms provided by three representative commercial RDBMSs, namely DB2, SYBASE 4.0, and INGRES 6.3, for maintaining referential integrity and general null constraints. These mechanisms are examined in more detail in [9]. We illustrate our discussion with the relational schema shown in figure 1(i).

Referential integrity and nulls-not-allowed constraints are specified in DB2 [4] declaratively (i.e. non-procedurally). In DB2, referential integrity constraints are associated by default with restricted update-rules. Referential integrity specifications in DB2 are coupled with the specifications for relation-schemes, primary-keys, and nulls-not-allowed constraints; thus, the DB2 definition for a relation-scheme $R_i$ includes the specification of all the referential integrity constraints that involve $R_i$ in their left-hand sides. For example, the DB2 definition for relation-scheme EMPLOYEE of the schema in figure 1(i) is shown in figure 2.

   CREATE TABLE EMPLOYEE
     (E_SSN CHAR(12) NOT NULL,
      S_SSN CHAR(12),
      M_SSN CHAR(12),
      P_NR  INTEGER,
      PRIMARY KEY (E_SSN),
      -- I2: nullifies delete-rule
      FOREIGN KEY (S_SSN) REFERENCES EMPLOYEE ON DELETE SET NULL,
      -- I3: restricted delete-rule
      FOREIGN KEY (M_SSN) REFERENCES MANAGER ON DELETE RESTRICT,
      -- I5: nullifies delete-rule
      FOREIGN KEY (P_NR) REFERENCES PROJECT ON DELETE SET NULL)
Figure 2. Example of a DB2 Relation Definition.

SYBASE 4.0 [10] and INGRES 6.3 [5] provide procedural mechanisms for specifying referential integrity constraints. Thus, referential integrity constraints are maintained in these systems by executing referential integrity procedures whenever tuples are inserted, deleted or updated in a relation. Given a data manipulation $\delta$ involving one or several tuples of a relation $r_i$ associated with relation-scheme $R_i$, a referential integrity procedure associated with $r_i$ must [9]:
(i) revoke $\delta$ if the relation that would result by applying $\delta$ on $r_i$, $r'_i$, does not satisfy the referential integrity constraints involving $R_i$ and associated with restricted insert, delete, or update rules;
(ii) initiate additional (corrective) data manipulations if $r'_i$ does not satisfy the referential integrity constraints involving $R_i$ and associated with nullifies or cascades delete or update rules.

In SYBASE the definition of referential integrity procedures involves specifying a special kind of procedures, called triggers, that are activated (fired) when a relation is affected by a data manipulation. A trigger is associated with a unique relation-scheme, say $R_i$, and employs two system-provided relations, called deleted and inserted: if $R_i$ is associated with a relation $r_i$, then following a data manipulation, relation deleted consists of
the $r_i$ tuples that are going to be deleted or updated, and relation inserted consists of the tuples that are going to be inserted into $r_i$, or the newly updated tuples of $r_i$. SYBASE allows the specification of three triggers per relation: an insert, a delete, and an update trigger that are fired when tuples are inserted into, deleted from, or updated in $r_i$, respectively. Triggers are specified in SYBASE's dialect of SQL, which allows the specification of control-flow statements in addition to standard SQL statements. For example, the delete trigger for relation-scheme MANAGER of the relational schema in figure 1(i) is shown in figure 3.

In INGRES the specification of referential integrity procedures is supported by a mechanism similar to the SYBASE trigger mechanism. Instead of triggers, INGRES allows associating rules with relations. Like a trigger, a rule is activated when the associated relation is affected by a data manipulation; but while triggers can be activated by manipulations involving multiple tuples, rules are activated by single-tuple manipulations. Accordingly, instead of the inserted and deleted relations provided by SYBASE, INGRES provides two tuples, called new and old: following a data manipulation involving a relation $r_i$, the old tuple contains the $r_i$ tuple that is going to be deleted or updated, and the new tuple is the tuple that is going to be inserted into $r_i$, or the newly updated tuple of $r_i$. The rule procedures are specified in INGRES's dialect of SQL, which, like SYBASE's SQL, allows the specification of control-flow statements in addition to standard SQL statements. For example, the delete rule for relation-scheme MANAGER of the relational schema in figure 1(i) is shown in figure 4.

Regarding null constraints, DB2, SYBASE, and INGRES allow declarative specifications of nulls-not-allowed constraints. General null constraints can be maintained using triggers in SYBASE and rules in INGRES, by embedding the procedural specification for such null constraints with the procedural specification for referential integrity constraints. In DB2 general null constraints can be maintained using special Validproc procedures. Every relation in a DB2 database can be associated with a Validproc procedure, and these procedures are activated by tuple manipulations in a way similar to the activation of SYBASE triggers and INGRES rules.

   create trigger deleteMANAGER on MANAGER
   for delete as
   begin
       declare @delEMPLOYEE int
       /* count EMPLOYEE tuples that still reference a deleted MANAGER tuple */
       select @delEMPLOYEE = count(*)
         from deleted, EMPLOYEE
         where deleted.M_SSN = EMPLOYEE.M_SSN
       if @delEMPLOYEE > 0
       begin
           /* restricted delete-rule of I3: reject the whole deletion */
           raiserror 1 "Failed deletion in MANAGER because of existing reference from EMPLOYEE"
           rollback transaction
       end
   end
Figure 3. A SYBASE Delete Trigger Example.

   create procedure p_deleteMANAGER (o_PNR integer, o_MSSN char(12)) as
   declare
       msg varchar(80) not null;
       check_val integer;
   begin
       /* count EMPLOYEE tuples that reference the deleted MANAGER tuple */
       select count(*) into :check_val from EMPLOYEE
         where M_SSN = :o_MSSN;
       if check_val > 0 then
           msg = 'Failed deletion in MANAGER because of existing reference from EMPLOYEE';
           raise error 1 :msg;
       endif;
   end;

   /* fired once per deleted MANAGER tuple */
   create rule r_deleteMANAGER after delete from MANAGER
     execute procedure p_deleteMANAGER (o_PNR = old.P_NR, o_MSSN = old.M_SSN);
Figure 4. An INGRES Delete Rule Example.

4. Safe Referential Integrity Structures

Certain referential integrity constraints cause data manipulation problems. In this section we examine these problems and specify conditions for avoiding them; referential integrity structures that satisfy these conditions are said to be safe. In the next section we discuss the referential integrity mechanisms of DB2, SYBASE, and INGRES in the context of safe referential integrity structures. We assume below that the referential integrity constraints are specified correctly, that is, are well-defined. We illustrate our discussion with the relational schema shown in figure 1(i), and the database state shown in figure 1(iii).

The data manipulations considered in this paper are insertions, deletions or updates of one or several tuples, where a data manipulation $\delta$ (i) involves only insertions, only deletions, or only updates, (ii) refers to a unique relation, and (iii) involves a set of tuples that does not change during the execution of $\delta$. We assume that the constraints in a database are verified after every single-tuple data manipulation. A data manipulation $\delta$ is considered to succeed iff all its single-tuple data manipulations are carried out; otherwise (i.e. if at least one single-tuple manipulation of $\delta$ cannot be carried out) it is considered to fail. The safeness conditions developed in this section ensure that:

1. For every data manipulation $\delta$, the overall effect of $\delta$ depends neither on the order in which the referential integrity constraints are enforced, nor on the order in which the tuples involved in $\delta$ are accessed; if such an independence is not ensured, then the result of some data manipulations may be unpredictable.
2. For every two consistent database states, $r$ and $r'$, there exists a sequence of data manipulations $\delta_1, \ldots, \delta_n$ that maps $r$ into $r'$, so that every data manipulation in the sequence maps a consistent database state into another consistent database state.

The first safeness condition refers to the relationship between referential integrity constraints and tuple deletions. Let $\delta$ denote a deletion (of one or several tuples) in a relation $r_i$. $\delta$ can trigger: (i) the deletion of tuples that reference tuples involved in $\delta$ via referential integrity constraints associated with cascades delete-rules; or (ii) the update of foreign-key values in tuples that reference tuples involved in $\delta$ via referential integrity constraints associated with nullifies delete-rules. Conversely, if a tuple $t$ references a tuple involved in $\delta$ via a referential integrity constraint associated with a restricted delete-rule, then $t$ blocks (the execution of) $\delta$. Similarly, a tuple $t$ can affect or be affected by a deletion $\delta$ if $t$ references a tuple involved in $\delta$ via several (transitive) referential integrity constraints. For example, consider the database state of figure 1(iii) associated with the relational schema of figure 1(i); tuple (1 4 4 a) in relation $r_1$ blocks (directly) the deletion of tuple (4 a) in relation $r_2$ (via $I_3$), and blocks (transitively) the deletion of tuple (4 - - a) in $r_1$ (via $I_3$ and $I_1$).

The outcome of a deletion $\delta$ is unpredictable when enforcing referential integrity constraints following $\delta$ implies triggering the deletion or update of tuples that, in turn, can block $\delta$.

Example 4.1. Suppose that the relational schema of figure 1(i) includes only three referential integrity constraints: $I_1$ and $I_5$ associated with cascades delete-rules, and $I_4$ associated with a restricted delete-rule. Let deletion $\delta$ involve tuple (a) of relation $r_3$. Note that tuple (4 a) of relation $r_2$ can block $\delta$ via $I_4$, while $\delta$ can trigger the deletion of this tuple via $I_5$ and $I_1$. The outcome of $\delta$ depends on the order in which the referential integrity constraints involving $R_3$ are enforced: (i) if $I_5$ is enforced first, then tuples (1 4 4 a) and (4 - - a) are deleted from $r_1$, thus leading to the enforcement of $I_1$, which results in deleting tuple (4 a) from $r_2$; or (ii) if $I_4$ is enforced first, then $\delta$ is blocked by tuple (4 a) of $r_2$.

Example 4.2. Suppose that the relational schema of figure 1(i) includes only referential integrity constraint $I_2$, associated with a restricted delete-rule. Let deletion $\delta$ involve tuples (2 - - b) and (3 2 - b) of relation $r_1$. Note that tuple (3 2 - b) in relation $r_1$ can block $\delta$ via $I_2$, while $\delta$ includes the deletion of this tuple. The outcome of $\delta$ depends on the order in which the tuples involved in $\delta$ are accessed: (i) if (3 2 - b) is accessed first, then $\delta$ can be carried out and both tuples are ultimately deleted; or (ii) if (2 - - b) is accessed first, then $\delta$ is blocked by tuple (3 2 - b).

Example 4.3. Suppose that the relational schema of figure 1(i) includes only two referential integrity constraints: $I_1$ associated with a cascades delete-rule and $I_3$ associated with a restricted delete-rule. Let deletion $\delta$ involve tuples (1 4 4 a) and (4 - - a) of relation $r_1$. Note that tuple (1 4 4 a) of $r_1$ can block $\delta$ via $I_3$ and $I_1$, while $\delta$ includes the deletion of this tuple. The outcome of $\delta$ depends on the order in which the tuples involved in $\delta$ are accessed: (i) if (1 4 4 a) is accessed first, then $\delta$ can be carried out and both tuples are ultimately deleted; or (ii) if (4 - - a) is accessed first, then $\delta$ is blocked by tuple (1 4 4 a).

Null constraints may conflict with referential integrity constraints associated with nullifies delete-rules. For example, null constraint $R_i: Y \rightsquigarrow Z$ conflicts with referential integrity constraint $R_i[Z] \subseteq R_j[K_j]$ associated with a nullifies delete-rule: deleting a tuple in the relation associated with $R_j$ that is referenced by a tuple $t$ of $r_i$, in which subtuples $t[Y]$ and $t[Z]$ are both total, would imply setting subtuple $t[Z]$ to null, while such an update is not allowed by the null constraint. If a tuple $t$ references a tuple involved in a deletion $\delta$ via a referential integrity constraint associated with a nullifies delete-rule that conflicts with a null constraint involving $t$, then $t$ blocks (the execution of) $\delta$.

Example 4.4. Suppose that the relational schema of figure 1(i) includes only three referential integrity constraints: $I_3$ and $I_5$ associated with nullifies delete-rules, and $I_4$ associated with a cascades delete-rule. Suppose also that relation-scheme $R_1$ is associated with null constraint ($N_1$) $R_1$: M-SSN $\rightsquigarrow$ P-NR. Let deletion $\delta$ involve tuple (a) of relation $r_3$. Note that without $N_1$, $\delta$ would imply nullifying (via $I_5$) subtuples $t$[P-NR] in tuples (1 4 4 a) and (4 - - a) of relation $r_1$, and nullifying (via $I_4$ and $I_3$) subtuple $t$[M-SSN] in tuple (1 4 4 a) of relation $r_1$. However, when $N_1$ is considered, the outcome of $\delta$ depends on the order in which the referential integrity constraints involving $R_3$ are enforced: (i) if $I_4$ is enforced first, then tuple (4 a) is deleted from $r_2$, thus leading to the enforcement of $I_3$, which results in nullifying subtuple $t$[M-SSN] in tuple (1 4 4 a) of $r_1$; subsequently, enforcing $I_5$ results in nullifying subtuples $t$[P-NR] in tuples (1 4 - a) and (4 - - a) of $r_1$; or (ii) if $I_5$ is enforced first,
then $\delta$ is blocked by tuple (1 4 4 a) of $r_1$, where subtuple $t$[P-NR] cannot be nullified because of $N_1$.

Note that because of the unique restricted insert-rule, problems such as those discussed above cannot be caused by insertions. Updates, however, can cause similar problems. For the sake of simplicity we assume in this paper that in a relational schema all the referential integrity constraints are associated with identical restricted update-rules. The safeness condition specified below ensures that the overall effect of a deletion $\delta$ involving one or several tuples of a given relation is independent both of the order in which the referential integrity constraints are enforced and of the order in which the tuples involved in $\delta$ are accessed.

Definition 4.1 - Safeness Condition S1.
Let $RS = (\mathbf{R}, F \cup I \cup N)$ be a relational schema, where $F$, $I$, and $N$ denote sets of key, referential integrity, and null constraints, respectively. Let $G_I = (\mathbf{R}, H)$ be the referential integrity graph associated with $RS$. Given a relation-scheme $R_i$ of $\mathbf{R}$, sets $\mathrm{Casc}(R_i)$ and $\mathrm{Null}(R_i)$ defined below consist of the relation-schemes whose associated relations may contain tuples that can be deleted, respectively updated, as a result of deleting tuples in a relation associated with $R_i$; and set $\mathrm{Restr}(R_i)$ defined below consists of the relation-schemes whose relations may contain tuples that can block the deletion of tuples in a relation associated with $R_i$:
$\mathrm{Casc}(R_i)$ is the subset of $\mathbf{R}$ consisting of $R_i$ and the relation-schemes that are connected in $G_I$ to $R_i$ by a directed path consisting of edges that correspond to referential integrity constraints associated with cascades delete-rules;
$\mathrm{Null}(R_i)$ is the subset of $\mathbf{R}$ consisting of relation-schemes $R_j$, where $R_j$ is connected in $G_I$ to a relation-scheme $R_k$ of $\mathrm{Casc}(R_i)$ by an edge that corresponds to a referential integrity constraint $R_j[Y] \subseteq R_k[K_k]$ associated with a nullifies delete-rule, such that none of the attributes of foreign-key $Y$ belongs to a set of attributes $Z$ that is involved in a null constraint of the form $R_j: W \rightsquigarrow Z$;
$\mathrm{Restr}(R_i)$ is the subset of $\mathbf{R}$ consisting of relation-schemes $R_j$, where $R_j$ is connected in $G_I$ to a relation-scheme $R_k$ of $\mathrm{Casc}(R_i)$ by an edge that corresponds to a referential integrity constraint $R_j[Y] \subseteq R_k[K_k]$ associated with either (i) a nullifies delete-rule, such that at least one of the attributes of foreign-key $Y$ belongs to a set of attributes $Z$ that is involved in a null constraint of the form $R_j: W \rightsquigarrow Z$, or (ii) a restricted delete-rule.
Relational schema $RS$ is said to satisfy safeness condition S1 iff for every relation-scheme $R_i$ of $\mathbf{R}$, set $\mathrm{Restr}(R_i)$ is disjoint with both $\mathrm{Casc}(R_i)$ and $\mathrm{Null}(R_i)$. ∎

The relational schemas of examples 4.1, 4.2, 4.3, and 4.4 above, for instance, do not satisfy condition S1: in example 4.1 relation-scheme $R_2$ belongs to both $\mathrm{Restr}(R_3)$ and $\mathrm{Casc}(R_3)$, in examples 4.2 and 4.3 relation-scheme $R_1$ belongs to both $\mathrm{Restr}(R_1)$ and $\mathrm{Casc}(R_1)$, and in example 4.4 relation-scheme $R_1$ belongs to both $\mathrm{Restr}(R_3)$ and $\mathrm{Null}(R_3)$. Conversely, the relational schema of figure 1(i) satisfies condition S1.

Proposition 4.1. Let $RS = (\mathbf{R}, F \cup I \cup N)$ be a relational schema, where $F$, $I$, and $N$ denote sets of key, referential integrity, and null constraints, respectively. If $RS$ satisfies condition S1, then for every relation $r_i$ associated with a relation-scheme $R_i$ of $\mathbf{R}$, and for every deletion $\delta$ involving one or several tuples of $r_i$, the effect of $\delta$ is independent both of the order in which the referential integrity constraints of $I$ are enforced and of the order in which the tuples involved in $\delta$ are accessed.

Proof Sketch. Let $T(\delta)$ denote the set of tuples either involved in $\delta$ or potentially affected by $\delta$ through the enforcement of referential integrity constraints associated with cascades or nullifies delete-rules. Let $\bar{T}(\delta)$ denote the set of tuples potentially blocking the deletion or update of tuples in $T(\delta)$ following the enforcement of either null constraints or referential integrity constraints associated with restricted delete-rules. Clearly, if $\bar{T}(\delta) \cap T(\delta) = \emptyset$, then $\delta$ either fails (for $\bar{T}(\delta) \neq \emptyset$) or succeeds (for $\bar{T}(\delta) = \emptyset$). However, if $\bar{T}(\delta) \cap T(\delta) \neq \emptyset$, then different sequences of accessing the tuples involved in $\delta$, or of enforcing the referential integrity constraints, can lead to different results. The proof shows that condition S1 ensures that $\bar{T}(\delta) \cap T(\delta) = \emptyset$. ∎

The second safeness condition refers to data manipulation deadlocks caused by cyclic referential integrity structures; such structures involve referential integrity constraints that correspond to directed cycles in the associated referential integrity graphs.

Example 4.5. Consider relation-schemes $R_1$ and $R_2$ of the relational schema of figure 1(i), and suppose that referential integrity constraints $I_1$, $I_2$, and $I_3$ are all associated with restricted delete-rules. Suppose also that in the database state of figure 1(iii) all the tuples of relation $r_1$ are total, that relation $r_1$ includes two additional tuples, (5 2 6 b) and (6 5 2 a), and that relation $r_2$ includes the additional tuple (6 b). If foreign-keys S-SSN and M-SSN associated with $R_1$ are not allowed to have null values, then referential integrity constraints $I_1$, $I_2$, and $I_3$ prevent the deletion of these three additional tuples from $r_1$ and $r_2$, although their deletion results in a consistent
database state, namely that shown in figure 1(iii). Each of these tuples blocks the deletion of another one: (5 2 6 b) references (6 b) via $I_3$, (6 5 2 a) references (5 2 6 b) via $I_2$, and (6 b) references (6 5 2 a) via $I_1$.

The safeness condition specified below ensures that the null and referential integrity constraints do not cause data manipulation deadlocks such as that discussed in example 4.5 above.

Definition 4.2 - Safeness Condition S2.
Let $RS = (\mathbf{R}, F \cup I \cup N)$ be a relational schema, where $F$, $I$, and $N$ denote sets of key, referential integrity, and null constraints, respectively, and let $G_I$ be the referential integrity graph associated with $RS$. Relational schema $RS$ is said to satisfy safeness condition S2 iff for every directed cycle of $G_I$, at least one of the referential integrity constraints that correspond to an edge of this cycle involves a foreign-key whose attributes are allowed to have null values. ∎

The relational schema of example 4.5, for instance, does not satisfy condition S2, while the relational schema of figure 1(i) satisfies condition S2.

Example 4.6. Consider the relational schema of example 4.5, and suppose that in relation-scheme $R_1$ foreign-key S-SSN is allowed to have null values, while foreign-key M-SSN is not allowed to have null values. Suppose that in the database state of figure 1(iii) relations $r_1$ and $r_2$ include the additional tuples mentioned in example 4.5. Then the deletion of these tuples, namely of tuples (5 2 6 b) and (6 5 2 a) from $r_1$ and of tuple (6 b) from $r_2$, can be performed as follows: (i) subtuple $t$[S-SSN] in tuple (6 5 2 a) of $r_1$ is nullified; (ii) tuple (5 2 6 b) is deleted from $r_1$; (iii) tuple (6 b) is deleted from $r_2$; and (iv) tuple (6 - 2 a) is deleted from $r_1$. Note that every manipulation in this sequence results in a consistent database state.

Proposition 4.2. Let $RS = (\mathbf{R}, F \cup I \cup N)$ be a relational schema, where $F$, $I$, and $N$ denote sets of key, referential integrity, and null constraints, respectively. If $RS$ satisfies condition S2, then for every consistent database state $r$ associated with $RS$, there exists a sequence of data manipulations that maps the empty database state (resp. $r$) associated with $RS$ into $r$ (resp. the empty database state), so that every data manipulation results in a consistent database state of $RS$.

Proof Sketch. We consider below only the mapping of the empty database state into $r$. The proof for the reverse mapping, of $r$ into the empty database state, is similar (example 4.6 illustrates this reverse mapping). $G_I$ denotes below the referential integrity graph associated with $RS$.
(i) If $G_I$ is acyclic, then $G_I$ defines the following partial order for the relation-schemes of $\mathbf{R}$: for every pair of relation-schemes $R_i$ and $R_j$ of $\mathbf{R}$, $R_i > R_j$ iff $R_j \rightarrow R_i$ is an edge in $G_I$. Mapping the empty database state into $r$ can be achieved by following this order for inserting tuples (i.e. if $R_i > R_j$, then all the tuples of the relation associated with $R_i$ are inserted before the tuples of the relation associated with $R_j$).
(ii) If $G_I$ has cycles, then the subgraph $G'_I$ of $G_I$ is defined as follows: $G'_I$ results by removing from $G_I$ edges that belong to cycles, so that every removed edge corresponds to a referential integrity constraint involving a foreign-key whose attributes are allowed to have null values. Condition S2 ensures that $G'_I$ is acyclic. Consequently, $G'_I$ defines the following partial order for the relation-schemes of $\mathbf{R}$: for every pair of relation-schemes $R_i$ and $R_j$ of $\mathbf{R}$, $R_i > R_j$ iff $R_j \rightarrow R_i$ is an edge in $G'_I$. Then mapping the empty database state into $r$ can be achieved by: (a) following this order for inserting tuples, as discussed above in (i), where the values of the attributes of foreign-keys involved in referential integrity constraints that correspond to edges of $G_I$ that do not appear in $G'_I$ are replaced by null values; and (b) replacing the null values introduced in (a) with the actual values for these foreign-key attributes. ∎

5. Safe Referential Integrity Structures in DB2, SYBASE, and INGRES

In this section we discuss problems underlying the specification of safe referential integrity structures in DB2, SYBASE 4.0, and INGRES 6.3. Additional details regarding the referential integrity mechanisms of these systems can be found in [9].

The mechanisms provided by SYBASE and INGRES for maintaining referential integrity constraints are general mechanisms that can be used for maintaining other (e.g. null) constraints as well. In SYBASE and INGRES databases the specifications of referential integrity (and/or other) constraints are encoded in (trigger or rule) procedures, and it is very hard, if not impossible, to decode these constraint specifications by parsing these procedures. Consequently, it is not surprising that SYBASE and INGRES do not provide any mechanism for detecting problematic (e.g. unsafe) or even ill-defined referential integrity constraints. In systems such as SYBASE and INGRES, users are solely responsible for both the syntactic and semantic correctness of the procedural specifications for referential integrity constraints.

In RDBMSs that support declarative specifications of referential integrity constraints, the structure of the constraints can be easily analyzed. Accordingly, in such systems problematic referential integrity structures can be detected and disallowed. The only RDBMS (to our knowledge) providing such a capability is DB2. The following two restrictions imposed by DB2 are meant to avoid
Proceedings of the 17th International Conference on Very Large Data Bases, Barcelona, September, 1991, page 129

the same problems as safeness condition S1 defined in section 4. Note that the notations in the definition below differ from the notations used in [4].

Definition 5.1. Let $RS = (R, F \cup I \cup N)$ be a relational schema, where $F$, $I$, and $N$ denote sets of key, referential integrity, and null constraints, respectively. Let $G_I$ be the referential integrity graph associated with $RS$. Let $Casc(R_i)$ be defined as in section 4, and let $Null'(R_i)$, $Restr'(R_i)$, and $Null''(R_i)$ be defined as follows:

$Null'(R_i)$ is the subset of $R$ consisting of relation-schemes $R_j$, where $R_j$ is connected in $G_I$ to a relation-scheme of $Casc(R_i)$ by an edge that corresponds to a referential integrity constraint associated with a nullifies delete-rule;

$Restr'(R_i)$ is the subset of $R$ consisting of relation-schemes $R_j$, where $R_j$ is connected in $G_I$ to a relation-scheme of $Casc(R_i)$ by an edge that corresponds to a referential integrity constraint associated with a restricted delete-rule;

$Null''(R_i)$ is the subset of $Null'(R_i)$ consisting of relation-schemes $R_j$, where $R_j$ is connected in $G_I$ to relation-schemes of $Casc(R_i)$ by at least two edges corresponding to referential integrity constraints associated with nullifies delete-rules.

In DB2 the referential integrity constraints must satisfy the following two restrictions:

T1: For every relation-scheme $R_i$ of $R$, the sets $Casc(R_i) - \{R_i\}$, $Null'(R_i)$, and $Restr'(R_i)$ are pairwise disjoint, and the set $Null''(R_i)$ is empty.

T2: For every subset $I'$ of $I$ that consists of referential integrity constraints corresponding to edges forming a directed cycle in $G_I$: (i) if $I'$ consists of a single constraint, then this constraint must be associated with a cascades delete-rule; (ii) if $I'$ consists of two or more constraints, then at least two constraints of $I'$ must be associated with restricted or nullifies delete-rules. □

Note that the DB2 restrictions above do not require cyclic referential integrity structures to involve at least one foreign-key consisting of attributes that are allowed to have null values (safeness condition S2); therefore, data manipulation deadlocks such as those discussed in section 4 are not prevented in DB2. We compare below conditions T1 and T2 with safeness condition S1.

Condition T1 is more restrictive than condition S1.

Example 5.1. Suppose that in the relational schema of figure 1(i) referential integrity constraints $I_3$ and $I_5$ are associated with nullifies delete-rules, while referential integrity constraint $I_4$ is associated with a cascades delete-rule. Then condition T1 is not satisfied (set $Null''(R_3)$ is not empty), while condition S1 is satisfied (set $Restr(R_3)$ is empty).

The additional restriction imposed by T1 is meant to avoid the effect of null constraints on the outcome of deletions, as discussed in section 4 (see example 4.4). Recall that DB2 provides a mechanism for maintaining general null constraints using special Validproc procedures that are activated (triggered) by every tuple manipulation. Thus, in example 5.1 above, a Validproc procedure associated with relation-scheme $R_1$ can be used to maintain a null constraint $R_1: \textit{P-NR} \vdash \textit{M-SSN}$, which disallows null M-SSN values in tuples where the P-NR value is not null.

Example 5.2. Consider the relational schema of figure 1(i), and suppose that referential integrity constraints $I_3$ and $I_5$ are associated with nullifies delete-rules, while referential integrity constraint $I_4$ is associated with a cascades delete-rule. If relation-scheme $R_1$ is associated with null constraint $R_1: \textit{P-NR} \vdash \textit{M-SSN}$, then this schema satisfies neither condition T1 (again, set $Null''(R_3)$ is not empty) nor condition S1 ($R_1$ belongs to both $Restr(R_3)$ and $Null(R_3)$).

However, even when null constraints are involved, condition T1 is still more restrictive than condition S1.

Example 5.3. Consider the relational schema of figure 1(i) as specified in example 5.2 above, but with referential integrity constraint $I_3$ associated with a restricted delete-rule. Then this schema does not satisfy condition T1 ($R_1$ belongs to both $Null'(R_3)$ and $Restr'(R_3)$), while it satisfies condition S1 ($R_1$ belongs to $Restr(R_3)$, but belongs to neither $Null(R_3)$ nor $Casc(R_3)$).

Condition T2 treats cycles involving a single referential integrity constraint differently from cycles involving multiple referential integrity constraints. This apparent contradiction does not exist for safe referential integrity structures.

Example 5.4. Consider the relational schema of figure 1(i). If $I_2$ is associated with a nullifies delete-rule, then condition T2 is not satisfied, while condition S1 is satisfied; both T2 and S1 are satisfied if $I_2$ is associated with a cascades delete-rule. Similarly, if referential integrity constraints $I_1$ and $I_5$ are both associated with cascades delete-rules, then condition T2 is not satisfied, while condition S1 is satisfied; both T2 and S1 are satisfied if $I_1$ and $I_5$ are associated with restrict or

nullifies delete-rules.

According to [4], condition T2 ensures that deletions do not depend on the access sequence selected by the query optimizer. Note that this is the same goal as that of condition S1 (see proposition 4.1). However, while condition S1 is based on the assumption that the set of tuples involved in a deletion does not change during its execution, DB2 allows such sets of tuples to change during deletion executions, that is, DB2 allows ambiguous deletions.

Example 5.5. Suppose that the relational schema of figure 1(i) includes only one referential integrity constraint, $I_2$, which is associated with a nullifies delete-rule, and suppose that relation-scheme $R_1$ (EMPLOYEE) is associated with relation $r_1$ of figure 1(iii). Consider the following data manipulation:

DM: DELETE FROM EMPLOYEE WHERE S-SSN IS NULL

which requires the deletion from relation $r_1$ of the tuples that represent employees without supervisors. DM has two possible executions, depending on the order in which the tuples of $r_1$ are accessed:

1. if tuples (2 - - b) and (4 - - a) are accessed first, then tuples (3 2 - b) and (1 4 4 a) are also deleted, because the S-SSN values in these tuples turn to nulls after the first two deletions;
2. if tuples (2 - - b) and (4 - - a) are accessed last, then no other tuples of $r_1$ are deleted.

The problem here, however, is not caused by the existence of multiple access sequences for DM, but by the ambiguity of DM. Thus, the two executions above correspond to different interpretations of DM: while the first execution interprets the WHERE condition as a postcondition for the deletion (i.e. following DM, none of the tuples should represent employees without supervisors), the second execution interprets the WHERE condition as a precondition for the deletion (i.e. remove only tuples that represent employees without supervisors at the time of expressing, but before carrying out, DM). Accordingly, it is not the structure of the referential integrity constraints that should be restricted; instead, ambiguous data manipulations should be rejected.

i. Relation-Schemes (key attribute given after each scheme)
($R'_1$) EMPLOYEE (E-SSN, P-NR), key E-SSN
($R'_2$) MANAGER (M-SSN, P-NR), key M-SSN
($R'_3$) PROJECT (P-NR), key P-NR
($R'_4$) SUPERVISE (E-SSN, S-SSN), key E-SSN
($R'_5$) LEAD (E-SSN, M-SSN), key E-SSN

Null Constraints (Nulls-Not-Allowed)
EMPLOYEE: $\emptyset \vdash$ E-SSN
MANAGER: $\emptyset \vdash$ M-SSN
PROJECT: $\emptyset \vdash$ P-NR
SUPERVISE: $\emptyset \vdash$ E-SSN, S-SSN
LEAD: $\emptyset \vdash$ E-SSN, M-SSN

Referential Integrity Constraints
($I'_1$) MANAGER [M-SSN] $\subseteq$ EMPLOYEE [E-SSN]
($I'_2$) SUPERVISE [S-SSN] $\subseteq$ EMPLOYEE [E-SSN]
($I'_3$) SUPERVISE [E-SSN] $\subseteq$ EMPLOYEE [E-SSN]
($I'_4$) LEAD [E-SSN] $\subseteq$ EMPLOYEE [E-SSN]
($I'_5$) LEAD [M-SSN] $\subseteq$ MANAGER [M-SSN]
($I'_6$) MANAGER [P-NR] $\subseteq$ PROJECT [P-NR]
($I'_7$) EMPLOYEE [P-NR] $\subseteq$ PROJECT [P-NR]

Rules (insert / delete / update)
($I'_1$, $I'_4$, $I'_5$, $I'_6$): restricted / restricted / restricted
($I'_2$, $I'_3$): restricted / cascades / restricted
($I'_7$): restricted / nullifies / restricted

ii. Referential Integrity Graph
Abbr.: E(MPLOYEE), M(ANAGER), P(ROJECT), S(UPERVISOR)

Figure 5. A Relational Schema Equivalent to the Relational Schema of Figure 1(i).

Interestingly, while DM is allowed by DB2, a deletion equivalent to DM expressed over a relational schema equivalent to the schema of figure 1(i) is not allowed by DB2, as illustrated by the following example.

Example 5.6. Consider the relational schema shown in figure 5(i). As explained later in this section, this schema is equivalent to the relational schema of figure 1(i). The following data manipulation expressed over the relational schema of figure 5(i) is equivalent to DM:

DM': DELETE FROM EMPLOYEE WHERE E-SSN NOT IN (SELECT E-SSN FROM SUPERVISE)

Like DM, DM' is ambiguous and has two possible executions. However, deletions such as DM' are detected as ambiguous and are therefore rejected by DB2 (for more details see the section on DML restrictions in [4]).

It can be easily verified that DB2 conditions T1 and T2 are equivalent to safeness conditions S1 and S2 for relational schemas of the following form: $RS = (R, F \cup I \cup N)$, where $R$ is a set of relation-schemes, $F$ is a set of key constraints, $I$ is an acyclic set of referential integrity constraints that are associated either with restricted or cascades delete-rules and with restricted update-rules, and $N$ consists only of

nulls-not-allowed constraints that disallow (primary and foreign) key attributes to have null values.

Certain relational schemas can be transformed into schemas of the form specified above. A transformation that, under certain conditions, can remove cyclic referential integrity structures, referential integrity constraints associated with nullifies delete-rules, and foreign-key attributes that are allowed to have null values is given in [8]. This transformation is exemplified in figure 5: the relational schema of figure 5(i) results from the relational schema of figure 1(i) by splitting relation-scheme $R_1$ into relation-schemes $R'_1$, $R'_4$, and $R'_5$, while adapting accordingly the key, referential integrity, and null constraints of the relational schema of figure 1(i). The result of this transformation is a schema equivalent to (i.e. having the same information-capacity as) the schema of figure 1(i) (see [8] for more details). Transformations such as the one exemplified above can be used to transform a relational schema involving a safe referential integrity structure into a relational schema involving a referential integrity structure that is both safe and compliant with the restrictions imposed by DB2.

6. Summary

We have examined the data manipulation problems caused by certain structures of referential integrity and null constraints, and developed safeness conditions for avoiding these problems. These conditions should complement the well-definedness conditions that ensure that referential integrity constraints are specified correctly.

We have examined the problems underlying the specification of safe referential integrity structures in three relational database management systems: DB2, SYBASE, and INGRES. DB2 allows declarative specifications of referential integrity constraints, but imposes restrictions on the structure of these constraints. We have shown that some of these restrictions limit the capability of specifying safe referential integrity structures in DB2; conversely, DB2 allows the specification of some unsafe referential integrity structures. We have also shown that ambiguous data manipulations are not treated uniformly in DB2. The procedurality of the referential integrity mechanisms provided by SYBASE and INGRES makes the task of detecting unsafe or even not well-defined referential integrity constraints very hard, if not impossible. Therefore, it is not surprising that SYBASE and INGRES do not provide mechanisms for detecting erroneous referential integrity constraints.

References

[1] E.F. Codd, “Extending the relational database model to capture more meaning”, ACM TODS 4, 4 (Dec. 1979), pp. 397-434.
[2] C.J. Date, “Referential integrity”, in Relational Database: Selected Writings, Addison-Wesley, 1986.
[3] C.J. Date, “Referential integrity and foreign keys: Further considerations”, in Relational Database Writings 1985-1989, Addison-Wesley, 1990.
[4] IBM Corporation, “IBM DATABASE 2 Referential Integrity Usage Guide”, June 1989.
[5] Ingres, Inc., “INGRES/SQL Reference Manual”, Release 6.3, Alameda, California, Nov. 1989.
[6] D. Maier, The Theory of Relational Databases, Computer Science Press, 1983.
[7] V.M. Markowitz, “Referential integrity revisited: An object-oriented perspective”, Proc. of the 16th VLDB Conference, Brisbane, Australia, 1990, pp. 578-589.
[8] V.M. Markowitz and J.A. Makowsky, “Identifying extended entity-relationship object structures in relational schemas”, IEEE Trans. on Software Engineering 16, 8 (Aug. 1990), pp. 777-790.
[9] V.M. Markowitz, “Problems underlying the use of referential integrity mechanisms in relational database management systems”, Proc. of the 7th Int. Conf. on Data Engineering, Japan, 1991.
[10] Sybase, Inc., “Transact-SQL User’s Guide”, Release 4.0, Emeryville, California, Oct. 1989.
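Because restrictions T1 and T2 of Definition 5.1 are purely structural, they can be checked mechanically from the referential integrity graph, which is the point section 5 makes about declarative specifications. The Python sketch below is one possible reading, not DB2's own checking algorithm. It assumes that edges of $G_I$ point from the referencing to the referenced relation-scheme, and that $Casc(R_i)$ (defined in section 4, outside this excerpt) is $R_i$ plus every relation-scheme reached against the edges through cascades delete-rules. The names RIC, casc, null1, restr1, null2, satisfies_t1, and satisfies_t2 are hypothetical, and the sample constraints restate only the parts of figure 1(i) that example 5.1 mentions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class RIC:
    """One referential integrity constraint src[X] ⊆ dst[K], i.e. an edge src -> dst of G_I."""
    name: str
    src: str          # referencing relation-scheme (holds the foreign key)
    dst: str          # referenced relation-scheme
    delete_rule: str  # "cascades", "nullifies", or "restricted"


def casc(ri, rics):
    """Assumed Casc(Ri): Ri plus every scheme that cascades-references a member, repeated to a fixpoint."""
    result = {ri}
    changed = True
    while changed:
        changed = False
        for c in rics:
            if c.delete_rule == "cascades" and c.dst in result and c.src not in result:
                result.add(c.src)
                changed = True
    return result


def _sources_into(ri, rics, rule):
    """Referencing scheme of every edge with the given delete-rule that ends in Casc(Ri)."""
    targets = casc(ri, rics)
    return [c.src for c in rics if c.dst in targets and c.delete_rule == rule]


def null1(ri, rics):   # Null'(Ri)
    return set(_sources_into(ri, rics, "nullifies"))


def restr1(ri, rics):  # Restr'(Ri)
    return set(_sources_into(ri, rics, "restricted"))


def null2(ri, rics):   # Null''(Ri): at least two nullifies edges into Casc(Ri)
    counts = Counter(_sources_into(ri, rics, "nullifies"))
    return {r for r, n in counts.items() if n >= 2}


def satisfies_t1(schemes, rics):
    for ri in schemes:
        a, b, c = casc(ri, rics) - {ri}, null1(ri, rics), restr1(ri, rics)
        if a & b or a & c or b & c or null2(ri, rics):
            return False
    return True


def simple_cycles(rics):
    """Every simple directed cycle of G_I, as a frozenset of constraint names."""
    out = {}
    for c in rics:
        out.setdefault(c.src, []).append(c)
    cycles = set()

    def walk(start, node, path, visited):
        for c in out.get(node, []):
            if c.dst == start:
                cycles.add(frozenset(e.name for e in path + [c]))
            elif c.dst not in visited:
                walk(start, c.dst, path + [c], visited | {c.dst})

    for start in out:
        walk(start, start, [], {start})
    return cycles


def satisfies_t2(rics):
    rule = {c.name: c.delete_rule for c in rics}
    for cycle in simple_cycles(rics):
        rules = [rule[n] for n in cycle]
        if len(rules) == 1:
            if rules[0] != "cascades":
                return False
        elif sum(r in ("restricted", "nullifies") for r in rules) < 2:
            return False
    return True


# Example 5.1, limited to the constraints it names (edge directions are an assumption).
ex51 = [RIC("I3", "R1", "R3", "nullifies"),
        RIC("I5", "R1", "R2", "nullifies"),
        RIC("I4", "R2", "R3", "cascades")]
print(sorted(casc("R3", ex51)), null2("R3", ex51), satisfies_t1(["R1", "R2", "R3"], ex51))
# ['R2', 'R3'] {'R1'} False
```

Switching I3 to a restricted delete-rule (example 5.3) makes null1 and restr1 both return {'R1'} for R3, so satisfies_t1 again returns False; a self-referencing I2 with a nullifies delete-rule makes satisfies_t2 return False, as in example 5.4.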

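The two executions of DM described in example 5.5 can be reproduced with a small simulation. This is a minimal sketch under stated assumptions: $r_1$ is cut down to the four tuples named in the example, stored as a map from E-SSN to S-SSN (the attribute order E-SSN, S-SSN, M-SSN, P-NR is inferred from those tuples), and $I_2$ with its nullifies delete-rule is the only constraint. The helpers delete_employee, run_dm, and dm_snapshot are hypothetical and do not model how DB2 actually executes deletions.

```python
def delete_employee(state, e_ssn):
    """Delete one EMPLOYEE tuple and apply I2's nullifies delete-rule."""
    del state[e_ssn]
    for other, supervisor in state.items():
        if supervisor == e_ssn:
            state[other] = None  # supervised employees lose their S-SSN value


def run_dm(rows, access_order):
    """DM with the WHERE condition re-checked on the current state as each tuple is accessed."""
    state = dict(rows)
    for e_ssn in access_order:
        if e_ssn in state and state[e_ssn] is None:
            delete_employee(state, e_ssn)
    return state


def dm_snapshot(rows):
    """DM with the WHERE condition evaluated once, before any deletion."""
    state = dict(rows)
    for e_ssn in [e for e, s in rows.items() if s is None]:
        delete_employee(state, e_ssn)
    return state


# (2 - - b), (4 - - a), (3 2 - b), (1 4 4 a) as E-SSN -> S-SSN; None stands for a null.
r1 = {2: None, 4: None, 3: 2, 1: 4}
print(run_dm(r1, [2, 4, 3, 1]))  # {}
print(run_dm(r1, [3, 1, 2, 4]))  # {3: None, 1: None}
print(dm_snapshot(r1))           # {3: None, 1: None}
```

Re-checking the WHERE condition as each tuple is accessed makes the outcome depend on the access order. Evaluating it once beforehand gives the same state as accessing (2 - - b) and (4 - - a) last, and leaves employees 3 and 1 in $r_1$ with null S-SSN values.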