Safe Referential Integrity Structures in Relational Databases
Victor M. Markowitz
Information and Computing Sciences Division
Lawrence Berkeley Laboratory
1 Cyclotron Road, Berkeley, CA 94720

Abstract

Referential integrity constraints express existence dependencies between tuples in relational databases. Although it is known that certain referential integrity structures may cause data manipulation problems, the nature of these problems has not been explored and the conditions for avoiding them have not been formally developed. In this paper we examine these data manipulation problems and formally develop safeness conditions for avoiding them. Next, we discuss the problem of specifying safe referential integrity constraints in three representative relational database management systems, IBM's DB2, SYBASE, and INGRES.

Key Words: null constraint, referential integrity constraint, relational database management system, safe referential integrity structure.

Also issued as technical report LBL-28363. This work was supported by the Office of Health and Environmental Research Program and the Applied Mathematical Sciences Research Program, of the Office of Energy Research, U.S. Department of Energy, under Contract DE-AC03-76SF00098.

Proceedings of the 17th International Conference on Very Large Data Bases, Barcelona, September, 1991

1. Introduction

Commercial relational database management systems (RDBMS) provide mechanisms for maintaining key and restricted (nulls not allowed) null constraints. Several commercial RDBMSs, notably IBM's DB2, SYBASE 4.0, and INGRES 6.3, also provide mechanisms for maintaining referential integrity constraints. Referential integrity constraints are used in relational databases for expressing existence dependencies between tuples [1]: such constraints are specified by associating a foreign-key in one relation with the primary-key of another relation [2]. Referential integrity constraints are usually associated with rules that define the behavior of the relations involved in these constraints under insertion, deletion, and update of tuples.

The concept of referential integrity is still surrounded by confusion, as illustrated by the successive modifications of the original definition of [1] (see [2], [3]). Thus, although it is known that certain referential integrity structures may cause data manipulation problems (e.g. see [3]), the nature of these problems has not been explored and the conditions for avoiding them have not been formally developed. In particular, problems created by the interaction of referential integrity and null constraints have not been investigated. In this paper we examine these data manipulation problems and develop safeness conditions for avoiding them. It is worth noting that the approach of this paper is different from that of [7], where it is shown that relational schema translations of object-oriented schemas have referential integrity structures with desirable properties.

The referential integrity mechanisms provided by various RDBMSs are different and difficult to use. Thus, SYBASE 4.0 [10] and INGRES 6.3 [5] provide procedural mechanisms (triggers in SYBASE and rules in INGRES) for maintaining referential integrity constraints. Conversely, DB2 [4] allows non-procedural (declarative) specifications of referential integrity constraints, but imposes restrictions on the structure of these constraints. Problems underlying the use of the referential integrity mechanisms of DB2, SYBASE, and INGRES have been examined in [9]. In this paper we examine these mechanisms in the context of safe referential integrity structures.

In SYBASE 4.0 and INGRES 6.3 the task of correctly specifying referential integrity constraints is left to users. Thus, no mechanism is provided by these systems for detecting unsafe or even not well-defined referential integrity constraints. DB2 has been unique among RDBMSs in addressing the data manipulation problems caused by certain referential integrity structures. DB2 attempts to avoid these problems by imposing restrictions on the structure of the referential integrity constraints it allows. We compare the DB2 restrictions with the safeness conditions and show that while some DB2 restrictions are implied by the safeness conditions, other restrictions are too stringent or misplaced. Moreover, DB2 allows the specification of certain unsafe referential integrity structures.

The rest of the paper is organized as follows. The relational concepts used in this paper are reviewed in section 2. In section 3 we discuss briefly the SYBASE, INGRES, and DB2 mechanisms for maintaining referential integrity and null constraints. In section 4 we examine the data manipulation problems caused by certain referential integrity and null constraint structures, and develop the safeness conditions required for avoiding these problems. The specification of safe referential integrity structures in SYBASE, INGRES, and DB2 is examined in section 5. The paper concludes with a summary.

2. Preliminary Definitions

We use in this paper some graph-theoretical concepts. We denote by $G = (V, H)$ a directed graph with set of vertices $V$ and set of edges $H$, and by $v_i \rightarrow v_j$ a directed edge, $h$, incident from vertex $v_i$ to vertex $v_j$. A directed path from (start) vertex $v_{i_0}$ to (end) vertex $v_{i_m}$ is a sequence of alternating vertices and edges, $v_{i_0} h_1 v_{i_1} \ldots h_m v_{i_m}$, such that $h_k$ is incident from $v_{i_{k-1}}$ to $v_{i_k}$, $1 \le k \le m$. A directed cycle is a directed path whose start vertex is also its end vertex.

We review briefly below the relational concepts used in this paper. Details can be found in any textbook (e.g. [6]) for the basic concepts, and in [2] for referential integrity constraints. We denote by $t$ a tuple and by $t[W]$ the subtuple of $t$ corresponding to the attributes of $W$. A tuple is said to be total if it has only non-null values.

A relational schema $RS$ is a pair $(R, \Delta)$, where $R$ is a set of relation-schemes and $\Delta$ is a set of constraints over $R$. We consider relational schemas with $\Delta = F \cup I \cup N$, where $F$, $I$, and $N$ denote sets of key, referential integrity, and null constraints, respectively. A relation-scheme is a named set of attributes, $R_i(X_i)$, where $R_i$ is the relation-scheme name and $X_i$ denotes the set of attributes. Every attribute is assigned a domain, and every relation-scheme, $R_i(X_i)$, is assigned a relation (value), $r_i$. A database state associated with $(R, \Delta)$ is defined as $r = \langle r_1, \ldots, r_k \rangle$, where $r_i$ is equal to a subset of the cross-product of the domains corresponding to the attributes of $R_i(X_i)$. A database state $r$ associated with a relational schema $RS = (R, \Delta)$ is said to be consistent if it satisfies the constraints of $\Delta$. Two attributes are said to be compatible if they are associated with the same domain, and attribute sets $X$ and $Y$ are said to be compatible iff there exists a one-to-one correspondence of compatible attributes between $X$ and $Y$.

Let $R_i(X_i)$ be a relation-scheme associated with relation $r_i$. The total projection of $r_i$ on a subset $W$ of $X_i$ is denoted $\pi^{\downarrow}_{W}(r_i)$, and is equal to $\{\, t[W] \mid t \in r_i \text{ and } t[W] \text{ is total} \,\}$.

Let $R_i(X_i)$ be a relation-scheme associated with relation $r_i$. A key constraint over $R_i$ is a statement of the form $R_i: K_i \rightarrow X_i$, where $K_i$ is a subset of $X_i$, called key; $R_i: K_i \rightarrow X_i$ is satisfied by $r_i$ iff for any two tuples of $r_i$, $t$ and $t'$, $t[K_i] = t'[K_i]$ implies $t = t'$, and there does not exist any proper subset of $K_i$ having this property. A relation-scheme can be associated with several candidate keys from which one primary-key is chosen.

Let $R_i(X_i)$ and $R_j(X_j)$ be two relation-schemes associated with relations $r_i$ and $r_j$, respectively. A referential integrity constraint is a statement of the form $R_i[Y] \subseteq R_j[K_j]$, where $Y$ and $K_j$ are compatible subsets of $X_i$ and $X_j$, respectively, $K_j$ is the primary-key of $R_j$, and $Y$ is called a foreign-key of $R_i$; $R_i[Y] \subseteq R_j[K_j]$ is satisfied by $r_i$ and $r_j$ iff $\pi^{\downarrow}_{Y}(r_i) \subseteq \pi^{\downarrow}_{K_j}(r_j)$.

A referential integrity constraint $R_i[Y] \subseteq R_j[K_j]$ is associated with an insert-rule, a delete-rule and an update-rule [2]. There is a unique insert-rule, restricted, which asserts that inserting a tuple $t$ into $r_i$ can be performed only if the tuple of $r_j$ referenced by $t$ already exists. The delete and update rules define the effect of deleting (resp. updating the primary-key value in) a tuple $t'$ of $r_j$: a restricted delete (resp. update) rule asserts that the deletion of (resp. update of the primary-key value in) $t'$ cannot be performed if there exist tuples in $r_i$ referencing $t'$; a cascades delete (resp. update) rule asserts that the deletion of (resp. update of the primary-key value in) $t'$ implies deleting (resp. updating the subtuple $t[Y]$ in) the tuples of $r_i$ referencing $t'$; and a nullifies delete (resp. update) rule asserts that the deletion of (resp. update of the primary-key value in) $t'$ implies setting to null the subtuple $t[Y]$ in all the tuples $t$ of $r_i$ referencing $t'$.

Let $RS = (R, \Delta)$ be a relational schema, so that $\Delta$ includes referential integrity constraints. The referential integrity (directed) graph associated with $RS$, $G_I = (V, H)$, is defined as follows: $V = R$, and $H = \{\, R_i \rightarrow R_j \mid R_i[Y] \subseteq R_j[K_j] \in I \,\}$. The set of referential integrity constraints of $RS$ is said to be acyclic iff $G_I$ does not have directed cycles.

A null constraint is a restriction on the way nulls appear in relations [6]. Let $R_i(X_i)$ be a relation-scheme associated with relation $r_i$. A null constraint is a statement of the form $R_i: Y \stackrel{\emptyset}{\rightarrow} Z$, where $Y$ and $Z$ are subsets of $X_i$; $R_i: Y \stackrel{\emptyset}{\rightarrow} Z$ is satisfied by $r_i$ iff for every tuple $t$ of $r_i$, $t[Y]$ is total only if $t[Z]$ is total. All relational database management systems support the specification of nulls-