What Does Boyce-Codd Normal Form Do?†

Philip A. Bernstein
Nathan Goodman

Aiken Computation Laboratory
Harvard University
Cambridge, MA 02138

Abstract

Normalization research has concentrated on defining normal forms for database schemas and developing efficient algorithms for attaining these normal forms. It has never been proved that normal forms are good, i.e. that normal forms are beneficial to database users. This paper considers one of the earliest normal forms (Boyce Codd normal form [Cod2]) whose benefits are intuitively understood. We formalize these benefits and attempt to prove that the normal form attains them. Instead we prove the opposite: Boyce-Codd normal form fails to meet its goals except in trivial cases. This counterintuitive result is a consequence of the "universal relation assumption" upon which normalization theory rests. Normalization theory will remain an isolated theoretical area, divorced from database practice, until this assumption is circumvented.

1. Introduction

Normalization is a schema design process. The input is a schema (i.e., database description) that describes an application; the output is another schema that describes the same application, but in a "better" way. Normalization theory provides mathematical tools for defining schemas, for characterizing the application that a schema represents, and for specifying the form a "good" schema must satisfy. Most normalization research has focused on defining good forms for schemas (normal forms) and developing efficient algorithms for attaining these normal forms. However, normalization theory does not provide tools for explaining why normal forms are good. Presumably, normal forms help database users in some way, but these benefits have never been characterized within the formal framework of normalization theory. This paper attempts such a characterization for one particular normal form, namely Boyce Codd normal form.

The benefits of Boyce Codd normal form (BCNF) are well understood at an intuitive level. BCNF and its precursor, third normal form (3NF), were introduced by Codd to eliminate "anomalous" update* behavior of relations. BCNF accomplishes this by "placing independent relationships into independent relations." (A more precise characterization appears later.) BCNF is widely recommended in practice for this purpose [Date, Mar], and there is little doubt that BCNF is a good normal form in many practical situations.

This paper formalizes the intuitive benefits of BCNF and attempts to prove that these benefits are attained. The attempt fails. Instead we prove that BCNF does not meet its goals, except in trivial cases. This result is quite disturbing because it contradicts BCNF's intuitive appeal and empirically observed benefits. Either the theory is wrong, or database practitioners are making a serious error by using BCNF.

Our opinion is that the theory is at fault: normalization theory, as currently constituted, is inadequate to characterize the practical use and benefits of normal forms. The source of inadequacy can be traced to the "universal relation assumption" upon which normalization theory is based.

On the surface, this paper is an analysis of Boyce Codd normal form. Our deeper purpose is to draw attention to a missing link in normalization theory. For normalization theory to be well grounded in the database problems it intends to solve, it must be possible to use the theory to prove the benefits of normal forms. Unfortunately, it appears that normalization theory, as currently formulated, cannot do this. Until this missing link is filled in, normalization theory will be incapable of explaining what Boyce Codd normal form does.
† This work was supported by the National Science Foundation under Grant Number MCS-77-05314 and by the Advanced Research Projects Agency of the Department of Defense, Contract Number N00039-78-G-0020.

* The word "update" is used generically to denote insertions, deletions, or replacements (i.e., modification) of tuples in a relation.

CH1534-7/80/0000-0245$00.75 © 1980 IEEE

2. Terminology

2.1 Relations and Databases

In formalizing relational databases we distinguish the time-invariant description of a relation vs. its contents at a particular moment. The description of a relation is called a relation schema and consists of a set of attributes and a set of functional dependencies. $\underline{R} = \langle ATTR, F \rangle$ denotes a relation schema $\underline{R}$ with attributes $ATTR$ and functional dependencies $F$. Figure 2.1 illustrates a relation schema. The contents of a relation is called a relation state, and may be visualized as a table of data. Figure 2.2 illustrates a relation state.

Relation Schema:

$\underline{R} = \langle \{E\#, JC, D\#, M\#, CT\},\ \{E\# \rightarrow JC, D\#, M\#, CT;\ D\# \rightarrow M\#, CT;\ M\# \rightarrow D\#, CT\} \rangle$

Description of Attributes:

E#    employee number
JC    job code
D#    department number
M#    employee number of manager
CT    contract type

Figure 2.1. An Example Relation Schema.

Relation Schema: as defined in Fig. 2.1.

Relation: R

E#    JC    D#    M#    CT
1     a     x     11    g
2     c     x     11    g
3     a     y     12    n
4     b     x     11    g
5     b     y     12    n
6     c     y     12    n
7     a     z     13    n
8     c     z     13    n

Figure 2.2. An Example Relation.

Notationally, states of relation schema $\underline{R}$ are represented by $R$ with possible superscript. Elements of $R$ (called tuples) are denoted $r$ with possible superscript. If $X \subseteq ATTR$, we use $r[X]$ to denote the projection of $r$ on the attributes in $X$, and $R[X] = \{r[X] \mid r \in R\}$.

A database schema is a set of relation schemas and is denoted $\underline{D} = \{\underline{R}_i = \langle ATTR_i, F_i \rangle \mid i = 1, \ldots, n\}$. A database state $D$ for $\underline{D}$ is an assignment of relation states to the relation schemas of $\underline{D}$. $D(\underline{R}_i)$ denotes the relation state assigned by $D$ to $\underline{R}_i$, for $i = 1, \ldots, n$. If $D$ and $D'$ are both database states for database schema $\underline{D}$, we define $D \le D'$ iff $D(\underline{R}_i) \subseteq D'(\underline{R}_i)$ for $i = 1, \ldots, n$; clearly, $\le$ is a partial order.

2.2 Functional Dependencies

A functional dependency (abbr. FD) is a statement of the form $f: X \rightarrow Y$, where $f$ is the name of the FD, and $X, Y$ are sets of attributes. $f$ is defined on relation schema $\underline{R} = \langle ATTR, F \rangle$ if $X, Y \subseteq ATTR$. We shall assume throughout this paper that every $f \in F$ is defined on $\underline{R}$. If $f$ is defined on $\underline{R}$ we interpret it as a predicate on states of $\underline{R}$: $f(R)$ is true if for all $r, r' \in R$, whenever $r[X] = r'[X]$, $r[Y] = r'[Y]$ as well. A relation state $R$ of schema $\underline{R} = \langle ATTR, F \rangle$ is consistent if every $f \in F$ is true in $R$. For example, the state depicted in Fig. 2.2 is consistent. The set of all consistent states of $\underline{R}$ is denoted $DOM(\underline{R})$.

Given that a set of FDs, $F$, holds in state $R$, one can infer other FDs that must hold in $R$ as well. The set of all such FDs can be constructed syntactically using a system of inference rules [DC], see Figure 2.3. The set of all FDs derivable from $F$ using the inference rules of Fig. 2.3 is called the closure of $F$ and is denoted $F^+$.

If $Y \subseteq X$ then $X \rightarrow Y$
If $Z \subseteq W$ and $X \rightarrow Y$ then $XW \rightarrow YZ$
If $X \rightarrow Y$ and $Y \rightarrow Z$ then $X \rightarrow Z$

Figure 2.3. Inference Rules for Functional Dependencies.

Within the confines of a single relation schema, $F^+$ is precisely the set of FDs implied by $F$: i.e., $f \in F^+$ iff $f$ holds in every state of $\underline{R}$ in which $F$ holds [Arm]. This fact is called the completeness property of FD-closure [BFH].

FDs also enjoy the uniqueness property within a single relation schema, which states that syntactically identical FDs are semantically equivalent as well [Bern].

2.3 The Universal Relation Assumption: Schema Equivalence

Intuitively, two schemas are equivalent if they represent the same external application. For relation schemas this notion may be formalized in terms of consistent states: relation schemas $\underline{R} = \langle ATTR, F \rangle$ and $\underline{R}' = \langle ATTR', F' \rangle$ are equivalent if $DOM(\underline{R}) = DOM(\underline{R}')$. As a consequence of completeness and uniqueness, $DOM(\underline{R}) = DOM(\underline{R}')$ iff $ATTR = ATTR'$ and $F^+ = F'^+$. Hence, when normalizing $\underline{R}$ one can substitute $F'$ for $F$ whenever $F'^+ = F^+$, since the resulting schema $\langle ATTR, F' \rangle$ remains equivalent to $\underline{R}$.

All normalization algorithms depend on this fact. Bernstein [Bern], for example, exploits it to substitute a "reduced" set of FDs for the ones supplied by the user. Without this substitution, many schemas cannot be normalized due to redundancies in the way FDs are expressed in the schema [Bern].

However, completeness and uniqueness do not hold in multi-relation databases generally. Consider Figure 2.4. The schemas $\underline{D}_1$, $\underline{D}_2$, and $\underline{D}_3$ "contain" the same FDs, i.e., $F_1^+ = F_2^+ = F_3^+$, yet these three schemas differ qualitatively in representational capability. $\underline{D}_1$ can only represent universities in which professors and teaching assistants must teach courses in their own department; $\underline{D}_2$ permits teaching assistants to be drawn from other departments, but not professors; and $\underline{D}_3$ permits complete flexibility in the assignment [...]

[...] is precisely representable by the set $REP(\underline{D}) = \{U \mid D \in DOM(\underline{D}),\ U = \mathop{*}_{i=1}^{n} D(\underline{R}_i)\}$. Two schemas $\underline{D}$ and $\underline{D}'$ represent the same external application, i.e., are equivalent under the UR-assumption, if $REP(\underline{D}) = REP(\underline{D}')$. But $REP(\underline{D}) = DOM(\underline{U} = \langle ATTR, F \rangle)$ where $ATTR = \bigcup_{i=1}^{n} ATTR_i$ and $F = \bigcup_{i=1}^{n} F_i$, and $REP(\underline{D}') = DOM(\underline{U}' = \langle ATTR', F' \rangle)$ where $ATTR' = \bigcup_{i=1}^{n'} ATTR'_i$ and $F' = \bigcup_{i=1}^{n'} F'_i$. Consequently $REP(\underline{D}) = REP(\underline{D}')$ iff $DOM(\underline{U}) = DOM(\underline{U}')$ iff $ATTR = ATTR'$ and $F^+ = F'^+$, which is precisely the notion of schema equivalence required by normalization theory.

Finally we observe that the FDs that hold in $D(\underline{R}_i)$ for every $D \in DOM(\underline{D})$ are $F^+[ATTR_i]$, not simply $F_i$.
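The closure-based notions of Sections 2.2 and 2.3 can be made concrete with a small sketch. It does not enumerate F+ with the inference rules of Fig. 2.3; instead it uses the standard attribute-closure fixpoint, which decides membership in F+ without listing every derived FD. All function names (`fd`, `closure`, `implies`, `same_closure`), the smaller FD set `F_REDUCED`, and the three-relation schema over attributes A, B, C are illustrative assumptions and do not come from the paper.

```python
def fd(lhs, rhs):
    """Build an FD lhs -> rhs as a pair of frozensets of attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Attributes determined by `attrs` under `fds` (standard fixpoint loop)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its left side is covered and it adds something.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def implies(fds, lhs, rhs):
    """True iff lhs -> rhs belongs to the closure F+ of `fds`."""
    return frozenset(rhs) <= closure(lhs, fds)


def same_closure(fds1, fds2):
    """True iff F1+ == F2+, i.e. each set implies every FD of the other."""
    return (all(implies(fds1, l, r) for l, r in fds2)
            and all(implies(fds2, l, r) for l, r in fds1))


# F as given in Figure 2.1.
F = [
    fd({"E#"}, {"JC", "D#", "M#", "CT"}),
    fd({"D#"}, {"M#", "CT"}),
    fd({"M#"}, {"D#", "CT"}),
]

# Hypothetical smaller FD set (an assumption, not taken from the paper).
F_REDUCED = [
    fd({"E#"}, {"JC", "D#"}),
    fd({"D#"}, {"M#"}),
    fd({"M#"}, {"D#", "CT"}),
]

# Hypothetical database schema: R1 over {A, B}, R2 over {B, C}, R3 over {A, C}.
F1 = [fd({"A"}, {"B"})]
F2 = [fd({"B"}, {"C"})]
F3 = []
F_UNION = F1 + F2 + F3

print(same_closure(F, F_REDUCED))      # substitution of F_REDUCED for F is safe
print(implies(F3, {"A"}, {"C"}))       # A -> C is not derivable from F3 alone
print(implies(F_UNION, {"A"}, {"C"}))  # but it is in F+ restricted to {A, C}
```

Under these assumptions the three calls print True, False, and True. The first illustrates substituting F' for F when their closures agree; the last two illustrate, on the hypothetical schema, why the FDs holding in a relation are those of F+ restricted to its attributes rather than only the FDs listed with it.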

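The predicate reading of an FD in Section 2.2 (whenever two tuples agree on X they agree on Y) can be checked mechanically against the state of Fig. 2.2. The following is a minimal sketch: the names `holds`, `consistent`, `STATE`, and `F` are illustrative assumptions, and the D# value of tuple 5 is taken to be y, which is a reconstruction based on the FD M# → D#, CT and the statement that the state is consistent.

```python
def holds(state, lhs, rhs):
    """True iff the FD lhs -> rhs is true in `state` (a list of dict tuples)."""
    seen = {}
    for r in state:
        key = tuple(r[a] for a in lhs)
        val = tuple(r[a] for a in rhs)
        # Two tuples that agree on lhs must also agree on rhs.
        if seen.setdefault(key, val) != val:
            return False
    return True


def consistent(state, fds):
    """True iff every FD in `fds` is true in `state`."""
    return all(holds(state, lhs, rhs) for lhs, rhs in fds)


ATTRS = ("E#", "JC", "D#", "M#", "CT")

# Relation state of Figure 2.2 (D# of tuple 5 assumed to be "y").
ROWS = [
    (1, "a", "x", 11, "g"),
    (2, "c", "x", 11, "g"),
    (3, "a", "y", 12, "n"),
    (4, "b", "x", 11, "g"),
    (5, "b", "y", 12, "n"),
    (6, "c", "y", 12, "n"),
    (7, "a", "z", 13, "n"),
    (8, "c", "z", 13, "n"),
]
STATE = [dict(zip(ATTRS, row)) for row in ROWS]

# FDs of Figure 2.1, written as (left side, right side) attribute tuples.
F = [
    (("E#",), ("JC", "D#", "M#", "CT")),
    (("D#",), ("M#", "CT")),
    (("M#",), ("D#", "CT")),
]

print(consistent(STATE, F))               # the state satisfies all of F
print(holds(STATE, ("JC",), ("D#",)))     # JC -> D# fails: job a spans x, y, z
```

With the assumed value for tuple 5, the first call prints True and the second prints False, showing that a state can be consistent with F while violating an FD outside F+.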