Reflections on Boyce-Codd Normal Form
Total Page:16
File Type:pdf, Size:1020Kb
Reflections on Boyce-Codd Normal Form Carol Helfgott LeDwx * D. Scott fiTkeT, ,h-, * The Aerospace Corporation Computer Science Department El Segundo, California University of California, Los Angeles ABSTRACT 1. Introduction An important database design problem is The usefulness of Boyce-Codd Normal Form the development of a set of relations for the (BCNF) has been questioned by various database which capture the underlying data researchers. A recent study showed that under semantics as accurately as possible. To the common assumptions, BCNF no longer guaran- extent it can be achieved, the designer should tees freedom from various ‘anomalies’, one of its attempt to ‘normalize’ the database into purported virtues. Second, a BCNF covering is semantically independent components. One sometimes unattainable, i.e., some sets of FDs particular stopping point in the normalization have no corresponding BCNF schema. Third, it process is known widely as ‘Boyce-Codd Normal is difficult to determine whether a given schema Form’ (BCNF). violates BCNF (doing so is known to be NP- Unfortunately, BCNF has certain undesir- complete). able properties as a normal form: not all rela- tion schemata have a BCNF representation, This paper reviews the intent of the normal although they always may be decomposed into form, and suggests that these arguments may be the less stringent ‘Third Normal Form’ (3NF). discounted. We show that Also, several results have been presented which . BCNF still provides a useful design criterion question the practical usefulness of BCNF as a when such assumptions as the Universal logical database design goal. First, a number of Instance Assumption are dropped (although questions involving both BCNF and keys have BCNF can be improved upon by taking into been shown to be NP-complete [Beeri & Bern- consideration the dynamic use of the data- stein, 19791. Second, in [Bernstein & Goodman, base); 19801 it was demonstrated that BCNF does not . even in schemata where BCNF is ‘unattain- prevent ‘anomalous’ behavior if the database is able’, BCNF can be attained by unconven- required to satisfy what is now termed ‘the tional means, typically by renaming or Universal Instance Assumption’ [the condition adding attributes to better capture the whereby all tables (relation instances) in the semantic content of the data; database are required to be projections of a sin- . testing violation of BCNF seems to require gle universal table (universal instance), which is exponential time only in the w,orst case defined over all attributes]. This paper tries to where the set of ‘interesting’ dependencies put these criticisms in perspective. (nontrivial dependencies in minimal form) There are certainly drawbacks in using sim- in the dependency closure grows ple normal forms. No existing normal form exponentially large. Such situations do not incorporates, in any way, the intended dynamic seem to typify the ‘real world’: We investi- behavior of the database, although this behavior gate a model of FD schemata, called the FD is important for the design. Moreover, all nor- hierarchy model, similar to many other data mal forms are defined in terms of dependencies. models proposed in recent years. For this Not all semantics fall into this mold, a.nd infor- model the FD closure is always small, and mation can be lost when semantics are encoded testing violation of BCNF is not only not into dependencies. ** Normal forms therefore NP-complete, it is linear in the size of the l * It is pointed out in [Atzeni & Parker, 19821 that depen- input. We also point out a relationship dency inferences may not be meaningful, fi_rFt of a!!. For between keys, FDs which violate BCNF, and example, consider the dependencies FD closures. COMPANY 4 ADDRESS (companies have a plant address) ADDRESS + RESIDENT (residential addresses have an owner) suggesting by transitivity that each company has a unique l Research supported by NSF grant IST 90-12419. resident! Second, dependencies usually require an artificial ‘relationship uniqueness’ assumption: Since func- tional dependencies specify only the attribute sets on Proceedings of the Eighth International Conference on Very Large Data Bases 131 Mexico City, September, 1982 provide no guarantee that a design is correct or set of attributes K such that the FD K + U may appropriate for the application at hand. At be inferred from F. A candidate key is a key K best, normal forms provide guidelines in the log- such that no proper subset of K is also a key. A ical design phase, and should be complemented relation schema R with dependencies F is in with some deeper understanding of the applica- Boyce-Codd Normal Form if whenever the non- tion semantics. trivial FD X+ Y holds in R, then X includes a key However, BCNF does provide a useful design for R. That is, a set of attributes is either a key benchmark. In essence it expresses the design or it does not determine any other attribute. maxim, ‘one fact in one place’+**, assuming that For more discussion, see [Date, 1931] or [Ull- all facts are many-one relationships. We present man, 19801. Any FD X + Y such that X does not three arguments supporting its usefulness. contain a key is called a non-Boyce-Codd func- First, we argue that the problems with BCNF tional dependency (abbr. non-BCFD). noted by [Bernstein & Goodman, 19301 do not The relation schema apply if we drop the Universal Instance Assump- R< fA,B,CJ,lA-+ B,A+ Cj> is in BCNF because A tion, and that BCNF does then eliminate certain is a key, and neither B nor C determines any kinds of anomalous behavior. Second, we show other attributes in R. On the other hand, the that BCNF can be attained in many situations relation schema R-c tA,B,Cj,iA-, B,B+ Cj> is not where it appears unreachable, by renaming in BCNF because B is not a key, yet it deter- schema attributes in such a way as to re- mines C. B + C is a non-BCFD. If R is decom- express semantics properly. Finally, we present posed into a pair of relation schemata simple algorithms for determining BCNF proper- RI< [A,Bj,tA-+ Bj> and R2< fB,Cf,iB+ Cl>, each ties. The algorithms have exponential running of the schemata RI, Rz is in BCNF. time only in the pathological case where the The closure J’+ of a set of dependencies F is given set of dependencies has a set of ‘interest- the set of all dependencies which may be ing’ FDs (nontrivial dependencies in minimal inferred from those in F via one or more appli- form) in its closure which is exponentially large. cations of inference rules. A complete set of Thus, even the NP-completeness results here rules is Armstrong’s first three axioms: may not be insurmountable. FDl IfYEXc U,thenX+ Y 2. Boyce-Codd Normal Form FD2 If X -, Y and 2 c U, then X2 + YZ BCNF and 3NF correspond to restrictions on FD3 IfX+ YandY-, Z,thenX-, Z the functional dependencies (FDs) which hold in A good introduction to the subject may be a schema. A functional dependency is an found, for example, in [Ullman, 19801. integrity constraint among data attributes in a A covet Fc of a set of dependencies F is a database. For example, the functional depen- subset of F from which all of F may be inferred. dency E-t C may represent the constraint that each employee, E, works for only one company, It is easily verified that FJ = F+. Such a set F,, C. A relation schema with functional dependen- is a minimal cover if no proper subset of F,, is cies is said to be in Boyce-Codd Normal Form if also a cover. all the dependencies are either trivial depen- A BCNF coverin of a schema R< U,F> is a dencies (such as E-, E or EC+ E), or those in database schema 9&< Ud,Fi> 1 i=l,...,nj such which a key functionally determines one or that more attributes. Henceforth given any FD (1) Every attribute in U appears in at least one 1: X-2 Y, we will use LHS(f) (Left-Hand-Side of of the sets Ui f) and,X interchangeably, and RHS(f ) and Y interchangeably. Typicahy IIRHS(f )II= 1, i.e., (2) Fi* mcludes* alI dependencies of Ji’+ whose RED(f) wiII consist of a single attribute. (]]S]l attributes are drawn from Vi denotes the cardinality of a set S). (3) The union of the Fi sets comprises a cover We can formally define a relation schema R for F = R< U,F> as a set of attributes U and a set of (4) Each 4 is in BCNF. functional dependencies F defined on U. If the union of the Fi sets is additionally a Throughout this paper we will assume that F minimal cover for F, the database schema contains only functional dependencies. f&C Ui,Fs> 1 i=l,...,nj is called a minimal BCNF A database schema is a collection of one or covering. more relation schemata. A key for R< U,F> is a Clearly, whenever a minimal BCNF covering -- exists, a BCNF covering exists also. However, it which they hold, any two dependencies relating x to Y is not always possible to And a BCNF covering. should be equivalent. Thus, if we have three functional dependencies An example which is typically given. to illustrate f :EMP+MGR, g:MGR+ SAL, and h:EMP+ SAL, this is the schema R< {C,S,Zj.lCS+ Z,Z+ Cl> we must assume that h is equivalent to the functional relating zip codes, Z, street addresses, S, and composition of g and f , although this might not be what cities, C.