Database Decomposition Into Fourth Normal Form
Total Page:16
File Type:pdf, Size:1020Kb
DATABASE DECOMPOSITION INTO FOURTH NORMAL FORM GGsta Grahne and Kari-Jouko R;iihB: University of Helsinki, Department of Computer Science Tukholmankatu 2, SF-00250 Helsinki 25, Finland Abstract. We present an algorithm that decom- database scheme. poses a database scheme when the dependency set contains functional and multivalued dependencies. A good decomposition should have several The schemes in the resulting decomposition are in properties [BBG78]: fourth normal form and have a lossless join. Our algorithm does not impose restrictions on the Minimal redundancy: the relation schemes allowed set of dependencies, and it never re- (1) quires the computation of the full closure of the should represent meaningful objects so that, dependency set. Furthermore, the algorithm works for instance, they can be updated without in polynomial time for classes of dependencies that properly contain conflict-free dependency knowing the values for all the attributes in sets. the universal scheme. This is usually taken to mean that the relation schemes should be 1. INTRODUCTION in one of various normal forms. The structure of a relational database can be (2) Representation: it should be possible to re- defined by a universal relation scheme U and by cover the universal relation from the compo- nent relations, i.e. the decomposition should a set of dependencies D. Relations that conform have a lossless join. to the universal scheme may contain much redun- dant information and have undesirable update (3) Separation: when a (projected) relation is properties [DatBl]. It is therefore customary to updated, it should be possible to ensure that decompose the universal scheme into subschemes the dependencies in the universal relation k still hold after the update without actually Rl,-..,s such that U = 'J R.. The collection i=l ' constructing the universal relation. This $,. ,I$> is called the database scheme. Uni- condition is satisfied if the decomposition versal relations can be represented as their pro- is dependency preserving, i.e. if the pro- jections onto the relation schemes in the jected dependencies imply the original set D. This work was supported by the Academy of There exists a wide variety of dependency Finland. types, each one giving rise to different sets of normal forms. The two most common dependency types are functional dependencies and multivalued dependencies. The normal forms associated with functional dependencies are -~-third normal form (3NF) [Cod721 and Boyce-Codd ~-normal form (BCNF) [Cod74], whereas -~-fourth normal form (4NF) [Fag771 is defined for multivalued dependencies. The 186 requirements for these normal forms increase from 2. PREVIOUS WORK 3NF to 4NF, i.e. 4NF implies BCNF, which implies There is a straightforward way of finding a 3NF. lossless decomposition into 4NF. Suppose U has Unfortunately, it is not always possible to been replaced by a set of relation schemes find a decomposition having all these desirable Rl* . ,Rk (initially k=l and Rl=U). Suppose fur- properties. In particular, there exist relation ther that some R. is not in 4NF. This means that 1 schemes (or rather dependencies) that do not have there exists a dependency X-WY in D’ whose pro- a dependency preserving decomposition into BCNF jection onto Ri is nontrivial (i.e. O#XCRi, or 4NF [BeB79]. The question of what to do when a YDX=8, and O#YflRi#Ri-X) such that X is not a good decomposition does not exist is controver- key of R.. Then Ri is replaced by XY fl Ri and 1 sial (see e.g. [BBG78]). Especially the validity XU (Ri-Y). This guarantees the losslessness of of the universal relation assumption has given the join [Fag77,ABU79], and the repeated applica- rise to much debate (cf. [AtP82,U1182a]). tion produces a set of schemes in 4NF. It is, of course, important to assess how This algorithm has several, efficiency prob- well each of the three properties meets its in- lems. First, it is known that testing whether a tended goals. But this is not enough. Even if it relation scheme is in BCNF is NP-complete [BeB79]. is possible to decompose a relation scheme into Therefore testing the 4NF property is also compu- subschemes that satisfy some set of properties, tationally intractable. In their algorithm for the method will be of little use if the decompo- functional dependencies [TsF80], Tsou and Fischer sition algorithm is prohibitively expensive. have circumvented this problem by testing only a In this paper we will focus our attention on sufficient BCNF condition, not a sufficient and a problem that is known to be always solvable: necessary one. In other words, if a nontrivial the problem of decomposing a scheme U into a set dependency is found in Ri, then Ri is decomposed of subschemes {R such that R 1”’ 4-J l”..,\ regardless of whether it actually is in BCNF or have a lossless join and each Ri is in normal not. The same idea can be applied to multivalued form. dependencies and 4NF also. (If the set of depend- encies includes only multivalued dependencies, The complexity of the problem depends heavily the absence of nontrivial dependencies is a suf- on the desired normal form. For 3NF a simple ficient and necessary condition for 4NF.) polynomial time synthesis algorithm was presented in ;BDB79]. Finding a lossless decomposition into The problem then becomes that of finding a BCSF is more difficult, but an algorithm working nontrivial dependency in some Ri. Fagin [Fag771 in polynomial time was recently given by Tsou and proposed to compute the dependency closure D+ Fischer [TsF80]. Their approach does not general- before running the decomposition algorithm. Then ize to multivalued dependencies, and the complex- it would be sufficient just to scan through the ity of lossless decomposition into 4b’F is open. precomputed set when searching for some nontriv- We shall give an algorithm that is more efficient ial dependency X-Y. Unfortunately, the size of than existing general algorithms, and that works the closure D+ can be exponential [U1182b], in polynomial time for a broader class of depend- thereby ruining the efficiency of this approach. encies than existing polynomial algorithms. Moreover, Lien has shown in lLie81.1 that if such a precomputed set of dependencies is used in the We use the notation and terminology of decomposition algorithm, then that set must be [Ull82b]. the closure D+: if any smaller set is used, the algorithm may not work correctly. 187 Therefore the nontrivial dependency X-Y Finding efficiently the smallest nontrivial must be found differently. Another approach would multivalued dependency X++Y appears to be a dif- be to avoid computing D+ and to derive the de- ficult problem, and it was left open by Tsou and pendencies on demand from the initial set D. In- Fischer. However, solving this problem is essen- deed, if there are any nontrivial dependencies in tial for making the decomposition efficient, if Ri, then at least one of them can be found effi- the dependencies are tested on the basis of D ciently using a result of [HIT791. without computing D+. But then we are faced with the problem that Our solution is in some sense between the two the decomposition tree may grow too large. If we extremes. We avoid computing D+, but we do not use an arbitrary nontrivial dependency X-Y for use the fixed set D, either. Instead, during the decomposing Ri, neither of the new schemes decomposition process we will add to D members of (XYflRi and XU(Ri-Y)) is necessarily in 4NF. D+ in such a way that whenever we wish to find a Therefore both of them must be decomposed fur- nontrivial dependency X-Y, it is sufficient to ther. In each decomposition step the new sub- examine only the dependencies in the extended de- schemes must have less attributes than the decom- pendency set D. This process is dynamic: it is posed scheme Ri, but the difference may be only guided by the decomposition tree. Therefore all one attribute. Thus the decomposition tree may the dependencies in D+ will not be added to D. have IUl levels, and in the case of a full binary There exist some algorithms for performing IUI tree this means 2 -1 decomposition steps. the decomposition efficiently, but none of these This indicates that instead of an arbitrary algorithms achieves all the desired properties. nontrivial dependency X++Y, one should find a Lien's algorithm [Lie821 produces a lossless de- dependency that is in a sense as "small" as poss- composition into 4NF schemes in polynomial time, ible. That is, the subscheme XYflRi should not but it only works for so-called conflict-free de- contain any nontrivial dependencies. This would pendency sets. Sciore's method [Scidl] works guarantee that XYflRi is in 4NF, and the decompo- under the same restriction. Although this is a sition tree would degenerate into a linear tree practical subclass of multivalued dependencies, with O(llJl> nodes. In the case of functional de- the general case (unrestricted dependencies) pendencies, such a "smallest" X+Y can be found still remains as an intrigueging open question. in polynomial time. Tsou and Fischer start with Similarly, da Silva's algorithm [daS80] produces a an arbitrary nontrivial dependency X+Y. Then lossless decomposition into schemes that are in they repeatedly test whether some nontrivial de- 4NF as far as D is concerned; however, there may pendency X' +Y' still holds in XYflRi; if so, still exist derived dependencies in D+ that viol- they throw out the attributes in (XYflR;) -X'Y'. ate the normal form conditions. This is possible, since the properties of func- tional dependencies imply that X'+Y' also holds 3.