Aslmple GUIDE to FIVE NORMAL FORMS in RELATIONAL DATABASE THEORY
Total Page:16
File Type:pdf, Size:1020Kb
COMPUTING PRACTICES ASlMPLE GUIDE TO FIVE NORMAL FORMS IN RELATIONAL DATABASE THEORY W|LL|AM KErr International Business Machines Corporation 1. INTRODUCTION The normal forms defined in relational database theory represent guidelines for record design. The guidelines cor- responding to first through fifth normal forms are pre- sented, in terms that do not require an understanding of SUMMARY: The concepts behind relational theory. The design guidelines are meaningful the five principal normal forms even if a relational database system is not used. We pres- in relational database theory are ent the guidelines without referring to the concepts of the presented in simple terms. relational model in order to emphasize their generality and to make them easier to understand. Our presentation conveys an intuitive sense of the intended constraints on record design, although in its informality it may be impre- cise in some technical details. A comprehensive treatment of the subject is provided by Date [4]. The normalization rules are designed to prevent up- date anomalies and data inconsistencies. With respect to performance trade-offs, these guidelines are biased to- ward the assumption that all nonkey fields will be up- dated frequently. They tend to penalize retrieval, since Author's Present Address: data which may have been retrievable from one record in William Kent, International Business Machines an unnormalized design may have to be retrieved from Corporation, General several records in the normalized form. There is no obli- Products Division, Santa gation to fully normalize all records when actual perform- Teresa Laboratory, ance requirements are taken into account. San Jose, CA Permission to copy without fee all or part of this 2. FIRST NORMAL FORM material is granted provided First normal form [1] deals with the "shape" of a record that the copies are not made type. Under first normal form, all occurrences of a record or distributed for direct type must contain the same number of fields. First normal commercial advantage, the ACM copyright notice and form excludes variable repeating fields and groups. This the title of the publication is not so much a design guideline as a matter of defini- and its date appear, and tion. Relational database theory does not deal with rec- notice is given that copying is ords having a variable number of fields. by permission of the Association for Computing Machinery. To copy 3. SECOND AND THIRD NORMAL FORMS otherwise, or to republish, Second and third normal forms [2, 3, 7] deal with the requires a fee and/or specific permission. relationship between nonkey and key fields. Under sec- © 1983 ACM 0001-0782/83/ ond and third normal forms, a nonkey field must provide 0200-0120 75¢ a fact about the key, the whole key, and nothing but the 120 Communications of the ACM February 1983 Volume 26 Number 2 key. In addition, the record must satisfy first normal fact about another nonkey field, as in form. We deal now only with "single-valued" facts. A single- [ EMPLOYEE I DEPARTMENT I LOCATION valued fact could be a one-to-many relationship such as the department of an employee or a one-to-one relation- ..... key ...... ship such as the spouse of an employee. Thus, the phrase The EMPLOYEE field is the key. If each department is "Y is a fact about X" signifies a one-to-one or one-to- located in one place, then the LOCATION field is a fact many relationship between Y and X. In the general case, about the DEPARTMENT--in addition to being a fact Y might consist of one or more fields and so might X. In about the EMPLOYEE. The problems with this design are the following example, QUANTITY is a fact about the the same as those caused by violations of second normal combination of PART and WAREHOUSE. form. 3.1 Second Normal Form • The department's location is repeated in the record of Second normal form is violated when a nonkey field is a every employee assigned to that department. fact about a subset of a key. It is only relevant when the • If the location of the department changes, every such key is composite, i.e., consists of several fields. Consider record must be updated. the following inventory record. • Because of the redundancy, the data might become in- consistent, e.g., different records showing different loca- tions for the same department. I PART I WAREHOUSE QUANTITY ] WAREHOUSE-ADDRESS • If a department has no employees, there may be no ..... key .............. record in which to keep the department's location. The key here consists of the PART and WAREHOUSE To satisfy third normal form, the record shown above fields together, but WAREHOUSE-ADDRESS is a fact should be decomposed into the two records: about the WAREHOUSE alone. The basic problems with this design are: EMPLOYEE DEPARTMENT DEPARTMENT LOCATION • The warehouse address is repeated in every record that .... key .......... key ...... refers to a part stored in that warehouse. • If the address of the warehouse changes, every record To summarize, a record is in second and third normal referring to a part stored in that warehouse must be up- forms if every field is either part of the key or provides a dated. (single-valued) fact about exactly the whole key and noth- • Because of the redundancy, the data might become in- ing else. consistent, with different records showing different ad- dresses for the same warehouse. 3.3 Functional Dependencies • If at some point in time there are no parts stored in the In relational database theory, second and third normal warehouse, there may be no record in which to keep the forms are defined in terms of functional dependencies, warehouse's address. which correspond approximately to our single-valued facts. A field Y is "functionally dependent" on a field (or To satisfy second normal form, the record shown fields) X if it is invalid to have two records with the same above should be decomposed into (replaced by) the two X value but different Y values. That is, a given X value records: must always occur with the same Y value. When X is a key, then all fields are by definition functionally depend- ent on X in a trivial way, since there cannot be two .... key ................... records having the same X value. There is a slight technical difference between func- [WAREHOUSE[WAREHOUSE-ADDRESS tional dependencies and single-valued facts as we have ..... key ..... presented them. Functional dependencies only exist when the things involved have unique and singular identifiers When a data design is changed in this way, i.e., replacing (representations). For example, suppose a person's ad- unnormalized records with normalized records, the dress is a single-valued fact, i.e., a person has only one process is referred to as normalization. The term address. If we do not provide unique identifiers for peo- "normalization" is sometimes used relative to a particular ple, then there will not be a functional dependency in the normal form. Thus, a set of records may be normalized data. with respect to second normal form but not with respect to third. PERSON [ ADDRESS The normalized design enhances the integrity of the John Smith ~1 123 Main St., New York data by minimizing redundancy and inconsistency, but at John Smith I 321 Center St., San Francisco some possible performance cost for certain retrieval appli- cations. Consider an application that wants the addresses Although each person has a unique address, a given of all warehouses stocking a certain part. In the unnor- name can appear with several different addresses. Hence, malized form, the application searches one record type. we do not have a functional dependency corresponding to With the normalized design, the application has to search our single-valued fact. Similarly, the address has to be two record types and connect the appropriate pairs. spelled identically in each occurrence in order to have a functional dependency. In the following case, the same 3.2 Third Normal Form person appears to be living at two different addresses, Third normal form is violated when a nonkey field is a again precluding a functional dependency. February 1983 Volume 26 Number 2 Communications of the ACM 121 PERSON ] ADDRESS [EMPLOYEE SKILL EMPLOYEE[ LANGUAGE I 123 Main St., New York [IJ°hnj°hnSmithSmith t! 123 Main Street, NYC .......... key .................. key ........... Note that other fields, not involving multivalued facts, are We are not defending the use of nonunique or nonsin- permitted to occur in the record, as in the case of the gular representations. Such practices often lead to data QUANTITY field in the earlier PART/WAREHOUSE ex- maintenance problems of their own. We do wish to point ample. out, however, that functional dependencies and the var- The main problem with violating fourth normal form ious normal forms are really only defined for situations in is that it leads to uncertainties in the maintenance poli- which there are unique and singular identifiers. Thus, the cies. Several policies are possible for maintaining two in- design guidelines as we present them are a bit stronger dependent multi.valued facts in one record. than those implied by the formal definitions of the nor- mal forms. For instance, we as designers know that in the follow- (1) A disjoint format, in which a record contains either ing example there is a single-valued fact about a nonkey a skill or a language, but not both. field, and hence the design is susceptible to all the update anomalies mentioned earlier. EMPLOYEE SKILL LANGUAGE Smith cook [EMPLOYEE [FATHER [ FATHER'S-ADDRESS Smith type I Art Smith John Smith I 123 Main St., New York Smith French I Bob Smith [i John Smith [ 123 Main Street, NYC Smith German I Cal Smith John Smith [ 321 Center St., San Francisco Smith Greek I However, in formal terms, there is no functional depend- This is not much different from maintaining two separate ency here between FATHER'S-ADDRESS and FATHER, record types.