Week 11: Normal Forms Logical Database Design
- Database Design
- Database Redundancies and Anomalies
- Functional Dependencies
- Entailment, Closure and Equivalence
- Lossless Decompositions
- The Third Normal Form (3NF)
- The Boyce-Codd Normal Form (BCNF)
- Normal Forms and Database Design

CSC343 – Introduction to Databases

Logical Database Design

- We have seen how to design a relational schema by first designing an ER schema and then transforming it into a relational one.
- Now we focus on how to transform the generated relational schema into a "better" one.
- Goodness of relational schemas is defined in terms of the notion of normal form.

Normal Forms and Normalization

- A normal form is a property of a database schema.
- When a database schema is un-normalized (that is, does not satisfy the normal form), it allows redundancies of various types which can lead to anomalies and inconsistencies.
- Normal forms can serve as a basis for evaluating the quality of a database schema and constitute a useful tool for database design.
- Normalization is a procedure that transforms an un-normalized schema into a normalized one.

Examples of Redundancy

Employee  Salary  Project  Budget  Function
Brown     20      Mars     2       technician
Green     35      Jupiter  15      designer
Green     35      Venus    15      designer
Hoskins   55      Venus    15      manager
Hoskins   55      Jupiter  15      consultant
Hoskins   55      Mars     2       consultant
Moore     48      Mars     2       manager
Moore     48      Venus    15      designer
Kemp      48      Venus    15      designer
Kemp      48      Jupiter  15      manager

Anomalies

The value of the salary of an employee is repeated in every tuple where the employee is mentioned, leading to a redundancy. Redundancies lead to anomalies:
- If the salary of an employee changes, we have to modify the value in all corresponding tuples (update anomaly).
- If an employee ceases to work in projects, but stays with the company, all corresponding tuples are deleted, leading to loss of information (deletion anomaly).
- A new employee cannot be inserted in the relation until the employee is assigned to a project (insertion anomaly).

What's Wrong???

We are using a single relation to represent data of very different types. In particular, we are using a single relation to store the following types of entities, relationships and attributes:
- Employees and their salaries;
- Projects and their budgets;
- Participation of employees in projects, along with their functions.

To set the problem on a formal footing, we introduce the notion of functional dependency (FD).

Functional Dependencies (FDs)

- Given schema R(X) and non-empty subsets Y and Z of the attributes X, we say that there is a functional dependency between Y and Z (Y → Z), iff for every relation instance r of R(X) and every pair of tuples t1, t2 of r, if t1.Y = t2.Y, then t1.Z = t2.Z.
- A functional dependency is a statement about all allowable relations for a given schema.
- Functional dependencies have to be identified by understanding the semantics of the application.
- Given a particular relation r0 of R(X), we can tell if a dependency holds or not; but just because it holds for r0, doesn't mean that it also holds for R(X)!

Functional Dependencies in the Example

- Each employee has a unique salary. We represent this dependency as Employee → Salary and say "Salary functionally depends on Employee".
- Meaning: if two tuples have the same Employee attribute value, they must also have the same Salary attribute value.
- Likewise, Project → Budget.
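The definition of a functional dependency can be checked mechanically on a single relation instance. The following is a minimal sketch in Python, assuming each tuple is stored as a dictionary keyed by attribute name. The names fd_holds, ATTRS, DATA and r0 are illustrative assumptions, not part of the course material; the data is the Employee/Project instance from the redundancy example.

```python
# Hypothetical helper: fd_holds and the dict-per-tuple layout are
# illustrative assumptions, not something defined in the slides.
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs holds in the relation instance rows."""
    seen = {}  # lhs value combination -> rhs values first seen with it
    for t in rows:
        y = tuple(t[a] for a in lhs)
        z = tuple(t[a] for a in rhs)
        # setdefault stores z on first sight of y, otherwise returns the old value
        if seen.setdefault(y, z) != z:
            return False  # two tuples agree on lhs but differ on rhs
    return True


ATTRS = ("Employee", "Salary", "Project", "Budget", "Function")
DATA = [
    ("Brown", 20, "Mars", 2, "technician"),
    ("Green", 35, "Jupiter", 15, "designer"),
    ("Green", 35, "Venus", 15, "designer"),
    ("Hoskins", 55, "Venus", 15, "manager"),
    ("Hoskins", 55, "Jupiter", 15, "consultant"),
    ("Hoskins", 55, "Mars", 2, "consultant"),
    ("Moore", 48, "Mars", 2, "manager"),
    ("Moore", 48, "Venus", 15, "designer"),
    ("Kemp", 48, "Venus", 15, "designer"),
    ("Kemp", 48, "Jupiter", 15, "manager"),
]
r0 = [dict(zip(ATTRS, row)) for row in DATA]

print(fd_holds(r0, ["Employee"], ["Salary"]))            # True
print(fd_holds(r0, ["Project"], ["Budget"]))             # True
print(fd_holds(r0, ["Employee"], ["Function"]))          # False
print(fd_holds(r0, ["Salary", "Project"], ["Function"])) # True
```

On this instance Employee → Salary and Project → Budget hold, while Employee → Function does not (Hoskins is both manager and consultant). Salary, Project → Function also happens to hold here, which presumably reflects this particular data rather than a rule of the application: a check like this can refute a dependency, but holding in r0 does not establish it for R(X).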
Project → Budget means that each project has a unique budget.

Non-Trivial Dependencies

- A functional dependency Y → Z is non-trivial if no attribute in Z appears among the attributes of Y, e.g.,
  - Employee → Salary is non-trivial;
  - Employee, Project → Project is trivial.
- Anomalies arise precisely for the attributes which are involved in (non-trivial) functional dependencies:
  - Employee → Salary;
  - Project → Budget.
- Moreover, note that our example includes another functional dependency: Employee, Project → Function.

Looking for FDs

Employee  Salary  Project  Budget  Function
Brown     20      Mars     2       technician
Green     35      Jupiter  15      designer
Green     35      Venus    15      designer
Hoskins   55      Venus    15      manager
Hoskins   55      Jupiter  15      consultant
Hoskins   55      Mars     2       consultant
Moore     48      Mars     2       manager
Moore     48      Venus    15      designer
Kemp      48      Venus    15      designer
Kemp      48      Jupiter  15      manager

Dependencies Cause Anomalies, ...Sometimes!

- The first two dependencies cause undesirable redundancies and anomalies.
- The third dependency, however, does not cause redundancies because {Employee, Project} constitutes a key of the relation (...and a relation cannot contain two tuples with the same values for the key attributes).
- Dependencies on keys are OK, other dependencies are not!

Another Example

ER Model, with a (0.N) cardinality in the diagram. This is NOT a relation:

SI#   Name  Address   Hobbies
1111  Joe   123 Main  {biking, hiking}

Relational Model. Note the redundancy:

SI#   Name  Address   Hobby
1111  Joe   123 Main  biking
1111  Joe   123 Main  hiking
…………….

Superkey Constraints

- A superkey constraint is a special functional dependency: Let K be a set of attributes of R, and U the set of all attributes of R. Then K is a superkey iff the functional dependency K → U is satisfied in R.
  - E.g., SI# → SI#, Name, Address (for a Person relation)
- A key is a minimal superkey, i.e., for each X ⊂ K, X is not a superkey:
  - SI#, Hobby → SI#, Name, Address, Hobby holds, but
  - SI# ↛ SI#, Name, Address, Hobby and
  - Hobby ↛ SI#, Name, Address, Hobby (neither proper subset determines all attributes).
- A key attribute is an attribute that is part of a key.

How Do We Eliminate Redundancy?

- Decomposition: Use two relations to store Person information:
  - Person1 (SI#, Name, Address)
  - Hobbies (SI#, Hobby)
- The decomposition is more general: people with hobbies can now be described independently of their name and address.
- No update anomalies:
  - Name and address are stored once;
  - A hobby can be separately supplied or deleted;
  - We can represent persons with no hobbies.

More Examples

- Address → PostalCode
  - DCS's postal code is M5S 3H5
- Author, Title, Edition → PublicationDate
  - Ramakrishnan, et al., Database Management Systems, 3rd edition: publication date is 2003
- CourseID → ExamDate, ExamTime
  - CSC343's exam date is December 18, starting at 7pm

When are FDs "Equivalent"?

- Sometimes functional dependencies (FDs) seem to be saying the same thing, e.g.,
  Addr → PostalCode, Str#  vs  Addr → PostalCode, Addr → Str#
- Another example:
  Addr → PostalCode, PostalCode → Province  vs  Addr → PostalCode, PostalCode → Province  vs  Addr → Province
- When are two sets of FDs equivalent? How do we "infer" new FDs from given FDs?

Entailment, Closure, Equivalence

- If F is a set of FDs on schema R and f is another FD on R, then F entails f (written F |= f) if every instance r of R that satisfies every FD in F also satisfies f.
  - Example: F = {A → B, B → C} and f is A → C
  - If Phone# → Address and Address → ZipCode, then Phone# → ZipCode
- The closure of F, denoted F+, is the set of all FDs entailed by F.
- F and G are equivalent if F entails G and G entails F.

How Do We Compute Entailment?

- Satisfaction, entailment, and equivalence are semantic concepts, defined in terms of the "meaning" of relations in the "real world."
- How do we check whether F entails f, or whether F and G are equivalent?
  - Apply the respective definitions for all possible relation instances for a schema R ……
  - Find algorithmic, syntactic ways to compute these notions.
- Note: The syntactic solution must be "correct" with respect to the semantic definitions.
- Correctness has two aspects: soundness and completeness (see later).

Armstrong's Axioms for FDs

This is the syntactic way of computing/testing semantic properties of FDs.
- Reflexivity: Y ⊆ X |- X → Y (trivial FD)
  - e.g., |- Name, Address → Name
- Augmentation: X → Y |- XZ → YZ
  - e.g., Address → ZipCode |- Address, Name → ZipCode, Name
- Transitivity: X → Y, Y → Z |- X → Z
  - e.g., Phone# → Address, Address → ZipCode |- Phone# → ZipCode

Soundness

- Theorem: F |- f implies F |= f
- In words: If FD f: X → Y can be derived from a set of FDs F using the axioms, then f holds in every relation that satisfies every FD in F.
- Example: Given X → Y and X → Z, then
    X → XY     Augmentation of X → Y by X
    YX → YZ    Augmentation of X → Z by Y
    X → YZ     Transitivity
- Thus, X → YZ is satisfied in every relation where both X → Y and X → Z are satisfied. We have derived the union rule for FDs.

Completeness

- Theorem: F |= f implies F |- f
- In words: If F entails f, then f can be derived from F using Armstrong's axioms.
- A consequence of completeness is the following (naïve) algorithm for determining if F entails f:

Correctness

- The notions of soundness and completeness link the syntax (Armstrong's axioms) with semantics, i.e., entailment defined in terms of relational instances.
- This is a precise way of saying that the algorithm for entailment based on the