Normal Forms
Database Management
Peter Wood

Outline: Relational Database Design; Update Anomalies; Data Redundancy; Normal Forms; FD Inference; Boyce-Codd Normal Form; Third Normal Form; Normalisation Algorithms; Lossless Join; BCNF Algorithm; BCNF Examples; Dependency Preservation; 3NF Algorithm

Relational Database Design

There are two interconnected problems which are caused by bad database design:
• Redundancy problems
• Update anomalies

Good database design is based on using certain normal forms for relation schemas.

Example 1

Let F1 = {E → D, D → M, M → D}.

E stands for ENAME, D stands for DNAME and M stands for MNAME.

A relation r1 over EMP1 (whose schema is {ENAME, DNAME, MNAME}):

ENAME    DNAME      MNAME
Mark     Computing  Peter
Angela   Computing  Peter
Graham   Computing  Peter
Paul     Math       Donald
George   Math       Donald

E is the only key for EMP1 w.r.t. F1.

Problems with EMP1 and F1

1. We cannot represent a department and manager without any employees (i.e., we cannot insert a tuple with a null ENAME because of entity integrity); such a problem is called an insertion anomaly.
2. For the same reason as (1), we cannot delete all the employees in a department and keep just the department information; such a problem is called a deletion anomaly.

More problems with EMP1 and F1

3. E.g. in the first tuple, modifying “Peter” to “Philip” or “Computing” to “Math” does not violate any FD resulting from a key, but D → M would be violated (D is not a key for EMP1 w.r.t. F1); such a problem is called a modification anomaly.
   • In (3) it is not sufficient to check that r1 satisfies the FDs resulting from the keys of EMP1 w.r.t. F1.
   • Ideally, we would like all the FDs of a relation schema to be inferred from key dependencies, i.e. FDs of the form K → schema(R), where K is a key for R w.r.t. F.

Final problem with EMP1 and F1

4. There is redundancy in r1, i.e. for every employee in a given department MNAME is repeated.
   • “Peter” appears three times for “Computing” and “Donald” twice for “Math”.

Example 2

Let F2 = {E → S}.
E stands for ENAME, S stands for SAL and C stands for CNAME.

A relation r2 over EMP2 (whose schema is {ENAME, CNAME, SAL}):

ENAME    CNAME  SAL
Jack     Jill   25
Jack     Jake   25
Jack     John   25
Donald   Dan    30
Donald   David  30

EC is the only key for EMP2 w.r.t. F2.

Problems with EMP2 and F2

1. Insertion anomaly: we cannot insert an employee without any children.
2. Deletion anomaly: if there is a mistake and “Donald” does not have any children, we cannot record this fact by deleting the two tuples for “Donald”, since that would also delete his salary.
3. Modification anomaly: if we modify the salary of “Jack” in the first tuple to be 27 instead of 25, no FD resulting from a key will be violated, but E → S would be violated.
4. Redundancy: the salary of each employee is repeated for every child.

Formalising Redundancy Problems

Let R be a relation schema and F be a set of FDs over R.

Definition. R has a redundancy problem if
(1) there exists a relation r over R that satisfies F, and
(2) there exists an FD X → A in F and two distinct tuples in r that have equal XA values.

• It can be shown that redundancy problems give rise to update anomalies and vice versa.
• Verify that the schemas of Examples 1 and 2 have redundancy problems.
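
The definition above can be checked mechanically on the two example relations. The following Python sketch is only an illustration under stated assumptions: each attribute is abbreviated to a single letter as in the examples, an FD is written as a pair of strings (left-hand side, right-hand side), and the function names satisfies and redundancy_witness are hypothetical, not part of the course material.

```python
from itertools import combinations

# r1 over EMP1 (Example 1) and r2 over EMP2 (Example 2), one dict per tuple.
r1 = [
    {"E": "Mark", "D": "Computing", "M": "Peter"},
    {"E": "Angela", "D": "Computing", "M": "Peter"},
    {"E": "Graham", "D": "Computing", "M": "Peter"},
    {"E": "Paul", "D": "Math", "M": "Donald"},
    {"E": "George", "D": "Math", "M": "Donald"},
]
F1 = [("E", "D"), ("D", "M"), ("M", "D")]

r2 = [
    {"E": "Jack", "C": "Jill", "S": 25},
    {"E": "Jack", "C": "Jake", "S": 25},
    {"E": "Jack", "C": "John", "S": 25},
    {"E": "Donald", "C": "Dan", "S": 30},
    {"E": "Donald", "C": "David", "S": 30},
]
F2 = [("E", "S")]


def satisfies(relation, fd):
    """True if no two tuples agree on the left-hand side but differ on the right."""
    lhs, rhs = fd
    seen = {}
    for t in relation:
        x = tuple(t[a] for a in lhs)
        y = tuple(t[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def redundancy_witness(relation, fds):
    """Return (fd, t1, t2) if the relation satisfies all FDs and two distinct
    tuples have equal XA values for some FD X -> A; otherwise return None."""
    if not all(satisfies(relation, fd) for fd in fds):
        return None
    for lhs, rhs in fds:
        for t1, t2 in combinations(relation, 2):
            if t1 != t2 and all(t1[a] == t2[a] for a in lhs + rhs):
                return (lhs, rhs), t1, t2
    return None


if __name__ == "__main__":
    print(redundancy_witness(r1, F1))
    print(redundancy_witness(r2, F2))

    # Modification anomaly from Example 1: change "Peter" to "Philip" in the first tuple.
    modified = [dict(r1[0], M="Philip")] + r1[1:]
    print(satisfies(modified, ("E", "DM")))  # key dependency E -> DM
    print(satisfies(modified, ("D", "M")))   # non-key FD D -> M
```

A witness found in one relation that satisfies F is enough to show that the schema has a redundancy problem, because the definition only asks that such a relation exists. The last two checks mirror the modification anomaly: the key dependency still holds after the update, while D → M does not.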
Problem for you to work on

Consider the following relation over schema Films:

Title            Year  Genre      StarName
Star Wars        1977  SciFi      Carrie Fisher
Star Wars        1977  SciFi      Harrison Ford
Raiders ...      1981  Action     Harrison Ford
Raiders ...      1981  Adventure  Harrison Ford
When Harry ...   1989  Comedy     Carrie Fisher

Assume that the only FD that holds on Films is Title → Year.

What is the only key for Films?

Give an example of
1. an insertion anomaly
2. a deletion anomaly
3. a modification anomaly
4. a redundancy problem
for Films.

Normal Forms

• We assume that we are given a (1NF) relation schema R and a set F of functional dependencies (FDs) over R.
• We define two normal forms for relation schemas:
   • Boyce-Codd Normal Form (BCNF)
   • Third Normal Form (3NF)
• BCNF guarantees that the relation schema has no redundancy problems
• BCNF is stronger than 3NF: If R is in BCNF, then R is in 3NF
• 3NF, however, does sometimes have some advantages (see later)

Rules of inference for FDs

Given a set F of FDs, other FDs can be derived from those in F.

For example, if F contains E → D and D → M, then E → M can be derived from F (transitivity).

An FD X → Y is trivial if Y ⊆ X; otherwise it is nontrivial.

There are 3 rules of inference for FDs, known as Armstrong’s Axioms:
1. Reflexivity. If Y ⊆ X, then X → Y (trivial FDs).
2. Augmentation. If X → Y, then XA → YA for any attribute A not in X or Y.
3. Transitivity. If X → Y and Y → Z, then X → Z.

Closure of a set of FDs

If X → Y can be derived for a set of FDs F, we write this as F ⊢ X → Y.

Given F, the closure of F, denoted by F⁺, is the set of all FDs that can be derived (or proven) from F. That is,

F⁺ = {X → Y | F ⊢ X → Y}.

The closure of a set of attributes, CLOSURE(X, F), effectively uses Armstrong’s Axioms to find all attributes determined by X. From CLOSURE(X, F) one can find all FDs in F⁺ that have X on the lefthand side. For example, if CLOSURE(HR, F) = HRCT, then F⁺ contains HR → C, HR → T, ...
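
One common way to compute CLOSURE(X, F) is a fixpoint iteration: start from X and keep adding the right-hand side of any FD whose left-hand side is already covered. The Python sketch below is a minimal illustration of that idea, not necessarily the algorithm used in the course. It assumes single-letter attribute names and FDs written as pairs of strings; the function names closure and candidate_keys are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs under fds, computed by fixpoint iteration."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left-hand side is known, the right-hand side follows.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema (brute force)."""
    attributes = sorted(set(schema))
    found = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            candidate = set(combo)
            if any(key <= candidate for key in found):
                continue  # a proper superset of a key is not minimal
            if closure(candidate, fds) == set(attributes):
                found.append(candidate)
    return found


# FD sets from Examples 1 and 2.
F1 = [("E", "D"), ("D", "M"), ("M", "D")]
F2 = [("E", "S")]

if __name__ == "__main__":
    print(sorted(closure("E", F1)))   # closure of E under F1
    print(sorted(closure("D", F1)))   # closure of D under F1
    print(candidate_keys("EDM", F1))  # keys of EMP1
    print(candidate_keys("ECS", F2))  # keys of EMP2
```

Under F1 the closure of E is the whole schema of EMP1 while the closure of D is only DM, which agrees with the statements that E is the only key for EMP1 and that D is not a key. The brute-force key search is exponential in the number of attributes, so it is only suitable for small schemas like the ones in these examples.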