Relational Database Design

Database Management
Peter Wood

Outline:
• Relational Database Design: Update Anomalies, Data Redundancy
• Normal Forms: FD Inference, Boyce-Codd Normal Form, Third Normal Form
• Normalisation Algorithms: Lossless Join, BCNF Algorithm, BCNF Examples, Dependency Preservation, 3NF Algorithm

There are two interconnected problems which are caused by bad database design:

• Redundancy problems
• Update anomalies

Good database design is based on using certain normal forms for schemas.

Example 1

Let F1 = {E → D, D → M, M → D}.
E stands for ENAME, D stands for DNAME and M stands for MNAME.

A relation r1 over EMP1 (whose schema is {ENAME, DNAME, MNAME}):

ENAME    DNAME      MNAME
Mark     Computing  Peter
Angela   Computing  Peter
Graham   Computing  Peter
Paul     Math       Donald
George   Math       Donald

E is the only key for EMP1 w.r.t. F1.

Problems with EMP1 and F1

1. We cannot represent a department and manager without any employees (i.e., we cannot insert a tuple with a null ENAME because of entity integrity); such a problem is called an insertion anomaly.
2. For the same reason as (1), we cannot delete all the employees in a department and keep just the department information; such a problem is called a deletion anomaly.

More problems with EMP1 and F1

3. E.g. in the first tuple, modifying “Peter” to “Philip” or “Computing” to “Math” does not violate any FD resulting from a key, but D → M would be violated (D is not a key for EMP1 w.r.t. F1); such a problem is called a modification anomaly.

• In (3) it is not sufficient to check that r1 satisfies the FDs resulting from the keys of EMP1 w.r.t. F1.
• Ideally, we would like all the FDs of a relation schema to be inferred from key dependencies, i.e. FDs of the form K → schema(R), where K is a key for R w.r.t. F.

Final problem with EMP1 and F1

4. There is redundancy in r1, i.e. for every employee in a given department MNAME is repeated.

• “Peter” appears three times for “Computing” and “Donald” twice for “Math”.

Example 2

Let F2 = {E → S}.
E stands for ENAME, S stands for SAL and C stands for CNAME.

A relation r2 over EMP2 (whose schema is {ENAME, CNAME, SAL}):

ENAME    CNAME   SAL
Jack     Jill    25
Jack     Jake    25
Jack     John    25
Donald   Dan     30
Donald   David   30

EC is the only key for EMP2 w.r.t. F2.

Problems with EMP2 and F2

1. Insertion anomaly: we cannot insert an employee without any children.
2. Deletion anomaly: if there is a mistake and “Donald” does not have any children, we cannot record this fact by deleting the two tuples for “Donald”.
3. Modification anomaly: if we modify the salary of “Jack” in the first tuple to be 27 instead of 25, no FD resulting from a key will be violated, but E → S would be violated.
4. Redundancy: the salary of each employee is repeated for every child.

Formalising Redundancy Problems

Let R be a relation schema and F be a set of FDs over R.

Definition. R has a redundancy problem if
(1) there exists a relation r over R that satisfies F, and
(2) there exists an FD X → A in F and two distinct tuples in r that have equal XA values.

• It can be shown that redundancy problems give rise to update anomalies and vice versa.
• Verify that the schemas of Examples 1 and 2 have redundancy problems.

Problem for you to work on

Consider the following relation over schema Films:

Title            Year  Genre      StarName
Star Wars        1977  SciFi      Carrie Fisher
Star Wars        1977  SciFi      Harrison Ford
Raiders . . .    1981  Action     Harrison Ford
Raiders . . .    1981  Adventure  Harrison Ford
When Harry . . . 1989  Comedy     Carrie Fisher

Assume that the only FD that holds on Films is Title → Year.

What is the only key for Films?

Give an example of
1. an insertion anomaly
2. a deletion anomaly
3. a modification anomaly
4. a redundancy problem
for Films.

Normal Forms

• We assume that we are given a (1NF) relation schema R and a set F of functional dependencies (FDs) over R.
• We define two normal forms for relation schemas:
  • Boyce-Codd Normal Form (BCNF)
  • Third Normal Form (3NF)
• BCNF guarantees that the relation schema has no redundancy problems
• BCNF is stronger than 3NF: if R is in BCNF, then R is in 3NF
• 3NF, however, does sometimes have some advantages (see later)

Rules of inference for FDs

Given a set F of FDs, other FDs can be derived from those in F.

For example, if F contains E → D and D → M, then E → M can be derived from F (transitivity).

An FD X → Y is trivial if Y ⊆ X; otherwise it is nontrivial.

There are 3 rules of inference for FDs, known as Armstrong’s Axioms:
1. Reflexivity. If Y ⊆ X, then X → Y (trivial FDs).
2. Augmentation. If X → Y, then XA → YA for any attribute A not in X or Y.
3. Transitivity. If X → Y and Y → Z, then X → Z.

Closure of a set of FDs

If X → Y can be derived from a set of FDs F, we write this as F ⊢ X → Y.

Given F, the closure of F, denoted by F⁺, is the set of all FDs that can be derived (or proven) from F. That is,

F⁺ = {X → Y | F ⊢ X → Y}.

The closure of a set of attributes, CLOSURE(X, F), effectively uses Armstrong’s Axioms to find all attributes determined by X.

From CLOSURE(X, F) one can find all FDs in F⁺ that have X on the left-hand side.

For example, if CLOSURE(HR, F) = HRCT, then F⁺ contains HR → C, HR → T, ...
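The attribute closure CLOSURE(X, F) can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the helper names `fd` and `closure` are hypothetical, attributes are assumed to be single letters, and the second FD set is assumed to be the one used in the course/teacher/hour/room exercise later in these slides.

```python
def fd(lhs, rhs):
    """Build an FD from two strings of single-letter attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Return all attributes determined by attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left-hand side is already determined, so is the right.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# F contains E -> D and D -> M, so E -> M is in the closure of F.
F1 = [fd("E", "D"), fd("D", "M")]
print(sorted(closure("E", F1)))  # ['D', 'E', 'M']

# Assumed FD set: C -> T, HR -> C, HT -> R, CS -> G, HS -> R.
F = [fd("C", "T"), fd("HR", "C"), fd("HT", "R"), fd("CS", "G"), fd("HS", "R")]
print("".join(sorted(closure("HR", F))))  # CHRT, i.e. the set HRCT
```

Every attribute A in the returned set corresponds to an FD X → A in F⁺, which is how the closure is used to read off FDs with X on the left-hand side.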

Boyce-Codd Normal Form

Assume we are given a set F of FDs, along with its closure F⁺.

Definition. R is in Boyce-Codd Normal Form (BCNF) w.r.t. F if for every non-trivial FD X → Y in F⁺, X is a superkey for R w.r.t. F.

Example 1

Let schema(R1) = {STUDENT, POSITION, SUBJECT}; S stands for STUDENT, J stands for SUBJECT and P stands for POSITION.

Let F1 = {SJ → P, PJ → S}.

• Is R1 in BCNF w.r.t. F1?

Example 2

Let schema(R2) = {STREET, CITY, POSTCODE}; S stands for STREET, C stands for CITY and P stands for POSTCODE.

Let F2 = {SC → P, P → C}.

• Is R2 in BCNF w.r.t. F2?
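The BCNF definition can be checked mechanically for the two examples above. The sketch below is one possible approach under stated assumptions: the function names are hypothetical, attributes are single letters, and every subset X of the schema is tested by brute force (fine for small schemas, exponential in general). X witnesses a violation when it determines something outside itself without determining the whole schema.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build an FD from two strings of single-letter attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Return all attributes determined by attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """List the left-hand sides X of nontrivial FDs in F+ that are not superkeys."""
    schema = frozenset(schema)
    bad = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            x = set(combo)
            determined = closure(x, fds) & schema
            # Nontrivial (determines more than X) but not a superkey.
            if determined != x and determined != schema:
                bad.append("".join(combo))
    return bad


F1 = [fd("SJ", "P"), fd("PJ", "S")]
F2 = [fd("SC", "P"), fd("P", "C")]
print(bcnf_violations("SJP", F1))  # [] : R1 is in BCNF
print(bcnf_violations("SCP", F2))  # ['P'] : P -> C violates BCNF in R2
```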

Third Normal Form

Definition. An attribute A in schema(R) is said to be prime w.r.t. F if A is a member of one of the keys of R w.r.t. F.

Definition. R is in Third Normal Form (3NF) w.r.t. F if for every non-trivial FD X → A in F⁺ either X is a superkey for R w.r.t. F or A is prime.

So 3NF is weaker than BCNF: the superkey condition alone is the BCNF requirement, and 3NF additionally allows FDs whose right-hand side is prime.
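The two definitions above translate directly into a brute-force check. This is a hedged sketch with hypothetical helper names (`keys`, `is_3nf`), assuming single-letter attributes; it is tried on R2 = {S, C, P} with F2 = {SC → P, P → C} and on the EMP1 schema {E, D, M} with F1 = {E → D, D → M, M → D} from the update-anomalies example.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build an FD from two strings of single-letter attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Return all attributes determined by attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def keys(schema, fds):
    """Return the minimal attribute sets whose closure covers the schema."""
    schema = frozenset(schema)
    found = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            x = frozenset(combo)
            if any(k <= x for k in found):
                continue  # contains a smaller key, so it is not minimal
            if closure(x, fds) >= schema:
                found.append(x)
    return found


def is_3nf(schema, fds):
    """Check: for every nontrivial X -> A, X is a superkey or A is prime."""
    schema = frozenset(schema)
    prime = set().union(*keys(schema, fds))
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            x = set(combo)
            determined = closure(x, fds) & schema
            if determined == schema:
                continue  # X is a superkey
            if not (determined - x) <= prime:
                return False  # some nonprime A is determined by X
    return True


F2 = [fd("SC", "P"), fd("P", "C")]
print(sorted("".join(sorted(k)) for k in keys("SCP", F2)))  # ['CS', 'PS']
print(is_3nf("SCP", F2))  # True: every attribute of R2 is prime

EMP1 = [fd("E", "D"), fd("D", "M"), fd("M", "D")]
print(is_3nf("EDM", EMP1))  # False: D -> M, D not a superkey, M not prime
```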

Examples

• Example 1, with R1 = {S, J, P} and F1 = {SJ → P, PJ → S}, is in BCNF
• Therefore R1 is in 3NF
• What about Example 2, with R2 = {S, C, P} and F2 = {SC → P, P → C}, which was not in BCNF?
• What are the keys and prime attributes of R2 w.r.t. F2?
• Is R2 in 3NF w.r.t. F2?

BCNF/3NF Example 3

Let R3 be a relation schema, with schema(R3) = {ENAME, DNAME, MNAME}; E stands for ENAME, D stands for DNAME and M stands for MNAME.

Let F3 = {E → D, D → M}.

• Is R3 in BCNF w.r.t. F3?
• What are the keys and prime attributes of R3 w.r.t. F3?
• Is R3 in 3NF w.r.t. F3?

BCNF/3NF Example 4

Let R4 be a relation schema, with schema(R4) = {ENAME, CNAME, SAL}; E stands for ENAME, C stands for CNAME and S stands for SAL.

Let F4 = {E → S}.

• Is R4 in BCNF w.r.t. F4?
• What are the keys and prime attributes of R4 w.r.t. F4?
• Is R4 in 3NF w.r.t. F4?

Problem for you to work on

Consider relation schema R where schema(R) = ABCD.

Let the set F of FDs which hold on R be {AB → C, C → D, D → A}.

1. What are all the keys of R? (done earlier)
2. Which FDs violate BCNF?
3. Which FDs violate 3NF?

BCNF Normalisation Algorithm

In practice, we are given:
• an entity-relationship diagram (ERD) and
• a set of functional dependencies (FDs) F.

To produce a database design, we
1. Convert the ERD into a database schema S.
2. If any of the relation schemas in S are not in BCNF with respect to F, we decompose them.

Decomposing a relation schema

Given a relation schema R which is not in BCNF, we decompose it by
• replacing it by two (smaller) relation schemas R1 and R2
• such that R = R1 ∪ R2.

For example, given R = {E, C, S} (employee, child, salary) and F = {E → S}, we might decompose R into
• R1 = {E, C} and
• R2 = {E, S}

How do we decide which attributes go into which decomposed relation schemas?

Lossless join

What if we choose a different decomposition for our example?

schema(R) = {ENAME, CNAME, SAL} and single FD: ENAME → SAL

(Modified) relation r over R is given by

ENAME    CNAME   SAL
Jack     Diane   25
Jack     John    25
Donald   Diane   30
Donald   David   30

If we decompose R into {ENAME,CNAME} and {CNAME,SAL} as follows:

ENAME    CNAME        CNAME   SAL
Jack     Diane        Diane   25
Jack     John         John    25
Donald   Diane        Diane   30
Donald   David        David   30

and then perform the natural join, we get

ENAME    CNAME   SAL
Jack     Diane   25
Jack     Diane   30
Jack     John    25
Donald   Diane   25
Donald   Diane   30
Donald   David   30

⇒ with two tuples that were not in the original relation
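The join in this example can be reproduced with a small Python sketch. The representation is an assumption made for illustration: a relation is a set of rows, each row a frozenset of (attribute, value) pairs, and the helpers `rel`, `project` and `natural_join` are hypothetical names.

```python
def rel(attrs, rows):
    """Build a relation: a set of rows, each a frozenset of (attr, value)."""
    return {frozenset(zip(attrs, row)) for row in rows}


def project(relation, attrs):
    """Project each row onto attrs; duplicate rows collapse automatically."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in relation}


def natural_join(r, s):
    """Combine every pair of rows that agree on their shared attributes."""
    out = set()
    for t in r:
        for u in s:
            merged = dict(t)
            if all(merged.get(a, v) == v for a, v in u):
                merged.update(u)
                out.add(frozenset(merged.items()))
    return out


ATTRS = ("ENAME", "CNAME", "SAL")
r = rel(ATTRS, [
    ("Jack", "Diane", 25),
    ("Jack", "John", 25),
    ("Donald", "Diane", 30),
    ("Donald", "David", 30),
])

# Decomposition into {ENAME, CNAME} and {CNAME, SAL}, then join.
lossy = natural_join(project(r, {"ENAME", "CNAME"}), project(r, {"CNAME", "SAL"}))
spurious = sorted(tuple(dict(t)[a] for a in ATTRS) for t in lossy - r)
print(len(lossy))  # 6
print(spurious)    # [('Donald', 'Diane', 25), ('Jack', 'Diane', 30)]

# Decomposition into {ENAME, CNAME} and {ENAME, SAL}, then join.
lossless = natural_join(project(r, {"ENAME", "CNAME"}), project(r, {"ENAME", "SAL"}))
print(lossless == r)  # True
```

Joining on CNAME introduces the two extra tuples shown in the table, whereas joining on ENAME gives back exactly r.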

• A decomposition such as that into {ENAME,CNAME} and {CNAME,SAL} is called lossy
• We started knowing Jack’s salary was 25
• After decomposing, if we query Jack’s salary we get both 25 and 30
• The decomposition does not faithfully represent the original information we had
• A decomposition which does faithfully represent the original information is called lossless
• Losslessness is guaranteed if we ensure that the common attributes between a pair of decomposed relation schemas form a key for one of them
• The BCNF algorithm ensures lossless decompositions
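The losslessness condition on common attributes can be tested with the attribute closure. The following is a minimal sketch, assuming single-letter attribute names and hypothetical helper names; it checks whether the common attributes determine every attribute of one of the two schemas, which is how "key for one of them" is interpreted here.

```python
def fd(lhs, rhs):
    """Build an FD from two strings of single-letter attribute names."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Return all attributes determined by attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def common_attrs_form_key(r1, r2, fds):
    """True if the attributes shared by r1 and r2 determine all of r1 or all of r2."""
    common = set(r1) & set(r2)
    determined = closure(common, fds)
    return set(r1) <= determined or set(r2) <= determined


F = [fd("E", "S")]  # ENAME -> SAL
print(common_attrs_form_key("EC", "ES", F))  # True: E is a key for {E, S}
print(common_attrs_form_key("EC", "CS", F))  # False: C determines nothing else
```

The second call corresponds to the lossy decomposition into {ENAME,CNAME} and {CNAME,SAL}, where the condition is not met.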

Decomposition Condition used by Algorithm

The FD ENAME → SAL violates BCNF in schema R.
• ENAME is the left-hand side of the violating FD
• SAL is the right-hand side (RHS) of the violating FD

Split {ENAME, CNAME, SAL} into two relation schemas:

1. R1 = EMPLOYEE, containing all the attributes in the violating FD, i.e., schema(EMPLOYEE) = {ENAME, SAL}, and F1 = {ENAME → SAL}.
2. R2 = DEPENDENT, containing all attributes in R except those on the RHS of the violating FD, i.e., schema(DEPENDENT) = {ENAME, CNAME}, and F2 = ∅ (excluding trivial FDs).

Algorithm DECOMPOSE(R, F)

let the output database schema Out be empty;
if IS-BCNF(R, F) then
    add R to Out;
else
    let X → A in F⁺ be nontrivial (i.e. A is not in X)
        such that X is not a superkey with respect to F;
    let R1 have schema(R1) = X ∪ {A};
    merge DECOMPOSE(R1, F) and Out;
    let R2 have schema(R2) = schema(R) − {A};
    merge DECOMPOSE(R2, F) and Out;
end if
return Out;

Algorithm properties

The natural join can be applied to all of the relations in DECOMPOSE(R, F) to recover precisely the information stored in any relation over schema(R); this is known as the lossless join property.

Note that, in general, we need to consider F⁺, the closure of F, to check whether there are any FDs which violate BCNF. But we can start trying to find violations in F, and only consider F⁺ once we find no violations in F.
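DECOMPOSE(R, F) can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not a reference implementation: the names `find_violation` and `decompose` are hypothetical, violations are looked for among the FDs of F first and only then in F⁺ (by brute force over attribute subsets), and the choice of violating FD is driven by the order of the FD list, which is an arbitrary tie-break.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return all attributes determined by attrs under the FDs in fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_violation(schema, fds):
    """Return (X, A) with X -> A nontrivial and X not a superkey, or None."""
    # First look at the FDs of F that apply to this schema.
    for lhs, rhs in fds:
        if lhs <= schema:
            for a in sorted((rhs & schema) - lhs):
                if not closure(lhs, fds) >= schema:
                    return lhs, a
    # Otherwise search the closure of F via attribute closures.
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            x = frozenset(combo)
            reach = closure(x, fds)
            extra = (reach & schema) - x
            if extra and not reach >= schema:
                return x, min(extra)
    return None


def decompose(schema, fds):
    """Return a set of BCNF schemas obtained by repeated splitting."""
    schema = frozenset(schema)
    violation = find_violation(schema, fds)
    if violation is None:
        return {schema}
    x, a = violation
    return decompose(x | {a}, fds) | decompose(schema - {a}, fds)


def fd(lhs, rhs):
    """Build an FD from two lists of attribute names."""
    return (frozenset(lhs), frozenset(rhs))


# R = {E, C, S} with E -> S.
out = decompose("ECS", [fd(["E"], ["S"])])
print(sorted("".join(sorted(s)) for s in out))  # ['CE', 'ES']

# STUD with CITY -> COUNTRY listed first, so it is chosen first.
stud = ["SNUM", "POSTCODE", "CITY", "COUNTRY"]
F = [fd(["CITY"], ["COUNTRY"]), fd(["POSTCODE"], ["CITY"]), fd(["SNUM"], ["POSTCODE"])]
print(sorted(tuple(sorted(s)) for s in decompose(stud, F)))
# [('CITY', 'COUNTRY'), ('CITY', 'POSTCODE'), ('POSTCODE', 'SNUM')]
```

Reordering the FD list changes which violation is picked first and can therefore change the resulting database schema.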

Another Example of BCNF Decomposition

Let STUD be a relation schema, with schema(STUD) = {SNUM, POSTCODE, CITY, COUNTRY}, with FDs {SNUM → POSTCODE, POSTCODE → CITY, CITY → COUNTRY}

• CITY → COUNTRY violates BCNF in STUD, so decompose STUD into CC, with schema(CC) = {CITY, COUNTRY}, and STUD1, with schema(STUD1) = {SNUM, POSTCODE, CITY}
• CC is in BCNF while POSTCODE → CITY violates BCNF in STUD1, so decompose STUD1 into PC, with schema(PC) = {POSTCODE, CITY}, and SINFO, with schema(SINFO) = {SNUM, POSTCODE}.
• All the relation schemas in the database schema {CC, PC, SINFO} are now in BCNF

A Third Example

• Consider a modified relation schema EMP, with attributes ENAME, CNAME (child name), DNAME (department name) and MNAME (manager name).
• The set of FDs is F = {E → D, D → M, M → D}, where E stands for ENAME, D stands for DNAME and M stands for MNAME (and C stands for child name).
• All three FDs violate BCNF since EC is the only key.
• We can choose any one of them as the basis for the first decomposition step.
• We will consider all three decompositions in turn.

Third Example: Decomposition 1

• If we first decompose using D → M, we get two schemas with attributes {D, M} and {E, C, D}.
• FDs D → M and M → D are applicable to {D, M}, but both D and M are keys.
• FD E → D is applicable to {E, C, D} and E is not a superkey.
• So we decompose {E, C, D} into {E, D} and {E, C}.
• E is a key for {E, D} and EC is the key for {E, C}.
• So the final database schema comprises {D, M}, {E, D} and {E, C}.

Third Example: Decomposition 2

• If we first decompose using E → D, we get two schemas with attributes {E, D} and {E, C, M}.
• E → D is applicable to {E, D}, but E is a key.
• What FDs are applicable to {E, C, M}?
• None of E → D, D → M or M → D apply because D is not in {E, C, M}.
• We have to consider all FDs in F⁺.
• Recall that E → M follows from E → D and D → M.
• E → M violates BCNF in {E, C, M} because E is not a superkey.
• So we decompose {E, C, M} into {E, M} and {E, C}.
• So the final database schema comprises {E, D}, {E, M} and {E, C}.

Third Example: Decomposition 3

Relational Database Design Update Anomalies Data Redundancy

- If we first decompose using M → D, we get two schemas with attributes {M, D} and {E, C, M}.
- FDs D → M and M → D are applicable to {M, D}, but both D and M are keys.
- Once again we have {E, C, M}, so it is decomposed as before into {E, M} and {E, C}.
- So the final database schema comprises {M, D}, {E, M} and {E, C}.

An example for you to try

Let R be a relation schema, with schema(R) = {C, T, H, R, S, G}.
- C stands for a course,
- T stands for a teacher,
- H stands for hour,
- R stands for room,
- S stands for student and
- G stands for grade.

An example set of FDs F over R:
1. C → T,
2. HR → C,
3. HT → R,
4. CS → G and
5. HS → R.
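Before attempting an exercise like this by hand, it can help to compute attribute closures mechanically. The following is a minimal Python sketch, not part of the original slides: the helper names `fd`, `closure` and `bcnf_violations` are illustrative assumptions, and the code only reports which of the given FDs have a left-hand side that is not a superkey. It does not carry out the decomposition itself.

```python
def fd(lhs, rhs):
    """Build an FD as a pair of attribute sets, e.g. fd("HR", "C") for HR -> C."""
    return (frozenset(lhs), frozenset(rhs))


def closure(attrs, fds):
    """Return the closure of `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left-hand side is already derived, the right-hand side follows.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Return the non-trivial FDs in `fds` whose left-hand side is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= schema]


# The schema and FDs from the exercise (hypothetical variable names).
SCHEMA = set("CTHRSG")
FDS = [fd("C", "T"), fd("HR", "C"), fd("HT", "R"), fd("CS", "G"), fd("HS", "R")]

print(sorted(closure("HS", FDS)))  # ['C', 'G', 'H', 'R', 'S', 'T']
for lhs, rhs in bcnf_violations(SCHEMA, FDS):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
```

Under these assumptions the closure of HS covers every attribute, so HS is a superkey; any FD printed by the loop is a candidate to decompose on.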

Decompose R into BCNF.

Dependency Preservation

Recall example: F3 = {SC → P, P → C}.
S stands for Street, C stands for City and P stands for Postcode.

{S,C,P} is not in BCNF.

Decompose {S,C,P} into {P,C} and {P,S}:

P   C        P   S
p1  c        p1  s
p2  c        p2  s

The only FD that can be tested in the decomposition is P → C.
When we join the two relations, we see that SC → P is violated.

A decomposition is dependency preserving if the FDs which hold on the original relation schema can be tested on the decomposed schemas, without using joins.
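To make the definition concrete, the postcode example can be replayed in a few lines of Python. This is only a sketch: the helpers `natural_join` and `satisfies` and the variable names are assumptions introduced here, with relations modelled as lists of dictionaries.

```python
def natural_join(r1, r2):
    """Natural join of two relations, each given as a list of dicts."""
    joined = []
    for t1 in r1:
        for t2 in r2:
            common = t1.keys() & t2.keys()
            # Tuples join only if they agree on every shared attribute.
            if all(t1[a] == t2[a] for a in common):
                joined.append({**t1, **t2})
    return joined


def satisfies(relation, lhs, rhs):
    """True if all tuples that agree on `lhs` also agree on `rhs`."""
    seen = {}
    for t in relation:
        key = tuple(t[a] for a in lhs)
        val = tuple(t[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True


# The two relations from the decomposition of {S,C,P}.
pc = [{"P": "p1", "C": "c"}, {"P": "p2", "C": "c"}]
ps = [{"P": "p1", "S": "s"}, {"P": "p2", "S": "s"}]

print(satisfies(pc, "P", "C"))       # True: P -> C can be tested locally
joined = natural_join(pc, ps)
print(satisfies(joined, "SC", "P"))  # False: the violation only shows after the join
```

Neither relation on its own contains both S and C, so the violation of SC → P cannot be detected without computing the join.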

We cannot always find a BCNF decomposition that is dependency preserving.

To test that no FDs are violated, we may need to join relations (expensive).

We can always find a 3NF dependency-preserving decomposition.

For a starting set of attributes and FDs, some BCNF decompositions may be dependency preserving and some not.

Consider the example with attributes {E, C, D, M} and FDs F = {E → D, D → M, M → D}.

We had three possible decompositions:
1. {D, M}, {E, D} and {E, C}.
2. {E, D}, {E, M} and {E, C}.
3. {M, D}, {E, M} and {E, C}.

Which of them is dependency-preserving?
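The three decompositions of {E, C, D, M} can be checked mechanically. The Python sketch below uses a standard textbook test that is not taken from these slides: for each FD X → Y in F, grow X by repeatedly taking closures restricted to one decomposed schema at a time, and check whether Y is reached. The function names `closure` and `preserves` are assumptions.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves(decomposition, fds):
    """True if every FD in `fds` follows from FDs testable on single schemas."""
    for lhs, rhs in fds:
        reachable = set(lhs)
        changed = True
        while changed:
            changed = False
            for schema in decomposition:
                # Attributes derivable using only FDs that live inside `schema`.
                gained = closure(reachable & schema, fds) & schema
                if not gained <= reachable:
                    reachable |= gained
                    changed = True
        if not rhs <= reachable:
            return False
    return True


F = [(frozenset("E"), frozenset("D")),
     (frozenset("D"), frozenset("M")),
     (frozenset("M"), frozenset("D"))]

decompositions = {
    1: [set("DM"), set("ED"), set("EC")],
    2: [set("ED"), set("EM"), set("EC")],
    3: [set("MD"), set("EM"), set("EC")],
}

for number, schemas in decompositions.items():
    print(number, preserves(schemas, F))
```

Under this test, decompositions 1 and 3 preserve all of F, while decomposition 2 loses D → M, since no single schema in it contains both D and M and the FDs testable on {E, D} and {E, M} do not imply it.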

3NF Algorithm


Given a relation schema R and a set of FDs F, the following steps produce a 3NF decomposition of R that satisfies the lossless join condition and is dependency preserving:

1. Remove all redundancies from F (we haven't covered this).
2. For each FD X → A in F, use X ∪ {A} as the schema of one of the relations in the decomposition.
3. If none of the schemas from Step 2 includes a superkey for R, add another relation schema that is a key for R.
4. Delete any of the schemas from Step 2 that is contained in another.
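The steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than a definitive implementation: Step 1 is skipped, so the input F is assumed to be free of redundancies already, and the names `closure`, `find_key` and `synthesize_3nf` are hypothetical. The key in Step 3 is found by dropping attributes one at a time while the remainder is still a superkey.

```python
def closure(attrs, fds):
    """Return the closure of `attrs` under the FDs in `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def find_key(schema, fds):
    """Shrink the full schema to one key by discarding removable attributes."""
    key = set(schema)
    for a in sorted(schema):
        if closure(key - {a}, fds) >= schema:
            key.discard(a)
    return key


def synthesize_3nf(schema, fds):
    """Steps 2 to 4; assumes `fds` already has no redundancies (Step 1)."""
    # Step 2: one schema X ∪ {A} per FD X -> A.
    schemas = [frozenset(lhs | {a}) for lhs, rhs in fds for a in rhs]
    # Step 3: add a key if no schema so far contains a superkey.
    if not any(closure(s, fds) >= schema for s in schemas):
        schemas.append(frozenset(find_key(schema, fds)))
    # Step 4: drop duplicates and schemas contained in another schema.
    unique = list(dict.fromkeys(schemas))
    return [s for s in unique if not any(s < t for t in unique)]


F = [(frozenset("E"), frozenset("D")),
     (frozenset("D"), frozenset("M")),
     (frozenset("M"), frozenset("D"))]

for s in synthesize_3nf(set("ECDM"), F):
    print("".join(sorted(s)))
```

For the running example with attributes {E, C, D, M}, this yields {E, D}, {D, M} and {E, C}, the same schemas as the first BCNF decomposition considered earlier.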