Week 11: Normal Forms Logical Design Database Design  We have seen how to design a relational Database Redundancies and Anomalies schema by first designing an ER schema Functional Dependencies and then transforming it into a relational Entailment, Closure and Equivalence one. Lossless Decompositions The (3NF)  Now we focus on how to transform the The Boyce-Codd Normal Form (BCNF) generated relational schema into a "better" Normal Forms and Database Design one.  Goodness of relational schemas is defined in terms of the notion of normal form.

CSC343 – Introduction to Normal Forms — 1 CSC343 – Introduction to Databases Normal Forms — 2

1 Normal Forms and Normalization Examples of Redundancy  A normal form is a property of a database schema.  When a database schema is un-normalized (that is, Employee Salary Project Budget Function does not satisfy the normal form), it allows Brown 20 Mars 2 technician redundancies of various types which can lead to Green 35 Jupiter 15 designer anomalies and inconsistencies. Green 35 Venus 15 designer Hoskins 55 Venus 15 manager  Normal forms can serve as basis for evaluating the Hoskins 55 Jupiter 15 consultant quality of a database schema and constitutes a useful Hoskins 55 Mars 2 consultant tool for database design. Moore 48 Mars 2 manager Moore 48 Venus 15 designer  Normalization is a procedure that transforms an un- Kemp 48 Venus 15 designer normalized schema into a normalized one. Kemp 48 Jupiter 15 manager

CSC343 – Introduction to Databases Normal Forms — 3 CSC343 – Introduction to Databases Normal Forms — 4

2 Anomalies What’s Wrong??? The value of the salary of an employee is repeated in every where the employee is mentioned,  We are using a single to represent data of leading to a redundancy. Redundancies lead to very different types. anomalies:  In particular, we are using a single relation to store  If salary of an employee changes, we have to the following types of entities, relationships and modify the value in all corresponding attributes: (update anomaly)  Employees and their salaries;  If an employee ceases to work in projects, but  Projects and their budgets; stays with company, all corresponding tuples are  Participation of employees in projects, along with deleted, leading to loss of information (deletion their functions. anomaly)  To set the problem on a formal footing, we introduce  A new employee cannot be inserted in the the notion of (FD). relation until the employee is assigned to a project (insertion anomaly)

CSC343 – Introduction to Databases Normal Forms — 5 CSC343 – Introduction to Databases Normal Forms — 6

3 Functional Dependencies (FDs) Functional Dependencies in the Example  Given schema R(X) and non-empty Y and Z of the attributes X, we say that there is a functional  Each employee has a unique salary. We dependency between Y and Z (Y→Z), iff for every represent this dependency as relation instance r of R(X) and every pair of tuples t , t 1 2 Employee → Salary of r, if t .Y = t .Y, then t .Z = t .Z. and say " functionally depends on 1 2 1 2 Salary  A functional dependency is a statement about all Employee". allowable relations for a given schema.  Meaning: if two tuples have the same  Functional dependencies have to be identified by Employee attribute value, they must also understanding the semantics of the application. have the same Salary attribute value  Given a particular relation r0 of R(X), we can tell if a  Likewise, dependency holds or not; but just because it holds for Project → Budget r0, doesn’t mean that it also holds for R(X)! i.e., each project has a unique budget

CSC343 – Introduction to Databases Normal Forms — 7 CSC343 – Introduction to Databases Normal Forms — 8

4 Non-Trivial Dependencies Looking for FDs  A functional dependency Y→Z is non-trivial if no attribute in Z appears among attributes of Y, e.g.,

Employee Salary Project Budget Function  Employee → Salary is non-trivial; Brown 20 Mars 2 technician  Employee,Project → Project is trivial. Green 35 Jupiter 15 designer Green 35 Venus 15 designer  Anomalies arise precisely for the attributes which are Hoskins 55 Venus 15 manager involved in (non-trivial) functional dependencies: Hoskins 55 Jupiter 15 consultant Hoskins 55 Mars 2 consultant  Employee → Salary; Moore 48 Mars 2 manager  Project → Budget. Moore 48 Venus 15 designer Kemp 48 Venus 15 designer  Moreover, note that our example includes another Kemp 48 Jupiter 15 manager functional dependency:  Employee,Project → Function.

CSC343 – Introduction to Databases Normal Forms — 9 CSC343 – Introduction to Databases Normal Forms — 10

5 Dependencies Cause Anomalies, Another Example ...Sometimes! ER Model  The first two dependencies cause undesirable redundancies and anomalies. (0.N)  The third dependency, however, does not cause redundancies because {Employee,Project} constitutes a key of the SI# Name Address Hobbies This is NOT a relation relation (...and a relation cannot contain two 1111 Joe 123 Main {biking, hiking} tuples with the same values for the key attributes.) SI# Name Address Hobby Dependencies on keys are OK, 1111 Joe 123 Main biking other dependencies are not! 1111 Joe 123 Main hiking ……………. Redundancy CSC343 – Introduction to Databases Normal Forms — 11 CSC343 – Introduction to Databases Normal Forms — 12

6 Constraints How Do We Eliminate Redundancy?  A superkey constraint is a special functional  Decomposition: Use two relations to store dependency: Let K be a set of attributes of R, and U Person information: the set of all attributes of R. Then K is a superkey iff the functional dependency K U is satisfied in Person1 (SI#, Name, Address) → R. Hobbies (SI#, Hobby) E.g., SI# → SI#,Name,Address (for a Person  The decomposition is more general: people with relation) hobbies can now be described independently of  A key is a minimal superkey, I.e., for each X ⊂ K, X their name and address. is not a superkey  No update anomalies: SI#, Hobby → SI#, Name, Address, Hobby but Name and address stored once; SI# → SI#, Name, Address, Hobby A hobby can be separately supplied or Hobby SI#, Name, Address, Hobby deleted; → We can represent persons with no hobbies.  A key attribute is an attribute that is part of a key.

CSC343 – Introduction to Databases Normal Forms — 13 CSC343 – Introduction to Databases Normal Forms — 14

7 More Examples When are FDs "Equivalent"?

 Address → PostalCode  Sometimes functional dependencies (FDs) DCS’s postal code is M5S 3H5 seem to be saying the same thing,  Author, Title, Edition → PublicationDate e.g., Addr → PostalCode,Str# vs Addr PostalCode, Addr Str# Ramakrishnan, et al., Database → → Management Systems, 3rd publication date  Another example is 2003 Addr → PostalCode, PostalCode → Province  CourseID → ExamDate, ExamTime vs Addr → PostalCode, PostalCode → Province vs Addr Province CSC343’s exam date is December 18, → When are two sets of FDs equivalent? How do starting at 7pm  we "infer" new FDs from given FDs?

CSC343 – Introduction to Databases Normal Forms — 15 CSC343 – Introduction to Databases Normal Forms — 16

8 Entailment, Closure, Equivalence How Do We Compute Entailment?  Satisfaction, entailment, and equivalence are  If F is a set of FDs on schema R and f is another semantic concepts – defined in terms of the FD on R, then F entails f (written F |= f) if every "meaning" of relations in the “real world.” instance r of R that satisfies every FD in F also satisfies f.  How to check if F entails f, F and G are equivalent? Example: F = {A B, B C} and f is A C → → → Apply the respective definitions for all possible If Phone# → Address and Address → relation instances for a schema R …… ZipCode, then Phone# → ZipCode Find algorithmic, syntactic ways to compute  The closure of F, denoted F+, is the set of all FDs these notions. entailed by F.  Note: The syntactic solution must be "correct” with  F and G are equivalent if F entails G and G respect to the semantic definitions. entails F.  Correctness has two aspects: soundness and completeness – see later.

CSC343 – Introduction to Databases Normal Forms — 17 CSC343 – Introduction to Databases Normal Forms — 18

9 Soundness Armstrong’s Axioms for FDs  Theorem: F |- f implies F |= f  This is the syntactic way of computing/testing  In words: If FD f: X → Y can be derived from a set of semantic properties of FDs FDs F using the axioms, then f holds in every relation Reflexivity: Y ⊆ X |- X → Y (trivial FD) that satisfies every FD in F.  Example: Given X → Y and X → Z then e.g., |- Name, Address → Name X XY Augmentation by X Augmentation: X → Y |- XZ → YZ → YX → YZ Augmentation by Y e.g., Address → ZipCode |- X → YZ Transitivity Address,Name → ZipCode, Name  Thus, X → YZ is satisfied in every relation where both Transitivity: X Y, Y Z |- X Z → → → X → Y and X → Z are satisfied. We have derived e.g., Phone# → Address, Address → ZipCode the union rule for FDs. |- Phone# → ZipCode

CSC343 – Introduction to Databases Normal Forms — 19 CSC343 – Introduction to Databases Normal Forms — 20

10 Completeness Correctness

 Theorem: F |= f implies F |- f  The notions of soundness and  In words: If F entails f, then f can be derived completeness link the syntax (Armstrong’s from F using Armstrong's axioms. axioms) with semantics, i.e., entailment defined in terms of relational instances.  A consequence of completeness is the following (naïve) algorithm to determining if  This is a precise way of saying that the F entails f: algorithm for entailment based on the axioms is ``correct’’ with respect to the Algorithm: Use the axioms in all possible Algorithm definitions. ways to generate F+ (the set of possible FD’s is finite so this can be done) and see if f is in F+

CSC343 – Introduction to Databases Normal Forms — 21 CSC343 – Introduction to Databases Normal Forms — 22

11 Decomposition Rule Generating F+

 Another example of a derivation rule we can F use in generating F+:  X → AB, AB → A (refl), X → A (trans) AB→ C u n i o n AB→ BCD decomp  So, whenever we have X AB, we can aug → A→ D AB→ BD tr a n s AB→ BCDE AB→ E "decompose" this functional dependency to aug two functional dependencies X → A, X → B D→ E BCD → BCDE

Thus, AB→ BD, AB → BCD, AB → BCDE, and AB → E are all elements of F+.

CSC343 – Introduction to Databases Normal Forms — 23 CSC343 – Introduction to Databases Normal Forms — 24

12 Attribute Closure Computing the + Attribute Closure X F  Calculating attribute closure leads to a more

efficient way of checking entailment. + closure := X; // since X ⊆ X F  The attribute closure of a set of attributes X repeat + old := closure; with respect to a set of FDs F, denoted X F, is the set of all attributes A such that X → A if there is an FD Z → V in F such that Z ⊆ closure and V ⊆ closure + + X F is not necessarily same as X G if F ≠ G then closure := closure ∪ V  Attribute closure and entailment: until old = closure

– If T ⊆ closure then X → T is entailed by F Algorithm: Given a set of FDs, F, then X → Y + if and only if Y ⊆ X F

CSC343 – Introduction to Databases Normal Forms — 25 CSC343 – Introduction to Databases Normal Forms — 26

13 Computing Attribute Closure: Normal Forms  Each normal form is a set of conditions on a schema that An Example together guarantee certain properties (relating to + redundancy and update anomalies). X XF  (1NF) is the same as the definition of F: AB → C A {A, D, E} relational model (relations = sets of tuples; each tuple = sequence of atomic values). A → D AB {A, B, C, D, E} D → E AC {A, C, B, D, E}  (2NF) 1NF plus every attribute that AC → B is not part of a (that is, a non-prime B {B} attribute) must depend on an entire candidate key (not D {D, E} part of it).  The two most used are third normal form (3NF) and Is AB → E entailed by F? Yes Boyce-Codd normal form (BCNF). Is D → C entailed by F? No  We will discuss in detail the 3NF. + Result: XF allows us to determine all FDs of the form X → Y entailed by F

CSC343 – Introduction to Databases Normal Forms — 27 CSC343 – Introduction to Databases Normal Forms — 28

14 Decomposition into 3NF: Basic Idea The Third Normal Form  Decomposition into 3NF can proceed as follows.

 A relation R(X) is in third normal form (3NF) if, for For each functional dependency of the form Y → each (non-trivial) functional dependency Y → Z, at Z, where Y contains a of a key K of R(X), least one of the following is true: create a projection on all the attributes Y, Z (2NF).  Y contains a key K of R(X);  Each attribute in Z is contained in at least one For each dependency of the form Y → Z, where (candidate) key of R(X). That is, each attribute in Y, doesn’t contain any key, and not all Z is a prime attribute. attributes of Z are key attributes, create a  3NF does not remove all redundancies. projection on all the attributes Y, Z (3NF).  3NF decompositions founded on the notion of minimal  The new relations only include dependencies Y → cover. Z, where Y contains a key K of R(X), or Z contains only key attributes.

CSC343 – Introduction to Databases Normal Forms — 29 CSC343 – Introduction to Databases Normal Forms — 30

15 Basic Idea Normalization Through Decomposition  A relation that is not in 3NF, can be replaced with  R(ABCD), A → D one or more normalized relations using  Projection: normalization.  R1(AD), A → D  We can eliminate redundancies and anomalies for  R2(ABC) the example relation Emp(Employee,Salary,Project,Budget,Function) if we replace it with the three relations obtained by projections on the sets of attributes corresponding to the three functional dependencies: Employee → Salary; Project → Budget. Employee,Project → Function.

CSC343 – Introduction to Databases Normal Forms — 31 CSC343 – Introduction to Databases Normal Forms — 32

16 …Start with… Result of Normalization

Employee Salary Employee Salary Project Budget Function Brown 20 Project Budget Brown 20 Mars 2 technician Green 35 Mars 2 Green 35 Jupiter 15 designer Hoskins 55 Jupiter 15 Green 35 Venus 15 designer Moore 48 Venus 15 Hoskins 55 Venus 15 manager Kemp 48 Hoskins 55 Jupiter 15 consultant Employee Project Function Hoskins 55 Mars 2 consultant Brown Mars technician The keys of new relations Moore 48 Mars 2 manager Green Jupiter designer are lefthand sides of Moore 48 Venus 15 designer Green Venus designer Kemp 48 Venus 15 designer Hoskins Venus manager functional dependencies; Kemp 48 Jupiter 15 manager Hoskins Jupiter consultant satisfaction of 3NF is Hoskins Mars consultant Moore Mars manager therefore guaranteed for Moore Venus designer the new relations. Kemp Venus designer Kemp Jupiter manager

CSC343 – Introduction to Databases Normal Forms — 33 CSC343 – Introduction to Databases Normal Forms — 34

17 Another Example A Possible Decomposition

Employee Project Branch Brown Mars Chicago Green Jupiter Birmingham Green Venus Birmingham Project Branch Hoskins Saturn Birmingham Employee Branch Mars Chicago Hoskins Venus Birmingham Brown Chicago Jupiter Birmingham Green Birmingham Saturn Birmingham Hoskins Birmingham Venus Birmingham This relation satisfies the functional dependencies: Employee → Branch Project → Branch ...but now we don’t know each employee’s projects!

CSC343 – Introduction to Databases Normal Forms — 35 CSC343 – Introduction to Databases Normal Forms — 36

18 Lossless Decomposition The Join of the Projections

Employee Project Branch Brown Mars Chicago  The decomposition of a relation R(X) on X1 Green Jupiter Birmingham and X is lossless if for every instance r of Green Venus Birmingham 2 Hoskins Saturn Birmingham R(X) the join of the projections of R on X1 Hoskins Venus Birmingham and is equal to itself (that is, does not Green Saturn Birmingham X2 r Hoskins Jupiter Birmingham contain spurious tuples).

The result of the join is different from the  Of course, it is clearly desirable to allow original relation. only lossless decompositions during We lost some information normalization. during the decomposition!

CSC343 – Introduction to Databases Normal Forms — 37 CSC343 – Introduction to Databases Normal Forms — 38

19 A Condition for Lossless Decomposition Intuition Behind the Test for Losslessness  Let R(X) be a relation schema and let X1 and  Suppose R1 ∩ R2 → R2. Then a row of r1 X2 be two subsets of X such that X1 ∪ X2 = X. can combine with exactly one row of r in Also, let X0 = X1 ∩ X2. 2 the natural join (since in r a particular set  If R(X) satisfies the functional dependency 2 of values for the attributes in R ∩ R X X or X X , then the decomposition 1 2 0 → 1 0 → 2 defines a unique row) of on and is lossless. R(X) X1 X2 R1 ∩ R2 R1 ∩ R2  In other words, R(X) has a lossless …………. a a ………... decomposition on two relations if the set of ………… a b …………. ………… b c …………. attributes common to the relations is a ………… c superkey for at least one of the r1 r2 decomposed relations.

CSC343 – Introduction to Databases Normal Forms — 39 CSC343 – Introduction to Databases Normal Forms — 40

20 Notation A Lossless Decomposition  Instead of saying that we have relation schema R(X) with functional dependencies F,

Employee Project we will say that we have schema Employee Branch Brown Mars Brown Chicago Green Jupiter R = (R, F) Green Birmingham Green Venus Hoskins Birmingham Hoskins Saturn where R is a set of attributes and F is a Hoskins Venus set of functional dependencies.

 The 3NF normalization problem is then to

generate a set of relation schemas R1=(R1,F1), …, Rn=(Rn,Fn), such that Ri is in 3NF.

CSC343 – Introduction to Databases Normal Forms — 41 CSC343 – Introduction to Databases Normal Forms — 42

21 Another Example Another Problem...  Assume we wish to insert a new tuple that  Schema (R, F) where specifies that employee Armstrong works in R = {SI#, Name, Address, Hobby} the Birmingham branch and participates in F = {SI# → Name, Address} can be decomposed into project Mars.

R1 = {SI#, Name, Address}  In the original relation, this update would be F1 = {SI# → Name, Address} identified as illegal, because it would cause a and violation of the Project → Branch R2 = {SI#, Hobby} dependency. F = { } 2  For the decomposed relations, however, this since R R = SI#, SI# R the decomposition 1∩ 2 → 1 is not possible because the two attributes is lossless. Project and Branch have been moved to different relations.

CSC343 – Introduction to Databases Normal Forms — 43 CSC343 – Introduction to Databases Normal Forms — 44

22 Preserving Dependencies (Intuition) Example

 A decomposition preserves dependencies if  Schema (R, F) where R = {SI#,Name,Address,Hobby} each of the functional dependencies of the F = {SI# → Name,Address} original relation schema involves attributes can be decomposed into

that appear together in one of the R1 = {SI#, Name,Address} decomposed relation schemas. F1 = {SI# → Name,Address} and  It is clearly desirable that a decomposition R = {SI#, Hobby} preserves dependencies because then it is 2 F2 = { } possible to (efficiently) ensure that the  Since F = F1 ∪ F2 the decomposition is decomposed schema satisfies the same dependency preserving. constraints as the original schema.

CSC343 – Introduction to Databases Normal Forms — 45 CSC343 – Introduction to Databases Normal Forms — 46

23 Another Example Dependency Preservation

 If f is a FD in F, but f is not in F1 ∪ F2,  Schema: (ABC; F) , F = {A  B, B  C, C  B} there are two possibilities:  Decomposition: + f ∈ (F1 ∪ F2) (AC, F ), F = {A  C} 1 1 If the constraints in F and F are [Note: A  C ∉ F, but in F+] 1 2 maintained, f will be maintained (BC, F2), F2 = {B  C, C  B} automatically. + f ∉ (F1 ∪ F2)  A  B ∉ (F ∪ F ), but A  B ∈ (F ∪ F )+. 1 2 1 2 f can be checked only by first So F+ = (F ∪ F )+ and thus the 1 2 taking the join of r and r . …This decomposition is still dependency preserving 1 2 is costly…

CSC343 – Introduction to Databases Normal Forms — 47 CSC343 – Introduction to Databases Normal Forms — 48

24 Desirable Qualities for Decompositions Minimal Cover Decompositions should always satisfy the properties of  A minimal cover for a set of dependencies F is a set lossless decomposition and dependency preservation: of dependencies U such that:  Lossless decomposition ensures that the information in  U is equivalent to F (I.e., F+ = U+) the original relation can be accurately reconstructed  All FDs in U have the form X → A where A is a based on the information represented in the decomposed single attribute relations.  It is not possible to make U smaller (while  Dependency preservation ensures that the decomposed preserving equivalence) by relations have the same capacity to represent the integrity constraints as the original relations and Deleting an FD therefore to reveal illegal updates. Deleting an attribute from an FD (its LHS)  FDs and attributes that can be deleted in this way are called redundant.

CSC343 – Introduction to Databases Normal Forms — 49 CSC343 – Introduction to Databases Normal Forms — 50

25 Computing the Minimal Cover Example: F = {ABH → CK, A → D, C → E, Computing the Minimal Cover (cont’d) BGH → L, L → AD, E → L, BH → E}  Step 1: Make RHS of each FD into a single attribute: Use  Step 3: Delete redundant FDs from F: If F – {f} decomposition rule for FDs. entails f, then f is redundant; if f is X → A then +  Example: L→ AD replaced by L → A, L → D ; ABH → CK check if A ∈ X F- {f} by ABH →C, ABH →K e.g., BGH → L is entailed by E → L, BH → E, so it is  Step 2: Eliminate redundant attributes from LHS: If B is a redundant single attribute and FD XB → A ∈ F, X → A is entailed by F,  Note: The order of steps 2, 3 can't be interchanged!! then B is unnecessary. See textbook for a counterexample. e.g., Can an attribute be deleted from ABH → C ? F = {ABH→C, ABH→K, A→D, C→E, BGH→L, L→A, L→D, E→L, BH→E} + + + + 1 Compute AB F, AH F, BH F; Since C ∈ (BH) F , BH → C F2 = {BH→C, BH→K, A→D, C→E, BH→L, L→A, L→D, E→L, BH→E} is entailed by F and A is redundant in ABH → C. F3 = {BH → C, BH → K, A → D, C → E, L → A, E → L}

CSC343 – Introduction to Databases Normal Forms — 51 CSC343 – Introduction to Databases Normal Forms — 52

26 Synthesizing a 3NF Schema Synthesizing … Step 2 Starting with a schema R = (R, F):  Step 2: Partition U into sets U1, U2, … Un  Step 1: Compute minimal cover U of F. The such that the LHS of all elements of U are + + i decomposition is based on U, but since U = F the same: the same functional dependencies will hold. U = {BH C, BH K}, U = {A D}, A minimal cover for 1 → → 2 → F = {ABH →CK, A →D, C →E, BGH →L, L→AD, U3 = {C → E}, U4 = {L → A}, U5 = {E → L} E → L, BH → E} is U = {BH→C, BH→K, A→D, C→E, L→A, E→L}

CSC343 – Introduction to Databases Normal Forms — 53 CSC343 – Introduction to Databases Normal Forms — 54

27 Synthesizing … Step 3 Synthesizing … Step 4

 Step 4: If no Ri is a superkey of R, add  Step 3: For each Ui form schema Ri = (Ri, Ui), schema R0 = (R0,{}) where R0 is a key of R. where R is the set of all attributes mentioned i R0 = (BGH, {}); R0 might be needed when in Ui not all attributes are contained in R1∪R2 …∪ R ; Each FD of U will be in some Ri. Hence the n decomposition is dependency preserving: A missing attribute A must be part of all keys (since it’s not in any FD of U, deriving R = (BHCK; BH→C, BH→ K), 1 a key constraint from U involves the R2 = (AD; A→D), augmentation axiom); R = (CE; C E), 3 → R0 might be needed even if all attributes are accounted for in R ∪R …∪R R4 = (AL; L→A), 1 2 n

R5 = (EL; E → L)

CSC343 – Introduction to Databases Normal Forms — 55 CSC343 – Introduction to Databases Normal Forms — 56

28 Synthesizing … Step 4 (cont'd) Boyce–Codd Normal Form (BCNF)  A relation R(X) is in Boyce–Codd Normal  Example: (ABCD; {A  B,C  D}), with Form if for every non-trivial functional step 3 decomposition: R1 = (AB; {AB}),R2 dependency Y → Z defined on it, Y contains = (CD; {CD}). a key K of R(X). That is, Y is a superkey for Lossy! Need to add (AC; { }), for losslessness R(X).  Step 4 guarantees lossless decomposition:  Example: Person1(SI#, Name, Address) ABCD --decomp--> AB,ACD The only FD is SI# → Name, Address --decomp-->AB,AC,CD Since SI# is a key, Person1 is in BCNF  Anomalies and redundancies, as discussed earlier, do not occur in databases with relations in BCNF.

CSC343 – Introduction to Databases Normal Forms — 57 CSC343 – Introduction to Databases Normal Forms — 58

29 Non-BCNF Examples A Relation not in BCNF

Manager Project Branch  Person(SI#, Name, Address, Hobby) Brown Mars Chicago Green Jupiter Birmingham The FD SI# → Name, Address does not Green Mars Birmingham Hoskins Saturn Birmingham satisfy conditions for BCNF since the key Hoskins Venus Birmingham is (SSN, Hobby)  HasAccount(AcctNum, ClientId, OfficeId) Assume the following dependencies:  Manager → Branch — each manager works in a The FD AcctNum → OfficeId does not particular branch; satisfy BCNF conditions if we assume that  Project,Branch → Manager — each project has keys for HasAccount are (ClientId, several managers, and runs on several branches; OfficeId) and (AcctNum, ClientId); rather however, a project has a unique manager for each than AcctNum. branch.

CSC343 – Introduction to Databases Normal Forms — 59 CSC343 – Introduction to Databases Normal Forms — 60

30 A Problematic Decomposition Normalization Drawbacks  The relation is not in BCNF because the left  By limiting redundancy, normalization helps hand side of the first dependency is not a maintain consistency and saves space. superkey.  But performance of querying can suffer because  At the same time, no decomposition of this related information that was stored in a single relation will work: Project,Branch → Manager relation is now distributed among several involves all the attributes and thus no  Example: A join is required to get the names and decomposition is possible. grades of all students taking CS343 in 2006F.  Sometimes BCNF cannot be achieved for a particular relation and set of functional Student(Id,Name) dependencies without violating the principles of Transcript(StudId,CrsCode,Sem,Grade) lossless decomposition and dependency SELECT S.Name, T.Grade preservation. FROM Student S, Transcript T WHERE S.Id = T.StudId AND T.CrsCode = ‘CS343’ AND T.Semester = ‘2006F’ CSC343 – Introduction to Databases Normal Forms — 61 CSC343 – Introduction to Databases Normal Forms — 62

31 BCNF and 3NF  Tradeoff: Judiciously introduce redundancy to improve performance of certain queries  The Project-Branch-Manager schema is not in  Example: Add attribute Name to Transcript → BCNF, but it is in 3NF. Transcript'  In particular, the Project,Branch → Manager SELECT T.Name, T.Grade dependency has as its left hand side a key, FROM Transcript’ T while Manager → Branch has a unique attribute WHERE T.CrsCode = ‘CS305’ AND T.Semester = ‘S2002’  Join is avoided; for the right hand side, which is part of the {Project,Branch} key.  If queries are asked more frequently than Transcript is modified, added redundancy might  The 3NF is less restrictive than the BCNF and improve average performance; for this reason does not offer the same  But, Transcript’ is no longer in BCNF since key is guarantees of quality for a relation; it has the (StudId,CrsCode,Semester) and StudId → Name. advantage however, of always being achievable.

CSC343 – Introduction to Databases Normal Forms — 63 CSC343 – Introduction to Databases Normal Forms — 64

32 A Revised Example 3NF Tolerates Manager Project Branch Division Some Redundancies! Brown Mars Chicago 1 Green Jupiter Birmingham 1 Green Mars Birmingham 1 Manager Project Branch Hoskins Saturn Birmingham 2 Brown Mars Chicago Hoskins Venus Birmingham 2 Green Jupiter Birmingham Green Mars Birmingham Hoskins Saturn Birmingham Functional dependencies: Hoskins Venus Birmingham . Manager → Branch,Division -- each manager works !! at one branch and manages one division; . Branch,Division → Manager -- for each branch and division there is a single manager; . Project,Branch → Division,Manager -- for each branch, a project is allocated to a single division and has a sole manager responsible.

CSC343 – Introduction to Databases Normal Forms — 65 CSC343 – Introduction to Databases Normal Forms — 66

33 BCNF Normalization (Partial) BCNF Decomposition Algorithm Given: R = (R; F) where R = ABCDEGHK and F = {ABH → C, A → DE, BGH → K, K → ADH, BH → GE} Step 1: Find a FD that violates BCNF Note ABH → C, (ABH)+ includes all attributes (BH is a key) Input: R = (R; F) A → DE violates BCNF since A is not a superkey (A+ = ADE) Step 2: Split R into: Decomp := R Remove DE - A R1 = (ADE; F1 = {A → DE }) while there is S = (S; F’) ∈ Decomp and S not in BCNF do R2 = (ABCGHK; F1 = {ABH→C, BGH→K, K→AH, BH→G}) Find X → Y ∈ F’ that violates BCNF // X isn’t a superkey in S Note 1: R is in BCNF 1 Replace S in Decomp with S1 = (XY; F1), S2 = (S - (Y - X); F2) Note 2: Decomposition is lossless since A is a key of R 1. // F1 = all FDs of F’ involving only attributes of XY Note 3: FDs K → D and BH → E are not in F1 or F2. // F2 = all FDs of F’ involving only attributes of S - (Y - X) But both can be derived from F1 ∪ F2 (E.g., K→A and A→D implies K→D) end Hence, decomposition is dependency preserving. return Decomp

CSC343 – Introduction to Databases Normal Forms — 67 CSC343 – Introduction to Databases Normal Forms — 68

34 A Good Decomposition Database Design and Normalization

Project Branch Division  The theory of normalization can be used as a basis for Manager Branch Division Mars Chicago 1 quality control operations on schemas, during both Brown Chicago 1 Jupiter Birmingham 1 Green Birmingham 1 Mars Birmingham 1 conceptual and logical design. Hoskins Birmingham 2 Saturn Birmingham 2 Venus Birmingham 2  Analysis of the relations obtained during the logical design phase can identify places where the conceptual design . Note: The first relation has a second key was inaccurate: such a validation of the design is usually {Branch,Division}. relatively easy. . The decomposition is in 3NF but not in BCNF;  Normalization can also be used during conceptual design moreover, it is lossless and dependencies are for quality control of each element of a conceptual schema preserved. (entity or relationship). . This example demonstrates that BCNF may be too strong a condition to impose on a relational schema.

CSC343 – Introduction to Databases Normal Forms — 69 CSC343 – Introduction to Databases Normal Forms — 70

35 Decomposing Product Analysis of an Entity  Supplier is (or should be) an independent entity, with its own attributes (code, surname and address)  If Product and Supplier are distinct entities, they  The functional dependency should be linked through a relationship. SupplierCode → Supplier,Address  Since there is a functional dependency from Code holds here: all properties of a supplier are identified by to SupplierCode, we are sure that each product its SupplierCode. has at most one supplier (maximum cardinality 1).  The entity violates 3NF since this dependency has a  Since there is no dependency from SupplierCode left-hand-side that does not contain the identifier and to , we have an unrestricted maximum a right-hand-side made up of attributes that are not Code part of the key. cardinality (N) for Supplier in the relationship.

CSC343 – Introduction to Databases Normal Forms — 71 CSC343 – Introduction to Databases Normal Forms — 72

36 Decomposing Product Analysis of a Relationship  Now we show how to analyze n-ary relationships for n≥3, in order to determine whether they should be decomposed.  This decomposition satisfies fundamental properties:  Consider  It is a lossless decomposition, because of one-to-many relationship that allows us to recostruct the values of the attributes of the original entity;  Moreover, it preserves dependencies because each dependency is embedded in one of the entities or can be reconstructed from them.

CSC343 – Introduction to Databases Normal Forms — 73 CSC343 – Introduction to Databases Normal Forms — 74

37 Some Functional Dependencies  Student → DegreeProgramme (each student is enrolled Decomposing Thesis in one degree programme)  The following is a decomposition of Thesis  Student → Professor (each student writes a thesis where the two decomposed relationships under the supervision of a single professor) are both in 3NF(also in BCNF)  Professor → Department (each professor is associated with a single department and the students under her supervision are students in that department)  The (unique) key of the relationship is Student (given a student, the degree programme, the professor and the department are identified uniquely)  The third FD causes a violation of 3NF.

CSC343 – Introduction to Databases Normal Forms — 75 CSC343 – Introduction to Databases Normal Forms — 76

38 More Observations... Another Decomposition  The relationship Thesis is in 3NF, because its key is made up of the Student entity, and its dependencies all have this entity on the left hand side.  However, not all students write theses, therefore not all students have supervisors.  From a normal form point of , this is not a problem.  However, our conceptual schema should reflect the fact that being in a degree programme and having a supervisor are independent facts.

CSC343 – Introduction to Databases Normal Forms — 77 CSC343 – Introduction to Databases Normal Forms — 78

39