Normalisation
Database Systems, Lectures 11-12
Natasha Alechina

In This Lecture
• Idea of normalisation
• Functional dependencies
• Normal forms
• Decompositions
• 2NF, 3NF, BCNF

Functional Dependencies
• Redundancy is often caused by a functional dependency
• A functional dependency (FD) is a link between two sets of attributes in a relation
• We can normalise a relation by removing undesirable FDs
• A set of attributes, A, functionally determines another set, B (equivalently, there exists a functional dependency between A and B, written A → B), if whenever two rows of the relation have the same values for all the attributes in A, they also have the same values for all the attributes in B

Example
• {ID, modCode} → {First, Last, modName}
• {modCode} → {modName}
• {ID} → {First, Last}

ID   First  Last    modCode  modName
111  Joe    Bloggs  G51PRG   Programming
222  Anne   Smith   G51DBS   Databases

FDs and Normalisation
• We define a set of 'normal forms'
  • Each normal form has fewer FDs than the last
  • Since FDs represent redundancy, each normal form has less redundancy than the last
• Not all FDs cause a problem
  • We identify various sorts of FD that do
  • Each normal form removes a type of FD that is a problem
  • We will also need a way to remove FDs

Key attributes and superkeys
• We call an attribute a key attribute if this attribute is part of some candidate key. Alternative terminology is 'prime' attribute.
• We call a set of attributes a superkey if it includes a candidate key (or is a candidate key).
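The definition of a functional dependency can be tested against a concrete table. The following is a minimal sketch in Python, not part of the lecture: the helper name `fd_holds` and the third row of data are assumptions added for illustration (the two rows on the slide have no values in common, so on their own they cannot violate any FD). Note that a sample of rows can only refute an FD; it cannot prove that the FD holds for every legal state of the relation.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but disagree on rhs."""
    lhs, rhs = sorted(lhs), sorted(rhs)
    seen = {}  # lhs values -> rhs values first seen with them
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val the first time key is seen, then returns
        # the stored value, so a mismatch means the FD is violated.
        if seen.setdefault(key, val) != val:
            return False
    return True


COLUMNS = ("ID", "First", "Last", "modCode", "modName")
rows = [dict(zip(COLUMNS, r)) for r in [
    (111, "Joe", "Bloggs", "G51PRG", "Programming"),
    (222, "Anne", "Smith", "G51DBS", "Databases"),
    (111, "Joe", "Bloggs", "G51DBS", "Databases"),  # hypothetical extra row
]]

print(fd_holds(rows, {"ID"}, {"First", "Last"}))                         # True
print(fd_holds(rows, {"modCode"}, {"modName"}))                          # True
print(fd_holds(rows, {"ID", "modCode"}, {"First", "Last", "modName"}))   # True
print(fd_holds(rows, {"ID"}, {"modName"}))                               # False
```

The last call fails because student 111 appears with two different module names, so ID alone does not determine modName.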
Partial FDs and 2NF
• Partial FDs:
  • A FD, A → B, is a partial FD if some attribute of A can be removed and the FD still holds
  • Formally, there is some proper subset of A, C ⊂ A, such that C → B
• Let us call attributes which are part of some candidate key key attributes, and the rest non-key attributes
• Second normal form:
  • A relation is in second normal form (2NF) if it is in 1NF and no non-key attribute is partially dependent on a candidate key
  • In other words, there is no C → B where C is a strict subset of a candidate key and B is a non-key attribute

Second Normal Form

1NF
Module  Dept  Lecturer  Text
M1      D1    L1        T1
M1      D1    L1        T2
M2      D1    L1        T1
M2      D1    L1        T3
M3      D1    L2        T4
M4      D2    L3        T1
M4      D2    L3        T5
M5      D2    L4        T6

• The relation 1NF is not in 2NF
• We have the FD {Module, Text} → {Lecturer, Dept}
• But also {Module} → {Lecturer, Dept}
• And so Lecturer and Dept are partially dependent on the primary key

Removing FDs
• Suppose we have a relation R with scheme S and the FD A → B where A ∩ B = { }
• Let C = S - (A ∪ B)
• In other words:
  • A: attributes on the left hand side of the FD
  • B: attributes on the right hand side of the FD
  • C: all other attributes
• It turns out that we can split R into two parts:
  • R1, with scheme C ∪ A
  • R2, with scheme A ∪ B
• The original relation can be recovered as the natural join of R1 and R2:
  • R = R1 NATURAL JOIN R2

1NF to 2NF: Example
The relation 1NF above is split into 2NFa and 2NFb. Here A = {Module}, B = {Dept, Lecturer} and C = {Text}. 2NFa has scheme A ∪ B, where A → B is the 'bad' dependency violating 2NF; 2NFb has scheme A ∪ C.

2NFa
Module  Dept  Lecturer
M1      D1    L1
M2      D1    L1
M3      D1    L2
M4      D2    L3
M5      D2    L4

2NFb
Module  Text
M1      T1
M1      T2
M2      T1
M2      T3
M3      T4
M4      T1
M4      T5
M5      T6

Problems Resolved in 2NF
• Problems in 1NF
  • INSERT: Can't add a module with no texts
  • UPDATE: To change the lecturer for M1, we have to change two rows
  • DELETE: If we remove M3, we remove L2 as well
• In 2NF the first two are resolved, but not the third one

Problems Remaining in 2NF (relation 2NFa)
• INSERT anomalies
  • Can't add lecturers who teach no modules
• UPDATE anomalies
  • To change the department for L1 we must alter two rows
• DELETE anomalies
  • If we delete M3 we delete L2 as well
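The 'Removing FDs' recipe (split R into R1 over C ∪ A and R2 over A ∪ B, then recover R with a natural join) can be tried on the 1NF table. This is a rough sketch under my own assumptions: relations are modelled as lists of Python dicts, and `project`, `natural_join` and `as_set` are illustrative helpers, not anything defined in the lecture.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; each row is a dict."""
    attrs = sorted(attrs)
    distinct = sorted({tuple(r[a] for a in attrs) for r in rows})
    return [dict(zip(attrs, t)) for t in distinct]


def natural_join(r1, r2):
    """Combine every pair of rows that agree on all shared attributes."""
    return [{**a, **b} for a in r1 for b in r2
            if all(a[k] == b[k] for k in a.keys() & b.keys())]


def as_set(rows):
    """Order-independent view of a relation, for comparing two relations."""
    return {tuple(sorted(r.items())) for r in rows}


S = ("Module", "Dept", "Lecturer", "Text")
R = [dict(zip(S, t.split())) for t in [
    "M1 D1 L1 T1", "M1 D1 L1 T2", "M2 D1 L1 T1", "M2 D1 L1 T3",
    "M3 D1 L2 T4", "M4 D2 L3 T1", "M4 D2 L3 T5", "M5 D2 L4 T6"]]

A = {"Module"}            # left hand side of the FD
B = {"Dept", "Lecturer"}  # right hand side of the FD
C = set(S) - (A | B)      # all other attributes, here {"Text"}

R1 = project(R, C | A)    # corresponds to 2NFb
R2 = project(R, A | B)    # corresponds to 2NFa
rejoined = natural_join(R1, R2)

print(len(R1), len(R2), as_set(rejoined) == as_set(R))  # 8 5 True
```

R2 shrinks from eight rows to five because the department and lecturer of each module are now stored once, which is exactly the redundancy the partial FD was causing.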
Transitive FDs and 3NF
• Transitive FDs:
  • A FD, A → C, is a transitive FD if there is some set B such that A → B and B → C are non-trivial FDs
  • A → B being non-trivial means: B is not a subset of A
  • We have A → B → C
• Third normal form
  • A relation is in third normal form (3NF) if it is in 2NF and no non-key attribute is transitively dependent on a candidate key
  • Alternative (simpler) definition: a relation is in 3NF if in every non-trivial FD A → B either B is a key attribute or A is a superkey

Third Normal Form

2NFa
Module  Dept  Lecturer
M1      D1    L1
M2      D1    L1
M3      D1    L2
M4      D2    L3
M5      D2    L4

• 2NFa is not in 3NF
• We have the FDs
  • {Module} → {Lecturer}
  • {Lecturer} → {Dept}
• So there is a transitive FD from the primary key {Module} to {Dept}

2NF to 3NF: Example
2NFa is split into 3NFa and 3NFb.

3NFa
Lecturer  Dept
L1        D1
L2        D1
L3        D2
L4        D2

3NFb
Module  Lecturer
M1      L1
M2      L1
M3      L2
M4      L3
M5      L4

Problems Resolved in 3NF
• Problems in 2NF
  • INSERT: Can't add lecturers who teach no modules
  • UPDATE: To change the department for L1 we must alter two rows
  • DELETE: If we delete M3 we delete L2 as well
• In 3NF all of these are resolved (for this relation; a relation in 3NF can still have anomalies!)
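The alternative definition of 3NF (in every non-trivial FD A → B, either B is a key attribute or A is a superkey) lends itself to a mechanical check. The sketch below is one possible way to do it, assuming FDs are given as pairs of Python sets; `closure`, `candidate_keys` and `is_3nf` are illustrative names, the candidate keys are found by brute force over attribute subsets (fine only for small schemas), and only the FDs listed for each relation are examined.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            # Smaller keys are found first, so any superset is skipped.
            if closure(s, fds) >= schema and not any(k <= s for k in keys):
                keys.append(s)
    return keys


def is_3nf(schema, fds):
    """For every listed FD: the left side is a superkey, or each new
    right-side attribute is a key attribute."""
    key_attrs = set().union(*candidate_keys(schema, fds))
    for lhs, rhs in fds:
        lhs_is_superkey = closure(lhs, fds) >= schema
        for b in rhs - lhs:  # attributes already in lhs are trivial
            if not lhs_is_superkey and b not in key_attrs:
                return False
    return True


two_nf_a = {"Module", "Dept", "Lecturer"}
fds_2nfa = [({"Module"}, {"Lecturer"}), ({"Lecturer"}, {"Dept"})]

print(candidate_keys(two_nf_a, fds_2nfa))                          # [{'Module'}]
print(is_3nf(two_nf_a, fds_2nfa))                                  # False
print(is_3nf({"Lecturer", "Dept"}, [({"Lecturer"}, {"Dept"})]))    # True  (3NFa)
print(is_3nf({"Module", "Lecturer"}, [({"Module"}, {"Lecturer"})]))  # True  (3NFb)
```

2NFa fails because of {Lecturer} → {Dept}: Lecturer is not a superkey and Dept is not part of any candidate key.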
Normalisation so Far
• First normal form
  • All data values are atomic
• Second normal form
  • In 1NF plus no non-key attribute is partially dependent on a candidate key
• Third normal form
  • In 2NF plus no non-key attribute depends transitively on a candidate key (or, no dependencies of non-key on non-superkey)

The Stream Relation
• Consider a relation, Stream, which stores information about times for various streams of courses
• For example: labs for first years
• Each course has several streams
• Only one stream (of any course at all) takes place at any given time
• Each student taking a course is assigned to a single stream for it

The Stream Relation

Student  Course       Time
John     Databases    12:00
Mary     Databases    12:00
Richard  Databases    15:00
Richard  Programming  10:00
Mary     Programming  10:00
Rebecca  Programming  13:00

Candidate keys: {Student, Course} and {Student, Time}

FDs in the Stream Relation
• Stream has the following non-trivial FDs
  • {Student, Course} → {Time}
  • {Time} → {Course}
• Since all attributes are key attributes, Stream is in 3NF

Anomalies in Stream
• INSERT anomalies
  • You can't add an empty stream
• UPDATE anomalies
  • Moving the 12:00 class to 9:00 means changing two rows
• DELETE anomalies
  • Deleting Rebecca removes a stream

Boyce-Codd Normal Form
• A relation is in Boyce-Codd normal form (BCNF) if for every FD A → B either
  • B is contained in A (the FD is trivial), or
  • A contains a candidate key of the relation
• In other words: every determinant in a non-trivial dependency is a (super) key
• The same as 3NF except in 3NF we only worry about non-key Bs
• If there is only one candidate key then 3NF and BCNF are the same
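The BCNF condition can be checked with attribute closures alone: an FD A → B is acceptable if it is trivial or if the closure of A is the whole scheme. The following is a small sketch under that reading, applied to the Stream relation; `closure` and `bcnf_violations` are names I have made up, FDs are assumed to be pairs of Python sets, and only the listed FDs are examined.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Listed FDs that are non-trivial and whose left side is not a superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= schema]


STREAM = {"Student", "Course", "Time"}
FDS = [({"Student", "Course"}, {"Time"}), ({"Time"}, {"Course"})]

# Both candidate keys from the slide determine every attribute.
print(closure({"Student", "Course"}, FDS) == STREAM)  # True
print(closure({"Student", "Time"}, FDS) == STREAM)    # True

# {Time} on its own only determines Course, so it is not a superkey.
print(bcnf_violations(STREAM, FDS))  # [({'Time'}, {'Course'})]
```

The single violation reported, {Time} → {Course}, is allowed by 3NF only because Course is a key attribute.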
Stream and BCNF
• Stream is not in BCNF as the FD {Time} → {Course} is non-trivial and {Time} does not contain a candidate key

Student  Course       Time
John     Databases    12:00
Mary     Databases    12:00
Richard  Databases    15:00
Richard  Programming  10:00
Mary     Programming  10:00
Rebecca  Programming  13:00

Conversion to BCNF
• Stream (Student, Course, Time) is split into two relations, one with scheme (Student, Time) and one with scheme (Course, Time)
• Stream has been put into BCNF but we have lost the FD {Student, Course} → {Time}

Decomposition Properties
• Lossless: Data should not be lost or created when splitting relations up
• Dependency preservation: It is desirable that FDs are preserved when splitting relations up
• Normalisation to 3NF is always lossless and dependency preserving
• Normalisation to BCNF is lossless, but may not preserve all dependencies

Higher Normal Forms
• BCNF is as far as we can go with FDs
• Higher normal forms are based on other sorts of dependency
• Fourth normal form removes multi-valued dependencies
• Fifth normal form removes join dependencies
(Figure: nested sets, from outermost to innermost: 1NF Relations, 2NF Relations, 3NF Relations, BCNF Relations, 4NF Relations, 5NF Relations)

Denormalisation
• Normalisation
  • Removes data redundancy
  • Solves INSERT, UPDATE, and DELETE anomalies
  • This makes it easier to maintain the information in the database in a consistent state
• However
  • It leads to more tables in the database
  • Often these need to be joined back together, which is expensive to do
  • So sometimes (not often) it is worth 'denormalising'

Denormalisation
• You might want to denormalise if
  • Database speeds are unacceptable (not just a bit slow)
  • There are going to be very few INSERTs, UPDATEs, or DELETEs
  • There are going to be lots of SELECTs that involve the joining of tables
• Address (Number, Street, City, Postcode) is not normalised since {Postcode} → {City}
• Normalised version:
  • Address1 (Number, Street, Postcode)
  • Address2 (Postcode, City)
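To make the Address trade-off concrete, here is a small sketch using Python's built-in `sqlite3` module with an in-memory database. The table and column names follow the slide; the rows, the postcodes `PC1` and `PC2`, the city names, and the choice of Postcode as primary key of Address2 are all invented for illustration. It shows both sides of the argument: rebuilding the full address needs a join, but changing the city of a postcode touches a single row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Address1 (Number INTEGER, Street TEXT, Postcode TEXT);
    CREATE TABLE Address2 (Postcode TEXT PRIMARY KEY, City TEXT);
""")
# Hypothetical data: two houses share postcode PC1.
conn.executemany("INSERT INTO Address1 VALUES (?, ?, ?)", [
    (1, "High St", "PC1"), (2, "High St", "PC1"), (7, "Mill Ln", "PC2")])
conn.executemany("INSERT INTO Address2 VALUES (?, ?)", [
    ("PC1", "Townsville"), ("PC2", "Villageton")])

# The cost of normalisation: every full address needs this join.
REBUILD_ADDRESS = """
    SELECT a1.Number, a1.Street, a2.City, a1.Postcode
    FROM Address1 AS a1 JOIN Address2 AS a2 ON a1.Postcode = a2.Postcode
    ORDER BY a1.Number
"""
rows_before = conn.execute(REBUILD_ADDRESS).fetchall()

# The benefit: the city of a postcode is stored once, so one row changes.
cur = conn.execute("UPDATE Address2 SET City = ? WHERE Postcode = ?",
                   ("Newtown", "PC1"))
changed = cur.rowcount
rows_after = conn.execute(REBUILD_ADDRESS).fetchall()

print(changed)  # 1
for row in rows_after:
    print(row)
```

In the single denormalised Address table the same change would have to update both High St rows, which is the UPDATE anomaly that the split removes.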
Lossless decomposition
• To normalise a relation, we used projections
• If R(A, B, C) satisfies A → B then we can project it on A, B and A, C without losing information
• Lossless decomposition: R = π_AB(R) ⋈ π_AC(R), where π_AB(R) is the projection of R on AB and ⋈ is natural join
• Reminder of projection: π_AB(R) keeps only the columns A and B of R(A, B, C)

Relational algebra reminder: selection
σ_C=D(R) keeps the rows of R in which C = D.

R
A  B  C  D
1  x  c  c
2  y  d  e
3  z  a  a
4  u  b  c

σ_C=D(R)
A  B  C  D
1  x  c  c
3  z  a  a
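The condition A → B matters for the lossless decomposition: without it, joining the two projections can create rows that were never in R. The sketch below illustrates this with relations modelled as Python sets of (A, B, C) tuples. The function `rejoin` and the two small relations are made up for illustration; the four-row table used for the selection is the one from the reminder, as I read it from the slide.

```python
def rejoin(r):
    """Compute the natural join of the projections of r on AB and on AC."""
    ab = {(a, b) for a, b, c in r}
    ac = {(a, c) for a, b, c in r}
    return {(a, b, c) for a, b in ab for a2, c in ac if a == a2}


# Hypothetical relation in which A -> B holds: the decomposition is lossless.
fd_ok = {(1, "x", "p"), (1, "x", "q"), (2, "y", "p")}
print(rejoin(fd_ok) == fd_ok)  # True

# Hypothetical relation in which A -> B fails: the join adds spurious rows.
fd_broken = {(1, "x", "p"), (1, "y", "q")}
print(sorted(rejoin(fd_broken) - fd_broken))  # [(1, 'x', 'q'), (1, 'y', 'p')]

# Selection: keep the rows of R(A, B, C, D) in which C = D.
R = [(1, "x", "c", "c"), (2, "y", "d", "e"), (3, "z", "a", "a"), (4, "u", "b", "c")]
selected = [row for row in R if row[2] == row[3]]
print(selected)  # [(1, 'x', 'c', 'c'), (3, 'z', 'a', 'a')]
```

The join of the projections always contains the original relation; the FD is what guarantees that it contains nothing more.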