Introduction to Database Management Systems
Relation Normalization
Why Normalization? Functional Dependencies. First, Second, and Third Normal Forms. Boyce/Codd Normal Form. Fourth and Fifth Normal Form. No loss Decomposition. Summary
CIS Relation Normalization 1
Why Normalization?
An ill-structured relation contains redundant data. Data redundancy causes modification anomalies:
Insertion anomalies -- Suppose we want to enter SCUBA as an activity that costs $100; we can't until a student signs up for it.
Update anomalies -- If we change the price of swimming for student 150, there is no guarantee that student 200 will pay the new price.
Deletion anomalies -- If we delete student 100, we lose not only the fact that he/she is a skier, but also the fact that skiing costs $200.
Normalization is the process used to remove modification anomalies.

ACTIVITY
SID   Activity   Fee
100   Skiing     200
150   Swimming   50
175   Squash     50
200   Swimming   50

How can this table be changed to fix these problems?
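One possible answer, sketched in Python rather than SQL: split ACTIVITY so that each fee is stored once. The names student_activity, activity_fee, and fee_for are hypothetical, and the new swimming price of 60 is made up for illustration.

```python
# ACTIVITY rows copied from the slide: (SID, Activity, Fee).
activity = [
    (100, "Skiing", 200),
    (150, "Swimming", 50),
    (175, "Squash", 50),
    (200, "Swimming", 50),
]

# Hypothetical decomposition: who does what, and what each activity costs.
student_activity = {(sid, act) for sid, act, _ in activity}
activity_fee = {act: fee for _, act, fee in activity}

# Insertion: SCUBA can be priced before any student signs up.
activity_fee["SCUBA"] = 100

# Update: one change to the swimming fee reaches every swimmer (60 is illustrative).
activity_fee["Swimming"] = 60

# Deletion: removing student 100 keeps the fact that skiing costs $200.
student_activity = {(sid, act) for sid, act in student_activity if sid != 100}


def fee_for(sid):
    """Return the fees a student owes, looked up through the fee table."""
    return sorted(activity_fee[act] for s, act in student_activity if s == sid)
```

After these three changes, students 150 and 200 both see the same swimming fee, and the skiing fee survives even though no skier remains.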
Dave McDonald, CIS, GSU 10-1 Introduction to Database Management Systems
Why Normalization...
Course
SID   Name     Grade   Course#   Text   Major   Dept
s1    Joseph   A       CIS8110   b1     CIS     CIS
s1    Joseph   B       CIS8120   b2     CIS     CIS
s1    Joseph   A       CIS8140   b5     CIS     CIS
s2    Alice    A       CIS8110   b1     CS      MCS
s2    Alice    A       CIS8140   b5     CS      MCS
s3    Tom      B       CIS8110   b1     Acct    Acct
s3    Tom      B       CIS8140   b5     Acct    Acct
s3    Tom      A       CIS8680   b1     Acct    Acct
Is there any redundant data? Insertion anomalies? Update anomalies? Deletion anomalies?
Functional Dependencies
Given two attributes, X and Y, of a relation R, Y is functionally dependent on X iff each X value always occurs with the same Y value in R.
R.X --> R.Y or X --> Y
List all FDs in the Course relation:
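The definition can be tested against sample data. The sketch below uses the Course rows from the earlier slide; the helper name holds is hypothetical. A sample can only refute an FD, never prove it, because an FD is a statement about every legal value of R.

```python
COLUMNS = ("SID", "Name", "Grade", "Course#", "Text", "Major", "Dept")
ROWS = [
    ("s1", "Joseph", "A", "CIS8110", "b1", "CIS", "CIS"),
    ("s1", "Joseph", "B", "CIS8120", "b2", "CIS", "CIS"),
    ("s1", "Joseph", "A", "CIS8140", "b5", "CIS", "CIS"),
    ("s2", "Alice", "A", "CIS8110", "b1", "CS", "MCS"),
    ("s2", "Alice", "A", "CIS8140", "b5", "CS", "MCS"),
    ("s3", "Tom", "B", "CIS8110", "b1", "Acct", "Acct"),
    ("s3", "Tom", "B", "CIS8140", "b5", "Acct", "Acct"),
    ("s3", "Tom", "A", "CIS8680", "b1", "Acct", "Acct"),
]


def holds(lhs, rhs, rows=ROWS):
    """True if every lhs value occurs with a single rhs value in rows."""
    seen = {}
    for row in rows:
        rec = dict(zip(COLUMNS, row))
        x = tuple(rec[a] for a in lhs)
        y = tuple(rec[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value.
        if seen.setdefault(x, y) != y:
            return False
    return True


print(holds(("SID",), ("Name",)))
print(holds(("Text",), ("Course#",)))
```

Both X and Y are passed as tuples of column names, so composite determinants such as ("SID", "Course#") work the same way.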
Functional Dependencies...
X is called the determinant of Y.
X and Y may be composite.
Dependency relationships change with attribute semantics.
X and Y could be mutually dependent on each other.
    Husband --> Wife, Wife --> Husband, Husband <--> Wife
X may or may not be the key attribute of R.
A Y value can occur in more than one tuple in R.
    Course# --> Text
Fully Functional Dependencies
A fully functional dependency (FFD) exists between attributes X and Y if X --> Y and Y is not functionally dependent on any proper subset of X.
( SID, Course# ) --> Name?
( SID, Course# ) --> Grade?
( SID, Name ) --> Major?
( SID, Name ) --> SID?
Note that if X is not composite, then X --> Y is always an FFD.
By default, the term FD refers to FFD.
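A minimal sketch of the FFD test, run against the Course sample rows. The helper names holds and is_full are hypothetical, and the result only reflects this sample, not the intended semantics of the attributes.

```python
from itertools import combinations

COLUMNS = ("SID", "Name", "Grade", "Course#", "Text", "Major", "Dept")
ROWS = [
    ("s1", "Joseph", "A", "CIS8110", "b1", "CIS", "CIS"),
    ("s1", "Joseph", "B", "CIS8120", "b2", "CIS", "CIS"),
    ("s1", "Joseph", "A", "CIS8140", "b5", "CIS", "CIS"),
    ("s2", "Alice", "A", "CIS8110", "b1", "CS", "MCS"),
    ("s2", "Alice", "A", "CIS8140", "b5", "CS", "MCS"),
    ("s3", "Tom", "B", "CIS8110", "b1", "Acct", "Acct"),
    ("s3", "Tom", "B", "CIS8140", "b5", "Acct", "Acct"),
    ("s3", "Tom", "A", "CIS8680", "b1", "Acct", "Acct"),
]


def holds(lhs, rhs):
    """True if each lhs value occurs with a single rhs value in ROWS."""
    seen = {}
    for row in ROWS:
        rec = dict(zip(COLUMNS, row))
        x = tuple(rec[a] for a in lhs)
        y = tuple(rec[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def is_full(lhs, rhs):
    """True if lhs --> rhs holds and no proper, non-empty subset of lhs determines rhs."""
    if not holds(lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if holds(subset, rhs):
                return False  # a smaller determinant already works: partial dependency
    return True


for lhs, rhs in [
    (("SID", "Course#"), ("Name",)),
    (("SID", "Course#"), ("Grade",)),
    (("SID", "Name"), ("Major",)),
    (("SID", "Name"), ("SID",)),
]:
    print(lhs, "-->", rhs, "full:", is_full(lhs, rhs))
```

When lhs has a single attribute the subset loop is empty, which matches the note that a non-composite X always gives an FFD.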
Transitively Functional Dependencies
Given attributes X, Y, and Z of a relation R, Z is transitively dependent on X iff X --> Y and Y --> Z.
Given SID --> Dept and Dept --> College, SID --> ?
Given SID --> Major and Major --> Dept, SID --> ?
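Transitive dependencies can be found mechanically by computing an attribute closure: start from X and keep applying FDs until nothing new is added. This is a standard technique, sketched here with a hypothetical helper named closure and the FDs from the second question.

```python
def closure(attrs, fds):
    """Return every attribute reachable from attrs by repeatedly applying fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs --> rhs whenever all of lhs is already in the result.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Each FD is a (determinant, dependent) pair of attribute tuples.
fds = [(("SID",), ("Major",)), (("Major",), ("Dept",))]
print(sorted(closure({"SID"}, fds)))
```

Any attribute in the closure that is not named directly on the right-hand side of an FD from X was reached transitively.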
Graphical Representation
Course (SID, Name, Grade, Course#, Text, Major, Dept)
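[Figure: dependency diagram for the Course relation, marking the primary key and showing the attributes SID, Course#, Name, Major, Dept, Grade, and Text]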
First Normal Form (1NF)
A relation R is in 1NF iff all attribute domains contain atomic values only.
A relation in 1NF can still have modification anomalies.
[Figure: dependency diagram over Part#, WHouse#, WAddress, and QTY]
INVENTORY (Part#, WHouse#, WAddress, QTY)
Second Normal Form (2NF)
A relation R is in 2NF iff R is in 1NF and every non-key attribute is fully dependent on the primary key (i.e., R has no partial functional dependencies).
The term "non-key attribute" refers to any attribute that does not belong to any candidate key.
[Figure: dependency diagram over Part#, WHouse#, WAddress, and QTY]
INVENTORY (Part#, WHouse#, WAddress, QTY)
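A rough sketch of a 2NF check for INVENTORY. The FDs listed in the code are assumptions (the slide's diagram did not survive extraction), the key (Part#, WHouse#) is assumed to be the only candidate key, and the helper names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute reachable from attrs by repeatedly applying fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, attributes, fds):
    """List (subset, attr) pairs where a proper subset of key determines a non-key attr."""
    non_key = set(attributes) - set(key)
    found = []
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            for attr in sorted(closure(subset, fds) & non_key):
                found.append((subset, attr))
    return found


ATTRIBUTES = ("Part#", "WHouse#", "WAddress", "QTY")
KEY = ("Part#", "WHouse#")
# Assumed FDs: a warehouse has one address; a part in a warehouse has one quantity.
FDS = [
    (("WHouse#",), ("WAddress",)),
    (("Part#", "WHouse#"), ("QTY",)),
]

print(partial_dependencies(KEY, ATTRIBUTES, FDS))
```

Under these assumptions, any pair that is reported names a non-key attribute to move into its own relation together with the key subset that determines it.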
Modification Anomalies in 2NF
2NF relations have modification anomalies:
Redundant information?
Update anomalies?
Insertion anomalies?
Deletion anomalies?
Which FD causes the redundant data?

INVENTORY
Part#   WHouse#   WAddress     QTY
123     4         Atlanta      10
456     5         Birmingham   6
456     2         Columbus     10
123     7         Oakland      8
235     1         Denver       2
Third Normal Form (3NF)
A relation R is in 3NF iff R is in 2NF and no non-key attribute is transitively dependent on the primary key.
Student (SID, Name, Major, Dept)
Discussion: If a relation does not have any non-key attribute, would it automatically be in 3NF?
Modification Anomalies in 3NF
LOCATION (Employee, Department, Location)
Redundant information? Update anomalies? Insertion anomalies? Deletion anomalies? All determinants?
[Figure: dependency diagram over Employee, Department, and Location]
Boyce/Codd Normal Form (BCNF)
A relation R is in BCNF iff every determinant is a candidate key.
BCNF is applied to a relation R that has multiple candidate keys if
1. those candidate keys are composite, and
2. the candidate keys overlap.
ADVISE (Student, Major, Advisor)
[Figure: dependency diagram over STUDENT, ADVISOR, and MAJOR]
BCNF Example
Student    Course              Instructor
Narayan    Database            Mark
Smith      Database            Jeffries
Smith      Operating Systems   Ammar
Smith      Theory              Schulman
Wallace    Database            Mark
Wallace    Operating Systems   Ahamad
Wong       Database            Omiecinski
Zelaya     Database            Jeffries
Narayan    Operating Systems   Ammar

Teach(Student, Course, Instructor)
BCNF Example (cont’d)
[Figure: dependency diagram over Student, Instructor, and Course]
There are three possible decompositions:
(Student, Instructor) and (Student, Course)
(Course, Instructor) and (Course, Student)
(Instructor, Course) and (Instructor, Student)
Which of the three will not generate spurious tuples after a join?
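The question can be explored by projecting the sample Teach rows, rejoining them, and counting the tuples that were not in the original. This is only a sketch on the sample data; the helper names project, natural_join, and spurious are hypothetical.

```python
ATTRS = ("Student", "Course", "Instructor")
# Each row is stored as a frozenset of (attribute, value) pairs.
TEACH = {
    frozenset(zip(ATTRS, row))
    for row in [
        ("Narayan", "Database", "Mark"),
        ("Smith", "Database", "Jeffries"),
        ("Smith", "Operating Systems", "Ammar"),
        ("Smith", "Theory", "Schulman"),
        ("Wallace", "Database", "Mark"),
        ("Wallace", "Operating Systems", "Ahamad"),
        ("Wong", "Database", "Omiecinski"),
        ("Zelaya", "Database", "Jeffries"),
        ("Narayan", "Operating Systems", "Ammar"),
    ]
}


def project(rows, keep):
    """Projection: keep only the named attributes and drop duplicate rows."""
    return {frozenset((a, v) for a, v in row if a in keep) for row in rows}


def natural_join(r1, r2):
    """Natural join: merge every pair of rows that agree on shared attributes."""
    out = set()
    for t1 in r1:
        for t2 in r2:
            merged = dict(t1)
            if all(merged.get(a, v) == v for a, v in t2):
                merged.update(t2)
                out.add(frozenset(merged.items()))
    return out


def spurious(keep1, keep2):
    """Rows produced by rejoining the two projections that are not in TEACH."""
    return natural_join(project(TEACH, keep1), project(TEACH, keep2)) - TEACH


for keep1, keep2 in [
    ({"Student", "Instructor"}, {"Student", "Course"}),
    ({"Course", "Instructor"}, {"Course", "Student"}),
    ({"Instructor", "Course"}, {"Instructor", "Student"}),
]:
    print(sorted(keep1), sorted(keep2), "spurious:", len(spurious(keep1, keep2)))
```

A count of zero on one sample does not prove a decomposition is lossless in general; that requires reasoning from the FDs.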
Boyce–Codd Normal Form (BCNF)
• The difference between 3NF and BCNF is that for a functional dependency A → B, 3NF allows this dependency in a relation if B is a primary-key attribute and A is not a candidate key.
• BCNF, by contrast, insists that for this dependency to remain in a relation, A must be a candidate key.
• Every relation in BCNF is also in 3NF. However, a relation in 3NF may not be in BCNF.
• Violation of BCNF is quite rare.
• The potential to violate BCNF may occur in a relation that:
  – contains two (or more) composite candidate keys;
  – has candidate keys that overlap (i.e., have at least one attribute in common).
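A small sketch of a BCNF check. The FDs for Teach(Student, Course, Instructor) are assumptions chosen to be consistent with the sample rows, not taken from the slides, and the code uses the usual superkey test in place of enumerating candidate keys. The helper names are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute reachable from attrs by repeatedly applying fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the nontrivial FDs whose determinant does not determine all attributes."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)                   # skip trivial FDs
        and not set(attributes) <= closure(lhs, fds)  # determinant is not a superkey
    ]


TEACH = ("Student", "Course", "Instructor")
# Assumed FDs: a student has one instructor per course; an instructor teaches one course.
FDS = [
    (("Student", "Course"), ("Instructor",)),
    (("Instructor",), ("Course",)),
]

print(bcnf_violations(TEACH, FDS))
```

Under these assumed FDs the candidate keys would be (Student, Course) and (Student, Instructor): composite and overlapping, which is the situation the last bullet describes.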
Fourth Normal Form (4NF)
• Although BCNF removes anomalies due to functional dependencies, another type of dependency called a multi-valued dependency (MVD) can also cause data redundancy.
• The possible existence of MVDs in a relation is due to 1NF and can result in data redundancy.
• An MVD is a dependency between attributes (for example, A, B, and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
• An MVD between attributes A, B, and C in a relation is written using the following notation: A -->> B, A -->> C
Multi-Valued Dependencies and Fourth Normal Form
• A multi-valued dependency occurs when a determinant determines more than one dependent, and the dependents are independent of each other
• Ex.: course implies teacher (course -->> teacher); course implies text (course -->> text), where teacher and text are independent
• A Relation with course, teacher and text is all key, and exhibits redundancy, but is in 3NF
  – R(course, teacher, text)
• Updates can exhibit anomalies

Course    Teacher        Text
Physics   Prof. Greene   Basic Mechanics
Physics   Prof. Greene   Principles of Optics
Physics   Prof. Brown    Basic Mechanics
Physics   Prof. Brown    Principles of Optics
Math      Prof. Greene   Basic Mechanics
Math      Prof. Greene   Vector Analysis
Math      Prof. Greene   Trigonometry
Fourth Normal Form
• Relation R is in 4NF if and only if, whenever there exist subsets A and B of the attributes of R such that the nontrivial multi-valued dependency A multi-determines B is satisfied, then all attributes of R are also functionally dependent on A
• In the previous example, decompose course, teacher, text into two Relations: course teacher, and course text

Course    Teacher
Physics   Prof. Greene
Physics   Prof. Brown
Math      Prof. Greene

Course    Text
Physics   Basic Mechanics
Physics   Principles of Optics
Math      Basic Mechanics
Math      Vector Analysis
Math      Trigonometry
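The decomposition can be checked on the sample rows: project onto (course, teacher) and (course, text), then rejoin on course. A minimal sketch, with variable names chosen for illustration:

```python
# (course, teacher, text) rows from the multi-valued dependency example.
CTX = {
    ("Physics", "Prof. Greene", "Basic Mechanics"),
    ("Physics", "Prof. Greene", "Principles of Optics"),
    ("Physics", "Prof. Brown", "Basic Mechanics"),
    ("Physics", "Prof. Brown", "Principles of Optics"),
    ("Math", "Prof. Greene", "Basic Mechanics"),
    ("Math", "Prof. Greene", "Vector Analysis"),
    ("Math", "Prof. Greene", "Trigonometry"),
}

# The two projections of the decomposition.
course_teacher = {(c, t) for c, t, _ in CTX}
course_text = {(c, x) for c, _, x in CTX}

# Natural join of the projections on course.
rejoined = {
    (c, t, x)
    for c, t in course_teacher
    for c2, x in course_text
    if c == c2
}

print(len(CTX), len(course_teacher), len(course_text), rejoined == CTX)
```

The projections hold fewer rows than the original, and each teacher or text fact appears only once per course.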
4NF – Text Example
Join Dependencies and Fifth Normal Form
• There exist Relations that cannot be nonloss-decomposed into two Relations, but can be nonloss-decomposed into more than two
• Ex.: supplier, part, project
• A supplier supplies parts and projects, a project is supplied by suppliers and parts, but from this you may not validly conclude that a particular supplier supplies a particular part to a particular project

Supplier#   Part#   Project#
S1          P1      J2
S1          P2      J1
S2          P1      J1
S1          P1      J1

[Figure: E-R diagram of the ternary relationship S-P-Pr among Supplier, Part, and Project, with cardinalities M, N, and O]
Join Dependency
• Let R be a Relation, and let A, B, …, Z be subsets of the attributes of R. Then we say that R satisfies the JD * ( A, B, …, Z ) if and only if every legal value of R is equal to the join of its projections on A, B, …, Z
• Supplier, part, project can be said to satisfy this only if an additional constraint is included to make the specific conclusion valid
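The supplier, part, project rows make this concrete: each pairwise join of two projections adds a spurious tuple, while the join of all three projections returns the original rows. This sketch checks one sample value only; a join dependency is a constraint on every legal value of R.

```python
# (supplier, part, project) rows from the example table.
SPJ = {
    ("S1", "P1", "J2"),
    ("S1", "P2", "J1"),
    ("S2", "P1", "J1"),
    ("S1", "P1", "J1"),
}

# The three binary projections.
sp = {(s, p) for s, p, _ in SPJ}
pj = {(p, j) for _, p, j in SPJ}
js = {(j, s) for s, _, j in SPJ}

# Pairwise joins, each reported as (supplier, part, project).
sp_pj = {(s, p, j) for s, p in sp for p2, j in pj if p == p2}
pj_js = {(s, p, j) for p, j in pj for j2, s in js if j == j2}
js_sp = {(s, p, j) for j, s in js for s2, p in sp if s == s2}

# Three-way join: keep the SP-PJ tuples whose (project, supplier) pair is in JS.
three_way = {(s, p, j) for s, p, j in sp_pj if (j, s) in js}

print(sorted(sp_pj - SPJ), sorted(pj_js - SPJ), sorted(js_sp - SPJ))
print(three_way == SPJ)
```

The third projection acts as the additional constraint that filters out the tuple the first two projections wrongly combine.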
Fifth Normal Form
• A Relation R is in 5NF, also called projection-join normal form, if and only if every nontrivial join dependency that is satisfied by R is implied by the candidate key(s) of R
• In the general case, SPJ (supplier, part, project) is not in 5NF, but SP (supplier, part), PJ (part, project), and JS (project, supplier) are in 5NF
• 5NF is a generalization of 4NF, which is a generalization of 3NF
• It is the most general form possible for projection-based normalization
Fifth Normal Form (5NF)
• A relation decomposed into two relations must have the lossless-join property, which ensures that no spurious tuples are generated when relations are reunited through a natural join.
• However, some cases require a relation to be decomposed into more than two relations.
• Although rare, these cases are managed by join dependency and fifth normal form (5NF).
5NF - Example
Fifth Normal Form (5NF)
• A relation in 5NF has no nontrivial join dependency other than those implied by its candidate key(s).
Rules for No Loss Decomposition
Independent Projection (Rissanen) -- Two projections R1 and R2 of a relation R are independent iff
1. every FD in R can be logically deduced from those in R1 and R2, and
2. the common attributes of R1 and R2 form a candidate key for at least one of the pair.
An independent projection guarantees no loss decomposition:
• Each decomposed relation can be maintained independently.
• The contents of the original relation can be restored correctly by joining the decomposed relations on common attributes.
Find out the loss decompositions in the examples.
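The second condition can be tested with an attribute closure. The sketch below checks only that condition, uses a superkey test rather than a minimal candidate key, and assumes these FDs for Student (SID, Name, Major, Dept): SID --> Name, SID --> Major, Major --> Dept. The helper names are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute reachable from attrs by repeatedly applying fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def common_is_key(r1, r2, fds):
    """True if the common attributes of r1 and r2 determine all of r1 or all of r2."""
    reach = closure(set(r1) & set(r2), fds)
    return set(r1) <= reach or set(r2) <= reach


# Assumed FDs for Student (SID, Name, Major, Dept).
FDS = [
    (("SID",), ("Name",)),
    (("SID",), ("Major",)),
    (("Major",), ("Dept",)),
]

# Split on Major: the common attribute is a key of (Major, Dept).
print(common_is_key(("SID", "Name", "Major"), ("Major", "Dept"), FDS))
# Split on Dept: Dept determines neither projection.
print(common_is_key(("SID", "Name", "Dept"), ("Major", "Dept"), FDS))
```

The first condition, that every FD of R remains deducible from the FDs of R1 and R2, still has to be checked separately.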
Summary
4NF deals with multi-valued dependencies and 5NF with join dependencies.
Relations mapped from an E-R schema need no 4NF and 5NF normalization.
Limitations
• Cannot detect redundancy between relations
• Results in fragmented relations, which may slow down data retrieval
Denormalization is needed if
• relations in higher normal form cause performance problems,
• the majority of queries are data retrieval,
• denormalization can speed up the data retrieval, and
• denormalization does not introduce severe update anomalies.
Exercise
The relation Lab_Usage has been defined as follows.
Lab_Usage( SID, Class#, Course, SName, Account#, Lab_Hours )
where SID is a unique student ID, Class# is a unique class number, SName is a student name, Lab_Hours is the maximum laboratory hours assigned to each student in a class, and Account# is a unique computer account. A student is assigned an account for each class he/she takes. Assume that no student takes the same course twice.
Exercise...
1. Determine all candidate keys and select a primary key.
2. List all FFDs.
3. Discuss update anomalies found in the relation.
4. Decompose the relation into 2NF relations.
5. Discuss update anomalies found in the 2NF relations.
6. Decompose the relation into 3NF relations.
7. Discuss update anomalies found in the 3NF relations, if any.