Introduction to Database Management Systems

Relation Normalization

• Why Normalization?
• Functional Dependencies
• First, Second, and Third Normal Forms
• Boyce/Codd Normal Form
• Fourth and Fifth Normal Forms
• No Loss Decomposition
• Summary

CIS Normalization 1

Why Normalization?

• An ill-structured relation contains redundant data
• Data redundancy causes modification anomalies:
  – Insertion anomalies -- Suppose we want to enter SCUBA as an activity that costs $100; we can't until a student signs up for it
  – Update anomalies -- If we change the price of swimming for student 150, there is no guarantee that student 200 will pay the new price
  – Deletion anomalies -- If we delete student 100, we lose not only the fact that he/she is a skier, but also the fact that skiing costs $200
• Normalization is the process used to remove modification anomalies

ACTIVITY
SID   Activity   Fee
100   Skiing     200
150   Swimming   50
175   Squash     50
200   Swimming   50

How can this be changed to fix these problems?
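One possible answer to the ACTIVITY question, sketched in Python under the assumption that the fee depends only on the activity (Activity --> Fee). The names `signups` and `fees` are illustrative and do not come from the slides.

```python
# ACTIVITY rows from the slide: (SID, Activity, Fee)
activity = [
    (100, "Skiing", 200),
    (150, "Swimming", 50),
    (175, "Squash", 50),
    (200, "Swimming", 50),
]

# Assumed dependency: Activity --> Fee. Split the relation along it.
signups = {(sid, act) for sid, act, _ in activity}  # who takes what
fees = {act: fee for _, act, fee in activity}       # what each activity costs

# Insertion: SCUBA can be priced before any student signs up.
fees["SCUBA"] = 100

# Update: a single change reprices swimming for every student.
fees["Swimming"] = 60

# Deletion: dropping student 100 does not lose the skiing fee.
signups = {(sid, act) for sid, act in signups if sid != 100}

# Rebuild the original ACTIVITY view by joining on Activity.
rebuilt = sorted((sid, act, fees[act]) for sid, act in signups)
print(rebuilt)
print(fees["Skiing"])
```

After the split, each fee is stored once, so the three anomalies listed on the slide no longer arise for this data.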

Dave McDonald, CIS, GSU 10-1 Introduction to Database Management Systems

Why Normalization...

Course
SID  Name    Grade  Course#  Text  Major  Dept
s1   Joseph  A      CIS8110  b1    CIS    CIS
s1   Joseph  B      CIS8120  b2    CIS    CIS
s1   Joseph  A      CIS8140  b5    CIS    CIS
s2   Alice   A      CIS8110  b1    CS     MCS
s2   Alice   A      CIS8140  b5    CS     MCS
s3   Tom     B      CIS8110  b1    Acct   Acct
s3   Tom     B      CIS8140  b5    Acct   Acct
s3   Tom     A      CIS8680  b1    Acct   Acct

• Is there any redundant data?
• Insertion anomalies?
• Update anomalies?
• Deletion anomalies?


Functional Dependencies

Given two attributes, X and Y, of a relation R, Y is functionally dependent on X iff each X value must always occur with the same Y value in R.

R.X --> R.Y or X --> Y

List all FDs in the Course relation:
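A minimal sketch of how a candidate FD can be tested against the rows of the Course relation. The helper name `holds` is an assumption made for this example. Note that a table instance can only refute an FD; whether an FD truly holds depends on the meaning of the attributes.

```python
def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True

COLUMNS = ("SID", "Name", "Grade", "Course#", "Text", "Major", "Dept")
DATA = [
    ("s1", "Joseph", "A", "CIS8110", "b1", "CIS", "CIS"),
    ("s1", "Joseph", "B", "CIS8120", "b2", "CIS", "CIS"),
    ("s1", "Joseph", "A", "CIS8140", "b5", "CIS", "CIS"),
    ("s2", "Alice", "A", "CIS8110", "b1", "CS", "MCS"),
    ("s2", "Alice", "A", "CIS8140", "b5", "CS", "MCS"),
    ("s3", "Tom", "B", "CIS8110", "b1", "Acct", "Acct"),
    ("s3", "Tom", "B", "CIS8140", "b5", "Acct", "Acct"),
    ("s3", "Tom", "A", "CIS8680", "b1", "Acct", "Acct"),
]
course = [dict(zip(COLUMNS, r)) for r in DATA]

print(holds(course, ["SID"], ["Name"]))      # True
print(holds(course, ["Course#"], ["Text"]))  # True
print(holds(course, ["SID"], ["Grade"]))     # False: s1 has both A and B
print(holds(course, ["Text"], ["Course#"]))  # False: b1 is used by two courses
```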


Functional Dependencies...

• X is called the determinant of Y.
• X and Y may be composite.
• Dependency relationships change with attribute semantics.
• X and Y could be mutually dependent on each other: Husband --> Wife, Wife --> Husband, Husband <--> Wife
• X may or may not be the key attribute of R.
• A Y value can occur in more than one tuple in R: Course# --> Text


Fully Functional Dependencies

• A fully functional dependency (FFD) exists between attributes X and Y if Y is functionally dependent on X but not on any proper subset of X.
  ( SID, Course# ) --> Name?
  ( SID, Course# ) --> Grade?
  ( SID, Name ) --> Major?
  ( SID, Name ) --> SID?
  Note that if X is not composite, then X --> Y is always an FFD.
• By default, the term FD refers to FFD
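The FFD definition can be sketched as a check over the non-empty proper subsets of the determinant. The helper names `holds` and `fully_dependent` are assumptions for this example, and only the columns needed for the slide's questions are reproduced from the Course table.

```python
from itertools import combinations

def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True

def fully_dependent(rows, lhs, rhs):
    """lhs --> rhs holds, and no non-empty proper subset of lhs determines rhs."""
    if not holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if holds(rows, subset, rhs):
                return False
    return True

# (SID, Name, Grade, Course#, Major) taken from the Course table
COLUMNS = ("SID", "Name", "Grade", "Course#", "Major")
DATA = [
    ("s1", "Joseph", "A", "CIS8110", "CIS"), ("s1", "Joseph", "B", "CIS8120", "CIS"),
    ("s1", "Joseph", "A", "CIS8140", "CIS"), ("s2", "Alice", "A", "CIS8110", "CS"),
    ("s2", "Alice", "A", "CIS8140", "CS"), ("s3", "Tom", "B", "CIS8110", "Acct"),
    ("s3", "Tom", "B", "CIS8140", "Acct"), ("s3", "Tom", "A", "CIS8680", "Acct"),
]
course = [dict(zip(COLUMNS, r)) for r in DATA]

print(fully_dependent(course, ("SID", "Course#"), ("Name",)))   # False: SID alone determines Name
print(fully_dependent(course, ("SID", "Course#"), ("Grade",)))  # True
print(fully_dependent(course, ("SID", "Name"), ("Major",)))     # False: SID alone determines Major
```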


Transitively Functional Dependencies

Given attributes X, Y, and Z of a relation R, Z is transitively dependent on X iff X --> Y and Y --> Z.

Given SID --> Dept and Dept --> College, SID --> ?
Given SID --> Major and Major --> Dept, SID --> ?
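Transitive dependencies can be found mechanically with the standard attribute-closure algorithm. The sketch below assumes FDs are written as (left side, right side) pairs of tuples; the function name `closure` is chosen for this example.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply an FD when its left side is already covered
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

fds = [(("SID",), ("Major",)), (("Major",), ("Dept",))]
print(sorted(closure({"SID"}, fds)))    # ['Dept', 'Major', 'SID']
print(sorted(closure({"Major"}, fds)))  # ['Dept', 'Major']
```

Dept appears in the closure of SID only by way of Major, which is exactly the transitive dependency SID --> Major --> Dept.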


Graphical Representation

Course (SID, Name, Grade, Course#, Text, Major, Dept)

[Figure: dependency diagram for Course, marking the primary key and the attributes SID, Course#, Name, Major, Dept, Grade, and Text]


First Normal Form (1NF)

A relation R is in 1NF iff all attribute domains contain atomic values only.

• A relation in 1NF has modification anomalies

INVENTORY (Part#, WHouse#, WAddress, QTY)

[Figure: dependency diagram for INVENTORY over Part#, WHouse#, WAddress, and QTY]

Second Normal Form (2NF)

A relation R is in 2NF iff R is in 1NF and every non-key attribute is fully dependent on the primary key (i.e. has no partial functional dependencies).

• The term non-key attribute refers to any attribute that does not belong to any candidate key.

INVENTORY (Part#, WHouse#, WAddress, QTY)

[Figure: dependency diagram for INVENTORY over Part#, WHouse#, WAddress, and QTY]
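A sketch of how partial dependencies in INVENTORY could be detected. The slides do not list the FDs, so the key (Part#, WHouse#) and the dependencies (Part#, WHouse#) --> QTY and WHouse# --> WAddress are assumptions made for illustration. The code also assumes the given key is the only candidate key.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(key, attributes, fds):
    """List (key subset, attribute) pairs where part of the key determines a non-key attribute."""
    non_key = set(attributes) - set(key)  # valid only if key is the sole candidate key
    found = []
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            for attr in sorted(closure(subset, fds) & non_key):
                found.append((subset, attr))
    return found

# Assumed for illustration, not stated on the slides
attributes = ("Part#", "WHouse#", "WAddress", "QTY")
key = ("Part#", "WHouse#")
fds = [(("Part#", "WHouse#"), ("QTY",)), (("WHouse#",), ("WAddress",))]

print(partial_dependencies(key, attributes, fds))
# [(('WHouse#',), 'WAddress')]
```

Under these assumptions the relation would be brought into 2NF by moving WAddress into a separate relation keyed by WHouse#.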


Modification Anomalies in 2NF

2NF relations have modification anomalies:
• Redundant Information?
• Update anomalies?
• Insertion anomalies?
• Deletion anomalies?

Which FD causes the redundant data?

INVENTORY
Part#  WHouse#  WAddress    QTY
123    4        Atlanta     10
456    5        Birmingham  6
456    2        Columbus    10
123    7        Oakland     8
235    1        Denver      2

Third Normal Form (3NF)

A relation R is in 3NF iff R is in 2NF and every non-key attribute is non-transitively dependent on the primary key.

Student (SID, Name, Major, Dept)

Discussion: If a relation does not have any non-key attribute, would it automatically be in 3NF?
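A sketch of removing the transitive dependency from Student by projection, assuming SID --> Major and Major --> Dept as in the earlier transitive-dependency slide. The rows are the distinct student values from the Course table; the names `student_major` and `major_dept` are illustrative.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; rows are dicts."""
    return {tuple(row[a] for a in attrs) for row in rows}

student = [
    {"SID": "s1", "Name": "Joseph", "Major": "CIS", "Dept": "CIS"},
    {"SID": "s2", "Name": "Alice", "Major": "CS", "Dept": "MCS"},
    {"SID": "s3", "Name": "Tom", "Major": "Acct", "Dept": "Acct"},
]

# Split along the transitive dependency SID --> Major --> Dept
student_major = project(student, ("SID", "Name", "Major"))
major_dept = project(student, ("Major", "Dept"))

# Natural join on Major rebuilds the original relation
rejoined = {
    (sid, name, major, dept)
    for sid, name, major in student_major
    for m, dept in major_dept
    if m == major
}
original = project(student, ("SID", "Name", "Major", "Dept"))
print(rejoined == original)  # True
```

Each Major --> Dept fact is now stored once in `major_dept`, regardless of how many students share the major.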


Modification Anomalies in 3NF

LOCATION (Employee, Department, Location)

• Redundant Information?
• Update anomalies?
• Insertion anomalies?
• Deletion anomalies?
• All determinants?

[Figure: dependency diagram for LOCATION over Employee, Department, and Location]


Boyce/Codd Normal Form (BCNF)

A relation R is in BCNF iff every determinant is a candidate key. BCNF is applied to a relation R if R has multiple candidate keys and

1. Those candidate keys are composite, and
2. The candidate keys overlap.

ADVISE (Student, Major, Advisor)

[Figure: dependency diagram for ADVISE over Student, Major, and Advisor]
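A sketch of a BCNF test for ADVISE. The slides show the dependencies only as a diagram, so the FDs used here, (Student, Major) --> Advisor and Advisor --> Major, are an assumption. The check flags every nontrivial FD whose determinant is not a superkey.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Return the nontrivial FDs whose left side does not determine all attributes."""
    all_attrs = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)          # skip trivial FDs
        and closure(lhs, fds) != all_attrs   # determinant is not a superkey
    ]

# Assumed FDs for ADVISE, for illustration only
attributes = ("Student", "Major", "Advisor")
fds = [(("Student", "Major"), ("Advisor",)), (("Advisor",), ("Major",))]

print(bcnf_violations(attributes, fds))
# [(('Advisor',), ('Major',))]
```

Under these assumptions ADVISE is in 3NF, because Major belongs to a candidate key, but not in BCNF, because Advisor is a determinant without being a key.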


BCNF Example

Student   Course             Instructor
Narayan   Database           Mark
Smith     Database           Jeffries
Smith     Operating Systems  Ammar
Smith     Theory             Schulman
Wallace   Database           Mark
Wallace   Operating Systems  Ahamad
Wong      Database           Omiecinski
Zelaya    Database           Jeffries
Narayan   Operating Systems  Ammar

Teach(Student, Course, Instructor)

BCNF Example (cont’d)

[Figure: dependency diagram for Teach over Student, Instructor, and Course]

There are three possible decompositions

(Student, Instructor) and (Student, Course)

(Course, Instructor) and (Course, Student)

(Instructor, Course) and (Instructor, Student)

Which of the three will not generate spurious tuples after a join?
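The three decompositions can be compared by projecting the Teach rows, joining the projections back, and counting tuples that were not in the original. This is a sketch on the slide's sample data only; the helper names `project` and `natural_join` are assumptions for this example.

```python
ATTRS = ("Student", "Course", "Instructor")
TEACH = {
    ("Narayan", "Database", "Mark"),
    ("Smith", "Database", "Jeffries"),
    ("Smith", "Operating Systems", "Ammar"),
    ("Smith", "Theory", "Schulman"),
    ("Wallace", "Database", "Mark"),
    ("Wallace", "Operating Systems", "Ahamad"),
    ("Wong", "Database", "Omiecinski"),
    ("Zelaya", "Database", "Jeffries"),
    ("Narayan", "Operating Systems", "Ammar"),
}

def project(rows, attrs, keep):
    """Project rows (tuples ordered by attrs) onto the attributes in keep."""
    idx = [attrs.index(a) for a in keep]
    return {tuple(r[i] for i in idx) for r in rows}

def natural_join(r1, attrs1, r2, attrs2, out):
    """Join on the shared attribute names and return tuples ordered by out."""
    common = [a for a in attrs1 if a in attrs2]
    joined = set()
    for t1 in r1:
        d1 = dict(zip(attrs1, t1))
        for t2 in r2:
            d2 = dict(zip(attrs2, t2))
            if all(d1[a] == d2[a] for a in common):
                merged = {**d1, **d2}
                joined.add(tuple(merged[a] for a in out))
    return joined

decompositions = [
    (("Student", "Instructor"), ("Student", "Course")),
    (("Course", "Instructor"), ("Course", "Student")),
    (("Instructor", "Course"), ("Instructor", "Student")),
]

spurious_counts = []
for a1, a2 in decompositions:
    joined = natural_join(project(TEACH, ATTRS, a1), a1,
                          project(TEACH, ATTRS, a2), a2, ATTRS)
    spurious_counts.append(len(joined - TEACH))
    print(a1, a2, "spurious tuples:", len(joined - TEACH))
```

On this data only the third decomposition, joined on Instructor, reproduces Teach exactly, because every instructor appears with a single course.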


Boyce–Codd Normal Form (BCNF)

• The difference between 3NF and BCNF is that, for a functional dependency A → B, 3NF allows this dependency in a relation if B is a primary-key attribute and A is not a candidate key.
• BCNF, by contrast, insists that for this dependency to remain in a relation, A must be a candidate key.
• Every relation in BCNF is also in 3NF. However, a relation in 3NF may not be in BCNF.

• Violation of BCNF is quite rare.

• Potential to violate BCNF may occur in a relation that:
  – contains two (or more) composite candidate keys;
  – has candidate keys that overlap (i.e. have at least one attribute in common).


Fourth Normal Form (4NF)

• Although BCNF removes anomalies due to functional dependencies, another type of dependency called a multi-valued dependency (MVD) can also cause data redundancy.

• The possible existence of MVDs in a relation is due to 1NF and can result in data redundancy.

• An MVD is a dependency between attributes (for example, A, B, and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.

• An MVD between attributes A, B, and C in a relation is represented using the following notation:
  A --->> B
  A --->> C


Multi-Valued Dependencies and Fourth Normal Form

• A multi-valued dependency occurs when a determinant determines more than one dependent, and the dependents are independent of each other
• Ex.: course implies teacher (course -->> teacher); course implies text (course -->> text), where teacher and text are independent
• A Relation with course, teacher and text is all key, and exhibits redundancy, but is in 3NF
  – R(course, teacher, text)
• Updates can exhibit anomalies

Course   Teacher       Text
Physics  Prof. Greene  Basic Mechanics
Physics  Prof. Greene  Principles of Optics
Physics  Prof. Brown   Basic Mechanics
Physics  Prof. Brown   Principles of Optics
Math     Prof. Greene  Basic Mechanics
Math     Prof. Greene  Vector Analysis
Math     Prof. Greene  Trigonometry

Fourth Normal Form

• Relation R is in 4NF if and only if, whenever there exist subsets A and B of the attributes of R such that the nontrivial multi-valued dependency A multi-determines B is satisfied, then all attributes of R are also functionally dependent on A
• In the previous example, decompose course, teacher, text into two Relations: course teacher, and course text

Course   Teacher
Physics  Prof. Greene
Physics  Prof. Brown
Math     Prof. Greene

Course   Text
Physics  Basic Mechanics
Physics  Principles of Optics
Math     Basic Mechanics
Math     Vector Analysis
Math     Trigonometry
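A sketch that checks the multi-valued dependency on the course, teacher, text data and confirms that the two projections join back without loss. The function name `satisfies_mvd` is an assumption, and it is written for three-column rows only.

```python
from itertools import product

CTX = {
    ("Physics", "Prof. Greene", "Basic Mechanics"),
    ("Physics", "Prof. Greene", "Principles of Optics"),
    ("Physics", "Prof. Brown", "Basic Mechanics"),
    ("Physics", "Prof. Brown", "Principles of Optics"),
    ("Math", "Prof. Greene", "Basic Mechanics"),
    ("Math", "Prof. Greene", "Vector Analysis"),
    ("Math", "Prof. Greene", "Trigonometry"),
}

def satisfies_mvd(rows):
    """Check that, per course, every teacher is paired with every text."""
    for course in {c for c, _, _ in rows}:
        teachers = {t for c, t, _ in rows if c == course}
        texts = {x for c, _, x in rows if c == course}
        present = {(t, x) for c, t, x in rows if c == course}
        if present != set(product(teachers, texts)):
            return False
    return True

# 4NF decomposition: one relation per independent multi-valued fact
course_teacher = {(c, t) for c, t, _ in CTX}
course_text = {(c, x) for c, _, x in CTX}

# Natural join on course
rejoined = {(c, t, x) for c, t in course_teacher for c2, x in course_text if c == c2}

print(satisfies_mvd(CTX))   # True
print(rejoined == CTX)      # True
print(len(CTX), len(course_teacher), len(course_text))  # 7 3 5
```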


4NF – Text Example


Join Dependencies and Fifth Normal Form

• There exist Relations that cannot be nonloss-decomposed into two Relations, but can be nonloss-decomposed into more than two
• Ex.: supplier, part, project
• A supplier supplies parts and projects, a project is supplied by suppliers and parts, but from this you may not validly conclude that a particular supplier supplies a particular part to a particular project

Supplier#  Part#  Project#
S1         P1     J2
S1         P2     J1
S2         P1     J1
S1         P1     J1

[Figure: E-R diagram of the ternary M:N:O relationship S-P-Pr among Supplier, Part, and Project]


Join Dependency

• Let R be a Relation, and let A, B, …, Z be subsets of the attributes of R. Then we say that R satisfies the JD * ( A, B, …, Z ) if and only if every legal value of R is equal to the join of its projections on A, B, …, Z
• Supplier, part, project can be said to satisfy this only if an additional constraint is included to make the specific conclusion valid
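The join dependency can be illustrated on the four supplier, part, project rows from the earlier slide. In this sketch, joining only two of the binary projections produces a spurious tuple, while joining all three recovers the original rows. This shows the JD for this sample instance only, not for every legal value of the relation.

```python
SPJ = {
    ("S1", "P1", "J2"),
    ("S1", "P2", "J1"),
    ("S2", "P1", "J1"),
    ("S1", "P1", "J1"),
}

# The three binary projections
sp = {(s, p) for s, p, _ in SPJ}
pj = {(p, j) for _, p, j in SPJ}
js = {(j, s) for s, _, j in SPJ}

# Join SP and PJ on part: this alone is lossy
two_way = {(s, p, j) for s, p in sp for p2, j in pj if p == p2}

# Joining the result with JS removes the spurious tuple
three_way = {(s, p, j) for s, p, j in two_way if (j, s) in js}

print(sorted(two_way - SPJ))  # [('S2', 'P1', 'J2')]
print(three_way == SPJ)       # True
```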


Fifth Normal Form

• A Relation R is in 5NF, also called projection-join normal form, if and only if every nontrivial join dependency that is satisfied by R is implied by the candidate key(s) of R
• In the general case, SPJ (supplier, part, project) is not in 5NF, but SP (supplier, part), PJ (part, project), and JS (project, supplier) are in 5NF
• 5NF is a generalization of 4NF, which is a generalization of 3NF
• It is the most general form possible for projection-based normalization


Fifth Normal Form (5NF)

• A relation decomposed into two relations must have the lossless-join property, which ensures that no spurious tuples are generated when relations are reunited through a natural join.

• However, there are requirements to decompose a relation into more than two relations.

• Although rare, these cases are managed by join dependency and fifth normal form (5NF).


5NF - Example


Fifth Normal Form (5NF)

• A relation that has no join dependency.


Rules for No Loss Decomposition

Independent Projection (Rissanen) -- Two projections R1 and R2 of a relation R are independent iff

• Every FD in R can be logically deduced from those in R1 and R2, and
• The common attributes of R1 and R2 form a candidate key for at least one of the pair.

An independent projection guarantees no loss decomposition

  – Each decomposed relation can be maintained independently.
  – The contents of the original relation can be restored correctly by joining the decomposed relations on common attributes.
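A simplified sketch of the two independence conditions, applied to Student (SID, Name, Major, Dept). Two simplifications are assumed: only the listed FDs that fit entirely inside R1 or R2 count as "those in R1 and R2", so the check can reject a decomposition that a full analysis would accept, and any superkey is accepted for the common attributes. The FD SID --> Name is an assumption; the other two come from the transitive-dependency slide.

```python
def closure(attrs, fds):
    """Return every attribute determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_independent(r1, r2, fds):
    """Simplified test of the two independent-projection conditions."""
    r1, r2 = set(r1), set(r2)
    # Condition 2: the common attributes determine all of R1 or all of R2
    reach = closure(r1 & r2, fds)
    key_ok = r1 <= reach or r2 <= reach
    # Condition 1 (simplified): every FD follows from the FDs local to R1 or R2
    local = [(l, r) for l, r in fds
             if set(l) | set(r) <= r1 or set(l) | set(r) <= r2]
    fds_ok = all(set(r) <= closure(l, local) for l, r in fds)
    return key_ok and fds_ok

fds = [(("SID",), ("Name",)), (("SID",), ("Major",)), (("Major",), ("Dept",))]

# Split on Major: both conditions hold
print(is_independent(("SID", "Name", "Major"), ("Major", "Dept"), fds))  # True
# Split on SID: the join is lossless, but Major --> Dept spans both relations
print(is_independent(("SID", "Name", "Major"), ("SID", "Dept"), fds))    # False
```

The second decomposition can still be joined back without loss, but Major --> Dept can no longer be enforced within either relation alone, so the two relations cannot be maintained independently.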

Find out the loss decompositions in the examples


Summary

• 4NF and 5NF deal with multi-valued and join dependencies
• Relations mapped from an E-R schema need no 4NF and 5NF
• Limitations
  – Cannot detect redundancy between relations
  – Results in fragmented relations, which may slow down data retrieval
• Denormalization is needed if
  – Relations in higher normal form cause a performance problem,
  – The majority of queries are data retrieval,
  – Denormalization can speed up the data retrieval, and
  – Denormalization does not introduce severe update anomalies.


Exercise

The relation Lab_Usage has been defined as follows.

Lab_Usage( SID, Class#, Course, SName, Account#, Lab_Hours )

where SID is a unique student ID, Class# is a unique class number, SName is a student name, Lab_Hours is the maximum laboratory hours assigned to each student in a class, and Account# is a unique computer account. A student is assigned an account for each class he/she takes. Assume that no student takes the same course twice.


Exercise...

1. Determine all candidate keys and select a primary key.
2. List all FFDs.
3. Discuss update anomalies found in the relation.
4. Decompose the relation into 2NF relations.
5. Discuss update anomalies found in the 2NF relations.
6. Decompose the relation into 3NF relations.
7. Discuss update anomalies found in the 3NF relations if any.
