HACD.9: Design Issues: Part II 29/09/2010

Database Design Issues, Part II

Hugh Darwen
[email protected]
www.dcs.warwick.ac.uk/~hugh

CS252.HACD: Fundamentals of Relational Databases
Section 9: Database Design Issues, Part II

First Normal Form (1NF)

On CS252 we shall assume that every relation, and therefore every relvar, is in 1NF.

The term (due to E.F. Codd) is not clearly defined, partly because it depends on an ill-defined concept of “atomicity” (of attribute values).

Some authorities take it that a relation is in 1NF iff none of its attributes is relation-valued or tuple-valued. It is certainly recommended to avoid use of such attributes (especially RVAs) in database relvars.

2NF and 3NF

These normal forms, originally defined by E.F. Codd, were really “mistakes”. You will find definitions in the textbooks but there is no need to learn them.

The faults with Codd’s original definition of 3NF were reported to him by Raymond Boyce. Together they worked on an improved, simpler normal form, which became known as Boyce-Codd Normal Form (BCNF).

Boyce/Codd Normal Form (BCNF)

BCNF is defined thus:

Relvar R is in BCNF if and only if for every nontrivial FD A → B satisfied by R, A is a superkey of R.

More loosely, “every nontrivial determinant is a [candidate] key”.

BCNF addresses redundancy arising from JDs that are consequences of FDs.

(Not all JDs are consequences of FDs. We will look at the others later.)
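The BCNF definition can be checked mechanically once the FDs are written down. The following is a minimal sketch in Python (not Tutorial D). The helper names closure and bcnf_violations are illustrative assumptions, and the check looks only at the FDs it is given, on the assumption that they cover every FD that holds. The headings are those of the ENROLMENT example used in this section.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) frozenset pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)


def bcnf_violations(heading, fds):
    """List the given nontrivial FDs whose determinant is not a superkey of heading."""
    violations = []
    for lhs, rhs in fds:
        nontrivial = not rhs <= lhs
        is_superkey = closure(lhs, fds) >= heading
        if nontrivial and not is_superkey:
            violations.append((lhs, rhs))
    return violations


# The "rogue" FD { StudentId } -> { Name }.
fd = (frozenset({"StudentId"}), frozenset({"Name"}))

enrolment = frozenset({"StudentId", "Name", "CourseId"})
is_called = frozenset({"StudentId", "Name"})

print(bcnf_violations(enrolment, [fd]))  # reports the rogue FD
print(bcnf_violations(is_called, [fd]))  # reports nothing: StudentId is a key here
```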

Splitting ENROLMENT (bis)

IS_CALLED
StudentId  Name
S1         Anne
S2         Boris
S3         Cindy
S4         Devinder
S5         Boris

IS_ENROLLED_ON
StudentId  CourseId
S1         C1
S1         C2
S2         C1
S3         C3
S4         C1

The attributes involved in the “rogue” FD have been separated into IS_CALLED, and now we can add student S5!

Advantages of BCNF

With reference to our ENROLMENT example, decomposed into the BCNF relvars IS_CALLED and IS_ENROLLED_ON:

• Anne’s name is recorded twice in ENROLMENT, but only once in IS_CALLED. In ENROLMENT it might appear under different spellings (Anne, Ann), unless the FD { StudentId } → { Name } is declared as a constraint. Redundancy is the problem here.

• With ENROLMENT, a student’s name cannot be recorded unless that student is enrolled on some course, and an anonymous student cannot be enrolled on any course. Lack of orthogonality is the problem here.
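The split of ENROLMENT can be reproduced with two projections, and joining them gives ENROLMENT back. The sketch below is in Python. The helpers rel, project and join are illustrative assumptions, and the ENROLMENT value is assumed to be the join of the two tables shown (before S5 is added).

```python
def rel(heading, rows):
    """Build a relation as a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(heading, row)) for row in rows}


def project(relation, attrs):
    """Project a relation onto the attribute names in attrs."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in relation}


def join(r, s):
    """Natural join: merge every pair of tuples that agree on their common attributes."""
    out = set()
    for t in r:
        for u in s:
            merged = dict(t)
            if all(merged.get(a, v) == v for a, v in u):
                merged.update(dict(u))
                out.add(frozenset(merged.items()))
    return out


enrolment = rel(("StudentId", "Name", "CourseId"), [
    ("S1", "Anne", "C1"), ("S1", "Anne", "C2"), ("S2", "Boris", "C1"),
    ("S3", "Cindy", "C3"), ("S4", "Devinder", "C1"),
])

is_called = project(enrolment, {"StudentId", "Name"})
is_enrolled_on = project(enrolment, {"StudentId", "CourseId"})

# Anne is now recorded once, and the join loses nothing.
print(len(is_called))
print(join(is_called, is_enrolled_on) == enrolment)

# S5 can be named without being enrolled on any course.
is_called_plus_s5 = is_called | rel(("StudentId", "Name"), [("S5", "Boris")])
print(join(is_called_plus_s5, is_enrolled_on) == enrolment)
```

Because S5 has no matching tuple in IS_ENROLLED_ON, the join is unchanged by the new name, which is the orthogonality point made above.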

Another Kind of Rogue FD

TUTORS_ON
StudentId  TutorId  TutorName  CourseId
S1         T1       Hugh       C1
S1         T2       Mary       C2
S2         T3       Lisa       C1
S3         T4       Fred       C3
S4         T1       Hugh       C1

Assume the FD { TutorId } → { TutorName } holds.

Splitting TUTORS_ON

TUTOR_NAME
TutorId  TutorName
T1       Hugh
T2       Mary
T3       Lisa
T4       Fred
T5       Zack

TUTORS_ON_BCNF
StudentId  TutorId  CourseId
S1         T1       C1
S1         T2       C2
S2         T3       C1
S3         T4       C3
S4         T1       C1

Now we can put Zack, who isn’t assigned to anybody yet, into the database. Note the FK required for TUTORS_ON_BCNF.
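The foreign key says that every TutorId in TUTORS_ON_BCNF must also appear in TUTOR_NAME, but not the other way round. A small Python sketch of that check follows. The function name dangling_tutor_ids and the tutor T9 are hypothetical, introduced only for illustration.

```python
# Sample values transcribed from the TUTOR_NAME and TUTORS_ON_BCNF tables.
tutor_name = {
    ("T1", "Hugh"), ("T2", "Mary"), ("T3", "Lisa"), ("T4", "Fred"), ("T5", "Zack"),
}
tutors_on_bcnf = {
    ("S1", "T1", "C1"), ("S1", "T2", "C2"), ("S2", "T3", "C1"),
    ("S3", "T4", "C3"), ("S4", "T1", "C1"),
}


def dangling_tutor_ids(tutors_on_bcnf, tutor_name):
    """Return TutorId values used in TUTORS_ON_BCNF but absent from TUTOR_NAME."""
    known = {tutor_id for tutor_id, _name in tutor_name}
    return {tutor_id for _student, tutor_id, _course in tutors_on_bcnf
            if tutor_id not in known}


# Zack (T5) tutors nobody, which the foreign key permits.
print(dangling_tutor_ids(tutors_on_bcnf, tutor_name))

# Hypothetical bad row: T9 is not recorded in TUTOR_NAME.
bad = tutors_on_bcnf | {("S2", "T9", "C2")}
print(dangling_tutor_ids(bad, tutor_name))
```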

Dependency Preservation

SCOR
StudentId  CourseId  Organiser  Room
S1         C1        Owen       13
S1         C2        Olga       24
S2         C1        Owen       13

Assume FDs:
{ CourseId } → { Organiser }
{ Organiser } → { Room }
{ Room } → { Organiser }

Which one do we address first?

Try 1: {CourseId}→{Organiser}

SCR
StudentId  CourseId  Room
S1         C1        13
S1         C2        24
S2         C1        13

CO
CourseId  Organiser
C1        Owen
C2        Olga

“Loses” { Room } → { Organiser } and { Organiser } → { Room }

Try 2: {Room}→{Organiser}

SCR
StudentId  CourseId  Room
S1         C1        13
S1         C2        24
S2         C1        13

OR
Organiser  Room
Owen       13
Olga       24

“Loses” { CourseId } → { Organiser }

Try 3: {Organiser}→{Room}

SCO
StudentId  CourseId  Organiser
S1         C1        Owen
S1         C2        Olga
S2         C1        Owen

OR
Organiser  Room
Owen       13
Olga       24

Preserves all three FDs! (But we must still decompose SCO, of course)
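The three tries can be compared mechanically. The Python sketch below deliberately implements only the simple reading used on these slides, where an FD counts as kept if all of its attributes land inside one projection. The fuller textbook test, which also accepts FDs that follow from the projected FDs taken together, is not attempted here. The names FDS, TRIES and lost_fds are illustrative assumptions.

```python
# The three FDs assumed to hold in SCOR, as (determinant, dependant) pairs.
FDS = [
    ({"CourseId"}, {"Organiser"}),
    ({"Organiser"}, {"Room"}),
    ({"Room"}, {"Organiser"}),
]

# Headings of the two projections produced by each try.
TRIES = {
    "Try 1": [{"StudentId", "CourseId", "Room"}, {"CourseId", "Organiser"}],
    "Try 2": [{"StudentId", "CourseId", "Room"}, {"Organiser", "Room"}],
    "Try 3": [{"StudentId", "CourseId", "Organiser"}, {"Organiser", "Room"}],
}


def lost_fds(fds, headings):
    """Return the FDs whose attributes do not all fit inside any single heading."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not any(lhs | rhs <= heading for heading in headings)]


for name, headings in TRIES.items():
    lost = lost_fds(FDS, headings)
    print(name, "loses", lost if lost else "nothing")
```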

An FD That Cannot Be Preserved

TUTOR_FOR
StudentId  TutorId  CourseId
S1         T1       C1
S1         T2       C2
S2         T3       C1
S3         T4       C3
S4         T1       C1

Now assume the FD { TutorId } → { CourseId } holds. This is a third kind of rogue FD.

Splitting TUTOR_FOR

TUTORS
StudentId  TutorId
S1         T1
S1         T2
S2         T3
S3         T4
S4         T1

TEACHES
TutorId  CourseId
T1       C1
T2       C2
T3       C1
T4       C3

Note the keys. Have we “lost” the FD { StudentId, CourseId } → { TutorId }? And the FK referencing IS_ENROLLED_ON?

Reinstating The Lost FD

Need to add the following constraint:

    CONSTRAINT KEY_OF_TUTORS_JOIN_TEACHES
      IS_EMPTY ( ( TUTORS JOIN TEACHES )
                 GROUP { ALL BUT StudentId, CourseId } AS G
                 WHERE COUNT ( G ) > 1 ) ;

or equivalently:

    CONSTRAINT KEY_OF_TUTORS_JOIN_TEACHES
      WITH TUTORS JOIN TEACHES AS TJT :
      COUNT ( TJT ) = COUNT ( TJT { StudentId, CourseId } ) ;

And The Lost Foreign Key

The “lost” foreign key is easier:

    CONSTRAINT FK_FOR_TUTORS_JOIN_TEACHES
      IS_EMPTY ( ( TUTORS JOIN TEACHES )
                 NOT MATCHING IS_ENROLLED_ON ) ;
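To see what the two constraints catch, the Python sketch below evaluates them over the sample values of TUTORS, TEACHES and IS_ENROLLED_ON. It is an illustration of the constraints' meaning, not a translation that any DBMS would run. The function names and the two offending rows are hypothetical.

```python
tutors = {("S1", "T1"), ("S1", "T2"), ("S2", "T3"), ("S3", "T4"), ("S4", "T1")}
teaches = {("T1", "C1"), ("T2", "C2"), ("T3", "C1"), ("T4", "C3")}
is_enrolled_on = {("S1", "C1"), ("S1", "C2"), ("S2", "C1"), ("S3", "C3"), ("S4", "C1")}


def tutors_join_teaches(tutors, teaches):
    """TUTORS JOIN TEACHES as a set of (StudentId, TutorId, CourseId) tuples."""
    return {(s, t, c) for s, t in tutors for t2, c in teaches if t == t2}


def key_holds(tjt):
    """COUNT ( TJT ) = COUNT ( TJT { StudentId, CourseId } )"""
    return len(tjt) == len({(s, c) for s, _t, c in tjt})


def fk_holds(tjt, is_enrolled_on):
    """IS_EMPTY ( TJT NOT MATCHING IS_ENROLLED_ON )"""
    return all((s, c) in is_enrolled_on for s, _t, c in tjt)


tjt = tutors_join_teaches(tutors, teaches)
print(key_holds(tjt), fk_holds(tjt, is_enrolled_on))

# Hypothetical row: S2 gets a second tutor (T1) for C1. TUTORS and TEACHES
# each still satisfy their own keys, but the "lost" FD is violated.
two_tutors = tutors_join_teaches(tutors | {("S2", "T1")}, teaches)
print(key_holds(two_tutors))

# Hypothetical row: S3 is tutored by T2, who teaches C2, but S3 is not enrolled on C2.
not_enrolled = tutors_join_teaches(tutors | {("S3", "T2")}, teaches)
print(fk_holds(not_enrolled, is_enrolled_on))
```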

In BCNF But Still Problematical

TBC1
Teacher  Book               CourseId
T1       Database Systems   C1
T1       Database in Depth  C1
T1       Database Systems   C2
T1       Database in Depth  C2
T2       Database in Depth  C2

Assume the JD *{ { Teacher, Book }, { Book, CourseId } } holds.

Normalising TBC1

TB
Teacher  Book
T1       Database Systems
T1       Database in Depth
T2       Database in Depth

BC
Book               CourseId
Database Systems   C1
Database in Depth  C1
Database Systems   C2
Database in Depth  C2

We have lost the constraint implied by the JD, but does a teacher really have to teach a course just because he or she uses a book that is used on that course?

Fifth Normal Form (5NF)

5NF caters for all harmful JDs.

Relvar R is in 5NF iff every nontrivial JD that holds in R is implied by the keys of R. (Fagin’s definition, 1979)

Apart from a few weird exceptions, a JD is “implied by the keys” if every projection is a superkey. (Date’s definition – but see the Notes for this slide)

To explain “nontrivial”: A JD is trivial if and only if one of its operands is the entire heading of R (because every such JD is clearly satisfied by R).

A JD of Degree > 2

TBC2
Teacher  Book               CourseId
T1       Database Systems   C1
T1       Database in Depth  C1
T1       Database Systems   C2
T1       Database in Depth  C2
T2       Database in Depth  C2

Now assume the JD *{ { Teacher, Book }, { Book, CourseId }, { Teacher, CourseId } } holds.
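A relation value satisfies a JD when joining its projections on the JD's operands gives the value back. The Python sketch below checks the three-way JD against the TBC2 sample value. The helpers rel, project, join and satisfies_jd are illustrative assumptions, and the second check removes one row purely to show a violation.

```python
def rel(heading, rows):
    """Build a relation as a set of tuples, each a frozenset of (attribute, value) pairs."""
    return {frozenset(zip(heading, row)) for row in rows}


def project(relation, attrs):
    """Project a relation onto the attribute names in attrs."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in relation}


def join(r, s):
    """Natural join: merge every pair of tuples that agree on their common attributes."""
    out = set()
    for t in r:
        for u in s:
            merged = dict(t)
            if all(merged.get(a, v) == v for a, v in u):
                merged.update(dict(u))
                out.add(frozenset(merged.items()))
    return out


def satisfies_jd(relation, components):
    """True if joining the projections on each component gives the relation back."""
    projections = [project(relation, attrs) for attrs in components]
    joined = projections[0]
    for p in projections[1:]:
        joined = join(joined, p)
    return joined == relation


HEADING = ("Teacher", "Book", "CourseId")
tbc2 = rel(HEADING, [
    ("T1", "Database Systems", "C1"), ("T1", "Database in Depth", "C1"),
    ("T1", "Database Systems", "C2"), ("T1", "Database in Depth", "C2"),
    ("T2", "Database in Depth", "C2"),
])
JD = [{"Teacher", "Book"}, {"Book", "CourseId"}, {"Teacher", "CourseId"}]

print(satisfies_jd(tbc2, JD))

# Remove one row for illustration: T1 still uses the book, the book is still
# used on C2, and T1 still teaches C2, so the JD demands the missing row.
without_row = tbc2 - rel(HEADING, [("T1", "Database in Depth", "C2")])
print(satisfies_jd(without_row, JD))
```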

Normalising TBC2

TB
Teacher  Book
T1       Database Systems
T1       Database in Depth
T2       Database in Depth

BC
Book               CourseId
Database Systems   C1
Database in Depth  C1
Database Systems   C2
Database in Depth  C2

TC
Teacher  CourseId
T1       C1
T1       C2
T2       C2

(and we’ve “lost” the constraint again)

Sixth Normal Form (6NF)

6NF subsumes 5NF and is the strictest NF:

Relvar R is in 6NF if and only if every JD that holds in R is trivial.

6NF provides maximal orthogonality, as already noted, but is not normally advised. It addresses additional anomalies that can arise with temporal data (beyond the scope of this course; what’s more, the definition of JD has to be revised).

Wives of Henry VIII in 6NF

W_FN
Wife#  FirstName
1      Catherine
2      Anne
3      Jane
4      Anne
5      Catherine
6      Catherine

W_LN
Wife#  LastName
1      of Aragon
2      Boleyn
3      Seymour
4      of Cleves
5      Howard
6      Parr

W_F
Wife#  Fate
1      divorced
2      beheaded
3      died
4      divorced
5      beheaded
6      survived

Not a good idea!

EXERCISE

(see Notes)
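With the wives split into three 6NF relvars, recovering one wife's full record takes a three-way join on Wife#. The Python sketch below models each relvar as a dictionary keyed by Wife#. The function name wives is an illustrative assumption. Nothing in the design stops a wife from being present in one relvar and absent from another, which is one plausible reading of “Not a good idea!”.

```python
w_fn = {1: "Catherine", 2: "Anne", 3: "Jane", 4: "Anne", 5: "Catherine", 6: "Catherine"}
w_ln = {1: "of Aragon", 2: "Boleyn", 3: "Seymour", 4: "of Cleves", 5: "Howard", 6: "Parr"}
w_f = {1: "divorced", 2: "beheaded", 3: "died", 4: "divorced", 5: "beheaded", 6: "survived"}


def wives(w_fn, w_ln, w_f):
    """Join the three relvars on Wife#; a wife missing from any one of them drops out."""
    common = w_fn.keys() & w_ln.keys() & w_f.keys()
    return {n: (w_fn[n], w_ln[n], w_f[n]) for n in sorted(common)}


all_six = wives(w_fn, w_ln, w_f)
print(all_six[2])

# Illustration only: if wife 6 has no tuple in W_F, she vanishes from the join,
# even though her names are still recorded.
w_f_incomplete = {n: fate for n, fate in w_f.items() if n != 6}
print(sorted(wives(w_fn, w_ln, w_f_incomplete)))
```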
