Functional Dependencies and Normalization

Jose M. Peña [email protected]
* slides kindly provided by Vaida Jakonienė

Overview

[Figure: overview diagram with the labels Real world, Model, Queries, Answers, Databases, DBMS (Processing of queries and updates; Access to stored data), Physical database]


Good Design

• Can we be sure that a translation from EER-diagram to relational tables results in good database design?
• Confronted with a deployed database, how can we be sure that it is well-designed?
• What is good database design?
  - Four informal measures
  - Formal measure: normalization

Informal design guideline

• Easy to explain semantics of the schema
• Reducing redundant information in tuples. Redundancy causes update anomalies:
  - Insertion anomalies
  - Deletion anomalies
  - Modification anomalies

EMP(EMPID, EMPNAME, DEPTNAME, DEPTMGR)

EMPID  EMPNAME  DEPTNAME        DEPTMGR
123    Smith    Research        999
333    Wong     Research        999
888    Borg     Administration  null
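A minimal sketch of a modification anomaly on the EMP rows above. It assumes that DEPTMGR is meant to be the manager of the department in DEPTNAME, so the same manager value is stored once per employee; the helper names and the new manager number 777 are made up for illustration.

```python
# EMP(EMPID, EMPNAME, DEPTNAME, DEPTMGR) rows from the slide; None stands for null.
emp = [
    {"EMPID": 123, "EMPNAME": "Smith", "DEPTNAME": "Research", "DEPTMGR": 999},
    {"EMPID": 333, "EMPNAME": "Wong", "DEPTNAME": "Research", "DEPTMGR": 999},
    {"EMPID": 888, "EMPNAME": "Borg", "DEPTNAME": "Administration", "DEPTMGR": None},
]


def managers_per_department(rows):
    """Map each department to the set of manager values stored for it."""
    result = {}
    for row in rows:
        result.setdefault(row["DEPTNAME"], set()).add(row["DEPTMGR"])
    return result


def inconsistent_departments(rows):
    """Departments that are stored with more than one manager value."""
    per_dept = managers_per_department(rows)
    return sorted(dept for dept, mgrs in per_dept.items() if len(mgrs) > 1)


print(inconsistent_departments(emp))  # [] : the redundant copies still agree

# Modification anomaly: the manager of Research changes, but only one of the
# two redundant copies is updated (777 is a hypothetical value).
emp[0]["DEPTMGR"] = 777
print(inconsistent_departments(emp))  # ['Research']
```

With the department data kept in a table of its own, the manager would be stored once and this kind of partial update could not happen.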


Informal design guideline

• Sometimes, it may be desirable to have redundancy to gain in runtime, i.e. trade space for time.
• In that case, and to avoid update anomalies,
  - either use triggers or stored procedures to update the base tables,
  - or keep the base tables free of redundancy and use views (assuming that the views are materialized).

Informal design guideline

• Reducing NULL values in tuples. Why?
  - Efficient use of space
  - Avoid costly outer joins
  - Ambiguous interpretation (unknown vs. doesn't apply)
• Disallow the possibility of generating spurious tuples
  - Figures 10.5 and 10.6: cartesian product results in incorrect tuples
  - Only join on foreign key/primary key attributes
  - Lossless join property: guarantees that the spurious tuple generation problem does not occur

Functional dependencies (FD)

• Let R be a relational schema with the attributes A1,...,An and let X and Y be subsets of {A1,...,An}.
• Let r(R) denote a relation in relational schema R.

We say that X functionally determines Y, written X → Y, if for each pair of tuples t1, t2 ∈ r(R) and for all relations r(R) (i.e. for any relation extension or state):

  If t1[X] = t2[X] then we must also have t1[Y] = t2[Y]

• Despite the mathematical definition, an FD cannot be determined automatically. It is a property of the semantics of attributes.

Inference rules

1. If X ⊇ Y then X → Y, or X → X (reflexive rule)
2. X → Y |= XZ → YZ (augmentation rule)
3. X → Y, Y → Z |= X → Z (transitive rule)
4. X → YZ |= X → Y (decomposition rule)
5. X → Y, X → Z |= X → YZ (union or additive rule)
6. X → Y, WY → Z |= WX → Z (pseudotransitive rule)
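The pairwise condition in the FD definition can be tested against one given relation state, as in the sketch below. The function name `fd_holds` is made up for illustration, and the rows are the EMP tuples from the earlier guideline slide. A result of True only says that this particular state does not violate X → Y; as the slide stresses, whether the FD really holds is a matter of attribute semantics and cannot be concluded from data alone.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Return True if no pair of tuples in `rows` violates lhs -> rhs."""
    for t1, t2 in combinations(rows, 2):
        same_lhs = all(t1[a] == t2[a] for a in lhs)
        same_rhs = all(t1[a] == t2[a] for a in rhs)
        if same_lhs and not same_rhs:
            return False  # t1[X] = t2[X] but t1[Y] != t2[Y]
    return True


# EMP rows from the slide; None stands for null.
emp = [
    {"EMPID": 123, "EMPNAME": "Smith", "DEPTNAME": "Research", "DEPTMGR": 999},
    {"EMPID": 333, "EMPNAME": "Wong", "DEPTNAME": "Research", "DEPTMGR": 999},
    {"EMPID": 888, "EMPNAME": "Borg", "DEPTNAME": "Administration", "DEPTMGR": None},
]

print(fd_holds(emp, ["DEPTNAME"], ["DEPTMGR"]))  # True: not violated in this state
print(fd_holds(emp, ["DEPTMGR"], ["EMPID"]))     # False: 999 appears with two EMPIDs
```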


Inference rules

• Textbook, page 341: "… X → A, and Y → B does not imply that XY → AB." Prove that this statement is wrong.
• Prove inference rules 4, 5 and 6 by using only inference rules 1, 2 and 3.

Definitions

• Superkey: A set of attributes uniquely (but not minimally!) identifying a tuple of a relation.
• Key: A set of attributes that uniquely and minimally identifies a tuple of a relation.
• Candidate key: If there is more than one key in a relation, the keys are called candidate keys.
• Primary key: One candidate key is chosen to be the primary key.
• Prime attribute: An attribute A that is part of a candidate key X (vs. nonprime attribute)
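Whether an FD follows from a given set of FDs, and whether an attribute set is a superkey or a key, can be checked mechanically with the attribute closure. This is a standard technique that the slides do not spell out, so treat the following as a sketch; the function names and the four-attribute schema {X, Y, A, B} are assumptions made for illustration. It confirms that XY → AB does follow from X → A and Y → B, but it is not a substitute for the proof the exercise asks for.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def follows(fds, lhs, rhs):
    """True if lhs -> rhs is implied by `fds`."""
    return set(rhs) <= closure(lhs, fds)


def is_superkey(attrs, schema, fds):
    """True if `attrs` determines every attribute of `schema`."""
    return closure(attrs, fds) >= set(schema)


def is_key(attrs, schema, fds):
    """True if `attrs` is a superkey and no attribute can be removed from it."""
    attrs = set(attrs)
    return is_superkey(attrs, schema, fds) and not any(
        is_superkey(attrs - {a}, schema, fds) for a in attrs
    )


schema = {"X", "Y", "A", "B"}            # hypothetical relation schema
fds = [({"X"}, {"A"}), ({"Y"}, {"B"})]   # X -> A, Y -> B

print(follows(fds, {"X", "Y"}, {"A", "B"}))      # True
print(is_key({"X", "Y"}, schema, fds))           # True
print(is_superkey({"X", "Y", "A"}, schema, fds)) # True
print(is_key({"X", "Y", "A"}, schema, fds))      # False: superkey, but not minimal
```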


Normal Forms

• 1NF, 2NF, 3NF, BCNF (4NF, 5NF)
• Minimize redundancy
• Minimize update anomalies
• Normal form ↑ = redundancy and update anomalies ↓, and relations become smaller.
• Join operation to recover original relations.

1NF

• 1NF: The relation should have no non-atomic values.
  (What about multi-valued composite attributes?)

Rnon1NF
ID   Name        LivesIn
100  Pettersson  {Stockholm, Linköping}
101  Andersson   {Linköping}
102  Svensson    {Ystad, Hjo, Berlin}

Normalization ↓

R1 (1NF)
ID   Name
100  Pettersson
101  Andersson
102  Svensson

R2 (1NF)
ID   LivesIn
100  Stockholm
100  Linköping
101  Linköping
102  Ystad
102  Hjo
102  Berlin

2NF

• 2NF: no nonprime attribute should be functionally dependent on a part of a candidate key (= partial dependency).

Rnon2NF
EmpID  Dept     Work%  EmpName
100    Dev      50     Baker
100    Support  50     Baker
200    Dev      80     Miller

Normalization ↓

R1 (2NF)
EmpID  EmpName
100    Baker
200    Miller

R2 (2NF)
EmpID  Dept     Work%
100    Dev      50
100    Support  50
200    Dev      80

2NF

• No 2NF: A part of a candidate key can have repeated values in the relation and thus so can the nonprime attribute, i.e. redundancy + insertion and modification anomalies.
• An FD X → Y is a full functional dependency (FFD) if removal of any attribute Ai from X means that the dependency does not hold any more.
• 2NF: Every nonprime attribute is fully functionally dependent on every candidate key.
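The 2NF decomposition above can be replayed on the example data: project Rnon2NF onto the attributes of R1 and R2, then join the two projections and compare with the original. The sketch below uses plain Python lists of dicts; `project`, `natural_join` and `same_relation` are illustrative helper names, not part of any library.

```python
def project(rows, attrs):
    """Projection onto `attrs`, with duplicate tuples removed."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


def natural_join(left, right):
    """Natural join on all attributes the two relations have in common."""
    result = []
    for lrow in left:
        for rrow in right:
            common = set(lrow) & set(rrow)
            if all(lrow[a] == rrow[a] for a in common):
                joined = {**lrow, **rrow}
                if joined not in result:
                    result.append(joined)
    return result


def same_relation(a, b):
    """Compare two relations as sets of tuples (row order is irrelevant)."""
    def as_set(rows):
        return {frozenset(r.items()) for r in rows}
    return as_set(a) == as_set(b)


r_non2nf = [
    {"EmpID": 100, "Dept": "Dev", "Work%": 50, "EmpName": "Baker"},
    {"EmpID": 100, "Dept": "Support", "Work%": 50, "EmpName": "Baker"},
    {"EmpID": 200, "Dept": "Dev", "Work%": 80, "EmpName": "Miller"},
]

r1 = project(r_non2nf, ["EmpID", "EmpName"])        # "Baker" is now stored once
r2 = project(r_non2nf, ["EmpID", "Dept", "Work%"])

print(len(r1), len(r2))                              # 2 3
print(same_relation(natural_join(r1, r2), r_non2nf)) # True: the join recovers Rnon2NF
```

The join here is on EmpID, which is the key of R1, so no spurious tuples appear for this state.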


3NF

• 3NF: 2NF + no nonprime attribute should be functionally dependent on a set of nonprime attributes.

Rnon3NF
ID   Name       Zip    City
100  Andersson  58214  Linköping
101  Björk      10223  Stockholm
102  Carlsson   58214  Linköping

Normalization ↓

R1 (3NF)
ID   Name       Zip
100  Andersson  58214
101  Björk      10223
102  Carlsson   58214

R2 (3NF)
Zip    City
58214  Linköping
10223  Stockholm

3NF

• No 3NF (but 2NF): A set of nonprime attributes can have repeated values in the relation and thus so can the nonprime attribute, i.e. redundancy + insertion and modification anomalies.
• An FD X → Y is a transitive dependency if there is a set of nonprime attributes Z such that both X → Z and Z → Y hold.
• 3NF: 2NF + no nonprime attribute is transitively dependent on any candidate key.

Little summary

• X → A
• 2NF and 3NF do nothing if A is prime.
• Assume A is nonprime.
• 2NF = decompose if X is part of a candidate key.
• 3NF = decompose if X is part of a candidate key or X is nonprime, i.e. if X → A is partial or transitive.
• 3NF = X is a superkey or A is prime.
• Should A be discriminated for being prime?

Boyce-Codd Normal Form

• BCNF: Every determinant is a superkey (in practice: every determinant is a candidate key)
• BCNF = decompose if X → A is such that X is not a superkey and A is a prime attribute.
• Example: Given R(A,B,C,D) and AB → CD, C → B. Then R is in 3NF but not in BCNF.
  - C is a determinant but not a superkey (tuples are not uniquely identified in R)

BCNF: Example

At a gym, an instructor is leading an activity in a certain room at a certain time.

RnonBCNF
Time       Room     Instructor  Activity
Mon 17.00  Gym      Tina        IronWoman
Mon 17.00  Mirrors  Anna        Aerobics
Tue 17.00  Gym      Tina        Intro
Tue 17.00  Mirrors  Anna        Aerobics
Wed 18.00  Gym      Anna        IronWoman

Properties of decomposition

• Keep all attributes from the universal relation R.
• Preserve the identified functional dependencies.
• Lossless join
  - It must be possible to join the smaller tables to arrive at composite information without spurious tuples.
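The conditions "3NF = X is a superkey or A is prime" and "BCNF = every determinant is a superkey" can be turned into a small checker and tried on the slide's example R(A,B,C,D) with AB → CD and C → B. This is a sketch only: the function names are made up, candidate keys are found by brute force over all attribute subsets (fine for four attributes, exponential in general), and only the listed FDs are inspected.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    schema = set(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = set(combo)
            if any(key <= cand for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(cand, fds) == schema:
                keys.append(cand)
    return keys


def violations(schema, fds, form):
    """Pairs (X, A) from the listed FDs that violate `form` ("3NF" or "BCNF")."""
    prime = set().union(*candidate_keys(schema, fds))
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(schema):
            continue  # X is a superkey: fine for both normal forms
        for a in sorted(set(rhs) - set(lhs)):
            if form == "3NF" and a in prime:
                continue  # 3NF tolerates a prime attribute on the right-hand side
            bad.append((set(lhs), a))
    return bad


schema = {"A", "B", "C", "D"}
fds = [({"A", "B"}, {"C", "D"}), ({"C"}, {"B"})]

print(candidate_keys(schema, fds))      # the keys AB and AC
print(violations(schema, fds, "3NF"))   # []
print(violations(schema, fds, "BCNF"))  # C -> B: C is not a superkey
```

Since B is part of the candidate key AB, the FD C → B passes the 3NF test but fails the BCNF test, which matches the slide.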


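Normalization: Example

Given universal relation
R(PID, PersonNamn, Land, Kontinent, KontinentYta, AntalBesökILandet)

(Attribute names are Swedish: PersonNamn = person name, Land = country, Kontinent = continent, KontinentYta = continent area, AntalBesökILandet = number of visits to the country.)

• Functional dependencies?
• Keys?

Normalization: Example

PID → PersonNamn
PID, Land → AntalBesökILandet
Land → Kontinent
Kontinent → KontinentYta

• Based on FDs, what are keys for R?
• Use inference rules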
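One way to explore the key question for this example is to compute attribute closures under the four FDs listed above. The closure routine is a standard technique rather than something the slides define, and the function name is illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


R = {"PID", "PersonNamn", "Land", "Kontinent", "KontinentYta", "AntalBesökILandet"}
fds = [
    ({"PID"}, {"PersonNamn"}),
    ({"PID", "Land"}, {"AntalBesökILandet"}),
    ({"Land"}, {"Kontinent"}),
    ({"Kontinent"}, {"KontinentYta"}),
]

print(closure({"PID", "Land"}, fds) == R)  # True: {PID, Land} determines everything
print(sorted(closure({"PID"}, fds)))       # ['PID', 'PersonNamn']
print(sorted(closure({"Land"}, fds)))      # ['Kontinent', 'KontinentYta', 'Land']
```

Neither PID nor Land alone reaches all attributes, so {PID, Land} is minimal. PID and Land also never appear on the right-hand side of any FD, so every key must contain both.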


Normalization: Example

Land → Kontinent, Kontinent → KontinentYta,
then
Land → Kontinent, KontinentYta (transitive rule)
then
PID, Land → Kontinent, KontinentYta (augmentation rule),
PID, Land → PersonNamn (augmentation rule),
PID, Land → AntalBesökILandet,
then
PID, Land → Kontinent, KontinentYta, PersonNamn, AntalBesökILandet (additive rule)

PID, Land is the key for R.

Normalization: Example

2NF: no nonprime attribute should be functionally dependent on a part of a candidate key.

Is R(PID, Land, Kontinent, KontinentYta, PersonNamn, AntalBesökILandet) in 2NF?
No, PersonNamn depends on a part of the key (PID), then
  R1(PID, PersonNamn)
  R2(PID, Land, Kontinent, KontinentYta, AntalBesökILandet)

Is R2 in 2NF?
No, Kontinent and KontinentYta depend on a part of the key (Land), then
  R1(PID, PersonNamn)
  R21(Land, Kontinent, KontinentYta)
  R22(PID, Land, AntalBesökILandet)

• R1, R21, R22 are in 2NF

Normalization: Example

3NF: 2NF + no nonprime attribute should be functionally dependent on a set of nonprime attributes (= no transitive dependency)

Are R1, R21, R22 in 3NF?

R22(PID, Land, AntalBesökILandet), R1(PID, PersonNamn):
Yes, a single nonprime attribute, no transitive dependencies.

R21(Land, Kontinent, KontinentYta):
No, Kontinent defines KontinentYta, then
  R211(Land, Kontinent)
  R212(Kontinent, KontinentYta)

• R1, R22, R211, R212 are in 3NF

Normalization: Example

BCNF: Every determinant is a superkey

Are R1, R22, R211, R212 in BCNF?

  R22(PID, Land, AntalBesökILandet)
  R1(PID, PersonNamn)
  R211(Land, Kontinent)
  R212(Kontinent, KontinentYta)

• Yes (don't be confused by candidate keys!)

Can the universal relation R be reproduced from R1, R22, R211 and R212 without spurious tuples?
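The question of whether R can be reproduced without spurious tuples can be examined with the chase (tableau) test for lossless joins. The slides do not present this algorithm, so the following is a sketch of a standard textbook technique under that assumption; `lossless_join` is an illustrative name, and the second decomposition is a deliberately poor one added for contrast.

```python
def lossless_join(schema, decomposition, fds):
    """Chase test: True if joining the sub-relations is guaranteed to be lossless."""
    attrs = sorted(schema)
    # One tableau row per sub-relation: "a" is the distinguished symbol,
    # (i, attr) is a symbol that initially occurs only in row i.
    rows = [
        {a: "a" if a in rel else (i, a) for a in attrs}
        for i, rel in enumerate(decomposition)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in rows:
                for r2 in rows:
                    if r1 is r2 or any(r1[a] != r2[a] for a in lhs):
                        continue
                    for a in rhs:
                        if r1[a] == r2[a]:
                            continue
                        # Equate the two symbols in column a, never dropping "a".
                        if r1[a] == "a":
                            keep, drop = r1[a], r2[a]
                        else:
                            keep, drop = r2[a], r1[a]
                        for row in rows:
                            if row[a] == drop:
                                row[a] = keep
                        changed = True
    return any(all(row[a] == "a" for a in attrs) for row in rows)


R = {"PID", "PersonNamn", "Land", "Kontinent", "KontinentYta", "AntalBesökILandet"}
fds = [
    ({"PID"}, {"PersonNamn"}),
    ({"PID", "Land"}, {"AntalBesökILandet"}),
    ({"Land"}, {"Kontinent"}),
    ({"Kontinent"}, {"KontinentYta"}),
]
final = [
    {"PID", "PersonNamn"},                 # R1
    {"PID", "Land", "AntalBesökILandet"},  # R22
    {"Land", "Kontinent"},                 # R211
    {"Kontinent", "KontinentYta"},         # R212
]
# Hypothetical bad split with no shared attributes, for contrast.
bad = [
    {"PID", "PersonNamn"},
    {"Land", "Kontinent", "KontinentYta", "AntalBesökILandet"},
]

print(lossless_join(R, final, fds))  # True
print(lossless_join(R, bad, fds))    # False
```

In the first case the row for R22 ends up with the distinguished symbol in every column, which is the chase criterion for a lossless join.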


Summary and open issues

• Good design: informal and formal properties of relations
• Functional dependencies, and thus normal forms, are about attribute semantics (= real-world knowledge); normalization can only be automated if FDs are given.
• Are high normal forms good design when it comes to performance?
  - No, denormalization may be required.

1. Which normal form?

• The database contains data about cars, their owners and when the car was registered for that owner.

PersonID  FirstName  LastName  LicensePlate  RegistrationDate  Birthdate
1000      Ann        Anderson  ABC123        2004-10-12        1981-04-04
1010      Ben        Benson    DEF234        2003-02-12        1945-12-12
1000      Ann        Anderson  ABC123        2001-04-23        1981-04-04


2. Which normal form?

• A database contains data about registered cars and their make (type).

LicensePlate  Type     Maker
ABC123        C70      Volvo
DEF234        S40      Volvo
FGH345        Corolla  Toyota

3. Which normal form?

• The database contains data about flights, aircraft and their pilots. Flights use different aircraft depending on the number of booked passengers.

Date         Flight  Aircraft    Pilot
13-Jan-2005  TGU7    Airbus 300  John
14-Jan-2005  TGU7    Boeing 747  Daniel
12-Jan-2005  SKX6    Airbus 300  John
13-Jan-2005  SKX6    Boeing 747  Ann
14-Jan-2005  SKX6    Fokker 50   Mary
