Functional Dependencies and Normalization
Jose M. Peña
(slides kindly provided by Vaida Jakonienė)

Overview
[Figure: overview diagram with the labels Real world, Model, Databases, Queries, Answers, DBMS, Processing of queries and updates, Access to stored data, Physical database]
Good Design
- Can we be sure that a translation from an EER diagram to relational tables results in a good database design?
- Confronted with a deployed database, how can we be sure that it is well designed?
- What is good database design?
  - Four informal measures
  - Formal measure: normalization

Informal design guideline
- Easy to explain semantics of the relation schema
- Reducing redundant information in tuples
  - Redundancy causes update anomalies:
    - Insertion anomalies
    - Deletion anomalies
    - Modification anomalies

EMP(EMPID, EMPNAME, DEPTNAME, DEPTMGR)
EMPID  EMPNAME  DEPTNAME        DEPTMGR
123    Smith    Research        999
333    Wong     Research        999
888    Borg     Administration  null
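To make the modification anomaly concrete, here is a minimal Python sketch using the EMP rows from the slide. It assumes (the slide does not state this explicitly) that each department has exactly one manager. The helper name `managers_per_department` and the new manager ID 777 are made up for illustration.

```python
# EMP rows copied from the slide; None stands in for the null manager.
emp = [
    {"EMPID": 123, "EMPNAME": "Smith", "DEPTNAME": "Research", "DEPTMGR": 999},
    {"EMPID": 333, "EMPNAME": "Wong", "DEPTNAME": "Research", "DEPTMGR": 999},
    {"EMPID": 888, "EMPNAME": "Borg", "DEPTNAME": "Administration", "DEPTMGR": None},
]


def managers_per_department(rows):
    """Collect the set of DEPTMGR values stored for each DEPTNAME."""
    result = {}
    for row in rows:
        result.setdefault(row["DEPTNAME"], set()).add(row["DEPTMGR"])
    return result


# Modification anomaly: Research gets a new manager (777 is a hypothetical ID),
# but only Smith's tuple is updated. Wong's tuple still says 999.
emp[0]["DEPTMGR"] = 777

# Departments that now store more than one manager are inconsistent.
inconsistent = {
    dept: mgrs
    for dept, mgrs in managers_per_department(emp).items()
    if len(mgrs) > 1
}
print(inconsistent)
```

Because the manager is stored once per employee rather than once per department, a single missed tuple is enough to leave Research with two different managers.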
Informal design guideline
Sometimes it may be desirable to keep redundancy to gain runtime performance, i.e. to trade space for time. In that case, to still avoid update anomalies, either
- use triggers or stored procedures to update the base tables, or
- keep the base tables free of redundancy and use views (assuming that the views are materialized).

Informal design guideline
- Reducing NULL values in tuples. Why?
  - Efficient use of space
  - Avoid costly outer joins
  - Ambiguous interpretation (unknown vs. doesn't apply)
- Disallow the possibility of generating spurious tuples
  - Figures 10.5 and 10.6: cartesian product results in incorrect tuples
  - Only join on foreign key/primary key attributes
  - Lossless join property: guarantees that the spurious tuple generation problem does not occur

Functional dependencies (FD)
Let R be a relational schema with the attributes A1,...,An and let X and Y be subsets of {A1,...,An}. Let r(R) denote a relation in relational schema R.

We say that X functionally determines Y, X → Y, if for all relations r(R) (i.e. for any relation extension or state) and for each pair of tuples t1, t2 ∈ r(R):
  If t1[X] = t2[X] then we must also have t1[Y] = t2[Y]

Despite the mathematical definition, an FD cannot be determined automatically. It is a property of the semantics of the attributes.

Inference rules
1. If X ⊇ Y then X → Y, or X → X (reflexive rule)
2. X → Y |= XZ → YZ (augmentation rule)
3. X → Y, Y → Z |= X → Z (transitive rule)
4. X → YZ |= X → Y (decomposition rule)
5. X → Y, X → Z |= X → YZ (union or additive rule)
6. X → Y, WY → Z |= WX → Z (pseudotransitive rule)

Inference rules
- Textbook, page 341: "… X → A, and Y → B does not imply that XY → AB." Prove that this statement is wrong.
- Prove inference rules 4, 5 and 6 by using only inference rules 1, 2 and 3.

Definitions
- Superkey: a set of attributes that uniquely (but not necessarily minimally!) identifies a tuple of a relation.
- Key: a set of attributes that uniquely and minimally identifies a tuple of a relation.
- Candidate key: if there is more than one key in a relation, the keys are called candidate keys.
- Primary key: one candidate key is chosen to be the primary key.
- Prime attribute: an attribute A that is part of a candidate key X (vs. nonprime attribute).
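The FD definition and the inference rules can be tried out with a small Python sketch. `fd_holds` tests the pairwise-tuple condition on one relation state, so it can only refute an FD, never establish it (an FD is a statement about all states). `closure` repeatedly applies the given FDs, which is one common way to mechanize rules 1 to 3, and `is_superkey` builds on it. All three function names, and the FDs assumed for EMP, are illustrative assumptions and do not come from the slides.

```python
def fd_holds(rows, lhs, rhs):
    """True if no pair of tuples in THIS state agrees on lhs but differs on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first Y-value seen for each X-value; any other is a violation.
        if seen.setdefault(x, y) != y:
            return False
    return True


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, all_attrs, fds):
    """A superkey determines every attribute of the relation."""
    return closure(attrs, fds) >= set(all_attrs)


emp = [
    {"EMPID": 123, "EMPNAME": "Smith", "DEPTNAME": "Research", "DEPTMGR": 999},
    {"EMPID": 333, "EMPNAME": "Wong", "DEPTNAME": "Research", "DEPTMGR": 999},
    {"EMPID": 888, "EMPNAME": "Borg", "DEPTNAME": "Administration", "DEPTMGR": None},
]
# Assumed FDs for EMP (read off the example, not stated in the slides).
emp_attrs = {"EMPID", "EMPNAME", "DEPTNAME", "DEPTMGR"}
emp_fds = [({"EMPID"}, {"EMPNAME", "DEPTNAME"}), ({"DEPTNAME"}, {"DEPTMGR"})]

print(fd_holds(emp, ["DEPTNAME"], ["DEPTMGR"]))     # not refuted by this state
print(fd_holds(emp, ["DEPTMGR"], ["EMPNAME"]))      # refuted: 999 -> Smith and Wong
print(is_superkey({"EMPID"}, emp_attrs, emp_fds))
print(is_superkey({"DEPTNAME"}, emp_attrs, emp_fds))
```

The closure is a convenient substitute for hand derivations: X → Y follows from the given FDs exactly when Y is contained in the closure of X.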
Normal Forms
- 1NF, 2NF, 3NF, BCNF (4NF, 5NF)
- Minimize redundancy
- Minimize update anomalies
- Normal form ↑ = redundancy and update anomalies ↓, and relations become smaller.
- Join operation to recover original relations.

1NF
1NF: The relation should have no non-atomic values.
What about multi-valued composite attributes?

Rnon1NF
ID   Name        LivesIn
100  Pettersson  {Stockholm, Linköping}
101  Andersson   {Linköping}
102  Svensson    {Ystad, Hjo, Berlin}

Normalization:

R1 (1NF)
ID   Name
100  Pettersson
101  Andersson
102  Svensson

R2 (1NF)
ID   LivesIn
100  Stockholm
100  Linköping
101  Linköping
102  Ystad
102  Hjo
102  Berlin

2NF
2NF: no nonprime attribute should be functionally dependent on a part of a candidate key (= partial dependency).

Rnon2NF
EmpID  Dept     Work%  EmpName
100    Dev      50     Baker
100    Support  50     Baker
200    Dev      80     Miller

Normalization:

R1 (2NF)
EmpID  EmpName
100    Baker
200    Miller

R2 (2NF)
EmpID  Dept     Work%
100    Dev      50
100    Support  50
200    Dev      80

2NF
- No 2NF: A part of a candidate key can have repeated values in the relation and, thus, so can the nonprime attribute that depends on it, i.e. redundancy + insertion and modification anomalies.
- An FD X → Y is a full functional dependency (FFD) if removal of any attribute Ai from X means that the dependency does not hold any more.
- 2NF: Every nonprime attribute is fully functionally dependent on every candidate key.
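The 2NF condition can be checked mechanically once the FDs and the key are written down. The sketch below looks for nonprime attributes that are determined by a proper subset of the key. It assumes a single candidate key (with several keys it would have to be run once per key), and the FDs for the employee example are an assumption read off the slide's decomposition. The function names are illustrative.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(key, attrs, fds):
    """List (part_of_key, nonprime_attribute) pairs that violate 2NF.

    Assumes `key` is the only candidate key of the relation.
    """
    nonprime = set(attrs) - set(key)
    violations = []
    for size in range(1, len(key)):          # proper, non-empty subsets only
        for part in combinations(sorted(key), size):
            for attr in sorted(closure(part, fds) & nonprime):
                violations.append((part, attr))
    return violations


# Assumed FDs for Rnon2NF: EmpID -> EmpName and EmpID, Dept -> Work%.
fds = [({"EmpID"}, {"EmpName"}), ({"EmpID", "Dept"}, {"Work%"})]

non2nf = partial_dependencies({"EmpID", "Dept"}, {"EmpID", "Dept", "Work%", "EmpName"}, fds)
r1 = partial_dependencies({"EmpID"}, {"EmpID", "EmpName"}, fds)
r2 = partial_dependencies({"EmpID", "Dept"}, {"EmpID", "Dept", "Work%"}, fds)
print(non2nf)  # EmpName depends on EmpID alone
print(r1, r2)  # no partial dependencies left after the decomposition
```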
3NF
3NF: 2NF + no nonprime attribute should be functionally dependent on a set of nonprime attributes.

Rnon3NF
ID   Name       Zip    City
100  Andersson  58214  Linköping
101  Björk      10223  Stockholm
102  Carlsson   58214  Linköping

Normalization:

R1 (3NF)
ID   Name       Zip
100  Andersson  58214
101  Björk      10223
102  Carlsson   58214

R2 (3NF)
Zip    City
58214  Linköping
10223  Stockholm

3NF
- No 3NF (but 2NF): A set of nonprime attributes can have repeated values in the relation and, thus, so can the nonprime attribute that depends on it, i.e. redundancy + insertion and modification anomalies.
- An FD X → Y is a transitive dependency if there is a set of nonprime attributes Z such that both X → Z and Z → Y hold.
- 3NF: 2NF + no nonprime attribute is transitively dependent on any candidate key.
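The 3NF decomposition of the zip-code example can be replayed on the sample data. The sketch below projects the original rows onto R1(ID, Name, Zip) and R2(Zip, City), joins them back on the shared attribute Zip, and compares the result with the original. For contrast it also joins a split with no shared attribute, which degenerates into a cartesian product with spurious tuples. The helper names are illustrative, and a check on one sample state is only an illustration of the lossless join property, not a proof.

```python
def project(rows, attrs):
    """Projection with duplicate elimination; rows are dicts."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


def natural_join(left, right):
    """Join on all attributes the two relations have in common."""
    joined = []
    for l in left:
        for r in right:
            common = l.keys() & r.keys()
            if all(l[a] == r[a] for a in common):
                joined.append({**l, **r})
    return joined


def same_relation(a, b):
    """Compare two relations as sets of tuples (order does not matter)."""
    as_set = lambda rows: {frozenset(row.items()) for row in rows}
    return as_set(a) == as_set(b)


rows = [
    {"ID": 100, "Name": "Andersson", "Zip": 58214, "City": "Linköping"},
    {"ID": 101, "Name": "Björk", "Zip": 10223, "City": "Stockholm"},
    {"ID": 102, "Name": "Carlsson", "Zip": 58214, "City": "Linköping"},
]

# Decomposition from the slide: R1(ID, Name, Zip) and R2(Zip, City).
good = natural_join(project(rows, ["ID", "Name", "Zip"]), project(rows, ["Zip", "City"]))
# A careless split with no shared attribute: the join is a cartesian product.
bad = natural_join(project(rows, ["ID", "Name"]), project(rows, ["Zip", "City"]))

print(same_relation(good, rows), len(good))
print(same_relation(bad, rows), len(bad))
```

The second join yields six tuples for three people, i.e. every person is paired with every zip code, which is the spurious tuple problem mentioned in the informal guidelines.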
Little summary
X → A
- 2NF and 3NF do nothing if A is prime.
- Assume A is nonprime.
  - 2NF = decompose if X is part of a candidate key.
  - 3NF = decompose if X is part of a candidate key or X is nonprime, i.e. if X → A is partial or transitive.
- 3NF = X is a superkey or A is prime.
- Should A be discriminated for being prime?

Boyce-Codd Normal Form
- BCNF: Every determinant is a superkey (in practice: every determinant is a candidate key).
- BCNF = decompose if X → A is such that X is not a superkey and A is a prime attribute.
- Example: Given R(A, B, C, D) and AB → CD, C → B. Then R is in 3NF but not in BCNF: C is a determinant but not a superkey (C does not uniquely identify the tuples in R).

BCNF: Example
At a gym, an instructor is leading an activity in a certain room at a certain time.

RnonBCNF
Time       Room     Instructor  Activity
Mon 17.00  Gym      Tina        IronWoman
Mon 17.00  Mirrors  Anna        Aerobics
Tue 17.00  Gym      Tina        Intro
Tue 17.00  Mirrors  Anna        Aerobics
Wed 18.00  Gym      Anna        IronWoman

Properties of decomposition
- Keep all attributes from the universal relation R.
- Preserve the identified functional dependencies.
- Lossless join: It must be possible to join the smaller tables to arrive at composite information without spurious tuples.
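The R(A, B, C, D) example can be checked with a short Python sketch. It enumerates candidate keys by brute force (fine for a handful of attributes), then tests each listed FD against the BCNF condition (determinant is a superkey) and the 3NF condition (determinant is a superkey or the dependent attribute is prime). It only inspects the FDs as given, and the function names are illustrative assumptions.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Brute-force search for minimal attribute sets whose closure is everything."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == set(attrs):
                keys.append(candidate)
    return keys


def normal_form_violations(attrs, fds):
    """Return (keys, BCNF violations, 3NF violations) for the listed FDs."""
    keys = candidate_keys(attrs, fds)
    prime = set().union(*keys)
    bcnf, third = [], []
    for lhs, rhs in fds:
        if closure(lhs, fds) == set(attrs):
            continue  # determinant is a superkey: fine for both normal forms
        for a in sorted(rhs - lhs):
            bcnf.append((lhs, a))
            if a not in prime:
                third.append((lhs, a))
    return keys, bcnf, third


attrs = {"A", "B", "C", "D"}
fds = [({"A", "B"}, {"C", "D"}), ({"C"}, {"B"})]
keys, bcnf, third = normal_form_violations(attrs, fds)
print(keys)   # candidate keys AB and AC
print(bcnf)   # C -> B violates BCNF
print(third)  # empty: B is prime, so 3NF is not violated
```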
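Normalization: Example
Given universal relation
  R(PID, PersonNamn, Land, Kontinent, KontinentYta, AntalBesökILandet)
(Swedish attribute names: PersonNamn = person name, Land = country, Kontinent = continent, KontinentYta = continent area, AntalBesökILandet = number of visits to the country.)
- Functional dependencies?
- Keys?

Normalization: Example
  PID → PersonNamn
  PID, Land → AntalBesökILandet
  Land → Kontinent
  Kontinent → KontinentYta
Based on the FDs, what are the keys for R? Use inference rules.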
Normalization: Example
Land → Kontinent, Kontinent → KontinentYta,
then
  Land → Kontinent, KontinentYta (transitive rule)
then
  PID, Land → Kontinent, KontinentYta (augmentation rule),
  PID, Land → PersonNamn (augmentation rule),
  PID, Land → AntalBesökILandet,
then
  PID, Land → Kontinent, KontinentYta, PersonNamn, AntalBesökILandet (additive rule)

PID, Land is the key for R. (Neither PID nor Land alone determines all attributes, so the key is minimal.)

Normalization: Example
2NF: no nonprime attribute should be functionally dependent on a part of a candidate key.

Is R(PID, Land, Kontinent, KontinentYta, PersonNamn, AntalBesökILandet) in 2NF?
No, PersonNamn depends on a part of the key (PID), then
  R1(PID, PersonNamn)
  R2(PID, Land, Kontinent, KontinentYta, AntalBesökILandet)

Is R2 in 2NF?
No, Kontinent and KontinentYta depend on a part of the key (Land), then
  R1(PID, PersonNamn)
  R21(Land, Kontinent, KontinentYta)
  R22(PID, Land, AntalBesökILandet)

R1, R21, R22 are in 2NF.
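The key derivation above can be cross-checked with an attribute closure, which applies the given FDs until nothing new is added. This is a minimal sketch; the `closure` helper is an illustrative assumption, while the attribute names and FDs are the ones from the example.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


R = {"PID", "PersonNamn", "Land", "Kontinent", "KontinentYta", "AntalBesökILandet"}
fds = [
    ({"PID"}, {"PersonNamn"}),
    ({"PID", "Land"}, {"AntalBesökILandet"}),
    ({"Land"}, {"Kontinent"}),
    ({"Kontinent"}, {"KontinentYta"}),
]

# {PID, Land} determines every attribute, so it is a superkey ...
print(closure({"PID", "Land"}, fds) == R)
# ... and neither part alone does, so it is minimal, i.e. a key.
print(closure({"PID"}, fds) == R)
print(closure({"Land"}, fds) == R)
```

The two smaller closures also show where the partial dependencies come from: PID alone reaches PersonNamn, and Land alone reaches Kontinent and KontinentYta.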
Normalization: Example
3NF: 2NF + no nonprime attribute should be functionally dependent on a set of nonprime attributes (= no transitive dependency).

Are R1, R21, R22 in 3NF?
- R22(PID, Land, AntalBesökILandet), R1(PID, PersonNamn): Yes, a single nonprime attribute, no transitive dependencies.
- R21(Land, Kontinent, KontinentYta): No, Kontinent determines KontinentYta, then
    R211(Land, Kontinent)
    R212(Kontinent, KontinentYta)

R1, R22, R211, R212 are in 3NF.

Normalization: Example
BCNF: Every determinant is a superkey.

Are R1, R22, R211, R212 in BCNF?
  R22(PID, Land, AntalBesökILandet)
  R1(PID, PersonNamn)
  R211(Land, Kontinent)
  R212(Kontinent, KontinentYta)
Yes (don't be confused by candidate keys!)

Can the universal relation R be reproduced from R1, R22, R211 and R212 without spurious tuples?
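One standard way to approach the closing question is the tableau ("chase") test for the lossless join property, which works on the schemas and FDs alone, without sample data. The sketch below is a minimal version of that test. The function name `is_lossless` and the symbol naming are illustrative assumptions, and the slides do not prescribe this method.

```python
def is_lossless(attrs, decomposition, fds):
    """Chase test: True if joining the decomposed schemas never adds spurious tuples."""
    # One row per schema: "a" where the schema has the attribute,
    # otherwise a symbol that is unique to that row and column.
    tableau = [
        {A: "a" if A in schema else f"b{i}_{A}" for A in attrs}
        for i, schema in enumerate(decomposition)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in tableau:
                for r2 in tableau:
                    if r1 is r2 or any(r1[A] != r2[A] for A in lhs):
                        continue
                    # r1 and r2 agree on lhs, so they must agree on rhs too.
                    for A in rhs:
                        if r1[A] == r2[A]:
                            continue
                        keep = "a" if "a" in (r1[A], r2[A]) else min(r1[A], r2[A])
                        drop = r2[A] if keep == r1[A] else r1[A]
                        for row in tableau:  # equate the two symbols everywhere
                            if row[A] == drop:
                                row[A] = keep
                        changed = True
    # Lossless iff some row consists of "a" symbols only.
    return any(all(v == "a" for v in row.values()) for row in tableau)


R = {"PID", "PersonNamn", "Land", "Kontinent", "KontinentYta", "AntalBesökILandet"}
fds = [
    ({"PID"}, {"PersonNamn"}),
    ({"PID", "Land"}, {"AntalBesökILandet"}),
    ({"Land"}, {"Kontinent"}),
    ({"Kontinent"}, {"KontinentYta"}),
]
R1 = {"PID", "PersonNamn"}
R22 = {"PID", "Land", "AntalBesökILandet"}
R211 = {"Land", "Kontinent"}
R212 = {"Kontinent", "KontinentYta"}

print(is_lossless(R, [R1, R22, R211, R212], fds))
```

Under the four FDs of the example the row for R22 ends up with "a" in every column, so the test reports the decomposition as lossless. Dropping R22 from the list, for instance, makes the test fail.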
Summary and open issues
- Good design: informal and formal properties of relations
- Functional dependencies, and thus normal forms, are about attribute semantics (= real-world knowledge); normalization can only be automated if the FDs are given.
- Are high normal forms good design when it comes to performance? No, denormalization may be required.

1. Which normal form?
The database contains data about cars, their owners and when the car was registered for that owner.

PersonID  FirstName  LastName  LicensePlate  RegistrationDate  Birthdate
1000      Ann        Anderson  ABC123        2004-10-12        1981-04-04
1010      Ben        Benson    DEF234        2003-02-12        1945-12-12
1000      Ann        Anderson  ABC123        2001-04-23        1981-04-04
2. Which normal form?
A database contains data about registered cars and their make (type).

LicensePlate  Type     Maker
ABC123        C70      Volvo
DEF234        S40      Volvo
FGH345        Corolla  Toyota

3. Which normal form?
The database contains data about flights, aircraft and their pilots. Flights use different aircraft depending on the number of booked passengers.

Date         Flight  Aircraft    Pilot
13-Jan-2005  TGU7    Airbus 300  John
14-Jan-2005  TGU7    Boeing 747  Daniel
12-Jan-2005  SKX6    Airbus 300  John
13-Jan-2005  SKX6    Boeing 747  Ann
14-Jan-2005  SKX6    Fokker 50   Mary