Database Normalization
Database Normalization (Olav Dæhli – 2018)
25.10.2018 OD: Normalization 1

What is normalization and why normalize?
• Normalization: A set of rules to decompose relations (tables) into smaller relations (tables), without losing any data dependencies.
• The reason for doing this is to avoid:
    • duplicates, when not needed
    • redundant data, when not needed
    • unnecessary null values (because they make selections and joins more difficult)
    • anomalies when updating data (when the same data have to be updated in several places)

Redundancy and Anomalies
• Redundant data
    • Data that are unnecessary to store in the database
    • Duplicated data (the same information stored in several places)
    • Information that can be derived/calculated from other data
        • Age from BirthDate
        • Price inclusive of VAT (Value Added Tax) calculated from price exclusive of VAT
• Anomalies
    • When the same data are stored in several places, update anomalies can occur

Functional Dependency
• X -> Y (means that Y is functionally dependent on X)
• X and Y can each be either one attribute (column) or a set of attributes (columns) (X1, X2, ..., Xi -> Y1, Y2, ..., Yj)
• X is called a determinant, since X functionally determines Y. We have to know the meaning of X and Y to decide if a functional dependency exists.
• Example:
    • PostalCode -> City (a specific postal code will always reference the same city). There exists a functional dependency between PostalCode and City, with PostalCode as the determinant.

Functional Dependency (FD)
• To know if a functional dependency exists, the data relations have to be analyzed by someone who knows the meaning of the data
[Figure: functional dependencies marked FD 1, FD 2 and FD 3]

Superkey
• A Superkey is one or more attributes (columns) in a relation (table) that uniquely determine all the attributes (columns) in the relation (table)
• All the attributes together (in a table) will always be a Superkey.
  Example:
    • RegNumber, CarMake, Color, OwnerId, OwnerLastName, PostalCode, City
• But lots of other combinations will also be Superkeys. Examples:
    • RegNumber, CarMake, Color, OwnerId, OwnerLastName, PostalCode
    • RegNumber, OwnerId, PostalCode
    • RegNumber

Candidate Key
• A Candidate Key is a minimal Superkey
    • Criterion 1: If any attribute is removed from a Candidate Key, it will lose its characteristics as a Superkey
• A Candidate Key is a «candidate» to be a Primary Key (a Primary Key is always selected amongst the Candidate Keys)
    • Criterion 2: Since it must have the ability to act as a Primary Key, it must be able to uniquely identify each and every row in the table

Candidate Key - Examples
• This is not a Candidate Key (but it is a Superkey): RegNumber, CarMake, Color, OwnerId, OwnerLastName, PostalCode, City
• This is a Candidate Key: RegNumber
• Case: PROJECTMEMBER (ProjectId, EmployeeId, HoursWorked)
    • This is a Candidate Key: ProjectId, EmployeeId
    • These are not Candidate Keys:
        • ProjectId (neither a Superkey nor a Candidate Key)
        • EmployeeId (neither a Superkey nor a Candidate Key)
        • ProjectId, EmployeeId, HoursWorked (a Superkey, but not a Candidate Key)

Using letters instead of column names
• When the functional dependencies are described, letters (instead of column names) are often used to simplify the expressions

Normalization Criteria
• 1NF: Only «atomic» values for each attribute
• 2NF: No partial functional dependencies
• 3NF: No transitive functional dependencies
• BCNF: No other functional dependencies
• The goal is to achieve at least 3NF

First normal form (1NF)
• Criterion: Only «atomic» values for each attribute
• Any multivalued attributes (repeating groups) have to be removed
[Figure: Example of repeating groups (violates the 1NF-rule)]

First normal form (1NF)
• One solution is to make one column for each car
• Problems:
    1) How many columns do we need?
    2) It will result in many NULL-values
    3) It will complicate searching for data
• This is a bad solution. We want to expand the number of rows, not the number of columns

First normal form (1NF) - Normalization
• Make a new table
• Move the repeating values to the new table
• To preserve the functional dependency, the repeating values will act as Primary Key and a new attribute will act as a Foreign Key to the original table
[Figure labels: Primary Key, Foreign Key]

Second normal form (2NF)
• The table must satisfy 1NF
• A Primary Key has to be chosen (amongst the Candidate Keys)
• To qualify for 2NF, no non-key attribute can be dependent on only a part of the Primary Key (i.e. «partial dependency» is not allowed)
• NB! Only relevant when the Primary Key consists of more than one attribute (composite Primary Key)
[Figure: table with the Primary Key marked] Partial functional dependency: C is dependent on a part of the Primary Key (B). This violates the 2NF-rule.

Second normal form (2NF) - Normalization
• Split the table into two tables.
• Remove the partial dependency, and make it a new table.
• B will become a Foreign Key.

Second normal form (2NF) - Example
• Split the table into two tables without 2NF-violations.
• CourseId will become a Foreign Key.

Third normal form (3NF)
• The table must satisfy 2NF
• A Primary Key has to be chosen (amongst the Candidate Keys)
• No transitive functional dependencies from the Primary Key are allowed (which means that dependencies between non-key attributes will break the rule)
• A -> F and F -> G (and then: A -> G) is a Transitive Dependency. It breaks the rule of the 3NF criterion.
[Figure: two dependencies marked as breaking the rule of the 3NF criterion]

Third normal form (3NF) - Normalization
[Figure: decomposition in three steps (1, 2, 3); one intermediate table is still breaking the 3NF-rule, the other tables are normalized to 3NF]

Third normal form (3NF) - Example
[Figure: decomposition in three steps (1, 2, 3); one intermediate table is still breaking the 3NF criterion, the other tables are normalized to 3NF]

Third normal form (3NF) - Example
• The result is that we end up with three 3NF-normalized tables, where all the original dependencies are preserved (through Foreign Keys)
[Figure labels: Foreign Key, Foreign Key]

BCNF (Boyce-Codd Normal Form)
• This is a very rare condition in practice, so we only take a brief look at it
• BCNF is achieved by removing all dependencies other than those specified in 2NF and 3NF
• To meet the requirements of BCNF, every determinant in the relation (table) has to be a Candidate Key
• A BCNF-violation can only occur in relations (tables) with more than one composite Candidate Key, where at least one common attribute overlaps.
  In other cases, 3NF and BCNF will be equivalent.

BCNF - Example
• Goal: A database for arranging parent-teacher conferences at a school
• Table Name: PARENT_TEACHER_CONFERENCE
• Constraint: A teacher is always assigned the same room for all the conferences listed on the same date. Various teachers can use the same room.
• Table before normalization: We start with the Universal Relation (all the data we want to register):
    Date, Time, Teacher, Pupil, Room

BCNF – Example: Identifying dependencies
[Figure: dependencies marked Candidate Key 1 (CK1) and Candidate Key 2 (CK2)]
• A third functional dependency has a determinant which is not a Candidate Key. This violates the BCNF criterion.

BCNF – Example: Normalization solution
• Remove the attributes involved in the functional dependency that violates BCNF, and place them in a new, separate table
• The remaining part becomes the second table
• The determinants of both dependencies are now Candidate Keys, and no other dependencies exist in the relations (tables)

Normalization - A step-by-step strategy
1) Start with the Universal Relation (all the attributes from all the tables)
2) Analyze the data to reveal all the functional dependencies
3) Remove any repeating values (1NF criterion)
4) Identify all Candidate Keys
5) Select a Primary Key
6) Remove any partial dependencies (to achieve 2NF)
7) Remove any transitive dependencies (to achieve 3NF)
8) Remove any remaining dependencies (to achieve BCNF)

Problems with Normalization
• Normalization leads to smaller tables
• Smaller tables (and more foreign keys) complicate inserting data
• Normalization increases the need for joining tables when performing queries against the database.
• More duplicated data, because of more foreign keys (but less redundancy for the same reason, which is positive)
• 3NF is often the goal (because the rules of BCNF are more complicated to maintain than the rules of 3NF)
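
Sketch: checking keys and dependencies in code

Several steps of the step-by-step strategy (identifying Candidate Keys, finding partial dependencies, finding determinants that are not Candidate Keys) can be checked mechanically once the functional dependencies have been written down. The Python sketch below is only an illustration and is not part of the original slides. All function names are hypothetical, and the dependencies listed for the car table are assumptions, except PostalCode -> City, which is stated in the slides. The primary-key based 2NF check follows the definition used in these slides.

```python
from itertools import combinations


def fd(lhs, rhs):
    """Build a functional dependency lhs -> rhs from space-separated names."""
    return frozenset(lhs.split()), frozenset(rhs.split())


def closure(attrs, fds):
    """Return every attribute that is functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_superkey(attrs, relation, fds):
    """A Superkey determines all the attributes in the relation."""
    return closure(attrs, fds) >= relation


def candidate_keys(relation, fds):
    """Return the minimal Superkeys, trying the smallest combinations first."""
    keys = []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            attrs = frozenset(combo)
            if any(key <= attrs for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if is_superkey(attrs, relation, fds):
                keys.append(attrs)
    return keys


def partial_dependencies(primary_key, fds):
    """2NF check: attributes determined by only a part of the Primary Key."""
    found = {}
    for size in range(1, len(primary_key)):
        for combo in combinations(sorted(primary_key), size):
            dependent = closure(combo, fds) - set(primary_key)
            if dependent:
                found[frozenset(combo)] = dependent
    return found


def bcnf_violations(relation, fds):
    """BCNF check: dependencies whose determinant is not a Superkey."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not is_superkey(lhs, relation, fds)]


# Car table from the Superkey slides. Only PostalCode -> City is given in
# the slides; the other two dependencies are assumptions for illustration.
CAR = frozenset("RegNumber CarMake Color OwnerId OwnerLastName "
                "PostalCode City".split())
CAR_FDS = [
    fd("RegNumber", "CarMake Color OwnerId"),
    fd("OwnerId", "OwnerLastName PostalCode"),
    fd("PostalCode", "City"),
]

# PROJECTMEMBER case from the Candidate Key examples.
PROJECTMEMBER = frozenset("ProjectId EmployeeId HoursWorked".split())
PM_FDS = [fd("ProjectId EmployeeId", "HoursWorked")]

# Letter example from the 2NF slide: Primary Key (A, B), and B -> C.
ABC_FDS = [fd("B", "C")]

print([sorted(k) for k in candidate_keys(CAR, CAR_FDS)])
# [['RegNumber']]
print([sorted(k) for k in candidate_keys(PROJECTMEMBER, PM_FDS)])
# [['EmployeeId', 'ProjectId']]
print({tuple(sorted(k)): sorted(v)
       for k, v in partial_dependencies(frozenset("AB"), ABC_FDS).items()})
# {('B',): ['C']}
print([sorted(lhs) for lhs, _ in bcnf_violations(CAR, CAR_FDS)])
# [['OwnerId'], ['PostalCode']]
```

Under the assumed dependencies, OwnerId and PostalCode are determinants that are not Candidate Keys. In the car table these are also the transitive dependencies that the 3NF step would remove. The search over all attribute combinations is exponential, which is acceptable for small teaching examples but not for wide tables.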