2-14-20 Notes

❖ Surrogate Keys
➢ A surrogate key is an artificial column added to a relation to serve as its primary key:
▪ DBMS supplied
▪ Short, numeric, and never changes, which makes it an ideal primary key
▪ Has artificial values that are meaningless to users
▪ Normally hidden in forms and reports
➢ Note: The primary key of the relation is underlined below:
▪ RENTAL_PROPERTY without surrogate key:
• RENTAL_PROPERTY (Street, City, State/Province, Zip/PostalCode, Country, Rental_Rate)
▪ RENTAL_PROPERTY with surrogate key:
• RENTAL_PROPERTY (PropertyID, Street, City, State/Province, Zip/PostalCode, Country, Rental_Rate)
➢ For performance reasons, a primary key should be short and numeric; when the natural key is not, create a surrogate key as in the example above
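A minimal sketch, assuming SQLite (through Python's standard `sqlite3` module) as the DBMS: a column declared `INTEGER PRIMARY KEY` is filled in by SQLite whenever an INSERT leaves it out, which matches the "DBMS supplied" property above. The column names `StateProvince` and `ZipPostalCode` stand in for State/Province and Zip/PostalCode because a slash is not allowed in an unquoted SQL identifier; the helper `add_property` and the sample addresses are hypothetical.

```python
import sqlite3

# In-memory SQLite database used only for illustration.
conn = sqlite3.connect(":memory:")

# PropertyID is the surrogate key. In SQLite an INTEGER PRIMARY KEY column
# receives a value from the DBMS whenever an INSERT leaves it out.
conn.execute("""
    CREATE TABLE RENTAL_PROPERTY (
        PropertyID    INTEGER PRIMARY KEY,
        Street        TEXT NOT NULL,
        City          TEXT NOT NULL,
        StateProvince TEXT,
        ZipPostalCode TEXT,
        Country       TEXT NOT NULL,
        Rental_Rate   NUMERIC
    )
""")

def add_property(street, city, state, zip_code, country, rate):
    """Insert a property (hypothetical helper) and return the DBMS-supplied key."""
    cur = conn.execute(
        "INSERT INTO RENTAL_PROPERTY "
        "(Street, City, StateProvince, ZipPostalCode, Country, Rental_Rate) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (street, city, state, zip_code, country, rate),
    )
    return cur.lastrowid

# Invented sample data: the user never supplies a PropertyID.
first_id = add_property("123 Main St", "Seattle", "WA", "98101", "USA", 1500)
second_id = add_property("45 Elm St", "Vancouver", "BC", "V5K 0A1", "Canada", 1800)
print(first_id, second_id)  # 1 2
```

Because users never type or see `PropertyID`, forms and reports built on this table can simply leave it out.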

❖ Foreign Keys
➢ A foreign key is a column or composite of columns that is the primary key of a table other than the one in which it appears. The term arises because the column appears in one table but is the primary key of another, "foreign" table
➢ Note: The primary keys of the relations are underlined and any foreign keys are in italics in the relations below:
▪ DEPARTMENT (DepartmentName, BudgetCode, OfficeNumber, DepartmentPhone)
▪ EMPLOYEE (EmployeeNumber, LastName, FirstName, DepartmentName)
➢ Foreign keys express relationships between rows of tables
➢ EMPLOYEE.DepartmentName stores the relationship between an employee and that employee's department
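The sketch below, again assuming SQLite via Python's `sqlite3` module, declares `EMPLOYEE.DepartmentName` as a foreign key to `DEPARTMENT` and then joins the two tables on it, showing how the shared value links an employee row to its department row. The department and employee values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE DEPARTMENT (
        DepartmentName  TEXT PRIMARY KEY,
        BudgetCode      TEXT,
        OfficeNumber    TEXT,
        DepartmentPhone TEXT
    )
""")
conn.execute("""
    CREATE TABLE EMPLOYEE (
        EmployeeNumber INTEGER PRIMARY KEY,
        LastName       TEXT,
        FirstName      TEXT,
        DepartmentName TEXT REFERENCES DEPARTMENT (DepartmentName)  -- foreign key
    )
""")

# Invented sample rows.
conn.execute("INSERT INTO DEPARTMENT VALUES ('Accounting', 'BC-100', 'B1-300', '555-0100')")
conn.execute("INSERT INTO EMPLOYEE VALUES (100, 'Jones', 'Mary', 'Accounting')")

# The matching DepartmentName values relate the employee row to the department row.
rows = conn.execute("""
    SELECT E.LastName, D.DepartmentName, D.DepartmentPhone
    FROM EMPLOYEE AS E
    JOIN DEPARTMENT AS D ON E.DepartmentName = D.DepartmentName
""").fetchall()
print(rows)  # [('Jones', 'Accounting', '555-0100')]
```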

❖ Referential Integrity Constraint
➢ A referential integrity constraint is a statement that limits the values of the foreign key to those already existing as primary key values in the corresponding relation:
▪ SKU in ORDER_ITEM must exist in SKU in SKU_DATA
➢ Note: The primary key of each relation is underlined and any foreign keys are in italics in the relations below:
▪ SKU_DATA (SKU, SKU_Description, Department, Buyer)
▪ ORDER_ITEM (OrderNumber, SKU, Quantity, Price, ExtendedPrice)
▪ Where ORDER_ITEM.SKU must exist in SKU_DATA.SKU
➢ A referential integrity constraint allows one other option: the foreign key value can be empty (NULL)
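A minimal sketch of SQLite enforcing "SKU in ORDER_ITEM must exist in SKU in SKU_DATA". Assumptions: SQLite through Python's `sqlite3` module (which checks foreign keys only after `PRAGMA foreign_keys = ON`), (OrderNumber, SKU) as the primary key of ORDER_ITEM, a hypothetical helper `add_order_item`, and invented SKU values. Because SKU is part of the assumed primary key, it is declared NOT NULL here, so the NULL option above does not apply to this particular foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # turn on foreign key enforcement

conn.execute("""
    CREATE TABLE SKU_DATA (
        SKU             INTEGER PRIMARY KEY,
        SKU_Description TEXT,
        Department      TEXT,
        Buyer           TEXT
    )
""")
conn.execute("""
    CREATE TABLE ORDER_ITEM (
        OrderNumber   INTEGER NOT NULL,
        SKU           INTEGER NOT NULL REFERENCES SKU_DATA (SKU),
        Quantity      INTEGER,
        Price         NUMERIC,
        ExtendedPrice NUMERIC,
        PRIMARY KEY (OrderNumber, SKU)
    )
""")

# Invented catalog row.
conn.execute("INSERT INTO SKU_DATA VALUES (100100, 'Item A', 'Dept 1', 'Buyer 1')")

def add_order_item(order_number, sku, quantity, price):
    """Try to insert an ORDER_ITEM row; return True if accepted, False if rejected."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO ORDER_ITEM VALUES (?, ?, ?, ?, ?)",
                (order_number, sku, quantity, price, quantity * price),
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok = add_order_item(1000, 100100, 2, 50.0)   # SKU exists in SKU_DATA
bad = add_order_item(1000, 999999, 1, 20.0)  # SKU does not exist in SKU_DATA
print(ok, bad)  # True False
```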

❖ Database Integrity
➢ The Domain Integrity Constraint – all values in a column are of the same kind
➢ The Entity Integrity Constraint – the primary key must have unique data values in every row of the table
➢ The Referential Integrity Constraint – limits the values of the foreign key to those already existing as primary key values in the corresponding relation
➢ Taken as a whole, the purpose of these three constraints is to create database integrity, which means that the data in our database will be useful, meaningful data
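Entity and domain integrity can be sketched the same way in SQLite. The table `EMPLOYEE_PAY`, its `Salary` column, the helper `try_insert`, and the domain rule (Salary must be a non-negative number, written as a CHECK constraint) are all hypothetical examples, not part of the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# EmployeeNumber: entity integrity (each row needs a unique key value).
# CHECK clause: a simple, hypothetical domain rule for Salary.
conn.execute("""
    CREATE TABLE EMPLOYEE_PAY (
        EmployeeNumber INTEGER PRIMARY KEY,
        LastName       TEXT NOT NULL,
        Salary         NUMERIC
            CHECK (typeof(Salary) IN ('integer', 'real') AND Salary >= 0)
    )
""")

def try_insert(emp_no, last_name, salary):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute("INSERT INTO EMPLOYEE_PAY VALUES (?, ?, ?)",
                         (emp_no, last_name, salary))
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert(100, "Jones", 52000),   # accepted: valid row
    try_insert(100, "Smith", 48000),   # rejected: duplicate primary key value
    try_insert(200, "Lee", "a lot"),   # rejected: Salary outside its domain
]
print(results)  # [True, False, False]
```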

❖ Modification Anomalies
➢ Deletion Anomaly – in a relation, the situation in which the removal of one row of a table deletes facts about two or more themes
▪ Example, EQUIPMENT_REPAIR table: deleting the row for RepairNumber 2100 also removes information about the machine, such as its EquipmentType and AcquisitionCost
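The deletion anomaly can be reproduced with plain Python lists. The EQUIPMENT_REPAIR rows below are hypothetical apart from RepairNumber 2100 and the Drill Press cost of 3500 mentioned in the notes; in this invented data, repair 2100 is the only repair recorded for a Lathe, and `equipment_facts` is a helper name made up for the example.

```python
# Hypothetical EQUIPMENT_REPAIR rows (values invented for illustration):
# (ItemNumber, EquipmentType, AcquisitionCost, RepairNumber, RepairDate, RepairCost)
equipment_repair = [
    (100, "Drill Press", 3500, 2000, "2019-05-05", 375),
    (200, "Lathe",       4750, 2100, "2019-05-07", 255),
    (100, "Drill Press", 3500, 2200, "2019-06-19", 178),
]

def equipment_facts(rows):
    """Collect the distinct (ItemNumber, EquipmentType, AcquisitionCost) facts."""
    return {(item, etype, cost) for item, etype, cost, *_rest in rows}

# Delete the row for RepairNumber 2100 (the only repair of item 200).
after_delete = [row for row in equipment_repair if row[3] != 2100]

print(equipment_facts(equipment_repair))  # includes (200, 'Lathe', 4750)
print(equipment_facts(after_delete))      # {(100, 'Drill Press', 3500)}
```

Removing one repair fact also removed every fact about the Lathe, because both themes lived in the same row.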

➢ Insertion Anomaly – in a relation, the condition that exists when, to add a complete row to a table, one must add facts about two or more logically different themes
▪ Example, EQUIPMENT_REPAIR table: to add a record for a repair (RepairNumber, RepairDate, and RepairCost), you also need to know information about the equipment, including ItemNumber, EquipmentType, and AcquisitionCost

➢ Update Anomaly – a data error created in a non-normalized table when an update action modifies one data value without modifying the other occurrences of the same data value in the table
▪ Example, EQUIPMENT_REPAIR table: changing AcquisitionCost from 3500 to 5500 in only the last row of the table leaves the Drill Press with two different AcquisitionCost values. A piece of equipment cannot have been acquired at two different costs, so the table now contains inconsistent data
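A sketch of how the update anomaly shows up, using hypothetical rows in the same shape as the table above: the helper `inconsistent_values` (a name invented here) reports every ItemNumber that is paired with more than one AcquisitionCost, which is exactly the inconsistency described above.

```python
from collections import defaultdict

# Hypothetical EQUIPMENT_REPAIR rows (values invented for illustration).
equipment_repair = [
    {"ItemNumber": 100, "EquipmentType": "Drill Press", "AcquisitionCost": 3500, "RepairNumber": 2000},
    {"ItemNumber": 200, "EquipmentType": "Lathe",       "AcquisitionCost": 4750, "RepairNumber": 2100},
    {"ItemNumber": 100, "EquipmentType": "Drill Press", "AcquisitionCost": 3500, "RepairNumber": 2200},
]

def inconsistent_values(rows, key, attr):
    """Map each `key` value to its set of `attr` values when it has more than one."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key]].add(row[attr])
    return {k: v for k, v in seen.items() if len(v) > 1}

before = inconsistent_values(equipment_repair, "ItemNumber", "AcquisitionCost")
print(before)  # {}

# Update only the last row, leaving the other Drill Press row unchanged.
equipment_repair[-1]["AcquisitionCost"] = 5500

problems = inconsistent_values(equipment_repair, "ItemNumber", "AcquisitionCost")
print(problems)  # item 100 now has two different acquisition costs
```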

➢ Notice that the EQUIPMENT_REPAIR table duplicates data: for example, the AcquisitionCost of the same item of equipment appears several times. Any table that duplicates data is susceptible to update anomalies, and a table that has such inconsistencies is said to have data integrity problems
➢ To improve query speed, tables are sometimes designed to have duplicated data, but this opens the door to data integrity problems

❖ Normal Forms
➢ First Normal Form (1NF), Second Normal Form (2NF), Third Normal Form (3NF), Boyce-Codd Normal Form (BCNF), Fourth Normal Form (4NF), Fifth Normal Form (5NF), Domain/Key Normal Form (DK/NF)
➢ Relations are categorized into normal forms based on which modification anomalies or other problems they have
➢ Normalization theory can be divided into three major categories: functional dependencies, multivalued dependencies, and data constraints & odd conditions

➢ 5NF and DK/NF are specific and rare (not covered in this course)

➢ First Normal Form (1NF)
▪ Remember that question: To Key or Not to Key?
▪ Codd’s set of conditions for a relation does not require a primary key, but one is clearly implied by the condition that all rows must be unique
▪ Thus, we will define 1NF as a relation that:
• Meets the set of conditions for a relation (Figure 3-4)
• Has a defined primary key
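Two of the 1NF requirements are easy to test mechanically: rows are unique, and the chosen primary key identifies every row. The sketch below checks only those two; the remaining conditions for a relation in Figure 3-4 are not covered. The helper names and EMPLOYEE rows are hypothetical.

```python
def rows_are_unique(rows):
    """True if no two rows are identical."""
    return len(set(rows)) == len(rows)

def is_valid_primary_key(rows, key_positions):
    """True if the chosen columns are non-null and unique in every row."""
    keys = [tuple(row[i] for i in key_positions) for row in rows]
    no_nulls = all(value is not None for key in keys for value in key)
    return no_nulls and len(set(keys)) == len(keys)

# Hypothetical EMPLOYEE rows: (EmployeeNumber, LastName, FirstName, DepartmentName)
employee = [
    (100, "Jones", "Mary", "Accounting"),
    (200, "Smith", "Tom", "Accounting"),
    (300, "Jones", "Ann", "Marketing"),
]

print(rows_are_unique(employee))            # True
print(is_valid_primary_key(employee, [0]))  # True: EmployeeNumber identifies each row
print(is_valid_primary_key(employee, [1]))  # False: LastName "Jones" repeats
```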

➢ Second Normal Form (2NF)
▪ A relation is in 2NF if, and only if, it is in 1NF and all non-key attributes are determined by the entire primary key
▪ If the primary key is a composite primary key, then no non-key attribute can be determined by an attribute or set of attributes that makes up only part of the key
▪ Relations with single-attribute primary keys are automatically in 2NF
▪ For example, if a relation R (A, B, N, O, P) has the composite primary key (A, B), then none of the non-key attributes N, O, or P can be determined by just A or just B
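One way to look for partial dependencies, assuming the functional dependencies are already known, is to compute the attribute closure of every proper subset of a composite primary key and see whether it reaches a non-key attribute. The dependency A → N is hypothetical, added so that R (A, B, N, O, P) fails 2NF; the function names are also made up for this sketch.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return all attributes functionally determined by `attrs` under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(primary_key, attributes, fds):
    """List (part_of_key, non_key_attrs) pairs that violate 2NF."""
    non_key = set(attributes) - set(primary_key)
    found = []
    # Only proper, non-empty parts of a composite key can cause a partial dependency,
    # so a single-attribute key yields no candidates at all.
    for size in range(1, len(primary_key)):
        for part in combinations(sorted(primary_key), size):
            determined = closure(part, fds) & non_key
            if determined:
                found.append((part, determined))
    return found

# R (A, B, N, O, P) with composite primary key (A, B).
attributes = ["A", "B", "N", "O", "P"]
fds = [
    (frozenset("AB"), frozenset("NOP")),  # (A, B) -> N, O, P
    (frozenset("A"), frozenset("N")),     # hypothetical: A alone -> N
]
print(partial_dependencies(["A", "B"], attributes, fds))  # [(('A',), {'N'})]
```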

➢ Third Normal Form (3NF)
▪ A relation is in 3NF if, and only if, it is in 2NF and no non-key attribute is determined by another non-key attribute; such a dependency is known as a transitive dependency
▪ For our relation R (A, B, N, O, P) to be in 3NF, none of the non-key attributes N, O, or P can be determined by any of the other non-key attributes
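The same closure idea can flag transitive dependencies. This sketch is a simplification: it tests only single non-key attributes as determinants, and the dependency N → O is a hypothetical example; the function names are invented for illustration.

```python
def closure(attrs, fds):
    """Return all attributes functionally determined by `attrs` under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def transitive_dependencies(primary_key, attributes, fds):
    """List (non_key_attr, other_non_key_attrs_it_determines) pairs that violate 3NF."""
    non_key = set(attributes) - set(primary_key)
    found = []
    for attr in sorted(non_key):
        determined = (closure({attr}, fds) & non_key) - {attr}
        if determined:
            found.append((attr, determined))
    return found

# R (A, B, N, O, P) with composite primary key (A, B).
attributes = ["A", "B", "N", "O", "P"]
fds = [
    (frozenset("AB"), frozenset("NOP")),  # (A, B) -> N, O, P
    (frozenset("N"), frozenset("O")),     # hypothetical: N -> O (transitive)
]
print(transitive_dependencies(["A", "B"], attributes, fds))  # [('N', {'O'})]
```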

➢ Boyce-Codd Normal Form (BCNF)
▪ A relation is in BCNF if, and only if, it is in 3NF and every determinant is a candidate key
▪ Remember that a candidate key is a determinant that determines all of the other columns in a relation
▪ The only way a relation in 3NF can have problems that require further normalization to reach BCNF is if it has overlapping composite candidate keys
▪ If a relation has no composite candidate keys, or its composite candidate keys do not overlap, then it is already in BCNF once it is in 3NF
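A BCNF check follows directly from the definition above: every determinant must determine all of the other columns. The relation STUDENT_ADVISOR (StudentID, Major, AdvisorName) and its dependencies are a hypothetical example with overlapping composite candidate keys (StudentID, Major) and (StudentID, AdvisorName). It is in 3NF, but AdvisorName is a determinant that is not a candidate key, so it is not in BCNF.

```python
def closure(attrs, fds):
    """Return all attributes functionally determined by `attrs` under the FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(attributes, fds):
    """Return determinants (FD left-hand sides) that do not determine every attribute."""
    all_attrs = set(attributes)
    bad = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial dependency; not a real determinant
        if closure(lhs, fds) != all_attrs:
            bad.append(set(lhs))
    return bad

# Hypothetical STUDENT_ADVISOR relation with overlapping composite candidate keys.
attributes = ["StudentID", "Major", "AdvisorName"]
fds = [
    (frozenset({"StudentID", "Major"}), frozenset({"AdvisorName"})),
    (frozenset({"AdvisorName"}), frozenset({"Major"})),  # each advisor covers one major
]
print(bcnf_violations(attributes, fds))  # [{'AdvisorName'}]
```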