Normalization

SITC WORKSHOP Relational Normal Forms

• The normal forms defined in relational database theory represent guidelines for record design.

• The normalization rules are designed to prevent update anomalies and data inconsistencies.

• With respect to performance tradeoffs, these guidelines are biased toward the assumption that all non-key fields will be updated frequently.

• Normalization is a technique applied to tables to resolve data redundancy and prevent data anomalies in a database.

• During normalization, large tables are split up into smaller tables and fields are placed in appropriate tables.

Normalization is a process that:

1) Eliminates duplicate data
2) Organizes tables
3) Increases the ease of managing data and making changes

First Normal Form (1NF)

• Edgar Codd, in a 1971 conference paper, defined what it means for a relation to be in first normal form.

• First normal form (1NF) is a property of a relation in a relational database.

• A relation is in first normal form if the domain of each attribute contains only atomic values, and the value of each attribute contains only a single value from that domain.

• First normal form deals with the "shape" of a record type.

• The first normal form is applied as the first stage of the normalization process. It usually leads to splitting fields in a table or splitting tables.

1NF tables as representations of relations

• There are no duplicate rows.

• Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).

• It also states that there should be no repeating groups, which implies that no two fields in a table should store values that appear logically related.
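The conditions above can be illustrated with a minimal Python sketch. The customer records, the field names, and the helpers `to_1nf` and `has_duplicate_rows` are hypothetical and exist only for this illustration. The sketch takes records that pack several telephone numbers into one field and produces one row per number, so that every field holds a single atomic value.

```python
# Hypothetical customer records (made up for illustration). The "phones"
# field holds several values at once, so the table is not in 1NF.
customers = [
    {"customer_id": 1, "name": "Customer A", "phones": "555-0101, 555-0102"},
    {"customer_id": 2, "name": "Customer B", "phones": "555-0199"},
]


def to_1nf(records):
    """Return one row per (customer, phone) pair so every field is atomic."""
    rows = []
    for record in records:
        for phone in record["phones"].split(","):
            rows.append(
                {
                    "customer_id": record["customer_id"],
                    "name": record["name"],
                    "phone": phone.strip(),
                }
            )
    return rows


def has_duplicate_rows(rows):
    """1NF also requires that no two rows are identical."""
    seen = {tuple(sorted(row.items())) for row in rows}
    return len(seen) != len(rows)


flat = to_1nf(customers)
for row in flat:
    print(row)
```

The flattened rows satisfy 1NF but repeat the customer name for every telephone number, which is the redundancy the higher normal forms address.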

• Store more than one telephone number.

• The simplest way – store multiple values for each number

Is this table in 1NF? In which way can this table be represented to comply with 1NF?

A design that complies with 1NF

A design that also complies with higher normal forms

• Two tables:
– a Customer Name table
– a Customer Telephone Number table.

• A one-to-many relationship between the two tables;
• "parent" to "child" relationship;
• each telephone number belongs to one, and only one customer.

Example

EMPID  ENAME          PROJECTID1  PROJECTID2  PROJECTID3
1      John Wright    P1          P2          P3
2      Robert Ingram  O1          O2          O3

After 1NF is applied

EMPID  FIRSTNAME  LASTNAME  PROJECT
1      John       Wright    P1
1      John       Wright    P2
1      John       Wright    P3
2      Robert     Ingram    O1
2      Robert     Ingram    O2
2      Robert     Ingram    O3

After 1NF is applied

EMPID  FIRSTNAME  LASTNAME
1      John       Wright
2      Robert     Ingram

EMPID  PROJECTID
1      P1
1      P2
1      P3
2      O1
2      O2
2      O3

Second Normal Form (2NF)

• Second normal form (2NF) is a normal form used in database normalization. 2NF was originally defined by E.F. Codd in 1971.

• A table is in 2NF if and only if it is in 1NF and every non-prime attribute of the table is dependent on the whole of every candidate key.

• The second normal form is usually applied to tables having a multiple-field key.

• Second normal form is violated when a non-key field is a fact about a subset of a key.

Part   Warehouse  Quantity  Warehouse Address
Wheel  W1         10        Yola
Wheel  W2         2         Jimeta
Door   W1         3         Yola
Trunk  W3         2         Abuja

• To satisfy second normal form:

Part   Warehouse  Quantity
Wheel  W1         10
Wheel  W2         2
Door   W1         3
Trunk  W3         2

Warehouse  Warehouse Address
W1         Yola
W2         Jimeta
W3         Abuja

• Advantages:
– avoid redundancy
– duplication of data is reduced

• Disadvantage:
– the performance cost of retrieving information is higher.

Consider a table describing employees' skills: What are the candidate keys for this table? Is the table in 2NF?

• an "Employees" table with candidate key {Employee}
• an "Employees' Skills" table with candidate key {Employee, Skill}.
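The warehouse example from this section can be sketched in Python. The helper `determines` and the variable names are hypothetical, introduced only for illustration. The sketch checks whether one set of columns determines another in the sample rows, then performs the 2NF decomposition. Note the caveat: sample data can only refute a dependency, not prove it; real dependencies come from the business rules.

```python
# Rows from the warehouse example: (part, warehouse, quantity, address).
inventory = [
    ("Wheel", "W1", 10, "Yola"),
    ("Wheel", "W2", 2, "Jimeta"),
    ("Door", "W1", 3, "Yola"),
    ("Trunk", "W3", 2, "Abuja"),
]


def determines(rows, lhs, rhs):
    """Return True if the columns at indexes `lhs` determine column `rhs`
    in this sample, i.e. equal lhs values never map to different rhs values."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


# Warehouse (index 1) alone determines the address (index 3): a fact about
# a subset of the key {part, warehouse}, which is what violates 2NF.
partial_dependency = determines(inventory, lhs=(1,), rhs=3)

# Quantity (index 2) depends on the whole key, not on warehouse alone.
quantity_needs_full_key = not determines(inventory, lhs=(1,), rhs=2)

# Decompose: quantity stays with the full key, address moves to its own table.
part_stock = [(part, wh, qty) for part, wh, qty, _ in inventory]
warehouse_address = sorted({(wh, addr) for _, wh, _, addr in inventory})

print(partial_dependency, quantity_needs_full_key)
print(part_stock)
print(warehouse_address)
```

After the split, each warehouse address is stored once, so changing it touches a single row instead of every part stored in that warehouse.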

Third Normal Form (3NF)

Third normal form is the third step in normalizing a database design to reduce the duplication of data and ensure referential integrity by ensuring that

• the entity is in second normal form and
• no non-prime field is dependent on another non-prime field in the table.

Third normal form is violated when a non-key field is a fact about another non-key field.

Employee  Department           Location
Abubakar  Computer Science     A&S building
Hassan    Computer Science     A&S building
Yasir     Information systems  A&S building
Olumide   Information systems  A&S building
Fatima    Management           POH building

Assumption: each department is located in one place

Disadvantages:
– Redundancy: the location is repeated for every employee of a department
– Location changes: every record for the department must be changed
– Inconsistency: different records can show different locations for the same department.

To satisfy third normal form, the record shown above should be decomposed into the two records:

Employee  Department
Abubakar  Computer Science
Hassan    Computer Science
Yasir     Information systems
Olumide   Information systems
Fatima    Management

Department           Location
Computer Science     A&S building
Information systems  A&S building
Management           POH building

What are candidate keys for this table? Is the table in 3NF? Why?

In order to express the same facts without violating 3NF, it is necessary to split the table into two:

Update anomalies cannot occur in these tables, which are both in 3NF.
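For the employee and department records decomposed earlier in this section, the effect of 3NF on updates can be sketched with Python's built-in `sqlite3` module. The table and column names are assumptions made for this sketch, as are the use of the employee name as a key and the relocation of Computer Science to the POH building.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE department (
        name     TEXT PRIMARY KEY,
        location TEXT NOT NULL
    );
    CREATE TABLE employee (
        name       TEXT PRIMARY KEY,  -- assumed unique in this sample
        department TEXT NOT NULL REFERENCES department(name)
    );
    """
)
conn.executemany(
    "INSERT INTO department VALUES (?, ?)",
    [
        ("Computer Science", "A&S building"),
        ("Information systems", "A&S building"),
        ("Management", "POH building"),
    ],
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [
        ("Abubakar", "Computer Science"),
        ("Hassan", "Computer Science"),
        ("Yasir", "Information systems"),
        ("Olumide", "Information systems"),
        ("Fatima", "Management"),
    ],
)

# Hypothetical move: relocating a department touches exactly one row, so
# employees of the same department can never show different locations.
cursor = conn.execute(
    "UPDATE department SET location = ? WHERE name = ?",
    ("POH building", "Computer Science"),
)
rows_changed = cursor.rowcount

# A join rebuilds the original three-column view on demand.
employee_locations = conn.execute(
    """
    SELECT e.name, e.department, d.location
    FROM employee AS e
    JOIN department AS d ON d.name = e.department
    ORDER BY e.name
    """
).fetchall()

print(rows_changed)
for row in employee_locations:
    print(row)
```

The join is the retrieval cost mentioned earlier as the disadvantage of normalization: the combined view is no longer stored and has to be computed when it is needed.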