Database Normalization

Normalization is the process of making a database fit “good database design” rules. We talk about normalization in terms of “normal forms” (NF). The normal forms are cumulative, i.e. for a database to be in 2NF (second normal form), it must also meet the requirements of 1NF.

The normal forms are defined as follows:

1. First normal form (1NF) sets the very basic rules for an organized database:
   a. Eliminate duplicative columns from the same table.
   b. Create separate tables for each group of related data and identify each row with a unique column or set of columns (the primary key).
2. Second normal form (2NF) further addresses the concept of removing duplicative data:
   a. Remove subsets of data that apply to multiple rows of a table and place them in separate tables.
   b. Create relationships between these new tables and their predecessors through the use of foreign keys.
3. Third normal form (3NF) goes one large step further:
   a. Remove columns that are not dependent upon the primary key.
4. Finally, fourth normal form (4NF) has one requirement:
   a. A relation is in 4NF if it has no multi-valued dependencies.

For example, an address table might have the following data:

IDNum  FirstName  LastName  SchoolName  SchAddress       SchCity            SchState  SchZip
1      Joe        Smith     Carver      123 Easy Street  Little Rock        AR        72201
2      Freda      Joseph    Carver      123 Easy Street  Little Rock        AR        72201
3      Busy       Body      Cloverdale  456 Anywhere     Little Rock        AR        72209
4      Pikup      Andropov  Lakewood    789 High         North Little Rock  AR        72222

Notice that Carver and its address appear in this table more than once. To be in 2NF, this table must be split into two tables. The savings in space and effort may look small in an example this size, but imagine hundreds of records with repeated school addresses. If the school’s address (or phone number, which is not shown in this example) changes, the change has to be made in every record for that school rather than in just one place. A better design, and one that meets the requirements of 2NF, is to split the table into the following two tables:

David Luneau 1/12/09

People:

IDNum  FirstName  LastName  SchoolName
1      Joe        Smith     Carver
2      Freda      Joseph    Carver
3      Busy       Body      Cloverdale
4      Pikup      Andropov  Lakewood

School:

SchoolName  SchAddress       SchCity            SchState  SchZip
Carver      123 Easy Street  Little Rock        AR        72201
Cloverdale  456 Anywhere     Little Rock        AR        72209
Lakewood    789 High         North Little Rock  AR        72222
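The split itself can be sketched in a few lines of Python. This is only an illustration, not part of the original example: the function name split_flat_table is made up here, and the rows are the sample data as read from the address table above.

```python
# Flat rows as read from the example address table.
flat_rows = [
    (1, "Joe", "Smith", "Carver", "123 Easy Street", "Little Rock", "AR", "72201"),
    (2, "Freda", "Joseph", "Carver", "123 Easy Street", "Little Rock", "AR", "72201"),
    (3, "Busy", "Body", "Cloverdale", "456 Anywhere", "Little Rock", "AR", "72209"),
    (4, "Pikup", "Andropov", "Lakewood", "789 High", "North Little Rock", "AR", "72222"),
]


def split_flat_table(rows):
    """Split flat rows into (people, schools), keeping each school only once.

    split_flat_table is a hypothetical helper written for this sketch.
    """
    people = []
    schools = {}  # school name -> (address, city, state, zip)
    for id_num, first, last, school, addr, city, state, zip_code in rows:
        people.append((id_num, first, last, school))
        details = (addr, city, state, zip_code)
        # setdefault stores the details the first time a school is seen and
        # returns the stored details afterwards, so a mismatch means the flat
        # table already disagrees with itself about that school.
        if schools.setdefault(school, details) != details:
            raise ValueError(f"conflicting address data for {school}")
    return people, schools


people, schools = split_flat_table(flat_rows)
print(len(people), "people,", len(schools), "schools")
```

The four people are kept as they are, while the two Carver rows collapse into a single school entry.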

In this case we are using the school name as the primary key in the School table. It is a “foreign key” in the People table. It is through this field that a relationship is built between the two tables.
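As a rough sketch of how this relationship might be declared, the example below uses Python's built-in sqlite3 module with an in-memory database. The choice of SQLite and the TEXT column types are assumptions made for illustration, and row 5 ("Test Person" at "No Such School") is invented to show what the foreign key prevents.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this setting is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE School (
    SchoolName TEXT PRIMARY KEY,
    SchAddress TEXT, SchCity TEXT, SchState TEXT, SchZip TEXT
);
CREATE TABLE People (
    IDNum      INTEGER PRIMARY KEY,
    FirstName  TEXT,
    LastName   TEXT,
    SchoolName TEXT REFERENCES School(SchoolName)  -- the foreign key
);
""")
conn.executemany("INSERT INTO School VALUES (?, ?, ?, ?, ?)", [
    ("Carver", "123 Easy Street", "Little Rock", "AR", "72201"),
    ("Cloverdale", "456 Anywhere", "Little Rock", "AR", "72209"),
    ("Lakewood", "789 High", "North Little Rock", "AR", "72222"),
])
conn.executemany("INSERT INTO People VALUES (?, ?, ?, ?)", [
    (1, "Joe", "Smith", "Carver"),
    (2, "Freda", "Joseph", "Carver"),
    (3, "Busy", "Body", "Cloverdale"),
    (4, "Pikup", "Andropov", "Lakewood"),
])

# Follow the SchoolName field from People to School to find each person's city.
rows = conn.execute("""
    SELECT p.FirstName, p.LastName, s.SchCity
    FROM People AS p
    JOIN School AS s ON s.SchoolName = p.SchoolName
    ORDER BY p.IDNum
""").fetchall()
for row in rows:
    print(row)

# A person whose school is not in the School table is refused (made-up row).
try:
    conn.execute("INSERT INTO People VALUES (?, ?, ?, ?)",
                 (5, "Test", "Person", "No Such School"))
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print("unknown school rejected:", rejected)
```

Using the school name as the key works here because no two schools in the example share a name. If that could not be guaranteed, a separate school ID column would be a safer key.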

If a school’s information changes, it needs to be changed in only one place. The overall database will also be smaller, and there is no chance of the same school having different address information in different places.
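To make the "one place" point concrete, the sketch below counts how many rows an address change touches in each design. It again assumes SQLite through Python's sqlite3 module. The table name Flat, the cut-down column lists, and the new address "500 Example Avenue" are all made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Flat: the original single-table design, reduced to the relevant columns.
CREATE TABLE Flat   (IDNum INTEGER PRIMARY KEY, SchoolName TEXT, SchAddress TEXT);
-- School and People: the split design.
CREATE TABLE School (SchoolName TEXT PRIMARY KEY, SchAddress TEXT);
CREATE TABLE People (IDNum INTEGER PRIMARY KEY,
                     SchoolName TEXT REFERENCES School(SchoolName));
""")
conn.executemany("INSERT INTO Flat VALUES (?, ?, ?)", [
    (1, "Carver", "123 Easy Street"),
    (2, "Carver", "123 Easy Street"),
    (3, "Cloverdale", "456 Anywhere"),
])
conn.executemany("INSERT INTO School VALUES (?, ?)", [
    ("Carver", "123 Easy Street"),
    ("Cloverdale", "456 Anywhere"),
])
conn.executemany("INSERT INTO People VALUES (?, ?)", [
    (1, "Carver"), (2, "Carver"), (3, "Cloverdale"),
])

new_address = "500 Example Avenue"  # hypothetical new address for Carver

# rowcount reports how many rows each UPDATE had to modify.
flat_changed = conn.execute(
    "UPDATE Flat SET SchAddress = ? WHERE SchoolName = ?",
    (new_address, "Carver")).rowcount
school_changed = conn.execute(
    "UPDATE School SET SchAddress = ? WHERE SchoolName = ?",
    (new_address, "Carver")).rowcount
print("rows changed in flat table:", flat_changed)
print("rows changed in School table:", school_changed)

# Everyone at Carver picks up the new address through the relationship.
carver_people = conn.execute("""
    SELECT p.IDNum, s.SchAddress
    FROM People AS p
    JOIN School AS s ON s.SchoolName = p.SchoolName
    WHERE p.SchoolName = 'Carver'
    ORDER BY p.IDNum
""").fetchall()
print(carver_people)
```

In the flat table, the number of rows to change grows with the number of people at the school. In the split design it is always one.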

Additionally, no calculated data should be stored in the database. If you need totals, averages, etc., they can be calculated in queries or reports.
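For instance, the number of people at each school does not need its own stored column, because a query can work it out whenever it is needed. The sketch below assumes SQLite through Python's sqlite3 module, and counting people per school is just one illustrative choice of calculated value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE People (
        IDNum INTEGER PRIMARY KEY,
        FirstName TEXT, LastName TEXT, SchoolName TEXT
    )
""")
conn.executemany("INSERT INTO People VALUES (?, ?, ?, ?)", [
    (1, "Joe", "Smith", "Carver"),
    (2, "Freda", "Joseph", "Carver"),
    (3, "Busy", "Body", "Cloverdale"),
    (4, "Pikup", "Andropov", "Lakewood"),
])

# The count is computed at query time, so it cannot drift out of date
# the way a stored "number of people" column could.
counts = conn.execute("""
    SELECT SchoolName, COUNT(*) AS NumPeople
    FROM People
    GROUP BY SchoolName
    ORDER BY SchoolName
""").fetchall()
for school, num_people in counts:
    print(school, num_people)
```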
