Database Normalization Tips
Luke Chung
FMS, President

September 2002

Applies to: Microsoft® Access

Summary: This article offers tips to developers to help them avoid some of the pitfalls when designing Access tables. This article applies to Microsoft Access databases (.mdb) and Microsoft Access projects (.adp).

Contents

Introduction
Understanding Your Data
What Data Do You Need?
What Are You Going to Do with the Data?
How Is Your Data Related to Each Other?
What Is Going to Happen to the Data Over Time?
Learn How to Use Queries
Database Normalization Concepts
Store Unique Information in One Place
Records are Free, New Fields are Expensive
Know When Data Needs to Be Duplicated
Use Meaningless Field for the Key Field
Use Referential Integrity
Conclusion

Introduction

One of the most important steps in designing a database is ensuring that the data is properly distributed among its tables. With proper data structures, the remainder of the application (the queries, forms, reports, code, and so on) is significantly simplified. The formal name for proper table design is database normalization.

This article is an overview of the basic database normalization concepts and some common pitfalls to consider and avoid.

Understanding Your Data

Before proceeding with table design, it's important to understand what you're planning to do with your data and how it will change over time. The assumptions you make will affect the eventual design.

What Data Do You Need?

When designing an application, it's critical to understand the final results to ensure that you have all the necessary data and know where it comes from. For instance, what is the appearance of the reports, where does each piece of data come from, and does all the data exist? Nothing is more damaging to a project than the realization, late in the process, that data is missing for an important report.
Once you know what data you need, you must determine where it comes from. Is the data imported from another source? Does that data need to be cleaned or verified? Does the user enter data? Having a firm grasp of what data is required and where it comes from is the first step in database design.

What Are You Going to Do with the Data?

Will your users need to edit the data and, if so, how should the data be displayed so that they can understand and edit it? Are there validation rules and related lookup tables? Are there auditing issues associated with data entry that require keeping backups of edits and deletions? What kind of summary information needs to be displayed to the user? Do you need to generate export files? With this information, you can envision how the fields are related to each other.

How Is Your Data Related to Each Other?

Group your data into related fields (such as customer-related information, invoice-related information, and so on). Each group of fields represents a future table. You should then consider how the groups are related to each other. For instance, which tables are related in a one-to-many relationship (for example, one customer may have multiple invoices)? Which tables have a one-to-one relationship (often a reason to consider combining them into one table)?

What Is Going to Happen to the Data Over Time?

When tables are designed, the impact of time is often not considered, and this can cause huge problems later. Many table designs work perfectly well for immediate use but break down as users modify the data, as new data gets added, and as time passes. Often, developers find they need to restructure their tables to accommodate these changes. When table structures change, everything that depends on them (queries, forms, reports, code, and so on) also needs to be updated.
By understanding and anticipating change over time, you can implement a better design that minimizes these problems.

Learn How to Use Queries

Understanding how you are going to analyze and manipulate the data is also important. You should have a firm grasp of how queries work, how to use them to link data across multiple tables, how to use them to group and summarize data, and how to use crosstab queries when you need to display data in a non-normalized format.

Ultimately, the goal of good data design is to balance storing the data efficiently over time against retrieving and analyzing it easily. Understanding the power of queries helps significantly with designing your tables properly.

Database Normalization Concepts

Rather than presenting a theoretical discussion of database normalization, this section explains the basic concepts involved. How you apply them in your situation may differ based on the needs of your application. The goal is to understand these basic concepts, apply them when you can, and understand the issues when you need to deviate from them.

Store Unique Information in One Place

Most database developers understand the basic concept of data normalization. Ideally, you store each piece of data in one place and refer to it with an ID wherever you need to reference it. That way, if some information changes, you change it in one place and the change appears throughout your application.

For instance, a customer table would store a record for each customer, including name, address, phone numbers, e-mail address, and other characteristics. The customer table would have a unique CustomerID field (usually an AutoNumber field) that is its key field and is used by other tables to refer to the customer.
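A minimal sketch of this layout, written with Python's built-in sqlite3 module rather than Access so that it can be run anywhere. Apart from CustomerID, the table names, column names, and sample data are assumptions made for illustration, and SQLite's INTEGER PRIMARY KEY stands in for an Access AutoNumber field. The invoice table, which stores only the CustomerID, follows the design described in the next paragraph.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: customer details live in exactly one table.
conn.executescript("""
CREATE TABLE Customers (
    CustomerID INTEGER PRIMARY KEY,   -- stands in for an AutoNumber key
    Name       TEXT NOT NULL,
    Phone      TEXT
);
CREATE TABLE Invoices (
    InvoiceID  INTEGER PRIMARY KEY,
    CustomerID INTEGER NOT NULL REFERENCES Customers(CustomerID),
    Amount     REAL NOT NULL
);
""")

cur = conn.execute(
    "INSERT INTO Customers (Name, Phone) VALUES (?, ?)",
    ("Acme Ltd", "555-0100"),
)
customer_id = cur.lastrowid

# Two invoices refer to the same customer by ID only.
conn.executemany(
    "INSERT INTO Invoices (CustomerID, Amount) VALUES (?, ?)",
    [(customer_id, 100.0), (customer_id, 250.0)],
)

# The phone number changes: one edit, in one place.
conn.execute(
    "UPDATE Customers SET Phone = ? WHERE CustomerID = ?",
    ("555-0199", customer_id),
)

# Every invoice now sees the new number through the join.
rows = conn.execute("""
    SELECT i.InvoiceID, c.Name, c.Phone
    FROM Invoices AS i
    JOIN Customers AS c ON c.CustomerID = i.CustomerID
    ORDER BY i.InvoiceID
""").fetchall()
print(rows)
```

Because the invoices hold no copy of the phone number, there is nothing to keep in sync after the update.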
Therefore, an invoice table, rather than storing all the customer information with each invoice (because the same customer may have multiple invoices), would simply store the CustomerID value, which can be used to look up the customer details in the customer table. Access makes this very easy through its powerful forms, which use combo boxes and subforms. If you need to change the customer's information (such as a new phone number), you can change it in the customer table and know that every other part of your application that references that information is automatically updated.

With a properly normalized database, changes to data over time are easily handled with a simple edit. Improperly normalized databases often need code or queries to make changes across multiple records or tables. This not only requires more work to implement, but also increases the chance that the data becomes inconsistent if the code or queries don't execute properly.

Records are Free, New Fields are Expensive

Databases should be designed so that, over time, you simply add new records. Database tables are designed to hold huge numbers of records. However, if you find you need to add more fields, you probably have a design problem.

This often happens with spreadsheet experts who design databases the way they are accustomed to designing spreadsheets. Designing a separate field for each value of something that changes over time (such as each Year, Quarter, Product, or Salesman) requires new fields to be added in the future. The correct design is to transpose the information so that the time-sensitive value is stored in a single field and more records can be added. For instance, rather than creating a separate field for each year, create a Year field, and enter each record's year in that field.
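The difference between the two designs can be sketched with Python's built-in sqlite3 module (used here instead of Access so that the example is self-contained). The Sales table, its Salesman and Amount columns, and the sample figures are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Spreadsheet-style design (to avoid): one field per year, so every new
# year forces a structural change, for example:
#   CREATE TABLE SalesWide (Salesman TEXT, Sales2001 REAL, Sales2002 REAL)

# Normalized design: the year is a value in a Year field,
# not part of a field name.
conn.execute("""
    CREATE TABLE Sales (
        Salesman TEXT    NOT NULL,
        Year     INTEGER NOT NULL,
        Amount   REAL    NOT NULL
    )
""")
conn.executemany(
    "INSERT INTO Sales (Salesman, Year, Amount) VALUES (?, ?, ?)",
    [("Lee", 2001, 100.0), ("Lee", 2002, 150.0),
     ("Park", 2001, 80.0), ("Park", 2002, 120.0)],
)


def totals_by_year(connection):
    """Summarize sales per year, whatever years the table contains."""
    return connection.execute(
        "SELECT Year, SUM(Amount) FROM Sales GROUP BY Year ORDER BY Year"
    ).fetchall()


before = totals_by_year(conn)

# A new year arrives: only a new record is added, no new field.
conn.execute(
    "INSERT INTO Sales (Salesman, Year, Amount) VALUES (?, ?, ?)",
    ("Lee", 2003, 200.0),
)
after = totals_by_year(conn)

print(before)
print(after)
```

The summary query is unchanged between the two calls; the 2003 total appears simply because a record with that year now exists.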
Adding fields is problematic because structural changes to a table affect other parts of the application. When more fields are added to a table, the objects and code that depend on the table also need to be updated. For instance, queries need to grab the extra fields, forms need to display them, reports need to include them, and so on. However, if the data is normalized, the existing objects automatically retrieve the new data and calculate or display it correctly. Queries are particularly powerful because they allow you to group on the Year field to show summaries by year, no matter what years are in your table.

Data normalization does not mean, however, that you can't display or use data with time-sensitive or time-dependent fields. Developers who need to display such information can often do so by using crosstab queries. If you aren't familiar with crosstab queries, you should learn how to use them. They are not the same as tables (in particular, you cannot edit the results of a crosstab query), but they can certainly be used for displaying information in a datasheet (up to 255 fields). Using them in reports is more complicated because the report needs to accommodate additional or changing field names. That's why most reports show data as separate groupings within the report, rather than as separate columns. In those instances where you have no choice, you'll have to invest the time to support this, but hopefully all parties will understand the implications such decisions have for additional resources over time.

So, that's why additional records are free (the big advantage of databases) and why additional fields are so expensive. Databases can accommodate massive amounts of change if they are designed properly.

Know When Data Needs to Be Duplicated

Sometimes, data needs to be de-normalized to preserve information that may change over time.
