Relational Databases: Some Basic Concepts

APPENDIX A

This appendix discusses some key concepts of relational databases, including normalization, sets, cursors, and locking.

A relational database is a collection of tables, and a table is a set of columns and rows. A table's columns can be referred to as columns, fields, or attributes, and each column in a table requires a unique name and a data type. The data in a given row are generally referred to as a record. (The term recordset refers to a set of records, reminding you that relational databases use set-based operations rather than row-at-a-time, pointer-based operations.)

In relational databases, each table needs to have a key, which is associated with a column. There are two important types of keys: primary keys and foreign keys. A primary key is a column of a table with a unique value for each row, which helps ensure data integrity by (theoretically) eliminating duplicate records. A foreign key takes care of the relational in relational databases and provides a link between related data that are contained in more than one table. For example, in a classic parent/child relationship such as customers and orders, if CustomerId is the primary key in the Customers table, it'll need to occur in the Orders table as a foreign key to associate each order with an individual customer. The customer/order relationship is a one-to-many relationship because each customer can have many orders. That relationship is sometimes depicted as 1:n.

NOTE A primary key can also be a combination of several columns of a table, and one of those columns can itself be a foreign key. For example, one of my database tables has a primary key that combines the Path and File columns to avoid duplicate file records. The File column is also a foreign key.

A database is a collection of tables, indexes, constraints, and other objects.
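The keys and constraints described above can be sketched in a few lines. This is a minimal illustration using Python's built-in sqlite3 module with an in-memory database, not the author's own schema: the Description column, the Files table layout, and all sample values are assumptions made up for the example, and the foreign-key side of the File column from the NOTE is left out because the appendix doesn't say what it references.

```python
import sqlite3

# In-memory database; table names follow the appendix, sample data is invented.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE Customers (
    CustomerId   INTEGER PRIMARY KEY,      -- primary key: one unique value per row
    CustomerName TEXT NOT NULL
);
CREATE TABLE Orders (
    OrderId     INTEGER PRIMARY KEY,
    Description TEXT,                      -- hypothetical column for illustration
    CustomerId  INTEGER NOT NULL
                REFERENCES Customers(CustomerId)  -- foreign key: links order to customer
);
-- Composite primary key, as in the NOTE: no two rows may share (Path, File).
CREATE TABLE Files (
    Path TEXT NOT NULL,
    File TEXT NOT NULL,
    PRIMARY KEY (Path, File)
);
""")

def rejected(sql):
    """Return True if the statement violates a key constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

# One customer, many orders (1:n).
conn.execute("INSERT INTO Customers VALUES (1, 'Jack')")
conn.executemany("INSERT INTO Orders VALUES (?, ?, ?)",
                 [(1, "Books", 1), (2, "Copies", 1)])
orders_for_jack = conn.execute(
    "SELECT COUNT(*) FROM Orders WHERE CustomerId = 1").fetchone()[0]

# An order pointing at a customer that doesn't exist breaks the foreign key.
orphan_rejected = rejected("INSERT INTO Orders VALUES (3, 'Paper', 99)")

# A second row with the same (Path, File) pair breaks the composite primary key.
conn.execute("INSERT INTO Files VALUES ('C:/docs', 'a.txt')")
duplicate_rejected = rejected("INSERT INTO Files VALUES ('C:/docs', 'a.txt')")
```

In this sketch the database itself refuses the orphaned order and the duplicate file record, which is the data-integrity role the keys are meant to play.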
The definition of these objects is known as the database's schema, which can be represented graphically using various diagrams.

Metadata refers to the collection of data about data that describes the content, quality, relations, and other characteristics of data. A database's metadata includes information that ranges from table definitions to users and their permissions.

Understanding Normalization

If you've been working with databases for a while, you're probably familiar with the term normalization. Database designers or developers often ask whether a database is normalized. So, what's normalization? Normalization is the process of eliminating data redundancy (except for the redundancy required by foreign keys) across multiple relational tables, and it ensures that data is organized efficiently. When you normalize a database, you basically have three goals:

• Ensuring you've organized the data correctly into groups that minimize the amount of duplicate data stored in a database
• Organizing the data such that, when you (or your users) modify data in it (such as a person's address or email), the change only has to be made once
• Designing a database in which you can access and manipulate the data quickly and efficiently without compromising the integrity of the data in storage

E. F. Codd first proposed the normalization process way back in 1972. Initially, he proposed three normal forms, which he called first, second, and third normal forms (1NF, 2NF, and 3NF). Subsequently, he proposed a stronger definition of 3NF, which is known as Boyce-Codd normal form (BCNF). Later, others proposed fourth and fifth normal forms. Most people agree that a database that's in 3NF is probably good enough; in other words, normalized enough.

First Normal Form (1NF)

A table is in first normal form (1NF) if the values of its columns are atomic, containing no repeating values.
Applying 1NF to a table eliminates duplicate columns from the same table, creates separate tables for each group of related data, and adds a primary key to identify each row with a unique value.

For example, say you have a table called Customers (see Figure A-1). The Customers table is designed to store data about customer orders. The columns store data about a customer such as name, address, and order description.

Figure A-1. Customers table as it might appear in Microsoft Access before normalization

The data of the Customers table looks like Figure A-2.

Figure A-2. Two rows of data from the Customers table (columns: CustomerName, Address, City, Order1, Order2, Order3, OrderDes1, OrderDes2, OrderDes3)

Figure A-1 shows the design of a Customers table. As you can see, it's been designed to store up to three orders for any customer, sort of mimicking an array structure in spreadsheet format. There are obvious problems with this design. As you can see from Figure A-2, there are many columns related to orders. (That's a classic giveaway of a poorly designed table: mixing two or more entities, or kinds of data.) When a customer places a fourth order, a new column has to be added to the table. Not only that, if orders for the same customer are stored in more than one row, the Address and City data are duplicated in each of those rows. This table is not in 1NF because the 1NF rule says that a table is in 1NF if and only if its columns have no repeating values.
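To make the repeating-column problem concrete, here is a small sketch using Python's built-in sqlite3 module. The column names are modeled on Figure A-2 as best they can be read, and the sample row and the Order4 column are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The pre-1NF design: three numbered "slots" for orders in every customer row.
conn.execute("""
    CREATE TABLE Customers (
        CustomerName TEXT, Address TEXT, City TEXT,
        Order1 TEXT, Order2 TEXT, Order3 TEXT,
        OrderDes1 TEXT, OrderDes2 TEXT, OrderDes3 TEXT
    )""")
conn.execute(
    "INSERT INTO Customers VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
    ("Jack", "ABC Road", "Jacksonville",
     "001", "002", "003", "Books", "Copies", "Paper"))

# Problem 1: finding one order means testing every numbered column in turn.
row = conn.execute(
    """SELECT CustomerName FROM Customers
       WHERE Order1 = ? OR Order2 = ? OR Order3 = ?""",
    ("002",) * 3).fetchone()

# Problem 2: a fourth order has nowhere to go unless the table itself changes.
conn.execute("ALTER TABLE Customers ADD COLUMN Order4 TEXT")  # hypothetical new slot
column_count = len(conn.execute("PRAGMA table_info(Customers)").fetchall())
```

Every query and every schema change grows with the number of order slots, which is the symptom that the table is mixing customers and orders.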
You apply 1NF to this table by eliminating the details about the orders, providing just enough relational information to link the Customers table with a new Orders table. The new format of this table after 1NF would look like Figure A-3. You still have some information about orders in this Customers table but only basic link information: OrderNumber. You've fixed the repeating fields problem by jamming all the other order information into a single OrderDescription field.

Figure A-3. Customers table schema after 1NF

The data of the table now looks like Figure A-4.

Figure A-4. Data of Customers table after 1NF

Second Normal Form (2NF)

Second normal form (2NF) eliminates redundant data. A table is in 2NF if it's in 1NF and every non-key column (a column that is neither a primary nor a foreign key) is fully dependent upon the primary key. Applying 2NF to a table removes duplicate data on multiple rows, places data in separate rows, places grouped data in new tables, and creates relationships between the original tables and new tables through foreign keys.

As you can see from Figure A-4, there are six records for two customers, and three columns (CustomerName, Address, and City) hold the same information three times. 2NF eliminates these cases. Under 2NF, you separate data into two different tables and relate these tables using a foreign key. Records in tables should not depend on anything other than the table's primary key.

In this example, you now create a separate table called Orders. As you can probably guess, the Orders table stores information related to customer orders. It relates back to the correct customer via the CustomerId column. Now the two tables, Customers and Orders, look like Figure A-5 and Figure A-6.
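The split described above can be sketched as follows, again with Python's built-in sqlite3 module. This is one possible way to do the migration, not the author's procedure: the CustomersFlat name, the choice of OrderNumber as the Orders primary key, the integer CustomerId, and all sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- The table after 1NF: one row per order, customer details repeated each time.
CREATE TABLE CustomersFlat (
    CustomerName TEXT, Address TEXT, City TEXT,
    OrderNumber TEXT, OrderDescription TEXT
);
INSERT INTO CustomersFlat VALUES
    ('Jack', 'ABC Road', 'Jacksonville', '001', 'Books'),
    ('Jack', 'ABC Road', 'Jacksonville', '002', 'Copies'),
    ('Mike', 'X Avenue', 'Exton',        '004', 'Paper'),
    ('Mike', 'X Avenue', 'Exton',        '005', 'Books');

-- The two tables after 2NF.
CREATE TABLE Customers (
    CustomerId   INTEGER PRIMARY KEY AUTOINCREMENT,
    CustomerName TEXT, Address TEXT, City TEXT
);
CREATE TABLE Orders (
    OrderNumber      TEXT PRIMARY KEY,   -- assumed key for this sketch
    OrderDescription TEXT,
    CustomerId       INTEGER REFERENCES Customers(CustomerId)
);

-- Each distinct customer is stored once...
INSERT INTO Customers (CustomerName, Address, City)
    SELECT DISTINCT CustomerName, Address, City FROM CustomersFlat;

-- ...and each order keeps only a CustomerId pointing back to it.
INSERT INTO Orders (OrderNumber, OrderDescription, CustomerId)
    SELECT f.OrderNumber, f.OrderDescription, c.CustomerId
    FROM CustomersFlat AS f
    JOIN Customers AS c
      ON c.CustomerName = f.CustomerName
     AND c.Address = f.Address
     AND c.City = f.City;
""")

customer_count = conn.execute("SELECT COUNT(*) FROM Customers").fetchone()[0]
order_count = conn.execute("SELECT COUNT(*) FROM Orders").fetchone()[0]

# An address change now touches a single row instead of one row per order.
rows_updated = conn.execute(
    "UPDATE Customers SET Address = 'Y Street' WHERE CustomerName = 'Mike'"
).rowcount
```

The last statement shows the second normalization goal in action: the address lives in one place, so the change is made once.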
Figure A-5. Customers table after 2NF (columns: CustomerName, Address, and City as Text; CustomerId as AutoNumber)

Figure A-6. Orders table after 2NF (columns: OrderNumber, OrderDescription, and CustomerId as Text)

As you can see from Figures A-5 and A-6, both tables now have a primary key, denoted by the key icon to the left of the column name. This key will be unique for each record. The CustomerId column of the Customers table is mapped to the CustomerId column of the Orders table. The CustomerId column of the Orders table is a foreign key. The relationship between the Customers and Orders table looks like Figure A-7.

Figure A-7. Relationship between the Customers and Orders table

Now the data in these tables look like Figure A-8 and Figure A-9.

Figure A-8. Customers table after 2NF

Figure A-9. Orders table after 2NF

Third Normal Form (3NF)

A table is in third normal form (3NF) if it's in 2NF and every non-key column depends only on the primary key. 3NF eliminates columns that do not depend on the table's primary key. (Note that primary keys don't have to be single columns as shown in these simple examples.) For example, as you can see from Figure A-10, CustomerId, UnitPrice, and Quantity depend on OrderId (the primary key). But the Total column doesn't depend on the OrderId column; it depends upon the UnitPrice and the Quantity columns.

Figure A-10. Orders table before 3NF

The data of the Orders table look like Figure A-11 after applying the 3NF rule to it, and you can calculate Total as the product of UnitPrice and Quantity in your SQL statement. For example:

    SELECT UnitPrice * Quantity AS Total FROM Orders

Figure A-11.
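As a runnable sketch of the same idea, the following uses Python's built-in sqlite3 module to store only the columns that depend on OrderId and derive Total at query time. The OrderTotals view is an optional convenience added here, not something the appendix prescribes, and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Orders after 3NF: no stored Total column.
CREATE TABLE Orders (
    OrderId    INTEGER PRIMARY KEY,
    CustomerId INTEGER,
    UnitPrice  REAL,
    Quantity   INTEGER
);
INSERT INTO Orders VALUES (1, 1, 2.50, 4), (2, 1, 10.00, 3);

-- Hypothetical view wrapping the appendix's SELECT so callers can reuse it.
CREATE VIEW OrderTotals AS
    SELECT OrderId, UnitPrice * Quantity AS Total FROM Orders;
""")

# Total is computed from UnitPrice and Quantity each time it is read.
totals = dict(conn.execute("SELECT OrderId, Total FROM OrderTotals"))

# Changing Quantity can't leave a stale Total behind, because none is stored.
conn.execute("UPDATE Orders SET Quantity = 5 WHERE OrderId = 1")
new_total = conn.execute(
    "SELECT Total FROM OrderTotals WHERE OrderId = 1").fetchone()[0]
```

Because Total is derived rather than stored, an update to UnitPrice or Quantity never has to be mirrored in a second column.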
