Normalization Rules
Normalization is the process of removing data redundancy by applying normalization rules. There are five degrees of normal forms, from the first normal form through the fifth normal form, as described in this appendix.

First Normal Form

The following are the characteristics of first normal form (1NF):

• There must not be any repeating columns or groups of columns. An example of a repeating column is a customer table with Phone Number 1 and Phone Number 2 columns. Using “table (column, column)” notation, an example of a repeating group of columns is Order Table (Order ID, Order Date, Product ID, Price, Quantity, Product ID, Price, Quantity). Product ID, Price, and Quantity are the repeating group of columns.
• Each table must have a primary key (PK) that uniquely identifies each row. The PK can be a composite, that is, it can consist of several columns, for example, Order Table (Order ID, Order Date, Customer ID, Product ID, Product Name, Price, Quantity). Here, Order ID and Product ID together form a composite PK.

Second Normal Form

The following are the characteristics of second normal form (2NF):

• It must be in 1NF.
• When each value in column 1 is associated with exactly one value in column 2, we say that column 2 is dependent on column 1, for example, Customer (Customer ID, Customer Name). Customer Name is dependent on Customer ID, noted as Customer ID ➤ Customer Name.
• In 2NF, all non-PK columns must be dependent on the entire PK, not just on part of it. For example, take Order Table (Order ID, Order Date, Product ID, Price, Quantity), where Order ID and Product ID form a composite PK. Order Date is dependent on Order ID but not on Product ID. This violates 2NF.
• To make it 2NF, we need to break it into two tables: Order Header (Order ID, Order Date) and Order Item (Order ID, Product ID, Price, Quantity). Now all non-PK columns are dependent on the entire PK.
  In the Order Header table, Order Date is dependent on Order ID. In the Order Item table, Price and Quantity are dependent on Order ID and Product ID. Order ID in the Order Item table is a foreign key.

Third Normal Form

The following are the characteristics of third normal form (3NF):

• It must be in 2NF.
• If column 1 is dependent on column 2 and column 2 is dependent on column 3, we say that column 1 is transitively dependent on column 3. In 3NF, no column is transitively dependent on the PK. For example, take Product (Product ID, Product Name, Category ID, Category Name). Category Name is dependent on Category ID, and Category ID is dependent on Product ID. Category Name is therefore transitively dependent on the PK (Product ID). This violates 3NF.
• To make it 3NF, we need to break it into two tables: Product (Product ID, Product Name, Category ID) and Category (Category ID, Category Name). Now no column is transitively dependent on the PK. Category ID in the Product table is a foreign key.

Boyce-Codd Normal Form

Boyce-Codd Normal Form (BCNF) is between 3NF and 4NF. The following are the characteristics of BCNF:

• It must be in 3NF.
• In Customer ID ➤ Customer Name, we say that Customer ID is a determinant. In BCNF, every determinant must be a candidate PK. A candidate PK means capable of being a PK; that is, it uniquely identifies each row.
• BCNF is applicable to situations where you have two or more candidate composite PKs, such as with a cable TV service engineer visiting customers: Visit (Date, Route ID, Shift ID, Customer ID, Engineer ID, Vehicle ID). A visit to a customer can be identified using Date, Route ID, and Customer ID as the composite PK. Alternatively, the PK can be Shift ID and Customer ID. Shift ID is the determinant of Date and Route ID. Because Shift ID alone does not uniquely identify a visit, it is a determinant that is not a candidate PK, so this table is not in BCNF.

Higher Normal Forms

The following are the characteristics of other normal forms:

• A table is in fourth normal form (4NF) when it is in BCNF and there are no multivalued dependencies.
• A table is in fifth normal form (5NF) when it is in 4NF and there are no cyclic dependencies.

It is good practice to apply 4NF or 5NF when applicable.

■ Note: A sixth normal form (6NF) has been suggested, but it’s not widely accepted or implemented yet.
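
The 2NF decomposition described in the Second Normal Form section can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a prescribed implementation: the table and column names (order_header, order_item, and so on) and the sample rows are assumptions made up for the example. It splits the Order Table into Order Header and Order Item, then joins them back to show that no information is lost.

```python
import sqlite3

# Made-up sample rows for the unnormalized Order Table:
# (Order ID, Order Date, Product ID, Price, Quantity).
order_rows = [
    (1, "2007-01-05", 10, 9.99, 2),
    (1, "2007-01-05", 11, 4.50, 1),
    (2, "2007-01-06", 10, 9.99, 3),
]

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE order_header (
        order_id   INTEGER PRIMARY KEY,
        order_date TEXT NOT NULL
    );
    CREATE TABLE order_item (
        order_id   INTEGER NOT NULL REFERENCES order_header(order_id),
        product_id INTEGER NOT NULL,
        price      REAL NOT NULL,
        quantity   INTEGER NOT NULL,
        PRIMARY KEY (order_id, product_id)
    );
""")

# Order Date depends only on Order ID, so it is stored once per order.
# INSERT OR IGNORE skips the repeated Order ID coming from the flat rows.
conn.executemany(
    "INSERT OR IGNORE INTO order_header VALUES (?, ?)",
    [(oid, odate) for oid, odate, _, _, _ in order_rows],
)

# Price and Quantity depend on the whole composite PK (Order ID, Product ID).
conn.executemany(
    "INSERT INTO order_item VALUES (?, ?, ?, ?)",
    [(oid, pid, price, qty) for oid, _, pid, price, qty in order_rows],
)

header_count = conn.execute("SELECT COUNT(*) FROM order_header").fetchone()[0]

# Joining on the foreign key rebuilds the original rows.
rejoined = conn.execute("""
    SELECT h.order_id, h.order_date, i.product_id, i.price, i.quantity
    FROM order_header AS h
    JOIN order_item AS i ON i.order_id = h.order_id
    ORDER BY h.order_id, i.product_id
""").fetchall()

print(header_count)
print(rejoined == order_rows)
```

The same pattern applies to the 3NF example: Category Name would move to a separate Category table keyed by Category ID, with Category ID kept in Product as a foreign key.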
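
The BCNF rule from the Boyce-Codd Normal Form section (every determinant must be a candidate PK) can be checked mechanically. The following is a rough sketch under stated assumptions: the helper names closure and bcnf_violations are hypothetical, the dependencies are our reading of the Visit example, and the check tests whether a determinant identifies every column (a superkey test) without verifying that it is minimal.

```python
def closure(attrs, fds):
    """Return every column determined by `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know all of lhs, we also know all of rhs.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(columns, fds):
    """List determinants that do not determine every column of the table."""
    return [lhs for lhs, _ in fds if not closure(lhs, fds) >= columns]


# Visit (Date, Route ID, Shift ID, Customer ID, Engineer ID, Vehicle ID)
visit = frozenset(
    {"Date", "Route ID", "Shift ID", "Customer ID", "Engineer ID", "Vehicle ID"}
)

# Dependencies as read from the Visit example (an assumption of this sketch):
# both composite PKs determine all columns, and Shift ID determines
# Date and Route ID.
visit_fds = [
    (frozenset({"Date", "Route ID", "Customer ID"}), visit),
    (frozenset({"Shift ID", "Customer ID"}), visit),
    (frozenset({"Shift ID"}), frozenset({"Date", "Route ID"})),
]

violations = bcnf_violations(visit, visit_fds)
print(violations)
```

Under these assumed dependencies, only Shift ID is reported: it determines Date and Route ID but cannot identify a row without Customer ID.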