The Data Warehouse
Chapter 6

6.1 Operational Databases

Data Modeling and Normalization
• One-to-One Relationships
• One-to-Many Relationships
• Many-to-Many Relationships

• First Normal Form
• Second Normal Form
• Third Normal Form

Figure 6.1 A simple entity-relationship diagram (entities: Vehicle-Type and Customer; attributes: Type ID, Make, Year, Customer ID, Income Range)

The Relational Model

Table 6.1a • Relational Table for Vehicle-Type
Type ID   Make        Year
4371      Chevrolet   1995
6940      Cadillac    2000
4595      Chevrolet   2001
2390      Cadillac    1997

Table 6.1b • Relational Table for Customer
Customer ID   Income Range ($)   Type ID
0001          70–90K             2390
0002          30–50K             4371
0003          70–90K             6940
0004          30–50K             4595
0005          70–90K             2390

Table 6.2 • Join of Tables 6.1a and 6.1b
Customer ID   Income Range ($)   Type ID   Make        Year
0001          70–90K             2390      Cadillac    1997
0002          30–50K             4371      Chevrolet   1995
0003          70–90K             6940      Cadillac    2000
0004          30–50K             4595      Chevrolet   2001
0005          70–90K             2390      Cadillac    1997

6.2 Data Warehouse Design

The Data Warehouse
“A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision making process” (W.H. Inmon).

Granularity
Granularity is a term used to describe the level of detail of stored information.
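Table 6.2 is obtained by joining Tables 6.1a and 6.1b on their shared Type ID column. Type ID is unique in Vehicle-Type but repeats in Customer (2390 appears twice), which is how the one-to-many relationship between vehicle types and customers is represented. The following is a minimal sketch using Python's built-in sqlite3 module; the table and column names (vehicle_type, customer, type_id, and so on) and the helper functions are our own choices, not taken from the text.

```python
import sqlite3

# Rows copied from Table 6.1a: (Type ID, Make, Year)
VEHICLE_TYPES = [
    ("4371", "Chevrolet", 1995),
    ("6940", "Cadillac", 2000),
    ("4595", "Chevrolet", 2001),
    ("2390", "Cadillac", 1997),
]

# Rows copied from Table 6.1b: (Customer ID, Income Range, Type ID)
CUSTOMERS = [
    ("0001", "70–90K", "2390"),
    ("0002", "30–50K", "4371"),
    ("0003", "70–90K", "6940"),
    ("0004", "30–50K", "4595"),
    ("0005", "70–90K", "2390"),
]


def build_tables(conn: sqlite3.Connection) -> None:
    """Create and populate both tables; IDs are stored as text to keep leading zeros."""
    conn.execute(
        "CREATE TABLE vehicle_type (type_id TEXT PRIMARY KEY, make TEXT, year INTEGER)"
    )
    # type_id in customer is a foreign key pointing at vehicle_type.
    conn.execute(
        "CREATE TABLE customer (customer_id TEXT PRIMARY KEY, income_range TEXT, "
        "type_id TEXT REFERENCES vehicle_type(type_id))"
    )
    conn.executemany("INSERT INTO vehicle_type VALUES (?, ?, ?)", VEHICLE_TYPES)
    conn.executemany("INSERT INTO customer VALUES (?, ?, ?)", CUSTOMERS)


def join_customer_vehicle(conn: sqlite3.Connection) -> list:
    """Join Customer to Vehicle-Type on Type ID, producing the rows of Table 6.2."""
    return conn.execute(
        "SELECT c.customer_id, c.income_range, c.type_id, v.make, v.year "
        "FROM customer AS c JOIN vehicle_type AS v ON c.type_id = v.type_id "
        "ORDER BY c.customer_id"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_tables(conn)
    for row in join_customer_vehicle(conn):
        print(row)
```

Because customers 0001 and 0005 both reference Type ID 2390, the join repeats Cadillac/1997 for each of them; keeping make and year only in Vehicle-Type avoids storing that information redundantly in every customer row.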
Figure 6.2 A data warehouse process model (components: operational database(s), external data, ETL (extract/transform/load) routine, data warehouse, dependent data mart, independent data mart, extract/summarize data, decision support system, report)

Entering Data into the Warehouse
• Independent Data Mart
• ETL (Extract, Transform, Load) Routine
• Metadata

Structuring the Data Warehouse: Two Methods
• Structure the warehouse model using the star schema
• Structure the warehouse model as a multidimensional array

The Star Schema
• Fact Table
• Dimension Tables
• Slowly Changing Dimensions

Purchase Dimension
Purchase Key   Category
1              Supermarket
2              Travel & Entertainment
3              Auto & Vehicle
4              Retail
5              Restaurant
6              Miscellaneous

Time Dimension
Time Key   Month   Day   Quarter   Year
10         Jan     5     1         2002
...

Fact Table
Cardholder Key   Purchase Key   Location Key   Time Key   Amount
1                2              1              10         14.50
15               4              5              11         8.25
1                2              3              10         22.40
...

Cardholder Dimension
Cardholder Key   Name         Gender   Income Range
1                John Doe     Male     50–70,000
2                Sara Smith   Female   70–90,000
...

Location Dimension
Location Key   Street          City         State   Region
10             425 Church St   Charleston   SC      3

Figure 6.3 A star schema for credit card purchases

The Multidimensionality of the Star Schema
Figure 6.4 Dimensions of the fact table shown in Figure 6.3 (axes: Cardholder, Purchase Key, Location Key, Time Key; one cell is labeled A(Ci, 1, 2, 10))

Additional Relational Schemas
• Snowflake Schema
• Constellation Schema
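A typical star schema query joins the fact table to one or more dimension tables on their keys and aggregates the fact table's measure (here, Amount). The sketch below loads only the three fact rows and the purchase dimension visible in Figure 6.3 into an in-memory SQLite database and totals purchases by category. The table names, column names and functions are hypothetical; the cardholder, location and time dimensions are omitted for brevity but would be joined the same way on their keys.

```python
import sqlite3

# Purchase dimension as listed in Figure 6.3: (Purchase Key, Category)
PURCHASE_DIM = [
    (1, "Supermarket"),
    (2, "Travel & Entertainment"),
    (3, "Auto & Vehicle"),
    (4, "Retail"),
    (5, "Restaurant"),
    (6, "Miscellaneous"),
]

# The three fact-table rows visible in Figure 6.3 (the figure elides the rest):
# (Cardholder Key, Purchase Key, Location Key, Time Key, Amount)
PURCHASE_FACTS = [
    (1, 2, 1, 10, 14.50),
    (15, 4, 5, 11, 8.25),
    (1, 2, 3, 10, 22.40),
]


def build_star(conn: sqlite3.Connection) -> None:
    """Create the fact table and one of its dimension tables."""
    conn.execute(
        "CREATE TABLE purchase_dim (purchase_key INTEGER PRIMARY KEY, category TEXT)"
    )
    # Each key column in the fact table would point at a dimension table;
    # only purchase_key is backed by a dimension table in this sketch.
    conn.execute(
        "CREATE TABLE purchase_fact (cardholder_key INTEGER, "
        "purchase_key INTEGER REFERENCES purchase_dim(purchase_key), "
        "location_key INTEGER, time_key INTEGER, amount REAL)"
    )
    conn.executemany("INSERT INTO purchase_dim VALUES (?, ?)", PURCHASE_DIM)
    conn.executemany("INSERT INTO purchase_fact VALUES (?, ?, ?, ?, ?)", PURCHASE_FACTS)


def total_by_category(conn: sqlite3.Connection) -> list:
    """Join facts to the purchase dimension and aggregate the Amount measure."""
    return conn.execute(
        "SELECT d.category, ROUND(SUM(f.amount), 2) AS total, COUNT(*) AS n "
        "FROM purchase_fact AS f "
        "JOIN purchase_dim AS d ON f.purchase_key = d.purchase_key "
        "GROUP BY d.category ORDER BY d.category"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star(conn)
    for category, total, n in total_by_category(conn):
        print(f"{category}: {total:.2f} across {n} purchase(s)")
```

Run as a script, it reports 36.90 for Travel & Entertainment (two purchases) and 8.25 for Retail. Keeping descriptive attributes such as Category in a dimension table means the fact table stores only keys and measures.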
Time Dimension
Time Key   Month   Day   Quarter   Year
5          Dec     31    4         2001
8          Jan     3     1         2002
10         Jan     5     1         2002
...

Promotion Dimension
Promotion Key   Description   Cost
1               watch promo   15.25
...

Purchase Dimension
Purchase Key   Category
1              Supermarket
2              Travel & Entertainment
3              Auto & Vehicle
4              Retail
5              Restaurant
6              Miscellaneous

Promotion Fact Table
Cardholder Key   Promotion Key   Time Key   Response
1                1               5          Yes
2                1               5          No
...

Purchase Fact Table
Cardholder Key   Purchase Key   Location Key   Time Key   Amount
1                2              1              10         14.50
15               4              5              11         8.25
1                2              3              10         22.40
...

Cardholder Dimension
Cardholder Key   Name         Gender   Income Range
1                John Doe     Male     50–70,000
2                Sara Smith   Female   70–90,000
...

Location Dimension
Location Key   Street          City         State   Region
5              425 Church St   Charleston   SC      3

Figure 6.5 A constellation schema for credit card purchases and promotions

Decision Support: Analyzing the Warehouse Data
• Reporting Data
• Analyzing Data
• Knowledge Discovery

6.3 On-line Analytical Processing

OLAP Operations
• Slice – A single-dimension operation
• Dice – A multidimensional operation
• Roll-up – A higher level of generalization
• Drill-down – A greater level of detail
• Rotation – View data from a new perspective

Figure 6.6 A multidimensional cube for credit card purchases (dimensions: Month, Jan.–Dec.; Category: Supermarket, Travel, Vehicle, Retail, Restaurant, Miscellaneous; Region: One, Two, Three, Four. Highlighted cell: Month = Dec., Category = Vehicle, Region = Two, Amount = 6,720, Count = 110)

Concept Hierarchy
A mapping that allows attributes to be viewed from varying levels of detail.

Figure 6.7 A concept hierarchy for location (from most general to most specific: Region, State, City, Street Address)
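The slice, dice and roll-up operations can be illustrated on a tiny cube whose dimensions match Figure 6.6 (month, category, region). This is a minimal sketch in plain Python, not an OLAP server; the fact values and the function names slice_cube, dice_cube and roll_up_to_quarter are hypothetical. Roll-up climbs one level of a time concept hierarchy (month to quarter), in the same way that Figure 6.7 arranges location levels; drill-down is the reverse move, which here simply means returning to the month-level facts.

```python
from collections import defaultdict

# Cube dimensions, in the order they appear in each fact tuple.
DIMS = ("month", "category", "region")

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
# One level of a time concept hierarchy: month -> quarter (Jan-Mar -> Q1, ...).
MONTH_TO_QUARTER = {m: f"Q{i // 3 + 1}" for i, m in enumerate(MONTHS)}

# Hypothetical purchase facts: (month, category, region, amount in dollars).
FACTS = [
    ("Oct", "Supermarket", "One", 120),
    ("Nov", "Supermarket", "One", 80),
    ("Dec", "Supermarket", "One", 100),
    ("Dec", "Vehicle", "Two", 400),
    ("Jan", "Retail", "Three", 50),
]


def slice_cube(facts, dim, value):
    """Slice: fix one dimension to a single value."""
    i = DIMS.index(dim)
    return [f for f in facts if f[i] == value]


def dice_cube(facts, **allowed):
    """Dice: restrict several dimensions at once to sets of allowed values."""
    tests = {DIMS.index(d): set(vals) for d, vals in allowed.items()}
    return [f for f in facts if all(f[i] in vals for i, vals in tests.items())]


def roll_up_to_quarter(facts):
    """Roll-up: replace month by quarter and sum the amounts in each new cell."""
    totals = defaultdict(int)
    for month, category, region, amount in facts:
        totals[(MONTH_TO_QUARTER[month], category, region)] += amount
    return dict(totals)


if __name__ == "__main__":
    print(slice_cube(FACTS, "month", "Dec"))
    print(dice_cube(FACTS, category=["Supermarket"], region=["One"]))
    print(roll_up_to_quarter(FACTS))
```

With these made-up facts, rolling up merges the October, November and December Supermarket/Region One cells into a single Q4 cell totaling 300.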
Figure 6.8 Rolling up from months to quarters (axes: Time, Q1–Q4; Category; Region, One–Four. Highlighted cell: Month = Oct./Nov./Dec., Category = Supermarket, Region = One)

6.4 Excel Pivot Tables for Data Analysis

Creating a Simple Pivot Table
Figure 6.9 A pivot table template
Figure 6.10 A summary report for income range
Figure 6.11 A pie chart for income range

Pivot Tables for Hypothesis Testing
Figure 6.12 A pivot table showing age and credit card insurance choice
Figure 6.13 Grouping the credit card promotion data by age
Figure 6.14 PivotTable Layout Wizard

Creating a Multidimensional Pivot Table
Figure 6.15 A credit card promotion cube (dimensions: Watch Promo, Life Insurance Promo, Magazine Promo, each Yes/No; highlighted cell: Watch Promo = No, Life Insurance Promo = Yes, Magazine Promo = Yes)
Figure 6.16 A pivot table with page variables for credit card promotions.
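A pivot table with a Count summary is a cross-tabulation: one attribute supplies the rows, another supplies the columns, and each cell counts the matching records. The sketch below reproduces that idea in plain Python rather than through Excel, using the rows of Table 6.2 with Income Range as the row field and Make as the column field. The helper names pivot_count and print_pivot are hypothetical and do not correspond to any Excel feature.

```python
from collections import Counter

# Rows of Table 6.2: (customer_id, income_range, type_id, make, year)
FIELDS = ("customer_id", "income_range", "type_id", "make", "year")
JOINED = [
    ("0001", "70–90K", "2390", "Cadillac", 1997),
    ("0002", "30–50K", "4371", "Chevrolet", 1995),
    ("0003", "70–90K", "6940", "Cadillac", 2000),
    ("0004", "30–50K", "4595", "Chevrolet", 2001),
    ("0005", "70–90K", "2390", "Cadillac", 1997),
]


def pivot_count(rows, row_field, col_field):
    """Cross-tabulate record counts, like a pivot table whose data field is a count."""
    r, c = FIELDS.index(row_field), FIELDS.index(col_field)
    counts = Counter((row[r], row[c]) for row in rows)
    row_keys = sorted({row[r] for row in rows})
    col_keys = sorted({row[c] for row in rows})
    # Counter returns 0 for combinations that never occur (empty pivot cells).
    return {rk: {ck: counts[(rk, ck)] for ck in col_keys} for rk in row_keys}


def print_pivot(table, row_label):
    """Print the cross-tab with a row-total column."""
    col_keys = list(next(iter(table.values())))
    print(row_label.ljust(14) + "".join(k.ljust(11) for k in col_keys) + "Total")
    for rk, cells in table.items():
        line = rk.ljust(14) + "".join(str(cells[k]).ljust(11) for k in col_keys)
        print(line + str(sum(cells.values())))


if __name__ == "__main__":
    print_pivot(pivot_count(JOINED, "income_range", "make"), "Income Range")
```

For these five customers, every 70–90K customer owns a Cadillac (3) and every 30–50K customer owns a Chevrolet (2), the kind of pattern a pivot table makes easy to spot before testing it more formally.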