Data Warehouse – Part 02
Based on Chapter 06, The Data Warehouse, in Data Mining: A Tutorial-Based Primer by Roiger and Geatz
Data Warehouse Purpose
  House data for decision support
  Support organizational decision making so that it can be fact-based instead of ad hoc
Decision Support Categories
  Reporting
  Analyzing
  Knowledge Discovery
Sample of Credit Card Promotion Data (from Table 2.3)
  Income Range  Magazine Promo  Watch Promo  Life Ins Promo  CC Ins  Sex     Age
  40-50K        Yes             No           No              No      Male    45
  30-40K        Yes             Yes          Yes             No      Female  40
  40-50K        No              No           No              No      Male    42
  30-40K        Yes             Yes          Yes             Yes     Male    43
  50-60K        Yes             No           Yes             No      Female  38
  20-30K        No              No           No              No      Female  55
  30-40K        Yes             No           Yes             Yes     Male    35
  20-30K        No              Yes          No              No      Male    27
  30-40K        Yes             No           No              No      Male    43
  30-40K        Yes             Yes          Yes             No      Female  41
Credit Card Purchases and Promotions Constellation Design
Online Analytical Processing (OLAP)
  Query-based methodology that supports data analysis
  OLAP engine structures data as a cube
  As the term is used in business intelligence and data warehousing, a cube can have more than three dimensions
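To make the cube idea concrete, here is a minimal sketch in plain Python. It assumes a cube can be modelled as a dictionary whose keys are one value per dimension; the fact rows, product names, and amounts are hypothetical and are not taken from the textbook.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, year, region, sales amount).
facts = [
    ("Mythic World", 2005, "South", 120.0),
    ("Mythic World", 2005, "Central", 80.0),
    ("Mythic World", 2005, "South", 30.0),
    ("Mythic World", 2006, "South", 50.0),
    ("Star Quest", 2005, "Central", 70.0),
]


def build_cube(rows):
    """Aggregate fact rows into cells keyed by (product, year, region)."""
    cube = defaultdict(float)
    for product, year, region, amount in rows:
        cube[(product, year, region)] += amount
    return dict(cube)


cube = build_cube(facts)

# One cell of the cube: total sales for one product, year, and region.
total = cube[("Mythic World", 2005, "South")]
print(total)  # 150.0
```

Because each cell is addressed by a tuple, a fourth dimension (for example, sales channel) would only lengthen the key, which is one way to picture a cube with more than three dimensions.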
Find the Total Sales by Product, by Year, and by Region
  [Figure: data cube with axes Region, Product, and Year; visible labels include South, Central, Mythic World, and 2005]

Data Cubes
  [Figures: example data cubes]
  Figure sources: http://www.info-source.us/data__g_gwarehousing_mining/Data-Mining-and-Data-Warehousing-in-Biology-Medicine-and-Health-Care.html and http://zeesql.wordpress.com/2008/05/21/data-cubes/

Data Cube Characteristics
  Designed for a specific purpose
  For four dimensions, visualize multiple cubes with the same three dimensions, where each cube represents a particular value of the fourth dimension
  Extrapolate to n dimensions
  Figure source: http://zeesql.wordpress.com/2008/05/21/data-cubes/

Data Cube Characteristics
  Cubes with many empty cells are not as useful
  Thus, a cube with two time dimensions is not a good design, because the intersection of quarter and month would often be empty
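The empty-cell problem with two time dimensions can be counted directly. The sketch below assumes calendar quarters of three months each and simply enumerates every (quarter, month) intersection; it is an illustration of the point above, not a calculation from the textbook.

```python
def quarter_of(month):
    """Return the calendar quarter (1-4) that contains a month number (1-12)."""
    return (month - 1) // 3 + 1


quarters = range(1, 5)
months = range(1, 13)

# Every intersection of the two time dimensions.
cells = [(q, m) for q in quarters for m in months]

# A cell can hold data only when the month actually falls in that quarter.
populated = [(q, m) for q, m in cells if quarter_of(m) == q]

empty_fraction = 1 - len(populated) / len(cells)
print(len(cells), len(populated), empty_fraction)  # 48 12 0.75
```

Three quarters of the intersections can never contain data, which is why quarter and month are better treated as levels of a single time dimension than as two separate dimensions.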
  Figure source: http://www.info-source.us/data__g_gwarehousing_mining/Data-Mining-and-Data-Warehousing-in-Biology-Medicine-and-Health-Care.html

Data Store Behind Data Cube
  Store             Structure    Advantage
  Relational        Star schema  User can view data at the detail level defined by the star schema
  Multidimensional  Arrays       Query speed
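The two storage options can be contrasted with a small sketch. It assumes a toy star schema (one fact table, two dimension tables) held in an in-memory SQLite database, next to a precomputed array of totals; all table names, column names, and values are hypothetical.

```python
import sqlite3

# Relational store: a toy star schema with one fact table and two dimensions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE dim_category (category_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE fact_purchase (
        region_id INTEGER REFERENCES dim_region(region_id),
        category_id INTEGER REFERENCES dim_category(category_id),
        amount REAL
    );
    INSERT INTO dim_region VALUES (1, 'One'), (2, 'Two');
    INSERT INTO dim_category VALUES (1, 'Vehicle'), (2, 'Travel');
    INSERT INTO fact_purchase VALUES (2, 1, 100.0), (2, 1, 50.0), (1, 2, 75.0);
""")

# Totals are computed at query time by joining facts to dimensions,
# so the individual purchase rows remain available for detail views.
row = conn.execute("""
    SELECT SUM(f.amount), COUNT(*)
    FROM fact_purchase f
    JOIN dim_region r ON r.region_id = f.region_id
    JOIN dim_category c ON c.category_id = f.category_id
    WHERE r.name = 'Two' AND c.name = 'Vehicle'
""").fetchone()

# Multidimensional store: totals precomputed in an array, read by position.
regions = ["One", "Two"]
categories = ["Vehicle", "Travel"]
totals = [[0.0, 75.0], [150.0, 0.0]]  # totals[region][category]
cell = totals[regions.index("Two")][categories.index("Vehicle")]

print(row, cell)  # (150.0, 2) 150.0
```

The array lookup needs no join or scan, which is the query-speed advantage, while the relational version can still return each underlying purchase row.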
OLAP Interfaces
  Many are emerging, especially interfaces designed for visual exploration
  Default interface is a spreadsheet workbook format
  Useful OLAP functionality
    Different views of data
    Statistical calculations
    Drill-down and reverse drill-down (or roll-up)
      Look at data at a more granular (detail) level, or vice versa
  Short video in right panel demoing an OLAP interface: http://www.softwarefx.com/Extensions/featuresOlap.aspx
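Drill-down and roll-up amount to aggregating the same facts at different levels of detail. The following sketch assumes hypothetical purchase rows tagged with a quarter and a month; the helper name `aggregate` is illustrative only.

```python
from collections import defaultdict

# Hypothetical purchases: (quarter, month, amount).
purchases = [
    ("Q4", "Oct", 40.0),
    ("Q4", "Nov", 25.0),
    ("Q4", "Dec", 60.0),
    ("Q4", "Dec", 15.0),
    ("Q1", "Jan", 10.0),
]


def aggregate(rows, key):
    """Sum amounts grouped by the value that `key` extracts from each row."""
    totals = defaultdict(float)
    for row in rows:
        totals[key(row)] += row[2]
    return dict(totals)


# Roll-up: view totals at the coarser quarter level.
by_quarter = aggregate(purchases, key=lambda r: r[0])

# Drill-down: open one quarter and view it at the finer month level.
q4_rows = [r for r in purchases if r[0] == "Q4"]
q4_by_month = aggregate(q4_rows, key=lambda r: r[1])

print(by_quarter)   # {'Q4': 140.0, 'Q1': 10.0}
print(q4_by_month)  # {'Oct': 40.0, 'Nov': 25.0, 'Dec': 75.0}
```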
Slice
  A slice is a subset of a multidimensional array corresponding to a single value for one or more members of the dimensions not in the subset.
  Source: http://www.practicaldb.com/blog/cubes/

Dice
  The dice operation is a slice on more than two dimensions of a data cube (or more than two consecutive slices).

OLAP Concept Example
  Credit card purchase data
  Month = Dec.
  Category = Vehicle
  Region = Two
  Amount = 6,720
  Count = 110
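Slice and dice can be sketched on a small sparse cube. The sketch assumes the cube is a dictionary keyed by (month, region, category) that stores (amount, count) pairs; only the December, region Two, Vehicle cell uses the values given in this example, and the other cells are hypothetical.

```python
# Sparse cube: (month, region, category) -> (amount, count).
cube = {
    ("Dec", "Two", "Vehicle"): (6720, 110),  # values from the example
    ("Dec", "Two", "Travel"): (1500, 20),    # hypothetical
    ("Dec", "One", "Vehicle"): (3000, 45),   # hypothetical
    ("Nov", "Two", "Vehicle"): (5100, 90),   # hypothetical
}


def slice_cube(cube, month):
    """Fix the month dimension to one value; the result has two dimensions."""
    return {(r, c): v for (m, r, c), v in cube.items() if m == month}


def dice_cube(cube, months, regions, categories):
    """Keep a sub-cube by restricting each dimension to a set of values."""
    return {
        key: value
        for key, value in cube.items()
        if key[0] in months and key[1] in regions and key[2] in categories
    }


december = slice_cube(cube, "Dec")
sub_cube = dice_cube(cube, {"Nov", "Dec"}, {"Two"}, {"Vehicle"})

print(december[("Two", "Vehicle")])  # (6720, 110)
print(len(december), len(sub_cube))  # 3 2
```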
  The cell holds the total amount and total number of vehicle purchases in region two for the month of December.
  [Figure 6.6 A multidimensional cube for credit card purchases. Axes: Month (Jan. through Dec.), Region (One through Four), Category (Retail, Travel, Vehicle, Restaurant, Supermarket, Miscellaneous)]

Attributes May Be Based on Concept Hierarchy
  [Figure: concept hierarchy for the Location attribute]

Excel Pivot Tables
  Accomplish the cube concept
  Aggregate your information
  Show a new perspective
  Source: http://www.timeatlas.com/5_minute_tips/chunkers/learn_to_use_pivot_tables_in_excel_2007_to_organize_data
Excel Pivot Table – Example – p. 1
  Open CreditCardPromotion.xlsx
  Copy the original data to a new worksheet
    In order to preserve the original data
  Remove any blank columns or rows
  Each column must have a heading
  Cells should be properly formatted for the data type
  Highlight the data
Excel Pivot Table – Example – p. 2
  Click Insert
  Select Pivot Table
  Select Pivot Table to open the Create Pivot Table dialog box
  Select Table/Range to make sure you selected the correct range
  Select the New Worksheet button
  Click OK
Excel Pivot Table – Example – p. 3
  Select Income Range for row labels
  Select Income Range for values
  Click on Count of Income Range
  Go to Field Settings
  Choose the % of column setting
Excel Pivot Table – Example – p. 4
  Highlight the percentages
  Select Insert Pie Chart
  [Figure: pie chart of the Total percentages by income range: 20-30K, 30-40K, 40-50K, 50-60K]
Excel Pivot Table – Example – p. 5
  Check out the drill-down functionality
  Double-click in the pivot table on the % value for a particular income range
  The detail values for that income range are displayed in a new worksheet
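For readers without Excel, the count, percent-of-column, and drill-down steps of this example can be approximated in plain Python. The sketch uses only the Income Range and Age values of the ten sample rows listed earlier in these slides; the full CreditCardPromotion.xlsx workbook may contain more rows, so its percentages can differ. The helper name `drill_down` is illustrative.

```python
from collections import Counter

# (Income Range, Age) for the ten sample rows shown earlier in the slides.
rows = [
    ("40-50K", 45), ("30-40K", 40), ("40-50K", 42), ("30-40K", 43),
    ("50-60K", 38), ("20-30K", 55), ("30-40K", 35), ("20-30K", 27),
    ("30-40K", 43), ("30-40K", 41),
]

# Pivot step: count rows per income range, then express as % of the column.
counts = Counter(income for income, _ in rows)
total = sum(counts.values())
percent = {income: 100 * n / total for income, n in sorted(counts.items())}


def drill_down(rows, income):
    """Return the detail rows behind one pivot-table cell."""
    return [row for row in rows if row[0] == income]


print(percent)
# {'20-30K': 20.0, '30-40K': 50.0, '40-50K': 20.0, '50-60K': 10.0}
print(drill_down(rows, "20-30K"))
# [('20-30K', 55), ('20-30K', 27)]
```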
Continue with page 205: Creating a Multidimensional Pivot Table