Data Warehouse – Part 02

Based on Chapter 6, "The Data Warehouse," in Data Mining: A Tutorial-Based Primer by Roiger and Geatz

Data Warehouse Purpose

- House data for decision support
- Support organizational decision making, so that decisions are fact-based rather than ad hoc

Decision Support Categories

- Reporting
- Analyzing
- Knowledge discovery

Sample of Credit Card Promotion Data (from Table 2.3)

Income Range  Magazine Promo  Watch Promo  Life Ins Promo  CC Ins  Sex     Age
40-50K        Yes             No           No              No      Male    45
30-40K        Yes             Yes          Yes             No      Female  40
40-50K        No              No           No              No      Male    42
30-40K        Yes             Yes          Yes             Yes     Male    43
50-60K        Yes             No           Yes             No      Female  38
20-30K        No              No           No              No      Female  55
30-40K        Yes             No           Yes             Yes     Male    35
20-30K        No              Yes          No              No      Male    27
30-40K        Yes             No           No              No      Male    43
30-40K        Yes             Yes          Yes             No      Female  41

Credit Card Purchases and Promotions Constellation Design

Online Analytical Processing (OLAP)
- Query-based methodology that supports data analysis
- The OLAP engine structures data as a cube
- A cube can have more than three dimensions, as the term cube is used in data warehousing

Find the Total Sales by Product, by Year, and by Region

[Figure: data cube with the dimensions Product (e.g., Mythic World), Year (e.g., 2005), and Region (e.g., South Central)]
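As a minimal sketch (not the book's method), the query on this slide can be answered by aggregating fact rows into cells keyed by (product, year, region), with a plain Python dictionary standing in for the OLAP engine's cube storage. Only the labels Mythic World, 2005, and South Central come from the figure; the product Star Quest, the region Northeast, the year 2006, and all sales amounts are hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: (product, year, region, sales amount).
# Only "Mythic World", 2005, and "South Central" appear on the slide.
facts = [
    ("Mythic World", 2005, "South Central", 120.0),
    ("Mythic World", 2005, "South Central", 80.0),
    ("Mythic World", 2006, "Northeast", 50.0),
    ("Star Quest", 2005, "South Central", 30.0),
]


def build_cube(rows):
    """Sum the sales amount into one cell per (product, year, region)."""
    cube = defaultdict(float)
    for product, year, region, amount in rows:
        cube[(product, year, region)] += amount
    return dict(cube)


cube = build_cube(facts)
# Total sales of Mythic World in 2005 for the South Central region.
print(cube[("Mythic World", 2005, "South Central")])  # 200.0
```

Each cell holds a pre-summed total, so answering "total sales for this product, year, and region" becomes a single lookup instead of a scan over the fact rows.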

Data Cubes

Sources: http://www.info-source.us/data__g_gwarehousing_mining/Data-Mining-and-Data-Warehousing-in-Biology-Medicine-and-Health-Care.html; http://zeesql.wordpress.com/2008/05/21/data-cubes/

Data Cube Characteristics
- Designed for a specific purpose
- For four dimensions, visualize multiple cubes with the same three dimensions, where each cube represents a particular value of the fourth dimension
- Extrapolate to n dimensions
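The four-dimensional picture can be sketched by grouping cells on the value of the last dimension, which yields one three-dimensional cube per value. This is an illustrative sketch only: the fourth dimension "channel" (Online, Store), the product Star Quest, and all values are hypothetical. Because the function only splits off the last coordinate of each key, it works for any number of dimensions, mirroring the idea of extrapolating to n dimensions.

```python
from collections import defaultdict

# Hypothetical 4-D cells: (product, year, region, channel) -> sales.
cells_4d = {
    ("Mythic World", 2005, "South Central", "Online"): 70.0,
    ("Mythic World", 2005, "South Central", "Store"): 130.0,
    ("Star Quest", 2006, "Northeast", "Online"): 40.0,
}


def split_on_last_dimension(cells):
    """Group n-D cells into one (n-1)-D cube per value of the last dimension."""
    cubes = defaultdict(dict)
    for key, value in cells.items():
        *rest, last = key
        cubes[last][tuple(rest)] = value
    return dict(cubes)


cubes_3d = split_on_last_dimension(cells_4d)
for channel in sorted(cubes_3d):
    print(channel, cubes_3d[channel])
```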

Source: http://zeesql.wordpress.com/2008/05/21/data-cubes/

Data Cube Characteristics
- Cubes with many empty cells are not as useful
- Thus, a cube with two time dimensions is not a good design, because the intersection of a quarter and a month would often be empty: each month falls in only one quarter
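The empty-cell argument is simple arithmetic: with 4 quarters and 12 months, the quarter-by-month plane has 48 cells, but each month belongs to exactly one quarter, so at most 12 of them can ever hold data. The sketch below assumes a standard calendar (Q1 = Jan. through Mar., and so on) and counts the cells that can be non-empty.

```python
quarters = ["Q1", "Q2", "Q3", "Q4"]
months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def quarter_of(month_index):
    """Map a 0-based month index to its calendar quarter (3 months per quarter)."""
    return quarters[month_index // 3]


total_cells = len(quarters) * len(months)
# A (quarter, month) cell can hold data only if the month lies in that quarter.
possible_cells = sum(
    1
    for q in quarters
    for i in range(len(months))
    if quarter_of(i) == q
)
empty_fraction = 1 - possible_cells / total_cells
print(total_cells, possible_cells, empty_fraction)  # 48 12 0.75
```

Even if every possible cell is filled, three quarters of the plane stays empty, and that empty share repeats for every combination of the cube's other dimensions.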

Source: http://www.info-source.us/data__g_gwarehousing_mining/Data-Mining-and-Data-Warehousing-in-Biology-Medicine-and-Health-Care.html

Data Store Behind Data Cube

Relational
- Star schema
- Advantage: the user can view data at the detail level defined by the star schema

Multidimensional
- Arrays
- Advantage: query speed
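To contrast the two data stores, the sketch below keeps the same purchases both as relational fact rows (standing in for a star schema's fact table, with the dimension tables omitted) and as a dense three-dimensional array of pre-aggregated totals. It is a simplified illustration, not how any particular OLAP product stores data; the reduced sets of dimension values and all amounts are hypothetical.

```python
# Reduced, hypothetical dimension values.
months = ["Nov", "Dec"]
categories = ["Travel", "Vehicle"]
regions = ["One", "Two"]

# Position of each label along its dimension, for array indexing.
month_pos = {m: i for i, m in enumerate(months)}
category_pos = {c: i for i, c in enumerate(categories)}
region_pos = {r: i for i, r in enumerate(regions)}

# Relational store: detail-level fact rows (month, category, region, amount).
fact_rows = [
    ("Dec", "Vehicle", "Two", 500),
    ("Dec", "Vehicle", "Two", 250),
    ("Nov", "Travel", "One", 90),
]

# Multidimensional store: a dense array of totals, filled once up front.
array = [[[0 for _ in regions] for _ in categories] for _ in months]
for month, category, region, amount in fact_rows:
    array[month_pos[month]][category_pos[category]][region_pos[region]] += amount

target = ("Dec", "Vehicle", "Two")
# Relational query: scan the fact rows and sum the matches.
relational_total = sum(a for m, c, r, a in fact_rows if (m, c, r) == target)
# Array query: a single index lookup into the precomputed totals.
array_total = array[month_pos["Dec"]][category_pos["Vehicle"]][region_pos["Two"]]
print(relational_total, array_total)  # 750 750
```

Both stores give the same answer; the array trades the individual detail rows for a direct lookup, which is the query-speed advantage listed above.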

OLAP Interfaces
- Many are emerging, especially interfaces designed for visual exploration
- The default interface is a workbook format
- Useful OLAP functionality
  - Different views of the data
  - Statistical calculations
  - Drill-down and reverse drill-down (or roll-up)
    - Look at data at a more granular (detailed) level, or vice versa
- Short video demoing an OLAP interface: http://www.softwarefx.com/Extensions/featuresOlap.aspx
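Roll-up and drill-down can be sketched as moving along a concept hierarchy, here from month up to quarter and back down. The monthly sales figures and the two-level hierarchy are hypothetical; a real OLAP engine would precompute or index these aggregates rather than recompute them on each request.

```python
from collections import defaultdict

# Hypothetical monthly sales totals (the detail level).
monthly_sales = {"Jan": 10, "Feb": 20, "Mar": 30, "Apr": 5, "May": 15, "Jun": 25}
# Concept hierarchy: each month maps to its parent quarter.
month_to_quarter = {"Jan": "Q1", "Feb": "Q1", "Mar": "Q1",
                    "Apr": "Q2", "May": "Q2", "Jun": "Q2"}


def roll_up(detail, parent_of):
    """Aggregate detail values up one level of the hierarchy."""
    totals = defaultdict(int)
    for member, value in detail.items():
        totals[parent_of[member]] += value
    return dict(totals)


def drill_down(detail, parent_of, parent):
    """Return the detail-level values underneath one parent member."""
    return {m: v for m, v in detail.items() if parent_of[m] == parent}


quarterly = roll_up(monthly_sales, month_to_quarter)
print(quarterly)  # {'Q1': 60, 'Q2': 45}
print(drill_down(monthly_sales, month_to_quarter, "Q2"))  # {'Apr': 5, 'May': 15, 'Jun': 25}
```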

Slice

A slice is a subset of a multidimensional array corresponding to a single value for one or more members of the dimensions not in the subset.
Source: http://www.practicaldb.com/blog/cubes/

Dice

The dice operation is a slice on more than two dimensions of a data cube (or more than two consecutive slices).

OLAP Concept Example
- Credit card purchase data
- Month = Dec., Category = Vehicle, Region = Two
- Amount = 6,720
- Count = 110
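A minimal sketch of slice and dice over a dictionary-based cube keyed by (month, category, region), with (total amount, purchase count) in each cell. Only the Dec. / Vehicle / Two cell (6,720 and 110) comes from the example; the other cells are hypothetical. Dice is written here as choosing a set of allowed values on several dimensions, which reduces to the slide's "slice on more than two dimensions" when each set holds a single value.

```python
DIMS = ("month", "category", "region")

# Cells: (month, category, region) -> (total amount, purchase count).
# Only the ("Dec", "Vehicle", "Two") cell comes from the slide.
cube = {
    ("Dec", "Vehicle", "Two"): (6720, 110),
    ("Dec", "Travel", "Two"): (1500, 12),
    ("Dec", "Vehicle", "Four"): (900, 15),
    ("Nov", "Vehicle", "One"): (2300, 40),
}


def slice_cube(cells, dim, value):
    """Fix one dimension to a single value and drop it from the cell keys."""
    i = DIMS.index(dim)
    return {key[:i] + key[i + 1:]: v for key, v in cells.items() if key[i] == value}


def dice_cube(cells, **allowed):
    """Keep cells whose coordinate on each named dimension is in the allowed set."""
    return {
        key: v
        for key, v in cells.items()
        if all(key[DIMS.index(dim)] in values for dim, values in allowed.items())
    }


december = slice_cube(cube, "month", "Dec")
print(december[("Vehicle", "Two")])  # (6720, 110)
print(dice_cube(cube, month={"Dec"}, category={"Vehicle"}, region={"Two"}))
# {('Dec', 'Vehicle', 'Two'): (6720, 110)}
print(len(dice_cube(cube, month={"Nov", "Dec"}, category={"Vehicle"})))  # 3
```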

[Figure: multidimensional cube with the dimensions Month (Jan. through Dec.), Region (One through Four), and Category (Retail, Travel, Vehicle, Restaurant, Supermarket, Miscellaneous). The highlighted cell holds the total amount and total number of vehicle purchases in region two for the month of December.]

Figure 6.6 A multidimensional cube for credit card purchases

Attributes May Be Based on Concept Hierarchy

[Figure: concept hierarchy for the Location attribute]

Excel Pivot Tables
- Accomplish the cube concept
- Aggregate your information
- Show a new perspective
- http://www.timeatlas.com/5_minute_tips/chunkers/learn_to_use_pivot_tables_in_excel_2007_to_organize_data

Excel Pivot Table – Example – p. 1
- Open CreditCardPromotion.xlsx
- Copy the original data to a new worksheet
  - This preserves the original data
- Remove any blank columns or rows
- Each column must have a heading
- Cells should be properly formatted for the data type
- Highlight the data

Excel Pivot Table – Example – p. 2
- Click Insert
- Select Pivot Table
  - Select Pivot Table again to open the Create Pivot Table dialog box
- Under Table/Range, make sure the correct range is selected
- Select the New Worksheet button
- Click OK

Excel Pivot Table – Example – p. 3
- Select Income Range for the row labels
- Select Income Range for the values
- Click on Count of Income Range
- Go to Value Field Settings
- Choose the % of column setting
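The % of column setting divides each income range's count by the column total. The sketch below reproduces that calculation in Python using only the ten-row sample from the Sample of Credit Card Promotion Data slide; CreditCardPromotion.xlsx may contain more rows, in which case Excel's percentages will differ.

```python
from collections import Counter

# Income Range column of the ten-row sample, in table order.
incomes = ["40-50K", "30-40K", "40-50K", "30-40K", "50-60K",
           "20-30K", "30-40K", "20-30K", "30-40K", "30-40K"]

counts = Counter(incomes)      # count of each income range
total = sum(counts.values())   # column total (10 rows)
percent_of_column = {r: 100 * counts[r] / total for r in sorted(counts)}
print(percent_of_column)
# {'20-30K': 20.0, '30-40K': 50.0, '40-50K': 20.0, '50-60K': 10.0}
```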

Excel Pivot Table – Example – p. 4
- Highlight the percentages
- Select Insert, then Pie Chart

[Figure: pie chart of the percentages for the income ranges 20-30K, 30-40K, 40-50K, and 50-60K]

Excel Pivot Table – Example – p. 5
- Check out the drill-down functionality
  - Double-click the % value for a particular income range in the pivot table
  - The detail values for that income range are displayed in a new worksheet
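Drilling down from a summary value to its detail rows amounts to filtering the source records on the chosen income range. A sketch on the ten-row sample; the Customer field names are hypothetical labels for the table's columns.

```python
from collections import namedtuple

# Hypothetical field names for the sample table's columns.
Customer = namedtuple(
    "Customer",
    ["income_range", "magazine_promo", "watch_promo",
     "life_ins_promo", "cc_ins", "sex", "age"],
)

# The ten-row sample from the Sample of Credit Card Promotion Data slide.
rows = [
    Customer("40-50K", "Yes", "No", "No", "No", "Male", 45),
    Customer("30-40K", "Yes", "Yes", "Yes", "No", "Female", 40),
    Customer("40-50K", "No", "No", "No", "No", "Male", 42),
    Customer("30-40K", "Yes", "Yes", "Yes", "Yes", "Male", 43),
    Customer("50-60K", "Yes", "No", "Yes", "No", "Female", 38),
    Customer("20-30K", "No", "No", "No", "No", "Female", 55),
    Customer("30-40K", "Yes", "No", "Yes", "Yes", "Male", 35),
    Customer("20-30K", "No", "Yes", "No", "No", "Male", 27),
    Customer("30-40K", "Yes", "No", "No", "No", "Male", 43),
    Customer("30-40K", "Yes", "Yes", "Yes", "No", "Female", 41),
]


def drill_down(records, income_range):
    """Return the detail records behind one income range's summary value."""
    return [r for r in records if r.income_range == income_range]


for record in drill_down(rows, "20-30K"):
    print(record)
```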

Continue with Page 205
- Creating a Multidimensional Pivot Table
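A multidimensional pivot table places one attribute on the rows and another on the columns. The dimensions used on page 205 are not shown here, so this sketch assumes Income Range by Sex, counting customers in the ten-row sample.

```python
from collections import Counter

# (Income Range, Sex) for each of the ten sample rows, in table order.
pairs = [
    ("40-50K", "Male"), ("30-40K", "Female"), ("40-50K", "Male"),
    ("30-40K", "Male"), ("50-60K", "Female"), ("20-30K", "Female"),
    ("30-40K", "Male"), ("20-30K", "Male"), ("30-40K", "Male"),
    ("30-40K", "Female"),
]

counts = Counter(pairs)  # missing combinations count as 0
income_ranges = sorted({income for income, _ in pairs})
sexes = sorted({sex for _, sex in pairs})

# Rows = income range, columns = sex, cells = number of customers.
table = {r: {s: counts[(r, s)] for s in sexes} for r in income_ranges}

print("Income".ljust(8), *sexes)
for r in income_ranges:
    print(r.ljust(8), *(table[r][s] for s in sexes))
```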
