Data Warehouse – Part 02


Based on Chapter 06, "The Data Warehouse," in Data Mining: A Tutorial-Based Primer by Roiger and Geatz

Data Warehouse Purpose
  • House data for decision support
  • Support organizational decision making so that it can be fact-based instead of ad hoc

Decision Support Categories
  • Reporting
  • Analyzing
  • Knowledge Discovery

Sample of Credit Card Promotion Data (from Table 2.3)

  Income Range   Magazine Promo   Watch Promo   Life Ins Promo   CC Ins   Sex      Age
  40-50K         Yes              No            No               No       Male     45
  30-40K         Yes              Yes           Yes              No       Female   40
  40-50K         No               No            No               No       Male     42
  30-40K         Yes              Yes           Yes              Yes      Male     43
  50-60K         Yes              No            Yes              No       Female   38
  20-30K         No               No            No               No       Female   55
  30-40K         Yes              No            Yes              Yes      Male     35
  20-30K         No               Yes           No               No       Male     27
  30-40K         Yes              No            No               No       Male     43
  30-40K         Yes              Yes           Yes              No       Female   41

Credit Card Purchases and Promotions: Constellation Design

Online Analytical Processing (OLAP)
  • Query-based methodology that supports data analysis
  • OLAP engine structures data as a cube
  • A cube can have more than three dimensions, as the term "cube" is used in business intelligence and data warehousing

Find the Total Sales by Product, by Year, and by Region
  [Figure: sales cube with axes Region (South, Central), Product (Mythic World), and Year (2005)]

Data Cubes
  • http://www.info-source.us/data__g_gwarehousing_mining/Data-Mining-and-Data-Warehousing-in-Biology-Medicine-and-Health-Care.html
  • http://zeesql.wordpress.com/2008/05/21/data-cubes/

Data Cube Characteristics
  • Designed for a specific purpose
  • For four dimensions, visualize multiple cubes with the same three dimensions, where each cube represents a particular value of the fourth dimension
  • Extrapolate to n dimensions
  • Source: http://zeesql.wordpress.com/2008/05/21/data-cubes/

Data Cube Characteristics (continued)
  • Cubes with many empty cells are not as useful
  • Thus, a cube with two time dimensions is not a good design, because the intersection of quarter and month would often be empty
  • Source: http://www.info-source.us/data__g_gwarehousing_mining/Data-Mining-and-Data-Warehousing-in-Biology-Medicine-and-Health-Care.html

Data Store Behind Data Cube
  • Relational: star schema. Advantage: user can view data at the detail level defined by the star schema
  • Multidimensional: arrays. Advantage: query speed

OLAP Interfaces
  • Many are emerging, especially interfaces designed for visual exploration
  • Default interface is a spreadsheet workbook format
  • Useful OLAP functionality:
    • Different views of data
    • Statistical calculations
    • Drill-down and reverse drill-down (roll-up): look at data at a more granular (detail) level or vice versa
  • Short video in right panel demoing an OLAP interface: http://www.softwarefx.com/Extensions/featuresOlap.aspx

Slice
  • A slice is a subset of a multidimensional array corresponding to a single value for one or more members of the dimensions not in the subset.
  • Source: http://www.practicaldb.com/blog/cubes/

Dice
  • The dice operation is a slice on more than two dimensions of a data cube (or more than two consecutive slices)

OLAP Concept Example
  • Credit card purchase data
  • Figure 6.6: A multidimensional cube for credit card purchases
    [Figure axes: Month (Jan. through Dec.), Region (One, Two, Three, Four), Category (Retail, Travel, Vehicle, Restaurant, Supermarket, Miscellaneous)]
  • Highlighted cell: Month = Dec., Category = Vehicle, Region = Two, Amount = 6,720, Count = 110
  • The cell holds the total amount and total number of vehicle purchases in region two for the month of December

Attributes May Be Based on Concept Hierarchy
  • Location

Excel Pivot Tables
  • Accomplish the cube concept
  • Aggregate your information
  • Show a new perspective
  • Source: http://www.timeatlas.com/5_minute_tips/chunkers/learn_to_use_pivot_tables_in_excel_2007_to_organize_data

Excel Pivot Table Example (p. 1)
  • Open CreditCardPromotion.xlsx
  • Copy the original data to a new worksheet in order to preserve the original data
  • Remove any blank columns or rows
  • Each column must have a heading
  • Cells should be properly formatted for the data type
  • Highlight the data

Excel Pivot Table Example (p. 2)
  • Click Insert
  • Select Pivot Table
  • Select Pivot Table to open the Create Pivot Table dialog box
  • Select Table/Range to make sure you selected the correct range
  • Select the New Worksheet button
  • Click OK

Excel Pivot Table Example (p. 3)
  • Select Income Range for row labels
  • Select Income Range for values
  • Click on Count of Income Range
  • Go to Field Settings
  • Choose the "% of column" setting

Excel Pivot Table Example (p. 4)
  • Highlight the percentages
  • Select Insert, Pie Chart
  [Figure: pie chart titled "Total" with legend entries 20-30K, 30-40K, 40-50K, 50-60K]

Excel Pivot Table Example (p. 5)
  • Check out the drill-down functionality
  • Double-click in the pivot table on the % value for a particular income range
  • The detail values for that income range are displayed in a new worksheet

Continue with Page 205: Creating a Multidimensional Pivot Table
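The pivot-table walkthrough and the slice, roll-up, and drill-down ideas from the slides can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the ten sample rows from the credit card promotion table, the Python standard library, and helper names (`pivot_count`, `percent_of_column`, `drill_down`) invented for this example. It is not how Excel or any OLAP engine is implemented.

```python
from collections import Counter

# The ten sample rows from the credit card promotion table in the slides.
# Tuple layout (an assumption for this sketch):
# (income_range, magazine, watch, life_ins, cc_ins, sex, age)
ROWS = [
    ("40-50K", "Yes", "No",  "No",  "No",  "Male",   45),
    ("30-40K", "Yes", "Yes", "Yes", "No",  "Female", 40),
    ("40-50K", "No",  "No",  "No",  "No",  "Male",   42),
    ("30-40K", "Yes", "Yes", "Yes", "Yes", "Male",   43),
    ("50-60K", "Yes", "No",  "Yes", "No",  "Female", 38),
    ("20-30K", "No",  "No",  "No",  "No",  "Female", 55),
    ("30-40K", "Yes", "No",  "Yes", "Yes", "Male",   35),
    ("20-30K", "No",  "Yes", "No",  "No",  "Male",   27),
    ("30-40K", "Yes", "No",  "No",  "No",  "Male",   43),
    ("30-40K", "Yes", "Yes", "Yes", "No",  "Female", 41),
]
INCOME, SEX = 0, 5  # column positions used as dimensions


def pivot_count(rows, *dims):
    """Count rows for each combination of the chosen dimensions (a tiny cube)."""
    return Counter(tuple(row[d] for d in dims) for row in rows)


def percent_of_column(counts):
    """Express each count as a percentage of the grand total."""
    total = sum(counts.values())
    return {key: 100.0 * n / total for key, n in counts.items()}


def drill_down(rows, dim, value):
    """Return the detail rows behind one aggregated cell."""
    return [row for row in rows if row[dim] == value]


# One-dimensional pivot: count of Income Range shown as % of column.
by_income = pivot_count(ROWS, INCOME)
pct = percent_of_column(by_income)
for (income,), share in sorted(pct.items()):
    print(f"{income}: {share:.0f}%")

# Two-dimensional cube: Income Range by Sex.
cube = pivot_count(ROWS, INCOME, SEX)

# Slice: fix one dimension (Sex = Female) and keep the other.
female_slice = {income: n for (income, sex), n in cube.items() if sex == "Female"}

# Roll-up: aggregate away Income Range, leaving totals per Sex.
rollup = Counter()
for (income, sex), n in cube.items():
    rollup[sex] += n

# Drill-down: the detail rows behind the 30-40K cell.
detail = drill_down(ROWS, INCOME, "30-40K")
```

Counting over a tuple of dimension values keeps the same function usable for one, two, or more dimensions, which mirrors the point in the slides that a cube is not limited to three dimensions.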