Basically Speaking, Inmon Professes the Snowflake Schema While Kimball Relies on the Star Schema
Total Page:16
File Type:pdf, Size:1020Kb
Recommended publications
Chapter 7 Multi Dimensional Data Modeling
Chapter 7 Multi Dimensional Data Modeling, “Fundamentals of Business Analytics”
Content of this presentation has been taken from the book “Fundamentals of Business Analytics” by RN Prasad and Seema Acharya, published by Wiley India Pvt. Ltd., and it will always be the copyright of the authors of the book and the publisher only.
Basis
• You are already familiar with the basics of RDBMS, OLTP, and OLAP, the role of ERP in the enterprise, and the “enterprise production environment” for IT deployment. The previous lectures explained the following concepts: Types of Digital Data, Introduction to OLTP and OLAP, Business Intelligence Basics, and Data Integration. With this background, it is now time to move ahead and think about “how data is modelled”.
• Just as a circuit diagram is to an electrical engineer,
• an assembly diagram is to a mechanical engineer, and
• a blueprint of a building is to a civil engineer,
• so are data models/data diagrams to a data architect.
• But is “data modelling” only the responsibility of a data architect? The answer is no: today's Business Intelligence (BI) application developer is also involved in designing, developing, deploying, supporting, and optimizing storage in the form of data warehouses/data marts.
• To be able to play his/her role efficiently, the BI application developer relies heavily on data models/data diagrams to understand the schema structure, the data, the relationships between data, etc.
In this lecture, we will learn
• About the basics of data modelling
• How to go about designing a data model at the conceptual and logical levels
• Pros and cons of the popular modelling techniques such as ER modelling and dimensional modelling
Case Study – “TenToTen Retail Stores”
• A new range of cosmetic products has been introduced by a leading brand, which TenToTen wants to sell through its various outlets.
Table of Contents
The Kimball Group Reader: Relentlessly Practical Tools for Data Warehousing and Business Intelligence, Remastered Collection
Ralph Kimball and Margy Ross with Bob Becker, Joy Mundy, and Warren Thornthwaite
Contents
Introduction
1 The Reader at a Glance
Setting Up for Success
1.1 Resist the Urge to Start Coding
1.2 Set Your Boundaries
Tackling DW/BI Design and Development
1.3 Data Wrangling
1.4 Myth Busters
1.5 Dividing the World
1.6 Essential Steps for the Integrated Enterprise Data Warehouse
1.7 Drill Down to Ask Why
1.8 Slowly Changing Dimensions
1.9 Judge Your BI Tool through Your Dimensions
1.10 Fact Tables
1.11 Exploit Your Fact Tables
2 Before You Dive In
Before Data Warehousing
2.1 History Lesson on Ralph Kimball and Xerox PARC
Historical Perspective
2.2 The Database Market Splits
2.3 Bringing Up Supermarts
Dealing with Demanding Realities
2.4 Brave New Requirements for Data Warehousing
2.5 Coping with the Brave New Requirements
2.6 Stirring Things Up
2.7 Design Constraints and Unavoidable Realities
2.8 Two Powerful Ideas
2.9 Data Warehouse Dining Experience
2.10 Easier Approaches for Harder Problems
2.11 Expanding Boundaries of the Data Warehouse
3 Project/Program Planning
Professional Responsibilities
3.1 Professional Boundaries
3.2 An Engineer’s View
3.3 Beware the Objection Removers
MASTER's THESIS Role of Metadata in the Datawarehousing Environment
2006:24 MASTER'S THESIS
Role of Metadata in the Datawarehousing Environment
Kranthi Kumar Parankusham, Ravinder Reddy Madupu
Luleå University of Technology
Master Thesis, Continuation Courses, Computer and Systems Science
Department of Business Administration and Social Sciences, Division of Information Systems Sciences
2006:24 - ISSN: 1653-0187 - ISRN: LTU-PB-EX--06/24--SE
Preface
This study was performed as part of the master’s programme in computer and systems sciences during 2005-06. It has been a very useful and valuable experience, and we have learned a lot during the study, not only about the topic at hand but also about how to manage the work in the specified time. However, this workload would not have been manageable if we had not received help and support from a number of people whom we would like to mention. First of all, we would like to thank our professor Svante Edzen for his help and supervision during the writing of the thesis. We would also like to express our gratitude to all the employees who allocated their valuable time to share their professional experience. On a personal level, Kranthi would like to thank all his family for their help, and especially his friends Kiran, Chenna Reddy, and Deepak Kumar. Ravi would like to give the greatest of thanks to his family for always being there when needed and for constantly taking such extremely good care of him. Also, thanks to all his friends for being close to him.
Luleå University of Technology, 31 January 2006
Kranthi Kumar Parankusham, Ravinder Reddy Madupu
Abstract
In order for a well-functioning data warehouse to succeed, many components must work together.
Kimball Vs. Inmon
© 2011 - Andy Hogg
Kimball vs. Inmon
"Neither are any wars so furious and bloody, or of so long continuance as those occasioned by difference in opinion, especially if it be in things indifferent." (Swift, 1726)
In the world of the data warehouse (DW) there are two dominant and opposing dogmas. Zealots of both extol the virtues of their chosen doctrine with religious fervour, whilst decrying the beliefs of the other. These doctrines have existed for years, and in that time innumerable DWs have been built upon the principles of William Inmon. Likewise, an incalculable number have been built upon the ideas of Ralph Kimball.
Inmon’s Corporate Information Factory (CIF) is a top-down approach. Since the whole DW is built in advance of usage, it requires significant time to deliver value. It therefore requires unwavering sponsorship from a senior figure within the organisation, possessing long-term vision of the DW’s value.
Commentators contrast Kimball’s Bus Architecture (BA) as a bottom-up approach, where the data marts (DMs) are built first and unified into a DW at the end of the process. Inmon (n.d.a) ridicules this: “…in bottom up data warehouse development first one data mart is developed, then another data mart is developed, then one day - presto - you magically and effortlessly wake up and have a data warehouse”. Kimball (2003) dislikes the bottom-up description: “Bottom-up is typically viewed as quick and dirty – focused on the needs of a single department rather than the enterprise”. He maintains the BA is a holistic view of the enterprise, with a final overall structure planned from the outset.
Kimball Toolkit Data Modeling Spreadsheet
Kimball Toolkit Data Modeling Spreadsheet Unscheduled Jethro overshadow no ceramicist plims nowhence after Yule jousts deceitfully, quite hypothyroidism. When Sterne apotheosizes his nomism hepatizes not anamnestically enough, is Obadiah away? Shawn enlighten his Louisiana rejoin cattishly, but chemurgic Arvy never escrow so randomly. Successful data access more complicated to the spreadsheet that features and kimball toolkit data modeling spreadsheet as degenerate dimension table with patient outcomes. Dimensions applicable to easily impressed by every large data warehouse manager's job, such complexities of evidence, their person or even with spreadsheet and kimball toolkit data modeling spreadsheet. The conglomeration of two hybrid approaches required of triage to address information from multiple inputs to conduct additional items as modeling spreadsheet is responsible employee profile that is done. Which data warehouse project and report revenue, and costs for product acquisition and associated with snowflaked outriggers will require a kimball toolkit data modeling spreadsheet that several. Data modeling in kimball toolkit any kimball toolkit data modeling spreadsheet contains rows from kimball model withstands unexpected changes in? All over time, kimball model also conduct additional interviews are modeling spreadsheet that can drill down. Atomic transaction data is the most naturally dimensional data, such as purchase behavior, carefully selected from the vast universe of possible data sources in your organization. We always should be labeled to kimball toolkit data modeling spreadsheet can be overcome this spreadsheet to kimball toolkit. The kimball toolkit books, or changes to bring copies of kimball toolkit data modeling spreadsheet can now assume that the hands on the OLTP use in the LDAP server allows. Equivalent to a database field.
Educational Open Government Data: from Requirements to End Users
Educational Open Government Data: from requirements to end users
Rudolf Eckelberg, Vytor Bezerra Calixto, Marina Hoshiba Pimentel, Marcos Didonet Del Fabro, Marcos Sunyé, Leticia M. Peres, Eduardo Todt, Thiago Alves, Adriana Dragone, and Gabriela Schneider
C3SL and NuPE Labs, Federal University of Paraná, Curitiba, Brazil
{rce16,vsbc14,marina,marcos.ddf,sunye,lmperes,todt}@inf.ufpr.br, [email protected],[email protected],[email protected]
Abstract. The large availability of open government data raises enormous opportunities for open big data analytics. However, providing an end-to-end framework able to handle tasks from data extraction and processing to a web interface involves many challenges. One critical factor is the existence of many players with different knowledge who need to interact, such as application domain experts, database designers, and web developers. This represents a knowledge gap that is difficult to overcome. In this paper, we present a case study for big data analytics over Brazilian educational data, with more than 1 billion records. We show how we organized the data analytics phase, starting from the analytics requirements, data evolution, development and deployment in a public interface.
Keywords: Open Government Data; Analytics API; data evolution
1 Introduction
The large availability of Open Governmental Data raises enormous opportunities for open big data analytics. Opportunities are always followed by challenges, and when one handles Big Data, difficulties lie in data capture, storage, searching, sharing, analysis, and visualization [2]. Providing useful Open Data would need a complete team, from the domain expert up to the web developer, so often it cannot be handled only by a data scientist, for example.
Lecture @DHBW: Data Warehouse Part I: Introduction to DWH and DWH Architecture, Andreas Buckenhofer, Daimler TSS
A company of Daimler AG
LECTURE @DHBW: DATA WAREHOUSE
PART I: INTRODUCTION TO DWH AND DWH ARCHITECTURE
ANDREAS BUCKENHOFER, DAIMLER TSS
ABOUT ME
Andreas Buckenhofer, Senior DB Professional
Since 2009 at Daimler TSS. Department: Big Data. Business Unit: Analytics.
[email protected]
https://de.linkedin.com/in/buckenhofer
https://twitter.com/ABuckenhofer
https://www.doag.org/de/themen/datenbank/in-memory/
http://wwwlehre.dhbw-stuttgart.de/~buckenhofer/
https://www.xing.com/profile/Andreas_Buckenhofer2
NOT JUST AVERAGE: OUTSTANDING.
As a 100% Daimler subsidiary, we give 100 percent, always and never less. We love IT and pull out all the stops to aid Daimler's development with our expertise on its journey into the future. Our objective: We make Daimler the most innovative and digital mobility company.
Daimler TSS
INTERNAL IT PARTNER FOR DAIMLER
+ Holistic solutions according to the Daimler guidelines
+ IT strategy
+ Security
+ Architecture
+ Developing and securing know-how
+ TSS is a partner who can be trusted with sensitive data
As subsidiary: maximum added value for Daimler
+ Market closeness
+ Independence
+ Flexibility (short decision making process, ability to react quickly)
LOCATIONS
Daimler TSS Germany: 7 locations, 1000 employees* (Ulm (Headquarters), Stuttgart, Berlin, Karlsruhe)
Daimler TSS China: Hub Beijing, 10 employees
Daimler TSS Malaysia: Hub Kuala Lumpur, 42 employees
Daimler TSS India: Hub Bangalore, 22 employees
* as of August 2017
DWH, BIG DATA, DATA MINING
This lecture is
Overview of the Corporate Information Factory and Dimensional Modeling
Overview of the Corporate Information Factory and Dimensional Modeling
Copyright (c) 2010 BIS3 LLC. All rights reserved.
Data Warehouse Architect: Rosendo Abellera
• President, BIS3
◦ Nearly 2 decades software and system development
◦ 12 years in DW and BI space
◦ 25+ years of data and intelligence/analytics
Other Notable Data Projects:
• Accenture
• Toshiba
• National Security Agency (NSA)
• US Air Force
A data warehouse is a repository of an organization's electronically stored data designed to facilitate reporting and analysis. It is subject-oriented, non-volatile, integrated, and time-variant. Reference: Wikipedia
Terms: 3rd Normal Form; Bill Inmon; Data Mart; Corporate Information Factory; Enterprise Data Warehouse; Dimensional Modeling; Hub and Spoke; Operational Data Store; Ralph Kimball; Slowly Changing Dimensions; Snowflake; Star Schema
• Corporate Information Factory
1. Top down
2. Data normalized to 3rd Normal Form
3. Enterprise data warehouse spawns data marts
• Dimensional Modeling
1. Bottom up
2. Data denormalized to form star schema
3. Data marts conform to develop the enterprise data warehouse
• Focus
◦ Single repository of enterprise data
◦ Framework for Decision Support Systems (DSS)
• Specifics
◦ Create specific structures for distinct purpose
◦ Model data in 3rd Normal Form
◦ As a Hub and Spoke Approach, create data marts as subsets of data warehouse as needed
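The contrast drawn in this excerpt between data normalized to 3rd Normal Form and data denormalized into a star schema can be made concrete with a small sketch. The example below is only an illustration under stated assumptions: the table names, column names, and sample rows are hypothetical and do not come from the slides, and SQLite (via Python's standard `sqlite3` module) merely stands in for a warehouse database.

```python
import sqlite3

# Hypothetical schemas and data, for illustration only.
SCHEMA = """
-- Star schema: the category is stored inline in the product dimension.
CREATE TABLE dim_product (
    product_key   INTEGER PRIMARY KEY,
    product_name  TEXT,
    category_name TEXT
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    amount      REAL
);
-- Normalized (3NF-style) layout: the category lives in its own table.
CREATE TABLE category (
    category_id   INTEGER PRIMARY KEY,
    category_name TEXT
);
CREATE TABLE product (
    product_id   INTEGER PRIMARY KEY,
    product_name TEXT,
    category_id  INTEGER REFERENCES category(category_id)
);
CREATE TABLE sales (
    product_id INTEGER REFERENCES product(product_id),
    amount     REAL
);
"""


def build_demo():
    """Create an in-memory database holding the same sales in both layouts."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Shampoo", "Hair"), (2, "Conditioner", "Hair"), (3, "Lotion", "Skin")],
    )
    conn.executemany("INSERT INTO category VALUES (?, ?)", [(1, "Hair"), (2, "Skin")])
    conn.executemany(
        "INSERT INTO product VALUES (?, ?, ?)",
        [(1, "Shampoo", 1), (2, "Conditioner", 1), (3, "Lotion", 2)],
    )
    sales_rows = [(1, 10.0), (2, 20.0), (3, 12.5)]
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", sales_rows)
    conn.executemany("INSERT INTO sales VALUES (?, ?)", sales_rows)
    return conn


def star_totals(conn):
    """Sales per category from the star schema: one join from fact to dimension."""
    return conn.execute(
        "SELECT d.category_name, SUM(f.amount) FROM fact_sales f "
        "JOIN dim_product d ON d.product_key = f.product_key "
        "GROUP BY d.category_name ORDER BY d.category_name"
    ).fetchall()


def normalized_totals(conn):
    """The same question against the normalized layout needs two joins."""
    return conn.execute(
        "SELECT c.category_name, SUM(s.amount) FROM sales s "
        "JOIN product p ON p.product_id = s.product_id "
        "JOIN category c ON c.category_id = p.category_id "
        "GROUP BY c.category_name ORDER BY c.category_name"
    ).fetchall()


if __name__ == "__main__":
    demo = build_demo()
    print(star_totals(demo))
    print(normalized_totals(demo))
```

Both queries return the same totals. The difference is where the redundancy sits: the star layout repeats the category name on every product row so that reporting queries need fewer joins, while the normalized layout stores each category once at the cost of an extra join.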
The Data Warehouse ETL Toolkit
P1: FCH/SPH P2: FCH/SPH QC: FCH/SPH T1: FCH WY046-FM WY046-Kimball-v4.cls August 18, 2004 13:42
The Data Warehouse ETL Toolkit
Practical Techniques for Extracting, Cleaning, Conforming, and Delivering Data
Ralph Kimball, Joe Caserta
Wiley Publishing, Inc.
Published by Wiley Publishing, Inc., 10475 Crosspoint Boulevard, Indianapolis, IN 46256, www.wiley.com
Copyright © 2004 by Wiley Publishing, Inc. All rights reserved.
Published simultaneously in Canada
eISBN: 0-764-57923-1
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600. Requests to the Publisher for permission should be addressed to the Legal Department, Wiley Publishing, Inc., 10475 Crosspoint Blvd., Indianapolis, IN 46256, (317) 572-3447, fax (317) 572-4355, e-mail: [email protected].
Building Cubes with Mapreduce
Building Cubes with MapReduce
Alberto Abelló, Universitat Politècnica de Catalunya, BarcelonaTech, [email protected]
Jaume Ferrarons, Universitat Politècnica de Catalunya, BarcelonaTech, [email protected]
Oscar Romero, Universitat Politècnica de Catalunya, BarcelonaTech, [email protected]
ABSTRACT
In the last years, the problems of using generic storage techniques for very specific applications has been detected and outlined. Thus, some alternatives to relational DBMSs (e.g., BigTable) are blooming. On the other hand, cloud computing is already a reality that helps to save money by eliminating the hardware as well as software fixed costs and just pay per use. Indeed, specific software tools to exploit a cloud are also here. The trend in this case is toward using tools based on the MapReduce paradigm developed by Google. In this paper, we explore the possibility of having data in a cloud by using BigTable to store the corporate historical data and MapReduce as an agile mechanism to deploy cubes
… network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction". Cloud computing, in general, is a good solution for medium to small companies that cannot afford a huge initial investment in hardware together with an IT department to manage it. With this kind of technologies, they can pay per use, instead of provisioning for peak loads. Thus, only when the company grows up (if at all), so the expenses will. The only problem is that they have to trust their data to third parties. In [1], we find an analysis of pros and cons of data management in a cloud.
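The idea of using MapReduce to deploy cubes, as mentioned in this abstract, can be illustrated with a minimal single-process sketch. This is not the authors' algorithm and involves neither BigTable nor a real MapReduce framework: the function names, the `*` marker for "all values", and the sample records are assumptions made for illustration, and the shuffle step is collapsed into a dictionary grouping.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # hypothetical marker meaning "aggregated over every value"

# Hypothetical sample records; not taken from the paper.
RECORDS = [
    {"region": "north", "year": 2017, "sales": 5.0},
    {"region": "north", "year": 2018, "sales": 7.0},
    {"region": "south", "year": 2017, "sales": 3.0},
]


def map_phase(record, dimensions, measure):
    """Emit one (cell key, measure) pair for every subset of the dimensions."""
    for size in range(len(dimensions) + 1):
        for kept in combinations(dimensions, size):
            # Dimensions that are not kept are rolled up to ALL.
            key = tuple(record[d] if d in kept else ALL for d in dimensions)
            yield key, record[measure]


def reduce_phase(pairs):
    """Group the emitted pairs by cell key and sum the measure per cell."""
    totals = defaultdict(float)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def build_cube(records, dimensions, measure):
    """Run the map phase over all records, then reduce into cube cells."""
    pairs = (
        pair
        for record in records
        for pair in map_phase(record, dimensions, measure)
    )
    return reduce_phase(pairs)


if __name__ == "__main__":
    cube = build_cube(RECORDS, ["region", "year"], "sales")
    for cell, total in sorted(cube.items(), key=str):
        print(cell, total)
```

Each input record is mapped to one key per group-by combination, so the reducer for a key such as `("north", "*")` receives every sale in that region regardless of year. In a distributed setting the number of emitted pairs grows as 2^d for d dimensions, which is the main cost this naive sketch ignores.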
Integration of Data Warehousing and Operations Analytics
16th IT and Business Analytics Teaching Workshop, June 1, 2018
Integration of Data Warehousing and Operations Analytics
Zhen Liu, PhD, Daniel L. Goodwin College of Business, Benedictine University
Email: [email protected]
https://www.linkedin.com/in/zhenliu/
Joint work with Erica Arnold and Daniel Kreuger
Screenshot of WeChat Post
• Me: I am offering two courses this quarter: Database Management Systems and Data Warehousing
• Friend (CS professor): they are CS courses
• Me: well, they are our MSBA core courses
Motivation/Theme
• My understanding of data warehousing
– MSBA provides a unique perspective
– Why are we different from Computer Science
• Justify by examples in Operations Analytics
– Newsvendor problem
– Inventory pooling under fat-tail demands
Business Analytics Domain
Topics
• Introduction to Data Warehousing
• Inventory Management
• Customer Relationship Management (CRM)
• Challenges and Future work
Intro to Data Warehousing
• The term "Data Warehouse" was first coined by Bill Inmon in 1990.
• According to Inmon, a data warehouse is a subject oriented, integrated, time-variant, and non-volatile collection of data.
– This data helps analysts to take informed decisions in an organization.
Key features of a data warehouse
• Subject Oriented − A data warehouse is subject oriented because it provides information around a subject rather than the organization's ongoing operations.
– These subjects can be product, customers, suppliers, sales, revenue, etc.
– A data warehouse does not focus on the ongoing operations, rather it focuses on modelling and analysis of data for decision making.
Data Warehouse
Dr. Vaibhav Sharma
Data warehouse
What is a Data Warehouse? [Barry Devlin] A single, complete and consistent store of data obtained from a variety of different sources, made available to end users in a way they can understand and use in a business context.
Inmon’s definition of Data Warehouse: In 1993, the "father of data warehousing", Bill Inmon, gave this definition of a data warehouse: A data warehouse is a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decision making process.
Data Warehouse Usage:
1. Data warehouses and data marts are used in a wide range of applications.
2. Business executives use the data in data warehouses and data marts to perform data analysis and make strategic decisions.
3. In many areas, data warehouses are used as an integral part of enterprise management.
4. The data warehouse is mainly used for generating reports and answering predefined queries.
5. It is used to analyze summarized and detailed data, where the results are presented in the form of reports and charts.
6. Later, the data warehouse is used for strategic purposes, performing multidimensional analysis and sophisticated operations.
7. Finally, the data warehouse may be employed for knowledge discovery and strategic decision making using data mining tools.
8. In this context, the tools for data warehousing can be categorized into access and retrieval tools, database reporting tools, data analysis tools, and data mining tools.
Reasons for a Data Warehouse: There are a few reasons why a data warehouse should exist:
a) You want to integrate data across functions or systems to provide a complete picture of the data subject e.g.