Data Warehousing
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Cubes Documentation Release 1.0.1
Cubes Documentation Release 1.0.1 Stefan Urbanek April 07, 2015 Contents 1 Getting Started 3 1.1 Introduction.............................................3 1.2 Installation..............................................5 1.3 Tutorial................................................6 1.4 Credits................................................9 2 Data Modeling 11 2.1 Logical Model and Metadata..................................... 11 2.2 Schemas and Models......................................... 25 2.3 Localization............................................. 38 3 Aggregation, Slicing and Dicing 41 3.1 Slicing and Dicing.......................................... 41 3.2 Data Formatters........................................... 45 4 Analytical Workspace 47 4.1 Analytical Workspace........................................ 47 4.2 Authorization and Authentication.................................. 49 4.3 Configuration............................................. 50 5 Slicer Server and Tool 57 5.1 OLAP Server............................................. 57 5.2 Server Deployment.......................................... 70 5.3 slicer - Command Line Tool..................................... 71 6 Backends 77 6.1 SQL Backend............................................. 77 6.2 MongoDB Backend......................................... 89 6.3 Google Analytics Backend...................................... 90 6.4 Mixpanel Backend.......................................... 92 6.5 Slicer Server............................................. 94 7 Recipes 97 7.1 Recipes............................................... -
EMC Greenplum Data Computing Appliance: High Performance for Data Warehousing and Business Intelligence an Architectural Overview
EMC Greenplum Data Computing Appliance: High Performance for Data Warehousing and Business Intelligence An Architectural Overview EMC Information Infrastructure Solutions Abstract This white paper provides readers with an overall understanding of the EMC® Greenplum® Data Computing Appliance (DCA) architecture and its performance levels. It also describes how the DCA provides a scalable end-to- end data warehouse solution packaged within a manageable, self-contained data warehouse appliance that can be easily integrated into an existing data center. October 2010 Copyright © 2010 EMC Corporation. All rights reserved. EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice. THE INFORMATION IN THIS PUBLICATION IS PROVIDED “AS IS.” EMC CORPORATION MAKES NO REPRESENTATIONS OR WARRANTIES OF ANY KIND WITH RESPECT TO THE INFORMATION IN THIS PUBLICATION, AND SPECIFICALLY DISCLAIMS IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Use, copying, and distribution of any EMC software described in this publication requires an applicable software license. For the most up-to-date listing of EMC product names, see EMC Corporation Trademarks on EMC.com All other trademarks used herein are the property of their respective owners. Part number: H8039 EMC Greenplum Data Computing Appliance: High Performance for Data Warehousing and Business Intelligence—An Architectural Overview 2 Table of Contents Executive summary ................................................................................................. -
(BI) Using MS Excel Powerpivot
2018 ASCUE Proceedings Developing an Introductory Class in Business Intelligence (BI) Using MS Excel Powerpivot Dr. Sam Hijazi Trevor Curtis Texas Lutheran University 1000 West Court Street Seguin, Texas 78130 [email protected] Abstract Asking questions about your data is a constant application of all business organizations. To facilitate decision making and improve business performance, a business intelligence application must be an in- tegral part of everyday management practices. Microsoft Excel added PowerPivot and PowerPivot offi- cially to facilitate this process with minimum cost, knowing that many business people are already fa- miliar with MS Excel. This paper will design an introductory class to business intelligence (BI) using Excel PowerPivot. If an educator decides to adopt this paper for teaching an introductory BI class, students should have previ- ous familiarity with Excel’s functions and formulas. This paper will focus on four significant phases all students need to complete in a three-credit class. First, students must understand the process of achiev- ing small database normalization and how to bring these tables to Excel or develop them directly within Excel PowerPivot. This paper will walk the reader through these steps to complete the task of creating the normalization, along with the linking and bringing the tables and their relationships to excel. Sec- ond, an introduction to Data Analysis Expression (DAX) will be discussed. Introduction It is not that difficult to realize the increase in the amount of data we have generated in the recent memory of our existence as a human race. To realize that more than 90% of the world’s data has been amassed in the past two years alone (Vidas M.) is to realize the need to manage such volume. -
Calculated Field in Pivot Table Data Model
Calculated Field In Pivot Table Data Model Frostbitten and unjaundiced Eddie always counteracts d'accord and cowl his tana. New-fashioned and goniometrically,anarchical Ronny however never swop potentiometric his belittlement! Torre enunciatedRevolved Gordan harmonically tunneling or beat-up. or gimlets some doxologies In regular Pivot Tables, you can group numeric, data or text fields. Product of Reliable Bioreactors on Site. Here are exclusive to model in pivot calculated field table data model that data model and used when creating pivot. Power Query, Data model, DAX, Filters, Slicers, Conditional formats and beautiful charts. Eg if you are counting customers that have purchased and have years on rows. Why is this last part important? Depending on the source of data, relationships may or may not be created when the model is initially set up. This data is provided by Microsoft for informational purposes only as an aid to illustrate a concept. To use and limitations and share some limitations of calculated field in pivot table data model. Yeah, good points Derek. Date field, and use it to show a count of orders. Ins menu in the model in pivot calculated field list table that i mentioned earlier, we shall not. Please start a new test to continue. Displays all of the values in each column or series as a percentage of the total for the column or series. This is used to present users with ads that are relevant to them according to the user profile. Note: use the Insert Item button to quickly insert items when you type a formula. -
Business Intelligence and Column-Oriented Databases
Central____________________________________________________________________________________________________ European Conference on Information and Intelligent Systems Page 12 of 344 Business Intelligence and Column-Oriented Databases Kornelije Rabuzin Nikola Modrušan Faculty of Organization and Informatics NTH Mobile, University of Zagreb Međimurska 28, 42000 Varaždin, Croatia Pavlinska 2, 42000 Varaždin, Croatia [email protected] [email protected] Abstract. In recent years, NoSQL databases are popular document-oriented database systems is becoming more and more popular. We distinguish MongoDB. several different types of such databases and column- oriented databases are very important in this context, for sure. The purpose of this paper is to see how column-oriented databases can be used for data warehousing purposes and what the benefits of such an approach are. HBase as a data management Figure 1. JSON object [15] system is used to store the data warehouse in a column-oriented format. Furthermore, we discuss Graph databases, on the other hand, rely on some how star schema can be modelled in HBase. segment of the graph theory. They are good to Moreover, we test the performances that such a represent nodes (entities) and relationships among solution can provide and we compare them to them. This is especially suitable to analyze social relational database management system Microsoft networks and some other scenarios. SQL Server. Key value databases are important as well for a certain key you store (assign) a certain value. Keywords. Business Intelligence, Data Warehouse, Document-oriented databases can be treated as key Column-Oriented Database, Big Data, NoSQL value as long as you know the document id. Here we skip the details as it would take too much time to discuss different systems [21]. -
Chapter 7 Multi Dimensional Data Modeling
Chapter 7 Multi Dimensional Data Modeling Fundamentals of Business Analytics” Content of this presentation has been taken from Book “Fundamentals of Business Analytics” RN Prasad and Seema Acharya Published by Wiley India Pvt. Ltd. and it will always be the copyright of the authors of the book and publisher only. Basis • You are already familiar with the concepts relating to basics of RDBMS, OLTP, and OLAP, role of ERP in the enterprise as well as “enterprise production environment” for IT deployment. In the previous lectures, you have been explained the concepts - Types of Digital Data, Introduction to OLTP and OLAP, Business Intelligence Basics, and Data Integration . With this background, now its time to move ahead to think about “how data is modelled”. • Just like a circuit diagram is to an electrical engineer, • an assembly diagram is to a mechanical Engineer, and • a blueprint of a building is to a civil engineer • So is the data models/data diagrams for a data architect. • But is “data modelling” only the responsibility of a data architect? The answer is Business Intelligence (BI) application developer today is involved in designing, developing, deploying, supporting, and optimizing storage in the form of data warehouse/data marts. • To be able to play his/her role efficiently, the BI application developer relies heavily on data models/data diagrams to understand the schema structure, the data, the relationships between data, etc. In this lecture, we will learn • About basics of data modelling • How to go about designing a data model at the conceptual and logical levels? • Pros and Cons of the popular modelling techniques such as ER modelling and dimensional modelling Case Study – “TenToTen Retail Stores” • A new range of cosmetic products has been introduced by a leading brand, which TenToTen wants to sell through its various outlets. -
Sharing Files with Microsoft Office Users
Sharing Files with Microsoft Office Users Title: Sharing Files with Microsoft Office Users: Version: 1.0 First edition: November 2004 Contents Overview.........................................................................................................................................iv Copyright and trademark information........................................................................................iv Feedback.................................................................................................................................... iv Acknowledgments......................................................................................................................iv Modifications and updates......................................................................................................... iv File formats...................................................................................................................................... 1 Bulk conversion............................................................................................................................... 1 Opening files....................................................................................................................................2 Opening text format files.............................................................................................................2 Opening spreadsheets..................................................................................................................2 Opening presentations.................................................................................................................2 -
Microsoft's SQL Server Parallel Data Warehouse Provides
Microsoft’s SQL Server Parallel Data Warehouse Provides High Performance and Great Value Published by: Value Prism Consulting Sponsored by: Microsoft Corporation Publish date: March 2013 Abstract: Data Warehouse appliances may be difficult to compare and contrast, given that the different preconfigured solutions come with a variety of available storage and other specifications. Value Prism Consulting, a management consulting firm, was engaged by Microsoft® Corporation to review several data warehouse solutions, to compare and contrast several offerings from a variety of vendors. The firm compared each appliance based on publicly-available price and specification data, and on a price-to-performance scale, Microsoft’s SQL Server 2012 Parallel Data Warehouse is the most cost-effective appliance. Disclaimer Every organization has unique considerations for economic analysis, and significant business investments should undergo a rigorous economic justification to comprehensively identify the full business impact of those investments. This analysis report is for informational purposes only. VALUE PRISM CONSULTING MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS DOCUMENT. ©2013 Value Prism Consulting, LLC. All rights reserved. Product names, logos, brands, and other trademarks featured or referred to within this report are the property of their respective trademark holders in the United States and/or other countries. Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this report may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Microsoft Corporation. -
Basically Speaking, Inmon Professes the Snowflake Schema While Kimball Relies on the Star Schema
What is the main difference between Inmon and Kimball? Basically speaking, Inmon professes the Snowflake Schema while Kimball relies on the Star Schema. According to Ralf Kimball… Kimball views data warehousing as a constituency of data marts. Data marts are focused on delivering business objectives for departments in the organization. And the data warehouse is a conformed dimension of the data marts. Hence a unified view of the enterprise can be obtained from the dimension modeling on a local departmental level. He follows Bottom-up approach i.e. first creates individual Data Marts from the existing sources and then Create Data Warehouse. KIMBALL – First Data Marts – Combined way – Data warehouse. According to Bill Inmon… Inmon beliefs in creating a data warehouse on a subject-by-subject area basis. Hence the development of the data warehouse can start with data from their needs arise. Point-of-sale (POS) data can be added later if management decides it is necessary. He follows Top-down approach i.e. first creates Data Warehouse from the existing sources and then create individual Data Marts. INMON – First Data warehouse – Later – Data Marts. The Main difference is: Kimball: follows Dimensional Modeling. Inmon: follows ER Modeling bye Mayee. Kimball: creating data marts first then combining them up to form a data warehouse. Inmon: creating data warehouse then data marts. What is difference between Views and Materialized Views? Views: •• Stores the SQL statement in the database and let you use it as a table. Every time you access the view, the SQL statement executes. •• This is PSEUDO table that is not stored in the database and it is just a query. -
Procurement and Spend Fact and Dimension Modeling
Procurement and Spend Dimension and Fact Job Aid Contents Introduction .................................................................................................................................................. 2 Procurement and Spend – Purchase Orders Subject Area: ........................................................................ 10 Procurement and Spend – Purchase Requisitions Subject Area:................................................................ 14 Procurement and Spend – Purchase Receipts Subject Area:...................................................................... 17 1 Procurement and Spend Dimension and Fact Job Aid Introduction The purpose of this job aid is to provide an explanation of dimensional data modeling and of using dimensions and facts to build analyses within the Procurement and Spend Subject Areas. Dimensional Data Model The dimensional model is comprised of a fact table and many dimensional tables and used for calculating summarized data. Since Business Intelligence reports used in measuring the facts (aggregates) across various dimensions, dimensional data modeling is the preferred modeling technique in a BI environment. STARS - OBI data model based on Dimensional Modeling. The underlying database tables separated as Fact Tables and Dimension Tables. The dimension tables joined to fact tables with specific keys. This data model usually called Star Schema. The star schema separates business process data into facts, which hold the measurable, quantitative data about the business, and dimensions -
CDW: a Conceptual Overview 2017
CDW: A Conceptual Overview 2017 by Margaret Gonsoulin, PhD March 29, 2017 Thanks to: • Richard Pham, BISL/CDW for his mentorship • Heidi Scheuter and Hira Khan for organizing this session 3 Poll #1: Your CDW experience • How would you describe your level of experience with CDW data? ▫ 1- Not worked with it at all ▫ 2 ▫ 3 ▫ 4 ▫ 5- Very experienced with CDW data Agenda for Today • Get to the bottom of all of those acronyms! • Learn to think in “relational data” terms • Become familiar with the components of CDW ▫ Production and Raw Domains ▫ Fact and Dimension tables/views • Understand how to create an analytic dataset ▫ Primary and Foreign Keys ▫ Joining tables/views Agenda for Today • Get to the bottom of all of those acronyms! • Learn to think in “relational data” terms • Become familiar with the components of CDW ▫ Production and Raw Domains ▫ Fact and Dimension tables/views • Creating an analytic dataset ▫ Primary and Foreign Keys ▫ Joining tables/views “C”DW, “R”DW & “V”DW • Users will see documentation referring to xDW. • The “x” is a variable waiting to be filled in with either: ▫ “V” for VISN, ▫ “R” for region or ▫ “C” for corporate (meaning “national VHA”) • Each organizational level of the VA has its own data warehouse focusing on its own population. • This talk focuses on CDW only. 7 Flow of data into the warehouse VistA = Veterans Health Information Systems and Technology Architecture C“DW” • The “DW” in CDW stands for “Data Warehouse.” • Data Warehouse = a data delivery system intended to give users the information they need to support their business decisions. -
Fast Foreign-Key Detection in Microsoft SQL
Fast Foreign-Key Detection in Microsoft SQL Server PowerPivot for Excel Zhimin Chen Vivek Narasayya Surajit Chaudhuri Microsoft Research Microsoft Research Microsoft Research [email protected] [email protected] [email protected] ABSTRACT stored in a relational database, which they can import into Excel. Microsoft SQL Server PowerPivot for Excel, or PowerPivot for Other sources of data are text files, web data feeds or in general any short, is an in-memory business intelligence (BI) engine that tabular data range imported into Excel. enables Excel users to interactively create pivot tables over large data sets imported from sources such as relational databases, text files and web data feeds. Unlike traditional pivot tables in Excel that are defined on a single table, PowerPivot allows analysis over multiple tables connected via foreign-key joins. In many cases however, these foreign-key relationships are not known a priori, and information workers are often not be sophisticated enough to define these relationships. Therefore, the ability to automatically discover foreign-key relationships in PowerPivot is valuable, if not essential. The key challenge is to perform this detection interactively and with high precision even when data sets scale to hundreds of millions of rows and the schema contains tens of tables and hundreds of columns. In this paper, we describe techniques for fast foreign-key detection in PowerPivot and experimentally evaluate its accuracy, performance and scale on both synthetic benchmarks and real-world data sets. These techniques have been incorporated into PowerPivot for Excel. Figure 1. Example of pivot table in Excel. It enables multi- dimensional analysis over a single table.