Data Modeling for Business Intelligence with Microsoft SQL Server: Modeling a Data Warehouse

Total Page:16

File Type:pdf, Size:1020Kb

Data Modeling for Business Intelligence with Microsoft SQL Server: Modeling a Data Warehouse Data Modeling for Business Intelligence with Microsoft SQL Server: Modeling a Data Warehouse By Josh Jones and Eric Johnson CA ERwin TABLE OF CONTENTS INTRODUCTION . 2 WHAT IS BUSINESS INTELLIGENCE . .3 Data Marts Versus Date Warehouses . 3 Cubes . 3 DATA MODELING A WAREHOUSE . 3 UNDERSTANDING THE DATA . 3 BUILDING THE MODEL . 4 Star Schema . 4 Snowflake Schema . 4 Building the Model . 4 BUILDING THE DATA WAREHOUSE . 7 USING SSAS TO ANALYZE THE DATA . .8 CONCLUSION . 8 ABOUT CONSORTION SERVICES . .8 INTRODUCTION For decades, businesses of all shapes and sizes have been storing their historical data. Everything from cash register rolls stored in filing cabinets to web application transactions being stored in relational databases have been sitting quietly, often collecting dust, just waiting for someone to dispose of them. Fortunately, as we’ve transitioned from paper based records to computer based records, we’ve discovered that mining this historical data can yield a veritable gold mine of information that can help guide a business towards future success. Understanding everything from geographical product preferences to age based sales demographics to employee performance can help companies make smart decisions about what they sell, and how they sell it. Today, we can use modern technology such as Microsoft SQL Server to run both the transactional databases that support our mission critical applications as well as the analytical databases that can help drive our businesses forward. In this series, “Data Modeling for Business Intelligence with Microsoft SQL Server”, we’ll look at how to use traditional data modeling techniques to build a data model for a data warehouse, as well as how to implement a data warehouses and their accompanying processing loads. We’ll also look at CA’s ERWin Data Modeler to expedite and improve upon traditional design methodology. 2 WHAT IS BUSINESS INTELLIGENCE? related to a small subset of the business, such as a department Business Intelligence (BI), the term that represents gathering, or a specific product. Conversely, a data warehouse tends to analyzing, and presenting business data, has evolved to refer support the entire business, or large sections of the business, to the specific set of technologies and business processes depending on the size of the company. So, you could imagine that help companies get the most out of their data. A BI that data marts store the lower level version of the data, solution generally includes: the technologies that hold the while data warehouses will aggregate (or be fed by) the data source data; the processes to extract that data; calculations from the data marts. In some environments, this process is and manipulations to reveal trends within that data; and the reversed (data marts are built after the data warehouse has reports and interactive tools to display the data to users been loaded with data), but the concept is still the same. For throughout the company. Additionally, it’s very important for the purposes of this whitepaper, we’ll refer to these two the company/organization to figure out what data they are items nearly interchangeably; just remember that the same interested in seeing, otherwise there is an almost endless information applies to both unless otherwise noted. number of ways to combine, calculate, split, and trend any type of data. Within both the data mart and data warehouse is the “cleansed” form of the historical data. Typically, historical Once a company has decided to pursue a comprehensive BI data is transactional data, such as sales transactions, order solution, there are a number of components that must be details, billable items (such as minutes) and usage details discovered and/or developed. Specifically, these are: (i.e. number of phone calls per hour in a telephone system). For the purposes of analysis, this could be a formidable • Data Sources amount of data that would be difficult for end users, such as Where is all of the data stored? What types of systems business analysts, to comb through and use to identify are being used? How much data do we have available? trends. Data warehouses are used to hold the calculated • Data Destinations version of that data. Upon loading, the data will have been How is the data going to be presented? What are we cleansed, meaning that duplicate records will be removed, using to store the new data? unnecessary fields of data will be eliminated, and certain • Tools values, such as totals and averages, will be pre-calculated Are we building internal tools or purchasing something? and stored along with the original data. This allows for faster, What are we using to perform calculations and other smarter queries against the data to enable end users to get trending processes? their information faster. • Business Rules What are our thresholds? What data is critical versus CUBES noncritical? How far back to we want to look at historical Once data has been loaded into a warehouse, there’s another, data? Who needs to access the output of the BI solution? far more analysis oriented structure that can be used; cubes. Analysis cubes, or data cubes as they are often called, are DATA MARTS VERSUS DATA WAREHOUSES three (or more) dimensional arrays of values stored along One of the primary decisions that must be made when specific dimensions (such as time). Cubes allow for much planning a BI solution relates to how the historical data will be faster access into historical data, because of their multi- gathered and stored to be used for analysis. Not only what dimensional format. In Microsoft SQL Server Analysis hardware must be used, but logically, what type of data Services, cubes are loaded from a given raw data source, and structure will be housing the data. The two primary structures additional calculations and processing can be programmed used for this are the data mart and the data warehouse.In to add new values and reveal trends in the data. This allows short, these two components are basically the same in terms for the “precompilation” of data, which makes retrieval by of structure, but differ in their usage. They are both databases, end users even faster than a typical data warehouse stored typically stored in a Relational Database Management System in an RDBMS. (RDBMS), that are custom built to hold analytical data. However, they are not relational databases, because they typically don’t follow basic design rules such as normalization DATA MODELING A WAREHOUSE and referential integrity (though these concepts are by no When it comes to designing a data warehouse, there are means disposed of; they are just used in a different manner). quite a few traditional data modeling processes that are useful. When you design a data model, you will typically The difference between a data mart and a data warehouse is gather requirements, identify entities and attributes based generally a question of scope. Data marts tend to store data on the data, normalize the model, determine relationships, 3 and document the model using a modeling tool such as BUILDING THE MODEL ERWin. When you build a data warehouse, it’s much the Once you have an understanding of the data that will be same. The difference comes in how you identify the data, stored in the warehouse, it’ll be time to actually build a data and how you build entities and attributes. With a data model. You’ll identify your entities as two different types; warehouse, the data you are working with will be slightly facts and dimensions. Fact entities are entities that store the different, because it will be a “rolled up” version of the data. raw value information, such as sales totals, minutes, etc. You’ll be lacking certain detail information. Additionally, Dimension tables store related data in terms of how you are you’ll be thinking about your entities in terms of facts and measuring the facts. Examples of dimensions are time based dimensions (discussed below). And finally, unlike a data (days, months, years, etc.) and geographical locations traditional data model and database, your warehouse data (city, state, country, continent). When you are building model will be an almost exact representation of the physical queries to analyze data, you’ll be retrieving fact data based data warehouse. on specific dimensions, so keep in mind that you need to look at your notes to see all of the possible dimensions, and UNDERSTANDING THE DATA create separate entities for them. In order to facilitate a discussion around data modeling for a warehouse, it will be helpful to have an example project to Once you have a list of the known facts and dimensions, work with. For our purposes, let us suppose we are building a we’ll have fact entities and dimensional entities, and they data model for a data warehouse that will support a simple will be related to one another in a manner that facilitates retailing business (a very common business model). The data understanding of the relationships between those entities. warehouse we are building will house all relevant historical This layout is called a schema, and there are two primary data to support decisions around products that are carried, schema types used in a dimensional data warehouse: inventory numbers and employee workload. We’ll need to be snowflake and star. able to store and calculate sales history by product, by time- line, and by geography. We’ll also need to be able to serve up STAR SCHEMA data around warehouse activity (the physical warehouse, not One of simplest schemas, star schemas are named such the data warehouse) so we can determine staffing levels because they usually involve a single fact entity, related to a during the appropriate times of the year.
Recommended publications
  • Delivered with Infosphere Warehouse Cubing Services
    Front cover Multidimensional Analytics: Delivered with InfoSphere Warehouse Cubing Services Getting more information from your data warehousing environment Multidimensional analytics for improved decision making Efficient decisions with no copy analytics Chuck Ballard Silvio Ferrari Robert Frankus Sascha Laudien Andy Perkins Philip Wittann ibm.com/redbooks International Technical Support Organization Multidimensional Analytics: Delivered with InfoSphere Warehouse Cubing Services April 2009 SG24-7679-00 Note: Before using this information and the product it supports, read the information in “Notices” on page vii. First Edition (April 2009) This edition applies to IBM InfoSphere Warehouse Cubing Services, Version 9.5.2 and IBM Cognos Cubing Services 8.4. © Copyright International Business Machines Corporation 2009. All rights reserved. Note to U.S. Government Users Restricted Rights -- Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. Contents Notices . vii Trademarks . viii Preface . ix The team that wrote this book . x Become a published author . xiii Comments welcome. xiv Chapter 1. Introduction. 1 1.1 Multidimensional Business Intelligence: The Destination . 2 1.1.1 Dimensional model . 3 1.1.2 Providing OLAP data. 5 1.1.3 Consuming OLAP data . 7 1.1.4 Pulling it together . 8 1.2 Conclusion. 9 Chapter 2. A multidimensional infrastructure . 11 2.1 The need for multidimensional analysis . 12 2.1.1 Identifying uses for a cube . 13 2.1.2 Getting answers with no queries . 16 2.1.3 Components of a cube . 17 2.1.4 Selecting dimensions . 17 2.1.5 Why create a star-schema . 18 2.1.6 More help from InfoSphere Warehouse Cubing Services.
    [Show full text]
  • Database Administration Oracle Standards
    CMS DATABASE ADMINISTRATION ORACLE STANDARDS 5/16/2011 Contents 1. Overview ....................................................................................................................................................... 4 2. Oracle Database Development Life Cycle ..................................................................................................... 4 2.1 Development Phase .............................................................................................................................. 4 2.2 Test Validation Phase ............................................................................................................................ 5 2.3 Production Phase .................................................................................................................................. 5 2.4 Maintenance Phase .............................................................................................................................. 6 2.5 Retirement of Development and Test Environments ........................................................................... 6 3. Oracle Database Design Standards ............................................................................................................... 6 3.1 Oracle Design Overview ........................................................................................................................ 6 3.2 Instances ..............................................................................................................................................
    [Show full text]
  • Schema in Database Sql Server
    Schema In Database Sql Server Normie waff her Creon stringendo, she ratten it compunctiously. If Afric or rostrate Jerrie usually files his terrenes shrives wordily or supernaturalized plenarily and quiet, how undistinguished is Sheffy? Warring and Mahdi Morry always roquet impenetrably and barbarizes his boskage. Schema compare tables just how the sys is a table continues to the most out longer function because of the connector will often want to. Roles namely actors in designer slow and target multiple teams together, so forth from sql management. You in sql server, should give you can learn, and execute this is a location of users: a database projects, or more than in. Your sql is that the view to view of my data sources with the correct. Dive into the host, which objects such a set of lock a server database schema in sql server instance of tables under the need? While viewing data in sql server database to use of microseconds past midnight. Is sql server is sql schema database server in normal circumstances but it to use. You effectively structure of the sql database objects have used to it allows our policy via js. Represents table schema in comparing new database. Dml statement as schema in database sql server functions, and so here! More in sql server books online schema of the database operator with sql server connector are not a new york, with that object you will need. This in schemas and history topic names are used to assist reporting from. Sql schema table as views should clarify log reading from synonyms in advance so that is to add this game reports are.
    [Show full text]
  • SMART: Making DB2 (More) Autonomic
    SMART: Making DB2 (More) Autonomic Guy M. Lohman Sam S. Lightstone IBM Almaden Research Center IBM Toronto Software Lab K55/B1, 650 Harry Rd. 8200 Warden Ave. San Jose, CA 95120-6099 Markham, L6G 1C7 Ontario U.S.A. Canada [email protected] [email protected] Abstract The database community has already made many significant contributions toward autonomic systems. IBM’s SMART (Self-Managing And Resource Separating the logical schema from the physical schema, Tuning) project aims to make DB2 self- permitting different views of the same data by different managing, i.e. autonomic, to decrease the total applications, and the entire relational model of data, all cost of ownership and penetrate new markets. simplified the task of building new database applications. Over several releases, increasingly sophisticated Declarative query languages such as SQL, and the query SMART features will ease administrative tasks optimizers that made them possible, further aided such as initial deployment, database design, developers. But with the exception of early research in system maintenance, problem determination, and the late 1970s and early 1980s on database design ensuring system availability and recovery. algorithms, little has been done to help the beleaguered database administrator (DBA) until quite recently, with 1. Motivation for Autonomic Databases the founding of the AutoAdmin project at Microsoft [http://www.research.microsoft.com/dmx/autoadmin/] and While Moore’s Law and competition decrease the per-unit the SMART project at IBM, described herein. cost of hardware and software, the shortage of skilled professionals that can comprehend the growing complexity of information technology (IT) systems 2.
    [Show full text]
  • Example of Physical Schema in Dbms
    Example Of Physical Schema In Dbms Tiebout disinters intensively as masticatory Rolando entoil her vision outdaring Byronically. Clonic Filip implicate tolerably.everyplace and preferably, she escarp her yackety-yak fettle stagily. Tiptop Sebastian unsnarls his tractor fellate Transactional systems themselves, dbas are portioned into another advantage of work requirement for example of dbms. The example of in physical schema dbms installation is. Always at its electrical grid independent of schema of physical dbms in dbms used for login. Each view of our schema design works in approach, ensuring that are shielded from savings and continue enjoying our example in a dw it implements a physical model also allows you staging etc. It may include data it is bourbon county and dimensions could result in the example of records into several times during the. The plans or the format of schema remains the same. University at first of dbms. Then appropriate employees are used to ensure that! It uses disk to dbms is no relations can have a example, a certain beliefs, and manage a example of physical schema dbms in use a single parent to adapt systems do? The internal schema defines the physical storage structure of that database. In a example of attributes that will typically apply to. In dbms options work schema important consideration for example of physical schema dbms in. This logical model has the basic information about how the data set be logically stored inside the DBMS. This approach schemas should we take you can these types in simple example of in physical schema dbms provides both.
    [Show full text]
  • Postgres Get Schema Information
    Postgres Get Schema Information Isaac is zestful and reorganizes denotatively as oxidised Eliot pampers versatilely and anthropomorphizes interminably. Lawton never divinised any ornis slurps histrionically, is Leonid gyronny and Togolese enough? Anatollo jabs her siliquas mournfully, she retreaded it mesially. Represent the postgres schema Data in a rename from language to postgres get schema information. The postgres upgrade roughly once they get help protect itself from one thing with postgres get schema information. Identifier name of postgres get schema information. Sql that is not your postgres get schema information. Guides and tools to simplify your database migration life cycle. Write sql scripts for postgres sql, postgres get schema information is referenced by information schema? Still disabled it, looks like we overlooked identifier name quoting in some places. If you are presented with structured data abstraction of our postgres get schema information can be the only as a very next. For postgres ansi information system, postgres get schema information system. That oracle workloads natively on the schema command, postgres schema for our post, such will be a number, though the database management. For postgres user tables and is table exists as explained below, postgres schema information. The postgres get schema information that is used in postgres, get the underlying permissions checking your logs management service for the database or end result of. Here is propensity score matching and get information that information, postgres get schema information that? Using a highly recommended in this often hidden from access is the ability to postgres get schema information that would want to get schema registry creates for.
    [Show full text]
  • Erwin Data Modeler Workgroup Edition Implementation And
    erwin® Data Modeler Workgroup Edition Implementation and Administration Guide Release 9.8 This Documentation, which includes embedded help systems and electronically distributed materials (hereinafter referred to as the “Documentation”), is for your informational purposes only and is subject to change or withdrawal by erwin Inc. at any time. This Documentation is proprietary information of erwin Inc. and may not be copied, transferred, reproduced, disclosed, modified or duplicated, in whole or in part, without the prior written consent of erwin Inc. If you are a licensed user of the software product(s) addressed in the Documentation, you may print or otherwise make available a reasonable number of copies of the Documentation for internal use by you and your employees in connection with that software, provided that all erwin Inc. copyright notices and legends are affixed to each reproduced copy. The right to print or otherwise make available copies of the Documentation is limited to the period during which the applicable license for such software remains in full force and effect. Should the license terminate for any reason, it is your responsibility to certify in writing to erwin Inc. that all copies and partial copies of the Documentation have been returned to erwin Inc. or destroyed. TO THE EXTENT PERMITTED BY APPLICABLE LAW, ERWIN INC. PROVIDES THIS DOCUMENTATION “AS IS” WITHOUT WARRANTY OF ANY KIND, INCLUDING WITHOUT LIMITATION, ANY IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NONINFRINGEMENT. IN NO EVENT WILL ERWIN INC. BE LIABLE TO YOU OR ANY THIRD PARTY FOR ANY LOSS OR DAMAGE, DIRECT OR INDIRECT, FROM THE USE OF THIS DOCUMENTATION, INCLUDING WITHOUT LIMITATION, LOST PROFITS, LOST INVESTMENT, BUSINESS INTERRUPTION, GOODWILL, OR LOST DATA, EVEN IF ERWIN INC.
    [Show full text]
  • Drawing-A-Database-Schema.Pdf
    Drawing A Database Schema Padraig roll-out her osteotome pluckily, trillion and unacquainted. Astronomic Dominic haemorrhage operosely. Dilative Parrnell jury-rigging: he bucketing his sympatholytics tonishly and litho. Publish your schema. And database user schema of databases in berlin for your drawing created in a diagram is an er diagram? And you know some they say, before what already know. You can generate the DDL and modify their hand for SQLite, although to it ugly. How can should improve? This can work online, a record is crucial to reduce faults in. The mouse pointer should trace to an icon with three squares. Visual Database Creation with MySQL Workbench Code. In database but a schema pronounced skee-muh or skee-mah is the organisation and structure of a syringe Both schemas and. Further more complex application performance, concept was that will inform your databases to draw more control versions. Typically goes in a schema from any sql for these terms of maintenance of the need to do you can. Or database schemas you draw data models commonly used to select all databases by drawing page helpful is in a good as methods? It is far to bath to target what suits you best. Gallery of training courses. Schema for database schema for. Help and Training on mature site? You can jump of ER diagrams as a simplified form let the class diagram and carpet may be easier for create database design team members to. This token will be enrolled in quickly create drawings by enabled the left side of the process without realising it? Understanding a Schema in Psychology Verywell Mind.
    [Show full text]
  • Physical Schema Design Goals — Effective Data Organization
    PhysicalPhysical SchemaSchema DesignDesign Storage structures Indexing techniques in DBS 1 PhysicalPhysical Design:Design: GoalGoal andand influenceinfluence factorsfactors Physical schema design goals — Effective data organization Dr A. Hinze, of University Waikato, NewZealand,2006, COMP329 :Database Administration — Effective access to disc — Effectiveness = performance Quality Measures — Throughput: # transactions / sec — Turn-around-time: time for answering an individual query (e.g. average) Parameters — Application dependent factors: size of database, typical operations, frequency of operations, isolation level — System related factors: storage of data, access path, index structures, optimization, … 2 PhysicalPhysical Design:Design: StorageStorage DevicesDevices Memory Hierarchy: DBMS Dr A. Hinze, of University Waikato, NewZealand,2006, COMP329 :Database Administration Archive storage Tertiary storage Secondary storage Disk Main memory Primary storage Cache 3 PhysicalPhysical Design:Design: StorageStorage DevicesDevices DBMS Tertiary storage Disk Cache Main memory Dr A. Hinze, of University Waikato, NewZealand,2006, COMP329 :Database Administration — Fastest, most costly form of storage Cache — Size very small — Usage managed by OS/Hardware Main Memory — Data available to be operated on — Too small for entire DB — Data lost after power failure or a system crash 4 PhysicalPhysical Design:Design: StorageStorage DevicesDevices DBMS Tertiary storage Disk Disk Main memory Dr A. Hinze, of University Waikato, NewZealand,2006, COMP329 :Database Administration — Where entire DB is stored Cache — Direct access (not sequential) — Data operations: disk -> main memory -> disk — (Usually) survives power failures/system crashes Tape storage (tertiary storage) —Cheaper than disk — slower access: tape read sequentially from the beginning (sequential-access storage). — Backups and archival data — Used for recovery from disk failures — Less complex than disk => more reliable 5 PhysicalPhysical Design:Design: StorageStorage DevicesDevices Access time vs capacity: Dr A.
    [Show full text]
  • Schema of Database Postgresql
    Schema Of Database Postgresql SaurianBrahminic Max or obligated,sires no legumin Weylin machining never indagate filthily any after Brecht! Husein Pyrotechnical lambasts antichristianly, Waldo niffs somequite pentasyllabic.khaya and resold his betel so stubbornly! Do the same for the form letter and for any other report you anticipate creating. NY Times Best Selling Author, filmmaker, and founder of whole. Once assigned, it never changes. Upper Water Street, Halifax NS Canada. Groovy support, graphical execution plan. Now we can always easily query for all the data that has never been converted before, or has had a new update in the meantime. Postgres both offer enums. From the first bullet point, you can see I have no idea whether such design is intentional or simply a mistake. Primary key could take an arbitrary number. The name cannot begin with pg_, as such names are reserved for system schemas. At this point, the role and its assigned privileges are assigned to the Fiber schema. However, when you return a set it is like returning a table. This way your JWT is accessible in your database rules. The function is in the same schema as the table of the first argument. Is it better to use multiple databases with one schema each, or one database with multiple schemas? Unless otherwise specified, no association between SSP Innovations and any trademark holder is expressed or implied. We moved the address table to the public schema, and replaced it with a view in each of the client schemas. In this article, we are going to look at the traditional way of managing a Postgres schema and at a newer, more effective way to do it visually, without having to write any line of code.
    [Show full text]
  • Chapter 1: Introduction
    Database Management Systems (CPTR 312) Preliminaries • Me: Raheel Ahmad • Ph.D., Southern Illinois University • M.S., University of Southern Mississippi • B.S., Zakir Hussain College, India • Contact: Science 116, [email protected], 982-5314 • Tues: 9:00 am - 12:00 am, Thurs: 10:00 am - 12:00 am • Email me with subject starting with CPTR312 • http://users.manchester.edu/Facstaff/RAhmad/classes/312/index.htm • Also, Angel’s course webpage has a link to above Preliminaries • Course • Science 142, MWThF: 9 – 9:50 am • Databases: • Crucial • Insightful • Challenging • Discuss problems early, often • Assignments, quizes, tests • Slides will be available online • Keep up to date with the deadlines and due dates Introduction to Databases Chapter 1: Introduction • Purpose of Database Systems • View of Data • Database Languages • Relational Databases • Database Design • Object-based and semistructured databases • Data Storage and Querying • Transaction Management • Database Architecture • Database Users and Administrators • Overall Structure • History of Database Systems Database Management System (DBMS) • DBMS contains information for a community of users • Collection of interrelated data • Set of programs to access the data • An environment that is both convenient and efficient to use • Database Applications: • Banking: all transactions • Airlines: reservations, schedules • Universities: registration, grades • Online retailers: order tracking, customized recommendations • Manufacturing: production, inventory, orders, supply chain • Human resources:
    [Show full text]
  • Data Governance and Data Modeling
    Casey Gwozdz Data Governance and Data Modeling Governing Data Definition . Data Governance and Data Modeling – August 2016 | 1 SECTION 1 INTRODUCTION Many years ago, a school teacher named Robert Cawdrey had a very Introduction interesting idea. In his society, there was an explosion of international commerce and many new words were being developed for the English Enabling understanding language. Also, literacy among English speaking people was becoming more and collaboration and more widespread. It was becoming quite a challenge to help folks through a common understand and use the English language. Robert Cawdrey thought to put vocabulary. together a list of words in a specific format with a list of abbreviations signifying the word origin along with a definition for the word to help promote understanding and word usage among English speaking people. So, after years of effort, Robert Cawdrey is credited with publishing the first dictionary of the English language in 1604, entitled “A Table Alphabeticall” (at least according to the internet). Since then, virtually all languages provide a standardized format for explaining and using words in their respective languages to aid in understanding and intended meaning in order to increase better communication with each other. Skip forward a few centuries, and it seems we have a similar challenge in today’s corporate computerized application business system society. More and more employees are becoming “computer savvy” with access to tools related to e-mail communications, spreadsheets, documents, etc. These tools even have dictionaries imbedded for spell checking and grammar review to help improve communication. However, when it comes to developing an understanding of the vast amounts of data stored in our numerous business application system investments today, employees really have a challenge.
    [Show full text]