Data Modeling for Business Intelligence with Microsoft SQL Server: Modeling a Data Warehouse
Total Page:16
File Type:pdf, Size:1020Kb
Data Modeling for Business Intelligence with Microsoft SQL Server: Modeling a Data Warehouse By Josh Jones and Eric Johnson CA ERwin TABLE OF CONTENTS INTRODUCTION . 2 WHAT IS BUSINESS INTELLIGENCE . .3 Data Marts Versus Date Warehouses . 3 Cubes . 3 DATA MODELING A WAREHOUSE . 3 UNDERSTANDING THE DATA . 3 BUILDING THE MODEL . 4 Star Schema . 4 Snowflake Schema . 4 Building the Model . 4 BUILDING THE DATA WAREHOUSE . 7 USING SSAS TO ANALYZE THE DATA . .8 CONCLUSION . 8 ABOUT CONSORTION SERVICES . .8 INTRODUCTION For decades, businesses of all shapes and sizes have been storing their historical data. Everything from cash register rolls stored in filing cabinets to web application transactions being stored in relational databases have been sitting quietly, often collecting dust, just waiting for someone to dispose of them. Fortunately, as we’ve transitioned from paper based records to computer based records, we’ve discovered that mining this historical data can yield a veritable gold mine of information that can help guide a business towards future success. Understanding everything from geographical product preferences to age based sales demographics to employee performance can help companies make smart decisions about what they sell, and how they sell it. Today, we can use modern technology such as Microsoft SQL Server to run both the transactional databases that support our mission critical applications as well as the analytical databases that can help drive our businesses forward. In this series, “Data Modeling for Business Intelligence with Microsoft SQL Server”, we’ll look at how to use traditional data modeling techniques to build a data model for a data warehouse, as well as how to implement a data warehouses and their accompanying processing loads. We’ll also look at CA’s ERWin Data Modeler to expedite and improve upon traditional design methodology. 2 WHAT IS BUSINESS INTELLIGENCE? related to a small subset of the business, such as a department Business Intelligence (BI), the term that represents gathering, or a specific product. Conversely, a data warehouse tends to analyzing, and presenting business data, has evolved to refer support the entire business, or large sections of the business, to the specific set of technologies and business processes depending on the size of the company. So, you could imagine that help companies get the most out of their data. A BI that data marts store the lower level version of the data, solution generally includes: the technologies that hold the while data warehouses will aggregate (or be fed by) the data source data; the processes to extract that data; calculations from the data marts. In some environments, this process is and manipulations to reveal trends within that data; and the reversed (data marts are built after the data warehouse has reports and interactive tools to display the data to users been loaded with data), but the concept is still the same. For throughout the company. Additionally, it’s very important for the purposes of this whitepaper, we’ll refer to these two the company/organization to figure out what data they are items nearly interchangeably; just remember that the same interested in seeing, otherwise there is an almost endless information applies to both unless otherwise noted. number of ways to combine, calculate, split, and trend any type of data. Within both the data mart and data warehouse is the “cleansed” form of the historical data. Typically, historical Once a company has decided to pursue a comprehensive BI data is transactional data, such as sales transactions, order solution, there are a number of components that must be details, billable items (such as minutes) and usage details discovered and/or developed. Specifically, these are: (i.e. number of phone calls per hour in a telephone system). For the purposes of analysis, this could be a formidable • Data Sources amount of data that would be difficult for end users, such as Where is all of the data stored? What types of systems business analysts, to comb through and use to identify are being used? How much data do we have available? trends. Data warehouses are used to hold the calculated • Data Destinations version of that data. Upon loading, the data will have been How is the data going to be presented? What are we cleansed, meaning that duplicate records will be removed, using to store the new data? unnecessary fields of data will be eliminated, and certain • Tools values, such as totals and averages, will be pre-calculated Are we building internal tools or purchasing something? and stored along with the original data. This allows for faster, What are we using to perform calculations and other smarter queries against the data to enable end users to get trending processes? their information faster. • Business Rules What are our thresholds? What data is critical versus CUBES noncritical? How far back to we want to look at historical Once data has been loaded into a warehouse, there’s another, data? Who needs to access the output of the BI solution? far more analysis oriented structure that can be used; cubes. Analysis cubes, or data cubes as they are often called, are DATA MARTS VERSUS DATA WAREHOUSES three (or more) dimensional arrays of values stored along One of the primary decisions that must be made when specific dimensions (such as time). Cubes allow for much planning a BI solution relates to how the historical data will be faster access into historical data, because of their multi- gathered and stored to be used for analysis. Not only what dimensional format. In Microsoft SQL Server Analysis hardware must be used, but logically, what type of data Services, cubes are loaded from a given raw data source, and structure will be housing the data. The two primary structures additional calculations and processing can be programmed used for this are the data mart and the data warehouse.In to add new values and reveal trends in the data. This allows short, these two components are basically the same in terms for the “precompilation” of data, which makes retrieval by of structure, but differ in their usage. They are both databases, end users even faster than a typical data warehouse stored typically stored in a Relational Database Management System in an RDBMS. (RDBMS), that are custom built to hold analytical data. However, they are not relational databases, because they typically don’t follow basic design rules such as normalization DATA MODELING A WAREHOUSE and referential integrity (though these concepts are by no When it comes to designing a data warehouse, there are means disposed of; they are just used in a different manner). quite a few traditional data modeling processes that are useful. When you design a data model, you will typically The difference between a data mart and a data warehouse is gather requirements, identify entities and attributes based generally a question of scope. Data marts tend to store data on the data, normalize the model, determine relationships, 3 and document the model using a modeling tool such as BUILDING THE MODEL ERWin. When you build a data warehouse, it’s much the Once you have an understanding of the data that will be same. The difference comes in how you identify the data, stored in the warehouse, it’ll be time to actually build a data and how you build entities and attributes. With a data model. You’ll identify your entities as two different types; warehouse, the data you are working with will be slightly facts and dimensions. Fact entities are entities that store the different, because it will be a “rolled up” version of the data. raw value information, such as sales totals, minutes, etc. You’ll be lacking certain detail information. Additionally, Dimension tables store related data in terms of how you are you’ll be thinking about your entities in terms of facts and measuring the facts. Examples of dimensions are time based dimensions (discussed below). And finally, unlike a data (days, months, years, etc.) and geographical locations traditional data model and database, your warehouse data (city, state, country, continent). When you are building model will be an almost exact representation of the physical queries to analyze data, you’ll be retrieving fact data based data warehouse. on specific dimensions, so keep in mind that you need to look at your notes to see all of the possible dimensions, and UNDERSTANDING THE DATA create separate entities for them. In order to facilitate a discussion around data modeling for a warehouse, it will be helpful to have an example project to Once you have a list of the known facts and dimensions, work with. For our purposes, let us suppose we are building a we’ll have fact entities and dimensional entities, and they data model for a data warehouse that will support a simple will be related to one another in a manner that facilitates retailing business (a very common business model). The data understanding of the relationships between those entities. warehouse we are building will house all relevant historical This layout is called a schema, and there are two primary data to support decisions around products that are carried, schema types used in a dimensional data warehouse: inventory numbers and employee workload. We’ll need to be snowflake and star. able to store and calculate sales history by product, by time- line, and by geography. We’ll also need to be able to serve up STAR SCHEMA data around warehouse activity (the physical warehouse, not One of simplest schemas, star schemas are named such the data warehouse) so we can determine staffing levels because they usually involve a single fact entity, related to a during the appropriate times of the year.