Dimensional Modeling Is Very Elaborately Defined by Ralph Kimball
Dimensional modeling is very elaborately defined by Ralph Kimball. His definition explains that dimensional modeling is a methodology for modeling data with query performance, understandability and resilience to change in mind. Here the data is stored in a standard framework called the star schema.

Star Schema

A star schema consists of a central fact table that has a multipart key and one or more numerical business measures. This table is joined to denormalized tables, called dimension tables, which have a single-part key and store business attributes in the form of textual information. Drawn graphically, this looks like a star with the fact table in the center and the dimension tables surrounding it, hence the name. Note that each dimension table provides one perspective on the fact table to which it is joined. To get the complete perspective of a measure stored in the fact table, we need to join the fact table with all of its dimension tables.

In the figure above we have a very basic example of a star schema. Here the fact table, called the Sales Fact Table, has three foreign keys that connect it to the dimension tables and one measure called Sales in $. These three foreign keys uniquely identify a record in the fact table. There are three dimension tables in denormalized form. The Time Dimension gives the date perspective to the fact table, e.g. the date when the sales transaction occurred. Similarly, the Location Dimension and Product Dimension give the perspective of where the sale happened and what product was sold in the transaction, e.g. the store name and the name of the SKU. The Sales in $ alone is just a number and would not mean anything unless we join it with the time, product and location dimension tables to give the complete picture of the sales. In other words, the fact table must be joined with the dimension tables to give the complete perspective of the measure.

Fact Table

As explained above, the fact table forms the center of the star schema.
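The Sales star schema described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions, not the author's design: the table and column names (sales_fact, time_dim, product_dim, location_dim, sales_usd, and so on) and the sample rows are hypothetical. It shows the three foreign keys forming the fact table's composite primary key, and the join that gives the Sales in $ measure its full perspective.

```python
import sqlite3

# Hypothetical star schema mirroring the Sales example: one fact table and
# three denormalized dimension tables, each with a single-part key.
SCHEMA = """
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, sale_date TEXT);
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, sku_name TEXT);
CREATE TABLE location_dim (location_key INTEGER PRIMARY KEY, store_name TEXT);
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim(time_key),
    product_key INTEGER REFERENCES product_dim(product_key),
    location_key INTEGER REFERENCES location_dim(location_key),
    sales_usd REAL,
    -- The three foreign keys together form the composite (multipart) primary key.
    PRIMARY KEY (time_key, product_key, location_key)
);
"""

def build_star(conn):
    """Create the hypothetical schema and load a few made-up rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO time_dim VALUES (?, ?)",
                     [(1, "2024-01-01"), (2, "2024-01-02")])
    conn.executemany("INSERT INTO product_dim VALUES (?, ?)",
                     [(10, "Blue T-Shirt"), (11, "Red Mug")])
    conn.executemany("INSERT INTO location_dim VALUES (?, ?)",
                     [(100, "Downtown"), (101, "Airport")])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
                     [(1, 10, 100, 25.0), (1, 11, 101, 8.5), (2, 10, 101, 25.0)])

def sales_with_context(conn):
    # On its own, sales_usd is just a number; joining all three dimensions
    # tells us when, what and where for each sale.
    return conn.execute("""
        SELECT t.sale_date, p.sku_name, l.store_name, f.sales_usd
        FROM sales_fact AS f
        JOIN time_dim AS t ON f.time_key = t.time_key
        JOIN product_dim AS p ON f.product_key = p.product_key
        JOIN location_dim AS l ON f.location_key = l.location_key
        ORDER BY t.sale_date, p.sku_name
    """).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build_star(conn)
    for row in sales_with_context(conn):
        print(row)
```

Because the composite key is declared as the primary key, inserting a second row for the same date, product and store is rejected, which matches the statement that the three foreign keys uniquely identify a fact row.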
It contains one or more business measures, which are numeric values. The fact table is joined to the dimension tables through foreign keys that reference them, and together these foreign keys form the fact table's composite primary key.

Characteristics of the Fact Table

It has multiple foreign keys to the dimension tables that together uniquely identify each row. In other words, it has a multipart key.
It contains one or more business measures, called facts. The facts are numerical values of business transactions.
Data, once loaded into the fact table, is not meant to be updated, because we load business transactions that have already happened. The only reasons to change the factual data would be:
1. An error in the operational system or process resulting in incorrect data, or
2. An error in the ETL process while calculating and loading the fact table.
All data in a fact table must be at the same level of granularity.
It contains a very large number of rows.
It is a normalized table.

Types of Measures in a Fact Table

Based on whether a measure is calculated, and on how its values can be added, we categorize measures as follows:
1. Base Measures - A metric in the fact table whose value is taken as-is from the source. The ETL process does not change its value based on business rules or other measures. E.g. the "Sales in $" in the above example is a transaction value loaded as-is from the source system.
2. Derived Measures - A metric in the fact table whose value is calculated from business rules and the base measures from the source system. E.g. "Sales Margin" is calculated as (Sales in $ - Cost in $) / Sales in $. Here we use two base measures, namely Sales in $ and Cost in $.
3. Additive Measures - These are measures in the fact table that can be summed across all the dimensions and still give meaningful data.
For example, the "Sales in $" in the example above can be summed across all three dimensions attached to the fact table. Adding "Sales in $" across the time dimension gives the total sales for a period of time; similarly, we can get total sales across all stores, and sales for all products.
4. Semi-Additive Measures - These are measures in the fact table that can be added across only some dimensions. Added across the other dimensions, they may not produce meaningful results. For example, an Inventory Balance metric indicates the remaining quantity of a product in the store at the time of the transaction. Adding it over the time dimension does not produce a meaningful result, but adding it across all the products in a store gives the total inventory count.
5. Non-Additive Measures - These are measures in the fact table that cannot be added across any dimension. If added, they would not produce meaningful results. These are generally percentages and ratios. An example of a non-additive fact is the Sales Margin % described above.

Types of Fact Tables

1. Transaction Fact Tables - These are fact tables that contain the values of business transactions that occurred at a point in time. A row is inserted for each transaction that has occurred.
2. Periodic Snapshot Fact Tables - These are fact tables that contain a complete snapshot of the business measures at the end of each business period (day/week/month etc.). For example, suppose there were 10 sales transactions for a particular product/SKU during the day. A transaction fact table would have 10 entries, one for each transaction, and the inventory balance value would decrease with each transaction row. A periodic snapshot table would store only the end-of-day inventory balance. Periodic snapshot fact tables are loaded regularly at the end of every business period (day/week/month etc.).
This builds up a regular series of snapshots from which trends in business measures can be analyzed.
3. Accumulating Snapshot Fact Tables - These are a special type of fact table applied to business processes such as order management. When an order is created, we create entries for all the phases of the order process, from start to end. As each phase is completed, we update the entry corresponding to that event with its facts and the date of the event.
4. Factless Fact Tables - These are multipart-key tables with no business measure in them. They capture business information that cannot be determined from business transactions alone; they represent relationships between dimensions without the need for a business transaction or event. For example, the star schema in the figure above cannot tell us which products have not been sold in the last 2 weeks, because the Sales Fact Table records only the event of a product being sold. To answer that question, we would create a factless fact table containing the combination of every product key with each store key available in the last 2 weeks. Comparing the product keys in the factless fact table with those in the Sales Fact Table for the same two weeks, the products that appear in the factless fact table but not in the Sales Fact Table are the ones that have not been sold in that period.

Dimension Tables

As explained earlier, a dimension table is a denormalized table which has a single-part key and stores business attributes in the form of textual information. Dimension tables are joined to the fact table and provide business perspective to its measures. The key of the dimension table is used to join to the fact table, where it becomes part of the composite primary key. Dimension tables also contain the hierarchy of the business perspective in denormalized form. An example is shown in the figure, which illustrates the denormalized structure of the product dimension.
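A denormalized product dimension can be sketched the same way. This is a minimal sketch, assuming sqlite3 again, hypothetical column names (product_key, sku_name, class_name, department, brand) and made-up rows; the fact table is stripped down to a product key and one measure for brevity. Because each dimension row repeats its full brand > department > class path, rolling sales up to any hierarchy level needs only a single join from the fact table, rather than a chain of joins through separate brand, department and class tables.

```python
import sqlite3

# Hypothetical denormalized product dimension: every row carries its whole
# hierarchy instead of pointing to normalized brand/department/class tables.
SCHEMA = """
CREATE TABLE product_dim (
    product_key INTEGER PRIMARY KEY,  -- surrogate key: a meaningless running number
    sku_name TEXT,
    class_name TEXT,
    department TEXT,
    brand TEXT
);
-- Stripped-down fact table (product key + one measure) for brevity.
CREATE TABLE sales_fact (
    product_key INTEGER REFERENCES product_dim(product_key),
    sales_usd REAL
);
"""

HIERARCHY_LEVELS = ("brand", "department", "class_name")

def build(conn):
    """Create the hypothetical tables and load a few made-up rows."""
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO product_dim VALUES (?, ?, ?, ?, ?)", [
        (1, "Slim Jeans", "Jeans", "Apparel", "Acme"),
        (2, "Crew T-Shirt", "Tops", "Apparel", "Acme"),
        (3, "Coffee Mug", "Mugs", "Kitchen", "HomeCo"),
    ])
    conn.executemany("INSERT INTO sales_fact VALUES (?, ?)",
                     [(1, 40.0), (2, 15.0), (2, 15.0), (3, 8.0)])

def sales_by(conn, level):
    # Each hierarchy level is just a column, so one fact-to-dimension join
    # rolls sales up to that level. Column names cannot be bound as query
    # parameters, so the level is whitelisted before it is formatted in.
    if level not in HIERARCHY_LEVELS:
        raise ValueError(f"unknown hierarchy level: {level}")
    return conn.execute(f"""
        SELECT p.{level}, SUM(f.sales_usd)
        FROM sales_fact AS f
        JOIN product_dim AS p ON f.product_key = p.product_key
        GROUP BY p.{level}
        ORDER BY p.{level}
    """).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    build(conn)
    print(sales_by(conn, "department"))
```

The trade-off is redundancy: "Apparel" and "Acme" are repeated on every matching product row, which the dimensional approach accepts in exchange for simpler navigation and fewer joins.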
The hierarchy in the figure shows how a brand has departments within it, how a department has classes within it, and so on.

Characteristics of Dimension Tables

Dimension tables are denormalized to allow simpler navigation and to reduce the number of joins in a query. This improves query performance.
Dimension tables contain business attributes, which are generally textual. Some attributes, such as the size of a product, are numerical dimension attributes.
A dimension table's unique key is generally a meaningless running number called the surrogate key.
Values of dimension attributes can change over time. Such dimensions are called Slowly Changing Dimensions.

Types of Dimension Tables

Conformed Dimension - These are dimension tables that can be shared across multiple fact tables. A common example of a conformed dimension is the Time dimension.
Degenerate Dimension - A degenerate dimension contains only a key and no attributes. It is also called an empty dimension, as it is stripped of any attributes. It does not exist as a separate table; instead, its key forms part of the fact table. An example would be adding a transaction number to the Sales Fact Table. The transaction number identifies all the rows corresponding to a transaction.
Slowly Changing Dimension (SCD) - These are dimension tables whose attributes change over time. As the name suggests, the changes happen at a slow rate. Based on how the business wants to track changes to the dimension attributes, we can handle the changes in three ways, namely:
Type 1 - In Type 1 SCD tables we are not required to track the previous changes that have occurred to the attribute.