Data Vault Modeling The Next Generation DW Approach
Presented by: Siva Govindarajan
http://www.globytes.com [email protected]
1 Data Vault Modeling Agenda
Introduction Data Vault Place in Evolution Data Warehousing Architecture Data Vault Components Case Study Typical 3NF / Star Schema Data Vault Approach Temporal Data + Auditability Implementation of Data Vault Conclusion 2 Introduction
Data Vault concept originally conceived by Dan Linstedt Enterprise Data Warehouse Modeling approach. Hybrid approach - 3NF & Star Schema Adaptable to changes Auditable Timed right with Technology
3 Data Vault Place in Evolution
1960s - Codd, Date et. al Normal Forms 1980s - Normal Forms adapted to DWs 1985+ - Star Schema for OLAP 1990s - Data Vault concept developed Dan Linstedt 2000+ - Data Vault concept published by Dan Linstedt
4 Data Warehousing Architecture
Bill Inmon’s Data Warehouse Architecture
Data Mart OLTP Enterprise Staging Data Mart Staging Data Marts (optional) Data Marts Systems & ODS DW e tiv s p rt da a A M
Kimball’s Data Warehouse Bus Architecture
Data Mart Data Mart
OLTP Systems Staging Data Mart Data Mart
5 Data Vault Components
Hub Entities Candidate Keys + Load Time + Source Link Entities FKs from Hub + Load Time + Source Satellite Entities Descriptive Data + Load Time + Source + End Time Dimension = Hub+ Satellite Fact = Satellite + Link [+ Hub ]
6 Case Study
Modeling Approaches Typical 3NF / Star Schema Data Vault approach Scenario Typical Fact and Dimension New Dimension data Reference Data change - 1:M to M:M Additional attributes to Dimension
7 Typical 3 F / Star Schema 1. Typical Fact and Dimension
TR ORDER DIM PRODUCT tr_order_id dim_product_id dim_product_id product_id system order id product_code system root order id product desc global root order id product_name order quantity le ngth total executed qty event timestamp
8 Typical 3 F / Star Schema 2. ew Dimension Data
REF PRODUCT CATEGORY TR ORDER DIM PRODUCT product_category_id tr_order_id dim_product_id product_category_name dim_product_id product_id product_category_code system order id product_code product_category_desc system root order id product desc is_strategy global root order id product_name order quantity product_category_id total executed qty product_category_name event timestamp product_category_code product_category_desc is_strategy
9 Typical 3 F / Star Schema 2. ew Dimension Data
REF PRODUCT CATEGORY TR ORDER DIM PRODUCT product_category_id tr_order_id dim_product_id product_category_name dim_product_id product_id product_category_code system order id product_code product_category_desc system root order id product desc is_strategy global root order id product_name order quantity le ng th total executed qty DIM PRODUCT CATEGORY event timestamp dim_product_category_id dim_product_category_id product_category_code product_category_name product_category_desc is_strategy namespace
10 Typical 3 F / Star Schema 3. Reference 1:M to M:M Change
REF PRODUCT CATEGORY TR ORDER DIM PRODUCT product_category_id tr_order_id dim_product_id product_category_name dim_product_id product_id product_category_code system order id product_code product_category_desc system root order id product desc ??? is_strategy global root order id product_name order quantity le ng th total executed qty DIM PRODUCT CATEGORY event timestamp dim_product_category_id dim_product_category_id order owner firm id ??? product_category_code product_category_name product_category_desc is_strategy namespace
11 Typical 3 F / Star Schema
4. Additional attributes to Dimension
TR ORDER DIM PRODUCT tr_order_id dim_product_id
dim_product_id product_id system order id product_code system root order id product desc global root order id product_name order quantity le ng th total executed qty height event timestamp weight dim_product_category_id packaging order owner firm id color size
12 Data Vault Approach 1. Typical Hub, Satellite and Link (Replacing Fact/Dimension)
SAT_PRODUCT01 HUB_PRODUCT sat_product01_id hub_product_id hub_product_id hub_record_source sat_load_date hub_load_date sat_end_date product_id sat_record_source product_code product desc product_name symbol HUB_ORDER SAT ORDER01 hub_order_id LNK_ORDER sat_order01_id hub_record_source lnk_order_id hub_order_id hub_load_date sat_load_date lnk_load_date tr_order_id sat_end_date lnk_record_source system order id sat_record_source hub_product_id system root order id order quantity hub_order_id global root order id total executed qty event timestamp
13 Data Vault Approach 2. ew Dimension Data
SAT_PRODUCT01 HUB_PRODUCT sat_product01_id hub_product_id hub_product_id hub_record_source sat_load_date hub_load_date sat_end_date product_id sat_record_source product_code product desc product_name symbol HUB_ORDER SAT ORDER01 hub_order_id LNK_ORDER sat_order01_id lnk_order_id hub_record_source hub_order_id hub_load_date sat_load_date lnk_load_date tr_order_id sat_end_date lnk_record_source system order id sat_record_source hub_product_id system root order id order quantity hub_order_id global root order id total executed qty event timestamp
SAT_CATEGORY01 HUB_CATEGORY sat_category01_id LNK_PRODUCT_X_CATEGORY hub_category_id hub_category_id lnk_product_x_category_id sat_load_date hub_record_source lnk_load_date sat_end_date hub_load_date lnk_record_source sat_record_source product_category_id hub_product_id product_category_name product_category_code hub_category_id product_category_desc is_strategy
14 Data Vault Approach 3. Reference 1:M to M:M Change SAT_PRODUCT01 HUB_PRODUCT sat_product01_id hub_product_id hub_product_id hub_record_source sat_load_date hub_load_date sat_end_date product_id sat_record_source product_code product desc product_name symbol HUB_ORDER SAT ORDER01 hub_order_id LNK_ORDER sat_order01_id lnk_order_id hub_record_source hub_order_id hub_load_date sat_load_date lnk_load_date tr_order_id sat_end_date lnk_record_source system order id sat_record_source hub_product_id system root order id order quantity hub_order_id global root order id total executed qty event timestamp
SAT_CATEGORY01 HUB_CATEGORY sat_category01_id LNK_PRODUCT_X_CATEGORY hub_category_id hub_category_id lnk_product_x_category_id sat_load_date hub_record_source lnk_load_date sat_end_date hub_load_date lnk_record_source sat_record_source product_category_id hub_product_id product_category_name product_category_code hub_category_id product_category_desc is_strategy
15 Data Vault Approach 4. Additional attributes to Dimension
SAT_PRODUCT01 sat_product01_id hub_product_id sat_load_date sat_end_date sat_record_source product desc HUB_PRODUCT product_name hub_product_id SAT_PRODUCT02 hub_record_source hub_load_date sat_product02_id product_id hub_product_id product_code sat_load_date sat_end_date sat_record_source length height weight packaging color
16 Temporal Data + Auditability
SAT_PRODUCT01 sat_product01_id
hub_product_id sat_load_date sat_end_date sat_record_source product desc HUB_PRODUCT product_name hub_product_id SAT_PRODUCT02 hub_record_source hub_load_date sat_product02_id product_id hub_product_id product_code sat_load_date sat_end_date sat_record_source length height weight packaging color
LNK_PRODUCT_X_CATEGORY LNK_ORDER HUB_ORDER SAT ORDER01 lnk_product_x_category_id lnk_order_id hub_order_id sat_order01_id lnk_load_date lnk_load_date hub_record_source hub_order_id lnk_record_source lnk_record_source hub_load_date sat_load_date hub_product_id hub_order_id tr_order_id sat_end_date hub_category_id hub_product_id system order id sat_record_source hub_category_id system root order id order quantity global root order id total executed qty event timestamp source_system
SAT_CATEGORY01 HUB_CATEGORY sat_category01_id
hub_category_id hub_category_id sat_load_date hub_record_source sat_end_date hub_load_date sat_record_source product_category_id product_category_name product_category_code product_category_desc is_strategy 17 Implementation of Data Vault
Make it Simple View for Dimension View For Fact Data Loading Take advantage of Technology DW Appliances High-throughput storage devices RDBMS Features
18 Make it Simple
Create views for Dimension
From our Demo model, Join Product related Hubs and Satellites to build View for Product Dimension as shown below :
HUB_PRODUCT SAT_PRODUCT01 SAT_PRODUCT02
19 Make it Simple
Create Views For Fact
Join Order related Hubs, Satellites and Links to build View for ORDER Fact using ORDER related Hubs/ Satellites and Links.
HUB_ORDER SAT_ORDER01 LNK_ORDER
20 Data Loading
Typically data loading for a Data Vault is in the following sequence. Hubs for Dimensions Links for Dimensions Satellites for Dimensions Hubs for Fact (if any ) Links for Fact Satellites for Fact Refer to Data Vault Series article 5 of Linstedt for further details
21 DW Appliances
Next wave of IT Data Warehouse hardware solutions are the Data Warehouse Appliances. Take advantage of the technology where possible. Few DW Appliances in the market : Oracle Exadata Netezza Teradata RedBrick GreenPlum
22 High throughput storage devices
Utilize Storage devices designed for DW / OLAP applications. Following are some examples only and would change over time. Please do your research based on your sizing requirements. EMC CLARiiON Storage family Texas Memory systems RamSan – Solid State devices Hitachi USP / VSP Storage family Other Hybrid solutions with High performance storage and Solid State devices
23 RDBMS Features
Some Oracle Features catering DW / OLAP applications: Exadata Smart Scans OLAP Based Materialized views Partitioning Reference partition Advanced data compression Automatic degree of parallelism (ADOP). Star query optimization Oracle OLAP/DWH Features.
24 Conclusion In this presentation we have addressed high level overview of Data Vault Architecture. Topics discussed are Data Vault concept and architecture Data Vault Components such as Hubs, Satellites and Link tables Typical modeling challenges with traditional modeling approaches How those challenges could be handled using Data Vault Modeling Approach. Auditing and Temporal data capture using DV Approach. And finally, some implementation details If interested in learning more, try Dan Linstedt's special coaching area at: http://danLinstedt.com/my-coach.
25 Questions ?
? 26 References
Data Vault Series articles by Dan E. Linstedt -- http://www.tdan.com Referred few articles from Genesee Academy -- http://geneseeacademy.com Articles and books by Ralph Kimball and Bill Inmon from multiple sources. Technical Documents from http://www.oracle.com/technetwork
27 Special Thanks to
Dan Linstedt for review and corrections to the article. My colleague Mark Bruscke for his assistance in preparing the article.
28 The End
Contact Email : [email protected]
29