Data Vault Modeling The Next Generation DW Approach

Presented by: Siva Govindarajan

http://www.globytes.com [email protected]

1 Data Vault Modeling Agenda

 Introduction  Data Vault Place in Evolution  Data Warehousing Architecture  Data Vault Components  Case Study  Typical 3NF /  Data Vault Approach  Temporal Data + Auditability  Implementation of Data Vault  Conclusion 2 Introduction

 Data Vault concept originally conceived by Dan Linstedt  Enterprise Modeling approach.  Hybrid approach - 3NF & Star Schema  Adaptable to changes  Auditable  Timed right with Technology

3 Data Vault Place in Evolution

 1960s - Codd, Date et. al Normal Forms  1980s - Normal Forms adapted to DWs  1985+ - Star Schema for OLAP  1990s - Data Vault concept developed Dan Linstedt  2000+ - Data Vault concept published by Dan Linstedt

4 Data Warehousing Architecture

Bill Inmon’s Data Warehouse Architecture

Data Mart OLTP Enterprise Staging Staging Data Marts (optional) Data Marts Systems & ODS DW e tiv s p rt da a A M

Kimball’s Data Warehouse Bus Architecture

Data Mart Data Mart

OLTP Systems Staging Data Mart Data Mart

5 Data Vault Components

 Hub Entities  Candidate Keys + Load Time + Source  Link Entities  FKs from Hub + Load Time + Source  Satellite Entities  Descriptive Data + Load Time + Source + End Time  Dimension = Hub+ Satellite  Fact = Satellite + Link [+ Hub ]

6 Case Study

 Modeling Approaches  Typical 3NF / Star Schema  Data Vault approach  Scenario  Typical Fact and Dimension  New Dimension data  Reference Data change - 1:M to M:M  Additional attributes to Dimension

7 Typical 3F / Star Schema 1. Typical Fact and Dimension

TR ORDER DIM PRODUCT tr_order_id dim_product_id dim_product_id product_id system order id product_code system root order id product desc global root order id product_name order quantity le ngth total executed qty event timestamp

8 Typical 3F / Star Schema 2. ew Dimension Data

REF PRODUCT CATEGORY TR ORDER DIM PRODUCT product_category_id tr_order_id dim_product_id product_category_name dim_product_id product_id product_category_code system order id product_code product_category_desc system root order id product desc is_strategy global root order id product_name order quantity product_category_id total executed qty product_category_name event timestamp product_category_code product_category_desc is_strategy

9 Typical 3F / Star Schema 2. ew Dimension Data

REF PRODUCT CATEGORY TR ORDER DIM PRODUCT product_category_id tr_order_id dim_product_id product_category_name dim_product_id product_id product_category_code system order id product_code product_category_desc system root order id product desc is_strategy global root order id product_name order quantity le ng th total executed qty DIM PRODUCT CATEGORY event timestamp dim_product_category_id dim_product_category_id product_category_code product_category_name product_category_desc is_strategy namespace

10 Typical 3F / Star Schema 3. Reference 1:M to M:M Change

REF PRODUCT CATEGORY TR ORDER DIM PRODUCT product_category_id tr_order_id dim_product_id product_category_name dim_product_id product_id product_category_code system order id product_code product_category_desc system root order id product desc ??? is_strategy global root order id product_name order quantity le ng th total executed qty DIM PRODUCT CATEGORY event timestamp dim_product_category_id dim_product_category_id order owner firm id ??? product_category_code product_category_name product_category_desc is_strategy namespace

11 Typical 3F / Star Schema

4. Additional attributes to Dimension

TR ORDER DIM PRODUCT tr_order_id dim_product_id

dim_product_id product_id system order id product_code system root order id product desc global root order id product_name order quantity le ng th total executed qty height event timestamp weight dim_product_category_id packaging order owner firm id color size

12 Data Vault Approach 1. Typical Hub, Satellite and Link (Replacing Fact/Dimension)

SAT_PRODUCT01 HUB_PRODUCT sat_product01_id hub_product_id hub_product_id hub_record_source sat_load_date hub_load_date sat_end_date product_id sat_record_source product_code product desc product_name symbol HUB_ORDER SAT ORDER01 hub_order_id LNK_ORDER sat_order01_id hub_record_source lnk_order_id hub_order_id hub_load_date sat_load_date lnk_load_date tr_order_id sat_end_date lnk_record_source system order id sat_record_source hub_product_id system root order id order quantity hub_order_id global root order id total executed qty event timestamp

13 Data Vault Approach 2. ew Dimension Data

SAT_PRODUCT01 HUB_PRODUCT sat_product01_id hub_product_id hub_product_id hub_record_source sat_load_date hub_load_date sat_end_date product_id sat_record_source product_code product desc product_name symbol HUB_ORDER SAT ORDER01 hub_order_id LNK_ORDER sat_order01_id lnk_order_id hub_record_source hub_order_id hub_load_date sat_load_date lnk_load_date tr_order_id sat_end_date lnk_record_source system order id sat_record_source hub_product_id system root order id order quantity hub_order_id global root order id total executed qty event timestamp

SAT_CATEGORY01 HUB_CATEGORY sat_category01_id LNK_PRODUCT_X_CATEGORY hub_category_id hub_category_id lnk_product_x_category_id sat_load_date hub_record_source lnk_load_date sat_end_date hub_load_date lnk_record_source sat_record_source product_category_id hub_product_id product_category_name product_category_code hub_category_id product_category_desc is_strategy

14 Data Vault Approach 3. Reference 1:M to M:M Change SAT_PRODUCT01 HUB_PRODUCT sat_product01_id hub_product_id hub_product_id hub_record_source sat_load_date hub_load_date sat_end_date product_id sat_record_source product_code product desc product_name symbol HUB_ORDER SAT ORDER01 hub_order_id LNK_ORDER sat_order01_id lnk_order_id hub_record_source hub_order_id hub_load_date sat_load_date lnk_load_date tr_order_id sat_end_date lnk_record_source system order id sat_record_source hub_product_id system root order id order quantity hub_order_id global root order id total executed qty event timestamp

SAT_CATEGORY01 HUB_CATEGORY sat_category01_id LNK_PRODUCT_X_CATEGORY hub_category_id hub_category_id lnk_product_x_category_id sat_load_date hub_record_source lnk_load_date sat_end_date hub_load_date lnk_record_source sat_record_source product_category_id hub_product_id product_category_name product_category_code hub_category_id product_category_desc is_strategy

15 Data Vault Approach 4. Additional attributes to Dimension

SAT_PRODUCT01 sat_product01_id hub_product_id sat_load_date sat_end_date sat_record_source product desc HUB_PRODUCT product_name hub_product_id SAT_PRODUCT02 hub_record_source hub_load_date sat_product02_id product_id hub_product_id product_code sat_load_date sat_end_date sat_record_source length height weight packaging color

16 Temporal Data + Auditability

SAT_PRODUCT01 sat_product01_id

hub_product_id sat_load_date sat_end_date sat_record_source product desc HUB_PRODUCT product_name hub_product_id SAT_PRODUCT02 hub_record_source hub_load_date sat_product02_id product_id hub_product_id product_code sat_load_date sat_end_date sat_record_source length height weight packaging color

LNK_PRODUCT_X_CATEGORY LNK_ORDER HUB_ORDER SAT ORDER01 lnk_product_x_category_id lnk_order_id hub_order_id sat_order01_id lnk_load_date lnk_load_date hub_record_source hub_order_id lnk_record_source lnk_record_source hub_load_date sat_load_date hub_product_id hub_order_id tr_order_id sat_end_date hub_category_id hub_product_id system order id sat_record_source hub_category_id system root order id order quantity global root order id total executed qty event timestamp source_system

SAT_CATEGORY01 HUB_CATEGORY sat_category01_id

hub_category_id hub_category_id sat_load_date hub_record_source sat_end_date hub_load_date sat_record_source product_category_id product_category_name product_category_code product_category_desc is_strategy 17 Implementation of Data Vault

 Make it Simple  View for Dimension  View For Fact  Data Loading  Take advantage of Technology  DW Appliances  High-throughput storage devices  RDBMS Features

18 Make it Simple

 Create views for Dimension

From our Demo model, Join Product related Hubs and Satellites to build View for Product Dimension as shown below :

HUB_PRODUCT SAT_PRODUCT01 SAT_PRODUCT02

19 Make it Simple

 Create Views For Fact

Join Order related Hubs, Satellites and Links to build View for ORDER Fact using ORDER related Hubs/ Satellites and Links.

HUB_ORDER SAT_ORDER01 LNK_ORDER

20 Data Loading

Typically data loading for a Data Vault is in the following sequence.  Hubs for Dimensions  Links for Dimensions  Satellites for Dimensions  Hubs for Fact (if any )  Links for Fact  Satellites for Fact Refer to Data Vault Series article 5 of Linstedt for further details

21 DW Appliances

Next wave of IT Data Warehouse hardware solutions are the Data Warehouse Appliances. Take advantage of the technology where possible. Few DW Appliances in the market :  Oracle Exadata  Netezza  Teradata  RedBrick  GreenPlum

22 Highthroughput storage devices

Utilize Storage devices designed for DW / OLAP applications. Following are some examples only and would change over time. Please do your research based on your sizing requirements.  EMC CLARiiON Storage family  Texas Memory systems RamSan – Solid State devices  Hitachi USP / VSP Storage family  Other Hybrid solutions with High performance storage and Solid State devices

23 RDBMS Features

Some Oracle Features catering DW / OLAP applications:  Exadata Smart Scans  OLAP Based Materialized views  Partitioning Reference partition  Advanced data compression  Automatic degree of parallelism (ADOP).  Star query optimization  Oracle OLAP/DWH Features.

24 Conclusion In this presentation we have addressed high level overview of Data Vault Architecture. Topics discussed are Data Vault concept and architecture Data Vault Components such as Hubs, Satellites and Link tables Typical modeling challenges with traditional modeling approaches How those challenges could be handled using Data Vault Modeling Approach. Auditing and Temporal data capture using DV Approach. And finally, some implementation details If interested in learning more, try Dan Linstedt's special coaching area at: http://danLinstedt.com/my-coach.

25 Questions ?

? 26 References

 Data Vault Series articles by Dan E. Linstedt -- http://www.tdan.com  Referred few articles from Genesee Academy -- http://geneseeacademy.com  Articles and books by and from multiple sources.  Technical Documents from http://www.oracle.com/technetwork

27 Special Thanks to

 Dan Linstedt for review and corrections to the article.  My colleague Mark Bruscke for his assistance in preparing the article.

28 The End

Contact Email : [email protected]

29