When and Why to Put What Data Where the Usage of ODS and Data Mart Platforms in a Well Architected Data Warehouse
Total Page:16
File Type:pdf, Size:1020Kb
Data Warehousing When and Why to Put What Data Where The usage of ODS and data mart platforms in a well architected data warehouse By: Rob Armstrong Teradata Corporation When and Why to Put What Data Where Table of Contents Executive Summary 2 Executive Summary Operational Data Stores 3 Companies are trying to overcome two seemingly conflicting Data Warehouse Core 3 challenges. The first is how to integrate more historical Data Marts 4 Conclusion 5 and cross-functional analytic processes into their front-line About the Author 5 operations to take the most relevant actions against the most profitable customers. The second challenge is how to integrate more timely interactions and transactions into the back-end analytics so that strategy can be measured and fine tuned at an ever increasing pace. In order to balance these two objectives, companies are looking to push data to the process by incorpo- rating operational data stores and data marts into their data management architectures. Unfortunately, this constant pushing of data results in more data inconsistency, more data latency, and a more fractured understanding of the business Author’s note: This paper is a companion as a whole. Understanding the correct usage of the layers helps of the “Optimizing the Data Warehouse Layer” white paper. While this paper to alleviate some of the problems. However, the real solution describes the differences and functions of the ODS and data mart layers, the other is to integrate the data once and drive the process to the data. paper discusses how to best optimize your Having said that, and understanding the larger challenge of core data warehouse repository layer to minimize data movement, minimize data centralization will keep some on the distributed path, there are management costs, and increase reusability some guidelines to understanding how and when to use the and end-user accessibility. The companion paper is available upon request. architecture layers in an efficient, as well as an effective manner. EB-5512 > 1207 > PAGE 2 OF 5 When and Why to Put What Data Where Operational Data Stores processes access the ODS, package the order, term the “hub” or “core” data warehouse. and then send the package for distribution. Others have called this the enterprise data The operational data store is typically used The distribution processes access the ODS warehouse or the centralized data ware- as a staging area for transactional data. for mailing information and so forth. Once house. Whatever the name, it means the This is a good use of this layer. Unfortu- the transaction is completely processed same. This is where data are brought nately, the ODS often morphs into the the data would be useful in analytics together to provide the integration and reporting and access point for what is across many of the corporate business single version of the truth to the data which deemed operational data. As the front-end units, especially when combined with drive business analytics and processes. processes require more historical or cross- other corporate data. This is when the functional data, these structures become The core is the integration point for transaction needs to be moved from the the problem rather than the solution. data – While many believe the goal of ODS into a more cross-functional, fully- the data warehouse is consolidation of ODS contains data from transactions integrated data warehouse structure. data, the real effort is to integrate the data. that are ‘in flight’ – Many transactional ODS contains data not available else- Rather than simply bringing the data into systems generate data that are very raw in where – This is the key to keeping data a common platform, the data must be nature. The data may be used throughout consistent and auditable. Once the data modeled and integrated to preserve data the data within the operational systems are moved from the ODS, they need to be relationships and provide an application before the data are considered ‘set’.An deleted from the ODS. Any subsequent agnostic body of data. By having the data example would be an order process where process needs to access the data from the integrated and not compromised to any the initial order is a very high-level capture enterprise data warehouse. The data are particular application, the environment of the items and then, once the order now considered cleansed and integrated. becomes prepared to handle not only the is completed, there is validity checking, The example here is the infamous opera- known, but the unknown, analytics and inventory checking, and data cleansing or tional report, which informs management processes. This will enable the business matching before the order is considered about order processing commitments and and operational processes to identify weak complete and moves on to fulfillment. metrics. These are after the fact reports and points sooner and respond quicker to new ODS contains data not necessary for can be accessed easily from the warehouse. requirements. cross-functional processes – Once the While it is called an operational report, The core contains the historical relevance data are considered set, the next question it does not need to be generated by an and relationships – One of the great is, “Who needs to see these data?” If this is operational data store. This is confusing challenges, and a driver for pushing data simple workflow process, then containing the process with the platform. out to data marts, is the need to preserve the data in the ODS and sending messages the what was, what is analytics as reference to other organizations is much easier than Data Warehouse Core data changes. Questions, such as “How did physically moving the data to yet another As data are needed for historical analytics, Region one do last year when compared staging area. To continue our example cross-functional processes, or simple to this year?”,need to have consistency if above, the order is taken and validated. storage for easy access and ongoing branch locations have been reassigned. A message is sent to fulfillment where the management, they belong in what some The optimal way to accomplish this is EB-5512 > 1207 > PAGE 3 OF 5 When and Why to Put What Data Where through effective dates being incorporated Data Marts hundreds of times, then it makes sense to into the reference tables. Since the histori- generate that report once from the detail The data mart is the analytical equivalent cal detailed data reside in the core, and the and post the result to a data mart. A word of the ODS. These are subset extracts reference data can be managed in the core of caution on this usage is necessary. There of data for well-defined analytics and reference tables, analytics are allowed to is a difference between users asking for the perceived higher performance. There are run any point of time reporting. The other same report and asking for a similar report. many political reasons for the proliferation benefit to having historical relevance in the If each buyer is asking for only their of data marts though, in practice, they core data model is the enabling of Master products, then it is not the same report. increase data latency, increase time to Data Management across the organization. Many companies often build data marts deliver on new business needs, and create much larger than the detailed data because The core is where process can come to more confusion due to the multiple they must aggregate for every possible data – This is the real point of the core data versions of the truth coming from the scenario. In reality, letting each user ask warehouse layer. As the data are brought multitude of data marts. for their small segment against the detailed into the repository at ever increasing Data Marts are for data considered static data will result in lower CPU and disk frequencies and integrated with ever more in nature – One of the main problems utilization as opposed to generating every subject areas, they become the de facto with data marts is the constant updating possible answer set that may or may not memory of the corporation. After spend- as the underlying detailed data are main- be accessed. ing all the time and money to get the data tained. Data marts are best utilized when integrated and accessible, why would you Data mart cost must be justified by the the data extracted and transformed from want to add yet another extract, trans- ensuing business value – The real ques- the base detail will not change, or change form, and transmit process? Once the data tion for the data marts is whether or not minimally, in the future. Examples of this are in the core, companies should use the they are worth the cost. Unfortunately would be end of month or quarter report- capabilities and optimizations in the data- many companies never take this into ing, householding metrics to combine base engine to minimize further movement. consideration. Data marts exist to provide multiple customers to a single identifier, All of this would create an environment the higher performance so, the question or corporate reporting for government of minimal data latency and redundancy, is, what actions are being taken because regulations. Data that are constantly maximized data reuse and auditing, and of that better response time? Are the users changing would significantly increase the decreased time to develop new analytics or taking different actions, or taking actions cost of the data marts, as well as decrease operational processes. Of course, all of this sooner, because they have data more the data availability as the data mart is will depend on the database engine being accessible and available? One example either out of synchronization with the detail able to manage a volatile, mixed workload would be a process to advise about the or unavailable due to the updating process.