Data Warehousing

When and Why to Put What Data Where The usage of ODS and platforms in a well architected

By: Rob Armstrong Teradata Corporation When and Why to Put What Data Where

Table of Contents

Executive Summary 2 Executive Summary Operational Data Stores 3 Companies are trying to overcome two seemingly conflicting Data Warehouse Core 3 challenges. The first is how to integrate more historical Data Marts 4

Conclusion 5 and cross-functional analytic processes into their front-line

About the Author 5 operations to take the most relevant actions against the most profitable customers. The second challenge is how to integrate more timely interactions and transactions into the back-end analytics so that strategy can be measured and fine tuned at an ever increasing pace. In order to balance these two objectives, companies are looking to push data to the process by incorpo- rating operational data stores and data marts into their data management architectures. Unfortunately, this constant pushing of data results in more data inconsistency, more data latency, and a more fractured understanding of the business Author’s note: This paper is a companion as a whole. Understanding the correct usage of the layers helps of the “Optimizing the Data Warehouse Layer” white paper. While this paper to alleviate some of the problems. However, the real solution describes the differences and functions of the ODS and data mart layers, the other is to integrate the data once and drive the process to the data. paper discusses how to best optimize your Having said that, and understanding the larger challenge of core data warehouse repository layer to minimize data movement, minimize data centralization will keep some on the distributed path, there are management costs, and increase reusability some guidelines to understanding how and when to use the and end-user accessibility. The companion paper is available upon request. architecture layers in an efficient, as well as an effective manner.

EB-5512 > 1207 > PAGE 2 OF 5 When and Why to Put What Data Where

Operational Data Stores processes access the ODS, package the order, term the “hub” or “core” data warehouse. and then send the package for distribution. Others have called this the enterprise data The operational data store is typically used The distribution processes access the ODS warehouse or the centralized data ware- as a staging area for transactional data. for mailing information and so forth. Once house. Whatever the name, it means the This is a good use of this layer. Unfortu- the transaction is completely processed same. This is where data are brought nately, the ODS often morphs into the the data would be useful in analytics together to provide the integration and reporting and access point for what is across many of the corporate business single version of the truth to the data which deemed operational data. As the front-end units, especially when combined with drive business analytics and processes. processes require more historical or cross- other corporate data. This is when the functional data, these structures become The core is the integration point for transaction needs to be moved from the the problem rather than the solution. data – While many believe the goal of ODS into a more cross-functional, fully- the data warehouse is consolidation of ODS contains data from transactions integrated data warehouse structure. data, the real effort is to integrate the data. that are ‘in flight’ – Many transactional ODS contains data not available else- Rather than simply bringing the data into systems generate data that are very raw in where – This is the key to keeping data a common platform, the data must be nature. The data may be used throughout consistent and auditable. Once the data modeled and integrated to preserve data the data within the operational systems are moved from the ODS, they need to be relationships and provide an application before the data are considered ‘set’.An deleted from the ODS. Any subsequent agnostic body of data. By having the data example would be an order process where process needs to access the data from the integrated and not compromised to any the initial order is a very high-level capture enterprise data warehouse. The data are particular application, the environment of the items and then, once the order now considered cleansed and integrated. becomes prepared to handle not only the is completed, there is validity checking, The example here is the infamous opera- known, but the unknown, analytics and inventory checking, and or tional report, which informs management processes. This will enable the business matching before the order is considered about order processing commitments and and operational processes to identify weak complete and moves on to fulfillment. metrics. These are after the fact reports and points sooner and respond quicker to new ODS contains data not necessary for can be accessed easily from the warehouse. requirements. cross-functional processes – Once the While it is called an operational report, The core contains the historical relevance data are considered set, the next question it does not need to be generated by an and relationships – One of the great is, “Who needs to see these data?” If this is operational data store. This is confusing challenges, and a driver for pushing data simple workflow process, then containing the process with the platform. out to data marts, is the need to preserve the data in the ODS and sending messages the what was, what is analytics as reference to other organizations is much easier than Data Warehouse Core data changes. Questions, such as “How did physically moving the data to yet another As data are needed for historical analytics, Region one do last year when compared staging area. To continue our example cross-functional processes, or simple to this year?”,need to have consistency if above, the order is taken and validated. storage for easy access and ongoing branch locations have been reassigned. A message is sent to fulfillment where the management, they belong in what some The optimal way to accomplish this is

EB-5512 > 1207 > PAGE 3 OF 5 When and Why to Put What Data Where

through effective dates being incorporated Data Marts hundreds of times, then it makes sense to into the reference tables. Since the histori- generate that report once from the detail The data mart is the analytical equivalent cal detailed data reside in the core, and the and post the result to a data mart. A word of the ODS. These are subset extracts reference data can be managed in the core of caution on this usage is necessary. There of data for well-defined analytics and reference tables, analytics are allowed to is a difference between users asking for the perceived higher performance. There are run any point of time reporting. The other same report and asking for a similar report. many political reasons for the proliferation benefit to having historical relevance in the If each buyer is asking for only their of data marts though, in practice, they core data model is the enabling of Master products, then it is not the same report. increase data latency, increase time to Data Management across the organization. Many companies often build data marts deliver on new business needs, and create much larger than the detailed data because The core is where process can come to more confusion due to the multiple they must aggregate for every possible data – This is the real point of the core data versions of the truth coming from the scenario. In reality, letting each user ask warehouse layer. As the data are brought multitude of data marts. for their small segment against the detailed into the repository at ever increasing Data Marts are for data considered static data will result in lower CPU and disk frequencies and integrated with ever more in nature – One of the main problems utilization as opposed to generating every subject areas, they become the de facto with data marts is the constant updating possible answer set that may or may not memory of the corporation. After spend- as the underlying detailed data are main- be accessed. ing all the time and money to get the data tained. Data marts are best utilized when integrated and accessible, why would you Data mart cost must be justified by the the data extracted and transformed from want to add yet another extract, trans- ensuing business value – The real ques- the base detail will not change, or change form, and transmit process? Once the data tion for the data marts is whether or not minimally, in the future. Examples of this are in the core, companies should use the they are worth the cost. Unfortunately would be end of month or quarter report- capabilities and optimizations in the data- many companies never take this into ing, householding metrics to combine base engine to minimize further movement. consideration. Data marts exist to provide multiple customers to a single identifier, All of this would create an environment the higher performance so, the question or corporate reporting for government of minimal data latency and redundancy, is, what actions are being taken because regulations. Data that are constantly maximized data reuse and auditing, and of that better response time? Are the users changing would significantly increase the decreased time to develop new analytics or taking different actions, or taking actions cost of the data marts, as well as decrease operational processes. Of course, all of this sooner, because they have data more the data availability as the data mart is will depend on the engine being accessible and available? One example either out of synchronization with the detail able to manage a volatile, mixed workload would be a process to advise about the or unavailable due to the updating process. against an application-neutral data model. next best offer. If the query takes five If the engine is not capable of this, then Data Marts are for repetitive, and pre- minutes when going against detailed the question you have to ask is whether it dictable reporting – The other major use data, then the offer will not be timely to a is your tools or your requirements that are for data marts is when there is a known set customer in the store, at a kiosk, or when holding you back? of reporting needs across a wide body of calling into the company. However, if the users. If the same report will be requested data are preplanned and in a data mart to

EB-5512 > 1207 > PAGE 4 OF 5 When and Why to Put What Data Where

Teradata.com

provide two second response, then that issue, not all data are resident there, or technical processes such as ETL, query application could be used in a call center some other explanation? Log the reasons tools, and performance management, as and be much more effective in getting and then, as they are resolved, work to well as the corporate processes of data customers to accept the offer. The business move the process back to the data rather quality and funding. In order to drive an should be able to quantify the benefit to than continuing to push the data to the enterprise solution, there must be an offset the additional IT cost of the creation process. enterprise mindset. Total end-to-end and ongoing management of the data mart. performance means that when adding Solve the Real Problems indexes, you also incorporate the time it While it is often easier to just move the data Conclusion takes to update and manage the index into to resolve an outstanding issue, it is not your costs and latency consideration. Proper usage of the various layers is critical always the best solution. Performance and Creating new ODS tables means that you in the long-term success of complete data latency issues can also be rooted in poor not only have to add maintenance time, warehouse architecture. It is important to data quality and questionable data model- but you may now have duplicate data that understand the goal is to minimize the ing choices. When application development need auditing and reconciliation. A last data movement and increase user accessi- has to either reconcile data quality or code example is funding. There needs to be a bility. This will drive lower cost to manage around inconsistent data models, perform- mindset shift away from project funding to and deliver new capability while increasing ance will suffer. Rather than take the easy program funding. After all, which project the timeliness of data and the relevancy of fix of moving the data, which will still need funds data quality and integration? These actions. Here are a few recommendations to be reconciled and made consistent, go core infrastructure processes and integra- to help in your total evolution and usage back and fix the underlying problems. tion points need to be incorporated into of the data warehouse architecture layers. Fixing it once, correctly, will increase the new projects to ensure the enterprise – time to market for all future processes, Ask Why Not? not just the application – succeeds. fixing it in the application will mean that Too often, the decision about where to every new process will need to replicate put the data and how to organize them is About the Author efforts and most likely lead to even further based on a very narrow view of applica- data inconsistency and delay. Rob Armstrong is director of Teradata’s tion and understanding. Many operational Data Warehousing group and has more application developers have been trained Provide Enterprise-wide, End- than 20 years’ experience justifying, to think the data warehouse repository is to-end Solutions designing, and implementing data ware- not for quick access or recent data. Rather Many companies have a goal of an enter- house systems. He received a BA in than make arguments as to why the data prise data warehouse with timely, integrated economics with an emphasis on statistics need to be moved from the core, it is best data being accessible directly by the end- and relational theory from the University to ask why the data cannot reside in the user community. However, many of those of California, San Diego. Contact him at core. Is it a latency issue, performance same companies will isolate crucial [email protected].

Teradata is a registered trademark of Teradata Corporation and/or its affiliates in the U.S. and worldwide. Teradata continually improves products as new technologies and components become available. Teradata, therefore, reserves the right to change specifications without prior notice. All features, functions, and operations described herein may not be marketed in all parts of the world. Consult your Teradata representative or Teradata.com for more information. Copyright © 2007 by Teradata Corporation All Rights Reserved. Produced in U.S.A.

EB-5512 > 1207 > PAGE 5 OF 5