
Building the for Analytic Competition: Why the Architecture Foundation is so Critical to Success

Prepared by William McKnight www.mcknightcg.com

Sponsored by

Contents

Information Architecture Defined
Definition of Information Architecture
An Architecture Framework: Teradata’s Approach
Design Patterns and Implementation Alternatives
Architecture Principles and Advocated Positions
Balancing Acts: Delivery Versus Architecture
Architecture Development and Information Management Possibilities
Harnessing Workloads
What Determines the Success of a Workload?
Platform Selection Process
The No-Reference Architecture
The Analytic Ecosystem
The Building Blocks of Analytic Competition
Teradata Analytic Architecture Technology
Teradata Analytic Architecture Solution Model
A Consistent Approach Ensures Delivery

Information Architecture Defined

Definition of Information Architecture

Lost amid the conversation on big data and the accelerating advancement of just about every aspect of enterprise that manages information are the things that hold it all together. Yet this is critical: information-management components must come together in a meaningful fashion or there will be unneeded redundancy and waste and opportunities missed. Considering that optimizing the information asset goes directly to the organization’s bottom line, it behooves us to play an exceptional game, not a haphazard one, with our technology building blocks.

The glue that brings components together is called “architecture”: the high-level plan for the data stores, the applications that use the data, and everything in-between. The “everything in-between” can be quite extensive as that relates to data transport, middleware, and transformation. Architecture dictates the level of data redundancy, summarization, and aggregation since data can be consolidated or distributed across numerous data stores optimized for parochial needs, broad-ranging needs and innumerable variations in between.

There must be a true north for enterprise information architecture. There needs to be a process to vet practices and ideas that accumulate in the industry and the enterprise, and assess their applicability to the architecture. We define this body of possibilities in terms of “Design Patterns,” “Implementation Alternatives,” “Architecture Principles” and “Advocated Positions.” These concepts will be defined later in this paper, but what is important to understand upfront is that analytic success requires focused attention on information architecture.

Analytics, not reporting, is forming the basis of competition today. Rearview-mirror reporting can be essential in support of operational needs. However, the large payback from information undoubtedly comes in the form of analytics.

EB-7592 > 0513 > PAGE 2 OF 11

An Architecture Framework: Teradata’s Approach

Architecture is immensely important to information success, and thus the recipe for that success begins with a good, well-rounded and complete architectural approach. You can architect an environment in a way that encourages data use by making it perform well, putting up the architecture/data quickly, and having minimal impact on users and budgets for ongoing maintenance by building it well from the beginning.

Any or all of these requirements can quickly send users retreating to the safety of status quo information usage, instead of taking on what might seem like a formidable challenge of progressive usage. But consider that in the small windows of time most users have to engage with available data, they can only reach a certain level of depth with the information. If the data is architected well, that analysis will be deep, insightful and profitable. That is the power of architecture.

If your service provider’s approach does not reflect this, the result will be less than successful. Conversely, let’s look at Teradata’s approach.

Teradata defines its Architecture Framework using the BIAS approach, which consists of a focus on four key components that comprise architecture, as well as two components that make it all work together:

1. Business Architecture
2. Information Architecture
3. Application Architecture
4. Systems Architecture
5. Enablement
6. Program Management

Teradata defines the Business Architecture as understanding the business requirements and providing vision to those requirements. It has to do with defining the organizational business model, structures, missions, goals, and processes, and understanding which business fundamentals are vital for organizational success.

You can architect for known requirements effectively only by understanding the context of eventual requirements.

The trajectory of systems in an organization is never a linear projection from a near-recent state to a current state through known requirements. It must include contingencies for the unknown and for the forked paths that systems can take in an organization. It must impute vision derived from similar organizations, especially more advanced and progressive ones. You do not invest in architecture to be status quo; you expect business success, supported by architecture. Business Architecture is supported by Information Architecture and Application Architecture.

Teradata’s Information Architecture supports Business Architecture through storing or otherwise processing the data that is required, both internally and externally generated. Information Architecture must take into consideration the numerous avenues for data today. Data must be put in the best place to succeed, which primarily means it must be enabled quickly, well-performing, and scalable. Information Architecture identifies the data (and the state of the data) needed to support the Business Architecture and includes logical and physical data models, and is supported by Systems Architecture.

Like Information Architecture, Teradata’s Application Architecture can subdivide applications in many ways. Application Architecture uses Information Architecture and Systems Architecture to support Business Architecture. While applications execute the functional side of the Business Architecture, effective cross-referencing of applications to the required tools and other applications is an important component of the Application Architecture challenge.

Where the architecture rubber meets the road in Teradata’s approach is Systems Architecture. This is the physical manifestation of architecture: the base upon which Information Architecture and Applications Architecture reside and deliver for the Business Architecture. Like in other areas, Systems Architecture has the issues of subdivision and optimization.

Business, Information, Applications and Systems Architecture are each disciplines unto themselves and may be optimized individually. But they must be prioritized through Enablement.

According to Teradata, “Enablement evaluates cultural and organizational readiness for the architectural advances and prioritizes resources and work effort accordingly. Enablement adds data management capabilities with each implementation, such as a data quality improvement program, a data governance capability or one of the ones reviewed below, that support current and future information initiatives.”

Much of the work building architecture for analytic competition should include “soft” factors like Enablement, especially early in the process.

Enablement addresses where organizations are weak and reasons they may fail.

Finally, according to Teradata, it is overall Program Management that will intelligently bring everything together into meaningful interim points that deliver analytics to address organizational goals in an agile fashion. Program Management extends throughout all implementations and ensures consistency and continuity among many projects and players.

In summary, Teradata has a comprehensive approach to information architecture. It acknowledges the importance of architecture and skillfully decomposes architecture into layers that can be discretely worked on in context of a full approach.

Design Patterns and Implementation Alternatives

In daily information management activity, decisions are made with high frequency and major decisions are never far away. In order to support those decisions with program context and unbiased wisdom, it is necessary to make and implement design choices. To accomplish this, Teradata suggests addressing what it calls Design Patterns and Implementation Alternatives.

Design Patterns, according to Teradata, are a set of proven architectural options for meeting an array of requirements. They are reusable approaches to solve commonly occurring problems, whether they are affecting a program at present or are those that should be anticipated. It is important to have alternatives laid out for different situations that are likely to be encountered, and plan them out with an appropriate level of nuance and understanding of the pros and cons of architectural decisions.

While leaving room for personal judgment, which is always necessary, Teradata’s Design Patterns and their physical side, Implementation Alternatives, provide a strong basis for decision-making. This basis can be very beneficial in aligning people with ultimate decisions. If left to an unsupported process, decisions would not only take longer, they would be less accepted.

Design Patterns and Implementation Alternatives enable program agility and appropriately shift some balance in what constitutes success away from simply decision-making to the execution of decisions.

Teradata’s Design Patterns and Implementation Alternatives reduce the chances of failure by enabling a shop with alternatives thought out in advance, without the pressure of an impending sprint deadline. So why fail, even if it is “fast”? Well thought-out Design Patterns and Implementation Alternatives enable speed and reduce the chances for failure.

Architecture Principles and Advocated Positions

While Design Patterns and Implementation Alternatives are actionable, they are built upon what Teradata refers to as Architectural Principles and Advocated Positions.

These beliefs about information and how things should be done will change less frequently and may be advocated from higher company positions than the Design Patterns and Implementation Alternatives. Advocated Positions help balance between short- and long-term tradeoffs. They are the bedrock upon which everything in the program flows; it is essential to get these right, then ensure that the Design Patterns and Implementation Alternatives are a correct interpretation of the positions.

One of Teradata’s most important Advocated Positions is to prioritize data access over data loading. Although both areas can have performance issues, users (customers) of the analytic infrastructure will always prioritize the time they are interfacing with the data over the currency of the data. While layers of intake and distribution may be physically separated in a data warehouse, and thus able to be optimized for purpose, it is the overall architecture that should first be optimized for data access. Today, that analytic architecture extends well beyond the data warehouse, increasing the need for architecture.

Architecture is about facilitating prioritized data access, not done for its own sake or to satisfy an abstract standard.

You need a process to make decisions as much as you need the decisions themselves. With Architectural Principles and Advocated Positions, Teradata has completely encapsulated the necessary decision-making side of analytic architecture.

Architectural decision-making during development occurs with high frequency, but peaks at the beginning of an effort when decisions are made about what will be done in the sprint, and how. The team then should be able to know what is needed from previous architecture decisions about their work and be empowered to deliver. Architecture provides proven, reusable components to accelerate development time.

Teradata’s Advocated Positions Include:

• Load everything into the core physical data model
• Touch it, take it (extract all columns)
• Reversibility of data errors out of the core physical data model
• Reusability of common components
• Traceability of core data to its originating source system
• Collect metadata, both technical and business
• Abstracted core physical data model from business usage
• Include acquisition/staging layer in the architecture
• No production reporting from non-production systems
• Integrated logical and physical data models
• Permanently archive everything
• Enforce referential integrity
• Prioritize data access over data loading
• Full copy of source data objects in acquisition area
• A single route for data to flow into the core physical model
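Three of the Advocated Positions named in this section (traceability of core data to its source system, reversibility of data errors, and a single route into the core model) lend themselves to a small illustration. The following is a minimal sketch only, not Teradata’s implementation: it uses SQLite as a stand-in for the core physical data model, and the table `core_customer`, its columns, and the functions `create_core`, `load_batch`, `reverse_batch` and `trace` are all hypothetical names chosen for the example.

```python
import sqlite3


def create_core(conn):
    """Create a hypothetical core table that carries lineage columns."""
    conn.execute(
        """CREATE TABLE core_customer (
               customer_id   INTEGER,
               name          TEXT,
               source_system TEXT NOT NULL,  -- traceability to the source
               load_batch_id TEXT NOT NULL   -- enables reversal of a bad load
           )"""
    )


def load_batch(conn, source_system, batch_id, rows):
    """The single route into the core table: every row is stamped with lineage."""
    conn.executemany(
        "INSERT INTO core_customer VALUES (?, ?, ?, ?)",
        [(cid, name, source_system, batch_id) for cid, name in rows],
    )
    conn.commit()


def reverse_batch(conn, batch_id):
    """Back out one load batch and return the number of rows removed."""
    cur = conn.execute(
        "DELETE FROM core_customer WHERE load_batch_id = ?", (batch_id,)
    )
    conn.commit()
    return cur.rowcount


def trace(conn, customer_id):
    """Return (source_system, load_batch_id) pairs for one customer."""
    return conn.execute(
        "SELECT source_system, load_batch_id FROM core_customer "
        "WHERE customer_id = ? ORDER BY load_batch_id",
        (customer_id,),
    ).fetchall()
```

Because all loads pass through `load_batch`, a faulty batch can be removed with one call to `reverse_batch` without touching rows from other sources. A production design would also need to handle updates and history, which this sketch leaves out.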

Balancing Acts: Delivery Versus Architecture

Even business leaders can tend to take a tactical approach to the execution of the requirements. However, it does not necessarily take longer to satisfy information requirements in an architected fashion. If architecture principles and technology possibilities are not on the table beforehand, the means to satisfy the last requirement may be used to satisfy a new requirement. This may or may not be appropriate.

This also disconnects the solution from prior solutions that may lead the way to requirement satisfaction. For example, shops with countless multidimensional structures (and with more being built on almost a daily basis) can readily attest to a need for architecture. By taking a disciplined architectural approach, we have found that we are in a better position to solve the next business problem now.

Teradata Unified Data Architecture™

When organizations put all their data to work, they make smarter decisions and create a new data-driven approach to improving their business. Through deeper insights about customers and operations, the data delivers competitive advantage for leading organizations that are able to compete on analytics by leveraging all their data. Companies should exploit this market opportunity to compete on analytics by creating a strong analytic foundation based on a comprehensive that leverages existing, new, and emerging technologies. This architecture should contain three main capabilities:

• Data Warehousing: Integrated and shared data environments to manage the business, and deliver strategic and operational analytics to the extended organization

• Data Discovery: Discovery analytics to rapidly unlock insights from big data through rapid exploration using a variety of analytic techniques that are accessible by mainstream business analysts

• Data Staging: Loading, storing, and refining data in preparation for analytics

Teradata has responded to this market need by developing the Teradata® Unified Data Architecture™, which allows organizations to leverage the complementary values of the Teradata® , Teradata Aster SQL-MapReduce®, and open-source Hadoop® technologies. This Unified Data Architecture™ helps companies define and deploy an architecture that makes use of these best-of-breed technologies in a way that unleashes the value of their data. Companies can apply the right technology to the right analytical opportunities so business users can isolate intelligent signals, and have an architecture for analytic decisions.

Architecture Development and Information Management Possibilities

There is a need for architecture that falls outside of captive project timeframes and may seem somewhat removed from user requirements, at least to users. However, the architecture requirements outlined here play a vital role in delivering user requirements. They are a skillful interpretation of user requirements.

The best way to look at an analytics program is as a series of architecture sprints. Taking on analytics as architecture means analytics will be done to internally adjudicated current standards and built to company priorities.

Architecture requires its own codified efforts. The continuous activity of information management is architecture. With discipline, Teradata Design Patterns and Implementation Alternatives as well as Architecture Principles and Advocated Positions will be continually used over time, providing ongoing value by limiting risk and not reinventing the wheel.

Without architecture, analytic development is destined for high levels of wasted effort, restarts, redundancy and, most damaging, missed opportunity.

Information Management is nothing more than the continuous activity of architecture.

Harnessing Workloads

Workloads comprise functionality necessary to achieve with data, as well as the management of the data itself. Harnessing workloads for allocation to an architecture component is both an art and a science. There are user communities with a list of requirements upon a set of data. There are other user communities with their own list of requirements on the same data. Is this one workload? If ultimately it is best to store the data in one location and use the same tool(s) to satisfy the requirements, the practical answer is “yes.”

When does the “set of data” end and become a different workload? It could, practically speaking, be when a new data store is appropriate. Harnessing workloads can be puzzling, but ultimately workloads need to be ring-fenced for architecture purposes.

What Determines the Success of a Workload?

Many technology types have emerged in recent years to support the idea that analytic data needs to perform, which is the primary means of judging the success of a workload. As previously mentioned, it is the performance of the data access that constitutes the performance of a workload.

Getting to fast performance quickly is the second measure of the success of an analytic workload. In the end, if the good performance goes away quickly because the application is not scaling, all would be for naught. The third measure of workload success is scale. Note that this does not mean the initial Systems Architecture must last forever untouched. It does mean that Systems Architecture is maintained without user impact. As far as they are concerned, it hums along. Architecture component selection is more important than ever because it must scale with exponentially increasing data volumes and user requirements.
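The practical rule given under Harnessing Workloads (requirements on the same data that are best served from one location with the same tools form one workload, while a new data store marks a different one) can be sketched in a few lines. This is an illustrative assumption about how one might record that judgment, not a method from Teradata; the `Requirement` fields and the `ring_fence` function are hypothetical names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Requirement:
    """One user community's need, with the store and tool judged best for it."""
    community: str
    data_set: str
    best_store: str
    tool: str


def ring_fence(requirements):
    """Group requirements into workloads.

    Requirements that share a data set, a best-fit data store and a tool
    are treated as one workload; a different store or tool starts a new one.
    """
    workloads = defaultdict(list)
    for req in requirements:
        key = (req.data_set, req.best_store, req.tool)
        workloads[key].append(req.community)
    return dict(workloads)
```

Deciding the `best_store` and `tool` for each requirement remains the "art" part of the exercise; the grouping only makes the resulting boundaries explicit.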

Platform Selection Process

Architecture is important, practical, and holistic, and drives analytic and organizational success.

Many companies are not having success with their workloads due to a lack of focus on architecture. Specifically, if the analytic architecture possibilities are not known or considered for a workload, it is quite likely that the platform used for the last workload will be used again for the new workload. The more the platform possibilities are considered for the workload, the better the chance for success of that workload.

There are many platform categories (each designed for specific types of workloads) for storing data in the analytic architecture. These will be discussed in the next section. There is no “one size fits all” when it comes to platform selection. There is a best platform for each workload and the odds of workload success go up tremendously if the correct platform is selected.

Proceeding with analytics without an architecture approach is like trying to solve a Rubik’s Cube blindfolded. Sure, some extraordinary people, with extensive practice, can do it, but why make it so hard?

The No-Reference Architecture

We are in the post-reference architecture era of information management. The 1990s were the decade of vendors going in and out of shops holding up laminated, uncustomized reference architectures and convincing clients to strive to attain that picture [1]. Once they did, it was assumed, all their problems would be solved. It was also more palatable to the technology manager to hold out a technical standard to hit, as opposed to suggesting he must hit business goals with architecture.

An analytic architecture approach keeps business goals foremost in mind. This also means that all shops will manifest different . That “reference” architecture will also continually change. Leadership must have an agile mindset to keep it updated. This is the essence of “no-reference” architecture. It is not definable in laminate. It is empowered with support components to meet all foreseeable business goals and it will change to meet those goals. And it considers all possibilities, knowing that it controls one of the most important assets that the company has (information) and one of the most important means of modern competition (analytics).

The Analytic Ecosystem

Analytics do not solely exist in the post-operational world. As a matter of fact, the whole notion of a hard boundary between operational (characterized by the ERP) and the post-operational (characterized by the data warehouse) is going away. Analytics certainly can be operational. So can Business Intelligence (BI). So much of what we’ve learned with post-operational BI is now being applied to the operational environment in the form of operational BI like operational dashboards, stream processing, and master .

However, we must distinguish between creating and using analytics. Analytics are used everywhere and should be generated from data created everywhere.

We must get beyond making that default data store selection discussed earlier. We must have knowledge of, and consider, a list of usual suspects for analytic workloads. It includes:

1. The relational data warehouse, augmented with columnar capabilities
2. An analytic database management system
3. A data warehouse appliance

Leading examples of these data stores will be examined in the next section. For now, let us emphasize the interplay of the analytic components. There are no set rules for how data will flow in the analytic architecture.

1. Some vendors still do this.

While directionally the data warehouse will feed data marts, there will be marts that do the reverse and stand alone. There are applications that need unadulterated source data, not data that has gone through the data warehouse first. Even if the data warehouse certifiably does not alter the data, applications in audit, security, and the like will prefer the nondependent (on the data warehouse) data mart.

This is not to say that nondependent data marts do not happen otherwise. They do. If the architecture is not sound and a focus of the program, the value-add of data passing through the data warehouse will not be clear. Architecture, and therefore ultimately business, may take a hit in these environments.

Analytic database management systems such as Teradata Aster’s (discussed in the next section) may also play a strong role in the post-operational analytic environment. Though these systems do not replace the data warehouse, they store the increasingly important unstructured and semi-structured data of an organization. This is data that largely has been ignored or force fit into relational structure over the years, to mixed results.

Obviously all of this big data will not be replicated into the data warehouse, so interplay between the warehouse and the analytic database management system is a must. This gets back to the support components mentioned earlier.

Data warehouse appliances, however, could play the role of the data warehouse, minimally in terms of intake and distribution in the analytic environment, and storing history data. The data warehouse appliance, in some circumstances, could play this data warehouse-like role.

The other role necessary in the analytic environment is access. The role of access is perhaps the most complex. Data is distributed from the data warehouse and other platforms to the best platform for the data access in an architected environment.

It is important to work with a company that understands the methodology and components of architecture, and has the experience to help create an analytic organization.

The Building Blocks of Analytic Competition

Understanding the meaning and importance of architecture is not enough. It is imperative to implement the analytic environment with an architecture focus. This doesn’t happen by accident.

Likewise, moving forward in an analytic program with agility means bringing support components to the table. And just as we need to leverage the support components, we need to leverage our partner for the analytic architecture. The partner should bring extensive architectural understanding and experience, and the right components to bear to create the proper analytic environment.

These components include not only technology, but also a portfolio of “jump starts” for the use of the technology. In the case of Teradata, all needed components are already in place, integrated, and delivering world-class analytic organizations all over the world with the BIAS approach.

Teradata Analytic Architecture Technology

Teradata’s offerings undoubtedly stand out for data warehouse and data mart appliance platforms. Its Active Enterprise Data Warehouse line, based on the Teradata® Database, supports more than 50 percent of large-scale data warehouses today. All database functions in Teradata systems are always done in parallel, using multiple server nodes and disks with all units of parallelism participating in each database function.

Teradata Optimizer is grounded in the knowledge that every query will be executing on a massively parallel processing system. Teradata manages contending requirements for resources through dynamic resource prioritization that is customizable by the customer. The server-nodes interconnect was designed specifically for a parallel processing multi-node environment. This interconnect is a linearly scalable, high-performance, fault-tolerant, self-configuring, multi-stage network.

In Teradata 14, Teradata added columnar structure to a table, effectively mixing row, column, and multi-column structures directly in the DBMS. With intelligent exploitation of Teradata Columnar, there is no longer the need to go outside the data warehouse DBMS for the power of performance that columnar provides, and it is no longer necessary to sacrifice robustness and support in the DBMS that holds the post-operational data to get the advantages of columnar.

Teradata has extended its leadership from its EDWs into its appliance family for midmarket enterprise EDWs, as well as data marts for large companies.

The Teradata Data Warehouse Appliance supports the EDW approach to building the data warehouse and is the Teradata appliance family flagship product. It is suitable for an upper-midmarket true EDW or as the platform for a focused application. The Teradata Data Mart Appliance is a more limited-capacity equivalent of the Teradata Data Warehouse Appliance and is ideal for the departmental or midmarket platform. The Teradata Extreme Data Appliance is also part of the Teradata appliance family and represents affordability for the management of large quantities of data.

Teradata Aster’s analytic database management system has patent-pending In-Database MapReduce (MR), a hybrid row/column store with an MR approach. Its MPP architecture makes it work for predictable as well as ad-hoc analytic use cases. It blends the performance of a relational database (i.e., indexes, optimizers, and more) with the programming flexibility of MapReduce (Java, Perl, Python, .Net, etc.).

Teradata Analytic Architecture Solution Model

A semantic data model is a set of symbols and text describing the information needed to answer a defined set of business questions. It is a representation of the access layer whose purpose is to improve the simplicity, security, and speed of the data warehouse.

Its characteristics are:

• Usually dimensional
• Often implemented through views
• Easy and quick access to data
• Variety of ways to look at the same data
• Primary point of entry for BI tools
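As a rough illustration of the “often implemented through views” and “variety of ways to look at the same data” characteristics, the sketch below builds two dimensional views over a single integrated table. It is an assumption-laden example rather than a Teradata design: SQLite stands in for the warehouse DBMS, and the table `orders`, the views `sales_by_region` and `sales_by_month`, and the function `build_semantic_layer` are hypothetical names. The two metrics echo the kinds of measures a dimensional model focuses on.

```python
import sqlite3


def build_semantic_layer():
    """Create a toy integrated data layer plus two semantic views over it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Integrated data layer: each order is stored exactly one way.
        CREATE TABLE orders (
            order_id           INTEGER PRIMARY KEY,
            customer_id        INTEGER,
            region             TEXT,
            order_month        TEXT,
            gross_sales_amount REAL
        );
        INSERT INTO orders VALUES
            (1, 10, 'East', '2013-01', 100.0),
            (2, 11, 'East', '2013-02', 50.0),
            (3, 10, 'West', '2013-02', 75.0);

        -- Semantic view 1: the same orders, seen by region.
        CREATE VIEW sales_by_region AS
            SELECT region,
                   SUM(gross_sales_amount)     AS gross_sales_amount,
                   COUNT(DISTINCT customer_id) AS number_of_customers
            FROM orders
            GROUP BY region;

        -- Semantic view 2: the same orders, seen by month.
        CREATE VIEW sales_by_month AS
            SELECT order_month,
                   SUM(gross_sales_amount)     AS gross_sales_amount,
                   COUNT(DISTINCT customer_id) AS number_of_customers
            FROM orders
            GROUP BY order_month;
        """
    )
    return conn
```

A BI tool would then query the views rather than the base table, for example `SELECT gross_sales_amount FROM sales_by_region WHERE region = 'East'`: the selected column is what the user wants to see, the view’s grouping column is what they want to see it by, and the `WHERE` clause is the constraint on the results.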

The semantic data model is usually dimensional but can also represent other types such as Analytical Data Sets. There are two mindsets: relational and dimensional. A relational model captures the business rules. A dimensional data model captures the navigation paths and focuses on evaluating the meaning of the business being monitored through metrics such as Gross Sales Amount and Number of Customers. Most semantic data models are dimensional because such models support business questions that follow the pattern of:

• What do I want to see?
• What do I want to see it by?
• What constraints are there on the results?

The semantic data model is often implemented through views. A semantic data model can be shown at conceptual, logical, and physical levels of detail. At a physical level, it is often implemented as views over the integrated data layer. The semantic data model also provides quick and easy access to data: users and BI tools need to be able to answer business questions quickly and easily.

In addition, the semantic data model must be designed to support a variety of ways to look at the same data. Although an order may be depicted just one way in the integrated data layer, it can be shown in multiple ways across multiple semantic data models depending on business needs. Also the semantic data model is the primary point of entry for BI tools.

A Consistent Approach Ensures Delivery

Architecture is not easy to come by without focused effort. It can easily be shortchanged if it is not understood that it is the direct cause of analytic success. Architecture is a way of life for delivering analytics and a consistent approach ensures that delivery.

Teradata’s consistent approach features:

• A multi-component approach (BIAS) to architecture
• A sound, repeatable, and successful methodology
• Use of Architectural Principles and Advocated Positions
• Use of Design Patterns and Implementation Alternatives
• World-class technology building blocks
• Use of architecture solution model building blocks

Teradata provides the building blocks for the analytic architecture solution model.

William McKnight William is a consultant specializing in information management. His company, McKnight Consulting Group, has served clients such as Fidelity Investments, Teva Pharmaceuticals, Scotiabank, Samba Bank, Pfizer, France Telecom, and Verizon—in total, 16 of the Global 2000. William is also a very popular speaker worldwide and a prolific writer who has published hundreds of articles and white papers. An Ernst&Young Entrepreneur of the Year Finalist, William is a former Fortune 50 technology executive and software engineer. He provides clients with action plans, architectures, strategies, complete program, and vendor-neutral tool selection to manage information. He can be reached at 214-514-1444 or through his at www.mcknightcg.com.

The Best Decision Possible and Unified Data Architecture are trademarks, and Teradata, the Teradata logo and SQL-MapReduce are registered trademarks of Teradata Corporation and/or its affiliates in the U.S. and worldwide. Apache and Hadoop are registered trademarks of the Apache Software Foundation.
