
20-20 Insights

Surviving the Petabyte Age: A Practitioner’s Guide

Executive Summary

The concept of “big data”¹ is gaining attention across industries and the globe. Among the drivers are the growth in social media (Twitter, Facebook, blogs, etc.) and the explosion of rich content from other information sources (activity logs from the Web, proximity and wireless sources, etc.). The desire to create actionable insights from ever-increasing volumes of unstructured and structured data sets is forcing enterprises to rethink their approach to big data, particularly as traditional approaches have proved difficult, if not impossible, to apply to structured data sets.

One challenge that many, if not most, enterprises are attempting to address is the increasing number of data sources made available for analysis and reporting. Those who have taken an early adopter stance and integrated non-tabular information (a.k.a. unstructured data) into their pool of analysis data have exacerbated their data management problems.

A second challenge is the shrinking timeframe in which a business stays focused on a particular topic. Thanks to the highly integrated and communicative global economy, and the great strides made in expanding communications bandwidth, both good and bad news circumnavigate the globe at a much faster pace than ever before.

The amount of time it takes for news to become common knowledge has shrunk, thanks to:

• An emerging network of social media and blogs that potentially makes everyone a publisher of good and bad news.
• A rapid increase in the number of people who are untethered from traditional information receptacles and now have a highly mobile means of collecting and ingesting information.
• The meteoric rise of desktop tools housing a significant portion of information. Organizations need to understand the information and processes involved in the dispensation of desktop-managed information (mostly Access and Excel). This information is most likely to be found in the form of:
   >> Copies of operational data (including both sources and targets).
   >> Copies of operational data that is enriched (including the processes and sources used for enrichment, as well as the targets that receive the enriched information).
   >> Processes bypassing the systematized processes (including the bypassed processes, the sources used for these processes, the actors in these processes and the results of these processes).

cognizant 20-20 insights | december 2011

This whitepaper lays out the concept of a business information model as a vehicle to organize information into separate categories, which directly influences the creation, capture or extraction of business value and elevates it to a heightened focus. We will cover four main topics:

1. Why companies dealing with big data in today’s Petabyte Age need to stratify information so that trustworthy, relevant, actionable and timely data can be found at a moment’s notice.
2. A business model that can be used to stratify information.
3. A new definition of partitioning and a business process for formulating the partitions. Partitions should stratify information based on its contribution to organizational value, as well as on the more traditional technical partitioning that is conducted for performance and maintenance reasons.
4. Methods of rolling out an information infrastructure aligned with this new partitioning definition.

The realities of this new environment are that the maintenance of a traditional enterprise information model happens at the speed of business and is in direct opposition to maintaining the focus of information that directly contributes to enterprise value.

Three Issues to Solve

The Petabyte Age² is creating a multitude of challenges for IT organizations, as they find that their well-honed, carefully constructed information models cannot be maintained fast enough to appease their business constituents. Moreover, once constructed and populated with information, these models require new technologies to interface with the data. Adding insult to injury, all this data is largely introspective and serves merely to support the status quo. When disruptions occur, insights can only be gleaned from this data after a sufficient passage of time; in the meantime, insights are derived from what is largely called unstructured and semi-structured data, as well as data obtained from outside the organization via social media, blogs, Web sites and a host of other sources that don’t fit into the neatly organized tools devised for insight generation.

A major shift is transforming the basic tenets of data-driven insight generation. This shift requires a new way of combining and synthesizing data used for navigating the highly integrated and communicative global economy.

Overcoming this challenge requires organizations to solve three important issues (see Figure 1):

• Data depth: How to derive insight from structures that contain billions or more instances of data. These can include sessions in a Web log, entries obtained from social media, entries from RFID activities or mobile-sourced activities. One thing is certain: the sheer size of these pools of data will continue to grow, resulting in technical hurdles that challenge traditional methods for efficiently and effectively using such large pools of like data. Most solutions that deal with big data attempt to meet this challenge.
• Focus on enterprise value: How to quickly determine which data requires the most focus at any point in time. Thanks to our tightly connected global economy, news travels around the world more quickly than ever, which requires rapid rethinking of enterprise strategies and tactics. This requires the ability to quickly change which data is focused upon. Traditional information models that are constructed to synthesize business knowledge from the deluge of available data impede the nimbleness required to meet the needs of the modern-day enterprise.
• Less introspective view: How to make the whole information fabric less introspective. Information derived from inside the organization can predict future trajectories only if the status quo is assumed. However, when there is a high degree of turbulence, knowledge obtained from internally generated information is woefully inadequate in the short term; insights are obtainable only after sufficient time has passed and several cycles have been interpreted. The resulting organizational missteps are covered regularly in the news media. What is required is an ability to wield information as an early-warning system for understanding changes in enterprise trajectories. Such data sources are external to the enterprise until enough time has passed for a history of data points to be inferred from internal data.

Figure 1: Data Challenges of the Petabyte Age

Sheer Depth of Similar Data

Specialized tools have emerged to address this issue of enormous pools of similar data. These tools originate from the realization that the time-honored structured query language tools, as well as other tools built around database technologies, are ill-equipped to efficiently deal with billions, if not trillions, of rows of data. Spawned from Google’s attempt to deal with the data accumulated from all the interactions that occur with the Google suite, a whole new framework built around the MapReduce technology has been born, and an emerging suite of tools has begun to appear on this new stack of technologies.

There will no doubt be a refinement of the techniques that are maturing to deal with this concept of big data. The only thing we can be sure of is that the big-data business issues addressed by MapReduce and the related suite of technologies are not going away.

Just as the technologies available for launching the initial collection of Web sites were immature, so are the tools for developing solutions for big data. Much has been said about how technology has taken a major step back from what is commonly available for business intelligence and data warehousing solutions, but this is much less a statement about the problem of big data than it is about the immaturity of the technologies available for solving the big-data problem set.
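To make the MapReduce idea less abstract, here is a minimal, single-process Python sketch of its three phases (map, shuffle, reduce), counting the distinct sessions that touched each page of a Web log. It is a toy illustration, not Google's or Hadoop's API; the tab-separated log format, the sample lines and the function names are all assumptions made for this example.

```python
from collections import defaultdict
from typing import Iterable, Iterator

# Hypothetical Web-log lines: "<session_id>\t<page>" (format assumed for illustration).
LOG_LINES = [
    "s1\t/home",
    "s1\t/pricing",
    "s2\t/home",
    "s3\t/home",
    "s3\t/pricing",
    "s3\t/home",
]


def map_phase(line: str) -> Iterator[tuple[str, str]]:
    """Emit one (page, session_id) pair per log line."""
    session_id, page = line.split("\t")
    yield page, session_id


def shuffle(pairs: Iterable[tuple[str, str]]) -> dict[str, list[str]]:
    """Group intermediate values by key, as the framework does between phases."""
    groups: dict[str, list[str]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(page: str, session_ids: list[str]) -> tuple[str, int]:
    """Count the distinct sessions that touched one page."""
    return page, len(set(session_ids))


def run(lines: Iterable[str]) -> dict[str, int]:
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(page, ids) for page, ids in grouped.items())


if __name__ == "__main__":
    print(run(LOG_LINES))  # {'/home': 3, '/pricing': 2}
```

Because each map call sees only one line and each reduce call sees only one key's values, a real framework can spread both phases across many machines; that property, rather than the counting logic, is what lets the model scale to billions of rows.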

Figure 2: Converting Big Data Into Value (relevant, actionable, trustworthy, just-in-time data plus acquired and created knowledge, learned and heard inferences and innovation lead to insight and action, yielding extracted, originated or captured value)

Figure 3: Managing Opportunity and Risk (people, processes and metrics, product innovation, and capabilities and technology, managed across customers, media, markets, financing, geographies and competitors to address operational risk, disruptive events, enterprise strategies and sustainable value)

Interestingly, although the problem of large pools of data is the primary issue, today it is tackled by introducing separate technologies for each of the challenges outlined above independently. Companies that thrive in the Petabyte Age will be able to consolidate the technologies so their business constituency is faced with a single interface that addresses their full complement of informational needs.

Focus on Influencers of Enterprise Value

The intent of business intelligence is to take actionable, relevant, trustworthy and timely data; put it through a model that aligns with key business challenges (customers, geographies, channels, investors, markets, etc.) as the means to gain insight; and derive an action plan to extract, originate or capture organizational value (see Figure 2). Furthermore, captured value can be a one-time event (e.g., a temporary supply shortfall at a competitor) or a permanent value stream. While captured transactions are acceptable, captured value streams are more desirable.

Data is converted into insight by using acquired and created knowledge (obtained from both internal and external sources), learned inferences, heard inferences and innovations, some of which will serve as disruptions to others in the participating marketplaces.

It is the business model itself that must provide the focus into what is pertinent to the business at a particular point in time and that serves as the point of contention. However, the enterprise business models used as the basis for synthesizing information to gain insight are devised to map all data rather than “tiering” data into focus areas. Examples of focus areas include information that:

• Directly relates to creating or protecting extracted, originated or captured enterprise value.
• Does not directly contribute to value but is mandatory for business operations.
• May not be mandatory for business operations but is mandatory for regulatory purposes.
• May not be required for the above categories but is mandatory for archiving.
• Was once important but is now relegated to historical trivia.

To create or protect extracted, originated or captured enterprise value, the information deemed worthy of focus must be sufficiently broad in scope so that both the opportunities and risks are exposed in all dimensions of the business model.
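Because the focus areas listed above form an ordered set of tiers, one way to sketch "tiering" is to test each information item against the tiers in priority order and assign the first one that applies. In this Python sketch, the `FocusTier` and `InformationItem` names and their boolean flags are hypothetical; in practice the flags would be derived from the business information model rather than set by hand.

```python
from dataclasses import dataclass
from enum import IntEnum


class FocusTier(IntEnum):
    """Focus areas from the text, ordered from highest to lowest focus."""
    ENTERPRISE_VALUE = 1   # creates or protects extracted, originated or captured value
    OPERATIONS = 2         # mandatory for business operations
    REGULATORY = 3         # mandatory for regulatory purposes
    ARCHIVAL = 4           # mandatory for archiving
    HISTORICAL_TRIVIA = 5  # once important, now relegated


@dataclass
class InformationItem:
    # Hypothetical attributes; a real inventory would derive these from the business model.
    name: str
    drives_value: bool = False
    needed_for_operations: bool = False
    needed_for_regulation: bool = False
    needed_for_archive: bool = False


def assign_tier(item: InformationItem) -> FocusTier:
    """Return the highest-focus tier whose condition the item satisfies."""
    if item.drives_value:
        return FocusTier.ENTERPRISE_VALUE
    if item.needed_for_operations:
        return FocusTier.OPERATIONS
    if item.needed_for_regulation:
        return FocusTier.REGULATORY
    if item.needed_for_archive:
        return FocusTier.ARCHIVAL
    return FocusTier.HISTORICAL_TRIVIA


items = [
    InformationItem("competitor promotions on social media", drives_value=True),
    InformationItem("general ledger", needed_for_operations=True, needed_for_regulation=True),
    InformationItem("2003 campaign results"),
]
# List the items from highest to lowest focus.
for item in sorted(items, key=assign_tier):
    print(assign_tier(item).name, "-", item.name)
```

Ordering the checks this way means an item that qualifies for several tiers (the general ledger is needed for both operations and regulation) lands in the highest one, which mirrors the "tiering" rather than "map all data" approach described above.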

For example, in the business model illustrated in Figure 3, operational risks, disruptive events, enterprise strategies and sustainable value sources will be handled by managing:

• People, as well as the services they provide.
• Processes and the metrics used to manage the processes.
• Innovations, specifically the products released into the marketplace.
• Capabilities aligned with technologies.

Information will be managed in this model along the following dimensions (i.e., the enablers):

• Customers, or the customers, prospects and visitors who can be tapped for enterprise value.
• Media, both traditional and emerging (social media like Facebook and Google+), that can influence enterprise value.
• Markets participated in for originating, extracting or capturing enterprise value.
• Financing, or the source of funds used for investments and the cash flow used to originate, extract or capture enterprise value.
• Geographies and sovereign nations from which enterprise value will be originated, extracted or captured.
• Rivals in markets and geographies that compete for customers, market coverage and funding sources.

A Less Introspective View of Information

Only expected trends can be tracked using internal information. Disruptions will eventually appear in internal data, but their trajectory will only be evident after two or more cycles of information make their way into the internal data stream. This means:

• It will take a minimum of three days (an initial data point plus two further daily cycles) for new sales trajectories to make themselves known to a daily sales system. By that time, any progress that competitors have made in capturing value from your largest customers is already lost for immediate transactions (i.e., captured transactional value) and, in many cases, is gone forever (i.e., captured value streams).
• In cases where data is reported less frequently, such as financial results, it will take weeks or months for such situations to be exposed, at which point they are much more difficult to remediate.

Disruptions make themselves known through external data much more readily than through internal data. However, external data has problems of its own: it is much more loosely defined, and its sources are far more numerous and change more frequently in scope and content.

An example of an external data source that can be captured is Twitter. All Twitter content is capable of being captured, and a competitor’s promotion that is broadcast on Twitter can be exposed immediately. To hear such a message, however, one must listen to a stream in which the handful of relevant messages is buried among literally billions of 140-character messages. And Twitter is only one of many information sources that can expose such calls to action.

Early warning systems are not a new phenomenon. Just like those deployed for catastrophic weather and natural disasters, early warning systems for businesses should be launched to warn of disruptions to the orderly management of the strategies and tactics of enterprises that ultimately extract, originate or capture value.

Integrating this information into a meaningful early warning system requires a new way of examining information. In the Petabyte Age of ubiquitous and proliferating data, information must be integrated immediately, or else the integration is worth significantly less than when the information was initially exposed.

Several years ago, computer scientists discovered that code was more nimble if it was decoupled from its underlying model, which gave rise to the SOA and REST architectures. Similarly, a process can decouple the modeling of data from the ability to publish alerts, dashboards and access to consumers. This post-discovery means of utilizing data has been written about by Forrester and others and is the basis of many advanced tools in the marketplace today. The reason for such an approach is to discover anomalies prior to the normal publication cycle.

A number of technical solutions are emerging to deal with publishing data at a moment’s notice. Most of these solutions are covered under the topic of “virtualized data warehouses,” which will be covered in a separate whitepaper. Momentum for virtualized warehouse technology has picked up, as all vendors in the space have positioned themselves to offer “perfect solutions.”
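As a rough illustration of an early-warning check on external data, the Python sketch below filters a stream of short messages for mentions of a competitor's promotion and flags any day whose count jumps well above the trailing average. The watch terms, the seven-day window, the three-standard-deviation threshold and the floor on the spread are all assumptions chosen for the example; this is one common anomaly heuristic, not a method prescribed in this paper, and a production system would read from a live feed rather than a list.

```python
from collections import Counter
from datetime import date
from statistics import mean, pstdev

# Hypothetical watch terms for a competitor's promotion (an assumption for this sketch).
WATCH_TERMS = ("acme", "sale")


def is_signal(text: str) -> bool:
    """Keep only messages that mention every watch term (case-insensitive)."""
    lowered = text.lower()
    return all(term in lowered for term in WATCH_TERMS)


def daily_counts(messages: list[tuple[date, str]]) -> Counter:
    """Count the matching messages received on each day."""
    return Counter(day for day, text in messages if is_signal(text))


def early_warnings(counts: Counter, days: list[date],
                   window: int = 7, threshold: float = 3.0) -> list[date]:
    """Flag days whose count exceeds the trailing-window mean by `threshold` deviations.

    `days` must be in chronological order; days with no matches count as zero.
    """
    alerts = []
    for i in range(window, len(days)):
        history = [counts[d] for d in days[i - window:i]]
        # Floor the spread at 1.0 so a perfectly flat history does not flag tiny changes.
        spread = max(pstdev(history), 1.0)
        if counts[days[i]] > mean(history) + threshold * spread:
            alerts.append(days[i])
    return alerts


if __name__ == "__main__":
    days = [date(2011, 11, d) for d in range(1, 11)]
    messages = [(d, "Acme sale? meh") for d in days[:9]]      # one mention a day
    messages += [(days[9], "ACME flash sale today!")] * 12    # sudden burst on day 10
    messages += [(days[9], "unrelated chatter")] * 50         # noise that is filtered out
    print(early_warnings(daily_counts(messages), days))       # [datetime.date(2011, 11, 10)]
```

The filtering step and the alerting step are kept as separate functions, so the rule that decides what counts as a signal can change without touching how alerts are computed, in the spirit of decoupling the data model from alert publishing.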

Figure 4: Stages of Information Management

The EIS/DSS Age (circa 1975-1997). Issues that were tackled:
• Elimination of paper
• Improvements in monitored data
• Information responsiveness
• Gigabytes of information
• Delivery models (PCs, Windows)
• Support costs

The BI/DW Age (circa 1993-2013). Issues that were tackled:
• Single version of the truth
• Terabytes of information
• Performance constraints
• Governance models
• Specialized tools
• Delivery models (Web, etc.)

The NextGen Age (circa 2010-?). Issues that must be tackled:
• Just-in-time information
• Always-on prioritized information
• Less introspective information
• Petabytes of information
• Source integration timing
• Governance and valuation models
• Component-based delivery models

A Framework for the Petabyte Age

Roughly every 15 to 20 years, the disciplines of delivering enterprise information for creating business-critical insight and improving the overall decision-making process undergo radical change (see Figure 4). We are in the midst of such a major shift. These cycles tend to share the following characteristics:

• They are ushered in with the availability of tools that are greatly reduced in price or are open source and that displace much of the functionality of the products being replaced (e.g., in the late ’90s, products such as Pilot and Comshare were displaced by market upstarts such as Javelin and Excel).
• There are referenceable cases of enterprises that have successfully utilized next-generation solutions for translating raw data into insight.

Challenges that must be tackled as part of this next-generation age are:

• The ability to deliver prioritized, just-in-time information through an always-on interface (e.g., mobile).
• The ability to combine information generated inside the organization (introspective) with information made available elsewhere. It is important to note that information made available elsewhere rarely comes in neat bundles of tables that are easily integrated using readily available scripts.
• The ability to integrate new sources of information at a moment’s notice. This requirement challenges the basic tenets of the enterprise information model and the ETL processes that have matured over the past 20 years.
• The ability to embrace changes (i.e., additions and deletions to the information fabric used to steer, organize and ultimately produce enterprise value) by proving that the technology arm can responsively deliver trustworthy information. Disciplines such as process governance, data governance, information centers of excellence that manage a catalog of components, and information lifecycle management³ are enjoying renewed popularity because they are cornerstones of this renewed responsiveness to the knowledge worker community.

What is important in the new disciplines associated with insight generation is that they center on focusing information, whether or not it is traditional, internally sourced information. Many of the information sources will require techniques associated with big data (billion-plus row tables), but all of them will require assistance in focusing on the information dilemma for the foreseeable future (i.e., finding which information is critical for a specific business need is much akin to finding the proverbial needle in a haystack).

Much work has been done to create an information lifecycle for managing the performance of analytical and operational systems. However, partitioning strategies have rarely been applied to partition information into the following schemes:

• Information that is directly attributable to generating or protecting revenue for an enterprise.
• Information that may not be strategically or tactically significant to generating revenue but is mandatory for business operations. Much financial data (not financing, which is often a cash position) falls into this category.
• Information that may not fall into the above two categories but is required for regulatory purposes.
• Information required for archival purposes.
• Information that may have once fallen into the above categories but has been relegated to historical trivia.

The process of partitioning information into areas deserving focus (called “focus partitioning”⁴) is completed through the following steps:

• Step 1: Take inventory of information used in the organization. Information will come from one of five categories:
   >> Downloaded and enriched through processes managed entirely from desktop systems.
   >> Available in official operational systems.
   >> Available from unofficial operational systems (normally Microsoft Access and Excel).
   >> Introspective but document-centric information (contracts, e-mail, etc.).
   >> Information that is sourced outside the organization (social media, blogs, newswires, etc.).
• Step 2: Create an information component inventory, assigning each information component to a segment of the business information model and determining its priority in generating value to the organization. Also, identify information that is required but not available as part of this exercise.
• Step 3: Assign the information inventory to the partitions of the business information model (i.e., directly contributing to enterprise value, required for operations, etc.).
• Step 4: Align potential initiatives with the partitioned information inventory and determine how much tackling each initiative would improve enterprise value, thereby creating a roadmap to the prioritized information fabric critical to capturing, extracting or originating enterprise value.

It is important to note that, as much as we think that business stakeholders don’t have the data they need to perform their jobs, in reality there is always a means to obtain and utilize the information required for determining and executing on the strategic, tactical and operational needs of the enterprise. In areas where the sanctioned technical vehicles were unable to provide this information, the enterprise stewards found means to cobble together the information they required.

It is of paramount importance that the identity and use of this information be ascertained when charting a course for big data. In reality, lots of related islands of little data are often sewn together in a big data initiative. Tackling the obvious big data initiative may not deliver the value anticipated if the little islands of information are ingrained in enterprise processes.

The determination of whether tackling these islands of information is included in the enterprise strategy through an enterprise information management program, an enterprise data governance program or some other initiative is less important than engaging the owners of these islands of information.

Figure 5: Template for Capturing, Aligning Information Components
When capturing the focused information that is used in a big data initiative, it is important to align the data back to the business information model. This template is a vehicle that can be used to capture the focused information exposed through a big data initiative and to ensure its alignment and proper placement in the business information model.

Figure 6: Alignment of Data Inventory with Business Value
Equally important to aligning information to the business information model is identifying how the information will result in positive incremental value to the organization. It is important to continually put the identified data to the test of whether it is actionable and, if properly used, is associated with organizational value. This template facilitates testing whether information prioritized for the big data initiative is both associated with the business information model and results in value along the dimensions of the business information model.
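Steps 1 through 4 of focus partitioning amount to building an inventory, tagging each component with its source, model segment, partition and value priority, and then ranking candidate initiatives by the value of the components they touch. The Python sketch below shows one way such an inventory could be recorded; the class and field names, the partition weights and the 1-to-5 priority scale are hypothetical scoring choices for illustration, not part of the method as described.

```python
from dataclasses import dataclass, field
from enum import Enum


class Source(Enum):
    """Step 1: the five categories information is drawn from."""
    DESKTOP_ENRICHED = "downloaded and enriched on desktops"
    OFFICIAL_OPERATIONAL = "official operational systems"
    UNOFFICIAL_OPERATIONAL = "unofficial operational systems (Access, Excel)"
    DOCUMENT_CENTRIC = "contracts, e-mail, etc."
    EXTERNAL = "social media, blogs, newswires, etc."


class Partition(Enum):
    """Step 3: focus partitions of the business information model."""
    VALUE = "directly contributes to enterprise value"
    OPERATIONS = "required for operations"
    REGULATORY = "required for regulatory purposes"
    ARCHIVAL = "required for archival purposes"
    TRIVIA = "historical trivia"


# Hypothetical weights expressing how much each partition matters to a roadmap.
PARTITION_WEIGHT = {Partition.VALUE: 5, Partition.OPERATIONS: 3,
                    Partition.REGULATORY: 2, Partition.ARCHIVAL: 1, Partition.TRIVIA: 0}


@dataclass
class Component:
    """Step 2: one entry in the information component inventory."""
    name: str
    source: Source
    model_segment: str      # e.g. "Customers", "Markets", "Competitors"
    partition: Partition
    priority: int           # hypothetical scale: 1 (low) to 5 (high) value priority
    available: bool = True  # False marks information that is required but not available


@dataclass
class Initiative:
    """Step 4: a candidate initiative and the inventory components it would improve."""
    name: str
    components: list[Component] = field(default_factory=list)

    def score(self) -> int:
        """Sum of partition weight times priority over the components touched."""
        return sum(PARTITION_WEIGHT[c.partition] * c.priority for c in self.components)


def roadmap(initiatives: list[Initiative]) -> list[str]:
    """Order initiatives from highest to lowest weighted value."""
    return [i.name for i in sorted(initiatives, key=Initiative.score, reverse=True)]


if __name__ == "__main__":
    promos = Component("competitor promotions", Source.EXTERNAL, "Competitors",
                       Partition.VALUE, priority=5, available=False)
    ledger = Component("general ledger", Source.OFFICIAL_OPERATIONAL, "Financing",
                       Partition.OPERATIONS, priority=3)
    pricing = Component("regional price lists", Source.UNOFFICIAL_OPERATIONAL, "Markets",
                        Partition.VALUE, priority=4)
    plan = [Initiative("ledger cleanup", [ledger]),
            Initiative("social listening", [promos, pricing])]
    print(roadmap(plan))  # ['social listening', 'ledger cleanup']
```

A simple weighted sum keeps the ranking transparent to business owners; any organization applying this would substitute its own weights, and the `available=False` flag gives Step 2's "required but not available" information a place in the same inventory.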

Footnotes

1 Big data includes data sets that grow so large that they become awkward to work with using on-hand database management tools. Difficulties include capture, storage, search, sharing, analysis and visualization.

2 Petabyte Age is shorthand for the massive volumes of data that many organizations are dealing with, which can be measured in petabytes; a petabyte is a unit of information equal to one quadrillion bytes.

3 Information lifecycle management is a process used to improve the usefulness of data by moving lesser-used data into separate segments. It is most commonly concerned with moving data from always-needed partitions to rarely needed partitions and, finally, into archives.

4 Focus partitioning is a term created by the author that describes extending the generally accepted technique of segmenting data into partitions to gain performance (vertical partitioning) to segmenting groups of data by the likelihood that they will participate in achieving organizational value.

References

Mark Albala, “Enhancing Agility: Enabling Information Intelligence for a Turbulent World,” 2010.
Mark Albala, “Post Discovery Intelligent Applications: The Next Big Thing,” 2009.
Mark Albala, “Information and Execution Agility: The New Imperative,” 2009.
Boris Evelson, “Information Post Discovery – Latest BI Trend,” blog post, Forrester Research, May 18, 2009.

About the Author Mark Albala is Practice Director of Cognizant’s North American Enterprise Information Management Consulting and Solution Architecture Practice. This practice provides solution architecture, information governance, information strategy and program governance services to companies across industries and supports Cognizant’s business intelligence and data warehouse delivery capabilities. A graduate of Syracuse University, Mark has held senior thought leadership, advanced technical and trusted advisory roles for organizations focused on the disciplines of information management for over 20 years. He can be reached at [email protected].

About Cognizant

Cognizant (NASDAQ: CTSH) is a leading provider of information technology, consulting, and business process outsourcing services, dedicated to helping the world’s leading companies build stronger businesses. Headquartered in Teaneck, New Jersey (U.S.), Cognizant combines a passion for client satisfaction, technology innovation, deep industry and business process expertise, and a global, collaborative workforce that embodies the future of work. With over 50 delivery centers worldwide and approximately 130,000 employees as of September 30, 2011, Cognizant is a member of the NASDAQ-100, the S&P 500, the Forbes Global 2000, and the Fortune 500 and is ranked among the top performing and fastest growing companies in the world. Visit us online at www.cognizant.com or follow us on Twitter: Cognizant.

World Headquarters
500 Frank W. Burr Blvd.
Teaneck, NJ 07666 USA
Phone: +1 201 801 0233
Fax: +1 201 801 0243
Toll Free: +1 888 937 3277
Email: [email protected]

European Headquarters
1 Kingdom Street
Paddington Central
London W2 6BD
Phone: +44 (0) 20 7297 7600
Fax: +44 (0) 20 7121 0102
Email: [email protected]

India Operations Headquarters
#5/535, Old Mahabalipuram Road
Okkiyam Pettai, Thoraipakkam
Chennai, 600 096 India
Phone: +91 (0) 44 4209 6000
Fax: +91 (0) 44 4209 6060
Email: [email protected]

© Copyright 2011, Cognizant. All rights reserved. No part of this document may be reproduced, stored in a retrieval system, transmitted in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the express written permission from Cognizant. The information contained herein is subject to change without notice. All other trademarks mentioned herein are the property of their respective owners.