Surviving the Petabyte Age: a Practitioner's Guide
Cognizant 20-20 Insights | December 2011

Executive Summary

The concept of “big data”¹ is gaining attention across industries and the globe. Among the drivers are the growth in social media (Twitter, Facebook, blogs, etc.) and the explosion of rich content from other information sources (activity logs from the Web, proximity and wireless sources, etc.). The desire to create actionable insights from ever-increasing volumes of unstructured and structured data sets is forcing enterprises to rethink their approach to big data, particularly as traditional approaches have proved difficult, if even possible, to apply to structured data sets.

One challenge that many, if not most, enterprises are attempting to address is the increasing number of data sources made available for analysis and reporting. Those who have taken an early adopter stance and integrated non-tabular information (a.k.a. unstructured data) into their pool of analysis data have exacerbated their data management problems.

A second challenge is the shrinking timeframe in which a business stays focused on a particular topic. Thanks to the highly integrated and communicative global economy, and the great strides made in expanding communications bandwidth, both good and bad news circumnavigate the globe at a much faster pace than ever before.

The amount of time it takes for news to become common knowledge has shrunk, thanks to:

• An emerging network of social media and blogs that potentially makes everyone a publisher of good and bad news.
• A rapid increase in the number of people who are untethered from traditional information receptacles and now have a highly mobile means of collecting and ingesting information.
• The meteoric rise of desktop tools housing a significant portion of information. Organizations need to understand the information and processes involved in the dispensation of desktop-managed information (mostly Microsoft Access and Excel). This information is most likely to be found in the form of:
  > Copies of operational data (including both sources and targets).
  > Copies of operational data that is enriched (including the processes and sources used for enrichment, as well as the targets that receive the enriched information).
  > Processes bypassing the systematized processes (including the bypassed processes, the sources used for these processes, the actors in these processes and the results of these processes).

This whitepaper lays out the concept of a business information model as a vehicle to organize information into separate categories, which directly influences the creation, capture or extraction of business value and elevates it to a heightened focus. We will cover four main topics:

1. Why companies dealing with big data in today’s Petabyte Age need to stratify information so that trustworthy, relevant, actionable and timely data can be found at a moment’s notice.
2. A business model that can be used to stratify information.
3. A new definition of partitioning and a business process for formulating the partitions. Partitions should deal with stratifying information based on its contribution to organizational data, as well as the more traditional technical partitioning that is conducted for performance and maintenance reasons.
4. Methods of rolling out an information infrastructure aligned with this new partitioning definition. The realities of this new environment are that the maintenance of a traditional enterprise information model happens at the speed of business and is in direct opposition to maintaining the focus of information that directly contributes to enterprise value.

Three Issues to Solve

The Petabyte Age² is creating a multitude of challenges for IT organizations, as they find that their well-honed, carefully constructed information models cannot be maintained fast enough to appease their business constituents. Moreover, once constructed and populated with information, these models require new technologies to interface with the data. Adding insult to injury, all this data is largely introspective and serves merely to support the status quo. When disruptions occur, insights can only be gleaned from this data over a sufficient passage of time; in the meantime, insights are derived from what is largely called unstructured and semi-structured data, as well as data obtained from outside the organization via social media, blogs, Web sites and a host of other sources that don’t fit into the neatly organized tools devised for insight generation.

A major shift is transforming the basic tenets of data-driven insight generation. This shift requires a new way of combining and synthesizing data used for navigating the highly integrated and communicative global economy.

Overcoming this challenge requires organizations to solve three important issues (see Figure 1):

• Data depth: How to derive insight from structures that contain billions or more instances of data. These can include sessions in a Web log, entries obtained from social media, entries from RFID activities or mobile-sourced activities. One thing is sure: The sheer size of these pools of data will continue to grow, resulting in technical hurdles that challenge traditional methods for efficiently and effectively using such large pools of like data. Most solutions that deal with big data attempt to meet this challenge.
• Focus on enterprise value: How to quickly determine which data requires the most focus at any point in time. Thanks to our tightly connected global economy, news travels around the world more quickly than ever, which requires rapid rethinking of enterprise strategies and tactics. This requires the ability to quickly change which data is focused upon. Traditional information models that are constructed to synthesize business knowledge from the deluge of available data impede the nimbleness required to meet the needs of the modern-day enterprise.
• Less introspective view: How to make the whole information fabric less introspective. Using information derived from inside the organization can predict future trajectories only if the status quo is assumed. However, when there is a high degree of turbulence, knowledge obtained from internally-generated information is woefully inadequate in the short term; insights are obtainable only after sufficient time has passed and several cycles have been interpreted. The resulting organizational missteps are covered regularly in the news media. What is required is an ability to wield information as an early-warning system for understanding changes in enterprise trajectories. Such data sources are external to the enterprise until enough time has passed for a history of data points to be inferred from internal data.

Figure 1: Data Challenges of the Petabyte Age

Sheer Depth of Similar Data

Specialized tools have emerged to address this issue of enormous pools of similar data. These tools originate from the realization that the time-honored structured query language tools, as well as other tools built around database technologies, are ill-equipped to efficiently deal with billions, if not trillions, of rows of data. Spawned from Google’s attempt to deal with the data accumulated from all the interactions that occur with the Google software suite, a whole new framework built around the MapReduce technology has been born, and an emerging suite of tools has begun to appear on this new stack of technologies.

There will no doubt be a refinement of the techniques that are maturing to deal with this concept of big data. The only thing we can be sure of is that the big-data business issues addressed by MapReduce and the related suite of technologies are not going away.

Just as the technologies available for launching the initial collection of Web sites were immature, so are the tools for developing solutions for big data. Much has been said about how technology has taken a major step back from what is commonly available for business intelligence and data warehousing solutions, but this is much less a statement about the problem of big data than it is about the immaturity of the technologies available for solving the big-data problem set.

Figure 2: Converting Big Data Into Value

Figure 3: Managing Opportunity and Risk

Interestingly, the problem of large pools of data is the primary issue, which today is tackled by introducing technologies that address each of the challenges outlined above independently. Companies that thrive in the Petabyte Age will be able to consolidate the technologies so their business constituency is faced with a single interface that addresses their full complement of informational needs.

Focus on Influencers of Enterprise Value

The intent of business intelligence is to take [...]

[...]nal and external sources), learned inferences, heard inferences and innovations, some of which will serve as disruptions to others in the participating marketplaces.

It is the business model itself that must provide the focus into what is pertinent to the business at a particular point in time and that serves as the point of contention. The enterprise business models used as the basis for synthesizing information as the means of gaining insight are devised to map all data rather than “tiering” data into focus areas.
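To make the idea of “tiering” data into focus areas more concrete, the following is a minimal sketch in Python. It is an illustration only and not a method described in this paper: the tier names (`focus`, `watch`, `archive`), the thresholds, the topic tags and the `relevance` scoring rule are all hypothetical assumptions chosen for the example. Each data source is scored by how many of the current business focus topics it covers, then placed in the first tier whose threshold it meets.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical tier names and thresholds, highest first.
# The whitepaper does not define these; they are assumptions for illustration.
TIERS = [("focus", 0.7), ("watch", 0.3), ("archive", 0.0)]


@dataclass
class DataSource:
    name: str
    topics: set[str]  # hypothetical topic tags describing what the source covers


def relevance(source: DataSource, focus_topics: set[str]) -> float:
    """Return the fraction of the current focus topics that this source covers."""
    if not focus_topics:
        return 0.0
    return len(source.topics & focus_topics) / len(focus_topics)


def tier_sources(sources: list[DataSource], focus_topics: set[str]) -> dict[str, list[str]]:
    """Group source names by the first tier whose threshold their score meets."""
    tiers: dict[str, list[str]] = {name: [] for name, _ in TIERS}
    for source in sources:
        score = relevance(source, focus_topics)
        for name, threshold in TIERS:
            if score >= threshold:
                tiers[name].append(source.name)
                break
    return tiers


# Illustrative (made-up) sources and focus topics.
sources = [
    DataSource("web_logs", {"customers", "channels"}),
    DataSource("social_media", {"customers", "competitors", "markets"}),
    DataSource("ledger_extract", {"financing"}),
]
focus = {"customers", "competitors", "markets"}
tiered = tier_sources(sources, focus)
print(tiered)
```

Because the tiers are computed from the current set of focus topics rather than fixed in the model, changing the focus (for example, to `{"financing"}`) re-tiers the same sources without remodeling them, which is the kind of nimbleness the section argues traditional enterprise models lack.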