
How 'Big Data' Is Different


FALL 2012 VOL.54 NO.1

Thomas H. Davenport, Paul Barth and Randy Bean



Please note that gray areas reflect artwork that has been intentionally removed. The substantive content of the article appears as originally published. REPRINT NUMBER 54104

These days, lots of people in business are talking about “big data.” But how do the potential insights from big data differ from what managers generate from traditional analytics?

These days, many people in the information technology world and in corporate boardrooms are talking about “big data.” Many believe that, for companies that get it right, big data will be able to unleash new organizational capabilities and value. But what does the term “big data” actually entail, and how will the insights it yields differ from what managers might generate from traditional analytics?

There is no question that organizations are swimming in an expanding sea of data that is either too voluminous or too unstructured to be managed and analyzed through traditional means. Among its burgeoning sources are the clickstream data from the Web, social media content (tweets, blogs, wall postings, etc.) and video data from retail and other settings and from video entertainment. But big data also encompasses everything from call center voice data to genomic and proteomic data from biological research and medicine. Every day, Google alone processes about 24 petabytes (or 24,000 terabytes) of data. Yet very little of the information is formatted in the traditional rows and columns of conventional databases.

[Image caption: Big data encompasses everything from clickstream data from the Web to genomic and proteomic data from biological research and medicine.]

Many IT vendors and solutions providers use the term “big data” as a buzzword for smarter, more insightful data analysis. But big data is really much more than that. Indeed, companies that learn to take advantage of big data will use real-time information from sensors, radio frequency identification and other identifying devices to understand their business environments at a more granular level, to create new products and services, and to respond to changes in usage patterns as they occur. In the life sciences, such capabilities may pave the way to treatments and cures for threatening diseases.

Organizations that capitalize on big data stand apart from traditional data analysis environments in three key ways:

• They pay attention to data flows as opposed to stocks.
• They rely on data scientists and product and process developers rather than data analysts.
• They are moving analytics away from the IT function and into core business, operational and production functions.

1. Paying attention to flows as opposed to stocks

There are several types of big data applications. The first type supports customer-facing processes to do things like identify fraud in real time or score medical patients for health risk. A second type involves continuous process monitoring to detect such things as changes in consumer sentiment or the need for service on a jet engine. Yet another type uses big data to explore network relationships like suggested friends on LinkedIn and Facebook. In all these applications, the data is not the “stock” in a data warehouse but a continuous flow. This represents a substantial change from the past, when data analysts performed multiple analyses to find meaning in a fixed supply of data.

Today, rather than looking at data to assess what occurred in the past, organizations need to think in terms of continuous flows and processes. “Streaming analytics allows you to process data during an event to improve the outcome,” notes Tom Deutsch, program director for big data technologies and applied analytics at IBM. This capability is becoming increasingly important in fields such as health care. At Toronto’s Hospital for Sick Children, for example, machine learning algorithms are able to discover patterns that anticipate infections in premature babies before they occur.

The increased volume and velocity of data in production settings means that organizations will need to develop continuous processes for gathering, analyzing and interpreting data. The insights from these efforts can be linked with production applications and processes to enable continuous processing. Although small “stocks” of data located in warehouses or data marts may continue to be useful for developing and refining the analytical models used on big data, once the models have been developed, they need to process continuing data streams quickly and accurately.

The behavior of credit card companies offers a good illustration of this dynamic. In the past, direct marketing groups at credit card companies created models to select the most likely customer prospects from a large data warehouse. The process of data extraction, preparation and analysis took weeks to prepare, and weeks more to execute. However, credit card companies, frustrated by their inability to act quickly, determined that there was a much faster way to meet most of their requirements. In fact, they were able to create a “ready-to-market” database and system that allows a marketer to analyze, select and issue offers in a single day. Through frequent iterations and monitoring of website and call-center activities, companies can make personalized offers in milliseconds, then optimize the offers over time by tracking responses.

Some big data environments, such as consumer sentiment analysis, are not designed for automating decisions but are better suited for real-time monitoring of the environment. Given the volume and velocity of big data, conventional, high-certitude approaches to decision-making are often not appropriate in such settings; by the time the organization has the information it needs to make a decision, new data is often available that renders the decision obsolete. In real-time monitoring contexts, organizations need to adopt a more continuous approach to analysis and decision-making based on a series of hunches and hypotheses. Social media analytics, for example, capture fast-breaking trends on customer sentiments about products, brands and companies. Although companies might be interested in knowing whether an hour’s or a day’s changes in online sentiment correlate with sales changes, by the time a traditional analysis is completed there would be a raft of new data to analyze. Therefore, in big data environments it’s important to analyze, decide, and act quickly and often.

However, it isn’t enough to be able to monitor a continuing stream of information. You also have to be prepared to make decisions and take action. Organizations need to establish processes for determining when specific decisions and actions are necessary: when, for example, data values fall outside certain limits. This helps to determine decision stakeholders, decision processes and the criteria and timeframes for which decisions need to be made.

2. Relying on data scientists and product and process developers as opposed to data analysts

Although there has always been a need for analytical professionals to support the organization’s analytical capabilities, the requirements for support personnel are different with big data. Because interacting with the data itself (obtaining, extracting, manipulating and structuring it) is critical to any analysis, the people who work with big data need substantial and creative IT skills. They also need to be close to products and processes within organizations, which means they need to be organized differently than analytical staff were in the past.

“Data scientists,” as these professionals are known, understand analytics, but they also are well versed in IT, often having advanced degrees in computer science, computational physics or biology- or network-oriented social sciences. Their upgraded skill set, including programming, mathematical and statistical skills, as well as business acumen and the ability to communicate effectively with decision-makers, goes well beyond what was necessary for data analysts in the past. This combination of skills, valuable as it is, is in very short supply.

As a result, some early adopters of big data are working to develop their own talent. EMC Corporation, for example, traditionally a provider of data storage technologies, acquired Greenplum, a big data technology company, in 2010 to expand its capabilities in data science and promptly started an educational offering for data scientists. Other companies are working with universities to train data scientists.

Early users of big data are also rethinking their organizational structures for data scientists. Traditionally, analytical professionals were often part of internal consulting organizations advising managers or executives on internal decisions. However, in some industries, such as online social networks, gaming and pharmaceuticals, data scientists are part of the product development organization, developing new products and product features. At Merck, for example, data scientists (whom the company calls statistical genetics scientists) are members of the drug discovery and development organization.
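The flow-versus-stock distinction in section 1, and in particular the idea of acting when data values fall outside certain limits, can be made concrete with a small sketch. The Python below is only an illustration under stated assumptions: the function name `monitor`, the window size and the three-standard-deviation limit are hypothetical choices, not anything the article prescribes. It examines each reading as it arrives, using only a short rolling window, rather than querying a stored data set after the fact.

```python
from collections import deque
from statistics import mean, stdev


def monitor(stream, window=5, k=3.0):
    """Yield (value, needs_action) for each reading in a continuous flow.

    A reading needs action when it lies more than k standard deviations
    from the mean of the previous `window` readings. `window` and `k` are
    illustrative assumptions, not values taken from the article.
    """
    recent = deque(maxlen=window)  # rolling window; older readings drop off
    for value in stream:
        needs_action = False
        if len(recent) == window:  # wait until a full baseline exists
            center = mean(recent)
            spread = stdev(recent)
            needs_action = abs(value - center) > k * spread
        yield value, needs_action
        recent.append(value)  # the new reading becomes part of the baseline


# Hypothetical sensor readings; any iterable or generator would work.
readings = [10, 11, 10, 12, 11, 10, 50, 11]
flagged = [value for value, needs_action in monitor(readings)]
alerts = [value for value, needs_action in monitor(readings) if needs_action]
print(alerts)
```

Because `monitor` is a generator, it never holds more than `window` readings in memory, which is the point of treating data as a flow. In practice the limits, and who is notified when they are crossed, would come from the decision processes the organization defines.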


3. Moving analytics from IT into core business and operational functions

Surging volumes of data require major improvements in database and analytics technologies. Capturing, filtering, storing and analyzing big data flows can swamp traditional networks, storage arrays and relational database platforms. Attempts to replicate and scale the existing technologies will not keep up with big data demands, and big data is changing the technology, skills and processes of the IT function.

The market has responded with a broad array of new products designed to deal with big data. They include open source platforms such as Hadoop, invented by Internet pioneers to support the massive scale of data they generate and manage. Hadoop allows organizations to load, store and query massive data sets on a large grid of inexpensive servers, as well as execute advanced analytics in parallel. Relational databases have also been transformed: New products have increased query performance by a factor of 1,000 and are capable of managing the wide variety of big data sources. Statistical analysis packages are similarly evolving to work with these new data platforms, data types and algorithms.

Another disruptive force is the delivery of big data capabilities through “the cloud.” Although not yet broadly adopted in large corporations, cloud-based computing is well suited to big data. Many big-data applications use external information that is not proprietary, such as social network modeling and sentiment analysis. Moreover, big data analytics are dependent on extensive storage capacity and processing power, requiring a flexible grid that can be reconfigured for different needs. Cloud-based service providers offer on-demand pricing with fast reconfiguration.

Another approach to managing big data is leaving the data where it is. So-called “virtual data marts” allow data scientists to share existing data without replicating it. eBay, for example, used to have an enormous data replication problem, with between 20- and 50-fold versions of the same data scattered throughout its various data marts. Now, thanks to its virtual data marts, the company’s replication problem has been dramatically reduced. eBay has also established a “data hub,” an internal website to make it easier for managers and analysts to serve themselves and share data and analyses across the organization. In effect, eBay has built a social network around analytics and data.

Coming to terms with big data is prompting organizations to rethink their basic assumptions about the relationship between business and IT, and their respective roles. The traditional role of IT, automating business processes, imposes precise requirements, adherence to standards and controls on changes. Analytics has been more of an afterthought for monitoring processes and notifying management about anomalies. Big data flips this approach on its head. A key tenet of big data is that the world and the data that describe it are constantly changing, and organizations that can recognize the changes and react quickly and intelligently will have the upper hand. Whereas the most vaunted business and IT capabilities used to be stability and scale, the new advantages are based on discovery and agility: the ability to mine existing and new data sources continuously for patterns, events and opportunities.

This requires a sea change in IT activity within organizations. As the volume of data explodes, organizations will need analytic tools that are reliable, robust and capable of being automated. At the same time, the analytics, algorithms and user interfaces they employ will need to facilitate interactions with the people who work with the tools. Successful IT organizations will train and recruit people with a new set of skills who can integrate these new analytic capabilities into their production environments.

A further way that big data disrupts the traditional roles of business and IT is that it presents discovery and analysis as the first order of business. Next-generation IT processes and systems need to be designed for insight, not just automation. Traditional IT architecture is accustomed to having applications (or services) as “black boxes” that perform tasks without exposing internal data and procedures. But big data environments must make sense of new data, and summary reporting is not enough. This means that IT applications need to measure and report transparently on a wide variety of dimensions, including customer interactions, product usage, service actions and other dynamic measures. As big data evolves, the architecture will develop into an information ecosystem: a network of internal and external services continuously sharing information, optimizing decisions, communicating results and generating new insights for businesses.

Thomas H. Davenport is a visiting professor at Harvard Business School and President’s Distinguished Professor of Information Technology and Management at Babson College in Wellesley, Massachusetts. Paul Barth and Randy Bean are the cofounders and managing partners of NewVantage Partners, a Boston-based management consulting firm. Comment on this article at http://sloanreview.mit.edu/x/54104, or contact the authors at [email protected].

Reprint 54104.
Copyright © Massachusetts Institute of Technology, 2012. All rights reserved.
