Big Data and the Nature of Business Decisions
April 2013
Mark Madsen
www.ThirdNature.net
@markmadsen

Our ideas about information and how it’s used are outdated.

How We Think of Users
Our design point is the passive consumer of information.
Proof: methodology
▪ IT’s role is requirements, design, build, deploy, administer
▪ The user’s role is to receive data and run reports
Self-serve BI is not like picking the right doughnut from a box.

[Image slides: How We Want Users to Think of Us; What Users Really Think]

Food supply chain: an analogy for data
Multiple contexts of use, differing quality levels.
“What do you mean, ‘only doughnuts’?”
“I never said the ‘E’ in EDW meant ‘everything’…”

It’s going to get a lot bigger
[Diagram: E / Not E!]
Everything is digital. It’s no longer just rows and columns, it’s bits.

The sensor data revolution
Sensor data doesn’t fit well with current methods of collection and storage, or with the technology to process and analyze it.
Copyright Third Nature, Inc.

Unstructured is really unmodeled.
We turn text into data, but we don’t model it by hand. What we derive from text includes:
▪ Words and counts, keywords, tags
▪ Entities: people, places, things, events, IDs
▪ Topics, genres, relationships, abstracts
▪ Categories, taxonomies
▪ Sentiment, tone, opinion

Three kinds of measurement data we collect
The convenient data is transactional data.
▪ It goes in the DW and is used, even if it isn’t the right measurement.
The difficult and misleading data is declarative data.
▪ What people say has to be checked against ground truth: what they actually do.
The inconvenient data is observational data.
▪ It’s not neat, clean, or designed into most systems of operation.
We need to make use of all three.
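The “Unstructured is really unmodeled” slide says we turn text into data without modeling it by hand. As a minimal illustration of the simplest level in that list (words and counts), the sketch below derives term counts from raw text. The tokenizer (lowercase, keep runs of letters and apostrophes) and the helper name `word_counts` are assumptions for illustration, not the author’s method.

```python
import re
from collections import Counter


def word_counts(text: str) -> Counter:
    """Turn free text into word counts (hypothetical helper).

    Assumed tokenizer: lowercase the text, then keep runs of letters
    and apostrophes as tokens. No hand-built model is involved.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


if __name__ == "__main__":
    review = "Great service, slow delivery. Service was great!"
    counts = word_counts(review)
    # The most frequent terms become simple, queryable data.
    print(counts.most_common(2))
```

Higher levels in the list (entities, topics, sentiment) need statistical or rule-based extraction on top of counts like these, but the principle is the same: the structure is derived from the text, not designed up front.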
“Big data is unprecedented.”
- Anyone involved with big data in even the most barely perceptible way

We’ve been here before
Source: Bill Schmarzo, EMC
BI is now a commodity, a cost of doing business.

Big Data, Big Hype
$876 Gajillion (analyst estimates of the big data market)

“Big” is the oldest, easiest problem to solve
Image courtesy of Teradata

“Big” is well supported by databases now
Source: Noumenal, Inc.

Commoditization is the fundamental driver
[Chart: calculations per second per $1000, 1900 to 2000, log scale from 10^-6 to 10^10, spanning the mechanical, relay, vacuum tube, transistor, and integrated circuit eras; annotation: 10,000× improvement. Data: Ray Kurzweil, 2001]

Storage costs have declined with computing costs
With big data systems, the cost of storing data is an order of magnitude lower than with today’s databases (though not the cost of querying it back out, or the ability to). Processing data at scale is at least an order of magnitude cheaper too.
Source: VentureBeat

Parallel computing: the underlying technology
“In pioneer days they used oxen for heavy pulling, and when one ox couldn't budge a log, they didn't try to grow a larger ox. We shouldn't be trying for bigger computers, but for more systems of computers.”
- Grace Hopper

Cloud Computing: A Big Data Enabler
What you see: seemingly infinite resources to apply to computing problems on short notice and at low cost.
Key impacts of the cloud computing model:
Utility computing
▪ Pay for the resources you use, when you use them
▪ Expense instead of capital
▪ Elastic: scale up and down, like a utility
▪ Speed to acquire and deploy resources
But: cloud is built for scalability, not database response time.

Two big data straw men used by vendors in our market
“It’s a poor man’s ETL!”
“It’s a poor man’s database!”

Hadoop: a summary of the magic
1. Provides both storage and complex processing as part of the same platform
2. Makes parallel programming more accessible
3. Schemaless, therefore flexible
4. Inexpensive, reliable scale-out
5. Potential for fast, scalable ingest
6. The Apache version is free
The bad stuff:
▪ Not for mutable data
▪ Simple file-based sequential processing
▪ Zero data management

An important Hadoop + cloud computing benefit
Scalability is free: if your task requires 10 units of work, you can decide when you want results.
▪ 1 server, 10 units of time
▪ 10 servers, 1 unit of time
The cost is the same. That is not true of the conventional IT model.

Quantitative differences can be qualitative
Eadweard Muybridge, 1878
“Faster” is a qualitative difference (when there’s enough of it).
Big data enables this kind of “faster” for processing workloads, as well as deeper analytics and new analytics.

Big data value: There’s a pony in there somewhere…

The myth that still drives big data
All we need is a fat pipe and pans working in parallel…
You change an organization by acting with and through others, not alone.

What really happens to most great insights
If you don’t have a way to turn an insight into action within the organization, then you are producing expensive trivia.

The Three… Four… Six… Many V’s of Big Data
I got a fever, and the only prescription that can cure it is MORE V’s!
Common belief: the more V’s you have, the more budget you get.

Much of the big data value comes from analytics
BI is a retrieval problem, not a computational problem.

Five basic things you can do with analytics
▪ Prediction: what is most likely to happen?
▪ Estimation: what’s the future value of a variable?
▪ Description: what relationships exist in the data?
▪ Simulation: what could happen?
▪ Prescription: what should you do?

Analytic Maturity: This is Nonsense
Organizations do all of these in different places at different times.
“Why” is the hardest question to answer; it is not a factual question.
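The “scalability is free” slide rests on simple arithmetic: if the work divides perfectly across servers and you pay per server per unit of time, cost is servers × time, which stays constant as you add servers. The sketch below assumes ideal linear speedup, one unit of work per server-hour, and a flat hypothetical price per server-hour; real jobs carry startup and coordination overhead, so this is the best case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    servers: int
    hours: float
    cost: float


def plan_job(work_units: float, servers: int, price_per_server_hour: float) -> Plan:
    """Elapsed time and cost for a perfectly parallel job (idealized model)."""
    # Assumption: work splits evenly, one unit of work per server-hour.
    hours = work_units / servers
    # Utility pricing: pay only for the server-hours actually used.
    cost = servers * hours * price_per_server_hour
    return Plan(servers, hours, cost)


if __name__ == "__main__":
    for n in (1, 10):
        p = plan_job(work_units=10, servers=n, price_per_server_hour=1.0)
        print(f"{p.servers} server(s): {p.hours} hours, cost {p.cost}")
```

The equality depends on paying only for the server-hours used. If the ten servers are bought up front, as in the conventional IT model, you pay for them whether or not they are busy.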
[Chart: business value (low to high) against organizational maturity, in four stages:
▪ Reporting, “What happened?”: static and interactive reports
▪ Analysis, “Why did it happen?”: query, Excel, OLAP, visual discovery
▪ Operational BI / realtime BI, “What’s happening?”: dashboards, scorecards
▪ Predictive analytics, “What will happen?”: statistics, data mining, optimization]
This model isn’t built around what people do. It’s built around classes of technology.

Two keys to making big data worthwhile
▪ Value: goal → solution, not solution → goal.
▪ Actionability: simple “value” isn’t enough. Information has to be actionable, somehow.
We think of BI as publishing, an old metaphor. Publishing has value, but it may not be actionable.

Data is not the end of the line, it’s the departure point
We ignored the important tasks that deliver value.

Decisions are the starting point for most of the organization.
A decision is a choice between options in a situation involving uncertainty, with a risk that the outcome won’t meet a goal.

Planning data strategy means understanding the context of data use so we can build the right infrastructure.
We need to focus on what people do with information as the primary task, not on the data or the technology.

General model for organizational use of data
Monitor → Analyze exceptions → Analyze causes → Decide → Act
Exits along the way: no problem, no idea, do nothing.
▪ Act within the process: usually real-time to daily.
▪ Collect new data and act on the process: usually days or a longer timeframe.

You need to be able to support both paths
▪ Acting within the process: conventional BI, with the addition of EDM
▪ Collecting new data and acting on the process: causal analysis, “data science”
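The general model for organizational use of data can be read as a pipeline in which each stage may end the flow early. The sketch below encodes it as a small state machine. Pairing each exit with a stage (no problem after analyzing exceptions, no idea after analyzing causes, do nothing after deciding) is our reading of the flattened slide, and the `Signal` fields are hypothetical stand-ins for what each stage learns.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum, auto


class Outcome(Enum):
    NO_PROBLEM = auto()
    NO_IDEA = auto()
    DO_NOTHING = auto()
    ACTED = auto()


@dataclass
class Signal:
    # Hypothetical inputs; in practice each comes from people and data.
    is_real_exception: bool  # did analysis confirm a genuine problem?
    cause_found: bool        # could analysts explain why it happened?
    action_chosen: bool      # did the decision-maker choose to act?


def run_decision_cycle(signal: Signal) -> tuple[list[str], Outcome]:
    """Walk Monitor -> Analyze exceptions -> Analyze causes -> Decide -> Act."""
    trail = ["monitor", "analyze exceptions"]
    if not signal.is_real_exception:
        return trail, Outcome.NO_PROBLEM
    trail.append("analyze causes")
    if not signal.cause_found:
        return trail, Outcome.NO_IDEA
    trail.append("decide")
    if not signal.action_chosen:
        return trail, Outcome.DO_NOTHING
    trail.append("act")
    return trail, Outcome.ACTED


if __name__ == "__main__":
    steps, outcome = run_decision_cycle(Signal(True, False, True))
    print(" -> ".join(steps), "|", outcome.name)
```

One plausible reading, not spelled out in the deck, is that the `NO_IDEA` exit is what triggers the slower path: collecting new data and doing causal analysis before acting on the process.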
Act: the part that creates the most problems

Decision assumptions
▪ Deliberation: actions are consciously chosen.
▪ Rationality: people make logical decisions. Sure they do.
▪ Order: systems are understandable and the results of actions predictable.
What’s the reality in most organizations? Irrationality, vanity, unreasonable behavior, politics, bureaucracy, doing the same things repeatedly.

[Image slides: Where data really comes from; A very abstract business intelligence model]

Who are the people making decisions?
Strategic, tactical, operational.

The process aspect of decisions connects people
Scope of control for people in most organizations aligns with the process: operational staff work in the process, tactical managers work on the process, strategic leaders work over the process.
Exceptions that one level cannot handle because of a rule, procedure, or policy deficiency are escalated to the next.

What is the nature of their decisions?
Dimensions: scope, time frame of the decision, time scale of the data, data volume, breadth of data, frequency, pattern-based vs. fact-based.
▪ Strategic: months; pattern-based; broad scope
▪ Tactical: days to weeks; fact-based; moderate scope
▪ Operational: minutes to days; rule-based; narrow scope
(Plotted against analytic complexity.)

How and where can you apply information?
▪ Strategic: high single value, less frequent, so improve the effectiveness of individual decisions.
▪ Tactical: fuzzy middle ground.
▪ Operational: low single value, frequent, so improve efficiency or effectiveness for a large aggregate improvement.
Strategy to execution.

What kind of support do people have today?
▪ Strategic: dashboards, scorecards, but mainly other people
▪ Tactical: email, meetings, dashboards
▪ Operational: reports, dashboards (the realm of traditional BI)
The reality of most reports and dashboards is that they provide basic monitoring at best.
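The escalation rule on the “process aspect of decisions” slide (an exception that one level’s rules, procedures, or policies cannot handle goes to the next level) can be sketched as a chain of handlers. The three level names come from the deck; the policy sets and exception labels below are hypothetical examples.

```python
from __future__ import annotations

from typing import Optional

# Levels ordered from narrowest to broadest scope of control.
LEVELS = ["operational", "tactical", "strategic"]

# Hypothetical policies: which exception types each level has a rule for.
POLICIES: dict[str, set[str]] = {
    "operational": {"late_shipment", "stockout"},
    "tactical": {"supplier_failure"},
    "strategic": {"market_shift"},
}


def escalate(exception: str) -> Optional[str]:
    """Return the first level whose policy covers the exception, else None."""
    for level in LEVELS:
        if exception in POLICIES[level]:
            return level
        # No rule at this level: the exception moves up to the next one.
    return None  # Not covered at any level.


if __name__ == "__main__":
    for exc in ("stockout", "supplier_failure", "pandemic"):
        print(exc, "->", escalate(exc))
```

The `None` case marks an exception that no level’s policy covers.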
Differing decision goals and needs create tension
▪ Managing the business (strategic): want change, seek adaptation
▪ Operating the business (operational): want stability, seek consistency
Tactical sits between the two.
There is a difference between operating a business and managing a business.