Big Data and the Nature of Business Decisions

April, 2013

Mark Madsen
www.ThirdNature.net
@markmadsen

Our ideas about information and how it’s used are outdated.

How We Think of Users

The conventional design point is the passive consumer of information.
Proof: methodology
▪ IT role is requirements, design, build, deploy, administer
▪ User role is to receive data and run reports
Self‐service BI is not like picking the right doughnut from a box.

How We Want Users to Think of Us

What Users Really Think

Food supply chain: an analogy for data

Multiple contexts of use, differing quality levels

“What do you mean, ‘only doughnuts?’”
“I never said the ‘E’ in EDW meant ‘everything’…”

It’s going to get a lot bigger

E

Not E!

Everything is digital. It’s no longer just rows and columns, it’s bits.

The sensor data revolution

Sensor data doesn’t fit well with current methods of collection and storage, or with the technology used to process and analyze it.

Copyright Third Nature, Inc.

Unstructured is really unmodeled.

We turn text into data, but we don’t model it by hand.

▪ Sentiment, tone, opinion
▪ Words & counts, keywords, tags
▪ Topics, genres, relationships, abstracts
▪ Categories, taxonomies
▪ Entities: people, places, things, events, IDs

Three kinds of measurement data we collect

The convenient data is transactional data.
▪ Goes in the DW and is used, even if it isn’t the right measurement.
The difficult and misleading data is declarative data.
▪ What people say and what they do require ground truth.
The inconvenient data is observational data.
▪ It’s not neat, clean, or designed into most systems of operation.
We need to make use of all three.

“Big data is unprecedented.”
‐ Anyone involved with big data in even the most barely perceptible way

We’ve been here before

Source: Bill Schmarzo, EMC

BI is now a commodity, a cost of doing business

Big Data, Big Hype

$876 Gajillion (analyst estimates of the big data market)

“Big” is the oldest, easiest problem to solve

Image courtesy of Teradata

“Big” is well supported by databases now

Source: Noumenal, Inc.

Commoditization is the fundamental driver

Chart: calculations per second per $1000, 1900 to 2000, spanning mechanical, relay, vacuum tube, transistor and integrated circuit computing (annotated “10,000 X improvement”). Data: Ray Kurzweil, 2001

Storage costs have declined with computing costs

With big data systems, the cost of storing data is an order of magnitude lower than with databases today (but not the cost or ability to query it back out). Processing data at scale is at least an order of magnitude cheaper too.
Source: Venturebeat

Parallel computing: the underlying technology

“In pioneer days they used oxen for heavy pulling, and when one ox couldn't budge a log, they didn't try to grow a larger ox. We shouldn't be trying for bigger computers, but for more systems of computers.”
Grace Hopper

Cloud Computing: A Big Data Enabler

What you see: a seemingly infinite resource to apply to computing problems on short notice and at low cost

Key impacts of the cloud computing model: utility computing
▪ Pay for the resources you use, when you use them
▪ Expense instead of capital
▪ Elastic: scale up and down, like a utility
▪ Speed to acquire and deploy resources

But: cloud is built for scalability, not DB response time

Two big data straw men used by vendors in our market

It’s a poor man’s ETL!
It’s a poor man’s database!

Hadoop: a summary of the magic
1. Provides both storage and complex processing as part of the same platform
2. Makes parallel programming more accessible
3. Schemaless, therefore flexible
4. Inexpensive, reliable scale‐out
5. Potential for fast, scalable ingest
6. The Apache version is free

The bad stuff:
▪ Not for mutable data
▪ Simple file‐based sequential processing
▪ Zero data management

An important Hadoop + cloud computing benefit

Scalability is free. If your task requires 10 units of work, you can decide when you want results:
▪ 1 server, 10 units of time
▪ 10 servers, 1 unit of time
Cost is the same. Not true of the conventional IT model.

Quantitative differences can be qualitative
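The “scalability is free” arithmetic is itself a quantitative difference of this kind: the same work, at the same cost, finishes ten times sooner. What follows is a minimal sketch only. It assumes perfectly divisible work, no coordination overhead, and a flat price per server per unit of time, none of which the slide states. `job_cost` and `price_per_server_unit` are hypothetical names.

```python
def job_cost(units_of_work, servers, price_per_server_unit=1.0):
    """Return (elapsed_time, total_cost) for an idealized, perfectly parallel job.

    Assumptions (not from the slides): work splits evenly across servers,
    there is no coordination overhead, and every server is billed only for
    the time it is actually used.
    """
    elapsed_time = units_of_work / servers
    total_cost = servers * elapsed_time * price_per_server_unit
    return elapsed_time, total_cost


# The two cases from the slide: 10 units of work on 1 server or on 10 servers.
for servers in (1, 10):
    elapsed_time, total_cost = job_cost(10, servers)
    print(f"{servers} server(s): time={elapsed_time}, cost={total_cost}")
```

Under these assumptions the cost term is always `units_of_work * price_per_server_unit`, so only the elapsed time changes with the server count. Real jobs pay some overhead for distribution, so the equality is an idealization.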

Eadweard Muybridge, 1878

“Faster” is a qualitative difference (when there’s enough of it)

Big data enables this kind of “faster” for processing workloads, as well as deeper analytics and new analytics.

Big data value: There’s a pony in there somewhere…

The myth that still drives big data

All we need is a fat pipe and pans working in parallel…

You change an organization by acting with and through others, not alone.

What really happens to most great insights

If you don’t have a way to turn that insight into an action within the organization, then you are producing expensive trivia.

The Three, Four, Six, Many V’s of Big Data

I got a fever, and the only prescription that can cure it is MORE V’s!

Common belief: the more V’s you have, the more budget you get.

Much of the big data value comes from analytics

BI is a retrieval problem, not a computational problem.

Five basic things you can do with analytics
▪ Prediction – what is most likely to happen?
▪ Estimation – what’s the future value of a variable?
▪ Description – what relationships exist in the data?
▪ Simulation – what could happen?
▪ Prescription – what should you do?

Analytic Maturity: This is Nonsense

Chart: business value (low to high) plotted against organizational maturity, in four stages:
▪ Reporting: “What happened?” Static & interactive reports
▪ Analysis: “Why did it happen?” Query, Excel, OLAP, visual discovery
▪ Operational BI / realtime BI: “What’s happening?” Dashboards, scorecards
▪ Predictive analytics: “What will happen?” Statistics, data mining, optimization

Organizations do all of these in different places at different times. “Why” is the hardest question to answer, not a factual question.

This model isn’t built around what people do. It’s built around classes of technology.

Two keys to making big data worthwhile

Value: Goal → solution, not Solution → goal
Actionability: Simple “value” isn’t enough. Information has to be actionable, somehow.

We think of BI as publishing, an old metaphor.

Publishing has value, but may not be actionable.

Data is not the end of the line, it’s the departure point

We ignored the important tasks that deliver value.

Decisions are the starting point for most of the organization.

A decision is a choice between options in a situation involving uncertainty, with a risk that the outcome won’t meet a goal.

Planning data strategy means understanding the context of data use so we can build infrastructure

We need to focus on what people do with information as the primary task, not on the data or the technology.

Monitor → Analyze exceptions → Analyze causes → Decide → Act

Exits: no problem, no idea, do nothing
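The monitor-to-act sequence above can be sketched as a small control loop. This is an illustration only. It assumes each exit label attaches to one stage ("no problem" after exception analysis, "no idea" after causal analysis, "do nothing" at the decision), which is a reading of the diagram rather than something the slides state. All function and parameter names, and the stock-level example, are hypothetical.

```python
from enum import Enum
from typing import Callable, Optional, Tuple


class Outcome(Enum):
    NO_PROBLEM = "no problem"   # nothing exceptional was observed
    NO_IDEA = "no idea"         # an exception was seen, but no cause was found
    DO_NOTHING = "do nothing"   # cause known, but no action was chosen
    ACT = "act"                 # an action was chosen


def run_cycle(
    reading: float,
    is_exception: Callable[[float], bool],
    find_cause: Callable[[float], Optional[str]],
    choose_action: Callable[[str], Optional[str]],
) -> Tuple[Outcome, Optional[str]]:
    """One pass: monitor -> analyze exceptions -> analyze causes -> decide -> act."""
    if not is_exception(reading):      # monitor / analyze exceptions
        return Outcome.NO_PROBLEM, None
    cause = find_cause(reading)        # analyze causes
    if cause is None:
        return Outcome.NO_IDEA, None
    action = choose_action(cause)      # decide
    if action is None:
        return Outcome.DO_NOTHING, None
    return Outcome.ACT, action         # act


# Hypothetical example: a stock level monitored against a reorder point.
def below_reorder_point(stock: float) -> bool:
    return stock < 100


def diagnose(stock: float) -> Optional[str]:
    return "demand spike" if stock < 50 else None


def decide(cause: str) -> Optional[str]:
    return "expedite order" if cause == "demand spike" else None


for stock in (150, 80, 20):
    print(stock, run_cycle(stock, below_reorder_point, diagnose, decide))
```

Passing the three steps in as functions keeps the loop itself independent of any particular data source or rule set. Only the sequence and its early exits are fixed.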

General model for organizational use of data

Monitor → Analyze exceptions → Analyze causes → Decide → Act
Exits: no problem, no idea, do nothing
Act within the process: usually real-time to daily

General model for organizational use of data

Monitor → Analyze exceptions → Analyze causes → Decide → Act
Exits: no problem, no idea, do nothing
Act on the process (collect new data): usually days or a longer timeframe

You need to be able to support both paths

Monitor → Analyze exceptions → Analyze causes → Decide → Act
▪ Act on the process (collect new data): causal analysis, “data science”
▪ Act within the process: conventional BI, addition of EDM

Act: the part that creates the most problems

Decision Assumptions

Deliberation
▪ Actions are consciously chosen.
Rationality
▪ People make logical decisions. Sure they do.
Order
▪ Systems are understandable and the results of actions predictable.

What’s the reality in most organizations? Irrationality, vanity, unreasonable behavior, politics, bureaucracy, doing the same things repeatedly.

Where data really comes from

A very abstract business intelligence model

Who are the people making decisions?

Strategic
Tactical
Operational

The process aspect of decisions connects people

Scope of control for people in most organizations aligns: in process, on process, over process

Strategic

Tactical

Operational

Exceptions that are not handled at one level, because of a deficiency in rules, procedures or policy, are escalated to the next.

What is the nature of their decisions?

Scope, time frame of decision, time scale of data, data volume, breadth of data, frequency, pattern vs fact‐based

▪ Strategic: months; pattern‐based; broad scope
▪ Tactical: days to weeks; fact‐based; moderate scope
▪ Operational: minutes to days; rule‐based; narrow scope
Axis: analytic complexity

How and where can you apply information?

▪ Strategic: high single value, less frequent, so improve the effectiveness of individual decisions.
▪ Tactical: fuzzy middle ground.
▪ Operational: low single value, frequent, so you can improve the efficiency or the effectiveness for large aggregate improvement.
Axis: analytic complexity

Strategy to Execution

What kind of support do people have today?

▪ Strategic: dashboards, scorecards, but mainly other people
▪ Tactical: email, meetings, dashboards
▪ Operational: reports, dashboards (the realm of traditional BI)

The reality of most reports and dashboards is that they provide basic monitoring at best.

Differing decision goals and needs create tension

Managing the business:
▪ Want change
▪ Seek adaptation
Operating the business:
▪ Want stability
▪ Seek consistency
Levels shown: Strategic, Tactical, Operational

There is a difference between operating a business and managing a business. Most BI / BA today supports operating.

Business management has changed due to information

Our simplistic notions of BI with stable models, ordered data and predictability are being replaced by concepts from decision support and complex adaptive systems (CAS).

Simple
▪ Assumption: Order
▪ Cause and effect is repeatable & predictable
▪ Known
▪ Standard processes, clear metrics, best practice
▪ Sense, categorize, respond
▪ Reporting, dashboards

Complicated
▪ Assumption: Unorder
▪ Cause and effect is separated in time & space, repeatable, learnable
▪ Knowable
▪ Analytical techniques to determine options, effects
▪ Sense, analyze, respond
▪ Ad‐hoc, OLAP, exploration

Complex
▪ Assumption: Disorder
▪ Cause and effect is coherent in retrospect only, modelable but changing
▪ Unpredictable
▪ Experiment to create possible options
▪ Test, sense, respond
▪ Data science, causal analysis

Situational context governs data use

Business intelligence support varies by decision context

Basic BI: handles this really well (most of the time).
▪ Assumption: Order
▪ Cause and effect is repeatable & predictable
▪ Known
▪ Standard processes, clear metrics, best practice
▪ Sense, categorize, respond
▪ Reporting, dashboards

Analysis: handles this sort of ok, sometimes.
▪ Assumption: Unorder
▪ Cause and effect is separated in time & space, repeatable, learnable
▪ Knowable
▪ Analytical techniques to determine options, effects
▪ Sense, analyze, respond
▪ Ad‐hoc, OLAP, data discovery

Data science, analytics: this, not so much.
▪ Assumption: Disorder
▪ Cause and effect is coherent in retrospect only, modelable but changing
▪ Unpredictable
▪ Experiment to create possible options, test hypotheses
▪ Test, sense, respond
▪ Causal analysis, simulation

The usage models for conventional BI

Monitor → Analyze exceptions → Analyze causes → Decide → Act
Exits: no problem, no idea, do nothing
▪ Act on the process (collect new data): usually days or a longer timeframe
▪ Act within the process: usually real-time to daily

This is what we’ve been doing with BI so far: static reporting, dashboards, ad-hoc query, OLAP

The usage models for analytics and “big data”

Monitor → Analyze exceptions → Analyze causes → Decide → Act
Exits: no problem, no idea, do nothing
▪ Act on the process (collect new data): usually days or a longer timeframe
▪ Act within the process: usually real-time to daily

Analytics and big data are focused on new use cases: deeper analysis, causes, prediction, optimizing decisions. This isn’t ad-hoc, reporting, or OLAP.

Somewhere along the way, the BI community lost sight of the real goal

The M-OODA loop, Rousseau & Breton, 2004

Where does our current infrastructure have trouble?
▪ Cost of growth, storing data
▪ Cost of and ability to deliver analytics
▪ Using non‐tabular data, like text and documents
▪ Supporting use of information in real time
▪ Time to deliver information for new business projects
▪ Supporting people in analysis

Hadoop Adoption

Some people can’t resist getting the next new thing because it’s new. Many IT organizations are like this, promoting a solution and hunting for the problem that matches it. Better to ask “What is the problem for which Hadoop is the answer?”

Business Intelligence vs Big Data / Analytics

Business Intelligence:
▪ focus is on retrieval and delivery of data
▪ monitoring and identifying exceptions
▪ little variability, ambiguity, uncertainty
▪ reporting, dashboards, scorecards, OLAP for bounded exploration and analysis
Business Analytics:
▪ focus is on generation of new data, insight/foresight
▪ exploring data, finding insights
▪ expect uncertainty and probability and pattern rather than specific data
▪ computational / probabilistic techniques
Both need to focus on action and goals to succeed.

There’s a shift in how we view and use analytics

P2M, M2P or M2M

Big changes for data warehousing workloads

The results of analytic processing can, and often do, feed back into the system from which they originate.
Much of the data is being read, written and processed in real time.
Our design point was not changing and ephemeral patterns.

Four core capabilities big data adds
1. Unlimited scale of storage, processing
▪ Agility, faster turnaround for new data requests (but not a replacement for BI)
▪ Fewer staff to accomplish same goals
2. New data accessibility
▪ More data retained for longer period
▪ Access to data unused due to cost or processing limits
▪ Any digital information becomes usable data
3. Scalable realtime processing
▪ Brings ability to monitor and act on data as events occur
4. Arbitrary analytics
▪ Faster analysis
▪ Deeper analysis
▪ More broadly accessible analytics

Big Data Shift in a Nutshell

It’s an architectural reconfiguration, just like web 2.0

The old model for data → The new model for data
▪ Read only → Read‐write
▪ Integrate before use → Integrate at time of use
▪ Record only important data → Record all the data
▪ Retrieval‐focused → Processing‐focused
▪ Single method of access → Multiple methods of access
▪ Deterministic models & use → Stochastic models & use
▪ Human‐level latency → Machine‐level latency
▪ Centralized publishing → Community creation

As a technology moves from emerging to commodity the nature of acquiring, using and managing it changes

▪ Innovation: generate options; novel practice; maximize value
▪ Maturation: constrain choices; good practice; optimize
▪ Saturation: standardize / minimize choice; best practice; minimize costs
Spanning labels, left to right: Adaptation, Acquisition; Agile & open source* methods, 6 Sigma & process methods

Best of Breed or Integrated?

IT mega-vendors rarely offer value in an early market

Designing for data: monolithic vendor technology‐based classifications of the ecosystem won’t help

These types of eye charts provide a categorization of what’s available, not what you need. They ignore the contexts of use that are most important.

State of the market

It’s a supply‐side market. VC accelerated technology development beyond the ability of most organizations to adopt.
We are in an early stage.
People are expensive, machines are cheap, so delivery will change.
We need to develop new skills and learn how to apply the new technology, data and techniques to business problems, as in the ’80s.

Questions?

“When a new technology rolls over you, you're either part of the steamroller or part of the road.”

About the Presenter

Mark Madsen is president of Third Nature, a research and advisory firm focused on analytics, business intelligence and data management. Mark is an award‐winning author, architect and CTO whose work has been featured in numerous industry publications. Over the past ten years Mark received awards for his work from the American Productivity & Quality Center, TDWI, and the Smithsonian Institution. He is an international speaker and a contributor at Forbes Online and Information Management. For more information or to contact Mark, follow @markmadsen on Twitter or visit http://ThirdNature.net

About Third Nature

Third Nature is a research and consulting firm focused on new and emerging technology and practices in analytics, business intelligence, and performance management. If your question is related to data, analytics, information strategy and technology infrastructure, then you’re at the right place. Our goal is to help companies take advantage of information-driven management practices and applications. We offer education, consulting and research services to support business and IT organizations as well as technology vendors. We fill the gap between what the industry analyst firms cover and what IT needs. We specialize in product and technology analysis, so we look at emerging technologies and markets, evaluating technology and how it is applied rather than vendor market positions.

CC Image Attributions

Thanks to the people who supplied the creative commons licensed images used in this presentation:

Outdated gumshoe.jpg ‐ http://flickr.com/photos/olivander/372385317/
donuts_4_views.jpg ‐ http://www.flickr.com/photos/le_hibou/76718773/
wheat_field.jpg ‐ http://www.flickr.com/photos/ecstaticist/1120119742/
straw men.jpg ‐ http://www.flickr.com/photos/robinellis/6034919721/
ponies in field.jpg ‐ http://www.flickr.com/photos/bulle_de/352732514/
train_to_sea.jpg ‐ http://www.flickr.com/photos/innoxiuss/457069767/
chinatown little color gate.jpg ‐ http://www.flickr.com/photos/paullikespics/3248133830/
where data really comes from ‐ Blake Stacy
klein_bottle_red.jpg ‐ http://flickr.com/photos/sveinhal/2081201200/