The New Analytics Lifecycle Accelerating Insight to Drive Innovation

Dave Wells, November 2017

About the Author

Dave Wells is an advisory consultant, educator, and research analyst dedicated to building meaningful connections along the path from data to business impact. He works at the intersection of information and business, driving value through analytics and innovation. With nearly five decades of combined experience in information management and business management, Dave has a unique perspective on the connections of business, information, data, and technology.

Knowledge sharing and skills building are Dave’s passions, carried out through consulting, speaking, teaching, research, and writing. He is a continuous learner—fascinated with understanding how we think—and a student and practitioner of systems thinking, critical thinking, design thinking, divergent thinking, and innovation.

About Eckerson Group

Eckerson Group is a research and consulting firm that helps business and analytics leaders use data and technology to drive better insights and actions. Through its reports and advisory services, the firm helps companies maximize their investments in data and analytics. Its researchers and consultants each have more than 20 years of experience in the field and are uniquely qualified to help business and technical leaders succeed with business intelligence, analytics, data management, data governance, and performance management.

About This Report

This report is sponsored by Alteryx, Amazon Web Services, and Tableau.

© Eckerson Group 2017 www.eckerson.com 2 The New Analytics Lifecycle

Executive Summary

Analytics is at the forefront of modern business management. The role of analytics for informed decision making is well known, and much attention is given to the value of insight. The power of analytics, however, stretches far beyond insight. The next generation of analytics sees insight as only the first step—the spark that ignites imagination, ideation, and inspiration—on a path to innovation. Analytics-driven innovation is a game changer, but getting there demands change. We must rethink analytics processes, compress the analytics lifecycle, and apply the right technologies in the right ways to achieve fast, scalable, accessible, and collaborative analytics. When frequency of insights matches or exceeds frequency of questions and uncertainties, we’ve achieved fast analytics. When analytics capacity dynamically adapts to expanding data volumes and growing workloads, we have scalable analytics. When data analysts from non-technical line-of-business people to highly skilled data scientists can meet their own needs, we have accessible self-service analytics. When analytics drives communication, conversation, common understanding, and shared insights, we have collaborative analytics. Fast, scalable, accessible, and collaborative—these are the keys to analytics-driven innovation.

The Age of Innovation

The Demand for Innovation

Innovation is an essential business capability. The leading companies of today continuously innovate and adapt their business models, not simply responding to change but in many instances driving change. Looking to the future, the surviving companies of the next generation must embrace innovation as a core competency. Amazon and Google are well known examples of serial innovators repeatedly reshaping markets, products, and consumer expectations. Considering the fate of companies who failed to innovate—Blockbuster, Xerox, Blackberry, and others—it is clear that innovation is a critical part of survival in today’s business environment. Successful companies understand the importance of innovation. Successful innovators understand the very strong connection between analytics and innovation.

Insight Drives Innovation

Innovation isn’t something magical that only occurs through miraculous bursts of exceptional creativity or the amazing capabilities of a lone genius. Innovation is a process that begins with insight—the ability to see deep inside markets, problems, and opportunities. Insight fuels ideas, imagination, and inspiration that are key ingredients of innovation. (See figure 1.)

Figure 1. Moving from Insight to Innovation

Analytics is an essential component. Don’t underestimate the important role of analytics for becoming an innovative organization. Analytics and insight form the bridge from data to innovation.

The Modern Analytics Organization

Embracing and Enabling Self-Service

The world of analytics changed radically with the introduction of self-service data technologies: first BI and visualization, followed quickly by data preparation and data catalogs, and then today’s self-service analytics tools. Code-free analysis tools became the line-of-business tools of choice.

Self-service analytics grew from demand. Business people want and need faster analytics than is practical with an IT-centric approach. The demand for data, reporting, and analytics has grown and typical IT departments simply can’t keep up with growing needs. But the drivers of self-service technology go well beyond speed of delivery, including the following:

Adaptability. Analytics needs to be fast but also adaptable. Every answer brings new questions that frequently demand new data. Adapting to rapidly evolving needs throughout the processes of discovery and analysis is an important aspect of self-service. Adapting to data infrastructure is also an important feature in today’s complex data management world where data may be accessed from on-premises, cloud, and web sources.

Affordability. Analytics needs to be ROI-friendly. Many analytics projects are one-off efforts that don’t justify the cost of a typical IT project. The investment needs to yield results quickly, and in a way that effectively empowers all users. For a typical line-of-business analyst the user experience needs to be similar to that of working in Excel, but with advanced analysis, spatial, and visualization capabilities.

Autonomy. Business analysts, data scientists, and functional data analysts want their analytics processes to be local and “in my control” with freedom to explore, discover, and change direction, as needed.

Self-service data and analysis are mainstream practices. You can’t put the genie back in the bottle.

Self-service BI and visualization tools were quickly followed by self-service data preparation tools and data catalogs, further advancing the capabilities of line-of-business analysts. Self-service brings challenges (scalability, governance, IT support, and more), but you can’t put the genie back in the bottle. Self-service data and analysis are mainstream practices that every analytics organization must embrace.

Modernizing Data Infrastructure

As the demand for fast and abundant data continues to grow, enterprise data infrastructure is evolving. The data warehouse of the past struggles to keep pace with today’s needs. The data warehouse needs to be modernized.

Cloud data warehousing has many advantages but value is only realized when the data is readily accessible to all who need it.

Cloud data warehousing is becoming increasingly popular with IT and data management organizations. Migrating to the cloud offers many advantages including scalability, elasticity, managed infrastructure, rapid deployment, and fast processing. However, cloud data warehousing investments will only return value when the data is readily accessible to all who need it. Ideally, self-service tools make it easy to access data from any location and to blend data from multiple locations. Self-service technologies must adapt as data infrastructure evolves.

Relieving the Burden on IT

IT organizations can’t (and probably shouldn’t) provide all of the data, reporting, and analysis that is needed by the ever-increasing numbers of people who use data. Embracing self-service unburdens IT organizations by shifting part of the onslaught of analysis and reporting requests from IT work queues to line-of-business projects. The obvious benefit is shrinking the backlog of unfilled requests by moving the workload. But technical organizations also gain advantage directly from the self-service tools. IT analysts and developers using modern data preparation tools can prepare data for business use faster, more efficiently, and with fewer errors and oversights than doing it the old way. Analysts, developers, and report writers who use the most current reporting and analysis tools do their work faster with less coding, greater agility, and enhanced collaboration capabilities. Today’s most advanced analytics technologies also help IT operations personnel to deploy, monitor, and manage analytic models and applications.

Building an Analytics Community

Data analysis is pervasive in business today. It happens at all levels from non-technical line-of-business analysts to highly specialized statisticians and data scientists. The reach of analytics is even broader and deeper, touching everyone from C-level to front-line operations staff.

Analytics technology helps to build the analytics community with features for collaborative data exploration, repeatable and shared workflows, and managed deployment and operations.

Fostering analytics culture and becoming an analytics-driven enterprise begins by building an analytics community—a community founded on three principles:

Analyst collaboration. Cooperative, adaptive, and open analytics culture is a core element of organizations where self-service analytics, enterprise analytics, and advanced data science practices are complementary. Data sharing, model sharing, and insight sharing thrive in a collaborative culture that emphasizes communication and coaching and reduces redundancy, inconsistency, and repetition. Analytics technology with powerful collaboration and work-sharing features contributes extensively to building an analytics community.

Organizational federation. Most enterprises have multiple groups and individuals performing data analysis, often with little knowledge of others within the company doing similar work. Recognize the many places throughout the enterprise where data analysis is performed (analysis of all types, from basic visualizations to spatial, predictive, and prescriptive models). Bringing together the “pockets” of analytics is a key practice for building an analytics community. Seek to achieve knowledge and information sharing among semi-autonomous lines of business, and between line-of-business analysts, IT organizations, and data science teams.

Analytics technology enables organizational federation with features such as reusable and repeatable analytics workflows, collaborative data exploration and analysis, and managed deployment and operations.

Three-tier analytics. The analytics community will struggle as a highly centralized organization, and will be equally challenged if fully decentralized. Three levels of analytics services (self, shared, and central) must work together to build a fully functioning community. A three-tier analytics services model supports the following:

• Self-service for local and autonomous work.
• Shared services to promote reuse of analysis and workflows across lines of business.
• Central services for business-critical, enterprise-wide, and highly technical analytics.

The three-tier model is enabled by analytics technology that supports and encourages interoperability and sharing between semi-autonomous, decentralized lines of business, information technology organizations, data science teams, and BI stakeholders.

Rethinking the Analytics Lifecycle

Traditional Analytics

Data Analysis the Old, Hard Way

Traditional data analysis is frequently a long, slow process that fails to deliver the needed knowledge and insights, or fails to deliver as quickly as is needed. Analysis is a lengthy process of many phases that are organized as a linear process but usually executed with much unintended backtracking. The protracted path from information needs to decision making often diminishes the business impact of analysis.

Figure 2. Data Analysis the Old, Hard Way (steps shown: requirements specification; data seeking; data evaluation; access authorization; data acquisition; data profiling and understanding; data improvement and enrichment; data blending and formatting; exploration and experimentation; modeling and analysis; communication and sharing; implementation; decision making & business outcomes)

Twelve Steps

The old process for getting from information needs to decisions progresses in a somewhat linear fashion through a series of 12 steps. (See figure 2.)

Requirements Specification. The first step of traditional data analysis is to identify and understand the kinds of information that the business needs. Early specification of requirements limits agility and adaptability and is especially difficult when needs are vague, uncertain, and elusive.

Data Seeking. Finding data is a long-standing challenge of analytics. Self-service and line-of-business analysts often work with less than ideal datasets, relying on past experience, tribal knowledge, and personal networks as primary data finding techniques.

Data Evaluation. Matching data to analysis needs is a process of evaluating data contents and quality. First the analyst needs to determine whether the dataset contains a rich enough collection of variables to support the analysis. With the right set of variables, data quality is then checked, including completeness, timeliness, and accuracy.

Access Authorization. Securing permission to access a dataset frequently impedes the analysis process. Whether due to policy governing security and privacy of sensitive data, or the result of territorialism and politics, access authorization is a common chokepoint for data analysis.

Data Acquisition. After authorization, the next step is to acquire the data. Sometimes this means waiting for IT or the owning organization to provide an extract or copy of a dataset. At other times the challenge is to determine the right methods and protocols to access the data directly.

Data Profiling and Understanding. Meaningful data analysis can’t be performed without first understanding the contents of each dataset. Profiling is a much deeper and more systematic dive into data content and quality than the earlier data evaluation step. It entails understanding the shape of the data, knowing the range and distribution of values for analysis variables, identifying outliers, discovering relationships, knowing about null values, and discovering data anomalies.
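As a rough illustration of the profiling actions described above, the following Python sketch profiles one numeric column using only the standard library. The sample values, the profile_column name, and the 1.5 x IQR outlier rule are assumptions chosen for illustration; the report does not prescribe any particular tool or method.

```python
import statistics

def profile_column(values):
    """Summarize one numeric column: nulls, range, center, and outliers.

    `values` may contain None for missing entries. The 1.5 * IQR rule used
    to flag outliers is a common convention, assumed here for illustration.
    """
    present = [v for v in values if v is not None]
    q1, _median, q3 = statistics.quantiles(present, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "count": len(values),
        "nulls": len(values) - len(present),
        "min": min(present),
        "max": max(present),
        "mean": statistics.mean(present),
        "median": statistics.median(present),
        "outliers": [v for v in present if v < low or v > high],
    }

# Hypothetical order amounts, including one missing value and one anomaly.
order_amounts = [120, 135, None, 128, 140, 131, 125, 990, 133]
profile = profile_column(order_amounts)
```

In practice a self-service tool would surface the same facts (null counts, ranges, distributions, outliers) interactively; the sketch only makes the underlying checks concrete.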

Data Improvement and Enrichment. Data improvement is data preparation work to standardize, cleanse, and de-duplicate the contents of a dataset. Data enrichment is preparation work to increase analytic value of a data set through derivation of new data, appending data from other sources, and aggregating data to achieve the right granularity for the analysis.
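The improvement and enrichment operations named above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the sample records, the field names, and the "large order" threshold are hypothetical and serve only to show standardizing, cleansing, de-duplicating, deriving a new field, and aggregating to a coarser granularity.

```python
from collections import defaultdict

# Hypothetical raw customer orders with inconsistent formatting and a duplicate.
raw_orders = [
    {"customer": " Acme Corp ", "region": "west", "amount": "100.50"},
    {"customer": "acme corp", "region": "West", "amount": "100.50"},
    {"customer": "Globex", "region": "EAST", "amount": "75"},
    {"customer": "Initech", "region": "east", "amount": "20.25"},
]

def improve(rows):
    """Standardize text fields, convert amounts to numbers, and de-duplicate."""
    seen, cleaned = set(), []
    for row in rows:
        record = {
            "customer": row["customer"].strip().title(),
            "region": row["region"].strip().lower(),
            "amount": float(row["amount"]),
        }
        key = (record["customer"], record["region"], record["amount"])
        if key not in seen:  # keep only the first copy of each record
            seen.add(key)
            cleaned.append(record)
    return cleaned

def enrich(rows, large_order_threshold=50.0):
    """Derive a new field on each row (the threshold is an assumed rule)."""
    return [dict(row, is_large=row["amount"] >= large_order_threshold) for row in rows]

def aggregate_by_region(rows):
    """Roll rows up to region granularity by summing amounts."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)

orders = enrich(improve(raw_orders))
region_totals = aggregate_by_region(orders)
```

Treating two records as duplicates only when every standardized field matches is a deliberately strict choice; real datasets often need looser matching rules.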

Data Blending and Formatting. Data blending is data preparation work that combines data from two or more datasets into a single dataset for analysis. Blending often brings together data from multiple sources—ERP, legacy systems, cloud applications, web, and more. Blending data of unlike forms and formats such as relational, flat files, tagged data, and NoSQL is also common. Formatting organizes and structures data in the right form for the target analysis tool. Formatting may include sorting records, sequencing fields within records, sampling and filtering, and masking or obfuscating protected data.
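To make blending and formatting concrete, here is a small Python sketch that joins two datasets on a shared key and then filters, masks, sequences, and sorts the result. The dataset contents, the customer_id key, and the min_total filter are hypothetical. The truncated hash used for masking is only an illustration and should not be read as an adequate privacy control.

```python
import hashlib

# Hypothetical datasets from two sources that share a customer_id key.
crm_customers = [
    {"customer_id": 1, "name": "Acme Corp", "email": "ops@acme.example"},
    {"customer_id": 2, "name": "Globex", "email": "it@globex.example"},
]
erp_invoices = [
    {"customer_id": 2, "invoice_total": 75.0},
    {"customer_id": 1, "invoice_total": 100.5},
    {"customer_id": 1, "invoice_total": 40.0},
]

def blend(customers, invoices):
    """Combine two datasets into one by joining on customer_id (an inner join)."""
    by_id = {c["customer_id"]: c for c in customers}
    return [
        {**by_id[inv["customer_id"]], **inv}
        for inv in invoices
        if inv["customer_id"] in by_id
    ]

def mask(value):
    """Obfuscate a protected value with a short one-way hash token."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]

def format_for_analysis(rows, min_total=50.0):
    """Filter, mask protected data, sequence fields, and sort records."""
    field_order = ["customer_id", "name", "email", "invoice_total"]
    kept = [dict(r, email=mask(r["email"])) for r in rows if r["invoice_total"] >= min_total]
    ordered = [{field: r[field] for field in field_order} for r in kept]
    return sorted(ordered, key=lambda r: (r["customer_id"], r["invoice_total"]))

analysis_ready = format_for_analysis(blend(crm_customers, erp_invoices))
```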

Exploration and Experimentation. Exploring and experimenting with data uses techniques to gain deeper understanding of data relationships and dynamics and to discover hidden ways to derive knowledge and insight from the data.

Modeling and Analysis. Modeling and analysis applies algorithms and statistical methods to derive insights from data. Model forms vary based on the goals of analysis—discovery, diagnosis, simulation, prediction, etc.
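As one small example of a predictive model form, the sketch below fits a straight line by ordinary least squares and uses it to forecast. The spend and sales figures are invented for illustration, and a single-predictor linear fit is chosen here only because it is compact; the report does not name a specific algorithm.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Hypothetical monthly figures: marketing spend and resulting sales (in $1,000s).
spend = [1, 2, 3, 4]
sales = [3, 5, 7, 9]

slope, intercept = fit_line(spend, sales)
forecast = slope * 5 + intercept  # predicted sales at a spend of 5
```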

Visualization and Communication. Sharing analytic insights involves producing charts, graphs, infographics, or other visual means to illustrate the findings of analysis, and supporting the visuals with interpretation and explanation.

Implementation. Analytic models and applications with ongoing value are promoted from a development and test environment to a production environment where they become an operational asset.

Figure 3. Data-Driven Innovation the Modern Way (activities shown: frame the problem; find, evaluate, and acquire data; explore, profile, and understand data; prepare data (improve, enrich, blend, format); explore, experiment, model, and understand; communicate and tell data stories; operationalize; decide, act, and innovate)

Limited Impact

This long and slow process of data analysis limits business impact. Timeliness of insight is a critical element for creating business value. As data and analysis latency increases, business value decreases. Fresh and current insight can drive opportunity realization and innovation. The value of delayed and retrospective insight is limited to management decision making.

Decision support with a fresh coat of paint?

Perhaps this traditional form of data analysis is really just decision support with a fresh coat of paint. The impact of analysis is limited for several reasons:

• The process has too many steps.
• It is built on steps instead of actions. A step-oriented process builds walls with frequent change of teams and tools when moving from one step to the next.
• The steps are perceived as linear although they don’t really work that way. In practice it is often a process of two steps forward and one step back.
• The result is frequently too many charts and not enough insights.

Modern Analytics

Data-Driven Innovation for the 21st Century

Making the leap from data-informed decision support to data-driven innovation demands processes and technologies that remove the limits of the traditional approach. We need to focus on actions instead of steps, and we need a compact set of non-linear actions that lead to fast and real insights. In short, we must compress the analytics lifecycle.

Seven Activities

The modern process of getting from business needs to innovation is an iterative journey of seven activities. (See figure 3.) The lifecycle is compressed by combining actions that make sense to perform together. For example, finding, evaluating, and acquiring data become a single cohesive activity whose actions are performed simultaneously. With the right technologies, there is no need for a linear step-by-step process.

Frame the Problem. Problem framing is distinctly different from requirements specification. The goal is to understand the nature of the business problem. Understand the business goal and how time, people, risk, resources, and other variables influence that goal. Don’t leap to requirements specification. Let requirements surface naturally through the analysis process.

Find, Evaluate, and Acquire Data. Managing data in the cloud can ease some of the pain of finding and acquiring data. The combination of cloud technologies with cloud data warehousing simplifies and accelerates data seeking, data access, and data blending. Data cataloging also makes data searchable and eliminates the tribal knowledge approach to finding data. Catalog metadata and annotations are used to evaluate the fit to analysis needs. Data can be acquired directly via a catalog tool that is protocol- and connector-aware and that enforces governance constraints.

Explore, Profile, and Understand Data. Exploration, profiling, and understanding are interwoven actions that involve looking at data from many different perspectives to gain knowledge of content, structure, and relationships while in the midst of the modeling process. Tools that support inline visualytics and interactive data exploration make this a quick and easy activity. Inline visualytics allows users to visually explore data as an integral part of data analysis activities such as blending, profiling, and modeling.

Prepare Data for Analysis. All data preparation work—improve, enrich, blend, and format—is performed iteratively and interactively as new knowledge is acquired by working with the data. Preparation is likely to be performed multiple times, iterating with the two adjacent activities. Tools with a rich library of data preparation operations and shared workflow capabilities bring speed and agility to data preparation.

Explore, Experiment, Model and Understand. Building analytic models is an intensely cognitive activity that makes little sense as a sequence of linear steps. Exploration is the act of finding meaning in the data (in contrast to earlier exploration to understand data contents). Experimentation is used to test hypotheses about data and how to gain insights from it. Modeling applies statistics and algorithms for classification, clustering, segmentation, association, sequencing, probability, and prediction—analytic methods used to extract insights from data. Analytics tools that support these actions as a single iterative activity instead of a series of steps are essential to the modern lifecycle. Tools that embed statistical and algorithmic techniques and make them accessible to non-technical users make self-service practical with modern analytics.

Data analysis tools are a critical component of modern analytics. They must support simultaneous and iterative exploration, experimentation, and modeling, and they must serve the needs of users with varying levels of technical skill, ranging from line-of-business data analysts to highly skilled data scientists. Satisfying the wide range of needs may require multiple analysis tools: code-free for non-technical users and advanced scripting for those who need or want to code, hiding complexity when it’s a barrier to analysis, and exposing complexity when needed. Give particular attention to tradeoffs between ease of use and advanced capabilities, bringing self-service data analysis capabilities to everyone from data novices to those who need advanced functions such as fuzzy matching, custom calculations, and geospatial analysis.
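For readers on the "code-friendly" end of that range, fuzzy matching can be sketched with Python's standard difflib module. The fuzzy_match helper, the reference list, and the 0.8 similarity cutoff are assumptions for illustration; commercial analytics tools implement their own matching methods, which may differ substantially.

```python
import difflib

def fuzzy_match(name, candidates, cutoff=0.8):
    """Return the closest candidate to `name`, or None if nothing is close enough.

    Comparison is case-insensitive. `cutoff` (0 to 1) is an assumed similarity
    threshold; the right value depends on the data.
    """
    lowered = {c.lower(): c for c in candidates}
    matches = difflib.get_close_matches(name.lower(), list(lowered), n=1, cutoff=cutoff)
    return lowered[matches[0]] if matches else None

# Hypothetical reference list to reconcile free-text entries against.
reference_customers = ["Acme Corporation", "Globex Industries", "Initech LLC"]

typo_match = fuzzy_match("Acme Corporaton", reference_customers)
no_match = fuzzy_match("Umbrella Group", reference_customers)
```

Here a misspelled name resolves to its reference entry, while an unrelated name returns None rather than a forced match.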

Communicate and Tell Data Stories. The analyst’s job goes beyond producing models and visualizing data. Analysts change business outcomes and make real business impact when they interpret data discoveries, expose insights, describe opportunities, and recommend actions to be taken. Data storytelling—blending visualization and narrative—is a proven and effective way to communicate the findings of analytics. Analytics tools that are especially strong for exploration, modeling, and visualization help the story to emerge as a natural product of data analysis. Data stories have greatest impact when they are shared and retold. Publishing analysis results to the cloud encourages communication and collaboration, fostering a culture of shared data, shared analytics, and shared insights.

Operationalize. The modern analytics lifecycle does more than simply implement analytic models and applications. The real goal is to operationalize: implement in production and then look beyond implementation to consider business adoption and day-to-day operations. Automation of repetitive analysis processes, publishing scheduled reports, posting analysis results, and writing back to a data warehouse are common operations activities. Analytic models as production systems implement recurring analysis and insight to support informed decision-making processes. Operationalization needs technology to deploy at enterprise scale and to automatically refresh analytics as data changes.
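The idea of refreshing analytics automatically as data changes can be illustrated with a small Python sketch that re-runs an analysis only when a fingerprint of its input differs from the last run. The RefreshingAnalysis class, the fingerprint helper, and the sales figures are hypothetical; enterprise platforms typically rely on schedulers or change-data-capture rather than hashing whole datasets.

```python
import hashlib
import json

def fingerprint(rows):
    """Return a stable hash of the dataset contents."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

class RefreshingAnalysis:
    """Re-run an analysis function only when its input data has changed."""

    def __init__(self, analyze):
        self._analyze = analyze
        self._last_fingerprint = None
        self._result = None
        self.refresh_count = 0

    def result(self, rows):
        current = fingerprint(rows)
        if current != self._last_fingerprint:
            self._result = self._analyze(rows)
            self._last_fingerprint = current
            self.refresh_count += 1
        return self._result

def total_sales(rows):
    """The recurring analysis: a simple total over hypothetical sales rows."""
    return sum(row["amount"] for row in rows)

sales = [{"amount": 100}, {"amount": 50}]
report = RefreshingAnalysis(total_sales)
first = report.result(sales)    # computed
second = report.result(sales)   # unchanged data, cached result reused
sales.append({"amount": 25})
third = report.result(sales)    # data changed, analysis refreshed
```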

Far-Reaching Impact

Realizing the vision of data-driven innovation requires a modern approach to analytics. Innovating with analytics demands fresh analytics, regularly refreshed analytics, real insights, and meaningful data stories: the triggers for ideation, imagination, and inspiration. The modern analytics lifecycle offers more than mildly enhanced decision support. Decision, action, and innovation are all within reach.

Modern Technologies for Modern Analytics

The modern analytics lifecycle is technology dependent. Data cataloging, data preparation, modeling and data analysis, and visualization must provide a seamless user experience if the lifecycle is to be adopted successfully. Modern analytic technology must span the continuum of analysts from non-technical line-of-business people performing self-service data analysis to data scientists building complex and highly technical analytic models. Deployment options ideally range from departmental to enterprise analytics with elastic and scalable infrastructure for enterprise deployment.

Cloud technologies are especially helpful with elasticity, scalability, and instant infrastructure to accelerate deployment. The cloud’s horizontal scaling capabilities offer substantial benefits when deploying self-service at enterprise scale. After deployment the maintenance-free infrastructure simplifies administration, and cloud elasticity makes it practical to rapidly and dynamically adapt to fluctuating workloads.

Every activity of the modern analytics lifecycle depends on tools and technology. (See figure 4.) Data cataloging is needed to find, evaluate, and acquire data. Data preparation is used to explore, improve, enrich, and format data. Analytics are needed for experimentation, modeling, and crafting of data stories. Modern technologies support deployment on premises, virtually, in the cloud, and as a blend of multiple methods.

The right infrastructure can make all the difference. Technology enables modern analytics. Modern analytics drives innovation. And innovation is a key to sustainable business. The right technologies can support your modern analytics organization with the following:

• Ease of use and intuitive user interface.
• Collaboration and sharing.
• Fast data analysis.
• Code-free and code-friendly advanced analytics capabilities.
• Support for a variety of data types and sources.
• Reusable analytic workflows.
• Scalability and elasticity.
• Interoperability across multiple technologies with seamless user experience.

Figure 4. Technology through the Modern Analytics Lifecycle

[Figure: the lifecycle activities (frame the problem; find, evaluate, and acquire data; explore, profile, and understand data; prepare data (improve, enrich, blend, format); explore, experiment, model, and understand; communicate and tell data stories; operationalize; decide, act, and innovate) mapped to Data Catalog, Data Preparation, and Data Analytics technologies, running on on-premises, virtual, and cloud infrastructure.]

Join the Age of Insight-Driven Innovation

Innovation is a must for future-looking businesses, and insight-driven innovation is quickly becoming a core competency for leading businesses. The benefits are clear, but getting there demands change. We must rethink analytics processes, compress the analytics lifecycle, and apply the right technologies in the right ways to achieve fast, scalable, accessible, and collaborative analytics. To plot your course to insight-driven innovation you’ll need to rethink your analytics organization, compress the analytics lifecycle, and modernize your analytics technology stack. Make your process, lifecycle, and technology choices with the following goals in mind:

• Make analytics easy for and accessible to every stakeholder.
• Give every user, regardless of technical skill level, the ability to perform the full spectrum of analysis activities from beginning to end of the lifecycle.
• Think “scale-out” to enable self-service analytics at enterprise scale.
• Adapt to continuously changing data architectures including cloud data warehousing and data lake deployments to derive maximum value from all data deployments.
• Make fast analytics a priority: fast to find data, prepare data, analyze data, and share insights.
• Ease the pain of operationalization with fast deployment and simplified administration.
• Build an analytics community founded on a culture of communication and collaboration.

Need help with your business analytics or data management and governance strategy? Want to learn about the latest business analytics and big data tools and trends? Check out Eckerson Group research and consulting services.

About the Research Sponsors

Alteryx is the only quick-to-implement, self-service data analytics platform that allows data scientists and citizen users alike to break the barriers to insight, so everyone can experience the thrill of getting to the answer faster. The Alteryx platform makes it easy to securely access, cleanse, and prepare all the relevant data; blend it all together to prepare the perfect analytic data set; then perform sophisticated analytics and output easily consumable analytics that drive deeper insights and improved decision making and business performance. Drag. Drop. Solve. Experience Alteryx: alteryx.com/trial

For 11 years, Amazon Web Services has been the world’s most comprehensive and broadly adopted cloud platform. AWS offers over 90 fully featured services for compute, storage, networking, database, analytics, application services, deployment, management, developer, mobile, Internet of Things (IoT), Artificial Intelligence (AI), security, hybrid, and enterprise applications, from 43 Availability Zones (AZs) across 16 geographic regions in the U.S., Australia, Brazil, Canada, China, Germany, India, Ireland, Japan, Korea, Singapore, and the UK. AWS services are trusted by millions of active customers around the world -- including the fastest growing startups, largest enterprises, and leading government agencies -- to power their infrastructure, make them more agile, and lower costs. To learn more about AWS, visit https://aws.amazon.com.

Tableau Software helps people see and understand data. Tableau helps anyone quickly analyze, visualize and share information. More than 15,000 customer accounts get rapid results with Tableau in the office and on-the-go. And tens of thousands of people use Tableau Public to share data in their blogs and websites. See how Tableau can help you by downloading the free trial of Tableau Desktop today.
