Optimizing Data in the Age of Self-Service Data Analytics

By Donna Burbank Managing Director, Global Data Strategy, Ltd

Contents

Introduction

The Risk of Organizational Silos Balanced by the Reward of Collaboration

Data Governance: Finding the Right Balance

Know Which Data to Govern Closely, and Which to Leave Alone

The Process is Important

The Right Technology for the Right Job

The Business Value of Collaborative Data Discovery

About the Author

About Alteryx

www.globaldatastrategy.com

Introduction

Today, business is driven by data. And the data-driven business transformation has given rise to new business models and strategies that simply weren’t available with the technology of the past. The rise of big data, streaming technologies, IoT, cloud, machine learning, and other relatively new technologies provides an unprecedented opportunity for growth and innovation.

Think of a leading company in today’s marketplace, and there is a high probability that the company is harnessing data to strategic advantage. Look at Amazon, with its data-driven recommendation engine and streamlined distribution chain, or Lyft and Uber, which capture big data and IoT sources in near real time to revolutionize the transportation industry. The list goes on and continues to grow as budding entrepreneurs envision new ways to make strategic use of the wealth of available data.

To capitalize on this trend, according to one recent survey¹, over 96% of media and marketing executives have stated they are deeply committed to using data to transform their businesses into data-centric companies. At the same time, however, there is a significant lack of technical skills to support this growing need. In the same survey¹, only 5% of executives are extremely confident that they have the skills within their organization to support their data-driven initiatives. With the growing business demand for data-centric skills and an increasing gap in IT skills to meet this need, more business-centric staff are taking an active role in the management and strategic analysis of data.

This active involvement by business users has led to significant growth in the demand for self-service analytics. In fact, the analyst firm Gartner predicts that by 2020, self-service data preparation will be used in over half of new data integration efforts². This doesn’t even account for self-service analytics platforms. From recent experience in my consulting practice, I expect the number to be even higher than that. We see organizations, from Fortune 100 companies to small nonprofit organizations, looking to leverage self-service data preparation and analysis as more stakeholders are eager to take an active role in using data to strategic advantage.

The Risk of Organizational Silos Balanced by the Reward of Collaboration

While it is a positive trend that more roles across the organization are looking to take an active part in data management and curation, without proper coordination and collaboration across these roles, organizations run the risk of creating silos that hamper productivity. With more departments looking to “own” their data for strategic advantage, ownership must be carefully defined to mean curation and maintenance of this data, not obstructionist control. When data is a strategic asset, as with any asset, there is a risk that groups will compete for control of this valuable resource. Marketing, Sales, and Finance, for example, may all feel that they “own” customer data and may wish to limit other groups’ access to this information. In addition, there is often an organizational and cultural divide between business users and technical IT staff, and questions of role delineation often arise as business users take a more active role in the management and analysis of their data. While IT staff often do not have the time or inclination to support every query or data discovery effort required by the business, they are often reluctant to relinquish control over organizational data sets.

1 The Data-Centric Organization: Transforming for the Next Generation of Audience Marketing, a Winterberry Group white paper, September 2016

2 Gartner Market Guide for Self-Service Data Preparation, August 2016, by Rita L. Sallam, Paddy Forry, Ehtisham Zaidi, and Shubhangi Vashisth, ID: G00304870

Finding the right balance between roles and responsibilities is key to success in building a data-driven organization. With this balance in place, business stakeholders can discover new insights in the data that is important to them, based on curated data sets that have been vetted by all relevant parties across IT and the business. Getting this balance right is the purview of data governance and, once proper governance is in place, teams can work together to build a truly data-driven organization.

Data Governance: Finding the Right Balance

Data Governance is defined as, “The exercise of authority, control, and shared decision-making (planning, monitoring, and enforcement) over the management of data assets,” according to DAMA International’s Body of Knowledge (DAMA DMBOK2)³. This definition highlights the inherent tension between control and collaboration when building a data governance program, i.e., authority and control vs. shared decision-making.

On the one hand, business-critical data sets need to have strict controls in place to ensure data quality and consistency. A recent Harvard Business Review article estimates that the US economy loses $3.1 trillion per year due to poor quality data⁴. With data as the cornerstone of the data-driven business, closely managing this business-critical asset is paramount to success.

On the other hand, controls that are too strict can limit collaboration, innovation, and data-driven discovery. Getting this balance right, by creating an environment that has proper controls in place but still allows self-service discovery by a range of users, can have significant benefits. According to Gartner estimates, by 2019, data and analytics organizations that provide agile, curated internal and external data sets for a range of content authors will realize twice the business benefits of those that do not².

A large benefit of the self-service and collaborative approach to data governance and data discovery is the ability to harness an organization’s “tribal knowledge” that is often lost when using a more formal, top-down approach. When business users are actively involved in data management activities, decisions are made closer to the source of domain knowledge. Not only does this approach typically yield better results, but these results can be achieved more quickly. Rather than IT staff having to spend time tracking down business definitions and rules through extensive interviews and questions, these rules can be documented in real time, since business staff are directly involved in data management and governance.

The self-service approach to data governance requires a new platform and methods to allow effective collaboration between business and IT staff. If traditional approaches to data governance are the “encyclopedia approach,” where a small set of individuals publish a body of definitions to be consumed by the masses in a top-down manner, data governance in the world of self-service analytics is the “Wikipedia approach.” With Wikipedia, definitions are created collaboratively. While there may be the occasional error, the “wisdom of the crowd” can quickly spot and resolve these errors, ensuring eventual consistency. While there may be short-term inconsistencies, longer-term quality is ensured by a constant, active review that keeps information from becoming stale or outdated.

3 Data Management Association (DAMA) Data Management Body of Knowledge, 2nd Edition, Technics Publications, 2017

4 “Bad Data Costs the US $3.1 Trillion Per Year”, Harvard Business Review, by Thomas C. Redman, September 2016

Each of these approaches has its place, even beyond the world of data. For example, the release of a new cancer drug has strict controls put in place by a small set of qualified individuals. The release of this information is controlled by formal rules and processes to ensure the community’s safety. Movie reviews, on the other hand, which in the past were published only by a small set of critics, have been replaced by crowdsourced review sites, where many viewers can post their opinions, providing a wider range of views and, arguably, a more accurate representation of the community’s preferences. Each approach is appropriate in certain circumstances. While it would be risky to have life-saving drugs created by a crowdsourced approach, it would be limiting to have opinion polls created by only a few individuals.

Know Which Data to Govern Closely, and Which to Leave Alone

Such is the case with enterprise data. While some data needs to be closely regulated to reduce risk and ensure quality, other data is more suited to a collaborative approach to quality and governance. Generally, the more the data is shared across and beyond the organization, the more formal governance is needed. Examples of data that should be more strictly governed include:

Master data that provides a single view of a customer, a product, a vendor, etc.

Reference data for standardized product codes, external country code lists, etc.

Enterprise data warehouse data, including financial data reported to auditors and nonprofit results reported to sponsors

Examples of data better served by a more collaborative, crowdsourced approach include:

Raw or lightly prepped data sets for exploratory analysis

Non-productionized analytical model data

Ad hoc reporting and discovery data

Some data sets balance these approaches. For example, operational reporting for a particular business area or function, or a data mart or warehouse in use by a small team or division, requires a formal approach to governance but does not necessarily require the full set of enterprise governance reviews that are needed for master or reference data. Local teams can handle governance of these data assets within their own unit without negatively affecting other groups.
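The rule of thumb above (the wider the sharing, the more formal the governance) can be sketched in a few lines of Python. This is only an illustration: the `Scope` categories and the three level names are assumptions made for this sketch, not terms defined in this paper or in any governance standard.

```python
from enum import Enum


class Scope(Enum):
    """How widely a data set is shared (hypothetical categories)."""
    INDIVIDUAL = 1   # ad hoc discovery by one analyst
    TEAM = 2         # data mart or operational report for one unit
    ENTERPRISE = 3   # master data, reference data, enterprise warehouse
    EXTERNAL = 4     # reported to auditors, sponsors, or regulators


def governance_level(scope: Scope) -> str:
    """Map sharing scope to a governance approach: wider sharing, stricter control."""
    if scope in (Scope.ENTERPRISE, Scope.EXTERNAL):
        return "formal"         # full enterprise governance reviews
    if scope is Scope.TEAM:
        return "local"          # formal, but handled within the owning unit
    return "collaborative"      # crowdsourced definitions and review


print(governance_level(Scope.TEAM))
```

In practice the decision would likely weigh more than sharing scope (regulatory exposure, for instance), so a lookup like this is best read as a starting point for discussion rather than a policy.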

Figure 1 shows the continuum of data governance approaches, highlighting the fact that data governance is not a “one size fits all” effort.

Figure 1 Know Which Data to Govern Strictly, and Which to Leave Alone

The Process is Important

While we’ve described various approaches to data governance in neatly organized containers of data, in the real world, data categorization is not static. Data that starts out as part of an ad hoc discovery process might prove so important that it is promoted to the enterprise data warehouse. For example, a data science team may perform exploratory analysis on the effect of weather patterns or social media sentiment on product sales. Once the importance of these factors is established, certain elements of weather or social data may be included in the enterprise warehouse to be used by the wider organization, and therefore become subject to stricter governance controls. Remember our adage: the more widely data is used across the organization, the more strictly governed it needs to be.
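As a rough illustration of such a promotion, the sketch below models a data set that moves from discovery to enterprise use only after designated reviewers sign off. The `DataSet` class, the stage names, and the two reviewer roles are all hypothetical; a real program would define its own roles and review steps.

```python
from dataclasses import dataclass, field


@dataclass
class DataSet:
    name: str
    stage: str = "discovery"                     # "discovery" or "enterprise"
    approvals: set = field(default_factory=set)  # roles that have signed off


# Hypothetical sign-offs required before promotion.
REQUIRED_APPROVALS = {"data_steward", "it_owner"}


def approve(ds: DataSet, role: str) -> None:
    """Record a sign-off from one of the reviewing roles."""
    if role not in REQUIRED_APPROVALS:
        raise ValueError(f"unknown reviewer role: {role}")
    ds.approvals.add(role)


def promote(ds: DataSet) -> bool:
    """Promote to enterprise use only when every required role has signed off."""
    if ds.stage == "discovery" and REQUIRED_APPROVALS <= ds.approvals:
        ds.stage = "enterprise"
        return True
    return False


weather = DataSet("weather_vs_sales")
approve(weather, "data_steward")
print(promote(weather))   # False: the IT owner has not signed off yet
approve(weather, "it_owner")
print(promote(weather))   # True: the data set is now under enterprise governance
```

Making the sign-offs explicit keeps the promotion step auditable, which matters once the data is used by the wider organization.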

To use data effectively for business success, clear processes and procedures need to be in place for the governance of and interaction between these different data landscapes. Examples include:

Processes for data promotion between data discovery and enterprise use

Data stewardship roles for curated data sets

Automated feedback loops to encourage collaborative input for business definitions and rules

Review cycles for standard data sets, reports, analytical models, etc.

Publication and distribution mechanisms for shared data sets

Data lifecycle and workflow

As much as possible, these processes should be integrated into individuals’ “day jobs” to minimize additional work. For example, citizen data scientists or developers can become more effective if they build a culture of collaboration where queries, analytical models, and background documentation are regularly shared to create a group knowledge base. The more data management and governance are a normal part of business operations, the more successful they will be.

The Right Technology for the Right Job

Just as choosing the right level of governance is important for each type of data and user group, so is choosing the right technology. Typically, change management tools and other technologies are used for formal documentation and governed change management for business-critical data, such as master and reference data, data models, and metadata repositories.

For discovery data, a more collaborative approach is required, built around a less formally managed data catalog. Citizen data scientists using a self-service approach to data preparation and analysis need a platform for discovery, information, and collaboration. Rather than create a new query to calculate “total sales,” a simple search could discover that this query has already been written by another team. Where more than one calculation is used, usage-based ranking can help determine which is the most commonly used. This usage-ranking approach can be considered crowdsourced governance, where users determine the best fit based on what is most commonly used across the organization. For example, a calculation rule for total sales may be dictated using a top-down approach, but if, in practice, most of the organization uses a different calculation, perhaps it would be better to align with the current reality. Of course, if these rules are dictated by regulators, investors, or the like, a top-down approach may be required. In this case, knowing that reality doesn’t match the design can be helpful as well.
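A minimal sketch of usage-based ranking, assuming a catalog can export a log with one entry per query run: the definition names, the log contents, and the "official" definition below are invented for illustration and do not reflect any particular product.

```python
from collections import Counter

# Hypothetical usage log: one entry per query run, naming the definition used.
usage_log = [
    "total_sales_gross", "total_sales_net", "total_sales_net",
    "total_sales_net", "total_sales_gross", "total_sales_net_of_returns",
]

OFFICIAL_DEFINITION = "total_sales_gross"   # assumed top-down rule


def rank_by_usage(log):
    """Return (definition, run_count) pairs, most used first."""
    return Counter(log).most_common()


def design_matches_reality(log, official):
    """True when the officially dictated definition is also the most used one."""
    ranking = rank_by_usage(log)
    return bool(ranking) and ranking[0][0] == official


ranking = rank_by_usage(usage_log)
print(ranking[0])
print(design_matches_reality(usage_log, OFFICIAL_DEFINITION))
```

With this sample log, the most used definition differs from the official one. That is exactly the mismatch worth surfacing, whether the outcome is to update the official rule or to steer users back to it.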

Discussion threads and feedback loops are also key to collaborative governance. Definitions are often not “black and white,” and teams can collaborate and arrive at a common answer or explain discrepancies in usage. Open discussion can often uncover “tribal knowledge” across the organization since more stakeholders are given a voice to contribute knowledge.

While the functionality of collaboration-based governance catalogs varies, common themes across these approaches include:

Open editing of definitions, queries, models, etc.

Discussion threads and feedback mechanisms

Discovery and search

Usage ranking and helpfulness ranking

Tagging and open categorization

Integration with tools used by citizen data scientists in their “day jobs,” e.g. self-service data analysis

As discussed earlier, both the formal and collaborative approaches can be used together as they apply to data sets, governance methods, and governance tools. As shown in Figure 2, the self-service data scientist relies on several approaches and data sets.

Figure 2 The Self-Service Data Scientist Relies on Several Data Sets

For the self-service user, standardized reference data helps ensure consistency and saves time spent on data search and cleanup. According to Forbes magazine, data scientists spend approximately 80% of their time preparing and managing data for analysis, and 76% of these individuals view data preparation as the least enjoyable part of their work⁵. Providing standardized, vetted, and governed data sets makes a citizen data scientist’s job faster and easier, allowing them to spend their valuable time discovering new insights from data.

For non-standard data sets, collaborative data governance catalogs allow citizen data scientists to more easily discover what other teams have done and leverage their collective knowledge, rather than constantly “reinventing the wheel.” In both cases, data governance and the associated analysis approach help make a data scientist’s job easier, allowing them to focus on using data for strategic advantage, rather than wasting valuable effort on data cleanup or search.

The Business Value of Collaborative Data Discovery

It’s an exciting time to be in the data management field. New technologies not only allow us to leverage data in ways that previously weren’t possible, but these technologies are more user-friendly than ever before, allowing both business and IT staff to collaborate and discover new insights from enterprise data.

With these new technologies also come new methodologies and approaches to collaborate effectively and innovate across teams and roles. Data governance has evolved as rapidly as the technology it governs, and the rise of self-service, collaborative governance is a testament to this innovation. Rather than forcing a top-down approach to governance in areas where it doesn’t make sense, more and more organizations are using collaborative data governance as a catalyst for innovation. The more individuals and roles that can access information and contribute to the metadata and definitions that support this data, the more data quality increases. Data-driven business transformation requires a solid data foundation and, with the rise of self-service data preparation and governance, more stakeholders than ever before have a voice and a role in contributing to this data-driven transformation and success.

5 “Cleaning Big Data: Most Time-Consuming, Least Enjoyable Data Science Task, Survey Says”, Forbes, March 2016, by Gil Press

About the Author

Donna Burbank is a recognized industry expert in information management with over 20 years of experience helping organizations enrich their business opportunities through data and information. She is currently the Managing Director of Global Data Strategy Ltd, where she assists organizations around the globe in driving value from their data. She has worked with dozens of Fortune 500 companies worldwide in the Americas, Europe, Asia, and Africa and speaks regularly at industry conferences. She has co-authored several books on data management and is a regular contributor to industry publications. She can be reached at [email protected], and you can follow her on Twitter @donnaburbank.

About Alteryx

Revolutionizing business through data science and analytics, Alteryx empowers everyone in an organization to experience the thrill of getting to the answer faster. The modern, end-to-end Alteryx analytics platform enables analysts and data scientists alike to discover, share and prep data, perform analysis – statistical, predictive, prescriptive and spatial – and deploy and manage analytic models. Hundreds of thousands of people in enterprises all over the world rely on Alteryx daily to deliver game-changing results. Alteryx is proud to be the Gold winner of the 2017 Gartner Peer Insights Customer Choice Award in the Business Intelligence and Analytic marketplace. To learn more, visit www.alteryx.com.
