Does ETL Give Way to Data Wrangling in 2018?

Introduction

We’ve got a bold prediction: mapping-based ETL products are fading out in favor of data wrangling solutions, which are better equipped to keep up with today’s data, users, and pace of change. Yes, we are a data wrangling platform and have a stake in this claim. But our position in the industry also allows us to see this shift first-hand—the causes, the new technologies stepping in, and how it all affects you and your organization.

The view that modern data wrangling (also known as data preparation) solutions will overtake ETL is increasingly held in the industry. In the 2017 Gartner Market Guide on Data Preparation, Gartner predicted that “By 2020, data preparation tools will be used in more than 50% of new efforts for analytics.”

The driving force behind this change, and so many others like it across industries, is the need for speed. Keeping up with the pace of change, especially in tech, is a make-or-break factor for business today. McKinsey Quarterly calls it “the urgency imperative” and shows that the organization of the future doesn't just react to changing conditions but adopts an emergent strategy, pursuing multiple innovative strategies nearly all the time.

Below we share our insights into how McKinsey's adage “Worship Speed” plays out in the world of data: why ETL can't keep up, and what the path forward looks like.

1. The Trend: Why ETL Is Struggling to Keep Up with the Speed of Business

2. Two Drivers Behind Why ETL is Getting Left Behind

3. Three Ways to Manage through this Paradigm Shift

The Trend: Why ETL Is Struggling to Keep Up with the Speed of Business

Extract, Transform, and Load, or the ETL process, refers to the traditional approach to data warehousing that dates back to the 1970s. ETL was designed for the structured data and well-defined data warehousing and business intelligence systems common at that time. Over the last few decades, IT professionals and engineers have become highly skilled at gathering and organizing business data with mapping-based ETL products, delivering organized information to business users for analysis.

Unfortunately, the typical users of ETL technology, usually IT staff, don't have the business context behind the information they're working with. Business analysts give IT a list of requirements for their data, IT prepares the data when it can get to it (which could take weeks), and then hands it back to the analysts, often across several exchanges over weeks or months. ETL also lacks the ability to iterate rapidly on constantly changing source data, even though businesses must adapt quickly to reach insights. And lastly, the ETL process is not nimble enough to handle the diversity of today's data.

As organizations struggle with the ETL bottleneck, interactive data wrangling interfaces are stepping in, enabling the people with the greatest context for the data (business and data analysts) to make data useful more quickly and accurately. To understand this trend, we first look at why things are changing as businesses strive for the data insights they need, when they need them.

Figure: ETL cycles take months vs. data wrangling cycles take hours

Two Drivers Behind Why ETL is Getting Left Behind

1. THE USERS OF DATA HAVE CHANGED

With traditional ETL systems, data is handled by IT staff, who often lack the business context for the incoming data and don't know what end users need it to look like downstream. Organizations are finding it more straightforward (and cost-effective) to equip business users to wrangle and analyze data on their own, removing the IT-business bottleneck of traditional ETL systems. Returning to the McKinsey study on top-performing companies, this is an unleashing of decision-making: tapping into the brilliance of your network of employees, a crucial part of surviving in today's business landscape.

Analysts today are data wranglers. Their success relies upon solutions that are designed to fit their needs and how they work. They need to actually see the data they’re working on. They need immediate feedback on how each transformation step is impacting the data and they need to be able to iterate, moving back and forth between steps in their work.
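To make that feedback loop concrete, here is a minimal sketch in plain Python. It is not Trifacta's API; the table shape, the step functions, and the `run_with_preview` helper are all hypothetical. Each step is applied in order, a short preview is printed after it, and every intermediate table is kept so the analyst can move back to an earlier step.

```python
from typing import Callable

# Hypothetical types for illustration: a row maps column -> value, and a
# step is a (name, function) pair that turns one table into a new table.
Row = dict[str, str]
Table = list[Row]
Step = tuple[str, Callable[[Table], Table]]


def run_with_preview(table: Table, steps: list[Step], preview: int = 2) -> list[Table]:
    """Apply steps in order, print a preview after each, and keep every
    intermediate table so earlier states can be revisited."""
    history = [table]
    for name, fn in steps:
        table = fn(table)
        history.append(table)
        print(f"after {name}: {len(table)} rows")
        for row in table[:preview]:
            print("   ", row)
    return history


def trim(table: Table) -> Table:
    # Strip surrounding whitespace from every value (builds new dicts).
    return [{k: v.strip() for k, v in row.items()} for row in table]


def drop_missing_email(table: Table) -> Table:
    # Keep only rows whose email is non-empty.
    return [row for row in table if row.get("email")]


def lowercase_email(table: Table) -> Table:
    return [{**row, "email": row["email"].lower()} for row in table]


raw = [
    {"name": " Ada ", "email": "ADA@EXAMPLE.COM "},
    {"name": "Bob", "email": "  "},
    {"name": "Cy", "email": "cy@example.com"},
]

history = run_with_preview(raw, [
    ("trim", trim),
    ("drop_missing_email", drop_missing_email),
    ("lowercase_email", lowercase_email),
])
# Stepping "back" to the state before step i means reading history[i - 1].
```

Because each step returns a new table instead of mutating the old one, the history doubles as an undo stack; a real tool would sample large tables rather than keep full copies.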

In order to make these tasks easier, modern data solutions allow for any wrangling workflow to be run in multiple environments—in-browser, big data, cloud, etc. The infrastructure is changing; it’s more focused on cloud and it’s more diverse. End users shouldn’t need to focus on where the data is and how it’s being processed; they need to focus on how to quickly and accurately derive value from diverse data, regardless of where it resides.

Figure: IT (ETL) vs. analysts (data wrangling)

2. THE NATURE OF DATA HAS CHANGED

ETL can certainly handle many data types and high-velocity data. The issue is that the ETL process is much too slow, and end users need to be able to wrangle the data themselves. Here are the ways the data landscape is changing, requiring tools that let business users arrive at insights quickly.

First, the sources of data that feed analysis (data inputs) have dramatically increased in diversity, volume, and rate of change. At the same time, how organizations consume and act on that data (data outputs) has also evolved. The final destination of data used to almost always be a centralized business intelligence tool, but now it's a plethora of different tools. Consider the following shifts that are capping ETL's potential in today's marketplace:

Growing Diversity of Data Sources and Data Types

With the proliferation of the Internet of Things, geospatial data, social media, and third-party data gathering, data can, and does, come from anywhere. Whether it's a month of customer chat logs, government demographic data, or a flood of tweets, most organizations face data from multiple disparate sources, many of which are out of reach for traditional mapping-based ETL processes.

Higher Volume of Data

According to McKinsey Analytics, the volume of data continues to double every three years, giving data scientists unprecedented amounts of raw information, much of which remains inaccessible because of its non-consumable, non-standardized format.

Multiple, Varied Data Outputs

Centralized business intelligence systems are no longer the only output for data integration efforts. As uses for data expand, users expect to push prepared data directly to an assortment of technology platforms and visualization tools, often within the same department or organization (and closer to final insights in the data lifecycle). These outputs include statistical analysis, compliance reporting, machine learning, mechanical devices, and applications.

Increased Dependence on Cloud Infrastructure

Not only has the nature of data changed, but so has the way it is stored and processed. Cloud-based environments offer agility and scalability, and they save organizations from having to maintain their own data centers and management staff. For these reasons, most organizations are moving more workloads from traditional on-premises platforms to cloud-based environments, and many are choosing a mix of on-prem, private cloud, and public cloud solutions, also known as a hybrid cloud environment. In a 2017 study of IT professionals, 85 percent of enterprises confirmed they were pursuing a multi-cloud strategy. Without the proper data tools, users must learn different tools for different computing environments, switch between technologies to access data in each, and compare data that doesn't share common metadata, wrangling logic, or data lineage. Users shouldn't need to know where the data is or where the computing happens; they should be focused on how to derive value from the data faster.

The Rise of Blockchain

Blockchain is a distributed database with open source capabilities, and its adoption will drive even more demand for insights from the dynamic data it stores. From transparent supply chains to smart contracts and stock trades, blockchain is already adding to the volume and variety of data to be wrangled and analyzed. For example, Walmart and IBM are piloting a food safety blockchain intended to reduce foodborne illness through faster tracking of food supplies en route from China.

A 2017 study of IT professionals confirmed 85 percent of enterprises were pursuing a multi-cloud strategy.

MARY JOHNSTON TURNER, IDC

Three Ways to Manage through this Paradigm Shift

Moving forward, data professionals can expect ETL to keep struggling to match the pace of modern business while another trend gains ground during 2018: the growing importance of data wrangling solutions within organizational analytics environments.

Virtually every industry has seen its data change even as it demands faster insights. The shift from ETL-led processes to self-service data preparation tools increases access to more data sources, reduces time-to-insight, and, above all, moves data manipulation into the hands of the business users who have the context needed for analysis and decision-making. All of these benefits feed into a larger one: saving organizations time and money.

As the shift from ETL accelerates, here are a few key elements of data wrangling solutions your team should take into consideration:

1. FULL LIFE-CYCLE DATA WRANGLING

End-user wrangling is now an exploratory task: a chance for business users to transform and manipulate data whose context they already know. What's more, a subset of this work can be reused regularly in automated operational pipelines. A recipe can be designed once and run in multiple environments, such as desktop, big data, or multi-cloud. Given that, wrangling could be poised to take over production workloads, and with them the entire data lifecycle leading up to the analysis and consumption stage.
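The "design once, run anywhere" idea can be sketched as follows, assuming a recipe is simply an ordered list of row-level functions. The helpers `run_single` and `run_partitioned` are hypothetical stand-ins: the first mimics a desktop run, the second mimics a distributed engine by splitting rows into partitions and processing each independently. This is an illustration of the concept, not how any particular product executes recipes.

```python
from typing import Callable, Optional

# Hypothetical recipe format: an ordered list of row-level steps. Each step
# takes one row and returns a (new) row, or None to drop the row.
Row = dict
RowStep = Callable[[Row], Optional[Row]]


def apply_recipe(recipe: list[RowStep], rows: list[Row]) -> list[Row]:
    out = []
    for row in rows:
        for step in recipe:
            row = step(row)
            if row is None:
                break  # a step dropped this row; skip the remaining steps
        else:
            out.append(row)  # every step kept the row
    return out


def run_single(recipe: list[RowStep], rows: list[Row]) -> list[Row]:
    """Stand-in for a desktop run: one process, all rows at once."""
    return apply_recipe(recipe, rows)


def run_partitioned(recipe: list[RowStep], rows: list[Row], partitions: int = 3) -> list[Row]:
    """Stand-in for a distributed engine: process partitions independently,
    then concatenate the results in their original order."""
    size = max(1, -(-len(rows) // partitions))  # ceiling division
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    return [row for chunk in chunks for row in apply_recipe(recipe, chunk)]


def drop_blank_region(row: Row) -> Optional[Row]:
    return row if row["region"].strip() else None


def normalize_region(row: Row) -> Row:
    return {**row, "region": row["region"].strip().upper()}


recipe = [drop_blank_region, normalize_region]
rows = [{"id": i, "region": r}
        for i, r in enumerate([" west", "east ", "", "West", "north", " "])]

print(run_single(recipe, rows) == run_partitioned(recipe, rows))  # prints True
```

Because each step looks at one row at a time, partitioning cannot change the result. Steps that aggregate or join across rows would need data exchanged between partitions, which this sketch deliberately leaves out.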

Figure: The data wrangling lifecycle (data discovery, structure, cleanse, enrich, validate), leading to business analytics and reporting


2. DATA PREP ACCELERATES YOUR IN-HOUSE MACHINE LEARNING. AND VICE VERSA.

Data prep tools now use machine learning to accelerate the preparation process. For example, Trifacta is smart enough to learn and remember your preferences for data wrangling by source. And vice versa, when Trifacta's machine learning for data preparation is paired with your own ML applications and artificial intelligence initiatives, your applications and devices can get smarter, faster.

Case in point: Trifacta has partnered with DataRobot to help financial services and insurance organizations automate machine learning, accelerating data wrangling while enabling analysts of any skill level to quickly build and deploy accurate predictive models.

3. THE MORE THINGS CHANGE…

…the more you need data wrangling tools to remain competitive in the global marketplace.

Looking forward, there’s much more in flux in the world of data than diversity of inputs, outputs and processing engines. Not only is the software used to handle data changing, but so is the hardware, the associated platforms, government regulations, business requirements, metadata, and file structures surrounding your data.

Data wrangling solutions need to be able to address the following:

Data governance requirements

Discrepancies in naming conventions, data formats, or update rates are natural when numerous analysts work in separate environments, but they are an increasingly big problem for organizations. Similar datasets can be represented in wholly different ways, confusing analysts and forcing them to decide which version is actually “right”. This slows down the insights process and unnecessarily burdens governance teams. Accuracy issues arise when data is presented without proper governance behind it, and the risk grows as new regulations must be enforced across platforms. Interoperable solutions provide a centralized place to apply data governance requirements, even across multiple platforms and end users.
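One common fix for naming discrepancies is to conform every dataset's column names to a shared glossary. The sketch below assumes a hypothetical `CANONICAL` mapping owned by a governance team and a simple normalization rule (camelCase, spaces, and hyphens become snake_case); real tools and governance policies will differ.

```python
import re

# Hypothetical shared glossary owned by a governance team: maps normalized
# source column names to a single canonical name.
CANONICAL = {
    "cust_id": "customer_id",
    "customerid": "customer_id",
    "zip": "postal_code",
    "zip_code": "postal_code",
}


def normalize(name: str) -> str:
    """Convert a raw column name to snake_case, then apply the glossary."""
    # Insert "_" at lower-to-upper boundaries, e.g. "signupDate" -> "signup_Date".
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Collapse spaces, hyphens and other separators into single underscores.
    name = re.sub(r"[^0-9a-zA-Z]+", "_", name).strip("_").lower()
    return CANONICAL.get(name, name)


def conform(header: list[str]) -> list[str]:
    return [normalize(h) for h in header]


crm = ["CustomerID", "Zip Code", "signupDate"]
billing = ["cust-id", "ZIP", "signup_date"]
print(conform(crm))      # ['customer_id', 'postal_code', 'signup_date']
print(conform(billing))  # ['customer_id', 'postal_code', 'signup_date']
```

Keeping the glossary in one place means every analyst's dataset converges on the same names, which is the kind of centralized control the paragraph above describes.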

Best-in-Breed over All-in-One Solutions

Vendors are developing add-ons to their products to entice customers to use them as a one-stop data management/analytics shop. Organizations now face a choice: use add-ons that may only partly meet their needs, or choose the best-in-breed option. Although an all-in-one provider may seem easier, organizations may find its capabilities second-rate. Best-in-breed vendors focus on perfecting a single product and ensuring it interoperates with any other platform down the line, not just those from the same brand.

Interoperable with changing dataflow engines

The engines used in different environments are constantly changing, and organizations need a tool that can work with any dataflow engine, whether MapReduce, Spark, Beam, Cloud Dataflow on Google Cloud, or EMR on Amazon Web Services. Modern data tools can generate the appropriate engine-specific logic from your wrangling steps, so the logic is preserved as engines change.
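One way to keep wrangling logic stable while engines change is to store the recipe as engine-neutral data and translate it for each backend. The minimal sketch below assumes a hypothetical two-operation vocabulary (`drop_nulls`, `lowercase`) and uses Python's built-in `sqlite3` as a stand-in for a SQL engine; it is not how any specific product, or any of the engines named above, compiles recipes.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One engine-neutral wrangling step (hypothetical two-op vocabulary)."""
    op: str      # "drop_nulls" or "lowercase"
    column: str


def run_python(steps: list[Step], rows: list[dict]) -> list[dict]:
    """Backend 1: execute the recipe directly on Python dicts."""
    for s in steps:
        if s.op == "drop_nulls":
            rows = [r for r in rows if r[s.column] is not None]
        elif s.op == "lowercase":
            # Mirror SQL semantics: LOWER(NULL) stays NULL.
            rows = [{**r, s.column: None if r[s.column] is None else r[s.column].lower()}
                    for r in rows]
        else:
            raise ValueError(f"unsupported op: {s.op}")
    return rows


def to_sql(steps: list[Step], table: str, columns: list[str]) -> str:
    """Backend 2: compile the same recipe into one nested SQL query."""
    query = f"SELECT {', '.join(columns)} FROM {table}"
    for s in steps:
        if s.op == "drop_nulls":
            query = f"SELECT * FROM ({query}) AS t WHERE {s.column} IS NOT NULL"
        elif s.op == "lowercase":
            cols = [f"LOWER({c}) AS {c}" if c == s.column else c for c in columns]
            query = f"SELECT {', '.join(cols)} FROM ({query}) AS t"
        else:
            raise ValueError(f"unsupported op: {s.op}")
    return query


recipe = [Step("drop_nulls", "email"), Step("lowercase", "email")]
columns = ["id", "email"]
rows = [
    {"id": 1, "email": "ADA@EXAMPLE.COM"},
    {"id": 2, "email": None},
    {"id": 3, "email": "Cy@Example.com"},
]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE contacts (id INTEGER, email TEXT)")
con.executemany("INSERT INTO contacts VALUES (:id, :email)", rows)
sql = to_sql(recipe, "contacts", columns) + " ORDER BY id"
from_sql = [dict(zip(columns, r)) for r in con.execute(sql)]
from_python = run_python(recipe, rows)
print(from_sql == from_python)  # prints True
```

Table and column names are interpolated directly into the SQL string, which is acceptable for a fixed, trusted recipe like this one but would need quoting or validation for user-supplied names. Engine differences also leak through: SQLite's built-in `lower()` only folds ASCII letters, so the two backends could disagree on non-ASCII text, which is exactly the kind of detail a real translator has to handle.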

Interoperable between on-premises, cloud, and hybrid infrastructures

An interoperable data wrangling technology ensures seamless wrangling no matter where data is stored: on-prem, cloud, multi-cloud, or hybrid cloud. A seamless wrangling experience provides consistent metadata and lineage and reproducible wrangling recipes across environments, while delivering a consistent user experience across computing environments. It also gives organizations another major benefit: the freedom to be flexible when choosing data environments. By investing in a single wrangling technology for business users that supports multiple environments, organizations can choose the cloud/on-prem data environment mix they need, now and in the future.

As 2018 gets underway, we will see less reliance on traditional ETL technology/processes and welcome new technologies that can handle today’s data and speed of business. Many of us at Trifacta have worked as ETL experts, so we are passionate about ETL and data wrangling, and would love to show you how the trends above can work in your favor when you use the right tool.

If you’re interested in a data wrangling solution, we’d love to chat! We’re biased, but believe Trifacta is uniquely suited to solve the issues raised by ETL—and get you to analytics success faster in the new year.

To get started, we recommend downloading our free desktop product, Wrangler, to get a feel for what we do. Email Trifacta for more details on how your organization can implement Trifacta.

About Trifacta

Trifacta is the global leader in data wrangling. Trifacta leverages decades of innovative research in human-computer interaction, scalable data management and machine learning to make the process of preparing data faster and more intuitive. Around the globe, tens of thousands of users at more than 10,000 companies, including leading brands like Deutsche Börse, Kaiser Permanente, New York Life and PepsiCo, are unlocking the potential of their data with Trifacta's market-leading data wrangling solutions. Learn more at trifacta.com.

For Additional Questions, Contact Trifacta www.trifacta.com | 844.332.2821

Experience the Power of Data Wrangling Today www.trifacta.com/start-wrangling