Does ETL Give Way to Data Wrangling in 2018?

Introduction

We’ve got a bold prediction: mapping-based ETL products are fading out in favor of data wrangling solutions, which are better equipped to keep up with today’s data, users, and pace of change. Yes, we are a data wrangling platform and have a stake in this claim. But our position in the industry also allows us to see this shift first-hand: the causes, the new technologies stepping in, and how it all affects you and your organization.

The view that modern data wrangling (also known as data preparation) solutions will overtake ETL is increasingly held in the industry. In the 2017 Gartner Market Guide on Data Preparation, Gartner stated that, “By 2020, data preparation tools will be used in more than 50% of new data integration efforts for analytics.”

The driving force behind this change, and so many others like it across other industries, is the need for speed. Keeping up with the speed of change, especially in tech, is a make-or-break factor in business today. McKinsey Quarterly calls it “the urgency imperative” and demonstrates that the organization of the future doesn’t just react to changing conditions, but adopts an emergent strategy, pursuing multiple innovative strategies almost continuously.

Below we share our insights on how McKinsey’s adage “Worship Speed” comes into play in the world of data: why ETL can’t keep up, and what data management will look like moving forward.

1. The Trend: Why ETL Is Struggling to Keep Up with the Speed of Business
2. Two Drivers Behind Why ETL is Getting Left Behind
3. Three Ways to Manage through this Paradigm Shift

The Trend: Why ETL Is Struggling to Keep Up with the Speed of Business

Extract, Transform, and Load, or the ETL process, refers to the traditional methods used in data warehousing since electronic data collection started in the 1970s.
ETL was designed to work with the structured data and well-defined data warehousing and business intelligence systems common to that time. Over the last few decades, IT professionals and engineers have become highly skilled at gathering and organizing business data using mapping-based ETL products, delivering organized information to business users for analysis.

Unfortunately, the typical users of ETL technology don’t have the context behind the information they’re working with. Business analysts give them a list of requirements for their data, and IT prepares the data when it can get to it (which could take weeks) and then hands it back to the business analysts, often over several exchanges spanning weeks or months. ETL also lacks the ability to rapidly iterate on constantly changing source data, even though businesses must adapt quickly to reach insights. Lastly, the ETL process is not nimble enough to handle the diversity of today’s data.

As organizations struggle with the ETL bottleneck, interactive data wrangling interfaces are stepping in, enabling the people with the greatest context for the data, business and data analysts, to make data useful more quickly and accurately. To understand this trend, we first look at why things are changing as businesses strive for the data insights they need, in the time they need them.

[Figure: Months (ETL cycles) vs. hours (data wrangling cycles)]

Two Drivers Behind Why ETL is Getting Left Behind

1. THE USERS OF DATA HAVE CHANGED

With traditional ETL systems, data is handled by IT staff, who often lack the business context for the incoming data and don’t know what the end user needs the data to look like downstream. Organizations are finding it more straightforward (and cost-effective) to equip business users with the ability to wrangle and analyze data on their own, effectively removing the IT-business bottleneck of traditional ETL systems.
Going back to the McKinsey study on top-performing companies, this is an unleashing of decision-making (tapping into the brilliance of your network of employees) that is a crucial part of surviving in today’s business landscape.

Analysts today are data wranglers. Their success relies upon solutions that are designed to fit their needs and how they work. They need to actually see the data they’re working on. They need immediate feedback on how each transformation step is impacting the data, and they need to be able to iterate, moving back and forth between steps in their work.

To make these tasks easier, modern data solutions allow any wrangling workflow to be run in multiple environments: in-browser, big data, cloud, and so on. The infrastructure is changing; it’s more focused on cloud and it’s more diverse. End users shouldn’t need to focus on where the data is and how it’s being processed; they need to focus on how to quickly and accurately derive value from diverse data, regardless of where it resides.

[Figure: IT (ETL) vs. analysts (data wrangling)]

2. THE NATURE OF DATA HAS CHANGED

ETL can certainly navigate many data types and a high velocity of data. The issue remains that the ETL process is much too slow, and end users need to be able to wrangle the data themselves. Here are the ways the data landscape is changing, each requiring tools that allow business users to arrive at insights quickly.

First, the sources of data that make up analysis (data inputs) have dramatically increased in diversity, volume, and rate of change. At the same time, how organizations consume or act on that data (data outputs) has also evolved. The final destination of data used to almost always be a centralized business intelligence tool, but now it is a plethora of tools, from machine learning to data science.
Consider the following shifts that are capping ETL’s potential in today’s marketplace:

Growing Diversity of Data Sources and Data Types

With the proliferation of the Internet of Things, geospatial data, social media, and third-party data gathering, data can, and does, come from anywhere. Whether it’s a month of customer chat logs, governmental demographic data, or a flood of tweets, most organizations are faced with data from multiple disparate sources, many of which are out of reach for traditional mapping-based ETL processes.

Higher Volume of Data

According to McKinsey Analytics, the volume of data continues to double every three years, giving data scientists unprecedented amounts of raw information, much of which remains inaccessible due to its non-consumable, non-standardized format.

Multiple, Varied Data Outputs

Centralized business intelligence systems are no longer the only output for data integration efforts. As the uses for data expand, users expect to be able to push prepared data directly to an assortment of technology platforms and visualization tools, many within the same department or organization (which happen to be closer to final insights in the data lifecycle). These outputs include statistical analysis, compliance reporting, machine learning, mechanical devices, and applications.

Increased Dependence on Cloud Infrastructure

Not only has the nature of data changed; the way it is stored and processed has as well. Cloud-based environments offer agility and scalability, and save organizations from having to maintain their own data centers and management staff. For these reasons, most organizations are transitioning more workloads from traditional on-premises platforms to cloud-based environments, with many choosing a mix of on-prem, private cloud, and public cloud solutions, also known as a hybrid cloud environment.
In a 2017 study of IT professionals, 85 percent of enterprises confirmed they were pursuing a multi-cloud strategy. Without the proper data tools, users are faced with learning different tools for different computing environments, switching between different technologies to access data in each, and comparing data that doesn’t share commonalities in metadata, wrangling logic, or data lineage. Users shouldn’t need to know where the data is or where the computing is happening; they should be focused on how to derive value from the data faster.

[Callout: A 2017 study of IT professionals confirmed 85 percent of enterprises were pursuing a multi-cloud strategy. Mary Johnston Turner, IDC]

The Rise of Blockchain

Blockchain is a distributed database with open source capabilities, and its adoption will drive even more demand for insights from the dynamic data it stores. From transparent supply chains to smart contracts and stock trades, Blockchain is already adding to the volume and types of data to be wrangled and analyzed. For example, Walmart and IBM are piloting a food safety blockchain that aims to reduce foodborne illness through faster tracking of food supplies en route from China.

Three Ways to Manage through this Paradigm Shift

Moving forward, data professionals can expect that as ETL continues to struggle to keep up with the pace of modern business, other trends will continue to come into play during 2018, namely the increasing importance of data wrangling solutions within organizational analytics environments. No industry has been untouched by changes in its data, and each wants faster insights at the same time. The shift from ETL-led processes to self-serve data preparation tools increases access to more data sources, reduces time-to-insight, and above all, moves data manipulation into the hands of the business user who has the needed context for analysis and decision-making.
