Streaming-First Architectures Building the Real-Time Organization
By Julian Ereth
July 2019

This publication may not be reproduced or distributed without Eckerson Group’s prior permission.

About the Author

Julian Ereth is a researcher and practitioner in business intelligence and data analytics. His research focuses on new approaches in big data, advanced analytics, and the Internet of Things. Ereth is author of multiple internationally accepted research papers and is currently earning his Ph.D. at the University of Stuttgart (Germany). He is cofounder of Pragmatic_Apps, which builds custom business software and analytics solutions.

About Eckerson Group

Eckerson Group helps organizations get more value from data and analytics. Our experts each have more than 25 years of experience in the field. Data and analytics is all we do, and we’re good at it! Our goal is to provide organizations with a cocoon of support on their data journeys. We do this through online content (thought leadership), expert onsite assistance (full-service consulting), and 30+ courses on data and analytics topics (educational workshops).

Get more value from your data. Put an expert on your side. Learn what Eckerson Group can do for you!

© Eckerson Group 2019 www.eckerson.com

Table of Contents

Executive Summary
Key Takeaways
Recommendations
The Rise of Data Streams
From Batch Processing to Stream Processing
Benefits of Stream Processing
Streaming Components
Stream Sourcing
Stream Transportation
Stream Processing
Streaming in Analytics
Streaming in Real-Time Analytics
Streaming-First Architectures
Benefits of a Streaming-First Architecture
Implementing a Streaming Architecture
Technology and Products
Open Source vs. Commercial Tools
Conclusion
About Eckerson Group

Executive Summary

Having the right data at the right time is essential for organizations that need to compete. Having the latest information about current market movements, customer interactions, or operational data from the shop floor can tip the scales. However, gathering and processing data in (near) real time is not as easy as it sounds.

Traditional analytics architectures were built mostly to support strategic business decision making, where timeliness is rarely critical. This model hits a wall when data flow velocity increases and requirements like real-time processing come into play. Accordingly, traditional architectures are integrating real-time components and gradually shifting toward “streaming first” concepts.

But integrating streaming components in analytical landscapes presents challenges such as new tools, technologies, concepts, and methods, as well as a novel way of thinking about analytics architectures. And both experience and best practices are scarce in this area.

This report helps business and technical executives understand data streaming, analyze analytics architectures, and optimize accordingly.

Key Takeaways

• Data streaming is superseding traditional batch operation in analytics architectures.
• Streaming components can be categorized as stream sourcing (e.g., edge processing or CDC), stream transportation (e.g., messaging brokers or event logs), and stream processing (e.g., CEP and stream analytics).
• There are two cases of data streaming in analytics:
  1. Real-time analytics pipelines that provide ways to rapidly extract data from the edge and process it for immediate insights.
  2. Streaming-first architectures that utilize streaming components like event logs to combine systems in a flexible and asynchronous way.
• There are many great open source tools that perform certain tasks, but commercial vendors and tools help to integrate, extend, and run them on an enterprise level.

Recommendations

• Analyze the scenario to decide whether this is a case for an isolated real-time analytics pipeline or a more profound transformation of the underlying architecture.
• Think beyond current needs. A streaming-first architecture enables the integration of streaming data and also improves the architecture’s agility and sustainability.
• When choosing a tool, think about factors like scalability, latency, and durability, and also consider the trade-off between an open source, best-of-breed approach and a commercial enterprise-ready solution.

The Rise of Data Streams

Having the right data at the right time is essential for organizations that need to compete in today’s fast-moving and data-driven world. This simple maxim grows truer every day. Having the latest information about current market movements, customer interactions, or operational data from the shop floor can tip the scales. Companies are realizing this fact, and streaming components are gaining value in modern data landscapes.

Integrating real-time components in analytical landscapes presents challenges such as new tools, technologies, concepts, and methods, as well as a novel way of thinking about analytics architectures.
To make things worse, both experience and best practices are scarce in this area.

This report first explains basic ideas behind data streaming and then introduces necessary components for building modern data streaming solutions. Moreover, it describes how streaming can be integrated in analytics landscapes. Lastly, it outlines hands-on advice for implementing a streaming architecture and lists relevant streaming tools.

From Batch Processing to Stream Processing

Streaming data is a continuous flow that is generated from various sources. Stream processing is an umbrella term for methods to work with streaming data. During stream processing, data is constantly moving from one stage to another, where it is only temporarily saved and immediately processed. For that reason, streaming data is often referred to as data in motion or data in flight. In contrast, data at rest is permanently persisted in a database and can be read at any time for further processing (see Figure 1).

Figure 1. Data in Motion vs. Data at Rest (diagram labels: Source, Storage, Processing / Analytics)

Traditionally, analytical systems mostly work with data at rest. For example, in most data warehouse architectures, data is extracted, cleansed, and transformed by ETL processes and then persisted in a central data warehouse. From here all downstream analytical systems can access the data, e.g., for creating reports or dashboards. These ETL processes usually run on a regular basis, e.g., every night, and process all data available at that point as one batch.

Obviously, downstream analytics systems can only show data that has been processed by preceding ETL jobs. Accordingly, to show more current data in reports and dashboards, the batch jobs have to run more frequently, which in turn limits the size of the batch (see Figure 2).
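
The link between batch size and data freshness can be sketched in a few lines of Python. This is a minimal illustration and not part of the original report: the hourly arrival times, the function names, and the assumption that a batch is processed the moment its last record arrives are all hypothetical simplifications.

```python
from typing import Iterable, Iterator, List


def batches(records: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Group an incoming sequence into batches of at most batch_size records."""
    batch: List[int] = []
    for record in records:
        batch.append(record)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller batch
        yield batch


def worst_case_delay(arrival_times: List[int], batch_size: int) -> int:
    """Longest wait between a record arriving and its batch being processed.

    Assumption (hypothetical): a batch is processed the moment its last
    record arrives, and processing itself takes no time.
    """
    worst = 0
    for batch in batches(arrival_times, batch_size):
        processed_at = batch[-1]  # arrival time of the last record in the batch
        worst = max(worst, processed_at - batch[0])
    return worst


if __name__ == "__main__":
    arrivals = list(range(24))  # hypothetical: one record per hour for a day
    for size in (24, 6, 1):
        delay = worst_case_delay(arrivals, size)
        print(f"batch size {size:>2}: worst-case delay {delay} h")
```

Under these assumptions, the worst-case delay falls from 23 hours to 5 hours to 0 hours as the batch size shrinks from 24 to 6 to 1.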
Following this logic, you eventually end up with a batch size of one, which means that each record is processed immediately and is available for downstream systems without delay. This is called stream processing.

Figure 2. From Batch to Stream Processing (size of dataset shrinking from n to 1: monthly batch, daily batch, micro-batch, real time)

Benefits of Stream Processing

Stream processing comes with various benefits that help analytical systems on a technological and business level.

More up-to-date insights. The most obvious benefit of using data streams is faster processing, which makes more up-to-date data available to the business and thereby leads to more relevant and valuable insights. In time-sensitive scenarios, like fraud detection or financial trading, this can mean a real competitive advantage.

Enabling new analytical use cases. Besides improving the data for existing analytics applications, stream processing also enables entirely new use cases like operational decision support, where real-time insights are needed, e.g., on a manufacturing shop floor where a worker has to decide which machine to maintain. This is also why stream processing is gaining attention in relation to the Internet of Things, where it can help transform sensor data into valuable business information.

Increase flexibility of analytical architectures. The concept of stream processing introduces a whole new mindset on working with data. Analytical architectures are less about central, highly cleansed data warehouses and more about process-oriented data pipelines and event-based data hubs. This structure provides more flexibility and makes many analytical systems more business-oriented.

Streaming Components