Bulletin of the Technical Committee on Data Engineering
December 2015 Vol. 38 No. 4
IEEE Computer Society

Letters

  Letter from the Editor-in-Chief
    David Lomet
  Letter from the Special Issue Editors
    David Maier, Badrish Chandramouli

Special Issue on Next-Generation Stream Processing

  Kafka, Samza and the Unix Philosophy of Distributed Data
    Martin Kleppmann, Jay Kreps
  Streaming@Twitter
    Maosong Fu, Sailesh Mittal, Vikas Kedigehalli, Karthik Ramasamy, Michael Barry, Andrew Jorgensen, Christopher Kellogg, Neng Lu, Bill Graham, Jingwei Wu
  Apache Flink™: Stream and Batch Processing in a Single Engine
    Paris Carbone, Stephan Ewen, Seif Haridi, Asterios Katsifodimos, Volker Markl, Kostas Tzoumas
  CSA: Streaming Engine for Internet of Things
    Zhitao Shen, Vikram Kumaran, Michael J. Franklin, Sailesh Krishnamurthy, Amit Bhat, Madhu Kumar, Robert Lerche, Kim Macpherson
  Trill: Engineering a Library for Diverse Analytics
    Badrish Chandramouli, Jonathan Goldstein, Mike Barnett, James F. Terwilliger
  Language Runtime and Optimizations in IBM Streams
    Scott Schneider, Buğra Gedik, Martin Hirzel
  FUGU: Elastic Data Stream Processing with Latency Constraints
    Thomas Heinze, Yuanzhen Ji, Lars Roediger, Valerio Pappalardo, Andreas Meister, Zbigniew Jerzak, Christof Fetzer
  Exploiting Sharing Opportunities for Real-time Complex Event Analytics
    Elke A. Rundensteiner, Olga Poppe, Chuan Lei, Medhabi Ray, Lei Cao, Yingmei Qi, Mo Liu, Di Wang
  Handling Shared, Mutable State in Stream Processing with Correctness Guarantees
    Nesime Tatbul, Stan Zdonik, John Meehan, Cansu Aslantas, Michael Stonebraker, Kristin Tufte, Chris Giossi, Hong Quach
  “The Event Model” for Situation Awareness
    Opher Etzion, Fabiana Fournier, Barbara von Halle
  Towards Adaptive Event Detection Techniques for the Twitter Social Media Data Stream
    Michael Grossniklaus, Marc H. Scholl, Andreas Weiler

Conference and Journal Notices

  TCDE Membership Form (back cover)

Editorial Board

  Editor-in-Chief
    David B. Lomet
    Microsoft Research
    One Microsoft Way
    Redmond, WA 98052, USA

  Associate Editors
    Christopher Jermaine
    Department of Computer Science
    Rice University
    Houston, TX 77005

    Bettina Kemme
    School of Computer Science
    McGill University
    Montreal, Canada

    David Maier
    Department of Computer Science
    Portland State University
    Portland, OR 97207

    Xiaofang Zhou
    School of Information Tech. & Electrical Eng.
    The University of Queensland
    Brisbane, QLD 4072, Australia

  Distribution
    Brookes Little
    IEEE Computer Society
    10662 Los Vaqueros Circle
    Los Alamitos, CA 90720

TCDE Executive Committee

  Chair
    Xiaofang Zhou
    School of Information Tech. & Electrical Eng.
    The University of Queensland
    Brisbane, QLD 4072, Australia

  Executive Vice-Chair
    Masaru Kitsuregawa
    The University of Tokyo
    Tokyo, Japan

  Secretary/Treasurer
    Thomas Risse
    L3S Research Center
    Hanover, Germany

  Vice Chair for Conferences
    Malu Castellanos
    HP Labs
    Palo Alto, CA 94304

  Advisor
    Kyu-Young Whang
    Computer Science Dept., KAIST
    Daejeon 305-701, Korea

  Committee Members
    Amr El Abbadi
    University of California
    Santa Barbara, California

    Erich Neuhold
    University of Vienna
    A 1080 Vienna, Austria

    Alan Fekete
    University of Sydney
    NSW 2006, Australia

    Wookey Lee
    Inha University
    Inchon, Korea

  Chair, DEW: Self-Managing Database Sys.
    Shivnath Babu
    Duke University
    Durham, NC 27708

  Co-Chair, DEW: Cloud Data Management
    Hakan Hacigumus
    NEC Laboratories America
    Cupertino, CA 95014

  VLDB Endowment Liaison
    Paul Larson
    Microsoft Research
    Redmond, WA 98052

  SIGMOD Liaison
    Anastasia Ailamaki
    École Polytechnique Fédérale de Lausanne
    Station 15, 1015 Lausanne, Switzerland

The TC on Data Engineering

Membership in the TC on Data Engineering is open to all current members of the IEEE Computer Society who are interested in database systems. The TCDE web page is http://tab.computer.org/tcde/index.html.

The Data Engineering Bulletin

The Bulletin of the Technical Committee on Data Engineering is published quarterly and is distributed to all TC members. Its scope includes the design, implementation, modelling, theory and application of database systems and their technology.

Letters, conference information, and news should be sent to the Editor-in-Chief. Papers for each issue are solicited by and should be sent to the Associate Editor responsible for the issue.

Opinions expressed in contributions are those of the authors and do not necessarily reflect the positions of the TC on Data Engineering, the IEEE Computer Society, or the authors’ organizations.

The Data Engineering Bulletin web site is at http://tab.computer.org/tcde/bull_about.html.

Letter from the Editor-in-Chief

Delayed Publication

This December 2015 issue of the Bulletin is, as some of you may notice, being published in July of 2016, after the March and June 2016 issues have been published. Put simply, the issue is late, and the March and June issues were published in their correct time slots. The formatting of the issue, and the surrounding editorial material, e.g. the inside front cover and copyright notice, are set to the December 2015 timeframe. Indeed, the only mention of this inverted ordering of issues is in this paragraph. Things do not always go as planned. However, I am delighted that the current issue is being published, and I have high confidence that you will enjoy reading about next-generation stream processing, the topic of the issue.

The Current Issue

At one point a few years ago, the research community had lost interest in stream processing. The first streaming systems had been built and these early systems demonstrated their feasibility. Commercial interest had been generated, with a number of start-ups and major vendors entering the market.
Even using a declarative database-style query language had become an accepted part of the technology landscape. Job done, right?

Actually, wrong! As we have seen with the database field itself, innovation and a changing technological environment can lead to an “encore” of interest in a field. Such is the case with stream processing. The issue title, “Next-Generation Stream Processing”, captures that. The issue itself captures a whole lot more about the state of the field. Streaming systems have evolved, sometimes in revolutionary ways. Applications of streaming technology have exploded, both in number and in importance. As much as at any time in the past, the streams area is a hive of activity. New technology is opening new application areas, while new application areas create a pull for new technology.

David Maier has worked with Badrish Chandramouli to assemble this current issue devoted to presenting the diversity of work now in progress in the streaming area. Streaming technology is at the core of much of their recent research. This makes them ideal editors for the current issue. They have brought together papers that not only provide insights into new streaming technology, but also illustrate where technology might be taking us in its enabling of new applications. Streams are here as a permanent part of the technology environment in a way similar to databases. Thanks to both David and Badrish for bringing this issue together on a topic that will, I am convinced, become a fixture of both the research and the application environment of our field.

David Lomet
Microsoft Corporation

Letter from the Special Issue Editors

The precursors of data-stream systems began to show up in the late 1980s and early 1990s in the form of “reactive” extensions to data management systems. With such extensions, there was a reversal of sorts between the roles of data and query.
Database requests (in the form of continuous queries, materialized views, event-condition-action rules, subscriptions, and so forth) became persistent entities that responded to newly arriving data.

The initial generation of purpose-built stream systems addressed many issues: appropriate languages, dealing with unbounded input, handling delay and disorder, dealing with high data rates, load balancing and shedding, resiliency, and, to some extent, distribution and parallelism. However, integration with other system components, such as persistent storage and messaging middleware, was often rudimentary or left to the application programmer.

The most recent generation of stream systems has the benefit of a better understanding of application requirements and execution platforms, by virtue of lessons learned through experimentation with earlier systems. Scaling, in cloud, fog, and cluster environments, has been at the forefront of design considerations. Systems need to scale not just in terms of stream rate and number of streams, but also to large numbers of queries. Application tuning, operation, and maintenance have also come to the forefront. Support for tradeoffs among throughput, latency, accuracy, and availability is important for application requirements, such as meeting service-level agreements. Resource management at run time is needed to enable elasticity of applications as well as for managing multi-tenancy, both with other stream tasks and with other application components. Many stream applications require long-term deployment, possibly on the order of years. Thus, the ability to maintain the underlying stream systems as well as evolve applications that run on them is critical. State management is also a concern, both within stream operators and in interactions with other state managers, such as transactional