© 2020 Faria Kalim

SATISFYING SERVICE LEVEL OBJECTIVES IN SYSTEMS

BY

FARIA KALIM

DISSERTATION

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate College of the University of Illinois at Urbana-Champaign, 2020

Urbana, Illinois

Doctoral Committee:
Professor Indranil Gupta, Chair
Professor Klara Nahrstedt
Assistant Professor Tianyin Xu
Dr. Ganesh Ananthanarayanan

Abstract

An increasing number of real-world applications today consume massive amounts of data in real-time to produce up to date results. These applications include social media sites that show top trends and recent comments, streaming video analytics that identify traffic patterns and movement, and jobs that process ad pipelines. This has led to the proliferation of stream processing systems that process such data to produce real-time results. As these applications must produce results quickly, users often wish to impose performance requirements on the stream processing jobs, in the form of service level objectives (SLOs) that include producing results within a specified deadline or producing results at a certain throughput. For example, an application that identifies traffic accidents can have tight latency SLOs as paramedics may need to be informed, where given a video sequence, results should be produced within a second. A social media site could have a throughput SLO where top trends should be updated with all received input per minute. Satisfying job SLOs is a hard problem that requires tuning various deployment parameters of these jobs. This problem is made more complex by challenges such as 1) job input rates that are highly variable across time e.g., more traffic can be expected during the day than at night, 2) transparent components in the jobs’ deployed structure that the job developer is unaware of, as they only understand the application-level business logic of the job, and 3) different deployment environments per job e.g., on a cloud infrastructure vs. on a local cluster. In order to handle such challenges and ensure that SLOs are always met, developers often over-allocate resources to jobs, thus wasting resources. In this thesis, we show that SLO satisfaction can be achieved by resolving (i.e., preventing or mitigating) bottlenecks in key components of a job’s deployed structure. Bottlenecks occur when tasks in a job do not have sufficient allocation of resources (CPU, memory or disk), or when the job tasks are assigned to machines in a way that does not preserve locality and causes unnecessary message passing over the network, or when there are an insufficient number of tasks to process job input. We have built three systems that tackle the challenges of satisfying SLOs of stream processing jobs that face a combination of these bottlenecks in various environments. We have developed Henge, a system that achieves SLO satisfaction of stream processing jobs deployed on a multi-tenant cluster of resources. As the input rates of jobs change dynamically, Henge makes cluster-level resource allocation decisions to continually meet jobs’ SLOs in spite of limited cluster resources. Second, we have developed Meezan, a system that aims to remove the burden of finding the ideal resource allocation of jobs deployed on commercial cloud platforms, in terms of performance and cost, for new users of stream processing. When a user submits their job

to Meezan, it provides them with a spectrum of throughput SLOs for their jobs, where the most performant choice is associated with the highest resource usage and consequently cost, and vice versa. Finally, in collaboration with Twitter, we have built Caladrius, a system that enables users to model and predict how input rates of jobs may change in the future. This allows Caladrius to preemptively scale a job out when it anticipates high workloads to prevent SLO misses. Henge is built atop Apache Storm [7], while Meezan and Caladrius are integrated with Apache Heron [112].

Acknowledgments

I would like to thank my advisor, Professor Indranil Gupta, for his unfailing support and guidance. I have benefited enormously from his immense knowledge and constant motivation during the course of my Ph.D. study. He has been a mentor both professionally and personally, and has given me opportunities that I could not have ever had without his help. I am incredibly grateful for the privilege of having been his student. Additionally, I would like to thank my committee members, Prof. Klara Nahrstedt, Prof. Tianyin Xu and Dr. Ganesh Ananthanarayanan, for providing insightful feedback that helped shape my thesis. I have always looked up to Klara as the role model of an impactful female researcher in STEM. Meeting Tianyin was a turning point during grad school: I always found his optimism and energy uplifting, and am going to miss his encouragement outside school. Ganesh’s work has always set the bar for me, for what cutting-edge research in systems and networking should look like, and I have tried to meet it as best as I could. I have also had the chance to meet great collaborators during my journey. Asser Tantawi at IBM Research emphasized the importance of building systems on strong theoretical grounds. The Real-Time Compute team at Twitter (especially Huijun Wu, Ning Wang, Neng Lu and Maosong Fu) gave me great insight into streaming systems and were a massive help with Caladrius. I would like to express my sincere gratitude to Lalith Suresh, who was my internship mentor at VMware Research, and gave me the opportunity to work on incredibly exciting research. I will always be inspired by his kindness towards me, his meticulous approach towards research, and his high standards for system building and emphasis on reproducibility. I have had the privilege to make amazing friends in Urbana-Champaign. I will always be inspired by Farah Hariri and Humna Shahid, who have set an example for what great strength of character and selflessness looks like. I am indebted to Sangeetha Abdu Jyothi for her buoyant positivity, advice and help, in and out of grad school. Sneha Krishna Kumaran and Pooja Malik have been close friends through thick and thin. I am also thankful to Huda Ibeid, who is marvelously personable and broke many barriers for me. I am incredibly grateful to serendipity for giving me the chance to meet these amazing women in grad school and I will treasure their friendship forever. A special thanks to Mainak Ghosh, Rui Yang, Shadi Abdollahian Noghabi, Jayasi Mehar, and Mayank Bhatt for their kindness and support during my lows. I would also like to thank Le Xu, Beomyeol Jeon, Shegufta Ahsan Bakht, Safa Messaoud and Assma Boughoula for all the wonderful times we have spent together. I am grateful to the Computer Science Department at University of Illinois, Urbana Champaign,

for giving me the opportunity to pursue a doctorate at this amazing institution. Many thanks to the staff of the department, including but not limited to: Viveka Perera Kudaligama, Samantha Smith, Kathy Runck, Kara MacGregor, Mary Beth Kelley, Maggie Metzger Chappell and Samantha Hendon for their help and support at countless times during my Ph.D. process. I am thankful to the National Science Foundation, Microsoft, and Sohaib and Sara Abbasi for funding my work. I will always look up to Sohaib Abbasi as a role model, who generously gave back to his country by establishing a fellowship for Pakistani students pursuing higher education. He is an inspiration and I hope I can demonstrate a generosity equal to his in my career. I would like to extend my gratitude to Dr. Usman Ilyas, Dr. Tahir Azim, and Dr. Aamir Shafi, who were my mentors at my undergraduate alma mater, the National University of Science and Technology, Pakistan. My journey would not have started without their support. Finally and most importantly, I would like to thank my family for their unconditional love and support. My parents' faith in me gives me the courage to do things I could otherwise not even contemplate. They are exemplary people who have fought many difficult battles in their life, have served their country well and have made countless sacrifices for their children. I hope that my efforts make them proud. Umar Kalim is a pillar of support, guidance and empathy, who I will always need. Umairah Kalim has always been a source of warmth, encouragement and strength, and Zubair Khan is the invaluable glue that holds our family together. I dedicate this dissertation to them.

To my family.

Table of Contents

Chapter 1 Introduction ...... 1
1.1 Thesis Contributions ...... 4
1.2 Thesis Organization ...... 5

Chapter 2 Henge: Intent-Driven Multi-Tenant Stream Processing ...... 6
2.1 Introduction ...... 6
2.2 Contributions ...... 8
2.3 Henge Summary ...... 9
2.4 Background ...... 11
2.5 System Design ...... 13
2.6 Henge State Machine ...... 15
2.7 Juice: Definition and Algorithm ...... 19
2.8 Implementation ...... 22
2.9 Evaluation ...... 22
2.10 Related Work ...... 37
2.11 Conclusion ...... 39

Chapter 3 Caladrius: A Performance Modelling Service for Distributed Stream Processing Systems ...... 41
3.1 Introduction ...... 41
3.2 Background ...... 43
3.3 System Architecture ...... 46
3.4 Models: Source Traffic Forecast & Topology Performance Prediction ...... 47
3.5 Experimental Evaluation ...... 54
3.6 Related Work ...... 61
3.7 Conclusion ...... 64

Chapter 4 Meezan: Stream Processing as a Service ...... 65
4.1 Introduction ...... 65
4.2 Background ...... 68
4.3 System Design ...... 68
4.4 Evaluation ...... 81
4.5 Related Work ...... 89
4.6 Conclusion ...... 92

Chapter 5 Conclusion and Future Work...... 93

References...... 97

Chapter 1: Introduction

Many use cases of large-scale analytics today involve processing massive volumes of continuously-produced data. For example, Twitter needs to support functions such as finding patterns in user tweets and counting top trending hashtags over time. Uber must run business-critical calculations such as finding surge prices for different localities when the number of riders exceeds the number of drivers [42]. Zillow needs to provide near-real-time home prices to customers [44] and Netflix needs to find applications communicating in real-time so it can consequently colocate them to improve user experience [33]. The high variability across use cases has led to the creation of many open-source stream processing systems and commercial offerings that have gained massive popularity in a short time. For example, Twitter uses Apache Heron [112], Uber uses [62] and both Netflix and Zillow employ Amazon Kinesis [2], a commercial offering from Amazon. Some corporations have created their in-house stream processing solutions such as Turbine at Facebook [124], and Millwheel [47] and later Dataflow [48] at Google. The popularity of stream processing has increased so much that the streaming analytics market is expected to grow to $35.5 billion by 2024 from $10.3 billion in 2019 [122]. As stream processing systems have gained popularity, they have begun to differentiate themselves in both design and use cases. They began as systems that processed textual data ([7], [45], [47]), but with the proliferation of smartphones and cameras, the processing of video and images has also become an important use case ([99], [6], [172]). The market for processing of video streams is expected to grow to $7.5 billion by 2022, from $3.3 billion in 2017 [123]. Additionally, as storing streams of records became increasingly important, alongside allowing multiple users to consume the same stream, traditional publish-subscribe systems evolved to support some streaming functionality (such as in Kafka Streams [29]). In this thesis, we focus on stream processing applications that process textual data (e.g. sensor readings, and rider and driver locations in ride-share apps etc.) as it arrives. Once deployed and stable, stream processing jobs continue to process incoming data as long as their developer has use for its results; hence, jobs are usually long-running. Thus, it is essential to minimize the amount of resources they are deployed on, to reduce capital and operational expenses (Capex and Opex). Users of stream processing jobs generally expect that the jobs will provide well-defined performance guarantees. One of the easiest ways to express these guarantees is in terms of high-level service level objectives (SLOs) [38]. For example, a revenue-critical application for social media websites may be one that constructs an accurate, real-time count of ad-clicks per minute for their advertisers. Similarly, Uber may want to calculate surge pricing or match a driver to a rider within

20 seconds. These are examples of latency-sensitive applications. Applications with throughput goals include LinkedIn's pipeline [6], where trillions of events are processed per day, updates are batched together and sent to users according to a user-defined frequency e.g., once per day. Currently, tuning stream processing jobs to allow them to satisfy their SLOs requires a great deal of manual effort, and expert understanding of how they are deployed. As an example, we find that even today, many Github issues for Apache Heron [112] (a popular stream processing system) are queries for the system's developers that are related to finding out the optimal resource allocations for various scenarios [15–26]. Although there is a great deal of preexisting work on automatically finding the best resource allocation for batch processing jobs [87–89,95,140,157,158], the long-running nature of stream processing jobs means that unlike batch processing jobs, resources made available by finished upstream tasks in a job cannot be reused by downstream tasks. Therefore, the job developer must reason about the entire structure of the job when determining its required resource allocation. In addition, changing input rates of streaming jobs over time imply that a resource allocation may work very well for a job at a certain time of day but may be completely insufficient at another. Today, developers of the stream processing system deal with these challenges by observing low-level monitoring information such as queue sizes, and CPU and memory load to ascertain which parts of the job are bottlenecked, and scale those out gradually until performance goals are met [27]. This is not a straightforward process – once upstream bottlenecks are resolved, downstream operators may bottleneck. Fully resolving all bottlenecks requires looking at the entire job structure. In addition, this problem becomes more complex as the deployment environment of jobs changes: jobs can be running in a shared, multi-tenant environment, on separate virtual machines on privately-owned infrastructure, or on virtual machines on popular cloud offerings such as Amazon EC2 [3] or Microsoft Azure [30]. Our thesis is driven by the vision that the deployer of each job should be able to clearly specify their performance expectations, or intents, as SLOs to the system, and it is the responsibility of the underlying engine and system to meet these objectives. This alleviates the developer's burden of monitoring and adjusting their job. Modern open-source stream processing systems like Storm [41] are very primitive and do not admit intents of any kind. We posit that new users should not have to grapple with complex metrics such as queue sizes and load. As stream processing becomes prevalent, it is important to allow jobs to be deployed easily, thus reducing the load on cluster operators that manage the jobs, and reducing the barrier to entry for novice users. These users include a wide range of people from students working in research labs with very limited resources, to experienced software engineers who work alongside the developers of these systems (e.g. for Apache Heron at Twitter), but do not have experience with running the framework themselves and thus have difficulty optimizing jobs. This has led to the creation of

system-specific trouble-shooting guides [27] that offer advice on how to deploy jobs correctly and tune them to achieve optimal performance. Referring to these guides repeatedly to tune jobs can be an arduous process, especially if the guides are out of date. Another difficulty in accurately ascertaining the resource requirements of a stream processing job is that some of its components are transparent to the developer. In order to make job development easier, developers are asked to only provide the logical computation the job must perform on each piece of data. The developers are then asked to allocate resources to the job as a whole: thus, they may not allocate sufficient resources for the components that are transparent to them. These include components that perform orchestration e.g., message brokers that communicate messages between operators and monitoring processes that communicate job health metrics to the developer. Usually, developers get around this problem by over-allocating resources to jobs [101] and hoping that all goes well. This leads to unnecessarily high operational costs (Opex) [43, 125]. Furthermore, translating a performance goal into a resource specification is made challenging by the number of variables in a data center environment, all of which have an impact on performance. The heterogeneity of available hardware, varying protocol versions used on the network stack, and variation in input rates of jobs due to external, unpredictable events are just a few of these challenges. However, despite all of these variables, all jobs present clear information about the resources they need more of during execution, through bottlenecks. Thus, we propose the central hypothesis of this thesis: SLO satisfaction in stream processing jobs can be achieved by preventing and mitigating bottlenecks in key components of their deployed structure. These components include those that are transparent to developers and those that are not. In order to handle the many variables that can cause variation in performance, previous approaches have applied machine learning (ML) techniques to derive SLO-satisfying resource allocations for batch jobs [49,158]. We argue that correctly identifying bottlenecks in jobs allows us to ascertain the resources they need more of to achieve their SLOs. Each component has a maximum processing rate. Once that is reached, the component is bottlenecked and cannot keep up with a higher input rate, causing input to queue. Hence, bottlenecks need to be resolved to ensure that SLOs are satisfied for all inputs. Additionally, awareness of possible bottlenecks in a job allows us to correctly ascertain the amount of resources the job requires to satisfy its SLO. This allows us to derive SLO-satisfying job deployments in explainable ways, unlike ML-based approaches. Our SLO-satisfying job deployments also prevent over-allocation of resources, leading to lower operational costs. Bottlenecks can be pre-emptively predicted or they can be detected during execution. Pre-emptively forecasting and removing bottlenecks is useful in cases where workloads are very predictable, or have few unexpected events. Usually, however, workloads consist of a baseline rate that has a predictable pattern, with some unexpected events. Data center schedulers always require an online reactive component to handle such unexpected loads; however, predictive approaches can

System    | Problem Setting                                | Approach to SLO Satisfaction | Bottlenecks Addressed       | Streaming Input
Henge     | Multi-Tenant, Limited Resources in DCs         | Bottleneck Mitigation        | Operators                   | Text
Caladrius | Unlimited Resources in DCs                     | Bottleneck Prevention        | Operators                   | Text
Meezan    | Commercial Cloud Offerings (e.g., Azure & AWS) | Bottleneck Prevention        | Operators & Message Brokers | Text

Figure 1.1: Systems developed in this thesis, their respective problem settings, and approaches used for deriving effective solutions.

be helpful to the extent that they allow us to avoid regularly occurring bottlenecks. Thus, we note that both approaches, when utilized together, minimize the potential for missed SLOs. In this thesis, we design systems that implement each approach respectively, and can be used to complement each other. We describe the contributions of this thesis in the next section.

1.1 THESIS CONTRIBUTIONS

The contributions of this thesis (summarized in Figure 1.1) include SLO-satisfying deployments of stream processing jobs that primarily process text-based input, in three different cluster scheduling cases, which cover the majority of deployment use cases within data center environments. These cases are described in detail below:

1. Henge: Intent-driven Multi-Tenant Stream Processing

We built Henge, an online scheduler that provides intent-based multi-tenancy in modern distributed stream processing systems. This means that everyone in an organization can now submit their stream processing jobs to a single, shared, consolidated cluster. Henge allows each job to specify its own performance intent as a Service Level Objective (SLO) that captures latency or throughput SLOs. In such a cluster, the Henge scheduler behaves reactively: it detects bottlenecks during job execution and adapts job configuration continually to mitigate them, so that job SLOs are met in spite of limited cluster resources, and under dynamically varying workloads. SLOs are soft and are based on utility functions. Henge's overall goal is to maximize the total system utility achieved by all jobs deployed on the cluster. Henge is integrated into Apache Storm. Our experiments with real-world workloads show that Henge converges quickly to maximum system utility when the cluster has sufficient resources and to high system utility when resources are constrained.

2. Meezan: Stream Processing as a Service

Meezan is a system that allows novice users to deploy stream processing jobs easily on commercial cloud offerings. Given a user's job, Meezan profiles it to understand its performance at larger scales and with increased input. In light of this information, it presents the user with a spectrum of possible job deployments where the cheapest (/most expensive) options would provide the minimum (/maximum) level of guaranteed throughput performance. With each deployment option, Meezan guarantees that bottlenecks will be prevented as long as the input rate remains constant. Each deployment option ensures that the job has sufficient resources to maintain the promised throughput SLO. This way, Meezan prevents bottlenecks from occurring. We have integrated Meezan into Apache Heron. Our experiments with real-world clusters and workloads show that Meezan creates job deployments that scale linearly in terms of size and cost in order to scale job throughput and minimizes resource fragmentation. It is able to reduce the cost of deployment by up to an order of magnitude, as compared to a version of the default Heron scheduler that is modified to support job scheduling for cloud platforms with heterogeneous VM types.

3. Caladrius: A Performance Modelling Service for Distributed Stream Processing Systems

Given the varying job workloads that characterize stream processing, stream processing systems need to be tuned and adjusted to maintain performance targets in the face of variation in incoming traffic. Current auto-scaling systems adopt a series of trials to approach a job's expected performance due to a lack of performance modelling tools. We find that general traffic trends in most jobs lend themselves well to prediction. Based on this premise, we built a system called Caladrius that forecasts the future traffic load of a stream processing job and predicts its processing performance after a proposed change to the parallelism of its operators. We have integrated Caladrius into Apache Heron. Real-world experimental results show that Caladrius is able to estimate a job's throughput performance and CPU load under a given scaling configuration.

Within all three environments, fully understanding the deployment structure of jobs and the different components that can present bottlenecks is essential for creating SLO-satisfying deployments.

1.2 THESIS ORGANIZATION

Chapters 2-4 describe each of our contributions in detail. We conclude with pertinent future directions in Chapter 5.

Chapter 2: Henge: Intent-Driven Multi-Tenant Stream Processing

This chapter presents Henge, a system that supports intent-based multi-tenancy in modern stream processing applications. Henge supports multi-tenancy as a first-class citizen: everyone inside an organization can now submit their stream processing jobs to a single, shared, consolidated cluster. Additionally, Henge allows each tenant (job) to specify its own intents (i.e., requirements) as a Service Level Objective (SLO) that captures latency and/or throughput. In a multi-tenant cluster, the Henge scheduler adapts continually to meet jobs’ SLOs in spite of limited cluster resources, and under dynamic input workloads. SLOs are soft and are based on utility functions. Henge continually tracks SLO satisfaction, and when jobs miss their SLOs, it wisely navigates the state space to perform resource allocations in real time, maximizing total system utility achieved by all jobs in the system. Henge is integrated in Apache Storm and the thesis presents experimental results, using both production topologies and real datasets.

2.1 INTRODUCTION

Modern stream processing systems process continuously-arriving data streams in real time, ranging from Web data to social network streams. For instance, several companies use Apache Storm [7] (e.g., Weather Channel, Alibaba, Baidu, WebMD, etc.), Twitter uses Heron [112], LinkedIn relies on Samza [6] and others use Apache Flink [4]. These systems provide high-throughput and low-latency processing of streaming data from advertisement pipelines (Yahoo! Inc. uses Storm for this), social network posts (LinkedIn, Twitter), and geospatial data (Twitter), etc. While stream processing systems for clusters have been around for decades [46, 77], modern stream processing systems have scant support for intent-based multi-tenancy. We describe these two terms. First, multi-tenancy allows multiple jobs to share a single consolidated cluster. This capability is lacking in stream processing systems today; as a result, many companies (e.g., Yahoo!) over-provision the stream processing cluster and then physically apportion it among tenants (often based on team priority). Besides higher cost, this entails manual administration of multiple clusters and caps on allocation by the sysadmin, and manual monitoring of job behavior by each deployer. Multi-tenancy is attractive as it reduces acquisition costs and allows sysadmins to only manage a single consolidated cluster. Thus, this approach reduces capital expenses (Capex) and operational expenses (Opex), lowers total cost of ownership (TCO), increases resource utilization, and allows jobs to elastically scale based on needs. Multi-tenancy has been explored for areas such as key-value stores [147], storage systems [159], batch processing [156], and others [121], yet it remains a vital need in modern stream processing systems. Second, we believe the deployer of each job should be able to clearly specify their performance

6 expectations as an intent to the system, and it is the underlying engine’s responsibility to meet this intent. This alleviates the developer’s burden of monitoring and adjusting their job. Modern open-source stream processing systems like Storm [41] are very primitive and do not admit intents of any kind. Our approach is to allow each job in a multi-tenant environment to specify its intent as a Service Level Objective (SLO). Then, Henge is responsible for translating these intents to resource configurations that allow the jobs to satisfy their SLOs. The metrics in an SLO should be user-facing, i.e., understandable and settable by lay users such as a deployer who is not intimately familiar with the innards of the system. For instance, SLO metrics can capture latency and throughput expectations. SLOs do not include internal metrics like queue lengths or CPU utilization which can vary depending on the software, cluster, and job mix (however, these latter metrics can be monitored and used internally by the scheduler for self-adaptation). It is simpler for lay users to not have to grapple with such complex metrics.

Business            | Use Case                                       | SLO Type
The Weather Channel | Monitoring natural disasters in real-time      | Latency, e.g., a tuple must be processed within 30 seconds
The Weather Channel | Processing collected data for forecasts        | Throughput, e.g., processing data as fast as it can be read
WebMD               | Monitoring blogs to provide real-time updates  | Latency, e.g., provide updates within 10 mins
WebMD               | Search Indexing                                | Throughput, e.g., index all new sites at the rate they're found
E-Commerce Websites | Counting ad-clicks                             | Latency, e.g., click count should be updated every second
E-Commerce Websites | Alipay uses Storm to process 6 TB logs per day | Throughput, e.g., process logs at the rate of generation

Table 2.1: Stream Processing Use Cases and Possible SLO Types.

While there are myriad ways to specify SLOs (including potentially declarative languages paralleling SQL), this work is best seen as one contributing mechanisms that are pivotal to building a truly intent-based distributed system for stream processing. In spite of their simplicity, our latency and throughput SLOs are immediately useful. Time-sensitive jobs (e.g., those related to an ongoing ad campaign) are latency-sensitive and can specify latency SLOs, while longer-running jobs (e.g., sentiment analysis of trending topics) typically have throughput SLOs. Table 2.1 shows several real stream processing applications [40], and the latency or throughput SLOs they may require. Support for a multi-tenant cluster with SLOs eliminates the need for over-provisioning. Instead of the de

Schedulers | Job Type | Adaptive | Reservation-Based           | SLOs
Mesos [90] | General  | ✗        | ✓ (CPU, Mem, Disk, Ports)   | ✗
YARN [156] | General  | ✗        | ✓ (CPU, Mem, Disk)          | ✗
Rayon [66] | Batch    | ✗        | ✓ (Resources across time)   | ✗
Henge      | Stream   | ✓        | ✗                           | ✓ (User-facing SLOs)

Table 2.2: Henge vs. Existing Schedulers.

facto style today of statically partitioning a cluster for jobs, consolidation makes the cluster shared, more effective, and cost-efficient. As Table 2.2 shows, most existing schedulers use reservation-based approaches to specify intents: besides not being user-facing, these are very hard to estimate even for a job with a static workload [101], let alone the dynamic workloads in streaming applications. This thesis presents Henge, a system consisting of the first scheduler to support both multi-tenancy and per-job intents (SLOs) for modern stream processing engines. In a cluster of limited resources, Henge continually adapts to meet jobs' SLOs in spite of other competing SLOs, both under natural system fluctuations, and under input rate changes due to diurnal patterns or sudden spikes. As our goal is to satisfy the SLOs of all jobs on the cluster, Henge must deal with the challenge of allocating resources to jobs continually and wisely. Henge is implemented in Apache Storm, one of the most popular modern open-source stream processing systems. Our experimental evaluation uses real-world workloads: Yahoo! production Storm topologies, and Twitter datasets. The evaluation shows that while satisfying SLOs, Henge prevents non-performing topologies from hogging cluster resources. It scales well with cluster size and jobs, and is tolerant to failures.

2.2 CONTRIBUTIONS

This chapter makes the following contributions:

1. We present the design of Henge and its state machine that manages resource allocation on the cluster.

2. We define a new throughput SLO metric called “juice” and present an algorithm to calculate it.

3. We define the structure of SLOs using utility functions.

4. We present implementation details of Henge’s integration into Apache Storm.

5. We present an evaluation of Henge using production topologies from Yahoo! and real-world workloads, e.g., diurnal workloads, spikes in input rate, and workloads generated from Twitter traces. We also evaluate Henge with topologies that have different kinds of SLOs, e.g., topologies with hybrid SLOs and tail latency SLOs. Additionally, we measure Henge's fault-tolerance, and scalability with respect to both an increasing number of topologies and cluster size.

This chapter is organized as follows.

• Section 2.3 presents a summary of Henge goals and design.

• Section 2.5 discusses core Henge design: SLOs and utilities (Section 2.5.1), operator congestion (Section 2.5.2), and the state machine (Section 2.6).

• Section 2.7 describes juice and its calculation.

• Section 2.8 describes how Henge is implemented and works as a module in Apache Storm.

• Section 2.9 presents evaluation results.

• Section 2.10 discusses related work in the areas of elastic stream processing systems, cluster scheduling and SLAs/SLOs in other areas.

2.3 HENGE SUMMARY

This section briefly describes the key ideas behind Henge’s goals and design.

Juice: As input rates can vary over time, it is infeasible for a throughput SLO to merely specify a desired absolute output rate value. For instance, it is very common for stream processing jobs to have diurnal workloads, with higher input rates during the day when most users are awake. In addition, sharp spikes in workloads can also occur e.g., when a lot of tweets are generated discussing an interesting event on Twitter. Therefore, setting an absolute value as a throughput SLO is very difficult: should it be set according to the highest possible workload or the average workload? We resolve this dilemma by defining a new input rate-independent metric for throughput SLOs called juice. In addition to being independent of input rate, juice is also independent of the operations a topology performs and the structure of the topology. We show how Henge calculates juice for arbitrary topologies (section 2.7). This makes the task of setting a throughput SLO very simple for the users. Juice lies in the interval [0, 1] and captures the ratio of processing rate to input rate–a value of 1.0 is ideal and implies that the rate of incoming tuples equals rate of tuples being processed by the job.

9 Throughput SLOs can then contain a minimum threshold for juice, making the SLO independent of input rate. We consider processing rate instead of output rate as this generalizes to cases where tuples may be filtered (thus they affect results but are never outputted themselves).

SLOs: A job’s SLO can capture either latency or juice (or a combination of both). The SLO contains: a) a threshold (min-juice or max-latency), and b) a utility function, inspired by soft real-time systems [110]. The utility function maps current achieved performance (latency or juice) to a value which represents the benefit to the job, even if it does not meet its SLO threshold. The function thus captures the developer intent that a job attains full “utility” if its SLO threshold is met and partial benefit if not. Henge supports monotonic utility functions: the closer the job is to its SLO threshold, the higher its achieved maximum possible utility. (Section 2.5.1).

State Space Exploration: At its core, Henge decides wisely how to change resource allocations of jobs (or rather of their basic units, operators) using a new state machine approach (Section 2.6). Moving resources in a live cluster is challenging. It entails a state space exploration where every step has both: 1) a significant realization cost, because moving resources takes time and affects jobs, and 2) a convergence cost, since the system needs a while to converge to steady state after a step. Our state machine is unique as it is online in nature: it takes one step at a time, evaluates its effect, and then moves on. This is a good match for unpredictable and dynamic scenarios such as modern stream processing clusters. The primary actions in our state machine are: 1) Reconfiguration (give resources to jobs missing SLO), 2) Reduction (take resources away from overprovisioned jobs satisfying SLO), and 3) Reversion (give up an exploration path and revert to past high utility configuration). Henge takes these actions wisely. Jobs are given more resources as a function of the amount of congestion they face. Highly intrusive actions like reduction are minimized in number and frequency. Small marginal gains in a job’s utility lead to it being precluded from reconfigurations in the near future.

Maximizing System Utility: Design decisions in Henge are aimed at converging each job quickly to its maximum achievable utility in a minimal number of steps. Henge attempts to maximize total achieved utility summed across all jobs. It does so by finding SLO-missing topologies, then their congested operators, and gives the operators thread resources according to their congestion levels. Our approach creates a weak form of Pareto efficiency [161]; in a system where jobs compete for resources, Henge transfers resources among jobs only if this will cause the system’s utility to rise. Henge’s technique attempts to satisfy all jobs’ SLOs when the cluster is relatively less packed. When the cluster becomes packed, jobs with higher priority automatically get more resources as this leads to higher maximum utility for the organization.

Preventing Resource Hogging: Topologies with stringent SLOs may try to take over all the resources of the cluster. To mitigate this, Henge prefers giving resources to those topologies that: a) are farthest from their SLOs, and b) continue to show utility improvements due to recent Henge actions. This spreads resource allocation across all wanting jobs and prevents starvation and resource hogging.

2.4 BACKGROUND

This section presents relevant background information about stream processing topologies, particularly those belonging to Apache Storm [7]. A stream processing job can be logically interpreted as a topology, i.e., a directed acyclic graph of operators (the Storm term is “bolt"). We use the terms job and topology interchangeably in this thesis. An operator is a logical processing unit that applies user-defined functions on a stream of tuples. The edges between operators represent the dataflow between the computation components. Source operators (called spouts) pull input tuples while sink operators spew output tuples. Spouts typically pull in tuples from sources such as publish-subscribe systems e.g., Kafka. The sum of output rates of sinks in a topology is its output rate, while the sum of all spout rates is the input rate. Each operator is parallelized via multiple tasks. Fig. 2.1 shows a topology with one spout and one sink.

[Figure: a topology in which the Spout emits 10,000 tuples to Bolt A; Bolt A filters out 2,000 tuples and sends 8,000 tuples along each outgoing edge to Bolts B and C; Bolt C is congested and processes only 6,000 tuples in the time unit; Bolts B and C both feed Bolt D.]

Figure 2.1: Sample Storm topology. Showing tuples processed per unit time. Edge labels indicate number of tuples sent out by the parent operator to the child. (Congestion described in section 2.7.)

We consider long-running stream processing topologies with a continuous operator model. Storm topologies usually run on a distributed cluster. Users can submit their jobs to the master process, which is called Nimbus. This process distributes and coordinates the execution of the topology. A topology is actually run on one or more worker processes. A worker node may have one or more worker processes, but each worker is mapped only to a single topology. Each worker runs on a JVM and instantiates executors (threads), which run tasks specific to one operator. Therefore, tasks provide parallelism for bolts and spouts, and executors provide parallelism for the whole

topology. An operator processes streaming data one tuple at a time and forwards the tuples to the next operators in the topology. Systems that follow such a model include Apache Storm [154], Heron [112], Flink [4] and Samza [6]. Each worker node also runs a Supervisor process that communicates with Nimbus. A separate Zookeeper [8] installation is used to maintain the state of the cluster. Figure 2.2 shows the high-level architecture of Apache Storm.

[Figure: the Nimbus master communicates through a Zookeeper ensemble with a Supervisor process on each worker node; each Supervisor manages that node's Worker processes.]

Figure 2.2: High-Level Storm Architecture.

Definitions: The latency of a tuple is the time between it entering the topology from the source, to producing an output result on any sink. A topology's latency is then the average of tuple latencies, measured over a period of time. A topology's throughput is the number of tuples it processes per unit time. A Service Level Objective (SLO) [38] is a customer topology's requirement, in terms of latency and/or throughput. Some examples of latency-sensitive jobs include applications that perform real-time analytics or real-time natural language processing, provide moderation services for chat rooms, count bid requests, or calculate real-time trade quantities in stock markets. Examples of throughput-sensitive applications include jobs that perform incremental checkpointing, count online visitors, or perform sentiment analysis.

2.5 SYSTEM DESIGN

This section describes Henge’s utility functions (Section 2.5.1), congestion metric (Section 2.5.2), and its state machine (Section 2.6).

2.5.1 SLOs and Utility Functions

Each topology's SLO contains: a) an SLO threshold (min-juice or max-latency), and b) a utility function. The utility function maps the current performance metrics of the job (i.e., its SLO metric) to a current utility value. This approach abstracts away the type of SLO metric each topology has, and allows the scheduler to compare utilities across jobs. Currently, Henge supports both latency and throughput metrics in the SLO. Latency was defined in Section 2.4. For throughput, we use a new SLO metric called juice which we define concretely later in Section 2.7 (for the current section, an abstract throughput metric suffices). When the SLO threshold cannot be satisfied, the job still desires some level of performance close to the threshold. Hence, utility functions must be monotonic: for a job with a latency SLO, the utility function must be monotonically non-increasing as latency rises, while for a job with a throughput SLO, it must be monotonically non-decreasing as throughput rises. Each utility function has a maximum utility value, achieved only when the SLO threshold is met, e.g., a job with an SLO threshold of 100 ms would achieve its maximum utility only if its current latency is below 100 ms. As latency grows above 100 ms, utility can fall or plateau but can never rise. The maximum utility value is based on job priority. For example, in Fig. 2.3a, topology T2 has twice the priority of T1, and thus has twice the maximum utility (20 vs. 10). Given these requirements, Henge is able to allow a wide variety of shapes for its utility functions including: linear, piece-wise linear, step function (allowed because utilities are monotonically non-increasing instead of monotonically decreasing), lognormal, etc. Utility functions do not need to be continuous. All in all, this offers users flexibility in shaping utility functions according to individual needs. The concrete utility functions used in our Henge implementation are knee functions, depicted in Fig. 2.3. A knee function has two pieces: a plateau beyond the SLO threshold, and a sub-SLO part for when the job does not meet the threshold. Concretely, the achieved utility for jobs with throughput and latency SLOs respectively, are:

[Figure: knee utility curves for topologies T1 and T2 (throughput/juice SLOs, with Priority(T1) = P and Priority(T2) = 2P) and T3 (latency SLO), each showing expected vs. current utility.]

Figure 2.3: Knee Utility functions. (a) Throughput SLO utility, (b) Latency SLO utility.

\frac{\text{Current Utility}}{\text{Job Max Utility}} = \min\left(1, \frac{\text{Current Throughput Metric}}{\text{SLO Throughput Threshold}}\right) \quad (2.1)

\frac{\text{Current Utility}}{\text{Job Max Utility}} = \min\left(1, \frac{\text{SLO Latency Threshold}}{\text{Current Latency}}\right) \quad (2.2)

The sub-SLO is the last term inside “min”. For throughput SLOs, the sub-SLO is linear and arises from the origin point. For latency SLOs, the sub-SLO is hyperbolic ($y \propto \frac{1}{x}$), allowing increasingly smaller utilities as latencies rise. Fig. 2.3 shows a throughput SLO (Fig. 2.3a) vs. latency SLO (Fig. 2.3b). We envision Henge to be used internally inside companies, hence job priorities are set in a consensual way (e.g., by upper management). The utility function approach is also amenable to use in contracts like Service Level Agreements (SLAs); however, these are beyond the scope of this chapter.
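To make the knee shape concrete, the following Python sketch evaluates Equations 2.1 and 2.2; the function names and arguments are illustrative and are not taken from Henge's implementation.

```python
def throughput_utility(current_juice, slo_juice_threshold, max_utility):
    """Knee utility for a throughput (juice) SLO, per Eq. 2.1.

    Utility grows linearly from the origin and plateaus at max_utility
    once the juice threshold is met.
    """
    return max_utility * min(1.0, current_juice / slo_juice_threshold)


def latency_utility(current_latency, slo_latency_threshold, max_utility):
    """Knee utility for a latency SLO, per Eq. 2.2.

    Utility is maximal at or below the latency threshold and decays
    hyperbolically (y ∝ 1/x) as latency grows beyond it.
    """
    return max_utility * min(1.0, slo_latency_threshold / current_latency)


# Example in the spirit of Fig. 2.3: T2 has twice T1's priority, hence twice the max utility.
print(throughput_utility(current_juice=0.5, slo_juice_threshold=1.0, max_utility=10))   # 5.0
print(latency_utility(current_latency=200.0, slo_latency_threshold=100.0, max_utility=20))  # 10.0
```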

2.5.2 Operator Congestion Metric

A topology misses its SLOs when some of its operators become congested, i.e., have insufficient resources. To detect congestion our implementation uses a metric called operator capacity [39]. However, Henge can also use other existing congestion metrics, e.g., input queue sizes or ETP [167]. Operator capacity captures the fraction of time that an operator spends processing tuples during a

14 time unit. Its values lie in the range [0.0, 1.0]. If an executor’s capacity is near 1.0, then it is close to being congested. Consider an executor E that runs several (parallel) tasks of a topology operator. Its capacity is calculated as:

\mathrm{Capacity}_E = \frac{\text{Executed Tuples}_E \times \text{Execute Latency}_E}{\text{Unit Time}} \quad (2.3)

where Unit Time is a time window. The numerator multiplies the number of tuples executed in this window and their average execution latency to calculate the total time spent in executing those tuples. The operator capacity is then the maximum capacity across all executors containing it. Henge considers an operator to be congested if its capacity is above the threshold of 0.3. This increases the pool of possibilities, as more operators become candidates for receiving resources (described next).
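The following is a minimal Python sketch of Equation 2.3 and the congestion check; the per-executor statistics format is an assumption made for illustration, not Henge's code.

```python
CAPACITY_THRESHOLD = 0.3  # operators above this capacity are treated as congested


def executor_capacity(executed_tuples, execute_latency_s, unit_time_s):
    """Eq. 2.3: fraction of the time window an executor spent executing tuples."""
    return (executed_tuples * execute_latency_s) / unit_time_s


def operator_capacity(executor_stats, unit_time_s=1.0):
    """Operator capacity is the maximum capacity over all executors running its tasks."""
    return max(executor_capacity(n, lat, unit_time_s) for n, lat in executor_stats)


def is_congested(executor_stats, unit_time_s=1.0):
    return operator_capacity(executor_stats, unit_time_s) > CAPACITY_THRESHOLD


# Two executors of one operator: (tuples executed, average execute latency in seconds).
stats = [(4000, 0.0001), (3000, 0.0002)]
print(operator_capacity(stats))   # 0.6
print(is_congested(stats))        # True
```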

2.6 HENGE STATE MACHINE

The state machine (shown in Fig. 2.4) considers all jobs in the cluster as a whole and wisely decides how many resources to give to congested jobs in the cluster and when to stop. The state machine is for the entire cluster, not per job. The cluster is in the Converged state if and only if either: a) all topologies have reached their maximum utility (i.e., satisfy their respective SLO thresholds), or b) Henge recognizes that no further actions will improve the performance of any topology, and thus it has reverted to the last best configuration. All other states are Not Converged. To move among these two states, Henge uses three actions: Reconfiguration, Reduction, and Reversion.

[Figure: state machine with states Not Converged and Converged; Reconfiguration, Reduction, and Reversion move the cluster between them, and the cluster leaves Converged when Total Current Utility < Total Max Utility.]

Figure 2.4: Henge’s State Machine for the Cluster.
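The skeleton below sketches this state machine in Python; the class, method names, and predicates are hypothetical stand-ins for the mechanisms detailed in Sections 2.6.1-2.6.3.

```python
from enum import Enum


class ClusterState(Enum):
    NOT_CONVERGED = "not_converged"
    CONVERGED = "converged"


class HengeStateMachineSketch:
    """Illustrative skeleton of the cluster-wide state machine of Fig. 2.4."""

    def __init__(self, topologies):
        self.topologies = topologies
        self.state = ClusterState.NOT_CONVERGED

    def step(self):
        if self.state == ClusterState.CONVERGED:
            # Per Fig. 2.4, a drop of total current utility below total max utility
            # (e.g., new jobs or input-rate changes) sends the cluster back.
            if self.total_utility() < self.total_max_utility():
                self.state = ClusterState.NOT_CONVERGED
            return
        if self.all_slos_met() or self.no_action_can_help():
            self.state = ClusterState.CONVERGED
        else:
            self.reconfigure_or_reduce_or_revert()

    # The predicates and actions below stand in for Sections 2.6.1-2.6.3.
    def total_utility(self): ...
    def total_max_utility(self): ...
    def all_slos_met(self): ...
    def no_action_can_help(self): ...
    def reconfigure_or_reduce_or_revert(self): ...
```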

2.6.1 Reconfiguration

In the Not Converged state, a Reconfiguration gives resources to topologies missing their SLO. Reconfigurations occur in rounds which are periodic intervals (currently 10 s apart). In each round, Henge first sorts all topologies missing their SLOs, in descending order of their maximum utility, with ties broken by preferring lower current utility. It then picks the head of this sorted queue to allocate resources to. This greedy strategy works best to maximize cluster utility. Within this selected topology, the intuition is to increase each congested operator’s resources by an amount proportional to its respective congestion. Henge uses the capacity metric (Section 2.5.2, eq. 2.3) to discover all congested operators in this chosen topology, i.e., operator capacity > 0.3. It allocates each congested operator an extra number of threads based on the following equation:

\left(\frac{\text{Current Operator Capacity}}{\text{Capacity Threshold}} - 1\right) \times 10 \quad (2.4)

Henge deploys this configuration change to a single topology on the cluster, and waits for the measured utilities to quiesce (this typically takes a minute or so in our configurations). No further actions are taken in the interim. It then measures the total cluster utility again, and if it improved, Henge continues its operations in further rounds, in the Not Converged State. If this total utility reaches the maximum value (the sum of maximum utilities of all topologies), then Henge cautiously continues monitoring the recently configured topologies for a while (4 subsequent rounds in our setting). If they all stabilize, Henge moves the cluster to the Converged state. A topology may improve only marginally after being given more resources in a reconfiguration, e.g., utility increases < 5%. In such a case, Henge retains the reconfiguration change but skips this particular topology in the near future rounds. This is because the topology may have plateaued in terms of marginal benefit from getting more threads. Since the cluster is dynamic, this black-listing of a topology is not permanent but is allowed to expire after a while (1 hour in our settings), after which the topology will again be a candidate for reconfiguration. As reconfigurations are exploratory steps in the state space search, total system utility may decrease after a step. Henge employs two actions called Reduction and Reversion to handle such cases.
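A compact sketch of one reconfiguration round is shown below, combining the topology-selection policy with Equation 2.4; the dictionary layout and the rounding of Equation 2.4's result are assumptions for illustration.

```python
import math

CAPACITY_THRESHOLD = 0.3


def extra_threads(operator_capacity):
    """Eq. 2.4: threads granted in proportion to how far capacity exceeds the
    threshold. Rounding up is assumed here."""
    return math.ceil((operator_capacity / CAPACITY_THRESHOLD - 1) * 10)


def pick_topology_to_reconfigure(topologies):
    """Among SLO-missing topologies, prefer the highest max utility,
    breaking ties by lower current utility."""
    missing = [t for t in topologies if t["current_utility"] < t["max_utility"]]
    missing.sort(key=lambda t: (-t["max_utility"], t["current_utility"]))
    return missing[0] if missing else None


def reconfigure(topology):
    """Give every congested operator extra executors (threads)."""
    for op, capacity in topology["operator_capacities"].items():
        if capacity > CAPACITY_THRESHOLD:
            topology["parallelism"][op] += extra_threads(capacity)


job = {
    "current_utility": 12, "max_utility": 20,
    "operator_capacities": {"spout": 0.2, "boltA": 0.9},
    "parallelism": {"spout": 2, "boltA": 4},
}
chosen = pick_topology_to_reconfigure([job])
reconfigure(chosen)
print(chosen["parallelism"])  # {'spout': 2, 'boltA': 24}
```

Running the snippet grows boltA's parallelism from 4 to 24 executors: with a capacity of 0.9 and a threshold of 0.3, Equation 2.4 grants (0.9/0.3 - 1) × 10 = 20 extra threads.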

2.6.2 Reduction

If a Reconfiguration causes total system utility to drop, the next action is either a Reduction or a Reversion. Henge performs Reduction if and only if three conditions are true: (a) the cluster is

congested (we detail below what this means), (b) there is at least one SLO-satisfying topology, and (c) there is no past history of a Reduction action. First, CPU load is defined as the number of processes that are running or runnable on a machine [10]. A machine's load should be ≤ number of available cores, ensuring maximum utilization and no over-subscription. As a result, Henge considers a machine congested if its CPU load exceeds its number of cores. Henge considers a cluster congested when it has a majority of its machines congested. If a Reconfiguration drops utility and results in a congested cluster, Henge executes Reduction to reduce congestion. For all topologies meeting their SLOs, it finds all their un-congested operators (except spouts) and reduces their parallelism level by a large amount (80% in our settings). If this results in SLO misses, such topologies will be considered in future reconfiguration rounds. To minimize intrusion, Henge limits Reduction to once per topology; this is reset if external factors change (input rate, set of jobs, etc.). Akin to backoff mechanisms [92], massive reduction is the only way to free up a lot of resources at once, so that future reconfigurations may have a positive effect. Reducing threads also decreases their context switching overhead. Right after a reduction, if the next reconfiguration drops cluster utility again while keeping the cluster congested (measured using CPU load), Henge recognizes that performing another reduction would be futile. This is a typical “lockout” case, and Henge resolves it by performing Reversion.
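The sketch below illustrates the Reduction checks and action (machine load vs. cores, a majority-congested cluster, and an 80% cut to un-congested, non-spout operators of SLO-satisfying topologies); the data structures are hypothetical, not Henge's.

```python
REDUCTION_FRACTION = 0.8   # parallelism of un-congested operators is cut by 80%
CAPACITY_THRESHOLD = 0.3


def machine_congested(cpu_load, num_cores):
    """A machine is congested if its CPU load exceeds its core count."""
    return cpu_load > num_cores


def cluster_congested(machines):
    """The cluster is congested when a majority of its machines are congested."""
    congested = sum(machine_congested(m["load"], m["cores"]) for m in machines)
    return congested > len(machines) / 2


def reduce_topology(topology):
    """Shrink un-congested, non-spout operators of an SLO-satisfying topology."""
    for op, capacity in topology["operator_capacities"].items():
        if op in topology["spouts"] or capacity > CAPACITY_THRESHOLD:
            continue
        old = topology["parallelism"][op]
        topology["parallelism"][op] = max(1, round(old * (1 - REDUCTION_FRACTION)))


machines = [{"load": 10.0, "cores": 8}, {"load": 9.5, "cores": 8}, {"load": 3.0, "cores": 8}]
print(cluster_congested(machines))  # True (2 of 3 machines are over-loaded)
```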

2.6.3 Reversion

If a Reconfiguration drops utility and a Reduction is not possible (meaning that at least one of the conditions (a)-(c) in Section 2.6.2 is not true), Henge performs Reversion. Henge sorts through its history of Reconfigurations and picks the one that maximized system utility. It moves the system back to this past configuration by resetting the resource allocations of all jobs to values in this past configuration and moves to the Converged state. Here, Henge essentially concludes that it is impossible to further optimize cluster utility, given this workload. Henge maintains this configuration until changes like further SLO violations occur, which necessitate reconfigurations. If a large enough drop (> 5%) in utility occurs in this Converged state (e.g., due to new jobs, or input rate changes), Henge infers that as reconfigurations cannot be a cause of this drop, the workload of topologies must have changed. As all past actions no longer apply to this change in behavior, Henge forgets all history of past actions and moves to the Not Converged state. This means that in future reversions, forgotten states will not be available. This reset allows Henge to start its state space search afresh.
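A minimal sketch of Reversion, assuming the history is kept as (configuration, total utility) pairs recorded after each reconfiguration:

```python
def revert_to_best(history):
    """history: list of (configuration, total_utility) pairs recorded after
    each reconfiguration. Returns the configuration that maximized utility."""
    best_config, _ = max(history, key=lambda entry: entry[1])
    return best_config


history = [({"jobA": 10}, 22.0), ({"jobA": 14}, 27.5), ({"jobA": 20}, 25.0)]
print(revert_to_best(history))  # {'jobA': 14}
# On external workload changes, past actions no longer apply, so the
# history would be forgotten (e.g., history.clear()) before searching afresh.
```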

17 2.6.4 Discussion

Online vs. Offline State Space Search: Henge prefers an online state space search. In fact, our early attempt at designing Henge was to perform offline state space exploration (e.g., through simulated annealing), by measuring SLO metrics (latency, throughput) and using analytical models to predict their relation to resources allocated to the job.

Figure 2.5: Unpredictability in Modern Stream Processing Engines: Two runs of the same topology (on 10 machines) being given the same extra computational resources (28 threads, i.e., executors) at 910 s, react differently.

The offline approach turned out to be impractical. Analysis and prediction is complex and often turns out to be inaccurate for stream processing systems, which are very dynamic in nature. (This phenomenon has also been observed in other distributed scheduling domains, e.g., see [59,101,133].) We show an example in Fig. 2.5. The figure shows two runs of the same Storm job on 10 machines. In both runs we gave the job equal additional thread resources (28 threads) at t=910 s. Latency drops to a lower value in run 2, but only stabilizes in run 1. This is due to differing CPU resource consumptions across the runs. More generally, we find that natural fluctuations occur commonly in an application’s throughput and latency; left to itself an application’s performance changes and degrades gradually over time. We observed this for all our actions: reconfiguration, reduction, and reversion. Thus, we concluded that online state space exploration would be more practical.

Statefulness, Memory Bottlenecks: The common case among topologies is stateless operators that are CPU-bound, and our exposition so far is thus focused. Nevertheless, Henge gracefully handles stateful operators and memory-pressured nodes (evaluated in Sections 2.9.3, 2.9.5).

Dynamically Added Jobs: Henge accepts new jobs deployed on the multi-tenant cluster as long as existing jobs are able to satisfy their SLOs. However, if the set of jobs the cluster starts with do not satisfy their SLOs, Henge employs admission control. A possible future direction may be that if existing jobs on the cluster do not all satisfy their SLOs, Henge selects the maximum subset of

18 jobs from the existing and newly added jobs that can be deployed on the cluster while providing maximum utility.

2.7 JUICE: DEFINITION AND ALGORITHM

This section describes the motivation for Juice and how it is calculated. As described in Section 2.1, a throughput metric (for use in throughput SLOs) should be designed in a way that is independent of input rate. Henge uses a new metric called juice. Juice defines what fraction of the input data is being processed by the topology per unit time. It lies in the interval [0, 1], and a value of 1.0 means all the input data that arrived in the last time unit has been processed. Thus, the user can set throughput requirements as a percentage of the input rate (Section 2.5.1), and Henge subsequently attempts to maintain this even as input rates change. Any algorithm that calculates juice should satisfy three requirements:

1. Code Independence: It should be independent of the operators' code, and should be calculable by only considering the number of tuples generated by operators.

2. Rate Independence: It should abstract away throughput SLO requirements in a way that is independent of absolute input rate.

3. Topology Independence: It should be independent of the shape and structure of the topology.

Juice Intuition: Overall, juice is formulated to reflect the global processing efficiency of a topology. We define per-operator contribution to juice as the proportion of input passed in originally from the source that the operator processed in a given time window. This reflects the impact of that operator and its upstream operators, on this input. The juice of a topology is then the normalized sum of juice values of all its sinks.

Juice Calculation: Henge calculates juice in configurable windows of time (unit time). We define source input as the tuples that arrive at the input operator in a unit of time. For each operator o in a topology that has n parents, we define $T_o^i$ as the total number of tuples sent out from its $i$-th parent per time unit, and $E_o^i$ as the number of tuples that operator o executed (per time unit), from those received from parent i. The per-operator contribution to juice, $J_o^s$, is the proportion of source input sent from source s that operator o received and processed. Given that $J_i^s$ is the juice of o's $i$-th parent, then $J_o^s$ is:

$$J_o^s = \sum_{i=1}^{n} J_i^s \times \frac{E_o^i}{T_o^i} \qquad (2.5)$$

A spout $s$ has no parents, and its juice is $J_s = \frac{E_s}{T_s} = 1.0$. In Eq. 2.5, the fraction $\frac{E_o^i}{T_o^i}$ reflects the proportion of tuples an operator received from its parents and processed successfully. If no tuples are waiting in queues, this fraction is equal to 1.0. By multiplying this value with the parent's juice, we accumulate through the topology the effect of all upstream operators. We make two important observations. In the term $\frac{E_o^i}{T_o^i}$, it is critical to take the denominator as the number of tuples sent by a parent rather than received at the operator. This allows juice: a) to account for data splitting at the parent (fork in the DAG), and b) to be reduced by tuples dropped by the network. The numerator is the number of processed tuples rather than the number of output tuples – this allows juice to generalize to operator types whose processing may drop tuples (e.g., filter). Given all operator juice values, a topology's juice can be calculated by normalizing w.r.t. the number of sources:

$$\text{Topology Juice} = \frac{\sum_{s_i \in \text{Sinks},\ s_j \in \text{Sources}} J_{s_i}^{s_j}}{\text{Total Number of Sources}} \qquad (2.6)$$

If no tuples are lost in the system, the numerator sum is equal to the number of sources. To ensure that juice stays below 1.0, we normalize the sum with the number of sources.

Example 2.1: Consider Fig. 2.1 in Section 2.4. $J_A^s = 1 \times \frac{10K}{10K} = 1$ and $J_B^s = J_A^s \times \frac{8K}{16K} = 0.5$. B has a $T_B^A$ of 16K and not 8K, since B only receives half the tuples that were sent out by operator A, and its per-operator juice should be in the context of only this half (and not all source input). The value $J_B^s = 0.5$ indicates that B processed only half the tuples sent out by parent A. This occurred as the parent's output was split among children. (If, alternately, B and C were sinks (if D were absent from the topology), then their juice values would sum up to the topology's juice.) D has two parents: B and C. C is only able to process 6K tuples as it is congested. Thus, $J_C^s = J_A^s \times \frac{6K}{16K} = 0.375$. $T_D^C$ thus becomes 6K. Hence, $J_D^C = 0.375 \times \frac{6K}{6K} = 0.375$. $J_D^B$ is simply $0.5 \times \frac{8K}{8K} = 0.5$. We sum the two and obtain $J_D^s = 0.375 + 0.5 = 0.875$. It is less than 1.0 because C was unable to process all tuples due to congestion.

Example 2.2 (Topology Juice with Split and Merge): In Fig. 2.6, we show how our approach generalizes to: a) multiple sources (spout 1 & 2), b) operators splitting output (E to B and F), and c) operators with multiple input streams (A and E to B).


Figure 2.6: Juice Calculation in a Split and Merge Topology.

Bolt A has a juice value of 0.5, as it can only process half the tuples spout 1 sent it. Bolt D has a juice value of 1.0. 50% of the tuples from D to E are unprocessed due to congestion at E. E passes its tuples on to B and F: both of them get half of the total tuples it sends out. Therefore, B has juice of 0.25 from E and 0.5 from A (0.25 + 0.5 = 0.75). 20% of the tuples E sent F are unprocessed at F as it is congested, so F has a juice value of 0.25 × 0.8 = 0.2. C processes as many tuples as B sent it, so it has the same juice as B (0.75). The juice of the topology is the sum of the juices of the two sinks, normalized by the number of sources. Thus, the topology's juice is $\frac{0.2 + 0.75}{2} = 0.475$.
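To make the calculation concrete, the sketch below walks the operators in topological order, applies Eq. 2.5 at each one, and then Eq. 2.6 at the sinks. It is an illustration only, not Henge's implementation: the Edge class of per-window counts, the Map-based bookkeeping, and the method names are all assumptions made for this example.

```java
import java.util.*;

// Illustrative juice computation over a topology DAG (a sketch, not Henge's code).
class JuiceSketch {
    /** Per-window counts for one parent->operator edge: tuples the parent sent out in
     *  total (T), and tuples this operator executed from that parent (E). */
    static class Edge {
        final long sent, executed;
        Edge(long sent, long executed) { this.sent = sent; this.executed = executed; }
    }

    /** topoOrder lists operators so that every parent precedes its children;
     *  parents.get(op) maps each parent's name to the Edge counts observed this window. */
    static double topologyJuice(List<String> topoOrder, Set<String> spouts, Set<String> sinks,
                                Map<String, Map<String, Edge>> parents) {
        Map<String, Double> juice = new HashMap<>();
        for (String op : topoOrder) {
            if (spouts.contains(op)) { juice.put(op, 1.0); continue; }  // spout juice is 1.0
            double j = 0.0;
            for (Map.Entry<String, Edge> e : parents.get(op).entrySet()) {
                Edge edge = e.getValue();
                // Eq. 2.5: parent's juice scaled by the fraction of the parent's output we executed.
                j += juice.get(e.getKey()) * ((double) edge.executed / edge.sent);
            }
            juice.put(op, j);
        }
        double sinkSum = 0.0;
        for (String sink : sinks) sinkSum += juice.get(sink);
        return sinkSum / spouts.size();   // Eq. 2.6: normalize by the number of sources
    }
}
```

Fed with the per-edge counts read off Fig. 2.6, this reproduces the topology juice of 0.475 computed in Example 2.2.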

Some Observations: First, while our description used unit time, our implementation calculates juice using a sliding window of 1 minute, collecting data in sub-windows of length 10 s. This needs only loose time synchronization across nodes (which may cause juice values to momentarily exceed 1, but does not affect our logic). Second, eq. 2.6 treats all processed tuples equally–instead, a weighted sum could be used to capture the higher importance of some sinks (e.g., sinks feeding into a dashboard). Third, processing guarantees (exactly, at least, at most once) are orthogonal to the juice metric. Our experiments use the non-acked version of Storm (at most once semantics), but Henge also works with the acked version of Storm (at least once semantics). At least once semantics entails that if any tuples fail (i.e., they have to be reprocessed), we should proportionally reduce juice to only reflect the amount of tuples acked. We do so in the following manner:

$$J_{Final} = J_o^s \times \frac{\text{Total No. of Tuples Acked}}{\text{Total No. of Tuples Sent by All Spouts}} \qquad (2.7)$$

This allows the juice metric to reflect only the tuples that have been processed and provide value to the final result.

2.8 IMPLEMENTATION

This section describes how Henge is built and integrated into Apache Storm [7]. Henge involves 3800 lines of Java code, and is implemented as an instance of Storm's predefined IScheduler interface. The scheduler runs as part of the Storm Nimbus daemon, and is invoked by Nimbus every 10 seconds. The developer can specify which scheduler to use in a configuration file that is provided to Nimbus. Further changes were made to Storm's Config, allowing users to set topology SLOs and utility functions while writing topologies.
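For readers unfamiliar with Storm's pluggable scheduling, the skeleton below sketches how a scheduler of this kind hooks into Nimbus via the IScheduler interface (shown against the Storm 1.x-style API; newer releases may declare additional methods). The class name, comments, and the configuration key mentioned inside are placeholders, not Henge's actual code.

```java
import java.util.Map;
import org.apache.storm.scheduler.Cluster;
import org.apache.storm.scheduler.IScheduler;
import org.apache.storm.scheduler.Topologies;
import org.apache.storm.scheduler.TopologyDetails;

// Skeleton of a custom scheduler plugged into Nimbus (Storm 1.x-style API).
// The Henge-specific logic is elided; only the integration points are shown.
public class HengeLikeScheduler implements IScheduler {

    @Override
    public void prepare(Map conf) {
        // Read SLO thresholds and utility settings that users attach to their
        // topology configs (hypothetical key, e.g. "topology.slo.latency.ms").
    }

    @Override
    public void schedule(Topologies topologies, Cluster cluster) {
        // Nimbus invokes this periodically (every 10 s in Henge's deployment).
        for (TopologyDetails topology : cluster.needsSchedulingTopologies(topologies)) {
            // 1. Pull fresh metrics (juice, latency, CPU load) for this topology.
            // 2. Run the state machine to pick an action: reconfiguration,
            //    reduction, or reversion.
            // 3. Translate the chosen action into executor-to-slot assignments
            //    on the Cluster object, or leave the current assignment untouched.
        }
    }
}
```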


Figure 2.7: Henge Implementation: Architecture in Apache Storm.

Henge’s architecture is shown in Fig. 2.7. The Decision Maker implements the Henge state machine of Section 2.6. The Statistics Module continuously calculates cluster and per-topology metrics such as the number of tuples processed by each task of an operator per topology, the end-to-end latency of tuples, and the CPU load per node. This information is used to produce useful metrics such as juice and utility, which are passed to the Decision Maker. The Decision Maker runs the state machine, and sends commands to Nimbus to implement actions. The Statistics Module also tracks historical performance and configuration of topologies whenever a reconfiguration is performed by the Decision Maker, so that reversion can be performed.
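The historical tracking that enables reversion can be pictured as a small per-topology history of (configuration, achieved utility) pairs; the sketch below is purely illustrative and does not reflect Henge's internal data structures.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

// Illustrative bookkeeping for reversion (not Henge's actual classes): whenever the
// Decision Maker reconfigures a topology, the Statistics Module records the configuration
// it replaced together with the utility that configuration had achieved. Reversion then
// restores the recorded configuration with the highest utility.
class ReversionHistory {
    record Snapshot(Map<String, Integer> parallelismPerOperator, double achievedUtility) {}

    private final Deque<Snapshot> history = new ArrayDeque<>();

    void record(Map<String, Integer> parallelismPerOperator, double achievedUtility) {
        history.push(new Snapshot(Map.copyOf(parallelismPerOperator), achievedUtility));
    }

    /** Configuration with the highest recorded utility, or null if nothing was recorded. */
    Snapshot bestPastConfiguration() {
        return history.stream()
                .max((a, b) -> Double.compare(a.achievedUtility(), b.achievedUtility()))
                .orElse(null);
    }
}
```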

2.9 EVALUATION

This section presents the evaluation of Henge with a variety of workloads, topologies, and SLOs. We answer the following experimental questions:

1. Is juice truly independent of input rate, topology structure and operations? i.e., Is juice a good abstraction for throughput?

2. How well does Henge perform as compared to finely-tuned manual configurations and vanilla Storm?

3. Is Henge able to maximize cluster utility for a variety of workloads e.g., diurnal workloads, spikes in input rate and production workloads?

4. Is Henge scalable and fault-tolerant?

Experimental Setup: By default, our experiments used the Emulab cluster [160], with machines (2.4 GHz, 12 GB RAM) running Ubuntu 12.04 LTS, connected via a 1 Gbps connection. Another machine runs Zookeeper [8] and Nimbus. Workers (Java processes running executors) are allotted to each of our 10 machines (we evaluate scalability later).


Figure 2.8: Three Microbenchmark Topologies.


Figure 2.9: PageLoad Topology from Yahoo!.

Topologies: For evaluation, we use both: a) micro-topologies that are possible sub-parts of larger topologies [167], shown in Fig. 2.8; and b) a production topology from Yahoo! Inc. [167]–this topology is called “PageLoad" (Fig. 2.9). Operators are the ones that are most commonly used in production: filtering, transformation, and aggregation. In each experimental run, we initially

allow topologies to run for 900 s without interference (to stabilize and to observe their performance with vanilla Storm), and then enable Henge to take actions. All topology SLOs use the knee utility function of Section 2.5.1. Hence, below we use “SLO” as a shorthand for the SLO threshold.

2.9.1 Juice as a Performance Indicator

Juice is an indicator of queue size: Fig. 2.10 shows the inverse correlation between topology juice and queue size at the most congested operator of a PageLoad topology. Queues buffer incoming data for operator executors, and longer queues imply slower execution rate and higher latencies. Initially queue lengths are high and erratic–juice captures this by staying well below 1. At the reconfiguration point (910 s) the operator is given more executors, and juice converges to 1 as queue lengths fall, stabilizing by 1000 s.


Figure 2.10: Juice vs. Queue Size: Inverse Relationship.

Juice is independent of operations and input rate: In Fig. 2.11, we run 5 PageLoad topologies on one cluster, and show data for one of them. Initially juice stabilizes to around 1.0, near t=1000 s (values above 1 are due to synchronization errors, but they don’t affect our logic). PageLoad filters tuples, thus output rate is < input rate–however, juice is 1.0 as it shows that all input tuples are being processed. Then at 4000 s, we triple the input rate to all tenant topologies. Notice that juice stays 1.0. Due to natural fluctuations, at 4338 s, PageLoad’s juice drops to 0.992. This triggers reconfigurations (vertical lines) from Henge, stabilizing the system by 5734 s, maximizing cluster utility.

2.9.2 Henge Policy and Scheduling

Impact of Initial Configuration:


Figure 2.11: Juice is Rate-Independent: Input rate is increased by 3 × at 4000 s, but juice does not change. When juice falls to 0.992 at 4338 s, Henge stabilizes it to 1.0 by 5734 s.


Figure 2.12: Performance vs. Resources in Apache Storm: The x-axis shows initial parallelism of one intermediate operator in a linear topology. Left y-axis shows initial capacity of the operator. Right y-axis shows stable utility reached without using Henge.

State Space: Fig. 2.12 illustrates the state space that Henge needs to navigate. These are runs without involving Henge. We vary the initial number of executors for an intermediate operator. Fewer initial executors (5, 10) lead to a high capacity (indicating congestion: Section 2.5.2) and consequently the topology is unable to achieve its SLO. From the plot, the more stringent the SLO, the greater the number of executors needed to reach max utility. Except for very stringent SLOs (40, 50 ms), all other jobs can meet their SLO.

Henge In Action: Now, we put Henge into action on Fig. 2.12’s topology and initial state, with max utility 35. Fig. 2.13 shows the effect of varying: a) initial number of executors (5 to 25), b) latency SLO (40 ms to 60 ms), and c) input rate. We plot converged utility, rounds needed, and executors assigned.

We observe that generally, Henge gives more resources to topologies with more stringent SLOs and higher input rates. For instance, for a topology with a 70 ms SLO, Henge reconfigures a congested operator initially assigned 10 executors to have an average of 18 executors, all in a single round. On the other hand, for a stricter 60 ms SLO it assigns the operator 21 executors. When we double the input rate for these two topologies, the former is assigned 44 executors in two rounds, and the latter 36 executors in 5 rounds.

Henge convergence is fast. In Fig. 2.13a, convergence occurs within 2 rounds for a topology with a 60 ms SLO. Convergence time increases for stringent SLOs and higher input rates: with the 2× input rate, convergence takes 12 rounds for the stringent 50 ms SLO, vs. 7 rounds for 60 ms.

Figure 2.13: Effect of Henge on Figure 2.12's topology and initial configurations: latency SLOs become more stringent from bottom to top (70 ms, 60 ms, 50 ms, 40 ms), and we also explore a 2× higher input rate. a) Left y-axis shows the final parallelism level Henge assigned to each operator; right y-axis shows the number of rounds required to reach said parallelism level. b) Utility (%) achieved before and after Henge.

Henge always reaches max utility (Fig. 2.13b) unless the SLO is unachievable (40, 50 ms SLOs). Since Henge aims to be minimally invasive, we do not explore operator migration (but it could be used orthogonally [129, 130, 136]). With an SLO of 40 ms, Henge actually performs fewer reconfigurations and allocates fewer resources than with a laxer SLO of 50 ms. This is because the 40 ms topology gets black-listed earlier than the 50 ms topology (Section 2.6.3: recall this occurs if utility improves < 5% in a round). Overall, by black-listing topologies with overly stringent SLOs and satisfying other topologies, Henge meets its goal of preventing resource hogging (Section 2.3).


Figure 2.14: Maximizing Cluster Utility: Red (dotted) line is total system utility. Blue (solid) line is magnified slope of the red line. Vertical lines are reconfigurations annotated by the job touched. Henge reconfigures higher max-utility jobs first, leading to faster increase in system utility.

Meeting SLOs:

Maximizing Cluster Utility: To maximize total cluster utility, Henge greedily prefers to reconfigure those topologies first which have a higher max achievable utility (among those missing their SLOs). In Fig. 2.14, we run 9 PageLoad topologies on a cluster, with max utility values ranging from 10 to 90 in steps of 10. The SLO threshold for all topologies is 60 ms. Henge first picks T9 (highest max utility of 90), leading to a sharp increase in total cluster utility at 950 s. Thereafter, it continues in this greedy way. We observe some latent peaks when topologies reconfigured in the past stabilize to their max utility. For instance, at 1425 s we observe a sharp increase in the slope (solid) line as T4 (reconfigured at 1331 s) reaches its SLO threshold. All topologies meet their SLO within 15 minutes (900 s to 1800 s).
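The greedy ordering itself is straightforward; the sketch below, using a hypothetical TopologyStats summary type, picks the SLO-missing topology with the highest maximum achievable utility as the next one to reconfigure.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical per-topology summary used only for this illustration.
record TopologyStats(String id, double maxUtility, boolean meetsSlo) {}

class GreedyPicker {
    /** Among SLO-missing topologies, reconfigure the one with the highest max achievable utility first. */
    static Optional<TopologyStats> nextToReconfigure(List<TopologyStats> topologies) {
        return topologies.stream()
                .filter(t -> !t.meetsSlo())
                .max(Comparator.comparingDouble(TopologyStats::maxUtility));
    }
}
```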

Hybrid SLOs: We evaluate a hybrid SLO that has separate thresholds for latency and juice, and two corresponding utility functions (Section 2.5.1) with identical max utility values. The job's utility is then the average of these two utilities. Fig. 2.15 shows 10 (identical) PageLoad topologies with hybrid SLOs running on a cluster of 10 machines. Each topology has SLO thresholds of: juice 1.0, and latency 70 ms.


Figure 2.15: Hybrid SLOs: Henge Reconfiguration.

The max utility value of each topology is 35. Henge takes only about 13 minutes (t=920 s to t=1710 s) to reconfigure all topologies successfully to meet their SLOs. 9 out of 10 topologies required a single reconfiguration, and one (T9) required 3 reconfigurations.
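A hybrid utility is simply the mean of the latency utility and the juice utility. The sketch below assumes a generic knee-shaped utility (full utility once the threshold is met, proportionally less otherwise); the exact functional form Henge uses is defined in Section 2.5.1, so the two helper functions should be read as illustrative stand-ins.

```java
// Illustrative knee-shaped utilities (the exact form is defined in Section 2.5.1).
class HybridUtility {
    // Latency SLO: full utility when achieved latency is at or below the threshold.
    static double latencyUtility(double maxUtility, double sloMs, double achievedMs) {
        return maxUtility * Math.min(1.0, sloMs / achievedMs);
    }

    // Juice SLO: full utility when achieved juice is at or above the threshold.
    static double juiceUtility(double maxUtility, double sloJuice, double achievedJuice) {
        return maxUtility * Math.min(1.0, achievedJuice / sloJuice);
    }

    // Hybrid SLO: average of the two utilities, each with the same max utility value.
    static double hybridUtility(double maxUtility,
                                double latencySloMs, double achievedMs,
                                double juiceSlo, double achievedJuice) {
        return (latencyUtility(maxUtility, latencySloMs, achievedMs)
                + juiceUtility(maxUtility, juiceSlo, achievedJuice)) / 2.0;
    }
}
```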

Tail Latencies: Henge can also admit SLOs expressed as tail latencies (e.g., 95th percentile, or 99th percentile). Utility functions are then expressed in terms of tail latency and the state machine remains unchanged. Fig. 2.16 depicts a case where five PageLoad topologies are unable to meet their SLO. All are latency-sensitive with utilities of 35, and a threshold of 80 ms. Henge performs one reconfiguration for each topology, allowing all topologies to meet their SLO. The figure also shows the 99th percentile tail latency values for the scenario. As every reconfiguration removes congestion from operators, we can see that the tail latencies improve as well.

Handling Dynamic Workloads:

A. Spikes in Workload: Fig. 2.17 shows the effect of a workload spike in Henge. Two different PageLoad topologies A and B are subjected to input spikes. B's workload spikes by 2×, starting from 3600 s. The spike lasts until 7200 s, when A's spike (also 2×) begins. Each topology's SLO is 80 ms and its max utility is 35. Fig. 2.17 shows that: i) output rates keep up for both topologies both during and after the spikes, and ii) the utility stays maxed-out during the spikes. In effect, Henge completely hides the effect of the input rate spike from the user.

B. Diurnal Workloads: Diurnal workloads are common for stream processing in production, e.g., in e-commerce websites [70] and social media [128]. We generated a diurnal workload based on the distribution of the SDSC-HTTP [36] and EPA-HTTP traces [13], injecting them into PageLoad topologies. 5 topologies are run with the SDSC-HTTP trace and concurrently, 5 other topologies are run with

(a) The latencies of the topologies in ms as reconfiguration occurs. (b) The latencies of the topologies at the 99th percentile. (c) Individual utilities of the topologies.

Figure 2.16: Tail Latencies: 5 PageLoad topologies run on a cluster.

the EPA-HTTP trace. All 10 topologies have max-utility=35 (max achievable cluster utility=350), and a latency SLO of 60 ms. Fig. 2.18 shows the results of running 48 hours of the trace (each hour mapped to a 10 min interval). In Fig. 2.18a, workloads increase from hour 7 of the day, reach their peak by hour 13 1/3, and then start to fall. Within the first half of Day 1, Henge successfully reconfigures all 10 topologies, reaching by hour 15 a cluster utility that is 89% of the max.

Figure 2.17: Spikes in Workload: Left y-axis shows total cluster utility (max possible is 35 × 2 = 70). Right y-axis shows the variation in workload as time progresses.

Figure 2.18: Diurnal Workload: a) Input and output rates over time, for two different diurnal workloads. b) Utility of a topology (reconfigured by Henge at runtime) with the EPA workload. c) Utility of a topology (reconfigured by Henge at runtime) with the SDSC workload. d) CDF of SLO satisfaction for Henge, default Storm, and manually configured topologies. Henge adapts during the first cycle and fewer reconfigurations are required thereafter.

Fig. 2.18b shows a topology running the EPA workload and Fig. 2.18c a topology running the SDSC workload. Observe how Henge reconfigurations from hour 8 to 16 adapt to the fast-changing workload. This also results in fewer SLO violations during the second peak (hours 32 to 40). Thus, even without adding resources, Henge tackles diurnal workloads. Fig. 2.18d shows the CDF of SLO satisfaction for the three systems. Default Storm performs poorly, giving 0.0006% SLO satisfaction at the median, and 30.9% at the 90th percentile. (This means that 90% of the time, default Storm provided a total of at most 30.9% of the cluster's max achievable utility.) Henge yields 74.9%, 99.1%, and 100% SLO satisfaction at the 15th, 50th, and 90th percentiles respectively. Henge is also better than manual configurations. We manually configured all topologies to meet their SLOs at median load. These provide 66.5%, 99.8% and 100% SLO satisfaction at the 15th, 50th and 90th percentiles respectively. Henge is better than manual configurations from the 15th to 45th percentile, and comparable from then onwards. Henge has an average of 88.12% SLO satisfaction, whereas default Storm and manually configured topologies provide averages of 4.56% and 87.77% respectively. Thus, Henge provides 19.3× better SLO satisfaction than default Storm, and performs better than manual configuration.

Production Workloads: We configured the sizes of 5 PageLoad topologies based on data from a Yahoo! Storm production cluster and Twitter datasets [51], shown in Table 2.3. We use 20 nodes each with 14 worker processes. For each topology, we inject an input rate proportional to its number of workers. T1-T4 run sentiment analysis on Twitter workloads from 2011 [51]. T5 processes logs at a constant rate. Each topology has a latency SLO threshold of 60 ms and max utility value of 35.

Job   Workload                       Workers   Tasks
T1    Analysis (Egypt Unrest)        234       1729
T2    Analysis (London Riots)        31        459
T3    Analysis (Tsunami in Japan)    8         100
T4    Analysis (Hurricane Irene)     2         34
T5    Processing Topology            1         18

Table 2.3: Job and Workload Distributions in Experiments: Derived from Yahoo! production clusters, using Twitter Datasets for T1-T4. (Experiments in Figure 2.19.)

This is an extremely constrained cluster where not all SLOs can be met. Yet, Henge maximizes cluster utility. Fig. 2.19a shows the CDF of the fraction of time each topology provided a given utility (including the initial 900 s where Henge is held back). T5 shows the most improvement (at the 5th percentile, it has 100% SLO satisfaction), whereas T4 shows the worst performance (at the

(a) CDF of the fraction of total time tenant topologies provide a given SLO satisfaction. Max utility for each topology is 35.


(b) Reconfiguration at 32111 s causes a drop in total system utility. Henge reverts the configuration of all tenants to that of 32042 s. Vertical lines show Henge actions for particular jobs.

Figure 2.19: Henge on Production Workloads.

median, its utility is 24, which is 68.57% of 35). The median SLO satisfaction for T1-T3 ranges from 27.0 to 32.3 (77.3% and 92.2% respectively).

Reversion: Fig. 2.19b depicts Henge's reversion. At 31710 s, the system utility drops due to natural system fluctuations. This forces Henge to perform reconfigurations for two topologies (T1, T4). Since system utility continues to drop, Henge is forced to reduce a topology (T5, which satisfies its SLO before and after reduction). As utility improves at 32042 s, Henge proceeds to reconfigure other topologies. However, the last reconfiguration causes another drop in utility (at 32150 s). Henge reverts to the configuration that had the highest utility (at 32090 s). After this point, total cluster utility stabilizes at 120 (68.6% of max utility). Thus, even under scenarios where Henge is unable to reach the max system utility, it behaves gracefully, does not thrash, and converges quickly.

Reacting to Natural Fluctuations: Natural fluctuations occur in the cluster due to load variation that arises from interfering processes, disk IO, page swaps, etc. Fig. 2.20 shows such a scenario. We run 8 PageLoad topologies, 7 of which have an SLO of 70 ms, while the 8th has an SLO of 10 ms. Henge resolves congestion initially and stabilizes the cluster by 1000 s. At 21800 s, CPU load increases sharply due to OS behaviors (beyond Henge's control). Seeing the significant drop in cluster utility, Henge reduces two of the topologies (from among those meeting their SLOs). This allows other topologies to recover within 20 minutes (by 23000 s). Henge converges the system to the same total utility as before the CPU spike.


Figure 2.20: Handling CPU Load Spikes: a) Total cluster utility. b) Average CPU load on machines in the CPU spike interval.

2.9.3 Stateful Topologies

Job Type     Avg. Reconfig. Rounds (Stdev)    Avg. Convergence Time (Stdev)
Stateful     5.5 (0.58)                       1358.7355 s (58.1 s)
Stateless    4 (0.82)                         1134.2235 s (210.5 s)

Table 2.4: Stateful Topologies: Convergence Rounds and Times for a cluster with Stateful and Stateless Topologies.

Henge handles stateful topologies gracefully, alongside stateless ones. We ran four WordCount topologies with identical workload and configuration as T2 in Table 2.3. Two of these topologies periodically checkpoint state to Redis (making them stateful) and have 240 ms latency SLOs. The other two topologies do not persist state in an external store and have lower SLOs of 60 ms. Initially, none of the four meet their SLOs. Table 2.4 shows results after convergence. Stateful topologies take 1.5 extra reconfigurations to converge to their SLO, and only 19.8% more reconfiguration time. This difference is due to external state checkpointing and recovery mechanisms, orthogonal to Henge.
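For concreteness, a stateful WordCount bolt of the kind evaluated here might periodically flush its counts to Redis, as sketched below. The Jedis client, the Redis endpoint, the hash key, and the 10 s checkpoint interval are all assumptions made for illustration, not details of the evaluated topologies.

```java
import java.util.HashMap;
import java.util.Map;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import redis.clients.jedis.Jedis;

// Sketch of a stateful counting bolt that checkpoints to Redis (illustration only).
public class CheckpointingCountBolt extends BaseRichBolt {
    private transient OutputCollector collector;
    private transient Jedis redis;
    private final Map<String, Long> counts = new HashMap<>();
    private long lastCheckpointMs;

    @Override
    public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        this.redis = new Jedis("localhost", 6379);    // assumed Redis endpoint
        this.lastCheckpointMs = System.currentTimeMillis();
    }

    @Override
    public void execute(Tuple tuple) {
        String word = tuple.getStringByField("word");
        long count = counts.merge(word, 1L, Long::sum);
        collector.emit(new Values(word, count));
        collector.ack(tuple);

        long now = System.currentTimeMillis();
        if (now - lastCheckpointMs > 10_000) {        // periodic checkpoint (assumed interval)
            for (Map.Entry<String, Long> e : counts.entrySet()) {
                redis.hset("wordcount", e.getKey(), String.valueOf(e.getValue()));
            }
            lastCheckpointMs = now;
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}
```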

2.9.4 Scalability and Fault-tolerance

We vary the number of jobs and nodes, and inject failures.

Scalability:

Increasing the Number of Topologies: Fig. 2.21 stresses Henge by overloading the cluster with topologies over time. We start with a cluster of 5 PageLoad topologies, each with a latency SLO of 70 ms, and max utility value of 35. Every 2 hours, we add 20 more PageLoad topologies.


(a) Green (dotted) line is average job utility. Blue (solid) line is the number of jobs on the cluster. Vertical black lines are reconfigurations.

(b) Average number of reconfigurations that must take place when new topologies are added to the cluster.

Figure 2.21: Scalability w.r.t. No. of Topologies: Cluster has 5 tenants. 20 tenants are added every 2 hours until the 8 hour mark.

Henge stabilizes better when there are more topologies. In the first 2 hours, Fig. 2.21a shows that the average utility of the topologies is below the max, because Henge has less state space to maneuver with fewer topologies. 20 new tenant topologies at the 2 hour mark cause a large drop in average utility but also open up the state space more–Henge quickly improves system utility to the max value. At the 4 hour and 6 hour marks, more topologies arrive. Henge stabilizes to max utility in both cases. Topologies arriving at the 8 hour mark cause contention. In Fig. 2.21a, the average system utility drops not only due to the performance of the new tenants, but also because the pre-existing tenants

are hurt. Henge converges both types of topologies, requiring fewer reconfiguration rounds for the pre-existing topologies (Fig. 2.21b).

Increasing Cluster Size: In Fig. 2.22 we run 40 topologies on clusters ranging from 10 to 40 nodes. The machines have two 2.4 GHz 64-bit 8-core processors, 64 GB RAM, and a 10 Gbps network. 20 topologies are PageLoad with latency SLOs of 80 ms and max utility 35. Among the rest, 8 are diamond topologies, 6 are star topologies and 6 are linear topologies, with juice SLOs of 1.0 and max utility 5.

(a) CDF of jobs according to fraction of SLO thresholds reached. (b) CDF of convergence time. (c) No. of reconfigurations until convergence.

Figure 2.22: Scalability w.r.t. No. of Machines. 40 jobs run on cluster sizes increasing from 10 to 40 nodes.

From Fig. 2.22a, Henge is able to provide SLO satisfaction for 40% of the tenants even in an overloaded cluster of 10 nodes. As expected, in large clusters Henge has more room to maneuver

and meets more SLOs. This is because CPUs saturate more slowly in larger clusters. In an overloaded cluster of 10 nodes, topologies at the 5th percentile are able to achieve only 0.3% of their max utility. On the other hand, in clusters with 20, 30, and 40 machines, 5th percentile SLO satisfactions are higher: 56.4%, 74.0% and 94.5% respectively. Fig. 2.22b shows the time taken for topologies to converge to their highest utility. Interestingly, while the 10 node cluster has a longer tail than 20 or 30 nodes, it converges faster at the median (537.2 seconds). Topologies at the tail of both the 10 and 40 node clusters take a longer time to converge. This is because in the 10 node cluster, greater reconfiguration is required per topology as there is more resource contention (see Fig. 2.22c). At 40 nodes, collecting cluster information from Nimbus daemons leads to a bottleneck. This can be alleviated by decentralized data gathering (beyond our scope). Fig. 2.22c shows that the number of reconfigurations needed to converge is at most 2× higher when resources are limited, and does not otherwise vary with cluster size. Overall, Henge's performance generally improves with cluster size, and overheads are scale-independent.

Fault Tolerance: Henge reacts gracefully to failures. In Fig. 2.23, we run 9 topologies each with 70 ms SLO and 35 max utility. We introduce a failure at the worst possible time: in the midst of Henge’s reconfiguration operations, at 1020 s. This severs communication between Henge and all the worker nodes; and Henge’s Statistics module is unable to obtain fresh information about jobs. We observe that Henge reacts conservatively by avoiding reconfiguration in the absence of data. At 1380 s, when communication is restored, Henge collects performance data for the next 5 minutes (until 1680 s) and then proceeds with reconfigurations as usual, meeting all SLOs.


Figure 2.23: Fault-tolerance: Failure occurs at t=1020s, and recovers at t=1380 s. Henge makes no wrong decisions due to the failure, and immediately converges to the max system utility after recovery.


Figure 2.24: Memory Utilization: 8 jobs with joins and 30 s tuple retention.

2.9.5 Memory Utilization

Fig. 2.24 shows a Henge cluster with 8 memory-intensive topologies. Each topology has a max utility value of 50 and a latency SLO of 100 ms. These topologies have join operations, and tuples are retained for 30 s, creating memory pressure at some cluster nodes. As the figure shows, Henge reconfigures memory-bound topologies quickly to reach the total max utility of 400 by 2444 s, and keeps average memory usage below 36%. Critically, the memory utilization (blue dotted line) plateaus in the converged state, showing that Henge is able to handle memory-bound topologies gracefully.

2.10 RELATED WORK

This section presents relevant related work in the areas of elastic stream processing, cluster scheduling, multi-tenant resource management, and SLA/SLO satisfaction in other domains.

2.10.1 Elastic Stream Processing Systems

Traditional query-based stream processing systems such as Aurora [45] and Borealis [46] provide load-balancing [149] but not first-class multi-tenant elasticity. Modern general-purpose stream processing systems [4–7, 35, 47, 112, 127] do not natively handle adaptive elasticity. Ongoing work [12] on Spark Streaming [168] allows scaling but does not apply to resource-limited multi-tenant clusters. SEEP [63] describes a single approach for scaling out stateful operators and recovering from failures based on check-pointing, but does not discuss scale-in or resource distribution in the presence of multiple tenants. Similarly, TimeStream [139] describes resilient substitution as a single method for fault recovery and resource reconfiguration without addressing multi-tenancy. Resource-aware elasticity in stream processing [57, 64, 76, 106, 108, 135, 145] assumes infinite

resources that the tenant can scale out to. [52, 78, 93, 114, 119, 146, 163] propose resource provisioning but not multi-tenancy. Some works have focused on balancing load [129, 130], optimal operator placement [84, 105, 136] and scaling out strategies [85, 86] in stream processing systems. These approaches can be used in complement with Henge in various ways. [84–86] look at single-job elasticity, but not multi-tenancy. Themis' [107] SIC metric is similar to juice, but the two systems have different goals. Themis measures the weight of each tuple to find those that can be dropped to reduce load, while Henge uses juice to support SLOs. Dhalion [75] describes a system that provides the capabilities of self-regulation to stream processing systems, namely Heron. It only supports throughput SLOs and uses triggers such as backpressure to detect whether a single topology is failing, and then carries out dynamic resource provisioning to resolve the issue. However, backpressure is a very coarse-grained measure of SLO satisfaction – by the time backpressure is triggered, a latency SLO set by the user may already be violated. In addition, backpressure completely stops tuples flowing in the topology, meaning that resources are consumed for a topology even though it is doing no work. Henge simply uses SLO violations as a trigger for corrective measures and is able to satisfy both latency and throughput SLOs in a multi-tenant environment.

2.10.2 Multi-tenant Resource Management Systems

Resource schedulers like YARN [156] and Mesos [90] can be run under stream processing systems, and manually tuned [65]. Since the job internals are not exposed to the scheduler (jobs run in containers), it is impossible to make fine-grained decisions for stream processing jobs in an automated fashion.

2.10.3 Cluster Scheduling

Some systems propose scheduling solutions to address resource fairness and SLO achievement [71, 72, 80, 121, 141, 147]. VM-based scaling approaches [109] do not map directly and efficiently to expressive frameworks like stream processing systems. Among multi-tenant stream processing systems, Chronostream [164] achieves elasticity through migration across nodes. It does not support SLOs.

2.10.4 Video Stream Processing Systems

Henge is not built for video stream processing systems specifically, where techniques are needed to reduce the high cost of vision processing. For example, some existing schedulers for video stream

processing focus on the characteristics of the video stream to reduce the amount of resources used per job as much as possible. VideoStorm is a system that processes video analytics queries on live video streams [172]. VideoStorm's offline profiler generates a query resource-quality profile, while its online scheduler allocates resources to queries to maximize performance on quality and lag, as opposed to the commonly used fair sharing of resources in clusters. Applications use video data in varying ways. One challenge is applying deep neural networks to video data, which requires massive computational resources. Chameleon [99] is a controller that dynamically picks the best configurations for existing NN-based video analytics pipelines. Chameleon utilizes the fact that the underlying characteristics of video streams (e.g., the velocity and size) that affect the best configuration have enough temporal and spatial correlation to allow the search cost of finding the best configuration to be amortized over time and across multiple video feeds. Deployments of video streaming have also been explored in wide-area scenarios, where they face the challenge of variable WAN bandwidth. AWStream [170] is a stream processing system that achieves low latency and high accuracy in this setting through three insights: (i) it integrates application adaptation as a first-class abstraction in the stream processing model; (ii) with a combination of offline and online profiling, it automatically learns an accurate profile that models the accuracy and bandwidth trade-off; and (iii) at runtime, it adjusts the application data rate to match the available bandwidth while maximizing the achievable accuracy.

2.10.5 SLAs/SLOs in Other Areas

SLAs/SLOs have been explored in other areas. Pileus [152] is a geo-distributed storage system that supports multi-level SLA requirements dealing with latency and consistency. Tuba [55] builds on Pileus and performs reconfiguration to adapt to changing workloads. SPANStore [165] is a geo-replicated storage service that automates trading off cost vs. latency, while being consistent and fault-tolerant. E-store [148] re-distributes hot and cold data chunks across nodes in a cluster if load exceeds a threshold. Cake [159] supports latency and throughput SLOs in multi-tenant storage settings. To the best of our knowledge, Henge is the first work that provides elastic resource management for stream processing systems in a multi-tenant environment.

2.11 CONCLUSION

We presented Henge, a system for intent-driven (SLO-based) multi-tenant stream processing. Henge provides SLO satisfaction for topologies (jobs) with latency and/or throughput requirements.

To make throughput SLOs independent of input rate and topology structure, Henge uses a new relative throughput metric called juice. In a cluster, when jobs miss their SLO, Henge uses three kinds of actions (reconfiguration, reversion or reduction) to improve the sum utility achieved by all jobs throughout the cluster. Our experiments with real Yahoo! topologies and Twitter datasets have shown that in multi-tenant settings with a mix of SLOs, Henge: i) converges quickly to max system utility when resources suffice; ii) converges quickly to a high system utility when the cluster is constrained; iii) gracefully handles dynamic workloads, both abrupt (spikes, natural fluctuations) and gradual (diurnal patterns, Twitter datasets); iv) scales gracefully with cluster size and number of jobs; and v) is failure tolerant.

Chapter 3: Caladrius: A Performance Modelling Service for Distributed Stream Processing Systems

Real-time processing has become increasingly important in recent years and has led to the development of a multitude of stream processing systems. Given the varying job workloads that characterize stream processing, these systems need to be tuned and adjusted to match incoming traffic. Current scaling systems adopt a series of trials to approach a job’s expected performance due to a lack of performance modeling tools. In this chapter, we show that general traffic trends in most jobs lend themselves well to prediction. Based on this premise, we built a system called Caladrius that forecasts future traffic load of a stream processing job and predicts the job’s processing performance after scaling. Experimental results show that Caladrius is able to estimate a job’s throughput performance and CPU load under a given scaling configuration.

3.1 INTRODUCTION

Many use cases of the deluge of data that is pouring into organizations today require real-time processing. Common examples of such use cases include internal monitoring jobs that allow engineers to react to service failures before they cascade, jobs that process ad-click rates, and services that identify trending conversations in social networks. Many distributed stream processing systems (DSPSs) that provide high-throughput and low-latency processing of streaming data have been developed to cater to this rising demand. For instance, Twitter uses Apache Heron [112], LinkedIn relies on Apache Samza [6] and others use Apache Flink [62]. Usually, DSPSs run stream processing jobs (or topologies) as a directed acyclic graph (DAG) of operators that perform user-defined computation on incoming data (also called tuples). These systems are generally well-equipped to self-tune their configuration parameters and achieve self-stabilization against unexpected load variations. For example, Dhalion [75] allows DSPSs to monitor their jobs, recognize symptoms of failures and implement necessary solutions. Most commonly, Dhalion scales out job operators to stabilize their performance. In addition to Dhalion, there are many examples from the research community of attempts to create automatic scaling systems for DSPSs. These examples usually consist of schedulers whose goal is to minimize certain criteria, such as the network distance between operators that communicate large tuples or very high volumes of tuples, or to ensure that no worker nodes are overloaded by operators that require a lot of processing resources [78, 84, 85]. While the new job deployments these schedulers produce may be improvements over the original ones, none of these systems assess whether these deployments are actually capable of meeting a performance target

or service level objective (SLO). A job's "deployment" refers to its packing plan, defined as a mapping of operator instances to runnable containers (described in Section 3.2). This lack of performance prediction and evaluation is problematic: it requires the user (or an automated system) to run the new job deployment, wait for it to stabilize and for normal operation to resume, possibly wait for high traffic to arrive, and then analyze the metrics to see if the required performance has been met. Depending on the complexity of the job and the traffic profile, it may take weeks for a production job to be scaled to the correct configuration. A performance modelling system that can provide the following benefits is necessary to handle these challenges:

Faster tuning iterations during deployment Auto-scaling systems use performance metrics from physical deployments to make scaling decisions that allow jobs to meet their performance goals. Performance modelling systems that are able to evaluate a proposed deployment's performance can eliminate the need for physical job executions, thus making each iteration faster. Of course, any modelling system is subject to errors, so some re-deployment may be required; however, the frequency, and thus the length, of the tuning process can be significantly reduced.

Improved scheduler selection A modelling system would allow several different proposed job deployments to be assessed in parallel. This means that schedulers optimized for different criteria can be compared simultaneously, which helps achieve the best performance without prior knowledge of these different schedulers.

Enabling preemptive scaling A modelling system can accept predictions of future workloads (e.g. defined by tuple arrival rate) and trigger preemptive scaling if it finds that a future workload would overwhelm the current job deployment.

Caladrius¹ is a performance modelling service for DSPSs. The aim of this service is to predict potential job performance under varying traffic loads and/or deployments, thus 1) reducing the time required to tune a topology's configuration for a given incoming traffic load, which significantly shortens the loop of 'plan → deploy → stabilize → analyze' that is currently required to tune a topology; and 2) enabling preemptive scaling before disruptive workloads arrive. Caladrius can be easily extended to provide other analyses of topology layouts and performance. Caladrius provides a framework to analyze, model and predict various aspects of DSPSs (such as Apache Heron [112] and Storm [154]) and focuses on two key areas:

¹This is the Roman name for the legendary healing bird that takes sickness into itself.

• Traffic: The prediction of the incoming workload of a stream processing topology. Caladrius provides interfaces for accessing metrics databases and methods that analyze traffic entering a topology and predict future traffic levels.

• System Performance: The prediction of a topology’s performance under a given traffic load and deployment plan. This can be broken down into two scenarios:

– Under varying traffic load: The prediction of how a topology will perform under different or future traffic levels if it keeps its current deployment.

– Using a different deployment plan: The prediction of how a topology will perform under current traffic levels if its layout is changed.

In summary, this chapter makes the following contributions:

1. Motivated by challenges that users face, we introduce the notion of performance modelling for DSPSs to enable fast job tuning and preemptive scaling, and discuss its properties.

2. We present Caladrius, the first performance modelling and evaluation tool for DSPSs (Sections 3.3 and 3.4). Caladrius has a modular and extensible architecture and has been implemented on top of Apache Heron.

3. We validate the proposed models and present one use case for performance assessment and prediction for stream processing topologies (Section 3.5).

We discuss related work in Section 3.6. Finally, we draw conclusions in Section 3.7. In the next section, we start by presenting job scaling and scheduling background (Section 3.2).

3.2 BACKGROUND

This section presents a brief overview of DSPSs as well as related concepts and terminologies, particularly those belonging to Apache Heron [112], on which Caladrius is built. Terms are formatted in italics in this section, and the rest of the chapter follows the definitions given here.

3.2.1 Topology Components

A stream processing topology can be logically represented as a DAG of components or operators. A component is a logical processing unit, defined by the developer, that applies user-defined functions on a stream of input data, called tuples. The edges between components represent the dataflow between the computational units. Source components are called spouts in Heron terminology, and they pull tuples into the topology, typically from sources such as pub-sub systems like Kafka [111]. Tuples are processed by downstream components called bolts.



Figure 3.1: Sentence-Word-Count Topology a) A logical DAG of the topology as the developer writes it. b) A physical representation of the topology when it is run. c) A possible path an input might take through the topology.

We define three kinds of throughputs: 1) source throughput, the throughput that the source pub-sub system provides, waiting to be processed by the topology; 2) spout throughput, the throughput processed by spouts, which allows data to enter the topology; 3) output throughput, the sum output throughput of all sink bolts, i.e. the job's final output throughput. We use the terms throughput and traffic interchangeably in this chapter. A job can be saturated due to limited resources, when its processing throughput cannot catch up with the pub-sub system's source throughput. Then, data accumulates in the source pub-sub system waiting to be fetched. For example, we can have a topology where Kafka generates source throughput = 5 million tuples in a minute, the topology's spouts try their best to read from Kafka with spout throughput = 3 million tuples, and the whole topology generates output throughput = 10 million tuples. In this example, Kafka accumulates 5 − 3 = 2 million tuples.

3.2.2 Topology Instances

The developer specifies how many parallel instances there should be for each component: this is called the component's parallelism level. In other words, instances perform the same processing logic on different input data. All instances of the same component have the same resource configuration. The developer also specifies how tuples from each component's instances should be partitioned amongst the downstream component's instances. These partitioning methods are called groupings. The most common grouping is called shuffle grouping, where data is partitioned randomly across downstream instances. Fields grouping chooses the downstream instance based on the hash of

the data in a tuple whereas all grouping replicates the entire stream to all the downstream instances.
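The practical difference between the grouping types lies in how a sender picks the downstream instance index; the simplified sketch below illustrates this and deliberately ignores Heron's actual hashing and task-id bookkeeping.

```java
import java.util.concurrent.ThreadLocalRandom;

// Simplified illustration of how a sender might pick one of p downstream instances.
class GroupingSketch {
    // Shuffle grouping: any downstream instance with equal probability (1/p each).
    static int shuffleTarget(int p) {
        return ThreadLocalRandom.current().nextInt(p);
    }

    // Fields grouping: the instance is a deterministic function of the grouping key,
    // so all tuples with the same key land on the same downstream instance.
    static int fieldsTarget(Object key, int p) {
        return Math.floorMod(key.hashCode(), p);
    }

    // All grouping replicates the tuple, so every index 0..p-1 receives a copy.
}
```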

3.2.3 Implementation of DSPSs

We consider long-running stream processing topologies with a continuous operator model. A topology master (tmaster) is responsible for managing the topology throughout its existence and provides a single point of contact for discovering the status of the topology. Each topology is run on a set of containers using an external scheduler, e.g. Twitter uses Aurora [45] for this purpose. Each container consists of

• one or more Heron-Instances (instances),

• a metrics manager, and

• a stream manager (stmgr),

each of which is run as a dedicated Java process. The metrics manager is responsible for routing metrics to the topology master or a central metrics service in the data center, while the stream manager is responsible for routing tuples between running instances. An instance processes streaming data one tuple at a time (thus emulating the continuous operator model) and forwards the tuples to the next components in the topology, via the stream manager. Systems that follow such a model include Apache Heron [112], Samza [6], Flink [62] and Storm [154].

3.2.4 A Topology Example

We present a sample topology in Figure 3.1. Figure 3.1a) shows the logical representation of the topology, where tuples are ingested into the topology from the spout, and passed on to Splitter bolts that split sentences in the tuples into words. The resulting tuples are then passed on to the Counter bolts that count the occurrence of each unique word. Figure 3.1b) shows how the topology may look when launched on a physical cluster. The parallelism levels of the spout and the Splitter bolt are both 2, and the parallelism level of the Counter bolt is 4. The topology is run in two containers, each of which contains a stream manager for inter-instance communication. This representation of a topology is called its packing plan. Figure 3.1c) shows a possible path a tuple could take in a topology. Though only one path is shown here, there are 16 possible paths for the job, given the parallelism levels of the components (2 spout instances × 2 Splitter instances × 4 Counter instances). Stream managers are used for passing tuples between two instances. If two instances on the same container need to communicate, data will only pass through the local container's stream manager. On the other hand, if the instances run on different containers, the sender's output tuples will first go



to its local stream manager, which will then send them to the stream manager on the remote container. The receiving stream manager will be in charge of passing those tuples on to the local receiving instance. Note that this does not increase the number of possible paths in the topology.

Figure 3.2: Caladrius System Overview.

3.3 SYSTEM ARCHITECTURE

This section presents a brief overview of Caladrius’ architecture, particularly with respect to its interface with Apache Heron [112]. Caladrius consists of three tiers—the API tier, the model tier and the shared model helper tier—as shown in Figure 3.2. It is deployed as a web service that can easily be launched in a container, and is accessible to developers through a RESTful API provided by the API tier.

3.3.1 API Tier

The API tier faces users who would like to use Caladrius to predict future traffic levels or performance of a topology. It is essentially a web server translating and routing user HTTP requests to corresponding interfaces. Caladrius exposes several RESTful endpoints to allow clients to query the various modelling systems it provides. Currently, Caladrius provides topology performance (throughput, back pressure, etc.) and traffic (incoming workload) modelling services. Besides performing HTTP request handling, the API tier is also a web container that hosts implementations of the aforementioned models and fulfills system-wide common shared logistics, including configuration management and logging.

It is important to consider that a call to the topology modelling endpoints may incur a wait (up to several seconds, depending on the modelling logic). Therefore, it is prudent to let the API be asynchronous, allowing the client to continue with other operations while the modelling is being processed. Additionally, an asynchronous API allows the server-side calculation pipelines to run concurrently with ease. A response to a call to a RESTful endpoint hosted by the API tier is a JSON formatted string. This string contains the results of modelling and prediction. The type of results listed can vary by model implementation. By default, the endpoint will run all model implementations defined in the configuration and concatenate the results into a single JSON response.

3.4 MODELS: SOURCE TRAFFIC FORECAST & TOPOLOGY PERFORMANCE PREDICTION

As discussed in Section 3.3, Caladrius is a modular and extensible system that allows users to implement their own models to meet their performance estimation requirements. Here, we present a typical use case of Caladrius to evaluate one of the four golden system performance signals – "traffic" – and extensively discuss the implementation of the "source traffic forecast" and "topology performance prediction" models on top of Heron. Our models can be applied to other DSPSs as long as they employ topology-based stream flow and backpressure-based rate control mechanisms.

3.4.1 Source Traffic Forecast

Caladrius must be able to forecast a job's source traffic rate. This is necessary for finding out the topology's performance in the near future. We measure the job's source traffic rate as a timeseries for forecasting purposes. The timeseries consists of the count of tuples waiting to be processed by the topology, per minute. We assume that no bottlenecks exist between the spouts and the external source, such that spout throughput is the same as source throughput. Intuitively, this remains true unless the topology experiences backpressure, where spout throughput falls below source throughput. Time series forecasting is a complex field of research and many methods for predicting future trends from past data exist. A simple statistical summary such as the mean or median of the last given period of data points may be sufficient for a reasonable forecast of a random timeseries. However, we find that a large percentage of topologies in the field depict strong seasonality. A simple statistical model is not able to predict such strongly seasonal traffic rates. To deal with

seasonality, we use more sophisticated modelling techniques. Specifically, we use Facebook's Prophet, a framework for generalized timeseries modelling [150]. It is based on an additive model where non-linear trends are fit with periodic (yearly, weekly, daily, etc.) seasonality. It is robust to missing data, shifts in the trend, and large outliers [150]. Caladrius allows users to pass the model a length of historic data and specify whether a single Prophet model should be used for the tuple streams entering each spout as a whole, or separate models should be created for the tuple streams entering each spout instance. A spout can have many instances, depending on its parallelism level. The latter method is slower but more accurate. The user also specifies how far in the future the traffic rate should be forecast. The model then produces various summary statistics, e.g. mean and median, for the predicted input rate at that future time, based on historic data.
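For reference, Prophet's additive decomposition, as described in [150], has the form

$$y(t) = g(t) + s(t) + h(t) + \epsilon_t$$

where $g(t)$ is a (possibly non-linear) trend, $s(t)$ captures periodic seasonality (e.g., daily and weekly cycles), $h(t)$ models holiday or event effects, and $\epsilon_t$ is the error term. Caladrius fits this model to the historic tuple-arrival timeseries and reads the forecast off at the requested future time.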

3.4.2 Topology Performance Prediction

The previous subsection described how we forecast source traffic levels for a topology. This incoming data is passed to downstream components that process it to find results. To make an accurate performance estimation about how the topology will perform at a particular traffic level, we must study the impact of the traffic level on each instance’s performance.

Modelling Single Instance Throughput

Based on our production experience with Heron, we have the following topology performance observations and we summarize them into assumptions used for topology performance modelling.

Neither the Heron-Stmgr nor the network is a bottleneck. The Heron-Stmgr behaves like a router inside the container. It routes all the traffic for all instances in the container. As the Heron-Stmgr is hidden from the end user's perspective, it is hard for end users to reason about it if it forms a bottleneck. Thus, almost all users in the field allocate a large number of containers to their jobs. This means that there are only a few instances per container, ensuring that the Heron-Stmgr is never a bottleneck despite variable input rates. Thus, we assume that the throughput bottleneck is not the Heron-Stmgr, and that backpressure is triggered only when the processing speed of the instances is less than their input throughput.

Backpressure is either 0 or 1; there is no intermediate state. In Heron, backpressure is triggered by default if the amount of data pending for processing at one instance exceeds 100Mb (high water mark). Backpressure is resolved if the amount of pending data is below 50Mb (low water mark). Given Twitter’s traffic load, tiny traffic variance can push 50Mb of data to

instances. This means that although an instance has just finished processing its pending data and reduced it below the low water mark, just enough data is pushed to it again that the high water mark is exceeded and backpressure is triggered unnecessarily. This forces the Heron instance to continue to be in the backpressure state. Heron adopts a metric named "backpressure time" to measure how many seconds in a minute (in the range 0 to 60) the topology is in the backpressure state. Based on the above, the "backpressure time" is either close to 60 or is 0, rather than being evenly distributed. Thus, we can approximate the topology's backpressure state to be either 0 or 1.

Based on the assumptions above, we draw the output throughput performance of an instance with a single upstream instance and a single downstream instance in Figure 3.3. We observe the following features:

Saturation Point (SP) of source throughput: If input traffic rates exceed a threshold, a job's instances can trigger backpressure. We call this the saturation point (SP) of the instances. When input traffic rises beyond the SP, instances will always face backpressure.

Saturation Throughput (ST) of output throughput: After the input traffic rate exceeds the SP, an instance's output throughput reaches and stays at a maximum value, called its saturation throughput (ST). This is because even though input rates are rising, the instance is already pushed to its maximum processing rate.

Figure 3.3: Topology performance observation from Heron production experience: output throughput rises linearly with source throughput in the non-saturation (non-backpressure) interval and plateaus at the saturation throughput (ST) beyond the saturation point (SP), i.e., in the saturation (backpressure) interval.

Linear relation (α) of output and source throughputs: When backpressure is not present and the instance is not saturated, its processing throughput is the same as its input throughput. A linear relation between output throughput and input throughput exists, and its slope (α) represents the amplifier coefficient of the instance, which is determined by its processing logic (together with grouping types if multiple downstream instances exist). Intuitively, ST = αSP. It should be noted that for a large number of tuples, such as those in Twitter's traffic load, the variation in processing time per tuple averages out, rendering the processing speed steady and content agnostic.

Given these observations, we express the output throughput Ti of a single-input single-output

instance i against source throughput tλ, which is most commonly seen in production, as follows:

\[
T_i(t_\lambda) = \begin{cases} \alpha_i t_\lambda & t_\lambda < SP_i \\ ST_i & t_\lambda \geq SP_i \end{cases} \tag{3.1}
\]

or simply

\[
T_i(t_\lambda) = \min(\alpha_i t_\lambda,\, ST_i). \tag{3.2}
\]

The output throughput of instances with multiple (m) input streams can be calculated as:

\[
T_i(t_\lambda) = \sum_{\lambda=1}^{m} \min(\alpha_i t_\lambda,\, ST_i) \tag{3.3}
\]

This approach assumes that input and output streams have a linear relationship, which works well in practice for most topologies. When an instance has only one input, Equation 3.3 reduces to Equation 3.2. Moreover, if there are n outputs, Equation 3.3 becomes:

\[
T_i(t_\lambda) = \sum_{j=1}^{n} T_j(t_\lambda) \tag{3.4}
\]

\[
T_j(t_\lambda) = \sum_{\lambda=1}^{m} \min(\alpha_j t_\lambda,\, ST_j), \quad j \in [1, 2, ..., n] \tag{3.5}
\]

where Tj(tλ), αj and STj represent the output throughput, the amplifier coefficient and the saturation

throughput of the jth output stream respectively. αj is determined by both the instance’s processing logic and the type of tuple grouping.
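To make the instance model concrete, the following Python sketch encodes Equations 3.2, 3.3 and 3.5 directly; the function names and the illustrative numbers are ours and are not part of Caladrius.

from typing import List

def instance_output(alpha: float, st: float, source_rates: List[float]) -> float:
    # Equations 3.2/3.3: each input stream contributes linearly (slope alpha)
    # until the instance saturates at its saturation throughput ST.
    return sum(min(alpha * t, st) for t in source_rates)

def instance_outputs(alphas: List[float], sts: List[float],
                     source_rates: List[float]) -> List[float]:
    # Equations 3.4/3.5: per-output-stream throughput for n output streams.
    return [instance_output(a, s, source_rates) for a, s in zip(alphas, sts)]

# Illustrative numbers loosely based on Section 3.5: alpha ~ 7.6, ST ~ 84M tuples/min.
print(instance_output(7.6, 84e6, [5e6]))    # linear interval
print(instance_output(7.6, 84e6, [20e6]))   # saturated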

Modelling Single Component Throughput

Adding the throughput of all instances of a component gives the component’s throughput. Let’s consider a single-input single-output component c first; the multi-input multi-output component’s throughput can be derived from the single-input single-output component in the same way as done for instances in section 3.4.2. Given the component’s parallelism level p, let the input throughput of

each instance of the component be tλ(1), tλ(2), ..., tλ(p). The component’s input throughput is then:

\[
t_\lambda = \sum_{i=1}^{p} t_{\lambda(i)}. \tag{3.6}
\]

The component output throughput is:

\[
T_c(p, t_\lambda) = \sum_{i=1}^{p} T_i(t_{\lambda(i)}). \tag{3.7}
\]

Since a component’s instances have the same code, they perform the same function T (). However,

the input throughput tλ(i) to each instance i may not be the same, due to the upstream grouping type specified by the user. Here we discuss the impact of the most commonly used grouping types: shuffle (round robin or load balanced) and fields (key) grouping.

Shuffle Grouped Connections Shuffle grouped connections between components share output tuples evenly across all downstream instances, leading to

\[
t_{\lambda(1)} = t_{\lambda(2)} = ... = t_{\lambda(p)} = \frac{t_\lambda}{p}. \tag{3.8}
\]

This means that the routing probability from a source instance to a downstream instance is always simply 1/p where p is the number of downstream instances, irrespective of the input traffic’s content and changes in it over time. The component output throughput is:

\[
T_c(p, t_\lambda) = p\, T_i\!\left(\frac{t_\lambda}{p}\right). \tag{3.9}
\]

Particularly, when p = 1, the component has a single instance and Equation 3.9 reduces to

Tc = Ti. When p > 1, Equation 3.9 shows that Tc becomes Ti times the number of instances (p). Consider a component running for a while with seasonally varying input rates. We observe

several data points of the same parallelism p and a range tλ ∈ (η1, η2) of input. Thus, we can draw

a line Tc(tλ) similar to Figure 3.3 as long as SP exists in the range (η1, η2). This line corresponds to

the particular parallelism p. Given this line, we can draw another line for a new parallelism p′ = γp by scaling the existing line by γ.
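As an illustration of Equation 3.9 and of the scaling argument above, the sketch below (our own helper names, not Caladrius’ API) computes a shuffle-grouped component’s output and scales an observed line to a new parallelism p′ = γp.

def component_output_shuffle(p: int, t_total: float, instance_model) -> float:
    # Equation 3.9: under shuffle grouping each of the p instances receives
    # t_total / p, so the component's output is p times one instance's output.
    return p * instance_model(t_total / p)

def scale_component_line(alpha: float, component_st: float, gamma: float):
    # Scaling the observed component line to parallelism p' = gamma * p:
    # the slope stays alpha, while the component's saturation point and
    # saturation throughput both scale by gamma.
    return alpha, gamma * component_st

instance = lambda t: min(7.6 * t, 84e6)                  # illustrative instance model
print(component_output_shuffle(3, 30e6, instance))       # ~228M tuples/min
print(scale_component_line(7.6, 3 * 84e6, gamma=4 / 3))  # predict parallelism 4 from 3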

Fields Grouped Connections Fields grouping chooses downstream instances based on the hash of the data in the tuple and thus can depend on the content of the data. However, we observed that the data set bias towards certain downstream instances remains unchanged in a long-time window, or the bias changes slowly. This bias can be measured by periodically auditing traffic. Thus, we assume that the input traffic bias remains unchanged over time in the following discussions. We discuss two changes in a job’s execution: 1) varying input traffic load and 2) scaling a component to a new parallelism. • Varying source traffic load while keeping parallelism constant: By observing the input throughput at a particular parallelism level of an operator, we can identify whether the data flow is biased towards a subset of instances belonging to downstream components. This allows us to predict the amount of data each downstream instance will receive if the input rate changes.

Let the new overall input throughput be t′λ = βtλ. With the steady data set bias assumption, the traffic distribution is measured over time and is distributed across all p parallel instances of the operator. Thus, we have:

\[
t'_\lambda = \beta t_\lambda = \sum_{i=1}^{p} \beta t_{\lambda(i)}. \tag{3.10}
\]

We can calculate the component output throughput under a different input traffic load (Equation 3.11). We observe that the output throughput of each instance is proportional to the original one by β when its new input traffic load falls into the linear interval, and reaches the saturation throughput otherwise (a short code sketch after this list illustrates the calculation).

\[
T_c(p, t'_\lambda) = T_c(p, \beta t_\lambda) = \sum_{i=1}^{p} T_i(\beta t_{\lambda(i)}) = \sum_{i=1}^{p} \min\left(\beta T_i(t_{\lambda(i)}),\, ST_i\right) \tag{3.11}
\]

• Varying parallelism while keeping input traffic load constant: Changing parallelism can affect how tuples are distributed among instances when using fields grouping. This complicates the calculation of the routing probability for fields grouped

connections under a different parallelism. The routing probability is a function of the data in the tuple stream and their relative frequency; thus, the proportion of tuples that go to each downstream instance depends entirely on the nature of the data contained in the tuples. Fields grouping chooses an instance to send data to by taking the hash value of the tuple modulo the number of parallel instances. As the hash result depends on the tuple data, the result is unpredictable. However, we found in some cases that the data set distribution is uniform or load-balanced in a large data sample set. Under this circumstance, the component behaves as in Equation 3.9. A potential solution for a biased data set is for a user to implement their own Heron-customized-key-grouping to make the traffic distribution predictable and plug the corresponding model into Caladrius.
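The first case above (scaling the source load under a stable key bias) can be sketched as follows; the helper names are ours and the per-instance measurements are illustrative.

from typing import List

def component_output_fields(beta: float, observed_outputs: List[float],
                            sts: List[float]) -> float:
    # Equation 3.11: scale each instance's observed output by beta and clamp
    # it at that instance's saturation throughput.
    return sum(min(beta * t_out, st) for t_out, st in zip(observed_outputs, sts))

# Three instances with a biased (fields-grouped) traffic split, load scaled by 1.5x.
observed_outputs = [46e6, 23e6, 8e6]   # T_i(t_lambda(i)) measured at the old load
sts = [84e6, 84e6, 84e6]               # per-instance saturation throughput
print(component_output_fields(1.5, observed_outputs, sts))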

Modelling Topology Throughput

A topology is a DAG, which can contain a critical path – the path which limits the whole topology’s throughput. Once the model for each component is built, the throughput performance on the critical path can be evaluated. We assume that there are N components on the critical path, and

the input traffic coming through the spouts is t0, which can be either the measured actual throughput or the forecasted source throughput from Section 3.4.1. The user specifies the parallelism configuration

for each component to be {p1, p2, ..., pN }. The output throughput of the critical path (tcp) can be calculated by chaining Equation 3.7:

\[
t_{cp} = \underbrace{T_{c(N)}(p_N, T_{c(N-1)}(\,...\,T_{c(2)}(p_2, T_{c(1)}(p_1, t_0))\,...\,))}_{N} \tag{3.12}
\]

Once we have tcp, we can trace backwards and find the saturation point of the topology:

\[
t'_0 = \underbrace{T_{c(1)}^{-1}(T_{c(2)}^{-1}(\,...\,T_{c(N-1)}^{-1}(T_{c(N)}^{-1}(t_{cp}))\,...\,))}_{N} \tag{3.13}
\]

Moreover, we can identify if there is or will potentially be backpressure by comparing t0 and t′0:

\[
risk_{backpressure} = \begin{cases} low & : t_0 < t'_0 \\ high & : t_0 \sim t'_0 \end{cases} \tag{3.14}
\]

We can also locate the component or instance with high backpressure risk while creating the chain in Equation 3.12. For some topologies, the critical path cannot be identified easily. Under this situation, multiple critical path candidates can be considered and predicted at the same time. The critical path selection problem is out of the scope of this chapter.
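A minimal sketch of the chaining in Equations 3.12-3.14 follows; the component models and the 0.9 “closeness” margin used to decide when t0 approaches t′0 are illustrative assumptions rather than part of Caladrius.

from typing import Callable, List

def critical_path_output(component_models: List[Callable[[float], float]],
                         t0: float) -> float:
    # Equation 3.12: feed the source rate through the component models in
    # order, from the spout-facing component down to the sink.
    rate = t0
    for model in component_models:
        rate = model(rate)
    return rate

def backpressure_risk(t0: float, t0_saturation: float, margin: float = 0.9) -> str:
    # Equation 3.14 (approximated): risk is high when the planned source rate
    # approaches the back-computed topology saturation point t0'.
    return "high" if t0 >= margin * t0_saturation else "low"

splitter = lambda t: min(7.6 * t, 230e6)   # illustrative component models
counter = lambda t: min(1.0 * t, 200e6)
print(critical_path_output([splitter, counter], t0=20e6))
print(backpressure_risk(t0=20e6, t0_saturation=30e6))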

3.5 EXPERIMENTAL EVALUATION

As we depend on the Prophet library for the source traffic forecast modeling, the performance evaluation of Caladrius’ traffic prediction will not be discussed here. We focus on the evaluation of the topology performance prediction model and its integration into Caladrius as an end-to-end system. The evaluation is conducted in two main parts.

1. First, we evaluate the output throughput. We validate our observation and assumptions for the single instance in Section 3.5.2, and the models for the single component in Section 3.5.3 and the critical path in Section 3.5.4. DSPSs usually provide scaling commands to update the parallelism of their components. For example, Heron provides the ‘heron update [parallelism]’ command to update a component’s parallelism in the topology DAG. Although users have tools to scale jobs, it is hard to predict changes in performance after running the commands. Some existing systems, such as Dhalion, use several rounds of ‘heron update [parallelism]’ to converge Heron topologies to the users’ expected throughput service level objective (SLO) in a time-consuming process. Conversely, Caladrius can predict the expected throughput given new component parallelism levels, which gives users useful insights on how to tune their jobs. This can be done by executing the command ‘heron update --dry-run [parallelism]’. It should be noted that with the ‘--dry-run’ parameter, the new packing plan and the expected throughput are calculated without requiring job deployment, thus significantly reducing iteration time.

2. Besides throughput, we also conduct CPU load estimation for updated parallelism levels in Section 3.5.5. The CPU load primarily relates to the processing throughput, which makes its prediction feasible once we have a throughput prediction.

3.5.1 Experimental Setup

Previous work on DSPSs [75] used a typical 3-stage Word Count topology to evaluate the systems under consideration. In our work, we use the same topology, shown in Figure 2.1. In this topology, the spout reads a line from the fictional work “The Great Gatsby” as a sentence and emits it. The spouts distribute the sentences to the bolts (Splitter) belonging to the second stage of the topology using shuffle grouping. The Splitter bolts split the sentences into words and subsequently forward the words to the third-stage bolts (Counter) using fields grouping. Finally, the Counter bolts count the number of times each word has been encountered. Unless mentioned otherwise, the spout’s parallelism in each experiment is set to 8. As there is no external data source in the experiments, the spout output traffic matches the configured throughput if there is no backpressure triggered by the job instances, and their throughput is reduced if backpressure is triggered. We run the topology on Aurora, a shared cluster with Linux “cgroups” isolation. The topology resource allocation is calculated by Heron’s round-robin packing algorithm: 1 CPU core and 2GB RAM per instance, with disk not involved in the evaluation. Note that the evaluation topology was constructed primarily for this empirical evaluation, and should not be construed as being the representative topology for Heron workloads at Twitter. We use throughput (defined in Section 3.2) as the evaluation metric in our experiments. We note that the throughput of a spout is defined as the number of tuples emitted by the spout per minute. The throughput of a bolt is defined as the number of tuples generated by the bolt per minute. We tune the Word Count topology to perform in ways that we expect in production settings, i.e., there are no out-of-memory (OOM) crashes, or any other failure due to resource starvation during scheduling or long repetitive GC cycles. The experiments were allowed to run for several hours to attain steady state before measurements were retrieved.

3.5.2 Single Instance Model Validation

To validate the single instance model in Figure 3.3, we set the Splitter component’s parallelism to 1. The topology was configured to have an input traffic rate of from 1 to 20 million tuples per minute with an additional step of 1 million tuples per minute. Meanwhile, the Counter parallelism is set to 3 to prevent it from being the bottleneck. We collect the Splitter processed-count and emit-count metrics as they represent the instance’s input and output throughput. The observation was repeated for 10 times, and the throughput with 0.9 confidence is drawn in Figure 3.4. There are two series of measurements of the Splitter instance in Figure 3.4. One is the input throughput and the other is output throughput. The x-axis is the spout’s output rate, and the y-axis shows the two series value in million tuples per minute. We can see the two series increase until around the point of 11 million spout emit-throughput, which is the SP. After SP, both series tend to be steady, among which the output throughput is the ST. Figure 3.5 shows the ratio of output over input throughput, which is between 7.63 and 7.64 - a very tight window, thus it can be roughly treated as a constant value. The slope actually represents on average, the number of words in a sentence. Moreover, we noticed the slope slightly fluctuates in the non-saturation interval, which is possibly due to competition for resources in instances. An instance contains two threads: a gateway thread that exchanges data with the Heron-Stmgr and a worker thread that performs the actual processing of the instance. When the input rate increases, the burden on the instance gateway thread and communication queues increases, which results in less resources allocated to the processing thread.

Figure 3.4: Instance output and input throughput vs. topology input rate.

However, the performance degradation in the processing thread is small and transient. Time spent in backpressure is presented in Figure 3.6. It can be observed that backpressure occurs when the spout throughput reaches around 11 million tuples per minute (the SP). The time spent in backpressure rises steeply from 0 to around 60,000 milliseconds (1 minute) after it is triggered. From the observation above, we note that to uniquely identify the curve for a given instance, we need at least two data points: one falling in the non-saturation interval and one in the saturation interval. We can get these points from two experiments: one without and one with backpressure.

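The two-point identification described above can be written down directly. The sketch below is illustrative only; it recovers α, ST and SP from one measurement taken without backpressure and one taken under backpressure.

def fit_instance_model(unsaturated, saturated):
    # unsaturated, saturated: (input_rate, output_rate) observations.
    in_lin, out_lin = unsaturated
    _, out_sat = saturated
    alpha = out_lin / in_lin   # slope of the linear (non-backpressure) interval
    st = out_sat               # output plateaus at the saturation throughput
    sp = st / alpha            # ST = alpha * SP  =>  SP = ST / alpha
    return alpha, st, sp

# Numbers close to the Word Count measurements in this section (tuples/minute).
print(fit_instance_model(unsaturated=(5e6, 38.2e6), saturated=(15e6, 84e6)))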
3.5.3 Single Component Model Validation and Component Throughput Prediction

When we observe jobs in a data center, source traffic varies over time and may exhibit the same throughput at multiple points in time. This means that we can observe multiple instances of a particular source traffic rate. In the experiments, we emulate multiple observations of the same input rate by restarting the topology and observing its throughput multiple times. To validate the single component model, we follow the previous single instance evaluation: we focus on the Splitter component and start with a parallelism of 3, as in Figure 3.7. We can see that the throughput lines of a component have a similar shape to those of the instances shown in Figure 3.4, but scaled according to the parallelism. The topology input rate ranges from 2 to 68 million tuples per minute, and the SP is around 30 million. The piecewise linear regression lines are also marked as dashed lines, and the output over input ratio can be calculated to be 7.638

Figure 3.5: Instance output/input throughput ratio vs. topology input rate.

(210.239/27.526), which is consistent with the result in Figure 3.5. Based on the observation of the Splitter bolt with a parallelism of 3, we can predict its throughput with another parallelism level. Given the discussions of Equation 3.9, we plotted the predictions of throughput with parallelism 2 and 4, as dashed lines for both input and output, in Figure 3.7. The predicted input and output ST with a parallelism of 2 is around 18 million and 140 million respectively, while those for parallelism of 4 are 36 million and 280 million. To evaluate the prediction, we deployed the topology with Splitter component parallelism to be 2 and 4, and measured throughput as shown in the Figure 3.8. In the non-backpressure interval, the predicted curves strictly match the measured ones. The ST prediction error, which is defined as the difference between the corresponding predicted and observed regression lines over the

observed regression line of output throughput (|STprediction − STobservation|/STobservation), is around (140 − 136)/136 = 2.9% for parallelism of 2 and (287 − 280)/280 = 2.5% for parallelism 4. We can see that the ST predictions of parallelism 2 and 4 match well with the measured ones, with acceptable small variations.

3.5.4 Critical Path Model Validation and Topology Throughput Prediction

For the example job in Figure 2.1, the critical path is the only path going through the three components. In the previous experiments, we have built the model for the Splitter component. We did the same for the Counter component and show its model in Figure 3.9. Moreover, we observed

Figure 3.6: Instance backpressure time vs. topology input rate.

that the test dataset is fortunately unbiased; thus, we use Equation 3.9 for the sink bolt. Now we have all three component models on the critical path, and we can predict the critical path throughput by applying Equation 3.12. We choose the parallelisms in Figure 2.1. The predicted sink bolt’s output rate is shown in Figure 3.10. Meanwhile, we deployed a topology with the same parallelism in our data center and measured its sink bolt output rate, also shown in Figure 3.10. The figure shows the observation matches the prediction with an error of (139 − 135)/139 = 2.8%.

3.5.5 Use Case: Predict CPU load

Input rate significantly impacts an instance’s resource usage such as CPU and memory. The tasks assigned to a job’s instances can be categorized as CPU-intensive and memory-intensive tasks, whose CPU and memory load can be predicted. Two factors are worth considering while performing our micro-benchmarks: 1) The saturation state i.e., whether a component triggers backpressure. When the component triggers backpressure, its CPU or memory load is supposed to be at the maximum possible level as the processing throughput of its instances also reaches their maximum points. 2) The resource limits of containers that run the instances (especially in terms of memory). Instances may exceed the container memory limit when their input throughput rises to high levels, which is rare in a well-tuned production job but can still happen. In this section, we choose the CPU load of instances as an example. We observed that the CPU

Figure 3.7: Component throughput measurements of the Splitter bolt for parallelism 3 and predictions for parallelism 2 and 4.

usage is linear in the output throughput per instance. Once we have observed several data points, we can prepare two intermediate results:

• We can depict the throughput model, as we did in the previous evaluations.

• We can then use the model to calculate the linear ratio CPU load / output throughput, i.e., the slope ψ.

Given the target input throughput, we use our throughput model to find the estimated output rates. We then amplify the output rates by ψ to get the CPU load

Figure 3.8: Validation on throughput prediction for parallelisms 2 and 4 for the Splitter bolt.

estimation. We set the parallelism level of the Splitter bolt to 3 and observe the CPU load of its instances in Figure 3.11. The CPU load is collected from the Heron JVM native metric ‘__jvm-process-cpu-load’, which presents the “recent cpu usage” of the Heron instance’s JVM process. The predicted CPU load regression lines are shown in the same figure as dashed lines for parallelisms 2 and 4 for the Splitter bolt. Additionally, we configured the source throughput to the corresponding throughput and measured the CPU load for parallelisms 2 and 4 for the Splitter bolt. Figure 3.12 shows the measured CPU load vs. the predicted values. The prediction error is (2.399 − 2.284)/2.399 = 4.8% for a parallelism

Figure 3.9: Component Counter throughput: observation and prediction.

of 2 and (4.568 − 4.435)/4.435 = 3% for a parallelism of 4, which is higher than the throughput prediction error. This is because error accumulates across the chained prediction steps.

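The CPU load prediction described in this section amounts to two chained steps, sketched below with illustrative numbers (the component model and the ψ value are ours, not measured constants of Heron).

def predict_cpu_load(target_input_rate: float, throughput_model, psi: float) -> float:
    # Step 1: predict output throughput for the target input rate.
    # Step 2: apply the observed linear ratio psi = CPU load / output throughput.
    return psi * throughput_model(target_input_rate)

component_p4 = lambda t: min(7.6 * t, 280e6)       # illustrative model for parallelism 4
psi = 4.4 / 280e6                                  # cores per (tuple/minute), illustrative
print(predict_cpu_load(40e6, component_p4, psi))   # ~4.4 cores once saturated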
3.6 RELATED WORK

3.6.1 Performance Prediction

Traffic prediction is used for performance improvements in several areas. For instance, the ability to predict video traffic in video streaming can significantly improve the effectiveness of numerous management tasks, including dynamic bandwidth allocation and congestion control. Authors of [118] use a neural network-based approach for video traffic prediction and show that the prediction performance and robustness of neural network predictors can be significantly improved through multiresolution learning. Similarly, several predictors exist in the area of communication networks such as ARIMA, FARIMA, ANN and wavelet-based predictors [74]. Such prediction methods are used for efficient bandwidth allocation (e.g., in [120]) to facilitate statistical multiplexing among the local network traffic. However, the work done on traffic prediction in DSPSs is limited. For instance, authors of [166] perform traffic monitoring and re-compute scheduling plans in an online manner to redistribute

Figure 3.10: Topology predicted throughput and measured throughput.

Figure 3.11: CPU load observation and prediction.

Storm job workers to minimize internode traffic. Authors of [117] proposed a predictive scheduling framework to enable fast, distributed stream processing, which features topology-aware modeling for performance prediction and predictive scheduling. They presented a topology-aware method to accurately predict the average tuple

Figure 3.12: Validation on CPU load prediction.

processing time of an application for a given scheduling solution, according to the topology of the application graph and runtime statistics. They then present an effective algorithm to assign threads to machines under the guidance of prediction results.

3.6.2 Resource Management in Stream Processing

Job scheduling in DSPSs is a thoroughly investigated problem. Borealis [46], a seminal DSPS, proposed a Quality of Service (QoS) model that allows every message or tuple in the system to be supplemented with a vector of metrics which included performance-related properties. Borealis would inspect the metrics per message to calculate if the job’s QoS requirements are being met. To ensure that the QoS guarantees are met, Borealis would balance load to use slack resources for overloaded operators. STREAM [126] is a DSPS that copes with a high data rate by providing approximate answers when resources are limited. Authors of [136] focus on optimal operator placement in a network to improve network utilization, reduce job latency and enable dynamic optimization. Authors of [79] treat stream processing jobs as queries that arrive at real-time and must be scheduled subsequently on a shared cluster. They assign operators of the jobs to free workers in the cluster with minimum graph partitioning cost (in terms of network usage) to keep the system stable. However, given that the network is rarely a

bottleneck in today’s high-performance data center environments, operator placement strategies do not necessarily reduce end-to-end latency. Furthermore, as operator parallelism can be very high, it is difficult to co-locate all parallel instances on a machine, making such algorithms inapplicable. As mentioned earlier, a plethora of work [75, 104, 167] exists that gathers metrics from physically deployed jobs to find resource bottlenecks and scale jobs out in multiple rounds to improve performance. A relatively new topic for DSPSs is using performance prediction for proactively scaling and scheduling tasks; Caladrius takes a step in this direction.

3.7 CONCLUSION

In this chapter, we described a novel system developed at Twitter called Caladrius, that models performance for distributed stream processing systems and is currently integrated with Apache Heron. We presented Caladrius’ system architecture, three models for predicting throughput of input data, and one use case for CPU load prediction. We illustrated the effectiveness of Caladrius by validating the accuracy of our 1) models and 2) Caladrius’ prediction of throughput and CPU load when changing component parallelism.

Chapter 4: Meezan: Stream Processing as a Service

In spite of the growing popularity of open-source stream processing systems, they remain inaccessible to a large number of users. A major obstacle to adoption is the overhead of configuring stream processing jobs to meet performance goals in heterogeneous environments, such as public cloud platforms. These platforms offer a variety of VMs that have different hardware specifications and prices. Despite the flexibility the many choices offer, an inexperienced developer may not be fully familiar with the impact and intricacies of different choices on job performance [49, 158]. This makes the selection of a combination that fits both budget and performance goals a challenging task. We propose Meezan, a system that allows users to provide their stream processing jobs as input and analyzes the jobs’ performance empirically, to tune them for high throughput. As output, it provides users with a range of possible throughput guarantees for the jobs, where the most performant choices are associated with the highest resource usage and cost, and vice versa. Given this range, users are free to choose the cost-performance combination that works well for their application context and budget. Meezan then goes on to configure and deploy their jobs, providing a seamless stream-processing-as-a-service experience.

4.1 INTRODUCTION

Stream processing is widely used to process the enormous amounts of continuously-produced data pouring into organizations today. High variability across use cases has led to the creation of many open-source stream processing systems and commercial offerings that have gained massive popularity in a short time e.g. Apache Heron [112], Apache Flink [62] and Amazon Kinesis [2]. To provide examples of their use cases, Heron runs stream processing jobs at Twitter scale to support functions such as finding patterns in user tweets and counting the top trending hashtags over time. Similarly, Flink is used at Uber to perform functions that are essential to revenue such as calculating surge pricing [42]. Kinesis is a commercial offering from Amazon that allows users to collect, process, and analyze real-time, streaming data and is used by companies such as Zillow and Netflix [2]. However, these systems still present a barrier to entry for non-expert users, who may range from students in a lab with very limited resources to experienced software engineers who work alongside the developers of these systems (e.g. developers of Heron at Twitter), but do not have experience with running the framework themselves and thus have difficulty optimizing jobs. As stream processing jobs are often-times business-critical, they are generally associated with throughput and/or latency service level objectives (SLOs). This makes tuning jobs for performance all the more important. Performance tuning of jobs becomes increasingly difficult when the underlying stream processing

65 engine adds components to the jobs that are transparent to the user but are necessary for the job to execute well. For example, Apache Heron adds a “stream manager” to every container that runs Heron jobs, that provides communication between the job’s operators (details described previously in Section 3.2). As we show later in Section 4.3.3, it is possible for such components to bottleneck, which makes performance tuning hard for users who are unaware that such components even exist. Such problems have led to the creation of system-specific trouble-shooting guides [27] that offer advice on how to deploy jobs correctly and tune them to achieve optimal performance. However, these guides are usually not modified as the system continues to evolve. Furthermore, users can require very specific guidance for tuning parameters that are pertinent to only their jobs. Although commercial offerings such as Kinesis [2] automatically deploy jobs and scale them on a user’s request, they still leave job tuning and configuration for optimal performance to the user. An additional deployment challenge for inexpert users involves finding a resource allocation for their job that allows it to run without bottlenecking on resources, thus ensuring that it will meet its performance goals. This becomes more challenging for novice users who do not have access to large data center environments and often use cloud platforms, such as Amazon AWS and Microsoft Azure for deployment. Selecting a set of VMs that allows them to meet their performance goals is no easy feat as today there are more 280 types to choose from on AWS alone! [11]. All of these VMs differ from each other in terms of their hardware specifications. Expecting novices to understand the impact of each of these VM specifications on their job’s performance is unreasonable as often, such a task cannot be accomplished off-hand by expert researchers on the subject [140]. Finally and equally importantly, the job deployment must also fit within the user’s allocated budget. It is possible that a budget constraint may force a user to make sub-optimal choices (such as selecting several cheaper, small VMs instead of a few larger, more expensive VMs that will cost less in aggregate, but may result in over-allocation). We propose Meezan, a system that tunes jobs for high throughput performance, and removes this burden from the user, thus providing a seamless, stream processing as a service experience. Furthermore, once a user submits a job for a specific cloud platform, Meezan presents users with a spectrum of deployment options, ranging from the cheapest, least resource-consuming and least performant to the most expensive, highest resource-consuming and most performant. Meezan’s goal is to provide users with the cheapest and smallest deployment option in each performance range so that users do not have to pay for resources that cannot improve job performance. This allows users to choose a deployment option that is most suitable for their performance requirements and budget. Given a user’s stream processing job and choice of cloud platform as input, Meezan creates several deployment options of the job for the user to choose from as output, where each of the deployment options has an associated price and throughput guarantee. Meezan creates deployment options with varying throughput guarantees as it varies how much data each option ingests. For

66 instance, in production settings, stream processing jobs usually ingest data from an external source such as [6], which divides and stores incoming data into several partitions. In this case, Meezan’s smallest job deployment option would have a single job operator that reads input from a single partition [111]. In production settings, a Kafka cluster can support up to 200K partitions [53]. This means the largest deployment option may be required to read and process data from all partitions at once, and must thus provide 200K× the throughput. Therefore, the largest deployment option Meezan creates will scale the user’s job to support this required throughput and will create a deployment plan for it that be priced in accordance with the resources used. Thus, Meezan allows job developers to focus only on the functionality of their jobs and forms a layer of abstraction between the developer and the cloud provider. Once their job is fully written and packaged, e.g., in the form of a jar for jobs written in Java, it is submitted to Meezan, which then performs the following:

• Meezan creates an estimate of the throughput of every processing instance in the job. This allows it to estimate the number of instances required to keep up with the incoming job workload.

• It estimates the resources required per instance to ensure that no instance is bottlenecked.

• Then, Meezan creates several deployment options for the job: these options consist of a mapping of job instances to VMs run by a cloud provider that execute the job, and their associated cost. Meezan’s goal per option is to ensure that the cost of the VMs used to run the job is minimized while meeting a given performance target. The user is then allowed to choose one of the deployment options that fits well within their job context and budget.

• Given the user’s selection, Meezan deploys the job onto the cloud platform and ensures that the job is not bottlenecked by the network or the job’s own hidden structural components.

In Figure 4.1, we present Meezan’s cost-performance spectrum for three different jobs (described in detail in Section 4.4). We note that as the sizes of different jobs increase, their costs increase differently; this is due to the differences in the nature of their business logic and workloads. Irrespective of job type, we note that the cost of the deployments increases linearly as job throughput increases. We compare Meezan’s performance with that of RCRR, a version of the default Heron scheduler that we have modified to support job scheduling for a commercial cloud platform with heterogeneous VMs. Meezan is able to reduce the cost of deployment while sustaining the same throughput by up to an order of magnitude (2.18× for the Yahoo! Advertising Benchmark, 33.4× for the LRB Toll Calculation Job and 28.62× for the WordCount topology). Meezan is able to accomplish this goal for two reasons: 1) it scales out the job linearly to increase throughput,

Figure 4.1: Meezan’s cost-performance spectrum for three jobs, where each point shows the cost of a deployment (y-axis) that provides a given throughput (x-axis). Dotted lines are linear lines of best fit. Numbers on markers indicate job size in terms of the number of operators in the job. RCRR is a version of the default Heron scheduler that we have modified to support job scheduling for a commercial cloud platform with heterogeneous VMs (details are in Section 4.3.4).

as it correctly identifies and removes bottlenecks in the job’s structure, and 2) it packs jobs efficiently into VMs, such that both deployment cost and resource fragmentation are minimized. In the next section, we explore the details of Meezan’s design and the insights behind them.

4.2 BACKGROUND

We refer readers to Section 3.2 which covers background relevant to stream processing systems in general, and Apache Heron [112] in particular, on which we have built Meezan.

4.3 SYSTEM DESIGN

In this section, we describe how Meezan creates a spectrum of performance goals for the user to choose from. Concretely, we present the user with a variety of throughput goals. These throughput goals are expressed in terms of input throughput or how much data the job ingests per unit time. Each throughput goal is associated with a different job DAG: intuitively, the lower the throughput, the less data that is ingested by the job for processing, which leads to a less parallelized job DAG. We provide a brief overview of our design and then dive into details of how Meezan creates a

68 packing plan for the job per throughput goal, such that it packs the job DAG into a set of VMs provisioned by a cloud provider such as AWS and Azure, that minimize the user’s cost.

4.3.1 Overview

Given a user’s stream processing job, Meezan must first identify the resource requirements and required parallelism of each of the job’s operators, so that the job does not bottleneck or in other words, its output throughput rate is equal to its input throughput. In order to derive the job’s resource requirements and operator parallelism levels, Meezan profiles the job at small scale, where it has a single operator that reads data from an output source. Next, it generates several deployment options for the user to choose from. These deployment options are created by varying the number of operators that ingest input from external sources. Such operators are referred to as spouts. The greater the number of spouts, the greater the input the job is expected to process. Meezan scales up the job to sustain the higher input rate, given that it already has information on the resource requirements and operator parallelism levels of the job so that it is able to sustain input from a single spout. We provide details of these steps in Section 4.3.2. Finally, Meezan packs each of the derived job structures for each of the deployment options onto VMs available on the cloud platform of the user’s choice. It uses two key insights while performing this packing (detailed in Section 4.3.3): 1) VM prices on popular cloud platforms such as AWS and Azure scale linearly in proportion to VM size. This incentivizes Meezan to select large VMs so that many operators can be packed together, as picking several smaller VMs may lead to equal cost but would increase communication over the network. 2) In addition to the generally considered resources of CPU, memory, disk, and network, we must also consider the stream manager in every container as a resource. The stream manager is responsible for transferring data between all operators in Heron. If too many operators are packed together into a VM, the stream manager can easily bottleneck, slowing down the operators. Therefore, we must first derive the throughput a stream manager can tolerate per VM type and pack operators into VMs such that the stream manager is not bottlenecked. We use these insights to derive Meezan’s packing policy (described in Section 4.3.4), which aims to minimize cost of deployment, reduce resource fragmentation and sustain the job’s input throughput.

4.3.2 Providing a Cost-Performance Spectrum

We first describe how Meezan fine-tunes the resource allocation per job operator and its required parallelism level, before it scales up the job’s operators to provide users with a range of performance

69 options.

Resource Allocation & Traffic Rates Per Instance: Given a user’s job, Meezan profiles it with the minimum number of spouts to discover the job’s resource requirements and potential bottlenecks. Jobs can have severals different spout types that can be reading data from several different sources. If there is only one such spout type, we profile a job with only one instance of it. If there are N number of such spout types, we profile the job where there is only a single instance of each spout type. Once the least number of spouts are determined, Meezan then deploys and profiles the job to discover the amount of resources job operators require. Usually, user-defined functions in instances e.g., filters and transformations show a linear trend, where an increasing amount of resources (in terms of CPU cores, GBs of memory) leads to an increase in instance throughput for a given input rate. However, if the input rate is increased beyond a saturation point, the instance throughput plateaus even though there is no resource bottleneck. This is the maximum possible processing rate of the instance: to keep up with a growing input rate, the instance’s parallelism must be increased. Meezan’s goal is to find this input throughput rate, the associated output rate and the resource allocation for each instance. It does so by profiling the job in rounds. In each round, it either changes the resource allocation of each operator or its parallelism level. Meezan determines that an operator’s resource allocation should increase if its utilization with respect to that resource is over a bottleneck threshold (=90% in our experiments). Therefore, in the next round, it doubles that specific resource for the instance. This can lead to two outcomes: 1) the instance’s utilization decreases or drops below the bottleneck threshold or 2) its utilization with respect to that resource does not change significantly. In the former case, Meezan interprets that increasing a specific resource removed a bottleneck for the instance, and continues profiling with the increased resource allocation. In the latter case, it recognizes that the instance is saturated, and in order to improve the instance’s throughput, increasing resources will not help. Therefore, it reverts to the resource allocation used in the previous round, and doubles the instances parallelism level in the next round to improves the operator’s throughput. Meezan continues this until the utilization of all instances of all operators is below the bottleneck threshold. We show an example of how this works in Figure 4.2. In the figure, Meezan profiles the LRB Toll Calculation Topology (described in Section 4.4. The job has 5 operators, other than the spout. The x-axis shows the rounds Meezan’s profiling process employs. Each of the subplots along the y-axis shows the average CPU utilization of all instances of each of the different operators. We notice that in round one, CPU utilization for each operator is greater than 90%. However, we would like to ensure that each operator’s utilization lowers to an acceptable range. Thus, in the second

Figure 4.2: Meezan’s profiling process for finding job resource allocation and parallelism for the LRB Toll Calculation Topology, which has 5 components (C1-C5) other than the spout. Every subplot shows how Meezan configures an individual component. First, it increases the operator’s resource allocation. As that does not lead to a decrease in CPU utilization, it doubles the parallelism of the component until its CPU utilization falls below 90%.

round, Meezan doubles each operator’s cores. However, we note from the utilization in round 2 that this does not have an effect on utilization. Therefore, in the next rounds, Meezan adopts a strategy of doubling the parallelism of each operator until the average per-instance CPU utilization of each operator falls below 90%. This job’s operators are CPU-bound and therefore we only see changes in CPU cores and parallelism; however, Meezan is able to do the same for memory as well. When Meezan profiles jobs, it provides them with input from the job’s actual sources for a configurable period of time (5 minutes by default), to understand the job’s resource requirements. However, it is possible that during job execution, different input may trigger different parts of the computation, causing the job’s resource requirements to change. In order to deal with this problem, jobs could be re-profiled with new data, much like in other approaches [99], or scaled out at runtime [75, 104]. The operator resource allocations, data rates and parallelism levels derived in the last round of

71 this profiling step are used for modeling the job in all next steps.

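The profiling loop described above can be summarized as follows. This is a simplified sketch: measure_util is a hypothetical probe that deploys the operator with the given allocation and returns its average CPU utilization, and the 0.05 “significant change” margin is our assumption.

def profile_operator(measure_util, bottleneck_threshold: float = 0.9):
    # measure_util(cores, parallelism) -> average CPU utilization in [0, 1].
    cores, parallelism = 1, 1
    util = measure_util(cores, parallelism)
    while util >= bottleneck_threshold:
        trial = measure_util(cores * 2, parallelism)
        if trial < util - 0.05:
            # Doubling the resource relieved a bottleneck: keep the allocation.
            cores, util = cores * 2, trial
        else:
            # Utilization barely changed: the operator is saturated, so revert
            # the resource increase and scale out instead.
            parallelism *= 2
            util = measure_util(cores, parallelism)
    return cores, parallelism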
Parallelism: the Cost-Performance Spectrum: We can vary the amount of input that can be processed by the job by varying the parallelism of the spouts. The minimum input throughput that a job can provide is when the parallelism of each spout is one. The maximum possible throughput is equal to the external source’s input rate i.e. how quickly data arrives in the source system such as Kafka and is stored in different partitions. In order to accommodate this rate, the topology should have the maximum possible number of spouts. If the user is aware of this rate, Meezan can calculate the optimal number of spouts given the equation below. However, if the user is unaware of the maximum input rate the job is expected to have, the user can ask Meezan to offer them a certain number of options within a throughput range. Meezan’s real challenge is to provide a spectrum of throughput goals between the two aforemen- tioned performance extremes. It does so by varying the number of spouts between [1, max spout instances], where max spout instances can ingest input as fast as it is arriving in the source. By varying the number of spouts in this way, we vary the input the topology is able to ingest. If the number of spouts is < max spout instances, it is clear that input will queue in the source pub-sub system and that the user is aware of this tradeoff. Given every possible configuration of spout parallelism, the topology’s downstream instances must be scaled up to accommodate the varying input rate. In order to find the optimal number of instances required per operator, we move stage by stage downstream through the topology. For every operator, its optimal parallelism can be determined by:

\[
Parallelism = \frac{\sum_{i=0}^{n=\#parents} Output\ Rate_i}{Processing\ Rate\ per\ Instance} \tag{4.1}
\]

This formulation works well when an operator receives data from its upstream parent through a shuffle-grouped connection, where the parent operator’s output is equally distributed among the downstream operator’s parallel instances. In the case of a connection that has fields- or a user-defined-grouping, this equal distribution may not necessarily be true. In order to handle this case, Meezan makes use of the iterative long-running nature of jobs. It begins job deployment with the parallelism given by the formulation in Equation 4.1. If data skew is discovered in a connection, it can increase the parallelism of instances with the highest incoming rates. We remind the reader that the maximum output rate of each instance was already determined alongside the optimal resource allocation in the previous section through profiling. Although other analytical methods of finding the optimal parallelism of operators exist, e.g., based on queueing theory [76], they have been shown to not work well in practice and instead are replaceable with the simple approximation above [102].

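Equation 4.1 reduces to a one-line calculation; the sketch below (hypothetical helper, illustrative rates) rounds up to a whole number of instances.

import math

def required_parallelism(parent_output_rates, processing_rate_per_instance):
    # Equation 4.1: enough instances to absorb the sum of the parents' output
    # rates at the measured per-instance processing rate.
    total_in = sum(parent_output_rates)
    return max(1, math.ceil(total_in / processing_rate_per_instance))

# Two upstream spouts emitting 10M tuples/min each; one instance handles 7M/min.
print(required_parallelism([10e6, 10e6], 7e6))   # -> 3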
Figure 4.3: A tuple’s path through interconnected Heron instances.

By the end of this process, the optimal parallelism of each operator in the topology is determined per input rate, supplied by varying the number of spouts. As Meezan is already aware of the optimal amount of resources required per instance, it creates a scheduling plan for the topology for every configuration of spout parallelism (Section 4.3.4). As each of these plans is optimized to have no resource bottlenecks, each of them provides the same end-to-end latency a tuple experiences from the time it enters a spout to the time when it produces a tuple at the sink. Now that Meezan has determined the number of resources every individual instance requires to run at maximum capacity, it must map each of the instances to containers that are run on physical machines and create a container or packing plan. In the next section, we describe the set of insights that are important to keep in mind while building the packing plan.

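Putting the previous steps together, the spectrum itself can be sketched as a loop over spout counts. The helper below is an approximation: it assumes shuffle-grouped stages and, for brevity, that every stage forwards roughly as many tuples as it receives.

import math

def build_spectrum(max_spouts: int, per_spout_rate: float, stage_rates):
    # stage_rates: measured per-instance processing rate of each downstream stage.
    spectrum = []
    for n_spouts in range(1, max_spouts + 1):
        ingest = n_spouts * per_spout_rate
        plan = [n_spouts]
        for proc_rate in stage_rates:
            plan.append(max(1, math.ceil(ingest / proc_rate)))
        spectrum.append((ingest, plan))
    return spectrum

print(build_spectrum(3, 5e6, [7e6, 4e6]))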
4.3.3 Insights & Goals

Creating a cost-performance tradeoff involves assessing all the possible throughput rates that can be achieved for a job, correctly configuring the job DAG for each targeted rate, and packing each configuration into containers to minimize the user’s cost. To do so, it is important to keep the following factors in mind.

Stream Managers as Potential Bottlenecks Packing Heron instances into containers is akin to packing tasks (balls) onto available resources (bins). Although bin-packing problems are well-researched, Meezan’s bin-packing solution must also take into account the stream manager that is placed in every container and is responsible for routing tuples between instances in the container or across containers. We use Fig. 4.3 to show a tuple’s path through a Heron job, and describe the series of events that occur when a stream manager becomes a bottleneck. Spouts are a specific type of Heron instances

73 that pull in tuples from external sources such as Kafka queues, that store fast-arriving data. Each Heron instance is a Java process that contains two sets of threads: the slaves and the gateway threads. The gateway thread is responsible for pulling incoming data from the stream manager, and sending outgoing processed tuples to the stream manager. The gateway and slave threads share two queues between them: the incoming and outgoing data queues. Whenever new tuples arrive from the stream manager, the gateway thread deserializes them and places them on the incoming data queue. The slave threads are the task executors: they read input tuples from the incoming queue and apply user-defined functions to them. If new tuples are produced as a result of processing, the slave threads place them on the outgoing queue. The gateway thread then batches several tuples on the outgoing queue, and serializes them. It also holds a TCP client that sends data to a server on the stream manager. The stream manager essentially behaves like a switch: it runs two servers which receive data from local instances and remote stream managers respectively. The stream manager inspects a received message from instances to determine whether they need to be forwarded to a local instance or to a remote stream manager that will then direct the packet to its local instance. Packets received from remote stream managers can only be forwarded to local instances. Once the destination is determined, the packet is placed in an outgoing queue for the destination task. Once all queues cumulatively reach a certain threshold size, all packets are forwarded to their destinations and the queues are flushed. Due to this switch-like behavior, the stream manager can easily become CPU-bound. Once the stream manager becomes CPU-bound, it cannot read data from its TCP connections quickly enough. Thus, the TCP connection must keep the packets buffered on the client (instance) side, until they can be successfully pushed through to the stream manager. Once the TCP connection buffers start to fill up, the gateway thread is unable to send out data from the instance’s outgoing queue. In order to prevent the outgoing queue at the instance from becoming unbounded, the slave thread stops running the user-defined code that processes incoming data in bolts, and pulls in data from external sources in spouts. As the gateway thread continuously tries to push data out of the outgoing queue, the slave thread is usually not blocked for long and restarts processing again. Figure 4.4a shows what happens to a job’s input rate when it’s stream managers are bottlenecked. We run a simple stream processing job that emits simple strings as tuples from spouts that are passed down to downstream bolts that simply append to input strings. We deliberately make the stream manager a bottleneck in the job by adding all instances into a single container, where a single stream manager is responsible for all message passing between instances. As we increase the number of spouts from 1 to 9, we also increase the number of downstream bolts and increase the size of the container, to ensure that there are no other bottlenecks than the stream manager. We observe that as we increase the number of spouts, cumulative throughput of all spouts does not

(a) Aggregate spout throughput when stream managers are bottlenecked. (b) Aggregate spout throughput increases in proportion to the number of spouts when stream managers are not bottlenecked.

Figure 4.4: Stream Manager Behavior

increase proportionally as we expect. This is especially problematic for Meezan, whose main goal is to allow users to vary input throughput by increasing the number of spouts. Figure 4.4b compares the cumulative throughput of the same bottlenecked job with 3 spouts, with the throughput of a job that is distributed onto multiple containers. More precisely, we ascertain the maximum throughput a stream manager can reach at 100% CPU utilization. Then, we placed instances on containers such that no stream manager would receive packets from instances at a rate that exceeded this maximum throughput. We also plot the count of the number of times the slave thread found the outgoing queue full per second before it could process more data (labelled as Spout Outgoing Queue Full Count). We can see in the case of the job that is not bottlenecked on stream managers, the outgoing queues on spouts are never full and that their mean cumulative throughput over time (normalized by the throughput of a single spout) is in proportion to the number of spouts1. As the stream manager can quickly become CPU-bound, Meezan determines a maximum through- put rate T the stream manager can tolerate, before its queues fill up. When placing instances in a container, Meezan must consider that the sum of both the rate of tuples arriving on the incoming and outgoing edges of each of the instances placed in a container should not exceed T , in order to ensure that the stream manager is not overloaded, as shown in Figure 4.5. Figure 4.6 depicts Meezan’s approach in determining T for a specific hardware type. Meezan profiles the stream manager in rounds. In the first round, it starts by packing a single spout with a downstream bolt onto a single container with one stream manager. If it finds that the spout’s “Spout Outgoing Queue Full Count” is close to zero and if the stream manager’s CPU utilization is low (< 80%), it doubles the number of spouts running in the container in the next round. In Figure 4.6, this

1 In fact, the job has a mean cumulative throughput of 3.1× the throughput of a job with a single spout, whereas the reader might expect a normalized throughput of 3.0×. This value has been ascertained by finding the mean throughput for both bottlenecked and non-bottlenecked jobs over a period of time. Thus, the error of 0.1/3.0 is empirical.

Figure 4.5: w + x + y + z ≤ T to prevent stream manager bottlenecks.

is how Meezan moves from 1 to 2 spouts. At this point, the stream manager is still not bottlenecked, and so Meezan doubles the number of spouts again (to 4). At this point, it finds that the average “Spout Outgoing Queue Full Count” per spout exceeds 0, and the CPU load on the stream manager also exceeds 80%. Therefore, it performs a binary search between the current number of spouts (4) and the maximum number of spouts that led to low CPU load on the stream manager (2). Thus, it runs the next round with 3 spouts and finds that the CPU load still exceeds 80% and the “Spout Outgoing Queue Full Count” exceeds 0. Since there is no further room to continue the binary search, it determines that the maximum throughput the stream manager can tolerate is the throughput associated with 2 spouts (or 45,549.7 K tuples/min) on this machine.
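The exploration in Figure 4.6 is a doubling phase followed by a binary search; a sketch is below. Here run_round is a hypothetical probe that deploys the given number of spouts behind one stream manager and reports the spouts’ outgoing-queue-full count and the stream manager’s CPU load.

def find_stmgr_spout_limit(run_round):
    # Doubling phase: grow the spout count until the stream manager is overloaded.
    healthy, n = 0, 1
    while True:
        queue_full, cpu_load = run_round(n)
        if queue_full == 0 and cpu_load < 0.8:
            healthy, n = n, n * 2
        else:
            break
    # Binary search between the last healthy count and the first overloaded one.
    lo, hi = healthy, n
    while hi - lo > 1:
        mid = (lo + hi) // 2
        queue_full, cpu_load = run_round(mid)
        if queue_full == 0 and cpu_load < 0.8:
            lo = mid
        else:
            hi = mid
    return lo   # the throughput at this spout count defines T for this VM type

The packing constraint of Figure 4.5 then uses this limit: the summed incoming and outgoing tuple rates of the instances placed in a container must stay below T.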


Figure 4.6: Meezan’s exploration to determine the maximum sustainable throughput at the stream manager.

(a) Azure: General Purpose VMs (b) Azure: Compute Optimized VMs (c) Azure: Memory Optimized VMs

(d) EC2: General Purpose VMs (e) EC2: Compute Optimized VMs (f) EC2: Memory Optimized VMs

Figure 4.7: Figures a-c show a linear trend in cost of VMs, provided by Azure (US2), as CPU and memory allocations increase. Figures d-f do the same for VMs in EC2, Northern California. Although the dollar amount changes with region, the trend remains the same.

Instance Locality Cross-container data transmission requires serialization by the stream manager. However, if data is passed to an instance in the same container, it can be passed in memory. In addition, colocating instances that communicate directly with each other consumes fewer network resources. Therefore, Meezan is incentivized to keep two communicating instances in the same container, which also minimizes the number of stream managers in a tuple's path.

Reducing Fragmentation To reduce costs and resource fragmentation, Meezan must pack instances onto containers such that each container is fully utilized. Containers that use an even proportion of all resources (rather than a large amount of a single resource) cause less fragmentation on machines and lead to higher resource utilization.

VM Sizes and Pricing Models Instead of packing containers onto VMs in a cloud offering such as Amazon EC2 or Azure, Meezan maps a single container onto a VM. This means that the problem of choosing a container size during bin-packing essentially becomes the problem of choosing the correct VM. Choosing the correct VM size for a workload in order to reduce cost and maintain optimal performance is not an easy problem.

VM Type              Ratio CPU (Cores) / Memory (GiB)
Memory Intensive     0.13
General Purpose      0.25
Compute Intensive    0.53

Table 4.1: Resource Ratio in AWS and Azure offerings [1, 31]

Existing solutions employ complex machine learning algorithms to solve this problem for specific workloads [49]. However, after studying VM prices on platforms such as Amazon EC2 [1] and Microsoft Azure [31], we can simplify the problem to a great extent. In Figures 4.7a-c, we show how the costs of VMs provided by Microsoft Azure increase with increased resource allocations. These costs are shown for the US2 region and are representative of costs in other regions. Figures 4.7d-f represent the same for Amazon EC2 instances in Northern California. We find that cost increases approximately linearly with increasing resources. In fact, we are able to successfully fit two-variable linear equations to each set of data points, with R² values between 0.814 and 0.999. This implies that there is no benefit to choosing several small VMs over a few large VMs, or vice versa, on any of today's popular platforms. Therefore, whenever a new bin has to be opened in Meezan's policy, it always chooses the largest possible bin in its category (based on instance type; details in Section 4.3.2). Intuitively, a larger bin means that there is more room to pack instances and to keep instances on the same path together in the same bin to maintain locality.
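As an illustration of the two-variable fit mentioned above, the sketch below performs an ordinary least-squares fit of hourly price against vCPU count and memory. The caller supplies arrays taken from the providers' published price tables (no actual quotes are embedded here), and the returned R² can be compared against the 0.814-0.999 range reported in the text.

```python
# Minimal sketch: fit price ≈ a * vCPUs + b * memory_GiB + c and report R².
import numpy as np

def fit_price_model(vcpus, memory_gib, hourly_price):
    """Least-squares fit of price against CPU and memory; returns the
    coefficients (a, b, c) and the coefficient of determination R^2."""
    X = np.column_stack([vcpus, memory_gib, np.ones(len(vcpus))])
    y = np.asarray(hourly_price, dtype=float)
    coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coeffs
    r2 = 1.0 - residuals.var() / y.var()  # valid because the fit has an intercept
    return coeffs, r2
```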

4.3.4 Meezan’s Packing Algorithm

Meezan’s goal is to ensure that there are no resource or communication bottlenecks when a job is deployed, and to minimize the cost of deployment. Minimizing communication across a deployed DAG is essentially a cut problem. However, solving a cut problem limits which instances can be placed together. On the other hand, packing a job into the cheapest, fewest containers ignores edges crossing over container boundaries which increase communication cost. Therefore, we propose a heuristic that attempts to balance both requirements. Choosing VMs Commercial cloud offerings allow users to use many kinds of VMs, including general purpose, compute-intensive, memory-intensive, storage-intensive, GPU-intensive or FPGA- intensive ones [1,31]. As stream processing jobs are generally compute or memory intensive, we focus on the first three VM types. During the bin-packing process, whenever we find that we cannot place instances in the existing bins, we need to open a new bin. In order to choose the VM type, we calculate the ratio of compute to memory requirements. Table 4.1 shows the ratio of compute to memory resources provided by different VM types in

AWS and Azure cloud offerings. This allows us to create a simple policy for choosing which type of bin to open whenever an instance cannot fit into the previously opened bins: if the instance at hand has a compute:memory ratio of ≤ 0.13, we open a memory-intensive bin; if the ratio falls between 0.13 and 0.53, we open a general-purpose bin; and if the ratio is > 0.53, we open a compute-intensive bin. As mentioned in Section 4.3.3, once the type of bin is chosen, Meezan always selects the largest bin in that category. This provides us with an opportunity to fill the bin with as many instances as we possibly can to fully utilize it, and to ensure that there are as few edges crossing over the container boundary as possible. We later optimize our policy such that if resources are left unused in the VM, we replace the VM with the smallest VM that can fit all the instances and has less fragmented resources.
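A minimal sketch of this bin-type policy, using the thresholds from Table 4.1; the family names returned are illustrative labels rather than specific cloud SKUs.

```python
# Bin-type selection based on the compute:memory ratio thresholds of Table 4.1.
MEMORY_INTENSIVE_RATIO = 0.13
COMPUTE_INTENSIVE_RATIO = 0.53

def choose_bin_type(cpu_cores_needed, memory_gib_needed):
    """Return which VM family to open for the instance(s) that could not
    fit into any previously opened bin."""
    ratio = cpu_cores_needed / memory_gib_needed
    if ratio <= MEMORY_INTENSIVE_RATIO:
        return "memory-optimized"
    elif ratio <= COMPUTE_INTENSIVE_RATIO:
        return "general-purpose"
    else:
        return "compute-optimized"
```

Once the family is chosen, Meezan opens the largest VM available in that family, as described above.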

Bin Packing: In order to reduce cross-container communication, Meezan packs as many directly connected operators as possible into a single bin. Once the parallelism of each instance is determined, alongside its resource allocation and the rate of data flowing along each edge, Meezan sorts all edges in descending order by weight. Then, it chooses the heaviest edge and attempts to insert its vertices (instances) into a bin. In order to do so, a new bin (VM) must be selected first. Meezan aggregates the requirements of the two instances and calculates their compute:memory requirements ratio. Given the conditions in the previous section, it chooses a particular type and size of VM to place these instances in. Then, Meezan repeats the steps below until all instances have been inserted into bins. Consider the next edge amongst the sorted edges, and its associated vertices. There are three possibilities with respect to the associated vertices:

1. Both vertices have already been placed in which case Meezan moves onto the next step.

2. One vertex has already been placed. In this case, Meezan attempts to insert the remaining vertex into the container with the already-placed vertex. If that VM does not have enough remaining resources, Meezan calculates the remaining vertex's alignment score [81] for every open bin, and places the instance in the bin with the largest alignment score. The alignment score is a weighted dot product between the vector of the VM's available resources and the instance's resource requirements; Meezan includes the amount of data the stream manager can tolerate alongside CPU, memory and storage (a sketch of this score appears after this list). This is beneficial as the alignment score is largest for the VM where the instance uses up the most resources along every resource type. This loosely follows a best-fit approach where VMs with the largest open spaces are filled up first. If there is no VM that can accommodate the instance, a new bin has to be opened.

3. Neither vertex has been placed. In this case, Meezan first attempts to place both vertices with an upstream instance. It sorts upstream instances in order of the highest data rate to the downstream vertices and goes through them until it can place the vertices in their VM. If Meezan cannot colocate the instances with an upstream instance, it tries to place them together in a bin according to their alignment score. If that is not possible, it tries to place them separately according to their alignment scores, and finally opens a new bin if that too fails.
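The sketch below illustrates the alignment-score placement test referenced in case 2 above. The dictionary-based resource model (with the stream manager's remaining data rate as a fourth dimension) and the uniform weights are assumptions made for illustration; Meezan's implementation may weight the resources differently.

```python
# Alignment-score placement over CPU, memory, disk, and the data rate that
# the stream manager can still accept (the quantity T from Section 4.3.3).
RESOURCES = ("cpu", "memory", "disk", "data_rate")

def fits(vm_free, demand):
    """An instance fits only if every dimension, including the stream
    manager's remaining data rate, can accommodate it."""
    return all(vm_free[r] >= demand[r] for r in RESOURCES)

def alignment_score(vm_free, demand, weights=None):
    """Weighted dot product between a VM's free resources and the
    instance's demand; the feasible VM with the largest score is chosen."""
    weights = weights or {r: 1.0 for r in RESOURCES}
    return sum(weights[r] * vm_free[r] * demand[r] for r in RESOURCES)

def place_instance(open_vms, demand):
    """Place `demand` on the feasible open VM with the largest alignment
    score; return None if a new bin must be opened."""
    candidates = [vm for vm in open_vms if fits(vm["free"], demand)]
    if not candidates:
        return None
    best = max(candidates, key=lambda vm: alignment_score(vm["free"], demand))
    for r in RESOURCES:
        best["free"][r] -= demand[r]
    return best
```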

Once the packing plan has been created, some VMs may not be packed fully. For every VM that is not fully packed, Meezan iterates through all cheaper VMs, sorted by cost, and tries to replace the VM with a smaller, cheaper one that fits the instances better. In this way, Meezan reduces both the cost of the packing plan and its fragmentation.
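A minimal sketch of this downsizing pass, assuming a catalogue of VM shapes with per-resource capacities and prices (the field names here are hypothetical):

```python
# Swap a partially used VM for the cheapest catalogue entry that still fits
# the bin's aggregate load, including the stream manager's data rate.
def downsize(bin_usage, current_vm, catalogue):
    cheaper = sorted((vm for vm in catalogue if vm["price"] < current_vm["price"]),
                     key=lambda vm: vm["price"])
    for vm in cheaper:
        if all(vm[r] >= bin_usage[r] for r in ("cpu", "memory", "disk", "data_rate")):
            return vm  # first (cheapest) VM that fits the bin's load
    return current_vm  # nothing cheaper fits; keep the original VM
```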

Baselines: We derive two baselines against which to evaluate and compare Meezan's packing heuristic.

1. Analytical Model (ILP): We use a multi-objective integer linear program (ILP) [32] that aims to produce the optimal, cheapest job deployment with the least fragmentation. This provides us with the best-case solution that heuristics cannot improve over. (A sketch of this formulation appears after this list.)

Notation: We have a maximum number f of each VM type to place instances on, on our chosen cloud platform. All job instances have resource requirements along four dimensions: CPU, memory, disk and communication bandwidth (which is the sum of an instance's input and output rates). For each resource r, an instance d has a resource requirement of $d^r$, and the capacity of that resource on a VM v is $c_v^r$. Communication bandwidth on a VM is the maximum data rate a stream manager can tolerate. Each VM has an associated price $p_v$. An indicator variable $Y_{dv}$ is 1 if instance d is placed on VM v and is 0 otherwise. Another indicator variable $X_v$ is 1 if VM v has instances placed on it and is 0 otherwise.

Constraints: First, an instance d is placed on one VM only:

$\sum_{v} Y_{dv} = 1, \quad \forall d$   (4.2)

Secondly, the sum of resource requirements of all instances placed on a VM must not exceed its capacity:

$c_v^r - \sum_{d} (Y_{dv} \cdot d^r) \geq 0, \quad \forall v, r$   (4.3)

Objectives: We have two objectives for this model. First, we minimize the total cost of deployment:

$\mathrm{Cost} = \sum_{v} (X_v \cdot p_v)$   (4.4)

Second, we minimize the total number of VMs used, to reduce fragmentation:

$\mathrm{Total\ VMs\ Used} = \sum_{v} X_v$   (4.5)

This is essentially a problem of packing multi-dimensional balls into a minimal set of bins, which is known to be NP-hard [162]. In order to solve this problem in a reasonable amount of time, we fix f, i.e., the number of each VM type that is supplied as input to the problem. However, we note that by doing so, we limit the optimization problem's input, thereby obtaining solutions that may not be globally optimal.

2. Resource Compliant Round Robin (RCRR): RCRR is the best packing policy that Heron provides by default. This policy places all instances of a job in a round-robin manner across all given bins, while making sure that the capacity of none of the bins is violated. However, by default, the user has to specify the resource requirements of each instance and the number of VMs. We have anecdotally observed confusion amongst engineers because the policy's API also allows users to set both the maximum number of instances per VM and the size of the VM; this is redundant, since once the number of VMs and the instance sizes are specified, the sizes of the required VMs can be derived from the instances placed on them. Additionally, the API limits users to a single VM size that is used as the default size of all VMs. Furthermore, the policy does not model data rates as one of the resources. We modify this policy to accept the resource requirements of each instance, as derived by Meezan's profiler, and the maximum number of bins given by the ILP. However, as a round-robin policy does not pack instances efficiently, it is possible that the exact VMs derived from the ILP may not be sufficient for all the instances to be placed in a round-robin manner. Therefore, we specify the number of VMs for RCRR as the number derived by the ILP, but we provide RCRR with VMs of the largest available size on the chosen cloud platform so that none of the instances are left unplaced.
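For concreteness, the sketch below writes the ILP of equations (4.2)-(4.5) with the open-source PuLP modelling library; the evaluation in this chapter solves the model with Gurobi. Two simplifications are made here relative to the formulation above: the two objectives are folded into a single weighted sum, and the capacity constraint is also used to link the placement variables Y to the VM-used indicators X.

```python
# Illustrative (not authoritative) encoding of the ILP baseline in PuLP.
# `instances[d][r]` is instance d's demand for resource r; `vms[v]` carries
# per-resource capacities and a "price" field, with f copies of each VM type
# appearing as separate entries.
import pulp

RESOURCES = ("cpu", "memory", "disk", "data_rate")

def solve_packing(instances, vms, vm_weight=0.01):
    prob = pulp.LpProblem("meezan_ilp_baseline", pulp.LpMinimize)
    Y = {(d, v): pulp.LpVariable(f"Y_{d}_{v}", cat="Binary")
         for d in instances for v in vms}
    X = {v: pulp.LpVariable(f"X_{v}", cat="Binary") for v in vms}

    # Weighted-sum objective: deployment cost (4.4) plus number of VMs (4.5).
    prob += pulp.lpSum(X[v] * vms[v]["price"] for v in vms) + \
            vm_weight * pulp.lpSum(X[v] for v in vms)

    # (4.2) every instance is placed on exactly one VM.
    for d in instances:
        prob += pulp.lpSum(Y[d, v] for v in vms) == 1

    # (4.3) per-resource capacity on each VM; multiplying by X[v] also links
    # the placement variables to the VM-used indicator.
    for v in vms:
        for r in RESOURCES:
            prob += pulp.lpSum(Y[d, v] * instances[d][r] for d in instances) \
                    <= vms[v][r] * X[v]

    prob.solve()
    return {d: next(v for v in vms if pulp.value(Y[d, v]) > 0.5)
            for d in instances}
```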

4.4 EVALUATION

In evaluating Meezan, we are interested in answering the following questions:

• To what extent does Meezan minimize the cost of resources (or VMs) used on cloud platforms?

• To what extent are resources fragmented with Meezan as compared to other packing algorithms?

• Does Meezan successfully ensure that the stream manager per VM does not become a throughput bottleneck?

4.4.1 Evaluation Setup

In order to answer these questions, we select several real-world workloads to evaluate Meezan. Our first workload is the Linear Road Benchmark [54], which consists of three jobs and is also employed by several previous works to evaluate improvements in stream processing systems [93, 169]. It models a road toll network, in which tolls depend on the time of day and level of congestion. It specifies the following queries: (i) detect accidents on the highways (LRB Accident Detection), (ii) notify other drivers of accidents (LRB Accident Notification), and (iii) notify drivers of tolls they have to pay (LRB Toll Calculation). Each of these benchmarks is run as a single topology, where a single spout is responsible for generating the stream of traffic per highway. As we increase the number of spouts, we are able to process information for proportionally more highways. The first two topologies are linear, with depths of 3 and 4, whereas the LRB Toll Calculation job has joins and a depth of 5.

We use the Yahoo! Streaming Benchmark [151] as our second workload, which models a simple advertisement application. This is a linear topology with a depth of 6. There are a number of advertising campaigns, and a number of advertisements for each campaign. The job of the benchmark is to read JSON events, identify them, and store a windowed count of relevant events per campaign into Redis. These steps attempt to probe some common operations performed on data streams. As we scale up the number of spouts, we are able to process proportionally more events.

As our final two workloads, we use 1) a 2-stage Word Count topology that has been used by several previous works [75, 103] and produces words whose frequency is subsequently counted, and 2) a similar Exclamation Topology that appends punctuation to sentences (also used in Figure 4.4a), to evaluate the systems under consideration. We run our evaluation on a 30-node cluster on Emulab, where we provide each job the same choices of VMs that are available to users on AWS [3].

4.4.2 Implementation

Meezan is integrated into Apache Heron [112] as a scheduler; it is an implementation of the predefined IScheduler interface. It starts up as a service that profiles the major VM types on given cloud platforms

to understand the maximum throughput rates stream managers can tolerate on them (Section 4.3.3). This is performed only once during startup, and can be repeated if new VM types are added. When users submit jobs for deployment, they can specify Meezan as their scheduler of choice in the configuration. During job deployment, Meezan expects users to provide jobs that implement a predefined interface that essentially allows Meezan to modify the amount of resources and the parallelism allocated to each job operator. Once jobs are submitted to Meezan, it utilizes this flexible configurability to profile the job to find its optimal resource allocation (Section 4.3.2). Given the results of profiling, Meezan produces its cost-performance spectrum for the job (Section 4.3). The user can then choose a job deployment that satisfies their throughput goals and falls within their budget, and Meezan submits their job for deployment on the relevant infrastructure of the user's cloud platform. Currently, we allow deployment on Emulab; however, hooks can be added for deployment on AWS, Azure and Google Cloud.

4.4.3 Cost Benefits

One of the main goals of Meezan is to provide users with an array of deployment choices with various costs and performance guarantees to choose from. We evaluate how the cost of a job's deployment increases as we scale it out with Meezan, and compare it to the optimal deployment cost. Results for each of our benchmark jobs are shown in Figure 4.8. The y-axis shows the job's scale (or the number of spouts it is consuming data from) and the x-axis represents the total dollar cost of deployment for an hour. We note that in most cases, the cost of deployment provided by Meezan is comparable to the cost provided by the ILP when the largest frequency (25) of each VM type is provided to it. This essentially means that the ILP is able to optimize cost the most when it has more VMs to choose from for placement. Concretely, when the number of each VM type provided to the ILP is 25 and the number of spouts for each job is 10, Meezan's and the ILP's deployment costs for the Accident Detection Topology are the same, at $36.29. In the same circumstances, the cost for the Accident Notification Topology given by the ILP is $79.16, and Meezan's cost is $80.21. We observe the largest difference with the Yahoo! Advertising Topology, where Meezan's cost is $34.37 and the cost provided by the ILP is $28.44, a 20.85% degradation.

Readers will notice that we do not provide a cost comparison as jobs scale beyond 40 spouts at most. This is because beyond 60 spouts, we found that solving the ILP took an exorbitant amount of time, as shown in Figure 4.9. Generally, we note that computation time increases as the job size increases and as the number of each VM type provided to the ILP increases. In reasonable cases, such as the Word Count Topology scenario described above, Gurobi took 2.8 minutes to solve the problem. In extreme cases, such as the Advertising Topology given 25 of each VM type and a job scale of 10 spouts, Gurobi took approximately 75 minutes to solve the problem.

(a) Advertising Topology (b) Exclamation Topology (c) Word Count Topology

(d) LRB Accident Detection (e) LRB Accident Notification (f) LRB Toll Calculation

Figure 4.8: Meezan vs ILP in terms of dollar cost of job deployment. Y-axis indicates increasing number of job spouts. The legend indicates (Scheduler type, Number of each VM Type provided to the ILP). As Meezan is not constrained in terms of the number of VMs it can use, we use 0 for VM frequency.

(a) Advertising Topology (b) Exclamation Topology (c) Word Count Topology

(d) LRB Accident Detection (e) LRB Accident Notification (f) LRB Toll Calculation

Figure 4.9: Time required by Gurobi to solve the ILP for different workloads, with varying frequencies of available VMs (from 3 to 25 of each type).

Readers will also notice that some deployment costs are not provided (e.g., for the LRB Accident Detection Topology where the number of spouts is 30 and the frequency of VMs is 3). In such cases, finding a solution was infeasible: even if all the given VMs were used, resources were insufficient to deploy the entire job.

4.4.4 Fragmentation

Figure 4.10: Fragmentation of resources in deployments of Meezan vs ILP in the Yahoo! Advertising Topology

We show Figures 4.10-4.12, which depict the degree of fragmentation, or wasted resources, per deployment in the cases of the LRB Toll Calculation, Advertising, and Word Count Topologies, which are representative of the remaining cases. The y-axis represents the ratio of each unused resource per VM to the amount available in the VM. In Figures 4.10 and 4.12, we note that the data rate of the stream manager is almost fully consumed, leading to fragmentation of CPU and memory resources. This essentially confirms that the instances placed in each of the VMs are communication-intensive. Figure 4.12 is a particularly pathological case where one instance is placed per VM, as that instance consumes more than half of the VM's data rate. Therefore, CPU, memory and some of the stream manager's resources are fragmented.

Next, we note that Meezan's fragmentation is close to or better than that of the optimal configurations. We compare the product of the ratios of the fragmented resources per policy to evaluate this.

Figure 4.11: Fragmentation of resources in deployments of Meezan vs ILP in the Word Count Topology. The legend indicates (Scheduler type, Number of each VM Type provided to the ILP).

Figure 4.12: Fragmentation of resources in deployments of Meezan vs ILP in the LRB Toll Calculation Topology. The legend indicates (Scheduler type, Number of each VM Type provided to the ILP).

In the best case, for the Toll Calculation Topology where the number of spouts is 10 and the frequency of VMs for the ILP is 10, the fragmentation in Meezan's deployment is 0.76 vs. the 0.97 fragmentation given by the ILP, a 27% improvement. In the worst case, for the Advertising Topology with 3 spouts and a frequency of 10 for each VM type, Meezan does 15% worse than the ILP (Meezan's fragmentation is 0.9 and the ILP's fragmentation is 0.78).
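To make the comparison concrete, the sketch below computes one plausible version of this metric: the product of the unused fractions of each plotted resource per VM, averaged across the VMs of a deployment. The choice of resources and the averaging step are assumptions; the text above specifies only the per-resource product.

```python
# Hedged sketch of a fragmentation metric: per-VM product of unused resource
# fractions, averaged over all VMs in the deployment (assumed aggregation).
RESOURCES = ("cpu", "memory", "data_rate")

def vm_fragmentation(used, capacity):
    frag = 1.0
    for r in RESOURCES:
        frag *= (capacity[r] - used[r]) / capacity[r]  # unused fraction of r
    return frag

def deployment_fragmentation(vms):
    """`vms` is a list of (used, capacity) dictionaries, one pair per VM."""
    scores = [vm_fragmentation(u, c) for u, c in vms]
    return sum(scores) / len(scores)
```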

4.4.5 Throughput

(a) Yahoo! Advertising Benchmark (b) Exclamation Topology (c) WordCount Topology

(d) LRB Accident Detection (e) LRB Accident Notification (f) LRB Toll Calculation

Figure 4.13: Meezan vs ILP in terms of average throughput per-spout for varying workloads as they are scaled out.

Meezan aims to pack instances onto VMs while ensuring that the stream manager in each VM is not bottlenecked by extremely high throughput. Therefore, as we scale jobs out from 1 to 10 spouts and pack them onto VMs, we evaluate the average throughput of each spout in each job and present it in Figure 4.13. If Meezan does its job correctly, the average throughput per spout should remain in the same range. We notice that the average throughput across all spouts in the LRB topologies is similar, irrespective of scheduler choice. This is because each of the spouts performs some computation to generate the LRB workload, which means that their throughput is not high enough to bottleneck the stream manager. This means that regardless of the packing policy, the average throughput per spout remains high. The Advertising Topology, Exclamation Topology and Word Count Topology all have high throughput rates. In the case of the Advertising Topology and the Exclamation Topology, we note that Meezan and the ILP are able to maintain the throughput of each spout at the average produced by a single spout. However, we notice that as RCRR places instances in bins in a round-robin fashion,

it packs instances together without regard to their data rates, causing the stream manager to become a bottleneck. We thus notice that the average throughput per spout with RCRR falls by up to 43% in the case of the Advertising Topology and the Exclamation Topology. In the Word Count Topology, we note that RCRR does as well as Meezan. This is because RCRR is given a very high number of VMs to use (from the ILP, which essentially packs one operator per VM). Because of the large number of VMs available that RCRR must use, it packs one operator per VM. As a result, the stream manager is not bottlenecked and its throughput levels remain high on average.

4.5 RELATED WORK

4.5.1 Stream Processing as a Service

Amazon Kinesis [2] offers itself as a Streaming-as-a-Service platform. It enables easy collection, processing and analysis of real-time streaming data. However, the system leaves performance tuning of the job up to the user. In Amazon Kinesis, the capacity of each data stream is a function of the number of shards that the user specifies for the stream: the total capacity of the stream is the sum of the capacities of its shards. As the input data rate changes, the user needs to increase or decrease the number of shards to maintain throughput. Our goal is to remove this burden from the user's shoulders.

4.5.2 Optimal Resource Allocation in Distributed Systems

Perforator [140] is one of the first systems to tackle the resource allocation problem for DAGs of batch-compute jobs. It models, first, the estimated size of the input data that the job must process and, second, the job's performance, by analyzing the results of hardware calibration queries and using them to ascertain the parallelism of tasks in the execution frameworks. Ernest [158] is a performance prediction framework for batch jobs that uses optimal experiment design, a statistical technique that allows its users to collect as few training points as possible; these points indicate how short runs of each incoming job perform on specific hardware. These training points can be used to model the performance of these jobs on specific hardware with larger input sizes, which can be used to predict near-optimal resource allocations for them. Cherrypick [49] is a system that leverages Bayesian optimization, a method for optimizing black-box functions, to find near-optimal cloud configurations that minimize cloud usage cost and guarantee application performance for general batch-compute applications. The main idea of Cherrypick works as follows: start with a few initial cloud configurations, run them, and input the configuration details and job completion times into the performance model. The system then dynamically picks the next cloud configuration to run based on

the performance model and feeds the result back into the performance model. The system stops when it has enough confidence that a good configuration has been found.

The problem of finding optimal resource allocations for jobs has also been explored in other distributed systems, such as systems for deep learning. FlexFlow [98] is a deep learning engine that employs a guided randomized search to quickly find a parallelization strategy for a deep learning training job on a specific parallel machine. In order to do so, FlexFlow first borrows the idea of measuring the performance of an operator once for each configuration from OptCNN [97], an approach similar to Meezan's, and feeds these measurements into a task graph that models both the architecture of a DNN model and the network topology of a cluster.

Fine-grained resource allocation is also studied for micro-service architectures. MONAD [132] is a self-adaptive infrastructure for micro-services that are meant to run heterogeneous scientific workflows. Using fine-grained scheduling at the task level, MONAD improves the flexibility of workflow composition and execution. It also shares tasks between multiple workflows. Furthermore, it uses feedback control with neural network-based system identification to provide resource adaptation without in-depth knowledge of workflow structures.

The problem of minimizing deployment cost exists in many domains, including serverless computing, which is used to process data across the edge and in data centers. Costless [73] presents an algorithm that optimizes the price of serverless applications on Amazon AWS Lambda. The paper describes factors that affect the price of a deployment: (1) fusing a sequence of functions, (2) splitting functions across edge and cloud resources, and (3) allocating the memory for each function. Costless runs an efficient algorithm to explore different function fusion-placement solutions and chooses one that optimizes the application's price while keeping the latency under a certain threshold.

4.5.3 Video Stream Processing

Video stream processing jobs require additional effort to minimize the massive computational cost of performing computer vision tasks, which Meezan does not yet take into account. For example, one work [94] asks whether video analytics can be scaled such that cost grows sublinearly as more video cameras are deployed, while the inference accuracy of processing remains stable. To achieve this goal, the authors observe that video feeds from wide-area camera deployments demonstrate significant content correlations with other geographically proximate feeds, both spatially and temporally. These correlations can be harnessed to dramatically reduce the size of the inference search space, decreasing both workload and false positive rates in multi-camera video analytics. This problem is also explored for environments where video stream processing jobs are deployed on a hierarchy of clusters. VideoEdge [91] is a system that uses the concept of "dominant" demand to identify the best tradeoff between multiple resources and accuracy, and narrows the search

space for placement on a hierarchical set of clusters, by identifying a "Pareto band" of promising configurations. The system also balances the resource benefits and accuracy penalty of merging queries.

4.5.4 Performance Modelling of Message Brokers

Balasubramanian et al. [56] model a scenario in publish-subscribe systems where the publishers, the broker, and the subscribers are in different administrative domains and have no guarantees with respect to the resources available to each service. Within this context, they focus on dynamically adapting the amount of data that the publishers send to the brokers to prevent backpressure at the brokers. Nguyen et al. [131] model publish-subscribe systems as multiple-class open queueing networks to derive measures of system performance. Using these measures, they solve objective functions whose goal is either to minimize the latency for processing an input, which translates to finding the optimal number of consumers in the system, or to find the least end-to-end latency that can be provided while minimizing the total resource cost of consumers in the system. The former problem is the same as calculating the parallelism of downstream bolts in a continuous-operator stream processing topology. To the best of our knowledge, these are the only works that experimentally evaluate the performance of recent message brokers such as Apache Kafka [111].

4.5.5 Backpressure in Networks

The slowdown of upstream operators in Heron jobs because of a bottlenecked stream manager can be likened to the backpressure exerted in networks due to congestion control [50]. However, the two scenarios have different solutions because the causes of their bottlenecks are different: congestion in networks is caused by limited bandwidth [134] (if the network configuration is correct and well-designed), while stream manager slowdown is caused by a lack of computational resources. Therefore, Meezan removes the stream manager bottleneck by ensuring that each stream manager is not oversubscribed, and only receives as much data as it is able to transfer per unit time. Meezan’s scope does not include handling network-related backpressure.

4.5.6 Scaling in Stream Processing Systems

A great deal of research considers the problem of scaling resources in and out to meet performance goals in stream processing systems. Drizzle [157] adapts the size of the batch in micro-batch stream processing systems such as Spark Streaming [168] to provide low latency and high throughput. DRS

[76] has applied queueing theory to create performance models for continuous-operator stream processing systems, such as Apache Storm [154], that do not have intermediate message brokers. Tri Minh et al. [155] consider the performance of stream processing systems in terms of backpressure and expected utilization. They focus on a queueing theory-based approach that is used to predict the throughput and latency of stream processing while supporting system stability in the face of bursty input. Dhalion [75] proposes the use of self-regulating stream processing systems and provides a prototype with some capabilities that can diagnose problems, e.g., backpressure in Heron topologies, and can take actions to mitigate the problem. Its corrective actions are heuristics; in contrast, DS2 [102] provides scaling methods that are based on precise processing and output rates of individual dataflow operators. Henge [104] supports SLO-driven multi-tenancy in clusters of limited resources. The authors of [59] use a hill-climbing algorithm with a new heuristic sampling approach based on Latin Hypercube sampling to automate the process of tuning some of the configuration parameters of continuous-operator systems like Apache Storm [154]. However, the work is not able to cover a majority of the parameters that need to be tuned for optimal performance. Trevor [58] is a parallel work that focuses on auto-configuring stream processing jobs, especially those run on Apache Heron. Like [103], it builds a simple model that compares an instance's input rate with its output rate to model the instance's behavior, and uses heuristics to ensure that the stream manager in each container is not bottlenecked. However, it does not aim to minimize the amount of resources used by a job or to lower the cost of running the job for the user.

4.6 CONCLUSION

Meezan is a scheduler for stream processing systems that provides users with a range of possible deployments for their jobs. Each deployment has a different price and a different performance guarantee; the most performant choice is associated with the highest cost, and vice versa. We compare Meezan with a multi-objective optimization framework that, given a job DAG and a fixed set of VMs, optimizes the total cost of deployment and minimizes the number of containers used for a job. Meezan is able to ensure that the choice of deployment it provides users is within 20.85% of the cost found by the optimization framework. Additionally, as it is able to open bins of its choice while the optimization framework is limited to a fixed set of bins, it is able to reduce resource fragmentation by up to 27% over the framework's solution. While packing, Meezan ensures that jobs are not bottlenecked by components that are transparent to the job developer, and ensures that as jobs scale up, their average throughput increases in proportion to their scale.

Chapter 5: Conclusion and Future Work

Even though stream processing systems have become increasingly prevalent with the increased use of applications that require low-latency responses, they still face several challenges in achieving performance goals in different scheduling environments. In multi-tenant environments with limited shared resources, jobs have to be assigned resources after carefully considering their complexity, priorities and performance requirements. These resource allocations must change over time as jobs receive input at varying rates. On the other hand, when scheduling jobs onto public cloud services such as Amazon AWS [3] or Microsoft Azure [30], where resources are virtually unlimited, we must carefully determine the optimal VM choices, in terms of cost and resources, that will help us achieve our performance goals. Such problems motivate the need for both online and predictive mechanisms that determine optimal resource allocations for jobs before they are launched, and that continue to adapt their allocations to dynamically changing conditions. In this thesis, we present practical techniques for achieving performance goals, or Service Level Objectives (SLOs) [38], in stream processing systems that are used by a variety of users in different data center environments.

1. Henge [104] is an online scheduler for stream processing systems that addresses the challenge of adapting job resource allocations in a multi-tenant environment with limited resources to maximize cluster utility. It does so by ensuring that the SLOs of higher-priority jobs that provide higher utility to the organization are met first, and, if resources are available, it ensures that the SLOs of lower-priority jobs are also met. Henge reduces intrusiveness by making the fewest possible modifications to each job. In addition, we show that Henge provably converges if job input rates stabilize, i.e., once the input rates of all jobs stabilize, Henge is always able to find a resource allocation for all jobs in the cluster that maximizes cluster utility.

2. Caladrius [103] is a scheduler for stream processing jobs, built in collaboration with Twitter, that predicts changes in the input rates of a job and scales it out (or in) preemptively to ensure that the job's throughput SLOs are always met, while reducing resource wastage. In order to make predictions of future input rates, Caladrius uses Facebook's Prophet [34], a time-series forecasting tool that is optimized for predicting observations that depict strong human-scale seasonalities, which our experience at Twitter shows stream processing jobs most often do. Given these forecasted changes, Caladrius models components in the job graph and recalculates their parallelism levels to ensure that they are not bottlenecked by predicted increases in input rate. Based on the job's new parallelism level, Caladrius derives a new resource allocation to which the job can be scaled out preemptively, ensuring that the job does not miss its SLO at any point in time.

3. Meezan allows novice users of stream processing to select a resource allocation for their stream processing job on public cloud services that will meet their throughput-based performance goals while reducing the cost of deployment. In Meezan, we present two contributions: 1) we present a case study that models how dedicated VMs are priced on Amazon and Azure, and, using insights from our study, 2) we propose a novel packing algorithm that minimizes the cost of job deployments on these platforms while ensuring that the job's throughput SLO is met. A fundamental contribution of our packing algorithm is that it takes into account components of job graphs that are transparent to the job developer but fundamental to the job's function, such as the message brokers that are responsible for data transmission among the job's components.

The steps needed to deploy jobs in each environment are described in each system's respective chapter. To productionize these systems, some engineering work is required to hook them up with existing company infrastructure. This thesis lays the groundwork for several future directions:

Message Brokers as In-Network Components: In recent years, the academic community has proposed using in-network computation to implement a wide variety of functionality that is normally performed by software systems [138]. This includes concurrency control [96, 115], aggregation primitives [143], consensus protocols [67, 68, 116, 137], and query processing operators [83, 113]. Other work offloads entire applications to programmable devices, including key-value stores [153], network protocols like DNS [153], and even industrial feedback control [142]. We note through Meezan that Heron topologies have a stream manager that is a strong candidate for in-network processing. We observe that the stream manager is responsible for receiving data from local operators and passing it on to destination operators that are either local or remote. If the operators are placed remotely, a stream manager simply passes the data to the stream manager that is local to the destination operator, and that stream manager passes the data on to the local operator. Therefore, the role of the stream manager is very much like that of a network switch. It is also a good candidate for in-network processing because it does not perform complex computation on data: it has been designed to simplify the role of operators, so that operators do not maintain connections with remote operators and simply pass their data to the stream manager. Additionally, the stream manager must only maintain connections with local operators and remote stream managers, instead of maintaining connections with all operators. The essential work the stream manager does is determining which data packet should be sent to which destination, and batching packets together to be sent to the recipient. By moving this task into the network, we can save the cost of moving data from the transport layer to the application layer, where it has to be deserialized to be read. This can potentially lead to significant reductions in the end-to-end latency of a

tuple as it moves throughout the topology, especially as we note that there can be multiple stream managers in its path.

Satisfying Global Latency SLOs: In this thesis, we have defined latency SLOs with respect to the amount of time it takes a tuple to be processed fully in a stream processing topology running in a data center. However, most stream processing applications process data that arrives into data centers from across the world and require that responses be received at the sender within a few seconds. Such "global" latency goals can be very business-critical. For example, if a ride-sharing app is not able to send a response back to a rider with a driver match within a few seconds, it risks losing the rider to a competitor [28]. However, in situations where requests and responses are sent over the WAN, network congestion can make it difficult to promise low latencies [60, 100]. A possible future direction of this thesis is to derive minimum and maximum latency SLOs that can be guaranteed to users and met with a defined degree of reliability.

Stream Processing as SIMD Applications: Stream processing jobs are applications that can be naturally run on SIMD (Single Instruction, Multiple Data [37]) architectures, as their job operators involve applying the same instruction to multiple pieces of incoming data. This means that we can easily use GPUs to speed up the performance of stream processing applications to meet SLOs, especially those running on micro-batch streaming architectures [69] such as Spark Streaming [168], which attempt to process small batches of data at a time. An interesting direction would involve discovering the extent to which we can automatically translate a given application’s code that is meant to run on a CPU to one that would run as a SIMD application. For instance, operations such as joins would not be sped up by using GPUs.

Defining Fairness in Multi-Tenant Stream Processing Clusters: Defining fair shares of tasks in distributed systems is challenging, and some works have explored fairness in batch data analytics systems [80, 81]. This task is challenging because although fairness requires an "equal" share of resources across jobs, one job's share might be an excess for it, whereas another job's share might not even be enough for a single operator of the job. However, fairness is important, especially in stream processing, as jobs are long-running and even some amount of starvation can lead to backlogs of data. In Henge, we define fairness in accordance with a job's priority. In batch computation, several fairness-related measures have been developed, such as dominant resource fairness [80], which are not directly applicable as stream processing jobs face more dynamism and their dominant resource share might change over time. Therefore, a possible future direction is to express fairness based on performance goals – for example, a fair allocation would be one where all jobs of the same

priority achieve the same percentage of their performance goal – and to describe its fair-division properties [61].

Generalizing Workloads, Job Components, and Resources: Machine Learning Inference Workloads: Stream processing jobs can mirror the behavior of other long-running jobs such as machine learning inference jobs. However, the computation in machine learning DAGs can be quite different from the general operators used in stream processing. In addition, these jobs can also require the usage of more heterogeneous resources such as GPUs and TPUs [14]. A possible future direction is to define and achieve SLOs of ML inference jobs in different cloud environments. Although this topic has begun to gain traction very recently [82, 171], an interesting direction is to adaptively move inference components from data center environments to edge devices [144] in order to reduce end-to-end latency.

External Bottlenecks: In the same vein, another possible future direction is to evaluate systems such as Apache Samza [6] and Flink [4] that utilize message brokers, e.g., Apache Kafka [111], in place of stream managers in stream processing systems. As these message brokers also write data to disk, they can be IO-bound and can form potential bottlenecks in stream processing jobs that have strict, low-latency SLOs. A possible direction of future work is to determine the optimal number of partitions and message brokers for a job to achieve its latency or throughput SLO.

Dynamically Changing DAGs: Within this thesis, we focus on stream processing DAGs that do not change at runtime. However, in production deployments, it is possible that operators are dynamically added to jobs to deal with challenges at runtime, such as unexpectedly high input and failures. It is also common that such dynamic additions are short-lived. An interesting direction to explore is A) when such dynamic changes are necessitated and when they should be added, and B) how to allocate these operators to maximize resource utilization and reduce deployment cost.

Theoretical Guarantees: An interesting obstacle to using Meezan in production is that users have to be convinced of its efficacy in order to allow it to make decisions for them that cost real money. For example, given that Netflix has thousands of data shards (streams) coming into AWS Kinesis (a streaming platform on AWS) daily [33], we can estimate that they spend $0.36 (cost per day) × 1000 (streams) × 30 (days in a month) = $10,800 per month for only a thousand streams [9]. As these numbers are quite large, it is natural that users would like assurance that Meezan's packing policy minimizes the cost of deployment. In order to overcome this obstacle, a possible future work is to derive theoretical bounds on the best and worst cases that Meezan can provide.

References

[1] Amazon EC2 Pricing. https://aws.amazon.com/ec2/pricing/on-demand/. Last Visited: Sunday 12th July, 2020.

[2] Amazon Kinesis. https://aws.amazon.com/kinesis/data-streams/faqs/. Last Visited: Sunday 12th July, 2020.

[3] Amazon Web Services. https://aws.amazon.com/. Last Visited: Sunday 12th July, 2020.

[4] Apache Flink. http://flink.apache.org/. Last Visited: Sunday 12th July, 2020.

[5] Apache Flume. https://flume.apache.org/. Last Visited: Sunday 12th July, 2020.

[6] Apache Samza. http://samza.apache.org/. Last Visited: 03/2016.

[7] Apache Storm. http://storm.apache.org/. Last Visited: Sunday 12th July, 2020.

[8] Apache Zookeeper. http://zookeeper.apache.org/. Last Visited: Sunday 12th July, 2020.

[9] AWS Kinesis Pricing. https://aws.amazon.com/kinesis/data-streams/pricing/. Last Visited: Sunday 12th July, 2020.

[10] CPU Load. http://www.linuxjournal.com/article/9001. Last Visited: Sunday 12th July, 2020.

[11] EC2 Instance Types. https://www.ec2instances.info/?selected=g4dn.2xlarge. Last Visited: Sunday 12th July, 2020.

[12] Elasticity in Spark Core. http://www.ibmbigdatahub.com/blog/explore-true-elasticity-spark/. Last Visited: Sunday 12th July, 2020.

[13] EPA-HTTP Trace. http://ita.ee.lbl.gov/html/contrib/EPA-HTTP.html. Last Visited: Sunday 12th July, 2020.

[14] Google Cloud TPU. https://cloud.google.com/tpu. Last Visited: Sunday 12th July, 2020.

[15] Heron Github Issue 1125. https://github.com/apache/incubator-heron/issues/1125. Last Visited: Sunday 12th July, 2020.

[16] Heron Github Issue 119. https://github.com/apache/incubator-heron/issues/119. Last Visited: Sunday 12th July, 2020.

[17] Heron Github Issue 1389. https://github.com/apache/incubator-heron/issues/1389. Last Visited: Sunday 12th July, 2020.

[18] Heron Github Issue 1577. https://github.com/apache/incubator-heron/issues/1577. Last Visited: Sunday 12th July, 2020.

[19] Heron Github Issue 1888. https://github.com/apache/incubator-heron/issues/1888. Last Visited: Sunday 12th July, 2020.

[20] Heron Github Issue 1947. https://github.com/apache/incubator-heron/issues/1947. Last Visited: Sunday 12th July, 2020.

[21] Heron Github Issue 2454. https://github.com/apache/incubator-heron/issues/2454. Last Visited: Sunday 12th July, 2020.

[22] Heron Github Issue 266. https://github.com/apache/incubator-heron/issues/266. Last Visited: Sunday 12th July, 2020.

[23] Heron Github Issue 2803. https://github.com/apache/incubator-heron/issues/2803. Last Visited: Sunday 12th July, 2020.

[24] Heron Github Issue 3396. https://github.com/apache/incubator-heron/issues/3396. Last Visited: Sunday 12th July, 2020.

[25] Heron Github Issue 3494. https://github.com/apache/incubator-heron/issues/3494. Last Visited: Sunday 12th July, 2020.

[26] Heron Github Issue 3504. https://github.com/apache/incubator-heron/issues/3504. Last Visited: Sunday 12th July, 2020.

[27] Heron Trouble Shooting. https://apache.github.io/incubator-heron/docs/developers/troubleshooting/. Last Visited: Sunday 12th July, 2020.

[28] Introducing AthenaX, Uber Engineering’s Open Source Streaming Analytics Platform. https://eng.uber.com/athenax/. Last Visited: Sunday 12th July, 2020.

[29] Kafka Streams. https://kafka.apache.org/documentation/streams/. Last Visited: Sunday 12th July, 2020.

[30] Microsoft Azure. https://azure.microsoft.com/en-us/. Last Visited: Sunday 12th July, 2020.

[31] Microsoft Azure: Linux Virtual Machines Pricing. https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/. Last Visited: Sunday 12th July, 2020.

[32] Multi-Objective Optimization. https://en.wikipedia.org/wiki/Multi-objective_optimization. Last Visited: Sunday 12th July, 2020.

[33] Netflix Streaming Use Cases. https://aws.amazon.com/solutions/case-studies/netflix-kinesis-streams/. Last Visited: Sunday 12th July, 2020.

[34] Prophet: Forecasting at Scale. https://research.fb.com/blog/2017/02/prophet-forecasting-at-scale/. Last Visited: Sunday 12th July, 2020.

[35] S4. http://incubator.apache.org/s4/. Last Visited: 03/2016.

[36] SDSC-HTTP Trace. http://ita.ee.lbl.gov/html/contrib/SDSC-HTTP.html. Last Visited: Sunday 12th July, 2020.

[37] SIMD. https://en.wikipedia.org/wiki/SIMD. Last Visited: Sunday 12th July, 2020.

[38] SLOs. https://en.wikipedia.org/wiki/Service_level_objective. Last Visited: Sunday 12th July, 2020.

[39] Storm 0.8.2 Release Notes. http://storm.apache.org/2013/01/11/storm082-released.html/. Last Visited: Sunday 12th July, 2020.

[40] Storm Applications. http://storm.apache.org/Powered-By.html. Last Visited: Sunday 12th July, 2020.

[41] Storm MultiTenant Scheduler. http://storm.apache.org/releases/2.0.0-SNAPSHOT/SECURITY.html. Last Visited: Sunday 12th July, 2020.

[42] Uber Streaming Use Cases. https://www.youtube.com/watch?v=YUBPimFvcN4. Last Visited: Sunday 12th July, 2020.

[43] Why Multitenancy Matters In The Cloud. https://en.wikipedia.org/wiki/Multitenancy#Cost_savings. Last Visited: Sunday 12th July, 2020.

[44] Zillow Streaming Use Cases. https://aws.amazon.com/solutions/case-studies/zillow-zestimate/. Last Visited: Sunday 12th July, 2020.

[45] Daniel Abadi, Donald Carney, Ugur Cetintemel, Mitch Cherniack, Christian Convey, C Erwin, Eduardo Galvez, M Hatoun, Anurag Maskey, Alex Rasin, et al. Aurora: A Data Stream Management System. In Proceedings of the International Conference on Management of Data (SIGMOD), pages 666–666. ACM, 2003.

[46] Daniel J Abadi, Yanif Ahmad, Magdalena Balazinska, Ugur Cetintemel, Mitch Cherniack, Jeong-Hyon Hwang, Wolfgang Lindner, Anurag Maskey, Alex Rasin, Esther Ryvkina, et al. The Design of the Borealis Stream Processing Engine. In Proceedings of the Conference on Innovative Data Systems Research, volume 5, pages 277–289, 2005.

[47] Tyler Akidau, Alex Balikov, Kaya Bekiroğlu, Slava Chernyak, Josh Haberman, Reuven Lax, Sam McVeety, Daniel Mills, Paul Nordstrom, and Sam Whittle. Millwheel: Fault-Tolerant Stream Processing at Internet Scale. In Proceedings of the VLDB Endowment, volume 6, pages 1033–1044. VLDB Endowment, 2013.

[48] Tyler Akidau, Robert Bradshaw, Craig Chambers, Slava Chernyak, Rafael J Fernández-Moctezuma, Reuven Lax, Sam McVeety, Daniel Mills, Frances Perry, Eric Schmidt, et al. The dataflow model: a practical approach to balancing correctness, latency, and cost in massive-scale, unbounded, out-of-order data processing. Proceedings of the VLDB Endowment, 8(12):1792–1803, 2015.

[49] Omid Alipourfard, Hongqiang Harry Liu, Jianshu Chen, Shivaram Venkataraman, Minlan Yu, and Ming Zhang. Cherrypick: Adaptively Unearthing the Best Cloud Configurations for Big Data Analytics. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17), pages 469–482, 2017.

[50] Mark Allman, Vern Paxson, Wright Stevens, et al. TCP Congestion Control. 1999.

[51] Tanvir Amin. Apollo Social Sensing Toolkit. http://apollo3.cs.illinois.edu/datasets.html, 2014. Last Visited: Sunday 12th July, 2020.

[52] Leonardo Aniello, Roberto Baldoni, and Leonardo Querzoni. Adaptive Online Scheduling in Storm. In Proceedings of the 7th ACM International Conference on Distributed Event-Based Systems, pages 207–218. ACM, 2013.

[53] Apache Software Foundation. Apache Kafka Supports 200K Partitions Per Cluster. https://blogs.apache.org/kafka/entry/apache-kafka-supports-more-partitions. Last Visited: Sunday 12th July, 2020.

[54] Arvind Arasu, Mitch Cherniack, Eduardo Galvez, David Maier, Anurag S Maskey, Esther Ryvkina, Michael Stonebraker, and Richard Tibbetts. Linear road: a stream data management benchmark. In Proceedings of the Thirtieth international conference on Very large data bases-Volume 30, pages 480–491, 2004.

[55] Masoud Saeida Ardekani and Douglas B Terry. A Self-Configurable Geo-Replicated Cloud Storage System. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI), pages 367–381, 2014.

[56] Sowmya Balasubramanian, Dipak Ghosal, Kamala Narayanan Balasubramanian Sharath, Eric Pouyoul, Alex Sim, Kesheng Wu, and Brian Tierney. Auto-tuned publisher in a pub/sub system: Design and performance evaluation. In 2018 IEEE International Conference on Autonomic Computing (ICAC), pages 21–30. IEEE, 2018.

[57] Cagri Balkesen, Nesime Tatbul, and M Tamer Özsu. Adaptive Input Admission and Management for Parallel Stream Processing. In Proceedings of the 7th ACM International Conference on Distributed Event-Based Systems, pages 15–26. ACM, 2013.

[58] Manu Bansal, Eyal Cidon, Arjun Balasingam, Aditya Gudipati, Christos Kozyrakis, and Sachin Katti. Trevor: Automatic configuration and scaling of stream processing pipelines. arXiv preprint arXiv:1812.09442, 2018.

[59] Muhammad Bilal and Marco Canini. Towards automatic parameter tuning of stream processing systems. In Proceedings of the 2017 Symposium on Cloud Computing, pages 189–200. ACM, 2017.

[60] Lawrence S. Brakmo and Larry L. Peterson. TCP Vegas: End to end congestion avoidance on a global internet. IEEE Journal on Selected Areas in Communications, 13(8):1465–1480, 1995.

[61] Steven J Brams and Alan D Taylor. Fair Division: From cake-cutting to dispute resolution. Cambridge University Press, 1996.

[62] Paris Carbone, Asterios Katsifodimos, Stephan Ewen, Volker Markl, Seif Haridi, and Kostas Tzoumas. Apache flink: Stream and batch processing in a single engine. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 36(4), 2015.

[63] Raul Castro Fernandez, Matteo Migliavacca, Evangelia Kalyvianaki, and Peter Pietzuch. Integrating Scale-Out and Fault Tolerance in Stream Processing using Operator State Management. In Proceedings of the International Conference on Management of Data (SIGMOD), pages 725–736. ACM, 2013.

[64] Javier Cervino, Evangelia Kalyvianaki, Joaquin Salvachua, and Peter Pietzuch. Adaptive Provisioning of Stream Processing Systems in the Cloud. In Proceedings of the 28th International Conference on Data Engineering Workshops, pages 295–301. IEEE, 2012.

[65] Cloudera. Tuning YARN — Cloudera. http://www.cloudera.com/documentation/enterprise/5-2-x/topics/cdh_ig_yarn_tuning.html, 2016. Last Visited: Sunday 12th July, 2020.

[66] Carlo Curino, Djellel E Difallah, Chris Douglas, Subru Krishnan, Raghu Ramakrishnan, and Sriram Rao. Reservation-Based Scheduling: If You’re Late Don’t Blame Us! In Proceedings of the ACM Symposium on Cloud Computing, pages 1–14. ACM, 2014.

[67] Huynh Tu Dang, Pietro Bressana, Han Wang, Ki Suh Lee, Hakim Weatherspoon, Marco Canini, Fernando Pedone, and Robert Soulé. Network hardware-accelerated consensus. arXiv preprint arXiv:1605.05619, 2016.

[68] Huynh Tu Dang, Pietro Bressana, Han Wang, Ki Suh Lee, Hakim Weatherspoon, Marco Canini, Noa Zilberman, Fernando Pedone, and Robert Soulé. P4xos: Consensus as a network service. Technical report, Research Report 2018-01, USI, http://www.inf.usi.ch/research_publication.htm, 2018.

[69] Tathagata Das, Matei Zaharia, and Patrick Wendell. Diving into Streaming's Execution Model. https://databricks.com/blog/2015/07/30/diving-into-apache-spark-streamings-execution-model.html. Last Visited: Sunday 12th July, 2020.

[70] Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels. Dynamo: Amazon’s Highly Available Key-Value Store. ACM SIGOPS Operating Systems Review, 41(6):205–220, 2007.

[71] Christina Delimitrou and Christos Kozyrakis. Paragon: QoS-Aware Scheduling for Heterogeneous Datacenters. In Proceedings of the 18th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’13, pages 77–88, New York, NY, USA, 2013. ACM.

[72] Christina Delimitrou and Christos Kozyrakis. Quasar: Resource-efficient and QoS-aware Cluster Management. In Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems, ASPLOS ’14, pages 127–144, New York, NY, USA, 2014. ACM.

[73] Tarek Elgamal. Costless: Optimizing cost of serverless computing through function fusion and placement. In 2018 IEEE/ACM Symposium on Edge Computing (SEC), pages 300–312. IEEE, 2018.

[74] Huifang Feng and Yantai Shu. Study on network traffic prediction techniques. In Proceedings of the 2005 International Conference on Wireless Communications, Networking and Mobile Computing, volume 2, pages 1041–1044. IEEE, 2005.

[75] Avrilia Floratou, Ashvin Agrawal, Bill Graham, Sriram Rao, and Karthik Ramasamy. Dhalion: self-regulating stream processing in heron. Proceedings of the VLDB Endowment, 10(12):1825–1836, 2017.

[76] Tom ZJ Fu, Jianbing Ding, Richard TB Ma, Marianne Winslett, Yin Yang, and Zhenjie Zhang. DRS: Dynamic Resource Scheduling for Real-Time Analytics over Fast Streams. In Proceedings of the 35th International Conference on Distributed Computing Systems, pages 411–420. IEEE, 2015.

[77] Bugra Gedik, Henrique Andrade, Kun-Lung Wu, Philip S Yu, and Myungcheol Doo. SPADE: The System S Declarative Stream Processing Engine. In Proceedings of the International Conference on Management of Data (SIGMOD), pages 1123–1134. ACM, 2008.

[78] Bugra Gedik, Scott Schneider, Martin Hirzel, and Kun-Lung Wu. Elastic Scaling for Data Stream Processing. IEEE Transactions on Parallel and Distributed Systems, 25(6):1447–1463, 2014.

[79] Javad Ghaderi, Sanjay Shakkottai, and Rayadurgam Srikant. Scheduling storms and streams in the cloud. In ACM SIGMETRICS Performance Evaluation Review, volume 43, pages 439–440. ACM, 2015.

[80] Ali Ghodsi, Matei Zaharia, Benjamin Hindman, Andy Konwinski, Scott Shenker, and Ion Stoica. Dominant Resource Fairness: Fair Allocation of Multiple Resource Types. In Proceedings of the 8th USENIX Symposium on Networked Systems Design and Implementation (NSDI), volume 11, pages 24–24, 2011.

[81] Robert Grandl, Ganesh Ananthanarayanan, Srikanth Kandula, Sriram Rao, and Aditya Akella. Multi-Resource Packing for Cluster Schedulers. ACM SIGCOMM Computer Communication Review, 44(4):455–466, 2015.

[82] Arpan Gujarati, Sameh Elnikety, Yuxiong He, Kathryn S McKinley, and Björn B Brandenburg. Swayam: Distributed autoscaling to meet SLAs of machine learning inference services with resource efficiency. In Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference, pages 109–120, 2017.

[83] Arpit Gupta, Rob Harrison, Marco Canini, Nick Feamster, Jennifer Rexford, and Walter Willinger. Sonata: Query-driven streaming network telemetry. In Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication, pages 357–371, 2018.

[84] Thomas Heinze, Zbigniew Jerzak, Gregor Hackenbroich, and Christof Fetzer. Latency-Aware Elastic Scaling for Distributed Data Stream Processing Systems. In Proceedings of the 8th ACM International Conference on Distributed Event-Based Systems, pages 13–22. ACM, 2014.

[85] Thomas Heinze, Valerio Pappalardo, Zbigniew Jerzak, and Christof Fetzer. Auto-Scaling Techniques for Elastic Data Stream Processing. In Proceedings of the 30th International Conference on Data Engineering Workshops, pages 296–302. IEEE, 2014.

[86] Thomas Heinze, Lars Roediger, Andreas Meister, Yuanzhen Ji, Zbigniew Jerzak, and Christof Fetzer. Online Parameter Optimization for Elastic Data Stream Processing. In Proceedings of the 6th ACM Symposium on Cloud Computing, pages 276–287. ACM, 2015.

[87] Herodotos Herodotou and Shivnath Babu. Profiling, what-if analysis, and cost-based optimization of MapReduce programs. Proceedings of the VLDB Endowment, 4(11):1111–1122, 2011.

[88] Herodotos Herodotou, Fei Dong, and Shivnath Babu. No one (cluster) size fits all: automatic cluster sizing for data-intensive analytics. In Proceedings of the 2nd ACM Symposium on Cloud Computing, pages 1–14, 2011.

[89] Herodotos Herodotou, Harold Lim, Gang Luo, Nedyalko Borisov, Liang Dong, Fatma Bilgen Cetin, and Shivnath Babu. Starfish: A self-tuning system for big data analytics. In CIDR, 2011.

[90] Benjamin Hindman, Andy Konwinski, Matei Zaharia, Ali Ghodsi, Anthony D Joseph, Randy H Katz, Scott Shenker, and Ion Stoica. Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center. In Proceedings of the 8th USENIX Symposium on Networked Systems Design and Implementation (NSDI), volume 11, pages 22–22, 2011.

[91] Chien-Chun Hung, Ganesh Ananthanarayanan, Peter Bodik, Leana Golubchik, Minlan Yu, Paramvir Bahl, and Matthai Philipose. VideoEdge: Processing Camera Streams using Hierarchical Clusters. In 2018 IEEE/ACM Symposium on Edge Computing (SEC), pages 115–131. IEEE, 2018.

[92] Van Jacobson. Congestion Avoidance and Control. In Proceedings of the ACM SIGCOMM Computer Communication Review, volume 18, pages 314–329. ACM, 1988.

[93] Navendu Jain, Lisa Amini, Henrique Andrade, Richard King, Yoonho Park, Philippe Selo, and Chitra Venkatramani. Design, Implementation, and Evaluation of the Linear Road Benchmark on the Stream Processing Core. In Proceedings of the International Conference on Management of Data (SIGMOD), pages 431–442. ACM, 2006.

[94] Samvit Jain, Ganesh Ananthanarayanan, Junchen Jiang, Yuanchao Shu, and Joseph Gonzalez. Scaling Video Analytics Systems to Large Camera Deployments. In Proceedings of the 20th International Workshop on Mobile Computing Systems and Applications, pages 9–14, 2019.

[95] Virajith Jalaparti, Hitesh Ballani, Paolo Costa, Thomas Karagiannis, and Ant Rowstron. Bridging the tenant-provider gap in cloud services. In Proceedings of the Third ACM Symposium on Cloud Computing, pages 1–14, 2012.

[96] Theo Jepsen, Leandro Pacheco de Sousa, Masoud Moshref, Fernando Pedone, and Robert Soulé. Infinite resources for optimistic concurrency control. In Proceedings of the 2018 Morning Workshop on In-Network Computing, pages 26–32, 2018.

[97] Zhihao Jia, Sina Lin, Charles R Qi, and Alex Aiken. Exploring Hidden Dimensions in Accelerating Convolutional Neural Networks. In Proceedings of Machine Learning Research, PMLR, 2018.

[98] Zhihao Jia, Matei Zaharia, and Alex Aiken. Beyond Data and Model Parallelism for Deep Neural Networks. In SysML 2019, 2019.

[99] Junchen Jiang, Ganesh Ananthanarayanan, Peter Bodik, Siddhartha Sen, and Ion Stoica. Chameleon: Scalable Adaptation of Video Analytics. In Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication, pages 253–266, 2018.

[100] Cheng Jin, David X Wei, and Steven H Low. FAST TCP: Motivation, Architecture, Algorithms, Performance. In IEEE INFOCOM 2004, volume 4, pages 2490–2501. IEEE, 2004.

[101] Sangeetha Abdu Jyothi, Carlo Curino, Ishai Menache, Shravan Matthur Narayanamurthy, Alexey Tumanov, Jonathan Yaniv, Íñigo Goiri, Subru Krishnan, Janardhan Kulkarni, and Sriram Rao. Morpheus: Towards Automated SLOs for Enterprise Clusters. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation, page 117, 2016.

[102] Vasiliki Kalavri, John Liagouris, Moritz Hoffmann, Desislava Dimitrova, Matthew Forshaw, and Timothy Roscoe. Three steps is all you need: fast, accurate, automatic scaling decisions for distributed streaming dataflows. In 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI 18), pages 783–798, 2018.

[103] Faria Kalim, Thomas Cooper, Huijun Wu, Yao Li, Ning Wang, Neng Lu, Maosong Fu, Xiaoyao Qian, Hao Luo, Da Cheng, et al. Caladrius: A Performance Modelling Service for Distributed Stream Processing Systems. In 2019 IEEE 35th International Conference on Data Engineering (ICDE), pages 1886–1897. IEEE, 2019.

[104] Faria Kalim, Le Xu, Sharanya Bathey, Richa Meherwal, and Indranil Gupta. Henge: Intent- driven Multi-Tenant Stream Processing. In Proceedings of the ACM Symposium on Cloud Computing, SoCC ’18, pages 249–262, New York, NY, USA, 2018. ACM.

[105] E. Kalyvianaki, W. Wiesemann, Q. H. Vu, D. Kuhn, and P. Pietzuch. SQPR: Stream Query Planning with Reuse. In Proceedings of the 27th International Conference on Data Engineering, pages 840–851, April 2011.

[106] Evangelia Kalyvianaki, Themistoklis Charalambous, Marco Fiscato, and Peter Pietzuch. Overload Management in Data Stream Processing Systems with Latency Guarantees. In Proceedings of the 7th IEEE International Workshop on Feedback Computing (Feedback Computing), 2012.

[107] Evangelia Kalyvianaki, Marco Fiscato, Theodoros Salonidis, and Peter Pietzuch. Themis: Fairness in Federated Stream Processing under Overload. In Proceedings of the 2016 International Conference on Management of Data, pages 541–553. ACM, 2016.

[108] Wilhelm Kleiminger, Evangelia Kalyvianaki, and Peter Pietzuch. Balancing Load in Stream Processing with the Cloud. In Proceedings of the 27th International Conference on Data Engineering Workshops, pages 16–21. IEEE, 2011.

[109] T. Knauth and C. Fetzer. Scaling Non-Elastic Applications Using Virtual Machines. In Proceedings of the IEEE International Conference on Cloud Computing, pages 468–475, July 2011.

[110] Hermann Kopetz. Real-Time Systems: Design Principles for Distributed Embedded Applications. Springer, 2011.

[111] Jay Kreps, Neha Narkhede, Jun Rao, et al. Kafka: A Distributed Messaging System for Log Processing. In Proceedings of the NetDB, pages 1–7, 2011.

[112] Sanjeev Kulkarni, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kellogg, Sailesh Mittal, Jignesh M Patel, Karthik Ramasamy, and Siddarth Taneja. Twitter Heron: Stream Processing at Scale. In Proceedings of the International Conference on Management of Data (SIGMOD), pages 239–250. ACM, 2015.

[113] Alberto Lerner, Rana Hussein, and Philippe Cudre-Mauroux. The case for network accelerated query processing. In CIDR, 2019.

[114] Boduo Li, Yanlei Diao, and Prashant Shenoy. Supporting Scalable Analytics with Latency Constraints. In Proceedings of the VLDB Endowment, volume 8, pages 1166–1177. VLDB Endowment, 2015.

[115] Jialin Li, Ellis Michael, and Dan RK Ports. Eris: Coordination-free consistent transactions using in-network concurrency control. In Proceedings of the 26th Symposium on Operating Systems Principles, pages 104–120, 2017.

[116] Jialin Li, Ellis Michael, Naveen Kr Sharma, Adriana Szekeres, and Dan RK Ports. Just say NO to Paxos overhead: Replacing consensus with network ordering. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16), pages 467–483, 2016.

[117] Teng Li, Jian Tang, and Jielong Xu. Performance modeling and predictive scheduling for distributed stream data processing. IEEE Transactions on Big Data, 2(4):353–364, 2016.

[118] Yao Liang. Real-time VBR video traffic prediction for dynamic bandwidth allocation. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 34(1):32–47, 2004.

[119] Simon Loesing, Martin Hentschel, Tim Kraska, and Donald Kossmann. Stormy: An Elastic and Highly Available Streaming Service in the Cloud. In Proceedings of the Joint EDBT/ICDT Workshops, pages 55–60. ACM, 2012.

[120] Yuanqiu Luo and Nirwan Ansari. Limited sharing with traffic prediction for dynamic bandwidth allocation and QoS provisioning over Ethernet passive optical networks. Journal of Optical Networking, 4(9):561–572, 2005.

[121] Jonathan Mace, Peter Bodik, Rodrigo Fonseca, and Madanlal Musuvathi. Retro: Targeted Resource Management in Multi-Tenant Distributed Systems. In Proceedings of the 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI), pages 589–603, 2015.

[122] Markets and Markets. Streaming analytics market worth 35.5 Billion USD by 2024. https://www.marketsandmarkets.com/Market-Reports/streaming-analytics-market-64196229.html. Last Visited: Sunday 12th July, 2020.

[123] Markets and Markets. Video streaming market worth 7.5 Billion USD by 2022. https://www.marketsandmarkets.com/Market-Reports/video-streaming-market-181135120.html. Last Visited: Sunday 12th July, 2020.

[124] Yuan Mei, Luwei Cheng, Vanish Talwar, Michael Y Levin, Gabriela Jacques-Silva, Nikhil Simha, Anirban Banerjee, Brian Smith, Tim Williamson, Serhat Yilmaz, et al. Turbine: Facebook’s service management platform for stream processing. In 2020 IEEE 36th International Conference on Data Engineering (ICDE), 2020.

[125] Alok Misra. Why Multitenancy Matters In The Cloud. https://www.informationweek.com/cloud/why-multitenancy-matters-in-the-cloud/d/d-id/1087206. Last Visited: Sunday 12th July, 2020.

[126] Rajeev Motwani, Jennifer Widom, Arvind Arasu, Brian Babcock, Shivnath Babu, Mayur Datar, Gurmeet Singh Manku, Chris Olston, Justin Rosenstein, and Rohit Varma. Query processing, approximation, and resource management in a data stream management system. In CIDR, pages 245–256, 2003.

[127] Derek G. Murray, Frank McSherry, Rebecca Isaacs, Michael Isard, Paul Barham, and Martín Abadi. Naiad: A Timely Dataflow System. In Proceedings of the 24th ACM Symposium on Operating Systems Principles, SOSP ’13, pages 439–455, New York, NY, USA, 2013. ACM.

[128] Mor Naaman, Amy Xian Zhang, Samuel Brody, and Gilad Lotan. On the Study of Diurnal Urban Routines on Twitter. In Proceedings of the 6th International AAAI Conference on Weblogs and Social Media, 2012.

[129] M. A. U. Nasir, G. De Francisci Morales, D. García-Soriano, N. Kourtellis, and M. Serafini. The Power of Both Choices: Practical Load Balancing for Distributed Stream Processing Engines. In Proceedings of the 31st International Conference on Data Engineering (ICDE), pages 137–148, April 2015.

[130] Muhammad Anis Uddin Nasir, Gianmarco De Francisci Morales, Nicolas Kourtellis, and Marco Serafini. When Two Choices are Not Enough: Balancing at Scale in Distributed Stream Processing. In Proceedings of the 32nd International Conference on Data Engineering (ICDE), pages 589–600. IEEE, May 2016.

[131] Phuong Nguyen and Klara Nahrstedt. Resource Management for Elastic Publish Subscribe Systems: A Performance Modeling-Based Approach. In Proceedings of the 2016 IEEE 9th International Conference on Cloud Computing (CLOUD), pages 561–568. IEEE, 2016.

[132] Phuong Nguyen and Klara Nahrstedt. Monad: Self-adaptive micro-service infrastructure for heterogeneous scientific workflows. In 2017 IEEE International Conference on Autonomic Computing (ICAC), pages 187–196. IEEE, 2017.

[133] Kay Ousterhout, Christopher Canel, Sylvia Ratnasamy, and Scott Shenker. Monotasks: Architecting for Performance Clarity in Data Analytics Frameworks. In Proceedings of the 26th Symposium on Operating Systems Principles (SOSP), 2017.

[134] C. M. Pazos, J. C. Sanchez Agrelo, and M. Gerla. Using back-pressure to improve TCP performance with many flows. In IEEE INFOCOM ’99, volume 2, pages 431–438, 1999.

[135] Boyang Peng, Mohammad Hosseini, Zhihao Hong, Reza Farivar, and Roy Campbell. R-Storm: Resource-Aware Scheduling in Storm. In Proceedings of the 16th Annual Middleware Conference, pages 149–161. ACM, 2015.

[136] Peter Pietzuch, Jonathan Ledlie, Jeffrey Shneidman, Mema Roussopoulos, Matt Welsh, and Margo Seltzer. Network-aware operator placement for stream-processing systems. In Proceedings of the 22nd International Conference on Data Engineering, pages 49–49. IEEE, 2006.

[137] Dan RK Ports, Jialin Li, Vincent Liu, Naveen Kr Sharma, and Arvind Krishnamurthy. Designing distributed systems using approximate synchrony in data center networks. In 12th USENIX Symposium on Networked Systems Design and Implementation (NSDI 15), pages 43–57, 2015.

[138] Dan RK Ports and Jacob Nelson. When should the network be the computer? In Proceedings of the Workshop on Hot Topics in Operating Systems, pages 209–215, 2019.

[139] Zhengping Qian, Yong He, Chunzhi Su, Zhuojie Wu, Hongyu Zhu, Taizhi Zhang, Lidong Zhou, Yuan Yu, and Zheng Zhang. TimeStream: Reliable Stream Computation in the Cloud. In Proceedings of the 8th ACM European Conference on Computer Systems, EuroSys ’13, pages 1–14, New York, NY, USA, 2013. ACM.

[140] Kaushik Rajan, Dharmesh Kakadia, Carlo Curino, and Subru Krishnan. PerfOrator: Eloquent Performance Models for Resource Optimization. In Proceedings of the Seventh ACM Symposium on Cloud Computing, pages 415–427, 2016.

[141] Navaneeth Rameshan, Ying Liu, Leandro Navarro, and Vladimir Vlassov. Hubbub-Scale: Towards Reliable Elastic Scaling under Multi-Tenancy. In Proceedings of the 16th International Symposium on Cluster, Cloud and Grid Computing (CCGrid), pages 233–244. IEEE, 2016.

[142] Jan Rüth, René Glebke, Klaus Wehrle, Vedad Causevic, and Sandra Hirche. Towards in-network industrial feedback control. In Proceedings of the 2018 Morning Workshop on In-Network Computing, pages 14–19, 2018.

[143] Amedeo Sapio, Marco Canini, Chen-Yu Ho, Jacob Nelson, Panos Kalnis, Changhoon Kim, Arvind Krishnamurthy, Masoud Moshref, Dan RK Ports, and Peter Richtárik. Scaling distributed machine learning with in-network aggregation. arXiv preprint arXiv:1903.06701, 2019.

[144] Mahadev Satyanarayanan. The Emergence of Edge Computing. Computer, 50(1):30–39, 2017.

[145] Benjamin Satzger, Waldemar Hummer, Philipp Leitner, and Schahram Dustdar. Esc: Towards An Elastic Stream Computing Platform for the Cloud. In Proceedings of the 4th International Conference on Cloud Computing, pages 348–355. IEEE, 2011.

[146] Scott Schneider, Henrique Andrade, Buğra Gedik, Alain Biem, and Kun-Lung Wu. Elastic Scaling of Data Parallel Operators in Stream Processing. In Proceedings of the International Parallel and Distributed Processing Symposium, pages 1–12. IEEE, 2009.

[147] David Shue, Michael J Freedman, and Anees Shaikh. Performance Isolation and Fairness for Multi-Tenant Cloud Storage. In Proceedings of the 10th Symposium on Operating Systems Design and Implementation, volume 12, pages 349–362, 2012.

[148] Rebecca Taft, Essam Mansour, Marco Serafini, Jennie Duggan, Aaron J. Elmore, Ashraf Aboulnaga, Andrew Pavlo, and Michael Stonebraker. E-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing Systems. In Proceedings of the VLDB Endowment, volume 8, pages 245–256. VLDB Endowment, November 2014.

[149] Nesime Tatbul, Yanif Ahmad, Uğur Çetintemel, Jeong-Hyon Hwang, Ying Xing, and Stan Zdonik. Load Management and High Availability in the Borealis Distributed Stream Processing Engine. GeoSensor Networks, pages 66–85, 2006.

[150] Sean J Taylor and Benjamin Letham. Forecasting at scale. The American Statistician, 72(1):37–45, 2018.

[151] Yahoo! Storm Team. Benchmarking Streaming Computation Engines at Yahoo! https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at, 2015. Last Visited: Sunday 12th July, 2020.

[152] Douglas B Terry, Vijayan Prabhakaran, Ramakrishna Kotla, Mahesh Balakrishnan, Marcos K Aguilera, and Hussam Abu-Libdeh. Consistency-Based Service Level Agreements for Cloud Storage. In Proceedings of the 24th ACM Symposium on Operating Systems Principles, pages 309–324. ACM, 2013.

[153] Yuta Tokusashi, Hiroki Matsutani, and Noa Zilberman. LaKe: An Energy Efficient, Low Latency, Accelerated Key-Value Store. arXiv preprint arXiv:1805.11344, 2018.

[154] Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M Patel, Sanjeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, et al. Storm @ Twitter. In Proceedings of the International Conference on Management of Data (SIGMOD), pages 147–156. ACM, 2014.

[155] Tri Minh Truong, Aaron Harwood, Richard O Sinnott, and Shiping Chen. Performance Analysis of Large-Scale Distributed Stream Processing Systems on the Cloud. In 2018 IEEE 11th International Conference on Cloud Computing (CLOUD), pages 754–761. IEEE, 2018.

[156] Vinod Kumar Vavilapalli, Arun C Murthy, Chris Douglas, Sharad Agarwal, Mahadev Konar, Robert Evans, Thomas Graves, Jason Lowe, Hitesh Shah, Siddharth Seth, et al. YARN: Yet Another Resource Negotiator. In Proceedings of the 4th Annual Symposium on Cloud Computing, page 5. ACM, 2013.

[157] Shivaram Venkataraman, Aurojit Panda, Kay Ousterhout, Michael Armbrust, Ali Ghodsi, Michael J Franklin, Benjamin Recht, and Ion Stoica. Drizzle: Fast and adaptable stream processing at scale. In Proceedings of the 26th Symposium on Operating Systems Principles, pages 374–389. ACM, 2017.

[158] Shivaram Venkataraman, Zongheng Yang, Michael Franklin, Benjamin Recht, and Ion Stoica. Ernest: Efficient Performance Prediction for Large-Scale Advanced Analytics. In 13th USENIX Symposium on Networked Systems Design and Implementation (NSDI 16), pages 363–378, 2016.

[159] Andrew Wang, Shivaram Venkataraman, Sara Alspaugh, Randy Katz, and Ion Stoica. Cake: Enabling High-level SLOs on Shared Storage Systems. In Proceedings of the 3rd ACM Symposium on Cloud Computing, page 14. ACM, 2012.

[160] Brian White, Jay Lepreau, Leigh Stoller, Robert Ricci, Shashi Guruprasad, Mac Newbold, Mike Hibler, Chad Barb, and Abhijeet Joglekar. An Integrated Experimental Environment for Distributed Systems and Networks. In Proceedings of the 5th Symposium on Operating Systems Design and Implementation, pages 255–270, Boston, MA, December 2002. USENIX Association.

[161] Wikipedia. Pareto Efficiency — Wikipedia, The Free Encyclopedia. https://en.wikipedia.org/w/index.php?title=Pareto_efficiency&oldid=741104719, 2016. Last Visited: Sunday 12th July, 2020.

[162] Gerhard J Woeginger. There is no asymptotic PTAS for two-dimensional vector packing. Information Processing Letters, 64(6):293–297, 1997.

[163] Kun-Lung Wu, Kirsten W Hildrum, Wei Fan, Philip S Yu, Charu C Aggarwal, David A George, Buğra Gedik, Eric Bouillet, Xiaohui Gu, Gang Luo, et al. Challenges and Experience in Prototyping a Multi-Modal Stream Analytic and Monitoring Application on System S. In Proceedings of the 33rd International Conference on Very Large Data Bases, pages 1185–1196. VLDB Endowment, 2007.

[164] Yingjun Wu and Kian-Lee Tan. ChronoStream: Elastic Stateful Stream Computation in the Cloud. In Proceedings of the 31st International Conference on Data Engineering, pages 723–734. IEEE, 2015.

[165] Zhe Wu, Michael Butkiewicz, Dorian Perkins, Ethan Katz-Bassett, and Harsha V Madhyastha. SPANStore: Cost-effective Geo-replicated Storage Spanning Multiple Cloud Services. In Proceedings of the 24th ACM Symposium on Operating Systems Principles, pages 292–308. ACM, 2013.

[166] Jielong Xu, Zhenhua Chen, Jian Tang, and Sen Su. T-storm: Traffic-aware online scheduling in storm. In 2014 IEEE 34th International Conference on Distributed Computing Systems (ICDCS), pages 535–544. IEEE, 2014.

[167] Le Xu, Boyang Peng, and Indranil Gupta. Stela: Enabling Stream Processing Systems to Scale-in and Scale-out On-demand. In IEEE International Conference on Cloud Engineering (IC2E), pages 22–31. IEEE, 2016.

[168] Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, and Ion Stoica. Discretized Streams: Fault-Tolerant Streaming Computation at Scale. In Proceedings of the 24th ACM Symposium on Operating Systems Principles, SOSP ’13, pages 423–438, New York, NY, USA, 2013. ACM.

[169] Erik Zeitler and Tore Risch. Massive scale-out of expensive continuous queries. Proceedings of the VLDB Endowment, 4(11), 2011.

[170] Ben Zhang, Xin Jin, Sylvia Ratnasamy, John Wawrzynek, and Edward A Lee. AWStream: Adaptive Wide-Area Streaming Analytics. In Proceedings of the 2018 Conference of the ACM Special Interest Group on Data Communication, pages 236–252, 2018.

[171] Chengliang Zhang, Minchen Yu, Wei Wang, and Feng Yan. MArk: Exploiting cloud services for cost-effective, SLO-aware machine learning inference serving. In 2019 USENIX Annual Technical Conference (USENIX ATC 19), pages 1049–1062, 2019.

[172] Haoyu Zhang, Ganesh Ananthanarayanan, Peter Bodik, Matthai Philipose, Paramvir Bahl, and Michael J Freedman. Live Video Analytics at Scale with Approximation and Delay-Tolerance. In 14th USENIX Symposium on Networked Systems Design and Implementation (NSDI 17), pages 377–392, 2017.
