Robust Resource Management in Distributed Stream Processing Systems

Robust Resource Management in Distributed Stream Processing Systems Xunyun Liu Submitted in total fulfilment of the requirements of the degree of Doctor of Philosophy April 2018 School of Computing and Information Systems THE UNIVERSITY OF MELBOURNE Copyright c 2018 Xunyun Liu All rights reserved. No part of the publication may be reproduced in any form by print, photoprint, microfilm or any other means without written permission from the author except as permitted by law. Robust Resource Management in Distributed Stream Processing Systems Xunyun Liu Principal Supervisor: Prof. Rajkumar Buyya Abstract Stream processing is an emerging in-memory computing paradigm that ingests dynamic data streams with a process-once-arrival strategy. It yields real-time insights by applying continuous queries over data in motion, giving birth to a wide range of time- critical applications such as fraud detection, algorithmic trading and health surveillance. Resource management is an integral part of the deployment process to ensure that the stream processing system meets the requirements articulated in the Service Level Agreement (SLA). It involves the construction of the system deployment stack over distributed resources, as well as its continuous adjustment to deal with the constantly chang- ing runtime environment and the fluctuating workload. However, most existing resource management techniques are optimised towards a pre-configured deployment platform, thus facing a variety of challenges in resource provisioning, operator parallelisation, task scheduling, and state management to realise robustness, i.e. maintaining a certain level of performance and reliability guarantee with minimum resource costs. In this thesis, we investigate novel techniques and solutions for robust resource management to tackle arising challenges associated with the cloud deployment of stream processing systems. The outcome is a series of research work that incorporate SLA- awareness into the resource management process and relieve the burden of the devel- opers to monitor, analyse, and rectify the performance and reliability problems encoun- tered during execution. Specifically, we have advanced the state-of-the-art by making the following contributions: 1. A stepwise profiling and controlling framework that improves application performance by automatically scaling up the parallelism degree of streaming operators. It also ensures proper resource allocation between data sources and data sinks to avoid processing backlogs and starvation. 2. A resource-efficient scheduler that monitors the application execution, models the resource consumption, and consolidates the task placement for improving cost ef- ficiency without causing resource contention. 3. A replication-based state management framework that masks state loss in the cases of node crashes and JVM failures, which also reduces the fault-tolerance overhead by eliminating the need of synchronisation with a remote state storage. 4. A performance-oriented deployment framework that conducts iterative scaling of the streaming application to reach its pre-defined targets on throughput and la- tency, regardless of the initial amount of resource allocation. iii Declaration This is to certify that 1. the thesis comprises only my original work towards the PhD, 2. due acknowledgement has been made in the text to all other material used, 3. the thesis is less than 100,000 words in length, exclusive of tables, maps, bibliogra- phies and appendices. Xunyun Liu, 15 April 2018 v Preface This thesis research has been carried out in the Cloud Computing and Distributed Sys- tems (CLOUDS) Laboratory, School of Computing and Information Systems, The Uni- versity of Melbourne. The main contributions of the thesis are discussed in Chapters 2- 6 and are based on the following publications: • Xunyun Liu and Rajkumar Buyya, “Resource Management and Scheduling in Dis- tributed Stream Processing Systems: A Taxonomy, Review and Future Directions,” ACM Computing Surveys, ACM Press, 2018 (under review). • Xunyun Liu, Amir Vahid Dastjerdi, Rodrigo N. Calheiros, Chenhao Qu and Rajkumar Buyya, “A Stepwise Auto-Profiling Method for Performance Optimiza- tion of Streaming Applications,” ACM Transactions on Autonomous and Adaptive Sys- tems (TAAS), Volume 12, Issue 4, Pages: 1-33, ACM Press, 2017. • Xunyun Liu and Rajkumar Buyya, “Performance-Oriented Deployment of Stream- ing Applications on Cloud,” IEEE Transactions on Big Data (TBD), Accepted, In press, DOI:10.1109/TBDATA.2017.2720622 Pages: 1-14, IEEE, 2017. • Xunyun Liu, Aaron Harwood, Shanika Karunasekera, Benjamin Rubinstein and Rajkumar Buyya, “E-Storm: Replication-based State Management in Distributed Stream Processing Systems,” in Proceedings of the 46th International Conference on Par- allel Processing (ICPP), Bristol, UK, Pages: 1-10, IEEE, 2017. • Xunyun Liu and Rajkumar Buyya, “D-Storm: Dynamic Resource-Efficient Schedul- ing of Stream Processing Applications,” in Proceedings of the 23rd IEEE International Conference on Parallel and Distributed Systems (ICPADS), Shenzhen, China, Pages: 1-8, IEEE, 2017. vii • Xunyun Liu and Rajkumar Buyya, “Dynamic Resource-Efficient Scheduling in Data Stream Management Systems Deployed on Computing Clouds,” ACM Transactions on Internet Technology (TOIT), ACM Press, 2017 (Under review). • Xunyun Liu, Amir Vahid Dastjerdi and Rajkumar Buyya, “Stream Processing in IoT: Foundations, State-of-the-Art, and Future Directions,” Internet of Things: Prin- ciples and Paradigms, Pages: 145-161, Morgan Kaufmann, 2016. viii Acknowledgements First and foremost, I would like to express my sincere gratitude to my principal supervisor, Professor Rajkumar Buyya, for guiding me tirelessly and providing valuable research insights and continuous support throughout my PhD, without which I could never reach this milestone of writing acknowledgement. In addition, I wish to thank my co-supervisor, Dr. Benjamin Rubinstein, for generously sharing with me his knowledge and vision when I need a pair of professional eyes to analyse the real nature of the prob- lem. I also sincerely thank Dr. Rodrigo N. Calheiros for being my co-supervisor while working at the University of Melbourne. My supervisors helped me to grow from a ju- nior researcher, and their encouragement is indispensable to the completion of this thesis. I had many fruitful discussions with them that shed lights on the future research as well. I would like to thank the chair of my PhD committee Professor Shanika Karunasekera for her support and guidance. I also want to express my gratitude to Dr. Aaron Harwood, Dr. Adel Nadjaran Toosi, and Dr. Amir Vahid Dastjerdi for their time generously spent on answering my questions and proofreading my papers. Special thanks go to Dr. Chenhao Qu. As a senior student at CLOUDS lab, he offered me guidance when I was struggling to choose an approach for implementing feasible solutions. Besides, I would like to thank all members of the CLOUDS lab and acknowledge the support of the fellow PhD graduates and students. Thank you to Deepak Poola, Niko- lay Grozev, Maria Rodriguez, Bowen Zhou, Yaser Mansouri, Jungmin Jay Son, Safiollah Heidari, Caesar Wu, Minxian Xu, Sara Moghaddam, Muhammad H. Hilman, Redowan Mahmud, Md. Tawfiqul Islam, Shashikant Ilager, TianZhang He, etc. I acknowledge the China Scholaship Council, the University of Melbourne, and the Nectar Cloud for providing me with financial support and appropriate facilities to pur- sue this doctoral degree. I am also grateful to the administrative staff at the School of Computing and Information Systems, especially Rhonda Smithies, Madalain Dolic, and Julie Ireland, for their support in my numerous applications. I would like to thank my parents who have encouraged me to embark on this won- derful journey, and my girlfriend, Miss Yi Fan, for her company as we taste all the sweets and bitters of pursuing a PhD. I cannot imagine a greater fortune other than having them in my life. Their love and care are indispensable to any of my achievements. Xunyun Liu Melbourne, Australia April 2018 ix Contents 1 Introduction1 1.1 Challenges in Robust Resource Management.................4 1.1.1 Challenges in Application Profiling..................4 1.1.2 Challenges in Operator Parallelisation.................5 1.1.3 Challenges in Task Scheduling.....................5 1.1.4 Challenges in State Management....................6 1.1.5 Challenges in Resource Provisioning..................6 1.2 Research Problems and Objectives.......................7 1.3 Evaluation Methodology.............................8 1.4 Thesis Contribution................................9 1.5 Thesis Organisation................................ 11 2 Literature Review 15 2.1 Introduction.................................... 15 2.2 Background.................................... 18 2.3 Taxonomy..................................... 23 2.4 Resource Type................................... 25 2.4.1 Resource Abstractions.......................... 26 2.4.2 Virtual Machines............................. 26 2.4.3 Specific Hardware............................ 27 2.4.4 Hybrid Network............................. 28 2.5 Resource Estimation............................... 29 2.5.1 Predictive Ability............................. 30 2.5.2 Resource Cost Modelling........................ 33 2.6 Resource Adaptation............................... 36 2.6.1 Proactive Adaptation........................... 37 2.6.2 Reactive Adaptation........................... 38 2.7 Parallelism Calculation.............................

Robust Resource Management in Distributed Stream Processing Systems

Network Traffic Profiling and Anomaly Detection for Cyber Security

Optimizing Resource Utilization in Distributed Computing Systems For

Evaluating the Impact of Streaming Systems Design on Application Performance Alessio Pagliari

Apache Samza

A Novel Cloud Broker-Based Resource Elasticity Management and Pricing for Big Data Streaming Applications

Configuring Data Stream Processing

A Comprehensive Solution for Research-Oriented Cloud Computing

Configuring Data Stream Processing Applications on Highly Distributed Infrastructure Alexandre Da Silva Veith

A Survey of State Management in Big Data Processing Systems

The Evolution of Smart Stream Processing Architecture Enhancing the Kappa Architecture for Stateful Streams

A Quick and Flexible Stream Processing Application Prototype Generator Alessio Pagliari, Fabrice Huet, Guillaume Urvoy-Keller

DEPENDABLE CLOUD RESOURCES for BIG-DATA BATCH PROCESSING & STREAMING FRAMEWORKS by Bara Abusalah