A Novel Cloud Broker-based Resource Elasticity Management and Pricing for Big Data Streaming Applications

by

Olubisi A. Runsewe

Thesis submitted to the Faculty of Graduate and Postdoctoral Studies in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Electronic Business

School of Electrical Engineering and Computer Science
Faculty of Engineering
University of Ottawa

© Olubisi A. Runsewe, Ottawa, Canada, 2019

Abstract

The pervasive availability of streaming data from various sources is driving today's enterprises to acquire low-latency big data streaming applications (BDSAs) for extracting useful information. In parallel, recent advances in technology have made it easier to collect, process and store these data streams in the cloud. For most enterprises, gaining insights from big data is immensely important for maintaining competitive advantage. However, the majority of enterprises have difficulty managing the multitude of BDSAs and the complex issues cloud technologies present, giving rise to the incorporation of cloud service brokers (CSBs). Generally, the main objective of the CSB is to maintain the heterogeneous quality of service (QoS) of BDSAs while minimizing costs. To achieve this goal, the cloud, although it offers many desirable features, presents major challenges — resource prediction and resource allocation — for CSBs. First, most systems allocate a fixed amount of resources at runtime, which can lead to under- or over-provisioning as BDSA demands vary over time. Thus, obtaining an optimal trade-off between QoS violation and cost requires an accurate demand-prediction methodology to prevent waste, degradation or shutdown of processing. Second, coordinating resource allocation and pricing decisions for self-interested BDSAs to achieve fairness and efficiency can be complex. This complexity is exacerbated by the recent introduction of containers. This dissertation addresses these cloud resource elasticity management issues for CSBs as follows. First, we provide two contributions to the resource prediction challenge: we propose a novel layered multi-dimensional hidden Markov model (LMD-HMM) framework for managing time-bounded BDSAs and a layered multi-dimensional hidden semi-Markov model (LMD-HSMM) to address unbounded BDSAs. Second, we present a container resource allocation mechanism (CRAM) for optimal workload distribution to meet the real-time demands of competing containerized BDSAs. We formulate the problem as an n-player non-cooperative game among a set of heterogeneous containerized BDSAs. Finally, we incorporate a dynamic incentive-compatible pricing scheme that coordinates the decisions of self-interested BDSAs to maximize the CSB's surplus. Experimental results demonstrate the effectiveness of our approaches.

Acknowledgements

This dissertation would not have been possible without the invaluable support of many great people who were instrumental in shaping my PhD program. First and foremost, I would like to express my hearty appreciation and gratitude to my supervisor, Dr. Nancy Samaan, for her unwavering support and encouragement throughout the course of my study. Her extensive knowledge, unsurpassed expertise, forbearance and understanding made it possible for me to work on a topic that was of great interest to me. I also want to express my deepest thanks to her for believing in my ability and constantly motivating me to work harder. Mere writing is not enough to express the wealth of support and guidance I received from her on both an academic and a personal level. My gratitude also goes to my thesis advisory committee members, Prof. Raheemi Bijan and Prof. Herna Viktor, for their invaluable and constructive comments, which helped to improve my dissertation. Special thanks to my dear husband and lovely daughters, who served as my inspiration to pursue a goal that required so much understanding and stability throughout the years. Their wholehearted support for my decision is part of what kept me going. I thank my mum for her unequivocal support and prayers, for which my mere expression of thanks does not suffice. To all my siblings, I am grateful for their constant encouragement. I would also like to express my sincere thanks to my friends and extended family members for being by my side throughout this journey, urging me on. Finally, I would like to acknowledge the financial support of the University of Ottawa through an admission scholarship and an Ontario Graduate Scholarship (OGS), which provided the necessary financial support for my program. Above all, I thank God for making all my endeavours come to fruition.

To my beloved husband and girls.

Table of Contents

Table of Contents

List of Figures

List of Tables

Acronyms

1 Introduction
  1.1 Overview
  1.2 Research Challenges and Motivation
      1.2.1 Research Challenges
      1.2.2 Research Questions
      1.2.3 Motivation
  1.3 Research Scope
  1.4 Research Methodology
      1.4.1 Design Science Research
      1.4.2 Research Model
  1.5 Summary of Contributions
      1.5.1 LMD-HMM
      1.5.2 LMD-HSMM
      1.5.3 CRAM
      1.5.4 CRAM-P
      1.5.5 Publications
  1.6 Thesis Organization

2 Background
  2.1 Big Data Paradigm
      2.1.1 Taxonomy
      2.1.2 Lifecycle Stages
      2.1.3 Services
      2.1.4 Challenges
  2.2 Cloud Datacenters
      2.2.1 Hypervisor and Container-based Virtualization
      2.2.2 Cloud DCs as Host for Big Data
  2.3 Big Data Migration over Clouds
      2.3.1 Requirements
      2.3.2 Non-networking Approaches
      2.3.3 Networking Approaches
      2.3.4 State-of-Art Networking Protocols
      2.3.5 Optimization Objectives
      2.3.6 Discussions and Future Opportunities
  2.4 Big Data Stream Processing in the Cloud
      2.4.1 Stream Processing Architecture
      2.4.2 Stream Processing Applications
      2.4.3 Key Stream Processing Technologies
  2.5 Cloud Resource Elasticity for BDSAs
      2.5.1 Resource Prediction
      2.5.2 Resource Allocation
      2.5.3 Container Elasticity
  2.6 Reference Models
      2.6.1 Hidden Markov Models
      2.6.2 Game Theory
  2.7 Summary

3 Proposed Framework
  3.1 Cloud Ecosystem and Application Scenario
  3.2 The Cloud Business Model
  3.3 Framework Layers
      3.3.1 Application Layer
      3.3.2 Resource Management Layer
      3.3.3 Cloud Layer
  3.4 Function of Actors
      3.4.1 Cloud Service Consumers
      3.4.2 Cloud Service Brokers
      3.4.3 Cloud Service Providers
  3.5 Framework Components
      3.5.1 Resource Profiler
      3.5.2 Resource Monitor
      3.5.3 Resource Predictor
      3.5.4 Resource Allocator
  3.6 Resources
  3.7 Summary

4 Cloud Resource Scaling for Big Data Streaming Applications
  4.1 Introduction
  4.2 Related Work
  4.3 Proposed Prediction Model
      4.3.1 The Lower-Layer
      4.3.2 MD-HMM Inference & Learning Mechanisms
      4.3.3 MD-HSMM Inference & Learning Mechanisms
      4.3.4 The Upper-Layer
  4.4 Experiments
      4.4.1 Experimental Setup
      4.4.2 LMD-HMM Implementation
      4.4.3 LMD-HSMM Implementation
  4.5 Summary

5 CRAM-P: a Container Resource Allocation Mechanism with Dynamic Pricing for BDSAs
  5.1 Introduction
  5.2 Related Work
  5.3 System Model
      5.3.1 Problem Statement
      5.3.2 CRAM Design as a Non-Cooperative Game
      5.3.3 Dynamic Pricing
      5.3.4 Algorithm Design
  5.4 Performance Evaluation
      5.4.1 Experimental Setup
      5.4.2 Testing Applications
      5.4.3 Experimental Results
  5.5 Summary

6 Conclusion and Future Work
  6.1 Business Value of Proposed Framework
  6.2 Research Contributions
  6.3 Future Directions

Bibliography

List of Figures

1.1 Processing Flow
1.2 Research Model

2.1 Multi-V Characteristics of Big Data
2.2 Taxonomy of Big Data. Adapted from [1]
2.3 High-level Phases of Big Data Lifecycle [2]
2.4 Datacenter Network Topology [3]
2.5 Comparison of Container-based and Hypervisor-based Virtualization
2.6 Cross-layer Solution for Big Data Migration over Clouds
2.7 Simple Stream Processing Flow
2.8 High-level Stream Processing Architecture [4]
2.9 Task- and Data-Parallel Application Types [5]
2.10 Components of a Storm Cluster [6]
2.11 Spark Architecture [7]
2.12 Heron Architecture [8]
2.13 Classification of Elasticity Solutions [9]
2.14 Cloud Resource Over-provisioning/Under-provisioning
2.15 A Streaming Topology on a Container-Cluster

3.1 Cloud Market Structure
3.2 Proposed Resource Elasticity Management and Pricing Framework
3.3 Operational Overview of the Resource Management Layer

4.1 A Layered Multi-dimensional HMM
4.2 Big Data Streaming Applications

4.3 Multi-dimensional (CPU & Memory) Observation for a Stream
4.4 K-Means using CPU & Memory Values
4.5 K-Means Optimal Number of Clusters
4.6 Log-likelihood of the MD-HMM Model
4.7 Actual vs. Predicted States
4.8 Log-likelihood of the MD-HSMM Model
4.9 Simulated Data and States for Resource Usage Observations
4.10 Actual vs. Predicted States
4.11 Predictive Accuracy of the MD-HSMM

5.1 Control Structure for CRAM-P
5.2 (a) Streaming App1 with a higher priority (b) Streaming App1 with a lower priority
5.3 Throughput of Streaming Applications using the RR Mechanism
5.4 Delay at each Container-Cluster for CRAM vs. Round-Robin Mechanism
5.5 Utility Cost of CRAM vs. RR

List of Tables

4.1 Notations used in the LMD-HMM & LMD-HSMM Models
4.2 Probabilities Matrix
4.3 Prediction Set Size Variation & Accuracy of the MD-HMM
4.4 Probabilities Matrix
4.5 Prediction Period Variation & Accuracy of the MD-HSMM

5.1 Notations used in CRAM
5.2 Container-Cluster Configurations

Acronyms

API Application Programming Interface

ASA Azure Stream Analytics

ASP Application Service Provider

AWS Amazon Web Services

BD Big Data

BDSA Big Data Streaming Application

CaaS Container as a Service

CLI Command Line Interface

CMS Container Management System

CPU Central Processing Unit

CRAM Container Resource Allocation Mechanism

CSB Cloud Service Broker

CSC Cloud Service Consumer

CSP Cloud Service Provider

DAG Directed Acyclic Graph

DBMS Database Management System

DC Datacenter

DSR Design Science Research

EC2 Elastic Compute Cloud

ESX Enterprise Server

ETL Extract, Transform, Load

EVCD Elastic Provisioning of Virtual Machines for Container Deployment

HDFS Hadoop Distributed File System

HMM Hidden Markov Model

HSMM Hidden Semi-Markov Model

I/O Input/Output

IaaS Infrastructure as a Service

IoT Internet of Things

ISP Internet Service Provider

JSON JavaScript Object Notation

KKT Karush-Kuhn-Tucker

LMD-HMM Layered Multi-dimensional Hidden Markov Model

LMD-HSMM Layered Multi-dimensional Hidden Semi-Markov Model

NIST National Institute of Standards and Technology

NoSQL Non-Structured Query Language

OLAP Online Analytical Processing

OS Operating System

PaaS Platform as a Service

QoS Quality of Service

RDD Resilient Distributed Datasets

RDMA Remote Direct Memory Access

RR Round-Robin

SaaS Software as a Service

SDK Software Development Kit

SDN Software Defined Network

SLA Service Level Agreement

SnF Store and Forward

SPEs Stream Processing Engines

SQL Structured Query Language

TCP Transmission Control Protocol

TE Traffic Engineering

ToR Top-of-the-Rack

TTM Time to Market

UDFs User Defined Functions

UDP User Datagram Protocol

UDT UDP-based Data Transfer

UI User Interface

VM Virtual Machine

VMM Virtual Machine Monitor

VN Virtual Network

WAN Wide Area Network

XML eXtensible Markup Language

Chapter 1

Introduction

1.1 Overview

As the world becomes more interconnected, real-time streaming data from applications (e.g., social media) and ubiquitous devices (e.g., wearable devices, tablets and security cameras) have introduced a massive flow of data, in either structured or unstructured formats, referred to as big data [10]. Analyzing these big data streams in real time under very short delays has become a crucial requirement for most enterprises to unearth value useful for informing business decisions, developing new products, minimizing risks and combating fraud [11]. Given the real-time analytic requirements of big data streaming applications (BDSAs), traditional data management systems, which are prohibitively expensive when dealing with very large amounts of data, are less favoured. As a consequence, cloud computing has become a popular paradigm for quickly deploying BDSAs in an elastic setting through on-demand acquisition and release of resources [12], liberating cloud users (e.g., enterprises) from expensive on-site infrastructure investment. Moreover, cloud platforms host many of the large-scale stream processing engines (SPEs) required to meet enterprises' demands, e.g., Storm [13], Spark [14], and S4 [15].

Besides the tremendous benefits the cloud offers for processing big data, choosing the best cloud service provider (CSP) to manage enterprise BDSAs is a complex issue due to the variety of potential options and the number of variables to consider. Many BDSAs serve multiple users that are geographically distributed and, as a result, observe temporal and spatial workload dynamics. Also, as the size and complexity of these applications grow, deployment becomes more difficult. Although processing through different CSPs can be beneficial for most enterprises, issues such as interoperability, legal contract maintenance, user account management, and periodic service monitoring are introduced [16], and hence additional overhead. To address these challenges, a cloud service broker (CSB) is employed to act as an intermediary between the CSP and the enterprise [17]. CSBs provide an organizational pool for consolidating BDSAs' demands and negotiating the best service usage rates. In turn, enterprises are able to exploit diverse services from multiple CSPs [18] as well as avoid vendor lock-in [19]. A significant challenge facing CSBs, however, is how to dynamically manage resources to fulfill the quality of service (QoS) requirements of BDSAs while minimizing their investment and management costs. To achieve this goal, CSBs need an infrastructure such as the cloud, capable of scaling rapidly in case of peak load, but flexible enough to avoid resource over-provisioning during low demands. Generally, the mechanism through which the cloud achieves economies of scale is elasticity, which can be applied in the form of the size, location or number of resources. “Elasticity aims at adapting a system to changes in workload through automatic provisioning and deprovisioning of resources, such that the available resources match the current demand” [20]. In this case, utilizing cloud elasticity management solutions to reconfigure CSPs' resources becomes important for addressing the challenges of CSBs.

This dissertation focuses on two major issues encountered by CSBs — resource prediction and resource allocation. Resource prediction can be useful for identifying when to provision or deprovision resources. However, it can be challenging for CSBs due to the need to estimate multiple BDSA workload demands utilizing a heterogeneous cluster of hosts. Moreover, CSBs are required to know in advance the resource needs of the applications or to continuously monitor their states to decide when a change in the amount of resources is required. Oftentimes, stream processing systems allocate a fixed amount of resources at runtime. This “one size fits all” approach (i.e., static provisioning) can lead to under-provisioning or over-provisioning as BDSA demands vary over time. As a result, major CSPs provide tools for monitoring the incoming workload and the resource utilization patterns of applications. Unfortunately, these tools are reactive and can be inefficient when dealing with multiple BDSAs sharing a heterogeneous cluster of hosts. A desirable solution would therefore require a predictive approach, i.e., allocating resources a priori to match workload demands.

Another factor that contributes to the cloud resource elasticity issue is resource allocation. This can be challenging for CSBs due to the fluctuating resource demands of BDSAs. Additionally, a shift is taking place whereby BDSAs are deployed on container-clusters to improve the performance of streaming applications. While containerization provides great potential to elastically scale resources to match the demands of each BDSA, sharing of container-clusters by multiple BDSAs can lead to resource contention. Furthermore, given the CSB's limited resources, when BDSAs act independently to maximize their level of satisfaction with no consideration for others, congestion can occur. Therefore, newer approaches are required that can fairly allocate resources to BDSAs to meet their QoS objectives, as well as mechanisms that provide incentives for BDSAs to behave in ways that improve overall resource utilization and performance.

The work presented in this dissertation aims at developing a dynamic cloud resource elasticity management framework for CSBs that incorporates mechanisms for assigning big data workloads in the cloud. The objective of the framework is to meet the QoS of multiple BDSAs running on a heterogeneous cluster of hosts while minimizing cost. In this chapter, we present the requirements for BDSAs and the cloud resource elasticity challenges, followed by a discussion of the motivation, research scope and methodology. Finally, the contributions of this work and the thesis organization are presented.

1.2 Research Challenges and Motivation

1.2.1 Research Challenges

This section briefly discusses the research challenges encountered by CSBs managing multiple BDSAs on a heterogeneous cluster of hosts. It focuses on the resource prediction and resource allocation challenges of cloud resource elasticity management.

Resource Prediction Challenge: Lack of a flexible, proactive resource prediction mechanism may hinder resource scaling operations of BDSAs in the cloud. Resource scaling is a major challenge due to the stringent delay constraints, increasing parallel computation requirements, and dynamically changing data volume and velocity characteristic of big data streams. This challenge is further complicated by the lack of knowledge of the workload and resource usage patterns of BDSAs. To accommodate fluctuating workload demands, resource scaling operations may entail service interruption and shutdown, and as a result the application state may not be preserved [21]. Prior work on resource scaling uses different strategies, ranging from a simple reactive approach that involves monitoring a metric (e.g., CPU utilization) to a proactive approach that forecasts future resource needs based on the demand history. In practice, most CSPs (e.g., Azure1 and Amazon2) make use of the reactive approach. This approach usually requires that application providers be aware of the resource usage patterns to set triggers for scaling activities. Since streaming applications have to adapt to varying resource requirements to meet user needs, reactively scaling computing resources at the time of usage may be insufficient, as the demand occurs before additional resources are provisioned, i.e., provisioning exhibits an inherent lag [22]. Therefore, most CSBs are caught between expensive over-provisioning and reactively scaling resources for BDSAs. To meet the QoS requirements of BDSAs, CSBs need an infrastructure capable of adding resources in the case of peak load, but flexible enough to avoid resource over-provisioning during low demands. We present proactive resource prediction mechanisms for CSBs to address the aforementioned problem in Chapter 4.
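To make the contrast concrete, the reactive strategy described above can be reduced to a simple threshold rule of the form sketched below. This is an illustrative sketch only; the function name, thresholds and VM limits are assumptions, not any CSP's actual policy.

```python
# Minimal sketch of a reactive, threshold-based scaling rule (illustrative only).
def reactive_scaling_decision(cpu_utilization, current_vms,
                              scale_out_threshold=0.80,
                              scale_in_threshold=0.30,
                              min_vms=1, max_vms=20):
    """Return the new VM count after applying a simple threshold trigger."""
    if cpu_utilization > scale_out_threshold and current_vms < max_vms:
        return current_vms + 1   # scale out; the new VM only helps after a startup lag
    if cpu_utilization < scale_in_threshold and current_vms > min_vms:
        return current_vms - 1   # scale in to reduce cost
    return current_vms           # otherwise leave the cluster unchanged

# The rule only fires after utilization has already crossed the threshold, so
# capacity arrives after the spike has begun: the provisioning lag that
# motivates the proactive, prediction-based approach of Chapter 4.
print(reactive_scaling_decision(cpu_utilization=0.92, current_vms=4))  # -> 5
```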

Resource Allocation Challenge: BDSAs generally require sufficient resources for processing to: i) reduce delay, ii) cope with sudden variations in load changes, and iii) avoid shutdown of processing. Applicable solutions involve optimal resource allocation and efficient utilization for BDSAs to meet their QoS requirements. Resource allocation is a key challenge for CSBs due to conflicting BDSA requests for resources, which result from BDSA usage patterns coupled with the need to manage a heterogeneous cluster of host resources. Therefore, fulfilling the QoS requirements stated in the SLAs while minimizing costs is a challenging research problem [23]. Moreover, a key issue that arises with resource allocation is that each BDSA prefers a higher QoS level over a lower one. Without a mechanism that induces self-interested BDSAs to align their individual objectives with that of the overall system, congestion can occur. Furthermore, how to optimally allocate resources to competing BDSAs running on container-clusters is still unclear. In Chapter 5, we address the aforesaid problem by introducing a novel resource allocation and incentive-compatible pricing mechanism for BDSAs. In the next section, we present our research questions.

1 https://azure.microsoft.com/en-us/
2 https://aws.amazon.com/

1.2.2 Research Questions

Based on the challenges discussed in Section 1.2.1, the following research questions (RQs) are raised:

RQ (1) How can cloud resource utilization be predicted for BDSAs running on a heterogeneous cluster of hosts?

• How can we model time-bounded big data streaming applications, considering the multiple resource bottlenecks, e.g., CPU and memory?
• How can unbounded streaming applications be catered for, to avoid incorrect prediction for long-running streaming jobs?

RQ (2) How can we optimally allocate resources to containerized BDSAs in a dynamic manner to meet their QoS as well as maximize the CSB’s surplus?

• How can resources for containerized BDSAs be efficiently allocated to achieve optimal performance and fairness?
• How can resources be allocated to avoid contention when dealing with multiple BDSAs running on a shared heterogeneous cluster of hosts?
• How can a dynamic pricing mechanism be incorporated that generates incentives for self-interested BDSAs to make optimal decisions such that the net value of the CSBs' resources is maximized?

1.2.3 Motivation

Elasticity continues to be one of the most fundamental and important characteristics in the field of cloud computing [24], among other relevant features such as on-demand self-service, multi-tenancy, pay-per-use and ubiquitous access [25]. As CSBs become aware of the benefits that the cloud provides, they are increasingly adopting the cloud computing paradigm to provision resources for their end users (enterprises). For instance, Foursquare saves 53% in costs using Amazon EC2 to perform analytics across more than 5 million daily check-ins [26]. Also, Netflix3 can run critical encoding tasks with Amazon EC2 to serve its clients' video demands using a number of VM instances, and shuts them down when completed. Cloud computing is a promising solution for CSBs because it can rapidly scale up during peak load yet remains flexible enough to avoid resource over-provisioning during scarce demand. With cloud solutions, CSBs are relieved from the burden of setting up their own datacenter and are able to instantiate an elastic set of resources that can dynamically adapt to their needs [23]. For most CSPs, elasticity is achieved by implementing optimization and resource management mechanisms that allow cloud resource access at will. For CSBs, in contrast, elasticity management involves finding the optimal trade-off between meeting their business goals and providing adequate QoS (i.e., meeting SLAs, scalability and deadline constraints) to their consumers.

3 https://aws.amazon.com/solutions/case-studies/netflix/

The majority of the elasticity management solutions proposed in the literature are CSP-centric and mostly concerned with infrastructure optimization (VM migration, VM scheduling, and server consolidation) [27–30]. Limited studies have tackled the CSB-centric problem, with few proposals presented in the literature to automate elasticity management as well as minimize the CSB's intervention. Motivated by the need for CSBs to meet their business goals and provide QoS to their end users, we focus on some of the major issues that arise for CSBs when dealing with BDSAs — resource prediction and resource allocation. The requirements for designing an elastic resource management framework for BDSAs running on a heterogeneous cluster of hosts (VMs or containers) are identified as follows:

• Multiple Data Sources: Big data is generated by a number of external data sources, and even for a single domain, these can grow to be in the tens of thousands. Many of the data sources are dynamic; thus, SPEs should be able to adjust the amount of resources based on the current workload, i.e., scale resources. Moreover, the data sources can be extremely heterogeneous in their structure and differ in accuracy and timeliness. Additional challenges arise from limited knowledge of the characteristics of the workload and a lack of understanding of the performance of the system.

• Shared Cluster: Early deployments of big data applications were on dedicated clusters. In an effort to improve cluster resource utilization and cluster management, a shift is taking place where these applications are now deployed on shared clusters. Shared clusters provide a great potential to elastically scale the resources to match demand from different applications, e.g., stream processing applications can obtain additional resources when needed from the cluster and give them back when demand subsides.

• Workload Fluctuation: Many BDSAs serve multiple users that are geographically distributed and as a result observe temporal and spatial workload dynamics. BDSAs are long-running jobs, which can undergo sudden changes or spikes in workload arrival due to unexpected events. The workloads can be CPU-, memory- or I/O-intensive and, as a result, overburden computing nodes. Since BDSAs typically run for days or weeks under heterogeneous conditions, making an accurate estimation of the resources required to handle the workload in advance or adjusting resource configuration on-the-fly can be challenging [31]. In response, most existing BDSAs are allocated a fixed amount of resources at deployment time to handle changes in workload variations. Over-provisioning can result in resource underutilization and hence an increased budget, while under-provisioning can lead to low performance, buffer overflow or data loss. For dynamic workloads, SPEs need to be able to automatically handle the fluctuating demands and scale accordingly without disrupting existing requests.

• Real-time Analytic Requirements: For real-time analysis, the majority of aggregated data sources are very dynamic and extremely heterogeneous, i.e., continuously changing data streams with highly unpredictable data arrival patterns, which can differ in accuracy and timeliness. Generally, response time is of utmost importance when processing data streams from these sources. This may involve time-unbounded applications that require data to be processed within very short time intervals with low latency requirements, e.g., financial analysis, fraud detection. Therefore, low response time is critical and any missed deadline can lead to results that are unusable. In contrast, these applications may involve the execution of complex simulations, often executed as time-bounded processes with medium to high latency requirements, e.g., surveillance and monitoring, smart-traffic management. In most cases, the deadlines of these streaming applications can range from milliseconds to hours — depending on the particular application constraints.

• Large-state Maintenance: BDSAs commonly require stateful computations such as window operations and joins. The volume of the internal states manipulated by stateful operators in many real-world scenarios can be very large (order of hundreds of gigabytes), beyond the memory capacity of a small computing cluster. Thus, effectively distributing the states of streaming applications to multiple computing nodes is very important [32].

• Host Heterogeneity: Datacenters comprise machines from several generations with varying features such as memory and processors, and consequently different run-time speeds.

These characteristics, coupled with the need to adapt to high-throughput efficiency, parallel data processing and dynamic data-driven topology graphs of tasks, introduce additional challenges that make resource prediction in the cloud a tedious task for CSBs. Therefore, correctly profiling each streaming application and its associated characteristics to forecast the resource demands is crucial for BDSAs in the cloud. Furthermore, to dynamically allocate resources to big data, elasticity should be combined with dynamic load balancing to avoid provisioning new resources as a result of uneven load distribution. In this way, new resources are provisioned only when the cluster as a whole does not have enough capacity to cope with the incoming workload. To effectively address these resource elasticity challenges affecting CSBs, there is a need for a framework tailored to serving BDSA requests that: i) possesses the capability to handle multiple streaming applications utilizing shared clusters (VMs or containers), ii) can utilize predictive techniques to proactively perform resource scaling tasks under specified SLAs, and iii) can adapt to the temporally changing resource requirements posed by competing streaming applications. In the next sections, we summarize our research scope and methodology.

1.3 Research Scope

The concentration, data collection, experiments and evaluations of this dissertation are discussed as follows:

• Dissertation Concentration: The primary focus of this dissertation is on designing efficient cloud resource elasticity management and dynamic pricing for CSBs managing multiple BDSAs running on a heterogeneous cluster of hosts.

• Data Collection: Our experiments are performed over synthetic and real-world data streams. To collect data for measuring the performance of our systems, experimental environments were set up and metrics from various big data streaming application benchmarks were monitored, e.g., WordcountTopology, ExclaimationTopology, Sentiment Analysis and TophashTags. The chosen benchmarks were developed to access APIs, owing to the lack of a data repository that fits our problem. In addition, synthetic data was utilized.

• Experiments: Apache Spark and Heron were used as the stream processing engines due to their deployment flexibility over available machines. The environment required for the prediction system was set up using Apache Spark on a cluster of virtual machines. Each node was installed with all the streaming services required for collecting and storing data, e.g., Flume and HDFS. At each interval, an amount of data was uploaded from Twitter and persisted in HDFS for immediate processing by the Spark streaming engine. The developed applications were launched at different time intervals on the VM cluster. Fig. 1.1 depicts the data processing flow; an illustrative sketch of such a streaming job appears after this list.

Figure 1.1: Processing Flow.

As for the container resource allocation mechanism, Apache Heron was selected as the stream processing engine of choice. The Heron services necessary for topology deployment – ZooKeeper, BookKeeper, Heron Tools (UI and Tracker) and the Heron API – were deployed before submitting the streaming applications (topologies). Information about a topology is handled by ZooKeeper upon submission of the topology, while BookKeeper handles the topology artifact storage.

• Evaluations: We collected information on the workload arrival rates, the number of streams and a mix of system performance metrics (CPU, memory) for each streaming application for the resource prediction system, which were then stored for further analysis. As for the resource allocation mechanism, profiles of each topology were obtained, and statistics of the workload arrival and service rates at the container-cluster were collected over a period for evaluation.
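The following is an illustrative sketch, not the exact experimental code, of the kind of Spark streaming job used in the processing flow of Fig. 1.1: it reads text records landed in HDFS (e.g., by Flume) and maintains running word counts, analogous to a word-count topology. The HDFS path, application name and trigger interval are assumptions.

```python
# Illustrative Spark Structured Streaming job over an HDFS landing directory.
from pyspark.sql import SparkSession
from pyspark.sql.functions import explode, split

spark = SparkSession.builder.appName("WordCountStream").getOrCreate()

# Stream new text files as they are persisted into the (hypothetical) HDFS directory.
lines = spark.readStream.text("hdfs:///user/streams/tweets/")

# Split each line into words and count occurrences across the stream.
words = lines.select(explode(split(lines.value, r"\s+")).alias("word"))
counts = words.groupBy("word").count()

# Emit updated counts to the console every 30 seconds.
query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .trigger(processingTime="30 seconds")
         .start())
query.awaitTermination()
```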

1.4 Research Methodology

The main concept of Design Science Research (DSR), a method conducted under the paradigm of design science to operationalize research, is presented here, along with its application to this dissertation as the research methodology. The research design strategy used is both qualitative and quantitative: the qualitative strategy was used for setting performance targets against which results are compared with existing solutions, while the quantitative strategy was applied to the metrics for tracking changes.

1.4.1 Design Science Research

The Design Science Research (DSR) methodology involves the design of novel or innovative artifacts and the analysis of the use and/or performance of such artifacts to improve and understand the behavior of different aspects of information systems. According to Hevner et al. [33], the main goal of DSR is to develop knowledge that can be used to develop solutions to problems. This mission can be compared to that of “explanatory sciences”, like the natural sciences and sociology, whose mission is to develop knowledge to describe, explain and predict. In DSR, as opposed to explanatory science research, academic research objectives are more of a logical nature. Hevner et al. [33] established seven guidelines to assist researchers, reviewers, editors, and readers in understanding the requirements for effective DSR, as follows:

• Design as an Artifact: DSR must produce a viable artifact in the form of a construct, a model, a method, or an instantiation.

• Problem Relevance: The objective of DSR is to develop technology-based solutions to important and relevant business problems.

• Design Evaluation: The utility, quality, and efficacy of a design artifact must be rigorously demonstrated via well-executed evaluation methods.

• Research Contributions: Effective DSR must provide clear and verifiable contribu- tions in the areas of the design artifact, design foundations, and/or design method- ologies.

• Research Rigor: DSR relies upon the application of rigorous methods in both the construction and evaluation of the design artifact.

• Design as a Search Process: The search for an effective artifact requires utilizing available means to reach desired ends while satisfying laws in the problem environ- ment.

• Research Communication: DSR must be presented effectively both to technology- oriented as well as management-oriented audiences.

However, the DSR guidance by Hevner et al. [33], although highly regarded and widely cited, is seldom applied because the guidelines are presented at a high level of abstraction [34]. This is in stark contrast to the research methodology by Peffers et al. [34], which provides guidance on methods. According to Peffers et al. [34], the DSR research model has six steps in its lifecycle, as follows:

• Step 1 is for the awareness of the problem, which is usually achieved by a new development or a reference in the discipline. The output of this phase is an initial research proposal.

• Step 2 is the suggestion of a solution, with the output being a tentative design based on the information acquired in Step 1.

• Step 3 focuses on designing and developing the solution, and it has an IT artifact as its output.

• Step 4 demonstrates the use of the artifact to solve one or more instances of the problem.

• Step 5 is to evaluate the resulting artifact against the initial problem, and the implicit and explicit criteria extracted in Steps 1 and 2. This step is iterative and the output of this step is the performance measures.

• Step 6 is the final phase, which communicates the output of the overall research results.

Given the artificial nature of organizations and the information systems that support them, the design-science paradigm can play a significant role in resolving the fundamental dilemmas that have plagued information system (IS) research – rigor, relevance, discipline boundaries, behavior, and technology [33]. Within this research setting, the DSR paradigm is suitable with respect to technology as it focuses on creating and evaluating innovative IT artifacts that enable organizations to address important information-related tasks.

Figure 1.2: Research Model.

1.4.2 Research Model

We followed the Design Science Research in Information Technology methodology to establish our research model. It comprises several closely related activities, as proposed by Peffers et al. [34]. Activities in this model continuously overlap instead of following a strict sequence, and the first step determines how the last step will be taken. Fig. 1.2 depicts our research model, which is discussed as follows:

• Literature review: An extensive literature review was conducted to gain in-depth knowledge of the subject matter and to make inferences from the work of experts in the fields of Cloud Computing, Big Data, Resource Scaling, Resource Allocation and Containers. The sources of information stemmed from journals, conference proceedings, government reports, academic papers, books, case studies, international standards and best practices.

• Problem identification and theoretical solution (DSR Steps 1 and 2): After the problems were identified, theoretical solutions were formulated.

• Design and development (DSR Step 3): Mechanisms for resource scaling and resource allocation were developed. This step required extensive engineering and re-engineering.

• Testing and evaluation (DSR Steps 4 and 5): The mechanisms were tested in real-world and simulated environments, and compliance was checked against performance targets.

• Summarizing and writing report (DSR Step 6): The results were summarized and documented.

1.5 Summary of Contributions

The major contributions of this dissertation are summarized in the following subsections:

1.5.1 LMD-HMM

We present a novel layered multi-dimensional HMM (LMD-HMM) framework with the goal of facilitating automatic scaling up and down of resources, i.e., adding or decommissioning virtual machines. With the layered approach, each application can be modeled individually at the lower layer, while the upper layer models interactions between groups of streaming applications running on a cluster. The LMD-HMM provides a feasible approach to modeling time-bounded streaming applications while considering multiple resource bottlenecks. In this case, each dimension of the HMM corresponds to the resource bottleneck being considered for each streaming job, and different HMMs can be used to better reflect the nature of the sub-problem. This work has been published in [35].
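As a rough illustration of the lower-layer idea, and not the LMD-HMM itself, the sketch below fits a standard Gaussian HMM over joint (CPU, memory) observations of a single application and decodes its hidden resource states. The use of hmmlearn, the synthetic trace and the two-state assumption are all illustrative choices.

```python
# Minimal lower-layer sketch: a Gaussian HMM over multi-dimensional resource samples.
import numpy as np
from hmmlearn.hmm import GaussianHMM

# Hypothetical (CPU %, memory %) trace for one time-bounded streaming application.
rng = np.random.default_rng(0)
low = rng.normal([20, 30], 5, size=(200, 2))    # light-load regime
high = rng.normal([75, 80], 5, size=(200, 2))   # heavy-load regime
observations = np.vstack([low, high, low])

# Fit a 2-state HMM over the joint (CPU, memory) observations.
model = GaussianHMM(n_components=2, covariance_type="full", n_iter=100)
model.fit(observations)

# Decode the most likely hidden resource states (Viterbi) and inspect the
# transition matrix, which would feed an upper-layer scaling decision.
states = model.predict(observations)
print("Transition matrix:\n", model.transmat_)
print("Last decoded state:", states[-1])
```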

1.5.2 LMD-HSMM

We consider the use of a layered multi-dimensional hidden semi-Markov model (LMD-HSMM) for modeling unbounded streaming applications that run for an indefinite amount of time, to accommodate the changes in behaviour exhibited by such applications. Since duration is often a critical consideration for streaming applications, the use of the LMD-HSMM allows for explicit modeling of duration in our framework. This helps to avert problems that can arise from taking incorrect scaling actions, e.g., shutdown of processing due to lack of resources, or fallout from using flawed or less precise models. This work has been published in [36].
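The key difference from an HMM is that hidden states carry explicit duration distributions rather than implicitly geometric ones. The toy generative sketch below illustrates only that idea and is not the LMD-HSMM learning procedure; the state means, Poisson dwell times and two-regime structure are assumptions.

```python
# Toy generative sketch of explicit state durations (the HSMM idea).
import numpy as np

rng = np.random.default_rng(1)
means = {0: (20, 30), 1: (75, 80)}       # (CPU %, memory %) per hidden state
mean_duration = {0: 40, 1: 15}           # mean dwell time (intervals) per state

def sample_hsmm_trace(n_segments=6, start_state=0):
    """Generate (state, cpu, mem) samples with Poisson-distributed state durations."""
    trace, state = [], start_state
    for _ in range(n_segments):
        duration = max(1, rng.poisson(mean_duration[state]))
        for _ in range(duration):
            cpu, mem = rng.normal(means[state], 5)
            trace.append((state, cpu, mem))
        state = 1 - state                 # alternate between the two load regimes
    return np.array(trace)

trace = sample_hsmm_trace()
print(trace.shape, "samples; first row:", trace[0])
```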

1.5.3 CRAM

We investigate the container resource allocation problem and propose a game-theoretical approach to address the aforementioned challenge. We focus on a competitive setting involving containerized BDSAs and consider maximizing their payoffs. Game theory, a powerful tool for analyzing and predicting outcomes of interactive decision-making processes, is used to derive the unique, efficient and Pareto-optimal states at the Nash equilibrium. The mechanism interacts with BDSAs and periodically maximizes their utility in the presence of load variations to meet their QoS. CRAM helps to establish fairness among competing containerized applications since the game ends at the equilibrium state. This work has been published in [37].
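To give a flavour of the game-theoretic machinery, and not the CRAM formulation itself, the sketch below iterates a simplified best response in which each application re-splits its workload across container-clusters in proportion to the capacity left over by the others. The capacities, arrival rates and the proportional rule are illustrative assumptions.

```python
# Illustrative iterated best-response over container-cluster workload splits.
import numpy as np

capacities = np.array([100.0, 60.0, 40.0])   # service capacities of 3 container-clusters
arrivals = np.array([50.0, 30.0])            # workloads of 2 competing BDSAs
splits = np.array([[1/3, 1/3, 1/3],          # each row: one BDSA's workload split
                   [1/3, 1/3, 1/3]])

def best_response(app, splits):
    """Re-split app's workload in proportion to the capacity left by the other BDSAs."""
    others_load = np.delete(splits, app, axis=0).T @ np.delete(arrivals, app)
    spare = np.maximum(capacities - others_load, 1e-6)
    return spare / spare.sum()

for _ in range(50):                          # iterate best responses until they settle
    for app in range(len(arrivals)):
        splits[app] = best_response(app, splits)

print(np.round(splits, 3))                   # (near-)equilibrium workload splits
```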

1.5.4 CRAM-P

In this contribution, we address the problem of dynamic resource pricing to determine the optimal workload allocation, given the conflicting goals of self-interested BDSAs that can lead to congestion. An incentive-based pricing mechanism is proposed to adjust the optimal workload allocation of BDSAs, which ultimately determines which BDSA will enjoy a shorter delay. The incorporation of incentives induces the BDSAs to behave in a way that aligns their objectives with that of the overall CSB system. This work has been submitted for publication in [38].
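As a simple illustration of how such a price signal can steer demand, and again not the CRAM-P rule itself, the sketch below raises the per-unit resource price when the requested load exceeds a cluster's capacity and lowers it otherwise. The step size, price floor and the multiplicative update are assumptions.

```python
# Minimal sketch of a congestion-based (tatonnement-style) price update.
def update_price(price, requested_load, capacity, step=0.05, floor=0.01):
    """Move the price toward balancing requested load and available capacity."""
    excess = (requested_load - capacity) / capacity
    return max(floor, price * (1.0 + step * excess))

price = 1.0
for load in [120, 110, 95, 80]:              # observed demand per pricing epoch
    price = update_price(price, requested_load=load, capacity=100)
    print(round(price, 4))                   # price rises under congestion, then eases
```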

1.5.5 Publications

These contributions have been published or submitted for publication as follows:

• Cloud resource scaling for big data streaming applications using a layered multi-dimensional hidden Markov model. LMD-HMM has been published in [35].

• Cloud resource scaling for time-bounded and unbounded big data streaming applications. LMD-HSMM has been published in [36].

• CRAM: a Container resource allocation mechanism for big data streaming applica- tions. CRAM has been published in [37].

• CRAM-P: a Container resource allocation mechanism with dynamic pricing for big data streaming applications. This contribution has been submitted for publication in [38].

1.6 Thesis Organization

The remainder of this dissertation is organized as follows:

• Chapter 2 briefly discusses the fundamental concepts of the big data paradigm, cloud computing, stream processing and key technologies used for big data stream processing. It presents the challenges of resource prediction and allocation for BDSAs and existing approaches used. Also, big data migration for efficient resource utilization, trends and future research opportunities are discussed.

• Chapter 3 presents our proposed resource elasticity management and pricing framework.

• Chapter 4 presents our resource prediction system based on the layered multi-dimensional HMM/HSMM (LMD-HMM and LMD-HSMM) prediction techniques.

• Chapter 5 presents a game-theoretical resource allocation mechanism along with the dynamic pricing scheme for BDSAs.

• Chapter 6 concludes this dissertation and discusses future work.

Chapter 2

Background

This chapter presents an overview of the big data paradigm, its sources and lifecycle. It showcases the cloud as a suitable host for big data stream processing, and discusses the key cloud technologies used and big data migration for efficient resource utilization. In addition, major cloud resource elasticity issues — resource scaling and resource allocation — are presented. Some reference models used for the development of the mechanisms proposed in this dissertation are also reviewed.

2.1 Big Data Paradigm

In the past few decades, the continuous increase in computational power has produced an overwhelming flow of data referred to as big data (BD). Big data is not only becoming more available but also more understandable to computers; e.g., Facebook serves 570 billion page views per month, stores 3 billion new photos every month, and manages 25 billion pieces of content [10]. What, then, is big data? Several definitions exist. According to Gartner, “Big Data are high-volume, high-velocity, and/or high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization” [10].


Figure 2.1: Multi-V Characteristics of Big Data.

Originally, IBM and Gartner described big data using three basic characteristics [39]: volume, variety and velocity. The first characteristic is the most dominant. It refers to the massive amount of data collected, processed and analyzed to improve any decision-making process. The gigantic volume of data has been the driving force in developing new forms of data processing applications. More precisely, parallel dataflow graph-based streaming applications, following the MapReduce framework [40], have become the dominant tools for making use of cloud resources to achieve acceptable response times. Furthermore, newer storage technologies have been developed to facilitate big data hosting in the cloud [41]. However, transferring such massive volumes of data to cloud datacenters puts stringent demands on the needed bandwidth, especially if latency is of concern. Newer networking technologies that increase the available bandwidth are becoming a pressing requirement to meet the exponential growth in traffic. Variety refers to the wide spectrum of data types, formats, semantics and representations of big data. Velocity represents the rate at which data is generated, retrieved, processed and analyzed. For example, big data is retrieved for processing either in real time, as a continuous stream of inputs; in near-real time, as a set of records or events in small time intervals; in batches at larger time intervals; or offline as blocks of large files [42]. When real-time processing of data is performed, the impact of velocity on the underlying application becomes critical. Velocity can be related directly to elasticity, bandwidth, service availability and delay.

Other big data characteristics have also emerged gradually in the literature [42], as shown in Fig. 2.1. For example, veracity, viscosity and volatility have also been used to describe big data [39]. Veracity refers to the quality and accuracy of the data obtained, which is mostly affected by the nature of its sources but can also relate directly to the error and loss rates during generation, collection and migration. Viscosity refers to the data being captured and used at an acceptable rate, or with a short time lag from its creation, that deems it useful. This characteristic puts stringent requirements on tighter, more frequent, elastic and more reliable communication between data sources, cloud resources and the consumers. Volatility has to do with the retention policies whereby important data is kept and data that is no longer needed is discarded. This can be a challenge for big data management, as requirements can exceed the storage and security capacities of a single cloud. Business-oriented characteristics, namely validity and value, have also been identified. Validity refers to the suitability of the big data for its intended usage, while value refers to the business value that can be gained from the use of the data. The more valuable and valid the data is, the better opportunities various enterprises have to create new products, improve products' time to market (TTM) and quality, minimize their risks, provide better services, and gain insight into their processes [43]. Next, we identify the various sources of big data.

2.1.1 Taxonomy

Big data can be mapped under five aspects to better understand its characteristics: (i) data sources, (ii) content format, (iii) data stores, (iv) data staging, and (v) data processing [1]. Fig. 2.2 shows the taxonomy of big data.

2.1.1.1 Data Sources

These refer to the wide variety of sources big data can be collected from, which include:

1. Sensor and machine generated: These are large datasets generated by devices used to measure some biological or physical phenomena. The new paradigm, the Internet of Things (IoT) [44], is another source of data in this category. The devices used range from fixed sensors that collect data at a fixed rate (e.g., sensors in weather stations) to mobile sensors that are used for location or behavior tracking with irregular data collection patterns. The velocity of generation and usage of these data is usually very high. For example, wearable devices are used nowadays to collect vital signs at very small intervals. Web logs and telecommunication networks also generate big data related to web access behavior and network traffic, respectively. Various applications (e.g., home automation, traffic reports, and user tracking) rely on the processing of these datasets.

Figure 2.2: Taxonomy of Big Data. Adapted from [1].

2. Human-generated: These are massive, often unstructured, datasets that are generated by users of social networks. Data is collected and stored in geographically distributed locations, usually closer to the end-users. Data in the cloud in this category is characterized by a very high write rate, but analysis and value extraction from these social media datasets are carried out in batches of offline workflows using sequential data access [45]. Nevertheless, critical applications that provide users with up-to-date statistics [45] require that these big datasets be available in a timely manner.

3. Scientific: These sets of data are collected to observe a scientific phenomenon in astronomy, high-energy physics, bioinformatics, etc. [46]. Some of the common features of these big datasets are their extremely large file sizes, which are usually written once and accessed many times [47]. Due to their size, they are usually partitioned and stored using different DCs. Their processing applications are compute-intensive and are described using batch jobs. Each job is submitted as a workflow that is comprised of a graph of dependent tasks. The majority of the applications consuming scientific data are usually delay tolerant and are more concerned with monetary budget constraints [48].

4. Business-related: These are mostly streams of data records generated by computer-mediated transaction processing, handling billions of financial transactions per day. Big datasets in this category are characterized by a very high velocity. The most obvious example is the sales records of big merchants. These big datasets usually follow a common path of extraction, transformation and processing performed on big batches of data. They are then fed into delay-tolerant machine learning and data mining applications. Recently, however, real-time analytics, in addition to some more traditional applications such as fraud detection, require data to be processed as it is being streamed. Such workloads are usually associated with a flow completion time constraint.

2.1.1.2 Content Format

This refers to the structure of big data, which can be in structured, unstructured or semi-structured formats. Structured data (e.g., tabular data stored in databases and spreadsheets) and unstructured binary formats (e.g., videos, movies, images, audio) mark the two ends of this spectrum. In between these two extremes, we have a wide range of data types that are neither raw data nor strictly typed (e.g., genome databases, and data aggregated from different structured data sources), referred to as semi-structured; they do not follow the conventional database system [41].

2.1.1.3 Data Stores

Data stores for big data can be categorized as document-oriented, column-oriented, graph databases or key-value stores. Document-oriented data stores are designed to store collections of documents and support several data formats, e.g., JSON and XML [1]. A column-oriented database stores its content in columns, which differs from the approach of storing data in conventional database systems, i.e., row after row. A graph database is designed to store and represent data that utilize a graph model, e.g., Neo4j [49]. An alternative to the relational database system is the key-value store, which stores data designed to scale to a very large size, e.g., Dynamo [50], [51], and HBase [52], which uses HDFS.

2.1.1.4 Data Staging

Data staging involves the processes of cleaning, transformation and normalization. Cleaning helps in identifying incomplete data, transformation allows the change of data to a form suitable for analysis, while normalization allows the structuring of the database schema to minimize redundancy [53, 54].

2.1.1.5 Data Processing

Big data can be processed either in real-time, batch or as streams [42]. Whilst some big data applications process the arrival of data in batches, some applications require continuous and real-time analysis. Likewise, dataflows arriving via streams need to be analyzed immediately. In the next section, we discuss the phases required for processing big data.

2.1.2 Lifecycle Stages

Big data analytics involves using algorithms running on supporting platforms to unearth the potential concealed in big data, such as hidden patterns. Before any kind of actionable insight can be derived from data using advanced analysis techniques, several preprocessing steps are involved. The high-level phases through which big data is processed basically include data gathering, data management, modeling and visualization [2], as depicted in Fig. 2.3.

Figure 2.3: High-level Phases of Big Data Lifecycle [2].

The first step involves collecting big data from various sources. At the data management phase, data can be directly stored (e.g., in traditional databases or NoSQL storage) or passed through a few stages of data preparation before storage. Examples of these stages include data cleaning, extraction, transformation and integration [55]. Big data storage poses a significant challenge due to its volume; hence, solutions that partition and distribute the data must often be employed. The unique characteristics of big data and the continuous emergence of new data sources make traditional online analytical processing (OLAP), SQL-based tools and pre-packaged statistical tools unsuitable for processing [56]. Other operations such as data visualization can also take place at this phase. The main challenge in this step is the need for computing platforms that can scale well with respect to the size of the data and its growth rate [40]. The last phase focuses on the use of the developed tools and processes in order to gain insights from the collected data, extract business value and support decision-making processes.

Based on the processing time obligation, big data analytics can be categorized into stream or batch processing. On the basis of analytic depth, it can be broadly classified into three categories: 1) descriptive analytics, which exploits historical trends to extract useful information from the data, 2) predictive analytics, which focuses on predicting the future probability of occurrence of patterns or trends, and 3) prescriptive analytics, which focuses on decision making by gaining insights into the system behavior [57].

2.1.3 Services

Various services can be offered by clouds to big data during its lifecycle. The three main service models, as identified by NIST [58], are platform as a service (PaaS), which provides tools and resources for the development, testing and management of applications; software as a service (SaaS), which provisions various applications on demand to cloud consumers; and infrastructure as a service (IaaS), the basic service model where a virtual network of VMs and storage units communicates over virtual infrastructures.

Big data users can use the IaaS service to create their own virtual infrastructures in order to gain full control over various network, storage and software management functionalities (e.g., the user's own networking protocols, scheduling of VM jobs and data placement). Additional cloud services are also offered at this layer, such as network-as-a-service [59], which provides traffic engineering services to big data traffic. Rather than renting network resources, transfer-as-a-service for big data [60] advocates the idea of delegating the burden of data transfers from the users to the cloud providers. To provide other basic services to big data processing, database-as-a-service [61], big data-as-a-service and storage-as-a-service [62] are all necessary tools to enhance the efficiency and reduce the cost of storing and managing big data in the cloud. Storage-as-a-service provides customers the illusion of unlimited storage by seamlessly storing big data across multiple DCs. Replication-as-a-service helps users reduce the latency of data access and increase access reliability [63]. For big data users with a focus on developing big data applications, PaaS offerings are provided [64]. Cloud-based analytics-as-a-service was proposed by the authors in [65] as a platform to provide on-demand OLAP and mining services to cloud consumers. Data programming platforms (e.g., Hadoop, Dryad, Hive and Pig) [41] and similar platforms are commonly used now by programmers to describe their workflow-based jobs (e.g., a data mining operation) through user interfaces [64]. Finally, SaaS offerings create value and knowledge from big data. Examples of such services include IoT applications, knowledge extraction, data analytics and business intelligence software tools [66]. In addition, identity-, policy management- and big data security-as-a-service can be used to facilitate user authentication and role assignments as well as ensure big data security.

2.1.4 Challenges

Big data offers substantial value to organizations willing to adopt it, but it also poses a considerable number of challenges for the realization of such added value. Research on big data still remains in its early stages despite its acceptance by many organizations, and new issues continue to emerge as existing challenges are being addressed. The issues that affect big data can be grouped into storage and transport, processing, and management issues [67].

• Storage and Transport Challenges: These challenges relate to the data characteristics discussed in Section 2.1. The volume of data generated today has exploded, largely due to social media; moreover, data is now being created by everyone and everything. Current disk technology limits pose a challenge, and transferring Exabytes of data from one access point to another takes a long time.

• Processing Challenges: These are challenges encountered when processing big data throughout its lifecycle as discussed in Section 2.1.2, i.e., from capturing the data to presenting the results. Traditional data processing and storage approaches were designed in an era when hardware, storage and processing requirements were very different from what they are today; thus, those approaches face many challenges in addressing big data demands.

In the big data community, MapReduce [40] has been identified as one of the key enabling approaches for meeting the continuously increasing demands on computing resources imposed by massive data sets. MapReduce is a highly scalable, fault-tolerant and simple programming paradigm capable of processing massive volumes of data through parallel execution on a large number of commodity computing nodes [68] (a minimal illustration of its map and reduce phases is sketched after this list). Recently, due to the real-time needs of organizations, various SPEs are being introduced.

• Management Challenges: These are challenges encountered while accessing, managing and governing big data. These include security, privacy, data governance, ownership and cost [67]. Scalability and availability have also been identified as critical aspects of managing big data.
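To make the MapReduce paradigm referenced above concrete, the following minimal Python sketch mimics the map, shuffle and reduce phases of a word-count job over an illustrative in-memory dataset; it is framework-independent, whereas real deployments such as Hadoop distribute these phases across many commodity nodes.

    from collections import defaultdict

    def map_phase(document):
        # Emit an intermediate (key, value) pair for every word.
        for word in document.split():
            yield (word.lower(), 1)

    def reduce_phase(key, values):
        # Aggregate all values emitted for the same key.
        return (key, sum(values))

    documents = ["big data streams", "streams of big data"]  # illustrative input

    # Shuffle: group intermediate pairs by key (done by the framework in practice).
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_phase(doc):
            groups[key].append(value)

    word_counts = [reduce_phase(k, v) for k, v in groups.items()]
    print(word_counts)  # e.g., [('big', 2), ('data', 2), ('streams', 2), ('of', 1)]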

In the next section, we briefly provide an overview of cloud DCs as host for big data.

2.2 Cloud Datacenters

A cloud comprises multiple datacenters (DCs) located in distributed geographical locations. A DC can be viewed as a collection of compute resources (e.g., storage and servers) that are connected through networking devices (e.g., routers and switches). DC networks are in turn connected to Internet service providers (ISPs) to provide access to end-users. Fig. 2.4 shows a conventional datacenter network topology.

Figure 2.4: Datacenter Network Topology [3].

The most commonly used topology in DCs is the multi-rooted tree topology; the lowest tier of the tree consists of Top-of-Rack (ToR) switches that are directly connected to the DC servers, storage devices, database servers, etc. In larger DCs, the servers might be connected to a hierarchy of aggregation switches (AS). Within the DC, a ToR switch may then be connected directly, or through one or more layer-two switches, to border routers in order to communicate with access networks or other DCs. Such a topology has proven to be inefficient for big data processing due to the high communication rate between pairs of VMs, whether within the same rack or across different ones. To overcome this limitation, several other topologies have been adopted to mitigate the congestion of the links connecting to the top tiers and to increase DC fault tolerance. For example, communication path redundancy, congestion reduction and reliability can be achieved by Clos networks, FatTrees and hyperCube-based topologies [69]. A full survey on DC architectures is presented by the authors in [69]. Ethernet switches are commonly used in DC networks with link capacities in the range of 1-10 Gbps (i.e., smaller capacities for links connecting servers and higher capacities for those connecting ToR switches), with the promise of future bandwidths of 40-100 Gbps [69, 70]. Nonetheless, optical switching has also been adopted within DCs to produce either a hybrid or a purely optical DC [70]. Helios and C-through [71] are two examples of the former; in Helios, optical core switches are used mainly to connect ToR switches.
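As a rough illustration of why such multi-tier topologies scale, the following Python snippet computes the capacity of a fat-tree built from identical k-port switches. The formulas follow the standard k-ary fat-tree construction and are given here only for illustration; they are not taken from the thesis.

    def fat_tree_capacity(k):
        """Capacity of a k-ary fat-tree built from identical k-port switches."""
        assert k % 2 == 0, "port count k must be even"
        hosts = (k ** 3) // 4               # k pods x (k/2 edge switches) x (k/2 hosts each)
        edge = aggregation = k * (k // 2)   # per-layer switch counts across all pods
        core = (k // 2) ** 2
        return hosts, edge, aggregation, core

    # Example: 48-port switches support 27,648 hosts.
    print(fat_tree_capacity(48))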

2.2.0.1 Intra-clouds DC

The main DC network management functionalities include efficient traffic engineering (TE) (e.g., admission control, rate limiting, load balancing and traffic prioritization) for flows between pairs of VMs, handling the migration of data within servers, and routing the resulting traffic among these VMs to and from the DC. Orchestration of these functionalities is carried out through interactions of various protocol layers along with the execution of various TE mechanisms, as will be discussed in detail in Section 2.3.4.

2.2.0.2 Inter-clouds DC

The inter-cloud DCs are interconnected through wide area networks (WANs) [72], either to form one cloud if they are owned by the same provider or to construct a federation of clouds of cooperating providers [73]. The WANs may be privately constructed and owned (e.g., Google's B4 [72]). They can also be dynamically created on demand over the Internet backbone through virtual networks [59]. Recent trends also use software defined network (SDN) technologies to create these WANs [74]. In all cases, the traffic within the

DC is aggregated by multiple edge routers before it leaves the DC. With current optical network technologies, these WANs can reach transfer rates in the range of 10 Gbps and up to Tbps [75]. Long-lived flow transfers between DCs can range from a few Terabytes to Petabytes, with deadlines on the order of a few hours to a couple of days.
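To put these volumes and deadlines in perspective, a back-of-the-envelope calculation (illustrative numbers only, ignoring protocol overhead and congestion) shows how long a bulk inter-DC transfer takes at a given sustained rate:

    def transfer_time_hours(volume_terabytes, rate_gbps):
        """Ideal transfer time, ignoring protocol overhead and congestion."""
        bits = volume_terabytes * 1e12 * 8          # decimal terabytes to bits
        seconds = bits / (rate_gbps * 1e9)
        return seconds / 3600

    # A 100 TB dataset over a sustained 10 Gbps path needs roughly 22 hours,
    # while 1 PB at the same rate takes over 9 days.
    print(f"{transfer_time_hours(100, 10):.1f} h")            # ~22.2 h
    print(f"{transfer_time_hours(1000, 10) / 24:.1f} days")   # ~9.3 days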

2.2.1 Hypervisor and Container-based Virtualization

Virtualization techniques in the cloud have been shown to fully abstract logical resources from the underlying physical host resources. Virtualization improves agility and flexibility and reduces cost [76]. It enables multi-tenancy, i.e., it allows multiple instances of virtualized applications (tenants) to share a physical server. It is the key enabler of DC functionalities, used in DCs and cloud environments to decouple applications from the hardware they run on. The basic component of virtualization is the virtual machine monitor (VMM), i.e., the hypervisor. The VMM is a software-abstraction layer that enables the partitioning of the underlying hardware platform into one or more virtual machines (VMs) [77]. VMMs provide strong isolation between VMs, which enables multiplexing of multiple VMs over the same hardware. Different types of resources can be virtualized in the cloud, such as servers, storage and networks. There are two types of virtualization: hardware-level (hypervisor) virtualization and operating system (OS) level virtualization. Hypervisor virtualization consists of using a VMM on top of a host operating system (OS) to provide a full abstraction to the VM. In this case, each VM has its own OS that executes completely isolated from the others, allowing multiple different OSs to be executed on a single host. Hypervisors enable system administrators to optimize the use of available resources and confine individual parts of the application infrastructure [78]. VMware ESX1, Xen2, and Hyper-V3 are examples of hypervisors. With recent developments around Docker [79, 80], Kubernetes [81] and LXC [82, 83], a lightweight alternative to hypervisors has emerged, i.e., container-based virtualization. While hardware-level virtualization is one of the most popular virtualization technologies for deploying, packaging, and managing applications today, OS-level virtualization has surged in interest and deployment due to its promise of low cost and improved performance [31]. OS virtualization virtualizes resources at the OS level by encapsulating OS processes and their dependencies to create containers, which are managed by the underlying OS kernel [84]. "A container is a collection of operating system kernel utilities configured to manage the physical hardware resources used by a particular application component" [85]. OS virtualization partitions the host's resources, creating multiple isolated instances on the same OS [84]. Fig. 2.5 provides the key architectural differences between hypervisor-based and OS-level virtualization.

1 https://www.vmware.com/   2 https://xenproject.org/   3 https://docs.docker.com/machine/drivers/hyper-v/

Figure 2.5: Comparison of Container-based and Hypervisor-based Virtualization

Compared to hypervisor-based virtualization, containers impose less overhead for resource virtualization since they share the OS kernel of the physical machine. One of the core container technologies is the cgroup, a mechanism for grouping tasks and limiting their resource usage. Until now, most research has discussed the elasticity of stream processing systems in VM-based clouds [31]. Since containers provide near-native performance and millisecond-level startup times, they are suitable for elastic stream processing systems, which require fast and efficient resource provisioning [31]. Also, through nesting, one or more containers, i.e., a container-cluster, can be encapsulated inside a VM [86], and applications running in containers are then able to benefit from the security and performance isolation provided by the VMs. While this approach provides an opportunity for reusing some existing hardware virtualization-based architectures for resource provisioning and management, some solutions may not be suitable when dealing with multiple containerized big data streaming applications. Also, in comparison to deploying containers on a physical host, a recent study by the authors in [83] found that the deployment of containers within VMs imposes additional performance overhead and adds no extra benefit. Container-clusters can be managed using solutions like Mesos4 and Kubernetes5 that provide applications with APIs for resource management and scheduling across cloud environments [86].

4 https://mesos.apache.org/   5 https://kubernetes.io/

Kubernetes is an open-source technology for automating the deployment, operation, and scaling of containerized applications [85]. Today, most cloud service providers, e.g., Amazon, Azure, Google and IBM, have extended their offerings to provide containers as a service (CaaS), enabling robust container-based platforms to build, deploy or host applications in a lightweight environment. While most research on resource management in the cloud has focused on hypervisor-based environments, the emergence of OS virtualization has reopened a number of issues related to resource management in container-based clouds. To deploy BDSAs on containers, the microservices approach is used. A microservices-based BDSA involves the interoperation of multiple microservices. Each microservice acts as a standalone application or component that can be developed, deployed and updated separately, in contrast to the traditional monolithic development of applications. The ability to independently update and redeploy the code base of one or more microservices increases an application's scalability, portability, updatability, and availability, but at the cost of expensive remote calls and increased overhead for cross-component synchronization [85]. Container technologies such as Docker can support multiple and heterogeneous microservices in a containerized environment. However, this can lead to unexpected interference and contention among microservices. Thus, how to deploy microservices in order to balance resource consumption and performance is a critical issue [85]. Also, tracking the performance of microservices in a containerized environment can be very challenging. Most available tools typically monitor CPU, memory, and filesystems. Monitoring frameworks such as Nagios6 and Ganglia7 monitor hardware metrics (cluster, CPU, and memory utilization) and cannot provide microservice-level performance metrics; therefore, new monitoring frameworks are required. Some newer tools to monitor microservices are available from providers such as Sysdig8 and Datadog9. Estimating microservice workload behavior in terms of request arrival rate can be a tedious task when dealing with multiple users with different types and mixes of microservices. In addition, devising microservice-specific workload models to accurately learn and fit statistical functions of the monitored distributions is also challenging. The aforementioned challenges can be attributed to making decisions about the types and amount of resources to provision at a given time, combined with the unpredictability of the availability, load, and throughput of resources. Many cloud service providers utilize rule-based reactive approaches to scale resources for containers, e.g., Amazon AutoScale10. Some container schedulers (e.g., Kubernetes11) also scale by observing resource usage.

6 https://www.nagios.org/   7 http://ganglia.sourceforge.net/   8 https://sysdig.com/   9 https://www.datadoghq.com/   10 https://aws.amazon.com/autoscaling/   11 https://kubernetes.io/
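The rule-based reactive scaling mentioned above can be illustrated with a minimal, framework-agnostic Python sketch; the thresholds, replica bounds and the assumption that a monitoring system supplies an average CPU utilization are hypothetical placeholders and do not correspond to any specific AutoScale or Kubernetes API.

    def reactive_scale(current_replicas, cpu_utilization,
                       scale_out_threshold=0.80, scale_in_threshold=0.30,
                       min_replicas=1, max_replicas=20):
        """Return the new container replica count for one control-loop iteration."""
        if cpu_utilization > scale_out_threshold:
            return min(current_replicas + 1, max_replicas)   # add a container
        if cpu_utilization < scale_in_threshold:
            return max(current_replicas - 1, min_replicas)   # remove a container
        return current_replicas                               # demand within band

    # Example: 5 replicas running at 85% average CPU utilization trigger scale-out.
    print(reactive_scale(5, 0.85))  # -> 6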

2.2.2 Cloud DCs as Host for Big Data

Today, we live in a world where governments and various business organizations are overflowing with data generated from Internet search, social media, mobile devices, the Internet of Things and business transactions, among others. Traditional data processing technologies are unable to process this data within a tolerable elapsed time. Moreover, the datasets generated do not fit into traditional data processing frameworks and are usually generated from multiple distributed data sources. Due to these characteristics, big data represents a new paradigm of data management (collection, processing, querying, data types, and scale) that is not well served by traditional data management systems. Consequently, as various sectors continue to use larger and larger data warehouses for their ever-increasing data processing needs, the performance requirements continue to outpace the capabilities of the traditional approaches, hence the need for cloud DCs. To process data as they arrive, elastic and virtualized cloud DCs use data processing frameworks that can continuously process high-volume and high-velocity data streams. This development has attracted individuals and organizations, who are increasingly moving their data to the cloud for higher reliability, security and cost benefits. Cloud services are a natural fit for processing big data streams because they allow data processing frameworks and algorithms to run at the scale required for handling uncertain data volume, variety, and velocity. Cloud computing promises on-demand access to affordable large-scale resources with no substantial upfront investment. Also, it offers a means of meeting the performance and scalability requirements of organizations by adding agility to the data management infrastructure. Cloud users can thereby benefit from the pay-as-you-go model and adaptable resources. Furthermore, it frees organizations from the need to purchase unneeded resources. Several other motivations for using the cloud computing model include ease of use, elasticity, reliability, flexibility, performance and security, among others [87]. Cloud computing allows workloads to be deployed quickly through rapid provisioning of virtualized resources [88]. The main deployment models of the cloud are the public, private, community and hybrid clouds.

On the one hand, a private cloud (e.g., Microsoft's DCs), which provides the highest level of security, is deployed over private networks and can only be accessed by subscribed users. On the other hand, a public cloud, such as AWS, is deployed over the Internet or rented network resources. In between these two deployment models is the hybrid cloud, which can be comprised of both private and public resources. Additionally, community clouds serve organizations that have common management goals [58]. Acquiring computational resources as commercial services in a pay-per-use model according to the cloud stack (SaaS, PaaS, IaaS) is one of the key promises of cloud computing. Each of these models provides a different view to users of the types of resources available and how they can be accessed. For instance, in the IaaS model, a user's view is restricted to the operating system, whereas the PaaS model provides users with an environment for deploying their applications. The provisioned resources are abstracted using virtualization technology and deployed to customers in the form of VMs or containers. Sharing the underlying physical cloud infrastructure allows for workload consolidation, which leads to better system utilization and energy efficiency [4]. While most cloud service consumers enjoy the flexible and cost-effective services cloud computing provides, meeting SLA, scalability, and deadline constraints while achieving cost-effectiveness can be challenging. Hence, there is a need for frameworks that can effectively address the challenges plaguing resource usage in the cloud. Such frameworks, tailored to serve application requests, should i) be capable of handling different types of applications, ii) perform resource management tasks intelligently while meeting the QoS, and iii) adapt to fluctuating resource requirements posed by various applications. Big data streams can be processed using cloud infrastructures, which depend on communication between virtual machines (VMs) or containers. Complicating cloud resource elasticity is the fact that most CSBs host BDSAs on massive geo-distributed datacenters (DCs), which involves huge data migrations across the underlying network infrastructure, leading to long migration delays and costs; data transfer over the network becomes the bottleneck in the process [89]. While most efforts have been devoted to designing better resource management solutions for big data processing, big data migration for efficient resource utilization is an issue that has been largely left out. Therefore, there is a need for timely and cost-minimizing frameworks to address the big data migration challenge faced by CSBs.

2.3 Big Data Migration over Clouds

In this section, we discuss research contributions made on cloud big data migration in order to put into perspective a more holistic view of its challenges, existing solutions and effects on efficient CSB resource elasticity management. Big data migration to clouds was initially driven by the scalability and computational performance limitations of traditional infrastructures and by the elastic, low-cost resources provided by the cloud. However, the use of cloud infrastructures comes with other challenges, which include reliability, accessibility, cost reduction, result timeliness and outcome accuracy. The multitude of challenges encountered has ushered in numerous and diverse research contributions from both industry and academia. These contributions cover a wide spectrum of solutions such as developing newer protocols at various communication layers, the adoption of software defined networks (SDN), in-network big data processing, and opportunistic store-and-forward routing.

2.3.1 Requirements

Big data movement over clouds can be separated into three levels: (i) big data movement to the cloud from external sources, (ii) intra-DC transfers, and (iii) inter-DC transfers, either through the communication of individual VMs at different DCs or through interaction between peer DCs [90]. The types of traffic that occur at these levels can be classified as follows:

• Streaming data: These transfers involve sending data as continuous real-time streams without the start and stop modes [91]. Streaming of data can use single or parallel streams in order to increase the transfer rate.

• Bulk data traffic: These transfers are delay-tolerant and have no explicit deadlines, or have deadlines that span a few days. They can be transfers between users and the clouds or between pairs of DCs [92]. They can also result from maintenance operations such as migration and data replication for load balancing.

• Online batch data transfers: These transfers are much smaller in volume than bulk transfers and are mostly related to MapReduce-like applications’ operation [40].

• Live VM migration traffic: This operation is triggered by the DC management system and it is crucial to the administrative operations of the DCs. It allows the DC to free additional resources or perform maintenance operations on the hosting server [93].

• Big data replication and migration traffic: This type of traffic is generated by replication and migration triggers; e.g., Google and other DC operators hosting cloud-computing applications need to replicate and synchronize data across DCs [94]. Big data migration helps cloud service providers improve redundancy and increase the reliability of their DCs to guard against site failures.

The need for migrating big data over clouds can be addressed from both the CSBs’ and CSPs’ perspectives.

2.3.1.1 CSBs’ Perspective

To a CSB, the following requirements are critical when migrating big data over clouds.

• Sufficient flow transmission rates: This is the most common requirement across all traffic patterns. However, it is usually traded off against a decrease in service cost.

• Latency: This is the time delay experienced in migrating and processing data. It can result from many factors such as the distance involved, the routing path and the different technologies used in the transfer.

• Cost: Cost is a key factor in big data transfer, considering the geo-dispersion of users, the geo-distribution of DCs, and the characteristics of the data involved. Attempts to find a balance between the various elements that affect big data transfer may result in added cost; e.g., the use of more cloud resources can reduce transfer delays but leads to a higher cost for the consumers.

• Reliability: Storing big data sets across multiple DCs increases their access reliability but eliminates data locality and adds additional bandwidth needs between these DCs. Hence, a good balance between these two factors must be achieved.

• Elasticity: This parameter refers to the application’s ability to temporarily consume additional or fewer resources dynamically without incurring any delay.

• Optimal placement of big data: This must be carried out by the CSB in order to meet users’ specified business rules and policies.

2.3.1.2 CSP’s Perspective

From a CSP’s perspective, requirements to be met when migrating big data include the following:

• Optimal resource utilization: This is the main objective of any service provider, which in turn leads to increased profit. It can be achieved through good cross-layer orchestration of various network and application functionalities.

• Scalability: This reflects the ability of the provider to employ the resources in serving a large number of big data flows without experiencing any deterioration in the offered service performance.

• Ease of solution deployment and maintenance: Newly introduced protocols, algorithms and tools must be easily deployable at the large scale needed within DC networks and must be compatible with existing ones.

• Ease of management: This requirement refers to the need for automation of resource configurations and dynamic adaptation.

• Reduced operating cost: Energy optimized solutions are needed in order to reduce the cost of operating the cloud infrastructures.

Next, we discuss the research progress made in addressing the challenges of migrating big data over clouds, which can be classified into networking and non-networking approaches.

2.3.2 Non-networking Approaches

In this section, we examine various non-networking solutions, i.e., application strategies and big data management approaches that target the efficient migration and movement of big data over clouds while satisfying the aforementioned requirements of cloud users and providers. These approaches include the use of physical means to transfer data, volume reduction and data placement.

• Shipping vs Internet: Some research efforts [48, 95] have focused on investigating the trade-offs between physical big data transfers using hard drives and the use of online services provided by clouds. Gorcitz et al. [96] also focused on delay-tolerant big data transfers between end-users and the cloud. They leveraged the storage capacity and communication capabilities of future smart vehicles in order to piggyback massive data that needs to be transferred between two locations. While performance results of data transfers between two cities showed a significant speedup compared to Internet-based techniques, the proposed work is very difficult to implement in real-life scenarios. Moreover, the scheme depends on several unpredictable parameters such as weather and traffic conditions.

• Application and Big Data Adaptation: This stream of research is motivated by the asymmetry between the fast growth rate of big data sizes and the slow growth of the bandwidth available in current network technologies. It involves the use of compression algorithms to compact the data before transfer [97] and later decompress it when needed. This approach was also used by the authors in [98] to optimize the input/output issues involved in big data transfer. Data deduplication is another means of reducing the big data volume before transfer [99]; it reformats the data into a more condensed form while allowing only the necessary information to be transferred.

The main limitation of these approaches is that they require extensive compute capabilities on the source side. In addition, the majority of these schemes require profound knowledge of the transferred data types. Furthermore, compression decisions are still complex, i.e., the system must decide the granularity of compression (e.g., database records, blocks, a single file or a set of files) and the appropriate compression algorithms.
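The volume-reduction idea can be sketched with Python's standard zlib module applied to an illustrative, repetitive payload; as noted above, in practice the compression granularity and algorithm must be chosen per data type, and the CPU cost is paid at the source.

    import time
    import zlib

    payload = b"sensor_id=17,temp=21.4,ts=1546300800\n" * 200_000  # illustrative records

    start = time.perf_counter()
    compressed = zlib.compress(payload, 6)          # compress before transfer
    elapsed = time.perf_counter() - start

    ratio = len(payload) / len(compressed)
    print(f"original={len(payload)} B, compressed={len(compressed)} B, "
          f"ratio={ratio:.1f}x, cpu={elapsed * 1000:.1f} ms")

    assert zlib.decompress(compressed) == payload   # receiver restores the data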

2.3.3 Networking Approaches

In this section, we classify and analyze the main networking contributions for optimizing big data migration across the protocol layers, i.e., the application, transport and network layers.

2.3.3.1 Application Layer Protocols and Tools

Conventional protocols at this layer such as the file transfer protocol (FTP) and client-server HTTP fall short of meeting the needs of large-scale data transfers in terms of the needed reliability and synchronization with existing data. Research efforts have either adapted these tools to suit big data transfers or have opted for developing newer protocols. Examples of proposed tools include GridFTP [100], Globus Online [101–103], BitTorrent12, JetStream [104], Fast and Secure Protocol (FASP)13, CloudStork [105], BBCP [106], Flexible Automated Synchronization Transfer (FAST) [107] and Genomic Text Transfer Protocol (GTTP) [108].

12 www.bittorrent.com   13 https://asperasoft.com/technology/transport/fasp/

2.3.3.2 Transport Layer Protocols

Today, most data transfers within the DC rely on the transmission control protocol (TCP). Unfortunately, TCP throughput collapse is frequently experienced in high-bandwidth, low-latency DC networks. In order to enhance the performance of TCP to suit big data application needs, the DC TCP protocol (DCTCP) [109] has been proposed, but it fails to consider deadline-sensitive flows as well as flow prioritization. To address this limitation, flow deadlines and congestion avoidance have been simultaneously considered in the Adaptive-Acceleration DC TCP protocol (A2DCTCP) [110]. Several research efforts have also focused on increasing the transfer throughput of TCP sessions, either for individual files or for the overall utilization of the network, through the adjustment of the parameters of several transfer operations, i.e., pipelining, parallelism and concurrency. In general, the majority of the aforementioned techniques either have performance issues or require significant modifications of the protocol at the sender, the receiver or both sides, or in the network switch hardware. Alternatives to TCP such as UDP-based data transfer (UDT) [111] have also been introduced in the literature. In contrast to traditional TCP, UDT supports extremely high-speed networks and can be used to transfer bulk datasets in excess of Terabytes. However, UDT suffers from extreme overhead requirements and can induce significant delays in concurrent TCP transmissions. Finally, it is worth noting that recent trends in DCs advocate the use of Remote Direct Memory Access (RDMA) and message passing over RDMA-ready networking technologies (e.g., InfiniBand and RDMA over Converged Ethernet (RoCE)). These technologies are clearly able to scale to big data needs but require expensive hardware support as well as, in the case of RoCE, a congestion-free layer-two circuit in order to work well. They also require significant modifications in the applications in order to make full use of this technology's potential [112].

2.3.3.3 Network Level Protocols

In contrast to the application and transport layer approaches, research efforts at the network layer are mostly implemented by the CSP rather than by individual users (e.g., a CSB) and focus mostly on the aggregated traffic between DCs. For example, Zhang et al. [113] focus on optimizing the monetary cost of transferring data to the cloud in order to be processed using MapReduce-like frameworks. Similarly, the authors in [60] consider big data transfers between clouds in a federation of cloud providers; however, the proposed work did not discuss the details of how fairness among the users' transferred datasets can be achieved.

2.3.4 State-of-Art Networking Protocols

Links connecting the DCs are reportedly underutilized, with average utilization levels of 40-60% [114]. This problem is a direct result of the bursty nature of the traffic between the DCs. To overcome these problems, network virtualization, software defined networks (SDN), in-network processing, and store-and-forward (SnF) nodes have been used. These technologies are discussed below.

2.3.4.1 Network Virtualization

Current research efforts focus on intra-DC network virtualization to provide bandwidth sharing between tenants in a single data center. For example, the authors in [115] created a VN topology for each MapReduce workflow and then tried to concurrently map the VMs and links to the DC network with the objective of minimizing the data transmission latency and maximizing the data processing rate. Another bandwidth allocation technique, termed Seawall [116], performs end-to-end proportional sharing of the network resources according to a weight associated with each traffic source, using congestion-controlled tunnels. Similarly, Lam et al. [117] proposed NetShare, a statistical multiplexing mechanism that proportionally allocates bandwidth to the tenants. Unlike Seawall, NetShare shares link bandwidth among tenants rather than among sender virtual machines, to provide a fair allocation among different services.

2.3.4.2 Software Defined Networking (SDN)

The main idea behind SDN is to provide a user-friendly programming interface to dynamically control the run-time configuration of the underlying network. This is achieved through the decoupling of the control plane (i.e., the messages needed to configure the various network devices) from the data plane (i.e., the paths that actual data flows take in the network). OpenFlow [72, 118] is the most commonly used implementation of SDN; it relies on configurable flow tables in each network device that contain a set of instructions on how to forward a received data packet. These tables are configured by a centralized SDN controller at run time, mostly upon receiving the first packet of a flow. Recently, research efforts have explored the suitability of SDNs for carrying big data flows within the DC. For example, Wang et al. [119] focus on MapReduce applications and discuss the potential of using application-level information, including the expected communication pattern and the traffic demand matrix, to configure the network at runtime. SDNs are also emerging as a new technology for inter-DC communication. For example, Google's B4 WAN [72], using OpenFlow, is an SDN employed to connect its DCs with a promise of 70-100% utilization of its links. Similarly, Microsoft's SWAN [114] creates an SDN to enable its inter-DC links to carry more traffic, taking into consideration higher-priority services and fairness among similar services. The main challenge of employing SDNs for big data is the volume and speed of the serviced data flows as well as the number of concurrent flows.
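The match-action behaviour of an OpenFlow-style flow table can be sketched as a simplified in-memory model; this is not the actual OpenFlow or controller API, and the field names, priorities and port identifiers are purely illustrative.

    # Each flow entry: (priority, match fields, action). Higher priority wins.
    flow_table = [
        (200, {"dst_ip": "10.0.2.15", "tcp_dst": 9092}, "output:port3"),
        (100, {"dst_ip": "10.0.2.15"},                  "output:port2"),
        (0,   {},                                        "send_to_controller"),
    ]

    def lookup(packet):
        """Return the action of the highest-priority entry whose fields all match."""
        for priority, match, action in sorted(flow_table, key=lambda e: e[0], reverse=True):
            if all(packet.get(field) == value for field, value in match.items()):
                return action
        return "drop"

    # The first packet of an unknown flow falls through to the controller,
    # which would then install a more specific entry at run time.
    print(lookup({"dst_ip": "10.0.2.15", "tcp_dst": 9092}))  # output:port3
    print(lookup({"dst_ip": "192.168.1.1"}))                 # send_to_controller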

2.3.4.3 In-network Data Processing

In addition to the above approaches, newer research directions apply processing functionalities to the data along its network path in order to reduce the size of flows as they traverse the network. Research in this direction focuses on in-network processing of big data in order to reduce the overall bandwidth needed, which is carried out by developing various operations that modify packet contents during their trip through the network. The main trade-off in these approaches is the difficulty and cost associated with implementing, deploying and maintaining application-specific functionalities that are customized for different flows along the network.

2.3.4.4 Store and Forward Techniques

Temporarily storing big data in network nodes along its way to its destination and only opportunistically sending data between these nodes is a technique usually referred to as water filling or store and forward (SnF). Approaches in this category can help overcome network congestion during peak hours. However, implementing these approaches is not straightforward due to the varying arrival deadlines of different data transfers, the cost incurred by temporarily storing the data, and the difficulty of predicting the needs of newly arriving transfers that might require a higher transfer priority than already stored data. One of the earliest SnF approaches is NetStitcher, introduced by Laoutaris et al. [93, 94]. Postcard [120] extends NetStitcher's time-expanded graph to consider multiple bulk data transfers, but incurs a higher computational complexity to schedule these transfers. The success of these techniques hinges upon having the right proportion between transfers that are delay-tolerant and the more delay-sensitive traffic. Therefore, they are mostly realized for traffic that spans DCs in different time zones and, hence, with different traffic patterns in each zone.

2.3.4.5 Traffic Engineering (TE)

Rate limiting and admission control are the most used TE mechanisms for controlling the behavior of big data migration. In general, rate limiting can take place at different levels: at the flow, VM or job level. In [75], the authors relied on prediction of the high-priority big data traffic between each pair of data centers. They used bandwidth reservations in future time blocks to serve long-lived flows and employed online temporal planning to appropriately pack these transfers. While the scheme aimed at minimizing the fraction of unsatisfied transfers, it did not guarantee any deadlines. In Amoeba [121], both admission control and per-flow rate limiting were employed to serve new incoming transfer requests; the system employed opportunistic rescheduling of existing transfers with the objective of meeting the transfer deadlines of all flows. In [122], the authors developed MicroTE, an intra-DC TE mechanism that focuses on isolating predictable traffic flows from those that cannot be predicted.
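Flow- or VM-level rate limiting is commonly realized with a token-bucket policer; the following self-contained Python sketch uses illustrative rates and performs no real network I/O, but shows the basic mechanism of admitting traffic at a sustained rate while allowing bounded bursts.

    import time

    class TokenBucket:
        """Admit traffic at a sustained rate with bounded bursts."""
        def __init__(self, rate_bytes_per_s, burst_bytes):
            self.rate = rate_bytes_per_s
            self.capacity = burst_bytes
            self.tokens = burst_bytes
            self.last = time.monotonic()

        def allow(self, packet_bytes):
            now = time.monotonic()
            # Refill tokens proportionally to the elapsed time, up to the burst size.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if packet_bytes <= self.tokens:
                self.tokens -= packet_bytes
                return True        # forward the packet
            return False           # drop or queue the packet

    limiter = TokenBucket(rate_bytes_per_s=1_250_000, burst_bytes=64_000)  # roughly 10 Mbps
    print(limiter.allow(1500))  # True: the burst allowance covers the first packets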

2.3.5 Optimization Objectives

In the last two sections, we examined existing approaches with respect to the protocol layers at which they function, with the aim of coordinating their behaviour. In this section, we analyze a number of research contributions in terms of their targeted objectives (e.g., service time reduction, cost minimization and increased performance). These objectives are heavily interdependent and sometimes contradictory; we aim to highlight these interdependencies.

2.3.5.1 Time Reduction

Big data transfer time increases as the volume of data to be transferred increases; hence, service time reduction becomes a high priority for various mechanisms. Most often, to reduce the time involved in moving big data, policies are applied through approaches such as scheduling and resource allocation. Schemes that target flow completion time need to consider several other factors such as fairness, deadline variations, and the life span of different flows. Also, the additional resources employed to optimize the transfer time increase the cost of such transfers and may not result in the optimal use of the underlying resources.

2.3.5.2 Performance Improvement

A large number of research efforts have focused on improving the performance of big data transfer at various protocol layers. For instance, multi-hop path splitting and multi-path transfers are solutions that employ multiple independent routes to transfer data [123, 124]. The parallelism mechanism employed in multi-hop and multi-path systems has been used successfully to scale data transfers, but it may introduce bottlenecks from local constraints such as over-tasked CPUs and low disk I/O speeds [123]. Cross-layer optimization, introduced by the authors in [90], involves using information from different layers for efficient decision-making. In addition, issues encountered when transferring big data to the cloud can be addressed by in-network data optimization techniques [125] such as compression, deduplication, caching and protocol optimization. Compression [98] and data deduplication are used to reduce the size of the data before transfer.

2.3.5.3 Cost Minimization

Research work in this area focuses on cost-minimization strategies. Big data transfer between user locations and the cloud incurs considerable cost, which can translate into millions of dollars for cloud service consumers; hence, better cost models and minimization strategies are required. The cost model employed involves the Internet service providers (ISPs) passing the transfer charges to CSPs, which then pass them on to the users. This charge is derived using a percentile charge model, which is different from the flat-rate charge model used for utility billing [126]. Studies carried out to minimize resource cost are sometimes based on the heuristic that some delays are tolerable when transferring big data [94]. NetEx [127] was presented to address the challenge of the high bandwidth costs charged by cloud service providers. This system utilizes unused bandwidth on ISPs' backbone links, offered at a lower price, to cheaply deliver content by using bandwidth-aware routing to dynamically adapt to the available bandwidth across multiple paths. The work carried out in [126] focused on how to minimize cost when moving dynamically generated, geo-dispersed big data into the cloud for processing. Two online algorithms, the online lazy migration (OLM) algorithm and the randomized fixed horizon control (RFHC) algorithm, were proposed to optimize the choice of DC for aggregating and processing the data. NetStitcher [93] is another example of a solution proposed to reduce the high cost of big data transfer.

The idea behind NetStitcher is to rescue unutilized bandwidth across DCs and use it to transfer data for non-real-time applications using a store-and-forward algorithm that can adapt to resource fluctuations.
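The percentile charge model referred to above can be illustrated with a short calculation over synthetic five-minute utilization samples and a hypothetical unit price; the exact percentile convention varies between providers, so this is only a sketch of the billing principle.

    def percentile_95_charge(samples_mbps, price_per_mbps):
        """Bill the 95th percentile of periodic bandwidth samples, as many ISPs do."""
        ordered = sorted(samples_mbps)
        index = int(0.95 * len(ordered)) - 1          # the top 5% of samples are ignored
        billable = ordered[max(index, 0)]
        return billable, billable * price_per_mbps

    # One day of five-minute samples: mostly 200 Mbps with a few 900 Mbps bursts.
    samples = [200] * 274 + [900] * 14
    rate, cost = percentile_95_charge(samples, price_per_mbps=5.0)
    print(rate, cost)   # bursts above the 95th percentile do not raise the bill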

2.3.6 Discussions and Future Opportunities

Migrating big data over clouds can be a complicated and expensive process. Traditional transfer tools are found to be inefficient and to exhibit poor performance when handling big data; moreover, they are poorly documented and require considerable setup effort. Minimizing costs and latencies is of primary interest when transferring big data. Having identified the major research contributions towards developing efficient frameworks for big data migration over clouds, Fig. 2.6 provides a holistic view of the aforementioned research contributions. Clearly, the main hurdle in the area of big data migration over clouds is the isolation of the various research contributions at each of the layers. As shown in Fig. 2.6, big data requirements are not easily mapped from the top layer to the lower-layer configurations due to the lack of efficient models to represent these needs. While various strategies for the movement of big data have been proposed (e.g., shipping, Internet and private networks), these strategies consider neither the current or predicted availability of the underlying resources nor the dependency among various big data sets. At the application level, various efficient strategies have been proposed to reduce the size of the transported big data and to configure optimal placement strategies with respect to the users' access delays. However, a decision model that can decide when, where and which big data manipulation functionalities (e.g., compression, aggregation and deduplication) to use as data is being moved over the clouds is required. Such a model can increase the efficiency of big data transfers and release the needed resources for more urgent traffic. Network layer solutions have also been receiving considerable attention from various researchers. SDNs and VNs represent two promising approaches for the dynamic manageability of intra- and inter-DC networks; however, scalability and speed of configuration are two main challenges that must still be overcome before fully adopting these technologies. Similarly, as the cost of hardware continues to decrease, in-network processing and SnF solutions will become viable for future migration and movement of big data over inter-DC networks. In light of this analysis, we identify the following future research directions:

Figure 2.6: Cross-layer Solution for big data migration over clouds

• To date, there still exists a big gap between big data application requirements and functionalities on one side, and the provisioning of cloud resources on the other. Hence, the first step towards overcoming this limitation is to develop efficient models that can describe the varying needs of big data streams from the underlying networks as they are being transferred. An expressive model can easily be used to map these higher-level requirements to the needed network configurations (e.g., in the form of a virtual network request). This will facilitate mapping network performance parameters to application-level parameters. For instance, during peak traffic periods an application can prioritize certain operations.

• The next step is to build network-aware big data applications that can dynamically adapt their performance based on continuous feedback about the underlying network performance. If more elasticity is added to big data applications, they can dynamically adjust their task schedules according to the current demands on the cloud resources. For example, similar to store-and-forward routing methodologies, store-and-process applications can delay part of their execution to benefit from periods when more resources are available. Also, resources can be used in a more optimal manner if a more centralized view of the network resources and big data application needs is employed.

• In spite of the recent advances in network virtualization, the literature still lacks the research efforts needed to harness this tool to better serve big data migration. For instance, virtual networks can be created to precisely match the resources needed by big data applications; applications with frequent access to their data sets but infrequent updates can, for example, use virtual links with asymmetric upload and download capacities.

• At the physical layer, current developments in optical DC networks provide a promising solution for the high bandwidth demands of big data applications. However, their adoption is currently hindered by the high cost of replacing the traditional switches in current DCs. Clearly, bandwidth is the main bottleneck, especially when using shared resources; e.g., streaming sites such as Netflix consume huge amounts of the available bandwidth daily. Therefore, in peak periods, users need to compete for the available bandwidth to successfully complete their big data transfers. Transfers to the cloud and between DCs over the WAN take a long time due to the transfer protocol and the available bandwidth.

• Big data migration supports cloud resource elasticity management processes and is an efficient way to improve resource utilization; however, it remains an important open issue [113]. Nowadays, CSPs own geo-distributed DCs all over the globe, connected by high-bandwidth wide area networks (WANs), to provide faster and better services. For most CSBs, running BDSAs on multiple DC systems can provide low-latency, high-quality and non-disruptive services. While geographically distributed BDSAs collect large amounts of data for processing from various sources to obtain better results, this may require aggregation at a single location, and how to move data cost-effectively over time for processing has not been well studied.

• The execution of BDSAs may require migration onto different resource types or dynamic allocation and rescheduling to new cloud resources to ensure that user requirements (e.g., end-to-end latency) are met. While some efforts have been made in this direction, they are yet to be meaningfully integrated with CSB services [128].

2.4 Big Data Stream Processing in the Cloud

Big data stream processing is particularly suited for scenarios where massive amounts of data must be processed with low latency. Under several emerging application scenarios such as smart cities, social media, wearable devices, and IoT, continuous data streams must be processed within short delay intervals. Several solutions, including multiple software engines, have been developed for processing unbounded data streams in a scalable and efficient manner [4]. Unlike traditional database systems, which store data before processing it, SPEs process tuples "on the fly" [129]. Fig. 2.7 shows a simple stream processing flow. Data streams flow from left to right, starting from the producers to the data store. The producers send events, which are appended to message queues, e.g., Flume14 and Kafka15. The consumer, i.e., a stream processing system, pulls the data out of the message queue and processes it. The result is then finally stored in a data store.

Figure 2.7: Simple Stream Processing Flow.

A stream is a potentially infinite sequence of tuples sharing a given schema. Queries in SPEs are defined as continuous, since results are immediately pushed to the user each time the streaming data satisfies the query predicate. A query is a directed acyclic graph (DAG) where each node is an operator and edges define data flows. Typical query operators of SPEs can be classified as either stateless (e.g., Map) or stateful (e.g., Aggregate, Join) [130]. Stateless operators do not keep state across tuples and perform their computation on each input tuple independently, whereas stateful operators maintain state across sequences of tuples, e.g., by operating on a sliding window of input tuples over a period of time.
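The distinction between stateless and stateful operators can be sketched with plain Python generators over an illustrative tuple stream; the sketch is independent of any particular SPE and the sensor readings are made up for illustration.

    from collections import deque

    def map_operator(stream, fn):
        # Stateless: each output depends only on the current tuple.
        for tup in stream:
            yield fn(tup)

    def windowed_sum(stream, window_size):
        # Stateful: keeps the last `window_size` tuples to produce each output.
        window = deque(maxlen=window_size)
        for value in stream:
            window.append(value)
            yield sum(window)

    readings = [3, 5, 2, 8, 1]                          # illustrative sensor stream
    doubled = map_operator(readings, lambda x: x * 2)
    print(list(windowed_sum(doubled, window_size=3)))   # [6, 16, 20, 30, 22]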

2.4.1 Stream Processing Architecture

Fig. 2.8 provides an overview of the basic components — collection, processing and storage — often found in a stream processing architecture.

• Data collection: collects data from multiple sources, e.g., social media, IoT and system logs. Data ingestion is performed by various tools that run close to the data sources and communicate via network connections, such as TCP and UDP [131].

• Data processing: consists of many systems such as stream/batch processing systems and data processing frameworks that ease the implementation of data analytics.

14 https://flume.apache.org/   15 https://kafka.apache.org/

Figure 2.8: High-level Stream Processing Architecture [4].

• Data storage: consists of systems for storing and indexing in real-time architectures, e.g., NoSQL database, key-value stores and in-memory databases [132].

In addition, the stream processing architecture can comprise multiple tiers of ingestion, processing and storage to incorporate modularity. The connections between the tiers are made by message brokers and queueing systems such as Apache ActiveMQ16, RabbitMQ17, Amazon Simple Queue18 and Kafka [133].

2.4.2 Stream Processing Applications

A new class of stream processing applications has emerged in the past few years in domains such as financial analysis, traffic monitoring, anomaly detection, healthcare, sensor data acquisition and multimedia [11]. These applications allow data produced from heterogeneous, autonomous and globally-distributed data sources to be aggregated and analyzed dynamically to generate results in real time. They are usually designed as Data-, Task- or Pipeline-Parallel applications containing structured directed acyclic graphs (DAGs), where the vertices represent the operators (i.e., sources or sinks) and the edges represent data, task or pipeline streams [134], [5], respectively, as shown in Fig. 2.9. On the one hand, the Data-Parallel application distributes computations involving the data streams to multiple nodes to reduce the computation latency of the application.

16 http://activemq.apache.org/   17 https://www.rabbitmq.com/   18 https://aws.amazon.com/sqs/

Figure 2.9: Task- and Data-Parallel Application Types [5]

An example of this type of application is a Twitter stream processing application for sentiment analysis. On the other hand, a Task-Parallel application, which requires a long running time to complete, is one that distributes computation tasks (processes) to multiple nodes. Task pairs in a Task-Parallel application are executed on different parallel branches of the original stream graph, with the output of each task never reaching the input of the other. In comparison to the aforementioned types, Pipeline-Parallel applications offer low latency and the ability to run stateful operators in parallel by mapping chains of producers and consumers in a stream graph to different nodes. Logically, a streaming application can be interpreted as a directed acyclic graph (DAG) whose nodes (operators/components) represent the computing elements and whose edges represent the stream of data flowing through the elements. An operator is a logical processing unit that applies user-defined functions on a stream of tuples; operators perform functions such as filtering, aggregation and counting. The stream of tuples originates from the source operator of the DAG and exits at the sink operator. The source operator pulls input streams from external sources, while the sink operator emits the processed data streams. Separating the operators as a graph allows parallelism to be exploited. Stream processing frameworks usually employ logical abstractions for specifying operators and how data flows between them, referred to as the logical plan [8]. When the amount of parallelization is also provided, i.e., the number of instances required, it is referred to as the physical plan; this is provided to the scheduler for placing the operator instances on available cluster resources [4]. Stream grouping decides programmatically how tuples arriving at a given operator are split. Different kinds of grouping approaches are available for each streaming framework, e.g., shuffle, fields and global groupings. A number of stream processing systems are available today to process a wide variety of data, e.g., Twitter Heron [8], Storm [13], Spark [14] and Flink [135]. These systems rely on manual configuration for resource allocation, which is infeasible for dynamic streams.
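Stream grouping, i.e., how tuples are routed to the parallel instances of a downstream operator, can be illustrated with a small framework-independent sketch contrasting hash-based fields grouping with round-robin shuffle grouping; the instance counts and keys are illustrative assumptions.

    import itertools
    import zlib

    def fields_grouping(key, num_instances):
        # Tuples sharing a key always reach the same instance (needed by stateful operators).
        return zlib.crc32(key.encode()) % num_instances

    round_robin = itertools.cycle(range(4))
    def shuffle_grouping():
        # Tuples are spread evenly across instances, regardless of their content.
        return next(round_robin)

    for word in ["cloud", "stream", "cloud", "broker"]:
        print(word, "-> instance", fields_grouping(word, num_instances=4))
    print("shuffle picks:", [shuffle_grouping() for _ in range(4)])  # [0, 1, 2, 3]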

2.4.3 Key Stream Processing Technologies

Stream processing has been an active area of research for many years. Traditional SPEs consisted of centralized solutions and were essentially extensions of DBMSs, developed to perform long-running queries over dynamic data. Borealis [136] and STREAM [137] are examples of systems designed to provide stream processing middleware and languages for writing streaming applications. Also, System S [138] is used for large-scale, distributed stream processing middleware and application development. These systems are evidence of the past success of the stream processing paradigm. Due to the increasing popularity of stream processing engines (SPEs) and container technologies, there is a move to run BDSAs on containers for elastic workloads [139]. This is powered by virtualization techniques in the cloud that abstract host resources. While most research efforts on resource management in the cloud have focused on hypervisor-based environments, the emergence of OS virtualization has reopened a number of issues related to resource management in container-based environments. In recent years, new innovative open-source, off-the-shelf streaming engines such as Spark [14], Storm [13], and S4 [15] have been introduced to support high-volume, low-latency stream processing applications and to address needs that are different from the analytical or extract, transform & load (ETL) domain typical of MapReduce [40] jobs. This generation of solutions resulted in more general application frameworks that enable the specification and execution of user-defined functions (UDFs). The proposed frameworks typically follow two models: the operator-graph model and the micro-batch model. The operator-graph model processes ingested data tuple-by-tuple using DAG operators, while the micro-batch model groups incoming data at intervals, which are then processed at the end of each time window. Some examples of common SPEs that fall into these categories are described as follows:

2.4.3.1 Apache Storm

Storm [13] is a free and open-source distributed real-time computational system for processing large volumes of high-velocity data. It is commonly used by companies such as Twitter and Spotify to process high volumes of data with low latency. Stream processing using Storm is achieved by distributing data streams across multiple machines as an unbounded sequence of tuples (key-value pairs). A Storm application is defined in terms of spouts and bolts that run on clusters as DAG topologies. A spout is the source of a stream that pulls data from a messaging system such as Kafka [133], while a bolt processes the incoming data stream, which in turn can be passed on to other sets of bolts. Apache Storm [13] is known to perform Task-Parallel computations. Storm applications are configurable through the parallelism of the nodes, stream grouping and scheduling. The parallelism specifies the number of concurrent threads executing the same task (spout or bolt). Stream grouping determines the way a message is propagated to and handled by the receiving nodes. The scheduling algorithm statically deploys the spouts and bolts to the computational resources of the cluster.

Figure 2.10: Components of a Storm Cluster [6].

Fig. 2.10 presents the components of a Storm cluster. There are two main types of nodes in a Storm cluster: the master node and the processing nodes. The master node runs a daemon called Nimbus, which is responsible for distributing code around the cluster, assigning tasks, and monitoring their progression and failures. Processing nodes run a daemon called Supervisor, which listens for work assigned to its machine. It starts and stops processes as necessary, based on the work assigned to it by the master node. Each worker process within a processing node executes a subset of a topology, i.e., one or more spouts, bolts and metrics consumers. Coordination between the master node and the processing nodes is done through a ZooKeeper cluster. Also, information about assigned tasks, the health of the worker processes and the cluster state is stored in ZooKeeper [6].

2.4.3.2 Apache Spark

Spark [14] is an open-source distributed framework for data analysis. From the outset, Spark has supported both batch and streaming workloads, which distinguishes it from other SPEs designed only for stream processing. Spark performs in-memory analytics on streams in a distributed and real-time manner by discretizing the streaming data into micro-batches before computation. Spark [14] is known to perform Data-Parallel computations. To cover a variety of workloads, Spark introduces an abstraction called Resilient Distributed Datasets (RDDs) that enables running computations in memory in a fault-tolerant manner. RDDs, which are immutable and partitioned collections of records, provide a programming interface for performing operations, such as map, filter and join, over multiple data items. To handle faults and stragglers more efficiently, the authors in [7] proposed D-Streams, a discretized stream processing model implemented in Spark Streaming. D-Streams follows a micro-batch approach that organizes stream processing as batch computations carried out periodically over small time windows.

Figure 2.11: Spark Architecture [7].

The Spark streaming architecture consists of three components as shown in Fig. 2.11. The master tracks the D-Stream lineage graph and schedules tasks to compute new RDD partitions, the worker nodes receive data, store the partitions of input and computed RDDs, and execute tasks, while the client library is used to send data into the system. Spark streaming differs from traditional streaming in that it divides computations into stateless and deterministic tasks that can run on any node in a cluster, or even on multiple nodes [7].
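To make the micro-batch model concrete, the following is a minimal word-count sketch using the PySpark Streaming API; the socket source on localhost:9999 and the 5-second batch interval are illustrative assumptions rather than a configuration used in this dissertation.

```python
# Minimal micro-batch (D-Stream) word count with PySpark Streaming.
from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext("local[2]", "MicroBatchWordCount")
ssc = StreamingContext(sc, batchDuration=5)          # group the stream into 5-second micro-batches

lines = ssc.socketTextStream("localhost", 9999)      # D-Stream of raw text lines
counts = (lines.flatMap(lambda line: line.split())   # split each micro-batch into words
               .map(lambda word: (word, 1))
               .reduceByKey(lambda a, b: a + b))      # per-batch word counts (one RDD per interval)
counts.pprint()

ssc.start()                                           # computation starts only after start()
ssc.awaitTermination()
```

Each 5-second window is turned into an RDD and processed as a small batch job, which is exactly the discretized-stream behaviour described above.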

2.4.3.3 Apache Heron

Similar to Storm is Apache Heron [8], in which streaming applications are also modeled as DAGs. Heron [8] was built as an architectural improvement to Storm [13], with mechanisms to achieve better efficiency and address several of Storm's inadequacies. Heron runs topologies that are made up of bolts and spouts. The spouts generate the input tuples that are fed into the topology, and the bolts carry out the actual computation. Heron's processing semantics are also similar to those of Storm, which include at-most-once semantics (i.e., no tuple is processed more than once) and at-least-once semantics (i.e., each tuple is guaranteed to be processed at least once, possibly more than once). However, Heron's topology execution is a departure from Storm, where Nimbus was used for scheduling. In Heron, each topology is run as an Aurora19 job consisting of several containers that run the Topology Master (TM), Stream Manager (SM), Metrics Manager (MM) and a number of processes called Heron Instances (HI), as shown in Fig. 2.12. The TM is responsible for managing the topology throughout its existence, the SM manages the routing of tuples efficiently, while each HI connects to its local SM to send and receive tuples.

Figure 2.12: Heron Architecture [8].

Other popular SPEs include Yahoo S4 [15], [135], and Samza [140]. Yahoo S4 [15] has been proposed as an alternative framework for processing big data streams requiring real-time processing. S4 is a decentralized, scalable and fault-tolerant framework that provides a programming interface for processing data streams on top of a highly available cluster in a distributed computing environment.

19http://aurora.apache.org/

In addition to these engines, cloud service providers also offer many managed solutions for processing streaming data, such as Amazon Web Services (AWS) Kinesis20, Google Cloud Dataflow21, and Azure Stream Analytics (ASA)22. AWS Kinesis is a service that enables continuous data intake and processing for several types of applications. Google Cloud Dataflow is a programming model and managed service for developing and executing a variety of data processing patterns such as ETL tasks, batch processing, and continuous computing. ASA enables real-time analysis of streaming data from several sources such as devices, sensors, websites, and social media [4]. Although some of these technologies allow for dynamic scaling of the cluster resources allocated to applications, the required resources must be provisioned at run-time. Also, they do not provide the option of adding or removing virtual machines from clusters when they are no longer needed.

2.5 Cloud Resource Elasticity for BDSAs

Resource elasticity is an important distinguishing feature of cloud computing, which allows an application or service to scale in/out according to fluctuating demands. This capability is essential in several settings as it helps service providers to minimise the number of allocated resources and to deliver adequate QoS levels, usually synonymous with low response times [4]. Although elasticity has been extensively investigated for enterprise applications, stream processing poses challenges for achieving elastic systems when making efficient resource management decisions based on the current load. Elasticity becomes even more challenging in highly distributed environments comprising cloud computing resources [4]. Elasticity has been explored by researchers from academia and industry as many virtualization technologies, on which the cloud relies, continue to evolve. Thus, tremendous efforts are being invested in enabling cloud systems to behave in an elastic manner. Many definitions have been put forth in the literature to describe elasticity [141], [9]. The authors in [88] define elasticity as the ability of a system to add and remove resources (such as CPU cores, memory, VM and container instances) on the fly to adapt to load variations in real time. According to NIST, "elasticity is the ability for customers to quickly request, receive, and later release as many resources as needed" [9]. The purpose of elasticity has been identified in [88] to include, but is not limited to, improving performance, increasing resource capacity, saving energy, reducing cost and ensuring availability.

20https://aws.amazon.com/kinesis/ 21https://cloud.google.com/dataflow/ 22https://azure.microsoft.com/en-us/services/stream-analytics/

While several efforts are being made towards making stream processing more elastic, many issues remain unaddressed, including task placement, bottleneck identification and application adaptation [4]. Since big data streaming applications are often long-running jobs, the initial task placement, whereby processing elements are deployed on available computing resources, becomes more critical than in other systems.

Figure 2.13: Classification of Elasticity Solutions [9].

Elasticity solutions can be classified based on four basic characteristics: scope, policy, purpose and method, as shown in Fig. 2.13. A more comprehensive classification is provided by the authors in [88]. The scope defines where elasticity actions are controlled, the policy specifies the needed interactions (manual or automatic), the purpose indicates the objective of the elastic actions (e.g., reduce cost or improve performance) and the method relates to the implementation type, i.e., vertical scaling, horizontal scaling or load balancing, defined as follows [9,88]:

• Horizontal elasticity involves adding or removing instances of computing resources associated with an application, in contrast to vertical elasticity, which involves increasing or decreasing computing resources such as CPU time, cores, memory, and network bandwidth. Horizontal elasticity is coarse-grained and is incapable of handling the dynamic resource requirements of streaming applications due to the static nature and fixed sizes of VMs [142].

• Vertical elasticity is fine-grained and applicable to any application. It eliminates the overhead caused by booting instances in horizontal elasticity and does not require running additional machines such as load balancers. Also, vertical elasticity increases performance, provides higher throughput and reduces SLA violations [142].

• Load balancing involves the distribution of requests evenly among the available resources.

Manual policy actions require cloud users to interact with the virtual environment through an API and perform elasticity actions themselves. Most elasticity actions, however, are carried out in an automatic mode, which can be either reactive or proactive. In the reactive mode, elastic actions are triggered based on certain thresholds or rules; most cloud platforms, such as Amazon EC2, use this technique. The proactive mode, in contrast, uses forecasting techniques to anticipate future needs. These techniques include the following [88]:

1. Time Series Analysis: This approach estimates future resource and workload utilization. It focuses on two objectives: 1) predicting the future, and 2) identifying repeated patterns. Techniques such as Moving Average, Auto-Regression, ARIMA and machine learning are used for the first objective, while approaches such as the Fast Fourier Transform (FFT) and histograms can be used to achieve the second (a brief sketch contrasting a reactive rule with a simple moving-average forecast follows this list).

2. Reinforcement Learning: This approach depends on learning through interactions between an agent and the system or environment. The agent (decision-maker) is responsible for taking decisions for each state of the environment and tries to maximize the return reward. The agent learns from the feedback of the previous states and rewards of the system, and then tries to scale the system up or down by choosing the right action.

3. Control Theory: This approach makes use of three types of controllers: open-loop controllers, which compute the input to the system without monitoring its output; feedback controllers, which monitor the output of the system and correct deviations from the desired goal; and feed-forward controllers, which predict errors in the output.

4. Queuing Theory: This approach studies queues in the system, taking into consideration the waiting time, arrival rate and service time.
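The following is a minimal sketch contrasting a reactive threshold rule with a proactive moving-average forecast; the thresholds, window size and per-instance capacity are illustrative assumptions, not values used in this dissertation.

```python
# Reactive rule: act only after a threshold is crossed.
def reactive_decision(cpu_utilization, upper=0.8, lower=0.3):
    if cpu_utilization > upper:
        return "scale_out"
    if cpu_utilization < lower:
        return "scale_in"
    return "no_action"

# Proactive rule: forecast the next interval from recent observations (simple moving average).
def moving_average_forecast(history, window=6):
    recent = history[-window:]
    return sum(recent) / len(recent)

load_history = [0.42, 0.48, 0.55, 0.61, 0.70, 0.78]     # per-interval CPU utilization
predicted = moving_average_forecast(load_history)
instances_needed = max(1, round(predicted / 0.25))       # assume one instance absorbs ~25% load
print(reactive_decision(load_history[-1]), round(predicted, 3), instances_needed)
```

The reactive rule only fires once the last observation breaches a threshold, whereas the proactive rule provisions instances ahead of the predicted load; more sophisticated predictors (ARIMA, machine learning, or the Markov models used later in this dissertation) would replace the moving average in the same position.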

In the next sections, we briefly discuss the resource prediction and allocation challenges in the cloud.

Figure 2.14: Cloud Resource Over-provisioning/Under-provisioning.

2.5.1 Resource Prediction

Resource prediction, at the core of auto-scaling, aims at identifying the amount of resources required to process a workload. To meet the expected QoS requirements of BDSAs, it is increasingly important to understand and characterize the workloads running in the cloud environment. Predictive approaches for resource estimation fall into one of the following categories: rule-based, fuzzy inference, profiling, analytical modeling, machine learning and hybrid approaches. A discussion of these approaches was presented by the authors in [143]. Since many streaming applications exhibit sudden, dramatic changes in workload between peak and off-peak hours, an erroneous prediction system will lead either to over-provisioning of resources, which wastes resources outside peak hours, or to under-provisioning of resources, which can lead to poor performance and SLA violations during peak hours. Fig. 2.14 shows that maintaining sufficient resources to meet fluctuating workload demands can be costly. Currently, there is a lack of research on the relationship between multiple BDSAs, CSBs and associated demand-driven factors, such as resource consumption and workload, as an integrated concept.

2.5.2 Resource Allocation

Resource allocation, a key issue when using cloud infrastructures, has been well investigated and addressed using either static or dynamic approaches. The static allocation approach assigns fixed resources, which can lead to resource starvation in streaming applications as the workload changes. Studies on dynamic approaches generally involve applying SLA-based, utility-based, market-based, or priority-based custom policies [144–146]. These strategies attempt to achieve resource fairness but do not take into consideration the multiple objectives of competing containerized streaming applications, and mostly focus on hypervisor-based VM deployment. Alternatively, resources can be dynamically provisioned using various auto-scaling approaches, e.g., the work proposed by the authors in [35] can be adopted. However, auto-scalers are useful for scaling up/down only in resource-sufficient environments, since containers are limited to the resources of the host (e.g., VM) on which they reside. An important feature of an SPE is its capability to scale out applications to meet the requirements of workloads. This involves instantiating parallel operator instances called tasks. The number of instances for a certain operator is fixed by users during streaming application configuration. Since the task workload can fluctuate unpredictably in SPEs, resources should be allocated to each streaming application dynamically to satisfy the real-time constraints with minimal resource consumption. In addition, balancing the load among these instances is important for better resource utilization and reduced response times.

2.5.3 Container Elasticity

Containers have emerged in recent years as a lighter-weight alternative to VMs and have become a popular deployment approach in cloud infrastructures. Likewise, the use of containers for running BDSAs is gaining widespread adoption due to the flexibility it provides for packaging, delivering and orchestrating applications. While most available works [147], [148] proposed for cloud elasticity focus on VM mechanisms, some research efforts [149], [150], [151], [152] are dedicated to containers. Kukade et al. [150] proposed a design for developing and deploying microservices applications into Docker containers. During load spikes, instances of the application are added through horizontal scaling. The memory load and number of requests are constantly monitored until a threshold is reached, at which point containers are scaled in/out. Paraiso et al. [151] proposed a tool to ensure the deployability and the management of Docker containers. The framework allows coordinating infrastructure and platform adaptation. In addition, it allows vertical scaling of container or VM resources by inserting agents for monitoring and adaptation. However, these proposals target web applications. Despite the number of studies on cloud elasticity, many open issues remain around the emerging container technology. These challenges include monitoring, elasticity of containers running on VMs and how to incorporate elasticity into container orchestration [88]. Monitoring containers is not an easy task, as containers hold all the applications and their dependencies, and many containers can be hosted on a single machine. For containers running on VMs, as illustrated in Fig. 2.15, vertical elasticity becomes an issue since the containers are limited to the resources of the host on which they reside. Static limits are used to allocate resources to containers, or containers share resources with other co-hosted containers. Although container orchestration tools can be used to manage containers, integrating vertical and horizontal scaling is still an open research challenge. While the elasticity management mechanisms proposed for managing VM resource scaling and allocation in the cloud can be applied to containers, some modifications are required in order to implement such mechanisms [88].
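As a concrete illustration of the static per-container limits mentioned above, the following minimal sketch uses the Docker SDK for Python; the image name and the limit values are illustrative assumptions and not part of the proposed framework.

```python
# Launch a containerized streaming worker with fixed CPU and memory limits,
# then adjust the limits in place (bounded by the capacity of the hosting VM).
import docker

client = docker.from_env()

container = client.containers.run(
    "example/streaming-worker:latest",   # hypothetical image name
    detach=True,
    nano_cpus=1_000_000_000,             # 1 vCPU
    mem_limit="512m",
)

# "Vertical" adjustment within the host: raise the CPU quota and memory limit.
container.update(
    cpu_period=100_000,
    cpu_quota=200_000,                   # now allowed up to 2 vCPUs
    mem_limit="1g",
    memswap_limit="2g",
)
```

Such limits can only grow up to what the hosting VM offers, which is precisely why vertical elasticity of containers running on VMs remains problematic.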

Figure 2.15: A Streaming Topology on a Container-Cluster.

2.6 Reference Models

In this section, we present some reference models used in this dissertation.

2.6.1 Hidden Markov Models

The Hidden Markov Model (HMM) [153] is a widely employed and powerful statistical tool for modeling and analyzing sequential data. HMMs have been used extensively in many applications, such as speech recognition, video activity recognition, and gesture tracking.

In the single-layer, one-dimensional HMM, an observation $O_t$ at time $t$ is produced by a stochastic process, but the state $S_t$ of this process cannot be directly observed, i.e., it is hidden. This hidden process is assumed to satisfy the Markov property, whereby the state $S_t$ at time $t$ depends only on the previous state $S_{t-1}$ at time $t-1$.

The HMM is described as a process evolving over discrete time intervals. Let $S = \{S_t,\ t = 1, \ldots, T\}$ denote a sequence of hidden states, drawn from a finite state space of size $N$, and let $O = \{O_t,\ t = 1, \ldots, T\}$ denote a corresponding sequence of observations drawn from $M$ discrete possible observation symbols $\{O_1, O_2, \ldots, O_M\}$. The standard HMM has the form:

$$P(S, O) = P(S_0) \prod_{t=1}^{T} P(S_t \mid S_{t-1})\, P(O_t \mid S_t) \qquad (2.1)$$

where $P(S_0)$ is the initial probability distribution, denoted $\pi$; $P(S_t \mid S_{t-1})$ is the transition probability distribution of moving from state $S_{t-1} = i$ to state $S_t = j$, with $i, j \in \{1, \ldots, N\}$, denoted $A$, i.e., an $N \times N$ matrix; and $P(O_t \mid S_t)$ is the observation probability distribution, denoted $B$, i.e., an $N \times M$ matrix. Hence the model is $\lambda = (\pi, A, B)$.
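A minimal numeric sketch of Eq. (2.1) follows; the two-state model ($\pi$, $A$, $B$) and the example sequences are illustrative assumptions, not parameters estimated in this dissertation.

```python
# Evaluate Eq. (2.1): P(S, O) = P(S_0) * prod_t P(S_t | S_{t-1}) * P(O_t | S_t).
import numpy as np

pi = np.array([0.6, 0.4])              # initial distribution, P(S_0)
A = np.array([[0.7, 0.3],              # A[i, j] = P(S_t = j | S_{t-1} = i)
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],         # B[i, k] = P(O_t = k | S_t = i)
              [0.1, 0.3, 0.6]])

def joint_probability(s0, states, observations):
    """Joint probability of a hidden-state sequence and its observations."""
    p = pi[s0]
    prev = s0
    for s_t, o_t in zip(states, observations):
        p *= A[prev, s_t] * B[s_t, o_t]
        prev = s_t
    return p

print(joint_probability(s0=0, states=[0, 1, 1], observations=[0, 2, 1]))
```

In practice, the forward-backward algorithm sums this quantity over all hidden-state sequences rather than evaluating a single one.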

Duration in an HMM is usually characterized by a probability distribution function $P_i(d)$: in state $i$, $P_i(d)$ is the probability of remaining in state $i$ for $d$ time units. Given the Markov assumption and from probability theory, $P_i(d)$ is the product of the corresponding probabilities:

$$P_i(d) = a_{ii}^{\,d-1}\,(1 - a_{ii}) \qquad (2.2)$$

where $a_{ii}$ is the self-loop probability of state $i$. Thus $P_i(d)$ is a geometrically decaying function of $d$; when applied directly to resource usage prediction for big data streaming applications, the probability that the underlying process remains in a resource usage state $i$ for $d$ time units decays geometrically in $d$. The HSMM [154] is an extension of the HMM designed to remove the constant or geometric distribution of state durations assumed in the HMM by introducing the concept of explicit state durations. The HSMM is similar to the HMM in that an embedded Markov chain represents the transitions between the hidden states. However, it incorporates a discrete state occupancy distribution $D = P_i(d)$, representing the duration in non-absorbing (i.e., transient) states. The HSMM is thus specified as $\lambda = (\pi, A, B, D)$, where $D$ models the time spent in a particular state. In the HSMM, each state persists for a variable time interval and a sequence of observations $O_{t_1}, \ldots, O_{t_2}$ is generated while in the state, as opposed to a single observation $O_t$ in the HMM. The length of the observation sequence for each state is determined by the duration spent in that state. For real-time streaming applications that run for longer periods of time, accounting for changes in behaviour at different time intervals is important for more accurate prediction.
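The following minimal sketch illustrates why the geometric duration law of Eq. (2.2) is limiting: it is always maximal at $d = 1$, whereas an HSMM can attach an explicit duration distribution peaked at any value. The self-loop probability and the Poisson-shaped duration are illustrative assumptions.

```python
# Compare the implicit HMM state-duration law with an explicit HSMM-style one.
from math import exp, factorial

def hmm_duration(d, a_ii=0.9):
    """Eq. (2.2): P_i(d) = a_ii^(d-1) * (1 - a_ii), geometric in d."""
    return a_ii ** (d - 1) * (1 - a_ii)

def explicit_duration(d, mean=20):
    """An explicit duration distribution (Poisson-shaped) that an HSMM can model directly."""
    return exp(-mean) * mean ** d / factorial(d)

for d in (1, 5, 20, 50):
    print(d, round(hmm_duration(d), 4), round(explicit_duration(d), 4))
```

The geometric law assigns its largest mass to the shortest stay, which poorly matches resource-usage states that typically persist for many intervals; the explicit distribution captures such behaviour directly.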

2.6.2 Game Theory

Game theory [154] has been explored in the past for allocating resources in the cloud. It has impacted various fields such as biology, political science and, more recently, computer science. Game theory provides a straightforward tool for studying resource allocation problems in competitive big data streaming application environments. The strategic behaviour of rational, profit-maximizing agents in a non-cooperative game is predicted to be a Nash equilibrium [155], which is a balanced state given by a strategy profile from which no player has an incentive to deviate. In practice, being able to compute the Nash equilibrium to represent the predicted outcome is very important. Computing a Nash equilibrium in a small game involving two players is trivial, but with many players this becomes very difficult. A strategic non-cooperative game models the interactions among multiple decision-makers, in this case the streaming applications. The non-cooperative game is defined by a set of players, where each player has a finite set of actions and a pay-off function. Each player chooses their strategy simultaneously, and the players' payoffs characterize their preference over the set of strategy profiles. The strategy of a player can be either pure or mixed; a mixed strategy is a probability distribution over the set of the player's pure strategies. Nash equilibria can be computed by enumerating all possible totally mixed or pure strategy profiles and checking whether each constitutes a Nash equilibrium. Thus, the algorithm should be able to generate all possible strategy combinations. Some existing works applying game theory in the cloud are as follows. To maximize cloud provider profit while satisfying client requests, Srinivasa et al. [156] proposed a Min-Max game approach. They utilized a new factor in the game, called the utility factor, which considers the time and budget constraints of every user, and resources are provided to the tasks having the highest utility for the corresponding resource. Mao et al. [157] proposed a congestion game whereby services are considered as players with the strategy of selecting a subset of resources to maximize their payoffs, i.e., the sum of the payoffs received from each selected resource. However, none of these approaches addresses containerized big data streaming applications.
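To make the enumeration idea concrete, the following is a minimal sketch that checks every pure-strategy profile of a two-player game for the no-profitable-deviation condition; the payoff table is an illustrative assumption (a prisoner's-dilemma-like game), not a model from this dissertation.

```python
# Enumerate pure-strategy Nash equilibria of a two-player game.
from itertools import product

actions = [0, 1]
# payoffs[(a1, a2)] = (payoff to player 1, payoff to player 2)
payoffs = {
    (0, 0): (3, 3), (0, 1): (0, 4),
    (1, 0): (4, 0), (1, 1): (1, 1),
}

def is_nash(profile):
    """True if no player can gain by unilaterally deviating from the profile."""
    a1, a2 = profile
    u1, u2 = payoffs[profile]
    if any(payoffs[(d, a2)][0] > u1 for d in actions):   # player 1 deviations
        return False
    if any(payoffs[(a1, d)][1] > u2 for d in actions):   # player 2 deviations
        return False
    return True

print([p for p in product(actions, actions) if is_nash(p)])   # -> [(1, 1)]
```

With n players and larger action sets, the number of profiles grows exponentially, which is why equilibrium computation quickly becomes difficult beyond small games.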

2.7 Summary

In this chapter, we reviewed the big data paradigm and presented emerging characteristics, deployment services, lifecycle and issues related to utilizing cloud technologies. We introduced the cloud DC from a networking perspective, and presented the virtualization techniques (i.e., hypervisor- and container-based virtualization) powering DC resource provisioning. This was followed by the support the cloud provides and why it is a good fit for hosting big data. We further discussed the challenges and requirements for migrating big data to clouds. An overview of big data stream processing and the key technologies used was then presented. Finally, we introduced container technology as a newer approach for handling BDSAs.

Chapter 3

Proposed Framework

In this chapter, we present our framework for managing the resource elasticity and pricing challenges for CSBs in the cloud. In this framework, a proactive approach is introduced to improve resource scaling, which takes into consideration multiple streaming applications running on a heterogeneous cluster of hosts. Since research related to resource management for containers has not been actively considered, we also include a resource allocation and pricing module to ensure fair resource sharing and utilization among containerized big data streaming applications. This framework provides useful resource management functions for CSBs and, more importantly, enables them to optimize the resource usage of containerized big data streaming applications in order to meet their QoS.

3.1 Cloud Ecosystem and Application Scenario

Cloud computing is an operations model that combines existing technologies such as virtualization, utility-based pricing, autonomic computing and grid computing [158]. Around this model, a business ecosystem has evolved in which new market players have emerged, breaking up the traditional value chain of IT provision.

The ecosystem includes parties such as the cloud service provider (CSP), cloud tenants/users (e.g., ASPs or cloud brokers) and end users (e.g., cloud service consumers (CSCs)).

• Cloud Service Provider (CSP): The CSPs manage a set of computing resources through their datacenters on a pay-per-use basis and are responsible for allocating resources in order to meet the SLAs of the CSBs.

• Cloud Service Consumer (CSC): The CSCs have applications with fluctuating loads and lease resources from CSBs to run their applications.

• Cloud tenant/User: The cloud tenant can be an Application Service Provider (ASP) or a CSB. For instance, Netflix is an example of an ASP for video on-demand. The tenant acts as an intermediary between the CSC and CSP, and manages the use, performance and delivery of cloud services. Attracted by the cost-reduction that the cloud provides, CSBs offer economically efficient services to CSCs using hardware resources provisioned by the CSPs, which can be directly accessed by the CSCs. The CSB is responsible for meeting the SLAs agreed with the CSCs. CSBs leverage their expertise by aggregating, integrating, deploying and managing multiple BDSAs to deliver a fully managed solution to the CSCs. Benefits provided to the CSCs include efficiency, technical expertise, customization, and simplified billing.

Beyond the scope of this dissertation, other players such as cloud auditors and cloud carriers can also participate in the cloud ecosystem [25]. Taking the ecosystem as a whole, all players jointly create services in a loosely coupled manner to meet end users' needs [159]. CSBs and CSPs use SLAs, which state the QoS to be delivered to the consumers, to guarantee QoS. As more and more enterprises leverage cloud computing services or contract out their computing footprints to be more competitive in a global world, BDSAs are being deployed in private, public or hybrid cloud settings. However, managing computing resources requires sophisticated programmers who understand the various low-level cloud service application programming interfaces (APIs) and command line interface (CLI) tools and who maintain complex resource configurations. This lacks flexibility and introduces complexity for many enterprises and, hence, higher costs. In addition, as more cloud services and resources are utilized, the deployment difficulty increases. The complexity of cloud integration, customization and management prevents enterprises from exploring opportunities to integrate cloud services and features; hence the need for CSBs. The target application scenario of our resource management framework is a big data CSB that relies on the cloud infrastructure to provide services to CSCs. More precisely, the CSB rents heterogeneous resources (containers or VMs) to a number of CSCs to run their streaming applications. The CSB aggregates different types of resources from a number of CSPs and sublets these resources to the requesting CSCs' streaming applications. Among the available resources, the CSB finds the best resources and proceeds through allocation policies for the BDSAs while avoiding QoS violations. From the perspective of the CSB willing to deploy the CSCs' applications, the framework enables them to focus on the business side of scalability and application development of their service while ignoring details about the underlying infrastructure actually supporting the applications. The resources are charged by the CSP using a pay-per-use billing scheme, while the CSB charges the CSC on a monthly basis. Requests are submitted by the CSCs, who specify their QoS preferences (shorter response time and lower cost).

3.2 The Cloud Business Model

CSPs basically offer three main service models (SaaS, PaaS and IaaS) to the CSBs or directly to the CSCs. To pay for virtual resources, most CSPs provide two main options: the on-demand and the reserved approach [160]. The on-demand approach, usually recommended for short-term workloads, enables CSCs to pay for compute capacity hourly, while the reserved approach, recommended for long-term workloads, enables CSCs to rent virtual resources for longer periods at a significant discount [160]. Since short-term CSCs tend to pay more, CSBs have entered the market to save costs for the CSCs. Relieved of the burden of setting up their own DCs, CSBs usually rent resources at the reserved price from the CSPs and sublet the resources on-demand to the CSCs in order to host the CSCs' applications at lower prices. The revenue of the CSB is largely determined by two factors: the CSCs' workload arrivals and the resource price. It is also worth noting that the CSB's price has a great impact on the CSCs' demand: whenever the CSB increases its price, the CSCs' demand decreases, and when the CSB decreases its price, the CSCs' demand increases. Pricing approaches have been applied in the cloud environment to address resource management using common mechanisms such as market-based pricing and game-theoretic pricing. In the game-theoretic context, models and mechanisms used to determine resource prices include the non-cooperative game, auction theory, the Stackelberg game and the bargaining game. In this dissertation, we focus on the non-cooperative game. Generally, the participants in the game are the cloud providers (CSPs), service providers (CSBs), cloud tenants/consumers (CSCs), and end-users. To provision resources to the CSBs and CSCs,

CSPs utilize fixed-price and auction-based mechanisms to provide cost-effective options for obtaining cloud resources. The CSBs, in turn, transform the cloud market into a commodity-like service and offer packages with distinct features, e.g., resource types and pricing schemes, to the CSCs. The result is a two-sided market: the primary market involves the CSPs and the CSBs, while the relationship between the CSBs and the CSCs forms the secondary market.

Figure 3.1: Cloud Market Structure.

We consider a secondary market in which the CSBs are the sellers and the CSCs, as buyers, are typically selfish. Fig. 3.1 shows the relationship between the CSP, CSB and CSC. In this market, the CSB controls the resource price offered to the CSCs. Each CSC aims to maximize its resource utilization by competing in a non-cooperative manner for the CSB's resources. The surplus of a CSC is its utility minus the price charged by the CSB. Hence, the CSCs make optimal demand-response decisions to maximize their surplus (i.e., utility minus cost), given the resource utilization prices set by the CSBs. To reach an equilibrium point, each CSC adjusts its resource demand strategy in response to those of the other CSCs.
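The following is a minimal sketch of the demand-response behaviour described above: a CSC chooses the resource quantity that maximizes its surplus, i.e., utility minus price. The logarithmic utility function and the price values are illustrative assumptions only.

```python
# A CSC's best demand response to a posted resource price.
import math

def utility(q, a=10.0):
    """Diminishing-returns utility of acquiring q resource units."""
    return a * math.log(1 + q)

def best_demand(price, max_units=50):
    """Quantity (in whole units) with the highest surplus = utility(q) - price * q."""
    return max(range(max_units + 1), key=lambda q: utility(q) - price * q)

for price in (0.5, 1.0, 2.0):
    q = best_demand(price)
    print(price, q, round(utility(q) - price * q, 2))
```

As the posted price rises, the surplus-maximizing quantity falls, which reproduces the inverse price-demand relationship noted above.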

The need to consider the heterogeneity of streaming applications, as well as that of the cluster in which they run, makes resource allocation and pricing a non-trivial issue for CSBs. Moreover, despite the technological advances containers have brought, there is not yet a standard, performance-optimized resource allocation and pricing scheme for managing streaming applications in containerized environments. Therefore, newer approaches are required to automate the selection of resource configurations and their mapping to heterogeneous host resources. To address these issues, we propose a cloud-based framework for dynamically handling the resource elasticity of big data streaming applications for CSBs in the cloud.

Figure 3.2: Proposed Resource Elasticity Management and Pricing Framework.

3.3 Framework Layers

Fig. 3.2 illustrates the proposed framework, detailing the required components for achieving resource prediction and allocation. It depicts the relationship between the BDSAs, the CSB and the virtual cloud host resources. The architecture consists of three layers: the application layer, the resource management layer and the cloud resource layer.

3.3.1 Application Layer

The application layer is made up of multiple heterogeneous BDSAs (e.g., stock exchange and sensor data processing applications) that process data streams collected from various sources. It provides interfaces that enable the easy extraction of data from sources such as sensors, application logs and social media, to be delivered to the applications. Each application has varying resource demands, thereby complicating how resources are allocated to process the data streams. The application layer also provides functions for the design, specification and visualization of the streaming applications.

3.3.2 Resource Management Layer

The resource management layer acts as the middleware between the application layer and the cloud resource layer. It manages the process of provisioning and allocating resources to satisfy the BDSAs' resource requirements and other placement constraints. It is made up of modules such as the monitor, profiler, resource predictor (LMD-HMM/HSMM), container resource allocator (CRAM) and container engine for managing the resources. The resource management layer is responsible for resource management and provisioning, responding to requests from the upper layer and supporting various scheduling frameworks. All the operations that need to access the underlying resources are encapsulated in this layer. At this layer, the CSB can interact with the services of the other layers. Tools such as APIs, CLIs and SDKs are used to expose services for manipulating cloud resources and policies. The layer also provides tools and user interfaces for application submission. Fig. 3.3 presents an operational overview of the resource management layer. The mechanism observes the system state by collecting metrics from the underlying streaming system and attempts to adjust the load. When a streaming application is submitted to the framework, the profiling system checks whether a profiled log exists. If not, the profiler is activated first to collect the execution pattern of the newly arrived streaming application. Otherwise, the submitted application starts execution immediately. The resource predictor then predicts the resource demand of the submitted streaming applications according to their profiled logs. Considering the current resource usage captured by the resource monitor, the resource provisioning mechanism is triggered as needed in order to improve resource utilization.
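The following is a minimal, self-contained sketch of this control loop; the dictionary-based profile store, the mean-based stand-in predictor and the capacity numbers are illustrative assumptions standing in for the framework's profiler, LMD-HMM/HSMM predictor and allocator.

```python
# One simplified pass of the resource management layer's workflow (cf. Fig. 3.3).
profiles = {}   # application name -> recent resource-usage history (the profiled log)
capacity = {}   # application name -> currently provisioned capacity (arbitrary units)

def submit(app, initial_history):
    """Profile newly arrived applications; known applications start immediately."""
    if app not in profiles:
        profiles[app] = list(initial_history)
    capacity.setdefault(app, max(profiles[app]))

def predict_demand(app, window=4):
    """Stand-in predictor: mean of recent profiled usage (the LMD-HMM/HSMM in the framework)."""
    history = profiles[app][-window:]
    return sum(history) / len(history)

def elasticity_cycle(observed_usage):
    """Monitor feeds the profile; provisioning is adjusted ahead of the predicted demand."""
    for app, usage in observed_usage.items():
        profiles[app].append(usage)
        demand = predict_demand(app)
        if demand > capacity[app]:              # scale up/out before the load arrives
            capacity[app] = demand
        elif demand < 0.5 * capacity[app]:      # release clearly idle resources
            capacity[app] = demand

submit("traffic-analysis", [2.0, 2.5, 3.0])
elasticity_cycle({"traffic-analysis": 6.0})
print(capacity["traffic-analysis"])
```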

Figure 3.3: Operational Overview of the Resource Management Layer

3.3.3 Cloud Layer

The cloud resource layer is the lowest layer, which provides the host resources (VMs or container-clusters) required by the applications. It contains the stream processing engine (e.g., Spark, Heron) and cloud orchestration services (e.g., Kubernetes and Docker) to run multiple containerized streaming applications. The containers for the streaming applications are created by assembling them from individual base images. Each application is packaged into containers and deployed on a cluster of hosts. The cloud resource layer also handles stream processing management through the ingestion, processing and storage layers. The ingestion layer provides services such as Flume1 and Kafka [133] that facilitate access to heterogeneous data sources for each streaming event while concealing the technical facets of the data streams. The ingestion layer is capable of integrating data streams from multiple sources as required by each streaming event. The processing layer contains SPEs such as Spark [14] and Storm [13] that support the analysis of the streamed data in order to identify meaningful patterns in the data streams and trigger actions. The storage layer supports record ordering and strong consistency to enable fast reads and writes of large streams of data whenever required. Storage services in this layer, e.g., HDFS2, may serve as an intermediate or final storage point for stream analysis results.

1https://flume.apache.org/ 2http://hadoop.apache.org/

3.4 Function of Actors

3.4.1 Cloud Service Consumers

The CSCs make use of location-independent cloud interfaces to access cloud services. Requests are sent by the CSC to the CSB with QoS constraints such as deadline, budget and penalty rate. This information is utilized by the CSB to process the CSC's request, and a formal agreement (SLA) is signed between both parties to guarantee the QoS requirements.

3.4.2 Cloud Service Brokers

The CSB acts as an intermediary between the CSP and the CSC. CSBs rent a certain number of virtual machines (VMs) or containers from the CSP to run their streaming applications. They make it easier to deliver value to the CSCs while also assisting with coordinating the services they use. Examples of CSBs are APPIRIO, Appregata and BOOMI, which provide solutions for CSPs and mid-sized companies. The CSB performs three key roles, among others, in helping CSPs deliver services to their customers [161,162]:

• Aggregation: The CSB bundles many individual services together and presents them as a unified service.

• Integration: Similar to aggregation, the CSB can integrate multiple services collectively to provide new functionality, i.e., the CSB has the flexibility to choose services from multiple CSPs.

• Customization: A number of services can be customized by the CSB to meet the needs of individual CSCs.

While the CSB provides cloud services to CSCs, it also supports the CSPs. CSBs can implement services and solutions that may not be part of a provider's core business and can negotiate better prices for CSCs while delivering more customers to the CSPs. Many CSPs work with CSBs to facilitate relationships with the CSCs. For instance, Verizon serves as a CSB for Google and Amazon to connect various enterprises with appropriate cloud services. Services that support CSBs include resource discovery, resource monitoring, SLA management, and scheduling. In order to benefit from the elasticity the cloud provides, CSBs need to continuously monitor the end-to-end QoS of their cloud-hosted applications. They can then deploy strategies to scale up and down the resources allocated to each application.

3.4.3 Cloud Service Providers

The CSP provides the computing, storage, and network resources required for hosting CSB services and maintains SLAs with the CSBs on the availability of infrastructure services. The CSP ensures their resources are integrated such that usage is optimized. CSPs provide the ability to add or remove resources on demand and implement a clear means to charge CSBs for resource usage. They are responsible for dispatching VM or container images to create instances that run on their physical resources. Optimal selection of services by a CSB depends on the availability of complete and updated information on the services offered by the CSP.

3.5 Framework Components

To manage cloud resources, various services are used to select, configure, deploy, and monitor them. From the CSB's perspective, these services connect the resource operations, thereby providing an abstraction layer that lets the CSB focus on its main objectives. The main services required to attain resource elasticity for the CSB are discussed as follows:

3.5.1 Resource Profiler

Workload characterization is crucial in a cloud computing system in order to achieve proper resource provisioning, cost estimation, and SLA considerations. To effectively manage streaming applications and resources, it is important to use models and tools that create an application profile, which is then used to build forecasting models for determining resource usage. Application profiling involves four phases: data granularity definition, monitoring, processing and storing. Each phase has its own challenges, as discussed by the authors in [163]. Although some research progress has been made in developing profilers for the cloud, only a few take into consideration the dynamic nature of the cloud. The profiler is responsible for collecting performance metrics, such as the service rate, which are then sent to the container resource allocator. The data collected is aggregated for making allocation decisions in an efficient manner. While the log of resource usage is captured by the profiler, the monitor captures the current resource usage. The benefits of profiling include cost, application and resource management [163]. We discuss the role of the monitor in detail next.

3.5.2 Resource Monitor

Resource monitoring plays a crucial role in providing streaming application guarantees such as performance, availability, and security. A resource monitor is in charge of collecting real-time measures of key performance indicators and other useful workload information by communicating frequently with the VM and container platforms. The information collected by the resource monitor (e.g., CPU usage, memory, service rate, arrival rates) can help the resource predictor and resource allocator to manage the VMs and containers. It provides system-wide visibility into streaming application characteristics and workload changes. Monitoring is beneficial to CSBs since it helps them to analyze their resource requirements and to ensure that they get the requested amount of resources they are paying for. It also enables them to relinquish any underutilized resources and to determine the amount of resources required to run their applications. Continuous monitoring of the cloud supplies CSBs and users with information on the workload generated as well as the performance and QoS offered through the cloud [164]. While monitoring is important for providing elastic solutions in the cloud, it is not an easy task, especially when dealing with containers. Since applications and all their dependencies are bundled into containers, which may be hosted on the same machine, accurately monitoring containers becomes tedious. There are several monitoring platforms that can be used to monitor resources in the cloud: platforms provided by CSPs (e.g., CloudWatch3), open-source tools (e.g., Nagios4, Ganglia5) and container-oriented tools (e.g., Sysdig6). Information gathered from these tools is passed on to the resource predictor to create a model, and evaluation results from the predictor can be used by the resource allocator to perform scaling operations for running applications. For BDSAs running on a cluster of hosts, the resource prediction model needs to be robust enough to represent the typical variations in the underlying streaming jobs.

3https://aws.amazon.com/cloudwatch/ 4https://www.nagios.org/ 5https://developer.nvidia.com/ganglia-monitoring-system 6https://sysdig.com/
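The following is a minimal host-level sketch of the kind of sampling the resource monitor performs, using the psutil library; in a containerized deployment the same measures would come from a container-aware tool such as those named above, and the fields and sampling interval here are illustrative assumptions.

```python
# Periodically sample host metrics that the monitor would forward to the predictor.
import time
import psutil

def sample_metrics():
    return {
        "timestamp": time.time(),
        "cpu_percent": psutil.cpu_percent(interval=1),       # averaged over a 1-second window
        "memory_percent": psutil.virtual_memory().percent,
        "net_bytes_sent": psutil.net_io_counters().bytes_sent,
    }

history = [sample_metrics() for _ in range(5)]   # a short monitoring window
print(history[-1])
```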

3.5.3 Resource Predictor

Prediction techniques can be used in the context of cloud computing to help CSBs optimize the utilization of resources. By correctly estimating the future demand for resources, the right amount of resources can be provisioned to deliver the expected SLA with minimum wastage. To this end, CSBs need to be able to dynamically adapt the cloud operational environment on the fly to cope with variations in streaming workload dynamics. Hence, an approach is needed that predicts changes in the streaming workload patterns and adjusts promptly when variations in the patterns occur. Adding a resource predictor to the framework enables the CSB to predict the future workload according to the historical workload of the streaming applications.

3.5.4 Resource Allocator

The resource allocator is the module through which the CSB practically controls the computing resources, i.e., a dynamic resource allocation function can be implemented here. The container resource allocator is the mechanism that optimizes the resource allocation for container instances and determines on which host a container workload should run. The CRAM module is the load balancer that takes the container-cluster system state and workload arrivals from the CSB and attempts to adjust the load of the streaming applications to minimize delay.

3.6 Resources

“A resource is any physical or virtual component of limited availability within the cloud environment [165].” Cloud resources typically include processors, memory, and peripheral devices offered by the CSP to the CSB as short-term on-demand and long-term reservation plans. In this section, we introduce the resource constituents of the proposed framework.

• Compute Resource: The compute resource is the core processing capability used to execute software instructions, which comprises the CPU and memory. The CPU performs most of the processing inside the VM or container. CPU utilization is an issue in cloud computing and varies depending on the amount and type of managed computing tasks; some tasks are CPU-intensive whereas others are less so.

• Storage Resource: Datacenters usually provide access to internal storage as well as a storage area network that abstracts the complexity of accessing storage throughout the datacenter. Cloud storage systems generally rely on hundreds of data servers. This allows data to be maintained, managed and backed up remotely in the cloud using the network. It is used as an extension to the primary memory due to lower cost.

• Network Resource: A network resource is a set of virtual networking resources: virtual nodes (end-hosts, load balancers, switches, firewalls, routers) and virtual links. The weight of the network resource in the cloud market has prompted network service providers to build their own distributed DCs with a vision to enter the cloud market [166]. A CSP provides network services to CSBs to: 1) connect CSBs' clients to VMs reserved in the DCs, 2) connect VMs across clouds to facilitate data exchange between them, and 3) connect VMs within a cloud. There is an emerging trend towards virtualizing DC networks, in addition to server virtualization, to create multiple virtual networks (VNs) on top of a shared physical network substrate [3]. This allows each VN to be implemented and managed independently. This way, customized network protocols and management policies can be introduced. In addition, it provides manageability, isolation to minimize security threats, and better performance.

3.7 Summary

As an alternative to static provisioning, dynamically allocating resources to BDSAs by forecasting the expected workload demands is important to reduce unnecessary resource provisioning while guaranteeing QoS. Based on the proposed framework, we address the resource prediction challenge by introducing proactive mechanisms to forecast the resource needs of multiple heterogeneous BDSAs utilizing a cluster of hosts in Chapter 4. In the cloud, mechanisms that ensure fair resource allocation determine whether CSBs should engage BDSAs in dynamic, shared cloud clusters. Such mechanisms should ensure that agents (i.e., BDSAs) that participate in the shared environment cannot gain by misrepresenting their actions. In Chapter 5, we propose a novel container resource allocation mechanism along with a pricing scheme to address the resource allocation challenge.

Chapter 4

Cloud Resource Scaling for Big Data Streaming Applications

The majority of research efforts on resource scaling in the cloud are investigated from the cloud provider's perspective, with little consideration for multiple resource bottlenecks. This chapter addresses the resource prediction problem from a CSB's point of view such that efficient scaling decisions can be made. It provides two contributions to the study of resource scaling for BDSAs in the cloud. First, a Layered Multi-dimensional Hidden Markov Model (LMD-HMM) is presented for managing time-bounded streaming applications. Second, to cater to unbounded streaming applications, a Layered Multi-dimensional Hidden Semi-Markov Model (LMD-HSMM) is proposed. The parameters of the models are evaluated using modified forward and backward algorithms. Detailed experimental evaluation results show that the LMD-HMM is very effective with respect to cloud resource prediction for bounded streaming applications running for shorter periods, while the LMD-HSMM accurately predicts the resource usage of streaming applications running for longer periods. A summary of the contributions in this chapter has been published in [35,36].


4.1 Introduction

Recent years have witnessed a new class of real-time big data streams generated from social media, web clicks, IP logs, and sensing devices [10]. Processing these big data streams in real time to extract value has become a crucial requirement for many time-bounded applications, which run for a short period of time, as well as for unbounded big data stream processing applications that execute for a long period of time. Given the real-time stream processing requirements for big data, traditional data management systems are not suitable and are prohibitively expensive when dealing with very large amounts of data [11]. To handle fast and massive streaming data, cloud services are increasingly leveraged. Cloud computing is a powerful paradigm that allows users to quickly deploy their applications in an elastic setting through on-demand acquisition/release of resources at a low cost [12]. Oftentimes, the cloud hosts many of the stream processing engines (SPEs) (e.g., Storm [13], Spark [14], and S4 [15]), which users can make use of depending on their requirements. These SPEs allow for the continuous processing of data as it is generated [11]. They also alleviate the performance problems faced by the traditional store-and-process model of data management since they process data directly in memory. Nevertheless, it is naturally challenging for big data application service providers to determine the right amount of resources needed to meet the QoS requirements of streaming applications, especially when multiple applications with varying resource bottlenecks (e.g., CPU, memory) run on a shared cluster of hosts. A statically provisioned SPE (i.e., an SPE deployed on a fixed number of processing nodes) can lead to overprovisioning of resources for peak loads or to underprovisioning, which can have a huge impact on the performance of the streaming applications. Thus, to efficiently process dynamic streams, SPEs need to be able to scale the used cloud resources up/down accordingly. Different scaling techniques are used to reduce the cost of utilizing cloud resources. These involve i) adding virtual machine instances, i.e., horizontal scaling, ii) changing the amount of resources allocated to a virtual machine, i.e., vertical scaling, or iii) distributing the workload among available virtual machines. Prior work on resource scaling for streaming applications can be classified as either reactive or proactive. Although reactive techniques are computationally attractive, i) resource-sensitive applications may suffer from degraded performance or even terminate before the need for resource up-scaling is detected [22], ii) the delay resulting from the scaling operations (e.g., adding a new virtual machine (VM), searching for a spot at a datacenter, allocating an IP address, configuring and booting the OS [167]) may not be tolerated by most streaming applications [144] and, iii) they perform poorly when multiple resources (e.g., CPU and memory) must be adjusted simultaneously [168]. Proactive techniques [169,170] overcome the aforementioned limitations by predicting future applications' resource needs before resource bottlenecks occur. However, existing predictive approaches proposed for streaming applications can be cumbersome and may lead to inaccurate results. This problem is further complicated by the stringent delay constraints, increasing parallel computation requirements, dynamically changing data volumes and arrival rates of the data streams.
Overall, these techniques often address the auto-scaling problem from a cloud provider's point of view and mostly focus on web applications. We proposed the use of a layered multi-dimensional HMM (LMD-HMM) framework to facilitate the automatic scaling up/down of resources (i.e., adding or decommissioning virtual machines) for time-bounded streaming applications. The proposed design provides an approach for modelling each application individually at the lower layer, while the upper layer models interactions between groups of streaming applications running on a cluster. However, our experimental results showed that the LMD-HMM may not be sufficiently accurate for predicting the resource needs of unbounded streaming applications that run over very long periods. This can be attributed to its implicit assumption of a constant or geometric distribution of resource consumption state durations that does not change over time. We overcome this problem by proposing a novel Layered Multi-dimensional Hidden Semi-Markov Model (LMD-HSMM) that can accurately capture the resource demands of unbounded big data streaming applications. More precisely, the main contributions of this model are twofold. First, we introduce a novel mathematical model for predicting the resource usage patterns of multiple unbounded streaming applications running simultaneously on a cluster of cloud nodes. Second, we present a modified forward-backward algorithm [153], commonly used in calculating the predicted parameters of the Markov model, that takes into consideration the prediction of multiple resource bottlenecks at each layer of the model. The proposed layered approach simplifies resource prediction for a large cluster of nodes by modelling the behavior of each streaming application individually at the lower layer of the model. Furthermore, the multi-dimensional Markov process is used for each of these streams to separately capture the individual resource (e.g., memory, CPU and I/O) needs. The model also captures, at a finer grain, the frequency of resource consumption state changes per resource for each stream, which are then fed to the upper layer of the model. The upper layer then models the dependency and collective resource usage of the unbounded streaming applications in the cluster. This layered approach not only allows different Markov models to be used simultaneously, but also accurately captures varying application needs while avoiding the need for a larger training dataset, since each streaming application model is trained individually. The frequency of retraining these models can be adjusted for each application. The rest of this chapter is organized as follows: Section 4.2 briefly discusses existing resource scaling approaches. Section 4.3 provides an overview of the prediction system; it also presents the learning and inference models for the proposed resource prediction mechanisms based on the layered multi-dimensional HMM/HSMM prediction technique. Section 4.4 reports our experimental results, and Section 4.5 concludes this chapter.

4.2 Related Work

Reactive scaling schemes in the cloud are known to allocate resources based on workload changes detected using threshold-based rules. This approach requires that application providers be aware of the resource usage patterns in order to set triggers for scaling activities. In practice, most cloud service providers, e.g., Amazon autoscale1 and Azure2, describe a set of service provider-defined rules that govern how the services scale up or down at run-time in reaction to changes in traffic. This approach can lead to a large delay before resources are provisioned. To address this limitation, proactive schemes are used, whereby resource utilization patterns are predicted through forecasting in advance of workload changes and allocations are adjusted accordingly prior to any change. Various predictive techniques [169–174] have been proposed for the cloud based on approaches such as time series analysis (e.g., linear regression, autoregression, neural networks, and ARIMA), control theory, reinforcement learning and queuing theory. The time series approach [175] incorporates temporal and spatial correlations in order to provide prediction results. An approach based on time series forecasting methodology was proposed by Baldan et al. [176] to forecast cloud workloads by combining statistical methods, a cost function and visual analysis, with a focus on a short time horizon. However, no seasonal pattern was found since the analyzed series were non-linear. Control theory [175] has been applied to automate the management of resources in various engineering fields, such as storage systems, data centers and cloud computing platforms. The main objective of a controller is to maintain the output of the target system (e.g., performance of a cloud environment) at a desired level by adjusting the control input (e.g., number of VMs).

1https://aws.amazon.com/autoscaling/ 2https://azure.microsoft.com/en-us/features/autoscale/

The suitability of control theory approaches depends on the type of controller and the dynamics of the target system. Queuing theory [175] is used to find the relationship between jobs arriving at and leaving a system. It is mostly used for Internet applications; however, it is not an efficient approach when used with elastic applications that have to deal with changing workloads and a varying pool of resources. While reinforcement learning [177] automates scaling tasks without prior knowledge of the application workload, it can lead to scalability problems when the number of states grows. It also yields poor performance since the policy has to be readapted as the environment conditions (e.g., workload pattern) change. Vijayakumar et al. [178] developed an algorithm that can handle dynamic patterns in data arrival to identify overflow or underflow conditions. Different from our work, this is a reactive approach that focuses mainly on CPU adaptation, whereby streaming applications are implemented as a set of pipelined stages; here, each stage acts as a server for the next. Nikravesh et al. [179] proposed an auto-scaling system based on the Hidden Markov Model (HMM). In their research, the states are associated with system conditions and the observations are the sequence of generated scaling actions. Their experiments targeted e-commerce websites, i.e., multi-tier applications, which typically include a business-logic tier containing the application logic as well as a database tier. While this approach makes use of an HMM, its direct applicability to big data stream processing applications is limited because it does not take into consideration multiple streaming applications, which can lead to excessive retraining of the model in response to changes in active streaming jobs, and to overfitting of the data. In contrast, our approach makes use of a two-layered multi-dimensional hidden Markov/semi-Markov model: the lower layer is used for training individual streaming jobs without altering the upper layer, so that whenever a job is not running the model does not need to be recalculated, while the upper layer is used for training groups of applications running on clusters. By so doing, the outcome of the upper layer is less sensitive because it is based on the collective behaviour of the streaming jobs from the lower layer, which are expected to be well trained. Finally, our framework can accommodate different streaming applications and is application independent. Several predictive models have been developed in the past for short-term (bounded) and long-term (unbounded) predictions of performance metrics for VM clusters using machine learning techniques. However, few works exist for big data streaming applications. PLAStiCC [180] uses short-term workload and infrastructure performance predictions to actively manage resource mapping for continuous dataflows to meet QoS goals while limiting resource costs. For time-unbounded streaming applications, the work in [181] proposed a resource scheduler which dynamically allocates processors to operators to ensure a real-time response. This work focuses on streaming operator scaling, while our research is aimed at scaling resources. The authors in [182] proposed Elysium, a fine-grained model for estimating the resource utilization of a stream processing application, which enables the independent scaling of operators and resources using reactive and proactive approaches.
However, they did not consider multiple applications with varying resource bottlenecks. The use of containers has been exploited recently for scaling stream processing operators. Pahl & Lee [183] reviewed the suitability of container and cluster technology as a means to address elasticity in highly distributed environments. While the use of containers can provide faster provisioning during scale in/out as compared to VMs, streaming applications will need to be designed as lightweight SOA-style independently deployable microservices to gain this benefit. Moreover, some containers run on VMs instead of bare metal machines and are limited by the number of VMs provisioned. Hence, VM scaling is required. To the best of the author's knowledge, the application of HSMM to unbounded big data streaming applications is novel.

4.3 Proposed Prediction Model

We consider a CSB that provides big data stream processing services for cloud users in a scalable distributed environment as shown in Fig. 3.2 from Chapter 3. The applications collect data from a wide variety of sources and, based on user requirements, integrate them to detect interesting events and patterns. We hereby present our solution for handling the resource prediction of big data streaming applications, whereby resource needs are forecasted in advance by monitoring each streaming job and its associated characteristics. Our solution combines knowledge of stream processing with statistical Markov models. The key idea of our approach is to decouple the problem space into layers, whereby each streaming job is independently modeled as a multi-dimensional HMM/HSMM at the lower-layer and the generated observations are concatenated afterwards at the upper-layer. We extend the standard HMM and HSMM to a layered multi-dimensional HMM/HSMM for the resource predictor. The layered approach allows for the decomposition of the parameter space into distinct layers, whereby each layer is connected to the next by inferential results. In our case, we make use of a two-layered model (upper-layer and lower-layer) as shown in Fig. 4.1, where each streaming application can be modeled as either a multi-dimensional HMM or a multi-dimensional HSMM at the lower layer. The lower-layer allows for the abstraction of individual streaming jobs' observations at particular times and also ensures independence from others, while the upper-layer is used to predict the resource usage pattern for the group of streaming applications running on a cluster.

Figure 4.1: A Layered Multi-dimensional HMM.

4.3.1 The Lower-Layer

The lower-layer denotes the resource utilization for each individual streaming job (e.g., traffic analysis, weather analysis, and anomaly detection), where each streaming job's HMM/HSMM has more than one observable symbol at each time t. We make use of the multi-dimensional HMM (MD-HMM) or multi-dimensional hidden semi-Markov model (MD-HSMM) at the lower-layer to represent each individual streaming job's resource usage pattern, whereby the system bottlenecks (e.g., CPU, memory) being considered form the dimensions of the process. The observations from the submodel MD-HMM/MD-HSMM corresponding to each streaming job at the lower layer are then concatenated into a new single-layer HMM at the upper-layer. Next, we discuss estimation and learning algorithms for the models. We list in Tbl. 4.1 important notations used in this chapter.

4.3.2 MD-HMM Inference & Learning Mechanisms

The parameters in our multi-dimensional HMM can be defined as follows:

• N: Number of states associated with the resource usage patterns for each streaming job. This is denoted as S = {S_1, S_2, ..., S_N}, and the state at time t as q_t.

Table 4.1: Notations used in the LMD-HMM & LMD-HSMM models

Variable      Definition
N             Number of hidden states in the model representing the resource usage patterns
R             Number of observable dimensions for each streaming job
M             Number of distinct observation symbols, which describes the set of states for each individual streaming job's dimension
A             Hidden state transition probability distribution for each streaming job
B             Set of observable probability distribution matrices
O             O = {O_1, O_2, ..., O_T}, the observation sequence
S             S = {q_1, q_2, ..., q_T}, the state sequence
D             Time spent in a particular state
b_j(O_k^r)    Probability of emitting observation symbol O_k^r in state j for each dimension r
π             Initial probability distribution
α_t(i)        Forward variable that denotes the probability of generating O_{1:t} and ending in state S_i at time t
β_t(i)        Backward variable that denotes the probability of generating O_{t+1:T} given state S_i at time t
γ_t(i)        Probability of being in S_i at time t, given the observation sequence O_{1:T}
ξ_t(i, j)     Conditional probability of being in state S_i at time t and in state S_j at time t + 1, given the observation sequence O_{1:T}
λ̄_g           Re-estimated parameters of the HMM/HSMM model for each streaming job

• R: Number of observable dimensions for each streaming job. In our model, system bottlenecks such as CPU and memory utilization are dimensions in the process.

• M: Number of distinct observation symbols in each state for each individual streaming job's dimension. Hence, for a given streaming job in state q_t = S_i, there will be M × R distinct observations, where O^r = {O_1^r, O_2^r, ..., O_M^r} are the individual observation sequences for each dimension 1 ≤ r ≤ R. For simplicity, we assume that the number of states for each dimension is the same.

• A: Hidden state transition probability distribution for each streaming job, A = {a_ij}, where a_ij = P{q_{t+1} = S_j | q_t = S_i}, S_i, S_j ∈ S, 1 ≤ i, j ≤ N, described by the characteristics of the individual streaming job at time t.

• B: A set of observable probability distribution matrices, where b_j(O_k^r) is the probability of emitting observation symbol O_k^r in state j for each dimension r, i.e., b_j(O_k^r) = P[O_t^r = O_k^r | q_t = S_j] for 1 ≤ k ≤ M and 1 ≤ r ≤ R. In the multi-dimensional HMM, there will be R B-matrices, i.e., B = {B^1, B^2, ..., B^R}, where each B^r characterizes the stochastic distribution for each dimension of the individual streaming job.

• π: Initial probability distribution, where π_i = P(q_1 = S_i), 1 ≤ i ≤ N.

Given these parameters, a multi-dimensional HMM (MD-HMM) can be used to derive the sequence of observed symbols O = {O_1, O_2, ..., O_T}, where O_t = {O_t^1, O_t^2, ..., O_t^R}. The complete model is represented as λ = (N, R, M, A, B, π). The MD-HMM model for each streaming job, λ_g = (N_g, R_g, M_g, A_g, B_g, π_g), where g = 1, ..., G, can be used to solve the three basic problems (i.e., evaluation, decoding and training) that arise:

• How to compute the probability P(O|λ_g) of the observation sequence O = {O_1, O_2, ..., O_T} given the model λ_g. This problem corresponds to evaluation, which can be solved using the Forward algorithm.

• How to compute the corresponding state sequence that best explains the observation. This is a decoding problem in which the Viterbi algorithm [153], a dynamic programming algorithm that computes both the maximum probability and the corresponding state sequence given the observation sequence, is used.

• Given the observation O, how to find the model λ_g that maximizes P(O|λ_g). This problem corresponds to learning, i.e., determining the model that best fits the training data. The Baum-Welch algorithm [184], which can be extended to reflect the multi-dimensional properties based on the independence assumption, can be used for solving this problem. The learning problem adjusts the model parameters λ_g such that P(O|λ_g) is maximized.

For simplicity, we will drop the subscript g hereafter since we are focusing on an individual streaming job at the lower layer.

Given a model λ, an observation sequence O = {O_1, O_2, ..., O_T}, and a state sequence Q = {q_1, q_2, ..., q_T} where q_t = S_i at time t, all the parameters for solving these problems

depend on the forward-backward variables. The forward variable α_t(i) is the joint probability of the partial observation sequence O_1, O_2, ..., O_t and the hidden state q_t = S_i at time t. The backward variable β_t(i) is the probability of the partial observation sequence O_{t+1}, O_{t+2}, ..., O_T, given the hidden state at time t is q_t = S_i.

The forward procedure is a recursive algorithm for calculating α_t(i) for observation sequences of increasing length t. In our framework, since each observed dimension is assumed to be independent, the forward variable α_t(i) can be modified and the output for each state i recomputed as the product of the observation probabilities of each dimension as follows:

\alpha_t(i) = P(O_1, O_2, ..., O_t, q_t = S_i \mid \lambda) \quad (4.1)

\alpha_1(i) = \pi_i \prod_{r=1}^{R} b_i(O_1^r), \quad 1 \le i \le N \quad (4.2)

\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i) a_{ij} \right] \prod_{r=1}^{R} b_j(O_{t+1}^r), \quad 1 \le t \le T-1, \; 1 \le j \le N \quad (4.3)

where α_t(i) is the probability of the partial observation sequence O_1, O_2, ..., O_t with state q_t = S_i, given the model λ, and α_{t+1}(j) is the probability of the partial observation sequence O_1, O_2, ..., O_t, O_{t+1} with state S_j at time t + 1.
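For concreteness, the multi-dimensional forward recursion of Eqns. 4.1–4.3 can be sketched as follows. This is a minimal Python/NumPy illustration under assumed array shapes (discrete per-dimension emission matrices and symbol-index observations), not the thesis implementation.

```python
import numpy as np

def md_forward(pi, A, B, obs):
    """Forward pass of a multi-dimensional HMM (Eqns. 4.1-4.3).

    pi  : (N,)       initial state distribution
    A   : (N, N)     state transition matrix, A[i, j] = a_ij
    B   : (R, N, M)  per-dimension emission matrices, B[r, j, k] = b_j(O_k^r)
    obs : (T, R)     observed symbol indices, one column per dimension
    Returns alpha of shape (T, N) and the observation likelihood P(O | lambda).
    """
    T, R = obs.shape
    N = len(pi)
    alpha = np.zeros((T, N))
    # Eqn. 4.2: initialization -- product of per-dimension emission probabilities
    emit0 = np.prod([B[r, :, obs[0, r]] for r in range(R)], axis=0)
    alpha[0] = pi * emit0
    # Eqn. 4.3: induction
    for t in range(1, T):
        emit = np.prod([B[r, :, obs[t, r]] for r in range(R)], axis=0)
        alpha[t] = (alpha[t - 1] @ A) * emit
    # Eqn. 4.8: observation likelihood
    return alpha, alpha[-1].sum()
```

In practice the α values would be scaled (or computed in log space) at each step to avoid numerical underflow for long observation sequences.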

Likewise, the recursion described in the forward algorithm can be applied backwards in time to obtain the backward algorithm. The backward variable can be recomputed as:

\beta_t(i) = P(O_{t+1}, O_{t+2}, ..., O_T \mid q_t = S_i, \lambda) \quad (4.4)

\beta_T(i) = 1, \quad 1 \le i \le N \quad (4.5)

\beta_t(i) = \sum_{j=1}^{N} a_{ij} \beta_{t+1}(j) \prod_{r=1}^{R} b_j(O_{t+1}^r), \quad 1 \le t \le T-1, \; 1 \le i \le N \quad (4.6)

where β_t(i) is the probability of the partial observation sequence O_{t+1}, ..., O_T from t + 1 to the end, given the state q_t = S_i at time t and the model λ. The state transition probability from S_i to S_j is a_ij, and b_j(O_{t+1}^r) is the observation symbol probability of the occurrence of the observable value O_{t+1}^r in state S_j at time t + 1. b_j is normalized based on the outputs of each dimension. The observation evaluation is then:

P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_t(i) \beta_t(i), \quad \forall t. \quad (4.7)

or, equivalently, using only the forward variable as defined in Equation 4.1,

P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i) \quad (4.8)

To address the training problem of finding the optimum model parameter vector λ ∈ Λ that maximizes P (O|λ), given an observation sequence O, the Baum-Welch algorithm [184] can be extended for the MD-HMM in terms of the joint events and state variables, respectively:

\xi_t(i, j) = P(q_t = S_i, q_{t+1} = S_j \mid O, \lambda)
            = \frac{\alpha_t(i) \, a_{ij} \, b_j(O_{t+1}) \, \beta_{t+1}(j)}{P(O \mid \lambda)}
            = \frac{\alpha_t(i) \, a_{ij} \, b_j(O_{t+1}) \, \beta_{t+1}(j)}{\sum_{i=1}^{N} \alpha_T(i)} \quad (4.9)

\gamma_t(i) = P(q_t = S_i \mid O, \lambda) = \frac{\alpha_t(i) \beta_t(i)}{\sum_{i=1}^{N} \alpha_T(i)} \quad (4.10)

ξ_t(i, j) is the probability of being in state S_i at time t and in state S_j at time t + 1, given the observation sequence O_1, O_2, ..., O_T, and γ_t(i) is the probability of being in S_i at time t.

The variables α_t(i), β_t(i), ξ_t(i, j) and γ_t(i) are found for every training sample and then re-estimation is performed on the accumulated values. To generate the new, better model λ̄, the state transition, observation symbol and initial state probabilities can be updated as follows:

\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)}, \quad 1 \le i \le N, \; 1 \le j \le N \quad (4.11)

\bar{b}_j^{\,r}(k) = \frac{\sum_{t=1,\, O_t^r = O_k^r}^{T} \gamma_t(j)}{\sum_{t=1}^{T} \gamma_t(j)}, \quad 1 \le r \le R, \; 1 \le j \le N, \; 1 \le k \le M \quad (4.12)

\bar{\pi}_i = \gamma_1(i), \quad 1 \le i \le N \quad (4.13)

A set of new model parameters λ̄ = (Ā, B̄, π̄), which are the averages of the results from each individual training sequence, is obtained after the update, and the process is repeated until the changes in the parameters are smaller than a predefined threshold, i.e., the optimum is reached. In the testing phase, a test sequence is fed to each of the MD-HMMs and the maximum likelihood given the observation is evaluated using the Viterbi algorithm [153]. The Viterbi algorithm is used to obtain the most probable state sequence using the forward item at time t in state i based on the learned parameters from training.
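As an illustration of how the re-estimation step maps onto the accumulated posteriors, the following minimal NumPy sketch applies Eqns. 4.11–4.13 for a single training sequence; the array shapes and the function name are assumptions for illustration, not the thesis code.

```python
import numpy as np

def reestimate(gamma, xi, obs, M):
    """M-step updates of Eqns. 4.11-4.13 from accumulated posteriors.

    gamma : (T, N)      gamma_t(i), per-step state posteriors
    xi    : (T-1, N, N) xi_t(i, j), pairwise state posteriors
    obs   : (T, R)      observed symbol indices per dimension
    M     : number of distinct symbols per dimension
    Returns (pi_bar, A_bar, B_bar), with B_bar of shape (R, N, M).
    """
    T, R = obs.shape
    N = gamma.shape[1]
    # Eqn. 4.13: initial state probabilities
    pi_bar = gamma[0]
    # Eqn. 4.11: transition probabilities
    A_bar = xi.sum(axis=0) / gamma[:-1].sum(axis=0)[:, None]
    # Eqn. 4.12: per-dimension emission probabilities
    B_bar = np.zeros((R, N, M))
    for r in range(R):
        for k in range(M):
            mask = obs[:, r] == k          # steps where symbol k was observed
            B_bar[r, :, k] = gamma[mask].sum(axis=0) / gamma.sum(axis=0)
    return pi_bar, A_bar, B_bar
```

In the full procedure these updates are averaged over all training sequences and iterated until the parameter change falls below the convergence threshold.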

4.3.3 MD-HSMM Inference & Learning Mechanisms

In this section, we extend the standard HSMM in order to address the need to model multiple resource bottlenecks. The HSMM is very useful for applications that run unbounded for longer periods, accommodating shifts in behaviour that the HMM-based model does not capture. This model relaxes the duration assumption on resource usage states for big data streaming applications. To define the parameters for the MD-HSMM, duration is included in addition to the parameters defined for the MD-HMM in Section 4.3.2 as:

• D: The probability distribution matrix of remaining in state i for d time intervals, i.e., P_i(d).

We define the lower-layer model MD-HSMM as λ = (π, N, R, M, A, B, D) to reflect the duration of resource usage states for each streaming application, i.e., each streaming application is modeled as a multi-dimensional hidden semi-Markov model (MD-HSMM) at the lower-layer. Therefore, given the observable sequence over the past T duration, we need to find its most likely state sequence S* and the most likely state model MD-HSMM λ* for each streaming application. This involves solving the basic problems of evaluation, recognition and training as described next. The process of training and recognition involves redefining the forward and backward variables. The addition of duration in the model makes the algorithm more complex than the HMM-based model. We define the forward variable α_t(i) as the probability of generating

O_1, O_2, ..., O_t and ending in state i, as defined in Eqn. 4.1. Assuming that the first state begins at time 1 and the last state ends at time T, the initial conditions for the forward recursion formula become:

\alpha_1(i) = 1, \quad (4.14)

\alpha_t(j) = \sum_{i=1}^{N} \sum_{d=1}^{D} \alpha_{t-d}(i) \, a_{ij} \, P_j(d) \prod_{r=1}^{R} \prod_{s=t-d+1}^{t} b_j(O_s^r) \quad (4.15)

Here, α_{t-d}(i) is assumed to have been determined, where the duration in state j is d, a_ij is the state transition probability from state i to state j, b_j(O_s^r) is the output probability of the observation O_s^r for each dimension r in state j, and P_j(d) is the state duration probability of state j. Also, N is the number of states in the MD-HSMM and D is the maximum duration in a state. Likewise, the backward variable can be written as in Eqn. 4.4 and the value for the initial condition of the backward recursion formula at t = T is as written in Eqn. 4.5. Summing over all the states N and all possible state durations d, we have the recurrence relation:

\beta_t(i) = \sum_{j=1}^{N} \sum_{d=1}^{D} a_{ij} \, P_j(d) \, \beta_{t+d}(j) \prod_{r=1}^{R} \prod_{s=t+1}^{t+d} b_j(O_s^r) \quad (4.16)
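To make the duration-explicit recursion of Eqn. 4.15 concrete (the backward pass of Eqn. 4.16 is analogous), the sketch below gives a minimal, unoptimized Python rendering for discrete emissions; the array layout and the initialization convention are assumptions for illustration, not the thesis implementation.

```python
import numpy as np

def hsmm_forward(pi, A, B, P_dur, obs, D):
    """Duration-explicit forward recursion (Eqn. 4.15) for an MD-HSMM.

    pi    : (N,)       initial state distribution
    A     : (N, N)     transition matrix a_ij (no self-transitions in an HSMM)
    B     : (R, N, M)  per-dimension emission matrices
    P_dur : (N, D)     P_j(d), probability of staying d steps in state j
    obs   : (T, R)     observed symbol indices
    D     : maximum state duration considered
    """
    T, R = obs.shape
    N = len(pi)

    def emit(j, s):  # product of per-dimension emission probabilities at time s
        return np.prod([B[r, j, obs[s, r]] for r in range(R)])

    alpha = np.zeros((T + 1, N))
    alpha[0] = pi  # assumed convention: segments may start at t = 0
    for t in range(1, T + 1):
        for j in range(N):
            total = 0.0
            for d in range(1, min(D, t) + 1):
                # probability of emitting the segment obs[t-d:t] entirely in state j
                seg = np.prod([emit(j, s) for s in range(t - d, t)])
                total += (alpha[t - d] @ A[:, j]) * P_dur[j, d - 1] * seg
            alpha[t, j] = total
    return alpha
```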

We then define three variables to obtain the re-estimation formulae for the MD-HSMM as follows:

\alpha_{t,t'}(i, j) = P(O_1^{t'}, q_t = S_i, q_{t'} = S_j \mid \lambda) \quad (4.17)

\Omega_{t,t'}(i, j) = \sum_{d=1}^{D} \left[ P(d = t' - t \mid i) \cdot P(O_{t+1}^{t'} \mid q_t = S_i, q_{t'} = S_j, \lambda) \right] \quad (4.18)

\xi_{t,t'}(i, j) = P(q_t = S_i, q_{t'} = S_j \mid O_1^{T}, \lambda) \quad (4.19)

where α_{t,t'}(i, j) is the probability of the partial observation sequence with state i at time t and state j at time t' (t' = t + d), Ω_{t,t'}(i, j) is the mean value of the probability of being in state i for d time intervals and then moving to state j, and ξ_{t,t'}(i, j) is the probability of being in state i for d time intervals and then moving to state j, given O = O_1, O_2, ..., O_T.

α_{t,t'}(i, j) can be described in terms of Ω_{t,t'}(i, j) as:

\alpha_{t,t'}(i, j) = P(O_1^{t}, q_t = S_i \mid \lambda) \cdot P(O_{t+1}^{t'}, q_t = S_i, q_{t'} = S_j \mid O_1^{t}, q_t = S_i, \lambda) \quad (4.20)

                  = \alpha_t(i) \, a_{ij} \, \Omega_{t,t'}(i, j) \quad (4.21)

and the relationship between α_t(i) and α_{t,t'}(i, j) is given as:

\alpha_{t'}(j) = P(O_1^{t'}, q_{t'} = S_j \mid \lambda) \quad (4.22)

              = \sum_{i=1}^{N} \sum_{d=1}^{D} P(d = t' - t \mid S_j) \, \alpha_{t,t'}(i, j) \quad (4.23)

The ξ_{t,t'}(i, j) can be derived as follows:

\xi_{t,t'}(i, j) = \frac{\sum_{d=1}^{D} \alpha_t(i) \, a_{ij} \, \Omega_{t,t'}(i, j) \, \beta_{t'}(j)}{\beta_0(i = s_0)} \quad (4.24)

where s_0 is the initial resource usage state. Given the initial value of the model λ^0, the Expectation-Maximization (EM) algorithm is used to find the best model λ* using the recursive computing procedure. The re-estimation formulas for the MD-HSMM can be written as follows:

• The re-estimation for the initial distribution is the probability that state i was the first state, given O.

\bar{\pi}_i = \frac{\pi_i \left[ \sum_{d=1}^{D} \beta_d(i) \, P(d \mid i) \, b_i(O_1^{d}) \right]}{P(O_1^{T} \mid \lambda)} \quad (4.25)

• The re-estimation of state transition probabilities is the ratio of the expected number of transitions from state i to state j to the expected number of transitions from state i.

\bar{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{j=1}^{N} \sum_{t=1}^{T} \xi_{t,t'}(i, j)} \quad (4.26)

• The re-estimation formula for the observation distribution is the expected number of times that observation O_t = v_k occurred in state i, normalized by the expected number of times that any observation occurred in state i.

\bar{b}_i(k) = \frac{\sum_{t=1,\, O_t = v_k}^{T} \alpha_t(i) \left[ \frac{\Omega_{t,t'}(i, j)}{\sum_{d=1}^{D} P(d = t' - t \mid i)} \right] \beta_t(i)}{\sum_{t=1}^{T} \alpha_t(i) \left[ \frac{\Omega_{t,t'}(i, j)}{\sum_{d=1}^{D} P(d = t' - t \mid i)} \right] \beta_t(i)} \quad (4.27)

The optimal states S* and model λ* are then continuously updated by replacing λ with λ* whenever it gives a higher probability, until a limit is reached. The limit represents the number of iterations and signifies when to stop the re-estimation, or when the probability of observing the sequence O cannot be improved beyond a certain threshold. A problem with the duration in the HSMM-based model is the large number of parameters associated with each state, which makes the re-estimation problem more difficult than in the HMM-based model. To alleviate this, we make use of a parametric state duration density, which requires less training, instead of a non-parametric duration density. We adopt the Gaussian distribution, motivated by the central limit theorem, which states that the mean of any set of variates with any distribution having a finite mean and variance tends to the Gaussian distribution. The mean µ(j) and the variance σ²(j) of the duration of state j can be determined as follows:

\mu(j) = \frac{\sum_{d=1}^{D} \sum_{t=1}^{T-d} \left\{ \left( \sum_{i=1}^{N} \alpha_t(i) a_{ij} \right) P_j(d \mid j) \, b_j(O_{t+1}^{t+d}) \, \beta_{t+d}(j) \right\} d}{\sum_{d=1}^{D} \sum_{t=1}^{T-d} \left( \sum_{i=1}^{N} \alpha_t(i) a_{ij} \right) P_j(d \mid j) \, b_j(O_{t+1}^{t+d}) \, \beta_{t+d}(j)} \quad (4.28)

\sigma^2(j) = \frac{\sum_{d=1}^{D} \sum_{t=1}^{T-d} \left\{ \left( \sum_{i=1}^{N} \alpha_t(i) a_{ij} \right) P_j(d \mid j) \, b_j(O_{t+1}^{t+d}) \, \beta_{t+d}(j) \right\} d^2}{\sum_{d=1}^{D} \sum_{t=1}^{T-d} \left( \sum_{i=1}^{N} \alpha_t(i) a_{ij} \right) P_j(d \mid j) \, b_j(O_{t+1}^{t+d}) \, \beta_{t+d}(j)} - \mu^2(j) \quad (4.29)
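A compact way to read Eqns. 4.28–4.29 is as a posterior-weighted mean and variance of the state durations. The following minimal Python sketch computes the Gaussian duration parameters once the expected duration counts (the bracketed terms above) have been accumulated; the weight array is an assumed input, not part of the thesis code.

```python
import numpy as np

def gaussian_duration_params(weights):
    """Estimate Gaussian duration parameters per state (Eqns. 4.28-4.29).

    weights : (N, D) array, weights[j, d-1] = expected number of times
              state j was occupied for exactly d consecutive intervals.
    Returns (mu, sigma2), each of shape (N,).
    """
    N, D = weights.shape
    d = np.arange(1, D + 1)                                  # possible durations 1..D
    total = weights.sum(axis=1)                              # normalizing denominator
    mu = (weights * d).sum(axis=1) / total                   # Eqn. 4.28
    sigma2 = (weights * d**2).sum(axis=1) / total - mu**2    # Eqn. 4.29
    return mu, sigma2
```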

4.3.4 The Upper-Layer

Since the lower-layer does not model the relationships between the streaming jobs, the outputs from the lower layer, which are expected to be well-trained, are used as the new observation sequence O' for the upper-layer. Therefore, devising the right mechanism for determining how the outputs from the lower-layer will be used as input features into the upper-layer becomes one of the issues. We hereby discuss our approach to this problem. Suppose G MD-HMMs or MD-HSMMs are trained at the lower-layer, with g = {1, .., G}. To transform the lower-layer MD-HMMs/MD-HSMMs into a single HMM/HSMM at the upper layer, the likelihood of observing the gth MD-HMM/MD-HSMM in state i is used in the recursion formula of the upper-layer HMM/HSMM to pass the probability distribution up. More precisely, the predicted states from the trained model λ* of the lower-layer MD-HMM/MD-HSMM become the input observations for the upper-layer HMM/HSMM.

We define the observation sequence from the lower-layer as O' = O'_{1:T}, where T is the length of the observable sequence. We also define the probability ρ(i, t) = α(i, t) of being in state i at time t for each model, where α(i, t) is the forward variable of the Baum-Welch algorithm [184] for the observed sequence O'_{1:T}, with α(i, t) ≜ P(O'_{1:T}, q_t = i). We normalize the probability ρ(i, t) over all states of all the models, i.e.,

\sum_{j=1}^{N_s} P(q_t = j) = 1 \quad (4.30)

where N_s is the number of all states over all models. Then, the probability P(q_t = i | O'_{1:T}) of state i given a sequence O'_{1:T} is:

P(q_t = i \mid O'_{1:T}) = \frac{P(q_t = i, O'_{1:T})}{P(O'_{1:T})} \quad (4.31)

                        = \frac{P(q_t = i, O'_{1:T})}{\sum_{j=1}^{N_s} P(q_t = j, O'_{1:T})} \quad (4.32)

                        = \frac{\rho(i, t)}{\sum_{j=1}^{N_s} \rho(j, t)} \quad (4.33)

where i is the state in the model, which is a subset of the states of all models, and Ns is the total number of states.

The observations at the upper-layer are the probability distributions {P(1 : G)_1, .., P(1 : G)_T} for each of the lower-layer models λ_g over the time interval, i.e., a vector of G reals for each time interval, which will be used as input to the upper-layer. The upper-layer HMM/HSMM is then used to compute the probability of a certain state as a function of time given the observation sequence produced by the lower-layer MD-HMMs/MD-HSMMs. At the upper-layer, a single HMM/HSMM can then be used to model the resource usage of the group of applications running on the cluster, where each state in the HMM/HSMM corresponds to a resource scaling action. For instance, the HMM for the upper-layer is formally defined as λ' = (N', M', A', B', π'), where N' describes the resource scaling actions (which can be scaling up, down or no scaling), M' is the set of observation symbols for the resource demand states learned from the lower-layer observations, π' is the initial state distribution, A' is the state transition probability distribution matrix and B' is the observable probability distribution matrix. With this new model, training can be done using the same learning and inference mechanisms used for the HMM. In the next section, we implement our proposed models, making use of the modified variables and parameters.
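As an illustration of how the lower-layer outputs feed the upper layer (Eqns. 4.30–4.33), the sketch below normalizes the per-model forward probabilities into one observation vector per time step; it is a minimal Python illustration under assumed array shapes, not the thesis implementation.

```python
import numpy as np

def upper_layer_observations(rho_per_model):
    """Build upper-layer observation vectors from lower-layer state probabilities.

    rho_per_model : list of G arrays, each of shape (T, N_g), where
                    rho_per_model[g][t, i] = rho(i, t) for lower-layer model g.
    Returns an array of shape (T, N_s) whose rows sum to 1 (Eqns. 4.31-4.33);
    each row is the observation vector passed to the upper-layer HMM/HSMM.
    """
    # Concatenate the states of all G models side by side: shape (T, N_s)
    rho_all = np.concatenate(rho_per_model, axis=1)
    # Normalize over all states of all models (Eqn. 4.30)
    return rho_all / rho_all.sum(axis=1, keepdims=True)
```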

4.4 Experiments

The focus of our experiments is to validate the LMD-HMM and LMD-HSMM models. We first describe the experimental setup, including the environment and datasets. To verify the flexibility and general applicability of the proposed models in predicting the resource usage of big data streaming applications, the framework was tested on applications developed to access the Twitter API, due to the lack of big data streaming benchmarks and data repositories that fit our problem. Apache Spark was chosen as the stream processing engine due to its flexibility of deployment over available machines.

4.4.1 Experimental Setup

The Spark cluster was set up on a physical machine with an 8-core Intel i7-3770 CPU @ 3.40 GHz and 16GB of RAM. Two VM instance types were considered, with configurations of 4GB and 8GB RAM and 2 VCPUs each. Each node was installed with Hadoop 2.6, Spark 1.6 and Scala 2.11 along with JDK/JRE v1.8. One node was configured as the master while the other nodes were configured as slaves. The master node was equipped with a 195GB SSD drive and the slave nodes with 50GB SSD drives each. Hadoop was configured to make use of HDFS for providing storage for the streaming data, and Apache Spark was configured to run on top of Hadoop since it does not provide an independent storage system. The streaming data was collected using a distributed system for collecting and aggregating data and transporting the generated events. At each interval, an amount of data is downloaded and persisted in HDFS for immediate processing by the Spark engine. Two streaming application programs (Tophashtags and Sentiment Analysis) were used in evaluating the prediction model. The developed applications were launched at different time intervals on the master node and monitored.

Figure 4.2: Big Data Streaming Applications

Fig. 4.2 shows a snapshot of application submission to the Spark master VM. The workload arrival rates, number of streams and a mix of system performance metrics (CPU, memory) for each streaming application were collected and stored for further analysis. A snapshot of the multi-dimensional resource usage (CPU utilization & memory) for one of the applications, recorded at 1-min intervals, is presented in Fig. 4.3.

Figure 4.3: Multi-dimensional (CPU & Memory) Observation for a stream

4.4.2 LMD-HMM Implementation

To implement the LMD-HMM model, our goal is to first evaluate the lower-layer MD-HMMs with respect to each streaming job. The collected datasets were partitioned into two sets: a training set and a prediction set. After the models were trained using the training set, we tested them against the prediction sets to determine their accuracy. The results from the MD-HMMs were then used as input features to the upper-layer HMM to make the final resource scaling decision. For training and testing, we make use of RStudio, a development environment for the R statistical language.

4.4.2.1 Training the MD-HMMs

We employ our modified Baum-Welch expectation-maximization algorithm to fit the model when the dataset has not yet been trained. Once the prediction model is trained, the Viterbi algorithm can be applied to determine the hidden states. The objective of the training algorithm is to optimize the MD-HMM parameters such that the overall training sequence likelihood is maximized, or until convergence is reached. We observed the resource usage of the streaming applications for periods ranging from a few minutes up to 60hrs. The optimal number of states for the MD-HMMs is determined using the K-means clustering algorithm to find the initial parameters of the distributions belonging to each state. It is worth noting that our choice to employ the K-means algorithm was due to its simplicity; however, any clustering algorithm can be used.

Figure 4.4: K-Means using CPU & Memory Values

The K-means algorithm grouped the multi-dimensional data into k clusters during the clustering process using the data points as shown in Fig. 4.4, where each colored cluster represents a resource usage state. From the elbow graph in Fig. 4.5, it can be seen that the location of the bend in the graph is between 2 and 4 points, which is an indicator of the appropriate number of clusters. Considering the within-cluster sum of squares (SS), we selected 3 states from the result of the clusters as determined by the K-means algorithm. Each cluster contains a pair of values representing the mean and the data points that belong to the cluster.

Figure 4.5: K-Means Optimal Number of Clusters
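For reference, the elbow-style selection of the number of resource usage states can be sketched as follows; this is a generic scikit-learn illustration (the usage matrix is placeholder data), not the exact R procedure used in the thesis.

```python
import numpy as np
from sklearn.cluster import KMeans

# usage: (n_samples, 2) matrix of [CPU %, memory] observations (placeholder data)
usage = np.random.rand(500, 2) * [100, 8e8]

# Compute the within-cluster sum of squares (inertia) for candidate k values
wss = []
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(usage)
    wss.append(km.inertia_)

# The "elbow" (where the decrease in WSS flattens) suggests the number of
# hidden states; the cluster centroids then seed the MD-HMM emission means.
print(list(zip(range(1, 8), wss)))
```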

We use the k cluster value derived from the K-means clustering as input to the training algorithm and set the initial parameters as shown in Tbl. 4.2. Since training starts from a randomly initialized MD-HMM, the algorithm is run several times with different initializations and the best model is kept for prediction.

• We set the initial probability distribution as π = [0 0 1].

• In our 3-state model, there are nine transition probabilities in A.

• For the observation probabilities, we use a Gaussian mixture distribution. We define the mean µ and variance σ² for each dimension of the model.

Table 4.2: Probabilities Matrix

Transition states (A):
  0.858  0.142  0.000
  0.053  0.926  0.021
  0.013  0.019  0.968

Observation mean (µ1) - CPU (%):            16.877  26.351  38.981
Observation mean (µ2) - memory (b) × e+08:   6.795   6.889   6.986
Variance (σ1²) - CPU (%):                   18.883  76.428  214.096
Variance (σ2²) - memory (b) × e+08:          9.657   8.961   8.716

With the help of the defined variables and the re-estimation formulae, we then find the best MD-HMM model λ*. The modified Baum-Welch algorithm is used to train the model to produce the state probabilities, transition state probabilities, and observation means and variances. The Baum-Welch algorithm estimates the model such that the log-likelihood of generating the observation sequence O is maximized. The model is trained by varying the initial parameters (λ) over multiple runs to obtain the updated parameters (λ̄). Fig. 4.6 shows that the log-likelihood achieves its maximum at about 3 iterations. Determining the best model is a model selection problem tackled to prevent overfitting of the data. Since there is no simple, theoretically correct way of making model choices, techniques like the Bayesian Information Criterion (BIC) can be used for determining the best model. The BIC uses the likelihood as a measure of fit and penalizes complexity (an increased number of states) as a means of mitigating the risk of overfitting. Also, the BIC takes the number of parameters into account when determining the best model. The graph of BIC values obtained with different choices of k states will show an increase with k to a certain point before reaching a plateau. The smallest k for which the BIC no longer increases signals a better model. To select the best model, we trained a series of candidate MD-HMMs with an increasing number of states (k).
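A minimal sketch of this selection criterion is shown below; the parameter-count formula and the sign convention (higher score is better, so the curve rises and then plateaus with k) are common conventions for Gaussian-emission HMMs and are stated here as assumptions, not as the exact computation used in the thesis.

```python
import numpy as np

def bic_score(log_likelihood, n_states, n_dims, n_obs):
    """BIC-style score (higher is better) for a Gaussian-emission MD-HMM.

    Assumed free-parameter count: initial probabilities (N-1), transitions
    N*(N-1), and one mean plus one variance per state and dimension.
    """
    n_params = (n_states - 1) + n_states * (n_states - 1) + 2 * n_states * n_dims
    return log_likelihood - 0.5 * n_params * np.log(n_obs)

# Example: the score typically rises with k and then plateaus; the smallest k
# beyond which it no longer improves is kept (placeholder log-likelihoods).
loglik_per_k = {2: -1520.4, 3: -1431.9, 4: -1428.7}
scores = {k: bic_score(ll, k, n_dims=2, n_obs=1440) for k, ll in loglik_per_k.items()}
print(scores)
```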

4.4.2.2 Predicting the MD-HMMs

This step involves computing the current state of the system and using it to predict the future state. Here, we try to generate the sequence of states responsible for producing the observation sequence. After training the MD-HMMs for each streaming job, the Viterbi algorithm [153] is used to compute the most likely sequence of hidden states that could have generated our observations. We use as inputs the values (initial, transition and observation probabilities) from the defined MD-HMM model and obtain the list of states as output. In order to determine the effectiveness of our approach in predicting workloads of varying sizes, we varied the training set period and the prediction period as shown in Tbl. 4.3 to determine the effect that changes in training and prediction periods have on the accuracy of the results. The MAPE and RMSE values indicate the level of correlation between predicted and referenced durations for the test data. As shown in the table, the accuracy of the results decreases as the prediction period increases, which indicates greater accuracy for shorter prediction intervals. Therefore, the MD-HMM model is ideal for accurately predicting the resource usage patterns of time-bounded streaming applications with shorter durations of up to 4hrs.

Figure 4.6: Log-likelihood of the MD-HMM Model

4.4.2.3 Validating the MD-HMMs

The accuracy of our model is determined by how well the model performs on new datasets that were not used when fitting the model. We evaluated the prediction accuracy of the MD-HMM models based on two performance indexes: the Mean Absolute Percentage Error (MAPE), a commonly used scale-independent measure, and the Root Mean Square Error (RMSE), a scale-dependent measure, to determine the % error in our model. MAPE is defined as the sum of the absolute percentage deviations divided by the total number of predictions. The calculation is written as:

MAPE = \frac{1}{n} \left( \sum \frac{|Actual - Forecast|}{|Actual|} \right) * 100 \quad (4.34)

This was calculated for each forecast interval in time and divided by the number of intervals.

The RMSE is calculated as:

RMSE = \sqrt{\sum_{t=1}^{n} \frac{(Actual - Forecast)^2}{n}} * 100 \quad (4.35)
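For completeness, the two error measures can be computed directly from the actual and predicted series; the short Python sketch below follows Eqns. 4.34–4.35 (the ×100 scaling on the RMSE mirrors the percentage reporting used here).

```python
import numpy as np

def mape(actual, forecast):
    """Mean Absolute Percentage Error (Eqn. 4.34)."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return np.mean(np.abs(actual - forecast) / np.abs(actual)) * 100

def rmse(actual, forecast):
    """Root Mean Square Error, reported as a percentage (Eqn. 4.35)."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return np.sqrt(np.mean((actual - forecast) ** 2)) * 100

# Example with scaling states (1 = down, 2 = no scaling, 3 = up)
actual   = [2, 2, 3, 1, 2, 3]
forecast = [2, 3, 3, 1, 2, 2]
print(mape(actual, forecast), rmse(actual, forecast))
```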

Table 4.3: Prediction Set Size Variation & Accuracy of the MD-HMM

Experiment Period   Prediction Period   RMSE (%)   MAPE (%)
24hrs               4hrs                88.7       91.2
24hrs               6hrs                88.2       89.0
24hrs               8hrs                86.7       86.0
48hrs               8hrs                88.6       89.0
48hrs               12hrs               86.3       86.5

4.4.2.4 Upper-layer HMM

The upper-layer HMM is trained on the outputs from the lower-layer MD-HMMs, using the normalized probabilities of the states of the MD-HMMs as input. We compute the A' and B' matrices, which are also trained using the Baum-Welch algorithm [184], and the most probable states are determined using the Viterbi algorithm [153]. The graph in Fig. 4.7 shows a prediction scenario where the x-axis indicates 1-min intervals over a 6hr prediction period, depicting the accuracy of the prediction system for a highly variable workload, and the y-axis shows the resource scaling states (i.e., 1-down, 2-no scaling, and 3-up).

Figure 4.7: Actual vs. Predicted States

4.4.3 LMD-HSMM Implementation

In big data streaming applications, measurements of real-time streaming workloads often indicate that a significant amount of workload variability is present over time. This variability is reflected in the off-peak and peak periods that often characterize workload arrival. The exhibited temporal patterns can have a significant impact on the resources required by streaming applications and, hence, demand that resources be scaled up or down accordingly. A major advantage of using the HSMM-based model is its capability of making predictions for longer periods. The conditional independence in the HSMM-based model is ensured when the process moves from one distinct state to another. In this section, we use the LMD-HSMM to evaluate the resource usage states of big data streaming applications that run for a longer period of time.

4.4.3.1 Training the MD-HSMM

In this model, we also classify the resource prediction process into 3 states. We ensure the initial value of D is sufficiently large to cover the maximum duration of any state while being small enough to allow efficient computation; we use D = 60hrs. The order of the transitions A from each state signifies the change in time, and the duration of a state D represents the time spent in each state. The observation sequence of each state throughout its duration represents the constraints being modeled, e.g., CPU, memory. The sample variables used as inputs to the MD-HSMM are as listed in Tbl. 4.4. We set the initial values for the initial state probabilities π = (0.33, 0.33, 0.34), the state transition probabilities a_ij, the state duration distributions P_i(d) and the observation distributions b_i(O^r). A Gaussian mixture

model is used to describe the observations, where (µ1, µ2) indicate the mean values of the observation dimensions and (σ1², σ2²) represent the variances.

Table 4.4: Probabilities Matrix

Transition states (A):
  0.00  0.50  0.50
  0.50  0.00  0.50
  0.47  0.53  0.00

Observation mean (µ1) - CPU (%):            16.607  25.672  38.355
Observation mean (µ2) - memory (b) × e+08:   6.728   6.806   6.892
Variance (σ1²) - CPU (%):                   19.228  64.884  158.598
Variance (σ2²) - memory (b) × e+08:          9.709   9.269   8.793
Duration shape:                              0.732   1.663   1.139
Duration scale:                              6.826   3.693   5.709

Given these initial states for the MD-HSMM, we then use the modified forward-backward algorithm to re-estimate the model parameters and obtain the results. The MD-HSMM training behaviour is shown in Fig. 4.8, indicating the convergence of the parameters in the MD-HSMM. Working with the log-likelihood makes it more convenient to monitor the value returned by the likelihood function. The x-axis shows the training steps and the y-axis represents the likelihood probability of the different states. The progression of the states reaches the set error in less than 3 steps, which indicates that an appropriate MD-HSMM model is generated within about 3 iterations.

4.4.3.2 Predicting and Validating the MD-HSMM

Using the observations and the MD-HSMM model, we executed the Viterbi algorithm to determine the states that most likely produced the observations. After obtaining the parameters of the model, we used a simulation approach to produce our own sequence of observations. Fig. 4.9 is a snapshot of the CPU usage and memory data. The curve shows the values from the observation distribution, while the horizontal bar indicates the different states.

Figure 4.8: Log-likelihood of the MD-HSMM model

Figure 4.9: Simulated data and states for resource usage observations

We then compute the means of the model and the prediction data to validate the model. Tbl. 4.5 presents the results of the training and the accuracy of prediction.

Table 4.5: Prediction Period Variation & Accuracy of the MD-HSMM

Experiment Period   Prediction Period   RMSE (%)   MAPE (%)
24hrs               4hrs                91.2       91.5
24hrs               6hrs                89.6       90.5
24hrs               8hrs                90.8       92.1
48hrs               8hrs                91.6       91.7
48hrs               12hrs               91.1       91.4

Fig. 4.10 shows the predictive accuracy of the model over a 6hr period. Fig. 4.11 depicts the accuracy of the MD-HSMM in predicting the resource usage states of streaming applications, highlighting its consistency in retaining its predictive properties for a longer period of time.

Figure 4.10: Actual vs. Predicted States

Figure 4.11: Predictive Accuracy of the MD-HSMM

4.5 Summary

In this chapter, we investigated the challenges of scaling resources for big data streaming applications. Based on our findings, we presented layered multi-dimensional HMM (LMD-HMM) and layered multi-dimensional HSMM (LMD-HSMM) approaches for resource usage prediction of time-bounded and unbounded big data streaming applications, respectively. Our proposed methodology uses multi-dimensional HMMs/HSMMs at the lower-layer, which are well-trained, and their states are concatenated at the upper-layer for resource scaling. Experimental evaluation results showed that the LMD-HMM has good potential for predicting the resource usage of time-bounded big data streaming applications. Thus, as long as the lower-layer produces consistent models during training and testing, the LMD-HMM has an advantage over the single-layer HMM since it is able to perform well under varying streaming environments. We also proposed the LMD-HSMM framework for predicting the resource usage of streaming applications by explicitly modeling the duration of states. A modified forward-backward algorithm is used to estimate the parameters of the MD-HSMM, in which new variables are defined and the corresponding re-estimation formulae are derived based on the new variables. By incorporating explicit temporal structure into the framework, we are able to predict the resource usage of unbounded big data streaming applications. Evaluation of our proposed frameworks is carried out using developed big data streaming applications running within the Spark streaming environment. Our experimental results show that the HSMM-based framework has a much better performance than the HMM-based model when applied to unbounded applications since it explicitly models the duration of states. In the next chapter, we will address the resource allocation and pricing challenge for containerized BDSAs running on a heterogeneous cluster of hosts. We leverage game theory to ensure fairness among big data streaming applications when allocating resources by balancing the workload across the clusters.

Chapter 5

CRAM-P: a Container Resource Allocation Mechanism with Dynamic Pricing for BDSAs

Containerization provides a lightweight alternative to the use of virtual machines (VMs), potentially reducing service cost and improving cloud resource utilization. This chapter addresses the resource allocation problem for CSBs managing multiple competing containerized big data streaming applications (BDSAs) with varying quality of service (QoS) demands running on a heterogeneous cluster of hosts. Also, due to the BDSAs' demand uncertainty, another fundamental problem for the CSB is how to incorporate a pricing mechanism such that the overall long-term profit is maximized. We hereby propose a container resource allocation mechanism with dynamic pricing (CRAM-P) to address the resource allocation challenge with the objectives of maximizing: i) the payoffs of the containerized BDSAs while satisfying their QoS, and ii) the CSB's surplus. A summary of the contributions in this chapter has been published in [37] and submitted for publication in [38].


5.1 Introduction

Big data streaming applications (BDSAs) that process high volumes of data streams in real-time are pushing the limits of traditional data processing infrastructures. These applications are widely used in several domains (e.g., IoT and healthcare) to analyze data from various sources (e.g., sensors, social media). Due to the challenging requirements of BDSAs, cloud infrastructures are increasingly leveraged for processing big data streams to improve performance and reduce costs [185]. In addition, the increased adoption of the cloud is fueled by its unique promise of elasticity, flexibility, and pay-as-you-go pricing, among others [58]. The cloud platform provides the opportunity to rent distributed resources, typically virtual machines (VMs) or containers, located at geographically dispersed datacenters (DCs). Through hardware virtualization, BDSAs can be isolated to obtain operational flexibility [186,187]. Unfortunately, this approach incurs huge resource overhead and poor performance [84] due to dedicating system resources to running the VM hosting environment. A simple alternative introduced recently for deploying and managing BDSAs is containerization (i.e., operating system (OS) virtualization). Typically, containerization allows BDSAs to share the underlying resources of the host as well as build an exact image of the environment needed to run applications. In addition, the use of containers makes the BDSA deployment process easier, lowers time and cost, enhances continuous integration of software components, and increases application scalability [83,84]. However, co-locating containerized BDSAs on shared computing resources can lead to resource contention, and hence, slower execution times and increased service costs [139]. Since containers running on a shared cluster of hosts do not have resource constraints by default [142], this can introduce non-cooperative behaviours among containerized BDSAs, i.e., selfish containers can monopolize the resources, thereby negatively impacting the performance of others. Generally, Cgroups, a Linux kernel feature that limits system resources, can be used to prevent application interference in such multi-tenanted environments [188]. Unfortunately, Cgroups fail in resource-pressured environments as BDSAs compete for free OS resources. One critical problem in such a heterogeneous environment that needs to be addressed is how to allocate resources efficiently among competing containerized streaming applications to reduce delay and cost. Resource allocation for stream processing applications is complicated by a number of factors: i) they are typically long-running jobs, spanning days/weeks, ii) they can be faced with spikes due to unpredictable and fluctuating load arrival patterns, iii) they may be of a specific resource-intensive type, e.g., CPU, iv) many run under heterogeneous host conditions, and v) trade-offs can exist among competing objectives such as response time and resource cost. To address these challenges, users often rely on container parallelism tuning to alleviate delays. Since this approach is only suitable in a resource-sufficient environment, newer approaches are required to meet the challenges of BDSAs in resource-constrained environments. In the cloud market, there are three major players involved—cloud service providers (CSPs), cloud service brokers (CSBs) and cloud service consumers (CSCs)—and pricing is a major and complex decision area in this market.
Naturally, container resources are priced differently due to the different resource costs obtained by the CSB from the CSPs. Given the CSB's limited resources, it is crucial to determine how the container resources should be utilized while maintaining the QoS of the BDSAs. Congestion can occur when BDSAs act independently and selfishly to maximize their level of satisfaction with no consideration for others. Therefore, the CSB requires a mechanism that provides incentives for BDSAs to behave in ways that improve the overall resource utilization and performance. Under uncertain demand circumstances, the key questions with significant economic implications are: What strategy can each BDSA use to select the container-cluster for processing in order to maximize its utility? Given each BDSA's strategy, how can the CSB dynamically modify the prices of the container resources to maximize the overall profit? We address these questions by jointly investigating the optimal resource allocation and pricing policy for a CSB facing uncertain BDSA demands. Although a large body of research [144–146] exists on solutions for dynamically managing resource allocation in the cloud, these approaches are addressed from the CSP's viewpoint and focus overwhelmingly on VM allocation challenges. In this chapter, we propose CRAM-P, a mechanism that can adapt to stream processing conditions for optimal resource allocation with pricing. In particular, we focus on a competitive setting involving containerized BDSAs utilizing a heterogeneous cluster of hosts and consider maximizing their payoffs and the overall CSB's surplus. The proposed model allows the BDSAs to efficiently utilize the cloud resources procured from the CSBs. To this end, our main contributions are as follows:

• We tackle the container resource allocation problem and present a game-theoretical approach to address the aforementioned challenges; a resource allocation strategy and an algorithm that can adapt to stream processing conditions are proposed.

• We formulate the resource allocation problem using queueing theory [189], which takes each streaming application's arrival rate, and the waiting times and service rates of each container-cluster host, into account to guarantee optimality as well as maximize the payoffs of the streaming applications.

• We also include a pricing mechanism in our model to adjust the BDSAs’ demands for efficient resource utilization. The proposed mechanism makes use of the workload volume price to control the requests of the BDSAs in order to maximize the CSB’s surplus.

From theoretical analysis, we obtain the optimal Nash Equilibrium state where no player (i.e., BDSA) can further improve its performance without impairing others. Experimental results demonstrate the effectiveness of our approach – which attempts to equally satisfy each containerized streaming application’s request as compared to existing techniques that may treat some applications unfairly. The rest of the chapter is organized as follows: Section 5.2 presents existing work on resource allocation for stream processing applications. Section 5.3 introduces the system model, a formalization of the problem and the CRAM-P mechanism as a non-cooperative game along with the pricing method. Finally, a validation of our proposed solution is considered in Section 5.4 followed by conclusions and future work in Section 5.5.

5.2 Related Work

Recently, researchers [144,190] have been dealing with the issue of cloud resource allocation in several manners, using either static or dynamic approaches. The static allocation approach assigns fixed resources; therefore, users need to be aware of the resource requirements of each application. This can lead to resource starvation in streaming applications as streaming workloads change. Studies on dynamic approaches involve applying SLA-based, utility-based, market-based, or priority-based custom policies [145,146,191–193]. Most of these approaches are focused on hypervisor-based VM deployment on physical machines, whereby the VMM allocates resources according to the observed workload or changes in the environment. Only a few studies have been carried out on how to allocate resources in container-based deployments. Moreover, they mostly make use of reactive approaches. Popular strategies for deploying containers on hosts, e.g., in Docker Swarm, use the bin-packing, spread or round-robin strategies [194]. These strategies attempt to achieve resource fairness but do not take into consideration the multiple objectives of competing containerized streaming applications.

There is a body of research on stream processing which mainly focuses on operator-based systems. The authors in [181] propose a resource scheduler which dynamically allocates processors to operators to ensure real-time response. They model the relationship between resources and response time and use optimization models to obtain a processor allocation strategy. The authors in [195] proposed an adaptive approach for provisioning VMs for stream processing systems based on input rates, but latency guarantees are ignored. In addition, the authors in [32] introduced ChronoStream, which provides vertical and horizontal elasticity. The authors in [31] proposed an elastic batched stream processing system based on Spark Streaming, which transparently adjusts available resources to handle workload changes and uneven distribution in container clouds. However, there is little research on resource management for containerized BDSAs. We review some efforts that have been made in this area. In the literature, only a few works specifically deal with the resource allocation problem for big data streaming applications running on containers. Nardelli et al. [194] presented a formulation to address the elastic provisioning of virtual machines for container deployment (EVCD) as an Integer Linear Programming problem, which takes explicitly into account the heterogeneity of container requirements and virtual machine resources. EVCD determines the container deployment on virtual machines while optimizing Quality of Service (QoS) metrics. We address the resource allocation problem in a non-cooperative environment with the goal of enforcing cooperation among containers. Piraghaj et al. [196] investigated the problem of inefficient utilization of data center infrastructure resulting from over-estimation of required resources at the VM level. They proposed the container as a service (CaaS) cloud service model and presented a technique for finding efficient virtual machine sizes for hosting containers considering their actual resource usage instead of users' estimated amounts. Given the lack of intelligent framework resource scheduling in the cloud for managing resources, Awada and Barker [197] proposed a cloud-based Container Management System (CMS) framework for deployment, scalability and resource efficiency in the cloud. The CMS provides an approach for optimizing containerized applications in multiple cloud-region container-instance clusters, taking into consideration the heterogeneous requirements of the containerized applications. This work does not address multiple QoS objectives of big data streaming applications in a competitive environment. Game theory has specifically been explored in the past for resource management and load distribution in the cloud. In order to maximize cloud provider profit while satisfying client requests, Srinivasa et al. [156] proposed a Min-Max game approach. They utilized a new factor in the game called the utility factor, which considers the time and budget constraints of every user, and resources are provided to the tasks having the highest utility for the corresponding resource. Mao et al. [157] proposed a congestion game whereby services are considered as players with the strategy of selecting a subset of resources to maximize their payoffs, which is the sum of the payoffs received from each selected resource. However, none of these approaches were proposed to address containerized big data streaming applications.
We consider the container resource allocation problem among competing big data streaming applications, which we model as a Nash Equilibrium problem.

5.3 System Model

In this section, we describe the resource allocation problem and present a utility function for the containerized BDSAs with consideration for their QoS. Then, we give a detailed description of the pricing model for profit maximization and efficient resource utilization.

5.3.1 Problem Statement

We study the container resource allocation and pricing problem in which a set I = {1, 2, .., i, .., I} of heterogeneous containerized streaming application agents controlled by a CSB seek to optimize their workload allocation decisions over K container-cluster hosts, k = 1, ..., K, as a non-cooperative game. In our scheme, as shown in Fig. 5.1, we associate a waiting time goal and a service cost with each streaming application. Due to the delay-sensitive characteristics of the streaming applications, the waiting time is commonly used as a critical quality of service (QoS) metric to measure the delay before processing each application's request. We assume every streaming application can access multiple container-cluster hosts in order to enhance its performance. The amount of tuple sequences arriving from the event sources for each application's agent indicates the workload of the streaming application. Suppose the workload arrival for each application agent from the event sources follows a Poisson process. Each streaming application agent sends its information to the CSB, which decides the fraction of workload to be sent to each container-cluster host. Let γ_i be the average arrival of workload to the CSB for application agent i. Each container-cluster host is modeled as an M/M/c_k queueing system (i.e., Poisson arrivals and exponentially distributed service times), where c_k is the capacity of each container-cluster host. We define 1/µ_k as the average service time at each container-cluster host k.

Figure 5.1: Control Structure for CRAM-P.

Given the Poisson input assumptions, this decision can be expressed as a vector x̃_i = {x_i1, x_i2, ..., x_iK}, where x_ik is the fraction of the workload arriving from application agent i to the CSB that is sent to container-cluster host k, and constitutes the CSB's resource allocation decision for application agent i. Then, let the set of actions that each application agent i can take be specified as H_i:

H_i = \left\{ \tilde{x}_i \ge 0; \; \sum_{k=1}^{K} x_{ik} = \gamma_i \right\} \quad (5.1)

Hence, the average arrival of workload to container-cluster host k, denoted by λ_k, is given by λ_k = ∑_{i=1}^{I} x_ik. Each application agent aims to minimize the average waiting time and the service cost when utilizing the container-cluster hosts. The expected workload from each application agent i sent to the CSB and serviced at each container-cluster host k is then given by ∑_{k=1}^{K} x_ik/µ_k. Also, the average waiting time for each application agent i at container-cluster host k is thus (1/γ_i) ∑_{k=1}^{K} x_ik w_k(λ_k), where w_k(λ_k), a function of λ_k, is the expected waiting time for the application at container-cluster host k.
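Under the M/M/c_k assumption stated above, one common closed form for the expected waiting time w_k(λ_k) is the Erlang-C delay formula; the sketch below is a standard textbook computation given for illustration only and is not claimed to be the exact expression used in the thesis.

```python
import math

def mmc_waiting_time(lam, mu, c):
    """Expected queueing delay W_q for an M/M/c queue (Erlang-C formula).

    lam : arrival rate offered to the host (lambda_k)
    mu  : service rate of a single container (mu_k)
    c   : number of containers in the cluster host (c_k)
    Requires lam < c * mu for stability.
    """
    rho = lam / (c * mu)
    if rho >= 1:
        raise ValueError("unstable: arrival rate exceeds service capacity")
    a = lam / mu
    # Probability that an arriving request has to wait (Erlang C)
    norm = sum(a**n / math.factorial(n) for n in range(c)) \
           + a**c / (math.factorial(c) * (1 - rho))
    erlang_c = (a**c / (math.factorial(c) * (1 - rho))) / norm
    return erlang_c / (c * mu - lam)

# Example: 4 containers, each serving 10 req/s, offered 30 req/s
print(mmc_waiting_time(lam=30.0, mu=10.0, c=4))
```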

Considering each application agent i, and given the decisions of the other agents x̃_j, where j ≠ i, each agent will seek to find x̃_i such that (α_i/γ_i) ∑_{k=1}^{K} x_ik w_k(λ_k) + ∑_{k=1}^{K} β_k x_ik/µ_k is minimized. Here, α_i and β_k are used to represent the weights of the waiting time and the service cost, respectively; e.g., for a delay-sensitive application, the parameter α_i is always set lower to reflect how delay affects the total objective. The weights allow the model to be adaptable to objective preferences in different streaming applications. We define the strategic form of the I-player container resource allocation game as

G = (I, (X_i)_{i∈I}, (U_i)_{i∈I}), where I = {1, 2, ..., I} represents the set of players (i.e., application agents). X_i is the set of pure strategies representing the workload distribution, where x̃_i represents the strategy of a player. The strategy space X_i is defined by X_i = {x̃_i | ∀i ∈ I, x̃_i ≥ 0}. We denote U_i as the utility obtained by player i, where U_i(x̃_i, X_{-i}) is the utility of player i and the matrix X_{-i} represents all players except i. The workload distribution X_i indicates a possible resource allocation status for each application i based on the demand of the host.

A feasible resource allocation strategy x̃_i must satisfy the following constraints:

• Positivity: x̃_i ≥ 0;

• Conservation: ∑_{k=1}^{K} x_ik = γ_i;

• Stability: λ_k < c_k µ_k for an optimal solution, which follows from ∑_{i=1}^{I} γ_i < ∑_{k=1}^{K} c_k µ_k;

When all the streaming applications satisfy these conditions, resources can be allo- cated based on the optimal strategy, which leads to maximum payoffs for the streaming applications. We hereby adopt the Nash equilibrium concept to achieve optimality.

Definition 1. Nash equilibrium is a balanced state with a strategy profile from which no game player can improve its utility by changing its own workload distribution scheme unilaterally, i.e., for every i ∈ I, U_i(x*_i, X*_{-i}) ≥ U_i(x_i, X*_{-i}) for all x_i ∈ X_i; then X* = [x*_1, x*_2, ..., x*_I] is considered the result of the Nash equilibrium.

In our mechanism, each application agent i is regarded as a player controlled by the CSB, optimizing its performance in a distributed manner by distributing its workload. The decision made for application agent i depends on the decisions of the other agents. Key notations used in this chapter are shown in Table 5.1.

5.3.2 CRAM Design as a Non-Cooperative Game

In this section, we formulate the problem as a container resource allocation game and prove the existence of an equilibrium point to the game. At the Nash equilibrium, the stability

Table 5.1: Notations used in CRAM

Variable   Definition
I          set of streaming applications
K          set of hosts
c_k        capacity of container-cluster host k
w_k        response time of container-cluster host k
α_i        weight given to the waiting time
β_k        weight given to the service rate
x_ik       optimization variable signifying the workload of containerized application agent i assigned to container-cluster host k
U(i)       utility function for each streaming application agent i
γ_i        workload arrival from streaming application agent i
µ_k        service rate at container-cluster host k

condition should be satisfied owing to the fact that the arrival rates do not exceed the service rates. Thus, we only consider the problem with two constraints, i.e., positivity and conservation, which are represented in H_i. Therefore, the optimization problem becomes:

\max_{\tilde{x}_i} \; U_i(\tilde{x}_i, X_{-i}) \quad (5.2)

\text{s.t.} \quad \tilde{x}_i \in H_i \quad (5.3)

U_i(\tilde{x}_i, X_{-i}) = -\left(\frac{\alpha_i}{\gamma_i}\right) \sum_{k=1}^{K} x_{ik} \, w_k(\lambda_k) - \sum_{k=1}^{K} \beta_k \frac{x_{ik}}{\mu_k} \quad (5.4)

                       = -\left(\frac{\alpha_i}{\gamma_i}\right) \sum_{k=1}^{K} x_{ik} \, w_k\!\left(\sum_{j=1}^{I} x_{jk}\right) - \sum_{k=1}^{K} \beta_k \frac{x_{ik}}{\mu_k}

Eqn. 5.4 shows that the utility is a monotonically decreasing function of the waiting time and a monotonically increasing function of the service rate. This completes the model formulation, and the application agents are now in a gaming situation whereby each agent competes for resources to maximize its payoff. Next, we determine whether an operational equilibrium steady state exists. To establish the existence of a Nash equilibrium in the container resource allocation game, we make use of the existence theorem presented by the authors in [198]. We state the existence theorem as follows:

Theorem 1. There exists an equilibrium of the agents' container resource allocation game $G = (\mathcal{I}, (X_i)_{i\in\mathcal{I}}, (U_i)_{i\in\mathcal{I}})$.

Proof. A Nash equilibrium exists when the following conditions are satisfied:

• The streaming application agent $i$'s strategy space $X_i$ is a compact convex set in a topological linear space.

• The streaming application agent $i$'s utility function, $U_i(\tilde{x}_i, X_{-i})$, is concave with respect to its strategy $\tilde{x}_i$.

From Eqn. 5.4,
$$\frac{\partial\big(x_{ik}\, w_k(\lambda_k)\big)}{\partial x_{ik}} = x_{ik}\frac{\partial w_k(\lambda_k)}{\partial \lambda_k} + w_k(\lambda_k) \qquad (5.5)$$

Then,

$$\frac{\partial^2\big(x_{ik}\, w_k(\lambda_k)\big)}{\partial x_{ik}\,\partial x_{il}} = x_{ik}\frac{\partial^2 w_k(\lambda_k)}{\partial \lambda_k^2} + 2\frac{\partial w_k(\lambda_k)}{\partial \lambda_k}, \quad \text{for } k = l \text{ and } k \neq l,$$
which shows that it is positive, so the Hessian matrix of $U_i$ is negative definite. Therefore, $U_i$ is concave in $\tilde{x}_i$. Since the corresponding cost function $-U_i(\tilde{x}_i)$ is convex, there is always a strategy profile $\tilde{x}_i^*$ that minimizes the total cost. As a result, a Nash equilibrium always exists.

We then find the Nash equilibrium through theoretical analysis. Applying the Karush-Kuhn-Tucker (KKT) conditions [198] and introducing the Lagrange multiplier $\xi_i$, we have from Eqn. 5.4

$$L_i = -\frac{\alpha_i}{\gamma_i}\sum_{k=1}^{K} x_{ik}\, w_k(\lambda_k) - \sum_{k=1}^{K}\beta_k\frac{x_{ik}}{\mu_k} + \xi_i \qquad (5.6)$$

$$\frac{\partial L_i}{\partial x_{ik}} = \frac{\alpha_i}{\gamma_i}\left(x_{ik}\frac{\partial w_k(\lambda_k)}{\partial \lambda_k} + w_k(\lambda_k)\right) + \frac{\beta_k}{\mu_k} + \xi_i = 0 \qquad (5.7)$$

This implies that $x_{ik}$ is the optimal solution to the problem if and only if there exists $\xi_i \geq 0$ such that $\partial L_i/\partial x_{ik} = 0$. Let
$$a_{ik} = \frac{\alpha_i}{\gamma_i}\frac{\partial w_k(\lambda_k)}{\partial \lambda_k} \quad \text{and} \quad b_{ik} = \frac{\alpha_i}{\gamma_i}\, w_k(\lambda_k) + \frac{\beta_k}{\mu_k} \qquad (5.8)$$

Simplifying, we have

$$x_{ik} = \frac{-\xi_i - b_{ik}}{a_{ik}} \qquad (5.9)$$

We recall that $\sum_{k=1}^{K} x_{ik} = \gamma_i$; therefore

$$\sum_{k=1}^{K} x_{ik} = \gamma_i = \sum_{k=1}^{K}\left[\frac{-\xi_i - b_{ik}}{a_{ik}}\right] \qquad (5.10)$$

$$\sum_{k=1}^{K}\frac{\xi_i}{a_{ik}} = -\gamma_i - \sum_{k=1}^{K}\frac{b_{ik}}{a_{ik}} \qquad (5.11)$$

Which implies

$$\xi_i = -\frac{\gamma_i + \sum_{k=1}^{K}\frac{b_{ik}}{a_{ik}}}{\sum_{k=1}^{K}\frac{1}{a_{ik}}} \qquad (5.12)$$

Considering all agents and substituting $\xi_i$ from Eqn. 5.12 in Eqn. 5.9, the optimal resource allocation strategy can be derived as follows:

$$x^*_{ik} = \frac{1}{a_{ik}}\left[\frac{\gamma_i + \sum_{k=1}^{K}\frac{b_{ik}}{a_{ik}}}{\sum_{k=1}^{K}\frac{1}{a_{ik}}} - b_{ik}\right] \qquad (5.13)$$

5.3.3 Dynamic Pricing

In this section, we examine the pricing scheme used to coordinate the decisions of the BDSA agents. Given the limited resources of the CSBs, it is important to determine how they will be efficiently utilized while maintaining an acceptable level of QoS. Valuation of the container-clusters by the CSBs (i.e., charging a fee) can provide a decision control mechanism for the BDSA agents. For the CSB, the issue of social optimization arises because the BDSA agents' decisions are based on self-interest and are therefore not socially optimal. Incentive pricing allows the alignment of the objectives of self-interested BDSAs with the goal of the overall system. The question then is: how does the CSB price the container-cluster resources to achieve this goal? To derive an incentive-compatible pricing for the M/M/c_k queue, we use the observed performance as input to the pricing formula to maximize the net value of the CSB's resources.
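Since each container-cluster is modeled as an M/M/c_k queue, the waiting-time quantities used below can, for illustration, be computed with the standard Erlang C formula; the following minimal Python sketch (my own, with illustrative parameters) shows one way to do so.

    import math

    def mm_c_wait(lam, mu, c):
        """Mean queueing delay w_k(lambda) of an M/M/c queue (standard Erlang C formula).
        lam -- effective arrival rate at the container-cluster
        mu  -- per-container service rate;  c -- number of containers
        Requires lam < c * mu (the stability condition of Section 5.3.1).
        """
        a, rho = lam / mu, lam / (c * mu)                   # offered load and utilization
        if rho >= 1.0:
            raise ValueError("unstable: lambda must be below c * mu")
        tail = a ** c / (math.factorial(c) * (1.0 - rho))
        p_wait = tail / (sum(a ** n / math.factorial(n) for n in range(c)) + tail)
        return p_wait / (c * mu - lam)                      # expected waiting time in queue

    print(mm_c_wait(lam=6.0, mu=4.0, c=3))                  # e.g. delay at a 3-container cluster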

We recall that the external arrival rate of the BDSA workload is $\gamma_i$, which follows a Poisson process. Let $W_i(\gamma_i)$ be the average waiting time for a single BDSA agent $i$ in the system and $\alpha_i$ its waiting cost; the expected waiting cost faced by the BDSA agent is then $\alpha_i\gamma_i W_i(\gamma_i)$. Let $P_i$ denote the service usage price charged to BDSA agent $i$ by the CSB. Each BDSA agent's expected total cost consists of the cost of waiting and service usage, i.e., $\alpha_i\gamma_i W_i(\gamma_i) + P_i$. The value that each BDSA agent $i$ derives from using the container-cluster, which may differ between agents, should then be equal to the cost (waiting plus price charged) at equilibrium. The value is captured as $V_i$, a cumulative, twice-differentiable and concave function. An agent chooses to use the container-cluster if the value exceeds the expected total cost, i.e., workload will be added to the container-cluster as long as the marginal value $V_i'(\gamma_i)$ exceeds the cost. At equilibrium, the demand relationship is given by:

$$V_i'(\gamma_i) = \alpha_i\gamma_i W_i(\gamma_i) + P_i \qquad (5.14)$$

We also recall that the waiting time wk at the container-cluster k from Section 5.3.2 is a function of the effective arrival rate λk, i.e., wk(λk). This differs from Wi, which is the mean waiting time for a BDSA agent i in the system. They are related through:

$$W_i(\gamma_i, x_i) = \frac{1}{\gamma_i}\sum_{k=1}^{K} x_{ik}\, w_k(\lambda_k) \qquad (5.15)$$
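As a small illustration of Eqn. 5.15 (an assumption-laden sketch, not the thesis' code), the per-agent mean wait is simply the workload-weighted average of the per-host delays:

    import numpy as np

    def agent_mean_wait(i, X, gamma, lam, w):
        """Eqn. 5.15: W_i as the workload-weighted average of the per-host delays w_k(lambda_k)."""
        K = X.shape[1]
        return sum(X[i, k] * w[k](lam[k]) for k in range(K)) / gamma[i]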

$\lambda_k$ is the effective arrival rate at container-cluster $k$, combining both external and internal arrivals of each streaming application. The problem of finding the optimal arrival rates to maximize the net value, given as $\sum_{i=1}^{I}\big[V_i(\gamma_i) - \alpha_i\gamma_i W_i(\gamma_i)\big]$, can be formulated as:

$$\max_{\gamma_i}\; NV(\gamma_i) = \sum_{i=1}^{I}\Big[V_i(\gamma_i) - \alpha_i\gamma_i W_i(\gamma_i)\Big] \qquad (5.16)$$

Differentiating Eqn. 5.16 with respect to the arrival rate γi, this gives the socially optimal arrival rate as:

$$V_i'(\gamma_i) - \alpha_i W_i(\gamma_i) - \sum_{j=1}^{I}\alpha_j\gamma_j\frac{\partial W_j}{\partial \gamma_i} = 0 \qquad (5.17)$$

Eqn. 5.17 shows that the price a BDSA agent has to pay depends on the delay inflicted by the other BDSA agents utilizing the system. The optimal price for each BDSA $i$ can then be set to the cost of waiting caused by the other agents. It then follows from Eqns. 5.14 and 5.17 that the net-value-maximizing price is given by:

$$P_i^* = \sum_{j=1}^{I}\alpha_j\gamma_j^*\frac{\partial W_j(x_i)}{\partial \gamma_i} \qquad (5.18)$$

$P_i^*$ is the cost, per unit of time, inflicted on the system by increasing the workload allocation. Charging this price per BDSA stream achieves the same efficiency as per-container-cluster pricing for each BDSA agent $i$. Therefore, the optimal price $\beta_k$ that maximizes the net value of the CSB can be derived from:

$$\sum_{k=1}^{K}\beta_k\frac{x_{ik}}{\mu_k} = P_i \qquad (5.19)$$

To compute the optimal price Pi, the CSB first needs to solve Eqn. 5.17 to find the optimal allocation γi and then use Eqn. 5.18 to find the optimal price. From the optimal price, the price per container-cluster βk can be determined using Eqn. 5.19.
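As a rough illustration of this procedure, the sketch below estimates the externality prices of Eqn. 5.18 by finite differences, given any callable that returns the mean waiting times $W_j$ for a vector of arrival rates; the shared-pool delay model in the usage example is purely a toy assumption, not the thesis' model.

    import numpy as np

    def externality_prices(gamma, alpha, W, eps=1e-5):
        """Eqn. 5.18: P_i = sum_j alpha_j * gamma_j * dW_j/dgamma_i.
        W(gamma) must return the vector of mean waiting times W_j at the given
        arrival-rate vector; the partial derivatives are estimated numerically.
        """
        gamma = np.asarray(gamma, dtype=float)
        base = np.asarray(W(gamma))
        P = np.zeros(len(gamma))
        for i in range(len(gamma)):
            bumped = gamma.copy()
            bumped[i] += eps
            dW = (np.asarray(W(bumped)) - base) / eps       # dW_j / dgamma_i
            P[i] = np.sum(np.asarray(alpha) * gamma * dW)
        return P

    # Toy delay model: all agents share one pool of total capacity C (an assumption)
    C = 20.0
    W = lambda g: np.full(len(g), 1.0 / (C - np.sum(g)))
    print(externality_prices(gamma=[5.0, 5.0], alpha=[1.0, 10.0], W=W))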

5.3.4 Algorithm Design

Based on the proposed mechanism, we develop an algorithm that periodically adapts to changes in the workload arrival, waiting time and service rates at the container-clusters. We propose the use of a central control mechanism managed by the CSB for implementing the algorithm, as shown in Fig. 5.1. The CSB collects information on the service rates and waiting times from the container-cluster hosts, and the streaming application agents submit their arrival rates to the CSB. The CSB uses the information collected to optimize the workload distribution for proper resource management. The arrival rates, waiting times and service rates are determined by monitoring the streaming applications and container-clusters, respectively. This data is then sent to the CSB to determine the resource allocation strategy. When the CSB receives all the inputs, it obtains the optimal strategy by applying Eqn. 5.4. We established that there is a unique optimal Nash equilibrium for the resource allocation in Section 5.3.2. As outlined in Algorithm 1, Step 6 generates the optimal resource allocation strategy. The mechanism considers all the streaming workflows running on the container-clusters as a whole and decides the amount of input data to allocate to each streaming application. The key idea is to collect statistics periodically and adjust the utility of each streaming application agent i such that the waiting time and service cost are minimized. The algorithm is activated once per monitoring period. The container-cluster is in an equilibrium state when all the streaming applications reach their maximum utility. If not in an equilibrium state, the mechanism adjusts the workload for each streaming application periodically in each iteration.

Algorithm 1 CRAM Algorithm
Input:
  I: number of streaming applications
  µ_k: service rate of each container-cluster
  γ_i: arrival rate of each streaming application
  w_k(λ_k): waiting time at each container-cluster for the arrival
  α_i: weight assigned to the waiting time of each streaming application
  β_k: weight assigned to the service rate at each container-cluster
Output:
  X_i: resource allocation strategy for the streaming applications

1: procedure CRAM
2:   Calculate a_ik = (α_i/γ_i) · ∂w_k(λ_k)/∂λ_k
3:   Calculate b_ik = (α_i/γ_i) · w_k(λ_k) + β_k/µ_k
4:   for all i = 1, 2, ..., I do
5:     Generate the optimal resource strategy
6:       x*_ik = (1/a_ik) · [ (γ_i + Σ_{k=1..K} b_ik/a_ik) / (Σ_{k=1..K} 1/a_ik) − b_ik ]
7:   return X_i = (x̃_1, x̃_2, ..., x̃_I)
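A hedged Python rendering of Algorithm 1 follows (the thesis implementation itself is in Java); Step 6 is the closed form of Eqn. 5.13, the waiting-time functions are passed in as callables, and the monitoring hooks mentioned in the comment are hypothetical placeholders.

    import numpy as np

    def cram(gamma, alpha, beta, mu, lam, w, dw):
        """Algorithm 1 (CRAM): compute the allocation strategy x*_ik for every
        streaming application from the monitored metrics.
        gamma[i] -- arrival rate of application i;   alpha[i] -- waiting-time weight
        beta[k]  -- service-rate weight of host k;   mu[k]    -- service rate of host k
        lam[k]   -- current effective arrival rate;  w, dw    -- w_k(lambda) and its derivative
        """
        I, K = len(gamma), len(mu)
        X = np.zeros((I, K))
        for i in range(I):
            a = np.array([(alpha[i] / gamma[i]) * dw[k](lam[k]) for k in range(K)])                   # Step 2: a_ik
            b = np.array([(alpha[i] / gamma[i]) * w[k](lam[k]) + beta[k] / mu[k] for k in range(K)])  # Step 3: b_ik
            ratio = (gamma[i] + np.sum(b / a)) / np.sum(1.0 / a)
            X[i] = (ratio - b) / a                                                                    # Step 6: Eqn. 5.13
        return X                                                                                      # Step 7

    # Hypothetical periodic controller run by the CSB:
    # while True:
    #     gamma, lam, w, dw = monitor()                        # metrics from agents and clusters
    #     apply_allocation(cram(gamma, alpha, beta, mu, lam, w, dw))
    #     time.sleep(MONITORING_PERIOD)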

5.4 Performance Evaluation

In this section, we evaluate the performance of the proposed CRAM approach and compare it with the traditional Round-Robin (RR) scheme, the default packing algorithm implemented in Apache Heron for assigning threads to containers. We provide simulation experiments to verify the rationality of our proposed model and examine the possibility of a Nash equilibrium in a two-player game. Our default experimental settings and parameters are given in Table 5.2.

Table 5.2: Container-Cluster Configurations

Parameters           App1                            App2
α                    1                               10
Container-Cluster1   1GB mem/node, 3 CPUs, β_1 = 10
Container-Cluster2   500MB mem/node, 2 CPUs, β_2 = 5

5.4.1 Experimental Setup

The platform used for the experiments was set up on a physical machine with an 8-core Intel i7-8550U CPU @ 1.80GHz and 16GB RAM. The container-clusters were set up to support Apache Heron streaming engine deployment; the Heron deployment makes use of Docker as the containerization format for Heron topologies. Following the common configurations of Apache Heron, we set up the Heron services necessary for streaming application deployment: ZooKeeper, BookKeeper, Heron Tools (UI and Tracker) and the Heron API. Information about a topology is handled by ZooKeeper upon submission of the topology, and BookKeeper handles the topology artifact storage.

5.4.2 Testing Applications

Two benchmark topologies representative of streaming applications were considered to evaluate the mechanism, namely ExclamationTopology and WordCountTopology. The WordCountTopology consists of a WordSpout that generates a random word stream and distributes the words to a ConsumerBolt, which then counts the number of times each distinct word occurs, while the ExclamationTopology consumes tuples of words and appends an exclamation mark to each. These two topologies were used to run the workloads. We implemented our algorithm in Java and varied α and β to find the values at which the Nash equilibrium of the game is achieved. When the streaming applications have the same priority, α_1 = α_2.

5.4.3 Experimental Results

In this experiment, we repeat the scenarios by randomly varying the initial profile of the configuration. We examine the effects of variation in three factors: α, β, and the workload distribution. Each streaming application is deployed to run on the container-clusters, which can be scaled up or down by adding or removing instances. Since each container-cluster has a queue for receiving tuples of data, the arrival rate for each streaming application is the number of tuples that are inserted into the queue and the waiting time is derived from the arrival rate as an estimate of how many milliseconds each tuple sits in the queue before being processed. We set the time slot to a 1 minute interval and run the system for a period of about 20 minutes. We collected the metrics for the waiting and service times for each container-cluster.


Figure 5.2: (a) Streaming App1 with a higher priority (b) Streaming App1 with a lower priority.

Fig. 5.2a and Fig. 5.2b show the throughput of the streaming applications with varying α configurations, which highlights the negotiation between the streaming applications. Fig. 5.2a shows App1 with a higher priority (i.e., α_1 = 1, α_2 = 10) and Fig. 5.2b depicts App1 with a lower priority (i.e., α_1 = 10, α_2 = 1). In comparison, Fig. 5.3 shows the throughput of the streaming applications using the RR mechanism, whereby App2 takes over the container-cluster resources.

Figure 5.3: Throughput of Streaming Applications using the RR Mechanism

Figure 5.4: Delay at each Container-Cluster for CRAM vs. Round-Robin mechanism.

Fig. 5.4 presents the resulting delay improvement of the proposed CRAM in comparison to the popularly implemented RR mechanism. We found that the waiting time can be reduced by approximately 20% in most cases, which indicates that the proposed model reduces the completion time of the streaming applications and can save cost. This demonstrates the benefits of the proposed resource allocation scheme in the case of selfish workloads sharing the container-clusters. From our experimental results, we can draw the following conclusions:

• Since the resources allocated to a streaming application to accomplish a certain task are proportional to the workload at the time, our results show that the proposed strategy is more effective in allocating container resources than the RR mechanism.

• The optimized performance of our mechanism is sensitive to the delay conditions at each container-cluster. Fig. 5.2a and Fig. 5.2b present a more balanced state for the streaming applications as compared to applications using the RR mechanism shown in Fig. 5.3, which gives no consideration to the priority of individual streaming applications. In this case, greedy applications can take over the cluster, leaving little or no resources for the other running applications. These results show that our proposed strategy is more adaptive to delay conditions at the container-cluster than the RR mechanism.

Figure 5.5: Utility Cost of CRAM vs. RR

The utilities of the streaming applications are derived from Eqn. 5.4 with the parameters (e.g., α_1 = 1, α_2 = 10, β_1 = 10, β_2 = 5 and γ_i = 950 tuples/ms, the default tuple arrival configuration for the streaming application benchmark). Fig. 5.5 represents the utility cost of the streaming applications with varying priorities as compared to the RR mechanism. As shown in the figure, the proposed mechanism maximizes the utility (i.e., minimizes the cost) of the streaming applications at the Nash equilibrium. This indicates that the algorithm fulfils the streaming workload requirements, achieving fairness as well as optimality.

5.5 Summary

In this chapter, we presented a non-cooperative game to manage the container resource allocation problem for big data stream processing applications. We proposed a dynamic container resource allocation mechanism (CRAM) for assigning big data streaming application workloads to container-cluster hosts, motivated by the resource contention that can occur among heterogeneous stream processing applications in resource-constrained environments. A game-theoretical approach was applied to find the optimal allocation such that the payoff for each streaming application is maximized. The payoffs are derived from the specified objectives, namely the waiting and service times. For efficient resource utilization, we included a CSB pricing mechanism to compel BDSAs to adjust their workload allocation to the container-clusters. We proved the existence of a unique Nash equilibrium and evaluated the game using two micro-benchmarks to verify its existence in practice. Experimental results show that our mechanism greatly improves the performance of big data streaming applications.

Chapter 6

Conclusion and Future Work

In this dissertation, we present our approaches for addressing the challenges encountered by CSBs managing multiple BDSAs on a heterogeneous cluster of hosts. We focus on the resource prediction and allocation schemes to handle fluctuating BDSA demands, and provide a framework to automate the processes efficiently. In this chapter, we summarize the value of the proposed mechanisms for the CSB's business in Section 6.1, the contributions of this research in Section 6.2, and suggest various future research directions in Section 6.3.

6.1 Business Value of Proposed Framework

As CSBs utilize widely distributed and dynamic multi-cloud infrastructures, there is a need for tools that provide a global view into performance and control of resource capacities. Consequently, efficient cloud resource elasticity management and pricing for a CSB has become a continuous effort that provides more than a myopic view into the resource capacity and performance required to support BDSAs. While several efforts have been made to address the resource elasticity management challenges plaguing CSBs, our research contributions step up efforts in this area to provide efficient resource scaling and allocation mechanisms for CSBs, and hence, save cost, improve resource utilization, ensure fairness, and enhance

performance while achieving QoS for BDSAs. In addition, streaming application failures that can occur due to lack of resources can be avoided. We hereby summarize the business values of utilizing our proposed framework as follows:

• Visibility and Control: The proposed framework will provide a real-time unified view of resources being utilized across multiple cloud platforms, i.e., provide the visibility needed to deploy new resources to address BDSA needs, unexpected workload spikes or even to control costs. By this, CSBs are able to minimize redundancies, optimize resource utilization and gain control of the cloud resource consumption.

• Agility: Agility refers to the ability of the CSB's system to rapidly respond to workload demands. Agility is achieved when manual processes are eliminated from the CSBs' resource management processes. As a focused tool for automating resource prediction and allocation, the proposed framework will help make decisions faster and simplify the deployment of cloud resources.

• Cost Reduction and Greater ROI: Return on Investment (ROI) is the most widely used measure of success in business. It is the proportionate increase in the value of an investment over a period of time. Factors driving ROI include improved productivity (i.e., getting more done with fewer resources), speed (e.g., faster provisioning and deprovisioning), and QoS. Without an effective resource elasticity management framework, CSBs will continue to struggle to maximize their ROI when utilizing cloud resources. When managing cloud costs, the continued expansion of resources has often resulted in higher costs due to inefficiency and waste brought about by poor management. Automated policies tied to identifying resource usage can increase the CSB's ability to pinpoint potential cost savings and take action. Therefore, optimizing resource usage through efficient allocation and prediction using intelligent and highly automated frameworks will help to reduce the overall CSB's resource costs.

• Competitive Advantage: Innovation, a key to the CSB's success, acts as a mediator that enables CSBs to gain and sustain competitive advantage. Most CSBs have limited resources to manage their multitude of streaming applications; therefore, the advantage that they have over their competitors lies in innovating newer frameworks to manage resources for better efficiency. By adopting innovative resource elasticity management mechanisms that can customize the CSPs' resources to the needs of the BDSAs, CSBs are able to maintain competitive advantage. The benefits of gaining competitive advantage can be realized in areas that include making precise business

decisions, and generating repeat business and loyalty. In addition, implementing dynamic frameworks for resource management will take CSBs farther along the learning curve before their competitors catch up.

• Enhanced CSC Experience: Satisfied CSCs are unlikely to give up in frustration or switch to a competitor due to subpar experiences. Ensuring the QoS of CSCs is met (e.g., reduced latency for BDSAs) by providing adequate resources through the use of dynamic frameworks can also be advantageous for CSBs.

6.2 Research Contributions

As a first step, we conducted a study of the literature on big data, cloud computing, stream processing and the technologies used, and identified the major challenges faced in elastically managing resources in the cloud for big data processing. Particularly, we aimed at addressing two major questions: how can cloud resource utilization be predicted for BDSAs running on a heterogeneous cluster of hosts? And how can resources be optimally allocated to BDSAs in a dynamic manner to meet the QoS of each containerized streaming application while maximizing the CSB's surplus? Based on the progress made thus far in this field, a novel cloud elasticity resource management and pricing framework was proposed to address these questions. We hereby present a summary of our contributions to the field of cloud computing at large.

• An efficient and proactive approach for managing the auto-scaling of time-bounded BDSAs [35]: In Chapter 4, we tackled the resource prediction problem from a CSB's point of view such that efficient scaling decisions can be made. We proposed the use of an LMD-HMM for managing time-bounded streaming applications that run for a short period of time. The proposed design provided an approach for modeling each application individually at a lower layer, while the upper layer models interactions between the groups of streaming applications running on a cluster. The prediction system will enable efficient resource provisioning without prior knowledge of running the applications in the cloud.

• A novel approach for predicting the resource usage of time-unbounded BDSAs [36]: We extended the LMD-HMM mechanism of Chapter 4 by proposing a novel LMD-HSMM that can accurately capture the resource demands of unbounded BDSAs. The LMD-HSMM framework will enable the resource usage prediction of

BDSAs by explicitly modeling the duration of states. The main advantage of making use of the LMD-HSMM is that it provides CSBs with predictive capabilities for handling resource scaling actions in the cloud in a timely manner. The derived model can also be beneficial when addressing load balancing, defining scheduling policies, and optimizing resource usage for BDSAs.

• An automated framework for ensuring fair resource allocation to multiple containerized BDSAs [37]: Chapter 5 addressed the container resource allocation problem by focusing on workload distribution for optimal resource allocation to meet the real-time demands of competing containerized BDSAs. We introduced a container resource allocation mechanism (CRAM) as a non-cooperative, non-zero-sum game among BDSA agents intermediated by a CSB. CRAM takes the tuple arrival rates, waiting times, service rates and resource capacity limits as input and optimally allocates workloads to maximize the payoff of each BDSA. As compared to existing techniques, experimental results show our mechanism equally satisfies each BDSA's request without affecting the performance of other applications. This is a model-based approach for CSBs to optimize the resource allocation decisions of BDSAs.

• A dynamic pricing mechanism for equilibrium and stability of BDSAs [38]: In Section 5.3.3, we presented a pricing mechanism that induces optimal workload allocation when CSBs face uncertain BDSA demands. In particular, pricing mechanisms are used to deal with individual self-interested BDSAs that make decisions to maximize their own objective with no consideration for other applications, which can result in congestion, especially in environments with scarce CSB resources. We contribute to meeting this challenge by proposing a dynamic pricing scheme that uses the system performance as input to the pricing formula for optimal resource allocation, i.e., the arrival rates and waiting times of the BDSAs jointly maximize the expected net value of the CSB.

6.3 Future Directions

Resources and services offered in the cloud have changed rapidly in the last decade with emerging computing architectures (e.g., fog, edge, container, serverless, and software-defined computing) and changing infrastructures (e.g., multi-cloud, cloudlets, and heterogeneous clouds). In addition, many streaming applications make use of heterogeneous resources from multiple CSPs with multiple geographically distributed DCs. Hence, changes are required to the resource management layers. Many strategies and algorithms have been proposed to address resource provisioning, allocation, scaling and scheduling, among others, in the cloud. As key goals of future research, the following interesting areas will be explored:

• Integrate predictive mechanisms for scaling BDSA resources in container-clusters: Generally, it is challenging to make decisions on the types and scale of resources to provision without knowing the workload behaviour of applications. With containers, estimating the workload patterns of BDSAs can be cumbersome using parameters such as the arrival rate, CPU usage, processing rate, I/O behaviours and number of connected users. Kubernetes offers a container reconfiguration feature for resource scaling based on basic CPU usage monitoring. Several other similar rule-based reactive mechanisms are used in the cloud, e.g., the Amazon Web Services (AWS) Auto-scaler, but predictive mechanisms to enable the reconfiguration of containerized BDSAs in the cloud are yet to be unveiled. Therefore, state-of-the-art predictive models are required to determine the workload of multiple containerized BDSAs deployed in the cloud.

• Deal with resource provisioning and scheduling for containerized BDSAs in the cloud: The framework discussed in Section 3.2 incorporates a cloud resource provisioner. As future work, a resource provisioning and scheduling mechanism for CSBs can be developed, as there is yet to be a standard large-scale, performance-optimized scheduling platform for managing an ecosystem of containerized BDSAs. The elastic scheduling of containerized BDSAs is a complex research problem due to several runtime uncertainties: since BDSAs have dataflow dependencies, challenges exist when dealing with heterogeneous configurations of BDSAs and cloud datacenter resources, driven by heterogeneous QoS requirements and constraints. Despite the clear technological advances in container technologies, resource provisioning and scheduling remain challenging.

• Implement predictive schedulers for BDSAs in geo-distributed DCs: The volume and velocity of data generated today necessitate newer and innovative methods to help manage and process the data streams in a timely manner. Scheduling in the cloud for processing large amounts of data streams poses significant challenges, leading to longer processing times, waiting times and costs. This problem becomes even more complicated when scheduling to multiple geo-distributed datacenters. Most SPEs are equipped with a basic default scheduler that evenly distributes

the execution of topology components on the available nodes using a round-robin strategy. Unfortunately, this approach impacts heavily on the stream processing latency. Hence, there is a need to implement a predictive scheduler for improved response times while minimizing resource cost. In this case, such mechanisms should be able to predict the average tuple processing time of BDSAs to derive the scheduling solution. Existing research has proposed offline and online schedulers for minimizing inter-node traffic while preventing overload; however, the time lag before the scheduler kicks in can lead to additional delay in processing BDSAs.

• Application of complex ML approaches, e.g., deep learning: In recent years, deep learning, a radically different approach to artificial intelligence, has produced major breakthroughs for complex tasks. Deep learning is a branch of machine learning algorithms based on learning multiple levels of representation. The vast majority of deep learning algorithms are built upon the artificial neural network (ANN) framework. The problem of predicting the resource needs of multiple streaming applications in federated DCs, given the number of input variables presented, can be complex. Since resource management problems in federated clouds often manifest as difficult online decision-making tasks, the application of simple traditional approaches, e.g., logistic regression, may not be feasible as the number of interactions grows. In this case, we will determine if more complex machine learning approaches, e.g., deep learning, can be utilized. As deep learning is an important technology for most industries today, future studies will focus on the application of deep learning approaches to deal with the complex input-output mappings for BDSA cloud resource prediction.

Bibliography

[1] Ibrahim Abaker Targio Hashem, Ibrar Yaqoob, Nor Badrul Anuar, Salimah Mokhtar, Abdullah Gani, and Samee Ullah Khan. The rise of big data on cloud computing: Review and open research issues. Information Systems, 47:98–115, 2015.

[2] Marcos D Assuncao, Rodrigo N Calheiros, Silvia Bianchi, Marco AS Netto, and Rajkumar Buyya. Big data computing and clouds: challenges, solutions, and future directions. arXiv preprint arXiv:1312.4722, pages 1–39, 2013.

[3] Md Faizul Bari, Raouf Boutaba, Rafael Esteves, Lisandro Zambenedetti Granville, Maxim Podlesny, Md Golam Rabbani, Qi Zhang, and Mohamed Faten Zhani. Data center network virtualization: A survey. IEEE Communications Surveys & Tutorials, 15(2):909–928, 2013.

[4] Marcos Dias de Assuncao, Alexandre da Silva Veith, and Rajkumar Buyya. Dis- tributed data stream processing and edge computing: A survey on resource elasticity and future directions. Journal of Network and Computer Applications, 103:1–17, 2018.

[5] Atsushi Ishii and Toyotaro Suzumura. Elastic stream computing with clouds. In Pro- ceedings of the 2011 IEEE International Conference on Cloud Computing (CLOUD), pages 195–202. IEEE, 2011.

[6] Jan Sipke van der Veen, Bram van der Waaij, Elena Lazovik, Wilco Wijbrandi, and Robert J Meijer. Dynamically scaling apache storm for the analysis of streaming data. In Proceedings of the 2015 IEEE First International Conference on Big Data Computing Service and Applications (BigDataService), pages 154–161. IEEE, 2015.

[7] Matei Zaharia, Tathagata Das, Haoyuan Li, Timothy Hunter, Scott Shenker, and Ion Stoica. Discretized streams: Fault-tolerant streaming computation at scale. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pages 423–438. ACM, 2013.


[8] Sanjeev Kulkarni, Nikunj Bhagat, Maosong Fu, Vikas Kedigehalli, Christopher Kel- logg, Sailesh Mittal, Jignesh M Patel, Karthik Ramasamy, and Siddarth Taneja. Twitter heron: Stream processing at scale. In Proceedings of the 2015 ACM SIG- MOD International Conference on Management of Data, pages 239–250. ACM, 2015.

[9] Guilherme Galante and Luis Carlos E de Bona. A survey on cloud computing elastic- ity. In Proceedings of the 2012 IEEE/ACM Fifth International Conference on Utility and Cloud Computing, pages 263–270. IEEE Computer Society, 2012.

[10] Changqing Ji, Yu Li, Wenming Qiu, Uchechukwu Awada, and Keqiu Li. Big data processing in cloud computing environments. In Pervasive Systems, Algorithms and Networks (ISPAN), 2012 12th International Symposium on, pages 17–23. IEEE, 2012.

[11] Rajiv Ranjan. Streaming big data processing in datacenter clouds. IEEE Cloud Computing, 1(1):78–83, 2014.

[12] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, et al. A view of cloud computing. Communications of the ACM, 53(4):50–58, 2010.

[13] Ankit Toshniwal, Siddarth Taneja, Amit Shukla, Karthik Ramasamy, Jignesh M Patel, Sanjeev Kulkarni, Jason Jackson, Krishna Gade, Maosong Fu, Jake Donham, et al. Storm@ twitter. In Proceedings of the 2014 ACM SIGMOD international conference on Management of data, pages 147–156. ACM, 2014.

[14] Apache Spark. Spark streaming programming guide. Acessado em, 4(02):2017, 2014.

[15] Fatos Xhafa, Victor Naranjo, and Santi Caball´e. Processing and analytics of big data streams with yahoo! s4. In Proceedings of the 2015 IEEE 29th International Conference on Advanced Information Networking and Applications (AINA), pages 263–270. IEEE, 2015.

[16] M Usha, J Akilandeswari, and AS Syed Fiaz. An efficient qos framework for cloud brokerage services. In Proceedings of the 2012 International Symposium on Cloud and Services Computing (ISCOS), pages 76–79. IEEE, 2012.

[17] David S Linthicum. Are cloud service brokers (csbs) still relevant? IEEE Cloud Computing, 4(4):18–21, 2017.

[18] Mateusz Guzek, Alicja Gniewek, Pascal Bouvry, Jedrzej Musial, and Jacek Blazewicz. Cloud brokering: Current practices and upcoming challenges. IEEE Cloud Computing, 2(2):40–47, 2015.

[19] Gaetano F Anastasi, Emanuele Carlini, Massimo Coppola, and Patrizio Dazzi. Qbrokage: A genetic approach for qos cloud brokering. In Proceedings of the 2014 IEEE 7th International Conference on Cloud Computing (CLOUD), pages 304–311. IEEE, 2014.

[20] Athanasios Naskos, Emmanouela Stachtiari, Anastasios Gounaris, Panagiotis Kat- saros, Dimitrios Tsoumakos, Ioannis Konstantinou, and Spyros Sioutas. Cloud elas- ticity using probabilistic model checking. arXiv preprint arXiv:1405.4699, 2014.

[21] Jack Li, Calton Pu, Yuan Chen, Daniel Gmach, and Dejan Milojicic. Enabling elastic stream processing in shared clusters. In Proceedings of the 2016 IEEE 9th Interna- tional Conference on Cloud Computing (CLOUD), pages 108–115. IEEE, 2016.

[22] RS Shariffdeen, DTSP Munasinghe, HS Bhathiya, UKJU Bandara, and HMN Dilum Bandara. Adaptive workload prediction for proactive auto scaling in paas systems. In Proceedings of the 2016 2nd International Conference on Cloud Computing Tech- nologies and Applications (CloudTech), pages 22–29. IEEE, 2016.

[23] Valeria Cardellini, Emiliano Casalicchio, Francesco Lo Presti, and Luca Silvestri. Sla-aware resource management for application service providers in the cloud. In Proceedings of the 2011 First International Symposium on Network Cloud Computing and Applications (NCCA), pages 20–27. IEEE, 2011.

[24] Luis M Vaquero, Luis Rodero-Merino, and Rajkumar Buyya. Dynamically scal- ing applications in the cloud. ACM SIGCOMM Computer Communication Review, 41(1):45–52, 2011.

[25] Fang Liu, Jin Tong, Jian Mao, Robert Bohn, John Messina, Lee Badger, and Dawn Leaf. Nist cloud computing reference architecture. NIST special publication, 500(2011):1–28, 2011.

[26] Han Zhao, Miao Pan, Xinxin Liu, Xiaolin Li, and Yuguang Fang. Exploring fine- grained resource rental planning in cloud computing. IEEE Transactions on Cloud Computing, 3(3):304–317, 2015.

[27] Raja Wasim Ahmad, Abdullah Gani, Siti Hafizah Ab Hamid, Muhammad Shiraz, Abdullah Yousafzai, and Feng Xia. A survey on virtual machine migration and server consolidation frameworks for cloud data centers. Journal of Network and Computer Applications, 52:11–25, 2015.

[28] Raja Wasim Ahmad, Abdullah Gani, Siti Hafizah Ab Hamid, Muhammad Shiraz, Feng Xia, and Sajjad A Madani. Virtual machine migration in cloud data cen- ters: a review, taxonomy, and open research issues. The Journal of Supercomputing, 71(7):2473–2515, 2015.

[29] Poonam Singh, Maitreyee Dutta, and Naveen Aggarwal. A review of task scheduling based on meta-heuristics approach in cloud computing. Knowledge and Information Systems, 52(1):1–51, 2017.

[30] Luiz F Bittencourt, Alfredo Goldman, Edmundo RM Madeira, Nelson LS da Fon- seca, and Rizos Sakellariou. Scheduling in distributed systems: A cloud computing perspective. Computer Science Review, 30:31–54, 2018.

[31] Song Wu, Xingjun Wang, Hai Jin, and Haibao Chen. Elastic resource provisioning for batched stream processing system in container cloud. In Asia-Pacific Web (APWeb) and Web-Age Information Management (WAIM) Joint Conference on Web and Big Data, pages 411–426. Springer, 2017.

[32] Yingjun Wu and Kian-Lee Tan. Chronostream: Elastic stateful stream computation in the cloud. In Proceedings of the 2015 IEEE 31st International Conference on Data Engineering (ICDE), pages 723–734. IEEE, 2015.

[33] Alan R. Hevner, Salvatore T. March, Jinsoo Park, and Sudha Ram. Design science in information systems research. MIS Quarterly, 28(1):75–105, 2004.

[34] Ken Peffers, Tuure Tuunanen, Marcus A Rothenberger, and Samir Chatterjee. A design science research methodology for information systems research. Journal of management information systems, 24(3):45–77, 2007.

[35] Olubisi Runsewe and Nancy Samaan. Cloud resource scaling for big data streaming applications using a layered multi-dimensional hidden markov model. In Proceedings of the 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), pages 848–857. IEEE, 2017.

[36] Olubisi Runsewe and Nancy Samaan. Cloud resource scaling for time-bounded and unbounded big data streaming applications. IEEE Transactions on Cloud Comput- ing, 2018.

[37] Olubisi Runsewe and Nancy Samaan. Cram: a container resource allocation mechanism for big data streaming applications. In Proceedings of the 2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID). IEEE, 2019.

[38] Olubisi Runsewe and Nancy Samaan. Cram-p: a container resource allocation mecha- nism with dynamic pricing for big data streaming applications. In IEEE Transactions on Cloud Computing. IEEE.

[39] Min Chen, Shiwen Mao, and Yunhao Liu. Big data: A survey. Mobile Networks and Applications, Springer, 19(2):171–209, 2014.

[40] Jeffrey Dean and Sanjay Ghemawat. Mapreduce: simplified data processing on large clusters. Communications of the ACM, 51(1):107–113, 2008.

[41] S. Sakr, A. Liu, D.M. Batista, and M. Alomari. A survey of large scale data manage- ment approaches in cloud environments. Communications Surveys Tutorials, IEEE, 13(3):311–336, 2011.

[42] Marcos D Assunção, Rodrigo N Calheiros, Silvia Bianchi, Marco AS Netto, and Rajkumar Buyya. Big data computing and clouds: Trends and future directions. Journal of Parallel and Distributed Computing, 79:3–15, 2015.

[43] Katina Michael and Keith W Miller. Big data: New opportunities and new challenges. Computer, IEEE, 46(6):22–24, 2013.

[44] Ala Al-Fuqaha, Mohsen Guizani, Mehdi Mohammadi, Mohammed Aledhari, and Moussa Ayyash. Internet of things: A survey on enabling technologies, protocols, and applications. IEEE communications surveys & tutorials, 17(4):2347–2376, 2015.

[45] Dhruba Borthakur, Jonathan Gray, Joydeep Sen Sarma, Kannan Muthukkaruppan, Nicolas Spiegelberg, Hairong Kuang, Karthik Ranganathan, Dmytro Molkov, Aravind Menon, Samuel Rash, et al. Apache Hadoop goes realtime at Facebook. In Proceedings of the 2011 ACM SIGMOD International Conference on Management of data, pages 1071–1080, 2011.

[46] Vivien Marx. Biology: The big challenges of big data. Nature, 498(7453):255–260, 2013.

[47] Radu Tudoran, Alexandru Costan, and Gabriel Antoniu. Overflow: multi-site aware big data management for scientific workflows on clouds. IEEE Transactions on Cloud Computing, 4(1):76–89, 2015.

[48] Brian Cho and Indranil Gupta. Budget-constrained bulk data transfer via internet and shipping networks. In Proceedings of the 8th ACM international conference on Autonomic computing, pages 71–80, 2011.

[49] Justin J Miller. Graph database applications and concepts with neo4j. In Proceedings of the Southern Association for Information Systems Conference, Atlanta, GA, USA, volume 2324, page 36, 2013.

[50] Grisha Weintraub. Dynamo and BigTable - review and comparison. In Proceedings of the 2014 IEEE 28th Convention of Electrical & Electronics Engineers in Israel (IEEEI), pages 1–5. IEEE, 2014.

[51] Artem Chebotko, Andrey Kashlev, and Shiyong Lu. A big data modeling methodol- ogy for apache cassandra. In Proceedings of the 2015 IEEE International Congress on Big Data (BigData Congress), pages 238–245. IEEE, 2015.

[52] Mehul Nalin Vora. Hadoop-hbase for large-scale data. In Proceedings of the 2011 international conference on Computer science and network technology (ICCSNT), volume 1, pages 601–605. IEEE, 2011.

[53] Zuhair Khayyat, Ihab F Ilyas, Alekh Jindal, Samuel Madden, Mourad Ouzzani, Paolo Papotti, Jorge-Arnulfo Quian´e-Ruiz,Nan Tang, and Si Yin. Bigdansing: A system for big data cleansing. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pages 1215–1230. ACM, 2015.

[54] Ikbal Taleb, Rachida Dssouli, and Mohamed Adel Serhani. Big data pre-processing: a quality framework. In Proceedings of the 2015 IEEE International Congress on Big Data (BigData Congress),, pages 191–198. IEEE, 2015.

[55] Xindong Wu, Xingquan Zhu, Gong-Qing Wu, and Wei Ding. Data mining with big data. IEEE Transactions on Knowledge and Data Engineering, 26(1):97–107, 2014.

[56] Alfredo Cuzzocrea, Il-Yeol Song, and Karen C. Davis. Analytics over large-scale multidimensional data: The big data revolution! In Proceedings of the ACM 14th International Workshop on Data Warehousing and OLAP, DOLAP ’11, pages 101– 104, 2011.

[57] Deepali Arora and Piyush Malik. Analytics: Key to go from generating big data to deriving business value. In Proceedings of the 2015 IEEE First International Conference on Big Data Computing Service and Applications (BigDataService), pages 446–452. IEEE, 2015.

[58] Michael Hogan, Fang Liu, Annie Sokol, and Jin Tong. Nist cloud computing stan- dards roadmap. NIST Special Publication, 35, 2011.

[59] Bassem Wanis, Nancy Samaan, and Ahmed Karmouch. Efficient modeling and de- mand allocation for differentiated cloud virtual-network as-a service offerings. IEEE Transactions on Cloud Computing, 4(4):376–391, 2015.

[60] R. Tudoran, A. Costan, and G. Antoniu. Transfer as a service: Towards a cost- effective model for multi-site cloud data management. In Reliable Distributed Systems (SRDS), 2014 IEEE 33rd International Symposium on, pages 51–56, Oct 2014.

[61] Minqi Zhou, Rong Zhang, Dadan Zeng, and Weining Qian. Services in the cloud computing era: A survey. In Proceedings of the 2010 4th International Universal Communication Symposium (IUCS), pages 40–46, 2010.

[62] Zibin Zheng, Jieming Zhu, and Michael R Lyu. Service-generated big data and big data-as-a-service: An overview. In Proceedings of the 2013 IEEE International Congress on Big Data (BigData Congress), pages 403–410, 2013.

[63] Xinhua E, Jing Han, Yasong Wang, and Lianru Liu. Big data-as-a-service: Definition and architecture. In Proceedings of the 2013 15th IEEE International Conference on Communication Technology (ICCT), pages 738–742, Nov 2013.

[64] D. Talia. Clouds for scalable big data analytics. Computer, 46(5):98–101, May 2013.

[65] F. Zulkernine, P. Martin, Ying Zou, M. Bauer, F. Gwadry-Sridhar, and A. Aboul- naga. Towards cloud-based analytics-as-a-service (claaas) for big data analytics in the cloud. In Proceedings of the 2013 IEEE International Congress on Big Data (BigData Congress), pages 62–69, June 2013.

[66] K. Slavakis, G.B. Giannakis, and G. Mateos. Modeling and optimization for big data analytics: (statistical) learning tools for our era of data deluge. Signal Processing Magazine, IEEE, 31(5):18–31, Sept 2014.

[67] Uthayasankar Sivarajah, Muhammad Mustafa Kamal, Zahir Irani, and Vishanth Weerakkody. Critical analysis of big data challenges and analytical methods. Journal of Business Research, 70:263–286, 2017.

[68] Katarina Grolinger, Michael Hayes, Wilson A Higashino, Alexandra L'Heureux, David S Allison, and Miriam AM Capretz. Challenges for mapreduce in big data.

In Proceedings of the 2014 IEEE World Congress on Services (SERVICES), pages 182–189. IEEE, 2014.

[69] Ali Hammadi and Lotfi Mhamdi. A survey on architectures and energy efficiency in data center networks. Computer Communications, 40:1–21, 2014.

[70] Christoforos Kachris and Ioannis Tomkos. A survey on optical interconnects for data centers. Communications Surveys & Tutorials, IEEE, 14(4):1021–1036, 2012.

[71] Guohui Wang, David G. Andersen, Michael Kaminsky, Konstantina Papagiannaki, T.S. Eugene Ng, Michael Kozuch, and Michael Ryan. c-Through: part-time optics in data centers. SIGCOMM Comput. Commun. Rev., 41(4):327–338, August 2010.

[72] Sushant Jain, Alok Kumar, Subhasree Mandal, Joon Ong, Leon Poutievski, Arjun Singh, Subbaiah Venkata, Jim Wanderer, Junlan Zhou, Min Zhu, Jon Zolla, Urs H¨olzle,Stephen Stuart, and Amin Vahdat. B4: Experience with a Globally-deployed Software Defined Wan. SIGCOMM Comput. Commun. Rev., 43(4):3–14, August 2013.

[73] Nancy Samaan. A novel economic sharing model in a federation of selfish cloud providers. IEEE Transactions on Parallel and Distributed Systems, 25(1):12–21, 2014.

[74] Wenfeng Xia, Yonggang Wen, Chuan Heng Foh, D. Niyato, and Haiyong Xie. A survey on software-defined networking. Communications Surveys Tutorials, IEEE, 17(1):27–51, 2015.

[75] Srikanth Kandula, Ishai Menache, Roy Schwartz, and Spandana Raj Babbula. Cal- endaring for wide area networks. In Proceedings of the 2014 ACM conference on SIGCOMM, pages 515–526, 2014.

[76] Bhaskar Prasad Rimal, Eunmi Choi, and Ian Lumb. A taxonomy and survey of cloud computing systems. In Fifth International Joint Conference on INC, IMS and IDC, 2009. NCM’09., pages 44–51. IEEE, 2009.

[77] M. Rosenblum and T. Garfinkel. Virtual machine monitors: current technology and future trends. Computer, 38(5):39–47, May 2005.

[78] Roberto Morabito, Jimmy Kjällman, and Miika Komu. Hypervisors vs. lightweight virtualization: a performance comparison. In Proceedings of the 2015 IEEE International Conference on Cloud Engineering (IC2E), pages 386–393. IEEE, 2015.

[79] Charles Anderson. Docker [software engineering]. IEEE Software, 32(3):102–c3, 2015.

[80] Carl Boettiger. An introduction to docker for reproducible research. ACM SIGOPS Operating Systems Review, 49(1):71–79, 2015.

[81] Víctor Medel, Omer Rana, José Ángel Bañares, and Unai Arronategui. Adaptive application scheduling under interference in kubernetes. In Proceedings of the 9th International Conference on Utility and Cloud Computing, pages 426–427. ACM, 2016.

[82] Ann Mary Joy. Performance comparison between linux containers and virtual ma- chines. In Proceedings of the 2015 International Conference on Advances in Computer Engineering and Applications (ICACEA), pages 342–346. IEEE, 2015.

[83] Wes Felter, Alexandre Ferreira, Ram Rajamony, and Juan Rubio. An updated per- formance comparison of virtual machines and linux containers. In Proceedings of the 2015 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pages 171–172. IEEE, 2015.

[84] Prateek Sharma, Lucas Chaufournier, Prashant Shenoy, and YC Tay. Containers and virtual machines at scale: A comparative study. In Proceedings of the 17th International Middleware Conference, pages 1–13. ACM, 2016.

[85] Maria Fazio, Antonio Celesti, Rajiv Ranjan, Chang Liu, Lydia Chen, and Massimo Villari. Open issues in scheduling microservices in the cloud. IEEE Cloud Computing, 3(5):81–88, 2016.

[86] Vangelis Koukis, Constantinos Venetsanopoulos, and Nectarios Koziris. ~okeanos: Building a cloud, cluster by cluster. IEEE Internet Computing, 17(3):67–71, 2013.

[87] Amandeep Khurana. Bringing big data systems to the cloud. IEEE Cloud Computing, 1(3):72–75, 2014.

[88] Yahya Al-Dhuraibi, Fawaz Paraiso, Nabil Djarallah, and Philippe Merle. Elasticity in cloud computing: state of the art and research challenges. IEEE Transactions on Services Computing, 11(2):430–447, 2018.

[89] Cairong Yan, Ming Zhu, Xin Yang, Ze Yu, Min Li, Youqun Shi, and Xiaolin Li. Affinity-aware virtual cluster optimization for mapreduce applications. In Proceedings of the 2012 IEEE International Conference on Cluster Computing (CLUSTER), pages 63–71, Sept 2012.

[90] Eun-Sung Jung and Rajkumar Kettimuthu. An overview of parallelism exploitation and cross-layer optimization for big data transfers. In Proc. IEEE IPDPS, 2013.

[91] Romeo Kienzler, Remy Bruggmann, Anand Ranganathan, and Nesime Tatbul. Stream as you go: The case for incremental data access and processing in the cloud. In Proceedings of the 2012 IEEE 28th International Conference on Data Engineering Workshops (ICDEW), pages 159–166, 2012.

[92] Di Xie, Ning Ding, Y. Charlie Hu, and Ramana Kompella. The only constant is change: Incorporating time-varying network reservations in data centers. SIGCOMM Comput. Commun. Rev., 42(4):199–210, August 2012.

[93] Nikolaos Laoutaris, Michael Sirivianos, Xiaoyuan Yang, and Pablo Rodriguez. Inter- datacenter bulk transfers with netstitcher. ACM SIGCOMM Computer Communi- cation Review, 41(4):74–85, 2011.

[94] Nikolaos Laoutaris, Georgios Smaragdakis, Pablo Rodriguez, and Ravi Sundaram. Delay tolerant bulk data transfers on the Internet. In ACM SIGMETRICS Perfor- mance Evaluation Review, volume 37, pages 229–238, 2009.

[95] B. Cho and I. Gupta. New algorithms for planning bulk transfer via internet and shipping networks. In Proceedings of the 2010 IEEE 30th International Conference on Distributed Computing Systems (ICDCS), pages 305–314, June 2010.

[96] R.A. Gorcitz, Y. Jarma, P. Spathis, M. Dias de Amorim, R. Wakikawa, J. Whitbeck, V. Conan, and S. Fdida. Vehicular carriers for big data transfers. In Proceedings of the 2012 IEEE Vehicular Networking Conference (VNC), pages 109–114, Nov 2012.

[97] Phlip Shilane, Mark Huang, Grant Wallace, and Windsor Hsu. Wan-optimized repli- cation of backup datasets using stream-informed delta compression. ACM Transac- tions on Storage (TOS), 8(4):13, 2012.

[98] Zhenghua Xue, Geng Shen, Jianhui Li, Qian Xu, Yang Zhang, and Jing Shao. Compression-aware i/o performance analysis for big data clustering. In Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications, BigMine ’12, pages 45–52, 2012.

[99] Yinjin Fu, Hong Jiang, Nong Xiao, Lei Tian, Fang Liu, and Lei Xu. Application-aware local-global source deduplication for cloud backup services of personal storage. IEEE Transactions on Parallel and Distributed Systems, 25(5):1155–1165, 2014.

[100] William Allcock, John Bresnahan, Rajkumar Kettimuthu, Michael Link, Catalin Dumitrescu, Ioan Raicu, and Ian Foster. The globus striped gridftp framework and server. In Proceedings of the 2005 ACM/IEEE Conference on Supercomputing, Wash- ington, DC, USA, 2005.

[101] Ian Foster. Globus online: Accelerating and democratizing science through cloud- based services. IEEE Internet Computing, 15(3):70–73, 2011.

[102] B. Allen et al. Software as a service for data scientists. Communications of the ACM, 55(2):81–88, 2012.

[103] Kyle Chard, Steven Tuecke, and Ian Foster. Efficient and secure transfer, synchro- nization, and sharing of big data. Cloud Computing, IEEE, 1(3):46–55, 2014.

[104] Radu Tudoran, Olivier Nano, Ivo Santos, Alexandru Costan, Hakan Soncu, Luc Boug´e,and Gabriel Antoniu. Jetstream: Enabling high performance event streaming across cloud data-centers. In Proceedings of the 8th ACM International Conference on Distributed Event-based Systems, pages 23–34, 2014.

[105] Tevfik Kosar, Engin Arslan, Brandon Ross, and Bing Zhang. Storkcloud: Data transfer scheduling and optimization as a service. In Proceedings of the 4th ACM workshop on Scientific cloud computing, pages 29–36, 2013.

[106] Hai Ah Nam, Jason Hill, and Suzanne Parete-Koon. The practical obstacles of data transfer: why researchers still love scp. In Proceedings of the Third International Workshop on Network-Aware Data Management, 2013.

[107] Rosa Filgueira, Iraklis Klampanos, and Yusuke Tanimura. Fast: flexible automated synchronization transfer. In Proceedings of the sixth international workshop on Data intensive distributed computing, pages 47–52, 2014.

[108] M. Aledhari and F. Saeed. Design and implementation of network transfer protocol for big genomic data. In Proceedings of the 2015 IEEE International Congress on Big Data (BigData Congress), pages 281–288, June 2015.

[109] Mohammad Alizadeh, Albert Greenberg, David A. Maltz, Jitendra Padhye, Parveen Patel, Balaji Prabhakar, Sudipta Sengupta, and Murari Sridharan. Data center tcp (dctcp). SIGCOMM Comput. Commun. Rev., 40(4):63–74, August 2010.

[110] T. Zhang, J. Wang, J. Huang, Y. Huang, J. Chen, and Y. Pan. Adaptive-acceleration data center tcp. IEEE Transactions on Computers, 64(6):1522–1533, June 2015.

[111] Yunhong Gu and Robert L. Grossman. Udt: Udp-based data transfer for high-speed wide area networks. Computer Networks, Elsevier, 51(7):1777 – 1799, 2007.

[112] Brian Tierney, Ezra Kissel, Martin Swany, and Eric Pouyoul. Efficient data transfer protocols for big data. In Proceedings of the 2012 IEEE 8th International Conference on E-Science (e-Science), 2012.

[113] Linquan Zhang, Chuan Wu, Zongpeng Li, Chuanxiong Guo, Minghua Chen, and Francis C. M. Lau. Moving big data to the cloud: An online cost-minimizing ap- proach. IEEE Journal on Selected areas of Communications, 31(12):2710–2721, Dec 2013.

[114] Chi-Yao Hong, Srikanth Kandula, Ratul Mahajan, Ming Zhang, Vijay Gill, Mohan Nanduri, and Roger Wattenhofer. Achieving high utilization with software-driven wan. SIGCOMM Comput. Commun. Rev., 43(4):15–26, aug 2013.

[115] Cong Xu, Jiahai Yang, Hui Yu, Haizhuo Lin, and Hui Zhang. Optimizing the topolo- gies of virtual networks for cloud-based big data processing. In Proceedings of the 2014 IEEE Intl Conf on High Performance Computing and Communications, 2014 IEEE 6th Intl Symp on Cyberspace Safety and Security, 2014 IEEE 11th Intl Conf on Embedded Software and Syst (HPCC,CSS,ICESS), pages 189–196, 2014.

[116] Alan Shieh, Srikanth Kandula, Albert Greenberg, Changhoon Kim, and Bikas Saha. Sharing the data center network. In Proceedings of the 8th USENIX conference on Networked systems design and implementation, NSDI’11, pages 23–23, 2011.

[117] Vinh The Lam, Sivasankar Radhakrishnan, Rong Pan, Amin Vahdat, and George Varghese. Netshare and stochastic netshare: predictable bandwidth allocation for data centers. SIGCOMM Comput. Commun. Rev., 42(3):5–11, June 2012.

[118] Adrian Lara, Anisha Kolasani, and Byrav Ramamurthy. Network innovation using openflow: A survey. Communications Surveys & Tutorials, IEEE, 16(1):493–512, 2014.

[119] Guohui Wang, T.S. Eugene Ng, and Anees Shaikh. Programming your network at run-time for big data applications. In Proceedings of the First Workshop on Hot Topics in Software Defined Networks, HotSDN ’12, pages 103–108, 2012.

[120] Yuan Feng, Baochun Li, and Bo Li. Postcard: Minimizing costs on inter-datacenter traffic with store-and-forward. In Proceedings of the 2012 32nd International Conference on Distributed Computing Systems Workshops (ICDCSW), pages 43–50, 2012.

[121] Hong Zhang, Kai Chen, Wei Bai, Dongsu Han, Chen Tian, Hao Wang, Haibing Guan, and Ming Zhang. Guaranteeing deadlines for inter-datacenter transfers. In Proceedings of the Tenth European Conference on Computer Systems, 2015.

[122] Theophilus Benson, Ashok Anand, Aditya Akella, and Ming Zhang. MicroTE: Fine grained traffic engineering for data centers. In Proceedings of the Seventh Conference on emerging Networking Experiments and Technologies, page 8, 2011.

[123] R. Tudoran, A. Costan, Rui Wang, L. Bouge, and G. Antoniu. Bridging data in the clouds: An environment-aware system for geographically distributed data transfers. In Proceedings of the 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), pages 92–101, May 2014.

[124] Ahmet Soran, Furkan Mustafa Akdemir, and Murat Yuksel. Parallel routing on multi-core routers for big data transfers. In Proceedings of the 2013 workshop on Student workhop, pages 35–38, 2013.

[125] M Nirmala. WAN Optimization Tools, Techniques and Research Issues for Cloud- Based Big Data Analytics. In Proceedings of the 2014 World Congress on Computing and Communication Technologies (WCCCT), pages 280–285, 2014.

[126] Linquan Zhang, Zongpeng Li, Chuan Wu, and Minghua Chen. Online algorithms for uploading deferrable big data to the cloud. In INFOCOM, 2014 Proceedings IEEE, pages 2022–2030, April 2014.

[127] M. Marcon, N. Santos, K.P. Gummadi, N. Laoutaris, P. Rodriguez, and A. Vahdat. Netex: Efficient and cost-effective internet bulk content delivery. In Proceedings of the 2010 ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), Oct 2010.

[128] Adam Barker, Blesson Varghese, and Long Thai. Cloud services brokerage: A survey and research roadmap. In Proceedings of the 2015 IEEE 8th International Conference on Cloud Computing (CLOUD), pages 1029–1032. IEEE, 2015.

[129] Vincenzo Gulisano, Ricardo Jimenez-Peris, Marta Patino-Martinez, Claudio Sori- ente, and Patrick Valduriez. Streamcloud: An elastic and scalable data streaming system. IEEE Transactions on Parallel and Distributed Systems, 23(12):2351–2365, 2012.

[130] Daniel J Abadi, Don Carney, Ugur Çetintemel, Mitch Cherniack, Christian Convey, Sangdon Lee, Michael Stonebraker, Nesime Tatbul, and Stan Zdonik. Aurora: a new model and architecture for data stream management. The VLDB Journal, 12(2):120–139, 2003.

[131] Marco Centenaro, Lorenzo Vangelista, Andrea Zanella, and Michele Zorzi. Long-range communications in unlicensed bands: The rising stars in the iot and smart city scenarios. IEEE Wireless Communications, 23(5):60–67, 2016.

[132] Jing Han, E Haihong, Guan Le, and Jian Du. Survey on nosql database. In Proceedings of the 2011 6th international conference on Pervasive computing and applications (ICPCA), pages 363–366. IEEE, 2011.

[133] Jay Kreps, Neha Narkhede, Jun Rao, et al. Kafka: A distributed messaging system for log processing. In Proceedings of the NetDB, pages 1–7, 2011.

[134] Michael I Gordon, William Thies, and Saman Amarasinghe. Exploiting coarse-grained task, data, and pipeline parallelism in stream programs. ACM SIGARCH Computer Architecture News, 34(5):151–162, 2006.

[135] Paris Carbone, Asterios Katsifodimos, Stephan Ewen, Volker Markl, Seif Haridi, and Kostas Tzoumas. Apache flink: Stream and batch processing in a single engine. Bulletin of the IEEE Computer Society Technical Committee on Data Engineering, 36(4), 2015.

[136] Daniel J Abadi, Yanif Ahmad, Magdalena Balazinska, Ugur Cetintemel, Mitch Cherniack, Jeong-Hyon Hwang, Wolfgang Lindner, Anurag Maskey, Alex Rasin, Esther Ryvkina, et al. The design of the borealis stream processing engine. In CIDR, volume 5, pages 277–289, 2005.

[137] STREAM Group et al. Stream: The stanford stream data manager. Technical report, Stanford InfoLab, 2003.

[138] Bugra Gedik, Henrique Andrade, Kun-Lung Wu, Philip S Yu, and Myungcheol Doo. Spade: the System S declarative stream processing engine. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 1123–1134. ACM, 2008.

[139] Kejiang Ye and Yunjie Ji. Performance tuning and modeling for big data applications in docker containers. In Proceedings of the 2017 International Conference on Networking, Architecture, and Storage (NAS), pages 1–6. IEEE, 2017.

[140] Shadi A Noghabi, Kartik Paramasivam, Yi Pan, Navina Ramesh, Jon Bringhurst, Indranil Gupta, and Roy H Campbell. Samza: stateful scalable stream processing at linkedin. Proceedings of the VLDB Endowment, 10(12):1634–1645, 2017.

[141] Emanuel Ferreira Coutinho, Flávio Rubens de Carvalho Sousa, Paulo Antonio Leal Rego, Danielo Gonçalves Gomes, and José Neuman de Souza. Elasticity in cloud computing: a survey. Annals of Telecommunications - annales des télécommunications, 70(7-8):289–309, 2015.

[142] Yahya Al-Dhuraibi, Fawaz Paraiso, Nabil Djarallah, and Philippe Merle. Autonomic vertical elasticity of docker containers with elasticdocker. In Proceedings of the 2017 IEEE 10th International Conference on Cloud Computing (CLOUD), pages 472–479. IEEE, 2017.

[143] Chenhao Qu, Rodrigo N Calheiros, and Rajkumar Buyya. Auto-scaling web applications in clouds: A taxonomy and survey. ACM Computing Surveys (CSUR), 51(4):73, 2018.

[144] Lei Wei, Chuan Heng Foh, Bingsheng He, and Jianfei Cai. Towards efficient resource allocation for heterogeneous workloads in iaas clouds. IEEE Transactions on Cloud Computing, 2015.

[145] Amin Nezarat and GH Dastghaibifard. Efficient nash equilibrium resource allocation based on game theory mechanism in cloud computing by using auction. PLoS ONE, 10(10):e0138424, 2015.

[146] Xiaoming Nan, Yifeng He, and Ling Guan. Towards optimal resource allocation for differentiated multimedia services in cloud computing environment. In Proceedings of the 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 684–688. IEEE, 2014.

[147] Wesam Dawoud, Ibrahim Takouna, and Christoph Meinel. Elastic virtual machine for fine-grained cloud resource provisioning. In Global Trends in Computing and Communication Systems, pages 11–25. Springer, 2012.

[148] Soodeh Farokhi, Ewnetu Bayuh Lakew, Cristian Klein, Ivona Brandic, and Erik Elmroth. Coordinating cpu and memory elasticity controllers to meet service response time constraints. In Proceedings of the 2015 International Conference on Cloud and Autonomic Computing (ICCAC), pages 69–80. IEEE, 2015.

[149] Jose Monsalve, Aaron Landwehr, and Michela Taufer. Dynamic cpu resource allocation in containerized cloud environments. In Proceedings of the 2015 IEEE International Conference on Cluster Computing (CLUSTER), pages 535–536. IEEE, 2015.

[150] Priyanka P Kukade and Geetanjali Kale. Auto-scaling of micro-services using containerization. International Journal of Science and Research (IJSR), 4(9):1960–1963, 2015.

[151] Fawaz Paraiso, Stéphanie Challita, Yahya Al-Dhuraibi, and Philippe Merle. Model-driven management of docker containers. In 9th IEEE International Conference on Cloud Computing (CLOUD), pages 718–725, 2016.

[152] Chuanqi Kan. Docloud: An elastic cloud platform for web applications based on docker. In Proceedings of the 2016 18th International Conference on Advanced Communication Technology (ICACT), pages 478–483. IEEE, 2016.

[153] Lawrence R Rabiner. A tutorial on hidden markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, 1989.

[154] Shun-Zheng Yu. Hidden semi-markov models. Artificial intelligence, 174(2):215–243, 2010.

[155] John Nash. Non-cooperative games. Annals of mathematics, pages 286–295, 1951.

[156] KG Srinivasa, K Sharath Kumar, U Shashank Kaushik, S Srinidhi, Vignesh Shenvi, and Kushagra Mishra. Game theoretic resource allocation in cloud computing. In Proceedings of the 2014 Fifth International Conference on the Applications of Digital Information and Web Technologies (ICADIWT), pages 36–42. IEEE, 2014.

[157] Zexiang Mao, Jingqi Yang, Yanlei Shang, Chuanchang Liu, and Junliang Chen. A game theory of cloud service deployment. In Proceedings of the 2013 IEEE Ninth World Congress on Services (SERVICES), pages 436–443. IEEE, 2013.

[158] Qi Zhang, Lu Cheng, and Raouf Boutaba. Cloud computing: state-of-the-art and research challenges. Journal of internet services and applications, 1(1):7–18, 2010.

[159] Stefanie Leimeister, Markus Böhm, Christoph Riedl, and Helmut Krcmar. The business perspective of cloud computing: Actors, roles and value networks. In ECIS, page 56, 2010.

[160] Jing Mei, Kenli Li, Zhao Tong, Qiang Li, and Keqin Li. Profit maximization for cloud brokers in cloud computing. IEEE Transactions on Parallel and Distributed Systems, 30(1):190–203, 2019.

[161] Alba Amato, Beniamino Di Martino, and Salvatore Venticinque. Cloud brokering as a service. In Proceedings of the 2013 Eighth International Conference on P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC), pages 9–16. IEEE, 2013.

[162] Ehsan Mostajeran, Bukhary Ikhwan Ismail, Mohammad Fairus Khalid, and Hong Ong. A survey on sla-based brokering for inter-cloud computing. In Proceedings of the 2015 Second International Conference on Computing Technology and Information Management (ICCTIM), pages 25–31. IEEE, 2015.

[163] Rafael Weingärtner, Gabriel Beims Bräscher, and Carlos Becker Westphall. Cloud resource management: A survey on forecasting and profiling models. Journal of Network and Computer Applications, 47:99–106, 2015.

[164] Giuseppe Aceto, Alessio Botta, Walter De Donato, and Antonio Pescapè. Cloud monitoring: A survey. Computer Networks, 57(9):2093–2115, 2013.

[165] Sunilkumar S. Manvi and Gopal Krishna Shyam. Resource management for infrastructure as a service (iaas) in cloud computing: A survey. Journal of Network and Computer Applications, 41:424–440, 2014.

[166] Mohamed Abu Sharkh, Manar Jammal, Abdallah Shami, and Abdelkader Ouda. Resource allocation in a network-based cloud computing environment: design challenges. IEEE Communications Magazine, 51(11):46–52, 2013.

[167] Paul C Brebner. Is your cloud elastic enough?: performance modelling the elasticity of infrastructure as a service (iaas) cloud applications. In Proceedings of the 3rd ACM/SPEC International Conference on Performance Engineering, pages 263–266. ACM, 2012.

[168] Anshuman Biswas, Shikharesh Majumdar, Biswajit Nandy, and Ali El-Haraki. A hybrid auto-scaling technique for clouds processing applications with service level agreements. Journal of Cloud Computing, 6(1):29, 2017.

[169] Nilabja Roy, Abhishek Dubey, and Aniruddha Gokhale. Efficient autoscaling in the cloud using predictive models for workload forecasting. In Proceedings of the 2011 IEEE International Conference on Cloud Computing (CLOUD), pages 500–507. IEEE, 2011.

[170] Zhenhuan Gong, Xiaohui Gu, and John Wilkes. Press: Predictive elastic resource scaling for cloud systems. In Proceedings of the 2010 International Conference on Network and Service Management (CNSM), pages 9–16. IEEE, 2010.

[171] Jing Bi, Zhiliang Zhu, Ruixiong Tian, and Qingbo Wang. Dynamic provisioning modeling for virtualized multi-tier applications in cloud data center. In Proceedings of the 2010 IEEE 3rd international conference on Cloud Computing (CLOUD), pages 370–377. IEEE, 2010.

[172] Waheed Iqbal, Matthew N Dailey, David Carrera, and Paul Janecek. Adaptive resource provisioning for read intensive multi-tier applications in the cloud. Future Generation Computer Systems, 27(6):871–879, 2011.

[173] Zhiming Shen, Sethuraman Subbiah, Xiaohui Gu, and John Wilkes. Cloudscale: elastic resource scaling for multi-tenant cloud systems. In Proceedings of the 2nd ACM Symposium on Cloud Computing, page 5. ACM, 2011.

[174] Rui Han, Li Guo, Moustafa M Ghanem, and Yike Guo. Lightweight resource scaling for cloud applications. In Proceedings of the 2012 12th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), pages 644–651. IEEE, 2012.

[175] Tania Lorido-Botran, Jose Miguel-Alonso, and Jose A Lozano. A review of auto-scaling techniques for elastic applications in cloud environments. Journal of grid computing, 12(4):559–592, 2014.

[176] Francisco Javier Baldan, Sergio Ramirez-Gallego, Christoph Bergmeir, Jose M Benitez-Sanchez, and Francisco Herrera. A forecasting methodology for workload forecasting in cloud systems. IEEE Transactions on Cloud Computing, 2016.

[177] Jia Rao, Xiangping Bu, Cheng-Zhong Xu, Leyi Wang, and George Yin. Vconf: a reinforcement learning approach to virtual machines auto-configuration. In Proceedings of the 6th international conference on Autonomic computing, pages 137–146. ACM, 2009.

[178] Smita Vijayakumar, Qian Zhu, and Gagan Agrawal. Dynamic resource provisioning for data streaming applications in a cloud environment. In Proceedings of the 2010 IEEE Second International Conference on Cloud Computing Technology and Science (CloudCom), pages 441–448. IEEE, 2010.

[179] Ali Yadavar Nikravesh, Samuel A Ajila, and Chung-Horng Lung. Cloud resource auto-scaling system based on hidden markov model (hmm). In Proceedings of the 2014 IEEE International Conference on Semantic Computing (ICSC), pages 124–127. IEEE, 2014.

[180] Alok Gautam Kumbhare, Yogesh Simmhan, and Viktor K Prasanna. Plasticc: Predictive look-ahead scheduling for continuous dataflows on clouds. In 2014 14th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), pages 344–353. IEEE, 2014.

[181] Tom ZJ Fu, Jianbing Ding, Richard TB Ma, Marianne Winslett, Yin Yang, and Zhenjie Zhang. Drs: dynamic resource scheduling for real-time analytics over fast streams. In Proceedings of the 2015 IEEE 35th International Conference on Distributed Computing Systems (ICDCS), pages 411–420. IEEE, 2015.

[182] Federico Lombardi, Leonardo Aniello, Silvia Bonomi, and Leonardo Querzoni. Elastic symbiotic scaling of operators and resources in stream processing systems. IEEE Transactions on Parallel and Distributed Systems, 29(3):572–585, 2018.

[183] Claus Pahl and Brian Lee. Containers and clusters for edge cloud architectures–a technology review. In Proceedings of the 2015 3rd International Conference on Future Internet of Things and Cloud (FiCloud), pages 379–386. IEEE, 2015.

[184] Xiaolin Li, Marc Parizeau, and Réjean Plamondon. Training hidden markov models with multiple observations – a combinatorial method. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(4):371–377, 2000.

[185] Xunyun Liu and Rajkumar Buyya. Performance-oriented deployment of streaming applications on cloud. IEEE Transactions on Big Data, 2017.

[186] Rajdeep Dua, A Reddy Raja, and Dharmesh Kakadia. Virtualization vs containerization to support paas. In Proceedings of the 2014 IEEE International Conference on Cloud Engineering (IC2E), pages 610–614. IEEE, 2014.

[187] Claus Pahl. Containerization and the paas cloud. IEEE Cloud Computing, 2(3):24–31, 2015.

[188] Asraa Abdulrazak Ali Mardan and Kenji Kono. Containers or hypervisors: Which is better for database consolidation? In Proceedings of the 2016 IEEE International Conference on Cloud Computing Technology and Science (CloudCom), pages 564–571. IEEE, 2016.

[189] John F Shortle, James M Thompson, Donald Gross, and Carl M Harris. Fundamentals of queueing theory, volume 399. John Wiley & Sons, 2018.

[190] Gunho Lee, Byung-Gon Chun, and Randy H Katz. Heterogeneity-aware resource allocation and scheduling. University of California, Berkeley. Citeseer, 2011.

[191] Hadi Goudarzi and Massoud Pedram. Multi-dimensional sla-based resource allocation for multi-tier cloud computing systems. In Proceedings of the 2011 IEEE International Conference on Cloud Computing (CLOUD), pages 324–331. IEEE, 2011.

[192] Dorian Minarolli and Bernd Freisleben. Utility-based resource allocation for virtual machines in cloud computing. In Proceedings of the 2011 IEEE Symposium on Computers and Communications (ISCC), pages 410–417. IEEE, 2011.

[193] Mohamed Abu Sharkh, Abdallah Shami, and Abdelkader Ouda. Optimal and suboptimal resource allocation techniques in cloud computing data centers. Journal of Cloud Computing, 6(1):6, 2017.

[194] Matteo Nardelli, Christoph Hochreiner, and Stefan Schulte. Elastic provisioning of virtual machines for container deployment. In Proceedings of the 8th ACM/SPEC International Conference on Performance Engineering Companion, pages 5–10. ACM, 2017.

[195] Javier Cervino, Evangelia Kalyvianaki, Joaquin Salvachua, and Peter Pietzuch. Adaptive provisioning of stream processing systems in the cloud. In Proceedings of the 2012 IEEE 28th International Conference on Data Engineering Workshops (ICDEW), pages 295–301. IEEE, 2012.

[196] Sareh Fotuhi Piraghaj, Amir Vahid Dastjerdi, Rodrigo N Calheiros, and Rajkumar Buyya. Efficient virtual machine sizing for hosting containers as a service (services 2015). In Proceedings of the 2015 IEEE World Congress on Services (SERVICES), pages 31–38. IEEE, 2015.

[197] Uchechukwu Awada and Adam Barker. Improving resource efficiency of container-instance clusters on clouds. In Proceedings of the 2017 17th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), pages 929–934. IEEE, 2017.

[198] J Ben Rosen. Existence and uniqueness of equilibrium points for concave n-person games. Econometrica: Journal of the Econometric Society, pages 520–534, 1965.