Complex Event Processing As a Service in Multi-Cloud Environments
Total Page:16
File Type:pdf, Size:1020Kb
Western University Scholarship@Western Electronic Thesis and Dissertation Repository August 2016 Complex Event Processing as a Service in Multi-Cloud Environments Wilson A. Higashino The University of Western Ontario Supervisor Dr. Miriam A. M. Capretz The University of Western Ontario Graduate Program in Electrical and Computer Engineering A thesis submitted in partial fulfillment of the equirr ements for the degree in Doctor of Philosophy © Wilson A. Higashino 2016 Follow this and additional works at: https://ir.lib.uwo.ca/etd Part of the Databases and Information Systems Commons Recommended Citation Higashino, Wilson A., "Complex Event Processing as a Service in Multi-Cloud Environments" (2016). Electronic Thesis and Dissertation Repository. 4016. https://ir.lib.uwo.ca/etd/4016 This Dissertation/Thesis is brought to you for free and open access by Scholarship@Western. It has been accepted for inclusion in Electronic Thesis and Dissertation Repository by an authorized administrator of Scholarship@Western. For more information, please contact [email protected]. Abstract The rise of mobile technologies and the Internet of Things, combined with advances in Web technologies, have created a new Big Data world in which the volume and velocity of data gen- eration have achieved an unprecedented scale. As a technology created to process continuous streams of data, Complex Event Processing (CEP) has been often related to Big Data and used as a tool to obtain real-time insights. However, despite this recent surge of interest, the CEP market is still dominated by solutions that are costly and inflexible or too low-level and hard to operate. To address these problems, this research proposes the creation of a CEP system that can be o↵ered as a service and used over the Internet. Such a CEP as a Service (CEPaaS) system would give its users CEP functionalities associated with the advantages of the services model, such as no up-front investment and low maintenance cost. Nevertheless, creating such a service involves challenges that are not addressed by current CEP systems. This research proposes solutions for three open problems that exist in this context. First, to address the problem of understanding and reusing existing CEP management pro- cedures, this research introduces the Attributed Graph Rewriting for Complex Event Process- ing Management (AGeCEP) formalism as a technology- and language-agnostic representa- tion of queries and their reconfigurations. Second, to address the problem of evaluating CEP query management and processing strategies, this research introduces CEPSim, a simulator of cloud-based CEP systems. Finally, this research also introduces a CEPaaS system based on a multi-cloud architecture, container management systems, and an AGeCEP-based multi-tenant design. To demonstrate its feasibility, AGeCEP was used to design an autonomic manager and a selected set of self-management policies. Moreover, CEPSim was thoroughly evaluated by experiments that showed it can simulate existing systems with accuracy and low execution overhead. Finally, additional experiments validated the CEPaaS system and demonstrated it achieves the goal of o↵ering CEP functionalities as a scalable and fault-tolerant service. In tandem, these results confirm this research significantly advances the CEP state of the art and provides novel tools and methodologies that can be applied to CEP research. Keywords: Complex Event Processing, Cloud Computing, Multi-Cloud, Container Man- agement System, Simulation, Graph Rewriting ii Co-Authorship Statement The work presented in Chapters 4 and 5 (Attributed Graph Rewriting for CEP Management) has been developed in collaboration with Dr. Cedric´ Eichler from INSA Centre Val de Loire / LIFO Laboratory. Dr. Eichler’s main contribution was the introduction of graph rewriting rules as a formalism to express reconfiguration actions of Complex Event Processing queries. In addition, Dr. Eichler contributed with discussions about the scope and goals of the presented formalism, as well as with his expertise about autonomic computing. iii Acknowledgements First and foremost, I would like to thank my supervisor Dr. Miriam Capretz for her supervision, mentoring, and for welcoming me in a new country and making me feel like home. I will always be grateful for the many opportunities you gave me and for the countless hours you spent making me a better person and professional. I would also like to thank my co-supervisor Dr. Luiz Fernando Bittencourt for having me half way in the program, and for his support and orientation through these years. You are a great source of inspiration. To my parents, who have always believed and encouraged me through my journey. To my beloved Ana, who has always supported me and has never left my side even in the most difficult moments. To my brother and my nephew, who inspires me and makes me laugh every Sunday. To Dr. Beatriz Puzzi and Mr. Carlos Taube for your assistance and encouragement. To Dr. Cédric Eichler, Dr. Thierry Monteil, and Dr. Ernesto Exposito for the incredible collaboration in this thesis. To Powersmiths International Corp. for the great project in which I had the opportunity to work, and for providing the data used in many experiments from this thesis. To all my colleagues from TEB 346. During all these years, you came from all around the world and without you this would not have been possible. To all of you who should be thanked, but are not in this symbolic list. You all know who you are. iv Contents Abstract ii Co-Authorship Statement iii Acknowledgements iv List of Figures x List of Tables xiii List of Appendices xiv List of Acronyms xv 1 Introduction 1 1.1 Motivation . 2 1.2 Contributions . 4 1.3 Thesis Organization . 5 2 Background 8 2.1 Event Processing . 8 2.1.1 Stream Processing . 9 2.1.2 Complex Event Processing . 10 2.1.3 Concepts and Terminology . 11 2.1.4 Classification of CEP systems . 12 2.1.5 Query Lifecycle . 14 2.2 Cloud Computing . 17 2.2.1 Cloud Computing Definition . 17 2.2.2 Cloud Computing Architecture . 18 2.2.3 Multiple Cloud Architectures . 20 2.3 Container-Based Virtualization . 22 v 2.3.1 Application Containers . 24 2.3.2 Container Management Systems . 25 2.4 Autonomic Computing . 27 2.5 Summary . 28 3 Literature Review 30 3.1 Complex Event Processing . 30 3.1.1 Traditional Systems . 30 3.1.2 Modern Systems . 32 3.1.3 CEP Services . 40 3.1.4 Multi-Cloud CEP . 41 3.1.5 Comparison . 42 3.1.6 Discussion . 44 3.2 CEP Formal Models . 45 3.3 Cloud Computing Simulators . 46 3.4 Summary . 48 4 Attributed Graph Rewriting for CEP Management - Concepts 49 4.1 Motivation . 49 4.2 Attributed Graph Rewriting for CEP Management . 51 4.2.1 Modelling Queries . 51 4.2.2 Modelling Reconfiguration Actions . 52 4.2.3 Discussion . 52 4.3 Classification of CEP Operators . 53 4.3.1 Operator Type . 54 4.3.2 Sharing . 55 4.3.3 Duplication . 57 4.3.4 Combination . 58 4.3.5 Behaviour . 59 4.3.6 Discussion . 59 4.4 Representation of Queries and Reconfigurations . 59 4.4.1 Query Representation Using ADAGs . 60 4.4.2 Query Reconfiguration Using Graph Rewriting . 66 4.5 Summary . 70 5 Attributed Graph Rewriting for CEP Management - Evaluation 72 5.1 AGeCEP-Based Autonomic Manager . 72 vi 5.1.1 Runtime Environment Representation . 73 5.1.2 MAPE Modules . 73 5.2 Feasibility: Operator Placement . 76 5.2.1 General Principle . 76 5.2.2 Examples . 77 5.3 Feasibility: Self-Management Policies . 78 5.3.1 Operator Combination . 78 5.3.2 Operator Duplication . 79 5.3.3 Removal of an Unnecessary Merge/Split . 81 5.3.4 Processing Sub-Streams (ProcSubS) . 84 5.3.5 Predicate Indexing . 88 5.4 Viability: Performance Evaluation . 89 5.4.1 Simple Policy . 89 5.4.2 Complex Policy . 89 5.5 Summary . 91 6 Complex Event Processing Simulator 93 6.1 Motivation . 93 6.2 System Overview . 94 6.3 CEPSim Foundation . 96 6.3.1 Query Model . 96 6.3.2 Event Sets . 98 6.3.3 Event Set Queues . 100 6.4 CEPSim Simulation . 101 6.4.1 Operator Placement . 101 6.4.2 Operator Scheduling . 101 6.4.3 Operator Simulation . 102 6.4.4 Placement Simulation . 106 6.4.5 Metrics . 110 6.5 Evaluation . 112 6.5.1 Case Study . 112 6.5.2 Environment . 113 6.5.3 Set-Up . 113 6.5.4 Validation . 115 6.5.5 CPU and Memory Overhead . 119 6.5.6 Simulation Parameters . 121 vii 6.5.7 Discussion . 122 6.6 Summary . 123 7 Complex Event Processing as a Service 125 7.1 Motivation . ..