Institut für Organisation und Management von Informationssystemen

An Automation-based Approach for Reproducible Evaluations of Distributed DBMS on Elastic Infrastructures

Dissertation

zur Erlangung des Doktorgrades Dr. rer. nat. der Fakultät für Ingenieurwissenschaften, Informatik und Psychologie der Universität Ulm

vorgelegt von Daniel Seybold aus Crailsheim

2020

This thesis has been written by Daniel Seybold in partial fulfilment of the requirements for a doctoral degree of the Faculty of Engineering, Computer Science and Psychology at Ulm University. It has been submitted on June 30, 2020.

Amtierender Dekan: Prof. Dr.-Ing. Maurits Ortmanns
Erstgutachter: Prof. Dr.-Ing. Dr. h.c. Stefan Wesner
Zweitgutachter: Prof. Dr.-Ing. Samuel Kounev
Tag der Promotion: 08.03.2021

Contact
Daniel Seybold
mail: [email protected]
www: https://www.uni-ulm.de/in/omi/institut/persons/daniel-seybold/

Institute of Information Resource Management
Faculty of Engineering, Computer Sciences and Psychology
University of Ulm
Albert-Einstein-Allee 43
89081 Ulm, Germany

Typesetting

This document was typeset by the author using the XeLaTeX typesetting system, the Meta font family, and the Latin Modern font family. My sincere apologies to those people whose names are not typeset correctly due to limitations of the fonts.

Layout

Many thanks to Jörg Domaschka for sharing the template for this thesis.

©2020 Daniel Seybold

Abstract

Driven by the data-intensive applications of the Web, Big Data and the Internet of Things, Database Management Systems (DBMSs) and their operation have changed significantly over the last decade. Besides relational DBMSs, manifold NoSQL and NewSQL DBMSs have evolved, promising a set of non-functional features that are key requirements for each data-intensive application: high performance, horizontal scalability, elasticity and high availability. In order to take full advantage of these non-functional features, the operation of DBMSs is moving towards elastic infrastructures such as the cloud, since cloud computing enables scalability and elasticity on the resource level. Therefore, the storage backend of data-intensive applications is commonly implemented by distributed DBMSs operated on cloud resources.

However, the sheer number of heterogeneous DBMSs and cloud resource offers, and the resulting number of combinations, make the selection and operation of DBMSs a very challenging task. Supportive analyses of the non-functional DBMS features are therefore essential. Yet, the design and execution of such analyses is a complex process that requires detailed knowledge of multiple domains. First, the multitude of DBMS technologies with their respective runtime parameters needs to be considered. Secondly, the tremendous number of resource offers, including their volatile characteristics, needs to be taken into account. Thirdly, the application-specific workload has to be created by suitable DBMS benchmarks. While supportive DBMS benchmarks focus only on DBMS performance, the evaluation design and execution for advanced non-functional features such as scalability, elasticity and availability becomes even more challenging.

This thesis enables the holistic evaluation of distributed DBMSs on elastic infrastructures by defining a supportive methodology that determines the domain-specific impact factors for designing comprehensive DBMS evaluations and establishes a set of evaluation principles to ensure significant results. Moreover, reproducible evaluation processes for the non-functional features performance, scalability, elasticity and availability are established. Based on these concepts, the novel DBMS evaluation framework Mowgli is introduced. It supports the design and automated execution of performance and scalability evaluation processes. To this end, Mowgli manages cloud resources, DBMS deployment, workload execution and result processing based on evaluation scenarios that expose configurable domain-specific parameters. Mowgli follows the established evaluation principles with a dedicated focus on automated and reproducible evaluation execution. Mowgli is extended by the Kaa framework, which automates the DBMS elasticity evaluation process by enabling DBMS and workload adaptations. The King Louie framework builds upon these features and enables availability evaluations by providing an extensive failure injection framework.

The extensive automation capabilities of Mowgli, Kaa and King Louie ensure reproducible DBMS evaluations on elastic infrastructures. This enables comparable and novel insights into the non-functional features of distributed DBMSs. Moreover, the automation capabilities facilitate the determination of the impact of elastic resources on the non-functional DBMS features.
In conclusion, this thesis provides a novel DBMS evaluation approach, realized by the Mowgli, Kaa and King Louie frameworks, enabling comprehensive DBMS evaluations on elastic infrastructures with a dedicated focus on advanced non-functional features as well as automated and reproducible evaluation processes.


Zusammenfassung

Angetrieben durch die datenintensiven Anwendungen des Web, Big Data und Internet der Dinge, haben sich die Datenbankmanagementsysteme (DBMS) und ihr Betrieb in den letzten zehn Jahren erheblich verändert. Neben relationalen DBMS haben sich vielfältige NoSQL- und NewSQL-DBMS entwickelt, welche die Kernanforderungen von datenintensiven Anwendungen versprechen: Performanz, horizontale Skalierbarkeit, Elastizität und Hochverfügbarkeit. Um diese nicht-funktionalen Eigenschaften voll auszunutzen, werden elastische Infrastrukturen wie Cloud Computing für den Betrieb von DBMS herangezogen, um Skalierbarkeit und Elastizität auch auf der Ressourcenebene zu ermöglichen. Daher werden moderne Speicherdienste datenintensiver Anwendungen durch verteilte DBMS implementiert, die auf Cloud-Ressourcen betrieben werden.

Doch die bloße Anzahl heterogener DBMS, Cloud-Ressourcenangebote und die daraus resultierenden Kombinationen machen die Auswahl und den Betrieb von DBMS zu einer komplexen Herausforderung. Daher sind unterstützende Analysen der nicht-funktionalen DBMS-Eigenschaften unerlässlich. Jedoch sind Design und Ausführung solcher Analysen komplexe Prozesse, die mehrschichtiges Domänenwissen erfordern. Zunächst müssen die DBMS mit ihren Laufzeitparametern betrachtet werden. Weiter muss die enorme Anzahl von Ressourcenangeboten mit ihren flüchtigen Eigenschaften berücksichtigt werden. Abschließend muss die Anwendungslast durch geeignete DBMS-Benchmarks erzeugt werden. Bestehende DBMS-Benchmarks unterstützen hierbei nur die Erzeugung der Anwendungslast. Zudem zielen sie primär auf die DBMS-Performanz ab, während die Analyse von Skalierbarkeit, Elastizität und Verfügbarkeit außen vor bleibt.

Diese Thesis ermöglicht die ganzheitliche Analyse von DBMS auf elastischen Infrastrukturen durch die Definition einer unterstützenden Methodik. Diese bestimmt die domänenspezifischen Einflussfaktoren für das Design umfassender DBMS-Analysen und definiert Evaluationsprinzipien, um signifikante Ergebnisse zu gewährleisten. Zudem werden reproduzierbare Analyseprozesse für die nicht-funktionalen Eigenschaften Performanz, Skalierbarkeit, Elastizität und Verfügbarkeit definiert. Basierend auf dieser Methodik wird das neuartige DBMS-Evaluations-Framework Mowgli bereitgestellt, das den Evaluationsprozess für Performanz und Skalierbarkeit automatisiert. Mowgli verwaltet Cloud-Ressourcen, DBMS-Bereitstellung, Lasterzeugung und die Ergebnisverarbeitung auf Basis von konfigurierbaren Evaluationsszenarien. Mowgli folgt den Evaluationsprinzipien mit Fokus auf automatisierte und reproduzierbare Evaluationen. Mowgli wird durch das Kaa Framework erweitert, das den Elastizitätsbewertungsprozess automatisiert, indem es DBMS- und Lastanpassungen automatisiert. Das King Louie Framework baut auf diesen Merkmalen auf und ermöglicht die DBMS-Verfügbarkeitsbewertung, indem es ein umfangreiches Fehlerinjektions-Framework bereitstellt.

Mowglis umfangreiche Automatisierungskonzepte sowie die Erweiterungen Kaa und King Louie gewährleisten reproduzierbare DBMS-Evaluationen auf elastischen Infrastrukturen, die neuartige und vergleichbare Ergebnisse der nicht-funktionalen DBMS-Eigenschaften ermöglichen. Darüber hinaus erleichtern sie die Bestimmung des Einflusses elastischer Ressourcen auf die nicht-funktionalen DBMS-Eigenschaften.
Zusammenfassend stellt diese Thesis ein neuartiges DBMS-Evaluations-Framework vor, das ganzheitliche DBMS-Evaluationen auf elastischen Infrastrukturen ermöglicht, mit einem speziellen Fokus auf fortgeschrittene nicht-funktionale Merkmale sowie automatisierte und reproduzierbare Evaluationsprozesse.


Acknowledgements

The time as a doctoral researcher has become one of the most challenging, but also most exciting parts of my life. During this time, I was able to meet many inspiring people who have contributed to this thesis in their own way. I am truly thankful to all of them.

Especially, I want to thank my supervisor Stefan Wesner for offering me the position as doctoral researcher, introducing me to the research field of cloud computing, and guiding me through the whole journey of this thesis and numerous research projects while always giving me enough freedom to develop my own ideas. Many thanks also to my second thesis reviewer Samuel Kounev for the inspiring discussions on performance engineering.

I want to thank all of my colleagues I have met during my time at the OMI. It was a great experience to develop new research ideas together, work on projects and write proposals, but also to share all the additional activities from coffee breaks to barbecues, hiking and skiing trips. Many of you were not only great colleagues but also became friends. In particular, I want to thank Christopher Hauser for being such a great office mate and a friend for over five years. I want to thank my former colleague Frank Griesinger for the cooperation in different European research projects, including several memorable project travels. I want to thank Volker Foth, Florian Held and Diana Denk for the diverting table soccer matches during the lunch breaks. Thanks also to Daniel Baur for the numerous discussions on cloud orchestration and how to get things done in projects. I want to thank Mark Leznik for the inspiring discussions on how machine learning can be applied to my field of research. I want to thank Simon Volpert for all the technical support at any time and for always introducing a brand new technology.

A special thanks goes to my colleague Jörg Domaschka, who encouraged me to pursue this journey in the first place. Moreover, I want to thank Jörg for his guidance, patience, critical but always constructive feedback on research ideas and continuous support throughout my thesis, even when he was already overloaded with his own work.

I also want to thank all the people from all over the world that I have met at conferences and in projects. You have enriched my life both research- and culture-wise. I also want to thank all of my friends in Ulm and Crailsheim for the many diverting adventures during the time of my thesis that helped me to free my mind and gather focus for new research activities.

Finally, I want to thank my whole family and especially my parents Petra and Friedrich for their continuous and unconditional support over my whole life; without you I would have never made it to this point. And of course I want to thank my girlfriend Juliane Eder for her love, support and for encouraging me to start this journey despite the knowledge that it would mean many days apart from each other. Without all of you, I would never have been able to write this thesis.


List of Publications

This cumulative dissertation is a consolidated report of the research results obtained during the author’s doctoral studies. The detailed results have been published in the following peer-reviewed publications that are included in Part II of this thesis. In addition, supplemental data sets and software artefacts have been published.

Core Publications

Journal Publications

[core1] Somnath Mazumdar, Daniel Seybold, Kyriakos Kritikos, and Yiannis Verginadis. “A survey on data storage and placement methodologies for Cloud-Big Data ecosystem”. In: Journal of Big Data 6.1 (Feb. 2019), p. 15. issn: 2196-1115. doi: 10.1186/s40537-019-0178-3.

Conference Publications

[core2] Daniel Baur, Daniel Seybold, Frank Griesinger, Athanasios Tsitsipas, Christopher B. Hauser, and Jörg Domaschka. “Cloud orchestration features: Are tools fit for purpose?” In: Utility and Cloud Computing (UCC), 2015 IEEE/ACM 8th International Conference on. IEEE. 2015, pp. 95–101. doi: 10.1109/UCC.2015.25.
[core3] Daniel Seybold and Jörg Domaschka. “Is Distributed Database Evaluation Cloud-Ready?” In: European Conference on Advances in Databases and Information Systems (ADBIS) - New Trends in Databases and Information Systems (Short Papers). Cham: Springer International Publishing, 2017, pp. 100–108. isbn: 978-3-319-67162-8. doi: 10.1007/978-3-319-67162-8_12.
[core4] Daniel Seybold, Christopher B. Hauser, Simon Volpert, and Jörg Domaschka. “Gibbon: An Availability Evaluation Framework for Distributed Databases”. In: On the Move to Meaningful Internet Systems. OTM 2017 Conferences. Cham: Springer International Publishing, 2017, pp. 31–49. isbn: 978-3-319-69459-7. doi: 10.1007/978-3-319-69459-7_3.
[core5] Daniel Seybold, Moritz Keppler, Daniel Gründler, and Jörg Domaschka. “Mowgli: Finding Your Way in the DBMS Jungle”. In: Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering. ICPE ’19. Mumbai, India: ACM, 2019, pp. 321–332. isbn: 978-1-4503-6239-9. doi: 10.1145/3297663.3310303.
[core6] Daniel Seybold, Simon Volpert, Stefan Wesner, André Bauer, Nikolas Herbst, and Jörg Domaschka. “Kaa: Evaluating Elasticity of Cloud-Hosted DBMS”. In: 2019 IEEE International Conference on Cloud Computing Technology and Science (CloudCom). Dec. 2019, pp. 54–61. doi: 10.1109/CloudCom.2019.00020.


[core7] Daniel Seybold, Stefan Wesner, and Jörg Domaschka. “King Louie: Reproducible Availability Benchmarking of Cloud-hosted DBMS”. In: 35th ACM/SIGAPP Symposium on Applied Computing (SAC ’20), March 30–April 3, 2020, Brno, Czech Republic. Apr. 2020, pp. 144–153. doi: 10.1145/3341105.3373968.

Workshop Publications

[core8] Daniel Seybold, Nicolas Wagner, Benjamin Erb, and Jörg Domaschka. “Is Elasticity of Scalable Databases a Myth?” In: 2016 IEEE International Conference on Big Data (Big Data). Dec. 2016, pp. 2827–2836. doi: 10.1109/BigData.2016.7840931.
[core9] Daniel Seybold. “Towards a Framework for Orchestrated Distributed Database Evaluation in the Cloud”. In: Proceedings of the 18th Doctoral Symposium of the 18th International Middleware Conference. Middleware ’17. Las Vegas, Nevada: ACM, 2017, pp. 13–14. isbn: 978-1-4503-5199-7. doi: 10.1145/3152688.3152693.
[core10] Daniel Seybold, Christopher B. Hauser, Georg Eisenhart, Simon Volpert, and Jörg Domaschka. “The Impact of the Storage Tier: A Baseline Performance Analysis of Containerized DBMS”. In: Euro-Par 2018: Parallel Processing Workshops. Cham: Springer International Publishing, 2018, pp. 93–105. isbn: 978-3-030-10549-5. doi: 10.1007/978-3-030-10549-5_8.
[core11] Jörg Domaschka and Daniel Seybold. “Towards Understanding the Performance of Distributed Database Management Systems in Volatile Environments”. In: Symposium on Software Performance. Vol. 39. Gesellschaft für Informatik. 2019, pp. 11–13. url: https://pi.informatik.uni-siegen.de/stt/39_4/01_Fachgruppenberichte/SSP2019/SSP2019_Domaschka.pdf.

Software Artefacts & Data Sets

[data1] Daniel Seybold and Jörg Domaschka. Mowgli: DBMS Elasticity Evaluation Data Sets. Zenodo. Aug. 2019. doi: 10.5281/zenodo.3362279.
[data2] Daniel Seybold and Jörg Domaschka. Mowgli: DBMS Performance & Scalability Evaluation Data Sets. Zenodo. Oct. 2019. doi: 10.5281/zenodo.3518786.
[data3] Daniel Seybold and Jörg Domaschka. Mowgli: Finding Your Way in the DBMS Jungle. Version 0.1. Zenodo. July 2019. doi: 10.5281/zenodo.3341512.
[data4] Daniel Seybold, Stefan Wesner, and Jörg Domaschka. King Louie: DBMS Availability Evaluation Data Sets. Sept. 2019. doi: 10.5281/zenodo.3459604.
[data5] Daniel Seybold and Jörg Domaschka. Cloud-hosted DBMS Performance, Scalability and Availability Evaluation Data. June 2020. doi: 10.5281/zenodo.3901428.
[data6] Daniel Seybold, Christopher B. Hauser, Georg Eisenhart, Simon Volpert, and Jörg Domaschka. Performance Results of a Containerized MongoDB DBMS. June 2020. doi: 10.5281/zenodo.3894855.

Additional Publications

During the author’s doctoral studies, he has been involved in the following additional publications that partially influenced this dissertation:

Journal Publications

[add1] Frank Griesinger et al. “BPaaS in Multi-cloud Environments - The CloudSocket Approach”. In: European Space Projects: Developments, Implementations and Impacts in a Changing World - Volume 1: EPS Porto 2017, INSTICC. SciTePress, 2017, pp. 50–74. isbn: 978-989-758-311-7. doi: 10.5220/0007901700500074.
[add2] Achilleas P. Achilleos et al. “The cloud application modelling and execution language”. In: Journal of Cloud Computing 8.1 (Dec. 2019), p. 20. issn: 2192-113X. doi: 10.1186/s13677-019-0138-7.
[add3] Kyriakos Kritikos, Chrysostomos Zeginis, Joaquin Iranzo, Roman Sosa Gonzalez, Daniel Seybold, Frank Griesinger, and Jörg Domaschka. “Multi-cloud provisioning of business processes”. In: Journal of Cloud Computing 8.1 (Nov. 2019), p. 18. issn: 2192-113X. doi: 10.1186/s13677-019-0143-x.

Conference Publications

[add4] Daniel Seybold, Jörg Domaschka, Alessandro Rossini, Christopher B. Hauser, Frank Griesinger, and Athanasios Tsitsipas. “Experiences of models run-time with EMF and CDO”. In: Proceedings of the 2016 ACM SIGPLAN International Conference on Software Language Engineering. SLE 2016. ACM. Amsterdam, Netherlands, 2016, pp. 46–56. doi: 10.1145/2997364.2997380.
[add5] Jörg Domaschka, Frank Griesinger, Daniel Seybold, and Stefan Wesner. “A Cloud-driven View on Business Process as a Service”. In: CLOSER. INSTICC. SciTePress, 2017, pp. 739–746. isbn: 978-989-758-243-1. doi: 10.5220/0006393107670774.
[add6] Kyriakos Kritikos, Chrysostomos Zeginis, Frank Griesinger, Daniel Seybold, and Jörg Domaschka. “A cross-layer bpaas adaptation framework”. In: 2017 IEEE 5th International Conference on Future Internet of Things and Cloud (FiCloud). IEEE. 2017, pp. 241–248. doi: 10.1109/FiCloud.2017.12.
[add7] Daniel Baur, Daniel Seybold, Frank Griesinger, Hynek Masata, and Jörg Domaschka. “A provider-agnostic approach to multi-cloud orchestration using a constraint language”. In: Proceedings of the 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing. IEEE. 2018, pp. 173–182. doi: 10.1109/CCGRID.2018.00032.
[add8] Mark Leznik, Simon Volpert, Frank Griesinger, Daniel Seybold, and Jörg Domaschka. “Done yet? A critical introspective of the cloud management toolbox”. In: 2018 IEEE International Conference on Engineering, Technology and Innovation (ICE/ITMC). IEEE. 2018, pp. 1–8. doi: 10.1109/ICE.2018.8436348.

Workshop Publications

[add9] Jörg Domaschka, Daniel Baur, Daniel Seybold, and Frank Griesinger. “Cloudiator: a cross-cloud, multi-tenant deployment and runtime engine”. In: 9th Symposium and Summer School on Service-Oriented Computing. 2015. url: https://domino.research.ibm.com/library/cyberdig.nsf/papers/656B934403848E8A85257F1D00695A63/$File/rc25564.pdf.
[add10] Jörg Domaschka, Daniel Seybold, Frank Griesinger, and Daniel Baur. “Axe: A novel approach for generic, flexible, and comprehensive monitoring and adaptation of cross-cloud applications”. In: European Conference on Service-Oriented and Cloud Computing. Springer. 2015, pp. 184–196. isbn: 978-3-319-33313-7. doi: 10.1007/978-3-319-33313-7_14.
[add11] Frank Griesinger, Daniel Seybold, Jörg Domaschka, Kyriakos Kritikos, and Robert Woitsch. “A DMN-Based Approach for Dynamic Deployment Modelling of Cloud Applications”. In: European Conference on Service-Oriented and Cloud Computing. Springer. 2016, pp. 104–111. isbn: 978-3-319-72125-5. doi: 10.1007/978-3-319-72125-5_8.
[add12] Daniel Seybold, Robert Woitsch, Jörg Domaschka, and Stefan Wesner. “BPaaS Execution in CloudSocket”. In: Advances in Service-Oriented and Cloud Computing (ESOCC 2016) (2018), pp. 292–293. url: https://link.springer.com/content/pdf/bbm%3A978-3-319-72125-5%2F1.pdf.

Table of Contents

Table of Contents xv

List of Figures xvii

List of Tables xix

List of Listings xxi

I Thesis 1

1 Introduction 3
1.1 Problem Statement 4
1.2 Research Objectives 8
1.3 Research Contributions 9
1.4 Thesis Outline 10

2 Background 11
2.1 Elastic Infrastructures 11
2.1.1 Virtualization 11
2.1.2 Cloud Computing 12
2.1.3 Fog and Edge Computing 13
2.1.4 Application Orchestration on Elastic Infrastructures 14
2.2 Distributed DBMS 14
2.2.1 Data Models 14
2.2.2 Data Distribution Techniques 16
2.2.3 Non-Functional Features 19
2.2.4 Boundaries: CAP and PACELC 20
2.3 Operational Models of Distributed DBMS 21
2.4 Summary 23

3 Related Work 25
3.1 Terminology 25
3.2 DBMS Benchmarking 26
3.2.1 OLTP 27
3.2.2 OLAP 28
3.2.3 HTAP 28


3.3 Elastic Infrastructure Benchmarks 28
3.3.1 Cloud Benchmarks 29
3.3.2 Edge and Fog Benchmarks 29
3.4 Advanced Evaluation Frameworks 30
3.5 Summary 30

4 Methodological DBMS Evaluation 33
4.1 Evaluation Impact Factors 34
4.2 Cross-Domain Evaluation Principles 38
4.3 Evaluation Design 40
4.3.1 Performance Evaluation Design 41
4.3.2 Scalability Evaluation Design 42
4.3.3 Elasticity Evaluation Design 43
4.3.4 Availability Evaluation Design 44
4.4 Summary 48

5 Methods for the Automated Evaluation of Non-functional DBMS Features 49
5.1 DBMS Evaluation Templates 49
5.1.1 Deployment Template 51
5.1.2 Workload Template 52
5.1.3 Adaptation Templates 54
5.1.4 Implementation 55
5.2 Mowgli Framework 55
5.2.1 Automation Concepts 56
5.2.2 Framework Architecture 56
5.2.3 Implementation 58
5.3 Mowgli for Higher-Level Evaluation Objectives 58
5.3.1 Elasticity: Kaa Framework 58
5.3.2 Availability: King Louie Framework 60
5.4 Evaluation Data Collection 61
5.5 Summary 63

6 Validation 65
6.1 Case Studies 65
6.1.1 CS1 - Performance Impact of Elastic Resources 65
6.1.2 CS2 - Performance and Scalability 67
6.1.3 CS3 - Elasticity 68
6.1.4 CS4 - Availability 68
6.1.5 Case Study Discussion 69
6.2 Support for Evaluation Principles 70
6.3 Reflections on Evaluating Distributed DBMS on Elastic Infrastructures 72
6.4 Summary 74

7 Conclusion and Future Work 75
7.1 Contribution 76
7.2 Future Research 77

Acronyms 79

Bibliography 81

II Publications 101

8 [core1] A survey on data storage and placement methodologies for Cloud-Big Data ecosystem 103

9 [core2] Cloud orchestration features: Are tools fit for purpose? 141

10 [core3] Is Distributed Database Evaluation Cloud-Ready? 149

11 [core4] Gibbon: An Availability Evaluation Framework for Distributed Databases 159

12 [core5] Mowgli: Finding Your Way in the DBMS Jungle 179

13 [core6] Kaa: Evaluating Elasticity of Cloud-Hosted DBMS 193

14 [core7] King Louie: Reproducible Availability Benchmarking of Cloud-hosted DBMS 203

15 [core8] Is elasticity of scalable databases a Myth? 215

16 [core9] Towards a Framework for Orchestrated Distributed Database Evaluation in the Cloud 227

17 [core10] The Impact of the Storage Tier: A Baseline Performance Analysis of Containerized DBMS 231

18 [core11] Towards Understanding the Performance of Distributed Database Management Systems in Volatile Environments 245

List of Figures

1.1 Determining the DBMS operational model on elastic infrastructures 5
1.2 Thesis scope — reproducible evaluation design and execution for higher-level objectives 7

2.1 Data distribution techniques overview 16
2.2 DBMS operational models on cloud, edge and fog resources 21

3.1 Related research concepts 32

4.1 Evaluation impact factors 34
4.2 DBMS impact factors 35
4.3 Resource impact factors 37
4.4 Workload impact factors 37
4.5 Performance evaluation process 41
4.6 Scalability evaluation process 44
4.7 Elasticity evaluation process 46
4.8 Availability evaluation process 47

5.1 Mowgli architecture 57
5.2 Kaa and King Louie framework architecture 59


List of Tables

4.1 Performance evaluation tasks 42
4.2 Scalability evaluation tasks 43
4.3 Elasticity evaluation tasks 45
4.4 Availability evaluation tasks 47

6.1 Case study metrics 66


List of Listings

5.1 Evaluation scenario template 50
5.2 Exemplary evaluation API 50
5.3 Cloud deployment template 51
5.4 Workload template YCSB 53
5.5 Elasticity Template 54
5.6 Resource failure template 55


Part I

Thesis


Chapter 1 Introduction

Information Technology (IT) has undergone significant advances within the last two decades, especially when it comes to processing and storing data. The adoption of large-scale web applications has resulted in a constantly growing number of users. These users, in turn, produce a growing amount of data that needs to be stored and processed. Consequently, the predicted compound annual growth rate of the data management market is over 10% per year [162]. Moreover, its revenue will increase from $89 billion in 2017 to over $140 billion in 2022 [162]. According to 451 Research, this growth is mainly driven by two key technologies of the data management market: distributed data processing frameworks and distributed Database Management Systems (DBMSs) [162]. This thesis focuses on distributed DBMS technologies, which are the storage backend of many data processing frameworks. Emerging application domains such as the Internet of Things (IoT) and Big Data further increase the amount of data, resulting in a strong need for DBMSs that ensure high-throughput and low-latency request processing, independent of the data size. Moreover, these data-intensive application domains impose new challenges on DBMSs: (geo-)distributed data storage, variety of data, dynamic workload patterns and the need for highly available data [127]. Traditionally, the DBMS landscape has been shaped by Relational Database Management Systems (RDBMSs), which used to be the common solution to persist data since the 1970s. Yet, already in the mid-2000s the end of their predominance was predicted [22, 27]. This prediction has come true over the last decade. The DBMS landscape moved towards application-specific data models and distributed architectures to address the requirements of the emerging data-intensive application domains. Driven by new DBMS technologies such as Amazon’s Dynamo [25], Google’s Bigtable [30] or Facebook’s Cassandra [42], a new class of DBMSs has been established, the NoSQL DBMSs [51]. NoSQL DBMSs are characterized by providing flexible, non-relational data models and building upon a shared-nothing architecture that facilitates distributed operation, including replication and sharding mechanisms [51, 59]. By the end of 2019, the number of existing NoSQL DBMSs had grown to over 220¹. With the widespread adoption of distributed NoSQL DBMSs [47], in 2012 the DBMS landscape was extended by a further class of distributed DBMSs, the NewSQL DBMSs [72]. NewSQL DBMSs seek to provide the same shared-nothing architecture, sharding and replication mechanisms as NoSQL DBMSs while still maintaining the relational data model and strict consistency guarantees for transactions [137], as RDBMSs do. By the end of 2019, the number of established NewSQL DBMSs had grown to over 20¹. NoSQL and NewSQL DBMSs promise a set of non-functional features that are key requirements for each data-intensive application domain: high performance in terms of low-latency and high-throughput request processing [9, 105], horizontal scalability to be able to deal with growing workloads and data sizes [40, 46], elasticity to cope with workload fluctuations at runtime [46] and high availability to overcome failures [33, 41].

¹ http://nosql-database.org/


In order to fully benefit from these non-functional features, the operation of distributed DBMSs has to evolve and move towards elastic resources that provide scalability and elasticity also on the resource level [105, 127]. Elastic infrastructures have become a formative topic in both academia and industry with the opening of the first commercial cloud service by Amazon Web Services (AWS)². Nowadays, the number of cloud resource offers has already grown to over 20,000 in the public cloud provider market³, not including the manifold private cloud resources. The main advantage of elastic infrastructures is the delivery of the necessary mechanisms to enable scalability on the resource level by providing virtually unlimited resources in an elastic manner [56]. In consequence, elastic resources have become the preferred resources to operate distributed DBMSs [211]. In this context, Gartner predicts that by 2022, 75% of all DBMSs will be operated in the cloud [193].

1.1 Problem Statement

The introduced advantages of the DBMS and resource domain offer extensive technical possibilities to tackle the challenges of modern data-intensive applications. But these advantages also bring a new set of challenges regarding the selection and operation of DBMSs. In particular, cloud computing and distributed DBMSs have driven the disappearance of the ‘one size fits all’ strategy, where single-node RDBMSs operated on dedicated hardware appliances were the predominant solution for most applications [22]. Nowadays, a system architect needs to thoroughly analyse a plethora of modern DBMS and elastic resource offers in the process of selecting and operating the optimal data storage backend [89]. The focus of such analyses also depends on the industry grade of the application. On the one hand, greenfield applications require the selection of the optimal elastic resource and DBMS combination from all offers, fulfilling the application-driven functional and non-functional features. On the other hand, there are enterprise applications that build upon existing data structures. If these applications move towards elastic infrastructures, there is the need to evaluate modern DBMS technologies that provide the required data model, identify the optimal elastic resources and refine the DBMS runtime configurations to ensure the efficiency of the non-functional DBMS features on elastic infrastructures. In consequence, the determination of the DBMS operational model for data-intensive applications has become considerably more extensive and challenging [105, 107, 211]. Within this thesis, we follow established definitions of the DBMS operational model as the combination of the DBMS, its runtime configuration and the applied elastic resources [105, 107, 211]. In consequence, the operational model of distributed DBMSs on elastic infrastructures needs to be determined from RDBMS, NoSQL DBMS and NewSQL DBMS offers [127] and heterogeneous elastic resource offers [89, 107]. The runtime configurations comprise heterogeneous DBMS- and distribution-specific options [86, 150]. To highlight these challenges, Figure 1.1 depicts an exemplary decision process to determine the DBMS operational model for a data-intensive application. The decision process consists of three analytical steps: (i) functional feature evaluation; (ii) non-functional feature evaluation and (iii) operational model selection.

² https://www.computerworlduk.com/galleries/cloud-computing/aws-12-defining-moments-for-the-cloud-giant-3636947/
³ https://cloudharmony.com/


Figure 1.1: Determining the DBMS operational model on elastic infrastructures

Functional feature evaluation First, the functional feature evaluation is carried out by the requirements engineer with the goal to determine a set of eligible DBMS and elastic resource offers. Therefore, the application-specific functional requirements are compared with the functional features of the DBMS and elastic resource offers. This step is carried out with the objective to reduce the number of potentially suitable DBMS and elastic resource offers. Depending on the industry grade of the considered DBMSs, this step is supported by DBMS-specific documentation, industrial experience reports and academic decision guidelines [51, 54, 150, 203] providing comparative studies of the functional features and qualitative recommendations.

Non-functional feature evaluation Secondly, the non-functional feature evaluation is a continuous process that is carried out by the performance engineer, analysing the non-functional DBMS features of a set of selected DBMSs in experiment-driven scenarios. Based on the eligible DBMS technologies and their non-functional features, the performance engineer needs to consider multiple evaluation objectives in this step, as depicted in Figure 1.1. For DBMSs in general, performance is the most important non-functional feature [9]. Yet, data-intensive applications drive the need to evaluate higher-level non-functional features of distributed DBMSs that go beyond performance. Horizontal scalability enables a distributed DBMS to handle arbitrary workload sizes by operating the DBMS cluster accordingly, exploiting the “unlimited” capacity of elastic resources [46, 105]. Building upon horizontal scalability, elasticity is a further required non-functional DBMS feature to cope with sudden workload fluctuations by adapting the DBMS cluster at runtime without service disruption, where elastic resources provide the elasticity on the resource level [46, 105]. Besides their advantages, elastic resources come with a higher risk of resource failures [38, 132]. In consequence, DBMS availability is becoming an important non-functional feature for operating distributed DBMSs on elastic infrastructures [98], especially as the availability of a distributed storage system is always a trade-off between consistency and partition tolerance, as defined by the CAP theorem [15]. In consequence, the scope of the evaluation objectives that need to be considered by the performance engineer has significantly increased. In order to evaluate these objectives, realistic workloads are required that reflect the characteristics of the target application domain. The requirements of Web, Big Data and IoT applications are leading to a growing number of DBMS benchmarks that enable the emulation of different workload types such as Online Transaction Processing (OLTP) [40, 82, 146], Online Analytical Processing (OLAP) [67, 111] and Hybrid Transaction-Analytical Processing (HTAP) [149]. In consequence, the comprehensive design and execution of DBMS evaluations spans the DBMS, elastic resource and workload domain. Hence, the performance engineer requires detailed domain knowledge for each of these domains.
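
To make the notion of such higher-level evaluation objectives more tangible, the following Python sketch derives a simple throughput-based scaling indicator from the results of a hypothetical scale-out evaluation series. It is only an illustration of the general idea; the numbers are invented and the computation does not reproduce the metric definitions established later in this thesis.

from dataclasses import dataclass

@dataclass
class EvaluationResult:
    cluster_size: int       # number of DBMS nodes in the evaluated cluster
    throughput_ops: float   # sustained throughput reported by the benchmark (ops/s)

# Hypothetical results of a scale-out series (illustrative values, not measurements).
results = [
    EvaluationResult(cluster_size=3, throughput_ops=18_000),
    EvaluationResult(cluster_size=6, throughput_ops=33_500),
    EvaluationResult(cluster_size=9, throughput_ops=46_000),
]

baseline = results[0]
for r in results[1:]:
    speedup = r.throughput_ops / baseline.throughput_ops
    ideal = r.cluster_size / baseline.cluster_size
    # A scaling efficiency of 1.0 would correspond to perfectly linear horizontal scalability.
    print(f"{baseline.cluster_size} -> {r.cluster_size} nodes: "
          f"speedup {speedup:.2f}, efficiency {speedup / ideal:.2f}")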

Operational model selection Thirdly, the results of the non-functional feature evaluation represent the input for the system architect who needs to select the optimal DBMS operational model. Therefore, the quantified evaluation results of the non-functional evaluation step and optional business requirements are included in this decision to determine the DBMS operational model. However, the revision of the selected operational model may become necessary due to the rapidly evolving IT landscape. The revision of the DBMS operational model can be initiated by various events, as depicted in Figure 1.1. Events such as new DBMS and elastic resource offers appearing on the market require the execution of both decision steps. More frequent events such as the release of new DBMS versions, changing DBMS workload patterns, varying elastic resource performance or evolving application requirements demand the continuous execution of the non-functional feature evaluation to ensure the consistent efficiency of the operational model.

Challenges of non-functional feature evaluations Each step within the decision process comes with its own challenges, but the non-functional feature evaluation represents the most frequently executed step, as depicted in Figure 1.1. Its continuous characteristic [6] increases its importance whenever new DBMS and elastic resource offers appear on the market, new DBMS versions are released, elastic resource providers revise their offers [190] or their resource performance changes [44, 135]. Therefore, this thesis focuses on the challenges of the evaluation design, evaluation execution and higher-level objectives which concern the performance engineer who evaluates the non-functional features of DBMSs on elastic infrastructures, as depicted in Figure 1.2. In this regard, a particular focus lies on the domain-specific challenges of the involved DBMS, elastic resource and workload domains.

The DBMS domain requires knowledge of the DBMS technologies and their runtime configuration options. Especially distributed DBMSs provide heterogeneous runtime configurations, as the CAP theorem restricts a distributed DBMS to providing only two out of the three properties consistency, availability and tolerance to (network) partitions. The evolution of distributed DBMSs has shown that these constraints are not binary decisions, but that there is a wide range of trade-offs [65]. Consequently, distributed DBMSs differ significantly in their architectures and distribution mechanisms, which results in a heterogeneous set of distribution configurations. These configurations offer a wide range of possibilities to optimize towards consistency, availability and partition tolerance [150, 165]. While these runtime configurations have a direct impact on the efficiency of the non-functional features, their impact degree is hard to quantify.


Figure 1.2: Thesis scope — reproducible evaluation design and execution for higher-level objectives
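
As an illustration of such distribution-specific runtime configurations, the following Python sketch issues the same write against an Apache Cassandra cluster with two different request-level consistency settings; weaker settings favour availability and latency, stronger settings favour consistency. The contact points, keyspace and table are hypothetical, and the snippet assumes the DataStax cassandra-driver package and a running cluster.

from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster
from cassandra.query import SimpleStatement

# Hypothetical contact points and keyspace of a three-node cluster.
cluster = Cluster(["10.0.0.11", "10.0.0.12", "10.0.0.13"])
session = cluster.connect("demo_keyspace")

insert_cql = "INSERT INTO kv (k, v) VALUES (%s, %s)"

# ONE: acknowledged by a single replica -- higher availability, weaker consistency.
relaxed_write = SimpleStatement(insert_cql, consistency_level=ConsistencyLevel.ONE)
# QUORUM: acknowledged by a majority of replicas -- stronger consistency, less failure-tolerant.
strict_write = SimpleStatement(insert_cql, consistency_level=ConsistencyLevel.QUORUM)

session.execute(relaxed_write, ("sensor-42", "21.5"))
session.execute(strict_write, ("sensor-42", "21.5"))

cluster.shutdown()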

The elastic resource domain requires the consideration of a multitude of resource providers with heterogeneous resource types [114, 158] and varying runtime characteristics [44, 135, 101, 169]. Consequently, the applied elastic resources and their volatile runtime characteristics have a significant impact on the results of the non-functional feature evaluation [103]. While covering the extent of available resource types is challenging, analysing the impact of resource variances is an even more complex task [100, 148].

The workload domain requires knowledge in emulating realistic workloads that reflect the characteristics of the targeted application domain. For this purpose, the performance engineer can refer to a multitude of DBMS benchmarks [40, 82, 99, 154, 149, 146]. Yet, current DBMS benchmarks only consider the workload domain as dynamic, while assuming the DBMS and resource domains (i.e. the DBMS operational model) to be static. This fact limits the design and execution of comprehensive evaluations as well as the evaluation of higher-level evaluation objectives.

In particular, a performance engineer requires extensive domain knowledge of all involved domains to design and execute the required non-functional feature evaluations. Yet, there is no tool support that realizes a comprehensive evaluation methodology by considering the domain-specific properties of the elastic resource domain, DBMS domain and workload domain. Similarly, the evaluation of higher-level non-functional DBMS features such as scalability, elasticity or availability requires concepts that go beyond the sole performance evaluation concepts of current DBMS benchmarks.

The evaluation of higher-level non-functional DBMS features demands the execution of large-scale experiments that include runtime adaptations on the DBMS and elastic resource level. While such capabilities can be provided by Cloud Orchestration Tools (COTs) [157, 207], they are not adopted for DBMS evaluations, and performance engineers need to manually conduct and execute such experiments without supportive tools. This leads to unnecessarily complex, time-consuming and error-prone evaluation executions due to missing evaluation automation.

Furthermore, the missing support for evaluation automation limits the evaluation reproducibility, which is a key requirement in cloud service benchmarking [148, 201]. This is especially the case as only reproducibility enables users to keep up with the constantly evolving domains and allows the analysis of elastic resource characteristics. Yet, existing evaluations barely address the reproducibility of evaluations on elastic infrastructures [201].

1.2 Research Objectives

This thesis addresses the identified limitations of current state-of-the-art approaches for evaluating the non-functional features of distributed DBMSs on elastic infrastructures by pursuing two overarching Research Objectives (ROs), each accompanied by specific Research Questions (RQs). Within this thesis, each of these RQs is answered to achieve the respective ROs.

Research Objective A (RO-A): Establishing a comprehensive DBMS evaluation methodology that enables the reproducible evaluation of higher-level non-functional DBMS features (scalability, elasticity, availability) by explicitly considering elastic resource characteristics. We address the first objective by answering the following four RQs.

RQ-A.1: What are relevant impact factors of the distributed DBMS, elastic resource and workload domain that need to be considered in a comprehensive evaluation methodology?

RQ-A.2: Which evaluation principles of the DBMS and elastic resource domains need to be adopted by a comprehensive DBMS evaluation methodology?

RQ-A.3: What are significant metrics to evaluate the higher-level non-functional DBMS features scalability, elasticity and availability?

RQ-A.4: What are the required tasks that define the non-functional feature-specific evaluation process and enable the measurement of the higher-level metrics resulting from RQ-A.3?

Research Objective B (RO-B): Providing the technical evaluation methods that enable the automated and reproducible execution of the comprehensive evaluation methodology established in Research Objective A. We address the second objective by building upon the results of RO-A and addressing the resulting challenges, which are divided into the following four dedicated RQs.

RQ-B.1: How should comprehensive DBMS evaluations be specified to enable the automated execution while ensuring reproducibility and portability?

RQ-B.2: What technical concepts are required to enable the evaluation automation across the elastic resource, DBMS and workload domain, ensuring reproducibility and portability?

RQ-B.3: Which adaptation concepts are required by supportive evaluation methods to enable the evaluation of the higher-level non-functional features elasticity and availability?

RQ-B.4: How can the automation methods ensure significant results and reduce the effort of knowledge discovery for the non-functional DBMS features?

1.3 Research Contributions

Based on the two overarching research objectives and the accompanying research questions, this section summarizes the two core contributions of this thesis. Both contributions address the respective research questions as part of research objectives A and B. The contributions build upon each other, resulting in the set of novel and highly integrated DBMS evaluation frameworks Mowgli, Kaa and King Louie. In particular, Contribution I presents the results that establish the comprehensive DBMS evaluation methodology; Contribution II builds upon these results and presents the novel methods for the automated performance, scalability, elasticity and availability evaluation of distributed DBMSs, resulting in the Mowgli, Kaa and King Louie frameworks.

Contribution I To address RQ-A.1, a set of comprehensive analyses is carried out to identify the relevant technology- and evaluation-specific impact factors [core11]. These analyses comprise the latest advances in the DBMS [core1], elastic resource [core10] and DBMS workload domain [core3]. The identified impact factors build the foundation for the methodological DBMS evaluation on elastic infrastructures. The contributions to RQ-A.2 extend this methodology by analysing the evaluation principles of the separate research areas DBMS benchmarking and elastic infrastructure benchmarking. The results are applied to specify a cross-domain set of evaluation principles for evaluating DBMSs on elastic infrastructures. In addition, these principles are extended with the principles orchestration and automation [core2, core9, core3] to enable the higher-level DBMS evaluation objectives scalability, elasticity and availability. The comprehensive DBMS evaluation methodology is completed by the results that address RQ-A.3 and RQ-A.4 in defining a set of objective-specific evaluation processes. Each evaluation process is defined by the required evaluation tasks and significant metrics. In consequence, four comprehensive evaluation processes are defined: performance [core10, core5], scalability [core5], elasticity [core8, core6] and availability [core4, core7].

Contribution II In the context of RO-B, RQ-B.1 is approached by defining a domain-specific evaluation model that comprises all relevant DBMS, elastic resource and workload properties as a unified Evaluation Scenario Template (EST) [core5]. An EST comprises a set of mandatory sub-templates and additional objective-specific adaptation templates to enable the specification of elasticity [core6] and availability [core7] evaluation scenarios. ESTs provide the foundation of automated DBMS evaluations and, consequently, they represent the input to the evaluation automation methods that are required to address RQ-B.2 and RQ-B.3. To address RQ-B.2, this thesis provides the novel DBMS evaluation framework Mowgli [core5, data3] that fully automates the DBMS evaluation on elastic infrastructures for the evaluation objectives performance and scalability. By building upon Mowgli’s automation capabilities, RQ-B.3 is approached by the frameworks Kaa [core6] and King Louie [core7]. Kaa enables the automated elasticity evaluation based on DBMS and workload adaptations during the evaluation runtime [core6]. King Louie enables the availability evaluation under consideration of elastic resource failures on different levels [core7]. Besides evaluation automation, Mowgli, Kaa and King Louie enable the reproducibility of the evaluation scenarios by ensuring the deterministic execution of the evaluation process. In order to approach RQ-B.4, Mowgli, Kaa and King Louie provide extensive data sets for each evaluation objective. These data sets contain raw performance metrics, processed higher-level objective-specific metrics and comprehensive metadata to enable advanced post-processing [data6, data2, data1, data4, data5]. The validation of Mowgli, Kaa and King Louie is based on industry-driven case studies that comprise 310 objective-specific evaluation scenarios with 1426 resulting data sets.
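
To give an impression of what such an Evaluation Scenario Template captures, the following Python sketch models a performance scenario as a plain data structure. The field names and values are illustrative assumptions for this summary only and do not reproduce Mowgli’s actual template schema (cf. Listing 5.1 in Chapter 5).

from dataclasses import dataclass, field

@dataclass
class DeploymentTemplate:
    # Elastic resource and DBMS properties (illustrative names, not Mowgli's schema).
    cloud_provider: str
    instance_type: str
    dbms: str
    cluster_size: int
    dbms_config: dict = field(default_factory=dict)

@dataclass
class WorkloadTemplate:
    benchmark: str
    workload: str
    record_count: int
    target_throughput: int  # ops/s the load generator tries to sustain

@dataclass
class EvaluationScenario:
    objective: str  # e.g. "performance", "scalability", "elasticity", "availability"
    deployment: DeploymentTemplate
    workload: WorkloadTemplate

scenario = EvaluationScenario(
    objective="performance",
    deployment=DeploymentTemplate(
        cloud_provider="openstack-private",
        instance_type="m1.large",
        dbms="cassandra",
        cluster_size=3,
        dbms_config={"replication_factor": 3, "write_consistency": "QUORUM"},
    ),
    workload=WorkloadTemplate(
        benchmark="YCSB", workload="workload-a",
        record_count=10_000_000, target_throughput=20_000,
    ),
)
print(scenario.objective, scenario.deployment.dbms, scenario.workload.benchmark)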

1.4 Thesis Outline

The remainder of this cumulative doctoral thesis is structured into two parts.

Part I summarizes and conflates the findings of the accompanying publications as follows:

Chapter 2 introduces the technical foundations of this thesis to provide a common understanding of elastic infrastructures, distributed DBMSs and emerging operational models of distributed DBMSs.

Chapter 3 presents and discusses related work on DBMS benchmarking, elastic infrastructure benchmarking and advanced evaluation frameworks. Moreover, an overview of related approaches for application orchestration on elastic infrastructures is provided.

Chapter 4 presents the approach to address RO-A by defining a novel DBMS evaluation methodology. It builds upon three concepts: (i) the identification of technology- and evaluation-specific evaluation impact factors of the DBMS, resource and workload domain; (ii) DBMS and elastic infrastructure benchmarking principles that are combined into a set of cross-domain evaluation principles for evaluating distributed DBMS on elastic infrastructures and (iii) the comprehensive design of objective-specific evaluation processes.

Chapter 5 pursues RO-B by implementing novel evaluation methods that build upon the established evaluation methodology of RO-A. These evaluation methods comprise a domain-specific language that enables the specification of comprehensive and reproducible ESTs. These ESTs represent the input for the novel DBMS evaluation framework Mowgli that fully automates the DBMS evaluation on elastic infrastructures for the objectives performance and scalability. The Kaa framework extends Mowgli with DBMS and workload adaptations at runtime to enable the automated elasticity evaluation of distributed DBMSs. The King Louie framework builds upon these frameworks and enables the DBMS availability evaluation in the context of elastic resource failures. These integrated frameworks provide comprehensive evaluation data sets that comprise higher-level objective-specific metrics and supportive metadata.

Chapter 6 validates the introduced frameworks by a series of objective-specific case studies in an industrial context. Moreover, the frameworks are validated against established cross-domain evaluation principles.

Chapter 7 summarizes the thesis with a conclusion and provides an outlook into future research directions in evaluating and operating distributed DBMSs on elastic infrastructures.

Part II contains the publications associated with this thesis in Chapters 8–18.

Chapter 2 Background

This chapter introduces the fundamental technical concepts on which this thesis is based. It covers the concepts of elastic infrastructures, distributed DBMSs and operational models of distributed DBMSs on elastic infrastructures.

2.1 Elastic Infrastructures

Over the last decades, the provisioning of compute and storage resources has changed tremendously by moving from static physical resources to elastic and virtualized resources. Since the beginning of the cloud computing era in 2005 [38], resources have been offered in a service-based manner. Moreover, resources are distributed on different levels, connected via the network. Recently, the cloud computing paradigm has been extended by the subsequent paradigms of edge [178] and fog computing [125, 147]. Consequently, elastic resources have a significant impact on the operation of modern Web, Big Data and IoT applications, as these applications emphasize distributed architectures and the dynamic adaptation of application topologies to cope with varying workloads. Hence, these resources are not only applied for stateless applications, but also for stateful applications such as distributed DBMSs [105]. A key concept of these elastic resource paradigms is the usage of virtualization technologies [56, 125]. In the following, first the background on virtualization technologies is introduced, before the cloud, edge and fog computing paradigms are described in detail. The operation of distributed DBMSs on these elastic infrastructures is presented in the subsequent Section 2.3.

2.1.1 Virtualization

Virtualization enables the abstraction of logical resources from the underlying physical resources. This allows the sharing of resources amongst multiple tenants by providing multiple logical resource entities that are mapped to the underlying physical resources. In the era of elastic infrastructures, two virtualization concepts for compute resources are distinguished: Hardware Virtualization (HWV) and Operating System Virtualization (OSV) [139].

For HWV, resource entities are termed Virtual Machines (VMs) and managed by a hypervisor that allocates the resources and handles the operating state. Hypervisors are classified into two categories: type-1 hypervisors such as Xen [19] operate directly on top of the host’s hardware, while type-2 hypervisors such as KVM [26] operate on top of the host’s OS [2]. These hypervisors provide standalone VMs that are independent and isolated from the host system. This enables, for instance, the operation of Windows-based VMs on top of Linux-based hosts and vice versa. The resulting trade-offs are that each VM encapsulates a full Operating System (OS) and the VM images are accordingly large in disk size. Besides, the emulation of the virtual hardware devices introduces additional levels of abstraction and potential overhead [118].

OSV provides a more lightweight level of abstraction in terms of virtualization and isolation compared to HWV. OSV uses operating system features to create lightweight isolated environments, commonly known as containers. Container engines, such as the established Docker engine¹, allocate compute and storage resources. Besides, they provide access to container-specific networking services. Containers run on the shared operating system kernel of the underlying host, and multiple processes can be run within one container. By sharing the host kernel and operating system libraries, containers benefit from reduced image sizes compared to VM images as well as faster start-up times than VMs [142]. The strongest advantage of OSV is the lower performance overhead compared to HWV. On the other hand, containers provide weaker resource isolation capabilities than VMs [139].

As both virtualization concepts are established technologies, a multitude of comparative studies investigate the performance overhead, isolation capabilities and operational aspects of multiple HWV and OSV offers [118, 139, 185, 152]. These studies primarily focus on using containers only for stateless applications, while stateful applications are operated on physical or VM-based resources.
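
As a small illustration of the start-up advantage attributed to containers, the following Python sketch measures the time to start and tear down a minimal container via the Docker CLI. It assumes a local Docker installation and the public alpine image; the image is pulled beforehand so that download time does not distort the result, and a comparable VM measurement would instead have to time the provisioning API of the respective infrastructure.

import statistics
import subprocess
import time

IMAGE = "alpine:3.18"  # small base image; any locally available image works

# Pull once so that image download time does not distort the measurements.
subprocess.run(["docker", "pull", IMAGE], check=True, capture_output=True)

samples = []
for _ in range(10):
    start = time.perf_counter()
    # Start a container, run a no-op command and remove the container again.
    subprocess.run(["docker", "run", "--rm", IMAGE, "true"],
                   check=True, capture_output=True)
    samples.append(time.perf_counter() - start)

print(f"container start+exit: median {statistics.median(samples) * 1000:.0f} ms "
      f"over {len(samples)} runs")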

2.1.2 Cloud Computing

With the beginning of the cloud computing era in 2005 [38], cloud computing has become an important research topic and has been adopted for operating large-scale enterprise applications [164]. Cloud services are classified into four deployment models [56]: a private cloud is operated and used within a single organization, enabling full control over the physical and virtual resources; a public cloud is operated by an organization that offers the cloud services to public customers together with provider-specific resources and usage constraints; a community cloud is provisioned for an exclusive user group and operated by organizations that have shared concerns (e.g. research institutes); a hybrid cloud is a composition of at least two clouds of different deployment models that remain unique entities but enable service and data portability via standardized technologies.

According to the National Institute of Standards and Technology (NIST) definition of cloud computing [56], cloud services share five common characteristics: on-demand self-service enables users to acquire cloud services directly without requiring human interaction; broad network access defines that cloud services are accessible over the network; resource pooling defines that cloud resources are pooled to enable multi-tenancy by assigning different physical and virtual resources dynamically according to the tenant demands; rapid elasticity enables the dynamic provisioning and releasing of cloud resources while the resource capabilities often appear ‘virtually’ unlimited to the user; measured service defines that cloud resource usage can be monitored, controlled, and reported, in order to provide transparency for the provider and the user.

Cloud providers offer heterogeneous services, from compute resources to software-based services. Therefore, cloud services are classified into the three general cloud service models Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS) [56]. Moreover, more fine-grained classifications of cloud service models have been established [87]. In the following, a brief overview of IaaS, PaaS, SaaS and additional cloud service models is provided.

1 https://www.docker.com

IaaS IaaS clouds provide compute, storage, and network resources to run arbitrary software. While modern IaaS clouds can provide physical and virtual resources to the users, virtualized resources are the common approach to increase the physical resource utilization and the number of parallel users. Hereby, VMs and containers are the common resource entities. The location of IaaS resources can be classified into geographical locations (regions), data centres inside a region (availability zones), racks inside a data centre and physical hosts inside a rack.

PaaS PaaS provides a ready-to-use application runtime environment to the user, supporting a provider-specific set of programming languages, libraries, services, and tools. With the adoption of cloud computing, the heterogeneity of additional services offered by PaaS providers has increased substantially, nowadays also comprising services such as data processing frameworks, message queues, file systems or storage systems. PaaS clouds are often operated on top of IaaS resources and in consequence, the same geographical location classification applies to PaaS as to IaaS.

SaaS SaaS provides complete applications to the user that are accessible via client devices, i.e. thin client interfaces or programmatic Application Programming Interfaces (APIs). SaaS offers the lowest level of user control, as SaaS services typically only provide a limited set of application-specific configuration options. SaaS offers are often built upon IaaS or PaaS resources.

DBaaS The continuous evolvement of cloud computing is creating a variety of additional service models that are classified between IaaS, PaaS and SaaS [60, 87]. One of the noteworthy results of these evolvements is the trend to provide DBMSs as cloud services, resulting in the cloud service model Database as a Service (DBaaS) [87, 105]. While the concept of DBaaS has already been introduced before the cloud era in 2002 [18], its adoption has tremendously increased with the cloud computing evolvement. The DBaaS concept defines that the service provider is responsible for managing the DBMS and its required computational and storage resources while the DBMS is accessible to the user via DBMS-specific client interfaces. Consequently, cloud resources have become the preferred resources to operate DBaaS, as their advantages, namely on-demand resource allocation, elasticity and ’virtually’ unlimited resources, are required to provide service guarantees to the customers [105].

2.1.3 Fog and Edge Computing

The previously introduced benefits of cloud computing and its service models make it the preferred option for operating enterprise Web and Big Data applications. Recent IoT applications, however, also demand operation closer to the user. This paradigm is commonly known as edge [140] or fog computing [64, 175] and it is an extension to the cloud computing paradigm.

As these paradigms are still in their early stages, their definitions are still evolving and it depends on the source whether fog computing is the same as edge computing [140] or the consolidation of cloud and edge resources [141]. In this thesis, we follow the latter definition and consider fog computing as the consolidation of cloud and edge computing [147]. Future advances of these paradigms might revise this definition. Moreover, we distinguish cloud resources and edge resources, where cloud resources always remain in a cloud data centre, while edge resources always remain outside a cloud data centre. Similar to cloud resources, virtualization is also a key concept of edge resources [178].

According to NIST [175], fog computing services are classified into the same traditional service models as cloud computing (IaaS, PaaS, SaaS) and their deployment models are also identical, i.e. public, private, hybrid and community (cf. Section 2.1.2). Yet, there are also distinctions from cloud computing [175, 147, 141]: geographical distribution, as resources and service deployments are more widely distributed compared to the more centralized cloud data centre deployments; heterogeneity, as resources are more diverse in terms of computational and storage power; and dynamics, as changes in the resource and service topology are a key concept of fog computing. Consequently, fog computing applications need to be able to exploit these dynamics and, in return, they need to cope with more frequent resource outages and service interruptions [147].

2.1.4 Application Orchestration on Elastic Infrastructures

The introduced elastic infrastructures raise the need for supportive tools that orchestrate distributed applications on elastic infrastructures. Resulting requirements of such tools are discussed in [core2]. They comprise features such as the support of multiple resource provider APIs to enable multi-cloud support; the support for cloud Domain Specific Languages (DSLs) [163] such as TOSCA [95] or CAMEL [add2]; and the management of the full application lifecycle including resource allocation, the deployment of the application components and runtime adaptations. In this context, numerous COTs have been established over the last decade that focus on dedicated aspects of the aforementioned characteristics. Scientific approaches are represented by OpenTOSCA [79], Roboconf [120], Apache Brooklyn [130], the Orchestrator Conversation framework [184] and Cloudiator [add9, add7]. As these COTs focus on VM-based resources, container-based orchestrators including Borg, Omega and Kubernetes are presented in [123, 129]. Further COTs are discussed in [157, 191, 207].

2.2 Distributed DBMS

This section summarizes the recent advances of DBMS technologies with a focus on distributed DBMSs. First, the different data models are introduced; secondly, the distribution techniques are described; and thirdly, the non-functional features of distributed DBMSs are presented. Finally, the boundaries of distributed DBMSs are discussed.

2.2.1 Data Models

The evolvement of distributed DBMSs has led to an increasing heterogeneity in DBMS data models. The data models of modern DBMSs are classified into three top-level categories, ordered by their appearance on the DBMS landscape: relational, NoSQL and NewSQL [51, 86, 165, core1].

Relational The relational data model was established in the 1970s [3] and stores data as tuples, each forming an ordered set of attributes, which can be combined to extract more meaningful information. A relation forms a table, and tables are defined by a static, normalized data schema. RDBMSs provide the standardized SQL interface [1] as a generic data definition, manipulation and query language for relational data.

NoSQL Starting in the late 2000s, the NoSQL data model evolved as a superclass for different data models, such as key-value, document-oriented, column-oriented and graph-based [51, 86, 165]. Recently, the NoSQL data models have been extended with the time-series [151, 145] and multi-model data models [198]. Common characteristics of the NoSQL data models are the adoption of flexible data models that can be schemaless and the fact that data may need to be interpreted at the application level. Consequently, there is no common query interface such as SQL for the relational data model, but DBMS-specific interfaces with heterogeneous query capabilities. Yet, these schemaless data models ease the distribution of data and thus they are the preferred data models for distributed DBMSs. Hereafter, the NoSQL data models are briefly introduced.

The key-value data model relates to the hash tables of programming languages. The data records are tuples consisting of key-value pairs. While the key uniquely identifies an entry, the value is an arbitrary chunk of data. Operations are usually limited to simple CRUD (Create, Read, Update, Delete) operations.

The document data model extends the key-value data model by defining a structure on the values in certain formats, such as XML or JSON. These values are termed documents, but usually without fixed schema definitions. Compared to key-value stores, the document data model allows for more complex queries as document properties can be used for indexing and querying.

The column-oriented data model stores data by columns rather than by rows. It enables both storing large amounts of data in bulk and efficiently querying over very large structured data sets. A column-oriented data model does not rely on a fixed schema. It provides nestable, map-like structures for data items which improve flexibility over fixed schemas [30].

The graph data model builds upon graph structures, usually including elements like nodes and edges, for data modelling. Nodes are often used for the main data entities, while edges between nodes describe relationships between entities. Querying is typically executed by traversing the graph. Due to the direct connections between records, the graph data model does not lend itself to distribution, in contrast to the other NoSQL data models.

The time-series data model typically builds upon existing relational or NoSQL data models (preferably key-value or column-oriented) and adds a dedicated time-series data model on top [151, 145]. The time-series data model consists of data points which comprise a time stamp, an associated numeric value and customizable metadata. Query interfaces for the time-series data model typically offer analytical queries, covering statistical functions and aggregations.

The multi-model data model addresses the problem of polyglot persistence [70], which signifies that each of the NoSQL data models addresses a specific use case, including a data-model- or even DBMS-specific query interface. Hence, multi-model NoSQL DBMSs combine different data models into a single DBMS while building upon one storage backend to improve flexibility, e.g. providing the document and graph data model via a unified query interface [198].

NewSQL As conventional RDBMSs provide only limited data partitioning support, the NewSQL data model emerged at the beginning of the 2010s [47, 72]. NewSQL DBMSs aim at addressing these limitations by building upon the relational data model while emphasizing a distributed architecture: relational features are relaxed in favour of easing data partitioning [86]. In consequence, NewSQL DBMSs either extend RDBMSs with distribution mechanisms or they are built from scratch [137]. The NewSQL data model provides some functionalities of the relational data model, such as relations, transactions and the SQL query language. Yet, costly SQL operations such as joins might not be supported.

Figure 2.1: Data distribution techniques overview (categories: architecture with single, master-slave and multi-master; sharding with range, hash and hyperspace; durability with disk, in-memory and hybrid persistence; replication with synchronization, scope and area; consistency model with strict and eventual consistency)

2.2.2 Data Distribution Techniques

The evolution towards distributed DBMSs builds upon multiple data distribution techniques that can be classified into the high-level categories architecture, sharding, durability, replication and consistency model, as depicted in Figure 2.1. For each of these categories a variety of concepts have been established, which consequently leads to a heterogeneous set of distributed DBMS technologies. In the following paragraphs, these concepts are introduced to provide the data distribution background that is necessary to understand the concepts of evaluating distributed DBMSs on elastic infrastructures.

Architecture The hardware architecture of DBMSs is classified into three types [59]: a shared memory DBMS runs on a single resource that consists of a collection of cores and shares a common main memory and disk system; a shared disk DBMS runs on disk clusters, where a collection of CPUs with private main memories share a common disk system; and a shared nothing DBMS shares neither main memory nor disk, while the DBMS instances are connected through the network. This thesis focuses on DBMSs that build upon the shared nothing architecture [5] as this architecture type is the preferred option for modern distributed DBMSs [30, 42, 59]. A distributed DBMS provides a logical DBMS instance to the interacting services, where the internal DBMS instances are distributed across multiple resource entities and connected by the network. A single DBMS instance as part of the DBMS topology is termed DBMS node, while the overall distributed DBMS is termed DBMS cluster. Consequently, a DBMS cluster comprises n ≥ 1 DBMS nodes where each DBMS node is operated on a dedicated resource node. While the DBMS cluster size may change at runtime, these changes should ideally be transparent for the consuming services [51, 137].

The architecture of distributed DBMSs is categorized into three general distribution models [70, 150]. A single DBMS comprises only one DBMS node and handles all read and write requests. The master-slave DBMS distribution comprises one DBMS node that represents the designated master and n ≥ 1 DBMS nodes that act as slave nodes. Hereby, the master node handles all write and read requests as well as the synchronization of the slave nodes. Optionally, slave nodes can be configured to process read requests. In a multi-master DBMS, all DBMS nodes are equal, i.e. all DBMS nodes process write and read requests within the DBMS cluster.

Sharding Data sharding2 is a core concept of distributed DBMSs to distribute data and requests across all DBMS nodes within a DBMS cluster. Therefore, the data is split into shards that are distributed across all DBMS nodes. Depending on the sharding concept, its usage requires manual execution by the Database Administrator (DBA) or can be implemented in an automated manner by the respective DBMS. Three general sharding concepts exist: range-sharding, hash-sharding and hyperspace-sharding.

In the range-sharding approach, data items are grouped according to continuous intervals of predefined shard keys, such as the range of a unique identifier. While this approach favours scan operations, data hotspots or unbalanced shards may occur at runtime, requiring a manual revision of the shard key and consequently a costly redistribution of the data across the DBMS nodes.

Hash-sharding applies a hashing function to the shard key of the data items in order to distribute them across the existing DBMS nodes. It is further classified into simple-hashing and consistent-hashing [14]. Simple-hashing uses a static hashing function such as modulo hashing that is applied to the shard key of the data items. An exemplary simple-hashing function can be based on the number of DBMS nodes in the DBMS cluster, e.g. hosting DBMS node = shard key modulo number of DBMS nodes. While this approach enables the automated sharding of data items and efficient local data lookups [25], it requires the redistribution of all data items if the DBMS cluster size changes. While the data redistribution can be executed automatically by the distributed DBMS, it is an expensive operation. The consistent-hashing approach considers the range of the hash function as a ring where the DBMS node identifiers and the data item identifiers are randomly hashed to their positions inside the DBMS cluster [14, 25]. If a new DBMS node is added to the DBMS cluster and is hashed at position p, the old shard corresponding to its immediate successor is split into two new adjacent shards between p.predecessor and p, as well as p and p.successor. Consequently, consistent-hashing does not require a manual update of the hash function and enables the automated redistribution of the data if the DBMS cluster size changes, while only a fraction of the data needs to be redistributed. Yet, consistent-hashing does not favour data locality, and joining/leaving DBMS nodes impose a high load on the neighbouring DBMS nodes within the ring [165].

Hyperspace-sharding extends the consistent-hashing approach, which only considers a single attribute of a data item, e.g. the primary key. Therefore, hyperspace-sharding includes multiple data attributes for the mapping of data items to DBMS nodes [76]. This results in a so-called hyperspace where the hosting DBMS node is identified by hashing each data attribute value along its corresponding dimension.

2 Sharding is also called data partitioning in the context of distributed DBMSs.

While hyperspace-sharding favours complex read operations, the hyperspace grows exponentially with the number of data attributes. Even though first approaches to derive hyperspace-sharding functions exist [97], hyperspace-sharding functions are typically application-specific and require manual definition by the DBA. Furthermore, a central coordinator service is required to assign the hyperspace parts to the respective DBMS nodes.
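To make the difference between simple-hashing and consistent-hashing tangible, the following minimal Python sketch contrasts the modulo-based mapping with a hash ring; the node names, the MD5-based hash function and the omission of virtual nodes and replication are simplifications for illustration and do not correspond to any specific DBMS.

import hashlib
from bisect import bisect_left, insort

def simple_hash_shard(shard_key: int, num_nodes: int) -> int:
    # Modulo-based simple hashing: changing num_nodes remaps almost all keys,
    # which forces a redistribution of (nearly) the entire data set.
    return shard_key % num_nodes

def _hash(value) -> int:
    return int(hashlib.md5(str(value).encode()).hexdigest(), 16)

class ConsistentHashRing:
    """Minimal consistent-hashing ring without virtual nodes or replication."""

    def __init__(self, nodes):
        self._ring = sorted((_hash(node), node) for node in nodes)

    def add_node(self, node):
        # Only the keys between the new node and its predecessor change owner.
        insort(self._ring, (_hash(node), node))

    def node_for(self, shard_key):
        # Walk clockwise to the first node hashed at or after the key's position,
        # wrapping around at the end of the ring.
        positions = [position for position, _ in self._ring]
        index = bisect_left(positions, _hash(shard_key)) % len(self._ring)
        return self._ring[index][1]

ring = ConsistentHashRing(["node-1", "node-2", "node-3"])
print(simple_hash_shard(42, num_nodes=3), ring.node_for(42))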

Durability Durability defines the DBMS persistence mechanisms regarding write and update operations. It is a crucial mechanism to address temporary DBMS node failures in case of power or network outages. For instance, if an operation is acknowledged before data is persisted, the DBMS node processing the request might fail before writing the item and data may get lost. Therefore, durability comprises three persistence models: disk persistence, in-memory persistence and hybrid persistence [98, 165]. Disk persistence enforces that the data item is written to disk on all DBMS nodes before acknowledging the operation to the client. The in-memory persistence approach acknowledges write and update operations when the data item is available in memory on all DBMS nodes. Therefore, it is a suitable approach for managing transient information that is needed for a limited time. Hybrid persistence first keeps data items in memory and persists them after some DBMS-specific conditions have been satisfied. Distributed DBMSs offer a variety of configuration options to specify the moment to acknowledge the operation to the client, e.g. after storing the data item in-memory or on disk of n DBMS nodes. Hence, DBMSs implementing the hybrid persistence allow configurations from disk persistence over custom persistence configurations (e.g. applying disk persistence for n DBMS nodes and in-memory persistence for m DBMS nodes) to in-memory persistence. To increase durability in case of DBMS node failures for the hybrid persistence model, additional established techniques such as write-ahead logging [10] might be implemented by the respective DBMS. The hybrid-persistence model is applied by most NoSQL DBMSs [150, 165].
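To illustrate how such acknowledgement configurations can be expressed, the following hypothetical Python sketch models a persistence policy that acknowledges a write once it resides in memory on a configurable number of nodes and on disk on another; the names and values are invented for illustration and are not taken from any concrete DBMS.

from dataclasses import dataclass

@dataclass
class PersistencePolicy:
    # Acknowledge a write once it is held in memory on `memory_acks` nodes
    # and flushed to disk on `disk_acks` nodes (disk_acks <= memory_acks).
    memory_acks: int
    disk_acks: int

def acknowledge(policy: PersistencePolicy, in_memory_on: int, on_disk_on: int) -> bool:
    """Return True once a write satisfies the configured hybrid-persistence policy."""
    return in_memory_on >= policy.memory_acks and on_disk_on >= policy.disk_acks

# In-memory persistence on 3 replicas vs. additional disk persistence on 2 of them.
print(acknowledge(PersistencePolicy(memory_acks=3, disk_acks=0), in_memory_on=3, on_disk_on=0))  # True
print(acknowledge(PersistencePolicy(memory_acks=3, disk_acks=2), in_memory_on=3, on_disk_on=1))  # False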

Replication In the context of distributed DBMSs, replication defines that multiple physical copies of the same logical data item exist within a DBMS cluster. According to Wiesmann et al. [17], the primary conceptual decisions for using replication are (i) which of the physical data items may be accessed by clients and (ii) how to synchronize the remaining replicas after a modifying operation has been carried out. The first decision (i) closely relates to the architecture of distributed DBMSs (cf. Section 2.2.2) as a master-slave architecture typically applies single-master replication [16], where a single replica represents the primary copy of the data. Only this copy can be modified by client operations while read operations might be allowed on the remaining replicas. A multi-master architecture enables write and read operations on all replicas and potential conflicts need to be resolved by the DBMS [98]. The second decision (ii) relates to the synchronization of the replicas, which is classified into synchronous and asynchronous replication [16, 17]. Synchronous replication ensures that the operation is executed before the client receives a response, while asynchronous replication executes the synchronization independently from the client response.

An additional replication aspect as defined by Domaschka et al. [98] is the replication scope. It is classified into full and partial replication. Full replication means that each DBMS node stores a full copy of the entire database. In partial replication, each data item may only be stored on a subset of all DBMS nodes. Consequently, the replication factor is smaller than the DBMS cluster size. This capability also closely relates to sharding (cf. Section 2.2.2). If sharding is used without replication, there is no tolerance against DBMS node failures. On the other hand, using replication without sharding means that all data is available on all DBMS nodes. If sharding and replication are used in parallel, each DBMS node might contain only a subset of all data items [25]. Furthermore, modern distributed DBMSs might apply different replication areas to enable geographically distributed DBMS clusters [80, 109]. Yet, these replication concepts are highly DBMS-specific.

Consistency Model By applying the sharding and replication concepts in distributed DBMSs, the need for data consistency guarantees increases but satisfying them also becomes more challenging. The view on data consistency is classified into the data-centric consistency on the DBMS provider side and the client-centric consistency on the client side [28]. The DBMS provider considers the DBMS-internal state of the (distributed) data. The client considers the DBMS as a black box and relies on a set of promised consistency guarantees that could be part of a Service Level Agreement (SLA). With the rise of distributed DBMSs, consistency paradigms have evolved as well. Regarding the data-centric consistency, strong consistency concepts have been established, such as the ACID paradigm [4] that guarantees atomicity, consistency, isolation and durability for each request. On the contrary, eventual consistency guarantees [37] have been implemented as in the BASE paradigm: basically available, soft state, eventually consistent [31]. In general, RDBMSs and NewSQL DBMSs apply strong consistency while NoSQL DBMSs build upon the eventual consistency concepts. However, these consistency concepts only reflect two general and contrary consistency paradigms, while there are more fine-grained consistency model classifications [83, 93]. With respect to the client-centric consistency models, relational DBMSs implement the conventional concepts of monotonic read consistency, read-your-writes consistency, monotonic write consistency or writes-follow-reads consistency [28, 37]. However, NoSQL DBMSs implement heterogeneous and highly customizable client-centric consistency models that do not explicitly guarantee the properties of the aforementioned consistency models. Yet, experimental results show that these consistency models can be partially fulfilled by NoSQL DBMSs [48, 61].

2.2.3 Non-Functional Features

The feature set of DBMSs is classified into functional features, such as query interfaces or security mechanisms [86], and non-functional features such as performance [9]. As this thesis focuses on the non-functional features, we omit further details on functional features and refer to existing literature [51, 54, 150, 203]. The presented distribution concepts of modern distributed DBMSs enable a set of key non-functional features that are required in the context of Web, Big Data and IoT applications [86, 98, 150, 165]. This section provides a general definition of these non-functional features.

Performance Performance is typically referred to as one of the most important non-functional features of a DBMS [9]. Performance directly relates to the processing of requests and on a high level it is classified into write and read performance.

Scalability Scalability addresses the general ability to process changing workload intensities. A scalability definition for distributed DBMSs is provided by Agrawal et al. [46], defining the terms scale-up, scale-down, scale-out and scale-in. Scale-up and scale-down represent the vertical scaling concept that is implemented by adapting the resources of a single DBMS node. The actions scale-out (i.e. adding DBMS nodes to a DBMS cluster) and scale-in (i.e. removing DBMS nodes from a DBMS cluster) represent the horizontal scaling concept. Consequently, to implement horizontal scaling, a distributed DBMS is required. As vertical scaling actions require a restart of the DBMS in order to adapt to the updated resources, these actions implicitly cause a downtime of the DBMS. Horizontal scaling actions can be executed at runtime without a downtime, if the DBMS supports elasticity.

Elasticity Elasticity is tightly coupled to scalability as horizontal scalability is a strict requirement to enable elasticity in the context of distributed DBMSs. While scalability targets the general ability to process arbitrary workload sizes, elasticity targets the ability to cope with sudden workload fluctuations at runtime, i.e. “the ability to deal with load variations by adding more resources during high load or consolidating the tenants to fewer nodes when the load decreases, all in a live system without service disruption, is therefore critical for these systems.” [46].

Availability For computing systems, availability is defined as the degree to which a system is operational and accessible when required for use [7]. Besides reliability (the “measure of the continuity of correct service” [21]), availability is the main pillar of many fault-tolerant implementations [98]. The availability of the DBMS can be affected by two conditions: (i) a high number of requests issued concurrently by clients, overloading the DBMS so that client requests cannot be handled at all or are handled with an unacceptable latency > ∆t; (ii) resource failures that impact network connectivity or the compute/storage resources the DBMS is hosted on [98].
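Where a single quantitative figure is required, availability is commonly expressed as the expected fraction of time the system is operational; a standard formulation (a common quantification assumed here, not part of the cited definitions) uses the mean time to failure (MTTF) and the mean time to repair (MTTR):

A = MTTF / (MTTF + MTTR)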

Consistency As there are numerous consistency models, their implementations in the respective DBMSs are very heterogeneous. Consequently, the resulting quality of the consistency guarantees depends on the DBMS-specific implementation, additional runtime configurations and external impact factors such as the workload intensity or failures [93]. Therefore, even if the consistency guarantees of a specific consistency model are expected, there is a risk of (temporary) inconsistencies such as stale data on the client side or the violation of the request execution order on the data side [48, 93].

2.2.4 Boundaries: CAP and PACELC

The quality of the introduced non-functional features depends on the applied distribution techniques (cf. Section 2.2.2), but it is also limited by the following theorems: CAP [15] and PACELC [62]. In 2000, the renowned CAP theorem was defined [15]. It imposes the constraint that a distributed storage system can only provide two out of three properties: consistency (C), availability (A) and partition tolerance (P) [15]. Consequently, the CAP theorem enables the theoretical classification of distributed DBMSs into CA-, CP- or AP-favouring systems. Yet, the evolvement of distributed DBMSs and the elastic infrastructures hosting these DBMSs led to revisiting the CAP theorem in 2012 and refining the statement that providing two out of the three CAP properties is not a binary decision [65]. In 2010, the PACELC classification of distributed DBMSs has been defined [62]3. PACELC explicitly considers the impact on latency and consistency during normal operation and in case of failures leading to partitions. This results in the definition that in case of a partition, there is an availability-consistency trade-off, while in normal operation, there is a latency-consistency trade-off. Thus, it extends the CAP classification with two possible choices in case of partitions and two for normal operation. However, many distributed DBMSs cannot be assigned exclusively to one single PACELC classification, and the specific PACELC classification PC/EL is hard to assign to any distributed DBMS [150].

As distributed DBMSs implement a variety of heterogeneous distribution concepts to cater for a broad range of use cases [98, 150], the CAP and PACELC theorems provide conceptual classifications of their consistency, availability and partition tolerance. Yet, these classifications do not provide any estimation of the quality of the presented non-functional features. Moreover, the heterogeneous distribution techniques even increase the challenge of quantifying the quality of the non-functional DBMS features.

3 http://dbmsmusings.blogspot.com/2010/04/problems-with-cap-and-yahoos-little.html

Figure 2.2: DBMS operational models on cloud, edge and fog resources (compute: bare-metal, container-based, VM-based and combined VM- and container-based resources across the cloud, edge and fog domains; storage: HDD, SSD, NVME and remote storage)

2.3 Operational Models of Distributed DBMS

In order to fully exploit the advances of distributed DBMSs such as horizontal scalability and elasticity, similar concepts are required on the resource level. As elastic resources provide these concepts (cf. Section 2.1), the operational models of distributed DBMSs have changed from dedicated bare-metal deployments to manifold deployments on virtualized resources. Figure 2.2 depicts a non-exhaustive set of common operational models for distributed DBMSs on elastic infrastructures. The compute resources of the cloud domain range from dedicated bare-metal resources over container-based and VM-based resources to a combination of VM- and container-based resources [199]. In the context of the overarching fog domain, edge resources provide an additional set of heterogeneous resources that can be applied for distributed DBMSs [209]. Apart from compute resources, the storage backends are of major importance for DBMSs, and the variety of offered storage backends ranges from dedicated Hard Disk Drive (HDD), Solid State Drive (SSD) and Non-Volatile Memory Express (NVME) devices to remote storage. In consequence, we define the DBMS operational model as follows:

Definition D.1 (DBMS operational model). The DBMS operational model is defined as the DBMS, its runtime configuration and the applied elastic resources. In consequence, the operational model of distributed DBMSs on elastic infrastructures needs to be determined from RDBMS, NoSQL DBMS, NewSQL DBMS and elastic resource offers. The runtime configurations comprise heterogeneous DBMS- and distribution-specific options.
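As an illustration of Definition D.1, the following hypothetical Python sketch captures a DBMS operational model as a plain data structure; all field names and example values are invented for illustration and do not prescribe any concrete format used later in this thesis.

from dataclasses import dataclass, field

@dataclass
class Resource:
    provider: str         # e.g. a private OpenStack or a public IaaS offer
    flavour: str          # compute capabilities of the VM or container
    storage_backend: str  # e.g. HDD, SSD, NVME or remote storage

@dataclass
class OperationalModel:
    dbms: str                     # an RDBMS, NoSQL or NewSQL system
    cluster_size: int             # number of DBMS nodes
    runtime_config: dict = field(default_factory=dict)  # DBMS- and distribution-specific options
    resources: list = field(default_factory=list)       # one Resource entry per DBMS node

# A hypothetical 3-node cluster on medium-sized VMs with local SSDs.
model = OperationalModel(
    dbms="dbms-a",
    cluster_size=3,
    runtime_config={"replication_factor": 3, "write_acknowledgements": "quorum"},
    resources=[Resource("provider-x", "vm.medium", "SSD") for _ in range(3)],
)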

With the flexibility offered by these manifold operational models comes a number of challenges that need to be addressed to operate distributed DBMSs on elastic infrastructures. These range from the placement of virtual resources onto physical resources [117, 138], over selecting the best matching resource based on application requirements [158, add7], to determining the required number of instances and the runtime configuration to satisfy the application requirements [159, 177]. While these challenges hold for stateless and stateful applications, there are numerous additional challenges that need to be considered for distributed DBMSs specifically. The initial resource selection alone becomes a more important decision for distributed DBMSs than for stateless applications. Stateless applications can easily be migrated to new resource types, but the migration of a distributed DBMS is far more costly in terms of time and temporary service degradation [168]. Moreover, the selection of the storage backend is of crucial importance for distributed DBMSs [107, 169, 205].

As elastic resources build upon shared resources, the impact of resource interferences caused by other applications running on the same physical resources4 needs to be considered when selecting the DBMS operational model [44, 136], especially as container- and VM-based resources vary in their provided resource isolation capabilities [139].

These resource-specific challenges are extended with the heterogeneity of distributed DBMS technologies and the corresponding runtime configurations (cf. Section 2.2.2). As these decisions depend on each other [89], the selection of an optimal DBMS operational model requires the correlation of the resource characteristics, the distributed DBMS and its runtime configurations. In consequence, already the following basic operational model decision becomes a challenging task: ”Does DBMS A provide better performance if operated with 3 nodes at resource provider X on resource VM_medium or if operated with 5 nodes at resource provider Y on resource VM_SMALL?”

While this exemplary question only focuses on the objective performance and omits several DBMS- and resource-specific characteristics, it already requires dedicated elastic resource and DBMS domain knowledge to address it. Yet, even with the required domain knowledge, it is not possible to provide a well-grounded answer without supportive decision-making tools in the form of benchmarks. There are supportive benchmarks with a dedicated focus either on elastic resources (cf. Section 3.3) or distributed DBMSs (cf. Section 3.2). But these benchmarks do not provide a unified approach combining the resource and DBMS domains and consequently, they hinder the correlation of resource characteristics with the DBMS and its runtime configuration. Therefore, this thesis contributes a unified DBMS evaluation approach across the resource and DBMS domains.

4 Also called the Noisy Neighbour effect.

2.4 Summary

This chapter presented the technical fundamentals of the elastic infrastructure domains cloud, edge and fog. It summarized the respective virtualization concepts, resource characteristics, service models and orchestration concepts. These infrastructures form the basis for operating modern distributed DBMSs. Furthermore, the technical background on distributed DBMSs was introduced, covering the relational, NoSQL and NewSQL data models; distributed DBMS architectures and key distribution techniques such as sharding, durability, replication and consistency models. These techniques enable the non-functional features performance, scalability, elasticity, availability and consistency that are key requirements of data-intensive applications. Therefore, an established definition of each non-functional feature was provided, including the discussion of their boundaries within the CAP and PACELC theorems. These technical concepts form the basis for the methodological DBMS evaluation that is presented in Chapter 4. In order to combine elastic resources with distributed DBMSs into their operational model, common operational models of distributed DBMSs were presented for the cloud, edge and fog domains, including relevant resource characteristics that need to be considered. In conclusion, the challenges in the selection of a concrete DBMS operational model were discussed, including elastic resource and distributed DBMS characteristics as well as their interdependencies. In Chapter 5, we use these concepts to provide comprehensive evaluation methods that enable the automated evaluation of distributed DBMSs on elastic infrastructures by explicitly considering DBMS and resource characteristics.

Chapter 3

Related Work

This chapter summarizes related work on DBMS and elastic infrastructure benchmarking that is part of the contributing publications of this thesis: [core1] included in Chapter 8, [core2] included in Chapter 9, [core3] included in Chapter 10, [core4] included in Chapter 11, [core5] included in Chapter 12, [core6] included in Chapter 13, [core7] included in Chapter 14, [core8] included in Chapter 15, [core9] included in Chapter 16, [core10] included in Chapter 17 and [core11] included in Chapter 18. This thesis provides novel concepts for evaluating the non-functional features of distributed DBMSs on elastic infrastructures. Therefore, we discuss related DBMS benchmarks and elastic infrastructure benchmarks in the following sections. In particular, Section 3.1 defines an evaluation terminology, Section 3.2 presents established DBMS benchmarks with a focus on distributed DBMSs and Section 3.3 presents elastic infrastructure benchmarks. In Section 3.4, advanced evaluation approaches are discussed that target the evaluation of non-functional features apart from performance. Section 3.5 summarizes the related approaches in the scope of this thesis by discussing overlapping and limiting aspects.

3.1 Terminology

The evaluation of DBMSs and elastic resources is a widely discussed research topic and in consequence, a diverse set of terms for the involved systems and tools has been established [9, 94, 100, 148, 201, 189, 190]. This thesis builds upon the DBMS operational model definition (cf. Section 2.3) and applies the following terminology:

Definition D.2 (Workload). The workload is the lowest common denominator in the non-functional features evaluation toolbox and it is required to measure the non-functional features of an operational model. Therefore, a workload emulates application-specific load patterns as realistically as possible.

Definition D.3 (Benchmark). The workload is generated by a benchmark, which adds additional features such as metric reporting, metric processing and the coordinated execution of multiple workloads. A benchmark can provide different workload types.

Definition D.4 (Evaluation framework). An evaluation framework represents an advanced concept for evaluating the non-functional features as it orchestrates the DBMS operational model and n benchmark instances. Therefore, an evaluation framework supports the evaluation automation to a certain degree, ranging from the allocation of elastic resources over the deployment of the DBMS operational model to adaptations during the evaluation execution.


Definition D.5 (Evaluation objective). An evaluation objective defines which non-functional feature is to be evaluated. The standard evaluation objective for DBMSs is performance.

Definition D.6 (Higher-level evaluation objective). A higher-level evaluation objective builds upon the performance objective by incorporating additional DBMS and runtime aspects. Higher-level evaluation objectives are scalability, elasticity, availability and consistency.

3.2 DBMS Benchmarking

DBMSs are a key component of each data-intensive application domain. Consequently, numerous DBMS benchmarks have been established over the last decades to evaluate the non-functional features of DBMSs and they represent a foundation of this thesis. Hereafter, we focus on DBMS benchmarks of the Web, Big Data and IoT domain as well as renowned Transaction Processing Performance Council (TPC) benchmarks that have been presented before the cloud era [119]. These DBMS benchmarks focus on the evaluation objective performance under consideration of various workload types. We classify the workload types into Online Transaction Processing (OLTP), Online Analytical Processing (OLAP) and Hybrid Transaction-Analytical Processing (HTAP). There are more fine-grained and diverse classifications for DBMS workloads. For instance, the work of Friedrich et al. [99] considers workloads for NoSQL DBMSs as OLTP, while others define that OLTP is only applicable for RDBMSs [107], considering NoSQL workloads as a separate category.

The specification of workloads is either based on trace-based application data or on synthetic data, which is modelled after real-world load patterns and generated by a set of workload-specific constraints [148]. Synthetic workloads provide more customization options, which decreases the possibility that evaluation targets are optimized for a specific workload type at design time. Yet, synthetic workloads also include potential randomness, as workload constraints may build upon probabilistic decisions, e.g. selecting the requested keys according to a Zipfian distribution [11]. Trace-based workloads are more significant as they are created from real application data and enable deterministic execution. But they provide only limited customization options and the evaluation targets can be optimized at design time for well-known trace-based workloads.

An additional workload concept is the workload generation model, which is classified into closed, open and partly-open [24]. The closed model defines that new requests are only issued after all current ones have been processed, i.e. following a sequential order. Its implementation is rather simple and builds upon a static thread pool to issue the requests [148]. To support fluctuating workloads, an open or at least partly-open workload generation model is required [148]. The open workload model issues requests following probability distributions and individual requests are completely decoupled. Yet, the implementation of this approach bears some challenges, such as the need for a dynamic thread pool that scales fast enough without disturbing the scheduling process, and it requires thorough monitoring [148]. The partly-open model addresses the challenges of the closed workload model by issuing requests either based on a probability distribution or in sequential order. Yet, its implementation requires precise scheduling and it is hard to determine how long a scheduled thread will be busy [148]. Moreover, it might overload either the workload generation instance or the target DBMS [148].

Within this thesis, we build upon these workload concepts and their generating benchmarks to enable the evaluation of higher-level evaluation objectives by considering the volatility of elastic infrastructures. Therefore, we provide in the following sections an overview of established benchmarks classified by their workload type. It is noteworthy that this thesis focuses on OLTP and partially on HTAP workloads, while its methodology is independent of a specific workload type or benchmark.
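As a concrete illustration of the closed and open workload generation models, the following minimal Python sketch issues requests either from a static thread pool or according to a Poisson arrival process; the request handler is a stub, and the thread counts, rates and durations are arbitrary examples.

import random
import threading
import time

def send_request():
    # Stub for issuing a single DBMS request; replaced by a real client call in practice.
    time.sleep(random.uniform(0.001, 0.005))

def closed_generator(num_threads: int, requests_per_thread: int):
    # Closed model: a static thread pool; each thread issues the next request
    # only after the previous one has completed.
    def worker():
        for _ in range(requests_per_thread):
            send_request()
    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

def open_generator(arrival_rate: float, duration_seconds: float):
    # Open model: requests arrive according to a Poisson process and are issued
    # independently of earlier responses.
    deadline = time.time() + duration_seconds
    while time.time() < deadline:
        time.sleep(random.expovariate(arrival_rate))    # exponential inter-arrival times
        threading.Thread(target=send_request).start()   # decoupled from any response

closed_generator(num_threads=4, requests_per_thread=100)
open_generator(arrival_rate=200.0, duration_seconds=1.0)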

3.2.1 OLTP

OLTP workloads represent the typical operational requests of each application domain. Consequently, OLTP workloads comprise independent, short and repetitive request patterns based on arbitrary data sizes [13].

The TPC provides standardized OLTP workloads for different application domains, such as the TPC-C [45] workload for the e-commerce context or the TPC-E [122] workload for the financial context. The TPC council only provides the workload specification, whereas the implementation of the benchmark as workload generator is the responsibility of the user. The synthetic TPC workloads compute not only the raw performance metrics but also more significant performance metrics such as efficiency. Yet, these workloads target only RDBMSs; distributed NoSQL and NewSQL DBMSs are not explicitly considered in their specifications.

The BigBench [85] benchmark builds upon the TPC-DS [214] workload specification, focusing on the Big Data context based on synthetic data. By now, it has been incorporated by the TPC and released as the TPCX-BB [206]. BigBench also focuses on the relational data model with extensions for Big Data tools that offer an SQL interface on top of file systems, such as Apache Hive1. BigBench provides DBMS performance metrics as well as composed metrics that target Big Data processing frameworks [92].

OLTP-Bench [82] comprises 15 workload classes of different workload types for RDBMSs, including trace-based and synthetic workloads. These workloads cover the Web, financial and e-commerce context. OLTP-Bench provides the common performance metrics and composed metrics by correlating system metrics with performance metrics. Apart from that, this benchmark provides support for adapting the workload intensities at runtime and advanced metric processing.

The widely adopted Yahoo Cloud Serving Benchmark (YCSB) [40] targets the performance evaluation of a variety of NoSQL DBMSs and RDBMSs. To enable the support for a wide range of data models, it builds upon simple Create, Read, Update, Delete (CRUD) and scan requests. In addition, it provides a predefined set of workloads with a variety of workload- and DBMS-specific configuration options. Its modular architecture favours extensibility and it has been extended by YCSB++ [57], YCSB+T [96] and GeoYCSB [197].

YCSB++ [57] adds the evaluation of data consistency with a focus on staleness. Moreover, it extends the basic requests with support for transactional and more complex request types. It focuses only on the column-oriented data model. Moreover, it adds support for the distributed execution of multiple YCSB++ instances. YCSB+T [96] extends YCSB with transactional workloads that enrich the performance measurements and add a consistency-centric evaluation. Therefore, it measures the transactional overhead and potential violations of the request ordering. GeoYCSB [197] extends YCSB with support for geospatial data. Therefore, it integrates new components into its architecture and extends the supported workload classes with geospatial workloads for different NoSQL DBMSs.

The BG benchmark [77, 161] also builds upon YCSB, focusing on the evaluation objectives performance and consistency. Therefore, the BG benchmark provides requests that reflect the workload of a real social media platform. Moreover, it supports not only the measurement of performance, but also the specification and validation of performance and consistency SLAs that need to be fulfilled by the target DBMS.
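For illustration, the following sketch shows the kind of properties a YCSB core workload exposes (record and operation counts, the CRUD operation mix and the request distribution); the property names follow YCSB's bundled core workload files, while the values are arbitrary examples and the assembled command line is only printed, not executed.

# Hypothetical YCSB-style workload definition; the values are arbitrary examples.
workload = {
    "recordcount": 1_000_000,          # records loaded into the data set
    "operationcount": 5_000_000,       # operations issued during the run
    "readproportion": 0.95,            # CRUD mix: 95% reads ...
    "updateproportion": 0.05,          # ... and 5% updates
    "requestdistribution": "zipfian",  # skewed access pattern
}

# Assemble the property overrides as they could be passed to a YCSB binding.
arguments = " ".join(f"-p {key}={value}" for key, value in workload.items())
print(f"ycsb run <binding> -P workloads/workloada {arguments}")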

1 https://hive.apache.org/

Additional OLTP workloads for NoSQL DBMSs are summarized in [99, 154], also including dedicated workloads for graph-based NoSQL DBMSs. As graph-based DBMSs do not necessarily provide a distributed architecture (cf. Section 2.2.1), we omit the discussion of these benchmarks in this section. Yet, the results of this thesis can also be applied to graph-based DBMSs with the respective workloads.

3.2.2 OLAP

OLAP workloads target decision support and focus on consolidated data rather than single requests. In this regard, the data size is significantly larger than for OLTP workloads. The requests are read-heavy with mostly ad hoc and complex requests, accessing millions of records and performing joins and aggregations [13].

The TPC provides a set of OLAP workload specifications, with the latest version being the TPC-H workload [186]. This workload specification simulates industry-driven decision support based on a high volume of data and complex requests. It measures the performance with the TPC-H-specific composite metric queries-per-hour. While the workload specification targets the relational data model, there are implementations for NoSQL DBMSs [67, 111].

An OLAP workload implementation with a focus on NoSQL DBMSs is provided by SSB+ [111]. It extends the established OLAP star schema benchmark (SSB) [36] with support for two NoSQL DBMSs. Therefore, the required data is generated in DBMS-specific formats and the distribution of the data is supported to ensure the scalability of the workload itself compared to the original SSB.

3.2.3 HTAP

HTAP workloads represent the latest workload category by combining OLTP and OLAP characteristics into the HTAP workload class [71, 144]. While OLTP and OLAP workloads are typically handled by separate data management systems, HTAP workloads define OLAP requests over operational data, while defining OLTP requests on the same data. In consequence, HTAP workloads enable addressing the research challenge of determining to which degree OLAP operations decrease the OLTP performance and vice versa [12].

A first approach towards HTAP benchmarks is presented by the CH-benCHmark [52] that combines the execution of the OLTP workload TPC-C [45] and the OLAP workload TPC-H [186]. A similar approach is pursued by HTAPBench [149], also building upon the TPC-C and TPC-H workloads. Moreover, a unified metric for HTAP systems is defined that considers the execution of constantly increasing OLAP requests in correlation with the impact on OLTP performance. Additional approaches for the combination of OLTP and OLAP workloads are presented in [29, 49, 63, 50]. Yet, these approaches focus solely on the relational data model and consequently only on RDBMSs and NewSQL DBMSs, while NoSQL DBMSs also claim to be a suitable alternative for HTAP processing [195].

3.3 Elastic Infrastructure Benchmarks

With the evolvement of the elastic infrastructures cloud, edge and fog, the need for elastic resource benchmarks increases. Consequently, a number of elastic infrastructure benchmarks have been established over the last decade, enabling the comparative evaluation of existing resource offers. As elastic resources represent a crucial part of the DBMS operational model (cf. Section 2.3), the concepts of elastic infrastructure benchmarks are a foundation of this thesis.

3.3.1 Cloud Benchmarks

Web applications already led to a new set of benchmarks that focus on the evaluation of distributed applications, such as the RuBiS benchmark [20] or SPECWeb2. Yet, the characteristics of cloud computing, such as shared resources or fluctuating workloads, demanded new resource-centric benchmark approaches in the late 2000s [32, 34].

As one of the first cloud resource benchmarks, Cloudstone [32] enables the evaluation of multiple cloud resources by providing a distributed web application for its benchmark. It measures performance and provides a composed cost metric. The Cloudstone benchmark stresses the need for automation to enable large-scale evaluations in the cloud. Therefore, it provides a first toolset to automate the evaluation on the AWS Elastic Compute Cloud (EC2). A similar approach is presented by the CloudCmp benchmark [43], enabling the comparison of different public and private cloud providers based on performance and costs. Therefore, CloudCmp provides a set of web applications, comprising a storage-intensive, a compute-intensive and a latency-sensitive application. The scope of CloudCmp considers all kinds of IaaS resources, i.e. compute, storage and network.

The Rain benchmark [39] provides an extensive set of open, closed and partly-open workloads [24] for cloud-hosted web applications. The benchmark targets a dedicated exemplary web application, but can also be applied to generic distributed applications. Yet, it does not consider elastic resource characteristics nor does it provide any support to automate the evaluation on elastic resources.

Benchmarks with an explicit focus on performance and additional aspects, such as the provisioning times of VM types of different cloud providers, are presented by [110, 131, 182, 183, 189] and the SPEC Cloud® IaaS benchmark3. These benchmarks comprise workloads of different cloud application types such as web, high performance computing or financial applications. The applied applications in such benchmarks are often represented by a 3-tier web application or microservice applications such as TeaStore [192] and they are operated on a set of elastic resources.

Approaches that focus not only on VM-based resources but also on container-based resources are provided by [115, 118, 155, 139]. These benchmarks evaluate the performance overhead of VM-based applications in relation to container-based applications under consideration of compute- and storage-intensive workloads. Besides performance, resource isolation is an important evaluation objective for container-based applications. This evaluation objective is addressed by multiple approaches [139, 181, 124], comparing the resource isolation capabilities of VMs with containers, partially focusing on container-based DBMSs.

3.3.2 Edge and Fog Benchmarks

Even though the edge and fog resource offers are at an early stage, first resource- and application-centric benchmarks have been established.

FogExplorer [171] considers infrastructure and application design options for operating IoT applications on fog resources. Thus, the focus lies on the simulation of operational model parameters such as resource capacity, locality or the number of instances [172]. As the approach is purely simulation-based, it does not support the evaluation of real applications on fog resources.

The DeFog [200] benchmark addresses the comparative performance evaluation of applications that are deployed on cloud and edge resources. Therefore, it proposes a standardized methodology with a set of significant metrics. DeFog comprises a set of representative fog workloads with workload-specific metrics.

2 https://www.spec.org/web2009/
3 https://www.spec.org/cloud_iaas2018/

3.4 Advanced Evaluation Frameworks

Besides the performance-centric DBMS and infrastructure benchmarks, there are approaches that address the evaluation of higher-level evaluation objectives such as scalability, elasticity or availability.

BenchFoundry [146] provides a DBMS benchmark for trace-based workloads. Apart from that, it defines and implements a set of additional features that go beyond the capabilities of sole DBMS benchmarks. These features comprise a DSL to specify workloads, post-processing capabilities to enhance the analysis of raw performance results and the support for distributed workload execution. However, the focus of BenchFoundry is the workload execution, while the DBMS operational model is not in its scope.

The DBMS evaluation framework presented by Klems et al. [69, 88] focuses on the objective DBMS scalability on elastic infrastructures by considering DBMS-specific consistency mechanisms. In line with our approach, the framework of Klems et al. [69] emphasizes automation by orchestrating the resource allocation and DBMS deployment. Yet, this framework does not focus on elastic resource characteristics nor does it support runtime adaptations to evaluate the objectives elasticity and availability.

The Bungee framework [116] addresses the elasticity evaluation of auto-scalers for cloud-hosted applications. In particular, it evaluates elasticity in the context of the accuracy of auto-scaling frameworks. While such auto-scaling frameworks can be applied to distributed DBMSs, their evaluation target is not the evaluation of the DBMS-specific elasticity capabilities. Yet, these capabilities need to be thoroughly evaluated to apply auto-scaling for distributed DBMSs [165]. A DBMS-centric elasticity evaluation framework is presented by [55] that enacts DBMS adaptations at runtime to measure the performance development during scale-out/-in adaptations, i.e. elastic scaling. While the framework automates these elastic scaling actions, it does not automate the full evaluation process nor does it consider the characteristics of elastic resources.

With respect to DBMS availability, a first approach is presented by Gao et al. [170] that analyses existing DBMS bugs regarding failure recovery mechanisms. Yet, the study focuses only on the DBMS layer and does not consider potential failures of elastic resources nor their impact on the availability of the DBMS cluster. The Gremlin approach [133] presents a failure injection framework for evaluating the availability of microservices. Hereby, a failure model describes the potential failures on the network level and injects them into a running system, analysing the latency impact on the client side. Yet, Gremlin only considers network failures of stateless applications, without focusing on stateful services and resource failures.

3.5 Summary

This chapter summarized related benchmarking approaches for DBMSs and elastic infrastructures. Regarding DBMS benchmarking, numerous benchmarks were presented that target the evaluation of DBMSs for dedicated data models, i.e. relational, NoSQL or NewSQL. These benchmarks offer a variety of 3.5 Summary 31

OLTP, HTAP and OLAP workloads that range from synthetic to trace-based workloads. They target the evalu- ation objective performance by assuming a given DBMS operational model and consequently their method- ology only comprises the workload domain, excluding DBMS runtime parameters and elastic resources as depicted in Figure 3.1. In consequence, these benchmarks provide the foundation for higher-level evaluation objectives such as scalability, elasticity or availability, but do not provide a methodical approach for their evaluation nor do they consider the characteristics of elastic resources. Elastic infrastructure benchmarks target the performance of elastic resources by assuming dynamic oper- ational models. This results in the explicit consideration of elastic resource characteristics, such as virtual- ization technologies, resource interferences or different storage backends. In this context, numerous bench- marks with custom workloads were presented. These benchmarks either target a dedicated elastic resource or exemplary 3-tier web applications. As elastic infrastructure benchmarks focus on evaluating the performance of dedicated resources or the overall application performance, evaluating the dedicated performance of dis- tributed DBMS under the consideration of elastic infrastructure characteristics is not in their scope. Moreover, the orchestration of elastic resources to evaluate higher-level non-functional features such as scalability and elasticity is not considered by the presented elastic infrastructure benchmarks. The focus of both benchmarking areas relies on the evaluation objective performance and consequently, they provide comprehensive performance evaluation methodologies. Yet, higher-level evaluation objectives are not in the scope of the presented benchmarks even if some benchmarks outline scalability and elasticity methodologies. Yet, these methodologies are not explicitly supported by the respective benchmarks nor do these methodologies address the characteristics of elastic infrastructures. There is a small number of advanced evaluation frameworks that target the isolated evaluation of higher- level evaluation objectives by providing required orchestration capabilities on resource or application level. But these frameworks address only the evaluation of a single higher-level evaluation objective and their focus either lies solely on the resource level or on stateless applications. In order to address the shortcomings of the presented approaches, we argue that a novel methodology for evaluating distributed DBMS on elastic infrastructures is required that addresses the shortcomings of the presented benchmarking approaches: (i) the consolidation of the orthogonal DBMS and elastic infrastructure benchmarking concepts; (ii) support for higher-level evaluation objectives; (iii) explicit consideration of the DBMS operational model. In order to address these challenges, this thesis builds upon three pillars: the presented research areas of DBMS and elastic infrastructure benchmarking, and the orchestration of distributed applications (cf. Sec- tion 2.1.4) as depicted in Figure 3.1. We argue that performance evaluation methodologies of the orthogonal DBMS and elastic resource benchmarking areas need to be consolidated and extended to enable comprehen- sive DBMS evaluations on elastic infrastructures as depicted in Figure 3.1. 
The required methodology needs to be extended with DBMS distribution characteristics to enable their direct correlation with elastic resource characteristics. Besides, resource and application orchestration concepts need to be included in this methodology to enable runtime adaptations, which are a crucial requirement for the higher-level DBMS evaluation objectives scalability, elasticity and availability. The methodological contributions that address these challenges are presented in Chapter 4. Chapter 5 presents novel evaluation methods that build upon these concepts and enable multi-objective evaluations of distributed DBMSs on elastic infrastructures, including supportive methods to automate the full evaluation process.

Figure 3.1: Related research concepts

Chapter 4

Methodological DBMS Evaluation

This chapter summarizes selected findings of the publications [core1] included in Chapter 8, [core2] included in Chapter 9, [core3] included in Chapter 10, [core4] included in Chapter 11, [core5] included in Chapter 12, [core6] included in Chapter 13, [core7] included in Chapter 14, [core8] included in Chapter 15, [core9] included in Chapter 16, [core10] included in Chapter 17 and [core11] included in Chapter 18 that contribute to the novel evaluation methodology for distributed DBMSs on elastic infrastructures. In particular, these publications address the first research objective:

RO-A: Establishing a comprehensive DBMS evaluation methodology that enables the reproducible evaluation of high-level non-functional DBMS features (scalability, elasticity, availability) by explicitly considering elastic resource characteristics.

Based on the identified shortcomings of related DBMS and elastic infrastructure benchmarks presented in Chapter 3, we provide a comprehensive evaluation methodology for evaluating distributed DBMSs on elastic infrastructures. In particular, the following shortcomings are addressed.

Separated Evaluation Domains DBMS and elastic infrastructure benchmarks are evolving in orthogonal directions. DBMS benchmarks primarily focus on the evaluation objective performance under the consideration of manifold workload models while assuming a static DBMS operational model. Elastic infrastructure benchmarks focus on resource characteristics by applying resource-driven workloads on entire distributed applications without considering the characteristics of distributed DBMS.

Separated Evaluation Principles Besides the manifold benchmarks of the DBMS and elastic infrastructure domains, an extensive number of evaluation guidelines have been established that define the key principles for significant DBMS [9, 94, 180, 190] and elastic infrastructure [68, 100, 148, 201] evaluations. While these principles are dedicated to the DBMS or elastic infrastructure domains and partially overlap, a cross-domain view on these principles is missing.

High-level Evaluation Objectives The advances of distributed DBMSs and elastic infrastructures enable non-functional features that go beyond performance, i.e. scalability, elasticity and availability. In consequence, such high-level non-functional features need to be addressed as evaluation objectives. Yet, existing benchmarks mainly focus on performance and only a small set of evaluation frameworks target high-level evaluation objectives. Those which do, consider only a limited subset of these high-level evaluation objectives and their primary focus lies on stateless applications.


Figure 4.1: Evaluation impact factors

The findings that address these limitations and establish a comprehensive DBMS evaluation methodology are presented in the following: Section 4.1 derives the domain-specific impact factors that need to be considered for comprehensive DBMS evaluations on elastic infrastructures. Section 4.2 aligns DBMS and elastic infrastructure benchmarking principles, and Section 4.3 defines the fine-grained evaluation process for the evaluation objectives performance, scalability, elasticity and availability.

4.1 Evaluation Impact Factors

In general, large software systems offer an extensive number of configurable parameters which can have a direct impact on the non-functional features [74, 121, 126] (cf. Section 2.2.3). In the context of DBMSs operated on elastic infrastructures, the plethora of configuration options is a result of the continuously evolving elastic infrastructure and DBMS domains. As these configuration options need to be considered in the DBMS operational model and consequently in the evaluation of the non-functional DBMS features, they are referred to as DBMS and resource impact factors in this thesis. Besides, the specification of the application-specific workload adds a third impact factor category that needs to be considered for comprehensive DBMS evaluations. In order to define a comprehensive evaluation methodology, the findings of this section address the following research question:

RQ-A.1 What are relevant impact factors of the distributed DBMS, elastic resource and workload domain that need to be considered in a comprehensive evaluation methodology?

The identified impact factors are classified into three main categories that need to be considered when designing DBMS evaluations for elastic infrastructures as depicted in Figure 4.1. Each of these impact factor categories is further classified into its domain-specific impact factors. Thus, it is distinguished between technology-specific impact factors and evaluation-specific impact factors. Technology-specific impact factors are predefined by the considered technologies for designing the evaluation, e.g. the DBMS data model [86] or the elastic resource types [114], but also comprise external events that cannot be specified during the evaluation design, such as resource variations [44] or resource failures [132]. Evaluation-specific impact factors offer a configurable but technology-specific range of configuration options that need to be considered in the evaluation design. While we address the evaluation of non-functional DBMS features on elastic infrastructures, studies that evaluate the functional features of DBMSs [51, 54, 86, 150, 165, 203] and elastic resources [58, 73, 108, 134, 176, add7] are not in the scope of this thesis. But these studies contribute with their results to our classification of technology-specific impact factors.

Figure 4.2: DBMS impact factors

The resulting classification represents a systematic overview of general domain-specific impact factors that need to be mapped to concrete domain technologies during the evaluation design. While this classification represents the current state of the art of impact factors, all three domains are constantly evolving and the classification might require extensions with future resource types, DBMS concepts or workload types. Hereafter, we present the key technology- and evaluation-specific impact factors for each domain.

DBMS Impact Factors The DBMS impact factors are guided by the data distribution techniques (cf. Section 2.2.2), resulting in the classification data model, consistency model, architecture, sharding techniques and replication [core1, core11] as shown in Figure 4.2. The data model represents a technology-specific impact factor and its selection is a central decision in the DBMS selection process [203]. Consequently, the number of DBMS technologies for designing the non-functional feature evaluation is limited by the eligible data models. This also applies to the consistency model, where distributed DBMSs implement heterogeneous consistency models, resulting in a multitude of technology-specific impact factors. Depending on the DBMS and the consistency model, there are DBMS-specific client consistency settings that offer a range of evaluation-specific configurations. The supported architectures are DBMS-specific, but the actual cluster size represents an evaluation-specific impact factor. For distributed DBMS architectures, sharding and replication are DBMS-specific impact factors that offer a set of evaluation-specific configuration options, for instance, range- and hash-based sharding strategies, and, for replication, a configurable replication factor and replication area for cluster- or geo-replication.

Elastic Resource Impact Factors The elastic resource impact factors are guided by the concepts of elastic infrastructures (cf. Section 2.1). The impact factors are classified into provider type, virtualization type, operating system, resource capacity, locality and interference [core10, core5, core11] as depicted in Figure 4.3. The provider type represents a technology-specific impact factor that narrows down the eligible resource providers for the evaluation design. For each resource provider, the impact factors virtualization type, resource capacity and locality comprise a set of evaluation-specific impact factors that offer a provider-specific range of configuration options. The virtualization type is distinguished into bare metal, hardware virtualization (HW) and operating system (OS) virtualization. The impact factor resource capacity is distinguished into the evaluation-specific impact factors compute, i.e. number of cores and memory size, storage, i.e. storage type and capacity, and network, i.e. network bandwidth. It is noteworthy that identical resource capacities do not necessarily result in identical resource performance if the resources are virtualized, because the underlying hardware can differ even within the same resource provider. The locality impact factor defines the locality of the resources to operate the distributed DBMS and is further classified into cloud, edge and fog. The cloud locality is distinguished into the evaluation-specific impact factors server, rack, zone and region. Regarding the edge and fog locality, a common classification of locality entities has not yet been established (cf. Section 2.1.3). The interference impact factors are technology-specific and depend on the selected resource provider. In consequence, they are not directly configurable during the evaluation design, but still need to be considered as impact factors. The interference impact factors are classified into noisy neighbour and failure probability as typical impact factors on elastic infrastructures [139, 132]. The noisy neighbour impact factor represents runtime variations in the resource performance which are caused by additional services operating on the same underlying resources. If the resource placement is configurable, e.g. in a private cloud with access to the resource scheduling mechanisms, the noisy neighbour impact factor becomes an evaluation-specific impact factor. The failure probability represents the probability that compute, network or storage resources might fail.

Workload Impact Factors The workload impact factors are derived from established DBMS benchmarks (cf. Section 3.2). The resulting workload impact factors are depicted in Figure 4.4, where the technology-specific workload type represents the major impact factor in the evaluation design as it is decisive for realistic evaluations. As the workload type depends on the applied benchmark technology, it is classified into OLTP, HTAP and OLAP. A further technology-specific impact factor is the workload generation model (cf. Section 3.2) that defines the temporal development of the workload intensity over the evaluation runtime. The workload generation model is dependent on the applied benchmark and classified into open, partially open and closed workload generation [24]. Evaluation-specific impact factors represent the data distribution and request characteristics [core6]. The data characteristics are distinguished into the evaluation-specific impact factors data set size, i.e. the number of records for the evaluation, and record size, i.e. the size of a single record. Both impact factors are dependent on the selected benchmark technology. The request characteristics are distinguished into request distribution, i.e. the composition of different request types within the workload; the intensity, i.e. the number of parallel requests executed by the benchmark; and the variance, i.e. constant or fluctuating workload intensities over the evaluation runtime [core6]. These impact factors are evaluation-specific and their configuration range is dependent on the applied benchmark. In addition, the workload generation represents a technology-specific impact factor that is dependent on the applied benchmark technology (cf. Section 3.2).

Figure 4.3: Resource impact factors

Figure 4.4: Workload impact factors

4.2 Cross-Domain Evaluation Principles

The methodological evaluation of DBMSs and elastic infrastructures is supported by numerous domain-specific evaluation guidelines that are part of DBMS benchmarks (cf. Section 3.2) and elastic infrastructure benchmarks (cf. Section 3.3). These guidelines define key Evaluation Principles (EPs) for implementing meaningful evaluation methods. While these EPs are established for their respective domain, a cross-domain view that considers DBMS and elastic infrastructure principles in a coherent context is missing. Therefore, this section addresses the following research question:

RQ-A.2 Which evaluation principles of the DBMS and elastic resource domains need to be adopted by a comprehensive DBMS evaluation methodology?

Established DBMS evaluation principles [9, 94, 146, 180, 190] and elastic infrastructure evaluation principles [68, 100, 148, 201] are refined for evaluating distributed DBMS on elastic infrastructures [core2, core3, core9, core5]. Moreover, this cross-domain view is extended with the novel evaluation principles automation and orchestration. Both are of fundamental importance for enabling higher-level evaluation objectives [core3, core5]. Hereafter, the resulting eight cross-domain evaluation principles are presented.

EP1: Usability The usability of supportive evaluation methods is a general principle of both domains [9, 94, 68, 100] to enable the adoption of the respective method. Complex methods not only hinder their adoption, but also limit their credibility if their control is not exposed transparently. In consequence, expressive interfaces are needed that enable human- but also machine-based interaction in a simple and intelligible manner.

EP2: Extensibility As both domains are continuously evolving and new technologies of the respective domains [9, 94, 100, 148, 180] appear frequently on the market, an extensible method ensures that the evaluation keeps pace with these developments by considering new resource providers, resource offers, new DBMS technologies including their runtime configurations and future workload types. Moreover, prospective evolvements of these domains might require extensions to support additional evaluation objectives.

EP3: Significance The EP significance combines the statistical analysis of the results in a mathematical context and their expressiveness in the context of DBMS evaluations. Significant evaluation methods apply relevant and realistic domain technologies [9, 94, 100, 148]. In consequence, they need to support realistic DBMS workload types for a wide range of DBMS technologies, evaluation-specific configurations and the usage of eligible elastic resource offers. Furthermore, as the extent of evaluation-specific impact factors depends on the application-specific evaluation design, analytical methods are required to ensure the significance of the evaluation results. These methods need to collect not only the evaluation metrics but also evaluation-specific metadata such as the applied resources and DBMS configurations [core5], DBMS utilization metrics and system utilization metrics [94, 148, 180] to enable cross-relations between evaluation metrics and evaluation-specific metadata. Therefore, the resulting data sets need to be provided in machine-interpretable formats [190] to ease the usage of advanced post-processing methods to analyse cross-relations and increase the significance of the obtained results [201].

Apart from that, we argue that evaluation methods for higher-level evaluation objectives including runtime adaptation, i.e. elasticity and availability, need to provide fine-grained execution traces of the adaptation actions to ensure significant evaluation results [core4, core6].

EP4: Reproducibility Reproducibility is a continuous principle for DBMS [9, 94, 146] and elastic infrastructure evaluations [68, 100, 148, 201] with the overarching goal of reproducible evaluation results for identical evaluation conditions. Reproducible evaluation results are achieved by the following key concepts that enable a reproducible evaluation execution: (i) providing the complete set of technical configurations that have been applied to obtain the results; (ii) providing supportive software-based tooling that encapsulates the technical configurations and enforces the identical evaluation execution. The rapidly evolving resource offers and underlying technologies of the elastic infrastructure domain challenge the reproducibility of identical evaluation results, as obtained evaluation results can become outdated rather quickly, even if the resource offers do not change. For instance, if provider X offers VM_SMALL with 2 cores and 4GB, and the underlying CPU of the physical server is updated from the 2015 Intel Broadwell to the 2018 Coffee Lake generation, this might not be visible to the customer. Moreover, distributed DBMSs can build upon non-deterministic implementations that can affect the evaluation results [160]. In consequence, evaluation results may not be identical for identical evaluation conditions and can diverge. Therefore, a move from reproducible evaluation results towards a reproducible evaluation execution by providing supportive software-based methods has already been emphasized in 2006 [23], arguing that the exact reproduction of evaluation results is rarely possible, and typically unnecessary. This aspect becomes even more important in the context of elastic infrastructures and their characteristics of shared resources [100]. Papadopoulos et al. [201] take up the concepts of Feitelson et al. [23] and define the provisioning of software-based methods and comprehensive data sets as a key concept for enabling the reproducibility of evaluations on elastic infrastructures. Thus, in the context of evaluating distributed DBMSs on elastic infrastructures, we define the EP reproducibility as the reproducible evaluation execution that requires a comprehensive experiment specification along with the methods to ensure the deterministic evaluation execution. These concepts enable the control of technology- and evaluation-specific impact factors and ensure the transparent disclosure of what is measured and how it is obtained.

EP5: Abstraction The elastic resource, DBMS and workload domains add domain-specific technologies and increase the technical complexity for designing and executing evaluations. Therefore, a reasonable level of abstraction is required to allow the evaluation execution for a multitude of domain-specific technologies within one evaluation method [core3, 190]. We argue that this abstraction is required for the evaluation design and the experiment execution [core5]. Regarding the evaluation design, the concept of cloud modelling languages [add2] needs to be applied to provide the DBMS operational model as an abstract deployment model [core5] along with a workload model. For the experiment execution, if possible, technical details of each domain need to be abstracted by the usage of supportive COTs [core2, core9]. For instance, elastic resource provider interfaces can be abstracted to ease the access to multiple elastic resource providers [add7]. Regarding the workload domain, benchmarks can abstract multiple workload types by providing a uniform interface [82].

EP6: Portability The portability principle builds upon the abstraction principle and enables transferable DBMS evaluations across different domains by replacing dedicated domain-specific parameters [9, 68]. Therefore, we define that portability needs to be enabled on the evaluation design and experiment execution level. The experiment specification needs to allow the exchange of domain-specific parameters, such as the resource provider, the resource offer, the DBMS technology or the workload type, while the experiment execution still needs to be supported by the evaluation method [core9, core5]. Consequently, portability enables the comparative evaluation of technology-specific and evaluation-specific impact factors.

EP7: Automation Automation is a key concept for operating services on elastic infrastructures as it improves the reliability of a system by limiting human errors in repetitive and technically complex tasks [202]. In consequence, it is also a mandatory principle for evaluating cloud services [189], which is adopted for the evaluation of elastic resources by automating the elastic resource allocation tasks [106, 90]. Yet, the evaluation of distributed DBMS on elastic infrastructures comprises a multitude of additional interdependent tasks for each domain, such as allocating the resources, deploying and configuring the DBMS cluster, deploying the benchmark and executing the workload, and processing the evaluation objective metrics. Thus, we argue that their manual execution is technically extremely complex, time consuming and error prone due to the required domain knowledge and the domain-specific technologies [core9, core5]. In consequence, a supportive automation method is required that executes these tasks transparently for the performance engineer. Hence, we argue that the implementation of the automation principle is of crucial importance to increase reproducibility (EP4) and portability (EP6) by ensuring the deterministic experiment execution for varying domain-specific parameters [core5].

EP8: Orchestration Orchestration extends the automation principle and represents a key principle to enable the evaluation of the higher-level evaluation objectives elasticity and availability, which require the time- and event-based coordination of evaluation tasks at evaluation runtime [core3, core9]. These tasks comprise actions such as scaling the DBMS cluster based on predefined triggers [core8, core6] for evaluating elasticity or injecting failures on different levels to evaluate availability [core4, core7]. Moreover, workload orchestration is emphasized by [94, 148, 190] to enable partially open and open workload models [24, 148], enabling the emulation of realistic workload patterns. Therefore, we emphasize that orchestration is a key principle for the methodological DBMS evaluation to enable the holistic automation of higher-level evaluation objectives [core5, core6, core7]. Besides, as orchestration ensures the transparent execution of each evaluation task, it increases the significance (EP3) of the results and the reproducibility (EP4) of their execution by providing comprehensive execution traces [core6].

4.3 Evaluation Design

By building upon the identified impact factors and the cross-domain evaluation principles, we specify the evaluation design space for the evaluation objectives performance, scalability, elasticity and availability. In this context, the following research questions are addressed:

RQ-A.3 What are significant metrics to evaluate the higher-level non-functional DBMS features scalability, elasticity and availability?

Figure 4.5: Performance evaluation process

RQ-A.4 What are the required tasks that define the non-functional, feature-specific evaluation process and enable the measurement of the higher-level metrics resulting from RQ-A.3?

The evaluation design for each evaluation objective is defined by a set of quantifiable and evaluation objective-specific metrics. These metrics are measured by an objective-specific evaluation process that defines all required tasks under consideration of the domain-specific impact factors defined in Section 4.1.

4.3.1 Performance Evaluation Design

The evaluation objective performance represents the primary evaluation objective for DBMS evaluations [9] and provides the foundation for the higher-level evaluation objectives scalability, elasticity and availability [core3]. In consequence, its evaluation design in terms of evaluation process and metrics has been well established over the last decades [9, 180] and we build upon these established concepts by extending them to enable reproducible and automated evaluations on elastic resources.

4.3.1.1 Metrics

The DBMS performance metrics are throughput and latency [9]. These metrics are measured from the client’s perspective during the workload execution. Throughput represents the number of requests that are processed per time unit. Latency is measured per request and represents the round trip time for one request. Depending on the selected benchmark, these metrics are reported in configurable time intervals and as aggregated functions over the entire evaluation runtime.
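To make the reporting explicit, the following minimal formalization in our own notation (not taken from the cited benchmarks) states both metrics for a reporting interval [t_i, t_{i+1}) and a single request r:

\[
\text{throughput}(t_i, t_{i+1}) = \frac{\lvert \{\, r \mid t_i \le t_{\text{complete}}(r) < t_{i+1} \,\} \rvert}{t_{i+1} - t_i},
\qquad
\text{latency}(r) = t_{\text{complete}}(r) - t_{\text{issue}}(r)
\]

Aggregated functions over the evaluation runtime are then, for instance, the mean throughput over all reporting intervals and the mean, median or 95th/99th percentile of the per-request latencies.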

4.3.1.2 Evaluation Process

We define the comprehensive performance evaluation process for distributed DBMS on elastic infrastructures as depicted in Figure 4.5, with the Evaluation Tasks (ETs) described in Table 4.1 [core5]. Apart from the workload execution, this process explicitly considers the resource-specific and DBMS-specific tasks to provide a comprehensive evaluation design.

Table 4.1: Performance evaluation tasks

ET 1: Selecting and allocating eligible resources by considering the resource impact factors. Each process execution allocates a new set of resources to ensure fair evaluations by avoiding cached data of previous iterations on the OS and DBMS level [180].
ET 2.1: Deploying and configuring the DBMS cluster on the allocated resources under consideration of applicable DBMS impact factors.
ET 2.2: Deploying and configuring a monitoring service to collect resource and DBMS utilization over the evaluation runtime. This task enables the identification of bottlenecks and increases the significance of the results.
ET 3: Applying an application-specific benchmark and configuring the desired workload. As there are numerous DBMS benchmarks [core3, 154], their selection and configuration represents a central task to ensure meaningful results [94, 180].
ET 4: Executing the specified workload via the benchmark to measure the performance metrics.
ET 5: Releasing the allocated resources after the workload execution is finished.
ET 6: Processing the performance objective by analysing the performance metrics in conjunction with the collected monitoring data and the applicable impact factors.

4.3.2 Scalability Evaluation Design

The scalability evaluation design builds upon the performance design and extends its scope by considering varying resource capacities and DBMS cluster sizes to evaluate the vertical and horizontal scalability capabilities (cf. Section 2.2.3) in the context of application-specific workloads. It is noteworthy that an evaluation-specific configuration, such as resource capacity X or DBMS cluster size Y, is applied over an entire scalability process execution and is not adapted at evaluation runtime as depicted in Figure 4.6.

4.3.2.1 Scalability Metrics

The scalability metric scalability factor builds upon the performance metrics and denotes the improvement in latency and throughput by correlating different resource capacities (i.e. vertical scalability) or DBMS cluster sizes (i.e. horizontal scalability) [40]. Accordingly, good horizontal scalability is indicated by a constant latency as well as a throughput increasing proportionally with the cluster size for a specified workload, provided the workload is high enough to saturate the DBMS cluster [40]. Ideally, not only the cluster size, but all available resources and DBMS runtime configurations need to be considered. Yet, the scalability factor exclusively measures the performance difference between two independent cluster sizes and does not consider the transition phase from cluster size CS_N to cluster size CS_N + 1 or CS_N − 1, as this is part of the elasticity design (cf. Section 4.3.3).
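A minimal formalization consistent with this description (the exact definition follows [40] and its refinement in [core8]) relates the measured throughput of two independently evaluated cluster sizes CS_N and CS_M with M > N under the same workload:

\[
\text{sf}_{\text{throughput}}(CS_N, CS_M) = \frac{\text{throughput}(CS_M) \,/\, \text{throughput}(CS_N)}{M / N}
\]

A value of approximately 1 indicates linear horizontal scalability, values below 1 indicate sub-linear scalability, and the latency-based counterpart additionally requires that the latency remains constant between CS_N and CS_M. The vertical scalability factor is obtained analogously by replacing the cluster sizes with the scaled resource capacities.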

4.3.2.2 Evaluation Process

The derived scalability evaluation process is depicted in Figure 4.6. It extends the performance evaluation process by adding the iterative task ET_6 to correlate different resources and DBMS cluster sizes for evaluating the vertical and horizontal scalability capabilities [core8, core5]. The ETs are described in Table 4.2.

Table 4.2: Scalability evaluation tasks

ET 1: Selecting and allocating eligible resources by considering the resource impact factors. Each process execution allocates a new set of resources to ensure fair evaluations by avoiding cached data of previous iterations on the OS and DBMS level [180].
ET 2.1: Deploying and configuring the DBMS cluster on the allocated resources under consideration of applicable DBMS impact factors.
ET 2.2: Deploying and configuring a monitoring service to collect resource and DBMS utilization over the evaluation runtime. This task enables the identification of bottlenecks and increases the significance of the results.
ET 3: Applying an application-specific benchmark and configuring the desired workload. In order to ensure sufficient workload intensities to saturate even large DBMS clusters, the scalability of the workload itself is mandatory and multiple workload instances might be required.
ET 4: Executing the specified workload via the benchmark to measure the performance metrics.
ET 5: Releasing the allocated resources after the workload execution is finished.
ET 6: Scaling the resource capacities/DBMS cluster size and executing the next scalability iteration with a new resource and DBMS cluster deployment.
ET 7: Processing the scalability objective by correlating the performance metrics with the applied resource/DBMS cluster sizes to calculate the scalability factor. Besides, the monitoring data is analysed to identify bottlenecks and to increase significance.

4.3.3 Elasticity Evaluation Design

The elasticity evaluation design builds upon the (horizontal) scalability design, as horizontal scalability and consequently a distributed DBMS architecture are the mandatory requirements to provide elasticity. In contrast to scalability, DBMS elasticity represents the ability to cope with fluctuating workloads at runtime by adapting the DBMS cluster size accordingly [40, 46]. Consequently, the adaptation of the DBMS cluster at evaluation runtime is a mandatory task in the elasticity evaluation design. The adaptation of the workload represents an optional task that is required to emulate fluctuating workloads. The DBMS and workload adaptations can be defined by time- or monitoring-data-based triggers [core6]. Moreover, considering the DBMS utilization is an important aspect for designing significant elasticity evaluations, including realistic adaptation triggers [core8, core6].

Figure 4.6: Scalability evaluation process

If the DBMS utilization states are not known beforehand, the performance of the initial DBMS cluster, its monitoring data and the applied workload intensity need to be correlated to determine DBMS-specific utilization states in a calibration phase. This calibration phase comprises multiple iterations of the performance evaluation process with varying workload intensities [core8, core6]. As a result, each elasticity evaluation comprises at least three phases: (i) an initial phase with a predefined DBMS operational model and a predefined workload; (ii) an elastic scale phase, executing a DBMS adaptation; (iii) a stabilization phase after the adaptation is finished [core6]. Additional phases are workload adaptation phases or additional elastic scale phases.

Elasticity Metrics While elasticity metrics are a widely discussed aspect in cloud computing research [174], we focus specifically on elasticity metrics for DBMSs. The DBMS elasticity metrics proposed by [53, 75] calculate a dimensionless metric by considering only a subset of the elasticity evaluation design space [core6]. We build upon the descriptive elastic speedup metric defined by Cooper et al. [40] and extend it by defining the quantifiable elasticity metrics provisioning time, data distribution impact and data distribution time that need to be measured during the elastic scale-out/-in task [core6].
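One possible formalization of these metrics, written in our own notation as an illustration (the precise definitions are given in [core6]), uses the trigger time t_trigger of the adaptation, the time t_join at which the added node serves requests, and the time t_stable at which the data rebalancing has completed:

\[
\text{provisioning time} = t_{\text{join}} - t_{\text{trigger}},
\qquad
\text{data distribution time} = t_{\text{stable}} - t_{\text{join}}
\]
\[
\text{data distribution impact} = 1 - \frac{\overline{\text{throughput}}\,[t_{\text{join}}, t_{\text{stable}}]}{\overline{\text{throughput}}\,[t_{0}, t_{\text{trigger}}]}
\]

where the overline denotes the mean throughput in the given interval and t_0 marks the start of the stable initial phase.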

Evaluation Process We define the elasticity evaluation process as depicted in Figure 4.7. It comprises the required tasks that need to be executed to measure established elasticity metrics [53, 75, core6]. Executing solely the required tasks enables the elasticity evaluation based on a constant workload intensity. Including the optional tasks enables the elasticity evaluation under fluctuating workloads [core6]. The dedicated elasticity evaluation tasks are described in Table 4.3.

4.3.4 Availability Evaluation Design

The availability of distributed storage systems can be affected by a multitude of software- or hardware-related events [98, 41]. In this thesis, we evaluate the availability of distributed DBMSs under consideration of elastic resource failures [core4, core7]. Therefore, the availability evaluation design builds upon chaos engineering concepts [128] and applies failure injection as a key concept for evaluating DBMS availability [core4, core7].

Table 4.3: Elasticity evaluation tasks

ET 1: Selecting and allocating eligible resources by considering the resource impact factors. Each process execution allocates a new set of resources to ensure fair evaluations by avoiding cached data of previous iterations on the OS and DBMS level [180].
ET 2.1: Deploying and configuring the DBMS cluster on the allocated resources under consideration of applicable DBMS impact factors.
ET 2.2: Deploying and configuring a monitoring service to collect resource and DBMS utilization over the evaluation runtime. This task enables the identification of bottlenecks and increases the significance of the results.
ET 2.3 (optional): If fluctuating workloads are specified, deploying an additional monitoring service to monitor the workload execution over the evaluation runtime.
ET 3: Applying an application-specific benchmark and configuring the desired workload. In order to enable the (optional) workload adaptation in the subsequent task ET_6 without additional orchestration services, the selected benchmark needs to support a partially open or open workload model [24, 148].
ET 4: Executing the specified workload via the benchmark to measure the performance metrics.
ET 5: Adapting the DBMS cluster by processing the specified adaptation triggers. The trigger processing uses predefined thresholds and the combination of time, system and DBMS monitoring data.
ET 6 (optional): Adapting the workload intensity by processing the workload adaptation triggers, using time or workload monitoring data.
ET 7: Releasing the allocated resources after the workload execution and all adaptation tasks are finished.
ET 8: Determining the dedicated elasticity phases and processing the elasticity objective by taking into account the performance metrics for each identified phase, the applied adaptations, and the collected monitoring data.

Due to the different virtualization levels of elastic infrastructures, a fine-grained failure model is required to cover physical and virtual resource failures [core4]. In consequence, a key task of the availability evaluation is the injection of resource failures on different levels. Apart from that, we also consider failure injection on the DBMS level as an additional failure type [core7]. To provide a comprehensive availability evaluation design, failure-dependent recovery tasks represent an additional key design concept [core4, core7]. Including the recovery task ensures realistic evaluations by incorporating the performance impact of the recovery task on the DBMS cluster [core7]. As a result, the availability evaluation comprises at least four phases: (i) a healthy phase with a predefined DBMS cluster; (ii) an unhealthy phase in which a failure has been injected; (iii) a recovery phase in which the recovery task is executed; (iv) a recovered phase after the recovery task is finished [core7].

Figure 4.7: Elasticity evaluation process

In addition, an availability evaluation can comprise multiple failure injections and consequently multiple unhealthy and recovery phases [core7].

Metrics Similar to elasticity, distributed system and cloud research proposes different availability metrics [174]. But as these do not explicitly focus on distributed DBMSs, we build upon the performance metrics in relation to the different availability evaluation phases and define a novel set of DBMS availability metrics [core4, core7]. These metrics enable the quantification of the DBMS availability in case of DBMS and elastic resource failures. The resulting metrics are the performance impact, accessibility and request error rate that can be applied for each availability evaluation phase [core4, core7].
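As an illustration in our own notation (the precise definitions are given in [core4, core7]), two of these metrics can be stated per availability phase p relative to the healthy phase:

\[
\text{performance impact}(p) = 1 - \frac{\overline{\text{throughput}}(p)}{\overline{\text{throughput}}(\text{healthy})},
\qquad
\text{request error rate}(p) = \frac{\text{failed requests}(p)}{\text{issued requests}(p)}
\]

Accessibility can then be understood as the fraction of the phase duration (or of the issued requests) in which the DBMS cluster answers client requests at all; a value below 1 indicates that the cluster was at least partially unreachable during the phase.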

Evaluation Process In order to measure the defined availability metrics, we define a comprehensive availability evaluation process [core4, core7] that is depicted in Figure 4.8. The key concept of the availability evaluation process is the failure injection during the workload execution to emulate the unhealthy phase. In addition, the optional execution of failure- and DBMS-dependent recovery tasks enables the emulation of the recovery phase to reach the recovered phase. The tasks are described in Table 4.4.

Table 4.4: Availability evaluation tasks

ET 1: Selecting and allocating eligible resources by considering the resource impact factors. Each process execution allocates a new set of resources to ensure fair evaluations by avoiding cached data of previous iterations on the OS and DBMS level [180].
ET 2.1: Deploying and configuring the DBMS cluster on the allocated resources under consideration of applicable DBMS impact factors.
ET 2.2: Deploying and configuring a monitoring service to collect resource and DBMS utilization over the evaluation runtime. This task enables the identification of bottlenecks and increases the significance of the results.
ET 3: Selecting an application-specific benchmark and configuring the desired workload. The benchmark needs to provide fine-grained performance metrics over the evaluation runtime to enable the processing of the phase-specific availability metrics.
ET 4: Executing the specified workload via the benchmark to measure the performance metrics.
ET 5: Injecting a predefined failure on DBMS or elastic resource level to trigger the transition of the DBMS cluster into the unhealthy phase.
ET 6 (optional): Executing a predefined recovery action that is dependent on the injected failure and the applied DBMS.
ET 7: Releasing the allocated resources after the workload execution and the recovery execution are finished.
ET 8: Determining the dedicated availability phases and processing the availability objective by taking into account the performance metrics for each identified phase, injected failures, executed recovery actions, and the collected monitoring data.

Figure 4.8: Availability evaluation process

4.4 Summary

This chapter presented the findings that contributed to addressing RO-A by defining a comprehensive evaluation methodology for distributed DBMSs on elastic infrastructures. This methodology was built upon three key concepts. First, the identification of the domain-specific impact factors that need to be considered for designing DBMS evaluations; the resulting impact factors were classified into technology-specific and evaluation-specific impact factors. Secondly, the DBMS and elastic infrastructure evaluation principles were reviewed and extended to establish cross-domain evaluation principles for the evaluation of distributed DBMS on elastic infrastructures. Thirdly, the evaluation design space for performance, scalability, elasticity and availability was defined. Each evaluation objective design comprised the relevant metrics and a comprehensive evaluation process with dedicated evaluation tasks. The resulting evaluation methodology provided comprehensive concepts for designing and executing DBMS evaluations on elastic infrastructures. Yet, it still demanded extensive domain knowledge and technical expertise in the DBMS, elastic infrastructure and workload domains. Furthermore, the sheer number of DBMS operational models that come into consideration as evaluation targets makes the manual execution of the respective evaluation processes practically impossible. In consequence, novel technological methods are required that are able to consider the identified impact factors, follow the cross-domain evaluation principles and implement the design of the respective evaluation objective. In Chapter 5, we present the novel DBMS evaluation framework Mowgli that adopts the presented methodology and provides the technical methods to automate the full evaluation process for each evaluation objective.

Chapter 5

Methods for the Automated Evaluation of Non-functional DBMS Features

This chapter builds upon the established evaluation methodology of Chapter 4 and summarizes selected findings of the publications [core2] included in Chapter 9, [core5] included in Chapter 12, [core6] included in Chapter 13, [core7] included in Chapter 14, and [core10] included in Chapter 17 that present the resulting technical methods for addressing the second research objective:

RO-B: Providing the technical evaluation methods that enable the automated execution of the comprehensive evaluation methodology established in Research Objective A.

This chapter is structured into four sections, which address the accompanying research questions RQ-B.1 - RQ-B.4. Section 5.1 presents the findings on the specification of comprehensive DBMS evaluations. Building upon these specifications, Section 5.2 presents Mowgli, the novel DBMS evaluation automation framework for the evaluation objectives performance and scalability. Section 5.3 summarizes the evaluation frameworks Kaa and King Louie that enable the automated evaluation of the higher-level evaluation objectives elasticity and availability. Section 5.4 presents the scope of the evaluation data sets that can be generated with these novel evaluation methods before Section 5.5 summarizes this chapter.

5.1 DBMS Evaluation Templates

The comprehensive evaluation of distributed DBMS on elastic infrastructures requires the consideration of the overarching DBMS, elastic resource and workload impact domains that comprise a multitude of domain-specific impact factors (cf. Section 4.1). In consequence, the following research question is addressed to enable the automated evaluation execution:

RQ-B.1: How should comprehensive DBMS evaluations be specified to enable the automated execution while ensuring reproducibility and portability?

In order to address this research question, we follow the concepts of DSLs such as CAMEL [add2] and encapsulate all domain-specific properties in a unified Evaluation Scenario Template (EST) [core5]. An EST provides a predefined structure with relevant domain-specific properties to define comprehensive evaluations.


An EST represents an abstract specification, which requires concrete domain-specific parameters to be provided. Therefore, each EST defines the evaluation-specific impact factors as configurable parameters in a declarative manner. This enables the abstraction principle (EP5) (cf. Section 4.2) by separating the technical execution from the declarative description of the evaluation. As an EST includes all evaluation-specific parameters, it enables the reproducibility (EP4) (cf. Section 4.2) of conducted evaluations for adopters [core5]. Besides, reference ESTs can be provided in community-driven or enterprise-specific EST repositories. Each EST is divided into multiple sub-templates, depicted as simplified JavaScript Object Notation (JSON) examples in Listing 5.1. The deploymentTemplate and workloadTemplate are mandatory sub-templates across all evaluation objectives and enable the specification of performance and scalability ESTs [core5]. The adaptationTemplate is required for the evaluation objectives elasticity and availability (cf. Section 4.3) and is further classified into elasticitySpec templates [core6] and failureSpec templates [core7]. Each sub-template represents an independent and reusable template entity to ensure the portability principle (EP6) across different evaluation objectives. In consequence, an EST is designed by composing a set of sub-templates and applying domain-specific parameters. The target evaluation objective determines the required sub-templates. An exemplary evaluation API that exposes objective-specific interfaces is shown in Listing 5.2. Depending on the evaluation objective, the specified EST in combination with additional metadata, such as an identifier and the number of iterations, is required to carry out an evaluation. In the following, each sub-template is briefly introduced with respect to its domain-specific properties.

"EST": { "deploymentTemplate": { // required for the objectives: performance, scalability, elasticity, availability ... }, "workloadTemplate": { // required for the objectives: performance, scalability, elasticity, availability ... }, "adaptationTemplate": { // required for the objectives: elasticity, availability ... } } Listing 5.1: Evaluation scenario template

"evaluation_API": { "objective/performance":{ "post": { "id" :"PERFORMANCE_EVALUATION_ID", "iterations": 10, "EST" :{PERFORMANCE_EST} } }, 5.1 DBMS Evaluation Templates 51

"objective/scalability":{ "post": { "id" :"SCALABILITY_EVALUATION_ID", "iterations": 10, "EST" :{SCALABILITY_EST} } }, "objective/elasticity":{ "post": { "id" :"ELASTICITY_EVALUATION_ID", "iterations": 10, "EST" :{ELASTICITY_EST} } }, "objective/availability":{ "post": { "id" :"AVAILABILITY_EVALUATION_ID", "iterations": 10, "EST" :{AVAILABILITY_EST} } } } Listing 5.2: Exemplary evaluation API

5.1.1 Deployment Template

The deployment template exposes evaluation-specific parameters to deploy a DBMS cluster on elastic resources, i.e. the desired resource configurations, the DBMS and its runtime parameters. An exemplary deployment template for the deployment of a distributed DBMS on cloud resources is depicted in Listing 5.3. The deployment template is separated into the target DBMS technology and version and a set of DBMS components. The DBMS components are classified into the data processing components SEED and DATA, and the optional MANAGEMENT component. This concept enables the specification of common distributed DBMS topologies as shown in [core5]. Each component is associated with a resource entity that contains the evaluation-specific resource parameters (cf. Section 4.1). For VM-based cloud deployments, the resource entity follows the concept of a cloud provider agnostic model [add7] by defining the abstract properties idCloud, idImage, idHardware and idLocation. The provided parameters need to match provider-specific identifiers such as the EC2 identifiers for specific VM types, or they need to be interpretable by supportive evaluation frameworks. Additional resource entities are defined for container-based deployments [core10]. Moreover, a set of evaluation-specific DBMS parameters has to be provided, defining the common distributed DBMS properties replicationFactor and nodeConfiguration. In addition, a set of DBMS-specific parameters can be applied via the customConfiguration property.

"databaseComponent": [ { "type": "SEED", "instances": 3, "resource": { "idCloud": CLOUD_ID, "idImage": IMAGE_ID, "idHardware": HARDWARE_ID, "idLocation": LOCATION_ID }, "replicationFactor": { "envName": "REPLICATIONFACTOR", "envValue": 3 }, "nodeConfiguration": { "dataMemory": { "envName": "DATAMEMORY", "envValue": 1500 }, "indexMemory": { "envName": "INDEXMEMORY", "envValue": 500 } }, "customConfiguration": [ { "envName": "SHARDING_STRATEGY", "envValue": "org.apache.cassandra.dht.Murmur3Partitioner" } ], } ] } Listing 5.3: Cloud deployment template

5.1.2 Workload Template

The workload template exposes relevant properties to specify an application-specific workload. The extent of the exposed properties is benchmark-specific as our approach does not provide an abstraction over different benchmarks. In consequence, different workload types such as YCSB or TPC-C workloads result in different workload templates [core5], and the properties of a workload template need to be defined manually for different benchmarks. Listing 5.4 shows a simplified workload template for a YCSB workload. The dbmsEndpoints entity represents a common entity across every workload template and is required to establish the connection between the deployed DBMS cluster and the workload execution during the evaluation execution.

The additional entities measurementConfig, workloadConfig and databaseConfig are YCSB-specific and need to be provided to ensure a comprehensive and reproducible workload description [core5]. In consequence, workloadTemplates enable reusable workload specifications. Additional workloadTemplate examples for TPC-C and time-series workloads are available online1.

"YCSBWorkload": { "dbmsEndpoints": [ { "ip" :"192.168.2.1", "port" :9042 } ], "measurementConfig": { "interval": 10 }, "workloadConfig": { "workloadClass": "com.yahoo.ycsb.workloads.CoreWorkload", "maxExecutionTime": 1800, "threadCount": 16, "recordCount": 4000000, "operations": 10000000, "fieldCount": 10, "fieldLength": 500, "requestdistribution": "UNIFORM", "readProportion": 0.5, "updateProportion": 0.5, "insertProportion": 0 }, "databaseConfig": { "databaseBinding": "CASSANDRA2", "endpointParameterName": "hosts", "tableParameterName": "cassandra.keyspace", "tableName": "ycsb", "configPorperties": [ { "name": "cassandra.writeconsistencylevel", "value": "ONE" } ] } } Listing 5.4: Workload template YCSB

1 https://omi-gitlab.e-technik.uni-ulm.de/mowgli/getting-started/tree/master/examples

5.1.3 Adaptation Templates

The adaptation templates comprise the required properties to design ESTs for the elasticity and availability objectives. Therefore, they are classified into the elasticity-specific elasticitySpec and the availability-specific failureSpec adaptation templates, which are described below.

Elasticity Templates The elasticity evaluation process defined in Section 4.3 requires the specification of DBMS adaptations at evaluation runtime with optional workload adaptations. Therefore, the elasticitySpec needs to expose the respective properties to specify comprehensive and reproducible DBMS and workload adaptations. The resulting elasticity template [core6] follows the concept of sophisticated adaptation models for cloud applications such as the Scalability Rule Language (SRL) [102, 113]. Yet, in contrast to these advanced approaches that address the adaptation over the full application lifecycle, we focus on the relevant parameters to enable dedicated elasticity evaluations [core6]. An exemplary elasticitySpec template, which can include n elastic adaptation entities, is depicted in Listing 5.5. Each elastic adaptation can target the DBMS or the workload, depending on the adaptationTarget. The elasticity adaptation defines a set of scalingType and scalingTrigger entities [core6]. While the example in Listing 5.5 contains a time-based scalingTrigger, the elasticitySpec also supports composed scalingTrigger entities based on system and DBMS metrics [core6].

"elasticitySpec": [ { "adaptationTarget" :"DBMS", "nodeType" :"DATA", "scalingType": "OUT", "scalingTrigger": { "scalingType" :"TIME", "scalingDate" :"600s" }, { // a second elasticity adaptation } ] Listing 5.5: Elasticity Template

Availability Templates The key concept of the availability evaluation process defined in Section 4.3 is the failure injection in conjunction with optional recovery actions. In consequence, the availability template needs to enable the specification of these concepts. The resulting template defines the failureSpec depicted in Listing 5.6, which enables the specification of n failure injections associated with optional recovery actions [core7]. It allows the timing of the failure injection in terms of the evaluation runtime via failureTiming. The specification of different failure levels is enabled via the failureType, such as VM, availability zone or region, and their severity. Besides, optional recovery actions are specified via recoveryType with an associated recoveryTiming entity. The resulting availability ESTs enable the specification of comprehensive and reproducible DBMS availability evaluations in the context of elastic resource failures [core7].

"failureSpec":[ { "failureTiming": "600s" "failureType": "VM", "severity": "permanent" "failureRecovery": true, "recoveryTiming": "900s", "recoveryType": "ADD-NODE" }, { // a second failure to be injected } ] Listing 5.6: Resource failure template

5.1.4 Implementation

The ESTs are implemented as JSON documents as JSON is widely adopted in cloud DSLs [84]. We apply the Swagger framework2 for implementing the ESTs. Swagger builds upon the OpenAPI specification [213] and supports code generation for multiple programming languages. Yet, the concept of ESTs is not limited to any technology and other common DSL technologies such as the Eclipse Modelling Framework (EMF)3 could also be used. Exemplary ESTs that have been defined within this thesis are available in a public repository1 and in the resulting data sets [data2, data1, data4].
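As the ESTs are plain JSON documents, they can be consumed by any JSON-aware tooling before submission to the framework. The following minimal Python sketch loads an EST file and performs a basic sanity check; the file name and the checked top-level section are assumptions chosen for illustration.

import json

# Hypothetical file name and required section, chosen for illustration only.
REQUIRED_SECTIONS = ("YCSBWorkload",)

def load_est(path: str) -> dict:
    """Load an EST JSON document and perform a minimal sanity check."""
    with open(path, encoding="utf-8") as handle:
        est = json.load(handle)
    missing = [key for key in REQUIRED_SECTIONS if key not in est]
    if missing:
        raise ValueError(f"EST is missing sections: {missing}")
    return est

if __name__ == "__main__":
    est = load_est("performance-est.json")
    print("database binding:", est["YCSBWorkload"]["databaseConfig"]["databaseBinding"])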

5.2 Mowgli Framework

By building upon the previously introduced concept of ESTs that ensures the comprehensive specification of the relevant domain-specific parameters, the following research question needs to be addressed to enable the automated evaluation execution:

RQ-B.2: What technical concepts are required to enable the evaluation automation across the elastic resource, DBMS and workload domain, ensuring reproducibility and portability?

In order to address this research question, we provide the novel evaluation framework Mowgli. Mowgli fully automates the evaluation execution of distributed DBMS on elastic infrastructures based on the provided ESTs while ensuring reproducibility and portability [core5]. The Mowgli framework supports the evaluation objectives performance and scalability by implementing the reference evaluation processes defined in Section 4.3. Hereafter, we summarize the key automation concepts of Mowgli and provide an overview of its technical architecture.

2https://swagger.io/
3https://www.eclipse.org/modeling/emf/

5.2.1 Automation Concepts

In order to automate the full evaluation process, the Mowgli framework applies the following key automation concepts:

Technology Catalogues Mowgli defines the concept of domain-specific technology catalogues that enable the mapping of the provided EST parameters to concrete technical implementations. Therefore, Mowgli contains three domain-specific catalogues: the resource catalogue enables the mapping to provider-specific resource offers, the DBMS catalogue contains the DBMS-specific deployment and configuration scripts and the workload catalogue contains the technical benchmark implementations to execute the specified workload [core5]. Each catalogue represents a software component with a dedicated API. This ensures the independent extensibility of each catalogue with additional technologies.
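The catalogue concept can be pictured as a lookup from an EST parameter to a technical artefact such as a set of deployment scripts. The following sketch is an assumption-based illustration of this mapping idea only; Mowgli's actual catalogues are independent REST services, and the script names are hypothetical.

# Assumption-based illustration of the catalogue idea; Mowgli's real
# catalogues are independent REST services, not in-process dictionaries.
DBMS_CATALOGUE = {
    # EST value -> DBMS-specific lifecycle scripts (names are hypothetical)
    "CASSANDRA": {"install": "cassandra/install.sh", "configure": "cassandra/configure.sh"},
    "COUCHBASE": {"install": "couchbase/install.sh", "configure": "couchbase/configure.sh"},
}

def resolve_dbms(est_dbms_type: str) -> dict:
    """Map the DBMS type specified in the EST to its lifecycle scripts."""
    try:
        return DBMS_CATALOGUE[est_dbms_type]
    except KeyError:
        raise ValueError(f"DBMS '{est_dbms_type}' is not registered in the catalogue")

print(resolve_dbms("CASSANDRA")["install"])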

Process Automation ESTs only provide the required parameters for the evaluation execution and do not expose any configurations that affect the evaluation process flow. This ensures the deterministic evaluation execution by preventing the modification of the objective-specific evaluation process. In return, the Mowgli framework implements the process for each objective-specific evaluation internally and exposes only relevant configurations such as the number of evaluation iterations. Following this concept, Mowgli enables reproducible evaluations by guaranteeing a deterministic evaluation process for each iteration. Therefore, each evaluation task is implemented in reusable software artefacts within the Mowgli framework. Their execution is automated by applying a lightweight workflow engine to orchestrate the execution of the dedicated evaluation tasks.
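The separation between the fixed, objective-specific process and the exposed iteration count can be sketched as a small task pipeline: the task order is hard-wired, only the number of iterations is configurable. Apart from ET1 and ET2 (introduced in the next paragraph), the task names below are illustrative placeholders, not Mowgli's internal workflow definitions.

# Minimal sketch of the process-automation idea: the task order is fixed,
# only the iteration count is exposed. Task names ET3-ET5 are illustrative.
def allocate_resources():   print("ET1: allocate cloud resources")
def deploy_dbms():          print("ET2: deploy DBMS cluster")
def execute_workload():     print("ET3: execute workload")
def collect_results():      print("ET4: collect metrics and metadata")
def release_resources():    print("ET5: release cloud resources")

PERFORMANCE_PROCESS = [allocate_resources, deploy_dbms, execute_workload,
                       collect_results, release_resources]

def run_evaluation(iterations: int = 3) -> None:
    """Execute the fixed performance process for the configured iterations."""
    for i in range(iterations):
        print(f"--- iteration {i + 1} ---")
        for task in PERFORMANCE_PROCESS:
            task()

run_evaluation(iterations=2)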

Deployment Automation Logical tasks within an evaluation process can be composed of a varying number of technical tasks. Specifically, the recurring allocation of resources (ET1) and the deployment of the DBMS cluster (ET2) comprise numerous internal provider- and DBMS-specific tasks. To abstract these technology-specific steps and ease their automation, Mowgli applies a COT [core2]. Therefore, a COT is selected that enables multi-cloud resource management, provides advanced application lifecycle handling [112] and supports the automated discovery of cloud resources [add7] to enable dynamic resource catalogues.

Objective Processing In order to enable significant evaluations, Mowgli provides an extensive set of additional metadata besides the raw benchmark metrics. The additional metadata comprises system- and DBMS-specific monitoring data which is automatically prepared in visual and machine-readable formats. Moreover, the applied resource, DBMS and workload parameters are provided in machine-readable formats to enable the postprocessing by external data analytics tools. Mowgli's internal data analytics capabilities enable the automated processing of the performance and scalability metrics [core5], as also shown in the accompanying data sets [data2].
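The objective processing can be illustrated with a small post-processing step: given the measured throughput per cluster size, a relative scalability factor can be derived. The throughput values and the simple speedup formula below are purely illustrative assumptions and do not reproduce Mowgli's actual scalability metric.

# Illustrative post-processing sketch; the throughput values and the simple
# speedup formula are assumptions, not Mowgli's actual scalability metric.
throughput_by_cluster_size = {3: 41_000, 5: 63_000, 7: 82_000}  # ops/s, hypothetical

baseline_size = min(throughput_by_cluster_size)
baseline = throughput_by_cluster_size[baseline_size]

for size, tput in sorted(throughput_by_cluster_size.items()):
    speedup = tput / baseline
    ideal = size / baseline_size
    efficiency = speedup / ideal
    print(f"{size} nodes: speedup {speedup:.2f} (ideal {ideal:.2f}), efficiency {efficiency:.2f}")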

5.2.2 Framework Architecture

The resulting framework architecture of Mowgli is depicted in Figure 5.1 and comprises six loosely coupled software components that interact via REST-based APIs.

[Figure 5.1 depicts the Mowgli architecture: the Evaluation Orchestrator with its Evaluation API, Objective Processor, Measurement Collector and Meta-data Collector, connected to the Workload-API, the Workload, DBMS and Cloud Resource Catalogues, the COT and the Runtime Monitor; control and data flows link the components.]

Figure 5.1: Mowgli architecture

The Evaluation Orchestrator represents the central component within the framework that provides the Evaluation API for submitting the specified performance and scalability ESTs. Therefore, it provides a programmatic Representational State Transfer (REST) interface and a basic web interface. The submitted EST is processed into an executable Evaluation Scenario Process (ESP) by mapping the specified EST parameters to the technical implementations via the Workload Catalogue, DBMS Catalogue and the Cloud Resource Catalogue. The Workload Catalogue and DBMS Catalogue are independent components that provide the required workload and DBMS lifecycle actions in script-based formats. The Cloud Resource Catalogue is provided by the multi-cloud COT [add9, add7] which enables the automated resource discovery for multiple cloud providers. The Evaluation Orchestrator executes the ESP by applying the internal workflow engine to automate the execution of the process-specific tasks. Thus, the Evaluation Orchestrator invokes the COT to allocate the required resources as the COT abstracts the provider-specific APIs into a generic resource API [add7]. In addition, the COT is also invoked to deploy the specified DBMS by executing the DBMS-specific lifecycle actions. The Workload-API represents an independent component that is invoked by the Evaluation Orchestrator to execute the specified workload. It provides a basic abstraction layer over the supported benchmarks by providing a unified API that extends the benchmark-specific parameters with the Mowgli-specific parameters such as the dynamic DBMS endpoints or the network types to use. The Workload-API bundles the required benchmark implementations. Mowgli supports the automated deployment of the Workload-API via the COT or the usage of self-deployed Workload-API instances. In consequence, according to the specified number of workload instances in the EST, Mowgli orchestrates the required number of Workload-API instances to create the specified workload. The Runtime Monitor stores the monitoring data, execution traces and additional metadata for each evaluation as time-series data. The Evaluation Orchestrator follows established cloud monitoring and orchestration concepts [add10] and enables the dynamic monitoring orchestration during each evaluation execution. The resulting benchmark metrics, system and DBMS monitoring data and additional resource metadata are joined by the Measurement Collector and the Meta-data Collector. Subsequently, the Objective Processor computes the evaluation objective according to the EST, i.e. performance and scalability. The objective-specific results are provided in machine-interpretable formats for advanced post-processing and in visual formats for explorative analysis [data2].
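From a user's perspective, an evaluation run boils down to submitting an EST to the Evaluation API. The following sketch shows how such a submission could look with the Python requests library; the endpoint path and the shape of the response are assumptions made for illustration, as the concrete API is defined by Mowgli's OpenAPI documentation.

import json
import requests  # third-party package: pip install requests

# The endpoint path and response handling are assumptions for illustration;
# the authoritative API is described by Mowgli's OpenAPI documentation.
EVALUATION_API = "http://localhost:8080/scenario/performance"

with open("performance-est.json", encoding="utf-8") as handle:
    est = json.load(handle)

response = requests.post(EVALUATION_API, json=est, timeout=30)
response.raise_for_status()
print("submitted evaluation scenario:", response.text)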

5.2.3 Implementation

The Mowgli framework components are implemented as Java-based REST services by building upon the Swagger toolkit2. In addition, Python-based software components are applied for the Objective Processor. Within Mowgli, the COT is implemented by Cloudiator [add9, add7], which enables multi-cloud resource management, provides advanced application lifecycle handling [112] and supports the automated discovery of cloud resources [add7]. Mowgli is not limited to Cloudiator and additional COTs [157, 207] can be integrated to increase the orchestration capabilities or extend the scope of resource offers. The Runtime Monitor is implemented by building upon the time-series DBMS InfluxDB4. The Mowgli framework is available open source under the Apache 2 licence5. A first version of Mowgli has been released in 2019 [data3] and in its current version, Mowgli supports eight NoSQL and NewSQL DBMSs6, three workload types based on the YCSB [40], a DBMS-specific TPC-C7 and a time-series benchmark8. Each component of the Mowgli framework is provided as a Docker container and the framework is deployable via an accompanying Docker Compose file9.

5.3 Mowgli for Higher-Level Evaluation Objectives

While Mowgli provides a novel method for the automated performance and scalability evaluation of distributed DBMS on elastic infrastructures, the following research question needs to be addressed for higher-level evaluation objectives:

RQ-B.3: Which adaptation concepts are required by supportive evaluation methods to enable the evaluation of the higher-level non-functional features elasticity and availability?

We address this research question by building upon the concepts of the Mowgli framework and extending them with novel evaluation methods for higher-level evaluation objectives, namely Kaa [core6] for the elasticity objective and King Louie [core7] for the availability objective. The resulting evaluation methods are presented below, including their integration into the Mowgli framework as depicted in Figure 5.2.

5.3.1 Elasticity: Kaa Framework

The Kaa framework [core6] builds upon Mowgli and enables the automation of the elasticity evaluation process defined in Section 4.3 under consideration of DBMS and optional workload adaptations at evaluation runtime. The Kaa framework is integrated into Mowgli and its advances are highlighted in blue in Figure 5.2.

4https://www.influxdata.com/
5https://omi-gitlab.e-technik.uni-ulm.de/mowgli
6https://omi-gitlab.e-technik.uni-ulm.de/mowgli/getting-started/tree/master/features
7https://github.com/cockroachdb/loadgen
8https://github.com/timescale/tsbs
9https://omi-gitlab.e-technik.uni-ulm.de/mowgli/docker

[Figure 5.2 extends the Mowgli architecture of Figure 5.1 with the Adaptation and Failure Catalogues as well as the elasticity (Kaa) and availability (King Louie) evaluation scenario processes within the Evaluation Orchestrator.]

Figure 5.2: Kaa and King Louie framework architecture

The Evaluation-API is extended to enable the submission of elasticity ESTs. The Evaluation Orchestrator is extended with the corresponding elasticity ESP. The Adaptation Catalogue implements the required catalogue that enables the mapping of the DBMS and workload adaptations in the EST to the respective technical invocations for the execution of the elasticity ESP. The Workload-API is extended to report the workload progress, which is required to enable progress-based adaptations [core6]. Within the elasticity process, the data of the Runtime Monitor and the workload progress reported by the Workload-API are continuously evaluated against the defined DBMS and workload adaptation triggers in the EST. The Meta-data Collector automatically computes the duration of each DBMS adaptation and processes the phase-specific times such as the resource provisioning time or the data distribution time. They are required to process the elasticity metrics [core6] via the Objective Processor. These advances enable the following key automation concepts for the elasticity evaluation of distributed DBMS on elastic resources:

DBMS Adaptation Kaa supports the automation of time-based and metric-based DBMS adaptations via the Evaluation Orchestrator which processes the specified elasticitySpec of the EST. For time-based adaptations, predefined timestamps of the EST are used as a trigger. Time-based adaptations enable isolated elasticity evaluations for dedicated workload configurations, while emphasizing reproducibility. The automation of the metric-based adaptations is executed by continuously evaluating the data of the Runtime Monitor against the metric-based thresholds of the EST. Metric-based adaptations enable more realistic evaluation scenarios, but their specification requires a thorough understanding of the DBMS utilization in order to apply reasonable adaptation thresholds [core6]. The execution of the DBMS adaptations, i.e. scale-out or scale-in, is carried out via the adaptation interface of the applied COT. For a scale-out, the COT allocates new resources and executes the respective DBMS lifecycle actions that are predefined in the DBMS catalogue. For a scale-in, the COT executes the required lifecycle actions to remove the DBMS node from the cluster and releases the associated resources.
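Conceptually, a metric-based adaptation is a polling loop that compares monitored values against the threshold of the elasticitySpec and, once the threshold is crossed, delegates the scale-out to the COT. The sketch below captures this control flow under stated assumptions: the monitor and COT calls are stubs, and all function names are placeholders rather than Kaa's actual interfaces.

import time

# Control-flow sketch of a metric-based DBMS adaptation. The monitor and COT
# functions are stubs; their names and signatures are placeholders.
def read_metric(metric: str) -> float:
    """Stub: would query the Runtime Monitor (e.g. InfluxDB) for the latest value."""
    return 85.0

def scale_out(node_type: str) -> None:
    """Stub: would ask the COT to allocate a resource and run the DBMS lifecycle."""
    print(f"scale-out triggered for node type {node_type}")

def evaluate_trigger(metric: str, threshold: float, node_type: str,
                     poll_interval: float = 10.0, max_polls: int = 3) -> None:
    for _ in range(max_polls):
        if read_metric(metric) > threshold:
            scale_out(node_type)
            return
        time.sleep(poll_interval)

evaluate_trigger("cpu/usage_percent", threshold=80.0, node_type="DATA",
                 poll_interval=0.1)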

Workload Adaptation The automation of workload adaptation follows the concept of the DBMS adaptations where the Evaluation Orchestrator processes the optional workload adaptations in the elasticitySpec. Kaa supports time-based, progress-based and metric-based workload adaptations [core6]. The Evaluation Orchestrator invokes the adaptations via the Workload-API. A distinction is made between benchmarks that implement an open or partially open workload model and those benchmarks which implement a closed workload model [24, 148]. While the former benchmarks allow the adjustment of the workload at runtime directly via the Workload-API, the latter prevent the direct adjustment of the workload at runtime. The orchestration capabilities of Kaa address this limitation, as additional Workload-API instances can be used to schedule fluctuating workloads. For instance, this enables the emulation of a partially open workload model for the YCSB [40] which implements a closed workload model.
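Emulating a fluctuating, partially open workload with a closed-model benchmark such as the YCSB essentially means starting additional Workload-API instances on a schedule. The sketch below illustrates only this scheduling idea; the start call is a stub and the schedule values are hypothetical, as Kaa performs this orchestration via the Workload-API and the COT.

import time

# Scheduling sketch for emulating a partially open workload model with a
# closed-model benchmark. start_workload_instance is a stub.
def start_workload_instance(threads: int) -> None:
    print(f"starting additional Workload-API instance with {threads} client threads")

# (offset in seconds, additional client threads) -- values are hypothetical
WORKLOAD_SCHEDULE = [(0, 16), (300, 16), (600, 32)]

def run_schedule(schedule, time_scale: float = 0.001) -> None:
    """Start workload instances at the given offsets (scaled down for the demo)."""
    start = time.monotonic()
    for offset, threads in schedule:
        wait = offset * time_scale - (time.monotonic() - start)
        if wait > 0:
            time.sleep(wait)
        start_workload_instance(threads)

run_schedule(WORKLOAD_SCHEDULE)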

Implementation Kaa builds upon the technological stack of Mowgli (cf. Section 5.2.3). The adaptation triggers are implemented by mapping the specified scalingTrigger of the elasticity templates to InfluxDB-specific queries. For this, Kaa uses the InfluxQL query language of InfluxDB10. The DBMS adaptations are implemented by building upon the scaling interfaces of Cloudiator [add10, add6]. The Kaa framework is integrated into the Mowgli framework and publicly available5.
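Mapping a scalingTrigger to an InfluxQL query essentially means templating the metric, aggregation and time window into a SELECT statement. The sketch below shows such a mapping; the measurement and field names are assumptions that depend on the deployed monitoring agents, and only the general InfluxQL structure is intended to be representative.

# Sketch of mapping a metric-based scalingTrigger onto an InfluxQL query.
# Measurement and field names are assumptions; only the query structure
# illustrates the mapping idea.
def trigger_to_influxql(measurement: str, field: str, window: str) -> str:
    return (
        f'SELECT mean("{field}") FROM "{measurement}" '
        f"WHERE time > now() - {window}"
    )

query = trigger_to_influxql("cpu", "usage_percent", "120s")
print(query)
# SELECT mean("usage_percent") FROM "cpu" WHERE time > now() - 120s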

5.3.2 Availability: King Louie Framework

The DBMS availability evaluation is enabled by the King Louie framework which builds upon Mowgli and Kaa. It automates the availability evaluation process defined in Section 4.3 and enables the availability evaluation under consideration of elastic resource failures on different levels [core4]. Therefore, King Louie extends Mowgli as depicted in Figure 5.2 with the advances highlighted in red. The Evaluation-API is extended to support availability ESTs and accordingly the Evaluation Orchestrator is extended with the availability ESP, implementing the reference availability evaluation process. The Failure Catalogue contains the technical invocations that need to be executed for the specified failure injections and (optional) recovery actions of the provided EST. The extension of the Meta-data Collector enables the automated determination of the DBMS cluster states within an availability evaluation execution: healthy, unhealthy, recovering and recovered [core7]. These phases are provided to the extended Objective Processor to process the availability metrics. Building upon these extensions, King Louie automates the following technical concepts that are essential to evaluate DBMS availability:

Failure Injection The failure injection capabilities of King Louie enable the emulation of realistic failures of elastic resources across different levels [core4]. Therefore, King Louie exploits the resource management capabilities of the applied COT or it can apply dedicated failure injection frameworks [133]. The Evaluation Orchestrator processes the failureSpec of the EST and maps the specified failures to the required resource management invocations via the COT interface. This enables the injection of different failure types such as the (temporary) stopping or deletion of a single resource instance, but also the injection of failures on availability zone or region level which will affect multiple resource instances. In addition, custom failure types can be added as scripts to the Failure Catalogue and their execution is enabled by the Evaluation Orchestrator.
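Conceptually, the failureSpec is dispatched either to a resource-management call of the COT or to a script from the Failure Catalogue. The following dispatch sketch uses stubbed calls; the failure type names follow Listing 5.6, while everything else (function names, script paths, node identifiers) is an assumption for illustration.

# Dispatch sketch for the failure injection: resource-level failures go to the
# COT, custom failure types to scripts from the Failure Catalogue. All calls are stubs.
def cot_stop_vm(node_id: str, permanent: bool) -> None:
    action = "delete" if permanent else "stop"
    print(f"COT: {action} VM {node_id}")

def run_failure_script(script: str, node_id: str) -> None:
    print(f"executing failure script {script} on {node_id}")

def inject_failure(failure: dict, node_id: str) -> None:
    if failure["failureType"] == "VM":
        cot_stop_vm(node_id, permanent=failure.get("severity") == "permanent")
    else:
        # custom failure types are resolved via the Failure Catalogue
        run_failure_script(f"failures/{failure['failureType'].lower()}.sh", node_id)

inject_failure({"failureType": "VM", "severity": "permanent"}, node_id="data-node-2")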

10https://docs.influxdata.com/influxdb/v1.8/query_language/

Failure Recovery The execution of failure recovery actions represents an optional step in the implementation of the availability evaluation process that enables the availability metric processing of the recovering and recovered phases. The failure recovery steps depend on the specified failures and execute the required invocations to restore the preceding healthy cluster state. In consequence, for temporary failures, these invocations target existing resources and DBMS instances, while for permanent failures, the steps comprise the allocation of new resources and the deployment of new DBMS nodes, analogous to the elastic scale-out process during the elasticity evaluation. Depending on the DBMS, the execution of additional recovery lifecycle actions might be required, which need to be specified in the DBMS Catalogue.

Implementation King Louie builds upon the technological stack of Mowgli (cf. Section 5.2.3) and Kaa (cf. Section 5.3.1). The applied failure injection framework is a custom Java-based implementation with additional technical details presented in [core4, core7]. It exploits the resource management capabilities of Cloudiator to inject failures on the resource level and applies direct SSH connections to inject failures on the OS and DBMS levels. The recovery action implementation builds upon the elastic scaling capabilities provided by Kaa with the extension for executing additional DBMS-specific lifecycle actions. The King Louie framework is integrated into the Mowgli framework and publicly available5.

5.4 Evaluation Data Collection

Mowgli [core5], Kaa [core6] and King Louie [core7] provide the automation methods for the evaluation objectives performance, scalability, elasticity and availability. In addition to the automation of the respective evaluation processes, the following research question is addressed to ensure significant results and facilitate the knowledge discovery:

RQ-B.4: How can the automation methods ensure significant results and reduce the efforts in the knowledge discovery of the non-functional DBMS features?

Mowgli, Kaa and King Louie address this research question by providing the results as comprehensive evaluation data sets which contain four types of data: (i) benchmark-specific metrics, (ii) system and DBMS monitoring data, (iii) execution logs and (iv) evaluation metadata. These data sets enable exploratory and confirmatory data analysis approaches [148]. In the common process of making use of evaluation results as defined by Bermbach et al. [148], Mowgli, Kaa and King Louie automate the evaluation execution, data collection and storage as well as the pre-processing of the evaluation results. The advanced result analysis based on sophisticated statistical [167, 187, 196] and machine learning [153, 208, 210] methods is out of scope of this thesis. Yet, the generated data sets of Mowgli provide valuable input for further offline processing steps based on such methods. Moreover, Mowgli's modular architecture is prepared to integrate such methods to extend Mowgli's online analysis capabilities. All data sets are provided in a file-based manner to facilitate their usage for data analytics methods. To follow the reproducibility principle [201], a performance and scalability data set [data2] created by Mowgli [core5], an elasticity data set [data1] created by Kaa [core6] and an availability data set [data4] created by King Louie [core7] are published as open access data sets as examples. The characteristics of the four data types of each evaluation data set are summarized in the following:

Benchmark-specific Metrics Each data set contains the benchmark-specific metrics, such as throughput and latency measured by the YCSB or transaction rate measured by the TPC-C benchmark. The format of the metrics is dependent on the benchmark and can be an aggregated value or a time series over the evaluation runtime. In addition, the metrics are aggregated based on configurable statistical methods. The resulting data is provided in machine-readable formats as input for advanced data analytics methods and in visual formats to enable quick high-level insights for the performance engineer.
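The configurable statistical aggregation can be pictured as a small reduction over the raw benchmark samples. The sketch below computes mean and 95th-percentile latency over hypothetical samples with Python's statistics module; it illustrates the idea only and does not reproduce Mowgli's internal aggregation pipeline.

import statistics

# Hypothetical latency samples in milliseconds; the aggregation below only
# illustrates the idea of configurable statistical post-processing.
latencies_ms = [2.1, 2.4, 2.2, 3.9, 2.3, 2.5, 8.7, 2.2, 2.6, 2.4]

mean = statistics.mean(latencies_ms)
# quantiles(n=100) yields the 1st..99th percentiles; index 94 is the 95th.
p95 = statistics.quantiles(latencies_ms, n=100)[94]

print(f"mean latency: {mean:.2f} ms, 95th percentile: {p95:.2f} ms")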

Monitoring Data The monitoring data is collected from the Runtime Monitor and pre-processed by the Meta-data Collector (cf. Figure 5.2) as time series over the evaluation runtime. The data is provided in machine-readable and visual formats. The monitoring data comprises resource utilization data such as CPU usage, memory usage, I/O usage and network traffic of the resources hosting the DBMS cluster and Workload-API instances. In addition, DBMS-specific monitoring data is provided. The monitoring data increases the significance of the results as it enables the identification of resource bottlenecks (on the virtualization level) for the DBMS cluster and eases the verification that the Workload-API instances are not a bottleneck.

Execution Logs The execution logs are generated by the Meta-data Collector and stored in machine-readable formats that map the respective evaluation process steps (cf. Section 4.3) to timestamps of the respective evaluation runtime. In consequence, the execution logs increase the significance by ensuring full transparency of the automation process. Moreover, the execution logs enable the processing of additional metrics such as the provider-specific resource provisioning time or the DBMS deployment time. For the higher-level evaluation objectives elasticity and availability, the execution logs contain the timestamps for all executed adaptation steps, as these are required for processing the higher-level evaluation objective metrics (cf. Section 4.3).
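Deriving phase durations such as the resource provisioning time from the execution logs reduces to differencing the logged timestamps of consecutive process steps. The sketch below illustrates this with hypothetical log entries; the step names, timestamps and format are assumptions and do not reflect Mowgli's actual log schema.

from datetime import datetime

# Hypothetical execution-log entries mapping process steps to timestamps;
# step names and format are illustrative, not Mowgli's log schema.
execution_log = {
    "resource_request":   "2020-03-01T10:00:00",
    "resource_available": "2020-03-01T10:01:45",
    "dbms_deployed":      "2020-03-01T10:05:10",
}

def duration(log: dict, start: str, end: str) -> float:
    fmt = "%Y-%m-%dT%H:%M:%S"
    return (datetime.strptime(log[end], fmt) - datetime.strptime(log[start], fmt)).total_seconds()

print("provisioning time:", duration(execution_log, "resource_request", "resource_available"), "s")
print("DBMS deployment time:", duration(execution_log, "resource_available", "dbms_deployed"), "s")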

Evaluation Metadata In addition to the resulting evaluation data, each data set contains metadata to ensure the reproducibility of the evaluations. Therefore, the applied EST is included along with additional resource-specific metadata. Yet, the underlying physical infrastructure or even the capacity of the resource types can change over time [201]. Thus, the extent of the resource-specific metadata is dependent on the resource provider. For instance, for public cloud providers such as AWS, Mowgli only provides metadata for the virtual resources and it is unable to provide any metadata for the physical resources. For private clouds, Mowgli offers the capabilities to extend the metadata with physical resource details if the cloud infrastructure provides appropriate interfaces.

Data Set Based on these data sets, Mowgli enables the processing of the evaluation objectives performance, scalability, elasticity and availability and ensures the reproducibility of the evaluation as all relevant technical details are included. The significance of the results is supported by underpinning the provided benchmark metrics with additional monitoring data and resource-specific metadata. The knowledge discovery is supported by pre-processing the benchmark-specific metrics and the monitoring data, and providing them in visual formats for initial analysis while all evaluation data is provided in machine-readable formats for advanced data analytics methods.

5.5 Summary

This chapter presented the findings that contributed to achieving RO-B by building upon the established methodological DBMS evaluation concepts of Chapter 4. The findings provide novel methods for DBMS evaluations on elastic infrastructures: first, a comprehensive specification for DBMS evaluations was defined as DBMS Evaluation Scenario Templates (ESTs). ESTs expose the evaluation-specific impact factors as configurable parameters. Each template comprises objective-specific sub-templates for the DBMS deployment, workload execution and runtime adaptations. The specified ESTs serve as input to the novel DBMS evaluation framework Mowgli that fully automates the performance and scalability evaluation processes. Therefore, it builds upon a modular architecture and applies the following key automation concepts: technology catalogues as domain-specific abstraction, evaluation process automation, deployment automation and objective processing. Mowgli has been enriched with the Kaa framework that enables the DBMS elasticity objective by the key concepts of DBMS and workload adaptations at evaluation runtime. In order to enable the availability evaluation of DBMS with the focus on elastic resource failures, Mowgli has been further enriched with the King Louie framework that applies the key concepts of automated failure injection and failure recovery. For each evaluation objective, Mowgli provides a comprehensive data set that contains the objective-specific metrics and monitoring data, execution logs and metadata in pre-processed machine-readable and visual formats. These novel DBMS evaluation methods have been applied in several realistic case studies and in the following Chapter 6, the results are discussed and validated against the cross-domain evaluation principles.

Chapter 6 Validation

This chapter summarizes the validation of Mowgli including Kaa and King Louie with respect to their capabilities for evaluating distributed DBMSs on elastic infrastructures. The feature set of Mowgli, Kaa and King Louie is validated against their applicability for realistic use cases, the support for evaluating higher-level evaluation objectives and the compliance with the established evaluation principles (cf. Section 4.2). Therefore, we apply a case study driven and a feature driven validation. First, in Section 6.1 the applicability for evaluating distributed DBMSs on elastic infrastructures is validated by four industry-driven case studies. These address the DBMS performance impact of elastic resources [core10], a comparative performance and scalability study across different DBMS technologies, cloud providers and cloud resources [core5], a DBMS elasticity study under consideration of different workload intensities [core6] and a DBMS availability evaluation in case of resource failures [core7]. Secondly, Mowgli's feature set is evaluated against the established cross-domain evaluation principles [core5, core6, core7] in Section 6.2. Section 6.3 discusses the lessons learned on the evaluation of distributed DBMSs on elastic infrastructures and Section 6.4 summarizes this chapter.

6.1 Case Studies

Mowgli’s applicability for driving the decision process to determine the DBMS operation model is validated by multiple industry-driven case studies. These case studies are based on real problem statements regarding the DBMS operational model on elastic infrastructures that have been identified within the Cloud2Go research project in collaboration with Daimler TSS and within the European research project Melodic1. An overview on the extent of each Case Study (CS) is provided in Table 6.1. Each case study targets different evaluation objectives and in consequence, verifies Mowgli’s applicability for the considered evaluation objectives perfor- mance, scalability, elasticity and availability. In addition, Table 6.1 provides in row miscellaneous a summary of additional smaller scale case studies that have been carried out in the respective projects. These case studies are not discussed in detail as their results are reflected by the large scale case studies CS1 — CS4. In the following, CS1 — CS4 are summarized based on the identified evaluation scenarios, the target evaluation objective, and the resulting lessons learned of the respective case study.

6.1.1 CS1 - Performance Impact of Elastic Resources

The case study is centred around the general research challenge of identifying the optimal elastic resource type for the target application by considering the trade-off between virtualization overhead and operational benefits [139].

1https://melodic.cloud/


Table 6.1: Case study metrics

ID      Evaluation Objective                                                          Scenarios   Repetitions   Data Sets
CS1     DBMS performance impact of elastic resources [data6]                          10          10            100
CS2     DBMS performance & scalability on elastic resources [data2]                   102         5             610
CS3     DBMS elasticity on elastic infrastructure [data1]                             64          5             320
CS4     DBMS availability under elastic resource failures [data4]                     16          10            160
Misc.   project-specific DBMS performance, scalability and
        availability evaluations [data5]                                              118         1-5           236
Total   Performance impact of elastic resources, DBMS performance,
        DBMS scalability, DBMS elasticity, DBMS availability                          310         N/A           1426

This general challenge is substantiated for DBMSs by analysing the performance impact of different virtualization technologies and storage configurations [core10]. This case study has been carried out using an early implementation of Mowgli [core10]; its methodology and technical components have since been integrated into Mowgli.

Evaluation Scenarios Within this case study, the elastic resource types container, VM and the combination VM with container represent the evaluation-specific resource impact factors. Besides, for container-based resources, we distinguished between the host file system and the container-internal file system as storage location. All resources are specified with identical resource capacities and they are deployed in a private cloud based on OpenStack. The applied DBMS technology is MongoDB2 in a single node deployment and the applied benchmark is the YCSB for creating a read-heavy and write-heavy workload. As a result, the case study comprises ten evaluation scenarios and each scenario is specified with ten executions [data6], which are automated by the framework3.

Evaluation Objective The evaluation objective is performance under consideration of the applied resource types. Therefore, the performance metrics throughput and latency are related with the applied resource type and workload. The results provide a comparative analysis of the performance overhead in relation to the resource type.

Lessons Learned The results show that the usage of VM-based resources comes with a significant performance overhead compared to container resources. For container-based resources, the usage of the container-internal filesystem causes significant performance overhead for write-heavy workloads. This confirms the findings of existing studies which analyse the performance overhead of different virtualization technologies for stateless applications [118, 139].

2https://www.mongodb.com/de
3https://github.com/omi-uulm/Containerized-DBMS-Evaluation

A further insight shows that the usage of container resources on top of VMs causes a tolerable performance overhead of 6%, which is outweighed by the operational advantages enabled by containers.

6.1.2 CS2 - Performance and Scalability

The second case study [core5] validates Mowgli by addressing the overarching research challenge of selecting the optimal elastic resource provider, resource type and DBMS runtime configurations [89, 158] in the context of cloud-hosted DBMSs [core5]. Therefore, the case study evaluates the performance and scalability in the context of different cloud providers, storage resources and DBMS runtime configurations.

Evaluation Scenarios The considered resource providers within this case study are a private OpenStack cloud and the public AWS EC2 cloud. The applied resource types are VMs of identical resource capacities. The evaluation-specific resource impact factor is the storage type, i.e. SSD and remote storage backends. The deployed DBMS technologies are Apache Cassandra4 and Couchbase5. The evaluation-specific DBMS impact factors are the cluster size, ranging over 3, 5, 7 and 9 nodes, and the DBMS-specific client-side write consistency, which ranges from low to high. The applied benchmark is the YCSB for creating a write-heavy workload. As a result, this case study comprises 102 evaluation scenarios and each evaluation scenario is executed five times by Mowgli [data2].

Evaluation Objective There are three evaluation objectives of this case study. First, the DBMS-specific performance impact of varying write consistencies; secondly, the performance impact of identical resource capacities offered by different cloud providers and the impact of different storage backends; thirdly, a comparative performance and scalability analysis across the applied DBMSs.

Lessons Learned The performance results show that the impact of the applied write consistency strongly depends on the DBMS technology. While for Apache Cassandra stronger consistency settings cause a tolerable performance decrease, for Couchbase the performance decreases by 90% when comparing the lowest and highest write consistency settings. The direct performance comparison between Apache Cassandra and Couchbase shows that Couchbase achieves a higher performance for the weak write consistency while Apache Cassandra achieves better performance for strong write consistency settings. With respect to the scalability objective, the results show that both DBMSs provide scalability with increasing cluster sizes but limiting factors can be the underlying resources such as the remote storage backend or the saturation of the workload. In consequence, large cluster sizes may cause negative scalability due to increasing synchronization operations. The dependable identification of such limiting factors requires comprehensive evaluation scenarios considering the relevant evaluation-specific impact factors and their repeated execution, which is enabled by Mowgli [core5].

4http://cassandra.apache.org/
5https://www.couchbase.com/

6.1.3 CS3 - Elasticity

The third case study addresses the general research challenge of horizontally scaling resources and applications at runtime [173, 166] in the dedicated context of distributed DBMSs on cloud resources [core6]. Therefore, the validation of Kaa [core6], which builds upon Mowgli, focuses on the elasticity objective under consideration of different workload intensities and workload types.

Evaluation Scenarios The evaluation scenario builds upon VM-based resources provided by a private OpenStack cloud infrastructure with identical resource capacities. The evaluated DBMS technologies are Apache Cassandra and Couchbase, deployed as three node clusters. The YCSB is selected as the benchmark for creating write-heavy and read-heavy workloads with the evaluation-specific workload intensities low, optimal and overload. In order to determine the DBMS-specific workload intensities, an additional calibration phase is executed. The evaluation-specific adaptation action is an elastic scale-out, adding one additional DBMS node including the required resources based on a time-based trigger. The resulting case study comprises 32 evaluation scenarios for the calibration phase and 32 elasticity evaluation scenarios where each scenario is executed five times [data1].

Evaluation Objective The evaluation objectives target the DBMS-specific elasticity under consideration of varying workload intensities and the comparative elasticity analysis across the two DBMSs. Therefore, particular focus lies on the elasticity metrics provisioning time, data distribution impact and data distribution time.

Lessons Learned The results show that Apache Cassandra and Couchbase provide the expected elasticity capabilities for all workload intensities of the read-heavy workloads, i.e. increasing the performance after the elastic scale-out is finished. Moreover, Apache Cassandra does not show a significant data distribution impact, whereas Couchbase does. Regarding the write-heavy workload, both DBMSs show unexpected elasticity behaviours as for all workload intensities the data distribution impact is severe, which results in a constant performance decrease during the elastic scale-out. These results show that DBMS adaptations need to consider a multitude of impact factors to specify efficient DBMS adaptation rules [core6].

6.1.4 CS4 - Availability

The fourth case study addresses the general impact of elastic resource failures [132, 133] in the dedicated scope of cloud-hosted DBMSs. Therefore, King Louie [core7], which builds upon Mowgli, is validated by a case study with the focus on the DBMS availability in case of cloud resource failures for different cluster sizes.

Evaluation Scenarios The evaluation scenario consists of VM-based resources with static resource capacities deployed at a private OpenStack-based cloud. The applied DBMSs are Apache Cassandra and Couchbase with the evaluation-specific impact factor cluster sizes of three and seven nodes. The YCSB is applied to create a write-heavy and read-heavy workload of low and high intensity. The failure specification is defined as a single failure injection on VM level in combination with a subsequent recovery action to restore the initial cluster size of the respective evaluation scenario.

In consequence, the case study comprises 16 evaluation scenarios with ten executions per evaluation scenario [data4].

Evaluation Objective The evaluation objective is availability under consideration of resource failures with dedicated focus on the performance impact during the healthy, unhealthy, recovering and recovered phases. Therefore, availability is analysed in a DBMS-specific and in a comparative context.

Lessons Learned The results show novel and unexpected insights into the availability capabilities of Apache Cassandra and Couchbase. The ten executions of each evaluation scenario result in different but reoccurring performance patterns for the respective scenarios. None of the DBMSs is able to overcome a VM resource failure without a downtime at client-side. The duration of the downtime is highly dependent on the DBMS, the cluster size and the applied workload. The downtimes for Apache Cassandra range from 10 to 240 seconds. Couchbase is unable to overcome resource failures in the 3-node cluster and for the 7-node cluster the downtimes range from 30 to 175 seconds. These insights build the foundation for further case studies to consolidate the availability capabilities of distributed DBMSs and to derive efficient failure mitigation strategies.

6.1.5 Case Study Discussion

Mowgli’s applicability for evaluating the non-functional features of distributed DBMSs on elastic infrastruc- tures is verified by the performed case studies, which comprise evaluation objectives that reach beyond ba- sic performance evaluations and target higher-level evaluation objectives as summarized in Table 6.1. The 310 evaluation scenarios addresses the evaluation challenges of current and upcoming DBMS operational models [127, 211] and Mowgli enables their specification and automated execution. The specified evaluation scenarios comprise a multitude of technology-specific and evaluation-specific impact factors. Across all case studies, the applied technology-specific impact factors comprise three DBMSs that imple- ment different architectures and consistency models, two cloud providers with different virtualization and resource types. The evaluation-specific impact factors comprise six DBMS cluster sizes, ten DBMS client-consistency con- figurations, three replication factors, three compute capacities, three storage capacities, three workload dis- tributions and eight workload intensities. In addition to these case studies, further project-specific case stud- ies [data5] have been specified and performed in cooperation with Daimler TSS and within the Melodic project. These studies comprise an additional private cloud provider, six additional DBMSs and additional workloads based on the TPC-C benchmark. It is noteworthy that this list of technology- and evaluation-specific impact factors reflects only a small excerpt of the supported impact factors of Mowgli. Performing the required evaluation executions to create the 1426 evaluation data sets further verify Mowgli’s automation capabilities and the resulting data sets enable the in-depth analysis of the objective- specific results. Moreover, Mowgli facilitates the quantification of the impact of elastic resource [core10, core5] and enables novel insights into the higher-level non-functional features of distributed DBMS [core6, core5]. 70 Chapter 6 Validation

6.2 Support for Evaluation Principles

This section summarizes the support for the established cross-domain evaluation principles (cf. Section 4.2) of Mowgli [core5], Kaa [core6] and King Louie [core7]. Therefore, we present their compliance with each principle and discuss limitations. In the following discussion, we refer to Mowgli as the integrated framework comprising Mowgli, Kaa and King Louie.

EP1: Usability Mowgli eases its usage via a comprehensive REST interface, which is based on the OpenAPI specification. OpenAPI facilitates the specification of meaningful REST APIs by providing tool support for designing, building, documenting and consuming the respective APIs. Thus, Mowgli provides a detailed documentation of its REST interface and its usage, including the basic web interface. Yet, a certain level of DBMS evaluation knowledge is essential to specify the required ESTs and Mowgli does not provide a sophisticated web interface for editing ESTs or visualizing the evaluation results. Moreover, advanced approaches such as chatbots [212] for guiding inexperienced users through the evaluation specification and execution are not part of Mowgli.

EP2: Extensibility The extensibility of Mowgli has been confirmed by the frameworks Kaa [core6] and King Louie [core7] which extend the framework and its feature set. Moreover, Mowgli's micro-service architecture and the concept of the technology catalogues offer convenient extension points for integrating new technologies or extending existing framework features. However, the efforts in extending Mowgli are dependent on the affected components. For instance, adding a new DBMS will only require the implementation of five lifecycle scripts, an update of the EST specification and the execution of the respective code generation tools. On the contrary, the extension or replacement of the COT Cloudiator to enable more extensive container orchestration capabilities [207] will require more effort as multiple software components of Mowgli will be affected.

EP3: Significance Mowgli's significance regarding the supported evaluation scenarios and evaluation objectives is verified by its acceptance at established cloud and performance engineering conferences [core5, core6, core7]. Besides, Mowgli's significance is also confirmed in an industrial context through the cooperation with Daimler TSS where it is applied for addressing industry-driven challenges in selecting and operating distributed DBMSs. Moreover, Mowgli has been included in the internal services of Daimler TSS6 and within the European research project Melodic7 [204]. While Mowgli is able to cover realistic operational models of distributed DBMS, the significance of the applied workloads is dependent on the selected benchmarks. Currently, Mowgli supports synthetic benchmarks and the integration of trace-based benchmarks such as BenchFoundry [146] would improve the significance of the supported evaluation scenarios. The significance of the obtained results is verified by comparing them against established DBMS evaluation results if applicable. Moreover, the significance of the results is underpinned by comprehensive evaluation data sets [data2, data1, data4] that provide additional evaluation-specific metadata.

6https://www.uni-ulm.de/in/fakultaet/in-detailseiten/news-detail/article/neues-analyse-tool-fuer-datenbankmanagementsysteme-mowgli-weist-den-weg-im-datenbanken-dschungel/
7https://melodic.cloud/

EP4: Reproducibility While recent findings show that the reproducibility principle finds little adoption in evaluations of the DBMS domain [190] and the elastic infrastructure domain [201], Mowgli supports this principle as follows. The validation of the reproducibility principle builds upon the received ACM Reusable Badge8, which was awarded to the Mowgli framework [core5] within the artefact evaluation track of the International Conference on Performance Engineering9. As Kaa and King Louie build upon Mowgli and apply the same reproducibility methods, the following aspects also apply to them. The reproducibility is verified for both aspects of the reproducibility principle, i.e. a complete technical description of the evaluation and the deterministic evaluation execution. First, Mowgli provides the full technical description of the applied evaluation setup, comprising the applied DBMS, resource and workload parameters as ESTs that enable the reproduction of the performed evaluation scenarios. Secondly, the Mowgli framework provides the technical methods to ensure the deterministic execution of the ESTs. Moreover, the results of the artefact evaluation track also confirm that comparable evaluation results have been obtained by the artefact evaluators. Yet, it is noteworthy that Mowgli only ensures the reproducible specification of the evaluation scenarios and their deterministic execution. The obtained results represent a snapshot that is dependent on a number of volatile and time-dependent impact factors such as performance variations on the resource level [44, 135], compaction and replication processes on the DBMS level [150, 179] or the statistical randomness of synthetic benchmarks on the workload level [146]. While Mowgli cannot control these impact factors, it provides the required methods to repeat the evaluations and thereby quantify the resulting variance of such volatile impact factors.

EP5: Abstraction Mowgli's abstraction capabilities are validated on the evaluation design and evaluation execution level. Regarding the evaluation design, the ESTs have confirmed their abstraction capabilities by enabling the specification of realistic case studies while providing an appropriate level of abstraction for the DBMS, resource and workload domain. Moreover, the ESTs have proven their abstraction capabilities by supporting the reproducibility of the evaluations. The abstraction capabilities for the evaluation execution are verified based on the applied domain-specific technologies within the case studies. These comprise multiple cloud providers, resource types, DBMS technologies and benchmarks. While Mowgli enables a high-level abstraction for the resource provisioning and DBMS deployment by applying the provider-agnostic COT Cloudiator [add9, add7] with the DBMS and resource catalogues [core5], it only provides a basic abstraction for the workload domain. In particular, Mowgli relies on the abstraction capabilities of the integrated benchmarks such as the YCSB and abstracts the communication parameters across all benchmarks.

EP6: Portability The performed case studies confirm that Mowgli enables the evaluation portability across multiple domains, i.e. CS2 [core5] performs a comparative evaluation across multiple DBMS and resource domain parameters, while CS3 [core6] and CS4 [core7] apply a comparative study by applying the objective-specific adaptation actions across different DBMSs. The portability of each evaluation scenario is achieved by adapting the respective domain-specific parameters or replacing the objective-specific sub-templates. For instance, changing the target DBMS only requires the adjustment of a few EST parameters. Yet, the portability is dependent on the abstraction capabilities provided by Mowgli's components. For instance, if the applied benchmark does not abstract multiple DBMSs, the portability of the respective EST is limited by the supported DBMSs of the benchmark.

8https://www.acm.org/publications/policies/artifact-review-badging
9https://icpe2019.spec.org/tracks-and-submissions/artifact-evaluation-track.html

EP7: Automation Mowgli's automation capabilities are verified throughout the case studies where Mowgli is able to automate all tasks of the 310 evaluation scenarios. In particular, Mowgli automates the allocation of elastic resources, the deployment of the DBMS cluster, the deployment of the monitoring services, the workload execution and the processing of the objective-specific metrics. The automation capabilities for the resource allocation and the DBMS deployment are dependent on Cloudiator [add9, add7]. As the landscape of COTs has evolved during this thesis [157, 207], novel COTs such as Kubernetes [129] or DC/OS10 provide additional automation capabilities for container-based resources that go beyond the means of Cloudiator. Mowgli already provides the means to incorporate these COTs on a conceptual level and only technical effort is required to implement these extensions. Besides, fog and edge automation frameworks have started to evolve during this thesis [156]. In the context of these advances, Mowgli is able to provide comprehensive automation capabilities for cloud resources but requires extensions regarding container-based resources as well as for the fog and edge resource domain. Yet, Mowgli's extensibility and the multi-cloud concepts can be applied to ease the incorporation of fog and edge concepts. Besides, the automated evaluation objective processing offers possibilities for enhancements such as advanced data processing frameworks [143] or visualization methods [148].

EP8: Orchestration In extension to Mowgli's automation capabilities, its orchestration capabilities are verified by the performed case studies CS3 and CS4 that require the orchestration of evaluation tasks at runtime such as adapting the DBMS cluster (CS3/CS4) or injecting resource failures (CS4). Moreover, these orchestration events are integrated into the metadata collection to enable the automated processing of higher-level evaluation metrics. While the applied runtime adaptations prove their applicability to enable the evaluation of the higher-level objectives elasticity and availability, these runtime adaptations offer extension points for advanced orchestration concepts. For instance, the elasticity objective can be enhanced by extending its adaptation capabilities with sophisticated auto-scaling approaches from the DBMS [81, 91] and the cloud [104, 188] context. Regarding failure injection, continuative methods such as jepsen11 or the ChaosToolkit12 can advance Mowgli's orchestration capabilities in the scope of the availability objective.

6.3 Reflections on Evaluating Distributed DBMS on Elastic Infrastructures

Throughout the development, implementation, and evaluation of the concepts and the resulting software framework presented within this thesis, novel insights and experiences in the process of evaluating the non-functional features of distributed DBMSs operated on elastic infrastructures have been established. The insights are described in the respective case studies CS1 — CS4 and they are based on addressing the challenges of the elastic resource, distributed DBMS and DBMS workload domain, resulting in an overarching methodology and the integrated evaluation frameworks Mowgli, Kaa and King Louie. These frameworks provide extensive automation and orchestration capabilities that enable novel insights into the non-functional features of distributed DBMSs by considering a multitude of resource, DBMS and workload parameters, and processing the objective-specific metrics.

10https://dcos.io/
11https://github.com/jepsen-io/jepsen
12https://chaostoolkit.org/

The experiences we gained throughout these performed case studies verify that Mowgli is a valuable and powerful tool to support the performance engineer in designing and executing comprehensive evaluations in realistic environments. Yet, the experiences also show some boundaries of Mowgli, Kaa and King Louie with respect to the evaluation objective support, the elastic resource support and the evaluation data processing methods. These boundaries are discussed in the following paragraphs.

Evaluation Objective Support While Mowgli has proven its evaluation capabilities for the evaluation objectives performance, scalability, elasticity and availability as an integrated framework, it does not address additional higher-level evaluation objectives such as DBMS consistency [78, 93] or the operational costs for DBMSs on elastic infrastructures [35, 66], which are also of importance to determine the DBMS operational model. As Mowgli provides a modular and extensible architecture, its extension towards additional evaluation objectives is methodologically and technically feasible.

Elastic Resource Support With respect to the supported technologies, Mowgli has proven to support heterogeneous DBMSs, different elastic providers and multiple elastic resource types. Yet, in its current state, Mowgli does not support the latest advances in elastic resource infrastructures such as Kubernetes [129] nor does it support the heterogeneous infrastructures of the still evolving edge and fog providers [194]. As these resource offers are not widely adopted for operating distributed DBMSs yet, Mowgli's resource orchestration capabilities need to be extended to support elastic resources to the full extent for future operational models of distributed DBMSs. These extensions can be built upon the existing multi-cloud concepts of Mowgli.

Data Processing: Variance Analysis Regarding the result variance analysis, the evaluation methodology needs to adequately deal with the volatility within the elastic resource domain (resource interferences, time-dependent performance variations), the DBMS domain (DBMS-specific compaction and synchronization processes) and the workload domain (statistical randomness in synthetic workloads). Consequently, evaluations need to be repeated multiple times and statistical methods need to be applied to increase the significance of the results. Mowgli's automation capabilities allow any number of evaluation repetitions. Based on the resulting data sets, it provides basic statistical functions such as mean, standard deviation or percentiles for each evaluation-specific metric. By providing these basic statistical functions, Mowgli already supersedes common approaches in cloud service benchmarking [201]. However, Mowgli does not currently provide advanced statistical methods such as confidence intervals or Analysis of Variance (ANOVA) [8]. Extending Mowgli's data processing capabilities with such statistical methods is already technically possible. These methods can be applied to support the performance engineer in determining the required number of repetitions, identifying a robust statistical function for aggregating multiple data sets and increasing the statistical significance of the evaluation results.
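As an example of the kind of statistical extension discussed above, a confidence interval over repeated throughput measurements can be computed directly from the per-iteration results Mowgli already provides. The sketch below uses a normal approximation and hypothetical sample values; it is an illustration of the concept, not an existing Mowgli feature.

import math
import statistics

# Hypothetical throughput results (ops/s) from repeated evaluation iterations.
throughputs = [41200, 40150, 42900, 39800, 41750]

mean = statistics.mean(throughputs)
stdev = statistics.stdev(throughputs)
# 95% confidence interval using a normal approximation (z = 1.96); for the
# small sample sizes typical of evaluation repetitions, a t-distribution
# would be the more rigorous choice.
half_width = 1.96 * stdev / math.sqrt(len(throughputs))

print(f"mean throughput: {mean:.0f} ops/s, 95% CI: +/- {half_width:.0f} ops/s")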

Data Processing: Data Correlation Regarding the evaluation data correlation, the extensive number of domain-specific parameters within an EST makes the direct correlation between dedicated domain-specific input parameters and the resulting objective-specific metrics a challenging task. Mowgli allows the specification of a multitude of domain-specific parameters in the evaluation design and execution process and it provides the resulting data sets in human- and machine-readable formats to support the performance engineer.

Yet, the identification of relevant EST parameters and their correlation with the evaluation results are subject to the performance engineer’s domain knowledge. Thus, the performance engineer would benefit from the automated processing of advanced data correlation functions such as the Pearson correlation coefficient or machine learning approaches such as Rafiki [153]. Mowgli’s modular architecture provides the suitable interfaces to integrate such approaches and to automate their execution within the evaluation process. While Mowgli only partially addresses advanced data analytic concepts such as variance analysis and evaluation data correlation, its holistic evaluation automation concepts are what enable such concepts in the first place for distributed DBMSs on elastic infrastructures, as these concepts require an extensive amount of evaluation data. Mowgli’s reproducibility and automation capabilities enable the generation of such extensive evaluation data. In order to support these concepts in an integrated way, Mowgli’s data processing capabilities need to be extended, and the required interfaces to integrate the outlined data analytic methods are already in place.
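As an illustration of such a correlation function, the following sketch computes the Pearson correlation coefficient between one configurable EST parameter and a resulting metric; the helper and the example numbers are hypothetical and not taken from Mowgli.

import statistics

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long series."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x ** 0.5 * var_y ** 0.5)

# Hypothetical example: DBMS cluster size (EST parameter) vs. measured throughput.
cluster_sizes = [3, 5, 7, 9, 11]
throughputs = [10500, 16800, 22900, 28100, 32600]
print(pearson(cluster_sizes, throughputs))  # close to 1.0 indicates a strong linear correlation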

6.4 Summary

This chapter summarized the validation results of Mowgli, Kaa and King Louie. The validation follows a two-fold approach: first, Mowgli’s applicability for evaluating distributed DBMSs on elastic infrastructures was validated based on four industry-driven case studies that target the evaluation objectives performance, scalability, elasticity and availability. The case studies comprised 310 different evaluation scenarios, resulting in 1426 data sets that confirmed Mowgli’s significance and its automation capabilities. Moreover, Mowgli provided novel insights into the performance impact of elastic infrastructures, the performance impact of DBMS-specific consistency configurations, DBMS elasticity under different workload patterns and DBMS availability in case of elastic resource failures. Secondly, a qualitative validation was carried out by comparing Mowgli’s capabilities against the cross-domain principles established in Chapter 4.2. The results confirmed that Mowgli follows the cross-domain evaluation principles with a dedicated focus on reproducibility, portability and automation. Yet, the validation also showed that extensions are required to provide these principles for the emerging edge and fog resource domains. Finally, the overall experiences in evaluating distributed DBMSs on elastic infrastructures with Mowgli were reflected and limiting aspects regarding the statistical data processing capabilities of Mowgli were discussed.

Chapter 7 Conclusion and Future Work

Starting with web applications and intensified by Big Data as well as Internet of Things (IoT) applications, the need to manage massive and continuously growing amounts of data has become a key challenge for data management systems. In consequence, the Database Management System (DBMS) landscape has significantly evolved over the last decade and the established relational DBMSs are complemented by distributed NoSQL and NewSQL DBMSs. These DBMS technologies promise to address the data management needs of modern data-intensive applications by ensuring high performance and providing the higher-level non-functional features (horizontal) scalability, elasticity and high availability.

In parallel to the evolution of DBMS technologies, their operational model also advances towards the usage of elastic resources such as the cloud or even edge and fog resources. These elastic resources enable virtually unlimited scalability on the resource level in an elastic manner. As NoSQL and NewSQL DBMSs emphasize a distributed and shared-nothing architecture, elastic resources are their preferable operational model to enable scalability and elasticity also on the resource level. In consequence, a DBMS operational model for data-intensive applications builds upon elastic resources, a distributed DBMS and its application-specific runtime configuration.

While these advances of the DBMS and elastic resource domains offer promising capabilities to address the data management needs of data-intensive applications, the heterogeneity and extent of DBMS and elastic resource offers lead to an inherently complex decision process for identifying an operational model that ensures high performance, horizontal scalability, elasticity and high availability. Especially as both domains are interdependent, they need to be considered in their entirety to avoid suboptimal decisions.

In consequence, this decision process needs to be supported by the evaluation of eligible DBMS operational models based on realistic workloads. There are numerous established benchmarks to emulate realistic workloads for the DBMS and the elastic resource domains. But these approaches come with the following limitations that hinder the comprehensive evaluation of distributed DBMSs on elastic infrastructures: (i) existing evaluation approaches only focus on their respective domain and do not consider the entire DBMS operational model; (ii) DBMS evaluation methodologies are missing that go beyond performance and address higher-level non-functional DBMS features such as scalability, elasticity and availability; (iii) holistic evaluation automation methods are required to cope with the multitude of DBMS operational models to consider.

This thesis addresses these limitations by defining a methodological approach for evaluating distributed DBMSs on elastic infrastructures with respect to the non-functional DBMS features performance, scalability, elasticity and availability. Further, we provide an advanced set of evaluation methods that build upon the established methodology, resulting in the novel DBMS evaluation framework Mowgli that fully automates the evaluation process.


7.1 Contribution

The Mowgli framework is the result of the two overarching research objectives addressed in this thesis: establishing methodological concepts for evaluating distributed DBMSs on elastic infrastructures (RO-A) as presented in Chapter 4 and implementing novel methods for the automated evaluation of distributed DBMSs on elastic infrastructures (RO-B) as presented in Chapter 5.

Methodological Concepts The first methodological contribution is an analysis of the relevant domain-specific impact factors that need to be considered for specifying and executing significant DBMS evaluations. The impact factors are classified into technology- and evaluation-specific impact factors of the DBMS, elastic resource and workload domains. The resulting unified view on domain-specific impact factors represents the foundation for establishing the methodological evaluation concepts.

Secondly, a set of eight cross-domain evaluation principles is established that are essential for advanced evaluation methods. These principles unify and extend established evaluation principles of the DBMS and elastic resource domains and pave the way for higher-level DBMS evaluation objectives. The resulting evaluation principles emphasize automation, reproducibility and portability as key principles to address the characteristics of the continuously evolving elastic resource and DBMS domains.

The third methodological contribution defines the comprehensive evaluation design for the evaluation objectives performance, scalability, elasticity and availability. For each evaluation objective, the design space is defined by the objective-specific metrics and the comprehensive evaluation process with its dedicated evaluation tasks. It is noteworthy that the evaluation design comprises a novel set of DBMS elasticity and availability metrics.

Evaluation Methods Based on the developed methodological DBMS evaluation concepts, this thesis provides the novel DBMS evaluation framework Mowgli that fully automates the DBMS evaluation on elastic infrastructures for the evaluation objectives performance and scalability. To enable the evaluation automation, Mowgli follows the concept of the comprehensive evaluation design and provides objective-specific Evaluation Scenario Templates (ESTs). An EST defines the required DBMS, elastic resource and workload properties and exposes the relevant domain-specific impact factors as configurable parameters. The ESTs represent the input for the Mowgli framework and enable the automated evaluation execution, including the resource allocation, DBMS deployment, workload execution and evaluation objective processing.
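For illustration, the following sketch shows the kind of domain-specific parameters a scalability EST could expose; the structure, field names and values are assumptions made for this example and do not reproduce Mowgli's actual EST schema.

# Illustrative Python sketch of the parameters a scalability EST could expose.
# All field names and values are assumptions for illustration only.
import json

evaluation_scenario = {
    "objective": "scalability",
    "resources": {                      # elastic resource domain
        "provider": "openstack",
        "instance_type": "m1.large",
        "cluster_sizes": [3, 5, 7],     # scale steps to evaluate
    },
    "dbms": {                           # DBMS domain
        "engine": "cassandra",
        "replication_factor": 3,
        "write_consistency": "QUORUM",
    },
    "workload": {                       # workload domain
        "benchmark": "ycsb",
        "record_count": 10_000_000,
        "read_proportion": 0.95,
        "client_threads": 64,
    },
}

print(json.dumps(evaluation_scenario, indent=2))  # e.g. handed to the framework as JSON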

In order to allow the evaluation of higher-level evaluation objectives, Mowgli is enriched with the Kaa framework that enables and automates the DBMS elasticity evaluation. Therefore, Kaa provides support for DBMS adaptations at evaluation runtime, which comprise the elastic scale-out and scale-in of the DBMS cluster. Moreover, Kaa supports the workload adaptation at evaluation runtime to emulate fluctuating workload patterns. Kaa processes the higher-level elasticity metrics automatically by correlating basic performance metrics and additional meta-data.
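As a simplified illustration of such a correlation, the following sketch derives one conceivable elasticity indicator, the time until throughput stabilizes at a target level after a scale-out trigger, from a throughput time series; it does not reproduce Kaa's actual metric implementation.

def stabilization_time(samples, scale_out_ts, target, window=5):
    """Seconds between a scale-out trigger and the point where throughput
    stays at or above `target` for `window` consecutive samples.

    `samples` is a list of (timestamp_seconds, throughput) tuples ordered by time.
    Returns None if the target level is never sustained.
    """
    streak = 0
    for ts, tp in samples:
        if ts < scale_out_ts:
            continue
        streak = streak + 1 if tp >= target else 0
        if streak == window:
            return ts - scale_out_ts
    return None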

The King Louie framework represents the second advancement of Mowgli and enables the automated DBMS availability evaluation in the context of elastic resource failures. Therefore, King Louie provides a comprehensive resource failure injection framework and allows DBMS-specific recovery actions. Moreover, the developed availability metrics are automatically processed by the King Louie framework.
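For illustration only, the following sketch derives two simple availability indicators, the recovery time and the share of samples without any throughput, from throughput samples recorded around a failure injection; King Louie's actual availability metric processing may differ.

def availability_indicators(samples, failure_ts, baseline):
    """Derive simple availability indicators from (timestamp, throughput) samples
    recorded around a failure injected at `failure_ts` (timestamps in seconds).

    Returns the time until throughput first reaches the pre-failure `baseline`
    again and the share of post-failure samples with zero throughput.
    """
    post = [(ts, tp) for ts, tp in samples if ts >= failure_ts]
    if not post:
        return {"recovery_time_s": None, "downtime_share": None}
    recovery = next((ts - failure_ts for ts, tp in post if tp >= baseline), None)
    downtime_share = sum(1 for _, tp in post if tp == 0) / len(post)
    return {"recovery_time_s": recovery, "downtime_share": downtime_share}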

Validation The integrated frameworks Mowgli, Kaa and King Louie are validated based on a two-fold approach as presented in Chapter 6. First, Mowgli’s applicability for realistic evaluation scenarios is validated based on a series of industry-driven case studies that target the evaluation objectives performance, scalability, elasticity and availability. The resulting 310 evaluation scenarios verify Mowgli’s significance for evaluating cloud-hosted DBMSs. Yet, to fully support the latest elastic resource types of the edge and fog domain, the design concepts and automation methods of Mowgli require extensions to cover these resource domains to their full extent. Secondly, Mowgli is validated against the established cross-domain evaluation principles. The results verify that Mowgli follows the established principles while emphasizing reproducibility, portability and automation for all evaluation objectives.

7.2 Future Research

Selecting the optimal DBMS operational model for data-intensive applications is a complex and manifold process that requires a methodological approach as presented in this thesis. The Mowgli framework provides a first method to support this decision process but also reveals some limitations that are discussed throughout the document. Therefore, we briefly summarize these limitations and point out future research that is envisioned to address them. We classify these future research directions into evaluation design, evaluation automation and evaluation analytics.

Evaluation Design While the concept of ESTs provides the model-based abstraction for designing evaluation scenarios, we envision that ESTs can be enhanced with sophisticated cloud modelling concepts to enable a higher level of abstraction and, in consequence, an even further reduction of the required domain knowledge. Moreover, extending ESTs with established cloud modelling concepts will enable the specification of comprehensive performance models that define fine-grained service-level objectives for the target DBMS operational model.

Evaluation Automation In order to support the full scope of elastic resources, Mowgli’s automation and orchestration capabilities need to be extended to support the recent advances of edge and fog resources, enabling the evaluation of wide-area DBMS deployments. Following this approach would also enable the evaluation of latency-specific failure scenarios in geo-replicated DBMS deployments. Apart from that, Mowgli does not consider costs as an evaluation objective. Yet, its modular architecture allows the seamless extension of the framework to measure the operational costs of the applied elastic resources and to consider them as an additional evaluation objective. With such an extension, Mowgli could be applied to address complex capacity planning problems as well as to directly compare self-hosted DBMSs and established Database as a Service (DBaaS) providers.
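Such a cost objective could, for instance, be approximated from the allocated virtual machines and the scenario runtime, as in the following sketch; the instance types and hourly prices are hypothetical placeholders and would have to be retrieved from the respective resource provider.

# Hypothetical per-hour prices; real offers would be queried from the provider.
HOURLY_PRICE = {"m1.medium": 0.05, "m1.large": 0.10, "m1.xlarge": 0.20}

def scenario_cost(allocated_vms, runtime_hours):
    """Operational resource cost of one evaluation scenario.

    `allocated_vms` maps an instance type to the number of VMs used, e.g.
    {"m1.large": 5}; billing granularity and data transfer are ignored here.
    """
    return sum(HOURLY_PRICE[t] * count * runtime_hours
               for t, count in allocated_vms.items())

print(scenario_cost({"m1.large": 5, "m1.medium": 1}, runtime_hours=2.5))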

Evaluation Analytics While Mowgli already provides basic data processing capabilities to automatically process the evaluation objective metrics, we see great potential in applying more sophisticated data analytic methods to the resulting data sets. This would allow identifying parametric dependencies between the DBMS, resource and workload domains. At the same time, automatically deriving DBMS adaptation strategies for diverse workload patterns could be enabled. Moreover, first data analytic methods building upon Mowgli are already under development that enable the prediction of evaluation results by applying machine learning. Based on these future research directions, Mowgli could pave the way for a generic performance model of distributed DBMSs on elastic infrastructures.

Acronyms

ANOVA Analysis of Variance

API Application Programming Interface

AWS Amazon Web Services

COT Cloud Orchestration Tool

CRUD Create, Read, Update, Delete

DBA Database Administrator

DBaaS Database as a Service

DBMS Database Management System

DSL Domain Specific Language

EC2 Elastic Compute Cloud

EMF Eclipse Modelling Framework

ESP Evaluation Scenario Process

EST Evaluation Scenario Template

EP Evaluation Principle

ET Evaluation Task

HDD Hard Disk Drive

HTAP Hybrid Transaction-Analytical Processing

HWV Hardware Virtualization

IaaS Infrastructure as a Service

IoT Internet of Things

IT Information Technology

JSON JavaScript Object Notation

NIST National Institute of Standards and Technology


NVME Non-Volatile Memory Express

OLAP Online Analytical Processing

OLTP Online Transaction Processing

OS Operating System

OSV Operating System Virtualization

PaaS Platform as a Service

RDBMS Relational Database Management System

REST Representational State Transfer

RO Research Objective

RQ Research Question

SaaS Software as a Service

SLA Service Level Agreement

SRL Scalability Rule Language

SSD Solid State Drive

TPC Transaction Processing Performance Council

VM Virtual Machine

YCSB Yahoo Cloud Serving Benchmark

Bibliography

[1] Donald D. Chamberlin and Raymond F. Boyce. “SEQUEL: A Structured English Query Language”. In: Proceedings of the 1974 ACM SIGFIDET (Now SIGMOD) Workshop on Data Description, Access and Control. SIGFIDET ’74. Ann Arbor, Michigan: Association for Computing Machinery, 1974, pp. 249– 264. isbn: 9781450374156. doi: 10.1145/800296.811515. [2] Gerald J. Popek and Robert P. Goldberg. “Formal Requirements for Virtualizable Third Generation Architectures”. In: Commun. ACM 17.7 (July 1974), pp. 412–421. issn: 0001-0782. doi: 10.1145/ 361011.361073. [3] E. F. Codd. “A Relational Model of Data for Large Shared Data Banks”. In: Commun. ACM 26.1 (Jan. 1983), pp. 64–69. issn: 0001-0782. doi: 10.1145/357980.358007. [4] Theo Haerder and Andreas Reuter. “Principles of Transaction-Oriented Database Recovery”. In: ACM Comput. Surv. 15.4 (Dec. 1983), pp. 287–317. issn: 0360-0300. doi: 10.1145/289.291. [5] Michael Stonebraker. “The case for shared nothing”. In: IEEE Database Eng. Bull. 9.1 (1986), pp. 4– 9. url: https://dsf.berkeley.edu/papers/hpts85-nothing.pdf. [6] Jim Gray. “A View of Database System Performance Measures”. In: SIGMETRICS Perform. Eval. Rev. 15.1 (May 1987), pp. 3–4. issn: 0163-5999. doi: 10.1145/29904.29905. [7] Anne Geraci, Freny Katki, Louise McMonegal, Bennett Meyer, John Lane, Paul Wilson, Jane Radatz, Mary Yee, Hugh Porteous and Fredrick Springsteel. IEEE Standard Computer Dictionary: Compilation of IEEE Standard Computer Glossaries. IEEE Press, 1991. isbn: 1559370793. [8] Raj Jain. The art of computer systems performance analysis - techniques for experimental design, measurement, simulation, and modeling. Wiley professional computing. Wiley, 1991, pp. I–XXVII, 1–685. isbn: 978-0-471-50336-1. [9] Jim Gray. Benchmark Handbook: For Database and Transaction Processing Systems. San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1992. isbn: 1558601597. [10] C. Mohan, Don Haderle, Bruce Lindsay, Hamid Pirahesh and Peter Schwarz. “ARIES: A Transaction Recovery Method Supporting Fine-Granularity Locking and Partial Rollbacks Using Write-Ahead Log- ging”. In: ACM Trans. Database Syst. 17.1 (Mar. 1992), pp. 94–162. issn: 0362-5915. doi: 10.1145/ 128765.128770. [11] Jim Gray, Prakash Sundaresan, Susanne Englert, Ken Baclawski and Peter J. Weinberger. “Quickly Generating Billion-Record Synthetic Databases”. In: Proceedings of the 1994 ACM SIGMOD Interna- tional Conference on Management of Data. SIGMOD ’94. Minneapolis, Minnesota, USA: Association for Computing Machinery, 1994, pp. 243–252. isbn: 0897916395. doi: 10.1145/191839.191886. [12] Clark D French. “’One size fits all’ database architectures do not work for DSS”. In: ACM SIGMOD Record. Vol. 24. 2. ACM. 1995, pp. 449–450. doi: 10.1145/223784.223871.


[13] Surajit Chaudhuri and Umeshwar Dayal. “An Overview of Data Warehousing and OLAP Technology”. In: SIGMOD Rec. 26.1 (Mar. 1997), pp. 65–74. issn: 0163-5808. doi: 10.1145/248603.248616. [14] David Karger, Eric Lehman, Tom Leighton, Rina Panigrahy, Matthew Levine and Daniel Lewin. “Con- sistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web”. In: Proceedings of the Twenty-Ninth Annual ACM Symposium on Theory of Com- puting. STOC ’97. El Paso, Texas, USA: Association for Computing Machinery, 1997, pp. 654–663. isbn: 0897918886. doi: 10.1145/258533.258660. [15] “Towards Robust Distributed Systems (Abstract)”. In: Proceedings of the Nineteenth Annual ACM Symposium on Principles of Distributed Computing. PODC ’00. Portland, Oregon, USA: Association for Computing Machinery, 2000, p. 7. isbn: 1581131836. doi: 10.1145/343477.343502. [16] Matthias Wiesmann, Fernando Pedone, André Schiper, Bettina Kemme and Gustavo Alonso. “Data- base replication techniques: A three parameter classification”. In: Proceedings 19th IEEE Sym- posium on Reliable Distributed Systems SRDS-2000. IEEE. 2000, pp. 206–215. doi: 10 . 1109 / RELDI.2000.885408. [17] Matthias Wiesmann, Fernando Pedone, André Schiper, Bettina Kemme and Gustavo Alonso. “Un- derstanding replication in databases and distributed systems”. In: Proceedings 20th IEEE Interna- tional Conference on Distributed Computing Systems. IEEE. 2000, pp. 464–474. doi: 10 . 1109 / ICDCS.2000.840959. [18] Hakan Hacigumus, Bala Iyer and Sharad Mehrotra. “Providing database as a service”. In: Proceed- ings 18th International Conference on Data Engineering. IEEE. 2002, pp. 29–38. doi: 10. 1109 / ICDE.2002.994695. [19] Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim Harris, Alex Ho, Rolf Neugebauer, Ian Pratt and Andrew Warfield. “Xen and the Art of Virtualization”. In: Proceedings of the Nineteenth ACM Symposium on Operating Systems Principles. SOSP ’03. Bolton Landing, NY, USA: Association for Computing Machinery, 2003, pp. 164–177. isbn: 1581137575. doi: 10.1145/945445.945462. [20] Emmanuel Cecchet, Anupam Chanda, Sameh Elnikety, Julie Marguerite and Willy Zwaenepoel. “Per- formance comparison of middleware architectures for generating dynamic web content”. In: ACM/I- FIP/USENIX International Conference on Distributed Systems Platforms and Open Distributed Pro- cessing. Springer. 2003, pp. 242–261. doi: 10.1007/3-540-44892-6_13. [21] Algirdas Avizienis, J-C Laprie, Brian Randell and Carl Landwehr. “Basic concepts and taxonomy of dependable and secure computing”. In: IEEE transactions on dependable and secure computing 1.1 (2004), pp. 11–33. doi: 10.1109/TDSC.2004.2. [22] M. Stonebraker and U. Cetintemel. “’One size fits all’: an idea whose time has come and gone”. In: 21st International Conference on Data Engineering (ICDE’05). 2005, pp. 2–11. doi: 10.1109/ICDE. 2005.1. [23] Dror G Feitelson. Experimental computer science: The need for a cultural change. Tech. rep. School of Computer Science and Engineering, The Hebrew University of Jerusalem, 2006. url: http:// citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.76.1883&rep=rep1&type=pdf. 83

[24] Bianca Schroeder, Adam Wierman and Mor Harchol-Balter. “Open versus Closed: A Cautionary Tale”. In: Proceedings of the 3rd Conference on Networked Systems Design & Implementation - Volume 3. NSDI’06. San Jose, CA: USENIX Association, 2006, p. 18. url: https://static.usenix.org/ events/nsdi06/tech/full_papers/schroeder/schroeder.pdf. [25] Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Laksh- man, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall and Werner Vogels. “Dynamo: Amazon’s Highly Available Key-Value Store”. In: Proceedings of Twenty-First ACM SIGOPS Sym- posium on Operating Systems Principles. SOSP ’07. Stevenson, Washington, USA: Association for Computing Machinery, 2007, pp. 205–220. isbn: 9781595935915. doi: 10 . 1145 / 1294261 . 1294281. [26] Avi Kivity, Yaniv Kamay, Dor Laor, Uri Lublin and Anthony Liguori. “KVM: the Linux Virtual Machine Monitor”. In: In Proceedings of the 2007 Ottawa Linux Symposium (OLS’-07. 2007. url: http:// citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.423.3470&rep=rep1&type=pdf. [27] Michael Stonebraker, Samuel Madden, Daniel J. Abadi, Stavros Harizopoulos, Nabil Hachem and Pat Helland. “The End of an Architectural Era: (It’s Time for a Complete Rewrite)”. In: Proceedings of the 33rd International Conference on Very Large Data Bases. VLDB ’07. Vienna, Austria: VLDB Endowment, 2007, pp. 1150–1160. isbn: 9781595936493. doi: 10.1145/3226595.3226637. [28] Andrew S Tanenbaum and Maarten Van Steen. Distributed systems: principles and paradigms. Prentice-Hall, 2007. [29] Anja Bog, Jens Kruger and Jan Schaffner. “A composite benchmark for online transaction processing and operational reporting”. In: 2008 IEEE Symposium on Advanced Management of Information for Globalized Enterprises (AMIGE). IEEE. 2008, pp. 1–5. doi: 10.1109/AMIGE.2008.ECP.30. [30] Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C Hsieh, Deborah A Wallach, Mike Burrows, Tushar Chandra, Andrew Fikes and Robert E Gruber. “Bigtable: A distributed storage system for structured data”. In: ACM Transactions on Computer Systems (TOCS) 26.2 (2008), p. 4. doi: 10 . 1145/1365815.1365816. [31] Dan Pritchett. “Base: An acid alternative”. In: Queue 6.3 (2008), pp. 48–55. doi: 10.1145/1394127. 1394128. [32] Will Sobel, Shanti Subramanyam, Akara Sucharitakul, Jimmy Nguyen, Hubert Wong, Arthur Klepchukov, Sheetal Patil, Armando Fox and David Patterson. “Cloudstone: Multi-platform, multi- language benchmark and measurement tools for web 2.0”. In: Proc. of CCA. Vol. 8. 2008, p. 228. url: https://pdfs.semanticscholar.org/34dd/c3da70f5b17ae0a73266ad1e4f9ae155811f. pdf. [33] Ming Zhong, Kai Shen and Joel Seiferas. “Replication Degree Customization for High Availability”. In: Proceedings of the 3rd ACM SIGOPS/EuroSys European Conference on Computer Systems 2008. Eurosys ’08. Glasgow, Scotland UK: Association for Computing Machinery, 2008, pp. 55–68. isbn: 9781605580135. doi: 10.1145/1352592.1352599. [34] Carsten Binnig, Donald Kossmann, Tim Kraska and Simon Loesing. “How is the Weather Tomorrow? Towards a Benchmark for the Cloud”. In: Proceedings of the Second International Workshop on Test- ing Database Systems. DBTest ’09. Providence, Rhode Island: Association for Computing Machinery, 2009. isbn: 9781605587066. doi: 10.1145/1594156.1594168. 84 Bibliography

[35] Daniela Florescu and Donald Kossmann. “Rethinking Cost and Performance of Database Systems”. In: SIGMOD Rec. 38.1 (June 2009), pp. 43–48. issn: 0163-5808. doi: 10.1145/1558334.1558339. [36] Patrick O’Neil, Elizabeth O’Neil, Xuedong Chen and Stephen Revilak. “The star schema benchmark and augmented fact table indexing”. In: Technology Conference on Performance Evaluation and Benchmarking. Springer. 2009, pp. 237–252. doi: 10.1007/978-3-642-10424-4_17. [37] Werner Vogels. “Eventually Consistent”. In: Commun. ACM 52.1 (Jan. 2009), pp. 40–44. issn: 0001- 0782. doi: 10.1145/1435417.1435432. [38] Michael Armbrust et al. “A View of Cloud Computing”. In: Commun. ACM 53.4 (Apr. 2010), pp. 50– 58. issn: 0001-0782. doi: 10.1145/1721654.1721672. [39] Aaron Beitch, Brandon Liu, Timothy Yung, Rean Griffith, Armando Fox and David A. Patterson. Rain: A Workload Generation Toolkit for Cloud Computing Applications. Tech. rep. UCB/EECS-2010-14. EECS Department, University of California, Berkeley, Feb. 2010. url: http://www2.eecs.berkeley. edu/Pubs/TechRpts/2010/EECS-2010-14.html. [40] Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan and Russell Sears. “Benchmark- ing Cloud Serving Systems with YCSB”. In: Proceedings of the 1st ACM Symposium on Cloud Com- puting. SoCC ’10. Indianapolis, Indiana, USA: Association for Computing Machinery, 2010, pp. 143– 154. isbn: 9781450300360. doi: 10.1145/1807128.1807152. [41] Daniel Ford, François Labelle, Florentina I. Popovici, Murray Stokely, Van-Anh Truong, Luiz Barroso, Carrie Grimes and Sean Quinlan. “Availability in Globally Distributed Storage Systems”. In: OSDI’10 (2010), pp. 61–74. [42] Avinash Lakshman and Prashant Malik. “Cassandra: a decentralized structured storage system”. In: ACM SIGOPS Operating Systems Review 44.2 (2010), pp. 35–40. doi: 10.1145/1773912.1773922. [43] Ang Li, Xiaowei Yang, Srikanth Kandula and Ming Zhang. “CloudCmp: Comparing Public Cloud Pro- viders”. In: Proceedings of the 10th ACM SIGCOMM Conference on Internet Measurement. IMC ’10. Melbourne, Australia: Association for Computing Machinery, 2010, pp. 1–14. isbn: 9781450304832. doi: 10.1145/1879141.1879143. [44] Jörg Schad, Jens Dittrich and Jorge-Arnulfo Quiané-Ruiz. “Runtime Measurements in the Cloud: Ob- serving, Analyzing, and Reducing Variance”. In: Proc. VLDB Endow. 3.1–2 (Sept. 2010), pp. 460–471. issn: 2150-8097. doi: 10.14778/1920841.1920902. [45] TPC Benchmark ™C. Version 5.11. Transaction Processing Performance Council (TPC). 2010. url: http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-c_v5.11.0.pdf. [46] Divyakant Agrawal, Amr El Abbadi, Sudipto Das and Aaron J Elmore. “Database scalability, elasti- city, and autonomy in the cloud”. In: International Conference on Database Systems for Advanced Applications. Springer. Springer Berlin Heidelberg, 2011, pp. 2–15. doi: 10.1007/978- 3- 642- 20149-3_2. [47] Matthew Aslett. How will the database incumbents respond to NoSQL and NewSQL. Tech. rep. 451 Research, 2011. url: https://www.cs.cmu.edu/~pavlo/courses/fall2013/static/papers/ aslett-newsql.pdf. 85

[48] David Bermbach and Stefan Tai. “Eventual Consistency: How Soon is Eventual? An Evaluation of Amazon S3’s Consistency Behavior”. In: Proceedings of the 6th Workshop on Middleware for Service Oriented Computing. MW4SOC ’11. Lisbon, Portugal: Association for Computing Machinery, 2011. isbn: 9781450310673. doi: 10.1145/2093185.2093186. [49] Anja Bog, Hasso Plattner and Alexander Zeier. “A mixed transaction processing and operational reporting benchmark”. In: Information Systems Frontiers 13.3 (2011), pp. 321–335. doi: 10.1007/ s10796-010-9283-8. [50] Anja Bog, Kai Sachs and Alexander Zeier. “Benchmarking Database Design for Mixed OLTP and OLAP Workloads”. In: Proceedings of the 2nd ACM/SPEC International Conference on Performance Engineering. ICPE ’11. Karlsruhe, Germany: Association for Computing Machinery, 2011, pp. 417– 418. isbn: 9781450305198. doi: 10.1145/1958746.1958806. [51] Rick Cattell. “Scalable SQL and NoSQL data stores”. In: Acm Sigmod Record 39.4 (2011), pp. 12–27. doi: 10.1145/1978915.1978919. [52] Richard Cole et al. “The Mixed Workload CH-benCHmark”. In: Proceedings of the Fourth International Workshop on Testing Database Systems. DBTest ’11. Athens, Greece: ACM, 2011, 8:1–8:6. isbn: 978- 1-4503-0655-3. doi: 10.1145/1988842.1988850. [53] Thibault Dory, Boris Mejı￿as, PV Roy and Nam-Luc Tran. “Measuring elasticity for cloud databases”. In: Proceedings of the The Second International Conference on Cloud Computing, GRIDs, and Virtualization. Citeseer. 2011, pp. 37–48. url: https : / / www . info . ucl . ac . be / ~pvr / CC2011elasticityCRfinal.pdf. [54] Robin Hecht and Stefan Jablonski. “NoSQL Evaluation: A Use Case Oriented Survey”. In: Proceed- ings of the 2011 International Conference on Cloud and Service Computing. CSC ’11. USA: IEEE Com- puter Society, 2011, pp. 336–341. isbn: 9781457716355. doi: 10.1109/CSC.2011.6138544. [55] Ioannis Konstantinou, Evangelos Angelou, Christina Boumpouka, Dimitrios Tsoumakos and Nec- tarios Koziris. “On the Elasticity of NoSQL Databases over Cloud Management Platforms”. In: Pro- ceedings of the 20th ACM International Conference on Information and Knowledge Management. CIKM ’11. Glasgow, Scotland, UK: Association for Computing Machinery, 2011, pp. 2385–2388. isbn: 9781450307178. doi: 10.1145/2063576.2063973. [56] Peter Mell, Tim Grance et al. The NIST definition of cloud computing. Tech. rep. 2011. doi: 10.6028/ NIST.SP.800-145. [57] Swapnil Patil, Milo Polte, Kai Ren, Wittawat Tantisiriroj, Lin Xiao, Julio López, Garth Gibson, Adam Fuchs and Billie Rinaldi. “YCSB++: Benchmarking and Performance Debugging Advanced Features in Scalable Table Stores”. In: Proceedings of the 2nd ACM Symposium on Cloud Computing. SOCC ’11. Cascais, Portugal: Association for Computing Machinery, 2011. isbn: 9781450309769. doi: 10. 1145/2038916.2038925. [58] Zia ur Rehman, Farookh K Hussain and Omar K Hussain. “Towards multi-criteria cloud service selec- tion”. In: 2011 Fifth International Conference on Innovative Mobile and Internet Services in Ubiquit- ous Computing. IEEE. 2011, pp. 44–48. doi: 10.1109/IMIS.2011.99. 86 Bibliography

[59] Michael Stonebraker and Rick Cattell. “10 Rules for Scalable Performance in “simple Operation” Datastores”. In: Commun. ACM 54.6 (June 2011), pp. 72–80. issn: 0001-0782. doi: 10 . 1145 / 1953122.1953144. [60] Steve Strauch, Oliver Kopp, Frank Leymann and Tobias Unger. “A taxonomy for cloud data host- ing solutions”. In: 2011 IEEE Ninth International Conference on Dependable, Autonomic and Secure Computing. IEEE. 2011, pp. 577–584. doi: 10.1109/DASC.2011.106. [61] Hiroshi Wada, Alan Fekete, Liang Zhao, Kevin Lee and Anna Liu. “Data Consistency Properties and the Trade-offs in Commercial Cloud Storage: the Consumers’ Perspective.” In: CIDR. Vol. 11. 2011, pp. 134–143. url: http://www.echronos.systems/publications/nicta_full_text/4341. pdf. [62] Daniel Abadi. “Consistency tradeoffs in modern distributed database system design: CAP is only part of the story”. In: Computer 45.2 (2012), pp. 37–42. doi: 10.1109/MC.2012.33. [63] Anja Bog, Kai Sachs and Hasso Plattner. “Interactive Performance Monitoring of a Composite OLTP and OLAP Workload”. In: Proceedings of the 2012 ACM SIGMOD International Conference on Man- agement of Data. SIGMOD ’12. Scottsdale, Arizona, USA: Association for Computing Machinery, 2012, pp. 645–648. isbn: 9781450312479. doi: 10.1145/2213836.2213921. [64] Flavio Bonomi, Rodolfo Milito, Jiang Zhu and Sateesh Addepalli. “Fog Computing and Its Role in the Internet of Things”. In: Proceedings of the First Edition of the MCC Workshop on Mobile Cloud Computing. MCC ’12. Helsinki, Finland: Association for Computing Machinery, 2012, pp. 13–16. isbn: 9781450315197. doi: 10.1145/2342509.2342513. [65] Eric Brewer. “CAP twelve years later: How the” rules” have changed”. In: Computer 45.2 (2012), pp. 23–29. doi: 10.1109/MC.2012.37. [66] Avrilia Floratou, Jignesh M. Patel, Willis Lang and Alan Halverson. “When Free Is Not Really Free: What Does It Cost to Run a Database Workload in the Cloud?” In: Topics in Performance Evaluation, Measurement and Characterization. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012, pp. 163– 179. isbn: 978-3-642-32627-1. doi: 10.1007/978-3-642-32627-1_12. [67] Avrilia Floratou, Nikhil Teletia, David J. DeWitt, Jignesh M. Patel and Donghui Zhang. “Can the Ele- phants Handle the NoSQL Onslaught?” In: Proc. VLDB Endow. 5.12 (Aug. 2012), pp. 1712–1723. issn: 2150-8097. doi: 10.14778/2367502.2367511. [68] Enno Folkerts, Alexander Alexandrov, Kai Sachs, Alexandru Iosup, Volker Markl and Cafer Tosun. “Benchmarking in the cloud: What it should, can, and cannot be”. In: Technology Conference on Performance Evaluation and Benchmarking. Springer. 2012, pp. 173–188. doi: 10.1007/978-3- 642-36727-4_12. [69] Markus Klems, David Bermbach and Rene Weinert. “A runtime quality measurement framework for cloud database service systems”. In: 2012 Eighth International Conference on the Quality of Information and Communications Technology. IEEE. 2012, pp. 38–46. doi: 10.1109/QUATIC.2012. 17. [70] Pramod J. Sadalage and Martin Fowler. NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. 1st. Addison-Wesley Professional, 2012. isbn: 0321826620. 87

[71] Michael Stonebraker. “New Opportunities for New SQL”. In: Commun. ACM 55.11 (Nov. 2012), pp. 10–11. issn: 0001-0782. doi: 10.1145/2366316.2366319. [72] Michael Stonebraker. Newsql: An alternative to nosql and old sql for new oltp apps. Blog@CACM. 2012. url: https://cacm.acm.org/blogs/blog-cacm/109710-new-sql-an-alternative- to-nosql-and-old-sql-for-new-oltp-apps/comments?page=1 (visited on 03/04/2020). [73] Smitha Sundareswaran, Anna Squicciarini and Dan Lin. “A brokerage-based approach for cloud service selection”. In: 2012 IEEE Fifth International Conference on Cloud Computing. IEEE. 2012, pp. 558–565. doi: 10.1109/CLOUD.2012.119. [74] Dennis Westermann, Jens Happe, Rouven Krebs and Roozbeh Farahbod. “Automated Inference of Goal-Oriented Performance Prediction Functions”. In: Proceedings of the 27th IEEE/ACM Interna- tional Conference on Automated Software Engineering. ASE 2012. Essen, Germany: Association for Computing Machinery, 2012, pp. 190–199. isbn: 9781450312042. doi: 10 . 1145 / 2351676 . 2351703. [75] Rodrigo F Almeida, Flávio RC Sousa, Sérgio Lifschitz and Javam C Machado. “On defining metrics for elasticity of cloud databases.” In: SBBD (Short Papers). 2013, pp. 12–1. url: https://sbbd2013. cin.ufpe.br/Proceedings/artigos/pdfs/sbbd_shp_12.pdf. [76] Sérgio Almeida, João Leitão and Luı￿sRodrigues. “ChainReaction: A Causal+ Consistent Datastore Based on Chain Replication”. In: Proceedings of the 8th ACM European Conference on Computer Systems. EuroSys ’13. Prague, Czech Republic: Association for Computing Machinery, 2013, pp. 85– 98. isbn: 9781450319942. doi: 10.1145/2465351.2465361. [77] Sumita Barahmand and Shahram Ghandeharizadeh. “BG: A Benchmark to Evaluate Interactive So- cial Networking Actions.” In: 6th Biennial Conference on Innovative Data Systems (CIDR). 2013. url: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.306.9222&rep=rep1& type=pdf. [78] David Bermbach and Jörn Kuhlenkamp. “Consistency in distributed storage systems”. In: Networked Systems. Springer, 2013, pp. 175–189. doi: 10.1007/978-3-642-40148-0_13. [79] Tobias Binz, Uwe Breitenbücher, Florian Haupt, Oliver Kopp, Frank Leymann, Alexander Nowak and Sebastian Wagner. “OpenTOSCA–a runtime for TOSCA-based cloud applications”. In: International Conference on Service-Oriented Computing. Springer. 2013, pp. 692–695. doi: 10.1007/978-3- 642-45005-1_62. [80] James C. Corbett et al. “Spanner: Google’s Globally Distributed Database”. In: ACM Trans. Comput. Syst. 31.3 (Aug. 2013). issn: 0734-2071. doi: 10.1145/2491245. [81] Francisco Cruz, Francisco Maia, Miguel Matos, Rui Oliveira, João Paulo, José Pereira and Ricardo Vilaça. “MeT: Workload Aware Elasticity for NoSQL”. In: Proceedings of the 8th ACM European Con- ference on Computer Systems. EuroSys ’13. Prague, Czech Republic: Association for Computing Ma- chinery, 2013, pp. 183–196. isbn: 9781450319942. doi: 10.1145/2465351.2465370. [82] Djellel Eddine Difallah, Andrew Pavlo, Carlo Curino and Philippe Cudre-Mauroux. “OLTP-Bench: An Extensible Testbed for Benchmarking Relational Databases”. In: Proc. VLDB Endow. 7.4 (Dec. 2013), pp. 277–288. issn: 2150-8097. doi: 10.14778/2732240.2732246. 88 Bibliography

[83] Jörg Domaschka. “A comprehensive approach to transparent and flexible replication of Java ser- vices and applications”. PhD thesis. Universität Ulm. Fakultät für Ingenieurwissenschaften und In- formatik, 2013. doi: 10.18725/OPARU-2485. [84] Nicolas Ferry, Franck Chauvel, Alessandro Rossini, Brice Morin and Arnor Solberg. “Managing Multi- Cloud Systems with CloudMF”. In: Proceedings of the Second Nordic Symposium on Cloud Comput- ing & Internet Technologies. NordiCloud ’13. Oslo, Norway: Association for Computing Machinery, 2013, pp. 38–45. isbn: 9781450323079. doi: 10.1145/2513534.2513542. [85] Ahmad Ghazal, Tilmann Rabl, Minqing Hu, Francois Raab, Meikel Poess, Alain Crolotte and Hans- Arno Jacobsen. “BigBench: Towards an Industry Standard Benchmark for Big Data Analytics”. In: Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. SIGMOD ’13. New York, New York, USA: Association for Computing Machinery, 2013, pp. 1197–1208. isbn: 9781450320375. doi: 10.1145/2463676.2463712. [86] Katarina Grolinger, Wilson A. Higashino, Abhinav Tiwari and Miriam AM Capretz. “Data management in cloud environments: NoSQL and NewSQL data stores”. In: Journal of Cloud Computing: Advances, Systems and Applications 2.1 (2013), p. 22. issn: 2192-113X. doi: 10.1186/2192-113X-2-22. [87] S. Kächele, C. Spann, F. J. Hauck and J. Domaschka. “Beyond IaaS and PaaS: An Extended Cloud Tax- onomy for Computation, Storage and Networking”. In: 2013 IEEE/ACM 6th International Conference on Utility and Cloud Computing. 2013, pp. 75–82. doi: 10.1109/UCC.2013.28. [88] Markus Klems and Hoàng Anh Lê. “Position Paper: Cloud System Deployment and Performance Evaluation Tools for Distributed Databases”. In: Proceedings of the 2013 International Workshop on Hot Topics in Cloud Services. HotTopiCS ’13. Prague, Czech Republic: Association for Computing Machinery, 2013, pp. 63–70. isbn: 9781450320511. doi: 10.1145/2462307.2462322. [89] Harold Lim, Yuzhang Han and Shivnath Babu. “How to Fit when No One Size Fits”. In: 6th Biennial Conference on Innovative Data Systems Research (CIDR ’13). Vol. 4. 2013. url: http://citeseerx. ist.psu.edu/viewdoc/download?doi=10.1.1.360.298&rep=rep1&type=pdf. [90] M. Silva, M. R. Hines, D. Gallo, Q. Liu, K. D. Ryu and D. d. Silva. “CloudBench: Experiment Automation for Cloud Environments”. In: 2013 IEEE International Conference on Cloud Engineering (IC2E). 2013, pp. 302–311. doi: 10.1109/IC2E.2013.33. [91] D. Tsoumakos, I. Konstantinou, C. Boumpouka, S. Sioutas and N. Koziris. “Automated, Elastic Re- source Provisioning for NoSQL Clusters Using TIRAMOLA”. In: 2013 13th IEEE/ACM International Sym- posium on Cluster, Cloud, and Grid Computing. May 2013, pp. 34–41. doi: 10.1109/CCGrid.2013. 45. [92] Chaitanya Baru, Milind Bhandarkar, Carlo Curino, Manuel Danisch, Michael Frank, Bhaskar Gowda, Hans-Arno Jacobsen, Huang Jie, Dileep Kumar, Raghunath Nambiar et al. “Discussion of BigBench: a proposed industry standard performance benchmark for big data”. In: Technology Conference on Performance Evaluation and Benchmarking. Springer. 2014, pp. 44–63. doi: 10.1007/978-3-319- 15350-6_4. [93] David Bermbach. “Benchmarking Eventually Consistent Distributed Storage Systems”. PhD thesis. 2014. 183 pp. isbn: 978-3-7315-0186-2. doi: 10.5445/KSP/1000039389. 89

[94] David Bermbach, Jörn Kuhlenkamp, Akon Dey, Sherif Sakr and Raghunath Nambiar. “Towards an extensible middleware for database benchmarking”. In: Technology Conference on Performance Evaluation and Benchmarking. Springer. 2014, pp. 82–96. doi: 10.1007/978-3-319-15350-6_6. [95] Tobias Binz, Uwe Breitenbücher, Oliver Kopp and Frank Leymann. “TOSCA: portable automated deployment and management of cloud applications”. In: Advanced Web Services. Springer, 2014, pp. 527–549. doi: 10.1007/978-1-4614-7535-4_22. [96] Akon Dey, Alan Fekete, Raghunath Nambiar and Uwe Röhm. “YCSB+T: Benchmarking web-scale transactional databases”. In: 2014 IEEE 30th International Conference on Data Engineering Work- shops. IEEE. 2014, pp. 223–230. doi: 10.1109/ICDEW.2014.6818330. [97] Nuno Diegues, Muhammet Orazov, João Paiva, Luı￿sRodrigues and Paolo Romano. “Optimizing Hy- perspace Hashing via Analytical Modelling and Adaptation”. In: SIGAPP Appl. Comput. Rev. 14.2 (June 2014), pp. 23–35. issn: 1559-6915. doi: 10.1145/2656864.2656866. [98] Jörg Domaschka, Christopher B. Hauser and Benjamin Erb. “Reliability and Availability Properties of Distributed Database Systems”. In: Proceedings of the 2014 IEEE 18th International Enterprise Distributed Object Computing Conference. EDOC ’14. USA: IEEE Computer Society, 2014, pp. 226– 233. isbn: 9781479954704. doi: 10.1109/EDOC.2014.38. [99] Steffen Friedrich, Wolfram Wingerath, Felix Gessert and Norbert Ritter. “Nosql OLTP benchmarking: A survey”. In: Informatik 2014. Bonn: Gesellschaft für Informatik e.V., 2014, pp. 693–704. url: https: //pdfs.semanticscholar.org/3d48/1573dec74303b00ac86a5e276eab67a1f0ab.pdf. [100] Alexandru Iosup, Radu Prodan and Dick Epema. “Iaas cloud benchmarking: approaches, challenges, and experience”. In: Cloud Computing for Data-Intensive Applications. Springer, 2014, pp. 83–104. doi: 10.1007/978-1-4939-1905-5_4. [101] Rouven Krebs, Christof Momm and Samuel Kounev. “Metrics and techniques for quantifying per- formance isolation in cloud environments”. In: Science of Computer Programming 90 (2014). Spe- cial Issue on Component-Based Software Engineering and Software Architecture, pp. 116–134. issn: 0167-6423. doi: https://doi.org/10.1016/j.scico.2013.08.003. [102] K. Kritikos, J. Domaschka and A. Rossini. “SRL: A Scalability Rule Language for Multi-cloud Environ- ments”. In: 2014 IEEE 6th International Conference on Cloud Computing Technology and Science. Dec. 2014, pp. 1–9. doi: 10.1109/CloudCom.2014.170. [103] Jörn Kuhlenkamp, Markus Klems and Oliver Röss. “Benchmarking Scalability and Elasticity of Dis- tributed Database Systems”. In: Proc. VLDB Endow. 7.12 (Aug. 2014), pp. 1219–1230. issn: 2150- 8097. doi: 10.14778/2732977.2732995. [104] Tania Lorido-Botran, Jose Miguel-Alonso and Jose A Lozano. “A review of auto-scaling techniques for elastic applications in cloud environments”. In: Journal of grid computing 12.4 (2014), pp. 559– 592. doi: 10.1007/s10723-014-9314-7. [105] Sherif Sakr. “Cloud-hosted databases: technologies, challenges and opportunities”. In: Cluster Computing 17.2 (2014), pp. 487–502. issn: 1573-7543. doi: 10.1007/s10586-013-0290-7. [106] J. Scheuner, P. Leitner, J. Cito and H. Gall. “Cloud Work Bench – Infrastructure-as-Code Based Cloud Benchmarking”. In: 2014 IEEE 6th International Conference on Cloud Computing Technology and Science. 2014, pp. 246–253. doi: 10.1109/CloudCom.2014.98. 90 Bibliography

[107] Michael Stonebraker, Andrew Pavlo, Rebecca Taft and Michael L Brodie. “Enterprise database ap- plications and the cloud: A difficult road ahead”. In: 2014 IEEE International Conference on Cloud Engineering. IEEE. 2014, pp. 1–6. doi: 10.1109/IC2E.2014.97. [108] Le Sun, Hai Dong, Farookh Khadeer Hussain, Omar Khadeer Hussain and Elizabeth Chang. “Cloud service selection: State-of-the-art and future research directions”. In: Journal of Network and Com- puter Applications 45 (2014), pp. 134–150. doi: 10.1016/j.jnca.2014.07.019. [109] Alexander Thomson, Thaddeus Diamond, Shu-Chun Weng, Kun Ren, Philip Shao and Daniel J. Abadi. “Fast Distributed Transactions and Strongly Consistent Replication for OLTP Database Systems”. In: ACM Trans. Database Syst. 39.2 (May 2014). issn: 0362-5915. doi: 10.1145/2556685. [110] Blesson Varghese, Ozgur Akgun, Ian Miguel, Long Thai and Adam Barker. “Cloud benchmarking for performance”. In: 2014 IEEE 6th International Conference on Cloud Computing Technology and Science. IEEE. 2014, pp. 535–540. doi: 10.1109/CloudCom.2014.28. [111] Max Chevalier, Mohammed El Malki, Arlind Kopliku, Olivier Teste and Ronan Tournier. “Benchmark for OLAP on NoSQL Technologies”. In: 9th IEEE International Conference on Research Challenges in Information Science (IEEE RCIS 2015). Athens, Greece, May 2015, pp. 480–485. url: https://hal. archives-ouvertes.fr/hal-01375413. [112] Jörg Domaschka, Frank Griesinger, Daniel Baur and Alessandro Rossini. “Beyond Mere Application Structure Thoughts on the Future of Cloud Orchestration Tools”. In: Procedia Computer Science 68 (2015). 1st International Conference on Cloud Forward: From Distributed to Complete Computing, pp. 151–162. issn: 1877-0509. doi: https://doi.org/10.1016/j.procs.2015.09.231. [113] Jörg Domaschka, Kyriakos Kritikos and Alessandro Rossini. “Towards a Generic Language for Scalab- ility Rules”. In: Advances in Service-Oriented and Cloud Computing. Ed. by Guadalupe Ortiz and Cuong Tran. Cham: Springer International Publishing, 2015, pp. 206–220. isbn: 978-3-319-14886- 1. [114] Eugene Eberbach and Alex Reuter. “Toward El Dorado for Cloud Computing: Lightweight VMs, Con- tainers, Meta-Containers and Oracles”. In: Proceedings of the 2015 European Conference on Soft- ware Architecture Workshops. ECSAW ’15. Dubrovnik, Cavtat, Croatia: Association for Computing Machinery, 2015. isbn: 9781450333931. doi: 10.1145/2797433.2797446. [115] Wes Felter, Alexandre Ferreira, Ram Rajamony and Juan Rubio. “An updated performance compar- ison of virtual machines and linux containers”. In: 2015 IEEE international symposium on perform- ance analysis of systems and software (ISPASS). IEEE. 2015, pp. 171–172. doi: 10.1109/ISPASS. 2015.7095802. [116] Nikolas Roman Herbst, Samuel Kounev, Andreas Weber and Henning Groenda. “BUNGEE: an elasti- city benchmark for self-adaptive IaaS cloud environments”. In: Proceedings of the 10th Interna- tional Symposium on Software Engineering for Adaptive and Self-Managing Systems. IEEE. 2015, pp. 46–56. doi: 10.1109/SEAMS.2015.23. [117] Zoltán Ádám Mann. “Allocation of Virtual Machines in Cloud Data Centers—A Survey of Problem Models and Optimization Algorithms”. In: ACM Comput. Surv. 48.1 (Aug. 2015). issn: 0360-0300. doi: 10.1145/2797211. url: https://doi.org/10.1145/2797211. 91

[118] Roberto Morabito, Jimmy Kjällman and Miika Komu. “Hypervisors vs. lightweight virtualization: a performance comparison”. In: 2015 IEEE International Conference on Cloud Engineering. IEEE. 2015, pp. 386–393. doi: 10.1109/IC2E.2015.74. [119] Raghunath Nambiar and Meikel Poess. “Reinventing the TPC: from traditional to big data to inter- net of things”. In: Technology Conference on Performance Evaluation and Benchmarking. Springer. 2015, pp. 1–7. doi: 10.1007/978-3-319-31409-9_1. [120] Linh Manh Pham, Alain Tchana, Didier Donsez, Noel De Palma, Vincent Zurczak and Pierre-Yves Gibello. “Roboconf: a hybrid cloud orchestrator to deploy complex applications”. In: 2015 IEEE 8th International Conference on Cloud Computing. IEEE. 2015, pp. 365–372. doi: 10.1109/CLOUD.2015. 56. [121] Norbert Siegmund, Alexander Grebhahn, Sven Apel and Christian Kästner. “Performance-Influence Models for Highly Configurable Systems”. In: Proceedings of the 2015 10th Joint Meeting on Found- ations of Software Engineering. ESEC/FSE 2015. Bergamo, Italy: Association for Computing Ma- chinery, 2015, pp. 284–294. isbn: 9781450336758. doi: 10.1145/2786805.2786845. [122] TPC Benchmark ™E. Version 1.14.0. Transaction Processing Performance Council (TPC). 2015. url: http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-e_v1.14.0.pdf. [123] Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, David Oppenheimer, Eric Tune and John Wilkes. “Large-Scale Cluster Management at Google with Borg”. In: Proceedings of the Tenth European Con- ference on Computer Systems. EuroSys ’15. Bordeaux, France: Association for Computing Machinery, 2015. isbn: 9781450332385. doi: 10.1145/2741948.2741964. [124] Miguel G Xavier, Israel C De Oliveira, Fabio D Rossi, Robson D Dos Passos, Kassiano J Matteussi and César AF De Rose. “A performance isolation analysis of disk-intensive workloads on container- based clouds”. In: 2015 23rd Euromicro International Conference on Parallel, Distributed, and Network-Based Processing. IEEE. 2015, pp. 253–260. doi: 10.1109/PDP.2015.67. [125] Shanhe Yi, Zijiang Hao, Zhengrui Qin and Qun Li. “Fog computing: Platform and applications”. In: 2015 Third IEEE Workshop on Hot Topics in Web Systems and Technologies (HotWeb). IEEE. 2015, pp. 73–78. doi: 10.1109/HotWeb.2015.22. [126] Y. Zhang, J. Guo, E. Blais and K. Czarnecki. “Performance Prediction of Configurable Software Sys- tems by Fourier Learning (T)”. In: 2015 30th IEEE/ACM International Conference on Automated Soft- ware Engineering (ASE). 2015, pp. 365–373. doi: 10.1109/ASE.2015.15. [127] Daniel Abadi et al. “The Beckman Report on Database Research”. In: Commun. ACM 59.2 (Jan. 2016), pp. 92–99. issn: 0001-0782. doi: 10.1145/2845915. url: https://doi.org/10.1145/2845915. [128] A. Basiri, N. Behnam, R. de Rooij, L. Hochstein, L. Kosewski, J. Reynolds and C. Rosenthal. “Chaos Engineering”. In: IEEE Software 33.03 (May 2016), pp. 35–41. issn: 1937-4194. doi: 10.1109/MS. 2016.60. [129] Brendan Burns, Brian Grant, David Oppenheimer, Eric Brewer and John Wilkes. “Borg, Omega, and Kubernetes”. In: Queue 14.1 (Jan. 2016), pp. 70–93. issn: 1542-7730. doi: 10 . 1145 / 2898442 . 2898444. 92 Bibliography

[130] Jose Carrasco, Javier Cubo, Francisco Durán and Ernesto Pimentel. “Bidimensional cross-cloud man- agement with TOSCA and Brooklyn”. In: 2016 IEEE 9th International Conference on Cloud Computing (CLOUD). IEEE. 2016, pp. 951–955. doi: 10.1109/CLOUD.2016.0143. [131] Ryan Chard, Kyle Chard, Bryan Ng, Kris Bubendorfer, Alex Rodriguez, Ravi Madduri and Ian Foster. “An automated tool profiling service for the cloud”. In: 2016 16th IEEE/ACM International Sym- posium on Cluster, Cloud and Grid Computing (CCGrid). IEEE. 2016, pp. 223–232. doi: 10.1109/ CCGrid.2016.57. [132] Haryadi S. Gunawi, Mingzhe Hao, Riza O. Suminto, Agung Laksono, Anang D. Satria, Jeffry Adityatama and Kurnia J. Eliazar. “Why Does the Cloud Stop Computing? Lessons from Hundreds of Service Outages”. In: Proceedings of the Seventh ACM Symposium on Cloud Computing. SoCC ’16. Santa Clara, CA, USA: Association for Computing Machinery, 2016, pp. 1–16. isbn: 9781450345255. doi: 10.1145/2987550.2987583. url: https://doi.org/10.1145/2987550.2987583. [133] Victor Heorhiadi, Shriram Rajagopalan, Hani Jamjoom, Michael K Reiter and Vyas Sekar. “Gremlin: Systematic resilience testing of microservices”. In: 2016 IEEE 36th International Conference on Dis- tributed Computing Systems (ICDCS). IEEE. 2016, pp. 57–66. doi: 10.1109/ICDCS.2016.11. [134] Kyriakos Kritikos, Kostas Magoutis and Dimitris Plexousakis. “Towards knowledge-based assisted IaaS selection”. In: 2016 IEEE International Conference on Cloud Computing Technology and Science (CloudCom). IEEE. 2016, pp. 431–439. doi: 10.1109/CloudCom.2016.0073. [135] Philipp Leitner and Jürgen Cito. “Patterns in the Chaos—A Study of Performance Variation and Pre- dictability in Public IaaS Clouds”. In: ACM Trans. Internet Technol. 16.3 (Apr. 2016). issn: 1533-5399. doi: 10.1145/2885497. url: https://doi.org/10.1145/2885497. [136] Asraa Abdulrazak Ali Mardan and Kenji Kono. “Containers or hypervisors: Which is better for data- base consolidation?” In: 2016 IEEE international conference on cloud computing technology and science (CloudCom). IEEE. 2016, pp. 564–571. doi: 10.1109/CloudCom.2016.0098. [137] Andrew Pavlo and Matthew Aslett. “What’s really new with NewSQL?” In: ACM Sigmod Record 45.2 (2016), pp. 45–55. doi: 10.1145/3003665.3003674. [138] Ilia Pietri and Rizos Sakellariou. “Mapping Virtual Machines onto Physical Machines in Cloud Com- puting: A Survey”. In: ACM Comput. Surv. 49.3 (Oct. 2016). issn: 0360-0300. doi: 10 . 1145 / 2983575. [139] Prateek Sharma, Lucas Chaufournier, Prashant Shenoy and Y. C. Tay. “Containers and Virtual Machines at Scale: A Comparative Study”. In: Proceedings of the 17th International Middleware Conference. Middleware ’16. Trento, Italy: Association for Computing Machinery, 2016. isbn: 9781450343008. doi: 10.1145/2988336.2988337. [140] Weisong Shi, Jie Cao, Quan Zhang, Youhuizi Li and Lanyu Xu. “Edge computing: Vision and chal- lenges”. In: IEEE Internet of Things Journal 3.5 (2016), pp. 637–646. doi: 10.1109/JIOT.2016. 2579198. [141] Weisong Shi and Schahram Dustdar. “The promise of edge computing”. In: Computer 49.5 (2016), pp. 78–81. doi: 10.1109/MC.2016.145. 93

[142] Bruno Xavier, Tiago Ferreto and Luis Jersak. “Time provisioning evaluation of KVM, Docker and unik- ernels in a cloud platform”. In: 2016 16th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid). IEEE. 2016, pp. 277–280. doi: 10.1109/CCGrid.2016.86. [143] Matei Zaharia et al. “: A Unified Engine for Big Data Processing”. In: Commun. ACM 59.11 (Oct. 2016), pp. 56–65. issn: 0001-0782. doi: 10.1145/2934664. [144] Raja Appuswamy, Manos Karpathiotakis, Danica Porobic and Anastasia Ailamaki. “The Case For Heterogeneous HTAP”. In: CIDR 2017, 8th Biennial Conference on Innovative Data Systems Research, Chaminade, CA, USA, January 8-11, 2017, Online Proceedings. 2017. url: http://cidrdb.org/ cidr2017/papers/p21-appuswamy-cidr17.pdf. [145] Andreas Bader, Oliver Kopp and Michael Falkenthal. “Survey and Comparison of Open Source Time Series Databases”. In: Datenbanksysteme für Business, Technologie und Web (BTW 2017) - Work- shopband. Bonn: Gesellschaft für Informatik e.V., 2017, pp. 249–268. url: https://dl.gi.de/ handle/20.500.12116/922. [146] David Bermbach, Jörn Kuhlenkamp, Akon Dey, Arunmoezhi Ramachandran, Alan Fekete and Stefan Tai. “BenchFoundry: a benchmarking framework for cloud storage services”. In: International Con- ference on Service-Oriented Computing. Springer. 2017, pp. 314–330. doi: 10.1007/978-3-319- 69035-3_22. [147] David Bermbach, Frank Pallas, David Garcı￿a Pérez, Pierluigi Plebani, Maya Anderson, Ronen Kat and Stefan Tai. “A research perspective on Fog computing”. In: International Conference on Service- Oriented Computing. Springer. 2017, pp. 198–210. doi: 10.1007/978-3-319-91764-1_16. [148] David Bermbach, Erik Wittern and Stefan Tai. Cloud service benchmarking. Springer, 2017. doi: 10. 1007/978-3-319-55483-9. [149] Fábio Coelho, João Paulo, Ricardo Vilaça, José Pereira and Rui Oliveira. “HTAPBench: Hybrid Trans- actional and Analytical Processing Benchmark”. In: Proceedings of the 8th ACM/SPEC on Interna- tional Conference on Performance Engineering. ICPE ’17. L’Aquila, Italy: Association for Computing Machinery, 2017, pp. 293–304. isbn: 9781450344043. doi: 10.1145/3030207.3030228. [150] Felix Gessert, Wolfram Wingerath, Steffen Friedrich and Norbert Ritter. “NoSQL Database Systems: A Survey and Decision Guidance”. In: Comput. Sci. 32.3–4 (July 2017), pp. 353–365. issn: 1865-2034. doi: 10.1007/s00450-016-0334-3. [151] Søren Kejser Jensen, Torben Bach Pedersen and Christian Thomsen. “Time Series Management Sys- tems: A Survey”. In: IEEE Transactions on Knowledge and Data Engineering 29.11 (2017), pp. 2581– 2600. doi: 10.1109/TKDE.2017.2740932. [152] Z. Li, M. Kihl, Q. Lu and J. A. Andersson. “Performance Overhead Comparison between Hypervisor and Container Based Virtualization”. In: 2017 IEEE 31st International Conference on Advanced In- formation Networking and Applications (AINA). 2017, pp. 955–962. doi: 10.1109/AINA.2017.79. 94 Bibliography

[153] Ashraf Mahgoub, Paul Wood, Sachandhan Ganesh, Subrata Mitra, Wolfgang Gerlach, Travis Har- rison, Folker Meyer, Ananth Grama, Saurabh Bagchi and Somali Chaterji. “Rafiki: A Middleware for Parameter Tuning of NoSQL Datastores for Dynamic Metagenomics Workloads”. In: Proceedings of the 18th ACM/IFIP/USENIX Middleware Conference. Middleware ’17. Las Vegas, Nevada: Associ- ation for Computing Machinery, 2017, pp. 28–40. isbn: 9781450347204. doi: 10.1145/3135974. 3135991. [154] Vincent Reniers, Dimitri Van Landuyt, Ansar Rafique and Wouter Joosen. “On the State of NoSQL Benchmarks”. In: Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering Companion. ICPE ’17 Companion. L’Aquila, Italy: Association for Computing Machinery, 2017, pp. 107–112. isbn: 9781450348997. doi: 10.1145/3053600.3053622. [155] Vasily Tarasov, Lukas Rupprecht, Dimitris Skourtis, Amit Warke, Dean Hildebrand, Mohamed Mo- hamed, Nagapramod Mandagere, Wenji Li, Raju Rangaswami and Ming Zhao. “In search of the ideal storage configuration for Docker containers”. In: 2017 IEEE 2nd International Workshops on Found- ations and Applications of Self* Systems (FAS* W). IEEE. 2017, pp. 199–206. doi: 10.1109/FAS- W.2017.148. [156] K. Velasquez, D. P. Abreu, D. Gonçalves, L. Bittencourt, M. Curado, E. Monteiro and E. Madeira. “Service Orchestration in Fog Environments”. In: 2017 IEEE 5th International Conference on Future Internet of Things and Cloud (FiCloud). Aug. 2017, pp. 329–336. doi: 10.1109/FiCloud.2017.49. [157] Denis Weerasiri, Moshe Chai Barukh, Boualem Benatallah, Quan Z. Sheng and Rajiv Ranjan. “A Taxonomy and Survey of Cloud Resource Orchestration Techniques”. In: ACM Comput. Surv. 50.2 (May 2017). issn: 0360-0300. doi: 10.1145/3054177. [158] Neeraja J. Yadwadkar, Bharath Hariharan, Joseph E. Gonzalez, Burton Smith and Randy H. Katz. “Se- lecting the Best VM across Multiple Public Clouds: A Data-Driven Performance Modeling Approach”. In: Proceedings of the 2017 Symposium on Cloud Computing. SoCC ’17. Santa Clara, California: As- sociation for Computing Machinery, 2017, pp. 452–465. isbn: 9781450350280. doi: 10 . 1145 / 3127479.3131614. [159] Yuqing Zhu, Jianxun Liu, Mengying Guo, Yungang Bao, Wenlong Ma, Zhuoyue Liu, Kunpeng Song and Yingchun Yang. “BestConfig: Tapping the Performance Potential of Systems via Automatic Con- figuration Tuning”. In: Proceedings of the 2017 Symposium on Cloud Computing. SoCC ’17. Santa Clara, California: Association for Computing Machinery, 2017, pp. 338–350. isbn: 9781450350280. doi: 10.1145/3127479.3128605. [160] Daniel J. Abadi and Jose M. Faleiro. “An Overview of Deterministic Database Systems”. In: Commun. ACM 61.9 (Aug. 2018), pp. 78–88. issn: 0001-0782. doi: 10.1145/3181853. [161] Yazeed Alabdulkarim, Sumita Barahmand and Shahram Ghandeharizadeh. “BG: a scalable bench- mark for interactive social networking actions”. In: Future Generation Computer Systems 85 (2018), pp. 29–38. doi: 10.1016/j.future.2018.02.031. [162] Matt Aslett and Greg Zwakman. Total Data market projected to reach $146bn by 2022. 451 Research. 2018. url: https://go.451research.com/2018-04-Total-Data-market-projected-146bn- by-2022.html (visited on 30/11/2019). 95

[163] Alexander Bergmayr, Uwe Breitenbücher, Nicolas Ferry, Alessandro Rossini, Arnor Solberg, Manuel Wimmer, Gerti Kappel and Frank Leymann. “A Systematic Review of Cloud Modeling Languages”. In: ACM Comput. Surv. 51.1 (Feb. 2018). issn: 0360-0300. doi: 10.1145/3150227. [164] Rajkumar Buyya et al. “A Manifesto for Future Generation Cloud Computing: Research Directions for the Next Decade”. In: ACM Comput. Surv. 51.5 (Nov. 2018). doi: 10.1145/3241737. url: https: //doi.org/10.1145/3241737. [165] Ali Davoudian, Liu Chen and Mengchi Liu. “A Survey on NoSQL Stores”. In: ACM Comput. Surv. 51.2 (Apr. 2018). issn: 0360-0300. doi: 10.1145/3158661. [166] Y. Al-Dhuraibi, F. Paraiso, N. Djarallah and P. Merle. “Elasticity in Cloud Computing: State of the Art and Research Challenges”. In: IEEE Transactions on Services Computing 11.2 (Mar. 2018), pp. 430– 447. issn: 2372-0204. doi: 10.1109/TSC.2017.2711009. [167] S. Eismann, J. Walter, J. von Kistowski and S. Kounev. “Modeling of Parametric Dependencies for Performance Prediction of Component-Based Software Systems at Run-Time”. In: 2018 IEEE Interna- tional Conference on Software Architecture (ICSA). Apr. 2018, pp. 135–13509. doi: 10.1109/ICSA. 2018.00023. [168] Martyn Ellison, Radu Calinescu and Richard F. Paige. “Evaluating cloud database migration options using workload models”. In: Journal of Cloud Computing 7.1 (2018), p. 6. issn: 2192-113X. doi: 10. 1186/s13677-018-0108-5. [169] Michael Galloway, Gabriel Loewen, Jeffrey Robinson and Susan Vrbsky. “Performance of Virtual Ma- chines Using Diskfull and Diskless Compute Nodes”. In: 2018 IEEE 11th International Conference on Cloud Computing (CLOUD). IEEE. 2018, pp. 740–745. doi: 10.1109/CLOUD.2018.00101. [170] Yu Gao, Wensheng Dou, Feng Qin, Chushu Gao, Dong Wang, Jun Wei, Ruirui Huang, Li Zhou and Yongming Wu. “An Empirical Study on Crash Recovery Bugs in Large-Scale Distributed Systems”. In: Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ESEC/FSE 2018. Lake Buena Vista, FL, USA: Association for Computing Machinery, 2018, pp. 539–550. isbn: 9781450355735. doi: 10. 1145/3236024.3236030. [171] Jonathan Hasenburg, Sebastian Werner and David Bermbach. “FogExplorer”. In: Proceedings of the 19th International Middleware Conference (Posters). Middleware ’18. Rennes, France: Association for Computing Machinery, 2018, pp. 1–2. isbn: 9781450361095. doi: 10.1145/3284014.3284015. [172] Jonathan Hasenburg, Sebastian Werner and David Bermbach. “Supporting the Evaluation of Fog- Based IoT Applications During the Design Phase”. In: Proceedings of the 5th Workshop on Middle- ware and Applications for the Internet of Things. M4IoT’18. Rennes, France: Association for Comput- ing Machinery, 2018, pp. 1–6. isbn: 9781450361187. doi: 10.1145/3286719.3286720. [173] Nikolas Roman Herbst. “Methods and Benchmarks for Auto-Scaling Mechanisms in Elastic Cloud Environments”. PhD thesis. Universität Würzburg, 2018. url: https://opus.bibliothek.uni- wuerzburg.de/opus4-wuerzburg/frontdoor/deliver/index/docId/16431/file/Herbst_ Nikolas_Dissertation.pdf. 96 Bibliography

[174] Nikolas Herbst et al. “Quantifying Cloud Performance and Dependability: Taxonomy, Metric Design, and Emerging Challenges”. In: ACM Trans. Model. Perform. Eval. Comput. Syst. 3.4 (Aug. 2018). issn: 2376-3639. doi: 10.1145/3236332. [175] Michaela Iorga, Larry Feldman, Robert Barton, Michael J Martin, Nedim S Goren and Charif Mah- moudi. Fog computing conceptual model. Tech. rep. National Institute of Standards and Technology (NIST), 2018. doi: 10.6028/NIST.SP.500-325. [176] Kyriakos Kritikos and Geir Horn. “IaaS Service Selection Revisited”. In: European Conference on Service-Oriented and Cloud Computing. Springer. 2018, pp. 170–184. doi: 10.1007/978-3-319- 99819-0_13. [177] Veronika Lesch, André Bauer, Nikolas Herbst and Samuel Kounev. “FOX: Cost-Awareness for Auto- nomic Resource Management in Public Clouds”. In: Proceedings of the 2018 ACM/SPEC Interna- tional Conference on Performance Engineering. ICPE ’18. Berlin, Germany: Association for Comput- ing Machinery, 2018, pp. 4–15. isbn: 9781450350952. doi: 10.1145/3184407.3184415. [178] Charif Mahmoudi, Fabrice Mourlin and Abdella Battou. “Formal definition of edge computing: An emphasis on mobile cloud and IoT composition”. In: 2018 Third International Conference on Fog and Mobile Edge Computing (FMEC). IEEE. 2018, pp. 34–42. doi: 10.1109/FMEC.2018.8364042. [179] A. Papaioannou and K. Magoutis. “Replica-Group Leadership Change as a Performance Enhancing Mechanism in NoSQL Data Stores”. In: 2018 IEEE 38th International Conference on Distributed Com- puting Systems (ICDCS). 2018, pp. 1448–1453. doi: 10.1109/ICDCS.2018.00147. [180] Mark Raasveldt, Pedro Holanda, Tim Gubner and Hannes Mühleisen. “Fair Benchmarking Con- sidered Difficult: Common Pitfalls In Database Performance Testing”. In: Proceedings of the Work- shop on Testing Database Systems. DBTest’18. Houston, TX, USA: Association for Computing Ma- chinery, 2018. isbn: 9781450358262. doi: 10.1145/3209950.3209955. [181] Kim-Thomas Rehmann and Enno Folkerts. “Performance of Containerized Database Management Systems”. In: Proceedings of the Workshop on Testing Database Systems. DBTest’18. Houston, TX, USA: Association for Computing Machinery, 2018. isbn: 9781450358262. doi: 10.1145/3209950. 3209953. [182] Joel Scheuner and Philipp Leitner. “A Cloud Benchmark Suite Combining Micro and Applications Benchmarks”. In: Companion of the 2018 ACM/SPEC International Conference on Performance En- gineering. ICPE ’18. Berlin, Germany: Association for Computing Machinery, 2018, pp. 161–166. isbn: 9781450356299. doi: 10.1145/3185768.3186286. [183] Joel Scheuner and Philipp Leitner. “Estimating cloud application performance based on micro- benchmark profiling”. In: 2018 IEEE 11th International Conference on Cloud Computing (CLOUD). IEEE. 2018, pp. 90–97. doi: 10.1109/CLOUD.2018.00019. [184] Merlijn Sebrechts, Gregory Van Seghbroeck, Tim Wauters, Bruno Volckaert and Filip De Turck. “Or- chestrator conversation: Distributed management of cloud applications”. In: International Journal of Network Management 28.6 (2018), e2036. doi: 10.1002/nem.2036. 97

[185] Selome Kostentinos Tesfatsion, Cristian Klein and Johan Tordsson. “Virtualization Techniques Com- pared: Performance, Resource, and Power Usage Overheads in Clouds”. In: Proceedings of the 2018 ACM/SPEC International Conference on Performance Engineering. ICPE ’18. Berlin, Germany: Associ- ation for Computing Machinery, 2018, pp. 145–156. isbn: 9781450350952. doi: 10.1145/3184407. 3184414. [186] TPC Benchmark ™H. Version 2.18.0. Transaction Processing Performance Council (TPC). 2018. url: http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-h_v2.18.0.pdf. [187] A. Bauer, S. Eismann, J. Grohmann, N. Herbst and S. Kounev. “Systematic Search for Optimal Re- source Configurations of Distributed Applications”. In: 2019 IEEE 4th International Workshops on Foundations and Applications of Self* Systems (FAS*W). June 2019, pp. 120–125. doi: 10.1109/ FAS-W.2019.00040. [188] A. Bauer, N. Herbst, S. Spinner, A. Ali-Eldin and S. Kounev. “Chameleon: A Hybrid, Proactive Auto- Scaling Mechanism on a Level-Playing Field”. In: IEEE Transactions on Parallel and Distributed Sys- tems 30.4 (Apr. 2019), pp. 800–813. issn: 2161-9883. doi: 10.1109/TPDS.2018.2870389. [189] Antonia Bertolino, Guglielmo De Angelis, Micael Gallego, Boni Garcı￿a,Francisco Gortázar, Francesca Lonetti and Eda Marchetti. “A Systematic Review on Cloud Testing”. In: ACM Comput. Surv. 52.5 (Sept. 2019). issn: 0360-0300. doi: 10.1145/3331447. [190] Lexi Brent and Alan Fekete. “A Versatile Framework for Painless Benchmarking of Database Manage- ment Systems”. In: Databases Theory and Applications. Cham: Springer International Publishing, 2019, pp. 45–56. isbn: 978-3-030-12079-5. doi: 10.1007/978-3-030-12079-5_4. [191] Emiliano Casalicchio. “Container Orchestration: A Survey”. In: Systems Modeling: Methodologies and Tools. Springer, 2019, pp. 221–235. doi: 10.1007/978-3-319-92378-9_14. [192] S. Eismann, J. Kistowski, J. Grohmann, A. Bauer, N. Schmitt and S. Kounev. “TeaStore - A Micro- Service Reference Application”. In: 2019 IEEE 4th International Workshops on Foundations and Ap- plications of Self* Systems (FAS*W). June 2019, pp. 263–264. doi: 10.1109/FAS-W.2019.00073. [193] Donald Feinberg, Merv Adrian and Adam Ronthal. “The Future of the DBMS Market Is Cloud”. In: Gartner, Inc. June (2019), pp. 1–14. url: https://pages.awscloud.com/Gartner-The-Future- of-the-DBMS-Market-Is-Cloud.html. [194] Cheol-Ho Hong and Blesson Varghese. “Resource Management in Fog/Edge Computing: A Survey on Architectures, Infrastructure, and Algorithms”. In: ACM Comput. Surv. 52.5 (Sept. 2019). issn: 0360-0300. doi: 10.1145/3326066. [195] Murtadha AI Hubail, Ali Alsuliman, Michael Blow, Michael Carey, Dmitry Lychagin, Ian Maxon and Till Westmann. “Couchbase Analytics: NoETL for Scalable NoSQL Data Analysis”. In: Proc. VLDB Endow. 12.12 (Aug. 2019), pp. 2275–2286. issn: 2150-8097. doi: 10.14778/3352063.3352143. [196] Jinho Jung, Hong Hu, Joy Arulraj, Taesoo Kim and Woonhak Kang. “APOLLO: Automatic Detection and Diagnosis of Performance Regressions in Database Systems”. In: Proc. VLDB Endow. 13.1 (Sept. 2019), pp. 57–70. issn: 2150-8097. doi: 10.14778/3357377.3357382. [197] S. Kim and Y. S. Kanwar. “GeoYCSB: A Benchmark Framework for the Performance and Scalability Evaluation of NoSQL Databases for Geospatial Workloads”. In: 2019 IEEE International Conference on Big Data (Big Data). 2019, pp. 3666–3675. doi: 10.1109/BigData47090.2019.9005570. 98 Bibliography

[198] Jiaheng Lu and Irena Holubová. “Multi-Model Databases: A New Journey to Handle the Variety of Data”. In: ACM Comput. Surv. 52.3 (June 2019). issn: 0360-0300. doi: 10.1145/3323214. [199] Ilias Mavridis and Helen Karatza. “Combining containers and virtual machines to enhance isolation and extend functionality on cloud computing”. In: Future Generation Computer Systems 94 (2019), pp. 674–696. doi: 10.1016/j.future.2018.12.035. [200] Jonathan McChesney, Nan Wang, Ashish Tanwer, Eyal de Lara and Blesson Varghese. “DeFog: Fog Computing Benchmarks”. In: SEC ’19 (2019), pp. 47–58. doi: 10.1145/3318216.3363299. [201] A. V. Papadopoulos, L. Versluis, A. Bauer, N. Herbst, J. Von Kistowski, A. Ali-eldin, C. Abad, J. N. Amaral, P. T￿ma and A. Iosup. “Methodological Principles for Reproducible Performance Evaluation in Cloud Computing”. In: IEEE Transactions on Software Engineering (2019), pp. 1–1. issn: 0098- 5589. doi: 10.1109/TSE.2019.2927908. [202] Pini Reznik, Michelle Gienow and Jamie Dobson. Cloud Native Patterns - Practical Patterns for Innov- ation. Sebastopol, California: O’Reilly Media, Incorporated, 2019. isbn: 978-1-492-04890-9. [203] Noa Roy-Hubara, Peretz Shoval and Arnon Sturm. “A Method for Database Model Selection”. In: En- terprise, Business-Process and Information Systems Modeling. Cham: Springer International Pub- lishing, 2019, pp. 261–275. isbn: 978-3-030-20618-5. doi: 10.1007/978-3-030-20618-5_18. [204] Daniel Seybold, Volker Foth, Feroz Zahid, Pawel Skrzypek, Marcin Prusinski and Jörg Domaschka. D4.6 Data Processing Layer. 2019. url: http://melodic.cloud/deliverables/D4.6%20Data% 20Processing%20Layer.pdf. [205] Junjay Tan, Thanaa Ghanem, Matthew Perron, Xiangyao Yu, Michael Stonebraker, David DeWitt, Marco Serafini, Ashraf Aboulnaga and Tim Kraska. “Choosing a Cloud DBMS: Architectures and Tradeoffs”. In: Proc. VLDB Endow. 12.12 (Aug. 2019), pp. 2170–2182. issn: 2150-8097. doi: 10 . 14778/3352063.3352133. [206] TPC Express Big Bench (TPCx-BB). Version 1.3.1. Transaction Processing Performance Council (TPC). 2019. url: http://www.tpc.org/tpc_documents_current_versions/pdf/tpcx-bb_v1.3.1. pdf. [207] Eddy Truyen, Dimitri Van Landuyt, Davy Preuveneers, Bert Lagaisse and Wouter Joosen. “A Com- prehensive Feature Comparison Study of Open-Source Container Orchestration Frameworks”. In: Applied Sciences 9.5 (2019). issn: 2076-3417. doi: 10.3390/app9050931. [208] W. Xiong, K. Yang and H. Dai. “Improving NoSQL’s Performance Metrics via Machine Learning”. In: 2019 Seventh International Conference on Advanced Cloud and Big Data (CBD). Sept. 2019, pp. 90– 95. doi: 10.1109/CBD.2019.00026. [209] Yang Yang, Qiang Cao and Hong Jiang. “EdgeDB: An Efficient Time-Series Database for Edge Com- puting”. In: IEEE Access 7 (2019), pp. 142295–142307. doi: 10.1109/ACCESS.2019.2943876. [210] Y. Zhu and J. Liu. “ClassyTune: A Performance Auto-Tuner for Systems in the Cloud”. In: IEEE Trans- actions on Cloud Computing (2019). issn: 2372-0018. doi: 10.1109/TCC.2019.2936567. [211] Daniel Abadi et al. “The Seattle Report on Database Research”. In: SIGMOD Rec. 48.4 (Feb. 2020), pp. 44–53. issn: 0163-5808. doi: 10.1145/3385658.3385668. 99

[212] Dušan Okanovi￿, Samuel Beck, Lasse Merz, Christoph Zorn, Leonel Merino, André van Hoorn and Fabian Beck. “Can a Chatbot Support Software Engineers with Load Testing? Approach and Exper- iences”. In: Proceedings of the 2020 ACM/SPEC International Conference on Performance Engin- eering. ICPE ’20. Edmonton, Canada: Association for Computing Machinery, 2020. doi: 10.1145/ 3358960.3375792. [213] OpenAPI Specification. Version 3.0.3. OpenAPI Initiative. 2020. url: http://spec.openapis.org/ oas/v3.0.3. [214] TPC Benchmark ™DS. Version 2.13.0. Transaction Processing Performance Council (TPC). 2020. url: http://www.tpc.org/tpc_documents_current_versions/pdf/tpc-ds_v2.13.0.pdf.

Part II

Publications


Chapter 8 [core1] A survey on data storage and placement methodologies for Cloud-Big Data ecosystem

This article is published as follows:

Somnath Mazumdar, Daniel Seybold, Kyriakos Kritikos and Yiannis Verginadis. “A survey on data storage and placement methodologies for Cloud-Big Data ecosystem”. In: Journal of Big Data 6.1 (Feb. 2019), p. 15. issn: 2196-1115. doi: 10.1186/s40537-019-0178-3.

This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/).


A survey on data storage and placement methodologies for Cloud-Big Data ecosystem

Somnath Mazumdar1 , Daniel Seybold2, Kyriakos Kritikos3* and Yiannis Verginadis4

*Correspondence: [email protected]. 3 ICS-FORTH, Heraklion, Crete, Greece. Full list of author information is available at the end of the article.

Abstract
Currently, the data to be explored and exploited by computing systems increases at an exponential rate. The massive amount of data or so-called “Big Data” puts pressure on existing technologies for providing scalable, fast and efficient support. Recent applications and the current user support from multi-domain computing assisted in migrating from data-centric to knowledge-centric computing. However, it remains a challenge to optimally store and place or migrate such huge data sets across data centers (DCs). In particular, due to the frequent change of application and DC behaviour (i.e., resources or latencies), data access or usage patterns need to be analyzed as well. Primarily, the main objective is to find a better data storage location that improves the overall data placement cost as well as the application performance (such as throughput). In this survey paper, we are providing a state of the art overview of Cloud-centric Big Data placement together with the data storage methodologies. It is an attempt to highlight the actual correlation between these two in terms of better supporting Big Data management. Our focus is on management aspects which are seen under the prism of non-functional properties. In the end, the readers can appreciate the deep analysis of respective technologies related to the management of Big Data and be guided towards their selection in the context of satisfying their non-functional application requirements. Furthermore, challenges are supplied highlighting the current gaps in Big Data management marking down the way it needs to evolve in the near future.

Keywords: Big Data, Cloud, Data models, Data storage, Placement

Introduction Over the time, the type of applications has evolved from batch, compute or memory intensive applications to streaming or even interactive applications. As a result, appli- cations are getting more complex and become long-running. Such applications might require frequent-access to multiple distributed data sources. During application deploy- ment and provisioning, the user can face various issues such as (i) where to efectively place both the data and the computation; (ii) how to achieve required objectives while reducing the overall application running cost. Data could be generated from various sources, including a multitude of devices over IoT environments that can generate a huge amount of data, while the applications are running. An application can further pro- duce a large amount of data. In general, data of such size is usually referred to as Big Data. In general, Big Data is characterised by fve properties [1, 2]. Tese are volume, velocity (means rapid update and propagation of data), variety (means diferent kinds of


data parts), veracity (related to the trustworthiness, authenticity and protection (degree) of the data) and value (the main added-value and the importance of the data to the busi- ness). A large set of diferent data types generated from various sources can hold enor- mous information (in the form of relationships [3], system access logs, and also as the quality of services (QoSs)). Such knowledge can be critical for improving both products and services. Tus, to retrieve the underlying knowledge from such big sized data sets an efcient data processing ecosystem and knowledge fltering methodologies are needed. In general, Cloud-based technology ofers diferent solutions over diferent levels of abstractions to build and dynamically provision user applications. Te Cloud ofers suit- able frameworks for the clustering of Big Data as well as efciently distributed data- bases for their storage and placement. However, the native Cloud facilities have a lack of guidance on how to combine and integrate services in terms of holistic frameworks which could enable users to properly manage both their applications and the data. While there exist some promising eforts that ft well under the term Big Data-as-a-service (BDaaS), most of them still lack adequate support for: data-privacy [4–6], query optimi- sation [7], robust data analytics [8] and data-related service level objective management for increased (Big Data) application quality [9]. Currently, the application placement and management over multi or cross-Clouds is being researched. However, the additional dimension of Big Data management does raise signifcantly the complexity of fnding adequate and realistic solutions. Te primary goal of this survey is to present the current state-of-afairs in Cloud com- puting with respect to the Big Data management (mainly storage and placement) from the application’s administration point-of-view. To this end, we have thoroughly reviewed the proposed solutions based on the placement and storage of Big Data through the use of a carefully designed set of criteria. Such criteria were devised under the prism of non-functional properties. Tis was performed in an attempt to unveil those solutions which can be deemed suitable for the better management of diferent kinds of applica- tions (while taking into consideration non-functional aspects). In the end, the prospec- tive readers (such as Big Data application owners, DevOps) can be guided towards the selection of those solutions in each Big Data management lifecycle phase (focused in this article) that satisfy in a better way their non-functional application requirements. Te analysis fnally concludes with the identifcation of certain gaps. Based on the latter, a set of challenges for the two Big Data management phases covered as well as for Big Data management as a whole are supplied towards assisting in the evolution of respective solutions and paving the way for the actual directions that the research should follow. Based on the above analysis, it is clear that this article aims at providing guidance to potential adopters concerning the most appropriate solution for both placing and stor- ing Big Data (according to the distinctive requirements of the application domain). To this end, our work can be considered as complementary to other relevant surveys that attempt to review Big Data technologies. 
In particular, the past surveys have focused on the deployment of data-intensive applications in the Cloud [10], on assessing various database management tools for storing Big Data [11], on evaluating the technologies developed for Big Data applications [12], on Cloud-centric distributed database management systems (primarily on NoSQL storage models) [13], on design principles for in-memory Big Data management and processing [14] and on research challenges related

to Big Data in the Cloud ecosystem [15]. However, the primary focus of these surveys is mainly on functional aspects examined under the prism of analysing diferent dimen- sions and technology types related to Big Data. Further, there is no clear discussion on management aspects in the context of the whole Big Data management lifecycle as usu- ally the focus seems to be merely on the Big Data storage phase. Interestingly, our survey deeply analyses those phases in the Big Data management lifecycle that are the most crucial in the context of satisfying application non-functional requirements. Te remaining part of this manuscript is structured as follows: "Data lifecycle manage- ment (DLM)" section explicates how data modelling can be performed, analyses various data management lifecycle models and comes up with an ideal one which is presented along with the proper architecture to support it. Next, "Methodology" section attempts to explain this survey’s main methodology. "Non-functional data management features" section details the main non-functional features of focus in this article. Based on these features, the review of Big Data storage systems and distributed fle systems are supplied in "Data storage systems" section. Similarly, the review of state-of-the-art data place- ment techniques is performed in "Data placement techniques" section. Next, "Lessons learned and future research directions" section presents relevant lessons learned as well as certain directions for future research work and fnally "Concluding remarks" section concludes the survey paper.

Data lifecycle management (DLM) Data lifecycle models Tere exist two types of data lifecycle models focusing on either general data or Big Data management. Te generic data management lifecycles usually cover activities such as generation, collection (curation), storage, publishing, discovery, processing and analysis of data [16]. In general, Big Data lifecycle models primarily comprises activities (such as data col- lection, data loading, data processing, data analysis and data visualisation [17, 18]). It is worth to note that apart from the data visualisation, they do share many identical activi- ties with the generic ones. However, such models do not mention the value of data. To counter this, the NIST reference model [19] suggests four data management phases: collection, preparation, analysis and action, where the action phase is related to using synthesised knowledge to create value (represents analytics and visualisation of knowledge). Furthermore, focusing more on the data value, OECD [20] has proposed a data value cycle model comprising six phases: datafcation and data collection, Big Data, data analytics, knowledge base, decision making and valued-added for growth and well-being. Te model forms an iterative, closed feedback loop where results from Big Data analytics are fed back to the respective database. Later, the work in [21] exposed the main drawbacks of OECD and proposed a new reference model that adds two addi- tional components, the business intelligence (BI) system and the environment, into the OECD model. Te data interaction and analysis formulates a short closed loop in the model. A greater loop is also endorsed via the BI’s iterative interaction and observation of its environment. Finally, it is claimed that the management of Big Data for value crea- tion is also linked to the BI management. In this way, Big Data management is related Mazumdar et al. J Big Data (2019) 6:15 Page 4 of 37

directly to the activities of data integration, analysis, interaction and efectuation along with the successful management of the emergent knowledge via data intelligence.

Data modelling Te data needs to be described in an appropriate form prior to any kind of usage. Te information used for the data description is termed as metadata (i.e., data about data) [22–24]. Te use of metadata enriches data management so that it can properly support and improve any data management activity. Two major issues related to meta- data management are:

• How should metadata be described (or characterised)? Te description of a metadata schema which can be exploited to efciently place a certain Big Data application in multiple Clouds by respecting both user constraints and requirements. Such a meta- data schema has been proposed partially in [25] or completely in [26]. • How should metadata be efciently managed and stored for better retrieval and exploitation? Te design of appropriate languages [27, 28] that focus on the descrip- tion of how Big Data applications and data should be placed and migrated across dif- ferent multiple Cloud resources.

For a better description of metadata, the authors in [22] identify available Cloud services and analyse some of their main characteristics following a tree-structured taxonomy. Another relevant efort is the DICE project [25] that focuses on the quality-driven devel- opment of Big Data applications. It ofers a UML profle along with the appropriate tools that may assist software designers to reason about the reliability, safety and efciency of data-intensive applications. Specifcally, it has introduced a metamodel for describing certain aspects of Big Data-intensive applications. Most of these eforts do not ofer a direct support for expressing signifcant aspects of Big Data, such as data origin, location, volume, transfer rates or even aspects of the operations that transfer data between Cloud resources. One efort that tries to cover the requirements for a proper and complete metadata description is the Melodic metadata schema [26]. Tis schema refers to a taxonomy of concepts, properties and relationships that can be exploited for supporting Big Data management as well as application deploy- ment reasoning. Te schema is clustered into three parts: (i) one focusing on specifying Cloud service requirements and capabilities to support application deployment reason- ing; (ii) another focusing on defning Big Data features and constraints to support Big Data management; (iii) a fnal one concentrating on supplying Big Data security-related concepts to drive the data access control. With respect to the second direction of work, although several languages are cur- rently used for capturing application placement and reconfguration requirements (e.g., TOSCA [27]), a lack of distinct support for describing placement and management requirements for Big Data can be observed. However, if such languages are extended through the possible use of a metadata schema, then they could be able to achieve this purpose. Tis has been performed in [26], where a classical, state-of-the-art Cloud description language called CAMEL [29] has been extended to enable the description of Big Data placement and management requirements by following a feature-model-based Mazumdar et al. J Big Data (2019) 6:15 Page 5 of 37

Fig. 1 A high-level block diagram of a Big Data management system (components: Data Ingestion, Big Data Processing, Resource Management, Metadata Management, Data Placement and Data Storage)

approach where requirements are expressed as features or attributes that are annotated via elements from the metadata schema.
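As a rough illustration of such a feature-based description, the sketch below models a Big Data placement requirement as a small set of attributes annotated with metadata-schema concepts; the concept names, fields and the sample data set are invented for the example and do not reproduce the actual Melodic or CAMEL vocabularies.

from dataclasses import dataclass, field

@dataclass
class DataFeature:
    """One feature of a data set, annotated with an (illustrative) metadata-schema concept."""
    name: str
    value: object
    concept: str                      # e.g. "bd:Volume"; hypothetical concept identifier

@dataclass
class PlacementRequirement:
    dataset: str
    features: list = field(default_factory=list)

    def add(self, name, value, concept):
        self.features.append(DataFeature(name, value, concept))
        return self

# A placement requirement for a fictional sensor archive.
req = (PlacementRequirement("sensor-archive")
       .add("origin", "edge-gateway-eu", "bd:DataSource")
       .add("volume_gb", 750, "bd:Volume")
       .add("location_constraint", "EU-only", "bd:LocationRequirement")
       .add("min_transfer_rate_mbps", 100, "bd:TransferRate"))
print([f.name for f in req.features])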

Data lifecycle management systems Traditional data lifecycle management systems (DLMSs) focus more on the way data is managed and not on how they are processed. In particular, the actual main services that they ofer are data storage planning (and provisioning) and data placement (and execu- tion support) via efcient data management policies. On the other hand, it seems that data processing is covered by other tools or systems as it is regarded as application-spe- cifc. Traditionally in Cloud, Big Data processing is ofered as a separate service, while the resource management is usually handled by other tools, such as or YARN. Figure 1 depicts the architecture of a system that completely addresses the data management lifecycle, as inscribed in the previous sub-section. Tis system comprises six primary components.

• Metadata management takes care of maintaining information which concerns both the static and dynamic characteristics of data. It is the cornerstone for enabling ef- cient data management. • Data placement encapsulates the main methods for efcient data placement and data replication while satisfying user requirements. • Data storage is responsible for proper (transactional) storage and efcient data retrieval support. • Data ingestion enables importing and exporting the data over the respective system. • Big Data processing supports the efcient and clustered processing of Big Data by executing the main logic of the user application(s). • Resource management is responsible for the proper and efcient management of computational resources.

In this article, our focus is mainly on the Data storage and Data placement parts of the above architecture. Our rationale is that the integration of such parts (or Big Data lifecycle management phases) covers the core of a DLMS. An application’s data access workfow in the Cloud is presented in Fig. 2. As a frst step, the application checks the availability of the input data. In general, the data needs to be known by the system to optimally handle it. It maps to two main cases: (i) data already exist and have been regis- tered; (ii) data do not exist and must be registered. In the latter case, metadata is needed Mazumdar et al. J Big Data (2019) 6:15 Page 6 of 37

Fig. 2 Standard workflow of application data lifecycle (DLMS workflow: data registration, data modelling, data placement/migration, failure handling and backup, completion)

to register the data into the system (thus mapping to the data-registration process). During the data modelling (see "Data modelling" sub-section), the metadata are maintained via a data catalogue (i.e., a special realisation of the Metadata management component). Such an approach can guarantee the efficient maintenance of application data throughout the application’s lifecycle by both knowing and dynamically altering the values of data features (such as data type, size, location, data format, user preference, data replica numbers, cost constraints) whenever needed. In the next phase, based on the employed data placement methodology, the data is placed/migrated next to the application or both the data and application code are collocated. Here, the underlying scheduler (realising the Data placement component) acquires the up-to-date data knowledge to achieve an efficient data placement during both the initial application deployment and its runtime. Such an approach can restrain unnecessary data movement and reduce cost (at runtime) [30–32]. Next, during the application execution, two situations may arise: (i) new data sets are generated; (ii) data sets are transformed into another form (such as data compression). Furthermore, temporary data may also need to be handled. Finally, once application execution ends, the generated or transformed data needs to be stored (or backed up) as per user instructions.
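To make the workflow of Fig. 2 concrete, the following minimal sketch outlines its main decision points in Python; the catalogue, scheduler and store objects and their method names are hypothetical placeholders and do not correspond to any of the surveyed systems.

def run_application(app, dataset_uri, catalogue, scheduler, store):
    """Sketch of the Fig. 2 data lifecycle: register, place, execute, persist."""
    # 1. Data registration: look up or create the metadata entry.
    meta = catalogue.lookup(dataset_uri)
    if meta is None:
        meta = catalogue.register(dataset_uri, data_format="csv", replicas=1)

    # 2. Data placement/migration: move data next to the computation if needed.
    if scheduler.needs_migration(meta, app):
        meta = scheduler.migrate(meta, target=app.preferred_location)

    # 3. Execution with simple failure handling (re-run once after a failure).
    for attempt in range(2):
        result = scheduler.run(app, meta)
        if result.succeeded:
            break

    # 4. Persist generated or transformed data for future jobs.
    store.save(result.output, registered_as=catalogue.register(result.output_uri))
    return result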

In general, a hierarchical storage management [33] could be considered as a DLMS tool. In recent times, cognitive data management (CDM) has gained industrial support for automated data management together with high-grade efficiency. The CDM

(e.g., Stronglink1) is generally the amalgamation of intelligent (artificial-intelligence2/machine learning-based approach) distributed storage including resource management together with a more sophisticated DLMS component. The CDM works on the database-as-a-service (DBaaS) layer which instructs the data to be used by the scheduler with an efficient management approach including the exploitation of the data catalogue via data modelling.

Methodology We have conducted a systematic literature review (SLR) on Big Data placement and storage methods in the Cloud, following the guidelines proposed in [34]. Such an SLR comprises three main phases: (i) SLR planning, (ii) SLR conduction and (iii) SLR report- ing. In this section, we briefy discuss the frst two phases. While the remaining part of this manuscript focuses on the presenting the survey, the identifcation of the remaining research issues and the potential challenges for current and future work.

SLR planning Tis phase comprises three main steps: (i) SLR need identifcation, (ii) research ques- tions identifcation and (iii) SLR protocol formation.

SLR need identifcation Here, we are advocating to add more focus on the Big Data storage and placement phases of the respective Big Data management lifecycle. Tus be able to confront the respective challenges that Big Data place on them. Such phases are also the most crucial in the attempt to satisfy the non-functional requirements of Big Data applications. Te primary focus of this survey is over storage and placement phases. It is an attempt to examine if they are efciently and efectively realised by current solutions and approaches. Te twofold advantage of identifying the efcient ways to manage and store Big Data are: (i) practitioners can select the most suitable Big Data management solutions for satisfying both their functional and non-functional needs; (ii) researchers can fully comprehend the research area and identify the most interesting directions to follow. To this end, we are countering both the data placement and the storage issues focusing on the Big Data management lifecycle and Cloud computing under the prism of non-functional aspects. In contrast to previous surveys that have concentrated mainly on the Big Data storage issues in the context of functional aspects.

Research questions identifcation Tis survey has the ambition to supply suitable and convincing answers to:

1. What are the most suitable (big) data storage technologies and how do they compete with each other according to certain criteria related to non-functional aspects? 2. What are the most suitable and sophisticated (big) data placement methods that can be followed to (optimally) place and/or migrate Big Data?

1 https://strongboxdata.com/products/stronglink/. 2 https://www.ibm.com/services/artificial-intelligence.

Table 1 Search query

(Big Data) AND (METHODOLOGY OR METHOD OR ALGORITHM OR APPROACH OR SURVEY OR STUDY) AND (MANAGEMENT OR PLACEMENT OR POSITION OR ALLOCATION OR STORAGE) WITH TIME SPAN:2010–2018

SLR protocol formation It is a composite step related to the identifcation of (i) (data) sources—here we have pri- marily consulted the Web of Science and Scopus, and (ii) the actual terms for querying these (data) sources—here, we focus on population, intervention and outcome as men- tioned in [34]. It is worth to note that such data sources supply nice structured searching capabilities which enabled us to better pose the respective query terms. Te population mainly concerns target user groups in the research area or certain application domains. Te intervention means the specifc method employed to address a certain issue (used terms include: methodology, method, algorithm, approach, survey and study). Lastly, the outcome relates to the fnal result of the application of the respective approach (such as management, placement, positioning, allocation, storage). Based on these terms, the abstract query concretised in the context of the two data sources can be seen in Table 1.

SLR conduction Systematic literature review conduction includes the following steps: (i) study selection criteria; (ii) quality assessment criteria; (iii) study selection procedure. All these steps are analysed in the following three paragraphs.

Study selection Te study selection was performed via a certain set of inclusion and exclusion criteria. Te inclusion criteria included the following:

• Peer-reviewed articles. • Latest articles only (last 8 years). • In case of equivalent studies, only the one published in the highest rated journal or conference is selected to sustain only a high-quality set of articles on which the review is conducted. • Articles which supply methodologies, methods or approaches for Big Data manage- ment. • Articles which study or propose Big Data storage management systems or databases. • Articles which propose Big Data placement methodologies or algorithms.

While the exclusion criteria were the following:

• Inaccessible articles. • Articles in a different language than English. • Short papers, posters or other kinds of small in contribution articles. • Articles which deal with the management of data in general and do not focus on Big Data.

Fig. 3 Non-functional features of data management systems: performance, scalability, elasticity, availability, consistency and Big Data processing

• Articles that focus on studying or proposing normal database management systems. • Articles that focus on studying or proposing normal fle management systems. • Articles that focus on the supply of Big Data processing techniques or algorithms. As the focus in this article is mainly on how to manage the data and not how to process them to achieve a certain result.

Quality assessment criteria Apart from the above criteria, quality assessment criteria were also employed to enable prioritising the review as well as possibly excluding some articles not reaching certain quality standards. In the context of this work, the following criteria were considered:

• Presentation of the article is clear and there is no great efort needed to comprehend it. • Any kind of validation is ofered especially in the context of the proposal of certain algorithms, methods, systems or databases. • Te advancement over the state-of-the-art is clarifed as well as the main limitations of the proposed work. • Te objectives of the study are well covered by the approach that is being employed.

Study selection procedure It has been decided to employ two surveyors for each main article topic which were given a diferent portion of the respective reviewing work depending on their expertise. In each topic, the selection results of one author were assessed by the other one. In case of disagreement, a respective discussion was conducted. If this discussion was not hav- ing a positive outcome, the respective decision was delegated to the principal author which has been unanimously selected by all authors from the very beginning.

Non‑functional data management features For efective Big Data management, current data management systems (DMSs), includ- ing distributed fle systems (DFSs) and distributed database management systems (DDBMSs) need to provide a set of non-functional features to cater the storage, manage- ment and access of the continuously growing data. Tis section introduces a classifca- tion of the non-functional features (see Fig. 3) of DMSs in the Big Data domain extracted from [10, 13, 35–37]. Mazumdar et al. J Big Data (2019) 6:15 Page 10 of 37

Figure 3 provides an overview of the relevant non-functional features while the follow- ing subsections attempt to analyse each of them.

Performance Performance is typically referred to as one of the most important non-functional features. It directly relates to the execution of requests by the DMSs [38, 39]. Typical performance metrics are throughput and latency.
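As a minimal illustration of these two metrics, the sketch below derives throughput and latency percentiles from a list of per-request latencies collected during a fixed measurement window; the sample values are made up.

def throughput_and_latency(latencies_ms, window_s):
    """Compute throughput (ops/s) and latency percentiles from raw samples."""
    ordered = sorted(latencies_ms)
    throughput = len(ordered) / window_s                 # completed requests per second
    p50 = ordered[len(ordered) // 2]                     # median latency
    p99 = ordered[round(0.99 * (len(ordered) - 1))]      # tail latency (99th percentile)
    return throughput, p50, p99

# Example: five sampled request latencies (ms) observed during a 2-second window.
print(throughput_and_latency([3.1, 2.8, 4.0, 12.5, 3.3], window_s=2.0))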

Scalability Scalability focuses on the general ability to process arbitrary workloads. A defnition of scalability for distributed systems in general and with respect to DDBMSs is provided by Agrawal et al. [40], where the terms scale-up, scale-down, scale-out and scale-in are defned focusing on the management of growing workloads. Vertical as well as horizon- tal scaling techniques are applied to distributed DBMSs and can also be applied to DFSs. Vertical scaling applies by adding more computing resources to a single node. While horizontal scaling applies by adding nodes to a cluster (or in general to the instances of a certain application component).

Elasticity Elasticity is tightly coupled to the horizontal scaling and helps to overcome the sudden workload fuctuations by scaling the respective cluster without any downtime. Agrawal et al. [40] formally defne it by focusing on DDBMSs as follows “Elasticity, i.e. the ability to deal with load variations by adding more resources during high load or consolidating the tenants to fewer nodes when the load decreases, all in a live system without service disruption, is therefore critical for these systems”. While elasticity has become a common feature for DDBMSs, it is still in an early stage for DFSs [41].

Availability Te availability tier builds upon the scalability and elasticity as these tiers are exploited to handle request fuctuations [42]. Availability represents the degree to which a system is operational and accessible when required. Te availability of a DMS can be afected by overloading at the DMS layer and/or failures at the resource layer. During overload- ing, a high number of concurrent client requests overload the system such that these requests are either handled with a non-acceptable latency or not handled at all. On the other hand, a node can fail due to a resource failure (such as network outage or disk fail- ure). An intuitive way to deal with overload is to scale-out the system. Distributed DMSs apply data replication to handle such resource failures.

Consistency To support high availability (HA), consistency becomes an even more important and challenging non-functional feature. However, there is a trade-of among consistency, availability and partitioning guarantees, inscribed by the well-known CAP theorem [43]. Tis means that diferent kinds of consistency guarantees could be ofered by a DMS. According to [44] consistency can be considered from both the client and data perspec- tives (i.e., from the DMS administrator perspective). Te client-centric consistency can Mazumdar et al. J Big Data (2019) 6:15 Page 11 of 37

be classified further into staleness and ordering [44]. Staleness defines the lagging of a replica behind its master. It can be measured either in time or versions. Ordering defines that all requests must be executed on all replicas in the same chronological order. Data-centric consistency focuses on the synchronization processes among replicas and the internal ordering of operations.
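A simple way to picture time-based staleness is to write a marker to the master and poll a replica until the marker becomes visible. The sketch below does this with two hypothetical client handles (master and replica, each offering put/get), which stand in for any concrete DBMS driver; it is an illustration, not a rigorous staleness benchmark.

import time
import uuid

def measure_staleness(master, replica, key="staleness_probe", timeout_s=30.0):
    """Return the time-based staleness (seconds) observed for one write."""
    marker = str(uuid.uuid4())          # unique value so stale data cannot match
    start = time.monotonic()
    master.put(key, marker)             # write is applied to the master copy
    while time.monotonic() - start < timeout_s:
        if replica.get(key) == marker:  # replica has caught up with the master
            return time.monotonic() - start
        time.sleep(0.01)                # poll every 10 ms
    return float("inf")                 # replica never converged within the timeout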

Big Data processing Te need of native integration of (big) data processing frameworks into the DMSs arises along with the number of recently advanced Big Data processing frameworks, such as Hadoop MapReduce, Apache Spark, and their specifc internal data models. Hence, the DMSs need to provide native drivers for Big Data processing frameworks which can automate the transformation of DMS data models into the respective Big Data process- ing framework storage models. Further, these native drivers can exploit data locality fea- tures of the DMSs as well. Please note that such a feature is also needed based on the respective DLMS architecture that has been presented in "Data lifecycle management (DLM)" section as a Big Data processing framework needs to be placed on top of the data management component.

Data storage systems A DLMS in the Big Data domain requires both the storage and the management of het- erogeneous data structures. Consequently, a sophisticated DLMS would need to support a diverse set of DMSs. DMSs can be classifed into fle systems for storing unstructured data and DBMSs (database management systems) for storing semi-structured and struc- tured data. However, the variety of semi-structured and structured data requires suitable data models (see Fig. 4) to increase the fexibility of DBMSs. Following these require- ments, the DBMS landscape is constantly evolving and becomes more heterogeneous.3 Te following sub-sections provides (i) an overview of related work on DBMS classifca- tions; (ii) a holistic and up-to-date classifcation of current DBMS data models; (iii) a qualitative analysis of selected DBMSs; (iv) a classifcation and analysis of relevant DFSs.

Database management systems Te classifcation of the diferent data models (see Fig. 4) for semi-structured data has been in the focus since the last decade [37] as heterogeneous systems (such as Dynamo, Cassandra [45] and BigTable [46]) appeared on the DBMS landscape. Consequently, the term NoSQL evolved, which summarizes the heterogeneous data models for semi-struc- tured data. Similar, the structured data model evolved with the NewSQL DBMSs [13, 47]. Several surveys have reviewed NoSQL and NewSQL data models over the last years and analyze the existing DBMS with respect to their data models and the specifc non- functional features [11, 13, 35–37, 48, 49]. In addition, dedicated surveys focus explic- itly specifc data models (such as the time series data model [50, 51]) or specifc DBMS architectures (such as in-memory DBMS [14]).

3 http://nosql-database.org/ lists over 225 DBMS for semi-structured data.

Fig. 4 DBMS data model classification. Relational storage: RDBMS (MySQL, PostgreSQL) and NewSQL (VoltDB, CockroachDB). Non-relational storage: key-value (Redis, Riak), document (MongoDB, Couchbase), wide-column (Cassandra, HBase), graph (Neo4J, JanusGraph), time-series (InfluxDB, Prometheus) and multi-model (ArangoDB, OrientDB).

Cloud-centric challenges for operating distributed DBMS are analysed by [13], con- siders the following: horizontal scaling, handling elastic workload patterns and fault tolerance. It also classifes nineteen DDBMSs against features, such as partitioning, rep- lication, consistency and security. Recent surveys on NoSQL-based systems [35, 49] derive both, the functional and the non-functional NoSQL and NewSQL features and correlated them with distribution mechanisms (such as sharding, replication, storage management and query process- ing). However, the implications of Cloud resources or the challenges of Big Data applica- tions were not considered. Another conceptual analysis of NoSQL DBMS is carried out by [48]. It outlines many storage models (such as key-value, document, column-oriented and graph-based) and also analyses current NoSQL implementations against persis- tence, replication, sharding, consistency and query capability. However, recent DDBMSs (such as time-series DBMSs or NewSQL DBMSs) are not analysed from Big Data as well as the Cloud context. A survey on DBMS support for Big Data with the focus on data storage models, architectures and consistency models is presented by [11]. Here, the Mazumdar et al. J Big Data (2019) 6:15 Page 13 of 37

relevant DBMSs are analysed towards their suitability for Big Data applications, but the Cloud service models and evolving DBMSs (such as time-series databases) are also not considered. An analysis of the challenges and opportunities for DBMSs in the Cloud is presented by [52]. Here, the relaxed consistency guarantees (for DDBMS) and heterogeneity, as well as the diferent level of Cloud resource failures are explained. Moreover, it is also explicated that HA mechanism is needed to overcome failures. However, the HA and horizontal scalability come with the weaker consistency model (e.g., BASE [53]) com- pared to ACID [43]. In the following, we distil and join existing data model classifcations (refer to Fig. 4) into an up-to-date classifcation of the still-evolving DBMS landscape. Hereby, we select relevant details for the DLMS of Big Data applications, while we refer the interested reader to the presented surveys for an in-depth analysis of specifc data models. Analo- gously, we apply a qualitative analysis of currently relevant DBMS based on the general DLMS features (see "Non-functional data management features" section), while in-depth analysis of specifc features can be found in the presented surveys. Hereby, we select two common DBMS4 of each data model for our analysis.

Relational data models The relational data model stores data as tuples forming an ordered set of attributes, which can be extended to extract more meaningful information [54]. A relation forms a table and tables are defined using a static, normalised data schema. SQL is a generic data definition, manipulation and query language for relational data. Popular representative DBMSs with a relational data model are MySQL and PostgreSQL.
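For illustration, the following self-contained snippet uses Python's built-in sqlite3 module to show the static schema and the SQL data definition, manipulation and query steps the paragraph refers to; SQLite merely serves as a stand-in for a relational DBMS such as MySQL or PostgreSQL, and the table and values are invented.

import sqlite3

conn = sqlite3.connect(":memory:")                     # throw-away in-memory database
conn.execute("CREATE TABLE sensor (id INTEGER PRIMARY KEY, site TEXT, temp REAL)")
conn.executemany("INSERT INTO sensor (site, temp) VALUES (?, ?)",
                 [("ulm", 21.5), ("crete", 27.1), ("ulm", 22.0)])
# Declarative query over the fixed, normalised schema.
for site, avg_temp in conn.execute("SELECT site, AVG(temp) FROM sensor GROUP BY site"):
    print(site, round(avg_temp, 1))
conn.close()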

NewSQL The traditional relational data model provides limited data partitioning, horizontal scalability and elasticity support. NewSQL DBMSs [55] aim at bridging this gap and build upon the relational data model and SQL. However, NewSQL relaxes relational features to enable horizontal scalability and elasticity [13]. It is worth noting that only a few NewSQL DBMSs, such as VoltDB5 and CockroachDB,6 are built upon such architectures with the focus on scalability and elasticity, as most NewSQL DBMSs are constructed out of existing DBMSs [47].

Key-value The key-value data model relates to the hash tables of programming languages. The data records are tuples consisting of key-value pairs. While the key uniquely identifies an entry, the value is an arbitrary chunk of data. Operations are usually limited to simple put or get operations. Popular key-value DBMSs are Riak7 and Redis.8
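A minimal put/get interaction with a key-value store, sketched here with the redis-py client against a Redis instance assumed to run on localhost; the key and value are arbitrary examples.

import redis

# Connect to an (assumed) local Redis server; decode_responses returns str instead of bytes.
r = redis.Redis(host="localhost", port=6379, decode_responses=True)

r.set("user:42:session", "a3f6c0de")   # put: the key uniquely identifies the entry
print(r.get("user:42:session"))        # get: the value is returned as an opaque chunk
r.expire("user:42:session", 3600)      # optional time-to-live, a common key-value idiom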

4 https://db-engines.com/en/ranking. 5 https://www.voltdb.com/. 6 https://www.cockroachlabs.com/. 7 http://basho.com/products/riak-kv/. 8 https://redis.io/.

Document The document data model is similar to the key-value data model. However, it defines a structure on the values in certain formats, such as XML or JSON. These values are referred to as documents, but usually without fixed schema definitions. Compared to key-value stores, the document data model allows for more complex queries as document properties can be used for indexing and querying. MongoDB9 and Couchbase10 represent the common DBMSs with a document data model.
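To illustrate querying on document properties, the sketch below uses the pymongo driver against a MongoDB instance assumed to be reachable on localhost; the database, collection and field names are invented for the example.

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed local MongoDB instance
docs = client["demo"]["readings"]                  # hypothetical database/collection

# Documents are schema-less, JSON-like values.
docs.insert_one({"sensor": "s1", "site": "ulm", "temp": 21.5, "tags": ["indoor"]})

# A secondary index on a document property enables selective queries.
docs.create_index("site")
for d in docs.find({"site": "ulm", "temp": {"$gt": 20}}):
    print(d["sensor"], d["temp"])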

Wide-column The column-oriented data model stores data by columns rather than by rows. It enables both storing large amounts of data in bulk and efficiently querying over very large structured data sets. A column-oriented data model does not rely on a fixed schema. It provides nestable, map-like structures for data items which improve flexibility over a fixed schema [46]. The common representatives of column-oriented DBMSs are Apache Cassandra11 and Apache HBase.12
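The sketch below shows the wide-column idiom with the DataStax Python driver for Apache Cassandra, assuming a single-node cluster on localhost; the keyspace, table and column names are illustrative only, and HBase would be accessed through a different client.

from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect()  # assumed local Cassandra node

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS demo
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
# Partition key 'sensor' plus clustering column 'ts': rows of one sensor are stored together.
session.execute("""
    CREATE TABLE IF NOT EXISTS demo.readings (
        sensor text, ts timestamp, temp double,
        PRIMARY KEY (sensor, ts))
""")
session.execute("INSERT INTO demo.readings (sensor, ts, temp) VALUES ('s1', toTimestamp(now()), 21.5)")
for row in session.execute("SELECT ts, temp FROM demo.readings WHERE sensor = 's1'"):
    print(row.ts, row.temp)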

Graph The graph data model primarily uses graph structures, usually including elements like nodes and edges, for data modelling. Nodes are often used for the main data entities, while edges between nodes are used to describe relationships between entities. Querying is typically executed by traversing the graph. Typical graph-based DBMSs are Neo4J13 and JanusGraph.14

Time-series The time-series data model [50] is driven by the needs of sensor storage within the Cloud and Big Data context. The time-series DBMSs are typically built upon existing non-relational data models (preferably key-value or column-oriented), and add a dedicated time-series data model on top. The data model is built upon data points which comprise a time stamp, an associated numeric value and a customisable set of metadata. Time-series DBMSs offer analytical query capabilities, which cover statistical functions and aggregations. Well-known time-series DBMSs are InfluxDB15 and Prometheus.16
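As an illustration of such a data point (time stamp, numeric value and metadata), the sketch below renders a record in InfluxDB's textual line protocol; the measurement and tag names are invented, and actually writing the point to a server would additionally require an InfluxDB client.

import time

def line_protocol(measurement, tags, field, value, ts_ns=None):
    """Render one time-series data point in InfluxDB line protocol syntax."""
    ts_ns = ts_ns if ts_ns is not None else time.time_ns()
    tag_str = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    # <measurement>,<tag_key=tag_value,...> <field_key=field_value> <timestamp>
    return f"{measurement},{tag_str} {field}={value} {ts_ns}"

# One data point: a numeric value plus descriptive metadata (tags) and a timestamp.
print(line_protocol("temperature", {"site": "ulm", "sensor": "s1"}, "celsius", 21.5))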

Multi-model A multi-model DBMS addresses the problem of polyglot persistence [56], which signifies that each of the existing non-relational data models addresses a specific use case. Hence, multi-model DBMSs combine different data models into a single DBMS while building upon one storage backend to improve flexibility (e.g., providing the document and graph data

9 https://www.mongodb.com/. 10 https://www.couchbase.com/. 11 http://cassandra.apache.org/. 12 https://hbase.apache.org/. 13 https://neo4j.com/. 14 http://janusgraph.org/. 15 https://www.influxdata.com/. 16 https://prometheus.io/.

model via a unified query interface). Common multi-model DBMSs are ArangoDB17 and OrientDB.18

Comparison of selected DBMSs In this section, we analyse the already mentioned DBMSs in the context of Big Data applications (see Table 2). To perform this, we first analyse the already mentioned DBMSs (of the previously introduced data models) with respect to their features and supported Cloud service models. Next, we provide a qualitative analysis with respect to the non-functional features of the DMSs (refer to "Non-functional data management features" section). For a quantitative analysis of these non-functional requirements, we refer the interested reader to the existing work focused on DBMS evaluation frameworks [44, 57–60] and evaluation results [42, 61, 62].

Qualitative criteria In Table 2, the first three columns present each DBMS and its data model, followed by the technical features and the service models supported. The analysis only considers the standard version of a DBMS. In the following, we attempt to explicate each of the technical features considered. The DBMS architecture is classified into single, master–slave and multi-master architectures [56]. The sharding strategies are analysed based on the DBMS architectures; they can be supported manually as well as automatically in a hash- or range-based manner. The elasticity feature relies on a distributed architecture and relates to whether a DBMS supports adding and/or removing nodes from the cluster at runtime without a downtime. For consistency and availability guarantees, each DBMS is analysed with respect to its consistency (C), availability (A) and partition tolerance (P) properties within the CAP theorem (i.e., CA, CP, AC or AP) [43]. However, it should be highlighted that we did not consider fine-grained configuration options that might be offered for a DBMS to vary the CAP properties. Next, the replication mechanisms are analysed in terms of both cluster and cross-cluster replication (also known as geo-distribution). Consequently, a DBMS supporting cross-cluster replication implicitly supports cluster replication. The interested reader might consider [63] for a more fine-grained analysis of the replication mechanisms of DDBMSs. The Big Data adapter is analysed by evaluating native and/or third-party drivers for Big Data processing frameworks. Finally, the DDBMSs are classified based on their offering as community editions, enterprise commercial editions or managed DBaaS. One exemplary provider is presented if the DBMS is offered as a DBaaS.
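To make the hash- and range-based sharding strategies mentioned above concrete, the following sketch shows both routing schemes in plain Python; the four-node setup and the key-range split are arbitrary assumptions for illustration.

import hashlib

NODES = ["node-0", "node-1", "node-2", "node-3"]          # assumed 4-node cluster

def hash_shard(key: str) -> str:
    """Hash-based sharding: spreads keys uniformly, independent of key order."""
    digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return NODES[digest % len(NODES)]

# Range-based sharding: contiguous key ranges stay together, which helps range scans.
RANGES = [("a", "g", "node-0"), ("g", "n", "node-1"),
          ("n", "t", "node-2"), ("t", "{", "node-3")]      # '{' sorts just past 'z'

def range_shard(key: str) -> str:
    for low, high, node in RANGES:
        if low <= key[:1] < high:
            return node
    return NODES[0]                                        # fallback for other characters

print(hash_shard("user:42"), range_shard("martin"))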

Qualitative analysis The resulting Table 2 represents the evolving landscape of the DBMSs. The implemented features of existing DBMSs significantly differ (except the RDBMSs) even within one data model. The heterogeneity of the analysed DBMSs is even more obvious across data models. Further, the heterogeneous DBMS landscape offers a variety of potential DBMS solutions for Big Data.

17 https://www.arangodb.com/. 18 https://orientdb.com/.

Table 2 Technical feature analysis of selected DBMSs

| DBMS | Version | Data model | Architecture | Sharding | Elasticity | CAP | Replication | Big Data adapter | Community | Enterprise | DBaaS |
| MySQL | 8.0.11 | RDBMS | Single/master–slave | Manual | No | CA | Cluster | 3rd party (SQL-based) | Yes | Yes | https://cloud.oracle.com/mysql |
| PostgreSQL | 10.4 | RDBMS | Single/master–slave | Manual | No | CA | Cluster | 3rd party (SQL-based) | Yes | Yes | https://aws.amazon.com/rds/postgresql/ |
| VoltDB | 8.1.2 | NewSQL | Multi-master | Hash | Yes (commercial) | CP | Cross-cluster | 3rd party (SQL-based) | Yes | Yes | No |
| CockroachDB | 2.0.3 | NewSQL | Multi-master | Hash | Yes | CP | Cross-cluster | 3rd party (SQL-based) | Yes | Yes | No |
| Riak | 2.2.3 | Key-value | Multi-master | Hash | Yes | AP | Cross-cluster | Native | Yes | Yes | No |
| Redis | 4.0 | Key-value | Multi-master | Hash | Yes | AC | Cluster | Native | Yes | Yes | https://redislabs.com/ |
| MongoDB | 4.0.0 | Document | Multi-master | Hash/range | Yes | CP | Cross-cluster | Native | Yes | Yes | https://www.mongodb.com/cloud/atlas |
| Couchbase | 5.0.1 | Document | Multi-master | Hash | Yes | CP | Cross-cluster | Native | Yes | Yes | https://www.couchbase.com/products/cloud/managed-cloud |
| Cassandra | 3.11.2 | Wide-column | Multi-master | Hash/range | Yes | AP | Cross-cluster | Native | Yes | Yes, by DataStax | https://www.instaclustr.com/solutions/managed-apache-cassandra/ |
| HBase | 2.0.1 | Wide-column | Multi-master | Hash | Yes | CP | Cross-cluster | 3rd party | Yes | Yes, by Cloudera | No |
| Neo4J | 3.4.1 | Graph | Master–slave | No | Yes | CA | Cross-cluster | Native | Yes | Yes | https://www.graphstory.com/ |
| JanusGraph | 0.2.0 | Graph | Multi-master | Manual | Yes | AP/CP | Cluster | 3rd party | Yes | No | No |
| ArangoDB | 3.3.11 | Multi-model (key-value, document, graph) | Multi-master | Hash | Yes | CP | Cross-cluster | Native | Yes | Yes | No |
| OrientDB | 3.0.2 | Multi-model (key-value, document, graph) | Multi-master | Hash | Yes | – | Cross-cluster | Native | Yes | Yes | No |
| InfluxDB | 1.5.4 | Time-series | Multi-master | Range | Yes (commercial) | AP/CP | Cross-cluster | 3rd party (commercial) | Yes | Yes | https://cloud.influxdata.com/ |
| Prometheus | 2.3 | Time-series | Master–slave | Manual | No | – | Cluster | 3rd party (commercial) | Yes | Yes | No |

The feature analysis provides a baseline for the qualitative analysis of the non-functional features. From the (horizontal) scalability point of view, a DBMS with a multi-master architecture is supposed to provide scalability for write and read workloads, while a master–slave architecture is supposed to provide read scalability. Due to the differences between the DBMSs, the impact of elasticity requires additional qualitative evaluations [42]. The consistency guarantees correlate with the classification in the CAP theorem. Table 2 clearly shows the heterogeneity with respect to the consistency guarantees. Generally, the single-master or master–slave architectures provide strong consistency guarantees. Multi-master architectures cannot be exactly classified within the CAP theorem as their consistency guarantees heavily depend on the DBMS runtime configuration [64]. Additional evaluations of the consistency of the selected DBMSs are required with respect to strong consistency (so as to ensure scalability, elasticity and availability) [44, 62]. Providing HA directly relates to the supported replication mechanisms to overcome failures. The analysis shows that all DBMSs support cluster-level replication, while cross-cluster replication is supported by ten out of the sixteen DBMSs. Big Data processing relates to the technical feature of Big Data adapters. Table 2 clearly shows that seven DBMSs provide native adapters and nine DBMSs enable Big Data processing via third-party adapters. All DBMSs are available as a self-hosted community or enterprise version. In addition, both RDBMSs and six NoSQL DBMSs are offered as managed DBaaS. While the DBaaS offerings abstract all operational aspects of the DBMS, an additional analysis might be required with respect to their non-functional features and cost models [65].

Cloudification of DMSs Traditional on-premise DBMS offerings are still popular, but the current trend shows that DDBMSs running in the Cloud are also well accepted. This is especially true as Big Data imposes new challenges, such as scalability, the diversity of data management and the usage of Cloud resources, towards the massive storage of data [66]. In general, the distributed architecture of DMSs has evolved towards exploiting the Cloud features and catering for the 5Vs of Big Data [67]. Data-as-a-service (DaaS) mostly handles data aggregation and management via appropriate web services, such as RESTful APIs. Database-as-a-service (DBaaS) offers a database as a service, which can include a (distributed) relational or non-relational database. In most cases, storage-as-a-service (STaaS) includes both DaaS and DBaaS [68]. Furthermore, BDaaS [3] is a Cloud service (such as Hadoop-as-a-service) where traditional applications are migrated from local installations to the Cloud. BDaaS wraps three primary services: (i) IaaS (for the underlying resources), (ii) STaaS (a sub-domain of platform-as-a-service (PaaS)) for managing the data via dynamic scaling and (iii) data management (such as data placement and replica management).

Distributed file systems A distributed file system (DFS) is an extended networked file system that allows multiple distributed nodes to internally share data/files without using remote call methods or procedures [69]. A DFS offers scalability, fault-tolerance, concurrent file access and metadata support. However, the design challenges (independent of data size and storage type) of a DFS are transparency, reliability, performance, scalability, and security. In general, DFSs do not share storage access at the block level but rather work at the network level. In DFSs, security relies on either access control lists (ACLs) or respectively defined capabilities, depending on how the network is designed. DFSs can be broadly classified into three models and respective groups (see Fig. 5). First, client–server architecture based file systems, which supply a standardized view of a local file system. Second, clustered-distributed file systems, which offer multiple nodes to enable concurrent access to the same block device. Third, symmetric file systems, where all nodes have a complete view of the disk structure. Below, we briefly analyse each category in a separate sub-section and also name some strictly open-source members of each category.

Fig. 5 Distributed file systems classification: client–server (NFS, GlusterFS), clustered-distributed (HDFS, CephFS) and symmetric (Ivy, PVFS) file systems

Client–server model In the client–server architecture based file system, all communications between servers and clients are conducted via remote procedure calls. The clients maintain the status of current operations on a remote file system. Each file server provides a standardized view of its local file system. Here, the file read operations are not mutually exclusive but the write operations are. File sharing is based on mounting operations. Only the servers can mount directories exported from other servers. Network File System (NFS) and GlusterFS19 are two popular open source implementations of the client–server model.

Clustered-distributed model Clustered-distributed systems organize the clusters in an application-specific manner and are ideal for DCs. The model supports a huge amount of data; the data is stored/partitioned across several servers for parallel access. By design, this DFS model is fault tolerant as it enables the hosting of a number of replicas. Due to the huge volume of data, data are appended instead of overwritten. In general, DNS servers map (commonly in a round-robin fashion) access requests to the clusters for load-balancing purposes. The Hadoop distributed file system (HDFS) and CephFS20 are two popular implementations of such a DFS model.

19 https://docs.gluster.org/en/latest/.

Table 3 Feature analysis of selected DFSs

| DFS | Version | FileSystem | Architecture | Sharding | Elasticity | CAP | Replication | Big Data adapter |
| NFS | 4.2 | Client–server | Fully-centralized | Index/range | No | CA | Block level | 3rd party |
| GlusterFS | 4.0 | Client–server | Fully-centralized | Automatic | Yes | CA | Node level | Native |
| HDFS | 3.0.1 | Clustered-distributed | Less-centralized | Fixed size | Yes | AP | Block level | Native |
| CephFS | 12.2.5 | Clustered-distributed | Less-centralized | Index/range | Yes | CP | Cluster level | Native/3rd party |
| Ivy | 0.3 | Symmetric | Fully-distributed | DHash | Yes | AP | Block level | – |
| PVFS | 2.0 | Symmetric | Fully-distributed | Hash | Yes | AP | – | 3rd party |

Symmetric model The symmetric model is a DFS with a masterless architecture, where each node has the same set of roles. It mainly resembles a peer-to-peer system. In general, the symmetric model employs a distributed hash table approach for data distribution and replication across systems. Such a model offers higher availability but reduced performance. Ivy [70] and the parallel virtual file system (PVFS) [71] are examples of the symmetric DFS model.
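As a rough illustration of the distributed hash table principle behind the symmetric model, the sketch below places file identifiers on a consistent-hashing ring and chooses successor nodes as replica holders. This is a simplified, assumption-based example (node names, ring size and replica count are invented); it is not the placement logic of Ivy or PVFS.

```python
import hashlib
from bisect import bisect


def ring_position(identifier: str) -> int:
    # Position on the ring, derived from a stable hash of the identifier.
    return int(hashlib.sha1(identifier.encode()).hexdigest(), 16) % (2 ** 32)


class HashRing:
    """Minimal consistent-hashing ring: a file belongs to the first node at or
    after its ring position; further replicas go to the next nodes clockwise."""

    def __init__(self, nodes):
        self._ring = sorted((ring_position(n), n) for n in nodes)

    def locate(self, file_id: str, replicas: int = 2):
        positions = [pos for pos, _ in self._ring]
        start = bisect(positions, ring_position(file_id)) % len(self._ring)
        return [self._ring[(start + i) % len(self._ring)][1]
                for i in range(min(replicas, len(self._ring)))]


ring = HashRing(["node-a", "node-b", "node-c", "node-d"])
print(ring.locate("/data/block-42", replicas=2))
```

Because every node can evaluate the same mapping, no central metadata master is required, which matches the peer-to-peer character described above.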

DFS evaluation Similar to the DDBMSs, we also compare the open source implementations of DFSs according to the same set of technical features or criteria. A summary of this comparison is depicted in Table 3. In general, most of the file systems are distributed in nature (except NFS and GlusterFS). However, they do exhibit some architectural differences. NFS and GlusterFS are both developed following a master–slave approach, while Ivy and PVFS are based on the masterless model. Data partitioning (or sharding) is supported by these DFSs either dynamically (featured by Ivy and PVFS) or statically via a fixed size (as in the case of HDFS). Elasticity, i.e., supporting data scaling, is a very important feature for many Big Data applications (especially those hosted in the Cloud). We can thus observe that, except for NFS, all mentioned DFSs support scalability. Further, HDFS, CephFS, Ivy and PVFS are fault tolerant as well. Replication, highly needed for not losing data, is well supported by the DFSs. However, the granularity differs from the block to the cluster level. Finally, these DFSs also offer some form of hooks (either native or third-party supplied) to be used with Big Data frameworks.

20 http://docs.ceph.com/docs/mimic/cephfs/.

Data placement techniques In the Cloud ecosystem, traditional placement algorithms incur a high cost (including time) for storing and transferring data [72]. Placing data while it is partitioned and distributed across multiple locations is a challenge [23, 73]. Runtime data migration is an expensive affair [30–32], and the complexity increases due to the frequent change of application as well as DC behaviour (i.e., resources or latencies) [74]. Placing a large amount of data across the Cloud is complex due to issues such as (i) data storage and transfer cost optimisation while maintaining data dependencies; (ii) data availability and replication; (iii) privacy policies, such as restricted data storage based on geo-locations. Data replication can influence consistency, while it also enhances the scalability and availability of data. In general, the existing data placement strategies can be grouped based on user-imposed constraints, such as data access latency [75], fault tolerance [76], energy-cost awareness [77], data dependency [78, 79] and robustness or reliability [80, 81].

Formal definition The data placement problem in a distributed computing domain is an instance of an NP-hard problem [82], while it can be reduced to a bin-packing problem instance. Informally, the data placement problem can be described as follows: given a certain workflow, the current data placement, and a particular infrastructure, find the right position(s) of data within the infrastructure to optimise one or more criteria, such as the cost of the data transfer. A formal representation of this problem is as follows: suppose that there are $N$ datasets, represented as $d_i$ (where $i = 1, \ldots, N$). Each dataset has a certain size $s_i$. Further, suppose that there are $M$ computational elements represented as $V_j$ (where $j = 1, \ldots, M$). Each computational element has a certain storage capacity denoted as $c_j$. Finally, suppose that there is a workflow $W$ with $T$ tasks which are represented as $t_k$ (where $k = 1, \ldots, T$). Each task has a certain input $t_k.\mathrm{input}$ and output $t_k.\mathrm{output}$, where each maps to a set of datasets. The main set of decision variables is $c_{ij}$, representing the decision (e.g., based on privacy or legal issues) of whether a certain dataset $i$ should be stored in a certain computational element $j$. Thus, there is a need to have $c_{ij} = 1$ for each $i$ and a certain $j$. Two hard constraints need to hold: (i) a dataset should be stored in one computational element, which can be represented as $\sum_j c_{ij} = 1$ for each $i$. It is worth noting that this constraint holds when no dataset replication is allowed; otherwise, it would take the form $\sum_j c_{ij} \geq r$, where $r$ is the replication factor; (ii) the capacity of a computational element should be sufficient for hosting the respective dataset(s) assigned to it. This is represented as $\sum_i c_{ij} \cdot s_i \leq c_j$ for each $j$. Finally, suppose that the primary aim is to reduce the total amount of data transfers for the whole workflow. In this respect, this optimisation objective can be expressed as follows:

$$\text{minimise} \quad \sum_{k=1}^{T} \;\sum_{d_i,\, d_{i'} \in t_k.\mathrm{input}} \big[\, m(d_i) \neq m(d_{i'}) \,\big] \qquad (1)$$

where $m(d_i)$ (which supplies as output a value in $[1, M]$) indicates the index of the computational element that has been selected for a certain dataset. This objective adds up the amount of data transfers per workflow task, which relates to the fact that the task will certainly be placed in a specific resource mapping to one of the required input datasets. Thus, during its execution, the data mapped to the rest of the input datasets will need to be moved in order to support the respective computation.
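To make the formulation above concrete, the following sketch encodes the capacity constraint and objective (1) for a toy instance and enumerates all placements by brute force. The dataset sizes, capacities and task inputs are invented for illustration (and replication is left out, so constraint (i) holds by construction); realistic instances require the heuristic and meta-heuristic methods surveyed below.

```python
from itertools import product

# Toy instance; all values are assumptions for illustration only.
sizes = {"d1": 4, "d2": 2, "d3": 3}      # dataset sizes s_i
capacity = {"v1": 6, "v2": 5}            # storage capacities c_j
tasks = [("d1", "d2"), ("d2", "d3")]     # t_k.input per task


def feasible(assign):
    """Capacity constraint (ii): datasets mapped to an element must fit into it."""
    used = {v: 0 for v in capacity}
    for dataset, element in assign.items():
        used[element] += sizes[dataset]
    return all(used[v] <= capacity[v] for v in capacity)


def transfers(assign):
    """Objective (1): count input-dataset pairs of a task that end up on
    different computational elements, i.e. the transfers the task forces."""
    cost = 0
    for inputs in tasks:
        for i, a in enumerate(inputs):
            for b in inputs[i + 1:]:
                cost += assign[a] != assign[b]
    return cost


# Brute force: every dataset is mapped to exactly one element (constraint (i)).
best = min(
    (dict(zip(sizes, placement)) for placement in product(capacity, repeat=len(sizes))),
    key=lambda assign: (not feasible(assign), transfers(assign)),
)
print(best, "transfers:", transfers(best))
```

Even this tiny instance hints at why exhaustive enumeration does not scale: the search space grows as $M^N$, which motivates the dependency-based, scheduling-based and graph-based methods discussed next.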

Data placement methodologies We broadly classify the proposed data placement methods into data dependency, holistic task and data scheduling, and graph-based methods. The methods in each category are analysed in the following subsections.

Data dependency methods A data-group-aware placement scheme is proposed in [83], which employs the bond energy algorithm (BEA) [84] to transform the original data dependency matrix into a Hadoop cluster. It exploits access patterns to find an optimal data grouping to achieve better parallelism and workload balancing. In [85], a data placement algorithm is proposed for solving the data inter-dependency issue at the VM level. Scalia [86] proposes a Cloud storage brokerage scheme that optimises the storage cost by exploiting real-time data access patterns. Zhao et al. [87] propose data placement strategies for both initial data placement and relocation using a particle swarm optimization (PSO) algorithm. For fixed data set placement, this method relies on hierarchical data correlation and performs data re-allocation during storage saturation. Yuan et al. [78] propose a k-means based dataset clustering algorithm to construct a data dependency matrix by exploiting the data dependency and the locality of computation. Later, the dependency matrix is transformed by applying the BEA, while items are clustered based on their dependencies by following a recursive binary partitioning algorithm. In general, the preservation of time locality can significantly impact caching performance, while the efficient re-ordering of jobs can improve resource usage. In [79], the authors propose a file grouping policy for pre-staging data by preserving time locality and enforcing the role of job re-ordering via extracting access patterns.

Task and data scheduling methods In [88], the authors propose an adaptive data management middleware (based on a multi-objective optimization model) which collects system-state information and abstracts away the complexities of multiple Cloud storage systems. For internet-of-things (IoT) data streaming support, Lan et al. [89] propose a data stream partitioning mechanism by exploiting statistical feature extraction. Zhang et al. [90] propose a mixed-integer linear programming model for modelling the data placement problem. It considers both the data access cost as well as the storage limitations of DCs. Hsu et al. [91] propose a Hadoop extension by adding dynamic data re-distribution (by VM profiling) before the map phase and VM mapping for reducers based on partition size and VM availability. Here, high-capacity VMs are assigned to high-workload reducers. Xu et al. [92] propose a genetic programming approach to optimise the overall number of data transfers. However, this approach does not consider the capacity constraints of DCs and the non-replication constraints of data sets. In [93], a policy engine is proposed for managing both the number of parallel streams (between origin and destination nodes) and the priorities for data staging jobs in scientific workflows. The policy engine also considers data transfers, storage allocation and network resources.

The storage resource broker [23] provides seamless access to different distributed data sources (interfacing multiple storages) via its APIs. It works as a middleware between the multiple distributed data storages and applications. BitDew [94] offers a programmable environment for data management via metadata exploitation. Its data scheduling (DS) service takes care of implicit data movement. Pegasus [95] provides a framework that maps complex scientific applications onto distributed resources. It stores the newly generated data and also registers them in the metadata catalogue. The replica location service [96] is a distributed, scalable data management service that maps logical data names to target names. It supports both centralized as well as distributed resource mapping. Kosar and Livny [81] propose a data placement sub-system that consists of a scheduler, a planner and a resource broker. The resource broker is responsible for matching resources, data identification and decisions related to data movement. The scheduling of data placement jobs relies on the information given by the workflow manager, the resource broker and the data miner. A very interesting feature of the proposed sub-system is that it is able to support failure recovery through the application of retry semantics.

Graph-based data placement Yu and Pan [72] propose the use of sketches to construct a hyper-graph sparsifier of data traffic to lower the data placement cost. Such sketches represent data structures that approximate properties of a data stream. LeBeane et al. [97] propose multiple on-line graph-partitioning strategies to optimise data ingress across heterogeneous clusters. SWORD [98] handles the partitioning and placement for OLTP workloads. Here, the workload is represented as a hypergraph, and a hyper-graph compression technique is employed to reduce the data partitioning overhead. An incremental data re-partitioning technique is also proposed that modifies the data placement in multiple steps to support workload changes. Kayyoor et al. [99] propose how to map nodes to a subset of clusters while satisfying user constraints. The approach minimises the query span for query workloads by applying replica selection and data placement algorithms. The query-based workload is represented as hypergraphs, and a hypergraph partitioning algorithm is used to process them. Kaya et al. [100] model the workflow as a hypergraph and employ a partitioning algorithm to reduce the computational and storage load while trying to minimise the total amount of file transfers.

Comparative evaluation In this section, we have carefully selected a set of criteria to evaluate the methods analysed in "Data placement methodologies" section. The curated criteria are: (i) fixed data sets (whether the placement of data can be a priori fixed in sight of, e.g., regulations); (ii) constraint satisfaction (which constraint solving technique is used); (iii) granularity (what is the granularity of the resources considered); (iv) intermediate data handling (whether intermediate data, produced by, e.g., a running workflow, can also be handled); (v) multiple application handling (whether data placement over multiple applications can be supported); (vi) increasing data size (whether the growth rate of data is taken into account); (vii) replication (whether data replication is supported); (viii) optimisation criteria (which optimisation criteria are exploited); (ix) additional system-related information (whether additional knowledge is captured which could enable the production of better data placement solutions). An overview of the evaluation based on these criteria can be observed in Table 4.

Table 4 Comparative summary of existing data placement algorithms

| Approach | Fixed DS | Constraint satisfaction | Granul. | Interm. DS | Mult. appl. | Data size | Repl. | Opt. criteria | Add. info. |
| BDAP [85] | Yes | Meta-heuristic | Fine | Yes | No | No | No | Comm. cost | No |
| Xu [92] | No | Meta-heuristic | Coarse | No | No | No | No | Data transf. number | No |
| Yuan [78] | Yes | Recursive binary part. | Coarse | Yes | Yes | Yes | No | Data transf. number | No |
| Kaya [100] | No | Hypergraph part. | Coarse | No | No | No | No | Data transf. number | No |
| Zhao [87] | Yes | Hierarchical part. clust. + PSO | Fine | Yes | No | No | No | Data transf. number | No |
| Wang [83] | No | Recursive clust. + ODPA | Fine | No | No | No | No | Data transf. number | Yes |
| Yu [72] | No | Hypergraph part. | Fine | No | No | No | No | Cut weight | Yes |
| Zhang [90] | No | Lagrange relaxation MIP | Coarse | No | No | No | No | Data access cost | No |
| Hsu [91] | No | – | Fine | No | No | No | No | Profiling-related metric | Yes |
| LeBeane [97] | No | Hypergraph part. | Fine | No | No | No | No | Skew factor | Yes |
| Lan [89] | No | Clustering-based PSO search | Fine | No | No | No | No | Volatility, AMA, hurst distance | Yes |
| BitDew [94] | No | – | Fine | Yes | Yes | No | Yes | Data dep. repl., fault tol. | Yes |
| Kayoor [99] | No | Hypergraph part. | Coarse | No | No | No | Yes | Avg. query span | Yes |
| Kosar [81] | Yes | – | Fine | Yes | Yes | No | Yes | – | Yes |
| Scalia [86] | No | Multi-dimensional Knapsack problem | Fine | No | Yes | Yes | No | Storage cost | Yes |
| SWORD [98] | Yes | Graph partition | Fine | No | Yes | – | Yes | Conflicting transactions | – |

First of all, we can clearly see that there is no approach that covers all the criteria considered. Three approaches (Yuan et al. [78], BitDew [94] and Kosar [81]) can be distinguished, which can also be considered complementary to each other. However, only in [78] has a suitable optimisation/scheduling algorithm for data placement been realised.

Considering now each criterion in isolation, we can observe in Table 4 that very few approaches consider the existence of a fixed or semi-fixed location of data sets. Further, such approaches seem to prescribe a fixed a-priori solution to the data placement problem, which can lead to a sub-optimal solution, especially as optimisation opportunities are lost in sight of more flexible semi-fixed location constraints. For instance, fixing the placement of a dataset to a certain DC might be sub-optimal in case multiple DCs in the same location exist.

Three main classes of data placement optimisation techniques can be observed: (i) meta-search (like PSO) or genetic programming to more flexibly inspect the available solution space and efficiently find a near-optimal solution; (ii) hierarchical partition algorithms (based on BEA) that attempt to group data recursively based on data dependencies, either to reduce the number or the cost of data transfers. BEA is used as the baseline for many of these algorithms and also supports dynamicity. In particular, new data sets are handled by initially encoding them in a reduced table-based form before applying the BEA. After the initial solution is found, the modification can be done by adding cluster/VM capacity constraints into the model; (iii) a Big Data placement problem can also be encoded via a hypergraph. Here, nodes are data and machines while hyper-edges attempt to connect them together. Through such modelling, traditional or extended hypergraph partitioning techniques can be applied to find the best possible partitions.

There can be a trade-off between different parameters or metrics that should be explored by all data placement algorithms irrespectively of the constraint solving technique used. However, such a trade-off is not usually explored, as in most cases only one metric is employed for optimisation.

Granularity constitutes the criterion with the least versatility, as most of the approaches have selected a fine-grained approach for data-to-resource mapping, which is suitable for the Cloud ecosystem.

Real-world applications are dynamic and can have varying load at different points in time. Furthermore, applications can produce additional data which can be used for subsequent computation steps. Thus, data placement should be a continuous process to validate decisions taken at different points in time. However, most approaches in data placement focus mainly on the initial positioning of Big Data and do not interfere with the actual runtime of the applications. There also seems to exist a dependency between this criterion and the fixed data sets one. The majority of the proposed approaches satisfying this criterion also satisfy the fixed data set one. This looks like a logical outcome, as dynamicity is highly correlated with the need to better handle some inherent data characteristics. Further, a large volume of intermediate data can also have a certain gravity effect that could resemble the one concerning fixed data.

The multi-application criterion is hardly supported at all. This can be due to the following facts: (i) multi-application support can increase the complexity and the size of the problem; (ii) it can also impact the solution quality and solution time, which can be undesirable especially for approaches that already supply sub-optimal solutions.

Only the approach in [78] caters for data growth via reserving additional space in already allocated nodes based on statically specified margins. However, such an approach is static in nature and faces two unmet challenges: the support for dynamic data growth monitoring, suitable especially in cases where data can grow fast, and dynamic storage capacity determination, through, e.g., data growth prediction, for better supporting proactive data allocation. However, if we consider all dynamicity criteria together, we can nominate the approach in [78] as the one with the highest level of dynamicity, which is another indication of why it can be considered prominent.

Data replication has been widely researched in the context of distributed systems but has not been extensively employed in data placement. Thus, we believe that there exists a research gap here, especially as those few approaches (such as SWORD [98], Kosar [81], Kayoor [99] and BitDew [94]) that do support replication still lack suitable details or rely on very simple policies driven by user input.

We can observe that the minimisation of the data transfer number or cost is a well-accepted optimisation criterion. Furthermore, data partitioning related criteria, such as skew factor and cut weight, have been mostly employed in the hypergraph-based methods. In some cases, we can also see multiple criteria being considered, which are (i) either reduced to an overall one or (ii) not handled through any kind of optimisation but just considered in terms of policies that should be enforced. Overall, we are not impressed by the performance of the state of the art in this comparison criterion, so there is huge room for potential improvement here.

Finally, many of the methods also consider additional input to achieve a better solution. The most common extra information exploited is data access patterns and node (VM or PM) profiles to, e.g., inspect their (data) processing speed. However, while both are important, usually only one of these two is exploited in these methods.

Lessons learned and future research directions To conclude our survey, in this section we discuss the issues of the current state of the art and the research gaps and opportunities related to data storage and placement. Further, we also supply research directions towards a complete DLMS in the Big Data-Cloud ecosystem.

Data lifecycle management Challenges and issues This subsection refers to how the discussed data storage and placement challenges can be combined and viewed from the perspective of a holistic DLMS of the future. Such a DLMS should be able to cope with the optimal data storage and placement in a way that considers the Big Data processing required, along with the functional and non-functional variability space of the given Cloud resources at hand, in each application scenario. It implies the ability to consider both private and public Clouds, offered by one or several Cloud vendors, according to the specifics of each use case, while making the appropriate decisions on how the data should be stored, placed, processed and eventually managed. Considering only the cross-Cloud application deployment for fully exploiting the benefits of the Cloud paradigm neglects the important challenge of data-awareness. This data-awareness refers to the need to support an application deployment process that considers the locations of data sources, their volume and velocity characteristics, as well as any security and privacy constraints applicable. Of course, from the DLM perspective, this means that there should also be a consideration of the dependencies between application components and all data sources. This has the reasonable implication that the components requiring frequent accesses to data artefacts, found at rest in certain data stores, cannot be placed in a different Cloud or even at a certain physical and network distance from the actual storage location. If such aspects are ignored then application performance certainly degrades, as expensive data migrations may be incurred while legislation conformance issues might be applicable.

Future research directions Among the most prominent research directions, we highlight the design and implementation of a holistic DLMS, able to cope with all of the above-mentioned aspects of data management, while employing the appropriate strategies for benefiting from the multi-Cloud paradigm. It is important to note that data placement in virtualized resources is generally subject to long-term decisions, as any potential data migrations generally incur immense costs, which may be amplified by data gravity aspects that may result in subsequent changes in the application placement. Based on this, we consider the following aspects that should sketch the main functionality of the DLMS of the future that is able to cope with Big Data management and processing by really taking advantage of the abundance of resources in the Cloud computing world:

• Use of advanced modelling techniques that consider metadata schemas for setting the scope of truly exploitable data modelling artefacts. This refers to managing the modelling task in a way that covers the description of all V's (e.g. velocity, volume, value, variety, and veracity) in the characteristics of the Big Data to be processed. Proper and multi-dimensional data modelling will allow for an adequate description of the data placement problem.
• Perform optimal data placement across multiple Cloud resources based on the data modelling and user-defined goals, requirements and constraints.
• Use of efficient distributed monitoring functionalities for observing the status of the Big Data stored or processed and detecting any migration or reconfiguration opportunities.
• Employ the appropriate replication, fail-over and backup techniques by considering and exploiting at the same time the functionalities already offered by public Cloud providers.
• According to such opportunities, continuously make reconfiguration and migration decisions by consistently considering the real penalty for the overall application reconfiguration, always in sight of the user constraints, goals and requirements that should drive the configuration of computational resources and the scheduling of application tasks.
• Design and implement security policies in order to guarantee that certain regulations (e.g., the General Data Protection Regulation) are constantly and firmly respected (e.g., data artefacts should not be stored or processed outside the European Union), while at the same time the available Cloud providers' offerings are exploited according to the data owners' privacy needs (e.g., exploit a data sanitization service when migrating or just removing data from a certain Cloud provider).

Data storage In this section, we highlight the challenges for holistic data lifecycle management with respect to both current DBMSs and DFSs and propose future research directions to overcome these challenges.

Challenges and issues In the recent decade, the DBMS landscape has significantly evolved with respect to the data models and supported non-functional features, driven by Big Data and the related requirements of Big Data applications (see "Non-functional data management features" section). The resulting heterogeneous DBMS landscape provides a lot of new opportunities for Big Data management while it simultaneously imposes new challenges as well. The variety of data models offers domain-specific solutions for different kinds of data structures. Yet, the vast number of existing DBMSs per data model leads to a complex DBMS selection process. Hereby, the functional features of potential DBMSs need to be carefully evaluated (e.g., NoSQL DBMSs do not offer a common query interface even within the same data model). For the non-functional features, the decision process is twofold: (i) a qualitative analysis (as carried out in "Comparison of selected DBMSs" section) should be conducted to narrow down the potential DBMSs; (ii) quantitative evaluations should be performed over the major non-functional features based on existing evaluation frameworks. While collecting data from many distributed and diverse data sources is a challenge [8], modern Big Data applications are typically built upon multiple different data structures. Consequently, current DBMSs cater for domain-specific data structures due to the variety of data models supported (as shown in our analysis in Table 2). However, exploiting the variety of data models typically leads to the integration of multiple different DBMSs in modern Big Data applications. Consequently, the operation of a DBMS needs to be abstracted to ease the integration of different DBMSs into Big Data applications and to fully exploit the required features (such as scalability or elasticity). Hereby, research approaches in Cloud-based application orchestration can be exploited [101, 102]. While the current DBMS landscape already moves towards the Big Data domain, the optimal operation of large-scale or even geo-distributed DBMSs still remains a challenge, as the non-functional features significantly differ for different DBMSs (especially when using Cloud resources [42, 61, 103]).

In general, a DFS provides scalability, network transparency, fault tolerance, concurrent data (I/O) access, and data protection [104]. It is worth noting that in the Big Data domain, scalability must be achieved without increasing the degree of replication of the stored data (particularly for the Cloud ecosystem when combined with private/local data storage systems). The storage system must increase user data availability but not the overheads. Resource sharing is a complex task, and its severity can increase manyfold while managing Big Data. In today's Cloud ecosystem, we lack a single/unified model that offers a single interface to connect multiple Cloud-based storage models (such as Amazon S3 objects) and DFSs. Apart from that, synchronization in DFSs is also a well-known issue, and as the degree of data access concurrency is increasing, synchronization could certainly become a performance bottleneck. Moreover, in some cases, it has also been observed that the performance of DFSs is low compared to local file systems [105, 106]. Furthermore, network transparency is also a crucial process related to performance, especially while handling Big Data (because the Big Data is now distributed across multiple Clouds). Although most DFSs use the transmission control protocol or the user datagram protocol during the communication process, a smarter way needs to be devised. In DFSs, fault-tolerance is achieved by lineage, checkpoints, and replicating metadata (and data objects) [104]. While stateless DFSs have fewer overheads regarding managing the file states when reconnecting after failures, the stateful approach is also in use. For DFSs, failures must be handled very fast and seamlessly across the Big Data management infrastructure. On the other side, there is no well-accepted approach to data access optimization; methods such as data locality and multi-level caches are used case by case. Finally, securing the data in the DFS-Cloud ecosystem is a challenge due to the interconnection of so many diverse hardware as well as software components.

Future research directions To address the identified challenges for data storage in Big Data lifecycle management, novel Big Data-centric evaluations are required that ease the selection and operation of large-scale DBMSs.

• The growing domain of hybrid transaction/analytical processing workloads needs to be considered for the existing data models. Moreover, comparable benchmarks for different data models need to be established [107] and qualitative evaluations need to be performed across all data model domains as well.
• To select an optimal combination of a distributed DBMS and Cloud resources, evaluation frameworks across different DBMS, Cloud resource and workload domains are required [108]. Such frameworks ease the DBMS selection and operation for Big Data lifecycle management.
• Holistic DBMS evaluation frameworks are required to enable the qualitative analysis across all non-functional features in a comparable manner. In order to achieve this, frameworks need to support complex DBMS adaptation scenarios, including scaling and failure injection.
• DBMS adaptation strategies need to be derived and integrated into the orchestration frameworks to enable the automated operation (to cope with workload fluctuations) of a distributed DBMS.
• Qualitative DBMS selection guidelines need to be extended with respect to the operational and adaptation features of current DBMSs (i.e., support for orchestration frameworks to enable automated operation and adaptation, and integration support into Big Data frameworks).

Similar to the above research directions for DBMSs, we also mention below the research directions for DFSs.

• For efficient resource sharing among multiple Cloud service providers/components, a single/unified interface must handle the complex issues, such as seamless workload distribution, improved data access experience and faster read-write synchronizations, together with an increased level of data serialization for DFSs.
• We also advocate the use of smarter replica-assignment policies to achieve better workload balance and efficient storage space management.
• To counter the synchronization issue in DFSs, a generic solution could be to cache the data on the client or on the local server's side, but such an approach can become a bottleneck for the Big Data management scenario as well. Thus, exploratory research must be done in this direction.
• As data diversity and network heterogeneity are increasing, an abstract communication layer must be in place to address the issue of network transparency. Such an abstraction can handle different types of communications easily and efficiently.
• Standard security mechanisms are in place (such as ACLs) for data security. However, after the Cloudification of the file system, the data become more vulnerable due to the interconnection of diverse distributed, heterogeneous computing components. Thus, proper security measures must be built-in features of tomorrow's DFSs.

Data placement The following data placement challenges and corresponding research directions are in line with our analysis in "Comparative evaluation" section.

Challenges and issues Fixed data set size We have observed data placement methods able to fix the location of data sets based on respective (privacy) regulations, laws or user requirements. Such requirements indicate that data placement should be restrained within a certain country, a set of countries or even continents. However, these kinds of semi-fixed constraints are handled in a rather static way by already pre-selecting the right place for such data sets.

Constraint solving Exhaustive solution techniques are efficient in reaching optimal solutions but suffer from scalability issues and higher execution times (especially for medium/big-sized problem instances). On the other hand, meta-heuristics (such as PSO) seem more promising as they can produce near-optimal solutions faster while also achieving better scalability. However, they need proper configuration and modelling, which can be a time-consuming task, while it is not always guaranteed that near-optimal solutions can be produced.
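As a rough indication of how a meta-search procedure trades optimality for scalability, the sketch below applies a simple first-improvement local search to the transfer-count objective introduced in the formal definition. It is a didactic stand-in under stated assumptions (the toy instance is invented and capacity constraints are omitted for brevity); it is neither PSO nor an implementation of any surveyed method.

```python
import random


def local_search(datasets, elements, tasks, steps=200, seed=0):
    """First-improvement hill climbing: repeatedly move one dataset to another
    element and keep the move only if it lowers the number of transfers."""
    rng = random.Random(seed)
    assign = {d: rng.choice(elements) for d in datasets}

    def transfers(a):
        # Pairs of task inputs placed on different elements force a transfer.
        return sum(a[x] != a[y]
                   for inputs in tasks
                   for i, x in enumerate(inputs)
                   for y in inputs[i + 1:])

    for _ in range(steps):
        moved = rng.choice(datasets)
        candidate = dict(assign, **{moved: rng.choice(elements)})
        if transfers(candidate) < transfers(assign):
            assign = candidate
    return assign


# Invented toy instance: three datasets, two elements, two tasks.
print(local_search(["d1", "d2", "d3"], ["v1", "v2"],
                   [("d1", "d2"), ("d2", "d3")]))
```

Unlike exhaustive enumeration, the number of evaluated candidates is bounded by the step budget, which is exactly the scalability-versus-optimality trade-off discussed above.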

Granularity Most of the evaluated methods support a fine-grained approach for dataset placement. However, all such methods consider that resources are fixed in number. Such assumptions are inflexible in sight of the following issues: (i) a gradual data growth can saturate the resources assigned to data; in fact, a whole private storage infrastructure could be saturated for this reason; (ii) data should be flexibly (re-)partitioned to tackle the workload variability.

Multiple applications Only three of the evaluated methods (see Table 4) can handle multiple applications, and only in a very limited fashion. Such handling is challenging, especially when different applications come with conflicting requirements. It must also be dynamic due to the changes brought by application execution as well as other factors (e.g., application requirement and Cloud infrastructure changes).

Data growth Data sets can grow over time. Only one method [78] in the previous analysis is able to handle the data size change. It employs a threshold-based approach to check when data needs to be moved or when resources are adequate for storing the data to also handle their growth. However, no detailed explanation is supplied concerning how the threshold is computed.

Data replication It is usually challenging to find the best possible trade-off between cost and replication degree to enable cost-effective data replication.

Optimisation criteria Data transfer and replication management is a complex process [109] due to the completely distributed nature of the Cloud ecosystem. It gets further complicated due to unequal data access speeds. The data transfer number or cost is a well-accepted criterion for optimising data placement. However, it can also be quite restrictive. First, there can be cases where both of these metrics need to be considered. For instance, suppose that we need to place two datasets, initially situated in one VM, to other VMs as this VM will soon become unavailable. If we just consider the transfer number, this can lead to the situation where the movement is performed in an arbitrary way, even migrating data to another DC while there is certainly space in the current one. In the opposite direction, there can be cases where the cost could be minimised but this could lead to an increased number of transfers, which could impact application performance. Second, data placement has been mainly seen in an isolated manner without examining user requirements. However, it can greatly affect application performance and cost.

Additional information Apart from extracting data access patterns and node profiles, we believe that more information is needed for a better data placement solution.

Future research directions

• Fixed data set size: To guarantee the true, optimal satisfaction of the user requirements and optimisation objectives, we suggest the use of semi-fixed constraints in a more suitable and flexible manner, as a respective non-static part of the location-aware optimisation problem to be solved.
• Constraint solving: We propose the use of hybrid approaches (i.e., combining exhaustive and meta-search heuristic techniques) so as to rapidly obtain (within an acceptable and practically employable execution time) optimal or near-optimal results in a scalable fashion. For instance, constraint programming could be combined with local search. The first could be used to find a good initial solution, while the latter could be used for neighbourhood search to find a better result. In addition, it might be possible that a different and more scalable modelling of the optimisation problem could enable running standard exhaustive solution techniques even with medium-sized problem instances. Finally, solution learning from history could be adopted to fix parts of the optimisation problem and thus substantially reduce the solution space to be examined.
• Granularity: There is a need for dynamic approaches to data placement which take into account the workload fluctuation and the data growth to both partition data and optimally place them in a set of resources whose size is dynamically identified.
• Multiple applications: To handle conflicting application requirements and the dynamicity of the context (e.g., changes of infrastructure and application requirements), different techniques to solve the (combined) optimisation problem are required. First, soft constraints could be used to solve this problem, even if it is over-constrained (e.g., producing a solution that violates the least number of these preferences). Next, we could prioritise the applications and/or their tasks. Third, distributed solving techniques could be used to produce application-specific optimisation problems of reduced complexity. This would require a transformation of the overall problem into sub-problems which retains as much as possible the main constraints and requirements of each relevant application. Finally, complementary to these distributed solving techniques, the measure of replication could also be employed. By using such a measure, we enable each application to operate over its own copy of the originally shared data. This could actually enable complete independence of applications, which would then allow us to solve data placement individually for each of these applications.
• Data growth: There is a need to employ a more sophisticated approach which exploits the data (execution) history as well as data size prediction and data (type) similarity techniques to solve the data growth issue. Similarity can be learned by knowing the context of data (e.g., by assuming the same context has been employed for similar data over time by multiple users), while statistical methods can predict the data growth. Such an approach can also be used for new data sets for which no prior knowledge exists (known as the cold-start problem).
• Data replication: For data replication, we suggest to dynamically compute the replication degree by considering the application size, data size, data access pattern, data growth rate, user requirements, and the capabilities of Cloud services. Such a solution could also rely on a weight calculation method for the determination of the relative importance of each of these factors; a minimal illustration of such a weighting is sketched after this list.
• Optimisation criteria: An interesting research direction consists in exploring ways in which data placement and task scheduling could be solved either in conjunction or in a clever but independent way such that they take into account the same set of (high-level) user requirements. This could lead to producing solutions which are in concert and also optimal according to both aspects of data and computation.
• Additional information: We advocate that the additional information required to be collected or derived includes: (i) co-locating frequently accessing tasks and data; (ii) exploiting data dependencies to achieve effective data partitioning. A similar approach is employed by Wang et al. [83], where data are grouped together at a finer granularity. There are also precautions to not store different data blocks from the same data in the same node; (iii) data variability: data can be of different forms. Each form might require a different machine configuration for optimal storage and processing. In this case, profiling should be extended to also capture this kind of machine performance variation, which could be quite beneficial for a more data-form-focused placement. In fact, we see that whole approaches are dedicated to dealing with different data forms. For instance, graph analytics-oriented data placement algorithms exploit the fact that data are stored in the form of graphs to more effectively select the right techniques and algorithms for solving the data placement problem. While special-purpose approaches might be suitable for different data forms, they are not the right choice for handling different kinds of data. As such, we believe that an important future direction should be the ability to more optimally handle data of multiple forms so as to enhance the applicability of a data placement algorithm and make it suitable for handling different kinds of applications instead of a single one.
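One possible reading of the weight-based replication suggestion in the list above is sketched below: a hypothetical scoring function combines normalised factors into an integer replication degree. The factor names, weights and bounds are assumptions chosen purely for illustration and do not constitute a validated policy.

```python
def replication_degree(factors, weights, min_repl=1, max_repl=5):
    """Map weighted, normalised factors (each in [0, 1]) to an integer
    replication factor within configured bounds."""
    score = sum(weights[name] * value for name, value in factors.items())
    total = sum(weights[name] for name in factors)
    degree = min_repl + (max_repl - min_repl) * score / total
    return max(min_repl, min(max_repl, round(degree)))


# Hypothetical factor values for one dataset (all names are assumptions).
factors = {"access_rate": 0.8, "growth_rate": 0.4,
           "criticality": 0.9, "provider_capability": 0.7}
weights = {"access_rate": 3, "growth_rate": 1,
           "criticality": 4, "provider_capability": 2}
print(replication_degree(factors, weights))
```

In practice, the weights themselves would have to be derived from user requirements and cost models rather than fixed constants, which is precisely the open question raised in the corresponding bullet above.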

Concluding remarks The primary aim of this survey is to provide a holistic overview of the state of the art related to both data storage and placement in the Cloud ecosystem. We acknowledge that there do exist some surveys on various aspects of Big Data, which focus on the functional aspect and mainly on Big Data storage issues. However, this survey plays a complementary role with respect to them. In particular, we cover multiple parts of the Big Data management architecture (such as DLM, data storage systems and data placement techniques), which were neglected in the other surveys, under the prism of non-functional properties. Further, our contribution on Big Data placement is quite unique. In addition, the in-depth analysis in each main article section is covered by a well-designed set of evaluation criteria. Such an analysis also assists in a better categorization of the respective approaches (or technologies involved in each part). Our survey enables readers to better understand which solution could be utilized under which non-functional requirements, thus assisting towards the construction of user-specific Big Data management systems according to the non-functional requirements posed. Subsequently, we have described relevant challenges that can pave the way for the proper evolution of such systems in the future. Each challenge prescribed in "Lessons learned and future research directions" section has been drawn from the conducted analysis. Lastly, we have supplied a set of interesting and emerging future research directions concerning both the functionalities related to Big Data management (i.e., Big Data storage and placement), as well as Big Data lifecycle management as a whole, in order to address the identified challenges.

Abbreviations
ACL: access control list; BDaaS: Big Data-as-a-service; BEA: bond energy algorithm; BI: business intelligence; CDM: cognitive data management; DaaS: data-as-a-service; DBaaS: database-as-a-service; DCs: data centers; DDBMS: distributed database management system; DFS: distributed file system; DLMS: data lifecycle management system; DMS: data management system; HA: high availability; HDFS: Hadoop distributed file system; IoT: internet-of-things; NFS: Network File System; PaaS: platform-as-a-service; PSO: particle swarm optimization; PVFS: parallel virtual file system; QoS: quality of service; SLR: systematic literature review; STaaS: storage-as-a-service.

Authors' contributions
"Introduction" is contributed by SM, DS, KK and YV; "Data lifecycle management (DLM)" is contributed by SM, KK and YV; "Methodology" is contributed by KK and SM; "Non-functional data management features" is contributed by DS and SM; "Data storage systems" is contributed by DS, SM and YV; "Data placement techniques" is contributed by KK and SM; and "Lessons learned and future research directions" is contributed by YV, SM, DS and KK. All authors read and approved the final manuscript.

Author details
1 Simula Research Laboratory, 1325 Lysaker, Norway. 2 Ulm University, Ulm, Germany. 3 ICS-FORTH, Heraklion, Crete, Greece. 4 Institute of Communication and Computer Systems (ICCS), 9 Iroon Polytechniou Str., Athens, Greece.

Acknowledgements
The research leading to this survey paper has received funding from the European Union's Horizon 2020 research and innovation programme under Grant Agreement No. 731664. The authors would like to thank the partners of the MELODIC project (http://www.melodic.cloud/) for their valuable advice and comments.

Competing interests
The authors declare that they have no competing interests.

Availability of data and materials
Not applicable.

Funding
This work is generously supported by the MELODIC project (Grant Number 731664) of the European Union H2020 program.

Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Received: 4 October 2018 Accepted: 22 January 2019

References
1. Khan N, Yaqoob I, Hashem IAT, et al. Big data: survey, technologies, opportunities, and challenges. Sci World J. 2014;2014:712826.
2. Kaisler S, Armour F, Espinosa JA, Money W. Big data: issues and challenges moving forward. In: System sciences (HICSS), 2013 46th Hawaii international conference on, IEEE. 2013. pp. 995–1004.
3. Zheng Z, Zhu J, Lyu MR. Service-generated big data and big data-as-a-service: an overview. In: Big Data (BigData Congress), 2013 IEEE international congress on, IEEE. 2013. pp. 403–10.
4. Chen M, Mao S, Liu Y. Big data: a survey. Mob Netw Appl. 2014;19(2):171–209.
5. Inukollu VN, Arsi S, Ravuri SR. Security issues associated with big data in cloud computing. Int J Netw Secur Appl. 2014;6(3):45.
6. Wang C, Wang Q, Ren K, Lou W. Privacy-preserving public auditing for data storage security in cloud computing. In: Infocom, 2010 proceedings IEEE, IEEE. 2010. pp. 1–9.
7. Chaudhuri S. What next?: a half-dozen data management research goals for big data and the cloud. In: PODS, Scottsdale, AZ, USA. 2012. pp. 1–4.
8. Najafabadi MM, Villanustre F, Khoshgoftaar TM, Seliya N, Wald R, Muharemagic E. Deep learning applications and challenges in big data analytics. J Big Data. 2015;2(1):1.
9. Verma D. Supporting service level agreements on IP networks. Indianapolis: Macmillan Technical Publishing; 1999.
10. Sakr S, Liu A, Batista DM, Alomari M. A survey of large scale data management approaches in cloud environments. IEEE Commun Surv Tutor. 2011;13(3):311–36.
11. Wu L, Yuan L, You J. Survey of large-scale data management systems for big data applications. J Comput Sci Technol. 2015;30(1):163.
12. Oussous A, Benjelloun FZ, Lahcen AA, Belfkih S. Big data technologies: a survey. J King Saud Univ Comput Inf Sci. 2017;30(4):431–48.
13. Grolinger K, Higashino WA, Tiwari A, Capretz MA. Data management in cloud environments: NoSQL and NewSQL data stores. J Cloud Comput Adv Syst Appl. 2013;2(1):22.
14. Zhang H, Chen G, Ooi BC, Tan KL, Zhang M. In-memory big data management and processing: a survey. IEEE Trans Knowl Data Eng. 2015;27(7):1920–48.
15. Hashem IAT, Yaqoob I, Anuar NB, Mokhtar S, Gani A, Khan SU. The rise of "big data" on cloud computing: review and open research issues. Inf Syst. 2015;47:98–115.
16. Ball A. Review of data management lifecycle models. Bath: University of Bath; 2012.
17. Demchenko Y, de Laat C, Membrey P. Defining architecture components of the big data ecosystem. In: International conference on collaboration technologies and systems. 2014. pp. 104–12.
18. Pääkkönen P, Pakkala D. Reference architecture and classification of technologies, products and services for big data systems. Big Data Res. 2015;2(4):166–86.
19. NBD-PWG. NIST big data interoperability framework: volume 2, big data taxonomies. Tech. rep., NIST, USA 2015. Special Publication 1500-2.
20. Organisation for Economic Co-operation and Development. Data-driven innovation: big data for growth and well-being. Paris: OECD Publishing; 2015.
21. Kaufmann M. Towards a reference model for big data management. Research report, University of Hagen. 2016. Retrieved from https://ub-deposit.fernuni-hagen.de/receive/mir_mods_00000583. Retrieved 15 July 2016.
22. Höfer C, Karagiannis G. Cloud computing services: taxonomy and comparison. J Internet Serv Appl. 2011;2(2):81–94.
23. Baru C, Moore R, Rajasekar A, Wan M. The SDSC storage resource broker. In: CASCON first decade high impact papers, IBM Corp.; 2010. pp. 189–200.

24. Chasen JM, Wyman CN. System and method of managing metadata data 2004. US Patent 6,760,721. 25. Gómez A, Merseguer J, Di Nitto E, Tamburri DA. Towards a uml profle for data intensive applications. In: Proceed- ings of the 2nd international workshop on quality-aware DevOps, ACM. 2016. pp. 18–23. 26. Verginadis Y, Pationiotakis I, Mentzas G. Metadata schema for data-aware multi-cloud computing. In: Proceedings of the 14th international conference on INnovations in Intelligent SysTems and Applications (INISTA). IEEE. 2018. 27. Binz T, Breitenbücher U, Kopp O, Leymann F. Tosca: portable automated deployment and management of cloud applications. In: Advanced web services. Springer; 2014. pp. 527–49. 28. Kritikos K, Domaschka J, Rossini A. Srl: a scalability rule language for multi-cloud environments. In: Cloud comput- ing technology and science (CloudCom), 2014 IEEE 6th international conference on, IEEE. 2014. pp. 1–9. 29. Rossini A, Kritikos K, Nikolov N, Domaschka J, Griesinger F, Seybold D, Romero D, Orzechowski M, Kapitsaki G, Achilleos A. The cloud application modelling and execution language (camel). Tech. rep., Universität Ulm 2017. 30. Das S, Nishimura S, Agrawal D, El Abbadi A. Albatross: lightweight elasticity in shared storage databases for the cloud using live data migration. Proc VLDB Endow. 2011;4(8):494–505. 31. Lu C, Alvarez GA, Wilkes J. Aqueduct: online data migration with performance guarantees. In: Proceedings of the 1st USENIX conference on fle and storage technologies, FAST ’02. USENIX Association 2002. 32. Stonebraker M, Devine R, Kornacker M, Litwin W, Pfefer A, Sah A, Staelin C. An economic paradigm for query processing and data migration in mariposa. In: Parallel and distributed information systems, 1994., proceedings of the third international conference on, IEEE. 1994. pp. 58–67. 33. Brubeck DW, Rowe LA. Hierarchical storage management in a distributed VOD system. IEEE Multimedia. 1996;3(3):37–47. 34. Kitchenham BA, Pfeeger SL, Pickard LM, Jones PW, Hoaglin DC, Emam KE, Rosenberg J. Preliminary guidelines for empirical research in software engineering. IEEE Trans Softw Eng. 2002;28(8):721–34. 35. Gessert F, Wingerath W, Friedrich S, Ritter N. NoSQL database systems: a survey and decision guidance. Comput Sci Res Dev. 2017;32(3–4):353–65. 36. Sakr S. Cloud-hosted databases: technologies, challenges and opportunities. Clust Comput. 2014;17(2):487–502. 37. Cattell R. Scalable SQL and NoSQL data stores. Acm Sigmod Rec. 2011;39(4):12–27. 38. Gray J. Database and transaction processing performance handbook. In: The benchmark handbook for database and transaction systems. 2nd ed. Digital Equipment Corp. 1993. 39. Traeger A, Zadok E, Joukov N, Wright CP. A nine year study of fle system and storage benchmarking. ACM Trans Storage. 2008;4(2):5. 40. Agrawal D, El Abbadi A, Das S, Elmore AJ. Database scalability, elasticity, and autonomy in the cloud. In: Interna- tional conference on database systems for advanced applications. Springer. 2011. pp. 2–15. 41. Séguin C, Le Mahec G, Depardon B. Towards elasticity in distributed fle systems. In: Cluster, cloud and grid com- puting (CCGrid), 2015 15th IEEE/ACM international symposium on, IEEE. 2015. pp. 1047–56. 42. Seybold D, Wagner N, Erb B, Domaschka J. Is elasticity of scalable databases a myth? In: Big Data (Big Data), 2016 IEEE international conference on, IEEE. 2016. pp. 2827–36. 43. Gilbert S, Lynch N. Brewer’s conjecture and the feasibility of consistent, available, partition-tolerant web services. Acm Sigact News. 
2002;33(2):51–9. 44. Bermbach D, Kuhlenkamp J. Consistency in distributed storage systems. In: Networked systems. Springer. 2013. pp. 175–89. 45. Lakshman A, Malik P. Cassandra: a decentralized structured storage system. ACM SIGOPS Oper Syst Rev. 2010;44(2):35–40. 46. Chang F, Dean J, Ghemawat S, Hsieh WC, Wallach DA, Burrows M, Chandra T, Fikes A, Gruber RE. Bigtable: a distrib- uted storage system for structured data. ACM Trans Comput Syst. 2008;26(2):4. 47. Pavlo A, Aslett M. What’s really new with newsql? ACM Sigmod Rec. 2016;45(2):45–55. 48. Corbellini A, Mateos C, Zunino A, Godoy D, Schiafno S. Persisting big-data: the NoSQL landscape. Inf Syst. 2017;63:1–23. 49. Davoudian A, Chen L, Liu M. A survey on NoSQL stores. ACM Comput Surv. 2018;51(2):40. 50. Jensen SK, Pedersen TB, Thomsen C. Time series management systems: a survey. IEEE Trans Knowl Data Eng. 2017;29(11):2581–600. 51. Bader A, Kopp O, Falkenthal M. Survey and comparison of open source time series databases. In: BTW (Work- shops). 2017. pp. 249–68. 52. Abadi JD. Data management in the cloud: limitations and opportunities. IEEE Data Eng Bull. 2009;32:3–12. 53. Pritchett D. Base: an acid alternative. Queue. 2008;6(3):48–55. 54. Codd EF. Extending the database relational model to capture more meaning. ACM Trans Database Syst. 1979;4(4):397–434. 55. Aslett M. How will the database incumbents respond to nosql and newsql. The San Francisco. 2011;451:1–5. 56. Sadalage PJ, Fowler M. NoSQL distilled. 2012. ISBN-10 321826620 57. Seybold D, Hauser CB, Volpert S, Domaschka J. Gibbon: an availability evaluation framework for distributed data- bases. In: OTM confederated international conferences “On the Move to Meaningful Internet Systems”. Springer. 2017. pp. 31–49. 58. Seybold D, Domaschka J. Is distributed database evaluation cloud-ready? In: Advances in databases and informa- tion systems. Springer. 2017. pp. 100–8. 59. Barahmand S, Ghandeharizadeh S. BG: a benchmark to evaluate interactive social networking actions. In: CIDR. Citeseer. 2013. 60. Cooper BF., Silberstein A, Tam E, Ramakrishnan R, Sears R. Benchmarking cloud serving systems with ycsb. In: Proceedings of the 1st ACM symposium on Cloud computing, ACM. 2010. pp. 143–54. 61. Kuhlenkamp J, Klems M, Röss O. Benchmarking scalability and elasticity of distributed database systems. Proc VLDB Endow. 2014;7(12):1219–30. 62. Bermbach D, Tai S. Benchmarking eventual consistency: lessons learned from long-term experimental studies. In: Cloud engineering (IC2E), 2014 IEEE international conference on, IEEE. 2014. pp. 47–56. Mazumdar et al. J Big Data (2019) 6:15 Page 36 of 37

63. Domaschka J, Hauser CB, Erb B. Reliability and availability properties of distributed database systems. In: Enter- prise distributed object computing conference (EDOC), 2014 IEEE 18th international, IEEE. 2014. pp. 226–33. 64. Brewer E. Cap twelve years later: How the “rules” have changed. Computer. 2012;45(2):23–9. 65. Klems M, Bermbach D, Weinert R. A runtime quality measurement framework for cloud database service systems. In: Quality of information and communications technology (QUATIC), 2012 eighth international conference on the, IEEE. 2012. pp. 38–46. 66. Abadi D, Agrawal R, Ailamaki A, Balazinska M, Bernstein PA, Carey MJ, Chaudhuri S, Chaudhuri S, Dean J, Doan A. The beckman report on database research. Commun ACM. 2016;59(2):92–9. 67. Group NBDPW, et al. Nist big data interoperability framework. Special Publication 2015. pp. 1500–6. 68. Kachele S, Spann C, Hauck FJ, Domaschka J. Beyond iaas and paas: an extended cloud taxonomy for computa- tion, storage and networking. In: Utility and cloud computing (UCC), 2013 IEEE/ACM 6th international confer- ence on, IEEE. 2013. pp. 75–82. 69. Levy E, Silberschatz A. Distributed fle systems: concepts and examples. ACM Comput Surv. 1990;22(4):321–74. 70. Muthitacharoen A, Morris R, Gil TM, Chen B. Ivy: a read/write peer-to-peer fle system. ACM SIGOPS Oper Syst Rev. 2002;36(SI):31–44. 71. Ross RB, Thakur R, et al. PVFS: a parallel fle system for Linux clusters. In: Proceedings of the 4th annual Linux showcase and conference. 2000. pp. 391–430. 72. Yu B, Pan J. Sketch-based data placement among geo-distributed datacenters for cloud storages. In: INFO- COM, San Francisco: IEEE. 2016. pp. 1–9. 73. Greene WS, Robertson JA. Method and system for managing partitioned data resources. 2005. US Patent 6,922,685. 74. Greenberg A, Hamilton J, Maltz DA, Patel P. The cost of a cloud: research problems in data center networks. ACM SIGCOMM Comput Commun Rev. 2008;39(1):68–73. 75. Hardavellas N, Ferdman M, Falsaf B, Ailamaki A. Reactive nuca: near-optimal block placement and replication in distributed caches. ACM SIGARCH Comput Archit News. 2009;37(3):184–95. 76. Kosar T, Livny M. Stork: making data placement a frst class citizen in the grid. In: Distributed computing sys- tems, 2004. Proceedings. 24th international conference on, IEEE. 2004. pp. 342–9. 77. Xie T. Sea: a striping-based energy-aware strategy for data placement in raid-structured storage systems. IEEE Trans Comput. 2008;57(6):748–61. 78. Yuan D, Yang Y, Liu X, Chen J. A data placement strategy in scientifc cloud workfows. Future Gener Comput Syst. 2010;26(8):1200–14. 79. Doraimani S, Iamnitchi A. File grouping for scientifc data management: lessons from experimenting with real traces. In: Proceedings of the 17th international symposium on High performance distributed computing, ACM. 2008. pp. 153–64. 80. Cope JM, Trebon N, Tufo HM, Beckman P. Robust data placement in urgent computing environments. In: Paral- lel & distributed processing, 2009. IPDPS 2009. IEEE international symposium on, IEEE. 2009. pp. 1–13. 81. Kosar T, Livny M. A framework for reliable and efcient data placement in distributed computing systems. J Parallel Distrib Comput. 2005;65(10):1146–57. 82. Bell DA. Difcult data placement problems. Comput J. 1984;27(4):315–20. 83. Wang J, Shang P, Yin J. Draw: a new data-grouping-aware data placement scheme for data intensive applica- tions with interest locality. IEEE Trans Magnetic. 2012;49(6):2514–20. 84. McCormick W, Schweitzer P, White T. 
Problem decomposition and data reorganisation by a clustering tech- nique. Oper Res. 1972;20:993–1009. 85. Ebrahimi M, Mohan A, Kashlev A, Lu S. BDAP: a Big Data placement strategy for cloud-based scientifc work- fows. In: BigDataService, IEEE computer society. 2015. pp. 105–14. 86. Papaioannou TG, Bonvin N, Aberer K. Scalia: an adaptive scheme for efcient multi-cloud storage. In: Proceed- ings of the international conference on high performance computing, networking, storage and analysis. IEEE Computer Society Press. 2012. p. 20. 87. Er-Dun Z, Yong-Qiang Q, Xing-Xing X, Yi C. A data placement strategy based on genetic algorithm for scientifc workfows. In: CIS, IEEE computer society. 2012. pp. 146–9. 88. Rafque A, Van Landuyt D, Reniers V., Joosen W. Towards an adaptive middleware for efcient multi-cloud data storage. In: Proceedings of the 4th workshop on CrossCloud infrastructures & platforms, Crosscloud’17. 2017. pp. 1–6. 89. Lan K, Fong S, Song W, Vasilakos AV, Millham RC. Self-adaptive pre-processing methodology for big data stream mining in internet of things environmental sensor monitoring. Symmetry. 2017;9(10):244. 90. Zhang J, Chen J, Luo J, Song A. Efcient location-aware data placement for data-intensive applications in geo- distributed scientifc data centers. Tsinghua Sci Technol. 2016;21(5):471–81. 91. Hsu CH, Slagter KD, Chung YC. Locality and loading aware virtual machine mapping techniques for optimizing communications in mapreduce applications. Future Gener Comput Syst. 2015;53:43–54. 92. Xu Q, Xu Z, Wang T. A data-placement strategy based on genetic algorithm in cloud computing. Int J Intell Sci. 2015;5(3):145–57. 93. Chervenak AL, Smith DE, Chen W, Deelman E. Integrating policy with scientifc workfow management for data-intensive applications. In: 2012 SC companion: high performance computing, networking storage and analysis. 2012. pp. 140–9. 94. Fedak G, He H, Cappello F. Bitdew: a programmable environment for large-scale data management and distribution. In: 2008 SC—international conference for high performance computing, networking, storage and analysis. 2008. pp. 1–12. 95. Deelman E, Singh G, Su MH, Blythe J, Gil Y, Kesselman C, Mehta G, Vahi K, Berriman GB, Good J. Pegasus: a framework for mapping complex scientifc workfows onto distributed systems. Sci Program. 2005;13(3):219–37. Mazumdar et al. J Big Data (2019) 6:15 Page 37 of 37

96. Chervenak A, Deelman E, Foster I, Guy L, Hoschek W, Iamnitchi A, Kesselman C, Kunszt P, Ripeanu M, Schwartzkopf B, et al. Giggle: a framework for constructing scalable replica location services. In: Proceedings of the 2002 ACM/ IEEE conference on supercomputing. IEEE computer society press. 2002. pp. 1–17. 97. LeBeane M, Song S, Panda R, Ryoo JH, John LK. Data partitioning strategies for graph workloads on heterogene- ous clusters. In: SC, Austin: ACM; 2015. pp. 1–12. 98. Quamar A, Kumar KA, Deshpande A. Sword: scalable workload-aware data placement for transactional workloads. In: Proceedings of the 16th international conference on extending database technology, EDBT ’13, ACM. 2013. pp. 430–41. 99. Kumar KA, Deshpande A, Khuller S. Data placement and replica selection for improving co-location in distributed environments. CoRR 2012. arXiv​:1302.4168. 100. Catalyurek UV, Kaya K, Uçar B. Integrated data placement and task assignment for scientifc workfows in clouds. In: Proceedings of the Fourth International Workshop on Data-intensive Distributed Computing, New York, NY, USA: ACM; 2011. pp. 45–54. 101. Baur D, Seybold D, Griesinger F, Tsitsipas A, Hauser CB, Domaschka J. Cloud orchestration features: are tools ft for purpose? In: Utility and Cloud Computing (UCC), 2015 IEEE/ACM 8th international conference on, IEEE. 2015. pp. 95–101. 102. Burns B, Grant B, Oppenheimer D, Brewer E, Wilkes J. Borg, omega, and kubernetes. Queue. 2016;14(1):10. 103. Schad J, Dittrich J, Quiané-Ruiz JA. Runtime measurements in the cloud: observing, analyzing, and reducing vari- ance. Proc VLDB Endow. 2010;3(1–2):460–71. 104. Thanh TD, Mohan S, Choi E, Kim S, Kim P. A taxonomy and survey on distributed fle systems. In: Networked com- puting and advanced information management, 2008. NCM’08. Fourth international conference on, vol. 1, IEEE. 2008. pp. 144–9. 105. Ananthanarayanan G, Ghodsi A, Shenker S, Stoica I. Disk-locality in datacenter computing considered irrelevant. In: HotOS. 2011. p. 12. 106. Nightingale EB, Chen PM, Flinn J. Speculative execution in a distributed fle system. In: ACM SIGOPS operating systems review, vol. 39, ACM. 2005. pp. 191–205. 107. Coelho F, Paulo J, Vilaça R, Pereira J, Oliveira R. Htapbench: Hybrid transactional and analytical processing bench- mark. In: Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering, ACM; 2017. pp. 293–304. 108. Seybold D, Keppler M, Gründler D, Domaschka J. Mowgli: Finding your way in the DBMS jungle. In: Proceedings of the 2019 ACM/SPEC international conference on performance engineering. ACM. 2019. 109. Allen MS, Wolski R. The livny and plank-beck problems: studies in data movement on the computational grid. In: Supercomputing, 2003 ACM/IEEE conference, IEEE. 2003. pp. 43. Chapter 9 [core2] Cloud orchestration features: Are tools fit for purpose?

This article is published as follows:

Daniel Baur, Daniel Seybold, Frank Griesinger, Athanasios Tsitsipas, Christopher B. Hauser, and Jörg Domaschka. “Cloud orchestration features: Are tools fit for purpose?” in IEEE/ACM 8th International Conference on Utility and Cloud Computing (UCC), 2015, IEEE, pp. 95–101, DOI: https://doi.org/10.1109/UCC.2015.25.

©2015 IEEE. Reprinted, with permission.

In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of Ulm University’s products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a License from RightsLink.

Cloud Orchestration Features: Are Tools Fit for Purpose?

Daniel Baur, Daniel Seybold, Frank Griesinger, Athanasios Tsitsipas, Christopher B. Hauser, Jörg Domaschka
Institute of Information Resource Management, University of Ulm, Germany
Email: {firstname.lastname,joerg.domaschka}@uni-ulm.de

Abstract—Even though the cloud era has begun almost one based on objective criteria. We also provide a guideline on decade ago, many problems of the first hour are still around. how to grade tools according to the features. (ii) We identify Vendor lock-in and poor tool support hinder users from taking tools that promise to realise some or multiple of the above full advantage of main cloud features: dynamic and scale. This mentioned features. (iii) We rate three tools according to our has given rise to tools that target the seamless management and feature list. orchestration of cloud applications. All these tools promise similar capabilities and are barely distinguishable what makes it hard Based on our feature extraction, it becomes possible to to select the right tool. In this paper, we objectively investigate compare existing and upcoming orchestration tools: What is required and desired features of such tools and give a definition of more, the identified lack of supported features shall drive future them. We then select three open-source tools (Brooklyn, Cloudify, Stratos) and compare them according to the features they support research and development with respect to cloud orchestration. using our experience gained from deploying and operating a The remainder of this document is structured as follows: standard three-tier application. This exercise leads to a fine- Section II introduces the terminology of the document as well grained feature list that enables the comparison of such tools as the methodology of our approach. Section III introduces based on objective criteria as well as a rating of three popular background and basic capabilities of the tools. Section IV cloud orchestration tools. In addition, it leads to the insight that introduces the feature list and does the comparison for each the tools are on the right track, but that further development introduced feature. Section V presents the comparison results and particularly research is necessary to satisfy all demands. from applying the list in tabular I. Section VI discusses related work before we conclude. I.INTRODUCTION The last decade has been dominated by the Cloud in both II.BACKGROUND research and industry. Nevertheless, many problems have been In the following, we briefly introduce the terminology around since the beginning of the cloud era and are sub- that we use throughout the document and further classify the stantially hindering the move forward. These include vendor methodology of our comparison. lock-in, the incomparability of cloud providers with respect to performance per price, the weak adoption of cloud standards from the providers, etc. A. Terminology Particularly, researchers from domains that make use of Even though more advanced terminologies have been pub- clouds such as software engineering and data mining, but lished (e.g. [1]), we follow the well-known NIST standard [2] also software architects and DevOps teams need robust and defining the three service models IaaS, PaaS, and SaaS. powerful mechanisms to bring their applications and inno- vations to the Cloud—ideally to many cloud providers. In We define (cloud) application as a possibly distributed order to perform evaluations and test the elasticity of their application consisting of multiple interlinked application com- application, these users need to be able to seamlessly change ponents. 
For illustration consider a blog application B that may the distribution of their application across multiple cloud consist of the three components load balancer lb, application providers. They also have to be able to quickly change the server together with the business logic as, and a database db. scaling factor of individual components. Similar considerations hold for start-ups whose demands change with the growth of We depict cloud orchestration tool, as a software compo- the business. nent that manages the complete lifecycle of a cloud application. It therefore needs to fulfil the following criteria: (1) appli- In order to satisfy these demands, a powerful and reliable cation specification (definition); (2) deployment of specified cloud orchestration and operation platform is needed. Indeed, applications: the tool has to acquire virtual machines and then there are multiple commercial and open-source tools available distribute component instances over them; (3) collection of that promise to solve above mentioned issues. Yet, it is hard to monitoring data from deployed instances. This data further has find a visible distinguishing factor. Also, there are barely any to be consolidated and aggregated and provided to the opera- sources available that report on the success of using these tools. tors. (4) If application management shall be (semi-)automatic, This document aims at closing this information gap by the the tool has to allow the definition of rules that capture when following contributions: (i) We establish a fine-grained feature to execute which management task (e.g. scaling). We therefore list that enables the comparison of such orchestration tools explicitly exclude pure deployment tools like Chef. B. Methodology and other configurations are represented in a Stratos specific JSON format. For the deployment itself, it solely relies on While the primary goal of this document is to present a feature the DevOps tool Puppet. The application itself is subsequent list that helps rating cloud development tools a secondary cloned from a Git repository. Stratos is installed as one central goal is to show its applicability to existing software tools. controller and in all virtual machines by having a virtual For this purpose we select three tools fulfilling the above 1 2 machine image prepared with the necessary software (Stratos mentioned four criteria: Apache Brooklyn , Cloudify , and and Puppet agents) installed. Apache Stratos3.
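To make the type/template split used by these modelling languages more tangible, the following minimal sketch expresses an application blueprint as plain Python data structures. It is purely illustrative: the field names and the three components (load balancer, application server, database, mirroring the sample application used later in the evaluation) are chosen for this sketch only and do not reproduce the actual DSL syntax of Brooklyn, Cloudify, or Stratos.

"""Illustrative blueprint model: reusable types vs. concrete entities.

Schematic sketch only; it does not reproduce the DSLs of Brooklyn,
Cloudify, or Stratos.
"""

# Abstract, reusable component types: they fix the structure
# (expected properties), but carry no concrete values.
TYPES = {
    "load_balancer": {"properties": ["http_port", "members"]},
    "app_server":    {"properties": ["http_port", "db_url"]},
    "database":      {"properties": ["port", "volume_gb"]},
}

# Concrete entities (templates) reference a type and supply the values.
APPLICATION = {
    "name": "ghost-blog",
    "entities": [
        {"id": "lb", "type": "load_balancer",
         "config": {"http_port": 80, "members": ["as"]}},
        {"id": "as", "type": "app_server",
         "config": {"http_port": 2368, "db_url": "postgres://db:5432/ghost"}},
        {"id": "db", "type": "database",
         "config": {"port": 5432, "volume_gb": 50}},
    ],
}


def validate(application: dict, types: dict) -> None:
    """Check that every entity supplies all properties its type declares."""
    for entity in application["entities"]:
        expected = set(types[entity["type"]]["properties"])
        missing = expected - entity["config"].keys()
        if missing:
            raise ValueError(f"{entity['id']}: missing properties {missing}")


if __name__ == "__main__":
    validate(APPLICATION, TYPES)
    print("blueprint is structurally complete")

The point of the separation shows up in validate(): the reusable types fix which properties an entity must provide, while the concrete configuration values live only in the entities that reference them.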

For the evaluation we install the tools in our local data IV. FEATURES AND COMPARISON centre and apply the defined features. As a sample application we select a three-tier application consisting of NGINX as a In this section we perform the actual comparison. Each of load balancer, Node.js as an application server running the the following two sections introduces a set of features. Sec- Ghost blogging application, and PostgreSQL as a database tion IV-A addresses cloud-related aspects and Section IV-B management system. The used application descriptions are considers application-related features. 4 available on Github . In the individual sections, we introduce the features of the set and for each of them, (i) define the feature in detail III.INTRODUCTIONTO TOOLS and argue why it is desirable. In case sub-features exist for a feature, they are introduced as well (highlighted in italics). All three selected tools make use of third party IaaS platforms Moreover, we (ii) discuss to what extend each feature and all for running applications and all of them offer a PaaS-like in- its sub-features are actually supported by each of the three terface to operators. Below we introduce the tools in particular tools. with respect to their terminology and define the versions used for the evaluation. As a rule of thumb, we used the latest stable version available at the time writing this document. A. Cloud Feature Set Apache Brooklyn is compared using milestone release 0.7.0- The Cloud Feature Set relates to the cloud infrastructure. M2-incubating. Brooklyn’s application description is based on Hence, its features focus on supported deployment across blueprints written in a domain specific language (DSL) in multiple cloud providers and levels. YAML format. A blueprint contains global properties (e.g. 1) Multi-Cloud Support Feature: Supporting multiple name, cloud provider configuration) as well as a services cloud providers is one of the most crucial features for cloud block defining application components. The definition of the application management tools, as it allows the selection of the components is split into abstract types and entities. The type best matching cloud offer for an application from a diverse captures the structure for entities such as defining the proper- offering landscape. Cloud providers often differ from each ties and interfaces. The entities refer to those types and provide other regarding their API. This causes the user to suffer from the concrete configuration. Each type is linked to a Java a vendor lock-in once he depends on the native API of a cloud implementation. Besides deployment, Brooklyn also supports provider. For that reason cloud orchestration tools should offer a monitoring interface and elastic adaptation at runtime. a cloud abstraction layer which hides differences and avoids Cloudify by GigaSpaces Technologies is offered in a free open- the need for provider-specific customisation thus removing the source edition and a commercial Pro edition. Our comparison vendor lock-in. is based on the open-source edition version 3.2. Cloudify Apache Brooklyn Support: Brooklyn uses Apache uses a TOSCA-aligned modelling language for describing jclouds as cloud abstraction layer and therefore supports many the topology of the application which is then deployed to public and private cloud providers. allocated virtual machines in the cloud environment. TOSCA- like, Cloudify splits the blueprint in a type and template Cloudify Support: Cloudify comes with plugins sup- definition. 
Types define abstract reusable entities and are to porting AWS, Openstack and VMWare vCloud. It also offers a be referenced by templates. The types therefore define the contributed plugin for Apache Cloudstack and two commercial structure of the template, by e.g. defining the properties that plugins (Pro version) for VMWare vSphere and SoftLayer. a template can have/must provide. The template then provides Nevertheless, Cloudify does not support an abstraction layer the concrete values. This mechanism is used for nodes as well and each model needs to explicitly reference cloud provider as for relationships. specific features. Apache Stratos is compared using the release candiate 4.1.0- Apache Stratos Support: Stratos utilises jclouds as a RC2. Stratos makes use of an abstract virtual machine de- cloud abstraction layer, supporting multiple providers. Support scription, named cartridge, with an application component is tested for AWS EC2, Openstack and Google Compute type (named cartridge type) like an application runtime con- Engine. Yet, the abstraction is imperfect as application speci- tainer (e.g. Tomcat). An application is described by a single fications still need to refer to cloud specific entities. cartridge or/and a set of cartridges (groups), combined with 2) Cross-Cloud Support Feature: Cross-cloud support en- deployment and scaling policies. The cartridges, applications hances the multi-cloud feature such that the user is able 1https://brooklyn.incubator.apache.org/ to deploy a single application in a way that its component 2http://getcloudify.org/ instances are distributed over multiple cloud providers. For 3http://stratos.apache.org/ instance, the database may be deployed in a private cloud on 4https://github.com/dbaur/orchestration comparison the user’s premises while numerous instances of the application server run in a public cloud. The advantages of cross-cloud 1) Application Standards Feature: Supporting open stan- deployment are three-fold: (i) It allows a sophisticated per- dards such as TOSCA [5] and CAMP [6] for modelling component instance selection of the best-fitting offer. (ii) It the application topology, the component life cycles, and the leverages the availability of the application as it introduces interaction with the cloud management tool facilitates the resilience against the failure of individual cloud providers. (iii) usage of the tool and further increases the reusability of the It helps coping with privacy issues (private vs. public cloud). topology definition, as it avoids moving the vendor lock- in from cloud provider level to management tool level (cf. Apache Brooklyn Support: Brooklyn supports cross- Section IV-A1). Moreover, it reduces the initial effort and costs cloud deployments on a per-component level: Each component to learn a new DSL. can be bound to a separate cloud provider by referencing its configuration. Apache Brooklyn Support: Brooklyn’s YAML format follows the CAMP specification, but uses some custom ex- Cloudify Support: Cloudify offers cross-cloud support. tensions. Yet, it is possible to deploy CAMP YAML plans For each virtual machine defined in the model, the user can with Brooklyn and via the separately provided CAMP server. reference a different cloud provider. Support for TOSCA is planned for a future release. 
Apache Stratos Support: Stratos allows the definition of Cloudify Support: While Cloudify’s DSL for the de- network partitions which are logical groups of IaaS resources ployment description is strongly aligned with the TOSCA such as regions or availability zones. Network partitions enable modelling standard it does not directly reference the standard cross-cloud scaling and deployment using policies like round types, but instead defines its own profile following the TOSCA robin through available network partitions. Cartridges may Simple Profile in YAML. Full TOSCA support is planned. only be configured for a subset of network partitions. Apache Stratos Support: Stratos does not implement 3) External PaaS Support Feature: In addition to support- any standard. ing IaaS clouds, the support of PaaS clouds (e.g. Google App Engine) is desirable. For PaaS offers ready-to-deploy 2) Resource Selection Feature: The resource selection is application containers, it reduces complexity compared to IaaS. part of the application topology description. It defines the This also reduces the management effort for the user. On the resources used for the deployment of a component instance downside, it comes at the cost of reduced flexibility, it is the in an IaaS cloud. Hence, a resource will commonly refer to provider that defines the container configuration. the virtual machine type/flavour, an image, and a provider- specific location: hlocation, hardware, imagei. A tool has Tool Support: None of the three tools allows the use mainly four possibilities to define or derive such a tuple: (i) of external PaaS clouds. In an manual binding the user provides the concrete unique 4) Support of Cloud Standards Feature: In addition to sup- identifiers of the cloud entities. (ii) In an automatic binding the porting multiple provider APIs (cf. Section IV-A1) the support user defines abstract requirements regarding the defined tuple of cloud API standards such as CIMI [3] or OCCI [4] enables (e.g. number of cores). These are then bound to a concrete offer support for any cloud provider adapting such a standard. at runtime by the tool. Automatic binding can be enhanced by offering an iii) optimised binding. Here, the specification of Tool Support: None of the three tools supports any optimisation criteria based on attributes of the cloud provider cloud interface standard. such as price or location is possible. Finally, (iv) dynamic bind- 5) Bring Your Own Node (BYON) Feature: BYON captures ing offers a solving system that enables changes to the binding the ability to use already running servers for application based on runtime information, e.g. metric data collected from deployment. It enables the use of servers not managed by the monitoring system. Automatic binding is a prerequisite a cloud platform or virtual machines on unsupported cloud for complex deployment and runtime adaptation scenarios, providers. as it allows the cloud management platform to dynamically Apache Brooklyn Support: Brooklyn supports BYON select the concrete offer during runtime. Optimised binding by providing an IP address and login credentials for the server. offers optimised selection based on simple criteria like price. Dynamic binding offers the possibility to use a solver applying Cloudify Support: Cloudify supports BYON through an optimisation algorithm for selecting the best-fitting offer an externally installable Host-Pool Service that works as a based on complex criteria (performance or performance per cloud middleware mock-up. 
When enabled, Cloudify requests $). IP addresses and login credentials from this service whenever Apache Brooklyn Support: it needs to provision a new server. Brooklyn supports manual as well as basic automatic binding. For the latter it supports Apache Stratos Support: Stratos does not support resource boundaries for the hardware. The resource selection BYON, despite the general ability of jclouds to do so. happens either in the global or in the component-specific parts of the blueprint. B. Application Feature Set Cloudify Support: Cloudify exclusively supports man- This section discusses features related to the deployment ual binding of the resources used for a virtual machine. The and automation of applications. It starts with features related user needs to reference a cloud provider specific node type (e.g. to the application description language and deployment, con- cloudify.openstack.nodes.Server for Openstack) tinues with features related to runtime adaptation, and finally to provide the implementation for the chosen cloud provider, discusses additional features such as support of the Windows as well as the specific properties defined by this type. These operating system. include the location, the image and the flavour information. Automatic binding of resources (like offered by TOSCA’s the modelling language is sufficiently verbose to allow the nodes_filter requirements specification) is not supported system to automatically deduce the correct workflow from the by Cloudify. Due to this shortcoming optimised and dynamic defined life cycle actions on the virtual machines and their bindings are also not possible. relationships. Apache Stratos Support: The resource selection in Additionally, a tool may offer extensions to its model, Stratos is a manual process when configuring cartridges by allowing to refer to external services like PaaS (cf. Sec- referencing to (i) an image and (ii) a hardware description in tion IV-A3) or SaaS services, to ensure that the deployment an IaaS cloud. engine is aware of this dependency and e.g., can open ports 3) Life Cycle Description Feature: The life cycle descrip- on firewalls. tion defines the actions that need to be executed in order to Apache Brooklyn Support: Brooklyn supports wiring deploy the application including all its component instances by attribute-and-event-passing. It offers a locking action, that on started virtual machines. The basic approach for the life waits until the dependent service provides a required attribute. cycle description of the application is to provide shell scripts The reverse way, where a later starting service needs to that are executed in a specific order. This approach can be reconfigure a running service, is not supported out of the box. extended to support DevOps tools such as Chef that offer a Instead, the user has to implement this functionality. Yet, for more sophisticated approach to deployment management and the commonly used load-balancer scenario, Brooklyn supports ready to use deployment descriptions. predefined static out-of-box load balancing. The tool neither Apache Brooklyn Support: In Brooklyn each defined supports workflow scenarios nor access to external services. type provides basic life cycle actions called effectors. These Cloudify Support: Cloudify uses the relationship mech- can be configured in the concrete application component def- anism of TOSCA. It defines a generic depends_on relation- inition. 
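The four binding possibilities for resource selection can be summarised in a few lines of code. The following Python sketch works on an entirely hypothetical offer catalogue (in practice this data would come from a provider API or an abstraction layer such as jclouds) and contrasts automatic binding against abstract requirements with optimised binding that additionally minimises price; manual binding would instead hard-code one concrete ⟨location, hardware, image⟩ tuple for a specific provider.

"""Illustrative resource binding: automatic vs. optimised selection.

The offer catalogue is hypothetical; real tools would query a provider
API or a cloud abstraction layer for this data.
"""

from dataclasses import dataclass


@dataclass(frozen=True)
class Offer:
    provider: str
    location: str
    hardware: str   # flavour identifier
    image: str
    cores: int
    ram_gb: int
    price_per_hour: float


CATALOGUE = [
    Offer("cloud-a", "eu-west", "m.large", "ubuntu-20.04", 4, 8, 0.20),
    Offer("cloud-a", "eu-west", "m.xlarge", "ubuntu-20.04", 8, 16, 0.40),
    Offer("cloud-b", "eu-central", "std-4", "ubuntu-20.04", 4, 16, 0.25),
]


def bind_automatically(min_cores: int, min_ram_gb: int) -> Offer:
    """Automatic binding: abstract requirements, first matching offer."""
    for offer in CATALOGUE:
        if offer.cores >= min_cores and offer.ram_gb >= min_ram_gb:
            return offer
    raise LookupError("no offer satisfies the requirements")


def bind_optimised(min_cores: int, min_ram_gb: int) -> Offer:
    """Optimised binding: same requirements, but minimise price."""
    candidates = [o for o in CATALOGUE
                  if o.cores >= min_cores and o.ram_gb >= min_ram_gb]
    if not candidates:
        raise LookupError("no offer satisfies the requirements")
    return min(candidates, key=lambda o: o.price_per_hour)


if __name__ == "__main__":
    # Manual binding would instead reference a concrete, provider-specific
    # flavour and image identifier directly in the blueprint.
    print(bind_automatically(4, 8))
    print(bind_optimised(4, 8))

Dynamic binding would re-run such a selection at runtime based on monitoring data, which is why automatic binding is a prerequisite for it.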
The configuration can either happen with shell scripts ship type that offers the execution of custom actions on either or by referencing Chef recipes. the source or the target of the relationship on specific events. Cloudify Support: Cloudify relies on the interface def- Combined with a shared configuration space available via e.g. inition of TOSCA for defining life cycle actions. The base a shell extension, this allows the user to configure endpoints node type defines multiple life cycle actions as interfaces, that before or after the start of a service. Using python the user are executed during deployment. The actions are defined as can implement custom workflows, making sure that the life shell scripts or by using Chef and Puppet. Support for Salt is cycle actions are executed in the correct order. If the user in development. Cloudify also has the possibility to provide only uses the basic life cycle actions, Cloudify is capable of python scripts. This is evaluated and provides immediate automatically deducing the correct execution order. Cloudify access to Cloudify’s API. does not support external services by default. Yet, the modular communication relationship might allow adding this feature if Apache Stratos Support: The life cycle description for needed. managing virtual machines is done by Stratos itself, while the software setup is delegated to Puppet. Stratos cartridges have Apache Stratos Support: Stratos provides a metadata a cartridge type, which is a reference to a Puppet module. service where the component instances of an application can During application deployment, Stratos identifies and invokes export and import variables. This basic but manual wiring the needed Puppet modules. using variable exchange must be implemented by the user at application setup. In case of joining or leaving component 4) Wiring and Workflows: Most cloud applications are instances Stratos broadcasts a topology change event, which distributed applications where components reside on different is used by Stratos core functionality (e.g. notify the user for virtual machines, e.g. the application server resides on a a successful application deployment) or any load balancers compute optimised host, while the database is on a storage op- existing in a deployed application setup, to update their state timised host. Hence, the modelling language needs to support for redirecting client requests. For assessing the overall deploy- a way to configure those communication relationships between ment workflow, support of both Stratos and Puppet have to be the components by offering a way to pass the endpoint, either considered. Stratos defines a startup order of virtual machines, before the start of the dependent component (database starts while Puppet has a more complex dependency expression for before application server) or after (application server is added each single virtual machine. Puppet modules can be depending to already running load balancer). on each other and inside of one module, dependencies between A straight-forward approach to resolve those dependencies different steps can be defined. This rather static deployment is attribute and event passing. That is, the tool allows the workflow is defined in advance of the application deployment. user (life cycle scripts) to lock/wait for attributes to become The flow cannot be controlled during application boot or available or register listeners on topology change events. This execution time. Stratos does not support external services. 
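A life cycle description of the kind discussed here can be reduced to an ordered set of actions per component. The sketch below only prints the actions instead of executing them on a virtual machine; the shell commands and the render-config helper are placeholders for this sketch and are not taken from any of the three tools.

"""Illustrative life cycle description: ordered actions per component.

The commands are placeholders; a real orchestrator would execute them
(or Chef/Puppet runs) on the target virtual machine via SSH or an agent.
"""

LIFE_CYCLE = {
    "db": {
        "install":   "apt-get install -y postgresql",
        "configure": "pg_createcluster 12 main",
        "start":     "systemctl start postgresql",
    },
    "as": {
        "install":   "apt-get install -y nodejs && npm install ghost",
        "configure": "render-config --db-url $DB_URL",   # hypothetical helper
        "start":     "node ghost/index.js",
    },
}

ACTION_ORDER = ["install", "configure", "start"]


def deploy(component: str) -> None:
    """Print the life cycle actions in order instead of executing them."""
    for action in ACTION_ORDER:
        command = LIFE_CYCLE[component][action]
        print(f"[{component}] {action}: {command}")


if __name__ == "__main__":
    # The database is deployed before the application server so that its
    # endpoint is available when the server is configured (cf. wiring).
    for comp in ("db", "as"):
        deploy(comp)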
is commonly achieved by a global registry shared between all 5) Monitoring Feature: Being able to track the behaviour component instances of an application. of the application is a key to assessing the quality of the Obviously, this approach offloads most complexity to the deployment. Consequently, it is necessary to monitor the user who needs to, e.g., make sure that the database URL is current resource consumption and the quality-of-service (QoS) only available when it already started. An improvement is a parameters of the application. Only if the end-user is aware manual workflow definition. Here, the user defines a workflow of current bottlenecks he is able to remove them. The cloud taking care of the deployment order. Finally, the easiest way management framework should therefore offer a way to mea- for the end user is an automatic workflow deduction, where sure system metrics like CPU usage and application specific metrics like number of requests. If those predefined metrics policy is available. Scaling is enacted in so called clusters. are not sufficient, the tool should offer a well defined way to By default Brooklyn supports horizontal scaling. Both the add custom metrics. policies and the clusters are in general customisable by new implementations, but there is no easy way to plugin such An aggregation mechanism enables users to compute custom extensions. Continuous delivery is exclusively possible higher-level metrics (e.g. 10 minute average over CPU load) on component level, namely by redeploying single components and also to combine multiple metrics (e.g. average over 10 with updated software. minute CPU average of all instances of a particular compo- nent). For helping users in accessing and assessing the current Cloudify Support: Cloudify uses the event stream pro- load on his application, it is beneficial to have the gathered cessor Riemann for the definition of QoS requirements. By metrics presented through a dashboard. In order to support default, they provide policies for host failure detection, simple higher-level evaluation of monitoring data access to historical threshold and exponential weighted moving average threshold. data is desirable. It enables the definition of custom aggregations and policies using Clojure and Riemann. On the action side, Cloudify Apache Brooklyn Support: Brooklyn’s uses a pulling offers workflows for healing the application (uninstall/reinstall) mechanism gathering the data from the virtual machines by and a scale workflow offering horizontal scaling. Complex either executing remote actions or accessing an external tool. scaling scenarios (e.g. cloud bursting, vertical scaling) are not It is the user’s responsibility to implement those actions, or to supported out of the box. Instead, the user may define custom provide an interface to an external monitoring tool. Brooklyn workflows using a DSL. This enables support for complex does not store historical data and only supports access to the scenarios, but leaves the responsibility with the user. Contin- latest measured value impairing aggregation. The latest value uous delivery of the application is currently not supported by of all metrics is shown in a dashboard. Cloudify, meaning that the user has to un-deploy the entire Cloudify Support: Cloudify’s monitoring system relies application, even for minor changes in the model. Support for on the Diamond monitoring daemon that has built-in collec- this has been announced for the Pro version. 
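The attribute-and-event passing pattern can be illustrated with a small shared registry. The sketch below uses an in-process dictionary and a condition variable as a stand-in for the globally shared registry of a real orchestration tool; the essential part is that a dependent component blocks until another component instance has published the required attribute.

"""Illustrative attribute-and-event passing via a shared registry.

An in-process dictionary stands in for the globally shared registry
(e.g. a key-value service) that real tools provide.
"""

import threading
import time

_registry: dict[str, str] = {}
_changed = threading.Condition()


def publish(key: str, value: str) -> None:
    """A component instance exports an attribute, e.g. its endpoint."""
    with _changed:
        _registry[key] = value
        _changed.notify_all()


def wait_for(key: str, timeout: float = 10.0) -> str:
    """A dependent component blocks until the attribute becomes available."""
    deadline = time.monotonic() + timeout
    with _changed:
        while key not in _registry:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                raise TimeoutError(f"attribute {key!r} never published")
            _changed.wait(remaining)
        return _registry[key]


def start_database() -> None:
    time.sleep(0.5)                       # simulated start-up delay
    publish("db.endpoint", "postgres://10.0.0.5:5432/ghost")


if __name__ == "__main__":
    threading.Thread(target=start_database, daemon=True).start()
    # The application server's configure step locks on the attribute.
    print("app server waits for:", wait_for("db.endpoint"))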
tors for the most common system and application metrics. Apache Stratos Support: Stratos balances the QoS re- Additionally, it offers an interface for the implementation of quirements by using policies, that enable a multi-factored auto- custom collectors. The Cloudify user interface for viewing scaling. Using client requests and system load as health data (historical) metrics is only available in the closed-source Pro (cf. Section IV-B5) combined with a complex event processor version. Aggregation of metrics is possible using the policy and the Drools rules engine, Stratos enacts horizontal scaling framework (cf. Section IV-B6). to the environment. Moreover, repair actions are supported, Apache Stratos Support: Stratos uses a cartridge agent in case some tasks within virtual machines of an application residing within each virtual machine. This agent comprises a topology fail (e.g. installation of required software), by auto- Health Publisher to avail itself of the machine’s health statis- matically destroying and re-creating the affected instance. The tics, load average, free memory, and the amount of in-flight implementation of custom actions is not foreseen. Continuous requests. It is not possible to define further custom metrics. delivery is not supported. Instead, the user has to un-deploy Monitoring data is sent to a central real-time processing engine the whole application first and change its definition. where aggregation and evaluation is performed. Support for 7) Reusability and Sharing of Models Feature: When using visualisation of current and historical health statistics through cloud management tools, the main task of the user is the the Web GUI is is planned for the future. creation of the application description based on components. 6) Runtime Adaptation Feature: While monitoring (cf. As this imposes a high initial effort, this task needs to be Section IV-B5) lays one of the foundations for adapting the supported by sharing of existing models. configuration of the application during runtime, the cloud reusability management platform should be able to react upon deviations Regarding the tool should offer a modularised automatically. For this purpose, the user needs to be able to approach regarding the application description. Generally define (i) metric and QoS conditions that trigger (ii) actions speaking, each application description consists of components. if violated: For instance, scale component horizontally, if the In order to facilitate re-use, modularisation shall be used to CPU usage is > 80%. In order to support that, the cloud an extend where the description and life cycle handling of management tool should at least offer a simple threshold-based components is mostly self-contained and independent from approach for the detection of violations and support horizontal e.g. the application it is embedded in. At the same time scaling. As extensions, we consider repair and the custom the composing mechanism that forms applications from sets definition of actions on the actions side, and more complex of components has to be powerful enough to capture the rule engines with respect to QoS. most common use cases, such as setting the port numbers wiring two components. In an ideal case, exchanging an Another feature related to adaptation is continuous delivery SQL-based database in an application with a different SQL- of the application. 
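A threshold-based adaptation rule of the kind described above, operating on aggregated monitoring data, can be sketched in a few lines. The metric samples and thresholds below are arbitrary; a real tool would feed the aggregated CPU load of all instances of a component into such a rule and, on a "scale-out" decision, trigger the corresponding scaling workflow.

"""Illustrative threshold-based adaptation rule on aggregated metrics.

The samples are synthetic; a real tool would use a sliding aggregation
over the monitoring data of all instances of a component.
"""

from statistics import mean


def scaling_decision(cpu_samples: list[float],
                     threshold: float = 80.0,
                     window: int = 10) -> str:
    """Return the adaptation action for the latest aggregation window."""
    aggregated = mean(cpu_samples[-window:])   # e.g. 10-sample average load
    if aggregated > threshold:
        return "scale-out"                     # add one component instance
    if aggregated < threshold / 2:
        return "scale-in"                      # remove one instance again
    return "keep"


if __name__ == "__main__":
    samples = [70, 75, 82, 88, 90, 91, 85, 87, 89, 92]
    print(scaling_decision(samples))           # -> "scale-out"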
It enables the user to change the topology based database should neither require changes to the invoking model of an already deployed and running application (e.g. component definition nor to the new database component. add a load balancer, or update the software version of a Also, the application should be widely untouched except that component). This should be possible with as few changes as the configuration parameters for the new database have to be needed to the running components. set. Approaches to achieve this modularity include templating, parameterisation, and inheritance. Apache Brooklyn Support: Brooklyn policies enable the specification of metrics/QoS. By default a threshold-based In order to facilitate easy sharing of entire models and parts thereof, the tool should offer a marketplace where users can 9) Windows Application Support Feature: While Linux exchange their models with other users. If such a marketplace is dominating the cloud computing environment, there are does not exist, the tool provider should at least offer application many professional companies running their applications on models for the most important standard services. a Windows operating system [8]. Hence, these applications should be supported by cloud management tools. Apache Brooklyn Support: Brooklyn achieves the re- usability of types by using inheritance. Yet types of the same Apache Brooklyn Support: Brooklyn relies heavily on parent can not be exchanged without modifying the concrete SSH which excludes native Windows support. Windows sup- properties of connected types. Types can be shared either port is currently under development, though. locally or in a Git repository. Cloudify Support: Cloudify supports Windows but re- Cloudify Support: Cloudify uses the same reusability quires that the virtual machine (image)s have WinRM enabled. mechanisms for its models as TOSCA: For the model is split Cloudify uses this protocol to install its agents on the machine. into types and templates, defined types are in general usable The agents then operate in an operating system independent in other templates. The separation of the server host and the manner. application using the hosted on relationship also decouples Apache Stratos Support: Stratos cartridges for de- the server from the application description. The reusability ploying applications also support Windows, e.g. for .NET is further increased by the import mechanism, that allows framework applications. Both the Stratos agent running in to define types in another file location as the templates and the applications’ virtual machines and Puppet support the then import them. Another feature increasing modularity is Windows operating system. the relationship mechanism, that allows a custom wiring for each type usage. Another mechanism that Cloudify shares with V. COMPARISON RESULT TOSCA is the inheritance of types. This allows the user to inherit from parent types, meaning that defined elements of Table I depicts the achievements of the three cloud tools the parent type are also defined in the child type. Finally, with respect to the features defined in Section IV. To be the input mechanism allows defining parameterised models. able to account for different partial achievement or different Cloudify does not offer a marketplace. achievement quality we use a three-staged marker. Apache Stratos Support: Since Stratos defines its con- VI.RELATED WORK figurations and applications in JSON, they can be shared as any text file. 
Yet, the cartridges contain references to To the best of our knowledge there is currently no directly IDs of IaaS snapshots and hardware configurations. Thus the comparable work defining an in-detail comparison framework reusability is limited to a cloud. Moreover, the definitions of based on features and applying it to the given tools. The work an already deployed application can’t be changed dynamically; of [9] and [10] present a general view of cloud computing it needs to be un-deployed first and then edit its definition. defining characteristics, features and challenges of the cloud Similarly, Puppet is built to be reusable and shareable also. Its computing environment on a much higher level, from which marketplace Puppet Forge contains more than 3, 300 modules, our features are derived. There is also multiple work defining which can be added to a Stratos setup. The reusability of a taxonomy and doing a comparison of cloud computing Puppet modules is gained by dependencies between modules, systems [11] [12]. However, those comparisons focus on cloud which allows splitting work in smaller but linked modules. computing in general. They hence put stress on features offered by and comparison of cloud providers. A qualitative 8) Containerisation Feature: The use of containers such and quantitive survey for the two IaaS management tools like Docker is a reasonable approach for sharing a virtual Openstack and Synnefo5 is provided by [13]. [14] depicts machine between several component instances, while keeping an elaborate overview of frameworks, projects and libraries them isolated. This leads to better utilisation of the virtual with respect to provisioning, deployment and adaptation of machine [7]. Moreover, the increased isolation offered by cloud systems, but also stays at a much coarser granularity. containers allows resource consumption to be configured, [15] compares multiple cloud brokerage solutions by first controlled, and limited on the level of component instances. categorising them, and then listing their core capabilities. This feature does not consider whether cloud providers use However, their comparison is also done on a much higher containers instead of hypervisors, as this is transparent for the level. Finally, Paasify6 gives a good overview of existing PaaS users of the platform. and PaaS-like offers doing a feature comparison on higher, Apache Brooklyn Support: Brooklyn does not support quantifiable levels. containers out of the box. Yet, the separate project Clocker enables the usage of Docker. VII.CONCLUSION Cloudify Support: Cloudify supports containerisation In this paper, we have considered basic requirements of using Docker. The user can use a docker container node cloud orchestration tools and derived a fine-grained list of type what allows starting a docker container from a provided features any of the tools shall support. We also applied these Docker image. The Docker container, then can be deployed on results by rating three publicly available cloud orchestration a virtual machine node using a contained in relationship. tools based on our list: Apache Brooklyn, Cloudify, and Apache Stratos. Apache Stratos Support: Stratos supports containeri- sation with Docker by using Google Kubernetes as a cluster 5https://www.synnefo.org/ orchestration framework. 6http://www.paasify.it TABLE I. COMPARISON OF BROOKLYN,CLOUDIFY, AND STRATOS. 
European Community’s Framework Programme for Research Features Tools and Innovation HORIZON 2020 (ICT-07-2014) under grant Brooklyn Cloudify (Pro) Stratos agreement number 644690 (CloudSocket). Cloud Features Multi-Cloud # of Cloud Providers jclouds 3 (5) jclouds REFERENCES Abstraction Layer   0 Cross-Cloud    [1] S. Kachele,¨ C. Spann, F. J. Hauck, and J. Domaschka, “Beyond IaaS External PaaS    and PaaS: An extended cloud taxonomy for computation, storage and Cloud Standards    networking,” in Proceedings of the 2013 IEEE/ACM 6th International BYON    Conference on Utility and Cloud Computing, ser. UCC ’13. Washing- Application Features ton, DC, USA: IEEE Computer Society, 2013, pp. 75–82. Model Standards 0 0  [2] P. M. Mell and T. Grance, “SP 800-145. the NIST definition of cloud Resource Selection computing,” National Institute of Standards & Technology, Gaithers-    Manual Binding burg, MD, United States, Tech. Rep., 2011. Automatic Binding 0   Optimised Binding    [3] DMTF, “Cloud infrastructure management interface (CIMI) model and Dynamic Binding    RESTful HTTP-based protocol,” 2013. Life Cycle Description [4] Open Grid Forum, “Open cloud computing interface - core,” 2011. Shell Script    # DevOps Tools 1 3 1 [5] D. Palma and T. Spatzier, “Topology and orchestration specification for Wiring & Workflow cloud applications version 1.0,” OASIS Standard, 2013. Attribute & Event Passing 0   [6] J. Durand, A. Otto, G. Pilz, and T. Rutt, “Cloud application management Manual Workflow    for platforms version 1.1,” OASIS Committee Specification, 2014. Automatic Workflow    [7] K. Razavi, A. Ion, G. Tato, K. Jeong, R. Figueiredo, G. Pierre,    External Services and T. Kielmann, “Kangaroo: A tenant-centric software-defined cloud Monitoring infrastructure,” in Proceedings of the IEEE International Conference on System Metrics    Cloud Engineering Application Metrics    , Tempe, AZ, USA, United States, 2015. Custom Metrics    [8] Linux Foundation, “Enterprise end user trends report,” 2014. Aggregation 0   [9] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. Katz, A. Konwinski, Dashboard 0 ()  G. Lee, D. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, “A view of Historical Data  ()  cloud computing,” Commun. ACM, vol. 53, no. 4, pp. 50–58, 2010. Runtime Adaptation Thresholds    [10] R. Buyya, “Market-oriented cloud computing: Vision, hype, and reality Rule Engine    of delivering computing as the 5th utility,” in Proceedings of the 2009 Horizontal Scaling    9th IEEE/ACM International Symposium on Cluster Computing and Repair 0   the Grid, ser. CCGRID ’09. Washington, DC, USA: IEEE Computer Custom Action    Society, 2009, pp. 1–. Continuous Delivery 0   [11] B. P. Rimal, E. Choi, and I. Lumb, “A taxonomy and survey of cloud Reusability and Sharing of Models computing systems,” in Proceedings of the 2009 Fifth International  Reusability 0 0 Joint Conference on INC, IMS and IDC, ser. NCM ’09. IEEE Computer Sharing   0 Society, 2009, pp. 44–51. Containerisation    Windows Support    [12] C. Hofer¨ and G. Karagiannis, “Cloud computing services: taxonomy = not fulfilled, 0= partially fulfilled, = fully fulfilled and comparison,” Journal of Internet Services and Applications, vol. 2, pp. 81–94, 2011. [13] E. Qevani, M. Panagopoulou, C. Stampoltas, A. Tsitsipas, D. Kyriazis, When looking at the results it becomes evident, that and M. 
Themistocleous, “What can OpenStack adopt from a Ganeti- based open-source IaaS?” in 2014 IEEE 7th International Conference especially the resource selection feature is underrepresented on Cloud Computing, Anchorage, AK, USA, June 27 - July 2, 2014, in all three tools. All tools require the user to manually 2014, pp. 833–840. bind a concrete resource to the components at description [14] N. Ferry, A. Rossini, F. Chauvel, B. Morin, and A. Solberg, “Towards time causing vendor lock-in due to missing abstraction and model-driven provisioning, deployment, monitoring, and adaptation of non-optimal placement due to an incorrect initial selection, multi-cloud systems,” in Cloud Computing (CLOUD), 2013 IEEE Sixth performance unpredictability and no possibility to change it at International Conference on, 2013, pp. 887–894. [15] F. Fowley, C. Pahl, and L. Zhang, “A comparison framework and review runtime. Our own cloud orchestration tool CLOUDIATOR7 [16], of service brokerage solutions for cloud architectures,” in Service- [17] has the goal to close this gap. Oriented Computing ICSOC 2013 Workshops. Springer International Future work will include the finalisation of a first release of Publishing, 2014, pp. 137–149. [16] D. Baur, S. Wesner, and J. Domaschka, “Towards a model-based LOUDIATOR our C tool based on the experiences gained while execution-ware for deploying multi-cloud applications,” in Advances working with Brooklyn, Cloudify, and Stratos. In parallel, we in Service-Oriented and Cloud Computing, ser. Communications in will evaluate and rate more tools and we plan to extend this Computer and Information Science, G. Ortiz and C. Tran, Eds. Springer evaluation to ongoing research projects, also considering non- International Publishing, 2015, vol. 508, pp. 124–138. function features like performance, availability or security. [17] J. Domaschka, D. Baur, D. Seybold, and F. Griesinger, “Cloudiator: A cross-cloud, multi-tenant deployment and runtime engine,” in 9th Workshop and Summer School On Service-Oriented Computing 2015, ACKNOWLEDGMENT 2015, in press. The research leading to these results has received fund- ing from the European Community’s Seventh Framework Programme (FP7/2007-2013) under grant agreement number 317715 (PaaSage) and 610711 (CACTOS), and from the

Chapter 10 [core3] Is Distributed Database Evaluation Cloud-Ready?

This article is published as follows:

Material from: Daniel Seybold and Jörg Domaschka, ”Is Distributed Database Evaluation Cloud-Ready?”, European Conference on Advances in Databases and Information Systems (ADBIS) - New Trends in Databases and Information Systems (Short Papers), published 2017, Springer International Publishing, DOI: https://doi.org/10.1007/978-3-319-67162-8_12

Reprinted with permission from Springer Nature.

Is Distributed Database Evaluation Cloud-Ready?

Daniel Seybold and Jörg Domaschka

Institute of Information Resource Management, Ulm University, Ulm, Germany {daniel.seybold,joerg.domaschka}@uni-ulm.de

Abstract. The database landscape has significantly evolved over the last decade as cloud computing makes it possible to run distributed databases on virtually unlimited cloud resources. Hence, the already non-trivial task of selecting and deploying a distributed database system becomes more challenging. Database evaluation frameworks aim at easing this task by guiding the database selection and deployment decision. The evaluation of databases has evolved as well by moving the evaluation focus from performance to distribution aspects such as scalability and elasticity. This paper presents a cloud-centric analysis of distributed database evaluation frameworks based on evaluation tiers and framework requirements. It analyses eight well-adopted evaluation frameworks. The results point out that the evaluation tiers performance, scalability, elasticity and consistency are well supported, in contrast to resource selection and availability. Further, the analysed frameworks do not support cloud-centric requirements but support classic evaluation requirements.

Keywords: NoSQL · Distributed database · Database evaluation · Cloud

1 Introduction

Relational database management systems (RDBMS) have been the common choice for persisting data for many decades. Yet, the database landscape has changed over the last decade and a plethora of new database management systems (DBMS) have evolved, namely NoSQL [20] and NewSQL [15]. These are promising persistence solutions not only for Web applications, but also for new domains such as "Big Data" and "IoT". While NewSQL database systems are inspired by the relational storage model, the storage models of NoSQL database systems can be further classified into key-value stores, document-oriented stores, column-oriented stores and graph-oriented stores [20]. NoSQL and NewSQL DBMSs are designed to satisfy requirements such as high performance or scalability by running on commodity hardware as a distributed database management system (DDBMS), providing a single DBMS which is spread over multiple nodes. An element of the overall DDBMS is termed database node.

An enabler of this DBMS evolvement is cloud computing, which provides fast access to commodity hardware via elastic, on-demand, self-service resource provisioning [17]. Infrastructure as a Service (IaaS) is the preferable way to deploy a DDBMS, as a DDBMS requires a high degree of flexibility in compute, storage and network resources [17].

With the number of available NoSQL and NewSQL systems and cloud resource offerings, the database selection and deployment on the cloud is a challenging task. Hence, DBMS evaluation is a common approach to guide these decisions. With the evolvement of the DBMSs, the landscape of database evaluation frameworks (DB-EFs) has evolved as well: from single node evaluation, e.g. TPC-E (http://www.tpc.org/tpce/) of the Transaction Processing Performance Council (TPC), to DDBMS evaluation, e.g. the Yahoo Cloud Serving Benchmark (YCSB) [7], adding new evaluation tiers such as scalability, elasticity or consistency. Yet, these DB-EFs aim at different evaluation tiers and differ towards common DB-EF requirements [6,14], especially with respect to cloud computing.

In order to facilitate the selection and deployment of DDBMSs in the cloud, we present an analysis of DB-EFs with the focus on exploiting cloud computing. Our contribution is threefold: (1) defining relevant evaluation tiers for DDBMSs deployed in the cloud; (2) extending existing requirements towards DDBMS evaluation with cloud-specific requirements; (3) analysing existing evaluation frameworks based on the evaluation tiers and evaluation requirements.

The remainder is structured as follows: Sect. 2 introduces the background on DBMS evaluation. Sect. 3 defines the evaluation tiers while Sect. 4 defines the requirements towards evaluation frameworks. Sect. 5 analyses and discusses existing frameworks. Sect. 6 concludes.

2 Background

Evaluating DBMSs imposes challenges for the evaluation frameworks themselves, which have been discussed for decades. A first, but still valid guideline for evaluating RDBMSs defines requirements such as relevance to an application domain, portability to allow benchmarking of different systems, scalability to support benchmarking large systems, and simplicity to ensure that the results are easy to understand [14]. A more recent guideline adds the DDBMS and the resulting challenges of supporting different deployment topologies and coordinating distributed experiments [6]. By adopting these challenges, several DB-EFs have been established over the years, which are analysed in Sect. 5.

An overview of existing DB-EFs focuses on the tiers availability and consistency. Yet, general requirements for DB-EFs are not introduced and cloud-specific characteristics are not considered [11]. An overview of DB-EFs for NoSQL database systems is provided by [19]. Yet, the focus lies on evaluating the data model capabilities, without explicitly taking into account DDBMS aspects and the usage of cloud resources. The dimensions of consistency in DDBMSs are also analysed by [5], introducing client-centric and data-centric consistency metrics; related DB-EFs for consistency are presented and missing features for fine-grained consistency evaluation are outlined. An overview of DB-EFs is included in a recommendation compendium [16] for distributed database selection based on functional and non-functional requirements. Yet, the compendium considers only the performance evaluation.

3 Distributed Database Evaluation Tiers

With the evolving heterogeneity in DDBMSs, their evaluation becomes even more challenging. DBMS evaluation is driven by workload domains (WD). On a high level, WDs can be classified into transactional (TW) [9], web-oriented (WOW) [9], Big Data (BDW) [12] and synthetic workloads (SW) [7]. These WDs drive the need for considering various evaluation tiers, which are distilled out of database and cloud research.

Resource selection (RS) determines the best matching resources to run a DBMS. For traditional RDBMSs the focus lies on single node resources (CPU, memory, storage). For DDBMSs, network, locality and the number of nodes became important factors. When using cloud resources for a DBMS, the cloud providers tend to offer ever more heterogeneous resources, such as VMs with dedicated storage architectures (e.g. https://aws.amazon.com/de/ec2/instance-types/), container based resources (e.g. https://wiki.openstack.org/wiki/Magnum), or dedicated resource locations from data centre down to physical host level.

Performance (P) evaluates the behaviour of a DBMS against a specific kind of workload. Performance metrics are throughput and latency, which are measured by the evaluation framework.

Scalability (S) defines the capability to process arbitrary workload sizes by adapting the DBMS, scaling either vertically (scale-up/down) or horizontally (scale-in/out) [1]. Scaling vertically changes the computing resources of a single node. Horizontal scaling adds nodes to a DDBMS cluster (scale-out) or removes nodes (scale-in). In the following, the term scalability implies horizontal scalability. Measuring scalability is performed by correlating throughput and latency for growing cluster sizes and workloads. A high scalability rating is represented by constant latency and proportionally growing throughput with respect to the number of nodes and the workload size [7].

Elasticity (E) defines the ability to cope with sudden workload fluctuations without service disruption [1]. Elasticity metrics are speedup and scaleup [7]. Speedup refers to the time required for a scaling action, i.e. adapting the cluster size, redistributing data and stabilising the cluster. Scaleup refers to the benefit of this action, i.e. the throughput/latency development with respect to the workload fluctuation.

Availability (A) represents the degree to which a DBMS is operational and accessible when required for use. The availability of a DBMS can be affected by overload (issuing more requests in parallel than the DBMS can handle) or by failures on the resource layer (e.g. a node failure). With respect to failures, DDBMSs apply replication of data to multiple database nodes.


Common metrics to measure availability with respect to node failures are the takeover time and the performance impact during the failure.

Consistency (C) Distributed databases offer different consistency guarantees, as there is a trade-off between consistency, availability and partition tolerance, i.e. the CAP theorem [13]. Consistency can be evaluated client-centric (i.e. from the application developer perspective) and data-centric (i.e. from the database administrator perspective) [5]. Here, we only consider client-centric consistency, which can be classified into staleness and ordering [5]. Staleness defines how much a replica lags behind its master. It is measured either in time or in versions. Ordering requires that all requests are executed on all replicas in the same chronological order.
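A time-based staleness measurement can be sketched as a simple client-centric probe: write a marker value through the master connection and poll a replica until the value becomes visible. The following Java sketch is illustrative only; the write and read functions are placeholders for the driver calls of the DBMS under test, and the in-memory stand-in in main merely simulates a delayed replica.

import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;
import java.util.function.Supplier;

public class StalenessProbe {

    /**
     * Measures time-based staleness: the time until a value written via the master
     * connection becomes visible on a replica connection. Returns -1 on timeout.
     */
    static long measureStalenessMillis(Consumer<String> writeToMaster,
                                       Supplier<String> readFromReplica,
                                       long timeoutMillis) throws InterruptedException {
        String marker = "staleness-probe-" + System.nanoTime();
        long start = System.currentTimeMillis();
        writeToMaster.accept(marker);
        while (System.currentTimeMillis() - start < timeoutMillis) {
            if (marker.equals(readFromReplica.get())) {
                return System.currentTimeMillis() - start;
            }
            Thread.sleep(5); // polling interval trades measurement precision against extra load
        }
        return -1;
    }

    public static void main(String[] args) throws InterruptedException {
        // In-memory stand-in for master and replica; real database drivers would be plugged in here.
        AtomicReference<String> replica = new AtomicReference<>("");
        Consumer<String> write = value -> new Thread(() -> {
            try { Thread.sleep(50); } catch (InterruptedException ignored) { }
            replica.set(value); // the replica receives the update with an artificial delay
        }).start();
        System.out.println("staleness (ms): " + measureStalenessMillis(write, replica::get, 1000));
    }
}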

4 Evaluation Frameworks Requirements

Besides the evaluation tiers, DBMS evaluation imposes requirements towards the DB-EF itself. We briefly present established requirements [6,14] as well as novel cloud-centric requirements.

Usability (U) eases the framework configuration, execution and extension by providing sufficient documentation and tools to run the evaluation. Hence, the evaluation process has to be transparent to provide objective results [14].

Distribution/Scalability (D/S) is provided by distributed workload generation, i.e. the framework clients can be distributed across multiple nodes in order to increase the workload by utilising an arbitrary number of clients [6].

Measurements Processing (MP) defines that measurements are gathered not only in an aggregated but also in a fine-grained manner for further processing [6]. As the amount of measurements can grow rapidly for multiple or long running evaluation runs, file-based persistence might not be sufficient. Hence, advanced persistence options such as time series databases (TSDBs) will ease the dedicated processing and visualisation.

Monitoring (MO) data improves the significance of evaluation results. Hence, monitoring of the involved resources, clients, and DBMSs should be supported by the evaluation framework to provide the basis of a thorough analysis. Again, an advanced persistence solution is beneficial.

Database Abstraction (DA) enables the support of multiple DBMSs by abstracting database driver implementations. Yet, the abstraction degree needs to be carefully chosen, as a too high abstraction might limit specific DBMS features and distort results. Therefore, the abstraction interface should be aligned with the specified workload scenarios [6].

Client Orchestration (CO) enables automated evaluation runs. Therefore, the framework should provide tools that orchestrate evaluations, i.e. provision (cloud) resources, create the clients, execute the evaluation, collect the results and clean up the clients. Hence, CO eases the creation of arbitrary load patterns and the simulation of multi-tenant workload patterns.

Database Orchestration (DO) enables the management of the DDBMS to facilitate repetitive evaluations for different resources and configurations as well as the adaptation of the DDBMS based on predefined conditions. Hence, the evaluation framework should provide tools to automatically orchestrate DDBMSs, i.e. provision resources, set up, configure and adapt generic DDBMSs.

Multi-phase Workloads (MpW) define the support of multiple workloads that run in parallel. This is crucial to execute advanced evaluation scenarios. Further, the specification of the load development over a certain time frame per workload is required to simulate real world scenarios.

Extensibility (E) defines the need to provide an architecture which eases the extension of the framework capabilities, e.g. by adding support for additional DBMSs or workload types.
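The database abstraction (DA) requirement can be illustrated by a narrow driver interface that exposes exactly the operations of the specified workload scenarios. The interface below is a hypothetical sketch in the spirit of this requirement; it does not reproduce the API of any of the analysed frameworks.

import java.util.Map;

/**
 * Minimal database abstraction for workload execution: narrow enough to be portable,
 * but aligned with the workload operations so that DBMS-specific behaviour is not hidden away.
 * Implementations would wrap the concrete driver of each DBMS under test.
 */
public interface DatabaseAdapter extends AutoCloseable {

    /** Establishes the connection to the DBMS cluster, e.g. from a host list and credentials. */
    void connect(Map<String, String> connectionProperties);

    /** Inserts or overwrites a record in the given table/collection. */
    void insert(String table, String key, Map<String, String> values);

    /** Reads a record by key; returns null if the key does not exist. */
    Map<String, String> read(String table, String key);

    /** Updates selected fields of an existing record. */
    void update(String table, String key, Map<String, String> values);

    /** Deletes a record by key. */
    void delete(String table, String key);

    @Override
    void close();
}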

5 Analysis of Evaluation Frameworks

In this section we analyse DB-EFs which focus on DDBMSs. Hereby, we consider only DB-EFs which have been published within the evolvement of DDBMSs, i.e. from 2007 on. In addition, we only consider the original evaluation frameworks and no minor extensions or evaluations based on these frameworks. First, we analyse each framework based on the workload domain and supported evaluation tiers. The results are shown in Table 1. Second, we analyse each framework's capabilities against the presented DB-EF requirements from Sect. 4. The results are shown in Table 2. The analysis applies ✗ = not supported, (✓) = partially supported, ✓ = supported. A detailed analysis can be found in an accompanying technical report [21].

Table 1. Distributed database evaluation tiers

Evaluation framework | WD | RS | P | S | E | A | C
TPC-E | TW | (✓) | ✓ | ✓ | ✗ | ✗ | ✗
YCSB [7] | SW | ✗ | ✓ | ✓ | ✓ | ✗ | ✗
YCSB++ [18] | TW, BDW, SW | ✗ | ✓ | ✓ | ✓ | ✗ | ✓
BG [3] | WOW | ✗ | ✓ | ✓ | ✗ | ✗ | ✓
BigBench [12] | TW, BDW | ✗ | ✓ | ✗ | ✗ | ✗ | ✗
OLTP-bench [9] | WOW, SW | ✗ | ✓ | ✓ | ✗ | ✗ | ✗
YCSB+T [8] | TW, SW | ✗ | ✓ | ✓ | ✓ | ✗ | ✓
LinkBench [2] | WOW | ✗ | ✓ | ✗ | ✗ | ✗ | ✗

The first insight of our analysis is that the performance, scalability, elasticity and consistency tiers are well covered, but the resource selection and availability tiers lack support (cf. Sect. 5.1). The second insight points out that the traditional DB-EF requirements [14] such as usability, distribution and extensibility are well supported, while monitoring and cloud orchestration are not (cf. Sect. 5.2).

Table 2. Distributed database evaluation framework requirements

Evaluation framework | U | D/S | MP | MO | DA | CO | DO | MpW | E
TPC-E | ✓ | ✓ | (✓) | ✗ | (✓) | ✗ | ✗ | ✓ | ✓
YCSB [7] | ✓ | ✓ | (✓) | ✗ | ✓ | ✗ | ✗ | ✗ | ✓
YCSB++ [18] | (✓) | ✓ | ✓ | ✓ | (✓) | ✓ | ✗ | ✓ | ✗
BG [3] | ✓ | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ | ✓
BigBench [12] | ✓ | ✓ | ✗ | ✗ | (✓) | ✗ | ✗ | ✗ | ✓
OLTP-bench [9] | (✓) | ✓ | (✓) | ✓ | (✓) | ✓ | ✗ | ✓ | ✓
YCSB+T [8] | (✓) | ✓ | (✓) | ✗ | ✓ | ✗ | ✗ | ✗ | ✓
LinkBench [2] | ✓ | (✓) | (✓) | ✓ | (✓) | ✗ | ✗ | ✓ | ✓

5.1 Results for Evaluation Tiers

The resulting table (cf. Table 1) shows that the early DB-EFs focus on performance, scalability and elasticity, while newer frameworks also focus on consistency. Hereby, only the performance tier has established common rating indices such as throughput, latency or SLA based rating [3]. While multiple frameworks target the scalability and elasticity tiers, a common methodology and rating index has not yet been established. Yet, the need for a common evaluation methodology [22] and rating index [10] has already been carved out.

Currently unsupported evaluation tiers are resource selection and availability. While resource selection is partially supported by TPC-E, which considers physical hardware configurations, cloud-centric resource selection is not in the scope of any of the frameworks. With the increasing heterogeneity of cloud resources, from diverse virtual machine offerings to container based resources, the consideration of cloud-centric resource selection needs to move into the focus of novel DB-EFs. Existing DB-EFs can be applied to evaluate DDBMSs running on heterogeneous cloud resources, but they do not offer an explicit integration with cloud resource offerings. Hence, manual resource management, monitoring and client/DDBMS orchestration hinders cloud-centric evaluations.

As availability is a major feature of DDBMSs, it is surprising that it is not considered by the analysed DB-EFs. Especially as cloud resources do fail, availability concepts for applications running on cloud resources are a widely discussed topic. Again, the support of DDBMS orchestration can enable database-specific availability evaluations.
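Although no common rating index has been established, a simple option for the scalability and elasticity tiers is to correlate the measured throughput with the cluster size and to time the scaling action. The following Java sketch illustrates such a correlation; the Observation record and the example numbers are assumptions for illustration, not a proposed standard index.

import java.time.Duration;
import java.time.Instant;

public class ScalingMetrics {

    /** Throughput/latency observed for one cluster size (illustrative record). */
    record Observation(int nodes, double opsPerSec, double avgLatencyMs) {}

    /**
     * Scalability ratio: ideal horizontal scalability yields a value close to 1.0,
     * i.e. throughput grows proportionally with the number of nodes while latency stays constant.
     */
    static double scalabilityRatio(Observation small, Observation large) {
        double throughputGain = large.opsPerSec() / small.opsPerSec();
        double clusterGrowth = (double) large.nodes() / small.nodes();
        return throughputGain / clusterGrowth;
    }

    /** Speedup: time needed for a scaling action (add node, redistribute data, stabilise). */
    static Duration speedup(Instant scaleTriggered, Instant clusterStabilised) {
        return Duration.between(scaleTriggered, clusterStabilised);
    }

    /** Scaleup: throughput benefit of the scaling action relative to the pre-scaling throughput. */
    static double scaleup(double throughputBefore, double throughputAfter) {
        return throughputAfter / throughputBefore;
    }

    public static void main(String[] args) {
        Observation threeNodes = new Observation(3, 30_000, 4.1);
        Observation sixNodes = new Observation(6, 55_000, 4.5);
        System.out.printf("scalability ratio: %.2f%n", scalabilityRatio(threeNodes, sixNodes));
        System.out.printf("scaleup: %.2f%n", scaleup(30_000, 55_000));
    }
}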

5.2 Results for Evaluation Framework Requirements

The analysis of the evaluation framework requirements (cf. Table 2) shows that usability, scalability, database abstraction and extensibility are covered by all frameworks. Measurement processing is covered as well, but only a few frameworks support advanced features such as visualisation, and none of the frameworks supports advanced storage solutions such as TSDBs. Multi-phase workloads are partially covered by the frameworks, especially by the frameworks from the TW and WOW domains. The monitoring of client resources is partially covered, but only OLTP-bench considers resource monitoring. While all frameworks support the distributed execution of evaluations, only two support the orchestration of clients, which complicates distributed evaluation runs. Further, none of the frameworks supports DBMS orchestration. This fact leads to high complexity just for setting up the evaluation environment, especially when it comes to heterogeneous cloud resources. Further, dynamic DDBMS transitions for evaluating tiers such as elasticity or availability always require custom implementations, which impedes the comparability and validity of the results.

6 Conclusion and Future Work

In the last decade the landscape of distributed database systems has evolved and NoSQL and NewSQL database systems appeared. In parallel, cloud computing enabled novel deployment options for database systems. Yet, these evolvements raise the complexity of selecting and deploying an appropriate database system. In order to ease such decisions, several evaluation frameworks for distributed databases have been developed. In this paper, we presented an analysis of distributed database evaluation frameworks based on evaluation tiers and requirements towards the frameworks themselves. The analysis is applied to eight evaluation frameworks and provides a thorough analysis of their evaluation tiers and capabilities.

The results of this analysis show that the performance, scalability, elasticity, and consistency tiers are well covered, while resource selection and availability are not considered by existing evaluation frameworks. With respect to the framework requirements, traditional requirements are covered [14], while cloud-centric requirements such as orchestration are only partially supported. The analysis shows that existing frameworks can be applied to evaluate distributed databases in the cloud, but there are still unresolved issues on the evaluation tier side, i.e. the support for resource selection and availability evaluation, and on the framework requirement side, i.e. the orchestration of clients and databases and the exploitation of advanced storage solutions. This hinders the repeatability [14] of evaluations on heterogeneous cloud resources as well as dynamic transitions in the cluster. Yet, cloud computing research already offers approaches to enable automated resource provisioning and application orchestration in the cloud based on Cloud Orchestration Tools (COTs) [4]. Integrating COTs into evaluation frameworks can be an option to ease the distributed execution of evaluation runs as well as the orchestration of database clusters across different cloud resources. As COTs provide monitoring and adaptation capabilities, they can ease the evaluation of dynamic cluster transitions by defining advanced evaluation scenarios with dynamic database cluster adaptations.

Future work will comprise the analysis of COTs with respect to their exploitation in database evaluation frameworks. In addition, the design and implementation of a cloud-centric database evaluation framework is ongoing.

Acknowledgements. The research leading to these results has received funding from the EC’s Framework Programme HORIZON 2020 under grant agreement number 644690 (CloudSocket) and 731664 (MELODIC). We thank Moritz Keppler and the Daimler TSS for their valuable and constructive discussions.

References

1. Agrawal, D., El Abbadi, A., Das, S., Elmore, A.J.: Database scalability, elasticity, and autonomy in the cloud. In: Yu, J.X., Kim, M.H., Unland, R. (eds.) DASFAA 2011. LNCS, vol. 6587, pp. 2–15. Springer, Heidelberg (2011). doi:10.1007/978-3-642-20149-3_2
2. Armstrong, T.G., Ponnekanti, V., Borthakur, D., Callaghan, M.: LinkBench: a database benchmark based on the Facebook social graph. In: SIGMOD (2013)
3. Barahmand, S., Ghandeharizadeh, S.: BG: a benchmark to evaluate interactive social networking actions. In: CIDR (2013)
4. Baur, D., Seybold, D., Griesinger, F., Tsitsipas, A., Hauser, C.B., Domaschka, J.: Cloud orchestration features: are tools fit for purpose? In: UCC (2015)
5. Bermbach, D., Kuhlenkamp, J.: Consistency in distributed storage systems. In: Gramoli, V., Guerraoui, R. (eds.) NETYS 2013. LNCS, vol. 7853, pp. 175–189. Springer, Heidelberg (2013). doi:10.1007/978-3-642-40148-0_13
6. Bermbach, D., Kuhlenkamp, J., Dey, A., Sakr, S., Nambiar, R.: Towards an extensible middleware for database benchmarking. In: Nambiar, R., Poess, M. (eds.) Performance Characterization and Benchmarking: Traditional to Big Data. LNCS, pp. 82–96. Springer, Cham (2015). doi:10.1007/978-3-319-15350-6_6
7. Cooper, B.F., Silberstein, A., Tam, E., Ramakrishnan, R., Sears, R.: Benchmarking cloud serving systems with YCSB. In: SoCC (2010)
8. Dey, A., Fekete, A., Nambiar, R., Röhm, U.: YCSB+T: benchmarking web-scale transactional databases. In: ICDEW (2014)
9. Difallah, D.E., Pavlo, A., Curino, C., Cudre-Mauroux, P.: OLTP-Bench: an extensible testbed for benchmarking relational databases. VLDB 7, 277–288 (2013)
10. Dory, T., Mejias, B., Roy, P., Tran, N.L.: Measuring elasticity for cloud databases. In: Cloud Computing (2011)
11. Friedrich, S., Wingerath, W., Gessert, F., Ritter, N., Plödereder, E., Grunske, L., Schneider, E., Ull, D.: NoSQL OLTP benchmarking: a survey. In: GI-Jahrestagung (2014)
12. Ghazal, A., Rabl, T., Hu, M., Raab, F., Poess, M., Crolotte, A., Jacobsen, H.A.: BigBench: towards an industry standard benchmark for big data analytics. In: SIGMOD (2013)
13. Gilbert, S., Lynch, N.: Brewer's conjecture and the feasibility of consistent, available, partition-tolerant web services. ACM SIGACT News 33, 51–59 (2002)
14. Gray, J.: Benchmark Handbook: For Database and Transaction Processing Systems. Morgan Kaufmann Publishers Inc., San Francisco (1993)
15. Grolinger, K., Higashino, W.A., Tiwari, A., Capretz, M.A.: Data management in cloud environments: NoSQL and NewSQL data stores. JoCCASA 2, 22 (2013)

16. Khazaei, H., Fokaefs, M., Zareian, S., Beigi-Mohammadi, N., Ramprasad, B., Shtern, M., Gaikwad, P., Litoiu, M.: How do I choose the right NoSQL solution? A comprehensive theoretical and experimental survey. BDIA 2, 1 (2016)
17. Mell, P., Grance, T.: The NIST definition of cloud computing. Technical report, National Institute of Standards & Technology (2011)
18. Patil, S., Polte, M., Ren, K., Tantisiriroj, W., Xiao, L., López, J., Gibson, G., Fuchs, A., Rinaldi, B.: YCSB++: benchmarking and performance debugging advanced features in scalable table stores. In: SoCC (2011)
19. Reniers, V., Van Landuyt, D., Rafique, A., Joosen, W.: On the state of NoSQL benchmarks. In: ICPE (2017)
20. Sadalage, P.J., Fowler, M.: NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Pearson Education, London (2012)
21. Seybold, D., Domaschka, J.: A cloud-centric survey on distributed database evaluation. Technical report, Ulm University (2017)
22. Seybold, D., Wagner, N., Erb, B., Domaschka, J.: Is elasticity of scalable databases a myth? In: IEEE Big Data (2016)

Chapter 11 [core4] Gibbon: An Availability Evaluation Framework for Distributed Databases

This article is published as follows:

Material from: Daniel Seybold, Christopher B. Hauser, Simon Volpert, and Jörg Domaschka, ”Gibbon: An Availability Evaluation Framework for Distributed Databases”, On the Move to Meaningful Internet Systems (OTM), published 2017, Springer International Publishing, DOI: https://doi.org/10.1007/978-3-319-69459-7_3

Reprinted with permission from Springer Nature.

Gibbon: An Availability Evaluation Framework for Distributed Databases

Daniel Seybold, Christopher B. Hauser, Simon Volpert, and Jörg Domaschka

Institute of Information Resource Management, Ulm University, Ulm, Germany {daniel.seybold,christopher.hauser,simon.volpert, joerg.domaschka}@uni-ulm.de

Abstract. Driven by new application domains, the database management systems (DBMSs) landscape has significantly evolved from single node DBMSs to distributed database management systems (DDBMSs). In parallel, cloud computing became the preferred solution to run distributed applications. Hence, modern DDBMSs are designed to run in the cloud. Yet, in distributed systems the probability of failures grows with the number of entities involved, and the use of cloud resources increases the probability of failures even more. Therefore, DDBMSs apply data replication across multiple nodes to provide high availability. Yet, high availability limits consistency or partition tolerance, as stated by the CAP theorem. As the decision for two of the three attributes is not binary, the heterogeneous landscape of DDBMSs gets even more complex when it comes to their high availability mechanisms. Hence, the selection of a highly available DDBMS to run in the cloud becomes a very challenging task, as supportive evaluation frameworks are not yet available. In order to ease the selection and increase the trust in running DDBMSs in the cloud, we present the Gibbon framework, a novel availability evaluation framework for DDBMSs. Gibbon defines quantifiable availability metrics, a customisable evaluation methodology and a novel evaluation framework architecture. Gibbon is discussed by an availability evaluation of MongoDB, analysing the takeover and recovery time.

Keywords: Distributed database · Database evaluation · High avail- ability · NoSQL · Cloud

1 Introduction

The landscape of database management systems (DBMSs) has evolved significantly over the last decade, especially when it comes to large-scale DBMS installations. While relational database management systems (RDBMS) have been the common choice for persisting data over decades, the rise of the Web and new application domains such as Big Data and the Internet of Things (IoT) drive the need for novel DBMS approaches. NoSQL and only recently NewSQL


DBMSs [15] have evolved. Such DBMSs are often designed as a distributed database management system (DDBMS), spread out over multiple database nodes and supposed to run on commodity or even virtualised hardware. Due to their distributed architecture, DDBMSs support horizontal scalability and consequently the dynamic allocation and usage of compute, storage and network resources [1] based on the actual demand. This makes Infrastructure as a Service (IaaS) cloud platforms a suited choice for running DDBMSs, as IaaS provides elastic, on-demand, self-service resource provisioning [19].

Even though distributed architectures can be used to improve the availability of the overall system, this is not automatically the case. In particular, in distributed systems the probability of failures grows with the number of entities involved, i.e. in the case of DDBMSs with the number of data nodes used. When cloud resources are used, the situation is worsened, as failures on lower levels may affect multiple virtualised resources and cause mass failures [13,16].

With respect to DDBMSs, the common approach to availability is to replicate data items across multiple database nodes to ensure high availability in case of resource failures. However, the desire for availability is hindered by the CAP theorem, stating that availability is achieved by sacrificing consistency guarantees or partition tolerance [6]. However, the choice between two of these three attributes is not a binary decision and offers multiple trade-offs [5]. This has led to a very heterogeneous DDBMS landscape, not only with respect to the sheer number of systems, but also to the availability mechanisms they provide [11]. Hence, we find ourselves in the situation that more DDBMSs exist than ever, each of them promising availability features and often enough even high availability. Further, many of these DDBMSs are operated on IaaS infrastructures with an increased risk of mass failures. Yet, for none of the DDBMSs it is known how it actually behaves under failure conditions and how the failure conditions affect the availability of the DDBMS. At the same time, no supportive frameworks for evaluating the availability of DDBMSs exist [24] and, in consequence, selecting a DDBMS becomes a gamble for users if availability is a major selection criterion.

In order to increase the trust in running DDBMSs on cloud resources and to ease the selection of highly available DDBMSs, we present Gibbon, a novel framework for evaluating the availability of DDBMSs. It explicitly supports cloud failure scenarios. Hence, our contribution is threefold: (1) we identify quantifiable metrics to evaluate availability; (2) we define an extensible evaluation methodology; (3) we present a novel availability evaluation framework architecture.

The remainder is structured as follows: Sect. 2 introduces the background on availability, DDBMSs and cloud computing, while Sect. 3 analyses the impact of failures. Sect. 4 defines the availability metrics while Sect. 5 presents the evaluation methodology. Sect. 6 presents the framework architecture. Sect. 7 discusses the presented framework and Sect. 8 presents related work. Sect. 9 concludes.

2 Background

In order to consolidate the context of the Gibbon framework, we introduce in this section the background on DDBMSs, availability, and DDBMSs on IaaS.

2.1 Distributed Database Systems

Per definition, a DBMS manages data items grouped in collections or databases. For a DDBMS these data items are spread out over multiple database nodes, each of which runs management logic. Hence, the database nodes communicate and cooperate in order to realise the expected functionality. The use of distribution has two conceptually unrelated benefits: (i) more data can be stored and processed, as the overall available capacity is the sum of the capacities of the individual database nodes. In this usage scenario, the sharding (partitioning) strategy defines which data items are stored on which of the database nodes. (ii) data items can be stored redundantly on multiple database nodes, protecting them against failures of individual database nodes. A replication degree of n denotes that a data item is stored n times in the system.

Replication. When sharding is used without replication, no tolerance against node failures exists. On the other hand, using replication without sharding means that all data is available on all database nodes. This is also referred to as full replication. When sharding and replication are used in parallel, each database node will contain only a subset of all data items [9]. Depending on the replication strategy [22], a user is allowed to interact with only one of the replicas (master-slave replication) or with all of them (multi-master). In the master-slave approach, one of the n physical copies of a data item has the master role. Only this item can be changed by users. It is the task of the database node hosting this item to synchronise the slave nodes. The latter only execute read requests. A failure of the master requires the re-election of a new master amongst the remaining slaves. In the multi-master approach, any replica can be updated and the hosting database nodes have to coordinate changes. Geo-replication caters for mirroring the entire DDBMS cluster to a different location in order to protect the data against catastrophic events.

Replica Consistency. Having multiple copies of the same data item in the system requires keeping the copies in sync with each other. This is particularly true for multi-master approaches. Here, two approaches exist: synchronous propagation ensures that a change is coordinated amongst all database nodes hosting a replica before it is confirmed to a client. Asynchronous propagation in turn delays this, so that multiple diverging copies of the item can exist in the system and clients can perform conflicting updates unnoticed.

Node and Task Types. Besides storing data items, a DDBMS has several other tasks to do: management tasks keep track of the location of data items, route queries to the right destination, and detect node failures. Query tasks process queries issued by the client and interact with the database nodes according to the distribution information the management task stores. Depending on the actual DDBMS, these tasks are executed by all database nodes in a peer-to-peer manner, isolated in separate nodes, or even as part of the database driver used by the client.

Storage Models. For DBMSs three top-level categories are currently in use [15]: relational, NoSQL and NewSQL data stores. Relational data stores target transactional workloads, providing strong consistency guarantees based on the ACID paradigm [17]. Hence, relational data stores are originally designed as single server DBMSs, which have been extended lately to support distribution (e.g. MySQL Cluster, https://www.mysql.com/products/cluster/).
NoSQL storage models can be classified into key-value stores, document-oriented stores, column-oriented stores and graph-oriented stores. Compared to relational data stores, their consistency guarantees are weaker and tend towards BASE [21]. This weaker consistency makes those DDBMSs better suited for distributed architectures and eases the realisation of features such as scalability and elasticity. NewSQL data stores are inspired by the relational data model and target strong consistency in conjunction with a distributed architecture.
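The interplay of sharding and replication can be illustrated by a placement function that maps each data item to a primary database node plus further replica nodes. The modulo-based placement in the following Java sketch is a didactic simplification and does not reflect how any particular DDBMS assigns replicas.

import java.util.ArrayList;
import java.util.List;

public class ShardedPlacement {

    private final List<String> nodes;       // database nodes of the cluster
    private final int replicationDegree;    // n: each data item is stored n times

    public ShardedPlacement(List<String> nodes, int replicationDegree) {
        if (replicationDegree > nodes.size()) {
            throw new IllegalArgumentException("replication degree exceeds cluster size");
        }
        this.nodes = List.copyOf(nodes);
        this.replicationDegree = replicationDegree;
    }

    /** Returns the nodes holding the given key: the first entry acts as master, the rest as slaves. */
    public List<String> replicasFor(String key) {
        int primary = Math.floorMod(key.hashCode(), nodes.size());
        List<String> placement = new ArrayList<>();
        for (int i = 0; i < replicationDegree; i++) {
            placement.add(nodes.get((primary + i) % nodes.size())); // successive nodes hold the replicas
        }
        return placement;
    }

    public static void main(String[] args) {
        ShardedPlacement placement =
                new ShardedPlacement(List.of("db1", "db2", "db3", "db4"), 3);
        System.out.println(placement.replicasFor("customer:42"));
    }
}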

2.2 Availability for DDBMSs

Generally, availability is defined as the degree to which a system is operational and accessible when required for use [14]. Besides reliability (the "measure of the continuity of correct service" [3]), availability is the main pillar of many fault-tolerant implementations [11]. In this paper, we solely focus on the availability aspect and assume that DDBMSs are reliable, i.e. work as specified. As our primary metric, we use what Zhong et al. call expected service availability and define the availability of a DDBMS as the proportion of successful requests [27] over all requests. The availability of the DDBMS can be affected by two kinds of conditions [11]: (i) a high number of requests issued concurrently by clients, overloading the DDBMS such that the requests of clients cannot be handled at all or are handled with an unacceptable latency > Δt; (ii) failures that impact network connectivity or the availability of data items. The failure scenarios we consider are subject to Sect. 2.3.

2.3 Cloud Infrastructure

IaaS clouds have become a preferable way to run DDBMSs. Due to its cloud nature, IaaS offers more flexibility than bare metal resources. IaaS provides processing, storage, and network resources to run arbitrary software [19]. The processing and storage resources are typically encapsulated in a virtual machine (VM) entity that also includes the operating system (OS). VMs run on hypervisors on top of the physical infrastructure of the IaaS provider. As cloud providers typically operate multiple data centres, IaaS eases spanning DDBMSs across different geographical locations. The location of cloud resources can be classified into geographical locations (regions), data centres inside a region (availability zones), racks inside a data centre and physical hosts inside a rack. This set-up heavily influences availability, as failures on different levels of that stack can have an impact on individual database nodes (running in one VM), but also on larger sets of them. For instance, the failure of a hypervisor will lead to the failure of all VMs on that hypervisor. An exemplary DDBMS deployment on IaaS is depicted in Fig. 1. Here, an 8-node DDBMS is deployed across two regions of one cloud provider. Each database node is placed on a VM and the VMs rely on different availability zones of the respective region. The example also illustrates the use of heterogeneous physical hardware: availability zones B and C are built upon physical servers without disks and dedicated storage servers. Availability zones A and D are built upon servers with built-in disks.

Fig. 1. DDBMS on IaaS

3 Failure Impact Analysis

Section 2.2 stated that two types of events impact the availability of DDBMSs: overload and failures. Dealing with overload has seen much attention in the literature (cf. Sect. 8), so this paper focuses on the latter, which has barely received attention in the past [24]. In particular, our work addresses the capability of DDBMS availability mechanisms to overcome failures in IaaS environments.

3.1 Replication for Availability

In Sect. 2.1, we introduced replication and partitioning as two major, but basically unrelated concepts used in DDBMSs. This section investigates the impact of their use under failure conditions.

No Replication. Apparently, when a database node fails and no replication is used, all data items stored on that database node become unavailable. The impact on the overall availability of the DDBMS depends on the access pattern of the failed shard, but for a uniform distribution the availability will drop to (N−1)/N while this database node is unavailable; e.g. for N = 8 database nodes, a single node failure reduces the availability to 7/8.

Master-Slave Replication. When master-slave replication is used, the failure of a single database node has less impact, as copies of the data item are in the system. Yet, the process of detecting the failure and finding a new master for all data items from the failed database node needs time, no matter if the new master is elected manually or automatically [11]. Hence, for a uniform distribution of access, availability will still drop to (N−1)/N, yet hopefully for a shorter time.

Multi-master Replication. When using multi-master replication, the failure of a single database node does not affect the overall availability, as any other database node hosting a replica of the requested data item can be contacted. Nevertheless, depending on the driver implementation and the routing, a small portion of requests may fail, namely those connected to the failed database node at the time of failure.

Functional Nodes. Some DDBMSs make use of additional hosts that function as entry points for clients or as a registry storing the mapping from data items to database nodes. The failure of these nodes also has an impact on the overall availability. In particular, if configured wrongly, the failure of any of those nodes can render the entire DDBMS unavailable.

3.2 Failures and Recovery

This section investigates failures and recovery of DDBMSs hosted on clouds. In particular, it derives how to emulate the failure of certain resources for the sake of evaluating availability metrics of different DDBMSs.

From Fig. 1 we can see that a cloud-operated DDBMS sits on top of several layers of hard- and software. Hence, even assuming that both the DDBMS code and the operating system surrounding it are correct, the large stack leaves many opportunities for things to go wrong: on the infrastructure level, servers, network links, network devices, power supplies, or cooling may fail. Similarly, on the software level, management software and hypervisors can fail; algorithms, networks, and devices can be buggy, misconfigured, or in the process of being restarted. Any of these failures can affect virtual machines and virtual networks, but also physical servers, physical networks, entire racks, complete availability zones, or even entire data centres. Table 1 lists these failure levels together with a way to emulate the failure and an action that helps to recover from the failure.

The failure of a database node or a virtual machine can be represented as virtual machine unavailability and can be emulated by forcibly terminating the virtual machine. Here, it is important to ensure that no additional clean-up tasks, e.g. closing network connections, get executed. The failure of a physical server leads to the unavailability of all virtual machines hosted on that server.

Table 1. DDBMS failures in a Cloud Infrastructure

Failure level | Emulate | Recovery
DDBMS node | Forcibly terminate VM | Replace VM
VM | Forcibly terminate VM | Replace VM
Physical server | Forcibly terminate all VMs on server | Replace VMs / move zone
Availability zone | Forcibly terminate all VMs in zone | Move zone or region
Region | Forcibly terminate all VMs in region | Move region / provider
Cloud provider | Forcibly terminate all VMs hosted by provider | Move provider

The failure of a full availability zone or even an entire region leads to the unavailability of multiple physical servers and hence to the unavailability of all hosted virtual machines.
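Emulating the failure levels of Table 1 essentially means forcibly terminating the set of VMs selected by the respective failure level. The Java sketch below assumes a hypothetical CloudClient wrapper around the provider or orchestration API; it illustrates the emulation idea and is not Gibbon's implementation, which delegates these calls to the orchestration tool (cf. Sect. 6).

import java.util.List;

public class FailureInjector {

    /** Hypothetical, minimal abstraction over the cloud provider / orchestration tool API. */
    interface CloudClient {
        List<String> vmsInAvailabilityZone(String zone);

        /** Forcibly terminates the VM without graceful shutdown or connection clean-up. */
        void forceTerminate(String vmId);
    }

    private final CloudClient cloud;

    public FailureInjector(CloudClient cloud) {
        this.cloud = cloud;
    }

    /** Emulates the failure of a single database node / VM. */
    public void failVm(String vmId) {
        cloud.forceTerminate(vmId);
    }

    /** Emulates the failure of a whole availability zone by terminating every VM placed in it. */
    public void failAvailabilityZone(String zone) {
        for (String vmId : cloud.vmsInAvailabilityZone(zone)) {
            cloud.forceTerminate(vmId);
        }
    }
}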

4 Availability Metrics

From the previous sections we derive input parameters and output metrics to evaluate the availability of DDBMSs running on cloud infrastructures. Input parameters describe the deployment and evaluation specifications of the DDBMS, while output metrics describe the experienced availability after the input parameters have been applied. Hence, a tuple of input parameters and output metrics provides the basis for the availability evaluation (cf. Sect. 5).

4.1 Input Parameters

The input parameters as listed in Table 2 comprise the deployment and evaluation specification. The deployment specification combines the replication and partitioning characteristics (cf. Sect. 3.1), failure and recovery characteristics (cf. Sect. 3.2) and deployment information of the DDBMS nodes. The first two input parameters define the replication setting of the DDBMS, i.e. the node replication level and the cross data centre geo-replication level. None, one, or both replication levels might be configured for the DDBMS under observation. Replication is first defined by the strategy, then by the number of replicas the DDBMS maintains in a normal, healthy state, and by the update laziness, i.e. how write requests are synchronised between replicas.

Partitioning defines the setting for data partitioning, if present. If partitioning takes place, the number of partitions is specified, as well as how the distribution of data items to partitions is handled (group based, range based, or with hashing functions). For accessing the distributed data items, the data access is of importance, namely whether the client connects to the correct partition directly, via a proxy, or whether requests are routed automatically.

Table 2. Input parameters

Input parameter | Description
Deployment
Node replication | Strategy (single-/multi-master, selection), replicas, laziness
Geo-replication | Equals "node replication"
Partitioning | Number of partitions, strategy (range, hash), data access (client, proxy, routing)
Resources | Hierarchical infrastructure model of the DDBMS (cf. Fig. 1)
Evaluation
Failure spec | Number of failing nodes per level (cf. Table 1), number of failing nodes per node type
Recovery spec | Restart policies, number of database nodes to add (per node type if existent)
Workload spec | Requests per second, read/write request ratio, number of data items

The resources parameter contains a full model of the allocated infrastructure resources for the DDBMS. This model includes all infrastructure entities involved, from geographic locations to physical servers, virtual machines and DDBMS nodes (cf. Fig. 1). If the DDBMS differentiates node types, the type is reflected in the resource information as well.

Further, Table 2 also presents the evaluation specification parameters. The failure specification parameter defines a failure scenario which will be emulated by the Gibbon framework. For each level of potential failures described in Table 1, the number of failing resource entities and (optionally) the DDBMS node type to fail is defined. The recovery specification parameter on the other hand describes the emulated recovery plan, such as restarting virtual machines or adding new nodes to the DDBMS. The optional recovery specification parameter can be omitted. While the failure and recovery specifications are applied, the DDBMS is continuously under an artificial workload, defined by the workload specification parameter. This parameter defines the amount of read and write requests, as well as the total amount of data items stored in the DDBMS.
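The input parameters of Table 2 can be captured in a small specification object. The following Java sketch is illustrative; the field and record names are assumptions and do not reproduce the exact specification schema of Gibbon.

import java.util.List;
import java.util.Map;

/** Illustrative container for the deployment and evaluation specification of one evaluation run. */
public class EvaluationSpec {

    enum FailureLevel { DBMS_NODE, VM, PHYSICAL_SERVER, AVAILABILITY_ZONE, REGION, CLOUD_PROVIDER }

    record Failure(FailureLevel level, int quantity, String nodeType) {}

    record Recovery(boolean restartFailedVms, int nodesToAdd) {}

    record Workload(int requestsPerSecond, double readRatio, long numberOfDataItems) {}

    record Replication(String strategy, int replicas, boolean asynchronous) {}

    // deployment specification
    Replication nodeReplication;
    Replication geoReplication;              // may be null if no geo-replication is used
    int partitions;
    Map<String, String> resources;           // node name -> cloud resource / location

    // evaluation specification
    List<Failure> failures;
    Recovery recovery;                       // optional: may be null to skip the recovery phase
    Workload workload;
}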

4.2 Output Metrics

The output metrics presented in Table 3 represent the experienced availability after the input parameters, i.e. the deployment and evaluation specification, have been applied. The output metrics result from continuously monitoring the system while the input parameters are applied. The accessibility α defines whether the database is still reachable by clients (accepting incoming connections) and accepts read and write requests. While accessibility represents boolean values over time, the performance impact φ represents the throughput the DDBMS can handle during the evaluation scenario, including the amount of requests and the latency of request handling.

Table 3. Output metrics

Output metric | Description
Statistics
Accessibility | DDBMS is accessible for client requests (read and write)
Performance impact | Throughput (read/write requests per second), latency of requests
Request error rate | Amount of failed requests due to data unavailability
Times
Take over time | Time until the failure spec is masked by the DDBMS
Recovery time | Time until the recovery spec is applied by the DDBMS

For instance, the performance may decrease if replicas are down. In case of node failures, not all data partitions of a DDBMS may be available until the failure is handled. The request error rate ε describes, as an output metric, the amount of failed requests due to data unavailability over the evaluation time.

The output metrics take over time and recovery time specify the measured time the DDBMS required to identify and mask the applied failure specification, and the time it takes to apply the recovery specification. The accessibility, the performance impact and the request error rate are time series values over the time the evaluation scenario is being applied. These values are measured periodically at runtime, are then aggregated and statistically represented in a percentile ranking, separately for the time it takes to apply (i) the failure specification and (ii) the recovery specification.

From the described output metrics, the overall availability metric of the evaluated DDBMS can be calculated. From the output metrics, only a configurable range of percentiles (e.g. > 90) is considered. Each of the three metrics accessibility α, request error rate ε and performance impact φ gets its configurable weighting factor Wα, Wε, Wφ, resulting in the overall availability metric defined as Θ = α ∗ Wα + ε ∗ Wε + φ ∗ Wφ.
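The weighted availability metric Θ can be computed from the monitored time series as sketched below in Java. The percentile cut-off and the normalisation of all three metrics to values in [0, 1] (higher is better) are assumptions made for illustration; in particular, the error-rate series is represented here by the fraction of successful requests per monitoring interval.

import java.util.Arrays;

public class AvailabilityMetric {

    /** Returns the p-th percentile (0 < p <= 100) of the given samples. */
    static double percentile(double[] samples, double p) {
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int index = (int) Math.ceil(p / 100.0 * sorted.length) - 1;
        return sorted[Math.max(0, index)];
    }

    /**
     * Overall availability Θ = α ∗ Wα + ε ∗ Wε + φ ∗ Wφ, where α, ε and φ are the
     * percentile-filtered, normalised accessibility, request error rate and
     * performance impact (all assumed to be in [0, 1], higher = better).
     */
    static double theta(double alpha, double epsilon, double phi,
                        double wAlpha, double wEpsilon, double wPhi) {
        return alpha * wAlpha + epsilon * wEpsilon + phi * wPhi;
    }

    public static void main(String[] args) {
        // accessibility samples: 1 = reachable, 0 = not reachable
        double[] accessibility = {1, 1, 1, 0, 1, 1, 1, 1, 1, 1};
        // fraction of successful requests per monitoring interval
        double[] successRate   = {1.0, 0.97, 0.85, 0.0, 0.92, 1.0, 1.0, 1.0, 1.0, 1.0};
        // throughput normalised against the healthy-state throughput
        double[] relThroughput = {1.0, 0.95, 0.6, 0.1, 0.7, 0.9, 1.0, 1.0, 1.0, 1.0};

        double alpha   = percentile(accessibility, 90);
        double epsilon = percentile(successRate, 90);
        double phi     = percentile(relThroughput, 90);

        System.out.printf("Θ = %.3f%n", theta(alpha, epsilon, phi, 0.5, 0.3, 0.2));
    }
}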

5 Availability Evaluation

In this section we present an extensible methodology to evaluate the availability capabilities of DDBMSs based on the defined input parameters and output metrics (cf. Sect. 4). Therefore, we define an adaptable evaluation process, which emulates the previously introduced failure levels and enacts the respective recovery actions. This process enables the monitoring of the defined availability metrics in order to analyse the high availability efficiency of the evaluated DDBMS. First, we introduce the evaluation process, defining the required evaluation states and transitions. Second, we present an algorithm to inject cloud resource failures on different levels, based on a predefined failure specification.

5.1 Evaluation Process

Emulating cloud resource failures and monitoring the defined availability metrics requires the definition of a thorough and adaptable evaluation process, from the DDBMS deployment in the cloud over the emulation of cloud resource failures to the recovery of the DDBMS. A fine-grained evaluation process is depicted in Fig. 2, where the monitoring periods of the availability metrics are presented in the white boxes, the evaluation process states in the yellow boxes and the framework components in the blue boxes. In the following we introduce the evaluation process states and the monitoring periods, while the framework components are described in Sect. 6.

Fig. 2. Evaluation process

The evaluation process depicted in Fig. 2 illustrates an exemplary evaluation, which runs through all defined states exactly once by executing the respective transitions (cf. Tables 4 and 5). Yet, it is also possible to leave out dedicated transitions such as the recovery action. The Gibbon framework executes each evaluation process and also supports the combination of multiple evaluation processes into an evaluation scenario.

Throughout each evaluation process, the defined availability metrics (cf. Sect. 4) need to be monitored. Yet, the monitoring period for each availability metric depends on the evaluation process state as depicted in Fig. 2. The performance impact and accessibility are monitored from the healthy to the finished state to analyse the performance development over the intermediate states and compare it with the performance at the beginning.

Table 4. Evaluation states

State | Description
Initialised | A new evaluation process is triggered
Ready | All nodes of the DDBMS are deployed in the cloud and the configuration is finished, i.e. the DDBMS is operational
Healthy | All nodes of the DDBMS are serving requests
Unhealthy | n nodes of the DDBMS are not operational due to cloud resource failures; the DDBMS has not yet started take over actions
Masked | The DDBMS automatically initiated the take over of n replicas to re-establish the availability of all data records; the DDBMS is operational again, but with −n nodes
Recovered | The number of nodes complies again with the initial number of nodes; all nodes of the DDBMS are serving requests
Finished | The evaluation process is finished

Table 5. Evaluation transitions

Transition | Description
Deploy | The DDBMS is deployed and configured, i.e. dedicated VMs are allocated in the specified locations
Start workload | A constant workload is started against the deployed DDBMS
Trigger failure | A specified cloud resource failure is emulated by the DB Gibbon component (cf. Sect. 5.2), provoking the failure of n nodes
Take over | The DDBMS recognises the failure of n nodes and initialises the take over of the remaining replicas by propagating the new locations for the currently unavailable data records
Recovery action | The DDBMS is restored to its initial number of nodes by adding n new nodes. In this process the new nodes are integrated into the running DDBMS and the data is redistributed
Stop workload | The workload is stopped

The request error rate is monitored from the unhealthy to the recovered state to analyse the development of the failed request rate. The take over time is monitored in the transition from the unhealthy to the masked state, while the recovery time is monitored in the transition from the masked to the recovered state.
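The states of Table 4 and the transitions of Table 5 form a simple state machine, which can be encoded to validate an evaluation process before execution. The enum and transition map in the following Java sketch are illustrative and are not Gibbon's actual implementation.

import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

public class EvaluationStateMachine {

    enum State { INITIALISED, READY, HEALTHY, UNHEALTHY, MASKED, RECOVERED, FINISHED }

    /** Allowed transitions according to the evaluation process (the recovery action is optional). */
    private static final Map<State, Set<State>> TRANSITIONS = new EnumMap<>(Map.of(
            State.INITIALISED, EnumSet.of(State.READY),                     // deploy
            State.READY,       EnumSet.of(State.HEALTHY),                   // start workload
            State.HEALTHY,     EnumSet.of(State.UNHEALTHY),                 // trigger failure
            State.UNHEALTHY,   EnumSet.of(State.MASKED),                    // take over
            State.MASKED,      EnumSet.of(State.RECOVERED, State.FINISHED), // recovery action or stop workload
            State.RECOVERED,   EnumSet.of(State.FINISHED),                  // stop workload
            State.FINISHED,    EnumSet.noneOf(State.class)
    ));

    private State current = State.INITIALISED;

    public void transitionTo(State next) {
        if (!TRANSITIONS.get(current).contains(next)) {
            throw new IllegalStateException("illegal transition " + current + " -> " + next);
        }
        current = next;
    }

    public State current() {
        return current;
    }
}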

5.2 DB Gibbon Algorithm

In order to emulate failing cloud resources on the different levels, we build upon the concepts of Netflix's Simian Army [26] and adapt these concepts for DDBMSs in the cloud. Therefore, we define an algorithm which is able to enact the introduced cloud failure levels (cf. Sect. 3.2) for DDBMSs. Following the Simian Army [26] concepts, we name our algorithm the DB Gibbon. The algorithm is depicted in Listing 1.1 and requires as input parameters a list of failures and the dbDeployment. The failures list contains n failure objects with the attributes failureLevel, indicating the cloud resource failure level (cf. Table 1), failureQuantity, specifying the number of failures to be enacted by the DB Gibbon, and nodeType, specifying the type of the failing nodes. Possible nodeTypes correspond to the node and task types introduced in Sect. 2.1. The dbDeployment parameter contains the information of the deployed DDBMS topology, i.e. the mapping of each node to its cloud resource. Based on these parameters, the DB Gibbon algorithm enacts the specified failures in the transition to the unhealthy state as depicted in Fig. 2. After enacting all failures, the algorithm returns the updated deployment, which is used to trigger the recovery action by calculating the difference to the initial deployment and deriving the required recovery actions.

Listing 1.1. DB Gibbon Algorithm

input  : failures : List<Failure>, dbDeployment
output : dbDeployment
begin
  for each failure in failures
    if failure.failureLevel == FailureLevel.VM
      // fail failureQuantity randomly chosen VMs of the given node type
      def vms : List<VM> ← dbDeployment.getVMsOfNodeType(failure.nodeType)
      for (int i = 0; i < failure.failureQuantity; i++)
        def failedVM : VM ← failRandomVM(vms)
        dbDeployment ← updateDeployment(dbDeployment, failedVM)
      end
    else if failure.failureLevel == FailureLevel.AVAILABILITY_ZONE
      // fail all VMs of the given node type in failureQuantity availability zones
      for (int i = 0; i < failure.failureQuantity; i++)
        def vms : List<VM> ← dbDeployment.availabilityZone(i).getVMsOfNodeType(failure.nodeType)
        def failedVMs : List<VM> ← failVMs(vms)
        dbDeployment ← updateDeployment(dbDeployment, failedVMs)
      end
    else if failure.failureLevel == FailureLevel.REGION
      // fail all VMs of the given node type in failureQuantity regions
      for (int i = 0; i < failure.failureQuantity; i++)
        def vms : List<VM> ← dbDeployment.region(i).getVMsOfNodeType(failure.nodeType)
        def failedVMs : List<VM> ← failVMs(vms)
        dbDeployment ← updateDeployment(dbDeployment, failedVMs)
      end
    // only applicable to private cloud deployments
    else if failure.failureLevel == FailureLevel.PHYSICAL_HOST
      for (int i = 0; i < failure.failureQuantity; i++)
        def vms : List<VM> ← dbDeployment.physicalHost(i).getVMsOfNodeType(failure.nodeType)
        def failedVMs : List<VM> ← failVMs(vms)
        dbDeployment ← updateDeployment(dbDeployment, failedVMs)
      end
    else
      return FAIL
  end
  return dbDeployment
end
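An invocation of the DB Gibbon algorithm could be prepared as in the following Java sketch; the Failure record mirrors the failure objects described above, and the node type name "data" is a hypothetical example.

import java.util.List;

public class DbGibbonExample {

    enum FailureLevel { VM, AVAILABILITY_ZONE, REGION, PHYSICAL_HOST }

    /** Mirrors the failure objects expected by the DB Gibbon algorithm (illustrative). */
    record Failure(FailureLevel failureLevel, int failureQuantity, String nodeType) {}

    public static void main(String[] args) {
        // Fail two data-node VMs and one complete availability zone in a single evaluation run.
        List<Failure> failures = List.of(
                new Failure(FailureLevel.VM, 2, "data"),
                new Failure(FailureLevel.AVAILABILITY_ZONE, 1, "data"));

        failures.forEach(f -> System.out.printf("inject %s failure(s): level=%s, nodeType=%s%n",
                f.failureQuantity(), f.failureLevel(), f.nodeType()));
        // The list together with the current dbDeployment would be passed to the DB Gibbon
        // algorithm, which returns the updated deployment used to derive the recovery actions.
    }
}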

6 Architecture

This section presents the architecture of the novel Gibbon framework for executing the introduced evaluation methodology (cf. Fig. 2). A high-level view of the architecture is depicted in Fig. 3, introducing the technical framework components and their interactions with each other. The entry point to the Gibbon framework is the evaluation API, which expects the evaluation scenario specification. This specification comprises four sub-specifications for the respective framework components. In the following, each framework component is explained with respect to the required specification and its technical details.

Fig. 3. Gibbon evaluation framework architecture

Orchestrator. The orchestrator receives the deployment specification, which comprises the description of the required cloud resources and the actual DDBMS with its configuration. Hence, the orchestrator interacts with the cloud provider APIs to provision the cloud resources and to orchestrate the DDBMS on these resources. As carved out in our previous work, the usage of advanced Cloud Orchestration Tools (COTs) over basic DevOps tools is preferable, as COTs abstract cloud provider APIs and provide monitoring and adaptation capabilities during runtime [4]. The monitoring capabilities comprise general system metrics as well as customisable application-specific metrics. Hence, the monitoring capabilities of COTs can be exploited to measure metrics such as the DDBMS nodes’ state or ongoing maintenance operations, which are required to express the availability metric (cf. Sect. 4). COTs offer run-time adaptations such as deleting or suspending cloud resources or DDBMS nodes, adding new cloud resources and DDBMS nodes, or the execution of additional applications on the existing DDBMS nodes. These capabilities are used by the DB Gibbon and recovery agent components. The Gibbon framework builds upon the Cloudiator COT (http://cloudiator.org/) [10], which supports the main public IaaS providers (Amazon EC2, Google Compute, Microsoft Azure) as well as private clouds built upon OpenStack and provides advanced cross-cloud monitoring and adaptation capabilities [12].

DB Gibbon. This component is responsible for emulating cloud resource failures based on the provided failure specification. It queries the COT for the current DDBMS deployment information, i.e. the mapping of nodes to cloud resources and their location. Based on the deployment information and the failure specification, it executes the DB Gibbon algorithm (cf. Sect. 5.2). In the execution phase of the algorithm, the DB Gibbon interacts with the COT to enact the actual failures on the cloud resource level, e.g. deleting all VMs of the DDBMS that reside in availability zone A.

Recovery Agent. This component receives the recovery specification, which comprises the defined recovery actions (cf. Sect. 3.2). It collects the DDBMS deployment state from the DB Gibbon after its execution and maps the recovery actions to the failed resources. To enact the actual recovery actions, the recovery agent interacts with the COT, which executes the cloud provider API calls and orchestrates the DDBMS nodes.

Workload Agent. This component keeps the DDBMS under constant workload during the evaluation process to measure the performance development during the different evaluation states. It receives a workload specification, describing the targeted operation types, the data set size and the number of operations. Further, the workload agent is responsible for measuring the performance metrics and storing them for further analysis in the context of the availability metrics. Our framework uses the Yahoo Cloud Serving Benchmark (YCSB) [8] as workload agent, as the YCSB supports distributed execution, multiple DDBMSs and easy extensibility.
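The recovery agent derives the recovery actions from the difference between the initial deployment and the post-failure deployment returned by the DB Gibbon. The following minimal sketch illustrates this diff-based derivation; the data layout and function names are assumptions for illustration and do not reflect the actual Cloudiator or Gibbon APIs.

```python
from typing import Dict, List


def derive_recovery_actions(initial_deployment: Dict[str, dict],
                            current_deployment: Dict[str, dict],
                            recovery_spec: Dict[str, str]) -> List[dict]:
    """Illustrative sketch: compare the healthy deployment with the one
    returned by the DB Gibbon and map every missing node to the recovery
    action defined in the recovery specification (e.g. {"data": "ADD_NEW_NODE"})."""
    actions = []
    for node, resource in initial_deployment.items():
        if node not in current_deployment:
            node_type = node.split("-")[0]  # assumes names like "data-1"
            actions.append({
                "action": recovery_spec.get(node_type, "NONE"),
                "nodeType": node_type,
                "failedResource": resource,
            })
    return actions
```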

7 Discussion

In this section, we discuss the effectiveness of the Gibbon framework based on a concrete evaluation scenario for MongoDB. First, we describe the applied deployment, failure, recovery, and workload specifications; second, we discuss early evaluation results for the accessibility, take over and recovery time.

7.1 Evaluation Scenario: MongoDB

We select MongoDB as the most prevalent document-oriented data store (according to the DB-Engines ranking, http://db-engines.com/en/ranking_trend) as the preliminary DDBMS to evaluate. MongoDB is classified as CP in the context of the CAP theorem, but still provides mechanisms to enable high availability [11], such as an automatic take over in case of node failures. MongoDB’s architecture is built upon three different services: (1) mongos nodes act as query routers, providing an interface between clients and a sharded cluster, (2) config nodes store the metadata of a sharded cluster, (3) shard nodes persist the data. A group of shards builds a replica set with one primary handling all requests and n secondaries, synchronizing the primary’s data and taking over in case the primary becomes unavailable.
The applied evaluation scenario builds upon a single evaluation process as depicted in Fig. 2. Yet, in this preliminary evaluation process we do not apply a constant workload during the evaluation but use three different data set sizes, which are inserted into MongoDB during its deployment. Hence, we only measure the metrics take over time and recovery time. Accessibility is measured by periodic connection attempts of a MongoDB client.
The deployment specification comprises one mongos and three shard nodes, building a MongoDB replica set with one primary and two secondaries with full replication and no sharding. For the sake of simplicity, we did not deploy a production-ready setup with multiple mongos and config nodes. All nodes are provisioned on a private cloud based on OpenStack (version Kilo) with full and isolated access to all physical and virtual resources. All nodes are provisioned within the region Ulm and the availability zone University. Each node runs on a VM with 2 vCPUs, 4 GB RAM, 40 GB disk and Ubuntu 14.04. As failure specification, we applied one failure object with the attributes failureLevel=VM, failureQuantity=1 and nodeType=data to the DB Gibbon. The recovery specification defines the addition of a new data node (i.e. a secondary in MongoDB) as soon as MongoDB reaches the masked state after a node failure. MongoDB is configured to elect a new primary if the recent primary node failed. As stated above, we do not apply a constant workload during the evaluation; the workload specification only defines the data set by number of records=100K, 400K, 800K and record size=10KB.
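For illustration, the applied evaluation scenario could be encoded roughly as follows. The structure and key names are hypothetical and merely summarise the four sub-specifications described above; they are not the concrete Gibbon input format.

```python
# Hypothetical, condensed encoding of the MongoDB evaluation scenario.
mongodb_scenario = {
    "deployment": {
        "cloud": "OpenStack Kilo (region Ulm, zone University)",
        "vm": {"vCPUs": 2, "ramGB": 4, "diskGB": 40, "os": "Ubuntu 14.04"},
        "dbms": "MongoDB",
        "topology": {"mongos": 1, "shards": 3, "replicationFactor": 3, "sharding": False},
    },
    "failure": [
        {"failureLevel": "VM", "failureQuantity": 1, "nodeType": "data"},
    ],
    "recovery": {"onMaskedState": "ADD_DATA_NODE"},  # add a new secondary
    "workload": {"records": [100_000, 400_000, 800_000], "recordSizeKB": 10},
}
```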

7.2 Evaluation Results

The preliminary evaluation results reveal two insights: the behaviour in case of node failures and the behaviour when adding a new replica to the system. In case of a node failure, the behaviour depends on the failed node type. If a secondary fails, connected clients will lose their read-only connections and have to reconnect to another secondary or to the primary node. In this case we can assume that at least the primary node is still accessible, so clients can reconnect immediately. If the primary fails, clients will lose their read/write connections, but may connect immediately to a secondary node for read requests. The remaining secondaries will recognize that the primary has failed and will elect a new primary after a configurable timeout. During this timeout and election phase, no clients can issue write requests; the DDBMS is hence only accessible for read requests and not for write requests. By default, the primary failover timeout is configured to ten seconds. In repeated experiments, an average duration for election and primary take over of five seconds (± one second) is measured. Hence, the overall take over time is 15 s.
Whenever a node failure happens, the evaluation scenario adds a new replica after the remaining nodes elected a primary. The new secondary node has to synchronize the stored data from other nodes, for consistency reasons from the primary node. The replication time depends on the size of the stored data. For 100 k, 400 k, and 800 k items, the median replication time over 30 runs is 31 s, 300 s, and 553 s with a standard deviation of 7 s, 15 s, and 25 s, respectively. The recovery time hence depends on the amount of data to replicate, plus a fixed amount of time it takes to allocate new resources on the cloud. The DDBMS is accessible throughout the replication, yet with reduced resources due to the ongoing synchronisation.
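The reported metrics follow directly from timestamps of the observed events; a minimal sketch of the computation, assuming hypothetical event timestamps collected by the monitoring (the function names are ours and not part of the Gibbon framework):

```python
def take_over_time(t_primary_failed: float, t_new_primary_elected: float) -> float:
    """Time until a new primary accepts writes again.
    In the measured runs: ~10 s failover timeout + ~5 s election = ~15 s."""
    return t_new_primary_elected - t_primary_failed


def recovery_time(t_recovery_triggered: float, t_replica_in_sync: float) -> float:
    """Time until the newly added secondary is provisioned and fully synchronised;
    dominated by the data set size (median 31 s / 300 s / 553 s for 100k/400k/800k records)."""
    return t_replica_in_sync - t_recovery_triggered
```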

8 Related Work

The view on availability in distributed systems evolved over the last two decades. The CAP theorem, published in 2000, states that any networked shared-data system can only have two of the three properties consistency, availability and partition tolerance [6]. A revisited view on the CAP theorem is presented in 2012 [5], reflecting how emerging distributed systems such as DDBMSs adapted their consistency, availability and partition tolerance properties, as the decision for two out of the three properties is not a binary decision [5]. The classification of DDBMSs according to their CAP properties into CA or CP DDBMSs is a widely discussed topic in database research. A first overview and analysis of emerging DDBMSs is provided by [7], analysing various DDBMSs with respect to their availability capabilities in the context of the CAP theorem.
DDBMS mechanisms to provide high availability are discussed by [15], breaking down the technical replication strategies from master-slave replication to masterless, asynchronous replication for 19 DDBMSs. Yet, the high availability mechanisms are only discussed on a theoretical level and no evaluation of their efficiency is proposed. Further, cloud resources are introduced as the preferable resources to run DDBMSs, but the different failure levels and their implications for the availability of the DDBMS are not considered. A similar approach is followed by [23], adding a dedicated classification of common DDBMSs with respect to their CAP properties, i.e. AP or CP. Further, the usage of cloud resources is discussed with respect to virtualisation and data replication across multiple regions. Yet, the focus lies on enabling consistency guarantees of wide-area DDBMSs, while side effects of using cloud resources that affect as well as ensure availability in DDBMSs are not considered in detail.
An availability- and reliability-centric classification of DDBMSs is presented by [11]. Hereby, the challenges to provide non-functional requirements such as replication, consistency, conflict management, and partitioning are broken down into a fine-grained classification schema and a set of 11 DDBMSs is analysed. Two availability-affecting issues and their solutions are presented, overloading and node failures: (i) if a DDBMS is not available due to overloading, the DDBMS needs to be scaled out; (ii) replicas need to be in place to overcome database node failures. Yet, the proposed solutions are on an architectural level and the actual capabilities of DDBMSs are not evaluated in real-world scenarios.
While theoretical classifications of DDBMSs provide a valuable starting point for a first selection of DDBMSs, the final selection still remains challenging due to the heterogeneous DDBMS landscape. Hence, database evaluation frameworks provide additional insights into DDBMS capabilities by evaluating dedicated evaluation tiers based on different workload domains. While available evaluation frameworks such as YCSB [8] or YCSB++ [20] focus on the evaluation of performance, scalability, elasticity and consistency, the availability tier is not yet considered by these frameworks [24]. First approaches in availability evaluation are based on the YCSB and focus on the decreased availability due to an overloaded database [18,25]. While an evaluation framework focusing on the availability of cloud-hosted DDBMSs is not yet available, an approach to enact synthetic failures on cloud resources is described by [26] and implemented at Netflix. Yet, this approach only describes the failure scenarios in the cloud and does not propose evaluation metrics or an evaluation methodology for cloud applications in general and DDBMSs in particular. A first approach towards the availability metrics is presented by [2], yet the focus lies on the resilience of DBMSs, while DDBMSs and their availability mechanisms are not considered. Existing failure-injection tools such as the DICE fault injection tool (https://github.com/dice-project/DICE-Fault-Injection-Tool) or Jepsen (https://github.com/jepsen-io/jepsen) either inject failures on the node or DBMS level, but do not support the injection of resource failures at different granularities.

9 Conclusion and Future Work

In the last decade the database management system (DBMS) landscape grew fast, resulting in a very heterogeneous DBMS landscape, especially when it comes to distributed database management systems (DDBMSs). As cloud computing is the preferable way to run distributed applications, it also seems to be the choice for running DDBMSs. Yet, the probability of failures increases with the number of distributed entities, and cloud computing adds another layer of possible failures. Hence, DDBMSs apply data replication across multiple DDBMS nodes to provide high availability in case of node failures. Yet, providing high availability comes with limitations with respect to consistency or partition tolerance as stated by the CAP theorem. As these limitations are not binary, a vast number of high availability implementations in DDBMSs exist. This makes the selection of a DDBMS to run in the cloud a complex task, especially as supportive availability evaluation frameworks are missing.
Therefore, we present Gibbon, a novel availability evaluation framework for DDBMSs to increase the trust in running DDBMSs in the cloud. We describe levels of cloud resource failures, existing DDBMS concepts to provide high availability and distill a set of five quantifiable availability metrics. Further, we derive the DDBMS-specific technical details affecting the availability evaluation processes. Building upon these findings, we introduce the concept of extensible evaluation scenarios, comprising n evaluation processes. Further, we present the DB Gibbon, which emulates cloud resource failures on different levels. The Gibbon framework executes the defined evaluation scenarios for generic DDBMSs and cloud infrastructures. Its architecture comprises an orchestrator to deploy the DDBMS in the cloud, a workload agent, a recovery agent and the DB Gibbon to inject cloud resource failures. As preliminary evaluation, we evaluate the take over and recovery time of MongoDB in a private cloud by injecting failures on the virtual machine level.
Future work will comprise an in-depth evaluation of multiple well-adopted DDBMSs (according to the DB-Engines ranking) based on the Gibbon framework. Further, the definition of a minimal evaluation scenario to derive a significant availability rating is in progress. In this context, the statistical calculations of the overall availability rating index will be refined. Finally, the portability of Gibbon to evaluate the availability of generic applications running in the cloud will be evaluated.

Acknowledgements. The research leading to these results has received funding from the EC’s Framework Programme HORIZON 2020 under grant agreement numbers 644690 (CloudSocket) and 731664 (MELODIC). We also thank Daimler TSS for the encouraging and fruitful discussions on the topic.

References

1. Abadi, D., Agrawal, R., Ailamaki, A., Balazinska, M., Bernstein, P.A., Carey, M.J., Chaudhuri, S., Chaudhuri, S., Dean, J., Doan, A., et al.: The Beckman report on database research. Commun. ACM 59(2), 92–99 (2016)
2. Almeida, R., Neto, A.A., Madeira, H.: Resilience benchmarking of transactional systems: experimental study of alternative metrics. In: PRDC (2017)
3. Avizienis, A., Laprie, J.C., Randell, B., Landwehr, C.: Basic concepts and taxonomy of dependable and secure computing. TDSC 1(1), 11–33 (2004)
4. Baur, D., Seybold, D., Griesinger, F., Tsitsipas, A., Hauser, C.B., Domaschka, J.: Cloud orchestration features: are tools fit for purpose? In: UCC (2015)
5. Brewer, E.: CAP twelve years later: how the “rules” have changed. Computer 45(2), 23–29 (2012)
6. Brewer, E.A.: Towards robust distributed systems. In: PODC (2000)
7. Cattell, R.: Scalable SQL and NoSQL data stores. ACM SIGMOD Rec. 39(4), 12–27 (2011)
8. Cooper, B.F., Silberstein, A., Tam, E., Ramakrishnan, R., Sears, R.: Benchmarking cloud serving systems with YCSB. In: SoCC (2010)
9. DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Lakshman, A., Pilchin, A., Sivasubramanian, S., Vosshall, P., Vogels, W.: Dynamo: Amazon’s highly available key-value store. ACM SIGOPS Oper. Syst. Rev. 41(6), 205–220 (2007)
10. Domaschka, J., Baur, D., Seybold, D., Griesinger, F.: Cloudiator: a cross-cloud, multi-tenant deployment and runtime engine. In: SummerSOC (2015)
11. Domaschka, J., Hauser, C.B., Erb, B.: Reliability and availability properties of distributed database systems. In: EDOC (2014)

12. Domaschka, J., Seybold, D., Griesinger, F., Baur, D.: Axe: a novel approach for generic, flexible, and comprehensive monitoring and adaptation of cross-cloud applications. In: ESOCC (2015)
13. Ford, D., Labelle, F., Popovici, F.I., Stokely, M., Truong, V.A., Barroso, L., Grimes, C., Quinlan, S.: Availability in globally distributed storage systems. In: OSDI (2010)
14. Geraci, A., Katki, F., McMonegal, L., Meyer, B., Lane, J., Wilson, P., Radatz, J., Yee, M., Porteous, H., Springsteel, F.: IEEE standard computer dictionary: compilation of IEEE standard computer glossaries: 610. IEEE Press (1991)
15. Grolinger, K., Higashino, W.A., Tiwari, A., Capretz, M.A.: Data management in cloud environments: NoSQL and NewSQL data stores. In: JoCCASA (2013)
16. Gunawi, H.S., Hao, M., Suminto, R.O., Laksono, A., Satria, A.D., Adityatama, J., Eliazar, K.J.: Why does the cloud stop computing? Lessons from hundreds of service outages. In: SoCC (2016)
17. Haerder, T., Reuter, A.: Principles of transaction-oriented database recovery. In: CSUR (1983)
18. Konstantinou, I., Angelou, E., Boumpouka, C., Tsoumakos, D., Koziris, N.: On the elasticity of NoSQL databases over cloud management platforms. In: CIKM (2011)
19. Mell, P., Grance, T.: The NIST definition of cloud computing. Technical report, National Institute of Standards & Technology (2011)
20. Patil, S., Polte, M., Ren, K., Tantisiriroj, W., Xiao, L., López, J., Gibson, G., Fuchs, A., Rinaldi, B.: YCSB++: benchmarking and performance debugging advanced features in scalable table stores. In: SoCC (2011)
21. Pritchett, D.: BASE: an ACID alternative. Queue 6, 48–55 (2008)
22. Sadalage, P.J., Fowler, M.: NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Pearson Education, London (2012)
23. Sakr, S.: Cloud-hosted databases: technologies, challenges and opportunities. Cluster Comput. 17(2), 487–502 (2014)
24. Seybold, D., Domaschka, J.: Is distributed database evaluation cloud-ready? In: ADBIS (2017)
25. Seybold, D., Wagner, N., Erb, B., Domaschka, J.: Is elasticity of scalable databases a myth? In: IEEE Big Data (2016)
26. Tseitlin, A.: The antifragile organization. Commun. ACM 56(8), 40–44 (2013)
27. Zhong, M., Shen, K., Seiferas, J.: Replication degree customization for high availability. In: EuroSys (2008)

Chapter 12 [core5] Mowgli: Finding Your Way in the DBMS Jungle

This article is published as follows:

Daniel Seybold, Moritz Keppler, Daniel Gründler, and Jörg Domaschka. ”Mowgli: Finding Your Way in the DBMS Jungle”, 10th ACM/SPEC International Conference on Performance Engineering (ICPE), published 2019, ACM, DOI: https://doi.org/10.1145/3297663.3310303

Reprinted with permission from ACM.


Mowgli: Finding Your Way in the DBMS Jungle

Daniel Seybold (Institute of Information Resource Management, Ulm University, Germany; [email protected])
Moritz Keppler (Daimler TSS, Ulm, Germany; [email protected])
Daniel Gründler (Daimler TSS, Ulm, Germany; [email protected])
Jörg Domaschka (Institute of Information Resource Management, Ulm University, Germany; [email protected])

ABSTRACT
Big Data and IoT applications require highly-scalable database management systems (DBMSs), preferably operated in the cloud to ensure scalability also on the resource level. As the number of existing distributed DBMSs is extensive, the selection and operation of a distributed DBMS in the cloud is a challenging task. While DBMS benchmarking is a supportive approach, existing frameworks do not cope with the runtime constraints of distributed DBMSs and the volatility of cloud environments. Hence, DBMS evaluation frameworks need to consider DBMS runtime and cloud resource constraints to enable portable and reproducible results. In this paper we present Mowgli, a novel evaluation framework that enables the evaluation of non-functional DBMS features in correlation with DBMS runtime and cloud resource constraints. Mowgli fully automates the execution of cloud and DBMS agnostic evaluation scenarios, including DBMS cluster adaptations. The evaluation of Mowgli is based on two IoT-driven scenarios, comprising the DBMSs Apache Cassandra and Couchbase, nine DBMS runtime configurations, and two cloud providers with two different storage backends. Mowgli automates the execution of the resulting 102 evaluation scenarios, verifying its support for portable and reproducible DBMS evaluations. The results provide extensive insights into the DBMS scalability and the impact of different cloud resources. The significance of the results is validated by the correlation with existing DBMS evaluation results.

CCS CONCEPTS
• Information systems → Database performance evaluation; • Computer systems organization → Cloud computing;

KEYWORDS
benchmarking, cloud, NoSQL, scalability, distributed database

ACM Reference Format:
Daniel Seybold, Moritz Keppler, Daniel Gründler, and Jörg Domaschka. 2019. Mowgli: Finding Your Way in the DBMS Jungle. In Tenth ACM/SPEC International Conference on Performance Engineering (ICPE ’19), April 7–11, 2019, Mumbai, India. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3297663.3310303

1 INTRODUCTION
IoT and Big Data drive the need for highly scalable and (geo-)distributed data management. The NoSQL landscape provides distributed database management systems (DBMSs) that promise to fulfil the need for both scale and distribution, and in some cases even geo-distribution, together with non-functional properties such as elasticity and high-availability. Additionally, cloud computing offers the necessary mechanisms to enable scalability on the resource level. Yet, choosing the right DBMS set-up in the jungle of available solutions is a complex task that is not done with the selection of a well-suited DBMS (in June 2018, http://nosql-databases.org lists more than 225 NoSQL database projects), but continues with the selection of a cloud provider, and ends with the choice of the right size and amount of virtual machines. The three choices influence each other [35], so that making independent decisions may lead to sub-optimal results. Additionally, runtime parameters, including the expected workload, consistency requirements, and availability considerations, are influencing the set-up and depend on each other: for instance, the type of workload can influence whether a user should pay for having a local SSD attached to their virtual machines or not [18].
While benchmarking is an established approach to select software systems as well as hardware platforms, existing DBMS benchmarking frameworks cannot cope with the volatility of cloud environments [35], particularly as volatile environments demand reliable and reproducible benchmarking [29]. Based on these observations, we claim that even with the knowledge of the workload and non-functional constraints, a manual selection of DBMS, cloud provider(s), and virtual machine types cannot deliver satisfactory results and that suitable tool support is strongly needed. Only by this approach are we able to find an appropriate initial solution, but also to keep up with DBMS version upgrades and new DBMSs entering the market, as well as to address new cloud providers and virtual machine types. This paper presents Mowgli, a novel DBMS evaluation and benchmarking framework that fully automates the whole evaluation flow from the cloud resource allocation, DBMS deployment, workload execution to the DBMS cluster adaptation. Its underlying orchestration engine is cloud provider-agnostic and supports cross-cloud evaluation scenarios [5].


In particular, Mowgli helps answering the DBMS runtime centric question Q1 and the cloud resource allocation centric question Q2:
Q1: "How much throughput for workload Wx can DBMS Dx achieve with cluster size CSx, replication factor RFx, and ensuring the write-consistency of WCx if operated on VM type VMx in cloud Cx?"
Q2: "Which cluster size CSx of DBMS Dx achieves the highest throughput for workload Wx if operated with replication factor RFx, ensuring write-consistency WCx and running on VM types VMx in cloud Cx if the maximum number of available cloud resources is CR−MAXx?"

Figure 1: DBMS Evaluation Domains (cloud resource domain: resource type, sizing, location, storage; DBMS runtime domain: replication, consistency, sharding; workload domain: CRUD, OLTP, HTAP; evaluation objectives: performance, scalability, elasticity, availability)

The motivation for Mowgli is our need to find a DBMS-cloud set-up that is capable of handling an IoT scenario with a growing number of sensors, where each sensor s_i would issue its current state every t seconds and no message is allowed to be dropped. This leads to the requirement that the chosen DBMS needs to provide a constant write throughput even in the case of failures. While this paper does not present the final outcome of the selection process, we apply this use case as a frame for validating Mowgli. In particular, we validate Mowgli by applying an evaluation of two DBMSs over three different cloud environments for write-intensive workloads.
The remainder of this paper is structured as follows: Section 2 details the challenges of DBMS evaluation while Section 3 describes Mowgli. Section 4 presents the evaluation scenarios we use to validate our approach and Section 5 analyses the evaluation results. Section 6 discusses the usability and significance of Mowgli. Section 7 presents related work, before Section 8 concludes.

2 DBMS EVALUATION CHALLENGES
In order to guide the DBMS selection process, the introduced questions Q1 and Q2 need to be addressed by evaluating potential DBMSs. Yet, a significant evaluation needs to consider multiple domains, as the results are affected by the applied cloud resource, DBMS runtime and workload constraints. Hence, the evaluation approach requires the specification of multi-domain evaluation scenarios as depicted in Figure 1. Each evaluation domain comprises its own set of domain-specific constraints, which affect the results for the specified evaluation objectives [35]. Consequently, domain knowledge in each evaluation domain is required, which makes the DBMS evaluation a complex and error-prone task. In order to reduce this complexity and enable the portable and reproducible evaluation scenario execution, dedicated tool support is required. In the following, we introduce each evaluation domain with respect to relevant constraints and present the challenges with respect to executing multi-domain evaluation scenarios in a portable, reproducible and consistent manner.

2.1 DBMS Runtime Domain
With the rise of the NoSQL data models [8, 12, 22], the usage of distributed architectures for shared-nothing DBMSs has become a common approach to provide scalability, elasticity and availability [32]. In this context, the extent of DBMS runtime constraints has increased as distributed DBMSs aim to provide flexibility for multiple usage scenarios [16, 20]. Common configurable runtime properties of distributed DBMSs are the replication factor, read/write consistency settings and the sharding strategy, while there is an extensive number of DBMS-specific runtime constraints such as storage engines or compression algorithms. Consequently, comparative evaluation scenarios need to abstract the DBMS runtime domain to general runtime constraints, enabling the specification of consistent and portable DBMS runtime specifications [7, 29]. Moreover, the DBMS runtime specification needs to be extensible to allow DBMS-specific evaluations based on custom constraints [6, 7].

2.2 Cloud Resource Domain
While cloud resources have become a common solution to operate DBMSs [33], cloud resource offerings are getting more heterogeneous with respect to the offered compute resource type, compute resource sizing, storage backends, control over tenant isolation and control over locations [4]. Especially for DBMSs, storage backends are important as remote storage makes it easier to scale out a DBMS, but network latency and bandwidth can limit the DBMS performance. Dedicated storage reduces these limitations, but a failure of a physical server decreases availability and failover mechanisms are required [1]. Hence, evaluation scenarios have to include existing cloud resource offers by abstracting provider-specific details and enabling a consistent and portable cloud resource specification [29, 35].

2.3 DBMS Workload Domain
DBMS workloads emulate heterogeneous application domains, from synthetic create, read, update, delete (CRUD) operations over more realistic Online Transaction Processing (OLTP) to novel Hybrid Transaction-Analytical Processing (HTAP) workloads [35]. While realistic workloads increase the significance of the results, they typically make use of DBMS-specific features, which limits their field of use [31]. In addition, each workload implementation provides its own set of workload constraints, which have a direct impact on the results. Hence, workload specifications for comparative DBMS evaluations require portable workloads to compare different DBMSs against the evaluation objectives [6, 29]. DBMS-specific scenarios need to support realistic workloads to enable the in-depth evaluation of DBMS-specific features [6, 31]. Respectively, the workload specification has to abstract common workload constraints to enable comparative and DBMS-specific workload specifications.


2.4 Evaluation Scenario Execution
The consistent specification of evaluation scenarios considering the introduced domains requires abstract evaluation templates that are enriched with the concrete domain constraints. This enables the portable and reproducible execution of evaluation scenarios. Yet, the manual execution of such multi-domain evaluation scenarios is a complex and error-prone process. Each domain requires detailed knowledge on its own and the entire evaluation process comprises a sequence of multiple interdependent evaluation tasks as depicted in Figure 2. Consequently, a supportive framework is required that automates the evaluation process and fulfils the established requirements of DBMS evaluation [6, 7, 21, 29, 35]:
Ease of use (R1): The deployment and configuration of the framework needs to be simple and it needs to provide user-friendly interfaces to specify and execute the evaluation scenarios [6, 7, 21, 29]
Portability (R2): In order to execute the evaluation scenarios for different domain properties, the framework has to abstract technical implementations of each evaluation domain and map the high-level evaluation scenarios to concrete technical implementations for each evaluation domain [21, 35]
Reproducibility (R3): The framework needs to provide comparable evaluation scenario templates, which ensure the deterministic execution for concrete domain constraints [6, 29, 35]
Automation (R4): The automated execution of multi-domain evaluation scenarios requires the orchestration of evaluation tasks across all domains to ensure the reproducibility and portability [35]. In addition, complex evaluation objectives such as elasticity or availability require the DBMS cluster adaptation at evaluation runtime [36].
Significance (R5): In order to enable significant results by applying realistic domain constraints, the framework needs to support comparative and realistic workloads, commercial cloud resource offerings and relevant DBMSs [6, 29, 35]
Extensibility (R6): As each evaluation domain is constantly evolving, the framework needs to provide an extensible architecture and interfaces that allow the easy integration of future domain-specific constraints and evaluation objectives [7, 21, 29, 35].

Figure 2: DBMS Evaluation Process (tasks: T_1 allocate cloud resources, T_2 deploy DBMS cluster, T_3.1 monitor, T_3.2 execute workload, T_3.3 adapt DBMS cluster, T_4 release cloud resources, T_5 process objective)

3 MOWGLI
In the following, we present the multi-domain evaluation framework Mowgli (https://omi-gitlab.e-technik.uni-ulm.de/mowgli) that builds upon existing DBMS evaluation concepts [34]. Mowgli automates the entire evaluation process shown in Figure 2 by enabling the definition and execution of portable and reproducible evaluation scenarios via a loosely coupled and extensible evaluation framework. It does so by exploiting the features of cloud orchestration tools (COTs) [5] and combines them with an extensible DBMS catalogue, an auto-generated cloud resource catalogue and a workload catalogue. The architecture of Mowgli is depicted in Figure 3. Evaluation templates define the required input of an abstract evaluation scenario and reach the system through the Evaluation API. In its current state, Mowgli supports the abstract sensor storage evaluation scenario to address Q1 and Q2. The specification of elasticity and availability related evaluation scenarios is subject to ongoing work.

Figure 3: Mowgli architecture (evaluation API, evaluation orchestrator, workload catalogue, DBMS catalogue, cloud resource catalogue, workload-API, Cloudiator, runtime monitor, metadata collector, measurement collector, objective processor)

3.1 Evaluation Templates
Each evaluation template comprises three different types of sub-templates, which are listed in Tables 1-3; exemplary input templates are publicly available at https://omi-gitlab.e-technik.uni-ulm.de/mowgli/getting-started/tree/icpe2019/examples. The tables list the abstract domain constraints and Mowgli’s supported parameter range. Domain constraints and their parameter range can be customized due to Mowgli’s extensible architecture.
The DBMS runtime template includes use-case-specific DBMS runtime configurations in a DBMS-agnostic manner, which are listed in Table 1. Hence, it describes the desired distribution requirements. It requires the mandatory configuration options cluster topology, cluster size and replication factor and offers optional DBMS-specific configuration options. The cloud resource template describes the required compute and storage resources in a cloud provider agnostic way as shown in Table 2. Using this type of agnostic description of both non-functional properties of the DBMS and resources is key to making evaluations portable between DBMS and cloud providers, but also fosters reproducibility of evaluation results. Finally, the workload template listed in Table 3 specifies the desired load on the system by referring to known DBMS benchmarks through unified configuration properties, which are extended by benchmark-specific properties such as read/write consistency settings, DBMS driver settings and request distribution.


Table 1: DBMS Runtime Template
  Constraint             Parameter Range
  DBMS                   Dx ∈ {RDBMS, NoSQL, NewSQL}
  Cluster Topology       CTx ∈ {data center, cross data center}
  Cluster Size           CSx ∈ {3..n}
  Replication Factor     RFx ∈ {n <= CSx}
  Custom Configuration   CCx ∈ {key = value}

Table 2: Cloud Resource Template
  Constraint             Parameter Range
  Cloud API              Cx ∈ {OpenStack, EC2, Google Compute}
  Location               VM−Lx ∈ {rack, availability zone, region}
  vCores                 VM−Cx ∈ {2..n}
  RAM (in GB)            VM−Mx ∈ {2..n}
  Storage Capacity (GB)  VM−SCx ∈ {20..n}
  Storage Type           VM−STx ∈ {HDD, SSD, Remote}

Table 3: Workload Template
  Constraint             Parameter Range
  Type                   WTx ∈ {YCSB, TPC−C}
  Runtime (in seconds)   WRx ∈ {60..n}
  Client instances       WCx ∈ {1..n}
  Client threads         WCx ∈ {1..n}
  Network                WNx ∈ {public, private}
  Write Consistency      WCx ∈ {low, medium, high}
  Read Consistency       RCx ∈ {low, medium, high}

3.2 Catalogues
Catalogues enable the mapping from abstract evaluation scenario templates to executable experiments. The DBMS catalogue contains mappings from DBMS templates to concrete configurations. In particular, based on the DBMS catalogue, Mowgli is able to configure and run different DBMSs in the way specified in the DBMS templates. Currently, Mowgli supports Apache Cassandra (http://cassandra.apache.org/), Couchbase (https://www.couchbase.com/), MongoDB (https://www.mongodb.com/), Riak (http://basho.com/products/riak-kv/) and CockroachDB (https://www.cockroachlabs.com/).
Similarly, the cloud resource catalogue provides a mapping from cloud resource templates to actual cloud resources. As defining this mapping is cumbersome and repetitive, we use the resource discovery features of the COT Cloudiator [4, 15]. For each cloud credential stored at Cloudiator, it automatically creates the cloud-provider-specific resource entries in its catalogue as well as a cloud-provider-agnostic representation thereof that is referenced by Mowgli’s cloud resource templates.
Finally, the workload catalogue captures concrete implementations of workloads. Its entries specify what kind of load to issue on the DBMS and in what order. Mowgli supports the Yahoo Cloud Serving Benchmark (YCSB) [11] and a DBMS-specific implementation of the TPC-C workload (https://github.com/cockroachdb/loadgen). For our evaluation, we make use of the YCSB as it enables the emulation of a write-heavy sensor storage workload.

3.3 Evaluation Process
Using the catalogues, Mowgli is able to map the concrete scenario parameters received through the evaluation-API to the abstract sensor storage scenario specification and create an executable evaluation scenario. The entire execution of a specified evaluation scenario is automated by the evaluation orchestrator that orchestrates the tasks depicted in Figure 2. Therefore, an evaluation scenario is internally implemented as a workflow with sequential, conditional and parallel tasks. The workflow of the introduced sensor storage scenario is implemented as a subset of the introduced evaluation tasks of Figure 2 using sequential and parallel tasks. In the following, the workflow tasks executed by the evaluation orchestrator are presented together with the involved components of Mowgli.
T_1: allocating cloud resources for each evaluation iteration via Cloudiator, which enacts the cloud provider specific requests
T_2: deploying and configuring the DBMS cluster by fetching the DBMS deployment scripts from the DBMS catalogue and passing them to Cloudiator to deploy the DBMS cluster on the allocated cloud resources
T_3.1: measuring system and DBMS metrics during each run via the runtime monitor. The runtime monitor is implemented by the time-series DBMS InfluxDB (https://docs.influxdata.com/influxdb)
T_3.2: distributing the workload execution across the specified workload-API instances
T_4: releasing the cloud resources after each evaluation iteration via Cloudiator and repeating tasks (1)-(4) according to the scenario parameters
T_5: collecting and processing of the evaluation results as follows: the measurement collector collects basic performance metrics such as throughput and latency, provided by the applied workload; a scenario-specific objective processor computes composed metrics such as scalability [11], elasticity [17], availability [36] or the cloud resource and distribution impact [35] by correlating the performance metrics with the applied DBMS runtime and cloud resource specifications provided by the metadata collector
As the implementation of the sensor storage scenario does not require the adaptation of the DBMS cluster at evaluation runtime, T_3.3 is omitted. However, Mowgli is able to support the adaptation of the DBMS cluster at runtime by specifying adaptation tasks that use metrics of the runtime monitor as adaptation trigger and Cloudiator to adapt the DBMS cluster.
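A minimal sketch of how the orchestrated workflow of tasks T_1-T_5 could be expressed; the callables stand in for the framework components named above (Cloudiator, the DBMS catalogue, the workload-API, the objective processor) and are assumptions for illustration, not Mowgli's actual API.

```python
from typing import Callable, Dict, List


def run_scenario(scenario: Dict,
                 allocate: Callable, deploy: Callable, execute: Callable,
                 release: Callable, process: Callable,
                 repetitions: int = 1) -> Dict:
    """Sequential workflow of tasks T_1-T_5 (T_3.3 omitted, as in the
    sensor storage scenario)."""
    raw_results: List = []
    for _ in range(repetitions):
        resources = allocate(scenario["cloud"])       # T_1: allocate cloud resources
        deploy(scenario["dbms"], resources)           # T_2: deploy and configure DBMS cluster
        raw_results.append(
            execute(scenario["workload"], resources)  # T_3.1/T_3.2: monitor and execute workload
        )
        release(resources)                            # T_4: release resources, then repeat
    return process(raw_results, scenario)             # T_5: compute composed metrics
```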


4 SENSOR STORAGE EVALUATION SCENARIOS
In order to validate the Mowgli framework, we apply the sensor storage scenario and seek help in answering questions Q1 and Q2. Consequently, we define the sensor storage template and apply Q1 and Q2 specific domain constraints. This section details the choice of DBMSs, their runtime configuration, the selection of cloud resources as well as the sensor workload specification.

4.1 DBMS Specification
For the validation of the DBMS runtime centric Q1, we select Apache Cassandra [27] and Couchbase as DBMSs. The cloud resource centric Q2 is validated by using Apache Cassandra. Both are popular NoSQL DBMSs (https://db-engines.com/en/ranking). They provide a flexible data model and a multi-master architecture that supports automated sharding and horizontal scalability. They only have limited support for complex queries, but for write-heavy workloads this is negligible. Furthermore, both DBMSs have already been subject to scalability evaluations with respect to read-update workloads and achieved promising results [24, 26, 37]. Also, the availability of other evaluation scenarios allows us to cross-check the results reported by Mowgli for read-update workloads (not part of this paper but carried out with a preliminary version of Mowgli [37]). Apache Cassandra applies the column-oriented data model, while Couchbase applies the document-oriented data model [8]. Table 4 describes the relevant runtime specifications for the selected DBMSs based on the introduced runtime constraints (cf. Table 1). Other options are supported, but have not been used for the results presented in this paper. By default, Mowgli configures a DBMS instance to use 50% of the available memory for its operation.
Due to the architectural similarities of both DBMSs, comparable cluster topologies, cluster sizes and replication factors can be defined. Yet, they differ when it comes to the persistence configuration at client side. Apache Cassandra applies write ahead logging (WAL), while Couchbase does not. Instead, it caches records directly in memory and persists them to disk asynchronously. Couchbase provides the configuration option to enforce replicating a record to n replica nodes via replicateTo or persisting the record to the disk of n replica nodes via persistTo. For Apache Cassandra we can configure the number of replicas where an item has to be written to the WAL and the in-memory cache. Consequently, the write consistency configurations cannot be exactly mapped for both DBMSs. For Apache Cassandra, we select the write consistency levels ANY, ONE, TWO (https://docs.datastax.com/en/cassandra/3.0/cassandra/dml/dmlConfigConsistency.html), while for Couchbase, we select the following options: NONE (replicateTo=NONE and persistTo=NONE) confirms a write as successful after the record has been transmitted; R-ONE (replicateTo=1) ensures that the record is written to the cache of at least one replica node; and P-ONE (persistTo=1) ensures that the record is persisted to the disk of at least one replica node.

Table 4: DBMS Runtime Specifications
  Specification   Spec_CA            Spec_CB
  Dx              Apache Cassandra   Couchbase
  Version         3.11.2             5.0.1 community
  CTx             data center        data center
  CS−Q1x          3,5,7,9            3,5,7,9
  CS−Q2x          3,6,9              -
  RFx             3                  3
  WC_low          ANY                NONE
  WC_medium       ONE                R-ONE
  WC_high         TWO                P-ONE

4.2 Cloud Resource Specification
The portability of our approach is verified by using cloud resources of two different cloud providers. For answering Q1, we apply Mowgli to three different cloud resource configurations as outlined in Table 5: OS_SSD and OS_REMOTE run on a private, OpenStack-based cloud (version Pike) with full and isolated access to all physical and virtual resources. All physical hosts in the OS_SSD availability zone have two dedicated SSDs in RAID-0 configuration. Physical hosts with the OS_REMOTE configuration share one storage server with a RAID-6 set-up and magnetic disks. The network bandwidth between the physical hosts and the remote storage is 10G. EC2_REMOTE runs Amazon EC2 VM instances in the Frankfurt region and the availability zone eu-central-1. The selected EC2 instance type is t2.medium (https://aws.amazon.com/ec2/instance-types/) and each VM is provisioned with a Remote Storage GP2 SSD EBS volume. For the evaluation, the comparable VM type VM−T_small is selected, which is available in OpenStack and EC2. Each VM type is composed of the tuple vCores, RAM and storage capacity as listed in Table 2.

Table 5: Q1 - Cloud Resource Specifications
  Specification   OS_SSD      OS_REMOTE   EC2_REMOTE
  Cx              OpenStack   OpenStack   EC2
  VM−Lx           Ulm         Ulm         Frankfurt
  VM−T_small      2 vCores, 4GB RAM, 50GB disk
  VM−STx          SSD         Remote      Remote
  VM−OSx          Ubuntu Server 16.04
  VM−NETx         private

To emphasize Mowgli’s capabilities of evaluating the impact of cloud resources in the context of Q2, OpenStack with the availability zones OS_SSD and OS_REMOTE is selected. Using this private cloud allows the specification of custom VM types and the in-depth analysis of the cloud resource impact. We define an exemplary maximum resource pool CR−MAXx and specify the respective VM types as listed in Table 6. These VM types are applied to CS−Q2 = 3,6,9 to accordingly match the maximum resource pool.

Table 6: Q2 - Cloud Resource Specifications
  Specification   OS_SSD_Q2                      OS_REMOTE_Q2
  Cx              OpenStack                      OpenStack
  VM−Lx           Ulm                            Ulm
  CR−MAXx         18 vCores; 36GB RAM
  VM−T_large      6 vCores, 12GB RAM, 50GB disk
  VM−T_medium     3 vCores, 6GB RAM, 50GB disk
  VM−T_small      2 vCores, 4GB RAM, 50GB disk
  VM−STx          SSD                            Remote
  VM−OSx          Ubuntu Server 16.04
  VM−NETx         private

4.3 Workload Specification and Metrics
For our evaluation, we use YCSB version 0.12.0 (https://github.com/brianfrankcooper/YCSB/releases/tag/0.12.0), which is integrated in Mowgli (cf. Section 3) and allows the specification of a write-heavy workload required by Q1 and Q2. Besides, relying on the widely used YCSB allows us to validate our results against published results.
The specification is such that the workload is issued through one independent virtual machine running in the same environment as the DBMS instances. This virtual machine is configured with 8 vCores, 16GB memory and 20GB of remote storage disk, running Ubuntu Server 16.04. It uses a cloud-internal network to communicate with the DBMS cluster. Table 7 contains the relevant YCSB specifications. The DBMS-specific consistency settings in Table 4 are mapped to the respective YCSB binding for Apache Cassandra and Couchbase.
The YCSB provides the performance metrics latency and throughput. As Q1 and Q2 are throughput related, the latency metrics are collected but not used for processing composed metrics. For addressing Q1, the scalability metric is computed by calculating the throughput increase with respect to the cluster size. For Q2, the throughput increase with respect to the defined different VM types and cluster sizes is computed to analyse the cloud resource and distribution impact.

Table 7: Workload Specifications
  Specification             YCSB_Sensor_Workload
  YCSB instances            1 (per cloud)
  Threads (per instance)    16
  Network                   private
  MAX_Runtime               1,800s
  Number of records         4,000,000
  Record size               5KB
  Operations distribution   100% write
  YCSB Binding              cassandra / couchbase2
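Since the write-consistency semantics of the two DBMSs cannot be mapped exactly, the abstract levels low/medium/high are resolved per DBMS before being passed to the respective YCSB binding. A hedged sketch of such a mapping follows; the dictionary form is illustrative and not Mowgli's actual catalogue format.

```python
# Illustrative resolution of abstract write-consistency levels per DBMS,
# following Table 4 and Section 4.1.
WRITE_CONSISTENCY = {
    "cassandra": {"low": "ANY", "medium": "ONE", "high": "TWO"},
    # Couchbase expresses durability via replicateTo / persistTo
    "couchbase": {"low": "NONE (replicateTo=0, persistTo=0)",
                  "medium": "R-ONE (replicateTo=1)",
                  "high": "P-ONE (persistTo=1)"},
}
```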


5 MOWGLI EVALUATION
This section presents the results of the sensor storage scenario evaluation. The analysis of the results first focuses on the DBMS runtime centric Q1 by analysing performance and scalability, and second on the cloud resource centric Q2 by analysing the impact of different VM types and cluster sizes.
For Q1, the DBMS specifications Spec_CA and Spec_CB in combination with the cloud resource configurations OS_SSD, OS_REMOTE and EC2_REMOTE and the YCSB_Sensor_Workload result in 72 evaluation scenario specifications. The Q2 results are based on the DBMS specifications Spec_CA and Spec_CB with the cloud resource configurations OS_SSD_MAX, OS_REMOTE_MAX and the YCSB_Sensor_Workload, resulting in 18 evaluation scenario specifications.
As Mowgli allows to specify the repetition of each scenario execution for n ∈ N times, we configure Mowgli to execute each scenario five times to verify the automated repeatability and to strengthen the significance of the results by providing the standard deviation as well as the minimum and maximum values. The runtime of a single evaluation is limited to 30 minutes, which allows the DBMS to stabilize and execute internal compaction processes. Likewise, Mowgli allows to specify custom runtime settings.
From system monitoring we ensure that the following properties hold for all evaluations: (1) The workload generator is not a bottleneck as its CPU load never exceeds 60%. (2) The network between the workload generator and the DBMS cluster is not becoming a bottleneck, as the consumed network bandwidth is below the evaluated maximum available bandwidth. (3) The workload generator creates sufficient load to saturate the CPU resources of at least the 3-node clusters, i.e. the average CPU load of each node is > 90%.
The following sections present the throughput results as the average throughput over all five executions including the standard deviation as well as the global minimum and global maximum over all executions.

5.1 Q1 - DBMS Performance and Scalability
In the following, the results of Q1 are analysed for the concrete evaluation domain properties: "Which DBMS D_CA,CB achieves the highest throughput for workload W_YCSB_Sensor if operated with cluster size CS_3,5,7,9, replication factor RF_3, and ensuring the write-consistency of WC_low,medium,high by running on VM type VM_small in cloud C_OS_SSD,OS_REMOTE,EC2_REMOTE?"
We group the results by cloud type (OS_SSD, OS_REMOTE, EC2_REMOTE). For each cloud type, we discuss the performance impact of cluster size and write consistency. The scalability is analysed by computing the average throughput increase from the 3-node cluster to the 9-node cluster, whereby the average throughput of the 3-node cluster represents the baseline.

5.1.1 OpenStack SSD Results. The Apache Cassandra results depicted in Figure 4 show that write consistency only has a slight impact on the performance for all cluster sizes. It is surprising that ANY as the weakest consistency provides less throughput than ONE for the 5-, 7- and 9-node clusters. With respect to the scalability, the throughput scales with growing cluster sizes, e.g. a scale-up of 19% is achieved from a 3- to a 9-node cluster with write consistency ONE as listed in Table 8. The highest scalability factor of 31% is achieved for the write consistency TWO.
The results for Couchbase depicted in Figure 5 show significant differences depending on the applied write consistency. While the NONE configuration (not providing any guarantees at all) for a 3-node cluster achieves only 5% less throughput compared to a 9-node Apache Cassandra cluster, we see massive drops in throughput when applying R-ONE and P-ONE. For R-ONE, the throughput for the 3-node cluster drops by 62% and for the 9-node cluster by 66% compared to NONE. For P-ONE it decreases by 92% compared to NONE and by 80% compared to R-ONE for the 3-node cluster. With respect to scalability, Table 8 shows that Couchbase achieves a scale-up from 3 to 9 nodes for the write consistency NONE of 60%, for R-1 of 43% and for P-1 of 113%.

Figure 4: Q1 - Cassandra - OpenStack SSD Storage (throughput in ops/s for write consistencies ANY/ONE/TWO and 3-9 nodes)
Figure 5: Q1 - Couchbase - OpenStack SSD Storage (throughput in ops/s for write consistencies NONE/R-1/P-1 and 3-9 nodes)

Table 8: Scalability - OpenStack SSD Storage
  Apache Cassandra
  Consistency   3-nodes   5-nodes   7-nodes   9-nodes
  ANY           100%      +4%       +6%       +17%
  ONE           100%      +6%       +12%      +19%
  TWO           100%      +11%      +20%      +31%
  Couchbase
  Consistency   3-nodes   5-nodes   7-nodes   9-nodes
  NONE          100%      +6%       +17%      +60%
  R-1           100%      +13%      +38%      +43%
  P-1           100%      +33%      +53%      +113%

5.1.2 OpenStack Remote Storage Results. The evaluation scenarios on OpenStack with remote storage also comprise 3-9 nodes. Yet, the use of remote storage makes us expect that the overall performance of a write-heavy workload will suffer due to the concurrent use of storage. The graphs for Apache Cassandra depicted in Figure 6 indeed show less throughput than for the SSD case. They also show that a larger cluster size does not improve the throughput because the single remote storage server represents the shared resource and results in a bottleneck (cf. Section 4.2). Increasing the cluster size from 3 to 9 nodes even results in a scale-up of -27% for write consistency ANY as listed in Table 9.
Also Couchbase, depicted in Figure 7, achieves less throughput for all write consistency levels compared to the SSD case. Even though it uses asynchronous writes and no WAL, the write rate is limited and outstanding writes decrease throughput. While from 3 to 7 nodes Couchbase still achieves a scale-up for NONE and R-ONE, as the cache size and number of disk writers increases with the number of nodes as listed in Table 9, for the 9-node cluster the scalability factor is negative and the variance in the results increases. For P-ONE, the throughput stays on a constant level of 230 ops/s for 3-9 nodes.

Figure 6: Q1 - Cassandra - OpenStack Remote Storage (throughput in ops/s for write consistencies ANY/ONE/TWO and 3-9 nodes)
Figure 7: Q1 - Couchbase - OpenStack Remote Storage (throughput in ops/s for write consistencies NONE/R-1/P-1 and 3-9 nodes)
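The scale-up factors reported in Tables 8-10 can be reproduced from the averaged throughput values of the repeated runs; a minimal sketch follows (the function is ours and not part of Mowgli).

```python
from statistics import mean
from typing import Dict, List


def scale_up(throughput_per_cluster: Dict[int, List[float]], baseline: int = 3) -> Dict[int, float]:
    """Average throughput increase relative to the 3-node baseline, in percent."""
    base = mean(throughput_per_cluster[baseline])
    return {size: round((mean(values) / base - 1.0) * 100, 1)
            for size, values in throughput_per_cluster.items()}

# For the Cassandra/ONE row of Table 8 this would yield roughly
# {3: 0.0, 5: +6, 7: +12, 9: +19} when fed with the measured throughput values.
```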


Table 9: Scalability - OpenStack Remote Storage
  Apache Cassandra
  Consistency   3-nodes   5-nodes   7-nodes   9-nodes
  ANY           100%      -25%      -15%      -27%
  ONE           100%      -7%       -26%      -22%
  TWO           100%      +11%      -17%      -8%
  Couchbase
  Consistency   3-nodes   5-nodes   7-nodes   9-nodes
  NONE          100%      +52%      +68%      +5%
  R-1           100%      +20%      +21%      -10%
  P-1           100%      +1%       +9%       -8%

Table 10: Scalability - EC2 Remote Storage
  Apache Cassandra
  Consistency   3-nodes   5-nodes   7-nodes   9-nodes
  ANY           100%      +49%      +79%      +85%
  ONE           100%      +54%      +91%      +90%
  TWO           100%      +56%      +82%      +92%
  Couchbase
  Consistency   3-nodes   5-nodes   7-nodes   9-nodes
  NONE          100%      +49%      +71%      +76%
  R-1           100%      -3%       -39%      -24%
  P-1           100%      +20%      +43%      +68%

5.1.3 EC2 Remote Storage Results. The EC2 results of Cassandra (cf. Figure 8) show the analogue performance impact of the write consistency as for the OpenStack results, i.e. ANY and ONE are in similar ranges while TWO results in 10% less throughput compared to ONE, independent of the cluster size. While the provisioned EC2 VMs use remote storage, the results show a clear scale-up from 3 to 9 nodes, e.g. 90% for ONE as shown in Table 10. Hence, the EC2 remote storage infrastructure does not impose the same bottleneck as OS_REMOTE. The Couchbase results, depicted in Figure 9, verify the finding that the EC2 remote storage does not impose a bottleneck. For P-ONE, we see a scale-up of 68% from 3 to 9 nodes. Further, as in the OpenStack cases, Couchbase shows a significant drop in performance with higher consistency levels.

Figure 8: Q1 - Cassandra - EC2 Remote Storage (throughput in ops/s for write consistencies ANY/ONE/TWO and 3-9 nodes)
Figure 9: Q1 - Couchbase - EC2 Remote Storage (throughput in ops/s for write consistencies NONE/R-1/P-1 and 3-9 nodes)

5.1.4 Comparative DBMS Analysis. Comparing both DBMSs, Apache Cassandra achieves better throughput if strong write consistency is required. Couchbase achieves the highest throughput in total if the weakest write consistency NONE is applied, while for R-ONE the throughput is constantly lower (OpenStack SSD, EC2) or similar (OpenStack Remote) compared to Apache Cassandra; P-ONE constantly achieves lower throughput than Apache Cassandra. Hence, Apache Cassandra should be preferred if write consistency is required, while Couchbase should be preferred if maximum throughput is required and data inconsistency or (partial) data loss is tolerable.
With respect to scalability, both DBMSs scale with increasing cluster sizes if there is no bottleneck on the cloud resource level. Yet, the scale-up degree depends heavily on the applied DBMS runtime configurations and cloud resource configurations.
With respect to the cloud resource configuration, Apache Cassandra achieves better throughput with SSD storage backends as the WAL generates synchronous I/O for each write operation. In contrast, in the case of Couchbase, the applied storage backend affects the results of NONE and R-ONE only secondarily. Consequently, only the throughput of P-ONE seems to be correlated to the storage.

5.2 Q2 - Cloud Resource Allocation and Distribution Impact
In the following, we analyse the results of Q2 for the concrete evaluation domain properties: "Which cluster size CS_3,6,9 of DBMS D_CA achieves the highest throughput for workload W_YCSB_Sensor if operated with replication factor RF_3, ensuring write-consistency WC_low,medium,high and running on VM types VM_small,medium,large in cloud C_OS_SSD,OS_REMOTE if the maximum number of available cloud resources is CR−MAX_18 vCores, 36GB RAM?"
The results are grouped by OS_SSD_Q2 and OS_REMOTE_Q2 as listed in Table 6. For each cloud type, we discuss the impact of cluster size in correlation to the VM type and storage backend.

5.2.1 OpenStack SSD Results. The results depicted in Figure 10 show that the 3-node cluster on large VMs achieves the highest throughput. These results are expected as larger cluster sizes require additional network and coordination operations. Yet, the 3-node cluster also shows the highest throughput variance, which indicates potentially suboptimal placement of the large VMs on the same physical server or interfering load of other VMs. Yet, an analysis would require the correlation of physical server monitoring with the Mowgli results, which is currently not supported and depends on provider-specific monitoring information.


The relative throughput impact of the selected VM type and cluster size is listed in Table 11, where the average throughput of the 3-node cluster with VM type large represents the baseline. It is noteworthy that the throughput for write consistency TWO can even be increased by using more but less powerful nodes (e.g. the 6-node cluster on VM type medium), which increases the number of internal writer threads. This indicates that the write performance configuration of the vanilla Apache Cassandra installation is underprovisioned for the large VM size and provides optimisation potential.

Figure 10: Q2 - Cassandra - SSD Storage (avg/max/min throughput [ops/s] for ANY, ONE, TWO on 3 large, 6 medium and 9 small nodes; figure not reproduced in this extract)

Table 11: SSD Storage & Distribution Impact

Apache Cassandra
Consistency | 3-nodes (large) | 6-nodes (medium) | 9-nodes (small)
ANY         | 100%            | -7%              | -18%
ONE         | 100%            | -4%              | -16%
TWO         | 100%            | +2%              | -3%

5.2.2 OpenStack Remote Results. Figure 11 depicts the throughput of Apache Cassandra with respect to the three specified VM types and the remote storage backend. The results clearly show that larger cluster sizes decrease the throughput as the shared usage of the remote storage imposes a bottleneck. The relative throughput decrease is listed in Table 12, where the average throughput of the 3-node cluster again represents the baseline. In addition, these results verify the negative scale-up results for Apache Cassandra (cf. Section 5.1.1).

Figure 11: Q2 - Cassandra - Remote Storage (avg/max/min throughput [ops/s] for ANY, ONE, TWO on 3 large, 6 medium and 9 small nodes; figure not reproduced in this extract)

Table 12: Remote Storage & Distribution Impact

Apache Cassandra
Consistency | 3-nodes (large) | 6-nodes (medium) | 9-nodes (small)
ANY         | 100%            | -29%             | -43%
ONE         | 100%            | -28%             | -40%
TWO         | 100%            | -16%             | -30%

5.2.3 Comparative Resource Allocation Analysis. Comparing the different VM types and storage backends first shows that allocating larger VM types with small cluster sizes provides better throughput than large clusters of small VMs due to the additional communication and coordination overhead. Yet, the latter can improve the availability in case of physical hardware failures if the VMs are distributed equally across the physical infrastructure. Mowgli eases the determination of this performance versus availability trade-off. Second, as cloud resources are typically shared amongst multiple tenants, interferences can impact the performance or even limit the scalability, as shown for the remote storage in the Ulm OpenStack. Mowgli enables the extensive evaluation of cloud resources for the operation of DBMSs to identify potential bottlenecks and interferences.

6 DISCUSSION

Within this section, we first discuss the advantages and limitations of Mowgli in order to answer Q1, Q2 and similar questions based on the introduced requirements towards a multi-domain evaluation framework (cf. Section 2.4). Second, we validate the significance of our results by comparing them to related evaluation results and discuss how Mowgli can improve their portability and reproducibility.

6.1 Mowgli Feature Analysis

Ease of use (R1): While Mowgli comprises multiple loosely coupled services, the deployment of all five components is automated via Docker (https://omi-gitlab.e-technik.uni-ulm.de/mowgli/docker) and its configuration is based on six parameters. Mowgli provides a simple graphical as well as a REST-based interface. With the presented evaluation results we verified the usability of Mowgli for DBMS runtime and cloud resource specific evaluation scenarios. The specification of the sensor storage scenario comprises 16 template properties, separated into four DBMS properties, five cloud resource properties and seven workload properties (excluding YCSB binding-specific properties for Apache Cassandra/Couchbase). Our experience shows that undergraduate students can be taught to use Mowgli in less than a day.
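To give an impression of what such a 4+5+7 property split can look like, the following Python sketch mirrors the counts and domains described above. All property names and values are illustrative assumptions; they do not reproduce Mowgli's actual template schema.

    # Hypothetical sketch of a sensor storage evaluation scenario specification.
    sensor_storage_scenario = {
        "dbms": {                    # four DBMS properties
            "type": "cassandra",
            "version": "3.11",
            "clusterSize": 3,
            "replicationFactor": 3,
        },
        "cloudResource": {           # five cloud resource properties
            "provider": "openstack",
            "vmType": "small",
            "storageBackend": "ssd",
            "image": "ubuntu-16.04",
            "network": "private",
        },
        "workload": {                # seven workload properties
            "benchmark": "ycsb",
            "records": 2_000_000,
            "recordSizeKb": 5,
            "operations": 5_000_000,
            "writeRatio": 1.0,
            "workerThreads": 16,
            "reportingIntervalS": 10,
        },
    }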


Portability (R2): In the context of answering Q1 and Q2, the sensor storage scenario is applied to two DBMSs, two cloud providers, three different storage backends and three VM types, which shows that Mowgli eases the evaluation portability between different DBMSs or cloud resources. While the DBMS catalogue abstracts the deployment of the DBMS, thorough DBMS-specific knowledge is still required to extend the DBMS catalogue. Similarly, extending cloud resource templates or adding new cloud providers requires knowledge of the Cloudiator framework.

Reproducibility (R3): Using our existing templates, the deterministic reproduction of our validation scenario on any of the supported DBMSs or cloud providers is a matter of minutes.

Automation (R4): Mowgli fully automates the evaluation process based on customizable evaluation tasks by orchestrating the cloud resource allocation, DBMS deployment, workload execution, system monitoring and the release of cloud resources. It automates the collection of workload-specific performance metrics as well as the system and DBMS metrics during the evaluation execution. It also provides advanced processing and visualization support for the sensor storage scenario. Custom processing and visualization tasks are supported by implementing new objective processors.

Significance (R5): In its current state, Mowgli supports two major cloud providers, i.e. EC2 and Google Compute, as well as OpenStack for private clouds. With respect to the DBMS domain, five common DBMSs are supported (cf. Section 3.2). Regarding the workload domain, the YCSB enables comparable evaluation scenarios by simple synthetic workloads. In order to enable more in-depth evaluation scenarios for DBMS-specific features, additional workloads such as TPC-C, HTAP or trace-based workloads need to be integrated into Mowgli. As a first step in this direction, a preliminary TPC-C workload implementation has been integrated.

Extensibility (R6): As outlined in Section 3, Mowgli builds upon loosely coupled components, which interact via REST-based interfaces. Hence, the framework is prepared for extending dedicated components, e.g. the workload-API with additional workloads or the DBMS catalogue with additional DBMSs. In order to add a new evaluation scenario, the evaluation orchestrator needs to be extended by (1) defining a new evaluation workflow by building upon the existing tasks or by implementing new ones; (2) defining a new evaluation scenario template by building upon the existing DBMS configurations and cloud resource templates; (3) implementing the mapping from the scenario template to the scenario workflow. Consequently, thorough domain-specific knowledge is required, as the Mowgli components can only provide the conceptual technical abstraction, but the domain-specific commands are still required. The extensibility of Mowgli has been demonstrated by extending the current evaluation scenario (targeting performance and scalability) to elasticity [37] and availability [36].
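As a rough illustration of these extension points (new evaluation tasks and new objective processors), an extension could be shaped like the sketch below. The class and method names are invented for illustration; Mowgli's actual components are Java-based REST services, not Python classes.

    # Hypothetical sketch of the two extension points described above.
    class EvaluationTask:
        """One step of an evaluation workflow (e.g. allocate resources, deploy DBMS)."""
        def run(self, context: dict) -> dict:
            raise NotImplementedError

    class ObjectiveProcessor:
        """Turns raw measurements into scenario-specific result artefacts."""
        def process(self, raw_results: list) -> dict:
            raise NotImplementedError

    class AverageThroughputProcessor(ObjectiveProcessor):
        def process(self, raw_results: list) -> dict:
            # raw_results: list of (timestamp_s, ops_per_s) samples
            values = [ops for _, ops in raw_results]
            return {"avg_throughput_ops": sum(values) / len(values) if values else 0.0}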
6.2 Evaluation Result Verification

With the growing impact of distributed DBMSs, performance and scalability evaluations are a widely addressed research topic. Consequently, we compare existing DBMS evaluation results with our results to validate their correctness with respect to Q1 and Q2. Hereby, we only consider research publications and no white papers due to their questionable scientific neutrality. Further, we only select results for Apache Cassandra and Couchbase that evaluate the scalability of different DBMS cluster sizes and rely on the YCSB as workload. Optionally, the results are created on cloud resources. Several published results evaluate the performance of Apache Cassandra and Couchbase with the YCSB [2, 19, 23, 24, 38, 39]. Yet, only a few evaluate their scalability based on different cluster sizes [11, 30] and by using cloud resources [26, 37], which consolidates the need for Mowgli in order to ease the DBMS evaluation by portable and reproducible evaluation scenarios. In the following, these results are analysed and compared to our results in chronological order.

An evaluation of the early version 0.5.0 of Apache Cassandra has been conducted with the initial YCSB [11]. The evaluations are carried out on a proprietary private cloud middleware and with Cassandra cluster sizes from 2 to 12 nodes. While the results are based on read-heavy and read-update workloads, they also verify the scalability of Cassandra with growing cluster sizes.

The results of [30] execute a write-heavy YCSB workload against 2 to 14 Cassandra nodes on physical hardware. Similar to our results, Cassandra shows a throughput increase with growing cluster sizes. Yet, the results of [30] show a nearly linear scalability of Apache Cassandra, which is due to disabled replication and to scaling the YCSB client instances and threads relative to the cluster size. Hence, the Cassandra cluster is always saturated, while in our scenario a constant workload is applied, which saturates only a 3-node cluster. Yet, our presented evaluation scenario can easily be adapted to scale the workload in relation to the cluster size.

The previous evaluations [30] are reproduced by [26], replacing the physical resources with cloud resources on EC2 with remote storage. Similar to [30], the results show a nearly linear scalability of Cassandra by increasing the workload relative to the cluster size, which verifies our Apache Cassandra results on EC2. In addition, [26] evaluate the performance impact based on the selected cloud storage backend configurations, which accompanies our results with respect to the SSD and remote storage results. In this context, [26] emphasize the need but also the complexity to evaluate different cloud resource configurations.

In a preliminary version of Mowgli, the scalability of Cassandra and Couchbase was evaluated with read-heavy and read-update workloads using one VM type and one storage backend [37] on a dedicated host in the OpenStack cloud at Ulm. The results confirm the scalability of Couchbase and Cassandra with growing cluster sizes. Yet, the results are carried out without replication and with the lowest consistency settings. The impact of VM resource configurations including storage backends has not been analysed.

The comparative analysis of existing evaluation results verifies the significance of Mowgli, as the scalability of Apache Cassandra and Couchbase is confirmed. In addition, the impact of the selected storage backend is confirmed. Yet, the analysis also shows that reproducing existing evaluations is a time-consuming and error-prone task, as the required domain properties might not be documented or might have changed over time. This also limits the portability, as existing evaluations do not provide any abstraction of the evaluation domains. Hence, porting existing evaluation results becomes a challenging task [26]. Therefore, Mowgli enables the reproducible and portable evaluation execution for multi-domain scenarios.

7 RELATED WORK

Since the era of RDBMSs, their selection has been guided by domain-specific benchmarks that have evolved together with distributed DBMSs.


The need for portable and reproducible evaluations then led to the integration of existing benchmarks into evaluation frameworks, which extend the sole workload generation by DBMS runtime features to cover more complex evaluation domains [7, 35].

7.1 Benchmarks

Benchmarks are applied to evaluate non-functional features of DBMSs by artificial or trace-based workloads, producing evaluation metrics [21]. While traditional DBMS benchmarks mainly target performance, recent benchmarks also target non-functional features of distributed DBMSs, such as scalability, elasticity, consistency and availability [35]. The workload domain of DBMS benchmarks is distinguished between OLTP, Online Analytical Processing (OLAP) and the recently evolving HTAP.

Performance benchmarks of the OLTP and OLAP domains for the relational data model are provided by the Transaction Processing Performance Council (TPC, http://www.tpc.org/information/benchmarks.asp), namely TPC-C (http://www.tpc.org/tpcc/default.asp) and TPC-H (http://www.tpc.org/tpch/default.asp). Building upon these, HTAPBench enables HTAP workloads for the relational data model [10]. For the NoSQL data models, the YCSB [11] is widely used for performance and scalability benchmarks. Advancements of YCSB such as YCSB+T and YCSB++ focus on the performance of transactions in NoSQL models [13] and on the consistency of distributed NoSQL DBMSs [28], respectively. While the YCSB workloads target artificial CRUD operations, web-application workloads are presented by OLTP-Bench [14] and BG [3]. BenchFoundry [6] presents a trace-based workload generator for realistic workloads.

Existing benchmarks cover a variety of workload domains, which enables significant DBMS evaluations. Hence, our framework does not focus on the definition of a new benchmark but rather on integrating existing benchmarks to enable DBMS runtime-driven and resource-driven evaluation scenarios.

7.2 Cloud-centric DBMS Evaluations

A multitude of modern DBMS evaluations has been conducted within recent years based on existing benchmarks. Yet, only a subset of these evaluations focuses on cloud-related aspects. Originally, YCSB evaluated the performance and scalability of Apache Cassandra, Apache HBase and Yahoo PNUTS in Yahoo's data center [11]; yet, only for read- and update-heavy workloads running on a static pool of physical resources. Building upon YCSB, cloud-centric evaluations have been conducted: [26] focus on the scalability and elasticity of Apache Cassandra and HBase for different cluster sizes on Amazon EC2 with different remote storage backends. Also [24] build upon fixed EC2 resources for evaluating the performance impact of different consistency configurations for Apache Cassandra, MongoDB and Riak. A private OpenStack cloud is used by [37] to evaluate the scalability and elasticity of Apache Cassandra, Couchbase and MongoDB under varying workload sizes. While these results prove the scalability of Apache Cassandra and Couchbase with respect to read-heavy and read-update workloads, the scalability of write-heavy workloads has not been evaluated, especially with respect to different cloud resource offerings and storage backends. Yet, even where these evaluation results provide a thorough technical explanation, their reproducibility is limited and error-prone due to the complexity of the involved domains, i.e. cloud computing, distributed DBMSs and benchmarks. Hence, [26, 37] highlight the need for more sophisticated DBMS evaluation to ensure reproducibility, portability and significance.

7.3 Evaluation Frameworks

While the portability and reproducibility of evaluation results has been emphasized for a long time [21], its compliance becomes even more challenging with the evolving technologies. Hence, building only upon benchmarks for distributed DBMSs in the cloud is not sufficient to enable reproducible, portable and comparable results which take the runtime configurations into account [7, 35]. Therefore, evaluation frameworks need to provide additional features such as evaluation orchestration, resource abstraction and the specification of portable evaluation scenarios [7, 35].

[25] presents an evaluation framework that builds upon the YCSB and enables the evaluation of Amazon's DBaaS offerings and Apache Cassandra with a focus on scalability and the performance impact of different consistency configurations. Yet, the framework does not abstract the DBMS deployment and the cloud resource offerings. Hence, the framework does not support portable evaluation scenarios and cannot be applied to different cloud providers and cloud resources. A cloud-resource-centric framework is presented by [9], which provisions cloud resources, orchestrates applications, executes generic micro-benchmarks and monitors the execution performance. While this framework focuses on evaluating cloud resources based on resource-specific micro-benchmarks, DBMS evaluation and cloud resources for DBMSs are not in its scope.

Hence, current evaluation frameworks either focus on the benchmark execution and DBMS runtime configuration or on cloud resource benchmarking in general; Mowgli combines both aspects and automates the full evaluation execution. This enables the definition and execution of portable and comparable evaluation scenarios for different cloud resources and DBMSs.

8 CONCLUSION

Big Data and IoT demand distributed and scalable database management systems (DBMSs). Cloud resources provide the required scalability on the resource level. Yet, operating a DBMS in the cloud is a challenging task due to the immense number of DBMSs and cloud resource offerings. While DBMS evaluation guides this task, current approaches do not consider DBMS runtime and cloud resource constraints, limiting evaluation portability and reproducibility. Hence, we present Mowgli, which enables portable and reproducible evaluations in a consistent manner. Mowgli provides abstract evaluation scenario templates, which are mapped to DBMS runtime and cloud resource configurations and executed automatically by allocating cloud resources, deploying the DBMS cluster, executing the workload and adapting the DBMS cluster.

We evaluate the usability of Mowgli by applying an IoT-driven evaluation scenario for Apache Cassandra and Couchbase that first focusses on their performance and scalability with respect to the runtime constraints cluster size and write consistency and, second, focuses on the impact of the allocated cloud resources in correlation to the cluster size.


Both DBMSs are evaluated on three resource configurations comprising Amazon EC2 and a private OpenStack. The executed 102 evaluation scenarios verify the portability and reproducibility enabled by Mowgli and allow the correlation between DBMS runtime constraints and cloud resources. Both DBMSs show a scale-up for growing cluster sizes; the performance of Cassandra correlates with the applied storage, while the performance of Couchbase heavily depends on the applied write consistency level. The significance of Mowgli is verified by evaluating its features against established requirements of DBMS evaluation and by comparing the collected results against existing evaluation results.

While Mowgli provides a powerful tool to guide the way through the DBMS jungle, a holistic DBMS recommendation system building upon machine learning and artificial intelligence is subject to ongoing work. Furthermore, availability evaluation scenarios emulating cloud resource failures and sudden workload peaks are ongoing work. With respect to the workload domain, ongoing work investigates hybrid transaction-analytical processing workloads and their impact on the DBMS runtime and cloud resource constraints.

Acknowledgements

We thank Eddy Truyen of the KU Leuven for his constructive feedback on earlier versions of the paper. The research leading to these results has received funding from the EC's Framework Programme HORIZON 2020 under grant agreements 731664 (MELODIC) and 732667 (RECAP). We also thank Daimler TSS for their valuable and constructive discussions and the funding for parts of the work.

REFERENCES
[1] Daniel Abadi, Rakesh Agrawal, Anastasia Ailamaki, Magdalena Balazinska, Philip A. Bernstein, Michael J. Carey, Surajit Chaudhuri, Jeffrey Dean, AnHai Doan, Michael J. Franklin, et al. 2016. The Beckman report on database research. Commun. ACM 59, 2 (2016), 92–99.
[2] Veronika Abramova, Jorge Bernardino, and Pedro Furtado. 2014. Which NoSQL database? A performance overview. Open Journal of Databases (OJDB) 1, 2 (2014), 17–24.
[3] Sumita Barahmand and Shahram Ghandeharizadeh. 2013. BG: A Benchmark to Evaluate Interactive Social Networking Actions. In CIDR.
[4] D. Baur, D. Seybold, F. Griesinger, H. Masata, and J. Domaschka. 2018. A Provider-Agnostic Approach to Multi-cloud Orchestration Using a Constraint Language. In 2018 18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID). 173–182. https://doi.org/10.1109/CCGRID.2018.00032
[5] Daniel Baur, Daniel Seybold, Frank Griesinger, Athanasios Tsitsipas, Christopher B. Hauser, and Jörg Domaschka. 2015. Cloud Orchestration Features: Are Tools Fit for Purpose?. In IEEE/ACM UCC.
[6] David Bermbach, Jörn Kuhlenkamp, Akon Dey, Arunmoezhi Ramachandran, Alan Fekete, and Stefan Tai. 2017. BenchFoundry: A Benchmarking Framework for Cloud Storage Services. In International Conference on Service-Oriented Computing. Springer, 314–330.
[7] David Bermbach, Jörn Kuhlenkamp, Akon Dey, Sherif Sakr, and Raghunath Nambiar. 2014. Towards an extensible middleware for database benchmarking. In TPCTC.
[8] Rick Cattell. 2011. Scalable SQL and NoSQL data stores. ACM SIGMOD Record 39, 4 (2011), 12–27.
[9] Ryan Chard, Kyle Chard, Bryan Ng, Kris Bubendorfer, Alex Rodriguez, Ravi Madduri, and Ian Foster. 2016. An automated tool profiling service for the cloud. In Cluster, Cloud and Grid Computing (CCGrid), 2016 16th IEEE/ACM International Symposium on. IEEE, 223–232.
[10] Fábio Coelho, João Paulo, Ricardo Vilaça, José Pereira, and Rui Oliveira. 2017. HTAPBench: Hybrid Transactional and Analytical Processing Benchmark. In Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering. ACM, 293–304.
[11] Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking cloud serving systems with YCSB. In ACM SoCC.
[12] Ali Davoudian, Liu Chen, and Mengchi Liu. 2018. A Survey on NoSQL Stores. ACM Computing Surveys (CSUR) 51, 2 (2018), 40.
[13] Akon Dey, Alan Fekete, Raghunath Nambiar, and Uwe Röhm. 2014. YCSB+T: Benchmarking web-scale transactional databases. In IEEE ICDEW.
[14] Djellel Eddine Difallah, Andrew Pavlo, Carlo Curino, and Philippe Cudre-Mauroux. 2013. OLTP-Bench: An extensible testbed for benchmarking relational databases. VLDB Endowment (2013).
[15] Jörg Domaschka, Daniel Baur, Daniel Seybold, and Frank Griesinger. 2015. Cloudiator: a cross-cloud, multi-tenant deployment and runtime engine. In 9th Symposium and Summer School on Service-Oriented Computing.
[16] Jörg Domaschka, Christopher B. Hauser, and Benjamin Erb. 2014. Reliability and availability properties of distributed database systems. In Enterprise Distributed Object Computing Conference (EDOC), 2014 IEEE 18th International. IEEE, 226–233.
[17] Thibault Dory, Boris Mejías, P. V. Roy, and Nam-Luc Tran. 2011. Measuring elasticity for cloud databases. In Proceedings of the Second International Conference on Cloud Computing, GRIDs, and Virtualization.
[18] Michael Galloway, Gabriel Loewen, Jeffrey Robinson, and Susan Vrbsky. 2018. Performance of Virtual Machines using Diskfull and Diskless Compute Nodes. (2018). https://doi.org/10.1109/CLOUD.2018.00101
[19] Andrea Gandini, Marco Gribaudo, William J. Knottenbelt, Rasha Osman, and Pietro Piazzolla. 2014. Performance evaluation of NoSQL databases. In European Workshop on Performance Engineering. Springer, 16–29.
[20] Felix Gessert, Wolfram Wingerath, Steffen Friedrich, and Norbert Ritter. 2017. NoSQL database systems: a survey and decision guidance. Computer Science - Research and Development 32, 3-4 (2017), 353–365.
[21] Jim Gray. 1992. Benchmark handbook: for database and transaction processing systems.
[22] Katarina Grolinger, Wilson A. Higashino, Abhinav Tiwari, and Miriam A. M. Capretz. 2013. Data management in cloud environments: NoSQL and NewSQL data stores. Journal of Cloud Computing: Advances, Systems and Applications 2, 1 (2013), 22.
[23] Abdullah Talha Kabakus and Resul Kara. 2017. A performance evaluation of in-memory databases. Journal of King Saud University - Computer and Information Sciences 29, 4 (2017), 520–525.
[24] John Klein, Ian Gorton, Neil Ernst, Patrick Donohoe, Kim Pham, and Chrisjan Matser. 2015. Performance evaluation of NoSQL databases: a case study. In Proceedings of the 1st Workshop on Performance Analysis of Big Data Systems. ACM, 5–10.
[25] Markus Klems, David Bermbach, and Rene Weinert. 2012. A runtime quality measurement framework for cloud database service systems. In Quality of Information and Communications Technology (QUATIC), 2012 Eighth International Conference on the. IEEE, 38–46.
[26] Jörn Kuhlenkamp, Markus Klems, and Oliver Röss. 2014. Benchmarking scalability and elasticity of distributed database systems. Proceedings of the VLDB Endowment 7, 12 (2014), 1219–1230.
[27] Avinash Lakshman and Prashant Malik. 2010. Cassandra: a decentralized structured storage system. ACM SIGOPS Operating Systems Review 44, 2 (2010), 35–40.
[28] Swapnil Patil, Milo Polte, Kai Ren, Wittawat Tantisiriroj, Lin Xiao, Julio López, Garth Gibson, Adam Fuchs, and Billie Rinaldi. 2011. YCSB++: benchmarking and performance debugging advanced features in scalable table stores. In ACM SoCC.
[29] Mark Raasveldt, Pedro Holanda, Tim Gubner, and Hannes Mühleisen. 2018. Fair Benchmarking Considered Difficult: Common Pitfalls In Database Performance Testing. In Proceedings of the Workshop on Testing Database Systems. ACM, 2.
[30] Tilmann Rabl, Sergio Gómez-Villamor, Mohammad Sadoghi, Victor Muntés-Mulero, Hans-Arno Jacobsen, and Serge Mankovskii. 2012. Solving big data challenges for enterprise application performance management. Proceedings of the VLDB Endowment 5, 12 (2012), 1724–1735.
[31] Vincent Reniers, Dimitri Van Landuyt, Ansar Rafique, and Wouter Joosen. 2017. On the state of NoSQL benchmarks. In Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering Companion. ACM, 107–112.
[32] Pramod J. Sadalage and Martin Fowler. 2013. NoSQL distilled: a brief guide to the emerging world of polyglot persistence. Pearson Education.
[33] Sherif Sakr. 2014. Cloud-hosted databases: technologies, challenges and opportunities. Cluster Computing 17, 2 (2014), 487–502.
[34] Daniel Seybold. 2017. Towards a framework for orchestrated distributed database evaluation in the cloud. In Proceedings of the 18th Doctoral Symposium of the 18th International Middleware Conference. ACM, 13–14.
[35] Daniel Seybold and Jörg Domaschka. 2017. Is Distributed Database Evaluation Cloud-Ready?. In Advances in Databases and Information Systems. Springer, 100–108.
[36] Daniel Seybold, Christopher B. Hauser, Simon Volpert, and Jörg Domaschka. 2017. Gibbon: An Availability Evaluation Framework for Distributed Databases. In OTM Confederated International Conferences "On the Move to Meaningful Internet Systems". Springer, 31–49.
[37] Daniel Seybold, Nicolas Wagner, Benjamin Erb, and Jörg Domaschka. 2016. Is elasticity of scalable databases a Myth?. In IEEE Big Data.
[38] Enqing Tang and Yushun Fan. 2016. Performance comparison between five NoSQL databases. In Cloud Computing and Big Data (CCBD), 2016 7th International Conference on. IEEE, 105–109.
[39] Jan Sipke Van der Veen, Bram Van der Waaij, and Robert J. Meijer. 2012. Sensor data storage performance: SQL or NoSQL, physical or virtual. In Cloud Computing (CLOUD), 2012 IEEE 5th International Conference on. IEEE, 431–438.


Chapter 13 [core6] Kaa: Evaluating Elasticity of Cloud-Hosted DBMS

This article is published as follows:

Daniel Seybold, Simon Volpert, Stefan Wesner, André Bauer, Nikolas Herbst, and Jörg Domaschka. "Kaa: Evaluating Elasticity of Cloud-Hosted DBMS" in 11th IEEE International Conference on Cloud Computing Technology and Science (CloudCom), 2019, IEEE, pp. 54–61, DOI: https://doi.org/10.1109/CloudCom.2019.00020.

©2019 IEEE. Reprinted, with permission.

In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of Ulm University's products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a License from RightsLink.

Kaa: Evaluating Elasticity of Cloud-hosted DBMS

Daniel Seybold, Simon Volpert, Stefan Wesner (Institute of Information Resource Management, Ulm University, Germany); André Bauer, Nikolas Herbst (Informatik II, University of Würzburg, Germany); Jörg Domaschka (Institute of Information Resource Management, Ulm University, Germany)

Abstract—Auto-scaling is able to change the scale of an application at runtime. Understanding the application characteristics, the scaling impact as well as the workload, an auto-scaler aligns the acquired resources to match the current workload. For distributed Database Management Systems (DBMS) forming the backend of many large-scale cloud applications, it is currently an open question to what extent they support scaling at run-time. In particular, elasticity properties of existing distributed DBMS are widely unknown and difficult to evaluate and compare. This paper presents a comprehensive methodology for the evaluation of the elasticity of distributed DBMS. On the basis of this methodology, we introduce a framework that automates the full evaluation process. We validate the framework by defining significant elasticity scenarios for a case study that comprises two DBMS for write-heavy and read-heavy workloads of different intensities. The results show that scalable distributed DBMS are not necessarily elastic and that adding more instances to a cluster at run-time may even decrease the experienced performance.

Index Terms—elasticity, cloud, NoSQL, scalability, distributed DBMS

I. INTRODUCTION

Auto-scaling exploits the cloud's on-demand resources to adapt the application scale to the resource demands of the application's currently experienced workload [1]. Realising this kind of elasticity requires a scalable application architecture and the capability of the application to scale out/in at run-time [2].

For stateless applications, scaling at run-time is no issue and research has focussed on auto-scalers that determine the required application scale at a given time. For such scenarios, the quality of the achieved elasticity is of high importance [3]. For stateful applications such as Database Management Systems (DBMS), the NoSQL movement promises scalability in return for giving up the ACID properties of relational DBMS, favouring distributed DBMS [4]. For this kind of DBMS, the capability to scale out/in at run-time is of importance: (a) to provide client-side performance guarantees for changing workload patterns; (b) with the growing amount of data generated, DBMS need to grow over time as well; (c) any elastic application may reach a scale-out degree where the storage back-end becomes a bottleneck if not scaled as well; (d) when offering database-as-a-service (DBaaS) [5], cloud operators benefit from the ability to adapt the scale of their customers' DBMS instances according to the workload; (e) a better understanding of stateful applications will help to improve auto-scalers in wide-spread orchestration platforms such as Kubernetes.

While DBMS auto-scalers exist for very specific scenarios [6], [7], it is an open question to what extent existing DBMS support elasticity [8] for general workloads; similarly, the metrics needed to determine the quality of elasticity in different scenarios are widely undecided, but needed to compare DBMS [9]. Consequently, elasticity evaluations of cloud-hosted DBMS are strongly needed [4].

In accordance with established methods [10], [11], we claim that cloud-based evaluations need to increase the availability of data, the quality of data, and the flexibility and repeatability of experiments to mitigate the current situation and improve the understanding of the elasticity of distributed DBMS. Considering the enormous design space, only an automation framework is able to provide the necessary amount of data and ensure repeatability.

In consequence, this paper provides the following contributions: (i) a comprehensive methodology for the evaluation of distributed DBMS elasticity that builds on established metrics [12], [13]; (ii) an extension to our Mowgli evaluation framework [14] capable of evaluating elasticity based on the above methodology; (iii) a validation of the framework by defining significant elasticity scenarios for a case study with two DBMS under write-heavy and read-heavy workloads.

The remainder of this paper is structured as follows: Section II introduces the background on distributed DBMS and DBMS scalability and elasticity. Section III presents challenges with respect to DBMS elasticity evaluation, while in Section IV we discuss our Kaa evaluation framework. In Section V, we present a case study evaluating the elasticity of two DBMS under two workloads. We discuss the results in Section VI, before Section VII presents related work and Section VIII concludes.

II. BACKGROUND AND TERMINOLOGY

This section summarises the background on distributed DBMS and presents the terminology we use in this paper, particularly with respect to scalability and elasticity.

A. Distributed Database Management Systems

The evolution of DBMS has led to an increasing heterogeneity in available DBMS. Currently they are separated into three top-level categories: relational, NoSQL and NewSQL [4], [15]. For being able to benefit from any type of horizontal scale-out, the DBMS needs to operate as a distributed DBMS. Distributed DBMS provide a logical DBMS instance to clients, but distribute the DBMS functionality across multiple physical or virtual resource entities. These hosting DBMS nodes are connected via network and form a DBMS cluster.

Distributed DBMS exploit sharding as a basic technique: the full data set is separated such that each data-hosting DBMS node manages only a local share of the overall data set. With more nodes joining the cluster, the overall storage and memory capacity increases and the system scales horizontally. Ideally, adding more nodes to the cluster also increases the compute capacity so that the throughput scales horizontally as well.

Due to the fact that an increased cluster size increases the probability of failures, distributed DBMS also provide means for replication. That means, a single data item is stored by multiple cluster members so that in case of single-node failures, one or more copies are still available. The consistency level of the data defines how eagerly replicas of data items are kept in sync with each other. Besides providing fault-tolerance and increasing availability, the use of replicas also allows to further increase the scaling of read operations, as multiple nodes host the item.

B. DBMS Elastic Scale-out Process

Most distributed DBMS support elastic scale-out, i.e. increasing the DBMS cluster at run-time without service interruption [16]. Figure 1 illustrates these steps: In the first step, a new resource is provided. In particular for cloud-hosted DBMS, this means acquiring a new virtual machine. Step two provisions the necessary software, including the installation of the DBMS binaries. Step four sets the configuration files. Afterwards, the new DBMS node joins the DBMS cluster. At that point, it is not able to handle client requests yet. This can only be done after the data set managed by the DBMS has been re-sharded and the new shards have been distributed over all cluster members. During this transition, additional compute and storage resources are used to exchange data amongst all nodes of the cluster. The step completes once the new node is able to answer client requests and hence, from a client-side perspective, appears as a regular cluster member. Internally, the DBMS may go through a stabilization phase [12] once the data shuffling has been completed.

Fig. 1: Steps of an elastic scale-out (allocate resource, install software, configure DBMS, distribute data, stabilization)

All of the steps except for the first one differ from DBMS to DBMS. Among the remaining ones, software installation and configuration need to be performed from outside the DBMS. The distribution of data and the stabilization phase run automatically and usually cannot be influenced externally.

C. DBMS Scalability and Elasticity Metrics

Here, we define scalability and elasticity as used in this paper. By building upon established elasticity metrics [12], we define metrics related to these two aspects that build upon traditional performance metrics: storage capacity (gigabytes), throughput (requests per second) as well as latency (response time per request) [17].

Definition 1 (DBMS Horizontal Scalability). Horizontal scalability denotes the capability of a distributed application to increase its performance by increasing the cluster size.

The remainder of this paper focusses on scalability with respect to throughput and latency. In particular, the performance improvement achieved from increasing the DBMS cluster size is defined by scale-out metrics:

Metric 1 (DBMS Scale-out). The Scale-out metric [12] correlates the performance of the DBMS with the DBMS cluster size for a given workload. Ideally, not only the cluster size, but the overall available resources need to be considered.

Accordingly, the scale-out factor denotes the improvement in latency and throughput when changing the size of a DBMS cluster. A good scale-out property is indicated by a constant latency and a throughput increasing proportionally with the cluster sizes [12]. Yet, the Scale-up metric exclusively determines the performance difference between two cluster sizes. It does not capture the impact of scaling out a cluster at run-time and therefore ignores aspects related to elasticity.

Definition 2 (Elastic scale-out). An elastic scale-out is the change of a DBMS cluster size at run-time. Its impact is measured through the DBMS elasticity metrics data distribution impact, data distribution time, and provisioning time [12].

Metric 2 (Provisioning time). This metric captures the time span required to provide a new computational resource, e.g. a virtual machine, and to install software on that resource. As such, it also reflects differences between cloud providers.

Metric 3 (Data distribution impact). This metric captures the performance change during the data distribution phase of an elastic scale-out.

Metric 4 (Data distribution time). This metric captures the duration of the data distribution phase.

Conceptually, similar metrics exist for the stabilisation phase. Yet, we are currently not aware how they can be precisely measured for any DBMS presented in this paper, so that we do not consider them in the following.

III. DBMS ELASTICITY EVALUATION CHALLENGES

The analysis of established DBMS benchmarks [18] shows that they only support client-side performance metrics, but lack the required support on the DBMS operator side. Due to that, we first define an elasticity evaluation process that includes the DBMS operator related tasks. Secondly, we identify requirements for enabling holistic DBMS elasticity evaluations.
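As an aid for the later case study, Metrics 2–4 can be operationalised from the recorded scale-out trigger, VM ready and scale-out complete timestamps; the following is one possible formalisation consistent with the textual definitions above (the notation is ours, not taken from the paper):

    t_{prov} = t_{VM\,ready} - t_{trigger}, \qquad
    t_{dist} = t_{complete} - t_{VM\,ready}, \qquad
    \Delta_{dist} = \frac{\bar{T}_{[t_{VM\,ready},\,t_{complete}]}}{\bar{T}_{[0,\,t_{trigger}]}} - 1

where \bar{T}_{[a,b]} denotes the average client-side throughput observed in the interval [a, b].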
A. Elasticity Evaluation Process

Figure 2 depicts our elasticity evaluation process with the individual evaluation tasks (ET) defined in Table I. In contrast to DBMS workload tools, this process supports the operator side by managing cloud resources and the DBMS cluster [18], [19]. In addition, the process handles workload issuing and client-side evaluation metrics.

Fig. 2: Elasticity Evaluation Process (ET 1: allocate cloud resource; ET 2.1: deploy DBMS cluster; ET 2.2: setup DBMS monitor; ET 2.3: setup workload monitor; ET 3: execute workload; ET 4: adapt DBMS cluster; ET 5: adapt workload; ET 6: release cloud resource; ET 7: process evaluation objective)

In the first step of the evaluation process, the necessary resources are allocated (ET 1). ET 2 comprises the deployment of the workload generators and the DBMS cluster on the allocated resources as well as the setup of the DBMS and workload monitoring systems. The workload is executed in ET 3. While the system is under test, adaptations can be applied for the DBMS cluster (ET 4) and the workload (ET 5).

Adaptations to the cluster size (ET 4) are a requirement for any type of elasticity evaluation. The DBMS adaptations in ET 4 can either be time-based or metric-based. In the first case, predefined timestamps are used as a trigger for executing the DBMS adaptation. Time-based triggers enable isolated elasticity evaluations for specific workload configurations, ensuring reproducibility. In the latter case, system and DBMS metrics are retrieved from the DBMS monitor (cf. ET 2.2) and combined into DBMS scaling rules (DSR). DSR enable more realistic evaluations, but they require a thorough understanding of the DBMS utilization in order to apply reasonable metric compositions. Consequently, DSR neglect reproducibility in favour of realistic scenarios.

Adaptations to the workload (ET 5) are required to emulate fluctuating workloads. The trigger for these adaptations can be time-based, progress-based, or metric-based: time-based triggers use predefined timestamps, progress-based triggers refer to the execution progress of ET 3, and composed metric-based triggers are termed workload scaling rules (WSR) and follow the same concept as DSR. If specified, DBMS and workload adaptations can be repeated multiple times within an elasticity evaluation to emulate realistic application scenarios of increasing and decreasing DBMS workload patterns [20].

TABLE I: Elasticity Evaluation Tasks

ET  | Description
1   | allocate new cloud resources for the evaluation
2.1 | deploy and configure the DBMS cluster
2.2 | setup system and DBMS monitoring
2.3 | (optional) setup workload state monitoring
3   | execute a baseline workload to measure the performance metrics
4   | execute elastic scale-out/-in based on predefined adaptation rules
5   | (optional) adapt workload intensity
6   | release the cloud resources
7   | collect and process the elasticity results

B. Elasticity Evaluation Requirements

The evaluation process from Section III-A defines the evaluation tasks, but also increases the complexity compared to performance or even scalability evaluations, which already depend on many impact factors including the DBMS, cloud resources, and workload domain [14]. For handling this complexity, a framework supporting automated elasticity evaluations needs to fulfil the following requirements:

An elasticity evaluation scenario specification (R1) that comprises all technical aspects to ensure the reproducibility and portability for adopters and subsequent evaluations. Therefore, it needs to contain the cloud resource, DBMS runtime and workload configurations and provide an executable scenario of the full evaluation process (cf. Figure 2). The elasticity evaluation scenario needs to support the definition of the adaptation tasks (ET 4, ET 5) on a fine-grained level, yielding relevant adaptations on the operator side, realistic workload models, and support for fluctuating workloads and overload situations [9].

An adaptation controller (R2) provides an external adaptation mechanism for the DBMS cluster, but also the adaptation of running workloads and the scheduling of additional workloads. The latter supports the emulation of fluctuating workloads and overload situations.

A detailed evaluation data set (R3) is the basis for the computation of the elasticity metrics defined in Section II-C. It includes not only fine-grained performance metrics, but also an extensive set of evaluation meta-data such as cloud resource provisioning times, adaptation trigger timestamps, system metrics, and DBMS metrics.

Evaluation automation support (R4) covering all ETs ensures transparency and reproducibility, which are key concepts in DBMS and cloud service evaluations [10], [11]. Moreover, portability support is required by a reasonable level of abstraction across the cloud, DBMS and workload domains [18], [21], i.e. enabling the execution of an evaluation scenario on different cloud resources or for different DBMS.

IV. ELASTICITY EVALUATION FRAMEWORK: KAA

In earlier work we analysed established DBMS benchmarks and evaluation frameworks with respect to their support of elasticity evaluations [18]. The results show that a variety of OLTP (online-transaction-processing) and HTAP (hybrid-transaction-analytical-processing) benchmarks partially fulfil R3, but none of them support R1, R2 and R4. Here, we introduce Kaa to overcome these limitations. It is based on our Mowgli scalability evaluation framework [14]. We first give an overview of Mowgli before introducing Kaa and its elasticity evaluation extensions.

A. Mowgli

Mowgli automates the performance and scalability evaluation process, i.e. ET 1 – ET 3 and ET 6 – ET 7 (cf. Figure 2). It enables the definition of portable and reproducible performance and scalability evaluation templates and their execution via a loosely coupled and extensible architecture (cf. Figure 3).
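To make the trigger types from Section III-A concrete, a time-based trigger and a DBMS scaling rule (DSR) composed of a monitored metric, a threshold and a validity duration might be sketched as follows. This is an illustrative sketch only; Kaa's actual adaptation templates are specified declaratively (cf. Table II), not as Python classes.

    from dataclasses import dataclass

    @dataclass
    class TimeTrigger:
        """Fires once after a fixed delay; keeps the evaluation reproducible."""
        delay_s: int
        def fires(self, elapsed_s: float, samples) -> bool:
            return elapsed_s >= self.delay_s

    @dataclass
    class DbmsScalingRule:
        """DSR: a monitored metric must exceed a threshold for a given duration."""
        metric: str        # e.g. "cpu_utilisation", taken from the DBMS monitor
        threshold: float
        duration_s: int
        def fires(self, elapsed_s: float, samples) -> bool:
            # samples: list of (timestamp_s, value) pairs for self.metric
            window = [v for t, v in samples if elapsed_s - t <= self.duration_s]
            return bool(window) and min(window) >= self.threshold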
of establish execution Figure to automated in and boxes specification (blue the Kaa through Mowgli Kaa B. the and [22] metrics. COT DBMS Cloudiator the (W-API) via APIs ES each of The ES. each the of structure API input required the define catalogues. templates cloud and workload auto-generated [22] and DBMS, resource (COTs) extensible with tools them orchestration combines cloud exploits Mowgli } DBMS adapt: , workload:ycsb cloud:ec2, dbms:couchbase elasticTemplate 3 i.3 ogiacietr ihKaeatct extension elasticity Kaa with architecture Mowgli 3: Fig. h BSaatto pcfiaincomprises specification adaptation DBMS The enhance we evaluations, elasticity support to order In DSR trigger time type trigger type adaptation Constraint https://omi-gitlab.e-technik.uni-ulm.de/mowgli/getting-started h aaouscnantetcnclipeetto of implementation technical the contain catalogues The . :{ ,

AL I BSAatto Template Adaptation DBMS II: TABLE Evaluation-API Catalogue Evaluation OrchestratorEvaluation DBMS nadto,i olcsetniesse and system extensive collects it addition, In . vlainorchestrator evaluation processor objective elasticity scenario ( { { { Range Parameter erc hehl,duration threshold, metric, 1 DSR time, scale n . . . Catalogue Workload − } measurement u,scale out, seconds metadata collector collector Adaptation Catalogue } − vlainseai (ES) scenario Evaluation data flow data in ceue h execution the schedules } Cloudiator cloud resource resource cloud W catalogue monitor runtime - API control flow control ) W - API n ET and 4 aeStudy Case 3 Evaluation scale-out . extensions elasticity workload- n time N/A 180 W DBMS - API 5 rgesaepoesdi h iia a sfrteDBMS the ( the workload correlate baseline for the as triggers of way progress metric-based Progress-based overall similar and templates. the Time- in adaptation supported. processed is are option triggers latter [12], the (YCSB) Benchmark only Serving Cloud W- For Yahoo workload-dependent. additional integrated is the on decision workloads actual via The identical workload instances. starts API running (ii) a or of W-API for the threads of type number the adaptation increases the where templates tation CPU, metrics. metrics for DBMS-specific extensible system and is metrics framework the system The additional on network. and based disk, memory, triggers supports Kaa o ahwrla nest,w banteDM utilization. DBMS the obtain we intensity, workload (ET each For and size fixed a cluster to pre-defined the workloads intensive In increasing apply DBMS: iteratively the of each Methodology A. [23]. set available data are publicly sets Kaa data and 160 complete results archived The the the evaluations. latency-related while analyse contains throughput, only also to we the limitations respect analyse space with we to Secondly, Due use. results. we specifications ES concrete data the of surpasses throughput. and completion initial increase the the due to starts With decreases throughput redistribution. the throughput redistribution, data the the while to requests serves uously provider: DBaaS Hypothesis: each for baseline relevant on following is the which operated address hypothesis we DBMS [14], providers, workloads cloud distributed and multiple DBMS supports two Kaa While of resources. cloud elasticity the ating ( challenges ad- for formats interpretable machine processing. in post vanced provided runtimes are and task-related configurations that extensive applied as also such but meta-information metrics, performance provide threshold. progress h nlgeapoc seeue o h okodadap- workload the for executed is approach analogue The o h vlain eapyatopaeapoc for approach two-phase a apply we evaluation, the For the and methodology introduce we first following, the In evaluation identified the against Kaa validate to order In to able only not is Kaa task, evaluation each automating By WSR trigger progress trigger time type trigger threads worker type adaptation Constraint AL I:Wrla dpainTemplate Adaptation Workload III: TABLE cf. uiga lsi cl-u,teDM contin- DBMS the scale-out, elastic an During eto I) eapyacs td evalu- study case a apply we III), Section ( { { { { { Range Parameter erc hehl,duration threshold, metric, 0 1 SR W progress, time, 1 decrease increase, . . . n . . . n . . . .C V. 1 . } } 0 } minutes ASE % okodcalibration workload fttlwrla progress workload total of ET – 1 S TUDY } } ,ET 3, i.e. 
) increase ET ,adET and 6, )wt the with 3) hs,we phase, ihr(i) either 7). From the (utilization, workload intensity) tuples, we identify TABLE VI: Workload Specifications those workload intensities that lead to low, optimal or overload Specification write-heavy read-heavy DBMS utilization as specified in Table VII. Metric reporting interval 10 s In the elastic scale-out phase, we apply the resulting three Initial Number of records 2.000.000 tuples under a time-based adaptation trigger. This results in Record size 5KB the iterative execution of ET 1 – ET 4, ET 6, and ET 7. Operations 5.000.0000 writes. 5.000.0000 reads Read distribution N/A zipfian B. Configuration Calibration worker threads 1-2-4-8-16-32-64-128 For the case study, we rely on fixed cloud resource configu- rations: VM SMALL for DBMS nodes and VM LARGE for the W-API instance (cf. Table IV). All physical servers hosting 14000 cassandra-write cassandra-read VM SMALL types use a two SSDs Raid-0 configuration. couchbase-write 12000 couchbase-read TABLE IV: Cloud Resource Specifications 10000 Specification VM SMALL VM LARGE 8000 Cloud private OpenStack Ulm, version Rocky VM vCores 2 8 6000 VM RAM 4GB 8GB

VM Disk 70GB 20GB 4000 OS Ubuntu Server 16.04 avg. throughput in ops/s

network private, 10GbE 2000

0

We select Apache Cassandra and Couchbase as both are 1 2 4 8 16 32 64

4 128 popular NoSQL DBMS that have already been subject to workload intensities by YCSB client threads scalability evaluations and achieved promising results [14], [24]. Both provide a comparable multi-master architecture that Fig. 4: Calibration Results — 3 Node Cluster supports automated sharding and horizontal scalability. Due to their architectural similarities, a comparable cluster size and replication factor can be applied (cf. Table V). Yet, as they presents the calibration scenario results. The y-axis displays differ with respect to their consistency mechanisms, client-side the average throughput and standard deviation of the five consistency settings differ. Each DBMS node is configured to executions per calibration scenario. The y-axis shows the use 50% of the available memory for its operation. applied workload intensities. Based on the scale-out metric (cf. Section II-C) and increasing standard deviation, we obtain TABLE V: DBMS Runtime Specifications the six (utilization, workload intensity) tuples as shown in Table VII. Specification Apache Cassandra Couchbase Version 3.11.2 5.0.1 community TABLE VII: Calibration Tuples Cluster size 3 3 Replication 3 3 Specification Cassandra Couchbase Write consistency ONE P=0, R=0 low highest average throughput with 4 4 Read consistency ONE default lowest standard deviation optimal highest average throughput in 16 32 relation to the standard deviation For the evaluation, Kaa uses one W-API that provides overload decreased average throughput 64 128 YCSB [12] in version 0.15.0. It is a widely adopted workload and growing standard deviation to evaluate distributed DBMS. In order to demonstrate the flex- ibility of our approach, we define typical DBMS workloads: the write-heavy workload emulates volume spikes while the D. Elastic Scale-out Phase Results read-heavy workload emulates data spikes [20]. The elastic scale-out phase analyses the DBMS elasticity C. Calibration Phase Results by applying one elastic scale-out adaptation as specified in The calibration phase is required to determine the DBMS the case study column of Table II. Each elasticity scenario utilization by correlating the workload intensities (cf. Ta- uses an initial cluster size of three nodes (cf. Table V) and ble VI) with the resulting throughput of the write-heavy and applies a time-based trigger after 180 seconds that starts the read-heavy workloads. By choosing these number of worker elastic scale-out. We specify and execute the ESs for all threads, we ensure that the CPU load of the W-API does not eight workload intensities used in the calibration phase, each 4 exceed 80% to avoid a bottleneck at W-API side. Figure 4 executed for five times, resulting in 80 evaluation data sets . We validate the introduced hypothesis in an explorative 4https://db-engines.com/en/ranking manner by analysing the throughput development in correla- 16000 low low 5000 optimal optimal overload 14000 overload scale-out trigger scale-out trigger VM ready VM ready 12000 scale-out complete low scale-out 4000 scale-out complete complete optimal scale-out scale-out complete 10000 complete overload scale-out complete 3000 8000

D. Elastic Scale-out Phase Results

The elastic scale-out phase analyses the DBMS elasticity by applying one elastic scale-out adaptation as specified in the case study column of Table II. Each elasticity scenario uses an initial cluster size of three nodes (cf. Table V) and applies a time-based trigger after 180 seconds that starts the elastic scale-out. We specify and execute the ESs for all eight workload intensities used in the calibration phase, each executed five times, resulting in 80 evaluation data sets. We validate the introduced hypothesis in an explorative manner by analysing the throughput development in correlation to the applied evaluation tasks.

The resulting graphs depict on the x-axis the evaluation runtime in seconds, i.e. ET 1 to ET 7. The y-axis depicts the average throughput including the standard deviation per timestamp over the five scenario executions. Each graph is workload-specific and groups the three calibration tuples in one graph per DBMS. In order to visualize the defined elasticity metrics (cf. Section II-C), each graph contains the fixed scale-out trigger, the average VM allocation time and the average scale-out end timestamp per utilization as vertical lines. Consequently, the provisioning duration is visualized by the span from the scale-out trigger to the VM ready state, the data distribution impact is visualized by the throughput development from the VM ready to the scale-out complete state and, likewise, the data distribution duration by the respective runtime. The overall runtimes vary, as we apply the number of operations as the limiting factor.

Fig. 5: Cassandra — elastic scale-out, read-heavy workload (average throughput in ops/s over the runtime in s for the low, optimal and overload utilization, with the scale-out trigger, VM ready and scale-out complete events marked as vertical lines)

Fig. 6: Couchbase — elastic scale-out, read-heavy workload (same structure as Fig. 5)

1) Read-Heavy Results: The results of the read-heavy workload are depicted in Figures 5 and 6. For Apache Cassandra, they confirm the hypothesis; there is not even a significant data distribution impact during the scale-out phase. For Couchbase, the results also indicate a confirmation of the hypothesis, yet with a significant data distribution impact. In comparison, Couchbase achieves a drastically higher throughput for the optimal and overload utilization (and consequently shorter runtimes) than Apache Cassandra. This is probably due to Couchbase's weaker consistency mechanisms. In conclusion, the read-heavy results confirm the initial hypothesis for both DBMS.

Fig. 7: Cassandra — elastic scale-out, write-heavy workload (same structure as Fig. 5)

Fig. 8: Couchbase — elastic scale-out, write-heavy workload (same structure as Fig. 5)

2) Write-Heavy Results: Figure 7 and Figure 8 depict the results for the write-heavy workload. Both graphs show a similar data distribution impact over the evaluation runtime for the applied calibration tuples: the scale-out for low utilization results in a minor data distribution impact, but after the scale-out completion, no significant throughput increase is achieved. For the optimal and overload states, the data distribution impact is worse; the throughput constantly decreases and the scale-out only completes after the workload has finished. These results show that an elastic scale-out imposes too much additional load due to the data redistribution, the constantly increasing amount of data that needs to be redistributed and the additional replicas that need to be generated. Moreover, depending on the DBMS-specific sharding mechanism, the data redistribution might be invoked constantly due to incoming write requests [4]. Consequently, the write-heavy results disprove the initial hypothesis for Cassandra and Couchbase.
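The elasticity metrics read off these graphs can also be computed directly from the recorded event timestamps and throughput samples. The following sketch is illustrative only and not Kaa's implementation; the class and field names and the sample format are assumptions made here.

from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ScaleOutEvents:
    scale_out_trigger: float   # seconds since evaluation start
    vm_ready: float
    scale_out_complete: float

def elasticity_metrics(events: ScaleOutEvents,
                       samples: List[Tuple[float, float]]) -> dict:
    """Derive provisioning duration, data distribution duration and a simple
    data distribution impact ratio from one scenario run. `samples` holds
    (timestamp, avg_throughput) pairs; the events mark the vertical lines
    shown in Figs. 5-8."""
    provisioning_duration = events.vm_ready - events.scale_out_trigger
    data_distribution_duration = events.scale_out_complete - events.vm_ready

    def mean_tp(start: float, end: float) -> float:
        values = [tp for ts, tp in samples if start <= ts < end]
        return sum(values) / len(values) if values else 0.0

    # impact: throughput while data is redistributed vs. throughput before the trigger
    baseline = mean_tp(0.0, events.scale_out_trigger)
    during = mean_tp(events.vm_ready, events.scale_out_complete)
    return {
        "provisioning_duration_s": provisioning_duration,
        "data_distribution_duration_s": data_distribution_duration,
        "data_distribution_impact": during / baseline if baseline else float("nan"),
    }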
E. Lessons Learned

While the applied case study only covers a small scope of Kaa's supported elasticity scenarios, it reveals a number of relevant elasticity insights (EI): (EI-1) Elastic scale-out adaptations under read-heavy workloads behave as expected, even under overload situations, i.e. reactive and proactive DSRs can be applied. (EI-2) Elastic scale-out adaptations under write-heavy workloads impose significant additional load on the DBMS, especially for optimal and overload utilization, resulting in a negative outcome. Therefore, DSRs for write-heavy workloads shall be proactive. (EI-3) An elastic scale-out for low utilized DBMS results in a low data distribution impact but, in parallel, in a low throughput increase. Hence, low utilized DBMS are suitable for proactive elastic scale-outs if increasing workloads are expected. (EI-4) Defining DSRs requires the consideration of essential impact factors, such as the workload type and the DBMS utilization, and of potential impact factors like DBMS configurations and resource provisioning time.

VI. DISCUSSION

Here, we validate Kaa against the presented challenges on a qualitative basis before we outline its potential adoptions.

A. Validation

Elasticity evaluation scenario specification (R1): Extending Mowgli's ES with adaptations provides comprehensive elasticity scenarios, comprising cloud resources, DBMS runtime properties and composed adaptation scenarios, including DBMS and workload adaptations. Ongoing work addresses advanced adaptation specifications, including more complex DSRs [1], [25] and workload specifications [20].

Adaptation controller (R2): Kaa realises DBMS and workload adaptations at runtime by a two-level orchestration: the evaluation orchestrator executes workload adaptations; a cloud orchestrator manages cloud resources and DBMS adaptations.

Evaluation data (R3): Regarding the granularity of the provided performance metrics, Kaa is dependent on the workload generator. Yet, due to its modular architecture, suitable workloads can easily be integrated via the W-API. Currently, Kaa supports YCSB and two OLTP workloads. The produced data sets contain all relevant cloud resources, DBMS runtime settings, system metrics, DBMS metrics and the duration of each elastic scale-out phase in machine-interpretable formats as well as visualized plots to support explorative analysis.

Evaluation automation support (R4): By building upon Mowgli, which has been validated for enabling automated and reproducible performance and scalability results [14], Kaa extends these capabilities to elasticity evaluations. This is also verified by the applied case study that comprises the automated execution of 80 calibration and 80 elasticity scenarios. In addition, the portability of the elasticity scenario specification has been validated for two DBMS and can easily be achieved for different cloud providers as shown in [14].

B. Looking Ahead

The obtained elasticity insights lead to several remaining challenges and potential adopters: (i) long running elasticity ESs are required, including multiple elastic scale-out/-in adaptations, to analyse more realistic DBMS operation scenarios; (ii) DBMS elasticity needs to be analysed against more complex workloads of the OLTP and HTAP domain; (iii) an in-depth analysis of cloud providers and DBMS runtime configurations is required to derive comprehensive elasticity impact factors; (iv) the evaluation results of Kaa can be fed into scaling rules of DBMS auto-scalers such as Tiramola [6] and MeT [7], or can build the foundation for DBMS adaptations by cloud auto-scalers [1].
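As a pointer for adopters of item (iv), the insights EI-1 to EI-3 can be condensed into a toy database scaling rule. The sketch below is purely illustrative; it is not part of Kaa or of the cited auto-scalers, and the class, field names and thresholds are assumptions made here.

from dataclasses import dataclass

@dataclass
class DBMSState:
    utilization: str         # "low" | "optimal" | "overload", e.g. from calibration
    workload_type: str       # "read-heavy" | "write-heavy"
    predicted_growth: float  # forecast relative workload increase, e.g. 0.3 = +30%

def scale_out_decision(state: DBMSState) -> bool:
    """Toy DSR reflecting EI-1 to EI-3: read-heavy workloads tolerate reactive
    scale-outs even under overload, while write-heavy workloads should be scaled
    proactively while the cluster is still low or optimally utilized."""
    if state.workload_type == "read-heavy":
        # reactive rule: scale out once the cluster is overloaded
        return state.utilization == "overload"
    # proactive rule for write-heavy workloads: act before the overload state
    return state.utilization in ("low", "optimal") and state.predicted_growth > 0.2

print(scale_out_decision(DBMSState("optimal", "write-heavy", 0.5)))  # True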
VII. RELATED WORK

Research on elasticity for cloud computing can be divided into describing elasticity through metrics and approaches to measure service elasticity. Auto-scaling is also a popular research topic in cloud computing [1], but it is not considered here, as auto-scaling applies elasticity but does not evaluate it.

Several metrics for elasticity have been proposed in the literature: early work focuses on mere (de)provisioning time [26], [27]. However, these metrics cover only technical properties and thus are independent of the timeliness and accuracy of demand changes. The elastic speedup metric proposed by the SPEC OSG [27] does not capture the dynamic aspects of elasticity and is regarded as a scalability metric. Other approaches like [12], [13], [28], [29] characterise elasticity indirectly by analysing latency and throughput.

To counter the drawbacks of the aforementioned metrics, elasticity metrics explicitly covering timeliness, accuracy and stability were proposed and used in different use cases [3], [25], [30]. Yet, those metrics are mostly tied to the elasticity of stateless cloud applications and are thus not suitable for DBMS, for which elasticity is defined differently. The industry-driven SPEC Cloud IaaS 2018⁵ partially covers DBMS workloads, but still omits elasticity in the sense of automated scaling under changing load. Approaches for evaluating elasticity in a stateless context include BUNGEE, a framework for benchmarking the elasticity of cloud platforms [30].

The need for DBMS-centric elasticity evaluations is highlighted by [4], [9] due to the growing heterogeneity of cloud resources and distributed DBMS. While there is a significant number of DBMS workloads that measure the performance metrics at the client side, none of them supports the required adaptations at the operator side [18]. Consequently, supportive frameworks are required to automate the evaluations and enable reproducible results [17], [18], [21]. However, DBMS frameworks that also support the operator side are limited [14], [31] and none of them supports the specification and execution of elasticity evaluations. Consequently, the number and scope of existing DBMS elasticity evaluations in the cloud is limited [8], [13], [24], [32], as supportive frameworks are missing, which also impedes their reproducibility. Our novel framework Kaa addresses these limitations by ensuring reproducible and portable DBMS elasticity evaluations in the cloud, which also makes it possible to keep track of evolving cloud resources and DBMS.

⁵ https://www.spec.org/cloud_iaas2018/

VIII. CONCLUSION

Elasticity has become a key feature of cloud applications and its comparative evaluation is a central topic for cloud resources, stateless services and stateful components such as distributed DBMS. While the elasticity evaluation of the former is supported by specific evaluation frameworks, the elasticity evaluation of distributed DBMS remains a challenge: supportive metrics exist, but evaluation frameworks are missing. This hinders reproducible and portable evaluations, which are key concepts of cloud service benchmarking. Therefore, we present the novel elasticity evaluation framework Kaa that automates the execution of DBMS elasticity evaluations and ensures reproducibility. We validate our approach by applying an elasticity evaluation case study for two DBMS under eight workload intensities and one elastic scale-out adaptation. Kaa fully automates the benchmarking of the resulting 160 evaluation scenarios, and the results provide valuable insights into the elasticity and its potential impact factors, such as the workload type and the DBMS utilization.

Future work will comprise extensive elasticity evaluations, focusing on more complex workloads and adaptations. In addition, extensions of the metric processing capabilities will enable the analysis of established elasticity metrics under workload, cloud resource and DBMS-specific aspects.

ACKNOWLEDGEMENTS

The research leading to these results has received funding from the EC's Framework Programme HORIZON 2020 under grant agreements 731664 (MELODIC) and 732667 (RECAP).

REFERENCES

[1] C. Qu, R. N. Calheiros, and R. Buyya, "Auto-scaling web applications in clouds: A taxonomy and survey," ACM Comput. Surv., vol. 51, no. 4, pp. 73:1–73:33, Jul. 2018.
[2] Y. Al-Dhuraibi, F. Paraiso, N. Djarallah, and P. Merle, "Elasticity in cloud computing: State of the art and research challenges," IEEE Transactions on Services Computing, vol. 11, no. 2, pp. 430–447, 2017.
[3] N. Herbst, A. Bauer, S. Kounev, G. Oikonomou, E. v. Eyk, G. Kousiouris, A. Evangelinou, R. Krebs, T. Brecht, C. L. Abad et al., "Quantifying cloud performance and dependability: Taxonomy, metric design, and emerging challenges," ACM ToMPECS, vol. 3, no. 4, p. 19, 2018.
[4] A. Davoudian, L. Chen, and M. Liu, "A survey on NoSQL stores," ACM Computing Surveys (CSUR), vol. 51, no. 2, p. 40, 2018.
[5] S. Kächele, C. Spann, F. J. Hauck, and J. Domaschka, "Beyond IaaS and PaaS: An extended cloud taxonomy for computation, storage and networking," in IEEE/ACM 6th UCC. IEEE, 2013, pp. 75–82.
[6] D. Tsoumakos, I. Konstantinou, C. Boumpouka, S. Sioutas, and N. Koziris, "Automated, elastic resource provisioning for NoSQL clusters using Tiramola," in 2013 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing. IEEE, 2013, pp. 34–41.
[7] F. Cruz, F. Maia, M. Matos, R. Oliveira, J. Paulo, J. Pereira, and R. Vilaça, "MeT: Workload aware elasticity for NoSQL," in 8th ACM EuroSys. ACM, 2013, pp. 183–196.
[8] D. Seybold, N. Wagner, B. Erb, and J. Domaschka, "Is elasticity of scalable databases a myth?" in IEEE Big Data, 2016.
[9] S. Sakr, "Cloud-hosted databases: Technologies, challenges and opportunities," Cluster Computing, vol. 17, no. 2, pp. 487–502, 2014.
[10] D. Bermbach, E. Wittern, and S. Tai, Cloud Service Benchmarking. Springer, 2017.
[11] A. V. Papadopoulos, L. Versluis, A. Bauer, N. Herbst, J. von Kistowski, A. Ali-Eldin, C. Abad, J. N. Amaral, P. Tůma, and A. Iosup, "Methodological principles for reproducible performance evaluation in cloud computing," IEEE Transactions on Software Engineering, pp. 1–1, 2019.
[12] B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears, "Benchmarking cloud serving systems with YCSB," in ACM SoCC, 2010.
[13] T. Dory, B. Mejías, P. Van Roy, and N.-L. Tran, "Measuring elasticity for cloud databases," in Proceedings of the Second International Conference on Cloud Computing, GRIDs, and Virtualization, 2011.
[14] D. Seybold, M. Keppler, D. Gründler, and J. Domaschka, "Mowgli: Finding your way in the DBMS jungle," in Proceedings of the 2019 ACM/SPEC ICPE. ACM, 2019, pp. 321–332.
[15] S. Mazumdar, D. Seybold, K. Kritikos, and Y. Verginadis, "A survey on data storage and placement methodologies for cloud-big data ecosystem," Journal of Big Data, vol. 6, no. 1, p. 15, 2019.
[16] D. Agrawal, A. El Abbadi, S. Das, and A. J. Elmore, "Database scalability, elasticity, and autonomy in the cloud," in DASFAA, 2011.
[17] J. Gray, Benchmark Handbook: For Database and Transaction Processing Systems, 1992.
[18] D. Seybold and J. Domaschka, "Is distributed database evaluation cloud-ready?" in ADBIS. Springer, 2017, pp. 100–108.
[19] D. Seybold, "Towards a framework for orchestrated distributed database evaluation in the cloud," in 18th Doctoral Symposium of the 18th International Middleware Conference. ACM, 2017, pp. 13–14.
[20] P. Bodik, A. Fox, M. J. Franklin, M. I. Jordan, and D. A. Patterson, "Characterizing, modeling, and generating workload spikes for stateful services," in Proceedings of the 1st ACM Symposium on Cloud Computing. ACM, 2010, pp. 241–252.
[21] L. Brent and A. Fekete, "A versatile framework for painless benchmarking of database management systems," in Australasian Database Conference. Springer, 2019, pp. 45–56.
[22] D. Baur, D. Seybold, F. Griesinger, H. Masata, and J. Domaschka, "A provider-agnostic approach to multi-cloud orchestration using a constraint language," in CCGRID, May 2018, pp. 173–182.
[23] D. Seybold and J. Domaschka, "Mowgli: DBMS elasticity evaluation data sets," Aug. 2019. [Online]. Available: https://doi.org/10.5281/zenodo.3362279
[24] J. Kuhlenkamp, M. Klems, and O. Röss, "Benchmarking scalability and elasticity of distributed database systems," Proceedings of the VLDB Endowment, vol. 7, no. 12, pp. 1219–1230, 2014.
[25] A. Bauer, N. Herbst, S. Spinner, A. Ali-Eldin, and S. Kounev, "Chameleon: A hybrid, proactive auto-scaling mechanism on a level-playing field," IEEE Transactions on Parallel and Distributed Systems, vol. 30, no. 4, pp. 800–813, 2018.
[26] Z. Li, L. O'Brien, H. Zhang, and R. Cai, "On a catalogue of metrics for evaluating commercial cloud services," in ACM/IEEE CCGrid 2012, Sept. 2012, pp. 164–173.
[27] D. Chandler et al., "Report on Cloud Computing to the OSG Steering Committee," Tech. Rep., Apr. 2012.
[28] C. Binnig, D. Kossmann, T. Kraska, and S. Loesing, "How is the weather tomorrow? Towards a benchmark for the cloud," in DBTest '09. New York, NY, USA: ACM, 2009, pp. 9:1–9:6.
[29] R. F. Almeida, F. R. Sousa, S. Lifschitz, and J. C. Machado, "On defining metrics for elasticity of cloud databases," in SBBD, 2013.
[30] N. R. Herbst, S. Kounev, A. Weber, and H. Groenda, "BUNGEE: An elasticity benchmark for self-adaptive IaaS cloud environments," in Proceedings of the 10th International Symposium on Software Engineering for Adaptive and Self-Managing Systems. IEEE Press, 2015, pp. 46–56.
[31] M. Klems, D. Bermbach, and R. Weinert, "A runtime quality measurement framework for cloud database service systems," in Eighth International Conference on the Quality of Information and Communications Technology (QUATIC). IEEE, 2012, pp. 38–46.
[32] I. Konstantinou, E. Angelou, C. Boumpouka, D. Tsoumakos, and N. Koziris, "On the elasticity of NoSQL databases over cloud management platforms," in Proceedings of the 20th ACM International Conference on Information and Knowledge Management. ACM, 2011, pp. 2385–2388.

Chapter 14 [core7] King Louie: Reproducible Availability Benchmarking of Cloud-hosted DBMS

This article is published as follows:

Daniel Seybold, Stefan Wesner, and Jörg Domaschka. ”King Louie: Reproducible Availability Benchmarking of Cloud-hosted DBMS”, 35th ACM/SIGAPP Symposium on Applied Computing (SAC), published 2020, ACM, DOI: https://doi.org/10.1145/3341105.3373968

Reprinted with permission from ACM.

King Louie: Reproducible Availability Benchmarking of Cloud-hosted DBMS

Daniel Seybold, Stefan Wesner, Jörg Domaschka
Institute of Information Resource Management, Ulm University, Ulm
[email protected], [email protected], [email protected]

ABSTRACT

Cloud resources have become a preferred operational model for distributed Database Management Systems (DBMS) by offering elasticity and virtually unlimited scalability, but they increase the risk of failures with increasing cluster sizes. While distributed DBMS provide high-availability mechanisms, it is currently an open research question to what extent they are able to provide availability and performance guarantees in case of cloud resource failures, especially as existing DBMS benchmarks do not consider availability. We present a comprehensive methodology for evaluating the availability of distributed DBMS in case of cloud resource failures. Based on this methodology, we introduce a novel framework that automates the full evaluation process, including the failure injection, and emphasizes reproducibility. The framework is validated by 16 diverse availability evaluations. The results show that distributed DBMS are not necessarily available even if sufficient replicas are available, and clients can experience significant downtimes.

CCS CONCEPTS

• Information systems → Database performance evaluation; Database utilities and tools; • Computer systems organization → Cloud computing;

KEYWORDS

availability, NoSQL, cloud, distributed DBMS, benchmark

ACM Reference Format:
Daniel Seybold, Stefan Wesner, and Jörg Domaschka. 2020. King Louie: Reproducible Availability Benchmarking of Cloud-hosted DBMS. In The 35th ACM/SIGAPP Symposium on Applied Computing (SAC '20), March 30-April 3, 2020, Brno, Czech Republic. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3341105.3373968

1 INTRODUCTION

The rise of cloud computing a decade ago has led to a change in the way storage backends are treated. Instead of large, single-instance servers, the NoSQL movement has established a large amount of different Database Management Systems (DBMS) that share the concept of a distributed architecture [11]. Here, multiple independent DBMS instances form a cluster and provide the user with the impression of a single logical DBMS.

A distributed architecture enables scaling the DBMS horizontally as needed by adding more resources [26]. This increases the storage capacity of a DBMS as well as its processing capabilities. Often, scaling can even be realised as an elastic scale-out during run-time without users of the DBMS experiencing any downtime [20].

Yet, the risk of failures increases with the size of the system, as failures are more likely for distributed DBMS than for single-instance DBMS. On the other hand, the distributed nature enables the introduction of replication by storing individual data items more than once. This facilitates a continued operation of the logical DBMS without the user noticing. However, the use of replication introduces overhead, as the various replicas of a data item need to be kept in sync; and due to the CAP theorem [9], availability of the DBMS and consistency of replicas are not independent, but influence each other. The higher the need for availability, the fewer consistency guarantees can be given in the case of failures. Vice versa, the higher the need for consistency, the less available the system will be in the case of failures. Many existing DBMS claim high-availability. Yet, it is widely unclear where in the continuum of possible consistency-availability trade-offs they reside and what the impact on the performance will be.

An established approach to exploit the scalability and elasticity properties of distributed DBMS is to host them across IaaS cloud systems, as they provide the necessary flexibility when more (virtual) hardware is required [26]. On the downside, cloud resources are volatile and also prone to failure on various levels [1, 16].

When designing a large-scale application, the choice of a storage backend is a crucial decision. Yet, no DBMS allows the direct configuration of availability properties; instead, a user often can only select indirect parameters such as replication degree and consistency demands, whose impact on availability is only indirectly visible. Hence, it is difficult to assess the impact of a configuration on availability, but also to assess the impact of a failure on the overall DBMS performance. Moreover, future DBMS will experience different demands than existing ones, as future IoT environments will expose constant write-heavy workloads that are contrary to the current read-heavy workloads.

In consequence, the choice of a DBMS requires extensive evaluations, which are very time consuming when done manually due to the large number of configuration properties and different failures [7]. Furthermore, these benchmarks need to be repeated for new DBMS appearing on the market, but also when new versions of existing DBMS are released. Hence, a large degree of automation is required for reproducible benchmarking [23].

To mitigate this situation, this paper provides the following contributions: (i) we define a reproducible and portable methodology for evaluating the availability of DBMS in the context of resource failures; (ii) we introduce a novel evaluation framework that implements the defined methodology; (iii) we present a Web- and IoT-based case study that validates the framework and provides novel insights into the availability of distributed DBMS.

Our considerations and our methodology focus exclusively on the client perspective and hence capture metrics such as performance (latency and throughput) and the success rate of queries.

The remainder of this document is structured as follows: Section 2 presents background on distributed DBMS, availability, and cloud hosting. Section 3 discusses DBMS system models and failure models. Section 4 introduces the novel availability evaluation methodology. Section 5 presents the King Louie framework. Section 6 defines the case study while Section 7 discusses the respective results. Section 8 validates King Louie. Section 9 presents related work before Section 10 concludes.

2 BACKGROUND

This section summarises background on availability and distributed DBMS, focusing on their high-availability mechanisms.

2.1 Availability

For computing systems, availability is defined as the degree to which a system is operational and accessible when required for use [14]. Besides reliability (the "measure of the continuity of correct service" [2]), availability is the main pillar of many fault-tolerant implementations [12]. In this paper, we focus on the availability aspect from the client perspective and assume that DBMS are reliable, i.e. work as specified. The availability of the DBMS can be affected by two kinds of conditions: (i) a high number of requests issued concurrently by clients, overloading the DBMS so that the requests of clients cannot be handled at all or are handled with an unacceptable latency > Δt; (ii) resource failures that impact network connectivity or the compute/storage resources the DBMS is hosted on [12]. In this work, we solely consider (ii).
2.2 Distributed DBMS

Over the last decade, the DBMS landscape has shown an increasing heterogeneity with respect to data models (relational, NoSQL and NewSQL), but also with respect to distributed architectures [11, 22]. Distributed DBMS provide a logical DBMS instance to clients, but distribute the DBMS functionality across multiple physical or virtual resource entities. These resources hosting the DBMS nodes are connected via a network and form a DBMS cluster.

Distributed DBMS provide horizontal scalability by exploiting sharding as a basic technique: the full data set is separated such that each data-hosting DBMS node manages only a local share of the overall data set. With more nodes joining the cluster, the overall capacity increases and the system scales horizontally.

As growing cluster sizes increase the probability of failures, distributed DBMS provide diverse means for replication [12]. A key characteristic is the replication scope, which is classified into full and partial replication [12]. Full replication defines that each DBMS node stores a full copy of the entire database. In partial replication, each data item may only be stored on a subset of all DBMS nodes, i.e. the replication factor is smaller than the DBMS cluster size. Partial replication closely relates to sharding: if sharding is used without replication, there is no tolerance against DBMS node failures. On the other hand, using replication without sharding means that all data is available on all DBMS nodes.

Due to the constraints given by the CAP theorem [9], namely that a distributed storage system can only provide two out of the three properties consistency (C), availability (A) and partition tolerance (P), distributed DBMS provide heterogeneous consistency guarantees [33]. Their evolvements show that these constraints are not a binary decision [8].

3 AVAILABILITY TRADE-OFFS

The heterogeneity of distributed DBMS offerings, their extensive amount of runtime configurations and the number of cloud resource offerings create a vast design space for operating distributed DBMS. This design space includes use-case-specific decisions such as the functional feature-driven DBMS selection [11] and the comparative evaluation of non-functional features such as performance and scalability [17, 21]. In this process, the non-functional feature analysis needs to consider availability mechanisms as well to ensure a seamless operation in case of cloud resource failures [1, 16]. This work considers distributed DBMS from a cloud architect perspective with the primary focus on the client perspective and a secondary focus on the DBMS operator perspective. As the client's main interest lies in getting the promised DBMS performance even in case of failures, we consider the performance impact in case of failures as the key aspect in the following. Therefore, we do not discuss related approaches such as analysing the availability on the DBMS level [24] or evaluating the consistency impact on client and DBMS level [5]. In the following, we discuss availability and performance related design decisions for selecting and operating cloud-hosted distributed DBMS, discuss potential cloud resource failure scenarios and classify both domains according to their impact on DBMS availability.

3.1 Operating Distributed DBMS

For operating distributed DBMS, we separate the design decisions into the DBMS level and the client level and discuss the resulting availability and performance impact on the client level.

On the DBMS level, a variety of fault-tolerance mechanisms are adopted in distributed DBMS [12]. Yet, the exposed runtime configurations of these fault-tolerance mechanisms are always DBMS specific and require individual consideration and evaluation. A high-level set of common runtime configurations are the cluster size and the replication factor. With an increasing cluster size (CS), data is sharded across the DBMS nodes and an increase of the overall DBMS performance can be expected if the load is evenly distributed across all nodes in the cluster. In parallel, the probability of failures increases as DBMS nodes might be distributed across multiple potentially failing resources. Therefore, replication is preferably enabled to increase availability by defining a suitable replication factor (RF). The scale of the RF is limited by the CS or a DBMS specific replication factor limit (RFL), following the constraint 1 ≤ RF ≤ min(CS, RFL).
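The constraint and its availability implication can be made concrete with a small sketch. It is not taken from the paper; the function names and the simplified quorum-style tolerance rule are assumptions made here for illustration.

def validate_replication(cs: int, rf: int, rfl: int) -> None:
    """Check the constraint 1 <= RF <= min(CS, RFL) from Section 3.1."""
    if not 1 <= rf <= min(cs, rfl):
        raise ValueError(f"invalid replication factor {rf} for CS={cs}, RFL={rfl}")

def tolerable_replica_failures(rf: int, required_acks: int) -> int:
    """Simplified view: a data item stays accessible as long as `required_acks`
    of its RF replicas respond, i.e. up to RF - required_acks replica failures
    can be tolerated for that item."""
    if not 1 <= required_acks <= rf:
        raise ValueError("required_acks must be between 1 and RF")
    return rf - required_acks

validate_replication(cs=3, rf=3, rfl=3)
# With RF=3 and a consistency level that needs one acknowledgement (e.g. ONE),
# an item conceptually survives the loss of two of its replicas.
print(tolerable_replica_failures(rf=3, required_acks=1))  # 2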

Figure 1: Failure Recovery Steps (temporary failure recovery: restart resource, restart DBMS, rejoin cluster, synchronize data, cleanup tasks, operational; permanent failure recovery: allocate resource, install software, configure DBMS, start & join cluster, distribute data, cleanup tasks, operational; the steps are grouped into cloud provider specific, DBMS specific and DBMS internal steps, marked as required or optional)

Figure 2: Availability Impact Factors (DBMS domain: architecture type, replication type, consistency model, cluster size, replication factor, read/write consistency; cloud domain: provider, failure type — physical, VM, availability zone, region — failure severity — temporary, permanent — and failure rate; impact factors are classified as configurable or fixed)
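The recovery sequences of Figure 1 can be captured as plain data. The following sketch encodes them as ordered step lists; it is an illustrative encoding, not King Louie's internal data model.

# Ordered recovery steps as read off Figure 1; the per-step comments indicate
# the level at which the step is executed.
RECOVERY_STEPS = {
    "temporary": [
        "restart resource",      # cloud provider specific
        "restart DBMS",          # DBMS specific
        "rejoin cluster",        # DBMS specific
        "synchronize data",      # DBMS internal
        "cleanup tasks",         # DBMS internal, optional
    ],
    "permanent": [
        "allocate resource",     # cloud provider specific
        "install software",      # DBMS specific
        "configure DBMS",        # DBMS specific
        "start & join cluster",  # DBMS specific
        "distribute data",       # DBMS internal
        "cleanup tasks",         # DBMS internal, optional
    ],
}

def recovery_plan(severity: str) -> list:
    """Return the ordered recovery steps for a 'temporary' or 'permanent' failure."""
    return list(RECOVERY_STEPS[severity])

print(recovery_plan("permanent"))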

The replication factor defines the existing number of identical data Figure 2: Availability Impact Factors items within a DBMS cluster. Thus, it increases data availability as in case of a DBMS node failure the remaining replicas still contain 3.3 Impact Factor Classification copies of the data. With respect to data accessibility, replicas can Operating distributed DBMS in the cloud and ensuring high-availability take over read and write requests depending on the applied client already becomes a challenging task due to the vast amount of dis- consistency as described below. Yet, increasing the RF comes with tributed DBMS offers in combination with their heterogeneous run- a cost in performance as the data replication and synchronization time configuration [22]. The diverse set of potential cloud resource mechanisms add additional load on the DBMS cluster. failures even increases this challenge. In Figure 2 we summarise availability and performance impact factors based on the DBMS 3.2 Cloud-hosted DBMS domain and cloud domain. We classify the impact factors into con- figurable and fixed impact factors. Fixed impact factors are given by As cloud resources tend to fail [1, 16], their usage impacts DBMS the selected DBMS such as architecture type or consistency model, availability as failures on different levels of the cloud resource or resource failure probabilities. Configurable impact factors offer stack can have impact on individual DBMS nodes (running on a a certain range of runtime configurations such as the cluster size VM), but also on larger sets of them. For instance, the failure of a for the DBMS or the provider for the cloud domain. hypervisor will lead to the failure of all VMs on that hypervisor [28]. Consequently, the selection and operation process of cloud- Consequently, the failure type can be caused on the physical level as hosted DBMS requires the consideration of a multitude of impact servers, network links, network device, power supplies, or cooling factors to ensure high performance, but also availability for the may fail. As a result, all DBMS nodes hosted on these resources, client level. Therefore, the preferred approach for underpinning either directly or virtualised, become unavailable. Similarly, on this process is the comparative evaluation of suitable DBMS with the software level, management software and hypervisors can fail; respect to their non-functional features. But as the heterogeneous algorithms, network, and devices can be buggy, misconfigured, or DBMS landscape raises new challenges in comparative DBMS eval- in the process of being restarted. Any of these failures can affect uations [7], the volatile cloud context even increases these chal- virtual machines and virtual networks, but also physical servers, lenges [6] and raises the need for reproducible and portable evalua- physical networks, entire racks, complete availability zones, or even tions [23]. Moreover, as current benchmarks only focus on perfor- entire data centres. The failure of a full availability zone or even mance related evaluation objectives, availability in the context of an entire region leads to the unavailability of multiple physical failures is not yet in the scope of DBMS benchmarks [27]. In order to servers and hence unavailability of all hosted virtual machines and address these limitations, we define a comprehensive methodology consequently the DBMS nodes running on these resources. 
These which enables comparative availability evaluation of cloud-hosted failures can be classified based on their severity permanent and DBMS by explicitly considering the presented impact factors. temporary and occurrence expressed in the failure rate (FR). In parallel, the elasticity of cloud resources enables the imple- 4 EVALUATION METHODOLOGY mentation of manual or (semi-) automated recovery mechanism In order to introduce the concepts and challenges of our availability to overcome such failures. In order to recover from such failures, evaluation methodology, we build upon the following baseline we define a set generic recovery steps for cloud-hosted DBMSas hypothesis that describes the expected client level performance and depicted in Figure 1, one for temporary failures and one for perma- availability in case of a failure on DBMS level. nent resource failures. While the cloud provider and DBMS specific Hypothesis (H1): If a DBMS node fails due to a cloud resource steps can be executed manually or automated by relying on cloud failure, the performance is temporarily decreasing as client requests to orchestration tools (COTs) [4], the DBMS internal steps are exe- the failed node are stalled until the DBMS recognize the DBMS node cuted by the DBMS itself. For temporary failures, depending on the failure and trigger the replicas for taking over these requests; with the failure type, only a subset of the depicted steps might be required, replicas taking over the performance is increasing, but does not achieve e.g. in case of a temporary network outage, the unavailable DBMS the initial performance due to the reduced CS. In case the permanent nodes will synchronize their data with the remaining nodes as soon recovery steps are executed, performance will decrease during the data as their are online again. synchronization/redistribution and increase afterwards to the initial

Figure 3: Expected DBMS performance in case of failure (average throughput over the evaluation runtime across the healthy, unhealthy, recovering and recovered phases)

Figure 4: Availability Evaluation Process (evaluation tasks ET_1 define scenario, ET_2 allocate cloud resources, ET_3 deploy DBMS, ET_4 execute workload with ET_5.1 inject failure and ET_5.2 recovery running in parallel, ET_6 release cloud resources, ET_7 process metrics, ET_8 analyse results)
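The four phases of Figure 3 can be attached to measurement timestamps once the failure injection and recovery events are known. The sketch below is illustrative only; the event names and dictionary format are assumptions made here, not King Louie's result schema.

from bisect import bisect_right

def label_phases(events: dict, timestamps: list) -> list:
    """Label each measurement timestamp with the phase it falls into, following
    the four phases of Figure 3 (healthy, unhealthy, recovering, recovered).
    `events` holds the transition times 'failure_injection', 'recovery_start'
    and 'recovery_finished' in seconds since evaluation start."""
    boundaries = [events["failure_injection"],
                  events["recovery_start"],
                  events["recovery_finished"]]
    names = ["healthy", "unhealthy", "recovering", "recovered"]
    return [names[bisect_right(boundaries, ts)] for ts in timestamps]

# Example: a failure injected at 500 s, recovery from 680 s to 1100 s.
print(label_phases(
    {"failure_injection": 500, "recovery_start": 680, "recovery_finished": 1100},
    [100, 550, 800, 1500],
))  # ['healthy', 'unhealthy', 'recovering', 'recovered']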

workload execution (ET_4) and the failure injection in combination performance. Throughout these phases, the DBMS connection remains with the recovery are executed in parallel and can be executed available and all DBMS request are executed successfully. multiple times if specified in ET_1. After ET_4 is finished, the cloud Based on H1, the performance undergoes four phases depicted resources need to be released (ET_6). The metric processing (ET_7) in the corresponding Figure 3: the healthy phase represents the op- completes the execution phase of the evaluation process. In the erational cluster with its initial CS; in the unhealthy phase n DBMS final step ET_8, the results are analysed and additional or refined nodes are unavailable due to resource failures;in the recovery phase evaluation scenarios are defined for further executions. n recovery steps are executed to resolve the failures before the cluster reaches the recovered phase with finished data synchroniza- tion/redistribution and its initial CS of operational nodes. 4.3 Evaluation Process Execution The execution of this evaluation process involves detailed knowl- 4.1 Metrics edge of the cloud provider, DBMS and workload domain. In addition, As comparative availability evaluations require a set of quantifiable the parallel ET_4 and ET_5 require an orchestrated execution. These metrics, we build upon the established DBMS performance metrics constraints lead to a time consuming, error prone and technically throughput and latency [15]. These metrics are correlated with challenging execution if done manually. This limits reproducibility the respective phases in order to enable significant availability and portability across the domains, e.g. executing the evaluation evaluations. In the following we only build upon the throughput scenario for different cloud providers or DBMS. Consequently, a metric, but the resulting metrics can similarly applied for latency. supportive framework is required to execute the evaluation process. In order to correlate the throughput with the specific phase in case By building upon established benchmarking guidelines [6, 7, 23] we of a failure, we define the phase throughput (PT) as follows: define the following requirements towards a supportive availability successful requests evaluation framework to ensure significance and reproducibility: Phase Throughput (PT): phase duration As availability metrics, we consider downtime and the request Evaluation Design (R1): Reusable and comprehensive evalu- success rate (RSR) as relevant metrics. The downtime metrics build ation scenario specification need to be provided. upon [28] and represent the minimal tolerable throughput in cor- Abstraction (R2): A suitable abstraction level is required for relation to the respective phase. The phase downtime (PD) and the the technical domain specific properties. 
according phase downtime duration (PDD) are defined as follows: Failure Injection (R3): has to be supported for temporary Phase Downtime (PD) : PT < threshold and permanent failures of different failure types Íphase end < Automation (R4): is required for all ETs of the execution PD Duration (PDD): phase start PT threshold The RSR builds upon the expected service availability [34] and phase to ensure reproducibility and transparency, includ- spans over all all four phases: ing the orchestration of parallel but timely dependent ETs Extensibility (R5): As each evaluation domain is constantly Request Success Rate (RSR): successfull requests total scheduled requests evolving, extensible specifications and architectures are re- quired to support the integration of future domain specific 4.2 Evaluation Process constraints and failure models To define a comprehensive availability evaluation methodology Transparent Data (R6): the resulting data needs to comprise that provides the introduced metrics, we define the evaluation raw metrics and extensive meta-data such as monitoring process depicted in Figure 4. It comprises eight evaluation tasks data, task related details, DBMS and cloud resource details. (ETs) with the concept of explicit cloud failure injection (ET_5.1) in combination with recovery tasks (ET_5.2) to emulate realistic cloud 5 KING LOUIE FRAMEWORK failure scenarios [16]. ET_1 specifies of a comprehensive evaluation scenario, comprising the cloud resources, the specific DBMS with the In order to implement the defined methodology, we present the availability-related configurations, a DBMS workload and the cloud novel evaluation framework King Louie that explicitly addresses the failure to be injected with an optional recovery action. The actual identified requirements R1 - R6. King Louie builds upon the Mowgli execution of the evaluation starts with ET_2 by allocating the cloud framework [29] that addresses the automated performance and scal- resources, followed by the deployment of the DBMS (ET_3). The ability evaluation of cloud-hosted DBMS. King Louie supports ET_1

147 1 by providing JSON-based evaluation scenario templates to describe W-API comprehensive availability evaluation scenarios. A simplified ex- Domain ample is detailed in Listing 1. The dbms object exposes all relevant Catalogues details (cf. Figure 2) to deploy a DBMS cluster on arbitrary cloud resources, a comprehensive description of the deployment model is workload presented in [29]. The object specifies the desired work- availability scenario COT

load, including the DBMS specific read/write consistency configu- API

- cloud resource rations. The failureSpec defines the desired failure to be injected catalogue and optional recovery actions to be executed. In its current state, metadata objective collector runtime King Louie supports the failure types VM, AVAILABILITY_ZONE processor monitor and REGION and the recovery ADD-NODE for permanent failures Evaluation measurement collector and RESTART-DBMS for temporary failures. King Louie builds upon a loosely coupled and extensible archi- Evaluation Orchestrator data flow control flow tecture (cf. Figure 5). Evaluation scenarios are submitted via the Evaluation API and the evaluation orchestrator maps the specified Figure 5: King Louie technical architecture configurations to its technical implementations. These are pro- vided by extensible cloud, DBMS and workload domain catalogues (R5). Based on these technical implementations, the evaluation or- "failureType":"VM" , " severity ": "permanent" chestrator fully automates the evaluation process (R4), i.e. ET_2 – "failureRecovery": true, ET_7. In order to abstract cloud provider APIs and ease the DBMS "recoveryType":"ADD-NODE" deployment (R2), King Louie exploits cloud orchestration tools }] (COTs) [4], in particular the Cloudiator framework [3]. The work- } load is executed by separate workload-API (W-API) instances that are controlled by the evaluation orchestrator. The failure injection is controlled by the evaluation orchestrator by exploiting the COTs 6 CASE STUDY control over allocated cloud resources. This enables the injection of We apply a case study to validate King Louie by analysing H1, fo- different failure types such as the (temporary) stopping or deletion cusing on the impact of cloud resource failures. Therefore, eight of single VMs, but also for VMs in specific availability zones or availability evaluation scenarios are specified, comprising Apache even regions. Moreover, additional failure types can be injected on Cassandra (CA) and Couchbase (CB) in two cluster sizes (CS), a running VMs the evaluation orchestrator is able to execute custom write-heavy IoT driven workload, a read-heavy Web driven work- commands on the deployed DBMS nodes via the COT (R3). load and one failure specification. It is noteworthy that these speci- By orchestrating each ET, King Louie is able to process availabil- fications represent only a small frame of the supported evaluation ity and performance metrics and add extensive meta-information scenarios supported by King Louie. such as runtime configurations, logs and monitoring data that are provided in machine interpretable formats for advanced post pro- 6.1 DBMS Specification cessing (R6). Currently, King Louie supports EC2 and OpenStack CA and CB are selected as both are popular NoSQL DBMS3 that 2 based clouds, seven DBMS and two different workloads . promise high-availability and have already achieved promising re- sults [19, 21]. Both provide a comparable multi-master architecture Listing 1: Availability Scenario Template with a configurable replication factor. As they differ with respect {"dbms":{ to their consistency mechanisms, client-side consistency settings " type ":"CASSANDRA" , can not exactly be mapped for both DBMS. CA applies write ahead "instances":3 , "replicationFactor": 3 logging (WAL), while CB caches records directly in memory and "vmLocation": "openstack", persists them to disk asynchronously. The case study focuses on " vmType ": "small", availability and performance optimized DBMS configurations by ... 
applying a replication factor of three and weak client consistency }, settings as detailed in Table 1. For CA, a write and read consistency " workload ":{ of ONE4 is applied, while for CB the write consistency replicateTo=0 " type ": "YCSB" , (R=0) and persistTo=0 (p=0) is applied, specifying the confirmation " config ": { of a write as successful after the record is transmitted5. Concep- "writeConsistency":"ONE" , tually, this setup overcomes two DBMS node failures. For CB, the ... automatic failover mechanism is explicitly enabled with the min- } } imum configurable failover time of 30 seconds while in CAitis "failureSpec":[{ enabled by default with a default failover time of 10 seconds. 3https://db-engines.com/en/ranking 1https://omi-gitlab.e-technik.uni-ulm.de/mowgli/getting-started/tree/master/ 4https://docs.datastax.com/en/archived/cql/3.3/cql/cql_reference/cqlshConsistency. examples/availability html 2https://omi-gitlab.e-technik.uni-ulm.de/mowgli/getting-started 5https://docs.couchbase.com/java-sdk/2.2/durability.html#persistto-and-replicateto

148 Table 1: DBMS Runtime Specifications in Table 4, the failure specification comprises a permanent cloud resource failure on VM level and the respective recovery steps for Specification Apache Cassandra (CA) Couchbase (CB) permanent failures as depicted in Figure 1. Hereby, specific delays Version 3.11.2 5.0.1 community are specified to ensure a stable workload before the respective Cluster size 3,7 3,7 failure injection and recovery actions are triggered. Replication 3 3 Write consistency ONE P=0, R=0 Table 4: Failure Specification Read consistency ONE default Specification VM_FAILURE 6.2 Cloud Resource Specification Failure delay 180s write-heavy, 500s read-heavy Failure type VM The cloud resource configurations for the DBMS nodes and W-API Severity permanent instance is detailed in Table 2. All physical hosts hosting VM_DBMS Recovery delay 180s types use a two SSDs Raid-0 configuration. Each DBMS instance is Recovery type add new DBMS node configured to use 50% of the available memory for its operation.

Table 2: Cloud Resource Specifications 7 AVAILABILITY EVALUATION RESULTS Specification VM_DBMS VM_WORKLOAD The case study results in 16 availability scenarios, i.e. 2 DBMS types∗ Cloud private OpenStack, version Rocky 2cluster sizes∗2workload types∗2workload intensities ∗1 f ailure. VM vCores 2 8 Each availability scenario is executed 10 times, resulting in 160 VM RAM 4GB 8GB data sets. Each data set contains raw performance metrics, phase- VM Disk 70GB 20GB specific throughput metrics for additional processing, monitoring OS Ubuntu Server 16.04 data and auto-generated plots. Due to space constraints, we omit the presentation of the 16 threads workload intensity, but to ensure network private, 10GbE transparency, all collected evaluation data is publicly available [31]. First, for each data set, an explorative analysis based on the PT is executed by comparing the throughput graphs with H1. Based 6.3 Workload Specification on this comparison, a set of reoccurring throughput development The W-API of King Louie provides the Yahoo Cloud Serving Bench- patterns are identified for each evaluation scenario. If different mark (YCSB) [10] in version 0.15.0 King Louie. The YCSB is selected patterns are identified, the data sets are classified into the resulting as it is a widely adopted workload to evaluate distributed DBMS. pattern. For each identified pattern per availability scenario, one Moreover, it supports a variety of distributed DBMS by implement- exemplary plot is provided in while all plots are available online6. In ing the client connection in a baseline approach without adding the following, we term each identified availability scenario pattern specialised code to improve availability on client side. In order to according the scheme DBMS TYPE_CLUSTER SIZE_PATTERN ID, demonstrate the flexibility of our approach, we define two typical Second, for each availability scenario a descriptive analysis is DBMS workloads, one write-heavy workload and one read-heavy carried out, resulting in measurement tables. Each Table contains workload as described in Table 3. the introduced phase-specific metrics (cf. Section 4.1) by providing the average phase throughput (APT) including the standard devi- Table 3: Workload Specifications ation. The downtime threshold for all evaluation scenarios is set to 10 ops/s, i.e. each reported throughput value below 10 ops/s is Specification write-heavy read-heavy marked as downtime and the resulting downtime duration (PDD) Metric granularity 10 s is provided. In addition, the occurrence ratio (OR) of each pattern Initial records 0 2.000.000 over the ten executions per availability scenario is provided. The Record size 5KB RSR is provided as average over each 10 executions. Operations 8.000.000 7.1 Apache Cassandra Results Operation type 100% write 95% read, 5% update Read distribution N/A zipfian 7.1.1 Write-heavy Workload. The explorative analysis of the CA client threads 4,16 4,16 for the write-heavy workload results in two patterns as shown in Figure 6. Hereby, P1 and P2 reoccur for both cluster sizes. The graphs in Figure 6 clearly show that none of the identified pat- terns matches the expected hypothesis as for both cluster sizes a 6.4 Failure Specification downtime occurs after the VM failure is injected. Moreover, the While King Louie supports an extensive number of potential failure downtime varies from 10s to 70s as described in the corresponding scenarios, we define a baseline failure specification to get first measurement Table 5. 
Hereby, the OR of the 10s downtime, i.e. CA- insights into the availability of distributed DBMS. As presented CS3-P1 and CA-CS7-P1, is significantly higher than 70s downtime

149 6000 6000 3500 3000

5000 3000 5000 2500

2500 4000 4000 2000

2000

3000 3000 1500 1500

failure injection failure injection failure injection failure injection 2000 2000 1000 recovery start recovery start 1000 recovery start recovery start

average throughput in ops/s recovery finished average throughput in ops/s recovery finished average throughput in ops/s recovery finished average throughput in ops/s recovery finished healthy healthy healthy healthy 1000 1000 500 unhealthy unhealthy 500 unhealthy unhealthy recovering recovering recovering recovering recovered recovered recovered recovered 0 0 0 0 0 250 500 750 1000 1250 1500 1750 0 500 1000 1500 2000 0 500 1000 1500 2000 2500 3000 3500 0 500 1000 1500 2000 2500 3000 3500 runtime in s runtime in s runtime in s runtime in s (a) CA-CS3-P1 & CA-CS7-P1 (b) CA-CS3-P2 & CA-CS7-P2 (a) CA-CS3-P1 & CA-CS7-P1 (b) CA-CS3-P2 & CA-CS7-P2

Figure 6: Cassandra patterns for write-heavy workload Figure 7: Cassandra patterns for read-heavy workload patterns, i.e. CA-CS3-P2 and CA-CS7-P2. The fact that the through- records. Compared to the write-heavy workload, the impact of ad- put in the unhealthy phase remains on a similar or even slightly ditional cloud tenants is barely present as the read-heavy workload higher average compared to the healthy phase relates to the applied is mainly memory- and compute-bound and not I/O-bound. weak write consistency and replication factor of three. As in the unhealthy phase, one replica is missing, less internal replication op- erations are executed while there are still enough replicas available 7.2 Couchbase Results to satisfy the applied write consistency. The decreased throughput 7.2.1 Write-Heavy Workload. For CB three patterns are identified during the recovery phases relates to the redistribution of data to for the write-heavy workload, CB-CS3-P1, CB-CS3-P2 and CB-CS7- the newly added DBMS node. Moreover, this finding is validated P1 as depicted in Figure 8 and the corresponding measurement by existing elasticity studies [21, 30]. The throughput variations Table 6. The three node patterns, CB-CS3-P1 and CB-CS3-P1, show during the recovered phase (cf. Table 5) are caused by internal com- that the VM failure causes an unexpected availability of the DBMS paction mechanisms and potential resource interferences due to on the client-side, either with the beginning of the recovery phase the fact that the I/O bandwidth of the physical server hosting the (CB-CS3-P1) or even before the recovery phase (CB-CS3-P2). The DBMS VMs is shared with multiple tenants. resulting YCSB results show frequent occurrence of timeout excep- In order to analyse the unexpected throughput development, the tions that cause the termination of the workload even as enough raw YCSB results are checked for potential issues on client-side, but replicas are available. For seven node cluster, CB-CS7-P1, the avail- none are present in the result files. While P1 is partially explainable ability of the cluster is provided with the reoccurring pattern shown as 10s is the default request timeout before a new DBMS node takes in Figure 8c. Hereby, a downtime of 30s is identified that correlates over6, this should only affect the client threads trying to write tothe to the configured take-over time of CB. CB-CS7-P1 also shows a unavailable DBMS node and consequently the throughput should reoccurring throughput decrease in the recovered phase indicates only decrease for their stalled write requests while the results clearly due to ongoing CB internal management processes. The results show that the overall throughput reaches the downtime threshold. indicate that for a write-heavy workload, a CB size of three nodes Moreover, the less frequent P2 can not be correlated to Apache is not sufficient to overcome one node failure, even with aweak Cassandra configuration options. write consistency applied. 7.1.2 Read-heavy Workload. The read-heavy result also show two 7.2.2 Read-Heavy Workload. The identified patterns of the read- throughput patterns P1 and P2 for both cluster sizes as depicted in heavy workload are depicted in Figure 9 and the correlating Table 6. Figure 7. Again, these patterns do not match the expected through- Compared to the write-heavy workload patterns, they do not show put development of H1, they even show an increased downtime any persistent downtime. 
Pattern CB-CS3-P1 shows a downtime compared to the write-heavy results. As Figure 7 depicts and the of 30s and the less frequent CB-CS3-P2 a downtime of 175s (cf. according measurement Table 5 shows, the downtime ranges from Table 6). For the seven node cluster, only the pattern CB-CS7-P1 10s to 240s, spanning over the whole unhealthy, recovery and even with the downtime of 30s is identified. Hereby, the 30s downtime into the recovered phase. Again the YCSB raw results do not show matches again the configured take-over time, but also shows thata any issues. While the 10s downtime is again partially explainable DBMS node failure causes a downtime for all clients,even if replicas due to the default request timeout7, the 240s downtime does not could handle the read requests. Regarding the 175s downtime, the correlate with any CA configuration settings on client and DBMS YCSB result files show temporary failure exceptions during the side. A second deviation from the expected throughput develop- recovering phase and disappear in the recovered phase. ment, the increased APT in the recovered phase compared to the initial phase is noteworthy. The explanation refers to the workload 7.3 Threats to Validity type as for the read-heavy workload, the longer phase duration of Even with the support of King Louie, the complexity of the cloud, the recovered phase improves the caching of frequently accessed DBMS and workload domain requires the consideration of the

Footnote 6: https://docs.datastax.com/en/archived/cassandra/3.0/cassandra/operations/opsRepairNodesHintedHandoff.html

Table 5: Cassandra Phase Measurements

Phase      | Metric  | write-heavy: CA-CS3-P1, CA-CS3-P2, CA-CS7-P1, CA-CS7-P2 | read-heavy: CA-CS3-P1, CA-CS3-P2, CA-CS7-P1, CA-CS7-P2
all        | OR      | 7/10, 3/10, 9/10, 1/10                                  | 7/10, 3/10, 7/10, 3/10
all        | RSR     | 99.99%, 99.99%, 99.99%, 99.99%                          | 99.65%, 99.74%, 100.00%, 99.99%
initial    | APT (σ) | 4305 (80), 4254 (71), 4718 (64), 4779 (0)               | 1399 (157), 1345 (68), 2950 (89), 2816 (99)
unhealthy  | APT (σ) | 4384 (266), 2886 (215), 4518 (60), 3248 (0)             | 1728 (133), 8 (1), 3336 (41), 27 (1)
unhealthy  | PDD     | 10s, 70s, 10s, 70s                                      | 10s, 240s, 10s, 240s
recovering | APT (σ) | 3361 (251), 2746 (106), 4324 (71), 4448 (0)             | 1967 (111), 1 (0), 3512 (42), 2 (0)
recovered  | APT (σ) | 4301 (490), 4071 (48), 4364 (143), 4262 (0)             | 2179 (152), 1855 (57), 3650 (21), 2928 (42)
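The phase metrics reported in Tables 5 and 6 (APT as the mean throughput per phase, PDD as the accumulated time the reported throughput stays at or below a downtime threshold) can be post-processed from the YCSB throughput time series once the failure-injection and recovery timestamps are known. The following Python sketch only illustrates one possible way to do this; the function and parameter names are illustrative and not King Louie code, and the reporting interval and downtime threshold are assumptions.

```python
from statistics import mean, stdev

# Illustrative sketch (not King Louie code): compute per-phase metrics from a
# YCSB throughput time series.
# samples: list of (timestamp_s, ops_per_s) tuples reported by the YCSB client.

def split_phases(samples, t_inject, t_recovery_start, t_recovery_end):
    """Assign each throughput sample to one of the four phases of the case study."""
    phases = {"initial": [], "unhealthy": [], "recovering": [], "recovered": []}
    for t, tp in samples:
        if t < t_inject:
            phases["initial"].append(tp)
        elif t < t_recovery_start:
            phases["unhealthy"].append(tp)
        elif t < t_recovery_end:
            phases["recovering"].append(tp)
        else:
            phases["recovered"].append(tp)
    return phases

def apt(throughputs):
    """Average phase throughput (APT) and its standard deviation."""
    return mean(throughputs), stdev(throughputs) if len(throughputs) > 1 else 0.0

def pdd(samples, threshold=0, interval=10):
    """Performance downtime duration: accumulated seconds the reported throughput
    stays at or below the assumed downtime threshold; `interval` is the assumed
    YCSB reporting interval in seconds."""
    return sum(interval for _, tp in samples if tp <= threshold)
```

With the phase boundaries taken from the orchestrator's execution log, APT and PDD can then be aggregated per pattern over the repeated runs, which is essentially what the tables report.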

[Figure 8 plots the average throughput in ops/s over the runtime in s for the three write-heavy Couchbase patterns, panels (a) CB-CS3-P1, (b) CB-CS3-P2 and (c) CB-CS7-P1, annotated with failure injection, recovery start and recovery finished as well as the healthy, unhealthy, recovering and recovered phases.]

Figure 8: Couchbase patterns for write-heavy workload

[Figure 9 plots the average throughput in ops/s over the runtime in s for the three read-heavy Couchbase patterns, panels (a) CB-CS3-P1, (b) CB-CS3-P2 and (c) CB-CS7-P1, with the same failure-injection, recovery and phase annotations.]

Figure 9: Couchbase patterns for read-heavy workload

Table 6: Couchbase Phase Measurements

Phase      | Metric  | write-heavy: CB-CS3-P1, CB-CS3-P2, CB-CS7-P1 | read-heavy: CB-CS3-P1, CB-CS3-P2, CB-CS7-P1
all        | OR      | 7/10, 3/10, 10/10                            | 8/10, 2/10, 10/10
all        | RSR     | 23.61%, 18.19%, 99.99%                       | 99.99%, 99.99%, 99.99%
initial    | APT (σ) | 5172 (136), 6130 (96), 4718 (64)             | 3205 (692), 3691 (600), 4975 (140)
unhealthy  | APT (σ) | 3219 (77), 17 (24), 4925 (86)                | 2470 (525), 100 (22), 4643 (216)
unhealthy  | PDD     | 30s, ∞, 30s                                  | 30s, 175s, 30s
recovering | APT (σ) | 82 (65), 0, 5554 (57)                        | 2581 (801), 867 (250), 5360 (403)
recovered  | APT (σ) | 0, 0, 4592 (52)                              | 3205 (697), 3489 (843), 6213 (170)

For the cloud domain, the recovery phase duration can be influenced by provider-dependent VM spawning times, and the DBMS performance in general can be impacted by resource interferences on different levels. As King Louie supports multiple cloud providers, these challenges can easily be addressed by additional case studies. With respect to the DBMS domain, vanilla DBMS installations are applied to ensure comparable results, but there are numerous DBMS-specific configurations that

can affect the availability results. Consequently, additional DBMS-specific case studies need to be executed. As the DBMS catalogues of King Louie are easily extensible, DBMS-specific configurations can be integrated for DBMS-specific studies. For the workload domain, the usage of the YCSB enables comparability by supporting multiple DBMS and reporting DBMS performance metrics and failed requests. Yet, the implementation of the DBMS connections solely relies on the DBMS drivers and follows a basic approach without applying any dedicated availability mechanisms on the client side. Consequently, the results represent a baseline and pave the path to evaluate enhanced client-side implementations, as the YCSB is easily extensible and King Louie also supports custom workloads via the W-API.

Finally, the case study focuses on the client side, while availability on the DBMS side is another important aspect. The 80 data sets show that even after the recovery phase, the cluster state can remain unhealthy. This is reported two times for CA and eight times for CB. These results are repeated and the unhealthy ones are excluded from the presented results. Hence, the root cause analysis of these cluster states requires further investigation on the DBMS side.

7.4 Lessons Learned

Based on the case study results, we provide a set of lessons learned on the availability of cloud-hosted DBMS in the case of failures: (i) as none of the results match the expected H1, there is even an increasing need for King Louie to further evaluate potential impact factors; (ii) the reproducibility of King Louie enables the identification of the different availability patterns in the first place; (iii) resource failures might not affect the availability on the DBMS side, but on the client side severe downtimes can occur or even client connections can be interrupted; (iv) potential throughput drops or even temporary downtimes need to be taken into account by client applications, and respective mechanisms need to be in place to handle such downtimes; (v) as the results are only partially explainable, additional availability evaluations are required, considering DBMS-specific configurations and metrics; (vi) even building upon established DBMS, guaranteeing DBMS availability and ensuring certain client performance is a challenging task that becomes even more challenging for automated operation as preferred in the cloud. Hence, comprehensive additional research with respect to DBMS behaviour and suitable COTs is required.

8 KING LOUIE VALIDATION

This section validates the King Louie features against the introduced requirements and discusses potential limitations and extensions. Evaluation design (R1) is enabled by providing comprehensive evaluation scenario templates that enable reusability and multiple customization parameters (cf. footnote 1). In addition, a sequence of multiple failures including optional recovery mechanisms can be specified. Abstraction (R2) is addressed by building upon COTs for abstracting the communication with cloud provider APIs and the usage of DBMS and workload catalogues to abstract the technical details for deploying the DBMS and executing workloads. King Louie supports the specification of custom properties in the evaluation scenario, but the respective technical implementation needs to be able to process these custom properties. Its abstraction is shown by applying the availability evaluation for two DBMS of different cluster sizes and workload types. Its abstraction of different cloud providers has been validated in previous work [29]. Failure injection (R3) is validated by the case study by applying the VM failure type including the corresponding recovery mechanisms. While King Louie provides the framework for arbitrary failure injection, it currently supports the injection of failures on the VM and availability zone level. Additional failures can easily be added by implementing a single interface. Automation (R4) is demonstrated by executing 160 evaluation scenarios, generating the data sets for post-processing the availability metrics. Yet, the post-processing is not automatically executed by King Louie. Extensibility (R5) is addressed by building upon a loosely coupled architecture and applying the concept of domain-specific catalogues. This results in extensibility on different levels, as adding or customizing DBMS deployments only requires the update of the respective scripts. Adding new failure types only requires the implementation of an interface in the evaluation orchestrator. Transparent data (R6) to ensure significance and reproducibility is provided by extensive data sets comprising raw performance metrics, but also monitoring data and execution logs to enable the full traceability of each evaluation execution [31].

As the availability evaluation of distributed DBMS in the volatile cloud context is a challenging task that requires the technical knowledge of multiple domains, King Louie provides a novel framework to support such evaluations by enabling reproducibility and portability. As the case study results show, there is a significant need for the thorough availability evaluation of cloud-hosted DBMS in order to provide highly available cloud services.
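The validation above states that additional failure types only require the implementation of a single interface in the evaluation orchestrator. King Louie's actual interface is not reproduced here; the following Python sketch merely illustrates what such an extension point could look like. All class and method names, including the cloud API calls, are hypothetical.

```python
from abc import ABC, abstractmethod

class FailureInjector(ABC):
    """Hypothetical extension point: one implementation per failure type."""

    @abstractmethod
    def inject(self, target):
        """Trigger the failure on the given target (e.g. a VM or an availability zone)."""

    @abstractmethod
    def recover(self, target):
        """Execute the optional recovery mechanism for the injected failure."""

class VmFailureInjector(FailureInjector):
    """Sketch of a VM failure type as used in the case study: the orchestrator
    deletes the VM, and recovery re-provisions a replacement DBMS node."""

    def __init__(self, cloud_api):
        # cloud_api stands for the provider abstraction assumed by R2; its
        # methods below are placeholders, not a real library API.
        self.cloud_api = cloud_api

    def inject(self, target):
        self.cloud_api.delete_vm(target)

    def recover(self, target):
        new_vm = self.cloud_api.provision_vm(target.flavour, target.image)
        self.cloud_api.join_dbms_cluster(new_vm, target.cluster)
```

A new failure type would register such an implementation with the orchestrator, which then schedules inject and recover according to the evaluation scenario.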

9 RELATED WORK

The evolving heterogeneity of distributed DBMS [11, 12, 22] has led to numerous comparative evaluations of cloud-hosted distributed DBMS. A variety of DBMS workloads have been established for evaluating distributed DBMS in the context of Web and Big Data applications [25, 27]. Evaluations building upon these workloads focus either on the sole DBMS performance and scalability [17] or on the performance impact of different cloud resources [21]. With respect to availability evaluations, only the aspect of overload situations due to increasing workload intensities is evaluated [20, 30]. Yet, these studies do not consider the aspect of resource failures.

As the number of cloud resource offerings and distributed DBMS is still increasing, supportive evaluation frameworks are required to enable reproducible and portable evaluations of cloud-hosted DBMS [7, 23, 27]. Yet, none of them provides the capabilities of cloud resource failure injection and consequently none of them supports the evaluation of the introduced availability metrics.

A first approach towards analysing the availability of distributed DBMS is presented by [13] by studying existing DBMS bugs with respect to failure recovery mechanisms. Yet, the study focuses only on the DBMS layer and does not support failure injection nor the evaluation of client-side availability metrics. Building upon the "chaos monkey" concept, [32] present a framework to inject random cloud resource failures into cloud-hosted applications. While King Louie builds upon this concept, [32] does not evaluate any availability metrics nor is it intended for DBMS, as failures are injected randomly, which hinders reproducible and comparable evaluations. The Gremlin approach [18] presents a failure injection framework for evaluating the availability of micro-services. A failure model describes the failures on the network level and injects them into a running system, analysing the latency impact on the client side. Yet, Gremlin only considers network failures of stateless applications, without focusing on stateful services and resource failures.

Consequently, a supportive framework for evaluating the availability of distributed DBMS under different cloud resource failures is missing. Therefore, King Louie addresses these limitations by enabling reproducible and portable availability evaluations for distributed DBMS by supporting cloud resource failure injection.

10 CONCLUSION

Cloud resources are a preferred solution to operate distributed DBMS. Yet, with increasing DBMS cluster sizes, the probability of cloud failures increases. Therefore, distributed DBMS implement heterogeneous availability mechanisms and expose a diverse set of runtime configurations. Yet, in case of cloud resource failures, it remains unclear to what extent these mechanisms are able to ensure availability and guarantee stable performance. As supportive benchmarks are missing, we define a comprehensive availability evaluation methodology that ensures reproducibility and portability. Based on this methodology, we introduce the novel King Louie framework that automates the full evaluation process, including the cloud failure injection. We validate King Louie in a client-centric case study for different DBMS, cluster sizes and workload types, resulting in 16 evaluation scenarios with 160 data sets. The results show novel and unexpected insights with respect to the availability and performance development in case of cloud resource failures.

Future work will focus on advanced evaluations that comprise multiple cloud failures and recovery steps. Moreover, machine learning techniques will be applied to derive efficient failure mitigation strategies. Also, King Louie will be applied to evaluate the availability from the DBMS side, including data related aspects such as accessibility and consistency.

ACKNOWLEDGMENTS

The research leading to these results has received funding from the Federal Ministry of Education and Research of Germany under grant agreement 01IS18068 (SORRIR) and the EC's Framework Programme HORIZON 2020 under grant agreements 731664 (MELODIC) and 732667 (RECAP).

REFERENCES
[1] Michael Armbrust, Armando Fox, Rean Griffith, Anthony D. Joseph, Randy Katz, Andy Konwinski, Gunho Lee, David Patterson, Ariel Rabkin, Ion Stoica, and Matei Zaharia. 2010. A View of Cloud Computing. Commun. ACM 53, 4 (April 2010), 50–58.
[2] Algirdas Avizienis, J.-C. Laprie, Brian Randell, and Carl Landwehr. 2004. Basic concepts and taxonomy of dependable and secure computing. IEEE Transactions on Dependable and Secure Computing 1, 1 (2004), 11–33.
[3] D. Baur, D. Seybold, F. Griesinger, H. Masata, and J. Domaschka. 2018. A Provider-Agnostic Approach to Multi-cloud Orchestration Using a Constraint Language. In CCGRID. 173–182.
[4] Daniel Baur, Daniel Seybold, Frank Griesinger, Athanasios Tsitsipas, Christopher B. Hauser, and Jörg Domaschka. 2015. Cloud Orchestration Features: Are Tools Fit for Purpose?. In IEEE/ACM UCC.
[5] David Bermbach. 2014. Benchmarking eventually consistent distributed storage systems. KIT Scientific Publishing, Karlsruhe.
[6] David Bermbach, Erik Wittern, and Stefan Tai. 2017. Cloud Service Benchmarking. Springer.
[7] Lexi Brent and Alan Fekete. 2019. A Versatile Framework for Painless Benchmarking of Database Management Systems. In Australasian Database Conference. Springer, 45–56.
[8] Eric Brewer. 2012. CAP twelve years later: How the "rules" have changed. Computer (2012).
[9] Eric A. Brewer. 2000. Towards robust distributed systems. In PODC.
[10] Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking cloud serving systems with YCSB. In ACM SoCC.
[11] Ali Davoudian, Liu Chen, and Mengchi Liu. 2018. A Survey on NoSQL Stores. ACM Computing Surveys (CSUR) 51, 2 (2018), 40.
[12] Jörg Domaschka, Christopher B. Hauser, and Benjamin Erb. 2014. Reliability and availability properties of distributed database systems. In Enterprise Distributed Object Computing Conference (EDOC), 2014 IEEE 18th International. IEEE, 226–233.
[13] Yu Gao, Wensheng Dou, Feng Qin, Chushu Gao, Dong Wang, Jun Wei, Ruirui Huang, Li Zhou, and Yongming Wu. 2018. An empirical study on crash recovery bugs in large-scale distributed systems. In Proceedings of the 2018 26th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering. ACM, 539–550.
[14] Anne Geraci, Freny Katki, Louise McMonegal, Bennett Meyer, John Lane, Paul Wilson, Jane Radatz, Mary Yee, Hugh Porteous, and Fredrick Springsteel. 1991. IEEE Standard Computer Dictionary: Compilation of IEEE Standard Computer Glossaries. IEEE Press.
[15] Jim Gray. 1992. Benchmark Handbook: For Database and Transaction Processing Systems.
[16] Haryadi S. Gunawi, Mingzhe Hao, Riza O. Suminto, Agung Laksono, Anang D. Satria, Jeffry Adityatama, and Kurnia J. Eliazar. 2016. Why does the cloud stop computing?: Lessons from hundreds of service outages. In Proceedings of the Seventh ACM Symposium on Cloud Computing. ACM, 1–16.
[17] Abdeltawab Hendawi, Jayant Gupta, Liu Jiayi, Ankur Teredesai, Ramakrishnan Naveen, Shah Mohak, and Mohamed Ali. 2018. Distributed NoSQL Data Stores: Performance Analysis and a Case Study. In 2018 IEEE International Conference on Big Data (Big Data). IEEE, 1937–1944.
[18] Victor Heorhiadi, Shriram Rajagopalan, Hani Jamjoom, Michael K. Reiter, and Vyas Sekar. 2016. Gremlin: Systematic resilience testing of microservices. In 2016 IEEE 36th International Conference on Distributed Computing Systems. IEEE.
[19] Murtadha Al Hubail, Ali Alsuliman, Michael Blow, Michael Carey, Dmitry Lychagin, Ian Maxon, and Till Westmann. 2019. Couchbase Analytics: NoETL for Scalable NoSQL Data Analysis. Proc. VLDB Endow. 12, 12 (Aug. 2019), 2275–2286.
[20] Ioannis Konstantinou, Evangelos Angelou, Christina Boumpouka, Dimitrios Tsoumakos, and Nectarios Koziris. 2011. On the elasticity of NoSQL databases over cloud management platforms. In Proceedings of the 20th ACM International Conference on Information and Knowledge Management. ACM, 2385–2388.
[21] Jörn Kuhlenkamp, Markus Klems, and Oliver Röss. 2014. Benchmarking scalability and elasticity of distributed database systems. Proceedings of the VLDB Endowment 7, 12 (2014), 1219–1230.
[22] Somnath Mazumdar, Daniel Seybold, Kyriakos Kritikos, and Yiannis Verginadis. 2019. A survey on data storage and placement methodologies for Cloud-Big Data ecosystem. Journal of Big Data 6, 1 (2019), 15.
[23] A. V. Papadopoulos, L. Versluis, A. Bauer, N. Herbst, J. von Kistowski, A. Ali-Eldin, C. Abad, J. N. Amaral, P. Tůma, and A. Iosup. 2019. Methodological Principles for Reproducible Performance Evaluation in Cloud Computing. IEEE Transactions on Software Engineering (2019), 1–1.
[24] Kia Rahmani, Kartik Nagar, Benjamin Delaware, and Suresh Jagannathan. 2019. CLOTHO: Directed Test Generation for Weakly Consistent Database Systems. Proc. ACM Program. Lang. 3, OOPSLA, Article 117 (Oct. 2019), 28 pages.
[25] Vincent Reniers, Dimitri Van Landuyt, Ansar Rafique, and Wouter Joosen. 2017. On the state of NoSQL benchmarks. In Proceedings of the 8th ACM/SPEC International Conference on Performance Engineering Companion. ACM, 107–112.
[26] Sherif Sakr. 2014. Cloud-hosted databases: technologies, challenges and opportunities. Cluster Computing 17, 2 (2014), 487–502.
[27] Daniel Seybold and Jörg Domaschka. 2017. Is Distributed Database Evaluation Cloud-Ready?. In ADBIS. Springer, 100–108.
[28] Daniel Seybold, Christopher B. Hauser, Simon Volpert, and Jörg Domaschka. 2017. Gibbon: An Availability Evaluation Framework for Distributed Databases. In OTM Confederated International Conferences "On the Move to Meaningful Internet Systems". Springer, 31–49.
[29] Daniel Seybold, Moritz Keppler, Daniel Gründler, and Jörg Domaschka. 2019. Mowgli: Finding your way in the DBMS jungle. In Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering (ICPE '19). ACM, 321–332.
[30] Daniel Seybold, Nicolas Wagner, Benjamin Erb, and Jörg Domaschka. 2016. Is elasticity of scalable databases a Myth?. In IEEE Big Data.
[31] Daniel Seybold, Stefan Wesner, and Jörg Domaschka. 2019. King Louie: DBMS Availability Evaluation Data Sets. https://doi.org/10.5281/zenodo.3459604
[32] Ariel Tseitlin. 2013. The Antifragile Organization. Commun. ACM 56, 8 (2013).
[33] Werner Vogels. 2009. Eventually Consistent. Commun. ACM 52, 1 (Jan. 2009), 5.
[34] Ming Zhong, Kai Shen, and Joel Seiferas. 2008. Replication Degree Customization for High Availability. In EuroSys.


Chapter 15 [core8] Is elasticity of scalable databases a Myth?

This article is published as follows:

Daniel Seybold, Nicolas Wagner, Benjamin Erb, and Jörg Domaschka. “Is elasticity of scalable databases a Myth?” in 4th IEEE International Conference on Big Data (Big Data), 2016, IEEE, pp. 2827–2836, DOI: https://doi.org/10.1109/BigData.2016.7840931.

©2016 IEEE. Reprinted, with permission.

In reference to IEEE copyrighted material which is used with permission in this thesis, the IEEE does not endorse any of Ulm University's products or services. Internal or personal use of this material is permitted. If interested in reprinting/republishing IEEE copyrighted material for advertising or promotional purposes or for creating new collective works for resale or redistribution, please go to http://www.ieee.org/publications_standards/publications/rights/rights_link.html to learn how to obtain a License from RightsLink.

Is Elasticity of Scalable Databases a Myth?

Daniel Seybold∗, Nicolas Wagner∗, Benjamin Erb† and Jörg Domaschka∗ ∗Institute of Information Resource Management †Institute of Distributed Systems Ulm University, Ulm, Germany Email: {daniel.seybold, nicolas.wagner, benjamin.erb, joerg.domaschka}@uni-ulm.de

Abstract—The age of cloud computing has introduced all the mechanisms needed to elastically scale distributed, cloud-enabled applications. At roughly the same time, NoSQL databases have been proclaimed as the scalable alternative to relational databases. Since then, NoSQL databases are a core component of many large-scale distributed applications. This paper evaluates the scalability and elasticity features of the three widely used NoSQL database systems Couchbase, Cassandra and MongoDB under various workloads and settings using throughput and latency as metrics. The numbers show that the three database systems have dramatically different baselines with respect to both metrics and also behave unexpectedly when scaling out. For instance, while Couchbase's throughput increases by 17% when scaled out from 1 to 4 nodes, MongoDB's throughput decreases by more than 50%. These surprising results show that not all tested NoSQL databases scale as expected and, even worse, in some cases scaling harms performance.

Our contribution is twofold: (1) we define a thorough methodology to evaluate the scalability and elasticity of distributed databases; (2) we evaluate the mentioned NoSQL databases in a distributed setup with respect to their specific scalability and elasticity capabilities and discuss the results with respect to an expected behaviour.

The remainder of the paper is organised as follows. Section II describes the challenges of scalability and elasticity with respect to DBaaS. In Section III, we introduce the cloud architecture, distributed databases and distributed storage models. Section IV defines the methodology that we used for our evaluation. Section V describes the cloud environment of our evaluation. Section VI presents our evaluation results, followed by a discussion in Section VII. Section VIII discusses the related work. Finally, Section IX concludes and outlines future work.

I. INTRODUCTION

The evolvement of cloud computing has gained tremendous II.DATABASE CHALLENGESFOR DBAAS focus in industry and academia, especially for web-based applications. With the typical benefits of cloud computing In the following we define the terms scalability and elas- ticity such as on-demand self-service, resource pooling or rapid for our considerations in the context of a DBaaS, elasticity [1] also traditional web service architectures experi- with respect to their definitions in cloud computing and for enced the rethinking from monolithic structures to distributed distributed databases. services. Whereas the distribution of the mostly stateless busi- A. Scalability ness logic services fits well for distribution in the cloud, the distribution of stateful database services is more challenging. The term scalability is a common term in IT in general. As Hence, in parallel to cloud computing, a distributable database we focus in our work on the scalability of distributed databases class, the NoSQL databases, evolved and proclaim as alterna- in the cloud, we narrow down scalability with respect to tive for traditional relational databases. NoSQL databases store cloud computing and database systems. In the cloud context data in non-relational way and promise to be scalable and to a well-known definition is provided by Herbst et. al. [3] with run on commodity hardware as offered by the cloud. “Scalability is the ability of the system to sustain increasing These developments in the database area also have influ- workloads by making use of additional resources”. enced the service models of the initial cloud service models [1] Database system workloads can be classified in memory- Infrastructure as a Service (IaaS), Platform as a Service (PaaS), bound (i.e. the data fits into memory) and storage-bound and Software as a Service (SaaS) by extending them with more workloads (i.e. fractions of the data is kept in memory and fined grained service models such as Database as a Service the rest in persistent storage) [4]. In this paper, we only focus (DBaaS) [2]. Yet, the selection of the distributed database still on the scaling of memory-bound workloads. A definition of remains as challenge. scalability for distributed systems in general and with respect Within the last years, a high number the existing of NoSQL to distributed databases is provided by Agrawal et. al. [5], databases reached a mature state and nearly all of them defining the terms scale-up, scale-out and scale-in in order promise up to “unlimited” scalability by utilising cloud re- to manage growing workloads. Scale up or vertically scaling sources. In this context we provide an up-to-date evaluation applies by adding more computing resources to a single node. of the common NoSQL databases, Apache Cassandra, Couch- This paper only focuses on the two horizontal scaling actions, base and MongoDB, with the focus on their scalability and namely scale out (i.e. adding nodes to a cluster) and scale in elasticity capabilities for a DBaaS system. (i.e. removing nodes from a cluster). For applying a distributed database in a DBaaS a high distributed database cluster scalability factor is required to process virtually unlimited db db db db workload sizes by scaling the database cluster to a sufficient node node node node size. A high scalability factor is represented by constant latency and proportionally growing throughput with respect VM VM VM VM to the number of nodes and the workload size [6]. virtualisation layer B. 
Elasticity Scalability focuses on the general ability to process arbitrary workload sizes. Elasticity is tightly coupled to scalability and enables the overcoming of sudden workload fluctuations by Fig. 1: System Model scaling the cluster without any downtime. With respect to the cloud, elasticity is defined as “Capabilities can be elastically provisioned and released, in some cases automatically, to scale The architecture of distributed databases is typically cate- rapidly outward and inward commensurate with demand.” [1] gorised into the following distribution models [7]: A definition with the focus on distributed databases is 1) Single Server: A single server architecture represents the provided by Agrawal et. al. [5] with “Elasticity, i.e. the ability simplest option without any distribution at all. The single node to deal with load variations by adding more resources during handles all read and write requests. We do not consider single high load or consolidating the tenants to fewer nodes when the server architectures as they do not offer horizontal scalability. load decreases, all in a live system without service disruption, 2) Master-Slave: In a master-slave distribution the data is therefore critical for these systems.” is replicated across multiple nodes. One node represents the Hence, a DBaaS architecture requires a high elasticity designated master node, executing all write requests and syn- factor, defined by scaling the cluster at run-time without down- chronizing the slave nodes, which only execute read requests. time and improving the throughput to resolve the workload The master-slave distribution can be applied to scale read- fluctuations [6]. heavy workloads. 3) Multi-Master: The multi-master or peer-to-peer distri- III.DATABASE AS A SERVICE ARCHITECTURE bution do not build upon different node types, i.e. all nodes In this section, we introduce the technical architecture are equal. In a multi-master distribution replication and shard- for our following evaluation. This potential architecture of ing [7] of the data is applied to spread write and read requests a DBaaS covers the underlying cloud infrastructure, the across all nodes of the cluster. scalability-enabled architecture of distributed databases and the corresponding storage models. C. Storage Models A. Cloud Infrastructure Distributed databases are commonly classified according to A DBaaS system requires a high degree of flexibility on the following storage models. the resource level, so that it can be built upon a bare metal In relational data stores, data is stored in tuples, forming infrastructure or upon the IaaS cloud service model. Based on an ordered set of attributes. Relations consist of sets of tuples cloud features such as resource pooling or rapid elasticity [1], while a tuple is a row, an attribute is a column and a relation IaaS offers more flexibility than bare metal and is therefore forms a table. Tables are defined using a normalised data the preferable way to build a DBaaS system. schema. SQL has established itself as a generic data definition, IaaS provides processing, storage, networks and other fun- manipulation and query language for relational data. Column- damental computing resources to run arbitrary software [1]. oriented data stores store data by columns rather than by The processing and storage resources are typically encapsu- rows. 
This enables to store large amounts of data in bulk lated in a virtual machine (VM) entity that also includes the and allows for efficiently querying over very large, structured operating system (OS). VMs run on the virtualised physical data sets. A column-oriented data model does not rely on infrastructure of the IaaS provider, which is not accessible to a fixed schema. Key/value data stores relate to hash tables the user. An exemplary IaaS architecture is depicted in Fig. 1. of programming languages. The storage records are tuples consisting of key/value pairs. While the key uniquely identifies B. Distributed Databases Architectures an entry, the value is an arbitrary chunk of data. Document- Distributed databases provide a single database system to oriented data stores are similar to key/value stores. However, the user which is spread over multiple nodes as depicted they define a structure on the values in structured formats such in Fig. 1. A single database instance as part of the overall as XML or JSON. Compared to key/value stores, document distributed database system is termed database node in the stores allow more complex queries, as document properties following. The overall distributed database system is termed can be used for indexing and querying. Inspired by the database cluster. Our evaluation only considers distributed, graph theory, graph-based data stores rely on graph structures shared-nothing database clusters where each database node for data modelling. Hence, vertices and edges represent and resides on its own VM as shown in Fig. 1. contain data. Queries often relate to graph traversals. IV. METHODOLOGY 3) MongoDB: MongoDB10 has been selected as the most prevalent document-oriented data store8. MongoDB’s archi- Based on the presented challenges for DBaaS, this section tecture is built upon three different services: (1) mongos act elaborates on our methodology to evaluate the scalability and as query router, providing an interface between clients and a elasticity of three distributed databases. The following section sharded cluster, (2) configs store the metadata of the cluster, describes the methodology used to select the databases, to (3) shards contain the actual data. decide on the workload issued on the databases and the way it is generated. Finally, we define two evaluation scenarios B. YCSB Benchmarking Tool for capturing the scalability and elasticity properties of the In the course of the database systems evolution, the vari- databases. ety of database benchmarking tools grew as well. Common A. Database Selection representatives of such database benchmarking tools include TPC Benchmarks11, RUBiS [12] and the Yahoo Cloud Serving With the emergence of cloud computing the number of Benchmark (YCSB) [6]. available distributed database systems increased as well. How- While TPC workloads (TPC-C and TPC-E) target online ever, not all storage models fit well to scalability and elasticity transaction processing (OLTP) applications and RUBiS work- requirements. Relational databases offer distributed represen- loads target a typical 3-tier web application, YCSB offers sim- 1 2 tatives such as MySQL Cluster or Postgres-XL . Yet, neither ple database-centric workloads based on create, read, update 3 4 MYSQL Cluster nor Postgres-XL support elasticity, as re- and delete (CRUD) operations. Therefore, we chose the YCSB sizing the cluster will lead to a downtime for both databases. 
tool as it enables a comparable evaluation of the selected Additionally, relational databases offer limited horizontal scal- databases and their corresponding data models. ing capabilities compared to NoSQL databases [8]. Therefore, we do not consider relational databases for our evaluation. We also do not consider graph databases, as their elasticity is either YCSB client very limited, e.g. neo4j5 provides scale-out capabilities only for reads, or the elasticity properties rely on the underlying client threads DB 6 workload interface storage backends (e.g. Titan with Cassandra). The specific executor layer structure of graph-based data and the corresponding queries statistics make it also difficult to compare its performance metrics. Hence, we set the focus of our evaluation databases to the remaining storage models: column-oriented, key-value and document-oriented. Below, we provide a brief overview of Fig. 2: YCSB client architecture the selected databases and their selection criteria. All of them provide automated data sharding and promise high scalability 1) YCSB Architecture: YCSB offers an extensible archi- and elasticity. tecture shown in Fig. 2. A YCSB client comprises the four 1) Apache Cassandra: Apache Cassandra7 is selected as depicted modules, workload creator, client threads, statistics one of the most common column-oriented data stores8. Cas- and the database interface layer. The workload creator and sandra aims to run on top of commodity infrastructure and the database interface layer offer a convenient interface to scale up to hundreds of nodes. Its architecture follows the customise or extend the respective module. The design of multi-master paradigm and is designed to handle high write YCSB enables the usage of an arbitrary number of YCSB throughput without sacrificing read efficiency [9]. clients in a distributed setup to generate the desired load. 2) Couchbase: Couchbase is a document-oriented data 2) YCSB Metrics: The statistics module of YCSB ag- store which can also be operated as a key-value data store. gregates and reports the latency metrics (average, 95th and Its architecture follows the multi-master paradigm and takes 99th percentile, minimum and maximum). The throughput is advantage of memcached9 for in-memory caching. We select aggregated and reported as average and actual throughput at a Couchbase because of these features and also due to fact specific timestamp. We focus in our evaluation on the average that Couchbase has gained industry attention [10], [11] for throughput and latency and the time series representation its scalability. Yet, Couchbase has undergone only limited of the throughput to evaluate the elasticity of distributed scientific evaluation so far. databases, which will be described in more detail in the remainder of this section. 1https://www.mysql.com/products/cluster/ 2http://www.postgres-xl.org/ C. Workloads and Workload Generation 3 http://dev.mysql.com/doc/mysql-cluster-manager/1.4/en/ As introduced in Section IV-B YCSB issues CRUD op- 4http://files.postgres-xl.org/documentation/tutorial-createcluster.html 5https://neo4j.com/ erations by default. In addition, it provides a set of built-in 6http://titan.thinkaurelius.com/ distributions to perform the required random choices during 7http://cassandra.apache.org/ 8http://db-engines.com/en/ranking trend 10https://www.mongodb.com/ 9https://memcached.org/ 11http://www.tpc.org/default.asp the load generation. 
The supported distributions are uniform, On the database side, no specific configurations are made. zipfian, latest and multinomial. We choose the zipfian distri- Only for MongoDB, we needed to decide on a distribution bution for our workloads, as it represents a typical distribution of the components not available for the two other database of web workloads [13]—some records are very popular and types (mongos and configs; cf. Section IV-A3). Therefore are often accessed, while most of the records are not. we monitored the resource consumption of each component For the distribution of the operations, we select two common and derived an appropriate distribution which is described in workloads of cloud based applications [6]: read-heavy and Section VI-A. a read-update. The distribution of operations as well as the E. Elasticity Evaluation Scenario selected distribution of each workload is provided in Table I. The elastic evaluation scenario investigates a sudden growth TABLE I: Composition of Chosen Workloads in workload can be overcome by scaling the cluster under load. Starting from a 1-node cluster, the cluster is scaled-out as Workload Create Read Update Delete Distribution soon as the node running the database gets overloaded, i.e. the Read-heavy 0.0% 0.95% 0.05% 0.00% zipfian throughput drops significantly. We use the CPU utilisation (> Read-update 0.0% 0.50% 0.50% 0.00% zipfian 90%) and a drop in throughput as indicators for an overloaded node and extend the cluster when both conditions occur. In order to ensure overload is reached for all three databases, we D. Scalability Evaluation Scenario vary the number of YCSB clients accordingly (cf. Fig. 4). The scalability evaluation bases on static cluster sizes, When an overloaded state for the 1-node cluster is reached, which allow to evaluate the scalability of the selected we manually add a new node. During that time, the load is databases by comparing the average throughput and latency kept at the same level. Based on the time series representation for each cluster size and determine and compare throughput of the current throughput, we are able to evaluate the elasticity and latency across the different cluster sizes. during the redistribution of the data and in the resulting 2-node Four static cluster sizes and a fixed number of YCSB clients cluster of each database. are considered as depicted in Fig. 3. The cluster sizes comprise As before, the 1-node cluster starts with 1,000,000 pre- a 1-node, 2-node, 4-node and 8-node cluster of the selected loaded records. For operations, we need to guarantee a high databases (cf. Section IV-A). Small node sizes highlight the load over a longer period of time, ensuring the overload state effects induced by changes to the cluster size. For all three is reached, the redistribution of the data is finished and the databases and all four cluster sizes, we use the same, fixed database has time to stabilise. Therefore, we increase the number of YCSB clients to generate the workload for each total number of operations to 500,000,000, which are again database cluster. distributed over all YCSB clients. All evaluations are performed on a previously loaded database cluster containing 1,000,000 records. All scalabil- ity evaluation runs comprise 10,000,000 operations that are YCSB Client YCSBYCSB Client Client equally distributed among all YCSB clients involved. We YCSB client … select the size of the database (i.e. 
1,000,000 records) such that it entirely fits in RAM, which allows us to exclude disk I/O as bottleneck and to focus on distribution and scaling effects. Fig. 4: Elastic Cluster
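The workload parameters of the scalability scenario (1,000,000 preloaded 1 KB records, 10,000,000 operations distributed over the clients, a zipfian request distribution and the read-heavy 95/5 read/update mix from Table I) map directly onto YCSB's core workload properties. The following Python sketch writes such a workload file; it is only an illustration of the configuration, not the scripts used for the evaluation, and the output file name is arbitrary.

```python
# Illustrative only: generate a YCSB core-workload properties file matching the
# read-heavy scalability scenario (values taken from Section IV and Table I).
read_heavy_scalability = {
    "workload": "com.yahoo.ycsb.workloads.CoreWorkload",
    "recordcount": 1_000_000,      # preloaded records (~1 KB each with YCSB defaults)
    "operationcount": 10_000_000,  # split equally across all YCSB clients
    "readproportion": 0.95,
    "updateproportion": 0.05,
    "insertproportion": 0.0,
    "requestdistribution": "zipfian",
}

with open("workload_read_heavy", "w") as f:
    for key, value in read_heavy_scalability.items():
        f.write(f"{key}={value}\n")
```

Each YCSB client would then execute this workload with 20 client threads (cf. Table IV) against the respective database binding.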

YCSB Client V. EVALUATION ENVIRONMENT YCSBYCSB Client Client YCSB client The overall evaluation takes place in a private, OpenStack- based cloud12 with full and isolated access to all physical and virtual resources. The cloud infrastructure13 is operated with OpenStack version Kilo. In order to reduce possible side effects and to guarantee reproducible results we use the Fig. 3: Static Cluster availability zones feature of OpenStack to dedicate physical hosts for spawning the required VMs. Hence, we dedicate one In order to achieve comparable results for all databases physical host to the YSCB clients and one to the database we calibrate the number of YCSB clients. For the scalability nodes. This reflects the optimal VM distribution for a dis- evaluation scenario, we determine the number of YCSB clients tributed database in a cloud environment as one physical host such that all database clusters are subject to comparable load. is capable to run all database nodes (cf. Table II) and no In particular, we pick the number of clients such that no 12https://www.openstack.org/ database node is overloaded (>90% CPU utilisation) nor idling 13https://www.uni-ulm.de/in/omi/institut/blog/details/article/our-openstack- (<20% CPU utilisation), even in a 1-node configuration. physical-testbed-part-3up-to-date-with-software-and-requirements/ additional communication on a physical level is required. In required system requirements of all the selected databases. the following, we describe the technical details of the cloud The specific version of the evaluated databases is shown infrastructure, the YCSB client and database VMs. in the second part of Table IV. In order to guarantee a A. Physical Infrastructure fair comparison we use each database in its vanilla version without custom optimisations. We only configure the available The VMs for the YCSB clients and database nodes are memory for each database to 6 GB to reserve 2 GB for placed on a different physical host. Table II shows the technical the operating system. Each database is configured with the details for the physical hosts of the YCSB clients and the minimum replication configuration. database nodes. The physical hosts are connected via 1Gb. TABLE IV: DB VM Details TABLE II: Physical Host Details VM Detail Configuration YCSB Client Host Database Host CPU 4 vCores CPU 2xIntel Xeon 2xIntel Xeon E5-2637 8- Memory 8GB E5-2630v3 8-Core Core SandyBridge 2.6Ghz Storage 80GB Haswell 2.4Ghz Network gigabit ethernet Memory 64 GB ECC DDR4 64 GB ECC DDR4 Operating System Ubuntu Server 14.04.2 AMD64 Storage 2x1 TB HDD 7.2k rpm NAS LTS 3.13.0-59-generic x86 64 Network gigabit ethernet gigabit ethernet Operating CentOS 7, Linux 4.2.0- CentOS 7, Linux 4.2.0- DB Detail Configuration System 1.el7.elrepo.x86 64 1.el7.elrepo.x86 64 Apache Cassandra version 2.2.6 Hypervisor KVM, QEMU 1.5.3 KVM, QEMU 1.5.3 Couchbase version 4.0.0 community edition MongoDB version 3.2.7

B. YCSB Client The YCSB client VMs are provisioned with the details VI.EVALUATION RESULTS shown Table III. Observing the resource consumption of In this section, we present our scalability and elasticity YCSB has shown that, with respect to the VM flavour, the evaluation results. For the scalability evaluation, we show the mainly consumed resource is CPU whereas memory and disk throughput and latency development for static cluster sizes. I/O are negligible. Hence, the provisioned VM flavour is For the elasticity evaluation, a dynamic cluster is scaled at run- focused on CPU. For the YCSB tool we use our fork14 of time under continuous load and the throughput development the original YCSB version 0.8 with an updated version of is presented as time series for each database. the Couchbase DB interface from 1.4.10 to 2.2.2. Additional A. Calibrating the Scalability Scenario details of YCSB configuration are shown in Table IV. Initial evaluation results show that three YCSB clients TABLE III: YCSB VM Details overload a MongoDB 1-node cluster, so that two clients is the maximum possible amount of clients. At the same time, using VM Detail Configuration only one client barely loads Couchbase. Hence, we decide CPU 4 vCores to use two YCSB clients for the entire scalability scenario. Memory 2 GB Storage 10 GB For any kind of 1-node and 2-node clusters, all database Network gigabit ethernet nodes stay within the load boundaries (cf. Section IV-D). Operating System Ubuntu Server 14.04.2 AMD64 The calibration results for 8-node cluster show that 2 YCSB LTS 3.13.0-59-generic x86 64 clients are not sufficient to saturate Couchbase and Cassandra. YCSB Detail Configuration Hence, all evaluations of the 8-node clusters are performed Version 0.8 # of records 1.000.000 with 2 and 4 YCSB clients to ensure saturation and comparable record size 1 KB results. Regarding the distribution of the different MongoDB # of operations (for scala- 10.000.000 nodes our calibration efforts show that the shard services bility scenario) create the main load. The mongos and config services # of operations (for elas- 500.000.000 ticity scenario) each only generate a negligible load of <10% CPU utilisation. # of threads 20 Therefore, for the 1-node cluster, we run all three MongoDB services in the same virtual machine. For larger clusters, we only deploy new shard services. This setup still allows a fair C. Database Node evaluation against the other databases Each database VM is provisioned with the details shown in During the calibration, we execute the read-heavy and read- Table IV. The size of the database VMs represents a typical update workloads 10 times in a 2-node Couchbase cluster commodity hardware configuration15 and is selected to fit the environment. We encounter a standard deviation of the average throughput of <500 operations per second. This leads us to 14https://github.com/seybi87/YCSB/releases/tag/scalability-evaluation 15https://docs.mongodb.com/manual/administration/production- the conclusion that our private cloud environment causes only notes/#hardware-considerations little noise and provides stable results. In consequence, we 80000 80000 Couchbase Couchbase Cassandra Cassandra 70000 70000 MongoDB MongoDB

[Figure 5 plots throughput in Ops/sec and latency in ms for Couchbase, Cassandra and MongoDB over cluster sizes of 1, 2, 4 and 8 nodes; panels (a) Throughput Read-Heavy Workload, (b) Throughput Read-Update Workload, (c) Latency Read-Heavy Workload and (d) Latency Read-Update Workload.]

Nodes Nodes (c) Latency Read-Heavy Workload (d) Latency Read-Update Workload Fig. 5: Troughput and latency results of different workloads. consider a repetition factor of 3 for all scalability evaluations B. Results of the Scalability Evaluation Scenario as sufficient. In the following, we present the results of the scalability In order to exclude the network as possible bottleneck evaluation scenario. For both workloads we describe the mea- for the evaluation results, we monitor the YCSB clients and sured throughput and latency per database and with growing database nodes using Ganglia16. As depicted in Fig. 6 the cluster sizes. The dashed lines show the additional results by network load on a single YCSB client with 20 threads reaches using 4 YCSB clients in an 8-node cluster (cf. Section VI-A). at most ≈600 KB/sec (in/out combined). Accordingly, in worst 1) Read-Heavy Workload: Fig. 5a depicts the throughput case (cf. Table V), we can expect ≈3.0MB/s at a database achieved by all databases in all cluster configurations. It is node, which is far below the maximum bandwidth provided by noticeable that Couchbase has the highest overall throughput our 1Gb network and allows us to neglect network saturation. for all cluster sizes. It is capable of increasing throughput by 17% when going from a 1-node to a 4-node cluster. Increasing the cluster size from 4 to 8 nodes decreases the throughput by 4% because 2 YCSB clients are not sufficient to saturate the 8- node cluster. By using 4 YCSB clients instead the throughput again increases by 29%. Cassandra achieves the lowest throughput in a single node cluster, but outperforms MongoDB for the 2-node and 4- node cluster due to a throughput increase of 26%. Similar to Couchbase, the throughput decreases by 1% in an 8-node Fig. 6: YCSB Network Load cluster with 2 YCSB clients but increases by 16% using 4 YCSB clients. In contrast to its competitors, MongoDB performs better as single node than in a cluster. Its throughput decreases by 58% 16http://ganglia.info/ when moving from a 1-node to a 4-node cluster. For an 8-node cluster throughput increases by 18% for 2 YCSB clients and TABLE V: Number of clients used for the elasticity scenario. by 26% for 4 YCSB clients compared to a 4-node cluster. Cassandra Couchbase MongoDB The latency results shown in Fig. 5c unveil a similar Clients 4 5 3 ranking as for the throughput. Couchbase achieves the lowest latency for all cluster sizes. Increasing its cluster to four nodes decreases latency by 17% compared to a 1-node configuration. In a 8-node cluster the latency remains constant for 2 YCSB D. Results for the Elasticity Evaluation Scenario and increases for 4 YCSB clients by 44% compared to a 4- The evaluation of the elastic cluster is performed with the node cluster. read-update workload and a each database is previously loaded Cassandra benefits the most from increasing the cluster size with 1,000,000 records. as a 35% latency decrease from one to 4 nodes is achieved. 1) Apache Cassandra: For Cassandra our experimental For eight nodes the latency increases by 3% for 2 clients and results show that four YCSB clients are required to create an by 73% for 4 clients. overload situation for a single node. The four YCSB clients MongoDB reaches the highest latency for all cluster sizes are started in 30s gaps to ensure a warm-up time for each and even with an increasing cluster size the latency increases YCSB client. We continuously monitor the throughput of the by 100% from one to four nodes. 
In an 8-node cluster the Cassandra instance and the CPU utilisation of the hosting VM. latency increases by 35% for two and by 49% for four YCSB The throughput graph of Cassandra’s throughput is depicted clients. in Fig. 7a as time series. All four YCSB are running after 90s 2) Read-Update Workload: For the read-update workload and produce the overload situation with a dropping throughput we measure similar trends as for the read-heavy results: at 110s. In addition the CPU utilisation reaches >90% at this Couchbase achieves the highest overall throughput and the point. At 120s a new Cassandra node is added to the cluster lowest latency. Throughput is summarised in Fig. 5b, latency and the data is getting rebalanced. As depicted in Fig. 7a the in Fig. 5d: Couchbase increases its throughput by 22% from a throughput stops dropping after rebalancing the data but the 1-node to a 4-node cluster and by 36% for an 8-node cluster 2-node cluster does not reach the throughput of the one node with 4 YCSB clients. Cassandra again reaches the highest cluster in the beginning of the benchmark. throughput increase of 38% from 1-4 nodes and 12% in an 2) Couchbase: Our experimental results have shown that 8-node cluster with 4 YCSB clients. MongoDB’s throughput five YCSB Clients are required to overload one Couchbase decreases by 27% from 1-4 nodes and increases by 28% (2 node. Again, the YCSB clients are started in 30s gaps and YCSB clients) and by 33% (4 YCSB clients) in an 8-node all clients are started after 120s. As depicted in Fig. 7b the cluster. Couchbase achieves a latency decrease of 20% when throughput starts dropping at 130s and a second node is added increasing the cluster size from one to four nodes. For 8 nodes at 150s. After rebalancing the data, the two node cluster and 2 YCSB clients the latency remains constant and for 4 increases the throughput significantly and compared to the YCSB clients it increases by 45%. For Cassandra we measure initial one node cluster a 37% latency decrease from 1-4 nodes and for 8 nodes an 3) MongoDB: For overloading a single MongoDB node, increase of 10% for 2 YCSB clients and 78% for 4 YCSB three YCSB clients are required. The three clients are started clients. The latency of MongoDB increases by 38% from 1-4 in 30s gaps and produce an overload situation for the Mon- nodes and decreases in an 8-node cluster with 2 YCSB clients goDB instance at 100s as depicted in Fig. 7c. By adding a by 22%. With 4 YCSB clients the latency increases by 49%. second node at 120s the overload situation is balanced as the throughput increases but only slightly above the throughput of C. Calibrating the Elasticity Scenario the initial one node cluster. As the elasticity evaluation bases on an overloaded 1- node database cluster (cf. Section IV-E) the required number VII.DISCUSSION of YCSB clients has to be determined for each database The main insight from our evaluation is that more instances separately. For that purpose, for each database we proceed does not necessarily mean better performance. Indeed, in the as follows: Starting with three clients (cf. case of MongoDB more instances decreases throughput and Section VI-A), we issue load on a 1-node cluster of that increases latency. database, while monitoring CPU usage and throughput. Then, This section discusses the results from a high level view. we increase the number of clients until the metrics show a It clarifies the impact of the evaluation methodology on constant overload situation. 
The resulting number of required YCSB clients is summarised in Table V. For the distribution of the MongoDB services, we use the same approach as in the scalability scenario. The calibration results show that a benchmark run-time of 15 minutes is sufficient time for all the databases to redistribute the data and balance the load across the nodes. Therefore, each benchmark run is manually terminated after 15 minutes.

Fig. 7: Elasticity benchmark results: adding an additional node while the database system is under load. The panels show the throughput [Ops/sec] over the runtime [s] for (a) Cassandra, (b) Couchbase and (c) MongoDB; the point at which the second node is added is marked in each panel.

The following interprets the results received and draws conclusions. It also addresses the impact on the design of distributed architectures.

A. Interpretation of Scalability Results

Key to the interpretation of the scalability evaluation is the fact that for all three databases, two clients were used (the four-client results in the 8-node cluster are discussed at the end). Both clients use 20 threads, leading to a maximum parallelism of 40 requests. This number was selected such that none of the databases was overloaded in a single-node set-up. In consequence, for the scalability evaluation, no database instance was overloaded, and the performance metrics of the databases are directly comparable.

In an ideal case, going from one to two nodes, throughput would double, and going from one to four nodes it would quadruple (cf. Section II-A). The fact that this has not happened could have three causes: (i) the clients or the network are maximally utilised so that clients cannot issue new queries and updates faster (we excluded this by carefully calibrating the number of clients and checking the network bandwidth); (ii) the database instances are overloaded (this is prevented by the set-up); (iii) queuing at the databases occurs.

In case a single database instance can only serve n < 40 parallel requests, this will cause the remaining requests to be queued. As each client thread issues only one request at a time, this load will cause the system to adjust and lead to an (on average) constant queue size. Adding a second database node to the cluster will decrease the load on the first one, as data is sharded between the two nodes. Due to using a zipfian distribution for the access pattern, an unequal distribution of load amongst nodes could be expected. As all evaluated databases apply hash-based sharding in the used default configuration, the prevention of hot-spot nodes is addressed at database level. Yet, even with a uniform load distribution, throughput would only double when each database instance could process n ≥ 20 requests at the same time. As we do not see a large increase in throughput for any of the databases (not even for 4-node clusters), we conclude that the parallelism for each database is n < 10.

Looking at the results for the 8-node cluster, the throughput decrease compared to a 4-node cluster might be surprising at first. Yet, it leads us to the assumption of a too large cluster size with respect to the load issued by two YCSB clients. Hence, an overhead of inter-cluster communication is caused by request routing, as the results show no further latency decrease compared to a 4-node cluster. Increasing the load by using four YCSB clients confirms our assumption, as the clusters are saturated and the throughput increases with respect to two YCSB clients. Yet, the increased load also increases the latency significantly for all three databases.

This analysis is accurate for Couchbase, which does not apply any replication strategy out of the box. The impact of the lazy (i.e. asynchronous) replication on the throughput behaviour of Cassandra and MongoDB remains unclear. Also, the routing strategy of the respective database can impact the throughput (as well as the latency): while Couchbase and Cassandra route messages through the database nodes in a peer-to-peer manner, MongoDB uses a centralised routing service. Even if this service does not impose measurable load on a virtual machine, the fact that all messages from the clients have to pass through one place may have a significant impact on the drop in throughput and increase in latency when adding more MongoDB instances. A reason may be that MongoDB relies on smart indexes for achieving high throughput; YCSB only uses simple indexing to provide comparable results.

B. Interpretation of Elasticity Results

The elasticity results raise even more questions than the results of the scalability evaluation. Only Couchbase exposes a behaviour a naive observer would expect. Adding a second node first takes some time for synchronisation, which also causes a drop in throughput. After a while, the system is fully operational again and is able to serve more requests per time unit than before.

In contrast to Couchbase, Cassandra requires more time to synchronise the new node with the existing cluster. While the drop in throughput (caused by the overload situation) stops at some point, Cassandra does not recover from the decrease, but continues performing at a lower average throughput than at peak time. This is remarkable, as the 2-node throughput for the elasticity scenario stays below the 2-node throughput for the scalability scenario. Here, the only difference is that the elasticity evaluation uses four clients instead of the two used for the scalability evaluation. The same holds for MongoDB, where the throughput does not change at all after scaling out. For this database, the reason for the periodic drops in throughput also raises questions. While we assume that garbage collection, internal compaction processes or similar mechanisms cause them, this needs to be verified.

C. Looking Ahead: Refining the Methodology

The scalability and elasticity evaluation provided throughout this paper has led to several surprising results. We deliberately follow a purely observational approach describing how the systems behave. While we had not expected to achieve near-linear performance when scaling out the databases (in particular not without tweaking their configurations), it is astonishing to see that after more than four decades of distributed systems research, the use of sharding can still lead to decreasing throughput and increasing latency.

The results gained here seem to indicate that we should move to a more analytical approach in order to understand why the platforms behave as they do. Nevertheless, we assume that such an approach is less lasting, as it is vulnerable to version upgrades and other changes to the algorithms used by the database implementations.

Instead, we suggest taking the work here as a starting point to develop an evaluation framework that is able to repeatedly and reliably characterise the scaling and elasticity behaviour of distributed databases. In order to do that, we shall refine our methodology. This allows us to isolate different triggers on the behaviour of the database (e.g. more load vs. higher replication degree, or the relation between throughput and latency). In the long run, this will have to involve the collection of large amounts of monitoring data and the execution of advanced data analytics.

VIII. RELATED WORK

DBaaS providers typically deploy a distributed database system on IaaS resources to offer an elastic database service on demand. Some of the first DBaaS offerings are Amazon's DynamoDB (https://aws.amazon.com/dynamodb/) and Google's BigTable (https://cloud.google.com/bigtable/), which use proprietary NoSQL databases. With the expansion of NoSQL, emerging open-source NoSQL databases are being considered as DBaaS storage backend, such as mLab (https://mlab.com/) using MongoDB or OpenStack Trove (https://wiki.openstack.org/wiki/Trove) using Cassandra or Couchbase. Hence, performance evaluation of distributed databases is an ongoing research area.

Due to the evolution of distributed databases and cloud computing, new benchmarking tools are required with a focus on scalable and elastic databases. Released in 2010, YCSB [6] became the de facto standard benchmarking tool for distributed databases. The original version was developed by Yahoo to evaluate their column-oriented data store PNUTS [14] against Cassandra, HBase and the relational MySQL Cluster. The evaluation shows scalability and elasticity results in which Cassandra and PNUTS achieve the best results. The evaluation does not consider key-value and document-oriented data stores or overloading the database explicitly.

Rabl et al. [4] evaluate six distributed databases (key-value, column-oriented and relational) in the light of storing monitoring data for application performance management (APM) systems. They use YCSB as benchmarking tool, but apply custom workloads reflecting typical APM workloads. Their evaluation focuses on two kinds of workloads: memory-bound, i.e. records fit completely into RAM, and disk-bound, i.e. records do not completely fit into RAM. The cloud context is not considered, as all evaluations are run on physical infrastructure. The results show for both workload types that Cassandra achieves the best scalability results, while elasticity is not evaluated.

Gandini et al. [15] evaluate the performance of three NoSQL databases (Cassandra, MongoDB and HBase) in the cloud. They use Amazon EC2 (http://aws.amazon.com/ec2) as evaluation environment. The evaluation focuses on scalability in relation to the number of cores in a single-server setup and to the number of nodes in a distributed setup. In addition, they analyse how the replication degree influences performance. Their results show that Cassandra as well as MongoDB increase their throughput for static cluster sizes, whereas an increasing replication degree decreases the throughput for both databases. However, the benchmarking methodology misses technical details such as the used database versions and the YCSB version, which harms the comparison with our results.

Klein et al. [16] evaluate Cassandra, MongoDB and Riak in the light of a health care use case with a customised version of YCSB. The evaluation focuses on the performance and scalability with respect to differently strict consistency levels. Their results attest Cassandra the highest scalability, whereas, similar to our results, MongoDB does not scale.

Konstantinou et al. [17] analyse the distributed databases Cassandra, HBase and Riak in the context of their TIRAMOLA framework for elastic NoSQL cluster provisioning. Their evaluation focuses on the cost in terms of time for scaling a database cluster and the optimal thresholds for executing a scaling action. Scalability and elasticity of distributed databases in the context of cloud computing is considered by [6], however without performing the evaluation in a fully transparent cloud environment. The elasticity aspect is considered by scaling a cluster at run-time, but without creating an explicit overload situation in the database.

IX. CONCLUSION AND FUTURE WORK

Cloud computing provides "unlimited" resources, which can be allocated on demand. Based on these prerequisites, distributed databases have gained significant focus in recent years. Whereas traditional relational databases offer distributed derivates, a new class of distributed databases emerged: NoSQL databases. By promising horizontal scalability and elasticity, NoSQL databases are commonly used for the cloud service model Database as a Service (DBaaS).

In this paper, we thoroughly evaluate the scalability and elasticity of the common NoSQL databases Cassandra, Couchbase and MongoDB with respect to a DBaaS environment. Therefore, we define two evaluation scenarios, one for scalability and one for elasticity. The scalability evaluation comprises different static cluster sizes to determine the scalability with growing cluster sizes. The elasticity is evaluated by overloading a database and scaling out the database at run-time to resolve the overload situation.

The scalability evaluation results show significant differences between the evaluated databases. Whereas Couchbase achieves the highest throughput and lowest latency results, Cassandra benefits the most from larger cluster sizes, in contrast to MongoDB, where a distributed setup even harms the performance. The elasticity results state that only Couchbase is able to resolve an overload situation at run-time, in contrast to Cassandra and MongoDB.

The discussion concludes with the main outcome of the evaluation results: more instances do not necessarily mean better performance. An interpretation of the results is provided in greater depth by discussing possible bottlenecks on the database and evaluation environment side. Whereas only Couchbase provides the expected results for scalability and elasticity, the results of Cassandra and MongoDB raise additional questions for deeper evaluations. Therefore, future work will focus on a deeper analysis of the gathered results by analysing the database distribution, including extensive resource monitoring and advanced data mining techniques. In addition, the evaluation of scalability and elasticity in larger-scale clusters will be targeted, as well as the evaluation of distributed databases running on microservices platforms.

REFERENCES

[1] P. Mell and T. Grance, "The NIST definition of cloud computing," 2011.
[2] S. Kächele, C. Spann, F. J. Hauck, and J. Domaschka, "Beyond IaaS and PaaS: An extended cloud taxonomy for computation, storage and networking," in Proceedings of the 2013 IEEE/ACM 6th International Conference on Utility and Cloud Computing. IEEE Computer Society, 2013, pp. 75–82.
[3] N. R. Herbst, S. Kounev, and R. Reussner, "Elasticity in cloud computing: What it is, and what it is not," in Proceedings of the 10th International Conference on Autonomic Computing (ICAC 13), 2013, pp. 23–27.
[4] T. Rabl, S. Gómez-Villamor, M. Sadoghi, V. Muntés-Mulero, H.-A. Jacobsen, and S. Mankovskii, "Solving big data challenges for enterprise application performance management," Proceedings of the VLDB Endowment, vol. 5, no. 12, pp. 1724–1735, 2012.
[5] D. Agrawal, A. El Abbadi, S. Das, and A. J. Elmore, "Database scalability, elasticity, and autonomy in the cloud," in International Conference on Database Systems for Advanced Applications. Springer, 2011, pp. 2–15.
[6] B. F. Cooper, A. Silberstein, E. Tam, R. Ramakrishnan, and R. Sears, "Benchmarking cloud serving systems with YCSB," in Proceedings of the 1st ACM Symposium on Cloud Computing. ACM, 2010, pp. 143–154.
[7] P. J. Sadalage and M. Fowler, NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Pearson Education, 2012.
[8] R. Cattell, "Scalable SQL and NoSQL data stores," ACM SIGMOD Record, vol. 39, no. 4, pp. 12–27, 2011.
[9] A. Lakshman and P. Malik, "Cassandra: A decentralized structured storage system," ACM SIGOPS Operating Systems Review, vol. 44, no. 2, pp. 35–40, 2010.
[10] D. Nelubin and B. Engber, "NoSQL failover characteristics: Aerospike, Cassandra, Couchbase, MongoDB," Thumbtack Technology, Inc., White Paper, 2013.
[11] D. Nelubin and B. Engber, "Ultra-high performance NoSQL benchmarking: Analyzing durability and performance tradeoffs," Thumbtack Technology, Inc., White Paper, 2013.
[12] E. Cecchet, A. Chanda, S. Elnikety, J. Marguerite, and W. Zwaenepoel, "Performance comparison of middleware architectures for generating dynamic web content," in Proceedings of the ACM/IFIP/USENIX 2003 International Conference on Middleware. Springer-Verlag New York, Inc., 2003, pp. 242–261.
[13] L. Breslau, P. Cao, L. Fan, G. Phillips, and S. Shenker, "Web caching and Zipf-like distributions: Evidence and implications," in INFOCOM'99, Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies, vol. 1. IEEE, 1999, pp. 126–134.
[14] B. F. Cooper, R. Ramakrishnan, U. Srivastava, A. Silberstein, P. Bohannon, H.-A. Jacobsen, N. Puz, D. Weaver, and R. Yerneni, "PNUTS: Yahoo!'s hosted data serving platform," Proceedings of the VLDB Endowment, vol. 1, no. 2, pp. 1277–1288, 2008.
[15] A. Gandini, M. Gribaudo, W. J. Knottenbelt, R. Osman, and P. Piazzolla, "Performance evaluation of NoSQL databases," in European Workshop on Performance Engineering. Springer, 2014, pp. 16–29.
[16] J. Klein, I. Gorton, N. Ernst, P. Donohoe, K. Pham, and C. Matser, "Performance evaluation of NoSQL databases: A case study," in Proceedings of the 1st Workshop on Performance Analysis of Big Data Systems. ACM, 2015, pp. 5–10.
[17] I. Konstantinou, E. Angelou, C. Boumpouka, D. Tsoumakos, and N. Koziris, "On the elasticity of NoSQL databases over cloud management platforms," in Proceedings of the 20th ACM International Conference on Information and Knowledge Management. ACM, 2011, pp. 2385–2388.

ACKNOWLEDGMENT

The research leading to these results has received funding from the EC's Framework Programme FP7/2007-2013 under grant agreement number 317715 (PaaSage) and the EC's Framework Programme HORIZON 2020 (ICT-07-2014) under grant agreement number 644690 (CloudSocket). In addition, we thank Alexander Rasputin for assistance in performing the evaluation.

Chapter 16 [core9] Towards a Framework for Orchestrated Distributed Database Evaluation in the Cloud

This article is published as follows:

Daniel Seybold. ”Towards a Framework for Orchestrated Distributed Database Evaluation in the Cloud”, 18th Doctoral Symposium of the 18th International Middleware Conference (MIDDLEWARE), published 2017, ACM, DOI: https://doi.org/10.1145/3152688.3152693

Reprinted with permission from ACM.

Towards a Framework for Orchestrated Distributed Database Evaluation in the Cloud
Daniel Seybold
Institute of Information Resource Management, Ulm University, Ulm, Germany
[email protected]

Abstract
The selection and operation of a distributed database management system (DDBMS) in the cloud is a challenging task as supportive evaluation frameworks miss orchestrated evaluation scenarios, hindering comparable and reproducible evaluations for heterogeneous cloud resources. We propose a novel evaluation approach that supports orchestrated evaluation scenarios for scalability, elasticity and availability by exploiting cloud resources. We highlight the challenges in evaluating DDBMSs in the cloud and introduce a cloud-centric framework for orchestrated DDBMS evaluation, enabling reproducible evaluations and significant rating indices.

CCS Concepts • Information systems → Database performance evaluation;

Keywords benchmarking, distributed database, cloud, NoSQL, NewSQL, orchestration

1 Introduction
In the last decade, database management systems (DBMSs) have evolved by also focusing on distributed database management systems (DDBMSs), as evolving application domains such as the Web or Big Data impose new challenges to Online Transaction Processing (OLTP) [13]. Thus, a plethora of new DDBMSs have appeared on the DBMS landscape, which can be classified into NoSQL and NewSQL. These DDBMSs are built on a shared-nothing architecture and promise to cater for non-functional requirements such as scalability, elasticity and availability by running on commodity hardware or even on cloud resources. As cloud computing offers on-demand resource provisioning, the cloud seems to be the preferable solution to operate DDBMSs.

Yet, with the vast number of available DDBMSs and the heterogeneous cloud resource offerings, the selection of a DDBMS and its operation in the cloud becomes a challenging task. Assume a DDBMS is required to continuously store social media data based on actual events. The data is read by a varying number of users and periodically queried by an analytics engine. Hence, the DDBMS needs to scale horizontally in case of growing workloads and provide elasticity to handle sudden workload peaks, created by social media events and the resulting user requests. Cloud resources are used to run the DDBMS, ensuring dynamic resource allocation on demand. Yet, as cloud resources can fail, the DDBMS needs to provide availability in case of cloud resource failures. Evaluating these requirements of existing DDBMSs is a common approach to guide the DDBMS selection process. Yet, current evaluation frameworks (EFs) do not explicitly consider the usage of heterogeneous cloud resources and lack support for orchestrated evaluation scenarios [11] with respect to scalability, elasticity and availability.

2 Problem Statement
Our research targets the enhancement of the DDBMS selection by providing a cloud-centric EF for the orchestrated scalability, elasticity and availability evaluation. In the scope of our research, we highlight the following key challenges:

Cloud Resource Characteristics need to be considered by the EF as cloud resources tend to become more heterogeneous, from virtual machines to container technologies, various storage technologies and resource locality options. Thus, our EF will be aware of these characteristics and enable access to cloud resources in a unified way, easing comprehensive, significant and reproducible results.

Orchestrated Evaluation Scenarios enable the evaluation of elasticity and availability in a comparable and reproducible way, by adapting the DDBMS topology through managing cloud resources. Consequently, predefined evaluation scenarios can be applied to generic DDBMS and cloud resource templates (CRTs) that will be executed by the EF.

Workload Domains such as OLTP or Hybrid Transaction and Analytical Processing (HTAP) are required for realistic evaluation scenarios. Thus, our EF will support domain-specific workload creation, based on synthetic and trace-based workloads.

Rating Indices on a DDBMS basis need to be computed based on the raw evaluation results to ease the DDBMS selection. While for established features such as performance, rating indices are available, comparable rating indices for elasticity and availability still need to be defined.

3 Related Work
One of the main drivers of non-functional DBMS feature evaluation is the Transaction Processing Performance Council (TPC, http://www.tpc.org), providing EFs for the OLTP domain. Yet, TPC rather focuses on the performance of relational DBMSs than on DDBMS features such as elasticity, availability or the usage of cloud resources. DDBMS-centric EFs such as the Yahoo Cloud Serving Benchmark (YCSB) [5] and its extensions YCSB++ [10] and YCSB+T [6] support the evaluation of scalability and consistency based on synthetic workloads. Advanced workload domains are addressed by BG [2], LinkBench [1] and OLTP-Bench [7], yet without the explicit consideration of cloud resources.


Evaluating the scalability and elasticity of DDBMSs by using cloud resources is presented by [9]. Yet, the evaluation relies on synthetic workloads and does not consider multi-cloud scenarios. A first attempt to evaluate availability is presented by UPB [8] by measuring the performance impact in case of node failures, while advanced scenarios including cloud resource failures, DDBMS failover and recovery capabilities are not considered.

While existing EFs rely on static evaluation scenarios, which either focus on advanced workloads for evaluating performance or use synthetic workloads without considering heterogeneous cloud resource configurations, we propose a novel EF enabling the orchestrated evaluation of scalability, elasticity and availability for DDBMSs based on heterogeneous cloud resources and multiple workload types.

4 Approach
Our initial DDBMS evaluation addresses the scalability and elasticity of DDBMSs in the cloud [12]. Our results show significant differences with respect to elasticity and the need for orchestrated DDBMS evaluation in order to provide adaptive and reproducible evaluation scenarios. Consequently, we analyze existing EFs with the focus on their evaluation scenarios and their consideration of cloud resources [11]. As existing EFs do not yet support orchestrated evaluation scenarios, elasticity and availability evaluation lacks dedicated support. In addition, the impact of heterogeneous cloud resources is not addressed by existing EFs.

Hence, we propose a novel EF, enabling orchestrated evaluation scenarios with the focus on scalability, elasticity and availability of DDBMSs in the cloud. Its architecture is depicted in Figure 1.

Figure 1. High-level evaluation framework architecture (evaluation API, evaluation scenario, DDBMS and CRT repositories, orchestrator, workload, DDBMS, evaluation monitor, measurement collector and rating index processor, connected by control and data flow).

The evaluation API enables the specification of evaluation scenarios for scalability, elasticity and availability. Each evaluation scenario comprises the properties workload (synthetic/OLTP/HTAP); a CRT defining providers, locations and resource dimensions; and a DDBMS template provided by the DDBMS/CRT repository. The DDBMS template exposes a unified set of non-functional configuration options to ensure comparable evaluation of different DDBMSs. Each evaluation scenario can specify adaptation actions for scalability, elasticity and availability for evaluating their correlation with cost-, performance- or locality-optimized CRTs. The execution of the specified evaluation scenario is enabled by the orchestrator component, which is realized by a cloud orchestration tool [3]. The orchestrator unifies the cloud resource access, provisions the required cloud resources and orchestrates the DDBMS, the workload and the DDBMS adaptations at run-time. During the evaluation, system- and DDBMS-specific monitoring data is collected and stored by the evaluation monitor. Based on this monitoring data, the orchestrator is able to adapt the DDBMS automatically, according to the specified adaptation actions of the evaluation scenario. The evaluation-specific metrics are collected by the measurement collector and retrieved by the rating index processor to compute significant rating indices.
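To make the notion of an evaluation scenario more tangible, the following sketch expresses such a specification as a plain data structure. It is a hypothetical illustration only; the property names (workload, crt, ddbms, adaptation) and all values are assumptions and not part of a published interface of the proposed framework.

# Hypothetical sketch of an evaluation scenario specification as it could be
# submitted to the evaluation API; all keys and values are illustrative assumptions.
import json

scenario = {
    "name": "elasticity-cassandra-openstack",
    "workload": {"domain": "OLTP", "type": "synthetic", "clients": 4, "threads": 20},
    "crt": {  # cloud resource template: provider, location and resource dimensions
        "provider": "openstack",
        "location": "eu-de",
        "flavor": {"cores": 4, "ram_gb": 4, "disk_gb": 40},
    },
    "ddbms": {"template": "cassandra", "initial_nodes": 2, "replication_factor": 1},
    "adaptation": [  # orchestrated actions executed at run-time by the orchestrator
        {"trigger": "after_runtime_s", "value": 300, "action": "scale_out", "nodes": 1}
    ],
}

# The orchestrator would consume such a document; here we only serialise it to
# show the intended structure.
print(json.dumps(scenario, indent=2))

Expressing the scenario declaratively keeps the DDBMS, CRT and adaptation choices comparable across runs, which is the precondition for the reproducibility targeted by the framework.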
5 Evaluation
The evaluation will follow a two-dimensional approach. The first dimension will evaluate the framework's features against distinguished guidelines of DBMS evaluation [4]. The second dimension comprises industry-driven evaluation scenarios for >= 5 DDBMSs based on cost-, locality- and performance-optimized CRTs. The resulting scalability, elasticity and availability insights are analyzed towards their significance in the DDBMS selection process.

6 Conclusion
The vast DDBMS landscape requires new evaluation framework concepts, considering heterogeneous cloud resources and the orchestration of the evaluation, including the workload generation and the DDBMS deployment and adaptation. Thus, we propose a novel evaluation framework for the orchestrated DDBMS evaluation of scalability, elasticity and availability by explicitly addressing the usage of cloud resources.

7 Acknowledgements
This work is done under the supervision of Prof. Dr.-Ing. Stefan Wesner. The author would like to thank Dr. Jörg Domaschka for his support. The research leading to these results has received funding from the EC's Framework Programme HORIZON 2020 under grant agreement number 644690 (CloudSocket) and 731664 (MELODIC).

References
[1] Timothy G. Armstrong, Vamsi Ponnekanti, Dhruba Borthakur, and Mark Callaghan. 2013. LinkBench: a database benchmark based on the Facebook social graph. In ACM SIGMOD.
[2] Sumita Barahmand and Shahram Ghandeharizadeh. 2013. BG: A Benchmark to Evaluate Interactive Social Networking Actions. In CIDR.
[3] Daniel Baur, Daniel Seybold, Frank Griesinger, Athanasios Tsitsipas, Christopher B. Hauser, and Jörg Domaschka. 2015. Cloud Orchestration Features: Are Tools Fit for Purpose? In IEEE/ACM UCC.
[4] David Bermbach, Jörn Kuhlenkamp, Akon Dey, Sherif Sakr, and Raghunath Nambiar. 2014. Towards an extensible middleware for database benchmarking. In TPCTC.
[5] Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking cloud serving systems with YCSB. In ACM SoCC.
[6] Akon Dey, Alan Fekete, Raghunath Nambiar, and Uwe Rohm. 2014. YCSB+T: Benchmarking web-scale transactional databases. In IEEE ICDEW.
[7] Djellel Eddine Difallah, Andrew Pavlo, Carlo Curino, and Philippe Cudré-Mauroux. 2013. OLTP-Bench: An extensible testbed for benchmarking relational databases. VLDB Endowment (2013).
[8] Alessandro Gustavo Fior, Jorge Augusto Meira, Eduardo Cunha de Almeida, Ricardo Gonçalves Coelho, Marcos Didonet Del Fabro, and Yves Le Traon. 2013. Under pressure benchmark for DDBMS availability. JIDM (2013).
[9] Jörn Kuhlenkamp, Markus Klems, and Oliver Röss. 2014. Benchmarking scalability and elasticity of distributed database systems. VLDB Endowment (2014).
[10] Swapnil Patil, Milo Polte, Kai Ren, Wittawat Tantisiriroj, Lin Xiao, Julio López, Garth Gibson, Adam Fuchs, and Billie Rinaldi. 2011. YCSB++: benchmarking and performance debugging advanced features in scalable table stores. In ACM SoCC.
[11] Daniel Seybold and Jörg Domaschka. 2017. Is Distributed Database Evaluation Cloud-Ready? In ADBIS.
[12] Daniel Seybold, Nicolas Wagner, Benjamin Erb, and Jörg Domaschka. 2016. Is elasticity of scalable databases a Myth? In IEEE Big Data.
[13] Michael Stonebraker. 2012. NewSQL: An alternative to NoSQL and old SQL for new OLTP apps. Communications of the ACM (2012).


Chapter 17 [core10] The Impact of the Storage Tier: A Baseline Performance Analysis of Containerized DBMS

This article is published as follows:

Material from: Daniel Seybold, Christopher B. Hauser, Georg Eisenhart, Simon Volpert, and Jörg Domaschka. ”The Impact of the Storage Tier: A Baseline Performance Analysis of Containerized DBMS”, 24th International European Conference on Parallel and Distributed Computing (Euro-Par): Parallel Processing Workshops, published 2018, Springer International Publishing, DOI: https://doi.org/10.1007/978-3-030-10549-5_8

Reprinted with permission from Springer Nature.

The Impact of the Storage Tier: A Baseline Performance Analysis of Containerized DBMS

Daniel Seybold, Christopher B. Hauser, Georg Eisenhart, Simon Volpert, and Jörg Domaschka

Institute of Information Resource Management, Ulm University, Ulm, Germany {daniel.seybold,christopher.hauser,georg.eisenhart,simon.volpert,joerg.domaschka}@uni-ulm.de

Abstract. Containers emerged as cloud resource offerings. While the advantages of containers, such as easing the application deployment, orchestration and adaptation, work well for stateless applications, the feasibility of containerization of stateful applications, such as database management systems (DBMS), still remains unclear due to potential performance overhead. The myriad of container operation models and storage backends even raises the complexity of operating a containerized DBMS. Here, we present an extensible evaluation methodology to identify performance overhead of a containerized DBMS by combining three operational models and two storage backends. For each combination a memory-bound and disk-bound workload is applied. The results show a clear performance overhead for containerized DBMS on top of virtual machines (VMs) compared to physical resources. Further, a containerized DBMS on top of VMs with different storage backends results in a tolerable performance overhead. Building upon these baseline results, we derive a set of open evaluation challenges for containerized DBMSs.

Keywords: Container · YCSB · Benchmarking · DBMS · MongoDB

1 Introduction

The rise of containers, containerization, and container orchestration [3] has had a great influence on the structure of distributed applications, and greatly eased the operation of such systems by finally leveraging the realisation of continuous deployment. Support for containers is offered besides traditional virtual machine offerings by Amazon Elastic Container Service (https://aws.amazon.com/ecs/) and OpenStack Magnum (https://wiki.openstack.org/wiki/Magnum). Much of the success of containers is a consequence of the fact that they enable a quick installation of pre-packaged software components, which is a prerequisite for handling overload (scale out), bug fixing (software upgrade), and

replacing failed components (fault tolerance). All of these concepts work fine for mostly stateless components such as load balancers, web and application servers, message queues, and also caches. Yet, despite recent attention in the field [2,14], it is currently unclear to what extent containerization is suited for and beneficial to the operation of stateful applications. Database management systems (DBMS) are an important representative of this type of applications and a crucial part of Big Data and IoT applications. While the containerization of DBMS particularly eases the usage of features of modern DBMS such as horizontal scalability or high availability, at least two challenges remain: (a) the general runtime overhead of containerized DBMS is unknown; (b) the container eco-system offers a myriad of different storage backends and their impacts on performance are also unclear.

Only with an answer to these baseline questions is it beneficial to think about more sophisticated questions such as the placement of state and data migration. This paper is an initial step to identify further research and engineering challenges with respect to containerized DBMS. Our contributions are as follows: (i) We introduce three different operational models for DBMS ranging from bare metal to containers in virtual machines. (ii) We analyse the landscape of storage backends for containers and their pros and cons. (iii) For three operational models and two storage backends we evaluate the performance of the well-known MongoDB (https://www.mongodb.com/de) DBMS under various workloads. In contrast to related work, our main focus is not on a performance comparison between containerized and virtualised execution. (iv) Based on the outcome of the evaluation, we propose open challenges for modelling and evaluating DBMS performance.

The remainder of this document is structured as follows: Sect. 2 discusses the containerization of stateful applications. Section 3 defines the evaluation methodology, while Section 4 presents the evaluation environment. Section 5 discusses the results and derives open evaluation challenges for containerized DBMS. Section 6 presents related work, and Sect. 7 concludes.

2 Challenges for Containerization of Stateful Applications

Containerization in the context of cloud computing denotes operating-system (OS) virtualization for containers, besides hardware virtualization for virtual machines (VMs). In hardware virtualization, a hypervisor manages the resource allocation and operating state of virtual machines. OS virtualization uses operating system features to create lightweight isolated environments, known as containers. Container engines, such as the popular Docker (https://www.docker.com) engine, allocate resources and access to e.g. networking and storage. Orchestrators manage VMs or containers across hypervisors or container engines [1,3,15]. These virtualization approaches provide the different operational models depicted in Fig. 1(a), where each operational model combines the benefits and drawbacks of the respective virtualization approaches: hardware virtualization securely isolates with fixed, hardware-oriented offers; OS virtualization provides less strong isolation with soft hardware limits [12].


Fig. 1. Containerization of stateful applications: (a) operational models (container on a physical server, VM, and container on top of a VM); (b) storage backends (local, host and remote mounts, e.g. HDD, SSD, iSCSI, NFS, S3).

In multi-tier applications, stateful components are required to store data temporarily or even durably, i.e. by instantiating (distributed) DBMSs or stateful caches. Figure 1(b) lists potential storage backends for VMs and containers, which leads to challenging decisions when deploying stateful containers, especially in large-scale set-ups. Challenges in this field include (i) performance aspects such as throughput and latency, (ii) support for scalability, i.e. considering parallel read/write access, and (iii) failure strategies and recovery mechanisms. Since VMs and containers isolate customers that share infrastructure, performance interferences may occur whenever resources (e.g. storage) are utilised that are not directly under the control of the container engine and the underlying kernel. The focus of our work is the performance aspect of containerized DBMS with respect to different storage backends, ranging from containerized DBMS on physical hardware over DBMS in VMs to containerized DBMS on top of VMs. The performance and runtime overhead of these approaches are evaluated in the following.

3 Evaluation Methodology

In this section, we define an extensible evaluation methodology for the identification of potential performance overhead of common operation models for containerized DBMS. In the following, the methodology is defined on a conceptual level, while Sect. 4 describes the technical implementation.

3.1 System Architecture

In order to provide a concise analysis of the potential performance overhead of containerized DBMS in the cloud, we define an extensible system architecture, which comprises three common operational models for DBMS, highlighted in the grey boxes in Fig. 1(a). Each operational model is defined by its virtualisation, i.e. OS virtualisation for containers, a hypervisor for VMs, or containers on top of VMs. Further, we apply two storage backends for the containerized DBMS, namely local for using the container filesystem and host for using the filesystem of the hosting resource, as depicted in Fig. 1(b). For the VM-based DBMS, we apply the local filesystem provided by the hypervisor. The resulting resource configurations are depicted in Table 1. While remote storage is also a common storage configuration for containerized DBMS, it is omitted in this work to reduce the interference factor of the network and will be targeted in future evaluations. In addition, we do not use any container-specific network virtualisation as the focus lies on compute, memory and storage.

Table 1. Operational models and storage backends

ID      Operational model [physical (P), container (C), VM]   Storage backend [local (L), host (H), remote (R)]
P-C-L   Physical + container                                  Local
P-C-H   Physical + container                                  Host
VM-L    VM                                                    Local
VM-C-L  VM + container                                        Local
VM-C-H  VM + container                                        Host

3.2 Workload and DBMS

In favour of emulating container-centric workloads, we define a write-heavy (w-h) workload, emulating the storage of sensor data, and a read-heavy (r-h) workload, emulating a social media application with mostly reads and barely any update operations. Both workloads are defined in a memory-bound version, i.e. the whole data set fits into memory, and a disk-bound version, i.e. the data is larger than the available memory. As workload generator, we select the Yahoo Cloud Serving Benchmark (YCSB) [4], which is widely used in performance studies on NoSQL DBMSs. YCSB offers web-based workloads based on create, read, update and delete (CRUD) operations, enabling the emulation of container-centric workloads [10].

As exemplary containerized DBMS, we select the document-oriented MongoDB for our evaluation, as it is a popular NoSQL DBMS (cf. https://db-engines.com/de/ranking). MongoDB emphasizes its operation on virtualised resources (cf. https://www.mongodb.com/containers-and-orchestration-explained). Records are stored as documents within collections.

While MongoDB supports a distributed architecture, we select a single-node setup for our evaluation to reduce potential interference factors such as network jitter or MongoDB-specific data distribution algorithms. Yet, our methodology can easily be extended to a distributed setup, and MongoDB can also be exchanged with any desired DBMS.

3.3 Metrics

For each evaluation scenario, the following metrics are collected to analyse the results: throughput in operations per second and latency per operation type in µs. Each evaluation scenario is repeated ten times to ensure significant results, and for all metrics the minimum, maximum, average and standard deviation are provided. In addition, system metrics (CPU, RAM, I/O, network) are monitored during each evaluation scenario for MongoDB and the YCSB to provide reliable results by ensuring that none of the system resources creates a bottleneck.
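As an illustration of the metric processing described above, the following sketch aggregates the per-repetition throughput values of one scenario into the reported minimum, maximum, average and standard deviation. The function name and the sample values are made up for illustration; they do not reflect measured results or the actual evaluation tooling.

# Minimal sketch: aggregate per-repetition results (ten runs per scenario) into the
# reported minimum, maximum, average and standard deviation. Values are illustrative.
from statistics import mean, stdev

def aggregate(samples):
    """Return the summary statistics reported for each metric."""
    return {
        "min": min(samples),
        "max": max(samples),
        "avg": mean(samples),
        "stddev": stdev(samples) if len(samples) > 1 else 0.0,
    }

# e.g. throughput in operations per second over ten repetitions of one configuration
throughput_ops = [8123, 8340, 8011, 8290, 8188, 8251, 8102, 8275, 8330, 8067]
print(aggregate(throughput_ops))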

3.4 Evaluation Execution

Our methodology comprises the memory-bound (mb) and disk-bound (db) evaluation scenarios. Each scenario starts with the w-h workload, followed by the r-h workload. Each workload is executed against the resource combinations of operational models and storage backends presented in Table 1. Hence, the execution (E) of the memory-bound and disk-bound scenarios can be expressed as scenario:E(mb(w-h,r-h)), e.g. P-C-L:E(mb(w-h,r-h)) and P-C-L:E(db(w-h,r-h)).
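The execution matrix implied by this notation can be enumerated programmatically. The sketch below lists all executions for the configurations of Table 1; the run() placeholder is an assumption standing in for whatever automation actually triggers a benchmark run.

# Enumerate the executions scenario:E(bound(w-h, r-h)) for all combinations of
# operational model/storage backend (Table 1), data-set bound and workload.
# run() is a placeholder for the actual benchmark automation.
from itertools import product

configurations = ["P-C-L", "P-C-H", "VM-L", "VM-C-L", "VM-C-H"]
bounds = ["mb", "db"]            # memory-bound, disk-bound
workloads = ["w-h", "r-h"]       # write-heavy first, then read-heavy

def run(configuration, bound, workload):
    print(f"{configuration}:E({bound}({workload}))")

for configuration, bound in product(configurations, bounds):
    for workload in workloads:   # each scenario starts with w-h, followed by r-h
        run(configuration, bound, workload)

Enumerating the combinations in this way yields the 20 evaluation configurations mentioned in the conclusion of this chapter.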

Fig. 2. Evaluation environment: the five configurations P-C-L, P-C-H, VM-L, VM-C-L and VM-C-H (MongoDB in Docker containers and/or VMs) together with the YCSB VM, all hosted on a single OpenStack-managed CoreOS host with SSD storage.

4 Evaluation Environment

Based on the evaluation methodology introduced in Sect. 3, the following presents its implementation for a private, OpenStack-based cloud (https://www.openstack.org/, version Pike) with full and isolated access to all physical and virtual resources. In order to reduce potential resource interference and to guarantee reproducible results, we use the availability zones feature of OpenStack to dedicate one physical host for spawning the required VMs and containers. The resulting evaluation environment for the specified evaluation scenarios is depicted in Fig. 2. In the following, the implementation details for the resources, MongoDB and YCSB are presented.

Table 2. Evaluation scenario resources

Resource            Virtualisation     OS             Cores    RAM     FS        Storage
Physical host       -                  CoreOS 1632    16 (a)   64 GB   Ext4      512 GB (b)
MongoDB container   Docker 18.04       Ubuntu 16.04   4        4 GB    overlay2  40 GB
MongoDB VM          KVM, QEMU 1.5.3    Ubuntu 16.04   4        4 GB    Ext4      40 GB
YCSB VM             KVM, QEMU 1.5.3    Ubuntu 16.04   4        2 GB    Ext4      10 GB
(a) 2x Intel Xeon E5-2630 v3 8-Core Haswell 2.4 GHz
(b) 2x 256 GB SSD of type SAMSUNG MZ7WD240HAFV-00003

4.1 Resources

As depicted in Fig. 2, all containers and VMs are located on the same physical host, which has enough resources for running the YCSB VM and the DBMSs without resource interference (i.e. no overbooking). Further, this set-up only uses the host-internal network interfaces and avoids the overhead of the OpenStack network service. Accordingly, all containers are configured to use the host network interface via --network host. The available resources of the respective physical host, container and VMs are described in Table 2. In order to ensure comparable results, the container resources on the physical host (i.e. P-C-L and P-C-H) are limited to 4 cores and 4 GB RAM. The containers on CoreOS use the kernel version 4.14.19-coreos while the VM and the container inside the VMs use the kernel version 4.4.0-127-generic.
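For the container-based configurations, the difference between the local and the host storage backend comes down to whether the MongoDB data directory lives in the container's own overlay2 filesystem or on a bind-mounted host path. The following sketch shows how the two variants could be started with the resource limits and host networking described above; the image tag, host path and container names are assumptions and not taken from the published evaluation artifacts, and the two variants would be run one after the other, not concurrently.

# Sketch of starting the containerized MongoDB for P-C-L (container-local, overlay2)
# and P-C-H (bind-mounted host filesystem). Image tag, paths and names are assumptions.
import subprocess

COMMON = [
    "docker", "run", "-d",
    "--network", "host",      # use the host network interface (cf. Sect. 4.1)
    "--cpus", "4",            # limit to 4 cores ...
    "--memory", "4g",         # ... and 4 GB RAM for comparability with the VMs
]

# P-C-L: data stays in the container filesystem (overlay2 storage driver)
subprocess.run(COMMON + ["--name", "mongodb-local", "mongo:3.6.3"], check=True)

# P-C-H: /data/db is bind-mounted from the host filesystem (Ext4 on SSD);
# started only after the P-C-L container has been stopped and removed.
subprocess.run(
    COMMON + ["--name", "mongodb-host", "-v", "/srv/mongodb:/data/db", "mongo:3.6.3"],
    check=True,
)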

4.2 MongoDB and YCSB

The evaluation scenarios are based on a vanilla deployment of MongoDB and the YCSB to ensure a baseline performance evaluation of MongoDB containerization. The relevant configurations for MongoDB and the YCSB are listed in Table 3. Further, the YCSB operation distribution for the w-h workload is 100% write operations and for the r-h workload 95% read operations and 5% update operations. Table 3 also highlights the overall collection size of each workload version: the number of records of the memory-bound version results in a 2 GB MongoDB collection, while the disk-bound version results in a 10 GB MongoDB collection.


Table 3. YCSB VM details

MongoDB configuration   Value        YCSB configuration            Value
Version                 3.6.3 (CE)   Version                       0.12 (a)
Services                1 × mongod   Record size                   1 KB
Storage engine          WiredTiger   # of records (memory-bound)   2.000.000
Replication             off          # of records (disk-bound)     10.000.000
                                     # of operations               10.000.000
                                     # of threads                  20
                                     Distribution                  Zipfian
(a) https://github.com/brianfrankcooper/YCSB/releases/tag/0.12.0

The MongoDB binding of YCSB is configured with the write concern options (cf. https://docs.mongodb.com/manual/reference/write-concern/) w = 1 and j = false, i.e. write operations are acknowledged by MongoDB after they are put into memory.
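The parameters of Table 3 map directly onto YCSB's core properties. The sketch below drives the YCSB command line with the MongoDB binding for the disk-bound scenario; the installation path, the MongoDB URL and the mapping of the w-h/r-h workloads onto YCSB's load phase and workload B are assumptions, not the exact commands used in the published automation.

# Sketch: invoke the YCSB CLI with the MongoDB binding for the disk-bound scenario.
# Paths, URL and the load/run split are assumptions for illustration.
import subprocess

YCSB_HOME = "/opt/ycsb-0.12.0"
YCSB = f"{YCSB_HOME}/bin/ycsb"
MONGO_URL = "mongodb://10.0.0.5:27017/ycsb?w=1&journal=false"  # write concern w=1, j=false

common = [
    "-p", f"mongodb.url={MONGO_URL}",
    "-p", "recordcount=10000000",       # disk-bound data set (~10 GB collection)
    "-p", "operationcount=10000000",
    "-p", "requestdistribution=zipfian",
    "-threads", "20",
]

# write-heavy workload: 100% inserts, here realised via the YCSB load phase
subprocess.run(
    [YCSB, "load", "mongodb", "-P", f"{YCSB_HOME}/workloads/workloada"] + common,
    check=True,
)

# read-heavy workload: 95% reads / 5% updates, as in YCSB's predefined workload B
subprocess.run(
    [YCSB, "run", "mongodb", "-P", f"{YCSB_HOME}/workloads/workloadb"] + common,
    check=True,
)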

4.3 Portability and Reproducibility

The execution of each scenario is fully automated by utilizing ready-to-deploy artifacts, which are, together with the results, publicly available (https://github.com/omi-uulm/Containerized-DBMS-Evaluation); their release as open research data is currently under way. For the Docker images, we make use of the Docker-native capabilities of building images based on Dockerfiles. The VM images are generated by Packer (https://packer.io). Packer processes a Packerfile, which is similar to a Dockerfile, but uses a multitude of different virtualization providers to generate and store the image. In our case we are using OpenStack Glance (https://docs.openstack.org/glance/pike/). This approach enables fellow researchers to reproduce, validate and extend our scenarios by changing the cloud provider or benchmarking a different DBMS.
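To give an impression of what such an image build looks like, the following sketch emits a minimal Packer template (JSON format) for an OpenStack-built VM image. All identifiers (image name, source image, flavor, network, installed package) are placeholders and the provisioning step assumes the MongoDB apt repository is already configured; the actual Packerfiles used for the evaluation are part of the published artifacts.

# Minimal sketch of a Packer template for building a MongoDB VM image via the
# OpenStack builder and storing it in Glance. All identifiers are placeholders.
import json

template = {
    "builders": [{
        "type": "openstack",
        "image_name": "mongodb-3.6.3-ubuntu-16.04",
        "source_image": "<ubuntu-16.04-base-image-id>",
        "flavor": "m1.medium",
        "ssh_username": "ubuntu",
        "networks": ["<network-id>"],
    }],
    "provisioners": [{
        "type": "shell",
        "inline": [
            "sudo apt-get update",
            # assumes the MongoDB apt repository has been added in an earlier step
            "sudo apt-get install -y mongodb-org=3.6.3",
        ],
    }],
}

with open("mongodb-vm.json", "w") as fh:
    json.dump(template, fh, indent=2)
# The image would then be built with: packer build mongodb-vm.json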

5 Results and Discussion

In the following, we present and discuss the results of the memory- and disk-bound evaluation scenarios (cf. Table 1) based on the metrics defined in Sect. 3.3.

5.1 Evaluation Results

The throughput results are depicted in Fig. 3 and the latency results in Fig. 4. Each plot represents the results of the respective scenario, i.e. memory-bound or disk-bound, and the respective workload, i.e. w-h or r-h. For the latency plots of the r-h workloads, the first bar of each operational model always represents the read latency while the second bar represents the update latency. As a remark, the results reflect the best-case operational models, as the DBMS and the YCSB are operated on the same, isolated physical host (cf. Sect. 4.1).

Fig. 3. Throughput results

The results show a significant throughput and latency overhead for operating a DBMS on top of VMs instead of using physical hardware. These results confirm previous performance studies of memory-bound workloads for former Docker versions [5]. A novel insight is shown by the results for the DBMS operated in a container on top of a VM (VM-C-L, VM-C-H), as the performance only decreases slightly compared to the DBMS directly operated on VMs (VM-L); e.g. VM-C-H achieves 6% less throughput than VM-L for the w-h workload and 13% less for the r-h workload of the disk-bound scenario. Hence, if VMs are the only available resource, operating the DBMS in a container on top of the VMs can be beneficial to exploit container orchestrators or the soft resource limits to operate additional containerized applications next to the DBMS on the same VM [12].

The second insight of the results is the performance overhead of the internal Overlay2 filesystem of Docker. The container on physical hardware with the Overlay2 filesystem (P-C-L) shows significantly less throughput and higher latencies compared to the container using the host filesystem (P-C-H). This finding most clearly applies to the r-h workload of the disk-bound scenario (cf. Figs. 3(d) and 4(d)). The Overlay2 overhead is also present for containers running on VMs (VM-C-H), but to a lower extent.

Fig. 4. Latency results

5.2 Open Evaluation Challenges

The results of our baseline evaluation show that containers are suitable to operate DBMS, even on top of VMs. Yet, the operational models reveal significant performance deviations, also dependent on the memory- or disk-bound scenarios. Hence, the selection of the operational model in conjunction with the storage backend is a crucial decision for the DBMS operator, which has to be driven by the available operational models, the targeted performance and the demand for optional orchestration features.

Based on our methodology and the presented baseline results, we derive a set of open evaluation challenges, which have to be addressed to drive the selection process of the operational model for containerized DBMS: (i) The performance of the presented operational models needs to be evaluated based on public cloud offerings by considering additional hypervisors and containers with respect to memory- and disk-bound DBMS workloads. (ii) The presented storage backends require a dedicated evaluation with respect to different local and remote container storage drivers. This also comprises local and remote block storage of the host resource. (iii) As the DBMS performance deviations of the operational models VM-L, VM-C-L, and VM-C-H are within a tolerable margin, the advantages of VM-C-L and VM-C-H have to be analysed with respect to orchestration and the co-location with suitable applications. (iv) The presented methodology needs to be extended to additional DBMS to evaluate their containerization feasibility and container orchestration features with respect to the scalability and elasticity of distributed DBMS [11]. Further, container orchestration features for high availability and migration of containerized DBMS need to be evaluated in the context of the presented operational models and storage backends.

6 Related Work

With the increasing usage of containers besides VMs in cloud offerings, different comparative analyses of their performance overhead and resource isolation capabilities have been conducted. Moreover, the containerization of DBMS has moved into focus, especially in combination with container orchestrators.

6.1 Performance Overhead and Resource Isolation

The performance overhead of VMs in contrast to Docker containers running on physical hardware is evaluated by [5]. SysBench (https://github.com/akopytov/sysbench) is used to compare the throughput of MySQL running on VMs against containerized MySQL. The results show that VMs cause a higher performance overhead than Docker containers for disk-intensive workloads. Further, the usage of the Docker AUFS storage driver causes a higher performance overhead than the usage of Docker volumes. A related performance comparison of KVM VMs, Docker and LXC containers and a lightweight VM approach based on OSv (http://osv.io/) is provided by [7]. The evaluation is based on different resource-specific micro-benchmarks and the results accord with [5] regarding the lower performance of VMs for disk-intensive workloads. An analysis and evaluation of the Docker storage drivers with respect to filesystem performance is presented by [13]. The results demonstrate that the choice of the storage driver can influence the filesystem performance significantly, where the Btrfs storage driver achieves the best performance but less stability than the other storage drivers.

A comparative analysis of the resource isolation capabilities of VMs and containers is provided by [8,12,16]. While [12] apply resource-specific micro-benchmarks, [8,16] use DBMS and respective DBMS workloads to evaluate the resource isolation capabilities. All of these evaluations indicate a stronger resource isolation of VMs, especially for disk-bound workloads.

While existing performance evaluations focus either on micro-benchmarks or apply DBMS only for the evaluation of the resource isolation, our evaluation provides an evaluation across multiple operational models and storage backends. In addition, the operation of containers on top of VMs is emphasized by [12], but so far no performance evaluation has considered this operational model.


6.2 Containerization of DBMS

The containerization of DBMS in combination with container orchestrators provides multiple adaptation actions to automate the operation of NoSQL DBMSs [2]. Hereby, the orchestration features of Kubernetes are enhanced with distributed DBMS-specific adaptation rules for proactive and low-cost adaptations to avoid the transfer of data between nodes. While container orchestrators typically manage the disk storage internally, modifying the persistent storage within container orchestrators is difficult and hinders their adoption for DBMS [6]. Therefore, [6] present a persistent storage abstraction layer for container orchestrators, which eases the usage of containerized DBMS across different container orchestrators. Yet, the usage of container orchestrators and their internal handling of persistent storage can introduce additional performance overhead for DBMS. Hence, [14] analyse the performance overhead of using remote storage for containerized DBMS within Kubernetes.

While the usage of containerized DBMS with container orchestrators eases the automation of DBMS operation, there is potential performance overhead added by the different handling of the persistent storage by the container orchestrator. While [14] provide a first step towards analysing this overhead for remote storage, we provide a baseline evaluation for container local and host storage backends. In addition, we apply a memory- and a disk-bound workload to identify the suitability of containers for the respective DBMS workloads.

7 Conclusion and Future Work

The evolvement of containers leads to a variety of new operational models for distributed applications in the cloud. While containers work fine for stateless applications, stateful applications such as database management systems (DBMS) have been receiving increasing attention recently. As DBMS add the persistence aspect to the operational model, storage backends for containers are evolving. Yet, the performance impact of these new operational models and storage backends remains unclear for containerized DBMS. Hence, we analyse current operational models and storage backends in the context of containerized DBMS and derive a baseline evaluation methodology for a comparative evaluation of operational models and storage backends. Hereby, we define a memory- and a disk-bound scenario, which is applied to three operational models (container on physical hardware, virtual machines (VMs) and container on VMs) in combination with two storage backends (container filesystem and host filesystem), resulting in 20 evaluation configurations. The evaluation is executed in a private OpenStack cloud with a containerized MongoDB. The results show a significant performance overhead of containers running on VMs in contrast to containers running on physical hardware. Yet, running containers on VMs with different storage backends only causes a tolerable performance impact in contrast to running the DBMS directly on the VM. Further, the usage of the container-internal filesystem causes a significant performance overhead compared to using the host filesystem.

Based on these baseline results, we conclude that containers are suitable to operate DBMS, but additional evaluations are required to get a clear understanding of potential performance bottlenecks. Therefore, we derive a set of open evaluation challenges, which will be addressed in future work: (i) evaluating additional operational model implementations; (ii) evaluating additional storage backends; (iii) consolidation of containerized DBMS and processing applications on top of VMs; and (iv) the feasibility of DBMS containerization and orchestration for different DBMS. These challenges will be addressed within [9].

Acknowledgements. The research leading to these results has received funding from the EC’s Framework Programme HORIZON 2020 under grant agreement number 731664 (MELODIC) and 732667 (RECAP).

References

1. Baur, D., Seybold, D., Griesinger, F., Tsitsipas, A., Hauser, C.B., Domaschka, J.: Cloud orchestration features: are tools fit for purpose? In: UCC, pp. 95–101. IEEE (2015)
2. Bekas, E., Magoutis, K.: Cross-layer management of a containerized NoSQL data store. In: 2017 IFIP/IEEE Symposium on Integrated Network and Service Management (IM), pp. 1213–1221. IEEE (2017)
3. Burns, B., Grant, B., Oppenheimer, D., Brewer, E., Wilkes, J.: Borg, Omega, and Kubernetes. Queue 14(1), 10 (2016)
4. Cooper, B.F., Silberstein, A., Tam, E., Ramakrishnan, R., Sears, R.: Benchmarking cloud serving systems with YCSB. In: ACM Symposium on Cloud Computing, pp. 143–154. ACM (2010)
5. Felter, W., Ferreira, A., Rajamony, R., Rubio, J.: An updated performance comparison of virtual machines and Linux containers. In: ISPASS, pp. 171–172. IEEE (2015)
6. Mohamed, M., Warke, A., Hildebrand, D., Engel, R., Ludwig, H., Mandagere, N.: Ubiquity: extensible persistence as a service for heterogeneous container-based frameworks. In: Panetto, H., et al. (eds.) OTM 2017. Lecture Notes in Computer Science, pp. 716–731. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-69462-7_45
7. Morabito, R., Kjällman, J., Komu, M.: Hypervisors vs. lightweight virtualization: a performance comparison. In: IC2E, pp. 386–393. IEEE (2015)
8. Rehman, K.T., Folkerts, E.: Performance of containerized database management systems. In: DBTEST, p. 6. ACM (2018)
9. Seybold, D.: Towards a framework for orchestrated distributed database evaluation in the cloud. In: Proceedings of the 18th Doctoral Symposium of the 18th International Middleware Conference, pp. 13–14. ACM (2017)
10. Seybold, D., Domaschka, J.: Is distributed database evaluation cloud-ready? In: Kirikova, M., et al. (eds.) ADBIS 2017. CCIS, vol. 767, pp. 100–108. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-67162-8_12
11. Seybold, D., Wagner, N., Erb, B., Domaschka, J.: Is elasticity of scalable databases a myth? In: IEEE Big Data, pp. 2827–2836. IEEE (2016)
12. Sharma, P., Chaufournier, L., Shenoy, P., Tay, Y.: Containers and virtual machines at scale: a comparative study. In: Proceedings of the 17th International Middleware Conference, p. 1. ACM (2016)
13. Tarasov, V., et al.: In search of the ideal storage configuration for Docker containers. In: FAS*W, pp. 199–206. IEEE (2017)
14. Truyen, E., Reniers, V., Van Landuyt, D., Joosen, W., Bruzek, M.: Evaluation of container orchestration systems with respect to auto-recovery of databases (2017)
15. Verma, A., Pedrosa, L., Korupolu, M., Oppenheimer, D., Tune, E., Wilkes, J.: Large-scale cluster management at Google with Borg. In: Proceedings of the Tenth European Conference on Computer Systems, p. 18. ACM (2015)
16. Xavier, M.G., De Oliveira, I.C., Rossi, F.D., Dos Passos, R.D., Matteussi, K.J., De Rose, C.A.: A performance isolation analysis of disk-intensive workloads on container-based clouds. In: PDP, pp. 253–260. IEEE (2015)

Chapter 16 [core11] Towards Understanding the Performance of Distributed Database Management Systems in Volatile Environments

This article is published as follows:

Jörg Domaschka and Daniel Seybold. "Towards Understanding the Performance of Distributed Database Management Systems in Volatile Environments", Symposium on Software Performance, published 2019, Gesellschaft für Informatik, URL: https://pi.informatik.uni-siegen.de/stt/39_4/01_Fachgruppenberichte/SSP2019/SSP2019_Domaschka.pdf

©by the authors.

Towards Understanding the Performance of Distributed Database Management Systems in Volatile Environments

Jörg Domaschka, Daniel Seybold
[email protected], [email protected]
Ulm University, Ulm, Germany

Abstract

Cloud computing provides scalability and elasticity mechanisms on resource level and has become the preferred operational model for many applications. These, in turn, are based on distributed architectures trusting that this leads to scalability and elasticity and hence, good performance. Many applications rely on one or multiple database management systems (DBMS) as storage backends in order to manage their persistent state. Hence, the selection of a DBMS for a specific use case is crucial for performance and other non-functional properties. Yet, the choice is cumbersome due to the large number of available systems and the many impact factors ranging from the size of virtual resources, the type of the DBMS, and its architecture and scaling factor. In this paper, we summarise our experiences with performance evaluation for cloud-hosted DBMS in order to find well-suited configurations for specific use cases. We demonstrate that the overall performance of a distributed DBMS depends on three major domains (workload, cloud environment, and DBMS) with various parameters for each dimension.

1 Introduction

Database Management Systems (DBMS) are a major building block of today's applications. They sit at the heart of any Web-business application and of many types of IoT applications, including geographically spread-out installations such as Smart Cities and Industry 4.0 sites; they are the backbone of serverless applications. Currently, there are more than 200 distributed DBMS available as commercial and open source products that all claim to be reliable, scalable, and elastic while providing superior performance [7]. Yet, analyses of existing systems show that they differ a lot and suggest that the DBMSs need to be carefully selected based on the use case characteristics [3]. Only then can the DBMS deliver sufficient throughput and latency while at the same time satisfying potential non-functional requirements such as availability.

Cloud computing, virtualisation, and containerization offer appealing approaches to host DBMS in particular as they offer a natural way to quickly provision compute, storage and networking resources, and hence, realise the technical underpinning for dynamic scaling and adaptations. On the downside, relying on such resources introduces volatility in service quality, as many critical aspects of the infrastructure are no longer in the hands of the DBMS operator. Further, there is an overwhelming selection of different cloud offerings with different impacts on the performance and service quality of the DBMS.

Thus, the decision for a cloud-hosted DBMS requires an understanding of the DBMS domain (which DBMS provides which features and what feature has which impact on performance); the cloud domain (which cloud configuration has which impact on performance); and finally the workload domain (what is the read/write/update ratio facing the DBMS and what consistency and reliability demands exist).

In this paper, we summarise the key insights gained over a series of papers [4,6,8] investigating the impact of the overall problem over the DBMS and the cloud domain. Due to limitations of space, we leave out the workload domain, as this is not something that can be influenced by an operator. Understanding the workload is hence a prerequisite for the selection process and benchmarking done in that process [5].

2 DBMS Impact Factors

The performance of non-distributed applications is mostly determined by the capabilities of the particular hardware (CPU, memory, I/O) running the application. For distributed applications, the communication network plays a role as well as communication protocols causing e.g. queuing, drops, latency, and waiting time. In contrast to stateless applications, stateful applications may need to coordinate the state of their component instances, causing traffic and load that are only mediately rooted in external workload, but are a consequence of internal processes such as garbage collection and consistency protocols.

In distributed DBMS (DDBMS), multiple instances of a DBMS provide the client with the impression of a logical DBMS. DDBMS provide two basic mechanisms to ensure scalability, availability, and reliability: sharding and replication. Hence, when designing a cloud-hosted DDBMS, the operator needs to size the individual instances (memory, #cores, type and amount of storage), and decide on the cluster size (cf. the number of shards), the replication degree (cf. reliability and availability), and read and write consistency (cf. programmability and availability) [3,8].

These decisions result in the following impact factors for operating DDBMS.

2.1 Sharding and Scale

The use of sharding distributes the data set of the logical DDBMS over the available DBMS instances. Consequently, sharding increases the overall capacity of a DDBMS when more instances are added. Sharding increases the overall throughput in cases where the workload is uniformly distributed over the shards.

2.2 Consistency and Replication

A consistency model defines in which order operations issued from multiple users to different data tuples may interleave. Hence, the consistency model defines which changes to a tuple are visible to a user.

Replication creates multiple copies of a single data tuple. This protects the tuple against data loss in case the DBMS instance hosting the shard containing that tuple fails. Hence, replication increases the reliability of both the DDBMS and individual tuples. Depending on consistency requirements, the use of replication may either increase or decrease throughput. In case of weak consistency, operations may be targeted to different replicas of a tuple. For strong consistency, more than one replica needs to be read/written.

The use of replicated tuples introduces the need to keep the tuples in sync with each other. According to the CAP theorem, availability and consistency are mutually dependent in any distributed and stateful system [1]. This leaves a continuum of possible trade-offs which has caused an increasing heterogeneity in the DDBMS landscape [7]. In consequence, many DDBMS have introduced specific configuration options to optimize for specific use cases. Often, this makes consistency and availability guarantees incomparable and evaluations even more challenging.

2.3 Resource Sizing

When operating a DDBMS, there is a trade-off between using multiple compute resources (e.g. virtual machines) or fewer larger ones. While the overall performance increases the fewer and the larger the individual resources are, the more vulnerable the system gets to failures. Also, replacing failed instances takes longer when larger portions of the overall state fail and need to be synced. Besides these general considerations, the number of compute cores influences the parallelism in processing client requests, while the amount of memory influences the ratio of the data set that can reside in memory.

For storage, considerations are even more challenging due to the different types of available cloud offerings [2]. In IaaS clouds, usually three options exist: (i) use a virtual machine with large storage, (ii) attach volumes to the virtual machine as block devices, (iii) include (mount) a remote file system into the virtual machine. Users of cloud services may further build custom storage hierarchies from these basic services. For instance, they may run their own volume-based distributed file system within their virtual machines and mount this into other virtual machines.

2.4 Examples

Figures 1 and 2 show the average throughput over five experiments (including the standard deviation) of three different write consistency settings for Apache Cassandra (CA) and Couchbase (CB). While their consistency mechanisms cannot be mapped 1:1, the applied write consistency increases from the first to the third setting. Moreover, each plot comprises the throughput of different cluster sizes, showing the scalability of the respective DBMS under a fixed workload. The results clearly show that a stronger write consistency has a significant throughput impact for CB, while for CA the impact is negligible. More details and results are available in [8].

[Figure 1: Impact of DBMS Domain: Cassandra [8]. Throughput in ops/s (avg with stddev, max, min) for the write consistency settings ANY, ONE and TWO on 3-, 5-, 7- and 9-node clusters.]

[Figure 2: Impact of DBMS Domain: Couchbase [8]. Throughput in ops/s (avg with stddev, max, min) for the write consistency settings NONE, R-1 and P-1 on 3-, 5-, 7- and 9-node clusters.]
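To make the interplay between replication degree and write consistency more tangible, the following sketch shows how these two parameters could be set for Apache Cassandra, one of the two systems compared above. It is a minimal illustration under stated assumptions, not the evaluation harness used in [8]: it relies on the DataStax Python driver, and the contact points, keyspace and table layout are placeholders.

```python
# Minimal sketch, not the evaluation harness used in [8]: configuring the
# replication degree and the write consistency level for Apache Cassandra
# with the DataStax Python driver. Addresses and names are placeholders.
from cassandra import ConsistencyLevel
from cassandra.cluster import Cluster, ExecutionProfile, EXEC_PROFILE_DEFAULT

# Stronger write consistency (e.g. TWO instead of ANY) forces more replicas to
# acknowledge every write before the client returns, which costs throughput.
profile = ExecutionProfile(consistency_level=ConsistencyLevel.TWO)
cluster = Cluster(["10.0.0.1", "10.0.0.2", "10.0.0.3"],
                  execution_profiles={EXEC_PROFILE_DEFAULT: profile})
session = cluster.connect()

# Replication degree: every tuple of this keyspace is kept on three replicas.
session.execute(
    "CREATE KEYSPACE IF NOT EXISTS ycsb WITH replication = "
    "{'class': 'SimpleStrategy', 'replication_factor': 3}")
session.execute(
    "CREATE TABLE IF NOT EXISTS ycsb.usertable "
    "(y_id text PRIMARY KEY, field0 text)")

# Each write is acknowledged only once the configured consistency level is met.
session.execute(
    "INSERT INTO ycsb.usertable (y_id, field0) VALUES (%s, %s)",
    ("user1", "value1"))
cluster.shutdown()
```

Sweeping the consistency level over ANY, ONE and TWO while keeping the workload and cluster size fixed corresponds to the parameter variation behind Figure 1.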
3 Cloud Impact Factors

The performance of an application running on a cloud infrastructure heavily depends on many cloud provider design decisions, but also on operator design decisions, as discussed in the following [6,8].

3.1 Infrastructure

The maximum overall performance of a cloud-hosted application depends on the physical hardware capabilities of the cloud infrastructure, including the type of physical CPUs and storage as well as networking. Further, it depends on configuration choices the cloud provider has made, such as the network topology or the physical set-up of storage: SSDs vs. HDDs, hierarchical RAID systems, distributed file systems, hyperconverged infrastructure, etc.

Even though these aspects are mostly differentiating factors between cloud providers, customers can influence these decisions to some extent: for instance, they can select specific virtual machine types, regions, and availability zones that are known to be backed by a certain type of physical hardware. Yet, even if they are known to the clients, their impact on performance is hard to estimate and hence mostly unclear.

3.2 Resource Types

The resource type captures the abstraction a cloud provider offers to its customers. It defines whether the resource provided is a bare metal server, a virtual machine, a (Docker) container, or at an even higher layer. This decision has a significant impact on performance. The type of resource is often an immediate consequence of choosing a particular cloud provider.

3.3 Examples

Figure 3 shows the performance of heterogeneous cloud resources based on the average write throughput (with stddev) over ten experiments of a single MongoDB (MDB) instance deployed on the following cloud resource types: P-C-L, a physical server (P) with an MDB container (C) using the local Overlay2 container filesystem (L); P-C-H, using the host filesystem (H) of the physical server; P-VM-L, running MDB in a VM using the VM filesystem; and P-VM-C-L, running MDB in a container on top of a VM using the local Overlay2 container filesystem. The results point out the impact of using Overlay2 with containers or VMs to operate DBMS. More results and conclusions are available in [6].

[Figure 3: Impact of Cloud Resource Types [6]. Throughput in ops (avg with stddev, max, min) for the resource types P-C-L, P-C-H, P-VM-L and P-VM-C-L.]
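The difference between the P-C-L and P-C-H setups comes down to where the container's data directory lives. The sketch below, which is not the deployment tooling used in [6], illustrates both configurations with the Docker SDK for Python; the image tag, container names and host path are assumptions, and the first case presumes the Docker daemon is configured with the Overlay2 storage driver.

```python
# Minimal sketch, not the deployment tooling used in [6]: starting MongoDB in
# two storage configurations via the Docker SDK for Python. Image tag, names
# and the host path are placeholders.
import docker

client = docker.from_env()

# P-C-L-style setup: /data/db stays in the container's writable layer, so all
# database writes pass through the (assumed) Overlay2 storage driver.
overlay_mdb = client.containers.run(
    "mongo:4.0", name="mdb-overlay", detach=True)

# P-C-H-style setup: /data/db is bind-mounted from the host filesystem, which
# bypasses the Overlay2 layer for the data directory.
hostfs_mdb = client.containers.run(
    "mongo:4.0", name="mdb-hostfs", detach=True,
    volumes={"/mnt/mongo-data": {"bind": "/data/db", "mode": "rw"}})
```

Running the same write-heavy workload against both containers isolates the overhead that the container filesystem adds on top of the host storage, which is the effect visible in Figure 3.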
4 Conclusion

This paper summarises the design space when running distributed database management systems (DDBMS) in (volatile) cloud environments. Using examples, we illustrated that the overall performance (measured by throughput) of a DDBMS depends on the services offered by cloud providers and the storage types used. Even minor changes in the set-up of experiments may have a large impact on performance. Besides these external factors, there are huge differences between the DDBMS themselves, and the choice of a DDBMS needs to be well considered and measured against the workload and application requirements.

Ongoing and future work extends the results shown here with the consideration of non-functional requirements. Here, we focus on the evaluation of both scalability/elasticity and availability guarantees of different DDBMS under advanced workloads. Finally, we are currently investigating the impact of overbooking and noisy neighbours on the performance of DDBMS.

References

[1] E. Brewer. "CAP twelve years later: How the 'rules' have changed". In: Computer 2 (2012), pp. 23–29.
[2] S. Kächele et al. "Beyond IaaS and PaaS: An Extended Cloud Taxonomy for Computation, Storage and Networking". In: 2013 IEEE/ACM 6th International Conference on Utility and Cloud Computing. Dec. 2013, pp. 75–82.
[3] J. Domaschka, C. B. Hauser, and B. Erb. "Reliability and Availability Properties of Distributed Database Systems". In: 2014 IEEE 18th International Enterprise Distributed Object Computing Conference. Sept. 2014, pp. 226–233.
[4] D. Seybold et al. "Is elasticity of scalable databases a myth?" In: 2016 IEEE International Conference on Big Data (Big Data). IEEE. 2016, pp. 2827–2836.
[5] D. Seybold and J. Domaschka. "Is Distributed Database Evaluation Cloud-Ready?" In: European Conference on Advances in Databases and Information Systems. Springer. 2017, pp. 100–108.
[6] D. Seybold et al. "The Impact of the Storage Tier: A Baseline Performance Analysis of Containerized DBMS". In: European Conference on Parallel Processing. Springer. 2018, pp. 93–105.
[7] S. Mazumdar et al. "A survey on data storage and placement methodologies for Cloud-Big Data ecosystem". In: Journal of Big Data 6.1 (2019), p. 15.
[8] D. Seybold et al. "Mowgli: Finding your way in the DBMS jungle". In: Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering. ACM. 2019, pp. 321–332.
