Management of polyglot persistent integrations with virtual administrators

Thomas Clauwaert

Supervisors: Prof. dr. ir. Filip De Turck, Dr. ir. Gregory Van Seghbroeck
Counsellors: ing. Merlijn Sebrechts, Dr. ir. Gregory Van Seghbroeck

Master's dissertation submitted in order to obtain the academic degree of Master of Science in Information Engineering Technology

Department of Information Technology
Chair: Prof. dr. ir. Bart Dhoedt
Faculty of Engineering and Architecture
Academic year 2017-2018

Preface

It is crazy how fast the past few months went by. I have tried my best to research and learn as much as possible while also implementing interesting things. It was not always easy and from time to time I got stuck here and there. Looking back, I am glad about what I achieved but the big added value for me is the priceless experience I’ve gained throughout this journey.

It is impossible to list every single person that helped me throughout this period but a few people deserve to be in the spotlight. First and foremost, I want to thank prof. Filip De Turck, dr. ir. Gregory Van Seghbroeck and ing. Merlijn Sebrechts for writing out this thesis proposal and providing the opportunity for a student like me to tackle this research. Merlijn especially deserves a round of applause for all the guidance and patience he had when I was stuck or in need of some advice. Next, a big shout out to all the people on the IRC channel of Juju. Even though the community is rather small, the people out there really want to help you. Finally, I’m grateful to all my friends and family for their support and every single piece of advice. You guys were great!

Thomas Clauwaert

Ghent, June 2018

Permission for use on loan

“The author(s) gives (give) permission to make this master dissertation available for consultation and to copy parts of this master dissertation for personal use. In the case of any other use, the copyright terms have to be respected, in particular with regard to the obligation to state expressly the source when quoting results from this master dissertation.”

Thomas Clauwaert, June 2018

Abstract

Data management plays a crucial role in the area of information technology, as it impacts the efficiency of the system in use. End users often expect these systems to be responsive and available at any time. Good infrastructure design choices that provide flexibility and scalability are therefore crucial building blocks of modern applications. In the state of the art, many different systems have been proposed, each with advantages and disadvantages in a number of key areas. The traditional relational database is still the predominant system, although NoSQL systems are finding their way into many application stacks. Modern systems often combine several database systems, which makes the development effort a lot more complex. Industry therefore relies on modern data administrators or operations engineers who have the know-how to use, set up and manage these polyglot persistent applications. Since these people are hard to find, developers and data scientists are looking for other solutions to simplify the operation of different technologies.

The goal of this thesis is to propose a service which transparently manages different database systems. The idea behind a script or tool often lies in performing specific tasks which would otherwise need to be performed manually. Such tools can be seen as virtual administrators that perform predefined tasks. In this thesis several possibilities are investigated to create a virtual administrator that is responsible for the management of polyglot persistent applications and all their derivatives. The generic database service as presented in this research offers an easy-to-use platform where users request a specific database technology and the service itself takes care of installing all required components and sharing all needed information. The virtual administrator makes its own choices in deciding what services need to be deployed in order to provide the requested database technology. This way developers can ask the virtual administrator for a database technology and a database name, and they end up with the connection details to use it. The developer becomes self-reliant and the time needed to get the requested operational tasks done is reduced significantly.

A proof of concept of the generic database service was made in the application modelling tool Juju. With the help of a use case and the reactive framework, a requesting service can successfully request multiple databases of different types. The generic database service then correctly shares the details with the requesting service. It only acts as a proxy that relays the database details. Furthermore, the service is resource-demanding and more database technologies should be supported. In iterative steps, support for any database technology could be added so that the end result becomes a full-fledged application ready for use in Juju. The idea behind the generic database service is not bound to Juju and can be (re-)used in other environments aiming to achieve the same goal.

Summary

Data plays a crucial role in most information and technology systems. The way data is processed determines how efficient a system really is. Because end users expect a system to respond and be available at any moment, the infrastructure design choices form the building blocks of modern applications. There is a wide choice of database technologies. Traditional relational database systems are still the most popular, but NoSQL technologies have also found their way into application infrastructures. In modern systems, developers are challenged to use different database technologies in their applications, depending on the type of data. These heterogeneous data storage techniques result in a more complex infrastructure as more, and more varied, technologies are used. For that reason, modern database administrators or operations engineers are needed who know how to use, configure and manage these different systems.

The goal of this master's dissertation is to propose a service that helps to tackle this problem. Machines, computers and technology in general help people to automate many processes. The idea behind a script (or tool) often lies in performing specific tasks that would otherwise have to be carried out manually. In a sense, they are virtual administrators. This dissertation investigates whether it is possible to create a virtual administrator that is responsible for managing these heterogeneous data storage techniques. The generic database service, as presented in this research, offers an easy-to-use platform where users request a database technology and the service itself takes care of installing all required components and sharing all information. The virtual administrator decides for itself which services must be set up when a database is requested. In this way the developer becomes independent of a physical administrator, and the time needed to complete the requested operational tasks decreases significantly.

A proof of concept of the generic database service was built in the application modelling tool Juju. With the help of a use case and the reactive framework, an application can successfully request multiple databases of different types. The generic database service then correctly shares the details with the original application. The implementation of the service acts only as a proxy that passes on the database details. Moreover, the service requires (too) many resources, and more database technologies need to be supported. In iterative steps, support for any database technology can be added, so that the end result becomes a full-fledged application. Furthermore, the idea behind the generic database service is not bound to Juju and can be (re)used in other environments to achieve the same goal.

Virtual administrators for the management of heterogeneous data storage techniques
Thomas Clauwaert
Supervisors: prof. Filip De Turck, dr. ir. Gregory Van Seghbroeck, dr. ir. Tim Wauters, ing. Merlijn Sebrechts

Abstract: In a world where everything must be available 24/7, the maintenance of services and applications is of crucial importance. Designing, setting up and eventually managing these application infrastructures are often the biggest challenges for modern system administrators. This master's dissertation investigates how databases can be made ready for use more easily by means of virtual system administrators. These virtual entities take over the task of the system administrator and configure what is needed automatically, without the complexity of use being lost in the process. This allows non-experts in database technologies to have a database type of their choice available for use in a smooth, flexible and simple way. Using a practical example, the generic database service is defined in a functional analysis. This use case is then implemented with the application modelling tool Juju. The result of this research shows that the generic database service can successfully carry out the proposed use case. The implementation of the generic database therefore lays the foundation for further support and development towards a full-fledged service. Because the service is still at an early stage, it has a number of limitations, such as limited support for different database technologies and the need for a few predefined preconditions. Given further support, the service allows nearly any use of different database types in Juju. The concept can also be (re)used in other environments that aim for the same goals.

Keywords: Service orchestration, polyglot persistence, Juju, automation, configuration management, virtual system administrators, heterogeneous database technologies, system administration

I. INTRODUCTION

Never before has so much data been generated as today. It therefore often forms the core of applications or services. This information must be stored in databases. Where mainly relational systems used to be used for data storage, there are now also non-relational variants, categorised under the term NoSQL. This means that information is stored on different systems, depending on the application and the shape of the data. Modern database administrators are expected to be familiar with these different technologies. These people, who are hard to find, are often asked to set up new systems or to hand out the connection details of a database. In an ideal scenario, data analysts or developers would be able to do this themselves. Because non-experts lack operational knowledge, this is often not the case. Virtual system administrators that take on this task offer a good alternative that lightens the work of both parties and abstracts away the operational knowledge. Data analysts or developers can thus quickly get started with a database of their choice and focus on their own expertise.

II. BACKGROUND

The number of machines, services and applications that a modern system administrator has to manage has increased enormously in recent years. Thanks to services such as Amazon AWS, Google cloud computing or Azure, it has become easier to make machines operational quickly. From a business point of view these services are also interesting because the price is often determined by the resources used. Configuration management tools and Agile or DevOps philosophies help system administrators and developers to achieve faster workflows. They can set up machines and configure services by means of scripts. Configuration management tools fall short, however, when users do not know exactly which configuration they want. Specific features, certain parameters and the markup language of the tools must be known in order to use them well. Not everyone has this operational knowledge, and this often slows down the whole process.

The first part of this research concerns virtual administrators. “A Taxonomy and Survey of Cloud Resource Orchestration Techniques” [1] and “Orchestrator Conversation: Distributed Management of Cloud Applications” [2] discuss recent challenges and state-of-the-art solutions relating to cloud computing. This research shows that orchestration techniques enable virtual administrators to rescale automatically and flexibly. This approach, in which applications can react to each other dynamically, lies at the basis of automation. Sebrechts et al. propose the reactive pattern to set up and manage services in Juju optimally [3]. This pattern is used in the implementations together with the application modelling tool Juju [4]. Juju and the TOSCA cloud modelling standard [5] have a lot in common. This specification makes it possible to represent (cloud) applications by means of models. It describes how different applications relate to and interact with each other, or which workflow they must follow. Typical of the TOSCA language is that it records what a service needs in order to function properly as well as what the service offers to other services. This can also be found in Juju and forms the basis of the service proposed in this research.

The second part on which this research rests is found in the database sector. When an application uses several different database technologies, it is called a “polyglot persistent” application (see also figure 1). With an enormous range of database technologies, it is not always obvious which one to choose; in this research it is assumed that the database types wanted by the applications are known.

Fig. 1. Example of a polyglot persistence application. The application is only fully operational when the different databases of the different database types are usable. [Figure labels: webshop; user-related information, recommendation information, items for sale; relational (e.g. MySQL), document-based (e.g. MongoDB), graph-based (e.g. Neo4J).]

III. GENERIC DATABASE

Suppose a company decides to build a webshop application. It chooses, for example, to store all user data in one kind of database. All information about the items offered for sale, in turn, turns out to be stored more efficiently in another type. Setting up such a webshop application requires three different services. The two different database technologies must be operational. Next, the databases must be created and the correct connection details filled in. Only after these steps can the webshop application become operational. The goal of the generic database service is to carry out these steps automatically.

Figure 2 shows that the generic database works as a proxy service between the application that requests a database and the application that can provide one. In the application models this means that the semantic value of the generic database service is an atomic database: the generic database represents one database and nothing else. Applications that use this service are able to request a database. When they do, the generic database becomes concrete and the details needed to connect to the database are shared. Applications that subsequently use the same generic database service, which is now concrete, thereby use the same database.

A final major remark about these application models (also visible in figure 2) is the difference in perspective between the operations side and the application side. An application that uses the generic database service is not aware of other services while it is being set up and configured. Once these tasks are done, the application works with a direct connection to the database (wherever it is located). From the application's point of view, the generic database is then no longer known. This means that developers or data analysts do not need to take the generic database service into account. It works purely at the level of system administration and is used only when setting up databases and exchanging connection details.

Fig. 2. Application model of the generic database service. This service is the point of contact for services that need a database. Note that after the necessary details have been exchanged, the original service does make a direct connection to the database. Within the application logic there is no notion of the generic database service. [Figure labels: Operations view: requesting service, request, generic database service, request, database technology service. Application view: requesting service, direct connection, database technology service.]

IV. IMPLEMENTATION

Based on a use case, a separate web application is built alongside the generic database service. This charm (the Juju term for services) covers providing a virtual machine, installing the Apache software and setting up a few web pages. The web application can only function properly once a configuration file has been correctly filled in with the connection parameters of a database.

Next, reusing existing interface and charm layers for database technologies is an important guideline. Other charm authors and Juju users have already built these services in the past. It is one of the strengths of Juju that elements (often in the form of layers) can be reused; Lego bricks are a fitting metaphor.

The actual generic database charm and interface layer are ultimately the core of the use case. Through the reactive framework and the endpoint pattern, communication takes place by means of flags. Figure 3 shows exactly how the communication proceeds over the relation (the interface layer). The whole process can be summarised as follows:

1. A service requests that a database be set up.
2. The generic database service receives this request through the generic database layer.
3. The generic database service makes a formal request through the interface layer of the corresponding database type.
4. The database technology service receives the request, creates the database and shares the details.
5. The generic database receives the details and in turn shares them with the original service from step 1.
6. The service receives the details and generates the configuration file.

By means of the flags mentioned earlier, the generic database service also ensures that when a new service connects to this generic database service, the same information is shared. This was not possible in an automated way in Juju.

Fig. 3. The communication between the two charms takes place through the generic database interface layer. The different endpoints use this API to pass on information. [Figure labels: charm layer “webapp” (generic application, requires side: wants database, request, gets database); interface layer “generic-database”; charm layer “generic database” (provides side: gets request, ensures database exists and is available, share_details, provides details).]
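The six steps above can be illustrated with a small, framework-free Python sketch. It is not the charm code of this dissertation and does not use the Juju or charms.reactive APIs: the class names, the `request` and `render_config` helpers, the configuration format and the address `10.0.0.5` are all hypothetical and chosen only for illustration. The sketch assumes that a generic database represents exactly one database, so that the first request makes it concrete and later requesters receive the same details.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ConnectionDetails:
    """What a requesting service needs to fill in its configuration file."""
    technology: str
    host: str
    port: int
    database: str


class TechnologyService:
    """Hypothetical stand-in for a concrete database service (step 4)."""

    def __init__(self, technology: str, host: str, port: int) -> None:
        self.technology = technology
        self.host = host
        self.port = port
        self.created: List[str] = []  # names of the databases created so far

    def create_database(self, name: str) -> ConnectionDetails:
        self.created.append(name)
        return ConnectionDetails(self.technology, self.host, self.port, name)


class GenericDatabase:
    """Hypothetical proxy that represents exactly one database (steps 2, 3, 5)."""

    def __init__(self, backends: Dict[str, TechnologyService]) -> None:
        self.backends = backends
        self.details: Optional[ConnectionDetails] = None  # None: not yet concrete

    def request(self, technology: str, name: str) -> ConnectionDetails:
        if self.details is None:
            # First request: forward it to the matching technology service.
            self.details = self.backends[technology].create_database(name)
        # Later requesters receive the details of the same, now concrete, database.
        return self.details


def render_config(details: ConnectionDetails) -> str:
    """Step 6: the requesting service turns the details into a config file."""
    return (
        f"type = {details.technology}\n"
        f"host = {details.host}\n"
        f"port = {details.port}\n"
        f"name = {details.database}\n"
    )


if __name__ == "__main__":
    mysql = TechnologyService("mysql", "10.0.0.5", 3306)  # made-up address
    gdb = GenericDatabase({"mysql": mysql})
    webshop = gdb.request("mysql", "webshop_users")  # step 1
    analysis = gdb.request("mysql", "other_name")    # a service that connects later
    print(render_config(webshop))
    print(webshop == analysis, mysql.created)
```

In this sketch the second requester's arguments are ignored once the generic database is concrete. That is one possible reading of the "same information is shared" behaviour described above; a real charm would express it with flags and relation data rather than a cached attribute.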

V. CONCLUSION

The goal of this master's dissertation is to determine to what extent it is possible to build a service that, regardless of the database technology, can provide a database and share the connection parameters. Based on a use case, this minimal proof of concept was successfully implemented in the application modelling tool Juju. Before it can be called a full-fledged service, the generic database service still has a number of points to work on. The performance of the proof of concept is not optimal and there is still much room for supporting more database technologies. Once these technologies are properly implemented in Juju (through the reactive framework and interface layers), the generic database service will be an ideal complement, offering more flexibility and an easier, automated workflow when setting up applications. An additional advantage is that the whole concept can be (re)used outside Juju.

REFERENCES

[1] Denis Weerasiri, Moshe Chai Barukh, Boualem Benatallah, Quan Z. Sheng, and Rajiv Ranjan, “A taxonomy and survey of cloud resource orchestration techniques,” ACM Comput. Surv., vol. 50, no. 2, pp. 26:1–26:41, May 2017.

[2] M. Sebrechts, C. Johns, G. Van Seghbroeck, T. Wauters, B. Volckaert, and F. De Turck, “Orchestrator conversation: Distributed management of cloud applications,” 2018.

[3] M. Sebrechts, C. Johns, G. Van Seghbroeck, T. Wauters, B. Volckaert, and F. De Turck, “Beyond generic lifecycles: Reusable modeling of custom-fit management workflows for cloud applications,” 2018.

[4] Canonical, “ of Juju,” https://jujucharms.com/, 2018, Accessed: 2018-05-01.

[5] OASIS Committee, “TOSCA Simple Profile in YAML Version 1.2,” https://docs.oasis-open.org/tosca/TOSCA-Simple-Profile-YAML/v1.2/TOSCA-Simple-Profile-YAML-v1.2.html, 2017, Accessed: 2018-05-01.

Contents

List of Figures

List of Tables

List of Listings

1 Introduction

1.1 IT experts as a scarce resource

1.2 Virtual administrators & Operations knowledge

1.3 Polyglot persistence

1.3.1 Definition

1.3.2 Example

1.3.3 Concerns

1.4 Problem statement

1.5 Goal

1.6 Research questions

1.7 Overview

2 Background

2.1 From infrastructure to infrastructure as code

2.2 Service orchestration

2.3 OASIS TOSCA

2.4 Juju

2.4.1 What is Juju

2.4.2 Juju internals

2.4.3 Juju as a solution

2.4.4 Alternatives

2.5 Database technologies

2.5.1 History

2.5.2 Types

2.5.3 Relational database management systems

2.5.4 Not-relational database systems

3 Functional specification

3.1 Terms and visualisations

3.1.1 Application modelling

3.1.2 OASIS TOSCA

3.2 Example use case: company X

3.3 The generic database

3.3.1 Definition

3.3.2 Design choices

3.4 Possible scenarios

3.5 Use case revisited

3.6 Caveats

4 Technical implementation

4.1 Juju specific terms

4.2 Example use case: company X

4.3 The generic-database-charm

4.3.1 Design Choices

4.3.2 Other possibilities

4.3.3 The generic database under the hood

4.4 Use case revisited

5 Discussion, Future work & Conclusion

5.1 Discussion

5.2 Answers to research questions

5.3 Future work

5.4 Conclusion

Bibliography

Appendices

List of Figures

1.1 Stack Overflow survey 2018 [1]

1.2 Traditional communications between operations and non-operations

1.3 The concept of a virtual administrator

1.4 Example of a web shop application using multiple types of database technologies, illustrating the concept of polyglot persistence.

2.1 The difference between service orchestration and service choreography

2.2 Sequence diagram of an example use case where a buyer interacts with an agent. Afterwards the agent formally performs all needed actions and communicates back with the buyer [2].

2.3 Resource entities and relationships of a Web application [3].

2.4 Example of an application topology illustrating the terms used by the TOSCA standard [4].

2.5 TOSCA node types have requires and capabilities sections to fit together just like Lego pieces.

2.6 Juju GUI showing two applications (MySQL and Wordpress) connected to each other. The Wordpress application is in need of a database, which is provided by the MySQL service.

2.7 Charm (bash template) structure

2.8 The architecture of the charms.reactive framework: when the orchestrator executes a hook, the reactive framework initiates and runs the handlers whose preconditions are true [5].

2.9 The workflow of the automatically set flags in the endpoint

2.10 Disadvantages of relational database systems. An extra mapping step might be needed from the data structure in the application to the data structure on the database [6].

2.11 Most popular technologies in March 2018

2.12 Data platform map in 2016, illustrating the different types of database systems and technologies [7].

2.13 Example of an entity–relationship model illustrating how relational systems use tables and relations to store and link data.

3.1 Example application model of the Wordpress and MySQL services.

3.2 Logical diagram example of the TOSCA standard showing 3 nodes connected through “HostedOn” relations. Nodes are defined by a name and a type and typically have a “properties” and a “capabilities” section. Some nodes also have a requirements section indicating what they need in order to function. Two nodes can have a relation if the requirements of one conform to the capabilities of another [8].

3.3 Application model of use case: company X

3.4 Example hierarchy or categorisation of database technologies. The root of the tree is the most generic database whereas the leaves represent technology-specific databases.

3.5 Application model of use case: company X with generic databases.

3.6 The generic database service is only functional and present on the operations side of the application stack. The application itself directly connects to the database and is not aware of the generic database service.

4.1 Application model with both the generic database charm and the generic database dba charm.

4.2 BPMN diagram of the generic database charm concept.

4.3 Sequence diagram of the implemented generic database service. It is assumed that the database technology service is available. If the database is not concrete, a request is sent to set up the database. In the other scenario the generic database already knows the connection details.

4.4 Visualisation of the interface layer of the generic database. The black nodes can be seen as endpoints in charms. The interface layer is the API that tells how the charms should communicate.

4.5 Application model of the use case as shown in the Juju GUI service. Two generic database services represent two databases used by a webshop and a data analysis application.

1 Application model of the project. Two new charm layers will be created along with one interface layer. The “” charm and “mysql-shared” interface will be reused.

2 Metadata and layer files of the two charm layers. They are the heart of the applications, allowing them to connect to each other.

3 Visualisation of the interface layer of the generic database. The black nodes can be seen as endpoints in charms. The interface layer is the API that tells how the charms should communicate.

List of Tables

2.1 Summary of the most important terms in the TOSCA standard [8].

2.2 Juju terms

2.3 Relational database and SQL terms

2.4 Overview of non-relational (NoSQL) technologies

3.1 Relation between the conceptual terms and how they are visualised.

3.2 Different (possible) definitions for the generic database concept

4.1 Summary of the Juju terms that are relevant in this chapter, and their meaning.

List of Listings

1 Example of a Puppet manifest

2 Example of a config.yaml file in Juju.

3 Example of the Haproxy metadata.yaml file in Juju.

4 Example of an install hook in Juju.

5 Example of a handler in the reactive framework in Juju.

6 Pseudocode illustrating how flags and the endpoint pattern are used in the reactive framework of Juju.

7 Interface.yaml file of the proxy interface layer

8 Code of testwebapp/reactive/testwebapp.py that starts the workflow of the use case with a request for a database.

9 Code of interfaces/proxy/requires.py

10 Code of interfaces/proxy/requires.py

11 Code of gdb-charm/reactive/gdb-charm.py

12 Code of testwebapp/reactive/testwebapp.py to render config file.

1 Introduction

One of the most crucial and central subjects in information technology (IT) systems is data. In most cases all that information needs to be stored in databases. The way data flows are treated often determines how easily services can be managed. Different database technologies have different strengths and weaknesses. These days, significant downtime is no longer considered acceptable: applications and all their stored data need to be accessible at any time. The lifecycle of how to store, process and analyse the bits and bytes is of great importance. Operations and system engineers have the difficult task of creating, deploying and maintaining services. In many cases the central design considerations focus on application development more than on service operations, which might reduce the quality of the cloud application. “The (5+1) Architectural View Model for Cloud Applications” brings attention to all aspects of cloud applications [9]. Not only methodologies like Agile and DevOps but also outsourcing approaches have gained popularity over the last few years, as they provide better development cycles, automation and scalability [10]. In this thesis, new state-of-the-art approaches for managing (cloud) applications are examined. With the gathered knowledge, the concept of a service that helps system and operations engineers in their day-to-day activities is constructed. Thanks to service orchestration and application modelling tools it becomes possible to define virtual (system) administrators. Operational tasks that were once the job of operations engineers or system administrators are now performed in an automated way. These virtual entities aid data scientists or developers in obtaining the configuration parameters of the environments they want, without the need for a physical operations engineer.


1.1 IT experts as a scarce resource

The digital and technical evolution has resulted in ecosystems where different types of data have become the central entity, no matter what the actual business is. Guerra et al. [11] describe this phenomenon and emphasize that even computer science and engineering courses lack the “pedagogical practice in face of a reality that has required multidisciplinary, multidimensional, global, and contextualized preparation”. This shows the lack of data analysis subjects and exercises in programs that focus on programming, computational insight and other IT-related skills. In a paper called “Integrating NoSQL, Relational Database, and the Hadoop Ecosystem in an Interdisciplinary Project involving Big Data and Credit Card Transactions”, Rodrigues et al. [12] show how they tackled a real-life use case concerning big data tools and technologies with a group of 60 graduate students over the course of 17 academic weeks. The paper shows the complexity of the big data subject and the technical and mathematical requirements and tools that are needed before proper analysis can be performed (graduates need to understand the concepts of NoSQL, Hadoop, MapReduce and Hive before they are able to start performing analyses). This shows that an IT-minded person needs to know a lot of different things before their workflow becomes fluent.

The previous paragraph briefly discussed that it is not always easy to find a lot of people with good hands-on experience. Education institutions need time to re-educate themselves and transition their courses to these subjects. In addition, the number of different topics to discuss in computer science and engineering studies keeps growing while there is only a fixed, limited amount of time. Therefore, it is difficult to find or become an IT expert who has the mathematical and statistical foundations, the skill set to program properly and the knowledge to set up and maintain services. This group of people would grow if know-how of services and systems were not required as a part of their workflow. A recent survey of Stack Overflow [1] shows that most people fall under the category “developer” (almost 60% identify themselves as Back-end developer). Job titles such as “Database administrator” (not even 15%), “System administrator” (11.3%), “DevOps specialist” (10.4%) and “Data scientist or machine learning specialist” (7.7%) are less present and it is safe to assume that fewer people are employed in these areas. Figure 1.1 illustrates the differences in numbers on the X-axis. In other words IT experts, no matter the expertise, are scarce, especially maintainers of applications and services such as operations engineers or system administrators. The State of Developer Ecosystem Survey in 2018 is another recent study, performed by JetBrains, that confirms the imbalance between developers and other IT experts [13].

Figure 1.1: Stack Overflow survey 2018 [1]

1.2 Virtual administrators & Operations knowledge

Before defining the problem statement and goal of this research, some terms and concepts need to be defined. Virtual administrators and operations knowledge are key points, crucial for understanding what the goal of this research is and why certain approaches are used.

Virtual administrators

To properly define the concept of a virtual administrator, it is interesting to look at some daily work cycles. These days, most teams have real-life administrators to perform operations. These operational tasks may vary from setting up and configuring new services and machines to monitoring existing applications. Some companies also outsource these tasks as they often want to focus on one specific thing without the cost and complexity of the tasks of a system administrator or network engineer. The same can be said about testing, quality assurance (QA), infosec or analytics. Figure 1.2 shows a common workflow of how non-operations-minded people work together with system administrators. It is clear that there are at least two stakeholders: the person requesting something and the system administrator providing it. Because multiple people are involved, time management and planning become crucial. People rarely have the time and ability to perform operational tasks immediately, resulting in a slow process. The DevOps philosophy introduced awareness and techniques to join forces between developers on the one side and operations engineers on the other. This way both teams can deal with the big differences between developing a piece of software and deploying or managing it. Both work together with the product as ultimate goal. This mindset is already a huge step forward and reduces the time it takes from requesting an operational task to finishing it. The virtual administrator approach aims to remove this bottleneck completely. Figure 1.3 shows that an operations engineer can create a virtual administrator and provide this entity with all the necessary tools to perform the tasks he otherwise needed to do himself. In this case a non-operations person can make use of the virtual administrator at any time. This reduces the number of people who are involved in the process to one. The developer or data scientist becomes self-reliant and the time needed to get the requested operational tasks done is reduced significantly. In other words, virtual administrators provide more flexibility to set up automated systems, and users are not required to know the specifications of the systems used.

Figure 1.2: Traditional communications between operations and non-operations

Figure 1.3: The concept of a virtual administrator

Imagine the case where a group of developers or other non-operations engineers are in need of some servers and applications for a new project. One of the required services is a database. In a lot of cases either the developer has access to the database himself or he must ask an operations engineer or database administrator (DBA) to perform the steps that are necessary. In the first case the developer is required to know how to deploy and configure these systems. These steps can for example include: setting up a server, installing software, creating users, creating a database and copying the database details to the application the developer is working on. These are a lot of steps and they require some specific knowledge and skills. It is a grey zone whether or not the developer should have this knowledge as it is often not directly associated with his job. The idea of a virtual administrator is not to take away the job of the DBA or operations engineer; it is rather the concept of creating a system, a tool or a way of performing these tasks such that the developer gets what he wants in a time- and cost-friendly way. For the operations engineer, the virtual administrator should also be constructed with flexibility as a key element. Three concepts come forth from this idea:

• No other person is needed or must intervene when setting up these servers, applications or services.

• The developer does not need any real operations knowledge (see below).

• Flexibility and reusability are the main concerns for the operations engineer.

The virtual administrator handles all technical details and provides the requested services and details to the developer in such a way that he is ready to start applying his expertise without bothersome configuration issues. When different virtual administrators have the flexibility to interact with each other, full-fledged automated infrastructures become possible.
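The interaction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names VirtualAdministrator, DatabaseDetails and request_database, as well as the host name pattern, are hypothetical and are not part of the service developed in this thesis. The operational steps are merely logged instead of executed.

```python
from dataclasses import dataclass, field
import secrets


@dataclass
class DatabaseDetails:
    """Connection details handed back to the requester."""
    host: str
    port: int
    database: str
    user: str
    password: str


@dataclass
class VirtualAdministrator:
    """Hypothetical stand-in for the knowledge encoded by an operations engineer."""
    technology: str
    default_port: int
    log: list = field(default_factory=list)

    def request_database(self, application: str) -> DatabaseDetails:
        # Each step would be a real operational task; here it is only recorded.
        for step in ("set up server", "install " + self.technology,
                     "create user", "create database"):
            self.log.append(step)
        return DatabaseDetails(
            host=f"{application}-db.example.internal",  # assumed naming scheme
            port=self.default_port,
            database=application,
            user=application,
            password=secrets.token_hex(8),
        )


# The developer only states what is needed and receives ready-to-use details.
admin = VirtualAdministrator(technology="mysql", default_port=3306)
details = admin.request_database("webshop")
```

The developer never sees the individual steps; the operations engineer can change them inside the virtual administrator without affecting the request itself.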

Operations knowledge

The abstract concept of operations knowledge can be defined as the overall sum of the knowledge about specific technologies, their relations, configurations and limitations. This includes version numbers, technology-specific differences for the same physical or logical thing, configuration parameters and all other characteristics of a service or technology. Knowing that MySQL is a management system for databases is not necessarily considered to be operations knowledge, whereas knowledge about installing MySQL or how a user can connect to a MySQL database is. The following example will clarify this definition.

Imagine a team working on a web shop that sells items. The company chooses to perform data analysis on all sold units and decides to put a team of data scientists to work. This team needs to set up their tools, configure the applications they want to use and be able to get the data from the web shop. This task and how it is achieved is what falls under the domain of operations knowledge. The data scientists do not care what database technology (MySQL, MongoDB or Cassandra for example) or back- and front-end technology is used for the web shop. All they want is the ability to retrieve the data and work with it. In other areas, they might be a bit pickier as the data scientists might request some specific version (of a tool or technology) because of some specific feature (for example Hadoop version 3.0). Note that the same exercise can be made with developers instead of data scientists. Developers are generally not interested in whether the web server runs Nginx or Apache, or whether the database server runs on a MySQL or a MariaDB instance. This flexibility to request specific things when having the operations knowledge, while at the same time being able to work on a higher, more abstract level that does not require operations knowledge, is not easily attained. With the help of virtual administrators a team should be able to have faster and easier deployment models without losing flexibility and power.

1.3 Polyglot persistence

1.3.1 Definition

Because this thesis looks at virtual administrators for the management of polyglot persistence applications, this term is the final foundation of this research. The term “polyglot persistence” originated from the blog (2008) of Scott Leberknight [14], who applied the idea of polyglot programming (2006) from Neal Ford. The term polyglot programming was introduced as the idea of using multiple languages, making it possible to choose the right tool for the job [15]. Polyglot literally means “a mixture of languages”¹. Persistence, in this case, refers to storing data in such a way that the data itself outlives the process that created it. In other words, polyglot persistence is the term for an application, service or enterprise where multiple data store technologies coexist [16]. Picking the right language for the job may be more productive than trying to fit all aspects into a single language. Note that the polyglot persistence idea resembles the concept of microservices where, instead of a monolithic application, an architecture of loosely coupled services is built. This architecture enables organisations to focus on a separate business functionality within small services [17]. Microservices and polyglot persistence fit well together.

1.3.2 Example

Figure 1.4 shows an example of a polyglot persistent application. The web shop uses multiple databases to store different types of information. The reason behind dividing the storage of these different types of data is often twofold. First, parts of the application might not be core business. If the recommendation system for example does not function, the web shop itself is not necessarily offline as it should still be possible to log in and buy things. Secondly, and this is probably the biggest motivator for a lot of teams, there are the characteristics of the database technology. If an application uses a key-value model to store its variables, a corresponding database technology like Redis will be more efficient. Another reason why applications might choose not to use traditional relational database systems is the use of certain algorithms. If an application uses graph data structures, a graph-based storage system might be more suitable.

¹ https://www.merriam-webster.com/dictionary/polyglot

[Figure 1.4 diagram labels: webshop application; data kinds: user-related information, recommendation information, items for sale; database types: relational-based (e.g. MySQL), document-based (e.g. MongoDB), graph-based (e.g. Neo4J).]

Figure 1.4: Example of a web shop application using multiple types of database technologies illustrating the concept of polyglot persistence.
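From the point of view of the application code, polyglot persistence can be sketched as a simple routing step. The following Python fragment is a minimal illustration, not the web shop of figure 1.4: InMemoryStore is a hypothetical stand-in for a real database client, and the pairing of data kinds with store models in STORES is an assumption made for this example only.

```python
class InMemoryStore:
    """Stand-in for a real database client; it only records what is written."""

    def __init__(self, technology: str):
        self.technology = technology
        self.records = []

    def save(self, record: dict) -> None:
        self.records.append(record)


# Assumed pairing of data kinds and store models, chosen for illustration only.
STORES = {
    "user": InMemoryStore("relational"),
    "item": InMemoryStore("document"),
    "recommendation": InMemoryStore("graph"),
}


def persist(kind: str, record: dict) -> str:
    """Route a record to the store registered for its kind."""
    store = STORES[kind]
    store.save(record)
    return store.technology
```

Calling persist("user", {"name": "alice"}) writes to the relational stand-in only; the other stores are untouched, which mirrors the observation that a failing recommendation store does not have to take the whole web shop offline.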

1.3.3 Concerns

Choosing microservice architectures and polyglot persistent models is not necessarily the best solution. Adding more systems to an application raises the complexity of the overall infrastructure. For the application it means more configuration parameters and maybe more libraries to use before everything is properly connected. When it comes to system administration, more and different systems require more attention. In NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence this challenge is described as follows: “In this new world of polyglot persistence, the DBA groups will have to become more poly-skilled - to learn how some of these NoSQL technologies work, how to monitor these systems, back them up, and take data out of and put into these systems” [16]. In other words, not only do the application-specific configurations need attention, but the deployment complexity also rises. As these new systems are required by the applications, they will also need to exist in all environments (development, testing, quality assurance and production). The advantages of polyglot persistence need to be weighed against the complexity that it entails. Choose the right tool for the job.

1.4 Problem statement

Right now developers or data scientists still need a lot of operational knowledge when they want to perform their job. Getting configurations right often costs a lot of time and causes frustration. These operational tasks need to be performed by a scarce group of people, the operations engineers. If a user is in need of a virtual machine, a web server or a database, he often needs the help of another person, either to explain how things work or to provide all the necessary instances and details.

Configuration management tools like Chef, Puppet or Ansible have grown in popularity over the last few years. More and more companies started to use them and the tools became more accessible². This is already a huge step forward from the slow, manual steps one needed to take in a not so distant past. Tasks such as setting up a (virtual) machine and configuring and installing software should not be performed manually anymore. These tools help operations engineers in performing their tasks but are of no use to developers or data scientists who have no affinity with the operations knowledge that is still required when using them. Even services offered by big companies such as Amazon, Google or Microsoft still require some basic knowledge about configuring machines and services. Preconfigured machines or out-of-the-box solutions are therefore wanted in a lot of cases.

² https://www.g2crowd.com/categories/configuration-management

1.5 Goal

The authors of “The DevOps Handbook, How to create world-class agility, reliability, & security in technology organizations” finish their book with a call to action. They present DevOps principles and patterns as a solution that “can help the creation of dynamic learning in organizations, achieving the amazing outcomes of a fast flow and world-class reliability and security, as well as increased competitiveness and employee satisfaction” [18]. Even though this is written from a DevOps point of view, it can be directly mapped onto the high-level goals and problems that are discussed in this research. When setting up tools, configurations and machines, people without operations knowledge should be able to receive their requested operational entities faster and more easily. Additionally, managing cloud applications should be done in the most flexible way as things may change rather quickly. In general, things should be easier!

The goal of this thesis can be found in its title. “Management of polyglot persistent integrations with virtual administrators” refers to the use of virtual administrators that allow services to request any database type they want and end up with a working database, ready to connect to.

1.6 Research questions

The creation of this service is the main focus of this research. It is however possible to look at smaller research topics as well.

• Is it possible, and what is needed, to create a service for the management of polyglot persistent integrations?

• What does this service formally represent in an application model?

• What problems does the service solve, and at what cost?

1.7 Overview

The next few chapters are organized as follows: chapter 2 focuses on existing tools and related work. It defines the concepts, languages and technologies used, such as the TOSCA standard and Canonical’s application modelling tool Juju. Some existing tools will be discussed, illustrating why they fall short. The similarities and differences between database technologies are also defined as they provide crucial information for defining a generic database service. Chapter 3 focuses on the functional specification. This makes it possible to define the service without any tool-specific restraints. In this chapter the scope of the service is defined by a sample use case scenario. A technical implementation in Juju is discussed in chapter 4. The presented use case acts as a proof of concept. Chapter 5 ends this thesis with a discussion about the results, the limitations, some possible future work or research and finally a conclusion.

2 Background

This chapter will give an overview of some key components and concepts concerning operational tasks. A small overview of old and new approaches will be given, accompanied by concepts such as Infrastructure as Code (IaC) and configuration management tools. Next, state-of-the-art research concerning virtual administrators and service orchestration will be thoroughly discussed. Afterwards, the TOSCA modelling language and Canonical’s Juju will be examined for the functional specification and the technological implementation respectively. Finally, for the sake of completeness the different types of database technologies are discussed and compared.

13 14 CHAPTER 2. BACKGROUND

2.1 From infrastructure to infrastructure as code

When it comes to setting up machines and applications or deploying developed software, a lot of operational tasks used to be manual work. With virtualisation techniques and afterwards cloud computing possibilities, other approaches were needed. More and more principles of software development found their way to infrastructure. “Infrastructure as Code” (IaC) is a term describing the act of setting up, managing and interacting with data centres or cloud computing models. The idea is that the configuration is written in files (typically YAML and JSON) to describe what the desired state should be for a specific machine (see listing 1 for an example). The concept of idempotence, meaning “that a deployment command always sets the target environment into the same configuration”¹, is very important. Configuration management tools such as CFEngine, Chef or Puppet can be seen as frameworks for IaC.

# execute 'apt-get update'
exec { 'apt-update':
  command => '/usr/bin/apt-get update'
}

# install apache2 package
package { 'apache2':
  require => Exec['apt-update'],
  ensure  => installed,
}

# ensure apache2 service is running
service { 'apache2':
  ensure => running,
}

Listing 1: Example of a Puppet manifest
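The idempotence property can be made concrete with a small sketch. The Python fragment below is not how Puppet works internally; it is a minimal illustration in which the function apply and the resource keys such as "package:apache2" are hypothetical names. Applying the same desired state a second time results in no further changes.

```python
def apply(desired: dict, actual: dict) -> list:
    """Converge `actual` towards `desired` and return the changes that were made."""
    changes = []
    for resource, state in desired.items():
        if actual.get(resource) != state:
            actual[resource] = state
            changes.append(f"{resource} -> {state}")
    return changes


# Desired state, loosely mirroring the resources of listing 1.
desired = {"package:apache2": "installed", "service:apache2": "running"}
# Current state of the machine: the package is present, the service is not running.
machine = {"package:apache2": "installed"}

first_run = apply(desired, machine)   # only the missing part is changed
second_run = apply(desired, machine)  # nothing left to do
```

After the first run the machine matches the desired state, so the second run returns an empty list of changes.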

IaC approaches and configuration management tools have become mainstream and have improved the workflow for both developers and operations engineers [13]. These tools, however, only help when the person using them has “operations knowledge” (see section 1.2, Operations knowledge). Users can define certain parameters more easily and the deployment is faster, but if the user does not know what to define and set up, these tools won’t help. In an introductory video about Juju (see later) Jorge Castro talks about “service orchestration” as the next step. He states: “We see the next step being service orchestration, which is when you get to the level of scales when you are talking about thousands and hundreds of thousands of instances, you have to manage at the service level instead of the individual machine. You care about the individual machine but they become like CPU and RAM are today.”

¹ https://www.visualstudio.com/learn/what-is-infrastructure-as-code/

2.2 Service orchestration

The idea behind service orchestration finds its roots in the Service Oriented Architecture (SOA) landscape. In the early 2000s this approach focused on developing systems that are “loosely coupling interoperable services” [19]. The reusability of components and the creation of multiple services interacting with each other lie at the core of SOA. Other extensions or inspirations of SOA include Web services based on Web Services Description Language (WSDL) and Simple Object Access Protocol (SOAP), Web 2.0 and microservices. Service orchestration is a possible approach to help teams attain a system design compatible with SOA. Mulesoft defines service orchestration as follows: “Similar to an organizational workflow, service orchestration is the coordination and arrangement of multiple services exposed as a single aggregate service. Developers utilize service orchestration to support the automation of business processes by loosely coupling services across different applications and enterprises and creating “second-generation,” composite applications. In other words, service orchestration is the combination of service interactions to create higher-level business services” [19].

An orchestrator in its simplest form is a “program” that interprets knowledge in the form of models or files and performs the necessary management actions in an automated manner [20]. Orchestration should not be confused with a similar but different approach, namely “Web Service Choreography”, a specification to define business processes with XML. Not everyone agrees on terminology and concepts², but the main difference lies in the management approach. Orchestration has a single entity ordering tasks and making decisions. With choreography there is no “conductor” (a central management node); the “performers” (web services for example) need to act on their own. Figure 2.1 visualises the difference between the two. Note that because IT infrastructures can grow very complex, one central orchestrator often lacks maintainability. Even meta-schedulers (decentralized scheduling) do not address the extensive needs in cloud modelling languages. To answer this issue Sebrechts et al. propose a distributed orchestrator in the paper “Distributed Service Orchestration: Eventually Consistent Cloud Operation and Integration” [20].

² https://www.infoq.com/news/2008/09/Orchestration
³ https://stackoverflow.com/questions/4127241/orchestration-vs-choreography

(a) Visual representation of service orchestration. (b) Visual representation of service choreography.

Figure 2.1: The difference between service orchestration and service choreography³.
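The difference in management approach can also be expressed in code. The following Python sketch is an illustration under stated assumptions and does not correspond to any specific orchestration or choreography framework: the function orchestrate, the class EventBus and the event names are hypothetical. With orchestration a single conductor calls every service in order; with choreography every performer reacts to events on its own.

```python
def orchestrate(services: list, request: str) -> list:
    """Orchestration: one central conductor decides the order and calls each service."""
    return [service(request) for service in services]


class EventBus:
    """Choreography: no conductor, performers subscribe and react to events themselves."""

    def __init__(self):
        self.subscribers = {}
        self.trace = []

    def subscribe(self, event: str, handler) -> None:
        self.subscribers.setdefault(event, []).append(handler)

    def publish(self, event: str) -> None:
        self.trace.append(event)
        for handler in self.subscribers.get(event, []):
            handler(self)


def provision(request: str) -> str:
    return f"provisioned {request}"


def configure(request: str) -> str:
    return f"configured {request}"


# Orchestration: the conductor knows all services and their order.
results = orchestrate([provision, configure], "webshop")

# Choreography: each performer only knows which event it reacts to.
bus = EventBus()
bus.subscribe("order placed", lambda b: b.publish("payment taken"))
bus.subscribe("payment taken", lambda b: b.publish("goods shipped"))
bus.publish("order placed")
```

In the choreography half nobody holds the complete sequence; it only becomes visible afterwards in bus.trace.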

The need for orchestration tools became apparent when knowledge reuse became highly sought after. Code reuse in software development in the form of libraries has been around for years. With Infrastructure as Code, it became clear that concepts such as abstraction and encapsulation are crucial in operations as well. In the paper “Orchestrator Conversation: Distributed Management of Cloud Applications”, Sebrechts et al. [20] propose the orchestrator conversation. This approach should enable the reuse of knowledge.

In “Web services orchestration and choreography” [2] Peltz uses a sequence diagram (see figure 2.2) and a use case to illustrate how a user communicates with an agent and how this agent performs all the hard work. This makes the concept of a virtual administrator as introduced in section 1.2 clear: a user formulates a high-level request (requiring no operations knowledge) and a virtual administrator (in this case the agent) performs all necessary tasks to fulfil the request.

In “A Taxonomy and Survey of Cloud Resource Orchestration Techniques” Weerasiri et al. provide an overview of orchestration models, languages, platforms and tools related to cloud resources. In their taxonomy they visualise the different components of cloud resource orchestration. They also illustrate that cloud resource orchestration happens on different layers. For example, a user layer exists at the highest level, providing tools like command line interfaces (CLI), dashboards or other tools. This is an abstraction for the user, and with the help of other layers such as the resource management or resource provisioning layer a whole hierarchy makes the orchestration of cloud resources possible.

Figure 2.3 shows an example of the resource entities and relationships of a Web application. It models the infrastructure and shows how different entities are related. A standard that would use these diagrams for a uniform understanding and a way for tools to interpret these models would help in the process of managing cloud-based services. Such a standard is the TOSCA language.

Figure 2.2: Sequence diagram of an example use case where a buyer interacts with an agent. Afterwards the agent formally performs all needed actions and communicates back with the buyer [2].

Figure 2.3: Resource entities and relationships of a Web application [3].

Figure 2.4: Example of an application topology illustrating the terms used by the TOSCA standard [4].

2.3 OASIS TOSCA

In “Declarative vs. Imperative: Two Modeling Patterns for the Automated Deployment of Applications” Endres et al. [21] discuss two modelling patterns for automated cloud deployments. Declarative workflows focus on what needs to be deployed. All logic is interpreted and the runtime performs the necessary operations. Imperative workflows on the other hand cover how deployments happen. All required steps need to be explicitly described. The TOSCA standard supports both patterns, as both application topologies and workflow models for deployments can be created.
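The two patterns can be contrasted with a small Python sketch. It is an illustration only and does not use TOSCA syntax: the component names and the functions deploy_imperative and plan are hypothetical. In the imperative variant the author writes down every step in order; in the declarative variant the author only states which components exist and what they depend on, and a (here very simple) runtime derives the order.

```python
def deploy_imperative() -> list:
    """Imperative: every step is spelled out explicitly, in order."""
    return ["create machine", "install database", "deploy website"]


# Declarative: only the desired topology is stated (component -> dependencies).
DESIRED = {"website": ["database"], "database": ["machine"], "machine": []}


def plan(topology: dict) -> list:
    """Derive a deployment order in which dependencies come first."""
    ordered, done = [], set()

    def visit(name: str) -> None:
        if name in done:
            return
        for dependency in topology[name]:
            visit(dependency)
        done.add(name)
        ordered.append(name)

    for name in topology:
        visit(name)
    return ordered
```

For the topology above, plan(DESIRED) places the machine before the database and the database before the website, without the author having written that order down. The sketch assumes the topology contains no dependency cycles.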

“Topology and Orchestration Specification for Cloud Applications” or TOSCA is an OASIS standard (first described in 2013) providing specifications to create self-contained cloud models that describe the topology of cloud applications alongside the management and orchestration in a workflow model [5, 8]. The fundamental goal of TOSCA is the idea of enhancing portability and (re-)usability of cloud applications and services. As the lifecycles, relationships and operational behaviours of these services are often redundant, a general approach should aid in the process of deploying and maintaining cloud services.

The following use case will illustrate the key components of the TOSCA standard. The use case is based on the notes of the OpenTOSCA Research Prototype⁴ [4]. A small group of developers want to deploy a Java WAR archive on a Tomcat instance, see figure 2.4. The Tomcat server runs on an operating system (Ubuntu) which is hosted on a virtual machine on a cloud provider (AWS for example). TOSCA models these different entities and defines capabilities (what the service provides) and requirements (what the service needs). This fundamental characteristic is visualised in figure 2.5. A formal definition of these concepts is summarised in table 2.1, taken from the official TOSCA Simple Profile document [8].

⁴ http://www.opentosca.org/index.html

Figure 2.5: TOSCA node types have requirements and capabilities sections that fit together just like Lego pieces.
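The way requirements and capabilities fit together can be sketched as a matching step. The Python fragment below is a simplified illustration of the use case of figure 2.4 and is not TOSCA syntax: the class NodeTemplate, the function match and the capability names such as "servlet-container" are hypothetical and do not correspond to type names in the standard.

```python
from dataclasses import dataclass, field


@dataclass
class NodeTemplate:
    """Simplified node: what it offers and what it needs."""
    name: str
    capabilities: set = field(default_factory=set)
    requirements: set = field(default_factory=set)


def match(nodes: list) -> list:
    """Pair every requirement with a node that offers the matching capability."""
    relationships = []
    for consumer in nodes:
        for requirement in sorted(consumer.requirements):
            for provider in nodes:
                if requirement in provider.capabilities:
                    relationships.append((consumer.name, requirement, provider.name))
    return relationships


# The WAR archive needs a servlet container, Tomcat needs an operating system,
# and the operating system needs a (virtual) machine to run on.
topology = [
    NodeTemplate("war", requirements={"servlet-container"}),
    NodeTemplate("tomcat", capabilities={"servlet-container"},
                 requirements={"operating-system"}),
    NodeTemplate("ubuntu", capabilities={"operating-system"},
                 requirements={"compute"}),
    NodeTemplate("vm", capabilities={"compute"}),
]
```

Each tuple returned by match(topology) plays the role of a relationship between two nodes: the consumer, the requirement that is satisfied, and the provider.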

Table 2.1: Summary of the most important terms in the TOSCA standard [8].

Term | Definition
Topology template | A topology template consists of a set of node template and relationship template definitions that together define the topology model of a service as a (not necessarily connected) directed graph.
Node template | A node template specifies the occurrence of a software component node as part of a topology template. Each node template refers to a node type that defines the properties of the node (e.g. ). Node types are defined separately for reuse purposes.
Relationship template | A relationship template specifies the occurrence of a relationship between nodes in a topology template. Each relationship template refers to a relationship type that defines the properties of the relationship. Relation types are defined separately for reuse purposes.
Properties | The semantics of a node or relationship e.g. attributes, requirements, capabilities, interfaces, etc.
Artifacts | Code or logic required by the node template to successfully meet certain requirements. E.g. files to populate databases, archives to be deployed, image files for setting up services or operating systems, etc.

OpenTOSCA (Winery [22]), Cloudify and alien4cloud are tools based on the TOSCA standard. The tool used in this research is Juju. This application modelling tool, created by Canonical Ltd, is also characterised by the same concepts as TOSCA.

2.4 Juju

This section will give an extensive overview of the Juju platform: what Juju is, how it can be used and what goals it tries to achieve. Next, a summary is given of how Juju works under the hood. The main components such as charms, hooks and relations will be discussed and the new charms.reactive framework will be examined closely as it enables reusability and provides more flexibility [5].

2.4.1 What is Juju

Concepts

Canonical describes Juju as follows: “Juju is a state-of-the-art, open source modelling tool for operating software in the cloud. Juju allows you to deploy, configure, manage, maintain, and scale cloud applications quickly and efficiently on public clouds, as well as on physical servers, OpenStack, and containers. You can use Juju from the command line or through its beautiful GUI” [23]. Modern applications these days are not monolithic or standalone applications anymore. Multiple services work together. Microservices, load balancers, worker and slave nodes and caching tools are all examples of multi-application architectures. Even a website that uses a database consists of two different applications (the actual web server with the website and a database service). Application modelling is the art of modelling the different applications with the goal of easily managing and scaling them.

The Juju GUI (see figure 2.6) shows a visual representation of the different applications and how they are connected. The model represents an undirected graph where each node represents an application and each edge contains relation-specific details between the two applications. This visual level is clearly an example of encapsulating complexity for users.

Figure 2.6: Juju GUI showing two applications (MySQL and Wordpress) connected to each other. The Wordpress application is in need of a database which is provided by the MySQL service.
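The graph view of a model can be written down as a small data structure. The Python sketch below is a hypothetical in-memory representation, not the format Juju uses internally; the function related_to and the interface name stored on the edge are assumptions for illustration. Using an unordered pair as key expresses that the relation has no direction.

```python
model = {
    "applications": {"wordpress", "mysql"},
    # Each relation is an edge; a frozenset makes the pair unordered (undirected).
    "relations": {
        frozenset({"wordpress", "mysql"}): {"interface": "mysql"},  # assumed detail
    },
}


def related_to(model: dict, application: str) -> list:
    """Return the applications that share a relation with `application`."""
    return sorted(
        other
        for pair in model["relations"]
        if application in pair
        for other in pair - {application}
    )
```

Both related_to(model, "mysql") and related_to(model, "wordpress") find the other application through the same edge.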

Charms & Bundles

Juju uses charms and bundles to set up infrastructures. Charms are the fundamental building blocks of Juju. They are a set of scripts for deploying and operating the application. Juju offers the possibility to write charms in any language (including existing configuration management tools such as Chef and Puppet). These charms are event-driven and focus on reusing operational steps (or code) in different circumstances. If a team has multiple redundant setups for security or testing purposes, the steps to configure them are similar if not completely the same. Bundles are collections of charms that are linked together. With the use of a bundle a team can deploy a whole stack of technologies at once. The Canonical Distribution Of Kubernetes is a good example of such a bundle⁵.

The great thing about these charms is that once they are written they provide a way of setting up systems without “application-specific” knowledge (hence operations knowledge). Things like dependencies and operational events like backups and updates can all be encapsulated in the charm. The stronger the knowledge of the charm author, the more options will be available and the more flexibility one can have when designing in the Juju GUI. Once done, a user ends up with a working infrastructure without continuously bothering an expert. In other words, when the charms or bundles are written, Juju and its charms act as virtual administrators for the user.
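The event-driven nature of a charm can be illustrated with a minimal dispatcher. The Python sketch below is not the Juju or charms.reactive API: the decorator hook, the function dispatch, the state dictionary and the address and database name are hypothetical. It only shows the principle that operational knowledge is attached to named events and that the user never calls these steps directly.

```python
HOOKS = {}


def hook(name: str):
    """Register a function as the handler for a named event."""
    def register(func):
        HOOKS[name] = func
        return func
    return register


@hook("install")
def install(state: dict) -> None:
    # Knowledge of the charm author: which packages the application needs.
    state["packages"] = ["mysql-server"]


@hook("db-relation-joined")
def share_details(state: dict) -> None:
    # Knowledge of the charm author: which details the other side of a relation needs.
    state["shared"] = {"host": "10.0.0.5", "database": "webshop"}  # assumed values


def dispatch(event: str, state: dict) -> dict:
    """Run the handler for an event, if the charm defines one."""
    handler = HOOKS.get(event)
    if handler:
        handler(state)
    return state
```

An event without a registered handler is simply ignored, so a charm only has to describe the events it cares about.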

While charms and bundles are the central elements of Juju, a user gets confronted with some other concepts first. After installing Juju and optionally setting up the credentials for public cloud environments such as Amazon Web Services, Windows Azure or Google Compute Engine, one can “bootstrap” their environment. Bootstrapping creates a Juju controller (which is also a machine), the central communication and management node for a cloud environment.

⁵ https://jujucharms.com/canonical-kubernetes/

Thanks to the controller it is possible to create Juju models. A model is always associated with one controller. Models can easily be added, destroyed or modified by users. It is at this level that operations engineers can also enforce security by granting users access to specific models. Within a model, charms can be added and linked together. This is the environment where “modelling an infrastructure” becomes possible through the use of the Juju command-line tool or the Juju GUI.

Juju Agents

The Juju agents are the building blocks of the entire tool. Juju agents are pieces of software that are installed on all Juju machines. There are two types of agents: the machine agent and the unit agent. The machine agent takes care of all machine-related communication, whereas the unit agent operates specifically at the application unit level. The machine agent creates the unit agent, and the unit agent takes care of all charm-related tasks. These agents make it possible to speak about service orchestration as discussed in section 2.2.

Subordinates

Applications are composed of one or more application units. An application unit runs the application’s software and is the smallest entity managed by Juju. Application units typically run in an isolated container on a machine, with no knowledge of or access to other applications deployed onto the same machine.

Each unit is managed by its unit agent. Thanks to these separated containers, multiple applications can co-exist on the same machine without knowledge of or interaction with each other. There are two types of charms in Juju: the regular ones and the subordinates. “Subordinate applications allow units of different applications to be deployed into the same container and to have knowledge of each other.”⁶ They inherit the public/private address of their principal application and do not function as standalone applications. Subordinates are well suited for tasks such as logging or making backups.

⁶ https://docs.jujucharms.com/stable/en/authors-subordinate-applications

Figure 2.7: Charm (bash template) structure

2.4.2 Juju internals

Structure, hooks & relationships

As previously stated, Juju works with charms. Figure 2.7 shows the charm structure in its simplest form. The config.yaml file holds the different options that will be accessible to the end user (see listing 2 for an example that holds some options for a deployed HTTP website). The icon.svg is the image used in the Juju GUI to represent the service, and the README should offer some explanation about the charm for other users. Revision is optional and rather deprecated. The metadata.yaml is another important file. A simplified version of the metadata.yaml of the HAProxy service is given in listing 3. The first few lines give some general information, but the “provides” and “requires” tags are crucial: they define how charms can interact and communicate with each other. Finally there is a folder called hooks. A hook is an executable file (written in any language that can be interpreted by an Ubuntu machine). These files are called by the Juju unit agent depending on different events. The hooks define which actions the charm should take when a given event happens. An example of a very basic install hook, written in bash, is shown in listing 4. It installs the Apache software and deploys a basic website. Note that the reactive framework is built on top of this hook workflow. In fact, the reactive framework inserts a hook that manages all other hooks. Because the reactive framework is the new and better [5] way to write charms, the lifecycle of the hooks becomes less relevant for charm authors and is therefore not covered here. The “charm build” command creates all appropriate hooks based on the reactive files of the charm. Note that hooks are still the fundamental mechanism of Juju. When debugging, the “juju debug-hooks” command is needed and the corresponding hook should be examined.

    options:
      website-name:
        type: string
        default: "My Website"
        description: "The title of your website"
      port-number:
        type: int
        default: 80
        description: "Port to run website on"

Listing 2: Example of a config.yaml file in Juju.

    name: haproxy
    summary: "fast and reliable load balancing reverse proxy"
    maintainers: [Juan Negron, Tom Haddon]
    description: HAProxy is a TCP/HTTP reverse proxy ...
    tags: ["cache-proxy"]
    series:
      - trusty
      - ...
    requires:
      reverseproxy:
        interface: http
    provides:
      website:
        interface: http
    ...

Listing 3: Example of the HAProxy metadata.yaml file⁷

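    #!/bin/bash
    # Abort on errors and unset variables, and echo every command.
    set -eux
    # Install the Apache web server and enable the default site.
    apt-get install apache2 -y
    a2ensite 000-default
    # Deploy a minimal website and restart Apache to serve it.
    echo "Hello World!" > /var/www/html/index.html
    service apache2 restart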

Listing 4: Example of an install hook in Juju.

⁷ https://api.jujucharms.com/charmstore/v5/haproxy-43/archive/metadata.yaml

Reactive framework

In the paper “Beyond Generic Lifecycles”, Sebrechts et al. discuss some limitations of traditional charms, such as their inflexibility and their poor support for reusing certain steps. The reactive framework is an answer to those shortcomings. They speak about “emergent workflows” using declarative flags and handlers [5]. By placing @when annotations above functions, a charm author can define the conditions under which the framework should “react” (hence the name charms.reactive⁸). These annotations, called decorators, allow the reactive framework to create hooks accordingly. Figure 2.8 shows the architecture of the reactive framework.

Because the reactive framework offers more flexibility and reusability in the form of layers, it should be the preferred method when writing charms. An example is given in listing 5.

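    from charmhelpers.core.hookenv import status_set
    from charmhelpers.core.templating import render
    from charms.reactive import set_state, when


    # Runs once both the web server and the database relation are available.
    @when('apache.available', 'mysql.available')
    def setup_app(mysql):
        # Render the application configuration with the database details.
        render(source='configuration.',
               target='/var/www/configuration.php',
               owner='www-data',
               perms=0o775,
               context={
                   'db': mysql,
               })
        set_state('apache.start')
        status_set('maintenance', 'Starting apache')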

Listing 5: Example of a handler in the reactive framework in Juju.

Figure 2.8: The architecture of the charms.reactive framework: when the orchestrator executes a hook, the reactive framework initiates and runs the handlers whose preconditions are true [5].
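The dispatch behaviour summarised in figure 2.8 can be illustrated with a small, self-contained simulation. This is only a sketch in plain Python and not the charms.reactive implementation: the `when` decorator, `set_flag` and `dispatch` below are hypothetical stand-ins, each handler runs at most once, and the flag names are borrowed from listing 5.

```python
# Simplified simulation of flag-driven handlers; NOT the charms.reactive code.
active_flags = set()   # flags that are currently set
handlers = []          # (required flags, function) pairs


def when(*required_flags):
    """Register a handler that may run once all required flags are set."""
    def register(func):
        handlers.append((frozenset(required_flags), func))
        return func
    return register


def set_flag(flag):
    active_flags.add(flag)


def dispatch():
    """Run handlers whose preconditions hold until no new handler can run."""
    executed = []
    progress = True
    while progress:
        progress = False
        for required, func in handlers:
            if func.__name__ not in executed and required <= active_flags:
                func()
                executed.append(func.__name__)
                progress = True
    return executed


# Deliberately defined first: it can only run after setup_app set its flag.
@when('apache.start')
def start_apache():
    set_flag('apache.started')


@when('apache.available', 'mysql.available')
def setup_app():
    set_flag('apache.start')


# A hook fires and sets the initial flags; the workflow then emerges.
set_flag('apache.available')
set_flag('mysql.available')
order = dispatch()
print(order)
```

The execution order follows from the flags rather than from the order in which the handlers are defined, which is the idea behind the “emergent workflows” mentioned above.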

⁸ https://charmsreactive.readthedocs.io/en/latest/

[Flowchart. Note: all flags are prefixed with 'endpoint.{endpoint-name}.'. When the first unit joins, or a remote unit joins or publishes new relation data, the 'joined' and 'changed' flags are set. When a remote unit leaves the relationship, the departed unit is added to relation.all_departed_units and 'departed' is set; if no remote unit is left, 'joined' is cleared.]

Figure 2.9: The workflow of the automatically set flags in the endpoint

Endpoint pattern

A recent pattern in the reactive framework is the “Endpoint pattern”⁹. The “Endpoint” class is the new base class used in building interface layers. The class provides internal flags and makes it possible to use @when annotations instead of @hook ones. This results in code that is easier to read and write.

As shown in figure 2.9, interface authors can use four flags that are automatically set by the Endpoint pattern:

• endpoint.{endpoint_name}.joined

• endpoint.{endpoint_name}.changed

• endpoint.{endpoint_name}.changed.{field}

• endpoint.{endpoint_name}.departed
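The life cycle of these four flags can be mimicked with a toy model. The sketch below is an assumption-laden illustration in plain Python, not the Endpoint class of charms.reactive: the class `EndpointFlags`, its method names, the unit name `mysql/0` and the relation data are all hypothetical.

```python
class EndpointFlags:
    """Toy model of the automatically managed flags of one endpoint."""

    def __init__(self, endpoint_name):
        self.prefix = 'endpoint.{}.'.format(endpoint_name)
        self.flags = set()
        self.remote_units = {}          # unit name -> published data
        self.all_departed_units = []

    def _set(self, suffix):
        self.flags.add(self.prefix + suffix)

    def _clear(self, suffix):
        self.flags.discard(self.prefix + suffix)

    def unit_joined_or_changed(self, unit, data):
        """A remote unit joins or publishes new relation data."""
        previous = self.remote_units.get(unit, {})
        self.remote_units[unit] = dict(data)
        self._set('joined')
        self._set('changed')
        # One extra flag per field whose value actually changed.
        for field, value in data.items():
            if previous.get(field) != value:
                self._set('changed.' + field)

    def unit_departed(self, unit):
        """A remote unit leaves the relationship."""
        self.remote_units.pop(unit, None)
        self.all_departed_units.append(unit)
        self._set('departed')
        if not self.remote_units:       # no remote unit left
            self._clear('joined')


db = EndpointFlags('database')
db.unit_joined_or_changed('mysql/0', {'host': '10.0.0.5', 'port': '3306'})
print(sorted(db.flags))
db.unit_departed('mysql/0')
print(sorted(db.flags))
```

In this simplification the 'changed' flags stay set until something clears them explicitly, which a handler would normally do after processing the new data.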

2.4.3 Juju as a solution

⁹ https://charmsreactive.readthedocs.io/en/latest/charms.reactive.relations.html#charms.reactive.endpoints.Endpoint

When looking back at the problem stated in section 1.4, it is interesting to look at the workflow of Juju users. Imagine the use case where a team has Juju installed and correctly configured. A data scientist wants to start performing some analysis. He needs two key things: his Big Data environment and the source of the data he needs to analyse. The setup of his tools (for example a Hadoop and Spark cluster) is something Juju can manage quite well in the user-friendly Juju GUI. Setting up the connection and relation between these tools and (non-)existing datastore units, however, shows some issues. When tackling this use case in practice, a few scenarios are possible (we assume the data scientist is given a Juju model to work with):

1. The operations engineer has already predefined all data store related charms in the model. What is left to do for the data scientist is setting up his tools and adding relationships.

2. The data scientist has access to charms from another model through cross model relations.

3. The user does not want to interact with Juju models at all (looking at tools that exist on top of Juju: Conjure-up, Tengu or custom frameworks). All he wants to say is: “set up and configure everything for me”.

4. (Proposed) The data scientist (and/or the layers on top of Juju) models everything, but through generic entities such as a charm representing a database instead of a concrete application.

It should be clear that approach 1 still requires some manual work from the operations engineer, something fundamentally against the goal of this research. Approach 2 seems very promising, but the Juju GUI offers no support whatsoever for cross model relations, meaning the visual representation is of no help to non-technical users. In addition, there is still need for an entity in the model of the data scientist representing the datastore in the form of a generic database, or this approach becomes similar to approach 1. Also note that, while Juju has its limitations, Juju focusses on modelling applications and applications only. Approach 3 is the ultimate goal for both the data scientist and the operations engineer if it can all work in an automated way. Approach 4 is the proposed idea of the virtual data administrators, filling the gap and making approach 3 more and more feasible.

Overview of things to keep in mind concerning Juju:

• Juju is more powerful than the Juju GUI alone, but using all its capabilities requires some more knowledge of the CLI on one hand and some experience with Juju on the other.

• Juju focusses on application modelling. Modelling more (or other) things that represent other ideas or concepts than applications is something out of the scope of Juju.

• Juju and its charms allow many languages but the reactive framework seems to be the preferred way.

Table 2.2: Juju terms

Concept | Meaning | Example
Cloud | Resource that provides machines. | AWS, LXD
Controller | Initial cloud instance that functions as central management node. | -
Model | A model has one controller and is the playfield for deploying applications. | -
Charm | The sum of operations needed to install and configure applications on machines. | Wordpress
Bundle | A collection of charms and their relations. | Wiki-simple
Machine | An instance of the cloud (a virtual machine or container). | -
Unit | The deployed software on a machine, one machine can have multiple units. | -
Relation | The concept of connecting multiple charms to share information. | -
Agent | Software on a Juju machine to keep track of changes. | Machine agent, Unit agent

2.4.4 Alternatives

There are a lot of tools, each with their own strengths and weaknesses. Most ease the operational activities for an operations engineer or focus on automating these tasks (see section 2.1). This is not the area where Juju shines. Juju provides the flexibility of its virtual administrator and the availability of reusable charms and bundles. When looking at other tools that could replace Juju and that operate at the same application modelling level, some characteristics need to be defined as metrics:

• What happens (what is the lifecycle) when two applications need to be connected?

• Are users able to work and deploy with the tool without operations knowledge?

• Does an application request one specific other application or does it allow generic types?

• Is it possible to reuse parts of existing models or setups?

• Is it possible to connect to one and the same database with multiple applications?

When looking at these questions it already becomes clear that most configuration management tools or PaaS-like (Platform-as-a-Service) solutions do not provide enough flexibility or possibilities. The need for tools that work on the level of service orchestration (see section 2.2 Service orchestration) is crucial.

Conjure-up¹⁰ is a tool built on top of Juju. Its goal is to require even less know-how and to shorten setup times. With a mindset of “Start using your big software instead of learning how to deploy it.” its focus aligns with the goals of this research. Because Conjure-up is nothing more than a layer on top of Juju (and therefore uses Juju), this tool won't be examined further.

Other tools such as Cloudify¹¹, MCollective in Puppet¹², Heat¹³, Apache Brooklyn¹⁴ or alien4cloud + terraform¹⁵ were briefly examined. These tools were not explored further because either they did not operate at the required level of operation management or the requested flexibility and reusability did not seem to be present. Further research on possible Juju alternatives is needed. What is clear, however, is that Juju functions on a lot of levels, including installing software, writing configurations, managing state and orchestrating, whereas all other tools fall short in at least one area.

¹⁰ https://conjure-up.io
¹¹ https://www.cloudify.co/
¹² https://puppet.com/docs/mcollective/current/index.html
¹³ https://wiki.openstack.org/wiki/Heat
¹⁴ https://brooklyn.apache.org/
¹⁵ https://alien4cloud-blog.com/tag/terraform/

2.5 Database technologies

2.5.1 History

In 1970 Edgar F. Codd [24] introduced the concept of relational databases. They have been in use ever since, proving their usefulness and capabilities. Over the last few decades technology did not stop evolving, and neither did customer expectations. Nowadays it is unacceptable if systems are not responsive or fast enough. In addition, customers demand 24/7 availability, making downtime or maintenance a weak spot for a service. In section 1.3, the term polyglot persistence was briefly discussed and illustrated with an example as a way of meeting the availability requirement. The idea of using multiple databases to power one or more applications (e.g. microservices) became more important as soon as availability became a concern. In addition, certain data structures are sometimes better suited to a non-relational model. In these cases, NoSQL solutions are the way to go. In a blog post [6], Nikola Živković speaks about “impedance mismatch”, a term that describes the difference between the relational model and the in-memory data structure. Traditional relational database systems are, however, still needed and should not be replaced, as they are perfectly suitable in certain cases. The polyglot persistence concept makes system designers aware that more possibilities exist. There are multiple types of NoSQL database technologies, each with their own strengths and weaknesses. These characteristics define how the data is stored. Figure 2.10 illustrates a crucial mapping step in the process of storing application data in a relational database. This often complex mapping step would be unnecessary if one used a graph store NoSQL technology such as Neo4J. This illustrates that the choice of database technology is often related to the use case. Figure 2.11 gives an overview of the most popular technologies as of March 2018.

2.5.2 Types

When it comes to database types there are generally two big categories: relational databases and NoSQL systems. A new, third concept saw the light of day in 2011. The term “NewSQL” was introduced by Matthew Aslett to denote a new class of database systems that combine the best of the two worlds. NewSQL systems maintain the ACID (Atomicity, Consistency, Isolation, Durability) guarantees of traditional database systems while also providing the scaling features of NoSQL technologies [25]. Relational database systems also know multiple categories such as object database systems or object-relational database systems. These, together with NewSQL technologies, will not be examined further in this research. 451research.com created a map, shown in figure 2.12, with a lot of data platform providers [7]. The architecture of database systems is the subject of full-fledged research and books in itself. The components of a DBMS and the algorithms and data structures used for storage and query optimisation are crucial elements of every database system. In this research these details won't be examined. For more information refer to the article “Architecture of a Database System” by Hellerstein et al., who performed comprehensive research [26].

Figure 2.10: Disadvantages of relational database systems. An extra mapping step might be needed from data structure in the application to data structure on the database [6].

Figure 2.11: Most popular technologies in March 2018¹⁶

¹⁶ https://db-engines.com/en/ranking

2.5.3 Relational database management systems

Relational database systems use tables, consisting of rows and columns, to store data. Typically, SQL (Structured Query Language) is used to insert or retrieve information. Table 2.3 gives an overview of some frequently used terms and their meaning. Examples of big enterprise relational database management systems (RDBMS) are Oracle or Microsoft SQL Server. Free to use, sometimes open source, examples are MySQL, PostgreSQL, MariaDB or SQLite. Figure 2.13 illustrates the relational aspect in an example entity–relationship diagram.
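To make the terms table, row, primary key and foreign key concrete, the following sketch uses the SQLite module of the Python standard library. It loosely follows two of the entities of figure 2.13 (a reduced Users and PostalCodes table); the inserted sample rows are invented for illustration only.

```python
import sqlite3

# An in-memory database is enough for a demonstration.
conn = sqlite3.connect(':memory:')
conn.execute('PRAGMA foreign_keys = ON')
conn.executescript("""
    CREATE TABLE PostalCodes (
        postalCode TEXT PRIMARY KEY,
        cityName   TEXT NOT NULL
    );
    CREATE TABLE Users (
        userId     INTEGER PRIMARY KEY,
        postalCode TEXT REFERENCES PostalCodes(postalCode),
        firstName  TEXT,
        lastName   TEXT
    );
""")

# Hypothetical sample data: one record per table.
conn.execute("INSERT INTO PostalCodes VALUES ('9000', 'Ghent')")
conn.execute("INSERT INTO Users VALUES (1, '9000', 'Ada', 'Lovelace')")

# The foreign key links both tables, so a join can combine their columns.
rows = conn.execute("""
    SELECT u.firstName, u.lastName, p.cityName
    FROM Users AS u
    JOIN PostalCodes AS p ON p.postalCode = u.postalCode
""").fetchall()
print(rows)
```

The join is what the relational model is built around: the data is stored once per table and recombined at query time through keys.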

2.5.4 Non-relational database systems

Non-relational (NoSQL) databases come with multiple data models. An overview is given in table 2.4, with some example technologies listed as well. These data models are the main reason why one would choose a specific technology. A key-value store might be interesting for session data, recommendation systems built on top of social graphs might benefit from a graph store, while for inventories traditional relational database systems might still be best.

Figure 2.12: Data platform map in 2016, illustrating the different types of database systems and technologies [7].

Table 2.3: Relational database and SQL terms

Relational database term | SQL term | Meaning
Relation | Table | Structured collection consisting of columns and rows
Record | Row | Collection of fields, representing a single item
Field | Column | One specific, labeled attribute of a record
Unique key | Primary key | Unique defined attribute

[Entity–relationship diagram. Users (PK userId, FK postalCode, firstName, lastName); Orders (PK orderId, FK userId, FK itemId); Items (PK itemId, itemName, price, availableUnits); PostalCodes (PK postalCode, cityName, FK provinceId); Provinces (PK provinceId, provinceName).]

Figure 2.13: Example of an entity–relationship model illustrating how relational systems use tables and relations to store and link data.

Table 2.4: Overview of non-relational (NoSQL) technologies

Data model | Examples
Key-Value Databases | Redis, Memcached
Document Databases | MongoDB, CouchDB
Column-Family Store | Cassandra, HBase
Graph Databases | Neo4J, FlockDB

3 Functional specification

In this chapter a conceptual approach and outline is given of the generic database concept. First, some terms are clarified for a proper understanding with the help of the TOSCA standard. Next, the generic database concept is discussed using an example use case. Afterwards, a clear definition illustrates that the generic database only works on the operational side of services, leading to an explanation about certain design choices. Finally, some caveats wrap up this chapter.


Table 3.1: Relation between the conceptual terms and how it is visualised.

Conceptual | Visualised | Meaning | Remarks
Application model | Graph | The full software stack with all its components and underlying relations | A full ecosystem of software components.
Service/Application | Node | A software component providing functionality | Acts as a self-providing virtual administrator.
Relation | Vertex | The relation between one or more services | This often indicates shared data between the services.

3.1 Terms and visualisations

3.1.1 Application modelling

The idea of application modelling will be used to visualise software stacks or infrastructure ecosystems. Graphs are used in computer science and mathematics to represent entities that are related to each other in one way or another. Applications or services are represented by nodes, while the connections between them (called vertices in table 3.1, edges in standard graph terminology) represent the relations, which typically denote shared data. Table 3.1 summarises the different terms used interchangeably in the following sections. Unless otherwise stated, each graph represents an application model that omits entities such as machines or operating systems. An example of such an application model is given in figure 3.1, where a Wordpress application needs a database to work with. The graph therefore represents two services: the Wordpress application itself and a database technology service (e.g. MySQL) that can provide a database. The relationship between the two applications denotes a shared entity, in this case a database which has several attributes such as the databasename, the hostname, the username, the password and the port number. In other words, the details necessary to establish a proper connection to the database.

[Diagram: the nodes wordpress and mysql are connected by a relation labelled "database", which carries the attributes databasename, hostname, username, password and port.]

Figure 3.1: Example application model of the Wordpress and MySQL services.
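The application model of figure 3.1 can also be written down as a small data structure. The sketch below is one possible representation and not part of Juju or TOSCA: the classes `Relation` and `ApplicationModel` and all attribute values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Relation:
    """A link between two services, carrying the shared entity's attributes."""
    requirer: str
    provider: str
    attributes: dict = field(default_factory=dict)


@dataclass
class ApplicationModel:
    """Graph of services (nodes) and the relations between them."""
    services: set = field(default_factory=set)
    relations: list = field(default_factory=list)

    def relate(self, requirer, provider, **attributes):
        self.services.update({requirer, provider})
        self.relations.append(Relation(requirer, provider, attributes))


model = ApplicationModel()
# Placeholder connection details; real values are produced at deploy time.
model.relate('wordpress', 'mysql',
             databasename='wordpress', hostname='10.0.0.5',
             username='wp', password='secret', port=3306)

relation = model.relations[0]
print(sorted(model.services))
print(sorted(relation.attributes))
```

Machines and operating systems are deliberately absent, mirroring the simplification made in the graphs of this chapter.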

3.1.2 OASIS TOSCA

Section 2.3 gave a brief introduction to the TOSCA standard language [8], explaining concepts such as node templates, relationship templates and topology templates to describe topologies of cloud based web services. Note the similarities (nodes and relations) between the application model shown before and the TOSCA definitions. The standard, however, goes a little further in describing the microservices architecture. Things like operating systems, scripts to populate databases, and whether hosts are containers or not are also described in the standard. The simplified graphs in the following sections use the same concepts and visual guidelines as TOSCA but, for the sake of simplicity, characteristics that are free to choose, such as the operating system, are left out of the graphs. In addition, the YAML code as presented in TOSCA is also omitted as it gives no additional benefit in this case.

Figure 3.2, directly taken from the TOSCA Simple Profile document [8], shows an example of a logical diagram meeting the requirements of the TOSCA standard. Note the three different nodes, each with a “Capabilities” and optional “Requirements” section. These characteristics work like Lego pieces, offering and requiring structures to fit together. This means that a node requiring “X” can hook into a node offering (capabilities) “X”. This is illustrated in figure 3.2 for the MySQL host: the Database.MySQL node has this information in the requirements section, whereas for the DBMS.MySQL node this is provided through the capabilities section.

Figure 3.1 is based on the same concept. The Wordpress node has a “requires” which is identical to the “capabilities” of the MySQL node. Nodes that indicate the host or DBMS system are omitted for clarity, and the attributes of the database connection are put on top of the relation. This leads to smaller and clearer graphs with the same conceptual idea as described in the TOSCA standard.
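The Lego-like matching of requirements and capabilities boils down to a set intersection. The following sketch shows that idea under simplifying assumptions: the `Node` class and the `can_relate` helper are hypothetical, and the labels 'database', 'website' and 'reverseproxy' are plain strings rather than TOSCA type names.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capabilities: set = field(default_factory=set)
    requirements: set = field(default_factory=set)


def can_relate(requirer, provider):
    """Return the requirements of `requirer` that `provider` can satisfy."""
    return requirer.requirements & provider.capabilities


wordpress = Node('wordpress', requirements={'database'})
mysql = Node('mysql', capabilities={'database'})
haproxy = Node('haproxy', capabilities={'website'},
               requirements={'reverseproxy'})

print(can_relate(wordpress, mysql))    # a relation is possible
print(can_relate(wordpress, haproxy))  # nothing matches
```

A non-empty result means the two nodes fit together; an empty set means no relation can be drawn between them.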

3.2 Example use case: company X

Before properly defining the generic database concept, an example use case will be examined. Company X decides to invest in the creation of a web shop. The developers creating the online application decided to use two different database technologies. All user-related information of the customers will be stored in a PostgreSQL database and all the available items of the shop will be saved in a MySQL one. Because company X wants to analyse the web shop extensively, they also decide to create an application that will perform a statistical analysis. The database containing all user-related information needs to be accessible for this application as well. From now on the web shop web service will be referred to as “webshop” and the statistical application as “data-app”. The two databases are named “users” and “items” respectively. Figure 3.3 shows a simplified OASIS TOSCA application model of this use case. The graph shows that the (topology) model consists of four nodes and three relationships in this case.

Figure 3.2: Logical diagram example of the TOSCA standard showing 3 nodes connected through “HostedOn”-relations. Nodes are defined by a name and a type and typically have a “properties” and a “capabilities” section. Some nodes also have a requirements section indicating what they need to function. Two nodes can have a relation if the requirements of one conform to the capabilities of another. [8]

[Diagram: the nodes data-app and webapp are both connected to db:users, and webapp is also connected to db:items (mysql). Each relation carries the attributes databasename, username, password and port.]

Figure 3.3: Application model of use case: company X

The different steps for setting up this use case can be summarised as follows:

1. Creation of the webshop and data-app applications.

2. Deployment and configuration of a PostgreSQL and a MySQL server.

3. Creation of both the users and items databases.

4. Deployment and configuration (including database connection details) of the webshop and data-app applications.
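The topology of this use case is small enough to check by hand, but writing it down as data makes the counts and the shared database explicit. The sketch below is illustrative only; the variable names and the lower-case technology labels are assumptions.

```python
# Relations of the company X use case: (requiring application, database).
relations = [
    ('webshop', 'users'),
    ('webshop', 'items'),
    ('data-app', 'users'),
]
# Database technology chosen per database in the use case.
technology = {'users': 'postgresql', 'items': 'mysql'}

applications = sorted({app for app, _ in relations})
databases = sorted({db for _, db in relations})
node_count = len(applications) + len(databases)

# Databases that more than one application connects to.
shared = sorted(db for db in databases
                if sum(1 for _, d in relations if d == db) > 1)

# Servers to deploy in step 2: one per distinct technology.
servers = sorted(set(technology.values()))

print(node_count, len(relations))
print(shared)
print(servers)
```

The users database is the only one shared by two applications, which is exactly the situation the generic database has to support.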

3.3 The generic database

3.3.1 Definition

Section 1.2 introduced the concept of virtual administrators. The idea of a tool that would automate most manual processes concerning system administration is the fundamental starting point of the generic database concept. A rather abstract definition of the generic database could therefore be:

“A generic database is a virtual administrator that handles the operational tasks of a database administrator, such as setting up a database server, creating databases and sharing connection details, regardless of the database technology.”

In other words, the generic database automates all operational steps from the moment a request is made. When these operational steps are finished, the generic database is no longer generic but becomes concrete and holds certain properties such as databasename, username, password and the port number. The name “generic” refers to the database’s ability to offer support for polyglot heterogeneous database technologies. Note the similarities with the “tosca.nodes.Database” definition from the TOSCA standard in the OASIS TOSCA Simple Profile. There are, however, some key differences between the database as conceptually defined in TOSCA and the generic database presented here. The TOSCA database assumes it is “hosted on” a node of the type “RDBMS”, illustrating that the database knows at any time what type of database technology is used. This is not the case with the generic database. In addition, the generic database is first considered to be a service with the ability to provide a database. Only after a request is the generic database considered a concrete database, just like the TOSCA database.

Table 3.2: Different (possible) definitions for the generic database concept

Definition | Name | What | Remarks
Def. 1 | The atomic generic database | One service equals one database | This is the chosen definition
Def. 2 | Multiple types of generic databases in a hierarchy | One service equals one database but there is a distinction between services | There is no benefit or reason to use a -generic-database over a generic-database
Def. 3 | Generic database manager | The idea that a requesting service needs n databases. One service equals n databases | Possible, create a new service that uses Def. 1
Def. 4 | Global generic database manager | The idea that every database is represented by the service. One service equals all databases | Also possible with Def. 1
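The transition from generic to concrete can be pictured as a tiny state machine. The sketch below is a toy model under stated assumptions and not the implementation of this research: the class `GenericDatabase`, its `request` method, the generated username and the placeholder password are hypothetical, and the port numbers are simply the usual defaults of PostgreSQL and MySQL.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ConnectionDetails:
    databasename: str
    username: str
    password: str
    port: int


class GenericDatabase:
    """Toy model: generic until a request is fulfilled, then concrete."""

    # Default ports of the technologies supported by this sketch.
    PORTS = {'postgresql': 5432, 'mysql': 3306}

    def __init__(self):
        self.technology: Optional[str] = None
        self.details: Optional[ConnectionDetails] = None

    @property
    def is_generic(self):
        return self.details is None

    def request(self, technology, databasename):
        """Stand-in for the operational steps triggered by a request."""
        if not self.is_generic:
            raise RuntimeError('already bound to a concrete database')
        if technology not in self.PORTS:
            raise ValueError('unsupported technology: ' + technology)
        self.technology = technology
        self.details = ConnectionDetails(
            databasename=databasename,
            username=databasename + '_user',
            password='change-me',
            port=self.PORTS[technology],
        )
        return self.details


users_db = GenericDatabase()
was_generic = users_db.is_generic
details = users_db.request('postgresql', 'users')
print(was_generic, users_db.is_generic, details.port)
```

Refusing a second request reflects the atomic definition: one service represents either no database or exactly one concrete database.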

3.3.2 Design choices

When designing infrastructures, services or applications, certain choices need to be made, such as what the service does and what it does not do. These choices are crucial and often determine the usage, capabilities and limitations of a certain service. The generic database service, as defined in the previous section, is considered to represent either no database (still generic and available to fulfil a request) or a single concrete database. This choice for an atomic-like structure came from the idea that all other definitions were either meaningless or still possible with the generic database as is. Four approaches were examined and are summarised in table 3.2.

The second definition needs a bit of explanation, as it sounds more complex than initially intended. One of the first ideas when approaching the generic database concept was to look at encapsulation and inheritance from object-oriented programming. Database technologies could be put together in a hierarchy, as shown in figure 3.4 in the form of a small example. One line of thinking was to start at the bottom and add more and more support by creating new services, or to start at the top and make sure all necessary features were present. This approach was not researched further, as there is no reason why anyone would want to use the NoSQL generic database service over the generic database service. Additionally, it would not be interesting to have a list of slightly similar but still different generic database services.

[Figure 3.4: tree diagram. Level 1: Generic database. Level 2: SQL Generic Database, NoSQL Generic Database. Level 3: MySQL Database, PostgreSQL Database, SQL Enterprise Generic Database, MongoDB Database, Cassandra Database. Level 4: Oracle Database, SQL Server Database.]

Figure 3.4: Example hierarchy or categorisation of database technologies. The root of the tree is the most generic database whereas the leaves represent technology-specific databases.

The generic database manager definitions are not useless as they tackle the generic database service from another perspective. In these definitions the service itself represents a database administrator that will make sure a database is arranged and potentially allows more functionality. This is another approach towards the generic database with another semantic meaning. The choice was made to not implement the generic database service as such because in this definition it would not be possible to model the database as is. In addition, a new second service could be created that would use the atomic generic database (from now on generic database) service.

The generic database is therefore an intermediate/proxy service between a requesting service and a providing database technology service. When designing an application model, any requesting service or application in need of a database can connect to the generic database without any constraints. The generic database takes care of all necessary actions such as creating the database and sharing the connection details. The generic database is therefore by definition a virtual administrator, but the semantic meaning of the service is only the representation of a database and not of a database administrator.

3.4 Possible scenarios

The generic database represents a single database. This means that if an application needs ten different databases, ten different generic databases will exist in the application model. This results in the following scenarios:

• Scenario A: A one-on-one relation between requester node and generic database node. This is the most trivial graph. In this case one service needs one database.

• Scenario B: An n-on-one relation between requester node and generic database node. In this case multiple requester nodes want a connection to one and the same generic database. This means that one requesting service made a request for a database and n other requesting services want to connect to this very database. The connection details are therefore the same.

• Scenario C: A one-on-n relation between requester node and generic database node. This scenario is reached when one requesting service needs multiple databases.

• Scenario D: An n-on-n relation between requester node and generic database node. This is a combination of the previous two scenarios. A requesting service needs multiple databases and these databases are also used by various other services.

3.5 Use case revisited

The use case of company X can be remodelled with generic databases. This use case is an example of scenario D. The “users” database needs to be accessible to two services (the webshop and data-app applications) and the webshop requires two databases (users and items). Creating the application model with generic databases results in a graph as shown in figure 3.5.
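The scenarios of section 3.4 can be told apart by counting relations in the application model. The following is a minimal sketch under assumptions: the function name `classify_scenario` and the edge-list representation are hypothetical and are not part of the thesis implementation.

```python
from collections import defaultdict


def classify_scenario(relations):
    """Classify an application model as scenario A, B, C or D.

    `relations` is an iterable of (requester, generic_database) pairs,
    an assumed edge-list representation of the application model.
    """
    dbs_per_requester = defaultdict(set)
    requesters_per_db = defaultdict(set)
    for requester, database in relations:
        dbs_per_requester[requester].add(database)
        requesters_per_db[database].add(requester)
    if not dbs_per_requester:
        raise ValueError("the model contains no relations")

    # A requester with several databases gives a one-on-n relation.
    one_on_n = any(len(dbs) > 1 for dbs in dbs_per_requester.values())
    # A database with several requesters gives an n-on-one relation.
    n_on_one = any(len(reqs) > 1 for reqs in requesters_per_db.values())

    if one_on_n and n_on_one:
        return "D"
    if one_on_n:
        return "C"
    if n_on_one:
        return "B"
    return "A"


# Use case of company X: the webshop needs two databases and the
# "users" database is shared with the data analysis application.
company_x = [
    ("webshop", "users"),
    ("webshop", "items"),
    ("data-app", "users"),
]
print(classify_scenario(company_x))
```

Each generic database still represents exactly one database in this sketch; the scenario only follows from how many relations start or end at each node.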

3.6 Caveats

This chapter introduced the idea of a generic database virtual administrator in a descriptive manner. The idea of a virtual administrator taking care of all operational tasks for a service (or another virtual administrator) offers workflows that are easy to use. There are however some important remarks and caveats concerning the generic database concept.

It is important to realise that the generic database concept only works on the operational level of the infrastructure. This chapter used application models and graphs to illustrate workflows and communication models. They only represent the “operations” side of applications, meaning that the generic database is not relevant in the application topology. Once a requesting node, such as the webshop application in the use case, receives the connection details of a database, all interactions are directly with the database and not through the generic database, in contrast to what might seem intuitive when looking at the graphs. As shown in figure 3.6, there is a distinct difference between the operations perspective and the application workflow perspective. A developer or an application does not need any knowledge of the generic database. No special actions or setups should be required, as the generic database service is not present in the workflow of the applications using it. This is an interesting characteristic, but it might be counterintuitive at first. Once again there is a distinction between the operations side and the inner workings of the applications.

[Figure 3.5: application model graph. Recovered labels: data-app, webapp, generic database (users), generic database (items), postgresql, mysql; relations db:users and db:items; properties databasename, username, password, port.]

Figure 3.5: Application model of use case: company X with generic databases.

The formal definition of the generic database service does not describe how the setup and underlying relation between database technologies is done. If a generic database represents a MySQL database after a request, the generic database needs access to or communication with a MySQL service. The way in which this is achieved can be chosen freely. Maybe the generic database service and MySQL are both hosted on the same machine, maybe in different containers, or maybe they are on two completely different and independent machines. A fully-fledged generic database service would allow all these possibilities.

A requesting service still requests a database for a specific technology. This means that the virtual administrator of this requesting application needs knowledge of that specific technology. An interesting feature would therefore be a service that helps teams decide what type of database technology would fit their application best. Such a feature would look at certain requirements or needs and determine what database technology should be used. This could be implemented on top of the generic database service or as a service that communicates with the generic database. With such a service the database-specific knowledge or requirements of an application would no longer be needed and everything would be automated by virtual administrators. This however lies beyond the scope of this research.

[Figure 3.6: two panels. Operations: requesting service, request, generic database service, request, database technology service. Application: requesting service, direct connection, database technology service.]

Figure 3.6: The generic database service is only functional and present on the operations side of the application stack. The application itself directly connects to the database and is not aware of the generic database service.

The generic database as proposed here deliberately does not limit the features of the generic database. In the section about design choices it is stated that the generic database at least needs a way of requesting databases and their properties such as hostname, databasename, password and port. But backups, schemas, views, triggers or other database-defined elements might also be requirements of a user. In addition, it might be interesting for a user to have SQL-query support directly from the generic database service. All these features reside in a grey zone, as they operate on the management of databases and not on the request level. The atomic generic database service was defined as the representation of a single database to keep things as clear and simple as possible. Because the previously mentioned features are useful, it would be interesting to have an additional service, an extra node in the graph, with a different relation that would offer these features. This way the generic database is used for the deployment part of the database and the other service, for example the generic database dba service, is responsible for the operational tasks on the database. Depending on the use case, teams could opt for implementing them together.

Finally, the ease of use towards the user is another key element for this service. Even though the service adds new components to the application model, the end model might end up being clearer. As every database is modelled in the application model, it becomes clear how applications are related to different databases.

4 Technical implementation

The technical implementation of a tangible product is the focus of this chapter. The concept of the generic database as described in chapter 3 will be implemented in the application modelling tool Juju. This chapter follows a structure similar to that of the previous one. First the different terms and visualisations of the conceptual specifications will be translated to their Juju counterparts. Next, the use case of company X will be briefly described as if an implementation were done without the use of the generic database concept. Afterwards, an example implementation will explain how the theoretical ideas can be constructed in Juju, along with certain design choices. The same use case will then be revisited once again.


4.1 Juju specific terms

This chapter will use the terms as defined by Juju. Refer to chapter 2 for an in-depth explanation. Terms used in chapter 3 are mapped to their Juju names and summarised in table 4.1.

Table 4.1: Summary of Juju terms and their meaning that are relevant in this chapter.

Conceptual | Visualised | Juju | Meaning
Application model | Graph | Model | The workspace Juju uses to deploy charms.
Service or Application | Node | Charm | All information to deploy and configure a service or application.
Relation | Vertex | Relation (interface-layer) | This often indicates shared data between the services using endpoints.
- | - | Bundle | A collection of charms. Makes it possible to deploy a model at once.

4.2 Example use case: company X

The use case of a webshop in need of two databases, one of which is also used by another data analysis application, can be implemented in Juju but not in a fully automated way. A Juju user who wanted to implement this use case could tackle the problem as follows:

1. Determining what database technologies are needed.

2. Creating the webshop and data analysis app.

3. Creating 2 charms: one for the webshop and one for the data analysis applications.

4. Deploying the (existing) PostgreSQL and MongoDB charms from the charm store.

5. Deploying the (self-written) charms for the webshop and data analysis applications.

6. Adding relations accordingly.

7. Copying the connection details through manual intervention, to make sure multiple charms can access one and the same database.

Note the final step. At design level in Juju it is impossible for multiple charms to access the same database. This is a result of the implementation of the existing interface layers and database technology charm layers. They are configured in such a way that new incoming relations create new databases. The whole goal of an easy-to-use application modelling tool is undermined when manual steps are still required for multiple charms to access the same database. The generic database charm (generic-database-charm from now on) should resolve this issue.

4.3 The generic-database-charm

4.3.1 Design Choices

When creating a charm, a clear concept of the service is needed for optimal choices. One of the first questions that arises is what type of charm the generic-database-charm needs to be. Juju offers support for regular charms or subordinates. Since the generic database represents a database, a subordinate (see section 2.4.1) seems the suitable choice. There are however reasons why a regular charm is more interesting and therefore chosen in this implementation:

• Subordinates only exist for the lifetime of their principal service, that is, the regular charm in whose container the subordinate service runs. This means that it is impossible to model the database without a requesting charm. In the use case of company X this would mean that no generic-database-charm can be created without the web application and/or data analysis app. In addition, the generic-database-charm would be gone if the principal service were destroyed.

• The use of subordinates would also result in another interesting feature being lost. Regular charms can be on “stand-by”. This means that they can be deployed and be ready for use as soon as a requesting service is in need of a database.

4.3.2 Other possibilities

This section will cover other perspectives of the generic-database-charm. In the functional specification some different definitions were summarised in table 3.2. This already showed that different perspectives or use cases can result in different semantic meanings of the generic database service. The atomic generic database is defined as a service that represents one database, either generic or concrete. In some cases it might be interesting for the generic database service to also provide features such as creating users, running SQL queries or performing backups. In this scenario the service would act more as a generic database administrator. A possible approach is the creation of a new charm and interface layer.

[Figure 4.1: application model graph. Recovered labels: webapp, generic-dba, generic-database, mysql; relations generic-database-dba, generic-database, mysql-shared.]

Figure 4.1: Application model with both the generic database charm and the generic database dba charm.

The generic database administrator charm could use the generic database charm, as shown in figure 4.1, or both could be merged into one. Note that the new interface layer focusses on the operational tasks of an administrator, while the generic database interface layer simply provides a database to work with. A possible use case would be populating a database as soon as the webapp is deployed. This way multiple generic databases can exist in the model whereas only one administrator offers functionality to these databases. This service was not implemented further, as the scope of this research is limited to a service that represents only a database.

4.3.3 The generic database under the hood

The following diagrams illustrate how the generic-database-charm works. Figure 4.2 is a BPMN (Business Process Model and Notation) model that shows the workflow when setting up the charms. The whole process starts when the webapp wants a database and ends when it receives the connection details. The webapp charm renders a configuration file ready to be used by the application. Note that a lot of events happen asynchronously, as shown by the End objects, meaning that the activities are non-blocking.

The BPMN model shows the initial thought process behind the intended implementation of the use case that was presented previously. There is however one part that did not end up in the implemented version of the generic-database-charm. In the lane of the generic database, a gateway asks whether the database technology service is available or not. In the implementation it is a requirement for the database technology charms to be deployed and connected to the generic-database-charm. With the help of the Libjuju library (https://github.com/juju/python-libjuju) this precondition could become obsolete, but priority was given to other things, such as implementing more technologies as completely as possible.

[Figure 4.2: BPMN workflow of the generic database service with three lanes: Webapp, Generic Database service and Database Technology. Recovered gateways in the Generic Database lane: "Does generic database already represent a database?", "Is request correct db?", "Is db service available?".]

Figure 4.2: BPMN diagram of the generic database charm concept.

[Figure 4.3: sequence diagram. Participants: System Administrator, Requesting Service, Generic Database, Database Technology. Messages: creates, requests db, alt [concrete = false]: formally requests db, setups db, share connection details for db; share connection details for db, is operational.]

Figure 4.3: Sequence diagram of the implemented generic database service. It is assumed that the database technology service is available. If the database is not concrete, a request is sent to set up the database. In the other scenario the generic database already knows the connection details.

Figure 4.3 shows a similar workflow in the form of a sequence diagram. This diagram makes clear what the actual workflow looks like once the database request is made. A service requests a database and the generic database service provides it if it already represents a database. In the other scenario the generic database service proxies the request to the database technology service.
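The proxy behaviour of figure 4.3 can be sketched in plain Python. This is a toy model and not the charm code: the class names, the `provider` callable standing in for the database technology service, and all connection values are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class ConnectionDetails:
    host: str
    port: int
    databasename: str
    username: str
    password: str


class GenericDatabase:
    """Toy model of the generic database service (not charm code)."""

    def __init__(self, provider: Callable[[str, str], ConnectionDetails]):
        # `provider` stands in for the database technology service.
        self._provider = provider
        self._technology: Optional[str] = None
        self._details: Optional[ConnectionDetails] = None

    @property
    def concrete(self) -> bool:
        return self._details is not None

    def request(self, technology: str, databasename: str) -> ConnectionDetails:
        if not self.concrete:
            # alt [concrete = false]: proxy the request to the technology.
            self._details = self._provider(technology, databasename)
            self._technology = technology
        elif (technology, databasename) != (
            self._technology,
            self._details.databasename,
        ):
            # The generic database already represents another database.
            raise ValueError("request does not match the represented database")
        return self._details


created = []


def fake_technology_service(technology: str, databasename: str) -> ConnectionDetails:
    """Hypothetical stand-in that 'creates' a database and returns details."""
    created.append((technology, databasename))
    return ConnectionDetails("10.0.0.5", 5432, databasename, "app", "secret")


users_db = GenericDatabase(fake_technology_service)
webshop_details = users_db.request("postgresql", "users")
data_app_details = users_db.request("postgresql", "users")
print(webshop_details == data_app_details, len(created))
```

Both requesters receive the same connection details, while the stand-in technology service is only asked once to create the database.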

The reactive framework and the endpoint pattern in Juju use flags as their fundamental communication method. Flags signal information to which handlers can react. This mechanism functions as a trigger that fires when certain conditions are met, in other words when certain flags are set. Thanks to these flags, endpoints can be obtained, which provide objects to work with. These endpoints represent objects that are defined in the interface layer. Listing 6 illustrates in pseudocode how this mechanism works. Finally, figure 4.4 illustrates how the interface layer of the generic database functions as an API for both the webapp charm and the generic database charm. Interface layers typically have a requires and a provides side.

    WHEN FLAG = "generic-database.ready" {
        endpoint = endpoint_from_flag("generic-database.ready")
        endpoint.request("technology", "mydatabasename")
    }

    WHEN FLAG = "generic-database.mysql.ready" {
        endpoint = endpoint_from_flag("generic-database.mysql.ready")
        connection_details = endpoint.share_details()
        render(configfile, connection_details)
    }

Listing 6: Pseudocode illustrating how flags and the endpoint-pattern are used in the reactive framework of Juju.

[Figure 4.4: charm layer "webapp" (requires side), interface layer "generic-database", charm layer "generic database" (provides side). Flow: "wants database", request, "gets request", "ensures database exists and is available"; "provides details", share_details, "gets database".]

Figure 4.4: Visualisation of the interface layer of the generic database. The black nodes can be seen as endpoints in charms. The interface layer is the API that tells how the charms should communicate.
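The flag mechanism of Listing 6 can be imitated with a few lines of plain Python. The sketch below is a simplified simulation and not the charms.reactive API: the `FlagBus` class is hypothetical, and each handler runs only once here, whereas a real reactive framework may re-evaluate handlers over time.

```python
class FlagBus:
    """Tiny stand-in for a flag-driven dispatcher (not charms.reactive)."""

    def __init__(self):
        self._flags = set()
        self._handlers = []  # list of (required_flags, handler)
        self._fired = set()  # handlers that already ran

    def when(self, *flags):
        """Decorator: register a handler that needs all `flags` set."""
        def register(handler):
            self._handlers.append((frozenset(flags), handler))
            return handler
        return register

    def set_flag(self, flag):
        self._flags.add(flag)
        self.dispatch()

    def dispatch(self):
        # Run every handler whose flags are all set, once each.
        progress = True
        while progress:
            progress = False
            for required, handler in self._handlers:
                if handler not in self._fired and required <= self._flags:
                    self._fired.add(handler)
                    handler()
                    progress = True


bus = FlagBus()
log = []


@bus.when("generic-database.ready")
def request_database():
    # Corresponds to endpoint.request("technology", "mydatabasename").
    log.append("request mysql/mydatabasename")


@bus.when("generic-database.mysql.ready")
def render_config():
    # Corresponds to share_details() followed by render(...).
    log.append("render config file")


bus.set_flag("generic-database.ready")        # relation is joined
bus.set_flag("generic-database.mysql.ready")  # provider shared the details
print(log)
```

The handlers never call each other; they only react to flags, which is what keeps the requesting and providing side decoupled.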

4.4 Use case revisited

With the help of the generic database charm the new (manual) approach is as follows:

1. Determining what database technologies are needed.

2. Creating the webshop and data analysis applications.

3. Creating 2 charms: one for the webshop and one for the data analysis application.

4. Deploying the (self-written) charms for the webshop and data analysis application.

5. Deploying the (existing) PostgreSQL and MongoDB charms from the charm store (can be made obsolete with the help of the Libjuju library).

6. Adding relations accordingly.

Figure 4.5: Application model of the use case as shown in the Juju GUI service. Two generic database services represent two databases used by a webshop and a data analysis application.

Using a preconfigured bundle reduces this to:

1. Determining what database technologies are needed.

2. Creating the webshop and data analysis applications.

3. Creating 2 charms: one for the webshop and one for the data analysis application.

4. Deploying the generic database bundle.

5. Adding relations accordingly.
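What such a preconfigured bundle could contain is sketched below as a Python dictionary using the `applications` and `relations` keys of Juju bundles. The charm paths, application names and the helper `build_bundle` are placeholders for illustration and are not taken from the repositories of this thesis; relations are written without explicit endpoints, which a real bundle may need.

```python
import json


def build_bundle(databases):
    """Return a bundle-like dict for a set of generic databases.

    `databases` maps a database name to a (technology, requesters) pair.
    Charm paths and application names are placeholders chosen for this
    illustration.
    """
    applications = {}
    relations = []
    for name, (technology, requesters) in sorted(databases.items()):
        gdb = f"gdb-{name}"
        applications[gdb] = {"charm": "./generic-database", "num_units": 1}
        # One technology application can back several generic databases.
        applications.setdefault(technology, {"charm": technology, "num_units": 1})
        relations.append([gdb, technology])
        for requester in requesters:
            applications.setdefault(
                requester, {"charm": f"./{requester}", "num_units": 1}
            )
            relations.append([requester, gdb])
    return {"applications": applications, "relations": relations}


bundle = build_bundle({
    "users": ("postgresql", ["webshop", "data-app"]),
    "items": ("mongodb", ["webshop"]),
})
# Printed as JSON for inspection; an actual bundle is written in YAML.
print(json.dumps(bundle, indent=2))
```

Every database appears as its own generic database application, so the shared "users" database is visible as one node with two incoming relations.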

This reduces the complexity considerably. The key elements become application specific and the actual deployment steps are fully automated. Figure 4.5 shows the final model of the use case in the Juju GUI. The implementations are available at the following repositories:

• https://github.com/Ciberth/gdb-use-case

• https://github.com/Ciberth/layer-generic-database

• https://github.com/Ciberth/interface-generic-database

5 Discussion, Future work & Conclusion

In this chapter the results, limitations and possible future work are discussed. First, a comprehensive and critical look is given at the generic database service. Afterwards, the research questions and their answers are summarised. Next, the many possible optimisations and further research are mentioned. Finally, a conclusion marks the end of this thesis.


5.1 Discussion

Chapter 4 focused on a use case and tried to implement the generic database service in the service orchestration tool Juju. The Lego-like structure to hook services into each other (as shown in the TOSCA standard) is the key to a good and flexible communication model. Configuration management tools do not offer support for working at the service orchestration level and they fall short in encapsulating operations knowledge. In other words, people using configuration management tools are still required to know how the services are related to each other and how they work. Note that in Juju, modelling things that are not really services (in this case databases) is not possible either. Creating a service that does not represent a database but rather a database administrator would arguably solve this issue. It would however add more complexity and maintenance to the service, as different libraries are required to connect to all the different database technologies.

The presented generic database service was tested successfully with the use case in mind. This means that the generic database service was deployed successfully and that both the requesting webshop and the statistical analysis service correctly received the connection details for one and the same database. This was previously not possible in Juju in an automated way. The development of the service did not go flawlessly, as reactive programming requires a specific programming style. The new, state-of-the-art programming principles in the application modelling tool Juju also took some time to get used to. The documentation is not always newcomer-friendly and clear examples are hard to find.

Furthermore, there are some caveats. Because the existing interface layers of the database technology charms do not always offer all necessary capabilities, workarounds are sometimes needed to ensure that the use case can work as intended. Without these steps the connection details would be shared properly, but the requesting service (e.g. the webshop) would not be allowed to access the database. This is because the requesting charm is not directly connected to the providing charm and thus does not receive the right privileges. The PostgreSQL charm, for example, uses the pgsql interface. With the help of this interface layer it is possible to request a database. The PostgreSQL charm will create a database and edit the pg_hba.conf file to allow the generic-database-charm to access PostgreSQL. As intended, the generic-database-charm can connect to the database, but the webapp cannot, as that host is rejected. An intermediate charm called pgbouncer makes it possible to receive connection details that can be used from any host. This shows that a good security measure counteracts the generic database service and that the implemented solution loses this security feature.

Another example is the set of MySQL interfaces. Right now there are multiple interfaces that a charm author can use when working with MySQL. The “mysql” interface makes it possible to request a database, but the databasename cannot be set. The “mysql-shared” interface offers the ability to request multiple databases and to provide database names, but the same problem as with PostgreSQL occurs: the user created by the MySQL charm only has privileges for connections from the directly connected charms (hosts). Finally, the “mysql-root” interface creates a user with root privileges, but the creation of databases with a database name is not directly supported. These crucial remarks illustrate that the generic database service is more complex and less robust than it should be. A possible solution would require the interface layers of the providing database technology charms to offer the necessary features that provide more flexibility.
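The host rejection described above can be illustrated with a simplified model of host-based access control. This sketch only mimics the idea of a per-host allow list; it does not parse pg_hba.conf, and the IP addresses and the helper `host_allowed` are hypothetical.

```python
import ipaddress


def host_allowed(client_ip, allowed_networks):
    """Return True if client_ip falls inside any allowed network."""
    address = ipaddress.ip_address(client_ip)
    return any(
        address in ipaddress.ip_network(network) for network in allowed_networks
    )


# Hypothetical addresses: only the directly related charm is allowed,
# because the provider only knows the host at the other end of its relation.
generic_database_ip = "10.0.0.11"
webapp_ip = "10.0.0.12"
allowed = [f"{generic_database_ip}/32"]

print(host_allowed(generic_database_ip, allowed))  # generic-database-charm
print(host_allowed(webapp_ip, allowed))            # webapp
```

The webapp holds valid connection details but connects from a host that the provider never saw in a relation, which is why an extra interface feature or an intermediate service is needed.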

The use of charms means that right now, the proof of concept generic database is only available and useful for Juju users. Modifications to the charm require Python code and knowledge of the reactive framework. This greatly reduces the number of possible users of the generic database service. On the other hand, it might be an entry point to get started with Juju.

5.2 Answers to research questions

Is it possible, and what is needed, to create a service for the management of polyglot persistent integrations?

With the help of Juju it was possible to create a generic database service that allows polyglot persistent applications to easily request different types of databases. Adding proper support for every possible database technology is a big challenge, but the proof of concept looks promising.

In terms of application modelling, what does this service formally represent?

In the proof of concept implementation, it was chosen to create a generic database that represented a database only. This choice results in clear models. If an application is connected to 4 different generic databases, it becomes clear that the application uses 4 databases.

What problems does the service solve, and at what cost?

The generic database service offers an easy-to-use interface. This API requires a database technology and a databasename. Once the request is done, the service will ensure that the database is available and it will share the connection details automatically. Users are no longer bothered by operations tasks and can fully focus on their expertise. In addition, from a Juju perspective it now becomes possible for multiple charms to connect to one and the same database, which is impossible without the use of the generic database charm.

5.3 Future work

The presented generic database service is far from complete and requires more work. More database technologies need to be supported. Additionally, new interface layers and providing charms need to be created to realise the support for a specific database technology. In a perfect scenario, with all necessary functions on interface layers, third-party libraries (for connecting to and interacting with the databases) become unnecessary. Finally, the Libjuju library should be used, allowing the service to deploy services by itself. This ability to create standalone machines eliminates the precondition of having a database technology service ready for use.

Another interesting research topic is the flexibility of cross model relations. In Juju it is possible for relations to work across different models, even across multiple controllers. Interesting research would be to look at the possibility for the current generic database service to exist in one model and provide all necessary things to another model. This lowers the complexity even further for the model of a user application.

All the above paragraphs illustrate that the generic database service is not easily implemented, especially not as a full-fledged service.
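The Libjuju-based extension mentioned above could look roughly as follows. This is a sketch under assumptions: `model` is assumed to behave like a connected python-libjuju model object that exposes an `applications` mapping and `deploy` and `add_relation` coroutines, and the helper `ensure_technology` as well as the `FakeModel` stand-in are hypothetical, so the example runs without a Juju controller.

```python
import asyncio


async def ensure_technology(model, technology, gdb_app="generic-database"):
    """Deploy the technology charm if it is missing and relate it.

    `model` is assumed to behave like a connected python-libjuju model:
    an `applications` mapping plus `deploy` and `add_relation` coroutines.
    Returns True when a deployment was started, False otherwise.
    """
    if technology in model.applications:
        return False
    await model.deploy(technology, application_name=technology)
    await model.add_relation(gdb_app, technology)
    return True


class FakeModel:
    """In-memory stand-in so the sketch runs without a Juju controller."""

    def __init__(self, applications):
        self.applications = {name: object() for name in applications}
        self.relations = []

    async def deploy(self, entity_url, application_name=None):
        self.applications[application_name or entity_url] = object()

    async def add_relation(self, relation1, relation2):
        self.relations.append((relation1, relation2))


model = FakeModel(["generic-database"])
first = asyncio.run(ensure_technology(model, "mysql"))   # deploys and relates
second = asyncio.run(ensure_technology(model, "mysql"))  # already present
print(first, second, model.relations)
```

With such a helper the gateway "Is db service available?" from the BPMN model would no longer end in an error, since the generic database could deploy the missing technology service itself.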

Other possible future work would include research on comparing database technologies. When is it interesting to choose MySQL over PostgreSQL? In what cases is a NoSQL technology better? This would be the basis for a service that could work on top of the generic database. Such a service would remove the need for operational knowledge concerning databases by automatically making these choices for teams.

Finally, not all charms are implemented with the reactive framework, and better, more complete interface layers would also be a welcome addition. A competing tool that learns from Juju and the reactive framework, but also allows more and easier modelling, might arise in the future. As previously stated, the generic database service is only compatible with Juju. The idea behind the generic database, however, is not. Another service using different tools would be interesting as well. Configuration management tools are slowly becoming the standard, but service orchestration tools, whether or not they use configuration management tools, are certainly a good next step in system administration. In the future new tools might arise, and once they do, the concept of the generic database service is definitely worth looking into.

5.4 Conclusion

This thesis looked at service orchestration principles with the goal of creating a service that would help operations engineers, data scientists and developers when requesting databases. The service is called generic database and offers an easy-to-use interface with a lot of flexibility to support polyglot persistent application stacks. Requesting services can easily use the generic database service without the need for any operations knowledge. When formally defining this service, it soon became clear that the service could be looked at from multiple perspectives. The choice presented here used the generic database service as a database. Therefore, the only tasks of the presented generic database service are ensuring that a required database is present and that the requesting service is able to connect to it.

The implementation of the chosen generic database service was done in an application modelling tool called Juju. With the help of a specific use case, a proof of concept generic database service was created as a starting point for future implementations. Thanks to the use case, the inner workings of the generic database service could be determined and implemented accordingly. The proof of concept successfully functions as a virtual administrator, creating databases and sharing their connection details. The generic database service has a couple of weak points. This shows that a full-fledged, usable service is not easily implemented. A crucial limitation in the proof of concept is the inability of the generic database to deploy services and machines where needed. This means that database technology services are required as a precondition of the generic database service. Furthermore, support for more database technologies would make the proof of concept generic database more complete and ready for professional use.

This research focused primarily on the possibility of one service requesting a database and another one providing it, regardless of the database type or technology. This automatic support for polyglot persistent application stacks seems possible, but there is still some work to do before the proof of concept can be called a full-fledged service. With iterative steps, slowly adding new features and new database technologies, the service looks promising for any Juju user.

Bibliography

[1] S. Overflow, “Stack Overflow Developer Survey 2018,” 2018, accessed: 2018-03-20. [Online]. Available: https://insights.stackoverflow.com/survey/2018/

[2] C. Peltz, “Web services orchestration and choreography,” Computer, vol. 36, no. 10, pp. 46–52, Oct 2003.

[3] D. Weerasiri, M. Barukh, B. Benatallah, Q. Sheng, and R. Ranjan, “A taxonomy and survey of cloud resource orchestration techniques,” vol. 50, pp. 1–41, 05 2017.

[4] OpenTOSCA. (2013) TOSCA and OpenTOSCA: TOSCA Introduction and OpenTOSCA Ecosystem Overview. Accessed: 2018-03-21. [Online]. Available: https://www.slideshare.net/OpenTOSCA/tosca-and-opentosca-tosca-introduction-and-opentosca-ecosystem-overview

[5] M. Sebrechts, C. Johns, G. V. Seghbroeck, T. Wauters, B. Volckaert, and F. D. Turck, “Reusable modeling of custom-fit management workflows for cloud applications,” 2018.

[6] N. Živković. (2017) Introduction to NoSQL and Polyglot Persistence. Accessed: 2018-05-22. [Online]. Available: https://rubikscode.net/2017/07/19/introduction-to-nosql-and-polyglot-persistence/

[7] 451research. (2016) Data Platforms 2016 Map. Accessed: 2018-05-20. [Online]. Available: https://451research.com/state-of-the-database-landscape

[8] O. Committee. (2017) TOSCA Simple Profile in YAML Version 1.2. Accessed: 2018-03-21. [Online]. Available: https://docs.oasis-open.org/tosca/TOSCA-Simple-Profile-YAML/v1.2/TOSCA-Simple-Profile-YAML-v1.2.html

[9] M. Hamdaqa and L. Tahvildari, “The (5+1) architectural view model for cloud applications,” in Proceedings of 24th Annual International Conference on Computer Science and Software Engineering, ser. CASCON ’14. Riverton, NJ, USA: IBM Corp., 2014, pp. 46–60. [Online]. Available: http://dl.acm.org/citation.cfm?id=2735522.2735530

[10] M. Abramow, “How DevOps and Agile Development Can Drive Digital Transformation,” 2017, accessed: 2018-03-21. [Online]. Available: http://www.oracle.com/us/corporate/profit/big-ideas/072417-mabramow-3839318.html


[11] V. d. C. Guerra, E. Segeti, F. Hino, F. Kfouri, L. F. S. Mialaret, L. A. V. Dias, and A. M. d. Cunha, “Interdisciplinarity and agile development: A case study on graduate courses,” in Proceedings of the 2014 11th International Conference on Information Technology: New Generations, ser. ITNG ’14. Washington, DC, USA: IEEE Computer Society, 2014, pp. 622–623. [Online]. Available: http://dx.doi.org/10.1109/ITNG.2014.49

[12] R. A. Rodrigues, L. A. L. Filho, G. S. Gonçalves, L. F. S. Mialaret, A. M. da Cunha, and L. A. V. Dias, “Integrating , relational database, and the hadoop ecosystem in an interdisciplinary project involving big data and credit card transactions,” in Information Technology - New Generations, S. Latifi, Ed. Cham: Springer International Publishing, 2018, pp. 443–451.

[13] JetBrains. (2018) The State of Developer Ecosystem in 2018. Accessed: 2018-06-07. [Online]. Available: https://www.jetbrains.com/research/devecosystem-2018

[14] S. Leberknight. (2008) Polyglot Persistence. Accessed: 2018-05-20. [Online]. Available: http://www.sleberknight.com/blog/sleberkn/entry/polyglot_persistence

[15] N. Ford. (2006) Polyglot Programming. Accessed: 2018-05-20. [Online]. Available: https://memeagora.blogspot.be/2006/12/polyglot-programming.html

[16] P. J. Sadalage and M. Fowler, NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot Persistence. Pearson Education, Inc, 2013.

[17] P. Simoens, “System Design Course,” 2017, university Ghent.

[18] G. Kim, J. Humble, P. Debois, and J. Willis, The DevOps Handbook, How to create world-class agility, reliability, & security in technology organizations, 1st ed. IT Revolution Press, 2016.

[19] Mulesoft.com. (2017) Service Orchestration and SOA. Accessed: 2018-05-28. [Online]. Available: https://www.mulesoft.com/resources/esb/service-orchestration-and-soa

[20] M. Sebrechts, G. V. Seghbroeck, T. Wauters, B. Volckaert, and F. D. Turck, “Orchestrator conversation: Distributed management of cloud applications,” 2018.

[21] C. Endres, U. Breitenbücher, M. Falkenthal, O. Kopp, F. Leymann, and J. Wettinger, “Declarative vs. Imperative: Two Modeling Patterns for the Automated Deployment of Applications,” in Proceedings of the 9th International Conference on Pervasive Patterns and Applications. Xpert Publishing Services (XPS), 2017, pp. 22–27.

[22] O. Kopp, T. Binz, U. Breitenbücher, and F. Leymann, “Winery – modeling tool for TOSCA-based cloud applications,” in 11th International Conference on Service-Oriented Computing, ser. LNCS. Springer, 2013.

[23] C. Ltd. (2017) What is Juju? Accessed: 2018-03-21. [Online]. Available: https://jujucharms.com/docs/stable/about-juju

[24] E. F. Codd, “A relational model of data for large shared data banks,” Commun. ACM, vol. 13, no. 6, pp. 377–387, Jun. 1970. [Online]. Available: http://doi.acm.org/10.1145/362384.362685

[25] A. Pavlo and M. Aslett, “What’s really new with newsql?” SIGMOD Rec., vol. 45, no. 2, pp. 45–55, Sep. 2016. [Online]. Available: http://doi.acm.org/10.1145/3003665.3003674

[26] J. M. Hellerstein, M. Stonebraker, and J. Hamilton, “Architecture of a database system,” Foundations and Trends® in Databases, vol. 1, no. 2, pp. 141–259, 2007. [Online]. Available: http://dx.doi.org/10.1561/1900000002

Appendices


Appendix A - Juju Tutorial Getting started

Overview

1. Introduction

2. Requirements

3. Goal

4. Design

5. Implementation

6. Conclusion

Introduction

This guide is meant for people who want to start writing charms in the application modelling tool Juju. The reactive framework, the Endpoint pattern and interface layers provide an excellent framework to work with. It is not easy, however, for newcomers to find good examples or to find what they need in the documentation. Code gets outdated very fast and some docs provide different insights. In addition, there is the Jujucharms website with information, the reactive framework documentation, the charmhelpers documentation and the charm store to explore. I’ve realised though that the best examples are found in the GitHub repositories of experienced users. Hang out in the IRC #juju channel on freenode to find out more! All code is accessible on https://github.com/Ciberth/MP-appendix-a.

Tl;dr: This is a guide for charm authors to create charm and interface layers in the reactive framework with the use of the endpoint pattern.

Requirements

Knowledge of the basic terms¹ used in Juju and a basic understanding of hooks and their lifecycles² are useful reading before continuing. Getting started with charm development³ is another good place to start. The basics of charm development will come back in this tutorial as well.

¹ https://docs.jujucharms.com/2.3/en/juju-concepts
² https://docs.jujucharms.com/2.3/en/developer-event-cycle
³ https://docs.jujucharms.com/2.3/en/developer-getting-started

Furthermore I assume you:

• have a working Juju environment (a bootstrapped controller and a client, so that you can create models and deploy charms)

• have a JUJU_REPOSITORY directory; refer to creating a new layer (https://docs.jujucharms.com/2.3/en/developer-getting-started#creating-a-new-layer)

• have installed charm tools (https://docs.jujucharms.com/2.3/en/tools-charm-tools)

Goal

Time to tell what we are creating. Because I want to show an example of an interface layer I will be using multiple (layer) charms. In other words we will create two charms and one interface layer that makes it possible for them to have a relation. To demonstrate, we will create a charm that will act as proxy of another charm, passing some information. This shows the working of the interface layers and illustrates how one can use existing charms as well.

The use case we will implement can be summarised as follows:

1. A requesting charm called “webapp”: a webpage showing the generated data, running on top of a webserver.

2. A proxy charm, called “gdb-charm” (generic database charm).

3. The interface layer that connects the previous 2 charms, called “proxy”.

4. An existing charm and existing interface layer. The name of the interface layer is “mysql-shared” and the charm we are going to use is the “mysql” charm from the Juju charm store.

To make this a bit more concrete, we will make sure the following workflow works. It is the lifecycle from an operations perspective when setting up the charms/services.

1. The webapp performs a request to the gdb-charm.

2. The gdb-charm receives the request and proxies it to the mysql charm using the (existing) mysql-shared interface.

3. The (existing) mysql charm configures a database and shares the details with the gdb-charm.

4. The gdb-charm receives the connection details from the mysql charm and proxies them to the webapp.

5. The webapp receives the details and renders a page with this information.
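The five steps above can be sketched without Juju as plain function calls. This is only a toy simulation of the message flow: the function names, the address, the username and the password are made up for illustration and do not come from the charms.

```python
# Toy simulation of the five workflow steps; no Juju involved.
# All function names and values here are hypothetical.

def mysql_configure(databasename, username):
    """Step 3: stand-in for the mysql charm creating a database."""
    return {
        "host": "10.0.0.5",            # made-up address
        "port": "3306",
        "dbname": databasename,
        "user": username,
        "password": "example-secret",  # placeholder, not a real credential
    }


def gdb_handle_request(request):
    """Steps 2 and 4: the proxy forwards the request and relays the answer."""
    if request["technology"] != "mysql":
        raise ValueError("only mysql is supported in this sketch")
    details = mysql_configure(request["databasename"], request["username"])
    return dict(details, technology="mysql")


def webapp_render(details):
    """Step 5: the webapp turns the received details into page content."""
    return "{user}@{host}:{port}/{dbname}".format(**details)


# Step 1: the webapp issues its request.
request = {"technology": "mysql", "databasename": "mydbname", "username": "webuser"}
details = gdb_handle_request(request)
page = webapp_render(details)
print(page)  # webuser@10.0.0.5:3306/mydbname
```

In the real charms these calls are not direct: each arrow in the workflow is a piece of relation data that one side publishes and the other side reads in a later hook.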

Design

The application model looks like figure 1. There are a total of three charms and two interface layers. From right to left, the “mysql” charm and the “mysql-shared” interface are already available from the charm store⁴. The other two charm layers, webapp and gdb-charm, will be created together with the proxy interface layer.

webapp <--proxy--> gdb-charm <--mysql-shared--> mysql

Figure 1: Application model of the project. Two new charm layers will be created along with one interface layer. The “mysql” charm and “mysql-shared” interface will be reused.

Implementation

A good way to start creating a charm is the “charm create” command. It creates all files and folders needed for a charm. Once you are done editing the files, “charm build” will create the charm layers in your JUJU_REPOSITORY folder under the proper releases. The proxy interface layer must also be placed in this folder, under “interfaces”.

Take a look at figure 2. This image shows the different files of both the webapp and the gdb-charm. They both use the apache-layer to provide a website. Both charms will install a small website (adminer.php⁵) from a remote repository, as can be seen in the apache.yaml file. More importantly, the metadata file illustrates how both charms can form a relation. The gdb-charm provides a database through the proxy interface, whereas the webapp requires a database through the proxy interface. The layer file should list all used layers, starting with basic, and all used interfaces should be listed as well.
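The matching rule described above (one side provides, the other requires, and both name the same interface) can be illustrated with a short sketch. The dictionaries are hypothetical in-memory versions of the two metadata files, reduced to the fields discussed here, and the helper function is not part of Juju; Juju performs this check itself when a relation is added.

```python
# Hypothetical, reduced versions of the two metadata files as Python dicts.
gdb_metadata = {
    "name": "gdb-charm",
    "provides": {"database": {"interface": "proxy"}},
}
webapp_metadata = {
    "name": "webapp",
    "requires": {"database": {"interface": "proxy"}},
}


def matching_endpoints(provider, requirer):
    """Return (provides_endpoint, requires_endpoint) pairs that share an interface."""
    pairs = []
    for p_name, p_spec in provider.get("provides", {}).items():
        for r_name, r_spec in requirer.get("requires", {}).items():
            if p_spec["interface"] == r_spec["interface"]:
                pairs.append((p_name, r_name))
    return pairs


print(matching_endpoints(gdb_metadata, webapp_metadata))  # [('database', 'database')]
print(matching_endpoints(webapp_metadata, gdb_metadata))  # []
```

The second call returns an empty list because the roles are reversed: the webapp provides nothing, so there is no endpoint pair to relate.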

⁴ https://jujucharms.com/mysql/
⁵ https://adminer.org

Figure 2: Metadata and layer files of the two charm layers. They are the heart of the applications, allowing them to connect to each other.

The structure of an interface layer is slightly different. In the interface layer there is a requires.py and a provides.py. In these files, objects are created that can be used by the corresponding charms. The requires.py file matters for charms that list the interface layer under requires in their metadata, while provides.py offers objects and methods for charm layers that provide the interface layer. Do note that the interface layer also has an interface.yaml, which looks like listing 7. The name field in this file determines the name of the interface layer.

name: proxy
summary: Example interface that proxies mysql data
maintainer: user

Listing 7: Interface.yaml file of the proxy interface layer

Now it is time to look at the fundamental core of a charm layer. When using the reactive framework a folder called reactive is created with a python file in it. In this file we create handlers with decorators to react to certain conditions or events. Take a look at listing 8.

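from charmhelpers.core.hookenv import status_set
from charms.reactive import endpoint_from_flag, when, when_not


@when('endpoint.database.joined')
@when_not('endpoint.database.connected')
def request_mysql_db():
    endpoint = endpoint_from_flag('endpoint.database.joined')
    # request() in listing 9 also expects a username; this value is an example.
    endpoint.request('mysql', 'mydbname', 'mydbuser')
    status_set('maintenance', 'Requesting mysql gdb')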

Listing 8: Code of testwebapp/reactive/testwebapp.py that starts the workflow of the use case with a request for a database.
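To get a feeling for how handlers react to flags, the following toy dispatcher may help. It is emphatically not how charms.reactive is implemented: the `handler` decorator, the `dispatch` function and the `log` list are invented for this sketch, and a single decorator with two arguments stands in for the stacked `@when` and `@when_not` decorators.

```python
# Toy model of flag-driven dispatch. Every name below is made up for illustration.
active_flags = set()
handlers = []
log = []


def handler(when=(), when_not=()):
    """Register a function together with the flags it needs and the flags that block it."""
    def register(func):
        handlers.append((set(when), set(when_not), func))
        return func
    return register


def dispatch():
    """One 'hook run': call each eligible handler at most once, until nothing new fires."""
    ran = set()
    progress = True
    while progress:
        progress = False
        for needed, blocked, func in handlers:
            if func in ran:
                continue
            if needed <= active_flags and not (blocked & active_flags):
                func()
                ran.add(func)
                progress = True


@handler(when=["endpoint.database.joined"], when_not=["endpoint.database.connected"])
def request_db():
    log.append("request sent")


@handler(when=["endpoint.database.available"])
def render_config():
    log.append("config rendered")
    active_flags.add("endpoint.database.connected")


active_flags.add("endpoint.database.joined")
dispatch()   # only request_db is eligible
active_flags.add("endpoint.database.available")
dispatch()   # request_db fires again, then render_config sets 'connected'
dispatch()   # request_db is now blocked by 'connected'
print(log)
```

The second run shows why the `when_not` guard matters: as long as the blocking flag is not set, the request handler fires again in every run.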

As you can see we request the endpoint from a flag and receive an object. Afterwards we call methods on that object. This principle is realised by the interface layer. Figure 3 visualises how an interface layer connects two charms. Listing 9 illustrates how this is implemented. The {endpoint_name} will be automatically replaced by the name it was given in the metadata files of the charms that use this.

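[Figure: the “webapp” charm layer (requires side) and the “generic database” charm layer (provides side), connected through the “generic-database” interface layer. request: “wants database” (webapp) -> “gets request” and “ensures database exists and is available” (generic database). share_details: “provides details” (generic database) -> “gets database” (webapp).]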

Figure 3: Visualisation of the interface layer of the generic database. The black nodes can be seen as endpoints in charms. The interface layer is the API that tells how the charms should communicate.

from charms.reactive import Endpoint, set_flag, when


class GenericDatabaseClient(Endpoint):

    @when('endpoint.{endpoint_name}.changed')
    def _handle_technology_available(self):
        # The provider has answered once a technology is present.
        if self.technology():
            set_flag(self.expand_name('endpoint.{endpoint_name}.available'))

    def request(self, technology, databasename, username):
        # Publish the request on every relation of this endpoint.
        for relation in self.relations:
            relation.to_publish['technology'] = technology
            relation.to_publish['databasename'] = databasename
            relation.to_publish['username'] = username

    def technology(self):
        return self.all_joined_units.received['technology']

    def databasename(self):
        return self.all_joined_units.received['dbname']

    # same thing for host, port, user, password

Listing 9: Code of interfaces/proxy/requires.py
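The replacement of {endpoint_name} can be pictured with a small stand-in class. `ToyEndpoint` is hypothetical and only mimics the name expansion idea; it is not the Endpoint base class, and the second endpoint name is invented to show that the same interface code serves differently named endpoints.

```python
class ToyEndpoint:
    """Hypothetical stand-in that only mimics flag name expansion."""

    def __init__(self, endpoint_name):
        self.endpoint_name = endpoint_name

    def expand_name(self, flag):
        # Substitute the placeholder with this endpoint's own name.
        return flag.replace("{endpoint_name}", self.endpoint_name)


database = ToyEndpoint("database")
backup = ToyEndpoint("backup-db")  # a second, made-up endpoint name

print(database.expand_name("endpoint.{endpoint_name}.available"))
print(backup.expand_name("endpoint.{endpoint_name}.available"))
```

This is why the charm code in listing 8 can use the flag 'endpoint.database.joined': the charms name their endpoint "database" in their metadata.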

The provides side of the interface would look something like listing 10. Here the right flags are set to signal change and the connection details are shared over the interface in share_details.

from charms.reactive import Endpoint, set_flag, when


class GenericDatabase(Endpoint):

    @when('endpoint.{endpoint_name}.joined')
    def _handle_joined(self):
        technology = self.all_joined_units.received['technology']
        # The requires side publishes the name under 'databasename'.
        dbname = self.all_joined_units.received['databasename']
        if technology:
            flag_t = 'endpoint.{endpoint_name}.' + technology + '.requested'
            set_flag(self.expand_name(flag_t))
        if dbname:
            flag_d = 'endpoint.{endpoint_name}.' + dbname + '.requested'
            set_flag(self.expand_name(flag_d))

    def technology(self):
        return self.all_joined_units.received['technology']

    def databasename(self):
        return self.all_joined_units.received['databasename']

    def username(self):
        # Used by the gdb-charm in listing 11.
        return self.all_joined_units.received['username']

    def share_details(self, technology, host, dbname, user, password, port):
        # Publish the connection details on every relation of this endpoint.
        for relation in self.relations:
            relation.to_publish['technology'] = technology
            relation.to_publish['host'] = host
            relation.to_publish['dbname'] = dbname
            relation.to_publish['user'] = user
            relation.to_publish['password'] = password
            relation.to_publish['port'] = port

Listing 10: Code of interfaces/proxy/provides.py
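The exchange between the two sides of the interface can be sketched with plain dictionaries. This is a simplified model under stated assumptions: `ToyRelation` and the three helper functions are hypothetical, the real framework keeps relation data per unit, and the host, username and password values are placeholders.

```python
# Toy relation: each side writes to its own buffer and reads the other's.
# Plain dicts stand in for relation data; nothing here is the real API.

class ToyRelation:
    def __init__(self):
        self.requirer_data = {}  # what the webapp side publishes
        self.provider_data = {}  # what the gdb-charm side publishes


def request(relation, technology, databasename, username):
    """Requires side: publish the wish for a database."""
    relation.requirer_data.update(
        technology=technology, databasename=databasename, username=username
    )


def requested_flag(relation, endpoint_name):
    """Provides side: derive the flag name from the received technology."""
    technology = relation.requirer_data.get("technology")
    if not technology:
        return None
    return "endpoint.{}.{}.requested".format(endpoint_name, technology)


def share_details(relation, **details):
    """Provides side: publish the connection details."""
    relation.provider_data.update(details)


relation = ToyRelation()
request(relation, "mysql", "mydbname", "webuser")
flag = requested_flag(relation, "database")
share_details(relation, technology="mysql", host="10.0.0.5", dbname="mydbname",
              user="webuser", password="example-secret", port="3306")

print(flag)                              # endpoint.database.mysql.requested
print(relation.provider_data["dbname"])  # mydbname
```

Two details are worth noticing. Both sides have to agree on the key names they publish and read, and the flag name is built from the requested technology, which lets a charm react per technology.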

Next, listing 11 shows how the gdb-charm can request a database from the existing “mysql” charm over the existing “mysql-shared” interface and share the details over the proxy interface back to the webapp charm.

from charmhelpers.core.hookenv import status_set
from charmhelpers.core.templating import render
from charms.reactive import clear_flag, endpoint_from_flag, set_flag, when


@when('mysql.connected', 'endpoint.database.mysql.requested')
def request_mysql_db():
    db_request_endpoint = endpoint_from_flag('endpoint.database.mysql.requested')

    databasename = db_request_endpoint.databasename()
    username = db_request_endpoint.username()

    # Forward the request to the mysql charm over mysql-shared.
    mysql_endpoint = endpoint_from_flag('mysql.connected')
    mysql_endpoint.configure(databasename, username, prefix="gdb")

    status_set('maintenance', 'Requesting mysql db')


@when('mysql.available', 'endpoint.database.mysql.requested')
def render_mysql_config_and_share_details():

    mysql_endpoint = endpoint_from_flag('mysql.available')

    # On own apache
    render('gdb-config.j2', '/var/www/generic-database/gdb-config.html', {
        'db_master': "no-master",
        'db_pass': mysql_endpoint.password("gdb"),
        'db_dbname': mysql_endpoint.database("gdb"),
        'db_host': mysql_endpoint.db_host(),
        'db_user': mysql_endpoint.username("gdb"),
        'db_port': "3306",
    })

    # share details to consumer-app
    gdb_endpoint = endpoint_from_flag('endpoint.database.mysql.requested')

    gdb_endpoint.share_details(
        "mysql",
        mysql_endpoint.db_host(),
        mysql_endpoint.database("gdb"),
        mysql_endpoint.username("gdb"),
        mysql_endpoint.password("gdb"),
        "3306",
    )

    clear_flag('endpoint.database.mysql.requested')
    set_flag('endpoint.database.mysql.available')
    set_flag('endpoint.database.concrete')
    set_flag('restart-app')

Listing 11: Code of gdb-charm/reactive/gdb-charm.py

Finally, the webapp charm can render its own config file the same way as the gdb-charm. Listing 12 shows the final step in the workflow of a database request.

from charmhelpers.core.hookenv import status_set
from charmhelpers.core.templating import render
from charms.reactive import endpoint_from_flag, set_flag, when


@when('endpoint.database.available')
def mysql_render_config():

    mysql = endpoint_from_flag('endpoint.database.available')

    # Write the received connection details into the webapp's page.
    render('database-config.j2', '/var/www/testwebapp/database-config.html', {
        'gdb_host': mysql.host(),
        'gdb_port': mysql.port(),
        'gdb_dbname': mysql.databasename(),
        'gdb_user': mysql.user(),
        'gdb_password': mysql.password(),
    })
    status_set('maintenance', 'Rendering config file')
    set_flag('endpoint.database.connected')
    set_flag('restart-app')

Listing 12: Code of testwebapp/reactive/testwebapp.py to render config file.
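What the render call does with its context dictionary can be approximated with the standard library. The sketch below uses string.Template as a stand-in for the Jinja2 templates used by the charms, so the placeholder syntax differs from a real .j2 file. The template text and all values are invented; the actual database-config.j2 is in the repository.

```python
from string import Template

# Hypothetical template text; string.Template stands in for Jinja2 here.
TEMPLATE = Template(
    "host=$gdb_host port=$gdb_port db=$gdb_dbname user=$gdb_user"
)


def render_to_string(template, context):
    """Fill the placeholders from the context dict; raises KeyError if one is missing."""
    return template.substitute(context)


context = {
    "gdb_host": "10.0.0.5",   # made-up values
    "gdb_port": "3306",
    "gdb_dbname": "mydbname",
    "gdb_user": "webuser",
}
output = render_to_string(TEMPLATE, context)
print(output)  # host=10.0.0.5 port=3306 db=mydbname user=webuser
```

A missing key fails loudly in this sketch, which is a useful property when a handler runs before all connection details have arrived.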

To view all config files, refer to the repository available at https://github.com/Ciberth/MP-appendix-a.

Conclusion

This guide showed a basic example of how to create reactive charms (with an interface layer) and use the endpoint pattern. Note that the service created here is not really useful, as the webapp did not receive the privileges to properly access the database. To fix this issue, another interface layer needs to be used, or the feature to pass privileges should be added to the mysql-shared interface. The goal was, however, to show how to share data over the relations and to illustrate how interface layers function as proper APIs for the communication between charms. I hope you liked this short introduction, best of luck in building your charms!