Evaluation and Improvement of Application Deployment in Hybrid Edge Cloud Environment

Using OpenStack, Kubernetes, and Spinnaker

Khaled Jendi

KTH ROYAL INSTITUTE OF TECHNOLOGY SCHOOL OF ELECTRICAL ENGINEERING AND COMPUTER SCIENCE

This degree project is supported by:

Examiner (KTH):

Mihhail Matskin

Supervisor (KTH):

Ahmad Al-Shishtawy

Supervisor (Ericsson, EST):

Christopher Price

https://wiki.nordix.org/display/RE/Edge+Cloud+project

Table of Contents

Abstract
List of Code Snippets
List of Abbreviations
Chapter 1
1 Introduction
1.1.1 OpenStack
1.1.2 Kubernetes
1.1.3 System Deployment & Spinnaker
1.2 Problem
1.3 Purpose
1.4 Goal
1.4.1 Benefits, Ethics and Sustainability
1.5 Methodology / Methods
1.6 Delimitations
1.7 Outline (Disposition)
Chapter 2
2 Edge Cloud Computing Background
2.1 Traditional and Early Deployment Methods
2.2 Edge Cloud Computing
2.2.1 Cloud Computing Service Models
2.3 OpenStack
2.3.1 History
2.3.2 OpenStack Architecture
2.4 Virtualization Principles
2.4.1 Virtual Machines
2.4.2 Containers
2.5 Kubernetes
2.5.1 Kubernetes Architecture & Components
2.6 Spinnaker
2.6.1 Spinnaker Architecture & Components
2.7 Related Work
Chapter 3
3 Methodologies and Deployment Approaches
3.1 Research Approaches
3.1.1 Applied Approach
3.1.2 Qualitative Approach
3.1.3 Empirical & Analytical Approaches
3.2 Research Process
3.2.1 Solution Overview
3.2.2 Adopted Research Process
3.3 Project Management Objectives
3.3.1 Project Methods
3.3.2 Triple Constraint Theory
3.4 Testing & Evaluation
3.4.1 Deployment Using VCS
3.4.2 Zero Downtime Testing
3.4.3 Lighthouse Testing
3.5 Deployment Automation Analysis
3.6 Project Resources
4 Application Deployment in The Hybrid Cloud
4.1 Cloud Environment Deployment
4.1.1 OpenStack Deployment
4.1.2 Kubernetes Deployment
4.1.3 Spinnaker Deployment
4.1.3.1 Deploying Spinnaker Locally
4.1.3.2 Deploying Distributed Spinnaker
4.2 Test Study (Case Study)
4.2.1 Dockerfile
4.2.1.1 NodePort and LoadBalancer
4.2.2 Enabling Git Artifact Provider
4.2.3 CHATi Deployment Pipeline
Chapter 5
5 Results, Discussions and Conclusions
5.1 Performance Testing Clusters
5.2 Performance Testing Results
5.2.1 Downtime During Deployments
5.2.2 CPU Usage
5.2.3 Memory Usage
5.2.4 Message Delivery and Request Interruption
5.2.5 Request Duration
5.2.6 Zero Downtime (Distributed Spinnaker)
Chapter 6
6 Discussions and Conclusions
6.1 Discussion and Analysis
6.2 Future Work
Appendix A
Appendix B
Appendix C
References



Abstract

Traditional mechanisms for deploying different applications can be costly in terms of time and resources, especially when an application requires a specific environment to run in and has many kinds of dependencies; setting up such an application requires an expert to identify all of the required dependencies.

In addition, it is difficult to deploy applications with efficient usage of the resources available in the distributed environment of the cloud. Deploying different projects on the same resources is a challenge. To address this problem, we evaluated different deployment mechanisms using heterogeneous infrastructure-as-a-service (IaaS) offerings, OpenStack and Microsoft Azure. We also used a platform-as-a-service, Kubernetes. Finally, to automate and integrate deployments, we used Spinnaker as the continuous delivery framework.

The goal of this thesis work is to evaluate and improve different deployment mechanisms in terms of edge cloud performance. Performance depends on achieving efficient usage of cloud resources, reduced latency, scalability, replication and rolling upgrades, load balancing between data nodes, high availability, and measurable zero downtime for deployed applications.

These problems are addressed primarily by designing and deploying an infrastructure and platform in which Kubernetes (PaaS) runs on top of OpenStack (IaaS). In addition, the usage of Docker containers rather than regular virtual machines (container orchestration) has a significant impact.

The conclusion of the report demonstrates and discusses the results along with various test cases regarding the usage of different deployment methods, and presents the deployment process. It also includes suggestions for developing more reliable and secure deployments in the future on heterogeneous container orchestration infrastructure.

Keywords: Openstack; Kubernetes; Spinnaker; cloud deployment; docker container orchestration;


Sammanfattning

Traditionella mekanismer för utplacering av olika applikationer kan vara kostsamma när det gäller tid och resurser, särskilt när applikationen kräver en specifik miljö att köra i och har olika typer av beroenden; för att sätta upp en sådan applikation behövs en expert som hittar alla nödvändiga beroenden.

Dessutom är det svårt att distribuera applikationer med effektiv användning av de resurser som finns tillgängliga i molnets distribuerade miljö (Edge Cloud Computing). Att distribuera olika projekt på samma resurser är en utmaning. För att lösa detta problem utvärderade vi olika distributionsmekanismer med hjälp av heterogen infrastruktur-som-tjänst (IaaS) i form av OpenStack och Microsoft Azure. Vi använde också plattform-som-tjänst i form av Kubernetes. Slutligen, för att automatisera och integrera distributioner, använde vi Spinnaker som ramverk för kontinuerlig leverans.

Målet med detta avhandlingsarbete är att utvärdera och förbättra olika implementeringsmekanismer när det gäller Edge Cloud prestanda. Prestanda beror på att du uppnår effektiv användning av Cloud resurser, reducerar latens, skalbarhet, replikering och rullningsuppgradering, lastbalansering mellan datodenoder, hög tillgänglighet och mätning av nollstanntid för distribuerade applikationer.

Dessa problem löses i grunden genom att designa och distribuera infrastruktur och plattform där Kubernetes (PaaS) används på toppen av OpenStack (IaaS). Dessutom kommer användningen av Docker- behållare istället för vanliga virtuella maskiner (behållare orkestration) att ha en stor inverkan.

Slutsatsen av rapporten skulle visa och diskutera resultaten tillsammans med olika testfall angående användningen av olika metoder för implementering och presentationen av installationsprocessen. Det innehåller också förslag på att utveckla mer tillförlitlig och säker implementering i framtiden när den har heterogen behållareorkesteringsinfrastruktur.

Nyckelord: Openstack; Kubernetes; Spinnaker; cloud deployment; docker container orchestration;



List of Figures

Figure 1: The characteristics of cloud computing
Figure 2: Cloud Computing Service Models with BareMetal
Figure 3: IaaS Stack Components
Figure 4: Cloud OS Architecture Used for Virtualization
Figure 5: OpenStack Architecture & Services Interactions
Figure 6: Horizon (Dashboard) GUI in OpenStack
Figure 7: VM and Container Virtualization Architecture
Figure 8: Kubernetes Cluster Architecture
Figure 9: Kubernetes Web UI
Figure 10: Spinnaker Architecture [18]
Figure 11: Spinnaker Web UI
Figure 12: Scientific Research Process, inspired by [5]
Figure 13: Solution Overview
Figure 14: Adopted Research Process
Figure 15: Project Management Constraint Triangle [20]
Figure 16: OpenStack Deployment
Figure 17: An example of one edge cluster
Figure 18: Kubespray Cluster Deployment
Figure 19: Part of k8s-cluster configuration file
Figure 20: K8S cluster to be deployed on-top-of OpenStack
Figure 21: Cluster’s config file
Figure 22: Check Kubernetes cluster system nodes and pods
Figure 23: Running nodes and K8S Pods in AKS
Figure 24: Spinnaker Deployment – General Requirements
Figure 25: Spinnaker deployment into public cloud
Figure 26: Running distributed Spinnaker
Figure 27: NodePort and LoadBalancer designs
Figure 28: CHATi Web Application
Figure 29: CHATi webhook into Spinnaker
Figure 30: Personal Access Token Used with Spinnaker
Figure 31: Deployed Github provider and Authentication
Figure 32: CHATi deployment pipeline
Figure 33: Components of Configuration Stage
Figure 34: Check preconditions stage in Spinnaker
Figure 35: Deploy Stage in Spinnaker
Figure 36: Service interruption when deployment ongoing (Cluster 1)
Figure 37: CHATi up time percentage against ongoing deployments
Figure 38: Average CPU usage when performing deployments (new releases)
Figure 39: Average RAM usage when performing deployments (new releases)
Figure 40: Percentage of correctly delivered
Figure 41: Average request duration when performing a deployment (new release)
Figure 42: CHATi up time percentage against ongoing deployments while pipeline engine is killed


List of Tables

Table 1: Infrastructure Resources provided by Ericsson
Table 2: Clusters Configuration


List of Code Snippets

Code Snippet 1: Installing dependencies
Code Snippet 2: Multinode Inventory File
Code Snippet 3: Actual OpenStack Deployment
Code Snippet 4: Check deployed OpenStack
Code Snippet 5: Cluster Preparation
Code Snippet 6: Installing Halyard and Minio in the local virtual node
Code Snippet 7: Installation of Spinnaker locally
Code Snippet 8: Deploy and authorize the service account [27]
Code Snippet 9: Configure http routing for khaled-aks1 resources
Code Snippet 10: Modifying values of distributed Spinnaker’s deployment
Code Snippet 11: CHATi DockerFile
Code Snippet 12: Enabling Github Provider with Github Token
Code Snippet 13: Configuration stage JSON code
Code Snippet 14: Check precondition stage
Code Snippet 15: Wait stage
Code Snippet 16: Deploy Stage
Code Snippet 17: Pipeline stage
Code Snippet 18: Undo rollout stage


List of Abbreviations

IaaS    Infrastructure as a Service
PaaS    Platform as a Service
SaaS    Software as a Service
IP      Internet Protocol
JSON    JavaScript Object Notation
API     Application Programming Interface
REST    Representational State Transfer
CPU     Central Processing Unit
RAM     Random Access Memory
GPU     Graphical Processing Unit
GUI     Graphical User Interface
RPC     Remote Procedure Call
RMI     Remote Method Invocation
NodeJS  Framework to run JavaScript on the server side
CLI     Command-Line Interface
CI      Continuous Integration
SOA     Service-Oriented Architecture
VCS     Version Control System
K8S     Kubernetes
SSH     Secure Shell
CNI     Container Network Interface
AKS     Azure Kubernetes Service
SDN     Software Defined Network


Chapter 1

1 Introduction

Nowadays, topics such as distributed systems and edge cloud computing are among the most active research areas because of the need for high-performance cloud computing that can serve clients across the globe without service disruption, via continuous connectivity over the internet.

Edge cloud computing is a large and active topic, and it is the cornerstone on which this thesis is built. The point of edge cloud computing is to bring data (produced or consumed by customers) closer to the computing power (data centers acting as edges) where the data is processed. This is important for boosting performance and providing reliable services. To realize edge cloud computing, we need a cloud operating system that works as the infrastructure, a platform that can serve different kinds of clusters, and reliable, secure, and simple yet powerful mechanisms to deploy applications.

This thesis relies on infrastructure as a service (OpenStack and Microsoft Azure), platform as a service (Kubernetes), Spinnaker for continuous delivery, and software as a service (any software, mobile, or web application).

1.1.1 OpenStack

OpenStack [1], in short, is open-source cloud infrastructure software that is capable of controlling and managing data center resources, such as computing nodes, storage nodes, controllers, and networking. OpenStack is supported and run by different organizations across the globe, such as Ericsson, IBM, AT&T, and Cisco. It is a heterogeneous system that can run in different kinds of environments.

OpenStack is composed of various REST services, including the authentication service (Keystone), the computing service (Nova), the networking service (Neutron), the block storage service (Cinder), the object storage service (Swift), the image service (Glance), and the orchestration service (Heat). These services can be consumed and manipulated through both the OpenStack Horizon web dashboard and the OpenStack client CLI.
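
For illustration only, the following commands sketch how these services are typically consumed through the unified OpenStack client; the credentials file and resource names are placeholders, not the actual setup used in this thesis.

    # Load credentials (Keystone authentication); the file name is an example.
    source admin-openrc.sh
    openstack service list      # Keystone: catalog of registered services
    openstack image list        # Glance:   available operating system images
    openstack network list      # Neutron:  virtual networks
    openstack server list       # Nova:     running instances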

1.1.2 Kubernetes

Kubernetes, also known as K8S, is an open-source platform for managing and orchestrating application deployments in containers [2]. It can also manage and scale containerized workloads and services and automate the deployment of containers in their clusters.
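
As a brief, hedged illustration of this orchestration model (the deployment name and container image are placeholders, not the thesis test application), a containerized workload can be created, scaled, and exposed with a few kubectl commands:

    kubectl create deployment demo-app --image=nginx:stable       # run a containerized app
    kubectl scale deployment demo-app --replicas=3                 # scale the workload
    kubectl expose deployment demo-app --port=80 --type=NodePort   # expose it as a service
    kubectl get pods -o wide                                       # see where the pods landed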

1.1.3 System Deployment & Spinnaker

Before the emergence of cloud infrastructure, the life of a software developer was not easy, because he or she would need to work not only on coding but also on the infrastructure and platform that the code should run on. Developers had to ask system engineers to configure computers, install operating systems, and install the prerequisite framework libraries required for their software to run in the cloud. This procedure was tedious and time-consuming. Furthermore, faults that might occur in the physical hardware, such as power failures or network outages, were a nightmare for both developers and system engineers.

This led to containerized and continuous deployment of applications. In other words, developers can focus on developing their software and then package it into containers that need to be configured only once. Such a container can be deployed heterogeneously in any other cluster when needed, because it contains everything required to run the software, such as the operating system, prerequisite libraries, and network and security configuration.

This container, which can hold one or more applications, is replicated onto other nodes so that the service is never absent.

1.2 Problem

The problem discussed in this thesis is the performance efficiency of cloud computing deployment and how to automate the deployment process for incoming deployments (or newer versions of the same application) without knowing in advance where or how the application will be deployed, so that platform clusters and infrastructures are transparent and abstracted from developers.

This thesis also focuses on deploying applications on heterogeneous infrastructure resources and various kinds of platforms, so that they can host different applications in various clusters.

Moreover, the usage of virtual machines rather than containers during deployment can introduce several weaknesses because of the nature of virtual machines. This thesis tests and evaluates deploying applications on both containers and virtual machines.

1.3 Purpose

The purpose of this thesis work is to discuss, investigate, and evaluate various deployment methods. This thesis intends to refine, improve, and automate edge cloud deployment.

The desired result is to show that automated application deployment achieves zero downtime, load balancing between Kubernetes nodes, application resiliency, automated rolling updates, autoscaling of the cloud (elasticity), and a continuous delivery platform for applications.

Furthermore, this thesis aims to answer: how can we achieve zero downtime for applications deployed on multiple clusters (which use different IaaS resources, such as OpenStack and Microsoft Azure)?

1.4 Goal

The result of this thesis is to allow developers to deploy their applications on multiple cloud edges and platforms without being concerned about the deployment process, load balancing, or scaling. In other words, the deployment of applications goes smoothly on the heterogeneous infrastructures and platforms, since the cloud is configured to support the techniques mentioned in section 1.3. The deliverable of the thesis contributes toward an open-source infrastructure and platform for edge cloud computing.

The thesis also presents different test cases and studies to demonstrate the automation of deployments and compares different deployment strategies, such as traditional deployment and automated deployment, in different clusters. It also evaluates containers and virtual machines, and the benefits and drawbacks of each in terms of application deployment.

1.4.1 Benefits, Ethics and Sustainability

The open-source community as well as various enterprises and developers would benefit from this thesis research. The benefit is based on the reduction of the IT cost of edge cloud deployment.

The paper “Ethical Considerations in Cloud Computing Systems” [3] mentions several ethical issues concerning IaaS, PaaS, and SaaS of cloud computing, such as privacy and security, compliance, performance metrics, and environmental impacts.

A study performed by Accenture [4] shows that the usage of cloud-based solutions massively reduces CO2 emissions compared to on-premises applications. Cloud computing achieves environmental sustainability compared to other on-premises solutions.

More discussion about ethical considerations and environmental sustainability follows in chapter 6.

1.5 Methodology / Methods

This project mainly uses an applied research methodology. As defined by Kothari [5], applied research aims to find solutions to practical problems facing society or enterprises, so current research results and data are used to develop a practical solution.

Thus, this method explores previous results and various aspects of edge cloud deployment and produces a solution that can automate and improve this process.

Research can also be either descriptive or analytical. Because this research uses currently available deployment mechanisms and information about other research and its results, it applies analytical research. This helps to analyze other deployment mechanisms and studies in order to evaluate them and build further solutions or improvements on top of them.

Furthermore, this project employs a qualitative research methodology. Qualitative research [5] examines the quality of the resulting solution. In other words, this project investigates the reasons for and benefits of various solution approaches and their motivation.

In order to qualify the results of this project, a case study and a comparison are carried out between different kinds of approaches, such as the difference between the usage of virtual machines and containers.

More details of the used methods are in chapter 3.


1.6 Delimitations

Edge cloud computing can be realized with different kinds of operating systems, containers, applications, and frameworks such as OpenStack, Kubernetes, and Spinnaker. In this project, we have a limited number of Ericsson bare metal servers, and we use the Ubuntu operating system for the host instances. We use the Ansible framework along with the Rocky version of OpenStack, which deploys its services in Docker containers. We also use the Microsoft Azure infrastructure and its containers and clusters.

For Kubernetes, the same Ansible framework is also used to deploy Kubespray (Kubernetes clusters). This report shows different deployment test cases: one written in NodeJS and the other written in Java.
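
For illustration, a typical Ansible-driven Kubespray run could look roughly as follows; the repository layout, inventory name, and host file are assumptions, and the actual deployment used in this thesis is described in chapter 4.

    git clone https://github.com/kubernetes-sigs/kubespray.git
    cd kubespray
    pip install -r requirements.txt              # Ansible and other prerequisites
    cp -r inventory/sample inventory/edge-cluster
    # edit inventory/edge-cluster/hosts.yml with the node addresses, then:
    ansible-playbook -i inventory/edge-cluster/hosts.yml --become cluster.yml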

Spinnaker continuous delivery framework will be distributed among different nodes and clusters.

1.7 Outline (Disposition)

This project is organized and structured as follows:

Chapter 2: This chapter provides the history and a detailed theoretical description of the backbone of edge cloud computing, specifically OpenStack, Kubernetes, and Spinnaker. Chapter 2 also presents various IaaS and PaaS deployment concepts that are used in this project.

Chapter 3: This chapter presents the various methods used in this project and the reason for choosing each method. In addition, it describes the current deployment mechanisms and how IaaS (OpenStack and Microsoft Azure) and PaaS (Kubernetes) are deployed together, with Spinnaker on top of them.

Chapter 4: Describes the deployment built in this thesis work. It examines the study performed and the procedures to automate and improve the deployment mechanism of edge cloud computing.

Chapter 5: This chapter presents the results out of this thesis, including different evaluation criteria of deployment automation and performance testing.


Chapter 6: Chapter 6 includes a discussion of the results, the conclusions, and future work.


Chapter 2

2 Edge Cloud Computing Background

This chapter explains the concepts and tools used, giving an in-depth description and understanding of the background of the thesis project.

The central points of this thesis concern automating deployment in edge cloud computing using:
1. OpenStack: the infrastructure of the cloud, which controls virtual computing instances, storage, routing, and networking resources.
2. Microsoft Azure: another public cloud computing infrastructure that provides a platform to spawn clusters and to control computing instances, storage, and network resources.
3. Kubernetes: a platform as a service that distributes applications across several containers.
4. Spinnaker: a continuous delivery platform that manages and orchestrates deployed applications.

To begin with, this chapter gives a concise description of previous and early deployment mechanisms, and then it moves towards deployment in cloud computing with a brief description of OpenStack, Kubernetes, and Spinnaker.

2.1 Traditional and Early Deployment Methods

At the beginning of the software engineering era, developing high-quality software was not the only hassle. Programs were tied directly to mainframes or personal computers, and the traditional deployment of an application could be a real problem and a time-consuming process requiring an expert system architect.

As software grows in complexity, it can require external dependencies such as libraries from third-party sources, so a manual deployment process requires deploying all dependencies as a prerequisite, and forgetting even one dependency could cause the application to crash. Traditionally deployed applications lack distribution, resiliency, security, replicability, and resource allocation efficiency. Also, when a newer release of the application was developed, the application was likely to be down during the upgrade process. Finally, the response time (latency) of applications can be higher than expected, especially when the computing power is far away from the physical location of the user.

2.2 Edge Cloud Computing

Because of the previously mentioned problems in traditional and early-stage deployment methods, virtualized tenancy, networking resources, and distributed storage emerged under the term cloud computing, which offers a suitable environment for consumers and enterprises to develop and deploy their applications.

Cloud computing was defined by P. Mell and T. Grance [6]:

“Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction. This cloud model is composed of five essential characteristics, three service models, and four deployment models. “

Figure 1: The characteristics of cloud computing (on-demand self-service, broad network access, measured service, resource pooling, rapid elasticity, pay-as-you-go, economical, security)

The tendency to push latency as low as possible brings the concept of edge computing [14]. Edge computing is a continuation of cloud computing that brings the computing power, including services, storage, memory, and processing, closer to the consumers to achieve low-latency access to applications.

Figure 1 shows the characteristics of cloud computing. There are five essential characteristics [6]:
1. On-demand self-service: the consumer can provision computing capabilities without outside help from the service provider.
2. Broad network access: capabilities for real-time networking over virtualized computing instances.
3. Resource pooling: the provider's resources are efficiently pooled to serve different consumers, and they can be created and recreated on the consumer's demand.
4. Rapid elasticity: resources and capabilities can be scaled; elasticity can also be automated based on application demand.
5. Measured service: resources are optimized, monitored, controlled, and reported (when needed) by the cloud, with transparency towards both providers and consumers.

Other characteristics of cloud computing are [7]: economical, pay as you go, and cloud security, so that data is protected against hackers and cannot be accessed by unauthorized personnel.

These characteristics of cloud computing ensure the best performance and the best use of cloud resources. To achieve this, cloud resources are virtualized so that a single physical instance (or component) can be shared among several consumers.

The consumer, on the other hand, is not aware of this procedure and, from his or her perspective, virtual machines (VMs) are similar to real machines. In our deployment mechanisms, virtualization plays an important role because it provides the environment to create several applications on the same infrastructure resources, where consumers may not change the real operating system of the underlying resources. In other words, the consumer has no control over physical cloud resources, such as physical servers, routers, etc.

Another important role of virtualization is that the consumer is capable of creating any kind of operating system or configuration that suits his or her deployed application. This matters because different applications require different architectures, so virtualization provides the mechanisms to run different architectures on the same real machine.

The infrastructure, platform, and software of cloud computing are service-oriented, which makes it possible to access them remotely as utilities. This supports the cloud computing characteristic of "pay as you go", because it makes it possible to meter the usage of these services by the end user.

One of the first cloud computing providers was Amazon [8], which applied previously mentioned characteristics of cloud computing, in which the consumers pay for on-demand services required to deploy their applications.

2.2.1 Cloud Computing Service Models

As described in the previous definition of cloud computing, there are three service models: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS). Each of these service models ensures a level of abstraction and transparency to the user and reduces the consumer's effort to deploy their systems. In other words, consumers do not have to deploy or manage anything related to the infrastructure (including servers, routers, and network resources) or even the platform (such as the operating system). These services are the core of cloud computing and form a service-oriented architecture (SOA).

Figure 2: Cloud Computing Service Models with BareMetal (Software as a Service on top of Platform as a Service, on top of Infrastructure as a Service, on top of the physical BareMetal resources)

In Figure 2, inspired by [6], the Cloud Computing Service-Oriented Architecture components are:

1. Physical Resources (BareMetal): These are the actual and physical resources of the cloud in the data center. This layer is not a service model; however, it is used directly only by IaaS.

2. Infrastructure as a Service (IaaS): This service model has all the required mechanisms to deliver the cloud computing infrastructure resources to the consumer. The National Institute of Standards and Technology (NIST) [6] notes that these resources consist of provisioned processing, networking, storage, memory, and other computing resources. The main goal of IaaS is for the consumer to deploy and run the required software by obtaining the needed (virtualized) resources rather than deploying them on on-premises infrastructure. Consumers can use these resources to deploy any prerequisite software or operating system without having to manage the physical (underlying) infrastructure. The consumer's control is limited to managing virtualized resources such as virtual instances, networks, and storage.

Consumers of infrastructure as a service are not required to purchase hardware, such as servers, network components, and other data center hardware; instead, consumers pay for outsourced resources as a service, and these on-demand outsourced resources can be accessed through the network or the internet. Consumers stop paying for their virtual resources once they are turned off.

The reason behind choosing IaaS is that the underlying physical data center and physical resources are transparent and abstracted away from consumers. These resources are in fact maintained by the cloud provider and are accessible to consumers as a collection of services. These services are available in the form of either a web console or code using an application programming interface (API).

The process of abstracting the underlying physical infrastructure resources is known as virtualization. It provides an environment that is secure and isolated from other consumers' environments. Also, it provides flexibility through orchestration tools that allow consumers to manage and automate their software in their chosen environment. IaaS allows consumers to concentrate on developing and managing their applications and to spend less time on managing data centers and other underlying physical infrastructure.

This thesis uses two different infrastructures as a service, which are OpenStack (an in-house open-source IaaS) and Microsoft Azure which provides public resource virtualization (public cloud). Both IaaS (OpenStack and Microsoft Azure) use the same stack components shown in figure 3:

Figure 3: IaaS Stack Components (servers, networking and firewall, data center, database and storage, virtualization, load balancer)

In the stack components shown in Figure 3, virtualization is used for managing and controlling virtual machines, servers, and other virtualized infrastructure resources. This process is performed by the cloud operating system, such as Microsoft Azure.

Figure 4, inspired by [9] and [10], explains the cloud operating system architecture used to virtualize the physical resources of the infrastructure and provide them as a set of services for the consumer. The cloud OS kernel includes the physical infrastructure and cloud drivers and the core management services.

The drivers of the physical infrastructure are what make the physical resources transparent and virtualized for the consumer. These drivers consist of hypervisor, network, storage, and information drivers [10].

The hypervisor driver (also known as virtual machine monitor) can be divided into two types. The first type of hypervisor runs on top of the physical resources (BareMetal) in which it controls the hardware directly. The second type of hypervisor runs as a consumer process (application) on top of the cloud operating system.


Figure 4: Cloud OS Architecture Used for Virtualization (consumer applications and admin tools, storage, scheduler and replication utilities on top of cloud interfaces; a cloud OS kernel with VM, process, storage, network, authentication, and image management; physical infrastructure and cloud drivers at the bottom)

The cloud operating system uses the first hypervisor while the second type of hypervisor is used by the guest operating system, which is controlled by the consumer.

The core management services, such as virtual machine management service, process management service, authentication management service, … are dependent on physical infrastructure drivers so that the cloud operating system can deploy and control virtualized resources.

The virtual machine management service manages the virtual machine process from spawning to shutting it down. The network management service handles the connection mechanism between different services and virtual machines. It also ensures the communication between the internal and external networks so that consumers can access their virtual machines and other services correctly. Other services include the authentication management service, which handles the consumers' authentication so that they are authorized to access their specific resources, and the storage management service, which is responsible for providing dynamic storage capacity to the consumer with the scalability to meet their needs.

The OS utility services consume these management services of the cloud OS kernel over the network. The utilities form a middleware layer between the consumer and the cloud OS kernel that provides different services, such as admin tools, which offer a command-line interface (CLI) or graphical user interface (GUI) for the consumer to use the previously mentioned kernel management services. Section 2.3 provides more details about the OpenStack virtual machine management service and its CLI and web GUI.

3. Platform as a Service (PaaS): PaaS is the next abstraction level, which allows consumers to focus on building their applications and running them on the virtualized infrastructure. To accelerate and facilitate the development process, PaaS can provide programming libraries, services, and tools supplied by the cloud operator. It abstracts the control and management of the underlying cloud infrastructure from the developers [6].

This can be problematic because it constrains flexibility: the consumers (developers) must choose from the tools currently available from the cloud provider. To solve such a problem, cloud providers such as Microsoft Azure allow consumers to deploy their own PaaS software. This also leads to the Open PaaS environment [11], in which developers are free to choose or deploy any required programming framework, security, database, monitoring, logging, caching, or analysis framework, payment gateway, and even the operating system. It also allows consumers to deploy and implement their own platform on the infrastructure.

The three main types of platform as a service widely used by cloud providers are:

a. Application infrastructure platform: This archetype of PaaS provides a collection of services and applications to the consumers to develop their applications based on the capabilities provided by the infrastructure.
b. API-based PaaS: It provides the consumers with a collection of third-party libraries and solutions through application programming interfaces (APIs), so consumers do not need to know about these libraries' implementation.
c. Container-based PaaS: The container-based archetype employs containers, such as Docker or Linux containers (also known as LXC), to orchestrate application deployments by the consumer. This thesis focuses on this archetype to automate application deployments in great detail.

There are several characteristics of platform as a service that allow consumers to concentrate on their application [12]:

a. Multi-tenant support: Consumers might have several applications to be deployed, so PaaS should be able to host these applications. Container-based PaaS deploys these applications explicitly in different containers, while API-based PaaS deploys them implicitly using built-in functionality.
b. Elastically scalable: Scaling is a significant part of PaaS because applications and services deployed by the consumers must be highly available. PaaS provides both scaling up (vertically), in which consumers can add more resources (such as more virtual CPUs) to the current system, and scaling out (horizontally), in which consumers can add more infrastructure resources (such as more virtual machines). Cloud computing providers automate these scaling mechanisms by auto-scaling services based on their load (a minimal scaling example follows this list). Using container-based PaaS, consumers may configure their deployed applications to work as services so that other applications can use them.
c. Authorization: PaaS provides different libraries and tools which are accessed by authorized, authenticated consumers. Consumers may deploy their own services and APIs to which other consumers may not be granted access.


d. Monitoring: Monitoring is a PaaS service that allows consumers to check, manage, and control their platform and to be notified (through logs) when issues occur in their services.
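
As a minimal, hedged illustration of elastic scaling in a container-based PaaS, the following Kubernetes commands scale a workload out manually and then attach an autoscaler to it; the deployment name and thresholds are placeholders.

    kubectl scale deployment demo-app --replicas=5                            # manual scale-out
    kubectl autoscale deployment demo-app --min=2 --max=10 --cpu-percent=80   # automatic scaling
    kubectl get hpa                                                           # inspect the autoscaler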

The platform as a service used in this thesis is Kubernetes, which allows deploying containerized applications into a cluster.

4. Software as a Service (SaaS):

SaaS is the top-level component in the service-oriented architecture. At this level, cloud computing providers deliver their applications as services to the consumers, and consumers can only manage or configure the applications' parameters.

At this level, consumers are not capable of managing or controlling any underlying infrastructure resources such as servers, data centers, networking, storage, or databases. Also, the platform (such as the operating system) and the logic of the underlying application are abstracted from the consumers [6]. SaaS usually provides enterprise applications, such as payroll systems.

There are four types of cloud deployment models as described by [6], which are:

1. Private cloud: A private cloud is provided for and limited to use within a single organization, in which several units of this organization can consume the cloud resources. Enterprises such as Ericsson (where this thesis work was done) have their own private cloud so that confidential data is not exposed to other organizations. In other words, the cloud is owned, managed, and operated by the organization and usually exists on or off premises.

2. Community cloud: A community cloud infrastructure is provided for and limited to a collection of organizations that share similar requirements, such as privacy, performance, and security requirements. For example, a community of organizations may share a concern for a secure data format and require secure communication between users and data. A community cloud is owned and controlled by one or more of the organizations that share the same needs.


3. Public cloud: Public clouds are owned by third-party organizations and used by public consumers, who can access them over the internet or the network. Current public cloud providers are operated by academic, governmental, or business organizations, for example Google Cloud, Amazon Web Services (AWS), and Microsoft Azure. Consumers pay the provider for a pool of resources based on their requirements.

4. Hybrid cloud: Also known as multi-cloud computing systems. Hybrid cloud systems provide a pool of resources from different and distinct cloud infrastructures, such as private, public, and community clouds. These resources appear as a single cloud infrastructure even though in reality they are distinct. A hybrid cloud provides load balancing and high availability for consumer applications.

This thesis deploys a multi-cloud system, using both the private cloud at Ericsson and Microsoft Azure, to enhance and improve application deployment, security, and availability.

2.3 OpenStack

OpenStack is a well-known open-source cloud computing infrastructure and the backbone for building both private and public clouds [1]. At Ericsson (where this thesis work was done), PayPal, and many other organizations, OpenStack is the chosen infrastructure-as-a-service platform. Flexibility, high scalability, heterogeneity (multi-tenancy), and reliability are the main reasons for choosing OpenStack as the IaaS.

OpenStack is an IaaS comprised of a collection of independent services that allow consumers to manage and control the underlying cloud-based physical resources in the data center. Each of these services collaborates with the other services in OpenStack, and consumers can interact with these services through application programming interfaces (APIs) defined as REST web services [13].

2.3.1 History

NASA and Rackspace Hosting initially released OpenStack in 2010 [13]. In the initial release, OpenStack was only capable of provisioning and controlling object storage and the pool of computing hypervisors (called Nova). Since 2012, OpenStack has been organized, developed, and controlled by the OpenStack Foundation, where many organizations take part in developing OpenStack. The OpenStack community releases a new version every six months.

Releases of OpenStack are Austin, Bexar, Cactus, Diablo, Essex, Folsom, Grizzly, Havana, Icehouse, Juno, Kilo, Liberty, and then Mitaka, Newton, Ocata, Pike, and Queens, where the naming of OpenStack releases is based on the locations of OpenStack summits [13]. The current release of OpenStack, which is also used in this thesis, is Rocky.

2.3.2 OpenStack Architecture

Figure 5: OpenStack Architecture & Services Interactions (the Horizon web-based dashboard and the OpenStack CLI interacting with Nova for virtual machines, Glance for images, Cinder for storage, Swift for data storage, Neutron for networking, and Heat for orchestration, with Keystone authenticating and authorizing the requests)

Figure 5 describes the architecture of, and interaction between, the main services. The OpenStack services include:

1. Nova (Compute Service): Nova is the heart and core service of OpenStack. It is comprised of several components that work together to manage, control, and deploy computing resources as hypervisors (called instances in Nova).

Nova does not create new virtualization and hypervisor technologies; instead, it uses current virtualization technologies, including QEMU-KVM, KVM, QEMU, VMware ESXi, Xen, and Hyper-V. Consumers can create and build a variety of instances based on their needs, including vCPU count, RAM size, and disk storage size.
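
For illustration, creating a flavor and booting an instance from it through the OpenStack CLI could look as follows; the flavor, image, network, and server names are placeholders.

    openstack flavor create --vcpus 2 --ram 4096 --disk 20 m1.custom
    openstack server create --flavor m1.custom --image ubuntu-18.04 \
        --network private-net vm-node-1
    openstack server list     # the new instance appears once Nova has scheduled it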

2. Glance (Image Service): Glance controls and manages all virtual machine images. In other words, in order for Nova to create instances (virtual machines) with the required operating systems to boot, it requests the images from the Glance image service. All of these requests and responses are made through RESTful web service calls. When the Nova service receives the response from Glance, it stores the image in the virtual machine created by the consumer so that whenever the instance is started, it can boot the operating system.

The benefit of these images, compared to manually installing an OS in the instances, is that they are preconfigured and have the structure required for the cloud, so consumers do not have to know specific details about OS configuration for cloud-based systems.

3. Keystone (Identity Service): The Keystone service is the gate to all services in OpenStack. It controls and manages all users and services, so that only authenticated users (human or system) and services are granted access to perform operations in OpenStack, such as creating virtual machines, virtual networks, and images. Keystone preserves all users' credentials in a database called the Keystone database.

Authentication is performed using the Lightweight Directory Access Protocol (LDAP) [13], which supports both token-based authentication (used by OpenStack services) and the username/password authentication mechanism (usually used by regular users).
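
A minimal sketch of the username/password flow, assuming example credentials rather than the project's real settings: the client exchanges the credentials for a token that subsequent service calls carry.

    export OS_AUTH_URL=http://controller:5000/v3
    export OS_USERNAME=demo
    export OS_PASSWORD=secret
    export OS_PROJECT_NAME=demo
    export OS_USER_DOMAIN_NAME=Default
    export OS_PROJECT_DOMAIN_NAME=Default
    openstack token issue     # Keystone returns a scoped token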

4. Neutron (Network Service): In the first releases of OpenStack, networking between virtual machines was performed by the Nova-Network service, but in order to reduce Nova's responsibility and keep services loosely coupled, networking was moved to a new RESTful service called Neutron.

The Neutron service makes it possible for virtual machines to have not only internal (fixed) IP addresses but also external (floating) IP addresses, so that applications in Nova virtual machines are accessible from any network over the internet. Neutron provides control and management over networks, subnets, routers, firewalls, and security groups.
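
As an illustration (names and address ranges are placeholders), a network, subnet, and floating IP can be created and attached to an instance as follows.

    openstack network create private-net
    openstack subnet create --network private-net --subnet-range 10.0.0.0/24 private-subnet
    openstack floating ip create public                        # allocate an external IP
    openstack server add floating ip vm-node-1 203.0.113.10    # attach it to an instance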

5. Cinder (Block Storage Service): Storage was one of the main problems with virtual machine operating systems in the cloud. When consumers store data in virtual machines, the data is lost when the virtual machine is restarted, because virtual machines have volatile storage (data cannot be stored in the Glance image).

This problem led to the creation of a new service (earlier called the nova-volume service) that makes it possible for consumers to create and manage volumes for their virtual machines. To reduce the complexity and responsibility of managing and controlling volumes, the Folsom release separated the volume management service from Nova, resulting in a new service called Cinder.
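
For illustration, a persistent volume can be created and attached to an instance as follows; the volume size and names are placeholders.

    openstack volume create --size 10 data-vol      # a 10 GB block storage volume
    openstack server add volume vm-node-1 data-vol  # data on it survives instance restarts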

6. Swift (Object Storage Service): Swift is a highly available, low-latency, highly scalable, and reliable object (data) storage service. Swift allows consumers to store their data, such as photos and documents, by uploading them via the web-based application (Horizon) or using the OpenStack CLI. Also, the Glance service uses Swift to store operating system images, by providing storage as a RESTful service [13].
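
As a small illustration (container and file names are examples), objects are uploaded to and listed from Swift as follows.

    openstack container create backups
    openstack object create backups report.pdf    # upload an object into the container
    openstack object list backups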

7. Heat (Orchestration Service): There are several ways to deploy infrastructure resources, such as virtual machines, networks, and storage, using Horizon or the OpenStack CLI. Consumers and developers could write bash scripts to automate infrastructure deployments, but this approach was not safe, because developers may run into several errors, especially when modifying their code.

This could cause a faulty infrastructure. To solve this problem, OpenStack provides a relatively new orchestration service (called Heat) that can deploy all required resources at one time (such as multiple instances and various network deployments and configurations) using the Heat API and templates [13], which take JSON form. Consumers can also develop Python or Perl scripts to feed the Heat service with the template.

Heat decreases the complexity, cost, and fragility of script development to deploy, configure, and manage different infrastructure resources.
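
For illustration, a stack can be launched from a template as follows; the template file and parameter names are assumptions, not the templates used in this thesis.

    openstack stack create -t edge-stack.yaml \
        --parameter image=ubuntu-18.04 --parameter flavor=m1.custom edge-stack
    openstack stack list    # Heat reports the stack's deployment status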

8. Horizon (Dashboard):

Figure 6: Horizon (Dashboard) GUI in OpenStack


Figure 6 shows the Horizon web-based interface, which gives OpenStack consumers the ability to access, provision, automate, and manage all OpenStack services and control the underlying resources. It is mainly comprised of a web server and a set of libraries that implement the dashboard GUI, plus core libraries that implement the functionality of calling the OpenStack RESTful services.

Horizon is a rich framework that exposes almost all OpenStack service functionality. It is also a flexible and easy-to-use framework that allows consumers to construct different third-party systems, such as monitoring and logging tools. It allows web developers to create different web themes for their applications using frontend languages, including JavaScript and CSS.

The only weakness of Horizon is that it does not support all OpenStack objects and functionality, such as domain creation and management, and service catalog and endpoint management. To overcome this problem, OpenStack provides a command-line interface (python-openstackclient).
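
Examples of operations of this kind that are typically performed through python-openstackclient rather than Horizon (the domain name and description are placeholders):

    openstack domain create --description "Edge tenants" edge-domain
    openstack catalog list      # inspect the service catalog
    openstack endpoint list     # inspect registered service endpoints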

9. Ceilometer: Ceilometer is a relatively old service (created in 2012) [13]. It is the monitoring tool that can gather information about all OpenStack services and components. It is capable of normalizing and transforming all gathered data in its rating engine to allow consumers to manage billing and track their resource consumption.

10. Trove: Consumer applications usually require a database management system. Trove abstracts the DBMS from the consumer by providing database as a service. It grants consumers the capability of provisioning, managing, and scaling multiple databases when necessary.

11. Sahara: Hadoop is a well-known open-source system for solving big data problems. Sahara provides Hadoop as a service for OpenStack consumers to handle large data sets in a cluster of servers [13]. Sahara splits large data sets into smaller sets and applies a distributed MapReduce algorithm.


2.4 Virtualization Principles

The virtualization of the underlying physical resources is the art of edge cloud computing. There are two main mechanisms and principles of resource virtualization: virtual machines and containers.

2.4.1 Virtual Machines

Section 2.2.1 briefly describes virtual machines and the different types of hypervisors. Although virtual machines brought several milestone advances (such as improved scheduling, packaging, and resource accessibility) [14], they still have several issues that make them a poor choice as the virtualization architecture for cloud computing.

The drawback of virtual machines is that they need a full operating system image. A full OS image is a relatively large file that contains all OS utilities, not all of which are required by the deployed applications. VM instances therefore consume large amounts of disk and random-access memory (RAM), which means inefficient resource management and more time to load the full OS image into RAM (slow system startup and booting). Also, consumers' applications can lack memory, and consumers pay for extra resources used by unneeded utilities and software run by the OS.

Furthermore, running several virtual machines on the same underlying physical resource can cause a severe problem because the physical resource (server) would be overwhelmed by the virtual machines even without any consumer application running on them.

Finally, in a cloud computing environment, underlying cloud resources, such as storage and CPU, are shared between instances. Achieving secure virtual machines resource-sharing in a portable and interoperable way is a challenge.

2.4.2 Containers

Container technology is the alternative to virtual machines, intended to overcome their issues and achieve other desired properties. Containers are lightweight and self-contained packages that allow portable and interoperable software applications, developing and deploying applications across a large pool of servers, and interconnecting containers so that they share resources [14]. Figure 7 depicts the different approaches of virtual machines and containers. As seen, virtual machines run a full guest OS image on top of the host OS (via the hypervisor), while several lightweight packaged containers share a single container engine.


Figure 7: VM and Container Virtualization Architecture (VMs: apps and bins/libs on a guest OS per VM, on top of the host OS and physical resources; containers: apps and bins/libs in containers sharing a single container engine on the host OS and physical resources)

There are two essential types of container engines [14]:

1. Linux Containers (LXC): LXC is Linux OS-level virtualization in which several isolated Linux processes run on top of one Linux kernel. The Linux kernel provides the mechanism for isolating these Linux processes through the kernel's namespaces and cgroups.

Namespaces are used to segregate a group of Linux processes, so that other groups have no access to the resources of this group.

The cgroups (control groups) provide the capability to control and limit the resources available to a container, such as limiting the disk space or CPUs in a container [14].

2. Docker containers: Docker containers have become the most used and iconic container solution for resource virtualization. Docker is not a separate technology; it is built using other technologies, such as LXC techniques.

The reason behind Docker's popularity is that its images employ layered file systems on top of LXC technology. The idea is to build individual images that extend a base image, so consumers can use the base image as a start and add more to it as they need to. Docker images are lightweight and are easily built as portable application containers [14], which allows Docker images to be distributed as needed by consumers.

This thesis has two different case studies that employ the layering mechanism of Docker containers. Each case study dockerizes the application using a single file called a Dockerfile, which controls and manages how the application is deployed in its Docker container. More about the case studies in chapter 5.
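
A hedged sketch of the layering idea: a small Dockerfile that extends a base image is built and then inspected layer by layer. The base image, file contents, and tag are illustrative, not the actual CHATi Dockerfile used in the case studies.

    # Contents of a minimal Dockerfile (each instruction becomes an image layer):
    #   FROM node:10-alpine
    #   WORKDIR /app
    #   COPY package.json .
    #   RUN npm install --production
    #   COPY . .
    #   CMD ["node", "server.js"]
    docker build -t layered-demo:latest .
    docker image history layered-demo:latest   # lists the individual image layers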

2.5 Kubernetes

Kubernetes, as briefly described in chapter 1, is a portable open-source platform that supports orchestrating, controlling, and managing containerized application deployments across a collection of clusters. It also orchestrates computing, networking, and disk storage infrastructure for the consumers, which combines the simplicity of PaaS with the flexibility of IaaS [2].

The reason for choosing Kubernetes is that it runs applications in containers, which provides the efficiency of containers (see section 2.4.2) along with load balancing, scaling, monitoring, and logging tools. Furthermore, it supports an incredible diversity of workloads for various types of applications (such as stateful and stateless applications). In the test cases for this project, I deploy both types of applications in heterogeneous Kubernetes clusters. Also, Kubernetes provides loosely coupled application deployments, which means applications are decomposed into loosely coupled pieces that are deployed dynamically. This is possible because Kubernetes creates applications at build/release time instead of deployment time [2].

An essential feature of Kubernetes is that it supports canary deployments. In other words, it facilitates a smart and easy mechanism for rolling back before deploying the application on all nodes in the cluster. Kubernetes is infrastructure independent, which means it runs the same on a personal computer as it does in the cloud. Finally, it provides high performance and efficiency for application deployment using resource isolation and utilization, and it can monitor application health.
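
For illustration, a rolling upgrade and a rollback can be driven with the following commands; the deployment, container, and image names are placeholders rather than the thesis' actual pipeline, which is handled by Spinnaker in later chapters.

    kubectl set image deployment/demo-app demo-app=demo-app:v2   # trigger a rolling upgrade
    kubectl rollout status deployment/demo-app                   # watch the new pods roll out
    kubectl rollout undo deployment/demo-app                     # roll back if the release misbehaves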

All in all, Kubernetes is a perfect environment for microservices, as it can speed up the automation of deployments; however, it does not contain a mechanism or built-in tool for continuous delivery and continuous deployment (CI/CD).


2.5.1 Kubernetes Architecture & Components

Figure 8: Kubernetes Cluster Architecture (a master node containing the API server, etcd, kube-scheduler, and controller manager, accessed through the web user interface and the kubectl CLI; worker nodes each running pods with containers on a container runtime such as Docker, managed by kubelet and kube-proxy; an image store)

As seen in Figure 8, inspired by [15][16], Kubernetes consists of two main components and a few add-ons [15]. These components are:

1. Master Component: The master component is the control plane of the cluster. In other words, it is the brain of the cluster that makes all decisions for the cluster, such as scheduling and managing the orchestration of the applications on the worker nodes. It also responds to events from consumers, such as the creation of a new pod or service, and instructs the worker nodes accordingly. Furthermore, it is capable of noticing if any of the worker nodes go down and rerouting the workload to other nodes (if possible) [15]. The main components of the master node are etcd, kube-apiserver, kube-scheduler, and kube-controller-manager.

Etcd is the highly available, distributed key-value store in Kubernetes that stores the metadata and status of the cluster [16], such as spawning or terminating pods, deployments, and services. It stores the information of the cluster's nodes reliably, and (as seen in Figure 8) the only way to interact with etcd is through the kube-apiserver. The kube-apiserver works as the gateway of the Kubernetes control plane (the gateway for all components in the master node) that exposes the Kubernetes API. It is also the component responsible for checking and approving entries (from kubectl or the web UI) before they are saved in etcd. The kube-apiserver responds to events registered by other components; it does not start an event by itself [16].

Kube-scheduler is mainly responsible for determining which nodes recently created pods should be assigned to [15][16], and it then notifies the kube-apiserver about the newly scheduled pod. Once the kube-apiserver validates the selected node, it creates (or updates) the corresponding entry in the metadata key-value store (etcd). Kube-scheduler does not select an arbitrary node for the newly created pod; rather, it uses built-in algorithms to find a node with enough capacity to manage and run the pod [16]. Kube-controller-manager is responsible for driving the cluster towards the desired state specified in the deployment configuration & specifications. For example, if a deployment's specification sets the replication factor to five, then it is the job of the kube-controller-manager to make sure that five replicas of the deployment's pods are running. The kube-controller-manager consists of four controllers [15]. They are:

1. Node Controller, which recognizes and responds to nodes that crash or go down.


2. Replication Controller, which checks the number of pods for each application and spawns more pods if needed to match the replication factor in the deployment specification.
3. Endpoints Controller, which allows different services to be accessed through their specified endpoints.
4. Service Account & Token Controllers, which create the required accounts and access tokens for newly created namespaces and services.

2. Node Component: The node component (also known as the worker node) handles and maintains the actual running application deployed in its container(s). The application runs in one or more containers across various worker nodes (the master node determines which nodes). This is performed using a collection of components running on each worker node: kubelet, kube-proxy, and the container runtime.

Kubelet is responsible for informing the master node (through the kube-apiserver) about the node it runs on: whenever a node joins the cluster, its kubelet registers it with the kube-apiserver. The kubelet then listens to the kube-apiserver for new pods expected to run on this node. If the kubelet finds that new pods have been scheduled, it starts the pods and their containers for the deployed application. Finally, the kubelet works as a watchdog for failures that happen on the node: if a container stops running, it restarts it, and the kubelet reports the status of the pod's containers to the kube-apiserver [16].

Kube-proxy provides the networking abstraction and manages the networking rules [15]. It performs packet forwarding and routing, and it is through kube-proxy that applications are exposed to external networks (such as the internet).

The container runtime, such as Docker, is the software that manages and controls running containers. It is the service that creates and destroys containers when needed. As mentioned in section 2.4.2, it is responsible for packaging, testing, and distributing the application to various containers and running it using the deployed application code and the application's configuration & metadata files. The image store, such as Docker Hub, stores application images (also known as Docker images) so that other nodes can pull an application image, depending on the access rights granted for that image.

The add-ons are additional tools used by the consumer and the Kubernetes engine. These tools are:

Web UI (Dashboard):

Figure 9: Kubernetes Web UI

Figure 9 shows the dashboard UI of Kubernetes. It allows consumers to control, monitor, and manage the cluster’s services, pods, replica sets, deployments, jobs, user’s credentials, load-balancing, and persistent volume claims (PVC).

Kubectl: Similar to OpenStack, Kubernetes offers a command-line interface (CLI), kubectl, that can perform the same tasks as the web UI. Furthermore, it supports advanced tasks, such as applying third-party tools and resources to Kubernetes.

The basic units of Kubernetes are pods, deployments, and services. A pod is a tightly-coupled collection of containers that run together in the same context [17]. The pod has its own IP address and hosts one or more containers, where all containers share this IP address and other network resources. Furthermore, it provides a high level of abstraction over the network and storage to simplify container migration within the cluster. The pod is not capable of persisting data by itself; in other words, if an application persists data inside a pod, that data is lost when the pod is restarted. Applications that need durable data are therefore required to use persistent volumes (PV).

Deployments hold the template and the master plan of the pods to be spawned. This means a deployment contains the required information and metadata (such as the resource specification and usage) to create the pods. If a pod is killed or crashes, Kubernetes retrieves the blueprint of the pod from its deployment. Furthermore, to update the application, its deployment has to be updated in Kubernetes using either the web UI or kubectl [17].
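As a minimal sketch of such a blueprint (the names, image, and replica count are illustrative only, not the manifests used in this thesis), a deployment might look like:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3                       # desired number of pod replicas
  selector:
    matchLabels:
      app: my-app
  template:                         # pod template (the blueprint used to respawn pods)
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: registry.example.com/my-app:v1
        ports:
        - containerPort: 80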

To make consumers' applications deployed in the pods discoverable, Kubernetes services are used. They expose these pods to external networks, or internally so that they can be discovered by other pods [17]. The service types tested and evaluated in this project are NodePort (the deployment is exposed on a static port of the cluster's nodes) and LoadBalancer (the deployment gets an external public IP address and can be accessed from external networks). More about them in Chapters 4 and 5.

2.6 Spinnaker

Spinnaker is the last piece of the puzzle. Spinnaker is an open-source framework, developed by Netflix, that enables continuous deployment and orchestration. It also features continuous integration and delivery [18]. In other words, it reduces the complexity of managing and controlling deployments and the inventory of software deployments. This means that developers are not required to cut new versions of the application and perform a manual deployment; instead, they can continuously push new features or bug fixes directly to the cloud. This procedure allows developers to ship new features as they need, even several times a day. Furthermore, developers can create automated tests so that a release is deployed to the clusters only if it passes those tests; this procedure for automating deployment decisions is known as canary analysis [18].

2.6.1 Spinnaker Architecture & Components

Figure 10: Spinnaker Architecture [18]

As seen in Figure 10, Spinnaker consists of a collection of RESTful services that provide the capability of deployment automation and management. These components are [18]:

The Front50 service is Spinnaker's data store; it stores Spinnaker's pipelines, tasks, stages, notifications, strategies, and application configuration files. It uses Redis as a database.

The Rosco service is responsible for baking images in Spinnaker. In other words, it creates machine images for consumers' deployments so that other Spinnaker services can find the baked image and deploy it when needed.

Rush service is responsible for handling and keeping track of Spinnaker’s scripts to be executed in the specified platform. Recent releases of Spinnaker do not require Rush service because Rush service caused a bottleneck in script execution.

Orca service provides the orchestration engine in Spinnaker by managing, controlling, and running the deployment pipelines and task definitions.

Igor is the integration service. It provides the integration interface to other frameworks and platforms, such as Jenkins and GitHub, and it allows consumers to keep third-party platform credentials securely. It informs the Echo service if credentials have been deleted or changed.

The Echo service is responsible for triggering pipeline executions for deployments. Furthermore, it routes pipeline events to the Orca service listeners.

Gate is the RESTful service that works as a gateway to all of Spinnaker's services. It exposes the Spinnaker API that the user interface and other consumers call.

The Deck service is Spinnaker's browser-based user interface, shown in Figure 11.

Figure 11: Spinnaker Web UI


2.7 Related Work

Simplifying deployment is not a new research topic, and there are several studies about it. However, existing studies have not focused on deployment mechanisms that are secure, scalable, load-balanced, and continuous, yet still simple, in multi-edge-cloud computing (where heterogeneous infrastructures and platforms coexist).

The paper in [19] examines a transparent deployment architecture for multi-cloud systems. It covers a wide variety of IaaS offerings, such as Amazon and Lambda, that reduce the complexity of application deployment. However, it does not address crucial deployment aspects and techniques, including continuous deployment & delivery, load-balancing, auto-scaling, and security. It also does not report the cost of the proposed deployment solution, and it lacks an evaluation of essential deployment characteristics, such as application downtime.

There are also only a few evaluations and comparisons of deployment performance using virtual machines and containers in multi-cloud environments. This report provides an analysis and test of deployment automation across different systems and infrastructure virtualization techniques. It also provides testing of scalability, security, load-balancing, and zero-downtime deployments in a multi-cloud environment.


Chapter 3

3 Methodologies and Deployment Approaches

This chapter describes the research methodologies and approaches used in this project. It also illustrates the adopted methods and tools in case studies that achieve the goals of deployment automation and continuous delivery in a multi-cloud platform.

3.1 Research Approaches

To reach the goal and appropriate results of this project, it is necessary to apply reason-based methods that justify the purpose of the project. Thus, the methods used in this research are the applied, qualitative, and empirical & analytical approaches.

3.1.1 Applied Approach

Software deployment is a problem that directly affects the software industry, because developers tend to spend a lot of effort and time deploying their software. The applied research approach aims to find and improve solutions for the current deployment problem. Furthermore, the applied research method also evaluates the solution found in the project [5].

3.1.2 Qualitative Approach

Software deployment is a behavioral & experimental discipline concerned with the quality of deployment. It is not based on a measured quantity, so the qualitative approach is used in this project. To understand the deployment automation problem, the qualitative approach uncovers the motivations and desires behind deployment automation. For this, it employs different mechanisms, such as in-depth interviews [5].

The qualitative approach allows us to investigate different kinds of deployment mechanisms and to compare them. It also helps in improving the current approaches, which results in a better-quality solution. In this project, we use different test cases to assess the quality of the proposed solution for deployment automation and for distributing the continuous delivery framework.

3.1.3 Empirical & Analytical Approaches

Deployment automation requires control & manipulation of background variables to enable analyzing and evaluating the results [5], so this project uses both the empirical & analytical approaches. The analytical approach provides this research with a mechanism for critically evaluating various deployment approaches. The idea of the analytical research approach is to find causal relations among deployment procedures and connect them to reach the approach that best satisfies the goal and purpose of the project.

It also helps to understand and explain the deployment automation approach designed in this project. In other words, it helps to answer the question: how can we simplify the deployment mechanism while preserving other vital aspects of deployment, such as continuous delivery, load-balancing, auto-scaling, and support for multi-cloud infrastructures and environments?

The empirical approach (along with the analytical one) helps in experimenting with and observing the outcome of the deployment automation, and how specific background variables affect the result in terms of deployment simplification, performance, reliability, and robustness [5].

3.2 Research Process

The research process comprises a collection of actions and activities necessary for scientific research [5]. These steps are ordered and dependent on each other, meaning that the outcome of one step is essential input for the next steps. Figure 12 describes the scientific research process.

It consists of the following steps: define the problem, review the theory extensively & study previous research, devise the hypothesis, design the research, collect the consequences, and then analyze them. After analyzing and testing the consequences, researchers can refine their research to improve the outcome of their hypothesis. This process can be repeated until researchers obtain satisfying results with respect to their project goals.


Figure 12: Scientific Research Process, inspired by [5]

3.2.1 Solution Overview

Figure 13 introduces the solution overview used in this thesis. The big picture of the proposed solution is based on two significant parts: the deployment of the hybrid cloud infrastructure (the IaaS) and the hybrid platform (the PaaS). The IaaS includes both OpenStack and Microsoft Azure. The PaaS includes (federated) Kubernetes clusters (deployed on top of both OpenStack compute nodes and Microsoft Azure agents) and distributed Spinnaker instances on top of the Kubernetes clusters. The hypothesis in this thesis is that this allows the admin or the developer to deploy their application efficiently with a single commit & push to the VCS (version control system), including GitHub and Docker Hub. Efficient deployment means automated deployment that preserves the essential deployment characteristics and properties, including continuous delivery, load-balancing, cluster auto-scaling, and support for multi-cloud infrastructures and platforms.

Figure 13: Solution Overview


3.2.2 Adopted Research Process

The adopted research process for deployment automation, Figure 14, is based on the scientific research process in Figure 12 and on the research approaches of this project (mentioned in section 3.1).

Figure 14: Adopted Research Process (Stage 1: preparation; Stage 2: IaaS deployment with OpenStack and Microsoft Azure; Stage 3: PaaS deployment with Kubespray and Azure Kubernetes clusters plus distributed Spinnaker; Stage 4: test case with an Angular test app dockerized, then deployed & baked with K8S & Spinnaker; Stage 5: performance testing with Prometheus and Google Lighthouse; Stage 6: results analysis, comparison, and conclusions)

The devised hypothesis in this project concerns the usage of (federated) Kubernetes clusters along with Spinnaker continuous delivery, which would simplify deployment and achieve automation along with continuous delivery. Furthermore, deployment properties, including auto-scaling, secure deployment, zero-downtime, and load-balancing, are preserved and achieved.

Using the qualitative approach, it is possible to ensure the quality of the solution by different test cases. The analytical and empirical approaches are used to control the test cases variables and analyze the consequences. The applied approach is essential also in this project to evaluate and refine the solution used in the test cases.

As seen in Figure 14, the research approach used in this project consists of six stages:

Stage 1: Project Preparation

Formulating the problem is the essential first step of the entire project. The most appropriate mechanism to identify the research problem is to discuss it with the supervisors (at both Ericsson and KTH). The result of these discussions is a problem narrowed down to a specific, focused matter.

Once the research problem was formulated, the next steps in this stage were the theory and previous-research studies. We conducted an extensive literature study of various scientific papers & videos about deployment (in cloud computing). Furthermore, we used different mechanisms to find suitable methods and instruments for this project, such as meetings with experts, workshops with supervisors, and reading articles.

Stage 2: IaaS Deployment

After the preparation in the first stage of the project, the devised working hypothesis and its assumptions are addressed in both stage 2 and stage 3. In the second stage of the project, the deployment of a working infrastructure as a service is required. We deploy two IaaS in this stage: OpenStack (for the on-premise cloud) and Microsoft Azure (for the public cloud). They both spawn the virtual machines that will be used in stage 3 for Kubernetes and Spinnaker.

Stage 3: PaaS Deployment

When the IaaS is in place and the virtual machines are spawned, it is possible to deploy the Kubernetes clusters. The Kubernetes clusters are geographically distributed because we use different cloud infrastructures. Thus, we have Kubespray clusters (on-premise clusters) and Azure Kubernetes clusters (public cloud clusters).

After that, the Spinnaker framework is added to perform continuous delivery. To ensure best practices in our solution and improve deployment automation, Spinnaker is distributed into both the on-premises cloud and the Microsoft Azure public cloud.

Stage 4: Case Study

The first phase of the case study intends to test the automation of deployment. Thus, in this phase of the case study, the first step is to develop the web application (we call it CHATi) using Angular and Firebase (Google's distributed database).

The next step is to dockerize the application so that it can be deployed in the Kubernetes clusters (specifically in Docker containers). Then Spinnaker comes into place to bake the application. We develop Spinnaker scripts that allow the application deployment to be triggered automatically when the application code is pushed to the VCS repository. The deployed application ends up in various clusters in both the on-premise and public clouds. Finally, we collect the consequences (results) of the deployment automation, including the load-balancing, performance, and usability of the application deployment.
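As a rough sketch of the dockerization step (the registry user and tag below are placeholders; the actual pipeline details follow in Chapter 4), the CHATi image can be built and published with:

# Build the image from the Dockerfile in the project root
docker build -t <dockerhub-user>/chati:v1 .

# Push it to Docker Hub so the clusters (and Spinnaker) can pull it
docker push <dockerhub-user>/chati:v1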

Stage 5: Performance Testing

In this stage, we investigate zero downtime when a new release of the application is deployed. We also perform different kinds of performance testing, such as request duration, CPU usage, memory usage, and message delivery.

Furthermore, we kill the original Spinnaker process and then try to deploy a new release of the application. In other words, we test the availability of the application and of Spinnaker as the continuous delivery platform. Finally, we test the performance and usability of application deployment within federated clusters spanning public and on-premise clusters.

Stage 6: Results Analysis

This is the final stage of the project. We evaluate and analyze the results and consequences collected in the case study stages and compare them. Furthermore, we present our results to our supervisors and receive feedback. Finally, we draw the advantages and disadvantages of this deployment approach and outline the future work on deployment automation.

3.3 Project Management Objectives

After the first meetings with the supervisors of this project, we adopted specific methods to work on the project. Furthermore, scientific projects that employ the applied and qualitative research approaches usually involve a tradeoff between three main constraints that make up the quality of the project. These triple-constraint variables are cost, scope, and time [20]. This project follows the triple constraint theory triangle (cost, scope, and time) to ensure the quality of the project.

3.3.1 Project Methods

We apply several scientific research methods based on the research approaches followed in section 3.1. These methods are divided into three main categories: rule-based inferencing methods, data collection methods, and analysis methods.

1. Rule-Based Inferencing Methods

Rule-based inferencing methods are an essential part of the project; they allow us to draw conclusions from facts or pieces of evidence we know about software deployment. There are two main rule-based inferencing methods used in this project:

a. Induction Method

The induction method is the mechanism of concluding from observation. In other words, we start from the observation of consequences and, based on them, come up with a generalization. The problem with this method in the deployment automation project is that knowledge extended beyond what we have observed might be fallible (false); the more reliable mechanism is the deductive inferencing method [21].

b. Deductive Method

The idea behind the deductive method is to move from the general to the specific, which means we start from the hypothesis and then check its consequences by performing qualitative research. There are two main methods of deduction used in this project: modus ponens and modus tollens.

The modus ponens method starts from a collection of assumptions; if all of them are accepted as true, then we accept the consequences of these assumptions. The more robust method is modus tollens, in which if at least one consequence of the deployment automation turns out to be false, then we reject the hypothesis immediately.

2. Data Collection Methods

Once the problem of the thesis was determined, the process of deciding on the data collection methods for the deployment automation research started. Deciding which data collection methods to use is the next most crucial step after defining the problem statement.

Most of the data collection methods in this project are based on the collection of primary data. Primary data are the data we observe for the first time in the experiments and case studies. The selection of data collection methods in this project is based on the nature, scope, time factor, and precision required for the deployment automation. The selected methods are the observation method, the case study method, and the interview method.

a. Observation Method

In behavioral scientific research, the most accepted method is the observation method because it serves the designated purpose of the project [5] and lets us check the internal validity of the hypothesis.

In this deployment automation project, the observation method serves to collect information about what is happening during the deployment procedures and to record it in a way that eliminates observer-expectancy bias.

b. Case Study Method

The case study method is an essential part of the project and follows the qualitative approach. We can think of it as an extension of the observation method: it involves an in-depth investigation of the deployment automation procedures. Furthermore, it helps to focus on specific interrelated events or conditions and to provide a complete analysis of them.

c. Interview Method

The interview method is the mechanism of collecting data through a motivated verbal request-response exchange. We needed more information about each framework in both the IaaS and the PaaS, because the information we found by studying previous research and theory was not enough.

We conducted structured and focused interviews with experts in each framework. In other words, we focused on specific issues and aspects of the framework and its impact on other components. The interviews helped in forming the hypothesis, in understanding deployment in the cloud environment, and in designing the case studies of the project.

3. Analytical Methods

The analytical methods are based on a qualitative approach (qualitative data analysis). In this project, we follow three categories of qualitative data analysis: content analysis, framework analysis, and grounded theory analysis.

a. Content Analysis

Content analysis is the analysis procedure that focuses on the behavior of data that can be classified and tabulated [5]. This allows us to compare different kinds of deployment behaviors from other studies with the case studies performed in this project.

b. Framework Analysis

Framework analysis is based on the applied research approach. Framework analysis and content analysis are similar in many ways, but the essential difference is that framework analysis summarizes and structures the data. Furthermore, the summary of the data remains connected to the original data.

c. Grounded Theory Analysis

In this data analysis method, we basically start from a theory, formulate it, analyze and test it, and then reformulate it as needed. This iterative method is known as grounded theory analysis.

3.3.2 Triple Constraint Theory

Figure 15: Project Management Constraint Triangle (cost, scope, and time, with quality at the center) [20]

Each deployment mechanism follows the concept of the project management triple constraint triangle. In other words, while planning the project, we divided it into three main criteria based on [20]: the project development schedule & duration, the cost of each deployment approach, and the scope of the requirements of our test cases.

Figure 15 shows the interconnection between the three constraints. For instance, the duration and time required in the project are affected by the cost and the scope. This means there is a tradeoff between these constraints in which one constraint can affect the others. The quality of the project is the overall balancing between these project management constraints.

3.4 Testing & Evaluation

To evaluate each deployment mechanism, we define different test methods that help to identify the advantages and disadvantages of each deployment approach. These test methods are deployment using VCS, zero-downtime testing, Google Lighthouse testing, and stress testing.

3.4.1 Deployment Using VCS

This kind of testing verifies whether the project (in the test case) can be deployed to the hybrid clusters each time a new release (committed, pushed, and tagged release code) is created in the VCS. This simplifies deployment for developers: a pipeline is created to deploy the new release to all clusters in the platform, so the developers can focus only on development rather than deployment.

3.4.2 Zero Downtime Testing

Zero-downtime testing relates directly to software availability. In other words, this testing answers the question of how we can ensure high availability for the software so that there is no interruption of the software and its services while a deployment is in progress. This property is known as high availability: the business must not be affected negatively while the incremental deployment is taking place.

This testing is affected by several factors, such as the usage of load-balancers, stateless applications, and replication among different clusters in the cloud.
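One common way to support such zero-downtime rollouts in Kubernetes is a rolling-update strategy combined with a readiness probe, so old pods are removed only once their replacements accept traffic. A minimal sketch (names, counts, and the probe path are assumptions, not the thesis's exact configuration):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: chati
spec:
  replicas: 3
  selector:
    matchLabels:
      app: chati
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0        # never remove a serving pod before its replacement is ready
      maxSurge: 1              # add at most one extra pod during the rollout
  template:
    metadata:
      labels:
        app: chati
    spec:
      containers:
      - name: chati
        image: registry.example.com/chati:v2
        readinessProbe:        # pod only receives traffic once this check succeeds
          httpGet:
            path: /
            port: 80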

3.4.3 Google Lighthouse Testing

Google Lighthouse is a performance testing tool that checks several criteria of the deployed web application, including accessibility, first contentful paint, speed index, CPU usage, content breakdown, and performance improvement suggestions [22].
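For reference, Lighthouse can also be run from the command line via its npm package; the URL below is a placeholder, and the thesis does not state how Lighthouse was invoked:

# Install the Lighthouse CLI (requires Node.js)
npm install -g lighthouse

# Audit the deployed application and write a JSON report
lighthouse http://<deployed-chati-url> --output json --output-path ./lighthouse-report.json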


3.5 Deployment Automation Analysis

This is the final step of the project, where we compare the results accumulated from the test cases. We then evaluate and analyze these results and formulate the advantages and disadvantages of deployment automation in hybrid clusters.

3.6 Project Resources

The resources in this project are shared with four other students, with whom we share the OpenStack resources, Kubernetes clusters, and Microsoft Azure resources. Table 1 shows the infrastructure resources that Ericsson Software Technology provided us.

Infrastructure Resource     Quota
Instances                   50 instances
VCPU                        100 VCPUs
RAM                         200 GB
Disk Space                  1000 GB
Volumes                     10 volumes
Snapshots                   10 snapshots
Floating IPs                15 floating IPs
Security Groups             10 security groups
Routers                     10 routers

Table 1: Infrastructure Resources provided by Ericsson


Chapter 4

4 Application Deployment in The Hybrid Cloud

This chapter is the core of the deployment automation project: it describes the test cases performed in this project, along with the cost of each deployment approach. In other words, it presents each step taken in the test studies conducted in this project.

4.1 Cloud Environment Deployment

The first step towards deployment automation in both test-case studies is the deployment of the edge cloud environment (both the IaaS and the PaaS). This step comes directly after the problem definition and the extensive study of theory and previous research.

4.1.1 OpenStack Deployment

OpenStack is the on-premise cloud infrastructure. In this project, we deploy the Rocky version of OpenStack using a framework called kolla-ansible. It is an orchestration tool that deploys the IaaS services, such as the OpenStack services, into containers [23].

As seen in Figure 16, deploying OpenStack begins with the preparation of the controller and compute nodes. To enable this preparation, we need another node called the jump host or jump server. The jump server is assigned one of the floating IP addresses (provided as resources) so that it can be accessed on port 22 using the secure shell (SSH). Code Snippet 1 shows the installation of the essential dependencies on the jump server.

# Install updates & dependencies
$ sudo apt update && sudo apt upgrade -y
$ sudo apt-get install -y python-pip python-dev libffi-dev gcc libssl-dev python-selinux
$ export LC_ALL=C && sudo pip install -U pip

# Install Ansible
$ pip install -U ansible

# Install Kolla-Ansible
$ pip install kolla-ansible

Code Snippet 1: Installing dependencies

Figure 16: OpenStack Deployment

The jump server helps to install the required dependencies and configuration onto the controller and compute nodes, which do not need floating IP addresses to be accessible from external networks.

The idea is to install the Kolla-Ansible framework on the jump server, so that it distributes the dependency installation to all hosts (controllers and compute nodes) after the inventory is specified. The inventory is simply a textual configuration file, called multinode, that specifies the jump host, controllers, and compute nodes that will be used to deploy the IaaS (OpenStack) and the PaaS (Kubernetes). Code Snippet 2 shows the partial content of the inventory file used in this project for the first cluster that deploys the OpenStack infrastructure services. For this cluster we use one controller, called khaled-kube1-master, and three compute nodes, which are khaled-kube1-node1, khaled-kube1-node2, and khaled-kube1-calico1.

[control]
# These hostnames must be resolvable from your deployment host
khaled-kube1-master ansible_user=ubuntu ansible_become=true

[network:children]
control

[compute]
khaled-kube1-node1 ansible_user=ubuntu ansible_become=true
khaled-kube1-node2 ansible_user=ubuntu ansible_become=true
khaled-kube1-calico1 ansible_user=ubuntu ansible_become=true

[monitoring:children]
control

[storage:children]
compute

[deployment]
khaled-jumphost ansible_connection=local

Code Snippet 2: Multinode Inventory File

If the multinode inventory file is set up correctly (so that all hostnames are resolvable), then all controllers and compute nodes are pingable. To ping all nodes at the same time, we use the Ansible ping module against the inventory: ansible -i ./multinode -m ping all

The next step is to configure the passwords located in the passwords.yml file of the kolla-ansible framework. To simplify setting a password for each OpenStack service, the kolla-ansible framework comes with a password generation tool called kolla-genpwd.
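Assuming the default kolla-ansible layout (where the configuration lives under /etc/kolla), this boils down to a single command:

# Fill in random passwords for every empty entry in /etc/kolla/passwords.yml (default location)
kolla-genpwd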

The final step in the configuration stage is to specify and configure the kolla globals.yml file. OpenStack requires globals.yml because it defines the variables and properties for OpenStack, such as the base image used in the deployment (we use Ubuntu), the installation type, the OpenStack release (we use the Rocky version), the network interfaces, the IP addresses assigned to the OpenStack services, and other properties.
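A sketch of the kind of entries set in globals.yml (the interface name and VIP address below are illustrative assumptions; the thesis's actual values are not reproduced here):

kolla_base_distro: "ubuntu"              # base image used for the service containers
kolla_install_type: "source"             # installation type
openstack_release: "rocky"               # OpenStack release deployed in this project
network_interface: "ens3"                # management interface on the hosts (assumed name)
kolla_internal_vip_address: "10.1.0.250" # internal VIP for the OpenStack services (assumed address)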


With the essential dependencies installed, and the inventory file configured along with the generated passwords and globals.yml, we move forward to deploy OpenStack. Code Snippet 3 shows the deployment of OpenStack, which is divided into three stages:

1. Servers bootstrapping: The server bootstrapping step is crucial because it applies and deploys all previously performed configuration.

2. Pre-deployment checks: OpenStack requires the accessibility and configuration checks to pass before starting the actual deployment.

3. Deployment of the OpenStack services: This is the final step in the OpenStack deployment. It deploys the actual OpenStack services. These three steps together make up the OpenStack playbook.

# Servers bootstrapping:

$ kolla-ansible -i ./multinode bootstrap-servers

# Pre-deployment checks
$ kolla-ansible -i ./multinode prechecks

# OpenStack deployment
$ kolla-ansible -i ./multinode deploy

Code Snippet 3: Actual OpenStack Deployment

Once OpenStack is deployed, we need to generate the admin-openrc file. The role of the admin-openrc file is to export the variables OpenStack needs to determine the user credentials, the domain, and the project to work with. To generate the openrc file, we use the kolla-ansible command kolla-ansible post-deploy. To make the variables in the openrc file visible to OpenStack and the Ubuntu OS, we need to source it.
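Assuming the default /etc/kolla path used by kolla-ansible, the two steps look roughly like this:

# Generate the admin-openrc.sh credentials file
kolla-ansible post-deploy

# Export the credentials into the current shell so the openstack CLI can use them
source /etc/kolla/admin-openrc.sh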

The final step is to install the OpenStack client CLI. We used Python pip to install it: sudo pip install python-openstackclient


# Check whether the OpenStack client is installed successfully
$ openstack --version
openstack 3.19.0

# Check the installed path of the OpenStack client
$ which openstack
/usr/bin/openstack

Code Snippet 4: Check deployed OpenStack

4.1.2 Kubernetes Deployment

Once OpenStack is deployed successfully, we can deploy the platform as a service (Kubernetes) on top of it. There are several mechanisms and methods to deploy Kubernetes on-premises, such as Kubeadm and Kubespray. Since we are already using Ansible as the framework, Kubespray is deployed. One of the benefits of choosing Kubespray is that it can deploy clusters spanning multiple platforms, including Microsoft Azure and AWS.

The installation of Kubespray is inspired by [24] and adapted to run on top of OpenStack. In section 4.1.1 we deployed the IaaS with the controller and compute nodes. Figure 17 shows an example of one of the Kubespray clusters to be deployed on-premises on top of the OpenStack compute nodes.

khaled-kube1-master — the cluster's master node
khaled-kube1-node1, khaled-kube1-node2 — the cluster's worker nodes
khaled-kube1-calico1 — the cluster's virtual networking (Calico route reflector)

Figure 17: An example of one edge cluster

Similar to the OpenStack deployment, we use Ansible to deploy the Kubespray cluster. Furthermore, the jump server is used to install the cluster onto the master and worker nodes.


Figure 18: Kubespray Cluster Deployment

Figure 18 shows the required steps to deploy the on-premises Kubespray cluster. Code Snippet 5 shows the cluster preparation stage. The requirements file contains the prerequisite libraries needed to deploy the cluster, such as ansible, jinja2, netaddr, pbr, ansible-modules-hashivault, and hvac. Once the requirements are installed, we can proceed to configure the cluster's inventory. We prepare the inventory by preparing the hosts.ini file. The hosts.ini file contains the definition of the master(s), worker(s), etcd, and the Container Network Interface (CNI) used for pod-to-pod networking, such as Calico.

The final step in the cluster preparation is to review the k8s-cluster.yml file and perform the required changes. For instance, to ensure that the Kubernetes cluster uses Calico as the container network interface, we changed the k8s-cluster.yml file. Figure 19 demonstrates the part of k8s-cluster.yml we used to configure this k8s cluster.

# Installing requirements
$ sudo pip install -r requirements.txt

# Configure the cluster's inventory
$ sudo vim hosts.ini

[all]
khaled-kube1-master ansible_ssh_host=10.1.0.35 ip=10.1.0.35
khaled-kube1-node1 ansible_ssh_host=10.1.0.39 ip=10.1.0.39
khaled-kube1-node2 ansible_ssh_host=10.1.0.29 ip=10.1.0.29
khaled-kube1-calico1 ansible_ssh_host=10.1.0.31 ip=10.1.0.31

[kube-master]
khaled-kube1-master

[etcd]
khaled-kube1-node1

[kube-node]
khaled-kube1-node1
khaled-kube1-node2

[k8s-cluster:children]
kube-node
kube-master

[calico-rr]
khaled-kube1-calico1

[rack0]
khaled-kube1-master
khaled-kube1-node1
khaled-kube1-node2
khaled-kube1-calico1

[rack0:vars]
cluster_id="1.0.0.0"

Code Snippet 5: Cluster Preparation


Figure 19: Part of k8s-cluster configuration file
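Figure 19 is an image in the original thesis; as a rough indication only, the relevant Kubespray variables in k8s-cluster.yml look like the following (values other than the Calico plugin choice are illustrative Kubespray defaults, not confirmed by the thesis):

kube_network_plugin: calico              # use Calico as the container network interface
kube_service_addresses: 10.233.0.0/18    # service IP range (assumed default)
kube_pods_subnet: 10.233.64.0/18         # pod IP range (assumed default)
cluster_name: cluster.local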

After the cluster preparation, we configure the Kubernetes cluster to recognize the IaaS infrastructure it is deployed on top of. As shown in Figure 20, we change another configuration file, all.yml in the group_vars directory of Kubespray, by altering cloud_provider from none to 'openstack'.

Figure 20: K8S cluster to be deployed on-top-of OpenStack

If all configurations in the first two steps from Figure 18 are done correctly, we may proceed to the actual deployment of the k8s cluster. The following ansible-playbook command orchestrates the actual deployment of the cluster:

ansible-playbook -i inventory/ws/hosts.ini --become --become-user=root cluster.yml

The final step of the Kubernetes cluster deployment is to install the Kubernetes command-line interface (kubectl) and place the context (the kube config file) of the cluster in $HOME/.kube/config so that the jump server can detect and manage the cluster. There are several ways to install kubectl. On Ubuntu, we used the apt-get tool to install it simply with sudo apt-get update && sudo apt-get install -y kubectl, while on Mac systems we used Homebrew.


Finally, we add the cluster configuration file that has all properties required to reach the cluster via its master. Figure 21 shows one of the cluster’s config files used in this project.

$ cat $HOME/.kube/config
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: LS0tLS1CRUdJTiBD0t…
    server: https://10.1.0.35:6443
  name: cluster1
contexts:
- context:
    cluster: cluster1
    user: admin-cluster1
  name: cluster1
current-context: cluster1
kind: Config
preferences: {}
users:
- name: admin-cluster1
  user:
    client-certificate-data: LS0tLS1CRUdJTiBDRVJUSUZJ…
    client-key-data: LS0tLS1CRUdJTiBSU…

Figure 21: Cluster’s config file

To ensure that the Kubernetes cluster is deployed successfully, we can check it using kubectl. Figure 22 shows the deployed cluster. We can see the nodes that are currently "Ready". We can also inspect the Kubernetes system pods, and notice that many of them are replicated, so that if a system pod crashes it is restarted automatically while the replica pod keeps serving the master and worker nodes.
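The checks behind Figure 22 correspond to standard kubectl commands, for example:

# List the cluster nodes and their readiness state
kubectl get nodes -o wide

# List the Kubernetes system pods (including the replicated control-plane pods)
kubectl get pods -n kube-system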

All of the previous procedure was to deploy a Kubernetes cluster on-premises on top of our infrastructure. We also need to deploy another Kubernetes cluster on top of the Microsoft Azure infrastructure. Microsoft Azure provides well-defined instructions for deploying a Kubernetes cluster [25]. As it is a public cloud, the cluster configuration options are limited and most of the work is done by Microsoft Azure.


Figure 22: Check Kubernetes cluster system nodes and pods

The first step is to install the Microsoft Azure command-line interface. Then we specify the cluster's resource group, name, node count, and the region in which the cluster is deployed. The region is an important factor for the end-user, because the latency of the deployed application can rise significantly depending on the location of the cluster. For our cluster, we chose West Europe as the deployment region. Figure 23 shows the Microsoft Azure Kubernetes Service (AKS) cluster, called khaled-aks1.
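A sketch of the corresponding Azure CLI calls (the node count and the --generate-ssh-keys flag are assumptions; the resource group and cluster name match those used later in this chapter):

# Create the AKS cluster in the West Europe region
az aks create --resource-group AKSKubernetesRG --name khaled-aks1 \
    --node-count 5 --location westeurope --generate-ssh-keys

# Merge the cluster credentials into the local kubeconfig
az aks get-credentials --resource-group AKSKubernetesRG --name khaled-aks1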

Figure 23: Running nodes and K8S Pods in AKS


4.1.3 Spinnaker Deployment

Spinnaker deployment is not an atomic operation, especially since we require Spinnaker to be distributed across several nodes and pods, so that if one Spinnaker node crashes, other nodes can take over without losing the software deployment and pipelining. Furthermore, unlike the Azure AKS deployment, it is not supported out of the box in private or public clouds.

In the on-premise cloud environment, we have a dedicated virtual node where Spinnaker is deployed. Furthermore, we deploy Spinnaker in a Microsoft Azure cluster of five nodes as the public cloud. We set up the collaboration and replication between the various Spinnaker instances using Zookeeper, which lets us automate the software deployment in both the private and public cloud environments: if one Spinnaker node (in either environment) goes down, one of the other nodes takes its place, so the application is still deployed to both the private and public cloud clusters.

As seen in Figure 24, Spinnaker has several prerequisites for deployment, such as Halyard (Spinnaker's lifecycle manager) and storage (which can be Minio, the S3-compatible object storage, or Redis, the in-memory database and message broker). These are the general requirements for a Spinnaker deployment. However, we found that deploying Spinnaker involves extensive configuration that differs based on the nature of the cloud environment. For example, the distributed Spinnaker requires more configuration and the deployment of Zookeeper, so that if one virtual node goes down, another instance takes its place.

We start by deploying Spinnaker locally on the on-premise virtual node called "khaled-spinnaker". The second step is to deploy distributed Spinnaker into the cluster of nodes in the public cloud (on top of the AKS cluster). The final step is to synchronize both Spinnaker instances (in the private and public cloud environments) using Zookeeper. The reason for using Zookeeper as a centralized service is to simplify the management of the deployed Spinnaker instances.

4.1.3.1 Deploying Spinnaker Locally

To install Spinnaker locally, we follow the requirements described in Figure 24; we wrote a bash script, inspired by [26] with a set of modifications, to deploy Halyard and Minio in Docker.


Figure 24: Spinnaker Deployment – General Requirements

# First requirement: Halyard installation
curl -O https://raw.githubusercontent.com/spinnaker/halyard/master/install/debian/InstallHalyard.sh
sudo bash InstallHalyard.sh --user ubuntu

# Second requirement: Minio installation in a Docker container
curl -fsSL get.docker.com -o get-docker.sh
sh get-docker.sh
sudo usermod -aG docker ubuntu
sudo docker run -p 127.0.0.1:9090:9000 -d --name minio1 -v /mnt/data:/data -v /mnt/config:/root/.minio minio/minio server /data
sudo apt-get -y install jq apt-transport-https


# Set the Minio secret key (required for S3 object storage)

MINIO_SECRET_KEY=`echo $(sudo docker exec minio1 cat /data/.minio.sys/config/config.json) | jq -r '.credential.secretKey'`
MINIO_ACCESS_KEY=`echo $(sudo docker exec minio1 cat /data/.minio.sys/config/config.json) | jq -r '.credential.accessKey'`
echo $MINIO_SECRET_KEY | hal config storage s3 edit --endpoint http://127.0.0.1:9090 \
    --access-key-id $MINIO_ACCESS_KEY \
    --secret-access-key
hal config storage edit --type s3

Code Snippet 6: Installing Halyard and Minio in the local virtual node

Code Snippet 6 shows the initial stages of installing Spinnaker locally. First, we ensure that Spinnaker's lifecycle management tool (Halyard) is installed for the ubuntu user. Then we deploy the S3 object storage (Minio). Any S3 object storage requires an access key id and a secret key, so we add them to Minio. After that, we need to set the correct base URL for Spinnaker's endpoints so that it can be accessed. Because we use a single-host local Spinnaker, we can assign its IP address to the Spinnaker base URLs, like the following:

hal config security ui edit \
    --override-base-url http://`curl -s ifconfig.co`:9000
hal config security api edit \
    --override-base-url http://`curl -s ifconfig.co`:8084

The final step in deploying Spinnaker locally is to install the dependencies and apply Spinnaker's providers. Code Snippet 7 deploys the dependencies, including the Redis server (the in-memory data structure store and database) and Spinnaker's default profile. Finally, we add the Kubernetes cluster as a Spinnaker provider so that we can access it through Spinnaker's tools for automatic deployment.


# Install dependencies
sudo apt update && sudo apt-get -y install redis-server
sudo systemctl enable redis-server && sudo systemctl start redis-server

if [ ! -e ~/.hal/default/profiles ]; then
    sudo mkdir -p ~/.hal/default/profiles && sudo chown -R ubuntu:ubuntu ~/.hal/default/profiles
fi

sudo echo 'spinnaker.s3.versioning: false' > ~/.hal/default/profiles/front50-local.yml

SPINNAKER_VERSION=1.12.6

# Spinnaker post deployment
# Configure the Kubernetes cluster as a Spinnaker provider
hal config provider kubernetes enable
hal config provider kubernetes account add k8s-account \
    --provider-version v2 \
    --context $(kubectl config current-context)

# hal config deploy edit --type distributed --account-name k8s-account

sudo hal config version edit --version $SPINNAKER_VERSION

# Apply and finalize the Spinnaker installation
sudo hal deploy apply

Code Snippet 7: Installation of Spinnaker locally

4.1.3.2 Deploying Distributed Spinnaker

To distribute Spinnaker across a collection of nodes in a cluster, we use Helm charts. Briefly, Helm is an application package manager that runs on top of Kubernetes and uses charts, which are sets of files that describe Kubernetes resources. In other words, a chart can be used to deploy pods in a collection of containers, where these pods contain an application or microservices such as a web application. So how can Helm simplify the distribution of Spinnaker onto the cluster's nodes?

Figure 25: Spinnaker deployment into the public cloud

Spinnaker ships its files and dependencies as a Helm chart, so instead of deploying the Spinnaker resources onto the various cluster hosts manually, we can use Helm to distribute the Spinnaker resources to all of the hosts in their respective clusters.

As shown in Figure 25, the first stage is to prepare the cluster's environment for Spinnaker to be installed. The first step in this stage is to install the Helm server (Tiller) and its role-based access control (RBAC) into the Kubernetes cluster, which enables admins to control, configure, and manage policies and cluster resources.

In the yaml file of Code Snippet 8, we deploy the Helm server service account into the kube-system namespace (the namespace that holds all of the cluster's system and networking pods). Then we authorize this service account by creating a cluster role binding that associates the tiller account with the cluster-admin role.

To install the yaml file specified in Code Snippet 8, we use the Kubernetes command-line interface, kubectl:

kubectl apply -f helm-rbac.yaml

And then we initialize the service account:

helm init --service-account=tiller

apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1beta1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
- kind: ServiceAccount
  name: tiller
  namespace: kube-system

Code Snippet 8: Deploy and authorize the service account [27]

The next step of this stage is to prepare the ingress so that the application deployment traffic can be forwarded to the Spinnaker resources. We achieve this by enabling the HTTP application routing add-on. To enable this resource in Microsoft Azure, we enable the HTTP add-on on the Kubernetes cluster, and then we write the HTTP routing configuration to a file. Code Snippet 9 enables the HTTP routing and saves its DNS zone in the dns_root.txt file for future usage. So, the output of cat ../dns_root.txt would be cf4ba0ac01c54607a5e5.westeurope.aksapp.io, which is the domain that Helm uses to establish the Spinnaker resources.


az aks enable-addons --resource-group AKSKubernetesRG --name khaled-aks1 --addons http_application_routing

az aks show --resource-group AKSKubernetesRG --name khaled-aks1 --query addonProfiles.httpApplicationRouting.config.HTTPApplicationRoutingZoneName -o tsv > ../dns_root.txt

cat ../dns_root.txt

Code Snippet 9: Configure http routing for khaled-aks1 resources

The next stage is to update the Helm chart default values in the values.yaml file [28]. As seen in Code Snippet 10, we configure Spinnaker's Kubernetes clusters, where we may add several clusters to work with. This is performed in the kubeConfig section of values.yaml.

kubeConfig:
  enabled: true
  secretName: kubeconfig
  secretKey: config
  contexts:
  - akscluster
  deploymentContext: khaled-aks1

# We set our cluster's http routing in ingress and ingressGate
ingress:
  enabled: true
  host: spinnaker.cf4ba0ac01c54607a5e5.westeurope.aksapp.io
  annotations:
    kubernetes.io/ingress.class: addon-http-application-routing

ingressGate:
  enabled: true
  host: gate.cf4ba0ac01c54607a5e5.westeurope.aksapp.io
  annotations:
    kubernetes.io/ingress.class: addon-http-application-routing


Code Snippet 10: Modifying values of the distributed Spinnaker deployment

We also update the ingress and ingressGate (Spinnaker's UI) with Spinnaker's HTTP routing (the DNS zone) obtained in the first stage (the ingress router), so that Spinnaker can communicate with these Kubernetes clusters.

The next stage is to deploy the Spinnaker Helm chart using our modified values.yaml. But before deploying the Helm chart, we need to create Spinnaker's namespace and its kubeconfig secret.
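The exact commands are not reproduced in the text at this point; a plausible sketch, given the secretName kubeconfig and secretKey config used in Code Snippet 10, is:

# Create the namespace that will hold the Spinnaker resources
kubectl create namespace spinnaker

# Store the kubeconfig as a secret so the chart can reach the target cluster
kubectl create secret generic kubeconfig \
    --from-file=config=$HOME/.kube/config --namespace spinnaker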

Finally, we can deploy Spinnaker using our modified Spinnaker Helm chart:

helm install -f values.yaml --name cd --namespace spinnaker stable/spinnaker

The actual deployment can take several minutes to install all of Spinnaker's libraries and modules, including Gate (see 2.6.1). To ensure Spinnaker is up and running, we try the ingress configured in the values.yaml file. In other words, if we enter spinnaker.cf4ba0ac01c54607a5e5.westeurope.aksapp.io in the browser's address bar, then Spinnaker's gate should show the Kubernetes cluster along with our applications. Figure 26 shows distributed Spinnaker running in several pods.


Figure 26: Running distributed Spinnaker

4.2 Test Study (Case Study)

With our cloud edges up and running, we develop our test applications for deployment automation. The first test study project is called CHATi. CHATi is a containerized online chatting application written as microservices using Angular and Firebase, to be deployed in our Kubernetes clusters in both the public and private cloud systems. CHATi is available at: https://github.com/jSchnitzer1/CHATi

4.2.1 Dockerfile

A Dockerfile is an ordered textual script with a set of commands (executed from top to bottom) that instruct the Docker engine how to build, run, and deploy the project into containers in the Kubernetes clusters [29].

# stage 1
FROM node:latest as node
WORKDIR /app
COPY . .
RUN npm install
RUN npm rebuild node-sass
RUN npm run build --prod

# stage 2
FROM nginx:alpine
COPY --from=node /app/dist/CHATi /usr/share/nginx/html

Code Snippet 11: CHATi DockerFile

As we see in Code Snippet 11, the CHATi Dockerfile contains 8 commands divided into two stages.

Stage #1: This stage is used for building the CHATi application. It contains the tools to build and debug the app; basically, these are the Angular and Node.js commands.

Stage #2: This stage is used for running the CHATi application. In other words, it contains the tools to start the app with the best performance.


Going through stage #1 of the Dockerfile, it starts with the keyword FROM. FROM is a mandatory keyword because it tells the Docker engine to build the application image from the Node.js base image. In other words, the Kubernetes cluster knows that it will deploy this application with Node.js as the backend framework. The Node image is pulled from the public repository at its current latest version, as specified in the command.

The next command is WORKDIR. WORKDIR creates (if necessary) the working directory of the project, named "app", since the code root directory of Angular projects is called "app", and makes it the directory in which the subsequent commands are executed.

After that, in the first stage of the Dockerfile, we have the COPY command. It copies the content of the CHATi project code into the file system of the Docker build container, which is the destination. The COPY . . command means that the current directory (where the project exists) is copied to the current working directory of the container file system.

The next three commands are RUN commands. They are used to install dependencies, build, and prepare the CHATi project in the container. Each RUN command is executed in a new layer without affecting the original image.

In stage #2 of the Dockerfile, we start the application. Because it is a web application, we need a web server container that can host the CHATi application. For this reason, we use the nginx web server, which is widely used in cloud computing. So, as in stage #1, we have a FROM command to fetch the nginx image and add it to the build.

Finally, we have a COPY command that copies the distributable files of the CHATi project to the web directory of nginx, so that the CHATi application is exposed on port 80.

The next step is to create the Kubernetes deployment manifest file(s). In this project, we want to compare the usage of NodePort and a LoadBalancer.

4.2.1.1 NodePort and LoadBalancer

As we need to compare using a load balancer against calling the cluster directly through a NodePort, we use two different deployment manifests. The deployment.yaml manifest uses the cluster's NodePort, while deployment.lb.yaml uses the cluster's load balancer; applying them is sketched below.
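A minimal sketch of how each variant is rolled out, assuming kubectl is already pointed at the target cluster:

# On-premise cluster: expose CHATi through a NodePort service
kubectl apply -f deployment.yaml

# Public cloud (Azure) cluster: expose CHATi through a cloud load balancer
kubectl apply -f deployment.lb.yaml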


[Diagram: external traffic reaching the two Kubernetes clusters, one exposing the CHATi pods through NodePort ports (31000, 31002, 31005) and the other through a load balancer in front of the pods]

Figure 27: NodePort and LoadBalancer designs

Figure 27 shows the different mechanisms of NodePort and LoadBalancer when external calls (traffic) arrive at the cluster. NodePort applications are bound to a specific port; in other words, each application can be reached only through the port assigned to it. A LoadBalancer, on the other hand, exposes a single IP address to which all external calls are sent and then forwarded to the CHATi application.

So, in what scenarios would we apply NodePort or LoadBalancers?

As shown in Figure 27, the NodePort is used when we need one service or application per port. The main benefit of load balancers over NodePorts is that they determine to which node a request should be forwarded, so the service or application is directly exposed through its own IP address. This also brings a drawback of load balancers: we would have to pay for a public IP address for every service that we need to expose to the external world. Figure 28 shows the CHATi web application once deployed.


Figure 28: CHATi Web Application
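To see which exposure mechanism a cluster ended up with, the resulting Service can be inspected. This is a small sketch using the service name from the manifests in Appendix B:

# NodePort cluster: TYPE shows NodePort and PORT(S) shows 80:31002/TCP
# LoadBalancer cluster: TYPE shows LoadBalancer and EXTERNAL-IP shows the public IP
kubectl get service chati-service -o wide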

4.2.2 Enabling Git Artifact Provider

To facilitate continuous deployment automation, we need to set up the GitHub service provider account for Spinnaker. The service provider account points to our application repository on GitHub so that, whenever we push a new update, our deployment automation module deploys the updated application to the Kubernetes clusters in both the private and the public cloud.

The first step is to add a webhook to our CHATi repository. The webhook is a mechanism that integrates the CHATi repository with other online systems, such as the deployment automation modules, by subscribing to events performed on the repository. More specifically, we subscribe to the git push event so that a set of actions is triggered in our deployment automation modules whenever a push event takes place in the CHATi repository.
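In our setup the webhook was added through the GitHub web interface (see Figure 29), but the same subscription could be scripted against the GitHub REST API. In this sketch, forwarding deliveries to Gate's /webhooks/git/github path on the ingress host is an assumption on our side; the repository slug and the trigger secret come from the pipeline configuration.

# Register a push webhook on the CHATi repository (sketch; normally done in the GitHub UI)
curl -X POST \
  -H "Authorization: token $(cat ~/github-token)" \
  -H "Accept: application/vnd.github+json" \
  https://api.github.com/repos/jSchnitzer1/CHATi/hooks \
  -d '{
        "events": ["push"],
        "config": {
          "url": "https://spinnaker.cf4ba0ac01c54607a5e5.westeurope.aksapp.io/webhooks/git/github",
          "content_type": "json",
          "secret": "secret"
        }
      }'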


Figure 29: CHATi webhook into Spinnaker

Figure 29 shows the added webhook in our Spinnaker deployment automation artifacts and triggers (see section 4.2.3). For the webhook to be accessed correctly, a personal access token is required. It is necessary because it is used for the authentication between Spinnaker and GitHub. The personal access token specifies the granted privileges when accessing the CHATi repository.

As seen in Figure 30, the personal access token called "spinnaker-token" grants access to repository events across all repositories in GitHub.


Figure 30: Personal Access Token Used with Spinnaker

The final step of enabling the GitHub artifact provider is to register it in our deployment automation in Spinnaker. As seen in Code Snippet 12, we specify the GitHub token, which is stored in a file. Then we enable the GitHub provider (named "my-github-artifact-account") using the Halyard framework. After applying the changes, we would need to restart the Spinnaker services.

To check the availability of the GitHub artifact provider, we would use the Halyard configuration command. Figure 31 shows both "my-github-artifact-account" and the authenticated connection with GitHub.


# Setting Variables:
TOKEN_FILE=~/github-token
ARTIFACT_ACCOUNT_NAME=my-github-artifact-account

# Enable GitHub artifact provider
hal config features edit --artifacts true
hal config artifact github enable
hal config artifact github account add $ARTIFACT_ACCOUNT_NAME \
  --token-file $TOKEN_FILE

# Applying changes
sudo hal config version edit --version $SPINNAKER_VERSION
sudo hal deploy apply

Code Snippet 12: Enabling the GitHub Provider with a GitHub Token
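A hedged sketch of how the configuration can then be checked from the command line; the exact list subcommand is an assumption on our side, and Figure 31 shows the output we relied on.

# List the configured GitHub artifact accounts (subcommand assumed to exist in this Halyard version)
hal config artifact github account list

# Re-deploy so that the Spinnaker services pick up the new provider
sudo hal deploy apply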

Figure 31: Deployed Github provider and Authentication

4.2.3 CHATi Deployment Pipeline

To automate the deployment, we need to define a sequence of stages, triggers, and artifacts in the deployment automation module. This process is called deployment pipelining.


We use the JSON format to configure the following pipeline (the full configuration is listed in Appendix C).

Figure 32: CHATi deployment pipeline

Figure 32 shows the pipeline we developed to deploy the CHATi application. Our pipeline consists of six stages, which are (in order): Configuration, Check Preconditions, Wait, Deploy, Pipeline, and Undo Rollout.

1. Configuration

Figure 33: Components of Configuration Stage

In this stage, as seen in Figure 33, we define Expected Artifacts and Triggers. An Expected Artifact describes the properties that the pipeline expects to be present in the execution context. These properties are specified in the application configuration file of the new release (deployment.yaml or deployment.lb.yaml); in other words, the Expected Artifact tells the pipeline where to look for the required configuration.

The Automated Trigger specifies what to execute when a git push event happens. Here we specify the CHATi repository together with the artifact constraints defined in the Expected Artifacts.

{ "expectedArtifacts": [ { "defaultArtifact": { "customKind": true, }, "id": "defaultArtifact1", "matchArtifact": { "name": "deployment.yaml", "type": "github/file" }, "useDefaultArtifact": false, "usePriorArtifact": true } ], "triggers": [ { "enabled": true, "expectedArtifactIds": [ "defaultArtifact1"

],

"project": "jSchnitzer1",

"secret": "secret", "slug": "CHATi", "source": "github", "type": "git" } ],

Code Snippet 13: Configuration stage JSON code

To add this stage, we insert the JSON configuration shown in Code Snippet 13 into the pipeline definition.

2. Check Preconditions


Before deploying the new release, we want to ensure that the cluster consists of at least five nodes. In real-world enterprise applications, the pipeline precondition can be used for any condition the cluster should satisfy, such as specific resources or computational power, to ensure the application runs correctly.

Code Snippet 14 presents the JSON configuration that installs the precondition check that the cluster has at least five nodes in service.

{ "name": "Check Preconditions", "preconditions": [ { "cloudProvider": "kubernetes", "context": { "cluster": "khaled-kube1", "comparison": ">=", "credentials": "my-k8s-v2-account", "expected": 5, "regions": [ "default" ] }, "failPipeline": true, "type": "clusterSize" } ], "refId": "4", "requisiteStageRefIds": [],

"type": "checkPreconditions"

}

Code Snippet 14: Check precondition stage


Figure 34: Check preconditions stage in Spinnaker

As a result of Code Snippet 14, the Check Preconditions stage is placed after the Configuration stage. Figure 34 shows the installed precondition.

3. Wait

This stage waits 15 seconds to ensure that the previous stages have finished executing successfully. Code Snippet 15 illustrates it.

{ "name": "Wait", "refId": "2", "requisiteStageRefIds": [ "4" ], "type": "wait", "waitTime": 15 }

Code Snippet 15: Wait stage


4. Deploy

If the previous stages are executed correctly, the next step in line is to deploy the CHATi app. To deploy it, we need to specify the "Deploy (Manifest)" configuration, which defines the application name in the cluster, in other words, where the CHATi app will be deployed. Code Snippet 16 illustrates the deployment of CHATi into the cluster as chatiapp.

{ "account": "my-k8s-v2-account", "cloudProvider": "kubernetes", "manifestArtifactAccount": "my-github-artifact- account", "manifestArtifactId": "defaultArtifact1", "moniker": { "app": "chatiapp" }, "name": "Deploy (Manifest)", "refId": "1", "relationships": { "securityGroups": [] }, "requisiteStageRefIds": [ "2" ], "source": "artifact", "type": "deployManifest"

}

Code Snippet 16: Deploy Stage

Figure 35: Deploy Stage in Spinnaker

Figure 35 shows the deploy stage in Spinnaker. We can see clearly that the deploy stage requires the expected artifact which is defined in the configuration stage.

5. Pipeline

This stage runs a pipeline. In other words, we may run another pipeline, even one on other platforms such as a Jenkins job. In our case, we have a cluster validation tool that runs as a Jenkins job. Code Snippet 17 runs this Jenkins job.

{ "application": "chatiapp", "failPipeline": true, "name": "Pipeline", "pipeline": validateJob, "refId": "6", "requisiteStageRefIds": [ "1" ], "type": "pipeline", "waitForCompletion": true }

Code Snippet 17: Pipeline stage

6. Undo Rollout

Finally, if the deploy stage fails, we need to roll back. This stage is used to undo the changes that might have been applied to the cluster.

This ensures that the current version of the CHATi application stays in place and the cluster's pods keep running correctly. Code Snippet 18 shows the JSON code to roll back the changes and undo them.


{ "account": "my-k8s-v2-account", "cloudProvider": "kubernetes", "isNew": true, "location": "default", "manifestName": "service chati-service", "mode": "static", "name": "Undo Rollout (Manifest)", "notifications": [ { "address": "khaled.jendi@.com", "level": "stage", "type": "bearychat", "when": [ "stage.complete", "stage.failed" ] } ], "numRevisionsBack": 1,

"refId": "5",

"requisiteStageRefIds": [

"1" ], "sendNotifications": true, "type": "undoRolloutManifest" }

Code Snippet 18: Undo rollout stage


Chapter 5

5 Results, Discussions and Conclusions

In this chapter, we present the results of the different deployment mechanisms: regular (manual) deployment in Kubernetes clusters and the deployment automation introduced in the previous chapters. The performance testing is done using both Prometheus and Google Lighthouse.

Prometheus is a monitoring and alerting tool [30] that supports several metric types, including counters (such as the number of requests), gauges (such as the current memory usage, which can go up and down) and histograms (such as request durations) [31].
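As a minimal sketch of how such metrics can be queried through the Prometheus HTTP API (the /api/v1/query endpoint is standard; the metric names below are Prometheus's own self-monitoring metrics and are used only to illustrate the three types):

# Counter: per-second rate of HTTP requests handled over the last 5 minutes
curl -s -G 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=rate(prometheus_http_requests_total[5m])'

# Gauge: bytes of memory currently allocated by the process
curl -s -G 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=go_memstats_alloc_bytes'

# Histogram: 95th-percentile request duration derived from histogram buckets
curl -s -G 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=histogram_quantile(0.95, rate(prometheus_http_request_duration_seconds_bucket[5m]))'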

5.1 Performance Testing Clusters

The performance test was performed on the two clusters described in Chapter 4. Table 2 summarizes the differences between them. The main differences between the two clusters are the deployment automation capability and the traffic distribution, while the other fields (cluster size, CPU, RAM, and storage) are identical.

                       Cluster 1                             Cluster 2
Type                   On-Prem Cluster                       Public Cloud Cluster (Microsoft Azure)
Size                   4 Nodes (1 master, 4 worker nodes)    4 Nodes (1 master, 4 worker nodes)
CPU                    4 VCPU per node                       4 VCPU per node
RAM                    8 GB per node                         8 GB per node
Storage                80 GB                                 80 GB
Deployment Automation  No                                    Yes
Operating System       Ubuntu 16.04                          Ubuntu 16.04
Traffic Distribution   NodePort                              Load Balancer

Table 2: Clusters Configuration

5.2 Performance Testing Results

We performed 100 different deployments on each cluster with a varying number of automated clients logged in. The results of the deployment methods are presented using the following criteria:


5.2.1 Downtime During Deployments

This criterion shows whether the application achieves zero downtime while a deployment is ongoing. In other words, we answer the following question: which deployment methods can prevent or minimize service interruption? Figure 36 shows the downtime when deploying (installing a new release) in Cluster 1.

Figure 36: Service interruption when deployment ongoing (Cluster 1)

While performing the 100 deployments in both clusters, we monitored them using Prometheus. With pipelining and deployment automation enabled (Cluster 2), we had no service downtime (zero downtime guaranteed), even when a faulty deployment happened, because the pipeline includes fault checks and precondition validation. On the other hand, manual deployment (Cluster 1) showed 5% service interruption over 100 deployments, most likely caused by faulty deployments.
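The availability behind Figure 37 can be expressed as a Prometheus query over the up metric; this is a hedged sketch, and the job label chati is an assumption about how the scrape target was named.

# Percentage of time the CHATi target was up over the last hour
curl -s -G 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=avg_over_time(up{job="chati"}[1h]) * 100'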


Figure 37: CHATi up time percentage against ongoing deployments


5.2.2 CPU Usage

The CPU usage criterion compares the CPU consumption of the different deployment mechanisms and whether they affect the CPU performance of the nodes in both clusters. We used both Google Lighthouse and the Prometheus CPU gauge. We found that the deployment mechanisms have no direct effect on CPU consumption: the automated deployment pipeline adds only negligible CPU usage. As seen in Figure 38, the CPU usage was almost identical for both deployment mechanisms in both clusters.
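A sketch of the underlying cAdvisor query; the pod label selector is an assumption based on the Deployment name in Appendix B (older clusters may label it pod_name), and the result is scaled to ms/s to match the unit in Figure 38.

# CPU time consumed by the CHATi pods, in milliseconds of CPU per second
curl -s -G 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(rate(container_cpu_usage_seconds_total{pod=~"chati-deployment.*"}[5m])) * 1000'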

[Chart: average CPU usage in ms/s for Cluster 1 and Cluster 2 with 100 to 10000 users]

Figure 38: Average CPU usage when performing deployments (new releases)

5.2.3 Memory Usage

Memory usage is one of the important concerns with deployment automation. We show the memory used when deploying a new release of the CHATi application to determine the RAM usage in both clusters. Figure 39 shows that the pipeline in Cluster 2 consumed more memory than Cluster 1. It appears that the pipeline creates an in-memory backup prior to the deployment procedure. Furthermore, the pipeline engine (which is deployed in the same cluster as the application) has a considerable memory consumption.
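The corresponding memory query is sketched below under the same labelling assumption as the CPU query above.

# Memory used by the CHATi pods, converted from bytes to GB
curl -s -G 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(container_memory_usage_bytes{pod=~"chati-deployment.*"}) / 1024^3'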


[Chart: average memory usage in GB for Cluster 1 and Cluster 2 with 100 to 10000 users]

Figure 39: Average RAM usage when performing deployments (new releases)

5.2.4 Message Delivery and Request Interruption

Furthermore, we show whether the different deployment methods cause request interruptions for users. For example, in our CHATi application, if user X sends a message to user Y while a deployment is ongoing, does the message arrive correctly?

[Chart: percentage of correctly delivered messages for Cluster 1 and Cluster 2 with 100 to 1,000,000 sent messages]

Figure 40: Percentage of correctly delivered messages


Figure 40 shows the percentage of correctly delivered messages over the lifespan of the application in both clusters. We found that both clusters converge towards (almost) all messages being delivered correctly when the total number of sent messages is small (within a thousand messages), but the difference appears in the long run as the number of sent messages increases. We found a linear increase of faulty (undelivered) messages in Cluster 1 as the number of sent messages grew. We also found between 1 and 2% faulty messages in Cluster 2, but this appears to be caused by the Angular/Node.js framework.

5.2.5 Request Duration

The request duration metric shows whether the deployment mechanism changes the request duration while a deployment is ongoing.
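A sketch of a duration query of this kind, assuming the application (or its ingress) exposes a request-duration histogram under the conventional http_request_duration_seconds name:

# Average request duration in milliseconds over the last 5 minutes
curl -s -G 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=1000 * rate(http_request_duration_seconds_sum[5m]) / rate(http_request_duration_seconds_count[5m])'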

[Chart: request duration in ms for Cluster 1 and Cluster 2 over 20 consecutive deployments]

Figure 41: Average request duration when performing a deployment (new release)

As seen in Figure 41, requests in Cluster 1 take longer to be answered while a deployment is ongoing. The reason is the service interruption shown in 5.2.1. Cluster 2, on the other hand, processes requests normally even while the deployment of a new release is ongoing.


5.2.6 Zero Downtime (Distributed Spinnaker)

The final metric tests the solution while using deployment automation (in Cluster 2 only) with the Spinnaker service killed or suspended, which simulates the pipeline engine being down. This performance test checks whether the pipeline engine can be spawned again automatically and whether the deployment can go through without disruption of the service.

As seen in Figure 42, the deployment showed no downtime even though the pipeline instance was down. This is because the pipeline engine is replicated across the cluster and containerized in the cluster's pods.
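A sketch of how such a failure can be injected; the spinnaker namespace and the spin-orca label follow a typical distributed Halyard installation and are assumptions about the exact names in a given setup.

# Kill the orchestration engine (Orca) pod while a pipeline is executing
kubectl -n spinnaker delete pod -l app=spin-orca

# Kubernetes reschedules the pod; watch it come back while the pipeline continues
kubectl -n spinnaker get pods -l app=spin-orca -w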


Figure 42: CHATi up time percentage against ongoing deployments while pipeline engine is killed


Chapter 6

6 Discussions and Conclusions

In this thesis, we developed a regular chatting application and deployed it into the cloud after dockerizing it. This was done after introducing the IaaS, PaaS and SaaS cloud service models, together with edge cloud technologies such as OpenStack, container orchestration, Docker, pipelining and other technologies.

6.1 Discussion and Analysis

The main goal of the project is to abstract the deployment procedure away from developers and guarantee the validity of the deployment processes. As seen from the previous results, we achieved zero downtime with automatic deployment into a cluster simply by pushing new code into the repository (a git repository or a Docker image repository). Other factors can certainly play a critical role, such as the traffic distribution shown in 4.2.1.1.

The main concern we found with the automated deployment solution in edge cloud computing is memory usage. This problem becomes more and more noticeable when using the paid resources of public clouds such as Microsoft Azure, where companies have to pay for extra memory usage. Thus, there is a trade-off between efficiency and reliability on one side, and cost on the other.

In the same manner as the CHATi application was deployed, we can automate the deployment of other applications by creating the proper Dockerfile and deployment manifests as described in 4.2.1, and then creating the corresponding pipeline with its stages.

6.2 Future Work

Since the main concern with deployment automation in the cloud is memory consumption, we may introduce a dedicated edge cluster within the cloud that serves as the pipeline engine. This mechanism could mitigate the high cost of resource usage. For example, we could spin up an edge cluster using OpenStack, which is a point of interest for developers, to host the pipeline engine. It would be able to serve different applications.


Another improvement within the project could be made by running cluster federation. Cluster federation provides synchronization of cluster resources: when two clusters are federated, an application can be deployed in both of them, which ensures that the deployment exists in both clusters. This solution by itself does not provide deployment automation, but we can use federated clusters to install the pipeline engine, distributing its memory usage over several clusters and thereby achieving better resource allocation.


Appendix A Dockerfile

# stage 1
# this stage is for building the app
# it contains the tools to build and debug the app (ng commands)

# Get latest node framework (as a dependency)
FROM node:latest as node
# Set the working path
WORKDIR /app
# Add project files
COPY . .
# Install npm packages
RUN npm install
# Build sass (using sass engine in npm and node)
RUN npm rebuild node-sass
# Build CHATi project as production executable
RUN npm run build --prod

# stage 2
# this stage is for running the app
# it contains the tools to run app with best performance

# Get the cloud web server application
FROM nginx:alpine

# Copy the CHATi project into the web server application home
# directory
COPY --from=node /app/dist/CHATi /usr/share/nginx/html


Appendix B

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: chati-deployment    # name of microservice application
  labels:
    app: chati-app
spec:
  selector:
    matchLabels:
      app: chati-app
  replicas: 5    # define replication set with factor of 5 (run in 5 pods)
  template:
    metadata:
      labels:
        app: chati-app
    spec:
      containers:
      - name: chati-app
        image: index.docker.io/jschnitzer1/chati:v17
        ports:
        - containerPort: 80
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 15
          timeoutSeconds: 30
---
apiVersion: v1
kind: Service
metadata:
  name: chati-service
spec:
  selector:
    app: chati-app
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
    nodePort: 31002
  type: NodePort

deployment.lb.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: chati-deployment    # name of microservice application
  labels:
    app: chati-app
spec:
  selector:
    matchLabels:
      app: chati-app
  replicas: 5    # define replication set with factor of 5 (run in 5 pods)
  template:
    metadata:
      labels:
        app: chati-app
    spec:
      containers:
      - name: chati-app
        image: index.docker.io/jschnitzer1/chati:v17
        ports:
        - containerPort: 80
        livenessProbe:
          httpGet:
            path: /
            port: 80
          initialDelaySeconds: 15
          timeoutSeconds: 30
---
apiVersion: v1
kind: Service
metadata:
  name: chati-service
spec:
  selector:
    app: chati-app
  ports:
  - protocol: TCP
    port: 80
    targetPort: 80
    nodePort: 31002
  type: LoadBalancer


Appendix C

pipeline.json

{
  "expectedArtifacts": [
    {
      "defaultArtifact": {
        "customKind": true,
        "id": "06be22f8-5d6b-4229-9e60-0ee66c6de3a6"
      },
      "id": "ae3c6ab1-6cf8-4549-bad9-9256cfebc7b4",
      "matchArtifact": {
        "id": "96c027fb-e441-4a88-990b-424ccb8e2070",
        "name": "deployment.yaml",
        "type": "github/file"
      },
      "useDefaultArtifact": false,
      "usePriorArtifact": true
    }
  ],
  "keepWaitingPipelines": false,
  "lastModifiedBy": "jSchnitzer1",
  "limitConcurrent": true,
  "stages": [
    {
      "account": "my-k8s-v2-account",
      "cloudProvider": "kubernetes",
      "manifestArtifactAccount": "my-github-artifact-account",
      "manifestArtifactId": "ae3c6ab1-6cf8-4549-bad9-9256cfebc7b4",
      "moniker": {
        "app": "chatiapp"
      },
      "name": "Deploy (Manifest)",
      "refId": "1",
      "relationships": {
        "loadBalancers": [],
        "securityGroups": []
      },
      "requiredArtifactIds": [],
      "requisiteStageRefIds": [
        "3"
      ],
      "source": "artifact",
      "type": "deployManifest"
    },
    {
      "isNew": true,
      "name": "Check Preconditions",
      "preconditions": [
        {
          "cloudProvider": "kubernetes",
          "context": {
            "cluster": "khaled-kube1",
            "comparison": ">=",
            "credentials": "my-k8s-v2-account",
            "expected": 5,
            "regions": []
          },
          "failPipeline": true,
          "type": "clusterSize"
        }
      ],
      "refId": "2",
      "requisiteStageRefIds": [],
      "type": "checkPreconditions"
    },
    {
      "isNew": true,
      "name": "Wait",
      "refId": "3",
      "requisiteStageRefIds": [
        "2"
      ],
      "type": "wait",
      "waitTime": 15
    },
    {
      "application": "chatiapp",
      "completeOtherBranchesThenFail": false,
      "continuePipeline": false,
      "failPipeline": false,
      "isNew": true,
      "name": "Pipeline",
      "pipeline": "validateJob",
      "refId": "4",
      "requisiteStageRefIds": [
        "1"
      ],
      "type": "pipeline",
      "waitForCompletion": true
    },
    {
      "account": "my-k8s-v2-account",
      "cloudProvider": "kubernetes",
      "completeOtherBranchesThenFail": false,
      "continuePipeline": false,
      "failPipeline": false,
      "isNew": true,
      "location": "default",
      "manifestName": "deployment chati-deployment",
      "mode": "static",
      "name": "Undo Rollout (Manifest)",
      "notifications": [
        {
          "address": "[email protected]",
          "level": "stage",
          "type": "bearychat",
          "when": [
            "stage.complete",
            "stage.failed"
          ]
        }
      ],
      "numRevisionsBack": 1,
      "refId": "6",
      "requisiteStageRefIds": [
        "1"
      ],
      "sendNotifications": true,
      "type": "undoRolloutManifest"
    }
  ],
  "triggers": [
    {
      "enabled": true,
      "expectedArtifactIds": [
        "ae3c6ab1-6cf8-4549-bad9-9256cfebc7b4"
      ],
      "project": "jSchnitzer1",
      "secret": "secret",
      "slug": "CHATi",
      "source": "github",
      "type": "git"
    }
  ],
  "updateTs": "1556218320000"
}



References

[1] Openstack “Open source software clouds” [Online] Available: http://www.openstack.org/ [Accessed: 2019-02-17]

[2] Kubernetes "Open source system" [Online] Available: https://kubernetes.io/docs/concepts/overview/what-is-kubernetes/ [Accessed: 2019-02-17]

[3] Hamid Reza Faragardi, “Ethical Considerations in Cloud Computing Systems”, School of Innovation, Design and Engineering, Mälardalen University, 721 23 Västerås, Sweden;

[4] Summary Findings “Cloud Computing and Sustainability: The Environmental Benefits of Moving to the Cloud” Accenture 2010 WSP Environment & Energy.

[5] C. R. KOTHARI “Research Methodology Method and Techniques”, 2nd ed, Jaipur, India, New Age International (P) Limited Publisher, 1990, ch 1, 6

[6] P. Mell and T. Grance, “The NIST definition of Cloud Computing,” National Institute of Standards and Technology, Special Publication 800-145

[7] Data Flair "Features of Cloud Computing – 10 Major Characteristics of Cloud Computing" [Online] Available: https://data-flair.training/blogs/features-of-cloud-computing/ [Accessed: 2019-05-14]

[8] MIT Technology Review “Who Coined Cloud Computing?” [Online] Available: https://www.technologyreview.com/s/425970/who-coined-cloud-computing/ [Accessed: 2019-05-14]

[9] Rajesh Hegde “Cloud Desk – Cloud Operating System” Information Science and Eng. Department, Sri Siddhartha Institute of Technology

[10] R. Moreno-Vozmediano, R. S. Montero, and I. M. Llorente “IaaS Cloud Architecture: From Virtualized Datacenters to Federated Cloud Infrastructures” Complutense University of Madrid, pp 66-69

[11] Web Archive "Interview With Brian Sullivan – Inventor of Open Platform As A Service," [Online] Available: https://web.archive.org/web/20180725030017/http://www.sullivansoftwaresystems.com/interview.htm [Accessed: 2019-05-20]

[12] DZone “Essential Characteristics of PaaS” [Online] Available: https://dzone.com/articles/essential-characteristics-paas [Accessed: 2019-05-20]


[13] S. Awasthi, A. Pathak, L. Kapoor “Openstack- Paradigm Shift to Open Source Cloud Computing & Its Integration” Cross Functional Services, HCL Technologies

[14] C. Pahl, B. Lee "Containers and Clusters for Edge Cloud Architectures – a Technology Review" Irish Centre for Cloud Computing and Commerce IC4 & Lero, the Irish Software Research Centre, Dublin City University & Athlone Institute of Technology

[15] Kubernetes “Kubernetes Components,” [Online] Available: https://kubernetes.io/docs/concepts/overview/components [Accessed: 2019-05-25]

[16] Medium “Kubernetes Architecture” [Online] Available: https://medium.com/@chkrishna/kubernetes-architecture-f7ca63fff46e [Accessed: 2019-05-25]

[17] LearnItGuide "What is Kubernetes - Learn Kubernetes from Basics" [Online] Available: https://www.learnitguide.net/2018/08/what-is-kubernetes-learn-kubernetes.html [Accessed: 2019-05-26]

[18] CloudAcademy “Netflix Spinnaker: multi-cloud continuous integration tool” [Online] Available: https://cloudacademy.com/blog/netflix-spinnaker/ [Accessed: 2019-06-01]

[19] M. Orzechowski, B. Balis, K. Pawlik, M. Pawlik, M. Malawski "Transparent deployment of scientific workflows across clouds – Kubernetes approach," AGH University of Science and Technology, Krakow, Poland

[20] C. J. Van Wyngaard, J. H. C. Pretorius, L. Pretorius “Theory of the Triple Constraint – a Conceptual Review,” Universities of Johannesburg and Pretoria South Africa

[21] J. Berg and T. Grüne-Yanoff (2018). Scientific Inference [PowerPoint Presentation] Retrieved from “Theory and Methodology of Science with Applications” course https://www.kth.se/social/course/AK2036/

[22] Google Lighthouse “Lighthouse” [Online] Available: https://developers.google.com/web/tools/lighthouse/ [Accessed: 2019-06-20]

[23] Kolla-Ansible "OpenStack Deployment using Kolla-Ansible – OpenStack Documentation" [Online] Available: https://docs.openstack.org/kolla-ansible/queens/user/quickstart.html [Accessed: 2019-06-20]

[24] Kubespray “Deploy a Production Ready Kubernetes Cluster” [Online] Available: https://github.com/kubernetes-sigs/kubespray [Accessed: 2019-02-20]


[25] Microsoft Azure "How to deploy AKS cluster" [Online] Available: https://docs.microsoft.com/en-us/azure/aks/kubernetes-walkthrough [Accessed: 2019-04-10]

[26] Spinnaker “Install and Configure Spinnaker” [Online] Available: https://www.spinnaker.io/setup/install/ [Accessed: 2019-06-10]

[27] Helm “Role-based Access Control” [Online] Available: https://github.com/helm/helm/blob/master/docs/rbac.md [Accessed: 2019-06-11]

[28] Github “Spinnaker helm chart” [Online] Available: https://github.com/helm/charts/tree/master/stable/spinnaker [Accessed: 2019-07-15]

[29] “Docker” Dockerfile reference [Online] Available: https://docs.docker.com/engine/reference/builder/ [Accessed: 2019-07-16]

[30] “Prometheus” Prometheus Overview [Online] Available: https://prometheus.io/docs/introduction/overview/ [Accessed: 2019-12-15]

[31] “Prometheus” Prometheus Metric Types [Online] Available: https://prometheus.io/docs/concepts/metric_types/ [Accessed: 2019-12-15]



TRITA-EECS-EX-2020:81