Teaching Deployment

HANNAH FARJAMI SIMON AGARTZ NILBRINK

KTH ROYAL INSTITUTE OF TECHNOLOGY INFORMATION AND COMMUNICATION TECHNOLOGY

Abstract

In today’s IT-landscape, the cloud is one of the hottest topics. There are many emerging uses and technologies for the cloud, and deployment of applications is one of its main uses today. This has led to companies giving developers more responsibility for deployment. Therefore, there is a need to update computer science curricula by including cloud deployment. For these reasons, this thesis attempts to give a reasonable proposal for how cloud deployment could be taught in a university course.

A literature study was conducted to gather information about topics surrounding cloud deployment, such as cloud computing, service models, building techniques, and cloud services. Then a case study was conducted on three different cloud services: OpenShift, Cloud Foundry, and Heroku. This was done to learn how to deploy. Lastly, two interviews and a survey were conducted with people who have insight into the subject and could provide reasonable information.

Based on our case study, interviews, and survey, we arrived at a reasonable approach to how deployment with cloud services could be taught: with a theoretical and a practical part. The theoretical part could be a lecture introducing Heroku and OpenShift, followed by an assignment where students deploy an application to them. The reasons we recommend Heroku and OpenShift are Heroku’s simple and fast deployment and OpenShift’s more educative nature.

We also realized that cloud deployment would work best as a stand-alone course, because during the degree project it became clear how broad cloud deployment is.

Keywords: cloud computing, Platform-as-a-Service, cloud deployment, cloud services, Heroku, Cloud Foundry, OpenShift, education.

Abstract

I dagens IT-miljö är molnet ett av de hetaste ämnena. Det finns många nya användningsområden och teknologier för molnet. Driftsättning av applikationer är ett av de viktigaste användningsområdena av molnet idag. Detta har lett till att företag ger utvecklare mer ansvar vid driftsättning. Därför är det nödvändigt att förändra utbildningar i datorvetenskap genom att inkludera driftsättning i molnmiljö. Av dessa skäl försöker denna avhandling ge ett rimligt förslag på hur driftsättning i molnmiljö kan läras ut på ett universitet.

En litteraturstudie genomfördes för att samla information om ämnen som berör driftsättning i molnmiljö. Dessa var ämnen som molnet, servicemodeller, byggtekniker och molntjänster. Sedan genomfördes en fallstudie på tre olika molntjänster, OpenShift, Cloud Foundry och Heroku. Detta var för att lära sig hur man driftsätter. Slutligen genomfördes två intervjuer och en undersökning med personer som har insikt i ämnet och som kan ge rimlig information.

Baserat på vår fallstudie, intervjuer och undersökning drog vi en slutsats för ett rimligt tillvägagångssätt för hur driftsättning i molnmiljö kunde läras ut. Det kan undervisas med en teoretisk och praktisk del. Den teoretiska delen kan vara en föreläsning som introducerar Heroku och OpenShift, följt av en uppgift där studenter driftsätter en applikation till dem. Anledningarna till att vi rekommenderar Heroku och OpenShift är för Heroku’s enkla och snabba driftsättning och OpenShift för att den är mycket mer lärorik.

Vi insåg också att driftsättning i molnmiljö skulle fungera bäst som en fristående kurs. Eftersom det under examensprojektet blev klart hur brett driftsättning i molnmiljö är.

Nyckelord: Molntjänster, driftsättning, molnet, plattform-som-en-tjänst, Heroku, Cloud Foundry, OpenShift, utbildning.

Acknowledgments

Special thanks to Leif Lindbäck and Fadil Galjic at KTH that gave us valuable insight and guidance when writing this thesis.

Also, a big thanks to those who participated in our interviews and survey.

Table of Contents

1 Introduction
   1.1 Background
   1.2 Problem
   1.3 Purpose
   1.4 Goal
   1.5 Benefits, Ethics and Sustainability
   1.6 Methodology / Methods
   1.7 Delimitations
   1.8 Outline
2 Cloud Technologies
   2.1 Cloud Computing – ”The Cloud”
      2.1.1 Definition
      2.1.2 Architecture
      2.1.3 Service models
   2.2 Building Techniques
      2.2.1 Virtualization
      2.2.2 Containerization
      2.2.3 Buildpacks
      2.2.4 Virtual machines vs. Containers
      2.2.5 Containers vs. Buildpacks
      2.2.6 Orchestration
   2.3 Cloud services
      2.3.1 Heroku
      2.3.2 Cloud Foundry
      2.3.3 OpenShift
   2.4 Related Works
3 Methods and Methodologies
   3.1 Research strategy
      3.1.1 Quantitative and qualitative methods
      3.1.2 Inductive, deductive and abductive methods
      3.1.3 Case Study
      3.1.4 Interview
      3.1.5 Survey
      3.1.6 Adopted methods
   3.2 Research process
      3.2.1 Research process model
   3.3 Data collection
      3.3.1 Literature Study
      3.3.2 Case Study Observations
      3.3.3 Interview
      3.3.4 Survey
   3.4 Evaluating differences between cloud services
   3.5 Tools, documentation, and modelling
      3.5.1 Tools
      3.5.2 Documentation
      3.5.3 Modelling
4 Case Study of different Cloud Services
   4.1 OpenShift
      4.1.1 Setting up MiniShift
      4.1.2 Creating Java image
      4.1.3 Deploying
      4.1.4 Problems and Solutions
   4.2 Cloud Foundry
      4.2.1 Setting up Cloud Foundry CLI and PCF Dev
      4.2.2 Creating Java Docker file (and pushing it)
      4.2.3 Deploying
      4.2.4 Problems and Solutions
   4.3 Heroku
      4.3.1 Setting up Heroku CLI
      4.3.2 Deploying
      4.3.3 Problems and Solutions
5 Interviews and Survey
   5.1 Interview with an employee at a big company
   5.2 Interview with a teacher at KTH
   5.3 Student Survey
6 Result
   6.1 Evaluation of the Case Study
      6.1.1 OpenShift
      6.1.2 Cloud Foundry
      6.1.3 Heroku
      6.1.4 Analysis
   6.2 Evaluation of Interviews and the Survey
      6.2.1 Interview with an employee at a big company
      6.2.2 Interview with Leif at KTH
      6.2.3 Survey
      6.2.4 Analysis
   6.3 Answering the research question
7 Discussion
   7.1 Method
      7.1.1 Literature Study
      7.1.2 Case Study
      7.1.3 Interviews and a Survey
      7.1.4 Validity
      7.1.5 Reliability
   7.2 Result
   7.3 Future Work
References


1 Introduction

In today’s IT-landscape, cloud computing is one of the hottest topics, although it has been around for a while. As early as the 1960s the concept of cloud computing was discussed, but without the name. The service that really started the cloud computing era was Amazon Web Services in 2006. In a survey conducted by International Data Group (IDG) in 2018[1], in which 550 IT decision-makers were asked, it was concluded that 73% of organizations have at least one application in the cloud and that a further 17% would have one within 12 months. Besides this, 38% feel pressured to adapt their applications or infrastructure to the cloud. This means that engineers and developers will likely need knowledge about cloud technologies, and therefore there must be some focus on the cloud at universities.

1.1 Background
There are multiple cloud technologies to choose from when deciding to deploy to the cloud, and choosing which ones to use can be a daunting task. Some of the most common cloud services are Heroku, which is a web-based technology, and OpenShift and Cloud Foundry, which are two open source technologies. They are all what is called Platform-as-a-Service (PaaS), which helps developers and organizations quickly deploy cloud applications. Then there is the choice of how to package the application before deploying it. This choice is often between buildpacks and containerization. Buildpacks are a way of building an application together with its dependencies and other libraries, while containerization does the same but also controls the runtime of the application. The choice between buildpacks and containerization can sometimes be easy, because some cloud technologies are bound to one of them. Choosing the right cloud service and technology for a certain job requires knowledge and experience, which is not included in many computer science curricula. In the course Design of Global Applications IV1201 at KTH Royal Institute of Technology, the students build an application that must be deployed. However, the deployment is not a substantial part of the course. There is a possibility to extend the course to give the students deeper knowledge about cloud deployment.

1.2 Problem
As stated in the introduction, 73% of organizations have one or more applications hosted in the cloud. The problem is that the cloud is not only a broad term, it is also a broad field, with many different options even when only considering deployment. This can be a problem when trying to learn about deploying applications and/or deploying a real application. Companies today depend on new engineers having at least basic knowledge of how to deploy an application. Therefore, it is important that students at universities are given a basic introduction to different cloud technologies and learn how to apply them. Hence, this thesis tries to answer the following question:

RQ1 How can deployment to cloud services be taught in a university course?

1.3 Purpose
The purpose of the thesis is to evaluate the possibilities of creating course material about cloud deployment. This thesis will not only help developers and engineers trying to deploy an application, but will also provide useful knowledge for students doing research on cloud technologies.

1.4 Goal
Our project has two goals. The first is to evaluate whether knowledge of cloud deployment is useful and relevant for the course IV1201. The second is to create a basis for how cloud deployment could be integrated into that course.

1.5 Benefits, Ethics and Sustainability
If the project turns out to be relevant for the course IV1201, it will benefit the students in that course when learning about deployment, and the teacher of that course when integrating the material into the course. Other teachers might also find it useful in their courses. It will also benefit researchers who are looking for practical studies regarding deployment. In the project, a few ethical questions can arise. One could argue that open source vs. proprietary software is one such discussion. Also, if one chooses a big cloud service provider such as Amazon, the amount of data it gathers about the application or its users could be an issue. We do not consider these to be topics for this university course and they will therefore not be discussed further. Our project is a research study and will not have any environmental impact.

1.6 Methodology / Methods
First, we will conduct a theoretical study on the topic, where we will look at different technologies. In a case study, we want to gather information on our chosen technologies, such as complexity, time consumption, and problems and their solutions. We also want to see what other functionality they offer. In the case study, we are going to deploy an application written in Java with Spring Boot that uses a database for storing user information. Going further, we will do interviews and a survey to gain insight from a teacher, students, and a company. In the end, we will evaluate all data from the case study, interviews, and survey to form an answer to our research question.


1.7 Delimitations
A deeper analysis regarding scalability and cost of deployed applications will not be conducted in this thesis. Measuring and evaluating scalability and cost requires a lot of traffic to reach a reasonable conclusion, which would require resources we cannot provide. There will be a limited selection of cloud technologies that we are able to test, because there are too many options for cloud deployment; trying them all would be impossible in our limited time. Not all functionality of the different technologies will be tested, due to our limited resources. We will not create a learning module for the course, but instead indicate how it could be designed.

1.8 Outline
Chapter 2 provides a theoretical background for our thesis. This includes cloud computing, building techniques, cloud services, and related work.

Chapter 3 describes the methods and methodologies used in our project. This is where we describe different research methods and which ones are adopted in this report, along with a description of our research process during the degree project.

Chapter 4 presents the case study of our project, in which we deploy an application to different cloud services.

Chapter 5 presents the outcomes of our interviews and survey.

Chapter 6 evaluates and analyses the outcomes from chapters 4 and 5, and answers the research question.

Chapter 7 discusses our project methods, results, and future work.



2 Cloud Technologies

This chapter presents the theoretical background of our degree project. The theme is cloud technologies, starting with an explanation of “The Cloud” and ending with a deeper overview of three popular deployment services.

2.1 Cloud Computing – ”The Cloud”
It is crucial to understand the definition of cloud computing when trying to comprehend the material presented in this chapter. The building techniques subchapter will focus on different techniques for building applications that cloud services often provide. The cloud services subchapter will mostly focus on cloud deployment services, which are a big and challenging part of cloud computing. Cloud deployment services enable applications to be deployed in cloud environments. Furthermore, there are many different services on the market, which makes it hard to know which deployment service to choose.

2.1.1 Definition
Cloud computing means remotely managing and configuring computing resources, such as data storage and computing power, to serve applications[2]. Often it means big data centers with a lot of hardware resources that can be offered to a user over the Internet.

2.1.2 Architecture
According to the definition of cloud computing specified by NIST (National Institute of Standards and Technology)[3], the architecture of cloud computing can be viewed as two layers. The first layer is the physical layer, which comprises the hardware resources required to support the cloud services: storage, network components, and servers. The second layer is the abstraction layer, which comprises the software deployed over the physical layer.

2.1.3 Service models
There are different types of service delivery models in cloud computing. The three main types are Infrastructure as a Service, Platform as a Service, and Software as a Service, see figure 1. The biggest difference between them is what they provide from the cloud for you. These services are explained in more detail below.


Figure 1. The cloud computing service models in a three-layer architecture.

Infrastructure as a Service (IaaS) is a service that offers computing resources in a virtual environment. The hardware resources involve servers, networking, data storage, and virtualization, and are generally used by IT administrators. A traditional data center provides these hardware resources too, but an IaaS maintains them for you. Amazon EC2, IBM SmartCloud Enterprise, and Rackspace Open Cloud are some existing providers of IaaS.

Platform as a Service (PaaS) is a service that offers management of data and application resources. You do not need to think about the operating system, software updates, or the amount of storage the hardware resources at the data centers have. All this is managed by the PaaS. Furthermore, PaaS products and services are commonly used by developers to build, compile, and run their applications in a cloud environment. AWS Elastic Beanstalk, Heroku, and Cloud Foundry are some services and providers of PaaS.

Software as a Service (SaaS) is a service that offers on-demand, pay-per-use application software to users. Google is a prominent SaaS provider that offers products and services to end customers; Gmail and Google Docs are some examples of SaaS services provided by Google. Traditionally licensed programs are platform dependent because they require the software to be installed locally on the machine, whereas SaaS products are accessible via a web browser or a lightweight client application and are thus platform independent.

Anything as a Service (XaaS) is a general term for a cloud service comprised of several services related to cloud computing. It is mostly a combination of different services modified to suit particular customers. Communications as a Service (CaaS) and Monitoring as a Service (MaaS) are some services categorized under XaaS.


2.2 Building Techniques
When deploying to the cloud, or any platform, we need to package our application in such a way that it is secure, easy to deploy, fast, and scalable. Here, there are three important concepts to clarify: virtualization, containerization, and buildpacks. Choosing between them can come down to preference, their respective pros and cons, or simply which cloud technology is going to be used for the PaaS, because different PaaS support different packaging technologies. When developing an application, it must be able to run on different machines. This could be multiple developers’ machines, a testing server, and production, and in the worst case every single one of these could run on a different platform. That would make a hassle of installing the correct dependencies and reconfiguring each machine to install all the right components. This is why building techniques are important. We are going to first list different building techniques, then compare the ones that solve similar problems.

2.2.1 Virtualization
The first solution could be virtualization. This means that we create a so-called image that contains the application and, in addition, a full (even though it might be lightweight) operating system. This then runs on another operating system through something called a hypervisor. A hypervisor[4] is software that presents something like an API for the underlying hardware, making it possible for one type of operating system, e.g. Linux, to run on another type of operating system, e.g. Windows. This means that we can package our application together with its dependencies and libraries in a virtual machine and then run it on any type of machine. We can also scale the application by running multiple instances of the same virtual machine. Another advantage is the possibility to run different applications independently of each other on the same hardware.

2.2.2 Containerization
Containerization is the idea of bundling an application together with its configuration files, libraries, and dependencies in what is called an image. When the image is running, it is called a container. The container runs on a container engine, similar to how a virtual machine runs on a hypervisor. The difference from a virtual machine is that each container does not need its own operating system; instead, containers run on the host’s operating system[5] through the container engine. As with virtual machines, we can scale our application by running more instances of the same container and run different applications independently of each other on the same hardware. Throughout this text, we will sometimes use Docker as a synonym for containers. This is because Docker is the leading platform for containerization, and most platforms support Docker images as a deployment option.
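To make this concrete, the file that defines a Docker image is called a Dockerfile. Below is a minimal sketch for a Java web application; the base image, jar path, and port are illustrative assumptions, not taken from our case-study application:

    # Minimal Dockerfile sketch for a Java web application (paths hypothetical)
    FROM openjdk:8-jre-alpine          # base image that provides a Java runtime
    COPY target/app.jar /app.jar       # copy the built application and its bundled dependencies
    EXPOSE 8080                        # document the port the web server listens on
    CMD ["java", "-jar", "/app.jar"]   # command executed when a container starts from this image

Running docker build -t app . produces the image, and docker run -p 8080:8080 app starts a container from it.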

2.2.3 Buildpacks
Buildpacks cover the first part of containerization, in which dependencies are downloaded and generated assets or compiled code are output. In that sense, buildpacks can be compared to containerization, but what happens after the buildpack is done is up to the implementation of the application platform. The deployment can be thought of as these steps:

• First, develop the application
• Then take a buildpack that someone else has written for the language the application is developed in, and compile and/or generate assets
• Then run the result on the platform the buildpack was written for

This means that developers only need to focus on building the actual application and can then use resources that other people have created and that are properly tested by users. A buildpack consists of multiple scripts that run the application through the build process, often the following:

• Detect, which determines if the buildpack should be applied to the application
• Supply, which makes sure that the application gets all the dependencies it needs
• Finalize, which prepares the application to be launched
• Release, which provides information about how the application should be run

All these scripts take the directory with the source code as an argument, and some of them need more directories to know where to put and find all the dependencies.
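As an illustration, a minimal detect script might look like the sketch below. It follows the common convention of passing the source directory as the first argument and signalling applicability through the exit code; it is not taken from any specific platform’s buildpacks:

    #!/usr/bin/env bash
    # bin/detect <build-dir> -- decides whether this buildpack applies.
    # Exit code 0 means "apply this buildpack", non-zero means "skip it".
    BUILD_DIR=$1
    if [ -f "$BUILD_DIR/pom.xml" ]; then
      echo "Java"   # the name reported for the detected application type
      exit 0
    fi
    exit 1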

2.2.4 Virtual machines vs. Containers
The similarities are many, since they derive from the same concept, but containers have some advantages. When we run an instance of a virtual machine, we need to run an operating system in addition to the application itself, creating a lot of overhead. When running an instance of a container, the overhead is much smaller because we only need to run the application. This is visualized in figure 2, where we can see that each container requires far fewer resources than a virtual machine. This is also a big advantage when we want to scale our application, because we can fit more containers than virtual machines on one host.


Figure 2. Virtualization (Left) vs. Containerization (Right)

2.2.5 Containers vs. Buildpacks
We also want to bring up the differences between containers and buildpacks, because they are the building techniques that most PaaS use. As presented, with buildpacks the developer only needs to provide the code; the rest is up to the platform. With containers, the developer needs to provide the code and write the file that creates the container image. So, the key difference is the separation of concerns: with buildpacks the developer only cares about the code, while containers require the developer to think about deployment. Buildpacks are, however, harder to write, because they consist of multiple scripts and the developer who writes them needs to know more about the underlying platform. But often there is no need to write our own buildpacks; PaaS that support buildpacks provide their own. Another big difference is that we do not always know how a buildpack is written, which could lead to different results between buildpacks, whereas with containers the developers produce the build file themselves.

2.2.6 Orchestration
Orchestration is the term for automating the management, coordination, and performance of computer systems and, especially in our case, containers. This includes things like health checks on containers and removing and starting containers depending on how many should be running at any given time. This can be a predefined number of containers, or automated depending on the workload of each container.

Kubernetes is one of the biggest open source orchestration systems.
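As a small illustration of what an orchestrator is told, a minimal Kubernetes Deployment could look like the sketch below. All names, the image, and the health-check endpoint are hypothetical:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: myapp
    spec:
      replicas: 3                  # Kubernetes keeps three containers running at all times
      selector:
        matchLabels:
          app: myapp
      template:
        metadata:
          labels:
            app: myapp
        spec:
          containers:
          - name: web
            image: myapp:1.0       # hypothetical container image
            ports:
            - containerPort: 8080
            livenessProbe:         # health check; failing containers are restarted
              httpGet:
                path: /health      # hypothetical health endpoint
                port: 8080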


2.3 Cloud services
We have chosen three different PaaS, Heroku, Cloud Foundry, and OpenShift, because we will later try them out in our case study. These three cloud services are relevant today and together they cover essential techniques like containers, buildpacks, and orchestration.

For these three PaaS we are going to explain their workings divided into the following:

• The Architecture describes how their system is layered, in an abstract way for easy understanding.
• The Deployment Options describes the different options a developer can utilize to deploy their application to the platform.
• The Runtime Containers describes how the platform uses containerization for running applications.
• The Network describes the networking of the platform.
• The Other Features describes additional features that they may have.

All three cloud services are container-based PaaS. Here, container-based refers to the runtime of the applications, not the building technique. They all offer scalability options in an easy manner, but as mentioned in chapter one, this is outside the scope of this thesis and will therefore not be considered in this chapter.

2.3.1 Heroku
Heroku is a container-based PaaS[6] that was founded in 2007 and was initially developed by James Lindenbaum, Adam Wiggins, and Orion Henry. Heroku is a cloud service for developers to deploy, manage, and scale their applications. Heroku is a polyglot platform, which means that the deployment procedure is the same across different programming languages. The only thing developers need to provide is the source code, which is submitted to Heroku. Heroku then provides the management of the infrastructure, hardware, and servers. These services are hosted on Amazon’s secure data centers[7]. Heroku provides both a web-based interface and a command line interface for developers to deploy and manage their applications.

Architecture
In figure 3, we can see the architecture of Heroku. From the top, we have HTTP routers that are responsible for receiving incoming requests for an application and routing them to the application’s web dynos, dynos being Heroku’s implementation of containers. In the middle, we have the dynos, which run the application, followed by the Dyno Manifold, now called the Dyno Manager[8], which manages dynos. At the left, we have the Control API, which lets developers manage and command the application through a set of control surface APIs. At the bottom, we have different components that a developer can add, covering many features that Heroku provides.

Figure 3. The Heroku architecture. (Appears here with permission of Sagupta1 under the Creative Commons Attribution-Share Alike 4.0 International license.)[9]

Deployment Options
Heroku provides two main options for deploying your application. One is through Git, which means that your application’s source code must be in a Git repository. The code is pushed to the Heroku platform, which uses buildpacks to compile it and create a slug. A slug is a compressed and pre-packaged copy of an application[10], comparable to a container image. Buildpacks can compile applications written in several languages, which makes them the backbone of Heroku’s polyglot platform. This option is the preferred one. The other option is to deploy with a Docker image. The Docker image is pushed to the Heroku platform, which then creates a dyno from it, skipping the slug process.
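To illustrate the Git option, a typical deployment from an existing Git repository looks roughly like this (the app name is hypothetical, and at the time of writing Heroku built whatever was pushed to its master branch):

    heroku login            # authenticate the Heroku CLI
    heroku create my-app    # create a new Heroku application (hypothetical name)
    git push heroku master  # push the source; Heroku runs the buildpack and creates a slug
    heroku open             # open the deployed application in a browser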

Runtime Containers
As previously mentioned, Heroku is container-based, and dynos are Heroku’s implementation of containers[11]. At the core, dynos are isolated, secured, virtualized Linux containers that contain the application’s code and dependencies. To run an application, a dyno needs an app slug or an image.

The dyno formation is created by the selection of dyno types and dyno configurations. Depending on which type they are, different characteristics and features are possible. The types are:


• Free, good for testing in a limited sandbox
• Hobby, good for smaller hobby applications
• Standard, good for professional applications with better scaling opportunities
• Performance, good for high traffic and/or heavier applications

Computing power, amount of memory, CPU share, and whether they have dedicated compute resources are some of the characteristics that vary between dyno types. The pricing depends on the type as well: dynos with more resources cost more. Mixing dyno types is only possible for standard and performance dynos.

Another vital part of dynos is the dyno configuration, where the process type is specified. Every dyno belongs to one of three configurations:

• Web dynos receive HTTP requests from the HTTP routing and mostly run web servers for an application.
• Worker dynos are used to execute application tasks, i.e. background jobs, timed jobs, and queuing systems.
• One-off dynos are temporary dynos that can be attached to the terminal. They are mostly used for administrative tasks, database migrations, and console sessions.
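The process types above are declared in a file called Procfile in the application’s root directory. A minimal sketch for a Java application could look like this (the jar paths are hypothetical):

    # web dynos run the web server and receive HTTP requests
    web: java -jar target/myapp.jar
    # worker dynos execute background jobs
    worker: java -jar target/worker.jar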

Dynos cannot manage themselves in a joint environment; they need something that manages them all together. This is where the dyno manager steps in[12]. Some important jobs the manager does are terminating dynos, restarting dynos, observing and detecting faults in the underlying hardware, and moving dynos to another physical location. One example where the dyno manager restarts all the application’s dynos is when the developer deploys a new version of the app.

Network
By default, a deployed application is available at its Heroku domain, which looks like this: [application name].herokuapp.com. Custom domains are supported on Heroku, although they require some configuration[13]. Whenever an HTTP request is made to a Heroku domain, the Heroku platform routes the request to the Heroku routers[14]. The routers determine the location of the application’s web dynos by looking at the path of the request. When the location is identified, the router forwards the request to one of these dynos. Heroku routers use a random selection algorithm for selecting which web dyno should service the request[14]. The router selects another web dyno to process the request if the first attempt to connect fails, i.e. if the connection times out or is refused.

Other Features
Heroku offers many features called add-ons, which are serviced either by Heroku or by third-party providers. Databases and database management are also services that Heroku provides, e.g. Heroku Postgres. Another add-on provided by Heroku is monitoring, which lets developers manage and view information about the application on Heroku through different tools, e.g. the Heroku Dashboard.

2.3.2 Cloud Foundry
The next PaaS that we will consider is Cloud Foundry. Cloud Foundry is an open source, container-based PaaS[15]. It was first released in 2011, and since then many implementations of Cloud Foundry have been released. Cloud Foundry is a cloud service for developers to deploy, test, and scale applications, similar to what Heroku and other PaaS offer, but with some differences. One major difference is that with Cloud Foundry, developers can run applications on their own computing infrastructure[16]; deploying on an IaaS, e.g. AWS, vSphere, or OpenStack, is also a possibility. Cloud Foundry is also available as a web service through e.g. Pivotal Web Services.

Architecture
Cloud Foundry is a distributed system composed of different components[17], see figure 4, that communicate with one another to perform the cloud service. Different implementations of Cloud Foundry, i.e. Pivotal Cloud Foundry, IBM Cloud Foundry, and SAP Cloud Foundry, have additional features and differing internals, although they are all based on these core parts:

• Routing
• Authentication
• App Lifecycle and System State
• App Storage & Execution
• Services
• Messaging
• Metrics & Logging

The components include:

• An application execution engine
• An automation engine for deploying applications and handling their lifecycles
• A scriptable command-line interface (CLI)


Figure 4. The Cloud Foundry architecture.

Deployment Options
Cloud Foundry provides two main options for deploying applications. One is through App Manifests, configuration files that allow application deployment to be automated. When using App Manifests, Cloud Foundry uses buildpacks to compile the application and create a droplet, comparable to a container image. This is the preferred option on Cloud Foundry. It is also possible to deploy an application directly with a Docker image, which must be pushed from a Docker registry to the Cloud Foundry Docker registry.
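As an illustration, a minimal App Manifest for a Java application could look like the sketch below; all names and values are hypothetical. Running cf push in the directory containing the manifest deploys the application according to it:

    # manifest.yml -- a minimal sketch
    applications:
    - name: myapp
      memory: 1G
      instances: 1
      buildpacks:
      - java_buildpack
      path: target/myapp.jar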

Runtime Containers
Diego and Garden are what is needed to execute applications and tasks on Cloud Foundry. Garden is an API for container creation and management, and Diego is the container runtime and orchestration system for the containers. For Diego to run an application, it needs a droplet or a Docker image.


The Cloud Controller (CC) and Diego reside in the App Lifecycle and System State component in figure 4. The CC exposes Cloud Foundry’s REST API (CAPI) and manages the application life cycle with help from Diego[18, pp. 33–35]. This means that when clients make HTTP requests to CAPI, they interact with the CC, which serves the requests. The requests are, for example, pushing, running, staging, updating, and retrieving applications, and they can be made through the scriptable CLI or through development tools integrated into some IDEs. For these requests, the CC tells Diego what to perform. The scheduling, orchestration, and placement of applications, and the running of containerized application processes, are done by Diego when the CC requests it. Diego monitors the applications’ states and can start and stop processes when needed.
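For example, typical CLI commands that translate into CAPI requests served by the CC include the following (the app name is hypothetical):

    cf push myapp         # upload, stage, and run the application
    cf app myapp          # retrieve the state of the application
    cf scale myapp -i 2   # ask Diego to run two instances
    cf logs myapp         # stream the application's logs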

The CC and Diego need access to files for the application, such as application code packages, resource files, buildpacks, droplets, and container images. These files are stored in the blob store seen in figure 4.

Network
The GoRouter is responsible for receiving incoming HTTP traffic from an external load balancer and routing it to either the Cloud Controller or an application. The external load balancer handles all HTTP traffic towards Cloud Foundry and passes it on to the GoRouter. The load balancer is not a Cloud Foundry component; it differs depending on the choice the developer makes about the underlying infrastructure[18, p. 31]. If the traffic is for an application, the GoRouter routes it to the hosted application running on a Diego cell[17]. A Diego cell is a virtual machine that runs applications, and Diego tells the GoRouter the location of the cells and containers that are currently running the application. Traffic for Cloud Foundry’s API routes to the Cloud Controller. Moreover, some calls go to the user account and authentication server (UAA), although they are initiated from the Cloud Controller.

The UAA is responsible for authenticating incoming traffic to Cloud Foundry. Authentication is performed for an application’s clients and end users, and for Cloud Foundry developers.

All the components in figure 4 communicate through the HTTP and HTTPS protocols, and with NATS, an open-source messaging system.

Other Features
Service Brokers reside in the services component seen in figure 4, which has the responsibility of providing the service instances that applications bind. A service can be a database or a service from a third-party SaaS provider.

The metrics collector gathers metrics and statistics from these components. The Loggregator (log aggregator) system in Cloud Foundry, seen in figure 4, streams application logs to developers.


2.3.3 OpenShift
OpenShift is the name of a family of containerization software developed by the company Red Hat. What we are going to look at is the OpenShift Container Platform. This is the PaaS software that companies can use to deploy their applications on their own computing infrastructure. It is also possible to use OpenShift on other IaaS providers. OpenShift is also available as an online service through OpenShift Online.

Architecture
The architecture of OpenShift can be seen in figure 5. The OpenShift architecture is a layered system[19] built to work together with Docker and Kubernetes to make it easy to deploy an application in the cloud. Docker is used for creating Linux-based container images and Kubernetes for orchestrating the containers.

Figure 5. The OpenShift architecture.


Deployment Options
OpenShift provides two main ways of deploying an application: either by using the Docker CLI and creating images as you would on any machine, or by using OpenShift’s Source-to-Image (S2I) tool.

The S2I tool helps developers build their source code and inject it into a builder image, which then produces a new Docker image that can be run as a container. This brings some advantages, as the developer does not need to write a Docker file. It can also be faster than building a completely new Docker image, because it can re-use already downloaded or built objects in the image. There are also advantages such as operational security, efficiency, and reproducibility[20]. The Docker images are pushed to OpenShift’s internal container registry.
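As an illustration, an S2I build and deployment can be started with a single command of the form builder-image~source-repository; the image stream and repository below are hypothetical:

    # build the source with the Java builder image and deploy the result
    oc new-app java:8~https://github.com/example/myapp.git --name=myapp
    # follow the S2I build that produces the new Docker image
    oc logs -f bc/myapp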

Runtime Containers
The smallest unit in OpenShift is the container, and this is where the actual application resides. OpenShift containers are based on Docker images. Each container runs in a pod. A pod can be viewed as one machine, with its own internal IP address and local storage. Each pod contains one or more containers, and each container can share the resources of the pod it is running on if the configuration specifies it.

Kubernetes is responsible for managing the containerized applications through deployment, maintenance, and scaling over several hosts. There are two different types of hosts: nodes and masters. Nodes are Red Hat Enterprise Linux Servers or Red Hat Enterprise Linux Atomic Hosts that provide runtimes for one or more pods. These nodes can be created from cloud providers or physical or virtual systems, and must be running the OpenShift software. Nodes are managed by the master host or hosts. Masters are hosts that contain all the components needed to control the nodes.

Network
Kubernetes is responsible for assigning each pod an IP address from an internal network so that pods can communicate with each other. This way, each pod looks like its own host, and all containers in a pod behave as if they were on the same host.

To expose a service to external clients, OpenShift uses routers and routes. This is the routing layer in figure 5. A router provides external hostname mapping and load balancing and is installed on nodes. Routers work over HTTP/HTTPS, so the node must have ports 80 and 443 available. On these routers, developers can configure routes, which then expose services to external clients.
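For example, a service can be exposed to external clients by creating a route for it (the service name is hypothetical):

    oc expose service myapp   # creates a route with a generated hostname
    oc get route myapp        # shows the external hostname mapped to the service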

Other Features
OpenShift provides different images that can be pulled from its registry for easy integration of other services, like databases. OpenShift also provides an interface to view metrics for containers and other components.


2.4 Related Works
The thesis Improving Software Development and Maintenance[21] presents a comparison between containers and virtual machines. It is somewhat related to our work because it covers cloud deployment and containerization, which are topics we also discuss. But that is all we have in common: our report is oriented towards deploying applications with containerization and has barely anything to do with virtual machines.


3 Methods and Methodologies

This chapter will present an overview of research methods and methodologies and how these methods are applied in this thesis. The first part gives a theoretical description of different methods and methodologies and ends with an explanation of the methods that are applied in this thesis. The next part shows our research process, where we describe all the steps that have been taken when working with this degree project.

3.1 Research strategy
This part gives a brief description of several research methods and explains which ones are adopted in this report. A research strategy helps us find the methods that best suit our project, which will then lead us towards the goals of our thesis.

3.1.1 Quantitative and qualitative methods
Quantitative and qualitative methods are two types of research methods that fundamentally differ depending on whether the data is numerical or not. Which one to choose does not depend on which one is better than the other; instead, it is about choosing the method that suits your study and is able to answer the research question. Quantitative research methods[22] deal with measurable, numerical data. The quantitative research design is determined before the research begins and aims for an objective approach. The research data can be gathered with pre-structured sampling methods, e.g. surveys, polls, and questionnaires, and can be observed, compared, and concluded through mathematical, statistical, or computational techniques. Qualitative research methods[23] deal with non-numeric data, e.g. textual data, and perform process-based analysis rather than the numerically based analysis of quantitative research. The qualitative research design is more tolerant of adjustments and can evolve during the research. The research data can be gathered with different sampling methods, e.g. unstructured interviews, focus groups, and case studies, and can then be analyzed, interpreted, and concluded by identifying patterns and drawing inferences about the findings.

3.1.2 Inductive, deductive and abductive methods
Three research approaches that can be used are the inductive, deductive, and abductive[24] methods. The inductive approach is when theories and propositions are formulated from observations, tests, and patterns, often through qualitative methods. Data is collected and used to create an empirical generalization and identify relations between observations. It is important that sufficient data is collected so that the theories can be formulated through explanations of observed phenomena. The inductive approach is often called a bottom-up approach, because the researcher starts without any theories or assumptions and lets them form over the course of the research. The deductive approach is the opposite: it starts with a theory or hypothesis that the research should try to verify or falsify. It is almost always tested through quantitative methods using large data sets. The hypothesis or theory must be expressed through measurable terms, an explanation of how these terms should be measured, and a declaration of the expected outcome. This way the hypothesis or theory can be declared either verified or falsified. The abductive approach is a mix of the inductive and deductive approaches, and a way to minimize the weaknesses that they have[25]. In the abductive approach, the starting point is to explain phenomena that existing theories cannot explain. The best explanation for the relevant evidence is chosen, and preconditions are used to explain conclusions.

3.1.3 Case Study
A case study is a research method that often targets a specific subject up close and allows the researcher to conduct a detailed examination of it[26, pp. 1–2]. The study can involve both qualitative and quantitative methods and is therefore suitable for exploring, investigating, and experimenting with the subject. Establishing a generalized and reliable conclusion from a case study is often stated to be challenging, especially if the design of the case study has not been carefully considered.

3.1.4 Interview
Generally, an interview can be considered either a quantitative or a qualitative research method, depending on whether the interview is structured or not[27]. A structured interview is a quantitative method because the interview questions and the choice of answers are predetermined. It is common that the questions are asked in a standardized order and that the interviewer does not diverge from the interview schedule or ask follow-up questions. An interview in qualitative research is more flexible and can proceed in many ways; it often resembles a conversation. The data collected from such an interview can be time-consuming to go through, which is a deciding factor in whether it suits the study.

3.1.5 Survey
A survey is a research method for gaining information from a group of pre-defined people by asking them questions. Surveys take different forms and can serve many purposes. The most common form is a questionnaire conducted through email, SMS, or a web site. Common question types are multiple choice questions and rating scale questions. The group chosen for answering the survey are people who have insight into the subject and can therefore provide reasonable information.


3.1.6 Adopted methods
We will do a case study, two interviews, and a survey to gather the data and information needed to answer the research question. A qualitative research approach suits our research better, because the data collected from the interviews, survey, and case study will be textual. If this research were to study the scalability and cost of deployed applications in cloud services, we would have chosen a quantitative approach instead, as the gathered data would have been well suited for statistical analysis; but since those topics lie outside the delimitations of this thesis, the qualitative approach is applied. Furthermore, an inductive method works best for us, because we will formulate our theory after we have gathered and analyzed all the material. A deductive method would not suit us because we do not have a hypothesis at the outset to base our work on.

3.2 Research process
The research process of our thesis is explained in detail in this section, together with what we have done during this degree project and how. When forming our own process, we took inspiration from other theses and from a book that suggests a research process[28, pp. 26–29].

3.2.1 Research process model
For this research, we have created a model that represents our research process, see figure 6. This model gives a general picture of all the steps taken to carry out this research. There are six steps, starting with defining the problem and working downwards towards the conclusion of the research, which is the last step. Some steps were revisited during the research, because the study naturally guided us to rethink some decisions, which is normal in a qualitative study[28]. Further down in this section, we describe the details of our research process.


Figure 6. Model of our research process.

Problem Statement
The first step was selecting a topic that we were interested in, and thereafter researching information about it to narrow it down to something more concrete. Discussions led us to formulate a preliminary research question for our problem statement. Having a research question defined was necessary in order to proceed with the study. We then continued with a literature study, where we gathered knowledge about the field. The goal of the literature study was to search for information and research methods related to our study. The literature study consisted of reading papers on related work, documentation of cloud services, articles, and videos about the cloud.

Research Design
The second step was to elaborate a research design for the study by first reading about different research methods. After that, we had to analyze which methods would be most suitable for the study. To do this, we had to think about two things: our limited resources, and what we needed to produce to answer the research question. Further, we concluded that we needed to answer some sub-questions to be able to answer the research question. The sub-questions are as follows:

SRQ1 What activities could be used in the course?
SRQ2 Which cloud services should be used in the course?
SRQ3 How much time will a student need to deploy an application to different cloud services?
SRQ4 How can a guide be written for deploying an application to different cloud services?

We concluded that a case study, interviews, and a survey would be appropriate research methods. The aim of the case study was to answer sub-questions 2, 3, and 4 by observing how much time it took to deploy with these services and taking notes of all the steps required when deploying. The aim of the interviews was to get deeper knowledge and reasoning from the interviewees for answering sub-questions 1 and 2. The aim of the survey was to gather opinions for answering sub-question 1. The interviews were done with people with insight from industry and academia, and the survey was sent to students at a university.

Conduct Research
The third step was to conduct the research methods and collect data from the case study, interviews, and the survey.

The case study was to deploy an application with OpenShift, Cloud Foundry, and Heroku. Data was collected by performing the deployments and observing the time consumption. The case study is presented in chapter 4.

Then we moved on to the two interviews: one with an employee at a big company and the other with a teacher at KTH Royal Institute of Technology. For the first interview, we interviewed the employee via email, asking questions we had prepared. The goal of this interview was to gain knowledge about the deployment process at a big company. We also wanted to see which cloud services are relevant today. For the second interview, we interviewed the university teacher and asked the questions we had prepared, which were not the same questions as in the first interview. The goal of this interview was to get insight from a teacher’s perspective on how to teach the use of cloud technology in a university course. From this, we could observe which course activities are preferred based on previous experience.

Lastly, we conducted a survey that was distributed to students at KTH with knowledge of computer science. The survey included general questions about cloud technologies and about which methods are preferred when learning something new. The goal of the survey was to get insight from the students’ perspective on how they would like to be taught something new, for sub-question 1.

The interviews and the survey are described and summarized in chapter 5.


Evaluation
In this step, we analyzed all the findings that came from conducting our research, using qualitative data analysis, since that was our adopted approach. We evaluated by compiling and comparing the collected data together with our personal experience from completing this study. This step was essential in order to make a content analysis and to give structure to all the data coming from the study. The goal here was to be able to answer the sub-questions and give an answer to the research question in chapter 6.

Results
This step consisted of presenting the results of this thesis together with data that could validate our answers. The results were derived from a combination of the literature study, the conducted research with data from the case study, interviews, and survey, and our personal experiences from completing the research. This is also done in chapter 6.

Conclusion
In the last step, we drew conclusions about the research questions by clearly stating the answers to them. We also highlighted the strengths and weaknesses of our research. Lastly, we recommended future work on other technologies that did not make it into this research because of limited resources, but that could still be of interest to explore further. This is done in chapter 7.

3.3 Data collection
This subchapter presents the different methods we used to collect data.

3.3.1 Literature Study
When starting the project, we had to go through a lot of literature to understand all the services. We started by reading the official documentation for all the cloud services we chose. This laid a good foundation for understanding how they work and how they differ from each other. After this, we branched out and read up on other technologies that these services use and/or are based upon.

To be able to do the case study, we also had to read about the practical side of these services: things like guides and video demonstrations of setups and deployments.

When searching for solutions to problems, we searched forums like Stack Overflow and the official forums of the cloud services.

3.3.2 Case Study Observations
We concluded that we would do a case study to observe how easy or hard it is to deploy an application with different cloud services, in order to gain insight into what is fair to include in a course. We also wanted to try some of the services to see what they offer and how they differ. During the case study, we made observations according to the list described in the subchapter about metrics below.

3.3.3 Interview
From our literature study, we could see which cloud services are relevant today by reading about services that are currently recommended for cloud deployment. Therefore, we chose Heroku, Cloud Foundry, and OpenShift for the case study. However, we felt that it would be a good idea to interview an employee at a company and ask how they deploy, to see if our choice of cloud services was relevant. The interview with the teacher was then set up to give answers on the relevance of the services, and on what is good to cover in a possible solution for how to teach the use of cloud services. The results from the interviews were long and too comprehensive to give in their full form; thus, we have summarized them in chapter 5.

3.3.4 Survey
While designing our research process, described earlier in this chapter, we realized that it could be worth creating a survey for the students. Through it, we could hear how they would like to be taught and which activities they value in a course when learning something new. This information was interesting to compare with what the teacher had said about the same question. All of this could then be analyzed, along with the other information that emerged from the studies, to find a solution to the research question.

3.4 Evaluating differences between cloud services
In this subchapter, we describe the metrics we follow when comparing the cloud services with each other, in order to collect and evaluate data for our study. This is to decide which cloud service should be used in the education. The different things we will look at are:

• Time Consumption
• Difficulty
• Community
• System requirements
• Pros and Cons

Time Consumption
We will estimate how long setup and deployment take for each of the services we are going to use. This gives an idea of how much time students need to spend to get it to work, and can help a teacher know how much to expect from the students. Also, if one technology takes too much time, it might not be a good fit for a university course, because of the limited time a course has.


Difficulty
This is an open metric, but it is still important to describe how difficult we think the setup and deployment have been. This includes things like how much configuration needs to be done in comparison to the other technologies, how much needs to be done to the already developed application to get it to work, and whether it requires much pre-study before starting.

We will use the following scale to rate the difficulty:

• Easy
• Normal
• Hard

Community
When you come across problems that are hard to solve, it is always good to have a big community that can help point you in the right direction towards a solution. Teachers at universities often have small time slots to spare and maybe not enough knowledge of specific problems, which leads to frustration for students if the solution is nowhere to be found. By a big community, we mean many users who post questions and problems to forums like Stack Overflow or similar.

System requirements
This metric is important because not all students have access to machines with high specifications. Therefore, we want to examine what system requirements each cloud service has. This needs to be considered by teachers.

Pros and Cons
It is always good to state the pros and cons of each service, so that when deciding between the services, we can compare their pros and cons.

3.5 Tools, documentation, and modelling

This subchapter describes the tools, documentation, and modelling used in this project. All software and hardware resources used for this project are described in tools. The documentation part describes general information about this document, i.e. the template and reference management. Modelling describes in what form our results will be presented.

3.5.1 Tools

We have used two different machines when working on this degree project. One HP running Windows 10, with 8GB of RAM and a dual-core processor. This computer has been used for writing our code and documentation, and for deploying to Heroku and OpenShift. To deploy to Cloud Foundry, we needed a computer with higher system requirements. Therefore, we used an iMac running macOS with 12GB of RAM and a quad-core processor.

All figures and tables that have been created by us are drawn with Lucidchart, metachart.com, and draw.io, and all code is written with Visual Studio Code. For running command line interfaces, we used PowerShell on Windows and Terminal on macOS.

3.5.2 Documentation

The structure of this degree project is based upon templates given by our university, KTH. Some minor changes have been made to the layout to make it more suitable for our study. A shared repository on Google has been used to store and share documents and other artifacts written by us. Furthermore, a free and open-source reference management software called Zotero[29] has been used for this report, not only to manage references but also to store research materials in a shared online repository.

3.5.3 Modelling

The data and the results from our research methods will be presented in various modelling types. Results from the case study will be presented in a table. The results from the interviews will be presented as summarized text. Data gathered from the survey will be presented in pie charts.


4 Case Study of different Cloud Services

In this chapter, we are going to describe what we did to deploy an application to three different cloud services: OpenShift, Cloud Foundry, and Heroku. The application we used to test the deployments is one that we previously made as a course project. It is a simple Java Spring web app that runs a Tomcat web server and needs a running PostgreSQL database. We will call the application reqruit, since it is a recruitment application. It is used throughout this guide, so when following the guide, change those commands to your own application's name.

4.1 OpenShift

To test OpenShift we are going to use MiniShift. MiniShift is a tool that allows the developer to start an OpenShift cluster that comprises a single node and can be run on a local machine. This is a great way for a developer to test the OpenShift platform, and for us to carry out the research for our thesis.

4.1.1 Setting up MiniShift

To get started with OpenShift, some preparation must be done. Because MiniShift runs as a virtual machine, it needs a hypervisor matching the operating system of your host [30]. As we were doing our research on a Windows 10 machine, we used Hyper-V, the native hypervisor for Windows, and ran all the commands through PowerShell.
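A minimal sketch of this setup, assuming Hyper-V is already enabled and a virtual switch has been created in Hyper-V Manager (the switch name "External" is an example; the config keys follow MiniShift's persistent configuration):

PS> minishift config set vm-driver hyperv

PS> minishift config set hyperv-virtual-switch "External"

PS> minishift start

On macOS or Linux, another driver such as xhyve or kvm would be set instead.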

Once the MiniShift virtual machine is running, we want to export the environment variables for the OpenShift client and the Docker CLI, so that we can deploy the application as it would be done in real production. When running the commands in snippet 1, MiniShift will output the commands you need to run to set this up on your operating system.

$ minishift oc-env

$ minishift docker-env

Snippet 1. Exporting env for OpenShift client and Docker CLI.
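The printed export commands must then be applied to the current shell. A sketch of how this is typically done (the exact output depends on the host):

$ eval $(minishift oc-env)

$ eval $(minishift docker-env)

On PowerShell, the equivalent is piping the output to Invoke-Expression, e.g. & minishift oc-env | Invoke-Expression.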

When this is done, we can log in to the OpenShift client and Docker using the developer account that is the default when installing OpenShift.

$ oc login

$ docker login -u developer -p $(oc whoami -t) $(minishift registry)

Snippet 2. Command for logging into OpenShift client and Docker.

The second command looks a bit weird, but what we are doing is logging in to the OpenShift internal Docker registry, which runs as a Docker container inside the MiniShift VM. As we ran the command, we got an error saying it could not find the registry. What we had to do was to SSH into the


MiniShift VM, as in snippet 3, and then check the running containers to see if the container for the registry was running in the MiniShift VM. When we did that, we saw that it was not running, so we restarted the MiniShift VM and went through snippets 1-3 again, and this time the second command worked.

$ minishift ssh

Snippet 3. Command to SSH into the MiniShift VM.

4.1.2 Creating Java Docker image

To deploy to OpenShift, we first create a Docker image of our application. We start by running mvn clean install, which forces Maven to recompile all the code and package it into a JAR file; a minimal sketch of this build step is shown below.
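The build is run from the project root, and the JAR file name matches the one used in snippet 4:

$ mvn clean install

$ ls target/appserv-spring-2.0.jar

The built JAR is then placed next to the Dockerfile, since the ADD command in snippet 4 uses a relative path. With the JAR in place, we create the Dockerfile in snippet 4.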

FROM openjdk:8-jdk-alpine

VOLUME /tmp

EXPOSE 8080

ENV JAVA_APP_JAR appserv-spring-2.0.jar

ADD $JAVA_APP_JAR app.jar

ENTRYPOINT ["java","-Djava.security.egd=file:/dev/./urandom","-jar","/app.jar"]

Snippet 4. Dockerfile

The Dockerfile in snippet 4 is a standard Dockerfile. We need a FROM command to declare our base image, which in this case is the openjdk:8-jdk-alpine image, a lightweight Linux distro called Alpine that comes pre-installed with OpenJDK 8. The VOLUME command creates a mount point called /tmp where the container can put data. A good thing with the VOLUME command is that the mount point remains in existence when the container is killed, which is useful for logs that we want to check afterwards. We then use the EXPOSE command to make the application accessible at port 8080 when running the container. The ENV and ADD commands take our built JAR file and put it in the image as app.jar. Last, we create an ENTRYPOINT that states how the application should be executed from within the container.
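Optionally, the image can be smoke-tested on the local Docker daemon before involving OpenShift at all. A sketch; without a database the application will fail during startup, but it verifies that the image builds and the entrypoint starts the JVM:

$ docker build -t reqruit -f ./Dockerfile .

$ docker run --rm -p 8080:8080 reqruit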

4.1.3 Deploying

Now that we have gained access and written our Dockerfile, it is time to build the container image from it. We start by changing the directory to where we have our JAR and Docker files. Then we need to build the image, tag it with our application name, and push it to the OpenShift registry, as described in snippet 5. The $(minishift openshift registry) part is a command that returns the address of the OpenShift internal registry, used for the connection.


$ docker build -t reqruit -f ./Dockerfile .

$ docker tag reqruit $(minishift openshift registry)/myproject/reqruit

$ docker push $(minishift openshift registry)/myproject/reqruit

Snippet 5. Commands for preparing our application image.

Now we need to prepare the PostgreSQL database. We start by pulling a PostgreSQL image from OpenShift, and then initialize and run it on OpenShift through its image stream by following the commands in snippet 6. The first command pulls it from a Docker registry, and the second command creates the PostgreSQL with the login and database name variables as input. It is important that these are the same values as used in your application.

$ docker pull openshift/postgresql-92-centos7

$ oc new-app -e POSTGRESQL_USER=postgres -e POSTGRESQL_PASSWORD=postgres -e POSTGRESQL_DATABASE=test openshift/postgresql-92-centos7

Snippet 6. Commands for downloading the PostgreSQL image and running it.

Now we are going to initialize and run our application, see snippet 7. First, we must check which IP the PostgreSQL pod is running on, through the first command. We then initialize and run the application through the second command, passing the IP for the database through the spring_datasource_url variable. With the third command, we check that both our pods are up and running, and with the fourth command, we can check the logs for the pod where the Spring application is running. Note that we got the name for the pod from the third command.

$ oc get svc

$ oc new-app -e spring_datasource_url=jdbc:postgresql://172.30.61.184:5432/test reqruit

$ oc get pods

$ oc logs -f reqruit-1-deploy

Snippet 7. Commands to run, and check the logs for, our application.

When we checked the logs for our application, we saw that the Spring application threw an error when starting, because the database test had not been created in our PostgreSQL container. So, what we did was connect to the database pod, go into the PostgreSQL CLI, create the database test, exit, and restart the Spring pod, see snippet 8. The oc rsh command is the way to start a remote shell in the given pod, and postgresql-92-centos7-1-fxngs is the pod that is running our PostgreSQL database. The pod name can be retrieved with the third command in snippet 7.


$ oc rsh postgresql-92-centos7-1-fxngs

sh-4.2$ psql

psql (9.2.13)

Type "help" for help.

postgres=# CREATE DATABASE TEST;

postgres=# \q

sh-4.2$ exit

exit

Snippet 8. Manually add the database “test”.

Now we just had to expose the Spring service with the command in snippet 9.

$ oc expose svc reqruit

Snippet 9. Expose command.

Now the service is up and running and we can reach it, see figure 7, and we can see that the database is being populated as we register new accounts, see figure 8.

Figure 7. Registration page of the running web app on OpenShift.


Figure 8. Showing the populated database.

4.1.4 Problems and Solutions

Deploying with OpenShift was a fairly straightforward affair. It is important to have Docker installed locally on the machine; it is not enough to just export the environment variables for Docker as described in snippet 1. Checking the logs of the pods when deploying is also important, because, for example, the printout from Spring Boot when running our Java application is written there. With OpenShift it is also easy to install another database manager, like MongoDB or MySQL, because OpenShift provides Docker images just like for PostgreSQL. Just follow snippet 6, but pull the image for the database manager you need and change the command to match its documentation. Running an application in another language is also easy; just write a Dockerfile that works for your application.
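As a sketch of what that would look like for MySQL, assuming the image name openshift/mysql-56-centos7 and the environment variables documented for OpenShift's MySQL image (check the registry for current image names and tags, and match the values to your application's configuration):

$ docker pull openshift/mysql-56-centos7

$ oc new-app -e MYSQL_USER=dbuser -e MYSQL_PASSWORD=dbpass -e MYSQL_DATABASE=test openshift/mysql-56-centos7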

4.2 Cloud Foundry

To test Cloud Foundry, we are using PCF Dev. PCF Dev is a virtual machine running an almost full installation of Cloud Foundry, and is a great way for developers and researchers to run Cloud Foundry locally on their own machines to test it. It is also a great resource for developers who want to debug their applications in PCF Dev before deploying them to production.

4.2.1 Setting up Cloud Foundry CLI and PCF Dev

To set up PCF Dev, we first must install the Cloud Foundry command line interface, cf CLI, for our operating system. For MiniShift we ran our deployment from a Windows machine, but because PCF Dev required much more performance, especially memory, we had to move to a macOS machine with 12GB of memory. When the cf CLI is installed, we need to download PCF Dev[31]. Note that this is a big file of about 20GB, so it might take some time depending on bandwidth. When the file is downloaded, we need to run the commands in snippet 10.

$ cf install-plugin cfdev

$ cf dev start -f

Snippet 10. Commands for starting PCF Dev.

The first command installs a plugin necessary to get the cf CLI to work with PCF Dev, and the second command starts the virtual machine. Note that this might take some time, and it is important that 8GB of memory is free. On our machine, we had to close every other application so that there was enough free memory, and rerun the start command a couple of times until it worked.

4.2.2 Creating Java Docker file (and pushing it)

We are going to use the same Dockerfile as in the previous section, see snippet 4. But when using Cloud Foundry, we must first push our Docker image to a Docker registry, in our case DockerHub, and then deploy it from that registry to Cloud Foundry. In snippet 11 we can see the commands for building the Docker image, tagging the created image with your username, and pushing it to your DockerHub account. It is possible to use another registry than DockerHub; check the Cloud Foundry documentation for more information.

$ docker build -t reqruit -f ./Dockerfile .

$ docker tag reqruit <username>/reqruit

$ docker push <username>/reqruit

Snippet 11. Commands for building and pushing the image to DockerHub.

4.2.3 Deploying

When looking at how to connect our app to a PostgreSQL database, we realized that it was not an easy task when using PCF Dev. Therefore, we decided to convert our Spring Boot application to use MySQL instead, because that is a service that comes installed with PCF Dev. This is an easy fix: in our application's configuration file we changed the JDBC driver to work with MySQL, downloaded the correct dependency, and rebuilt our .jar file; a sketch of the change is shown below. Then we had to run the commands in snippet 11 again to update our DockerHub repository.
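The property names below are the standard Spring Boot datasource keys; the exact values and the driver class (which depends on the connector version) are illustrative:

# application.properties — switching from PostgreSQL to MySQL
spring.datasource.url=jdbc:mysql://localhost:3306/test
spring.datasource.username=dbuser
spring.datasource.password=dbpass
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver

In addition, the PostgreSQL dependency in the pom is swapped for the MySQL connector (mysql:mysql-connector-java).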

To deploy and run the MySQL service in our Cloud Foundry environment, we run the commands in snippet 12. The first command deploys the service and makes it available in the marketplace, which we can view with the second command. In this view, we can see the service name and the different plans that we need when running the third command in snippet 12, where p.mysql is the service name, db-small is one of the plans, and reqruit-db is the name we chose for our specific instance of the service. With the fourth command, we can list the running services and verify through the last operation column that the instance was created successfully.


$ cf dev deploy-service mysql

$ cf marketplace

$ cf create-service p.mysql db-small reqruit-db

$ cf services

Snippet 12. Commands for installing and running the MySQL service.

We can now start to deploy our application. We first need to push our image from DockerHub to our Cloud Foundry instance. When we run the command in snippet 13, it will do this and run the application.

$ cf push reqruit --docker-image <username>/reqruit

$ cf logs reqruit --recent

Snippet 13. Commands for pushing the image from DockerHub to Cloud Foundry and checking the logs.

In this step, we had some problems. The app would not start, so we had to check the logs through the second command in snippet 13. There we could see that the database connection could not be established. The problem was that when connecting an app to the database service, or binding as it is called in Cloud Foundry, the app has to be running. So, every time we tried to start our application, it crashed because the connection to the defined database could not be established. We solved it by first setting our app to use an in-memory database, so that it could run without crashing; a sketch of this temporary configuration is shown below.
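A sketch of such a fallback, assuming H2, the in-memory database commonly used with Spring Boot, with the H2 dependency on the classpath:

# application.properties — temporary in-memory fallback (assumed H2 setup)
spring.datasource.url=jdbc:h2:mem:testdb
spring.datasource.driver-class-name=org.h2.Driver

After that, we bound the app to the MySQL service and restarted it with the commands in snippet 14.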

$ cf bind-service reqruit reqruit-db

$ cf restage reqruit

Snippet 14. Command for binding to MySQL and restarting the app.

Now we could verify that the website was up and running, see figure 9, by going to the link provided in the list of apps, see snippet 15.

$ cf apps

Snippet 15. Command to list all apps running in target space.


Figure 9. Registration page of the website running on Cloud Foundry.

4.2.4 Problems and Solutions

Deploying to Cloud Foundry was not an easy task. Because we had written our application with PostgreSQL as the database manager, we had a problem: PCF Dev does not natively support it. Luckily, PCF Dev does support MySQL, so it was an easy fix for us to make. It might have been possible to get some version of PostgreSQL running outside of Cloud Foundry and connect it as third-party software through the service-broker system, but we felt that this was outside the scope of a university course.

When starting PCF Dev, we had problems with performance, and our solution was to move to a better computer. Even that was barely enough; PCF Dev required a lot of hardware performance, which showed in that the installation took a long time and crashed multiple times.

4.3 Heroku

To deploy to Heroku we need to create an account, because the service is completely web based. To deploy the application we have multiple options, but we are going to use the Heroku CLI locally on our machine and use their Git integration, which uses buildpacks.

4.3.1 Setting up Heroku CLI

We start by downloading the installer for the Heroku CLI and installing it on our machine. When the Heroku CLI is installed, we can log in to our account and start deploying.

$ heroku login

Snippet 16. Login command for Heroku CLI


When running the command in snippet 16, it will take you to the login page in the browser where you enter your credentials.

4.3.2 Deploying

When the Heroku CLI is set up and we have logged in, we are going to add our source code to a new local repository, create a new app, and then push the Git repo to Heroku, see snippet 17. Note that the commands expect you to be in your project folder.

$ git init

$ git add .

$ git commit -m "first commit"

$ heroku create reqruitertest

$ git push heroku master

Snippet 17. Commands to deploy the application to Heroku with git.

Now we have our application up and running, and we need to connect it to the PostgreSQL add-on in Heroku. We run the command in snippet 18 and Heroku will then automatically populate the following variables, so that our application becomes connected to the database:

• SPRING_DATASOURCE_URL
• SPRING_DATASOURCE_USERNAME
• SPRING_DATASOURCE_PASSWORD

$ heroku addons:create heroku-postgresql

Snippet 18. Command to add PostgreSQL in the Heroku project.
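The populated configuration can be inspected from the CLI; heroku config lists the app's config vars, and heroku pg:psql opens a psql session against the add-on database (one way to produce the view in figure 11):

$ heroku config

$ heroku pg:psql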

From figure 10 we can see that the web application is up and running, and in figure 11 we can see how the database becomes populated with rows as we register new users in the app. If we would like to push updates or fixes for the web app, we just have to stage, commit, and push the changes again (the second, third, and fifth commands in snippet 17), because when Heroku detects changes on the master branch it automatically rebuilds the application. A sketch of such an update cycle follows below.
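The commit message here is an example:

$ git add .

$ git commit -m "fix registration form validation"

$ git push heroku master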


Figure 10. Registration page of the running web app.

Figure 11. Showing the database info through the Heroku CLI

4.3.3 Problems and Solutions

When deploying to Heroku we did not encounter any problems. Heroku does almost everything for us through its buildpacks; no extra configuration had to be made.


5 Interviews and Survey

In this chapter, we will present our work from two interviews and a survey.

5.1 Interview with an employee at a big company

In this subchapter, we summarize the interview we did with an employee who has the role of Technical Solution Responsible, TSR, at a big company. A TSR has the overall responsibility for Infrastructure & Operations delivery. We did this interview to get some understanding of how a big company works with deployment and cloud technologies. This gave us good insight into what companies and organizations are looking for, which we used when presenting our result.

Question 1: How do you work with Cloud services?

This question was asked to give a wide overview of how a big company is adapting to the Cloud market.

The employee said that they are moving all their systems to the cloud. Every old system is migrated to the cloud, and new ones are developed for the cloud.

Question 2: What IaaS and PaaS are you using, and do you deploy with containers?

We asked this to see which technologies they use, and whether the technologies we are looking at are relevant in the real world.

They are using Azure as both IaaS and PaaS, and use Docker as the container technology when deploying. With Azure, they use the tool Azure DevOps, which has Git integrated into it.

Question 3: How do you go from development to production?

This question was asked to get information about their process of going from development environments to production environments.

The employee said that they first go from development to testing, then to production. For testing, they have specific groups of testers that perform traditional testing and prepare automated testing. They use a Transition Framework for the steps that they take when deploying, see examples in table 1. A transition framework is a system for structuring transitions between phases.


Examples from Transition Framework | Responsible
IT-Security scoping, classification of system and risk assessment | System Responsible
Penetration tests | System Responsible
Create Solution Architecture Description documentation | Architect/Dev Team
Create Technical Solution Description documentation | Operations Team
Create Operational Routines | Operations Team
Create Maintenance Description | System Responsible
Solution handed over to maintenance organization | Dev Team
Rest-list from project handed over to Maintenance organization | Dev Team
Setup of deploy routines | Dev Team/Operations
Setup of backup/restore routines | Operations
Setup of application monitoring | Dev Team/Operations
Setup of system monitoring | Operations
Setup of contingency plan, disaster recovery | Operations
Security cleanup in production environment | Operations
Transition demo of system for Operations | Dev Team
Solution handed over to Operations | Dev Team
Solution handed over to Helpdesk, incident process | System Responsible
Establish early life support | System Responsible/Dev Team

Table 1. Examples of responsible areas in their Transition Framework.

Question 4: How much do the developers take part in the deployment, and if so, how much knowledge do they need?

This question was asked to see whether developers need knowledge of deployment when working at bigger companies, and how big a part they take in the actual deployment. We wanted to know this so that we could estimate what students need to learn to satisfy companies.

The employee said that developers are very involved in the deployment process. They deliver all source code for deployment and are part of the actual deployment. Table 1 shows which steps the developer team is responsible for when deploying.


Question 5: How do you work with version control regarding deployment?

This question was asked because we wanted to know if they had a solution to an idea we had: the possibility to version control the deployment process. Version control regarding deployment means that the deployment process is somehow written into scripts or automated processes that are placed in a version control system. When we did our case study, we had wanted to show how to version control the deployment process, but we could not find a solution. So, we wanted to know if they had solved it, and if so, how. The employee asked one of the developers/DevOps to help answer this question.

They said that they use git as version control for their code, but they do not have version control for the deployment. Microsoft might be working on adding version control for the deployment, but they were not sure. However, they said that they did use PowerShell scripts to “by-pass” their regular deployment process, to test their deploys before checking in changes.

5.2 Interview with a teacher at KTH

In this subchapter, we summarize the interview we did with Leif Lindbäck, who is the teacher and examiner of the course IV1201, for which the project result is intended. We did the interview to understand what he as a teacher is looking for when integrating new topics into an existing course, and to hear his opinions on some of our thoughts.

Question 1: What is your goal with this topic? Is it to give students an intro to cloud services and deployment, or to give them actual knowledge to deploy in the real world?

This was a big and deep question, but important for knowing what Leif wants the students to have gained when they finish the course. This is so that we can, in our result, think about how the course could approach the material to give the students what Leif is looking for. His answer revolved around the importance of developers in the real world having knowledge of and experience with deploying to the cloud. Leif's feeling, which we could confirm with the employee from our first interview, is that developers are very much involved in deployment procedures.

Therefore, Leif’s goal is that everybody that has finished the course should have done a deployment to a real-world service with the application that has been produced in the course. He understands that it will be hard for students to directly jump into an organization’s deployment processes when finished, but that he wants to reduce the gap and give students a little experience and knowledge to build upon.

He also acknowledges that the topic of deployment and cloud environments is very large, and that it might be better suited as a course of its own. But before that is organized and in place, he wants to have something that at least gives the students an intro.

Question 2: How would you like to teach about Cloud Services? Practical, theoretical or both?

We asked this question because we wanted to know how a teacher thinks about the distribution of practical and theoretical work. This is to be able to summarize our findings from the literature study and case study and understand which activities are most interesting for a teacher.

Leif's courses are often very practical, and he thinks that practical work is the most important activity. He believes theory without practice doesn't give anything, and that the practical part is important to make the theoretical part stick. With that said, he thinks it is still important to give a theoretical background, to give knowledge of what exactly is being done, so that it is not just a case of googling to get it done. This could be through a report after the practical work is done, so that the student is given the opportunity to think about the theoretical part.

Question 3: What is the practical content you would like?

This question came as a follow-up to question 2, because we wanted to hear an in-depth answer on Leif's thoughts about the practical part and which services to try.

Leif did not have any thoughts on which service should be used in the course; he said that he wanted input on this through our project.

Question 4: How would you like this course part to be examined? Through lab, seminar, home lab or other?

We asked this question to get an idea of what Leif would want. Note that in the course IV1201 there is a project running throughout the whole course. This project has, in its current form, several requirements that must be produced and written about, both mandatory ones and optional ones for a higher grade. Leif's answer revolves around this concept.

Leif wants this to be a big part of the course project, but he is not sure how to examine it. He had thoughts about the students doing some kind of version control on the deployment process that he could then examine. Another possible solution he thought of was that each student could summarize the deployment process in a text that he could examine; this could work in IV1201, because the students already write a report when the project is done, to prove the work they submitted. He doesn't want to add an extra assignment to the course just for this, because he thinks it would be too much in an already practical course, but it could be good to have a smaller intro assignment in a course preceding this one.

We discussed possibilities to add the deployment on more advanced options, like Cloud Foundry or OpenShift, as an optional requirement for a higher grade in IV1201.


Question 5: What material would you want in the course for this topic?

This question was meant to be about the actual material that would be interesting, but the discussion turned more towards how much material and what criteria Leif is looking for.

What Leif is interested in is an estimate of how long it would take students to go through the course part, from the theoretical to the practical. The theoretical material could either be a shorter text describing the information needed to do the practical part, or part of a lecture. The practical part could be to follow a guide or walkthrough, and then somehow prove that it has been done, as discussed in question 4.

For the practical part, Leif would also want a comparison between the different cloud technologies, to know which one he should choose for the course. When deciding between the different technologies, he looks at things like how long it takes, its difficulty, and whether all students can use it given its system requirements.

Question 6: Would this topic be more suitable as its own course?

This is something we wanted Leif's opinion on, because during our research we had realized more and more that the topic might be more suitable as a course of its own.

Leif agrees with us on this and is really interested in creating a stand-alone course for this topic. He thinks that this is such an important subject for developers to know, and that as its own course it could be much more useful and instructive.

5.3 Student Survey

We conducted a survey that was answered by students in our class at KTH. These students study computer science and are currently in their final year before graduating. Of 60 students, fourteen answered our survey.

We created this survey to gain insight into the students' experiences of the study content, and to test a theory we had. The theory was that a summary of a technology, followed by a case study where the student performs a practical part, is a good strategy for learning it. The theory originated from the interview with the teacher and our own experiences as students. However, we wanted some validation of the theory, so it felt natural to include it in the survey.

All questions were pre-determined and in Swedish; we have translated them to English for the sake of this degree project. The survey included 4 questions in total, where questions 1-3 could be answered either "yes" or "no". The last one is a sub-question to question 3 and could only be answered if the person had answered "no" on question 3.

Question 1: In the education, would you like to know more about cloud technologies?

This question was asked to see if there was an interest in cloud technologies among the students. Figure 12 shows a pie chart of how they answered.

Figure 12. Showing a pie chart of answers from question 1.

As we can see in the figure above, 78.6% would like to see cloud technology in their current education and 21.4% would not. This shows that most of the students are interested in learning about cloud technologies. Therefore, it could be something that is missing in their education. However, there is still a group of students that do not want to know more, perhaps because they do not know how relevant it is for current and future work. This is something we examine in greater detail in the discussion section of this report.


Question 2: In the education, would you like to know more about deploying applications in a cloud environment?

This question was asked to see specifically whether deploying an application to the cloud was something students would like to do. Figure 13 shows a pie chart of how they answered.

Figure 13. Showing a pie chart of answers from question 2.

As we can see in the figure above, 71.4% would like to see cloud deployment in their current education and 28.6% would not. Still, a big group shows interest in deploying applications, which one could interpret as students wanting to be taught how to do it. Having that in mind is positive news for our research question. Although many answered positively, approximately a quarter of the students show no interest in deploying. Perhaps deploying to the cloud seems too challenging at first glance.


Question 3: Is a summary of a technology together with a case study where the technologies are tested a good strategy for learning?

This question, as mentioned earlier, was asked to gain validation of and feedback on our theory. Figure 14 shows a pie chart of how they answered.

Figure 14. Showing a pie chart of answers from question 3.

About 78.6% thought our theory was a good strategy and 21.4% did not. The students that did not think it was a good idea did not answer why in the sub-question. But they did give some other strategies, see the next question.

Question 3.1: If you answered "no" on the previous question, please state a reason why and suggest another strategy for learning.

This question was asked to see if there existed other, perhaps better, strategies for learning something new. Only three people answered. One of them did not seem to understand what a case study was or what we meant by it. The other two thought that doing a lab, or a seminar, would be a better approach for the practical part of learning.


6 Result

In this chapter, we evaluate our case study, interviews and the survey, in order to analyze the information. With this analysis, we then proceed to answer our research questions.

6.1 Evaluation of the Case Study

In this subchapter, we present the evaluation of the data and observations that we collected during our case study. When gathering the data, we followed the metrics in chapter three, which we then used as a basis for our evaluation. We start by presenting the evaluation for each cloud service, ending with a table that shows all the data in one place. This table is then used to make comparisons between the services when answering the research question and its sub-questions.

6.1.1 OpenShift

Time Consumption

For us, OpenShift took about twenty hours to set up and deploy our application to. A big part of that time was spent getting the MiniShift VM to run on our chosen hypervisor. The actual deployment consisted of two parts: the first was to write the Dockerfile, the second was to run the commands to build and deploy the application. This part was almost straightforward; we did encounter some minor problems, but the time to solve them had a minimal effect on the total time consumption.

Difficulty

Our opinion is that the difficulty of OpenShift and MiniShift is "hard". Some extra configuration had to be done for the hypervisor, e.g. setting up a virtual network switch. Therefore, the setup of MiniShift can be of hard difficulty if the student does not have any experience working with hypervisors. Furthermore, it can be hard if the student lacks experience with Docker. Many of the commands are, on the other hand, easy to understand, especially when following a good guide.

Our guide, based on our own experience of doing the OpenShift deployment, explains how to solve the difficult parts of OpenShift. Therefore, the difficulty could be "normal" when following our guide.

Community

The OpenShift community is not a particularly big one. Because OpenShift is open source, some problems are posted on its GitHub repository[32], where members of the development team give answers. On Stack Overflow, there were about 5500 asked questions at the time of writing this thesis[33].

We were able to find solutions to our problems, but we did not encounter that many problems, and those we did encounter were frequently asked about. Our opinion is that for more advanced or platform-specific problems, it can be harder to find a good answer.

System Requirements

MiniShift does not have very high requirements; we were able to run it on a laptop with 8GB of memory and 2 CPU cores. The default MiniShift VM starts with 8GB of memory and 20GB of disk space, but this can be lowered, and we successfully ran it with the maximum memory set to 4GB.

If one would like to try the full OpenShift installation, it requires two systems installed with the Red Hat Enterprise Linux OS. They can be either virtual or physical machines. One serves as a master, the other as a node. Both require 8GB of memory and 30GB of disk space.

Pros and Cons

Some pros with OpenShift/MiniShift:
• Does not require an extremely fast computer
• OpenShift commands are easy to understand
• Both Docker and S2I are deployment options
• Provides images for adding third-party services
• Uses common technologies for containers and orchestration

Some cons with OpenShift/MiniShift:
• Hard for a beginner to set up the hypervisor
• MiniShift only runs a single node
• Needs a good understanding of Kubernetes

6.1.2 Cloud Foundry

Time Consumption

For us, Cloud Foundry took about 32 hours to set up and deploy our application to. Most of the time went to getting PCF Dev to work. The steps of the setup were easy, but they took an extremely long time. There were multiple reasons for this. One was that the installation file took a long time to download due to its size. The installation then failed multiple times, we don't know why, which led to more time spent. The deployment of our application also took a lot of time, mainly because we had to change our application configuration twice due to issues with the database.

Difficulty

Our opinion is that the difficulty of PCF Dev and Cloud Foundry is "hard". The command process for setup and deployment was not that hard, but because of the countless issues with starting PCF Dev, it was very challenging. We also had a problem with our database, which led us to go from PostgreSQL to MySQL for our application. The reason is that PostgreSQL does not come natively with Cloud Foundry, and there is no easy way to install an image as with OpenShift. Getting PostgreSQL to work with Cloud Foundry would have been a hard task involving setting it up locally and configuring a service broker. We felt that this was too advanced and outside the scope of a university course.

Community

The Cloud Foundry community is not a particularly big one. Most answers to problems were found on its GitHub repository[34], and there were about 2500 asked questions on Stack Overflow at the time of writing this thesis[35].

We did not find solutions to all our problems. Therefore, we were forced to solve some of them on our own.

System Requirements

PCF Dev requires a lot of hardware resources. According to the specification, it needs 8GB of free memory and 100GB of free disk space. We were able to run it on a machine with 12GB of memory, but we still needed to make sure that nothing else used up memory. It also crashed multiple times during the setup of PCF Dev. We do not know why, but we suspect that it had to do with low resources.

Pros and Cons

Some pros with Cloud Foundry/PCF Dev:
• Almost all functionality as in a full Cloud Foundry installation
• Both Docker and buildpacks are deployment options

Some cons with Cloud Foundry/PCF Dev:
• Hard to add third-party services
• Requires much hardware resources
• Hard to find solutions to problems in the community

6.1.3 Heroku

Time Consumption

For us, Heroku took about four hours to set up and deploy our application to. It was a very straightforward process, and therefore fast to use. We know from previous experience that one can deploy even faster through the web interface.

Difficulty

Our opinion is that Heroku is very easy to use. Heroku does not require developers to think about the deployment at all; most of the configuration is done automatically by Heroku.

Community

Heroku has a pretty big community, with about 30000 asked questions on Stack Overflow at the time of writing this thesis[36].

System Requirements

There are no special requirements; you only need to run the Heroku CLI locally or use the web interface.


Pros and Cons

Some pros with Heroku:
• Fast and easy
• Both Docker and buildpacks are deployment options
• No configurations are needed

Some cons with Heroku:
• Compared to OpenShift and Cloud Foundry, some control of the deployment is concealed from the developers

6.1.4 Analysis

Table 2. Comparison table for Heroku, OpenShift and Cloud Foundry.


The times we have measured, see table 2, are based on our machines. The time to set up and run could vary between different operating systems and specifications, but it is a good benchmark for the course. However, students may encounter other problems that could affect the time consumption more.

In table 2 we can see the different cloud services side by side. Heroku is the fastest and simplest option; it does not require any special specifications and has a big community to ask for help. It is not exactly fair to compare Heroku to OpenShift and Cloud Foundry. Sure, they do the same thing, but we feel that they target different needs. Having control over the infrastructure has many upsides: you can configure it exactly as you want, you have full control over the data that enters the system, and it can cost less to manage it yourself. However, controlling the infrastructure requires a much higher level of knowledge. With OpenShift and Cloud Foundry, you can have extensive control over the infrastructure; both can be run on your own server cluster. With Heroku, there is no root access to the infrastructure, which makes it easier to use.

When comparing OpenShift with Cloud Foundry, using MiniShift and PCF Dev, we can see in table 2 that OpenShift is faster. Depending on the knowledge and experience of the student, we believe that OpenShift is also slightly easier to use. Even though MiniShift might be harder to set up compared to PCF Dev, it is faster. PCF Dev also has much higher system requirements, which could be a problem for some students. OpenShift is furthermore based on well-known technologies like Docker and Kubernetes, which are very good to know.

Depending on what the course is expected to teach, we believe that the choice should be between Heroku and OpenShift using MiniShift. Heroku could be used as an alternative for a lower grade because it is simple and fast. It could also be good when the course just wants to give an intro to real cloud deployment. But for learning all the technologies behind the deployment, OpenShift is the better alternative.

6.2 Evaluation of Interviews and the Survey

In this subchapter, we present the evaluation of the data we gathered when conducting our interviews and the survey. We start by presenting the evaluation of each interview and then of the survey. Lastly, we analyze them together and use the result when answering our research question and its sub-questions.

6.2.1 Interview with an employee at a big company

The interview was conducted to get information on how a big company works with cloud deployment. As stated, the employee works as Technical Solution Responsible at a big company.

After the interview, it is obvious how important it is for students to learn about cloud deployment. The employee said that every service they provide has been moved to, or is already in, the cloud. The developers are also very involved in the process. This means that there is a need to include cloud deployment in the education.

When asked which technologies they use for IaaS and PaaS, the employee said that they use Azure for both. Azure is most comparable to Heroku in our case study, because neither requires developers to manage the infrastructure.

After we had done the literature study and the interview, we noticed that companies tend to go with cloud services like Heroku or Azure for their simplicity. With them, it is easy to e.g. scale the application, add add-ons and link systems. This can be achieved with OpenShift and Cloud Foundry too, but they have not made the process as simple, which means that developers need more knowledge of the platforms. The trade-off is that with simplicity comes a higher cost, because you pay for both IaaS and PaaS.

One thing that we noted in our case study was the lack of ability to version control the deployment. This was also something that we talked with the employee about, with regard to their deployment process and how they do not version control it. A member of the DevOps/dev team at the company mentioned that they wrote PowerShell scripts for some deployment tasks, which is something we had also thought about. However, the team member also mentioned that version control of the deployment process may be something Microsoft is working on for their DevOps tool. We think that this is something that is missing, and it would be good if it were provided by the services.

6.2.2 Interview with Leif at KTH

This interview was conducted to get information from a teacher on how a course, and especially IV1201, could be shaped around cloud deployment. Leif is the teacher of the course IV1201. The interview gave us good feedback on how Leif would want to integrate cloud deployment into the course, and on what he would like the students to learn from it.

His thought is that the course should above all give the students some actual hands-on experience with cloud deployment, but also basic theoretical knowledge. He thinks it is important that the students really understand what they do when performing the practical part, and do not just follow a guide. Therefore, he usually lets the students write a report that presents what they have done, followed by a discussion of the theoretical part.

Leif wants to make cloud deployment a big part of the course, but the examination is a hard part to solve. Leif is also the teacher of another course called Network Programming, ID1212, where the students do several assignments. In that course, it could be an option to add a smaller assignment where the students must deploy an application to any PaaS as an intro to cloud deployment. Then, in the other course IV1201, the students could do something more advanced.

Which service or technology to use in the courses is a hard question. Leif wanted our opinion on it; therefore, our conclusions from the case study could be of value to him.

Leif thinks that in the future, this should be a stand-alone course because of the size and importance of cloud deployment today. In a stand-alone course, it could cover more services and technologies. It would also give the students the opportunity to get more experience with cloud deployment.

6.2.3 Survey

As mentioned in chapter 5, the survey was conducted with students at KTH. The goal of the survey was to gain insight into the students' thoughts. We wanted to see if there was an interest in the cloud and in cloud deployment of applications. In addition, we wanted to test our theory, in which we propose a learning strategy, and hear their feedback and suggestions on it.

From the survey, we could see that most students are interested in learning more about cloud technologies. We expected that many students would be interested, because we believe cloud technologies are a hot topic today. Even though the cloud has been discussed for some years now, it is still evolving, making it relevant for young developers such as the students in the survey.

We asked them if they specifically wanted to learn more about deploying applications in a cloud environment. 71.4% said "yes", which is a good indication that most of the students are aware of the importance of cloud deployment. However, more than a quarter of the students disagreed; we were a bit surprised that so many did. One possible reason could be that students with no experience of or knowledge about cloud deployment are unaware of its importance for their future work. Another reason may be that students feel that the cloud is an abstract concept used in many technical contexts, which makes it too difficult to understand. We would need to carry out further studies, or add a "why" question for those who answered "no", in order to validate these reasons.

About 80% of the students liked our suggested theory and about 20% did not. We gave the students who did not favor the theory an opportunity to suggest a better solution. This was because we want to present the most suitable solution for learning cloud deployment in a university course; therefore, feedback and insight from students play an important part in answering the research question. From the answers to sub-question 3.1, we could see that these students did not like the case study form of the learning strategy. So, if we look at the learning strategy as two separate parts, one theoretical and the other practical, we can see that they disliked the chosen form of the practical part. With this information, we started to come up with another solution for the practical part.


6.2.4 Analysis

We believe it is important to have both a theoretical part and a practical part. The theoretical part could be taught through a lecture where a short presentation on cloud deployment is given. Additional text on the subject could also be given to the students for reading. The theoretical part would then be followed up with a lab/seminar/assignment as the practical part, since that was what the students asked for in the survey. We also concluded from the survey that this is the most favorable way for students to learn. Another option could be to add one more requirement to the project in the course IV1201, where the students must do an actual deployment of their application to a PaaS. The students would then explain and discuss what they have done, and how, in the final report for the course project. This is so that the students need to think about the theoretical parts, and not only google the solution, according to Leif.

In the interview with Leif, he wanted our opinion on which cloud service to include in the course. We have concluded that the options should be Heroku and OpenShift. Heroku because it is easy and fast to deploy an application to, with minimal need for configuration. This is also probably very close to how it is done in the real world; as a comparison, the big company uses Azure, which is similar. But when deploying with Heroku, nothing is taught about the actual workings of the cloud service. Therefore, OpenShift and its minimal variant MiniShift could be worth considering. With MiniShift, the student is forced to perform some network and administration tasks and gains some basic knowledge about the infrastructure and backend. There would also be a possibility for interested students to try out more advanced things on their own. Why we would choose OpenShift over Cloud Foundry is based on several things, which we state in the analysis subchapter of the case study evaluation.

6.3 Answering the research question

In this subchapter, we are going to answer each sub-question and then summarize them as an answer to our research question.

SRQ1 What activities could be used in the course?

If we only look at IV1201, we have concluded that it would be a good idea to add cloud deployment as a requirement for the project. As we have described, there is currently no option to version control the deployment process, making it hard for Leif to verify that students have met the requirements. But the students could describe and show what has been done in the final project report. This would also give the students a chance to consider the theoretical parts.

If it is to be integrated into another university course, the layout of that course must be considered. But we have concluded, from the students' and the teacher's perspectives, that the learning strategy should be one theoretical part and one practical part, as described in the analysis. The theoretical part could be taught through a lecture where a short presentation on cloud deployment is given. Additional text on the subject could also be given to the students for reading.

54

The theoretical part would then be followed up with a lab/seminar/assignment that constitutes the practical part.

SRQ2 Which cloud services should be used in the course?

As stated, we have concluded that Heroku or OpenShift using MiniShift would be the best alternatives: Heroku for its speed and simplicity, and OpenShift using MiniShift for deeper knowledge.

In general, Heroku and OpenShift could also be taught in the same course because they complement each other well.

SRQ3 How much time will a student need to deploy an application to different cloud services?

We have estimated that it would take about 20 hours on average to deploy on OpenShift, with at least half that time spent on the actual setup, getting the MiniShift VM to run and understanding it. The other half is spent on things like Docker management, deployment, and debugging.

For Heroku, we have estimated that it would take about 4 hours to deploy. There is no time spent on setup; all time is spent on deployment and debugging.

SRQ4 How can a guide be written for deploying an application to different cloud services?

We believe that a good guide should include:
• Recommended system requirements
• All preparations that need to be done before deploying
• A description of the actual steps taken to deploy
• Presentations and explanations of all the commands used to deploy
• Solutions to problems that occurred

We think that the case study we have done could be used as this guide, or at least as a starting point for an assignment. We have stated all the commands we ran to deploy and described them for deeper understanding. The guide also includes problems that we encountered and how we solved them.

RQ1 How can deployment to cloud services be taught in a university course?

Based on our case study, interviews and the survey, we have concluded a reasonable approach to how deployment with cloud services could be taught. This approach is presented through the four answers given to the sub-questions above; these four questions cover the criteria necessary for suggesting a reasonable answer to our research question. We have concluded that the activities should be a lecture that introduces Heroku and OpenShift, followed by assignments that let the students get hands-on experience with the cloud services. After our literature study, case study and the interviews, we realized that cloud deployment would work best as a stand-alone course, because, while working on the degree project, it became clear that cloud deployment is a broad technical area. The expectation from companies, which emerged from our interview, is that developers should be able to handle cloud deployments. Therefore, it is an important subject for future developers.

The practical activities in a stand-alone course could be taught through seminars, home labs or a guide to follow. The theoretical part could then cover topics important for cloud deployment, such as:

• Cloud computing service models (IaaS, PaaS, SaaS)
• Virtualization
• Containerization
• Orchestration
• Cloud Services


7 Discussion

In this chapter, we will discuss some strengths and weaknesses of our research methods and research. We will also point out areas that could be considered for future work.

7.1 Method

In this section, we will discuss our different research methods, and their validity and reliability.

7.1.1 Literature Study

The literature study was performed on cloud computing, building techniques, and cloud services, in order to gather information and research methods related to our thesis. It was necessary to conduct the literature study so that we could see what would be important to observe in the case study.

7.1.2 Case Study

The case study was performed with two goals. The first was to test the different cloud services, to be able to compare them against different metrics in the result chapter. The second was to create a guide, or the basis for a guide, that could be used in the course IV1201. It was necessary to create our own guide because we felt that existing guides lack discussions of common problems and solutions. We gained valuable knowledge from conducting the case study because it clarified some unclear concepts.

7.1.3 Interviews and a Survey

The interviews were performed to get opinions from a teacher and from an employee at a big company. From these, we got valuable insights from the real world that we could use in our answers. As time only allowed us to interview two people, the information might be somewhat biased towards their opinions.

The survey was performed to get opinions from students to see if they are interested in cloud deployment, and what learning methods they prefer. The survey was sent to our classmates.

7.1.4 Validity

Validity refers to evaluating whether the answer to the research question is reasonable. Our answer's validity can be assessed by observing whether the chosen research methods were suitable for gaining enough data to answer the research question.

The literature study was important for us to gain knowledge about the context of cloud computing, building techniques, and cloud services. Without it, we wouldn't have had enough understanding to be able to deploy in our case study. Without the case study, we wouldn't have known how much can be expected from the students. This includes how much time students need for learning and deploying to a cloud service, which cloud service is appropriate to learn, and how they could deploy. Our final answer would then have missed important factors, which would have lowered its validity. Another important factor to consider, to give our answer validity, is the opinions of the teacher, the students, and a company. The teacher's opinion lets us know how he would like to teach the course. The students' opinions let us know how they respond to a certain learning strategy for a course. A company's opinion lets us know what can be expected from the students in a work environment. Without insight into these opinions, our answer would not have been favourable for those whom our result concerns. Therefore, the interviews and the survey were necessary for the validity of the answer.

One thing that we could have done differently is to perform a much deeper literature study and case study covering everything about deployment to cloud services. This would have increased validity. However, it was not feasible within the fixed time limit for conducting our research.

7.1.5 Reliability

Reliability means that another researcher should be able to reach the same conclusions using the same research question and research methods in a similar environment.

There are some things that could lead to other conclusions. A large part of our research was qualitative, especially the interviews. If this research were conducted again, the interviewees might be different people with different opinions, and those opinions could alter the outcome.

The conclusions from the case study were based on our own subjective experiences, which might not be perceived the same way by another person conducting the same research. This is especially true if the person’s experience with cloud deployment differs, or if the cloud services have changed, e.g. updated their software or documentation, added language support or introduced new features. Therefore, the guide and the comparison between cloud services could turn out differently for another researcher.

7.2 Result

We had two goals for our thesis. The first was to evaluate whether knowledge of cloud deployment is useful and relevant for the course Design of Global Applications, IV1201. The second was to propose how cloud deployment could be integrated into that course. We believe that we have reached both goals by answering the research question in the previous chapter.

Our research question was: How can deployment to cloud services be taught in a university course?

The results show that deployment to cloud services can be taught through a lecture followed by an assignment. From the answers given by Leif, we concluded that the most important activity must be a practical part, so that students get hands-on experience with cloud deployment. Another teacher might have given us a different answer, changing our conclusion. However, Leif is the teacher of the course for which this thesis is partly conducted, which makes his opinion a strong argument. Furthermore, the answer we reached, a combination of a theoretical and a practical part, might seem obvious, but given our findings and observations it is a reasonable answer that is validated by the results of the research methods. The answer might have been different if we had had didactics in mind when concluding our research, since didactics is the scientific study of how to present information to students in an educational way. However, research on didactics is outside the scope of this thesis, though it is discussed in Future Work.

Throughout the work, we realized that there is much more to cloud deployment that could be covered in order to fully understand it. Scalability, cost, and security are three very important topics in cloud deployment. If we had included them in our case study, the outcome could have been different; for example, if one of the cloud services had had a problem with scalability, cost or security, we might not have recommended it for a course. Evidently, there is much more content to cover about cloud deployment than we have researched. Therefore, in our answer to the research question, we also stated that teaching cloud deployment would fit better as a stand-alone course rather than as part of another course.

7.3 Future Work

There are several areas that could be researched further.

One area to research further is didactics, the theory of how things can be taught. Our learning strategy is built entirely upon the opinions of the teacher, the company, and the students, together with our own observations from the research. With didactics, a more scientific approach could inform the conclusions about the learning strategy.

Our thesis does not cover the topic of scalability, which could be researched further for each cloud service. Full installations of the cloud services could be done, given access to the required computing power, and scalability could then be tested by stress-testing an application on the services, as sketched below.
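As a rough illustration, the following sketch fires concurrent HTTP requests at a deployed application and reports the success rate and average latency. It assumes Python with the third-party requests library; the target URL, worker count, and request count are placeholder assumptions, and a dedicated load-testing tool would be preferable for serious measurements.

# stress_test.py - a minimal concurrent load-test sketch
# (hypothetical example; the target URL is a placeholder)
import time
from concurrent.futures import ThreadPoolExecutor

import requests

TARGET = 'https://example-app.example.com/'  # placeholder URL
WORKERS = 50    # number of concurrent threads (assumption)
TOTAL = 1000    # total number of requests to send (assumption)

def hit(_):
    # Send one GET request and record its status code and latency.
    start = time.time()
    response = requests.get(TARGET, timeout=10)
    return response.status_code, time.time() - start

with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    results = list(pool.map(hit, range(TOTAL)))

ok = sum(1 for status, _ in results if status == 200)
avg = sum(latency for _, latency in results) / len(results)
print(f'{ok}/{TOTAL} requests succeeded, average latency {avg:.3f} s')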

The cost of the different services could also be researched further, as an interesting comparison between them.


TRITA-EECS-EX-2019:215

www.kth.se