IT 20 052 Examensarbete 15 hp Augusti 2020

A proposal for a cloud-based microservice architecture for the Skolrutiner system

Nhat Minh Pham

Department of Information Technology

Abstract

Skolrutiner, a fast-moving startup company, has a great idea for a platform for handling documents. However, they do not have a scalable and modular system architecture, since all the current services are centralized and running on a single virtual machine. The main purpose of this thesis is to study and design a cloud-based microservice system for the company's platform. This new microservice approach, with the help of many modern features of cloud computing, is expected to resolve the major issues of the current system. The proposed architecture might be costly and challenging to implement, but it could help the company gain a general knowledge of cloud computing, microservices, and how to apply these technologies to the expanding business. The re-implementation of the Skolrutiner system based on the given architecture is out of the scope of this thesis work.

Handledare (supervisor): Johannes Lundström
Ämnesgranskare (subject reviewer): Karl Marklund
Examinator (examiner): Johannes Borgström
Tryckt av (printed by): Reprocentralen ITC

Contents

1 Introduction
  1.1 Motivation
  1.2 Contribution
  1.3 Overview

2 Cloud computing
  2.1 Definitions
  2.2 Service models
  2.3 Deployment models
  2.4 Characteristics

3 Microservices
  3.1 Monolithic architecture
  3.2 Microservices
  3.3 Key benefits
  3.4 Communication mechanisms
  3.5 Inter-service communication
  3.6 Drawbacks

4 Current system design
  4.1 Current system design
  4.2 About the Skolrutiner application
  4.3 Current design

5 Proposal of the new design
  5.1 Requirements for the new design
  5.2 The new design
    5.2.1 Overall architecture
    5.2.2 API gateway and service design
    5.2.3 Authentication and authorization in the system
    5.2.4 File uploading in the new design
  5.3 Evaluation

6 Conclusion
  6.1 Contribution
  6.2 Future work

1 Introduction

This section gives an overview of what motivated me to write this thesis, what the thesis contributes, and a summary of what each section covers.

1.1 Motivation

While the term "cloud" might seem quite abstract, the benefits of cloud computing are very tangible [4]. Since the cloud has gained popularity in the computing world, there is no doubt that moving to the cloud nowadays is a natural evolution. More and more businesses are making the switch by shifting their applications from self-hosted servers into the cloud, and so is my client company, Skolrutiner. By cloud, we mean any computing environment in which computing, networking, and storage resources can be provisioned and released elastically in an on-demand, self-service manner [21]. With all the great advantages and profits that cloud computing brings, it is sometimes the only choice for modern companies with large and complex systems.

The term "microservice architecture", which describes a particular way of designing an application system as a collection of independently deployable services [11], has also been a very hot topic in recent years. Many development teams regard it as a superior approach to the traditional monolithic architecture, and many are shifting from their code-heavy monoliths to smaller, self-contained microservices. However, like any architectural style, microservices come with both benefits and costs, which should be understood in depth when designing any microservice system.

The goal of this thesis project is to design a microservice system based on cloud computing that meets the requirements of the company Skolrutiner: a system with high availability and flexibility that is easy to run on a cloud platform [19]. The proposed architecture could help the company, Skolrutiner, gain a better vision of how to apply cloud computing and microservices to their specific context.

1.2 Contribution

The main contributions of this thesis are to:

• provide a comprehensive definition of cloud computing, its characteristics and advantages over maintaining physical infrastructure
• compare microservices with the traditional monolithic architecture and discuss the advantages and drawbacks of implementing a microservice system over a monolithic one

• provide the client, Skolrutiner, with a comprehensive description of their current system architecture, discuss the disadvantages of the current design and the company's requirements for the design of the new system
• propose a new cloud-based microservice design for the Skolrutiner system, explain and evaluate it.

1.3 Overview

The thesis is divided into sections in a way that I hope will make it easier to read from beginning to end. Here is an overview of what will be covered:

• Sections 2 and 3 provide background theory on cloud computing and microservices.
• Section 4 introduces the company's current system and its drawbacks.
• Section 5 proposes a new design for the company and evaluates it.
• Section 6 summarizes the thesis work and discusses potential future work.

2 Cloud computing

This section presents basic knowledge about cloud computing: its characteristics, service models and deployment models.

2.1 Definitions

At the early stage of the internet era, physical servers were the only solution for the web infrastructure of every company. However, things have changed, and now cloud computing underpins a huge number of services, including consumer services like Gmail or the cloud back-up of the photos on your smartphone [14]. Cloud computing, in simple terms, is the delivery of computing services over the internet.

According to the National Institute of Standards and Technology (NIST) in the United States of America, cloud computing is "a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction" [17]. This cloud model is composed of five essential characteristics, three service models, and four deployment models [17].

2.2 Service models

The service models of cloud computing are categorized into three basic models which are built on top of one another: Software (application) as a Service (SaaS), Platform as a Service (PaaS) and Infrastructure as a Service (IaaS), as shown in figure 1. Understanding these models and the differences between them could enable a company like the client, Skolrutiner, to accomplish business goals and effectively support business changes.

Figure 1: Service models in cloud computing

Software as a Service (SaaS): SaaS provides the capability for users to connect to and use applications running on a cloud infrastructure. All the setup of the underlying infrastructure, platform, application software and data is handled by the service provider, which is responsible for managing hardware and software and for ensuring the availability and security of the application as well as its data. Tenants can access their applications through a program interface, or more commonly through a web browser. Examples of SaaS products are web-based email services (Gmail, Outlook, ...) and online office services (Office365, Google Docs, ...) for personal use, or services for organisational use like MailChimp, Twilio, the Elastic stack, etc.

Platform as a Service (PaaS): PaaS provides to the consumer a complete development and deployment environment in the cloud, capable of delivering everything from simple cloud-based applications to sophisticated, cloud-enabled enterprise applications [16], using programming languages, libraries, frameworks and tools supported by the provider. The consumers do not need to manage the underlying infrastructure including servers, operating systems, network or storage, but have control over the deployed applications and possibly configuration settings for the application-hosting environment [17]. PaaS is often used by organisations for scenarios such as development frameworks, analytics, business intelligence, etc.

Infrastructure as a Service (IaaS): IaaS is the lowest level of the three cloud service models, providing instant computing infrastructure, provisioned and managed over the internet. IaaS offers the capability to scale up and down quickly, with pay-as-you-go payment. With infrastructure being available in the cloud, you can avoid the expense and complexity of buying and managing your own physical servers and other data-center infrastructure [15].

In addition to the three main models above, serverless computing has recently become a popular service model supported by many service providers. The major difference seems to be that serverless computing makes applications scale automatically, while with PaaS, applications must be configured to scale automatically. To be more precise, figure 2 illustrates the similarities and differences between PaaS and serverless computing.

Figure 2: Similarities and differences between Platform-as-a-Service and Serverless computing

2.3 Deployment models

There are different ways to deploy cloud resources. The options for deployment are public, private and hybrid cloud.

Public cloud: The public cloud is the most popular way of deploying cloud computing. The cloud resource is provisioned for open use by the general public. It is owned, managed and operated by a third-party cloud service provider and delivered over the internet. In the public cloud, different cloud tenants share the same hardware, network and storage. Web mail services, office applications, online storage, and development and testing environments are usually deployed on a public cloud. The big names among public cloud providers are Amazon Web Services, Google Cloud Platform and Microsoft Azure.

Private cloud: We use the term "private cloud" to refer to internal computing resources that are used only by one organisation and not made available to the public. It can be a physical data-center located at an organisation or hosted by a third-party service provider. Being dedicated exclusively to the organisation, a private cloud resource can be more easily customized to meet specific requirements. Private clouds are often used by financial organizations, government agencies and medium to large corporations to maintain control over a more secure environment.

Hybrid cloud: A hybrid cloud is a cloud infrastructure that combines public and private clouds in order to make use of the advantages of both. Users can choose the public cloud for tasks that require large storage and have lower security needs, while the private cloud is well-suited for storing sensitive data or running internal confidential tasks. The composition of these two cloud types also provides data and application transition between them, and the option of "cloud bursting" [13] from the private to the public cloud when there is a rise in demand for additional resources.

2.4 Characteristics

According to a National Institute of Standards and Technology (NIST) document, there are five essential characteristics of cloud computing [17].

1. On-demand self-service: Cloud computing resources like storage space and virtual machine instances can be provisioned by organizations without requiring human interaction with each service provider. Consumers can access their cloud accounts by themselves to see the current status and set up their cloud services.

2. Broad network access: Cloud computing capabilities are available over the network and can be accessed through various client platforms (e.g., mobile phones, tablets, laptops and workstations). A stable connection to the cloud with high bandwidth and low latency relates directly to a better user experience of the cloud service over the internet.

3. Multi-tenancy and resource pooling: Cloud computing resources are pooled to support multiple consumers using a multi-tenant model. Multi-tenancy allows the cloud provider to dynamically assign physical and virtual resources according to consumer demand while maintaining privacy and security over consumers' information. Examples of such resources are storage, memory, computing processors, network bandwidth, etc. The provider's resource pool is often very large and flexible, in order to serve multiple client requirements and to provide economy of scale. Cloud providers should ensure that resource allocation minimizes the impact on the performance of critical applications.

4. Rapid elasticity and scalability: One of the most important characteristics of cloud computing is that resources can be scaled up as organizations need them, and scaled down when demand drops. The resources might include storage, computing power and network, and therefore also the service cost [8].

5. Measured service: Cloud systems automatically monitor, report and optimize resource usage by leveraging a charge-per-use capability appropriate to the service type, such as storage, processing, bandwidth, and active user accounts. This also provides transparency for both the cloud service provider and customers.

3 Microservices

Microservices are considered a promising architecture for modern systems. This section compares microservices with the traditional monolithic architecture. It also includes a definition of the microservice architecture and a discussion of the advantages and drawbacks of implementing a microservice system.

3.1 Monolithic architecture

Monolithic architectures have many benefits when applications are initially developed, but they become problematic for modern cloud-based systems:

1. Defining a monolith: A monolithic application is built as a single deployable unit, e.g. a single WAR file in Java or a single web application in .NET or PHP. It has a single code base from which the entire application is built, and usually consists of three main modules:

• Database: Tables in a database management system.

• Client-side user interface: Often consisting of HTML pages with JavaScript, or possibly a single-page application running in the browser.
• Server-side system: Handling requests, getting and updating data, and generating HTML pages in response to client requests.

Figure 3: Monolithic architecture

2. Why monolithic architectures are suboptimal for modern cloud-based systems: The first reason concerns flexibility. With a monolithic architecture as described in figure 3, the technology stack is decided at the start and followed throughout, which can limit the availability of "the right tool for the job". It is inconvenient to make frequent changes, and especially difficult for developers to upgrade the application stack to adopt newer technology.

Another issue is reliability. Since a single code base is used for all the components of the application, monoliths have many interdependencies between their parts. If there is a bug in one module (e.g. a memory leak), the entire application might be affected and, in the worst case, become unavailable.

Scaling the logical modules of a monolithic application is difficult, especially when they get larger. With a large monolithic system, scaling is often vertical, meaning adding more power (CPU, memory) to an existing running machine, which is limited by the capacity of individual servers and could lead to downtime during the scaling process.

A large and complex monolithic application is also a bottleneck for continuous development. To adopt minor changes to only one component, the development team might have to rebuild, do full application testing and redeploy the entire application.

3.2 Microservices

The microservice architecture is now considered a promising solution to the disadvantages of the traditional monolithic architecture. Migrating from a monolithic architecture to microservices, which has been done successfully by many big companies like Netflix, Google and Amazon, represents a fundamental shift in how to approach new application architectures. In this section we will find out what the microservice architecture is and what advantages it has over a monolithic one.

Definition

According to Martin Fowler, microservices, in short, are an architectural style in which a large, complex application is developed as a suite of small services. Microservices can be deployed independently and are loosely coupled [21]. These services are built around business capabilities and can be developed in different programming languages, using different data storage technologies. They communicate with lightweight, language-neutral mechanisms, often an HTTP resource API such as Representational State Transfer (REST). Microservices have a bounded context, which means they do not need to know anything about the underlying architecture or the implementation of other services.

Characteristics

Overall, microservices are small, autonomous services that work together.

• Small and focused: High cohesion, the drive to have related code grouped together, is an important concept when designing microservices. In a monolithic system, despite efforts to make the code more cohesive by creating abstractions and modules, code related to similar functions tends to spread all over and break down the boundaries, making implementation and bug fixing more difficult.

Microservices align service boundaries with explicit business boundaries. This ties the code to a given piece of functionality and helps developers avoid the difficulties associated with a codebase that grows too large. This standpoint is reinforced by the Single Responsibility Principle: "Gather together those things that change for the same reason, and separate those things that change for different reasons" [18].

• Autonomous: One of the hallmarks of a well-designed microservice architecture is achieving decoupling. This means that one service is able to change independently of other services, and that a service can be deployed by itself without needing to change any other part of the system. To do the decoupling properly, the development team will need to model the services and their APIs in the right way. Otherwise, many of the advantages of a microservice architecture described in the next section will be very difficult to achieve.

3.3 Key benefits

The benefits of microservices are compelling enough that many big companies have decided to adopt them. Compared to a monolithic architecture, microservices offer the following.

Technology diversity is a significant benefit of microservices. In a microservice architecture, each service is an independently deployable unit, so in a system that contains multiple collaborating services we can decide to use different technologies inside each one. This allows teams to pick an appropriate tool for each job, as in figure 4, whereas with a monolithic system, developers have to select a limited number of standardized tools for all functions. Such decisions on languages and frameworks are difficult to change or upgrade, and over time will lead to being locked into outdated technologies.

Figure 4: Different technologies used for different microservices

With microservices, if one part of the system requires a version upgrade or needs better performance, developers can decide to update or switch to a different technology stack much more easily. For example, the team responsible for developing the user-comment service of a social network system might want to move from a relational database like MySQL to a NoSQL database like MongoDB because of performance issues, and could do so without affecting other parts of the system. Being deployed in the cloud, the application system also has a large number of choices of programming languages with SDKs (Software Development Kits), storage and other technologies.

A microservice architecture can also make failure isolation, and hence resilience, easier. If one component of a system collapses, there are many techniques for implementing fault isolation, like bulkheads, circuit breakers and health checking. Having explicit boundaries between services is an obvious bulkhead. Of course, we need to understand that a system with distributed services has more sources of failure to deal with, but making the system more "cloud native" can mitigate these problems significantly.

Regarding scalability, while a monolithic system requires scaling the whole application as a single unit, with microservices developers can separately scale just the services that need scaling. For example, in the system described in figure 5, the video service could be scaled independently of the login and post services.

Figure 5: A system with three microservices which can be scaled independently from each other
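The fault-isolation techniques mentioned above, such as the circuit breaker, can be illustrated with a small sketch. The class below is a minimal, illustrative circuit breaker written for this thesis (not taken from any particular library): after a number of consecutive failures it "opens" and fails fast instead of repeatedly calling an unhealthy service.

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: after max_failures consecutive failures
    the circuit "opens" and calls fail fast for reset_after seconds,
    protecting the rest of the system from a sick service."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            # half-open: allow one trial call through after the timeout
            self.opened_at = None
            self.failures = 0
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        return result


breaker = CircuitBreaker(max_failures=2, reset_after=60.0)

def unreachable_service():
    raise ConnectionError("service down")

for _ in range(2):  # two real failures open the circuit
    try:
        breaker.call(unreachable_service)
    except ConnectionError:
        pass

try:
    breaker.call(unreachable_service)  # now fails fast, no remote call made
except RuntimeError as exc:
    print(exc)
```

A production system would typically use a battle-tested implementation rather than hand-rolling one, but the failure-counting and open/half-open logic is the essence of the technique.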

3.4 Communication mechanisms

With a monolithic architecture, a client-side browser or mobile app makes requests to the application, and the requests might be routed by a load balancer to one of several identical application instances. In a system designed with microservices, the monolith is replaced by a set of services, and most API endpoints handle data owned by different services. One possible approach is for the client to make multiple requests to many services, as in figure 6.

Figure 6: Direct API calls from a client often result in requests to multiple microservices

In this direct communication approach, each service has a public endpoint, sometimes with a different port for each one (like https://api.service1.com:8080/). The URL endpoint could map to the service's load balancer, so that the request is then handled by one of the instances. This solution might look good, but under careful consideration it has several drawbacks:

• High coupling: With this approach, the client application is coupled to the internal microservices. The code base of the client app needs to include the endpoint of each microservice and how to call the API provided by that service. If the endpoint or the API of one service changes, developers also have to change the client application.
• Chatty communication: This approach might require a single page/screen in the client web/app to call multiple services, which can result in many round trips between the client and the backend, adding significant latency.
• Security risks: Without an intermediate interface, all the services in the system are exposed and made available to the public, making them much less secure against common attacks, especially Distributed Denial of Service (DDoS) attacks.

• Cross-service concerns: As a result of direct client-to-microservice communication, the microservices which are made available to the outside must individually handle concerns such as authentication, authorization, etc. [3].

Compared to direct client-to-microservice communication, an API gateway might be a better approach for designing and building a large and complex microservice system. An API gateway lies between the client and a group of microservices and provides a single entry point for those services. A well-designed API gateway system, which may contain several gateways, not only solves the problems of the client-to-microservice model; other features like security and logging can also be integrated into the API gateway.

3.5 Inter-service communication

In a monolithic application, communication between components is quite effortless, using language-level method or function calls. One of the biggest challenges when changing from a monolithic system to microservices is inter-service communication. Clients and services use many different types of communication, which can be categorized by protocol and by whether the protocol is synchronous or asynchronous. In practice, most microservice systems apply both of these communication mechanisms in a flexible way.

• Synchronous: The most well-known synchronous protocol is HTTP. With this approach, a client sends a request to a service and then waits for the response. The point here is that the HTTP/HTTPS protocol itself is synchronous: regardless of whether the client code execution is asynchronous, the client can only continue its task once it has received the HTTP response from the service.
• Asynchronous: This approach is based on messaging protocols such as AMQP, MQTT, and STOMP. The message sender and the message receiver connect through a message broker and do not need to know about each other. The sender just sends messages to the broker and does not wait for a response; the messages are stored in the broker until the receiver is ready to process them. Using an asynchronous protocol also allows for multiple-receiver communication. The mechanism is event publish/subscribe, using an event-bus interface or message broker to propagate data updates between services through events. One drawback of systems with this approach is the complexity introduced by implementing the message broker [5].
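The publish/subscribe mechanism can be sketched with a tiny in-memory broker. This is only an illustration of the decoupling idea; a real broker (e.g. RabbitMQ speaking AMQP) would persist messages and deliver them asynchronously, whereas this sketch delivers them in-process. The topic name and payload are invented for the example.

```python
from collections import defaultdict

class MessageBroker:
    """In-memory stand-in for a message broker: publisher and
    subscribers know only the broker, never each other."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        # fire-and-forget: the publisher does not wait for replies
        for handler in self._subscribers[topic]:
            handler(message)


broker = MessageBroker()
log = []
# two independent receivers of the same event (multiple-receiver communication)
broker.subscribe("document.uploaded", lambda m: log.append(("notify", m["id"])))
broker.subscribe("document.uploaded", lambda m: log.append(("index", m["id"])))
broker.publish("document.uploaded", {"id": 7})
print(log)
```

Adding a third subscriber requires no change to the publisher, which is exactly the loose coupling the asynchronous style provides.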

3.6 Drawbacks

Microservices are considered a promising architecture for modern systems. But like any other architectural style, microservices have their trade-offs. Thoroughly understanding these drawbacks can help development teams make sensible choices when applying microservices to their specific context.

• Distribution issues: Despite improving modularity, microservices have some drawbacks. The most significant one is that the system components are distributed. Distribution brings several issues, one of which is the added complexity when the system is divided into microservices that communicate remotely. Remote calls are slower, and they can hurt performance when the system is poorly designed and needs to make many service calls to handle a single client request.

Another issue is reliability. Since services are distributed, remote calls can fail at any time because of network problems or failures of services. The system should therefore be designed properly to handle failure [10].

• Operational complexity: Despite the advantage of being able to deploy small and independent components quickly, handling and updating an increasing number of services puts a great amount of strain on the development team. To cope with this, the company should seriously consider adopting DevOps¹ from the moment they start with microservices, and the DevOps team needs to implement the system with effective deployment and monitoring automation.

• Infrastructure management: Microservices often introduce multiple databases, clusters of service instances, message brokers, data caches, etc. All such components must be set up and maintained, which would take a lot of time and money if the development team had to struggle with them every day [20]. Adopting cloud computing may be the best solution for the company to be free from such concerns.

4 Current system design

4.1 Current system design

Skolrutiner is a company with the aim to launch an efficient application to upload and access documents, in order to simplify document management in schools and youth care homes. This section will mainly discuss the design of the current system of Skolrutiner and analyse its disadvantages.

¹ DevOps is a set of software development practices that combine software development (Dev) and information-technology operations (Ops) to shorten the systems development life cycle while delivering features, fixes, and updates frequently in close alignment with business objectives [6].

4.2 About the Skolrutiner application

The Skolrutiner system consists of two synchronized parts: a website and a mobile phone application. The website is the tool for principals and administrators to:

• create/remove categories and upload/delete documents for each category
• send push notification messages to the management group, a specific group of users, or all employees
• create a call for meetings on the bulletin board.

The app (available on both Google Play and the App Store) is the employees' tool, with the following functions:

• A bulletin board providing the flow of information to the employees.
• Folders for storing documents and forms.
• Push notifications for group messages, calls for meetings, or documentation uploads.
• Connecting a printer to print out documents.

4.3 Current design

According to the current design, the system consists of the following components:

• Mobile application (written in React Native): makes calls to and receives responses from the API server.
• Web application (written in Angular): makes requests to and gets responses from the API server.
• API server (.NET MVC): handles all the logic, saves/retrieves data from the database, stores uploaded files in file storage, etc.
• Database (MSSQL): includes all tables of the system.
• File storage: storage for uploaded files.

The current Skolrutiner server, which contains the API server and the web application, runs on a single virtual machine (VM) instance. The database and the file storage for all uploaded documents are located in the same virtual machine. The overall architecture of the current server is described in figure 7.

Figure 7: Overall architecture of the current Skolrutiner system

Some significant issues of the current system:

• The API server and the web application currently run on the same VM instance. They often become unavailable and need to be restarted manually when tests using the curl tool send more than 100 requests at the same time. A possible short-term solution is to upgrade the instance by adding more RAM and CPU cores.
• There is only one MSSQL database server, running on the same machine as the API server and web application. As the number of records in the database tables grows, it takes more and more time for the database to execute even a single select query.
• The number of uploaded documents is increasing sharply and will exceed the current storage capacity in the near future. When that happens, the client has to pay more money to get more space for further documents.
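A concurrent-request test like the one described above can be reproduced with a short script. The sketch below uses only the Python standard library and, to stay self-contained, fires 100 simultaneous requests at a local stand-in server; pointing `url` at the real API server instead would reproduce the original test. The handler and port are placeholders, not part of the Skolrutiner system.

```python
import threading
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok")

    def log_message(self, *args):  # keep the per-request logging quiet
        pass

class QuietServer(ThreadingHTTPServer):
    request_queue_size = 128  # tolerate a burst of simultaneous connects

# Local stand-in for the API server; port 0 picks a free port.
server = QuietServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/" % server.server_address[1]

def fetch(_):
    with urllib.request.urlopen(url) as resp:
        return resp.status

# Fire 100 requests at the same time, as in the test described above.
with ThreadPoolExecutor(max_workers=100) as pool:
    statuses = list(pool.map(fetch, range(100)))

print(sum(s == 200 for s in statuses), "of 100 requests succeeded")
server.shutdown()
```

Against the current single-VM deployment, such a burst is exactly what made the services unavailable; against the proposed scalable design, all requests should succeed.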

5 Proposal of the new design

After describing the design of the current system and its weaknesses, this section discusses the client's requirements for a new system design, proposes a new design, and explains and evaluates it.

5.1 Requirements for the new design

The requirements for the new design depend on the functionalities of the system and the business model. According to the requests of the company Skolrutiner, the requirements for the new system design in this thesis work are the following:

• Scalability: One of the most important characteristics of a modern system is scalability. The new system should be easy to scale up to handle rising load, and to scale down when demand is below the normal level.
• Simplicity: Another requirement from the company is that the system should be comprehensible and simple enough for the development team to thoroughly understand and implement.
• Performance: User experience, which depends heavily on the performance of the system, is one of the most decisive factors for any business in this competitive world. The performance of the new design includes many aspects to consider, such as web page display speed, network connection, database performance, hardware, etc.
• Fault tolerance: The new system should continue operating properly, possibly at a reduced level rather than failing completely, in the event of failures of some of its components [7].
• Cost saving: Having a well-designed, high-performance system running smoothly is certainly significant for any IT company. However, cost-effectiveness is often the most important aspect for companies to consider when building their system.
• Uploading large files: An additional feature for the new system is effective handling of file uploads, especially for large files (in this case, files larger than 100 MB).

5.2 The new design

Based on the requirements for the new system mentioned in the previous section, this part proposes a new design for the Skolrutiner system with a detailed explanation.

5.2.1 Overall architecture

Figure 8 below depicts the overall design of the new system.

Figure 8: Overall microservice design of the new system

The new system is designed as a collection of microservices. There are two API gateways: one handles requests from normal users using the mobile application, and the other handles requests from school administrators accessing the system through a web browser. The system is divided into several services:

• Document Service: Manages uploaded documents.
• User Service: Manages users' details such as personal information, roles, and permissions.
• Authentication Service: Authenticates users in the system.
• Group Service: Connects and manages all groups of users in the system.
• School Service: Manages the different schools in the system.

• Notification Service: Manages all push notifications from the system to the users in order to notify them of real-time messages.

In this system model, the two API gateways lie between the clients and the services, except that the Notification Service has a direct connection to clients in order to send real-time notifications. Requests from the mobile application go to the Mobile API gateway; the gateway then contacts the relevant services, asks for the necessary data, and sends responses back to the user. Requests from the client web browser are handled in the same way, but through the Web API gateway.
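To illustrate the gateway's routing role, the sketch below maps request paths to the services listed above. The path prefixes and service names are illustrative assumptions, not the company's actual API.

```python
# Minimal sketch of gateway routing: each gateway owns a table mapping
# a request path prefix to the internal service that handles it.
# Prefixes and service names are assumptions for illustration.
MOBILE_ROUTES = {
    "/documents": "document-service",
    "/users":     "user-service",
    "/auth":      "authentication-service",
    "/groups":    "group-service",
    "/schools":   "school-service",
}

def route(path: str, routes: dict) -> str:
    """Return the name of the service responsible for a request path."""
    for prefix, service in routes.items():
        if path.startswith(prefix):
            return service
    raise KeyError(f"no route for {path}")

print(route("/documents/42", MOBILE_ROUTES))  # document-service
print(route("/auth/login", MOBILE_ROUTES))    # authentication-service
```

The Web API gateway would hold a similar table, plus a route for the web content service that serves the frontend files.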

5.2.2 API gateway and service design

The API gateways are implemented as single entry points for the web and mobile application clients. When a client in a web browser makes a request with a URL to retrieve a web page, the API gateway forwards the request to the web content service and returns the frontend content (HTML, CSS, and JavaScript files). Once the frontend content has been sent back to the client and rendered by the browser, the web page runs a script that makes requests to the API gateway asking for data. The API gateway routes the requests to the right services, receives the responses from the services, and then sends them back to the client. The main difference between the web and the mobile API gateways is that the requests from the mobile application might not be the same as those from the web browser, and the mobile API gateway does not need to transfer the web frontend content. Figure 9 below describes the detailed architecture of the API gateways in the new design.

Figure 9: Detailed architecture of API Gateways

According to figure 9, all requests from clients are filtered by the Web Application Firewall (WAF) before reaching the Notification Service or the API gateways. For a cloud-based web application, a WAF has become a requirement. The WAF improves the web application's security and availability by providing protection against application layer (layer 7) Distributed Denial-of-Service (DDoS) attacks, SQL injection, cross-site scripting, etc. There are several well-known cloud-based WAF services with similar functionality that the client can consider, such as Cloudflare WAF, Amazon Web Services (AWS) WAF, Google Cloud Armor, and Azure WAF.

A request passed by the firewall goes to the API gateway's load balancer, which is placed in front of the API gateway instances and distributes the client requests across a group of identical API gateway instances using a suitable load balancing algorithm. Round robin is the most widely used algorithm: it distributes the requests across the instances in turn, and it is suitable for a group of instances that have similar computing resources and handle equivalent loads. Many cloud providers offer load balancing services that provide highly available load balancing without the customer having to manage the load balancing infrastructure.

The Auto Scaling group in figure 10 consists of multiple identical instances that all have the same functionality. The group ensures that the system has an appropriate number of API gateway instances available to handle the load.
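The round-robin algorithm described above can be sketched in a few lines; the instance names are placeholders.

```python
# Sketch of round-robin load balancing: requests are handed to the
# identical gateway instances in turn. Instance names are placeholders.
from itertools import cycle

class RoundRobinBalancer:
    def __init__(self, instances):
        self._next = cycle(instances)  # endless rotation over instances
    def pick(self):
        """Return the instance that should receive the next request."""
        return next(self._next)

lb = RoundRobinBalancer(["gw-1", "gw-2", "gw-3"])
print([lb.pick() for _ in range(6)])
# ['gw-1', 'gw-2', 'gw-3', 'gw-1', 'gw-2', 'gw-3']
```

Because every instance is identical, the rotation alone spreads the load evenly; weighted or least-connections algorithms would be needed if the instances differed in capacity.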

Figure 10: Auto scaling group of API gateway instances

Almost every popular cloud service provider nowadays, such as Amazon Web Services, Google Cloud, and Azure, enables users to set up their own auto-scaling group of VM instances. An auto-scaling group can easily be integrated with the cloud load balancer, so that all instances launched in the group are automatically configured to receive requests from the load balancer. There are three main components of an auto-scaling group in a cloud service, and all of them can easily be defined in the cloud service portal.

1. Group: a logical group containing the set of VM instances with similar properties for scaling and management schemes. Users can specify the computing resources of each instance (RAM, CPU, storage) and a minimum number of available instances in the group, and the cloud auto-scaling will ensure that minimum is maintained. The auto-scaling group maintains availability by performing periodic health checks on instances, terminating unhealthy instances, and launching new instances to replace them. Users can also specify the desired and maximum number of instances in the group.

2. Launch configuration: a configuration template for the auto-scaling group to launch VM instances. Users can specify the cloud VM image, instance type, cryptographic key pair, security group, firewall configuration, etc. In addition, users have the option to perform customized automated configuration tasks to deploy the application by running scripts after instance launch to update the system, install necessary packages, pull the source code, and run the application. The launch configuration can be modified after creation and can be saved for use with other auto-scaling groups.

3. Scaling plan: the auto-scaling feature provides several ways to define the auto-scaling policy, which describes when and how to scale. Some available options for the scaling plan are maintaining the current instance level, manual scaling, scheduled scaling, and scaling on demand. In order to make the scaling work easier while still ensuring availability, I suggest scaling based on demand. With this approach, when the instance monitoring feature of the cloud-hosted load balancing service detects that CPU utilization crosses a specified threshold for a specified duration (usually above 70 percent for 300 seconds), the auto-scaling dynamically scales out the group by launching a new instance to share the heavy load. A scale-in policy is applied when the traffic stays low for a period of time (usually below 50 percent for 300 seconds): the auto-scaling terminates an instance to reduce the cost.
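The demand-based scaling plan suggested above can be sketched as a simple decision function. The thresholds mirror the 70/50 percent and 300-second values mentioned in the text; the CPU samples are simulated monitoring data.

```python
# Sketch of the demand-based scaling policy: scale out when average CPU
# stays above 70% for 300 s, scale in when it stays below 50% for 300 s.
# Sampling interval and thresholds are assumptions mirroring the text.
def scaling_decision(cpu_samples, period=300, interval=60,
                     high=70.0, low=50.0):
    """Return 'scale-out', 'scale-in' or 'hold' from recent CPU samples."""
    window = period // interval        # samples needed to cover the period
    recent = cpu_samples[-window:]
    if len(recent) < window:
        return "hold"                  # not enough monitoring data yet
    if all(s > high for s in recent):
        return "scale-out"             # sustained high load: add instance
    if all(s < low for s in recent):
        return "scale-in"              # sustained low load: remove instance
    return "hold"

print(scaling_decision([75, 80, 90, 85, 95]))  # scale-out
print(scaling_decision([40, 45, 30, 20, 10]))  # scale-in
```

In a real deployment this decision is made by the cloud provider's monitoring and auto-scaling services; the sketch only shows the shape of the policy the user configures.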

Figure 11 illustrates the design of the services in the new system. The details of the figure are explained below.

Figure 11: Service design in the system

In the service design, the services are structured similarly to the API gateways: each service is an auto-scaling group of instances. These instances receive requests from a service load balancer, which in turn receives requests from the API gateway. With this design, API calls sent to the API gateway are passed on to the services, which do the work; the gateway then handles the returned results and responds to the client.

In a microservice system with an API gateway, to avoid too many round trips between the API gateway and the services, a single request from the user to the API gateway should ideally invoke only one service to fetch the response. This is only possible with data replication across services. For example, to get a list of the names of all documents uploaded by students in a specified group, the document service needs to know the ID of the group that the uploading user belongs to. If a user joins another group, that update event in the group service needs to be emitted and applied to the document service.

To achieve eventual consistency when the state of a service changes, asynchronous communication across the group of services is applied using message brokers. Service instances keep a connection to the broker, publish events about data changes to a topic queue, and subscribe to topic queues to receive change events from the other services whose data they replicate. In this example, the document service keeps a table storing the members of each group.

The message brokers can be deployed in the cloud with a highly available solution such as an active/standby model, which is supported by many cloud message broker services. This model consists of two brokers in two different availability zones, sharing the same storage. Normally, one of the broker instances is active at any time, whereas the other is only in standby mode.
If one of the instances goes down or is under maintenance, it is removed from the service, and the other instance becomes active shortly afterwards [1]. Of course, zero message loss is very important to ensure data consistency. While the standby switches to active mode, all undelivered messages, which are stored in durable storage and replicated across several availability zones, can be recovered and handled by the new active broker instance.

There are two database instances in the service design. The database is deployed as a cloud service, which consists of:

• Primary database: all write operations from the services are handled by the primary database. This instance is also responsible for backups to the cloud backup storage.
• Read replica: as the name suggests, the read replica only executes read queries from the services; when there is a change in the primary database, the read replica is updated immediately. (There are two options for propagating the change, synchronous and asynchronous; in this design, I choose synchronous replication to ensure data consistency.) If the primary database goes down, the read replica becomes the primary, and the cloud database service launches a new instance as a new read replica.
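The event-driven replication between the group service and the document service described earlier can be sketched with a minimal in-memory broker. Topic and field names are illustrative; a real deployment would use a cloud message broker service.

```python
# In-memory sketch of event replication for eventual consistency: the
# Group Service publishes a "user changed group" event to a topic, and
# the Document Service, subscribed to that topic, updates its own local
# replica of group membership. Names are illustrative assumptions.
from collections import defaultdict

class Broker:
    def __init__(self):
        self._subs = defaultdict(list)   # topic -> list of handlers
    def subscribe(self, topic, handler):
        self._subs[topic].append(handler)
    def publish(self, topic, event):
        for handler in self._subs[topic]:
            handler(event)               # deliver event to each subscriber

broker = Broker()

# The Document Service's local replica of "which group is each user in".
document_service_membership = {"alice": "group-1"}

broker.subscribe(
    "group-events",
    lambda e: document_service_membership.update({e["user"]: e["group"]}))

# Group Service emits an update; the Document Service replica converges.
broker.publish("group-events", {"user": "alice", "group": "group-2"})
print(document_service_membership["alice"])  # group-2
```

With this replica in place, the document service can answer "documents uploaded in group X" on its own, without a synchronous call to the group service.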

5.2.3 Authentication and authorization in the system

Under the microservice architecture, an application consists of multiple microservice processes, and the access request for each microservice needs to be authenticated and authorized. With stateful sessions, the session information of an authenticated user is attached to, and only handled by, one authentication service instance. When the load balancer forwards a request from that user to another instance, the user might be treated as unauthenticated. This methodology is not suitable for a horizontally scaled design such as microservices.

To solve this problem, the new design uses a stateless authentication mechanism based on client tokens. While sessions are stored on the server, tokens are kept in the client browser in the form of cookies, holding the user's identity information. Therefore, the server does not need to store and track the user's status [12].

Figure 12: Stateless authentication process in the new design

Figure 12 describes the authentication process in the new design. When a user logs in, the request with the user's credentials is forwarded to one of the authentication service instances. If the username and password are valid, the instance grants a JWT², which is forwarded by the API gateway and sent back to the user in the response. The token is then stored as a cookie in the client web browser. The next time the user makes a request, the token is attached to the request and forwarded to the authentication service by the API gateway. If the token is valid, the authentication service responds to the API gateway that the user is authenticated. The request, together with the validated user identity, is then sent from the API gateway to the resource service. The authorization work is done by the resource service, which decides whether the user with that identity may access the resource or make the requested change. The response from the resource service is then returned through the API gateway, back to the client.

² JSON Web Token (JWT) is an open standard (RFC 7519) that defines a compact and self-contained way of securely transmitting information between parties as a JSON object. This information can be verified and trusted because it is digitally signed. JWTs can be signed using a secret (with the HMAC algorithm) or a public/private key pair using RSA or ECDSA [9].
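To illustrate the token mechanism, the sketch below issues and verifies an HMAC-signed token using only the Python standard library. The secret key is a placeholder, and a production service should use a vetted JWT library rather than hand-rolled signing.

```python
# Minimal sketch of HMAC-signed (HS256-style) token issuing and
# verification. The secret is a placeholder; real services should use a
# maintained JWT library and also check expiry claims.
import base64
import hashlib
import hmac
import json

SECRET = b"demo-secret"  # assumption: key shared by auth service instances

def _b64(data: bytes) -> str:
    """URL-safe base64 without padding, as used by JWTs."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

def issue_token(payload: dict) -> str:
    header = _b64(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    body = _b64(json.dumps(payload).encode())
    sig = _b64(hmac.new(SECRET, f"{header}.{body}".encode(),
                        hashlib.sha256).digest())
    return f"{header}.{body}.{sig}"

def verify_token(token: str) -> bool:
    header, body, sig = token.split(".")
    expected = _b64(hmac.new(SECRET, f"{header}.{body}".encode(),
                             hashlib.sha256).digest())
    return hmac.compare_digest(sig, expected)

token = issue_token({"sub": "user-1", "role": "student"})
print(verify_token(token))                       # True
print(verify_token(token.replace(".", ".x", 1)))  # False (tampered)
```

Because any authentication service instance holds the same secret, any instance can validate the token, which is exactly what makes the scheme stateless and horizontally scalable.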

5.2.4 File uploading in the new design

The idea of file uploading in the new design is based on the concept of a pre-signed URL, which is a feature of cloud file storage. The main advantage of the pre-signed URL method is that it significantly reduces the traffic load between the separate components of the microservice system by providing a way to upload files directly to the cloud storage. The file no longer needs to be forwarded from the client, through the API gateway, to the document service, and finally uploaded to the cloud storage.

Figure 13: File uploading with pre-signed URL

Figure 13 shows how file uploading works with the help of a pre-signed URL. The cloud storage is private, and only the object owner (the document service) has permission to access it. However, the document service can allow users to access the documents by creating a pre-signed URL using its own cloud credentials. A pre-signed URL is a URL that grants temporary access to a specific document in the cloud storage. It allows users to read the document with the GET method or to upload documents with the POST method within a specified period of time. The basic flow of the process is as follows: when a user wants to upload a document, a request is forwarded by the API gateway to the document service. The document service, with its credentials, uses the cloud SDK to generate a pre-signed URL and returns it to the user through the API gateway.
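The pre-signed URL concept can be illustrated with a small sketch. The signing scheme, key, and storage endpoint below are simplified placeholders; in practice the cloud SDK generates the URL and the storage provider verifies it.

```python
# Conceptual sketch of a pre-signed URL: the Document Service signs the
# object key, HTTP method, and an expiry time, so the storage front end
# can verify access without any database lookup. Key and endpoint are
# placeholders; real systems use the cloud SDK for this.
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SERVICE_KEY = b"document-service-secret"    # assumed service credential
BUCKET_URL = "https://storage.example.com"  # placeholder storage endpoint

def presign(key, method="GET", ttl=900):
    """Return a URL granting temporary `method` access to object `key`."""
    expires = int(time.time()) + ttl
    msg = f"{method}:{key}:{expires}".encode()
    sig = hmac.new(SERVICE_KEY, msg, hashlib.sha256).hexdigest()
    return f"{BUCKET_URL}/{key}?" + urlencode(
        {"method": method, "expires": expires, "signature": sig})

def verify(key, method, expires, signature):
    """Storage-side check: signature valid and URL not yet expired."""
    if int(expires) < time.time():
        return False
    msg = f"{method}:{key}:{expires}".encode()
    good = hmac.new(SERVICE_KEY, msg, hashlib.sha256).hexdigest()
    return hmac.compare_digest(signature, good)

url = presign("reports/term1.pdf", method="POST")
q = {k: v[0] for k, v in parse_qs(urlparse(url).query).items()}
print(verify("reports/term1.pdf", q["method"], q["expires"], q["signature"]))
# True
```

The client can now send the large file straight to the storage endpoint with this URL; neither the API gateway nor the document service ever carries the file's bytes.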

5.3 Evaluation

Now that the technical background, as well as the old and the new system designs, have been presented, this section concentrates on evaluating the new design. The evaluation criteria are limited to the requirements for the new system from the Skolrutiner company. There are also discussions of the difficulties, and advice for the company for when they are in the process of implementing the new design.

Scalability

A scalable system can easily scale up to handle rising load and scale down when demand decreases below the normal level. With the help of cloud computing, the new design can be considered a scalable microservice system, since the service instances and API gateway instances are structured as auto-scaling groups that scale up and down based on the specified policy. The message brokers can receive and deliver a large number of messages (a message broker with a normal computing resource level on Amazon Web Services can handle thousands of messages concurrently [2]). The development team can easily add more read instances to the database to handle an increasing number of complex read queries.

Simplicity

Simplicity means that the new system design is comprehensible and simple enough for the development team to understand and implement. The new design cannot achieve this goal, due to the complexity of microservices compared to the current system running as a monolith. There would be a huge amount of work in implementing and maintaining the services, API gateways, databases, message brokers, and firewall. Although deployment tasks such as endpoint and firewall configuration and policy setting are much more straightforward with the help of cloud computing, the job is still considerably more complicated and challenging.

Cost saving

Every IT company not only needs its system to perform well and run tasks smoothly, but is also highly concerned about the cost of the system. With the help of cloud computing, instead of having a separate team responsible for building, maintaining, and possibly upgrading a physical server, the company only needs to pay the cloud provider for the services it uses. The auto-scaling groups for the services and API gateways can scale down by removing unnecessary instances based on the specified policies, which saves a lot of cost while ensuring system stability.

There are two message brokers (active/standby) and two database instances (primary/read replica) for each service, to reduce the cost while still being able to maintain the availability and reliability of the system. The development team can monitor the status of those components and simply add more instances when there is a warning from the cloud service.

Fault tolerance

The API gateway and the services are built as groups of instances: when an instance fails, the cloud service removes it and replaces it with a new one. The database of each service has at least two instances, a primary and a read replica, and there are also two message brokers, one active and one standby. With all the proposed solutions, the new system can be considered fault tolerant, meaning that it can continue functioning in the event of failures of some of its components.

Upload large files

The goal of uploading large files is achieved by applying the method of pre-signed URLs, a feature of cloud file storage. The use of pre-signed URLs not only provides a more secure way to upload files but also removes a large amount of traffic load between the different components of the microservice system.

Performance

The performance of the new design involves many aspects to consider, such as web page display speed, network connection, database performance, hardware, etc. Since the time for the bachelor thesis is limited to 10 weeks, I could not evaluate the performance of the new design. A proper evaluation can only be done if the development team at Skolrutiner decides to re-implement the system according to the microservice architecture proposed in this thesis.

6 Conclusion

This section concludes the report by summarizing the contribution of this thesis and outlining future work.

6.1 Contribution

This thesis helps the company Skolrutiner gain general knowledge about cloud computing and microservices: their characteristics, advantages, and drawbacks in different contexts. The thesis also points out the disadvantages of the current system design and proposes a new design of cloud-based microservices for the system. This thesis work includes the structure of the new design with a detailed explanation, a theoretical evaluation of the design based on the specified requirements, as well as difficulties and advice for the company for when they are in the process of implementing the new design.

The new system design achieves high availability and scalability, and it satisfies almost all of the requirements for the new system. However, the thesis does not give any advice on whether Skolrutiner should re-implement the system from scratch based on the new cloud-based microservice design, or whether it would be better to gradually transition to the new design. The decision is left to Skolrutiner, but I hope that this thesis will guide them in their decision.

6.2 Future work

Owing to the complexity of the many technical aspects involved and the limited time of the thesis project, many adaptations, tests, and experiments have been left for future work and are open for further research. Several tasks could be carried out in the future to build on this thesis work. Depending on the choice of the company, either a comprehensive roadmap for gradually switching to the new system design, or detailed instructions for implementing and deploying the new system from scratch, should be drawn up and analyzed carefully. In future work, the different programming languages, frameworks, and other tools should also be compared and selected properly, considering both technical and business angles.

After the company has set up the new system, several tests and experiments need to be carried out to evaluate the system's performance, as well as the other requirements mentioned in section ??. It is also very important for the company to review the system carefully to ensure compliance with the General Data Protection Regulation (GDPR), which is essential to make the system legitimate in the EU.

References

[1] Amazon MQ active/standby broker for high availability. [ONLINE]. Available: https://docs.aws.amazon.com/amazon-mq/latest/developer-guide/active-standby-broker-deployment.html. Accessed on 2019-06-07.

[2] Amazon's SQS performance and latency. [ONLINE]. Available: https://softwaremill.com/amazon-sqs-performance-latency/. Accessed on 2020-05-22.

[3] The API gateway pattern versus the direct client-to-microservice communication. [ONLINE]. Available: https://docs.microsoft.com/en-us/dotnet/standard/microservices-architecture/architect-microservice-container-applications/direct-client-to-microservice-communication-versus-the-api-gateway-pattern. Accessed on 2019-06-01.

[4] Cloud computing: A complete guide. [ONLINE]. Available: https://www.ibm.com/cloud/learn/cloud-computing. Accessed on 2019-04-12.

[5] Communication in a microservice architecture. [ONLINE]. Available: https://docs.microsoft.com/en-us/dotnet/standard/microservices-architecture/architect-microservice-container-applications/communication-in-microservice-architecture. Accessed on 2019-06-09.

[6] DevOps. [ONLINE]. Available: https://en.wikipedia.org/wiki/DevOps. Accessed on 2019-06-13.

[7] Fault tolerance. [ONLINE]. Available: https://en.wikipedia.org/wiki/Fault_tolerance. Accessed on 2019-06-03.

[8] Five characteristics of cloud computing. [ONLINE]. Available: https://www.controleng.com/articles/five-characteristics-of-cloud-computing/. Accessed on 2019-05-26.

[9] Introduction to JSON Web Tokens. [ONLINE]. Available: https://jwt.io/introduction/. Accessed on 2019-06-15.

[10] Microservice trade-offs. [ONLINE]. Available: https://martinfowler.com/articles/microservice-trade-offs.html. Accessed on 2019-06-05.

[11] Microservices. [ONLINE]. Available: https://martinfowler.com/articles/microservices.html. Accessed on 2019-04-30.

[12] Microservices authentication and authorization solutions. [ONLINE]. Available: https://medium.com/tech-tajawal/microservice-authentication-and-authorization-solutions-e0e5e74b248a. Accessed on 2019-06-08.

[13] What is cloud bursting? [ONLINE]. Available: https://azure.microsoft.com/en-us/overview/what-is-cloud-bursting/. Accessed on 2019-12-29.

[14] What is cloud computing? Everything you need to know about the cloud, explained. [ONLINE]. Available: https://www.zdnet.com/article/what-is-cloud-computing-everything-you-need-to-know-from-public-and-private-cloud-to-software-as-a/. Accessed on 2019-05-25.

[15] What is IaaS? Infrastructure as a service. [ONLINE]. Available: https://azure.microsoft.com/en-gb/overview/what-is-iaas/. Accessed on 2019-05-24.

[16] What is PaaS? Platform as a service. [ONLINE]. Available: https://azure.microsoft.com/en-gb/overview/what-is-paas/. Accessed on 2019-05-23.

[17] Peter Mell and Timothy Grance. The NIST Definition of Cloud Computing, 2011. National Institute of Standards and Technology, Special Publication 800-145.

[18] Sam Newman. Building Microservices: Designing Fine-Grained Systems. O'Reilly Media, Inc., 1st edition, 2015.

[19] Vinod Keshaorao Pachghare. Microservice architecture for cloud computing. Journal of Information Technology and Sciences, Volume 2, Issue 1, April 2016.

[20] Shahir Daya, Nguyen Van Duy, Kameswara Eati, Carlos M. Ferreira, Dejan Glozic, Vasfi Gucer, Manav Gupta, Sunil Joshi, Valerie Lampkin, Marcelo Martins, Shishir Narain, and Ramratan Vennam. Microservices from Theory to Practice. IBM Redbooks, 1st edition, 2015.

[21] Matt Stine. Migrating to Cloud-Native Application Architectures. O’Reilly Media, Inc., 1st edition, 2015.
