Design of Scalable and Resilient Applications using Microservice Architecture in PaaS Cloud

David Gesvindr, Jaroslav Davidek and Barbora Buhnova Lab of Architectures and Information Systems, Faculty of Informatics, Masaryk University, Brno, Czech Republic

Keywords: , , Architecture Design.

Abstract: With the increasing adoption of microservice architecture and popularity of Platform as a Service (PaaS) cloud, design is in many domains leaning towards composition of loosely interconnected services hosted in the PaaS cloud, which in comparison to traditional multitier applications introduces new design challenges that software architects need to face when aiming at high scalability and resilience. In this paper, we study the key design decisions made during microservice architecture design and deployment in PaaS cloud. We identify major challenges of microservice architecture design in the context of the PaaS cloud, and examine the effects of architectural tactics and in addressing them. We apply selected tactics on a sample e-commerce application, constituting of microservices operated by Azure Service Fabric and utilizing other supportive PaaS cloud services within Microsoft Azure. The impact of the examined design decisions on the throughput, response time and scalability of the analyzed application is evaluated and discussed.

1 INTRODUCTION ware architects, with increasing difficulty to navigate among the enormous number of available design op- tions. This creates the need to examine the impact of Microservice architecture is becoming a dominant ar- applicable design patterns in the context of microser- chitectural style in the service-oriented software in- vices and PaaS cloud. dustry (Alshuqayran et al., 2016). In contrast to tra- Although some guidance on the microservice im- ditional multitier applications where the role of soft- plementation in the cloud exists, systematic support ware components is played mainly by software li- for software architects interconnecting microservices braries deployed and executed in a single process to- with other available PaaS cloud services (such as stor- gether with the main application, in microservice ar- age and communication services) is not available, chitecture, individual components become truly au- leaving them to rely on shared experience with typ- tonomous services (Fowler, 2014). There are multi- ically a single application scenario, without consider- ple advantages of this approach—change in a single ing other strategies or alternative designs. component does not require the entire application to In this paper, we study different architectural deci- be redeployed, communication interfaces become ex- sions which are being considered during microservice plicit, and components become more decoupled and architecture design in connection with the PaaS cloud. independent of each other, as illustrated in Figure 1. As a contribution of this work, we elaborate on both Separation of services into functions that can in- teract via interfaces is not new, same as methods Deployed Multitier Architecture Deployed Microservice Architecture to implement such separation in the framework of Presentation UI UI UI Deployed Deployed Deployed Compo Compo Compo Service A Service B Service C Tier -nent -nent -nent service-oriented architecture (Sill, 2016). But as Presentation Presentation Presentation Tier Tier Tier Application Compo Compo Compo emphasized by Sill (Sill, 2016), recent implementa- -nent -nent -nent Tier Application Application Application tions of microservices in cloud settings take service- Tier Tier Tier Data Shared Tier Service Service Service oriented architecture to new limits. Possibilities of Storage Storage Storage rapid scalability and use of rich PaaS (Platform as a Figure 1: Separation of components and their deployments Service) cloud services open new design possibilities in a traditional multitier architecture and in microservice ar- for microservices but also bring new threats for soft- chitecture.

619

Gesvindr, D., Davidek, J. and Buhnova, B. Design of Scalable and Resilient Applications using Microservice Architecture in PaaS Cloud. DOI: 10.5220/0007842906190630 In Proceedings of the 14th International Conference on Software (ICSOFT 2019), pages 619-630 ISBN: 978-989-758-379-7 Copyright c 2019 by SCITEPRESS – Science and Publications, Lda. All rights reserved

ICSOFT 2019 - 14th International Conference on Software Technologies

documented and undocumented design practices and dently of the platform as the used patterns are plat- solutions, and study their effects, including identifi- form independent and can be applied also to differ- cation of several surprising takeaways. As part of this ent cloud provider (Amazon, Google Cloud) offer- paper, we have designed and implemented a highly ing container hosting services and managed NoSQL configurable e-commerce application (an e-shop solu- . tion), which is designed in a way that its architecture Overall, we have evaluated 105 different bench- can be easily reconfigured to support thorough evalua- mark scenarios using 4 cluster configurations (5, 10, tion of the impact of various design decisions on mul- 15, 20 nodes cluster) involving 4 different storage ser- tiple performance related quality attributes (through- vices in the PaaS cloud, 2 communication strategies put, response time and scalability). When designing (synchronous and asynchronous) and 11 design pat- this application, we paid special attention to the se- terns. lection of real use cases and realistic architecture de- The paper is structured as follows. After the dis- sign, being overall as close as possible to a produc- cussion of related work in Section 2, outline of the tion version of such an application. The highly con- background in Section 3, and outline of key architec- figurable architecture of this application gives us a tural decisions that influenced the separation of ser- unique opportunity to provide comparison of multiple vices in Sections 4, Sections 5, 6 and 7 are dedi- versions of the same application using microservice cated to the presentation and evaluation of architec- architecture to evaluate and isolate impact of differ- tural concerned with service storage, communication ent design decisions, which is rarely seen in existing between microservices and application resilience. We work. Software architects can benefit from our work conclude the paper in Section 8. while designing their own applications when facing the same design decisions. With the help of this work they shall now be able to make better-informed deci- sions and choose the right architectural patterns lead- 2 RELATED WORK ing to desired quality of the application, or avoid un- documented problems caused by chosen architecture When designing microservice architectures, software or used PaaS cloud services. architects are currently often relying on known de- sign patterns and tactics (Gamma et al., 1995; Fowler, For the purpose of effective benchmark execution, 2002), which are however not validated in the con- we also implemented an automated client application text of microservices or PaaS cloud. Alderado et that can reconfigure the deployed application, initial- al. point to an absence of repeatable empirical re- ize sample data seeding on the server based on user search on the design and development of microser- requirements, execute any mix of workload, display vice applications (Aderaldo et al., 2017). New design key performance metrics and export detailed perfor- guidelines for microservice architectures are emerg- mance counters in JSON format. This application was ing (Sill, 2016; Wolff, 2016; Nadareishvili et al., used for our benchmarks discussed in this paper. 2016; Newman, 2015), which however do not con- For the implementation of the sample application, tain evaluated performance impacts of recommended we have decided to use Microsoft technologies and patterns on realistic implementations. At the same cloud services. The application is developed in .NET time catalogs of design patterns for the design of PaaS framework using Azure Service Fabric (Mic, 2018), cloud applications are becoming available (Erl et al., which is an open-source application platform for sim- 2013; Wilder, 2012; Homer et al., 2014; Mic, 2017), plified management and deployment of microservices but without measured impacts of their combinations that can run in Microsoft Azure cloud, on-premise in- and their use in a context of microservices. Vali- frastructure and any other cloud infrastructure. This dations of microservice architecture design patterns platform was chosen because of its robustness (Mi- are published by companies that have deployed mi- crosoft uses internally this technology to operate large croservices (Richardson, 2017; Net, 2015) and want scale services in Microsoft Azure, eg. Azure SQL to share their experience with transition to microser- Database, Cosmos DB and others), and rich platform vice design but not mentioning PaaS cloud deploy- services, which simplify microservice development. ment. Instead they focus on their currently deployed On the other hand, the features that we used can be architecture, its behavior and sometimes related per- manually implemented in other frameworks and the formance characteristics rather than transferable take- same results can be obtained by hosting small web ap- aways. Due to the size of their projects, they cannot plications communicating with each other via REST afford to implement multiple variants of their applica- , hosted in Docker and orchestrated by Kuber- tion using different design patterns and compare per- netes. Overall our results are generally valid indepen- formance of those to isolate the impact of used de-

620 Design of Scalable and Resilient Applications using Microservice Architecture in PaaS Cloud

sign patterns. This is where our work complements with very effective operation costs. the current state of the art via offering more guidance Microsoft Azure used for our implementation pro- for the actual decision making. A case-study eval- vides us also with the possibility to host Azure Ser- uating the impact of transition from multi-tiered ar- vice Fabric cluster in form of a fully managed ser- chitecture to microservice architecture on throughput vice with very low deployment and maintenance ef- and operation costs is described in (Villamizar et al., fort. The cluster itself is deployed and operated at no 2015), but not in the context of PaaS cloud, as it is de- cost, we are only billed for virtual machines used to ployed to IaaS virtual machines. Challenges related host our services. New virtual machines can be easily to transaction processing and data consistency across provisioned and released based on the overall utiliza- multiple microservices are described in (Mihinduku- tion of the cluster, to take advantage of cloud elasticity lasooriya et al., 2016; Pardon et al., 2018). Criteria and to optimize operation costs. for microservice benchmarks and a list of sample ap- Despite all mentioned advantages of microservice plications are presented in (Aderaldo et al., 2017), but deployment to the PaaS cloud, there are associated without optimization for PaaS cloud and its services. threats related to missing guidance on how to design microservices in the PaaS cloud context. As there is a very rich set of PaaS cloud services currently avail- 3 MICROSERVICES IN PaaS able that can be utilized by the microservice appli- cation (storage, messaging, etc.) and have a direct CLOUD impact on the quality of the service, it becomes very complex for a software architect to design the mi- An indisputable advantage of operating microservices croservice application so that it meets all given quality in the PaaS cloud is the availability of a rich set criteria. In this paper we compare and discuss mul- of complex ready-to-use services, providing software tiple design choices and their impacts learned from architects with complex functionality, high service running over 100 experiments with variable software quality (hight scalability and availability guaranteed architecture of our microservice application. by SLA), low-effort deployment, and thus easy in- tegration within the developed application. More- over, multiple services are not even available for on- premise deployment, or are costly to deploy and op- 4 SERVICE DECOMPOSITION erate with the same quality of service. DECISIONS Microservices hosted in the PaaS cloud can bene- fit very well from cloud elasticity and measured ser- This section describes the key design decisions that vice (Mell and Grance, 2011), which allows us to eas- shaped the overall architecture of the designed appli- ily scale individual services by allocating new com- cation and led to separation of the application into a pute resources and pay only for the time when the set of interconnected microservices, refined from the service instance is running. As part of low-effort de- initial set of domain entities, identified using the Do- ployment of microservices, we can take advantage of main Driven Design principles (Evans, 2003). container orchestration as a service, which are ser- vices provided by majority of the cloud providers, 4.1 Bounded Context used to manage and orchestrate applications deployed in form of containers. Very often, it is a preconfigured To split an application with a single data model into and fully managed cluster. And for ex- a set of microservices, the Bounded Context design ample in Microsoft Azure, the Azure Container Ser- principle suggests the division of large data model vice is not even billed. One needs to pay only for into a set of smaller models with explicitly defined the compute resources used to host the containers it- relationships. Proper application of the bounded con- self. To support rapid scalability, we do not need to text principle is one of the major challenges when allocate virtual machines with the Kubernetes cluster designing a microservice architecture, as it becomes to host containerized applications. Instead, we can very difficult to find the right balance between very take advantage of PaaS cloud container-hosting ser- small microservices having a single data entity, and vices (e.g. Azure Container Istances), which is a fully services handling multiple entities that tend to ulti- managed service providing per-second billing based mately end up being too complex. on the number of created instances, the memory and The following points characterize the advantages cores selected for the instances, and the number of of setting the bounded context small (Fowler, 2014): seconds those instances are run. Such a rapid elastic- explicit service dependencies, independent scalability ity allows us to scale microservices almost instantly and high-availability.

621 ICSOFT 2019 - 14th International Conference on Software Technologies

On the other hand, the problems that arise with the service interface depending on the storage technology utilization of small bounded contexts are: used. If the microservices are stateful, the use of the • Data integrity enforcement – Referential integrity partitioning at the storage level is highly advisable, so of entities stored at a single microservice can be that every node hosting the service only stores a spe- easily enforced at the storage level, but when re- cific portion of data based on the partition key. Selec- ferring entities are stored in different microser- tion of the partition key must be done very carefully vices, it becomes very complex to guarantee that as the key will be then required by most of the service the referenced entity exists. methods to be able to determine which instance can process the request, as depicted in Figure 3. At the • Cross-service queries – When the user wants to same time, requesting data across multiple partitions access data that is distributed among multiple mi- becomes a very complex operation, as illustrated in croservices, it is necessary to query all services Figure 4, which needs to be minimized by design. participating in the query and then combine re- lated data, which is a complex operation and may have negative impact on service response time as 4.3 Summary of Recommendations shown in Section 5.6. • Cross-service transactions – Distributed transac- Data model of the application must be split into mul- tions are generally complex to implement and tiple microservices with adequate level of granular- when transaction modifies data across multiple ity. Because of better consistency enforcement and microservices, it requires the developer to im- higher communication efficiency closely coupled en- plement additional logic to ensure that transac- tities should be in the same microservice. An impor- tions on all services are either all committed or tant point to emphasize is that microservices should all rolled back. also be separated at the storage level. Storage shared by multiple microservices should be strictly avoided • Data Duplication – To overcome issues related by design, as it hinders independent scalability of mi- to cross-service queries, transactions and integrity croservices. The storage becomes a single point of enforcement, frequently referenced entities can be failure, with high risk of becoming a performance bot- stored in multiple copies as part of multiple ser- tleneck. To design highly scalable microservies, par- vices for the price of additional consistency man- titioning at the storage and compute level should be agement. applied. Our sample application consists of 7 microser- vices (6 stateful, 1 stateless) depicted in Figure 2. We 4.4 Evaluation designed every service to manage and store a single domain entity. There are two exceptions we would like to explain here: The impacts of our partitioning strategy on scalability of the designed application can be observed in Fig- • Product Service – manages product catalog, ure 9, which shows throughput of the REST API de- which persists Product and Category entities. pending on the size of the compute cluster and con- We considered separation of these closely coupled firms that applied partitioning strategy leads to design entities into isolated microservices, but because of a scalable microservice application. they are referencing each other very often and at the same time Categories are only referenced by Products, we decided to store them in a single mi- croservice. 5 STORAGE DESIGN DECISIONS • Sales Service – manages stored Orders (headers with embedded items) for a specific user and at Selection of a storage technology or storage ser- the same stores a list of OrderItems for a specific vice in the PaaS cloud has a significant impact product. We decided to store sales data in a du- on the throughput and scalability of the applica- plicate manner due to limits exposed by applied tion (Gesvindr and Buhnova, 2016a). As we expect partitioning. that even in case of microservice architecture the stor- age tier of stateful service will significantly influence 4.2 Partitioning performance metrics of the service, we decided to evaluate four different storage technologies that can Partioning is very often associated with the storage be utilized by our application, to assess how they will layer (Homer et al., 2014) but in the context of mi- limit service scalability and what the overall through- croservices, partitioning can be propagated up to the put of the service in different scenarios will be. This

622 Design of Scalable and Resilient Applications using Microservice Architecture in PaaS Cloud

Sales Service User Service * 1 Order Item Order User Partition by ProductID Partition by UserID Partition by Email

* Address 1 Product Service 1 1 Reservation Service Category Product * Partition by CategoryID 1 Reservation 1 1 Partition by ProductID

* * Review Service Stock Service Checkout Service

* Product Review Stock Checkout Item Partition by ProductID Partition by ProductID Figure 2: Separation of data model into different microservices. shall provide software architects with additional guid- provider and offers very high throughput, nearly un- ance for selecting storage technology and design of limited scalability and low operation costs at the cost the storage layer. We evaluate only NoSQL storage of limited querying support which can be partially services, as it was demonstrated in (Gesvindr and mitigated by proper data access tier design. Buhnova, 2016a) that in the PaaS cloud they signif- The representative key-value NoSQL database we icantly outperform classical relational databases. evaluated is Azure Table Storage which is a fully managed key-value NoSQL database with almost un- 5.1 Storage Abstractions limited scalability when data partitioning is properly used, but it comes at the cost of very limited query support, which needs to be taken into account by Individual microservices in our application were im- the architect. Data is stored in tables without any plemented without dependency on any storage tech- fixed structure (every row can have a different set of nology or service, which is still fairly unique while columns), where every row must be uniquely identi- advisable approach, which allowed us to isolate and fied by a pair of partition key and row key. Data with measure the impact of used storage technology. We the same partition key is stored at the same server, designed our own abstraction layer comprised of stor- therefore it is important to generate the partition key age independent repositories and adapters for specific for stored rows in such a way that the rows are dis- storages. Queries defined to retrieve data from the tributed across multiple servers, which leads to high storage in the application are defined in C# in a form scalability of this storage. It is important to be aware of our own composite predicates, which should be de- of limited querying support as data can be efficiently scriptive enough to meet all querying needs of the ap- filtered only based on partition and row key, not by plication. Our storage abstraction layer parses them other columns, which are not indexed. The fastest and generates native queries for given storage tech- queries are those accessing the data from a single par- nology. Moreover, to define a repository, we only tition with an exact match of a row key. When query is define storage-independent domain entity as a class executed to retrieve data across multiple partitions, its in C# and all necessary storage structures are auto- response time significantly increases. The service is matically created and optimized for a given set of billed based on the amount of used storage and num- queries, which is most important in case of the key- ber of transactions. There are no performance tiers or value storage with limited querying support. This ab- billing based on performance. straction layer allows us to easily reconfigure what storage technology is used without the need to rede- To effectively retrieve data from Azure Table Stor- ploy the application, making the benchmarking pro- age, it is necessary to store the same table in duplicate cess more effective. copies but with different set of partition row keys, as illustrated in Table 1 for the list of products in a cat- alog. Data is partitioned by CategoryID, as we are 5.2 Key-value NoSQL Database always displaying the list of products in a single cat- egory and stored sorted in an ascending order based The first type of storage we decided to evaluate is on a different row key, which supports filtering and a key-value NoSQL database as it is provided in a sorting based on that column (e.g. in the table: Name, form of PaaS cloud service by every major cloud EAN Code and Price).

623 ICSOFT 2019 - 14th International Conference on Software Technologies

Table 1: Tables generated to store duplicate data in Azure Table Storage for effective querying. Table name PartitionKey RowKey Product CategoryID CategoryID ProductID Product Name CategoryID Name ProductID Product Name DSC CategoryID Name(inverted) ProductID Product EANCode CategoryID EANCode Product EANCode DSC CategoryID EANCode(inverted) Product Price CategoryID Price ProductID Product Price DSC CategoryID Price(inverted) ProductID

We took an experimental approach where for the key-value storage, all properties are indexed, so some queries with unpredictable performance, we we can efficiently load data based on complex queries. take advantage of low transaction costs and high scal- Results of this experiment are even more interest- ability of Azure Table Storage and execute multiple ing when we also compare pricing models and related variants of the query leading to the same result using costs of used storage services, as Azure CosmosDB different tables. Only the fastest query returns data. requires us to reserve performance in form of costly Request Units (RU) and every operation costs certain 5.3 Document NoSQL Database amount of RU based on operation complexity known prior its execution and when all RU are consumed by The second type of storage service we want to evalu- queries, new operations are rejected with an error. So ate is a NoSQL document database, which in compar- the client does not utilize the full potential of the ser- ison to key-value storages provides complex query- vice’s hardware, but is logically limited so that perfor- ing support by internal indexing of stored documents, mance SLA can be guaranteed. Azure Table Storage which decreases complexity of effective implementa- exposes very cheap transaction fee and provides max- tion for the developer. imum performance of the service without any need to As a representative within MS Azure, we eval- prepay certain performance tier but without any per- uated Azure Cosmos DB, which is a multi-model formance guarantees. NoSQL database as a service, built by Microsoft es- pecially for a highly distributed cloud environment. It 5.4 Reliable In-memory Storage at supports data in multiple different data models—key- value storage with a client protocol compatible with Compute Nodes Azure Table Storage, document storage using either its own protocol or it offers support for MongoDB The last type of storage we wanted to compare in our clients, columnar families with support for Apache sample application is a distributed storage using reli- Cassandra clients, graph data with support for Grem- able in-memory collections, which are locally present lin clients. at the compute nodes of the cluster. Microsoft provides SLA on performance of this Stateful services in Azure Service Fabric have service—Azure Cosmos DB guarantees less than 10 their own unique storage called Azure Service Fabric ms latencies on reads and less than 15 ms latencies Reliable Collections, so such a hosted microservice on (indexed) writes at the 99th percentile. Through- does not have to use external storage services. The put is influenced by the number of reserved Request main goal is to collocate compute resources and the Units (RU), and based on the amount of storage and storage on the same node in the cluster to minimize reserved RU the service is billed. communication latency. The storage itself is provided Our application implements adapters for both the in form of Reliable Collections, which is an evolu- document storage and also for the key-value stor- tion of classical .NET framework collections, but it age as we wanted to compare their performance in provides the developer with a persistent storage with terms of a single service. Despite the fact, that key- a multi-node high availability achieved through repli- value storage was evaluated using Azure Table Stor- cation (one active replica, multiple passive replicas) age, CosmosDB does not share its limitations related and high scalability achieved via data partitioning. to limited query support. Azure Cosmos DB automat- There are three types of supported collections: Re- ically indexes all columns in the table, so that we do liable Dictionary, Reliable Queue and Reliable Con- not have to store data with high duplicity to be able current Queue. to query them efficiently. In the document mode, we To distribute data and workload across the Ser- store collections of complex entities and same as for vice Fabric cluster, it is advisable that every stateful

624 Design of Scalable and Resilient Applications using Microservice Architecture in PaaS Cloud

3. Connect and Product Service Product Service execute service method Partition 0 Partition 0 5.6 Evaluation Active Replica Secondary Replica 2. Get partition reference 1. Get products in Product Service Product Service category REST API We have evaluated the use of the four different stor- Service Partition 1 Partition 1 Active Replica Secondary Replica age technologies with our microservice application. Product Service Product Service Partition 2 Partition 2 The results of benchmark evaluating throughput of Active Replica Secondary Replica microservices hosted on 5-node cluster (Azure Ser- Figure 3: Accessing partitioned data stored in Azure Ser- vice Fabric 6.0.232, node hosted on Azure Virtual vice Fabric stateful services. Machines D11 V2 2 cores, 14GB RAM, 100GB SSD cache) depending on the used storage technology are service implements data partitioning and stores data depicted in Figure 5. We evaluated five different based on a partition key in separate partitions. When workloads because we expected that various storage the service is requested to load the data, Service Fab- services will have different performance characteris- ric provides very simple API, which based on the par- tics depending on the type of the workload. The most tition key opens connection to an instance of the ser- surprising and important result is that there is indeed vice that stores the partition and executes our code of no single storage service that would outperform all the the microservice that loads data from the local stor- others in all scenarios. Another fact is that throughput age of the service partition as depicted in Figure 3. of complex operations is very low due to high over- A problem arises when we need to access data across head of cross-service communication, which is one multiple partitions. Then the request is sent to a ran- of the disadvantages of microservice architecture. dom service partition, it loads local data and over net- 350 323 work requests data from other service partitions as de- 300 250 235 226 picted in Figure 4. This leads to delays in responses. 200 154 157 140 141 150 127 Efficient partitioning is a key to the design of highly 89 98 100 52 54 per second) 50 14 19 scalable and efficient storage using Reliable Collec- 7 1 7 10 6 11 0 tions. Partitioning keys we applied on our data model Simple read-only Complex read-only Simple write-only Complex write- All scenarios Average Average throughput (requests only are depicted in Figure 2. Reliable Collections Azure Table Storage Azure Cosmos DB (Key-Value) Azure Cosmos DB (Document) 3. Connect and Product Service Product Service execute service method Partition 0 Partition 0 Figure 5: Throughput of REST API hosted on 5-node clus- Active Replica Secondary Replica 2. Select random partition ter with different storage services using synchronous com-

1. Get products REST API Product Service Product Service munication for different scenarios. Service Partition 1 Partition 1 Active Replica Secondary Replica 1400 1326 Product Service Product Service 4. Connect to other partitions 1200 and execute methods Partition 2 Partition 2 Active Replica Secondary Replica 1000 753 800 654 547 Figure 4: Separation of data model into different microser- 600 534 301 277 400 221 261 243 151 162 167 86 74 vices. 200 31 73 48 19 32 0 Simple read-only Complex read-only Simple write-only Complex write- All scenarios

Average response Average time (ms) only 5.5 Summary of Recommendations Reliable Collections Azure Table Storage Azure Cosmos DB (Key-Value) Azure Cosmos DB (Document) Our experiments confirm strong dependency of ap- Figure 6: Response time of REST API hosted on 5-node cluster with different storage services using synchronous plication’s throughput on used storage technology. communication for different scenarios. None of the used storage technologies outperformed others in all scenarios, therefore selection of stor- To assess scalability of the evaluated storage ser- age technology should be accompanied with bench- vices, we deployed the same application on 20-node marks of implemented microservices to determine if cluster. The results of the benchmarks are depicted in selected technology meets required performance cri- Figure 7. The benchmarks confirm that our service is teria. Use of in-memory storage collocated to com- scalable and its scalability is significantly influenced pute resources (Reliable Collections) leads to great by the used storage service as the throughput increase scalability and lowest operation costs. Due to high does not have the same ratio for all storage services communication overhead observable in complex sce- and scenarios. We also learned that the use of Azure narios, we would recommend the use of microservice Cosmos DB is despite its high throughput very tricky, architecture for scenarios where microservices do not because one needs to pay for reserved storage per- have to communicate frequently with each other. formance (Request Units) and it is challenging to ad- just performance of individual collections to achieve best service performance without wasting allocated request units.

625 ICSOFT 2019 - 14th International Conference on Software Technologies

One can see from our experiments that a single 200,0 159,0 service instance of the stateless Public API Service, 150,0 which resends client requests to individual services, 100,0 76,3 53,4

[USD] 44,7 40,6 37,7 50,0 26,4 was not overloaded even for 20-node cluster, which 13,4 10,1 9,6 2,0 3,0 2,7 2,5 5,5 3,3 6,1 8,3 4,8 4,6 may be a sign of the used ASP.NET Core framework 0,0 Cost per million1 requests Simple read-only Complex read-only Simple write-only Complex write- All scenarios efficiency. only Reliable Collections Azure Table Storage

1000 Azure Cosmos DB (Key-Value) Azure Cosmos DB (Document) 776 784 800 703 Figure 10: Cost per 1 million REST API requests hosted on 582 600 525 394 415 20 node cluster with different storage services using syn- 400 280 256 199 192 chronous communication for different scenarios. 200 116 154 40 43 63 51 per second) 12 29 25 0 Simple read-only Complex read-only Simple write-only Complex write- All scenarios

Average Average throughput (requests only Reliable Collections Azure Table Storage 6 COMMUNICATION STRATEGY Azure Cosmos DB (Key-Value) Azure Cosmos DB (Document) Figure 7: Throughput of REST API hosted 20-node cluster DESIGN DECISIONS with different storage services using synchronous commu- nication for different scenarios. When handling communication between individual services, there are two major strategies that can be 600 559 applied when microservices communicate with each 500 449

400 335 other: 300 265 268 201 204 185 200 132 142 • Synchronous – The service requests a response 104 117 80 87 101 56 76 100 31 53 47 from another microservice and waits for the re- 0 Simple read-only Complex read-only Simple write-only Complex write- All scenarios

Average Average response time (ms) sponse before it continues in an execution. only Reliable Collections Azure Table Storage • Asynchronous – the service send a request (mes- Azure Cosmos DB (Key-Value) Azure Cosmos DB (Document) sage) to another service using messaging service Figure 8: Response time of REST API hosted on 20-node and does not actively wait for the response. When cluster with different storage services using synchronous communication for different scenarios. the response is generated, it is delivered through the messaging service back to the original service. We were further interested in the performance of Impacts of asynchronous messaging on multiple qual- Reliable Collections because they are hosted as an in- ity attributes of the application are well described tegral part of Azure Service Fabric without any ad- in (Taylor et al., 2009) and specifically for the PaaS ditional costs. From the benchmarks we can confirm cloud environment in (Homer et al., 2014; Gesvindr that performance of Reliable Collections is strongly and Buhnova, 2016a). Due to different complexity dependent on the size of the cluster as the workload and frequency of communication between microser- is evenly distributed on multiple nodes using parti- vices, we find it to be desirable to validate impacts tioning, as depicted in Figure 9. On the 20-node of both communication strategies on throughput, re- cluster (Figure 7), read operations are outperforming sponse time and scalability of the application. To this even very expensive Azure Cosmos DB (cost compar- end we implemented our sample application in such ison per request is depicted in Figure 10), especially a way that the communication strategy can be easily in complex scenarios. Unfortunately, the write op- switched thanks to adequate abstractions in its archi- erations are slower due to complex data replication tecture. among nodes to provide data redundancy.

1000 6.1 Synchronous Communication

776 800 Strategy 570 600 398 400 280 235 241 256 The client application communicates with REST API 207 186 per second) 154 143 200 98 116 44 74 14 10 16 23 29 service, which redirects requests received via publicly 0

Average Average throughput (requests 5 10 15 20 available REST API to individual microservices run- Number of nodes in the cluster ning in the Azure Service Fabric cluster. To discover Simple read-only Complex read-only Simple write-only Complex write-only All scenarios Figure 9: Throughput of REST API hosted on 5, 10, 15 and communicate with services running in the cluster, and 20 node cluster with reliable collections storage using the following steps need to happen: synchronous communication for different scenarios. 1. The location of the service needs to be resolved – The service instance can be migrated between different nodes of the cluster and can be running

626 Design of Scalable and Resilient Applications using Microservice Architecture in PaaS Cloud

Public Signal R API Service

Product reservation

Product availability check Reservation Product Service User Service Service

Change of Product available price stock computation Review Service Stock Service Sales Service Checkout Service

Figure 11: Diagram of service interactions.

in multiple instances. If the service is stateless, service internally using Reliable Queue as the storage requests can be evenly distributed. If the service is for messages. Other services then can subscribe to stateful, then the service partition should receive receive messages of specified type, which are unfor- only requests for which it does have locally stored tunately broadcasted across all service partitions if it data. has more than one. 2. Connection to the service – When the location of the service is resolved by reverse proxy, which is 6.3 Summary of recommendations a service running in Azure Service Fabric cluster, a direct connection can be opened between two Based on the results of the experiments, it is advisable services hosted in the cluster and requested oper- to use synchronous communication as a primary com- ation can be executed. When the connection fails, munication pattern due to significantly lower over- there is a retry logic implemented as part of Azure head, higher throughput and faster response time. Service Fabric infrastructure. Asynchronous communication is desirable for long Interactions between individual microservices and running operations or operations that need to be com- types of requests are depicted in Figure 11. pleted reliably (e.g. corrective actions described in Section 7.1). 6.2 Asynchronous Communication 6.4 Evaluation Strategy We implemented both communication strategies in An alternative communication strategy does not open our sample application, which provided us with an direct connections between microservices hosted in opportunity to evaluate and compare the behavior of the cluster, but the service sends a request (message) both communication strategies, which is rarely seen to another service using messaging service. When the in experience reports form industry on the same appli- message is delivered, the request is processed and the cation, as having both implementations is costly and response is sent back to the messaging service and not suitable for large production applications. is delivered to the service waiting for the response. The advantage of this approach is a looser coupling 250 235 200 of the microservices as they do not rely on a defined 154 150 98 communication interface but only on the format of re- 100 62 49 57 50 quest and response messages. The disadvantage is the 14

Average Average throughput 10 10

(requests (requests per second) 6 higher implementation complexity, thus higher imple- 0 Synchronous communication Asynchronous communication mentation costs. Simple read-only Complex read-only Simple write-only Complex write-only All scenarios Messaging service is not part of Azure Ser- Figure 12: Throughput of REST API hosted on 5-node vice Fabric, therefore we use the ServiceFab- cluster using synchronous and asynchronous communica- ric.PubSubActors library, which hosts a new mi- tion with reliable collections storage for different scenarios. croservice that works as a reliable messaging stateful

627 ICSOFT 2019 - 14th International Conference on Software Technologies

20000 15971 component is used to describe a complex transaction 15000 11465 across multiple services, but the transaction itself is 10000 7058 split into multiple atomic blocks—the internal block 5174 5891 5000 executes operations inside the service, the external 86 151 48 301 243 0 block wraps communication with another service. As Average Average response time (ms) Synchronous communication Asynchronous communication Figure 14 Simple read-only Complex read-only Simple write-only Complex write-only All scenarios depicted in , in the atomic block there are Figure 13: Response time of REST API hosted on 5-node actions to be executed and also pair actions to revert cluster using synchronous and asynchronous communica- changes if the transaction fails on any of the blocks. tion with reliable collections storage for different scenarios. This sequence of operations is executed exactly in the order of their definition. We discovered that data vali- The results of the benchmark are depicted in Fig- dation should be executed as the initial action because ure 13. Despite the fact that synchronous calls are when this fails, no corrective actions are needed. As considered harmful (Fowler, 2014) it is very sur- no data integrity is enforced by the storage across ser- prising how significantly synchronous service com- vices, we run all necessary validations as part of the munication outperforms its asynchronous alternative, transaction. When any of the operations fails, an ex- which uses messaging as a form of reliable commu- ecution of corrective actions is initiated. Corrective nication between services. The use of reliable mes- actions are implemented as always-succeed actions, saging services leads to increased availability as very which means that they are stored in a highly reliable short outages of individual services are not propa- queue to ensure that the action is repeatedly executed gated to the client but based on our implementa- until it succeeds. With this approach, the corrective tion and tests, this communication strategy for sim- action overcomes even temporary service failures. ple scenarios has four times worse throughput than Our implementation of cross-service transactions synchronous calls. In case of the complex scenarios, does not ensure as high level of consistency and atom- the difference is significantly lower, by which one can icity as known from relational databases but it pro- conclude that asynchronous messaging is worth con- vides us with structured transaction description and sidering for long lasting complex operations where sufficient guarantees for the purpose of our business high throughput is not required and reliability is more transactions with minimum performance impact. important. Similar conclusions for PaaS cloud appli- cations (not in context of microservices) are also men- 7.2 Constraints Enforcement tioned in (Gesvindr and Buhnova, 2016a). Since the data is stored across multiple microservices with isolated storages, it is not possible to enforce ref- 7 RESILIENCE DESIGN erential integrity at the storage level. Therefore this needs to be enforced with transactions. Every transac- DECISIONS tion as described in the previous section runs its own data validation, during which it validates the existence When designing a microservice architecture, one of of referenced entities in different services. the biggest challenges is to enforce data consistency Another issue we have addressed was how to gen- across multiple services, and to manage cross-service erate unique identifiers of stored entities, because in transactions. Software architects are nowadays pro- many current applications the identifier of a record vided with hardly any guidance on addressing these (primary key) is generated by a relational database, design challenges in microservice architecture, we which cannot be followed in our project, and not therefore came with our own implementation, which all storage services offer support for generation of a is a modified version of the compensation transaction unique identifier of a record. Therefore, we gener- pattern (Homer et al., 2014) combined with an event ate globally unique identifier using service code when sourcing pattern (Homer et al., 2014) to handle re- new entity is created in sour service code before it liable cross-service compensations. Our implemen- is persisted. The uniqueness of the value is then en- tation does not increase complexity of successfully forced by the storage. executed operations, i.e. there were no performance Data validation operations are implemented as a impacts found during benchmarks. part of the service code, mostly in a constructor of an entity to prevent invalid entity from being instanti- 7.1 Cross-service Transactions ated. To deal with the mentioned issues, we implemented a component called Event Sequence Source. This

628 Design of Scalable and Resilient Applications using Microservice Architecture in PaaS Cloud

Check if Confirm order Check if user on success on success Calculate total reserved exists price products exist No correction on failure No correction on failure No correction action action action

For each order item: on success on success Persist Order Items Persist order Decrease Stock Level

on failure on failure Delete Order Items Delete order Increase Stock Level

on success Release on success customer Log success reservations on failure No correction on failure No correction action action

Figure 14: Transaction workflow using Event Sequence Source.

7.3 Handling Transient Errors 7.6 Evaluation

It is very important in the PaaS cloud to properly han- Based on our observations, none of the presented re- dle transient faults (Gesvindr and Buhnova, 2016b) by siliency design decision has a measurable impact on implementing a retry strategy so that when a cloud re- application performance for successful requests, as no source is not currently available, an error is not propa- additional actions need to be executed. Thus the eval- gated to the client, but instead the operation is retried uation of these strategies with respect to the perfor- multiple times with an increasing delay. This is al- mance metrics studied in this paper is not relevant. ready implemented in majority of client libraries and However it might be interesting to study their effects it just needs to be enabled. on resilience-motivated quality attributes, which are out of scope of this paper. 7.4 Recoverability

To increase recoverability of the application in the 8 CONCLUSION PaaS cloud, it is advisable to implement the Circuit Breaker Pattern (Homer et al., 2014). This applies In this paper, we have identified, discussed and eval- also to microservice architecture as the pattern pre- uated a set of design principles that influence service vents an application repeatedly trying to execute an decomposition, storage, communication strategy and operation that is likely to fail without wasting re- resilience in microservice architecture deployed in sources. It detects if the fault has been resolved and PaaS cloud. On the sample application, we measured then it gradually increases the load as more and more their impact and presented numerous findings, which requests are permitted to execute. support the observation that microservice architec- ture leads to high scalability, but brings new design 7.5 Summary of Recommendations challenges further amplified by operation in the PaaS cloud and richness of design choices that the archi- tects have. Decomposition of the services needs to be Cloud computing services frequently deal with very carefully validated, selection of the storage provider short outages in duration of few seconds, therefore cannot be done without knowledge of a specific work- adequate transient error handling policies in a form of load and benchmarks, synchronous communication retry strategy needs to be implemented. Especially for strategy was found to perform way better despite rec- microservices, these outages could lead to costly roll- ommendations in literature. Additional effort shall be back of cross-service transactions. Validity of data invested in extension of the studied design principles must be enforced mostly at the application level, as and patterns for microservice design in the context of due to the use of isolated storage services constraints the PaaS cloud. enforcement cannot be applied at the storage level.

629 ICSOFT 2019 - 14th International Conference on Software Technologies

ACKNOWLEDGEMENT Mihindukulasooriya, N., Garc´ıa-Castro, R., Esteban- Gutierrez,´ M., and Gomez-P´ erez,´ A. (2016). A survey This research was supported by ERDF ”Cy- of restful transaction models: One model does not fit berSecurity, CyberCrime and Critical Informa- all. J. Web Eng., 15(1-2):130–169. tion Infrastructures Center of Excellence” (No. Nadareishvili, I., Mitra, R., McLarty, M., and Amundsen, M. (2016). Microservice Architecture: aligning prin- CZ.02.1.01/0.0/0.0/16 019/0000822). ciples, practices and culture. O’Reilly Media, Inc., 1st edition. Newman, S. (2015). Building Microservices. O’Reilly Me- REFERENCES dia, Inc., 1st edition. Pardon, G., Pautasso, C., and Zimmermann, O. (2018). (2015). Why you can’t talk about microservices without Consistent disaster recovery for microservices: the mentioning netflix. https://smartbear.com/blog/deve- BAC theorem. IEEE Cloud Computing, 5(1):49–59. lop/why-you-cant-talk-about-microservices-without- Richardson, C. (2017). Who is using microservices? ment/. Sill, A. (2016). The design and architecture of microser- (2017). Cloud design patterns. https://docs.microsoft.com vices. IEEE Cloud Computing, 3(5):76–80. /en-us/azure/architecture/patterns/. Taylor, R. N., Medvidovic, N., and Dashofy, E. M. (2009). (2018). Microsoft service fabric. https://github.com/micro- Software architecture: foundations, theory, and prac- soft/service-fabric. tice. Wiley Publishing. Aderaldo, C. M., Mendonc¸a, N. C., Pahl, C., and Jamshidi, Villamizar, M., Garcs, O., Castro, H., Verano, M., Sala- P. (2017). Benchmark requirements for microservices manca, L., Casallas, R., and Gil, S. (2015). Evaluating architecture research. In Proceedings of the 1st Inter- the monolithic and the microservice architecture pat- national Workshop on Establishing the Community- tern to deploy web applications in the cloud. In 2015 Wide Infrastructure for Architecture-Based Software 10th Computing Colombian Conference (10CCC), Engineering, ECASE ’17, pages 8–13, Piscataway, pages 583–590. NJ, USA. IEEE Press. Wilder, B. (2012). Cloud Architecture Patterns. O’Reilly Alshuqayran, N., Ali, N., and Evans, R. (2016). A system- Media, 1st edition. atic mapping study in microservice architecture. In Wolff, E. (2016). Microservices: Flexible Software Archi- 2016 IEEE 9th International Conference on Service- tectures. CreateSpace Independent Publishing Plat- Oriented Computing and Applications (SOCA), pages form. 44–51. Erl, T., Puttini, R., and Mahmood, Z. (2013). Cloud Com- puting: Concepts, Technology & Architecture. Pren- tice Hall Press, Upper Saddle River, NJ, USA, 1st edi- tion. Evans, E. (2003). Domain-Driven Design: Tackling Com- plexity in the Heart of Software. Addison-Wesley. Fowler, M. (2002). Patterns of Enterprise Application Ar- chitecture. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA. Fowler, M. (2014). Microservices a definition of this new architectural term. Gamma, E., Helm, R., Johnson, R., and Vlissides, J. (1995). Design Patterns: Elements of Reusable Object-oriented Software. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA. Gesvindr, D. and Buhnova, B. (2016a). Architectural tac- tics for the design of efficient PaaS cloud applications. In 2016 13th Working IEEE/IFIP Conference on Soft- ware Architecture (WICSA). Gesvindr, D. and Buhnova, B. (2016b). Performance chal- lenges, current bad practices, and hints in PaaS cloud application design. SIGMETRICS Perform. Eval. Rev., 43(4). Homer, A., Sharp, J., Brader, L., Narumoto, M., and Swan- son, T. (2014). Cloud Design Patterns: Prescriptive Architecture Guidance for Cloud Applications. Mi- crosoft patterns & practices. Mell, P. and Grance, T. (2011). The NIST definition of cloud computing.

630