
On Recent Advances on Stateful Orchestrated Container Reliability

Kęstutis Pakrijauskas
Faculty of Fundamental Sciences
Vilnius Gediminas Technical University
Vilnius, Lithuania
[email protected]

Dalius Mažeika
Faculty of Fundamental Sciences
Vilnius Gediminas Technical University
Vilnius, Lithuania
[email protected]

Abstract—Thanks to their flexibility and light weight, containers are becoming the primary platform to run microservices. Container orchestration frameworks – Kubernetes or Docker Swarm – enable companies to stay on the competitive edge by keeping the velocity of code deploys high. While containers are ideal for stateless workloads, using orchestrated containers for stateful services is an option too. Being a commodity and crucial to any business, state or, in other words, data has to be protected and be available. This research raises questions on what the reliability challenges of running stateful microservices are, and what the recent approaches to increasing the reliability of stateful services in orchestrated container systems are. A literature review was performed to answer the questions.

Keywords—microservices, containers, Kubernetes, stateful, failure, availability, review

I. INTRODUCTION

Software and technology are key to transforming organizations and delivering value to stakeholders and customers in modern times [1]. The success of a business closely depends on whether or not its systems are running at the desired state. Microservices, an implementation of Service-Oriented Architecture, allow companies to keep up with the demand to scale and roll new services out [2]. The challenges of running a microservice-based application are different from those of a monolithic application: monitoring, recovery, load balancing, etc. Data, or state, is an asset of any business. Modern systems are becoming highly complex and large in scale, according to the DevOps reports of 2018 [3] and 2019 [4], so high system reliability is a concern. Thus, enterprises build their systems with resilience and reliability in mind [5]. Components of microservices should be engineered and prepared for failure instead of attempting to ensure that no components fail. Both individual microservices and the system as a whole should be tolerant to failures and able to recover quickly [2], [11]. The downtime of a microservice-based system is decreased if a failed microservice returns online in a timely manner.

A stateless service is limited to its function – its output depends on its input. Recovery of stateless microservice components is straightforward: the components are ephemeral, there is no risk of data loss, and recreating a failed component is a small deal. However, it is a different matter with stateful microservices that deal with data. State is "a sequence of values in time that contain the intermediate results of a desired computation" [6]. State makes deployment, management, scaling, and replication a complex engagement. State has to be synchronized across multiple replicas in a microservice. Recovering a stateful microservice is not a trivial matter. A solid backup and recovery strategy, replication, sharding, etc. may not be enough to ensure high reliability of stateful microservices. Inter-service consistency of data may be violated if a single microservice was recovered to an earlier state. Data of a seemingly healthy stateful microservice can be corrupted, thus triggering a restore or rollback operation. Migration to a healthier node in the platform is a challenge with stateful microservices as well because of its resource-intensive and potentially disruptive nature.

Fig. 1. Comparison of monolith and microservice architectures

Container orchestration frameworks, such as Kubernetes, Docker Swarm, or Mesos, were developed to manage containers at scale [7]. Such tools automate and abstract various microservice management tasks such as service discovery, storage orchestration, rollout and rollback, and resilience [8]. Given their role in running microservices, it is important to evaluate the reliability of applications deployed in container orchestration systems.

Reliability is commonly defined as "the probability that an item will perform a required function without failure under stated conditions for a stated period of time" [9]. Mathematical and statistical methods are used to quantify reliability; however, in practice the uncertainty may be too great for reliability to be calculated. Reliability has become an important effectiveness parameter as the cost and complexity of systems increase. According to the ISO/IEC 25010 standard, reliability is a characteristic of the systems and software product quality model. This paper aims to discover approaches applicable to stateful microservices to satisfy the sub-characteristics of reliability as described in the standard [10]:

• maturity: degree to which a system, product or component meets needs for reliability under normal operation

• availability: degree to which a system, product or component is operational and accessible when required for use

• fault tolerance: degree to which a system, product or component operates as intended despite the presence of hardware or software faults

• recoverability: degree to which, in the event of an interruption or a failure, a system, product or component can recover the data directly affected and re-establish its desired state

A prerequisite to reliability evaluation is setting the standard of performance. The standard of performance is defined by Service Level Agreements (SLA), Service Level Objectives (SLO) and Service Level Indicators (SLI) [12]. An SLI is a defined quantitative measure of an aspect of the level of a service. An SLO is the target range or value of SLIs. Setting SLOs is important to evaluate reliability, as it adds transparency and understanding of whether the level of a service meets the expectations or not. An SLA is an agreement with users on what to do with the service if it is not performing within the SLO.
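To make the SLI/SLO relationship concrete, the following minimal sketch (in Python, with hypothetical request fields and thresholds, not taken from any of the reviewed studies) computes two common SLIs over a window of request records and checks them against SLO targets:

# Minimal sketch of computing SLIs from request logs and checking them
# against SLO targets; all names and thresholds are hypothetical.
from dataclasses import dataclass

@dataclass
class Request:
    latency_ms: float
    status: int          # HTTP status code

def availability_sli(requests):
    """SLI: fraction of requests answered without a server-side error."""
    good = sum(1 for r in requests if r.status < 500)
    return good / len(requests) if requests else 1.0

def latency_sli(requests, threshold_ms=300.0):
    """SLI: fraction of requests served faster than the threshold."""
    fast = sum(1 for r in requests if r.latency_ms < threshold_ms)
    return fast / len(requests) if requests else 1.0

SLO = {"availability": 0.999, "latency": 0.95}   # target values for the SLIs

def meets_slo(requests):
    measured = {"availability": availability_sli(requests),
                "latency": latency_sli(requests)}
    return {name: measured[name] >= target for name, target in SLO.items()}

# Example window: one slow request and one failed request out of four.
window = [Request(120, 200), Request(80, 200), Request(450, 200), Request(90, 503)]
print(meets_slo(window))   # {'availability': False, 'latency': False}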
Having an established service health baseline and a solid prediction of what is to happen with a microservice is not enough. Orchestrated container systems have a rich selection of settings and techniques that can be used to increase microservice reliability by taking a data-driven decision or detecting faults. This paper aims to summarize the available information on studies related to the reliability of stateful microservices in orchestrated container systems. The research questions are:

• RQ1: How and what kind of data can be used to make data-driven decisions on microservice reliability in orchestrated container systems?

• RQ2: What are the recent data-driven methods used to increase or predict stateful microservice reliability?

• RQ3: What are the recent approaches or techniques used to increase stateful microservice reliability in orchestrated container systems?

To the best of our knowledge, a summarization of existing evidence concerning the topic is lacking. The literature review was performed using the guidelines defined by Kitchenham and Charters [13]. The search was limited to digital libraries and search engines on the Internet. The search used the following terms: "stateful microservice" OR "stateful container orchestration" OR "microservice database" OR "container database" OR "microservice fault" OR "container fault". Further studies were identified by examining references in included articles.

II. CHALLENGES SUBJECT TO STATEFUL MICROSERVICES

A. Federated Multidatabase

Fine-grained microservices are developed and operated by independent teams. Each microservice can be independently deployed and scaled. Stateful services rely on their own persistent storage mechanism. To reduce coupling, integration at the storage level, i.e., using one shared data storage mechanism such as a database, tends to be avoided. Interaction between microservices should be limited to APIs. However, as there is no guarantee that a link to retrieve records from another microservice is valid, consistency becomes a challenge [14].

Microservices, being autonomous and independently deployed, may store data on a variety of platforms. Each microservice stores persistent data on its own private database. This data is accessible to other services only via an API. Relationships to other entities in the REST architecture are expressed as URI links, where a URI is a Uniform Resource Identifier that globally addresses the referenced entity. The lifecycle of microservices is independent, thus databases are backed up periodically and independently. In case of recovery, links between microservices may be broken due to the inconsistent state of microservices after data was restored from a backup on one of them [15].

Microservice architecture is designed to survive its individual components failing. Stateless and stateful services can be recovered independently. As data in stateful services can be recovered from a backup, there is a question whether the restored data is consistent with the data on other microservices. The challenge is ensuring data consistency among multiple microservices, and how and when to perform backup operations [14].

Databases of microservices can be seen as a federated multidatabase – a hybrid between a centralized and a distributed database system: a database that is distributed for global users and centralized for local users. Each microservice treats its database as a centralized one, ensuring its durability and consistency [15]. However, managing overall consistency is a challenge because of distributed persistence. Foreign key relationships between databases of different microservices are represented as loosely coupled references such as URIs. There is no guarantee that a retrieved URI points to a valid record in another microservice.

Although backups of individual microservices can be successfully used for independent recovery, it is likely that the restored state will not be consistent with the state of the application. For example, if order information is stored across multiple stateful microservices, some of it may be lost in the event of recovery of an individual microservice. Thus, the state will not be consistent.
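A hedged sketch of the referential-integrity problem described above: after one service is restored from an older backup, URI references held by another service may no longer resolve. The service names, record layout, and the HTTP probe are illustrative assumptions, not part of the reviewed work:

# Sketch of a referential-integrity check between two independently backed-up
# microservices: order records reference customer records by URI.
import requests

def check_references(order_records, timeout_s=2.0):
    """Return the stored URIs that no longer resolve to a valid record."""
    broken = []
    for record in order_records:
        uri = record["customer_uri"]           # e.g. "http://customers/api/v1/customers/42"
        try:
            resp = requests.get(uri, timeout=timeout_s)
            if resp.status_code == 404:        # referenced entity is gone: broken link
                broken.append(uri)
        except requests.RequestException:      # referenced service unreachable
            broken.append(uri)
    return broken

# After the customer service is restored from an older backup, links held by
# the order service may point to records that no longer exist.
orders = [{"id": 1, "customer_uri": "http://customers/api/v1/customers/42"}]
print(check_references(orders))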
B. Backup Availability Consistency

Fine-grained and independent microservices may consist of many components, each possibly having its own mechanism of persistent data storage. Given the variety of data storage techniques and the large number of components, backing up an entire microservice application in a consistent manner is a challenge.

The BAC (Backup Availability Consistency) theorem states that "when backing up an entire microservices architecture, it is not possible to have both availability and consistency" [14]. The tradeoff is between independent microservices, which may lead to eventual inconsistency, and a consistent backup of microservices, which leads to locking the state of the application for a period of time, thus limiting application availability.

Inconsistency of services can manifest in the following ways:

• Broken link: a reference cannot be followed. For example, when a microservice is referencing data in an obsolete microservice that was restored from a backup.

• Orphan state: there is no reference to follow. For example, when data in a microservice is not referenced at all because the referencing microservice has no references to the orphan data.

• Missing state: state is obsolete. For example, two microservices have different states because one of them was restored from a backup.

There are three ways of dealing with broken links:

• Reconstruct the missing references manually.

• Accept the inconsistency.

• Use cached data.

Dealing with orphan state starts with identifying and flagging the orphan records. Once the records are flagged, they can be deleted or overwritten. Missing state can be reproduced from the source or replicated from other sources.

The challenges posed by BAC can either be acknowledged or avoided. Acknowledging implies that eventual inconsistency is accepted and that there are measures to deal and live with it. Avoiding the challenge of BAC can result in tighter coupling or other design solutions that make microservices more dependent on one another.
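The availability side of this tradeoff can be illustrated with a small sketch: to obtain a mutually consistent backup, every service is quiesced for the duration of the snapshot, which is precisely the loss of availability the theorem predicts. The Service interface below is hypothetical:

# Sketch of the BAC trade-off: all services are switched to read-only so that
# their backups describe one global state; writes are rejected meanwhile.
import contextlib

class Service:
    def __init__(self, name):
        self.name = name
    def set_read_only(self, flag):      # placeholder for a real admin API call
        print(f"{self.name}: read_only={flag}")
    def snapshot(self):                 # placeholder for a real backup call
        print(f"{self.name}: snapshot taken")

@contextlib.contextmanager
def application_freeze(services):
    """Quiesce all services so that their backups belong to the same state."""
    for s in services:
        s.set_read_only(True)           # writes rejected: availability is reduced
    try:
        yield
    finally:
        for s in services:
            s.set_read_only(False)

def consistent_backup(services):
    with application_freeze(services):
        for s in services:
            s.snapshot()                # all snapshots share the same frozen state

consistent_backup([Service("orders"), Service("customers")])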
C. Performance Overhead of Stateful Workloads

The performance overhead of Docker and of container orchestration frameworks – Docker Swarm and Kubernetes – was evaluated in a study by E. Truyen, D. Van Landuyt, B. Lagaisse and W. Joosen [16]. The study focuses on evaluating a CPU-intensive Cassandra workload.

The study found that deployments to Docker containers result in negligible performance overhead compared to deployment to hosts. However, Docker Swarm and Kubernetes deployments resulted in additional performance overhead related to network and volume plugins.

Network bridges are used in the two evaluated container orchestration frameworks to ensure network isolation between containers. These network plugins increase CPU utilization. Similar results were found in a study by E. Kim, K. Lee and C. Yoo [17].

In addition, persistent volume plugins make a large impact on the general database workload resource model. Even though the experimental Cassandra workload was CPU intensive, volume plugins introduced a performance bottleneck at I/O operations.

On one hand, the authors argue that container orchestration frameworks bring benefits to SLO-aware container scheduling. On the other hand, container orchestrators introduce additional performance overhead which has to be optimized.

D. Stateful Pod Rescheduling

A Pod is the smallest deployable unit in Kubernetes. It is a single container or a group of containers sharing underlying resources such as storage, network, and namespaces. Deployments and Jobs are used to start Pods running stateless applications, while a StatefulSet is used to launch stateful Pods [18].

Pod availability is guaranteed by a ReplicaSet. Even though it is the ReplicaSet that ensures the number of running Pods, updates to Pods and their replica count are made to the Deployment, which is a higher-level concept [18].

A StatefulSet "manages the deployment and scaling of a set of Pods, and provides guarantees about the ordering and uniqueness of these Pods" [18]. Similarly to a Deployment, a StatefulSet manages Pods based on an identical container specification. However, Pods in a StatefulSet are not interchangeable: the identifier of each Pod persists through rescheduling. A StatefulSet uses persistent volumes which, combined with stable Pod identifiers, make mapping a new Pod to persistent storage easier. A Pod retains its identifier in a StatefulSet if it is restarted.

Each StatefulSet has a defined VolumeClaimTemplate that provides persistent storage using PersistentVolumes. While Kubernetes Volumes are ephemeral and dependent on the Pod lifecycle, a PersistentVolume is independent from any Pod that is using it. It is the PersistentVolumeClaim that is a storage request by a Pod.

In a StatefulSet, the PersistentVolumeClaim used by a failed Pod is not removed, as it would be in the case of a Volume in a Deployment. The failed Pod is relaunched using the bound PersistentVolumeClaim. Thus, each Pod in a StatefulSet keeps using the same PersistentVolumeClaim across rescheduling [18].
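A minimal sketch of such a StatefulSet with a volumeClaimTemplate, written with the official Kubernetes Python client; the image, names and storage size are illustrative and not taken from the reviewed studies:

# Sketch: StatefulSet whose Pods each claim their own persistent volume via a
# volumeClaimTemplate, so a rescheduled Pod reattaches to the same storage.
from kubernetes import client, config

def mysql_statefulset(name="mysql", replicas=2):
    container = client.V1Container(
        name=name,
        image="mysql:8.0",
        env=[client.V1EnvVar(name="MYSQL_ROOT_PASSWORD", value="example")],
        ports=[client.V1ContainerPort(container_port=3306)],
        # Each Pod mounts the PersistentVolumeClaim created from the template.
        volume_mounts=[client.V1VolumeMount(name="data", mount_path="/var/lib/mysql")],
    )
    claim_template = client.V1PersistentVolumeClaim(
        metadata=client.V1ObjectMeta(name="data"),
        spec=client.V1PersistentVolumeClaimSpec(
            access_modes=["ReadWriteOnce"],
            resources=client.V1ResourceRequirements(requests={"storage": "10Gi"}),
        ),
    )
    return client.V1StatefulSet(
        metadata=client.V1ObjectMeta(name=name),
        spec=client.V1StatefulSetSpec(
            service_name=name,                       # headless Service giving stable DNS names
            replicas=replicas,
            selector=client.V1LabelSelector(match_labels={"app": name}),
            template=client.V1PodTemplateSpec(
                metadata=client.V1ObjectMeta(labels={"app": name}),
                spec=client.V1PodSpec(containers=[container]),
            ),
            # One claim per Pod; the claim outlives the Pod, so mysql-0 is
            # always relaunched with the same volume.
            volume_claim_templates=[claim_template],
        ),
    )

if __name__ == "__main__":
    config.load_kube_config()                        # assumes a reachable cluster
    client.AppsV1Api().create_namespaced_stateful_set(namespace="default",
                                                      body=mysql_statefulset())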
Kubernetes uses the Horizontal Pod Autoscaler (HPA), which scales the number of Pods up and down based on resource utilization, for example CPU or a custom-defined metric. However, additional application-specific actions, such as setting up replication, have to be completed if a stateful service is scaled out [18].

In case a stateful service is scaled out, data has to be replicated onto the new Pod. Depending on the storage mechanism, for example a MySQL RDBMS, the overall performance of the stateful microservice is degraded during the process of copying the data. In addition, while a stateful Pod is rescheduled, it may not be available to use because of Readiness and Liveness probes that wait until the Pod is fully operational.

III. IDENTIFYING SLIS AND SLOS TO DRIVE DECISION MAKING

Historic SLIs and SLOs can be the basis for data-driven decisions taken to proactively evaluate, improve, and maintain microservices. However, the question is how the historic data should be treated before it can be used to fuel different algorithms: machine learning, deep learning, neural networks, rule-based selection, etc. The abundance of methods and algorithms that may be used to digest SLI and SLO data makes it a challenge of its own.

A collection of any microservice monitoring, logging, or tracing records can be set as the SLI. Microservice monitoring consists of three components [2]:

• Metrics. This component consists of service telemetry data such as CPU, memory, and storage I/O usage.

• Traces. This component helps to understand how microservices interact with each other.

• Logs. Each service generates events that are logged. Event logs are essential to understand the activity of the system.

Each monitoring component can have indicators or features that are further divided into three groups [19]:

• Manageable features, such as resource allocation and other settings.

• Partially manageable features, which are SLIs of other microservices located on shared infrastructure such as Kubernetes nodes.

• Unmanageable features, which fully depend on the users of the microservice. These represent user demand and usage patterns, for example requests per second or the type of data.

As for the SLIs to be used as features to make data-driven decisions, Table I summarizes studies and the SLIs selected to drive their proposed methods.

TABLE I. SLIS USED AS FEATURES TO DRIVE PROPOSED MODELS IN DIFFERENT PAPERS

SLI                    Studies
CPU                    [19]–[25]
Disk usage             [20], [25]
Network utilization    [22]
Requests per second    [20], [22]

CPU utilization is the prime indicator of how significant the workload is. This is the case with stateful services as well: the workload of certain stateful applications, for example the Cassandra NoSQL database management system, can be CPU-bound [16]. Measuring disk usage is a forthright approach for stateful services, as stateful services operate with data that is persistently stored on disk. The impact of network utilization, notably network latency in the analyzed research [22], is important in distributed architectures. Measurement of requests per second is the universal indicator of how well the service is set up. In addition, this indicator may depend on unmanageable features of the service such as the type of data managed. SLOs, or the thresholds for SLIs, originate from the user that requires a certain level of performance.

Pre-processing and preparation of data is a crucial step for data-driven methods [19], [25]. Anomalies and improperly trained algorithms may result in poorer prediction results.
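As an illustration of how such features could be grouped and ordered before being fed to a prediction model (the metric names and sample values below are hypothetical):

# Sketch of assembling an SLI feature vector, grouped into manageable,
# partially manageable and unmanageable features as in [19].
import numpy as np

MANAGEABLE = ["cpu_limit_cores", "memory_limit_mib"]           # resource allocation settings
PARTIALLY_MANAGEABLE = ["node_cpu_util", "neighbour_disk_io"]  # SLIs of co-located services
UNMANAGEABLE = ["requests_per_second", "payload_kib"]          # user demand and usage patterns
FEATURES = MANAGEABLE + PARTIALLY_MANAGEABLE + UNMANAGEABLE

def feature_vector(sample: dict) -> np.ndarray:
    """Order a raw monitoring sample into a fixed-length vector."""
    return np.array([float(sample[name]) for name in FEATURES])

sample = {"cpu_limit_cores": 2.0, "memory_limit_mib": 4096,
          "node_cpu_util": 0.63, "neighbour_disk_io": 140.0,
          "requests_per_second": 820.0, "payload_kib": 12.5}
print(feature_vector(sample))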
IV. RECENT DATA-DRIVEN METHODS USED TO INCREASE OR PREDICT STATEFUL MICROSERVICE RELIABILITY

A. Resource Optimization

Autonomic vertical scaling comes in handy in dense containerized clouds where adding more nodes or instances is not an option, and thus horizontal scaling is no longer available. Podolskiy, Mayo, Koey et al. propose a method of deriving SLO-compliant resource allocation for containerized applications based on performance models and both single-objective and multi-objective optimization [19].

Out of three regression algorithms – linear regression, lasso regression, and random forest – the lasso model was deemed the most suitable because of the achieved R² coefficient of determination and because it is simpler than linear regression, although the latter had similar R² performance. Further on, multiple lasso regression models were evaluated: independent, to predict individual SLIs; application-wise, to predict SLO compliance for an entire application; SLI-wise, to predict an SLI for all applications; and all-targets, to predict SLOs for all applications. The results had shown that a polynomial model of degree 2 is sufficient for SLI prediction.

Analyzing the distributions of the 99th-percentile throughput and response time for all three applications had shown that anomalies make up to 8.2% of all observations. The impact of anomalies is evaluated by removing fractions of anomalies and re-evaluating the R² score. For example, removal of 11% of anomalies optimized the R² score.

Validation tests were performed: a preliminary test to acquire SLIs to be used as features in prediction modules that continuously allocate resources, and an evaluation test to acquire SLIs that are used in the lasso regression model to predict SLIs to be used in continuous constrained optimization and a limited brute-force search to model the desired SLOs. Validation results had shown that SLOs were violated only twice in 16 trials. Thus, the proposed performance modeling technique was deemed usable.
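The following is an illustration, not the authors' code, of the modelling choice described in [19]: lasso regression over degree-2 polynomial features predicting an SLI (here, response time) from a resource allocation and a load level. The data is synthetic and the SLO threshold is hypothetical:

# Lasso regression with degree-2 polynomial features for SLI prediction,
# trained on synthetic data purely for illustration.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.uniform([0.5, 100], [4.0, 2000], size=(500, 2))       # [cpu_cores, requests_per_second]
y = 50 + 0.04 * X[:, 1] / X[:, 0] + rng.normal(0, 5, 500)     # synthetic response time (ms)

model = make_pipeline(PolynomialFeatures(degree=2),
                      StandardScaler(),
                      Lasso(alpha=0.1))
model.fit(X[:400], y[:400])
print("R2 on held-out data:", r2_score(y[400:], model.predict(X[400:])))

slo_ms = 120.0
predicted = model.predict([[1.0, 1500.0]])[0]                 # candidate allocation: 1 core, 1500 req/s
print("SLO satisfied:", predicted <= slo_ms)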
The placement of Pods across distributed nodes becomes a problem which has to be solved at the time of Pod scheduling. To overcome this challenge, F. Rossi, V. Cardellini, F. Lo Presti and M. Nardelli propose to identify the relationship between application and system metrics [22].

The authors in their research applied a reinforcement learning solution which, based on experience, learns the most suitable scaling policy. They present ge-kube, an orchestration tool for Kubernetes.

One of the reinforcement learning challenges is to find the optimal balance between exploitation (using the effective actions) and exploration (searching for effective actions). At the Analysis stage, the reinforcement learning agent assesses the state of the application and updates the expected long-term cost (Q-function). At the Plan stage, the Replication Manager uses the reinforcement learning agent to identify which scaling action to take. In ε-greedy selection, the reinforcement learning agent chooses an exploration action with probability ε to improve its knowledge; with probability 1-ε the agent chooses the best-known action.
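A compact sketch of the ε-greedy selection and Q-function update just described; the state space, actions and cost signal are heavily simplified assumptions and are not taken from ge-kube itself:

# Epsilon-greedy action selection with a tabular Q-function update for a
# scaling agent; states, actions and cost are illustrative.
import random
from collections import defaultdict

ACTIONS = ["scale_in", "hold", "scale_out"]
EPSILON, ALPHA, GAMMA = 0.1, 0.5, 0.9

Q = defaultdict(float)                      # Q[(state, action)] = expected long-term cost

def choose_action(state):
    if random.random() < EPSILON:           # exploration: try a random action
        return random.choice(ACTIONS)
    return min(ACTIONS, key=lambda a: Q[(state, a)])    # exploitation: lowest expected cost

def update(state, action, cost, next_state):
    best_next = min(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (cost + GAMMA * best_next - Q[(state, action)])

def cost_of(cpu_util, replicas):
    """Hypothetical cost: penalise SLO-threatening utilisation and over-provisioning."""
    return (10.0 if cpu_util > 0.8 else 0.0) + 0.5 * replicas

# One interaction step: observe the state, act, observe the outcome, learn.
state = ("cpu_high", 2)                     # discretised CPU level, current replica count
action = choose_action(state)
next_state = ("cpu_ok", 3) if action == "scale_out" else state
update(state, action, cost_of(0.85, state[1]), next_state)
print(action, dict(Q))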
B. Machine Learning for Fault Prediction

With the goal to predict faults in distributed Kubernetes-based edge cloud environments, M. Soualhia, C. Fu and F. Khomh propose to use machine learning for fault detection and neural networks for fault prediction [25].

Support Vector Machine, Random Forest and Neural Network models are able to identify non-fatal disk and CPU faults with an F1-score of over 95%. Long Short-Term Memory and Convolutional Neural Networks can successfully predict faults in over 85% of cases.

V. TECHNIQUES TO INCREASE RELIABILITY OF STATEFUL MICROSERVICES IN ORCHESTRATED CONTAINER SYSTEMS

A. Evaluating Efficiency of Reliability Mechanisms

Container orchestrators have their own mechanisms to ensure high availability of the services they run. However, thanks to their flexibility, container orchestrators can be improved or extended to take custom actions. SLIs are the basis for evaluating how successful the improved mechanism is. At this stage, various indicators are used for evaluation: served requests per second [22], CPU and memory utilization [26], and duration of outage [27].

B. Kubernetes Operator

Kubernetes Operators allow to abstract some of the operations with stateless and stateful services: backups, deployment, scaling, failovers, etc. The idea behind them is to automate daily administration duties. However, Kubernetes Operators lack configuration safety, transparency and isolation [28].

Łaskawiec, Choraś, Kozik and Varadarajan [29] propose an Intelligent Operator. Their proposed operator gathers performance metrics and configuration samples. The collected data is used to train a machine learning model to predict the best configuration for a Kubernetes Operator. In addition, the Intelligent Operator blindly changes configuration parameters – performs an experiment – to evaluate how the performance parameters react to it.

C. Custom Scheduler for Kubernetes

The ge-kube proposed by F. Rossi, V. Cardellini, F. Lo Presti and M. Nardelli [22] uses a custom Kubernetes scheduler. In their network-aware Pod placement solution the custom scheduler takes a snapshot of the Kubernetes cluster and sends it to the Deployment Service, which makes a decision on Pod placement based on available resources and network delays. The decision is passed back to the custom scheduler, which then orders where to place the Pod.

In the experiment to validate the results, the authors of ge-kube employed network-aware, first-fit, and round-robin heuristic algorithms, compared them to the default Kubernetes scheduler, and also solved the Pod placement problem as an integer linear programming problem. A scaled-out Redis cluster was used to evaluate the algorithms. The number of requests per second was measured to evaluate optimal Pod placement on geo-distributed nodes. The first-fit, round-robin and default Kubernetes schedulers achieved similar performance, 15×10³ to 18×10³ requests per second, while Pods placed by the network-aware algorithm can achieve 44×10³ requests per second on average.

A custom scheduler was also created by Y. Yang and L. Chen [26]. In their proposed three-module architecture, the third, dynamic resource scheduling module applies prediction data to improve resource utilization. As the problem of their research was optimization of the Pod scheduling mechanism, the proposed model is evaluated by measuring how evenly CPU and memory utilization is distributed across Kubernetes nodes.

Custom schedulers enable fault-tolerant, redundant placement that ensures that a stateful microservice in an orchestrated container system performs as designed within SLO boundaries.

D. State Controller for Stateful Kubernetes Services

With the goal of increasing the reliability of stateful services, Abdollahi Vayghan, Saied, Toeroe, and Khendek propose a State Controller for Kubernetes [27]. The proposed State Controller integrates the concept of active and standby states with Kubernetes to improve the availability of stateful microservice applications. The idea is to assign labels to pods describing whether they are active or standby. The State Controller watches the state of each pod; if a pod with an active status fails, another pod is assigned the active status. The pod with the newly assigned active status becomes an entry point.

The State Controller can be integrated with both the StatefulSet and the Deployment Controller. In the case of a StatefulSet, the State Controller creates two pods with separate PersistentVolumes. It creates two services: one that exposes the active pod to clients, and another service that replicates data to the standby pod(s). With the Deployment Controller, the State Controller deploys pod replicas which share a PersistentVolume but have separate storage areas for each pod. In case of a failure, the State Controller will switch the pods and the service will resume the process from the last stored state.

The State Controller was evaluated with three service outage scenarios: due to container process failure, due to pod process failure, and due to node failure. In addition, OpenSAF, a middleware that implements the Availability Management Framework, was evaluated. The proposed State Controller can increase service reliability by 55% to 92% compared to the built-in Kubernetes capabilities.

TABLE II. COMPARISON OF DIFFERENT CONTROLLERS IN TERMS OF OUTAGE TIME

Outage type              Outage duration in seconds
                         Kubernetes    State Controller    OpenSAF
App Container Failure    2.2           1.2                 0.2
Pod Process Failure      2.1           0.7                 3.3
Node Reboot              164.5         2.9                 3.3
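A sketch inspired by the State Controller idea in [27], not the authors' implementation: watch the Pods of a stateful application and, when the pod labelled as active fails, promote a standby pod by patching its label so that the Service selector routes traffic to it. The namespace, application name and label keys are assumptions:

# Failover by relabelling: promote a standby Pod to active when the active
# Pod fails, using the official Kubernetes Python client.
from kubernetes import client, config, watch

NAMESPACE, APP = "default", "stateful-app"

def promote_standby(api: client.CoreV1Api):
    standbys = api.list_namespaced_pod(NAMESPACE,
                                       label_selector=f"app={APP},role=standby").items
    for pod in standbys:
        if pod.status.phase == "Running":
            patch = {"metadata": {"labels": {"role": "active"}}}
            api.patch_namespaced_pod(pod.metadata.name, NAMESPACE, patch)
            print("promoted", pod.metadata.name)
            return

def run():
    config.load_kube_config()
    api = client.CoreV1Api()
    w = watch.Watch()
    for event in w.stream(api.list_namespaced_pod, NAMESPACE,
                          label_selector=f"app={APP},role=active"):
        pod = event["object"]
        if event["type"] == "DELETED" or pod.status.phase == "Failed":
            promote_standby(api)     # failover: another pod becomes the entry point

if __name__ == "__main__":
    run()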
Such on LSTM and Grey Model,” in Proceedings of IEEE 14th International operator is automating performance tuning and configuration Conference on Intelligent Systems and Knowledge Engineering, ISKE management. 2019, 2019, pp. 701–707. [27] L. Abdollahi Vayghan, M. A. Saied, M. Toeroe, and F. Khendek, REFERENCES “Microservice Based Architecture: Towards High-Availability for Stateful Applications with Kubernetes,” Proc. - 19th IEEE Int. Conf. [1] A. McAfee and E. Brynjolfsson, “Investing in the IT that Makes a Softw. Qual. Reliab. Secur. QRS 2019, pp. 176–185, 2019. Competitive Difference,” Harv. Bus. Rev., vol. 86, pp. 98–107, 2008. [28] A. Mahajan and T. A. Benson, “Suture: Stitching safety onto kubernetes [2] S. Newman, Building Microservices: Designing Fine-Grained Systems. operators,” Conex. Student Work. 2020 - Proc. 2020 Student Work. Part O’Reilly Media, 2015. Conex. 2020, pp. 19–20, 2020. [3] A. Mann, M. Stahnke, A. Brown, and N. Kersten, “State of DevOps [29] S. Łaskawiec, M. Choraś, R. Kozik, and V. Varadarajan, “Intelligent Report 2018,” 2018. operator: Machine learning based decision support and explainer for [4] A. Mann, M. Stahnke, A. Brown, and N. Kersten, “State of DevOps human operators and service providers in the fog, cloud and edge Report 2019,” 2019. networks,” J. Inf. Secur. Appl., vol. 56, no. December 2020, p. 102685, [5] H. Adkins, B. Beyer, P. Blankinship, P. Lewandowski, O. Stubblefield, 2021. and A. Stubblefield, Building Secure and Reliable Systems. O’Reilly Media, Inc., 2020.