Blue Skies

Open Issues in Scheduling Microservices in the Cloud

Maria Fazio and Antonio Celesti, University of Messina
Rajiv Ranjan, Newcastle University
Lydia Chen, IBM Research
Chang Liu, Newcastle University
Massimo Villari, University of Messina

2325-6095/16/$33.00 © 2016 IEEE. IEEE Cloud Computing, September/October 2016.

The adoption of container-based microservice architectures is revolutionizing application design. By adopting a microservice architecture, developers can engineer applications that are composed of multiple lightweight, self-contained, and portable runtime components deployed across a large number of geodistributed servers.

A microservices-based cloud application involves the interoperation of multiple microservices, each developed separately, that can be deployed, updated, and redeployed independently without compromising the integrity of the application’s ecosystem. The ability to independently update and redeploy the code base of one or more microservices increases applications’ scalability, portability, updatability, and availability, but at the cost of expensive remote calls (instead of in-process calls) and increased overhead for cross-component synchronization.

The microservices approach contrasts with the traditional “monolithic” development of applications, where each application is a single, autonomous unit. For example, in a client-server application, the server is a monolithic entity that handles HTTP requests, executes logic, and retrieves or updates its data. The problem with such monolithic architectures is that even a small modification of the application’s logic requires the deployment of a new running version of the entire code base. A microservice architecture is lightweight and easily shipped and updated. Hence, it’s ideal for engineering applications where we cannot fully anticipate functionalities in advance (for example, the types of devices that might one day access the application). Microservice architectures are part of a larger shift in IT departments towards a DevOps culture, in which development and operations teams work closely together to support an application over its lifecycle and go through a rapid or even continuous release cycle.

Microservices act as standalone application subunits or components, implementing specific communication protocols for sending and receiving messages. In microservices, data flows through smart endpoints, which also process incoming information. Using well-defined interfaces and protocols, application developers can deploy different microservices on heterogeneous infrastructures without a specific integration framework. Generally, microservice communication uses a REST approach based on HTTP and TCP protocols, XMPP, or JavaScript Object Notation (JSON). However, currently, there are no widely adopted standardized protocols or data formats for microservice communication.[1] Microservice deployment and execution also lead to various networking issues. To this end, application developers currently adopt various software-defined networking (SDN) and network function virtualization (NFV) solutions for networking microservices.

Overview of Virtualization Technologies

Hypervisor-based resource virtualization (such as that used by Xen and VMware) is a key concept in cloud computing. Hypervisor-based virtualization enables cloud providers to create unique virtual machines (VMs) that share a set of physical hardware resources (CPU, memory, network, and disk). Each VM runs its own operating system instance (ranging from proprietary to open source), which supports fault tolerance and an isolated security context.

Container-based virtualization can be used to create microservices.[2] A container is a collection of operating system kernel utilities configured to manage the physical hardware resources used by a particular application component.[3] Containerization allows cloud providers to instantiate, relocate, and optimize hardware resources in a more flexible way while providing near-native performance (if deployed in “hypervisor-free” mode). Because the containers share a single operating system kernel, they incur lower overhead.[3] However, container-based virtualization leads to weaker isolation and introduces greater security vulnerabilities than hypervisor-based virtualization.[4]

From the user viewpoint, each container looks and executes exactly like a standalone operating system. Additionally, in a cloud computing scenario, developers can deploy a higher density of containers (compared to VM density in hypervisor-managed datacenters) on the same physical hardware. Linux container virtualization (LCV) is the best-known container-based virtualization technology. Popular LCV solutions include Docker, LXC, lmctfy, and OpenVZ.

Figure 1 shows the key architectural differences between hypervisor-based and container-based virtualization. Figure 1a shows application components deployed within a hypervisor-based VM that provides abstraction for full guest operating systems (one per VM). Figure 1b shows microservice deployment within a hypervisor-free containerized environment. Finally, Figure 1c shows microservice deployment within a containerized environment on physical hardware managed by a hypervisor-based VM. Sitting just above the physical hardware (for example, a server or appliance), a downward-facing hypervisor is more suitable for managing infrastructure-as-a-service (IaaS) clouds, whereas containers are more suited for managing platform-as-a-service (PaaS) clouds. Having said that, hypervisor-free containerization isn’t a replacement for traditional hypervisor technologies; the two technologies complement each other and must be carefully analyzed during the application architecture design phase in terms of performance isolation, overhead, and security requirements.

[Figure 1 diagram. Layer labels: guest processes or guest microservices, runtime libraries, containers, container engine, guest OS, VM, hypervisor, host operating system, physical cloud hardware. Panels: (a), (b), (c).]

Figure 1. Comparison of cloud architectures: (a) hypervisor-based application deployment, (b) hypervisor-free containerized microservice, and (c) containerized microservice within a hypervisor-managed physical host.

Container Engines for Microservices Scheduling and Management

Several tools can instantiate and manage containers in clouds. Docker Swarm, for example, provides native clustering for Docker containers. It turns a pool of Docker hosts into a single virtual Docker host. Because Docker Swarm serves the standard Docker API, any tool that already communicates with a Docker daemon can use Swarm to transparently scale to multiple hosts. A Docker container manager represents the basic container-oriented technology.

Kubernetes is an open-source technology for automating deployment, operations, and scaling of containerized applications. It groups the containers making up an application into logical units for easy management and discovery, for example, based on their resource requirements and other constraints. Kubernetes also provides horizontal scaling of applications, which can be performed manually or automatically based on CPU load. Finally, it provides automated rollouts and rollbacks and self-healing features.

Magnum is the OpenStack API service that makes container orchestration engines such as Docker Swarm and Kubernetes available as first-class resources in the OpenStack managed datacenter. Magnum uses the Heat service to schedule an operating system image, which contains Docker and Kubernetes, and runs this image on either VMs or a bare metal cluster.

The Google Container Engine provides a commercial service that relies on Docker and Kubernetes for cluster management and orchestration. Similarly, the Amazon Elastic Compute Cloud (EC2) Container Service supports deploying Docker containers on a managed cluster of Amazon EC2 instances. Rackspace is slightly behind with respect to container-based offerings. Its beta service, Carina, is based on Docker Swarm and doesn’t provide any elasticity features.

OpenStack Neutron supports the management of virtual LANs in cloud datacenters by creating ad hoc NFV. NFV uses virtualization technologies to manage core networking functions via software instead of relying on hardware to handle these functions. Creating NFVs using Open Virtual Network (OVN) technology guarantees an efficient and secure use of the network. OVN complements existing SDN capabilities, adding native support for virtual network abstractions, such as virtual L2 and L3 overlays and security groups. OVN also supports the security inspection of data transfer inside virtual networks (for example, packet inspection); hence it provides extra features useful for increasing customer security and privacy.

Open Issues in Scheduling and Resource Management

Despite the clear technological advances in container and hypervisor-based virtualization technologies, we are yet to realize a standard large-scale, performance-optimized scheduling platform for managing an ecosystem of microservices networked together to create a specialized application stack, such as a multitier Web application or an Internet of Things (IoT) application. Future efforts will focus on solving the following research challenges.

Configuration Selection and Management

A cloud application (for example, a multitier Web application) must typically combine multiple interdependent microservices that provide diverse functionalities, for example, load balancer, webserver, and database server. Moreover, these microservices have both control and dataflow dependencies. The challenge lies in dealing with heterogeneous configurations of microservices and cloud datacenter resources driven by heterogeneous performance requirements. With the increase in microservice application functionality types (encryption, compression, SQL/NoSQL server, virtual private network, and so on) and the heterogeneity of container engines (LXC, Docker, Google, and Amazon) and underlying cloud datacenter resources, the mapping of microservices to datacenters demands selecting bespoke configurations from an abundance of possibilities,[5] which is impossible to resolve manually.

Branded price calculators, available from public cloud providers (Amazon and Azure, for example) and academic projects (Cloudorado), allow comparison of hardware resource leasing costs. However, these calculators can’t recommend or compare configurations across microservices and datacenter resources.

We therefore need new research that focuses on developing techniques for accurately modeling, representing, and querying configurations of microservices and datacenter resources. In addition, we need general-purpose decision-making techniques, driven by heterogeneous performance requirements, to automate the selection of microservice configurations and their mapping to heterogeneous datacenter resources.[5]

Application Topology Specification and Composition

To compose a microservices-based application topology, you need to describe the microservices using a well-known standard. For example, you can base microservice descriptions on the Topology and Orchestration Specification for Cloud Applications (Tosca)/YAML along with the usual image representation. Moreover, workloads pertaining to different microservices depend on each other, and changes in one microservice’s execution and dataflow will influence those of others. Overall, the topology specification and composition needs to cover the whole life cycle (that is, deploy, patch, monitor, reconfigure, and shutdown), driven by the performance objectives of each microservice as well as the application as a whole.

The Business Process Execution Language (BPEL) and Web Service Choreography Interface (WSCI) are examples of Web service composition languages (agnostic to microservices) used in SOAs. The Resource and Application Description Language (RADL) is designed for composing and deploying VM images to different cloud providers.[6] Some application topology composition and specification tools found in the literature (Crane, Fig, and Maestro, for example) can’t deploy microservices across distributed datacenter hosts.[2] Although Tosca supports topology pattern specification, it lacks support for describing data and control flow dependencies between microservices, with a specific focus on identifying event coordination and dataflow mechanisms; properties of microservices in terms of workload features (such as data format, query rate, and runtime I/O dependency); and performance objectives and measures relevant to microservices.

Hence, an important research direction is to investigate a microservices composition framework, which will facilitate knowledge reuse and make it simpler for application engineers to interact with a complex computing platform.

Performance Characterization and Isolation

In a datacenter, microservices can be deployed inside hypervisor-based VMs or on nonvirtualized physical hardware. A recent study found that deployment within VMs imposes additional performance overhead while giving no extra benefit compared to deploying microservice containers on a nonvirtualized physical server.[7] As noted earlier, single containers, such as Docker, can support multiple and heterogeneous microservices that provide various application-specific features in a containerized environment. In this environment, unexpected interference and contention can occur. For some microservices (such as a compression server) storage requirements dominate, whereas for others (for example, transactional query processing by a database server) computational requirements dominate, and for still others (for example, a VPN server) communication requirements dominate. Hence, container engines (Kubernetes, Docker Swarm, and so on) must consider which microservices to combine to minimize workload interference and contention. Balancing resource consumption and performance is critical in deciding where to deploy microservices.

Some recent work has investigated performance isolation and interference detection. New hardware design techniques change processor cache architecture partitioning[8] or integrate novel insertion policies to pseudo-partition caches to reduce contention.[9] Hardware-based approaches add complexity to the processor architecture and are difficult to manage over time. Sriram Govindan and his colleagues developed a scheme to quantify the effects of cache contention between consolidated workloads.[10] However, they limit their discussion to cache contention issues, ignoring other hardware resource types. Ripal Nathuji and Aman Kansal present a control theory-based consolidation approach that mitigates the effects of cache, memory, and hardware prefetching contention of coexisting workloads.[11] However, their focus is CPU-bound or compute-intensive applications.
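As a minimal sketch of what interference-aware placement could look like, assume each microservice is tagged with its dominant resource (storage, compute, or network, following the examples in this section) and that co-locating services with the same dominant resource is the main source of contention. Under those assumptions a greedy placer might be written as below. The `Microservice` and `Host` types, the `capacity` field, and the scoring rule are hypothetical illustrations, not features of Kubernetes or Docker Swarm.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Microservice:
    name: str
    dominant: str  # hypothetical tag: "storage", "compute", or "network"


@dataclass
class Host:
    name: str
    capacity: int  # hypothetical limit on containers per host
    placed: list = field(default_factory=list)

    def contention(self, service: Microservice) -> int:
        """Count already-placed services with the same dominant resource."""
        return Counter(s.dominant for s in self.placed)[service.dominant]


def place(services, hosts):
    """Greedily put each service on the host with least same-resource contention.

    Ties are broken by the number of services already on the host, then by
    host order. Raises RuntimeError if no host has spare capacity.
    """
    assignment = {}
    for service in services:
        candidates = [h for h in hosts if len(h.placed) < h.capacity]
        if not candidates:
            raise RuntimeError(f"no capacity left for {service.name}")
        best = min(candidates, key=lambda h: (h.contention(service), len(h.placed)))
        best.placed.append(service)
        assignment[service.name] = best.name
    return assignment


if __name__ == "__main__":
    services = [
        Microservice("compression", "storage"),
        Microservice("database", "compute"),
        Microservice("vpn", "network"),
        Microservice("backup", "storage"),
    ]
    hosts = [Host("host-a", capacity=2), Host("host-b", capacity=2)]
    print(place(services, hosts))
```

A greedy rule like this ignores migration cost and actual measured interference; it only illustrates why a scheduler needs per-microservice resource profiles before it can decide which services to combine.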

Several new research topics are worthy of investigation: performance isolation and characterization techniques when multiple microservices run in the same container or on the same physical host; live migration of containers to reduce interference and contention; and tradeoffs between live migration and restarting.

Microservice Monitoring

Guaranteed application performance requires clear and real-time understanding of performance metrics across microservices and datacenter resources. However, variations in performance metrics across different microservices and datacenter resources complicate this problem. For example, key performance metrics for SDN resources are throughput and latency; for CPU resources, they’re utilization and throughput; and for SQL and NoSQL database microservices, it’s query response time. Therefore, how to define and formulate performance metrics coherently across the microservices to give a holistic view of data and control flows remains an open issue.

Monitoring tools that were popular in the grid and cluster computing era (for example, R-GMA and Hawkeye) were concerned only with monitoring performance metrics at the datacenter resource level (such as CPU percentage and TCP/IP performance), but not at the microservice level (such as end-to-end request processing latency and communication overhead). Cluster-wide monitoring frameworks (Nagios, Ganglia, Apache Hadoop, and Apache Spark) provide information about hardware metrics (cluster, CPU, and memory utilization, and so on) of cluster resources that might belong to public or private cloud datacenters.[12,13] Monitoring frameworks used by the Amazon EC2 Container Service (Amazon CloudWatch) and Kubernetes (Heapster) typically monitor CPU, memory, filesystem, and network usage statistics, so they can’t monitor microservice-level performance metrics.

This leads to several new research topics, including development of holistic techniques[13] for collecting and integrating monitoring data from all microservices and datacenter resources so administrators or a scheduler (a computer program) can track and understand the impact of runtime uncertainties (for example, failure, load-balancing efficiency, and overloading) on performance without understanding the whole platform’s complexity.

Elastic Scheduling and Runtime Adaptation

The elastic scheduling of microservices is a complex research problem due to several runtime uncertainties.

First, it’s difficult to estimate microservice workload behavior in terms of request arrival rate, type, and processing time distributions; I/O system behavior; and number of users connecting to different types and mixes of microservices. The real challenge in devising microservice-specific workload models is to accurately learn and fit statistical functions to the monitored distributions such as request arrival patterns, CPU usage patterns, memory usage patterns, I/O system behaviors, request processing time distributions, and network usage patterns.

Without knowing the workload behaviors of microservices, it’s difficult to make decisions about the types and scale of datacenter resources to be provisioned to microservices at any given time. Furthermore, the availability, load, and throughput of datacenter resources can vary in unpredictable ways, due to failure or congestion of network links.

Kubernetes offers a microservice container reconfiguration feature, which scales by observing CPU usage (its elasticity is agnostic to the workload behavior and performance targets of the microservice). Amazon’s autoscaling service employs simple threshold-based rules or scheduled actions based on a timetable to regulate infrastructural resources (for example, if the average CPU usage is above 40 percent, add another microservice container). Other cloud providers have implemented similar simple rule-based reactive runtime scheduling techniques: Google’s Cloud Platform autoscaler, Rackspace’s Auto Scale, Azure’s Fabric Controller, and IBM’s Softlayer autoscale.

To the best of our knowledge, no prior work has developed workload and resource performance prediction models to enable reconfiguration (scaling, descaling, and migration) of microservices on cloud datacenters while ensuring microservice-specific performance targets. Hence, important new research is investigating predictive workload and performance models to forecast workload input and performance metrics across multiple, collocated microservices deployed on cloud datacenter resources.
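The threshold rule described in this section (add a container when average CPU usage exceeds 40 percent) can be sketched as follows. This is only an illustration of rule-based reactive scaling in general. The function name, the scale-in threshold of 10 percent, and the replica bounds are assumptions made for the example and do not reflect any provider's actual API or defaults.

```python
from statistics import mean


def desired_replicas(cpu_samples, current, scale_out_at=40.0, scale_in_at=10.0,
                     min_replicas=1, max_replicas=10):
    """Return the replica count a simple reactive threshold rule would choose.

    cpu_samples: recent per-container CPU usage percentages (0-100).
    Only scale_out_at=40 comes from the example in the text; the other
    defaults are hypothetical.
    """
    if not cpu_samples:
        return current  # no monitoring data, so leave the deployment alone
    average = mean(cpu_samples)
    if average > scale_out_at:
        return min(current + 1, max_replicas)
    if average < scale_in_at:
        return max(current - 1, min_replicas)
    return current


if __name__ == "__main__":
    print(desired_replicas([55.0, 62.0, 48.0], current=3))  # average is 55, so scale out
    print(desired_replicas([5.0, 7.0], current=3))          # average is 6, so scale in
    print(desired_replicas([20.0, 30.0], current=3))        # within thresholds, unchanged
```

The rule looks only at CPU usage, which is exactly the limitation the text points out: it knows nothing about request latency or any other microservice-level performance target.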

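As one small instance of fitting a statistical function to a monitored distribution, the sketch below estimates a request arrival rate from observed timestamps by assuming exponentially distributed inter-arrival times (a Poisson arrival process). That modeling assumption is ours, chosen for simplicity; real microservice workloads may need richer models. The function names are hypothetical.

```python
import math


def fit_arrival_rate(timestamps):
    """Maximum-likelihood rate (requests per second) for exponential inter-arrivals.

    For an exponential distribution the MLE of the rate is 1 / mean gap.
    timestamps: request arrival times in seconds, in ascending order.
    """
    if len(timestamps) < 2:
        raise ValueError("need at least two arrivals to estimate a rate")
    gaps = [b - a for a, b in zip(timestamps, timestamps[1:])]
    if any(g < 0 for g in gaps):
        raise ValueError("timestamps must be sorted in ascending order")
    total = sum(gaps)
    if total == 0:
        raise ValueError("all arrivals share one timestamp; rate is undefined")
    return len(gaps) / total


def prob_more_than(rate, window, k):
    """P(more than k arrivals within `window` seconds) under a Poisson model."""
    expected = rate * window
    cdf = sum(math.exp(-expected) * expected ** i / math.factorial(i)
              for i in range(k + 1))
    return 1.0 - cdf


if __name__ == "__main__":
    arrivals = [0.0, 0.5, 1.0, 2.0, 2.5, 4.0]
    rate = fit_arrival_rate(arrivals)
    print(f"estimated rate: {rate:.2f} requests/s")
    print(f"P(>3 arrivals in 1 s): {prob_more_than(rate, 1.0, 3):.3f}")
```

A fitted rate of this kind could feed a predictive scheduler, for example by provisioning enough containers to keep the probability of a burst above capacity below a chosen bound.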

Evolution of Microservice-Powered Cloud Paradigms

Wide-scale adoption of containerization technologies and microservices architectures will strongly influence other emerging computing paradigms.

Cloud Computing and Internet of Things

The combination of cloud computing and the IoT is presenting new opportunities for delivering new types of application services (see Figure 2). For example, private, public, and hybrid cloud providers are looking to integrate their datacenters’ software and hardware stacks with embedded devices (including sensors and actuators) to provide IoT as a service (IoTaaS).

Typically, IoT devices run customized software developed with a particular programming language and/or development framework. Minimal processing and storage tasks can be performed in IoT devices (for example, a sensor gateway or SDN virtualization) by deploying lightweight, containerized microservices.[14,15] Meanwhile, the massive data storage and processing tasks (data mining and big data analytics) are performed in cloud datacenters that exploit virtualization (both hypervisor and container-based) to elastically scale up/down storage and processing capabilities.

[Figure 2 diagram. Labels: IoT Cloud Provider; Storage and Processing Services (S1 ... Sn; C1, C2; VM1 to VM4); Sensing and Actuating Services (SA1 ... SAm; C1 to C5).]

Figure 2. A microservice as the enabler for the IoT application cloud. IoT applications are decomposed into a collection of microservices, which are distributed across physical hardware resources available in the cloud and on the network edge.

Federated Clouds

The cloud services market has been growing in recent years, a trend that’s confirmed by the number of cloud providers that have appeared on the market. Currently, small and medium cloud providers can’t directly compete with the big players (such as Google, Amazon, and Microsoft), so they must implement new business strategies to penetrate the market.[16,17]

In particular, small and medium providers can establish stronger partnerships to share resources according to the rules of the cloud federation ecosystem they belong to. Small providers can federate with large providers to gain economies of scale, optimize their assets, scale their capabilities, and share resources to establish new forms of collaboration. If a small provider’s cloud runs out of capacity, it can migrate its microservices to federated datacenters to ensure business continuity (see Figure 3).

However, federated clouds need to respond to high heterogeneity across independent cloud systems, efficient and secure data exchange among clouds, and the ability to efficiently deploy resources and services across such federated systems. Indeed, the dynamism of a federation with incoming and outgoing providers and variable resource availability makes microservices and containers the best solution to quickly adapt to changes in the federated system.

[Figure 3 diagram. Labels: cloud users (enterprise, government); home cloud services (IaaS, PaaS, SaaS) on servers 1 to N; a cloud federation of the home cloud and foreign clouds A and B, each with its own virtual infrastructure; virtual resources owned by the home cloud; virtual resources used by foreign clouds A and B; virtual resources placed in foreign clouds A and B and rented to the home cloud (home cloud capabilities enlargement).]

Figure 3. Microservice as the basis of federating multiple cloud datacenters as part of a cohesive federation, where datacenter providers can meet the performance requirements of client applications through optimal placement and migration of microservices across datacenters.

Microservices will simplify orchestration of networked applications across heterogeneous cloud datacenters and emerging microdatacenters (on the network edge). However, the creation of such applications (for example, smart city and smart healthcare IoT clouds) requires new research into scheduling and resource management algorithms and platforms for managing highly distributed and networked microservices.

References

1. A. Sill, “The Design and Architecture of Microservices,” IEEE Cloud Computing, vol. 3, no. 5, 2016, pp. 76–80.
2. C. Pahl and B. Lee, “Containers and Clusters for Edge Cloud Architectures: A Technology Review,” Proc. 3rd Int’l Conf. Future Internet of Things and Cloud (FiCloud), 2015, pp. 379–386.
3. M. Xavier et al., “Performance Evaluation of Container-Based Virtualization for High Performance Computing Environments,” Proc. 21st Euromicro Int’l Conf. Parallel, Distributed, and Network-Based Processing (PDP), 2013, pp. 233–240.
4. C. Esposito, A. Castiglione, and K.-K.R. Choo, “Challenges in Delivering Software in the Cloud as Microservices,” IEEE Cloud Computing, vol. 3, no. 5, 2016, pp. 10–14.
5. R. Ranjan et al., “Cross-Layer Cloud Resource Configuration Selection in the Big Data Era,” IEEE Cloud Computing, vol. 2, no. 3, 2015, pp. 16–22.
6. M. Caballer et al., “Dynamic Management of Virtual Infrastructures,” J. Grid Computing, vol. 13, Mar. 2015, pp. 53–70.
7. W. Felter et al., “An Updated Performance Comparison of Virtual Machines and Linux Containers,” Proc. IEEE Int’l Symp. Performance Analysis of Systems and Software (ISPASS), 2015, pp. 171–172.
8. M.K. Qureshi and Y.N. Patt, “Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches,” Proc. 39th Ann. IEEE/ACM Int’l Symp. Microarchitecture (Micro 06), 2006, pp. 423–432.
9. Y. Xie and G.H. Loh, “Pipp: Promotion/Insertion Pseudo-Partitioning of Multi-Core Shared Caches,” Proc. 36th Ann. Int’l Symp. Computer Architecture (ISCA 09), 2009, pp. 174–183.
10. S. Govindan et al., “Cuanta: Quantifying Effects of Shared On-Chip Resource Interference for Consolidated Virtual Machines,” Proc. 2nd ACM Symp. Cloud Computing (SOCC 11), 2011, article 22.
11. R. Nathuji and A. Kansal, “Q-Clouds: Managing Performance Interference Effects for QoS-Aware Clouds,” Proc. 5th European Conf. Computer Systems (EuroSys 10), 2010, pp. 237–250.
12. R. Ranjan, “Streaming Big Data Processing in Datacenter Clouds,” IEEE Cloud Computing, vol. 1, no. 1, 2014, pp. 78–83.
13. M. Natu et al., “Holistic Performance Monitoring of Hybrid Clouds: Complexities and Future Directions,” IEEE Cloud Computing, vol. 3, no. 1, 2016, pp. 72–81.
14. A. Celesti et al., “Exploring Container Virtualization in IoT Clouds,” Proc. 2016 IEEE Int’l Conf. Smart Computing (SmartComp), 2016, pp. 1–6.
15. M. Fazio and A. Puliafito, “Cloud4sens: A Cloud-Based Architecture for Sensor Controlling and Monitoring,” IEEE Comm, vol. 53, Mar. 2015, pp. 41–47.
16. M. Assis and L. Bittencourt, “A Survey on Cloud Federation Architectures: Identifying Functional and Non-functional Properties,” J. Network and Computer Applications, vol. 72, 2016, pp. 51–71.
17. A. Celesti et al., “Characterizing Cloud Federation in IoT,” Proc. 30th Int’l Conf. Advanced Information Networking and Applications Workshops (WAINA), 2016, pp. 93–98.

Maria Fazio is an assistant researcher of computer science at the University of Messina. Her research interests include distributed systems and wireless communications, especially with regard to the design and development of cloud solutions for IoT services and applications. Fazio has a PhD in advanced technologies for information engineering from the University of Messina. Contact her at [email protected].

Antonio Celesti is a postdoctoral researcher at the University of Messina. His research interests include distributed systems and cloud computing, with particular regard to federation, storage, security, energy efficiency, and assistive technology. Celesti has a PhD in advanced technology for information engineering from the University of Messina, Italy. Contact him at [email protected].

Rajiv Ranjan is a reader in the School of Computing Science at Newcastle University, UK; chair professor in the School of Computer, Chinese University of Geosciences, Wuhan, China; and a visiting scientist at Data61, CSIRO, Australia. His research interests include grid computing, peer-to-peer networks, cloud computing, Internet of Things, and big data analytics. Ranjan has a PhD in computer science and software engineering from the University of Melbourne (2009). Contact him at [email protected] or http://rajivranjan.net.

Lydia Y. Chen is a research staff member at the IBM Zurich Research Lab, Zurich, Switzerland. Her research interests include modeling and optimizing performance and dependability for big data applications and highly virtualized datacenters. She received a PhD in operations research from the Pennsylvania State University. Contact her at [email protected].

Chang Liu is a research fellow (assistant professor) at Newcastle University, UK. His research interests include cloud computing, big data, distributed systems, Internet of Things, and information security and privacy. Liu has a PhD in information technology from the University of Technology, Sydney, Australia. Contact him at [email protected].

Massimo Villari is an associate professor of computer science at the University of Messina. His research interests include cloud computing, Internet of Things, big data analytics, and security systems. Villari has a PhD in computer engineering from the University of Messina. He’s a member of IEEE and IARIA boards. Contact him at [email protected].

88 IEEE Cloud Computing www.computer.org/cloudcomputing