The Unobservability Problem in Clouds

Anshul Gandhi†, Parijat Dube∗, Alexei Karve∗, Andrzej Kochut∗, Harsha Ellanti†
†Stony Brook University, ∗IBM Research

Abstract—The cloud is not transparent. Users of cloud computing cannot control or monitor important information about their VMs or services, such as placement, true resource allocation, virtualization overhead, etc. Likewise, cloud providers cannot obtain important information about their users' deployment, such as the application model, the role of each VM, etc. While such information is not required to be revealed, we claim that this lack of information prevents users from fully understanding their resource availability, and limits the feasibility of various performance management solutions. We refer to this lack of information as the "Unobservability" problem.

In this paper, we describe the unobservability problem and present various use cases from our experience managing a medium-scale cloud deployment with several hundred VMs and experiments on EC2 that highlight the severe impact of unobservability on performance, and the limitations it imposes on users and cloud providers. We show that, interestingly, unobservability often diminishes the potential benefits of cloud computing. To address unobservability, we present and evaluate a practical solution that reveals important unobservable information without requiring any instrumentation or changes to the cloud.

I. INTRODUCTION

Cloud computing offers several benefits over traditional physical deployments, such as low cost, elasticity, high availability, and access to useful services such as automated deployment and monitoring. Importantly, the ability to pay as you go makes it lucrative for small and medium businesses to economically deploy their dynamic applications in the cloud. However, cloud computing does have its shortcomings.

In this paper, we focus on the "unobservability" problem, which relates to the inability of those involved in cloud computing, such as Cloud Service Providers (CSPs) and users, to observe each others' configuration and settings. In other words, we focus on the lack of transparency in the cloud. The unobservability problem typically manifests itself by restricting the information available to users and CSPs. For example, cloud users typically do not know the identity of the physical servers that are hosting their VMs or the specifics of the virtualization technology used by the hosts. Likewise, CSPs are often not aware of the services and software employed by the cloud users in their VMs or the role of each VM in the user's application deployment.

A good example of the challenges related to unobservability in clouds is running applications in the IBM Bluemix [1] environment. IBM Bluemix is a cloud application development platform allowing users to rapidly develop, test, and publish cloud applications. It is powered by the IBM IaaS Cloud [2] and also offers base IaaS services such as OpenStack-based VMs [3]. Because of the well-established cloud layering principles, applications written in Bluemix do not have insight into the infrastructure specifics on which they execute, such as the identity and configuration of the physical machine on which they are hosted and the resource contention at the host. Obtaining such insights and information could help developers predict and optimize the performance of their applications.
Note that the cloud is not required to be transparent. In fact, by design, the cloud is expected to be opaque to some extent. The opaque nature of the cloud allows CSPs to offer low-cost services to users by independently managing VM placement and resource allocation as aggressively as needed, without revealing their decision logic. Likewise, users do not have to reveal any information about their intended use of cloud resources, thus protecting them from CSPs and other users. However, this lack of transparency limits the full potential of the cloud. We present various examples using public and private clouds that highlight the severity and spread of unobservability in clouds. We find that, interestingly, unobservability often hampers the many benefits of cloud computing, such as elasticity and availability.

While the unobservability problem is more widespread in public clouds, it also affects private clouds. Users in a managed dedicated cloud, such as IBM's Cloud OpenStack Services [4], which runs on SoftLayer [2] Bare Metal Servers, typically cannot control their VM placement and scheduling. Of course, the private cloud administrator can expose such functionality to specific users, if needed. Likewise, the CSP cannot observe all user operations that are being performed within the VMs. Our experience managing a medium-scale private OpenStack-deployed cloud (with KVM hypervisors) has revealed several cases of unobservability that significantly impact performance.

We assert that addressing, even partially, the unobservability problem can significantly improve cloud adoption among users. As a concrete example, consider the autoscaling functionality that helps users leverage the elastic nature of cloud computing. Many CSPs today, including Amazon [5] and RightScale [6], and cloud software solutions such as OpenStack [7], offer rule-based solutions (not necessarily for free) to users for autoscaling their deployments. Such rule-based autoscaling solutions place the burden of determining the scaling thresholds and rules on the users, as the CSPs do not have any information about the users' deployments. If the unobservability problem can be addressed, however, then CSPs can offer a fully automated autoscaling functionality to users.

Unfortunately, the unobservability problem is difficult to address because of the nature of clouds. For example, while instrumenting user VMs can improve visibility for CSPs, it is not desirable, especially for privacy reasons. Fortunately, it is possible to infer some missing information by analyzing the performance of cloud-deployed VMs. We describe one such inference solution to the unobservability problem in this paper, and employ it to enable CSP-driven autoscaling of user applications. Importantly, our inference solution does not require any changes to the cloud platform or user deployment.

The rest of the paper is organized as follows. We describe the unobservability problem in detail in Section II. We then present, in Sections III, IV, V, and VI, several use cases based on our experience with a medium-scale OpenStack-deployed cloud platform (hosting several hundred VMs) and our experiments on EC2 that highlight the unobservability problem in public and private clouds. We then present, in Section VII, a potential solution to the unobservability problem based on inference techniques that requires no changes to the cloud platform or application. We discuss related work in Section VIII, and present our conclusions in Section IX.

II. THE UNOBSERVABILITY PROBLEM

The unobservability problem refers to the lack of visibility and control among the entities in a cloud computing environment. Consider the cloud deployment example in Figure 1.

Fig. 1. Abstract cloud model with the three distinct entities.

Here, there are three distinct entities that interact with the cloud. The first is the CSP, for example, AWS [8], Microsoft Azure [9], and Google Cloud Platform [10]. The second is the cloud user, for example, Netflix [11] and BestBuy [12], who purchase services, such as VMs or storage, from the CSPs. The third is the end user of these customers, for example, Netflix customers, who contract various cloud-based services through cloud users. Note that end users could also directly purchase services from CSPs. Based on this model, there are at least four different types of unobservability problems that can be identified:

• (Type I) Cloud user-cloud user: Cloud users whose deployments are co-located on the same physical machines are typically unaware of each others' resource consumption or application model. This unobservability often leads to physical resource contention and performance interference, and is an important research topic in itself [13], [14], [15].

• (Type II) Cloud user-CSP: In this case, the cloud user cannot observe the configuration and control logic used by the CSP. For instance, the cloud user is typically unaware of (and has limited control over) the placement of her VMs on the physical machines in the cloud, is unaware of the exact amount of resources allocated to her VMs at all times (due to contention), and is also unaware of the decision logic that guides the placement and allocation of new VMs.

• (Type III) CSP-cloud user: The CSP typically does not encroach into the cloud user's deployment. For example, the CSP cannot observe the software and services employed by the user, or their configuration settings. However, the CSP can monitor a few parameters, such as resource utilization. The lack of transparency in this case limits the assistance that the CSP can provide to the cloud user (see Section VI-A). Note that Type III and Type II unobservability occur simultaneously in the same scenario.

• (Type IV) End user-cloud user: End users who employ services offered by cloud users face the same unobservability problems as experienced by cloud users in Type II. That is, the end user cannot observe the model and deployment of the cloud user. Thus, the end user cannot determine how the cloud service will respond in case of a load spike or a gradual increase in traffic. Likewise, the end user cannot ascertain if and when the service will be unavailable or slow due to, for instance, maintenance or resource contention.

In the following sections, we present several use cases that highlight the Type I, Type II, and Type III unobservability problems. A good example of Type IV unobservability is discussed in Deng et al. [16].

III. PERFORMANCE VARIATION (TYPE I)

In this section, we present our empirical results illustrating the variability in performance experienced by public cloud VMs. We set up 15 micro instances on EC2 [17], 5 each in the N. Virginia, Oregon, and N. California regions. We then run a CPU-bound microbenchmark repeatedly on all 15 VMs over the span of a few days, and record the execution times.

Figure 2 shows the various percentiles of execution times recorded for EC2, broken down by region. The percentage values above the bars represent the increase in execution time relative to the 50%ile (median) value.

Fig. 2. Performance variation on EC2. Public cloud VMs exhibit significant variation in performance, probably due to resource contention from (unobservable) colocated VMs.

We see that the performance varies significantly: the top 5% of execution times are a factor of 1.5 larger than the median, and the top 1% are almost a factor of 3 larger than the median. Ideally, there should be no difference in execution time across percentiles. However, even if we allow for some minor variation, the substantial difference in execution times at the higher percentiles cannot be easily explained. We attribute this variation in performance to possible resource contention due to colocated users on these instances. Similar performance degradation has been reported in prior work [15], [18], [14]. These are examples of Type I unobservability (cloud user-cloud user).
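To make the measurement methodology concrete, the following is a minimal sketch of how such a percentile analysis can be carried out on each VM. The benchmark body, sample count, and percentile levels are illustrative choices of ours, not the exact microbenchmark used in this study.

```python
import time

def cpu_bound_task(n=2_000_000):
    """A simple CPU-bound microbenchmark: sum of square roots."""
    total = 0.0
    for i in range(1, n):
        total += i ** 0.5
    return total

def percentile(sorted_samples, p):
    """Return the p-th percentile (0-100) via nearest-rank."""
    k = round(p / 100 * (len(sorted_samples) - 1))
    return sorted_samples[max(0, min(len(sorted_samples) - 1, k))]

# Run the benchmark repeatedly and record wall-clock execution times.
samples = []
for _ in range(200):
    start = time.perf_counter()
    cpu_bound_task()
    samples.append(time.perf_counter() - start)

samples.sort()
median = percentile(samples, 50)
for p in (50, 95, 99):
    v = percentile(samples, p)
    print(f"{p}%ile: {v:.3f}s  ({v / median:.2f}x median)")
```

Comparing the 95th and 99th percentiles against the median, as in Figure 2, surfaces the contention-induced tail that an isolated machine would not show.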
IV. VIRTUALIZATION OVERHEAD (TYPE II)

We now illustrate several examples of Type II unobservability (cloud user-CSP) by analyzing the overhead of virtualization and its non-trivial effects on VM performance, based on our experience managing a multi-node OpenStack deployment. For this case study, we use our SoftLayer-hosted OpenStack deployment with 24 compute nodes (KVM hypervisors), named kvm001-kvm024, and 3 controllers, named ops001-ops003. Each compute node has 32 Intel(R) Xeon(R) E5-2650 v2 (2.60GHz) CPUs and 128GB RAM.

A. Networking overhead

The network traffic created by VMs can impact the CPU availability of the compute nodes in a non-trivial manner. To examine the effect of VM network traffic on compute node resources, we run the network-intensive Netperf benchmark [19] on 375 client and server pairs, for a total of 750 VMs (each VM is allocated 1 CPU), deployed over time in batches. Under this high network load, in addition to the qemu-kvm processes, there are corresponding vhost processes running on the compute nodes that handle (among other things) the virtual network traffic, and these consume significant CPU, as shown in Figure 3.

Fig. 3. When VM network traffic is high, the hypervisor vhost processes consume significant CPU, thereby affecting the CPU capacity available to the VMs.

Due to the high vhost overhead, each VM effectively uses 2 CPUs (1 for the VM, 1 for vhost). This dynamic additional CPU usage is not accounted for by the scheduler, resulting in possible over-allocation of resources. Due to Type II unobservability, future VM users on these compute nodes will receive compromised CPU resources. Note that existing VMs also experience some performance degradation due to the CPU contention at the compute nodes.
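The vhost overhead can be quantified from the hypervisor side by attributing CPU time to the qemu-kvm processes and their vhost helper threads. The sketch below does this by scanning /proc on a Linux host; the process-name prefixes are an assumption about how these threads appear on our KVM compute nodes.

```python
import os

def read_cpu_seconds(pid):
    """Return (comm, user+system CPU seconds) from /proc/<pid>/stat."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # comm is parenthesized and may contain spaces; split around it.
    comm = data[data.index("(") + 1:data.rindex(")")]
    fields = data[data.rindex(")") + 2:].split()
    utime, stime = int(fields[11]), int(fields[12])  # stat fields 14 and 15
    return comm, (utime + stime) / os.sysconf("SC_CLK_TCK")

# Sum CPU time consumed by VM processes vs. their vhost helper threads.
totals = {"qemu-kvm": 0.0, "vhost": 0.0}
for pid in filter(str.isdigit, os.listdir("/proc")):
    try:
        comm, secs = read_cpu_seconds(pid)
    except (FileNotFoundError, ProcessLookupError, ValueError):
        continue  # process exited while we were scanning
    for prefix in totals:
        if comm.startswith(prefix):
            totals[prefix] += secs

print(f"qemu-kvm CPU: {totals['qemu-kvm']:.0f}s, vhost CPU: {totals['vhost']:.0f}s")
```

Sampling these totals periodically would reveal the "1 CPU for the VM, 1 CPU for vhost" pattern described above, which the scheduler itself never sees.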
As another example, consider the case where multiple VMs are booted simultaneously. Before a VM can boot with a specified image, the image must be copied from the repository by the controller to the compute node hosting the VM. In this case, the controller becomes a bottleneck when the glance-api process is busy copying images (especially large images) from glance to the compute nodes during cold start. We illustrate this example by simultaneously booting 30 VMs whose (30GB) images were pre-loaded with the DayTrader [20] benchmark application. Figure 4 shows the high network traffic transmitted over eth1 from one of the controller nodes to the compute nodes. Note that each of the 3 controllers copies images to 8 compute nodes, for a total of 24 compute nodes.

Fig. 4. High network traffic at the controller due to image transfers can significantly slow down other controller operations, thus affecting the performance of hosted VMs.

The cold start (for a simultaneous boot) takes anywhere between 20 minutes for the start of the first VM instance and 45 minutes for the start of the last VM instance. During this entire time, the controller is network-bottlenecked, significantly slowing down any other tasks that the controller is responsible for, such as the OpenStack API services and schedulers.

B. I/O overhead

The I/O activity of VMs can likewise affect the compute nodes hosting the VMs (in addition to affecting colocated VMs). To analyze the I/O overhead, we run the Filebench benchmark [21] on several VMs. We sequentially launch 220 VMs, and run the benchmark on the provisioned VMs. The iowaits on the compute nodes when running Filebench overwhelm the server, delaying the launch of new VMs. Figure 5 shows the provisioning time, in order, of all 220 VMs. Here, "VM ready" indicates the time when OpenStack considers the VM to be ready, and "Network ready" indicates the time when we can successfully ssh into the VM. We see that the newer VMs (on the right) take much longer to boot, almost a factor of 5 longer than the first few VMs, due to the server being overwhelmed by iowaits. The delay in launching new VMs is caused by the overhead involved in servicing multiple VM I/O requests (Filebench) simultaneously.

Fig. 5. I/O-intensive applications running on the VMs can overwhelm the hypervisor and affect cloud operations, such as delaying the launch of new VMs.

As another example, consider the case where multiple VMs are provisioned on a single compute node. We simultaneously boot 30 VMs on a single node using a RedHat 6.5 image. While the image is being expanded to the 60GB instance disk size, multiple (interleaved) I/O writes to the physical disk occur simultaneously. Figures 6(a) and 6(b) show the aggregate I/O write rate and the CPU usage at the compute node during the provisioning of the 30 VMs. We see that the node spends a significant fraction of time waiting for I/Os to complete. During this time, the CPU is under-utilized. The VMs take as much as 30 minutes to boot because of the slow I/O; by contrast, a single (isolated) RedHat image VM boots in only 70s.

Fig. 6. I/O overhead due to hypervisor activities can lead to wasted CPU cycles. Figure 6(a) shows the high I/O activity at the hypervisor due to the simultaneous provisioning of several VMs, resulting in under-utilization of the CPU as depicted by Figure 6(b).
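The iowait pressure underlying Figures 5 and 6 can be observed directly from /proc/stat on the compute node. Below is a minimal sketch of such a probe; the 5-second sampling interval is an arbitrary choice of ours.

```python
import time

def cpu_times():
    """Parse the aggregate 'cpu' line of /proc/stat into named fields."""
    with open("/proc/stat") as f:
        parts = f.readline().split()
    names = ("user", "nice", "system", "idle", "iowait", "irq", "softirq")
    return dict(zip(names, map(int, parts[1:8])))

# Sample twice and report the fraction of time the node spent in iowait:
# a high fraction during mass provisioning indicates the I/O bottleneck
# that delays VM launches, as in Figures 5 and 6.
a = cpu_times()
time.sleep(5)
b = cpu_times()
delta = {k: b[k] - a[k] for k in a}
total = sum(delta.values())
print(f"iowait: {100 * delta['iowait'] / total:.1f}% of CPU time")
```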
C. Overhead of controller processes

In OpenStack, cloud management processes (daemons) such as openstack-nova-compute and neutron-dhcp-agent may be installed and running on the compute nodes. The scheduler does not account for their (phantom) resource usage, and this overhead can impair the resources allocated to the VMs deployed on these nodes.

V. LIMITED CONTROL OVER VM PLACEMENT AND RESOURCE ALLOCATION (TYPE II)

The private cloud is not immune to the unobservability problem. Common examples of the lack of transparency in private clouds include the VM placement logic and resource allocation, which we discuss below. In general, private clouds can be just as vulnerable to unobservability problems as public clouds. However, the private cloud can have more transparency, as the users and cloud administrator(s) are typically from the same or affiliated organizations. In our experience managing an OpenStack-based private cloud at IBM, we find that there are some significant unobservability problems, specifically of Type II (cloud user-CSP), that affect private clouds.

A. VM placement

OpenStack users can launch new VMs by issuing nova boot commands. Unfortunately, the placement of VMs on physical nodes cannot be controlled by users. The placement is either dictated by the OpenStack scheduler logic (round-robin, by default), or is manually enforced by the administrator (via OpenStack directives such as HostAggregates and AvailabilityZones). While the default round-robin policy helps balance load among the available compute nodes, it can have undesirable performance effects for users. For instance, the round-robin policy does not help users who wish to launch multiple VMs on the same compute node to reduce network latency. Likewise, users who wish to launch additional VMs on the same compute nodes as their previous VMs cannot do so unless they know exactly which compute node their previous VMs were on. Because of Type II unobservability, such users cannot optimize their placement.

Consider the concrete example where we wish to provision multiple DayTrader application VMs sequentially on a private cloud. For this, we use our 30GB custom DayTrader image. The first VM that we launch takes about 4 minutes to boot because of the cold start required to copy the 30GB image from the glance repository to the target compute node. Now, if successive VMs are launched on this same compute node (which already has a copy of the image), the boot time is only about 40 seconds. However, because of the default round-robin scheduling policy, our successive VMs are launched on new compute nodes, thus requiring the full 4 minutes of provisioning time each.
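The cost of this unobservability can be seen with a small back-of-the-envelope model using the boot times measured above (roughly 240s cold, 40s warm). The cache-aware policy below is a hypothetical alternative for illustration, not an existing OpenStack scheduler.

```python
# Boot-time model from the DayTrader example: ~240s when the 30GB image
# must first be copied to the node (cold), ~40s when the node already
# caches the image (warm).
COLD_BOOT_S, WARM_BOOT_S = 240, 40

def total_boot_time(num_vms, num_nodes, reuse_cached_node):
    cached = set()  # nodes that already hold a copy of the image
    total = 0
    for i in range(num_vms):
        if reuse_cached_node and cached:
            node = next(iter(cached))  # place on a node with the image
        else:
            node = i % num_nodes       # default round-robin placement
        total += WARM_BOOT_S if node in cached else COLD_BOOT_S
        cached.add(node)
    return total

# 10 sequential VMs on a 24-node cloud:
print("round-robin: ", total_boot_time(10, 24, False), "s")  # 10 cold boots
print("cache-aware: ", total_boot_time(10, 24, True), "s")   # 1 cold + 9 warm
```

Under these assumptions, round-robin placement costs 2400s of provisioning time versus 600s for the cache-aware alternative, a 4x difference that the user can neither observe nor influence.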

B. Resource allocation

Cloud users can typically choose their instance size only from a selection of pre-configured "flavors". These flavors describe the number of CPU cores, the amount of memory, and the disk size that the VM will have. Unfortunately, VMs with the same flavor can have very different configurations due to differences in the hardware specifications of the underlying compute nodes. For example, our private cloud consists of two different types of compute nodes: one with 4 cores at 3.4GHz, and the other with 8 cores at 2GHz. When a user launches a 2-core flavor VM, then depending on the scheduling logic, the VM might be launched on the 3.4GHz node or the 2GHz node. Thus, the configuration (and performance) of the VM will differ depending on the compute node that it is launched on. Consider another example where we have only one type of compute node, but the node has hyper-threading, which enables 8 threads on 4 cores. If a user requests a 1-core flavor VM, then depending on availability, the VM might be launched on a hyperthreaded core that is being shared (two threads on a core) with other VMs. Thus, configuration can be affected even if the cloud has homogeneous hypervisors.
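A user can partially detect such heterogeneity from inside the VM, since KVM guests often see a host-derived CPU model string in /proc/cpuinfo. The sketch below reads it; note that this visibility is itself an assumption, as some hypervisor configurations expose only a generic virtual CPU model.

```python
def guest_cpu_info():
    """Read the CPU model and logical core count as visible inside the VM."""
    model, cores = "unknown", 0
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("model name"):
                model = line.split(":", 1)[1].strip()
                cores += 1  # one "model name" line per logical CPU
    return model, cores

model, cores = guest_cpu_info()
print(f"{cores} vCPU(s) of '{model}'")
# On the two node types above, the same 2-core flavor would report,
# e.g., a 3.4GHz model string on one node and a 2GHz string on the other.
```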
VI. INEFFICIENT MANAGEMENT SOLUTIONS (TYPE III)

We now present more complex case studies that highlight the limitations imposed by unobservability on management solutions. Such unobservability problems significantly impact the efficacy of performance management solutions for cloud deployments. We present two such scenarios, namely cloud autoscaling and virtual desktop services, in the following subsections. We then present a solution in Section VII that can help address some of the challenges that unobservability poses for management solutions.

A. Cloud autoscaling

Autoscaling is a functionality that automatically scales (up or down) an application deployment in response to variations in workload so as to meet user-specified performance targets. Cloud users are typically interested in autoscaling their cloud-deployed applications while minimizing their VM rental costs. However, due to Type II unobservability (cloud user-CSP), new VMs launched by users might not provide the expected level of performance (see Sections V-A and V-B). Further, determining the autoscaling rules and thresholds is a challenging task [22], [23] that requires a deep understanding of the application and the performance modeling expertise necessary to develop the scaling rules. Small and medium businesses and casual cloud users typically lack the resources required to overcome these hurdles. By contrast, the CSP is ideally suited to schedule and place new VMs on the user's behalf, as it can monitor the resource consumption of all hypervisors and typically employs several system administrators to manage resource allocation. Unfortunately, due to Type III unobservability (CSP-cloud user), the CSP cannot determine when to scale the user's deployment [24], [25]. Consequently, CSP-offered autoscaling solutions today require the user to determine the scaling thresholds and rules, thus placing the burden of developing the scaling logic on the users.

Clearly, in this scenario, even a limited amount of information sharing between the users and the CSP can greatly improve the efficacy of cloud autoscaling. In Section VII, we present a solution that leverages limited information sharing to improve the efficacy of cloud autoscaling without requiring any significant changes to the cloud or the user application.
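To make the users' burden concrete, the following is a minimal sketch of the kind of rule-based policy loop such services implement. The thresholds, cooldown, and the monitoring/provisioning hooks are hypothetical user-supplied inputs; choosing them well is precisely the expertise discussed above, which unobservability prevents the CSP from supplying.

```python
import time

# User-supplied rules: these are the thresholds the user must somehow
# derive, since the CSP knows nothing about the application's tiers.
SCALE_OUT_UTIL = 0.70  # add a VM above 70% average CPU utilization
SCALE_IN_UTIL = 0.30   # remove a VM below 30% average CPU utilization
COOLDOWN_S = 300       # wait between scaling actions to avoid oscillation

def autoscale_loop(get_avg_cpu_util, add_vm, remove_vm, num_vms):
    """get_avg_cpu_util/add_vm/remove_vm are hypothetical hooks into the
    cloud's monitoring and provisioning APIs; num_vms is the fleet size."""
    while True:
        util = get_avg_cpu_util()
        if util > SCALE_OUT_UTIL:
            add_vm()
            num_vms += 1
        elif util < SCALE_IN_UTIL and num_vms > 1:
            remove_vm()
            num_vms -= 1
        time.sleep(COOLDOWN_S)
```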
B. Virtual desktop service

Consider a virtual desktop service, such as VMware Horizon [26]. This service allows end users to connect to VMs running desktop operating systems on servers in a remote data center. The only device local to the user is a "thin client": an inexpensive machine with limited computing power, designed to render screen images received from the desktop server and forward keyboard and mouse events to the server. The device provides GUI interaction but does not necessarily perform any end-user computing. The data exchange between the thin client and the server is facilitated using remoting protocols such as VNC.

Since the actual computation happens on commodity servers in the remote data center, the allocated virtual resources can be reused for other tasks when desktop users are not active. This results in increased average data center utilization, thus contributing to reduced energy consumption and maintenance costs. However, reusing allocated resources requires direct access to the underlying hypervisors. In a private cloud, the resource usage of each VM can be obtained directly from the hypervisor, and any configuration changes (relative VM shares) and operation changes (power and reset) to the VMs can be relayed via available APIs. By contrast, in a public cloud environment, the CSP assumes all responsibility for the allocation of resources, the placement of the VMs, and the workload management of the physical servers. Deploying a virtualization-aware solution such as the one discussed above is not feasible in such opaque environments. Either the solution must rely on the CSP's physical resource management, or the CSP must incorporate solution-space knowledge and information in its management decisions. The latter is not feasible because of Type III unobservability.

VII. POSSIBLE SOLUTION: INFERENCE

We believe that unobservability in clouds is largely caused by business needs. We argue that limited sharing of information across the different cloud entities will not jeopardize business interests. Instead, such sharing of information can provide collective benefits to all entities. This limited sharing can be exploited to develop, for instance, solutions for the allocation and management of cloud resources, as we show in Section VII-B.

A. Inference

To enable sharing of information without requiring any changes at the user or CSP level, we propose using inference techniques. Inference techniques, such as those based on machine learning or statistical methods, estimate unobservable system parameters based on the available limited information. For example, by examining the network latency of a VM, as reported by ping, we can deduce the location of the VM in a geographically distributed public cloud such as AWS [8]. Likewise, a change in device throughput or execution time as experienced by a cloud user can provide some information about the load on colocated VMs.

B. Evaluation

1) Deducing VM location: We start with a simple evaluation. Consider a CSP that allows users to launch new VMs. If the CSP employs several geographically distributed data centers to provide capacity for hosting VMs, then the location of a newly launched VM might depend on the scheduling policy used by the CSP, which is not known to the user due to Type II unobservability (cloud user-CSP). However, by examining the ping times to the VM, one can infer the likely location of the physical server hosting the VM. For example, our experiments on Amazon EC2 [17] show that a ping time in excess of 200ms to a VM from New York suggests that the VM is located in the Asia Pacific region of EC2 (such as Singapore).
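A minimal sketch of this ping-based inference is shown below, using the system ping utility on Linux; the target address is a placeholder, and the only threshold taken from our measurements is the 200ms Asia Pacific figure.

```python
import re
import subprocess

def avg_ping_ms(host, count=5):
    """Measure the average RTT to a host using the system ping utility."""
    out = subprocess.run(["ping", "-c", str(count), host],
                         capture_output=True, text=True, check=True).stdout
    # Parse each "time=12.3 ms" sample from the ping output.
    times = [float(t) for t in re.findall(r"time=([\d.]+)", out)]
    return sum(times) / len(times) if times else float("nan")

rtt = avg_ping_ms("203.0.113.10")  # placeholder VM address (TEST-NET-3)
# Per the observation above: >200ms from New York suggests Asia Pacific.
if rtt > 200:
    print(f"RTT {rtt:.0f}ms: VM likely in an Asia Pacific region")
else:
    print(f"RTT {rtt:.0f}ms: VM likely in a closer (e.g., US) region")
```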
2) Inferring the bottleneck in a distributed system: Consider a private cloud where the cloud administrator is trying to debug the performance of a user's distributed cloud application. In this case, the administrator cannot directly monitor the user's deployment due to Type III unobservability (CSP-cloud user). However, if the user is willing to provide some information, then the CSP can infer the bottleneck VM.

We evaluate this inference approach on a small-scale cloud deployment. As the cloud user, we set up a popular web application benchmark, RUBiS [27] (Apache + Tomcat + MySQL). RUBiS supports multiple request classes, such as browse, store, and home (index.html). Now, as the CSP, we use a queueing network model with missing parameters (such as per-tier service times and end-to-end network delays) to model RUBiS. Here, the parameters are unobservable from the CSP's perspective (Type III unobservability), but are known to the user, as the user can easily benchmark her application. We employ a workload generator to create varying load for RUBiS, and collect the CPU utilization, per-class request rate, and per-class response time. We allow the CSP to observe these values; however, the CSP cannot observe the per-tier service times and network delays (limited information sharing). To infer the missing parameters, we (as the CSP) use a Kalman filter, a popular online estimation tool. Our inference approach estimates the service times for the browse request class at the Apache frontend web tier, the Tomcat application tier, and the MySQL database tier to be 1.3ms, 5.7ms, and 1.1ms, respectively. These estimates rightly suggest that the browse requests are bottlenecked at the application tier. We validate this finding by directly accessing the application and individually scaling each of the three tiers in response to increased load to determine the bottleneck tier. For the store requests, our estimated service times for the three tiers are 2.3ms, 3.1ms, and 16.9ms, respectively. These rightly suggest that the store requests are bottlenecked at the database tier. Note that the bottleneck tier for an application cannot simply be detected by looking at the CPU utilization, since the utilization at a tier is the result of multiple request classes. Further, the database tier exhibits performance degradation even at a moderately low CPU utilization of 15%, as in our experiments.

Using our inference approach, we also estimate the network delays for each of the request types. Our estimates for the end-to-end network delays for the browse and store request classes are 14.1ms and 0.5ms, respectively. A browse request involves multiple (GET) queries between the application tier VM and the database tier VM, both of which are hosted on different hypervisors; thus, the end-to-end network delay is significant. A store request involves a single (PUT) query between the application tier VM and the database tier VM, resulting in a lower delay. The fact that browse requests in the RUBiS application are bottlenecked at the application tier and incur significant network delays was also observed by previous studies (for example, Malkowski et al. [28]) that employed offline benchmarking.

The above is an example of how inference can be used to address unobservability through limited information sharing. In this case, the user allowed the CSP to observe only end-to-end performance to infer the bottleneck tier.
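To illustrate the flavor of this estimation, below is a minimal sketch of an extended Kalman filter for such a setting, simplified to a single request class with each tier approximated as an M/M/1 queue (utilization U_i = λS_i, response time R = Σ_i S_i/(1 − λS_i) + d). The noise covariances and the synthetic measurement loop are illustrative assumptions of ours; the actual system uses a multi-class queueing network model.

```python
import numpy as np

LAM = 50.0  # observed request arrival rate (req/s), visible to the CSP

def h(x):
    """Measurement model mapping hidden state to observables.
    State x = [S1, S2, S3, d]: per-tier service times (s) and network delay (s).
    Observables z = [U1, U2, U3, R]: utilizations U_i = LAM*S_i and end-to-end
    response time R = sum_i S_i/(1 - LAM*S_i) + d, treating tiers as M/M/1."""
    s, d = x[:3], x[3]
    return np.concatenate([LAM * s, [np.sum(s / (1.0 - LAM * s)) + d]])

def jacobian(x):
    """Analytic Jacobian of h at x."""
    s = x[:3]
    H = np.zeros((4, 4))
    H[:3, :3] = LAM * np.eye(3)            # dU_i/dS_i = LAM
    H[3, :3] = 1.0 / (1.0 - LAM * s) ** 2  # dR/dS_i
    H[3, 3] = 1.0                          # dR/dd
    return H

def ekf_step(x, P, z, Q=1e-9 * np.eye(4), Rn=2.5e-5 * np.eye(4)):
    """One extended Kalman filter update with a random-walk state model."""
    P = P + Q                        # predict: state assumed slowly varying
    H = jacobian(x)
    S = H @ P @ H.T + Rn             # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)   # Kalman gain
    x = x + K @ (z - h(x))           # correct with the innovation
    x[:3] = np.clip(x[:3], 1e-6, 0.9 / LAM)  # keep tiers stable (util < 90%)
    P = (np.eye(4) - K @ H) @ P
    return x, P

# Synthetic demo: the "true" parameters play the role of the user's hidden
# configuration (here, the browse-class values reported in the text).
rng = np.random.default_rng(0)
true_x = np.array([0.0013, 0.0057, 0.0011, 0.0141])
x, P = np.array([0.002, 0.002, 0.002, 0.005]), 1e-4 * np.eye(4)
for _ in range(1000):
    z = h(true_x) + rng.normal(0.0, 0.005, size=4)  # noisy monitoring sample
    x, P = ekf_step(x, P, z)
print("estimated service times (s):", x[:3], " network delay (s):", x[3])
```

The largest estimated service time identifies the bottleneck tier, mirroring how the browse and store classes were localized to the application and database tiers above.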
3) CSP-managed autoscaling: Due to Type II unobservability (cloud user-CSP), autoscaling is typically left to the users, as discussed in Section VI-A. However, inference, with the help of limited information sharing, enables the CSP to control and manage the autoscaling of a user's cloud deployment. With our inference approach above, the CSP can infer the bottleneck tier and estimate the network delays and per-tier service times for a user-deployed cloud application. If the arrival rate to the user's cloud application increases, the CSP can leverage the inferred information to suggest a scale-out of the bottleneck tier to maintain acceptable response times. Such a service (CSP-managed autoscaling) can then be offered to cloud users for an additional price. We employ this approach to successfully autoscale the application tier when the browse workload intensity increases under RUBiS.

C. Extensions

We now formalize our proposed inference approach. Figure 7 shows a solution architecture that leverages information provided by the user (application logic, performance) and the CSP (cloud infrastructure) for dynamic management of cloud resources to conform to user-specified performance SLAs.

Fig. 7. Solution architecture for CSP-managed autoscaling of user applications in the cloud via our inference approach and limited information sharing.

Whereas application performance can be concretely measured in terms of standard metrics, such as throughput and average and/or percentile end-to-end delay, the application logic is a loosely defined term. Depending on the type of application and the willingness of the user to share information, the application logic can have different meanings. For transactional workloads, the application logic can include the different tiers and their interconnection topology, and the different service classes and their flow within the tiers. In this case, multi-class queueing networks can effectively model the application and its end-to-end performance [29], [30]. For other workloads, such as data analytics, the application logic may include only per-tier delays. In this case, the application can be modeled as a black box. Statistical [31], [32], [33] and machine learning [34], [35], [36], [37] techniques can then be employed for application performance modeling.

VIII. RELATED WORK

There has been some very recent work that focuses on the unobservability problem. Kocsis et al. [38] propose running light-weight benchmarks in the background to estimate resource availability and measure platform characteristics from a user's perspective, to address Type I unobservability (cloud user-cloud user). While this approach helps mitigate unobservability, it incurs overhead due to the additional load created by the benchmarks, and only provides limited (sampled) information. IC2 [39] is a user-level solution that addresses Type I unobservability (cloud user-cloud user). IC2 monitors application performance periodically to detect interference. When interference is detected, the application parameters are reconfigured to optimize performance based on the monitored level and type of interference. There are also many other related works [15], [18], [14], [40], [41], [42], [43], [44] that focus on the unobservability problem from the perspective of a (co-located) cloud user (Type I) by measuring performance interference in cloud instances.

DeepDive [13] proposes to understand user application behavior in clouds by cloning the application setup (via JustRunIt [45]) in a sandboxed environment. DeepDive and JustRunIt focus on the unobservability problem from the CSP's perspective (Type III). The JustRunIt framework's primary limitation is its requirement of an intrusive proxy to replicate incoming user requests. Such an approach might not be feasible for customers, such as financial companies, who wish to protect their workloads. Stay-Away [46] regulates the performance of a time-sensitive application by proactively throttling other co-located batch applications. However, the authors assume that the class of the application running on a user VM is known, and that the VM performance can be throttled. Unfortunately, Type III unobservability (CSP-cloud user) makes it difficult to obtain this information from the users, unless limited information sharing is assumed.

Deng et al. [16] consider the unobservability problem from the perspective of an end user (Type IV) who wants to make use of an online cloud-based service but cannot assess the robustness of the service due to unobservability. The authors assume a simple linear model for the application response time and employ statistical methods to derive the coefficients of the model.

Solutions to the unobservability problem typically rely on modeling and inference. Nathuji et al. [15] use a MIMO feedback approach to create an online model of the application, using its VM resource allocations as input and the application performance as output. Pesto [47] employs a black-box approach combined with analytical queueing results to estimate the I/O needs of an application. Our proposed inference solution in Section VII also relies on modeling to estimate unobservable system parameters to aid CSPs and cloud users.

IX. CONCLUSION

In this paper, we present the unobservability problem in clouds and discuss its many facets. We present empirical case studies based on our experience managing a medium-scale cloud deployment at IBM consisting of several hundred VMs, and on our experiments on EC2. These case studies highlight the many issues with unobservability that users often either ignore or take for granted. We assert that at least some of these problems can be addressed by techniques that infer unobservable parameters in public and private cloud deployments.

An overarching goal of this paper is to make the case for (some form of) transparency in public clouds. We believe that there are cases where transparency helps both the users and the CSP. For example, by providing some information about the application setup to the CSP, users can benefit from better performance, since the CSP can now provide better VM placement or suggest scaling when needed. Of course, transparency can potentially expose the CSP and users to security and privacy risks that must be assessed before sharing information. We will examine security issues in future work.

The unobservability problem can also be thought of as a by-product of commercializing clouds, as it is the result of a business choice made by CSPs to deliver best-effort service at low cost to a large audience. Specific users who care more about performance and transparency will likely avoid (economical) clouds and employ (expensive) dedicated servers. However, we believe that public and private clouds can be made more appealing to such users by providing some form of transparency, and by leveraging inference solutions such as the one presented in Section VII.

REFERENCES

[1] "IBM Bluemix," https://console.ng.bluemix.net.
[2] "SoftLayer Bare Metal Servers," http://www.softlayer.com.
[3] "IBM OpenStack," https://console.ng.bluemix.net/solutions/open-architecture#vms.
[4] "IBM Cloud OpenStack Services," https://open.ibmcloud.com.
[5] "Amazon Auto Scaling," http://aws.amazon.com/autoscaling.
[6] "Auto-Scaling Arrays," http://www.rightscale.com/products/automation-engine.php.
[7] "OpenStack Heat," https://wiki.openstack.org/wiki/Heat.
[8] "Amazon Web Services," http://aws.amazon.com.
[9] "Microsoft Azure," http://azure.microsoft.com.
[10] "Google Cloud Platform," https://cloud.google.com.
[11] "5 Lessons We've Learned Using AWS," http://techblog.netflix.com/2010/12/5-lessons-weve-learned-using-aws.html.
[12] "Best Buy Slashes App Development Time and Resources with Google App Engine," https://cloud.google.com/customers/best-buy.
[13] D. Novaković, N. Vasić, S. Novaković, D. Kostić, and R. Bianchini, "DeepDive: Transparently Identifying and Managing Performance Interference in Virtualized Environments," in Proceedings of the 2013 USENIX Annual Technical Conference, San Jose, CA, USA, 2013, pp. 219–230.
[14] S. K. Barker and P. Shenoy, "Empirical Evaluation of Latency-sensitive Application Performance in the Cloud," in Proceedings of the 2010 ACM Conference on Multimedia Systems, Phoenix, AZ, USA, 2010, pp. 35–46.
[15] R. Nathuji, A. Kansal, and A. Ghaffarkhah, "Q-Clouds: Managing performance interference effects for QoS-aware clouds," in Proceedings of the 5th European Conference on Computer Systems, Paris, France, 2010, pp. 237–250.
[16] N. Deng, Z. Xu, C. Stewart, and X. Wang, "From the Outside Looking In: Probing Web APIs to Build Detailed Workload Profiles," in Proceedings of the 9th International Workshop on Feedback Computing, Philadelphia, PA, USA, 2014.
[17] "Amazon Elastic Compute Cloud," http://aws.amazon.com/ec2.
[18] Y. Koh, R. Knauerhase, P. Brett, M. Bowman, Z. Wen, and C. Pu, "An Analysis of Performance Interference Effects in Virtual Environments," in Proceedings of the 2007 International Symposium on Performance Analysis of Systems and Software, San Jose, CA, USA, 2007, pp. 200–209.
[19] R. Jones, "Netperf," http://www.netperf.org/netperf.
[20] Apache, "DayTrader," http://geronimo.apache.org/GMOxDOC20/daytrader.html.
[21] "Filebench: File system benchmark," http://sourceforge.net/projects/filebench.
[22] A. Gandhi, M. Harchol-Balter, R. Raghunathan, and M. Kozuch, "AutoScale: Dynamic, Robust Capacity Management for Multi-Tier Data Centers," ACM Transactions on Computer Systems, vol. 30, 2012.
[23] A. Gandhi, Y. Chen, D. Gmach, M. Arlitt, and M. Marwah, "Hybrid Resource Provisioning for Minimizing Data Center SLA Violations and Power Consumption," Sustainable Computing: Informatics and Systems, vol. 2, pp. 91–104, 2012.
[24] A. Gandhi, P. Dube, A. Karve, A. Kochut, and L. Zhang, "Adaptive, Model-driven Autoscaling for Cloud Applications," in Proceedings of the 11th International Conference on Autonomic Computing, Philadelphia, PA, USA, 2014, pp. 57–64.
[25] ——, "Modeling the Impact of Workload on Cloud Resource Scaling," in Proceedings of the 26th International Symposium on Computer Architecture and High Performance Computing, Paris, France, 2014.
[26] "VMware Horizon 6," http://www.vmware.com/products/horizon-view.
[27] "RUBiS: Rice University Bidding System," http://rubis.ow2.org.
[28] S. Malkowski, M. Hedwig, and C. Pu, "Experimental Evaluation of N-tier Systems: Observation and Analysis of Multi-bottlenecks," in Proceedings of the 2009 IEEE International Symposium on Workload Characterization, Austin, TX, USA, 2009, pp. 118–127.
[29] B. Urgaonkar, G. Pacifici, P. Shenoy, M. Spreitzer, and A. Tantawi, "An analytical model for multi-tier internet services and its applications," in Proceedings of the 2005 International Conference on Measurement and Modeling of Computer Systems, Banff, Alberta, Canada, 2005, pp. 291–302.
[30] G. Pacifici, M. Spreitzer, A. Tantawi, and A. Youssef, "Performance management for cluster-based web services," IEEE Journal on Selected Areas in Communications, vol. 23, no. 12, pp. 2333–2343, 2005.
[31] B. Xi, Z. Liu, M. Raghavachari, C. H. Xia, and L. Zhang, "A Smart Hill-climbing Algorithm for Application Server Configuration," in Proceedings of the 13th International Conference on World Wide Web, New York, NY, USA, 2004, pp. 287–296.
[32] H. Nguyen, Z. Shen, X. Gu, S. Subbiah, and J. Wilkes, "AGILE: Elastic distributed resource scaling for Infrastructure-as-a-Service," in Proceedings of the 10th International Conference on Autonomic Computing, San Jose, CA, USA, 2013, pp. 69–82.
[33] A. Verma, L. Cherkasova, and R. H. Campbell, "Profiling and evaluating hardware choices for MapReduce environments: An application-aware approach," Performance Evaluation, vol. 79, pp. 328–344, 2014.
[34] J. Rao, X. Bu, C.-Z. Xu, L. Wang, and G. Yin, "VCONF: A reinforcement learning approach to virtual machines auto-configuration," in Proceedings of the 6th International Conference on Autonomic Computing, Barcelona, Spain, 2009, pp. 137–146.
[35] X. Dutreilh, A. Moreau, J. Malenfant, N. Rivierre, and I. Truck, "From data center resource allocation to control theory and back," in Proceedings of the 3rd IEEE International Conference on Cloud Computing, Miami, FL, USA, 2010, pp. 410–417.
[36] P. Lama and X. Zhou, "AROMA: Automated resource allocation and configuration of MapReduce environment in the cloud," in Proceedings of the 9th International Conference on Autonomic Computing, San Jose, CA, USA, 2012, pp. 63–72.
[37] C. Delimitrou and C. Kozyrakis, "Quasar: Resource-efficient and QoS-aware cluster management," in Proceedings of the 19th International Conference on Architectural Support for Programming Languages and Operating Systems, Salt Lake City, UT, USA, 2014, pp. 127–144.
[38] I. Kocsis, A. Pataricza, Z. Micskei, A. Kövi, and Z. Kocsis, "Analytics of resource transients in cloud-based applications," International Journal of Cloud Computing, vol. 2, no. 2, pp. 191–212, 2013.
[39] A. K. Maji, S. Mitra, B. Zhou, S. Bagchi, and A. Verma, "Mitigating interference in cloud services by middleware reconfiguration," in Proceedings of the 15th International Middleware Conference, Bordeaux, France, 2014, pp. 277–288.
[40] N. Vasić, D. Novaković, S. Miučin, D. Kostić, and R. Bianchini, "DejaVu: Accelerating Resource Allocation in Virtualized Environments," in Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems, London, England, UK, 2012, pp. 423–436.
[41] S. Zhuravlev, S. Blagodurov, and A. Fedorova, "Addressing Shared Resource Contention in Multicore Processors via Scheduling," in Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems, Pittsburgh, PA, USA, 2010, pp. 129–142.
[42] S. Govindan, J. Liu, A. Kansal, and A. Sivasubramaniam, "Cuanta: Quantifying effects of shared on-chip resource interference for consolidated virtual machines," in Proceedings of the 2nd ACM Symposium on Cloud Computing, Cascais, Portugal, 2011, pp. 1–14.
[43] J. Mars, L. Tang, R. Hundt, K. Skadron, and M. L. Soffa, "Bubble-Up: Increasing Utilization in Modern Warehouse Scale Computers via Sensible Co-locations," in Proceedings of the 44th Annual IEEE/ACM International Symposium on Microarchitecture, Porto Alegre, Brazil, 2011, pp. 248–259.
[44] X. Zhang, E. Tune, R. Hagmann, R. Jnagal, V. Gokhale, and J. Wilkes, "CPI²: CPU Performance Isolation for Shared Compute Clusters," in Proceedings of the 8th ACM European Conference on Computer Systems, Prague, Czech Republic, 2013, pp. 379–391.
[45] W. Zheng, R. Bianchini, G. J. Janakiraman, J. R. Santos, and Y. Turner, "JustRunIt: Experiment-based Management of Virtualized Data Centers," in Proceedings of the 2009 USENIX Annual Technical Conference, San Diego, CA, USA, 2009, pp. 243–258.
[46] N. Rameshan, L. Navarro, E. Monte, and V. Vlassov, "Stay-Away: Protecting sensitive applications from performance interference," in Proceedings of the 15th International Middleware Conference, Bordeaux, France, 2014, pp. 301–312.
[47] A. Gulati, G. Shanmuganathan, I. Ahmad, C. Waldspurger, and M. Uysal, "Pesto: Online Storage Performance Management in Virtualized Datacenters," in Proceedings of the 2nd ACM Symposium on Cloud Computing, Cascais, Portugal, 2011, pp. 19:1–19:14.