Achieving Self-Scaling in Virtualized Datacenter Management Middleware
Total Page:16
File Type:pdf, Size:1020Kb
Tide: Achieving Self-Scaling in Virtualized Datacenter Management Middleware Shicong Meng, Ling Liu Vijayaraghavan Soundararajan Georgia Institute of Technology VMware, Inc. {smeng, lingliu}@cc.gatech.edu [email protected] ABSTRACT eliminate downtime of applications by allowing running vir- The increasing popularity of system virtualization in data- tual machines to be migrated off physical machines that re- centers introduces the need for self-scaling of the manage- quire maintenance, and can also save power[15] during non- ment layer to cope with the increasing demands of the man- peak hours by consolidating virtual machines onto a small agement workload. This paper studies the problem of self- number of servers and shutting down the rest. More re- scaling in datacenter management middleware, allowing the cently, virtualization has enabled the proliferation of cloud management capacity to scale with the management work- computing platforms such as Amazon’s EC2[4], where vir- load. We argue that self-scaling must be fast during work- tual machines in a datacenter can be provisioned on demand load bursts to avoid task completion delays, and self-scaling and rented based on usage as if they were physical machines. must minimize resource usage to avoid resource contention In order to provide features like high availability, live mi- with applications. To address these two challenges, we pro- gration, and power management, virtualized datacenters rely pose the design of Tide, a self-scaling framework for virtu- on a rich management middleware layer. This layer provides alized datacenter management. A salient feature of Tide is unified management of physical hosts, virtual machines, vir- its fast capacity-provisioning algorithm that supplies just- tualized network switches, and storage devices. One exam- enough capacity for the management middleware. We eval- ple of a virtualized datacenter management middleware layer uate the effectiveness of Tide with both synthetic and real is VMware vSphere[20]. VMware vSphere provides three world workloads. Our results show that the self-scaling ca- key management functionalities. First, it executes various pability in Tide can substantially improve the throughput manual or automated management tasks. For example, if an of management tasks during management workload bursts administrator needs to provision virtual machines to serve as while consuming a reasonable amount of resources. remote desktops for new end-users, the administrator sends these provisioning tasks to the management layer, which cre- ates the new virtual machines and places them automatically Categories and Subject Descriptors on physical hosts. In addition, vSphere may continuously C.4 [Performance Of Systems]: Performance Attributes; balance the workload of applications running in VMs via C.5.5 [Computer System Implementation]: Servers automatic live migration[7]. The second functionality pro- vided by vSphere is monitoring the runtime status of each physical server and the virtual machines hosted on them. General Terms Finally, vSphere maintains the configuration information of Management, Performance, Design the components in the datacenter like the host hardware con- figuration, virtual network configuration, and storage device configuration. 1. INTRODUCTION The ability of the management layer to support such au- Datacenter virtualization has attracted attention in re- tomated provisioning and load balancing tasks makes it an cent years due to various benefits it offers. It saves total ideal platform for cloud-based applications. Cloud-based ap- cost of ownership by consolidating virtual servers[18] and plications require elastic, automated control of resources to virtual desktops[19]. It also provides high availability to meet variable end-user demands. Without efficient resource applications that are not designed with this feature by en- controls, hot spots can arise in the infrastructure and the capsulating them within a highly-available virtual machine performance of end-user applications can suffer. Consider a (VM)[17]. Virtual machine live migration[16, 7] can nearly system without automated load balancing: as more users are added and more VMs must be provisioned, certain hosts may become overutilized, and VM performance suffers. Hence, the performance of the management system can have a direct impact on cloud application performance and cloud user sat- Permission to make digital or hard copies of all or part of this work for isfaction[13]. Moreover, the management tasks themselves personal or classroom use is granted without fee provided that copies are often must complete within specified time windows to satisfy not made or distributed for profit or commercial advantage and that copies the periodic maintenance cycles within the datacenter. bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific 1.1 Management Workload permission and/or a fee. Copyright 20XX ACM X-XXXXX-XX-X/XX/XX ...$10.00. While a typical datacenter operator is often aware of the resource consumption of end-user applications in the data- will increase. For example, if a datacenter holds servers center, it is important to consider the contribution of man- that are shared between different companies with different agement tasks as well, especially when the management layer backup strategies (backup quarterly vs. backup bi-weekly), is used in cloud-like environments. For example, consider the load on the management layer can be bursty in unpre- the cost of powering on a VM. If the management layer is dictable ways, and this burstiness increases as more cus- performing automated load balancing, it must select the ap- tomers share a given datacenter. propriate host to run the VM[15]. This selection may require a complex series of computations involving load inspection 1.2 Challenges in Handling Workload Bursts on hosts and compatibility checking between the capabili- As the preceding discussion indicates, the management ties of a host and the required capabilities of the VM. For layer must deal with a diversity of tasks at various inter- example, the management layer must not only choose a host vals and with various resource demands. The capacity of that has sufficient resources, but must also pick a host that the management layer determines its ability to respond to a can support the VM’s demands (e.g., hardware-based vir- management workload burst. For example, if a single server tualization or connection to a specific iSCSI device). This is responsible for servicing all management tasks, it can be- type of computation can be resource-intensive, mostly CPU come a bottleneck, especially if the management tasks are and memory intensive, for the management layer, and the compute-intensive. Another common scenario is spawning resource usage can grow with the number of hosts and VMs, a large number of web server VMs within a short period of and can also grow if multiple VMs are powered on concur- time to accommodate a flash crowd. The longer it takes rently. for the management layer to create such VMs, the worse The management workload consumes considerable resources the performance for the end user, and the higher potential not just because individual management operations may be for an SLA violation. Moreover, if the management layer is expensive, but also because the workload itself is bursty[13]. busy with such a burst and is unable to perform automated Figure 1 shows the CDF of management task rates within a load balancing in a timely manner, end-user performance 10-minute time window for a production datacenter. While may also suffer. Each of these scenarios highlights the need the average workload is fairly small, the peak workload can for a flexible, elastic management layer to complement the be nearly 200 times higher than the average workload. Ex- elasticity of the application infrastructure. amples of such bursty workloads are prevalent[13]. Com- Providing the right capacity for the management layer is panies using virtual machines as remote desktops may per- challenging: sizing for regular workloads may induce excess form large numbers of virtual machine cloning and reconfig- management task latency when a burst arrives; sizing for uration tasks during a small time window to accommodate the peak may cause a lot of resources to go unused during new users. Moreover, in remote desktop environments, a so- normal periods, resources that could have been used for ap- called ’boot storm’ of power operations may occur when em- plications themselves. We argue that one possible solution ployees arrive at work. Other examples of potentially-bursty to efficiently handle management workload bursts is to make management operations include large-scale virtual machine the management system self-scaling, which allows the man- provisioning for software-as-a-service, live-migration-based agement system to automatically boost its capacity during datacenter-scale load balancing, and massive security patch- peak workloads and drop its capacity during normal work- ing. In general, the ability of virtualization to provide auto- loads. mated management can also lead to bursts in management workload in order to accomplish such tasks in specified main- 1.3 Contribution tenance windows. In this paper, we introducing Tide, a prototype manage- 4 ment system with dynamic capacity that scales with man- 10 Peak Workload agement workload. Tide leverages VMware’s vSphere[20] 3 management layer.