Urgent Virtual Machine Eviction with Enlightened Post-Copy
Total Page:16
File Type:pdf, Size:1020Kb
Urgent Virtual Machine Eviction with Enlightened Post-Copy Yoshihisa Abe†, Roxana Geambasu‡, Kaustubh Joshi•, Mahadev Satyanarayanan† †Carnegie Mellon University, ‡Columbia University, •AT&T Research fyoshiabe, [email protected], [email protected], [email protected] Abstract servative allocation. Migration for contending VMs thus has Virtual machine (VM) migration demands distinct properties its own value, which can impact resource allocation policies. under resource oversubscription and workload surges. We However, migration of a VM under contention poses present enlightened post-copy, a new mechanism for VMs challenges to be solved because of its distinct requirements. under contention that evicts the target VM with fast execu- It needs to salvage the performance of all the VMs being tion transfer and short total duration. This design contrasts affected, namely their aggregate performance. The primary with common live migration, which uses the down time of goal, therefore, is to evict the target VM from its host rapidly, the migrated VM as its primary metric; it instead focuses allowing the other VMs to claim the resources it is currently on recovering the aggregate performance of the VMs being consuming. This objective is achieved by transferring the affected. In enlightened post-copy, the guest OS identifies execution of the migrated VM to a new host, so that its memory state that is expected to encompass the VM’s work- computational cycles are made available to the other VMs. ing set. The hypervisor accordingly transfers its state, miti- Additionally, the duration of migration decides when the gating the performance impact on the migrated VM resulting VM’s state can be freed on the source host; reclaiming the from post-copy transfer. We show that our implementation, space of memory state, which can be tens or hundreds of with modest instrumentation in guest Linux, resolves VM gigabytes, can be particularly important. While adhering to contention up to several times faster than live migration. these priorities, the impact on the migrated VM should be mitigated to the extent possible, for its service disruption to be limited and the aggregate performance to recover fast. 1. Introduction These properties are especially desirable when the VMs As a means of load balancing, VM migration plays a cru- provide services that need to sustain high overall throughput. cial role in cloud resource efficiency. In particular, it affects Example cases include back-end servers or batch-processing the feasibility of oversubscription. Oversubscription is co- applications, such as big data analytics. With these types of location of VMs on the same host in which their allocated workloads, resolving the contention between VMs directly resources can collectively exceed the host’s capacity. While translates to optimizing their performance as a whole. Mi- it allows the VMs to share the physical resources efficiently, gration can also be more for saving the contending VMs than at peak times they can interfere with each other and suffer the target VM itself. The contending VMs can be running performance degradation. Migration, in this situation, offers more latency-sensitive services, such as web servers, than a way of dynamically re-allocating a new machine to these those of the target VM. Or, the target VM may be malfunc- VMs. The faster the operation is in resolving the contention, tioning, for example under a DoS attack, needing diagnosis the more aggressive vendors can be in deploying VMs. With- in segregation. In all these situations, as illustrated in Figure out a good solution, on the other hand, those with perfor- 1, migration with appropriate characteristics would allow the mance emphasis must relinquish resource efficiency and use VMs to be co-located under normal operation, and to be al- more static resource assignments, such as Placements on located new resources when experiencing a surge of loads. Amazon EC2 [1]. Although such strategies can guarantee Unfortunately, the requirements described above defy the VM performance, they lead to wasted resources due to con- trends in VM migration approaches. In particular, the cur- rent standard of live migration [12, 30] often results in elon- gated duration due to its design principles [19, 38]. This be- havior leaves VMs under contention, delaying their perfor- Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without mance recovery. In this paper, we present a design point that fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be is fundamentally different and driven by enlightenment [28]. honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected] or Publications Enlightenment is a type of knowledge the guest passes to Dept., ACM, Inc., fax +1 (212) 869-0481. the hypervisor for improving the efficiency of its operation. VEE ’16 April 2–3, 2016, Atlanta, Georgia, USA. Copyright c 2016 ACM 978-1-4503-3947-6/16/04. $15.00 Applying it to migration, we develop an approach called en- DOI: http://dx.doi.org/10.1145/http://dx.doi.org/10.1145/2892242.2892252 Web Server: Degraded Response Time Malfunctioning VM Affecting Both Degraded Throughput Batch Proc.: Degraded Throughput VM under Normal Operation Host 1 Host 1 Host 1 Normal Batch Batch Web Batch Malfun- Opera- Proc. Proc. Server Proc. ctioning tion VM VM VM VM VM VM Before Before Before After After After Host 1 Host 2 Host 1 Host 2 Host 1 Host 2 Normal Batch Batch Web Batch Malfun- Opera- Proc. Proc. Server Proc. ctioning tion VM VM VM VM VM VM Ideal Throughput Ideal Throughput Ideal Response Time Ideal Throughput Ideal Performance Isolated (a) Optimizing Aggregate Throughput (b) Saving Response-Time-Sensitive Service (c) Isolating VM for Diagnosis Figure 1. Scenarios for Urgent VM Eviction. The top and bottom rows illustrate VMs before and after migration, respectively. lightened post-copy. It employs a post-copy style, in which (x1000) 120 the VM is resumed on the destination before its state has 100 80 been migrated, and mitigates the resulting performance im- 60 40 pact through enlightenment. Upon migration, the guest OS Gbps 10 20 0 informs the hypervisor of memory regions that require high (x1000) 120 transfer priority for sustaining the guest performance. The 100 80 60 hypervisor then transfers the execution of the VM imme- 40 5 Gbps 5 20 diately to a new host, while continuously pushing its state 0 (x1000) as instructed and also serving demand-fetch requests by the 120 100 destination VM. Our implementation, with modest changes 80 60 to guest Linux, shows the effectiveness of the guest’s initia- 40 2.5 Gbps 2.5 20 tive in performing migration with the desired properties. 0 0 30 60 90 120 150 180 210 240 270 300 330 360 The main contributions of this paper are as follows: Time (sec) • Explaining common behaviors of migration and deriving Figure 2. Live Migration Behavior of qemu-kvm under VM properties desired for VMs under contention (Section 2). Contention. The y-axis indicates throughput in operations • Presenting the design and implementation of enlightened per second. The black and gray lines represent the migrated post-copy, an approach that exploits native guest OS sup- and contending VMs, respectively, and the shaded areas the port (Sections 3 and 4). duration of migration. • Evaluating the performance and trade-offs of enlight- ened post-copy against live migration and other basic ap- proaches (Section 5). describe the algorithm and behavior of live migration caus- ing this problem, and derive the properties desired in our 2. Analysis of VM Migration solution. VM migration has historically focused on the liveness, namely minimal suspension, of the target VM. Specifically, 2.1 Mechanics of Migration live migration is the current standard widely adopted by Figure 3 illustrates aspects of VM migration including exe- common hypervisors [12, 30]. The characteristics of live mi- cution, state transfer, and performance. Migration is initiated gration, however, deviate from those desired in the context of on the source, on which the VM originally executes. The VM VMs under contention; it leads to extended duration of mi- is suspended at one point, and then resumed on the destina- gration and thus continued interference of the VMs. Figure tion. State transfer during the course of migration is catego- 2 illustrates this problem, with live migration by qemu-kvm rized into two types: pre-copy and post-copy. Pre-copy and 2.3.0 under the Memcached workload and experimental set- post-copy are phases performed before and after the VM re- up described in Section 5. One of the two contending VMs sumes on the destination, respectively. Associated with the is migrated, with approximately 24 GB of memory state to duration of these state transfer modes, there are three key be transferred, under varied bandwidth. Migration takes 39.9 time metrics: seconds at 10 Gbps, and fails to complete at 2.5 Gbps. For the duration of migration, the VMs suffer degraded perfor- • Down time: time between the suspension and resume of mance due to their mutual interference. In this section, we the VM, during which its execution is stopped. Migration VM VM Migration Start tracking page content mutation Start Suspension Resume End Send all pages to destination VM Execution Source Location Destination Re-send pages dirtied since last transfer State Pre-Copy Post-Copy Transfer Otherwise Down Time Check conditions for completion Time Execution Transfer Time Metrics Expected down time is below threshold Total Duration or additional condition holds Possible Impact Dirty State Missing Suspend guest on source on Performance Tracking VM State Figure 3.