Optimizing Total Migration Time in Virtual Machine Live Migration

IT 13 016
Degree project (Examensarbete), 30 credits (hp)
March 2013

Erik Gustafsson
Institutionen för informationsteknologi (Department of Information Technology)
Teknisk-naturvetenskaplig fakultet, UTH-enheten

Abstract

The ability to migrate a virtual machine (VM) from one physical host to another is important in a number of cases such as power management, on-line maintenance, and load balancing. The amount of memory used in VMs has been steadily increasing, up to several gigabytes. Consequently, the time to migrate machines, the total migration time, has been increasing. The aim of this thesis is to reduce the total migration time.

Previous work aimed at reducing the amount of time and disk space required for saving checkpoint images of virtual machines by excluding data from the memory that is duplicated on the disk of the VM. Other work aimed at reducing the time to restore a VM from a checkpoint by only loading a subset of data before resuming the VM and marking the other memory as invalid. These techniques have been adapted and applied to virtual machine live migration to reduce the total migration time. The implemented technique avoids sending data that is duplicated on disk and resumes the VM before all memory has been loaded.

The proposed technique has been implemented for fully virtualized guests in Xen 4.1. Experiments with a number of benchmarks show an average 44% reduction of the total migration time.
Supervisor (Handledare): Bernhard Egger
Subject reviewer (Ämnesgranskare): Philipp Rümmer
Examiner (Examinator): Ivan Christoff
Printed by (Tryckt av): Reprocentralen ITC

Acknowledgements

I would sincerely like to thank Professor Bernhard Egger at Seoul National University for helping to supervise me during this thesis. I would also like to thank Professor Philipp Ruemmer at Uppsala University for reviewing my thesis.

I would also like to dedicate this thesis to my lovingly special someone, Aleksandra Oletic.

Contents

Acknowledgements
List of Figures
Acronyms
1 Introduction
  1.1 Overview
  1.2 Live Migration
  1.3 Motivation and Problem Definition
  1.4 Contributions
  1.5 Outline
2 Virtual Machine Monitors, Xen, and Memory Management
  2.1 Overview
  2.2 Paravirtualization
  2.3 Hardware Assisted Virtualization
  2.4 Memory Management and Page Tables
  2.5 The Page Cache
3 Related Work
  3.1 Introduction
  3.2 Performance Measurements
  3.3 Live Migration Methods and Techniques
    3.3.1 Iterative Pre-Copy
    3.3.2 Memory Compression of Pre-Copy
    3.3.3 Post-Copy
    3.3.4 System Trace and Replay
    3.3.5 SonicMigration with Paravirtualized Guests
    3.3.6 Discussion of Live Migration Techniques
  3.4 Checkpoint Methods and Techniques
    3.4.1 Efficiently Checkpointing a Virtual Machine
    3.4.2 Fast Restore of Checkpointed Memory using Working Set Estimation
4 Proposed Solution to Optimize Total Migration Time
  4.1 Page Cache Elimination
  4.2 Page Cache Data Loaded at the Destination Host
  4.3 Maintaining Consistency
5 Implementation Details
  5.1 Live Migration
  5.2 Pre-fetch Restore
    5.2.1 The Page Fault Handler
    5.2.2 Intercepting I/O to Maintain Consistency
    5.2.3 Optimizations
6 Results
  6.1 Experimental Setup
  6.2 Results
  6.3 Comparison with Other Methods
    6.3.1 Memory Compression
    6.3.2 Post-Copy
    6.3.3 Trace and Replay
    6.3.4 SonicMigration with Paravirtualized Guests
7 Conclusion and Future Work
  7.1 Conclusions
  7.2 Future Work
Bibliography

List of Figures

2.1 Structure of Xen
4.1 Network Topology
4.2 Time-line Overview
4.3 Simplified Architecture
4.4 Violation cases
5.1 Execution Trace over the Sector Accessed
6.1 Total migration time normalized to unmodified Xen
6.2 Downtime normalized to unmodified Xen
6.3 Total data transferred
6.4 Data sent over the network
6.5 Performance degradation normalized to unmodified Xen

Acronyms

dom0   domain 0
domU   user domain
EPT    Extended Page Tables
HVM    hardware virtual machine
MFN    machine frame number
MMU    memory management unit
NAS    Network Attached Storage
NPT    Nested Page Tables
PFN    page frame number
PTE    page table entry
SPT    shadow page table
SSD    solid-state drive
VCPU   virtual CPU
VM     virtual machine

1 Introduction

1.1 Overview

In recent years, server virtualization has seen a steady increase of attention and popularity due to a multitude of factors. Virtualization [7] is the principle of providing a virtual interface for hardware. A virtual interface can create independence from the physical hardware by providing a layer of abstraction that accesses the hardware. A virtual machine [17] is a machine that runs in a virtualized environment as opposed to directly on hardware.

Decoupling the OS from the physical machine through virtualization has enabled techniques such as power management, on-line maintenance, and load balancing, all of which rely on the ability to move a virtual machine from one physical host to another. These capabilities can be achieved because the state of the virtual machine, consisting of the virtual CPUs (VCPUs), the memory, and any attached devices, can be recorded.
The state can then be transferred to another physical host that provides the same virtual interface to the hardware, where the virtual machine can be resumed. The process of moving the state from one physical host to another is called migration.

Running several virtual machines on one physical server allows resources to be pooled to provide better power management [14]. Moving a virtual machine from one physical server to another allows cluster environments to perform on-line maintenance [13]. Load balancing can be achieved by dynamically moving and allocating virtual machines across a cluster of physical hosts [22]. Each of these techniques requires the ability to efficiently move a virtual machine between physical hosts, which is called virtual machine migration.

1.2 Live Migration

Live migration builds upon the idea of migration and takes it a step further. The "live" in live migration pertains to the fact that the migration should be transparent to the users. As a consequence, the guest OS should be running during the migration. The downtime is the time from when the source host suspends execution of the virtual machine (VM) until the destination host resumes it. The VM should not be stopped for any considerable amount of time, so that it remains usable during the migration and the migration stays transparent to the users, who ideally are not aware that it is occurring.

1.3 Motivation and Problem Definition

The aim of this thesis is to reduce the total migration time of virtual machine live migration. The total migration time is the time from when the migration is initiated for a VM on the source host until the VM is resumed on the destination host. In order to be able to provide the capability of live migration, a low downtime is required. Downtime is the time during which the virtual machine is not responsive.
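The two metrics defined above can be made concrete with a small sketch. It assumes that three timestamps are recorded for each migration: when the migration is initiated, when the source host suspends the VM, and when the destination host resumes it. The class name, field names, and the helper relative_reduction are hypothetical and chosen only for illustration; they are not taken from Xen or from the implementation described in this thesis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MigrationEvents:
    """Timestamps (in seconds) of one live migration.

    Field names are illustrative assumptions, not Xen identifiers.
    """
    initiated: float               # migration is requested on the source host
    suspended_on_source: float     # source host suspends the VM
    resumed_on_destination: float  # destination host resumes the VM

    def total_migration_time(self) -> float:
        # From initiation on the source until the VM runs on the destination.
        return self.resumed_on_destination - self.initiated

    def downtime(self) -> float:
        # Interval during which the VM executes on neither host.
        return self.resumed_on_destination - self.suspended_on_source


def relative_reduction(baseline: float, optimized: float) -> float:
    """Fraction of the baseline time that the optimized run saves."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - optimized) / baseline


if __name__ == "__main__":
    # Invented example values, not measurements from this thesis.
    run = MigrationEvents(initiated=0.0,
                          suspended_on_source=9.5,
                          resumed_on_destination=10.0)
    print(run.total_migration_time(), run.downtime())
```

The example values are invented and only show that the downtime is a (usually small) final part of the total migration time, so the two metrics can be optimized largely independently.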
Suspending the VM includes pausing the VCPUs as well as any connected devices. Extensive work has previously been done to reduce the downtime, but less work has focused on reducing the total migration time. Total migration time can be very important in data centers, since a reduction in total migration time improves load balancing, proactive fault tolerance, and power management capabilities.

1.4 Contributions

The contributions of this thesis are:

• A technique is presented that reduces the total migration time by sending only a critical subset of data through the network. We identify and correctly handle all scenarios that could lead to a corrupt memory image or disk of the VM during and after the migration.
• The proposed technique has been implemented in Xen 4.1, and various benchmarks have been conducted with fully virtualized Linux guests.
• The results have been analyzed and compared to the performance of the original Xen implementation. The total migration time is reduced by 40.48 seconds on average, which corresponds to a 44% relative reduction.

1.5 Outline

The thesis is structured as follows. Chapter 1 introduces the thesis topic. Chapter 2 continues with an introduction to the virtualization concepts and tools used. Chapter 3 presents a discussion of related and previous work. Thereafter, Chapter 4 proposes a solution to the problems discussed. Chapter 5 discusses implementation details. Chapter 6 provides an in-depth analysis of the solution. Chapter 7 concludes the thesis and discusses future work.

2 Virtual Machine Monitors, Xen, and Memory Management

The sections in this chapter discuss important background information about virtualization and the tools used, and give a brief overview of relevant computer architecture. A basic knowledge of computer architecture is assumed and can be found in [17].

2.1 Overview

Xen is an open source bare metal hypervisor [1].
A hypervisor is a virtualization layer on top of the hardware that runs virtual machines.
