Towards Virtual Machine Image Management for Persistent Memory
Jiachen Zhang, Lixiao Cui, Peng Li, Xiaoguang Liu*, Gang Wang*
Nankai-Baidu Joint Lab, College of Computer Science, KLMDASR, Nankai University, Tianjin, China
fjczhang, cuilx, lipeng, liuxg, [email protected]

Abstract—The byte-addressability and high capacity of persistent memory (PM) will also make it an emerging choice for virtualized environments. Modern virtual machine monitors virtualize PM using either I/O virtualization or memory virtualization. However, I/O virtualization sacrifices PM’s byte-addressability, and memory virtualization offers no opportunity for PM image management. In this paper, we enhance QEMU’s memory virtualization mechanism. The enhanced system achieves both PM’s byte-addressability inside virtual machines and PM image management outside the virtual machines. We also design pcow, a virtual machine image format for PM, which is compatible with our enhanced memory virtualization and supports storage virtualization features including thin-provision, base image and snapshot. Address translation is performed with the help of the Extended Page Table (EPT), and is thus much faster than in image formats implemented in I/O virtualization. We also optimize pcow considering PM’s characteristics. The evaluation demonstrates that our scheme boosts the overall performance by up to 50× compared with qcow2, an image format implemented in I/O virtualization, and brings almost no performance overhead compared with the native memory virtualization.

Index Terms—persistent memory, memory virtualization, storage virtualization, virtual machine, cloud storage

I. INTRODUCTION

Persistent memory (PM) is attached through memory buses and provides byte-addressability for persistent data. Compared with DRAM, PM offers several times the capacity and is cheaper. To take full advantage of PM’s performance in cloud environments, support for using PM in virtual machines is also important.

Modern virtual machine monitors (VMMs) like QEMU and VMware vSphere can virtualize PM through current I/O virtualization or memory virtualization [1] [2]. However, although PM is already at the boundary of storage and memory, I/O virtualization and memory virtualization are still separated in a way. Since traditional storage devices are attached through the I/O bus, storage virtualization features like thin-provision, base image and snapshot are also implemented along the virtual I/O stack. However, using the virtual I/O stack loses the byte-addressability of PM. Although directly mapping a PM region into the virtual machine preserves byte-addressability, this kind of memory virtualization gets no opportunity for PM image management in modern VMMs.

As storage virtualization features are based on the management of virtual machine images, we believe it is also necessary to achieve virtual PM image management when using memory virtualization for PM. However, this is more challenging than with I/O virtualization, since memory virtualization today is dominated by hardware-assisted address translation such as AMD’s Nested Page Table (NPT) and Intel’s Extended Page Table (EPT) [3]. Both NPT and EPT implement dedicated second-level page tables to accelerate address translation. When using PM under hardware-assisted memory virtualization, once a VMM registers a PM region with the host OS kernel, control of the virtualized memory is handed over to the host OS and the hardware MMU. That is to say, data accesses to virtual PM regions never pass through VMMs, which reside in the user space of host servers.

To achieve PM image management, in this paper, we enhance the native hardware-assisted memory virtualization. We design an image monitor inside QEMU to manage PM images created on a PM-aware file system. The image monitor gains the opportunity to organize PM images by handling EPT violations and write-protection violations in user space.

Data of virtual disks is usually organized in well-formatted image files. Image formats like qcow2 [4] and VMDK [5], which support abundant storage virtualization features, are widely used in cloud environments. However, as most of them are designed for I/O virtualization, some of their metadata is unsuitable or unnecessary for memory virtualization. On the other hand, although modern VMMs support creating raw PM image files for memory virtualization based on virtual PM regions, the mapping between the raw format images and virtual PM regions is linear, and no storage virtualization features can be implemented [1]. Therefore, existing image formats cannot meet our needs.

Thus, in this paper, we also design an image format for PM called pcow (short for PM copy-on-write). Like modern virtual machine image formats, the pcow format supports thin-provision, base image and snapshot. Compared with formats designed for I/O virtualization, we eliminate unnecessary metadata like address translation tables, and consider the consistency issue caused by PM’s smaller write atomicity [6].

The contributions of this paper are as follows.
1) We summarize the current schemes of PM virtualization.
2) We enhance the current memory virtualization in QEMU to support PM image management.
3) We propose the pcow image format, which is compatible with the enhanced memory virtualization and optimized for PM.
4) We implement several storage virtualization features based on the enhanced memory virtualization and pcow.

The paper is organized as follows: We describe the background in Section II and explain our motivation in Section III. In Section IV, we describe the idea of the proposed methods. Section V discusses the implementation details and the performance optimization for PM. We give experimental results in Section VI, present the related work in Section VII, and conclude in Section VIII.

II. BACKGROUND

A. Persistent Memory

Persistent memory (PM), also known as storage class memory (SCM), is a type of memory device based on non-volatile memory (NVM) technologies. Many NVM technologies are under development, such as 3D XPoint, PCM (phase-change memory) [7], ReRAM (resistive random-access memory) [8] and STT-MRAM (spin-transfer-torque MRAM) [9]. Although most PM devices are not mass-produced, a few technologies like 3D XPoint are being prepared for commercial use [10].

PM is byte-addressable and can be accessed by load/store instructions like DRAM, and it is non-volatile like block devices. While PM offers the benefits of both memory and storage, it also inherits the common issues of each:
• As memory, like DRAM, we have to face cache coherence and atomic visibility [6] [11] issues when programming with PM.
• As storage, PM also has issues like data durability and crash consistency. As the programming interface and write atomicity have changed, we should switch to cache flush instructions (like clflush) to persist recently changed data from the CPU cache to PM [12]. To maintain the order of cache flush instructions or store instructions in non-TSO (total store order) architectures (like ARM) [13], a cache flush or store instruction should also be followed by a fence instruction (sfence or mfence). Facing power failure or system crash, we also need to consider the crash consistency issues [14] in the context of PM [15].

According to SNIA’s NVM programming model [16], in order to take advantage of both the storage and memory characteristics of PM, we should access PM in NVM.PM.FILE mode. In this mode, all or part of a PM file is directly mapped as a virtual memory region into user-space applications, and the applications can then directly load/store the PM region. Data durability is assured after the file system or the applications issue cache flush instructions. The feature that implements NVM.PM.FILE is called direct access (DAX) in both Windows and Linux. File systems that support the DAX feature are called PM-aware file systems [6]. Linux’s XFS and ext4 already support being mounted with DAX enabled.

[...] isolation of files. In this paper, we also achieve the isolation between PM images with the help of a PM-aware file system, ext4, which is DAX enabled.

B. Storage Virtualization

Storage virtualization is the process of managing the mapping between physical storage resources and logical views. Several features are commonly implemented based on storage virtualization. For example,
• Storage pooling aggregates small storage devices into a large storage pool, which facilitates storage resource management by a centralized controller.
• Thin-provision promises users a large storage space while allocating a much smaller space at the beginning. Subsequent allocations are made as actual usage increases.
• Snapshot protects data at specific time points as read-only using copy-on-write technology. It gives users the option to roll back to any snapshot point.
• Base image (also called template) provides the opportunity to build a new image based on previously created images. This feature makes deployment and backup much easier, and for cloud providers it reduces storage costs [17].

Since most storage devices are connected to slow I/O buses, most of these features are implemented in VMMs’ virtual I/O stacks.

C. Using PM in Virtual Machines

As displayed in Fig. 1, considering storage devices in block device form or PM form, we can technically achieve four kinds of virtualization, expressed as “virtual device–physical device” relationships. The four forms of virtualization can be implemented through I/O virtualization or memory virtualization:

Fig. 1: Forms of storage device virtualization in modern VMMs. Only virtualizing PM under VMMs’ memory virtualization (“PM–PM” form) can achieve byte-addressability in virtual machines.

I/O virtualization: Through I/O virtualization, we can achieve “block device–block device” or “block device–PM”