Virtual Machine Workloads: The Case for New Benchmarks for NAS
Vasily Tarasov (Stony Brook University), Dean Hildebrand (IBM Research—Almaden), Geoff Kuenning (Harvey Mudd College), Erez Zadok (Stony Brook University)

Abstract

Network Attached Storage (NAS) and Virtual Machines (VMs) are widely used in data centers thanks to their manageability, scalability, and ability to consolidate resources. But the shift from physical to virtual clients drastically changes the I/O workloads seen on NAS servers, due to guest file system encapsulation in virtual disk images and the multiplexing of request streams from different VMs. Unfortunately, current NAS workload generators and benchmarks produce workloads typical of physical machines.

This paper makes two contributions. First, we studied the extent to which virtualization is changing existing NAS workloads. We observed significant changes, including the disappearance of file system meta-data operations at the NAS layer, changed I/O sizes, and increased randomness. Second, we created a set of versatile NAS benchmarks to synthesize virtualized workloads. This allows us to generate accurate virtualized workloads without the effort and limitations associated with setting up a full virtualized environment. Our experiments demonstrate that the relative error of our virtualized benchmarks, evaluated across 11 parameters, averages less than 10%.

1 Introduction

By the end of 2012 almost half of all applications running on x86 servers will be virtualized; in 2014 this number is projected to be close to 70% [8, 9]. Virtualization, if applied properly, can significantly improve system utilization, reduce management costs, and increase system reliability and scalability. With all the benefits of virtualization, managing the growth and scalability of storage is emerging as a major challenge.

In recent years, growth in network-based storage has outpaced that of direct-attached disks; by 2014 more than 90% of enterprise storage capacity is expected to be served by Network Attached Storage (NAS) and Storage Area Networks (SAN) [50]. Network-based storage can improve availability and scalability by providing shared access to large amounts of data. Within the network-based storage market, NAS capacity is predicted to increase at an annual growth rate of 60%, compared to only 22% for SAN [43]. This faster NAS growth is explained in part by its lower cost and its convenient file system interface, which is richer, easier to manage, and more flexible than the block-level SAN interface.

The rapid expansion of virtualization and NAS has led to explosive growth in the number of virtual disk images being stored on NAS servers. Encapsulating file systems in virtual disk image files simplifies the implementation of features such as migration, cloning, and snapshotting, since they naturally map to existing NAS functions. In addition, non-virtualized hosts can co-exist peacefully with virtualized ones that use the same NAS interface, which permits a gradual migration of services from physical to virtual machines.

Storage performance plays a crucial role when administrators select the best NAS for their environment. One traditional way to evaluate NAS performance is to run a file system benchmark, such as SPECsfs2008 [38]. Vendors periodically submit the results of SPECsfs2008 to SPEC; the most recent submission was in November 2012. Because widely publicized benchmarks such as SPECsfs2008 figure so prominently in configuration and purchase decisions, it is essential to ensure that the workloads they generate represent what is observed in real-world data centers.

This paper makes two contributions: an analysis of changing virtualized NAS workloads, and the design and implementation of a system to generate realistic virtualized NAS workloads. We first demonstrate that the workloads generated by many current file system benchmarks do not represent the actual workloads produced by VMs. This in turn leads to a situation where the performance results of a benchmark deviate significantly from the performance observed in real-world deployments. Although benchmarks are never perfect models of real workloads, the introduction of VMs has exacerbated the problem significantly. Consider just one example: the percentage of data and meta-data operations generated by physical and virtualized clients. Table 1 presents the results for the SPECsfs2008 and Filebench Web-server benchmarks that attempt to provide a "realistic" mix of meta-data and data operations. We see that meta-data procedures, which dominated in physical workloads, are almost non-existent when VMs are utilized. The reason is that VMs store their guest file system inside large disk image files. Consequently, all meta-data operations (and indeed all data operations) from the applications are converted into simple reads and writes to the image file.

NFS procedures   Physical clients (SPECsfs2008 / Filebench)   Virtualized clients
Data             28% / 36%                                    99%
Meta-data        72% / 64%                                    <1%

Table 1: The striking differences between virtualized and physical workloads for two benchmarks: SPECsfs2008 and Filebench (Web-server profile). Data operations include READ and WRITE. All other operations (e.g., CREATE, GETATTR, READDIR) are characterized as meta-data.
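To make the data/meta-data split above concrete, the following is a minimal sketch of how such percentages can be computed from a list of observed NFS procedures. The record format (a plain list of procedure names) and the classification rule are assumptions for illustration only; this is not the authors' tooling.

```python
from collections import Counter

# Assumed classification, matching the convention used in Table 1:
# READ/WRITE are data; everything else is meta-data.
DATA_PROCS = {"READ", "WRITE"}

def data_metadata_split(procedures):
    """Given an iterable of NFS procedure names, return the percentage
    of data and meta-data operations."""
    counts = Counter("data" if p.upper() in DATA_PROCS else "metadata"
                     for p in procedures)
    total = sum(counts.values()) or 1
    return {kind: 100.0 * n / total for kind, n in counts.items()}

# Hypothetical trace fragment:
print(data_metadata_split(["LOOKUP", "GETATTR", "READ", "WRITE", "CREATE"]))
```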

Meta-data-to-data conversion is just one example of the way workloads shift when virtual machines are introduced. In this paper we examine, by collecting and analyzing a set of I/O traces generated by current benchmarks, how NAS workloads change when used in virtualized environments. We then leverage multi-dimensional trace analysis techniques to convert these traces to benchmarks [13, 40]. Our new virtual benchmarks are flexible and configurable, and support single- and multi-VM workloads. With multi-VM workloads, the emulated VMs can all run the same or different application workloads (a common consequence of resource consolidation). Further, users do not need to go through a complex deployment process, such as hypervisor setup and per-VM OS and application installation, but can instead just run our benchmarks. This is useful because administrators typically do not have access to the production environment when evaluating new or existing NAS servers for prospective virtualized clients. Finally, some benchmarks such as SPECsfs cannot be usefully run inside a VM because they do not support file-level interfaces and will continue to generate a physical workload to the NAS server; this means that new benchmarks can be the only viable evaluation option. Our benchmarks are capable of simulating a high load (i.e., many VMs) using only modest resources. Our experiments demonstrate that the accuracy of our benchmarks remains within 10% across 11 important parameters.

2 Background

In this section, we present several common data access methods for virtualized applications, describe in depth the changes in the virtualized NAS I/O stack (VM-NAS), and then explain the challenges in benchmarking NAS systems in virtualized environments.

2.1 Data Access Options for VMs

Many applications are designed to access data using a conventional POSIX file system interface. The methods that are currently used to provide this type of access in a VM can be classified into two categories: (1) emulated block devices (typically managed in the guest by a local file system); and (2) guest network file system clients.

Figure 1 illustrates both approaches. With an emulated block device, the hypervisor emulates an I/O controller with a connected disk drive. Emulation is completely transparent to the guest OS, and the virtual I/O controller and disk drives appear as physical devices to the OS. The guest OS typically formats the disk drive with a local file system or uses it as a raw block device. When an emulated block device is backed by file-based storage, we call the backing files disk image files.

Figure 1: VM data-access methods. Cases 1a–1d correspond to the emulated-block-device architecture (a disk image file on DAS, on SAN, or on NAS, or pass-through to DAS or SAN); case 1c, a disk image file on NAS, is the case analyzed in this paper. Case 2 corresponds to the use of guest network file system clients.

2.1.1 Emulated Block Devices

Figure 1 shows several options for implementing the back end of an emulated block device:

1a. A file located on a local file system that is deployed on Direct Attached Storage (DAS). This approach is used, for example, by home and office installations of VMware Workstation [39] or Oracle VirtualBox [44]. Such systems often keep their disk images on local file systems (e.g., Ext3, NTFS). Although this architecture works for small deployments, it is rarely used in large enterprises where scalability, manageability, and high availability are critical.

1b. A disk image file stored on a (possibly clustered) file system deployed over a Storage Area Network (SAN) (e.g., VMware's VMFS file system [46]). A SAN offers low-latency shared access to the available block devices, which allows high-performance clustered file systems to be deployed on top of the SAN. This architecture simplifies VM migration and offers higher scalability than DAS, but SAN hardware is more expensive and complex to administer.

1c. A disk image file stored on Network Attached Storage (NAS). In this architecture, which we call VM-NAS, the host's hypervisor passes I/O requests from the virtual machine to an NFS or SMB client, which in turn accesses a disk image file stored on an external file server. The hypervisor is completely unaware of the storage architecture behind the NAS interface. NAS provides the scalability, reliability, and data mobility needed for efficient VM management. Typically, NAS solutions are cheaper than SANs due to their use of IP networks, and are simpler to configure and manage. These properties have increased the use of NAS in virtual environments and encouraged several companies to create solutions for disk image file management at the NAS [6, 36, 41].

1d. Pass-through to DAS or SAN. In this case, virtual disks are backed by a real block device (not a file), which can be on a SAN or DAS. This approach is less flexible than disk image files, but can offer lower overhead because one level of indirection—the host file system—is eliminated.

2.1.2 Network Clients in the Guest

The other approach for providing storage to a virtual machine is to let a network-based file system (e.g., NFS) provide access to the data directly from the guest (case 2 in Figure 1). This model avoids the need for disk image files, so no block-device emulation is needed. This eliminates emulation overheads, but lacks many of the benefits associated with virtualization, such as consistent snapshots, thin provisioning, cloning, and disaster recovery. Also, not every guest OS supports every NAS protocol, which fetters the ability of a hypervisor and its storage system to support all guest OS types. Further, cloud management architectures such as VMware's vCloud and OpenStack do not support this design [32, 42].

2.2 VM-NAS I/O Stack

In this paper we focus on the VM-NAS architecture, where VM disks are emulated by disk image files stored on NAS (case 1c in Section 2.1.1 and in Figure 1). To the best of our knowledge, even though this architecture is becoming popular in virtual data centers [43, 50], there has been no study of the significant transformations in typical NAS I/O workloads caused by server virtualization. This paper is a first step towards a better understanding of NAS workloads in virtualized environments and the development of suitable benchmarks for NAS to be used in industry and academia.

Figure 2: VM-NAS I/O Stack: VMs access and store virtual disk images on NAS. (In the guest VM: applications or a file system benchmark, the on-disk file system, the block layer, and the I/O controller driver; in the hypervisor: the controller emulator and the NFS client; across the network, the storage appliance: NFS server, on-disk file system, block layer, and driver.)

When VMs and NAS are used together, the corresponding I/O stack becomes deeper and more complex, as seen in Figure 2. As they pass through the layers, I/O requests significantly change their properties. At the top of the stack, applications access data using system calls such as create, read, write, and unlink. These system calls invoke the underlying guest file system, which in turn converts application calls into I/O requests to the block layer. The file system maintains data and meta-data layouts, manages concurrent accesses, and often caches and prefetches data to improve application performance. All of these features change the pattern of application requests.

The guest OS's block layer receives requests from the file system and reorders and merges them to increase performance, provide process fairness, and prioritize requests. The I/O controller driver, located beneath the generic block layer, imposes extra limitations on the requests in accordance with the virtual device's capabilities (e.g., trims requests to the maximum supported size and limits the NCQ queue length [51]).

After that, requests cross the software-hardware boundary for the first time (here, the hardware is emulated). The hypervisor's emulated controller translates the guest's block-layer requests into reads and writes to the corresponding disk image files. Various request transformations can be done by the hypervisor to optimize performance and provide fair access to the data from multiple VMs [18].

The hypervisor contains its own network file system client (e.g., NFS), which can cache data, limit read and write sizes, and perform other request transformations. In this paper we focus on NFSv3 because it is one of the most widely used protocols. However, our methodology is easily extensible to SMB or NFSv4, and we plan to perform expanded studies in the future. In the case of NFSv3, both the client and the server can limit read- and write-transfer sizes and modify write-synchronization properties. Because the hypervisor and its NFS client significantly change I/O requests, it is not sufficient to collect data at the block layer of the guest OS; we collect our traces at the entrance to the NFS server.

After the request is sent over a network to the NAS server, the same layers that appear in the guest OS are repeated in the server. By this time, however, the original requests have already undergone significant changes performed by the upper layers, so the optimizations applied by similar layers at the server can be considerably different. Moreover, many NAS servers (e.g., NetApp [20]) run a proprietary OS that uses specialized request-handling algorithms, additionally complicating the overall system behavior. This complex behavior has a direct effect on measurement techniques, as we discuss next in Section 2.3.
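As a rough illustration of the transformations described above, the sketch below models how a single application request is reshaped on its way to the NAS: it is expanded to the guest file system's block granularity, split by the emulated controller driver's maximum transfer size, and split again by the NFS transfer limit. The constants (4KB blocks, a 128KB driver limit, 64KB NFS transfers) come from observations reported later in this paper; the code itself is a simplified model rather than the authors' implementation—it ignores caching, request merging, and fragmentation.

```python
# A simplified model of VM-NAS request transformation (illustrative only).
FS_BLOCK = 4 * 1024          # guest file system block size
DRIVER_MAX = 128 * 1024      # e.g., Linux IDE driver maximum I/O size
NFS_MAX = 64 * 1024          # NFS client/server transfer-size limit

def round_up(n, unit):
    return ((n + unit - 1) // unit) * unit

def split(offset, length, unit):
    """Split one request into chunks no larger than 'unit' bytes."""
    chunks = []
    while length > 0:
        n = min(unit, length)
        chunks.append((offset, n))
        offset += n
        length -= n
    return chunks

def to_nas_requests(app_offset, app_length):
    """Return the (offset, size) requests the NAS would observe for one
    application read/write under the simplifying assumptions above."""
    # 1. The guest file system issues block-aligned I/O.
    start = (app_offset // FS_BLOCK) * FS_BLOCK
    end = round_up(app_offset + app_length, FS_BLOCK)
    # 2. The controller driver splits large requests.
    reqs = split(start, end - start, DRIVER_MAX)
    # 3. The NFS layer limits transfer sizes again.
    return [c for off, ln in reqs for c in split(off, ln, NFS_MAX)]

# A 6KB application read reaches the NAS as 8KB of block-aligned I/O
# (two 4KB blocks, here merged into one extent) instead of 6KB.
print(to_nas_requests(0, 6 * 1024))
# A 1MB application read arrives as sixteen 64KB requests.
print(to_nas_requests(0, 1024 * 1024))
```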

Figure 3: Physical and Virtualized NAS architectures. (a) Physical clients: applications in the operating system use a NAS client (NFS/CIFS) to access the NAS appliance (front-end: NFS/CIFS; back-end: GPFS/WAFL/ZFS) directly. (b) Virtualized clients: applications in VMs access the NAS appliance via a virtualized block device provided by the hypervisor's NFS/CIFS client.

2.3 VM-NAS Benchmarking Setup

Regular file system benchmarks usually operate at the application layer and generate workloads typical of one or a set of applications (Figure 2). In non-virtualized deployments these benchmarks can be used without any changes to evaluate the performance of a NAS server, simply by running the benchmark on a NAS client. In virtualized deployments, however, I/O requests can change significantly before reaching the NAS server due to the deep and diverse I/O stack described above. Therefore, benchmarking these environments is not straightforward.

One approach to benchmarking in a VM-NAS setup is to deploy the entire virtualization infrastructure and then run regular file system benchmarks inside the VMs. In this case, requests submitted by application-level benchmarks will naturally undergo the appropriate changes while passing through the virtualized I/O stack. However, this method requires a cumbersome setup of hypervisors, VMs, and applications. Every change to the test configuration, such as an increase in the number of VMs or a change of a guest OS, requires a significant amount of work. Moreover, the approach limits evaluation to the available test hardware, which may not be sufficient to run hypervisors with the hundreds of VMs that may be required to exercise the limits of the NAS server.

To avoid these limitations and regain the flexibility of standard benchmarks, we have created virtualized benchmarks by extracting the workload characteristics after the requests from the original physical benchmarks have passed through the virtualization and NFS layers. The generated benchmarks can then run directly against the NAS server without having to deploy a complex infrastructure. Therefore, the benchmarking procedure remains the same as before—easy, flexible, and accessible.

One approach to generating virtualized benchmarks would be to emulate the changes applied to each request as it goes down the layers. However, doing so would require a thorough study of the request-handling logic in the guest OSes and hypervisors, with further verification through multi-layer trace collection. Although this approach might be feasible, it is time-consuming, especially because it must be repeated for many different OSes and hypervisors. Therefore, in this paper we chose to study the workload characteristics at a single layer, namely where requests enter the NAS server. We collected traces at this layer and then characterized selected workload properties. The information from a single layer is enough to create the corresponding NAS benchmarks by reproducing the extracted workload features. Workload characterization and the benchmarks that we create are tightly coupled with the configuration of the upper layers: application, guest OS, local file system, and hypervisor. In the future, we plan to perform a sensitivity analysis of I/O stack configurations to deduce the parameters that account for the greatest changes to the I/O workload.

3 NAS Workload Changes

In this section we detail seven categories of NAS workload changes caused by virtualization. Specifically, we compare the two cases where a NAS server is accessed by (1) a physical or (2) a virtualized client, and describe the differences in the I/O workload. These changes are the result of migrating an application from a physical server, which is configured to use an NFS client for direct data access, to a VM that stores data in a disk image file that the hypervisor accesses from an NFS server. Figure 3 demonstrates the difference in the two setups, and Table 2 summarizes the changes we observed in the I/O workload. The changes are listed from the most noticeable and significant to the least. Here, we discuss the changes qualitatively; quantitative observations are presented in Section 4.

#  Workload Property         Physical NAS Clients               Virtual NAS Clients
1  File and directory count  Many files and directories         Single file per VM
   Directory tree depth      Often deeply nested directories    Shallow and uniform
   File size                 Lean towards many small files      Multi-gigabyte sparse disk image files
2  Meta-data operations      Many (72% in SPECsfs2008)          Almost none
3  I/O synchronization       Asynchronous and synchronous       All writes are synchronous
4  In-file randomness        Workload-dependent                 Increased randomness due to guest file system encapsulation
   Cross-file randomness     Workload-dependent                 Cross-file access replaced by in-file access due to disk image files
5  I/O sizes                 Workload-dependent                 Increased or decreased due to guest file system fragmentation and I/O stack limitations
6  Read-modify-write         Infrequent                         More frequent due to block layer in guest file system
7  Think time                Workload-dependent                 Increased because of virtualization overheads

Table 2: Summary of key I/O workload changes between Physical and Virtualized NAS architectures.

First, and unsurprisingly, the number and size of files stored in NAS change from many relatively small files to a few (usually just one) large file(s) per VM—the disk image file(s). For example, the default Filebench file server workload defines 10,000 files with an average size of 128KB, which are spread over 500 directories. However, when Filebench is executed in a VM, there is only one large disk image file. (Disk image files are usually sized to the space requirements of a particular application; in our setup the disk image file size was set to the default 16GB for the Linux VM, and to 50GB for the Windows VM, because the benchmark we used in Windows required at least 50GB.) For the same reason, directory depth decreases and becomes fairly consistent: VMware ESX typically has a flat namespace; each VM has one directory with the disk image files stored inside it. Back-end file systems used in NAS are often optimized for common file sizes and directory depths [2, 30, 31, 34], so this workload change can significantly affect their performance. For example, to improve write performance for small files, one popular technique is to store data in the inode [16], a feature that would be wasted on virtualized clients. Further, disk image files in NAS environments are typically sparse, with large portions of the files unallocated, i.e., the physical file size can be much smaller than its logical size. In fact, VMware's vSphere—the main tool for managing the VMs in VMware-based infrastructures—supports only the creation of sparse disk images over NFS. A major implication of this change is that back-end file systems for NAS can lower their focus on optimizing, for example, file append operations, and instead focus on improving the performance of block allocation within a file.

The second change caused by the move to virtualization is that all file system meta-data operations become data operations. For example, with a physical client there is a one-to-one mapping between file creation and a CREATE over the wire. However, when the application creates a file in a VM, the NAS server receives a series of writes to a corresponding disk image: one to a directory block, one to an inode block, and possibly one or more to data blocks. Similarly, when an application accesses files and traverses the directory tree, physical clients send many LOOKUP procedures to a NAS server. The same application behavior in a VM produces a sequence of READs to the disk image. Current NAS benchmarks generate a high number of meta-data operations (e.g., 72% for SPECsfs2008), and will bias the evaluation of a NAS that serves virtualized clients. While it may appear that removing all meta-data operations implies that application benchmarks can generally be replaced with random I/O benchmarks, such as IOzone [11], this is insufficient. As shown in Section 5, the VM-NAS I/O stack generates a range of I/O sizes, jump distances, and request offsets that cannot be modeled with a simple distribution (uniform or otherwise).

Third, all write requests that come to the NAS server are synchronous. For NFS, this means that the stable attribute is set on each and every write, which is typically not true for physical clients. The block layers of many OSes expect that when the hardware reports a write completion, the data has been saved to persistent storage. Similarly, the NFS protocol's stable attribute specifies that the NFS server cannot reply to a WRITE until the data is persistent. So the hypervisor satisfies the guest OS's expectation by always setting this attribute on WRITE requests. Since many modern NAS servers try to improve performance by gathering write requests into larger chunks in RAM, setting the stable attribute invalidates this important optimization for virtualized clients.

Fourth, in-file randomness increases significantly with virtualized clients. On a physical client, access patterns (whether sequential or random) are distinct on a per-file basis. However, in virtualized clients, both sequential and random operations are blended into a single disk image file. This causes the NAS server to receive what appears to be many more random reads and writes to that file. Furthermore, guest file system fragmentation increases image file randomness.

On the other hand, cross-file randomness decreases, as each disk image file is typically accessed by only a single VM; i.e., it can be easier to predict which files will be accessed next based on their status, and to differentiate them by how actively they are used (running VMs, stopped ones, etc.).

Fifth, the I/O sizes of original requests can both decrease and increase while passing through the virtualization layers. Guest file systems perform reads and writes in units of their block size, often 4KB. So, when reading a file of, say, 6KB size, the NAS server observes two 4KB reads for a total of 8KB, while a physical client would request only 6KB (25% less). Since many modern systems operate with a lot of small files [31], this difference can have a significant impact on bandwidth. Similarly, when reading 2KB of data from two consecutive data blocks in a file (1KB in each block), the NAS server may observe two 4KB reads for a total of 8KB (one for each block), while a physical NAS client may send only a single 2KB request. A NAS server designed for a virtualized environment could optimize its block-allocation and fragmentation-prevention strategies to take advantage of this observation.

Interestingly, I/O sizes can also decrease because guest file systems sometimes split large files into blocks that might not be adjacent. This is especially true for aged file systems with higher fragmentation [37]. Consequently, whereas a physical client might pass an application's 1MB read directly to the NAS, a virtualized client can sometimes submit several smaller reads scattered across the (aged) disk image. An emulated disk controller driver can also reduce the size of an I/O request. For example, we observed that the Linux IDE driver has a maximum I/O size of 128KB, which means that any application requests larger than this value will be split into smaller chunks. Note that such workload changes happen even in a physical machine as requests flow from a file system to a physical disk. However, in a VM-NAS setup, the transformed requests hit not a real disk, but a file on NAS, and as a result the NAS experiences a different workload.

The sixth change is that when an application writes to part of a block, the guest file system must perform a read-modify-write (RMW) to first read in valid data prior to updating and writing it back to the NAS server. Consequently, virtualized clients often cause RMWs to appear on the wire [19], requiring two block-sized round trips for every update. With physical clients, the RMW is generally performed at the NAS server, avoiding the need to first send valid data back to the NAS client.

Seventh, the think time between I/O requests can increase due to varying virtualization overhead. It has been shown that for a single VM and modern hardware, the overhead of virtualization is small [4]. However, as the number of VMs increases, the contention for computational resources grows, which can cause a significant increase in the request inter-arrival times. Longer think times can prevent a NAS device from filling the underlying hardware I/O queues and achieving peak throughput.

In summary, both static and dynamic properties of NAS workloads change when virtualized clients are introduced into the infrastructure. The changes are sufficiently significant that direct comparison of certain workload properties between virtual and physical clients becomes problematic. For example, cross-file randomness has a rather different meaning in the virtual client, where the number of files is usually one per VM. Therefore, in the rest of the paper we focus solely on characterizing workloads from virtualized clients, without trying to compare them directly against the physical client workload. However, where possible, we refer to the original workload properties.

4 VM-NAS Workload Characterization

In this section we describe our experimental setup and then present and characterize a set of four different application-level benchmarks.

4.1 Experimental Configuration

Every layer in the VM-NAS I/O stack can be configured in several ways: different guest OSes can be installed, various virtualization solutions can be used, etc. The way in which the I/O stack is assembled and configured can significantly change the resulting workload. In the current work we did not try to evaluate every possible configuration, but rather selected several representative setups to demonstrate the utility of our techniques. The methodology we have developed is simple and accessible enough to evaluate many other configurations. Table 3 presents the key configuration options and parameters we used in our experiments. Since our final goal is to create NAS benchmarks, we only care about the settings of the layers above the NAS server; we treat the NAS itself as a black box.

Parameter             RHEL 6.2            Win 2008 R2 SP1
No. of CPUs           1                   1
Memory                1GB                 2GB
Host Controller       Paravirtual         LSI Logic Parallel
Disk Drive Size       16GB                50GB
Disk Image Format     Thick flat VMDK     Thick flat VMDK
Guest File System     Ext3                NTFS
Guest I/O Scheduler   CFQ                 n/a

Table 3: Virtual Machine configuration parameters.

We used two physical machines in our experimental setup. The first acted as a NAS server, while the second represented a typical virtualized client (see Figure 3). The hypervisor was installed on a PowerEdge R710 node with an Intel Xeon E5530 2.4GHz 4-core CPU

6 312 11th USENIX Conference on File and Storage Technologies (FAST ’13) USENIX Association and 24GB of RAM. We used local disk drives in this Workload Dataset size Files R/W/M ratio I/O Size machine for the hypervisor installation—VMware ESXi File-server 2.0GB 20,000 1/2/3 WF 5.0.0 build 62386. We used two guest OSes in the vir- Web-server 1.6GB 100,000 10/1/0 WF tual setup: Red Hat Enterprise Linux 6.2 (RHEL 6.2) DB-server 2.0GB 10 10/1/0 2KB and Windows 2008 R2 SP1. We stored the OS’s VM Mail-server 24.0GB 120 1/2/0 32KB disk images on the local, directly attached disk drives. Table 4: High-level workload characterization for our bench- We conducted our experiments with a separate virtual marks. R/W/M is the Read/Write/Modify ratio. WF (Whole- disk in every VM, with the corresponding disk images File) means the workload only reads or writes complete files. being stored on the NAS. We pre-allocated all of the The mail-server workload is based on JetStress, for which disk images (thick provisioning) to avoid performance R/W/M ratios and I/O sizes were estimated based on [24]. anomalies across runs related to thin provisioning (e.g., project are available from https://avatar.fsl.cs.sunysb.edu/ delayed block allocations). The RHEL 6.2 distribution groups/t2mpublic/. comes with a paravirtualized driver for VMware’s emu- Although SPECsfs is a widely used NAS bench- lated controller, so we used this controller for the Linux mark [38], we could not use it in our evaluation because VM. We left the default format and mount options for it incorporates its own NFS client, which makes it im- guest file systems unchanged. possible to run against a regular POSIX interface. We The machine designated as the NAS server was a Dell hope that the workload analysis and proposed bench- PowerEdge 1800 with six 250GB Maxtor 7L250S0 disk marks presented in this paper can be used by SPEC for drives connected through a Dell CERC SATA 1.5/6ch designing future SPECsfs synthetic workloads. controller, intended to be used as a storage server in VMware’s VMmark is a benchmark often associated enterprise environments. It is equipped with an Intel with testing VMs [45]. However, this benchmark is de- Xeon 2.80GHz Irwindale single-core CPU and 512MB signed to evaluate the performance of a hypervisor ma- of memory. The NAS server consisted of both the Linux chine, not the underlying storage system. For example, NFS server and IBM’s General Parallel File System VMmark is sensitive to how fast a hypervisor’s CPU is (GPFS) version 3.5 [35]. GPFS is a scalable clustered and how well it supports virtualization features (such file system that enables a scale-out, highly-available as AMD-V and Intel VT [1, 22]). However, these de- NAS solution and is used in both virtual and non- tails of hypervisor configuration should not have a large virtual environments. Our workload characterization effect on NAS benchmark results. Although VMmark and benchmark synthesis techniques treat NAS servers also indirectly benchmarks the I/O subsystem, it is hard as a black box and are valid regardless of its underly- to distinguish how much the I/O component contributes ing hardware and software. Since our ultimate goal is to the overall system performance. Moreover, VMmark to create benchmarks capable of stressing any NAS, we requires the installation of several hypervisors and ad- did not characterize NAS-specific characteristics such as ditional software (e.g., Microsoft Exchange) to generate request latencies. 
Our benchmarks, however, let us man- the load. Our goal is complementary: to design a real- ually configure the think time. By decreasing think time istic benchmark for the NAS that serves as the backend (along with increasing the number of VMs), a user can storage for a hypervisor like VMware. scale the load to the processing power of a NAS to accu- Our goal in this project was to transform some of the rately measure its peak performance. already existing benchmarks to their virtualized counter- parts. As such, we did not replay any real-world traces in 4.2 Application-Level Benchmarks the VMs. Both Filebench and JetStress generate work- In the Linux VM we used Filebench [15] to generate loads whose statistical characteristics remain the same file system workloads. Filebench can emulate the I/O over time (i.e., stationary workloads). Consequently, patterns of several enterprise applications; we used the new virtualized benchmarks also exhibit this property. File-, Web-, and Database-server workloads. We scaled up the datasets of these workloads so that they were 4.3 Characterization larger than the amount of RAM in the VM (see Table 4). We executed all benchmarks for 10 minutes (excluding Because Filebench does not support Windows, in our the preparation phase) and collected NFS traces at the Windows VM we used JetStress 2010 [23], a disk- NAS server. We repeated every run 3 times and verified subsystem benchmark that generates a Microsoft Ex- the consistency of the results. The traces were collected change Mail-server workload. It emulates accesses to using the GPFS mmtrace facility [21] and then converted the Exchange database by a specific number of users, to the DataSeries format [5] for efficient analysis. with a corresponding number of log file updates. Com- We developed a set of tools for extracting various plete workload configurations (physical and virtualized), workload characteristics. There is always a nearly infi- along with all the software we developed as part of this nite number of characteristics that can be extracted from

7 USENIX Association 11th USENIX Conference on File and Storage Technologies (FAST ’13) 313 100 read fault maximum NFS read and write size. All requests 80 write 60 smaller than 4KB correspond to 0 on the bar graphs. 40 There are few writes smaller than 4KB for the File-

% of requests 20 server and Web-server workloads, but for the Database- 0 File-server Web-server Database-server Mail-server (Win) and Mail-server (JetStress) workloads the corresponding Workload percentages are 80% and 40%, respectively. Such small Figure 4: Read/Write ratios for different workloads writes are typical for databases (Microsoft Exchange a trace, but a NAS benchmark needs to reproduce only emulated by JetStress also uses a database) for two rea- those that significantly impact the performance of NAS sons. First, the Database-server workload writes 2KB at servers. Since there is no complete list of workload char- a time using direct I/O. In this case, the OS page cache acteristics that impact NAS, in the future we plan to con- is bypassed during write handling, and consequently duct a systematic study of NASes to create such a list. the I/O size is not increased to 4KB (the page size) For this paper, we selected characteristics that clearly when it reaches the block layer. The block layer cannot affect most NASes: (1) read/write ratio; (2) I/O size; then merge requests, due to their randomness. Second, (3) jump (seek) distance; and (4) offset popularity. databases often perform operations synchronously by As we mentioned earlier, the workloads produced using the fsync and sync calls. This causes the guest by VMs contain no meta-data operations. Thus, we file system to atomically update its meta-data, which can only characterize the ratio of data operations—READs only be achieved by writing a single sector (512B) to the to WRITEs. The jump distance of a request is defined virtual disk drive (and hence over NFS). as the difference in offsets (block addresses) between it For the File-server and Web-server workloads, most and the immediately preceding request (accounting for of the writes happen in 4KB and 64KB I/O sizes. The I/O size as well). We do not take the operation type into 4KB read size is dominant in all workloads because this account when calculating the jump distance. The off- is the guest file system block size. However, many set popularity is a histogram of the number of accesses of the File-server’s reads were merged into larger re- to each block within the disk image file; we report this quests by the I/O scheduler and then later split into 64KB as the number of blocks that were accessed once, twice, sizes by the NFS client. This happens because the av- etc. We present the offset popularity and I/O size dis- erage file size for the File-server is 128KB, so whole- tributions on a per-operation basis. Figure 4 depicts the file reads can be merged. For the Web-server work- read/write ratios and Figures 5–8 present I/O size, jump load, the average file size is only 16KB, so there are no distance, and offset popularity distributions for all work- 64KB reads at all. For the same reason, the Web-server loads. For jump distance we show a CDF because it is workload exhibits many reads around 16KB (some files the clearest way to present this parameter. are slightly smaller, others are slightly larger, in accor- Read/Write ratio. Read/write ratios vary significantly dance with Filebench’s gamma distribution [47]). In- across the analyzed workloads. The File-server work- terestingly, for the Mail-server workload, many requests load generates approximately the same number of reads have non-common I/O sizes. (We define an I/O size as and writes, although the original workload had twice non-common if fewer than 1% of such requests have as many writes (Table 4). 
We attribute this differ- such I/O size.) We grouped all non-common I/O sizes ence to the high number of meta-data operations (e.g., in the bucket called “Rest” in the histogram. This illus- LOOKUPs and STATs) that were translated to reads by trates that approximately 15% of all requests have non- the I/O stack. The Web-server and the Database-server common I/O sizes for the Mail-server workload. are read-intensive workloads, which is true for both orig- Jump distance. The CDF jump distance distribution inal and virtualized workloads. The corresponding orig- graphs show that many workloads demonstrate a signif- inal workloads do not contain many meta-data opera- icant level of sequentiality, which is especially true for tions, and therefore the read/write ratio remained un- the File-server workload: more than 60% of requests are changed (unlike the File-server workload). The Mail- sequential. Another 30% of the requests in the File- server workload, on the other hand, is write-intensive: server workload represent comparatively short jumps: about 70% of all operations are writes, which is close less than 2GB, the size of the dataset for this work- to the original benchmark where two thirds of all opera- load; these are jumps between different files in the active tions are writes. As with the Web-server and Database- dataset. The remaining 10% of the jumps come from server workloads, the lack of meta-data operations kept meta-data updates and queries, and are spread across the the read/write ratio unchanged, entire disk. The Web-server workload exhibits similar I/O size distribution. The I/O sizes for all workloads behavior except that the active dataset is larger—about vary from 512B to 64KB; the latter limit is imposed by 5–10GB. The cause of this is a larger number of files in the RHEL 6.2 NFS server, which sets 64KB as the de- the workload (compared to File-server) and the alloca-

Figure 5: Characteristics of a virtualized File-server workload: (a) I/O size, (b) jump distance, (c) offset popularity.
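The following is a minimal sketch of how the per-workload characteristics plotted in Figures 5–8 (read/write ratio, I/O-size histograms, jump distances as defined in Section 4.3, and offset popularity) can be derived from a trace of data operations. The trace format—tuples of (operation, offset, size)—is an assumption for illustration; this is not the authors' DataSeries-based tooling.

```python
from collections import Counter

def characterize(trace):
    """trace: iterable of (op, offset, size) with op in {'read', 'write'}.
    Returns read/write counts, per-op I/O-size histograms, jump distances,
    and an offset-popularity histogram."""
    rw = Counter()
    io_sizes = {"read": Counter(), "write": Counter()}
    offsets = Counter()
    jumps = []
    prev_end = None
    for op, offset, size in trace:
        rw[op] += 1
        io_sizes[op][size] += 1
        offsets[offset] += 1
        if prev_end is not None:
            # Jump distance: difference between this request's offset and the
            # end of the previous request, regardless of operation type.
            jumps.append(offset - prev_end)
        prev_end = offset + size
    return rw, io_sizes, jumps, offsets

trace = [("read", 0, 4096), ("read", 4096, 4096), ("write", 1 << 20, 65536)]
print(characterize(trace))
```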

Figure 6: Characteristics of a virtualized Web-server workload: (a) I/O size, (b) jump distance, (c) offset popularity.

Figure 7: Characteristics of a virtualized Database-server workload: (a) I/O size, (b) jump distance, (c) offset popularity.

100 100 100 read 80 read 80 write 80 write 60 60 60 40 40 40 20 20 20 % of offsets % of requests 0 % of requests 0 0 0 32 64 Rest -40 -20 0 20 40 123456789101112131415R I/O Size (KB) Jump distance (GB) Number of requests (a) I/O Size (b) Jump distance (c) Offset popularity Figure 8: Characteristics of a virtualized Mail-server workload. tion policy of Ext3 that tries to spread many files across structures due to frequent file system synchronization. different block groups. The Mail-server workload demonstrates a high number For the Database-server workload there are almost no of overwrites (about 50%). These overwrites are caused sequential accesses. Over 60% of the jumps are within by Microsoft Exchange overwriting the log file multiple 2GB because that is the dataset size. Interestingly, about times. With Mail-server, “R” on the X axes designates 40% of the requests have fairly long jumps that are the “Rest” of the values, because there were too many caused by frequent file system synchronization, which to list. We therefore grouped all of the values that con- leads to meta-data updates at the beginning of the disk. tributed less than 1% into the R bucket. In the Mail-server workload approximately 40% of the requests are sequential, and the rest are spread across 5 New NAS Benchmarks the 50GB disk image file. A slight bend around 24GB This section describes our methodology for the creation corresponds to the active dataset size. Also, note that the of new NAS benchmarks for virtualized environments Mail-server workload uses the NTFS file system, which and then evaluates their accuracy. uses a different allocation policy than Ext3; this explains the difference in the shape of the Mail-server curve from 5.1 Trace-to-Model Conversion other workloads. Our NAS benchmarks generate workloads with charac- Offset popularity. In all workloads, most of the off- teristics that closely follow the statistical distributions sets were accessed only once. The absolute numbers on presented in Section 4.3. We decided not to write a new these graphs depend on the run time, e.g., when one runs benchmarking tool, but rather exploit Filebench’s abil- a benchmark longer, then the chance of accessing the ity to express I/O workloads with its Workload Model- same offset increases. However, the shape of the curve ing Language (WML) [48], which allows one to flex- remains the same as time progresses (although it shifts to ibly define processes and the I/O operations they per- the right). For the Database workload, 40% of all blocks form. Filebench interprets WML and translates its in- were updated several thousand times. We attribute this structions to corresponding POSIX system calls. Our to the repeated updates of the same file system meta-data use of Filebench will facilitate the adoption of our new

9 USENIX Association 11th USENIX Conference on File and Storage Technologies (FAST ’13) 315 virtualized benchmarks: existing Filebench users can the Linux page cache. These settings also ensure that easily run new WML configurations. (1) no additional read requests are performed to the NFS We extended the WML language to support two vir- server (readahead); (2) that all write requests are imme- tualization terms: hypervisor and vm (virtual machine). diately sent to the NFS server without modification; and We call the extended version WML-V (by analogy with (3) that replies are returned only after the data is on disk. AMD-V). WML-V is backwards compatible with the This behavior was validated with extensive testing. This original WML, so users can merge virtualized and non- approach works well in this scenario because we do not virtualized configurations to simultaneously emulate the need to generate meta-data procedures on the wire; that workloads generated by both physical and virtual clients. would be difficult to achieve using this method because For each analyzed workload—File-server, Web- a 1:1 mapping of meta-data operations does not exist be- server, Database-server and Mail-server—we created a tween system calls and NFS procedures. corresponding WML-V configuration file. By modify- Our enhanced Filebench reports aggregate operations ing these files, a user can adjust the workloads to reflect per second for all VMs and individually for each VM. a desired benchmarking scenario, e.g., defining the num- Operations in the case of virtualized benchmarks are ber of VMs and the workloads they run. different from the original non-virtualized equivalent: Listing 1 presents an abridged example of a WML-V our benchmarks report the number of reads and writes configuration file that defines a single hypervisor, which per second; application-level benchmarks, however, re- runs 5 Database VMs and 2 Web-server VMs. Flowops port application-level operations (e.g., the number of are Filebench’s defined I/O operations, which are HTTP requests serviced by a Web-server). Neverthe- mapped to POSIX calls, such as open, create, read, less, the numbers reported by our benchmarks can be di- write, and delete. In the VM case, we only use read rectly used to compare the performance of different NAS and write flowops, since meta-data operations do not ap- servers under a configured workload. pear in the virtualized workloads. For every defined VM, None of our original benchmarks, except the database Filebench will pre-allocate a disk image file of a user- workload, emulated think time, because our test was de- defined size—16GB in the example listing. signed as an I/O benchmark. For the database bench- 1 HYPERVISOR name="physical-host1" { mark we defined think time as originally defined in 2 VM name="dbserver-vm",dsize=16gb,instances=5 { 3 flowop1, ... Filebench—200,000 loop iterations. Think time in all 4 } workloads can be adjusted by trivial changes to the 5 VM name="websever-vm",dsize=16gb,instances=2 { 6 flowop1, ... workload description. 7 } 8 } 5.2 Evaluation Listing 1: An abridged WML-V workload description that de- To evaluate the accuracy of our benchmarks we observed fines 7 VMs: 5 run database workloads and 2 generate Web- how the NAS server responds to the virtualized bench- server workloads. marks as compared to the original benchmarks when Filebench allows one to define random variables with executed in a VM. 
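Listing 1 defines hypervisors and VMs using WML-V's HYPERVISOR and VM blocks. For experiments with many different VM mixes it can be convenient to generate such skeletons programmatically. The short sketch below does so using only the constructs that appear in the abridged listing (block names, dsize, instances); the flowop lines are left as placeholders exactly as in Listing 1. This is a convenience sketch under those assumptions, not part of the authors' toolchain.

```python
# Emit a WML-V skeleton for a given VM mix, following the structure of
# Listing 1. Flowops are placeholders; a real workload description would
# define them per VM.
def wmlv_config(host, vm_mix, dsize="16gb"):
    """vm_mix: list of (vm_name, instance_count) tuples."""
    lines = ['HYPERVISOR name="%s" {' % host]
    for name, count in vm_mix:
        lines.append('  VM name="%s",dsize=%s,instances=%d {' % (name, dsize, count))
        lines.append('    flowop1, ...')
        lines.append('  }')
    lines.append('}')
    return "\n".join(lines)

# Mirroring the 7-VM example from Listing 1: 5 database VMs, 2 Web-server VMs.
print(wmlv_config("physical-host1", [("dbserver-vm", 5), ("webserver-vm", 2)]))
```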
We monitored 11 parameters that desired empirical distributions; various flowop attributes represent the response of a NAS and are easy to ex- can then be assigned to these random variables. We used tract through the Linux /proc interface: (1) Reads/sec- this ability to define read and write I/O-size distributions ond from the underlying block device; (2) Writes/sec- and jump distances. We achieved the required read/write ond; (3) Request latency; (4) I/O utilization; (5) I/O ratios by putting an appropriate number of read and write queue length; (6) Request size; (7) CPU utilization; flowops within the VM definition. The generation of (8) Memory usage; (9) Interrupt count; (10) Context- a workload with user-defined jump distances and offset switch count; and (11) Number of processes in the wait popularity distributions is a complex problem [28] that state. We call these NAS response parameters. Filebench does not solve; in this work, we do not attempt We sampled the response parameters every 30 sec- to emulate this parameter. However, as we show in the onds during a 10-minute run and calculated the relative following section, this does not significantly affect the difference between each pair of parameters. Figure 9 accuracy of our benchmarks. presents maximum and Root Mean Square (RMS) dif- Ideally, we would like Filebench to translate flowops ference we observed for four workloads. In these exper- directly to NFS procedures. However, this would re- iments a single VM with an appropriate workload was quire us to implement an NFS client within Filebench used. The maximum relative error of our benchmarks is (which is an ongoing effort within the Filebench com- always less than 10%, and the RMS distance is within munity). To work around this limitation, we mount NFS 7% across all parameters. Certain response parameters with the sync flag and open the disk image files with show especially high accuracy; for example, the RMS the O DIRECT flag, ensuring that I/O requests bypass distance for request size is within 4%. Here, the accu-





Figure 9: Root Mean Square (RMS) and maximum relative distances of response parameters for all workloads: (a) File-server, (b) Web-server, (c) Database-server, (d) Mail-server (Windows, JetStress).
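The RMS and maximum relative distances shown in Figure 9 can be computed from the periodically sampled response parameters roughly as sketched below. This is a reconstruction of the arithmetic described in Section 5.2 (samples every 30 seconds, relative difference per sample pair), not the authors' analysis scripts.

```python
import math

def relative_errors(reference, emulated):
    """reference, emulated: equal-length sample lists of one response
    parameter (e.g., reads/sec every 30 seconds) for the original in-VM
    benchmark and for the virtualized benchmark."""
    return [abs(e - r) / abs(r) for r, e in zip(reference, emulated) if r != 0]

def rms_and_max(reference, emulated):
    errs = relative_errors(reference, emulated)
    rms = math.sqrt(sum(e * e for e in errs) / len(errs))
    return 100.0 * rms, 100.0 * max(errs)   # percentages, as plotted in Figure 9

print(rms_and_max([100, 110, 95, 105], [103, 104, 97, 110]))
```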


Error (%) 4 CPU Utilization Error (%) 6 CPU Utilization 3 Memory Memory 4 2 Interrupts Interrupts 1 Context Switches 2 Context Switches 1 2 3 4 5 6 7 8 Waiting Processes 1 2 3 4 5 6 7 8 Waiting Processes Number of VMs Number of VMs (a) RMS distance (b) Maximum Figure 10: Response parameter errors depending on the number of VMs deployed. The first four VMs (1–4) execute four different workloads we analyzed. The next four VMs (5–8) are repeated in the same order. racy is high because our benchmarks directly emulate observed during the whole run was the highest among I/O size distribution. Errors in CPU and memory utiliza- other parameters—in the 10–13% range. tion were less than 5%, because the NAS in our experi- In summary, our benchmarks show a high accuracy ments did not perform many CPU-intensive tasks. for both single- and multi-VM experiments, even under heavy stress. Scalability with Multiple Virtual Machines. The benefit of our benchmarks is that a user can define many 6 Related Work VMs with different workloads and measure NAS perfor- Storage performance in virtualized environments is an mance against this specific workload configuration. To active research area. Le et al. studied the storage per- verify that the accuracy of our benchmarks does not de- formance implications of combining different guest and crease as we emulate more VMs, we conducted a multi- host file systems [29]. Boutcher et al. examined how the VM experiment. We first ran one VM with a File-server selection of guest OS and host I/O schedulers impacts in it, then added a second VM with a Web-server work- the performance of a virtual machine [10]. Both of these load, then a third VM executing the Database-server works focused on the performance aspects of the prob- workload, and finally a fourth VM running JetStress. Af- lem, not workload characterization or generation; also, ter that we added another four VMs with the same four the authors used direct-attached storage, which is sim- workloads in the same order. In total we had 8 different pler but less common in modern enterprise data centers. configurations ranging from 1 to 8 VMs; this setup was Hildebrand et al. discussed the implications of us- designed to heavily stress the NAS under several, differ- ing the VM-NAS architecture with enterprise storage ent, concurrently running workloads. We then emulated servers [19]. That work focused on the performance im- the same 8 configurations using our benchmarks and plications of the VM-NAS I/O stack without thoroughly again monitored the response parameters. Figures 10(a) investigating the changes to the I/O workload. Gulati et and 10(b) depict RMS and maximum relative errors, re- al. characterized the SAN workloads produced by VMs spectively, depending on the number of VMs. for several enterprise applications [17]. Our techniques When a single VM is emulated, our benchmarks can also be used to generate new benchmarks for SAN- show the best accuracy. Beyond one VM, the RMS er- based deployments, but we selected to investigate VM- ror increased by about 3–5%, but still remained within NAS setups first, for two reasons. First, NAS servers are 10%. For four parameters—latency, writes/sec, inter- becoming a more popular solution for hosting VM disk rupts and context switches count—the maximum error images. Second, the degree of workload change in such


6 Related Work

Storage performance in virtualized environments is an active research area. Le et al. studied the storage performance implications of combining different guest and host file systems [29]. Boutcher et al. examined how the selection of guest OS and host I/O schedulers impacts the performance of a virtual machine [10]. Both of these works focused on the performance aspects of the problem, not on workload characterization or generation; in addition, both used direct-attached storage, which is simpler but less common in modern enterprise data centers.

Hildebrand et al. discussed the implications of using the VM-NAS architecture with enterprise storage servers [19]. That work focused on the performance implications of the VM-NAS I/O stack without thoroughly investigating the changes to the I/O workload. Gulati et al. characterized the SAN workloads produced by VMs for several enterprise applications [17]. Our techniques can also be used to generate new benchmarks for SAN-based deployments, but we chose to investigate VM-NAS setups first, for two reasons. First, NAS servers are becoming a more popular solution for hosting VM disk images. Second, the degree of workload change in such deployments is higher: NAS servers use more complex network file-system protocols, whereas SANs and DAS use a simpler block-based protocol.

Ahmad et al. studied performance overheads caused by I/O stack virtualization in ESX with a SAN [4]. That study did not focus on workload characterization but rather tried to validate that modern VMs introduce low overhead compared to physical nodes. Later, the same authors proposed a low-overhead method for on-line workload characterization in ESX [3]. However, their tool characterizes traces collected at the virtual SCSI layer and consequently does not account for any transformations that may occur in ESX and its NFS client. In contrast, we collect the trace at the NAS layer, after all request transformations, allowing us to create more accurate benchmarks.

Casale et al. proposed a model for predicting storage performance when multiple VMs use shared storage [12, 26]. Practical benchmarks like ours are complementary to that work and allow one to verify such predictions in real life. Ben-Yehuda et al. analyzed performance bottlenecks when several VMs are used to provide different functionalities on a storage controller [7]; the authors focused on lowering network overhead via intelligent polling and other techniques.

Trace-driven performance evaluation and workload characterization have been the basis of many studies [14, 25, 27, 33]. Our trace-characterization and benchmark-synthesis techniques are based on multi-dimensional workload analysis. Chen et al. used multi-dimensional trace analysis to infer the behavior of enterprise storage systems [13]. Tarasov et al. proposed a technique for automated translation of block-I/O traces to workload models [40]. Yadwadkar et al. proposed to discover applications based on multi-dimensional characteristics of NFS traces [49].

In summary, to the best of our knowledge, there have been no earlier studies that systematically analyzed virtualized NAS workloads. Moreover, we are the first to present new NAS benchmarks that accurately generate virtualized I/O workloads.

7 Conclusions and Future Work

We have studied the transformation of existing NAS I/O workloads due to server virtualization. Although such transformations were known to occur, they have not been studied in depth to date. Our analysis revealed several significant I/O workload changes due to the use of disk images and the placement of the guest block layer above the NAS file client. We observed and quantified significant changes such as the disappearance of file system meta-data operations at the NAS layer, changes in I/O sizes, changes in file counts and directory depths, asynchrony changes, increased randomness within files, and more.

Based on these observations from real-world workloads, we developed new benchmarks that accurately represent NAS workloads in virtualized data centers—and yet these benchmarks can be run directly against the NAS without requiring a complex virtualization environment configured with VMs and applications. Our new virtualized benchmarks represent four workloads, two guest operating systems, and up to eight virtual machines. Our evaluation reveals that the relative error of these new benchmarks across more than 11 parameters is less than 10% on average. In addition to providing a directly usable measurement tool, we hope that our work will provide guidance to future NAS standards, such as SPEC, in devising benchmarks that are better suited to virtualized environments.

Future work. We plan to extend the number of generated benchmarks by analyzing actual applications and application traces, including typical VM operations such as booting, updating, and snapshotting—and examine root and I/O swap partition access patterns. We also expect to explore more VM configuration options, such as additional guest file systems (and their age), hypervisors, and NAS protocols. Once a larger body of virtual NAS benchmarks exists, we will be able to study the I/O workload's sensitivity to each configuration parameter, as well as investigate the impact of extracting and reproducing additional trace characteristics in the generated benchmarks. To avoid manual analysis and transformation of large numbers of applications, we plan to investigate the feasibility of automatically transforming physical workloads to virtual workloads via a multi-level trace analysis of the VM-NAS I/O stack.

Acknowledgements. We thank the anonymous USENIX reviewers for their valuable feedback. This work was supported in part by NSF awards CCF-0937833 and CCF-0937854, and an IBM PhD Fellowship.

References

[1] Advanced Micro Devices, Inc. Industry leading virtualization platform efficiency, 2008. www.amd.com/virtualization.
[2] N. Agrawal, W. J. Bolosky, J. R. Douceur, and J. R. Lorch. A five-year study of file-system metadata. In Proceedings of the Fifth USENIX Conference on File and Storage Technologies (FAST '07), 2007.
[3] I. Ahmad. Easy and efficient disk I/O workload characterization in VMware ESX server. In Proceedings of IEEE International Symposium on Workload Characterization (IISWC), 2007.
[4] I. Ahmad, J. M. Anderson, A. M. Holler, R. Kambo, and V. Makhija. An analysis of disk performance in VMware ESX server virtual machines. In Proceedings of IEEE International Symposium on Workload Characterization (IISWC), 2003.


[5] E. Anderson, M. Arlitt, C. Morrey, and A. Veitch. DataSeries: an efficient, flexible data format for structured serial data. ACM SIGOPS Operating Systems Review, 43(1), January 2009.
[6] Atlantis Computing. www.atlantiscomputing.com/.
[7] M. Ben-Yehuda, M. Factor, E. Rom, A. Traeger, E. Borovik, and B. Yassour. Adding advanced storage controller functionality via low-overhead virtualization. In Proceedings of the Tenth USENIX Conference on File and Storage Technologies (FAST '12), 2012.
[8] Thomas Bittman. Virtual machines and market share through 2012. Gartner, October 2009. ID Number: G00170437.
[9] Thomas Bittman. Q&A: six misconceptions about server virtualization. Gartner, July 2010. ID Number: G00201551.
[10] D. Boutcher and A. Chandra. Does virtualization make disk scheduling passé? In Proceedings of the 1st Workshop on Hot Topics in Storage (HotStorage '09), 2009.
[11] D. Capps. IOzone file system benchmark. www.iozone.org.
[12] G. Casale, S. Kraft, and D. Krishnamurthy. A model of storage I/O performance interference in virtualized systems. In Proceedings of the International Workshop on Data Center Performance (DCPerf), 2011.
[13] Y. Chen, K. Srinivasan, G. Goodson, and R. Katz. Design implications for enterprise storage systems via multi-dimensional trace analysis. In Proceedings of the 23rd ACM Symposium on Operating Systems Principles (SOSP '11), 2011.
[14] D. Ellard, J. Ledlie, P. Malkani, and M. Seltzer. Everything you always wanted to know about NFS trace analysis, but were afraid to ask. Technical Report TR-06-02, Harvard University, Cambridge, MA, June 2002.
[15] Filebench. http://filebench.sourceforge.net.
[16] G. R. Ganger and M. F. Kaashoek. Embedded inodes and explicit grouping: exploiting disk bandwidth for small files. In Proceedings of the Annual USENIX Technical Conference, 1997.
[17] A. Gulati, C. Kumar, and I. Ahmad. Storage workload characterization and consolidation in virtualized environments. In Proceedings of the 2nd International Workshop on Virtualization Performance: Analysis, Characterization, and Tools (VPACT), 2009.
[18] A. Gulati, G. Shanmuganathan, X. Zhang, and P. Varman. Demand based hierarchical QoS using storage resource pools. In Proceedings of the Annual USENIX Technical Conference, 2012.
[19] D. Hildebrand, A. Povzner, R. Tewari, and V. Tarasov. Revisiting the storage stack in virtualized NAS environments. In Proceedings of the Workshop on I/O Virtualization (WIOV '11), 2011.
[20] D. Hitz, J. Lau, and M. Malcolm. File system design for an NFS file server appliance. In Proceedings of the USENIX Winter Technical Conference, 1994.
[21] IBM. General Parallel File System problem determination guide. Technical Report GA22-7969-02, IBM, December 2004. http://pic.dhe.ibm.com/infocenter/db2luw/v9r8/index.jsp?topic=%2Fcom.ibm.db2.luw.sd.doc%2Fdoc%2Ft0056934.html.
[22] Intel Corporation. Intel virtualization technology (Intel VT), 2008. www.intel.com/technology/virtualization/.
[23] Microsoft Exchange Server JetStress 2010. www.microsoft.com/en-us/download/details.aspx?id=4167.
[24] Understanding database and log performance factors. http://technet.microsoft.com/en-us/library/ee832791.aspx.
[25] S. Kavalanekar, B. Worthington, Q. Zhang, and V. Sharda. Characterization of storage workload traces from production Windows servers. In Proceedings of IEEE International Symposium on Workload Characterization (IISWC), 2008.
[26] S. Kraft, G. Casale, D. Krishnamurthy, D. Greer, and P. Kilpatrick. Performance models of storage contention in cloud environments. Software and Systems Modeling, 12(2), March 2012.
[27] G. H. Kuenning, G. J. Popek, and P. Reiher. An analysis of trace data for predictive file caching in mobile computing. In Proceedings of the Summer 1994 USENIX Conference, 1994.
[28] Z. Kurmas, J. Zito, L. Trevino, and R. Lush. Generating a jump distance based synthetic disk access pattern. In Proceedings of the International IEEE Symposium on Mass Storage Systems and Technologies (MSST), 2006.
[29] D. Le, H. Huang, and H. Wang. Understanding performance implications of nested file systems in a virtualized environment. In Proceedings of the Tenth USENIX Conference on File and Storage Technologies (FAST '12), 2012.


[30] A. W. Leung, S. Pasupathy, G. Goodson, and E. L. Miller. Measurement and analysis of large-scale network file system workloads. In Proceedings of the USENIX Annual Technical Conference (ATC '08), 2008.
[31] D. Meyer and W. Bolosky. A study of practical deduplication. In Proceedings of the Ninth USENIX Conference on File and Storage Technologies (FAST '11), 2011.
[32] OpenStack Foundation. www.openstack.org/.
[33] J. Ousterhout, H. Costa, D. Harrison, J. Kunze, M. Kupfer, and J. Thompson. A trace-driven analysis of the UNIX 4.2 BSD file system. In Proceedings of the Tenth ACM Symposium on Operating Systems Principles (SOSP), 1985.
[34] D. Roselli, J. R. Lorch, and T. E. Anderson. A comparison of file system workloads. In Proceedings of the Annual USENIX Technical Conference, 2000.
[35] F. Schmuck and R. Haskin. GPFS: A shared-disk file system for large computing clusters. In Proceedings of the First USENIX Conference on File and Storage Technologies (FAST '02), 2002.
[36] Simplivity. www.simplivity.com/.
[37] K. A. Smith and M. I. Seltzer. File system aging—increasing the relevance of file system benchmarks. In Proceedings of the 1997 International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS 1997), 1997.
[38] SPEC. SPECsfs2008. www.spec.org/sfs2008, July 2008.
[39] J. Sugerman, G. Venkitachalam, and B. Lim. Virtualizing I/O devices on VMware Workstation's hosted virtual machine monitor. In Proceedings of the Annual USENIX Technical Conference, 2001.
[40] V. Tarasov, K. S. Kumar, J. Ma, D. Hildebrand, A. Povzner, G. Kuenning, and E. Zadok. Extracting flexible, replayable models from large block traces. In Proceedings of the Tenth USENIX Conference on File and Storage Technologies (FAST '12), 2012.
[41] Tintri. www.tintri.com/.
[42] VMware vCloud. http://vcloud.vmware.com/.
[43] Richard Villars and Noemi Greyzdorf. Worldwide file-based storage 2010–2014 forecast update. IDC, December 2010. IDC #226267.
[44] VirtualBox. https://www.virtualbox.org/.
[45] VMMark. www.vmware.com/go/vmmark.
[46] VMware, Inc. VMware Virtual Machine File System: Technical Overview and Best Practices, 2007. www.vmware.com/pdf/vmfs-best-practices-wp.pdf.
[47] A. W. Wilson. Operation and implementation of random variables in Filebench.
[48] Filebench Workload Model Language (WML). http://sourceforge.net/apps/mediawiki/filebench/index.php?title=Filebench_Workload_Language.
[49] N. Yadwadkar, C. Bhattacharyya, and K. Gopinath. Discovery of application workloads from network file traces. In Proceedings of the Eighth USENIX Conference on File and Storage Technologies (FAST '10), 2010.
[50] Natalya Yezhkova, Liz Conner, Richard L. Villars, and Benjamin Woo. Worldwide enterprise storage systems 2010–2014 forecast: recovery, efficiency, and digitization shaping customer requirements for storage systems. IDC, May 2010. IDC #223234.
[51] Y. Yu, D. Shin, H. Eom, and H. Yeom. NCQ vs. I/O scheduler: preventing unexpected misbehaviors. ACM Transactions on Storage, 6(1), March 2010.

