Virtual Machine Workloads: The Case for New Benchmarks for NAS
Vasily Tarasov (Stony Brook University), Dean Hildebrand (IBM Research—Almaden), Geoff Kuenning (Harvey Mudd College), Erez Zadok (Stony Brook University)

Abstract

Network Attached Storage (NAS) and Virtual Machines (VMs) are widely used in data centers thanks to their manageability, scalability, and ability to consolidate resources. But the shift from physical to virtual clients drastically changes the I/O workloads seen on NAS servers, due to guest file system encapsulation in virtual disk images and the multiplexing of request streams from different VMs. Unfortunately, current NAS workload generators and benchmarks produce workloads typical of physical machines.

This paper makes two contributions. First, we studied the extent to which virtualization is changing existing NAS workloads. We observed significant changes, including the disappearance of file system meta-data operations at the NAS layer, changed I/O sizes, and increased randomness. Second, we created a set of versatile NAS benchmarks to synthesize virtualized workloads. This allows us to generate accurate virtualized workloads without the effort and limitations associated with setting up a full virtualized environment. Our experiments demonstrate that the relative error of our virtualized benchmarks, evaluated across 11 parameters, averages less than 10%.

1 Introduction

By the end of 2012 almost half of all applications running on x86 servers will be virtualized; in 2014 this number is projected to be close to 70% [8, 9]. Virtualization, if applied properly, can significantly improve system utilization, reduce management costs, and increase system reliability and scalability. With all the benefits of virtualization, managing the growth and scalability of storage is emerging as a major challenge.

In recent years, growth in network-based storage has outpaced that of direct-attached disks; by 2014 more than 90% of enterprise storage capacity is expected to be served by Network Attached Storage (NAS) and Storage Area Networks (SAN) [50]. Network-based storage can improve availability and scalability by providing shared access to large amounts of data. Within the network-based storage market, NAS capacity is predicted to increase at an annual growth rate of 60%, compared to only 22% for SAN [43]. This faster NAS growth is explained in part by its lower cost and its convenient file system interface, which is richer, easier to manage, and more flexible than the block-level SAN interface.

The rapid expansion of virtualization and NAS has led to explosive growth in the number of virtual disk images being stored on NAS servers. Encapsulating file systems in virtual disk image files simplifies the implementation of features such as migration, cloning, and snapshotting, since they naturally map to existing NAS functions. In addition, non-virtualized hosts can co-exist peacefully with virtualized ones that use the same NAS interface, which permits a gradual migration of services from physical to virtual machines.

Storage performance plays a crucial role when administrators select the best NAS for their environment. One traditional way to evaluate NAS performance is to run a file system benchmark, such as SPECsfs2008 [38]. Vendors periodically submit the results of SPECsfs2008 to SPEC; the most recent submission was in November 2012. Because widely publicized benchmarks such as SPECsfs2008 figure so prominently in configuration and purchase decisions, it is essential to ensure that the workloads they generate represent what is observed in real-world data centers.

This paper makes two contributions: an analysis of changing virtualized NAS workloads, and the design and implementation of a system to generate realistic virtualized NAS workloads. We first demonstrate that the workloads generated by many current file system benchmarks do not represent the actual workloads produced by VMs. This in turn leads to a situation where the performance results of a benchmark deviate significantly from the performance observed in real-world deployments. Although benchmarks are never perfect models of real workloads, the introduction of VMs has exacerbated the problem significantly. Consider just one example: the percentage of data and meta-data operations generated by physical and virtualized clients. Table 1 presents the results for the SPECsfs2008 and Filebench Web-server benchmarks that attempt to provide a "realistic" mix of meta-data and data operations. We see that meta-data procedures, which dominated in physical workloads, are almost non-existent when VMs are utilized. The reason is that VMs store their guest file system inside large disk image files. Consequently, all meta-data operations (and indeed all data operations) from the applications are converted into simple reads and writes to the image file.

NFS procedures   Physical clients (SPECsfs2008 / Filebench)   Virtualized clients
Data             28% / 36%                                    99%
Meta-data        72% / 64%                                    <1%

Table 1: The striking differences between virtualized and physical workloads for two benchmarks: SPECsfs2008 and Filebench (Web-server profile). Data operations include READ and WRITE. All other operations (e.g., CREATE, GETATTR, READDIR) are characterized as meta-data.
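To make the data/meta-data split above concrete, the following is a minimal sketch of how such percentages can be computed from a list of observed NFS procedures. The record format (a plain list of procedure names) and the classification rule are assumptions for illustration only; this is not the authors' tooling.

```python
from collections import Counter

# Assumed classification, matching the convention used in Table 1:
# READ/WRITE are data; everything else is meta-data.
DATA_PROCS = {"READ", "WRITE"}

def data_metadata_split(procedures):
    """Given an iterable of NFS procedure names, return the percentage
    of data and meta-data operations."""
    counts = Counter("data" if p.upper() in DATA_PROCS else "metadata"
                     for p in procedures)
    total = sum(counts.values()) or 1
    return {kind: 100.0 * n / total for kind, n in counts.items()}

# Hypothetical trace fragment:
print(data_metadata_split(["LOOKUP", "GETATTR", "READ", "WRITE", "CREATE"]))
```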

Meta-data-to-data conversion is just one example of the way workloads shift when virtual machines are introduced. In this paper we examine, by collecting and analyzing a set of I/O traces generated by current benchmarks, how NAS workloads change when used in virtualized environments. We then leverage multi-dimensional trace analysis techniques to convert these traces to benchmarks [13, 40]. Our new virtual benchmarks are flexible and configurable, and support single- and multi-VM workloads. With multi-VM workloads, the emulated VMs can all run the same or different application workloads (a common consequence of resource consolidation). Further, users do not need to go through a complex deployment process, such as hypervisor setup and per-VM OS and application installation, but can instead just run our benchmarks. This is useful because administrators typically do not have access to the production environment when evaluating new or existing NAS servers for prospective virtualized clients. Finally, some benchmarks such as SPECsfs cannot be usefully run inside a VM because they do not support file-level interfaces and will continue to generate a physical workload to the NAS server; this means that new benchmarks can be the only viable evaluation option. Our benchmarks are capable of simulating a high load (i.e., many VMs) using only modest resources. Our experiments demonstrate that the accuracy of our benchmarks remains within 10% across 11 important parameters.

2 Background

In this section, we present several common data access methods for virtualized applications, describe in depth the changes in the virtualized NAS I/O stack (VM-NAS), and then explain the challenges in benchmarking NAS systems in virtualized environments.

2.1 Data Access Options for VMs

Many applications are designed to access data using a conventional POSIX file system interface. The methods that are currently used to provide this type of access in a VM can be classified into two categories: (1) emulated block devices (typically managed in the guest by a local file system); and (2) guest network file system clients.

Figure 1 illustrates both approaches. With an emulated block device, the hypervisor emulates an I/O controller with a connected disk drive. Emulation is completely transparent to the guest OS, and the virtual I/O controller and disk drives appear as physical devices to the OS. The guest OS typically formats the disk drive with a local file system or uses it as a raw block device. When an emulated block device is backed by file-based storage, we call the backing files disk image files.

Figure 1: VM data-access methods. Cases 1a–1d correspond to the emulated-block-device architecture (a disk image file on DAS, on SAN, or on NAS, or pass-through to DAS or SAN); case 1c, a disk image file on NAS, is the case analyzed in this paper. Case 2 corresponds to the use of guest network file system clients.

2.1.1 Emulated Block Devices

Figure 1 shows several options for implementing the back end of an emulated block device:

1a. A file located on a local file system that is deployed on Direct Attached Storage (DAS). This approach is used, for example, by home and office installations of VMware Workstation [39] or Oracle VirtualBox [44]. Such systems often keep their disk images on local file systems (e.g., Ext3, NTFS). Although this architecture works for small deployments, it is rarely used in large enterprises where scalability, manageability, and high availability are critical.

1b. A disk image file stored on a (possibly clustered) file system deployed over a Storage Area Network (SAN) (e.g., VMware's VMFS file system [46]). A SAN offers low-latency shared access to the available block devices, which allows high-performance clustered file systems to be deployed on top of the SAN. This architecture simplifies VM migration and offers higher scalability than DAS, but SAN hardware is more expensive and complex to administer.

1c. A disk image file stored on Network Attached Storage (NAS). In this architecture, which we call VM-NAS, the host's hypervisor passes I/O requests from the virtual machine to an NFS or SMB client, which in turn accesses a disk image file stored on an external file server. The hypervisor is completely unaware of the storage architecture behind the NAS interface. NAS provides the scalability, reliability, and data mobility needed for efficient VM management. Typically, NAS solutions are cheaper than SANs due to their use of IP networks, and are simpler to configure and manage. These properties have increased the use of NAS in virtual environments and encouraged several companies to create solutions for disk image file management at the NAS [6, 36, 41].

1d. Pass-through to DAS or SAN. In this case, virtual disks are backed by a real block device (not a file), which can be on a SAN or DAS. This approach is less flexible than disk image files, but can offer lower overhead because one level of indirection—the host file system—is eliminated.

2.1.2 Network Clients in the Guest

The other approach for providing storage to a virtual machine is to let a network-based file system (e.g., NFS) provide access to the data directly from the guest (case 2 in Figure 1). This model avoids the need for disk image files, so no block-device emulation is needed. This eliminates emulation overheads, but lacks many of the benefits associated with virtualization, such as consistent snapshots, thin provisioning, cloning, and disaster recovery. Also, not every guest OS supports every NAS protocol, which fetters the ability of a hypervisor and its storage system to support all guest OS types. Further, cloud management architectures such as VMware's vCloud and OpenStack do not support this design [32, 42].

2.2 VM-NAS I/O Stack

In this paper we focus on the VM-NAS architecture, where VM disks are emulated by disk image files stored on NAS (case 1c in Section 2.1.1 and in Figure 1). To the best of our knowledge, even though this architecture is becoming popular in virtual data centers [43, 50], there has been no study of the significant transformations in typical NAS I/O workloads caused by server virtualization. This paper is a first step towards a better understanding of NAS workloads in virtualized environments and the development of suitable benchmarks for NAS to be used in industry and academia.

Figure 2: VM-NAS I/O Stack: VMs access and store virtual disk images on NAS. (In the guest VM: applications or a file system benchmark, the on-disk file system, the block layer, and the I/O controller driver; in the hypervisor: the controller emulator and the NFS client; across the network, the storage appliance: NFS server, on-disk file system, block layer, and driver.)

When VMs and NAS are used together, the corresponding I/O stack becomes deeper and more complex, as seen in Figure 2. As they pass through the layers, I/O requests significantly change their properties. At the top of the stack, applications access data using system calls such as create, read, write, and unlink. These system calls invoke the underlying guest file system, which in turn converts application calls into I/O requests to the block layer. The file system maintains data and meta-data layouts, manages concurrent accesses, and often caches and prefetches data to improve application performance. All of these features change the pattern of application requests.

The guest OS's block layer receives requests from the file system and reorders and merges them to increase performance, provide process fairness, and prioritize requests. The I/O controller driver, located beneath the generic block layer, imposes extra limitations on the requests in accordance with the virtual device's capabilities (e.g., trims requests to the maximum supported size and limits the NCQ queue length [51]).

After that, requests cross the software-hardware boundary for the first time (here, the hardware is emulated). The hypervisor's emulated controller translates the guest's block-layer requests into reads and writes to the corresponding disk image files. Various request transformations can be done by the hypervisor to optimize performance and provide fair access to the data from multiple VMs [18].

The hypervisor contains its own network file system client (e.g., NFS), which can cache data, limit read and write sizes, and perform other request transformations. In this paper we focus on NFSv3 because it is one of the most widely used protocols. However, our methodology is easily extensible to SMB or NFSv4, and we plan to perform expanded studies in the future. In the case of NFSv3, both the client and the server can limit read- and write-transfer sizes and modify write-synchronization properties. Because the hypervisor and its NFS client significantly change I/O requests, it is not sufficient to collect data at the block layer of the guest OS; we collect our traces at the entrance to the NFS server.

After the request is sent over a network to the NAS server, the same layers that appear in the guest OS are repeated in the server. By this time, however, the original requests have already undergone significant changes performed by the upper layers, so the optimizations applied by similar layers at the server can be considerably different. Moreover, many NAS servers (e.g., NetApp [20]) run a proprietary OS that uses specialized request-handling algorithms, additionally complicating the overall system behavior. This complex behavior has a direct effect on measurement techniques, as we discuss next in Section 2.3.
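As a rough illustration of the transformations described above, the sketch below models how a single application request is reshaped on its way to the NAS: it is expanded to the guest file system's block granularity, split by the emulated controller driver's maximum transfer size, and split again by the NFS transfer limit. The constants (4KB blocks, a 128KB driver limit, 64KB NFS transfers) come from observations reported later in this paper; the code itself is a simplified model rather than the authors' implementation—it ignores caching, request merging, and fragmentation.

```python
# A simplified model of VM-NAS request transformation (illustrative only).
FS_BLOCK = 4 * 1024          # guest file system block size
DRIVER_MAX = 128 * 1024      # e.g., Linux IDE driver maximum I/O size
NFS_MAX = 64 * 1024          # NFS client/server transfer-size limit

def round_up(n, unit):
    return ((n + unit - 1) // unit) * unit

def split(offset, length, unit):
    """Split one request into chunks no larger than 'unit' bytes."""
    chunks = []
    while length > 0:
        n = min(unit, length)
        chunks.append((offset, n))
        offset += n
        length -= n
    return chunks

def to_nas_requests(app_offset, app_length):
    """Return the (offset, size) requests the NAS would observe for one
    application read/write under the simplifying assumptions above."""
    # 1. The guest file system issues block-aligned I/O.
    start = (app_offset // FS_BLOCK) * FS_BLOCK
    end = round_up(app_offset + app_length, FS_BLOCK)
    # 2. The controller driver splits large requests.
    reqs = split(start, end - start, DRIVER_MAX)
    # 3. The NFS layer limits transfer sizes again.
    return [c for off, ln in reqs for c in split(off, ln, NFS_MAX)]

# A 6KB application read reaches the NAS as 8KB of block-aligned I/O
# (two 4KB blocks, here merged into one extent) instead of 6KB.
print(to_nas_requests(0, 6 * 1024))
# A 1MB application read arrives as sixteen 64KB requests.
print(to_nas_requests(0, 1024 * 1024))
```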

Figure 3: Physical and Virtualized NAS architectures. (a) Physical clients: applications in the operating system use a NAS client (NFS/CIFS) to access the NAS appliance (front-end: NFS/CIFS; back-end: GPFS/WAFL/ZFS) directly. (b) Virtualized clients: applications in VMs access the NAS appliance via a virtualized block device provided by the hypervisor's NFS/CIFS client.

2.3 VM-NAS Benchmarking Setup

Regular file system benchmarks usually operate at the application layer and generate workloads typical of one or a set of applications (Figure 2). In non-virtualized deployments these benchmarks can be used without any changes to evaluate the performance of a NAS server, simply by running the benchmark on a NAS client. In virtualized deployments, however, I/O requests can change significantly before reaching the NAS server due to the deep and diverse I/O stack described above. Therefore, benchmarking these environments is not straightforward.

One approach to benchmarking in a VM-NAS setup is to deploy the entire virtualization infrastructure and then run regular file system benchmarks inside the VMs. In this case, requests submitted by application-level benchmarks will naturally undergo the appropriate changes while passing through the virtualized I/O stack. However, this method requires a cumbersome setup of hypervisors, VMs, and applications. Every change to the test configuration, such as an increase in the number of VMs or a change of a guest OS, requires a significant amount of work. Moreover, the approach limits evaluation to the available test hardware, which may not be sufficient to run hypervisors with the hundreds of VMs that may be required to exercise the limits of the NAS server.

To avoid these limitations and regain the flexibility of standard benchmarks, we have created virtualized benchmarks by extracting the workload characteristics after the requests from the original physical benchmarks have passed through the virtualization and NFS layers. The generated benchmarks can then run directly against the NAS server without having to deploy a complex infrastructure. Therefore, the benchmarking procedure remains the same as before—easy, flexible, and accessible.

One approach to generating virtualized benchmarks would be to emulate the changes applied to each request as it goes down the layers. However, doing so would require a thorough study of the request-handling logic in the guest OSes and hypervisors, with further verification through multi-layer trace collection. Although this approach might be feasible, it is time-consuming, especially because it must be repeated for many different OSes and hypervisors. Therefore, in this paper we chose to study the workload characteristics at a single layer, namely where requests enter the NAS server. We collected traces at this layer and then characterized selected workload properties. The information from a single layer is enough to create the corresponding NAS benchmarks by reproducing the extracted workload features. Workload characterization and the benchmarks that we create are tightly coupled with the configuration of the upper layers: application, guest OS, local file system, and hypervisor. In the future, we plan to perform a sensitivity analysis of I/O stack configurations to deduce the parameters that account for the greatest changes to the I/O workload.

3 NAS Workload Changes

In this section we detail seven categories of NAS workload changes caused by virtualization. Specifically, we compare the two cases where a NAS server is accessed by (1) a physical or (2) a virtualized client, and describe the differences in the I/O workload. These changes are the result of migrating an application from a physical server, which is configured to use an NFS client for direct data access, to a VM that stores data in a disk image file that the hypervisor accesses from an NFS server. Figure 3 demonstrates the difference in the two setups, and Table 2 summarizes the changes we observed in the I/O workload. The changes are listed from the most noticeable and significant to the least. Here, we discuss the changes qualitatively; quantitative observations are presented in Section 4.

#  Workload Property         Physical NAS Clients               Virtual NAS Clients
1  File and directory count  Many files and directories         Single file per VM
   Directory tree depth      Often deeply nested directories    Shallow and uniform
   File size                 Lean towards many small files      Multi-gigabyte sparse disk image files
2  Meta-data operations      Many (72% in SPECsfs2008)          Almost none
3  I/O synchronization       Asynchronous and synchronous       All writes are synchronous
4  In-file randomness        Workload-dependent                 Increased randomness due to guest file system encapsulation
   Cross-file randomness     Workload-dependent                 Cross-file access replaced by in-file access due to disk image files
5  I/O sizes                 Workload-dependent                 Increased or decreased due to guest file system fragmentation and I/O stack limitations
6  Read-modify-write         Infrequent                         More frequent due to block layer in guest file system
7  Think time                Workload-dependent                 Increased because of virtualization overheads

Table 2: Summary of key I/O workload changes between Physical and Virtualized NAS architectures.

First, and unsurprisingly, the number and size of files stored in NAS change from many relatively small files to a few (usually just one) large file(s) per VM—the disk image file(s). For example, the default Filebench file server workload defines 10,000 files with an average size of 128KB, which are spread over 500 directories. However, when Filebench is executed in a VM, there is only one large disk image file. (Disk image files are usually sized to the space requirements of a particular application; in our setup the disk image file size was set to the default 16GB for the Linux VM, and to 50GB for the Windows VM, because the benchmark we used in Windows required at least 50GB.) For the same reason, directory depth decreases and becomes fairly consistent: VMware ESX typically has a flat namespace; each VM has one directory with the disk image files stored inside it. Back-end file systems used in NAS are often optimized for common file sizes and directory depths [2, 30, 31, 34], so this workload change can significantly affect their performance. For example, to improve write performance for small files, one popular technique is to store data in the inode [16], a feature that would be wasted on virtualized clients. Further, disk image files in NAS environments are typically sparse, with large portions of the files unallocated, i.e., the physical file size can be much smaller than its logical size. In fact, VMware's vSphere—the main tool for managing the VMs in VMware-based infrastructures—supports only the creation of sparse disk images over NFS. A major implication of this change is that back-end file systems for NAS can lower their focus on optimizing, for example, file append operations, and instead focus on improving the performance of block allocation within a file.

The second change caused by the move to virtualization is that all file system meta-data operations become data operations. For example, with a physical client there is a one-to-one mapping between file creation and a CREATE over the wire. However, when the application creates a file in a VM, the NAS server receives a series of writes to a corresponding disk image: one to a directory block, one to an inode block, and possibly one or more to data blocks. Similarly, when an application accesses files and traverses the directory tree, physical clients send many LOOKUP procedures to a NAS server. The same application behavior in a VM produces a sequence of READs to the disk image. Current NAS benchmarks generate a high number of meta-data operations (e.g., 72% for SPECsfs2008), and will bias the evaluation of a NAS that serves virtualized clients. While it may appear that removing all meta-data operations implies that application benchmarks can generally be replaced with random I/O benchmarks, such as IOzone [11], this is insufficient. As shown in Section 5, the VM-NAS I/O stack generates a range of I/O sizes, jump distances, and request offsets that cannot be modeled with a simple distribution (uniform or otherwise).

Third, all write requests that come to the NAS server are synchronous. For NFS, this means that the stable attribute is set on each and every write, which is typically not true for physical clients. The block layers of many OSes expect that when the hardware reports a write completion, the data has been saved to persistent storage. Similarly, the NFS protocol's stable attribute specifies that the NFS server cannot reply to a WRITE until the data is persistent. So the hypervisor satisfies the guest OS's expectation by always setting this attribute on WRITE requests. Since many modern NAS servers try to improve performance by gathering write requests into larger chunks in RAM, setting the stable attribute invalidates this important optimization for virtualized clients.

Fourth, in-file randomness increases significantly with virtualized clients. On a physical client, access patterns (whether sequential or random) are distinct on a per-file basis. However, in virtualized clients, both sequential and random operations are blended into a single disk image file. This causes the NAS server to receive what appears to be many more random reads and writes to that file. Furthermore, guest file system fragmentation increases image file randomness.

On the other hand, cross-file randomness decreases, as each disk image file is typically accessed by only a single VM; i.e., it can be easier to predict which files will be accessed next based on their status, and to differentiate them by how actively they are used (running VMs, stopped ones, etc.).

Fifth, the I/O sizes of original requests can both decrease and increase while passing through the virtualization layers. Guest file systems perform reads and writes in units of their block size, often 4KB. So, when reading a file of, say, 6KB size, the NAS server observes two 4KB reads for a total of 8KB, while a physical client would request only 6KB (25% less). Since many modern systems operate with a lot of small files [31], this difference can have a significant impact on bandwidth. Similarly, when reading 2KB of data from two consecutive data blocks in a file (1KB in each block), the NAS server may observe two 4KB reads for a total of 8KB (one for each block), while a physical NAS client may send only a single 2KB request. A NAS server designed for a virtualized environment could optimize its block-allocation and fragmentation-prevention strategies to take advantage of this observation.

Interestingly, I/O sizes can also decrease because guest file systems sometimes split large files into blocks that might not be adjacent. This is especially true for aged file systems with higher fragmentation [37]. Consequently, whereas a physical client might pass an application's 1MB read directly to the NAS, a virtualized client can sometimes submit several smaller reads scattered across the (aged) disk image. An emulated disk controller driver can also reduce the size of an I/O request. For example, we observed that the Linux IDE driver has a maximum I/O size of 128KB, which means that any application requests larger than this value will be split into smaller chunks. Note that such workload changes happen even in a physical machine as requests flow from a file system to a physical disk. However, in a VM-NAS setup, the transformed requests hit not a real disk, but a file on NAS, and as a result the NAS experiences a different workload.

The sixth change is that when an application writes to part of a block, the guest file system must perform a read-modify-write (RMW) to first read in valid data prior to updating and writing it back to the NAS server. Consequently, virtualized clients often cause RMWs to appear on the wire [19], requiring two block-sized round trips for every update. With physical clients, the RMW is generally performed at the NAS server, avoiding the need to first send valid data back to the NAS client.

Seventh, the think time between I/O requests can increase due to varying virtualization overhead. It has been shown that for a single VM and modern hardware, the overhead of virtualization is small [4]. However, as the number of VMs increases, the contention for computational resources grows, which can cause a significant increase in the request inter-arrival times. Longer think times can prevent a NAS device from filling the underlying hardware I/O queues and achieving peak throughput.

In summary, both static and dynamic properties of NAS workloads change when virtualized clients are introduced into the infrastructure. The changes are sufficiently significant that direct comparison of certain workload properties between virtual and physical clients becomes problematic. For example, cross-file randomness has a rather different meaning in the virtual client, where the number of files is usually one per VM. Therefore, in the rest of the paper we focus solely on characterizing workloads from virtualized clients, without trying to compare them directly against the physical client workload. However, where possible, we refer to the original workload properties.

4 VM-NAS Workload Characterization

In this section we describe our experimental setup and then present and characterize a set of four different application-level benchmarks.

4.1 Experimental Configuration

Every layer in the VM-NAS I/O stack can be configured in several ways: different guest OSes can be installed, various virtualization solutions can be used, etc. The way in which the I/O stack is assembled and configured can significantly change the resulting workload. In the current work we did not try to evaluate every possible configuration, but rather selected several representative setups to demonstrate the utility of our techniques. The methodology we have developed is simple and accessible enough to evaluate many other configurations. Table 3 presents the key configuration options and parameters we used in our experiments. Since our final goal is to create NAS benchmarks, we only care about the settings of the layers above the NAS server; we treat the NAS itself as a black box.

Parameter             RHEL 6.2            Win 2008 R2 SP1
No. of CPUs           1                   1
Memory                1GB                 2GB
Host Controller       Paravirtual         LSI Logic Parallel
Disk Drive Size       16GB                50GB
Disk Image Format     Thick flat VMDK     Thick flat VMDK
Guest File System     Ext3                NTFS
Guest I/O Scheduler   CFQ                 n/a

Table 3: Virtual Machine configuration parameters.

We used two physical machines in our experimental setup. The first acted as a NAS server, while the second represented a typical virtualized client (see Figure 3). The hypervisor was installed on a PowerEdge R710 node with an Intel Xeon E5530 2.4GHz 4-core CPU

6 312 11th USENIX Conference on File and Storage Technologies (FAST ’13) USENIX Association and 24GB of RAM. We used local disk drives in this Workload Dataset size Files R/W/M ratio I/O Size machine for the hypervisor installation—VMware ESXi File-server 2.0GB 20,000 1/2/3 WF 5.0.0 build 62386. We used two guest OSes in the vir- Web-server 1.6GB 100,000 10/1/0 WF tual setup: Red Hat Enterprise Linux 6.2 (RHEL 6.2) DB-server 2.0GB 10 10/1/0 2KB and Windows 2008 R2 SP1. We stored the OS’s VM Mail-server 24.0GB 120 1/2/0 32KB disk images on the local, directly attached disk drives. Table 4: High-level workload characterization for our bench- We conducted our experiments with a separate virtual marks. R/W/M is the Read/Write/Modify ratio. WF (Whole- disk in every VM, with the corresponding disk images File) means the workload only reads or writes complete files. being stored on the NAS. We pre-allocated all of the The mail-server workload is based on JetStress, for which disk images (thick provisioning) to avoid performance R/W/M ratios and I/O sizes were estimated based on [24]. anomalies across runs related to thin provisioning (e.g., project are available from https://avatar.fsl.cs.sunysb.edu/ delayed block allocations). The RHEL 6.2 distribution groups/t2mpublic/. comes with a paravirtualized driver for VMware’s emu- Although SPECsfs is a widely used NAS bench- lated controller, so we used this controller for the Linux mark [38], we could not use it in our evaluation because VM. We left the default format and mount options for it incorporates its own NFS client, which makes it im- guest file systems unchanged. possible to run against a regular POSIX interface. We The machine designated as the NAS server was a Dell hope that the workload analysis and proposed bench- PowerEdge 1800 with six 250GB Maxtor 7L250S0 disk marks presented in this paper can be used by SPEC for drives connected through a Dell CERC SATA 1.5/6ch designing future SPECsfs synthetic workloads. controller, intended to be used as a storage server in VMware’s VMmark is a benchmark often associated enterprise environments. It is equipped with an Intel with testing VMs [45]. However, this benchmark is de- Xeon 2.80GHz Irwindale single-core CPU and 512MB signed to evaluate the performance of a hypervisor ma- of memory. The NAS server consisted of both the Linux chine, not the underlying storage system. For example, NFS server and IBM’s General Parallel File System VMmark is sensitive to how fast a hypervisor’s CPU is (GPFS) version 3.5 [35]. GPFS is a scalable clustered and how well it supports virtualization features (such file system that enables a scale-out, highly-available as AMD-V and Intel VT [1, 22]). However, these de- NAS solution and is used in both virtual and non- tails of hypervisor configuration should not have a large virtual environments. Our workload characterization effect on NAS benchmark results. Although VMmark and benchmark synthesis techniques treat NAS servers also indirectly benchmarks the I/O subsystem, it is hard as a black box and are valid regardless of its underly- to distinguish how much the I/O component contributes ing hardware and software. Since our ultimate goal is to the overall system performance. Moreover, VMmark to create benchmarks capable of stressing any NAS, we requires the installation of several hypervisors and ad- did not characterize NAS-specific characteristics such as ditional software (e.g., Microsoft Exchange) to generate request latencies. 
Our benchmarks, however, let us man- the load. Our goal is complementary: to design a real- ually configure the think time. By decreasing think time istic benchmark for the NAS that serves as the backend (along with increasing the number of VMs), a user can storage for a hypervisor like VMware. scale the load to the processing power of a NAS to accu- Our goal in this project was to transform some of the rately measure its peak performance. already existing benchmarks to their virtualized counter- parts. As such, we did not replay any real-world traces in 4.2 Application-Level Benchmarks the VMs. Both Filebench and JetStress generate work- In the Linux VM we used Filebench [15] to generate loads whose statistical characteristics remain the same file system workloads. Filebench can emulate the I/O over time (i.e., stationary workloads). Consequently, patterns of several enterprise applications; we used the new virtualized benchmarks also exhibit this property. File-, Web-, and Database-server workloads. We scaled up the datasets of these workloads so that they were 4.3 Characterization larger than the amount of RAM in the VM (see Table 4). We executed all benchmarks for 10 minutes (excluding Because Filebench does not support Windows, in our the preparation phase) and collected NFS traces at the Windows VM we used JetStress 2010 [23], a disk- NAS server. We repeated every run 3 times and verified subsystem benchmark that generates a Microsoft Ex- the consistency of the results. The traces were collected change Mail-server workload. It emulates accesses to using the GPFS mmtrace facility [21] and then converted the Exchange database by a specific number of users, to the DataSeries format [5] for efficient analysis. with a corresponding number of log file updates. Com- We developed a set of tools for extracting various plete workload configurations (physical and virtualized), workload characteristics. There is always a nearly infi- along with all the software we developed as part of this nite number of characteristics that can be extracted from

7 USENIX Association 11th USENIX Conference on File and Storage Technologies (FAST ’13) 313 100 read fault maximum NFS read and write size. All requests 80 write 60 smaller than 4KB correspond to 0 on the bar graphs. 40 There are few writes smaller than 4KB for the File-

% of requests 20 server and Web-server workloads, but for the Database- 0 File-server Web-server Database-server Mail-server (Win) and Mail-server (JetStress) workloads the corresponding Workload percentages are 80% and 40%, respectively. Such small Figure 4: Read/Write ratios for different workloads writes are typical for databases (Microsoft Exchange a trace, but a NAS benchmark needs to reproduce only emulated by JetStress also uses a database) for two rea- those that significantly impact the performance of NAS sons. First, the Database-server workload writes 2KB at servers. Since there is no complete list of workload char- a time using direct I/O. In this case, the OS page cache acteristics that impact NAS, in the future we plan to con- is bypassed during write handling, and consequently duct a systematic study of NASes to create such a list. the I/O size is not increased to 4KB (the page size) For this paper, we selected characteristics that clearly when it reaches the block layer. The block layer cannot affect most NASes: (1) read/write ratio; (2) I/O size; then merge requests, due to their randomness. Second, (3) jump (seek) distance; and (4) offset popularity. databases often perform operations synchronously by As we mentioned earlier, the workloads produced using the fsync and sync calls. This causes the guest by VMs contain no meta-data operations. Thus, we file system to atomically update its meta-data, which can only characterize the ratio of data operations—READs only be achieved by writing a single sector (512B) to the to WRITEs. The jump distance of a request is defined virtual disk drive (and hence over NFS). as the difference in offsets (block addresses) between it For the File-server and Web-server workloads, most and the immediately preceding request (accounting for of the writes happen in 4KB and 64KB I/O sizes. The I/O size as well). We do not take the operation type into 4KB read size is dominant in all workloads because this account when calculating the jump distance. The off- is the guest file system block size. However, many set popularity is a histogram of the number of accesses of the File-server’s reads were merged into larger re- to each block within the disk image file; we report this quests by the I/O scheduler and then later split into 64KB as the number of blocks that were accessed once, twice, sizes by the NFS client. This happens because the av- etc. We present the offset popularity and I/O size dis- erage file size for the File-server is 128KB, so whole- tributions on a per-operation basis. Figure 4 depicts the file reads can be merged. For the Web-server work- read/write ratios and Figures 5–8 present I/O size, jump load, the average file size is only 16KB, so there are no distance, and offset popularity distributions for all work- 64KB reads at all. For the same reason, the Web-server loads. For jump distance we show a CDF because it is workload exhibits many reads around 16KB (some files the clearest way to present this parameter. are slightly smaller, others are slightly larger, in accor- Read/Write ratio. Read/write ratios vary significantly dance with Filebench’s gamma distribution [47]). In- across the analyzed workloads. The File-server work- terestingly, for the Mail-server workload, many requests load generates approximately the same number of reads have non-common I/O sizes. (We define an I/O size as and writes, although the original workload had twice non-common if fewer than 1% of such requests have as many writes (Table 4). 
We attribute this differ- such I/O size.) We grouped all non-common I/O sizes ence to the high number of meta-data operations (e.g., in the bucket called “Rest” in the histogram. This illus- LOOKUPs and STATs) that were translated to reads by trates that approximately 15% of all requests have non- the I/O stack. The Web-server and the Database-server common I/O sizes for the Mail-server workload. are read-intensive workloads, which is true for both orig- Jump distance. The CDF jump distance distribution inal and virtualized workloads. The corresponding orig- graphs show that many workloads demonstrate a signif- inal workloads do not contain many meta-data opera- icant level of sequentiality, which is especially true for tions, and therefore the read/write ratio remained un- the File-server workload: more than 60% of requests are changed (unlike the File-server workload). The Mail- sequential. Another 30% of the requests in the File- server workload, on the other hand, is write-intensive: server workload represent comparatively short jumps: about 70% of all operations are writes, which is close less than 2GB, the size of the dataset for this work- to the original benchmark where two thirds of all opera- load; these are jumps between different files in the active tions are writes. As with the Web-server and Database- dataset. The remaining 10% of the jumps come from server workloads, the lack of meta-data operations kept meta-data updates and queries, and are spread across the the read/write ratio unchanged, entire disk. The Web-server workload exhibits similar I/O size distribution. The I/O sizes for all workloads behavior except that the active dataset is larger—about vary from 512B to 64KB; the latter limit is imposed by 5–10GB. The cause of this is a larger number of files in the RHEL 6.2 NFS server, which sets 64KB as the de- the workload (compared to File-server) and the alloca-

Figure 5: Characteristics of a virtualized File-server workload: (a) I/O size, (b) jump distance, (c) offset popularity.
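The following is a minimal sketch of how the per-workload characteristics plotted in Figures 5–8 (read/write ratio, I/O-size histograms, jump distances as defined in Section 4.3, and offset popularity) can be derived from a trace of data operations. The trace format—tuples of (operation, offset, size)—is an assumption for illustration; this is not the authors' DataSeries-based tooling.

```python
from collections import Counter

def characterize(trace):
    """trace: iterable of (op, offset, size) with op in {'read', 'write'}.
    Returns read/write counts, per-op I/O-size histograms, jump distances,
    and an offset-popularity histogram."""
    rw = Counter()
    io_sizes = {"read": Counter(), "write": Counter()}
    offsets = Counter()
    jumps = []
    prev_end = None
    for op, offset, size in trace:
        rw[op] += 1
        io_sizes[op][size] += 1
        offsets[offset] += 1
        if prev_end is not None:
            # Jump distance: difference between this request's offset and the
            # end of the previous request, regardless of operation type.
            jumps.append(offset - prev_end)
        prev_end = offset + size
    return rw, io_sizes, jumps, offsets

trace = [("read", 0, 4096), ("read", 4096, 4096), ("write", 1 << 20, 65536)]
print(characterize(trace))
```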

Figure 6: Characteristics of a virtualized Web-server workload: (a) I/O size, (b) jump distance, (c) offset popularity.

Figure 7: Characteristics of a virtualized Database-server workload: (a) I/O size, (b) jump distance, (c) offset popularity.

100 100 100 read 80 read 80 write 80 write 60 60 60 40 40 40 20 20 20 % of offsets % of requests 0 % of requests 0 0 0 32 64 Rest -40 -20 0 20 40 123456789101112131415R I/O Size (KB) Jump distance (GB) Number of requests (a) I/O Size (b) Jump distance (c) Offset popularity Figure 8: Characteristics of a virtualized Mail-server workload. tion policy of Ext3 that tries to spread many files across structures due to frequent file system synchronization. different block groups. The Mail-server workload demonstrates a high number For the Database-server workload there are almost no of overwrites (about 50%). These overwrites are caused sequential accesses. Over 60% of the jumps are within by Microsoft Exchange overwriting the log file multiple 2GB because that is the dataset size. Interestingly, about times. With Mail-server, “R” on the X axes designates 40% of the requests have fairly long jumps that are the “Rest” of the values, because there were too many caused by frequent file system synchronization, which to list. We therefore grouped all of the values that con- leads to meta-data updates at the beginning of the disk. tributed less than 1% into the R bucket. In the Mail-server workload approximately 40% of the requests are sequential, and the rest are spread across 5 New NAS Benchmarks the 50GB disk image file. A slight bend around 24GB This section describes our methodology for the creation corresponds to the active dataset size. Also, note that the of new NAS benchmarks for virtualized environments Mail-server workload uses the NTFS file system, which and then evaluates their accuracy. uses a different allocation policy than Ext3; this explains the difference in the shape of the Mail-server curve from 5.1 Trace-to-Model Conversion other workloads. Our NAS benchmarks generate workloads with charac- Offset popularity. In all workloads, most of the off- teristics that closely follow the statistical distributions sets were accessed only once. The absolute numbers on presented in Section 4.3. We decided not to write a new these graphs depend on the run time, e.g., when one runs benchmarking tool, but rather exploit Filebench’s abil- a benchmark longer, then the chance of accessing the ity to express I/O workloads with its Workload Model- same offset increases. However, the shape of the curve ing Language (WML) [48], which allows one to flex- remains the same as time progresses (although it shifts to ibly define processes and the I/O operations they per- the right). For the Database workload, 40% of all blocks form. Filebench interprets WML and translates its in- were updated several thousand times. We attribute this structions to corresponding POSIX system calls. Our to the repeated updates of the same file system meta-data use of Filebench will facilitate the adoption of our new

9 USENIX Association 11th USENIX Conference on File and Storage Technologies (FAST ’13) 315 virtualized benchmarks: existing Filebench users can the Linux page cache. These settings also ensure that easily run new WML configurations. (1) no additional read requests are performed to the NFS We extended the WML language to support two vir- server (readahead); (2) that all write requests are imme- tualization terms: hypervisor and vm (virtual machine). diately sent to the NFS server without modification; and We call the extended version WML-V (by analogy with (3) that replies are returned only after the data is on disk. AMD-V). WML-V is backwards compatible with the This behavior was validated with extensive testing. This original WML, so users can merge virtualized and non- approach works well in this scenario because we do not virtualized configurations to simultaneously emulate the need to generate meta-data procedures on the wire; that workloads generated by both physical and virtual clients. would be difficult to achieve using this method because For each analyzed workload—File-server, Web- a 1:1 mapping of meta-data operations does not exist be- server, Database-server and Mail-server—we created a tween system calls and NFS procedures. corresponding WML-V configuration file. By modify- Our enhanced Filebench reports aggregate operations ing these files, a user can adjust the workloads to reflect per second for all VMs and individually for each VM. a desired benchmarking scenario, e.g., defining the num- Operations in the case of virtualized benchmarks are ber of VMs and the workloads they run. different from the original non-virtualized equivalent: Listing 1 presents an abridged example of a WML-V our benchmarks report the number of reads and writes configuration file that defines a single hypervisor, which per second; application-level benchmarks, however, re- runs 5 Database VMs and 2 Web-server VMs. Flowops port application-level operations (e.g., the number of are Filebench’s defined I/O operations, which are HTTP requests serviced by a Web-server). Neverthe- mapped to POSIX calls, such as open, create, read, less, the numbers reported by our benchmarks can be di- write, and delete. In the VM case, we only use read rectly used to compare the performance of different NAS and write flowops, since meta-data operations do not ap- servers under a configured workload. pear in the virtualized workloads. For every defined VM, None of our original benchmarks, except the database Filebench will pre-allocate a disk image file of a user- workload, emulated think time, because our test was de- defined size—16GB in the example listing. signed as an I/O benchmark. For the database bench- 1 HYPERVISOR name="physical-host1" { mark we defined think time as originally defined in 2 VM name="dbserver-vm",dsize=16gb,instances=5 { 3 flowop1, ... Filebench—200,000 loop iterations. Think time in all 4 } workloads can be adjusted by trivial changes to the 5 VM name="websever-vm",dsize=16gb,instances=2 { 6 flowop1, ... workload description. 7 } 8 } 5.2 Evaluation Listing 1: An abridged WML-V workload description that de- To evaluate the accuracy of our benchmarks we observed fines 7 VMs: 5 run database workloads and 2 generate Web- how the NAS server responds to the virtualized bench- server workloads. marks as compared to the original benchmarks when Filebench allows one to define random variables with executed in a VM. 
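Listing 1 defines hypervisors and VMs using WML-V's HYPERVISOR and VM blocks. For experiments with many different VM mixes it can be convenient to generate such skeletons programmatically. The short sketch below does so using only the constructs that appear in the abridged listing (block names, dsize, instances); the flowop lines are left as placeholders exactly as in Listing 1. This is a convenience sketch under those assumptions, not part of the authors' toolchain.

```python
# Emit a WML-V skeleton for a given VM mix, following the structure of
# Listing 1. Flowops are placeholders; a real workload description would
# define them per VM.
def wmlv_config(host, vm_mix, dsize="16gb"):
    """vm_mix: list of (vm_name, instance_count) tuples."""
    lines = ['HYPERVISOR name="%s" {' % host]
    for name, count in vm_mix:
        lines.append('  VM name="%s",dsize=%s,instances=%d {' % (name, dsize, count))
        lines.append('    flowop1, ...')
        lines.append('  }')
    lines.append('}')
    return "\n".join(lines)

# Mirroring the 7-VM example from Listing 1: 5 database VMs, 2 Web-server VMs.
print(wmlv_config("physical-host1", [("dbserver-vm", 5), ("webserver-vm", 2)]))
```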
We monitored 11 parameters that desired empirical distributions; various flowop attributes represent the response of a NAS and are easy to ex- can then be assigned to these random variables. We used tract through the Linux /proc interface: (1) Reads/sec- this ability to define read and write I/O-size distributions ond from the underlying block device; (2) Writes/sec- and jump distances. We achieved the required read/write ond; (3) Request latency; (4) I/O utilization; (5) I/O ratios by putting an appropriate number of read and write queue length; (6) Request size; (7) CPU utilization; flowops within the VM definition. The generation of (8) Memory usage; (9) Interrupt count; (10) Context- a workload with user-defined jump distances and offset switch count; and (11) Number of processes in the wait popularity distributions is a complex problem [28] that state. We call these NAS response parameters. Filebench does not solve; in this work, we do not attempt We sampled the response parameters every 30 sec- to emulate this parameter. However, as we show in the onds during a 10-minute run and calculated the relative following section, this does not significantly affect the difference between each pair of parameters. Figure 9 accuracy of our benchmarks. presents maximum and Root Mean Square (RMS) dif- Ideally, we would like Filebench to translate flowops ference we observed for four workloads. In these exper- directly to NFS procedures. However, this would re- iments a single VM with an appropriate workload was quire us to implement an NFS client within Filebench used. The maximum relative error of our benchmarks is (which is an ongoing effort within the Filebench com- always less than 10%, and the RMS distance is within munity). To work around this limitation, we mount NFS 7% across all parameters. Certain response parameters with the sync flag and open the disk image files with show especially high accuracy; for example, the RMS the O DIRECT flag, ensuring that I/O requests bypass distance for request size is within 4%. Here, the accu-





Figure 9: Root Mean Square (RMS) and maximum relative distances of response parameters for all workloads: (a) File-server, (b) Web-server, (c) Database-server, (d) Mail-server (Windows, JetStress).
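The RMS and maximum relative distances shown in Figure 9 can be computed from the periodically sampled response parameters roughly as sketched below. This is a reconstruction of the arithmetic described in Section 5.2 (samples every 30 seconds, relative difference per sample pair), not the authors' analysis scripts.

```python
import math

def relative_errors(reference, emulated):
    """reference, emulated: equal-length sample lists of one response
    parameter (e.g., reads/sec every 30 seconds) for the original in-VM
    benchmark and for the virtualized benchmark."""
    return [abs(e - r) / abs(r) for r, e in zip(reference, emulated) if r != 0]

def rms_and_max(reference, emulated):
    errs = relative_errors(reference, emulated)
    rms = math.sqrt(sum(e * e for e in errs) / len(errs))
    return 100.0 * rms, 100.0 * max(errs)   # percentages, as plotted in Figure 9

print(rms_and_max([100, 110, 95, 105], [103, 104, 97, 110]))
```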


Error (%) 4 CPU Utilization Error (%) 6 CPU Utilization 3 Memory Memory 4 2 Interrupts Interrupts 1 Context Switches 2 Context Switches 1 2 3 4 5 6 7 8 Waiting Processes 1 2 3 4 5 6 7 8 Waiting Processes Number of VMs Number of VMs (a) RMS distance (b) Maximum Figure 10: Response parameter errors depending on the number of VMs deployed. The first four VMs (1–4) execute four different workloads we analyzed. The next four VMs (5–8) are repeated in the same order. racy is high because our benchmarks directly emulate observed during the whole run was the highest among I/O size distribution. Errors in CPU and memory utiliza- other parameters—in the 10–13% range. tion were less than 5%, because the NAS in our experi- In summary, our benchmarks show a high accuracy ments did not perform many CPU-intensive tasks. for both single- and multi-VM experiments, even under heavy stress. Scalability with Multiple Virtual Machines. The benefit of our benchmarks is that a user can define many 6 Related Work VMs with different workloads and measure NAS perfor- Storage performance in virtualized environments is an mance against this specific workload configuration. To active research area. Le et al. studied the storage per- verify that the accuracy of our benchmarks does not de- formance implications of combining different guest and crease as we emulate more VMs, we conducted a multi- host file systems [29]. Boutcher et al. examined how the VM experiment. We first ran one VM with a File-server selection of guest OS and host I/O schedulers impacts in it, then added a second VM with a Web-server work- the performance of a virtual machine [10]. Both of these load, then a third VM executing the Database-server works focused on the performance aspects of the prob- workload, and finally a fourth VM running JetStress. Af- lem, not workload characterization or generation; also, ter that we added another four VMs with the same four the authors used direct-attached storage, which is sim- workloads in the same order. In total we had 8 different pler but less common in modern enterprise data centers. configurations ranging from 1 to 8 VMs; this setup was Hildebrand et al. discussed the implications of us- designed to heavily stress the NAS under several, differ- ing the VM-NAS architecture with enterprise storage ent, concurrently running workloads. We then emulated servers [19]. That work focused on the performance im- the same 8 configurations using our benchmarks and plications of the VM-NAS I/O stack without thoroughly again monitored the response parameters. Figures 10(a) investigating the changes to the I/O workload. Gulati et and 10(b) depict RMS and maximum relative errors, re- al. characterized the SAN workloads produced by VMs spectively, depending on the number of VMs. for several enterprise applications [17]. Our techniques When a single VM is emulated, our benchmarks can also be used to generate new benchmarks for SAN- show the best accuracy. Beyond one VM, the RMS er- based deployments, but we selected to investigate VM- ror increased by about 3–5%, but still remained within NAS setups first, for two reasons. First, NAS servers are 10%. For four parameters—latency, writes/sec, inter- becoming a more popular solution for hosting VM disk rupts and context switches count—the maximum error images. Second, the degree of workload change in such


6 Related Work

Storage performance in virtualized environments is an active research area. Le et al. studied the storage performance implications of combining different guest and host file systems [29]. Boutcher et al. examined how the selection of guest OS and host I/O schedulers impacts the performance of a virtual machine [10]. Both of these works focused on the performance aspects of the problem, not on workload characterization or generation; in addition, both used direct-attached storage, which is simpler but less common in modern enterprise data centers.

Hildebrand et al. discussed the implications of using the VM-NAS architecture with enterprise storage servers [19]. That work focused on the performance implications of the VM-NAS I/O stack without thoroughly investigating the changes to the I/O workload. Gulati et al. characterized the SAN workloads produced by VMs for several enterprise applications [17]. Our techniques can also be used to generate new benchmarks for SAN-based deployments, but we chose to investigate VM-NAS setups first, for two reasons. First, NAS servers are becoming a more popular solution for hosting VM disk images. Second, the degree of workload change in such deployments is higher: NAS servers use more complex network file-system protocols, whereas SANs and DAS use a simpler block-based protocol.

Ahmad et al. studied performance overheads caused by I/O stack virtualization in ESX with a SAN [4]. That study did not focus on workload characterization but rather tried to validate that modern VMs introduce low overhead compared to physical nodes. Later, the same authors proposed a low-overhead method for on-line workload characterization in ESX [3]. However, their tool characterizes traces collected at the virtual SCSI layer and consequently does not account for any transformations that may occur in ESX and its NFS client. In contrast, we collect the trace at the NAS layer, after all request transformations, allowing us to create more accurate benchmarks.

Casale et al. proposed a model for predicting storage performance when multiple VMs use shared storage [12, 26]. Practical benchmarks like ours are complementary to that work and allow one to verify such predictions in real life. Ben-Yehuda et al. analyzed performance bottlenecks when several VMs are used to provide different functionalities on a storage controller [7]; the authors focused on lowering network overhead via intelligent polling and other techniques.

Trace-driven performance evaluation and workload characterization have been the basis of many studies [14, 25, 27, 33]. Our trace-characterization and benchmark-synthesis techniques are based on multi-dimensional workload analysis. Chen et al. used multi-dimensional trace analysis to infer the behavior of enterprise storage systems [13]. Tarasov et al. proposed a technique for automated translation of block-I/O traces to workload models [40]. Yadwadkar et al. proposed to discover applications based on multi-dimensional characteristics of NFS traces [49].

In summary, to the best of our knowledge, there have been no earlier studies that systematically analyzed virtualized NAS workloads. Moreover, we are the first to present new NAS benchmarks that accurately generate virtualized I/O workloads.

7 Conclusions and Future Work

We have studied the transformation of existing NAS I/O workloads due to server virtualization. Although such transformations were known to occur, they have not been studied in depth to date. Our analysis revealed several significant I/O workload changes due to the use of disk images and the placement of the guest block layer above the NAS file client. We observed and quantified significant changes such as the disappearance of file system meta-data operations at the NAS layer, changes in I/O sizes, changes in file counts and directory depths, asynchrony changes, increased randomness within files, and more.

Based on these observations from real-world workloads, we developed new benchmarks that accurately represent NAS workloads in virtualized data centers—and yet these benchmarks can be run directly against the NAS without requiring a complex virtualization environment configured with VMs and applications. Our new virtualized benchmarks represent four workloads, two guest operating systems, and up to eight virtual machines. Our evaluation reveals that the relative error of these new benchmarks across more than 11 parameters is less than 10% on average. In addition to providing a directly usable measurement tool, we hope that our work will provide guidance to future NAS standards, such as SPEC, in devising benchmarks that are better suited to virtualized environments.

Future work. We plan to extend the number of generated benchmarks by analyzing actual applications and application traces, including typical VM operations such as booting, updating, and snapshotting—and examine root and I/O swap partition access patterns. We also expect to explore more VM configuration options, such as additional guest file systems (and their age), hypervisors, and NAS protocols. Once a larger body of virtual NAS benchmarks exists, we will be able to study the I/O workload's sensitivity to each configuration parameter, as well as investigate the impact of extracting and reproducing additional trace characteristics in the generated benchmarks. To avoid manual analysis and transformation of large numbers of applications, we plan to investigate the feasibility of automatically transforming physical workloads to virtual workloads via a multi-level trace analysis of the VM-NAS I/O stack.

Acknowledgements. We thank the anonymous USENIX reviewers for their valuable feedback. This work was supported in part by NSF awards CCF-0937833 and CCF-0937854, and an IBM PhD Fellowship.

References

[1] Advanced Micro Devices, Inc. Industry leading virtualization platform efficiency, 2008. www.amd.com/virtualization.
[2] N. Agrawal, W. J. Bolosky, J. R. Douceur, and J. R. Lorch. A five-year study of file-system metadata. In Proceedings of the Fifth USENIX Conference on File and Storage Technologies (FAST '07), 2007.
[3] I. Ahmad. Easy and efficient disk I/O workload characterization in VMware ESX server. In Proceedings of IEEE International Symposium on Workload Characterization (IISWC), 2007.
[4] I. Ahmad, J. M. Anderson, A. M. Holler, R. Kambo, and V. Makhija. An analysis of disk performance in VMware ESX server virtual machines. In Proceedings of IEEE International Symposium on Workload Characterization (IISWC), 2003.


[5] E. Anderson, M. Arlitt, C. Morrey, and A. Veitch. DataSeries: an efficient, flexible data format for structured serial data. ACM SIGOPS Operating Systems Review, 43(1), January 2009.
[6] Atlantis Computing. www.atlantiscomputing.com/.
[7] M. Ben-Yehuda, M. Factor, E. Rom, A. Traeger, E. Borovik, and B. Yassour. Adding advanced storage controller functionality via low-overhead virtualization. In Proceedings of the Tenth USENIX Conference on File and Storage Technologies (FAST '12), 2012.
[8] Thomas Bittman. Virtual machines and market share through 2012. Gartner, October 2009. ID Number: G00170437.
[9] Thomas Bittman. Q&A: six misconceptions about server virtualization. Gartner, July 2010. ID Number: G00201551.
[10] D. Boutcher and A. Chandra. Does virtualization make disk scheduling passé? In Proceedings of the 1st Workshop on Hot Topics in Storage (HotStorage '09), 2009.
[11] D. Capps. IOzone file system benchmark. www.iozone.org.
[12] G. Casale, S. Kraft, and D. Krishnamurthy. A model of storage I/O performance interference in virtualized systems. In Proceedings of the International Workshop on Data Center Performance (DCPerf), 2011.
[13] Y. Chen, K. Srinivasan, G. Goodson, and R. Katz. Design implications for enterprise storage systems via multi-dimensional trace analysis. In Proceedings of the 23rd ACM Symposium on Operating Systems Principles (SOSP '11), 2011.
[14] D. Ellard, J. Ledlie, P. Malkani, and M. Seltzer. Everything you always wanted to know about NFS trace analysis, but were afraid to ask. Technical Report TR-06-02, Harvard University, Cambridge, MA, June 2002.
[15] Filebench. http://filebench.sourceforge.net.
[16] G. R. Ganger and M. F. Kaashoek. Embedded inodes and explicit grouping: exploiting disk bandwidth for small files. In Proceedings of the Annual USENIX Technical Conference, 1997.
[17] A. Gulati, C. Kumar, and I. Ahmad. Storage workload characterization and consolidation in virtualized environments. In Proceedings of the 2nd International Workshop on Virtualization Performance: Analysis, Characterization, and Tools (VPACT), 2009.
[18] A. Gulati, G. Shanmuganathan, X. Zhang, and P. Varman. Demand based hierarchical QoS using storage resource pools. In Proceedings of the Annual USENIX Technical Conference, 2012.
[19] D. Hildebrand, A. Povzner, R. Tewari, and V. Tarasov. Revisiting the storage stack in virtualized NAS environments. In Proceedings of the Workshop on I/O Virtualization (WIOV '11), 2011.
[20] D. Hitz, J. Lau, and M. Malcolm. File system design for an NFS file server appliance. In Proceedings of the USENIX Winter Technical Conference, 1994.
[21] IBM. General Parallel File System problem determination guide. Technical Report GA22-7969-02, IBM, December 2004. http://pic.dhe.ibm.com/infocenter/db2luw/v9r8/index.jsp?topic=%2Fcom.ibm.db2.luw.sd.doc%2Fdoc%2Ft0056934.html.
[22] Intel Corporation. Intel virtualization technology (Intel VT), 2008. www.intel.com/technology/virtualization/.
[23] Microsoft Exchange Server JetStress 2010. www.microsoft.com/en-us/download/details.aspx?id=4167.
[24] Understanding database and log performance factors. http://technet.microsoft.com/en-us/library/ee832791.aspx.
[25] S. Kavalanekar, B. Worthington, Q. Zhang, and V. Sharda. Characterization of storage workload traces from production Windows servers. In Proceedings of IEEE International Symposium on Workload Characterization (IISWC), 2008.
[26] S. Kraft, G. Casale, D. Krishnamurthy, D. Greer, and P. Kilpatrick. Performance models of storage contention in cloud environments. Software and Systems Modeling, 12(2), March 2012.
[27] G. H. Kuenning, G. J. Popek, and P. Reiher. An analysis of trace data for predictive file caching in mobile computing. In Proceedings of the Summer 1994 USENIX Conference, 1994.
[28] Z. Kurmas, J. Zito, L. Trevino, and R. Lush. Generating a jump distance based synthetic disk access pattern. In Proceedings of the International IEEE Symposium on Mass Storage Systems and Technologies (MSST), 2006.
[29] D. Le, H. Huang, and H. Wang. Understanding performance implications of nested file systems in a virtualized environment. In Proceedings of the Tenth USENIX Conference on File and Storage Technologies (FAST '12), 2012.


[30] A. W. Leung, S. Pasupathy, G. Goodson, and E. L. Miller. Measurement and analysis of large-scale network file system workloads. In Proceedings of the USENIX Annual Technical Conference (ATC '08), 2008.
[31] D. Meyer and W. Bolosky. A study of practical deduplication. In Proceedings of the Ninth USENIX Conference on File and Storage Technologies (FAST '11), 2011.
[32] OpenStack Foundation. www.openstack.org/.
[33] J. Ousterhout, H. Costa, D. Harrison, J. Kunze, M. Kupfer, and J. Thompson. A trace-driven analysis of the UNIX 4.2 BSD file system. In Proceedings of the Tenth ACM Symposium on Operating Systems Principles (SOSP), 1985.
[34] D. Roselli, J. R. Lorch, and T. E. Anderson. A comparison of file system workloads. In Proceedings of the Annual USENIX Technical Conference, 2000.
[35] F. Schmuck and R. Haskin. GPFS: A shared-disk file system for large computing clusters. In Proceedings of the First USENIX Conference on File and Storage Technologies (FAST '02), 2002.
[36] Simplivity. www.simplivity.com/.
[37] K. A. Smith and M. I. Seltzer. File system aging—increasing the relevance of file system benchmarks. In Proceedings of the 1997 International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS 1997), 1997.
[38] SPEC. SPECsfs2008. www.spec.org/sfs2008, July 2008.
[39] J. Sugerman, G. Venkitachalam, and B. Lim. Virtualizing I/O devices on VMware Workstation's hosted virtual machine monitor. In Proceedings of the Annual USENIX Technical Conference, 2001.
[40] V. Tarasov, K. S. Kumar, J. Ma, D. Hildebrand, A. Povzner, G. Kuenning, and E. Zadok. Extracting flexible, replayable models from large block traces. In Proceedings of the Tenth USENIX Conference on File and Storage Technologies (FAST '12), 2012.
[41] Tintri. www.tintri.com/.
[42] VMware vCloud. http://vcloud.vmware.com/.
[43] Richard Villars and Noemi Greyzdorf. Worldwide file-based storage 2010–2014 forecast update. IDC, December 2010. IDC #226267.
[44] VirtualBox. https://www.virtualbox.org/.
[45] VMMark. www.vmware.com/go/vmmark.
[46] VMware, Inc. VMware Virtual Machine File System: Technical Overview and Best Practices, 2007. www.vmware.com/pdf/vmfs-best-practices-wp.pdf.
[47] A. W. Wilson. Operation and implementation of random variables in Filebench.
[48] Filebench Workload Model Language (WML). http://sourceforge.net/apps/mediawiki/filebench/index.php?title=Filebench_Workload_Language.
[49] N. Yadwadkar, C. Bhattacharyya, and K. Gopinath. Discovery of application workloads from network file traces. In Proceedings of the Eighth USENIX Conference on File and Storage Technologies (FAST '10), 2010.
[50] Natalya Yezhkova, Liz Conner, Richard L. Villars, and Benjamin Woo. Worldwide enterprise storage systems 2010–2014 forecast: recovery, efficiency, and digitization shaping customer requirements for storage systems. IDC, May 2010. IDC #223234.
[51] Y. Yu, D. Shin, H. Eom, and H. Yeom. NCQ vs. I/O scheduler: preventing unexpected misbehaviors. ACM Transactions on Storage, 6(1), March 2010.

