
Emulating Goliath Storage Systems with David


Nitin Agrawal†, Leo Arulraj∗, Andrea C. Arpaci-Dusseau∗, Remzi H. Arpaci-Dusseau∗
†NEC Laboratories America    ∗University of Wisconsin–Madison
[email protected]    {arulraj, dusseau, remzi}@cs.wisc.edu

Abstract

Benchmarking file and storage systems on large file-system images is important, but difficult and often infeasible. Typically, running benchmarks on such large disk setups is a frequent source of frustration for file-system evaluators; the scale alone acts as a strong deterrent against using larger albeit realistic benchmarks. To address this problem, we develop David: a system that makes it practical to run large benchmarks using modest amounts of storage or memory capacities readily available on most computers. David creates a "compressed" version of the original file-system image by omitting all file data and laying out metadata more efficiently; an online storage model determines the runtime of the benchmark workload on the original uncompressed image. David works under any file system as demonstrated in this paper with ext3 and btrfs. We find that David reduces storage requirements by orders of magnitude; David is able to emulate a 1 TB target workload using only an 80 GB available disk, while still modeling the actual runtime accurately. David can also emulate newer or faster devices, e.g., we show how David can effectively emulate a multi-disk RAID using a limited amount of memory.

1 Introduction

File and storage systems are currently difficult to benchmark. Ideally, one would like to use a benchmark workload that is a realistic approximation of a known application. One would also like to run it in a configuration representative of real world scenarios, including realistic disk subsystems and file-system images.

In practice, realistic benchmarks and their realistic configurations tend to be much larger and more complex to set up than their trivial counterparts. File system traces (e.g., from HP Labs [17]) are good examples of such workloads, often being large and unwieldy. Developing scalable yet practical benchmarks has long been a challenge for the storage systems community [16]. In particular, benchmarks such as GraySort [1] and SPECmail2009 [22] are compelling yet difficult to set up and use currently, requiring around 100 TB for GraySort and anywhere from 100 GB to 2 TB for SPECmail2009.

Benchmarking on large storage devices is thus a frequent source of frustration for file-system evaluators; the scale acts as a deterrent against using larger albeit realistic benchmarks [24], but running toy workloads on small disks is not sufficient. One obvious solution is to continually upgrade one's storage capacity. However, it is an expensive, and perhaps an infeasible, solution to justify the costs and overheads solely for benchmarking.

Storage emulators such as Memulator [10] prove extremely useful for such scenarios – they let us prototype the "future" by pretending to plug in bigger, faster storage systems and run real workloads against them. Memulator, in fact, makes a strong case for storage emulation as the performance evaluation methodology of choice. But emulators are particularly tough: if they are to be big, they have to use existing storage (and thus are slow); if they are to be fast, they have to be run out of memory (and thus they are small).

The challenge we face is how can we get the best of both worlds? To address this problem, we have developed David, a "scale down" emulator that allows one to run large workloads by scaling down the storage requirements transparently to the workload. David makes it practical to experiment with benchmarks that were otherwise infeasible to run on a given system.

Our observation is that in many cases, the benchmark application does not care about the contents of individual files, but only about the structure and properties of the metadata that is being stored on disk. In particular, for the purposes of benchmarking, many applications do not write or read file contents at all (e.g., fsck); the ones that do, often do not care what the contents are as long as some valid content is made available (e.g., backup software). Since file data constitutes a significant fraction of the total file system size, ranging anywhere from 90 to 99% depending on the actual file-system image [3], avoiding the need to store file data has the potential to significantly reduce the required storage capacity during benchmarking.

The key idea in David is to create a "compressed" version of the original file-system image for the purposes of benchmarking. In the compressed image, unneeded user data blocks are omitted using novel classification techniques to distinguish data from metadata at scale; file system metadata blocks (e.g., inodes, directories and indirect blocks) are stored compactly on the available backing store. The primary benefit of the compressed image is to reduce the storage capacity required to run any given workload. To ensure that applications remain unaware of this interposition, whenever necessary, David synthetically generates file data on the fly; metadata I/O is redirected and accessed appropriately. David works under any file system; we demonstrate this using ext3 [25] and btrfs [26], two file systems very different in design.

Since David alters the original I/O patterns, it needs to model the runtime of the benchmark workload on the original uncompressed image. David uses an in-kernel model of the disk and storage stack to determine the run times of all individual requests as they would have executed on the uncompressed image. The model pays special attention to accurately modeling the I/O request queues; we find that modeling the request queues is crucial for overall accuracy, especially for applications issuing bursty I/O.

One challenge in building David is how to deal with scale as we experiment with larger file systems containing many more files and directories. Figure 1 shows the percentage of storage space occupied by metadata alone as compared to the total size of the file-system image written; the different file-system images for this experiment were generated by varying the file size distribution using Impressions [2]. Using publicly available data on file-system metadata [4], we analyzed how file-size distribution changes with file systems of varying sizes. We found that larger file systems not only had more files, they also had larger files. For this experiment, the parameters of the lognormal distribution controlling the file sizes were changed along the x-axis to generate progressively larger file systems with larger files therein. The relatively small fraction belonging to metadata (roughly 1 to 10%) as shown on the y-axis demonstrates the potential savings in storage capacity made possible if only metadata blocks are stored; David is designed to take advantage of this observation.

Figure 1: Capacity Savings. Shows the savings in storage capacity if only metadata is stored, with varying file-size distribution modeled by (µ, σ) parameters of a lognormal distribution, (7.53, 2.48) and (8.33, 3.18) for the two extremes.

We believe David will be useful to file and storage developers, application developers, and users looking to benchmark these systems at scale. Developers often like to evaluate a prototype implementation at larger scales to identify performance bottlenecks, fine-tune optimizations, and make design decisions; analyses at scale often reveal interesting and critical insights into the system [16]. David can help obtain approximate performance estimates within limits of its modeling error. For example, how does one measure performance of a file system on a multi-disk multi-TB mirrored RAID configuration without having access to one? An end-user looking to select an application that works best at larger scale may also use David for emulation. For example, which anti-virus application scans a terabyte file system the fastest?

The primary mode of operation of David is the timing-accurate mode in which, after modeling the runtime, an appropriate delay is inserted before returning to the application. A secondary speedup mode is also available in which the storage model returns instantaneously after computing the time taken to run the benchmark on the uncompressed disk; in this mode David offers the potential to reduce application runtime and speed up the benchmark itself. In this paper we discuss and evaluate David in the timing-accurate mode.

David allows one to run benchmark workloads that require file-system images orders of magnitude larger than the available backing store while still reporting the runtime as it would have taken on the original image. We demonstrate that David even enables emulation of faster and multi-disk systems like RAID using a small amount of memory. David can also aid in running large benchmarks on storage devices that are expensive or not even available in the market as it requires only a model of the non-existent storage device; for example, one can use a modified version of David to run benchmarks on a hypothetical 1 TB SSD.

For workloads like PostMark, mkfs, Filebench WebServer, Filebench VarMail, and other microbenchmarks, we find that David delivers on its promise in reducing the required storage size while still accurately predicting the benchmark runtime for both ext3 and btrfs. The storage model within David is fairly accurate in spite of operating in real-time within the kernel, and for most workloads predicts a runtime within 5% of the actual runtime. For example, for the Filebench webserver workload, David provides a 1000-fold reduction in required storage capacity and predicts a runtime within 0.08% of the actual.
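The capacity argument behind Figure 1 can be illustrated with a small back-of-the-envelope script. The sketch below is not the Impressions-based methodology used for the figure: it assumes a hypothetical ext3-like layout (4 KB blocks, 256-byte inodes, 12 direct pointers, 4-byte block pointers), ignores directories and allocation bitmaps, and simply samples file sizes from the two lognormal extremes quoted in the caption to show why metadata is a small fraction of the written image.

```python
import math
import random

BLOCK = 4096                     # bytes per block (assumed ext3-style layout)
INODE = 256                      # bytes per on-disk inode (assumption)
PTRS_PER_BLOCK = BLOCK // 4      # 4-byte block pointers per indirect block
DIRECT_PTRS = 12

def blocks_for_file(size_bytes):
    """Return (data_blocks, metadata_blocks) for one file, crudely estimated."""
    data = math.ceil(size_bytes / BLOCK)
    meta = INODE / BLOCK                        # fractional share of an inode block
    beyond = max(0, data - DIRECT_PTRS)         # blocks reached via indirect pointers
    if beyond:
        singles = math.ceil(beyond / PTRS_PER_BLOCK)
        meta += singles
        if singles > 1:                         # rough double-indirect cost
            meta += math.ceil(singles / PTRS_PER_BLOCK)
    return data, meta

def metadata_fraction(mu, sigma, nfiles=100_000, seed=0):
    rng = random.Random(seed)
    data = meta = 0.0
    for _ in range(nfiles):
        d, m = blocks_for_file(rng.lognormvariate(mu, sigma))
        data += d
        meta += m
    return meta / (meta + data)

for mu, sigma in [(7.53, 2.48), (8.33, 3.18)]:  # extremes quoted in Figure 1
    frac = metadata_fraction(mu, sigma)
    print(f"mu={mu}, sigma={sigma}: metadata ~= {100 * frac:.2f}% of written blocks")
```

Because the sketch omits directories, bitmaps and journal blocks, it under-counts metadata relative to the 1–10% range reported in Figure 1, but the qualitative conclusion — that dropping data blocks removes the vast majority of the required capacity — is the same.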

[Figure 2 (diagram): the benchmark and file system issue requests to the large target device exported by David; David squashes data blocks and remaps metadata blocks onto the much smaller available backing store. Blocks are shown as metadata, data, or unoccupied.]

Figure 2: Metadata Remapping and Data Squashing in David. The figure shows how metadata gets remapped and data blocks are squashed. The disk image above David is the target and the one below it is the available.

2 David Overview

2.1 Design Goals for David

• Scalability: Emulating a large device requires David to maintain additional data structures and mimic several operations; our goal is to ensure that it works well as the underlying storage capacity scales.

• Model accuracy: An important goal is to model a storage device and accurately predict performance. The model should not only characterize the physical characteristics of the drive but also the interactions under different workload patterns.

• Model overhead: Equally important to being accurate is that the model imposes minimal overhead; since the model is inside the OS and runs concurrently with workload execution, it is required to be fairly fast.

• Emulation flexibility: David should be able to emulate different disks, storage subsystems, and multi-disk systems through appropriate use of backing stores.

• Minimal application modification: It should allow applications to run unmodified without knowing the significantly less capacity of the storage system underneath; modifications can be performed in limited cases only to improve ease of use but never as a necessity.

2.2 David Design

David exports a fake storage stack including a fake device of a much higher capacity than available. For the rest of the paper, we use the terms target to denote the hypothetical larger storage device, and available to denote the physically available system on which David is running, as shown in Figure 2. It also shows a schematic of how David makes use of metadata remapping and data squashing to free up a large percentage of the required storage space; a much smaller backing store can now service the requests of the benchmark.

David is implemented as a pseudo-device driver that is situated below the file system and above the backing store, interposing on all I/O requests. Since the driver appears as a regular device, a file system can be created and mounted on it. Being a loadable module, David can be used without any change to the application, file system or the kernel. Figure 3 presents the architecture of David with all the significant components and also shows the different types of requests that are handled within. We now describe the components of David.

First, the Block Classifier is responsible for classifying blocks addressed in a request as data or metadata and preventing I/O requests to data blocks from going to the backing store. David intercepts all writes to data blocks, records the block address if necessary, and discards the actual write using the Data Squasher. I/O requests to metadata blocks are passed on to the Metadata Remapper.

Second, the Metadata Remapper is responsible for laying out metadata blocks more efficiently on the backing store. It intercepts all write requests to metadata blocks, generates a remapping for the set of blocks addressed, and writes out the metadata blocks to the remapped locations. The remapping is stored in the Metadata Remapper to service subsequent reads.

Third, writes to data blocks are not saved, but reads to these blocks could still be issued by the application; in order to allow applications to run transparently, the Data Generator is responsible for generating synthetic content to service subsequent reads to data blocks that were written earlier and discarded. The Data Generator contains a number of built-in schemes to generate different kinds of content and also allows the application to provide hints to generate more tailored content (e.g., binary files).

Finally, by performing the above-mentioned tasks David modifies the original I/O request stream. These modifications in the I/O traffic substantially change the application runtime, rendering it useless for benchmarking. The Storage Model carefully models the (potentially different) target storage subsystem underneath to predict the benchmark runtime on the target system. By doing so in an online fashion with little overhead, the Storage Model makes it feasible to run large workloads in a space- and time-efficient manner. The individual components are discussed in detail in §3 through §6.
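The interplay of these components can be summarized in a few lines of user-space pseudocode. This is only an illustrative sketch under our own naming (CompressedStore, ToyModel and the fixed per-request cost are ours, not David's); the real David is an in-kernel pseudo-device driver operating on bio requests.

```python
from dataclasses import dataclass, field

@dataclass
class Write:
    lba: int            # block address on the (large) target device
    data: bytes
    is_metadata: bool   # decided by the Block Classifier in the real system

@dataclass
class CompressedStore:
    """Backing store that keeps only metadata, at remapped locations."""
    remap: dict = field(default_factory=dict)    # target LBA -> available LBA
    next_free: int = 0
    blocks: dict = field(default_factory=dict)   # available LBA -> contents

    def handle(self, w: Write, model):
        model.account(w.lba)                     # timing is modeled for every request
        if not w.is_metadata:
            return                               # "data squashing": drop the contents
        if w.lba not in self.remap:              # "metadata remapping": pack it densely
            self.remap[w.lba] = self.next_free
            self.next_free += 1
        self.blocks[self.remap[w.lba]] = w.data

class ToyModel:
    def __init__(self):
        self.virtual_time_us = 0.0
    def account(self, lba):
        self.virtual_time_us += 50.0             # placeholder per-request cost

store, model = CompressedStore(), ToyModel()
store.handle(Write(lba=9_000_000_000, data=b"\0" * 4096, is_metadata=True), model)
store.handle(Write(lba=9_000_000_001, data=b"x" * 4096, is_metadata=False), model)
print(len(store.blocks), "block(s) actually stored;",
      f"modeled time so far = {model.virtual_time_us} us")
```

The sketch stores one block out of two written, which is the essence of the capacity reduction; the delay that the real Storage Model inserts before completing each request is what preserves timing accuracy.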

[Figure 3 (diagram): disk requests from the file system are cloned to the Storage Model (target I/O request queue and device model, which compute and insert the target time delay), while the Block Classifier — inode parser, journal snooper, unclassified block store, and implicit/explicit notification mechanisms — routes metadata writes through the Metadata Remapper (remap table and remap bitmap) to the available backing store, squashes data writes with the Data Squasher, and services data reads from the Data Generator.]

Figure 3: David Architecture. Shows the components of David and the flow of requests handled within.

2.3 Choice of Available Backing Store

David is largely agnostic to the choice of the backing store for available storage: HDDs, SSDs, or memory can be used depending on the performance and capacity requirements of the target device being emulated. Through a significant reduction in the number of device I/Os, David compensates for its internal book-keeping overhead and also for small mismatches between the emulated and available device. However, if one wishes to emulate a device much faster than the available device, using memory is a safer option. For example, as shown in §6.3, David successfully emulates a RAID-1 configuration using a limited amount of memory. If the performance mismatch is not significant, a hard disk as backing store provides much greater scale in terms of storage capacity. Throughout the paper, "available storage" refers to the backing store in a generic sense.

3 Block Classification

The primary requirement for David to prevent data writes using the Data Squasher is the ability to classify a block as metadata or data. David provides both implicit and explicit block classification. The implicit approach is more laborious but provides a flexible approach to run unmodified applications and file systems. The explicit notification approach is straightforward and much simpler to implement, albeit at the cost of a small modification in the operating system or the application; both are available in David and can be chosen according to the requirements of the evaluator. The implicit approach is demonstrated using ext3 and the explicit approach using btrfs.

3.1 Implicit Type Detection

For ext2 and ext3, the majority of the blocks are statically assigned for a given file system size and configuration at the time of file system creation; the allocation for these blocks doesn't change during the lifetime of the file system. Blocks that fall in this category include the super block, group descriptors, inode and data bitmaps, inode blocks and blocks belonging to the journal; these blocks are relatively straightforward to classify based on their on-disk location, or their Logical Block Address (LBA). However, not all blocks are statically assigned; dynamically-allocated blocks include directory, indirect (single, double, or triple indirect) and data blocks. Unless all blocks contain some self-identification information, in order to accurately classify a dynamically allocated block, the system needs to track the inode pointing to the particular block to infer its current status.

Implicit classification is based on prior work on Semantically-Smart Disk Systems (SDS) [21]; an SDS employs three techniques to classify blocks: direct and indirect classification, and association. With direct classification, blocks are identified simply by their location on disk. With indirect classification, blocks are identified only with additional information; for example, to identify directory data or indirect blocks, the corresponding inode must also be examined. Finally, with association, a data block and its inode are connected.

There are two significant additional challenges David must address. First, as opposed to SDS, David has to ensure that no metadata blocks are ever misclassified. Second, benchmark scalability introduces additional memory pressure to handle delayed classification. In this paper we only discuss our new contributions (the original SDS paper provides details of the basic block-classification mechanisms).

3.1.1 Unclassified Block Store

To infer when a file or directory is allocated and deallocated, David tracks writes to inode blocks, inode bitmaps and data bitmaps; to enumerate the indirect and directory blocks that belong to a particular file or directory, it uses the contents of the inode. It is often the case that the blocks pointed to by an inode are written out before the corresponding inode block; if a classification attempt is made when a block is being written, an indirect or directory block will be misclassified as an ordinary data block. This transient error is unacceptable for David since it leads to the "metadata" block being discarded prematurely and could cause irreparable damage to the file system. For example, if a directory or indirect block is accidentally discarded, it could lead to file system corruption.
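The deferral strategy David adopts to avoid such misclassification (the Unclassified Block Store, described next) can be sketched in user space as follows. The class and field names here are hypothetical; the real implementation buffers in-kernel bio structures and parses on-disk inodes.

```python
class DeferredClassifier:
    """Toy deferral of ambiguous writes until the owning inode is observed."""

    def __init__(self):
        self.pending = {}          # block LBA -> buffered contents

    def on_block_write(self, lba, data):
        # Cannot yet tell data from indirect/directory blocks: hold the write.
        self.pending[lba] = data

    def on_inode_write(self, inode):
        """inode: dict with 'meta_blocks' and 'data_blocks' LBA lists."""
        kept, dropped = [], []
        for lba in inode["meta_blocks"]:
            if lba in self.pending:
                kept.append((lba, self.pending.pop(lba)))    # remap + persist
        for lba in inode["data_blocks"]:
            if self.pending.pop(lba, None) is not None:
                dropped.append(lba)                          # now safe to squash
        return kept, dropped

c = DeferredClassifier()
for lba in (100, 101, 102):
    c.on_block_write(lba, b"...")
kept, dropped = c.on_inode_write({"meta_blocks": [102], "data_blocks": [100, 101]})
print(f"persisted {len(kept)} metadata block(s), squashed {len(dropped)} data block(s)")
```

The key property is that nothing is discarded until the inode proves a buffered block is data; the price, as the next subsections discuss, is the memory held by blocks awaiting classification.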

4 To rectify this problem, David temporarily buffers in 600000 Without Ext3 Journal Snooping memory writes to all blocks which are as yet unclassi- With Ext3 Journal Snooping 500000 fied, inside the Unclassified Block Store (UBS). These system out of memory Maximum memory limit write requests remain in the UBS until a classification is 400000 made possible upon the write of the correspondinginode. When a correspondinginode does get written, blocksthat 300000

are classified as metadata are passed on to the Metadata 200000 Remapper for remapping; they are then written out to persistent storage at the remapped location. Blocks clas- 100000 Number of 4KB Unclassified Blocks

sified as data are discarded at that time. All entries in the 0 UBS corresponding to that inode are also removed. 0 10 20 30 40 50 60 70 80 90 Time in units of 10 secs The UBS is implementedas a list of block I/O(bio) re- quest structures. An extra reference to the memory pages Figure 4: Memory usage with Journal Snooping. pointed to by these bio structures is held by David as long Figure 4 compares the memory pressure with and they remain in the UBS; this reference ensures that these without Journal Snooping demonstrating its effective- pages are not mistakenly freed until the UBS is able to ness. It shows the number of 4 KB block I/O requests classify and persist them on disk, if needed. In order resident in the UBS sampled at 10 sec intervals during to reduce the inode parsing overhead otherwise imposed the creation of a 24 GB file on ext3; the file system is for each inode write, David maintains a list of recently mounted on top of David in ordered journaling mode written inode blocks that need to be processed and uses with a commit interval of 5 secs. This experiment was a separate kernel thread for parsing. run on a dual core machine with 2 GB memory. Since this workload is data write intensive, without Journal 3.1.2 Journal Snooping Snooping, the system runs out of memory when around Storing unclassified blocks in the UBS can cause a strain 450,000 bio requests are in the UBS (occupying roughly on available memory in certain situations. In particular, 1.8 GB of memory). Journal Snooping ensures that the when ext3 is mounted on top of David in ordered jour- memory consumed by outstanding bio requests does not naling mode, all the data blocks are written to disk at go beyond a maximum of 240 MB. journal-commit time but the metadata blocks are written to disk only at the checkpoint time which occurs much 3.2 Explicit Metadata Notification less frequently. This results in a temporary yet precari- David is meant to be useful for a wide variety of file sys- ous build up of data blocks in the UBS even though they tems; explicit metadata notification provides a mecha- are bound to be squashed as soon as the corresponding nism to rapidly adopt a file system for use with David. inode is written; this situation is especially true when Since data writes can come only from the benchmark ap- large files (e.g., 10s of GB) are written. In order to en- plication in user-space whereas metadata writes are is- sure the overall scalability of David, handling large files sued by the file system, our approach is to identify the and the consequent explosion in memory consumption is data blocks before they are even written to the file sys- critical. To achieve this without any modification to the tem. Our implementation of explicit notification is thus ext3 filesystem, David performs Journal Snooping in the file-system agnostic – it relies on a small modification block device driver. to the page cache to collect additional information. We David snoops on the journal commit traffic for inodes demonstrate the benefits of this approach using btrfs, a and indirect blocks logged within a committed transac- file system quite unlike ext3 in design. tion; this enables block classification even prior to check- When an application writes to a file, David captures point. When a journal-descriptor block is written as part the pointers to the in-memory pages where the data con- of a transaction, David records the blocks that are being tent is stored, as it is being copied into the page cache. 
logged within that particular transaction. In addition, all Subsequently, when the writes reach David, they are journal writes within that transaction are cached in mem- compared against the captured pointer addresses to de- ory until the transaction is committed. After that, the in- cide whether the write is to metadata or data. Once the odes and their corresponding direct and indirect blocks presence is tested, the pointer is removed from the list are processed to allow block classification; the identified since the same page can be reused for metadata writes in data blocks are squashed from the UBS and the iden- the future. tified metadata blocks are remapped and stored persis- There are certainly other ways to implement explicit tently. The challenge in implementing Journal Snooping notification. One way is to capture the checksum of the was to handle the continuousstream of unorderedjournal contents of the in-memory pages instead of the pointer blocks and reconstruct the journal transaction. to track data blocks. One can also modify the file system

5 to explicitly flag the metadata blocks, instead of identi- figure 5 shows some examples of the different types of fying data blocks with the page cache modification. We content that can be generated. believe our approach is easier to implement, does not re- Many systems that read back previously written data quire any file system modification, and is also easier to do not care about the specific content within the files as extend to software RAID since parity blocks are auto- long as there is some content (e.g., a file-system backup matically classified as metadata and not discarded. utility, or the Postmark benchmark). Much in the same way as failure-oblivious computing generates values to 4 Metadata Remapping service reads to invalid memory while ignoring invalid writes [18], David randomly generates content to service Since David exports a target pseudo device of much out-of-bound read requests. higher capacity to the file system than the available stor- Some systems may expect file contents to have valid age device, the bio requests issued to the pseudo device syntax or semantics; the performance of these systems will have addresses in the full target range and thus need depend on the actual content being read (e.g., a desk- to be suitably remapped. For this purpose, David main- top search engine for a file system, or a spell-checker). tains a remap table called Metadata Remapper which For such systems, naive content generation would either maps “target” addresses to “available” addresses. The crash the application or give poor benchmarking results. Metadata Remapper can contain an entry either for one David produces valid file content leveraging prior work metadata block (e.g., super block), or a range of metadata on generating file-system images [2]. blocks (e.g., group descriptors); by allowing an arbitrary Finally, some systems may expect to read back data range of blocks to be remapped together, the Metadata exactly as they wrote earlier (i.e., a read-after-write or Remapper provides an efficient translation service that RAW dependency) or expect a precise structure that can- also provides scalability. Range remapping in addition not be generated arbitrarily (e.g., a binary file or a con- preserves sequentiality of the blocks if a disk is used as figuration file). David provides additional support to run the backing store. In addition to the Metadata Remapper, these demanding applications using the RAW Store, de- a remap bitmap is maintained to keep track of free and signed as a cooperative resource visible to the user and used blocks on the available physical device; the remap configurable to suit the needs of different applications. bitmap supports allocation both of a single remapped Our current implementation of RAW Store is very sim- block and a range of remapped blocks. ple: in order to decide which data blocks need to be The destination (or remapped) location for a request stored persistently, David requires the application to sup- is determined using a simple algorithm which takes as ply a list of the relevant file paths. David then looks up input the number of contiguous blocks that need to be the inode number of the files and tracks all data blocks remapped and finds the first available chunk of space pointed to by these inodes, writing them out to disk us- from the remap bitmap. This can be done statically or at ing the Metadata Remapper just as any metadata block. 
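A minimal user-space sketch of the range-remapping logic described in Section 4 is shown below; the first-fit scan over a free bitmap mirrors the description above, but the class name, single-block bitmap granularity, and error handling are our simplifications rather than David's actual data structures.

```python
class RangeRemapper:
    """First-fit range remapping over a remap bitmap (illustrative only)."""

    def __init__(self, available_blocks):
        self.used = [False] * available_blocks   # the "remap bitmap"
        self.table = {}                          # (target_start, length) -> avail_start

    def _first_fit(self, length):
        run = 0
        for i, bit in enumerate(self.used):
            run = run + 1 if not bit else 0
            if run == length:
                return i - length + 1
        raise RuntimeError("available backing store exhausted")

    def remap(self, target_start, length=1):
        key = (target_start, length)
        if key not in self.table:                # allocate once, reuse for reads
            start = self._first_fit(length)
            for i in range(start, start + length):
                self.used[i] = True
            self.table[key] = start
        return self.table[key]

    def release(self, target_start, length=1):   # on metadata deallocation
        start = self.table.pop((target_start, length))
        for i in range(start, start + length):
            self.used[i] = False

r = RangeRemapper(available_blocks=1024)
print(r.remap(2_000_000_000, length=16))   # e.g., a run of group descriptors -> 0
print(r.remap(2_500_000_123))              # a single dynamically allocated block -> 16
```

Remapping whole ranges with one table entry is what keeps the translation service compact and, when a disk is the backing store, preserves the sequentiality of the original metadata layout.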
runtime; for the ext3 file system, since most of the blocks In the future, we intend to support more nuanced ways to are statically allocated, the remapping for these blocks maintain the RAW Store; for example, specifying direc- can also be done statically to improve performance. Sub- tories instead of files, or by using Memoization [14]. sequent writes to othermetadata blocks are remappeddy- For applications that must exactly read back a signif- namically; when metadata blocks are deallocated, corre- icant fraction of what they write, the scalability advan- sponding entries from the Metadata Remapper and the tage of David diminishes; in such cases the benefits are remap bitmap are removed. From our experience, this primarily from the ability to emulate new devices. simple algorithm lays out blocks on disk quite efficiently. More sophisticated allocation algorithms based on local- 6 Storage Model and Emulation ity of reference can be implemented in the future. Not having access to the target storage system requires 5 Data Generator David to precisely capture the behavior of the entire stor- age stack with all its dependencies through a model. The David services the requirements of systems oblivious to storage system modeled by David is the target system file content with data squashing and metadata remapping. and the system on which it runs is the available system. However, many real applications care about file content; David emulates the behavior of the target disk by send- the Data Generator with David is responsible for gener- ing requests to the available disk (for persistence) while ating synthetic content to service read requests to data simultaneously sending the target request stream to the blocks that were previously discarded. Different systems Storage Model; the model computes the time that would can have different requirements for the file content and have taken for the request to finish on the target system the Data Generator has various options to choose from; and introduces an appropriate delay in the actual request

6 weqw ueqw 8((0jkw the quick brown fox %PDF−1.4 umask 027 weqwe0 289402qw jumps over the lazy dog %[] if ( ! $?TERM ) then setenv TERM qw (*() )) hhkjh)) 2a 5 0 obj endif 900 bgt GG*&& aa a quick movement of <> if ($TERM==unknown) kja kjwert jhuyfg aa jeopardize six gunboats stream then set noglob; sdnjo abcd2 dsll**(1 x<9C>]%q eval ‘tset −s −r −Q"? wqw 421−0 e−w1‘vc) the jaded zombies acted <8D><80> $TERM"‘; unset noglob 2323 d−−POs&^! aab quietly but kept driving trailer << /Size 75 /Root endif qw HUK;0922 dds 12 their oxen forward 1 0 R /Info 2 0 R /ID>> if($TERM==unknown) startxref 1052 %%EOF goto loop endif rand.txt pangrams.txt compress.pdf config.RAW Figure 5: Examples of content generation by Data Generator. The figure shows a randomly generated text file, a text file with semantically meaningful content, a well-formatted PDF file, and a config file with precise syntax to be stored in the RAW Store. Parameter H1 H2 Parameter H1 H2 Disk size 80 GB 1 TB Cache segments 11 500 Rotational Speed 7200 RPM 7200 RPM Cache R/W partition Varies Varies Number of cylinders 88283 147583 Bus Transfer 133 MBps 133 MBps Number of zones 30 30 Seek profile(long) 3800+(cyl*116)/103 3300+(cyl*5)/106 Sectors per track 567 to 1170 840 to 1680 Seek profile(short) 300+p(cyl 2235) 700+√cyl Cylinders per zone 1444 to 1521 1279 to 8320 Head switch 1.4 ms∗ 1.4 ms On-disk cache size 2870 KB 300 MB Cylinder switch 1.6 ms 1.6 ms Disk cache segment 260 KB 600 KB Dev driver req queue∗ 128-160 128-160 Req scheduling∗ FIFO FIFO Req queue timeout∗ 3 ms (unplug) 3 ms (unplug) Table 1: Storage Model Parameters in David. Lists important parameters obtained to model disks Hitachi HDS728080PLA380 (H1) and Hitachi HDS721010KLA330 (H2). ∗denotes parameters of I/O request queue (IORQ). stream before returning control. Figure 3 presented in 2 will be interesting to explore using Disksim for the disk shows this setup more clearly. § model. Disksim is a detailed user-space disk simulator As a general design principle, to support low-overhead which allows for greater flexibility in the types of device modeling without compromising accuracy, we avoid us- properties that can be simulated along with their degree ing any technique that either relies on storing empiri- of detail; we will need to ensure it does not appreciably cal data to compute statistics or requires table-based ap- slow down the emulation when used without memory as proaches to predict performance [6]; the overheads for backing store. such methods are directly proportional to the amount 6.1.1 Disk Drive Profile of runtime statistics being maintained which in turn de- pends on the size of the disk. Instead, wherever applica- The disk model requires a number of drive-specific pa- ble, we have adopted and developed analytical approxi- rameters as input, a list of which is presented in the first mations that did not slow the system down; our resulting column of Table 1; currently David contains models for models are sufficiently lean while being fairly accurate. two disks: the Hitachi HDS728080PLA380 80 GB disk, and the Hitachi HDS721010KLA330 1 TB disk. We To ensure portability of our models, we have refrained have verified the parameter values for both these disks from making device-specific optimizations to improve through carefully controlled experiments. David is en- accuracy; we believe current models in David are fairly visioned for use in environments where the target drive accurate. 
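The kind of per-request arithmetic such a disk model performs can be illustrated with the toy calculation below. The constants loosely mimic the H1 column of Table 1, but the seek-profile formulas, the short/long seek boundary, and the unit assumption (microseconds) are our reading of the table entries and should be treated as placeholders rather than David's calibrated values.

```python
import math

class ToyDiskModel:
    """Simplified RW-style service-time estimate (all times in microseconds)."""
    RPM = 7200
    SECTORS_PER_TRACK = 800      # assumed mid-zone value
    BUS_MBPS = 133               # bus transfer rate

    def __init__(self):
        self.head_cyl = 0
        self.full_rev_us = 60e6 / self.RPM   # ~8333 us per revolution

    def seek_us(self, cyl):
        dist = abs(cyl - self.head_cyl)
        self.head_cyl = cyl
        if dist == 0:
            return 0.0
        if dist < 2235:                      # short-seek regime (assumed boundary)
            return 300 + math.sqrt(dist * 2235)
        return 3800 + dist * 116 / 1000      # long-seek profile

    def service_us(self, cyl, sectors):
        rotation = self.full_rev_us / 2      # expected half-rotation latency
        media = sectors / self.SECTORS_PER_TRACK * self.full_rev_us
        bus = sectors * 512 / (self.BUS_MBPS * 1e6) * 1e6
        return self.seek_us(cyl) + rotation + media + bus

m = ToyDiskModel()
print(f"8 KB read, long seek : {m.service_us(cyl=40_000, sectors=16) / 1000:.2f} ms")
print(f"8 KB read, short seek: {m.service_us(cyl=40_010, sectors=16) / 1000:.2f} ms")
```

A real model additionally tracks zones, head and cylinder switches, and the cache state described above; the sketch only shows how seek, rotation and transfer components combine into a per-request service time.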
The models are also adaptive enough to be eas- itself may not be available; if users need to model addi- ily configured for changes in disk drives and other pa- tional drives, they need to supply the relevant parameters. rameters of the storage stack. We next present some de- Disk seeks, rotation time and transfer times are modeled tails of the disk model and the storage stack model. muchin the same way asproposedin the RW model. The 6.1 DiskModel actual parameter values defining the above properties are specific to a drive; empirically obtained values for the David’s disk model is based on the classical model pro- two disks we model are shown in Table 1. posed by Ruemmler and Wilkes [19], henceforth referred as the RW model. The disk model contains informa- 6.1.2 Disk Cache Modeling tion about the disk geometry (i.e., platters, cylinders and The drive cache is usually small (few hundred KB to a zones) and maintains the current disk head position; us- few MB) and serves to cache reads from the disk me- ing these sources it models the disk seek, rotation, and dia to service future reads, or to buffer writes. Unfortu- transfer times for any request. The disk model also keeps nately, the drive cache is one of the least specified com- track of the effects of disk caches (track prefetching, ponents as well; the cache managementlogic is low-level write-through and write-back caches). In the future, it firmware code which is not easy to model.

7 David models the number and size of segments in the David exports multiple block devices with separate ma- disk cache and the number of disk sector-sized slots in jor and minor numbers; it differentiates requests to dif- each segment. Partitioning of the cache segments into ferent devices using the major number. For the pur- read and write caches, if any, is also part of the informa- pose of performance benchmarking, David uses a - tion contained in the disk model. David models the read gle memory-based backing store for all the compressed cache with a FIFO eviction policy. To model the effects RAID devices. Using multiple threads, the Storage of write caching, the disk model maintains statistics on Model maintains separate state for each of the devices the current size of writes pending in the disk cache and being emulated. Requests are placed in a single request the time needed to flush these writes out to the media. queue tagged with a device identifier; individual Storage Write buffering is simulated by periodically emptying a Model threads for each device fetch one request at a time fraction of the contents of the write cache during idle from this request queue based on the device identifier. periods of the disk in between successive foreground re- Similar to the single device case, the servicing thread cal- quests. The cache is modeled with a write-throughpolicy culates the time at which a request to the device should and is partitioned into a sufficiently large read cache to finish and notifies completion using a callback. match the read-ahead behavior of the disk drive. David currently only provides mechanisms for simple software RAID emulation that do not need a model of a 6.2 Storage Stack Model software RAID itself. New techniques might be needed David also models the I/O request queues (IORQs) main- to emulate more complex commercial RAID configura- tained in the OS; Table 1 lists a few of its impor- tions, for example, commercial RAID settings using a tant parameters. While developing the Storage Model, hardware RAID card. we found that accurately modeling the behavior of the IORQs is crucial to predict the target execution time cor- 7 Evaluation rectly. The IORQs usually have a limit on the maximum number of requests that can be held at any point; pro- We seek to answer four important questions. First, what cesses that try to issue an I/O request when the IORQ is is the accuracy of the Storage Model? Second, how ac- full are made to wait. Such waiting processes are wo- curately does David predict benchmark runtime and what ken up when an I/O issued to the disk drive completes, storage space savings does it provide? Third, can David thereby creating an empty slot in the IORQ. Once wo- scale to large target devices including RAID? Finally, ken up, the process is also granted privilege to batch a what is the memory and CPU overhead of David? constant number of additional I/O requests even when the IORQ is full, as long as the total number of requests 7.1 Experimental Platform is within a specified upper limit. Therefore, for applica- We have developed David for the Linux operating sys- tions issuing bursty I/O, the time spent by a request in the tem. The hard disks currently modeled are the 1 TB IORQ can outweigh the time spent at the disk by several Hitachi HDS721010KLA330 (referred to as D1TB) and orders of magnitude; modeling the IORQs is thus crucial the 80 GB Hitachi HDS728080PLA380 (referred to as for overall accuracy. D80GB); table 1 lists their relevant parameters. 
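The effect described in Section 6.2 — queueing delay dominating device time for bursty workloads — shows up even in a toy replica of a bounded I/O request queue. The sketch below is ours and deliberately ignores request merging and the scheduling policies that David also models.

```python
def replay(requests, service_us, queue_depth=128):
    """Toy replica of a bounded I/O request queue (merging/scheduling omitted)."""
    completions = []       # completion times of requests issued so far
    device_free = 0.0
    for arrival, lba in requests:
        # Requests still outstanding at this arrival occupy queue slots; if all
        # slots are taken, the issuing process sleeps until one completes.
        outstanding = [t for t in completions if t > arrival]
        start = arrival if len(outstanding) < queue_depth else min(outstanding)
        device_free = max(device_free, start) + service_us(lba)
        completions.append(device_free)
    return completions

burst = [(0.0, 4096 * i) for i in range(512)]      # one large burst of requests
done = replay(burst, service_us=lambda lba: 250.0)
print(f"last of {len(done)} requests finishes at {done[-1] / 1000:.1f} ms; "
      f"most of that is time spent waiting for a queue slot, not disk time")
```

With 512 requests arriving at once and only 128 queue slots, most of a request's latency is queueing rather than media time, which is why replicating the request-queue behavior is essential for accurate runtime prediction.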
Unless Disk requests arriving at David are first enqueued into specified otherwise, the following hold for all the experi- a replica queue maintained inside the Storage Model. ments: (1) machine used has a quad-core Intel processor While being enqueued, the disk request is also checked and 4GB running Linux 2.6.23.1 (2) ext3 file sys- for a possible merge with other pending requests: a com- tem is mounted in ordered-journaling mode with a com- mon optimization that reduces the number of total re- mit interval of 5 sec (3) microbenchmarks were run di- quests issued to the device. There is a limit on the num- rectly on the disk without a file system (4) David predicts ber of disk requests that can be merged into a single disk the benchmark runtime for a target D1TB while in fact request; eventually merged disk requests are dequeued running on the available D80GB (5) to validate accuracy, from the replica queue and dispatched to the disk model David was instead run directly on D1TB. to obtain the service time spent at the drive. The replica queue uses the same request scheduling policy as the tar- 7.2 Storage Model Accuracy get IORQ. First, we validate the accuracy of Storage Model in pre- dicting the benchmark runtime on the target system. 6.3 RAID Emulation Since our aim is to validate the accuracy of the Stor- David can also provide effective RAID emulation. To age Model alone, we run David in a model only mode demonstrate simple RAID configurations with David, where we disable block classification, remapping and each component disk is emulated using a memory- data squashing. David just passes down the requests that backed “compressed” device underneath software RAID. it receives to the available request queue below. We run

[Figure 6 panels (CDFs of measured vs. modeled request times): Sequential Reads (Demerit: 24.39), Random Reads (Demerit: 5.51), Sequential Writes (Demerit: 0.08), Random Writes (Demerit: 0.02); the time axis is in 1 µs units for sequential reads and 100 µs units for the rest.]

Figure 6: Storage Model accuracy for Sequential and Random Reads and Writes. The graph shows the cumulative distribution of measured and modeled times for sequential and random reads and writes.
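The demerit figures quoted above compare the measured and modeled distributions; a common way to compute a Ruemmler–Wilkes-style demerit is the root-mean-square horizontal distance between the two CDFs, which the sketch below implements on made-up samples (the exact definition used in the paper's evaluation may differ, and the numbers here are illustrative only).

```python
def demerit(measured, modeled):
    """RMS horizontal distance between two time CDFs (equal sample counts)."""
    a, b = sorted(measured), sorted(modeled)
    assert len(a) == len(b), "resample to equal counts before comparing"
    return (sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)) ** 0.5

measured = [110, 120, 125, 400, 410, 8200, 8300, 8400]   # made-up service times (us)
modeled  = [100, 118, 130, 395, 405, 8150, 8350, 8500]
print(f"demerit = {demerit(measured, modeled):.1f} us")
```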

Figure 7: Storage Model accuracy. The graphs show the cumulative distribution of measured and modeled times for the following workloads from left to right: Postmark, Webserver, Varmail and Tar.

                     Storage (ext3)                                  Implicit Classification – Ext3          Explicit Notification – Btrfs
Workload       Original (KB)   David (KB)   Savings (%)     Orig (s)   David (s)   Error (%)       Orig (s)   David (s)   Error (%)
mkfs           976762584       7900712      99.19           278.66     281.81      1.13            -          -           -
imp            11224140        18368        99.84           344.18     339.42      -1.38           327.294    324.057     0.99
tar            21144           628          97.03           257.66     255.33      -0.9            146.472    135.014     7.8
grep           -               -            -               250.52     254.40      1.55            141.960    138.455     2.47
virus scan     -               -            -               55.60      47.95       -13.75          27.420     31.555      15.08
find           -               -            -               26.21      26.60       1.5             -          -           -
du             -               -            -               102.69     101.36      -1.29           -          -           -
postmark       204572          404          99.80           33.23      29.34       -11.69          22.709     22.243      2.05
webserver      3854828         3920         99.89           127.04     126.94      -0.08           125.611    126.504     0.71
varmail        7852            3920         50.07           126.66     126.27      -0.31           126.019    126.478     0.36
sr             -               -            -               40.32      44.90       11.34           40.32      44.90       11.34
rr             -               -            -               913.10     935.46      2.45            913.10     935.46      2.45
sw             -               -            -               57.28      58.96       2.93            57.28      58.96       2.93
rw             -               -            -               308.74     291.40      -5.62           308.74     291.40      -5.62

Table 2: David Performance and Accuracy. Shows savings in capacity, accuracy of runtime prediction, and the overhead of storage modeling for different workloads. Webserver and varmail are generated using FileBench; virus scan using AVG.
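The derived columns of Table 2 follow from two simple formulas — savings = (1 − with/without) and error = (with − without)/without — which the snippet below applies to a few rows (small differences against the table are due to rounding of the reported values).

```python
rows = {                      # (original, with-David) pairs taken from Table 2
    "webserver storage (KB)": (3_854_828, 3_920),
    "webserver runtime (s)":  (127.04, 126.94),
    "postmark runtime (s)":   (33.23, 29.34),
}
for name, (orig, david) in rows.items():
    if "storage" in name:
        print(f"{name}: savings = {100 * (1 - david / orig):.2f}%")
    else:
        print(f"{name}: error = {100 * (david - orig) / orig:+.2f}%")
```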

David on top of D1TB and set the target drive to be the demerit figures of 24.39, 5.51, 0.08, and 0.02 respec- same. Note that the available system is the same as the tively, as computed using the Ruemmler and Wilkes target system for these experiments since we only want methodology[19]. The large demerit for sequential reads to compare the measured and modeled times to validate is due to a variance in the available disk’s cache-read the accuracy of the Storage Model. Each block request times; modeling the disk cache in greater detail in the fu- is traced along its path from David to the disk drive and ture could potentially avoid this situation. However, se- back. This is done in order to measure the total time that quential read requests do not contribute to a measurably the request spends in the available IORQ and the time large error in the total modeled runtime; they often hit spent getting serviced at the available disk. These mea- the disk cache and have service times less than 500 mi- sured times are then compared with the modeled times crosecondswhile other types of disk requests take around obtained from the Storage Model. 20 to 35 milliseconds to get serviced. Any inaccuracy in Figure 6 shows the Storage Model accuracy for four the modeled times for sequential reads is negligible when micro-workloads: sequential and random reads, and se- compared to the service times of other types of disk re- quential and random writes; these micro-workloads have quests; we thus chose to not make the disk-cache model

9 800 800 more complex for the sake of sequential reads. WOD Space 700 D Space 700 Figure 7 shows the accuracy for four different macro WOD Time workloads and application kernels: Postmark [13], web- 600 D Time 600 server (generated using FileBench [15]), Varmail (mail WOD Space server workload using FileBench), and a Tar workload 500 500 (copy and untar of the linux kernel of size 46 MB). 400 400 The FileBench Varmail workload emulates an NFS 300 300 mail server, similar to Postmark, but is multi-threaded 200 WOD Time 200 instead. The Varmail workload consists of a set of D Time open/read/close, open/append/close and deletes in a 100 100 Runtime (100s of seconds)

Actual Storage space used (GB) D Space single directory, in a multi-threaded fashion. The 0 0 FileBench webserver workload comprises of a mix of 0 100 200 300 400 500 600 700 800 open/read/close of multiple files in a directory tree. In File System Impression size (GB) addition, to simulate a webserver log, a file append oper- ation is also issued. The workload consists of 100 threads Figure 8: Storage Space Savings and Model Accu- issuing 16 KB appends to the weblog every 10 reads. racy. The “Space” lines show the savings in storage space Overall, we find that storage modeling inside David is achieved when using David for the impressions workload with quite accurate for all workloads used in our evaluation. file-system images of varying sizes until 800GB; “Time” lines The total modeled time as well as the distribution of the show the accuracy of runtime prediction for the same workload. individual request times are close to the total measured WOD: space/time without David, D: space/time with David. time and the distribution of the measured request times. tar uses the GNU tar utility to create a gzipped archive of the file-system image of size 10 GB created by imp; 7.3 David Accuracy it writes the newly created archive in the same file sys- Next, we want to measure how accurately David predicts tem. This workload is a data read and data write inten- the benchmark runtime. Table 2 lists the accuracy and sive workload. The data reads are satisfied by the Data storage space savings provided by David for a variety of Generator without accessing the available disk, while the benchmark applications for both ext3 and btrfs. We have data writes end up being squashed. chosen a set of benchmarks that are commonly used and grep uses the GNU grep utility to search for the ex- also stress various paths that disk requests take within pression “nothing” in the content generated by both imp David. The first and second columns of the table show and tar. This workload issues significant amounts of data the storage space consumed by the benchmark workload reads and small amounts of metadata reads. virus scan without and with David. The third column shows the runs the AVG virus scanner on the file-system image cre- percentage savings in storage space achieved by using ated by imp. find and du run the GNU find and GNU du David. The fourth column shows the original bench- utilities over the content generated by both imp and tar. mark runtime without David on D1TB. The fifth column These two workloads are metadata read only workloads. shows the benchmark runtime with David on D80GB. The David works well under both the implicit and ex- sixth column shows the percentage error in the predic- plicit approaches demonstrating its usefulness across file tion of the benchmark runtime by David. The final three systems. Table 2 shows how David provides tremen- columns show the original and modeled runtime, and the dous savings in the required storage capacity, upwards of percentage error for the btrfs experiments; the storage 99% (a 100-fold or more reduction) for most workloads. space savings are roughly the same as for ext3. The sr, David also predicts benchmark runtime quite accurately. rr, sw, and rw workloads are run directly on the raw de- Prediction error for most workloads is less than 3%, al- vice and hence are independent of the file system. though for a few it is just over 10%. 
The errors in the mkfs creates a file system with a 4 KB block size over predicted runtimes stem from the relative simplicity of the 1 TB target device exported by David. This workload our in-kernel Disk Model; for example, it does not cap- only writes metadata and David remaps writes issued by ture the layout of physical blocks on the magnetic media mkfs sequentially starting from the beginning of D80GB; accurately. This information is not published by the disk no data squashing occurs in this experiment. manufacturers and experimental inference is not possible imp creates a realistic file-system image of size 10 GB for ATA disks that do not have a command similar to the using the publicly available Impressions tool [2]. A total SCSI mode page. of 5000 regularfiles and 1000directories are created with an average of 10.2 files per directory. This workload is 7.4 David Scalability a data-write intensive workload and most of the issued David is aimed at providing scalable emulation using writes end up being squashed by David. commodity hardware; it is important that accuracy is

10 Num Disks Rand R Rand W Seq R Seq W 100 % 100 D Mem Measured 3 232.77 72.37 119.29 119.98 80 % 80 CPU lines 2 156.76 72.02 119.11 119.33 60 % 60 1 78.66 71.88 118.65 118.71 40 % 40 Modeled 3 238.79 73.77 119.44 119.40 20 % 20

SM Mem Memory used (MB)

2 159.36 72.21 119.16 119.21 CPU Busy Percentage 0 % 0 1 79.56 72.15 118.95 118.83 0 10 20 30 40 50 60 70 Table 3: David Software RAID-1 Emulation. Shows Time (in units of 5 Seconds) IOPS for a software RAID-1 setup using David with memory as WOD CPU D CPU D Mem backing store; workload issues 20000 read and write requests SM CPU SM Mem through concurrent processes which equal the number of disks in the experiment. 1 disk experiments run w/o RAID-1. Figure 9: David CPU and Memory Overhead. Shows the memory and percentage CPU consumption by David while not compromised at larger scale. Figure 8 shows the creating a 10 GB file-system image using impressions. WOD accuracy and storage space savings provided by David CPU: CPU without David, SM CPU: CPU with Storage Model while creating file-system images of 100s of GB. Using alone, D CPU: total CPU with David, SM Mem: Storage an available capacity of only 10 GB, David can model the Model memory alone, D Mem: total memory with David. runtime of Impressions in creating a realistic file-system image of 800 GB; in contrast to the linear scaling of the using several counters; David provides support to mea- target capacity demanded, David barely requires any ex- sure the memory usage of its different components using tra available capacity. David also predicts the benchmark ioctls. To measure the CPU overhead of the Storage runtime within a maximum of 2.5% error even with the Model alone, David is run in the model-only mode where huge disparity between target and available disks at the block classification, remapping and data squashing are 800 GB mark, as shown in Figure 8. turned off. The reason we limit these experiments to a target ca- In our experience with running different workloads, pacity of less than 1 TB is because we had access to only we found that the memory and CPU usage of David is a terabyte sized disk against which we could validate the acceptable for the purposes of benchmarking. As an ex- accuracy of David. Extrapolating from this experience, ample, Figure 9 shows the CPU and memory consump- we believe David will enable one to emulate disks of 10s tion by David captured at 5 second intervals while cre- or 100s of TB given the 1 TB disk. ating a 10 GB file-system image using Impressions. For this experiment, the Storage Model consumes less than 1 7.5 Davidfor RAID MB of memory; the average memory consumed in total We present a brief evaluation and validation of software by David is less than 90 MB, of which the pre-allocated RAID-1 configurations using David. Table 3 shows a cache used by the Journal Snooping to temporarily store simple experiment where David emulates a multi-disk the journal writes itself contributes 80 MB. Amount of software RAID-1 (mirrored) configuration; each device CPU used by the Storage Model alone is insignificant, is emulated using a memory-disk as backing store. How- however implicit classification by the Block Classifier ever, since the multiple disks contain copies of the same is the primary consumer of CPU using 10% on average block, a single physical copy is stored, further reducing with occasional spikes. The CPU overhead is not an is- the memory footprint. In each disk setup, a set of threads sue at all if one uses explicit notification. which equal in number to the number of disks issue a to- tal of 20000 requests. David is able to accurately emulate the software RAID-1 setup upto 3 disks; more complex 8 Related Work RAID schemes are left as part of future work. 
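A quick sanity check on Table 3: for RAID-1, independent random reads can be spread across the mirrors while every write must reach all of them, so read IOPS should scale roughly with the number of disks and write IOPS should stay flat. The snippet below reproduces that scaling and the modeled-versus-measured error from the table's numbers.

```python
measured = {  # (random-read IOPS, random-write IOPS) from Table 3
    1: (78.66, 71.88), 2: (156.76, 72.02), 3: (232.77, 72.37),
}
modeled = {
    1: (79.56, 72.15), 2: (159.36, 72.21), 3: (238.79, 73.77),
}
base_read, base_write = measured[1]
for disks in (1, 2, 3):
    mr, mw = measured[disks]
    er, ew = modeled[disks]
    # RAID-1 intuition: reads scale with mirrors, writes hit every mirror.
    print(f"{disks} disk(s): read x{mr / base_read:.2f} (expect ~{disks}), "
          f"write x{mw / base_write:.2f} (expect ~1.0), "
          f"model error {100 * (er - mr) / mr:+.1f}% / {100 * (ew - mw) / mw:+.1f}%")
```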
Memulator [10] makes a great case for why storage em- 7.6 David Overhead ulation provides the unique ability to explore nonexistent David is designed to be used for benchmarking and not storage components and take end-to-end measurements. as a production system, thus scalability and accuracy are Memulator is a “timing-accurate” storage emulator that the more relevant metrics of evaluation; we do however allows a simulated storage component to be plugged want to measure the memory and CPU overhead of us- into a real system running real applications. Memula- ing David on the available system to ensure it is prac- tor can use the memory of either a networked machine or tical to use. All memory usage within David is tracked the local machine as the storage media of the emulated

11 disk, enabling full system evaluation of hypothetical stor- dilation for emulating network speeds orders of mag- age devices. Although this provides flexibility in device nitude faster than available [11]. Time dilation allows emulation, high-capacity devices requires an equivalent one to experiment with unmodified applications running amount of memory; David provides the necessary scala- on commodity operating systems by subjecting them to bility to emulate such devices. In turn, David can benefit much faster network speeds than actually available. from the networked-emulation capabilities of Memula- A key challenge in David is the ability to identify data tor in scenarios when either the host machine has limited and meta-data blocks. Besides SDS [21], XN, the stable CPU and memory resources, or when the interference of storage system for the Xok exokernel [12] dealt with sim- running David on the same machine competing for the ilar issues. XN employed a template of metadata trans- same resources is unacceptable. lation functions called UDFs specific to each file type. One alternate to emulation is to simply buy a larger ca- The responsibility of providing UDFs rested with the file pacity or newer device and use it to run the benchmarks. system developer, allowing the kernel to handle arbitrary This is sometimes feasible, but often not desirable. Even metadata layouts without understanding the layout itself. if one buys a larger disk, in the future they would need Specifying an encoding of the on-disk scheme can be an even larger one; David allows one to keep up with this tricky for a file system such as ReiserFS that uses dy- arms race without always investing in new devices. Note namic allocation; however, in the future, David’s meta- that we chose 1 TB as the upper limit for evaluation in data classification scheme can benefit from a more for- this paper because we could validate our results for that mally specified on-disk layout per file-system. size. Having a large disk will also not address the issue of emulating much faster devices such as SSDs or RAID 9 Conclusion configurations. David emulates faster devices through an efficient use of memory as backing store. David is born out of the frustration in doing large-scale Another alternate is to simulate the storage component experimentation on realistic storage hardware – a prob- under test; disk simulators like Disksim [7] allow such an lem many in the storage community face. David makes it evaluation flexibly. However, simulation results are often practical to experiment with benchmarks that were oth- far from perfect [9] – they fail to capture system depen- erwise infeasible to run on a given system, by transpar- dencies and require the generation of representative I/O ently scaling down the storage capacity required to run traces which is a challenge in itself. the workload. The available backing store under David can be orders of magnitude smaller than the target de- Finally, one might use analytical modeling for the stor- vice. David ensures accuracy of benchmarking results age devices; while very useful in some circumstances, by using a detailed storage model to predict the runtime. it is not without its own set of challenges and limita- In the future, we plan to extend David to include support tions [20]. In particular, it is extremely hard to capture for a numberof other useful storage devices and configu- the interactions and complexities in real systems. Wher- rations. 
9 Conclusion

David is born out of the frustration of doing large-scale experimentation on realistic storage hardware, a problem many in the storage community face. David makes it practical to experiment with benchmarks that were otherwise infeasible to run on a given system, by transparently scaling down the storage capacity required to run the workload. The available backing store under David can be orders of magnitude smaller than the target device. David ensures accuracy of benchmarking results by using a detailed storage model to predict the runtime. In the future, we plan to extend David to include support for a number of other useful storage devices and configurations. In particular, the Storage Model can be extended to support flash-based SSDs using an existing simulation model [5]. We believe David will be a useful emulator for file and storage system evaluation.

10 Acknowledgments

We thank the anonymous reviewers and Rob Ross (our shepherd) for their feedback and comments, which have substantially improved the content and presentation of this paper. The first author thanks the members of the Storage Systems Group at NEC Labs for their comments and feedback.

This material is based upon work supported by the National Science Foundation under the following grants: CCF-0621487, CNS-0509474, CNS-0834392, CCF-0811697, CCF-0937959, as well as by generous donations from NetApp, Sun Microsystems, and Google. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of NSF or other institutions.

References

[1] GraySort Benchmark. http://sortbenchmark.org/FAQ.htm#gray.

[2] N. Agrawal, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. Generating Realistic Impressions for File-System Benchmarking. In Proceedings of the 7th Conference on File and Storage Technologies (FAST '09), San Francisco, CA, February 2009.

[3] N. Agrawal, W. J. Bolosky, J. R. Douceur, and J. R. Lorch. A Five-Year Study of File-System Metadata. In Proceedings of the 5th USENIX Symposium on File and Storage Technologies (FAST '07), San Jose, California, February 2007.

[4] N. Agrawal, W. J. Bolosky, J. R. Douceur, and J. R. Lorch. A five-year study of file-system metadata: Microsoft longitudinal dataset. http://iotta.snia.org/traces/list/Static, 2007.

[5] N. Agrawal, V. Prabhakaran, T. Wobber, J. D. Davis, M. Manasse, and R. Panigrahy. Design Tradeoffs for SSD Performance. In Proceedings of the USENIX Annual Technical Conference (USENIX '08), Boston, MA, June 2008.

[6] E. Anderson. Simple table-based modeling of storage devices. Technical Report HPL-SSP-2001-04, HP Laboratories, July 2001.

[7] J. S. Bucy and G. R. Ganger. The DiskSim Simulation Environment Version 3.0 Reference Manual. Technical Report CMU-CS-03-102, Carnegie Mellon University, January 2003.

[8] P. M. Chen and D. A. Patterson. A New Approach to I/O Performance Evaluation: Self-Scaling I/O Benchmarks, Predicted I/O Performance. In Proceedings of the 1993 ACM SIGMETRICS Conference on Measurement and Modeling of Computer Systems (SIGMETRICS '93), pages 1–12, Santa Clara, California, May 1993.

[9] G. R. Ganger and Y. N. Patt. Using system-level models to evaluate I/O subsystem designs. IEEE Transactions on Computers, 47(6):667–678, 1998.

[10] J. L. Griffin, J. Schindler, S. W. Schlosser, J. S. Bucy, and G. R. Ganger. Timing-accurate Storage Emulation. In Proceedings of the 1st USENIX Symposium on File and Storage Technologies (FAST '02), Monterey, California, January 2002.

[11] D. Gupta, K. Yocum, M. McNett, A. C. Snoeren, A. Vahdat, and G. M. Voelker. To infinity and beyond: time-warped network emulation. In Proceedings of the 3rd Conference on Networked Systems Design and Implementation (NSDI '06), San Jose, CA, 2006.

[12] M. F. Kaashoek, D. R. Engler, G. R. Ganger, H. Briceño, R. Hunt, D. Mazières, T. Pinckney, R. Grimm, J. Jannotti, and K. Mackenzie. Application Performance and Flexibility on Exokernel Systems. In Proceedings of the 16th ACM Symposium on Operating Systems Principles (SOSP '97), pages 52–65, Saint-Malo, France, October 1997.

[13] J. Katcher. PostMark: A New File System Benchmark. Technical Report TR-3022, Network Appliance Inc., October 1997.

[14] J. Mayfield, T. Finin, and M. Hall. Using automatic memoization as a software engineering tool in real-world AI systems. In Proceedings of the Conference on Artificial Intelligence for Applications, page 87, 1995.

[15] R. McDougall. Filebench: Application level file system benchmark. http://www.solarisinternals.com/si/tools/filebench/index.php.

[16] E. L. Miller. Towards scalable benchmarks for mass storage systems. In Proceedings of the 5th NASA Goddard Conference on Mass Storage Systems and Technologies, 1996.

[17] E. Riedel, M. Kallahalla, and R. Swaminathan. A Framework for Evaluating Storage System Security. In Proceedings of the 1st USENIX Symposium on File and Storage Technologies (FAST '02), pages 14–29, Monterey, California, January 2002.

[18] M. Rinard, C. Cadar, D. Dumitran, D. M. Roy, T. Leu, and W. S. Beebee, Jr. Enhancing Server Availability and Security Through Failure-Oblivious Computing. In Proceedings of the 6th Symposium on Operating Systems Design and Implementation (OSDI '04), San Francisco, CA, December 2004.

[19] C. Ruemmler and J. Wilkes. An Introduction to Disk Drive Modeling. IEEE Computer, 27(3):17–28, March 1994.

[20] E. Shriver. Performance modeling for realistic storage devices. PhD thesis, New York University, New York, NY, USA, 1997.

[21] M. Sivathanu, V. Prabhakaran, F. I. Popovici, T. E. Denehy, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. Semantically-Smart Disk Systems. In Proceedings of the 2nd USENIX Symposium on File and Storage Technologies (FAST '03), pages 73–88, San Francisco, California, April 2003.

[22] Standard Performance Evaluation Corporation. SPECmail2009 Benchmark. http://www.spec.org/mail2009/.

[23] A. Sweeney, D. Doucette, W. Hu, C. Anderson, M. Nishimoto, and G. Peck. Scalability in the XFS File System. In Proceedings of the USENIX Annual Technical Conference (USENIX '96), San Diego, California, January 1996.

[24] A. Traeger and E. Zadok. How to cheat at benchmarking. In USENIX FAST Birds of a Feather session, San Francisco, CA, February 2009.

[25] S. C. Tweedie. Journaling the Linux ext2fs File System. In The Fourth Annual Linux Expo, Durham, North Carolina, May 1998.

[26] Wikipedia. Btrfs. http://en.wikipedia.org/wiki/Btrfs, 2009.

[27] M. Wittle and B. E. Keith. LADDIS: The next generation in NFS file server benchmarking. In USENIX Summer Technical Conference, pages 111–128, 1993.

[28] E. Zadok. File and Storage Systems Benchmarking Workshop. UC Santa Cruz, CA, May 2008.
