A Fast and Slippery Slope for File Systems
Ricardo Santana, Raju Rangaswami (Florida International University): rsant144@fiu.edu, raju@cs.fiu.edu
Vasily Tarasov, Dean Hildebrand (IBM Research—Almaden): [email protected], [email protected]

Abstract

There is a vast number and variety of file systems currently available, each optimizing for an ever growing number of storage devices and workloads. Users have an unprecedented, and somewhat overwhelming, number of data management options. At the same time, the fastest storage devices are only getting faster, and it is unclear how well the existing file systems will adapt. Using emulation techniques, we evaluate five popular Linux file systems across a range of storage device latencies typical of low-end hard drives, the latest high-performance persistent memory block devices, and in between. Our findings are often surprising. Depending on the workload, we find that some file systems can clearly scale with faster storage devices much better than others. Further, as storage device latency decreases, we find unexpected performance inversions across file systems. Finally, file system scalability in the higher device latency range is not representative of scalability in the lower, sub-millisecond, latency range. We then focus on Nilfs2 as an especially alarming example of unexpectedly poor scalability and present detailed instructions for identifying bottlenecks in the I/O stack.

Categories and Subject Descriptors: D.4.2 [Storage Management]: Secondary storage; D.4.3 [File Systems Management]: File organization; D.4.8 [Performance]: Measurements; D.4.8 [Performance]: Modeling and prediction

General Terms: Measurement, Performance, Design

Keywords: File systems, high-speed devices

1. Introduction

The number and variety of available file systems can give users decision paralysis. To complicate matters, storage devices are getting faster and faster, and it is not clear that a file system that works well on today's hardware will continue to work well on next-generation hardware. Further, the workload type, of which there is no shortage, also has a large impact on file system selection. Due to the broad proliferation of file systems, this problem impacts numerous system domains, ranging from servers, laptops, embedded devices, mobile phones, and virtual machines to large distributed file systems in the enterprise.

The problem is further complicated by the fact that we are about to enter an era where storage latency across device types varies by as much as four orders of magnitude. While disk drives offer multi-millisecond latencies, flash-based solid-state drives provide latencies in the sub-millisecond range, and the newer persistent memory based storage devices offer even lower latencies of 200–300ns [11]. These diverse device characteristics are in many cases hidden from the file system. For example, VMs and storage virtualization abstract the true nature of the underlying hardware, but still allow users to request a block device with specific latency bounds [15]. Given these developments, file systems should be capable of performing well on storage devices with arbitrary performance characteristics.

Conventionally, the evaluation of file systems has focused on a single type of storage hardware. Without a comprehensive study that can keep up with the changing landscape of available file systems and storage hardware, users often make implicit generalizations of file system performance for similar or dissimilar hardware and workloads. Given the rapid pace of change, there is a need to reevaluate this conventional thinking, and be much more comprehensive with respect to how file systems are evaluated.

In this paper, we motivate the evaluation of file system performance across both workloads and storage device speeds. Using an instrumented block device layer that is capable of emulating both multi- and sub-millisecond latencies, we evaluate five Linux file systems—Ext4, XFS, BTRFS, Nilfs2, and F2FS—using several common workloads. Our initial findings are both unexpected and counterintuitive, and they motivate substantial follow-on work to understand in greater detail how popular file systems perform across a variety of device characteristics in both end-user and enterprise settings.

We make several specific observations.

Observation 1. File systems are not created equal for all storage speeds; not only do some file systems scale better than others as storage latencies decrease, their relative scaling capabilities are highly workload-sensitive.

Observation 2. With some workloads, unexpected performance inversions occur; file systems that perform faster than other file systems at higher latencies perform slower at lower latencies, and vice versa.

Observation 3. File system performance models built on the reasonable expectation that performance scales with the storage device speed are arbitrarily inaccurate, especially in the lower, sub-millisecond, latency range; file system scaling properties are complex and require further investigation.

Understanding the root causes of unexpected behavior is critical to file system evolution for supporting next generation storage devices efficiently. For example, our experiments revealed that for some workloads Nilfs2 throughput remains almost flat across high- and low-latency devices. We used Nilfs2 as a case study to develop a general guide for identifying bottlenecks in a file system and an I/O stack. We uncovered a high level of metadata contention in Nilfs2 that fetters its ability to scale as device latency decreases. Ultimately, our work motivates revisiting file system designs for the new era of diverse storage device characteristics.

2. Goals and Models

File systems remain the most common abstraction through which applications access underlying storage. Block-based file systems translate file-level operations into block-level accesses to the storage device. While some recent proposals eliminate the block abstraction for fast persistent memory based devices [10–12, 20], other proposals have advocated the opposite [7, 9] for backward compatibility. We believe that the block abstraction will remain a significant building block in the foreseeable future.

The main goal of our study therefore is to gain initial insights into how file systems perform across a range of block device latencies. Specifically, we are interested in understanding file system scaling characteristics with faster storage devices. We believe this work can provide a direct benefit to the file system research community by identifying bottlenecks across diverse storage devices and workloads, as well as indicating potential performance improvements when users upgrade their storage.

The User Performance Expectation Model (UPEM) represents file system performance as expected by users. Let's assume that the average latencies of the storage device and the I/O software stack are l_dev and l_sw, respectively. The value of l_sw is constant for a specific system setup, file system, and workload, while l_dev is a device-specific characteristic. The total average latency of a single I/O operation is proportional to (l_dev + l_sw). Then file system throughput is inversely proportional to this latency:

    Throughput(l_dev) = C / (l_dev + l_sw)

where C is a coefficient of proportionality specific to the system setup, file system, and workload.

If the file system throughput is known for two l_dev latencies, then the two constants C and l_sw can be computed from the resulting system of equations.
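Spelled out, with T_1 and T_2 denoting the throughputs measured at two device latencies l_1 and l_2 (notation introduced here for illustration; the closed form follows directly from the model above):

    T_1 = C / (l_1 + l_sw),    T_2 = C / (l_2 + l_sw)

which solves to

    l_sw = (T_2 * l_2 - T_1 * l_1) / (T_1 - T_2),    C = T_1 * (l_1 + l_sw)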
Our experiments evaluated performance for device latencies between 0 and 10ms to represent both currently available and future storage devices. To calculate C and l_sw, and thereby calibrate this model, we picked the slowest latency of 10ms and the middle-range latency of 5ms. We believe that providing the model with information from half of the range should provide reasonable model accuracy. We use the rest of the latencies (0–5ms) to compare model predictions against experimental results.
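A minimal sketch of this calibration-and-prediction step in Python, assuming two measured (latency, throughput) pairs as input; the function names and the example numbers are illustrative, not taken from the paper:

```python
def calibrate_upem(l1, t1, l2, t2):
    """Solve Throughput = C / (l_dev + l_sw) for C and l_sw,
    given throughputs t1 and t2 measured at device latencies l1 and l2."""
    l_sw = (t2 * l2 - t1 * l1) / (t1 - t2)  # software-stack latency
    c = t1 * (l1 + l_sw)                    # proportionality coefficient
    return c, l_sw


def predict_throughput(c, l_sw, l_dev):
    """UPEM-predicted throughput at device latency l_dev."""
    return c / (l_dev + l_sw)


if __name__ == "__main__":
    # Hypothetical calibration points at 10 ms and 5 ms (latencies in seconds,
    # throughputs in ops/s); real values would come from actual benchmark runs.
    c, l_sw = calibrate_upem(0.010, 95.0, 0.005, 180.0)
    # Predictions over the remaining 0-5 ms range, to be compared against
    # measured results.
    for l_dev in (0.002, 0.001, 0.0005, 0.0):
        print(f"{l_dev * 1e3:5.2f} ms -> {predict_throughput(c, l_sw, l_dev):8.1f} ops/s")
```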
It is important to distinguish UPEM, whose goal is to capture user performance expectations (i.e., "My file system should go faster with a faster storage device"), from a real "file system performance model" that describes the specifics of a file system's design and implementation.

3. Testing Methodology

Evaluating the performance of multiple file systems across multiple workloads and devices is a challenging but feasible task. Our approach is both systematic and pragmatic.

Hardware and Operating System. All experiments were run on identical IBM System x3650 M4 servers equipped with two 8-core Intel Xeon CPUs and 96GB of memory, running Red Hat Enterprise Linux 7.0 (RHEL7). We encountered issues with BTRFS on RHEL7's 3.10 Linux kernel, so we switched to the vanilla 3.14 kernel for all experiments.

File Systems. We chose five local file systems available in the Linux kernel: 1) Ext4, 2) XFS, 3) BTRFS, 4) Nilfs2, and 5) F2FS. These file systems have significantly different internal architectures, and we consider them representative of recent developments in file system design for both conventional and newer block storage devices. Ext4 [3] is a traditional FFS-like file system that uses inode tables, bitmaps,