Terra Incognita: on the Practicality of User-Space File Systems

Terra Incognita: On the Practicality of User-Space File Systems Vasily Tarasov†∗, Abhishek Guptay, Kumar Souravy, Sagar Trehany+, and Erez Zadoky Stony Brook Universityy, IBM Research—Almaden∗, Nimble Storage+ Abstract number of cores per system is up from one to dozens. At the same time, three factors dominated the stor- To speed up development and increase reliability the age space. First, HDDs, which still account for most of Microkernel approach advocated moving many OS ser- the storage devices shipped, have improved their perfor- vices to user space. At that time, the main disadvan- mance by only 10× since 1994. Second, dataset sizes in- tage of microkernels turned out to be their poor per- creased and data-access patterns became less predictable formance. In the last two decades, however, CPU and due to higher complexity of modern I/O stack [19]. RAM technologies have improved significantly and re- Thus, many modern workloads bottleneck on device searchers demonstrated that by carefully designing and I/O; consequently, OS services that involve I/O opera- implementing a microkernel its overhead can be reduced tions might not experience as high performance penalty significantly. Storage devices often remain a major bot- as before due to CPU-hungry IPC. However, the third tleneck in systems due to their relatively slow speed. factor—the advent of Flash-based storage—suggests the Thus, user-space I/O services, such as file systems and contrary. Modern SSDs are over 10,000× faster than block layer, might see significantly lower relative over- 1994 HDDs; consequently, OS I/O services might have head than any other OS services. In this paper we ex- added more relative overhead than before [5]. As such, amine the reality of a partial return of the microkernel there is no clear understanding on how the balance be- architecture—but for I/O subsystems only. We observed tween CPU and storage performance shifted and im- over 100 user-space file systems have been developed in pacted the overheads of user-space I/O services. recent years. However, performance analysis and careful design of user-space file systems were disproportion- In the 90s, Liedtke et al. demonstrated that by care- ately overlooked by the storage community. Through fully designing and implementing a microkernel with extensive benchmarks we present Linux FUSE perfor- performance as a key feature, microkernel overheads mance for several systems and 45 workloads. We estab- could be reduced significantly [7]. Alas, this proof came lish that in many setups, FUSE already achieves accept- too late to impact server and desktop OSes, which com- able performance but further research is needed for file fortably settled into the monolithic approach. Today, mi- systems to comfortably migrate to user space. crokernels such as the L4 family are used widely, but pri- marily in embedded applications [10]. We believe that 1 Introduction the same techniques that allow L4 to achieve high per- Modern general-purpose OSes, such as Unix/Linux and formance in embedded systems can be used to migrate Windows, lean heavily towards the monolithic kernel ar- I/O stack in general-purpose OSes to user space. chitecture. In the 1980s, when the monolithic kernel File systems and the block layer are two main I/O approach became a development bottleneck, the idea of stack OS services. File systems are complex pieces of microkernels arose [1, 15]. Microkernels offer only a software with many lines of code and numerous im- limited number of services to user applications: pro- plementations. Linux 3.6 has over 70 file systems, cess scheduling, virtual memory management, and Inter- consuming almost 30% of all kernel code (excluding Process Communication (IPC). The rest of the services, architecture-specific code and drivers). Software De- including file systems, were provided by user-space dae- fined Storage (SDS) paradigms suggest moving even mons. Despite their benefits, microkernels did not suc- more storage-related functionality to the OS. Maintain- ceed at first due to high performance overheads caused ing such a large code base in a kernel is difficult; moving by IPC message copying, excessive number of system it to user space would simplify this task significantly, in- call invocations, and more. crease kernel reliability, extensibility, and security. In Computational power has improved significantly recent years we observed a rapid growth of user-space since 1994 when the acclaimed Mach 3 microkernel file systems, e.g., over 100 file systems are implemented project was officially over. The 1994 Intel Pentium using FUSE [18]. Initially, FUSE-based file systems performed 190 MIPS; the latest Intel Core i7 achieves offered a common interface to diverse user data (e.g., 177,000 MIPS—almost 1,000× improvement. Simi- archive files, ftp servers). Nowadays, however, even tra- larly, cache sizes for these CPUs increased 200× (16KB ditional file systems are implemented using FUSE [21]. vs. 3MB). A number of CPU optimizations were added Many enterprise distributed file system like PLFS and to improve processing speeds, e.g., intelligent prefetch- GPFS are implemented in user space [4]. ers and a special syscall instruction. Finally, the average Despite the evident renaissance of user-space file sys- 1 tems, performance analysis and careful design of FUSE- Performance. The common thinking that user-space like frameworks were largely overlooked by the storage file systems cannot be faster than kernel counterparts community. In this position paper, we evaluate user- mistakenly assume that both use the same algorithms space file system’s performance. Our results indicate and data structures. However, an abundance of libraries that for many workloads and devices user-space file sys- is available in the user space, where it is easier to use and tems can achieve acceptable performance already now. try new, more efficient algorithms. For example, predic- However, for some setups, the FUSE layer still can be a tive prefetching can use AI algorithms to adapt to a spe- significant bottleneck. Therefore, we propose to jump- cific user’s workload; cache management can use clas- start research on user-space file systems. Arguably, with sification libraries to implement better eviction policies. a right amount of effort, the entire I/O stack, includ- Porting such libraries to the kernel would be daunting. ing block layers and drivers, can be effectively moved to user space in the future. 3 Historical Background The original microkernel-based OSes implemented file 2 Benefits of User-Space File Systems systems as user-space services. This approach was log- ically continued by the modern microkernels. E.g., Historically, most file systems were implemented in the GNU Hurd supports ufs, ext2, isofs, nfs, and ftpfs file OS kernel, excluding some distributed file systems that servers [8]. In this case, implementing file systems as needed greater portability. When developing a new file user processes was just a part of a general paradigm. system today, one common assumption is that to achieve Starting from the 1990s, projects developing user- production-level performance, the code should be writ- space file systems as part of monolithic kernels began ten in the kernel. Moving file system development to to appear. These endeavors can be roughly classified user space would require significant change in the com- into two types: specialized solutions and general frame- munity’s mindset. The benefits must be compelling: low works. The first type includes the designs in which overheads if any and significantly improved portability, some specific file system was completely or partially im- maintainability, and versatility. plemented in the user space, without providing a gen- Development ease. The comfort of software develop- eral approach for writing user-space file systems. E.g., ment (and consequently its speed) is largely determined Steere et al. proposed to move Coda file system’s cache by the completeness of a programmer’s toolbox. Nu- management—the most complex part of the system— merous user-space tools for debugging, tracing, profil- to the user space [17]. Another example of a special- ing, and testing are available to user-level developers. ized solution is Arla—an open source implementation A bug does not crash or corrupt the state of the whole of AFS [20]. Only 10% of Arla’s code is located in the system, but stops only a single program; this also often kernel which allowed it to be ported to many OSes: So- produces an easily debuggable error message. After the laris, FreeBSD, Linux, AIX, Windows, and others. bug is identified and fixed, restarting the file system is The second type of solutions—general solutions— as easy as any other user application. Programmers are include the designs that focus explicitly on expanding not limited to a single programming language (usually OS functionality by building a full-fledged frameworks C or C++), but can use and mix any programming and for creating user-space file systems [2, 3, 12]. The orig- scripting languages as desired. A large number of useful inal intention behind creating such frameworks was to software libraries are readily available. enable programs to look inside archived files, access Reliability and security. By reducing the amount of the remote files over FTP, and similar use cases—all code running in the kernel, one reduces the chances that without modifying the programs. Over time, however, a kernel bug crashes an entire production system. In re- it became clear that there were many more use cases cent years, malicious attacks on kernels have become for the user-space file system frameworks. E.g., one more frequent than on user applications. As the main can extend a functionality of an existing file system by reason for this trend Kermerlis et al. lists the fact that ap- stacking a user-space file system on top.

Terra Incognita: on the Practicality of User-Space File Systems

The Linux Kernel Module Programming Guide

Software User Guide

MOAT004 1 Eprint Cs.DC/0306067 Computing in High Energy and Nuclear Physics, 24-28 March 2003, La Jolla, California Resources

End-User Computing Security Guidelines Previous Screen Ron Hale Payoff Providing Effective Security in an End-User Computing Environment Is a Challenge

Globalfs: a Strongly Consistent Multi-Site File System

The Xen Port of Kexec / Kdump a Short Introduction and Status Report

Performance Best Practices for Vmware Workstation Vmware Workstation 7.0

Chapter 1. Origins of Mac OS X

High Velocity Kernel File Systems with Bento

Release 0.5.0 Will Mcgugan

Using the Andrew File System on *BSD [email protected], Bsdcan, 2006 Why Another Network Filesystem

Filesystems HOWTO Filesystems HOWTO Table of Contents Filesystems HOWTO