To FUSE Or Not to FUSE: Performance of User-Space File Systems

To FUSE or Not to FUSE: Performance of User-Space File Systems Bharath Kumar Reddy Vangoor, Stony Brook University; Vasily Tarasov, IBM Research-Almaden; Erez Zadok, Stony Brook University https://www.usenix.org/conference/fast17/technical-sessions/presentation/vangoor This paper is included in the Proceedings of the 15th USENIX Conference on File and Storage Technologies (FAST ’17). February 27–March 2, 2017 • Santa Clara, CA, USA ISBN 978-1-931971-36-2 Open access to the Proceedings of the 15th USENIX Conference on File and Storage Technologies is sponsored by USENIX. To FUSE or Not to FUSE: Performance of User-Space File Systems Bharath Kumar Reddy Vangoor1, Vasily Tarasov2, and Erez Zadok1 1Stony Brook University and 2IBM Research—Almaden Abstract whole systems, whereas user-space bugs’ impact is more contained. Many libraries and programming languages Traditionally, file systems were implemented as part are available in user-space in multiple platforms. of OS kernels. However, as complexity of file systems grew, many new file systems began being developed in Although user-space file systems are not expected user space. Nowadays, user-space file systems are often to displace kernel file systems entirely, they undoubt- used to prototype and evaluate new approaches to file edly occupy a growing niche, as some of the more system design. Low performance is considered the main heated debates between proponents and opponents in- disadvantage of user-space file systems but the extent dicate [20,39,41]. The debates center around two trade- of this problem has never been explored systematically. off factors: (1) how large is the performance overhead As a result, the topic of user-space file systems remains caused by a user-space implementations and (2) how rather controversial: while some consider user-space file much easier is it to develop in user space. Ease of de- systems a toy not to be used in production, others de- velopment is highly subjective, hard to formalize and velop full-fledged production file systems in user space. therefore evaluate; but performance is easier to evalu- In this paper we analyze the design and implementa- ate empirically. Oddly, little has been published on the tion of the most widely known user-space file system performance of user-space file system frameworks. framework—FUSE—and characterize its performance In this paper we use a popular user-space file system for a wide range of workloads. We instrumented FUSE framework, FUSE, and characterize its performance. We to extract useful statistics and traces, which helped us an- start with a detailed explanation of FUSE’s design and alyze its performance bottlenecks and present our anal- implementation for four reasons: (1) the architecture is ysis results. Our experiments indicate that depending on somewhat complex; (2) little information on internals is the workload and hardware used, performance degrada- available publicly; (3) FUSE’s source code can be dif- tion caused by FUSE can be completely imperceptible ficult to analyze, with complex asynchrony and user- or as high as –83% even when optimized; and relative kernel communications; and (4) as FUSE’s popularity CPU utilization can increase by 31%. grows, a detailed analysis of its implementation becomes of high value to many. 1 Introduction We developed a simple pass-through stackable file File systems offer a common interface for applications to system in FUSE and then evaluated its performance access data. Although micro-kernels implement file sys- when layered on top of Ext4 compared to native Ext4. tems in user space [1, 16], most file systems are part of We used a wide variety of micro- and macro-workloads, monolithic kernels [6, 22, 34]. Kernel implementations and different hardware using basic and optimized config- avoid the high message-passing overheads of micro- urations of FUSE. We found that depending on the work- kernels and user-space daemons [7, 14]. load and hardware, FUSE can perform as well as Ext4, In recent years, however, user-space file systems but in the worst cases can be 3× slower. Next, we de- rose in popularity for four reasons. (1) Several stack- signed and built a rich instrumentation system for FUSE able file systems add specialized functionality over ex- to gather detailed performance metrics. The statistics existing file systems (e.g., deduplication and compres- tracted are applicable to any FUSE-based systems. We sion [19, 31]). (2) In academia and R&D settings, this then used this instrumentation to identify bottlenecks in framework enabled quick experimentation and prototyp- FUSE, and to explain why, for example, its performance ing of new approaches [3, 9, 15, 21, 40]. (3) Several varied greatly for different workloads. existing kernel-level file systems were ported to user 2 FUSE Design space (e.g., ZFS [45], NTFS [25]). (4) More companies rely on user-space implementations: IBM’S GPFS [30] FUSE—Filesystem in Userspace—is the most widely and LTFS [26], Nimble Storage’s CASL [24], Apache’s used user-space file system framework [35]. According HDFS [2], Google File System [13], RedHat’s Glus- to the most modest estimates, at least 100 FUSE-based terFS [29], Data Domain’s DDFS [46], etc. file systems are readily available on the Web [36]. Al- Increased file systems complexity is a contributing though other, specialized implementations of user-space factor to user-space file systems’ growing popularity file systems exist [30,32,42], we selected FUSE for this (e.g., Btrfs is over 85 KLoC). User space code is eas- study because of its high popularity. ier to develop, port, and maintain. Kernel bugs can crash Although many file systems were implemented using USENIX Association 15th USENIX Conference on File and Storage Technologies 59 FUSE—thanks mainly to the simple API it provides— tem, the daemon reads or writes from the block device. little work was done on understanding its internal ar- When done with processing the request, the FUSE dae- chitecture, implementation, and performance [27]. For mon writes the response back to /dev/fuse; FUSE’s our evaluation it was essential to understand not only kernel driver then marks the request as completed and FUSE’s high-level design but also some details of its im- wakes up the original user process. plementation. In this section we first describe FUSE’s Some file system operations invoked by an application basics and then we explain certain important implemen- can complete without communicating with the user-level tation details. FUSE is available for several OSes: we FUSE daemon. For example, reads from a file whose selected Linux due to its wide-spread use. We analyzed pages are cached in the kernel page cache, do not need the code of and ran experiments on the latest stable ver- to be forwarded to the FUSE driver. sion of the Linux kernel available at the beginning of the project—v4.1.13. We also used FUSE library commit 2.2 Implementation Details #386b1b; on top of FUSE v2.9.4, this commit contains We now discuss several important FUSE implemen- several important patches which we did not want exclude tation details: the user-kernel protocol, library and from our evaluation. We manually examined all new API levels, in-kernel FUSE queues, splicing, multi- commits up to the time of this writing and confirmed threading, and write-back cache. that no new major features or improvements were added to FUSE since the release of the selected versions. Group (#) Request Types Special (3) INIT, DESTROY, INTERRUPT 2.1 High-Level Architecture Metadata (14) LOOKUP, FORGET, BATCH FORGET, CREATE, UNLINK, LINK, RENAME, RE- FUSE file−system daemon NAME2, OPEN, RELEASE, STATFS, Application FSYNC FLUSH ACCESS FUSE library , , User Data (2) READ, WRITE Attributes (2) GETATTR, SETATTR cache /dev/fuse VFS Other kernel Extended SETXATTR, GETXATTR, Q FUSE u subsystems Attributes (4) LISTXATTR, REMOVEXATTR e Kernel Kernel−based driver u Symlinks (2) SYMLINK, READLINK file system e Directory (7) MKDIR, RMDIR, OPENDIR, RE- Figure 1: FUSE high-level architecture. LEASEDIR, READDIR, READDIRPLUS, FUSE consists of a kernel part and a user-level dae- FSYNCDIR Locking (3) GETLK, SETLK, SETLKW mon. The kernel part is implemented as a Linux kernel Misc (6) BMAP, FALLOCATE, MKNOD, IOCTL, module that, when loaded, registers a fuse file-system POLL, NOTIFY REPLY driver with Linux’s VFS. This Fuse driver acts as a proxy for various specific file systems implemented by differ- Table 1: FUSE request types, by group (whose size is in paren- ent user-level daemons. thesis). Requests we discuss in the text are in bold. In addition to registering a new file system, FUSE’s User-kernel protocol. When FUSE’s kernel driver kernel module also registers a /dev/fuse block de- communicates to the user-space daemon, it forms a vice. This device serves as an interface between user- FUSE request structure. Requests have different types space FUSE daemons and the kernel. In general, dae- depending on the operation they convey. Table 1 lists all mon reads FUSE requests from /dev/fuse, processes 43 FUSE request types, grouped by their semantics. As them, and then writes replies back to /dev/fuse. seen, most requests have a direct mapping to traditional Figure 1 shows FUSE’s high-level architecture. When VFS operations: we omit discussion of obvious requests a user application performs some operation on a (e.g., READ, CREATE) and instead next focus on those mounted FUSE file system, the VFS routes the operation less intuitive request types (marked bold in Table 1). to FUSE’s kernel driver. The driver allocates a FUSE The INIT request is produced by the kernel when a request structure and puts it in a FUSE queue.

To FUSE Or Not to FUSE: Performance of User-Space File Systems

Optimizing the Ceph Distributed File System for High Performance Computing

Proxmox Ve Mit Ceph &

Red Hat Openstack* Platform with Red Hat Ceph* Storage

The Google File System (GFS)

A Review on GOOGLE File System

Gluster Roadmap: Recent Improvements and Upcoming Features

Storage Virtualization for KVM – Putting the Pieces Together

Red Hat Data Analytics Infrastructure Solution

Glusterfs Documentation Release 3.8.0

F1 Query: Declarative Querying at Scale

A Study of Cryptographic File Systems in Userspace

Unlock Bigdata Analytic Efficiency with Ceph Data Lake