Terra Incognita: On the Practicality of User-Space File Systems


Vasily Tarasov†∗, Abhishek Gupta†, Kumar Sourav†, Sagar Trehan†+, and Erez Zadok†
†Stony Brook University, ∗IBM Research—Almaden, +Nimble Storage

Abstract

To speed up development and increase reliability, the microkernel approach advocated moving many OS services to user space. At that time, the main disadvantage of microkernels turned out to be their poor performance. In the last two decades, however, CPU and RAM technologies have improved significantly, and researchers demonstrated that by carefully designing and implementing a microkernel, its overhead can be reduced significantly. Storage devices often remain a major bottleneck in systems due to their relatively slow speed. Thus, user-space I/O services, such as file systems and the block layer, might see significantly lower relative overhead than other OS services. In this paper we examine the reality of a partial return of the microkernel architecture—but for I/O subsystems only. We observed that over 100 user-space file systems have been developed in recent years. However, performance analysis and careful design of user-space file systems were disproportionately overlooked by the storage community. Through extensive benchmarks we present Linux FUSE performance for several systems and 45 workloads. We establish that in many setups FUSE already achieves acceptable performance, but further research is needed for file systems to comfortably migrate to user space.

1 Introduction

Modern general-purpose OSes, such as Unix/Linux and Windows, lean heavily towards the monolithic kernel architecture. In the 1980s, when the monolithic kernel approach became a development bottleneck, the idea of microkernels arose [1, 15]. Microkernels offer only a limited number of services to user applications: process scheduling, virtual memory management, and Inter-Process Communication (IPC). The rest of the services, including file systems, were provided by user-space daemons. Despite their benefits, microkernels did not succeed at first due to high performance overheads caused by IPC message copying, an excessive number of system call invocations, and more.

Computational power has improved significantly since 1994, when the acclaimed Mach 3 microkernel project officially ended. The 1994 Intel Pentium performed 190 MIPS; the latest Intel Core i7 achieves 177,000 MIPS—almost a 1,000× improvement. Similarly, cache sizes for these CPUs increased 200× (16KB vs. 3MB). A number of CPU optimizations were added to improve processing speeds, e.g., intelligent prefetchers and a special syscall instruction. Finally, the average number of cores per system is up from one to dozens.

At the same time, three factors dominated the storage space. First, HDDs, which still account for most of the storage devices shipped, have improved their performance by only 10× since 1994. Second, dataset sizes increased and data-access patterns became less predictable due to the higher complexity of the modern I/O stack [19]. Thus, many modern workloads bottleneck on device I/O; consequently, OS services that involve I/O operations might not pay as high a performance penalty for CPU-hungry IPC as before. However, the third factor—the advent of Flash-based storage—suggests the contrary. Modern SSDs are over 10,000× faster than 1994 HDDs; consequently, OS I/O services might add more relative overhead than before [5]. As such, there is no clear understanding of how the balance between CPU and storage performance has shifted and impacted the overheads of user-space I/O services.

In the 90s, Liedtke et al. demonstrated that by carefully designing and implementing a microkernel with performance as a key feature, microkernel overheads could be reduced significantly [7]. Alas, this proof came too late to impact server and desktop OSes, which had comfortably settled into the monolithic approach. Today, microkernels such as the L4 family are used widely, but primarily in embedded applications [10]. We believe that the same techniques that allow L4 to achieve high performance in embedded systems can be used to migrate the I/O stack of general-purpose OSes to user space.

File systems and the block layer are the two main I/O-stack OS services. File systems are complex pieces of software with many lines of code and numerous implementations. Linux 3.6 has over 70 file systems, consuming almost 30% of all kernel code (excluding architecture-specific code and drivers). Software-Defined Storage (SDS) paradigms suggest moving even more storage-related functionality to the OS. Maintaining such a large code base in a kernel is difficult; moving it to user space would simplify this task significantly and increase kernel reliability, extensibility, and security. In recent years we have observed a rapid growth of user-space file systems: e.g., over 100 file systems are implemented using FUSE [18]. Initially, FUSE-based file systems offered a common interface to diverse user data (e.g., archive files, FTP servers). Nowadays, however, even traditional file systems are implemented using FUSE [21]. Many enterprise distributed file systems, such as PLFS and GPFS, are implemented in user space [4].
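To make the FUSE programming model concrete, here is a minimal read-only file system in the spirit of libfuse's classic "hello" example, written against the high-level FUSE 2.x API. This is an illustrative sketch rather than code from the paper; the build command assumes the libfuse 2 development package is installed.

```c
/* hello.c: a minimal FUSE file system exposing one read-only file.
 * Illustrative sketch against the high-level FUSE 2.x API.
 * Build (assuming libfuse 2 is installed):
 *   gcc -Wall hello.c `pkg-config fuse --cflags --libs` -o hello
 */
#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <string.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/stat.h>

static const char *hello_path = "/hello";
static const char *hello_str  = "Hello from user space!\n";

/* Report file attributes: "/" is a directory, "/hello" a read-only file. */
static int hello_getattr(const char *path, struct stat *st)
{
    memset(st, 0, sizeof(*st));
    if (strcmp(path, "/") == 0) {
        st->st_mode  = S_IFDIR | 0755;
        st->st_nlink = 2;
    } else if (strcmp(path, hello_path) == 0) {
        st->st_mode  = S_IFREG | 0444;
        st->st_nlink = 1;
        st->st_size  = strlen(hello_str);
    } else {
        return -ENOENT;
    }
    return 0;
}

/* List the single file in the root directory. */
static int hello_readdir(const char *path, void *buf, fuse_fill_dir_t filler,
                         off_t offset, struct fuse_file_info *fi)
{
    (void) offset; (void) fi;
    if (strcmp(path, "/") != 0)
        return -ENOENT;
    filler(buf, ".", NULL, 0);
    filler(buf, "..", NULL, 0);
    filler(buf, hello_path + 1, NULL, 0); /* skip the leading '/' */
    return 0;
}

static int hello_open(const char *path, struct fuse_file_info *fi)
{
    if (strcmp(path, hello_path) != 0)
        return -ENOENT;
    if ((fi->flags & O_ACCMODE) != O_RDONLY)
        return -EACCES;
    return 0;
}

/* Copy the requested byte range of the in-memory string. */
static int hello_read(const char *path, char *buf, size_t size, off_t offset,
                      struct fuse_file_info *fi)
{
    (void) fi;
    size_t len = strlen(hello_str);
    if (strcmp(path, hello_path) != 0)
        return -ENOENT;
    if ((size_t) offset >= len)
        return 0;
    if (offset + size > len)
        size = len - offset;
    memcpy(buf, hello_str + offset, size);
    return size;
}

static struct fuse_operations hello_ops = {
    .getattr = hello_getattr,
    .readdir = hello_readdir,
    .open    = hello_open,
    .read    = hello_read,
};

int main(int argc, char *argv[])
{
    /* fuse_main() mounts the file system and dispatches kernel VFS
     * requests to the callbacks above, all in ordinary user space. */
    return fuse_main(argc, argv, &hello_ops, NULL);
}
```

Mounted with `./hello /mnt/hello`, it serves `cat /mnt/hello/hello` entirely from user space, and `fusermount -u /mnt/hello` unmounts it; every callback can be run under gdb or valgrind like any other program.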
Despite the evident renaissance of user-space file systems, performance analysis and careful design of FUSE-like frameworks were largely overlooked by the storage community. In this position paper, we evaluate user-space file systems' performance. Our results indicate that for many workloads and devices, user-space file systems can achieve acceptable performance already now. However, for some setups, the FUSE layer can still be a significant bottleneck. Therefore, we propose to jump-start research on user-space file systems. Arguably, with the right amount of effort, the entire I/O stack, including block layers and drivers, can be effectively moved to user space in the future.

2 Benefits of User-Space File Systems

Historically, most file systems were implemented in the OS kernel, excluding some distributed file systems that needed greater portability. When developing a new file system today, one common assumption is that to achieve production-level performance, the code should be written in the kernel. Moving file system development to user space would require a significant change in the community's mindset. The benefits must be compelling: low overheads, if any, and significantly improved portability, maintainability, and versatility.

Development ease. The comfort of software development (and consequently its speed) is largely determined by the completeness of a programmer's toolbox. Numerous user-space tools for debugging, tracing, profiling, and testing are available to user-level developers. A bug does not crash or corrupt the state of the whole system, but stops only a single program; this also often produces an easily debuggable error message. After the bug is identified and fixed, restarting the file system is as easy as restarting any other user application. Programmers are not limited to a single programming language (usually C or C++), but can use and mix any programming and scripting languages as desired. A large number of useful software libraries are readily available.

Reliability and security. By reducing the amount of code running in the kernel, one reduces the chances that a kernel bug crashes an entire production system. In recent years, malicious attacks on kernels have become more frequent than attacks on user applications. As the main reason for this trend, Kermerlis et al. list the fact that ap- […]

Performance. The common thinking that user-space file systems cannot be faster than their kernel counterparts mistakenly assumes that both use the same algorithms and data structures. However, an abundance of libraries is available in user space, where it is easier to use and try new, more efficient algorithms. For example, predictive prefetching can use AI algorithms to adapt to a specific user's workload; cache management can use classification libraries to implement better eviction policies. Porting such libraries to the kernel would be daunting.
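As a toy illustration of the kind of policy experimentation the paragraph above has in mind, the following self-contained program sketches a first-order Markov predictor over block numbers, about the simplest form of history-based prefetching. It is a hypothetical example, not code from the paper; the block-address space and the sample trace are arbitrary.

```c
/* predictor.c: toy first-order Markov prefetch predictor.
 * Hypothetical illustration; block numbers and trace are arbitrary.
 * Build: gcc -Wall predictor.c -o predictor
 */
#include <stdio.h>

#define NBLOCKS 16 /* toy block-address space */

/* next_seen[cur][nxt] counts how often block nxt followed block cur. */
static int next_seen[NBLOCKS][NBLOCKS];
static int last_block = -1;

/* Record one observed block access. */
static void observe(int block)
{
    if (last_block >= 0)
        next_seen[last_block][block]++;
    last_block = block;
}

/* Return the most frequent successor of 'block', or -1 if unknown. */
static int predict(int block)
{
    int best = -1, best_count = 0;
    for (int nxt = 0; nxt < NBLOCKS; nxt++) {
        if (next_seen[block][nxt] > best_count) {
            best_count = next_seen[block][nxt];
            best = nxt;
        }
    }
    return best;
}

int main(void)
{
    int trace[] = { 1, 2, 3, 1, 2, 3, 1, 2 }; /* a repeating access pattern */
    int n = sizeof(trace) / sizeof(trace[0]);

    for (int i = 0; i < n; i++) {
        observe(trace[i]);
        int guess = predict(trace[i]);
        if (guess >= 0)
            printf("after block %d, prefetch block %d\n", trace[i], guess);
    }
    return 0;
}
```

Inside a user-space file system, observe() would run on every read and the prediction would be handed to a readahead thread; swapping in a smarter model, or a full machine-learning library, is far easier in user space than in the kernel, which is precisely the argument above.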
3 Historical Background

The original microkernel-based OSes implemented file systems as user-space services. This approach was logically continued by modern microkernels: e.g., GNU Hurd supports ufs, ext2, isofs, nfs, and ftpfs file servers [8]. In this case, implementing file systems as user processes was just part of a general paradigm.

Starting from the 1990s, projects developing user-space file systems as part of monolithic kernels began to appear. These endeavors can be roughly classified into two types: specialized solutions and general frameworks. The first type includes designs in which some specific file system was completely or partially implemented in user space, without providing a general approach for writing user-space file systems. E.g., Steere et al. proposed to move the Coda file system's cache management—the most complex part of the system—to user space [17]. Another example of a specialized solution is Arla, an open-source implementation of AFS [20]. Only 10% of Arla's code is located in the kernel, which allowed it to be ported to many OSes: Solaris, FreeBSD, Linux, AIX, Windows, and others.

The second type of solutions—general frameworks—includes designs that focus explicitly on expanding OS functionality by building full-fledged frameworks for creating user-space file systems [2, 3, 12]. The original intention behind such frameworks was to enable programs to look inside archived files, access remote files over FTP, and handle similar use cases—all without modifying the programs. Over time, however, it became clear that there were many more use cases for user-space file system frameworks. E.g., one can extend the functionality of an existing file system by stacking a user-space file system on top.
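As a sketch of that stacking idea (an illustrative construction, not a design taken from the paper), the fragment below layers a read-only, logging pass-through on top of an existing directory tree, using the same FUSE 2.x API as the earlier example; the backing path /tmp/lower is hypothetical.

```c
/* stackfs.c: read-only logging pass-through, sketching the stacking idea.
 * Illustrative only; the lower-layer path /tmp/lower is hypothetical.
 * Build: gcc -Wall stackfs.c `pkg-config fuse --cflags --libs` -o stackfs
 */
#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <stdio.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

static const char *lower = "/tmp/lower"; /* hypothetical backing directory */

/* Map a path in this file system to the corresponding lower-layer path. */
static void to_lower(char *out, size_t n, const char *path)
{
    snprintf(out, n, "%s%s", lower, path);
}

static int st_getattr(const char *path, struct stat *st)
{
    char p[4096];
    to_lower(p, sizeof(p), path);
    return lstat(p, st) == -1 ? -errno : 0;
}

static int st_open(const char *path, struct fuse_file_info *fi)
{
    char p[4096];
    to_lower(p, sizeof(p), path);
    int fd = open(p, fi->flags);
    if (fd == -1)
        return -errno;
    fi->fh = fd; /* stash the lower-layer descriptor */
    return 0;
}

/* The "added functionality": log every read, then delegate downward. */
static int st_read(const char *path, char *buf, size_t size, off_t off,
                   struct fuse_file_info *fi)
{
    fprintf(stderr, "stacked layer: read %zu bytes at %lld from %s\n",
            size, (long long) off, path);
    ssize_t n = pread((int) fi->fh, buf, size, off);
    return n == -1 ? -errno : (int) n;
}

static int st_release(const char *path, struct fuse_file_info *fi)
{
    (void) path;
    close((int) fi->fh);
    return 0;
}

static struct fuse_operations st_ops = {
    .getattr = st_getattr,
    .open    = st_open,
    .read    = st_read,
    .release = st_release,
};

int main(int argc, char *argv[])
{
    return fuse_main(argc, argv, &st_ops, NULL);
}
```

A production stacked file system would forward the remaining operations (readdir, write, and so on); the point is that the added functionality lives entirely in user space and never touches kernel code.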