FiST: A System for Stackable File-System Code Generation

Erez Zadok

Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in the Graduate School of Arts and Sciences

COLUMBIA UNIVERSITY

May, 2001

© 2001 Erez Zadok
All Rights Reserved

ABSTRACT

FiST: A System for Stackable File-System Code Generation

Erez Zadok

File systems often need to evolve and require many changes to support new features. Traditional file-system development is difficult because most of the work is done in the kernel, a hostile development environment where progress is slow, debugging is difficult, and simple mistakes can crash systems. Kernel work also requires deep understanding of system internals, resulting in developers spending a lot of time becoming familiar with the system’s details. Furthermore, any file system written for one system requires significant effort to port to another system.

Stackable file systems promise to ease the development of file systems by offering a mechanism for incremental development building on existing file systems. Unfortunately, existing stacking methods often require writing complex low-level kernel code that is specific to a single operating system platform and also difficult to port.

We propose a new language, FiST, to describe stackable file systems. FiST uses operations common to file-system interfaces and familiar to user-level developers: creating a directory, reading a file, removing a file, listing the contents of a directory, etc. From a single description, FiST’s compiler produces file-system modules for multiple platforms. FiST does that with the assistance of platform-specific stackable templates. The templates handle many of the internal details of operating systems, and free developers from dealing with these internals. The templates support many features: data copying and file name copying useful for applications that want to modify them; size-changing file systems such as compression; fan-out for access to multiple file systems from one layer; and more. The FiST language compiler uses the templates as a basis for producing code for a new file system, by inserting, removing, or modifying code in the templates.

This dissertation describes the design, implementation, and evaluation of FiST. Our thesis is that it is possible to extend file system functionality in a portable way without changing existing kernels. This is possible because the FiST language uses file-system functions that are common across many systems, while the templates execute in-kernel operating systems specific functions unchanged.

We built several file systems using FiST on Solaris, FreeBSD, and Linux. Our experiences with these file systems show the following benefits: average code size is reduced ten times as compared to writing code given another null-layer stackable file system; average development time is reduced seven times compared to writing using another null-layer stackable file system; performance overhead of stacking is only 1–2% per layer.

Contents

List of Tables
List of Figures

Chapter 1 Introduction
  1.1 Our Approach
  1.2 Contributions
  1.3 Thesis Organization

Chapter 2 Background
  2.1 Evolution of File Systems Development
    2.1.1 Native File Systems
    2.1.2 User-Level File Systems
    2.1.3 The Vnode Interface
    2.1.4 A Stackable Vnode Interface
      2.1.4.1 First Stacking Interfaces
      2.1.4.2 Fanning in Stackable File Systems
      2.1.4.3 Interposition and Composition
      2.1.4.4 4.4 BSD’s Nullfs
      2.1.4.5 Programmed Logic Corp.’s StackFS
    2.1.5 HURD
      2.1.5.1 How to Write a Translator
    2.1.6 Plan 9
      2.1.6.1 Inferno
    2.1.7 Spring
    2.1.8 Windows NT
    2.1.9 Other Extensible File-System Efforts
      2.1.9.1 Compression Support
    2.1.10 Domain Specific Languages
  2.2 File Systems Development Taxonomy

Chapter 3 Design Overview
  3.1 Layers of Abstraction
  3.2 FiST Templates and Code Generator
  3.3 The Development Process
    3.3.1 Developing From Scratch
    3.3.2 Developing Using Existing Stacking
    3.3.3 Developing Using FiST
  3.4 The FiST Programming Model
  3.5 The File-System Model

Chapter 4 The FiST Language
  4.1 Overview of the FiST Input File
  4.2 FiST Syntax
  4.3 Rules for Controlling Execution and Information Flow
  4.4 Filter Declarations and Filter Functions
  4.5 Fistgen: The FiST Language Code Generator

Chapter 5 Stackable Templates
  5.1 Overview of the Basefs Templates
  5.2 Manipulating Files
  5.3 Encoding and Decoding File Data Pages
    5.3.1 Paged Reading and Writing
      5.3.1.1 Appending to Files
    5.3.2 Memory Mapping
    5.3.3 Interaction Between Caches
  5.4 Encoding and Decoding File Names
  5.5 Error Codes

Chapter 6 Support for Size-Changing File Systems
  6.1 Size-Changing Algorithms
  6.2 Overview of Support for Size-Changing Stackable File Systems
  6.3 The Index File
    6.3.1 File Operations
      6.3.1.1 Fast Tails
      6.3.1.2 Write in the Middle
      6.3.1.3 Truncate
    6.3.2 Additional Benefits of the Index File
      6.3.2.1 Low Resource Usage
      6.3.2.2 Index File Consistency
  6.4 Summary

Chapter 7 Implementation
  7.1 Templates
    7.1.1 Stacking
    7.1.2 FreeBSD
    7.1.3 Linux
      7.1.3.1 Call Sequence and Existence
      7.1.3.2 Data Structures
  7.2 Size-Changing Algorithms
  7.3 Fistgen

Chapter 8 File Systems Developed Using FiST
  8.1 Cryptfs
  8.2 Aclfs
  8.3 Unionfs
  8.4 Copyfs
  ...