A Study of Linux File System Evolution Lanyue Lu, Andrea C

Total Page:16

File Type:pdf, Size:1020Kb

A Study of Linux File System Evolution Lanyue Lu, Andrea C A Study of Linux File System Evolution Lanyue Lu, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau, Shan Lu Motivation How We Studied ? What We Did ? Local file systems are important File systems are evolving over time Our method: manual inspection • Desktop and laptop: Windows, Mac, Linux • Code base is not static • XFS, Ext4, Btrfs, Ext3, Reiserfs and JFS • Data center / Cloud: Google FS, Hadoop DFS • New features, bug-fixing, performance, reliability • All Linux 2.6 major versions • Mobile devices: Andriod, iPhone • 5079 patches, multiple passes Patches describe such evolution Why study is useful • How one version transforms to the next Quantitatively analyze file systems • Study drives system designs • Every patch is available • Patch types, bug patterns, bug consequences • Answer important questions quantitatively • “System software archeology” is possible • Performance improvement • Valuable for different communities • Reliability enhancement • File system developers Study with other rich information • System researchers • Source code, design documents Provide an annotated dataset • Tool builders • Forum, mailing list • Rich data for further analysis Key Questions 1 2 Patch Overview Bug Pattern 2004 1154 809 537 384 191 5079 511 450 358 229 158 80 1786 Q1: What do patches do ? 100% 100% Q2: What do bugs look like ? 80% Maintenance 80% Error Code Feature Q3: Do bugs diminish over time ? 60% 60% Memory Reliability Q4: What consequences do bugs have ? 40% 40% Concurrency Performance Semantic 20% 20% Q5: Where does complexity lie ? Bug 0% 0% Q6: Do bugs occur on normal paths or XFS Ext4 Btrfs Ext3 Reiser JFS All XFS Ext4 Btrfs Ext3 Reiser JFS All failure paths ? A1: 45% of patches are maintenance patches; A2: Semantic bugs dominate other types (about Q7: What performance techniques are 35% of patches are bug fixings. Even stable file 55% of total bugs). Concurrency bugs account systems, such as Ext3, have a large percentage for about 20% (higher than user-level software, used across file systems ? of bug patches. which has about 3% of concurrency bugs) 3 Bug Trend 4 Bug Consequence 5 Correlation Semantic Concurrency Memory Error Code file inode super trans tree 525 461 366 235 166 80 1833 100% balloc dir extent other 40 XFS Ext4 80 Btrfs 0.4 0.4 0.4 40 XFS Ext4 Btrfs 30 60 Wrong 0.3 0.3 0.3 30 80% 20 40 Leak 0.2 0.2 0.2 20 10 10 20 60% Hang 0.1 0.1 0.1 0 0 0 0.0 0.0 0.0 0 10 20 30 40 0 10 20 30 40 0 10 20 30 40 Deadlock 0.0 0.1 0.2 0.3 0.4 0.0 0.1 0.2 0.3 0.4 0.0 0.1 0.2 0.3 0.4 40% Ext3 ReiserFS JFS Error 15 40 10 0.4 Ext3 0.4 ReiserFS 0.4 JFS Number of Bugs 30 Crash 0.3 0.3 0.3 10 20% Percentage of Bugs 20 5 Corruption 0.2 0.2 0.2 5 10 0% 0.1 0.1 0.1 XFS Ext4 Btrfs Ext3 Reiser JFS All 0 0 0 0.0 0.0 0.0 0 10 20 30 40 0 10 20 30 40 0 10 20 30 40 0.0 0.1 0.2 0.3 0.4 0.0 0.1 0.2 0.3 0.4 0.0 0.1 0.2 0.3 0.4 Linux Version A4: Corruption and crash are most common, Percentage of Code A3: The total number of bugs do not diminish (about 40% and 20%). Wrong behavior accounts A5: Metadata management has high bug over time (even for stable file systems), rather for only 5% to 10%, while it is dominant in user- density (e.g., inode and super block). Tree ebbing and flowing over time. level applications. structures are not particularly error-prone. 6 Failure Path 7 Performance Conclusions 200 149 144 88 63 28 672 135 80 126 41 24 9 415 100% 100% A large-scale study is feasible Other • Time consuming, but manageable 80% 80% Locality • Similar study for other OS components 60% Scalability 60% Research should match reality Scheduling 40% 40% • New tools are required for semantic bugs Access Opt 20% • More attention for failure paths 20% Sync 0% 0% XFS Ext4 Btrfs Ext3 Reiser JFS All History repeats itself XFS Ext4 Btrfs Ext3 Reiser JFS All • Same mistakes, same performance improvement A6: 38% of bugs are on failure paths. Common A7: A wide variety of performance techniques are bug examples: wrong or miss state update used across file systems. Performance likely justifies Dataset is available (semantic); miss unlock (concurrency); resource the use of more complicated and time saving • http://research.cs.wisc.edu/adsl/Traces/fs-patch/ leak, null dereference (memory). synchronization methods. • More data in our paper and dataset Sunday, 10 February 2013.
Recommended publications
  • Study of File System Evolution
    Study of File System Evolution Swaminathan Sundararaman, Sriram Subramanian Department of Computer Science University of Wisconsin {swami, srirams} @cs.wisc.edu Abstract File systems have traditionally been a major area of file systems are typically developed and maintained by research and development. This is evident from the several programmer across the globe. At any point in existence of over 50 file systems of varying popularity time, for a file system, there are three to six active in the current version of the Linux kernel. They developers, ten to fifteen patch contributors but a single represent a complex subsystem of the kernel, with each maintainer. These people communicate through file system employing different strategies for tackling individual file system mailing lists [14, 16, 18] various issues. Although there are many file systems in submitting proposals for new features, enhancements, Linux, there has been no prior work (to the best of our reporting bugs, submitting and reviewing patches for knowledge) on understanding how file systems evolve. known bugs. The problems with the open source We believe that such information would be useful to the development approach is that all communication is file system community allowing developers to learn buried in the mailing list archives and aren’t easily from previous experiences. accessible to others. As a result when new file systems are developed they do not leverage past experience and This paper looks at six file systems (Ext2, Ext3, Ext4, could end up re-inventing the wheel. To make things JFS, ReiserFS, and XFS) from a historical perspective worse, people could typically end up doing the same (between kernel versions 1.0 to 2.6) to get an insight on mistakes as done in other file systems.
    [Show full text]
  • W4118: Linux File Systems
    W4118: Linux file systems Instructor: Junfeng Yang References: Modern Operating Systems (3rd edition), Operating Systems Concepts (8th edition), previous W4118, and OS at MIT, Stanford, and UWisc File systems in Linux Linux Second Extended File System (Ext2) . What is the EXT2 on-disk layout? . What is the EXT2 directory structure? Linux Third Extended File System (Ext3) . What is the file system consistency problem? . How to solve the consistency problem using journaling? Virtual File System (VFS) . What is VFS? . What are the key data structures of Linux VFS? 1 Ext2 “Standard” Linux File System . Was the most commonly used before ext3 came out Uses FFS like layout . Each FS is composed of identical block groups . Allocation is designed to improve locality inodes contain pointers (32 bits) to blocks . Direct, Indirect, Double Indirect, Triple Indirect . Maximum file size: 4.1TB (4K Blocks) . Maximum file system size: 16TB (4K Blocks) On-disk structures defined in include/linux/ext2_fs.h 2 Ext2 Disk Layout Files in the same directory are stored in the same block group Files in different directories are spread among the block groups Picture from Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc. All rights reserved. 0-13-6006639 3 Block Addressing in Ext2 Twelve “direct” blocks Data Data BlockData Inode Block Block BLKSIZE/4 Indirect Data Data Blocks BlockData Block Data (BLKSIZE/4)2 Indirect Block Data BlockData Blocks Block Double Block Indirect Indirect Blocks Data Data Data (BLKSIZE/4)3 BlockData Data Indirect Block BlockData Block Block Triple Double Blocks Block Indirect Indirect Data Indirect Data BlockData Blocks Block Block Picture from Tanenbaum, Modern Operating Systems 3 e, (c) 2008 Prentice-Hall, Inc.
    [Show full text]
  • Certifying a File System
    Certifying a file system: Correctness in the presence of crashes Tej Chajed, Haogang Chen, Stephanie Wang, Daniel Ziegler, Adam Chlipala, Frans Kaashoek, and Nickolai Zeldovich MIT CSAIL 1 / 28 New file systems (and bugs) are introduced over time Some bugs are serious: security exploits, data loss, etc. File systems are complex and have bugs File systems are complex (e.g., Linux ext4 is ∼60,000 lines of code) and have many bugs: 500 ext3 400 300 200 100 # patches for bugs 0 Jan'04 Jan'05 Jan'06 Jan'07 Jan'08 Jan'09 Jan'10 Jan'11 Cumulative number of patches for file-system bugs in Linux; data from [Lu et al., FAST’13] 2 / 28 Some bugs are serious: security exploits, data loss, etc. File systems are complex and have bugs File systems are complex (e.g., Linux ext4 is ∼60,000 lines of code) and have many bugs: 500 ext3 400 ext4 xfs 300 reiserfs 200 jfs btrfs 100 # patches for bugs 0 Jan'04 Jan'05 Jan'06 Jan'07 Jan'08 Jan'09 Jan'10 Jan'11 Cumulative number of patches for file-system bugs in Linux; data from [Lu et al., FAST’13] New file systems (and bugs) are introduced over time 2 / 28 File systems are complex and have bugs File systems are complex (e.g., Linux ext4 is ∼60,000 lines of code) and have many bugs: 500 ext3 400 ext4 xfs 300 reiserfs 200 jfs btrfs 100 # patches for bugs 0 Jan'04 Jan'05 Jan'06 Jan'07 Jan'08 Jan'09 Jan'10 Jan'11 Cumulative number of patches for file-system bugs in Linux; data from [Lu et al., FAST’13] New file systems (and bugs) are introduced over time Some bugs are serious: security exploits, data loss, etc.
    [Show full text]
  • XFS: There and Back ...And There Again? Slide 1 of 38
    XFS: There and Back.... .... and There Again? Dave Chinner <[email protected]> <[email protected]> XFS: There and Back .... and There Again? Slide 1 of 38 Overview • Story Time • Serious Things • These Days • Shiny Things • Interesting Times XFS: There and Back .... and There Again? Slide 2 of 38 Story Time • Way back in the early '90s • Storage exceeding 32 bit capacities • 64 bit CPUs, large scale MP • Hundreds of disks in a single machine • XFS: There..... Slide 3 of 38 "x" is for Undefined xFS had to support: • Fast Crash Recovery • Large File Systems • Large, Sparse Files • Large, Contiguous Files • Large Directories • Large Numbers of Files • - Scalability in the XFS File System, 1995 http://oss.sgi.com/projects/xfs/papers/xfs_usenix/index.html XFS: There..... Slide 4 of 38 The Early Years XFS: There..... Slide 5 of 38 The Early Years • Late 1994: First Release, Irix 5.3 • Mid 1996: Default FS, Irix 6.2 • Already at Version 4 • Attributes • Journalled Quotas • link counts > 64k • feature masks • • XFS: There..... Slide 6 of 38 The Early Years • • Allocation alignment to storage geometry (1997) • Unwritten extents (1998) • Version 2 directories (1999) • mkfs time configurable block size • Scalability to tens of millions of directory entries • • XFS: There..... Slide 7 of 38 What's that Linux Thing? • Feature development mostly stalled • Irix development focussed on CXFS • New team formed for Linux XFS port! • Encumberance review! • Linux was missing lots of bits XFS needed • Lot of work needed • • XFS: There and..... Slide 8 of 38 That Linux Thing? XFS: There and..... Slide 9 of 38 Light that fire! • 2000: SGI releases XFS under GPL • • 2001: First stable XFS release • • 2002: XFS merged into 2.5.36 • • JFS follows similar timeline • XFS: There and....
    [Show full text]
  • Filesystem Considerations for Embedded Devices ELC2015 03/25/15
    Filesystem considerations for embedded devices ELC2015 03/25/15 Tristan Lelong Senior embedded software engineer Filesystem considerations ABSTRACT The goal of this presentation is to answer a question asked by several customers: which filesystem should you use within your embedded design’s eMMC/SDCard? These storage devices use a standard block interface, compatible with traditional filesystems, but constraints are not those of desktop PC environments. EXT2/3/4, BTRFS, F2FS are the first of many solutions which come to mind, but how do they all compare? Typical queries include performance, longevity, tools availability, support, and power loss robustness. This presentation will not dive into implementation details but will instead summarize provided answers with the help of various figures and meaningful test results. 2 TABLE OF CONTENTS 1. Introduction 2. Block devices 3. Available filesystems 4. Performances 5. Tools 6. Reliability 7. Conclusion Filesystem considerations ABOUT THE AUTHOR • Tristan Lelong • Embedded software engineer @ Adeneo Embedded • French, living in the Pacific northwest • Embedded software, free software, and Linux kernel enthusiast. 4 Introduction Filesystem considerations Introduction INTRODUCTION More and more embedded designs rely on smart memory chips rather than bare NAND or NOR. This presentation will start by describing: • Some context to help understand the differences between NAND and MMC • Some typical requirements found in embedded devices designs • Potential filesystems to use on MMC devices 6 Filesystem considerations Introduction INTRODUCTION Focus will then move to block filesystems. How they are supported, what feature do they advertise. To help understand how they compare, we will present some benchmarks and comparisons regarding: • Tools • Reliability • Performances 7 Block devices Filesystem considerations Block devices MMC, EMMC, SD CARD Vocabulary: • MMC: MultiMediaCard is a memory card unveiled in 1997 by SanDisk and Siemens based on NAND flash memory.
    [Show full text]
  • Filesystems HOWTO Filesystems HOWTO Table of Contents Filesystems HOWTO
    Filesystems HOWTO Filesystems HOWTO Table of Contents Filesystems HOWTO..........................................................................................................................................1 Martin Hinner < [email protected]>, http://martin.hinner.info............................................................1 1. Introduction..........................................................................................................................................1 2. Volumes...............................................................................................................................................1 3. DOS FAT 12/16/32, VFAT.................................................................................................................2 4. High Performance FileSystem (HPFS)................................................................................................2 5. New Technology FileSystem (NTFS).................................................................................................2 6. Extended filesystems (Ext, Ext2, Ext3)...............................................................................................2 7. Macintosh Hierarchical Filesystem − HFS..........................................................................................3 8. ISO 9660 − CD−ROM filesystem.......................................................................................................3 9. Other filesystems.................................................................................................................................3
    [Show full text]
  • Acronis® Disk Director® 12 User's Guide
    User Guide Copyright Statement Copyright © Acronis International GmbH, 2002-2015. All rights reserved. "Acronis", "Acronis Compute with Confidence", "Acronis Recovery Manager", "Acronis Secure Zone", Acronis True Image, Acronis Try&Decide, and the Acronis logo are trademarks of Acronis International GmbH. Linux is a registered trademark of Linus Torvalds. VMware and VMware Ready are trademarks and/or registered trademarks of VMware, Inc. in the United States and/or other jurisdictions. Windows and MS-DOS are registered trademarks of Microsoft Corporation. All other trademarks and copyrights referred to are the property of their respective owners. Distribution of substantively modified versions of this document is prohibited without the explicit permission of the copyright holder. Distribution of this work or derivative work in any standard (paper) book form for commercial purposes is prohibited unless prior permission is obtained from the copyright holder. DOCUMENTATION IS PROVIDED "AS IS" AND ALL EXPRESS OR IMPLIED CONDITIONS, REPRESENTATIONS AND WARRANTIES, INCLUDING ANY IMPLIED WARRANTY OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE OR NON-INFRINGEMENT, ARE DISCLAIMED, EXCEPT TO THE EXTENT THAT SUCH DISCLAIMERS ARE HELD TO BE LEGALLY INVALID. Third party code may be provided with the Software and/or Service. The license terms for such third-parties are detailed in the license.txt file located in the root installation directory. You can always find the latest up-to-date list of the third party code and the associated license terms used with the Software and/or Service at http://kb.acronis.com/content/7696 Acronis patented technologies Technologies, used in this product, are covered and protected by one or more U.S.
    [Show full text]
  • Journaling File Systems
    Linux Journaling File Systems Linux onzSeries Journaling File Systems Volker Sameske ([email protected]) Linux on zSeries Development IBM Lab Boeblingen, Germany Share Anaheim,California February27 –March 4,2005 Session 9257 ©2005 IBM Corporation Linux Journaling File Systems Agenda o File systems. • Overview, definitions. • Reliability, scalability. • File system features. • Common grounds & differences. o Volume management. • LVM, EVMS, MD. • Striping. o Measurement results. • Hardware/software setup. • throughput. • CPU load. 2 Session 9257 © 2005 IBM Corporation Linux Journaling File Systems A file system should... o ...store data o ...organize data o ...administrate data o ...organize data about the data o ...assure integrity o ...be able to recover integrity problems o ...provide tools (expand, shrink, check, ...) o ...be able to handle many and large files o ...be fast o ... 3 Session 9257 © 2005 IBM Corporation Linux Journaling File Systems File system-definition o Informally • The mechanism by which computer files are stored and organized on a storage device. o More formally, • A set of abstract data types that are necessary for the storage, hierarchical organization, manipulation, navigation, access and retrieval of data. 4 Session 9257 © 2005 IBM Corporation Linux Journaling File Systems Why a journaling file system? o Imagine your Linux system crashs while you are saving an edited file: • The system crashs after the changes have been written to disk à good crash • The system crashs before the changes have been written to disk à bad crash but bearable if you have an older version • The sytem crashs just in the moment your data will be written: à very bad crash your file could be corrupted and in worst case the file system could be corrupted à That‘s why you need a journal 5 Session 9257 © 2005 IBM Corporation Linux Journaling File Systems Somefilesystemterms o Meta data • "Data about the data" • File system internal data structure (e.g.
    [Show full text]
  • Model-Based Failure Analysis of Journaling File Systems
    Model-Based Failure Analysis of Journaling File Systems Vijayan Prabhakaran, Andrea C. Arpaci-Dusseau, and Remzi H. Arpaci-Dusseau University of Wisconsin, Madison Computer Sciences Department 1210, West Dayton Street, Madison, Wisconsin {vijayan, dusseau, remzi}@cs.wisc.edu Abstract To analyze such file systems, we develop a novel model- based fault-injection technique. Specifically, for the file We propose a novel method to measure the dependability system under test, we develop an abstract model of its up- of journaling file systems. In our approach, we build models date behavior, e.g., how it orders writes to disk to maintain of how journaling file systems must behave under different file system consistency. By using such a model, we can journaling modes and use these models to analyze file sys- inject faults at various “interesting” points during a file sys- tem behavior under disk failures. Using our techniques, we tem transaction, and thus monitor how the system reacts to measure the robustness of three important Linux journaling such failures. In this paper, we focus only on write failures file systems: ext3, Reiserfs and IBM JFS. From our anal- because file system writes are those that change the on-disk ysis, we identify several design flaws and correctness bugs state and can potentially lead to corruption if not properly present in these file systems, which can cause serious file handled. system errors ranging from data corruption to unmountable We use this fault-injection methodology to test three file systems. widely used Linux journaling file systems: ext3 [19], Reis- erfs [14] and IBM JFS [1].
    [Show full text]
  • Reliability, Transactions, Distributed Systems
    Important “ilities” CS162 • Availability: the probability that the system can accept and process Operating Systems and requests – Often measured in “nines” of probability. So, a 99.9% probability is Systems Programming considered “3-nines of availability” Lecture 20 – Key idea here is independence of failures • Durability: the ability of a system to recover data despite faults Reliability, Transactions – This idea is fault tolerance applied to data Distributed Systems – Doesn’t necessarily imply availability: information on pyramids was very durable, but could not be accessed until discovery of Rosetta Stone November 6th, 2017 • Reliability: the ability of a system or component to perform its Prof. Ion Stoica required functions under stated conditions for a specified period of time (IEEE definition) http://cs162.eecs.Berkeley.edu – Usually stronger than simply availability: means that the system is not only “up”, but also working correctly – Includes availability, security, fault tolerance/durability – Must make sure data survives system crashes, disk crashes, etc 11/6/17 CS162 © UCB Fall 2017 Lec 20.2 How to Make File System Durable? RAID: Redundant Arrays of Inexpensive Disks • Disk blocks contain Reed-Solomon error correcting codes (ECC) to deal with small defects in disk drive • Invented by David Patterson, Garth A. Gibson, and – Can allow recovery of data from small media defects Randy Katz here at UCB in 1987 • Make sure writes survive in short term – Either abandon delayed writes or • Data stored on multiple disks (redundancy)
    [Show full text]
  • AV-Use File Systems for Multiple High-Definition Era
    Hitachi Review Vol. 56 (2007), No. 1 11 AV-use File Systems for Multiple High-definition Era Nobuaki Kohinata OVERVIEW: Accompanying the spread of AV equipment fitted with large- Damien Le Moal capacity HDDs and high-speed network interfaces, a new style of enjoying content—in which all recorded content can be enjoyed freely anywhere in Mika Mizutani the home—will become mainstream in the near future. In the file system for handling this new viewing/listening style, processing must be performed at high efficiency while assuring the access rates for writing to the HDD storing content and for reading data from the HDD. Aiming to create a middleware solution to meet the above-mentioned requirements, Hitachi has developed, and is presently commercializing, an AV-use file system that enables simultaneous access to multiple “high definition” content (i.e. HDTV programs). Focusing on developing middleware for improving the added- value of HDDs, we are continuing to intensify and push forward our research and development on fundamental technologies for supporting people’ s “new digital lives.” INTRODUCTION HDTV (high-definition TV) content that has already ACCOMPANYING the popularization of AV (audio- been recorded can be freely enjoyed in the home while visual) equipment fitted with network I/Fs (interfaces) programs on all channels are being recorded. Making and large-capacity HDDs (hard disk drives), and the this kind of viewing a reality necessitates a scheme launch of terrestrial digital broadcasting, it is that allows multiple read/writing processing operations considered that, from now onwards, the way that users on an HDD simultaneously at high efficiency while view and listen to content will continue to change.
    [Show full text]
  • Partitioning Disks with Parted
    Partitioning Disks with parted Author: Yogesh Babar Technical Reviewer: Chris Negus 10/6/2017 Storage devices in Linux (such as hard drives and USB drives) need to be structured in some way before use. In most cases, large storage devices are divided into separate sections, which in Linux are referred to as partitions. A popular tool for creating, removing and otherwise manipulating disk partitions in Linux is the parted command. Procedures in this tech brief step you through different ways of using the parted command to work with Linux partitions. UNDERSTANDING PARTED The parted command is particularly useful with large disk devices and many disk partitions. Differences between parted and the more common fdisk and cfdisk commands include: • GPT Format: The parted command can create can be used to create Globally Unique Identifiers Partition Tables (GPT), while fdisk and cfdisk are limited to msdos partition tables. • Larger disks: An msdos partition table can only format up to 2TB of disk space (although up to 16TB is possible in some cases). A GPT partition table, however, have the potential to address up to 8 zebibytes of space. • More partitions: Using primary and extended partitions, msdos partition tables allow only 16 partitions. With GPT, you get up to 128 partitions by default and can choose to have many more. • Reliability: Only one copy of the partition table is stored in an msdos partition. GPT keeps two copies of the partition table (at the beginning and end of the disk). The GPT also uses a CRC checksum to check the partition table's integrity (which is not done with msdos partitions).
    [Show full text]