Understanding Security Vulnerabilities in File Systems Miao Cai∗ Hao Huang Jian Huang Nanjing University Nanjing University UIUC

Total Page:16

File Type:pdf, Size:1020Kb

Understanding Security Vulnerabilities in File Systems Miao Cai∗ Hao Huang Jian Huang Nanjing University Nanjing University UIUC Understanding Security Vulnerabilities in File Systems Miao Cai∗ Hao Huang Jian Huang Nanjing University Nanjing University UIUC ABSTRACT There is no quantitative research demonstrating their root File systems have been developed for decades with the security- causes and consequences. Moreover, there is no study demon- critical foundation provided by operating systems. However, strating how malicious users exploit these vulnerabilities to they are still vulnerable to malware attacks and software successfully initiate the attacks, and pose a great threat to defects. In this paper, we undertake the first attempt to sys- the user data and even the safety of the whole system. tematically understand the security vulnerabilities in various In this paper, we conduct the first systematic study on the file systems. We conduct an empirical study of 157 real cases security vulnerabilities in Linux file systems. We study 157 reported in Common Vulnerabilities and Exposures (CVE). real-world cases related to file systems from the list of Com- We characterize the file system vulnerabilities in different mon Vulnerabilities and Exposures (CVE). These cases are dimensions that include the common vulnerabilities lever- committed from the year of 1999 to 2019, and cover a variety aged by adversaries to initiate their attacks, their exploitation of file systems that include eight on-disk fs implementations procedures, root causes, consequences, and mitigation ap- such as Ext4, XFS [38], and F2FS [18], two in-memory fs proaches. We believe the insights derived from this study like tmpfs, and the virtual file system (VFS). Note that our have broad implications related to the further enhancement study mainly focuses on the security aspects of the fs design of the security aspect of file systems, and the associated and implementation. We use the CVE list as our resource vulnerability detection tools. pool, because all the reported cases are confirmed by security experts, and they represent the real-world threat models. 1 INTRODUCTION In order to fully understand each security case, we de- File system (fs) is crucial to the data integrity and secu- velop a vulnerability analysis model, which includes three rity in modern computer systems. Although it has been major steps: vulnerability reproducing, attack exploitation, developed for decades with applying numerous data pro- and consequence confirmation. We use this analysis model tection techniques such as access control and sanity check- to guide our study of all the cases, and investigate the fs ing [6, 17, 20, 24, 25, 32], file systems are suffering from a vulnerabilities in different dimensions, including what are significant number of malware attacks, and they often fail their major consequences? what are the common root causes to protect users from severe damages, such as data loss and of these vulnerabilities? what are the popular fs components leakage, denial of service (DoS), systems crashes, and even that have been exploited by attackers? and how attackers full system compromise [15, 34, 36, 37]. leverage the fs features to initiate their attacks? Although prior research has performed intensive studies To be specific, we identify four major types of conse- on file systems bugs [16, 21, 29], bug-detection tools [27, 41– quences that include denial of service (DoS) (75%), data leak- 43], and formal verification for bug-free implementation [1, age (12%), access permission bypass (7%), and privilege esca- 3, 5, 33], few studies investigate their security vulnerabilities. lation (6%), in which the DoS is the major consequence of fs vulnerabilities. As for the root causes of these vulnerabilities, ∗The work was performed when the author was a visiting student in the Systems and Platform Research Group at UIUC. we find that they are mainly caused by sanity checking (45%), memory errors with fs data structures (23%), race condition Permission to make digital or hard copies of all or part of this work for in concurrency implementation (8%), and file permission personal or classroom use is granted without fee provided that copies are not (10%). These are the common issues that have been exploited made or distributed for profit or commercial advantage and that copies bear by attackers. Unfortunately, it is challenging to automatically this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with detect and fix many of them, especially for those that are credit is permitted. To copy otherwise, or republish, to post on servers or to closely related to high-level system semantic and specifica- redistribute to lists, requires prior specific permission and/or a fee. Request tions [13], such as sanity checking issues. permissions from [email protected]. Furthermore, as we map the fs vulnerabilities to fs core APSys ’19, August 19–20, 2019, Hangzhou, China components, including namespace management, inode man- © 2019 Association for Computing Machinery. agement, superblock management, block allocation, page ACM ISBN 978-1-4503-6893-3/19/08...$15.00 https://doi.org/10.1145/3343737.3343753 cache, file management, crash-safety model, permission model, APSys ’19, August 19–20, 2019, Hangzhou, China Miao Cai, Hao Huang, and Jian Huang Table 1: Summary of Linux fs vulnerabilities. In order to precisely categorize each CVE case, we manually Name Description Release Year #CVEs examine the committed report, problem description, and Ext4 A journaling fs with extents 2008 39 posted blogs following the approaches described in prior VFS Virtual file system 1995 37 bug-study work [11, 21]. For the cases whose source codes XFS A journaling fs created by SGI 1993 20 F2FS A flash-friendly fs 2013 15 are available, we also check the corresponding source codes Btrfs A copy-on-write based fs 2009 13 and the associated committed patches, and reproduce them procfs A virtual memory file system 2001 9 to further understand and confirm the vulnerability. Ext2 An extended fs 1993 6 Similar to prior characteristic studies that may suffer from Ext3 A journaling fs 2001 6 limitations on sampling, we take our best effort to collect the tmpfs A virtual memory fs 2001 5 ReiserFS A journaling fs created by Namesys 2001 4 vulnerabilities available in the CVE list. Given that we focus JFS A journaling fs created by IBM 1990 3 on the popular and representative file systems, we believe Total 157 that these limitations do not invalidate our study result. Also, we encourage readers to focus on the root causes behind and dentry management, we pinpoint that metadata man- each individual case rather than the precise numbers, since agement is the most vulnerable component in file systems, a single vulnerability could produce massive damages. which occupies 74% of the total fs vulnerabilities. As we ex- Vulnerability analysis model. To facilitate our study, pected, fs features could facilitate attackers to compromise we develop a vulnerability analysis model, which includes the file systems and even the entire computer systems. We three major steps: vulnerability reproducing, attack exploita- validate that adversaries usually exploit the unique fs fea- tion, and consequence confirmation, as described as follows. tures such as block management and data consistency model • Vulnerability reproducing. to increase the chances of successful attacks. For instance, To initiate a successful at- attackers would exploit the data inconsistency between page tack, the adversary has to verify the effectiveness of the cache and disk with hole punching operations to cause disk vulnerability, as it reveals the weakness of the file systems. data corruption in Ext4 (see CVE-2015-8839). We wish our Typical file system weaknesses include poor isolation be- findings could facilitate the file systems development onthe tween namespaces, and insufficient enforcement of file per- aspects of systems security and data protection, as well as mission model. In this step, we reproduce the vulnerabilities the associated vulnerability detection tools. according to the external references in the CVE. For each The rest of this paper is organized as follows. x 2 describes vulnerability, we generate a reproduction report which de- our study methodology. x 3 presents the consequences and scribes the conditions to trigger the specific vulnerability. • Attack exploitation. causes. We discuss how fs vulnerabilities are related to fs Adversary would combine various components in x 4, and present how attackers would exploit attack methods to conduct the attack. Typical attack methods fs features to initiate their attacks in x 5. x 6 presents the include heap spray, return oriented programming (ROP), and related work. We concludes the paper in x 7. buffer overflow. Similar to the vulnerability reproduction, we also generate a report for the attack exploitation, which 2 STUDY METHODOLOGY records the detailed attack methods of the exploitation and In this work, we focus on the study of the security vulner- adversary capability. abilities in widely used file systems. We collected the cases • Consequence confirmation. With attack exploitation, related to the file systems by searching the keywords of “file the adversary would initiate the attacks to the file systems systems” and fs names like “Ext4” via the search functionality and even the entire computer system. Typical consequences of CVE. We use the CVE cases as our study samples for two include denial of service, data leakage and loss, access per-
Recommended publications
  • Hetfs: a Heterogeneous File System for Everyone
    HetFS: A Heterogeneous File System for Everyone Georgios Koloventzos1, Ramon Nou∗1, Alberto Miranda∗1, and Toni Cortes1;2 1 Barcelona Supercomputing Center (BSC) Barcelona, Spain 2 Universitat Polit`ecnicade Catalunya Barcelona, Spain georgios.koloventzos,ramon.nou,alberto.miranda,toni.cortes}@bsc.es Abstract Storage devices have been getting more and more diverse during the last decade. The advent of SSDs made it painfully clear that rotating devices, such as HDDs or magnetic tapes, were lacking in regards to response time. However, SSDs currently have a limited number of write cycles and a significantly larger price per capacity, which has prevented rotational technologies from begin abandoned. Additionally, Non-Volatile Memories (NVMs) have been lately gaining traction, offering devices that typically outperform NAND-based SSDs but exhibit a full new set of idiosyncrasies. Therefore, in order to appropriately support this diversity, intelligent mech- anisms will be needed in the near-future to balance the benefits and drawbacks of each storage technology available to a system. In this paper, we present a first step towards such a mechanism called HetFS, an extension to the ZFS file system that is capable of choosing the storage device a file should be kept in according to preprogrammed filters. We introduce the prototype and show some preliminary results of the effects obtained when placing specific files into different devices. 1 Introduction Storage devices have shown a significant evolution in the latest decade. As the improvements in the latencies of traditional hard disk drives (HDDs) have dimin- ished due to the mechanical limitations inherent to their design, other technolo- gies have been emerging to try and take their place.
    [Show full text]
  • Linux on the Road
    Linux on the Road Linux with Laptops, Notebooks, PDAs, Mobile Phones and Other Portable Devices Werner Heuser <wehe[AT]tuxmobil.org> Linux Mobile Edition Edition Version 3.22 TuxMobil Berlin Copyright © 2000-2011 Werner Heuser 2011-12-12 Revision History Revision 3.22 2011-12-12 Revised by: wh The address of the opensuse-mobile mailing list has been added, a section power management for graphics cards has been added, a short description of Intel's LinuxPowerTop project has been added, all references to Suspend2 have been changed to TuxOnIce, links to OpenSync and Funambol syncronization packages have been added, some notes about SSDs have been added, many URLs have been checked and some minor improvements have been made. Revision 3.21 2005-11-14 Revised by: wh Some more typos have been fixed. Revision 3.20 2005-11-14 Revised by: wh Some typos have been fixed. Revision 3.19 2005-11-14 Revised by: wh A link to keytouch has been added, minor changes have been made. Revision 3.18 2005-10-10 Revised by: wh Some URLs have been updated, spelling has been corrected, minor changes have been made. Revision 3.17.1 2005-09-28 Revised by: sh A technical and a language review have been performed by Sebastian Henschel. Numerous bugs have been fixed and many URLs have been updated. Revision 3.17 2005-08-28 Revised by: wh Some more tools added to external monitor/projector section, link to Zaurus Development with Damn Small Linux added to cross-compile section, some additions about acoustic management for hard disks added, references to X.org added to X11 sections, link to laptop-mode-tools added, some URLs updated, spelling cleaned, minor changes.
    [Show full text]
  • BUGS in the SYSTEM a Primer on the Software Vulnerability Ecosystem and Its Policy Implications
    ANDI WILSON, ROSS SCHULMAN, KEVIN BANKSTON, AND TREY HERR BUGS IN THE SYSTEM A Primer on the Software Vulnerability Ecosystem and its Policy Implications JULY 2016 About the Authors About New America New America is committed to renewing American politics, Andi Wilson is a policy analyst at New America’s Open prosperity, and purpose in the Digital Age. We generate big Technology Institute, where she researches and writes ideas, bridge the gap between technology and policy, and about the relationship between technology and policy. curate broad public conversation. We combine the best of With a specific focus on cybersecurity, Andi is currently a policy research institute, technology laboratory, public working on issues including encryption, vulnerabilities forum, media platform, and a venture capital fund for equities, surveillance, and internet freedom. ideas. We are a distinctive community of thinkers, writers, researchers, technologists, and community activists who Ross Schulman is a co-director of the Cybersecurity believe deeply in the possibility of American renewal. Initiative and senior policy counsel at New America’s Open Find out more at newamerica.org/our-story. Technology Institute, where he focuses on cybersecurity, encryption, surveillance, and Internet governance. Prior to joining OTI, Ross worked for Google in Mountain About the Cybersecurity Initiative View, California. Ross has also worked at the Computer The Internet has connected us. Yet the policies and and Communications Industry Association, the Center debates that surround the security of our networks are for Democracy and Technology, and on Capitol Hill for too often disconnected, disjointed, and stuck in an Senators Wyden and Feingold. unsuccessful status quo.
    [Show full text]
  • Flexible Lustre Management
    Flexible Lustre management Making less work for Admins ORNL is managed by UT-Battelle for the US Department of Energy How do we know Lustre condition today • Polling proc / sysfs files – The knocking on the door model – Parse stats, rpc info, etc for performance deviations. • Constant collection of debug logs – Heavy parsing for common problems. • The death of a node – Have to examine kdumps and /or lustre dump Origins of a new approach • Requirements for Linux kernel integration. – No more proc usage – Migration to sysfs and debugfs – Used to configure your file system. – Started in lustre 2.9 and still on going. • Two ways to configure your file system. – On MGS server run lctl conf_param … • Directly accessed proc seq_files. – On MSG server run lctl set_param –P • Originally used an upcall to lctl for configuration • Introduced in Lustre 2.4 but was broken until lustre 2.12 (LU-7004) – Configuring file system works transparently before and after sysfs migration. Changes introduced with sysfs / debugfs migration • sysfs has a one item per file rule. • Complex proc files moved to debugfs • Moving to debugfs introduced permission problems – Only debugging files should be their. – Both debugfs and procfs have scaling issues. • Moving to sysfs introduced the ability to send uevents – Item of most interest from LUG 2018 Linux Lustre client talk. – Both lctl conf_param and lctl set_param –P use this approach • lctl conf_param can set sysfs attributes without uevents. See class_modify_config() – We get life cycle events for free – udev is now involved. What do we get by using udev ? • Under the hood – uevents are collect by systemd and then processed by udev rules – /etc/udev/rules.d/99-lustre.rules – SUBSYSTEM=="lustre", ACTION=="change", ENV{PARAM}=="?*", RUN+="/usr/sbin/lctl set_param '$env{PARAM}=$env{SETTING}’” • You can create your own udev rule – http://reactivated.net/writing_udev_rules.html – /lib/udev/rules.d/* for examples – Add udev_log="debug” to /etc/udev.conf if you have problems • Using systemd for long task.
    [Show full text]
  • F2FS) Overview
    Flash Friendly File System (F2FS) Overview Leon Romanovsky [email protected] www.leon.nu November 17, 2012 Leon Romanovsky [email protected] F2FS Overview Disclaimer Everything in this lecture shall not, under any circumstances, hold any legal liability whatsoever. Any usage of the data and information in this document shall be solely on the responsibility of the user. This lecture is not given on behalf of any company or organization. Leon Romanovsky [email protected] F2FS Overview Introduction: Flash Memory Definition Flash memory is a non-volatile storage device that can be electrically erased and reprogrammed. Challenges block-level access wear leveling read disturb bad blocks management garbage collection different physics different interfaces Leon Romanovsky [email protected] F2FS Overview Introduction: Flash Memory Definition Flash memory is a non-volatile storage device that can be electrically erased and reprogrammed. Challenges block-level access wear leveling read disturb bad blocks management garbage collection different physics different interfaces Leon Romanovsky [email protected] F2FS Overview Introduction: General System Architecture Leon Romanovsky [email protected] F2FS Overview Introduction: File Systems Optimized for disk storage EXT2/3/4 BTRFS VFAT Optimized for flash, but not aware of FTL JFFS/JFFS2 YAFFS LogFS UbiFS NILFS Leon Romanovsky [email protected] F2FS Overview Background: LFS vs. Unix FS Leon Romanovsky [email protected] F2FS Overview Background: LFS Overview Leon Romanovsky [email protected] F2FS Overview Background: LFS Garbage Collection 1 A victim segment is selected through referencing segment usage table. 2 It loads parent index structures of all the data in the victim identified by segment summary blocks.
    [Show full text]
  • Dm-X: Protecting Volume-Level Integrity for Cloud Volumes and Local
    dm-x: Protecting Volume-level Integrity for Cloud Volumes and Local Block Devices Anrin Chakraborti Bhushan Jain Jan Kasiak Stony Brook University Stony Brook University & Stony Brook University UNC, Chapel Hill Tao Zhang Donald Porter Radu Sion Stony Brook University & Stony Brook University & Stony Brook University UNC, Chapel Hill UNC, Chapel Hill ABSTRACT on Amazon’s Virtual Private Cloud (VPC) [2] (which provides The verified boot feature in recent Android devices, which strong security guarantees through network isolation) store deploys dm-verity, has been overwhelmingly successful in elim- data on untrusted storage devices residing in the public cloud. inating the extremely popular Android smart phone rooting For example, Amazon VPC file systems, object stores, and movement [25]. Unfortunately, dm-verity integrity guarantees databases reside on virtual block devices in the Amazon are read-only and do not extend to writable volumes. Elastic Block Storage (EBS). This paper introduces a new device mapper, dm-x, that This allows numerous attack vectors for a malicious cloud efficiently (fast) and reliably (metadata journaling) assures provider/untrusted software running on the server. For in- volume-level integrity for entire, writable volumes. In a direct stance, to ensure SEC-mandated assurances of end-to-end disk setup, dm-x overheads are around 6-35% over ext4 on the integrity, banks need to guarantee that the external cloud raw volume while offering additional integrity guarantees. For storage service is unable to remove log entries documenting cloud storage (Amazon EBS), dm-x overheads are negligible. financial transactions. Yet, without integrity of the storage device, a malicious cloud storage service could remove logs of a coordinated attack.
    [Show full text]
  • File Formats
    man pages section 4: File Formats Sun Microsystems, Inc. 4150 Network Circle Santa Clara, CA 95054 U.S.A. Part No: 817–3945–10 September 2004 Copyright 2004 Sun Microsystems, Inc. 4150 Network Circle, Santa Clara, CA 95054 U.S.A. All rights reserved. This product or document is protected by copyright and distributed under licenses restricting its use, copying, distribution, and decompilation. No part of this product or document may be reproduced in any form by any means without prior written authorization of Sun and its licensors, if any. Third-party software, including font technology, is copyrighted and licensed from Sun suppliers. Parts of the product may be derived from Berkeley BSD systems, licensed from the University of California. UNIX is a registered trademark in the U.S. and other countries, exclusively licensed through X/Open Company, Ltd. Sun, Sun Microsystems, the Sun logo, docs.sun.com, AnswerBook, AnswerBook2, and Solaris are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc. The OPEN LOOK and Sun™ Graphical User Interface was developed by Sun Microsystems, Inc. for its users and licensees. Sun acknowledges the pioneering efforts of Xerox in researching and developing the concept of visual or graphical user interfaces for the computer industry. Sun holds a non-exclusive license from Xerox to the Xerox Graphical User Interface, which license also covers Sun’s licensees who implement OPEN LOOK GUIs and otherwise comply with Sun’s written license agreements.
    [Show full text]
  • In Search of Optimal Data Placement for Eliminating Write Amplification in Log-Structured Storage
    In Search of Optimal Data Placement for Eliminating Write Amplification in Log-Structured Storage Qiuping Wangy, Jinhong Liy, Patrick P. C. Leey, Guangliang Zhao∗, Chao Shi∗, Lilong Huang∗ yThe Chinese University of Hong Kong ∗Alibaba Group ABSTRACT invalidated by a live block; a.k.a. the death time [12]) to achieve Log-structured storage has been widely deployed in various do- the minimum possible WA. However, without obtaining the fu- mains of storage systems for high performance. However, its garbage ture knowledge of the BIT pattern, how to design an optimal data collection (GC) incurs severe write amplification (WA) due to the placement scheme with the minimum WA remains an unexplored frequent rewrites of live data. This motivates many research stud- issue. Existing temperature-based data placement schemes that ies, particularly on data placement strategies, that mitigate WA in group blocks by block temperatures (e.g., write/update frequencies) log-structured storage. We show how to design an optimal data [7, 16, 22, 27, 29, 35, 36] are arguably inaccurate to capture the BIT placement scheme that leads to the minimum WA with the fu- pattern and fail to group the blocks with similar BITs [12]. ture knowledge of block invalidation time (BIT) of each written To this end, we propose SepBIT, a novel data placement scheme block. Guided by this observation, we propose SepBIT, a novel data that aims to minimize the WA in log-structured storage. It infers placement algorithm that aims to minimize WA in log-structured the BITs of written blocks from the underlying storage workloads storage.
    [Show full text]
  • ECE 598 – Advanced Operating Systems Lecture 19
    ECE 598 { Advanced Operating Systems Lecture 19 Vince Weaver http://web.eece.maine.edu/~vweaver [email protected] 7 April 2016 Announcements • Homework #7 was due • Homework #8 will be posted 1 Why use FAT over ext2? • FAT simpler, easy to code • FAT supported on all major OSes • ext2 faster, more robust filename and permissions 2 btrfs • B-tree fs (similar to a binary tree, but with pages full of leaves) • overwrite filesystem (overwite on modify) vs CoW • Copy on write. When write to a file, old data not overwritten. Since old data not over-written, crash recovery better Eventually old data garbage collected • Data in extents 3 • Copy-on-write • Forest of trees: { sub-volumes { extent-allocation { checksum tree { chunk device { reloc • On-line defragmentation • On-line volume growth 4 • Built-in RAID • Transparent compression • Snapshots • Checksums on data and meta-data • De-duplication • Cloning { can make an exact snapshot of file, copy-on- write different than link, different inodles but same blocks 5 Embedded • Designed to be small, simple, read-only? • romfs { 32 byte header (magic, size, checksum,name) { Repeating files (pointer to next [0 if none]), info, size, checksum, file name, file data • cramfs 6 ZFS Advanced OS from Sun/Oracle. Similar in idea to btrfs indirect still, not extent based? 7 ReFS Resilient FS, Microsoft's answer to brtfs and zfs 8 Networked File Systems • Allow a centralized file server to export a filesystem to multiple clients. • Provide file level access, not just raw blocks (NBD) • Clustered filesystems also exist, where multiple servers work in conjunction.
    [Show full text]
  • CS 5600 Computer Systems
    CS 5600 Computer Systems Lecture 10: File Systems What are We Doing Today? • Last week we talked extensively about hard drives and SSDs – How they work – Performance characterisEcs • This week is all about managing storage – Disks/SSDs offer a blank slate of empty blocks – How do we store files on these devices, and keep track of them? – How do we maintain high performance? – How do we maintain consistency in the face of random crashes? 2 • ParEEons and MounEng • Basics (FAT) • inodes and Blocks (ext) • Block Groups (ext2) • Journaling (ext3) • Extents and B-Trees (ext4) • Log-based File Systems 3 Building the Root File System • One of the first tasks of an OS during bootup is to build the root file system 1. Locate all bootable media – Internal and external hard disks – SSDs – Floppy disks, CDs, DVDs, USB scks 2. Locate all the parEEons on each media – Read MBR(s), extended parEEon tables, etc. 3. Mount one or more parEEons – Makes the file system(s) available for access 4 The Master Boot Record Address Size Descripon Hex Dec. (Bytes) Includes the starEng 0x000 0 Bootstrap code area 446 LBA and length of 0x1BE 446 ParEEon Entry #1 16 the parEEon 0x1CE 462 ParEEon Entry #2 16 0x1DE 478 ParEEon Entry #3 16 0x1EE 494 ParEEon Entry #4 16 0x1FE 510 Magic Number 2 Total: 512 ParEEon 1 ParEEon 2 ParEEon 3 ParEEon 4 MBR (ext3) (swap) (NTFS) (FAT32) Disk 1 ParEEon 1 MBR (NTFS) 5 Disk 2 Extended ParEEons • In some cases, you may want >4 parEEons • Modern OSes support extended parEEons Logical Logical ParEEon 1 ParEEon 2 Ext.
    [Show full text]
  • Ext4 File System and Crash Consistency
    1 Ext4 file system and crash consistency Changwoo Min 2 Summary of last lectures • Tools: building, exploring, and debugging Linux kernel • Core kernel infrastructure • Process management & scheduling • Interrupt & interrupt handler • Kernel synchronization • Memory management • Virtual file system • Page cache and page fault 3 Today: ext4 file system and crash consistency • File system in Linux kernel • Design considerations of a file system • History of file system • On-disk structure of Ext4 • File operations • Crash consistency 4 File system in Linux kernel User space application (ex: cp) User-space Syscalls: open, read, write, etc. Kernel-space VFS: Virtual File System Filesystems ext4 FAT32 JFFS2 Block layer Hardware Embedded Hard disk USB drive flash 5 What is a file system fundamentally? int main(int argc, char *argv[]) { int fd; char buffer[4096]; struct stat_buf; DIR *dir; struct dirent *entry; /* 1. Path name -> inode mapping */ fd = open("/home/lkp/hello.c" , O_RDONLY); /* 2. File offset -> disk block address mapping */ pread(fd, buffer, sizeof(buffer), 0); /* 3. File meta data operation */ fstat(fd, &stat_buf); printf("file size = %d\n", stat_buf.st_size); /* 4. Directory operation */ dir = opendir("/home"); entry = readdir(dir); printf("dir = %s\n", entry->d_name); return 0; } 6 Why do we care EXT4 file system? • Most widely-deployed file system • Default file system of major Linux distributions • File system used in Google data center • Default file system of Android kernel • Follows the traditional file system design 7 History of file system design 8 UFS (Unix File System) • The original UNIX file system • Design by Dennis Ritche and Ken Thompson (1974) • The first Linux file system (ext) and Minix FS has a similar layout 9 UFS (Unix File System) • Performance problem of UFS (and the first Linux file system) • Especially, long seek time between an inode and data block 10 FFS (Fast File System) • The file system of BSD UNIX • Designed by Marshall Kirk McKusick, et al.
    [Show full text]
  • Comparing Filesystem Performance: Red Hat Enterprise Linux 6 Vs
    COMPARING FILE SYSTEM I/O PERFORMANCE: RED HAT ENTERPRISE LINUX 6 VS. MICROSOFT WINDOWS SERVER 2012 When choosing an operating system platform for your servers, you should know what I/O performance to expect from the operating system and file systems you select. In the Principled Technologies labs, using the IOzone file system benchmark, we compared the I/O performance of two operating systems and file system pairs, Red Hat Enterprise Linux 6 with ext4 and XFS file systems, and Microsoft Windows Server 2012 with NTFS and ReFS file systems. Our testing compared out-of-the-box configurations for each operating system, as well as tuned configurations optimized for better performance, to demonstrate how a few simple adjustments can elevate I/O performance of a file system. We found that file systems available with Red Hat Enterprise Linux 6 delivered better I/O performance than those shipped with Windows Server 2012, in both out-of- the-box and optimized configurations. With I/O performance playing such a critical role in most business applications, selecting the right file system and operating system combination is critical to help you achieve your hardware’s maximum potential. APRIL 2013 A PRINCIPLED TECHNOLOGIES TEST REPORT Commissioned by Red Hat, Inc. About file system and platform configurations While you can use IOzone to gauge disk performance, we concentrated on the file system performance of two operating systems (OSs): Red Hat Enterprise Linux 6, where we examined the ext4 and XFS file systems, and Microsoft Windows Server 2012 Datacenter Edition, where we examined NTFS and ReFS file systems.
    [Show full text]