The Linux Implementation of a Log-Structured File System

Total pages: 16

File type: PDF, size: 1020 KB

Ryusuke Konishi, Yoshiji Amagai, Koji Sato, Hisashi Hifumi, Seiji Kihara, Satoshi Moriai
NTT Cyber Space Laboratories, NTT Corporation
1-1 Hikari-no-oka, Yokosuka-shi, Kanagawa, 239-0847, Japan

ABSTRACT

Toward enhancing the reliability of the Linux file system, we are developing a new log-structured file system (NILFS) for the Linux operating system. Instead of overwriting existing blocks, NILFS appends consistent sets of modified or newly created blocks continuously into segmented disk regions. This writing method allows NILFS to achieve faster recovery time and higher write performance. The address of the block that is written to changes for each write, which makes it difficult to apply modern file system technologies such as B-tree structures. To permit such writing on the Linux kernel basis, NILFS has its own write mechanism that handles data and meta data as one unit and allows them to be relocated. This paper presents the design and implementation of NILFS, focusing on the write mechanism.

1. INTRODUCTION

As the use of open-source operating systems advances, not only for stationary PCs but also for backend servers, their reliability and availability are becoming more and more important. One important issue in these operating systems is file system reliability. For instance, applying Linux to such fields was difficult several years ago because of the unreliability of its standard file system. This problem has been significantly eased by the adoption of journaling file systems such as ext3[9], XFS[2], JFS[1] and ReiserFS[3].

These journaling file systems enable fast and consistent recovery of the file system after unexpected system freezes or power failures. However, they still allow fatal destruction of the file system, owing to the characteristic that recovery is realized by overwriting meta data with copies saved in a journal file. This recovery is guaranteed to work properly only if the write order of on-disk data and meta data blocks is physically conserved on the disk platters. Unfortunately, this constraint is often violated by the write optimizations performed by the block I/O subsystem and disk controllers. Careful implementation of write barriers and their correct application, which partially restrict the elevator seek optimizations, is indispensable for avoiding this problem. However, strict write ordering degrades storage performance because it increases disk seeks: journaling file systems suffer from the seeks between the journal file and the original meta data blocks. At present, a few Linux journaling file systems support the write barrier, but it has not yet taken hold; only XFS has recently made it effective by default. We seem to be faced with a trade-off between performance and reliability.

One interesting alternative approach is the log-structured file system (LFS)[6, 7]. An LFS assures reliability by avoiding the overwriting of on-disk blocks: modified blocks are appended after the existing data blocks, as is done for log data. This write method also improves performance, since the sequential block writes minimize the number of disk head hops.

With regard to data recovery, we see the need to realize more advanced features through the technique called "snapshot". Snapshots reduce the possibility of data loss, including loss caused by human error. Some commercial storage systems support this feature, but they are costly and far from common. An LFS suits data salvage and snapshots because past data are kept on disk; it can improve the restorability of data and can compensate for operational errors.

Some LFS implementations appeared in the '90s, but most of them are now obsolete. For 4.4BSD and NetBSD, an implementation called BSD-LFS [8] is available. For Linux, there was an experimental implementation called LinLogFS[5], but its development was abandoned and it is not available for recent Linux kernels. This is primarily due to the difficulty of implementing an LFS: an LFS basically stores blocks at different positions on each write, and combining this with a modern block management technique such as the B-tree[4] is a significant challenge. To overcome this shortfall, we are developing NILFS, an abbreviation of the New Implementation of a Log-structured File System, for the Linux operating system.

The rest of the paper is organized as follows. Section 2 overviews the NILFS design. Section 3 focuses on the writing mechanism of NILFS, the key to our LFS implementation for Linux. After showing some evaluation results in Section 4, we conclude with a brief summary.
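To make the append-only idea concrete, here is a minimal toy sketch (not NILFS code; the disk array, mapping table, and function names are all invented for illustration) contrasting in-place overwriting with log-structured writing, where the new block version goes to the log head and the old version survives on disk:

```c
/*
 * Toy sketch of the append-only idea behind an LFS (not NILFS code):
 * a new block version goes to the current log head and only a mapping
 * table entry changes, so the old version survives on disk.
 */
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 4096
#define MAX_BLOCKS 1024

static uint8_t  disk[MAX_BLOCKS][BLOCK_SIZE]; /* stand-in for the platters */
static uint64_t map[MAX_BLOCKS];              /* logical -> physical block */
static uint64_t log_head;                     /* next free physical block  */

/* Overwriting file system: the old version is destroyed in place. */
static void write_in_place(uint64_t logical, const uint8_t *data)
{
    memcpy(disk[map[logical]], data, BLOCK_SIZE);
}

/* Log-structured file system: append and remap; the old block survives. */
static void write_appending(uint64_t logical, const uint8_t *data)
{
    memcpy(disk[log_head], data, BLOCK_SIZE);
    map[logical] = log_head++; /* a real LFS wraps around and cleans segments */
}
```

Because superseded blocks remain on disk until they are reclaimed, a crash can fall back to the last consistent mapping, and the retained old versions are exactly what makes snapshots cheap.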
2. OVERVIEW OF THE NILFS DESIGN

2.1 Development goals

The development goals of NILFS are as follows:

1. High availability
2. Adequate reliability and performance
3. Support of online snapshots
4. High scalability
5. Compliance with Linux semantics
6. Improved operability with user-friendly tools

High availability means that the recovery time should match those of existing journaling file systems. For this purpose, a DB-like technique called checkpointing is applied. NILFS recovery is nevertheless safer, because it prevents the overwriting of meta data; journaling file systems, in contrast, restore important blocks from the journal file during recovery, which can cause fatal collapse if the journal file is not written perfectly.

NILFS offers online snapshot support and adequate reliability and performance by taking advantage of LFS. Online snapshot support allows users to clip a consistent file system state without stopping services, and thus helps with online backup. The requirement of high scalability includes support of many files, large files, and large disks with minimum performance degradation. We would like to implement NILFS without losing compliance with Linux file system semantics and with minimal changes to the kernel code. We note that the development of user tools is a future task.

2.2 On-disk representation

The on-disk representations of inode numbers, block numbers, and other sizes are designed around 64 bits in order to eliminate scalability limits. Data blocks and inode blocks are managed using B-trees to enable fast lookup. The B-tree structures of NILFS also have the role of translating logical block offset addresses into disk block numbers; we call the former the file block number and the latter the disk block number.
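As a rough sketch of this translation, the following walks a B-tree keyed by file block numbers down to a leaf that yields a disk block number. The node layout, the fan-out, and the read_node() helper are assumptions made for exposition; the excerpt does not specify the actual NILFS on-disk B-tree format:

```c
/*
 * Schematic translation of a 64-bit file block number (fbn) into a disk
 * block number (dbn) through a B-tree. The node layout, the fan-out, and
 * read_node() are invented for illustration; the real NILFS on-disk
 * B-tree format is not described in this excerpt.
 */
#include <stdint.h>

#define FANOUT 64

struct btree_node {
    int      nkeys;
    int      is_leaf;
    uint64_t key[FANOUT];  /* internal: key[i] is the lowest fbn under ptr[i];
                              leaf: key[i] is the fbn itself */
    uint64_t ptr[FANOUT];  /* internal: child dbn; leaf: data block dbn */
};

/* Hypothetical helper: read and decode one B-tree node block from disk. */
extern struct btree_node *read_node(uint64_t dbn);

/* Returns the disk block number backing fbn, or 0 for an unmapped hole. */
static uint64_t fbn_to_dbn(uint64_t root_dbn, uint64_t fbn)
{
    struct btree_node *node = read_node(root_dbn);

    while (!node->is_leaf) {
        int i = 0;
        while (i + 1 < node->nkeys && fbn >= node->key[i + 1])
            i++;                        /* pick the child covering fbn */
        node = read_node(node->ptr[i]);
    }
    for (int i = 0; i < node->nkeys; i++)
        if (node->key[i] == fbn)
            return node->ptr[i];
    return 0;
}
```

Note that because an LFS relocates blocks on every write, updating one data block also means rewriting the B-tree nodes on the path back to the root, which is why NILFS needs a write mechanism that handles data and meta data as one relocatable unit.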
Figure 1 depicts the disk layout of NILFS. Disk blocks of NILFS are divided into units called segments. The term segment is used in multiple ways, so to avoid ambiguity we introduce three terms:

• Full segment: the division and allocation unit. The disk is basically divided equally into full segments, each addressable by its index value.

• Partial segment: the write unit. Each partial segment consists of segment summary blocks and payload blocks, and it cannot cross a full segment boundary. The segment summary holds information about how the partial segment is organized; it includes a breakdown of each block, the length of the segment, a pointer to the summary block of the previous partial segment, and so on.

• Logical segment: the recovery unit. Each logical segment consists of one or more partial segments and represents the difference between two consistent states of the file system. The partial segments composing a logical segment are regarded, logically, as one segment.

[Figure 1: Disk layout of NILFS. A super block and segment usage chunks are followed by full segments Seg 0, Seg 1, Seg 2, ..., Seg n. Each partial segment holds a segment summary plus payload blocks (file blocks, file B-tree blocks, inode blocks, inode B-tree blocks, and a check point block), appended along the write direction.]

A logical segment consists of file blocks, file B-tree node blocks, inode blocks, inode B-tree node blocks, and a check point block. The B-tree node blocks are intermediate blocks composing the B-tree structures. The check point block points to the root of the inode B-tree, which holds the inode blocks on its leaves. Each inode block includes multiple inodes, each of which points to the root of a file B-tree; file data and directory data are held by the file B-trees. These payload blocks are a collection of modified or newly created blocks. Since pointers to unchanged blocks are reused, the B-tree node blocks or unchanged inodes may have pointers to blocks in past segments. Thus, the check point block represents the root of the entire file system at the time the logical segment was created. Snapshots are realized as the ability to keep the on-disk blocks trackable from all or selected check points. The segment summary and the check point use cyclic …

[Figure 2: Block diagram of NILFS. The system call interface and the virtual file system (VFS) invoke the NILFS-specific parts: mount/recovery operations, file inode operations, directory inode operations, and file page operations on top, with a block manager (B-tree), a B-tree node cache, a segment constructor, and a segment manager below; these use the Linux buffer cache and page cache (radix tree) and issue BIO reads and writes to the block I/O layer above the device driver.]

The mount and recovery operations work by locating the valid checkpoint over partial segments; this search starts from the partial segment pointed to by the super block. The file inode operations read, write, delete, and truncate files through the manipulation of file inodes and file data blocks. The directory inode operations perform lookup, listing, creation, removal, and renaming of nodes in directories through the manipulation of directory inodes and directory data blocks.

The lower part of NILFS is entirely new and is composed of a block manager, a segment constructor, and a segment manager. These parts call generic functions of the Linux file cache layer or directly call the Linux block I/O (BIO) layer. The Linux file cache layer consists of the buffer cache and the page cache.
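The structures named above might look roughly like the following sketch. Every field name, width, and helper here (read_summary(), summary_is_valid(), closes_logical_segment()) is hypothetical; only the listed contents come from the text:

```c
/*
 * Illustrative sketch of the structures named in this section. All field
 * names, widths, and helpers are hypothetical; only the listed contents
 * (block breakdown, segment length, back pointer to the previous summary,
 * and a checkpoint rooted at the inode B-tree) come from the text.
 */
#include <stdint.h>

struct segment_summary {
    uint64_t prev_summary_dbn; /* summary block of the previous partial segment */
    uint32_t nblocks;          /* length of this partial segment in blocks */
    uint32_t nfinfo;           /* breakdown: per-file block descriptors follow */
    uint32_t flags;            /* e.g. a "closes a logical segment" marker */
};

struct checkpoint {
    uint64_t inode_btree_root; /* root of the inode B-tree for this state */
    uint64_t seq;              /* write sequence number of the state */
};

/* Hypothetical helpers standing in for real block reads and validation. */
extern struct segment_summary *read_summary(uint64_t dbn);
extern int summary_is_valid(const struct segment_summary *ss);
extern int closes_logical_segment(const struct segment_summary *ss);

/*
 * Recovery sketch: starting from the partial segment pointed to by the
 * super block, follow the back pointers until a partial segment that
 * closes a logical segment (and therefore carries a usable checkpoint)
 * is found.
 */
static uint64_t locate_valid_checkpoint(uint64_t head_dbn)
{
    for (uint64_t dbn = head_dbn; dbn != 0; ) {
        struct segment_summary *ss = read_summary(dbn);

        if (summary_is_valid(ss) && closes_logical_segment(ss))
            return dbn;                 /* last consistent state */
        dbn = ss->prev_summary_dbn;     /* step back one partial segment */
    }
    return 0;                           /* no consistent checkpoint found */
}
```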
Recommended publications
  • Development of a Verified Flash File System
    Development of a Verified Flash File System. Gerhard Schellhorn, Gidon Ernst, Jörg Pfähler, Dominik Haneberg, and Wolfgang Reif; Institute for Software & Systems Engineering, University of Augsburg, Germany; {schellhorn,ernst,joerg.pfaehler,haneberg,reif}@informatik.uni-augsburg.de. Abstract: This paper gives an overview of the development of a formally verified file system for flash memory. We describe our approach, which is based on Abstract State Machines and incremental modular refinement. Some of the important intermediate levels and the features they introduce are given. We report on the verification challenges addressed so far, and point to open problems and future work. We furthermore draw preliminary conclusions on the methodology and the required tool support. 1 Introduction: Flaws in the design and implementation of file systems have already led to serious problems in mission-critical systems. A prominent example is the Mars Exploration Rover Spirit [34] that got stuck in a reset cycle. In 2013, the Mars rover Curiosity also had a bug in its file system implementation that triggered an automatic switch to safe mode. The first incident prompted a proposal to formally verify a file system for flash memory [24,18] as a pilot project for Hoare's Grand Challenge [22]. We are developing a verified flash file system (FFS). This paper reports on our progress and discusses some aspects of the project. We describe parts of the design, the formal models, and proofs, pointing out challenges and solutions. The main characteristic of flash memory that guides the design is that data cannot be overwritten in place; instead, space can only be reused by erasing whole blocks.
  • CERIAS Tech Report 2017-5 Deceptive Memory Systems by Christopher N. Gutierrez
    CERIAS Tech Report 2017-5: Deceptive Memory Systems, by Christopher N. Gutierrez. Center for Education and Research in Information Assurance and Security, Purdue University, West Lafayette, IN 47907-2086. Deceptive Memory Systems: a dissertation submitted to the faculty of Purdue University by Christopher N. Gutierrez in partial fulfillment of the requirements for the degree of Doctor of Philosophy, December 2017, Purdue University, West Lafayette, Indiana. Statement of dissertation approval: Dr. Eugene H. Spafford, co-chair, Department of Computer Science; Dr. Saurabh Bagchi, co-chair, Department of Computer Science; Dr. Dongyan Xu, Department of Computer Science; Dr. Mathias Payer, Department of Computer Science. Approved by Dr. Voicu Popescu and Dr. William J. Gorman, Head of the Graduate Program. This work is dedicated to my wife, Gina. Thank you for all of your love and support. The moon awaits us. Acknowledgments: I would like to thank Professors Eugene Spafford and Saurabh Bagchi for their guidance, support, and advice throughout my time at Purdue. Both have been instrumental in my development as a computer scientist, and I am forever grateful. I would also like to thank the Center for Education and Research in Information Assurance and Security (CERIAS) for fostering a multidisciplinary security culture of which I had the privilege to be a part. Special thanks to Adam Hammer and Ronald Castongia for their technical support, and to Thomas Yurek for his programming assistance for the experimental evaluation. I am grateful for the valuable feedback provided by the members of my thesis committee, Professor Dongyan Xu and Professor Mathias Payer.
  • Ext4 File System and Crash Consistency
    Ext4 file system and crash consistency. Changwoo Min. Summary of last lectures: tools for building, exploring, and debugging the Linux kernel; core kernel infrastructure; process management and scheduling; interrupts and interrupt handlers; kernel synchronization; memory management; the virtual file system; the page cache and page faults. Today: the ext4 file system and crash consistency, covering the file system in the Linux kernel, the design considerations of a file system, the history of file system design, the on-disk structure of ext4, file operations, and crash consistency. File system in the Linux kernel: a user-space application (e.g., cp) issues syscalls (open, read, write, etc.) to the kernel, where the VFS (virtual file system) dispatches to concrete filesystems (ext4, FAT32, JFFS2) sitting above the block layer and the hardware (hard disk, USB drive, embedded flash). What is a file system fundamentally?

```c
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>
#include <dirent.h>

int main(int argc, char *argv[])
{
    int fd;
    char buffer[4096];
    struct stat stat_buf;
    DIR *dir;
    struct dirent *entry;

    /* 1. Path name -> inode mapping */
    fd = open("/home/lkp/hello.c", O_RDONLY);

    /* 2. File offset -> disk block address mapping */
    pread(fd, buffer, sizeof(buffer), 0);

    /* 3. File meta data operation */
    fstat(fd, &stat_buf);
    printf("file size = %ld\n", (long)stat_buf.st_size);

    /* 4. Directory operation */
    dir = opendir("/home");
    entry = readdir(dir);
    printf("dir = %s\n", entry->d_name);
    return 0;
}
```

    Why do we care about the ext4 file system? It is the most widely deployed file system: the default file system of major Linux distributions and of the Android kernel, and the file system used in Google data centers; it follows the traditional file system design. History of file system design: UFS (Unix File System) is the original UNIX file system, designed by Dennis Ritchie and Ken Thompson (1974); the first Linux file system (ext) and the Minix FS have a similar layout. UFS (and the first Linux file system) had a performance problem: especially, the long seek time between an inode and its data blocks. FFS (Fast File System) is the file system of BSD UNIX, designed by Marshall Kirk McKusick, et al.
  • ZBD: Using Transparent Compression at the Block Level to Increase Storage Space Efficiency
    ZBD: Using Transparent Compression at the Block Level to Increase Storage Space Efficiency. Thanos Makatos, Yannis Klonatos, Manolis Marazakis, Michail D. Flouris, and Angelos Bilas; Foundation for Research and Technology, Hellas (FORTH), Institute of Computer Science (ICS), 100 N. Plastira Av., Vassilika Vouton, Heraklion, GR-70013, Greece; Department of Computer Science, University of Crete, P.O. Box 2208, Heraklion, GR-71409, Greece; {mcatos, klonatos, maraz, flouris, bilas}@ics.forth.gr. Abstract: In this work we examine how transparent compression in the I/O path can improve space efficiency for online storage. We extend the block layer with the ability to compress and decompress data as they flow between the file system and the disk. Achieving transparent compression requires extensive metadata management for dealing with variable block sizes, dynamic block mapping, block allocation, explicit work scheduling, and I/O optimizations to mitigate the impact of additional I/Os and compression overheads. Preliminary results show that online transparent compression is a viable option for improving effective storage capacity: it can improve I/O performance by reducing I/O traffic and seek distance, and it has a negative impact on performance only when single-thread I/O latency is critical. • Logical to physical block mapping: block-level compression imposes a many-to-one mapping from logical to physical blocks, as multiple compressed logical blocks must be stored in the same physical block. This requires a translation mechanism that imposes low overhead in the common I/O path and scales with the capacity of the underlying devices, as well as a block allocation/deallocation mechanism that affects data placement. • Increased number of I/Os: using compression increases the number of I/Os required on the critical path during …
  • CS 152: Computer Systems Architecture Storage Technologies
    CS 152: Computer Systems Architecture. Storage Technologies. Sang-Woo Jun, Winter 2019. Storage used to be a secondary concern: typically, storage was not a first-order citizen of a computer system, as alluded to by its name "secondary storage"; its job was to load programs and data into memory, and then disappear. Most applications only worked with the CPU and system memory (DRAM); extreme applications like DBMSs were the exception, because conventional secondary storage was very slow. Things are changing! Some (pre)history: magnetic core memory (1950s to 1970s, 72 KiB per cubic foot!), rope memory (ROM, 1960s, 100s of KiB; 1,024 bits in the photo; hand-woven to program the 1950s Apollo guidance computer), and drum memory (1950s). Some (more recent) history: floppy disk drives (1970s to 2000s, 100 KiB to 1.44 MiB) and hard disk drives (1950s to present, MBs to TBs). Some (current) history: solid state drives (2000s to present, GBs to TBs) and non-volatile memory (2010s to present, GBs). Photos from Wikipedia. Hard disk drives: the dominant storage medium for the longest time, still holding the largest capacity share. Data is organized into multiple magnetic platters; a mechanical head needs to move to where the data is in order to read it. Good sequential access, terrible random access: 100s of MB/s sequential, maybe 1 MB/s for 4 KB random; the time for the head to move to the right location ("seek time") may be milliseconds long, i.e., 1,000,000s of cycles! Typically "ATA" (including IDE and EIDE), and later "SATA", interfaces, connected via the "South bridge" chipset. Ding Yuan, "Operating Systems ECE344 Lecture 11: File …
  • Filesystem Considerations for Embedded Devices ELC2015 03/25/15
    Filesystem considerations for embedded devices. ELC2015, 03/25/15. Tristan Lelong, senior embedded software engineer. ABSTRACT: The goal of this presentation is to answer a question asked by several customers: which filesystem should you use within your embedded design's eMMC/SD card? These storage devices use a standard block interface, compatible with traditional filesystems, but the constraints are not those of desktop PC environments. EXT2/3/4, BTRFS, and F2FS are the first of many solutions which come to mind, but how do they all compare? Typical queries include performance, longevity, tools availability, support, and power-loss robustness. This presentation will not dive into implementation details but will instead summarize the answers with the help of various figures and meaningful test results. TABLE OF CONTENTS: 1. Introduction; 2. Block devices; 3. Available filesystems; 4. Performances; 5. Tools; 6. Reliability; 7. Conclusion. ABOUT THE AUTHOR: Tristan Lelong, embedded software engineer @ Adeneo Embedded; French, living in the Pacific Northwest; embedded software, free software, and Linux kernel enthusiast. INTRODUCTION: More and more embedded designs rely on smart memory chips rather than bare NAND or NOR. This presentation will start by describing some context to help understand the differences between NAND and MMC, some typical requirements found in embedded device designs, and potential filesystems to use on MMC devices. Focus will then move to block filesystems: how they are supported and what features they advertise. To help understand how they compare, we will present some benchmarks and comparisons regarding tools, reliability, and performance. Block devices: MMC, eMMC, SD card. Vocabulary: MMC (MultiMediaCard) is a memory card unveiled in 1997 by SanDisk and Siemens, based on NAND flash memory.
  • An Embedded Storage Framework Abstracting Each Raw Flash Device as an MTD
    An Embedded Storage Framework Abstracting Each Raw Flash Device as an MTD. Wei Wang, Deng Zhou, and Tao Xie, San Diego State University, San Diego, CA 92182. The 8th ACM International Systems and Storage Conference (SYSTOR 2015, full paper), Haifa, Israel, May 26-28, 2015. Introduction: mobile devices such as smartphones and 3-D digital cameras normally use raw flash memory devices directly managed by an embedded flash file system, a dedicated file system designed for storing files on raw flash memory devices. Thus, in addition to providing normal file system functions and an interface to the upper-layer system, flash file systems are also designed to directly control raw flash memory devices, so that they can address the inherent constraints of flash (e.g., the wear-out issue and the erase-before-write problem). Although existing embedded flash storage systems are effective for traditional mobile applications, they are becoming increasingly inadequate for emerging data-intensive mobile applications due to the following limitations: (1) insufficient storage capacity: one of the most remarkable trends of mobile computing is that emerging mobile applications are creating huge volumes of data; (2) incompetent performance: current flash file systems only support serial I/O access. Our Design 1: MTD proxy middleware. The MTD proxy middleware has the following components: address mapping, access queuing management, a multiple-work-thread layer, and a proxy interface module. With the support of these four components, the MTD proxy middleware enables an existing flash file system to communicate with an array of MTD devices without modifications of existing flash file systems, which is an attractive feature for system design and optimization. Figure 2: MTD proxy middleware.
  • State-Of-The-Art Garbage Collection Policies for NILFS2
    State-of-the-art Garbage Collection Policies for NILFS2. Diploma thesis submitted in partial fulfillment of the requirements for the degree of Diplom-Ingenieur in Software Engineering/Internet Computing by Andreas Rohner, BSc, registration number 0502196, to the Faculty of Informatics at the TU Wien. Advisor: Ao.Univ.Prof. Dipl.-Ing. Dr.techn. M. Anton Ertl. Vienna, 23rd January, 2018. Declaration of authorship (Andreas Rohner, BSc, Grundsteingasse 43/22): I hereby declare that I have written this thesis independently, that I have fully cited all sources and aids used, and that I have clearly marked as borrowings, citing the source, all passages of the work (including tables, maps, and figures) that are taken from other works or from the Internet, whether verbatim or in substance. Vienna, 23rd January, 2018. Acknowledgements: I thank my advisor Ao.Univ.Prof. Dipl.-Ing. Dr.techn. M. Anton Ertl for his patience and support. I also owe thanks to Ryusuke Konishi, the maintainer of the NILFS2 subsystem in the Linux kernel, for reviewing my patches and for his many valuable suggestions for improvement.
  • The Ongoing Evolution of Ext4 File System
    The Ongoing Evolution of the Ext4 File System: New Features and Performance Enhancements. Lukáš Czerner, Red Hat, October 24, 2011. Copyright © 2011 Lukáš Czerner, Red Hat. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.3 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the COPYING file. Part I: Statistics. Who works on Ext4? In the last year of Ext4 development there were 250 non-merge changes from 72 developers; 9 developers had at least 10 commits; 8,512 lines of code were inserted and 5,675 lines of code were deleted. Comparison with other local file systems (Developers* counts developers with at least 10 commits):

    File system   Commits   Developers   Developers*
    Ext4          250       72           9
    Ext3          95        34           2
    Xfs           294       34           4
    Btrfs         506       60           11

    [Chart: number of commits and developers per file system.] Lines of code: [chart: development of the number of lines of code for Ext3, Ext4, Xfs, and Btrfs, 2005 to 2012]. Part II: What's new in Ext4? Faster file system creation; discard support; support for file systems beyond 16 TB; punch hole support; scalability improvements; clustered allocation. Faster file system creation …
  • NOVA: The Fastest File System for NVDIMMs
    NOVA: The Fastest File System for NVDIMMs. Steven Swanson, UC San Diego. Disk-based file systems (XFS, F2FS, NILFS, EXT4, BTRFS) are inadequate for NVMM: they cannot exploit NVMM performance, and performance optimization compromises consistency on system failure [1]. [Table: atomicity results (1-sector, 1-block, and N-block overwrite and append) for Ext4 in writeback, ordered, and data-journal modes, Btrfs, xfs, and Reiserfs.] [1] Pillai et al., All File Systems Are Not Created Equal: On the Complexity of Crafting Crash-Consistent Applications, OSDI '14. BPFS, SCMFS, PMFS, Aerie, EXT4-DAX, XFS-DAX, NOVA, M1FS: previous prototype NVMM file systems are not strongly consistent, and DAX does not provide a data atomicity guarantee, so programming is more difficult. [Table: atomicity (metadata, data, mmap) and snapshot support for BPFS, PMFS, Ext4-DAX, Xfs-DAX, SCMFS, and Aerie.] Ext4-DAX and xfs-DAX shortcomings: no data atomicity support; a single journal shared by all transactions (JBD2-based); poor performance; development teams are (rightly) "disk first". NOVA provides strong atomicity guarantees. [Table: NOVA's atomicity guarantees compared with the file systems above.] © 2017 SNIA Persistent Memory Summit. All Rights Reserved.
  • I/O Stack Optimization for Smartphones
    I/O Stack Optimization for Smartphones. Sooman Jeong (1), Kisung Lee (2), Seongjin Lee (1), Seoungbum Son (2), and Youjip Won (1). (1) Hanyang University, Seoul, Korea; (2) Samsung Electronics, Suwon, Korea. Abstract: The Android I/O stack consists of elaborate and mature components (SQLite, the EXT4 filesystem, interrupt-driven I/O, and NAND-based storage) whose integrated behavior is not well orchestrated, which leaves substantial room for improvement. We overhauled the block I/O behavior of five filesystems (EXT4, XFS, BTRFS, NILFS, and F2FS) under each of the five different journaling modes of SQLite. We found that the most significant inefficiency originates from the fact that the filesystem journals the database journaling activity; we refer to this as the JOJ (Journaling of Journal) anomaly. The JOJ overhead compounds in EXT4 when the bulky EXT4 journaling activity is triggered by an fsync() call from SQLite. …ing devices for a variety of applications, including social network services, games, cameras, camcorders, mp3 players, and web browsers. The application performance of a smartphone is not governed by the speed of its airlink, e.g., Wi-Fi, but rather by the storage performance, which is currently utilized in a quite inefficient manner [11]. Furthermore, one of the main sources of this inefficiency is excessive I/O activity caused by uncoordinated interactions between EXT4 journaling and SQLite journaling [14]. Despite its significant implications for overall smartphone performance, the I/O subsystem behavior of smartphones has not been studied nearly as thoroughly as those of enterprise servers [26, 23] and web servers [4, 10].
  • Android Boot Optimization on DRA7xx Devices (Rev. A)
    Application Report SPRAC30A (January 2016, revised February 2018). Android Boot Optimization on DRA7xx Devices. ABSTRACT: Boot time optimizations are critical for achieving the perception of immediate application availability in an automotive infotainment system. They impact the end user's experience by improving the launch time of applications and infotainment use cases. This application report captures the details on how to optimize the Android software stack, along with a few optional early infotainment system features, and is meant to be used as a reference guide. The end product owner (OEM/ODM/customer) can review the guidance for improving boot time for their specific Android-powered system. Contents: 1. Introduction; 2. Build Instructions; 3. Deploying Instructions; 4. Tooling/Profiling; 5. Optimization Steps; 6. Adding Early Infotainment (IVI) Features to the Android System; 7. Recommended Userspace Optimizations …