Systematically Testing File-System Crash Consistency

[Published at OSDI 2018]
Jayashree Mohan, University of Texas at Austin
[email protected]

Agenda
• Crash Consistency
• Testing Crash-Consistency
• Bounded Black-Box Testing
• CrashMonkey
• Ace
• Demo

What is a File System?
• A file system is a structured representation of data, together with a set of metadata describing that data.
• Data includes abstractions like files and directories.
• File system data structures are persisted: stored on hard disks, SSDs, etc.

What is a Crash?
• An event that interrupts the ongoing processes in the system
• The current working state in memory is lost
• Storage is left in an intermediate state

[Figure: a cartoon machine crashes while the user believes an important file was saved. Image source: https://www.fotolia.com]

I wish file systems were crash-consistent!

File System Crash Consistency
1. Ordering: file system operations change multiple blocks on storage, and those changes need to be ordered
   • Inode, bitmaps, data blocks, superblock
2. Persistence: data structures are cached for better performance
   • Great for reads!
   • But writes have to ensure that modified data in the cache is written back to disk

Example of a Crash Scenario
• Let's consider what happens during a file append.

File Append Example
[Figure: appending to file foo (inode: owner root, permissions rw) updates three on-disk structures: the inode goes from v1 (size 1, one block pointer) to v2 (size 2, a second block pointer), the data bitmap marks the new block as allocated, and the new data block D2 is written. These updates must occur in a specific order.]
(Example adapted from https://cbw.sh/static/class/5600/slides/10_File_Systems.pptx)

If we crash after only one of the three updates reaches disk:
• Only the data block D2 is written: the file system is consistent, but the data is lost.
• Only the inode v2 is written: the file system is inconsistent; the inode points to garbage data.
• Only the data bitmap is written: the file system is inconsistent; this is a space leak.

How did FS developers handle this problem?
1. Don't bother to ensure consistency (ext2)
   • Run a program that fixes the file system during bootup: the file system checker (fsck)
   • Results in data loss, but fixes the inconsistency
2. Use a transaction log to make multi-block writes atomic (ext4)
   • The log stores a history of all writes to the disk
   • After a crash, the log can be "replayed" to finish updates
   • Journaling file systems: ext4, f2fs, btrfs, xfs

Journaling addresses ordering; persistence is the second challenge.

The Persistence Operations
• Journaling file systems aim to ensure crash consistency.
• But they can still lose data if file system operations are not persisted explicitly.
• Changes stay in memory until they are explicitly flushed (fsync(), fdatasync(), sync) or until a file system background checkpoint at regular timeouts.

[Figure: a file foo that is fsync()ed before a crash survives it; the same file written without fsync() is gone after the crash. Bug!]
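The slides do not show code, but explicit persistence looks roughly like the following sketch (the file name, contents, and error handling are illustrative). Note that for a newly created file, fsync() of the file alone is often not enough: on many file systems the directory entry is only durable once the parent directory is fsync()ed too.

```c
/* Sketch: durably creating and writing a file, with explicit
 * persistence points. File name and error handling are
 * illustrative, not from the talk. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    int fd = open("foo", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }
    if (write(fd, "important data\n", 15) != 15) { perror("write"); return 1; }

    /* Without this fsync(), the data may exist only in the page
     * cache and be lost on a crash. */
    if (fsync(fd) < 0) { perror("fsync"); return 1; }
    close(fd);

    /* On many file systems, the new directory entry is durable
     * only after the parent directory is fsync()ed as well. */
    int dirfd = open(".", O_RDONLY | O_DIRECTORY);
    if (dirfd < 0) { perror("open ."); return 1; }
    if (fsync(dirfd) < 0) { perror("fsync ."); return 1; }
    close(dirfd);
    return 0;
}
```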
Crash Consistency
• On a crash, all in-memory components of the file system's structures are lost.
• Ensuring crash consistency requires that all information needed to recover the correct state be persisted, in order.
• Otherwise the result is metadata corruption, data corruption, or an unmountable file system.

Rename Atomicity Bug in btrfs
Consider the following workload:

mkdir (A)
touch (A/bar)
fsync (A/bar)
mkdir (B)
touch (B/bar)
rename (B/bar, A/bar)
touch (A/foo)
fsync (A/foo)
CRASH!

Expected state after recovery: directory A contains both bar and foo.
Actual state: A contains only foo; the persisted file A/bar is missing.

What just happened?
• Internally, rename (B/bar, A/bar) is performed in two steps:
  1. unlink (A/bar)
  2. mv (B/bar, A/bar)
• These two steps must be atomic.
• fsync (A/foo) commits the transaction that unlinks A/bar.
• Which means step 1 above is persisted, but the rename is not.
• We end up losing file A/bar.

This bug had existed in the kernel since 2014! It was found by Ace and CrashMonkey.
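The workload above can be written as an ordinary C program. The sketch below follows the slide's sequence of system calls; the helper function, modes, and error handling are illustrative additions, since the talk shows the workload only as pseudo-ops. To observe the bug on an affected kernel, a crash has to be injected right after the final fsync().

```c
/* Sketch of the btrfs rename-atomicity workload from the slides.
 * Helper, modes, and error handling are illustrative. */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

static void die(const char *msg) { perror(msg); exit(1); }

/* touch: create an empty file, optionally fsync it */
static void touch(const char *path, int do_fsync) {
    int fd = open(path, O_CREAT | O_WRONLY, 0644);
    if (fd < 0) die(path);
    if (do_fsync && fsync(fd) < 0) die("fsync");
    close(fd);
}

int main(void) {
    if (mkdir("A", 0755) < 0) die("mkdir A");        /* mkdir (A) */
    touch("A/bar", 1);                   /* touch (A/bar) + fsync (A/bar) */
    if (mkdir("B", 0755) < 0) die("mkdir B");        /* mkdir (B) */
    touch("B/bar", 0);                               /* touch (B/bar) */
    if (rename("B/bar", "A/bar") < 0) die("rename"); /* rename (B/bar, A/bar) */
    touch("A/foo", 1);                   /* touch (A/foo) + fsync (A/foo) */
    /* CRASH here: A/bar was explicitly persisted, so it should
     * survive recovery; on buggy btrfs it is lost. */
    return 0;
}
```

Because fsync (A/bar) explicitly persisted the file, a correct file system must preserve it across the crash; the non-atomic two-step rename is what breaks that guarantee.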
… ➢ Focus on bugs where explicitly Workload 1 Workload n persisted data/metadata is corrupted CrashMonkey ➢ Found 10 new bugs across btrfs and F2FS; ➢ Found 1 bug in FSCQ (verified Output: file system) Bug report with workload, expected state, actual state www.github.com/utsaslab/crashmonkey 35 CrashMonkey and Ace : Features Completely Fully automated black-box File system Finds real bugs agnostic 36 CrashMonkey and Ace : Features Completely black- Fully automated box No manual checkers and workloads File system Finds real bugs agnostic 37 CrashMonkey and Ace : Features Completely Fully automated black-box No annotations Doesn’t need to look at file system code File system Finds real bugs agnostic 38 CrashMonkey and Ace : Features Completely black- Fully automated box Works on any POSIX file system, including verified FS File system Finds real bugs agnostic 39 CrashMonkey and Ace : Features Completely black- Fully automated box Acknowledged and patched by kernel developers File system Finds real bugs agnostic 40 B3 vs other approaches Metric Verified FS Model Check xfstests B3 Fully Automated Black Box FS agnostic Find previously unknown bugs 41 Agenda Crash Consistency Testing Crash-Consistency CrashMonkey Bounded Black Box Testing Ace Demo 42 Challenges with systematic testing Lack of Challenges Infinite automated workload infrastructure space 43 CrashMonkey • Efficient infrastructure to record and replay block level IO requests • Simulate crash at different points in the workload • Automatically test for consistency after crash. • Copy-on-write RAM block device 44 How to crash the file system for testing? • Actually crash the file system 1. Randomly power cycle the VM or the server • Restarting the VM after crash is slow! • Unlikely to reveal bugs 2. Run the file system in user space • Not all file systems can be run as user space processes • Redesign file systems SIMULATE crashes instead of trying to actually crash the system! 45 How does CrashMonkey simulate crashes? Workload User space mkdir (A) touch (A/bar) fsync (A/bar) Kernel space VFS + page Cache + mkdir (B) Filesystem file system touch (B/bar) rename (B/bar, A/bar) touch (A/foo) Generic Block Block IO (BIO) requests fsync (A/foo) Layer CRASH! Block Device Driver Device specific driver Block Device (HDD/ SSD) 46 CrashMonkey Architecture Crash State 1 Workload Test harness User space Crash State 2 Kernel space Filesystem • Trace all BIO requests Generic Block Layer • To simulate crash, simply drop the BIO requests after a particular point. Device Wrapper Custom RAM Block Device 47 IO due to workload CrashMonkey in Action Persistence point Workload Initial FS state Final FS state 48 IO due to workload CrashMonkey in Action Persistence point Workload Initial FS state 49 IO due to workload Phase 1 : Record IO Persistence point Workload IO forced by unmount Initial FS state Oracle Record IO up to persistence point Safely unmount 50 IO due to workload Phase 2 : Replay