Systematically Testing File-System Crash Consistency

[Published at OSDI 2018]
Jayashree Mohan
University of Texas at Austin
[email protected]

Agenda
• Crash Consistency
• Testing Crash-Consistency
• CrashMonkey
• Bounded Black Box Testing
• Ace
• Demo

What is a File System?
• A file system is a structured representation of data and a set of metadata describing this data.
• Data includes abstractions like files and directories
• File system data structures are persisted
  • Stored on hard disk, SSDs etc.

What is a crash?
• An event that results in interruption of ongoing processes in the system
• Loss of current working state in memory
• Storage left in an intermediate state

Crashes
[Figure: cartoon with the captions "This is very important…", "File saved!" and "I crashed ☹". Image source: https://www.fotolia.com]

I wish file systems were crash-consistent!

File System Crash Consistency
1. Ordering: File-system operations change multiple blocks on storage, and these changes need to be ordered
  • Inode, bitmaps, data blocks, superblock
2. Persistence: Data structures are cached for better performance
  • Great for reads!
  • But writes have to ensure that modified data in cache is written back to disk

Example of a crash scenario
• Let's consider what happens during a file append

File Append Example
[Figure: appending a block to file foo. Inode before and after the append: owner root; permissions rw; size 1 → 2; pointers 4, null, null, null → 4, 5, null, null. On-disk layout: inode bitmap, data bitmap, inodes (v1 → v2), data blocks (D1, D2).]
• The append performs three updates: write the data, update the inode, update the data bitmap
• These updates must occur in a specific order
Example adapted from https://cbw.sh/static/class/5600/slides/10_File_Systems.pptx

[Figure: on-disk state when only one of the three updates reaches storage before a crash.]
• Only "write the data" persisted: file system consistent, but data is lost
• Only "update the inode" persisted: file system inconsistent: garbage data
• Only "update data bitmap" persisted: file system inconsistent: space leak

How did FS developers handle this problem?
1. Don't bother to ensure consistency
  • Run a program that fixes the file system during bootup
  • File system checker (fsck)
  • Results in data loss, but fixes inconsistency
  • Example: ext2
2. Use a transaction log to make multi-writes atomic
  • Log stores a history of all writes to the disk
  • After a crash the log can be "replayed" to finish updates
  • Journaling file system (ext4, f2fs, btrfs, xfs)
  • Example: ext4

The Persistence Operations
• Journaling file systems aim to ensure crash consistency
• But, can result in data loss if file system operations are not persisted explicitly
• Changes are in memory until explicitly flushed (or a file system background checkpoint at regular timeouts)
• fsync(), fdatasync(), sync
[Figure: file foo in memory and on storage around an fsync() call, with one outcome labeled "Bug!"]
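The explicit persistence calls listed above can be tried from user space. What follows is a minimal sketch, assuming a POSIX system: `durable_write` and the file name `foo` are hypothetical names chosen for illustration, while `os.fsync` is Python's standard wrapper around fsync(). It is an illustration of the idea, not code from the talk.

```python
import os
import tempfile

def durable_write(path, data):
    """Write data to path and explicitly persist it (hypothetical helper)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)  # ask the kernel to flush this file's cached changes to storage
    finally:
        os.close(fd)
    # Also fsync the parent directory so the new directory entry is persisted.
    dir_fd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)

workdir = tempfile.mkdtemp()
target = os.path.join(workdir, "foo")
durable_write(target, b"hello")
```

Without the two `os.fsync` calls the write may sit in the page cache until a background flush, which is the window in which a crash loses it.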

Crash Consistency
• On a crash, all in-memory components of file-system structures are lost.
• Ensuring crash consistency requires that all necessary information to recover the correct state be persisted in order.
[Figure: possible consequences: metadata corruption, data corruption, unmountable file system.]

Rename atomicity bug in btrfs
Workload:
  mkdir (A)
  touch (A/bar)
  fsync (A/bar)
  mkdir (B)
  touch (B/bar)
  rename (B/bar, A/bar)
  touch (A/foo)
  fsync (A/foo)
  CRASH!
• After fsync (A/bar), storage holds A/bar
• Expected state on storage after the crash: directory A contains bar and foo
• Actual state on storage after the crash: directory A contains only foo; the persisted file A/bar is missing
• Exists in the kernel since 2014!
• Found by ACE and CrashMonkey

What just happened?
• rename (B/bar, A/bar) is carried out as two steps, which must have been atomic:
  1. unlink (A/bar)
  2. mv (B/bar, A/bar)
• fsync(A/foo) commits the transaction that unlinks A/bar
• Which means step 1 above is persisted, but the rename is not persisted
• End up losing file A/bar

Testing Crash Consistency Today
• Verified file systems: build FS from scratch
• Model checking: annotate file systems
• Hard to do for existing FS
• State of the art: xfstests suite
  • Collection of 500 regression tests
Only 5% of tests in xfstests check for file system crash consistency

Challenges in crash consistency testing
• Lack of automated infrastructure
• Infinite workload space
Our work addresses both these issues, to provide a systematic testing framework

Bounded Black-Box Crash Testing (B3)
New approach to testing file-system crash consistency
➢ Focus on reproducible bugs resulting in metadata corruption, data loss.
➢ Focus on bugs where explicitly persisted data/metadata is corrupted
➢ Found 10 new bugs across btrfs and F2FS
➢ Found 1 bug in FSCQ (verified file system)
[Figure: pipeline. Inputs: bounds (length, operations, args) and the target file system → Automatic Crash Explorer (ACE) → Workload 1 … Workload n → CrashMonkey → Output: bug report with workload, expected state, actual state.]
www.github.com/utsaslab/crashmonkey

CrashMonkey and Ace: Features
• Fully automated: no manual checkers and workloads
• Completely black-box: no annotations; doesn't need to look at file system code
• File system agnostic: works on any POSIX file system, including verified FS
• Finds real bugs: acknowledged and patched by kernel developers

B3 vs other approaches
[Table: columns Verified FS, Model Check, xfstests, B3; rows Fully Automated, Black Box, FS agnostic, Find previously unknown bugs; cell values not recoverable.]

Challenges with systematic testing
• Lack of automated infrastructure
• Infinite workload space

CrashMonkey
• Efficient infrastructure to record and replay block level IO requests
• Simulate crash at different points in the workload
• Automatically test for consistency after crash.
• Copy-on-write RAM block device

How to crash the file system for testing?
• Actually crash the file system
  1. Randomly power cycle the VM or the server
    • Restarting the VM after crash is slow!
    • Unlikely to reveal bugs
  2. Run the file system in user space
    • Not all file systems can be run as user space processes
    • Redesign file systems
SIMULATE crashes instead of trying to actually crash the system!

How does CrashMonkey simulate crashes?
[Figure: the I/O path taken by the workload.]
• User space: the workload (mkdir (A), touch (A/bar), fsync (A/bar), mkdir (B), touch (B/bar), rename (B/bar, A/bar), touch (A/foo), fsync (A/foo), CRASH!)
• Kernel space:
  • Filesystem: VFS + page cache + file system
  • Generic Block Layer: Block IO (BIO) requests
  • Block Device Driver: device specific driver
• Block Device (HDD/SSD)

CrashMonkey Architecture
• User space: workload, test harness, crash states (Crash State 1, Crash State 2)
• Kernel space: Filesystem, Generic Block Layer, Device Wrapper, Custom RAM Block Device
• Trace all BIO requests
• To simulate crash, simply drop the BIO requests after a particular point.

CrashMonkey in Action
[Figure: timeline of the workload from the initial FS state to the final FS state. Legend: IO due to workload, persistence point.]

Phase 1: Record IO
[Figure: timeline starting at the initial FS state. Legend: IO due to workload, persistence point, IO forced by unmount. Labels: Initial FS state, Oracle.]
• Record IO up to persistence point
• Safely unmount

Phase 2: Replay
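
The record-and-replay idea from the slides above (trace the block IO, then drop everything after a chosen point) can be sketched as a toy model. This is not CrashMonkey's code: `BlockWrite`, `PERSISTENCE_POINT`, `replay` and `crash_states` are hypothetical names, and the disk is modeled as a plain dictionary from block number to contents.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class BlockWrite:
    """One recorded write request: a block number and its new contents."""
    block: int
    data: bytes

# Marker placed in the trace where the workload persists data (fsync, sync).
PERSISTENCE_POINT = "persistence-point"

def replay(initial, trace, upto):
    """Rebuild a disk image from the first `upto` trace entries only.

    Entries after that point are dropped, which models the crash.
    """
    disk = dict(initial)  # copy, so the initial image stays untouched
    for entry in trace[:upto]:
        if isinstance(entry, BlockWrite):
            disk[entry.block] = entry.data
    return disk

def crash_states(initial, trace):
    """Yield one simulated crash state per persistence point in the trace."""
    for index, entry in enumerate(trace):
        if entry == PERSISTENCE_POINT:
            yield replay(initial, trace, index + 1)

initial_disk = {0: b"superblock"}
recorded = [
    BlockWrite(1, b"inode A/bar"),
    PERSISTENCE_POINT,
    BlockWrite(2, b"inode A/foo"),
    PERSISTENCE_POINT,
    BlockWrite(3, b"unflushed"),  # issued after the last persistence point
]
states = list(crash_states(initial_disk, recorded))
```

Each element of `states` is a disk image that a checker could compare against a reference image taken after a clean unmount. Copying the initial image for every replay plays the role that the copy-on-write RAM block device plays in the slides.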
