Tolera&Ng File‐System Mistakes with Envyfs

Tolera&Ng File‐System Mistakes with Envyfs

Tolerang File‐System Mistakes with EnvyFS Swaminathan Sundararaman Lakshmi N. Bairavasundaram Andrea C. Arpaci‐Dusseau NetApp, Inc. Remzi H. Arpaci‐Dusseau University of Wisconsin Madison File Systems in Today’s World • Modern file systems are complex – Tens of thousands of lines of code (e.g., XFS 45K LOC) • Storage stack is also geng deeper – Hypervisor, network, logical volume manager • Need to handle a gamut of failures – Memory allocaon, disk faults, bit flips, system crashes • Preserve integrity of its meta‐data and user data 6/18/09 Tolerang File‐System Mistakes with EnvyFS 2 File System Bugs • Bug reports for Linux 2.6 series from Bugzilla – ext3: 64, JFS: 17, ReiserFS: 38 – Some are FS corrupon causing permanent data loss • FS bugs broadly classified into two categories – “fail‐stop”: System immediately crashes • Soluons: Nooks [Swi 04], CuriOS [David08] – “fail‐silent”: Accidentally corrupt on‐disk state • Many such bugs uncovered [Prabhakaran05, Gunawi08, Yang04, Yang06b] 6/18/09 Tolerang File‐System Mistakes with EnvyFS 3 Bugs are inevitable in file systems Challenge: how to cope with them? 6/18/09 Tolerang File‐System Mistakes with EnvyFS 4 N‐Version File Systems • Based on N‐version programming [Avizienis77] – NFS servers [Rodrigues01], databases [Vandiver07], security [Cox06] • EnvyFS: Simple soware layer Applicaon EnvyFS layer – Store data in N child file systems – Operaons performed on all children … Child N Child 2 • Rely on a simple soware layer Child 1 • Challenge: reducing overheads while Disk driver SIS layer retaining reliability – SubSIST: Novel Single Instance Store Disk 6/18/09 Tolerang File‐System Mistakes with EnvyFS 5 Results • Robustness – Tradional file systems handle few corrupons (< 4%) – EnvyFS3 tolerates 98.9% of single file system mistakes • Performance – Desktop workloads: EnvyFS3 has Comparable performance – I/O intensive workloads: • Normal mode: EnvyFS3 + SubSIST acceptable performance • Under memory pressure: EnvyFS3 + SubSIST large overheads • Potenal as a debugging tool for FS developers – Pinpoint the source of “fail‐silent” bug in ext3 6/18/09 Tolerang File‐System Mistakes with EnvyFS 6 Outline • Introducon • Building reliable file systems • Reducing overheads with SubSIST • Evaluaon • Conclusion 6/18/09 Tolerang File‐System Mistakes with EnvyFS 7 N‐Version Systems Development process: 1. Producing the specificaon of soware 2. Implemenng N versions of the soware 3. Creang N‐version layer — Executes different versions — Determines the consensus result 6/18/09 Tolerang File‐System Mistakes with EnvyFS 8 1. Producing Specificaon • Our own specificaon ? – Impraccal: Requires wide scale changes to file systems – Specificaons take years to get accepted • Can we leverage exisng specificaon ? – Yes, can leverage VFS, but there are some issues • VFS not precise for N‐versioning purpose – Needs to handle cases where specificaon is not precise – e.g., Ordering directory entries, inode number allocaon 6/18/09 Tolerang File‐System Mistakes with EnvyFS 9 Imprecise VFS Specificaon File 1 Ordering directory entries File 2 File 3 Dir: test • Issue: File 1 – No specified return order File 2 Readdir: test No Entries File 3 – Can’t blindly compare entries EnvyFS layer File 1 File 2 File 3 • Soluon: – Read all entries from a directory File 1 File 2 File 3 (dir: test in our case) from all FSes File 2 File 3 File 1 FS X … FS Y FS Z – Match entries from FSes File 3 File 1 File 2 – Return majority results Dir: test Dir: test Dir: test 6/18/09 Tolerang File‐System Mistakes with EnvyFS 10 Imprecise VFS Specificaon (cont) • Inode number allocaon – Inode numbers returned through system calls – Each child file system issues different inode numbers – Possible soluon: Force file systems to use same algorithm? – Our soluon: Issue inode numbers at EnvyFS layer Stat: File 1 File 1 | ?? 15 Virt # FS 1 FS 2 FS 3 EnvyFS layer 15 10 36 65 File 1 | 10 File 1 | 36 File 1 |65 Inode Mapping Table File 1 10 File 2 04 File 3 99 File 2 15 File 3 44 File 1 65 FS Z FS X File 3 16 File 1 36 FS Y File 2 43 Inode Mapping Table not persistently stored Dir: test Dir: test Dir: test 6/18/09 Inode Numbers Tolerang File‐System Mistakes with EnvyFS 11 2. Implemenng N versions of FS • Painful process – High cost of development, long me delays • Lucky! Hard work already done for us – 30 different disk based file systems in Linux 2.6 • Which file systems to use? – ext3, JFS, ReiserFS in a three‐version FS – Others should work without modificaons 6/18/09 Tolerang File‐System Mistakes with EnvyFS 12 3. Creang N‐Version Layer • N‐Version layer (EnvyFS) Applicaon – Inserted beneath VFS Read (file, 1 block) err , D – Simple design to avoid bugs VFS layer Read (file, 1 block) err , D EnvyFS Inode Mapping Table • Example: Reading a file Layer Wrappers Comparators – Allocate N data buffers err = Read (…) err = Read (…) err = Read (…) – Read data block from the disk D D D – Compare: data, return code, file posion F F F – Return: data, return code pos: x pos: x pos: x … JFS ext3 ReiserFS • Issues: D D D – Allocate memory for each read operaon – Extra copy from allocated buffer to applicaon Disk – Comparison overheads 6/18/09 Tolerang File‐System Mistakes with EnvyFS 13 Reading a File in EnvyFS • Soluon: Applicaon Read (file, 1 block) err , – Same applicaon buffer for all FS D – TCP‐like checksums for data comparison VFS layer Read (file, 1 block) err , D – Compare: checksums, return code, file EnvyFS Inode Mapping Table posion Layer Wrappers Comparators – Read data unl majority err = Read (…) err = Read (…) err = Read (…) D D D F F F pos: x pos: x pos: x … JFS ext3 ReiserFS D D D FS 1 # FS 2 # … FS N # 435 435 … 436 Disk Checksums 6/18/09 Tolerang File‐System Mistakes with EnvyFS 14 Outline • Introducon • Building reliable file systems • Reducing overheads with SubSIST • Evaluaon • Conclusion 6/18/09 Tolerang File‐System Mistakes with EnvyFS 15 Case for Single Instance Storage (SIS) Applicaon • Ideal: One disk per FS VFS layer 1 • Praccal: One disk for all FS EnvyFS layer 1 2 N • Overheads FS 1 FS 2 – Effecve storage space: 1/N … FS N – N mes more I/O (Read/write) 1 2 N Disk 1 Disk Req. Queue Disk 2 … Disk N • Challenge: Maintain diversity Part 1 Part 2 Disk … Part N while minimizing overheads 6/18/09 Tolerang File‐System Mistakes with EnvyFS 16 SubSIST: Single Instance Store Applicaon • Variant of an Single Instance Store – Selecvely merges data blocks VFS layer D • Block addressable SIS EnvyFS layer – Exports virtual disks to FSes D D D – Manages mapping, free space info. FS 1 … FS 2 – Not persistently stored on disk FS N M D M D M D • EnvyFS writes through N file systems Vdisk 1 Vdisk 2 Vdisk N – N data blocks merged to 1 data block – Content hashes not stored persistently CHash Layer Read Cache SubSIST – Meta‐data blocks not merged Free Space Management – Inter FS blocks and not intra FS Disk 6/18/09 Tolerang File‐System Mistakes with EnvyFS 17 Handling Data Block Corrupons? Applicaon Corrupon to data in a single FS VFS layer – Due to bugs, bit flips, storage stack – Corrupt data blocks not merged EnvyFS layer – All other N‐1 data blocks merged D D D – Corrupt data block fixed at next read FS 1 … FS 2 FS N × Corrupon to data block inside disk D D D • Single copy of data Vdisk 1 Vdisk 2 Vdisk N D – Different code paths CHash Layer Read Cache SubSIST – Different on‐disk structures Free Space Management D Disk 6/18/09 Tolerang File‐System Mistakes with EnvyFS 18 Outline • Introducon • Building reliable file systems • Reducing overheads with SubSIST • Evaluaon – Reliability – Performance • Conclusion 6/18/09 Tolerang File‐System Mistakes with EnvyFS 19 Reliability Evaluaon: Fault Injecon VFS Robustness of EnvyFS in recovering from a child file system’s mistake? EnvyFS layer • Corrupon: bugs in FS / storage stack JFS ext 3 • Types of disk blocks ReiserFS – superblock, inode, block bitmap, file data, … B B Pseudo • Perform different file ops Type‐aware fault injecon [Prabhakaran05] Device Driver – mount, stat, creat, unlink, read, … B B B • Report user visible results Block Driver • All results are applicable with SubSIST except corrupon to data blocks B B B Disk 6/18/09 Tolerang File‐System Mistakes with EnvyFS 20 ) ) Normal ) ext3 stat, … chmod fsync Data E loss path traversal SET‐1 ( SET‐2 ( read readlink getdirentries creat link mkdir rename symlink write truncate rmdir unlink mount SET‐3 ( umount Cannot INODE mount DIR Ops fail BMAP Data IMAP Corrupt INDIRECT Result Matrix Crash DATA SUPER Read‐only JSUPER Depends GDESC e N/A 6/18/09 Tolerang File‐System Mistakes with EnvyFS 21 ext3 6/18/09 GDESC JSUPER SUPER DATA INDIRECT IMAP BMAP DIR INODE E but, does not handle superblock corrupon Ext3 stores many superblock copies; path traversal SET‐1 (stat, …) SET‐2 (chmod) read Tolerang File‐System Mistakes with EnvyFS readlink getdirentries creat link mkdir rename symlink write truncate rmdir unlink mount SET‐3 (fsync) umount Cannot mount Data N/A loss 22 ) ) ) ext3 stat, … chmod fsync E path traversal SET‐1 ( SET‐2 ( read readlink getdirentries creat link mkdir rename symlink write truncate rmdir unlink mount SET‐3 ( umount INODE Data DIR loss BMAP • In addion to operaons failing, inode Cannot mount IMAP Corrupon leads to data loss INDIRECT Ops fail • Unlink: system crash during unmount DATA Crash SUPER JSUPER N/A GDESC 6/18/09 Tolerang File‐System Mistakes with EnvyFS 23 ) ) Normal ) ext3 stat, … chmod fsync Data E loss path traversal SET‐1 ( SET‐2 ( read readlink getdirentries creat link mkdir rename symlink write truncate rmdir unlink mount SET‐3 ( umount Cannot INODE mount DIR Ops fail

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    36 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us