Isolation File Systems
Physical Disentanglement in a Container-Based File System

Lanyue Lu, Yupu Zhang, Thanh Do, Samer Al-Kiswany, Andrea C. Arpaci-Dusseau, Remzi H. Arpaci-Dusseau
University of Wisconsin-Madison

Motivation

Isolation in various subsystems
  ➡ resources: virtual machines, Linux containers
  ➡ security: BSD jail, sandbox
  ➡ reliability: address spaces
File systems lack isolation
  ➡ physical entanglement in modern file systems
Three problems due to entanglement
  ➡ global failures, slow recovery, bundled performance

Global Failures

Definition
  ➡ a failure that impacts all users of the file system or even the operating system
Read-Only
  ➡ mark the file system as read-only
  ➡ e.g., metadata corruption, I/O failure
Crash
  ➡ crash the file system or the operating system
  ➡ e.g., unexpected states, pointer faults

[Figure: Ext3. Number of failure instances (Read-Only vs. Crash) across Linux 3.X versions 0 to 15.]

[Figure: Ext4. Number of failure instances (Read-Only vs. Crash) across Linux 3.X versions 0 to 15.]

Slow Recovery

File system checker
  ➡ repairs a corrupted file system
  ➡ usually offline
Current checkers are not scalable
  ➡ increasing disk capacities and file system sizes
  ➡ scan the whole file system
  ➡ checking time is unacceptably long

Scalability of fsck on Ext3

File-system Capacity    Fsck Time (s)
200GB                   231
400GB                   476
600GB                   723
800GB                   1007

Bundled Performance

Shared transaction
  ➡ all updates share a single transaction
  ➡ unrelated workloads affect each other
Consistency guarantee
  ➡ the same journal mode for all files
  ➡ limited flexibility for different tradeoffs

Bundled Performance on Ext3

[Figure: throughput (MB/s) of SQLite and Varmail running alone and together on Ext3. Data labels: 146.7, 76.1, 20, 1.9.]

Server Virtualization

[Diagram: VM1 to VM5 store their virtual disks VD1 to VD5 on one shared file system.]

[Diagram: metadata corruption in the shared file system leads to Read-Only or Crash. All VMs will crash.]

Virtual Machines

[Figure: throughput (IOPS) of VM1, VM2 and VM3 over time (0 to 700 s). Annotation: fsck: 496s + bootup: 68s.]

Our Solution

A new abstraction for disentanglement
A disentangled file system: IceFS
Three major benefits of IceFS
  ➡ failure isolation
  ➡ localized recovery
  ➡ specialized journaling

[Outline: Motivation, IceFS, Evaluation, Conclusion]

A New Abstraction: Cube

What is a cube
  ➡ an independent container for a group of files
  ➡ physically isolated
Interface
  ➡ create a cube: mkdir(cube_flag)
  ➡ delete a cube: rmdir()
  ➡ add files to a cube
  ➡ remove files from a cube
  ➡ set cube attributes

Cube Example

[Diagram: a directory tree with nodes /, a, b, c1, c2 and d, partitioned into Cube 1 and Cube 2.]

A Disentangled File System

No shared physical resources
  ➡ no shared metadata: e.g., block groups
  ➡ no shared disk blocks or buffers
No access dependency
  ➡ partition linked lists or trees
  ➡ avoid directory hierarchy dependency
No bundled transactions
  ➡ use separate transactions
  ➡ enable customized journaling modes

IceFS

A prototype based on Ext3 in Linux 3.5
Disentanglement
  ➡ directory indirection
  ➡ transaction splitting

Directory Indirection

[Diagram: the Cube Example directory tree (/, a, b, c1, c2, d) with Cube 1 and Cube 2.]

Pathname matching for cubes' top directory paths
Cubes' dentries are pinned in memory

Transaction Splitting

[Diagram: (a) Ext3/Ext4 and (b) IceFS. In each panel App 1, App 2 and App 3 write to an on-disk journal. Legend: in-memory tx, preallocated tx, committed tx.]

Separate transactions from different cubes commit to the disk journal in parallel

Benefits of Disentanglement

Localized reactions to failures
  ➡ per-cube read-only and crash
Localized recovery
  ➡ only check faulty cubes
  ➡ offline and online
Specialized journaling
  ➡ parallel journaling
  ➡ diverse journal modes: no journal, no fsync, writeback, ordered, data

Fast Recovery

File-system Capacity    Ext3 Fsck Time (s)    IceFS Fsck Time (s)
200GB                   231                   35
400GB                   476                   64
600GB                   723                   91
800GB                   1007                  122

Specialized Journaling

[Figure: throughput (MB/s) of SQLite and Varmail on Ext3 and IceFS under different journal-mode combinations (OR: ordered, NJ: no journal). Combinations shown: OR/OR on Ext3; OR/OR, NJ/OR and OR/NJ on IceFS. Data labels: 220.3, 120.6, 125.4, 103.4, 76.1, 1.9, 9.8, 5.6.]

Server Virtualization

[Diagram: VM1 to VM5 store their virtual disks VD1 to VD5 on one shared file system; metadata corruption occurs.]

[Figure: throughput (IOPS) of VM1, VM2 and VM3 over time (0 to 700 s), two panels. IceFS-Offline: fsck: 35s + bootup: 67s. IceFS-Online: fsck: 74s + bootup: 39s.]

Conclusion

File systems lack physical isolation
Our contribution
  ➡ a new abstraction: cube
  ➡ a disentangled file system: IceFS
  ➡ demonstrate its benefits:
      ➡ isolated failures
      ➡ localized recovery
      ➡ specialized journaling
  ➡ improving both reliability and performance

Questions?

IceFS Disk Layout

[Diagram: SB | S0 S1 S2 ... Sn | bg bg bg bg bg, with block groups assigned to cube 0, cube 1 and cube 2. Each sub-super block holds the cube inode number, cube pathname, orphan inode list and cube attributes.]

Each cube has one sub-super block (Si) and its own block groups (bg).
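
The disk layout, directory indirection and localized recovery described in these slides can be tied together in a small in-memory model. The Python sketch below is only an illustration under stated assumptions, not the IceFS implementation: the class names (`SubSuperBlock`, `Cube`, `ToyIceFS`), the method names, the example paths `/vm1` and `/vm2`, and the use of longest-prefix matching are all hypothetical. Only the sub-super block fields (cube inode number, cube pathname, orphan inode list, cube attributes) and the rule that each cube owns its own block groups come from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SubSuperBlock:
    """Per-cube metadata; field list taken from the disk-layout slide."""
    cube_inode: int
    cube_pathname: str
    orphan_inodes: List[int] = field(default_factory=list)
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class Cube:
    sub_super: SubSuperBlock
    block_groups: List[int]      # block groups owned by this cube only
    read_only: bool = False      # hypothetical per-cube failure flag


class ToyIceFS:
    """Hypothetical toy model of cubes; not the real IceFS code."""

    def __init__(self) -> None:
        self.cubes: List[Cube] = []

    def add_cube(self, inode: int, pathname: str,
                 block_groups: List[int], **attrs: str) -> Cube:
        # No shared physical resources: a block group belongs to one cube.
        used = {bg for c in self.cubes for bg in c.block_groups}
        overlap = used & set(block_groups)
        if overlap:
            raise ValueError(f"block groups already owned: {sorted(overlap)}")
        top = pathname.rstrip("/") or "/"
        cube = Cube(SubSuperBlock(inode, top, attributes=dict(attrs)),
                    list(block_groups))
        self.cubes.append(cube)
        return cube

    def cube_for_path(self, path: str) -> Optional[Cube]:
        # Directory indirection (assumed here to be longest-prefix matching
        # of the path against the cubes' top directory paths).
        best: Optional[Cube] = None
        for cube in self.cubes:
            top = cube.sub_super.cube_pathname
            inside = path == top or path.startswith(top.rstrip("/") + "/")
            if inside and (best is None
                           or len(top) > len(best.sub_super.cube_pathname)):
                best = cube
        return best

    def mark_failed(self, path: str) -> Optional[Cube]:
        # Localized reaction to failure: only the owning cube goes read-only.
        cube = self.cube_for_path(path)
        if cube is not None:
            cube.read_only = True
        return cube

    def block_groups_to_check(self) -> List[int]:
        # Localized recovery: a checker visits faulty cubes only.
        return sorted(bg for c in self.cubes if c.read_only
                      for bg in c.block_groups)


fs = ToyIceFS()
fs.add_cube(2, "/", [0, 1])                           # default cube (assumed)
fs.add_cube(100, "/vm1", [2, 3], journal="ordered")
fs.add_cube(200, "/vm2", [4], journal="no journal")
fs.mark_failed("/vm2/disk.img")
print(fs.block_groups_to_check())
```

In this model a failure under `/vm2` marks only that cube read-only, and the checker is handed only that cube's block groups, which mirrors the "only check faulty cubes" bullet. The per-cube `journal` attribute stands in for the customized journaling modes; how IceFS actually stores and applies such attributes is not specified in the slides.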