NVSL Presentation Template

NVSL Presentation Template

NOVA: A High-Performance, Hardened File System for Non-Volatile Main Memories Jian Xu, Lu Zhang, Amirsaman Memaripour, Akshatha Gangadharaiah, Amit Borase, Tamires Brito Da Silva, Andy Rudoff (Intel), Steven Swanson Non-Volatile Systems Laboratory Department of Computer Science and Engineering University of California, San Diego 1 NVDIMM Usage Models • Legacy File IO Acceleration – fast and easy Legacy IO Throughput – Run existing IO-intensive apps on NVDIMMs 450 – “just works” 400 – NOVA is 30% - 10x faster than Ext4 for write intensive 350 workloads. 300 – Need strong protections on data. 250 • DAX Mmap -- maximum speed + programming 200 challenges 150 100 – Load -store access (x1000) per second Ops 50 – You still need a strongly-consistent file system 0 • File system corruption can still destroy your data • NOVA is strongly consistent – Data protection is still critical Ext4-datajournal NOVA 2 XFS F2FS NILFS EXT4 BTRFS 3 Disk-based file systems are inadequate for NVMM • Disk-based file systems Atomicity Data Protection cannot exploit NVMM 1-Sector 1-Sector 1-Block 1-Block N-Block N-Block Meta- Snap- Data performance overwrite append overwrite append overwrite append data shots Ext4 wb ✓ ✗ ✗ ✗ ✗ ✗ ✗ ✓ ✓ Ext4 Performance Order ✓ ✓ ✗ ✓ ✗ ✓ ✗ ✓ ✓ • Ext4 optimization Dataj ✓ ✓ ✓ ✓ ✗ ✓ ✗ ✓ ✓ Btrfs compromises consistency ✓ ✓ ✓ ✓ ✗ ✓ ✓ ✓ ✓ xfs on system failure [1] ✓ ✓ ✗ ✓ ✗ ✓ ✗ ✓ ✓ Reiserfs ✓ ✓ ✗ ✓ ✗ ✓ ✗ ✓ ✓ [1] Pillai et al, All File Systems Are Not Created Equal: On the Complexity of Crafting Crash-Consistent Applications, OSDI '14. 4 BPFS SCMFS PMFS Aerie EXT4-DAX M1FS XFS-DAX 5 NVMM file systems don’t provide strong consistency or data protection • DAX does not provide data Atomicity Data Protection Meta- Snap- atomicity guarantees Metadata Data Data data shots • Programming is more difficult BPFS ✓ ✓ ✗ ✗ ✗ PMFS ✓ ✗ ✗ ✗ ✗ Ext4 DAX ✓ ✗ ✗ ✓ ✗ XFS DAX ✓ ✗ ✗ ✓ ✗ SCMFS ✗ ✗ ✗ ✗ ✗ Aerie ✓ ✗ ✗ ✗ ✗ 6 NOVA provides strong atomicity guarantee Atomicity Data Protection Atomicity Data Protection 1-Sector 1-Sector 1-Block 1-Block N-Block N-Block Meta- Snap- Meta- Snap- Data Metadata Data Data overwrite append overwrite append overwrite append data shots data shots Ext4 BPFS wb ✓ ✗ ✗ ✗ ✗ ✗ ✗ ✓ ✓ ✓ ✓ ✗ ✗ ✗ Ext4 PMFS Order ✓ ✓ ✗ ✓ ✗ ✓ ✗ ✓ ✓ ✓ ✗ ✗ ✗ ✗ Ext4 Ext4 Dataj ✓ ✓ ✓ ✓ ✗ ✓ ✗ ✓ ✓ DAX ✓ ✗ ✗ ✓ ✗ XFS Btrfs ✓ ✓ ✓ ✓ ✗ ✓ ✓ ✓ ✓ DAX ✓ ✗ ✗ ✓ ✗ xfs SCMFS ✓ ✓ ✗ ✓ ✗ ✓ ✗ ✓ ✓ ✗ ✗ ✗ ✗ ✗ Reiserfs Aerie ✓ ✓ ✗ ✓ ✗ ✓ ✗ ✓ ✓ ✓ ✗ ✗ ✗ ✗ 7 NOVA provides strong atomicity guarantee Atomicity Data Protection Atomicity Data Protection 1-Sector 1-Sector 1-Block 1-Block N-Block N-Block Meta- Snap- Meta- Snap- Data Metadata Data Data overwrite append overwrite append overwrite append data shots data shots Ext4 BPFS wb ✓ ✗ ✗ ✗ ✗ ✗ ✗ ✓ ✓ ✓ ✓ ✗ ✗ ✗ Ext4 PMFS Order ✓ ✓ ✗ ✓ ✗ ✓ ✗ ✓ ✓ ✓ ✗ ✗ ✗ ✗ Ext4 Ext4 Dataj ✓ ✓ ✓ ✓ ✗ ✓ ✗ ✓ ✓ DAX ✓ ✗ ✗ ✓ ✗ XFS Btrfs ✓ ✓ ✓ ✓ ✗ ✓ ✓ ✓ ✓ DAX ✓ ✗ ✗ ✓ ✗ xfs SCMFS ✓ ✓ ✗ ✓ ✗ ✓ ✗ ✓ ✓ ✗ ✗ ✗ ✗ ✗ Reiserfs Aerie ✓ ✓ ✗ ✓ ✗ ✓ ✗ ✓ ✓ ✓ ✗ ✗ ✗ ✗ NOVA NOVA ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ ✓ 8 NOVA’s Key Features • Features – High-performance • Usage Models – Strong Consistency – open()/close(), read()/write() – Snapshot support – DAX -mmap() – Data protection 9 NOVA’s Architecture 10 Core NOVA Structures Log Structure + copy-on-write + Journals • One log per iNode • Non-contiguous Tail Tail • Fast, Simple atomic File log updates • Meta-data only 11 Core NOVA Structures Log Structure + copy-on-write + Journals • Multi-page atomic update Tail Tail • Fast allocation File log • Instant data GC Data 0 Data 1 Data 1 Data 2 12 Core NOVA Structures Log Structure + copy-on-write + Journals TailTail Directory log • Small, fixed sized Tail Tail journals File log • For complex ops. Dir tail Journal File tail 13 Supporting Backups with Snapshots 14 Snapshots for Normal File Access Current epoch 012 Snapshot 0 Snapshot 1 File log 0 0x1000 1 0x2000 1 0x3000 2 0x4000 Data Data Data Data Snapshot entry File write entry Epoch ID Data in snapshot Reclaimed data Current data 15 Corrupt Snapshots with DAX-mmap() • Recovery invariant: if V == True, then D is valid – Incorrect: Naïvely mark pages read-only one-at-a-time Application: D = 1; V = True; Snapshot Page hosting D: ? 1 ? Page hosting V: False True T Time Snapshot Page Copy on Value R/W RO Fault Write Change 16 Consistent Snapshots with DAX-mmap() • Recovery invariant: if V == True, then D is valid – Correct: Block page faults until all pages are read-only Application: D = 1; V = True; Snapshot Page hosting D: ? 1 ? Page hosting V: False True F Time Snapshot Page Copy on Value R/W RO RO Fault Write Change Blocking 17 Performance impact of snapshots • Normal execution vs. taking snapshots every 10s – Negligible performance loss through read()/write() – Average performance loss 6.2% through mmap() Conventional workloads NVMM-aware workloads from WHISPER 18 Data Protection: Metadata 19 NVMM Failure Modes: Media Failures • Media errors – Detectable & correctable • Transparent to software – Detectable & uncorrectable Software: Consumes good data • Affect a contiguous range of data • Raise machine check exception (MCE) NVMM Ctrl.: Detects & corrects errors – Undetectable Read NVMM data: • May consume corrupted data Media error • Software scribbles – Kernel bugs or own bugs – Transparent to hardware 20 NVMM Failure Modes : Media Failures • Media errors – Detectable & correctable • Transparent to software – Detectable & uncorrectable Software: Receives MCE • Affect a contiguous range of data • Raise machine check exception (MCE) NVMM Ctrl.: Detects uncorrectable errors Raises exception Read – Undetectable NVMM data: • May consume corrupted data Media error & Poison Radius (PR) • Software scribbles e.g. 512 bytes – Kernel bugs or own bugs – Transparent to hardware 21 Detecting NVMM Media Errors memcpy_mcsafe() • Copy data from NVMM Process and • Catch MCEs and return failure return Yes Handler registered? No Kernel Whose Kernel panic access? User Recoverable MCE SIGBUS Unrecoverable Kernel panic 22 NVMM Failure Modes : Media Failures • Media errors – Detectable & correctable • Transparent to software – Detectable & uncorrectable Software: Consumes corrupted data • Affect a contiguous range of data • Raise machine check exception (MCE) NVMM Ctrl.: Sees no error – Undetectable Read NVMM data: • Consume corrupted data Media error • Software scribbles – Kernel bugs or own bugs – Transparent to hardware 23 NVMM Failure Modes: Scribbles • Media errors – Detectable & correctable • Transparent to software – Detectable & uncorrectable Software: Bug code scribbles NVMM • Affect a contiguous range of data Write • Raise machine check exception (MCE) NVMM Ctrl.: Updates ECC – Undetectable NVMM data: • Consume corrupted data Scribble error • Software “scribbles” – Kernel bugs or NOVA bugs – NVMM file systems are highly vulnerable 24 NVMM Failure Modes: Scribbles • Media errors – Detectable & correctable • Transparent to software – Detectable & uncorrectable Software: Consumes corrupted data • Affect a contiguous range of data • Raise machine check exception (MCE) NVMM Ctrl.: Sees no error – Undetectable Read NVMM data: • Consume corrupted data Scribble error • Software “scribbles” – Kernel bugs or NOVA bugs – NVMM file systems are highly vulnerable 25 NOVA Metadata Protection • Replicate everything inode’ Head’ Tail’ csumcsum’’ H1’ T1’ – Inodes inode – Logs Head Tail csum H1 T1 – Superblock – … ent1 c1 … entN cN • CRC32 Checksums everywhere ent1’ c1’ … entN’ cN’ Data 1 Data 2 26 Defense Against Scribbles • Tolerating Larger Scribbles – Allocate replicas far from one another – Can tolerate arbitrarily large scribbles to metadata. • Preventing scribbles – Mark all NVMM as read-only – Disable CPU write protection while accessing NVMM 27 Data Protection: Data 28 NOVA Data Protection • Divide 4KB blocks into 512-byte stripes • Compute a RAID 5-style parity stripe • Compute and replicate checksums for each stripe 1 Block 512-Byte stripe segments S0 S1 S2 S3 S4 S5 S6 S7 P P = S0..7 C = CRC32C(S ) i ⊕ i Replicated 29 File data protection with DAX-mmap • With DAX-Mmap(), file data changes are invisible to NOVA • NOVA cannot protect mmap’ed file data • NOVA logs mmap() and restores protection on munmap() or recovery User-space load/store load/store Applications: Kernel-space NOVA: read(), write() mmap() NVDIMMs File data: protected unprotected File log: mmap log entry 30 File data protection with DAX-mmap • NOVA cannot protect mmap’ed file data – User applications directly load/store the mmap’ed region – NOVA has to know what file pages are mmap’ed User-space load/store munmap() Applications: Kernel-space NOVA: read(), write() mmap() NVDIMMs Protection restored File data: File log: 31 File data protection with DAX-mmap • NOVA cannot protect mmap’ed file data – User applications directly load/store the mmap’ed region – NOVA has to know what file pages are mmap’ed User-space System Failure + Applications: recovery Kernel-space NOVA: read(), write() mmap() NVDIMMs File data: File log: 32 Performance 33 Performance Cost of Data Integrity 1.2 1 0.8 0.6 0.4 0.2 0 Fileserver Varmail Webproxy Webserver RocksDB MongoDB Exim TPCC average xfs-DAX ext4-DAX ext4-dataj Fortis baseline w/ MP+WP w/ MP+DP+WP 34 Conclusion • Existing file systems do not meet the requirements of applications on NVMM file systems • NOVA’s multi-log design achieves high performance and strong consistency • NOVA’s data protection features ensure data integrity • NOVA outperforms existing file systems while providing stronger consistency and data protection guarantees 35 Thank you! Try NOVA! https://github.com/NVSL/NOVA

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    76 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us