<<

Soft Updates Made Simple and Fast on Non-volatile Memory

Mingkai Dong, Haibo Chen Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University

@ ATC ‘17 Non-volatile Memory (NVM)

ü Non-volatile ü Byte-addressable ü High throughput and low latency

2 NVM Systems (NVMFS)

Existing NVMFS use journaling or copy-on- for crash consistency Synchronous cache flushes are necessary Cache flushes are expensive! Other options for crash consistency?

A A’ inode inode

Journal B C C’ Area File System Metadata

D E E’

3 NVM File Systems (NVMFS)

Existing NVMFS use journaling or copy-on-write for crash consistency Synchronous cache flushes are necessary Cache flushes are expensive! Other options for crash consistency?

A A’ inode inode

Journal B C C’ Area File System Metadata

D E E’

4 Soft Updates DRAM (Page cache) Latest metadata in DRAM § Updated in DRAM with dependency tracked ü DRAM performance ü No synchronous disk writes DISK Consistent metadata in disks § Persisted to disks with dependency enforced ü Always consistent ü Immediately usable after crash Traditional Soft Updates

5 Soft Updates

Update dependencies block bitmap § E.g., allocating a new data block 1. Allocate in bitmap 2. Fill data in the block inode new 3. Update pointer to the block data block

6 Soft Updates Is Complicated

Delayed disk writes block bitmap § Auxiliary structures for each update § complex dependencies inode new data block

Figures from Soft Updates: A Technique for Eliminating Most Synchronous Writes in the Fast Filesystem, ATC ’99 7 Soft Updates Is Complicated

Delayed disk writes § Auxiliary structures for each operation inode Block Directory Block § More complex dependencies inode #4 inode #5 Cyclic dependencies <--, #0> § Inode #6 Rolling back/forward Inode #7 inode block (in page cache) inode block inode block inode block inode #4 Rollback inode #4 inode #4 Rollforward inode #4 inode #6 Flush block to disks inode #6 inode #5 inode #5 inode #5 inode #5 inode #6 inode #6 inode #6 inode #6 inode #7 inode #7 inode #7 inode #7

8 Soft Updates Is Complicated

Delayed disk writes § Auxiliary structures for each operation § More complex dependencies Cyclic dependencies § Rolling back/forward

The mismatch between per-pointer-based dependency tracking and block-based interface of traditional disks

9 Soft Updates Meets NVM

Soft Updates ü No synchronous cache flushes ü Immediately usable after crash NVM: byte-addressable and fast ü Direct write to NVM without delays ü No false sharing => no rolling back/forward ü Simple dependency tracking/enforcement

10 SoupFS

A simple and fast NVMFS derived from soft updates § Hashtable-based directories § No false sharing § Pointer-based dual views § No synchronous cache flushes § Semantic-aware dependency tracking/enforcement § Simple dependency tracking/enforcement

Get the best of both Soft Updates and NVM

11 Overview

Background Design & Implementation § Hashtable-based directories § Pointer-based dual views § Semantic-aware dependency tracking/enforcement Evaluation Conclusion

12 Overview

Background Design & Implementation § Hashtable-based directories § Pointer-based dual views § Semantic-aware dependency tracking/enforcement Evaluation Conclusion

13 Block-based Directories

Block-based file systems usually use block-based directories § False sharing 1.TxT|32 2 ✘ Cyclic dependency .TxT|38 ✘ Rolling back/forward fs-long-lon § Slow access indirect g.exe|512 Directory block ✘ Linear scan inode

l+f.dir|12

14 Hashtable-based Directories

Directory Optimized for cache lines Buckets inode ü No false sharing 0 1 2 3 4 … ü No cyclic dependency Efficient access Filename inode Latest Consistent ü No linear scan Pointer Pointer Next Next

Hash Len Filename inode

15 Overview

Background Design & Implementation ü Hashtable-based directories § Pointer-based dual views § Semantic-aware dependency tracking/enforcement Evaluation Conclusion

16 DualViews

Latest view in page cache DRAM (Page cache) Consistent view in disks

Dual views § Eliminate synchronous writes DISK § Provide usability after crash

Traditional Soft Updates

17 DualViews

Latest view in page cache DRAM (Page cache) Consistent view in disks NVM Latest view?

Another copy of metadata in DRAM DISK NVM ✗ Double writes ✗ Double storage overheadChallenge: How to present latest view efficiently? ✗ Unnecessary synchronizations Soft Updates on NVM

18 Pointer-based Dual Views

Reuse data structures in both views DRAM Distinguish views by different pointers/structures

NVM

Soft Updates on NVM

19 Pointer-based Dual Views

Reuse data structures in both views Distinguish views by different pointers/structures

Data Structures In Consistent View In Latest View inode SoupFS inode VFS inode dentry consistent next pointer latest next pointer hash table bucket latest bucket if exists B-tree root/height in SoupFS inode root/height in VFS inode

20 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 0 1 2 3 4 … Persisted in NVM

Directory D C B A VFS Filename inode Latest Consistent inode Next Next

Latest View Dir

File A File B File C File D Consistent View Dir

File A File B File C File D 21 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 0 1 2 3 4 … Persisted in NVM

Directory D C B A VFS Filename inode Latest Consistent inode Next Next

Latest View Dir

Ø create E File A File B File C File D File E Consistent View Dir

File A File B File C File D 22 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 0 1 2 3 4 … Persisted in NVM

Directory D C B A VFS Filename inode Latest Consistent inode Next Next

Latest View Dir Latest Buckets Ø create E File A File B File C File D File E Consistent View Dir

File A File B File C File D 23 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 0 1 2 3 4 … Persisted in NVM

Directory D C B A VFS Filename inode Latest Consistent inode Next Next

Latest View E Dir Latest Buckets Ø create E File A File B File C File D File E Consistent View Dir

File A File B File C File D 24 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 0 1 2 3 4 … Persisted in NVM

Directory D C B A VFS Filename inode Latest Consistent inode Next Next

Latest View E Dir Latest Buckets Ø create E File A File B File C File D File E 3 Consistent View Dir

File A File B File C File D 25 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 0 1 2 3 4 … Persisted in NVM

Directory D C B A VFS Filename inode Latest Consistent inode Next Next

Latest View E Dir Latest Buckets § create E File A File B File C File D File E 3 Consistent View Dir

File A File B File C File D 26 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 0 1 2 3 4 … Persisted in NVM

Directory D C B A VFS Filename inode Latest Consistent inode Next Next

Latest View E Dir Latest Buckets § create E File A File B❌ File C File D File E 3 Ø unlink B Consistent View Dir

File A File B File C File D 27 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 0 1 2 3 4 … Persisted in NVM

Directory D C B A VFS Filename inode Latest Consistent inode Next Next

Latest View E Dir Latest Buckets § create E File A File B❌ File C File D File E 3 Ø unlink B Consistent View Dir

File A File B File C File D 28 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 0 1 2 3 4 … Persisted in NVM

Directory D C B A VFS Filename inode Latest Consistent inode Next Next

Latest View E Dir Latest Buckets § create E File A File B❌ File C File D File E 3 § unlink B Consistent View Dir

File A File B File C File D 29 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 00 11 22 33 44 …… Persisted in NVM

Directory D C B A VFS Filename inode Latest Consistent inode Next Next

Latest View E Dir Latest Buckets § create E File A File B❌ File C File D File E 3 § unlink B Consistent View Dir

File A File B File C File D ✔ ✔ ✔ ✔ 30 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 00 11 22 33 4 … Persisted in NVM

Directory D C B A VFS Filename inode Latest Consistent inode Next Next

Latest View E Dir Latest Buckets § create E File A✔File B❌ File C✔ File D✔ File E✔ 33 § unlink B Consistent View Dir

File A File B File C File D 31 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 0 1 2 3 4 … Persisted in NVM

Directory D C B A VFS Filename inode Latest Consistent inode Next Next

Latest View E Dir Latest Buckets § create E File A File B❌ File C File D File E 3 § unlink B Consistent View Dir

File A File B File C File D 32 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 0 1 2 3 4 … Persisted in NVM

Directory D C B A VFS Filename inode Latest Consistent inode Next Next

Latest View E Dir Latest Buckets § create E File A File B❌ File C File D File E 3 § unlink B Consistent View Dir

File A File B File C File D 33 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 0 1 2 3 4 … Persisted in NVM

Directory D C B A VFS Filename inode Latest Consistent inode Next Next

Latest View E Dir Latest Buckets § create E File A File B❌ File C File D File E 3 § unlink B Consistent View Dir

File A File B File C File D File E 34 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 0 1 2 3 4 … Persisted in NVM

Directory D C B A VFS Filename inode Latest Consistent inode Next Next

Latest View E Dir Latest Buckets § create E File A File B❌ File C File D File E 3 § unlink B Consistent View Dir

File A File B File C File D File E ❌ 35 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 0 1 2 3 4 … Persisted in NVM

Directory D C B A VFS Filename inode Latest Consistent inode Next Next

Latest View E Dir Latest Buckets § create E File A File B❌ File C File D File E 3 § unlink B Consistent View Dir

File A File B File C File D File E ❌ 36 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 0 1 2 3 4 … Persisted in NVM

Directory D C A VFS Filename inode Latest Consistent inode Next Next

Latest View E Dir Latest Buckets § create E File A File C File D File E 3 § unlink B Consistent View Dir

File A File C File D File E 37 Pointer-based Dual Views

Reuse data structures in both views DRAM Distinguish views by different pointers/structures

ü Eliminate synchronous writes ü Provide usability after crash NVM

ü No double write ü Little space overhead Soft Updates on NVM

38 Overview

Background Design & Implementation ü Hashtable-based directories ü Pointer-based dual views § Semantic-aware dependency tracking/enforcement Evaluation Conclusion

39 Dependency Tracking

Auxiliary structures for each updates

40 Dependency Tracking

Auxiliary structures for each updates

The semantic gap between the page cache (where enforcement happens) and the file system (where tracking happens)

After removing page cache, SoupFS involves semantics in dependency tracking/enforcement

41 Semantic-aware Dependency Tracking

Track semantic operations with complementary information § Enough for dependency enforcement

Operation Complementary Information (pointers/integers) diradd added dentry, source directory∗, overwritten inode∗ dirrem removed dentry, destination directory∗ sizechg the old and new file size attrchg nothing

Information tagged with ∗ is for rename operation. 42 Semantic-aware Dependency Tracking

Track semantic operations with complementary information § Enough for dependency enforcement Operations are stored in operation list of each VFS inode

dirty inode list VFS VFS VFS inode inode inode list next list next list next operation operation operation list list list

list next operation type Complimentary information

43 Semantic-aware Dependency Enforcement

Persister daemons traverse the dirty inode list in background § persist each operation from the latest view to the consistent view with respect to update dependencies

dirty inode list VFS VFS VFS inode inode inode list next list next list next operation operation operation list list list

list next operation type Complimentary information

44 Overview

Background Design & Implementation ü Hashtable-based directories ü Pointer-based dual views ü Semantic-aware dependency tracking/enforcement Evaluation Conclusion

45 Evaluation Setup

Platform § Intel Xeon E5 server with two 8-core processors § 48 GB DRAM and 64 GB NVDIMM File Systems § SoupFS, PMFS, NOVA, Ext4-DAX, Ext4 NVM Write Delay Simulation § ndelay() after clflush Benchmarks § Micro-benchmarks: 100 iterations of 104 create/unlink// § Filebench and Postmark

46 Micro-benchmark Latency Inefficient Directory Organization 20 1 57.5 57.7 Ext4 EXT4 Ext4-DAX 0.8 Ext4-DAX 15 PMFS PMFS NOVA NOVA SoupFS 0.6 SoupFS

10 CDF 0.4 Latency (us/op) 5 0.2

0 0 create unlink mkdir rmdir 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 Latency (us)

Lowest Latency 47 Sensitivity to NVM Write Delay

Create Unlink 70 20 PMFS 65 PMFS 16 NOVA 60 SoupFS Latency(us) 55 Latency (us) 12 15 ↑~250% NOVA 8 10 SoupFS ↑~200% 5 4

0 0 0 200 400 600 800 0 200 400 600 800 Delay (ns) Delay (ns) No effect

48 Postmark & Filebench

Postmark Fileserver-1K 350 1200 Ext4 Ext4 300 Ext4-DAX ↑~50% Ext4-DAX 1000 PMFS PMFS 250 NOVA NOVA 800 SoupFS SoupFS 200 600 150

Throughput (MB/s) 400 100 Throughput (x1000 ops/s) 50 200

0 0 Read Write Threads

49 Overview

Background Design & Implementation ü Hashtable-based directories ü Pointer-based dual views ü Semantic-aware dependency tracking/enforcement Evaluation Conclusion

50 Conclusion

§ Soft updates is complicated due to the mismatch between per-pointer-based dependency tracking and block-based interface of traditional disks § We design and implement SoupFS ü Hashtable-based directories ü Pointer-based dual views ü Semantic-aware dependency tracking/enforcement

§ Soft updates can be made simple and fast on NVM Thanks & Questions? ;-) 51