Soft Updates Made Simple and Fast on Non-volatile Memory
Mingkai Dong, Haibo Chen Institute of Parallel and Distributed Systems, Shanghai Jiao Tong University
@ ATC ‘17 Non-volatile Memory (NVM)
ü Non-volatile ü Byte-addressable ü High throughput and low latency
2 NVM File Systems (NVMFS)
Existing NVMFS use journaling or copy-on-write for crash consistency Synchronous cache flushes are necessary Cache flushes are expensive! Other options for crash consistency?
A A’ inode inode
Journal B C C’ Area File System Metadata
D E E’
3 NVM File Systems (NVMFS)
Existing NVMFS use journaling or copy-on-write for crash consistency Synchronous cache flushes are necessary Cache flushes are expensive! Other options for crash consistency?
A A’ inode inode
Journal B C C’ Area File System Metadata
D E E’
4 Soft Updates DRAM (Page cache) Latest metadata in DRAM § Updated in DRAM with dependency tracked ü DRAM performance ü No synchronous disk writes DISK Consistent metadata in disks § Persisted to disks with dependency enforced ü Always consistent ü Immediately usable after crash Traditional Soft Updates
5 Soft Updates
Update dependencies block bitmap § E.g., allocating a new data block 1. Allocate in bitmap 2. Fill data in the block inode new 3. Update pointer to the block data block
6 Soft Updates Is Complicated
Delayed disk writes block bitmap § Auxiliary structures for each update § More complex dependencies inode new data block
Figures from Soft Updates: A Technique for Eliminating Most Synchronous Writes in the Fast Filesystem, ATC ’99 7 Soft Updates Is Complicated
Delayed disk writes § Auxiliary structures for each operation inode Block Directory Block § More complex dependencies inode #4 inode #5 Cyclic dependencies <--, #0> § Inode #6 Rolling back/forward Inode #7
8 Soft Updates Is Complicated
Delayed disk writes § Auxiliary structures for each operation § More complex dependencies Cyclic dependencies § Rolling back/forward
The mismatch between per-pointer-based dependency tracking and block-based interface of traditional disks
9 Soft Updates Meets NVM
Soft Updates ü No synchronous cache flushes ü Immediately usable after crash NVM: byte-addressable and fast ü Direct write to NVM without delays ü No false sharing => no rolling back/forward ü Simple dependency tracking/enforcement
10 SoupFS
A simple and fast NVMFS derived from soft updates § Hashtable-based directories § No false sharing § Pointer-based dual views § No synchronous cache flushes § Semantic-aware dependency tracking/enforcement § Simple dependency tracking/enforcement
Get the best of both Soft Updates and NVM
11 Overview
Background Design & Implementation § Hashtable-based directories § Pointer-based dual views § Semantic-aware dependency tracking/enforcement Evaluation Conclusion
12 Overview
Background Design & Implementation § Hashtable-based directories § Pointer-based dual views § Semantic-aware dependency tracking/enforcement Evaluation Conclusion
13 Block-based Directories
Block-based file systems usually use block-based directories § False sharing 1.TxT|32 2 ✘ Cyclic dependency .TxT|38 ✘ Rolling back/forward fs-long-lon § Slow access indirect g.exe|512 Directory block ✘ Linear scan inode
l+f.dir|12
14 Hashtable-based Directories
Directory Optimized for cache lines Buckets inode ü No false sharing 0 1 2 3 4 … ü No cyclic dependency Efficient access Filename inode Latest Consistent ü No linear scan Pointer Pointer Next Next
Hash Len Filename inode
15 Overview
Background Design & Implementation ü Hashtable-based directories § Pointer-based dual views § Semantic-aware dependency tracking/enforcement Evaluation Conclusion
16 DualViews
Latest view in page cache DRAM (Page cache) Consistent view in disks
Dual views § Eliminate synchronous writes DISK § Provide usability after crash
Traditional Soft Updates
17 DualViews
Latest view in page cache DRAM (Page cache) Consistent view in disks NVM Latest view?
Another copy of metadata in DRAM DISK NVM ✗ Double writes ✗ Double storage overheadChallenge: How to present latest view efficiently? ✗ Unnecessary synchronizations Soft Updates on NVM
18 Pointer-based Dual Views
Reuse data structures in both views DRAM Distinguish views by different pointers/structures
NVM
Soft Updates on NVM
19 Pointer-based Dual Views
Reuse data structures in both views Distinguish views by different pointers/structures
Data Structures In Consistent View In Latest View inode SoupFS inode VFS inode dentry consistent next pointer latest next pointer hash table bucket latest bucket if exists B-tree root/height in SoupFS inode root/height in VFS inode
20 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 0 1 2 3 4 … Persisted in NVM
Directory D C B A VFS Filename inode Latest Consistent inode Next Next
Latest View Dir
File A File B File C File D Consistent View Dir
File A File B File C File D 21 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 0 1 2 3 4 … Persisted in NVM
Directory D C B A VFS Filename inode Latest Consistent inode Next Next
Latest View Dir
Ø create E File A File B File C File D File E Consistent View Dir
File A File B File C File D 22 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 0 1 2 3 4 … Persisted in NVM
Directory D C B A VFS Filename inode Latest Consistent inode Next Next
Latest View Dir Latest Buckets Ø create E File A File B File C File D File E Consistent View Dir
File A File B File C File D 23 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 0 1 2 3 4 … Persisted in NVM
Directory D C B A VFS Filename inode Latest Consistent inode Next Next
Latest View E Dir Latest Buckets Ø create E File A File B File C File D File E Consistent View Dir
File A File B File C File D 24 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 0 1 2 3 4 … Persisted in NVM
Directory D C B A VFS Filename inode Latest Consistent inode Next Next
Latest View E Dir Latest Buckets Ø create E File A File B File C File D File E 3 Consistent View Dir
File A File B File C File D 25 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 0 1 2 3 4 … Persisted in NVM
Directory D C B A VFS Filename inode Latest Consistent inode Next Next
Latest View E Dir Latest Buckets § create E File A File B File C File D File E 3 Consistent View Dir
File A File B File C File D 26 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 0 1 2 3 4 … Persisted in NVM
Directory D C B A VFS Filename inode Latest Consistent inode Next Next
Latest View E Dir Latest Buckets § create E File A File B❌ File C File D File E 3 Ø unlink B Consistent View Dir
File A File B File C File D 27 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 0 1 2 3 4 … Persisted in NVM
Directory D C B A VFS Filename inode Latest Consistent inode Next Next
Latest View E Dir Latest Buckets § create E File A File B❌ File C File D File E 3 Ø unlink B Consistent View Dir
File A File B File C File D 28 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 0 1 2 3 4 … Persisted in NVM
Directory D C B A VFS Filename inode Latest Consistent inode Next Next
Latest View E Dir Latest Buckets § create E File A File B❌ File C File D File E 3 § unlink B Consistent View Dir
File A File B File C File D 29 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 00 11 22 33 44 …… Persisted in NVM
Directory D C B A VFS Filename inode Latest Consistent inode Next Next
Latest View E Dir Latest Buckets § create E File A File B❌ File C File D File E 3 § unlink B Consistent View Dir
File A File B File C File D ✔ ✔ ✔ ✔ 30 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 00 11 22 33 4 … Persisted in NVM
Directory D C B A VFS Filename inode Latest Consistent inode Next Next
Latest View E Dir Latest Buckets § create E File A✔File B❌ File C✔ File D✔ File E✔ 33 § unlink B Consistent View Dir
File A File B File C File D 31 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 0 1 2 3 4 … Persisted in NVM
Directory D C B A VFS Filename inode Latest Consistent inode Next Next
Latest View E Dir Latest Buckets § create E File A File B❌ File C File D File E 3 § unlink B Consistent View Dir
File A File B File C File D 32 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 0 1 2 3 4 … Persisted in NVM
Directory D C B A VFS Filename inode Latest Consistent inode Next Next
Latest View E Dir Latest Buckets § create E File A File B❌ File C File D File E 3 § unlink B Consistent View Dir
File A File B File C File D 33 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 0 1 2 3 4 … Persisted in NVM
Directory D C B A VFS Filename inode Latest Consistent inode Next Next
Latest View E Dir Latest Buckets § create E File A File B❌ File C File D File E 3 § unlink B Consistent View Dir
File A File B File C File D File E 34 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 0 1 2 3 4 … Persisted in NVM
Directory D C B A VFS Filename inode Latest Consistent inode Next Next
Latest View E Dir Latest Buckets § create E File A File B❌ File C File D File E 3 § unlink B Consistent View Dir
File A File B File C File D File E ❌ 35 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 0 1 2 3 4 … Persisted in NVM
Directory D C B A VFS Filename inode Latest Consistent inode Next Next
Latest View E Dir Latest Buckets § create E File A File B❌ File C File D File E 3 § unlink B Consistent View Dir
File A File B File C File D File E ❌ 36 Pointer-based Dual Views Directory Buckets Volatile in DRAM inode Updates to NVM w/o persistence guarantee 0 1 2 3 4 … Persisted in NVM
Directory D C A VFS Filename inode Latest Consistent inode Next Next
Latest View E Dir Latest Buckets § create E File A File C File D File E 3 § unlink B Consistent View Dir
File A File C File D File E 37 Pointer-based Dual Views
Reuse data structures in both views DRAM Distinguish views by different pointers/structures
ü Eliminate synchronous writes ü Provide usability after crash NVM
ü No double write ü Little space overhead Soft Updates on NVM
38 Overview
Background Design & Implementation ü Hashtable-based directories ü Pointer-based dual views § Semantic-aware dependency tracking/enforcement Evaluation Conclusion
39 Dependency Tracking
Auxiliary structures for each updates
40 Dependency Tracking
Auxiliary structures for each updates
The semantic gap between the page cache (where enforcement happens) and the file system (where tracking happens)
After removing page cache, SoupFS involves semantics in dependency tracking/enforcement
41 Semantic-aware Dependency Tracking
Track semantic operations with complementary information § Enough for dependency enforcement
Operation Type Complementary Information (pointers/integers) diradd added dentry, source directory∗, overwritten inode∗ dirrem removed dentry, destination directory∗ sizechg the old and new file size attrchg nothing
Information tagged with ∗ is for rename operation. 42 Semantic-aware Dependency Tracking
Track semantic operations with complementary information § Enough for dependency enforcement Operations are stored in operation list of each VFS inode
dirty inode list VFS VFS VFS inode inode inode list next list next list next operation operation operation list list list
list next operation type Complimentary information
43 Semantic-aware Dependency Enforcement
Persister daemons traverse the dirty inode list in background § persist each operation from the latest view to the consistent view with respect to update dependencies
dirty inode list VFS VFS VFS inode inode inode list next list next list next operation operation operation list list list
list next operation type Complimentary information
44 Overview
Background Design & Implementation ü Hashtable-based directories ü Pointer-based dual views ü Semantic-aware dependency tracking/enforcement Evaluation Conclusion
45 Evaluation Setup
Platform § Intel Xeon E5 server with two 8-core processors § 48 GB DRAM and 64 GB NVDIMM File Systems § SoupFS, PMFS, NOVA, Ext4-DAX, Ext4 NVM Write Delay Simulation § ndelay() after clflush Benchmarks § Micro-benchmarks: 100 iterations of 104 create/unlink/mkdir/rmdir § Filebench and Postmark
46 Micro-benchmark Latency Inefficient Directory Organization 20 1 57.5 57.7 Ext4 EXT4 Ext4-DAX 0.8 Ext4-DAX 15 PMFS PMFS NOVA NOVA SoupFS 0.6 SoupFS
10 CDF 0.4 Latency (us/op) 5 0.2
0 0 create unlink mkdir rmdir 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 Latency (us)
Lowest Latency 47 Sensitivity to NVM Write Delay
Create Unlink 70 20 PMFS 65 PMFS 16 NOVA 60 SoupFS Latency(us) 55 Latency (us) 12 15 ↑~250% NOVA 8 10 SoupFS ↑~200% 5 4
0 0 0 200 400 600 800 0 200 400 600 800 Delay (ns) Delay (ns) No effect
48 Postmark & Filebench
Postmark Fileserver-1K 350 1200 Ext4 Ext4 300 Ext4-DAX ↑~50% Ext4-DAX 1000 PMFS PMFS 250 NOVA NOVA 800 SoupFS SoupFS 200 600 150
Throughput (MB/s) 400 100 Throughput (x1000 ops/s) 50 200
0 0 Read Write Threads
49 Overview
Background Design & Implementation ü Hashtable-based directories ü Pointer-based dual views ü Semantic-aware dependency tracking/enforcement Evaluation Conclusion
50 Conclusion
§ Soft updates is complicated due to the mismatch between per-pointer-based dependency tracking and block-based interface of traditional disks § We design and implement SoupFS ü Hashtable-based directories ü Pointer-based dual views ü Semantic-aware dependency tracking/enforcement
§ Soft updates can be made simple and fast on NVM Thanks & Questions? ;-) 51