Research Faculty Summit 2018
Systems | Fueling future disruptions
What Should You Do With Persistent Memory?
Steven Swanson
Director, Non-Volatile Systems Lab
Computer Science and Engineering
UC San Diego

Non-volatile main memory (NVMM)
• Byte-addressable
• Denser than DRAM
• DRAM-comparable latency
• Higher bandwidth than SSD
• Ready for DMA / RDMA
[Figure: a CPU with two cores, each with L1/L2 caches, a shared last-level cache, and two memory controllers attached to DRAM DIMMs (memory) and NVDIMMs (NVMM). NVDIMM shown: Intel 3D XPoint.]

What Should You Do With NVMM?
1. Use files and a conventional (distributed) file system
2. Use files and a better (distributed) file system
3. Build persistent data structures
4. Use it as slow DRAM
[Figure labels: EXT4, CEPH, NOVA, Orion, Log, etc.]
What Should You Do With NVMM?
1. Use files and a conventional (distributed) file system
2. Use files and a better (distributed) file system
3. Build persistent data structures
4. Use it as slow DRAM
[Figure labels: NOVA, Orion]
DAX
[Figure labels: Fault Tolerance, Direct Access, File IO, Atomicity, Speed]
NOVA: A File System for NVMM
• A NOVA FS is a tree of logs
• One log per inode
  – Inode points to head and tail
  – Logs are not contiguous
• Many logs -> high concurrency
• Strong consistency guarantees
• Log-structured + journals + copy-on-write
[Figure: per-inode logging. An inode holds Head and Tail pointers into its inode log, which contains committed and uncommitted entries.]
Atomicity: Logging for Simple Metadata Operations
• Combines log-structuring, journaling and copy-on-write
• Log-structuring for single log update
  – Write, msync, chmod, etc.
  – Lower overhead than journaling and shadow paging
[Figure: a file log with its Tail pointer shown before and after an append.]
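The single-log update described above can be sketched as a toy model. This is an illustrative simulation, not NOVA's code: a Python list stands in for NVMM log pages, and the names `InodeLog`, `append`, `recover`, and `crash_before_commit` are hypothetical. It assumes the commit point is one small update of the tail pointer, as the Tail labels on the slide suggest. On real hardware the new entry would also have to be flushed to NVMM before the tail moves.

```python
class InodeLog:
    """Toy per-inode log: entries[:tail] are committed, the rest are not."""

    def __init__(self):
        self.entries = []  # stands in for the log pages in NVMM
        self.tail = 0      # stands in for the persistent tail pointer

    def append(self, entry, crash_before_commit=False):
        # Step 1: write the entry just past the tail, overwriting any
        # uncommitted leftovers. Readers cannot see it yet.
        self.entries[self.tail:] = [entry]
        if crash_before_commit:
            return  # simulated power failure: the tail never moved
        # Step 2: a single tail update makes the entry visible (commit).
        self.tail += 1

    def recover(self):
        # After a crash, everything beyond the tail is simply discarded.
        del self.entries[self.tail:]

    def committed(self):
        return self.entries[:self.tail]
```

A usage example: appending `("chmod", 0o644)` normally and then `("write", 0, 4096)` with `crash_before_commit=True` leaves only the chmod entry in `committed()`. No journal is needed because a single log has a single commit point.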
Atomicity: Lightweight Journaling for Complex Metadata Operations
• Lightweight journaling for updates across logs
  – Unlink, rename, etc.
  – Journal log tails instead of metadata or data
[Figure: a directory log and a file log, each with a Tail pointer; the journal records the Dir tail and the File tail.]
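A minimal sketch of journaling log tails across two logs follows. It is a simulation under stated assumptions, not NOVA's implementation. It assumes an undo-style journal that saves the old tail values and rolls them back on recovery. The names `Log`, `TailJournal`, `atomic_update`, `recover`, and `crash_after` are hypothetical.

```python
class Log:
    """Toy log: entries[:tail] are committed."""

    def __init__(self):
        self.entries = []
        self.tail = 0

    def stage(self, entry):
        # Write past the tail without committing; return the new tail value.
        self.entries[self.tail:] = [entry]
        return self.tail + 1

    def committed(self):
        return self.entries[:self.tail]


class TailJournal:
    """Holds (log, old_tail) pairs only, never metadata or file data."""

    def __init__(self):
        self.records = []


def atomic_update(journal, updates, crash_after=None):
    """Apply one entry to each of several logs atomically.

    updates is a list of (log, entry) pairs. crash_after, if set, simulates
    a power failure after that many tail pointers have been moved.
    """
    new_tails = [(log, log.stage(entry)) for log, entry in updates]
    # Journal the old tails before touching any of them.
    journal.records = [(log, log.tail) for log, _ in updates]
    for i, (log, new_tail) in enumerate(new_tails):
        if crash_after is not None and i == crash_after:
            return  # crash: the journal is still populated
        log.tail = new_tail
    journal.records = []  # clearing the journal commits the operation


def recover(journal):
    # Roll every journaled tail back and drop the staged entries.
    for log, old_tail in journal.records:
        log.tail = old_tail
        del log.entries[old_tail:]
    journal.records = []
```

Because only tail values are journaled, the journal record stays a few words long however large the staged log entries are. That size difference is the sense in which this sketch is "lightweight".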
Atomicity: Copy-on-write for file data
• Copy-on-write for file data
  – Log only contains metadata
  – Log is short
  – Instant data GC
[Figure: a file log with its Tail pointer; log entries point to data pages labeled Data 0, Data 1, Data 1, Data 2.]
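The copy-on-write bullets above can be illustrated with a small simulation. This is a sketch under assumptions, not NOVA's data layout: a dict stands in for NVMM data pages, and `CowFile`, `write_page`, `read_page`, and `crash_before_commit` are hypothetical names. The log holds only small `(page_no, block_id)` metadata entries, and the old block is freed as soon as the new entry commits.

```python
class CowFile:
    """Toy copy-on-write file with a metadata-only log."""

    def __init__(self):
        self.blocks = {}     # block id -> bytes (data pages)
        self.next_block = 0  # trivial block allocator
        self.log = []        # committed metadata entries: (page_no, block_id)
        self.page_map = {}   # volatile index: page_no -> current block id

    def write_page(self, page_no, data, crash_before_commit=False):
        # 1. Copy-on-write: new data always goes to a fresh block.
        block_id = self.next_block
        self.next_block += 1
        self.blocks[block_id] = bytes(data)
        if crash_before_commit:
            return  # the new block is unreferenced; the old data is intact
        # 2. Commit by appending a small metadata entry to the log.
        self.log.append((page_no, block_id))
        old = self.page_map.get(page_no)
        self.page_map[page_no] = block_id
        # 3. The old version is dead the moment the entry commits,
        #    so it can be reclaimed immediately.
        if old is not None:
            del self.blocks[old]

    def read_page(self, page_no):
        return self.blocks[self.page_map[page_no]]
```

In this model an interrupted write never damages the committed version of a page, because existing data blocks are never modified in place.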
Filebench throughput
[Figure: bar chart of Filebench throughput in KOps per second (axis 0 to 400) for the Fileserver, Varmail, Webproxy, and Webserver workloads, comparing Ext4-datajournal, Ext4-DAX, xfs-DAX, and NOVA.]

What Should You Do With NVMM?
1. Use files and a conventional (distributed) file system
2. Use files and a better (distributed) file system
3. Build persistent data structures
4. Use it as slow DRAM
[Figure labels: NOVA, ???]
Existing Distributed File Systems are Slow
[Figure: IO path through a conventional distributed file system. Labels: Application, Distributed File System Client, Distributed File System Server, Client FS, Network Stack (twice), Local File System.]
Orion: A Distributed Persistent Memory File System
[Figure: an Application on top of OrionFS, connected to a second OrionFS instance over RDMA.]

Orion: Key Features
• Based on NOVA
• Mirrored metadata on client
  – Clients keep a local NVMM copy of each inode's log
  – Leases + simple arbitration for concurrent updates
• Mostly-local operation
  – Local read cache
  – CoW creates a new, local copy
• Pervasive RDMA
  – All addresses/pointers are RDMA-friendly
  – Zero-copy IO for most transfers (NOVA data structures are RDMA targets)
  – Single-ended remote data access
Application performance on Orion

What Should You Do With NVMM?
1. Use files and a conventional (distributed) file system
2. Use files and a better (distributed) file system
3. Build persistent data structures
4. Use it as slow DRAM
Building NVMM Data Structures Is Hard
• All existing programming errors are still possible
  – Memory leaks
  – Multiple frees
  – Locking errors
• There are new kinds of errors
  – Pointers between NV memory pools
  – Pointers from NVMM to DRAM
• Programmers get this stuff wrong
• Rebooting/restarting won't help!
• Language + compiler support will come, but slowly
Optimizing RocksDB

What Should You Do With NVMM?
1. Use files and a conventional (distributed) file system: easy; ~5x gains
2. Use files and a better (distributed) file system: pretty easy; ~10x gains
   The NVMM Programmability Gap
3. Build persistent data structures: really hard; ~30x gains
4. Use it as slow DRAM
Optimizing RocksDB
[Figure label: File Emulation]

File Emulation
• Normal write-ahead logging
  – open();
  – write(); sync();
• Emulate read/write in user space
  – open(); mmap();
  – memcpy() + clwb + fence
• Almost POSIX semantics
  – Minimal changes to app logic
  – No complex logging, allocation, or locking
• Almost persistent data structure performance
  – Just 10% slower.

File Emulation Speedups
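The two write paths listed on the File Emulation slide can be contrasted in a short sketch. It is an illustration under assumptions, not the authors' library: `wal_append` and `EmulatedFile` are hypothetical names. Python has no way to issue `clwb` or a store fence, so `mmap.flush()` (which calls `msync`) stands in for the "clwb + fence" step. A C implementation on a DAX-mapped file would instead write back the touched cache lines and fence.

```python
import mmap
import os


def wal_append(path, offset, record):
    """Conventional path: open(); write(); sync(). One syscall per step."""
    fd = os.open(path, os.O_RDWR)
    try:
        os.pwrite(fd, record, offset)
        os.fsync(fd)
    finally:
        os.close(fd)


class EmulatedFile:
    """User-space emulation: open(); mmap() once, then plain memory copies."""

    def __init__(self, path, size):
        self.fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        os.ftruncate(self.fd, size)          # fix the file size up front
        self.map = mmap.mmap(self.fd, size)  # shared, read/write mapping

    def pwrite(self, data, offset):
        # "memcpy()": copy the bytes straight into the mapping.
        self.map[offset:offset + len(data)] = data
        # Stand-in for "clwb + fence": force the update to stable storage.
        self.map.flush()
        return len(data)

    def pread(self, length, offset):
        return bytes(self.map[offset:offset + length])

    def close(self):
        self.map.close()
        os.close(self.fd)
```

The application keeps calling something shaped like `pwrite`/`pread`, which is why the slide can claim minimal changes to application logic. Only the implementation behind those calls moves from system calls to loads and stores.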
[Figure: bar chart of file emulation speedup (axis 0 to 60) for SQLite, KyotoCabinet, and RocksDB on XFS-DAX, Ext4-DAX, and NOVA.]

What Should You Do With NVMM?
• You should study it!
  – Many interesting, open problems remain
  – Lots of PhDs to come
• You should use it!
  – Use a file system!
  – Want more performance? Use file emulation!
  – Want more performance? Build persistent data structures.
NOVA is open source. We are preparing it for “upstreaming” into Linux.
To help or try it out: https://github.com/NVSL/linux-nova
Thanks!