Research Faculty Summit 2018

Systems | Fueling future disruptions What Should You Do With Persistent Memory?

Steven Swanson

Director, Non-Volatile Systems Lab Computer Science and Engineering UC San Diego 2 Non-volatile main memory (NVMM)

Byte-addressable • CPU • Denser than DRAM Core Core L1/L2 L1/L2 • DRAM-comparable latency Last-level cache • Higher bandwidth than SSD Memory Memory • Ready for DMA / RDMA Controller Controller DRAM DIMM NVDIMM

DRAM DIMM NVDIMM Memory NVMM

Intel 3D XPoint NVDIMM 3 What Should You Do With NVMM?

1. Use files and a conventional (distributed) NOVAEXT4 NOVAEXT4 2. Use files and better file (distributed) system CEPH NOVAEXT4 Orion 3. Build persistent data etc. NOVA structures 4. Use it as slow DRAM NOVAEXT4Log NOVAEXT4

4 What Should You Do With NVMM?

1. Use files and a conventional (distributed) file system NOVA NOVA 2. Use files and better file (distributed) system NOVA Orion NOVA 3. Build persistent data structures 4. Use it as slow DRAM NOVA NOVA

5 DAX

Fault Direct File IO Atomicity Speed Tolerance Access

6 NOVA: A File System for NVMM

• A NOVA FS is a tree of logs Per- logging • One log per inode

– Inode points to head and tail Inode Head Tail – Logs are not contiguous • Many Logs -> high concurrency Inode log • Strong consistency guarantees • Log-structured + journals + copy- Committed entry on-write Uncommitted entry

7 Atomicity: Logging for Simple Metadata Operations

• Combines log-structuring, journaling and copy-on-write • Log-structuring for single log update Tail Tail – Write, msync, , etc File log – Lower overhead than journaling and shadow paging

8 Atomicity: Lightweight Journaling for Complex Metadata Operations

TailTail • Lightweight journaling for update across logs log – Unlink, rename, etc TailTail – Journal log tails instead of File log metadata or data

Dir tail Journal File tail

9 Atomicity: Copy-on-write for file data

• Copy-on-write for file data – Log only contains metadata Tail Tail – Log is short File log – Instant data GC

Data 0 Data 1 Data 1 Data 2

10 Filebench throughput

400 350 300 250 per second 200 150 KOps 100 50 0 Fileserver Varmail Webproxy Webserver

Ext4-datajournal Ext4-DAX -DAX NOVA 11 What Should You Do With NVMM?

1. Use files and a conventional (distributed) file system NOVA NOVA 2. Use files and better file (distributed) system NOVA ??? 3. Build persistent data NOVA structures 4. Use it as slow DRAM NOVA NOVA

12 Existing Distributed File Systems are Slow

Distributed File Distributed File System Application System Client Server

Network Local File Client FS Network Stack Stack System

13 Orion: A Distributed Persistent Memory File System

Application

OrionFS OrionFS

RDMA 14 Orion: Key Features

• Based on NOVA • Mirrored metadata on client – Client keep local, NVMM copy of inode’s log – Leases + simple arbitration for concurrent updates • Mostly-local operation – Local read cache – CoW creates new, local copy • Pervasive RDMA – All addresses/pointers are RDMA-friendly – Zero-copy IO for most transfers (NOVA data structures are RDMA targets) – Single-ended remote data access

15 Application performance on Orion

16 What Should You Do With NVMM?

1. Use files and a conventional (distributed) file system 2. Use files and better file (distributed) system 3. Build persistent data structures 4. Use it as slow DRAM Log

17 Build NVMM Data Structures Is Hard

• All existing programming errors are still possible – Memory leaks – Multiple frees – Locking errors • There are new kinds of errors – Pointers between NV memory pools – Pointers from NVMM to DRAM • get this stuff wrong • Rebooting/restarting won’t help! • Language + Compiler support will come, but slowly

18 Optimizing RocksDB

19 What Should You Do With NVMM?

1. Use files and a conventional Easy; ~5x gains (distributed) file system 2. Use files and better file Pretty easy; ~10x gains (distributed) system The NVMM Programmability Gap 3. Build persistent data structures Really hard; ~30x gains 4. Use it as slow DRAM

20 Optimizing RocksDB

File Emulation

21 File Emulation

• Normal write-ahead logging – open(); – write(); sync(); • Emulate read/write in user space – open(); mmap(); – memcpy() + clwb + fence • Almost POSIX semantics – Minimal changes to app logic – No complex logging, allocation, or locking • Almost persistent data structure performance – Just 10% slower. 22 File Emulation Speedups

60

50

40

30

20

File Speedup Emulation 10

0 SQLite KyotoCabinet RocksDB XFS-DAX Ext4-DAX NOVA 23 What Should You Do With NVMM?

• You should study it! – Many interesting, open problems remain – Lots of PhDs to come • You should use it! – Use a file system! – Want more performance? Use file emulation! – Want more performance? Build persistent data structures.

24 NOVA is open source. We are preparing it for “upstreaming” in to .

To help or try it out: https://github.com/NVSL/linux-nova

25 Thanks!

26 Thank you!