Optimizing Systems for Byte-Addressable NVM by Reducing Bit Flipping

Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller

FAST ‘19 2019-02-26

Byte-addressable Non-volatile Memory

BNVM is coming, and with it, new optimization targets

CRSS Confidential

It’s not just writes... it’s the bits flipped by those writes.

0 1 0 1 0 0 0 1
0 1 1 0 0 1 0 1
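To make the distinction concrete, the cost of a write can be modeled as the number of bit positions that differ between the old and new contents of a cell (the Hamming distance). The sketch below applies that model to the two 8-bit patterns on this slide, assuming the first is the old value and the second is the new one; `bit_flips` is a hypothetical helper name, not part of the authors' tooling.

```python
def bit_flips(old: int, new: int) -> int:
    """Count the bit positions that differ between old and new."""
    return bin(old ^ new).count("1")


old = 0b01010001  # first pattern on the slide (assumed to be the old value)
new = 0b01100101  # second pattern on the slide (assumed to be the new value)

flips = bit_flips(old, new)
print(flips)  # 3 of the 8 bits change, even though a full byte is written
```

Under this model, rewriting a cell with an identical value costs zero flips, which is why two writes of the same size can differ widely in cost.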

BNVM power usage

Can we take advantage of this?

Software vs. hardware?

How hard is it to reason about bit flips?

How do we design data structures to reduce bit flips?

Reducing Bit Flips in Software

XOR linked lists

Traditional doubly linked list

[Figure: three nodes, each holding value, next, and prev]

XOR linked list

[Figure: three nodes, each holding value and xptr]

xptr = next ⊕ prev
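The relation xptr = next ⊕ prev can be sketched over a simulated address space. This is a minimal illustration, not the authors' implementation: the `XorList` class, the starting address `0x1000`, and the `0x20` node spacing are all assumptions, and address 0 stands in for NULL.

```python
class XorList:
    """XOR linked list over a simulated address space (address 0 is NULL)."""

    def __init__(self):
        self.mem = {}            # address -> [value, xptr]
        self.head = 0
        self.tail = 0
        self.next_addr = 0x1000  # hypothetical allocator start

    def append(self, value):
        addr = self.next_addr
        self.next_addr += 0x20   # hypothetical fixed node spacing
        # New tail has prev = old tail and next = NULL, so xptr = tail ^ 0.
        self.mem[addr] = [value, self.tail]
        if self.tail:
            # Old tail's next changes from NULL to addr: fold addr into its xptr.
            self.mem[self.tail][1] ^= addr
        else:
            self.head = addr
        self.tail = addr
        return addr

    def _walk(self, start):
        prev, cur = 0, start
        while cur:
            value, xptr = self.mem[cur]
            yield value
            # The neighbor we did not come from is xptr ^ (the one we did).
            prev, cur = cur, xptr ^ prev

    def forward(self):
        return list(self._walk(self.head))

    def backward(self):
        return list(self._walk(self.tail))


lst = XorList()
for v in ["a", "b", "c"]:
    lst.append(v)

forward = lst.forward()
backward = lst.backward()
middle_xptr = lst.mem[0x1020][1]  # 0x1000 ^ 0x1040
print(forward, backward, hex(middle_xptr))
```

Because neighboring nodes in this sketch sit at nearby addresses, the stored xptr of the middle node has only one bit set, whereas a raw pointer would carry all the shared high-order address bits.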

Pointers!

Some actual pointers:

A = 0x000055b7bda8f260
B = 0x000055b7bda8f6a0
A ⊕ B = 0x4C0 = 0b10011000000

Using XOR in hash tables

[Figure: hash table whose entries each hold key, value, and xnext]

Using XOR in hash tables

[Figure: entry layout with fields key, value, xnext, and D; two example entries, one ending in 00000 0 and one ending in xxxxx 1]

Both indicate “entry is empty”
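One reading of this slide is that an entry counts as empty either when its pointer field is all zeros or when a flag bit is set, so emptying an entry can flip a single bit instead of clearing a whole pointer. The sketch below assumes, purely for illustration, that the flag is the low-order bit of xnext; `EMPTY_FLAG`, `is_empty`, and `bit_flips` are hypothetical names, and the sample value 0x4C0 is the A ⊕ B result from the "Pointers!" slide.

```python
EMPTY_FLAG = 0x1  # assumption: the flag lives in the low-order bit of xnext


def is_empty(xnext: int) -> bool:
    """An entry is empty if xnext is all zeros or the flag bit is set."""
    return xnext == 0 or (xnext & EMPTY_FLAG) != 0


def bit_flips(old: int, new: int) -> int:
    """Count the bit positions that differ between old and new."""
    return bin(old ^ new).count("1")


xnext = 0x4C0  # sample XOR pointer value (A ^ B from the "Pointers!" slide)

flips_by_zeroing = bit_flips(xnext, 0)                 # clear the whole field
flips_by_flag = bit_flips(xnext, xnext | EMPTY_FLAG)   # set only the flag bit
print(flips_by_zeroing, flips_by_flag)
```

Both updates leave the entry recognizably empty, but setting the flag touches one bit regardless of how many bits the stored pointer has set.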

From XOR linked lists to Red Black Trees

[Figure: tree node with L, P, and R pointers]

Standard 3-pointer red-black design

From XOR linked lists to Red Black Trees

[Figure: tree node with LX and RX fields, alongside the original L, P, and R pointers]

LX = L ⊕ P
RX = R ⊕ P
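Reading the slide as LX = L ⊕ P and RX = R ⊕ P, a node no longer stores its parent pointer, but any of the three pointers can be recovered when one neighbor is already known. The following sketch shows this on a three-node tree in simulated memory; the addresses, `make_node`, and the traversal helpers are hypothetical, and red-black coloring and rebalancing are omitted.

```python
NULL = 0


def make_node(key, left, right, parent):
    """Store only two fields: LX = L ^ P and RX = R ^ P."""
    return (key, left ^ parent, right ^ parent)


# Simulated memory: address -> (key, lx, rx). Addresses are hypothetical.
mem = {
    0x100: make_node(20, 0x140, 0x180, NULL),  # root, parent is NULL
    0x140: make_node(10, NULL, NULL, 0x100),
    0x180: make_node(30, NULL, NULL, 0x100),
}


def children(addr, parent):
    """Descending from a known parent: L = LX ^ P and R = RX ^ P."""
    _, lx, rx = mem[addr]
    return lx ^ parent, rx ^ parent


def parent_via_left(addr, left):
    """Ascending from a known left child: P = LX ^ L."""
    _, lx, _ = mem[addr]
    return lx ^ left


def inorder(addr, parent=NULL):
    if addr == NULL:
        return []
    key = mem[addr][0]
    left, right = children(addr, parent)
    return inorder(left, addr) + [key] + inorder(right, addr)


keys = inorder(0x100)
leaf_parent = parent_via_left(0x140, NULL)
print(keys, hex(leaf_parent))
```

A top-down traversal always knows the parent it came from, so the two XOR fields are enough to reach both children.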

Now 2-pointer, and XOR pointers

Evaluation

Experimental framework

[...] at the memory controller

Test different data structures, with different cache parameters, over a varying number of operations

Bit flips: calling malloc()

Cross-over point between 40 and 48 bytes

Bit flips: XOR Linked Lists

XOR linked lists reduce bit flips dramatically

Bit flips:

Hash table already had few flips to save: chains should be short

Bit flips: Red-black Trees

Significant savings

Bit flips: L2 Cache Behavior

L2 cache ultimately has little effect!

...even when increasing in size

Performance: RBT insert

Performance is not significantly affected!

a) Performance cost of XORs
b) Performance benefit of smaller node size

Performance: hash table

Almost no effect on performance

Conclusions

Savings are significant with little performance impact.

We can design around bit flips, and we should.

Bit flip/write inversion

System simulation; Caching effects; Power and wear estimates; Data structures; Pointer distance; ABI modifications

Full systems; More data structures; Additional techniques; Real hardware; Bitflips from algorithms

Thank You! Questions?

Daniel Bittman [email protected] [email protected] [email protected]

https://gitlab.soe.ucsc.edu/gitlab/crss/opensource-bitflipping-fast19

Backup slides

Bit flips: instrumentation

Performance: RBT lookup

Performance: XOR Linked List

Operation    XOR Linked List    Doubly Linked List
[...]        45 +/- 1           45 +/- 1
[...]        27 +/- 1           28 +/- 1
[...]        2.6 +/- 0.1        2.2 +/- 0.1

Stack frames

[Figure: stack frame layouts for fn0 and fn1, with slots arg0, pc, sp, csr0, and csr1]

“Wasted space”

Bit flips: stack frames