Optimizing Systems for Byte-Addressable NVM by Reducing Bit Flipping
Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller
FAST ‘19 2019-02-26
1 Byte-addressable Non-volatile Memory
BNVM is coming, and with it, new optimization targets
CRSS Confidential 2 Byte-addressable Non-volatile Memory
0 1 0 1 0 0 0 1 It’s not just writes...... it’s the bits flipped by those writes 0 1 1 0 0 1 0 1
CRSS Confidential 3 BNVM power usage
4 Can we take advantage of this?
Software vs. hardware?
5 Can we take advantage of this?
Software vs. hardware?
How hard is it to reason about bit flips?
6 Can we take advantage of this?
Software vs. hardware?
How hard is it to reason about bit flips?
How do we design data structures to reduce bit flips?
7 8 Reducing Bit ● Flips in Software ● ●
9 XOR linked lists
Traditional doubly linked list
value value value next next next prev prev prev
XOR linked list
value value value xptr xptr xptr xptr = next ⊕ prev
10 Pointers!
Some actual pointers A = 0x000055b7bda8f260 B = 0x000055b7bda8f6a0 A ⊕ B = 0x4C0 = 0b10011000000 11 Using XOR in hash tables
key value xnext ... key key value value xnext xnext ... key value
xnext 12 Using XOR in hash tables
key value xnext D
key key value value 00000 0 xxxxx 1
Both indicate “entry is empty”
13 From XOR linked lists to Red Black Trees
L P R
Standard 3-pointer red-black tree design 14 From XOR linked lists to Red Black Trees
⊕ RX = R P L P R LX RX ⊕ LX = L P
Now 2-pointer, and XOR pointers 15 Evaluation ● ●
16 Experimental framework
, at the memory controller
Test different data structures, with different cache parameters, over a varying number of operations
17 Bit flips: calling malloc()
Cross-over point between 40 and 48 bytes
18 Bit flips: XOR Linked Lists
XOR linked lists reduce bit flips dramatically
better 19 Bit flips: Hash table
Hash table already had few flips to save: chains should be short
better 20 Bit flips: Red-black Trees
Significant savings
better 21 Bit flips: L2 Cache Behavior
L2 cache has ultimately little effect!
better 22 Bit flips: L2 Cache Behavior
L2 cache has ultimately little effect!
...even when increasing in size
better 23 Performance: RBT insert
Performance is not significantly affected!
a) Performance cost of XORs better b) Performance benefit of smaller node size 24 Performance: hash table
Almost no effect on performance
better 25 Conclusions
Savings are significant with little performance impact.
We can design around bit flips, and we should.
Bit flip/write inversion
CRSS Confidential 26 System simulation Caching Power and wear effects Data structures estimates Pointer ABI modifications distance
Full systems More data Additional structures techniques Real hardware Bitflips from algorithms
27 Thank You! Questions?
Daniel Bittman [email protected] [email protected] [email protected]
https://gitlab.soe.ucsc.edu/gitlab/crss/opensource-bitflipping-fast19
28 Backup slides
29 Bit flips: instrumentation
better 30 Performance: RBT lookup
better 31 Performance: XOR Linked List
Operation XOR Linked List Doubly Linked List
45 +/- 1 45 +/- 1
27 +/- 1 28 +/- 1
2.6 +/ 0.1 2.2 +/- 0.1
32 Stack frames
fn0 fn1
arg0 arg0 pc pc sp sp csr0 csr1 csr1
33 Stack frames
fn0 fn1
arg0 arg0 pc pc sp sp csr0 csr1 csr1
“Wasted space”
34 Bit flips: stack frames
35