Optimizing Systems for Byte-Addressable NVM by Reducing Bit Flipping
Total Page:16
File Type:pdf, Size:1020Kb
Optimizing Systems for Byte-Addressable NVM by Reducing Bit Flipping Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller FAST ‘19 2019-02-26 1 Byte-addressable Non-volatile Memory BNVM is coming, and with it, new optimization targets CRSS Confidential 2 Byte-addressable Non-volatile Memory 0 1 0 1 0 0 0 1 It’s not just writes... ...it’s the bits flipped by those writes 0 1 1 0 0 1 0 1 CRSS Confidential 3 BNVM power usage 4 Can we take advantage of this? Software vs. hardware? 5 Can we take advantage of this? Software vs. hardware? How hard is it to reason about bit flips? 6 Can we take advantage of this? Software vs. hardware? How hard is it to reason about bit flips? How do we design data structures to reduce bit flips? 7 8 Reducing Bit ● Flips in Software ● ● 9 XOR linked lists Traditional doubly linked list value value value next next next prev prev prev XOR linked list value value value xptr xptr xptr xptr = next ⊕ prev 10 Pointers! Some actual pointers A = 0x000055b7bda8f260 B = 0x000055b7bda8f6a0 A ⊕ B = 0x4C0 = 0b10011000000 11 Using XOR in hash tables key value xnext ... key key value value xnext xnext ... key value xnext 12 Using XOR in hash tables key value xnext D key key value value 00000 0 xxxxx 1 Both indicate “entry is empty” 13 From XOR linked lists to Red Black Trees L P R Standard 3-pointer red-black tree design 14 From XOR linked lists to Red Black Trees ⊕ RX = R P L P R LX RX ⊕ LX = L P Now 2-pointer, and XOR pointers 15 Evaluation ● ● 16 Experimental framework , at the memory controller Test different data structures, with different cache parameters, over a varying number of operations 17 Bit flips: calling malloc() Cross-over point between 40 and 48 bytes 18 Bit flips: XOR Linked Lists XOR linked lists reduce bit flips dramatically better 19 Bit flips: Hash table Hash table already had few flips to save: chains should be short better 20 Bit flips: Red-black Trees Significant savings better 21 Bit flips: L2 Cache Behavior L2 cache has ultimately little effect! better 22 Bit flips: L2 Cache Behavior L2 cache has ultimately little effect! ...even when increasing in size better 23 Performance: RBT insert Performance is not significantly affected! a) Performance cost of XORs better b) Performance benefit of smaller node size 24 Performance: hash table Almost no effect on performance better 25 Conclusions Savings are significant with little performance impact. We can design around bit flips, and we should. Bit flip/write inversion CRSS Confidential 26 System simulation Caching Power and wear effects Data structures estimates Pointer ABI modifications distance Full systems More data Additional structures techniques Real hardware Bitflips from algorithms 27 Thank You! Questions? Daniel Bittman [email protected] [email protected] [email protected] https://gitlab.soe.ucsc.edu/gitlab/crss/opensource-bitflipping-fast19 28 Backup slides 29 Bit flips: instrumentation better 30 Performance: RBT lookup better 31 Performance: XOR Linked List Operation XOR Linked List Doubly Linked List 45 +/- 1 45 +/- 1 27 +/- 1 28 +/- 1 2.6 +/ 0.1 2.2 +/- 0.1 32 Stack frames fn0 fn1 arg0 arg0 pc pc sp sp csr0 csr1 csr1 33 Stack frames fn0 fn1 arg0 arg0 pc pc sp sp csr0 csr1 csr1 “Wasted space” 34 Bit flips: stack frames 35.