Optimizing Systems for Byte-Addressable NVM by Reducing Bit Flipping

Total Page:16

File Type:pdf, Size:1020Kb

Optimizing Systems for Byte-Addressable NVM by Reducing Bit Flipping Optimizing Systems for Byte-Addressable NVM by Reducing Bit Flipping Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller FAST ‘19 2019-02-26 1 Byte-addressable Non-volatile Memory BNVM is coming, and with it, new optimization targets CRSS Confidential 2 Byte-addressable Non-volatile Memory 0 1 0 1 0 0 0 1 It’s not just writes... ...it’s the bits flipped by those writes 0 1 1 0 0 1 0 1 CRSS Confidential 3 BNVM power usage 4 Can we take advantage of this? Software vs. hardware? 5 Can we take advantage of this? Software vs. hardware? How hard is it to reason about bit flips? 6 Can we take advantage of this? Software vs. hardware? How hard is it to reason about bit flips? How do we design data structures to reduce bit flips? 7 8 Reducing Bit ● Flips in Software ● ● 9 XOR linked lists Traditional doubly linked list value value value next next next prev prev prev XOR linked list value value value xptr xptr xptr xptr = next ⊕ prev 10 Pointers! Some actual pointers A = 0x000055b7bda8f260 B = 0x000055b7bda8f6a0 A ⊕ B = 0x4C0 = 0b10011000000 11 Using XOR in hash tables key value xnext ... key key value value xnext xnext ... key value xnext 12 Using XOR in hash tables key value xnext D key key value value 00000 0 xxxxx 1 Both indicate “entry is empty” 13 From XOR linked lists to Red Black Trees L P R Standard 3-pointer red-black tree design 14 From XOR linked lists to Red Black Trees ⊕ RX = R P L P R LX RX ⊕ LX = L P Now 2-pointer, and XOR pointers 15 Evaluation ● ● 16 Experimental framework , at the memory controller Test different data structures, with different cache parameters, over a varying number of operations 17 Bit flips: calling malloc() Cross-over point between 40 and 48 bytes 18 Bit flips: XOR Linked Lists XOR linked lists reduce bit flips dramatically better 19 Bit flips: Hash table Hash table already had few flips to save: chains should be short better 20 Bit flips: Red-black Trees Significant savings better 21 Bit flips: L2 Cache Behavior L2 cache has ultimately little effect! better 22 Bit flips: L2 Cache Behavior L2 cache has ultimately little effect! ...even when increasing in size better 23 Performance: RBT insert Performance is not significantly affected! a) Performance cost of XORs better b) Performance benefit of smaller node size 24 Performance: hash table Almost no effect on performance better 25 Conclusions Savings are significant with little performance impact. We can design around bit flips, and we should. Bit flip/write inversion CRSS Confidential 26 System simulation Caching Power and wear effects Data structures estimates Pointer ABI modifications distance Full systems More data Additional structures techniques Real hardware Bitflips from algorithms 27 Thank You! Questions? Daniel Bittman [email protected] [email protected] [email protected] https://gitlab.soe.ucsc.edu/gitlab/crss/opensource-bitflipping-fast19 28 Backup slides 29 Bit flips: instrumentation better 30 Performance: RBT lookup better 31 Performance: XOR Linked List Operation XOR Linked List Doubly Linked List 45 +/- 1 45 +/- 1 27 +/- 1 28 +/- 1 2.6 +/ 0.1 2.2 +/- 0.1 32 Stack frames fn0 fn1 arg0 arg0 pc pc sp sp csr0 csr1 csr1 33 Stack frames fn0 fn1 arg0 arg0 pc pc sp sp csr0 csr1 csr1 “Wasted space” 34 Bit flips: stack frames 35.
Recommended publications
  • Lecture 04 Linear Structures Sort
    Algorithmics (6EAP) MTAT.03.238 Linear structures, sorting, searching, etc Jaak Vilo 2018 Fall Jaak Vilo 1 Big-Oh notation classes Class Informal Intuition Analogy f(n) ∈ ο ( g(n) ) f is dominated by g Strictly below < f(n) ∈ O( g(n) ) Bounded from above Upper bound ≤ f(n) ∈ Θ( g(n) ) Bounded from “equal to” = above and below f(n) ∈ Ω( g(n) ) Bounded from below Lower bound ≥ f(n) ∈ ω( g(n) ) f dominates g Strictly above > Conclusions • Algorithm complexity deals with the behavior in the long-term – worst case -- typical – average case -- quite hard – best case -- bogus, cheating • In practice, long-term sometimes not necessary – E.g. for sorting 20 elements, you dont need fancy algorithms… Linear, sequential, ordered, list … Memory, disk, tape etc – is an ordered sequentially addressed media. Physical ordered list ~ array • Memory /address/ – Garbage collection • Files (character/byte list/lines in text file,…) • Disk – Disk fragmentation Linear data structures: Arrays • Array • Hashed array tree • Bidirectional map • Heightmap • Bit array • Lookup table • Bit field • Matrix • Bitboard • Parallel array • Bitmap • Sorted array • Circular buffer • Sparse array • Control table • Sparse matrix • Image • Iliffe vector • Dynamic array • Variable-length array • Gap buffer Linear data structures: Lists • Doubly linked list • Array list • Xor linked list • Linked list • Zipper • Self-organizing list • Doubly connected edge • Skip list list • Unrolled linked list • Difference list • VList Lists: Array 0 1 size MAX_SIZE-1 3 6 7 5 2 L = int[MAX_SIZE]
    [Show full text]
  • Optimizing Systems for Byte-Addressable NVM by Reducing Bit Flipping Daniel Bittman, Darrell D
    Optimizing Systems for Byte-Addressable NVM by Reducing Bit Flipping Daniel Bittman, Darrell D. E. Long, Peter Alvaro, and Ethan L. Miller, UC Santa Cruz https://www.usenix.org/conference/fast19/presentation/bittman This paper is included in the Proceedings of the 17th USENIX Conference on File and Storage Technologies (FAST ’19). February 25–28, 2019 • Boston, MA, USA 978-1-931971-48-5 Open access to the Proceedings of the 17th USENIX Conference on File and Storage Technologies (FAST ’19) is sponsored by Optimizing Systems for Byte-Addressable NVM by Reducing Bit Flipping Daniel Bittman Peter Alvaro Darrell D. E. Long Ethan L. Miller UC Santa Cruz UC Santa Cruz UC Santa Cruz UC Santa Cruz Pure Storage Abstract stressing their weaknesses. Historically, such optimizations have included reducing the number of writes performed, ei- New byte-addressable non-volatile memory (BNVM) tech- ther by designing data structures that require fewer writes nologies such as phase change memory (PCM) enable the or by using hardware techniques such as caching to reduce construction of systems with large persistent memories, im- writes. However, it is the number of bits flipped that matter proving reliability and potentially reducing power consump- most for BNVMs such as phase-change memory (PCM), not tion. However, BNVM technologies only support a limited the number of words written. number of lifetime writes per cell and consume most of their BNVMs such as PCM suffer from two problems caused by power when flipping a bit’s state during a write; thus, PCM flipping bits: energy usage and cell wear-out.
    [Show full text]
  • Fundamental Data Structures Contents
    Fundamental Data Structures Contents 1 Introduction 1 1.1 Abstract data type ........................................... 1 1.1.1 Examples ........................................... 1 1.1.2 Introduction .......................................... 2 1.1.3 Defining an abstract data type ................................. 2 1.1.4 Advantages of abstract data typing .............................. 4 1.1.5 Typical operations ...................................... 4 1.1.6 Examples ........................................... 5 1.1.7 Implementation ........................................ 5 1.1.8 See also ............................................ 6 1.1.9 Notes ............................................. 6 1.1.10 References .......................................... 6 1.1.11 Further ............................................ 7 1.1.12 External links ......................................... 7 1.2 Data structure ............................................. 7 1.2.1 Overview ........................................... 7 1.2.2 Examples ........................................... 7 1.2.3 Language support ....................................... 8 1.2.4 See also ............................................ 8 1.2.5 References .......................................... 8 1.2.6 Further reading ........................................ 8 1.2.7 External links ......................................... 9 1.3 Analysis of algorithms ......................................... 9 1.3.1 Cost models ......................................... 9 1.3.2 Run-time analysis
    [Show full text]
  • A Theory of Singly-Linked Lists and Its Extensible Decision Procedure∗
    A Theory of Singly-Linked Lists and its Extensible Decision Procedure∗ Silvio Ranise Calogero Zarba LORIA-INRIA–Lorraine & Universita` degli Studi di Milano Universitat¨ des Saarlandes Abstract there exist precise and automatic techniques to reason about pointer reachability (see, e.g., [19]), little has been done The key to many approaches to reason about pointer- to combine such techniques with available decision pro- based data structures is the availability of a decision proce- cedures for theories over data and pointers. As a conse- dure to automatically discharge proof obligations in a the- quence, approximate solutions have been proposed. Either ory encompassing data, pointers, and the reachability re- the structure over data and pointers have been abstracted lation induced by pointers. So far, only approximate so- away so that tools to reason about reachability can be used lutions have been proposed which abstract either the data (see, e.g., [12]), or a first-order approximation of reachabil- or the reachability component. Indeed, such approxima- ity has been found (see, e.g., [26]) so that available deci- tions cause a lack of precision in the verification techniques sion procedures for the theories of pointers and data can be where the decision procedures are exploited. used. Indeed, this compromise causes a lack of precision In this paper, we consider the pointer-based data struc- in the verification techniques where such reasoning proce- ture of singly-linked lists and define a Theory of Linked dures are used. It would be very desirable to build a de- Lists (TLL). The theory is expressive since it is capable of cision procedure capable of precisely reason about reacha- precisely expressing both data and reachability constraints, bility while being extensible with available decision proce- while ensuring decidability.
    [Show full text]
  • Petalinux Tools Documentation: Reference Guide
    See all versions of this document PetaLinux Tools Documentation Reference Guide UG1144 (v2021.1) June 16, 2021 Revision History Revision History The following table shows the revision history for this document. Section Revision Summary 06/16/2021 Version 2021.1 Chapter 7: Customizing the Project Added a new section: Configuring UBIFS Boot. Chapter 5: Booting and Packaging Updated Steps to Boot a PetaLinux Image on Hardware with SD Card. Appendix A: Migration Added FPGA Manager Changes, Yocto Recipe Name Changes, Host GCC Version Upgrade. Chapter 10: Advanced Configurations Updated U-Boot Configuration and Image Packaging Configuration. UG1144 (v2021.1) June 16, 2021Send Feedback www.xilinx.com PetaLinux Tools Documentation Reference Guide 2 Table of Contents Revision History...............................................................................................................2 Chapter 1: Overview.................................................................................................... 8 Introduction................................................................................................................................. 8 Navigating Content by Design Process.................................................................................... 9 Chapter 2: Setting Up Your Environment...................................................... 11 Installation Steps.......................................................................................................................11 PetaLinux Working Environment Setup................................................................................
    [Show full text]
  • Practical Implementation of Xor – Linked Lists in Image Storage
    PRACTICAL IMPLEMENTATION OF XOR – LINKED LISTS IN IMAGE STORAGE Autori: Igor MARTA, Mihail KULEV Conducătorştiinţific: dr., conf. univ.MihailKULEV Universitatea Tehnică a Moldovei Email:[email protected], [email protected] Abstract:Have been considered quadtrees in C/C++ languages and their application for image storing and a way of segmentation of the pixel array, converting quadtree to XOR quadtree, generation of line codes and inverse traversals of quadtree. Keywords: Quadtree, XOR – linked lists, linecodes, rgb color pallet, bitmap image format, pixel array, spatial data structure. 1. Introduction The aim was to involve this method of linking nodes into a project which makes use of linked lists or trees that require bidirectional traversals. The choice was computer graphics. There are several methods of storing an image in computer memory. One of those is the polygon decomposition (here: square decomposition), which involves usage of quad trees and linked lists. Also in this method are required bidirectional traversals of the tree which is the place where XOR linking can fit pretty well in order to safe space and to avoid using of an additional memory field in the quad tree structure. Also it was practically useful to see graphically what image the quad trees contain and it was involved in the project some references of image processing. This was made in order to be able to save the pixel matrix in quad tree form and in image form. Also it was interesting to see if after decomposition and saving the image in location code form, the size of memory occupied by the image reduces or increases.
    [Show full text]
  • Data Structures (I)
    Data Structures (I) Stack Queue Linked List 1 What are data structures? A data structure is a particular way of organizing data so that they can be used efficiently 2 What are data structures? A data structure is a particular way of organizing data so that they can be used efficiently unorganized well organized 3 Contents Abstract Data Types - implemented by data structures ● Stack - LIFO ● Queue - FIFO Data structures ● Array ● Linked List ○ Singly linked list ○ Doubly linked list ○ Circular linked list ○ XOR linked list 4 Stack A stack of books 5 Stack We can ● put a book on top ● remove the top book We cannot A stack of books ● insert a book in the middle ● remove a book in the middle 6 Stack We can ● put an element on top ● remove the top element We cannot ● insert an element in the middle ● remove an element in the middle Output order ● Last In First Out (LIFO) 7 Implementation 0 1 2 3 4 5 6 7 Bear 0 1 2 3 4 5 6 7 Bear Cow 0 1 2 3 4 5 6 7 Bear Cow Duck 0 1 2 3 4 5 6 7 Bear Cow 0 1 2 3 4 5 6 7 8 Why not use array? ● Array is too powerful, easier to make mistakes ● Stack is so simple, you can't make a mistake ● Stack is easier to understand than array 9 Parentheses balance Determine if a string of 1 <= N <= 10000 parentheses is balanced. input {()[]} output Yes input {(][)} output No 10 Parentheses balance Maintain a stack, left parentheses → push stack right parentheses → pop stack { {()[}] 11 Parentheses balance Maintain a stack, left parentheses → push stack right parentheses → pop stack {()[}] { { ( {()[}] 12 Parentheses balance
    [Show full text]
  • Linux Loadable Kernel Module HOWTO
    Linux Loadable Kernel Module HOWTO Bryan Henderson 2006−09−24 Revision History Revision v1.09 2006−09−24 Revised by: bjh Fix typos. Revision v1.08 2006−03−03 Revised by: bjh Add copyright information. Revision v1.07 2005−07−20 Revised by: bjh Add some 2.6 info and disclaimers. Update references to Linux Device Drivers book, Linux Kernel Module Programming Guide. Revision v1.06 2005−01−12 Revised by: bjh Cover Linux 2.6 briefly. Update hello.c and reference to Lkmpg. Add information about perils of unloading. Mention dmesg as way to see kernel messages. Revision v1.05 2004−01−05 Revised by: bjh Add information on module.h and −DMODULE. Fix tldb.org to tldp.org. Add information on kallsyms. Revision v1.04 2003−10−10 Revised by: bjh Fix typo: AHA154x should be AHA152x Add information on what module names the kernel module loader calls for. Add information on what an LKM does when you first load it. Add information on loop module. Change linuxdoc.org to tldp.org. Revision v1.03 2003−07−03 Revised by: bjh Update on kernels that don't load into vmalloc space. Add explanation of "deleted" state of an LKM. Explain GPLONLY. Revision v1.02 2002−05−21 Revised by: bjh Correct explanation of symbol versioning. Correct author of Linux Device Drivers. Add info about memory allocation penalty of LKM vs bound−in. Add LKM−to−LKM symbol matching requirement. Add open source licensing issue in LKM symbol resolution. Add SMP symbol versioning info. Revision v1.01 2001−08−18 Revised by: bjh Add material on various features created in the last few years: kernel module loader, ksymoops symbols, kernel−version−dependent LKM file location.
    [Show full text]
  • Compact Tetrahedralization-Based Acceleration Structure for Ray Tracing
    Compact Tetrahedralization-based Acceleration Structure for Ray Tracing Aytek Amana, Serkan Demircia,Ugur˘ Gud¨ ukbay¨ a,∗ aDepartment of Computer Engineering, Bilkent University, 06800, Ankara, Turkey Abstract We propose a compact and efficient tetrahedral mesh representation to improve the ray-tracing performance. We re- order tetrahedral mesh data using a space-filling curve to improve cache locality. Most importantly, we propose an efficient ray traversal algorithm. We provide details of common ray tracing operations on tetrahedral meshes and give the GPU implementation of our traversal method. We demonstrate our findings through a set of comprehensive ex- periments. Our method outperforms existing tetrahedral mesh-based traversal methods and yields comparable results to the traversal methods based on the state of the art acceleration structures such as k-dimensional (k-d) trees and Bounding Volume Hierarchies (BVHs). Keywords: ray tracing, ray-surface intersection, acceleration structures, tetrahedral meshes, Bounding Volume Hierarchy (BVH), k-dimensional (k-d) tree. 1. Introduction tures for ray tracing, thanks to the recent advancements in the construction and traversal methods. A more recent alternative to accelerate ray-surface in- The core operation in ray tracing is the ray-surface in- tersection calculations is to use tetrahedralizations. A tersection calculations, which may contribute more than tetrahedral mesh is a three-dimensional (3-D) structure 95% of the total computation time [1]. Hence, the com- that partitions the 3-D space into tetrahedra. Constrained putational cost of intersection calculations determines the tetrahedralizations are a special case of tetrahedralizations run-time efficiency of the ray-tracing algorithm. To speed that take the input geometry into account.
    [Show full text]
  • Hardware Support for the Security Analysis of Embedded Softwares : Applications on Information Flow Control and Malware Analysis Muhammad Abdul Wahab
    Hardware support for the security analysis of embedded softwares : applications on information flow control and malware analysis Muhammad Abdul Wahab To cite this version: Muhammad Abdul Wahab. Hardware support for the security analysis of embedded softwares : appli- cations on information flow control and malware analysis. Hardware Architecture [cs.AR]. Centrale- Supélec, 2018. English. NNT : 2018CSUP0003. tel-02634340 HAL Id: tel-02634340 https://tel.archives-ouvertes.fr/tel-02634340 Submitted on 27 May 2020 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. THESE DE DOCTORAT DE CENTRALESUPELEC COMUE UNIVERSITE BRETAGNE LOIRE ECOLE DOCTORALE N° 601 Mathématiques et Sciences et Technologies de l'Information et de la Communication Spécialité : Electronique Par Muhammad Abdul WAHAB Support matériel pour l’analyse de sécurité du comportement des applications Application au contrôle de flux d’information et à l’analyse des logiciels malveillants Thèse présentée et soutenue à Rennes, le 10 décembre 2018 Unité de recherche : IETR, UMR 6164 Thèse N° : 2018-08-TH
    [Show full text]
  • XOR-BASED COMPACT TRIANGULATIONS Abdelkrim
    Computing and Informatics, Vol. 37, 2018, 367{384, doi: 10.4149/cai 2018 2 367 XOR-BASED COMPACT TRIANGULATIONS Abdelkrim Mebarki D´epartement d'Informatique Facult´edes Math´ematiqueset Informatique Universit´edes Sciences et de la Technologie d'Oran { Mohamed Boudiaf USTO-MB, BP 1505 Oran El M'naouer, 31000, Oran Alg´erie e-mail: [email protected] Abstract. Media, image processing, and geometric-based systems and applications need data structures to model and represent different geometric entities and objects. These data structures have to be time efficient and compact in term of space. Many structures in use are proposed to satisfy those constraints. This paper introduces a novel compact data structure inspired by the XOR-linked lists. The subject of this paper concerns the triangular data structures. Nevertheless, the underlying idea could be used for any other geometrical subdivision. The ability of the bit- wise XOR operator to reduce the number of references is used to model triangle and vertex references. The use of the XOR combined references needs to define a context from which the triangle is accessed. The direct access to any triangle is not possible using only the XOR-linked scheme. To allow the direct access, addi- tional information are added to the structure. This additional information permits a constant time access to any element of the triangulation using a local resolution scheme. This information represents an additional cost to the triangulation, but the gain is still maintained. This cost is reduced by including this additional in- formation to a local sub-triangulation and not to each triangle.
    [Show full text]
  • I.MX Reference Manual
    i.MX Reference Manual Document Number: IMXLXRM Rev. L4.9.88_2.0.0-ga, 05/2018 i.MX Reference Manual, Rev. L4.9.88_2.0.0-ga, 05/2018 2 NXP Semiconductors Contents Section number Title Page Chapter 1 Introduction 1.1 Overview.........................................................................................................................................................................33 1.1.1 Software Base.................................................................................................................................................... 33 1.1.2 Features.............................................................................................................................................................. 34 1.2 Audience......................................................................................................................................................................... 37 1.2.1 Conventions....................................................................................................................................................... 38 1.2.2 Definitions, Acronyms, and Abbreviations........................................................................................................38 Chapter 2 System 2.1 Machine-Specific Layer (MSL)......................................................................................................................................43 2.1.1 Introduction........................................................................................................................................................43
    [Show full text]