Efficient Support of Position Independence on Non-Volatile Memory
Total Page:16
File Type:pdf, Size:1020Kb
Efficient Support of Position Independence on Non-Volatile Memory Guoyang Chen Lei Zhang Richa Budhiraja Qualcomm Inc. North Carolina State University Qualcomm Inc. [email protected] Raleigh, NC [email protected] [email protected] Xipeng Shen Youfeng Wu North Carolina State University Intel Corp. Raleigh, NC [email protected] [email protected] ABSTRACT In Proceedings of MICRO-50, Cambridge, MA, USA, October 14–18, 2017, This paper explores solutions for enabling efficient supports of 13 pages. https://doi.org/10.1145/3123939.3124543 position independence of pointer-based data structures on byte- addressable None-Volatile Memory (NVM). When a dynamic data structure (e.g., a linked list) gets loaded from persistent storage into 1 INTRODUCTION main memory in different executions, the locations of the elements Byte-Addressable Non-Volatile Memory (NVM) is a new class contained in the data structure could differ in the address spaces of memory technologies, including phase-change memory (PCM), from one run to another. As a result, some special support must memristors, STT-MRAM, resistive RAM (ReRAM), and even flash- be provided to ensure that the pointers contained in the data struc- backed DRAM (NV-DIMMs). Unlike on DRAM, data on NVM tures always point to the correct locations, which is called position are durable, despite software crashes or power loss. Compared to independence. existing flash storage, some of these NVM technologies promise This paper shows the insufficiency of traditional methods in sup- 10-100x better performance, and can be accessed via memory in- porting position independence on NVM. It proposes a concept called structions rather than I/O operations. For its appealing properties, implicit self-contained representations of pointers, and develops NVM has the potential to create some important impact on comput- two such representations named off-holder and Region ID in Value ing. NVM has drawn some strong interest in industry, exemplified (RIV) to materialize the concept. Experiments show that the enabled with the 3d-XPoint memory announcement by Intel and Micron [2], representations provide much more efficient and flexible support the “The Machine” project by HP [8], the explorations to use it for of position independence for dynamic data structures, alleviating a databases [4], key-value stores [5], file systems [11, 13, 32], and so major issue for effective data reuses on NVM. on [20]. Designs of traditional systems for general applications have been CCS CONCEPTS assuming a separation of memory and storage. Effectively support- • Hardware Memory and dense storage; • Computer systems ing NVM hence requires innovations across the entire computing ! organization Architectures; • Software and its engineering stack. Recent years have seen many studies on architectural de- ! ! Compilers; General programming languages; signs (e.g., [17–19, 21, 23, 24, 33]), operating system extensions (e.g., [11–13, 22, 27, 28, 31, 32]), and programming model develop- KEYWORDS ments (e.g., [1, 3, 6–10, 15, 29]). Compiler, Program Optimizations, Programming Languages, NVM These studies have primarily concentrated on how to provide ACID (Atomicity, Consistency, Isolation, and Durability) for appli- ACM Reference format: cations that use NVM, such as how to ensure data consistency and Guoyang Chen, Lei Zhang, Richa Budhiraja, Xipeng Shen, and Youfeng Wu. recoverability upon a system crash, how to minimize the needed 2017. Efficient Support of Position Independence on Non-Volatile Memory. logging overhead, how to support atomic updates. In this paper, we concentrate on a different dimension: data * The work was done when Guoyang Chen and Richa Budhiraja were students at NCSU. reusability, which refers to how efficiently and flexibly an appli- cation can use the data previously stored on NVM by this or other Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed applications. ACID support is important for applications to function for profit or commercial advantage and that copies bear this notice and the full citation soundly on NVM, while data reusability is critical for turning NVM on the first page. Copyrights for components of this work owned by others than ACM persistency into computing efficiency. must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a More specifically, this work is focused on a fundamental aspect in fee. Request permissions from [email protected]. enhancing data reusability: the support of position independence in MICRO-50, October 14–18, 2017, Cambridge, MA, USA pointer-based data structures. A central challenge for efficient data © 2017 Association for Computing Machinery. ACM ISBN 978-1-4503-4952-9/17/10. $15.00 reuse is to enable position independence for dynamic data structures. https://doi.org/10.1145/3123939.3124543 A data object is position independent if all the references in the data 191 MICRO-50, October 14–18, 2017, Cambridge, MA, USA G.Chen, L.Zhang, R.Budhiraja, X.Shen and Y.Wu struct uint regionID; ulong offset (offset is the offset 100000h 130000h { } of the target location from the base address of the NVRegion that next = 100020h next = 100020h has an ID equaling regionID). It calculates the actual address from the fields of the struct or class at each pointer access. It doubles the space usage of pointers, and at the same time, incurs time overhead 100020h 130020h target target similarly large as swizzling causes. A better support of position independence is hence essential for data reusability on NVM. This paper describes our solutions to the problem, which consist of two novel representations of pointers and some programming lan- guage extensions. The key in our solution is the concept of implicit (a) Addr. Space in Run-1 (b) Addr. Space in Run-2 self-contained representations of pointers. A pointer is in an implicit self-contained representation if it meets three conditions: (1) it is Figure 1: Issue of position dependent representation: “next” no larger than a normal pointer; (2) it contains all the info needed points to a wrong target in Run-2 when the data region are for locating the target address of the pointer; (3) programmers can mapped to a different virtual address than in Run-1. use the pointer for data accesses in the same way as using a normal pointer (the type declaration could differ). Such representations min- imize the space overhead of pointers in a dynamic data structure, object keep their integrity (i.e., pointing to the correct addresses) and at the same time, allow intuitive usage of non-volatile pointers. regardless of where in the memory space the data object is mapped. The challenge is how to materialize the representations such that Dynamic data structures impose special difficulties for achieving they incur the minimum runtime overhead. Concatenating the two position independence due to the usage of pointers and references fields of a fat pointer (regionID and offset) into one 64-bit word, in them. Consider a linked list, in which, each node contains a for instance, can make the pointer self contained. But it would still field “next” that carries the address of the next node as illustrated require translations between the regionID and the address of the in Figure 1 (a). This implementation of pointers is not position region at runtime, which, without a careful implementation, could independent. If the linked list is mapped to a different location as easily incur large overhead. Figure 1 (b) shows, the value of “next” of the first node would not We solve the problem by developing off-holder and Region ID equal the current address of the second node, causing errors in the in Value (RIV) pointer representations, two implicit self-contained linked list. All data structures containing pointers are subject to this representations of pointers to enable efficient support of position issue, including linked lists, graphs, trees, hash tables, maps, classes, independence. By storing the offset of the target location and the and so on. pointer’s own address, off-holder offers an efficient support for point- Position independence is an essential requirement for NVM. Data ers pointing to locations within the same NVRegion as the pointers on NVM are supposed to be reused across runs and applications. reside. RIV gives a more general solution, supporting pointers both Requiring a data object to always map to the same virtual address is within and across regions. To make the runtime convertions between impractical in general systems. The address could be already used by region IDs and addresses efficient, we come up with a novel ap- some other shared memories or objects. Moreover, modern operating proach to realizing the convertions through some mapping functions systems have broadly adopted address space randomization [26], involving only several bit transformations. We provide a compari- a security technique that prevents malicious attacks by varying the son of the two new representations with existing alternatives both starting addresses of stacks and heaps at the launching time of a qualitatively and quantitatively. program. Experiments show that the enabled representations provide much Position independence is needed on traditional secondary stor- more efficient support of position independence for dynamic data age, such as database or Java objects. The method used there is structures than existing representations do. For single-region ac- pointer swizzling/unswizzling [14] (happening during data serializa- cesses, the overhead reduces from over 3X (swizzling or fat pointer) tion/deserialization), which uses position-independent indices (e.g., to 1.13X; for multi-region accesses, the overhead reduces from 2.2X offset in a file) for pointers on the secondary storage and converts to 1.4X. The exploration also reveals some other insights on support- them to direct pointers when the data are loaded into memory.