Hardware Transactional Persistent Memory

Ellis Giles (Rice University, Houston, TX 77005, USA) [email protected]
Kshitij Doshi (Intel Corporation, Chandler, AZ 85226, USA) [email protected]
Peter Varman (Rice University, Houston, TX 77005, USA) [email protected]

arXiv:1806.01108v1 [cs.DC] 22 May 2018

ABSTRACT

Emerging Persistent Memory technologies (also PM, Non-Volatile DIMMs, Storage Class Memory or SCM) hold tremendous promise for accelerating popular data-management applications like in-memory databases. However, programmers now need to deal with ensuring the atomicity of transactions on Persistent Memory resident data and maintaining consistency between the order in which processors perform stores and that in which the updated values become durable.

The problem is especially challenging when high-performance isolation mechanisms like Hardware Transactional Memory (HTM) are used for concurrency control. This work shows how HTM transactions can be ordered correctly and atomically into PM by the use of a novel software protocol combined with a Persistent Memory Controller, without requiring changes to processor cache hardware or HTM protocols. In contrast, previous approaches require significant changes to existing processor microarchitectures. Our approach, evaluated using both micro-benchmarks and the STAMP suite, compares well with standard (volatile) HTM transactions. It also yields significant gains in throughput and latency in comparison with persistent transactional locking.

1 INTRODUCTION

This paper provides a solution to the problems of adding durability to concurrent in-memory transactions that use Hardware Transactional Memory (HTM) for concurrency control while operating on data in byte-addressable Non-Volatile Memory or Persistent Memory (PM).

Recent years have witnessed a sharp shift towards real-time data-driven and high-throughput applications. This has spurred a broad adoption of in-memory and massively parallelized data processing [1–3] across business, scientific, and industrial application domains [4, 5]. Two emerging hardware developments provide further impetus to this approach and raise the potential for transformative gains in speed and scalability: (a) the arrival of inexpensive, byte-addressable, and large capacity Persistent Memory devices [6] eliminates the I/O operations and bottlenecks common to many data management applications, and (b) the availability of CPU-based transaction support (with HTM [7, 8]) makes it straightforward for threads to work spontaneously in shared memory spaces without having to synchronize explicitly.

To prevent corruption of state upon an untimely machine or software failure, a sequence of store operations to PM in a transactional section of code cannot be partially reflected into PM; nor can it be transmitted piecemeal from processor caches into PM without similarly risking significant loss of data. Operating on Persistent Memory based data thus produces new atomicity and consistency requirements. Software approaches for ensuring atomic durable updates share some characteristics with HTM techniques in commercial processors: both checkpoint state at some level of granularity and guard against communication of partial updates. However, different mechanisms are at play: while stable stores to persistent media are usually obtained by covering updates with logging or versioning, partial updates are prevented from being propagated between threads in HTM transactions by CPUs sheltering them until the transaction closes. Once an HTM transaction closes, its updates become visible en masse through the cache hierarchy and can travel in any order to memory DIMMs. However, stable storage of the updates into Persistent Memory by the transaction additionally requires an ability to reliably delineate its updates from those by other overlapping transactions, and to use that delineation to recover from an unanticipated machine restart.

Existing PM programming frameworks separate into categories based on the degree of change they require in the processor architecture for such problems. Several works [9–13] operate with existing processor capabilities and write a log (either a write-ahead log or an undo log) durably to cover data changes arriving via the volatile cache hierarchy; some [14–16] require significant changes to existing cache hardware and protocols in the processor microarchitecture, while others [17–19] only require external controllers not affecting the processor core.

Isolation among concurrent transactions in the above works is achieved either by the use of two-phase locking protocols or provided within the rubric of an STM [9]. Logging based software approaches are problematic for HTM transactions (e.g., Intel TSX), which cannot bypass the caches in order to flush the log records synchronously into Persistent Memory ahead of transaction closings. For these, PTM [20] proposes changes to processor caches while adding an on-chip scoreboard and global transaction id register to couple HTM with PM. Recent work [13, 21, 22] has attempted to provide inter-transactional isolation by employing processor based HTM [7, 8] mechanisms. However, these solutions all require changes to the existing HTM semantics and implementations: [21] and [22] propose a new instruction to perform a non-aborting cacheline flush from within an HTM, while [13] proposes allowing non-aborting concurrent access to designated memory variables within an HTM.

Selective and incremental changes to the clean isolation semantics of HTM are not to be undertaken lightly; understanding their impact on global system correctness and performance typically requires long gestation periods before processor manufacturers will embrace them. In this paper we provide a new solution to obtain persistence consistency in Persistent Memory while using HTM for concurrency control. The solution does not alter the processor microarchitecture, but leverages a very simple, external Persistent Memory controller along with a persistence protocol, to supplement the existing HTM semantics and allow HTM transactions to operate at the speed of in-memory volatile transactions. The solution, while it achieves the concurrency benefits of HTM for PM-based data, also applies to non-HTM transactions in a straightforward way.

2 OVERVIEW

2.1 HTM+PM Basics

Hardware Transactional Memory, or HTM, was introduced in [7] as a new, easy-to-use method for lock-free synchronization supported by hardware. The initial instructions for HTM included load and store transactional instructions in addition to transactional management instructions. Most HTM implementations extend an underlying cache-coherency protocol to handle detection of transactional conflicts during program execution. The hardware system performs a speculative execution on a demarcated region of code similar to an atomic section. Independent transactions (those that do not write a shared variable) proceed unrestricted through their HTM sections. Transactions which access common variables concurrently in their HTM sections, with at least one transaction performing a write, are serialized by the HTM. That is, all but one of the transactions is aborted; an aborted transaction will restart its HTM code at the beginning. Updates made within the HTM section are hidden from other transactions and are prevented from writing to memory until the transaction successfully completes the HTM section. This mechanism provides atomicity (all-or-nothing) semantics for individual transactions with respect to visibility by other threads, and serialization of conflicting, dependent transactions. However, HTM was originally designed for volatile memory systems (rather than supporting database style ACID transactions), and therefore any power failure leaves main memory in an unpredictable state relative to the actual values of the transaction variables.

Persistent Memory, or PM, introduces a new method of persistence to the processor. PM, in the form of persistent DIMMs, resides on the main-memory bus alongside DRAM. Software can access persistent memory using the usual LOAD and STORE instructions used for DRAM. Like other memory variables, PM variables are subject to forced and involuntary cache evictions and encounter other deferred memory operations done by the processor.

For Intel CPUs, CLWB and CLFLUSHOPT instructions provide the ability to flush modified data (at cacheline granularity) to be evicted from the processor cache hierarchy. These instructions, however, are weakly ordered with respect to other store operations in the instruction stream. Intel has extended the semantics of SFENCE to cover such flushed store operations so that software can issue SFENCE to prevent new stores from executing until previously flushed data has entered a power-safe domain; i.e., the data is guaranteed by hardware to reach its locations in the PM media. This guarantee also applies to data that is written to PM with instructions that bypass the processor caches. However, when executing within an HTM transaction, a CPU cannot exercise CLWB, CLFLUSHOPT, non-cacheable stores, and SFENCE instructions, since the stores by the CPU are considered speculative until the HTM transaction completes successfully.

Even though HTM guarantees that transactional values are only visible on transaction completion, hardware manufacturers cannot simply utilize a non-volatile processor cache hierarchy or battery-backed flushing of the cache on failures to provide transactional atomicity. Transactions that do not complete before a software or hardware restart produce partial and therefore inconsistent updates in non-volatile memory, as there is no guarantee when a machine halt will occur. The halt may happen during XEND execution, leaving only partial updates in cache or write buffers which can corrupt in-memory data structures.
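For reference, the flush-and-fence sequence discussed above can be wrapped in a small helper. The sketch below is illustrative only; it assumes a compiler exposing the _mm_clwb and _mm_sfence intrinsics (CLFLUSHOPT could be substituted where CLWB is unavailable) and is not part of the paper's library.

    #include <immintrin.h>
    #include <stdint.h>
    #include <stddef.h>

    // Illustrative persist helper: flush every cache line covering [addr, addr+len)
    // and fence so that the flushed data reaches the power-safe domain.
    // Note: this sequence cannot be issued inside an HTM region (see Section 2.1).
    static inline void pm_persist(const void *addr, size_t len) {
        const uintptr_t line = 64;                      // cache line size
        uintptr_t start = (uintptr_t)addr & ~(line - 1);
        uintptr_t end   = (uintptr_t)addr + len;
        for (uintptr_t p = start; p < end; p += line)
            _mm_clwb((void *)p);                        // or _mm_clflushopt((void *)p)
        _mm_sfence();                                   // order flushes before later stores
    }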
2.2 Challenges of persistent HTM transactions

Consider transactions A, B and C shown in Listings 1, 2 and 3. Assume that w, x, y, z are persistent variables initialized to zero in their home locations in Persistent Memory (PM). The code section demarcated between the instructions XBegin and XEnd will be referred to as an HTM transaction or simply a transaction. The HTM mechanism ensures the atomicity of transaction execution. Within an HTM transaction, all updates are made to private locations in the cache, and the hardware guarantees that the updates are not allowed to propagate to their home locations in PM. After the XEnd instruction completes, all of the cache lines updated in the transaction become instantaneously visible in the cache hierarchy.

Listing 1: A          Listing 2: B          Listing 3: C
A() {                 B() {                 C() {
    XBegin;               XBegin;               XBegin;
    w = 1;                w = w + 1;            w = w + 1;
    x = w;                y = w;                z = w;
    XEnd;                 XEnd;                 XEnd;
}                     }                     }

Atomic Persistence: The first challenge is to ensure that the transaction's updates that were made atomically in the cache are also persisted atomically in PM. Following XEnd, the transaction variables are once again subject to the normal cache operations like evictions and the use of cache write-back instructions. There are no guarantees regarding whether or when the transaction's updates actually get written to PM from the cache. This can create a problem if the machine crashes before all these updates are written back to PM. On a reboot, the values of these variables in PM will be inconsistent with the pre-crash transaction values. This leads to the first requirement:

• Following crash recovery, ensure that all or none of the updates of an HTM transaction are stored in their PM home locations.

A common solution is to log the transaction updates in a separate persistent storage area before allowing them to update their PM home locations. Should a crash interrupt the updating of the home locations, the saved log can be replayed. When transactions execute within an HTM there is a problem with this solution, since the log cannot be written to PM within the transaction and can be done only after the XEnd. At that time the transaction updates are also made visible in the cache hierarchy and are susceptible to uncontrolled cache evictions into PM. Hence there is no guarantee that the log has been persisted before transaction updates have percolated into PM. We describe our solution in Section 3.

Persistence Ordering: The second problem deals with ensuring that the execution order of dependent HTM transactions is correctly reflected in PM following crash recovery. As an example, consider the dependent transactions A, B, C in Listings 1, 2 & 3. The HTM will serialize their execution in some order: say A, B and C. The values of the transaction variables following the execution of A are given by the vector V1 = [w, x, y, z] = [1, 1, 0, 0]; after the execution of B the vector becomes V2 = [2, 1, 2, 0], and finally following C it is V3 = [3, 1, 2, 3]. Under normal operation the write backs of variables to PM from different transactions may become arbitrarily interleaved. For instance, suppose that x is evicted immediately after A completes, w after B completes, and z after C completes. The persistent memory state is then [2, 1, 0, 3]; should the machine crash, the PM will contain this meaningless combination of values on reboot. A consistent state would be either the initial vector [0, 0, 0, 0] or one of V1, V2 or V3. This leads to the second requirement:

• Following crash recovery, ensure that the persistent state of any sequence of dependent transactions is consistent with their execution order.

If individual transactions satisfy atomic persistence, then it is sufficient to ensure that PM is updated in transaction execution order. With software concurrency control (using an STM or two-phase transaction locking), it is straightforward to correctly order the updates simply by saving a transaction's log before it commits and releases its locks. In case of a crash, the saved logs are simply replayed in the order they were saved, thereby reconstructing persistent state to a correctly-ordered prefix of the executed transactions.

When HTM is used for concurrency control the logs can only be written to PM after the transaction XEnd. At that time other dependent transactions can execute and race with the completed transaction, perhaps writing out their logs before the first. Solutions like using an atomic counter within transactions to order them correctly are not practical, since the shared counter will result in HTM-induced aborts and serialization of all transactions. Some papers have advocated that processor manufacturers alter HTM semantics and implementation to allow selective writes to PM from within an HTM [13, 21, 22]. We describe our solution without the need for such intrusive processor changes in Section 3.

Strict and Relaxed Durability: In traditional ACID databases, a committed transaction is guaranteed to be durable since its log is made persistent before it commits. We refer to this property as strict durability. In HTM transactions the log is written to PM after the XEnd instruction, some time before the transaction commits. A natural question is to characterize the time at which it is safe for a transaction requiring strict durability to commit.

It is generally not safe to commit a transaction Y at the time it completes persisting its log, for the same reason that it is difficult to ensure persistence ordering. Due to races in the code outside the HTM it is possible for an earlier transaction X (on which Y depends) to have completed but not yet persisted its log. When recovering from a crash that occurs at this time, the log of Y should not be replayed since the earlier transaction X cannot be replayed. This leads to the third requirement:

• Following crash recovery, strict durability requires that every committed transaction is persistent in PM.

We define a new property known as relaxed durability that allows an individual transaction to opt for an early commit immediately after it persists its log. Requiring relaxed or strict durability is a local choice made individually by a transaction based on the application requirements. Transactions choosing relaxed durability face a window of vulnerability after they commit, during which a crash may result in their transaction updates not being reflected in PM after recovery. The gain is potentially reduced transaction latency. However, irrespective of the choice of the durability model by individual transactions, the underlying persistent memory will always recover to a consistent state, reflecting an ordered atomic prefix of every sequence of dependent transactions.

3 OUR APPROACH

Our approach achieves durability of HTM transactions to Persistent Memory by a cooperative protocol involving three components: a back-end Persistent Memory Controller, transaction execution and logging, and a failure recovery procedure. The Persistent Memory Controller intercepts dirty cache lines evicted from the last-level processor cache (LLC) on their way to persistent memory. An intercepted cache line is held in a FIFO queue within the controller until it is safe to write it out to PM. All memory variables are subject to the normal cache operations and are fetched and evicted according to normal cache protocols. The only change we introduce is interposing the external controller between the LLC and memory. Note that the controller does not require changing any of the internal processor behavior. The controller simply delays the evicted cache lines on their way to memory till it can guarantee safety. It is pre-programmed with the address range of a region of persistent memory that is reserved for holding transaction logs. Addresses in the log region pass through the controller without intervention.

HTM+PM transactions execute independently. Within an outer envelope that achieves consistency of updates between the volatile cache hierarchy and the durable state in PM, these transactions use an unmodified HTM to serialize the computational portions of conflicting transactions. A transaction (1) notifies the controller when it opens and closes, (2) saves start and end timestamps in PM to enable consistent recovery after a failure, (3) performs its HTM operation, and (4) persists a log of its updates in PM before closing. If a transaction requires strict durability it informs the controller during its closing step, and then waits for the go-ahead from the controller before committing. If it only needs relaxed durability it can commit immediately after its close. The recovery routine is invoked after a system crash to restore the PM variables to a valid state, i.e., a state that is consistent with the actual execution order of every sequence of dependent transactions. The recovery procedure uses the saved logs to recover the values of the updated variables, and the saved start and end timestamps to determine which logs are eligible for replay and their replay order.

3.1 Transaction Lifecycle

A transaction can be viewed as progressing through five states: OPEN, COMPUTE, LOG, CLOSE and COMMIT, as shown in Listing 4. When a transaction begins, it calls the library function OpenWrapC (see Algorithm 1 in Section 4.2). This function invokes the Persistent Memory Controller with a small unique integer (wrapId) that identifies the transaction. The controller adds wrapId to a set of currently open transactions (referred to as COT) that it maintains (see Algorithm 2 in Section 4.3). The transaction then allocates and initializes space in PM for its log and updates the startTime record of the log. The startTime is obtained by reading a system wide platform timer using the RDTSCP instruction (see Section 4.1). In addition to startTime, a log includes a second timestamp persistTime that will be set just prior to completing the HTM transaction. The writeSet is a sequence of (address, value) pairs in the log that will be filled with the locations and values of the updates made by the HTM transaction. The log with its recorded startTime is then persisted using cache line write-back instructions (clwb) and sfence.

The transaction then enters the COMPUTE state by executing XBegin and entering the HTM code section. Within the HTM section, the transaction updates writeSet with the persistent variables that it writes. Note that the records in writeSet will be held in cache during the COMPUTE state since it occurs within an HTM and cannot be propagated to PM until XEnd completes. Immediately before XEnd the transaction obtains a second timestamp persistTime that will be used to order the transactions correctly. This timestamp is also obtained using the same RDTSCP instruction.

Listing 4: Transaction Structure

    Transaction Begin
    ---------- State: OPEN ----------------
     1  NotifyPMController(open);
     2  Allocate a Log structure in PM;
     3  nowTime = ReadPlatformCounter();
     4  Log.startTime = nowTime;
     5  Log.writeSet = {};
     6  Persist Log in PM;
    ---------- State: COMPUTE -------------
        XBegin
        // Transaction Body of HTM
        // All reads and writes to transaction
        // variables are performed and also
        // appended to Log.writeSet.
     7  endTime = ReadPlatformCounter();
        XEnd
    ---------- State: LOG -----------------
     8  Log.persistTime = endTime;
     9  Persist Log in PM;
    ---------- State: CLOSE ---------------
    10  NotifyPMController(close);
    ---------- State: COMMIT --------------
    11  If (strict durability requested)
            Wait for notification by controller;
    12  Transaction End

After executing XEnd, a transaction next enters the LOG state. It flushes its log records from the cache hierarchy into PM using cache line write-back instructions (CLWB or CLFLUSHOPT), following the last of them with an SFENCE. This ensures that all the log records have been persisted. In addition to startTime, the log includes the persistTime time stamp that was set just prior to completing the transaction. The writeSet records in the log hold (address, value) pairs representing the locations and the values updated by the transaction.

After the SFENCE following the log record flushes, the transaction enters the CLOSE state. In the CLOSE state the transaction signals the Persistent Memory Controller that its log has been persisted in PM. The controller removes the transaction from its set of currently open transactions COT. It also reflects the closing in the state of evicted cache lines in its FIFO buffer, as described below in Section 3.2. A transaction requiring strict durability informs the controller at this time; the controller will signal the transaction in due course when it is safe to commit, i.e., when its updates are guaranteed to be durable.

The transaction is then complete and enters the COMMIT state. If it requires strict durability it waits till it is signaled by the controller. Otherwise, it immediately commits and leaves the system.
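To make the lifecycle concrete, the following C++ sketch mirrors Listing 4 using Intel RTM intrinsics. The log layout, the controller-notification hooks (notify_controller_open/close), and pm_persist are hypothetical placeholders chosen for illustration, not the paper's actual API; the fallback-lock handling required by real RTM code is omitted here.

    #include <immintrin.h>   // _xbegin/_xend (compile with -mrtm)
    #include <x86intrin.h>   // __rdtscp
    #include <cstdint>
    #include <cstddef>

    struct LogEntry { uint64_t *addr; uint64_t value; };

    struct TxLog {                      // assumed to live in the reserved PM log region
        uint64_t startTime;
        uint64_t persistTime;
        uint32_t count;
        LogEntry writeSet[64];
    };

    // Hypothetical protocol hooks corresponding to Section 3.1.
    void notify_controller_open(int wrapId);
    void notify_controller_close(int wrapId, void *durabilityAddr);
    void pm_persist(const void *p, size_t n);       // clwb-per-line + sfence helper
    static inline uint64_t platform_time() { unsigned int aux; return __rdtscp(&aux); }

    void durable_transaction(int wrapId, TxLog &log, uint64_t *x, uint64_t *y) {
        // OPEN: register with the controller, persist startTime and an empty write set.
        notify_controller_open(wrapId);
        log.startTime = platform_time();
        log.count = 0;
        pm_persist(&log, sizeof(log));

        // COMPUTE: HTM section; updates and log records stay in cache until XEnd.
        while (_xbegin() != _XBEGIN_STARTED) { /* retry; fallback path omitted */ }
        *x = 1;  log.writeSet[log.count++] = {x, *x};
        *y = *x; log.writeSet[log.count++] = {y, *y};
        log.persistTime = platform_time();          // last store before XEnd (Section 4.1)
        _xend();

        // LOG: flush the write set and persistTime from cache into the PM log area.
        pm_persist(&log, sizeof(log));

        // CLOSE/COMMIT: report the persisted log; a strict-durability transaction would
        // pass a durability address here and wait for the controller's signal.
        notify_controller_close(wrapId, nullptr);   // relaxed durability
    }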

Figure 1: PM Controller with Volatile Delay Buffer, Current Open Transactions Set, and Dependency Wait Queue

3.2 Persistent Memory Controller

The Persistent Memory Controller is shown in Figure 1. While superficially similar to an earlier design [18, 19, 23], this controller includes enhancements to handle the subtleties of using HTM rather than locking for concurrency control, and makes significant simplifications by shifting some of the responsibility for the maintenance of PM state to the recovery protocol.

A crucial function of the controller is to prevent any transaction's updates from reaching PM until it is safe for it to do so, without requiring detailed book-keeping about which transactions, currently active or previously completed, generated a specific update. It does this by enforcing two requirements before allowing an evicted dirty cache line (representing a transaction's update) to proceed to PM: (i) ensuring that the log of the transaction has been made persistent, and (ii) guaranteeing that the saved log will be replayed during recovery. The first condition is needed (but not sufficient) for atomic persistence, by guarding against a failure that occurs after only a subset of a transaction's updates have been persisted in PM. The second requirement arises because transaction logs are persisted outside the HTM, and there is no relation between the order in which the transactions execute their HTM and the order in which their logs are persisted. To maintain correct persistence ordering, the recovery routine may not be able to replay a log. We illustrate the issue in the example below.

Example: Consider two dependent transactions A and B, A: {w = 3; x = 1;} and B: {y = w+1; z = 1;}. Assume that HTM transaction A executes before B, but that B persists its logs and closes before A. Suppose there is a crash just after B closes. The recovery routine will not replay either A or B, since the log of the earlier transaction A is not available. This behavior is correct.

Now consider the situation where y (value 4) is evicted from the cache and then written to PM after B persists its log. Once again, following a crash at this time, neither log will be replayed. However, atomic persistence is now violated since B's updates are partially reflected in PM. Note that this violation occurred even though the write back of y to PM happened after B's log was persisted. The Persistent Memory Controller protocol prevents a write back to PM unless it can also guarantee that the log of the transaction creating the update will be played back on recovery (see Lemmas C2 and C4 in Section 3.4).

The second function of the controller is to track when it is safe for a transaction requiring strict durability to commit. It is not sufficient to commit when a transaction's logs are persistent in PM since, as seen in the example above, the recovery routine may not replay the log if that would violate persistence ordering. The controller protocol effectively delays a strict durability transaction τ from committing until the earliest open transaction has a startTime greater than the persistTime of τ. This is because the recovery protocol will replay a log (see Section 3.3) if and only if all transactions with startTime less than its persistTime have closed.

Implementation Overview: The controller tracks transactions by maintaining a COT (currently open transactions) set S. When a transaction opens, its identifier is added to the COT, and when the transaction closes it is removed. The write into PM of a cache line C evicted into the Persistent Memory Controller is deferred by placing it at the tail of a FIFO queue maintained by the controller. The cache line is also assigned a tag called its dependency set, initialized with S, the value of the COT at the instant that C entered the Persistent Memory Controller.

The controller holds the evicted instance of C in the FIFO until all transactions that are in its dependency set (i.e. S) have closed. When a transaction closes it is removed from both the COT and from the dependency sets of all the FIFO entries. When the dependency set of a cache line in the FIFO becomes empty, it is eligible to be flushed to PM. One can see that the dependency sets will become empty in the order in which the cache lines were evicted, since a transaction still in the dependency set of C when a new eviction enters the FIFO will also be in the dependency set of the new entry. This simple protocol guarantees that all transactions that opened before cache line C was evicted into the controller (which must also include the transaction that last wrote C) must have closed and persisted their logs when C becomes eligible to be written to PM. This also implies that all transactions with startTime less than the persistTime of the transaction that last wrote C will have closed, satisfying the condition for log replay. Hence the cache line can be safely written to PM without violating atomic persistence.

Note that the evicted cache lines intercepted by the controller do not hold any identifying transaction information and can occur at arbitrary times after the transaction leaves the COMPUTE state. The cache line could hold the update of a currently open transaction or could be from a transaction that has completed or even committed and left the system. To guarantee safety, the controller must perforce assume that the first situation holds. The details of the controller implementation will be presented in Section 4.3.

3.3 Recovery

Each completed transaction saves its log in PM. The log holds records startTime and persistTime, obtained by reading the platform timer using the RDTSCP instruction. We refer to these as the start and end timestamps of the transaction. The start timestamp is persisted before a transaction enters its HTM. This allows the recovery routine to identify a transaction that started but had not completed at the time of a failure. Note that even though such a transaction has not completed, it could still have finished its HTM and fed values to a later dependent transaction which has completed and persisted its log. The end timestamp and the write set of the transaction are persisted after the transaction completes its HTM section, followed by an end of log marker. There can be an arbitrary delay between the end of the HTM and the time that its log is flushed from caches into PM and persisted.

The recovery procedure is invoked on reboot following a machine failure. The routine will restore PM values to a consistent state that satisfies persistence ordering by copying values from the writeSets of the logs of qualifying transactions to the specified addresses. A transaction τ qualifies for log replay if and only if all earlier transactions on which it depends (both directly and transitively) are also replayed.

Implementation Overview: The recovery procedure first identifies the set of incomplete transactions I, which have started (as indicated by the presence of a startTime record in their log) but have not completed (indicated by the lack of a valid end-of-log marker). The remaining complete transactions (set C) are potential candidates for replay. Denote the smallest start timestamp of transactions in I by Tmin. A transaction in C is valid (qualifies for replay) if its end timestamp (persistTime) is no more than Tmin. All valid transactions are replayed in increasing order of their end timestamps persistTime.
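A minimal sketch of this replay rule follows, assuming hypothetical PersistedLog records read back from the PM log area; the field and function names are illustrative rather than the paper's code.

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    struct PersistedLog {
        uint64_t startTime;
        uint64_t persistTime;    // meaningful only when `complete` is true
        bool complete;           // end-of-log marker found
        // write set omitted: (address, value) pairs applied by replay_log()
    };

    void replay_log(const PersistedLog &log);   // copies writeSet values to home locations

    void recover(const std::vector<PersistedLog> &logs) {
        // Tmin: smallest start timestamp among incomplete (started, not closed) transactions.
        uint64_t tmin = UINT64_MAX;
        for (const auto &l : logs)
            if (!l.complete) tmin = std::min(tmin, l.startTime);

        // A complete transaction qualifies for replay iff persistTime <= Tmin;
        // qualifying logs are replayed in increasing persistTime order.
        std::vector<const PersistedLog *> valid;
        for (const auto &l : logs)
            if (l.complete && l.persistTime <= tmin) valid.push_back(&l);
        std::sort(valid.begin(), valid.end(),
                  [](const PersistedLog *a, const PersistedLog *b) {
                      return a->persistTime < b->persistTime;
                  });
        for (const PersistedLog *l : valid) replay_log(*l);
    }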
3.4 Protocol Properties

We now summarize the invariants maintained by our protocol.

Definition: The precedence set of a transaction T, denoted by prec(T), is the set of all dependent transactions that executed their HTM before T. Since the HTM properly orders any two dependent transactions, the set is well defined.

Lemma C1: Consider a transaction X with a precedence set prec(X). For all transactions Y in prec(X), startTime(Y) < persistTime(X).

Proof Sketch: Let Y be a transaction in prec(X). First let us consider direct precedence, in which a cacheline C modified in Y controls the ordering of X with respect to Y. That is, X either reads or writes the cacheline C. Since Y is in prec(X), the earliest time that X accesses C must be no earlier than the latest time that Y accesses C, and thus persistTime(Y) < persistTime(X). Next consider a chain of direct precedences, Y → Z → W → · · · → X, which puts Y in prec(X); by transitivity, persistTime(Y) < persistTime(X). Since startTime(Y) < persistTime(Y), the lemma follows.

Lemma C2: Consider transactions X and Y with startTime(Y) < persistTime(X). If a cache line C that is updated by X is written to PM by the controller at time t, then Y must have closed and persisted its log before t.

Proof Sketch: Suppose C was evicted to the controller at time t′ ≤ t. Now t′ must be later than the time X completed HTM execution and set persistTime(X); by assumption this is after Y set its startTime, at which time Y must have been registered as an open transaction by the controller. Now, either Y has closed before t′ or is still open at that time. In the latter case, Y will be added to the dependence set of C at t′. Since C can only be written to PM after its dependence set is empty, it follows that Y must have closed and removed itself from the dependence set of C.

Lemma C3: Any transaction X that writes an update to PM and closes at time t will be replayed by the recovery routine if there is a crash any time after t.

Proof: The recovery routine will replay a transaction X if the only incomplete transactions (started but not closed) at the time of the crash started after X completed; that is, there is no incomplete transaction Y that has startTime(Y) ≤ persistTime(X). By Lemma C2 such an incomplete transaction cannot exist.

Lemma C4: Consider a transaction X with a precedence set prec(X). Then by the time X closes and persists its logs, one of the following must hold: (i) some update of X has been written back to PM and all transactions Y in prec(X) have persisted their logs; (ii) no update of X has been written to PM and all transactions Y in prec(X) have persisted their logs; (iii) no update of X has been written to PM and some transactions Y in prec(X) have not yet persisted their logs.

Proof Sketch: From Lemmas C1 and C2, it is not possible to have a transaction in prec(X) that is still open if an update from X has been written to PM.

4 ALGORITHM AND IMPLEMENTATION

The implementation consists of a user software library backed by a simple Persistent Memory Controller. The library is used primarily to coordinate closures of concurrent transactions with the flow of any data evicted from processor caches into PM home locations during those transactions. The Persistent Memory Controller uses the dependency set concept from [18, 19, 23] to temporarily park any processor cache eviction in a searchable Volatile Delay Buffer (VDB) so that its effective time to reach PM is no earlier than the time that the last possible transaction with which the eviction could have overlapped has become recoverable. The Persistent Memory Controller in this work improves upon the backend controller of [18] by dispensing with synchronous log replays and victim cache management. The library also covers any writes to PM variables by volatile write-aside log entries made within the speculative scope of an HTM transaction, and then streams the transactional log record into a non-volatile PM area outside the HTM transaction. These log streaming writes into PM bypass the VDB. A software mechanism may periodically check the remaining capacity of the PM log area and initiate a log cleanup if needed; for such occasional cleanups, new transactions are delayed, and, after all open transactions have closed, processor caches are flushed (with a closing sfence) and the logs are removed.

We refer to our implementation as WrAP, for Write-Aside Persistence, and individual transactions as wraps. We first describe the timestamp mechanism, then the user software library, and finally the Persistent Memory Controller implementation details.

Figure 2: Flow of a transaction with implementation using HTM, cached write sets, timing, and logging.

4.1 System Time Stamp

We use the recent Intel instruction RDTSCP, or Read Time Stamp Counter and Processor ID, to obtain the timestamps in Listing 4. The RDTSCP instruction provides access to a global, monotonically increasing processor clock across processor sockets [24], while serializing itself behind the instructions that precede it in program order. To prevent the reordering of an XEnd before the RDTSCP instruction, we save the resulting time stamp into a volatile memory address. Since all stores preceding an XEnd become visible after XEnd, and the store of the persist timestamp is the last store before XEnd, that store gets neither re-ordered before other stores nor re-ordered after the end of the HTM transaction. We note that RDTSCP has also been used to order HTM transactions in novel transaction profiling [25] and memory version checkpointing [26] tools.
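The timestamp read can be expressed with the __rdtscp intrinsic; the volatile destination mirrors the ordering argument above. This is a sketch only, not the library's exact routine.

    #include <x86intrin.h>   // __rdtscp
    #include <cstdint>

    // Read the time stamp counter with RDTSCP. RDTSCP waits for prior instructions
    // in program order; storing the result to a volatile location as the last store
    // before XEnd keeps the read from drifting past the end of the HTM region.
    static inline void record_persist_time(volatile uint64_t *dst) {
        unsigned int aux;                // receives IA32_TSC_AUX (processor id)
        *dst = __rdtscp(&aux);
    }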

4.2 Software Library

For HTM we employ Intel's implementation of Restricted Transactional Memory, or RTM, which includes the instructions XBegin and XEnd. Aborting HTM transactions retry with exponential back-off a few times, and then are performed under a software lock. Our HTMBegin routine checks the status of the software lock both before and after an XBegin, to produce the correct indication of conflicts with the non-speculative paths, acquiring the software lock non-speculatively after having backed off. The HTMBegin and HTMEnd library routines perform the acquire and release of the software lock for the fallback case within themselves. The remaining software library procedures are shown in Algorithm 1. Various events that arise in the course of a transaction are shown in Figure 2, which depicts the HTM concurrency section with vertical lines and the logging section with slanted lines.

Algorithm 1: Concurrent WrAP User Software Library

    User Level Software WrAP Library:

    OpenWrapC()
        // ---------- State: OPEN ----------
        wrapId = threadId;
        Notify Controller Open Wrap wrapId;
        startTime = RDTSCP();
        Log[wrapId].startTime = startTime;
        Log[wrapId].writeSet = {};
        sfence();
        CLFLUSH Log[wrapId];
        sfence();
        // ---------- State: COMPUTE ----------
        HTMBegin();   // XBegin

    wrapStore(addrVar, Value)
        Add {addrVar, Value} to Log[wrapId].writeSet;
        Normal store of Value to addrVar;

    CloseWrapC(strictDurability)
        Log[wrapId].persistTime = RDTSCP();
        HTMEnd();   // XEnd
        // ---------- State: LOG ----------
        CLFLUSH Log[wrapId].persistTime;
        for num cachelines in Log[wrapId].writeSet
            CLFLUSH cacheline;
        if (strictDurability)
            durabilityAddr = 0;    // Reset Flag
            tAddr = durabilityAddr;
        else
            tAddr = 0;
        sfence();
        // ---------- State: CLOSE ----------
        Notify Controller Closed (wrapId, tAddr);
        // ---------- State: COMMIT ----------
        if (strictDurability)
            // Wait for durable notification from controller
            Monitor durabilityAddr;

Not shown in Figure 2 is a per-thread durability address location that we call durabilityAddr in Algorithm 1. A software thread may use it to set up a Monitor-Mwait coordination to be signaled via memory by the Persistent Memory Controller (as described shortly) when a transacting thread wants to wait until all updates from any non-conflicting transactions that may have raced with it are confirmed to be recoverable.

This provision allows for implementing strict durability for any transaction, because the logs of all other transactions that could possibly precede it in persistence order are in PM, which guarantees the replayability of its log. By contrast, many other transactions that only need the consistency guarantee (correct log ordering) may continue without waiting (or defer waiting to a different point in the course of a higher level multi-transaction operation).

The number of active HTM transactions at any given time is bounded by the number of CPUs; therefore, we use thread identifiers as wrapIds. In OpenWrapC we notify the Persistent Memory Controller that a wrap has started. We then read the start time with RDTSCP and save it and an empty write set into the log persistently. The transaction is then started with the HTMBegin routine.

During the course of a transactional computation, the stores are performed using the wrapStore function. The stores are just the ordinary (speculatively performed) store instructions, but are accompanied by (speculative) recording of the updates into the log locations, each capturing the (address, value) pair for each update, to be committed into PM later during the logging phase (after XEnd).

In CloseWrapC we obtain and record the ending timestamp for an HTM transaction into the persistTime variable in its log. Its concurrency section is then terminated with the HTMEnd routine. At this point, the cached write set for the log and the ending persist timestamp are instantly visible in the cache. Next, we flush the transactional values and the persist timestamp to the log area, followed by a persistent memory fence. The transaction closure is then notified to the Persistent Memory Controller with the wrapId and, along with it, the durabilityAddr, if the thread has requested strict durability (by passing a flag to CloseWrapC); for this we use the efficient Monitor-Mwait construct to receive memory based signaling from the Persistent Memory Controller. If strict durability is not requested, then CloseWrapC can return immediately and let the thread proceed with relaxed durability. In many cases a thread performing a series of transactions may choose relaxed durability for all but the last and then request strict durability over the entire set by waiting for only the last one to be strictly durable.
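As a rough illustration of the HTMBegin/HTMEnd behavior described at the start of this subsection (exponential back-off, then a global fallback lock), the sketch below uses Intel RTM intrinsics; the names, retry count, and back-off policy are chosen here for exposition and are not the library's exact implementation.

    #include <immintrin.h>   // RTM intrinsics; compile with -mrtm
    #include <atomic>

    static std::atomic<bool> g_fallback_lock{false};
    constexpr int kMaxRetries = 8;

    // Begin a transaction: speculative RTM path with exponential back-off,
    // falling back to a global software lock after kMaxRetries failed attempts.
    // Returns true if running speculatively, false if holding the fallback lock.
    bool htm_begin() {
        for (int attempt = 0; attempt < kMaxRetries; ++attempt) {
            while (g_fallback_lock.load(std::memory_order_relaxed))
                _mm_pause();                      // check the lock before XBegin
            if (_xbegin() == _XBEGIN_STARTED) {
                if (!g_fallback_lock.load(std::memory_order_relaxed))
                    return true;                  // lock also checked after XBegin
                _xabort(0xff);                    // lock holder active: abort and retry
            }
            for (int i = 0; i < (1 << attempt); ++i)
                _mm_pause();                      // exponential back-off
        }
        while (g_fallback_lock.exchange(true, std::memory_order_acquire))
            _mm_pause();                          // non-speculative fallback path
        return false;
    }

    void htm_end(bool speculative) {
        if (speculative) _xend();
        else g_fallback_lock.store(false, std::memory_order_release);
    }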
4.3 PM Controller Implementation

The Persistent Memory Controller provides for two needs: (1) holding back modified PM cachelines that fall into it at any time T from the processor caches, from flowing into PM until at least a time when all successful transactions that were active at time T are recoverable, and (2) tracking the ordering of dependencies among transactions so that only those that need strict durability guarantees need to be delayed pending the completion of the log phases of those with which they overlap. It implements a VDB (volatile delay buffer) as the means of transient storage for the first need, implements a durability wait queue (DWQ) for the second need, and implements dependency set (DS) tracking logic across the open/close notifications to control the VDB and the DWQ, as described next.

Volatile Delay Buffer (VDB): The VDB is comprised of a FIFO queue and a hash table that points to entries in the FIFO queue. Each entry in the FIFO queue contains a tuple of PM address, data, and dependency set. On a PM write, resulting from a cache eviction or streaming store, to a memory address not in the log area or pass-through area, the PM address and data are added to the FIFO queue and tagged with a dependency set initialized to the COT. Additionally, the PM address is inserted into the hash table with a pointer to the FIFO queue entry. If the address already exists in the hash table, then it is updated to point to the new queue entry. On a memory read, the hash table is first consulted. If an entry is in the hash table, then the pointer is to the latest memory value for the address, and the data is retrieved from the queue. On a hash table miss, PM is read and the data is returned. As wraps close, the dependency set in each entry in the queue is updated to remove the dependency on the wrap.

Dependency sets become empty in FIFO order, and as they become empty, we perform three actions. First, we write back the data to the PM address. Next, we consult the hash table. If the hash table entry points to the current FIFO queue entry, we remove the entry in the hash table, since we know there are no later entries for the same memory address in the queue. Finally, we remove the entry from the FIFO queue.

On inserting an entry into the back of the queue, we can also consult the head of the FIFO queue to check whether its dependency set is empty. If the head has an empty dependency set, we can perform the same actions, allowing for O(1) VDB management.

Dependency Wait Queue (DWQ): Strict durability is handled by the Persistent Memory Controller using the Dependency Wait Queue, or DWQ, which is used to track transactions waiting on others to complete and to notify a transaction that it is safe to proceed. The DWQ is a FIFO queue similar to the VDB, with entries containing pairs of the dependency set and a durability address. When a thread notifies the Persistent Memory Controller that it is closing a transaction (see steps below), it can request strict durability by passing a durability address. Dependencies on closing wraps are also removed from the dependency set for each entry in the DWQ. When the dependency set becomes empty, the controller writes to the durability address and removes the entry from the queue. Threads waiting on a write to the address can then proceed.

Algorithm 2: Hardware WrAP Implementation

    Persistent Memory Controller:

    Open Wrap Notification (wrapId)
        Add wrapId to Current Open Transactions COT;

    Memory Write (memoryAddr, data)
        // Triggered from cache evict or stream store.
        if ((memoryAddr not in Pass-Through Log Area) and
            (Current Open Transactions COT != {}))
            Add (memoryAddr, data, COT) to VDB;
        else
            // Normal memory write
            Memory[memoryAddr] = data;

    Memory Read (memoryAddr)
        if (memoryAddr in Volatile Delay Buffer)
            return latest cacheline data from VDB;
        return Memory[memoryAddr];

    Close Wrap Notification (wrapId, durabilityAddr)
        Remove wrapId from Current Open Transactions COT;
        if (durabilityAddr)
            durabilityDS = COT;
            // Add pair to Durability Wait Queue DWQ
            Add (durabilityDS, durabilityAddr) to DWQ;
        Remove wrapId from Volatile Delay Buffer elements;
        if earliest VDB elements have empty DS
            // Write back entries to memory in FIFO order
            Memory[memoryAddr] = data;
        Remove wrapId from Durability Wait Queue elements;
        if earliest DWQ elements have empty DS
            // Notify waiting thread of durability
            Memory[durabilityAddr] = 1;

Opening and Closing WrAPs: As outlined in Algorithm 2, the controller supports two interfaces from software, namely the Open Wrap and Close Wrap notifications exercised from the user library as shown in Algorithm 1. (Implementations of these notifications can vary: for example, one possible mechanism may consist of software writing to a designated set of control addresses.) It also implements hardware operations against the VDB from the processor caches: Memory Write, for handling modified cachelines evicted from the processor caches or non-temporal stores from CPUs, and Memory Read, for handling reads of PM issued by the processor caches.

The Open Wrap notification simply adds the passed wrapId to a bit vector of open transactions. We call this bit vector of open transactions the Current Open Transactions, COT. When the controller receives a Memory Write (i.e., a processor cache eviction or a non-temporal/streaming/uncached write) it checks the COT: if the COT is empty, the write can flow into PM. Writes that target the log range in PM can also flow into PM irrespective of the COT. For the non-log writes, if the COT is nonempty the cache line is tagged with the COT and placed into the VDB.

The Close Wrap controller notification receives the wrapId and the durability address, durabilityAddr. The controller removes the wrapId from the Current Open Transactions COT bit mask. If the transaction requires strict durability, we save the current COT (as durabilityDS) and the durabilityAddr as a pair in the DWQ. The controller then removes the wrapId from all entries in the VDB and DWQ. This is performed by simply clearing the bit in the dependency set bit mask for the entire FIFO VDB. If the earliest entries in the queue end up with an empty dependency set, their cache line data is written back in FIFO order. Similarly, the controller removes the wrapId from all entries in the Durability Wait Queue DWQ.

Software Based Strict Durability Alternative: As an alternative to implementing strict durability in the controller, strict durability may be implemented entirely in the software library; we modify the software algorithm as follows. On a transaction start, each thread saves the start time and an open flag in a dedicated cache line for the thread. On transaction close, to ensure strict durability, it saves its end time in the same cache line with the start time and clears the open flag. It then waits until all prior open transactions have closed. It scans the set of all thread cache lines and compares any open transaction's end and start times to its own end time. The thread may only continue, with ensured durability, once all other threads are either not in an open transaction or have a start or persist time greater than its persist time.
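A sketch of this software-only wait is shown below, using a hypothetical per-thread status array (one cache line per thread); the structure and field names are invented for illustration and are not the paper's code.

    #include <immintrin.h>
    #include <atomic>
    #include <cstdint>

    constexpr int kMaxThreads = 64;

    struct alignas(64) ThreadStatus {       // one cache line per thread
        std::atomic<uint64_t> startTime{0};
        std::atomic<uint64_t> endTime{0};
        std::atomic<bool>     open{false};
    };
    ThreadStatus g_status[kMaxThreads];

    // Called after a thread has closed its own transaction (open cleared, endTime set).
    // Spins until every other thread is either not in an open transaction or has a
    // start/persist time greater than myPersistTime, ensuring strict durability.
    void wait_for_strict_durability(int self, uint64_t myPersistTime) {
        for (int t = 0; t < kMaxThreads; ++t) {
            if (t == self) continue;
            while (g_status[t].open.load(std::memory_order_acquire) &&
                   g_status[t].startTime.load(std::memory_order_acquire) <= myPersistTime) {
                _mm_pause();   // t opened before we persisted and has not closed yet
            }
        }
    }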

(a) Example Transactions T1-T4        (b) Transaction Recovery Order on Failure

(c) Contents of the Persistent Memory Controller: Current Open Transactions, Dependency Sets, and FIFO Queue

Figure 3: Example Transaction Sequence with Contents of Persistent Memory Controller and Recovery Algorithm

4.4 Example

Figure 3a shows an example set of four transactions, T1-T4, happening concurrently. In this example, we show T1-T4 split into states, specifically concurrency (COMPUTE), shown in vertical lines, and LOG, depicted with slanted lines. At certain time steps we show the contents of the Persistent Memory Controller's Volatile Delay Buffer, or VDB, which is a FIFO queue, and the Current Open Transactions, or COT, in Figure 3c. The recovery algorithm is shown in Figure 3b with the contents of the log. Either the start timestamp only or the persist timestamp order is shown; bold and underline indicate that the log is written, and a circled transaction indicates that it is recoverable.

First, at time t1, T1 opens, notifying the controller, and records its start timestamp safely in its log. The controller adds T1 to the bitmap COT of open transactions. At times t2 and t3, transactions T2 and T3 also open, notify the controller, and safely read and persist their start timestamps. At this point in time the Persistent Memory Controller has a COT of {1,1,1,0} and only start timestamps have been recorded in the log. T2 then completes its concurrency section and persists its persist timestamp at time t4 and begins writing its log. In Figure 3c, we also show a random cache eviction of a cache line X that is tagged with the COT of {1,1,1,0}.

Transaction T4 starts at time t5, persisting its start timestamp, and is added to the set of open transactions on the Persistent Memory Controller, now {1,1,1,1}. At time t6 we illustrate several events. T3 completes its concurrency section and persists its persist timestamp. At this time, as shown in 3b, T3 is now ordered after T2 for recovery; however, neither has completed persisting its log. Also, we show a random cache eviction of cache line Y, and it is placed at the back of the VDB on the Persistent Memory Controller as shown in 3c.

At time t7, transaction T3 has completed writing its logs, is marked completed, and is removed from the dependency set in the controller and from the cache line dependencies for X and Y. However, as shown in 3b, T3 is not recoverable at this point since it is not first in the persist timestamp order. T3 is behind T2, and T3 also has a smaller persist timestamp than the start timestamp of T1, which hasn't written its persist timestamp yet. When T2 completes writing its logs at time t8, it is removed from the current and dependency sets in the queue of the Persistent Memory Controller as shown in 3c. Note that cache line X is still not able to be written back to Persistent Memory, as it is still tagged as being dependent on T1, and Y on both T1 and T4. At this time, T2 and T3 are also not recoverable, as shown in 3b, since T2 has a persist timestamp that is greater than the start timestamp of T1: a recovery process could not know whether T1 had simply had a delay in persisting its log and T2 had transactional values dependent on T1.

We illustrate two events at time t9. T1 finally completes its concurrency section and writes its persist timestamp at t9. Since the persist timestamp of T1 is safely persisted and known at recovery time, transaction T2 is now fully recoverable, as shown circled in 3b. However, at this time, T3 is not fully recoverable since it is waiting

on T4, which started before T3 completed its concurrency section and T4 hasn't yet written its persist timestamp. Also at t9, in 3c we illustrate the eviction of cache line Z, which is tagged with the set of open transactions COT {1,0,0,1}.

At time t10, T4 writes its persist timestamp and its order is now known to a recovery routine to be behind T3, which is now fully recoverable as shown with the circle in 3b. Note that T4 has a persist time before T1. In 3c, we also illustrate the eviction of cache line X again into the VDB of the Persistent Memory Controller, tagged with the set of the two open transactions, T1 and T4. Note that there are two copies of cache line X in the controller. The one at the head of the queue has fewer dependencies (only dependent on T1) than the recent eviction. Any subsequent read for cache line X returns the most recent copy, the last entry in the VDB. Note how cache lines at the back of the queue have dependency set sizes that are greater than or equal to those of entries earlier in the queue.

T1 completes log writing at t11, but is behind T4, which hasn't yet finished writing its logs, so neither is yet recoverable. The PM controller also removes T1 from its dependency set and from those in the VDB. The first copy of X now has no dependencies in the queue and is safely written back to PM as shown in 3c.

At time t12, T4 completes writing its logs and both T4 and T1 are recoverable. Also, T4 is removed from the dependency sets in the controller, which allows Y, Z, and X to flow to PM.

Strict Durability: Suppose a transaction requires strict durability during its COMMIT stage, ensuring that once complete, the transactional writes will be reflected in PM if a failure were to occur. If T4 requires strict durability, it is simply durable at the end, as there are no open transactions when it completes. However, T1, T2, and T3 have other constraints. A transaction requiring strict durability is only durable when it is fully recoverable. Figure 3b illustrates transaction durability when a transaction is circled. T1 must wait until step t12 if it requires strict durability, as it might have dependencies on T4. T2 is fully durable at time t9 when T1, which started earlier, writes its persist timestamp. At time t10, T3 is fully durable when T4, which started before T3 completed its concurrency section and could have introduced transactional dependencies, writes its persist timestamp, which indicates that T4 started its HTM section later.
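The bookkeeping traced in this example can be mimicked with a small software model of the controller state (a bit-mask COT plus a FIFO of tagged lines). The toy model below is only meant to make the walkthrough concrete; it is an assumption-laden software stand-in, not the hardware design.

    #include <cstdint>
    #include <deque>

    struct VdbEntry { uint64_t addr; uint64_t deps; };   // deps: COT snapshot at eviction

    struct ControllerModel {
        uint64_t cot = 0;                     // Current Open Transactions bit mask
        std::deque<VdbEntry> vdb;             // FIFO of delayed cache lines

        void open(int wrapId)  { cot |= (1ull << wrapId); }

        void evict(uint64_t addr) {           // modified PM line arrives from the LLC
            if (cot == 0) { /* write straight to PM */ return; }
            vdb.push_back({addr, cot});       // tag with the current COT
        }

        void close(int wrapId) {              // log persisted; wrap leaves the COT
            cot &= ~(1ull << wrapId);
            for (auto &e : vdb) e.deps &= ~(1ull << wrapId);
            while (!vdb.empty() && vdb.front().deps == 0) {
                /* write vdb.front().addr back to PM */
                vdb.pop_front();              // dependency sets empty in FIFO order
            }
        }
    };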

5 EVALUATION

We evaluated our method using benchmarks directly running on hardware and through simulation analysis (described in Section 5.4). Our simulation evaluates the length of the FIFO buffer and performance against various Persistent Memory write times. In the direct hardware evaluation described next, we employed Intel(R) Xeon(R) E5-2650 v4 series processors, 12 cores per processor, running at 2.20 GHz, with Red Hat Enterprise Linux 7.2. HTM transactions were implemented with Intel Transactional Synchronization Extensions (TSX) [27] using a global fallback lock. We built our software using g++ 4.8.5. Each measurement reflects an average over twenty repeats with small variation among the repeats.

Using micro-benchmarks and SSCA2 [28] and Vacation, from the STAMP [29] benchmark suite, we compared the following methods:

• HTM Only: Hardware Transactional Memory with Intel TSX, without any logging or persistence. This method provides a baseline for transaction performance in cache memory without any persistence guarantees. If a power failure occurs after a transaction, writes to memory locations may be left in the cache, or written back to memory in an out-of-order subset of the transactional updates.

• WrAP: Our method. The volatile delay buffer in the controller is assumed to be able to keep up with back pressure from the cache, as shown in Section 3. We therefore perform all other aspects of the protocol, such as logging, reading timestamps, HTM and fall-back locking, etc.

• WrAP-Strict: Same as above, but we implement the software strict durability method as described in Section 4. Threads wait until all prior-open transactions have closed before proceeding.

• PTL-Eager (Persistent Transactional Locking): In this method, we added persistence to Transactional Locking (TL-Eager) [30–32] by persisting the undo log at the time that a TL transaction performs its sequence of writes. The undo-log entries are written with write-through stores and SFENCEs, and once the transaction commits and the new data values are flushed into PM, the undo-log entries are removed.

5.1 Benchmarks

The Scalable Synthetic Compact Applications for benchmarking High Productivity Computing Systems [28], SSCA2, is part of the Stanford Transactional Applications for Multi-Processing [29], or STAMP, benchmark suite. SSCA2 uses a large memory area and has multiple kernels that construct a graph and perform operations on the graph. We executed the SSCA2 benchmark with scale 20, which generates a graph with over 45 million edges. We increased the number of threads from 1 to 16 in powers of two and recorded the execution time for the kernel for each method.

Figure 4: SSCA2 Benchmark Compute Graph Kernel Execution Time as a Function of the Number of Parallel Execution Threads

Figure 5: SSCA2 Benchmark Compute Graph Kernel Speedup as a Function of the Number of Parallel Execution Threads

Figure 7: Vacation Benchmark Execution Time as a Function of the Number of Parallel Execution Threads

Figure 6: SSCA2 Benchmark Compute Graph Kernel HTM Aborts as a Function of the Number of Parallel Execution Threads
Figure 8: Vacation Benchmark Execution Time with Various Persistent Memory Write Times for Four Threads
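The PTL-Eager baseline described in the methods list persists each undo-log entry before the covered in-place store. The sketch below shows one way such a record could be made durable with non-temporal stores and an SFENCE; the structure and function names are hypothetical and only stand in for the write-through stores described above.

// Illustrative sketch: persist an undo-log entry, then perform the store.
#include <immintrin.h>
#include <cstdint>

struct UndoEntry { uint64_t addr; uint64_t old_value; };

inline void persist_undo_entry(UndoEntry* slot, uint64_t* target) {
  // Stream the record toward memory, bypassing the cache hierarchy.
  _mm_stream_si64(reinterpret_cast<long long*>(&slot->addr),
                  reinterpret_cast<long long>(target));
  _mm_stream_si64(reinterpret_cast<long long*>(&slot->old_value),
                  static_cast<long long>(*target));
  _mm_sfence();   // order the log record ahead of the covered store
}

inline void ptl_eager_store(UndoEntry* slot, uint64_t* target,
                            uint64_t new_value) {
  persist_undo_entry(slot, target);  // undo record is durable first
  *target = new_value;               // then perform the actual write
}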

Figure 4 shows the execution time for each method for the Compute Kernel in the SSCA2 benchmark as a function of the number of threads. Each method reduces the execution time with increasing numbers of threads. Our WrAP approach has execution time similar to HTM in the cache hierarchy with no persistence, and is over 2.25 times faster than the persistent PTL-Eager method writing to PM.
Figure 5 shows the speedup for each method as a function of the number of threads, measured against a single-threaded undo log for the persistence methods and against single-threaded execution with no persistence for the cache-only HTM method. Even though the HTM (cache-only) method does better in absolute terms, as we saw in Figure 4, it proceeds from a higher baseline for single-threaded execution. PTL-Eager yields significantly weaker scalability due to the inherent cost of having to perform persistent flushes within its concurrent region.
Figure 6 shows the number of hardware aborts for both our WrAP approach and cache-only HTM. Our approach introduces extra writes to log the write-set and, along with reading the system timestamp, extends the transaction time. However, as shown in the figure, this only slightly increases the number of hardware aborts.
We also evaluated the Vacation benchmark, which is part of the STAMP benchmark suite. The Vacation benchmark emulates database transactions for a travel reservation system. We executed the benchmark with the low option for lower-contention emulation. Figure 7 shows the execution time for each method for the Vacation benchmark as a function of the number of threads. Each method reduces the execution time with increasing numbers of threads. The WrAP approach follows trends similar to HTM in the cache hierarchy with no persistence, with both approaches flattening execution time after 4 threads. We examine the effect of strict durability, WrAP-Strict in the figure, and show that strict durability introduces only a small amount of overhead. For a single thread, it has the same performance as relaxed WrAP, since a thread does not need to wait on other threads and is durable as soon as the transaction completes.
Additionally, we examined the effect of increased Persistent Memory write times on the benchmark. Byte-addressable Persistent Memory can have longer write times than DRAM. To emulate the longer write times for PM, we insert a delay after non-temporal stores when writing to new cache lines and a delay after cache line flushes. The write delay can be tuned to emulate the effect of longer write times typical of PM. Figure 8 shows the Vacation benchmark execution time for various PM write times. The WrAP approach is less affected by increasing PM write times than the PTL-Eager approach due to several factors. WrAP performs write-combining for log entries on the foreground path for each thread, so writes to several transaction variables may be combined into fewer writes. Also, PTL-Eager transactionally persists an undo log on writes, causing a foreground delay.
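A minimal sketch of this delay-based PM emulation is shown below; the delay variable, helper names, and the use of the time-stamp counter for spinning are our own illustrative choices rather than the exact harness code.

// Sketch: emulate slower PM writes by spinning after flushes and
// non-temporal stores. The delay value is a tunable placeholder.
#include <immintrin.h>
#include <x86intrin.h>   // __rdtsc
#include <cstdint>

static uint64_t g_pm_write_delay_cycles = 0;   // 0 = DRAM speed

inline void emulate_pm_delay() {
  uint64_t start = __rdtsc();
  while (__rdtsc() - start < g_pm_write_delay_cycles)
    _mm_pause();                          // spin for the configured delay
}

inline void pm_flush_line(const void* addr) {
  _mm_clflush(addr);                      // write the line back to memory
  _mm_sfence();
  emulate_pm_delay();                     // model the slower PM write
}

inline void pm_nt_store(uint64_t* dst, uint64_t value) {
  _mm_stream_si64(reinterpret_cast<long long*>(dst),
                  static_cast<long long>(value));
  _mm_sfence();
  emulate_pm_delay();                     // delay after the streaming store
}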

Figure 10: Millions of Transactions per Second for Hash Table Updates of 20 Elements versus Concurrent Number of Threads
Figure 11: Average Txps for Increasing Transaction Sizes of Atomic Hash Table Updates with 6 Concurrent Threads

5.2 Hash Table
Our next series of experiments shows how transaction size and high memory traffic affect overall performance. We create a 64 MB Hash Table Array of elements in main memory and transactionally perform a number of element updates. For each transaction, we generate a set of random numbers of a configurable size, compute their hash, and write the values into the Hash Table Array.
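A sketch of one such transaction appears below. It reuses the htm_execute fallback-lock helper sketched earlier; the table size, hash function, and names are illustrative assumptions, and allocation of the table in PM is omitted.

// Sketch of one hash-table micro-benchmark transaction: generate a
// configurable number of random keys, hash them, and write the values
// into the table inside a single HTM region.
#include <cstdint>
#include <cstddef>
#include <random>
#include <vector>

template <typename CriticalSection>
void htm_execute(CriticalSection body, int max_retries);  // earlier sketch

constexpr size_t kTableBytes = 64u << 20;             // 64 MB table
constexpr size_t kSlots = kTableBytes / sizeof(uint64_t);

uint64_t* g_table;                                     // PM-resident array

inline size_t slot_of(uint64_t key) {                  // simple hash
  return (key * 0x9E3779B97F4A7C15ull) % kSlots;
}

void hash_table_txn(std::mt19937_64& rng, size_t writes_per_txn) {
  std::vector<uint64_t> keys(writes_per_txn);
  for (auto& k : keys) k = rng();                      // outside the txn

  htm_execute([&] {                                    // one atomic update
    for (uint64_t k : keys)
      g_table[slot_of(k)] = k;                         // transactional store
  }, 8);
}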

First, we create transactions consisting of 10 atomic updates each, vary the number of concurrent threads, and measure the maximum throughput. We perform 1 million updates, record the average throughput, and plot the results in Figure 9. Our approach achieves roughly 3x the throughput of PTL-Eager. Figure 10 shows that increasing the write set to 20 atomic updates yields similar performance. In both figures, adding strict durability only slightly decreases the overall performance; threads wait additional time for the dependencies on other transactions to clear before continuing to another transaction.
Figure 9: Millions of Transactions per Second for Hash Table Updates of 10 Elements versus Concurrent Number of Threads
The transaction write set was then varied from 2 to 30 elements with 6 concurrent threads. The average throughput was recorded and is shown in Figure 11. Even with strict durability added, WrAP performs roughly three times faster than PTL-Eager.
We then fixed the transaction size at ten elements and varied the write-to-read ratio with 6 concurrent threads. The average throughput was recorded and is shown in Figure 12. Unlike software transactional memory approaches, our approach does not require instrumenting read accesses and can therefore execute reads at cache speeds.
Figure 12: Average Txps for Write / Read Percentage of Atomic Hash Table Updates with 6 Concurrent Threads
Figure 14: Average Atomic Hash Table 10 Element Update with Various Persistent Memory Write Times with 8 Concurrent Threads

5.3 Red-Black Tree
We use the transactional Red-Black tree from STAMP [29], initialized with 1 million elements. We then perform insert operations on the Red-Black tree and record average transaction times and throughput over 200k additional inserts. Each transaction inserts an additional element into the Red-Black tree.

Figure 13: Millions of Transactions per Second for Atomic Red-Black Tree Element Inserts versus Number of Concurrent Threads
Figure 15: Maximum FIFO Queue Length for Atomic Hash Table 10 Element Update with 4 Concurrent Threads

Inserting an element into a Red-Black tree first requires finding the insertion point, which can take many read operations, and the insert can trigger many writes through a rebalance. In our experiments, we averaged 63 reads and 11 writes per transactional insert of one element into the Red-Black tree.
We record the maximum throughput of inserts into the Red-Black tree per second for a varying number of threads in Figure 13. As can be seen in the figure, WrAP has almost 9x higher throughput than PTL-Eager, and with strict durability it is still almost 6x faster. Our method can perform reads at the speed of the hardware, while PTL-Eager requires instrumenting reads through software to track dependencies on other concurrent transactions.

5.4 Persistent Memory Controller Analysis
We investigated the required length of the FIFO in the Volatile Delay Buffer and the performance with respect to Persistent Memory write times using an approach similar to [14]. In the absence of readily available memory controllers, we modified the McSimA+ simulator [33]. McSimA+ is a PIN [34] based simulator that decouples execution from simulation and tightly models out-of-order processor micro-architecture at the cycle level. We extended the simulator to support the notifications for opening and closing WrAPs, along with extended support for memory reads and writes. We added support for DRAMSim2 [35], a cycle-accurate memory system and DRAM memory controller model library. Write-combining and store buffers were then added, with multiple configuration options to allow fine tuning to match the system being modeled.
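To make the quantity being measured concrete, the following is a simplified software model of how we reason about the Volatile Delay Buffer: evicted lines are queued tagged with the set of currently open transactions, and a line is written back only once that set is empty. The class and field names are ours; this is neither the simulator nor the controller implementation.

// Simplified model of the controller's Volatile Delay Buffer (VDB).
#include <cstdint>
#include <cstddef>
#include <deque>
#include <array>

constexpr size_t kLineBytes = 64;

struct VdbEntry {
  uint64_t line_addr;                       // cache-line address
  std::array<uint8_t, kLineBytes> data;     // evicted contents
  uint64_t dep_mask;                        // bit i set => open txn i
};

class VolatileDelayBuffer {
 public:
  // A cache-line eviction arrives tagged with the set of open transactions.
  void on_evict(uint64_t addr, const std::array<uint8_t, kLineBytes>& d,
                uint64_t open_txn_mask) {
    fifo_.push_back({addr, d, open_txn_mask});
  }

  // When transaction 'txn' closes (its log is durable), clear it from every
  // dependency set, then drain head entries whose sets are empty.
  void on_txn_close(unsigned txn) {
    for (auto& e : fifo_) e.dep_mask &= ~(1ull << txn);
    while (!fifo_.empty() && fifo_.front().dep_mask == 0) {
      write_back_to_pm(fifo_.front());      // safe: no open dependencies
      fifo_.pop_front();
    }
  }

  size_t length() const { return fifo_.size(); }  // metric in Figures 15 and 17

 private:
  void write_back_to_pm(const VdbEntry&) { /* PM write latency modeled here */ }
  std::deque<VdbEntry> fifo_;
};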

To stress the Persistent Memory Controller, we executed an atomic hash table update without any thread contention by having each thread update elements in a separate portion of the table. In the simulation, we fill the cache with dirty cache lines so that each write by a thread in a transaction generates write-backs to main Persistent Memory. For 8 threads, we recorded the average atomic hash table update time for 10 elements in each transaction. We then vary the Persistent Memory write time as a multiple of the DRAM write time. As shown in Figure 14, WrAP is less affected by increasing write times than PTL-Eager.
Additionally, we record the maximum FIFO buffer size for various Persistent Memory write times and 4 concurrent threads, shown in Figure 15. Initially, the buffer size decreases for an increasing PM write time, due to slower transaction throughput and fewer cache evictions into the buffer. As the write time increases further, the buffer length increases, but remains less than 1k cache lines, or 64 KB.
We performed a similar analysis using a B-Tree, where each thread atomically inserts elements into its own copy of a B-Tree. Each insert into the tree required, on average, over 5 times as many reads as writes. As shown in Figure 16, our method is less affected by increasing PM write times, because PTL-Eager must instrument the large proportion of read operations. In this experiment, we use eight concurrent threads, each atomically inserting elements into a B-Tree initialized with 128 elements. Since more reads than writes are generated for each atomic insert transaction, the FIFO buffer length remains small. We also examined the FIFO buffer length in the VDB with 8 concurrent threads. Figure 17 shows the length was less than about 100 elements for each write speed, due to the large proportion of reads.
Figure 16: Average B-Tree Atomic Element Insert with Various Persistent Memory Write Times with 8 Concurrent Threads
Figure 17: Maximum FIFO Queue Length for B-Tree Atomic Element Insert with 8 Concurrent Threads

6 RELATED WORK
Related Persistence Work: Analysis of consistency models for persistent memory was considered in [36]. Changes to the front-end cache for ordering cache evictions were proposed in [14, 15, 37, 38]. BPFS [37] proposed epoch barriers to control eviction order, while [38] proposed a flush software primitive to control update order. Snapshotting the entire micro-architectural state at the point of a failure is proposed in [39]. A non-volatile victim cache to provide transactional buffering was proposed in [14]; it has the added property of not requiring logging, but requires changes to the front-end cache controller to track pre- and post-transactional states for cache lines in both volatile and persistent caches, atomically moving them to the durable state on transaction commit.
Memory controller support for transaction atomicity in Persistent Memory has been proposed in [17–19, 23, 40–42]. Adding a small DRAM buffer in front of Persistent Memory to improve latency and to coalesce writes was proposed in [40]. The use of a volatile victim cache to prevent uncontrolled cache evictions from reaching PM was described in [17–19], but requires software locking for concurrency control. FIRM [42] describes techniques to differentiate persistent and non-persistent memory traffic, and presents scheduling algorithms to maximize system throughput and fairness. Low-level memory scheduling to improve the efficiency of persistent memory access was studied in [41]. Except for [17–19], none of these works deal with the issues of atomicity or durability of write sequences. Our approach effectively uses HTM for concurrency control and does not require changes to the front-end cache controller or use logs for replaying transactions to PM.
Related Concurrency Work: Existing non-HTM solutions [9, 11, 12] tightly couple concurrency control with durable writes of either write-ahead logs or data updates into Persistent Memory to maintain persistence consistency. Software employing these approaches must generally extend the duration for which threads remain in critical sections, leading to longer lock hold times, which reduces concurrency and lengthens transactions. Other work [10, 13] decouples concurrency control so that post-transactional values may flow through the cache hierarchy and reach PM asynchronously; however, the write-ahead log for an updating transaction has to be committed into PM synchronously before the transaction can close, so that the integrity of the foreground value flow is preserved across machine restarts. Another hardware-assisted mechanism proposes hardware changes to allow dual-scheme checkpointing that writes previously checkpointed values in the background while collecting current transaction writes [43].
Recent work [13, 21, 22] aims to exploit processor-supported HTM mechanisms for concurrency control instead of traditional locking or STM-based approaches. However, all of these solutions require making significant changes to the existing HTM semantics and implementations. For instance, PHTM [21] and PHyTM [22] propose a new instruction called TransparentFlush, which can be used to flush a cache line from within a transaction to persistent memory without causing any transaction to abort. They also propose a change to the xend instruction that ends an atomic HTM region, so that it atomically updates a bit in persistent memory as part of its execution. Similarly, for DUDETM [13] to use HTM, it requires that designated memory variables within a transaction be allowed to be updated globally and concurrently without causing an abort. Other work [26] utilizes HTM for concurrency control, but requires aliasing all read and write accesses while concurrently maintaining log ordering and replaying logs for retirement.
7 SUMMARY
In this paper we presented an approach that unifies HTM and PM to create durable, concurrent transactions. Our approach works with existing HTM and cache coherency mechanisms; it does not require changes to existing processor caches or store instructions, avoids synchronous cache-line write-backs on transaction completion, and uses logs only for recovery. The solution correctly orders HTM transactions and atomically commits them to Persistent Memory through a novel software protocol combined with a back-end Persistent Memory Controller.
Our approach, evaluated using both micro-benchmarks and the STAMP suite, compares well with standard (volatile) HTM transactions. In comparison with persistent transactional locking, our approach performs 3x faster on standard benchmarks and almost 9x faster on a Red-Black Tree data structure.

REFERENCES
[1] F. Färber, S. K. Cha, J. Primsch, C. Bornhövd, S. Sigg, and W. Lehner, "SAP HANA Database: Data management for modern business applications," SIGMOD Rec., vol. 40, no. 4, pp. 45–51, Jan. 2012.
[2] V. Raman, G. Attaluri, R. Barber, N. Chainani, D. Kalmuk, V. KulandaiSamy, J. Leenstra, S. Lightstone, S. Liu, G. M. Lohman et al., "DB2 with BLU acceleration: So much more than just a column store," Proceedings of the VLDB Endowment, vol. 6, no. 11, pp. 1080–1091, 2013.
[3] R. Palamuttam, R. M. Mogrovejo, C. Mattmann, B. Wilson, K. Whitehall, R. Verma, L. McGibbney, and P. Ramirez, "SciSpark: Applying in-memory distributed computing to weather event detection and tracking," in Big Data (Big Data), 2015 IEEE International Conference on. IEEE, 2015, pp. 2020–2026.
[4] G. Team, "GridGain: In-memory computing platform," 2007.
[5] X. Meng, J. Bradley, B. Yavuz, E. Sparks, S. Venkataraman, D. Liu, J. Freeman, D. Tsai, M. Amde, S. Owen et al., "MLlib: Machine learning in Apache Spark," Journal of Machine Learning Research, vol. 17, no. 34, pp. 1–7, 2016.
[6] Intel Corporation. (2015, July) Intel and Micron Produce Breakthrough Memory Technology. [Online]. Available: https://newsroom.intel.com/news-releases/intel-and-micron-produce-breakthrough-memory-technology/
[7] M. Herlihy and J. E. B. Moss, Transactional Memory: Architectural Support for Lock-Free Data Structures. ACM, 1993, vol. 21, 2.
[8] R. Rajwar and J. R. Goodman, "Speculative lock elision: Enabling highly concurrent multithreaded execution," in Proceedings of the 34th Annual ACM/IEEE International Symposium on Microarchitecture. IEEE Computer Society, 2001, pp. 294–305.
[9] H. Volos, A. J. Tack, and M. Swift, "Mnemosyne: Lightweight persistent memory," in Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems. ACM Press, 2011, pp. 91–104.
[10] E. Giles, K. Doshi, and P. Varman, "SoftWrAP: A lightweight framework for transactional support of storage class memory," in Mass Storage Systems and Technologies (MSST), 2015 31st Symposium on, May 2015, pp. 1–14.
[11] D. R. Chakrabarti, H.-J. Boehm, and K. Bhandari, "Atlas: Leveraging locks for non-volatile memory consistency," in Proceedings of the 2014 ACM International Conference on Object Oriented Programming Systems Languages & Applications, ser. OOPSLA '14. New York, NY, USA: ACM, 2014, pp. 433–452. [Online]. Available: http://doi.acm.org/10.1145/2660193.2660224
[12] A. Chatzistergiou, M. Cintra, and S. D. Viglas, "REWIND: Recovery write-ahead system for in-memory non-volatile data-structures," Proceedings of the VLDB Endowment, vol. 8, no. 5, pp. 497–508, 2015.
[13] M. Liu, M. Zhang, K. Chen, X. Qian, Y. Wu, W. Zheng, and J. Ren, "DudeTM: Building durable transactions with decoupling for persistent memory," in Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS '17. New York, NY, USA: ACM, 2017, pp. 329–343. [Online]. Available: http://doi.acm.org/10.1145/3037697.3037714
[14] J. Zhao, S. Li, D. H. Yoon, Y. Xie, and N. P. Jouppi, "Kiln: Closing the performance gap between systems with and without persistence support," in Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture, ser. MICRO-46. New York, NY, USA: ACM, 2013, pp. 421–432. [Online]. Available: http://doi.acm.org/10.1145/2540708.2540744
[15] A. Joshi, V. Nagarajan, S. Viglas, and M. Cintra, "ATOM: Atomic durability in non-volatile memory through hardware logging," in 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2017.
[16] S. Li, P. Wang, N. Xiao, G. Sun, and F. Liu, "SPMS: Strand based persistent memory system," in Proceedings of the Conference on Design, Automation & Test in Europe, ser. DATE '17. Leuven, Belgium: European Design and Automation Association, 2017, pp. 622–625. [Online]. Available: http://dl.acm.org/citation.cfm?id=3130379.3130529
[17] E. Giles, K. Doshi, and P. Varman, "Bridging the programming gap between persistent and volatile memory using WrAP," in Proceedings of the ACM International Conference on Computing Frontiers. ACM, 2013, p. 30.
[18] L. Pu, K. Doshi, E. Giles, and P. Varman, "Non-intrusive persistence with a backend NVM controller," IEEE Computer Architecture Letters, vol. 15, no. 1, pp. 29–32, Jan 2016.
[19] K. Doshi, E. Giles, and P. Varman, "Atomic persistence for SCM with a non-intrusive backend controller," in The 22nd International Symposium on High-Performance Computer Architecture (HPCA). IEEE, March 2016.
[20] Z. Wang, H. Yi, R. Liu, M. Dong, and H. Chen, "Persistent transactional memory," IEEE Computer Architecture Letters, vol. 14, no. 1, pp. 58–61, Jan 2015.
[21] H. Avni, E. Levy, and A. Mendelson, "Hardware transactions in nonvolatile memory," in Proceedings of the 29th International Symposium on Distributed Computing - Volume 9363, ser. DISC 2015. New York, NY, USA: Springer-Verlag New York, Inc., 2015, pp. 617–630.
[22] H. Avni and T. Brown, "PHyTM: Persistent hybrid transactional memory," Proceedings of the VLDB Endowment, vol. 10, no. 4, pp. 409–420, 2016.
[23] E. Giles, K. Doshi, and P. Varman, "Brief announcement: Hardware transactional storage class memory," in Proceedings of the 29th ACM Symposium on Parallelism in Algorithms and Architectures, ser. SPAA '17. New York, NY, USA: ACM, 2017, pp. 375–378. [Online]. Available: http://doi.acm.org/10.1145/3087556.3087589
[24] W. Ruan, Y. Liu, and M. Spear, "Boosting timestamp-based transactional memory by exploiting hardware cycle counters," ACM Transactions on Architecture and Code Optimization (TACO), vol. 10, no. 4, p. 40, 2013.
[25] Y. Liu, J. Gottschlich, G. Pokam, and M. Spear, "TSXProf: Profiling hardware transactions," in Parallel Architecture and Compilation (PACT), 2015 International Conference on. IEEE, 2015, pp. 75–86.
[26] E. Giles, K. Doshi, and P. Varman, "Continuous checkpointing of HTM transactions in NVM," in Proceedings of the 2017 ACM SIGPLAN International Symposium on Memory Management, ser. ISMM 2017. New York, NY, USA: ACM, 2017, pp. 70–81. [Online]. Available: http://doi.acm.org/10.1145/3092255.3092270
[27] Intel Corporation, "Intel Transactional Synchronization Extensions," in Intel Architecture Instruction Set Extensions Programming Reference, February 2012, ch. 8, http://software.intel.com/.
[28] D. A. Bader and K. Madduri, "Design and implementation of the HPCS graph analysis benchmark on symmetric multiprocessors," in International Conference on High-Performance Computing. Springer, 2005, pp. 465–476.
[29] C. C. Minh, J. Chung, C. Kozyrakis, and K. Olukotun, "STAMP: Stanford transactional applications for multi-processing," in Workload Characterization, 2008. IISWC 2008. IEEE International Symposium on. IEEE, 2008, pp. 35–46.
[30] D. Dice and N. Shavit, "Understanding tradeoffs in software transactional memory," in Code Generation and Optimization, 2007. CGO '07. International Symposium on. IEEE, 2007, pp. 21–33.
[31] D. Dice, O. Shalev, and N. Shavit, "Transactional Locking II," in Distributed Computing. Springer, 2006, pp. 194–208.
[32] C. C. Minh, "TL2-x86, a port of TL2 to the x86 architecture," GitHub repository ccaominh/tl2-x86, Stanford, 2015. [Online]. Available: https://github.com/ccaominh/tl2-x86
[33] J. H. Ahn, S. Li, O. Seongil, and N. P. Jouppi, "McSimA+: A manycore simulator with application-level+ simulation and detailed microarchitecture modeling," in Performance Analysis of Systems and Software (ISPASS), 2013 IEEE International Symposium on. IEEE, 2013, pp. 74–85.
[34] C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood, "Pin: Building customized program analysis tools with dynamic instrumentation," in ACM SIGPLAN Notices, vol. 40, no. 6. ACM, 2005, pp. 190–200.
[35] P. Rosenfeld, E. Cooper-Balis, and B. Jacob, "DRAMSim2: A cycle accurate memory system simulator," Computer Architecture Letters, vol. 10, no. 1, pp. 16–19, 2011.
[36] S. Pelley, P. M. Chen, and T. F. Wenisch, "Memory persistency," in Computer Architecture (ISCA), 2014 ACM/IEEE 41st International Symposium on. IEEE, 2014, pp. 265–276.
[37] J. Condit, E. B. Nightingale, C. Frost, E. Ipek, B. Lee, D. Burger, and D. Coetzee, "Better I/O through byte-addressable, persistent memory," in Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, ser. SOSP '09. New York, NY, USA: ACM, 2009, pp. 133–146. [Online]. Available: http://doi.acm.org/10.1145/1629575.1629589
[38] S. Venkatraman, N. Tolia, P. Ranganathan, and R. H. Campbell, "Consistent and durable data structures for non-volatile byte-addressable memory," in Proceedings of the 9th USENIX Conference on File and Storage Technologies. ACM Press, 2011, pp. 61–76.
[39] D. Narayanan and O. Hodson, "Whole-system persistence," in Proceedings of the 17th International Conference on Architectural Support for Programming Languages and Operating Systems. ACM Press, 2012, pp. 401–410.
[40] M. K. Qureshi, V. Srinivasan, and J. A. Rivers, "Scalable high performance main memory system using phase-change memory technology," in Proceedings of the 36th Annual International Symposium on Computer Architecture, ser. ISCA '09. New York, NY, USA: ACM, 2009, pp. 24–33. [Online]. Available: http://doi.acm.org/10.1145/1555754.1555760
[41] P. Zhou, Y. Du, Y. Zhang, and J. Yang, "Fine-grained QoS scheduling for PCM-based main memory systems," in Parallel & Distributed Processing (IPDPS), 2010 IEEE International Symposium on. IEEE, 2010, pp. 1–12.
[42] J. Zhao, O. Mutlu, and Y. Xie, "FIRM: Fair and high-performance memory control for persistent memory systems," in Microarchitecture (MICRO), 2014 47th Annual IEEE/ACM International Symposium on. IEEE, 2014, pp. 153–165.
[43] J. Ren, J. Zhao, S. Khan, J. Choi, Y. Wu, and O. Mutlu, "ThyNVM: Enabling software-transparent crash consistency in persistent memory systems," in Proceedings of the 48th International Symposium on Microarchitecture. ACM, 2015, pp. 672–685.
