Robust Architectural Support for Transactional Memory in the Power Architecture
Total Page:16
File Type:pdf, Size:1020Kb
Robust Architectural Support for Transactional Memory in the Power Architecture Harold W. Cain∗ Brad Frey Derek Williams IBM Research IBM STG IBM STG Yorktown Heights, NY, USA Austin, TX, USA Austin, TX, USA [email protected] [email protected] [email protected] Maged M. Michael Cathy May Hung Le IBM Research IBM Research (retired) IBM STG Yorktown Heights, NY, USA Yorktown Heights, NY, USA Austin, TX, USA [email protected] [email protected] [email protected] ABSTRACT in current p795 systems, with 8 TB of DRAM), as well as On the twentieth anniversary of the original publication [10], strengths in RAS that differentiate it in the market, adding following ten years of intense activity in the research lit- TM must not compromise any of these virtues. A robust erature, hardware support for transactional memory (TM) system is one that is sturdy in construction, a trait that has finally become a commercial reality, with HTM-enabled does not usually come to mind in respect to HTM systems. chips currently or soon-to-be available from many hardware We structured TM to work in harmony with features that vendors. In this paper we describe architectural support for support the architecture's scalability. Our goal has been to TM provide a comprehensive programming environment includ- TM added to a future version of the Power ISA . Two im- ing support for simple system calls and debug aids, while peratives drove the development: the desire to complement providing a robust (in the sense of "no surprises") execu- our weakly-consistent memory model with a more friendly tion environment with reasonably consistent performance interface to simplify the development and porting of multi- and without unexpected transaction failures.2 TM must be threaded applications, and the need for robustness beyond usable throughout the system stack: in hypervisors, oper- that of some early implementations. In the process of com- ating systems, libraries, compilers, run-time systems, and mercializing the feature, we had to resolve some previously applications. Further, TM must be easily exploited in exist- unexplored interactions between TM and existing features of ing code, as well as newly written software, driving the need the ISA, for example translation shootdown, interrupt han- to co-exist with nearly the full scope of the ISA's functional- dling, atomic read-modify-write primitives, and our weakly ity. The transactional extensions start with a set of instruc- consistent memory model. We describe these interactions, tions for implementing strongly-isolated[17]3 transactions as the overall architecture, and discuss the motivation and ra- well as a general mechanism for checkpointing and rollback tionale for our choices of architectural semantics, beyond of architectural state. It discourages unpredictable transac- what is typically found in reference manuals. tion behavior (e.g. aborting transactions on some but not all function calls [19]), while also providing support for identi- 1. INTRODUCTION fying the source of transaction failures when the contents of This work describes the first architectural support for TM the transaction are inconsistent with transactional seman- in the Power ISA. (The TM feature implemented in Blue tics (e.g. code that performs a cache flush operation). One Gene R /Q 1 contains no instruction set support.) Since the feature supporting these goals is transactional suspension, Power ISA is the basis for IBM R pSeries servers, whose which implicitly occurs when handling interrupts, or can strengths are robust and scalable performance in large sys- be explicitly invoked via a new suspend/resume instruction. tem configurations (e.g. up to 256 cores / 1024 threads This support is described in Section 3. In the near term, the value of the TM feature is expected ∗The author is now employed by Qualcomm Research- to come from lock elision[24] and to a lesser extent from Raleigh speculative compilation techniques that are enabled by the 1 IBM, Power, Blue Gene, AIX, Power Architecture, and checkpointing/rollback mechanism[23, 22, 28]. Over time, z/Architecture are registered trademarks of International the ease of programming relative to the weakly consistent Business Machines. Power ISA memory model will seed the development and porting of multithreaded applications. (For example if mul- tiple memory accesses can be wrapped in a transaction, there is no need to reason about the necessary memory barriers be- Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are tween those particular accesses.) The architecture has been not made or distributed for profit or commercial advantage and that copies 2 bear this notice and the full citation on the first page. To copy otherwise, to We use the term failure to refer to any unsuccessful trans- republish, to post on servers or to redistribute to lists, requires prior specific action, and reserve the term abort to refer to failures that are explicitly requested via transaction abort instructions. permission and/or a fee. 3 ISCA ’13 Tel-Aviv, Israel While Blundell et. al[17] use the term strongly atomic, we Copyright 2013 ACM 978-1-4503-2079-5/13/06 ...$15.00. prefer and use the term strongly isolated. 225 tailored for these uses, while providing a base to support to be competitive with those primitives. From this start- emerging language standards and eventually be extended ing point, we also imposed the following implementation as- for lock-free programming. sumptions: Another aspect of the inclusion of TM is its intersection with the Power ISA weak memory model. In addition to the • We would not require support for maintaining more interaction of TM with non-transactional weakly ordered than one read or write set per transaction (i.e. no guar- code, since lock elision enables the transactionalization of anteed support for intermediate rollback of nested trans- arbitrary code, we must define correct and predictable se- actions or multiple speculative versions of a line). mantics for the inclusion of memory barriers and atomic syn- • There would be no guarantee of success for any trans- chronization primitives (i.e. load-reserve/store conditional) action; all transactions must specify a failure handler al- within a transaction. We describe the intersection of TM lowing for non-transactional alternatives. and the Power ISA memory model in Section 4. • Despite a desire for graceful handling of many rare Although this paper summarizes the architectural support events, there would be no requirement of support for for HTM in one commercial architecture, and the design transaction survival across context switches or paging of challenges and trade-offs involved, some of the features dis- transactionally accessed data. cussed have not been previously described in the literature, which we highlight here as novel contributions of this work: • When possible, the primitives themselves must be largely consistent with existing components of the Power • We define a suspended transactional mode of execu- ISA. Significant deviations were only permissible if abso- tion with deferred failure handling semantics, allowing lutely necessary. for a sequential programming model that can run existing • The resulting architecture would allow for decoupled software (e.g. interrupt handlers) without fear of failure- implementations of TM and existing larx/stcx atomic syn- induced control redirection. This mode is entered as the chronization primitives. result of an interrupt or via a new suspend instruction, and is an important ingredient in the architecture's ro- • We placed no architectural limits on the transaction bustness. It is described in Section 3. size an implementation must support, but encourage the possible support of large transactions by eliminating many • We describe the potential for a loss of transactional causes of transaction failure. Implementations must have atomicity in systems that implement hardware-based the flexibility to choose granule size for tracking transac- translation shootdowns (i.e. do not rely on interproces- tional conflicts. sor interrupts), motivating support for conflict detection mechanisms with translation shootdown. Support for TM was primarily motivated as an enabler • We describe various mechanisms, most importantly an of lock elision and speculative code generation techniques. integrated cumulative barrier effect on transaction com- Lock elision is a technique that can be implemented purely mittal, that are necessary to allow transactional and non- in hardware[24], purely in software [21], or using a combina- transactional accesses to interoperate in a sensible fash- tion of the two through transactional lock elision (TLE) [7], ion. The interaction of TM and the Power ISA memory which is the approach we adopt. Lock elision requires the model is described in Section 4. ability to execute transaction unaware code, as opposed to • We define a variant of transactions called rollback-only small carefully orchestrated transactions that may be con- transactions that support the checkpointing and rollback structed by a programmer by hand, for example when build- of architectural state without the atomic features of trans- ing a lock-free data structure. Since lock-based critical sec- actions, to be used for software speculation. tions can make procedure calls that may end up distantly removed from the original critical section (potentially even We begin with a summary of the requirements that guided in system