Understanding and Effectively Preventing the ABA Problem in Descriptor-Based Lock-Free Designs

Understanding and Effectively Preventing the ABA Problem in Descriptor-based Lock-free Designs Damian Dechev Peter Pirkelbauer Bjarne Stroustrup Sandia National Laboratories Texas A&M University Texas A&M University Scalable Computing R & D Department Department of Computer Science Department of Computer Science Livermore, CA 94551-0969 College Station, TX 77843-3112 College Station, TX 77843-3112 [email protected] [email protected] [email protected] Abstract 1 Introduction An increasing number of modern real-time systems and The modern ubiquitous multi-core architectures demand the nowadays ubiquitous multicore architectures demand the design of programming libraries and tools that allow the application of programming techniques for reliable and fast and reliable concurrency. In addition, providing safe efficient concurrent synchronization. Some recently devel- and efficient concurrent synchronization is of critical impor- oped Compare-And-Swap (CAS) based nonblocking tech- tance to the engineering of many modern real-time systems. niques hold the promise of delivering practical and safer Lock-free programming techniques [11] have been demon- concurrency. The ABA1 problem is a fundamental problem strated to be effective in delivering performance gains and to many CAS-based designs. Its significance has increased preventing some hazards, typically associated with the ap- with the suggested use of CAS as a core atomic primitive for plication of mutual exclusion, such as deadlock, livelock, the implementation of portable lock-free algorithms. The and priority inversion [5], [2]. The ABA problem is a funda- ABA problem’s occurrence is due to the intricate and com- mental problem to many CAS-based nonblocking designs. plex interactions of the application’s concurrent operations Avoiding the hazards of ABA imposes an extra challenge and, if not remedied, ABA can significantly corrupt the se- for a lock-free algorithm’s design and implementation. To mantics of a nonblocking algorithm. The current state of the best of our knowledge, the literature does not offer an the art leaves the elimination of the ABA hazards to the in- explicit and detailed analysis of the ABA problem, its rela- genuity of the software designer. In this work we provide the tion to the most commonly applied nonblocking program- first systematic and detailed analysis of the ABA problem in ming techniques (such as the use of Descriptors) and cor- lock-free Descriptor-based designs. We study the semantics rectness guarantees, and the possibilities for its avoidance. of Descriptor-based lock-free data structures and propose a Thus, at the present moment of time, eliminating the haz- classification of their operations that helps us better under- ards of ABA in a nonblocking algorithm is left to the in- stand the ABA problem and subsequently derive an effective genuity of the software designer. In this work we study in ABA prevention scheme. Our ABA prevention approach out- details and define the conditions that lead to ABA in a non- performs by a large factor the use of the alternative CAS- blocking Descriptor-based design. Based on our analysis, based ABA prevention schemes. It offers speeds comparable we define a generic and practical condition, called the λδ to the use of the architecture-specific CAS2 instruction used approach, for ABA avoidance for a lock-free Descriptor- for version counting. We demonstrate our ABA prevention based linearizable design (Section 4). We demonstrate the scheme by integrating it into an advanced nonblocking data application of our approach by incorporating it in a complex structure, a lock-free dynamically resizable array. and advanced nonblocking data structure, a lock-free dynamically resizable array (vector) [2]. The ISO C++ Stan- Keywords: ABA problem, nonblocking synchronization, dard Template Library [17] vector offers a combination of lock-free programming techniques dynamic memory management and constant-time random access. We survey the literature for other known ABA prevention techniques (usually described as a part of a non- 1ABA is not an acronym and is a shortcut for stating that a value at a blocking algorithm’s implementation) and study in detail shared location can change from A to B and then back to A three known solutions to the ABA problem (Sections 2.1 1 and 2.3). Our performance evaluation (Section 5) estab- Step Action lishes that the single-word CAS-based λδ approach is fast, Step 1 P1 reads Ai from Li efficient, and practical. Step 2 Pk interrupts P1; Pk stores the value Bi into Li Step 3 Pj stores the value Ai into Li Step 4 P resumes; P executes a false positive CAS 2 The ABA Problem 1 1 The Compare-And-Swap (CAS) atomic primitive (com- Table 1. ABA at Li monly known as Compare and Exchange, CMPXCHG, on the Intel x86 and Itanium architectures [12]) is a CPU instruction that allows a processor to atomically test and mod- 2.1 Known ABA Avoidance Techniques I ify a single-word memory location. The application of a CAS-controlled speculative manipulation of a shared loca- A general strategy for ABA avoidance is based on the tion (Li) is a fundamental programming technique in the fundamental guarantee that no process Pj (Pj 6= P1) can engineering of nonblocking algorithms [11] (an example is possibly store Ai again at location Li (Step 3, Table 1). shown in Algorithm 1). One way to satisfy such a guarantee is to require all values stored in a given control location to be unique. To enforce Algorithm 1 CAS-based speculative manipulation of Li this uniqueness invariant we can place a constraint on the 1: repeat user and request each value stored at Li to be used only 2: value type Ai=ˆLi Known Solution 1 3: value type Bi = fComputeB once ( ). For a large majority of concur- 4: until CAS(Li, Ai, Bi) == Bi rent algorithms, enforcing uniqueness typing would not be a suitable solution since their applications imply the usage In our pseudocode we use the symbols ˆ, &, and : to in- of a value or reference more than once. dicate pointer dereferencing, obtaining an object’s address, An alternative approach to satisfying the uniqueness in- and integrated pointer dereferencing and field access. When variant is to apply a version tag attached to each value. The the value stored at Li is the target value of a CAS-based usage of version tags is the most commonly cited solution speculative manipulation, we call Li and ˆLi control loca- for ABA avoidance [6]. The approach is effective, when it tion and control value, respectively. We indicate the con- is possible to apply, but suffers from a significant flaw: a trol value’s type with the string value type. The size of single-word CAS is insufficient for the atomic update of a value type must be equal or less than the maximum num- word-sized control value and a word-sized version tag. An ber of bits that a hardware CAS instruction can exchange effective application of a version tag [3] requires the hard- atomically (typically the size of a single memory word). In ware architecture to support a more complex atomic primi- the most common cases, value type is either an integer or tive that allows the atomic update of two memory locations, a pointer value. In Algorithm 1, the function fComputeB such as CAS2 (compare-and-swap two co-located words) yields the new value, Bi, to be stored at Li. or DCAS (compare-and-swap two memory locations). The Definition 1: The ABA problem is a false positive execu- availability of such atomic primitives might lead to much tion of a CAS-based speculation on a shared location Li. simpler, elegant, and efficient concurrent designs (in con- As illustrated in Table 1, ABA can occur if a process P1 trast to a CAS-based design). It is not desirable to sug- is interrupted at any time after it has read the old value (Ai) gest a CAS2/DCAS-based ABA solution for a CAS-based and before it attempts to execute the CAS instruction from algorithm, unless the implementor explores the optimiza- Algorithm 1. An interrupting process (Pk) might change tion possibilities of the algorithm upon the availability of the value at Li to Bi. Afterwards, either Pk or any other CAS2/DCAS. A proposed hardware implementation (en- process Pj 6= P1 can eventually store Ai back to Li. When tirely built into a present cache coherence protocol) of an in- P1 resumes, its CAS loop succeeds (false positive execu- novative Alert-On-Update (AOU) instruction [16] has been tion) despite the fact that Li’s value has been meanwhile suggested by Spear et al. to eliminate the CAS deficiency manipulated. of allowing ABA. Some suggested approaches [15] split a Definition 2: A nonblocking algorithm is ABA-free when version counter into two half-words (Known Solution 2): a its semantics cannot be corrupted by the occurrence of ABA. half-word used to store the control value and a half-word ABA-freedom is achieved when: a) occurrence of ABA used as a version tag. Such techniques lead to severe lim- is harmless to the algorithm’s semantics or b) ABA is itations on the addressable memory space and the number avoided. The former scenario is uncommon and strictly spe- of possible writes into the shared location. To guarantee cific to the algorithm’s semantics. The latter scenario is the the uniqueness invariant of a control value of type pointer general case and in this work we focus on providing details in a concurrent system with dynamic memory usage, we of how to eliminate ABA. face an extra challenge: even if we write a pointer value no 2 more than once in a given control location, the memory al- value type stored in Li.

Load more