Verification of a Lock-Free Implementation of Multiword LL/SC Object Gao, Hui; Fu, Yan; Hesselink, Wim H

University of Groningen Verification of a Lock-Free Implementation of Multiword LL/SC Object Gao, Hui; Fu, Yan; Hesselink, Wim H. Published in: EPRINTS-BOOK-TITLE IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below. Document Version Publisher's PDF, also known as Version of record Publication date: 2009 Link to publication in University of Groningen/UMCG research database Citation for published version (APA): Gao, H., Fu, Y., & Hesselink, W. H. (2009). Verification of a Lock-Free Implementation of Multiword LL/SC Object. In EPRINTS-BOOK-TITLE University of Groningen, Johann Bernoulli Institute for Mathematics and Computer Science. Copyright Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons). Take-down policy If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim. Downloaded from the University of Groningen/UMCG research database (Pure): http://www.rug.nl/research/portal. For technical reasons the number of authors shown on this cover page is limited to 10 maximum. Download date: 28-09-2021 2009 Eighth IEEE International Conference on Dependable, Autonomic and Secure Computing Verification of a Lock-Free Implementation of Multiword LL/SC Object Hui Gao, Yan Fu Wim H. Hesselink School of Computer Science and Engineering Dept. of Mathematics and Computing Science University of Electronic Science and Technology of China University of Groningen Chengdu, China Groningen, The Netherlands Email: [email protected], [email protected] Email: [email protected] Abstract—On shared memory multiprocessors, synchroniza- not in general match even naive blocking designs. It has also tion often turns out to be a performance bottleneck and the been shown [13] that the widely-available atomic conditional source of poor fault-tolerance. By avoiding locks, the significant primitives, CAS and LL/SC cannot provide starvation-free benefit of lock (or wait)-freedom for real-time systems is that the potentials for deadlock and priority inversion are avoided. implementations of many common data structures without The lock-free algorithms often require the use of special memory costs growing linearly in the number of threads. atomic processor primitives such as CAS (Compare And Swap) Wait-free algorithms are therefore rare, both in research and or LL/SC (Load Linked/Store Conditional). However, many in practice, and we are most interested in designing lock-free machine architectures support either CAS or LL/SC, but not implementations. both. In this paper, we present a lock-free implementation of the ideal semantics of LL/SC using only pointer-size CAS,and A number of researchers[3], [5], [7], [9], [15] have pro- show how to use refinement mapping to prove the correctness posed techniques for designing lock-free implementations. of the algorithm. The lock-free algorithms often require the use of special atomic processor instructions such as CAS (compare and I. INTRODUCTION swap) or LL/SC (load linked/store conditional). However, We are interested in designing efficient data structures Current mainstream architectures support either CAS or and algorithms on shared-memory multiprocessors. On such LL/SC with restricted semantics (but not both), which are machines, processes often need to coordinate with each susceptible to the ABA problem [14]. other via shared data structures. In order to prevent the The ideal semantics of the atomic primitives LL/SC are corruption of these concurrent objects, processes need a inherently immune to ABA problem. However, for practical mechanism for synchronizing their access. The traditional architectural reasons, no processor architecture supports the approach is to explicitly synchronize access to shared data ideal semantics of LL/SC. Designing efficient algorithms to by different processes to ensure correct behaviors of the bridge the gap has been the subject of many researchers’ overall system, using synchronization primitives such as interest. However, most of the research is focused on im- semaphores, monitors, guarded statements, mutex locks, etc. plementing only small LL/SC objects, whose value fits in a Due to blocking, the classical synchronization paradigms single machine [4], [8], [9], [11]. using locks can incur many problems such as long delays, In this paper, using only pointer-size CAS we present a convoying, priority inversion and deadlock. Using locks also practical lock-free implementation of the ideal semantics of involves a trade-off between coarse-grained locking which LL/SC Multiword objects (whose value does not have to fit can significantly reduce opportunities for parallelism, and in a single machine word) without causing ABA problem. fine-grained locking which requires more careful design and is more prone to bugs. A true problem of lock-free algorithms is that they are Over the past two decades the research community has hard to design correctly, even when apparently straightfor- developed a body of knowledge concerning ”Lock-Free” and ward. To ensure our implementation is not flawed, we used ”Wait-Free” algorithms and data structures. In contrast to the higher-order interactive theorem prover PVS [6] for algorithms that protect access to shared data with locks, mechanical support. All invariants as well as the simulation lock-free and wait-free algorithms are specially designed relation have been completely verified with PVS. to allow multiple threads to read and write shared data Overview. In Section 2 we present we present preliminary concurrently without corrupting it. The significant benefit material which we require throughout this paper. In Section of lock (or wait)-freedom for real-time systems is that 3, we give a lock-free implementation of the ideal semantics by avoiding locks the potentials for deadlock and priority of LL/SC Multiword objects. In Section 4, we provide inversion are avoided. an overview of the proof and a description of the role of It was shown in the 1980s that all algorithms can be imple- the proof assistant PVS in it. In Section 5,we draw some mented wait-free. However, the resulting performance does conclusions. 978-0-7695-3929-4/09 $26.00 © 2009 IEEE 31 DOI 10.1109/DASC.2009.25 II. PRELIMINARY S.X := ∅; X := Y ; return true ; The machine architecture that we have in mind is based else return false fi on modern shared-memory multiprocessors that can access B. Refinement mappings a common shared address space in a heap. There can be In practice, the specification of systems is concerned several processes running on a single processor. Variables rather with externally visible behavior than computational in shared context are visible to all processes running in feasibility. We assume that all levels of specifications under associated parallel. Variables in private context are hidden consideration have the same observable state space Σ0, and from other processes. are interpreted by their observation functions Π:Σ→ We assume a universal set V of typed variables, which Σ0. Every specification can be modeled as a four-tuple is called the vocabulary. A state s is a type-consistent (Σ, Π, Θ, N ) where (Σ, Θ, N )isthetransition system [2]. interpretation of V, mapping variables v ∈Vto values sv. A refinement mapping from a lower-level specification We denote by Σ the set of all states. If C is a command, Sc =(Σc, Πc, Θc, Nc) to a higher-level specification Sa = we denote by Cp the transition C executed by process p, and (Σa, Πa, Θa, Na), written φ : Sc Sa, is a mapping sCpt indicates that in state s process p can do a step C that φ :Σc → Σa that satisfies: establishes state t. When discussing the effect of a transition 1) φ preserves the externally visible state component: Cp from state s to state t on a variable v, we abbreviate sv Πa ◦ φ =Πc. to v and tv to v . We use the abbreviation Pres(V ) for 2) φ is a simulation, denoted φ : Sc Sa: v∈V (v = v) to denote that all variables in the set V are preserved by the transition. ① φ takes initial states into initial states: Θc ⇒ Θa ◦φ. ② Nc is mapped by φ into a transition (possibly A. The Semantics of Synchronization Primitives stuttering) allowed by Na: Q∧N ⇒N◦ φ Q S Traditional multiprocessor architectures have included c a , where is an invariant of c. hardware support only for low level synchronization prim- Below we need to exploit the fact that the simulation only itives such as CAS and LL/SC, while high level synchro- quantifies over all reachable states of the lower-level system, nization primitives such as locks, barriers, and condition not all states. We therefore explicitly allow an invariant Q variables have to be implemented in software. in condition 2 ➁. The following theorem is stated in [1]. CAS atomically compares the contents of a location with a value and, if they match, stores a new value at Theorem 1 If there exists a refinement mapping from Sc to the location. The semantics of CAS is given by equivalent Sa, then Sc implements Sa. atomic statements below. We use angular brackets ... to indicate atomic execution of the enclosed specification Refinement mappings give us the ability to reduce an command1. implementation by reducing its components in relative isola- tion, and then gluing the reductions together with the same proc CAS(ref X : Val; in old, new : Val):Bool = structure as the implementation. if X = old then X := new; return true else return false; fi III.

Load more