<<

Decidable Verification under a Causally Consistent Shared Memory

Ori Lahav Udi Boker Tel Aviv University Interdisciplinary Center (IDC) Herzliya Israel Israel [email protected] [email protected]

Abstract ACM Reference Format: Causal is one of the most fundamental and Ori Lahav and Udi Boker. 2020. Decidable Verification under a widely used consistency models weaker than sequential con- Causally Consistent Shared Memory. In Proceedings of the 41st ACM SIGPLAN International Conference on Programming Language sistency. In this paper, we study the verification of safety Design and Implementation (PLDI ’20), June 15ś20, 2020, London, properties for finite-state concurrent programs running un- UK. ACM, New York, NY, USA, 16 pages. https://doi.org/10.1145/ der a causally consistent shared memory model. We estab- 3385412.3385966 lish the decidability of this problem for a standard model of causal consistency (called also łCausal Convergencež and łStrong-Release-Acquirež). Our proof proceeds by developing an alternative operational semantics, based on the notion of a thread potential, that is equivalent to the existing declara- 1 Introduction tive semantics and constitutes a well-structured transition Suppose that one wants to verify that a given sequential system. In particular, our result allows for the verification of program satisfies a certain safety specification (e.g., that a large family of programs in the Release/Acquire fragment it never crashes). If the data domain is bounded, we can of C/C++11 (RA). Indeed, while verification under RA was - represent the program as a finite-state transition system, cently shown to be undecidable for general programs, since and this verification problem is trivially decidable. Moving RA coincides with the model we study here for write/write- to concurrent programs, assuming (non-realistic) sequen- race-free programs, the decidability of verification under RA tially consistent shared memory semantics, does not change for this widely used class of programs follows from our result. muchÐthe memory constitutes another finite-state system, The novel operational semantics may also be of independent and its synchronization with the interleaving of the systems use in the investigation of weakly consistent shared memory representing the different threads is easily expressible asa models and their verification. finite-state system as well. On the other hand, if the memory does not ensure sequential consistency, but rather provides → CCS Concepts: · Software and its engineering Soft- weaker consistency guarantees, the decidability of the safety ware verification; Concurrent programming languages; verification problem is completely unclear. → · Concurrency; and In this paper, we are interested in the safety verification verification; Program verification; · Information sys- problem under causally consistent shared memory. Causal → tems Distributed database transactions. consistency is one of the most fundamental consistency mod- els weaker than sequential consistency. It is especially com- Keywords: weak memory models, causal consistency, re- mon and well studied in distributed databases (see, e.g., [37] lease/acquire, shared-memory, concurrency, verification, de- and the mongoDB documentation [40]). Roughly speaking, cidability, well-structured transition systems by allowing nodes to disagree on the relative order of some memory operations, and require global consensus only on the order of łcausally relatedž operations, causal consistency Permission to make digital or hard copies of all or part of this work for allows scalable, partition-tolerant and available implementa- personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear tions. this notice and the full citation on the first page. Copyrights for components Nowadays, causal consistency models have become cen- of this work owned by others than ACM must be honored. Abstracting with tral also in multithreaded programming. In particular, the credit is permitted. To copy otherwise, or republish, to post on servers or to Release/Acquire model (RA) is a form of causal consistency redistribute to lists, requires prior specific permission and/or a fee. Request that specifies the semantics of C/C++11 for synchroniza- permissions from [email protected]. tion accesses annotated with memory_order_release and PLDI ’20, June 15ś20, 2020, London, UK © 2020 Association for Computing Machinery. memory_order_acquire [14, 23, 24]. A stronger form of ACM ISBN 978-1-4503-7613-6/20/06. causal consistency, called SRA (for Strong Release/Acquire), https://doi.org/10.1145/3385412.3385966 which is equivalent to the standard causal consistency model

211 PLDI ’20, June 15ś20, 2020, London, UK Ori Lahav and Udi Boker in distributed databases [18],1 characterizes the guarantees this semantics is to maintain the potential of future reads provided by łmulti-copy atomicž multiprocessor architec- of each thread in the machine state. This semantics can be tures, such as POWER. Specifically, as shown30 in[ ], SRA straightforwardly made łlossyž, as losing some parts of the precisely captures the guarantees provided by the POWER possible potential never allows for additional behaviors. In architecture for programs compiled from the C/C++’s re- addition, potentials can be represented using total orders, lease/acquire fragment. whose embedding relation (based on the ordinary subse- Despite its centrality, until recently not much was known quence relation) is a well-quasi-ordering. In this semantics, about the safety verification problem under causal consis- read transitions are very simple, they only consume a prefix tency. The challenge arises first since the standard semantics of the potential. The complexity is left for write transitions of causal consistency models is declarative (identifying pro- that need to properly increase the potentials of the different gram behaviors with partially ordered execution histories threads in a way that ensures causal consistency. Our funda- that obey certain formal consistency constraints), while veri- mental observation is that the way the potential of a certain fication is typically applied on operational models. Moreover, thread increases when another thread writes to memory can operational versions of causal consistency are inherently be defined solely in terms of the existing potentials ofthe infinite-state, as threads may generally read from an un- two threads. This intuition is made precise in our formalized bounded past. In fact, the reduction of Atig et al.[11] from (and mechanized in Coq) correspondence proofs, which es- reachability in lossy FIFO channel machines to safety veri- tablish simulations (forward for one direction and backward fication under x86-TSO semantics can be straightforwardly for the converse) between the novel lossy semantics and the adapted to causally consistent models (specifically, RA and straightforward łoperationalizationž of SRA’s declarative SRA). This implies a non-primitive recursive lower bound on semantics. the safety verification problem under causal consistency. Very recently, Abdulla et al.[3] proved that the safety verifi- Related Work. Causally consistent shared memory mod- cation problem is undecidable under one instance of causal els, their verification problems and approaches to address consistency, namely the the RA model. these problems were recently outlined in [29], where the Our main contribution in this paper is to establish the problem we resolve is left open. As mentioned above, Ab- decidability of safety verification under the SRA model. If dulla et al.[3] proved that safety verification under RA is one is specifically interested in verification under RA, our undecidable. Operational łmessage-passingž semantics for result provides a (rather tight) under-approximation (a bug SRA was developed in [30]. It is inadequate for our purposes under SRA implies a bug under RA), and, since RA and SRA as it cannot be made łlossyž without affecting its allowed coincide on write/write-race-free programs, we obtain the outcomes. decidability of safety verification under RA for this large and The safety verification problem was previously investi- widely used class of programs. gated under TSOÐthe łtotal store orderingž model of x86 To obtain decidability, we use the framework of well- multiprocessors, which, being multi-copy-atomic, is stronger structured transition systems [2, 7, 22]. Intuitively speaking, than any of the models studied here. Atig et al.[11, 12] estab- this framework allows one to establish decidability of infinite- lish the decidability of this problem (and the non-primitive state łlossyž systems, where (i) states may non-deterministi- recursive lower bound) by reducing it to (and from) reach- cally forget some information they include; and (ii) the rela- ability in lossy channel systems. Since causal consistency tion determining whether one state is obtained from another models are not multi-copy atomic and they lack any notion by losing information constitutes a well-quasi-ordering. This of a global mapping from locations to values, the idea be- approach, however, cannot be applied for (an operationalized hind their reduction cannot be applied for SRA. Notably, version of) SRA directly, whose natural states are execution SRA cannot be fully explained by program transformations histories. First, forgetting information from the history re- (instruction reordering and merging) [33], whereas, with sults, in many cases, in strictly weaker causality constraints the exception of the recent undecidability in [3], all existing that allow outcomes that cannot be obtained without los- results (of [12] in particular) are for models that are fully ing the information. Second, execution histories are only accounted for by such transformations. partially ordered and embedding between (general) partial More recently, Abdulla et al.[4] greatly simplified previous orders is not a well-quasi-ordering. proofs for TSO (and demonstrated much better practical Our solution is to develop a novel operational semantics running times on certain benchmarks) by developing and that is equivalent to SRA, for which we can use the frame- utilizing a łload-bufferž semantics for TSO. Load-buffers are work of well-structured transition systems. The key idea in roughly similar to our potential lists, but while load buffers are FIFO queues, our lists necessarily allow the insertion of 1This equivalence excludes the atomicity of read-modify-writes, which future reads at different positions, subject to certain (novel) is crucial in multithreaded programming but is not provided by causal conditions ensuring that causal consistency is not violated. consistency as defined in18 [ ] (see also ğ3.1). In addition, the łload-bufferž semantics for TSO includes

212 Decidable Verification under a Causally Consistent Shared Memory PLDI ’20, June 15ś20, 2020, London, UK a global machine memory, while our semantics does not and 푟 := CAS(푥, 푒R, 푒W) atomically loads the value of 푥 to 푟, employ any such notion. compares it to the value of 푒R, and if the two values are equal, Verification of programs under causal consistency (espe- replaces the value of 푥 by the value of 푒W. cially under RA) has received considerable amount of atten- A sequential program 푆 is a function from a set of the form tion in recent years. The different approaches include (non- {0, 1, ... ,푁 } (the possible values of the program counter) to automated) program [21, 25, 32, 48, 49], (bounded) instructions. We denote by SProg the set of all sequential model checking based on partial order reduction [3, 5, 27, 35] programs. A (concurrent) program 푃 is a top-level parallel and robustness verification [17, 31, 41]. The latter approach composition of sequential programs, defined as a mapping reduces the verification problem to the verification under from a finite set Tid ⊆ {T1, T2,...} of thread identifiers to sequential consistency and the verification of the program’s SProg. In our examples, we often write sequential programs robustness against causal consistency. Thus, this approach as sequences of instructions delimited by line breaks, use ‘∥’ cannot work for programs that meet their safety specification for parallel composition, and refer to the program threads but still exhibit non-sequentially-consistent behaviors. as T1, T2,... following their left-to-right order in the program Finally, the problem asking whether a given implementa- listing (see, e.g., Ex. 3.5 on Page 6). tion provides causal consistency guarantees was studied in [16]. It is, however, independent from verification of client 2.2 From Programs to Labeled Transition Systems programs assuming causal consistency, as we study here. Sequential and concurrent programs induce labeled transi- tion systems. Outline. The rest of this paper is organized as follows. In ğ2 we provide preliminary definitions. In ğ3 we present the Labeled transition systems. A labeled transition system SRA model and its safety verification problem, and prove (LTS) 퐴 over an alphabet Σ is a triple ⟨푄, 푄0,푇 ⟩, where 푄 that RA and SRA coincide for write/write-race-free programs. is a set of states, 푄0 ⊆ 푄 is the set of initial states, and In ğ4 we present a straightforward operational version of 푇 ⊆ 푄 × Σ × 푄 is a set of transitions. We denote by 퐴.Q, SRA’s declarative semantics. In ğ5 we introduce our novel 휎 퐴.Q0 and 퐴.T the components of an LTS 퐴; write −→퐴 for operational semantics of SRA. In ğ6 we show how this se- ′ ′ 휎 the relation {⟨푞, 푞 ⟩ | ⟨푞, 휎, 푞 ⟩ ∈ 퐴.T} and −→퐴 for Ð휎 ∈Σ −→퐴. mantics is used to decide the safety verification problem. ∗ A state 푞 ∈ 퐴.Q is reachable in 퐴 if 푞0 −→퐴 푞 for some We conclude in ğ7. The appendices to this paper, publicly 휎1 푞0 ∈ 퐴.Q0. A sequence 휎1, ... ,휎푛 is a trace of 퐴 if 푞0 −→퐴 available in [1], provide full proofs. Mechanized Coq proofs 휎푛 of the equivalence of the two semantics of SRA are available ··· −−→퐴 푞 for some 푞0 ∈ 퐴.Q0 and 푞 ∈ 퐴.Q. The set of pre- Σ in the artifact accompanying this paper. decessors of a set 푆 ⊆ 퐴.Q w.r.t. a symbol 휎 ∈ , denoted 휎 ′ 휎 ′ by pred퐴 (푆), is given by {푞 ∈ 퐴.Q | ∃푞 ∈ 푆. 푞 −→퐴 푞 }. We ≜ 휎 2 Preliminaries define pred퐴 (푆) Ð휎 ∈Σ pred퐴 (푆). SRA is a declarative memory model, defined by imposing For sequential programs the alphabet is the set of labels certain consistency constraints on execution graphs. The latter (extended with 휀 for silent transitions), as defined next. describe the (partially ordered) history of a program run. In Definition 2.1. A label is either R(푥, 푣 ) (read label), W(푥, 푣 ) this section, we provide the preliminaries for declarative R W (write label) or RMW(푥, 푣 , 푣 ) (read-modify-write label), where memory model: We introduce a toy programming language R W 푥 ∈ Loc and 푣 , 푣 ∈ Val. We denote by Lab the set of all la- (ğ2.1), interpret its programs as transition systems (ğ2.2) and R W bels. The functions typ, loc, val , and val return (when associate these systems with execution graphs (ğ2.3). R W applicable) the type (R/W/RMW), location, read value and writ- 2.1 Programming Language ten value of a given label 푙. Let Val ⊆ N, Loc ⊆ {x, y,...}, Reg ⊆ {a, b,...} be finite sets A sequential program 푆 ∈ SProg induces an LTS over of values, (shared) memory locations, and register names. Lab ∪ {휀}. Its states are pairs 푠 = ⟨pc, 휙⟩ where 푝푐 ∈ N Figure 1 presents our toy language. Its expressions are con- (called program counter) and 휙 : Reg → Val (called local structed from registers (local variables) and values. Instruc- store, and extended to expressions in the obvious way). Its tions include assignments and conditional branching, as well only initial state is ⟨0, 휆푟 ∈ Reg. 0⟩ and its transitions are as memory operations. Intuitively speaking, an assignment given in Fig. 2, following the informal description above. (In 푟 := 푒 assigns the value of 푒 to register 푟 (involving no mem- particular, a read instruction in 푆 induces |Val| transitions ory access); if 푒 goto 푛 sets the program counter to 푛 iff with different read labels.) We identify sequential programs the value of 푒 is not 0; a łwritež 푥 := 푒 stores the value of with their induced LTSs (when writing, e.g., 푆.Q and −→푆 ). 푒 in 푥; a łreadž 푟 := 푥 loads the value of 푥 to register 푟; In turn, a concurrent program 푃 is identified with an LTS 푟 := FADD(푥, 푒) atomically increments 푥 by the value of 푒 over Tid × (Lab ∪ {휀}). Its states are functions, often denoted and loads the old value of 푥 to 푟; 푟 := XCHG(푥, 푒) atomically by 푝, assigning a state in 푃 (휏).Q to every 휏 ∈ Tid; its initial swaps 푥 to the value of 푒 and loads the old value of 푥 to 푟; states set is {푝 | ∀휏. 푝(휏) ∈ 푃 (휏).Q0}; and its transitions are

213 PLDI ’20, June 15ś20, 2020, London, UK Ori Lahav and Udi Boker

푣 ∈ Val ⊆ N values 푥,푦, 푧 ∈ Loc ⊆ {x, y,...} locations 푒 ::= 푟 | 푣 | 푒 + 푒 | 푒 = 푒 | 푒 ≠ 푒 | ... 푟 ∈ Reg ⊆ {a, b,...} registers = = = = 휏, 휋, 휂 ∈ Tid ⊆ {T , T ,...} thread identifiers Inst ∋ inst :: 푟 : 푒 | if 푒 goto 푛 | 푥 : 푒 | 푟 : 푥 | 1 2 = = = 푆 ∈ SProg ≜ {0, 1, ... ,푁 } → Inst sequential programs 푟 : FADD(푥, 푒) | 푟 : XCHG(푥, 푒) | 푟 : CAS(푥, 푒, 푒) 푃 : Tid → SProg (concurrent) programs

Figure 1. Domains, metavariables and programming language .

푆 (pc) = 푟 := 푒 푆 (pc) = if 푒 goto 푛 푆 (pc) = if 푒 goto 푛 푆 (pc) = 푥 := 푒 푆 (pc) = 푟 := 푥 휙 ′ = 휙 [푟 ↦→ 휙 (푒)] 휙 (푒) ≠ 0 휙 (푒) = 0 푙 = W(푥, 휙 (푒)) 푙 = R(푥, 푣) 휙 ′ = 휙 [푟 ↦→ 푣] 휀 휀 휀 푙 푙 ⟨pc, 휙⟩ −→ ⟨pc + 1, 휙 ′⟩ ⟨pc, 휙⟩ −→ ⟨푛, 휙⟩ ⟨pc, 휙⟩ −→ ⟨pc + 1, 휙⟩ ⟨pc, 휙⟩ −→ ⟨pc + 1, 휙⟩ ⟨pc, 휙⟩ −→ ⟨pc + 1, 휙 ′⟩

푆 (pc) = 푟 := FADD(푥, 푒) 푆 (pc) = 푟 := XCHG(푥, 푒) 푆 (pc) = 푟 := CAS(푥, 푒R, 푒W) 푆 (pc) = 푟 := CAS(푥, 푒R, 푒W) 푙 = RMW(푥, 푣, 푣 + 휙 (푒)) 푙 = RMW(푥, 푣, 휙 (푒)) 푙 = RMW(푥, 휙 (푒R), 휙 (푒W)) 푙 = R(푥, 푣) 푣 ≠ 휙 (푒R) ′ ′ ′ ′ 휙 = 휙 [푟 ↦→ 푣] 휙 = 휙 [푟 ↦→ 푣] 휙 = 휙 [푟 ↦→ 휙 (푒R)] 휙 = 휙 [푟 ↦→ 푣] 푙 푙 푙 푙 ⟨pc, 휙⟩ −→ ⟨pc + 1, 휙 ′⟩ ⟨pc, 휙⟩ −→ ⟨pc + 1, 휙 ′⟩ ⟨pc, 휙⟩ −→ ⟨pc + 1, 휙 ′⟩ ⟨pc, 휙⟩ −→ ⟨pc + 1, 휙 ′⟩

Figure 2. Transitions of LTS induced by a sequential program 푆 ∈ SProg.

−1 łinterleaved transitionsž of 푃’s components, given by: • If ⟨푤1, 푟⟩, ⟨푤2, 푟⟩ ∈ rf , then 푤1 = 푤2 (that is, rf = {⟨푟,푤⟩ | ⟨푤, 푟⟩ ∈ rf } is functional). ∈ ( ) 푙 ′ 휀 ′ 푙 Lab 푝 휏 −→푃 (휏) 푠 푝(휏) −→푃 (휏) 푠 • ∀푟 ∈ 퐸 ∩ R. ∃푤. ⟨푤, 푟⟩ ∈ rf (each read event reads from 휏,푙 휏,휀 some write event). 푝 −−→ 푝 [휏 ↦→ 푠′] 푝 −−→ 푝 [휏 ↦→ 푠′] Definition 2.4. A relation mo is a modification order for a 2.3 From LTSs to Execution Graphs set 퐸 of events if mo is a disjoint union of relations {mo푥 }푥 ∈Loc We present the general notions used to assign declarative where each mo푥 is a strict total order on 퐸 ∩ W푥 . semantics to concurrent programs. First, we define execution graphs, starting with their nodes, called events. Definition 2.5. An execution graph is a triple퐺 = ⟨퐸, rf , mo⟩ where 퐸 is a finite set of events, rf is a reads-from relation for Definition 2.2. An event is a triple 푒 = ⟨휏, 푛,푙⟩, where 퐸 and mo is a modification order for 퐸. We denote by EGraph 휏 ∈ Tid is a thread identifier, 푛 ∈ N is a serial number the set of all execution graphs. The components of 퐺 are and 푙 ∈ Lab is a label (Def. 2.1). The function tid returns the denoted by 퐺.E, 퐺.rf and 퐺.mo, and 퐺.po denotes the restric- thread identifier of an event. The functions typ, loc, val , R tion of < to 퐺.E (i.e., 퐺.po ≜ {⟨푒 , 푒 ⟩ ∈ 퐸 × 퐸 | 푒 < 푒 }). For and val are lifted to events in the obvious way. We denote by 1 2 1 2 W a set 퐸 ⊆ E, we write 퐺.퐸 for 퐺.E∩퐸 (e.g., 퐺.W = 퐺.E∩W ). E the set of all events, and use R, W, RMW for its subsets: R ≜ 푥 푥 {푒 | typ(푒) ∈ {R, RMW}}, W ≜ {푒 | typ(푒) ∈ {W, RMW}} and The next definition is used to associate execution graphs RMW ≜ R ∩ W. Sub/superscripts are used to restrict these to programs. Multiple examples below (on Page 6) illustrate sets to certain location (e.g., W = {푤 ∈ W | loc(푤) = 푥}) 푥 execution graphs of different programs. and/or thread identifier (e.g., E휏 = {푒 ∈ E | tid(푒) = 휏}).

Our representation of events induces a partial order < on Notation 2.6. For a set 퐸 of events, thread identifier 휏 ∈ Tid them: events of the same thread are ordered according to and label 푙 ∈ Lab, NextEvent(퐸, 휏,푙) denotes the event given by ⟨휏, 1 + max({푛 ∈ N | ∃푙 ′ ∈ Lab. ⟨휏, 푛,푙 ′⟩ ∈ 퐸}), 푙⟩. their serial numbers (i.e., ⟨휏1, 푛1, 푙1⟩ < ⟨휏2, 푛2, 푙2⟩ iff 휏1 = 휏2 and 푛1 < 푛2). In turn, an execution graph consists of a set of events, a reads-from mapping that determines the write event Definition 2.7. An execution graph 퐺 is generated by a ⟨ ⟩ →∗ ⟨ ⟩ from which each read event reads its value, and a modification program 푃 with final state 푝 if 푝0,퐺0 푝,퐺 for some ∈ Q order that totally orders the writes to each location. 푝0 푃. 0, where 퐺0 denotes the empty execution graph (given by 퐺0 ≜ ⟨∅, ∅, ∅⟩) and → is defined by: Definition 2.3. A relation rf is a reads-from relation for a 휏,푙 ′ ′ set 퐸 of events if the following hold: 푝 −−→푃 푝 퐸 = 퐸 ∪ {NextEvent(퐸, 휏, 푙)} ′ ′ 휏,휀 ′ • If ⟨푤, 푟⟩ ∈ rf , then 푤 ∈ 퐸∩W, 푟 ∈ 퐸∩R, loc(푤) = loc(푟) rf ⊆ rf mo ⊆ mo 푝 −−→푃 푝 ′ ′ ′ ′ ′ and valW (푤) = valR (푟). ⟨푝, ⟨퐸, rf , mo⟩⟩ → ⟨푝 , ⟨퐸 , rf , mo ⟩⟩ ⟨푝,퐺⟩ → ⟨푝 ,퐺⟩

214 Decidable Verification under a Causally Consistent Shared Memory PLDI ’20, June 15ś20, 2020, London, UK

3 The Strong Release/Acquire Model Indeed, if a read event 푟 reads from a write event 푤1, while be- Declarative memory models, such as Strong Release/Acquire ing aware of an mo-later write event 푤2 to the same location, −1 (SRA), are formulated by a collection of constraints on ex- we have ⟨푤1,푤2⟩ ∈ mo, ⟨푤2, 푟⟩ ∈ hb and ⟨푟,푤1⟩ ∈ rf . ecution graphs, which determine the consistent execution atomicity. This condition ensures that RMWs are stronger graphsÐthe ones allowed by the model. In this section, we than a read followed by a write. It requires that RMWs read formulate the constraints of SRA and define the safety veri- from their immediate mo-predecessors: fication problem under SRA, discuss equivalent alternative −1 formulations (ğ3.1), provide several examples (ğ3.2), and in- 퐺.mo ; 퐺.mo ; 퐺.rf is irreflexive (atomicity) vestigate the relation between SRA and RA (ğ3.3). In words, if an RMW event 푒 is reading from a write event 푤, then no write event can intervene mo-between 푤 and 푒. Notation 3.1 (Relations). Given a relation 푅, dom(푅) de- notes its domain; 푅? and 푅+ denote its reflexive and transitive We refer to execution graphs that meet the three condi- closures; and 푅−1 denotes its inverse. The (left) composition tions above as SRA-consistent. With this definition, we can formally present the reachability problem under SRA, which of relations 푅1, 푅2 is denoted by 푅1 ;푅2. We denote by [퐴] the identity relation on a set 퐴, and so [퐴] ; 푅 ; [퐵] = 푅 ∩ (퐴 × 퐵). we prove to be decidable in this paper. Definition 3.2. We call a state 푝 of a program 푃 reachable Causal consistency models are based on the following under SRA if some SRA-consistent execution graph is gener- basic derived łhappens-beforež relation: ated by 푃 with final state 푝 (see Def. 2.7). hb ≜ ( po ∪ rf)+ 퐺. 퐺. 퐺. Definition 3.3 (SRA Reachability). The reachability prob- The happens-before relation captures the łcausality relationž lem under SRA is given by: in execution graphs. In words, hb is the smallest transitive Input: a program 푃 and a łbadž state 푝 ∈ 푃.Q. relation that contains the program order (po) and the reads- Question: is 푝 reachable under SRA? from (rf) relations. We note that all reads synchronize with A lower complexity bound to this problem is achieved by the writes they read from (rf ⊆ hb), in contrast to more reduction from reachability in lossy FIFO channel machines, elaborate models like RC11 [34], where only certain reads- straightforwardly following the analogous reduction of Atig from edges induce synchronization. et al. [11] to safety verification under x86-TSO. Given hb, the SRA model consists of three constraints, each of which forbids a certain pattern in execution graphs. Theorem 3.4. SRA reachability is non-primitive-recursive. The three disallowed patterns are illustrated as follows: 3.1 Other Formulations of SRA mo mo W푥 W푥 W푥 W푥 Our presentation above follows [30], where SRA is intro- (hb ∪ mo)+ rf hb rf mo duced as a strengthening of RA. The latter is the fragment of the C/C++11 model [14, 34] consisting of release stores, R RMW E 푥 푥 acquire reads and acquire-release RMWs. In addition, SRA irr-hb-mo read-coherence atomicity appears (in multiple disguises) in the literature: irr-hb-mo. This constraint requires that the modification POWER. As proved in [30], SRA precisely coincides with order mo łagreesž with the causality order: the POWER model of [9], when the latter is restricted to programs that result from compiling C/C++11 programs in (퐺.hb ∪ 퐺.mo)+ is irreflexive (irr-hb-mo) the release/acquire fragment, using the standard compilation In particular, it implies that 퐺.hb is indeed a partial order. scheme [38] (that is, placing lwsync before every store and Thus, SRA forbids so-called łload-bufferingž behaviors [39], ctrl+isync after every load). which, unless restricted appropriately, lead to the infamous Distributed Key-Value Stores. Ignoring RMWs, the SRA łout-of-thin-airž problem [13, 26]. model is equivalent to the causal convergence model, denoted read-coherence. This constraint intuitively requires that by CCv, of [16] (when applied to the standard read/write ła thread cannot read a value when it is aware of a later value memory sequential specification), as well as to the causal con- written to the same locationž. Identifying łthread 휏 being sistency model of [37] when restricted to single-instruction aware of some write event 푤ž with an hb-path from 푤 to transactions. These models are formulated in [18, 20] in (some event of) 휏, and using the modification order mo to terms of visibility (푣푖푠) and arbitration (푎푟) relations. One interpret one write being łlaterž than another, the precise direction of the correspondence follows by setting 푣푖푠 = hb condition requires that: and taking 푎푟 to be some total order extending hb ∪ mo. For the converse, one takes rf to relate each read 푟 with the 퐺.mo ; 퐺.hb ; 퐺.rf−1 is irreflexive (read-coherence) 푎푟-maximal write to the same location that is 푣푖푠-before 푟,

215 PLDI ’20, June 15ś20, 2020, London, UK Ori Lahav and Udi Boker and sets mo = Ð푥 ∈Loc [W푥 ] ; 푎푟 ; [W푥 ]. Furthermore, our pro- Example 3.6 (Message passing). SRA supports the very gram order (po) corresponds to session order (푠표), and SRA’s common łflag-basedž synchronization. That is, the following consistency ensures strong session guarantees (푠표 ⊆ 푣푖푠)[47]. outcome is disallowed: RMWs in distributed databases require expensive global W x x := 0 ( , 0) coordination. A naive implementation of RMWs as transac- a := y //1 x := 1 mo tions that read and write from/to the same location does not b := x //0 y := 1 W (x, 1) R (y, 1) (MP) guarantee atomicity, as it allows the lost update anomaly rf ✗ (e.g., it will allow the outcome in Ex. 3.9 below). In the partic- SRA W (y, 1) R (x, 0) ular case when a certain location is only accessed by RMWs, rf mo its accesses are totally ordered by hb, which corresponds An execution graph for this outcome must have and - mo W x to marking of certain transactions as serializable, as in the edges as depicted above. However, we have from ( , 0) W x hb W x R x rf W x Red-Blue model of [15, 36]. to ( , 1), from ( , 1) to ( , 0) and from ( , 0) to R(x, 0). Hence, read-coherence does not hold, and the execu- Parallel-Snapshot-Isolation. When all store instructions tion graph is not SRA-consistent. are implemented using atomic exchanges (implementing 푥 := 푒 as 푟 := XCHG(푥, 푒)), SRA precisely captures the par- Example 3.7 (Transitive message passing). po and rf edges allel snapshot isolation model (PSI) [10, 15, 19, 43, 45] when equally contribute to hb in causal consistency. Hence, as in restricted to single-instruction transactions. Hence, our de- Ex. 3.6, the following outcome is disallowed by SRA. cidability result for SRA entails the decidability for PSI with x := 0 a := y //1 b := x //1 ✗ SRA single-instruction transactions. y := 1 x := 1 c := x //0

3.2 Examples W (x, 0) R (y, 1) R (x, 1) (MP-trans) mo rf We list some well-known litmus tests to demonstrate SRA rf W (y, 1) W (x, 1) R (x, 0) (some of which are revisited in the sequel). To simplify the presentation, instead of referring to reachable program states, Example 3.8 (Independent reads of independent writes). A we consider possible program outcomes assigning final values main difference between SRA and the x86-TSO model [42] to (some) registers. An outcome 푂 : Reg ⇀ Val is allowed is that the former is non-multi-copy-atomic. Namely, differ- for a program under the declarative model SRA if some state ent threads may observe different stores in different orders. in which the registers are assigned their values in 푂 is reach- Thus, unlike x86-TSO, the SRA model allows the following able under SRA (see Def. 3.2). We use program comment outcome, in which T2 observes W(x, 1) but not W(y, 1), while annotations (ł//ž) to denote particular outcomes. T3 observes W(y, 1) but not W(x, 1). Remark 1. To simplify our presentation, we require explicit x := 0 a := x //1 c := y //1 y := 0 ✓ SRA initialization of memory locations and adapt the examples to x := 1 b := y //0 d := x //0 y := 1 include explicit initialization. Reading from an uninitialized W (x, 0) R (x, 1) R (y, 1) W (y, 0) (IRIW) location blocks the thread. (For example, only the initial mo rf rf mo execution graph 퐺0 is generated by a program consisting W (x, 1) R (y, 0) R (x, 0) W (y, 1) of a single thread that reads from some location, without previously writing to it.) This is only a presentation matter: Example 3.9. For the implementation of locks, it is crucial one may always achieve implicit initialization by augmenting that two RMWs never read from the same write: the program with an additional thread that sets all variables x := 0 W (x, 0) to their initial value, and then signals all other thread (using a := b := rf rf (2RMW) an additional flag) to start running. CAS(x, 0, 1) //0 CAS(x, 0, 1) //0 ✗ SRA RMW (x, 0, 1) RMW (x, 0, 1) Example 3.5 (Store buffering). The following program out- come is allowed by SRA. Since mo must order the two RMWs and irr-hb-mo dictates that mo ; rf is irreflexive, any order of the RMWs entails a W (x, 0) W (y, 0) x := 0 y := 0 violation of atomicity. x := 1 y := 1 mo mo a := y //0 b := x //0 W (x, 1) W (y, 1) (SB) Example 3.10. RMWs to an otherwise-unused location can rf rf ✓ be used as fences, as the consistency constraints imply that hb SRA R (y, 0) R (x, 0) must totally order 퐺.W푥 when, except for one (initialization) In its execution graph the rf-edges are forced because of write event, all write events to 푥 in 퐺 are RMWs. For example, the read values, whereas the mo-edges are forced due to irr- placing such fences forbids the weak outcome of the SB hb-mo. It can be easily verified that the execution graph is program (Ex. 3.5). An execution graph for this outcome must SRA-consistent. have the edges as depicted on the right, and any choice of

216 Decidable Verification under a Causally Consistent Shared Memory PLDI ’20, June 15ś20, 2020, London, UK the missing rf-edges (to the two RMW events) will violate a freedom of SRA-consistent execution graphs suffices, so that condition of SRA. programmers may adhere to a safe programming discipline without even understanding RA. z := 0 W (z, 0) x := 0 y := 0 W (x, 0) W (y, 0) Definition 3.13. An execution graph 퐺 is write/write-race x := 1 y := 1 W (x, 1) W (y, 1) (SBF) free if for every 푤1,푤2 ∈ 퐺.W with loc(푤1) = loc(푤2), a = (z ) a = (z ) : FADD , 0 : FADD , 0 rf we have 푤 = 푤 , ⟨푤 ,푤 ⟩ ∈ 퐺.hb or ⟨푤 ,푤 ⟩ ∈ 퐺.hb.A b := y //0 c := x //0 RMW (z, 0, 0) RMW (z, 0, 0) 1 2 1 2 2 1 program 푃 is write/write-race free under SRA if every SRA- ✗ SRA R (y, 0) R (x, 0) consistent execution graph that is generated by 푃 (with some final state) is write/write-race free. 3.3 Relation to the RA Model The RA model is weaker than SRA. It imposes read-coherence Theorem 3.14. Let 푃 be a program that is write/write-race and atomicity, just like SRA, but instead of irr-hb-mo, it only free under SRA. Then, the sets of states of 푃 that are reachable disallows the following patterns: under SRA and RA coincide. W푥 Proof. Using Prop. 3.12, it suffices to show that reachability mo hb hb under RA implies reachability under SRA. Let G be the set E W푥 of all RA-consistent but SRA-inconsistent execution graphs irr-hb write-coherence that are generated by 푃. To show that every state of 푃 that is First, irr-hb requires hb to be a partial order: reachable under RA is also reachable under SRA, it suffices 퐺.hb is irreflexive (irr-hb) to show that G is empty. Suppose otherwise and let 퐺 be a minimal element in mo hb Second, instead of a global agreement between and , G, in the sense that every proper 퐺.hb-prefix of 퐺 is not RA only requires a local agreement: in G. (A proper 퐺.hb-prefix of 퐺 is an execution graph 퐺.mo ; 퐺.hb is irreflexive (write-coherence) of the form ⟨퐸푝, [퐸푝 ] ; 퐺.rf ; [퐸푝 ], [퐸푝 ] ; 퐺.mo ; [퐸푝 ]⟩ where ⊊ In words, if hb orders two writes to the same location, then 퐸푝 퐺.E and dom(퐺.hb ; [퐸푝 ]) ⊆ 퐸푝 .) Since the empty mo must follow the same order. execution graph 퐺0 is trivially SRA-consistent, 퐺 cannot be empty. Let 푒 be a 퐺.hb-maximal event in 퐺.E, and let Example 3.11. Cycles in hb∪mo involving only one location 퐸′ = 퐺.E \{푒}. The minimality of 퐺 ensures that 퐺 ′ = are disallowed by write-coherence (using the fact that mo is ⟨퐸′, [퐸′] ; 퐺.rf ; [퐸′], [퐸′] ; 퐺.mo ; [퐸′]⟩ (the restriction of 퐺 total on writes to the same location). In contrast, irr-hb-mo to 퐸′) is SRA-consistent. Hence, our assumption on 푃 en- (of SRA) restricts the relation between [W푥 ] ; mo ; [W푥 ] and sures that 퐺 ′ is write/write-race free, thus using irr-hb-mo, [W푦] ; mo ; [W푦] also when 푥 ≠ 푦. The following example it follows that 퐺 ′.mo ⊆ 퐺 ′.hb ⊆ 퐺.hb. (adapted from [50]) demonstrates the difference: Now, since 퐺 is RA-consistent but not SRA-consistent, 퐺 ′ W (x, 1) W (y, 1) does not satisfy irr-hb-mo. Since 퐺 satisfies irr-hb-mo, it = = mo x : 1 y : 1 must be the case that there exists 푤 ∈ 퐸′ such that ⟨푒,푤⟩ ∈ y := 2 x := 2 W (y, 2) W (x, 2) (2+2W) 퐺.mo and ⟨푤, 푒⟩ ∈ (퐺.hb ∪ 퐺 ′.mo)+. Since 퐺 ′.mo ⊆ 퐺.hb, it a := y //1 a := x //1 rf mo hb ✓ RA ✗ SRA follows that ⟨푒, 푒⟩ ∈ 퐺. ; 퐺. . Hence, 퐺 does not satisfy R (y, 1) R (x, 1) write-coherence, which contradicts the fact that it is RA- An execution graph for this outcome must have rf and mo- consistent. □ edges as depicted above (to satisfy read-coherence), and it contains a (hb ∪ mo)-cycle, which is allowed by RA and dis- SRA allowed by SRA. 4 Operationalizing the Model In this section, we present an operational semantics for SRA, Since irr-hb-mo implies both irr-hb and write-coherence, formulating it as a memory system. While the formulation in the following trivially holds: ğ3 is declarative, it is straightforward to łoperationalizež it. Proposition 3.12. SRA-consistency implies RA-consistency. Indeed, instead of first generating a program execution graph and then checking for SRA-consistency, one may impose Reachability under RA is defined analogously to reach- consistency at each step of an incremental construction of ability under SRA (replacing łSRAž with łRAž in Def. 3.2). the execution graph. This results in an equivalent operational Then, we clearly have that all states of a program 푃 that are presentation, which is arguably simpler and easier to relate reachable under SRA are also reachable under RA. The con- to the alternative semantics we define in ğ5. verse does not hold in general, but it does hold for the large and widely used class of write/write-race-free programs. In- Definition 4.1. A memory system is a (possibly infinite) LTS spired by DRF models [8], we show that write/write-race over the alphabet (Tid × Lab) ∪ {휀}.

217 PLDI ’20, June 15ś20, 2020, London, UK Ori Lahav and Udi Boker write read rmw 퐺 = ⟨퐸, rf , mo⟩ 퐺 = ⟨퐸, rf , mo⟩ 퐺 = ⟨퐸, rf , mo⟩ 푒 = NextEvent(퐺.E, 휏, W(푥, 푣W)) 푒 = NextEvent(퐺.E, 휏, R(푥, 푣R)) 푒 = NextEvent(퐺.E, 휏, RMW(푥, 푣R, 푣W)) ′ ′ ′ 퐺 = ⟨퐸 ∪ {푒}, rf , mo ∪ (퐺.W푥 × {푒})⟩ 퐺 = ⟨퐸 ∪ {푒}, rf ∪ {⟨푤, 푒⟩}, mo⟩ 퐺 = ⟨퐸 ∪ {푒}, rf ∪ {⟨푤, 푒⟩}, mo ∪ (퐺.W푥 × {푒})⟩ 푤 ∈ 퐺.W푥 valW (푤) = 푣R 푤 ∈ 퐺.W푥 valW (푤) = 푣R 푤 ∉ dom(mo ; 퐺.hb? ; [E휏 ]) 푤 ∉ dom(mo)

휏,W (푥,푣W) ′ 휏,R (푥,푣R) ′ 휏,RMW (푥,푣R,푣W) ′ 퐺 −−−−−−−→opSRA 퐺 퐺 −−−−−−−→opSRA 퐺 퐺 −−−−−−−−−−−−→opSRA 퐺

Figure 3. Transitions of opSRA.

The alphabet symbols of the memory system are pairs in writing the value being read (valW (푤) = 푣R), and the thread Tid × Lab, representing the thread identifier and the label of executing the read is not aware of an mo-later write to the the operation, or 휀 for internal (silent) memory actions. same location (푤 ∉ dom(mo ; hb? ; [E휏 ])). An rmw step com- bines a read and a write, but it is enforced to pick the Example 4.2. The most well-known memory system is the mo-maximal write to the relevant location in the current one of sequential consistency, denoted here by SC. This graph as the reads-from source of the freshly added RMW. memory system simply tracks the most recent value writ- This semantics exploits the fact that hb ∪ mo is acyclic in ten to each location (or ⊥ for uninitialized locations). For- SRA-consistent execution graphs (as per irr-hb-mo). Hence, mally, it is defined by SC.Q ≜ Loc → (Val ∪ {⊥}), SC.Q ≜ 0 to generate an SRA-consistent execution graph in a run of an {휆푥 ∈ Loc. ⊥} and is given by: −→SC operational semantics, we can follow a total order extending 휇(푥) = 푣R hb∪mo, which guarantees that writes are executed following ′ ′ 휇 = 휇[푥 ↦→ 푣W] 휇(푥) = 푣R 휇 = 휇[푥 ↦→ 푣W] their mo-order. In turn, since RMWs should read from their

휏,W (푥,푣W) ′ 휏,R (푥,푣R) 휏,RMW (푥,푣R,푣W) ′ immediate mo-predecessor, we require that RMWs read from 휇 −−−−−−−→SC 휇 휇 −−−−−−−→SC 휇 휇 −−−−−−−−−−→SC 휇 the current mo-maximal write. Note that SC is oblivious to the thread that takes the action The next definition and simple theorem formalize the 휏,푙 ′ 휋,푙 ′ (휇 −−→SC 휇 iff 휇 −−→SC 휇 ), and it has no silent transitions. correspondence between SRA and opSRA. By synchronizing a program and a memory system, we Definition 4.4. A state 푝 of a program 푃 is reachable under obtain a concurrent system: a memory system 푀 if ⟨푝,푚⟩ is reachable in 푃푀 for some 푚 ∈ 푀.Q. Definition 4.3. A program 푃 and a memory system 푀 form a concurrent system, denoted by 푃푀 . It is an LTS over (Tid × Theorem 4.5. A state 푝 of program 푃 is reachable under SRA (Lab ∪ {휀})) ∪ {휀} whose set of states is 푃.Q × 푀.Q; its initial (see Def. 3.2) iff it is reachable under opSRA. states set is 푃.Q0 ×푀.Q0; and its transitions are łsynchronized transitionsž of 푃 and 푀, given by: Proof. Given an SRA-consistent execution graph 퐺, one ob- tains a run of opSRA by following any total order extending 휏,푙 ′ 푙 ∈ Lab 푝 −−→푃 푝 퐺.hb ∪퐺.mo. The preconditions required by each step follow 휏,푙 ′ 휏,휀 ′ 휀 ′ 푚 −−→푀 푚 푝 −−→푃 푝 푚 −→푀 푚 directly from the fact that 퐺 is SRA-consistent. For the con- 휏,푙 휏,휀 휀 verse, it suffices to note that all reachable states of opSRA are ⟨푝,푚⟩ ⟨푝′,푚′⟩ ⟨푝,푚⟩ ⟨푝′,푚⟩ ⟨푝,푚⟩ ⟨푝,푚′⟩ −−→푃푀 −−→푃푀 −→푃푀 SRA-consistent execution graphs. Hence, if ⟨푝,퐺⟩ is reach- Next, we present the memory system opSRA that is equiv- able in opSRA, then 퐺 is an SRA-consistent execution graph alent to SRA (in the sense that is made formal in Thm. 4.5). that is generated by 푃 with final state 푝. □ We also refer the reader to Fig. 5 on Page 12, which illustrates a run of opSRA for the SB example. Remark 2. Following [27], our formulation of opSRA does The states of opSRA are execution graphs capturing (par- not directly refer to the consistency predicates, but rather tially ordered) histories of executed actions (opSRA.Q ≜ articulate necessary and sufficient conditions that ensure EGraph); the (only) initial state is the empty execution graph that the target state is a consistent execution graph. It is 퐺0 (opSRA.Q0 ≜ {퐺0}); and the transitions are given in Fig. 3. possible to take a step further and develop an equivalent A write step by thread 휏 adds a corresponding fresh write semantics with more compact states that may feel łmore event 푒 to the graph placed after all events of thread 휏 and operationalž and intuitive. Indeed, it suffices to maintain a extends mo to order 푒 after all existing writes to the same partially ordered set of write events, together with a mapping location. A read step by thread 휏 adds a corresponding fresh of which writes each thread is aware of (the łobserved writes read event and justifies it with a reads-from edge. Its source setž of [21]). This can be implemented using timestamps, 푤 must be a write event to the same location (푤 ∈ 퐺.W푥 ), messages and thread views, as was done, e.g., in [25].

218 Decidable Verification under a Causally Consistent Shared Memory PLDI ’20, June 15ś20, 2020, London, UK

5 Making Strong Release/Acquire Lossy Example 5.2. Consider the MP program with its outcome For resolving the reachability problem under SRA, we intro- in Ex. 3.6. It is forbidden under SRA, and so we need to avoid duce an alternative memory system, which we call loSRA the following scenario: First, T1 writes 0 to x and adds a 0 (for łlossy-SRAž). In this section, we present loSRA, estab- corresponding option 표x to the (initially empty) list of T2, lish its equivalence to opSRA, and show how it is used to and then writes 1 to x without adding any option to any list decide the reachability problem. We begin with an intuitive (no thread reads 1 from x in this program outcome). Then, 1 discussion to motivate our definitions. T1 further writes 1 to y and adds a corresponding option 표y 0 A memory state of loSRA maintains a collection of łread- in the list of T1 placed before 표x. Finally, T2 may run: read 1 1 0 optionž lists for each thread, called the potential of the thread, from y (consuming 표y) and then 0 from x (consuming 표x). loc where each read option 표 contains a location (표), a value The restriction we impose on the positions of the added val (표) and two other components that are explained below. read options stems from the following key observation:2 Each read-option list stands for a sequence of possible future Shared-memory causality principle: After thread 휋 reads reads of the thread, listing the writes that it may read in from a certain write executed by thread 휏, it can perform a the order that it may read them. For example, the list 표 · 1 sequence of operations only if thread 휏 could perform the same 표 allows the thread to read val(표 ) from location loc(표 ) 2 1 1 sequence immediately after it executed the write. and then val(표2) from location loc(표2). These lists do not ascribe mandatory continuations, but rather possible futures Indeed, if thread 휏 has just performed a write 푤, then after (hence, read options). In the beginning, the empty list is thread 휋 reads from 푤, it łsynchronizesž with 휏 and it is assigned to all threadsÐbefore any write is executed, no reads thus confined by the sequences of reads that 휏 may perform. are possible (recall that we assume explicit initialization, Hence, to allow the addition of a read option 표 in certain see Remark 1). In addition, the semantics is designed so that positions of a list 퐿 of some thread 휋, we require a justifica- read-option lists are łlossyž, allowing a non-deterministic tion: the suffix of 퐿 after the first occurrence of 표 should be step that removes arbitrary options from the lists. a subsequence of a read-option list of the writing thread 휏. The read-option lists in the potentials dictate the possible This guarantees that after 휋 reads from a write 푤 of 휏, it will read steps threads can take: for a thread 휏 to read 푣 from 푥, not be able to read something that 휏 could not read at the 1 an option 표 with val(표) = 푣 and loc(표) = 푥 must be the first time that it wrote 푤. (Revisiting Ex. 5.2, the read option 표y 0 0 in each of 휏’s lists. Then, to progress to the next option in cannot be placed before 표x, because T1 cannot have 표x in its the list, the thread may consume these options, and discard lists at the point of writing 1 to y.) the first element from each of its lists. Now, since the potential of thread 휏 is used both for (i) dic- A write step is more involved, encapsulating the require- tating future reads of 휏, and (ii) justifying placement of read ments of opSRA. First, since opSRA performs write events options that are generated by 휏’s write steps, we may need following their mo-order, when a thread writes to 푥, it cannot more than one option list for each thread. We also allow to later read 푥 from a write that was already performed (this discard existing lists in silent moves of the memory system. would violate read-coherence). Accordingly, we do not allow This is demonstrated in the following example. loc( ) = a thread to write to 푥 if some read option 표 with 표 푥 Example 5.3. Consider the following program, whose an- appears in its potential. Second, when a thread performs a notated outcome is allowed under SRA: write of 푣 to 푥, it allows future reads from this write. That x := 0 y := 0 z := 0 d := x //1 e := y //1 f := z //1 is, read options 표 with loc(표) = 푥 and val(표) = 푣 may be x := 1 y := 1 z := 1 1 1 1 d2 := y //1 e2 := z //1 f2 := x //1 added to every list of every thread. This makes the write step a1 := z //1 b1 := x //1 c1 := y //1 d3 := z //0 e3 := x //0 f3 := y //0 in loSRA (unlike the one of opSRA) non-deterministicÐthe a2 := y //0 b2 := z //0 c2 := x //0 writer essentially has to łguessž what thread will read from Suppose that it can be obtained by the memory system out- the new write and when. lined above with one read-option list per thread (i.e., single- But, where in the lists should we allow to add such op- ton potentials). Suppose, w.l.o.g., that z := 1 is the last write tions? The following examples demonstrate two possible performed in the execution. Later, T3 has to read 1 from y cases. We write in them 표푣 for a read option of value 푣 from 1 푥 and 0 from x. Hence, its read-option list must include 표y and location 푥. 0 1 표x in this order. In addition, a read option 표z should be placed T 1 · 0 1 · 0 Example 5.1. Consider the IRIW program with its (SRA- in 6’s list before 표x 표y. The justification for it requires 표x 표y to be a subsequence of T3’s list. This implies that T3’s list allowed) outcome in Ex. 3.8. Clearly, the first step may only 1 0 1 0 should contain some interleaving of 표y ·표x and 표x ·표y. But, no be a write by T1 or T4. Suppose, w.l.o.g., that T1 begins. Since 0 such interleaving is a possible future for T3 (and thus cannot T3 reads 0 from x, a read option 표x should be added in the 1 be generated by loSRA): reading 표y does not allow T3 to read lists of T3. Now, before reading 0 from x, T3 has to read 1 1 from y. Hence, when T4 writes 1 to y, a read option 표y should 2A weaker observation, which only considers single reads, was essential for 0 be placed before 표x in the lists of T3. the of OGRAÐan Owicki Gries logic for RA introduced in [32].

219 PLDI ’20, June 15ś20, 2020, London, UK Ori Lahav and Udi Boker

0 1 0 표y later; and reading 표x does not allow T3 to read 표x later. By Recall that in opSRA, an RMW may only read from the mo- allowing more than one read-option list per thread, we can maximal write to the relevant location. To achieve this in 1 0 1 0 have 표y ·표x and 표x ·표y in two separate lists in the potential of loSRA, we include an additional field in read options, which T3Ðboth are possible continuations for it after z := 1. Then, is a binary flag that can be set to either R or RMW. Intuitively, 1 0 after executing z := 1, T3 may łlosež the justifying list 표x · 표y, an RMW value means that the read option is set to read from 1 0 and choose to continue with 표y · 표x for its own reads. the mo-maximal write. Accordingly, an rmw step may only consume read options marked as RMW. Since write steps to Another complication arises due to the fact that read op- 푥 replace the mo-maximal write to 푥 in the execution graph, tions do not uniquely identify write events in the execution they may choose to mark any of the added read options as graph (this is unavoidable: for the decision procedure, we RMW, but they can only execute when no read option (of any need the alphabet of read options to be finite): thread) with location 푥 is marked as an RMW. Example 5.4. Consider the following program: Next, we turn to the formal definitions. x := 0 y := 0 a := z //1 c := w //1 Notation 5.5 (Sequences). We use 휖 to denote the empty x := 1 y := 1 w := 1 ✗ SRA (2MP) d := y //0 sequence. The length of a sequence 푠 is denoted by |푠| (in z := 1 z := 1 b := x //0 particular |휖| = 0). We often identify sequences with their Its annotated outcome is disallowed under SRA. Indeed, since underlying functions (whose domain is {1,...,|푠|}), and write T3 reads x = 0 after z = 1, the read of z must read from the 푠(푘) for the symbol at position 1 ≤ 푘 ≤ |푠| in 푠. We write write of T2. But then, T4, after reading w = 1 (from T3) cannot 휎 ∈ 푠 if 휎 appears in 푠, that is if 푠(푘) = 휎 for some 1 ≤ 푘 ≤ |푠|. read y = 0. However, the semantics described so far allows We use ł·ž for the concatenation of sequences, which is lifted this outcome as in the following snippet: to concatenation of sets of sequences in the obvious way.

T T T T T We identify symbols with sequences of length 1 or their {휖} {휖} {휖} {휖} −−−−→1 −−−−→1 −−−−→2 −−−−→2 −−−−→1 W (x,0) W (x,1) W (y,0) W (y,1) W (z,1) singletons when needed (e.g., in expressions like 휎 · 푆). 0 0 0 1 0 0 T2 0 0 1 0 1 0 0 {표y} {표x} {표x, 표z표y} {표y} −−−−→ {표y} {표x} {표z표x, 표z표y} {표y} W (z,1) Definition 5.6. Read options, read-option lists and potentials T3 0 0 0 0 0 T3 0 0 0 0 1 0 −−−−→ {표y} {표x} {표x, 표y} {표y} −−−−→ {표y} {표x} {표x, 표y} {표w표y} ... are defined as follows: R (z,1) W (w,1) 1. A read option is a quadruple 표 = ⟨휏, 푥, 푣,푢⟩, where 휏 ∈ Tid, T 1 What went wrong? The problem arises when 3 reads from 푥 ∈ Loc, 푣 ∈ Val and 푢 ∈ {R, RMW}. The functions tid, z 1 0 1 0 . At this point it has two possible futures, 표z표x and 표z표y. loc, val and rmw return the thread identifier휏 ( ), location Since read options, consisting of location and value, do not (푥), value (푣), and RMW flag푢 ( ) of a given read option. uniquely identify writes, it may read 1 from z, and remain 0 0 2. A read-option list 퐿 is a sequence of read options. with both 표x and 표y. Now, it uses one of these options to 1 3. A potential 퐵 is a finite non-empty set of read-option lists. justify the position of 표w in the list of T4, and the other for its own read. However, in a single run of opSRA, when reading We define an ordering on read-option lists, which extends 1 from z, T3 must pick which write event to read from, and to potentials and to assignments of potentials to threads. then, either it cannot read x = 0 or it cannot read y = 0. Definition 5.7. The (overloaded) relation ⊑ is defined by: To remedy this problem, we make read options to be more ′ informative. Together with location and value, read options 1. on read-option lists: 퐿 ⊑ 퐿 if 퐿 is a (not necessarily contiguous) subsequence of 퐿′; also include the thread identifier that performed the write. ′ ′ ′ ′ When a thread writes, it adds options with its own thread 2. on potentials: 퐵 ⊑ 퐵 if ∀퐿 ∈ 퐵. ∃퐿 ∈ 퐵 . 퐿 ⊑ 퐿 (a.k.a. łHoare orderingž); identifier in the different lists. For athread 휏 to read 푣 from ′ val = loc = 3. on functions from Tid to the set of potentials: B ⊑ B if 푥, a read option 표 with (표) 푣 and (표) 푥 and some ′ unique writing thread identifier must be the first in every B(휏) ⊑ B (휏) for every 휏 ∈ Tid. 1 read-option list of 휏. In this example, the two 표z options will The loSRA memory system is formally defined as follows. have different thread identifiers, which forces T3 to discard Figure 5 illustrates a run of loSRA for the SB program (Ex. 3.5) one of its lists before reading. together with the corresponding run of opSRA. Even with thread identifiers, read options do not uniquely identify write events. Nevertheless, as our proof shows, an Definition 5.8. loSRA is defined by: loSRA.Q is the set of ambiguity inside the writing thread does not harm the ade- functions B assigning a potential to every휏 ∈ Tid; loSRA.Q0 = 3 quacy of the semantics. Roughly speaking, it can be resolved {휆휏 ∈ Tid. {휖}}; and the transitions are given in Fig. 4. by picking the po-earliest write event, as reading from it 3To achieve implicit initialization of all locations to 0, one should take enforces the weakest constraints for the rest of the run. loSRA.Q0 to consist of all functions assigning to each thread sequences consisting of read options of the form ⟨T0, 푥, 0,푢⟩ where T0 is a distinguished Finally, RMWs behave like an atomic combination of a read thread identifier that is not used in programs (corresponds to the initializing and a write, with a slight adaptation of the above semantics. thread, see Remark 1).

220 Decidable Verification under a Causally Consistent Shared Memory PLDI ’20, June 15ś20, 2020, London, UK write ′ ′ rmw ∀휋 ∈ Tid, 퐿 ∈ B (휋). ∃푛 ≥ 0,푢1, ... ,푢푛, 퐿0, ... ,퐿푛. ′ loc(표) = 푥 val(표) = 푣R 퐿 = 퐿0 ·⟨휏, 푥, 푣W,푢1⟩· 퐿1 ·...·⟨휏, 푥, 푣W,푢푛⟩· 퐿푛 read rmw(표) = RMW ∧ 퐿0 ·...· 퐿푛 ∈ B(휋) ∧ 퐿1 ·...· 퐿푛 ∈ B(휏) loc(표) = 푥 B = Bmid [휏 ↦→ 표 ·Bmid (휏)] ∧ (휋 = 휏 =⇒ ∀표 ∈ 퐿0 ·...· 퐿푛. loc(표) ≠ 푥) val(표) = 푣R lower ′ ′ 휏,W (푥,푣W) ′ ′ ∧ ∀표 ∈ 퐿0 ·...· 퐿푛. loc(표) = 푥 =⇒ rmw(표) = R B = B [휏 ↦→ 표 ·B (휏)] Bmid −−−−−−−→loSRA B B ⊑ B 휏,W (푥,푣W) ′ 휏,R (푥,푣R) ′ 휏,RMW (푥,푣R,푣W) ′ 휀 ′ B −−−−−−−→loSRA B B −−−−−−−→loSRA B B −−−−−−−−−−→loSRA B B −→loSRA B

Figure 4. Transitions of loSRA.

The definition of the write step generally follows the The rmw step is an atomic sequencing of read and write intuitive explanation above. Every read-option list after the to the same location. The read part can only be performed write transition is obtained from some previous list, with provided that the first option in all lists is marked with RMW. the addition of 푛 ≥ 0 read options of the current write, The lower transition allows to remove read options, as provided that: (i) the suffix of the existing list right after well as full read-option lists, at any point. It also allows to add the position of the first added option is a read-option list of new lists, provided that each new list is łat most as powerfulž the writing thread; (ii) the lists of the writing thread (which as some existing list (as used in Remark 3). Intuitively, lower are not discarded in this transition) cannot have options can only reduce the possible traces, while it allows us to show to read from 푥 besides the ones that are currently added; that loSRA is a well-structured transition system. and (iii) the original lists (which are not discarded in this Example 5.9. Consider the 2+2W program with its (SRA- transition) cannot have an RMW option for 푥. Note that since disallowed) outcome in Ex. 3.11. To see that this outcome the universal quantification is on lists of the new state, the cannot be obtained by loSRA, consider the last write executed step allows to łduplicatež lists before modifying them, as in a run of this program. Suppose, w.l.o.g., that it is y := 2 well as to łdiscardž complete lists (as often useful when a by T . Before executing this write, T may not have any read certain list is needed only as a justification for positioning a 1 1 options of location y in its lists. Hence, a read option of the read option). We also note that several RMW options can be form ⟨휋, y, 1,푢⟩ should be added to T ’s potential after T added, but only one of them may be later fulfilled, due to 1 1 executed y := 2. This contradicts our assumption that y := 2 condition (iii). was the last executed write. Remark 3. Our formal write step insists on having a jus- Example 5.10. Consider the 2RMW program with its (SRA- tification in the form of a complete read-option list ofthe disallowed) outcome in Ex. 3.9. To try to obtain this out- writing thread (퐿 ·...·퐿 ∈ B(휏)). It suffices, however, for the 1 푛 come in loSRA, the x := 0 by T must add a read option suffix after the first added read option tobea subsequence 1 ⟨T , x, 0, RMW⟩ in both its own list and in a list of T . But, the of some list of the writing thread ({퐿 ·...· 퐿 } ⊑ B(휏)). In- 1 2 1 푛 execution of the first RMW, which consumes one of these deed, this less restrictive step is derivable by combining a options, can only proceed after the other option marked with lower step and a write step. Note also that for 휋 = 휏 RMW is discarded. Hence, the second RMW cannot read 0, and (adding read options in the lists of the thread that performed this outcome cannot be obtained by loSRA. the write), this means that no justification is needed (since 퐿0 ·...· 퐿푛 ∈ B(휏) implies {퐿1 ·...· 퐿푛 } ⊑ B(휏)). Next, we establish the equivalence of loSRA and opSRA. To do so, we define a relation ⋎ ⊆ loSRA.Q × opSRA.Q, The read step requires the first option in all lists in the formalizing the intuitive simulation discussed so far between executing thread’s potential the read to be the same, and loSRA’s lists and opSRA’s execution graphs. For defining ⋎, consumes it from all these lists. Note that, by definition, the ′ we first define a łwrite listž linking the read options ina potential B (휏) is non-empty, and so the set B(휏) as defined read-option list 퐿 to write events in an execution graph 퐺. in the step is non-empty. When all options are consumed, 휏’s potential consists of a single empty list. Definition 5.11. A write list is a sequence푊 of write events. A write list 푊 is a ⟨퐺, 퐿⟩-write-list if |퐿| = |푊 | and the Remark 4. Our formal read step always discards the first following hold for every 1 ≤ 푘 ≤ |푊 | with 퐿(푘) = ⟨휏, 푥, 푣,푢⟩: option from the lists, which was used to justify the read. An • 푊 (푘) ∈ 퐺.W. alternative semantics that keeps the lists unchanged in read • tid(푊 (푘)) = 휏, loc(푊 (푘)) = 푥 and val (푊 (푘)) = 푣. steps (allowing to discard the first option using the lower W • if 푢 = RMW, then 푊 (푘) ∉ dom(퐺.mo). step) would be completely equivalent. Indeed, the write step that added the consumed option could always add multiple A write list 푊 is ⟨퐺, 휏⟩-consistent if, intuitively, the ex- identical consecutive read options. tension of 퐺 with a sequence of read events in thread 휏

221 PLDI ’20, June 15ś20, 2020, London, UK Ori Lahav and Udi Boker

W (x, 0) W (x, 0) W (x, 0) W (y, 0) W (x, 0) W (y, 0) W (x, 0) W (y, 0) W (x, 0) W (y, 0)

mo mo mo rf mo rf mo mo rf mo T T T T T T −−−−−→1 −−−−−→1 W (x, 1) −−−−−→2 W (x, 1) −−−−−→1 W (x, 1) −−−−−→2 W (x, 1) W (y, 1) −−−−−→2 W (x, 1) W (y, 1) W (x,0) W (x,1) W (y,0) R (y,0) W (y,1) R (x,0)

R (y, 0) R (y, 0) R (y, 0) R (x, 0)

{휖 } T {휖 } T {휖 } T {⟨T2, y, 0, R⟩} T {휖 } T {휖 } T {휖 } −−−−−→1 −−−−−→1 −−−−−→2 −−−−−→1 −−−−−→2 −−−−−→2 {휖 } W (x,0) {⟨T1, x, 0, R⟩} W (x,1) {⟨T1, x, 0, R⟩} W (y,0) {⟨T1, x, 0, R⟩} R (y,0) {⟨T1, x, 0, R⟩} W (y,1) {⟨T1, x, 0, R⟩} R (x,0) {휖 }

Figure 5. Illustration of runs of opSRA and loSRA for the SB program (Ex. 3.5). In opSRA’s states (execution graphs), events of T1 are on the left and of T2 on the right. In loSRA’s states (a potential for each thread), the potential of T1 is at the top and of T2 at the bottom. In this simple example, all option lists consist of at most one option and all potentials are singletons. reading from the sequence of write events in 푊 satisfies Now, suppose that (i) B ⋎ 퐺, witnessed by read-coherence. Thus, we ensure that thread 휏 is not already a ⟨퐺, 휋⟩-consistent ⟨퐺, 퐿⟩-write-list푊⟨휋,퐿⟩ B ⋎ 퐺 for every 휋 ∈ Tid and 퐿 ∈ B(휋); and aware of some write that is mo-later than some write of 푊 , 휏, 푙 휏, 푙 휏,푙 ′ ′ and that after reading from a write 푤1 of 푊 , thread 휏 will (ii) B −−→loSRA B . We construct 퐺 such B′ ⋎ 퐺 ′ not become aware of some write that is mo-later than some ′ ′ 휏,푙 ′ that B ⋎퐺 and퐺 −−→opSRA 퐺 (as depicted ∃ write 푤2 that appears after 푤1 in 푊 . Formally: on the right).

Definition 5.12. A write list 푊 is called ⟨퐺, 휏⟩-consistent if We describe here the write and read steps (the rmw step ? 휏 푊 (푘) ∉ dom(퐺.mo ; 퐺.hb ; [E ∪ {푊 (푗) | 1 ≤ 푗 < 푘}]) for is obtained by carefully combining them). every 1 ≤ 푘 ≤ |푊 |. For a write step, 퐺 ′ is trivially constructed by adding a new write event 푤 to 퐺, placed last in po inside the writing Now, ⋎ relates a loSRA state B with an execution graph thread 휏, and last in mo among the writes to the same location. 퐺 if each read-option list in B has an appropriate write list. 휏,푙 ′ ′ ′ Then, 퐺 −−→opSRA 퐺 is trivial. To show that B ⋎ 퐺 , we con- struct for every 휋 ∈ Tid and 퐿′ ∈ B′(휋), a ⟨퐺 ′, 휋⟩-consistent Definition 5.13. A state B ∈ loSRA.Q matches an execution ⟨퐺 ′, 퐿′⟩-write-list 푊 ′. The write list 푊 ′ maps: (i) the new ⋎ graph 퐺, denoted by B 퐺, if for every 휏 ∈ Tid and 퐿 ∈ B(휏), options to the new write event 푤; (ii) the łoldž options that there exists a ⟨퐺, 휏⟩-consistent ⟨퐺, 퐿⟩-write-list. appear before the first new option as mapped in the existing write list 푊 for the corresponding list in B(휋); and (iii) each Next, we establish the equivalence of loSRA and opSRA, old option that appears after the first new option to the showing that every trace of opSRA is a trace of loSRA, and mo-maximal write event among (1) its mapping in the exist- vice versa. Notice that 휀-transitions do not affect reachabil- ing write list 푊 for the corresponding list in B(휋) and (2) its ity of problem states, which only concerns the sequence mapping in the existing write list 푊휏 for the justifying list of labels of the program. As loSRA employs 휀-transitions in B(휏). Roughly speaking, picking the mo-maximal write in (lower), while opSRA does not, the trace equivalence ig- the third case ensures that 푊 ′ is ⟨퐺 ′, 휋⟩-consistent: In the nores 휀-transitions. new state, thread 휋 might not be able to read from a write that it previously read from (since it łsynchronizedž with 휏 Definition 5.14. Two traces are equivalent if their restric- that may be aware of a later write) and might not be able tions to non 휀-transitions are equal. to read from the a write that 휏 reads from (since 휋 may be already aware of a later write); but, it may read from the Full proofs of Lemmas 5.15 and 5.16 are provided in [1], later write between these two writes. and their mechanization in the Coq proof assistant is pro- In turn, to simulate a read step of loSRA in opSRA, we vided in the artifact accompanying this paper. need to pick a write event 푤 from which the added read event 푟 will read-from in 퐺 ′. We pick 푤 to be the po-minimal Lemma 5.15. For every trace of loSRA there is an equivalent event among the write events that the 푊⟨휏,퐿⟩ lists associate trace of opSRA. to the first option in some 퐿 ∈ B(휏). (All these options are consumed during the step, and their corresponding writes Proof Outline. We show that ⋎ constitutes a (weak) forward are all in the same thread as dictated by the thread identifier simulation from loSRA to opSRA. Handling the lower step stored in the read options). The consistency of the푊⟨휏,퐿⟩ lists is easy (new write lists are restrictions of the ones we have). ensures that 푤 ∉ dom(퐺.mo ; 퐺.hb? ; [E휏 ]), so we can make

222 Decidable Verification under a Causally Consistent Shared Memory PLDI ’20, June 15ś20, 2020, London, UK the read step in opSRA when reading from 푤. To show that {푠 ∈ 푆 | ∃푢 ∈ 푈 . 푢 ≾ 푠}; a set 푈 ⊆ 푆 is called upward closed B′ ⋎퐺 ′, for every thread 휋 ≠ 휏, we simply reuse the write list if 푈 = ↑푈 ; and a set 퐵 ⊆ 푈 is called a basis of 푈 if 푈 = ↑퐵. 푊⟨휋,퐿⟩ that we had for 퐺; while for thread 휏 itself, we shift its The properties of a wqo ensure that every upward closed set ′ ≜ write lists by one (setting푊⟨휏,퐿⟩ 휆푘. 푊⟨휏,퐿⟩ (1+푘)), and use has a finite basis. the po-minimality of 푤 to establish ⟨퐺 ′, 휏⟩-consistency. □ A well-structured transition system (WSTS) is an LTS 퐴 equipped with a wqo ≾ on 퐴.Q that is compatible with 퐴, For the converse, we favor backward simulation, since that is: if 푞1 −→퐴 푞2 and 푞1 ≾ 푞3, then there exists 푞4 ∈ 퐴.Q loSRA requires to łguessž the future, and without knowing ∗ ≾ such that 푞3 −→퐴 푞4 and 푞2 푞4. The coverability problem the target state, we cannot construct the next step. for ⟨퐴, ≾⟩ asks whether an input state 푞 ∈ 퐴.Q is coverable, ′ ′ Lemma 5.16. For every trace of opSRA there is an equivalent namely: is some state 푞 with 푞 ≾ 푞 reachable in 퐴? trace of loSRA. Coverability is decidable (see, e.g., [7, 22]) for a WSTS ⟨퐴, ≾⟩ provided that ≾ is decidable and the following hold: Proof Outline. We show that ⋎−1 constitutes a backward sim- (i) effective initialization: there exists an that ac- ulation from opSRA to loSRA. cepts a state 푞 ∈ 퐴.Q and decides whether ↑{푞}∩퐴.Q = ∅. For the main proof obligation (depicted on 0 (ii) effective pred-basis: there exists an algorithm that accepts 휏,푙 ′ the right), suppose that 퐺 −−→opSRA 퐺 and ∃ Q B ⋎ 퐺 a state 푞 ∈ 퐴. and returns a finite basis of ↑pred퐴 (↑{푞}). B′ ⋎ 퐺 ′, witnessed by a ⟨퐺 ′, 휋⟩-consistent Roughly speaking, these conditions ensure that (i) backward ⟨퐺 ′, 퐿′⟩-write-list 푊 ′ for every 휋 ∈ Tid 휏, 푙 휏, 푙 ⟨휋,퐿′ ⟩ reachability analysis from 푞 will converge to a fixed point; ′ ′ ′ ⋎ ′ and 퐿 ∈ B (휋). We construct a state B such B 퐺 (ii) each step in its calculation is effective; and (iii) we can 휏,푙 ′ that B −−→loSRA B and B ⋎ 퐺. check whether the fixed point contains an initial state. Again, we describe here the write and read steps. First, loSRA for a write step, let 푤 be the write event that is added in as a Well-Structured Transition System. The opSRA’s transition from 퐺 to 퐺 ′. Roughly speaking, we con- ⊑ ordering on the states of loSRA is clearly decidable and struct B by removing the read options that were associated also forms a wqo. Indeed, by Higman’s lemma, ⊑ is a wqo to 푤 in the existing write lists, and copying the suffix of each on the set of all read-option lists. In turn, its lifting to po- read-option list after the first read option associated to 푤 tentials (which are finite by definition) is a wqo on the set to the potential of thread 휏. The write lists for B are then of all potentials (see [44]). Finally, by Dickson’s lemma, the induced by those for B′ in the obvious way. pointwise lifting of ⊑ to functions assigning a potential to In turn, for a read step, let 푟 be the read event that is every 휏 ∈ Tid (i.e., states of loSRA) is also a wqo. added in opSRA’s transition from 퐺 to 퐺 ′, and 푤 be the Now, let 푃 be a program. The ⊑ ordering is naturally write event that 푟 reads from in 퐺 ′. Then, we construct B by lifted to states of the concurrent system 푃loSRA (that is, pairs ′ ′ ⟨푝, B⟩ ∈ 푃.Q × loSRA.Q, see Def. 4.3) by defining ⟨푝, B⟩ ⊑ setting B = B [휏 ↦→ ⟨tid(푤), loc(푟), valR (푟), R⟩·B (휏)], ′ ′ ′ ′ 휏,푙 ′ ⟨푝 , B ⟩ iff 푝 = 푝 and B ⊑ B . and B −−→loSRA B follows by definition. Now, to show that ′ B ⋎퐺, we use the write lists of B for every 휋 ≠ 휏. For 휋 = 휏 Lemma 6.1. 푃loSRA equipped with ⊑ is a WSTS that admits we append 푤 in the beginning of the write lists of B′. □ effective initialization and effective pred-basis. We conclude with the equivalence of opSRA and loSRA. Proof. First, since 푃.Q is (by definition) finite and ⊑ is a wqo on loSRA.Q, we have that ⊑ is a wqo of 푃 .Q. Theorem 5.17. For every program 푃, the set of program states loSRA Second, since lower is explicitly included in loSRA, ⊑ is that are reachable under opSRA coincides with the set of pro- clearly compatible with 푃 . Indeed, given 푞 = ⟨푝 , B ⟩, gram states that are reachable under loSRA. loSRA 1 1 1 = = 푞2 ⟨푝2, B2⟩ and 푞3 ⟨푝3, B3⟩ such that 푞1 −→푃loSRA 푞2 and ∗ Proof. Directly follows from Lemmas 5.15 and 5.16. □ 푞1 ⊑ 푞3 (so 푝 = 푝 ), for 푞4 = 푞2, we have 푞3 −→ 푞4 (since 1 3 푃loSRA 휀 B −→ B using the lower step) and 푞 ⊑ 푞 . 6 Decidability of the Reachability Problem 3 loSRA 1 2 4 Next, 푃loSRA trivially admits effective initialization. Indeed, We show how loSRA is used for establishing the decidability the states ⟨푝, B⟩ for which ↑{⟨푝, B⟩} ∩ 푃loSRA.Q0 ≠ ∅ are of the reachability problem under the declarative SRA model exactly the initial states themselvesÐ푃.Q0 × {휆휏. {휖}}. (see Def. 3.3). We start with recalling the framework of well- Finally, [1] establishes the effective pred-basis for 푃loSRA. structured transition systems. For this matter, we demonstrate how to calculate a finite ↑ 훼 (↑{B′}) ⟨ W( )⟩ Preliminaries. A well-quasi-ordering (wqo) on a set 푆 is basis of predloSRA for 훼 of the form 휏, 푥, 푣W , R RMW □ a reflexive and transitive relation ≾ on 푆 such that for every ⟨휏, (푥, 푣R)⟩, ⟨휏, (푥, 푣R, 푣W)⟩ or 휀. ≾ infinite sequence 푠1, 푠2,... of elements of 푆, we have 푠푖 푠푗 It is now easy to establish the decidability of reachability < ≾ for some 푖 푗. In a context of a set 푆 and a wqo on 푆, the under loSRA. upward closure of a set 푈 ⊆ 푆, denoted by ↑푈 , is given by

223 PLDI ’20, June 15ś20, 2020, London, UK Ori Lahav and Udi Boker

Theorem 6.2 (loSRA reachability). Given a program 푃 and We believe that the potential-based semantics (both specif- a state 푝 ∈ 푃.Q, it is decidable to check whether 푝 is reachable ically for SRA and as a general idea) may be of independent (see Def. 4.4) under the memory system loSRA. interest in the development of verification techniques for pro- grams running under weak consistency, including, but not ⊑ Proof. Since the first component (the program state) in - limited to, program logics and model-checking techniques. ordered pairs of 푃loSRA’s states is equal, reachability under In particular, we are interested in developing abstraction loSRA ⟨ ⊑⟩ is reduced to coverability in 푃loSRA, , which is de- techniques, as was done for TSO and similar buffer-based □ cidable by Lemma 6.1 and the framework of [7]. models (see, e.g., [28, 46]). Other directions for future work We are now in position to prove our main results. include handling other variants of causally consistent shared- memory (see, e.g., [16]), supporting transactions (to enable, Corollary 6.3. The SRA reachability problem is decidable. e.g., full verification of client programs under PSI, see ğ3.1) and studying verification of parametrized programs under Proof. Directly follows from Theorems 4.5, 5.17 and 6.2. □ causal consistency (which is decidable for TSO [4, 6]). Corollary 6.4 (RA race-free reachability). Given a program 푃 that is write/write-race-free under SRA (see Def. 3.13) and a Acknowledgments state 푝 ∈ 푃.Q, it is decidable to check whether 푝 is reachable We thank the PLDI’20 reviewers for their helpful feedback. under RA. This research was supported by the Israel Science Founda- tion (grants 5166651 and 1373/16), and by Len Blavatnik and □ Proof. Directly follows from Thm. 3.14 and Corollary 6.3. the Blavatnik Family foundation. The first author was also supported by the Alon Young Faculty Fellowship. 7 Conclusion and Future Work We established the decidability of reachability under SRA, References a fundamental causal consistency memory model. For that [1] Ori Lahav and Udi Boker. 2020. Supplementary material for this paper. matter, we developed a novel operational semantics for SRA https://www.cs.tau.ac.il/~orilahav/papers/pldi20full.pdf and showed that it meets the requirements for decidability of [2] Parosh Aziz Abdulla. 2010. Well (and better) quasi-ordered transition the framework of well-structured transition systems. Besides systems. The Bulletin of Symbolic Logic 16, 4 (2010), 457ś515. http: //www.jstor.org/stable/40961367 the theoretical interest, Abdulla et al.[4] demonstrate that [3] Parosh Aziz Abdulla, Jatin Arora, Mohamed Faouzi Atig, and Shankara- similar verification procedures (also of non-primitive recur- narayanan Krishna. 2019. Verification of programs under the release- sive complexity) may be actually practical for challenging acquire semantics. In PLDI. ACM, New York, NY, USA, 1117ś1132. (even though naturally quite small) and synchro- https://doi.org/10.1145/3314221.3314649 nization mechanisms. We plan to explore this in the future. [4] Parosh Aziz Abdulla, Mohamed Faouzi Atig, Ahmed Bouajjani, and Tuan Phong Ngo. 2018. A load-buffer semantics for total store ordering. Reachability is undecidable under C/C++11’s causal consis- Logical Methods in Computer Science Volume 14, Issue 1 (Jan. 2018). tency model, RA [3]. Intuitively, this stems from the fact that https://doi.org/10.23638/LMCS-14(1:9)2018 RA requires to maintain mo separately from the execution [5] Parosh Aziz Abdulla, Mohamed Faouzi Atig, Bengt Jonsson, and order, while SRA allows the execution of writes following Tuan Phong Ngo. 2018. Optimal stateless model checking under the hb ∪ mo. (We note that the existing undecidability result cru- release-acquire semantics. Proc. ACM Program. Lang. 2, OOPSLA, Article 135 (Oct. 2018), 29 pages. https://doi.org/10.1145/3276505 cially employs RMWs, and the decidability of RA without [6] Parosh Aziz Abdulla, Mohamed Faouzi Atig, and Rojin Rezvan. 2019. RMW operations is still open.) Since RA and SRA coincide on Parameterized verification under TSO is PSPACE-complete. Proc. write/write-race-free programs, and write/write-race free- ACM Program. Lang. 4, POPL, Article 26 (Dec. 2019), 29 pages. https: dom can be checked under SRA (Thm. 3.14), our result allows //doi.org/10.1145/3371094 the verification of safety properties under RA for this large [7] Parosh Aziz Abdulla, Karlis¯ Čerans,¯ Bengt Jonsson, and Yih-Kuen Tsay. 2000. Algorithmic analysis of programs with well quasi-ordered and widely used class of programs. Concurrent separation domains. Information and Computation 160, 1 (2000), 109 ś 127. https: logics [25, 48, 49], designed for verification under RA, are //doi.org/10.1006/inco.1999.2843 also essentially limited to reason only about write/write-race- [8] Sarita V. Adve and Mark D. Hill. 1990. Weak orderingÐa new definition. free programs and stateless model checking is significantly In ISCA. ACM, New York, NY, USA, 2ś14. https://doi.org/10.1145/ simpler with this assumption (see [27, ğ5]). We also note 325164.325100 [9] Jade Alglave, Luc Maranget, and Michael Tautschnig. 2014. Herding that it is straightforward to support C/C++11’s non-atomics, cats: modelling, simulation, testing, and data mining for weak memory. with łcatch-firež semantics (i.e., data races are errors) inaddi- ACM Trans. Program. Lang. Syst. 36, 2, Article 7 (July 2014), 74 pages. tion to release/acquire accesses and sequentially consistent https://doi.org/10.1145/2627752 fences (which are modeled as RMWs as in Ex. 3.10). Indeed, [10] Masoud Saeida Ardekani, Pierre Sutra, and Marc Shapiro. 2013. Non- as demonstrated in [25], it suffices to check for data races monotonic snapshot isolation: Scalable and strong consistency for geo-replicated transactional systems. In SRDS. IEEE Computer Society, assuming RA semantics. The extension to other fragments Washington, DC, USA, 163ś172. https://doi.org/10.1109/SRDS.2013.25 of C/C++11, such as relaxed and sequentially consistent ac- [11] Mohamed Faouzi Atig, Ahmed Bouajjani, Sebastian Burckhardt, and cesses, is left to future work. Madanlal Musuvathi. 2010. On the verification problem for weak

224 Decidable Verification under a Causally Consistent Shared Memory PLDI ’20, June 15ś20, 2020, London, UK

memory models. In POPL. ACM, New York, NY, USA, 7ś18. https: [31] Ori Lahav and Roy Margalit. 2019. Robustness against release/acquire //doi.org/10.1145/1706299.1706303 semantics. In PLDI. ACM, New York, NY, USA, 126ś141. https://doi. [12] Mohamed Faouzi Atig, Ahmed Bouajjani, Sebastian Burckhardt, and org/10.1145/3314221.3314604 Madanlal Musuvathi. 2012. What’s decidable about weak memory [32] Ori Lahav and Viktor Vafeiadis. 2015. Owicki-Gries reasoning for models?. In ESOP. Springer-Verlag, Berlin, Heidelberg, 26ś46. https: weak memory models. In ICALP. Springer-Verlag, Berlin, Heidelberg, //doi.org/10.1007/978-3-642-28869-2_2 311ś323. https://doi.org/10.1007/978-3-662-47666-6_25 [13] Mark Batty, Kayvan Memarian, Kyndylan Nienhuis, Jean Pichon- [33] Ori Lahav and Viktor Vafeiadis. 2016. Explaining relaxed memory Pharabod, and Peter Sewell. 2015. The problem of programming models with program transformations. In FM. Springer, Cham, 479ś495. language concurrency semantics. In ESOP. Springer, Berlin, Heidel- https://doi.org/10.1007/978-3-319-48989-6_29 berg, 283ś307. https://doi.org/10.1007/978-3-662-46669-8_12 [34] Ori Lahav, Viktor Vafeiadis, Jeehoon Kang, Chung-Kil Hur, and Derek [14] Mark Batty, Scott Owens, Susmit Sarkar, Peter Sewell, and Tjark Weber. Dreyer. 2017. Repairing sequential consistency in C/C++11. In PLDI. 2011. Mathematizing C++ concurrency. In POPL. ACM, New York, NY, ACM, New York, NY, USA, 618ś632. https://doi.org/10.1145/3062341. USA, 55ś66. https://doi.org/10.1145/1925844.1926394 3062352 [15] Giovanni Bernardi and Alexey Gotsman. 2016. Robustness against con- [35] Mohsen Lesani, Christian J. Bell, and Adam Chlipala. 2016. Chapar: sistency models with atomic visibility. In CONCUR. Schloss Dagstuhlś Certified causally consistent distributed key-value stores. In POPL. Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 7:1ś7:15. https: ACM, New York, NY, USA, 357ś370. https://doi.org/10.1145/2837614. //doi.org/10.4230/LIPIcs.CONCUR.2016.7 2837622 [16] Ahmed Bouajjani, Constantin Enea, Rachid Guerraoui, and Jad Hamza. [36] Cheng Li, Daniel Porto, Allen Clement, Johannes Gehrke, Nuno 2017. On verifying causal consistency. In POPL. ACM, New York, NY, Preguiça, and Rodrigo Rodrigues. 2012. Making geo-replicated sys- USA, 626ś638. https://doi.org/10.1145/3009837.3009888 tems fast as possible, consistent when necessary. In OSDI. USENIX [17] Lucas Brutschy, Dimitar Dimitrov, Peter Müller, and Martin Vechev. Association, Berkeley, CA, USA, 265ś278. http://dl.acm.org/citation. 2018. Static serializability analysis for causal consistency. In PLDI. cfm?id=2387880.2387906 ACM, New York, NY, USA, 90ś104. https://doi.org/10.1145/3192366. [37] Wyatt Lloyd, Michael J. Freedman, Michael Kaminsky, and David G. 3192415 Andersen. 2011. Don’t settle for eventual: Scalable causal consistency [18] Sebastian Burckhardt. 2014. Principles of eventual consistency. Found. for wide-area storage with COPS. In SOSP. ACM, New York, NY, USA, Trends Program. Lang. 1, 1-2 (Oct. 2014), 1ś150. https://doi.org/10. 401ś416. https://doi.org/10.1145/2043556.2043593 1561/2500000011 [38] Mapping 2019. C/C++11 mappings to processors. Retrieved July 3, [19] Andrea Cerone, Alexey Gotsman, and Hongseok Yang. 2015. Transac- 2019 from http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html tion chopping for parallel snapshot isolation. In DISC. Springer-Verlag, [39] Luc Maranget, Susmit Sarkar, and Peter Sewell. 2012. A tutorial Berlin, Heidelberg, 388ś404. https://doi.org/10.1007/978-3-662-48653- introduction to the ARM and POWER relaxed memory models. 5_26 http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/test7.pdf. [20] Andrea Cerone, Alexey Gotsman, and Hongseok Yang. 2017. Algebraic [40] MongoDB Manual 4.2 2019. Causal consistency and read and write laws for weak consistency. In CONCUR, Vol. 85. Schloss Dagstuhlś concerns. Retrieved November 19, 2019 from https://docs.mongodb. Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 26:1ś26:18. com/manual/core/causal-consistency-read-write-concerns https://doi.org/10.4230/LIPIcs.CONCUR.2017.26 [41] Kartik Nagar and Suresh Jagannathan. 2018. Automated detection of [21] Simon Doherty, Brijesh Dongol, Heike Wehrheim, and John Derrick. serializability violations under weak consistency. In CONCUR, Vol. 118. 2019. Verifying C11 programs operationally. In PPoPP. ACM, New Schloss DagstuhlśLeibniz-Zentrum fuer Informatik, Dagstuhl, Ger- York, NY, USA, 355ś365. https://doi.org/10.1145/3293883.3295702 many, 41:1ś41:18. https://doi.org/10.4230/LIPIcs.CONCUR.2018.41 [22] Alain Finkel and Philippe Schnoebelen. 2001. Well-structured transi- [42] Scott Owens, Susmit Sarkar, and Peter Sewell. 2009. A better x86 tion systems everywhere! Theoretical Computer Science 256, 1 (2001), memory model: x86-TSO. In TPHOLs. Springer, Heidelberg, 391ś407. 63 ś 92. https://doi.org/10.1016/S0304-3975(00)00102-X https://doi.org/10.1007/978-3-642-03359-9_27 [23] ISO/IEC 14882:2011. 2011. Programming language C++. [43] Azalea Raad, Ori Lahav, and Viktor Vafeiadis. 2018. On parallel [24] ISO/IEC 9899:2011. 2011. Programming language C. snapshot isolation and release/acquire consistency. In ESOP. Springer, [25] Jan-Oliver Kaiser, Hoang-Hai Dang, Derek Dreyer, Ori Lahav, and Berlin, Heidelberg, 940ś967. https://doi.org/10.1007/978-3-319-89884- Viktor Vafeiadis. 2017. Strong logic for weak memory: Reasoning 1_33 about release-acquire consistency in Iris. In ECOOP. Schloss Dagstuhlś [44] Sylvain Schmitz and Philippe Schnoebelen. 2012. Algorithmic as- Leibniz-Zentrum fuer Informatik, Dagstuhl, Germany, 17:1ś17:29. pects of WQO theory. (Aug. 2012). https://cel.archives-ouvertes.fr/cel- https://doi.org/10.4230/LIPIcs.ECOOP.2017.17 00727025 Lecture notes. [26] Jeehoon Kang, Chung-Kil Hur, Ori Lahav, Viktor Vafeiadis, and Derek [45] Yair Sovran, Russell Power, Marcos K. Aguilera, and Jinyang Li. 2011. Dreyer. 2017. A Promising Semantics for Relaxed-Memory Con- Transactional storage for geo-replicated systems. In SOSP. ACM, New currency. In POPL. ACM, New York, NY, USA, 175ś189. https: York, NY, USA, 385ś400. https://doi.org/10.1145/2043556.2043592 //doi.org/10.1145/3009837.3009850 [46] Thibault Suzanne and Antoine Miné. 2016. From array domains to [27] Michalis Kokologiannakis, Ori Lahav, Konstantinos Sagonas, and Vik- abstract under store-buffer-based memory models. In tor Vafeiadis. 2017. Effective stateless model checking for C/C++ con- SAS. Springer, Berlin, Heidelberg, 469ś488. currency. Proc. ACM Program. Lang. 2, POPL, Article 17 (Dec. 2017), [47] Douglas B. Terry, Alan J. Demers, Karin Petersen, Mike Spreitzer, 32 pages. https://doi.org/10.1145/3158105 Marvin Theimer, and Brent W. Welch. 1994. Session guarantees for [28] Michael Kuperstein, Martin Vechev, and Eran Yahav. 2011. Partial- weakly consistent replicated data. In PDIS. IEEE Computer Society, coherence abstractions for relaxed memory models. In PLDI. ACM, Washington, DC, USA, 140ś149. http://dl.acm.org/citation.cfm?id= New York, NY, USA, 187ś198. https://doi.org/10.1145/1993498.1993521 645792.668302 [29] Ori Lahav. 2019. Verification under causally consistent shared memory. [48] Aaron Turon, Viktor Vafeiadis, and Derek Dreyer. 2014. GPS: Navigat- ACM SIGLOG News 6, 2 (April 2019), 43ś56. https://doi.org/10.1145/ ing weak memory with ghosts, protocols, and separation. In OOPSLA. 3326938.3326942 ACM, New York, NY, USA, 691ś707. https://doi.org/10.1145/2660193. [30] Ori Lahav, Nick Giannarakis, and Viktor Vafeiadis. 2016. Taming 2660243 release-acquire consistency. In POPL. ACM, New York, NY, USA, 649ś 662. https://doi.org/10.1145/2837614.2837643

225 PLDI ’20, June 15ś20, 2020, London, UK Ori Lahav and Udi Boker

[49] Viktor Vafeiadis and Chinmay Narayan. 2013. Relaxed separation [50] John Wickerson, Mark Batty, Tyler Sorensen, and George A. Constan- logic: A program logic for C11 concurrency. In OOPSLA. ACM, New tinides. 2017. Automatically comparing memory consistency models. York, NY, USA, 867ś884. https://doi.org/10.1145/2509136.2509532 In POPL. ACM, New York, NY, USA, 190ś204. https://doi.org/10.1145/ 3009837.3009838

226