Verified Compilation of Linearizable Data Structures: Mechanizing Rely Guarantee for Semantic Refinement Yannick Zakowski, David Cachera, Delphine Demange, David Pichardie

To cite this version:

Yannick Zakowski, David Cachera, Delphine Demange, David Pichardie. Verified Compilation of Linearizable Data Structures: Mechanizing Rely Guarantee for Semantic Refinement. SAC 2018 - The 33rd ACM/SIGAPP Symposium On Applied Computing, Apr 2018, Pau, France. pp.1881-1890, ￿10.1145/3167132.3167333￿. ￿hal-01653620￿

HAL Id: hal-01653620 https://hal.archives-ouvertes.fr/hal-01653620 Submitted on 1 Dec 2017

HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Verified Compilation of Linearizable Data Structures∗

Mechanizing Rely Guarantee for Semantic Refinement

Yannick Zakowski David Cachera Delphine Demange David Pichardie Univ Rennes / Inria / CNRS / IRISA

ABSTRACT verified garbage collector but none of these projects is ready Compiling concurrent and managed languages involves imple- yet to provide an executable verified . menting sophisticated interactions between client code and Our approach in [31] is based on a dedicated intermediate the runtime system. An emblematic runtime service, whose representation (IR) that features strong type guarantees, implementation is particularly error-prone, is concurrent dedicated support for abstract concurrent data structures, garbage collection. In a recent work [31], we implement an and high-level iterators on runtime internals (e.g. objects, on-the-fly concurrent garbage collector, and formally prove thread ids). We have implemented a realistic implementa- its functional correctness in the Coq proof assistant. The tion of Domani et al.’s GC algorithm [3] in this IR and use garbage collector is implemented in a compiler intermediate its companion Rely-Guarantee logic to prove its functional representation featuring abstract concurrent data structures. correctness. But in order to make this runtime service exe- The present paper extends this work by considering the cutable, we need to provide a verified compiler for our IR. concrete implementation of some of these abstract concurrent The present paper is in line with this objective: we provide data structures. We formalize, in the Coq proof assistant, a a mechanized theorem to prove the soundness of compilation theorem establishing the semantic correctness of a compiling passes implementing abstract concurrent data structures. Us- pass which translates abstract, atomic data structures into ing this result, the correctness proof of runtime services can their concrete, fine-grained concurrent implementations. then be propagated down to their low-level implementation. At the crux of the proof lies a generic result establishing once To fit into a classical verified compiler infrastructure [21], and for all a simulation relation, starting from a carefully we need a correctness theorem for a compiler from a source crafted rely-guarantee specification. Inspired by the work of language with atomic data-structures, to a target language Vafeiadis [28], implementations are annotated with lineariza- where data-structures are implemented with fine-grained tion points. Semantically, this instrumentation reflects the concurrency. The standard theorem in the community proves behavior of abstract data structures. that the compiler preserves the behavior of source programs: for any source program p, 1. INTRODUCTION obs(compile(p)) ⊆ obs(p) (1) Verified compilation is slowly becoming a reality. Comp- where obs denotes the observable behaviors of a program. Cert [21] is a realistic and fully verified C compiler. It is now Fine-grained concurrency allows programmers to reduce syn- commercially available and makes possible the generation of chronization costs to a minimum but renders the program- optimized code for highly critical applications, whereas in the ming activity extremely subtle and error-prone, due to race past compiler optimizations were often completely switched conditions. Experts who program fine-grained concurrent al- off in this context [20]. The situation for managed languages, gorithms generally resort to well-chosen data structures that that require a runtime environment including e.g. automatic can be accessed using linearizable methods: even though the memory management, is quite different. Indeed, there is still implementation of these methods is by no means atomic, they a long way to go before a realistic and fully verified compiler appear to the rest of the system to occur instantaneously. becomes available, especially in a concurrent setting. Linearizability thus considerably helps abstraction: the pro- One major challenge in this landscape is the verification of grammer can safely reason at a higher-level, and assume an an executable runtime system for such languages and an abstract, atomic specification for these data structures. emblematic runtime service is garbage collection. In [31], we Initially, linearizability was formally defined by Herlihy and recently presented a Coq formalization of a fully concurrent Wing [16]. In their seminal paper, they model system ex- garbage collector, where mutators never have to wait for the ecutions as sequences, or histories, of operation invocation collector. This work and other similar proof efforts [25, 10, 9, and response events, and a system is linearizable whenever 7, 8, 11, 12] are important stepping stones towards a realistic, all valid histories can be reordered into sequential histories. ∗This work was supported by Agence Nationale de la In this work, we rather consider an alternative formulation, Recherche, grant number ANR-14-CE28-0004 DISCOVER. based on semantic refinement that is well aligned with the SAC 2018,April 09-13, 2018, Pau, France standard correctness criterion in verified compilation [21]. Proving a statement like (1) is done by showing that the Our main result is generic in the abstract data structures used target program compile(p) simulates the source program p: in L] and their implementation I. Here, we illustrate on a for any execution of the target program, we must exhibit a simple pedagogical example what are the intrinsic challenges, matching execution of the source program. While the defini- and briefly overview our technical contribution. A more tion of the matching relation is relatively intuitive, dealing challenging example is described in Section 6. with the formal details can be cumbersome, and further, proving that it is indeed maintained along the execution is 2.1 Simple Lock Example difficult. Hence, we would like to resort to popular program Suppose L] offers abstract locks, each providing two methods verification techniques to lighten the proof process. I = {acquire, release}. At the level of L], an abstract lock In [28], Vafeiadis proposed a promising approach that fits our can be seen as a simple boolean value, with the expected needs. It is based on Rely-Guarantee (RG) [18], a popular atomic semantics. At the concrete level, L, acquire and proof technique extending Hoare logic to concurrency. In release are implemented with the code in Figure 1. They RG, threads interferences are described by binary relations use a boolean field flag, denoting the status of the lock. on shared states. Each thread is proved correct under the Releasing the lock simply sets the field to 0, while acquiring assumption that other threads obey a rely relation. The it uses a compare-and-swap instruction (cas) to ensure that effect of the thread must respect a guarantee relation, which two threads do not simultaneously acquire a shared lock. must be accounted for in the relies of the other threads. RG Note that both methods could be executed concurrently by allows for thread-modular reasoning, and hence removes the two client threads. need to explicitly consider all interleavings in a global fashion. Now, RG alone does not capture the notion of a lineariz- def acquire() ::= able method. So Vafeiadis proposes to extend RG to hybrid ok = 0; implementations, i.e. fine-grained concurrent methods instru- do{ mented with linearization points that reflect the abstract, ok=cas(this.flag,0,1) atomic behavior of methods in a ghost part of the execution } while (ok == 0); return state. The approach is elegant, but is not formally linked to the standard notion of compiler correctness. Recently, Liang def release() ::= et al. [22] improved on the work of Vafeiadis by expressing this.flag = 0; the soundness of the methodology with a semantic refine- return ment. Their work undoubtedly makes progress in the right direction. Unfortunately, their proof is not machine-checked. Figure 1: Spinlock in L. To sum up, we provide a machine-checked semantic founda- tion to the approach outlined in Vafeiadis’s PhD, and embed it in a generic, end-to-end compiler correctness theorem. 2.2 Semantic Refinement More precisely, we make the following contributions: Proving a theorem like (2) is done with a simulation: for any execution of the target program, we must exhibit a matching • We integrate the notion of linearizable data structures execution of the source program. Between L and L], the in a formally verified compiler. Correctness is phrased simulation is however particularly difficult to establish. in terms of a simple semantic refinement and avoids the difficult, though traditional, definition of linearizability. We illustrate the situation with Figure 2. Execution steps • Our proof is generic in the abstract data structures u t under consideration, and in the source program using ] ] them. The underlying simulation is proved once and L L for all, provided that hybrid implementations meet a u u ✓u u u t t t t ✓t t certain specification. L L ] u t • We express this specification in terms of RG reasoning, L so it integrates smoothly with deductive proof systems. L t u t t u t ✓ u u ✓ t u t All theorems are proved in Coq, and available online [30].

Figure 2: Intra and inter-thread matching step rela- 2. CHALLENGES AND OVERVIEW tions. At the source level, we use a core concurrent language L] that features abstract data structures with a set I of atomic labeled with are those where the effect of a method in I methods. We program a compiler compilehIi ∈ L] → L, becomes visible to other threads, and thus determine the which replaces the abstract data structures with their fine- behavior of other methods in I executed concurrently. grained concurrent implementation in L. Our goal is to prove At the intra-thread level (Figure 2, top), we need to relate that this compiler is correct, in the sense that it preserves several steps of a thread in the target program to a single the observable behaviors of source programs, with a theorem step in the source. The situation is even more difficult at of the form the inter-thread level (Figure 2, bottom): the interleaving ∀p ∈ L], obs(compilehIi(p)) ⊆ obs(p) (2) of threads at the target level (first u, then t in the example) must sometimes be matched at the source level by another 2.4 Using our Result interleaving (first t then u in the example). Indeed, it all The typical workflow for using our generic result is to (i) de- depends on which thread will be the first to execute its fine the abstract data structures specification, i.e. their type, step in the concrete execution. The matching step for a given and the atomic semantics of methods in I, (ii) provide a thread hence depends on the execution of its environment. concrete implementation, i.e. their representation in the heap Our main result, that we present in Section 4, removes this and a fine-grained hybrid implementation of methods, (iii) de- difficulty by establishing, under some hypotheses, a generic fine a coherence invariant between abstract and concrete data simulation that entails semantic refinement. This is a meta- structures that formalizes the link between a concrete data theorem that we establish once and for all, independently of structure and its abstract view, (iv) define the rely and guar- the abstract data-structures available in L]. antee of each method, (v) prove the RG specification of each method using a dedicated program logic1 and (vi) apply our meta-theorem to get the global correctness result. 2.3 Rely-Guarantee for Atomicity To prove the atomicity of linearizable methods, and establish We have successfully used this workflow to prove the correct- the simulation, we introduce an intermediate, proof-dedicated ness of the above spinlock example, as well as concurrent language L[, that comes with an RG proof system. buffers, a data structure we used [31] in our implementation of a verified concurrent garbage collector. L[ provides explicit linearization point annotations, Lin(b), to guide the proof. In the spirit of Vafeiadis’s methodology, Lin(b) annotations have an operational effect: they trigger, 3. LANGUAGE SYNTAX AND SEMANTICS whenever condition b is met, the execution of the abstract As explained in the previous section, this work considers method in a ghost part of the state (they are equivalent three different languages L], L[ and L. To lighten the pre- to skip if the condition is not met). Lin instructions are sentation, however, we will just assume one language, L[, [ what makes L hybrid. Additionally, they allow to track that includes all features, and keep in mind that source pro- whether a thread has reached the linearization point or not, grams in L] do not include any linearization instrumentation, thus helping the construction of our generic simulation. The while target programs in L do not contain abstract method annotated spinlock code is shown in Figure 3. calls or linearization instrumentation. L[ is a concurrent imperative language, with no dynamic def acquire() ::= creation of threads. It is dynamically typed, and features a ok = 0; simplified object model: objects in the heap are just records, do{ and rather than virtual method calls, the current object – the atomichok=cas(this.flag,0,1);Lin(ok==1)i object whose method is being called – is an extra function } while (ok == 0); return argument, passed in the reserved variable this. In the sequel, Var is a set of variable identifiers, method names range over def release() ::= m ∈ Methods, and field identifiers range over f ∈ Fields. atomichthis.flag = 0; Lin(true)i; return 3.1 Values and Abstract Data Structures We use the domain of values Val = Z + Ref + Null, where Ref is a countable set of references. A central notion in the [ Figure 3: Spinlock in L . language is that of abstract data structure. They are speci- fied with an atomic specification. All our development and Considering the intermediate language L[, our compiler is our proofs are parameterized by an abstract data structure in fact defined as compile = clean ◦ concretize. It first specification. It could be abstract locks as in Section 2, bags, transforms a source program p into concretizehIi(p)∈ L[ stacks, or buffers as detailed in Section 6. where abstract methods are implemented by hybrid, anno- tated code, supplied by the programmer of the data-structure. Definition 3.1. An abstract data structure is specified Then, a second compilation phase cleanhIi ∈ L[ → L takes by a tuple (A], I, . ], P) where A] is a set of abstract objects; care of removing Lin instructions in the target program. I ⊆ Methods is aJ setK of abstract methods identifiers, whose atomic semantics is given by the partial map . ] ∈ I → We prove in Coq that the compiler is correct, provided that (A] × Val) ,→ (A] × Val), taking as inputs anJ objectK and hybrid methods in I are proved correct w.r.t. an RG spec- a value, and returning an updated object and a value; and ification, RGspec, that we carefully define in terms of L[ P ⊆ Fields reserves private field identifiers for the concrete semantics to prove: implementation of abstract methods in I. RGspec (I) =⇒ ∀p ∈ L], obs(compilehIi(p)) ⊆ obs(p) Abstract objects in A] are the possible values that an in- via the aforementioned simulation. We also embed in judg- stance of a data structure can take. We use private fields ment RGspec the requirements sufficient to show that Lin to express the property of interference freedom from Herlihy instructions can be safely removed by the clean phase. This and Shavit [15]. Namely, client code can only use public ensures that, despite their operational nature, Lin instruc- tions are only passively instrumenting the program and its 1The program logic is a contribution that is out of the scope semantics. of this paper, though we use it in our development. e ::= n | null | x | e + e | - e | e mod n | ... We assume a standard semantics · for expressions, omitted b ::= true | e == e | e <> e | b || b | !b | ... here. Abstract objects are stored inJ K an abstract heap, ranged c ::= • | assume(b) | print(e) | x = e over by h] ∈ H] = Ref → A]. At the concrete level, | x = y.f | x.f = y | x = new(f,. . . ,f) abstract objects are implemented by regular, concrete objects, | return(e) | x = y.m(z) living in a concrete heap h ∈ H = (Ref × Fields) → Val.A | c ; c | c + c | loop(c) | atomic hci shared memory, ranged over by σ ∈ H] × H is made of an ] ] c ::= c | x =# y.m(z) abstract heap and a concrete heap. [ c[ ::= c | Lin (b) An intra-thread state ts = hm, c, l, lsi includes a current Figure 4: Language Syntax. method m, a current command c, a local environment l ∈ Lenv = Var → Val, and a linearization state ls ∈ LinState, that we explain below. The intra-thread operational seman- fields in Fields \P, and concrete implementations of abstract tics, partially shown in the top four rules of Figure 5, is a · methods in I use private fields only. transition relation · −→ · on intra-thread states. It is labeled with observable events ranged over by o. An observable Example 3.1. In the spinlock example, the lock abstract is either a numeric value or the silent event τ. data structure specification is given by a set of abstract objects A] = {Locked, Unlocked}, and an abstract methods set The print instruction (rule Print) is the only one that emits I = {acquire, release}. Lock implementations use a single an observable value, namely the value of the expression that private field P = {flag}. The atomic, abstract semantics of is printed. Neither the local state nor the shared memory lock methods are defined as are modified by this instruction. Print instructions are only allowed outside abstract methods implementations. acquire ](Unlocked, v) = (Locked, Null) J K An abstract method call (rule Acall) x =# y.m(z) is executed for any value v (only an Unlocked lock can be acquired), and according to the abstract semantics m ], and modifies only the abstract heap. J K release ](l, v) = (Unlocked, Null) J K Concrete method calls (rule Ccall) behave as expected, but for any input abstract lock l and value v. additionally manage the local linearization state. This lin- earization state notably keeps track of whether the execution 3.2 Language Syntax of the current method is before its linearization point (Before) The syntax of the language is detailed on Figure 4. In the or not (After). Initially, the linearization state is set to Nolin. sequel, we fix an arbitrary abstract data structure speci- When control transfers to a method in I through a concrete fication (A], I, . ], P). The language provides constants method call, the linearization state changes from Nolin to (n, null, true ...J)K, local variables (x, y, z ... ), and arithmetic Before (see rule Ccall). It switches to After when executing and boolean expressions (e, b). Regular commands (c) are a Lin instruction (rule LinTrue) when the linearization con- standard, and common to the three languages. They include dition b is true, and then back to Nolin on method return. Linearization states are used in the simulation proof, and • (skip), an assume(e) statement, a print(e) instruction that [ emits the observable value of e, variable assignment of an instrument L only. expression, field reads and updates, record allocation, non- At the L[ level, the Lin instruction also accounts for the effect deterministic choice (+), loops, and atomic blocks atomic on the abstract heap of concrete methods in I: it performs hci. Concrete method calls are written x = y.m(z). the abstract atomic call m ] to the enclosing method m, J K Some instructions are specific to a language level. In the updating the local environment and abstract heap. ] source language L , abstract method calls on a abstract The interleaving of threads is handled in rule Intl, with object are written x =# y.m(z). For any m ∈ I, such a call in relation (γ, σ) −→o (γ0, σ0) between global states (γ, σ), where a L] program is compiled to a concrete call x = y.m(z) in the γ maps thread identifiers to thread local states and σ is a L program. In L[, the Lin(b) instruction is used to annotate shared memory. Mutual exclusion between atomic blocks is a linearization point. ensured by the ¬inAtomic side condition. Finally, a client program is defined by a map from method Finally, program behaviors are defined on top of the inter- names in Methods \I to their command, and a map from leaving semantics, as expressed by the following definition. thread identifiers to their command. In the sequel, we will write m.comm for getting the command of method m, leaving the underlying program implicit. Definition 3.2. The observable behavior of a program p, written obs(p, σi), is either a finite trace of values emitted by a finite sequence of transitions or a infinite trace of values 3.3 Language Semantics emitted by an infinite sequence of transitions, from an initial For the sake of conciseness, we present here a partial view shared memory σi. of the semantics, and refer the reader to the formal develop- ment [30] for full details2. 4. MAIN THEOREM 2In our formal development, we use a continuation-based semantics to handle atomic blocks and method calls. This In this section, we formalize our main result. We use the fol- has proven to lighten the mechanization of many proofs, by lowing notations and vocabulary. For a set A, an A predicate removing any recursivity from the small step semantics. P is a subset of A. An element a ∈ A satisfies the A predicate e l = v m 6∈ I Print J K (hm, print(e), l, lsi, σ) −→v (hm, •, l, lsi, σ)

l(y) = r h](r) = a l(z) = v m0 ∈ I m0 ](a, v) = (a0, v0) Acall J K (hm, x=# y.m0(z), l, lsi, (h], h)) −→τ (hm, •, l[x 7→ v0], lsi, (h][r 7→ a0], h))

l(y) = r l(z) = v l0 = [m0.this 7→ r, m0.arg 7→ v] ls0 = if m0 ∈ I then Before(r, v) else Nolin Ccall (hm, x = y.m0(z), l, Nolini, σ) −→τ (hm0, m0.comm, l0, ls0i, σ)

b l = true h](r) = a m ∈ I m ](a, v) = (a0, v0) LinTrue J K J K (hm, Lin(b), l, Before(r, v)i, (h], h)) −→τ (hm, •, l[x 7→ v0], After(r, v, v0)i, (h][r 7→ a0], h))

b l = false LinFalse J K (hm, Lin(b), l, Before(r, v)i, (h], h)) −→τ (hm, •, l, Before(r, v)i, (h], h))

γ(t) = ts (ts, σ) −→o (ts0, σ0) ∀t0 6= t, ¬inAtomic(γ(t0)) Intl (γ, σ) −→o (γ[t 7→ ts0], σ0)

Figure 5: Semantics (excerpt).

P , written a |= P , when a ∈ P . For two sets A and B, a 1. The post-condition is established from pre-condition relation R is an A×B predicate. We use infix notations for re- and invariant: ] ∗ 0 0 0 lations. State predicates are (H ×H ×Lenv ×LinState) pred- (hm, c, l, lsi, σ) →R (hm, •, l , ls i, σ ) icates, specifying shared memories and intra-thread states. and (l, ls, σ) |= P ∩ I 0 0 0 A shared memory interference is a binary relation on H] × H, implies (l , ls , σ ) |= Q and is used for relies and guarantees. We refer to both state 2. Instructions comply with the guarantee: predicates and shared memory interferences as assertions. ∗ 0 0 0 0 00 00 00 00 (hm, c, l, lsi, σ) →R (hm, c , l , ls i, σ ) → (hm, c , l , ls i, σ ) The rely-guarantee reasoning is done at the intermediate and (l, ls, σ) |= P ∩ I level L[, on instrumented programs, more precisely on the implies σ0 G σ00 hybrid code of abstract methods implementations. Hence, 3. Linearization points are unique and non-blocking: assertions specify properties about the concrete and abstract (hm, c, l, lsi, σ) →∗ (hm, Lin(b), l0, ls0i, (h], h)) heaps simultaneously. R and (l, ls, σ) |= P ∩ I Our work derives a compiler correctness result from a generic and b l0 = true rely-guarantee specification. Of course, this cannot be achieved impliesJ K their exist r, a, a0, v, v0, for an arbitrary RG specification, so we need to constrain such that h](r) = a this specification. In the following, we explain exactly what and m ](a, v) = (a0, v0) this specification looks like, and means. and lsJ 0K= Before(r, v)

4.1 Semantic RG Judgment The last condition above is novel, and more subtle than the A hybrid method m must be specified with a semantic RG others. It captures a necessary requirement to ensure that judgment of the form R, G, I |=m {P } c {Q}, where P, Q, I Lin instructions do not block programs ( m ] is defined), and are state predicates, R and G are shared memory interfer- are unique (the linearization state is BeforeJ K). This condition ences, and c is the body of method m. State predicate I is is essential to ensure that we can clean up the Lin instrumen- meant to specify the coherence invariant between abstract tation of hybrid programs, and that our semantic refinement objects and their representation in the concrete heap. It is is not vacuously true. asked to be proved invariant separately (see Definition 4.4). The RG judgment intuitively states that starting in a state 4.2 Specifying Hybrid Methods satisfying P and invariant I, interleaving c with an environ- ∗ We now explain the specific RG judgment we require for ment behaving as prescribed by R (written →R), leads to a state satisfying Q. Additionally, this execution of c must be hybrid methods. fully reflected by guarantee G. This intuition, typical of RG The above RG judgment involves state predicates and shared reasoning, is formalized by the first two items below. memory interferences. In fact, we build them from elemen- tary bricks, object predicates and object interferences, that Definition 4.1. Judgment R, G, I |=m {P } c {Q} holds consider one object — one instance of a data structure — at whenever: a time, pointed to by a given reference r ∈ Ref . Definition 4.2. An object predicate Pr is a predicate on heap. Hence, a lifted rely Re includes (i) the client public ] pairs of an abstract object and a concrete heap: Pr ⊆ A × H. interference, and (ii) the method’s rely Rr quantified over An object interference Rr is a relation on pairs of an abstract all r: ] ] object and a concrete heap: Rr ⊆ (A × H) × (A × H). ] ] Re = Rpub ∪ {((h1, h1), (h2, h2)) | ∃r, a1, a2, ] Example 4.1. The coherence invariant specifies that an h1(r) = a1 ] ] abstract Locked (resp. Unlocked) lock is implemented in and h2 = h1[r 7→ a2] the concrete heap as an object whose field flag is set to 1 and (a1, h1) Rr (a2, h2)} (resp. 0). It is formalized as the following object predicate: Before we define the RG proof obligation asked of hybrid ILock r , {(Locked, h) | h(r, flag) = 1} method implementations, let us first recall the usual RG ∪ {(Unlocked, h) | h(r, flag) = 0} definition of stability. Object guarantees for acquire and release express the effect of the methods on the shared memory when called on a refer- Definition 4.3. State predicate P is stable w.r.t. shared ence r. They are defined as the following object interferences. memory interference R if ∀l, ls, σ1, σ2,,

r (σ1, l, ls) |= P and (σ1R σ2) implies (σ2, l, ls) |= P Grel , {((a, h1), (Unlocked, h2)) | h2 = h1[r, flag ← 0]} r Gacq ,{((Unlocked, h1), (Locked, h2)) | r Now, we fix an invariant Ir. For a method m ∈ I, let Gm h1(r, flag) = 0 and h2 = h1[r, flag ← 1]} r and Rm be the object guarantee and rely of m, as previously r In Gacq, the assignment to flag is performed only if the cas illustrated in Example 4.1. An RG specification for m includes succeeds. an RG semantic judgment, and stability obligations: Finally, both acquire and release have the same object rely, when called on a reference r: indeed, another thread could Definition 4.4 (RG method specification). The RG call both methods on the same reference. So we define the specification for method m ∈ I includes the three following r r r conditions: following object interference: Rm , Grel ∪ Gacq, for m ∈ {rel, acq}. r • For all r ∈ Ref , Rem, Gm , Ibr |=m {Pr} m.comm {Qr}

We now need to lift object predicates and object interfer- • For all r ∈ Ref , predicateŇ Ibr is stable w.r.t. Rem ences to state predicates and shared-memory interferences to 0 r0 enunciate the RG specifications of hybrid implementations. • For all r, r ∈ Ref , predicate Ibr is stable w.r.t. Gm The challenge here is twofold: make the specification effort relatively light for the user, and, more importantly, suffi- Ŋ ciently control the specifications so that we can derive our In the above judgment, we impose the pre- and post-condition generic result. Pr and Qr. Pr expresses that (i) r points to an abstract object in the abstract heap, (ii) the linearization state is set An object predicate Pr is lifted to a state predicate Pbr by to Before and (iii) the reserved local variable this of method further specifying that, in the abstract heap, r points to an m is set to r. Qr expresses that (i) the linearization state abstract object satisfying Pr: is set to After, and (ii) the value virtually returned by the abstract method (when encountering the Lin instruction) P = {(h], h, l, ls) | ∃a, h](r) = a and (a, h) |= P } br r matches the value returned by the concrete code. Intuitively, stability requirements ensure that I is indeed an invariant Similarly, in order to lift an object guarantee G to a shared br r of the whole program. memory interference Gr, the reference r should point to an abstract object in the abstract heap. Moreover, its effect on this object should be reflectedŇ in the resulting abstract heap. 4.3 Compiler Correctness Theorem Formally: So far, we have expressed requirements on hybrid methods, ] ] each taken in isolation. The last requirement we formulate Gr = {((h , h1), (h [r 7→ a2], h2)) | ] is the consistency between relies and guarantees of methods. ∃a1, h (r) = a1 and (a1, h1) Gr (a2, h2)} For a method m, to ensure that Rm is indeed a correct over- Ň approximation of its environment, we ask that Rm includes Lifting relies is a bit more subtle. When executing an hybrid 0 any guarantee Gm0 , where m is a method that may be called implementation m, one should account for two kinds of con- concurrently to m. This requirement is formalized by the current effects: the client code, and the rely of the method following definition. itself. To model the client code effect, we introduce a public 0 shared memory interference, written Rpub , that models any Definition 4.5 (RG consistency). For all threads t, t possible effect on the concrete heap, except modifying private such that t 6= t0, all methods m, m0 ∈ I and all r ∈ Ref , 0 0 r r fields in P: is called(t, m)∧is called(t , m ) ⇒ Gm0 ⊆ Rm where is called(t, m) ] ] indicates that m appears syntactically in the code of t. Rpub = {((h , h1), (h , h2)) | ∀r, f, f ∈ P ⇒ h (r, f) = h (r, f)} 1 2 Relying on predicate is called allows for accounting for data As for the method’s rely Rr, we should consider that it structures used according to an elementary protocol (such could occur on any abstract object present in the abstract as single-pusher, single-reader buffers). We finally the formal requirements on hybrid imple- Definition 4.1 uses the abstract semantics → . When Rem mentations into the RGspec judgment and use it to state our defining t, we generalize → into relation →R , where I Rem t main result, establishing that the target program semanti- T Rt , m∈is called(t) Rem. Rely Rt overapproximates interfer- cally refines the source program. ences of threads concurrent to t, while being precise enough

Definition 4.6. Let I = {m1 ... , mn}. I satisfies RGspec, to deal with any method m called by t, since Rt ⊆ Rem. written RGspec(I), if ∀ i ∈ [1, n], an RG method specification We define I , {(γ, σ) | ∀t, (γ(t), σ) |= It}. To prove that I is provided for mi, and RG consistency holds. holds on the interleaving semantics, we first prove that all the are preserved by the intra-thread steps. Besides, we Let σ an ini- It Theorem 4.1 (Compiler correctness). i prove their preservation by other threads’ steps: tial shared memory satisfying the invariant Ir for all r ∈ Ref allocated in it. If RGspec(I), then Lemma 5.1. Let γ, σ and t 6= t0 be such that (γ(t), σ) |= 0 0 o 0 0 ] It and (γ(t ), σ) |= It0 . If (γ(t ), σ) −→ (ts , σ ), then ∀p ∈ L , obs(compilehIi(p), σi) ⊆ obs(p, σi) 0 (γ(t), σ ) |= It. 5.2 Simulation Relations We emphasize that the client program p is arbitrary, modulo For both compilation phases, we build an intra-thread, or some basic syntactical well-formedness conditions (e.g., no local, simulation that we then lift at the inter-thread level. private field is accessed in the client code, or methods in I Both relations are defined using the same pattern: in a pair of [ do not modify public fields). related states, the L state satisfies It. It remains to encode Theorem 4.1 is phrased and proved w.r.t. an RG semantic in the relation the matching between execution states. judgment. In our formal development, we have developed For the first compilation phase, a whole execution of an hy- a sound, syntax-directed proof system to discharge the RG brid method is simulated by a single abstract step, occurring semantic judgment, and have successfully used this system at the linearization point. We therefore build a 1-to-0/1 to prove the implementation of the spinlock, as well as the backward simulation. concurrent buffer data structure. ] Relation [∼ states that shared memories are equal on the heaps domains in L]. Local environments are trickier to 5. PROOF OF THE MAIN THEOREM relate. In client code, they simply are equal. During a hybrid Theorem 4.1 is proved by establishing, from the RGspec(I) method call x = y.m(z), before the Lin point, the abstract [ hypothesis, a simulation between the source program and environment is equal to the environment of the L caller. its compilation. Here, we use the terminology of backward After the Lin point, the only mismatch is on variable x, ] [ simulation, as is standard in the compiler verification com- which has been updated in L , but not yet in L . munity [21]. ] Proving that [∼ is a simulation follows the above three phases. Steps by client code are matched 1-to-1 ; inside a Definition 5.1. A relation ∼ between states of L and L] ] hybrid method, steps match 1-to-0 until the Lin point; the is a backward simulation from L to L when, for any s1, s2 ] ] o ] Lin step is matched 1-to-1 ; after the Lin point, steps match and s1, if s1 ∼ s1 and s1 −→ s2, then there exists s2 such ] o ∗ ] ] 1-to-0 until the return instruction. At method call return, that s −→ s , and s ∼ s2. 1 2 2 we use RGspec(I) via It, and in particular Qr, to prove that environments coincide on x again. Eliding customary constraints on initial and final states, such When simulating from L to L[, Lin instructions have been a simulation entails preservation of observable behaviors replaced by a •. It is therefore a 1-to-1 backward simulation. (Theorem 3 in [21]). We establish two backward simulations, Recall that L semantics contains no LinState nor abstract from L to L[ and from L[ to L], which we compose. heap. Relation ∼ [ therefore only states the equality of local environments and concrete heaps. 5.1 Leveraging RGspec(I) Proving this simulation is what makes the third item in The key point is to carry, within the simulation, enough Definition 4.1 necessary. Indeed, we can match the • step in information to leverage RGspec(I). This is necessary for the L program only if the Lin instruction in the L[ program both simulations, so we factorize the work by expressing [ is non-blocking. a rich semantic invariant I over the execution of the L program. To simplify its definition and its proof, I is built The independent proof of the invariant simplifies the lifting as a combination of thread-local invariants. of simulations. Indeed, except for the part about It, that is already proved invariant by other threads’ steps, relations For a thread t, the invariant includes three kinds of infor- It keep track of the same information for all threads. Hence, mation. First, it ensures various well-formedness properties their preservation by the interleaving essentially comes for of intra-thread states. Second, it demands that Ibr, the co- free. herence invariant, holds for all r. The third information is more subtle. When executing a hybrid method m called r on a reference r, to leverage its specification Rem, Gm , Ibr |=m 6. CASE STUDY: CONCURRENT BUFFERS {Pr} m.comm {Qr}, we keep track, in It, that the state is This case study is taken from our verification project of a reachable from a state satisfying Ibr and Pr. RecallŇ that concurrent mark-and-sweep garbage collector [31]. Describ- 0 0 1 nw nr def isEmpty() ::= next_read nr next_read nr nr=this.next_read; next_write nw … next_write nw … atomic h nr=this.next_write; Lin(true) i; nr data nw data return (nw==nr)

SIZE-1 SIZE-1 def top() ::= nr=this.next_read; Figure 6: Concrete buffers layout (examples). El- nr=this.next_write; assume(nr<>nw);// buffer not empty ements contained in the buffer are colored in grey. d=this.data; Example on the right shows how the array is popu- res =d[nr ]; lated circularly. atomic h Lin ( true ) i; return res ing the full algorithm is out of the scope of this paper. Here, def pop() ::= we only give an idea of why and how buffers are used. nr=this.next_read; nw=this.next_write; In this algorithm, application threads, a.k.a mutators, are assume (nr<>nw);// buffer not empty never blocked by the collector thread. They must therefore nr=(nr+1) mod SIZE; participate to the marking of potentially live objects. Buffers, atomic h this.next_read=nr; Lin(true) i; so-called mark buffers in Domani et al.’s Java implementa- return tion [3], keep track of references to objects that are the roots def push(v) ::= of the graph of objects that may be live. Each thread, in- nw=this.next_write; cluding the collector, owns a buffer. Only the owner of the nr=this.next_read; buffer can push on it, and only the collector can read and d=this.data; pop from buffers. d[nw ]=v; nw=(nw+1) mod SIZE; In this section, we present abstract buffers and their fine- assume (nr<>nw);// no buffer overflow grained implementation. We also briefly explain how we atomic h this.next_write=nw; Lin(true) i; express their correspondence in a coherence invariant, as return well as methods guarantees and relies, and how we encode their usage protocol, which is crucial to their correctness. Figure 7: Buffers in L[. Buffers are fully verified in our formal development [30]: their implementation satisfies the RGspec predicate, and by Theorem 4.1, any program using abstract buffers can be semantically refined into a program in L. the index of the first free slot in the array. 6.1 Abstract Buffers A buffer is empty if and only if next_read and next_write are equal. Pushing a value on a buffer consists in writing An abstract buffer is a queue of bounded size SIZE, that this value in the array, at position next write, and then we model by a list (of type list value). Below, we use the incrementing . Conversely, popping a value from usual lists constructors. A buffer is pushed on one end of the next write a buffer is done by incrementing . To consult its list, and popped off from its other end. Buffers provide four next read top value, one reads the array element at position next_read. methods: isEmpty, top, pop, and push. Their respective In fact, the array can be populated in a modulo fashion abstract semantics are3: data (see right example in Figure 6). The code for implementing isEmpty ](ab, v) = (ab, 1) if ab = nil buffers is given in Figure 7, and follows the above principles. J K = (ab, 0) otherwise Our core language does not include proper arrays, but we encode them with appropriate macros. The code is blocking ] top (x::ab, v) = (x::ab, x) when trying to pop on an empty buffer, or trying to push J K pop ](x::b, v) = (b, Null) on a full buffer, as is the case for the abstract version. This is no limitation in practice: the size of buffers is chosen at J K ] push (b, v) = (b++[v], Null) if |b|