Automatic Verification of Erlang-Style Concurrency

arXiv:1303.2201v1 [cs.PL] 9 Mar 2013 ual,flwbsd btatsmnisof semantics abstract flow-based, putable, intcnqei cuaeeog ovrf nitrsigr interesting i abstrac problem an coverability our ACS verify practice the to in Though properties. enough that safety accurate find of We is Erlang. technique of tion fragment c core model a infinite-state fully-automatic, first the obtaining constr have We program. input Soter the of deriving model thus abstract construction, ACS accurate the bootstrap to use then we nepeaina etradto ytm o hc oev abstrac parametric some for a framework which give pretation We for decidable. system, are problems addition cation vector a as interpretation n,admlil ltom [ platforms multiple use funct and widely higher-order cod ing, a on-the-fly now for distribution, is support communication, Erlang concurrency, with 80s, late language the open-sourced in Ericsson d at fault-tolerant program systems to designed Originally programs Erlang. concurrent in of verification the concerns paper This Introduction 1. Erlang Nets, Petri Analysis, Keywords efficiently. surprisingly called rmtvs efraiea btatmdlof model abstract an message-pas formalise asynchronous We and type primitives. creation data process algebraic with pattern-matching mented with language tional shr ovrf.I h aeo ragporm,teinheren the programs, Erlang of case the In verify. to hard as Challenges. readable highly and Armstrong’s quick p see a certain Erlang, For to a mailbox. duction matches its that in message arrive to a tern for waiting while no block FIFO may is not send pattern-matchin mailbox, via that the (FIFFO) from First-In-First-Firable-Out sense retrieved the are mail in Messages unbounded blocking. asynchronously an sent with equipped are is Messages p and unique (pid), a identifier has processes process cess Every of passing. network message dynamic by a communicate of consists computation lang the Following types. data pattern-ma gebraic with language functional call-by-value typed, tools. programmi testing GU for and databases, control fit parallel monitoring, natural servers, mes networked a CPUs, and is multicore creation Erlang process communication, efficient passing highly offers that system hspprpeet napoc ovrf aeyproperties introduce we safety Erlang, verify Core by to spired programs approach concurrent an higher-order Erlang-style, presents paper This Abstract PSPACE h eunilpr fEln sahge re,dynamically order, higher a is Erlang of part sequential The [email protected] olipeetto ftevrfiainmto,thereby method, verification the of implementation tool a , co omnctn System Communicating Actor -complete, mneeD’Osualdo Emanuele nvriyo Oxford of University eicto,IfiieSaeMdlCekn,Static Checking, Model Infinite-State Verification, uoai eicto fEln-tl Concurrency Erlang-Style of Verification Automatic ocretporm r adt rt.Te r just are They write. to hard are programs Concurrent Soter λ A a nls hs eicto problems verification these analyse can CTOR 2 , n s tt ul oyiecom- polytime a build to it use and 3 co model actor .Lreybcueo runtime a of because Largely ]. CACM λ AS hc a natural a has which (ACS) A CTOR λ ril [ article A CTOR rttpclfunc- prototypical a , [ 1 ,acnurn Er- concurrent a ], automatically λ rgas which programs, A 2 CTOR ]. [email protected] .Aprocess A g. ekrfor hecker cigal- tching programs istributed reload- e oahnKochems Jonathan nvriyo Oxford of University written ,aug- s, s and Is, more a inter- t E s intro- ucted sage- erifi- ions, ange but , box. In- . sing that ro- at- ng X n- of d, - - t ug ihatrsyeconcurrency. actor-style with guage lang abstract hc xiisi ultehge-re etrso Erlang, of pro features dynamic creation. higher-order and concurrency the message-passing full asynchronous in exhibits which assuming concurrency, passing ore fifiiyi h tt space. state d the several in from infinity seen of be sources can task verification the of complexity duce cino nubuddsto omnctn rcse.A AC states An control of processes. set communicating finite of a set has unbounded an of action ihamibxo ieoeadcmuiaigmsae rma from [ messages powerful s communicating Turing fixed already and is a one set, size of finite of consisting whic mailbox class and a the ea with possible processes, us: first-order) as (equivalently, upon context-free accurately forced of as is w model of abstract decision to to key infinity the extent, of large causes a To checking. model state automatically bu h snhooscmuiaino nubuddstof set proce Turing-powerful unbounded of set an unbounded of an across communication messages, r must asynchronous one the that is about programs Erlang verifying of challenge The omncto ieeffect side communication fpdclass pid of i h rcs ae nitra rniin(i textract it (ii) transition message internal a an makes process the (i) rniinrl a h shape the has rule transition etr fitgr eadda rniinrls A defin VAS A rules. transition as regarded integers of vectors iesto messages of set nite ealta A fdimension of VAS a that Recall pro distinguishable. and not are fixed, arbi- class is grow pid classes may same pid the ACS of of an set unboun the in However has processes large. process of trarily a number of the mailbox and the capacity, state: infinite are models eso i class pid of cess tem ( ( ( ( ( ∞ ∞ ∞ ∞ ∞ ecnie rgasof programs consider We ihdcdbeifiiesaemdlcekn nmn,w int we mind, in checking model infinite-state decidable With u oli ovrf aeypoete fEln-ieprogr Erlang-like of properties safety verify to is goal Our nAScnb nepee aual sa as naturally interpreted be can ACS An VS,o qiaetyPtint sn one abstractio counter using net, Petri equivalently or (VAS), [ co omnctn System Communicating Actor )Gnrlrcrinrqie poeslcl call-stack local) (process a requires recursion General 1) )Hge-re ucin r rtcasvle;closures values; first-class are functions Higher-order 2) )Mibxshv none capacity. unbounded have Mailboxes 5) )A none ubro rcse a esanddy- spawned be can processes of number unbounded An 4) )Dt oan,adhnetemsaesae r un- are space, message the hence and domains, Data 3) 5 —h fca nemdaerpeetto fEln code, Erlang of representation intermediate official ]—the ( namically. one:fntosmyrtr,advralsmybe may variables size. arbitrary an and of terms return, to, bound may functions bounded: epse sprmtr rreturned. or parameters as passed be ∞ m 1) ι sn obnto fsai nlssadinfinite- and analysis static of combination a using , a rniinfo state from transition can rmismibx(i)i ed message a sends it (iii) mailbox its from , ( ∞ ι ′ 2) M i)i pwsapoeso i class pid of process a spawns it (iv) and n nt e ftasto ue.A ACS An rules. transition of set finite a and , ( ∞ ℓ fwihteeaefu id,namely, kinds, four are there which of , ι λ : 3) A q Q ( hl ekn oaayemessage- analyse to seeking while , CTOR ∞ −→ nt e of set finite a , ℓ [email protected] nvriyo Oxford of University n .H ueOng Luke C.-H. 4) AS,wihmdl h inter- the models which (ACS), λ q rttpclfntoa lan- functional prototypical a , ′ sgvnb e of set a by given is A and hc en htaprocess a that means which , 10 q CTOR ostate to .Orsrtg stu to thus is strategy Our ]. ( ∞ sessentially is 5) etradto sys- addition vector i classes pid . q ′ ih(possible) with hequipped ch m n reads and s oeEr- Core oapro- a to ι P ′ n ACS . cesses iverse eason sses. -long fi- a , with hich cess . sa es ams ded can ro- n. et S h state transition graph whose states are just n-long vectors of non- Notation. We write A∗ for the set of finite sequences of elements negative integers. There is a transition from state v to state v′ just if of the set A, and ǫ for the null sequence. Let a ∈ A and l,l′ ∈ v′ = v+r for some transition rule r. It is well-known that the deci- A∗, we overload ‘·’ so that it means insertion at the top a · l, at ′ sion problems Coverability and LTL Model Checking for VAS are the bottom l · a or concatenation l · l . We write li for the i-th EXPSPACE-complete; Reachability is decidable but its complex- element of l. The set of finite partial functions from A to B is ity is open. We consider a particular counter abstraction of ACS, denoted A ⇀ B. Given f : A ⇀ B we define f[a 7→ b] := called VAS semantics, which models an ACS as a VAS of dimen- (λx. if (x = a) then b else f(x)) and write [] for the everywhere sion |P |× (|Q| + |M|), distinguishing two kinds of counters. A undefined function. counter named by a pair (ι, q) counts the number of processes of pid class ι that are currently in state q; a counter named by (ι, m) 2. A Prototypical Fragment of Erlang counts the sum total of occurrences of a message m currently in the mailbox of p, where p ranges over processes of pid class ι. Us- In this section we introduce λACTOR, a prototypical untyped func- ing this abstraction, we can conservatively decide properties of the tional language with actor concurrency. λACTOR is essentially ACS using well-known decision procedures for VAS. single-node Core Erlang [5]—the official intermediate representation of Erlang code—without built-in functions and fault-tolerant Parametric, Flow-based Abstract Interpretation. The starting features. It exhibits in full the higher-order features of Erlang, with point of our verification pathway is the abstraction of the sources of message-passing concurrency and dynamic process creation. infinity (∞ 1), (∞ 2) and (∞ 3). Methods such as k-CFA [32] can Syntax The syntax of ACTOR is defined as follows: be used to abstract higher-order recursive functions to a finite-state λ system. Rather than ‘baking in’ each type of abstraction separately, e ∈ Exp ::= x | c(e1,...,en) | e0 (e1,...,en ) | fun we develop a general abstract interpretation framework which is | letrec x1=fun . · · · xn=fun . in e parametric on a number of basic domains. In the style of Van Horn 1 n | case e of pat → e ; . . . ; pat → e end and Might [33], we devise a machine-based operational semantics 1 1 n n receive end of λACTOR that uses store-allocated continuations. The advantage | pat1 → e1; . . . ; patn → en of such an indirection is that it enables the construction of a ma- | send(e1,e2) | spawn(e) | self () chine semantics which is ‘generated’ from the basic domains of fun ::= fun(x1,...,xn) → e Time, Mailbox and Data. We show that there is a simple no- pat ::= x | c(pat1,...,patn) tion of sound abstraction of the basic domains whereby every such abstraction gives rise to a sound abstract semantics of λACTOR pro- where c ranges over a finite set Σ of constructors which we grams (Theorem 1). Further if a given sound abstraction of the consider fixed thorough out the paper. basic domains is finite and the associated auxiliary operations are For ease of comparison we keep the syntax close to Core Erlang computable, then the derived abstract semantics is finite and com- and use uncurried functions, delimiters, fun and end. We write ‘ ’ putable. for an unnamed unbound variable; using symbols from Σ, we write n-tuples as {e1,..., en}, the list constructors as cons [ | ] Generating an Actor Communicating System. We study the ab- and the empty list as [] . Sequencing (e1 , e2) is a shorthand for stract semantics derived from a particular 0-CFA-like abstraction (fun( )→e2)(e1) and we we omit brackets for nullary constructors. of the basic domains. However we do not use it to verify properties The character ‘%’ marks the start of a line of comment. Variable of λACTOR programs directly, as it is too coarse an abstraction to be names begin with an uppercase letter, except when bound by letrec . useful. Rather, we show that a sound ACS (Theorem 3) can be con- The free variables fv(e) of an expression are defined as usual. A structed in polynomial time by bootstrapping from the 0-CFA-like λACTOR program P is just a closed λACTOR expression. abstract semantics. Further, the dimension of the resulting ACS is polynomial in the length of the input λACTOR program. The idea Labels For ease of reference to program points, we associate a is that the 0-CFA-like abstract (transition) semantics constitutes a unique label to each sub-expression of a program. We write ℓ: e to sound but rough analysis of the control-flow of the program, which mean that ℓ is the label associated with e, and we often omit the takes higher-order computation into account but communicating label altogether. Take a term ℓ: (ℓ0 : e0(ℓ1 : e1,...,ℓn : en)), we behaviour only minimally. The bootstrap construction consists in define ℓ.argi := ℓi and arity(ℓ) := n. constraining these rough transitions with guards of the form ‘re- Semantics The semantics of λACTOR is defined in Section 4, but ceive a message of this type’ or ‘send a message of this type’ or we informally present a small-step reduction semantics here to give ‘spawn a process’, thus resulting in a more accurate abstract model an intuition of its model of concurrency. The rewrite rules for the of the input λACTOR program in the form of an ACS. cases of function application and λ-abstraction are the standard Evaluation. To demonstrate the feasibility of our verification ones for call-by-value λ-calculus; we write evaluation contexts as method, we have constructed a prototype implementation called E[ ]. A state of the computation of a λACTOR program is a set Π of Soter. Our empirical results show that the abstraction framework is ι processes running in parallel. A process heim, identified by the pid accurate enough to verify an interesting range of safety properties m of non-trivial Erlang programs. ι, evaluates an expression e with mailbox holding the messages not yet consumed. Purely functional reductions with no side-effect Outline. In Section 2 we define the syntax of λACTOR and infor- take place in each process, independently interleaved. A spawn mally explain its semantics with the help of an example program. In construct, spawn(fun()→e), evaluates to a fresh pid ι′ (say), with ′ ι ′ Section 3, we introduce Actor Communicating System and its VAS the side-effect of the creation of a new process, heiǫ , with pid ι : semantics. In Section 4 we present a machine-based operational se- ′ ι ′ ι ι mantics of λACTOR. In Section 5 we develop a general abstract inter- hE[spawn(fun()→e)]im k Π −→ hE[ι ]im kheiǫ k Π pretation framework for λACTOR programs, parametric on a number A send construct, send(ι, v), evaluates to the message v with the of basic domains. In Section 6, we use a particular instantiation side-effect of appending it to the mailbox of the receiver process ι; of the abstract interpretation to bootstrap the ACS construction. In thus send is non-blocking: Section 7 we present the experimental results based on our tool im- ′ ′ ι ι ι ι plementation Soter, and discuss the limitations of our approach. hE[send(ι, v)]im′ kheim k Π −→ hE[v]im′ kheim·v k Π The evaluation of a receive construct, receive p1 → e1 ...pn → 1 letrec en end, will block if the mailbox of the process in question contains 2 %%% LOCKED RESOURCE MODULE no message that matches any of the patterns pi. Otherwise, the 3 res start = fun(Res) → spawn(fun() → res free (Res)). first message m that matches a pattern, say pi, is consumed by 4 res free = fun(Res) → the process, and the computation continues with the evaluation 5 receive {lock, P}→ of ei. The pattern-matching variables in ei are bound by θ to the 6 send(P, {acquired, self () }), res locked (Res, P) corresponding matching subterms of the message m; if more than 7 end. one pattern matches the message, then (only) the first in textual 8 res locked = fun(Res, P) → 9 receive order is fired. 10 {req, P, Cmd}→ 11 case Res(P, Cmd) of ι hE[receive p1 → e1 ...pn → en end]im·m·m′ k Π 12 {NewRes, ok}→ ′ ι 13 res locked (NewRes, P); −→ hE[θei]im m′ k Π; · 14 {NewRes, {reply, A}} → 15 send(P, {ans, self (), A}), Note that message passing is not First-In-First-Out but rather 16 res locked (NewRes, P) First-In-First-Fireable Out (FIFFO): incoming messages are queued 17 end; at the end of the mailbox but the message that matches a receive 18 {unlock, P}→ res free (Res) construct, and is subsequently extracted, is not necessarily the first 19 end. in the queue. 20 21 % Locked Resource API 22 res lock = fun(Q) →send(Q, {lock, self () }), Example 1 (Locked Resource). Figure 1 shows an example 23 receive {acquired, Q}→ ok end. λACTOR program. The code has three logical parts, which would 24 res unlock = fun(Q) →send(Q, {unlock, self () }). constitute three modules in Erlang. The first part defines an Erlang 25 res request = fun(Q, Cmd) → 1 behaviour that governs the lock-controlled, concurrent access of 26 send(Q, {req, self (), Cmd}), a shared resource by a number of clients. A resource is viewed 27 receive {ans, Q, X}→ X end. as a function implementing a protocol that reacts to requests; the 28 res do = fun(Q, Cmd) →send(Q, {req, self () , Cmd}). function is called only when the lock is acquired. Note the use of 29 higher-order arguments and return values. The function res start 30 %%% CELL IMPLEMENTATION MODULE 31 fun creates a new process that runs an unlocked ( res free ) instance cell start = () → res start (cell (zero)). 32 cell = fun(X) → of the resource. When unlocked, a resource waits for a {lock , P} 33 fun( P, Cmd) → message to arrive from a client P. Upon receipt of such a mes- 34 case Cmd of sage, an acknowledgement message is sent back to the client and 35 {write , Y} → {cell (Y), ok}; the control is yielded to res locked . When locked (by a client P), 36 read → {cell (X), {reply , X}} a resource can accept requests {req,P,Cmd} from P—and from P 37 end. only—for an unspecified command Cmd to be executed. 38 After running the requested command, the resource is expected 39 % Cell API 40 fun to return the updated resource handler and an answer, which may cell lock = (C) → res lock (C). 41 cell unlock = fun(C) → res unlock (C). be the atom ok, which requires no additional action, or a couple 42 cell read = fun(C) → res request (C, read). {reply , Ans} which signals that the answer Ans should be sent 43 cell write = fun(C, X) → res do (C, {write , X}). back to the client. When an unlock message is received from P 44 the control is given back to res free . Note that the mailbox match- 45 %%% INCREMENT CLIENT ing mechanism allows multiple locks and requests to be sent asyn- 46 inc = fun(C) → cell lock (C), chronously to the mailbox of the locked resource without causing 47 cell write (C, {succ, cell read (C)}), conflicts: the pattern matching in the locked state ensures that all 48 cell unlock (C). the pending lock requests get delayed for later consumption once 49 add to cell = fun(M, C) → 50 case M of zero → ok; the resource gets unlocked. The functions res lock , res unlock , 51 {succ,M’} → spawn(fun() → inc (C)), res request , res do encapsulate the locking protocol, hiding it from 52 add to cell (M’, C) the user who can then use this API as if it was purely functional. 53 end. The second part implements a simple ‘shared memory cell’ 54 %%% ENTRY POINT resource that holds a natural number, which is encoded using the 55 in C = cell start (), add to cell (N, C). constructors zero and {succ, }, and allows a client to read its value (the command read) or overwrite it with a new one (the {write , X} Figure 1. Locked Resource (running example) command). Without locks, a shared resource with such a protocol easily leads to race conditions. The last part defines the function inc which accesses a locked cell to increment its value. The function add to cell adds M to An interesting correctness property of this code is the mutual the contents of the cell by spawning M processes incrementing exclusion of the lock-protected region (i.e. line 47) of the concur- it concurrently. Finally the entry-point of the program sets up a rent instances of inc. process with a shared locked cell and then calls add to cell . Note that N is a free variable; to make the example a program we can Remark 1. The following Core Erlang features are not captured by either close it by setting N to a constant or make it range over all λACTOR. (i) Module system, exception handling, arithmetic primi- natural numbers with the extension described in Section 5. tives, built-in data types and I/O can be straightforwardly translated or integrated into our framework. They are not treated here because they are tied to the inner workings of the Erlang runtime sys- 1 I.e. a module implementing a general purpose protocol, parametrised over tem. (ii) Timeouts in receives, registered processes and type guards another module containing the code specific to a particular instance. can be supported using suitable abstractions. (iii) A proper treat- ment of monitor / link primitives and the multi-node semantics will tems are known as vector addition systems (VAS), which are equiv- require a major extension of the concrete (and abstract) semantics. alent to Petri nets. Definition 2 (Vector Addition System). (i) A vector addition sys- 3. Actor Communicating Systems tem (VAS) V is a pair (I, R) where I is a finite set of indices (called ZI In this section we explore the design space of abstract models of the places of the VAS) and R ⊆ is a finite set of rules. Thus a Erlang-style concurrency. We seek a model of computation that rule is just a vector of integers of dimension |I|, whose components should capture the core concurrency and asynchronous communi- are indexed (i.e. named) by the elements of I. cation features of ACTOR and yet enjoys the decidability of interest- (ii) The state transition system JVK induced by a VAS V =(I, R) λ I ing verification problems. In the presence of pattern-matching alge- has state-set N and transition relation braic data types, the (sequential) functional fragment of λACTOR is {(v, v + r) | v ∈ NI , r ∈ R, v + r ∈ NI }. already Turing powerful [28]. Restricting it to a pushdown (equiv- ′ ′ alently, first-order) fragment but allowing concurrent execution We write v ≤ v just if for all i in I, v(i) ≤ v (i). would enable, using very primitive synchronization, the simulation The semantics of an ACS can now be given easily in terms of a of a Turing-powerful finite automaton with two stacks. A single corresponding underlying vector addition system: finite-control process equipped with a mailbox (required for asynchronous communication) can encode a Turing-powerful queue Definition 3 (VAS semantics). The semantics of an ACS A = automaton in the sense of Minsky. Thus constrained, we opt for (P,Q,M,R,ι0, q0) is the transition system induced by the VAS a model of concurrent computation that has finite control, a finite V =(I, R) where I = P × (Q ⊎ M) and R = {r | r ∈ R}). The number of messages, and a finite number of process classes. transformation r 7→ r is defined as follows. 2 Definition 1. An Actor Communicating System (ACS) A is a tuple ACS Rules: r VAS Rules: r hP,Q,M,R,ι , q i where P is a finite set of pid-classes, Q is a τ 0 0 ι: q −→ q′ [(ι, q) 7→ −1, (ι, q′) 7→ 1] finite set of control-states, M is a finite set of messages, ι0 ∈ P is ?m ′ ′ the pid-class of the initial process, q0 ∈ Q is the initial state of the ι: q −−→ q [(ι, q) 7→ −1, (ι, q ) 7→ 1, (ι, m) 7→ −1] ′ ℓ ι m initial process, and R is a finite set of rules of the form ι: q −→ q′ ι: q −−−→! q′ [(ι, q) 7→ −1, (ι, q′) 7→ 1, (ι′,m) 7→ 1] ′ ′ ′′ where ι ∈ P , q, q ∈ Q and ℓ is a label that can take one of four νι . q ′ ′ ′ ′′ possible forms: ι: q −−−−→ q [(ι, q) 7→ −1, (ι, q ) 7→ 1, (ι , q ) 7→ 1] I - τ, which represents an internal (sequential) transition of a pro- Given a JVK-state v ∈ N , the component v(ι, q) counts the cess of pid-class ι number of processes in the pid-class ι currently in state q, while - ?m with m ∈ M: a process of pid-class ι extracts (and reads) the component v(ι, m) is the sum of the number of occurrences of a message m from its mailbox the message m in the mailboxes of the processes of the pid-class ι. ′ ′ - ι !m with ι ∈ P , m ∈ M: a process of pid-class ι sends a While infinite-state, many non-trivial properties are decidable ′ message m to a process of pid-class ι on VAS including reachability, coverability and place boundedness; - νι′.q′′ with ι′ ∈ P and q′′ ∈ Q: a process of pid-class ι spawns for more details see [13]. In this paper we focus on coverability, ′ ′′ a new process of pid-class ι that starts executing from q which is EXPSPACE-complete [30]: given two states s and t, is it possible to reach from s a state t′ that covers t (i.e. t′ ≤ t)? Now we have to give ACS a semantics, but interpreting the ACS mailboxes as FIFFO queues would yield a Turing-powerful model. Which kinds of correctness properties of λACTOR programs can Our solution is to apply a counter abstraction on mailboxes: dis- one specify by coverability of an ACS? We will be using ACS to regard the ordering of messages, but track the number of occur- over-approximate the semantics of a λACTOR program, so if a state rences of every message in a mailbox. Since we bound the number of the ACS is not coverable, then it is not reachable in any execu- of pid-classes, but wish to model dynamic (and hence unbounded) tion of the program. It follows that we can use coverability to ex- spawning of processes, we apply a second counter abstraction on press safety properties such as: (i) unreachability of error program the control states of each pid-class: we count, for each control-state locations (ii) mutual exclusion (iii) boundedness of mailboxes: is it of each pid-class, the number of processes in that pid-class that are possible to reach a state where the mailbox of pid-class ι has more currently in that state. than k messages? If not we can allocate just k memory cells for It is important to make sure that such an abstraction contains that mailbox. all the behaviours of the semantics that uses FIFFO mailboxes: if there is a term in the mailbox that matches a pattern, then the 4. An Operational Semantics for λACTOR corresponding branch is non-deterministically fired. To see the difference, take the ACS that has one process (named ι), three In this section, we define an operational semantics for λACTOR ?a ?b using a time-stamped CESK* machine, following a methodology control states q, q and q , and two rules ι: q −→ q , ι: q −→ q . 1 2 1 2 advocated by Van Horn and Might [33]. An unusual feature of When equipped with a FIFFO mailbox containing the sequence such machines are store-allocated continuations which allow the cab, the process can only evolve from q to q by consuming a from 1 recursion in a programs’s control flow and data structure to be the mailbox, since it can skip c but will find a matching message separated from the recursive structure in its state space. As we shall (and thus not look further into the mailbox) before reaching the illustrate in Section 5, such a formalism is key to a transparently message b. In contrast, the VAS semantics would let q evolve non- sound and parametric abstract interpretation. deterministically to both q1 and q2, consuming a or b respectively: the mailbox is abstracted to [a 7→ 1,b 7→ 1, c 7→ 1] with no A Concrete Machine Semantics. Without loss of generality, we information on whether a or b arrived first. However, the abstracted assume that in a λACTOR program, variables are distinct, and con- semantics does contain the traces of the FIFFO semantics. structors and cases are only applied to (bound) variables. The The VAS semantics of an ACS is a state transition system equipped with counters (with values in N) that support increment 2 All unspecified components of the vectors r as defined in the table are set and decrement (when non-zero) operations. Such infinite-state sys- to zero. λACTOR machine defines a transition system on (global) states, - Stop represents the empty context. which are elements of the set State - Argihℓ, v0 ...vi−1,ρ,ai represents the context s ∈ State := Procs × Mailboxes × Store ′ ′ E[v0(v1,...,vi−1, [ ],ei+1,...,en)] π ∈ Procs := Pid ⇀ ProcState where e0(e1,...,en) is the subterm located at ℓ; ρ closes the terms Mailboxes : Pid Mailbox ′ ′ µ ∈ = ⇀ ei+1,...,en to ei+1,...,en respectively; the address a points to An element of Procs associates a process with its (local) state, and the continuation representing the enclosing evaluation context E. an element of Mailboxes associates a process with its mailbox. We Addresses, Pids and Time-Stamps. While the machine supports split the Store into two partitions arbitrary concrete representations of time-stamps, addresses and σ ∈ Store :=(VAddr ⇀ Value) × (KAddr ⇀ Kont) pids, we present here an instance based on contours [32] which shall serve as the reference semantics of λACTOR, and the basis for each with its address space, to separate values and continuations. the abstraction of Section 5. By abuse of notation σ(x) shall mean the application of the first A way to represent a dynamic occurrence of a symbol is the component when x ∈ VAddr and of the second when x ∈ KAddr. history of the computation at the point of its creation. We record The local state of a process history as contours which are strings of program locations q ∈ ProcState :=(ProgLoc ⊎ Pid) × Env × KAddr × Time t ∈ Time := ProgLoc∗ 3 is a tuple, consisting of (i) a pid, or a program location which The initial contour is just the empty sequence t0 := ǫ, while is a subterm of the program, labelled with its occurrence; when- the tick function updates the contour of the process in question ever it is clear from the context, we shall omit the label; (ii) an by prepending the current program location, which is always a environment, which is a map from variables to pointers to values function call (see rule Apply): ρ ∈ Env := Var ⇀ VAddr; (iii) a pointer to a continuation, tick: ProgLoc × Time → Time tick(ℓ,t) := ℓ · t which indicates what to evaluate next when the current evaluation returns a value; (iv) a time-stamp, which will be described later. Addresses for values (b ∈ VAddr) are represented by tuples Values are either closures or pids: comprising the current pid, the variable in question, the bound value and the current time stamp. Addresses for continuations d ∈ Value := Closure ⊎ Pid Closure := ProgLoc × Env (a, c ∈ KAddr) are represented by tuples comprising the cur- Note that, as defined, closures include both functions (which is rent pid, program location, environment and time (i.e. contour); or standard) as well as constructor terms. ∗ which is the address of the initial continuation (Stop). All the domains we define are naturally partially ordered: VAddr := Pid × Var × Data × Time ProgLoc and Var are discrete partial orders, all the others are KAddr : Pid ProgLoc Env Time ∗ defined by the appropriate pointwise extensions. =( × × × ) ⊎ { } The data domain (δ ∈ Data) is the set of closed λACTOR terms; the Mailbox and Message Passing A mailbox is just a finite sequence function res : Store ×Value → Data resolves all the pointers of a m ∗ of values: ∈ Mailbox := Value . We denote the empty mailbox value through the store σ, returning the corresponding closed term: by ǫ. A mailbox is supported by two operations: res(σ, ι) := ι mmatch: pat ∗ × Mailbox × Env × Store → res(σ, (e,ρ)) := e[x 7→ res(σ, σ(ρ(x))) | x ∈ fv(e)] (N × (Var ⇀ Value) × Mailbox) ⊥ New addresses are allocated by extracting the relevant compo- enq: Value × Mailbox → Mailbox nents from the context at that point: The function takes a list of patterns, a mailbox, the current mmatch newkpush : Pid × ProcState → KAddr environment and a store (for resolving pointers in the values stored in the mailbox) and returns the index of the matching pattern, a newkpush(ι, hℓ,ρ, ,ti) :=(ι, ℓ.arg0,ρ,t) substitution witnessing the match, and the mailbox resulting from newkpop : Pid × Kont × ProcState → KAddr the extraction of the matched message. To model Erlang-style newkpop(ι, κ, h , , ,ti) :=(ι, ℓ.arg ,ρ,t) FIFFO mailboxes we set enq(d, m) := m · d and define: i+1 where κ = Argihℓ,...,ρ, i mmatch(p1 ...pn, m,ρ,σ) :=(i, θ, m1 · m2) newva : Pid × Var × Data × ProcState → VAddr such that newva(ι, x, δ, h , , ,ti) :=(ι,x,δ,t) m = m · d · m ∀d′ ∈ m . ∀j . match (p ,d′)= ⊥ 1 2 1 ρ,σ j Remark 2. To enable data abstraction in our framework, the address θ = match (p ,d) ∀j

d b Functional reductions Vars Process creation FunEval if π(ι) = hx, ρ, a, ti Spawn ′ if π(ι)= hℓ: (e0(e1,...,en)), ρ, a, ti σ(ρ(x)) = (v, ρ ) if π(ι) = hfun() → e, ρ, a, ti ′ ′ ′ b := newkpush(ι,π(ι)) then π = π[ι 7→ hv, ρ , a, ti] σ(a) = Arg1hℓ, d, ρ ,ci ′ then π = π[ι 7→ he0, ρ, b, ti] d = (spawn, ) ′ ′ σ = σ[b 7→ Arg0hℓ,ǫ,ρ,ai] Communication ι := newpid(ι,ℓ,ϑ) then ArgEval Receive ′ ′ ′ ι 7→ hι , ρ ,c,ti, if π(ι)= hv, ρ, a, ti if π(ι) = hreceive p1 → e1 ...pn → en end, ρ, a, ti π = π ′ ′ ∗ σ(a) = κ = Argihℓ, d0 ...di−1, ρ ,ci mmatch(p1 ...pn, µ(ι), ρ, σ) = (i,θ, m) " ι 7→ he,ρ, , t0i # di := (v, ρ) θ =[x1 7→ d1 ...xk 7→ dk] µ′ = µ[ι′ 7→ ǫ] b := newkpop(ι,κ,π(ι)) bj := newva(ι,xj , res(σ, dj ),π(ι)) ′ ′ ′ Self then π = π[ι 7→ hℓ.argi+1, ρ , b, ti] ρ := ρ[x1 7→ b1 ...xk 7→ bk] σ′ = σ[b 7→ Arg hℓ, d ...d , ρ′,ci] then π′ = π[ι 7→ he , ρ′, a, ti] if π(ι)= h self (), ρ, a, ti i+1 0 i i ′ µ′ = µ[ι 7→ m] then π = π[ι 7→ hι, ρ, a, ti] Apply σ′ = σ[b 7→ d ...b 7→ d ] if π(ι)= hv, ρ, a, ti, arity(ℓ)= n 1 1 k k Initial state ′ σ(a) = κ = Argnhℓ, d ...dn− , ρ ,ci Send 0 1 Init The initial state associated with d0 = (fun(x1 ...xn) → e, ρ0) dn := (v, ρ) if π(ι)= hv, ρ, a, ti b : ι,x , σ, d ,π ι σ a κ ℓ,d,ι′, ,c a program P is sP := hπ0, µ0, σ0i i = newva( i res( i) ( )) ( ) = = Arg2h i ∗ t′ := tick(ℓ,π(ι)) d = (send, ) where π0 =[ι0 7→ hP, [], , t0i] ′ ′ ′ ′ µ0 =[ι0 7→ ǫ] then π = π[ι 7→ he, ρ [x1 → b1 ...xn → bn],c,t i] then π = π[ι 7→ hv,ρ,c,ti] ′ ′ ′ ′ σ0 =[∗ 7→ Stop] σ = σ[b1 7→ d1 ...bn 7→ dn] µ = µ[ι 7→ enq((v, ρ), µ(ι ))] Figure 2. Operational Semantics Rules. The tables define the transition relation s = hπ,µ,σ,ϑi→hπ′, µ′,σ′,ϑ′i = s′ by cases; the primed components of the state are identical to the non-primed components, unless indicated otherwise in the “then” part of the rule. The meta-variable v stands for terms that cannot be further rewritten such as λ-abstractions, constructor applications and un-applied primitives. straightforward abstractions of the original ones. In Figure 3, we We omit the straightforward proof that this constitutes a sound present the abstract counterparts of the rules for the operational abstraction. semantics in Figure 2, defining the non-deterministic abstract transition relation on abstract states ( ) ⊆ State[ × State[ . When Abstracting Data. We included data in the value addresses in the referring to a particular program P, the abstract semantics is the definition of VAddr, cutting contours would have been sufficient portion of the graph reachable from sP . to make this domain finite. A simple solution is to discard the value completely by using the trivial data abstraction Data 0 := Theorem 1 (Soundness of Analysis). Given a sound abstraction ′ { } which is sound. If more precision is needed, any finite data- of the basic domains, if s → s and αcfa(s) ≤ u, then there exists abstraction would do: the analysis would then be able to distinguish ′ [ ′ ′ ′ u ∈ State such that αcfa(s ) ≤ u and u u . states that differ only because of different bindings in their frame. See Appendix B for a proof of the Theorem. We present here a data abstraction particularly well-suited to Now that we have defined a sound abstract semantics we give languages with algebraic data-types such as λACTOR: the abstraction sufficient conditions for its computability. ⌊e⌋σ,Db discards every sub-term of e that is nested at a deeper level than a parameter D. Theorem 2 (Decidability of Analysis). If a given (sound) abstraction of the basic domains is finite, then the derived abstract transition relation defined in Figure 3 is finite; it is also decidable if the ⌊(e, ρ)⌋σ,b 0 := { } ⌊(fun..., ρ)⌋σ,Db +1 := { } associated auxiliary operations (in Definition 6) are computable. di ∈ σ(ρ(xi)), ⌊(c(x1 ...xn), ρ)⌋bσ,D+1 := c(δ1 . . . δn) b Proof. The proof is by a simple inspection of the rules: all the ( δi ∈ ⌊di⌋b ) b σ,D individual rules are decidable and the state space is finite. b b b b b b b A Simple Mailbox Abstraction Abstract mailboxes need to be where is a placeholder for discarded subterms. finite too in order for the analysis to be computable. By abstracting An analogous D-deep abstraction can be easily defined for con- addresses (and data) to a finite set, values, and thus messages, crete values and we use the same notation for both; we use the become finite too. The only unbounded dimension of a mailbox notation ⌊δ⌋ for the analogous function on elements of Data. becomes then the length of the sequence of messages. We then D We define DD = hData D, αD, resDi to be the ‘depth-D’ data abstract mailboxes by losing information about the sequence and abstraction where collecting all the incoming messages in an un-ordered set: Data := { } ∪ {c(δ .c . . δ ) | δ ∈ Data } P \ \ D+1 1 n i D Mset := h (Value), ⊆, ∪, αset, enqset, ∅, mmatchseti αD(δ) := ⌊δ⌋ resD(σ, d) := ⌊d⌋b where the abstract version of enq is the insertion in the set, as easily D b b b σ,D derived from the soundness requirement;d the matching function The proof of its soundness is easy and we omit it. b b is similarly derived from the correctness condition: writing ~p = c b p1 ...pn Abstracting Time. Let us now define a specific time abstraction that amounts to a concurrent version of a standard k-CFA. A k-CFA α (m) := {α(d) |∃i. m = d} enq (d, m) := {d} ∪ m set i set is an analysis parametric in k, which is able to distinguish dynamic d ∈ m, mmatch\ (~p, m, ρ, σ) := (i, θ, m) contexts up to the bound given by k. We proceed as in standard set d \b b b b ( θ ∈ matchρ,b σb(pi, d) ) k-CFA by truncating contours at length k to obtain their abstract

b b b b b b b b b Functional abstract reductions AbsVars Abstract process creation AbsFunEval if π(ι) ∋ hx, ρ, a, t i AbsSpawn σ(ρ(x)) ∋ (v, ρ′) if π(ι) ∋ q = hℓ: (e (e ,...,en)), ρ, a, t i 0 1 ′ ′ if π(ι) ∋ hfun() → e, ρ, a, t i then π = π ⊔ [ι 7→b {hv, ρ , a, t i}] b := new\kpush(ι, q) b b b b ′ σ(a) ∋ Arg1hℓ, d, ρ , ci then π′ π ι e , ρ, b, t b b b b b b =b ⊔ [ 7→ {h 0 i}] b b Abstract communication d = (spawn, ) b ′ b b b b b b b b b b b σ = σ ⊔ [b b7→b {Arg0hℓ,ǫ, ρ, ai}] ′ bιb:= new\pid(ι,b ℓ,bt )b b b AbsReceive AbsArgEvalb b b b then b ′ ′ b b b b b if π(ι) ∋ q = he, ρ, a, t i ι 7→ {hι , ρ , c, t i}, if π(ι) ∋ hv, ρ, a, t i πb′ = π ⊔ b b e = receive p → e ...pn → en end ′ ′ 1 1 " ι 7→ {he, ρ, ∗, t0i} # σ(a) ∋ κ = Argihℓ, d0 ... di−1, ρ , ci \ mmatch(b b b p1 ...pb nb, µb(ι), ρ, σ) ∋ (i, θ, m) ′ ′b b b b b bdbi := (v, ρb)b b µb = µb ⊔ [ι 7→ ǫ ] θ =[x1 7→ d1 ...xk 7→ dk] b b b b bb := newb\kpop(ι, κ,bq) b b b δj ∈ res(σ, dj ) b b b b b b AbsSelf thenbπ′ = π ⊔b[ι 7→ {hℓ.arg , ρ′, b, t i}] b b b b i+1 b := new\b(ι,x , δ , q )b b′ ′ j va j j σ = σ ⊔ [b 7→b {bArgb i+1hℓ, d0 ... di, ρ , ci}] ′ if π(ι) ∋ h self (), ρ, a, t i ρb :=cρ[x1b 7→b b1 ...xk 7→ bk] b b b b b b ′ ′ then π′ = π ⊔ [ι 7→ {hι, ρ, a, t i}] AbsApply thenb π = π ⊔ [ιb7→ {hbei,bρ , a, t i}] b b b b b b b ′ m b if π(ι) ∋ q = hv, ρ, a, t i, arity(ℓ)= n b µ =bµ[ι 7→ b ] b b b b b ′ Initial abstract state ′ σ = σ ⊔ [b1 7→ {d1} ... bbk 7→ {dk}] b b b b b b b σ(a) ∋ Argnhℓ, d0 ... dn−1, ρ , ci b b b b b b b b b bdb0 = (bfun(x1 b...xb bn ) → e, ρ0) dn := (v, ρ) AbsSend AbsInit The initial state associated b b b b b b b bδi ∈ res(σ, dib, ) b b b with a program P is if π(ι) ∋ hv, ρ, a, t i bbi := new\va(ι,xi, δi, q ) b b b ′ sP := α(sP )= hπ0, µ0, σ0i ′′ ′ σ(a) ∋ Arg2hℓ, d, ι , , ci ρ b := ρc[x1b 7→b b1 ...xn 7→ bn] b where π =[ι 7→ {hP, [], ∗, t i}] thenbπ′ = π ⊔ [ι 7→b {he,b ρb′′, c, tick(l, t )i}] b bd = (sendb ,b ) 0 0 0 ′ b µ =[ι 7→ ǫ ]b b b ′ thenb bπ = π ⊔ [ι 7→b {hb v, ρ,b c, t i}] 0 0 bσ = bσ ⊔ [b1 7→b {d1} ... bnb7→ {dn}] ∗ µb′ = µ[ι′ 7→ enq((v, ρ), µ(ι′))] σb0 =[b 7→ {Stop}] b b b b b b d b b b b b b b b b b b b b b b b ′ ′ Figure 3. Rules defining the Abstract Semantics. Theb tablesb describeb d the conditionsb b b under which a transitiobn s = hπ, µ, σi hπ , µ , σ′i = s′ can fire; the primed versions of the components of the states are identical to the non-primed ones unless indicated otherwise in the “then” part of the corresponding rule. We write for the join operation of the appropriate domain. ⊔ b b b b b b b b counterparts: 6. Generating the Actor Communicating System i k The CFA algorithm we presented allows us to derive a sound ‘flat’ Timek := ≤i≤k ProgLoc αt (ℓ1 ...ℓk · t) := ℓ1 ...ℓk 0 representation of the control-flow of the program. The analysis The simplestS analysis we can then define is the one induced by takes into account higher-order computation and (limited) informa- \ the basic domains abstraction hData 0, Time0, Mailbox seti. With tion about synchronization. Now that we have this rough scheme of this instantiation many of the domains collapse in to singletons. the possible transitions, we can ‘guard’ those transitions with ac- Implementing the analysis as it is would lead however to an expo- tions which must take place in their correspondence; these guards, nential algorithm because it would record separate store and mail- in the form of ‘receive a message of this form’ or ‘send a message boxes for each abstract state. To get a better complexity bound, we of this form’ or ‘spawn this process’ cannot be modelled faith- apply a widening following the lines of [33, Section 7]: instead fully while retaining decidability of useful verification problems, of keeping a separate store and separate mailboxes for each state as noted in Section 3. The best we can do, while remaining sound, we can join them keeping just a global copy of each. This reduces is to relax the synchronization and process creation primitives with significantly the space we need to explore: the algorithm becomes counting abstractions and use the guards to restrict the applicabil- polynomial time in the size of the program (which is reflected in ity of the transitions. In other words, these guarded (labelled) rules the size of ProgLoc). will form the definition of an ACS that simulates the semantics of Considering other abstractions for the basic domains easily the input λACTOR program. leads to exponential algorithms; in particular, the state-space grows linearly wrt the size of abstract data so the complexity of the anal- Terminology. We identify a common pattern of the rules in Figure 3. In each rule R, the premise distinguishes an abstract pid and an ysis using DataD is exponential in D. ι abstract process state q = he, ρ, a, ti associated with ι i.e. q ∈ π(ι) Dealing with open programs. Often it is useful to verify an open and the conclusion of the rule associates a new abstractb process expression where its input is taken from a regular set of terms state—call it ′—with i.e. ′ ′ . Henceforth we shall refer q bι qb ∈b πb (ι) b b b b (see [28]). We can reproduce this in our setting by introducing a to (ι, q, q′) as the active components of the rule R. new primitive choice that non-deterministically calls one of its argu- b b b b b ments. For instance, an interesting way of closing N in Example 1 Definition 8 (Generated ACS). Given a λACTOR program P, a b b b would be by binding it to any num(): sound basic domains abstraction I = hT , M, Di and a sound data abstraction for messages Dmsg = hMsg, αmsg, resmsgi letrec ... the Actor communicating system generated by P, I and Dmsg is any num() = choice(fun() → zero, defined as fun() → {succ, any num()}). d c \ in C = cell start () , add to cell (any num(), C). AP := hPid, ProcState, Msg,R,α(ι0), α(π0(ι0))i

Now the uncoverability of the state where more than one instance where sP = hπ0, µ0,σ0,t0i is the initial state (according to Init) d d of inc is running the protected section would prove that mutual with π0 = [ι0 7→ hP, [], ∗,t0i] and the rules in R are defined by exclusion is ensured for any number of concurrent copies of inc. induction over the following rules. (i) If s s′ is proved by rule AbsFunEval or AbsArgEval or our algorithm would output the following ACS starting from AbsApply with active components (ι, q, q′), then ‘main’: 4 τ ′ νιs.server ιs!init ?ok ιs!set b b ι: q −→ q ∈ R (AcsTau) ι0 : main ′ b b b (ii) If s s is proved by AbsReceive with active components ?binit b ι0!ok ?binit ′ ′ ιs : server do serve receive error (ι, q, q ) where d = (pi,bρ )b is theb abstract message matched by b \ b ?set mmatchb andbm ∈ resmsg(σ, d), then b b b b b b mb ι: q −−→? q′ ∈ R (AcsRec) The error state is reachable in the CFA graph but not in its Parikh b c b b ′ semantics: the token init is only sent once and never after ok is (iii) If s s is proved by AbsSend with active components ′ b b b sent back to the main process. Once init has been consumed in the (ι, q, q ) where d is the abstract value that is sent and m ∈ transition from ‘server’ to ‘do serve’ the counter for it will remain resmsg(σ,b d), thenb set to zero forever. ′ b b b b bι !mb ′ b ι: q −−−→ q ∈ R (AcsSend) Theorem 3 (Soundness of generated ACS). For all choices of c b b ′ ′ ′ (iv) If s s is proved by AbsSpawn with active component I and Dmsg, for all concrete states s and s , if s → s and ′ ′ ′ ′ ′ (ι, q, q ) where ι is the new abstract pid that is generated in the αacs(s) ≤ v then there exists v such that αacs(s ) ≤ v , and b b b ′ premise of the rule, which gets associated with the process state v →acs v . b b ′′ ∗ then qb b=bhe, ρ, i b See Appendix C for a proof of the Theorem. ′ ′′ νbι .qb ′ ι: q −−−−→ q ∈ R (AcsSp) Corollary 1 (Simulation). Let AP be the ACS derived from a given b b λACTOR program P. We have JAP K simulates the semantics of P: As we will make precise later, keeping Pid and ProcState\ for each P-run s → s1 → s2 → . . . , there exists a JAP K-run small is of paramountb importanceb forb the model checking of the v →acs v1 →acs v2 →acs . . . such that αacs(s) = v and for all i, generated ACS to be feasible. This is the main reason why we keep d αacs(si) ≤ vi. the message abstraction independent from the data abstraction: this allows us to increase precision with respect to types of messages, Simulation preserves all paths so reachability (and coverability) which is computationally cheap, and keep the expensive precision is preserved. on data as low as possible. It is important to note that these two ′ ∗ Corollary 2. If there is no v ≥ αacs(s ) such that αacs(s) →acs v ‘dimensions’ are in fact independent and a more precise message then s 6→∗ s′. space enhances the precision of the ACS even when using Data 0 as the data abstraction. Example 3 (ACS Generated from Example 1). A (simplified) In our examples (and in our implementation) we use a Data D pictorial representation of the ACS generated by our procedure abstraction for messages where D is the maximum depth of the from the program in Example 1 (with the parametric entry point of receive patterns of the program. Section 5) is shown in Figure 4, using a 0-CFA analysis. The three pid-classes correspond to the starting process ι and the two static Definition 9. The abstraction function 0 calls of spawn in the program, the one for the shared cell process ιc \ N and the other, , for all the processes running . αacs : State → (Pid × (ProcState ⊎ Msg) → ) ιi binc The first component of the ACS, the starting one, just spawns relating concrete states and states of the ACS is defined as b a shared cell and an arbitrary number of concurrent copies of the d d b (ι, q) 7→ {ι | α(ι)= ι, α(π(ι)) = q} third component; these actions increment the counter associated with states ‘res free’ and ‘inc ’. The second component represents : 0 αacs(s) =  α(ι)= ι, the intended protocol quite closely; note that by abstracting mes- (bι, mb ) 7→ (ι, i) b b  α (res(σ, µ(ι) )) = m sages they essentially become tokens and do not have a payload msg i anymore. The rules of the third component clearly show its sequen- where  . b s = hπ,µ,σ b bi tial behaviour. The entry point is (ι , cell start). b 0 It is important to note that most of the decidable properties The VAS semantics is accurate enough in this case to prove mutual exclusion of, say, state ‘inc ’, which is protected by locks. of the generated ACS are not even expressible on the CFA graph b2 alone: being able to predicate on the contents of the counters means Let’s say for example that n > 0 processes of pid-class ιi reached we can decide boundedness, mutual exclusion and many other state ‘inc1’; each of them sent a lock message to the cell; note that now the message does not contain the pid of the requester expressive properties. The next example shows one simple way in b which the generated ACS can be more precise than the bare CFA so all these messages are indistinguishable; moreover the order graph. of arrival is lost, we just count them. Suppose that ιc is in state ‘res free’; since the counter for lock is n and hence not zero, the Example 2 (Generated ACS). Given the following program: rule labeled with ?lock is enabled; however, once fired the counter b letrec for ‘res free’ is zero and the rule is disabled. Now exactly one ack server = fun() → receive {init , P, X}→ can be sent to the ‘collective’ mailbox of pid-class ιi so the rule send(P, ok), do serve (X) receiving the ack is enabled; but as long as it is fired, the only ack end. message is consumed and no other process can proceed. This ιi b do serve = fun(X) → receive holds until the lock is released and so on. Hence only one process {init , , }→ error ; at a time can be in state ‘inc ’. This property can be stated as a {set , Y} → do serve (Y); 2 b {get , P} → send(P,X), coverability problem: can inc2 = 2 be covered? Since the VAS do serve (X); semantics is given in terms of a VAS, the property is decidable end. in S = spawn(server), send(S, {init , self (), a}), 4 Labels are abbreviated to unclutter the picture; for example {init, , } receive ok → send(S, {set , b}) end. is abbreviated with init νιi.inc0 tions performed by different processes are independent, thanks to the actor model paradigm. This allows for the use of preorder re- τ νιc.res free b τ ductions. In our prototype, as described in Section 7, we imple- ι0 : cell start res start sp inc stop mented a simple reduction that safely removes states which only b represent internal functional transitions, irrelevant to the property at hand. This has proven to be a simple yet effective transformation b ι !ack ιi!ans ?lock ack i cell yielding a significant speedup. We conjecture that, after the reduction, the cardinality of ProcState\ is quadratic only in the number res free b res locked b τ ιc : τ of send, spawn and receive of the program. ?unlock Res b ?req 7. Evaluation, Limitations and Extensions

ιc!lock ιc!unlock Empirical Evaluation. To evaluate the feasibility of the ap- ιi : inc0 inc1 inc5 stop proach, we have constructed Soter, a prototype implementation of our method for verifying Erlang programs. Written in Haskell, b ?ack ιc!breq b ιc!req ?ans Soter takes as input a single Erlang module annotated with safety inc2 inc3 inc4 properties in the form of simple assertions. Soter supports the full b b higher-order fragment and the (single-node) concurrency and com- Figure 4. ACS generated by the algorithm from Example 1 munication primitives of Erlang; features not supported by Soter are described in Remark 1. For more details about the tool see [11]. The annotated Erlang module is first compiled to Core Erlang by and the answer can be algorithmically calculated. As we saw the the Erlang compiler. A 0-CFA-like analysis, with support for the answer is negative and then, by soundness, we can infer it holds in DataD data and message abstraction, is then performed on the the actual semantics of the input program too. compile; subsequently an ACS is generated. The ACS is simplified and then fed to the backend model-checker along with coverability Complexity of the Generation. Generating an ACS from a pro- queries translated from the annotations in the input Erlang program amounts to calculating the analysis of Section 5 and aggre- gram. Soter’s backend is the tool BFC [18] which features a fast gating the relevant ACS rules for each transition of the analysis. coverability engine for a variant of VAS. At the end of the verifi- Since we are adding O(1) rules to R for each transition, the com- cation pathway, if the answer is YES then the program is safe with plexity of the generation is the same as the complexity of the anal- respect to the input property, otherwise the analysis is inconclusive. ysis itself. The only reason for adding more than one rule to R In Table 1 we summarise our experimental results. Many of the for a single transition is the cardinality of Msg but since this costs examples are higher-order and use dynamic (and unbounded) pro- only a constant overhead, increasing the precision with respect to cess creation and non-trivial synchronization. Example 1 appears message types is not as expensive as adoptingd more precise data as reslock and Soter proves mutual exclusion of the clients’ critical abstractions. section. concdb is the example program of [16] for which we prove mutual exclusion. pipe is inspired by the ‘pipe’ example of [19]; Dimension of the Abstract Model. The complexity of coverabil- the property proved here is boundedness of mailboxes. sieve is a ityonVASis EXPSPACE in the dimension of the VAS; hence for the dynamically spawning higher-order concurrent implementation of approach to be practical, it is critical to keep the number of com- Erathostene’s sieve inspired by a program by Rob Pike;5 Soter can ponents of the VAS underlying the generated ACS small; in what prove all the mailboxes are bounded. follows we call dimension of an ACS the dimension of the VAS All example programs, annotated with coverability queries, can underlying its VAS semantics. be viewed and verified using Soter at http://mjolnir.cs.ox. \ Our algorithm produces an ACS with dimension (|ProcState|+ ac.uk/soter/. |Msg|)×|Pid|. With the 0-CFA abstraction described at the end of \ Limitations There are programs and properties that cannot be Section 5, ProcState is polynomial in the size of the program and proved using any of the presented abstractions. (i) The program Piddis lineard in the size of the program so, assuming |Msg| to be a in Figure 5 defines a simple function that discards a message in constant, the dimension of the generated ACS is polynomial in the the mailbox and feeds the next to its functional argument and so sized of the program, in the worst case. Due to the parametricid ty of on in a loop. Another process sends a ‘bad argument’ and a good the abstract interpretation we can adjust for the right levels of preci- one in alternation such that only the good ones are fed to the func- sion and speed. For example, if the property at hand is not sensitive tion. The property is that the function is never called with a bad to pids, one can choose a coarser pid abstraction. It is also possible argument. This cannot be proved because sequential information of to greatly reduce ProcState\ : we observe that many of the control the mailboxes, which is essential for the verification, is lost in the states result from intermediate functional reductions; such reduc- counter abstraction. (ii) The program in Figure 6 defines a higher-order combinator that spawns a number of identical workers, each applied to a different task in a list. It then waits for all the workers to return a result before collecting them in a list which is 1 letrec no a = fun(X)→ case X of a → error ; b → ok end. subsequently returned. The desired property is that the combinator 2 send b = fun(P)→send(P, b), send a(P). 3 send a = fun(P)→send(P, a), send b(P). only returns when every worker has sent back its result. Unfortu- 4 stutter = fun(F)→receive → unstut(F) end. nately to prove this property, stack reasoning is required, which is 5 unstut = fun(F)→receive X → F(X), stutter (F) end. beyond the capabilities of an ACS. 6 in P = spawn(fun()→stutter(no a)), send a(P). Refinement and Extensions. Our parametric definition of the abstract semantics allows us to tune the precision of the analysis Figure 5. A program that Soter cannot verify because of the sequencing in mailboxes 5 see “Concurrency and message passing in Newsqueak”, http://youtu. be/hB05UFqOtFA ABSTR ACS SIZE TIME Example LOC PRP SAFE? D M Places Ratio Analysis Simpl BFC Total reslock 356 1 yes 0 2 40 10% 0.56 0.08 0.82 1.48 sieve 230 3 yes 0 2 47 19% 0.26 0.03 2.46 2.76 concdb 321 1 yes 0 2 67 12% 1.10 0.16 5.19 6.46 state factory 295 2 yes 0 1 22 4% 0.59 0.13 0.02 0.75 pipe 173 1 yes 0 0 18 8% 0.15 0.03 0.00 0.18 ring 211 1 yes 0 2 36 9% 0.55 0.07 0.25 0.88 parikh 101 1 yes 0 2 42 41% 0.05 0.01 0.07 0.13 unsafe send 49 1 no 0 1 10 38% 0.02 0.00 0.00 0.02 safe send 82 1 no* 0 1 33 36% 0.05 0.01 0.00 0.06 safe send 82 4 yes 1 2 82 34% 0.23 0.03 0.06 0.32 firewall 236 1 no* 0 2 35 10% 0.36 0.05 0.02 0.44 firewall 236 1 yes 1 3 74 10% 2.38 0.30 0.00 2.69 finite leader 555 1 no* 0 2 56 20% 0.35 0.03 0.01 0.40 finite leader 555 1 yes 1 3 97 23% 0.75 0.07 0.86 1.70 stutter 115 1 no* 0 0 15 19% 0.04 0.00 0.00 0.05 howait 187 1 no* 0 2 29 14% 0.19 0.02 0.00 0.22

Table 1. Soter Benchmarks. The number of lines of code refers to the compiled Core Erlang. The PRP column indicates the number of properties which need to be proved. The columns D and M indicate the data and message abstraction depth respectively. In the “Safe?” column, “no*” means that the program satisfies the properties but the verification was inconclusive; “no” means that the program is not safe and Soter finds a genuine counterexample. “Places” is the number of places of the underlying Petri net after the simplification; “Ratio” is the ratio of the number of places of the generated Petri net before and after the simplification. All times are in seconds.

8. Related Work 1 letrec 2 worker= fun(Task) → ... Static Analysis. Verification or bug-finding tools for Erlang [6– 3 8, 20, 22, 27] typically rely on static analysis. The information 4 spawn wait= fun(F, L) → spawn wait’(F, fun()→[], L). obtained, usually in the form of a call graph, is then used to extract 5 spawn wait’= fun(F, G, L) → type constraints or infer runtime properties. Examples of static 6 case L of analyses of Erlang programs in the literature include data-flow [6], 7 [] → G(); 8 [ T|Ts] → control-flow [20, 27] and escape [7] analyses. 9 S = self (), Van Horn and Might [25] derive a CFA for a multithreaded ex- 10 C = spawn(fun() → tension of Scheme, using the same methodology [33] that we fol- 11 send(S, {ans, self (), F(T) })), low. The concurrency model therein is thread-based, and uses a 12 F’ = fun() → compare-and-swap primitive. Our contribution, in addition to ex- 13 receive tending the methodology to Actor concurrency, is to use the derived 14 {ans, C, R}→ [R | G() ] parametric abstract interpretation to bootstrap the construction of 15 end, an infinite-state abstract model for automated verification. 16 spawn wait’(F, F’, Ts) Reppy and Xiao [31] and Colby [9] analyse the channel commu- 17 end. 18 nication patterns of Concurrent ML (CML). CML is based on typed 19 in spawn wait(worker, [ task1, task2, ... ]). channels and synchronous message passing, unlike the Actor-based concurrency model of Erlang. Figure 6. A program that Soter cannot verify because of the stack Venet [34] proposed an abstract interpretation framework for the sanalysis of π-calculus, later extended to other process algebras by Feret [12] and applied to CAP, a process calculus based on the Actor model, by Garoche [15]. In particular, Feret’s non-standard when the abstraction is too coarse for the property to be proved. For semantics can be seen as an alternative to Van Horn and Might’s safety properties, the counter-example witnessing a no-instance is a methodology, but tailored for process calculi. finite run of the abstract model. We conjecture that, given a spurious counter-example, it is possible to compute a suitable refinement of the basic domains abstraction so that the counter-example is no Model Checking. Huch [16] uses abstract interpretation and longer a run of the corresponding abstract semantics. However a model checking to verify LTL-definable properties of a restricted na¨ıve implementation of the refinement loop would suffer from fragment of Erlang programs: (i) order-one (ii) tail-recursive (sub- state explosion. A feasible CEGAR loop will need to utilise sharper sequently relaxed in a follow-up paper [17]), (iii) mailboxes are abstractions: it is possible for example to pinpoint a particular pid bounded (iv) programs spawn a fixed, statically computable, num- or call or mailbox for which the abstract domains need to be more ber of processes. Given a data abstraction function, his method precise while coarsely abstracting the rest. The development of a transforms a program to an abstract, finite-state model; if a path fully-fledged CEGAR loop is a topic of ongoing research. property can be proved for the abstract model, then it holds for The general architecture of our approach, combining static anal- the input Erlang program. In contrast, our method can verify Er- ysis and abstract model generation, can be adapted to accommodate lang programs of every finite order, with no restriction on the size different language features and different abstract models. By appro- of mailboxes, or the number of processes that may be spawned. priate decoration of the analysis, it is possible to derive even more Since our method of verification is by transformation to a decid- complex models for which semi-decision verification procedures able infinite-state system that simulates the input program, it is have been developed [4, 21]. capable of greater accuracy. McErlang is a model checker for Erlang programs developed by [6] R. Carlsson, K. Sagonas, and J. Wilhelmsson. Message analysis for Fredlund and Svensson [14]. Given a program, a Büchi automaton, concurrent programs using message passing. ACM TOPLAS, 2006. and an abstraction function, McErlang explores on-the-fly a prod- [7] M. Christakis and K. Sagonas. Static detection of race conditions in uct of an abstract model of the program and the Büchi automaton erlang. PADL, pages 119–133, 2010. encoding a property. When the abstracted model is infinite-state, [8] M. Christakis and K. Sagonas. Detection of asynchronous message McErlang’s exploration may not terminate. McErlang implements passing errors using static analysis. PADL, pages 5–18, 2011. a fully-fledged Erlang runtime system, and it supports a substantial [9] C. Colby. Analyzing the communication topology of concurrent pro- part of the language, including distributed and fault-tolerant fea- grams. In PEPM, pages 202–213, 1995. tures. [10] E. D’Osualdo, J. Kochems, and C.-H. L. Ong. Verifying Erlang-style ACS can be expressed as processes in a suitable variant of concurrency automatically. Technical report, University of Oxford CCS [26]. Decidable fragments of process calculi have been used DCS Technical Report, 2011. http://mjolnir.cs.ox.ac.uk/ in the literature to verify concurrent systems. Meyer [23] isolated soter/cpmrs.pdf. a rich fragment of the π-calculus called depth-bounded. For cer- [11] E. D’Osualdo, J. Kochems, and C.-H. L. Ong. Soter: an automatic tain patterns of communication, this fragment can be the basis of safety verifier for Erlang. In Proceedings of the 2nd edition on an abstract model that avoids the “merging” of mailboxes of the Programming systems, languages and applications based on actors, processes belonging to the same pid-class. Erlang programs how- agents, and decentralized control abstractions, AGERE! ’12, pages ever can express processes which are not depth bounded. We plan 137–140. ACM, 2012. to address the automatic abstraction of arbitrary Erlang programs [12] J. Feret. Abstract interpretation of mobile systems. Journal of Logic as depth-bounded process elsewhere. and Algebraic Programming, 63(1):59130, 2005. Bug finding. Dialyzer [7, 8, 20] is a popular bug finding tool, in- [13] A. Finkel and P. Schnoebelen. Well-structured transition systems cluded in the standard Erlang / OTP distribution. Given an Erlang everywhere! Theoretical Computer Science, 256(1-2):63–92, 2001. program, the tool uses flow and escape [29] analyses to detect spe- [14] L. Fredlund and H. Svensson. McErlang: a model checker for a cific error patterns. Building on top of Dialyzer’s static analysis, distributed functional programming language. In ICFP, pages 125– success types are derived. Lindahl and Sagonas’ success types [20] 136, 2007. ‘never disallow the use of a function that will not result in a type [15] P. Garoche, M Pantel, and X. Thirioux. Static safety for an actor clash during runtime’ and thus never generate false positives. Di- dedicated process calculus by abstract interpretation. In FMOODS, alyzer puts to good use the type annotations that programmers do pages 78–92, 2006. use in practice; it scales well and is effective in detecting ‘discrep- [16] F. Huch. Verification of Erlang programs using abstract interpretation ancies’ in Erlang code. However, success typing cannot be used to and model checking. In ICFP, pages 261–272, 1999. verify program correctness. [17] F. Huch. Model checking Erlang programs - abstracting recursive function calls. 64:195–219, 2002. Conclusion. We have defined a generic analysis for λACTOR, and a way of extracting from the analysis a simulating infinite-state ab- [18] A. Kaiser, D. Kroening, and T. Wahl. Efficient coverability analysis by stract model in the form of an ACS, which can be automatically proof minimization. In CONCUR, 2012. www.cprover.org/bfc/. verified for coverability: if a state of the abstract model is not cov- [19] N. Kobayashi, M. Nakade, and A. Yonezawa. Static analysis of erable then the corresponding concrete states of the input λACTOR communication for asynchronous concurrent programming languages. program are not reachable. Our constructions are parametric on the Static Analysis, pages 225–242, 1995. abstractions for Time, Mailbox and Data, thus enabling different [20] T. Lindahl and K. Sagonas. Practical type inference based on success analyses (implementing varying degrees of precision with different typings. In PPDP, pages 167–178, 2006. complexity bounds) to be easily instantiated. In particular, with a [21] Z. Long, G. Calin, R. Majumdar, and R. Meyer. Language-Theoretic 0-CFA-like specialisation of the framework, the analysis and gen- abstraction refinement. In FASE, pages 362–376, 2012. eration of the ACS are computable in polynomial time. Further, the [22] S. Marlow and P. Wadler. A practical subtyping system for Erlang. In dimension of the resulting ACS is polynomial in the length of the ICFP, pages 136–149, 1997. input λACTOR program, small enough for the verification problem [23] R. Meyer. On boundedness in depth in the π-calculus. In Fifth Ifip to be tractable in many useful cases. The empirical results using International Conference On Theoretical Computer Science, pages our prototype implementation Soter are encouraging. They demon- 477–489, 2008. strate that the abstraction framework can be used to prove inter- [24] J. Midtgaard and T. Jensen. A calculational approach to control-flow esting safety properties of non-trivial programs automatically. We analysis by abstract interpretation. Static Analysis, pages 347–362, believe that the proposed technique can easily be adapted to ac- 2008. commodate other languages and other abstract models. The level [25] M. Might and D. Van Horn. A family of abstract interpretations for of generality at which the algorithm is defined seems to support the static analysis of concurrent higher-order programs. Static Analysis, definition of a CEGAR loop readily, the formalisation of which is pages 180–197, 2011. a topic for future work. [26] R. Milner. A calculus of communicating systems, volume 92. Springer- Verlag Germany, 1980. References [27] S. Nyström. A soft-typing system for Erlang. In ACM Sigplan Erlang [1] G. Agha. Actors: a model of concurrent computation in distributed Workshop, pages 56–71, 2003. systems. MIT Press, Cambridge, MA, USA, 1986. [28] C.-H. L. Ong and S. J. Ramsay. Verifying higher-order functional [2] J. Armstrong. Erlang. CACM, 53(9):68, 2010. programs with pattern-matching algebraic data types. In POPL, pages [3] J. Armstrong, R. Virding, and M. Williams. Concurrent programming 587–598, 2011. in Erlang. Prentice Hall, 1993. [29] Y. G. Park and B. Goldberg. Escape analysis on lists. In ACM [4] A. Bouajjani, J. Esparza, and T. Touili. A generic approach to the static SIGPLAN Notices, volume 27, pages 116–127, 1992. analysis of concurrent programs with procedures. In ACM SIGPLAN [30] C. Rackoff. The covering and boundedness problems for vector Notices, volume 38, pages 62–73, 2003. addition systems. Theoretical Computer Science, 6:223–231, 1978. [5] R. Carlsson. An introduction to Core Erlang. In Proceedings of the [31] J. H. Reppy and Y. Xiao. Specialization of CML message-passing PLI’01 Erlang Workshop, 2001. primitives. In POPL, pages 315–326, 2007. [32] O. Shivers. Control-Flow Analysis of Higher-Order Languages. PhD thesis, Carnegie Mellon University, 1991. [33] D. Van Horn and M. Might. Abstracting abstract machines. In ICFP, pages 51–62, 2010. [34] Arnaud Venet. Abstract interpretation of the pi-calculus. In LOMAPS, pages 51–75, 1996. A. Abstract Domains, Orders, Abstraction where κ = Argihℓ,..., ρ, i Functions and Abstract Auxiliary Functions [ \ \ new\va : Pid × Var × Data × ProcState → VAddr Abstract Domains, Orders and Abstraction Functions: b new\va(ι,x, δ, h , , , t i) :=(ι, x, δ, t ) Pid := ProgLoc × Time\ d \ new\pid : Pid × ProgLoc × Time → Pid ≤pid := = ×≤t b b b b b ′ ′ b∗ ′ ′ d new\pid((ℓ , t ), ℓ, t ) :=(ℓ, tick ( t, tick(ℓ , t )) αpid := id × αt d d VAddr\ := Pid × Var × Data[ × Time\ Concrete and Abstract Match Function: b′ b d b d′ b matchρ,σ(pi, (x,ρ )) = matchρ,σ(pi,σ(ρ (x))) ≤va := ≤pid × = ×≤d ×≤t d matchρ,σ(x,d)= {x 7→ d} if x∈ / dom(ρ) αva := αpid × id × αd × αt ′ match (x,d)= {x 7→ d} if match ′ (p ,d) 6= ⊥ Env := Var ⇀ VAddr\ ρ,σ ρ ,σ ′ ′ ′ ′ where (p ,ρ )= σ(ρ(x)) ρ ≤env ρ ⇐⇒ ∀x ∈ Var . ρ(x) ≤va ρ (x) d ′ ′ αenv(ρ)(x) := αva(ρ(x)) matchρ,σ(p, (t,ρ )) = matchρ,σ(pi, (ti,ρ )) b b b b 1≤i≤n KAddr\ := Pid × ProgLoc × Env × Time\ O where p = c(p1,...,pn) ≤ka := ≤pid × = ×≤env ×≤t d d t = c(t1,...,tn ) α := α × id × α × α ka pid env t θ ⊗ θ′ = ⊥ if ∃x.θ(x) 6= θ′(x) \ Closure := ProgLoc × Env θ ⊗ θ′ = θ ∪ θ′ otherwise ≤cl := = ×≤env d = [] : id αcl = × αenv ∅ \ \ O Value := Closure ⊎ Pid matchρ,σ(p,d)= ⊥ otherwise

≤val := ≤cl + ≤pid d : \ ′ \ αval = αcl + αpid matchρ,b σb(pi, (x, ρ )) = matchρ,b σb (pi, d ) \ \ b ′ ProcState :=(ProgLoc ⊎ Pid) × Env × KAddr × Time d∈σb[(ρb (x)) \ b b ≤ps :=(=+ ≤pid)×≤env ×≤ka ×≤t matchρ,b σb(x,d)= {x 7→ d } d d αps :=(id + αpid ) × αenv × αka × αt \ ′ \ ′ matchb b(p, (t, ρ )) = matchb b(p , (t , ρ )) \ \ ρ,σ b ρ,σ i i Procs := Pid → P(ProcState) ≤i≤n 1O ′ ′ d π ≤proc π ⇐⇒ ∀ι ∈ Pid . π(ι) ⊆ π (ι) if p = c(pb1,...,pn) and b d αprocs(π)(ι) := {αps(π(ι)) | αpid(ι)= ι } t = c(t1,...,tn ) \ b b \b d b b b b Mailboxes := Pid → Mailbox θ = θi,θ 6= ⊥, ′ b ′b where ({Θi | 1 ≤ i ≤ n})= θ 1≤i≤n µ ≤ms µ ⇐⇒ ∀ι ∈ Pid . µ(ι) ≤m µ (ι)  O  d  θi ∈ Θi, 1 ≤ i ≤ n  Od αms(µ)(ι) := {αm(µ(ι)) | αpid(ι)= ι } \ b b otherwise b b b d b b b b matchρ,σ(p,d)= ∅   Store\ :=(VAddr\ →GP(Value\)) × (KAddr\ → P(Kont[ )) b b ′ \ ′ σ ≤st σ ⇐⇒ ∀b ∈ VAddr . σ(b) ⊆ σ (b) Lemma 1. Suppose the concrete domain C = A ⇀ B of partial ′ functions has abstract domain C = A → P(B) with the induced ∀a ∈ KAddr\ . σ(a) ⊆ σ (a) b b b order ≤ and abstraction function α : C → C as speciﬁed in 5 b b b b \ C αst(σ)(b) := {αval(σ(b)) | αva(b)= b }, b ∈ VAddr then for all f ∈ C and for all αbC(f) b≤ f b b b b b b \ b αst(σ)(a) := {αkont(σ(a)) | αka(a)= a }, a ∈ KAddr ∀a ∈ dom(f) . αB(f(a)) ∈ f(αA(a)). (6) b b b b [ \ \ \ ′ ′ State := Procs × Mailboxes × Store Further suppose f, f ∈ C such that f = f[a1 7→ b1,...,an 7→ b b b ′ ′ b ≤ :=≤procs ×≤ms ×≤st bn] and let f, f ∈ C such that f = f ⊔ [a1 7→ b1,..., an 7→ bn]

αcfa :=(id + αpid ) × αenv with αC(f) ≤ f and αA(ai) = ai, αB(bi) = bi for i = 1,...,n then b b b b b b b ′ ′ ′ b b where we write f + g := {(x,x ) | (x,x ) ∈ f or (x,x ) ∈ g}. ′ ′ b αC(f )b≤ f . b (7) Abstract Auxiliary Functions: Proof. Let f ∈ C and f ∈ C such that αC(f) ≤ f. The definition \ \ \ b newkpush : Pid × ProcState → KAddr of ≤ implies that for all a ∈ A new\kpush(ι, (ℓ, ρ, , t )) :=(ι, ℓ.arg , ρ, t ) b b b d 0 αC(f)(a) ⊆ f(a). b b \ [ \ \ Take and fix then we obtain newkpop : Pid × Kontb × ProcState →b KAddr a ∈ A a = αA(a) b b b b b b b \ newkpop(ι, κ, h , , , t i) :=(ι,ℓ.argi+1, ρ, t ) αC(f)(αA(a)) ⊆ f(αA(a)). d b b b b b b b b ′ Expanding the definition of αC yields Case: (Send). We know that s → s using rule Send; we can thus assume {αB(b0) | (a0,b0) ∈ f, αA(a0)= αA(a))}⊆ f(αA(a)). π(ι)= hv,ρ,a,ti In particular αB(f(a)) ∈ {αB(b0) | (a0,b0) ∈ f, αA(a0) = b Arg ′ αA(a))} which yields what we set out to prove σ(a)= 2hℓ,d,ι , , ci d =(send, ) ∀a ∈ dom(f) . αB(f(a)) ∈ f(αA(a)). ′ ′ ′ and for Turning to equation 7 we want to show α (f ) ≤ f . Let a ∈ A s bC then there are several cases to consider π′ = π[ι 7→ hv,ρ,c,ti] ′ b ′ b ′ ′ ′ (i) αC(f )(a) = αC(f)(a). Then since αC(f) ≤ f ≤ f web have µ = µ[ι 7→ enq((v,ρ), µ(ι ))] ′ αC(f)(a) ⊆ f(a) ⊆ f (a). σ′ = σ. (ii) a = αA(ai) for some 1 ≤ i ≤ n. Then a = aiband thusb b b ′ ′ b b As a first step we will examine u and show that u u for some αC(bf )(a)=b {αB(bib)} ∪ {αB(f(a))|αA(a)= a,a 6= ai} ′ u . For π and σ, writing ι := αpid(ι), Corollary 3 gives us b b b ⊆ {bi} ∪ αC(f)(a) hv, ρ, a, t i ∈ π(ι) b ′ b b b b ⊆ {bi} ∪ f(a) ⊆ f (a) ′ b Arg2hℓ, d, ι , , ci ∈ σ(a) b ′ b b b b b (iii) otherwise there does not exist (a,b) ∈ f such that αA(a)= a ′ b b b where αenv(ρ) = ρ, αak(a) = a, d = αval(d), t = αt(t), and hence αC(f )(a)= ∅ whichb makesb our claim trivially true. ′ ′ b ι = αpid(ι ) and c = αka(c)b. Ruleb AbsSendb b is now applicable ′ ′ b and we can set We can thus conclude that αC(f ) ≤ f . b b b b b π′ b:= π ⊔ [ι 7→ q] \ Corollary 3. Let π ∈ Procs and π ∈bProcs such that αproc(π) ≤ \ q := hv, ρ, c, t i π, let σ ∈ Store and σ ∈ Store such that αst(σ) ≤ σ and Let µb′ := µb[ι′ 7→b enq((b v, ρ), µ(ι′))] µ ∈ Mailboxes and µ ∈ Mailboxes\ b such that α (µ) ≤ µ then ms b′ ′ b b′ b b b b u := hπ , µ , σi. (i) ∀ι ∈ Pid . αps(π(ι)) ∈ π(αpid(ι)) b b b d b b ′b (ii) VAddr b b It follows from rule (AbsSend) that u u . It remains to show ∀b ∈ . αval(σ(b)) ∈ σ(αva(b)) ′ ′ ′ ′ that αcfa(s ) ≤ u whichb followsb b directly from (i) αproc(π ) ≤ π (iii) ∀a ∈ KAddr . αkont(σ(ab)) ∈ σ(αka(a)) ′ ′ and (ii) αms(µ ) ≤ µ . (iv) ∀ι ∈ Pid . αm(µ(ι)) ≤ µ(αpidb (ι)) ′ ′ (v) ∀ι ∈ Pid . ∀x ∈ Var . ∀δ ∈ Datab . ∀q ∈ ProcState . (i) αproc(π ) ≤ π follows immediately from Lemma 1 sinceb b b \ ι = αpid(ι) and αps(q)= q. αva(newva(ι, x, δ, q)) = newva(αpid(ι),x,αd(δ), αps(q)) ′ ′ ′ ′ (ii) αms(µ ) ≤ µ . It is sufficient to show that αpid(ι ) = ι , which b ′ ′ Proof. Cases (i) - (iii) follow directly from Lemma 1; it remains to bis immediate, and αm(µ b(ι )) ≤ µ(ι). For the latter, since , a sound basic domain abstraction gives us show the claims of (iv) and (v). αenv(ρ)= ρb b ′ ′ ′ αm(µ (ι )) = αm(enq((v,ρb b), µ(ι ))) (iv) By assumption αms(µ) ≤ µ which implies that b ′ ≤ enq((v, ρ), µ(ι )) = µ(ι) αms(µ)(αpid(ι)) ≤ µ(αpid(ι)) = µ(ι). b provided we can show α (µ(ι′)) ≤ µ(ι′); the latter inequality Expanding α then gives us that α (µ(ι)) ≤ α (µ)(α (ι)), m ms m ms pid follows Corollary 3. Henced we canb concludeb b bαb (µ′) ≤ µ′ since α (µ) = λι. {α (µ(ι))b| α (ι) = ιb},b which allows us ms ms m pid which completes the proof of this case. to conclude b b F b b αm(µ(ι)) ≤ µ(ι). b Case:(Receive). In the concrete s → s′ using the Receive, hence we (v) The claim follows straightforwardly from expanding newva and can make the following assumptions new\va: b b π(ι)= hreceive p1 → e1 ...pn → en end,ρ,a,ti =: q αva(newva(ι, x, δ, q))=(αpid(ι),x,αd(δ), αt(t)) (i, θ, m) = mmatch(p1 ...pn, µ(ι),ρ,σ) new\va(αpid(ι),x,αd(δ), αps(q)) θ = [x1 7→ d1 ...xk 7→ dk] where q = he,ρ,a,ti. bj = newva(ι, xj , δj , q)

δj = res(σ, dj )

and for state ′ B. Proof of Theorem 1 s ′ ′ Proof of Theorem 1. The proof is by a case analysis of the rule that π = π[ι 7→ q ] ′ ′ ′ defines the concrete transition s → s . For each rule, the transition q = hei,ρ ,a,ti in the concrete system can be replicated in the abstract transition ′ system using the abstract version of the rule, with the appropriate ρ = ρ[x1 7→ b1 ...xk 7→ bk] choice of abstract pid, the continuation from the abstract store, the µ′ = µ[ι 7→ m] message from the abstract mailbox, etc. ′ σ = σ[b 7→ d ...b 7→ d ]. Let s = hπ,µ,σi→hπ′, µ′,σ′i = s′ and u = hπ, µ, σi such 1 1 k k ′ that αproc(π) ≤ π, αmail(µ) ≤ µ, and αst(σ) ≤ σ. We consider a As a first step we will look at u to prove there there exists a u number of rules for illustration. such that ′ using rule AbsReceive. We can invoke Corollary b b b u u b b b ′ ′ 3, since αproc(π) ≤ π, to obtain where q := he,ρ [x1 → b1 ...xn → bn], c, tick(ℓ,t)i ′ q := hreceive p1 → e1 ...pn → en end, ρ, a, t i ∈ π(ι) σ = σ[b1 7→ d1 ...bn 7→ dn] b µ′ = µ where we write ρ := αenv(ρ), a := αka(a), t = αt(t) and b ι := αbpid(ι). Moreover Corollary 3 gives us b b b b As a first step we will examine u and show that there exists a u′ ′ b αm(µ(ι)) b≤ µ(ι). b such that u u using rule AbsApply. From Corollary 3, since b αproc(π) ≤ π, it follows that as αms(µ) ≤ µ and ι = αpid(ι). Since the instantiation of the basic domains is sound and αst(σ) ≤ σ web thenb know that q := hv, αenv(ρ), αka(a), αt(t)i ∈ π(αpid(ι)). b b (i, θ,bm) ∈ mmatch(\ ~p, µ(ι), ρ, σ) Letting ρ := αenv(ρ), a := αka(a), t := αt(t) and ι := αpid(ι) we b can appealb to Corollary 3 again, as αst(σ) ≤ σb, to obtain m m such that θ = αsub(θ) and ≥ αm( ). Turning to the substitution b ′ b b b b b b b Argnhbℓ, d0 . . . dn−1, ρ , ci ∈ σ(a)b θ we can see that b ′ ′ b b where we write di := αval(di) for 0 ≤ i < n, ρ := αenv(ρ ) and θ = [x1 7→ d1 ...xk 7→ dk] b b b c := αka(c). Expanding αval yields b b b b where d = α(d ) for 1 ≤ i ≤ k. Appealing to the sound basic i i b b b b b domain instantiation once more, noting that α (σ) ≤ σ, yields d0 =(fun(x1 ...xn) → e, ρ0) st b that forbj = 1,...,k we have δj := αd(δj ) ∈ res(σ, dj ); where we write ρ := α (ρ ). Taking d := (v, ρ) we obtain b0 env 0 n to obtain new abstract variable addresses we can now setb bj := from our sound basic domain abstraction b new\va(ι, xj , δj , q ). Rule AbsReceiveb is applicable now; we makeb b δi :=bαd(δi) ∈ res(σ, di) for ib= 1,...,nb the following definitions b ′ ′ as and . Turning to the abstract variable b b bπ := π ⊔ [ι 7→ q ] αst(σ) ≤ σ di = αval(di) addresses web define c b b ′ ′ q := hei, ρ , a, t i bi := new\b va(ι, xi, δi, q ) for 1 ≤ i ≤ n. b′ b b b ρ := ρ[x1 7→ b1 ...xk 7→ bk] Rule AbsApply is now applicable and we define b′ b b b µ := µ[ι 7→ m] ′ b ′b b b π := π ⊔ [ι 7→ q ] b′ b b b σ := σ ⊔ [b1 7→ d1 . . . bk 7→ dk] ′ ′ q := he, ρ [x1 → b1 ...xn → bn], c, tick(ℓ, t )i b′ b b′ ′ b ′ u := hπ , µ , σ , ϑi b′ b b b b b b b σ := σ ⊔ [ b1 7→ d1 . . . bn 7→ dn] b b′ ′ ′ and observe that u u . It remains to show αcfa(s ) ≤ u which b′ ′ b ′ b b b d b b ′ ′ ′ ′ u := hπ , µ, σ i. follows directly ifb we canb proveb b (i) αproc(π ) ≤ π , (ii) αms(µ ) ≤ µ ′ ′ b b b b b b ′ and (iii) αst(σ ) ≤ σ . It is clear from rule AbsApply that u u ; it remains to show that ′ ′ tob proveb b this case. The latter follows if we can justify ′ ′ b b αcfa(s ) ≤ u (i) αproc(π ) ≤ π . We note that by Corollary 3 we know bi = ′ ′ ′ ′ b (i) αproc(π ) ≤ π and (ii) αst(σ ) ≤ σ . αva(bi) as ι = αpid(ι), δi = αd(δi) and αproc(q) = q for ′ ′ ′ ′ ′ (i) αproc(π ) ≤ π . We can appeal to Lemma 1 provided we can 1 ≤ i ≤ n.b It follows that ρ = αenv(ρ ) and hence bq = ′ ′ ′ show that ιb= α (ι) and α (bq ) = q where the former αps(q ). Lemma 1 is nowb applicable, since ι = αpid(ι), to give pid ps ′ b ′ b is immediate. For the latter, first observe that since we have αproc(π ) ≤ π . b ′ ′ b b a sound basic domain abstraction we know αt(tick(ℓ,t)) ≤ (ii) αms(µ ) ≤ µ . It is sufficient to show that ιb= αpid(ι), which is b b m m tick(ℓ, t ); however as Time is a flat domain so the above immediate, andb αm( ) ≤ which we have already established above; hence we can conclude ′ ′. inequality is in fact an equality b αms(µ ) ≤ bµ ′ ′ d (iii) αst(σ ) ≤ σ . The observationb that bi = αva(bi) and di = b ′ αt(tick(ℓ,t)) = tick(ℓ, t ). αval(di) allows the application of Lemma 1bwhich gives αst(σ ) ≤ ′ σ as desired. Moreover bi = αva(bi) for 1 ≤ i ≤ n by Corollary 3, hence b b b d b ′ ′ This completes the proof of this case. αenv(ρ [x1 → b1 ...xn → bn]) = ρ [x1 → b1 ...xn → bn] b b ′ ′ ′ Case:(Apply). Since s → s using rule Apply we can assume that as αenv(ρ )= ρ ; in combination with c = αka(c) we obtain the ′ ′ ′ ′ desired αps(q )= q . Thus we concludeb that αbproc(π ) ≤ πb. π(ι)= hv,ρ,a,ti =: q ′ ′ ′ (ii) αst(σ ) ≤ σ b. Since αva(bi) = bi band αval(di) = di for σ(a)= Arg hℓ,d ...d − ,ρ , ci := κ n 0 n 1 1 ≤ i ≤ n Lemmab 1 is applicable once more and givesb us ′ ′ and completes the proof of this case. arity(ℓ)= n αst(σ ) ≤ σ b b b d0 =(fun(x1 ...xn) → e,ρ0)

dn =(v,ρ) b C. Proof of Theorem 3 and for i = 1,...,n Terminology. Analogously to our remark in section 6 on active δi = res(σ, di) components for rules AbsR in the abstract operational semantics it is possible to identify a similar pattern in the concrete operational bi = newva(ι, xi, δi, q) semantics. Henceforth we will speak of the concrete active compo- ′ additionally for the successor state s′ nent (ι, q, q ) of a rule R of the concrete operational semantics and we will say the abstract active component (ι, q, q′) of a rule AbsR π′ = π[ι 7→ q′] of the abstract operational semantics where AbsR is the abstract b b b counterpart of R. We will omit the adjectives abstract and concrete and so we can define when there is no confusion. (ι, q) 7→ v(ι, q ) − 1, ′ Lemma 2. Suppose s → s using the concrete rule R with concrete v′ := v (ι, m) 7→ v(ι, m) − 1, ; ′ ′   active component (ι, q, q ) and s ≥ αcfa(s). Then s s with b b′ b b′ ′ ′ (ι, q ) 7→ v(ι, q ) + 1 s ≥ αcfa(s ) using rule AbsR with abstract active component   (α (ι), α (q), α (q′)).  b b r bvb v′ pid ps ps b b b it is then clear that, using rule ∈ R, →acs and b b b b b ′ v v′ Proof. The claim follows from inspection of the proof of Theo- αacs(s )(ι, q)= αacs(s)(ι, q) − 1 ≤ (ι, q) − 1= (ι, q) ′ ′ rem 1. αacs(s )(ι, m)= αacs(s)(ι, m) − 1 ≤ v(ι, m) − 1= v (ι, m) ′ b ′b b b′ v b b′ v′ b b ′ Proof of Theorem 3. Suppose s → s′ using rule R of the concrete αacs(s )(ι, q )= αacs(s)(ι, q ) + 1 ≤ (ι, q )+1= (ι, q ). ′ operational semantics with active component (ι, q, q ). We will b b b b b b ′ b′ b Hence, since αacs(s) ≤ v, we can conclude αacs(s ) ≤ v as prove our claim by case analysis on R. desired. b b b b b b b b ′ - R = Send. Using Lemma 2, with s = αcfa(s), gives s s with - R = FunEval, ArgEval, Apply or Vars. Take s = αcfa(s); ′ Lemma 2 gives us that s s′ using abstract rule AbsR = active component (ι, q, q ) for the abstract rule AbsSend where , and ′ ′ . Examining the AbsFunEval, AbsArgEval, AbsApply or Vars respectively with ι = αpid(ι) q = αps(q) qb = αps(q ) b b ′ b concrete and abstract states we see s = hπ, σ, µi and s = hπ, active component (ι, q, q ) where ι = αpid(ι), q = αps(q) and b b b ′ ′ b b σ, µi where π = αproc(π), σ = αst(σ) and µ = αms(µ). Let the q = αps(q ). It follows that b b b pid of the recipient be ι′ and let d be the value enqueued to ι′’s τ b b brb:=b ι: q −→ bq′ ∈ R. b mailbox ′ ; inspecting the proof of Theorem 1 the pid of the b b µ(ιb) b b b abstract recipient is ι′ := α (ι′) and the sent abstract value Since s → s′ with active component (ι, q, q′) it follows that pid is d = αval(d). Appealing to the soundness of the message (i) αacs(s)(ι, q) ≥ 1, b b b ′ abstraction we obtain (ii) αacs(s )(ι, q)= αacs(s)(ι, q) − 1 and b ′ ′ ′ b (iii) αacs(s )(bι,bq )= αacs(s)(ι, q ) + 1 m := αmsg(res(σ, d)) ∈ resmsg(σ, d ) ′ ′ as ι = αpidb(ιb), q = αps(bq)band q = αps(q ). We know and hence we have v and thus αacs(s) ≤ b b b b b ′ c b b r bι !mb ′ b b v(ι, q) ≥ 1.b := ι: q −−−→ q ∈ R If we define Additionally we know b b b ′ b b ′ ′ (i) αacs(s)(ι, q) ≥ 1, v := v[(ι, q) 7→ v(ι, q) − 1, (ι, q ) 7→ v(ι, q ) + 1], ′ (ii) αacs(s )(ι, q)= αacs(s)(ι, q) − 1, ′ ′ ′ ′ then it is clear that v →acs v using rule r ∈ R and the (iii) αacs(s )(bι,bq )= αacs(s)(ι, q ) + 1 and ′ ′ ′ inequalities b b b b b b b b (iv) αacs(s )(bι ,bm)= αacs(s)(b ιb, m) + 1 ′ ′ ′ since d is the message enqueued to µ(ι ) and m = αmsg(res(σ, d)). αacs(s )(ι, q)= αacs(s)(ι, q) − 1 ≤ v(ι, q) − 1= v (ι, q) b b b b From our assumption we know v and thus ′ ′ ′ ′ ′ ′ b b bαbacs(s) ≤ αacs(s )(ι, q )= αacs(s)(ι, q ) + 1 ≤ v(ι, q )+1= v (ι, q ); v(ι, q) ≥ 1; b b b b b b b ′ ′b b the consequence of the latter two is that αacs(s ) ≤ v , since αacs(s) ≤bvb, which completesb b the proof ofb b this case. b b making the definition ′ b b - R = Receive. Letting s = αcfa(s) Lemma 2 yields that s s (ι, q) 7→ v(ι, q ) − 1, ′ using abstract rule AbsReceive with active component (ι, q, q ) v′ v ′ v ′ where , and ′ ′ . We note that :=  (ι, q ) 7→ (ι, q ) + 1,  ; ι = αpid(ι) q b= αps(q) q = αps(q ) b b s = hπ, σ, µi and s = hπ, σ, µi where π = α (π), σ = (ι′,bmb) 7→ v(bι′,bm) + 1 proc b b b   αst(σ) and µ = αms(µ). Let the message matched by mmatch  b b b b  b b b ′ we observe that we are able to use rule r ∈ R to make the step and extracted from µ(ι) be d = (pi,ρ ) then inspecting rule ′ b b b b b b v →acs v . Further the inequalitiesb b b b AbsReceive we can assume that during s s′ message d = b ′ ′ ′ ′ ′ \ αacs(s )(ι, q)= αacs(s)(ι, q) − 1 ≤ v(ι, q) − 1= v (ι, q) (pi, ρ ), where ρ = αenv(ρ ), is matched by mmatch. Since ′ ′ ′ ′ ′ ′ the message abstraction is a sound datab abstractionb we knowb αacs(s )(ι, q )= αacs(s)(ι, q ) + 1 ≤ v(ι, q )+1= v (ι, q ) that ′ ′ b b ′b b b′ b b′ b′ b b αacs(s )(ι , m)= αacs(s)(ι , m) − 1 ≤ v(ι , m) − 1= v (ι , m). m := αmsg(res(σ, d)) ∈ resmsg(σ, d ) b b b b b b b b imply, since α (s) ≤ v, that α (s′) ≤ v′ which concludes and hence we have acs acs the proofb ofb this case. b b b b b b b ?mb ′ c b b ′ r := ι: q −−→ q ∈ R - R = Spawn. Take s = αcfa(s); Lemma 2 gives us that s s using abstract rule AbsSpawn with active component (ι, q, q′) Additionally we know ′ ′ where ι = α (ι), q = α (q) and q = α (q ). We note b b b pid b ps ps b b (i) αacs(s)(ι, q) ≥ 1, that , ′ ′ ′ ′ and where s = hπ, σ, µi s = hπ ,σ , µ i s = hπ, σ, µib b b (ii) αacs(s)(ι, m) ≥ 1, π = αproc(π), σ = αst(σ) and µ = αms(µ). Further we can ′ b b b (iii) αacs(s )(bι,bq)= αacs(s)(ι, q) − 1, assume ′ ′ ′ b b b b (iv) αacs(s )(bι,bq )= αacs(s)(ι, q ) + 1 and ′ b π(ι)= hfun()b→ e,ρ,a,ti b (v) αacs(s )(bι, mb )= αacs(s)(bι,bm) − 1 σ(a)= Arg hℓ,d,ρ′, ci since d is theb messageb extractedb b from µ(ι) and m = αmsg(res(σ, d)). 1 v By assumptionb b we know αacs(bs)b≤ which implies d =(spawn, ) ′ v(ι, q) ≥ 1 and v(ι, m) ≥ 1 b ι := newpid(ι,ℓ,ϑ)

b b b b π′(ι)= hι′,ρ′,c,ti = q′ ′ ′ ′′ π (ι )= he,ρ, ∗,t0i =: q Noting that we are replicating the step s → s′ in the abstract ′ s s , αcfa(s)= s and αpid ◦ newpid = new\pid ◦ α we can see ′ ′ that the new abstract pid created is ι = αpid(ι ) together with its process state ′′ ′′ . Hence we can conclude that b b q b= αps(q ) ′ ′′ νbι .qb b ′ r := ι: q −−−−→ q ∈ R b and we observe that (i) αacs(s)(ι, q) ≥ 1, b b b ′ (ii) αacs(s )(ι, q)= αacs(s)(ι, q) − 1, ′ ′ ′ (iii) αacs(s )(bι,bq )= αacs(s)(ι, q ) + 1 and ′ ′ ′′ ′ ′′ (iv) αacs(s )(bι ,bq )= αacs(sb)(ιb, q ) + 1. v Now the assumptionb b αacs(s) b≤b allows us to conclude b b v(ι, qb) ≥b 1; so that we can deﬁne (ι,bq)b7→ v(ι, q ) − 1, v′ v ′ v ′ :=  (ι, q ) 7→ (ι, q ) + 1,  ; (ι′,bq′′b) 7→ v(bι′,bq′′) + 1    b b b b ′ and use rule r ∈ R to make the step v →acs v . Further with the inequalities b b b b ′ ′ αacs(s )(ι, q)= αacs(s)(ι, q) − 1 ≤ v(ι, q) − 1= v (ι, q) ′ ′ ′ ′ ′ ′ αacs(s )(ι, q )= αacs(s)(ι, q ) + 1 ≤ v(ι, q )+1= v (ι, q ) ′ ′ ′′b b ′ b ′′b b′ b ′′ b′ b′ ′′ αacs(s )(ι , q )= αacs(s)(ι , q ) − 1 ≤ v(ι , q ) − 1= v (ι , q ). b b b b b b ′ b b′ and our assumption αacs(s) ≤ v we see that αacs(s ) ≤ v which completesb b the proofb ofb this case and theb b theorem. b b