Finding Optimum Abstractions in Parametric Dataflow Analysis

Xin Zhang Mayur Naik Hongseok Yang Georgia Institute of Technology, USA Georgia Institute of Technology, USA University of Oxford, UK [email protected] [email protected] [email protected]

Abstract (CEGAR) but differs radically in how it analyzes an abstract coun- We propose a technique to efficiently search a large family of ab- terexample trace: it computes a sufficient condition for the failure stractions in order to prove a query using a parametric dataflow of the analysis to prove the query along the trace. The condition analysis. Our technique either finds the cheapest such abstraction represents a set of abstractions such that the analysis instantiated or shows that none exists. It is based on counterexample-guided using any abstraction in this set is guaranteed to fail to prove the abstraction refinement but applies a novel meta-analysis on abstract query. Our technique finds such unviable abstractions by doing a counterexample traces to efficiently find abstractions that are inca- backward analysis on the trace. This backward analysis is a meta- pable of proving the query. We formalize the technique in a generic analysis that must be proven sound with respect to the abstract se- framework and apply it to two analyses: a type-state analysis and mantics of the forward analysis. Scalability of backward analyses is a -escape analysis. We demonstrate the effectiveness of the typically hindered by exploring program states that are unreachable technique on a suite of benchmark programs. from the initial state. Our backward analysis avoids this problem as it is guided by the trace from the forward analysis. This trace also Categories and Subject Descriptors D.2.4 [SOFTWARE EN- enables our backward analysis to do under-approximation while GINEERING]: Software/Program Verification; F.3.2 [LOGICS guaranteeing to find a non-empty set of unviable abstractions. AND MEANINGS OF PROGRAMS]: Semantics of Programming Like the forward analysis, the backward meta-analysis is a static Languages—Program analysis analysis, and the performance of our technique depends on how this meta-analysis balances its own precision and cost. If it does General Terms Languages, Verification under-approximation aggressively, it analyzes the trace efficiently Keywords Dataflow analysis, CEGAR, abstraction refinement, but finds only the current abstraction unviable and needs more itera- optimum abstraction, impossibility, under-approximation tions to converge. On the other hand, if it does under-approximation passively, it analyzes the trace inefficiently but finds many abstrac- 1. Introduction tions unviable and needs fewer iterations. We present a generic framework to develop an efficient meta-analysis, which involves A central problem in static analysis concerns how to balance its choosing an abstract domain, devising (backward) transfer func- precision and cost. A query-driven analysis seeks to address this tions, and proving these functions sound with respect to the forward problem by searching for an abstraction that discards program analysis. Our framework uses a generic DNF representation of for- details that are unnecessary for proving an individual query. mulas in the domain and provides generic optimizations to scale We consider query-driven dataflow analyses that are parametric the meta-analysis. We show the applicability of the framework to in the abstraction. The abstraction is chosen from a large family two analyses: a type-state analysis and a thread-escape analysis. that allow abstracting different parts of a program with varying We evaluate our technique using these two client analyses on precision. A large number of fine-grained abstractions enables an seven Java benchmark programs of 400K–600K bytecodes each. analysis to specialize to a query but poses a hard search problem in Both analyses are fully flow- and context-sensitive with 2N pos- practice. First, the number of abstractions is infinite or intractably sible abstractions, where N is the number of pointer variables for large to search na¨ıvely, with most abstractions in the family being type-state analysis, and the number of object allocation sites for too imprecise or too costly to prove a particular query. Second, thread-escape analysis. The technique finds the cheapest abstrac- proving queries in different parts of the same program requires tion or shows that none exists for 92.5% of queries posed on aver- different abstractions. Finally, a query might be unprovable by all age per client analysis per benchmark program. abstractions in the family, either because the query is not true or We summarize the main contributions of this paper: due to limitations of the analysis at hand. We propose an efficient technique to solve the above search • We formulate the optimum abstraction problem for parametric problem. It either finds the cheapest abstraction in the family that dataflow analysis. The formulation seeks a cheapest abstraction proves a query or shows that no abstraction in the family can prove that proves the query or an impossibility result that none exists. the query. We call this the optimum abstraction problem. Our tech- nique is based on counterexample-guided abstraction refinement • We present a new refinement-based technique to solve the prob- lem. The key idea is a meta-analysis that analyzes counterexam- ples to find abstractions that are incapable of proving the query.

Permission to make digital or hard copies of all or part of this work for personal or • We present a generic framework to design the meta-analysis classroom use is granted without fee provided that copies are not made or distributed along with an efficient representation and optimizations to scale for profit or commercial advantage and that copies bear this notice and the full citation it. We apply the framework to two analyses in the literature. on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. • We demonstrate the efficacy of our technique in practice on the PLDI’13, June 16–19, 2013, Seattle, WA, USA. two analyses, a type-state analysis and a thread-escape analysis, Copyright c 2013 ACM 978-1-4503-2014-6/13/06. . . $10.00 for several real-world Java benchmark programs. The rest of the paper is organized as follows. Section 2 illus- cannot be in the closed state after the call. This is because the trates our technique on an example. Section 3 formalizes paramet- analysis does a weak update as x is not in the must-alias set of ric dataflow analysis and the optimum abstraction problem. We first the incoming abstract state. define a generic framework and then apply it to our two example At this point our technique has eliminated abstraction π = {} analyses. Section 4 presents our meta-analysis, first in a generic set- as incapable of proving query check 1 and must determine which ting and then for a domain that suffices for our two analyses. Sec- abstraction to try next. Since the number of abstractions is large, tion 5 gives our overall algorithm and Section 6 presents empirical however, it first analyzes the trace to eliminate abstractions that results. Section 7 discusses related work and Section 8 concludes. are guaranteed to suffer a failure similar to the current one. It does so by performing a backward meta-analysis that inspects the 2. Example trace backwards and analyzes the result of the (forward) type- state analysis. This meta-analysis is itself a static analysis and the We illustrate our technique using the program in Figure 1(a). The abstract states it computes are denoted by ↑ in Figure 1(c). Each program operates on a File object which can be in either state abstract state of the meta-analysis represents a set of pairs (d, π) opened or closed at any instant. It begins in state closed upon consisting of an abstract state d of the forward analysis and an creation, transitions to state opened after the call to open(), and abstraction π. An abstract state of the meta-analysis is a boolean back to state closed after the call to close(). Suppose that it is formula over primitives of the form: (1) err representing the set of an error to call open() in the opened state, or to call close() in pairs where the d component is >; (2) closed ∈ ts representing the closed state. Statements check and check at the end of the 1 2 the set of pairs where the d component is hts, vsi and ts contains program are queries which ask whether the state of the File object closed; (3) x ∈ vs which is analogous to (2) except that vs contains is closed or opened, respectively, at the end of every program run. x; and (4) x ∈ π meaning the π component contains x. A static type-state analysis can be used to conservatively answer The meta-analysis starts with the weakest formula under which such queries. Such an analysis must track aliasing relationships check fails, which is err ∨ (opened ∈ ts), and propagates its between pointer variables in order to track the state of objects 1 weakest precondition at each step of the trace to obtain formula correctly and precisely. For instance, in our example program, the (closed ∈ ts) ∧ (opened ∈/ ts) ∧ (x∈ / π) at the start of the trace. analysis must infer that variables x and y point to the same File This formula implies that any abstraction not containing variable x object in order to prove query check . Analyses that track more 1 cannot prove query check . Our meta-analysis has two notable as- program facts are typically more precise but also more costly. A 1 pects. First, it does not require the family of abstractions to be finite: query-based analysis enables striking a balance between precision it can show that all abstractions in even an infinite family cannot and cost by not tracking program facts that are unnecessary for prove a query. Second, it avoids the blowup inherent in backward proving an individual query. analyses by performing under-approximation. It achieves this for Making our type-state analysis query-based enables it to track the above domain of boolean formulae by dropping disjuncts in the only variables that matter for answering a particular query (such as DNF representation of the formulae. A crucial condition ensured by check ). We therefore parameterize this analysis by an abstraction 1 our meta-analysis during under-approximation is that the abstract π which specifies the set of variables that the analysis must track. state computed by the forward analysis is contained in the resulting An abstraction π is cheaper than an abstraction π if π tracks 1 2 1 simpler formula. Otherwise, it is not possible to guarantee that the fewer variables than π2 (i.e., |π1| < |π2|). For a program with N eliminated abstractions cannot prove the query. Section 4 explains N variables there are 2 possible abstractions. Figure 1(b) shows how our technique picks the best k disjuncts for a k ≥ 1 speci- which abstractions in this family are suitable for each of our two fied by the analysis designer. Smaller k enables the meta-analysis queries. Any abstraction containing variables x and y is precise to run efficiently on each trace but can require more iterations by enough to let our analysis prove query check 1. The cheapest of eliminating only the current abstraction in each iteration in the ex- these abstractions is {x, y}. On the other hand, our analysis cannot treme case. We use k = 1 for this example and highlight in bold the prove query check 2 using any abstraction in the family. This is chosen (retained) disjunct in each formula with multiple disjuncts because query check 2 is not true concretely, but the analysis may in Figure 1. For instance, we drop the second disjunct in formula fail to prove even true queries due to its conservative nature. err∨(opened ∈ ts) at the end point of the trace in Figure 1(c), since We propose an efficient technique for finding the cheapest ab- abstract state > computed by the forward analysis at that point is straction in the family that proves a query or showing that no ab- not in the set of states represented by (opened ∈ ts). The weak- straction in the family can prove the query. We illustrate our tech- est precondition of the chosen disjunct err with respect to the call nique on our parametric type-state analysis. This analysis is fully y.close() is err∨(closed ∈ ts), but this time we drop disjunct err as flow- and context-sensitive, and tracks for each allocation site in the abstract state h{closed, opened}, {}i computed by the forward the program a pair hts, vsi or >, where ts over-approximates the analysis is not in the set of states represented by err. set of possible type-states of an object created at that site, and vs Having eliminated all abstractions that do not contain variable is a must-alias set—a set of variables that definitely point to that x, our technique next runs the type-state analysis with abstraction object. Only variables in the abstraction π used to instantiate the π = {x}, but again fails to prove query check 1, and produces analysis are allowed to appear in any must-alias set. > denotes that the trace showed in Figure 1(d). Our meta-analysis performed on the analysis has detected a type-state error. this trace infers that any abstraction containing variable x but not Our technique starts by running the analysis with the cheapest containing variable y cannot prove the query. Hence, our technique abstraction. For our type-state analysis this corresponds to not next runs the type-state analysis with abstraction π = {x, y}, and tracking any variable at all. The resulting analysis fails to prove this time succeeds in proving the query. Our technique is effective query check 1. Our technique is CEGAR-based and requires the at slicing away program details that are irrelevant to proving a analysis to produce an abstract counterexample trace as a failure query. For instance, even if either of the traces in Figure 1 (c) or witness. Such a trace showing the failure to prove query check 1 (d) contained the statement “if (*) z = x”, variable z would not be under the cheapest abstraction is shown in Figure 1(c). The trace is included in any abstraction that our technique tries; indeed, tracking annotated with abstract states computed by the analysis, denoted ↓, variable z is not necessary for proving query check . that track information about the lone allocation site in the program. 1 Finally, consider the query check 2. Our technique starts with Note that state h{closed}, {}i incoming into call x.open() results in the cheapest abstraction π = {}, and obtains a trace identical outgoing state h{opened, closed}, {}i even though the File object ↑ closed ∈ ts ↑ closed ∈ ts ∧ opened ∈/ ts ↑ closed ∈ ts x = new File; ∧ opened ∈/ ts ∧ x∈ / π ∧ y∈ / π ∧ x ∈ π ∧ opened ∈/ ts ∧ x ∈ π y = x; x = new File; x = new File; x = new File; if (*) z = x; ↓ h{closed}, {}i ↓ h{closed}, {x}i ↓ h{closed}, {x}i x.open(); ↑ closed ∈ ts ↑ closed ∈ ts ∧ opened ∈/ ts ↑ closed ∈ ts y.close(); ∧ opened ∈/ ts ∧ x∈ / vs ∧ y∈ / π ∧ x ∈ vs ∧ opened ∈/ ts ∧ x ∈ vs if (*) y = x; y = x; y = x; check 1(x, closed); ↓ h{closed}, {}i ↓ h{closed}, {x}i ↓ h{closed}, {x}i else ↑ closed ∈ ts ↑ closed ∈ ts ∧ opened ∈/ ts ↑ closed ∈ ts check 2(x, opened); ∧ opened ∈/ ts ∧ x∈ / vs ∧ y∈ / vs ∧ x ∈ vs ∧ opened ∈/ ts ∧ x ∈ vs x.open(); x.open(); x.open(); (a) Example program. ↓ h{closed, opened}, {}i ↓ h{opened}, {x}i ↓ h{opened}, {x}i ↑ err ∨ closed ∈ ts ↑ opened ∈ ts ∧ closed ∈/ ts ↑ opened ∈ ts ∧ closed ∈/ ts y.close(); ∧ y∈ / vs y.close(); ↓ > y.close(); ↓ h{closed, opened}, {x}i query abstraction ↑ err ∨ opened ∈ ts ↓ h{closed, opened}, {x}i ↑ err ∨ closed ∈ ts check 1 any π ⊇ {x, y} check 1(x, closed); ↑ err ∨ opened ∈ ts check 2(x, opened); check 2 none check 1(x, closed);

(b) Feasible solutions. (c) Iteration 1 for query (d) Iteration 2 for query (e) Iteration 2 for query check 1 using π = {}. check 1 using π = {x}. check 2 using π = {x}. Figure 1. Example illustrating our technique for a parametric type-state analysis. to that in Figure 1(b) for query check 1, except that the meta- trace(a) = {a} analysis starts by propagating formula err ∨ (closed ∈ ts) instead trace(s + s0) = trace(s) ∪ trace(s0) of err ∨ (opened ∈ ts). The same disjunct err in this formula is trace(s ; s0) = {τ τ 0 | τ ∈ trace(s) ∧ τ 0 ∈ trace(s0)} retained and the rest of the meta-analysis result is identical. Thus, trace(s∗) = leastFix λT. {} ∪ {τ; τ 0 | τ ∈ T ∧ τ 0 ∈ trace(s)} in the next iteration, our technique runs the type-state analysis with π = {x} abstraction , and obtains the trace shown in Figure 1(e). Figure 2. Traces of a program s. Symbol  denotes an empty trace. This time, the meta-analysis retains disjunct (closed ∈ ts), and concludes any abstraction containing variable x cannot prove the query. Since in the first iteration all abstractions without variable x 2. A finite set D of abstract states. Our analysis uses a set of were eliminated, our technique concludes that the analysis cannot abstract states to approximate reachable concrete states at each prove query check 2 using any abstraction. program point. Formally, this means the analysis is disjunctive.

3. A transfer function a π : D → D for each atomic command a. 3. Formalism The function is parameterizedJ K by π ∈ P. This section describes a formal setting used throughout the paper. A parametric analysis analyzes a program in a standard way ex- cept that it requires an abstraction to be provided before the anal- 3.1 Programming Language ysis starts. The abstract semantics in Figure 3 describes the behav- We present our results using a simple imperative language: ior of the analysis formally. In the figure, a program s denotes a transformer Fπ[s] on sets of abstract states, which is parameterized (atomic command) a ::= ... 0 0 ∗ by π ∈ P. Note that the abstraction π is used when atomic com- (program) s ::= a | s ; s | s + s | s mands are interpreted. Hence, π controls the analysis by changing The language includes a (unspecified) set of atomic commands. the transfer functions for atomic commands. Besides this parame- Examples are assignments v = w.f and statements assume(e) terization all the defining clauses are standard. which filter out executions where e evaluates to false. The language Our parametric analyses satisfy a fact of disjunctive analyses: also has the standard compound constructs: sequential composi- LEMMA 1. For all programs s, abstractions π, and abstract states tion, non-deterministic choice, and iteration. A trace τ is a finite se- d, we have Fπ[s]({d}) = {Fπ[τ](d) | τ ∈ trace(s)}, where quence of atomic commands a1a2 . . . an. It records the steps taken F [τ] is the result of analyzing trace τ as shown in Figure 3. during one execution of a program. Function trace(s) in Figure 2 π 0 shows a standard way to generate all traces of a program s. The lemma ensures that for all final abstract states d ∈ Fπ[s]({d}), we can construct a trace τ transforming d to d0. This trace does not 3.2 Parametric Dataflow Analysis have loops and is much simpler than the original program s. We We consider dataflow analyses whose transfer functions for atomic show later how our technique exploits this simplicity (Section 4). commands are parametric in the abstraction. We specify such a We now formalize two example parametric analyses. parametric analysis by the following data: Type-State Analysis. Our first example is a variant of the type- 1. A set (P, ) with a preorder  (i.e.,  is reflexive and tran- state analysis from [8]. The original analysis maintains various sitive). Elements π ∈ P are parameter values. We call them kinds of aliasing facts in order to track the type-state of objects abstractions as they determine the degree of approximation per- correctly and precisely. Our variant only tracks must-alias facts. formed by the analysis. The preorder  on π’s dictates the cost We assume we are given a set T of type-states containing init, of the analysis: using a smaller π yields a cheaper analysis. We which represents the initial type-state of objects, and a function require that every nonempty subset P ⊆ P has a minimum ele- m : T → T ∪ {>} for every method m, which describes how a ment π ∈ P (i.e., π  π0 for every π0 ∈ P ). JcallKx.m() changes the type-state of object x and when it leads to F [s] : 2D → 2D representing disjoint sets of heap objects, except that both L and π E include the null value. Abstract location E summaries null, all Fπ[a](D) = { a π(d) | d ∈ D} the thread-escaping objects, and possibly some thread-local ones. 0 0 Fπ[s ; s ](D) = (FJ πK[s ] ◦ Fπ[s])(D) On the other hand, L denotes all the other heap objects and null; 0 0 Fπ[s + s ](D) = Fπ[s](D) ∪ Fπ[s ](D) hence it means only thread-local objects, although it might miss ∗ Fπ[s ](D) = leastFix λD0.D ∪ Fπ[s](D0) some such objects. The analysis maintains an invariant that E- summarized objects are closed under pointer reachability: if a heap Fπ[τ]: D → D object is summarized by E, following any of its fields always gives

Fπ[](d) = d null or E-summarized objects, but not L-summarized ones. Fπ[a](d) = a π(d) An abstraction π determines for each site h ∈ H whether 0 0 Fπ[τ ; τ ](d) = FJ πK[τ ](Fπ[τ](d)) L or E should be used to summarize objects allocated at h. An abstract element d is a function from local variables or fields of L- Figure 3. Abstract semantics. In the case of loop, we take the least summarized objects to abstract locations or N. It records the values fixpoint with respect to the subset order in the powerset domain 2D. of local variables and those of the fields of L-summarized heap objects. For instance, abstract state [v 7→ L, f1 7→ E, f2 7→ E] means that local variable v points to some heap object summarized (type-states) σ ∈ T (we assume init ∈ T) by L, and fields f1, f2 of every L object point to those summarized (variables) x, y ∈ V by E. Since all thread-escaping objects are summarized by E, this (analysis parameter) π ∈ = 2V abstract state implies that v points to a thread-local heap object. P 0 T V We order abstractions π  π based on how many sites are (abstract state) d ∈ D = (2 × 2 ) ∪ {>} 0 0 0 mapped to L by π and π . This is because the number of L-mapped (order on parameters) π  π ⇔ |π| ≤ |π | sites usually correlates the performance of the analysis. The thread-escape analysis includes transfer functions of the (transfer function) a π : D → D following heap-manipulating commands: J K a π(>) = > 0  a ::= v = new h | g = v | v = g | v = null | v = v | J K (ts, vs ∪ {x}) if y ∈ vs ∧ x ∈ π 0 0 x = y π(ts, vs) = v = v .f | v.f = v J K (ts, vs \{x}) otherwise x = null π(ts, vs) = (ts, vs \{x}) Here g and v are global and local variables, respectively, and h ∈ H  J K > if ∃σ ∈ ts. m (σ) = > is an allocation site. Transfer functions of these commands simulate  their usual meanings but on abstract instead of concrete states. x.m() π(ts, vs) = ({ m (σ) | σ ∈ ts}, vs) JelseK if x ∈ vs J K (tsJ ∪K { m (σ) | σ ∈ ts}, vs) otherwise Function v = new h π(d) makes variable v point to abstract J K location π(h)J, simulatingK the allocation of a π(h)-summarized ob- ject and the binding of this object with v. The transfer function for Figure 4. Data for the type-state analysis. g = v models that if v points to a thread-local object, this assign- ment makes the object escape, because it exposes the object to other an error, denoted >. Using these data, we specify the domains and threads via the global variable g. When v points to L, the transfer transfer functions of the analysis in Figure 4. function calls esc(d), which sets all local variables to E (unless they The abstraction π for the analysis is a set of variables that have value N) and resets all fields to N. This means that after calling determines what can appear in the must-alias set of an abstract state esc(d), the analysis loses most of the information about thread lo- during the analysis. An abstract state d has form (ts, vs) or >, cality, and concludes that all variables point to potentially escaping where vs should be a subset of π, and it tracks information about a objects. This dramatic information loss is inevitable because if v single object. In the former case, ts represents all the possible type- points to L beforehand, the assignment g = v can cause any object states of the tracked object and vs, the must-alias set of this object. summarized by L to escape. The transfer functions of the remaining The latter means that the object can be in any type-state including commands in the figure can be understood similarly by referring to the error state >. For brevity, we show transfer functions only for the concrete semantics and approximating it over abstract states. simple assignments and method calls. Those for assignments x = y Figure 6 shows the abstract state (denoted by ↓) computed by and x = null update the must-alias set according to their concrete our thread-escape analysis at each point of an example program. semantics and abstraction π. The transfer function for a method call The results of the analysis in parts (a) and (b1) of the figure are x.m() updates the ts component of the input abstract state using obtained using the same abstraction [h1 7→ E, h2 7→ E] and are m . In the case where the must-alias set is imprecise, this update thus identical; the results in part (b2) are obtained using a different includesJ K both the old and new type-states. abstraction [h1 7→ L, h2 7→ E]. According to our order , an abstraction π is smaller than 0 0 π when its cardinality is smaller than π . Hence, running this 3.3 Optimum Abstraction Problem analysis with a smaller π implies that the analysis would be less precise about must-alias sets, which normally correlates the faster Parametric dataflow analyses are used in the context of query- convergence of the fixpoint iteration of the analysis. driven verification where we are given not just a program to analyze but also a query to prove. In this usage scenario, the most important Thread-Escape Analysis. Our second example is a variant of the matter to resolve before running the analysis is to choose a right thread-escape analysis from [17]. A heap object in a multithreaded abstraction π. Ideally, we would like to pick π that causes the shared-memory program is thread-local when it is reachable only analysis to keep enough information to prove a given query for from at most a single thread. A thread-escape analysis conserva- a given program, but to discard information unnecessary for this tively answers queries about thread locality. proof, so that the analysis achieves high efficiency. Let H be the set of allocation sites, L that of local variables, and The optimum abstraction problem provides a guideline on re- F that of object fields in a program. Using these data, we specify solving the issue of abstraction selection. It sets a specific target the domains and transfer functions of the analysis in Figure 5. on abstractions to aim at. Assume that we are interested in queries There N means the null value, and L and E are abstract locations, expressed as subsets of D. The problem is defined as follows: ↓ [u 7→ N, v 7→ N] ↓ [u 7→ N, v 7→ N] ↓ [u 7→ N, v 7→ N] ↑ h1→E ∨ (h2→E ∧ h1→L) ↑ h1→E ↑ h1→L ∧ h2→E u = new h1; u = new h1; u = new h1; ↓ [u 7→ E, v 7→ N] ↓ [u 7→ E, v 7→ N] ↓ [u 7→ L, v 7→ N] ↑ u→E ∨ (h2→E ∧ u→L) ∨ (h2→L ∧ f→E ∧ u→L) ↑ u→E ↑ u→L ∧ h2→E v = new h2; v = new h2; v = new h2; ↓ [u 7→ E, v 7→ E] ↓ [u 7→ E, v 7→ E] ↓ [u 7→ L, v 7→ E] ↑ u→E ∨ (v→E ∧ u→L) ∨ (v→L ∧ f→E ∧ u→L) ↑ u→E ↑ u→L ∧ v→E v.f = u; v.f = u; v.f = u; ↓ [u 7→ E, v 7→ E] ↓ [u 7→ E, v 7→ E] ↓ [u 7→ E, v 7→ E] ↑ u→E ↑ u→E ↑ u→E pc: local(u)? pc: local(u)? pc: local(u)? (a) π = [h1 7→ E, h2 7→ E] (b1) π = [h1 7→ E, h2 7→ E] (b2) π = [h1 7→ L, h2 7→ E] Figure 6. Example showing our technique for thread-escape analysis with under-approximation (parts (b1) and (b2)) and without (part (a)).

query often does not exist or it is expensive to compute, so they ask (allocation sites) h ∈ H for a minimal parameter instead. (local variables) v ∈ V Our approach to solve the problem is based on using a form (object fields) f ∈ F of a backward meta-analysis that reasons about the behavior of a (analysis parameter) π ∈ = → {L, E} P H parametric dataflow analysis under different abstractions simulta- (abstract state) d ∈ = ( ∪ ) → {L, E, N} D L F neously. We explain this backward meta-analysis next. 0 (order on parameters) π  π ⇔ |{h ∈ H | π(h) = L}| ≤ 0 |{h ∈ H | π (h) = L}| esc : D → D 4. Backward Meta-Analysis esc(d) = λu. if (d(u) = N ∨ u ∈ F) then N else E In this section, we fix a parametric dataflow analysis (P, , D, − ) and describe corresponding backward meta-analyses. To avoidJ theK (transfer function) a π : D → D J K confusion between these two analyses, we often call the parametric v = new h π(d) = d[v 7→ π(h)] analysis as forward analysis. J g = vKπ(d) = if (d(v) = L) then esc(d) else d A backward meta-analysis is a core component of our algorithm Jv = gKπ(d) = d[v 7→ E] for the optimum abstraction problem. It is invoked when the for- v =JnullKπ(d) = d[v 7→ N] ward analysis fails to prove a query. The meta-analysis attempts 0 0 J v = v Kπ(d) = d[v 7→ d(v )] to determine why a run of the forward analysis with a specific ab- J K d[v 7→ d(f)] if (d(v0) = L) v = v0.f (d) = straction π fails to prove a query, and to generalize this reason. π d[v 7→ E] otherwise Concretely, the inputs to the meta-analysis are a trace τ, an ab- J 0K v.f = v π(d) = straction π, and an initial abstract state dI , such that the π instance J esc(d) K if d(v) = E ∧ d(v0) = L of the forward analysis fails to prove that a given query holds at  0 d if (d(v) = E ∧ d(v ) 6= L) ∨ d(v) = N the end of τ. Given such inputs, the meta-analysis analyzes τ back- esc(d) if d(v) = L ∧ {d(f), d(v0)} = {L, E} ward, and collects abstractions that lead to a similar verification d[f 7→ ] d(v) = ∧ {d(f), d(v0)} = { , } failure of the forward analysis. The collected abstractions are used  L if L N L d[f 7→ E] if d(v) = L ∧ {d(f), d(v0)} = {N, E} subsequently when our top-level algorithm computes a necessary  d if d(v) = L ∧ d(f) = d(v0) condition on abstractions for proving the given query and chooses a next abstraction to try based on this condition. Figure 5. Data for the thread-escape analysis. Formally, the meta-analysis is specified by the following data:

• A set M and a function DEFINITION 2 (Optimal Abstraction Problem). Given a program γ : → 2P×D. s, an initial abstract state dI , and a query q ⊆ D, find a minimum M abstraction1 π such that Elements in M are the main data structures used by the meta- Fπ[s]{dI } ⊆ q, (1) analysis, and γ determines their meanings. We suggest to read elements in M as predicates over P×D. The meta-analysis uses or show that Fπ[s]{dI } ⊆ q does not hold for any π. such a predicate φ ∈ M to express a sufficient condition for The problem asks for not a minimal π but a minimum hence cheap- verification failure: for every (π, d) ∈ γ(φ), if we instantiate est one. This requirement encourages any solution to exploit cheap the forward analysis with π and run this instance from the abstractions (determined by ) as much as possible. Note that the abstract state d (over the part of a trace analyzed so far), we requirement is aligned with the intended meaning of , which com- will fail to prove a given query. pares parameters in terms of the analysis cost. A different way of • A function comparing parameters in terms of precision, not cost, is taken in b a : M → M Liang et al. [16]. In their case, a minimum parameter for proving a J K for each atomic command a. The input φ1 ∈ M represents a 1 0 × φ The condition that π be minimum means: if Fπ0 [s]dI ⊆ q, then π  π . postcondition on P D. Given such 1, the function computes In contrast, the minimality of π means the absence of π0 that satisfies (1) the weakest precondition φ such that running a from any and is strictly smaller than π according to . abstract state in γ(φ) has an outcome in γ(φ1).J ThisK intuition B[τ]: P × D × M → M (primitive formula) p ∈ PForm B[](π, d, φ) = φ p ::= err | param(x) | var(x) | type(σ) B[a](π, d, φ) = approx(π, d, a b(φ)) B[τ ; τ 0](π, d, φ) = B[τ](π, d, B[Jτ 0K](π, F [τ](d), φ)) γ(err) = {(π, >)} π γ(param(x)) = {(π, d) | x ∈ π} γ(var(x)) = {(π, (ts, vs)) | x ∈ vs} Figure 7. Backward meta-analysis. γ(type(σ)) = {(π, (ts, vs)) | σ ∈ ts} toDNF(φ) transforms φ to the DNF form and sorts disjuncts by size p v p0 ⇔ p = p0 0 0 0 simplify(W{φ | i ∈ {1, . . . , n}}) = φ v φ ⇔ φ = φ , or both φ and φ are conjunction of primitive i 0 0 W{φ | i ∈ {1, . . . , n} ∧ ¬(∃j < i.φ v φ )} formulas and for every conjunct p of φ , there exists i j i 0 W a conjunct p of φ such that p v p drop (π, d, {φi | i ∈ {1, . . . , n}}) = k W ( {φi | i ∈ {1,..., min(k − 1, n)}}) ∨ φj Figure 9. Data for backward meta-analysis for type-state analysis. (where (π, d) ∈ γ(φj ) and j is the smallest such index) where PForm is a set of primitive formulas, and the boolean Figure 8. Functions for manipulating φ’s. operators have the standard meanings: b is formalized by the following requirement on a : γ(true) = P × D γ(false) = ∅ γ(¬φ) = (P × D) \ γ(φ) 0 0 0 0 b J K γ(φ∧φ ) = γ(φ) ∩ γ(φ ) γ(φ∨φ ) = γ(φ) ∪ γ(φ ) ∀φ1 ∈ M. γ( a (φ1)) = {(π, d) | (π, a π(d)) ∈ γ(φ1)}. J K J K (2) • The set M comes with a binary relation v such that • 0 0 0 A function ∀φ, φ ∈ M. φ v φ ⇒ γ(φ) ⊆ γ(φ ). approx : P × D × M → M. The domain of a disjunctive meta-analysis in a sense contains The function is required to meet the following two conditions: M all boolean formulas which are constructed from primitive ones 1. ∀π, d, φ. γ(approx(π, d, φ)) ⊆ γ(φ); and in PForm. We define a generic under-approximation operator for 2. ∀π, d, φ. (π, d) ∈ γ(φ) ⇒ (π, d) ∈ γ(approx(π, d, φ)). disjunctive meta-analyses as follows: The first ensures that approx(π, d, φ) under-approximates the approx : P × D × M → M input φ, and the second says that this under-approximation approx(π, d, φ) = let φ0 = (simplify ◦ toDNF)(φ) in should keep at least (π, d), if it is already in γ(φ). The main if (number of disjuncts in φ0 ≤ k) then φ0 0 purpose of approx is to simplify φ. For instance, when φ is a else dropk(π, d, φ ) logical formula, approx converts φ to a syntactically simpler Subroutines toDNF, simplify, and drop are defined in Figure 8. The one. The operator is invoked frequently by our meta-analysis to approx operator first transforms φ to disjunctive normal form and keep the complexity of the value (the analysis’s main data M removes redundant disjuncts in the DNF formula that are subsumed structure) under control. by other shorter disjuncts in the same formula. If the resulting Using the given data, our backward meta-analysis analyzes a trace formula φ0 is simple enough in that it has no more than k disjuncts τ backward as described in Figure 7. For each atomic command a (where k is pre-determined by an analysis designer), then φ0 is in τ, it transforms an input φ using a b first. Then, it simplifies the returned as the result. Otherwise, some disjuncts of φ0 are pruned: resulting φ0 using the approx operator.J K the first k − 1 disjuncts according to their syntactic size survive, Our meta-analysis correctly tracks a sufficient condition that the together with the shortest disjunct φj that includes the input (π, d) forward analysis fails to prove a query. This condition is not trivial (i.e., (π, d) ∈ γ(φj )). Our pruning is an instance of beam search (i.e., it is a satisfiable formula), and includes enough information in Artificial Intelligence which keeps only the most promising k about the current failed verification attempt on τ by the forward options during exploration of a search space. analysis. Our theorem below formalizes these guarantees. Meta-Analysis for Type-State. We define a disjunctive meta- THEOREM 3 (Soundness). For all τ, π, d and φ ∈ M, analysis for our type-state analysis using the above recipe. Doing so means defining three entities shown in Figures 9 and 10: the set of 1. (π, Fπ[τ](d)) ∈ γ(φ) ⇒ (π, d) ∈ γ(B[τ](π, d, φ)); and primitive formulas, the order v on formulas, and a function − b. 2. ∀(π0, d0) ∈ γ(B[τ](π, d, φ)). (π0,Fπ0 [τ](d0)) ∈ γ(φ). The meta-analysis uses three primitive formulas. The firstJ isKerr Proof See Appendix A.2. which says the d component of a pair (π, d) is >. The remaining three formulas describe elements that should be included in some component of (π, d). For instance, var(x) says that the d compo- 4.1 Disjunctive Meta-Analysis and Underapproximation nent is a non-> value (ts, vs) such that the vs part contains x. 0 0 Designing a good under-approximation operator approx is impor- We order formulas φ v φ in M when φ and φ are the same, 0 tant for the performance of a backward meta-analysis, and it often or both φ and φ are conjunction of primitive formulas and every 0 0 requires new insights. In this subsection, we identify a special sub- primitive formula p in φ corresponds to some primitive formula p 0 class of meta-analyses, called disjunctive meta-analyses, and define in φ that implies p . a generic under-approximation operator for these meta-analyses. Finally, for each atomic command a, the meta-analysis uses Both our example analyses have disjunctive meta-analyses and use transfer function a b, which we show in the appendix satisfies the generic under-approximation operator. requirement (2) ofJ ourK framework. This requirement means that A meta-analysis is disjunctive if the following conditions hold: a b collects all the abstractions π and abstract pre-states d such JthatK the run of the π-instantiated analysis with d generates a result • The set consists of formulas φ: M satisfying φ. That is, a b(φ) computes the weakest precondition 0 0 φ ::= p | true | false | ¬φ | φ∧φ | φ∨φ (p ∈ PForm) of a π with respect toJ theK postcondition φ. J K b Figure 6 shows the backward meta-analysis for thread-escape x = y (err) = err analysis without and with under-approximation on an example pro- J Kb x = null (err) = err gram. The trace in part (a) is generated by the forward analysis J Kb W x.m() (err) = err ∨ {type(σ) | m (σ) = >} using initial abstraction [h1 7→ E, h2 7→ E]. The abstract states x = yJ b(paramK (z)) = param(z) J K computed by the meta-analysis at each point of this trace without x =J nullKb(param(z)) = param(z) under-approximation are denoted by ↑. It correctly computes the J x.m()Kb(param(z)) = param(z) sufficient condition for failure at the start of the trace as h1→E ∨  J K b param(x) ∧ var(y) if x ≡ z (h2→E∧h1→L), thus yielding the cheapest abstraction that proves x = y (var(z)) = the query as [h1 7→ L, h2 7→ L]. Despite taking a single iteration, J K var(z) otherwise false if x ≡ z however, the lack of under-approximation causes a blow-up in the x = null b(var(z)) = var(z) otherwise size of the formula tracked by the meta-analysis. J K Part (b) shows the result with under-approximation, using k = 1 x.m() b(var(z)) = var(z) ∧ V{¬type(σ) | m (σ) = >} J bK J K in the beam search via function dropk. This time, the first iteration, x = y (type(σ)) = type(σ) shown in part (b1), yields a stronger sufficient condition for failure, J Kb x = null (type(σ)) = type(σ) h1→E, causing our technique to next run the forward analysis using b J x.m()K (type(σ)) = ¬err abstraction [h1 7→ L, h2 7→ E]. But the analysis again fails to prove V 0 0 J K ∧ ( {¬type(σ ) | m (σ ) = >}) the query, and the second iteration, shown in part (b2), computes ∧ ((¬var(x) ∧ typeJ(σ))K the sufficient condition for failure as (h1→L ∧ h2→E). By com- W 0 0 ∨ {type(σ ) | m (σ ) = σ}) bining these conditions from the two iterations, our technique finds J K the same cheapest abstraction as that without under-approximation Figure 10. Backward transfer function for type-state analysis. in part (a). Despite needing an extra iteration, the formulas tracked in part (b) are much more compact than those in part (a). LEMMA 4. For every atomic command a, backward transfer func- tion a b in Figure 10 satisfies requirement (2) of our framework. 5. Iterative Forward-Backward Analysis J K Proof See Appendix A.4. This section presents our top-level algorithm, called TRACER, which brings a parametric analysis and a corresponding backward Meta-Analysis for Thread-Escape. The backward meta-analysis meta-analysis together, and solves the parametric static analysis for the thread-escape analysis is also disjunctive. Its domain M is problem. Throughout the section, we fix a parametric analysis and constructed from the following primitive formulas p: a backward meta-analysis, and denote them by (P, , D, − ) and b p ::= h→o | v→o | f→o (M, γ, − , approx), respectively. J K OurJ TRACERK algorithm assumes that queries are expressed by where o is an abstract value in {L, E, N}, and h, v, f are an alloca- elements φ in M satisfying the following condition: tion site, a local variable, and a field, respectively. These formulas 0 0 describe properties about pairs (π, d) of abstraction and abstract ∃D0 ⊆ D. γ(φ) = P × D0 ∧ (∃φ ∈ M. γ(φ ) = P × (D \ D0)). state. Formula h→o says that an abstraction π should map h to o. The first conjunct means that φ is independent of abstractions, and Formula v→o means that an abstract state d should bind v to o; the second, that the negation of φ is expressible in M.A φ satisfying formula f→o expresses a similar fact on the field f. We formalize × these two conditions is called a query and is denoted by symbol q. these meanings via function γ : M → 2P D below: The negation of a query q is also a query and is denoted not(q). γ(h→o) = {(π, d) | π(h) = o} γ(v→o) = {(π, d) | d(v) = o} TRACER takes as inputs initial abstract state dI , a program γ(f→o) = {(π, d) | d(f) = o} s, and a query q ∈ M. Given such inputs, TRACER repeatedly invokes the forward analysis with different abstractions, until it All of our primitive formulas express a form of binding properties. proves the query or finds that the forward analysis cannot prove The negations of these formulas simply enumerate all the other the query no matter what abstraction is used. The key part of possible bindings: when η is a local variable v or a field f, 0 TRACER is to choose a new abstraction π to try after the forward not(h→L) = h→E not(h→E) = h→L analysis fails to prove the query using some π.TRACER does this not(h→N) = true not(η→L) = (η→E ∨ η→N) abstraction selection using the backward meta-analysis which goes not(η→E) = (η→L ∨ η→N) not(η→N) = (η→L ∨ η→E) over an abstract counterexample trace of the forward analysis and computes a condition on abstractions that are necessary for proving The allocation sites are treated differently because no abstraction 0 the query. TRACER chooses a minimum-cost abstraction π among π maps a site h to N. We order formulas φ v φ0 in when M such abstractions. our simple entailment checker concludes that φ0 subsumes φ. This The TRACER algorithm is shown in Algorithm 1. It uses vari- conclusion is reached when φ and φ0 are the same, or both φ and φ0 able Π to track abstractions that can potentially prove the are conjunction of primitive formulas and all the primitive formulas viable query. Whenever TRACER calls the forward analysis, it picks a min- in φ0 appear in φ. This proof strategy is fast yet highly incomplete. imum π from Π , and instantiates the forward analysis with π However, we found it sufficient in practice for our application, viable before running the analysis (lines 8-9). Also, whenever TRACER where the order is used to detect redundant disjuncts in formulas learns a necessary condition from the backward meta-analysis for in the DNF form. proving a query (i.e., \ Π in line 14), it conjoins the condition Figure 11 shows transfer function a b of the meta-analysis for P with Π (line 15). In the description of the algorithm, we do each atomic command a. We show inJ theK appendix that it satisfies viable not specify how to choose an abstract counterexample trace τ from requirement (2) of our framework which determines the semantics a failed run of the forward analysis. Such traces can be chosen by of the function using weakest preconditions. well-known techniques from software model checking [2, 20]. We LEMMA 5. For every atomic command a, backward transfer func- show the correctness of our algorithm in the appendix. tion a b in Figure 11 satisfies requirement (2) of our framework. J K THEOREM 6.T RACER(dI , s, q) computes correct results, and it is Proof See Appendix A.3. guaranteed to terminate when P is finite.  false if δ ≡ v ∧ o = L  v→ ∨ v→ δ ≡ v ∧ o =  L E if E  v→N if δ ≡ v ∧ o = N   (v→E ∨ v→N) ∧ δ→L if δ ∈ (L \{v}) ∧ o = L b  g = v (δ→o) = (v→L ∧ δ→L) ∨ δ→E if δ ∈ (L \{v}) ∧ o = E J K  δ→N if δ ∈ (L \{v}) ∧ o = N   (v→E ∨ v→N) ∧ δ→o if δ ∈ F ∧ o ∈ {L, E}   v→L ∨ ((v→E ∨ v→N) ∧ δ→N) if δ ∈ F ∧ o = N  δ→o otherwise v = g b(δ→o) = if (δ ≡ v ∧ o = E) then (true) else (if (δ ≡ v ∧ o 6= E) then (false) else (δ→o)) J K v = new h b(δ→o) = if (δ ≡ v) then (h→o) else (δ→o) J K v = null b(δ→o) = if (δ ≡ v ∧ o = N) then (true) else (if (δ ≡ v ∧ o 6= N) then (false) else (δ→o)) J K v = v0 b(δ→o) = if (δ ≡ v) then (v0→o) else (δ→o) J K  0 0 0  (v →L ∧ f→E) ∨ v →E ∨ v →N if δ ≡ v ∧ o = E v = v0.f b(δ→o) = v0→L ∧ f→o if δ ≡ v ∧ o 6= E J K  δ→o if δ 6≡ v  δ→o if δ 6∈ (L ∪ F)  0  δ→E ∨ (δ→L ∧ v→E ∧ v →L) if δ ∈ L ∧ o = E  0  ∨ (δ→L ∧ v→L ∧ f→L ∧ v →E)  0  ∨ (δ→L ∧ v→L ∧ f→E ∧ v →L)   δ→o if δ ∈ L ∧ o = N  0 0  δ→o ∧ (v→N ∨ (v→E ∧ (v →E ∨ v →N)) if (δ ∈ L ∧ o = L)  0  ∨ (v→L ∧ (v →N ∨ f→N ∨ (δ ∈ F ∧ o = E ∧ δ 6≡ f)  0  ∨ (v →L ∧ f→L) ∨ (δ ∈ F ∧ o = L ∧ δ 6≡ f)  0  ∨ (v →E ∧ f→E))))  0  δ→N ∨ (v→E ∧ v →L) if δ ∈ F ∧ o = N ∧ δ 6≡ f  0  ∨ (v→L ∧ f→L ∧ v →E) v.f = v0 b(δ→o) = ∨ (v→L ∧ f→E ∧ v0→L) J K  (δ→N ∧ (v→E ∨ v→N ∨ (v→L ∧ v0→N))) if δ ∈ ∧ o = N ∧ δ ≡ f  F  ∨ (v→E ∧ v0→L)   ∨ (δ→L ∧ v→L ∧ v0→E)  0  ∨ (δ→E ∧ v→L ∧ v →L)  0  (δ→N ∧ v→L ∧ v →L) if δ ∈ F ∧ o = L ∧ δ ≡ f   ∨ (δ→L ∧ v→N)  0 0  ∨ (δ→L ∧ v→E ∧ (v →E ∨ v →N))  0 0  ∨ (δ→L ∧ v→L ∧ (v →N ∨ v →L))  (δ→N ∧ v→L ∧ v0→E) if δ ∈ ∧ o = E ∧ δ ≡ f  F  ∨ (δ→E ∧ v→N)   ∨ (δ→E ∧ (v→E ∨ v→L) ∧ (v0→E ∨ v0→N))

Figure 11. Backward transfer function for thread-escape analysis. We use δ to range over h, v, f, and we use o to range over L, E, N.

description # classes # methods bytecode (KB) KLOC log2(# abstractions) app total app total app total app total type-state thread-esc. tsp Traveling Salesman implementation 4 997 21 6,423 2.6 391 0.7 269 569 6,175 elevator discrete event simulator 5 998 24 6,424 2.3 390 0.6 269 352 6,180 hedc web crawler from ETH 44 1,066 234 6,881 16 442 6 283 1,400 7,326 weblech website download/mirror tool 57 1,263 312 8,201 20 504 13 326 2,993 7,663 antlr A parser/translator generator 118 1,134 1,180 7,786 131 532 29 303 16,563 7,748 avrora microcontroller simulator/analyzer 1,160 2,192 4,253 10,985 224 634 64 340 37,797 10,151 lusearch text indexing and search tool 229 1,236 1,508 8,171 101 511 42 314 14,508 7,395 Table 1. Benchmark statistics computed using a 0-CFA call graph analysis. The “total” and “app” columns report numbers with and without counting JDK library code, respectively. The last two columns determine the (log of the) number of abstractions for our two client analyses. Algorithm 1 TRACER(dI , s, q): iterative forward-backward analysis and transitions to error upon any method call v.m() in application code if the following two conditions hold: (i) v may point to an 1: INPUTS: Initial abstract state dI , program s, and query q 2: OUTPUTS: Minimum π according to  such that object created at site h according to a 0-CFA may- that is used by the type-state analysis, and (ii) v is not in the current Fπ[s]({dI }) ⊆ {d | (π, d) ∈ γ(q)}. Or impossibility must-alias set tracked by the type-state analysis. If neither of these meaning that @π : Fπ[s]({dI }) ⊆ {d | (π, d) ∈ γ(q)}. conditions holds, then the abstract object remains in the init state, 3: var Πviable := P 4: while true do which corresponds to precise type-state tracking by the analysis. To enable a comprehensive evaluation of our technique, we 5: if Πviable = ∅ then 6: return impossible generated queries pervasively and uniformly from the application 7: end if code of each benchmark. For the type-state analysis, we generated a query at each method call site, and for the thread-escape analysis, 8: choose a minimum π ∈ Πviable according to  we generated a query at each instance field access and each array 9: let D = (Fπ[s]({dI }) ∩ {d | (π, d) ∈ γ(not(q))}) in 10: if D = ∅ then element access. Specific clients of these analyses may pose queries 11: return π more selectively but our technique only stands to benefit in such 12: end if cases by virtue of being query-driven. To avoid reporting duplicate results across different programs, we did not generate any queries 13: choose any τ ∈ trace(s) : Fπ[τ](dI ) ∈ D 0 0 in the JDK standard library, but our analyses analyze all reachable 14: let Π = {π | (π , dI ) ∈ γ(B[τ](π, dI , not(q)))} in bytecode including that in the JDK. 15: Πviable := Πviable ∩ (P \ Π) 16: end let Our type-state analysis answers each query (pc, h) such that the 17: end let statement at program point pc is a method call v.m() in application 18: end while code and variable v may point to an object allocated at a site h that also occurs in application code. The query is proven if every object allocated at site h that variable v may refer to at the point of this call is in state init (i.e., not error). These queries stress-test our Proof See Appendix A.5. type-state analysis since they fail to be proven if the underlying may-alias or must-alias analysis loses precision along any program 6. Experiments path from the point at which any object is created in application code to the point at which any application method is called on it. We implemented our parametric dataflow analysis technique in Our thread-escape analysis answers each query (pc, v) such that Chord [1], an extensible program analysis framework for Java the statement at program point pc in application code accesses bytecode. The forward analysis is expressed as an instance of the (reads or writes) an instance field or an array element of the object RHS tabulation framework [19] while the backward meta-analysis denoted by variable v. Such queries may be posed by a client such is expressed as an instance of a trace analysis framework that as static datarace detection. implements our proposed optimizations. We chose type-state and thread-escape analyses as they are We implemented our type-state and thread-escape analyses in challenging to scale: both are fully flow- and context-sensitive our framework, and we evaluated our technique on both of them analyses that use radically different heap abstractions. The thread- using a suite of seven real-world concurrent Java benchmark pro- escape analysis is especially hard to scale: simply mapping all grams. Table 1 shows characteristics of these programs. All exper- allocation sites to L in the abstraction causes the analysis to run iments were done using JDK 1.6 on Linux machines with 3.0 GHz out of memory even on our smallest benchmark. Moreover, these processors and a maximum of 8GB memory per JVM process. analyses are useful in their own right: type-state analysis is an The last two columns of Table 1 show the size of the family important general analysis for object state verification while thread- of abstractions searched by each analysis for each benchmark. escape analysis is beneficial to a variety of concurrency analyses. 2N N For the type-state analysis, it is where is the number of We next summarize our evaluation results, including precision, pointer-typed variables in reachable methods, since an abstraction scalability, and useful statistics of proven queries. determines which variables the analysis can track in must-alias sets. For thread-escape analysis, it is 2N where N is the number of object allocation sites in reachable methods, since the abstraction Precision. Figure 12 shows the precision of our technique. The determines whether to map each such site to L or E. absolute number of queries for each benchmark appears at the top. We presented our technique for a single query but in practice a The queries are classified into three categories: those proven using client may pose multiple queries in the same program. Our frame- a cheapest abstraction, those shown impossible to prove using any work has the same effect as running our technique separately for abstraction, and those that could not be resolved by our technique each query but it uses a more efficient implementation: at any in 1,000 minutes (we elaborate on these queries below). instant, it maintains a set of groups {G1, ..., Gn} of unresolved All queries are resolved in the type-state analysis. Of these queries (i.e., queries that are neither proven nor shown impossible queries, 25% are proven on average per benchmark. Queries im- to prove). Two queries belong to the same group if the sets of un- possible to prove are notably more than proven queries for the viable abstractions computed so far for those queries are the same. type-state analysis primarily due to the stress-test nature of the All queries start in the same group with an empty set of unviable type-state property that the analysis checks. In contrast, for the abstractions but split into separate groups when different sets of thread-escape analysis, our technique proves 38% queries and it unviable abstractions are computed for them. shows 47% queries impossible to prove, for a total of 85% resolved To avoid skewing our results by using a real type-state property queries on average per benchmark. We manually inspected several to evaluate our type-state analysis, we used a fictitious one that queries that were unresolved, and found that all of them were true tracks the state of every object allocated in the application code but impossible to prove using our thread-escape analysis, due to its of the program (as opposed to, say, only file objects). The type- limit of two abstract locations (L and E). There are two possible state automaton for this property has two states, init and error. ways to address such queries depending on the desired goal: alter The type-state analysis tracks a separate abstract object for each the backward meta-analysis to show impossibility more efficiently allocation site h in application code that starts in the init state, or alter the forward analysis to make the queries provable. 12 72 170 71 7903 5052 3644 (Total # Queries) 209 221 552 658 5857 14322 14040 6726 (Total # Queries) 100% 100% 90% 90% 80% 80% 70% 70% Unresolved 60% Impossible 60% 50% 50% Impossible 40% Proven 40%

% Queries % Queries Proven 30% 30% 20% 20% 10% 10% 0% 0%

tsp tsp hedc antlr AVG. hedc antlr AVG. avrora avrora elevator weblech lusearch elevator weblech lusearch (a) Type-state analysis. (b) Thread-escape analysis. Figure 12. Precision measurements.

In summary, we found our technique useful at quantifying the type-state analysis thread-escape analysis limitations of a parametric dataflow analysis, and inspecting the min max avg min max avg queries deemed impossible to prove suggests what aspects of the tsp 2 3 2.3 1 1 1 analysis to change to overcome those limitations. elevator 2 4 2.8 1 2 1.1 hedc 2 5 2.9 1 5 1.4 weblech 2 4 3.3 1 16 6.1 Scalability. It is challenging to scale backward static analyses. antlr 2 20 10.9 1 87 1.9 We found that underapproximation is crucial to the scalability of avrora 2 84 50.2 1 96 2.1 our backward meta-analysis: disabling it caused our technique to lusearch 2 34 16.5 1 18 1.7 timeout for all queries even on our smallest benchmark. Recall that for disjunctive meta-analysis (Section 4.1) the degree of underap- Table 3. Statistics of cheapest abstraction size for proven queries. proximation can be controlled by specifying the maximum number of disjuncts k retained in the boolean formulae that are propagated type-state analysis thread-escape analysis backward by the meta-analysis. We found it optimal to set k = 5 # groups min max avg # groups min max avg for our two client analyses on all our benchmarks. We arrived at tsp 4 1 1 1 3 1 3 2 this setting by experimenting with different values of k. Figure 13 elevator 8 1 5 1.5 10 1 13 3.8 illustrates the effect of setting k to 1, 5, and 10 on the running time hedc 26 1 4 1.3 34 1 50 7.6 of our thread-escape analysis. We show these results only for our weblech 18 1 17 2.3 51 1 88 6.4 smallest four benchmarks as the analysis ran out of memory on the antlr 101 1 106 4.6 237 1 374 10.7 larger three benchmarks for k = 1 and k = 10. Intuitively, the rea- avrora 158 1 368 8.6 890 1 262 9 son is that doing underapproximation aggressively (k = 1) reduces lusearch 151 1 139 3.6 270 1 256 13.7 the running time of the backward analysis in each iteration but it Table 4. Statistics of cheapest abstraction reuse for proven queries. increases the number of iterations to resolve a query, whereas do- ing underapproximation passively (k = 10) reduces the number of iterations but increases the running time of the backward analysis ables that must be tracked in must-alias sets ranges from 2 to 50 in each iteration. Compared to these two extremes, setting k = 5 from our smallest benchmark to our largest. On the other hand, results in much fewer timeouts and better scalability overall. thread-escape analysis only needs 1 to 2 sites mapped to L on aver- Table 2 shows statistics about the number of iterations of our age for most benchmarks, though there are queries that need upto technique for resolved queries using k = 5. Minimum, maximum, 96 such sites (this means that no abstraction with fewer than those and average number of iterations are shown separately for proven many sites can prove those queries). The graphs in Figure 14 show queries and for queries found impossible to prove. The table high- the distribution of the cheapest abstraction sizes for thread-escape lights the scalability of our technique as queries for most bench- analysis on our largest three benchmarks. We see that most queries marks are resolved in under ten iterations on average. The only are indeed proven using 1 or 2 allocation sites mapped to L. exception is the type-state analysis on avrora which takes 48 itera- Finally, it is worth finding how different the cheapest abstrac- tions on average for proven queries. The reason is that, compared tions computed by our technique are for these proven queries. Ta- to the remaining benchmarks, for avrora our type-state analysis re- ble 4 answers this question by showing the numbers of queries quires many more variables in the cheapest abstraction for most grouped together sharing the same cheapest abstraction. The table of the proven queries (this is confirmed below by statistics on the shows that around ten or less queries on average share the same sizes of the cheapest abstractions). The numbers of iterations also cheapest abstraction for both analyses, indicating that the cheapest show that our technique is effective at finding queries that are im- abstraction tends to be different for different queries, though there possible to prove since for most benchmarks it finds such queries are also a few large groups containing up to 368 queries for type- in under four iterations on average. We also show the running time state analysis and 374 queries for thread-escape analysis. for thread-escape analysis in the table as it is relatively harder to These statistics underscore both the promise and the challenge scale than type-state analysis. For each benchmark, our technique of parametric static analysis: on one hand, most queries can be takes one to two minutes on average for resolved queries. proven by instantiating the analysis using very inexpensive abstrac- tions, but on the other hand, these abstractions tend to be very dif- Statistics of Proven Queries. We now present some useful statis- ferent for queries from different parts of the same program. tics about proven queries. Table 3 shows the minimum, maximum, and average sizes of the cheapest abstraction computed by our tech- nique for these queries. For type-state analysis, the size of the 7. Related Work cheapest abstraction correlates the benchmark size and is greatly Our work is related to iterative refinement analyses but differs in the affected by the depth of method calls. The average number of vari- goal and the technique. They aim to find a cheap enough abstraction number of iterations running time of thread-escape analysis type-state analysis thread-escape analysis (s = seconds, m = minutes, h = hours) proven impossible proven impossible proven impossible min max avg min max avg min max avg min max avg min max avg min max avg tsp 2 2 2 1 2 1.5 2 2 2 1 1 1 14s 29s 21s 6s 8s 7s elevator 2 3 2.1 1 9 3.8 2 3 2 1 5 1.3 12s 107s 34s 6s 144s 15s hedc 2 3 2.3 1 2 1 2 6 2.4 1 4 1.8 17s 6m 89s 10s 5m 51s weblech 2 3 2.1 1 3 1.4 2 17 6.9 1 3 1.2 20s 16m 5m 11s 150s 28s antlr 2 18 8.9 1 47 7.8 2 88 3 1 18 1.4 18s 77m 98s 6s 21m 64s avrora 2 82 47.7 1 30 3.5 2 97 3 1 38 1.4 16s 28m 67s 5s 3h 41s lusearch 2 32 2.1 1 23 2 2 19 2.7 1 20 2.3 14s 13m 112s 6s 45m 131s Table 2. Scalability measurements.

250" 180" max"disjuncts"="5" 200" tsp" 160" elevator" max"disjuncts"="1" 140" 150" 120" max"disjuncts"="10" 100" 100" 80" 60" #"queries" 50" #"queries" 40" 20" 0" 0"

0&19" 20&39" 40&59" 60&79" 80&99" 0(19" 100&119"120&139"140&159"160&179"180&199"200&200"-meout" 20(39" 40(59" 60(79" 80(99" 100(119"120(139"140(159"160(179"180(199"200(431"-meout" analysis"-me"(seconds)" analysis"-me"(seconds)"

200" 300" hedc" 250" weblech" 150" 200" 100" 150" 100"

#"queries" 50" #"queries" 50" 0" 0"

0&19" 0'19" 20&39" 40&59" 60&79" 80&99" 20'39" 40'59" 60'79" 80'99" 100&119"120&139"140&159"160&179"180&199"200&459"-meout" 100'119"120'139"140'159"160'179"180'199"200'941"-meout" analysis"-me"(seconds)" analysis"-me"(seconds)"

Figure 13. Running time of thread-escape analysis per query for different degrees of underapproximation k = 1, 5, 10 in its meta-analysis on our smallest four benchmarks. The timeout columns denote queries that could not be resolved in 1,000 minutes. Setting k to 1 or 10 resolves fewer queries than k = 5 for the shown benchmarks and also caused the analysis to run out of memory for the largest three benchmarks.

2345% 1400 1275 6000% 5436% 2500% antlr avrora% lusearch% 1200 5000% 2000% 1000 4000% 1500% 800 706 3000% 600 1000% 805% 390 2000% 400 954% 892% 500% 295% 1000% 129% 86% 200 79 #%proven%queries% 23% 4% 3% 15% # proven queries 2% 1% 39 19 4 2 6 3 13 #%proven%queries% 164% 68% 110% 13% 181% 41% 13% 123% 0 0% 0% 1 2 3 4 5 6 7 8 9 10 11-87 1% 2% 3% 4% 5% 6% 7% 8% 9% 10% 11,96% 1% 2% 3% 4% 5% 6% 7% 8% 9% 10% 18,18% # sites set to L to prove query #%sites%set%to%L%to%prove%query% #%sites%set%to%L%to%prove%query%

Figure 14. Sizes of the cheapest abstractions computed for proven queries of the thread-escape analysis on our largest three benchmarks. to prove a query while we aim to find a cheapest abstraction or show ceed in proving the query on the trace, whereas our meta-analysis that none exists. We next contrast the techniques. computes a sufficient condition for the failure of the given analy- CEGAR-based model checkers such as SLAM [3] and BLAST sis to prove the query on the trace. Intuitively, our meta-analysis [14] compute a predicate abstraction of the given program to prove attempts to find as many other abstractions destined to a similar an assertion (query) in the program. Yogi [4, 9] combines CEGAR- proof failure as the currently chosen abstraction; the next abstrac- based model checking and directed input generation to simulta- tion our approach attempts is simply a cheapest one not discarded neously search for proofs and violations of assertions. All these by the meta-analysis. One advantage of the above approaches over approaches can be viewed as parametric in which program predi- our approach is that they can produce concrete counterexamples cates to use in the predicate abstraction. They differ from our ap- for false queries, whereas our approach can at best declare such proach primarily in the manner in which they analyze an abstract queries impossible to prove using the given analysis. Conversely, counterexample trace that is produced as a witness to the failure our approach can declare when true queries are impossible to prove to prove a query using the currently chosen abstraction. In particu- using the given analysis, whereas the above approaches can diverge lar, these approaches compute an interpolant, which can be viewed for such queries. as a minimal sufficient condition for the model checker to suc- Refinement-based pointer analyses compute cause-effect de- defining the transfer functions of the meta-analysis can be tedious pendencies for finding aspects of the abstraction that might be re- and error-prone. One plausible solution is to devise a general recipe sponsible for the failure to prove a query and then refine these as- for synthesizing these functions automatically from a given abstract pects in the hope of proving it. These aspects include field reads domain and parametric analysis. and writes to be matched [21, 22], methods or object allocation sites to be cloned [15, 18], or memory locations to be treated flow- sensitively [13]. A drawback of these analyses is that they can refine Acknowledgments much more than necessary and thereby sacrifice scalability. We thank Ravi Mangal for discussions and help with the implemen- Combining forward and backward analysis has been proposed tation. We also thank the anonymous reviewers for many insightful (e.g., [5]) but our approach differs in three key aspects. First, ex- comments. This research was supported in part by DARPA con- isting backward analyses are proven sound with respect to the pro- tract #FA8750-12-2-0020, awards from Google and Microsoft, and gram’s concrete semantics, whereas ours is a meta-analysis that is EPSRC. proven sound with respect to the abstract semantics of the forward analysis. Second, existing backward analyses only track abstract states (to prune the over-approximation computed by the forward References analysis), whereas ours also tracks parameter values. Finally, exist- [1] Chord:. http://code.google.com/p/jchord/. ing backward analyses may not scale due to tracking of program [2] T. Ball and S. Rajamani. Bebop: a path-sensitive interprocedural states that are unreachable from the initial state, whereas ours is dataflow engine. In Proceedings of the ACM Workshop on Program guided by the abstract counterexample trace provided by the for- Analysis For Software Tools and Engineering (PASTE’01), 2001. ward analysis, which also enables underapproximation. [3] T. Ball and S. Rajamani. The SLAM project: Debugging system soft- Parametric analysis is a search problem that may be tackled ware via static analysis. In Proceedings of the 29th ACM Symposium using various algorithms with different pros and cons. Liang et al. on Principles of Programming Languages (POPL’02), 2002. [16] propose iterative coarsening-based algorithms that start with [4] N. Beckman, A. Nori, S. Rajamani, R. Simmons, S. Tetali, and the most precise abstraction (instead of the least precise one in A. Thakur. Proofs from tests. IEEE Trans. Software Eng., 36(4):495– the case of iterative refinement-based algorithms). Besides being 508, 2010. impractical, these algorithms solve a different problem and cannot [5] P. Cousot and R. Cousot. Refining model checking by abstract inter- be adapted to ours: they find a minimal abstraction in terms of pretation. Autom. Softw. Eng., 6(1):69–95, 1999. precision as opposed to a minimum or cheapest abstraction. Naik [6] I. Dillig, T. Dillig, and A. Aiken. Sound, complete and scalable et al. [17] use dynamic analysis to infer a necessary condition on path-sensitive analysis. In Proceedings of the 29th ACM Conference the abstraction to prove a query. They instantiate the parametric on Programming Language Design and Implementation (PLDI’08), analysis using a cheapest abstraction that satisfies this condition. 2008. However, there is no guarantee that it will prove the query, and the [7] I. Dillig, T. Dillig, and A. Aiken. Fluid updates: beyond strong vs. approach does not do refinement in case the analysis fails. weak updates. In Proceedings of the 19th European Symposium on Finally, constraint-based and automated theorem proving tech- Programming (ESOP’10), 2010. niques have been proposed that use search procedures similar [8] S. Fink, E. Yahav, N. Dor, G. Ramalingam, and E. Geay. Effective in spirit to our approach: they too combine over- and under- typestate verification in the presence of aliasing. ACM Trans. Softw. approximations, and compute strongest necessary and weakest Eng. Methodol., 17(2), 2008. sufficient conditions for proving queries (e.g, [6, 7, 11, 12]). A [9] B. Gulavani, T. Henzinger, Y. Kannan, A. Nori, and S. Rajamani. key difference is that none of these approaches address finding Synergy: a new algorithm for property checking. In Proceedings of minimum-cost abstractions or proving impossibility results. the 14th ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE’06), 2006. 8. Conclusion [10] B. Gulavani, S. Chakraborty, A. Nori, and S. Rajamani. Automatically We presented a new approach to parametric dataflow analysis refining abstract interpretations. In Proceedings of the 14th Interna- with the goal of finding a cheapest abstraction that proves a given tional Conference on Tools and Algorithms for the Construction and query or showing that no such abstraction exists. Our approach is Analysis of Systems (TACAS’08), 2008. CEGAR-based and applies a novel meta-analysis on abstract coun- [11] S. Gulwani and A. Tiwari. Assertion checking unified. In Proceedings terexample traces to efficiently eliminate unsuitable abstractions. of the 8th International Conference on Verification, Model Checking, We showed the generality of our approach by applying it to two and Abstract Interpretation (VMCAI’07), 2007. example analyses in the literature. We also showed its practicality [12] S. Gulwani, B. McCloskey, and A. Tiwari. Lifting abstract interpreters by applying it to several real-world Java benchmark programs. to quantified logical domains. In Proceedings of the 35th ACM Sym- Our approach opens intriguing new problems. First, our ap- posium on Principles of Programming Language (POPL’08), 2008. proach requires the abstract domain of the parametric analysis to be [13] S. Guyer and C. Lin. Client-driven . In Proceedings of disjunctive in order to be able to provide a counterexample trace to the 10th International Symposium on Static Analysis (SAS’03), 2003. the meta-analysis. Our meta-analysis relies on the existence of such [14] T. Henzinger, R. Jhala, R. Majumdar, and K. McMillan. Abstractions a trace for scalability: the trace guides the meta-analysis in deciding from proofs. In Proceedings of the 31st ACM Symposium on Principles which parts of the formulae it tracks represent infeasible states that of Programming Languages (POPL’04), 2004. can be pruned. One possibility is to generalize our meta-analysis [15] P. Liang and M. Naik. Scaling abstraction refinement via pruning. In to operate on DAG counterexamples that have been proposed for Proceedings of the 32nd ACM Conference on Programming Language non-disjunctive analyses [10]. Second, the meta-analysis is a static Design and Implementation (PLDI’11), 2011. analysis, and designing its abstract domain is an art. We proposed [16] P. Liang, O. Tripp, and M. Naik. Learning minimal abstractions. In a DNF representation along with optimizations that were very ef- Proceedings of the 38th ACM Symposium on Principles of Program- fective in compacting the formulas tracked by the meta-analysis ming Languages (POPL’11), 2011. for our type-state analysis and our thread-escape analysis. It would [17] M. Naik, H. Yang, G. Castelnuovo, and M. Sagiv. Abstractions from be useful to devise a generic semantics-preserving simplification tests. In Proceedings of the 39th ACM Symposium on Principles of process to assist in compacting such formulas. Finally, manually Programming Languages (POPL’12), 2012. [18] J. Plevyak and A. Chien. Precise concrete type inference for object- oriented languages. In Proceedings of the 9th ACM Conference on Object-Oriented Programming Systems, Languages, and Applications (OOPSLA’94), 1994. [19] T. Reps, S. Horwitz, and M. Sagiv. Precise interprocedural dataflow analysis via graph reachability. In Proceedings of the 22nd ACM Symposium on Principles of Programming Languages (POPL’95), 1995. [20] T. Reps, S. Schwoon, S. Jha, and D. Melski. Weighted pushdown systems and their application to interprocedural dataflow analysis. Sci. Comput. Program., 58(1-2):206–263, 2005. [21] M. Sridharan and R. Bod´ık. Refinement-based context-sensitive points-to analysis for Java. In Proceedings of the 27th ACM Conference on Programming Language Design and Implementation (PLDI’06), 2006. [22] M. Sridharan, D. Gopan, L. Shan, and R. Bod´ık. Demand-driven points-to analysis for Java. In Proceedings of the 20th ACM Con- ference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA’05), 2005. A. Proofs of Lemmas and Theorems complete the proof of this loop case if we show that

In this part of the appendix, we provide proofs of Lemmas and ∀(T1,D1) ∈ R :(K(T1),G(D1)) ∈ R. Theorems that are stated in the main text of the paper without proofs. We will prove the simplified requirement. Pick (T1,D1) ∈ R. We should show that A.1 Proof of Lemma 1 {Fπ[τ1](d) | τ1 ∈ K(T1) ∧ d ∈ D} = G(D1). LEMMA 1. For all programs s, parameter values π, and abstract states d, we have that We derive this desired equality as follows: F [s]({d}) = {F [τ](d) | τ ∈ trace(s)}, {Fπ[τ1](d) | τ1 ∈ K(T1) ∧ d ∈ D} π π 0 = {Fπ[τ1](d) | d ∈ D ∧ (τ1 =  ∨ (τ1 = τ ; τ ∧ τ ∈ T1 where F [τ] is the result of analyzing the trace τ as shown in 0 π ∧ τ ∈ trace(s1)))} Figure 3. 0 = D ∪ {(Fπ[τ ] ◦ Fπ[τ])(d) | 0 τ ∈ T1 ∧ τ ∈ trace(s1) ∧ d ∈ D} Proof We prove the following generalization of the lemma: 0 0 0 = D ∪ {Fπ[τ ](d ) | τ ∈ trace(s1) D 0 ∀s. ∀π ∈ P. ∀D ∈ 2 . ∧ d ∈ {Fπ[τ](d) | τ ∈ T1 ∧ d ∈ D}} 0 0 0 0 Fπ[s](D) = {Fπ[τ](d) | d ∈ D ∧ τ ∈ trace(s)}. = D ∪ {Fπ[τ ](d ) | τ ∈ trace(s1) ∧ d ∈ D1} The proof is by induction on the structure of s. When s is an = D ∪ Fπ[s1](D1). atomic command a, = G(D1). (T ,D ) ∈ R Fπ[s](D) = Fπ[a](D) = { a π(d) | d ∈ D}. The fourth equality comes from the assumption that 1 1 . J K The fifth equality holds because of the induction hypothesis.  Since trace(s) = {a}, the equation above implies the lemma. The next case is that s = s1 ; s2. Let D1 = Fπ[s1](D) and A.2 Proof of Theorem 3 D2 = Fπ[s2](D1). By the induction hypothesis on s1 and s2, we have that THEOREM 3 (Soundness). For all τ, π, d and φ ∈ M,

D1 = {Fπ[τ1](d) | τ1 ∈ trace(s1) ∧ d ∈ D} 1. (π, Fπ[τ](d)) ∈ γ(φ) ⇒ (π, d) ∈ γ(B[τ](π, d, φ)); and ∧ D2 = {Fπ[τ2](d1) | τ2 ∈ trace(s2) ∧ d1 ∈ D1}. 2. ∀(π0, d0) ∈ γ(B[τ](π, d, φ)). (π0,Fπ0 [τ](d0)) ∈ γ(φ). Hence, D2 is Proof We prove the theorem by structural induction on τ. Pick {Fπ[τ2](Fπ[τ1](d)) | τ2 ∈ trace(s2) ∧ τ1 ∈ trace(s1) ∧ d ∈ D} arbitrary π, d, φ. We will prove the two items of this theorem for = {Fπ[τ](d) | τ ∈ trace(s1 ; s2) ∧ d ∈ D}. each case of τ. We move on to the case that s = s1 + s2. In this case, When τ is , φ and B[τ](π, d, φ) are the same, and Fπ0 [τ] = 0 Fπ0 [] is the identity for every π . The claim of the theorem follows Fπ[s](D) = Fπ[s1 + s2](D) = Fπ[s1](D) ∪ Fπ[s2](D). from these facts. By the induction hypothesis, the RHS of the above equation is the The next case is that τ is an atomic command a. In this 0 0 0 0 b 0 same as case, B[τ](π , d , φ) = approx(π , d , a (φ)) and Fπ0 [τ](d ) = 0 0 0 J K {F [τ](d) | τ ∈ trace(s ) ∧ d ∈ D} a π0 (d ) for all π , d . Hence, the first item of the theorem can be π 1 JprovedK as follows: ∪ {Fπ[τ](d) | τ ∈ trace(s2) ∧ d ∈ D}, which equals Fπ[τ](d) ∈ γ(φ) ⇒ a π(d) ∈ γ(φ) ⇒ (Jπ,K d) ∈ γ( a b(φ)) {Fπ[τ](d) | τ ∈ trace(s1) ∪ trace(s2) ∧ d ∈ D} ⇒ (π, d) ∈ γ(approxJ K (π, d, a b(φ))) = {Fπ[τ](d) | τ ∈ trace(s1 + s2) ∧ d ∈ D}. J K ∗ The second and third implications come from the requirement on The remaining case is s = s . Let K and G be functions defined b 1 a and approx in our framework, respectively. For the second as follows: itemJ K of the theorem, pick (π0, d0) ∈ γ(B[τ](π, d, φ)). Then, 0 0 K = λT. ({} ∪ {τ ; τ | τ ∈ T ∧ τ ∈ trace(s1)}), 0 0 b G = λD . (D ∪ Fπ[s1](D )). (π0, d0) ∈ γ(B[a](π, d, φ)) ⇒ (π0, d0) ∈ γ(approx(π, d, a (φ)) b J K ∗ ⇒ (π0, d0) ∈ γ( a (φ)) The trace set trace(s1) is the same as the least fixpoint of K, and J K ∗ ⇒ (π0, a π0 (d0)) ∈ γ(φ). Fπ[s1](D) is defined by the least fixpoint of G. Let R be a relation J K on trace sets and sets of abstract states defined by The first implication is just the unrolling of the definition of B[a], the second comes from our requirement that approx does underap- (T ,D ) ∈ R ⇔ D = {F [τ ](d) | τ ∈ T ∧ d ∈ D}. 1 1 1 π 1 1 1 proximation, and the last holds because of the requirement on a b Meanwhile, we have that in our framework. The conclusion of the above derivation impliesJ K ∗ the second item of this theorem. {Fπ[τ](d) | τ ∈ trace(s1) ∧ d ∈ D} The remaining case is that τ is τ1 ; τ2. Let = {Fπ[τ](d) | τ ∈ (leastFix K) ∧ d ∈ D}. 0 0 0 Thus, it is sufficient to prove that (leastFix K, leastFix G) ∈ R. d = Fπ[τ1](d) and φ = B[τ2](π, d , φ). This proof obligation can be further simplified because R is closed To prove the first item of the theorem, assume that (π, Fπ[τ](d)) under arbitrary union: for every {(Ti,Di)}i∈I , belongs to γ(φ). Then, [ [ 0 (∀i ∈ I :(Ti,Di) ∈ R) ⇒ ( Ti, Di) ∈ R. (π, Fπ[τ2](d )) = (π, Fπ[τ2](Fπ[τ1](d))) ∈ γ(φ). i∈I i∈I We apply the induction hypothesis on τ here, and obtain This means that the least upper bounds of any two R-related in- 2 0 0 creasing sequences will be related by R as well. Hence, we can (π, d ) ∈ γ(B[τ2](π, d , φ)). The LHS in this membership is the same as (π, Fπ[τ1](d)), and ⇔ the RHS equals γ(φ0). This means that we can apply the induction (δ ∈ L ∧ o = N ∧ d(δ) = N) hypothesis again, this time on τ1, and get ∨ (δ ∈ L ∧ o = E ∧ ((d(v) = L ∧ d(δ) = L) ∨ d(δ) = E)) ∨ (δ ∈ ∧ o = L ∧ d(v) ∈ {E, N} ∧ d(δ) = L) 0 L (π, d) ∈ γ(B[τ1](π, d, φ )) ∨ (δ ∈ F ∧ o = N ∧ (d(v) = L ∨ (d(v) ∈ {E, N} ∧ d(δ) = N))) ∨ (δ ∈ ∧ o ∈ {E, L} ∧ d(v) ∈ {E, N} ∧ d(δ) = o) But the RHS here is the same as F ⇔ 0 γ(B[τ1](π, d, B[τ2](π, d , φ)) = γ(B[τ](π, d, φ)). (δ ∈ L ∧ o = N ∧ (π, d) ∈ γ(δ→N)) ∨ (δ ∈ L ∧ o = E ∧ (π, d) ∈ γ((v→L ∧ δ→L) ∨ δ→E)) Hence, the first item of this theorem holds. Let’s move on to the ∨ (δ ∈ L ∧ o = L ∧ (π, d) ∈ γ((v→E ∨ v→N) ∧ δ→L)) proof of the second item of this theorem. Consider (π0, d0) ∈ ∨ (δ ∈ F ∧ o = N ∧ (π, d) ∈ γ(v→L ∨ (v→E ∨ v→N) ∧ δ→N)) γ(B[τ](π, d, φ)). We derive the conclusion of the second item of ∨ (δ ∈ F ∧ o ∈ {E, L} ∧ (π, d) ∈ γ((v→E ∨ v→N) ∧ δ→o)). the theorem as follows: The third subcase can be proven as follows: (π0, d0) ∈ γ(B[τ](π, d, φ)) (π, g = v (d)) ∈ γ(v→o) 0 π ⇒ (π0, d0) ∈ γ(B[τ1](π, d, φ )) ⇔ J K 0 ⇒ (π0,Fπ0 [τ1](d0)) ∈ γ(φ ) (d(v) = L ∧ esc(d)(v) = o) ∨ (d(v) = o ∧ o ∈ {E, N}) 0 ⇒ (π0,Fπ0 [τ1](d0)) ∈ γ(B[τ2](π, d , φ)) ⇔ ⇒ (π0,Fπ0 [τ2](Fπ0 [τ1](d0))) ∈ γ(φ) (d(v) = L ∧ o = E) ∨ (d(v) = o ∧ o ∈ {E, N}) ⇒ (π0,Fπ0 [τ](d0)) ∈ γ(φ) ⇔ (o = E ∧ (π, d) ∈ γ(v→L ∨ v→E)) ∨ (o = N ∧ (π, d) ∈ γ(v→N)). The first implication holds because of the definition of B[τ1 ; τ2], the second and fourth implications use induction hypothesis, the The calculations for the three subcases above imply that the third is the simple unrolling of the definition of φ0, and the last claimed equality holds for g = v. v = g implication follows from the definition of Fπ0 [τ1 ; τ2].  The second case is . In this case, we can show the lemma as follows:

A.3 Proof of Lemma 5 (π, v = g π(d)) ∈ γ(δ→o) ⇔ J K LEMMA 5. For every atomic command a, the backward transfer (π, d[v : E]) ∈ γ(δ→o) function a b in Figure 11 satisfies requirement (2) in Section 4. J K ⇔ (δ ≡ v ∧ o = E) ∨ (δ ∈ ( ∪ ) \{v} ∧ d(δ) = o) Proof For every atomic command a, define a function W [a] on L F M ∨ (δ ∈ H ∧ π(δ) = o) by ⇔ W [a](φ) = {(π, d) | (π, a π(d)) ∈ γ(φ)}. (δ ≡ v ∧ o = E ∧ (π, d) ∈ γ(true)) ∨ (δ 6≡ v ∧ (π, d) ∈ γ(δ→o)). J K We need to prove that for all φ, The third case is that a is v = new h. The proof of this case is given below: ∀a. γ( a b(φ)) = W [a](φ). (3) J K (π, v = new h π(d)) ∈ γ(δ→o) Our proof is done by induction on the structure of φ. ⇔ J K We start by considering the cases that φ is true, conjunction, (π, d[v : π(h)]) ∈ γ(δ→o) false or disjunction. The reason that the equality in (3) holds for ⇔ these cases is that W [a] preserve all of true, conjunction, false and (δ ≡ v ∧ π(h) = o) ∨ (δ ∈ (L ∪ F) \{v} ∧ d(δ) = o) disjunction: ∨ (δ ∈ H ∧ π(δ) = o) ⇔ 0 0 W [a](true) = P × S,W [a](φ ∧ φ ) = W [a](φ) ∩ W [a](φ ), (δ ≡ v ∧ (π, d) ∈ γ(h→o)) ∨ (δ 6≡ v ∧ (π, d) ∈ γ(δ→o)). W [a](false) = ∅,W [a](φ ∨ φ0) = W [a](φ) ∪ W [a](φ0). The fourth case is that a is v = null. We prove the lemma in The desired equality follows from this preservation and the induc- the case as follows: tion hypothesis. (π, v = null π(d)) ∈ γ(δ→o) The remaining case is that φ = δ→o for some δ ∈ H∪L∪F and ⇔ J K o ∈ {L, E, N}. This is handled by the case analysis on the atomic (π, d[v : N]) ∈ γ(δ→o) command a. ⇔ The first case is that a is g = v. We further subdivide this case (δ ≡ v ∧ N = o) ∨ (δ ∈ (L ∪ F) \{v} ∧ d(δ) = o) such that (1) δ ≡ h, (2) δ ∈ (L ∪ F \{v}) and (3) δ ≡ v. The first ∨ (δ ∈ H ∧ π(δ) = o) subcase is handled below: ⇔ (δ ≡ v ∧ o = N ∧ (π, d) ∈ γ(true)) ∨ (δ 6≡ v ∧ (π, d) ∈ γ(δ→o)). (π, g = v π(d)) ∈ γ(h→o) ⇔ π(h) = o ⇔ (π, d) ∈ γ(h→o). J K The fifth case is that a is v = v0. We prove the lemma in the The second subcase can be proven as follows: case as follows: 0 (π, g = v π(d)) ∈ γ(δ→o) (π, v = v π(d)) ∈ γ(δ→o) ⇔ J K ⇔ J K (d(v) = L ∧ esc(d)(δ) = o) ∨ (d(v) ∈ {E, N} ∧ d(δ) = o) (π, d[v : d(v0)]) ∈ γ(δ→o) ⇔ ⇔ 0 (δ ∈ L ∧ d(v) = L ∧ d(δ) = L ∧ o = E) (δ ≡ v ∧ d(v ) = o) ∨ (δ ∈ (L ∪ F) \{v} ∧ d(δ) = o) ∨ (δ ∈ L ∧ d(v) = L ∧ d(δ) ∈ {E, N} ∧ o = d(δ)) ∨ (δ ∈ H ∧ π(δ) = o) ∨ (δ ∈ F ∧ d(v) = L ∧ o = N) ⇔ ∨ (d(v) ∈ {E, N} ∧ d(δ) = o) (δ ≡ v ∧ (π, d) ∈ γ(v0→o)) ∨ (δ 6≡ v ∧ (π, d) ∈ γ(δ→o)) 0 The sixth case is that a is v = v0.f. We prove the case below: ∨ (δ ∈ L ∧ o = N ∧ (π, d) ∈ γ(δ→o ∧ (v→E ∧ v →L ∨ v→L ∧ f→L ∧ v0→E ∨ v→L ∧ f→E ∧ v0→L))) 0 0 (π, v = v .f π(d)) ∈ γ(δ→o) ∨ (δ ∈ F ∧ o = N ∧ (π, d) ∈ γ(v→E ∧ v →L ⇔ J K ∨ v→L ∧ f→L ∧ v0→E (d(v0) = L ∧ (π, d[v : d(f)]) ∈ γ(δ→o)) ∨ v→L ∧ f→E ∧ v0→L)) ∨ (d(v0) 6= L ∧ (π, d[v : E]) ∈ γ(δ→o)) ∨ (δ 6≡ f ∧ (π, d) ∈ γ(v→L ∧ δ→o ∧ (f→L ∧ v0→N ⇔ ∨ f→N ∧ v0→L (δ ≡ v ∧ ((d(v0) = L ∧ d(f) = o) ∨ (d(v0) 6= L ∧ o = E))) ∨ f→E ∧ v0→N ∨ (δ 6≡ v ∧ d(v0) = L ∧ (π, d) ∈ γ(δ→o)) ∨ f→N ∧ v0→E))) ∨ (δ 6≡ v ∧ d(v0) 6= L ∧ (π, d) ∈ γ(δ→o)) ∨ (δ ≡ f ∧ o = L ∧ (π, d) ∈ γ(v→L ∧ (f→L ∧ v0→N ⇔ ∨ f→N ∧ v0→L))) (δ ≡ v ∧ o = E ∧ ((d(v0) = L ∧ d(f) = E) ∨ d(v0) 6= L)) ∨ (δ ≡ f ∧ o = E ∧ (π, d) ∈ γ(v→L ∧ (f→E ∧ v0→N ∨ (δ ≡ v ∧ o 6= E ∧ d(v0) = L ∧ d(f) = o) ∨ f→N ∧ v0→E))) ∨ (δ 6≡ v ∧ (π, d) ∈ γ(δ→o)) ∨ (π, d) ∈ γ(δ→o ∧ (v→L ∧ f→L ∧ v0→L ⇔ ∨ v→L ∧ f→E ∧ v0→E (δ ≡ v ∧ o = E ∧ (π, d) ∈ γ((v0→L ∧ f→E) ∨ v0→E ∨ v0→N)) ∨ v→L ∧ f→N ∧ v0→N ∨ (δ ≡ v ∧ o 6= E ∧ (π, d) ∈ γ(v0→L ∧ f→o)) ∨ v→E ∧ v0→E ∨ (δ 6≡ v ∧ (π, d) ∈ γ(δ→o)). ∨ v→E ∧ v0→N ∨ v→N)) ⇔ The last case is that a is v.f = v0. The following derivation (δ 6∈ (L ∪ F) ∧ (π, d) ∈ γ(δ→o)) shows the lemma for this case: ∨ (δ ∈ L ∧ o = E ∧ (π, d) ∈ γ(δ→o ∨ δ→L ∧ v→E ∧ v0→L ∨ δ→L ∧ v→L ∧ f→L ∧ v0→E 0 0 (π, v.f = v π(d)) ∈ γ(δ→o) ∨ δ→L ∧ v→L ∧ f→E ∧ v →L)) ⇔ J K ∨ (δ ∈ L ∧ o = N ∧ (π, d) ∈ γ(δ→o)) (d(v) = E ∧ d(v0) = L ∧ (π, esc(d)) ∈ γ(δ→o)) ∨ (δ ∈ L ∧ o = L ∧ (π, d) ∈ γ(δ→o ∧ v→N 0 ∨ (((d(v) = E ∧ d(v0) 6= L) ∨ d(v) = N) ∧ (π, d) ∈ γ(δ→o)) ∨ δ→o ∧ v→E ∧ v →E 0 ∨ (d(v) = L ∧ {d(f), d(v0)} = {L, E} ∧ (π, esc(d)) ∈ γ(δ→o)) ∨ δ→o ∧ v→E ∧ v →N 0 ∨ (d(v) = L ∧ {d(f), d(v0)} = {L, N} ∧ (π, d[f : L]) ∈ γ(δ→o)) ∨ δ→o ∧ v→L ∧ v →N ∨ (d(v) = L ∧ {d(f), d(v0)} = {E, N} ∧ (π, d[f : E]) ∈ γ(δ→o)) ∨ δ→o ∧ v→L ∧ f→N 0 ∨ (d(v) = L ∧ d(f) = d(v0) ∧ (π, d) ∈ γ(δ→o)) ∨ δ→o ∧ v→L ∧ v →L ∧ f→L 0 ⇔ ∨ δ→o ∧ v→L ∧ v →E ∧ f→E)) 0 (δ 6∈ (L ∪ F) ∧ (π, d) ∈ γ(v→E ∧ v →L ∧ δ→o)) ∨ (δ ∈ F ∧ o = N ∧ δ 6≡ f ∧ (π, d) ∈ γ(δ→o 0 0 ∨ (δ ∈ L ∧ o = E ∧ (π, d) ∈ γ(v→E ∧ v →L ∧ (δ→L ∨ δ→E))) ∨ v→E ∧ v →L 0 0 ∨ (δ ∈ L ∧ o = N ∧ (π, d) ∈ γ(v→E ∧ v →L ∧ δ→o)) ∨ v→L ∧ f→L ∧ v →E 0 0 ∨ (δ ∈ F ∧ o = N ∧ (π, d) ∈ γ(v→E ∧ v →L)) ∨ v→L ∧ f→E ∧ v →L)) ∨ ((π, d) ∈ γ(((v→E ∧ (v0→E ∨ v0→N)) ∨ v→N) ∧ δ→o)) ∨ (δ ∈ F ∧ (a = E ∨ o = L) ∧ δ 6≡ f ∨ (δ 6∈ ( ∪ ) ∧ (π, d) ∈ γ((f→L ∧ v0→E ∨ f→E ∧ v0→L) ∧ (π, d) ∈ γ(δ→o ∧ v→N L F 0 ∧ v→L ∧ δ→o)) ∨ δ→o ∧ v→E ∧ v →E 0 ∨ (δ ∈ ∧ o = E ∧ (π, d) ∈ γ((f→L ∧ v0→E ∨ f→E ∧ v0→L) ∨ δ→o ∧ v→E ∧ v →N L 0 ∧ v→L ∧ (δ→L ∨ δ→E))) ∨ δ→o ∧ v→L ∧ v →N ∨ (δ ∈ ∧ o = N ∧ (π, d) ∈ γ((f→L ∧ v0→E ∨ f→E ∧ v0→L) ∨ δ→o ∧ v→L ∧ f→N L 0 ∧ v→L ∧ δ→o)) ∨ δ→o ∧ v→L ∧ f→L ∧ v →L 0 0 0 ∨ (δ ∈ F ∧ o = N ∧ (π, d) ∈ γ((f→L ∧ v →E ∨ f→E ∧ v →L) ∨ δ→o ∧ v→L ∧ f→E ∧ v →E)) ∧ v→L)) ∨ (δ ∈ F ∧ o = N ∧ δ ≡ f ∧ (π, d) ∈ γ(f→N ∧ v→E ∨ (δ 6≡ f ∧ (π, d) ∈ γ((f→L ∧ v0→N ∨ f→N ∧ v0→L) ∨ f→N ∧ v→N 0 ∧ v→L ∧ δ→o)) ∨ f→N ∧ v→L ∧ v →N 0 ∨ (δ ≡ f ∧ o = L ∧ (π, d) ∈ γ((f→L ∧ v0→N ∨ f→N ∧ v0→L) ∨ v→E ∧ v →L 0 ∧ v→L)) ∨ v→L ∧ f→L ∧ v →E 0 ∨ (δ 6≡ f ∧ (π, d) ∈ γ((f→E ∧ v0→N ∨ f→N ∧ v0→E) ∨ v→L ∧ f→E ∧ v →L)) ∧ v→L ∧ δ→o)) ∨ (δ ∈ F ∧ o = L ∧ δ ≡ f ∧ (π, d) ∈ γ(f→L ∧ v→N 0 ∨ (δ ≡ f ∧ o = E ∧ (π, d) ∈ γ((f→E ∧ v0→N ∨ f→N ∧ v0→E) ∨ f→L ∧ v→E ∧ v →E 0 ∧ v→L)) ∨ f→L ∧ v→E ∧ v →N 0 ∨ (π, d) ∈ γ((f→L ∧ v0→L ∨ f→E ∧ v0→E ∨ f→N ∧ v0→N) ∨ v→L ∧ f→L ∧ v →N ∧ v→L ∧ δ→o) ∨ v→L ∧ f→N ∧ v0→L ⇔ ∨ v→L ∧ f→L ∧ v0→L)) 0 (δ 6∈ (L ∪ F) ∧ (π, d) ∈ γ(δ→o ∧ (v→E ∧ v →L ∨ (δ ∈ F ∧ o = E ∧ δ ≡ f ∧ (π, d) ∈ γ(f→E ∧ v→N ∨ v→L ∧ f→L ∧ v0→E ∨ f→E ∧ v→E ∧ v0→E ∨ v→L ∧ f→E ∧ v0→L))) ∨ f→E ∧ v→E ∧ v0→N 0 ∨ (δ ∈ L ∧ o = E ∧ (π, d) ∈ γ((δ→L ∨ δ→o) ∨ v→L ∧ f→E ∧ v →N ∧ (v→E ∧ v0→L ∨ v→L ∧ f→N ∧ v0→E ∨ v→L ∧ f→L ∧ v0→E ∨ v→L ∧ f→E ∧ v0→E)) ∨ v→L ∧ f→E ∧ v0→L))) ⇔ (4) as follows: (δ 6∈ (L ∪ F) ∧ (π, d) ∈ γ(δ→o)) ∨ (δ ∈ ∧ o = ∧ (π, d) ∈ γ(δ→ (π, d) ∈ W [a](err) ⇔ (π, a π(d)) ∈ γ(err) L E E J K ∨ δ→L ∧ v→E ∧ v0→L ⇔ a π(d) = > 0 ⇔ dJ =K > ∨ δ→L ∧ v→L ∧ f→L ∧ v →E b ∨ δ→L ∧ v→L ∧ f→E ∧ v0→L)) ⇔ (π, d) ∈ γ(err) = γ( a (err)). J K ∨ (δ ∈ L ∧ o = N ∧ (π, d) ∈ γ(δ→o)) When a is a method call x.m(), our proof of the desired property ∨ ((δ ∈ L ∧ o = L ∨ δ ∈ F ∧ o = E ∧ δ 6≡ f ∨ δ ∈ F ∧ o = L ∧ δ 6≡ f) (4) is slightly different, and it is given below: ∧ (π, d) ∈ γ(δ→o ∧ v→N ∨ δ→o ∧ v→E ∧ v0→E (π, d) ∈ W [x.m()](err) ∨ δ→o ∧ v→E ∧ v0→N ⇔ ∨ δ→o ∧ v→L ∧ v0→N (π, x.m() π(d)) ∈ γ(err) ∨ δ→o ∧ v→L ∧ f→N ⇔ J K ∨ δ→o ∧ v→L ∧ v0→L ∧ f→L x.m() π(d) = > 0 J K ∨ δ→o ∧ v→L ∧ v →E ∧ f→E)) ⇔ (d = >) ∨ (∃σ, Σ, X. d = (Σ,X) ∧ σ ∈ Σ ∧ m (σ) = >) ∨ (δ ∈ F ∧ o = N ∧ δ 6≡ f ∧ (π, d) ∈ γ(δ→N 0 ⇔ J K ∨ v→E ∧ v →L W ∨ v→L ∧ f→L ∧ v0→E (π, d) ∈ γ(err ∨ {type(σ) | m (σ) = >}) J K ∨ v→ ∧ f→ ∧ v0→ )) ⇔ L E L b ∨ (δ ∈ ∧ o = N ∧ δ ≡ f ∧ (π, d) ∈ γ(δ→N ∧ v→E (π, d) ∈ γ( x.m() (err)). F J K ∨ δ→N ∧ v→N The second case is errNot. In this case, we notice that W [a] ∨ δ→N ∧ v→L ∧ v0→N 0 preserves the negation. Hence, we prove the desired equality (4) as ∨ v→E ∧ v →L follows: for all atomic commands a, ∨ δ→L ∧ v→L ∧ v0→E ∨ δ→E ∧ v→L ∧ v0→L)) W [a](errNot) = W [a](not(err)) ∨ (δ ∈ ∧ o = L ∧ δ ≡ f ∧ (π, d) ∈ γ(δ→N ∧ v→L ∧ v0→L = P × D \ W [a](err) F b ∨ δ→L ∧ v→N = P × D \ γ( a (err)) ∨ δ→L ∧ v→E ∧ v0→E = γ(negate( Ja Kb(err))) ∨ δ→L ∧ v→E ∧ v0→N = γ( a b(errNotJ K )). ∨ δ→L ∧ v→L ∧ v0→N J K The third equality uses what we just proved in the err case. ∨ δ→L ∧ v→L ∧ v0→L)) 0 The third case is param(z). For all atomic commands a, ∨ (δ ∈ F ∧ o = E ∧ δ ≡ f ∧ (π, d) ∈ γ(δ→N ∧ v→L ∧ v →E ∨ δ→E ∧ v→N (π, d) ∈ W [a](param(z)) ⇔ (π, a π(d)) ∈ γ(param(z)) ∨ δ→E ∧ v→E ∧ v0→E ⇔ z ∈JπK ∨ δ→E ∧ v→E ∧ v0→N ⇔ (π, d) ∈ γ(param(z)) ∨ δ→E ∧ v→L ∧ v0→E ⇔ (π, d) ∈ γ( a b(param(z))). ∨ δ→E ∧ v→L ∧ v0→N)) J K The fourth case is paramNot(z). The proof of this case is  essentially identical to that of the second case, except that we are now using the fact that (4) holds for the param(z) case, instead of A.4 Proof of Lemma 4 the err case. The fifth case is var(z). We handle three atomic commands LEMMA 4. For every atomic command a, the backward transfer separately. First, we prove that (4) holds for x = y: function a b in Figure 10 satisfies requirement (2) of our frame- work in SectionJ K 4. (π, d) ∈ γ(W [x = y](var(z))) ⇔ Proof For every atomic command a, define a function W [a] on M (π, x = y π(d)) ∈ γ(var(z)) by ⇔ J K ∃Σ, X. d = (Σ,X) W [a](φ) = {(π, d) | (π, a π(d)) ∈ γ(φ)}. ∧ (y ∈ X ∧ x ∈ π ∧ z ∈ X ∪ {x} ∨ z ∈ X \{x}) J K We need to prove that for all φ, ⇔ (z ≡ x ∧ (π, d) ∈ γ(var(y) ∧ param(x))) ∀a. γ( a b(φ)) = W [a](φ). (4) ∨ (z 6≡ x ∧ (π, d) ∈ γ(var(z))) J K ⇔ Our proof is done by induction on the structure of φ. (π, d) ∈ γ( x = y b(var(z))). We start by considering the cases that φ is true, conjunction, J K false or disjunction. The reason that the equality in (4) holds for Second, we prove the required equality for x = null: these cases is that W [a] preserve all of true, conjunction, false and (π, d) ∈ γ(W [x = null](var(z))) disjunction: ⇔ 0 0 W [a](true) = P × S,W [a](φ ∧ φ ) = W [a](φ) ∩ W [a](φ ), (π, x = null π(d)) ∈ γ(var(z)) W [a](false) = ∅,W [a](φ ∨ φ0) = W [a](φ) ∪ W [a](φ0). ⇔ J K ∃Σ, X. d = (Σ,X) ∧ (z ∈ X \{x}) The desired equality follows from this preservation and the induc- ⇔ tion hypothesis. z 6≡ x ∧ (π, d) ∈ γ(var(z)) Now we prove the remaining cases one-by-one. The first case is ⇔ that φ = err. When a is x = y or x = null, we prove the desired (π, d) ∈ γ( x = null b(var(z))). J K Third, we show the desired equality for x.m(): Then, because of (5) above and the item 1 of Theorem 3, we have

(π, d) ∈ W [x.m()](var(z)) (π, dI ) ∈ γ(B[τ](π, dI , not(q))). ⇔ The equation above implies that the Π computed on line 14 contains (π, x.m() (d)) ∈ γ(var(z)) π π. Thus, at least π is removed from Π on line 15. ⇔ J K viable  ∃Σ, X. d = (Σ,X) ∧ z ∈ X ∧ (∀σ. m (σ) = > ⇒ σ 6∈ Σ) LEMMA 8. If TRACER(dI , s, q) removes π from Πviable in some ⇔ J K iteration then F [s]({d }) is not a subset of {d | (π, d) ∈ γ(q)}. V π I (π, d) ∈ γ(var(z) ∧ {typeNot(σ) | m (σ) = >}) 0 ⇔ J K Proof Suppose TRACER(dI , s, q) removes π from Πviable in (π, d) ∈ γ( x.m() b(var(z))). some iteration. We need to prove that J K 0 The sixth case is varNot(z). We use the fact that W [a] pre- Fπ0 [s]({dI }) 6⊆ {d | (π , d) ∈ γ(q)}. serves negation and conjunction, and derive the desired equality as But this is equivalent to follows: 0 Fπ0 [s]({dI }) ∩ {d | (π , d) ∈ γ(not(q))}= 6 ∅, W [a](varNot(z)) = W [a](not(var(z)) ∧ errNot) 0 = (P × D \ W [a](var(z)) ∩ W [a](errNot) which we will show in this proof. Since π is removed from Πviable b b in some iteration, TRACER must have done so using π and τ ∈ = (P × D \ γ( a (var(z)))) ∩ γ( a (errNot)) = γ(negate( aJ bK(var(z))) ∧ errNotJ )K. trace(s) such that J K 0 The first equality just uses the fact that not(var(z)) = varNot(z)∨ (π, Fπ[τ](dI )) ∈ γ(not(q)) ∧ (π , dI ) ∈ γ(B[τ](π, dI , not(q))). err. The second uses the preservation of negation and conjunction By the item 2 of Theorem 3, the second conjunct implies that by W [a], and the third equality uses what we have proved in 0 (π ,F 0 [τ](d )) ∈ γ(not(q)). (6) the err and errNot cases. The last holds because γ preserves the π I conjunction and negate implements the semantic negation. Furthermore, because of τ ∈ trace(s), by Lemma 1, we also have The seventh case is type(σ). When a is x = y or x = null, that F 0 [τ](d ) ∈ F 0 [s]({d }). (π, d) ∈ W [a](type(σ)) ⇔ (π, a π(d)) ∈ γ(type(σ)) π I π I (7) ⇔ ∃Σ,J X.K d = (Σ,X) ∧ σ ∈ Σ From (6) and (7) follows ⇔ (π, d) ∈ γ(type(σ)) 0 Fπ0 [s]({dI }) ∩ {d | (π , d) ∈ γ(not(q))}= 6 ∅, ⇔ (π, d) ∈ γ( a b(type(σ))). J K as desired.  This shows that the desired equality holds for these atomic com- mands a. When a is a method call x.m(), we prove the equality as LEMMA 9. If TRACER(dI , s, q) returns π, we have that follows: Fπ[s]({dI }) ⊆ {d | (π, d) ∈ γ(q)}. (π, x.m() π(d)) ∈ γ(type(σ)) ⇔ J K Furthermore, π is a minimum-cost parameter value (according to ∃Σ, X. d = (Σ,X) ) satisfying the above subset relationship. 0 0 0 ∧ (∀σ . m (σ ) = > ⇒ σ 6∈ Σ) Proof TRACER(dI , s, q) returns π only when ∧ ((∃σ0.J σ0K∈ Σ ∧ m (σ0) = σ) ∨ x 6∈ X ∧ σ ∈ Σ) ⇔ J K Fπ[s]({dI }) ∩ {d | (π, d) ∈ γ(not(q))} = ∅. (8) (π, d) ∈ γ(errNot But there is some D0 ⊆ D such that ∧ (V{typeNot(σ0) | m (σ0) = >}) ∧ ((W{type(σ0) | mJ(σK0) = σ}) γ(not(q)) = (P × (D \ D0)) ∧ γ(q) = (P × D0). ∨ (varNot(x)J∧Ktype(σ))) Hence, (8) implies the desired subset relationship on π. Let’s move ⇔ on to the proof that π is minimum. For the sake of contradiction, (π, d) ∈ γ( x.m() b(type(σ))). suppose that there exists π0 such that J K 0 0 The last case is typeNot(σ). The proof of this case is similar to π 6 π ∧ Fπ0 [s]({dI }) ⊆ {d | (π , d) ∈ γ(q)}. (9) that of the varNot(z) case. 0  Then, π should not be in Πviable in the iteration of TRACER that A.5 Proof of Theorem 6 chose π from Πviable. Since Πviable is becoming a smaller set in each iteration of the algorithm, π0 must have been removed from We will first prove lemmas that describe the properties of TRACER. Πviable in some previous iteration. According to Lemma 8, this Then, we will use the lemmas and prove the theorem. can happen only if 0 LEMMA 7. If the domain P of parameters is finite, TRACER(dI , s, q) Fπ0 [s]({dI }) 6⊆ {d | (π , d) ∈ γ(q)}, terminates. which contradicts the second conjunct of (9).  Proof We will prove that each iteration removes at least one ele- ment from Π . Since no new elements are added to Π LEMMA 10. If TRACER(dI , s, q) returns impossible, there is no π viable viable such that in each iteration, this combined with the fact that P (the initial value of Πviable) is finite will conclude the proof. Suppose some Fπ[s]({dI }) ⊆ {d | (π, d) ∈ γ(q)}. π ∈ Π is chosen on line 8 and some trace τ ∈ trace(s) is viable Proof Suppose TRACER(d , s, q) returns impossible. This means chosen on line 13 such that I that Πviable = ∅ right before the algorithm terminates. Hence, ev- Fπ[τ](dI ) ∈ (Fπ[s]({dI }) ∩ {d | (π, d) ∈ γ(not(q))}). ery parameter value π was removed from Πviable in some iteration of TRACER(d , s, q). This combined with Lemma 8 implies that Such a trace τ exists because of Lemma 1. Hence, we have I Fπ[s]({dI }) 6⊆ {d | (π, d) ∈ γ(q)} for every π, as claimed by (π, Fπ[τ](dI )) ∈ γ(not(q)). (5) this lemma.  THEOREM 6. TRACER(dI , s, q) computes correct results, and is guaranteed to terminate when P is finite. Proof The theorem follows from Lemmas 7, 9, and 10.