Incrementalized Pointer and Escape Analysis Frédéric Vivien, Martin Rinard

To cite this version:

Frédéric Vivien, Martin Rinard. Incrementalized Pointer and Escape Analysis. PLDI '01: Proceedings of the ACM SIGPLAN 2001 Conference on Programming Language Design and Implementation, Jun 2001, Snowbird, United States. pp. 35–46, 10.1145/381694.378804. hal-00808284

HAL Id: hal-00808284 https://hal.inria.fr/hal-00808284 Submitted on 14 Oct 2018

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.

Incrementalized Pointer and Escape Analysis∗

Frédéric Vivien
ICPS/LSIIT, Université Louis Pasteur, Strasbourg, France
[email protected]•strasbg.fr

Martin Rinard
Laboratory for Computer Science, Massachusetts Institute of Technology, Cambridge, MA 02139
[email protected]

ABSTRACT
We present a new pointer and escape analysis. Instead of analyzing the whole program, the algorithm incrementally analyzes only those parts of the program that may deliver useful results. An analysis policy monitors the analysis results to direct the incremental investment of analysis resources to those parts of the program that offer the highest expected optimization return.
Our experimental results show that almost all of the objects are allocated at a small number of allocation sites and that an incremental analysis of a small region of the program surrounding each site can deliver almost all of the benefit of a whole-program analysis. Our analysis policy is usually able to deliver this benefit at a fraction of the whole-program analysis cost.

∗ The full version of this paper is available at www.cag.lcs.mit.edu/~rinard/paper/pldi01.full.ps. This research was done while Frédéric Vivien was a Visiting Professor in the MIT Laboratory for Computer Science. The research was supported in part by DARPA/AFRL Contract F33615-00-C-1692, NSF Grant CCR00-86154, and NSF Grant CCR00-63513.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

1. INTRODUCTION
Program analysis research has focused on two kinds of analyses: local analyses, which analyze a single procedure, and whole-program analyses, which analyze the entire program. Local analyses fail to exploit information available across procedure boundaries; whole-program analyses are potentially quite expensive for large programs and are problematic when parts of the program are not available in analyzable form.
This paper describes our experience incrementalizing an existing whole-program analysis so that it can analyze arbitrary regions of complete or incomplete programs. The new analysis can 1) analyze each method independently of its caller methods, 2) skip the analysis of potentially invoked methods, and 3) incrementally incorporate analysis results from previously skipped methods into an existing analysis result. These features promote a structure in which the algorithm executes under the direction of an analysis policy. The policy continuously monitors the analysis results to direct the incremental investment of analysis resources to those parts of the program that offer the most attractive return (in terms of optimization opportunities) on the invested resources. Our experimental results indicate that this approach usually delivers almost all of the benefit of the whole-program analysis, but at a fraction of the cost.

1.1 Analysis Overview
Our analysis incrementalizes an existing whole-program analysis for extracting points-to and escape information [16]. The basic abstraction in this analysis is a points-to escape graph. The nodes of the graph represent objects; the edges represent references between objects. In addition to points-to information, the analysis records how objects escape the currently analyzed region of the program to be accessed by unanalyzed regions. An object may escape to an unanalyzed caller via a parameter passed into the analyzed region or via the return value. It may also escape to a potentially invoked but unanalyzed method via a parameter passed into that method. Finally, it may escape via a global variable or parallel thread. If an object does not escape, it is captured.
The analysis is flow sensitive, context sensitive, and compositional. Guided by the analysis policy, it performs an incremental analysis of the neighborhood of the program surrounding selected object allocation sites. When it first analyzes a method, it skips the analysis of all potentially invoked methods, but maintains enough information to reconstruct the result of analyzing these methods should it become desirable to do so. The analysis policy then examines the graph to find objects that escape, directing the incremental integration of (possibly cached) analysis results from potential callers (if the object escapes to the caller) or potentially invoked methods (if the object escapes into these methods). Because the analysis has complete information about captured objects, the goal is to analyze just enough of the program to capture objects of interest.

1.2 Analysis Policy
We formulate the analysis policy as a solution to an investment problem. At each step of the analysis, the policy can invest analysis resources in any one of several allocation sites in an attempt to capture the objects allocated at that site. To invest its resources wisely, the policy uses empirical data from previous analyses, the current analysis result for each site, and profiling data from a previous training run to estimate the marginal return on invested analysis resources for each site.
During the analysis, the allocation sites compete for resources. At each step, the policy invests its next unit of analysis resources in the allocation site that offers the best marginal return. When the unit expires, the policy recomputes the estimated returns and again invests in the (potentially different) allocation site with the best estimated marginal return. As the analysis proceeds and the policy obtains more information about each allocation site, the marginal return estimates become more accurate and the quality of the investment decisions improves.

1.3 Analysis Uses
We use the analysis results to enable a stack allocation optimization. If the analysis captures an object in a method, it is unreachable once the method returns. In this case, the generated code allocates the object in the activation record of the method rather than in the heap. Other optimization uses include synchronization elimination and the elimination of ScopedMemory checks in Real-Time Java [6]. Potential software engineering uses include the evaluation of programmer hypotheses regarding points-to and escape information for specific objects, the discovery of methods with no externally visible side effects, and the extraction of information about how methods access data from the enclosing environment.
Because the analysis is designed to be driven by an analysis policy to explore only those regions of the program that are relevant to a specific analysis goal, we expect the analysis to be particularly useful in settings (such as dynamic compilers and interactive software engineering tools) in which it must quickly answer queries about specific objects.

1.4 Context
In general, a base analysis must have several key properties to be a good candidate for incrementalization: it must be able to analyze methods independently of their callers, it must be able to skip the analysis of invoked methods, and it must be able to recognize when a partial analysis of the program has given it enough information to apply the desired optimization. Algorithms that incorporate escape information are good candidates for incrementalization because they enable the analysis to recognize captured objects (for which it has complete information). As discussed further in Section 7, many existing escape analyses either have or can easily be extended to have the other two key properties [14, 7, 3]. Many of these algorithms are significantly more efficient than our base algorithm, and we would expect incrementalization to provide these algorithms with additional efficiency increases comparable to those we observed for our algorithm.
An arguably more important benefit is the fact that incrementalized algorithms usually analyze only a local neighborhood of the program surrounding each object allocation site. The analysis time for each site is therefore independent of the overall size of the program, enabling the analysis to scale to handle programs of arbitrary size. And incrementalized algorithms can analyze incomplete programs.

1.5 Contributions
This paper makes the following contributions:

• Analysis Approach: It presents an incremental approach to program analysis. Instead of analyzing the entire program, the analysis is focused by an analysis policy to incrementally analyze only those regions of the program that may provide useful results.
• Analysis Algorithm: It presents a new combined pointer and escape analysis algorithm based on the incremental approach described above.
• Analysis Policy: It formulates the analysis policy as a solution to an investment problem. Presented with several analysis opportunities, the analysis policy incrementally invests analysis resources in those opportunities that offer the best estimated marginal return.
• Experimental Results: Our experimental results show that, for our benchmark programs, our analysis policy delivers almost all of the benefit of the whole-program analysis at a fraction of the cost.

The remainder of the paper is structured as follows. Section 2 presents several examples. Section 3 presents our previously published base whole-program analysis [16]; readers familiar with this analysis can skip this section. Section 4 presents the incrementalized analysis. Section 5 presents the analysis policy; Section 6 presents experimental results. Section 7 discusses related work; we conclude in Section 8.

2. EXAMPLES
We next present several examples that illustrate the basic approach of our analysis. Figure 1 presents two classes: the complex class, which implements a complex number package, and the client class, which uses the package. The complex class uses two mechanisms for returning values to callers: the add and multiplyAdd methods write the result into the receiver object (the this object), while the multiply method allocates a new object to hold the result.

    class complex {
      double x,y;
      complex(double a, double b) { x = a; y = b; }
      void add(complex u, complex v) {
        x = u.x+v.x; y = u.y+v.y;
      }
      complex multiply(complex m) {
        complex r = new complex(x*m.x-y*m.y, x*m.y+y*m.x);
        return(r);
      }
      void multiplyAdd(complex a, complex b, complex c) {
        complex s = b.multiply(c);
        this.add(a, s);
      }
    }
    class client {
      public static void compute(complex d, complex e) {
    3:  complex t = new complex(0.0, 0.0);
        t.multiplyAdd(d,e,e);
      }
    }

Figure 1: Complex Number and Client Classes

2.1 The compute Method
We assume that the analysis policy first targets the object allocation site at line 3 of the compute method. The goal is to capture the objects allocated at this site and allocate them on the call stack. The initial analysis of compute skips the call to the multiplyAdd method. Because the analysis is flow sensitive, it produces a points-to escape graph for each program point in the compute method. Because the stack allocation optimization ties object lifetimes to method lifetimes, the legality of this optimization is determined by the points-to escape graph at the end of the method.
Figure 2 presents the points-to escape graph from the end of the compute method. The solid nodes are inside nodes, which represent objects created inside the currently analyzed region of the program. Node 3 is an inside node that represents all objects created at line 3 in the compute method. The dashed nodes are outside nodes, which represent objects not identified as created inside the currently analyzed region of the program. Nodes 1 and 2 are a kind of outside node called a parameter node; they represent the parameters to the compute method. The analysis result also records the skipped call sites and the actual parameters at each site.

Figure 2: Analysis Result from compute Method. Points-to escape graph: d → 1, e → 2, t → 3 (nodes 1 and 2 are outside nodes, node 3 is an inside node). Skipped method calls: 3.multiplyAdd(1, 2, 2).

In this case, the analysis policy notices that the target node (node 3) escapes because it is a parameter to the skipped call to multiplyAdd. It therefore directs the algorithm to analyze the multiplyAdd method and integrate the result into the points-to escape graph from the program point at the end of the compute method.
Figure 3 presents the points-to escape graph from the initial analysis of the multiplyAdd method. Nodes 4 through 7 are parameter nodes. Node 8 is another kind of outside node: a return node that represents the return value of an unanalyzed method, in this case the multiply method. To integrate this graph into the caller graph from the compute method, the analysis first maps the parameter nodes from the multiplyAdd method to the nodes that represent the actual parameters at the call site. In our example, node 4 maps to node 3, node 5 maps to node 1, and nodes 6 and 7 both map to node 2. The analysis uses this mapping to combine the graphs into the new graph in Figure 4. The analysis policy examines the new graph and determines that the target node now escapes via the call to the add method. It therefore directs the algorithm to analyze the add method and integrate the resulting points-to escape graph into the current graph for the compute method. Note that because the call to the multiply method has no effect on the escape status of the target node, the analysis policy directs the algorithm to leave this method unanalyzed.

Figure 3: Analysis Result from multiplyAdd Method. Points-to escape graph: this → 4, a → 5, b → 6, c → 7, s → 8. Skipped method calls: 8 = 6.multiply(7); 4.add(5, 8).

Figure 4: Analysis Result from compute Method after Integrating Result from multiplyAdd. Points-to escape graph: d → 1, e → 2, t → 3. Skipped method calls: 8 = 2.multiply(2); 3.add(1, 8).

Figure 5 presents the new graph after the integration of the graph from the add method. Because the add method does not change the points-to or escape information, the net effect is simply to remove the skipped call to the add method. Note that the target node (node 3) is captured in this graph, which implies that it is not accessible when the compute method returns. The compiler can therefore generate code that allocates all objects from the corresponding allocation site in the activation record of this method.

2.2 The multiply Method
The analysis next targets the object allocation site inside the multiply method. The points-to escape graph from this method indicates that the target node escapes to the caller (in this case the multiplyAdd method) via the return value. The algorithm avoids repeated method reanalyses by retrieving the cached points-to escape graph for the multiplyAdd method, then integrating the graph from the multiply method into this cached graph. The result is cached as the new (more precise) points-to escape graph for the multiplyAdd method. It indicates that the target node does not escape to the caller of the multiplyAdd method, but does escape via the unanalyzed call to the add method. The analysis therefore retrieves the cached points-to escape graph from the add method, then integrates this graph into the current graph from the multiplyAdd method, once again caching the result as the graph for the multiplyAdd method. The target node is captured in this graph: it escapes its enclosing method (the multiply method), but is recaptured in a caller (the multiplyAdd method).
At this point the compiler has several options: it can inline the multiply method into the multiplyAdd method and allocate the object on the stack, or it can preallocate the object on the stack frame of the multiplyAdd method, then pass it in by reference to a specialized version of the multiply routine. Both options enable stack allocation even if the node is captured in some but not all invocation paths, if the analysis policy declines to analyze all potential callers, or if it is not possible to identify all potential callers at compile time. Our implemented compiler uses inlining.
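The parameter mapping used in Section 2.1 can be illustrated with a small Java sketch. It is only an illustration under simplifying assumptions: the class name ParameterMapSketch and its methods are hypothetical, nodes are plain integers, and each callee node maps to a single caller node (the analysis itself maps nodes to sets of nodes). It reproduces the renaming of the skipped calls from Figure 3 into the form shown in Figure 4.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the paper's implementation.
final class ParameterMapSketch {
    /** Pairs each callee parameter node with the caller node passed at the call site. */
    static Map<Integer, Integer> initialMap(List<Integer> formals, List<Integer> actuals) {
        if (formals.size() != actuals.size()) {
            throw new IllegalArgumentException("arity mismatch");
        }
        Map<Integer, Integer> mu = new HashMap<>();
        for (int i = 0; i < formals.size(); i++) {
            mu.put(formals.get(i), actuals.get(i));
        }
        return mu;
    }

    /** Renames node ids of a skipped call; unmapped nodes (such as return node 8) stay as they are. */
    static List<Integer> rename(List<Integer> nodes, Map<Integer, Integer> mu) {
        List<Integer> out = new ArrayList<>();
        for (int n : nodes) {
            out.add(mu.getOrDefault(n, n));
        }
        return out;
    }

    public static void main(String[] args) {
        // Call site 3.multiplyAdd(1, 2, 2); callee parameter nodes are 4, 5, 6, 7.
        Map<Integer, Integer> mu = initialMap(List.of(4, 5, 6, 7), List.of(3, 1, 2, 2));
        System.out.println(rename(List.of(4, 5, 8), mu)); // 4.add(5, 8) becomes [3, 1, 8]
        System.out.println(rename(List.of(6, 7), mu));    // 6.multiply(7) becomes [2, 2]
    }
}
```

The renamed calls correspond to 3.add(1, 8) and 8 = 2.multiply(2), the skipped method calls listed for Figure 4.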

Figure 5: Analysis Result from compute Method after Integrating Results from multiplyAdd and add. Points-to escape graph: d → 1, e → 2, t → 3. Skipped method calls: 8 = 2.multiply(2).

2.3 Object Field Accesses
Our next example illustrates how the analysis deals with object field accesses. Figure 6 presents a rational number class that deals with return values in yet another way. Each Rational object has a field called result; the methods in Figure 6 that operate on these objects store the result of their computation in this field for the caller to access.

    class Rational {
      int numerator, denominator;
      Rational result;
      Rational(int n, int d) {
        numerator = n;
        denominator = d;
      }
      void scale(int m) {
        result = new Rational(numerator * m,
                              denominator);
      }
      void abs() {
        int n = numerator;
        int d = denominator;
        if (n < 0) n = -n;
        if (d < 0) d = -d;
        if (d % n == 0) {
    4:    result = new Rational(n / d, 1);
        } else {
    5:    result = new Rational(n, d);
        }
      }
    }
    class client {
      public static void evaluate(int i, int j) {
    1:  Rational r = new Rational(0.0, 0.0);
        r.abs();
    2:  Rational n = r.result;
        n.scale(m);
      }
    }

Figure 6: Rational Number and Client Classes

We next discuss how the analysis policy guides the analysis for the Rational allocation site at line 1 in the evaluate method. Figure 7 presents the initial analysis result at the end of this method. The dashed edge between nodes 1 and 2 is an outside edge, which represents references not identified as created inside the currently analyzed region of the program. Outside edges always point from an escaped node to a new kind of outside node, a load node, which represents objects whose references are loaded at a given load statement, in this case the statement n = r.result at line 2 in the evaluate method.

Figure 7: Analysis Result from evaluate Method. Points-to escape graph: r → 1, n → 2, with an outside edge labeled result from node 1 to node 2. Skipped method calls: 1.abs(); 2.scale().

The analysis policy notices that the target node (node 1) escapes via a call to the abs method. It therefore directs the analysis to analyze abs and integrate the result into the result from the end of the evaluate method. Figure 8 presents the analysis result from the end of the abs method. Node 3 represents the receiver object, node 4 represents the object created at line 4 of the abs method, and node 5 represents the object created at line 5. The solid edges from node 3 to nodes 4 and 5 are inside edges. Inside edges represent references created within the analyzed region of the program, in this case the abs method.
The algorithm next integrates this graph into the analysis result from evaluate. The goal is to reconstruct the result of the base whole-program analysis. In the base analysis, which does not skip call sites, the analysis of abs changes the points-to escape graph at the program point after the call site. These changes in turn affect the analysis of the statements in evaluate after the call to abs. The incrementalized analysis reconstructs the analysis result as follows. It first determines that node 3 represented node 1 during the analysis of abs. It then matches the outside edge against the two inside edges to determine that, during the analysis of the region of evaluate after the skipped call to abs, the outside edge from node 1 to node 2 represented the inside edges from node 3 to nodes 4 and 5, and that the load node 2 therefore represented nodes 4 and 5. The combined graph therefore contains inside edges from node 1 to nodes 4 and 5. Because node 1 is captured, the analysis removes the outside edge from this node. Finally, the updated analysis replaces the load node 2 in the skipped call site to scale with nodes 4 and 5. At this point the analysis has captured node 1 inside the evaluate method, enabling the compiler to stack allocate all of the objects created at the corresponding allocation site at line 1 in Figure 6.
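The edge matching step described in Section 2.3 can be sketched in Java as follows. This is a simplified illustration, not the paper's algorithm: the class LoadNodeMatchSketch, the Edge type, and the single-valued represents map (callee node to the caller node it stood for) are assumptions made for the example. The sketch resolves which nodes a caller load node represented by matching the caller's outside edge against the callee's inside edges on the same field.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical helper, not part of the paper's implementation.
final class LoadNodeMatchSketch {
    /** A field edge from --field--> to between node ids. */
    static final class Edge {
        final int from;
        final String field;
        final int to;
        Edge(int from, String field, int to) {
            this.from = from;
            this.field = field;
            this.to = to;
        }
    }

    /**
     * For each caller outside edge n.f -> load, collects the targets of callee inside
     * edges m.f -> t where callee node m represented caller node n.
     */
    static Map<Integer, Set<Integer>> resolveLoads(List<Edge> callerOutside,
                                                   List<Edge> calleeInside,
                                                   Map<Integer, Integer> represents) {
        Map<Integer, Set<Integer>> resolved = new TreeMap<>();
        for (Edge out : callerOutside) {
            for (Edge in : calleeInside) {
                Integer callerNode = represents.get(in.from);
                boolean sameSource = callerNode != null && callerNode == out.from;
                if (sameSource && in.field.equals(out.field)) {
                    resolved.computeIfAbsent(out.to, k -> new TreeSet<>()).add(in.to);
                }
            }
        }
        return resolved;
    }

    public static void main(String[] args) {
        // evaluate: outside edge 1 --result--> 2; abs: inside edges 3 --result--> 4 and 5.
        List<Edge> callerOutside = List.of(new Edge(1, "result", 2));
        List<Edge> calleeInside = List.of(new Edge(3, "result", 4), new Edge(3, "result", 5));
        // Node 3 (the receiver in abs) represented node 1 in evaluate.
        System.out.println(resolveLoads(callerOutside, calleeInside, Map.of(3, 1))); // {2=[4, 5]}
    }
}
```

The result says that load node 2 represented nodes 4 and 5, which is why the skipped call to scale is rewritten to use those nodes as its receiver.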

Figure 8: Analysis Result from abs Method. Points-to escape graph: this → 3, with inside edges labeled result from node 3 to nodes 4 and 5.

Figure 9: Analysis Result from evaluate After Integrating Result from abs. Points-to escape graph: r → 1, with inside edges labeled result from node 1 to nodes 4 and 5; n refers to nodes 4 and 5. Skipped method calls: {4, 5}.scale().

3. THE BASE ANALYSIS
The base analysis is a previously published points-to and escape analysis [16]. For completeness, we present the algorithm again here. The algorithm is compositional, analyzing each method once (see footnote 1) before its callers to extract a single parameterized analysis result that can be specialized for use at different call sites. It therefore analyzes the program in a bottom-up fashion from the leaves of the call graph towards the root. To simplify the presentation we ignore static class variables, exceptions, and return values. Our implemented algorithm correctly handles all of these features.

Footnote 1: Recursive programs require a fixed-point algorithm that may analyze methods involved in cycles in the call graph multiple times.

3.1 Object Representation
The analysis represents the objects that the program manipulates using a set $n \in N$ of nodes, which consists of a set $N_I$ of inside nodes and a set $N_O$ of outside nodes. Inside nodes represent objects created inside the currently analyzed region of the program. There is one inside node for each object allocation site; that node represents all objects created at that site. The inside nodes include the set of thread nodes $N_T \subseteq N_I$. Thread nodes represent thread objects, i.e. objects that inherit from Thread or implement the Runnable interface.
The set of parameter nodes $N_P \subseteq N_O$ represents objects passed as parameters into the currently analyzed method. There is one load node $n \in N_L \subseteq N_O$ for each load statement in the program; that node represents all objects whose references are 1) loaded at that statement, and 2) not identified as created inside the currently analyzed region of the program. There is also a set $f \in F$ of fields in objects, a set $v \in V$ of local or parameter variables, and a set $l \in L \subseteq V$ of local variables.

3.2 Points-To Escape Graphs
A points-to escape graph is a pair $\langle O, I \rangle$, where

• $O \subseteq (N \times F) \times N_L$ is a set of outside edges. We write an edge $\langle \langle n_1, f \rangle, n_2 \rangle$ as $n_1 \xrightarrow{f} n_2$.
• $I \subseteq ((N \times F) \times N) \cup (V \times N)$ is a set of inside edges. We write an edge $\langle v, n \rangle$ as $v \rightarrow n$ and an edge $\langle \langle n_1, f \rangle, n_2 \rangle$ as $n_1 \xrightarrow{f} n_2$.

A node escapes if it is reachable in $O \cup I$ from a parameter node or a thread node. We formalize this notion by defining an escape function

$$e_{O,I}(n) = \{ n' \in N_T \cup N_P \,.\, n \text{ is reachable from } n' \text{ in } O \cup I \}$$

that returns the set of parameter and thread nodes through which $n$ escapes. We define the concepts of escaped and captured nodes as follows:

• $\mathrm{escaped}(\langle O, I \rangle, n)$ if $e_{O,I}(n) \neq \emptyset$
• $\mathrm{captured}(\langle O, I \rangle, n)$ if $e_{O,I}(n) = \emptyset$

We say that an allocation site escapes or is captured in the context of a given analysis if the corresponding inside node is escaped or captured in the points-to escape graph that the analysis produces.

3.3 Program Representation
The algorithm represents the computation of each method using a control flow graph. We assume the program has been preprocessed so that all statements relevant to the analysis are either a copy statement l = v, a load statement l1 = l2.f, a store statement l1.f = l2, an object allocation statement l = new cl, or a method call statement l0.op(l1, ..., lk).

3.4 Intraprocedural Analysis
The intraprocedural analysis is a forward dataflow analysis that produces a points-to escape graph for each program point in the method. Each method is analyzed under the assumption that the parameters are maximally unaliased, i.e., point to different objects. For a method with formal parameters $v_0, \ldots, v_n$, the initial points-to escape graph at the entry point of the method is $\langle \emptyset, \{ \langle v_i, n_{v_i} \rangle \,.\, 1 \leq i \leq n \} \rangle$ where $n_{v_i}$ is the parameter node for parameter $v_i$. If the method is invoked in a context where some of the parameters may point to the same object, the interprocedural analysis described below in Section 3.5 merges parameter nodes to conservatively model the effect of the aliasing.

Figure 10: Generated Edges for Basic Statements. (Graphical table with columns Statement, Existing Edges, and Generated Edges.)

The transfer function $\langle O', I' \rangle = [\![ st ]\!](\langle O, I \rangle)$ models the effect of each statement st on the current points-to escape graph. Figure 10 graphically presents the rules that determine the new graph for each statement. Each row in this figure contains three items: a statement, a graphical representation of existing edges, and a graphical representation of the existing edges plus the new edges that the statement generates. Two of the rows (for statements l1 = l2.f and l = new cl) also have a where clause that specifies a set of side conditions. The interpretation of each row is that whenever the points-to escape graph contains the existing edges and the side conditions are satisfied, the transfer function for the statement generates the new edges. Assignments to a variable kill existing edges from that variable; assignments to fields of objects leave existing edges in place.
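The escape function of Section 3.2 can be made concrete with a small Java sketch. It is a minimal illustration under stated assumptions: the class EscapeGraphSketch is hypothetical, nodes are integers, field labels are dropped, and outside and inside edges are stored together because the escape function only looks at reachability in $O \cup I$.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of the paper's implementation.
final class EscapeGraphSketch {
    // Node-to-node edges of O union I, with field labels dropped.
    private final Map<Integer, Set<Integer>> successors = new HashMap<>();
    // Parameter and thread nodes, i.e. the nodes through which others may escape.
    private final Set<Integer> roots = new TreeSet<>();

    void addRoot(int node) { roots.add(node); }

    void addEdge(int from, int to) {
        successors.computeIfAbsent(from, k -> new HashSet<>()).add(to);
    }

    /** Returns the roots from which the node is reachable (the escape set of the node). */
    Set<Integer> escapeSet(int node) {
        Set<Integer> result = new TreeSet<>();
        for (int root : roots) {
            if (reachable(root, node)) result.add(root);
        }
        return result;
    }

    boolean captured(int node) { return escapeSet(node).isEmpty(); }

    /** Depth-first search over the stored edges; a node is reachable from itself. */
    private boolean reachable(int from, int target) {
        Deque<Integer> work = new ArrayDeque<>();
        Set<Integer> seen = new HashSet<>();
        work.push(from);
        seen.add(from);
        while (!work.isEmpty()) {
            int current = work.pop();
            if (current == target) return true;
            for (int next : successors.getOrDefault(current, Set.of())) {
                if (seen.add(next)) work.push(next);
            }
        }
        return false;
    }
}
```

With parameter nodes 1 and 2 as roots and no edges, captured(3) holds, as for node 3 in Figure 5. Adding an edge from node 1 to node 3 makes escapeSet(3) contain node 1, so node 3 is then escaped.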
At control- µˆ(n) ⊆ µ(n) (1) flow merges, the analysis takes the union of the inside and outside edges. At the end of the method, the analysis re- f f µ moves all captured nodes and local or parameter variables n1 → n2 ∈ O2,n3 → n4 ∈ O1 ∪ I1,n1 −→ n3 µ (2) from the points-to escape graph. n2 −→ n4

3.5 Interprocedural Analysis µ µ n1 −→ n3,n2 −→ n3,n1 6= n2, At each call statement, the interprocedural analysis uses f f (3) the analysis result from each potentially invoked method to n1 → n4 ∈ O2,n2 → n5 ∈ O2 ∪ I2 compute a transfer function for the statement. We assume a µ(n4) ⊆ µ(n5) call site of the form l0.op(l1,..., lk), a potentially invoked f µ method op with formal parameters v0,..., vk, a points-to n1 → n2 ∈ I2,n1 −→ n,n2 ∈ NI escape graph hO1,I1i at the program point before the call µ (4) n2 −→ n2 site, and a graph hO2,I2i from the end of op.

A map µ ⊆ N × N combines the callee graph into the f µ caller graph. The map serves two purposes: 1) it maps n1 → n2 ∈ O2,n1 −→ n, escaped(hO,Ii,n) µ (5) each outside node in the callee to the nodes in the caller n2 −→ n2 that it represents during the analysis of the callee, and 2) it maps each node in the callee to itself if that node should f be present in the combined graph. We use the notation n1 → n2 ∈ I2 (6) ′ ′ µ (µ(n1) ×{f}) × µ(n2) ⊆ I µ(n) = {n .hn,n i∈ µ} and n1 −→ n2 for n2 ∈ µ(n1). The interprocedural mapping algorithm hhO,Ii,µi = f µ map(hO1,I1i, hO2,I2i, µˆ) starts with the points-to escape 1 2 2 2 2 n → n ∈ O ,n −→ n (7) graph hO1,I1i from the caller, the graph hO2,I2i from the (µ(n1) ×{f}) ×{n2} ⊆ O callee, and an initial parameter map

I1(li) if {n} = I2(vi) Figure 11: Constraints for Interprocedural Analysis µˆ(n) =  ∅ otherwise that maps each parameter node from the callee to the nodes With this optimization, a single load node may be associ- that represent the corresponding actual parameters at the ated with multiple load statements. The load node gener- call site. It produces the new mapped edges from the callee ated from the merge of k load nodes n1,...,nk is associated hO,Ii and the new map µ. with all of the statements of n1,...,nk. Figure 11 presents the constraints that define the new edges hO,Ii and new map µ. Constraint 1 initializes the 4. THE INCREMENTALIZED ANALYSIS map µ to the initial parameter mapµ ˆ. Constraint 2 extends µ, matching outside edges from the callee against edges from We next describe how to incrementalize the base algo- the caller to ensure that µ maps each outside node from rithm — how to enhance the algorithm so that it can skip the callee to the corresponding nodes in the caller that it the analysis of call sites while maintaining enough infor- represents during the analysis of the callee. Constraint 3 mation to reconstruct the result of analyzing the invoked extends µ to model situations in which aliasing in the caller methods should the analysis policy direct the analysis to causes an outside node from the callee to represent other do so. The first step is to record the set S of skipped call callee nodes during the analysis of the callee. Constraints 4 sites. For each skipped call site s, the analysis records the and 5 complete the map by computing which nodes from invoked method ops and the initial parameter mapµ ˆs that the callee should be present in the caller and mapping these the base algorithm would compute at that call site. To sim- nodes to themselves. 
Constraints 6 and 7 use the map to plify the presentation, we assume that each skipped call site translate inside and outside edges from the callee into the is 1) executed at most once, and 2) invokes a single method. caller. The new graph at the program point after the call Section 4.8 discusses how we eliminate these restrictions in site is hI1 ∪ I, O1 ∪ Oi. our implemented algorithm. Because of dynamic dispatch, a single call site may invoke The next step is to define an updated escape function several different methods. The transfer function therefore eS,O,I that determines how objects escape the currently an- merges the points-to escape graphs from the analysis of all alyzed region of the program via skipped call sites: potentially invoked methods to derive the new graph at the µˆs point after the call site. The current implementation obtains eS,O,I (n) = {s ∈ S.∃n1 ∈ NP .n1 −→ n2 and this call graph information using a variant of a cartesian n is reachable from n2 in O ∪ I}∪ eO,I (n) product type analysis [1], but it can use any conservative We adapt the interprocedural mapping algorithm from Sec- approximation to the dynamic call graph. tion 3.5 to use this updated escape function. By definition, 3.6 Merge Optimization n escapes through a call site s if s ∈ eS,O,I (n). A key complication is preserving flow sensitivity with re- As presented so far, the analysis may generate points-to spect to previously skipped call sites during the integration escape graphs hO,Ii in which a node n may have multiple of analysis results from those sites. For optimization pur- f f distinct outside edges n → n1,...,n → nk ∈ O. We elimi- poses, the compiler works with the analysis result from the nate this inefficiency by merging the load nodes n1,...,nk. end of the method. But the skipped call sites occur at var- ious program points inside the method. 
We therefore augment the points-to escape graphs from the base analysis with several orders, which record ordering information between edges in the points-to escape graph and skipped call sites:

• $\omega \subseteq S \times ((N \times \{f\}) \times N_L)$. For each call site $s$, $\omega(s) = \{ n_1 \xrightarrow{f} n_2 \,.\, \langle s, n_1 \xrightarrow{f} n_2 \rangle \in \omega \}$ is the set of outside edges that the analysis generates before it skips $s$.

• $\iota \subseteq S \times ((N \times \{f\}) \times N)$. For each call site $s$, $\iota(s) = \{ n_1 \xrightarrow{f} n_2 \,.\, \langle s, n_1 \xrightarrow{f} n_2 \rangle \in \iota \}$ is the set of inside edges that the analysis generates before it skips $s$.

• $\tau \subseteq S \times ((N \times \{f\}) \times N_L)$. For each call site $s$, $\tau(s) = \{ n_1 \xrightarrow{f} n_2 \,.\, \langle s, n_1 \xrightarrow{f} n_2 \rangle \in \tau \}$ is the set of outside edges that the analysis generates after it skips $s$.

• $\nu \subseteq S \times ((N \times \{f\}) \times N)$. For each call site $s$, $\nu(s) = \{ n_1 \xrightarrow{f} n_2 \,.\, \langle s, n_1 \xrightarrow{f} n_2 \rangle \in \nu \}$ is the set of inside edges that the analysis generates after it skips $s$.

• $\beta \subseteq S \times S$. For each call site $s$, $\beta(s) = \{ s' \,.\, \langle s, s' \rangle \in \beta \}$ is the set of call sites that the analysis skips before skipping $s$.

• $\alpha \subseteq S \times S$. For each call site $s$, $\alpha(s) = \{ s' \,.\, \langle s, s' \rangle \in \alpha \}$ is the set of call sites that the analysis skips after skipping $s$.

4.2 Propagated Edges

In the base algorithm, the transfer function for an analyzed call site may add new edges to the points-to graph from before the site. These new edges create effects that propagate through the analysis of subsequent statements. Specifically, the analysis of these subsequent statements may read the new edges, then generate additional edges involving the newly referenced nodes. In the points-to graph from the incrementalized algorithm, the edges from the invoked method will not be present if the analysis skips the call site. But these missing edges must come (directly or indirectly) from nodes that escape into the skipped call site. In the points-to graphs from the caller, these missing edges are represented by outside edges that are generated by the analysis of subsequent statements. The analysis can therefore use $\tau_1(s)$ and $\nu_1(s)$ to reconstruct the propagated effect of analyzing the skipped method. It computes

\[
\langle O', I', \mu' \rangle = \mathrm{map}(\langle O, I \rangle, \langle \tau_1(s), \nu_1(s) \rangle, \{ \langle n, n \rangle \,.\, n \in N \})
\]

where $O'$ and $I'$ are the new sets of edges that come from the interaction of the analysis of the skipped method and subsequent statements, and $\mu'$ maps each outside node from the caller to the nodes from the callee that it represents during the analysis from the program point after the skipped call site to the end of the method. Note that this algorithm generates all of the new edges that a complete reanalysis would generate. But it generates the edges incrementally without reanalyzing the code.
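As a concrete reading of the orders that Section 4.2 draws on ($\tau(s)$ and $\nu(s)$, together with $\omega$, $\iota$, $\beta$ and $\alpha$), the following Python sketch replays a straight-line trace of analysis events and collects the six orders from their definitions. It is a minimal illustration, not the implemented algorithm: the function name `build_orders`, the event encoding, the tuple form of edges, and the restriction to straight-line code (no control-flow merges, each site skipped at most once) are all assumptions made for the example.

```python
from collections import defaultdict


def build_orders(events):
    """Replay a straight-line trace and collect the six orders.

    Hypothetical encoding: each event is ("outside", edge), ("inside", edge)
    or ("skip", site); an edge is a (source, field, target) tuple.
    Every site is assumed to be skipped at most once.
    """
    names = ("omega", "iota", "tau", "nu", "beta", "alpha")
    orders = {name: defaultdict(set) for name in names}
    outside, inside, skipped = set(), set(), []
    for kind, item in events:
        if kind == "skip":
            # Edges that already exist were generated before skipping `item`.
            orders["omega"][item] |= outside
            orders["iota"][item] |= inside
            # Sites skipped earlier come before `item`; `item` comes after them.
            orders["beta"][item] |= set(skipped)
            for earlier in skipped:
                orders["alpha"][earlier].add(item)
            skipped.append(item)
        elif kind in ("outside", "inside"):
            (outside if kind == "outside" else inside).add(item)
            after = "tau" if kind == "outside" else "nu"
            # A new edge is generated after every site skipped so far.
            for site in skipped:
                orders[after][site].add(item)
        else:
            raise ValueError(f"unknown event kind: {kind}")
    return orders


# Example trace: one store, skip s1, one load, skip s2, one more store.
trace = [
    ("inside", ("a", "f", "b")),
    ("skip", "s1"),
    ("outside", ("b", "g", "load1")),
    ("skip", "s2"),
    ("inside", ("c", "h", "d")),
]
orders = build_orders(trace)
for name in ("omega", "iota", "tau", "nu", "beta", "alpha"):
    print(name, {site: sorted(orders[name][site]) for site in ("s1", "s2")})
```

In this trace the outside edge generated between the two skips lands in $\tau(s_1)$ and in $\omega(s_2)$, and $\beta$ and $\alpha$ come out as inverses of each other.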

The incrementalized analysis works with augmented points-to escape graphs of the form $\langle O, I, S, \omega, \iota, \tau, \nu, \beta, \alpha \rangle$. Note that because $\beta$ and $\alpha$ are inverses,² the analysis does not need to represent both explicitly. It is of course possible to use any conservative approximation of $\omega$, $\iota$, $\tau$, $\nu$, $\beta$ and $\alpha$; an especially simple approach uses $\omega(s) = \tau(s) = O$, $\iota(s) = \nu(s) = I$, and $\beta(s) = \alpha(s) = S$.

² Under the interpretation $\beta^{-1} = \{ \langle s_1, s_2 \rangle \,.\, \langle s_2, s_1 \rangle \in \beta \}$ and $\alpha^{-1} = \{ \langle s_1, s_2 \rangle \,.\, \langle s_2, s_1 \rangle \in \alpha \}$, $\beta = \alpha^{-1}$ and $\beta^{-1} = \alpha$.

We next discuss how the analysis uses these additional components during the incremental analysis of a call site. We assume a current augmented points-to escape graph $\langle O_1, I_1, S_1, \omega_1, \iota_1, \tau_1, \nu_1, \beta_1, \alpha_1 \rangle$, a call site $s \in S_1$ with invoked operation $\mathit{op}_s$, and an augmented points-to escape graph $\langle O_2, I_2, S_2, \omega_2, \iota_2, \tau_2, \nu_2, \beta_2, \alpha_2 \rangle$ from the end of $\mathit{op}_s$.

4.1 Matched Edges

In the base algorithm, the analysis of a call site matches outside edges from the analyzed method against existing edges in the points-to escape graph from the program point before the site. By the time the algorithm has propagated the graph to the end of the method, it may contain additional edges generated by the analysis of statements that execute after the call site. When the incrementalized algorithm integrates the analysis result from a skipped call site, it matches outside edges from the invoked method against only those edges that were present in the points-to escape graph at the program point before the call site. $\omega(s)$ and $\iota(s)$ provide just those edges. The algorithm therefore computes

\[
\langle O, I, \mu \rangle = \mathrm{map}(\langle \omega_1(s), \iota_1(s) \rangle, \langle O_2, I_2 \rangle, \hat{\mu}_s)
\]

where $O$ and $I$ are the new sets of edges that the analysis of the callee adds to the caller graph.

4.3 Skipped Call Sites from the Caller

In the base algorithm, the analysis of one call site may affect the initial parameter map for subsequent call sites. Specifically, the analysis of a site may cause the formal parameter nodes at subsequent sites to be mapped to additional nodes in the graph from the caller.

For each skipped call site, the incrementalized algorithm records the parameter map that the base algorithm would have used at that site. When the incrementalized algorithm integrates an analysis result from a previously skipped site, it must update the recorded parameter maps for subsequent skipped sites. At each of these sites, outside nodes represent the additional nodes that the analysis of the previously skipped site may add to the map. And the map $\mu'$ records how each of these outside nodes should be mapped. For each subsequent site $s' \in \alpha_1(s)$, the algorithm composes the site's current recorded parameter map $\hat{\mu}_{s'}$ with $\mu'$ to obtain its new recorded parameter map $\mu' \circ \hat{\mu}_{s'}$.

4.4 Skipped Call Sites from the Callee

The new set of skipped call sites $S' = (S_1 \cup S_2)$ contains the set of skipped call sites $S_2$ from the callee. When it maps the callee graph into the caller graph, the analysis updates the recorded parameter maps for the skipped call sites in $S_2$. For each site $s' \in S_2$, the analysis simply composes the site's current map $\hat{\mu}_{s'}$ with the map $\mu$ to obtain the new recorded parameter map $\mu \circ \hat{\mu}_{s'}$ for $s'$.

4.5 New Orders

The analysis constructs the new orders by integrating the orders from the caller and callee into the new analysis result and extending the orders for $s$ to the mapped edges and skipped call sites from the callee. So, for example, the new order between outside edges and subsequent call sites ($\omega'$) consists of the order from the caller ($\omega_1$), the mapped order from the callee ($\omega_2[\mu]$), the order from $s$ extended to the skipped call sites from the callee ($S_2 \times \omega_1(s)$), and the outside edges from the callee ordered with respect to the call sites after $s$ ($\alpha_1(s) \times O$):

\[
\begin{aligned}
\omega' &= \omega_1 \cup \omega_2[\mu] \cup (S_2 \times \omega_1(s)) \cup (\alpha_1(s) \times O) \\
\iota' &= \iota_1 \cup \iota_2[\mu] \cup (S_2 \times \iota_1(s)) \cup (\alpha_1(s) \times I) \\
\tau' &= \tau_1 \cup \tau_2[\mu] \cup (S_2 \times \tau_1(s)) \cup (\beta_1(s) \times O) \\
\nu' &= \nu_1 \cup \nu_2[\mu] \cup (S_2 \times \nu_1(s)) \cup (\beta_1(s) \times I) \\
\beta' &= \beta_1 \cup \beta_2 \cup (S_2 \times \beta_1(s)) \cup (\alpha_1(s) \times S_2) \\
\alpha' &= \alpha_1 \cup \alpha_2 \cup (S_2 \times \alpha_1(s)) \cup (\beta_1(s) \times S_2)
\end{aligned}
\]

Here $\omega[\mu]$ is the order $\omega$ under the map $\mu$, i.e., $\omega[\mu] = \{ \langle s, n_1' \xrightarrow{f} n_2' \rangle \,.\, \langle s, n_1 \xrightarrow{f} n_2 \rangle \in \omega, n_1 \xrightarrow{\mu} n_1', \text{ and } n_2 \xrightarrow{\mu} n_2' \}$, and similarly for $\iota$, $\tau$, and $\nu$.

4.6 Cleanup

At this point the algorithm can compute a new graph $\langle O_1 \cup O \cup O', I_1 \cup I \cup I', S', \omega', \iota', \tau', \nu', \beta', \alpha' \rangle$ that reflects the integration of the analysis of $s$ into the previous analysis result $\langle O_1, I_1, S_1, \omega_1, \iota_1, \tau_1, \nu_1, \beta_1, \alpha_1 \rangle$. The final step is to remove $s$ from all components of the new graph and to remove all outside edges from captured nodes.

4.7 Updated Intraprocedural Analysis

The transfer function for a skipped call site $s$ performs the following additional tasks:

• Record the initial parameter map $\hat{\mu}_s$ that the base algorithm would use when it analyzed the site.

• Update $\omega$ to include $\{s\} \times O$, update $\iota$ to include $\{s\} \times I$, update $\alpha$ to contain $S \times \{s\}$, and update $\beta$ to contain $\{s\} \times S$.

• Update $S$ to include the skipped call site $s$.

Whenever a load statement generates a new outside edge $n_1 \xrightarrow{f} n_2$, the transfer function updates $\tau$ to include $S \times \{ n_1 \xrightarrow{f} n_2 \}$. Whenever a store statement generates a new inside edge $n_1 \xrightarrow{f} n_2$, the transfer function updates $\nu$ to include $S \times \{ n_1 \xrightarrow{f} n_2 \}$.

Finally, the incrementalized algorithm extends the confluence operator to merge the additional components. For each additional component (including the recorded parameter maps $\hat{\mu}_s$), the confluence operator is set union.

4.8 Extensions

So far, we have assumed that each skipped call site is executed at most once and invokes a single method. We next discuss how our implemented algorithm eliminates these restrictions. To handle dynamic dispatch, we compute the graph for all of the possible methods that the call site may invoke, then merge these graphs to obtain the new graph. We also extend the abstraction to handle skipped call sites that are in loops or are invoked via multiple paths in the control flow graph. We maintain a multiplicity flag for each call site specifying whether the call site may be executed multiple times:

• The transfer function for a skipped call site $s$ checks to see if the site is already in the set of skipped sites $S$. If so, it sets the multiplicity flag to indicate that $s$ may be invoked multiple times. It also takes the union of the site's current recorded parameter map $\hat{\mu}_s$ and the parameter map $\hat{\mu}$ from the transfer function to obtain the site's new recorded parameter map $\hat{\mu}_s \cup \hat{\mu}$.

• The algorithm that integrates analysis results from previously skipped call sites performs a similar set of operations to maintain the recorded parameter maps and multiplicity flags for call sites that may be present in the analysis results from both the callee and the caller. If the skipped call site may be executed multiple times, the analysis uses a fixed-point algorithm when it integrates the analysis result from the skipped call site. This algorithm models the effect of executing the site multiple times.

4.9 Recursion

The base analysis uses a fixed-point algorithm to ensure that it terminates in the presence of recursion. It is possible to use a similar approach in the incrementalized algorithm. Our implemented algorithm, however, does not check for recursion as it explores the call graph. If a node escapes into a recursive method, the analysis may, in principle, never terminate. In practice, the algorithm relies on the analysis policy to react to the expansion of the analyzed region by directing analysis resources to other allocation sites.

4.10 Incomplete Call Graphs

Our algorithm deals with incomplete call graphs as follows. If it is unable to locate all of the potential callers of a given method, it simply analyzes those it is able to locate. If it is unable to locate all potential callees at a given call site, it simply considers all nodes that escape into the site as permanently escaped.

5. ANALYSIS POLICY

The goal of the analysis policy is to find and analyze allocation sites that can be captured quickly and have a large optimization payoff. Conceptually, the policy uses the following basic approach. It estimates the payoff for capturing an allocation site as the number of objects allocated at that site in a previous profiling run. It uses empirical data and the current analysis result for the site to estimate the likelihood that it will ever be able to capture the site, and, assuming that it is able to capture the site, the amount of time required to do so. It then uses these estimates to calculate an estimated marginal return for each unit of analysis time invested in each site.

At each analysis step, the policy is faced with a set of partially analyzed sites that it can invest in. The policy simply chooses the site with the best estimated marginal return, and invests a (configurable) unit of analysis time in that site. During this time, the algorithm repeatedly selects one of the skipped call sites through which the allocation site escapes, analyzes the methods potentially invoked at that site (reusing the cached results if they are available), and integrates the results from these methods into the current result for the allocation site. If these analyses capture the site, the policy moves on to the site with the next best estimated marginal return. Otherwise, when the time expires, the policy recomputes the site's estimated marginal return in light of the additional information it has gained during the analysis, and once again invests in the (potentially different) site with the current best estimated marginal return.

5.1 Stack Allocation

The compiler applies two potential stack allocation optimizations depending on where an allocation site is captured:

• Stack Allocate: If the site is captured in the method that contains it, the compiler generates code to allocate all objects created at that site in the activation record of the containing method.

• Inline and Stack Allocate: If the site is captured in a direct caller of the method containing the site, the compiler first inlines the method into the caller. After inlining, the caller contains the site, and the generated code allocates all objects created at that site in the activation record of the caller.

The current analysis policy assumes that the compiler is 1) unable to inline a method if, because of dynamic dispatch, the corresponding call site may invoke multiple methods, and 2) unwilling to enable additional optimizations by further inlining the callers of the method containing the allocation site into their callers. It is, of course, possible to relax these assumptions to support more sophisticated inlining and/or specialization strategies.

Inlining complicates the conceptual analysis policy described above. Because each call site provides a distinct analysis context, the same allocation site may have different analysis characteristics and outcomes when its enclosing method is inlined at different call sites. The policy therefore treats each distinct combination of call site and allocation site as its own separate analysis opportunity.

5.2 Analysis Opportunities

The policy represents an opportunity to capture an allocation site $a$ in its enclosing method $\mathit{op}$ as $\langle a, \mathit{op}, G, p, c, d, m \rangle$, where $G$ is the current augmented points-to escape graph for the site, $p$ is the estimated payoff for capturing the site, $c$ is the count of the number of skipped call sites in $G$ through which $a$ escapes, $d$ is the method call depth of the analyzed region represented by $G$, and $m$ is the mean cost of the call site analyses performed so far on behalf of this analysis opportunity. Note that $a$, $\mathit{op}$, and $G$ are used to perform the incremental analysis, while $p$, $c$, $d$, and $m$ are used to estimate the marginal return. Opportunities to capture an allocation site $a$ in the caller $\mathit{op}$ of its enclosing method have the form $\langle a, \mathit{op}, s, G, p, c, d, m \rangle$, where $s$ is the call site in $\mathit{op}$ that invokes the method containing $a$, and the remainder of the fields have the same meaning as before.

Figure 12 presents the state-transition diagram for analysis opportunities. Each analysis opportunity can be in one of the states of the diagram; the transitions correspond to state changes that take place during the analysis of the opportunity. The states have the following meanings:

• Unanalyzed: No analysis done on the opportunity.

• Escapes Below Enclosing Method: The opportunity's allocation site escapes into one or more skipped call sites, but does not (currently) escape to the caller of the enclosing method. The opportunity is of the form $\langle a, \mathit{op}, G, p, c, d, m \rangle$.

• Escapes Below Caller of Enclosing Method: The opportunity's site escapes to the caller of its enclosing method, but does not (currently) escape from this caller. The site may also escape into one or more skipped call sites. The opportunity is of the form $\langle a, \mathit{op}, s, G, p, c, d, m \rangle$.

• Captured: The opportunity's site is captured.

• Abandoned: The policy has permanently abandoned the analysis of the opportunity, either because its allocation site permanently escapes via a static class variable or thread, because the site escapes to the caller of the caller of its enclosing method (and is therefore unoptimizable), or because the site escapes to the caller of its enclosing method and (because of dynamic dispatch) the compiler is unable to inline the enclosing method into the caller.

[Figure 12: State-Transition Diagram for Analysis Opportunities. States: Unanalyzed; Escapes Below Enclosing Method; Escapes Below Caller of Enclosing Method; Captured; Abandoned.]

In Figure 12 there are multiple transitions from the Escapes Below Enclosing Method state to the Escapes Below Caller of Enclosing Method state. These transitions indicate that one Escapes Below Enclosing Method opportunity may generate multiple new Escapes Below Caller of Enclosing Method opportunities: one new opportunity for each potential call site that invokes the enclosing method from the old opportunity.

When an analysis opportunity enters the Escapes Below Caller of Enclosing Method state, the first analysis action is to integrate the augmented points-to escape graph from the enclosing method into the graph from the caller of the enclosing method.

5.3 Estimated Marginal Returns

If the opportunity is Unanalyzed, the estimated marginal return is $(\xi \cdot p)/\sigma$, where $\xi$ is the probability of capturing an allocation site given no analysis information about the site, $p$ is the payoff of capturing the site, and, assuming the analysis eventually captures the site, $\sigma$ is the expected analysis time required to do so.

If the opportunity is in the state Escapes Below Enclosing Method, the estimated marginal return is $(\xi_1(d) \cdot p)/(c \cdot m)$. Here $\xi_1(d)$ is the conditional probability of capturing an allocation site given that the algorithm has explored a region of call depth $d$ below the method containing the site, the algorithm has not (yet) captured the site, and the site has not escaped (so far) to the caller of its enclosing method. If the opportunity is in the state Escapes Below Caller of Enclosing Method, the estimated marginal return is $(\xi_2(d) \cdot p)/(c \cdot m)$. Here $\xi_2(d)$ has the same meaning as $\xi_1(d)$, except that the assumption is that the site has escaped to the caller of its enclosing method, but not (so far) to the caller of the caller of its enclosing method.

We obtain the capture probability functions $\xi$, $\xi_1$, and $\xi_2$ empirically by preanalyzing all of the executed allocation sites in some sample programs and collecting data that allows us to compute these functions. For Escapes Below Enclosing Method opportunities, the estimated payoff $p$ is the number of objects allocated at the opportunity's allocation site $a$ during a profiling run. For Escapes Below Caller of Enclosing Method opportunities, the estimated payoff is the number of objects allocated at the opportunity's allocation site $a$ when the allocator is invoked from the opportunity's call site $s$.

When an analysis opportunity changes state or increases its method call depth, its estimated marginal return may change significantly. The policy therefore recomputes the opportunity's return whenever one of these events happens. If the best opportunity changes because of this recomputation, the policy redirects the analysis to work on the new best opportunity.

5.4 Termination

In principle, the policy can continue the analysis indefinitely as it invests in ever less profitable opportunities. In practice, it is important to terminate the analysis when the prospective returns become small compared to the analysis time required to realize them. We say that the analysis has decided an object if that object's opportunity is in the Captured or Abandoned state. The payoffs $p$ in the analysis opportunities enable the policy to compute the current number of decided and undecided objects.

Two factors contribute to our termination policy: the percentage of undecided objects (this percentage indicates the maximum potential payoff from continuing the analysis), and the rate at which the analysis has recently been deciding objects. The results in Section 6 are from analyses terminated when the percentage of decided objects rises above 90% and the decision rate for the last quarter of the analysis drops below 1 percent per second, with a cutoff of 75 seconds of total analysis time.

We anticipate the development of a variety of termination policies to fit the particular needs of different compilers. A dynamic compiler, for example, could accumulate an analysis budget as a percentage of the time spent executing the application: the longer the application ran, the more time the policy would be authorized to invest analyzing it. The accumulation rate would determine the maximum amortized analysis overhead.

6. EXPERIMENTAL RESULTS

We have implemented our analysis and the stack allocation optimization in the MIT Flex compiler, an ahead-of-time compiler written in Java for Java.³ We ran the experiments on an 800 MHz Pentium III PC with 768 Mbytes of memory running Linux Version 2.2.18. We ran the compiler using the Java Hotspot Client VM version 1.3.0 for Linux. The compiler generates portable C code, which we compile to an executable using gcc. The generated code manages the heap using the Boehm-Demers-Weiser conservative garbage collector [4] and uses alloca for stack allocation.

³ The compiler is available at www.flexc.lcs.mit.edu.

6.1 Benchmark Programs

Our benchmark programs include two multithreaded scientific computations (Barnes and Water), Jlex, and several Spec benchmarks (Db, Compress, and Raytrace). Figure 13 presents the compile and whole-program analysis times for the applications.

Application   Compile Time       Whole-Program
              Without Analysis   Analysis Time
Barnes         89.7               34.3
Water          91.1               38.2
Jlex          119.5              222.8
Db             93.6              126.6
Raytrace      118.4              102.2
Compress      219.6              645.1

Figure 13: Compile and Whole-Program Analysis Times (seconds)

The estimated optimization payoff for each allocation site is the number of objects allocated at that site during a training run on a small training input. The presented execution and analysis statistics are for executions on larger production inputs.

6.2 Analysis Payoffs and Statistics

Figure 14 presents analysis statistics from the incrementalized analysis. We present the number of captured allocation sites as the sum of two counts. The first count is the number of sites captured in the enclosing method; the second is the number captured in the caller of the enclosing method. Fractional counts indicate allocation sites that were captured in some but not all callers of the enclosing method. In Db, for example, one of the allocation sites is captured in two of the eight callers of its enclosing method. The Undecided Allocation Sites column counts the number of allocation sites in which the policy invested some resources, but did not determine whether it could stack allocate the corresponding objects or not. The Analyzed Call Sites, Total Call Sites, Analyzed Methods, and Total Methods columns show that the policy analyzes a small fraction of the total program.

Application  Analysis   Captured    Abandoned   Undecided   Total       Analyzed  Total  Analyzed  Total
             Time       Allocation  Allocation  Allocation  Allocation  Call      Call   Methods   Methods
             (seconds)  Sites       Sites       Sites       Sites       Sites     Sites
Barnes        0.8       3+0          0           2           736         18       1675   13        512
Water        21.7       33+0         4          33           748         94       1799   33        481
Jlex          0.9       0+2          1           2          1054         27       2879   12        569
Db            4.5       1+0.25       4           1.75       1118         54       2444   25        631
Raytrace     76.3       8+0.37      20.63       54          1067        271       3109   64        699
Compress     79.5       4+0.33       4          19.66       1354        111       4084   40        808

Figure 14: Analysis Statistics from Incrementalized Analysis

The graphs in Figure 15 present three curves for each application. The horizontal dotted line indicates the percentage of objects that the whole-program analysis allocates on the stack. The dashed curve plots the percentage of decided objects (objects whose analysis opportunities are either Captured or Abandoned) as a function of the analysis time. The solid curve plots the percentage of objects allocated on the stack. For Barnes, Jlex, and Db, the incrementalized analysis captures virtually the same number of objects as the whole-program analysis, but spends a very small fraction of the whole-program analysis time to do so. Incrementalization provides less of a benefit for Water because two large methods account for much of the analysis time of both analyses. For Raytrace and Compress, a bug in the 1.3 JVM forced us to run the incrementalized analysis, but not the whole-program analysis, on the 1.2 JVM. Our experience with the other applications indicates that both analyses run between five and six times faster on the 1.3 JVM than on the 1.2 JVM.

[Figure 15: Analysis Time Payoffs. Panels: Barnes, Water, Jlex, Db, Compress, Raytrace. Horizontal axis: Analysis Time (seconds); vertical axis: Percentage of Objects. Curves: Stack Allocation Percentage, Whole-Program Analysis; Decided Percentage, Incrementalized Analysis; Stack Allocation Percentage, Incrementalized Analysis.]

6.3 Application Execution Statistics

Figure 16 presents the total amount of memory that the applications allocate in the heap. Almost all of the allocated memory in Barnes and Water is devoted to temporary arrays that hold the results of intermediate computations. The C++ version of these applications allocates these arrays on the stack; our analysis restores this allocation strategy in the Java version. Most of the memory in Jlex is devoted to temporary iterators, which are stack allocated after inlining. Note the anomaly in Db and Compress: many objects are allocated on the stack, but the heap allocated objects are much bigger than the stack allocated objects.

Application   No         Incrementalized   Whole-Program
              Analysis   Analysis          Analysis
Barnes         36.0        3.2               2.0
Water         190.2        2.2               0.6
Jlex           40.8        3.1               2.5
Db             77.6       31.2              31.2
Raytrace       13.4        9.0               6.7
Compress      110.1      110.1             110.1

Figure 16: Allocated Heap Memory (Mbytes)

Figure 17 presents the execution times. The optimizations provide a significant performance benefit for Barnes and Water and some benefit for Jlex and Db. Without stack allocation, Barnes and Water interact poorly with the conservative garbage collector. We expect that a precise garbage collector would reduce the performance difference between the versions with and without stack allocation.

Application   No         Incrementalized   Whole-Program
              Analysis   Analysis          Analysis
Barnes         33.4       22.7              24.0
Water          18.8       11.2              10.7
Jlex            5.5        5.0               4.7
Db            103.8      104.0             101.3
Raytrace        3.0        2.9               2.9
Compress       44.9       44.8              45.1

Figure 17: Execution Times (seconds)

7. RELATED WORK

We briefly discuss related work in escape analysis, demand analysis, program fragment analysis, and incremental analysis. See the full paper for a more comprehensive discussion.

7.1 Escape Analysis

Many other researchers have developed escape analyses for Java [16, 7, 14, 3, 5]. These analyses have been presented as whole-program analyses, although many contain elements that make them amenable to incrementalization. All of the analyses listed above except the last [5] analyze methods independently of their callers, generating a summary that can be specialized for use at each call site. Unlike our base analysis [16], these analyses are not designed to skip call sites. But we believe it would be relatively straightforward to augment them to do so. With this extension in place, the remaining question is incrementalization. For flow-sensitive analyses [16, 7], the incrementalized algorithm must record information about the ordering of skipped call sites relative to the rest of the analysis information if it is to preserve the precision of the base whole-program analysis with respect to skipped call sites. Flow-insensitive analyses [14, 3] can ignore this ordering information and should therefore be able to use an extended abstraction that records only the mapping information for skipped call sites. In this sense flow-insensitive analyses should be, in general, simpler to incrementalize than flow-sensitive analyses.

Escape analyses have typically been used for stack allocation and synchronization elimination. Our results show that analyzing a local region around each allocation site works well for stack allocation, presumably because stack allocation ties object lifetimes to the lifetimes of the capturing methods. But for synchronization elimination, a whole-program analysis may deliver significant additional optimization opportunities. For example, Ruf's synchronization elimination analysis determines which threads may synchronize on which objects [14]. In many cases, the analysis is able to determine that only one thread synchronizes on a given object, even though the object may be accessible to multiple threads or even, via a static class variable, to all threads. Exploiting this global information significantly improves the ability of the compiler to eliminate superfluous synchronization operations, especially for single threaded programs.

7.2 Demand, Fragment, and Incremental Analysis

Demand algorithms analyze only those parts of the program required to compute an analysis fact at a subset of the program points or to answer a given query [2, 10, 8, 11]. Our approach differs in that it is designed to temporarily skip parts of the program even if the skipped parts potentially affect the analysis result.

Fragment analysis is designed to analyze a predetermined part of the program [12, 13]. A similar effect may be obtained by explicitly specifying the analysis results for missing parts of the program [9, 15]. Our approach differs in that it monitors the analysis results to dynamically determine which parts of the program it should analyze to obtain the best optimization outcome. Incremental algorithms update an existing analysis result to reflect the effect of program changes [17]. Our algorithm, in contrast, analyzes part of the program assuming no previous analysis results.

8. CONCLUSION

This paper presents a new incrementalized pointer and escape analysis. Instead of analyzing the whole program, the analysis executes under the direction of an analysis policy. The policy continually monitors the analysis results to direct the incremental analysis of those parts of the program that offer the best marginal return on the invested analysis resources. Our experimental results show that our analysis, when used for stack allocation, usually delivers almost all of the benefit of the whole-program analysis at a fraction of the cost. And because it analyzes only a local region of the program surrounding each allocation site, it scales to handle programs of arbitrary size.

9. ACKNOWLEDGEMENTS

The research presented in this paper would have been impossible without the assistance of C. Scott Ananian, Brian Demsky, and Alexandru Salcianu. Scott and Brian provided invaluable assistance with the Flex compiler infrastructure. In particular, Brian developed the profiling package on very short notice. Alex provided invaluable assistance with the implementation of the base algorithm. We thank David Karger and Ron Rivest for interesting discussions regarding potential analysis policies, and the anonymous reviewers for their helpful and exceptionally competent feedback.

10. REFERENCES

[1] O. Agesen. The cartesian product algorithm: Simple and precise type inference of parametric polymorphism. In Proceedings of the 9th European Conference on Object-Oriented Programming, Aarhus, Denmark, Aug. 1995.
[2] G. Agrawal. Simultaneous demand-driven data-flow and call graph analysis. In Proceedings of the 1999 International Conference on Software Maintenance, Oxford, UK, Aug. 1999.
[3] B. Blanchet. Escape analysis for object oriented languages. Application to Java. In Proceedings of the 14th Annual Conference on Object-Oriented Programming Systems, Languages and Applications, Denver, CO, Nov. 1999.
[4] H. Boehm and M. Weiser. Garbage collection in an uncooperative environment. Software: Practice and Experience, 18(9):807–820, Sept. 1988.
[5] J. Bogda and U. Hoelzle. Removing unnecessary synchronization in Java. In Proceedings of the 14th Annual Conference on Object-Oriented Programming Systems, Languages and Applications, Denver, CO, Nov. 1999.
[6] G. Bollella et al. The Real-Time Specification for Java. Addison-Wesley, Reading, Mass., 2000.
[7] J. Choi, M. Gupta, M. Serrano, V. Sreedhar, and S. Midkiff. Escape analysis for Java. In Proceedings of the 14th Annual Conference on Object-Oriented Programming Systems, Languages and Applications, Denver, CO, Nov. 1999.
[8] E. Duesterwald, R. Gupta, and M. Soffa. A practical framework for demand-driven interprocedural data flow analysis. ACM Transactions on Programming Languages and Systems, 19(6):992–1030, Nov. 1997.
[9] S. Guyer and C. Lin. Optimizing the use of high performance libraries. In Proceedings of the Thirteenth Workshop on Languages and Compilers for Parallel Computing, Yorktown Heights, NY, Aug. 2000.
[10] Y. Lin and D. Padua. Demand-driven interprocedural array property analysis. In Proceedings of the Twelfth Workshop on Languages and Compilers for Parallel Computing, La Jolla, CA, Aug. 1999.
[11] T. Reps, S. Horowitz, and M. Sagiv. Demand interprocedural dataflow analysis. In Proceedings of the ACM SIGSOFT 95 Symposium on the Foundations of Software Engineering, Oct. 1995.
[12] A. Rountev and B. Ryder. Points-to and side-effect analyses for programs built with precompiled libraries. In Proceedings of CC 2001: International Conference on Compiler Construction, Genoa, Italy, Apr. 2001.
[13] A. Rountev, B. Ryder, and W. Landi. Data-flow analysis of program fragments. In Proceedings of the ACM SIGSOFT 99 Symposium on the Foundations of Software Engineering, Toulouse, France, Sept. 1999.
[14] E. Ruf. Effective synchronization removal for Java. In Proceedings of the SIGPLAN '00 Conference on Program Language Design and Implementation, Vancouver, Canada, June 2000.
[15] R. Rugina and M. Rinard. Design-directed compilation. In Proceedings of CC 2001: International Conference on Compiler Construction, Genoa, Italy, Apr. 2001.
[16] J. Whaley and M. Rinard. Compositional pointer and escape analysis for Java programs. In Proceedings of the 14th Annual Conference on Object-Oriented Programming Systems, Languages and Applications, Denver, CO, Nov. 1999.
[17] J. Yur, B. Ryder, and W. Landi. Incremental algorithms and empirical comparison for flow- and context-sensitive pointer aliasing analysis. In Proceedings of the 21st International conference on Software Engineering, Los Angeles, CA, May 1999.