Merging Equivalent Contexts for Scalable Heap-Cloning-Based Context-Sensitive Points-To Analysis∗
Total Page:16
File Type:pdf, Size:1020Kb
Merging Equivalent Contexts for Scalable Heap-Cloning-Based Context-Sensitive Points-to Analysis∗ Guoqing Xu Atanas Rountev Computer Science and Engineering Computer Science and Engineering Ohio State University Ohio State University [email protected] [email protected] ABSTRACT General Terms A context-sensitive points-to analysis maintains separate points- Algorithms, measurement, experimentation to relationships for each possible (abstract) calling context of a method. Previous work has shown that a large number of equiv- Keywords alence classes exists in the representation of calling contexts. Such Pointer analysis, points-to analysis, context sensitivity equivalent contexts provide opportunities for context-sensitive anal- yses based on binary decision diagrams (BDDs), in which BDDs 1. INTRODUCTION automatically merge equivalent points-to relationships. However, Context sensitivity in points-to analyses has been studied exten- the use of a BDD “black box” introduces additional overhead for sively; some of this work is summarized in [11, 8, 28]. Most such analysis running time. Furthermore, with heap cloning (i.e., using analyses compute a complete points-to solution for all variables in context-sensitive object allocation sites), BDDs are not as effective a program. Such algorithms usually have to sacrifice analysis pre- because the number of equivalence classes increases significantly. cision for practical scalability. An alternative is a refinement-based A further step must be taken to look inside the BDD black box to approach that performs points-to analysis by answering pointer- investigate where the equivalence comes from, and what tradeoffs related queries raised by a compiler or a client analysis on demand can be employed to enable practical large-scale heap cloning. [32, 31]. The precision of this approach can be proportional to the This paper presents an analysis for Java that exploits equivalence resource constraints (such as time and memory) allowed to answer classes in context representation. For a particular pointer variable queries. The analysis can be more precise than those that compute or heap object, all abstract contexts within an equivalence class can a complete points-to solution, if its full precision is reached (i.e., be merged. This technique naturally results in a new non-BDD the result is fully refined). However, this algorithm may not scale context-sensitive points-to analysis. Based on these equivalence when computing solutions for a large number of variables [30], and classes, the analysis employs a last-k-substring merging approach therefore is not suitable for analyses that depend on points-to infor- to define scalability and precision tradeoffs. We show that small mation for the entire program. In this paper we focus on analyses values for k can enable scalable heap cloning for large Java pro- that compute complete points-to information, and propose an al- grams. The proposed analysis has been implemented and evaluated gorithm whose precision approaches that of the refinement-based on a large set of Java programs. The experimental results show im- algorithm, and that scales well to large Java programs. provements over an existing 1-object-sensitive analysis with heap In context-sensitive analyses a standard approach is to use a call- cloning, which is the most precise scalable analysis implemented ing context abstraction that is a string of the call graph edges lead- in the state-of-the-art Paddle analysis framework. For computing a ing to the analyzed method. The length of each string is usually points-to solution for an entire program, our approach is an order limited to a fixed number k. A different technique, used in [34, of magnitude faster compared to this BDD-based analysis and to a 36], does not limit the length of a context string, but excludes all related non-BDD refinement-based analysis. contexts corresponding to call edges that are in a cycle in a pre- computed call graph. Orthogonal to the context abstraction is the Categories and Subject Descriptors decision as to where to apply context sensitivity. All existing anal- F.3.2 [Logics and Meaning of Programs]: Semantics of Program- yses use context-sensitive treatment of local variables, and some ming Languages—Program Analysis algorithms also do this for heap objects. Context-sensitive model- ing of objects is sometimes also referred to as heap cloning [26]. ∗ This material is based upon work supported by the National Science Foun- Higher precision requires a longer context string, with enabled heap dation under CAREER grant CCF-0546040. cloning. Of these two factors, heap cloning has been shown [20, 15] to be more important, because it allows analyses to distinguish dif- ferent instances of a logical data structure. The importance of heap cloning can especially be seen in object-oriented programs [15]. Permission to make digital or hard copies of all or part of this work for To achieve scalability, binary decision diagrams (BDDs) have personal or classroom use is granted without fee provided that copies are been employed [19, 34, 36, 3, 16] to avoid redundant representa- not made or distributed for profit or commercial advantage and that copies tion of similar points-to relationships. A BDD provides an effective bear this notice and the full citation on the first page. To copy otherwise, to representation for context-sensitive analysis because many contexts republish, to post on servers or to redistribute to lists, requires prior specific are equivalent, in the sense that the points-to relationships com- permission and/or a fee. ISSTA’08, July 20–24, 2008, Seattle, Washington, USA. puted under these contexts are the same [20]. A BDD automatically Copyright2008ACM978-1-60558-050-0/08/07 ...$5.00. merges the representation of equivalent points-to relationships. 225 BDDs may add overhead and increase the running time of the is invoked is allocated in addName, and this object does not affect analysis. In [10], the BDD implementation of a points-to analysis and is not affected by the callers of addName. for C is on average two times slower than an implementation using This definition is coarse-grained as it requires all pointer vari- a sparse bitmap. If we can identify the source of the equivalence, ables in a method to have the same equivalence classes. However, and then design a non-BDD analysis that explicitly merges points- the equivalent contexts for different variables in a method can dif- to relationships that have equivalent abstract contexts, we would be fer. Consider the constructor of Vector in the example. We denote able to reduce analysis running time while still keeping the size of a context-insensitive heap object abstraction by oi,wherei is the the solution as small as that of a BDD-based implementation. line number for the allocation site. Similarly, a context-insensitive A context-sensitive analysis with heap cloning may not benefit pointer variable abstraction is denoted by pi,wherep is the name of as much from BDDs. Only short context strings can be used in the variable, and i is the line number where p is first used/defined. such an analysis even if BDDs are employed [20], while an analysis Using this notation, pointer variable t3 can be decided to point to without heap cloning [34, 36] can use context strings of arbitrary object o3 within the constructor. Since no callers of the constructor length. The effectiveness of BDDs decreases in the presence of can affect the points-to pair (t3,o3), all contexts for t3 are equiv- heap cloning because there are many more equivalence classes of alent (i.e., there is only one equivalence class of contexts for this contexts and as a result fewer points-to relationships can be merged. variable). However, variable this 4 points to objects outside the Our proposal. The focus in this paper is a cloning-based context- constructor and its points-to pairs are affected by the callers of the sensitive analysis with arbitrary callstring length as the representa- constructor (e.g., the points-to sets for this in contexts containing tion of contexts. We first present a characterization of equivalence call graph edge i and ones containing edge j are different). In gen- classes of contexts for pointer variables and pointer targets. This eral, the set of equivalence classes for a method m as defined in can be captured by an abstraction function that takes as input a full [20] treats uniformly all variables in m, which may lead to unnec- context string cp for a pointer variable p and a full context co string essary equivalent classes for some variables. Our goal is to provide for a pointer target o, and produces a pair of context substrings a finer-grained definition of equivalent contexts in terms of pointer (sub1, sub2) that defines an equivalence class. All pairs (cp,co) variables and heap objects, in order to exploit greater similarity such that cp maps to sub1 and co maps to sub2 belong to the corre- (i.e., fewer equivalence classes for some variables). sponding equivalence class. The points-to relationships under the Equivalence classes for recursion-free programs. Consider a contexts within an equivalence class are guaranteed to be the same, context-insensitive points-to relationship of the form (p, o) where and all these contexts can be merged without loss of precision. p is a pointer variable and o is an object allocation site. A context- Based on this characterization, we propose a new flow-insensitive sensitive points-to relationship in a heap-cloning analysis is a 4- context-sensitive points-to analysis for Java. First, an intraproce- tuple (p, cp,o,co),wherecp is a full context (callstring) that goes dural phase builds a symbolic points-to graph for each method, us- from main to the method mp that defines p, co is a full context ing symbolic objects as placeholders for the objects that are not that goes from main to the method mo that contains o,andp under visible in the method.