Compositional Pointer and Escape Analysis for Java Programs

Comp ositional Pointer and Escap e Analysis for Java Programs John Whaley and Martin Rinard Lab oratory for Computer Science Massachusetts Institute of Technology Cambridge, MA 02139 fjwhaley, [email protected] Abstract the analysis result b ecoming more precise as more of the program is analyzed. At every stage in the analysis, the This pap er presents a combined p ointer and escap e analy- algorithm can distinguish where it do es and do es not have sis algorithm for Java programs. The algorithm is based on complete information. the abstraction of p oints-to escap e graphs, whichcharacter- ize how lo cal variables and elds in ob jects refer to other 1.1 Analysis Uses ob jects. Each p oints-to escap e graph also contains escap e Java presents a clean and simple memory mo del: concep- information, which characterizes how ob jects allo cated in tually, all ob jects are allo cated in a garbage-collected heap. one region of the program can escap e to b e accessed by an- While useful to the programmer, this mo del comes with a other region. The algorithm is designed to analyze arbitrary cost. In many cases it would b e more ecient to allo cate regions of complete or incomplete programs, obtaining com- ob jects on the stack, eliminating the dynamic memory man- plete information for ob jects that do not escap e the analyzed agementoverhead for that ob ject. A similar situation holds regions. for the Java synchronization mo del. Conceptually, every Wehave develop ed an implementation that uses the es- Java ob ject comes with a lo ck. Each synchronized metho d cap e information to eliminate synchronization for ob jects ensures that it executes atomically by acquiring and releas- that are accessed by only one thread and to allo cate ob jects ing the lo ck in its receiver ob ject. But the lo ckoverhead is on the stack instead of in the heap. Our exp erimental results wasted when only one thread accesses the ob ject | the lo cks are encouraging. We were able to analyze programs tens are required only when there is a p ossibility that multiple of thousands of lines long. For our b enchmark programs, threads may attempt to access the ob ject simultaneously. our algorithms enable the elimination of b etween 24 and In this pap er we discuss the use of our analysis results to 67 of the synchronization op erations. They also enable the eliminate unnecessary synchronization and to enable stack stack allo cation of b etween 22 and 95 of the ob jects. allo cation of ob jects. The basic idea is to use the analysis information to determine when ob jects do not escap e 1 Intro duction threads and metho ds. If an ob ject do es not escap e from its allo cating thread to another thread, the compiler can trans- This pap er presents a combined p ointer and escap e anal- form the program to eliminate synchronization op erations ysis algorithm for Java programs and programs written in on that ob ject. If an ob ject do es not escap e a metho d, the similar ob ject-oriented languages. The algorithm is based compiler can transform the program to allo cate it in that on the abstraction of p oints-to escap e graphs, whichchar- metho d's activation record instead of in the heap. Our ex- acterize how lo cal variables and elds in ob jects refer to p erimental results show that the algorithms can eliminate a other ob jects. Each p oints-to escap e graph also contains es- signi cantnumb er of heap allo cations and synchronization cap e information, whichcharacterizes how ob jects allo cated op erations. in one region of the program can escap e to b e accessed by another region. 1.2 Analysis Prop erties A key concept underlying this abstraction is the goal of representing interactions between analyzed and unana- Our analysis has several imp ortant prop erties: lyzed regions of the program. Points-to escap e graphs make a clean distinction between ob jects and references created It is an interpro cedural analysis. It is designed to com- within an analyzed region and those created in the rest of bine analysis results from multiple metho ds to obtain the program. They therefore enable a exible analysis that precise p oints-to and escap e information. is capable of analyzing arbitrary parts of the program, with It is a comp ositional analysis. It is designed to analyze each metho d once to pro duce a single parameterized analysis result that can b e sp ecialized for use at all of 1 the call sites that mayinvoke the metho d. 1 Recursive metho ds require an iterative algorithm that may analyze metho ds multiple times to reach a xed p oint. b e a ected by unanalyzed regions of the program. Our al- It is a partial program analysis in two senses. First, it gorithm is designed to analyze arbitrary regions of complete is designed to analyze each metho d indep endently of or incomplete programs, obtaining complete information for its callers. Second, it is capable of analyzing a metho d ob jects that do not escap e the analyzed regions. without analyzing all of the metho ds that it invokes, with the analysis result b ecoming more precise as more of the invoked metho ds are analyzed. 1.4 Contributions Because the analysis is comp ositional and can analyze This pap er makes the following contributions: pieces of the program indep endently of their callers and Analysis Algorithms: It presents a new combined callees, it is esp ecially appropriate for dynamically loaded p ointer and escap e analysis algorithm. The algorithm programs. Our current implementation runs in a dynamic is comp ositional and is designed to deliver useful in- compiler and analyzes libraries indep endently of applica- formation without analyzing the entire program. tions. When the compiler loads an application, it retrieves the previously computed analysis results for the libraries and Analysis Approach: It presents an analysis approach uses these results in the analysis of the application. that explicitly di erentiates between references and ob jects from analyzed and unanalyzed regions of the 1.3 Basic Approach program. This approach allows the algorithm to cap- ture the interactions between these regions, and to The analysis is based on an abstraction we call p oints-to clearly identify where it do es and do es not have com- escap e graphs. The no des of this abstraction represent ob- plete information ab out the extracted relationships. jects; the edges represent references b etween ob jects. The abstraction also contains information ab out which ob jects Analysis Uses: It presents two optimizations en- escap e to other metho ds or threads. For example, an ob- abled by the extracted p oints-to and escap e informa- ject escap es if it is returned to an unanalyzed region of the tion: synchronization elimination and stack allo cation. program or passed as a parameter to an unanalyzed metho d. For each metho d, the analysis pro duces a p oints-to es- Exp erimental Results: It presents exp erimental re- cap e graph that characterizes the p oints-to relationships cre- sults from a prototyp e implementation of the algo- ated within the metho d and the interaction of the metho d rithms. These results show that the algorithms can with the p oints-to relationships created by the rest of the eliminate a signi cant amount of synchronization op- program. The analysis represents these interactions, in part, erations and heap allo cations. by maintaining a distinction b etween two kinds of edges: in- The remainder of the pap er is organized as follows. Sec- side edges, which represent references created inside the cur- tion 2 presents examples that illustrate how the analysis rently analyzed region, and outside edges, which represent works. Section 3 describ es how the analysis represents the references created outside the currently analyzed region. program and the analysis ob jects that the algorithm uses. Similarly, there are two kinds of no des: inside nodes, Section 4 presents the intrapro cedural analysis, and Sec- which represent ob jects created inside the currently ana- tion 5 presents the interpro cedural analysis. In Section 6 we lyzed region and accessed via inside edges, and outside nodes, present an abstraction relation that characterizes the corre- which represent ob jects that are either created outside the sp ondence b etween p oints-to escap e graphs and the ob jects currently analyzed region or accessed via outside edges. Out- and references that the program creates as it runs. Sec- side edges representinteractions in which the analyzed re- tion 7 presents the synchronization elimination and stack gion reads a reference created in an unanalyzed region. In- allo cation transformations. Section 8 presents the exp eri- side edges from outside no des or no des reachable from out- mental results from our implementation. Section 9 discusses side no des representinteractions in which the analyzed re- related work; we conclude in Section 10. gion creates a reference that the unanalyzed region may read. Edges b etween inside no des represent references created 2 Example within the currently analyzed region. If the inside no des do not escap e, they have no interaction with the rest of In this section we present several examples that illustrate the program and their edges in the p oints-to escap e graph how our analysis works.

Load more