Escape Analysis in the Context of Dynamic Compilation and Deoptimization∗

Escape Analysis in the Context of Dynamic Compilation and Deoptimization∗ Thomas Kotzmann Hanspeter Mössenböck Institute for System Software Institute for System Software Johannes Kepler University Linz Johannes Kepler University Linz Linz, Austria Linz, Austria [email protected] [email protected] ABSTRACT Keywords In object-oriented programming languages, an object is said Java, just-in-time compilation, optimization, escape analy- to escape the method or thread in which it was created if it sis, scalar replacement, stack allocation, synchronization re- can also be accessed by other methods or threads. Knowing moval, deoptimization which objects do not escape allows a compiler to perform aggressive optimizations. 1. INTRODUCTION This paper presents a new intraprocedural and interprocedural algorithm for escape analysis in the context of dynamic In Java, each object is allocated on the heap and deallo- compilation where the compiler has to cope with dynamic cated by the garbage collector, which is invoked once in a class loading and deoptimization. It was implemented for while to examine the heap and release the memory of objects Sun Microsystems’ Java HotSpotTM client compiler and op- as soon as they are not referenced any longer. Allocation, erates on an intermediate representation in SSA form. We access and synchronization of objects involve considerable introduce equi-escape sets for the efficient propagation of costs: escape information between related objects. The analysis • is used for scalar replacement of fields and synchronization An object allocation does not only involve memory removal,aswellasforstack allocation of objects and fixed- reservation. The fields of the object must also be ini- sized arrays. The results of the interprocedural analysis sup- tialized, because the garbage collector presumes that port the compiler in inlining decisions and allow actual pa- unassigned references are null. Additionally, the Java rameters to be allocated on the caller stack. language specification defines default values for fields. Under certain circumstances, the Java HotSpotTM VM Therefore, the memory occupied by the newly created is forced to stop executing a method’s machine code and object is typically cleared out in a loop even if values transfer control to the interpreter. This is called deoptimiza- are explicitly assigned to the fields by the constructor tion. Since the interpreter does not know about the scalar afterwards. replacement and synchronization removal performed by the • Reading or writing a field of an object requires one compiler, the deoptimization framework was extended to re- or two memory accesses depending on whether the ad- allocate and relock objects on demand. dress of the object is already in a register or not. In the context of a generational garbage collector, the modifi- Categories and Subject Descriptors cation of a field that holds a pointer is more expensive. D.3.4 [Programming Languages]: Processors—Incremen- It is associated with a write barrier,becausepointers tal compilers, Optimization; D.4.1 [Operating Systems]: from older to younger generations must be recorded in Process Management—Synchronization; D.4.2 [Operating a remembered set so that younger generations can be Systems]: Storage Management—Allocation / deallocation collected without inspecting every object in the older strategies ones [12]. • General Terms Java programs typically coordinate multiple threads by synchronizing on a shared object. Much research Algorithms, Languages, Performance has been done to reduce the cost of synchronization ∗ in JVM implementations, but complete elimination of This work was supported by Sun Microsystems, Inc. useless synchronization is still a desirable goal. Scalar replacement. The costs described above can be re- Permission to make digital or hard copies of all or part of this work for duced by various compiler optimizations. An object that is personal or classroom use is granted without fee provided that copies are neither assigned to a non-local variable nor passed as a pa- not made or distributed for profit or commercial advantage and that copies rameter does not escape the method in which it is allocated. bear this notice and the full citation on the first page. To copy otherwise, to The compiler can eliminate the allocation and replace the republish, to post on servers or to redistribute to lists, requires prior specific object’s fields by scalars (see Figure 1). This optimization permission and/or a fee. VEE’05, June 11–12, 2005, Chicago, Illinois, USA. eliminates memory allocation, initialization, field access and Copyright 2005 ACM 1-59593-047-7/05/0006...$5.00. write barriers. 111 floatfoo( float d) { floatfoo( float d) { floatfoo( float d) { Circle c =new Circle(d / 2); Circle c =new Circle(); float r=d/2; return c.area(); c.r=d/2; return r*r*PI; } return c.r * c.r * PI; } } original method after inlining after scalar replacement Figure 1: Example for scalar replacement of fields Stack allocation. An object that is accessed only by the 2.1 Escape States and Fields Array creating method and its callees does not escape the current Before the compiler can optimize the allocation and syn- thread. Although it must not be eliminated because the chronization of objects, it has to know whether an object callees require a reference to the object, it can be allocated allocated in a method can be accessed from outside the on the stack. Its memory is released implicitly when the method. This knowledge is derived from the HIR via es- stack frame is removed at the end of the method. This cape analysis. If an object can be accessed by other meth- means that garbage collection runs faster and less frequently ods or threads, it is said to escape the current method or because the young generation of the heap does not fill up the current thread, respectively. In the context of our work, that fast. Moreover, synchronization on thread-local objects possible escape levels are: can be removed. In this paper we present a new algorithm for escape analy- • NoEscape: The object is accessible only from within sis and related optimizations. The analysis was implemented the method of creation. In most cases, the compiler for the client compiler of the Java HotSpotTM Virtual Ma- can eliminate the object and replace its fields by scalar chine [22] in the context of a research collaboration between variables (see Section 2.2). Objects with this escape Sun Microsystems and the Institute for System Software at level are called method-local. the Johannes Kepler University Linz. The paper contributes • MethodEscape: The object escapes the current method the following: but not the current thread, e.g. because it is passed to a callee which does not let the argument escape. • It presents an intraprocedural and interprocedural es- It is possible to allocate the object on the stack and cape analysis for a dynamic compiler. The results are eliminate any synchronization on it. Objects with this used for scalar replacement, stack allocation and syn- escape level are called thread-local. chronization removal. • GlobalEscape: The object escapes globally, typically • It introduces equi-escape sets for the representation of because it is assigned to a static variable or to a field dependencies between objects and the efficient propa- of a heap object. The object must be allocated on the gation of escape states. heap, because it can be referenced by other threads • It describes how the HotSpotTM deoptimization frame- and methods. work was extended to deal with eliminated object al- In contrast to the bytecodes of a method, the SSA-based locations and removedsynchronization. HIR does not contain any instructions for loading or stor- ing local variables. They are eliminated during the abstract Sun’s client compiler is a simple and fast compiler dedi- interpretation of the bytecodes. For this purpose, the com- cated to client programs, applets and graphical user inter- piler maintains a state object which contains a locals array faces [9]. Its objective is to achieve high compilation speed to keep track of the values most recently assigned to a local at a potential cost of peak performance. The front end op- variable [16]. If an instruction creates a value for the local erates on a SSA-based [5] high-level intermediate represen- variable with the number n, a pointer to this instruction is tation (HIR) and performs only few high-impact optimiza- stored in the locals array at position n. If an instruction tions, such as constant folding, method inlining and com- uses a local variable as an operand, it is provided with the mon subexpression elimination. The back end converts the corresponding value from the locals array, i.e. a pointer to HIR into a low-level intermediate representation, which is the instruction where the value was created. the basis for linear scan register allocation [24] and code Bytecodes refer to local variables via indices, and the max- generation. imum number of variables for a certain method is specified in the class file. As far as fields are concerned, we do not know 2. INTRAPROCEDURAL ANALYSIS in advance which or how many fields are accessed within a The HIR is built by parsing the bytecodes and performing method. Therefore, the state object is extended by a grow- an abstract interpretation [9]. At first, the boundaries of all able fields array which stores the current values of all fields. basic blocks are determined by examining the destinations of Figure 2 shows the contents of the locals array and the fields jumps and the successors of conditional jump instructions. array for a simple sequence of bytecodes. Then the basic blocks are filled with HIR instructions. Each At the end of the HIR construction, when we know which HIR instruction that loads or computes a value represents objects escape and which do not, we substitute the fields of both the operation and the result so that operands appear non-escaping objects by the values stored in the fields array.

Load more