Partial Escape Analysis and Scalar Replacement for Java
Lukas Stadler, Johannes Kepler University Linz, Austria, [email protected]
Thomas Würthinger, Oracle Labs, thomas.wuerthinger@oracle.com
Hanspeter Mössenböck, Johannes Kepler University Linz, Austria, [email protected]

ABSTRACT

Escape Analysis allows a compiler to determine whether an object is accessible outside the allocating method or thread. This information is used to perform optimizations such as Scalar Replacement, Stack Allocation and Lock Elision, allowing modern dynamic compilers to remove some of the abstractions introduced by advanced programming models. The all-or-nothing approach taken by most Escape Analysis algorithms prevents all these optimizations as soon as there is one branch where the object escapes, no matter how unlikely this branch is at runtime.

This paper presents a new, practical algorithm that performs control-flow-sensitive Partial Escape Analysis in a dynamic Java compiler. It allows Escape Analysis, Scalar Replacement and Lock Elision to be performed on individual branches. We implemented the algorithm on top of Graal, an open-source Java just-in-time compiler, and it performs well on a diverse set of benchmarks.

In this paper, we evaluate the effect of Partial Escape Analysis on the DaCapo, ScalaDaCapo and SpecJBB2005 benchmarks, in terms of run-time, number and size of allocations and number of monitor operations. It performs particularly well in situations with additional levels of abstraction, such as code generated by the Scala compiler. It reduces the amount of allocated memory by up to 58.5%, and improves performance by up to 33%.

Categories and Subject Descriptors

D.3.4 [Programming Languages]: Processors - Compilers, Optimization

General Terms

Algorithms, Languages, Performance

Keywords

escape analysis, java, virtual machine, just-in-time compilation, intermediate representation, speculative optimization

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
CGO '14, February 15-19, 2014, Orlando, FL, USA
Copyright 2014 ACM 978-1-4503-2670-4/14/02 ...$15.00.

1. INTRODUCTION

State-of-the-art Virtual Machines employ techniques such as advanced garbage collection, alias analysis and biased locking to make working with dynamically allocated objects as efficient as possible. But even if allocation is cheap, it still incurs some overhead. Even if alias analysis can remove most object accesses, some of them cannot be removed. And although acquiring a biased lock is simple, it is still more complex than not acquiring a lock at all.

Escape Analysis can be used to determine whether an object needs to be allocated at all, and whether its lock can ever be contended. This can help the compiler to get rid of the object's allocation, using Scalar Replacement to replace the object's fields with local variables.

Escape Analysis checks whether an object escapes its allocating method, i.e., whether it is accessible outside this method. An object escapes, for example, if it is assigned to a static field, if it is passed as an argument to another method, or if it is returned by a method. In these cases the object needs to exist on the heap, because it will be accessed as an object in some other context.

In many cases, however, an object escapes just in a single unlikely branch. Nevertheless, this prevents optimizations. Therefore, we suggest a flow-sensitive Escape Analysis which we call Partial Escape Analysis.

The idea behind Partial Escape Analysis is to perform optimizations such as Scalar Replacement in branches where the object does not escape, and make sure that the object exists in the heap in branches where it does escape. Additionally, our Partial Escape Analysis works not on bytecodes but on the compiler's intermediate representation, so that it can be applied, possibly multiple times, at any point during compilation.

This paper contributes the following novel aspects:

• A control-flow-sensitive Partial Escape Analysis algorithm that checks the escapability of objects for individual branches.

• The integration of our Partial Escape Analysis in a Java compiler based on SSA form, speculative optimization, and deoptimization.

• An evaluation of this algorithm on a set of current benchmarks, in terms of run-time, number and size of allocations and number of monitor operations, showing that our algorithm performs well on a variety of benchmarks.

2. SYSTEM OVERVIEW

We implemented our analysis for Graal, which is a Java just-in-time compiler written in Java that runs on top of the HotSpot VM (Figure 1). While it can completely replace the client and server compilers, it reuses all other VM components, such as the interpreter, the garbage collection subsystem and class loading. It is open-source and available via the OpenJDK Graal project [11].

Figure 1: Overview of HotSpot and Graal. [Diagram labels: Graal Compiler (Java); Graal API; HotSpot VM (C++) with Client Compiler, Server Compiler, Interpreter, GC, Class Loading, ...]

Graal translates Java bytecode into a high-level intermediate representation called Graal IR [5], on which it performs all optimizations. This SSA-based intermediate representation models both control flow and data flow dependencies between nodes. While it explicitly expresses the control flow of the code, many operations are not fixed at specific locations. Rather, the position at which these operations are executed is determined solely by their data flow dependencies. There are usually many positions where an operation could be placed, and it is up to the so-called Scheduler to determine a good one.

Graal is an aggressive and optimistic compiler that often makes assumptions about the state and behavior of the running application. This includes assumptions such as some classes not having subclasses, and some branches never being taken. When one of these assumptions is invalidated, e.g., by loading a new class or by entering an unexpected branch, the execution needs to be transferred from compiled code back to the interpreter (which does not make any assumptions and can execute all code). This switch back to the interpreter is called Deoptimization [7] and requires a translation from the machine state (native stack frames) back to the Java VM state, which is used by the interpreter.

The Graal IR contains a mapping to Java VM state for all positions that can cause a deoptimization to occur. This mapping is expressed as FrameState nodes and consists of the current position (bytecode index and method), the local variables, the contents of the expression stack and the locked objects.

After inlining, one position can map to multiple Java VM stack frames. A frame state thus contains a reference to an outer frame state, which is the caller's state. This reference is used to create chains of FrameState nodes that describe [...] have side effects; they can be reexecuted and will lead to the same result.

For example, a field store is associated with a frame state that describes the state after the store. A machine code instruction that causes or can cause a deoptimization will be associated with the after state of the last side-effecting instruction. All instructions that were executed in-between do not have side effects and can therefore be reexecuted in the interpreter.

3. ESCAPE ANALYSIS

Escape Analysis checks whether an allocated object escapes (i.e., can be used outside) the allocating method or thread. This happens, for example, if it is assigned to a global variable or heap object, or if it is passed as a parameter to some other method. Compilers use Escape Analysis to determine the dynamic scope and the lifetime of allocated objects. The result of this analysis allows the compiler to perform numerous optimizations on operations such as object allocations, synchronization primitives and field accesses.

Listing 1 shows a small piece of code that will serve as an example to show the benefits of Escape Analysis: The getValue method creates a new Key object and checks whether it is in the cache. If so, the method returns the cached value. Otherwise, it creates and returns a new value (the method createValue is not discussed here).

    class Key {
        int idx;
        Object ref;

        Key(int idx, Object ref) {
            this.idx = idx;
            this.ref = ref;
        }

        synchronized boolean equals(Key other) {
            return idx == other.idx &&
                   ref == other.ref;
        }
    }

    static Key cacheKey;
    static Object cacheValue;

    Object getValue(int idx, Object ref) {
        Key key = new Key(idx, ref);
        if (key.equals(cacheKey)) {
            return cacheValue;
        } else {
            return createValue(...);
        }
    }

Listing 1: Simple example.

When getValue is compiled, the compiler will most likely perform some inlining, which might cause the actually compiled code to look like Listing 2.
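The escape conditions named in this section (assignment to a global variable, use outside the allocating method) can be illustrated with a minimal Java sketch. It is not taken from the paper or from Graal: the names Point, EscapeSketch, published, sumNoEscape, publish and sumPartialEscape are hypothetical and chosen only for illustration. The comments describe what an escape analysis is permitted to conclude for each method, not what any particular compiler is guaranteed to do.

```java
// Illustrative sketch only: all names here are hypothetical and do not
// come from the paper or from the Graal code base.
final class Point {
    final int x;
    final int y;

    Point(int x, int y) {
        this.x = x;
        this.y = y;
    }
}

final class EscapeSketch {
    // Stands in for any global location through which an object can escape.
    static Point published;

    // The Point never leaves this method, so a compiler may replace its
    // fields with local variables (Scalar Replacement) and skip the allocation.
    static int sumNoEscape(int x, int y) {
        Point p = new Point(x, y);
        return p.x + p.y;
    }

    // The Point is stored in a static field on every path, so it escapes
    // and has to exist on the heap.
    static int publish(int x, int y) {
        Point p = new Point(x, y);
        published = p;
        return p.x + p.y;
    }

    // The Point escapes only when x is negative. An all-or-nothing analysis
    // treats it as escaping on every path, while a branch-sensitive analysis
    // could restrict the heap allocation to the escaping branch.
    static int sumPartialEscape(int x, int y) {
        Point p = new Point(x, y);
        if (x < 0) {
            published = p;
        }
        return p.x + p.y;
    }

    public static void main(String[] args) {
        System.out.println(sumNoEscape(2, 3));
        System.out.println(sumPartialEscape(-1, 3));
        System.out.println(published.x);
    }
}
```

The third method mirrors the situation the paper targets: the results of all three methods are the same with or without the optimization, and only the number of heap allocations can differ.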