Fuse Method Handles for Dynamic JVM Languages
Total Page:16
File Type:pdf, Size:1020Kb
Fuse Method Handles for Dynamic JVM Languages Shijie Xu David Bremner Daniel Heidinga IBM Centre of Advanced Studies IBM Centre of Advanced Studies IBM Ottawa Lab University of New Brunswick University of New Brunswick Ottawa, ON, Canada Fredericton, New Brunswick, Canada Fredericton, New Brunswick, Canada [email protected] [email protected] [email protected] Abstract CCS Concepts • Software and its engineering → Inter- A Method Handle (MH) in JSR 292 (Supporting Dynamically preters; Interpreters; Just-in-time compilers; Typed Languages on the JVM) is a typed, directly executable Keywords invokedynamic, method handle, bytecode gen- reference to an underlying method, constructor, or field, eration, just-in-time, object fusion with optional method type transformations. Multiple con- nected MHs make up a Method Handle Graph (MHG), which ACM Reference Format: transfers an invocation at a dynamic call site to real method Shijie Xu, David Bremner, and Daniel Heidinga. 2017. Fuse Method implementations at runtime. Despite benefits that MHGs Handles for Dynamic JVM Languages. In Proceedings of ACM SIG- have for dynamic JVM language implementations, MHGs PLAN International Workshop on Virtual Machines and Interme- diate Languages (VMIL’17). ACM, New York, NY, USA, 10 pages. challenge existing JVM optimization because a) larger MHGs https://doi.org/10.1145/3141871.3141874 at call sites incur higher graph traversal costs at runtime; and b) JIT expenses, including profiling and compilation of individual MHs, increase along with the number of MHs. 1 Introduction This paper proposes dynamic graph fusion to compile an A method handle (MH) is a typed, directly executable refer- MHG into another equivalent but simpler MHG (e.g., fewer ence to an underlying method, constructor, field, or similar MHs and edges), as well as related optimization opportunities low-level operation, with optional transformations of argu- (e.g., selection policy and inline caching). Graph fusion dy- ments or return values [12]. These transformations include namically fuses bytecodes of internal MHs on hot paths, and such patterns as method argument insertion, removal, and then substitutes these internal MHs with the instance of the substitution. Method handles, together with the Java Virtual newly generated bytecodes at program runtime. The imple- Machine (JVM) instruction, invokedynamic, were first pro- mentation consists of a template system and GraphJIT. The posed in the Java Specification Request (JSR) 292 to resolve former emits source bytecodes for individual MHs, while the “pain points” when implementing dynamic JVM languages latter is a JIT compiler that fuses source bytecodes from tem- (e.g., JavaScript, Ruby and Python) [3, 14, 18]. plates on the bytecode level (i.e., both source code and target As a method referred by an MH can have references to code are bytecodes). With the JRuby Micro-Indy benchmark other MHs, multiple connected MHs make up a Method Han- from Computer Language Benchmark Game and JavaScript dle Graph (MHG) [24]. An MHG has only one root, and it Octane benchmark on Nashorn, our results show that (a) can be executed by invoking the root’s invokeExact method, the technique can reduce execution time of Micro-Indy and which in turn triggers the execution traversal of the whole Octane benchmarks by 6.28% and 7.73% on average; b) it or partial graph. With the invokedynamic instruction, a dy- can speed up a typical MHG’s execution by 31.53% using namic method call can be resolved to real method imple- Ahead-Of-Time (AOT) compilation; and (c) the technique mentations by executing the linked MHG at the call site. In reduces the number of MH JIT compilations by 52.1%. Figure1, the root holds two child MH 2 and MH 3, and its execution can resolve a dynamic invocation to real method implementations that are referenced by leaf MHs. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies 1.1 Motivation are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights The disadvantage of an MHG is its complex structure for for components of this work owned by others than the author(s) must traversal, especially when the graph (or equivalent graphs) be honored. Abstracting with credit is permitted. To copy otherwise, or is large and repeatedly seen at multiple call sites. To resolve republish, to post on servers or to redistribute to lists, requires prior specific a dynamic method, a number of indirections from the root permission and/or a fee. Request permissions from [email protected]. have to be done. In Figure1, the JVM has to execute at least VMIL’17, October 24, 2017, Vancouver, Canada four MHs (i.e., 1, 3, 4, and 6) from the MH 1 to the leaf MH 7. © 2017 Copyright held by the owner/author(s). Publication rights licensed to Association for Computing Machinery. Considering the prevalence of dynamic method invocations, ACM ISBN 978-1-4503-5519-3/17/10...$15.00 these four indirections from root to leaves would be repeated https://doi.org/10.1145/3141871.3141874 many times at runtime, and it is worthwhile to seek a more VMIL’17, October 24, 2017, Vancouver, Canada Shijie Xu, David Bremner, and Daniel Heidinga 1 output for the MHG in Figure1. In both MHGs, MHs, i.e., 2 3 MH(1,3), MH(3,4,6) and MH(1’), are newly generated, and the distances from root to MH 7 are reduced from 4 to 3, and 4 5 from 4 to 2, respectively. 6 The provided graph fusion solution works on the bytecode 7 8 9 level, and is implemented by two components: an MH tem- plate system and GraphJIT. In the template system, templates Figure 1. Method Handle Sample of individual MH transformations produce source bytecodes. GraphJIT is a JIT compiler for a Frequently Traversed but Sta- ble Directed Acyclic (FTSDA) graph simplification. Within J9 efficient method to reach leaf MHs, instead of traversing on JVM, GraphJIT fuses source bytecodes of MHs on hot paths, the original path strictly. and moves leaf MHs close to the graph root by dynamically Large MHGs are also inefficient for Just-In-Time (JIT) generating bytecodes. By running JRuby Micro-Indy bench- compilations on the JVM. JIT compilation is a procedure mark from Computer Language Benchmark Game [2] and that translates a frequently executed bytecode method (or JavaScript Octane [16] benchmark on Nashorn [15], our eval- trace) into native machine code. In the J9 JVM (i.e., IBM’s uation shows that the technique a) can reduce execution time JVM implementation), the MH JIT compilation, conducted of Micro-Indy and Octane benchmark by 6.28% and 7.73% by Testarossa JIT (TRJIT) compiler, uses multiple phases, on average, respectively; b) can speed up a typical MHG’s two of which are a translation of individual MHs with little traversal from Micro-Indy benchmark by 31.53% using AOT optimization (i.e., warm JIT) and a translation of an MHG compilation; c) as much as 53.84% MHs have been fused by with aggressive optimization (i.e., hot JIT), when these MHs GraphJIT; d) the technique reduces the number of MH JIT are hot enough [5, 11]. Thus, both profiling tasks and warm (warm JIT) compilations by 52.1%. JITs are conducted on individual MHs, and the cost is ap- proximately proportional to the number of MHs in a graph. 1.2 Contributions For example, there are 9 profiling targets and 9 warm JIT Our contributions are compilation startups for the MHG in Figure1, if they are all hot enough. Consequently, the accumulated expenses of • bytecode level graph fusion for reduction of MHs and profiling and compilation startups on individual MHsare potential JIT compilation efforts. not trivial at program runtime, when MHGs become large. • the integration of GraphJIT with J9 JVM for the method Further more, the large number of MH compilation tasks handle graph fusion, and the identification of multiple that are submitted to JIT system would increase competition optimization opportunities (e.g., sharing of the gener- among compilation tasks (e.g., MH tasks and other non-MH ated bytecodes, and graph node selection policies) to tasks) for JIT threads. balance the compilation cost and potential benefits. • an extensive evaluation with a quantitative analysis of how graph fusion impacts performance and MH JIT compilation. We show that graph fusion can speed up method handle graph execution, and reduces the number of MH warm JIT compilations in J9 JVM. The rest of this paper is organized as follows. Section2 provides general background for the invokedynamic instruc- tion and method handles. Section3 and Section4 overview the system structure, and describe the template system. Sec- Figure 2. a) Merge 1 and 3; b) Merge 3, 4, and 6 tion5 and Section6 provide the GraphJIT integration with method handles in J9 JVM and runtime optimizations (e.g., Addressing MHG traversal and its JIT compilation over- hot MH selection and inline caching for mutable MHs) during head, this paper proposes a dynamic graph fusion to compile compilation. Section7 shows our evaluation results. Finally an MHG into another equivalent but simpler MHG, and op- the related work, future work, and conclusions are given in timization opportunities (e.g., graph node selection policy Section8, Section9 and Section 10, respectively. and inline caching). The graph fusion dynamically generates new MH transformations by fusing hot internal MHG nodes’ 2 Background bytecodes; it creates MH instances of the new transforma- tions; it sets up a new MHG with MHs from the source MHG 2.1 Invokedynamic Instruction and these newly created MHs; and it substitutes the new Many “pain points” (e.g., failed inlining and polluted profiles) MHG for the source MHG at the dynamic call site for execu- have been reported when implementing dynamic languages tion.