Practical Partial Evaluation for High-Performance Dynamic Language Runtimes

Practical Partial Evaluation for High-Performance Dynamic Language Runtimes Thomas Wurthinger¨ ∗ Christian Wimmer∗ Christian Humer∗ Andreas Woß¨ ∗ Lukas Stadler∗ Chris Seaton∗ Gilles Duboscq∗ Doug Simon∗ Matthias Grimmery ∗ y Oracle Labs Institute for System Software, Johannes Kepler University Linz, Austria fthomas.wuerthinger, christian.wimmer, christian.humer, andreas.woess, lukas.stadler, chris.seaton, gilles.m.duboscq, [email protected] [email protected] Abstract was first implemented for the SELF language [23]: a multi- Most high-performance dynamic language virtual machines tier optimization system with adaptive optimization and de- duplicate language semantics in the interpreter, compiler, optimization. Multiple tiers increase the implementation and and runtime system, violating the principle to not repeat maintenance costs for a VM: In addition to a language- yourself. In contrast, we define languages solely by writ- specific optimizing compiler, a separate first-tier execution ing an interpreter. Compiled code is derived automatically system must be implemented [2, 4, 21]. Even though the using partial evaluation (the first Futamura projection). The complexity of an interpreter or a baseline compiler is lower interpreter performs specializations, e.g., augments the in- than the complexity of an optimizing compiler, implement- terpreted program with type information and profiling infor- ing them is far from trivial [45]. Additionally, they need to mation. Partial evaluation incorporates these specializations. be maintained and ported to new architectures. This makes partial evaluation practical in the context of dy- But more importantly, the semantics of a language need namic languages, because it reduces the size of the compiled to be implemented multiple times in different styles: For code while still compiling in all parts of an operation that are the first-tier interpreter or baseline compiler, language op- relevant for a particular program. Deoptimization to the in- erations are usually implemented as templates of assem- terpreter, re-specialization in the interpreter, and recompila- bler code. For the second-tier optimizing compiler, language tion embrace the dynamic nature of languages. We evaluate operations are implemented as graphs of language-specific our approach comparing newly built JavaScript, Ruby, and compiler intermediate representation. For the runtime sys- R runtimes with current specialized production implementa- tem (code called from the interpreter or compiled code us- tions of those languages. Our general purpose compilation ing runtime calls), language operations are implemented in system is competitive with production systems even when C or C++. they have been heavily specialized and optimized for one We implement the language semantics only once in a sim- language. ple form: as a language interpreter written in a managed high-level host language. Optimized compiled code is de- Categories and Subject Descriptors D.3.4 [Programming rived from the interpreter using partial evaluation. This ap- Languages]: Processors—Run-time environments proach and its obvious benefits have been described already Keywords dynamic languages; virtual machine; language in 1971 as the first Futamura projection [14]. Still, to the best implementation; optimization; partial evaluation of our knowledge no high-performance dynamic language implementation has used this approach so far. 1. Introduction We believe that a simple partial evaluation of a dynamic High-performance virtual machines (VMs) such as the Java language interpreter cannot lead to high-performance com- HotSpot VM or the V8 JavaScript VM follow the design that piled code: if the complete semantics for a language operation are included during partial evaluation, the size of the compiled code explodes; if language operations are not included during partial evaluation and remain runtime calls, performance is mediocre. To overcome these inherent prob- lems, we write the interpreter in a style that anticipates and embraces partial evaluation. The interpreter specializes the executed instructions, e.g., collects type information and [Copyright notice will appear here once ’preprint’ option is removed.] profiling information. The compiler speculates that the inter- 1 2016/11/16 preter state is stable and creates highly optimized and com- regular call pact machine code. If a speculation turns out to be wrong, runtime g i code transfer to interpreter i.e., was too optimistic, execution transfers back to the in- f h PE boundary call terpreter. The interpreter updates the information, so that the e next partial evaluation is less speculative. interpreter host language method This paper shows that a few core primitives are sufficient code d to communicate all necessary information from the interpreter to the partial evaluator. We present our set of primi- c b dx1 bx2 tives that allow us to implement a wide variety of dynamic bx1 cx1 by1 cy1 languages. Note that we do not claim that our primitives are a ax1 ay1 necessary, i.e., the only way to make partial evaluation of dynamic language interpreters work in practice. interpreter entry compiled code compiled code function x() function y() Closely related to our approach is PyPy [6, 8, 36], which also requires the language implementer to only write an Figure 1: Interactions of interpreter code, compiled code, interpreter to express language semantics. However, PyPy and runtime code. does not use partial evaluation to derive compiled code from the interpreter, but meta-tracing: a trace-based compiler ob- serves the running interpreter (see Section 9 for details). the guest language interpreter methods (code) and the inter- In summary, this paper contributes the following: preted program (data). The interpreted program are all the data structures used by the interpreter, e.g., interpreter in- • We present core primitives that make partial evaluation structions organized as an abstract syntax tree. The language of dynamic language interpreters work in practice. They implementer does not have to provide any additional lan- allow a language-agnostic dynamic compiler to generate guage specific input to the language-independent optimizing high-performance code. compiler. • We present details of the partial evaluation algorithm and In dynamic languages, the full semantics of seemingly show which compiler optimizations are essential after simple operations are complex. For example, addition in partial evaluation. JavaScript can be as simple as the addition of two numbers. • We show that our approach works in practice by compar- But it can also be string concatenation and involve the in- ing our language implementations with the best produc- vocation of complex conversion functions that convert arbi- tion implementations of JavaScript, Ruby, and R. trary object inputs to numbers or strings. Addition in Ruby can again be just the addition of two numbers, but also the 2. System Structure invocation of an arbitrary addition method that can be rede- fined at any time. Addition in R can be just the addition of guest language We implement many different s in a managed two vectors of numbers, but also involve a complex method host language . Only the interpreter and the runtime system lookup based on the classes of its arguments. We claim that for a guest language is implemented anew by the language a simple PE that always captures the full semantics of every developer. Our framework and the host language provide a operation in compiled code is infeasible: it either leads to core set of reusable host services, such as dynamic com- code explosion if all code for the operation is compiled, or pilation (the focus of this paper), automatic memory man- mediocre performance when only parts of the operation are agement, threads, synchronization primitives, and a well- compiled and runtime calls remain in the compiled code for defined memory model. the full semantics. For the concrete examples and the evaluation of this pa- We solve this problem by making the interpreter aware of per, we implemented multiple guest languages (JavaScript, PE. While interpreting an operation, the interpreter special- Ruby, and R) in the host language Java. We require only one izes the instructions, e.g., collects type information or pro- implementation for every guest language operation. Since filing information. In other words, for each instruction the the interpreter and the runtime system are written in the same interpreter stores which subset of the semantics is actually managed host language, there is no strict separation between used. PE uses the specialization information. For this, the them. Regular method calls are used to call the runtime from instructions are assumed to be in a stable state where sub- the interpreter. sequent changes are unlikely, although not prohibited. PE Optimized compiled code is inferred from the interpreter eliminates the interpreter dispatch code and produces opti- Partial evaluation using partial evaluation. (PE) is the pro- mized machine code for a guest language function. cess of creating the initial high-level compiler intermedi- 1 Figure 1 illustrates the interactions of interpreter code, ate representation (IR) for a guest language function from compiled code, and runtime code. The interpreter methods 1 To distinguish the languages in this paper, function always refers to the a to d call each other, and call the runtime methods e to guest language while method always refers to the host language i. Two guest language

Practical Partial Evaluation for High-Performance Dynamic Language Runtimes

The LLVM Instruction Set and Compilation Strategy

A Status Update of BEAMJIT, the Just-In-Time Compiling Abstract Machine

Rubyperf.Pdf

1 Lecture 25

Dynamic Extension of Typed Functional Languages

Using JIT Compilation and Configurable Runtime Systems for Efficient Deployment of Java Programs on Ubiquitous Devices

Comparative Studies of Programming Languages; Course Lecture Notes

Bohm2013.Pdf (3.003Mb)

CDC Build System Guide

Insert Here Your Thesis' Task

EXE: Automatically Generating Inputs of Death

Development of Benchmarking Suite for Various Ruby Implementations