Surgical Precision JIT Compilers
Tiark Rompf‡∗  Arvind K. Sujeeth†  Kevin J. Brown†  HyoukJoong Lee†  Hassan Chafi‡†  Kunle Olukotun†
‡Oracle Labs: {first.last}@oracle.com  ∗EPFL: {first.last}@epfl.ch
†Stanford University: {asujeeth, kjbrown, hyouklee, kunle}@stanford.edu

Abstract

Just-in-time (JIT) compilation of running programs provides more optimization opportunities than offline compilation. Modern JIT compilers, such as those in virtual machines like Oracle's HotSpot for Java or Google's V8 for JavaScript, rely on dynamic profiling as their key mechanism to guide optimizations. While these JIT compilers offer good average performance, their behavior is a black box and the achieved performance is highly unpredictable.

In this paper, we propose to turn JIT compilation into a precision tool by adding two essential and generic metaprogramming facilities: First, allow programs to invoke JIT compilation explicitly. This enables controlled specialization of arbitrary code at runtime, in the style of partial evaluation. It also enables the JIT compiler to report warnings and errors to the program when it is unable to compile a code path in the demanded way. Second, allow the JIT compiler to call back into the program to perform compile-time computation. This lets the program itself define the translation strategy for certain constructs on the fly and gives rise to a powerful JIT macro facility that enables "smart" libraries to supply domain-specific compiler optimizations or safety checks.

We present Lancet, a JIT compiler framework for Java bytecode that enables such a tight, two-way integration with the running program. Lancet itself was derived from a high-level Java bytecode interpreter: staging the interpreter using LMS (Lightweight Modular Staging) produced a simple bytecode compiler. Adding abstract interpretation turned the simple compiler into an optimizing compiler. This fact provides compelling evidence for the scalability of the staged-interpreter approach to compiler construction.

In the case of Lancet, JIT macros also provide a natural interface to existing LMS-based toolchains such as the Delite parallelism and DSL framework, which can now serve as accelerator macros for arbitrary JVM bytecode.

    class Record(fields: Array[String], schema: Array[String]) {
      def apply(key: String) = fields(schema indexOf key)
      def foreach(f: (String,String) => Unit) =
        for ((k,i) <- schema.zipWithIndex) f(k,fields(i))
    }
    def processCSV(file: String)(yld: Record => Unit) = {
      val lines = FileReader(file)
      val schema = lines.next.split(",")
      while (lines.hasNext) {
        val fields = lines.next().split(",")
        val rec = new Record(fields,schema)
        yld(rec)
      }
    }

Figure 1. Example: Processing CSV files.

1. Introduction

Just-in-time (JIT) compilation of running programs presents more optimization opportunities than offline compilation. Modern JIT compilers, such as those found in virtual machines like Oracle's HotSpot for Java [38] or Google's V8 for JavaScript [19], base their optimization decisions on dynamic properties of the program which would not be available to a static compiler. To give one key example, most modern VMs keep track of the call targets of virtual method calls, which enables the JIT compiler to perform inlining across virtual calls if the call site is monomorphic, i.e. it only resolves to a single target, or polymorphic for a small, fixed number of targets. Such an inlining decision is speculative; at a later time in the program's execution, the method call might actually resolve to a different target, which was not encountered before. In this case, the caller needs to be deoptimized: the compiled code, which was optimized using too optimistic assumptions, is thrown away, and the method is recompiled to take the new information into account. These and similar approaches go back to the seminal work of Urs Hölzle and others on the Self [10] and StrongTalk [7] VMs.
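The profiling behavior described above can be illustrated with a minimal sketch in plain Scala (Shape, Square, and Circle are our own hypothetical names, not from the paper): a single virtual call site whose set of observed receiver types determines whether a JIT can treat it as monomorphic.

```scala
// A virtual call site: _.area dispatches on the dynamic receiver type.
trait Shape { def area: Double }
class Square(s: Double) extends Shape { def area: Double = s * s }
class Circle(r: Double) extends Shape { def area: Double = math.Pi * r * r }

// If profiling only ever sees Squares flowing into this call site, the JIT can
// speculatively inline Square.area. The first Circle that arrives invalidates
// that assumption and forces deoptimization and recompilation of the caller.
def totalArea(shapes: Seq[Shape]): Double = shapes.map(_.area).sum
```

The semantics of `totalArea` never change; only the machine code the VM generates for the call site depends on which receiver types have been observed so far.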
Categories and Subject Descriptors  D.3.4 [Programming Languages]: Processors – Code generation, Optimization, Run-time environments; D.1.3 [Programming Techniques]: Concurrent Programming – Parallel programming

General Terms  Design, Languages, Performance

Keywords  JIT Compilation, Staging, Program Generation

PLDI '14, June 9–11, 2014, Edinburgh, United Kingdom.
Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM 978-1-4503-2784-8/14/06. http://dx.doi.org/10.1145/2594291.2594316

A Motivating Example  However, there is a lot more runtime information that could be exploited, but is not readily accessible via simple profiling alone. Let us consider an example (in Scala): we would like to implement a library function to read CSV files. A CSV file contains tabular data, where the first line defines the schema, i.e. the names of the columns. We would like to iterate over all the rows in a file and access the data fields by name (sample data on the right):

    processCSV("data.txt") { record =>        Name, Value, Flag
      if (record("Flag") == "yes")            A, 7, no
        println(record("Name"))               B, 2, yes
    }                                         ...

Moreover, we would like to process files of arbitrary schema and be able to iterate over the fields of each record as (key,value) pairs:

    processCSV("data.txt") { record =>
      for ((k,v) <- record) print(k + ": " + v); println
    }

A straightforward implementation of processCSV is shown in Figure 1. The runtime information we wish to exploit is the schema array: we can easily see that schema is never modified after reading the first line of the file. As we know both the shape of the data that follows and the code that performs the processing, we could specialize our processing logic accordingly. In our second snippet above, we would not need to execute a nested loop over the record fields but could directly execute

    while (lines.hasNext) {
      val fields = lines.next().split(",")
      print("Name: " + fields(0))
      print("Value: " + fields(1))
      print("Flag: " + fields(2))
      println
    }

and get away without the overhead of a name-to-column mapping and a record abstraction altogether.

    // JIT compile function f (explicit compilation)
    def compile[T,U](f: T => U): T => U
    // evaluate f at JIT-compile time (compile-time execution)
    def freeze[T](f: => T): T
    // unroll subsequent loops on xs
    def unroll[T](xs: Iterable[T]): Iterable[T]

Figure 2. JIT API required for CSV example.

    class Record(fields: Array[String], schema: Array[String]) {
      def apply(key: String) = fields(freeze(schema indexOf key))
      def foreach(f: (String,String) => Unit) =
        for ((k,i) <- unroll(freeze(schema.zipWithIndex))) f(k,fields(i))
    }
    def processCSV(file: String) = {
      val lines = FileReader(file)
      val schema = lines.next.split(",")
      compile { yld: (Record => Unit) =>
        while (lines.hasNext) { ... yld(rec) }
      }
    }

Figure 3. CSV example with explicit JIT calls.

Alas, no existing JIT compiler will perform this kind of specialization reliably. Why? There are just too many things it would need to guess. First, it would need to guess the right chunk to compile: it must pick the loop after schema initialization, but not the full method.
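To make the intended transformation concrete, here is a runnable sketch of the specialization done by hand (our own illustration, not the paper's code; in-memory lines stand in for the FileReader of Figure 1, which is assumed rather than a real API):

```scala
// Generic loop: builds a name-to-column mapping for every single row.
def processGeneric(lines: Iterator[String])(yld: Map[String, String] => Unit): Unit = {
  val schema = lines.next().split(",")
  while (lines.hasNext) {
    val fields = lines.next().split(",")
    yld(schema.zip(fields).toMap)
  }
}

// Specialized loop: the code a JIT could emit once it knows the schema is
// Name,Value,Flag -- direct constant indexing, no record abstraction at all.
def processSpecialized(lines: Iterator[String])(yld: (String, String, String) => Unit): Unit = {
  lines.next() // the schema line is already known, skip it
  while (lines.hasNext) {
    val f = lines.next().split(",")
    yld(f(0), f(1), f(2))
  }
}

// The effect of freeze(schema indexOf key) can be emulated by currying:
// the index is computed once, and the residual closure uses it as a constant.
def compileLookup(schema: Array[String], key: String): Array[String] => String = {
  val idx = schema.indexOf(key) // "compile-time" computation
  fields => fields(idx)         // residual code: a constant-index access
}
```

For example, `compileLookup(Array("Name","Value","Flag"), "Flag")` returns a closure equivalent to `fields => fields(2)`; the paper's point is that a JIT compiler with `freeze` and `compile` can derive this kind of residual code automatically rather than requiring it to be written by hand.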
Then it must prove that schema is actually a constant, which is difficult because it is stored in Record objects that are arguments to a method call, yld(rec), which needs to be inlined. Finally, multiple threads may read different files at the same time, and we want to obtain a specialized compiled code path per invocation. Specialization per program point – even temporal specialization with deoptimization and recompilation – would not work if multiple versions need to be active at the same time.

Roadblocks for Deterministic Performance  The word "reliably" in the preceding paragraph is indeed our main concern. While JIT compilers provide good average performance for a large set of programs, their behavior is in general a black box and in many cases, the achieved performance is highly unpredictable. The programmer has no influence on when and how JIT compilation takes place. Key optimization decisions are often based on magic threshold values; the HotSpot Server compiler, for example, imposes a 35-byte method size limit for early inlining. Thus, small changes in the program source can drastically affect its runtime behavior. Since many optimization decisions rely on dynamic profile data, performance can vary significantly even between runs of the same program [17].

JIT compilation shares essential aspects with partial evaluators and staged metaprogramming systems: a JIT compiler does not operate on closed programs but in the presence of static data, live objects on the heap, and static code – pieces of the program which can be executed right away, either because they are already compiled or because the VM also includes an interpreter. What sets current JIT compilers apart is their fully automatic nature, unobservable to the running program, and their support for dynamic deoptimization and recompilation. We consider only the latter essential and argue that JIT compilers should also come with reflective high-level APIs and extensibility hooks instead of being black-box systems. In particular, we propose to augment JIT compilers with two intuitive and general metaprogramming facilities, upon which further functionality can be built:

• Explicit compilation: if programs can invoke JIT compilation explicitly, code can be specialized with respect to preexisting objects on the heap, in the style of partial evaluation [28]. It also allows the JIT compiler to report warnings and errors to the program when it is unable to compile a code path in the