Proceedings of the 36th Annual Conference of the Japan Society for Software Science and Technology (JSSST 2019)

A Stack Hybridization for Meta-hybrid Just-in-time Compilation

Yusuke Izawa Hidehiko Masuhara Tomoyuki Aotani Youyou Cong

Meta-interpreter-based language implementation frameworks, such as RPython and Truffle/Graal, are convenient tools for implementing state-of-the-art virtual machines. These frameworks are classified into trace-based and method- (or AST-) based strategies. RPython uses a trace-based policy to compile straight execution paths, while Truffle/Graal leverages method invocation to compile entire method bodies. Each approach has its own advantages and disadvantages. The trace-based strategy is good at compiling programs with many branching possibilities and is able to reduce the size of compiled code, but it is weak at programs whose control flow varies from run to run. The method-based strategy is robust for the latter type of program, but it needs thorough method-inlining management to achieve excellent performance. To take advantage of both strategies, we propose a meta-hybrid compilation technique that integrates trace- and method-based compilation, as well as a proof-of-concept implementation called BacCaml. To achieve this goal, we develop a stack hybridization mechanism which makes it possible to coordinate trace- and method-based meta JIT compilation. In the implementation, we extend RPython's architecture and introduce a special syntax for realizing this system in a single interpreter definition.

1 Introduction

Meta-interpreter-based language implementation frameworks are becoming more and more popular, as they allow language developers to leverage convenient components such as a just-in-time (JIT) compiler and a garbage collector. Many programming languages, such as Python [3][11], Ruby [14][10], R [9], and Erlang [6], have been implemented with such frameworks, and each of these implementations achieves performance comparable to ahead-of-time compilation.

Language implementation frameworks can be classified into two compilation approaches from the viewpoint of compilation units. RPython [3][4], a language implementation framework that is part of the PyPy [3] project, adopts tracing compilation: it requires the developer to write an interpreter, and compiles straight execution paths through commonly executed loops. In contrast, Truffle/Graal [16][15] chooses a method-based approach, rewriting abstract-syntax-tree (AST) nodes and applying partial evaluation to enhance run-time performance.

Each strategy has its own pros and cons. The trace-based strategy is particularly good at compiling programs with many branching possibilities, which are common in dynamically typed languages. However, it performs poorly when compiling Fibonacci-like programs [6], where the execution path actually taken at run-time is often different from the one traced for compilation. We call this the path divergence problem. On the other hand, the method-based strategy is robust enough to be applied to a wide variety of programs, but it needs careful function inlining to achieve excellent run-time performance.

To take advantage of both compilation strategies, we propose a language implementation framework where the language designer writes a single meta-interpreter definition; the framework compiles a part of a program with either compilation strategy, and the run-time can switch execution between code fragments compiled by both strategies. The key technique in our proposal is stack hybridization, which is needed because the two strategies require the meta-interpreter to use stack frames in different ways.

In this paper, we describe how to take advantage of both meta compilation strategies, and our design choices in implementing our meta-hybrid JIT compiler, which we call BacCaml.

∗ This is an unrefereed paper. Copyrights belong to the Author(s). An earlier version of the paper was presented at the MoreVM'19 workshop.
Yusuke Izawa, Hidehiko Masuhara, Tomoyuki Aotani, and Youyou Cong, Tokyo Institute of Technology, Dept. of Mathematical and Computing Science.

2 Background

Before presenting the implementation of our meta-hybrid compiler, we briefly review tracing and meta-tracing compilation. We also explain the path divergence problem, a performance degradation in tracing JIT compilers.

2.1 Tracing Compilation

Tracing optimization was first investigated by the Dynamo project [1], and its technique has been adopted to implement JIT compilers for dynamic languages, e.g. the TraceMonkey JavaScript VM [5] and SPUR, a tracing JIT compiler for CIL [2].

Generally, tracing JIT compilation is separated into the following phases:
• Profiling: The profiler detects commonly executed loops (hot loops). Typically, this is done by counting how many times a backward jump instruction is executed; if the number is greater than a threshold, the path is considered a hot loop.
• Tracing: The interpreter records all executed operations as an intermediate representation (IR). Function calls in hot loops are therefore automatically inlined.
• Code generation: The compiler optimizes the trace in several ways, e.g. common-subexpression elimination, dead-code elimination, and constant folding, and compiles it into native code.
• Execution: After the trace has been compiled to native code, it is executed when the interpreter reaches the hot code once again.

Among all possible branches, only the one actually executed during tracing is selected. To make sure that the conditions at tracing time and at execution time are the same, a special instruction (a guard) is placed at every point (e.g. an if statement) where execution could go in another direction. The guard checks whether the original condition is still valid. If the condition is false, execution leaves the compiled code and continues by falling back to the interpreter.

2.2 Meta-Tracing Compilation

BacCaml is based on RPython's architecture. Before describing the implementation details, let us give an overview of RPython's meta-tracing JIT compilation.

RPython, a subset of Python, is a tool-chain for creating programming languages with a trace-based JIT compiler. RPython's trace-based JIT compiler traces the execution of a user-defined interpreter instead of tracing the program that the interpreter executes. It requires the user to implement a bytecode compiler and an interpreter definition for that bytecode.

In Figure 1, we show an example of a user-defined interpreter.

    def push(stack, sp, v):
        stack[sp] = v
        return sp + 1

    def pop(stack, sp):
        v = stack[sp - 1]
        return v, sp - 1

    def interp(bytecode):
        stack = []; sp = 0; pc = 0
        while True:
            jit_merge_point(reds=['stack', 'sp'],
                            greens=['bytecode', 'pc'])
            inst = bytecode[pc]
            if inst == ADD:
                v2, sp = pop(stack, sp)
                v1, sp = pop(stack, sp)
                sp = push(stack, sp, v1 + v2)
            elif inst == JUMP_IF:
                pc += 1
                addr = bytecode[pc]
                if addr < pc:  # backward jump
                    can_enter_jit(reds=['stack', 'sp'],
                                  greens=['bytecode', 'pc'])
                pc = addr

Fig. 1: An example interpreter definition written in RPython.

By adding special annotations to a user-defined interpreter as shown in Figure 1, language implementers can make their VM more efficient. The key annotations that RPython provides are jit_merge_point and can_enter_jit. We put jit_merge_point at the top of the dispatch loop to mark it, and can_enter_jit at each point where a back-edge instruction can occur. Furthermore, we have to tell the compiler whether each variable is "red" or "green". "Red" means that a variable is relevant to the result of the calculation; operations on red variables are recorded in the resulting traces. "Green" variables, which are not related to the result (such as bytecode and pc), are concretely evaluated during tracing.

In the profiling phase, the profiler monitors the program counter at can_enter_jit, and counts how many times each back-edge instruction (e.g. a jump from 204 to 20) occurs. When the counter gets over the threshold, the JIT compiler goes into the tracing phase. Basically, the tracer records or executes the instructions that the user-defined interpreter executes: "green" variables are concretely evaluated, while operations on "red" variables are recorded. The resulting traces are optimized and converted into native code. When control reaches the same point again, it jumps to the compiled code.

2.3 The Path Divergence Problem

Tracing JIT compilers have a weakness in compiling a particular kind of program [6], which we call the path divergence problem. The problem is observed as frequent guard failures (and compilation of the subsequent traces) when such a program runs. Since most of the time is then spent on JIT compilation, the entire execution can be slower than an interpreted execution.

Programs that cause the path divergence problem often take different execution paths each time they are executed. Functions that have multiple non-tail recursive calls, e.g., Fibonacci, are examples.

Let us look at the problem with a Fibonacci function whose control flow graph is shown in Figure 2.

    int fib(int n) {
        if (n <= 1)                 /* A */
            return 1;               /* B */
        else {
            int res1 = fib(n - 1);  /* C */
            int res2 = fib(n - 2);  /* D */
            return res1 + res2;     /* E */
        }
    }

Fig. 2: The Fibonacci function and its control flow.

Each node in the graph is a basic block. Since tracing JIT compilers†1 basically inline function calls, a node that ends with a function call is connected to the entry of the callee. Likewise, a node that ends with a return statement is connected to the next basic block of its caller, of which there can be more than one.

†1 As explained above, meta-tracing JIT compilers effectively compile traces of the base program by tracing the meta-interpreter. Therefore we discuss the problem in the context of a tracing compiler handling a base program.

Tracing compilers rely on the fact that many program executions contain a subsequence (i.e., a sequence of control flow nodes) that appears frequently in the entire execution. However, the execution of Fibonacci rarely contains such a subsequence. This is because the branching nodes in the graph, namely A, B and E, take each of their two successors with almost the same probability. As a result, no matter what path the tracing compiler chooses, the next execution of the compiled trace will likely cause a guard failure in the middle of its execution.

3 Meta-hybrid JIT Compilation Framework

We propose a meta-hybrid JIT compilation framework. It is a meta-compiler framework since the language developer merely needs to write a single interpreter definition to enable JIT compilation. It is hybrid as it can compile both an execution trace of a base program (trace compilation) and a function (or a method) of the base program (method compilation), and the compiled code from the two types of compilation can work together in a single execution.

Figure 3 overviews the structure of our meta-hybrid JIT compiler. A base language is a language that a language implementer is going to create. A base language interpreter is written in BacCaml, and a base language program runs on that interpreter. The implementation language, BacCaml, consists of the following components:
• Profiler: It profiles run-time information of the interpreter and detects which compilation strategy (method- or trace-based) is preferable. Furthermore, it chooses a compiled trace to jump to.
• Hybrid JIT Compiler: This compiler is based on a trace-based compiler. It consists of a trace-based compiler and a method-based compiler. Both have a tracer, implemented as an interpreter for BacCaml, to track the execution of a base language interpreter. The trace-based compiler compiles a straight-line path of the execution of a base language interpreter, while the method-based compiler is constructed as an extension of the tracing compiler.
• Trace Specializer: It optimizes the resulting traces and compiles them into native code.

Fig. 3: The architecture of BacCaml.

Figure 4 gives an overview of BacCaml's VM generation and compilation steps. First, a language implementer writes a base language interpreter using our meta-hybrid JIT compiler, BacCaml. Next, a base language VM with a hybrid JIT compiler is generated. When a base language program is running on the VM, BacCaml's profiler monitors the execution and detects which parts are frequently executed. Our compilation strategy is simple. We start by applying trace-based compilation and executing the resulting traces. If guard failures occur frequently, the compiler finds the method suffering from the path divergence problem, switches to method-based compilation, and applies it to that method.

The key idea that enables meta-hybrid compilation in a single interpreter definition is stack hybridization. Specifically, we use both a user-defined stack and the stack of the host language in the definitions of call and return.

We implemented BacCaml on the basis of the MinCaml [13] compiler, a compiler for a subset of the ML programming language. MinCaml is implemented in OCaml and consists of about 2000 lines of code, yet its compiled code runs as fast as that of GCC or OCamlOpt. Since RPython's implementation is huge and highly complex, we do not extend it, and instead create a proof-of-concept implementation to demonstrate our research idea.

3.1 Method JIT Compilation by Tracing

In this section, we provide the details of method JIT compilation based on tracing JIT compilation. To take advantage of both trace- and method-based compilation strategies and to solve the performance degradation problem of tracing JITs, we need to determine the true path of the base program and decrease the number of guard failures. For this reason, we implement our method JIT compilation by changing the following features of tracing JIT compilation:
• Tracing entry / exit points
• Conditional branches
• Function calls
• Loops

Tracing JIT compilers [1][5] generally compile loops in the base program, so they start tracing at the top of a loop and finish when execution returns to the entry point. To assemble the whole body of a function, we modify this behavior to trace from the top of a method body until a return instruction is reached.
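The modified entry/exit behavior can be sketched in a few lines of Python. This is a hypothetical toy bytecode of our own invention, not BacCaml's actual tracer: the point is only that the recording loop stops at RETURN rather than at a loop back-edge, and that CALL is kept in the trace instead of being inlined.

```python
# Minimal sketch (hypothetical, not BacCaml's code): a method JIT built on
# tracing machinery records every instruction from the entry of a function
# until RETURN, instead of stopping at a loop back-edge.

CONST, ADD, CALL, RETURN = "CONST", "ADD", "CALL", "RETURN"

def trace_method(bytecode, entry):
    """Record instructions from `entry` until a RETURN is reached."""
    trace = []
    pc = entry
    while True:
        inst = bytecode[pc]
        trace.append(inst)
        if inst[0] == RETURN:
            return trace   # method JIT: stop at the return, not at a back-edge
        elif inst[0] == CALL:
            # method JIT: emit the call and keep tracing the caller,
            # rather than inlining the callee as a tracing JIT would
            pc += 1
        else:
            pc += 1

# bytecode for: const 1; const 2; add; return
program = [(CONST, 1), (CONST, 2), (ADD,), (RETURN,)]
print(trace_method(program, 0))   # the whole method body becomes the trace
```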

[Figure 4: during VM generation, the base language interpreter is processed by the meta-hybrid JIT compiler; at runtime, a program P runs on the generated base language VM, whose hybrid JIT compiler applies the trace-based compiler to loops and the method-based compiler to methods, generating native code.]
Fig. 4: Overview of BacCaml's VM generation and compilation steps.

When handling a conditional branch, a tracing JIT compiler converts the branch into a guard instruction and collects only the instructions that are actually executed. In a method JIT compiler, however, we must compile both sides of a conditional. To do this, we return to the branch point and trace the other side as well. Additionally, whereas a tracing JIT inlines function calls, our method JIT emits a call instruction and continues tracing. For loops, we first transform the base program to use tail calls, and compile loops as normal functions.

4 Stack Hybridization

In this section, we detail the stack hybridization mechanism for enabling meta-hybrid compilation in a single interpreter, along with the pseudo functions and macros for defining a meta-interpreter under this mechanism.

When defining a meta-interpreter, there are two possible ways to manage the call stack: one is to use the call stack of the language in which the interpreter is written (hereafter the host-stack), and the other is to use a stack data structure manipulated by the interpreter itself (hereafter the user-stack).

We discovered that the suitable definitions of call and return differ between trace- and method-based compilation. Figure 5 shows an interpreter definition using a user-defined stack (left) and one using the stack of the host language (right). The difference between the two is in the way they manage the return address, the return value, and the callee address. As can be seen in Figure 5, all of them are managed by the user-defined stack in the implementation on the left. In the implementation on the right, however, only the return value is managed by the user-defined stack; the other data are managed by the stack of the host language.

One implication of this difference is that the user-defined stack strategy is well suited for trace-based compilation, because a function call can easily be inlined when trace-based compilation is applied to a user-defined interpreter written in this style. The host-language stack strategy, on the other hand, is suitable for method-based compilation.
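To make the two stack disciplines concrete, here is a runnable Python sketch of both. The bytecode format and the sentinel return address are our own invention for illustration, not BacCaml's actual representation; the CALL/RETURN handlers mirror the two columns of Figure 5.

```python
# Toy illustration of the two call-stack disciplines (invented bytecode, not
# BacCaml's): the user-stack version keeps return addresses in the
# interpreter's own stack and stays in one dispatch loop; the host-stack
# version re-enters the interpreter function recursively on CALL.

CONST, CALL, RETURN = "CONST", "CALL", "RETURN"

# main at address 0 calls the function at address 2, which returns 42
CODE = [(CALL, 2), (RETURN,), (CONST, 42), (RETURN,)]

def interp_user(entry):
    """User-stack strategy: return addresses live on the user-defined stack."""
    stack = [None]                   # sentinel return address for the entry frame
    pc = entry
    while True:
        op = CODE[pc]
        if op[0] == CONST:
            stack.append(op[1]); pc += 1
        elif op[0] == CALL:
            stack.append(pc + 1)     # push the return address, stay in the loop
            pc = op[1]
        elif op[0] == RETURN:
            ret_value = stack.pop()
            ret_address = stack.pop()
            if ret_address is None:  # entry frame: leave the interpreter
                return ret_value
            stack.append(ret_value)
            pc = ret_address         # jump back to the caller

def interp_host(entry):
    """Host-stack strategy: the host language's call stack keeps the frame."""
    stack = []
    pc = entry
    while True:
        op = CODE[pc]
        if op[0] == CONST:
            stack.append(op[1]); pc += 1
        elif op[0] == CALL:
            stack.append(interp_host(op[1]))  # recursive re-entry on CALL
            pc += 1
        elif op[0] == RETURN:
            return stack.pop()       # unwind via the host language's return

print(interp_user(0), interp_host(0))  # both strategies compute the same result
```

In the user-stack version a trace of the whole call is one straight loop iteration sequence, which is why it inlines naturally under trace-based compilation; in the host-stack version the call is a host-level function call, which is what a method compiler wants to leave in place.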
The host-stack style suits method-based compilation because a function call can be identified as such, left as a call in the resulting trace, and tracing can then continue with its successors.

    /* user-defined stack strategy */
    case CALL:
        // fetch the callee address
        addr = bytecode[pc];
        // push the return address
        stack[sp++] = pc + 1;
        pc = addr;
        break;

    case RETURN:
        ret_value = stack[sp--];
        ret_address = stack[sp--];
        stack[sp] = ret_value;
        // jump to the return address
        pc = ret_address;
        break;

    /* host-language's stack strategy */
    case CALL:
        // fetch the callee address
        addr = bytecode[pc];
        // launch the interpreter recursively
        ret_value = interp(addr);
        stack[sp] = ret_value;
        break;

    case RETURN:
        // return a value to the calling interpreter
        return stack[sp--];

Fig. 5: An example interpreter definition for call and return using pseudo code.

To support both trace- and method-based compilation strategies in a single interpreter, we propose the stack hybridization mechanism, which manages both the host- and user-stacks in the interpreter and uses one of them depending on the current compilation strategy. Roughly speaking, the interpreter handles call and return operations in the following ways:
• When it calls a function under trace-based compilation, it uses the user-stack; i.e., it saves the context information in the stack data structure and iterates the interpreter loop. Additionally, it leaves a flag "user-stack" in the user-stack.
• When it calls a function under method-based compilation, it uses the host-stack; i.e., it calls the interpreter function in the host language. Additionally, it leaves a flag "host-stack" in the user-stack.
• When it returns from a function, it first checks the flag in the user-stack. If the flag is "user-stack", it restores the context information from the user-stack. Otherwise, it returns from the interpreter function, i.e., using the host-stack.

In order to implement those behaviors, as shown in Figure 6, we introduce a pseudo function (is_mj) and special variables (TJ and MJ).

    case CALL:
        addr = bytecode[pc];
        if (is_mj()) {
            // this call is made under method-based compilation
            stack[sp++] = MJ;
            ret_value = interp(addr);
            stack[sp++] = ret_value;
        } else {
            // this call is made under trace-based compilation
            stack[sp++] = TJ;
            stack[sp++] = pc + 1;
            pc = addr;
        }
        break;

    case RETURN:
        ret_value = stack.pop();
        mode = stack.pop();
        // detect the compilation mode
        @if (mode == MJ) {
            return ret_value;
        } else {
            addr = stack.pop();
            stack.push(ret_value);
            pc = addr;
        }
        break;

Fig. 6: An interpreter definition using stack hybridization.

When we apply method-based compilation, is_mj returns true and the code in the then clause is traced; when we apply trace-based compilation, is_mj returns false and the else clause is traced.

The special variables are used to detect the compilation mode dynamically in return. This enables resulting traces made by method- and trace-based compilations to cooperate. For example, suppose there are two traces, one (A) made by trace-based compilation and the other (B) by method-based compilation. When a function call from A to B occurs, the variable TJ is pushed onto the user-defined stack. When a return instruction is then executed in B, control takes the suitable path in the definition shown on the right of Figure 6.

5 Preliminary Benchmark Test

Since this work is still ongoing, we first summarize what has and has not been done before showing data. Currently, we have a complete implementation of method-based just-in-time compilation. We also have an implementation of trace-based compilation that is able to trace, compile and execute dynamically, but we have not yet worked out how to manage control transfer for it.

In this section, we present microbenchmark results of method-based compilation and interpreter-only execution. We wrote a simple interpreter in BacCaml to execute two small programs, sum and fib:
• fib: The function computes the Fibonacci numbers 30 and 40. It has two non-tail recursive function calls in its body, which causes the path divergence problem.
• sum: The function computes the summation from 1 to 30000. Its non-tail recursive function calls do not cause the path divergence problem.

In our environment, we ran every program 100 times and ignored the first run. The experiment was performed on a laptop with a 2.70 GHz Intel(R) Core(TM) i5 processor and 8 GB of memory, running Linux 5.2.5-arch1-1-ARCH.

In Figure 7, we present the results of our experiment. Note that we use a shared library for just-in-time compilation, which incurs considerable overhead in compilation time.

The results show that fib actually causes the path divergence problem, and that our method-based compilation strategy potentially solves it by combining trace- and method-based compilation at run-time. This is evidenced by the fact that fib(40) is about 4 times slower than MinCaml, while the interpreter-only execution caused a stack overflow. fib(30) is also about twice as fast as the interpreter, but still about 100 times slower than MinCaml in total. This implies that invoking GCC at run-time is quite slow; hence we will need an in-memory compilation and execution system in the future.

In contrast, sum under method-based compilation is tremendously slower than both interpreter-only execution and MinCaml. From this result, we conclude that method-based compilation is not suitable for programs with simple control flow, and that we should apply trace-based compilation to such programs.

6 Related Work

6.1 Truffle/Graal

Truffle/Graal [16][15] is an AST-based meta just-in-time compiler. Graal is a JIT compiler for Java, and Truffle is an interpreter framework for Graal. By building an interpreter with the Truffle framework, Graal can apply partial evaluation to the language. Whereas RPython is based on a bytecode interpreter definition, Truffle requires an AST-based interpreter definition. Truffle has also been successful in implementing many languages, such as Ruby, Python, and R.

6.2 IonMonkey

IonMonkey [8] is Firefox's mature JavaScript JIT engine. Mozilla created both method JIT and tracing JIT compilers for JavaScript, and adopted a hybrid approach in which the method JIT was used for initial speed-up and the tracing JIT was later applied to hot loops.

6.3 Android Hybrid JIT

Pérez et al. [12] combined a method JIT compiler and a tracing JIT compiler on the Android Dalvik VM. Their hybrid JIT compiler achieved high performance by sharing profiling and compilation information between the two.

6.4 Pyrlang

Pyrlang [6] is an Erlang virtual machine with a tracing JIT compiler, built by applying the meta-tracing JIT compiler of RPython.


Fig. 7: Preliminary benchmark results of BacCaml's method JIT and interpreter-only execution. Each bar shows the execution time of method-based JIT compilation and of interpreter-only execution, relative to MinCaml. "exec. in trace" is the execution time spent in compiled code; "compilation" shows how much time dynamic compilation consumes at run-time; "interpreter" means all instructions are executed on the interpreter. N/A means a stack overflow occurred. Lower is better in this chart.

Similar to us, they tried to address the path divergence problem, introducing pattern-matching tracing. It is identical to Pycket's two-state tracing; the basic idea is to place JIT merge points at the destinations of conditional branches so as to distinguish different paths as different traces.

6.5 Trace-based Java JIT Compiler

Inoue et al. [7] presented a trace-based Java JIT compiler. Opposite to our approach, they implemented it by extending a Java method JIT compiler and changing its compilation scope. They concluded that their tracing JIT could reduce the method-invocation overhead, but incurred additional runtime overhead compared with the method JIT.

7 Conclusion and Future Work

We have shown techniques for realizing trace- and method-based compilation in a meta JIT compiler. Among these techniques, stack hybridization is the key idea enabling hybrid compilation with a single interpreter definition. Our preliminary benchmarks indicate that combining both compilation approaches can potentially solve the path divergence problem and improve the performance of tracing and meta-tracing compilation.

We are currently implementing guard-failure handling in BacCaml. As mentioned before, we use dynamic loading for just-in-time compilation. The challenge is that, when we apply tracing compilation, control should jump into compiled code, but dynamic loading does not provide any API for jumping into native code. Once this is solved, we will realize the run-time combination of both compilation strategies using stack hybridization, and test a variety of programs on BacCaml to validate our work.

References
[1] Bala, V., Duesterwald, E., and Banerjia, S.: Dynamo: A Transparent Dynamic Optimization System, Proceedings of the ACM SIGPLAN 2000 Conference on Programming Language Design and Implementation, PLDI '00, 2000.
[2] Bebenita, M., Brandner, F., Fahndrich, M., Logozzo, F., Schulte, W., Tillmann, N., and Venter, H.: SPUR: A Trace-based JIT Compiler for CIL, Proceedings of the ACM International Conference on Object Oriented Programming Systems Languages and Applications, OOPSLA '10, New York, NY, USA, ACM, 2010, pp. 708–725.
[3] Bolz, C. F., Cuni, A., Fijalkowski, M., and Rigo, A.: Tracing the Meta-level: PyPy's Tracing JIT Compiler, Proceedings of the 4th Workshop on the Implementation, Compilation, Optimization of Object-Oriented Languages and Programming Systems, ICOOOLPS '09, 2009, pp. 18–25.
[4] Bolz, C. F. and Tratt, L.: The Impact of Meta-tracing on VM Design and Implementation, Science of Computer Programming, 2015.
[5] Gal, A., Orendorff, J., Ruderman, J., Smith, E. W., Reitmaier, R., Bebenita, M., Chang, M., Franz, M., Eich, B., Shaver, M., Anderson, D., Mandelin, D., Haghighat, M. R., Kaplan, B., Hoare, G., and Zbarsky, B.: Trace-based Just-in-time Type Specialization for Dynamic Languages, ACM SIGPLAN Notices, Vol. 44, No. 6 (2009), p. 465.
[6] Huang, R., Masuhara, H., and Aotani, T.: Improving Sequential Performance of Erlang Based on a Meta-tracing Just-In-Time Compiler, The 17th Symposium on Trends in Functional Programming, TFP 2016, 2016.
[7] Inoue, H., Hayashizaki, H., Wu, P., and Nakatani, T.: A Trace-based Java JIT Compiler Retrofitted from a Method-based Compiler, Proceedings of the International Symposium on Code Generation and Optimization, CGO 2011, 2011, pp. 246–256.
[8] Mozilla: IonMonkey, the Next Generation JavaScript JIT for SpiderMonkey.
[9] Oracle Labs: A High-performance Implementation of the R Programming Language, Built on GraalVM.
[10] Oracle Labs: A High Performance Implementation of the Ruby Programming Language.
[11] Oracle Labs: Graal/Truffle-based Implementation of Python, 2018.
[12] Pérez, G. A., Kao, C.-M., Chung, Y.-C., and Hsu, W.-C.: A Hybrid Just-in-time Compiler for Android, Proceedings of the 2012 International Conference on Compilers, Architectures and Synthesis for Embedded Systems, CASES '12, 2012, p. 41.
[13] Sumii, E.: MinCaml: A Simple and Efficient Compiler for a Minimal Functional Language, FDPE: Workshop on Functional and Declarative Programming in Education, 2005, pp. 27–38.
[14] Topaz Project: A High Performance Ruby, Written in RPython.
[15] Würthinger, T., Wimmer, C., Humer, C., Wöß, A., Stadler, L., Seaton, C., Duboscq, G., Simon, D., and Grimmer, M.: Practical Partial Evaluation for High-performance Dynamic Language Runtimes, Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, Vol. 52, No. 6 (2017), pp. 662–676.
[16] Würthinger, T., Wöß, A., Stadler, L., Duboscq, G., Simon, D., and Wimmer, C.: Self-optimizing AST Interpreters, Proceedings of the 8th Symposium on Dynamic Languages, DLS '12, 2012, p. 73.