Just-in-time Compilation Techniques

Timo Lilja [email protected]

May 13, 2010

1 Introduction

Just-in-time compilation, or JIT compilation, is a technique that compiles byte code into native code at run time. The technique is not strictly needed: its goal is to improve the run-time performance of the code, and the system would be perfectly usable without JIT compilation.

There are several benefits to be obtained from JIT compilation. One is better compilation for the specific architecture: normally code is optimized for a common subset of the instruction set architecture, and CPU-specific instructions are ignored. A JIT can, however, produce, e.g., SSE2-optimized code if the actual CPU supports that instruction set. In ahead-of-time compilation the code can only be analyzed statically, so every execution path of the code must be compiled and there is no run-time information on how often a path is taken. A JIT compiler can obtain run-time information and perform optimizations based on it, thus producing more efficient code. Furthermore, in C and certain other languages there is little opportunity for global, program-wide optimizations; in JIT-compiled code, the entire execution of the program can be considered as a whole when performing the optimizations.

JIT systems come in many forms. Usually the source code is compiled to a byte code that is interpreted and executed in a virtual machine interpreter. JIT compilation replaces the execution of the byte code with a compilation step to native code; once the compilation has been done, the native code is executed. Naturally, the code is cached for subsequent runs and possibly re-compiled if information obtained during execution allows better optimized code. There are some variations: some JIT systems do not contain a byte code interpreter at all, forcing all code to be JIT compiled before execution, while other systems do run-time compilation but store the generated code in persistent storage. In a broader sense of the term, these systems could be considered JIT compilers, too. Such systems are commonly referred to as mixed code.
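To make the mixed-mode idea concrete, the following C sketch shows a dispatch routine that interprets a function until a simple per-function call counter marks it as hot, then compiles it once, caches the native code and uses it from then on. The interpret and jit_compile hooks, the structure layout and the threshold are hypothetical stand-ins for whatever a real virtual machine back-end would provide; this is an illustration of the dispatch logic, not any particular system's implementation.

/* Hypothetical mixed-mode dispatch: interpret until hot, then compile and cache. */
#include <stddef.h>

typedef int (*native_fn)(void);

struct function {
    const unsigned char *bytecode;   /* portable byte-code representation */
    size_t length;
    unsigned call_count;             /* profiling counter */
    native_fn compiled;              /* NULL until JIT-compiled */
};

/* Assumed to exist in the VM: a byte-code interpreter and a back-end
 * that emits native code for a single function. */
int interpret(const unsigned char *code, size_t len);
native_fn jit_compile(const unsigned char *code, size_t len);

enum { HOT_THRESHOLD = 1000 };       /* arbitrary example value */

int call(struct function *f)
{
    if (f->compiled)                          /* cached native code */
        return f->compiled();
    if (++f->call_count >= HOT_THRESHOLD) {   /* the function became "hot" */
        f->compiled = jit_compile(f->bytecode, f->length);
        if (f->compiled)
            return f->compiled();
    }
    return interpret(f->bytecode, f->length); /* cold path: keep interpreting */
}

A system without an interpreter would drop the cold path and compile on the first call; a re-optimizing system would additionally replace f->compiled when better profiling data becomes available.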

The first JIT compilation systems, according to Aycock [4], were McCarthy's Lisp systems in the 1960s. Thompson devised a way to compile regular expressions at run time in 1968.

In JIT compilation, the term hot spot refers to a technique where only the most frequently executed, "hot", code gets compiled. A Fortran system was one of the first to include this technique, in 1974. The system maintained a frequency counter for each block of code: the more often the code was executed, the more time-consuming optimizations were applied, but this did not always yield faster code than static compilation. The Smalltalk virtual machine was considered slow and was optimized by a system that lazily compiled methods to native code. When memory ran out, the native code was thrown away and regenerated as needed.

The Self programming language represented some notable steps in JIT development. Self is a prototype-based dynamic language which allows changing objects at run time, making dynamic compilation mandatory. To gain some speed the designers decided to use JIT techniques in implementing the Self runtime. In the first generation the compilation was done in specialized contexts: instead of generating generic code for the execution of a method, the code was JIT compiled at run time when the exact type information was available. The second generation introduced the concept of deferring the compilation of uncommonly executed code. In the third generation, code that was executed more frequently was recompiled with better optimizations, which was labeled adaptive compilation.

In 1994 Franz presented slim binaries, which provided a machine-independent high-level representation of the code. The code is compiled to machine-executable form at run time, taking the hardware-specific instruction sets into account. Later, in 1997, Kistler extended the system with continuous optimization, which essentially optimizes the code ad infinitum. The code generation was run on a separate low-priority thread which was scheduled only when the program was idle.

The concept of simulation refers to a technique for running one architecture's code on another architecture. At first, simulators executed each instruction in an interpretive manner. Subsequent generations improved this by caching the translated instructions and moved to operate on larger blocks of code instead of translating a single instruction at a time. Newer simulation environments usually translate only hot code. Another aspect is the translation of legacy code into VLIW code. In these systems the source architecture is usually a subset of the VLIW-enabled target architecture, and JIT compilation can boost performance by obtaining more instruction-level parallelism (ILP).

Modern JIT development has focused on two programming languages and their execution environments: Java/JRE and JavaScript/browsers. Aycock's survey [4] is somewhat outdated in this respect: it mentions only the early Java development from the late 90s and says nothing about JavaScript.

Aycock provides a classification of JIT systems. According to him, JITs can be categorized by:

• Invocation: whether the JIT compiler needs to be explicitly invoked, or whether the compilation is implicit and transparent to the user.

• Executability: a system is monoexecutable if the source and destination languages are the same; otherwise the system is polyexecutable.

• Concurrency: if the JIT compiler can execute while the program itself is running, it is concurrent.

• Real-time: whether the JIT compiler provides real-time guarantees.

Since the survey is somewhat dated, it does not consider trace-based and non-trace-based JIT compilers as categories of their own.

2 Tools

There are various tools available that allow one to use a JIT compiler in a programming language execution environment. Probably the best known is the LLVM compiler project [3]. It provides a virtual machine and a JIT compiler; by writing a front-end for one's own language, it is easy to obtain JIT support for the programming language environment. LLVM is mainly focused on C-like languages: it provides front-ends for C and C++, though some other non-official front-ends are available.

GNU lightning [2] is a rather lightweight library which provides its own RISC-like instruction set and a JIT compiler. It supports SPARC and PowerPC architectures and is used by various programming language projects such as MzScheme, GNU Smalltalk and CLISP. Another lightweight alternative is GNU libJIT [1], which has its own static single assignment intermediate language and type system. It is currently used in the DotGNU system.

The question that a programming language run-time environment designer has to face is whether to use an existing JIT implementation or to devise one's own. No easy answer can be given: implementing a JIT is a rather error-prone task with hard-to-debug problems. Existing systems, on the other hand, can be too limiting if one needs complex control of the programming language execution. For example, if one needs continuations in one's language, this usually requires some support in the underlying IR language. If the JIT environment lacks this, adding it afterwards can prove to be a daunting task.
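As an illustration of how such a library is used, the following sketch JIT-compiles a trivial function add(a, b) = a + b through LLVM's C API and calls the resulting native code. The initialization and engine-creation calls have varied between LLVM releases, so this should be read as an outline under those assumptions rather than a definitive program.

/* Sketch: build IR for add(a, b), JIT it, and call the native code. */
#include <stdio.h>
#include <stdint.h>
#include <llvm-c/Core.h>
#include <llvm-c/ExecutionEngine.h>
#include <llvm-c/Target.h>

int main(void)
{
    LLVMLinkInMCJIT();
    LLVMInitializeNativeTarget();
    LLVMInitializeNativeAsmPrinter();

    /* Build the LLVM IR for the function "add". */
    LLVMModuleRef mod = LLVMModuleCreateWithName("demo");
    LLVMTypeRef params[2] = { LLVMInt32Type(), LLVMInt32Type() };
    LLVMValueRef add = LLVMAddFunction(mod, "add",
        LLVMFunctionType(LLVMInt32Type(), params, 2, 0));
    LLVMBuilderRef b = LLVMCreateBuilder();
    LLVMPositionBuilderAtEnd(b, LLVMAppendBasicBlock(add, "entry"));
    LLVMBuildRet(b, LLVMBuildAdd(b, LLVMGetParam(add, 0),
                                    LLVMGetParam(add, 1), "sum"));

    /* The execution engine compiles the module to native code. */
    char *error = NULL;
    LLVMExecutionEngineRef ee;
    if (LLVMCreateExecutionEngineForModule(&ee, mod, &error)) {
        fprintf(stderr, "failed to create execution engine: %s\n", error);
        return 1;
    }

    /* Look up the native entry point and call it like any C function. */
    int (*fn)(int, int) =
        (int (*)(int, int))(intptr_t)LLVMGetFunctionAddress(ee, "add");
    printf("add(2, 40) = %d\n", fn(2, 40));
    return 0;
}

A front-end for one's own language would replace the hand-built IR above with IR generated from the language's syntax tree; the engine-creation and lookup steps stay the same.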

References

[1] GNU libJIT. http://freshmeat.net/projects/libjit/.

[2] GNU lightning. http://www.gnu.org/software/lightning/.

[3] LLVM: Low Level Virtual Machine. http://llvm.org.

[4] J. Aycock. A brief history of just-in-time. ACM Comput. Surv., 35(2):97–113, 2003.
