USENIX Association

Proceedings of the Third Virtual Machine Research and Technology Symposium

San Jose, CA, USA May 6–7, 2004

© 2004 by The USENIX Association. All Rights Reserved. For more information about the USENIX Association: Phone: 1 510 528 8649; FAX: 1 510 548 5738; Email: [email protected]; WWW: http://www.usenix.org. Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein.

The Virtual Processor: Fast, Architecture-Neutral Dynamic Code Generation

Ian Piumarta
Laboratoire d'Informatique de Paris 6, Université Pierre et Marie Curie,
4, place Jussieu, 75252 Paris Cedex 05, France
[email protected]

Abstract

Tools supporting dynamic code generation tend to be too low-level (leaving much work to the client application) or too intimately related with the language/system in which they are used (making them unsuitable for casual reuse). Applications or virtual machines wanting to benefit from runtime code generation are therefore forced to implement much of the compilation chain for themselves even when they make use of the available tools. The VPU is a fast, high-level code generation utility that performs most of the complex tasks related to code generation, including register allocation, and which produces good-quality C ABI-compliant native code. In the simplest cases, adding VPU-based runtime code generation to an application requires just a few lines of additional code—and for a typical virtual machine, VPU-based just-in-time compilation requires only a few lines of code per virtual instruction.

1 Introduction

Dynamic compilation is the transformation of some abstract representation of computation, determined or discovered during application execution, into native code implementing that computation. It is an attractive solution for many problem domains in which high-level scripts or highly-compact representations of algorithms must be used. Examples include: protocol instantiation in active networks, packet filtering in firewalls, scriptable adaptors/interfaces in software object busses, policy functions in flexible caches and live evolution of high-availability software systems. In other domains it is an essential technique. In particular, virtual machines that interpret bytecoded languages can achieve good performance by translating them into native code "on demand" at runtime.

A dynamic code generator is the part of a dynamic compiler that converts the abstract operations of the source representation into executable native code. It represents a large part (if not the bulk) of any dynamic compiler—and certainly the majority of its complexity. However, very few utilities exist to ease the task of creating a dynamic code generator, and those that are available are ill-suited to a simple, "plug-and-play" style of use.

1.1 Related work

Several utilities have been developed to help with the implementation of static code generators. Examples include C-- [1], MLRISC [3] and VPO [2] (part of the Zephyr compiler infrastructure). However, very few tools have been developed to help with the implementation of dynamic code generators. Those that do exist are concerned with the lowest (instruction) level of code generation.

ccg [9, 12] is a collection of runtime assemblers implemented entirely in C macros and a preprocessor that converts symbolic assembly language (for a particular CPU) embedded in C or C++ programs into macro calls that generate the corresponding binary instructions in memory. It is a useful tool for building the final stage of a dynamic code generator, but constitutes only a small part of a dynamic compiler, and deals exclusively in the concrete instructions of a particular architecture.

vcode [6] and GNU Lightning [13]¹ are attempts to create virtual assembly languages in which a set of "typical" (but fictitious) instructions are converted into native code for the local target CPU. Both of these systems present clients with a register-based abstract model. Register allocation (one of the most difficult code generation problems to solve) is left entirely to the client, and both suffer from problems when faced with register-starved, stack-based architectures such as Intel.²

¹ GNU Lightning is based on ccg and is little more than a collection of wrappers around it.
² GNU Lightning solves the "Pentium problem" by supporting just 6 registers. vcode solves the problem by mapping "excess" registers within its model onto memory locations, with all the performance penalties that this implies.

Beyond these systems, code generators rapidly become intimately tied to the source language or system with which they are designed to operate. Commercial Smalltalk and Java compilers, for example, use sets of "template" macros (or some functionally equivalent mechanism) to generate native code, where the macros correspond closely to the semantics of the bytecode set being compiled. Adapting them for use in a different application (or virtual machine) requires significant work. Tasks such as register allocation also tend to be adapted for the specifics of the source language. (One notable exception is the Self compiler [4], for which a fairly language-neutral model of stack frames and registers was developed. Nevertheless, the rest of the Self compiler is so complex that nobody has managed to extract and reuse it in a completely different context.)

1.2 The VPU

The VPU fills the gap between trivial "virtual assembly languages" and full-blown dynamic compilers intimately tied to their source language and its semantics. It is a complete "plug-and-play" dynamic code generator that can be integrated into any application in a matter of minutes, or used as the backend for a dynamically-compiled language or "just-in-time" compiler. It presents the client with a simple, architecture-neutral model of computation and generates high-quality, C-compatible native code with a minimum of time and space overheads. It assumes full responsibility for many of the difficult tasks involved in dynamic code generation, including register allocation and the details of local calling conventions. Applications and language implementations using the VPU are portable (with no source code changes) to all the platforms supported by the VPU; currently PowerPC, Sparc and Pentium.

A useful analogy might be to consider languages that are "compiled" into C source code which is then passed to an existing C compiler for conversion into the final executable. The VPU could be used in a similar fashion: its input "language" is semantically equivalent to C and its output is C-compatible native code. The difference is that the VPU is integrated into the application and performs its compilation at runtime, sufficiently fast that the application should never notice pauses due to dynamic code generation.

The rest of this paper is organised as follows: Section 2 describes the feature set and execution model of the VPU from the client's point of view. Section 3 then describes in some detail the implementation of the VPU, from the API through to the generation of native code (and most of the important algorithms in between). Section 4 presents a few performance measurements, and finally Section 5 offers conclusions and perspectives.

2 Plug-and-Play code generation

A simple (but complete) example illustrates the use of the VPU. Figure 1 shows a program that prints temperature conversion tables. It relies on a small runtime compiler, rpnCompile(), that converts an input expression (a string containing an integer function in reverse-polish notation) into executable native code. The program first compiles the two temperature conversion functions c2f and f2c, and then uses them to produce a conversion table. The implementation of the "dynamic compiler" is encapsulated entirely within the rpnCompile() procedure. (The program fragments shown in Figures 1 and 2 combine to form a complete program, whose output is shown in Figure 3.)

    #include <stdio.h>
    #include <stdlib.h>

    typedef int (*pifi)(int);

    pifi rpnCompile(char *expr);

    int main()
    {
      int i;
      pifi c2f= rpnCompile("9*5/32+");
      pifi f2c= rpnCompile("32-5*9/");
      printf("\nC:");
      for (i= 0;  i <= 100;  i += 10) printf("%3d ", i);
      printf("\nF:");
      for (i= 0;  i <= 100;  i += 10) printf("%3d ", c2f(i));
      printf("\n\nF:");
      for (i= 32;  i <= 212;  i += 10) printf("%3d ", i);
      printf("\nC:");
      for (i= 32;  i <= 212;  i += 10) printf("%3d ", f2c(i));
      printf("\n");
      return 0;
    }

Figure 1: Temperature conversion table generator. This program relies on a procedure rpnCompile() to create native code functions converting degrees Fahrenheit to Celsius and vice-versa.

Figure 2 shows one possible implementation of the rpnCompile() procedure. It uses ccg to create Intel native code for a function implementing the input expression.

    #cpu pentium

    pifi rpnCompile(char *expr)
    {
      insn *codePtr= (insn *)malloc(1024);
    #[  .org   codePtr
        pushl  %ebp
        movl   %esp, %ebp
        movl   8(%ebp), %eax
    ]#
      while (*expr) {
        char buf[32];
        int n;
        if (sscanf(expr, "%[0-9]%n", buf, &n))
          {
            expr += n - 1;
    #[      pushl  %eax
            movl   $(atoi(buf)), %eax
    ]#      }
        else if (*expr == '+') #[
            popl   %ecx
            addl   %ecx, %eax
    ]#
        else if (*expr == '-') #[
            movl   %eax, %ecx
            popl   %eax
            subl   %ecx, %eax
    ]#
        else if (*expr == '*') #[
            popl   %ecx
            imull  %ecx, %eax
    ]#
        else if (*expr == '/') #[
            movl   %eax, %ecx
            popl   %eax
            cltd
            idivl  %ecx, %eax
    ]#
        else
          abort();
        ++expr;
      }
    #[  leave
        ret  ]#;
      return (pifi)codePtr;
    }

Figure 2: Low-level code generation based on macro expansion. The input string is scanned for literals and arithmetic operators, and Intel native code generated accordingly. The sections delimited by '#[ ... ]#' describe the code to be generated (and also "group" their contents like C curly braces). A preprocessor converts these sections into macro calls that produce the binary instructions in memory.

Figure 4 shows an alternative implementation of rpnCompile() that uses the VPU to produce the same native code in a platform-independent manner.

    #include "VPU.h"

    pifi rpnCompile(char *expr)
    {
      VPU *vpu= new VPU;
      vpu->Ienter()->Iarg()
         ->LdArg(0);
      while (*expr) {
        char buf[32];
        int n;
        if (sscanf(expr, "%[0-9]%n", buf, &n)) {
          expr += n - 1;
          vpu->Ld(atoi(buf));
        }
        else if (*expr == '+') vpu->Add();
        else if (*expr == '-') vpu->Sub();
        else if (*expr == '*') vpu->Mul();
        else if (*expr == '/') vpu->Div();
        else
          abort();
        ++expr;
      }
      vpu->Ret();
      void *entry= vpu->compile();
      delete vpu;
      return (pifi)entry;
    }

Figure 4: VPU-based code generation.

The VPU appears to clients as a single C++ class. A client constructs an instance of this class on which it invokes methods corresponding to the operations of an abstract stack machine. Once a complete function has been described the client asks the object to compile it. The result is the address of the native code implementing the function, which can subsequently be called from both statically- and dynamically-compiled code, just like a "normal" pointer to function. When the function is no longer needed the client can return the memory to the heap using the standard C library function free(). Figure 5 shows the code generated by the VPU-based rpnCompile() procedure when run on the PowerPC.

Note that the only client-side state is the vpu instance itself; this would be the case no matter how complex the function being compiled. All instruction arguments (such as the constant argument for the Ld instruction) can be computed at runtime.
C:   0  10  20  30  40  50  60  70  80  90 100
F:  32  50  68  86 104 122 140 158 176 194 212

F:  32  42  52  62  72  82  92 102 112 122 132 142 152 162 172 182 192 202 212
C:   0   5  11  16  22  27  33  38  44  50  55  61  66  72  77  83  88  94 100

Figure 3: Output generated by the temperature conversion program.

    ------ code gen
    003b14f0  mr     r4,r3
    003b14f4  li     r5,9
    003b14f8  mullw  r4,r4,r5
    003b14fc  li     r5,5
    003b1500  divw   r4,r4,r5
    003b1504  addi   r3,r4,32
    003b1508  blr
    ------ 28 bytes
    ------ code gen
    003b1560  mr     r4,r3
    003b1564  addi   r4,r4,-32
    003b1568  li     r5,5
    003b156c  mullw  r4,r4,r5
    003b1570  li     r5,9
    003b1574  divw   r3,r4,r5
    003b1578  blr
    ------ 28 bytes

Figure 5: Generated code for PowerPC.

2.1 The VPU's model of computation

Clients are presented with a stack-based model of computation that includes a complete set of C operations: arithmetic and shift, type coercion, bitwise logical, comparison, memory dereference (load/store), reference (address-of argument, temporary or local label), and function call (both static and computed). Three additional operators are provided for manipulating the emulated stack: Drop(), Dup(n = 0) (which duplicates buried values when n > 0) and Put(n) (which stores into a buried location in the stack). These last two operations allow non-LIFO access to data on the stack.

Control flow is provided by the unconditional branch Br(n) and the conditional branches Bt(n) and Bf(n). The interpretation of "truth" is the same as in C: zero is "false", anything else is "true". (The VPU treats the condition codes register as an implicit value on the stack that is reified only when a logical result is used as an integer quantity. For example, there is no reification of the result of a comparison when the next operation is a conditional branch.)

The argument n in the above branch instructions refers to a local label. Labels are maintained on a parallel stack, with control scopes being pushed and popped in a strictly LIFO fashion. Local labels are created and defined via three instructions:

• Begin(n) pushes n undefined local labels onto the label stack;
• Define(n) associates the current program counter value with the nth entry on the label stack; and
• End(n) pops the topmost n entries off the label stack.

Control is permitted to jump forward out of a local scope, implicitly truncating the emulation stack at the destination label. Conversely, a balanced stack is enforced for backward branches (loops). A local label can remain undefined providing it is never referred to within the function body. Attempting to jump to a label that is never Defined will raise an error at (dynamic) compile time.

Formal arguments are declared immediately after the prologue. The type of the argument is explicit in the declaring instruction (Iarg() or Darg()). Arguments are referred to within the function by position relative to the first (corresponding to the "leftmost" argument in the equivalent C declaration); 0 is the first actual argument. A single LdArg(n) is provided to push an actual argument onto the stack; the type of the value pushed onto the stack is implicit and corresponds to the explicit type given in the instruction that declared the argument.

Temporaries are similar to arguments, but can be declared and destroyed at any point within the body of the function. For example, Itmp creates a new local int variable on the temporary stack. Subsequent LdTmp(n) and StTmp(n) instructions read and write temporaries, with n = 0 referring to the topmost (most recently declared) temporary. The DropTmp(n) instruction pops the topmost n temporaries off the stack.

The compile() method requires that both the emulation and temporary stacks be empty after the final Ret() instruction in the function.
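To make the stack, branch and label instructions concrete, the fragment below sketches how a client might describe a two-argument integer max() function using only the operations introduced above. It is not taken from the paper: the Le(), Bf(), Br(), Begin(), Define() and End() spellings are copied from Figures 4 and 6, the label-index convention is assumed to match Figure 6, and the piii typedef is invented for the example.

    // Hedged sketch (not from the paper): int max(int a, int b) built from the
    // Section 2.1 instruction set.
    typedef int (*piii)(int, int);

    VPU *vpu= new VPU;
    vpu->Ienter()->Iarg()->Iarg()       // two int arguments: a, b
       ->Begin(2)                       // two local labels for this scope
         ->LdArg(0)->LdArg(1)->Le()     // a <= b ?
         ->Bf(0)                        //   no:  jump to label 0
         ->LdArg(1)->Br(1)              //   yes: push b, jump to the common exit
         ->Define(0)
         ->LdArg(0)                     //   push a
         ->Define(1)                    // both paths arrive with one value on the stack
       ->End(2)
       ->Ret();                         // return the merged value
    piii max2= (piii)vpu->compile();
    delete vpu;
    // ... use max2(x, y); release the code later with free((void *)max2)

The merge at label 1 is exactly the situation handled by the stack-location reconciliation described later in Section 3.1.2.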

Figure 6 shows a rather more complete example that uses temporary variables, local labels and conditional branches.

    int main(int argc, char **argv)
    {
      VPU *vpu= new VPU();
      Label fn;
      vpu->Define(fn)->Ienter()->Iarg()
         ->Itmp()
         ->Ld(0)->StTmp(0)->Drop()
         ->Begin(1)->Define(0)
           ->LdTmp(0)->Ld("%d\n")
           ->Icall(2, (void *)printf)->Drop()
           ->LdTmp(0)->Ld(atoi(argv[1]))->Add()
           ->StTmp(0)
           ->LdArg(0)->Le()->Bt(0)
         ->End(1)
         ->DropTmp()
         ->Ld(0)->Ret()
         ->compile();
      fn(atoi(argv[2]));
      free(fn);
      return 0;
    }

Figure 6: Dynamically constructing a function that uses local variables, labels and conditional branches. The program takes two command-line arguments: a loop 'step' and 'limit'. (The program compiles the first into the dynamic code as a constant and passes the second to it as an argument.) A temporary variable is created and initialised to 0. A local label is then created and defined. The value of the temporary is then printed, it is stepped and compared to the limit ('Le' is a 'less-or-equal' comparison); if the limit has not been reached the loop continues. The local label is then popped off the label stack and the local variable destroyed before returning 0. Running this program with the arguments '3 10' prints '0 3 6 9' on stdout. The indentation reflects the depth of the label and emulation stacks within the function body. Note that the Define instruction is overloaded to accept integer arguments (local labels on the label stack) and Label objects (global labels whose values remain available to the client application after compilation). Label is a convenience class provided by the VPU that includes overloaded definitions of operator() to simplify the invocation of dynamically-compiled code (eliminating the need to cast its address to a pointer-to-function type).
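The Label convenience class mentioned in the caption of Figure 6 can isolate the call-site boilerplate. The fragment below is a hypothetical sketch, not taken from the paper; it assumes only what the caption states, namely that a Label can be passed to Define(), invoked through its overloaded operator(), and handed to free() once the code is no longer needed.

    // Hypothetical sketch: a one-argument successor function reached through a Label.
    Label succ;
    VPU *vpu= new VPU;
    vpu->Define(succ)->Ienter()->Iarg()   // int succ(int x)
       ->LdArg(0)->Ld(1)->Add()->Ret()    //   return x + 1;
       ->compile();
    delete vpu;                           // the generated code outlives the vpu
    int three= succ(2);                   // call via the overloaded operator()
    free(succ);                           // release the generated code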

Figure 7 (architecture diagram): The VPU's architecture. An internal representation of the function and emulation stack is constructed in response to the client invoking methods on a vpu. Compilation involves performing analyses and optimisations on the internal representation, followed by instruction selection and register allocation (each of which can affect the other) to associate concrete instructions, physical registers and runtime stack locations with each abstract instruction. After sizing, the final code space is allocated by calling malloc(), into which a runtime assembler generates executable native code. Target-specific parts of the instruction selection and assembly phases are generated automatically from a processor description file by the program cheeseburg, similar in spirit to the iburg and lburg family of code generator generators.

3 Implementation of the VPU

Figure 7 shows the four phases of compilation within the VPU. The function described by the client is converted into an internal abstract representation. Various optimisations are performed, followed by concrete instruction selection and register allocation. Native code is then generated in memory, with a little help from ccg.

3.1 Creation and analysis of abstract code

This phase has the following goals:

• create an abstract representation of the input function;
• verify the internal consistency of the function;
• perform architecture-neutral optimisations on the representation;
• resolve ambiguities in the stack arising from nonlinear control flow;
• eliminate dead instructions (unreachable or which compute unused values);
• create optimal conditions for the following register allocation phase.

3.1.1 The abstract representation

A function is represented as a doubly-linked list of abstract instructions. Each instruction contains pointers to its input stack and output stack. The stack representation is similar to that described in [7] and consists of a linked list of stack locations. The "tail" of the stack is shared between successive instructions (rather than recreating a complete list of stack locations for each instruction), as illustrated in Figure 8. (A hypothetical C++ sketch of these structures follows the property list below.)

Figure 8 (diagram): Internal representation of the abstract instructions and their input and output stacks. In this example the stack initially contains just temporaries and arguments. The two Ld instructions push new locations onto the head of the stack; the tails of their respective output stacks point to the first location of their input stacks. The Add instruction consumes two input stack elements and pushes a new location for the result; the tail of its output stack therefore points to the third location of its input stack. The StTmp instruction has no stack effect and hence its input and output stacks are identical (and refer to the same location). The Drop instruction pops the first element off the stack; its output stack is therefore just the tail of its input stack.

This representation not only reduces memory requirements but also guarantees properties that are critical to the VPU's compilation process:

• the lifetime of any value on the stack is explicit: it begins with the instruction that creates the corresponding location and ends with the instruction that removes the location from its output stack;

• any two elements on the stack that represent the same logical value will also share identity (a single stack location object represents the given value for all instructions that might reference it): a single location object represents a given value for the entirety of its lifetime.
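The paper does not give the VPU's actual declarations; the following is a hypothetical C++ sketch of the shared-tail representation just described, with invented names (Location, Insn, pushResult, dropTop).

    // Hypothetical sketch of the Section 3.1.1 representation: instructions in a
    // doubly-linked list, each pointing at an input and an output stack of
    // Location objects whose tails are shared between successive instructions.
    struct Location {
      int        cls;          // class assigned during type analysis (int, float, ...)
      bool       sideEffect;   // the 's' attribute used later by dead-code removal
      Location  *tail;         // next-deeper stack entry, shared across instructions
    };

    struct Insn {
      int        op;           // Ld, Add, Drop, Bt, ...
      Insn      *prev, *next;  // the doubly-linked program
      Location  *inputStack;   // emulation stack before this instruction
      Location  *outputStack;  // emulation stack after it
    };

    // A producing instruction (e.g. Ld) pushes one new Location whose tail is the
    // unchanged input stack; Drop's output stack is simply its input stack's tail.
    Location *pushResult(Location *in) { return new Location{0, false, in}; }
    Location *dropTop(Location *in)    { return in->tail; }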

Figure 9 (diagram): Contradictions in the stack caused by control flow. When two or more flows of control merge, the VPU ensures that stack location identity is preserved between the branch and the destination. Whenever conflicts are detected (in this case because a loop iteration variable is kept on the stack) a new location is allocated and replaces the two original conflicting locations.

3.1.2 Control flow analysis

A control flow graph is constructed for the abstract representation. This graph is a set of {source → destination} tuples formed from the union of explicit control flow and that implied by linear flow into a label (represented by a Define instruction):

    {branch → label} ∪ {insn[i−1] → label[i]}

The graph is used primarily to detect and correct location ambiguities introduced by loops. This situation occurs, for example, when a loop uses a temporary location on top of the stack as an iteration variable. The output stack (at the point of the backwards branch) will not represent the same set of locations as the input stack (at the head of the loop). Figure 9 illustrates the problem. The VPU resolves this situation by replacing the two "colliding" locations with a newly-allocated location, which restores location identity over the lifetime of the loop.
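Continuing the hypothetical sketch above, the edge set described at the start of this subsection can be collected in a single pass over the instruction list. The predicates and the branchTarget() helper are invented stand-ins for whatever tests the real implementation uses.

    #include <set>
    #include <utility>

    // invented stand-ins for the real implementation's tests
    bool  isBranch(Insn *);                // Br, Bt or Bf
    bool  isUnconditionalBranch(Insn *);   // Br
    bool  isDefine(Insn *);                // label definition
    Insn *branchTarget(Insn *);            // the Define an (in-scope) branch refers to

    typedef std::set< std::pair<Insn*, Insn*> > FlowGraph;

    // Hypothetical sketch: the flow graph as a set of {source -> destination}
    // pairs, built from explicit branches plus linear fall-through into a label.
    FlowGraph buildFlowGraph(Insn *first) {
      FlowGraph edges;
      for (Insn *i = first;  i;  i = i->next) {
        if (isBranch(i))
          edges.insert(std::make_pair(i, branchTarget(i)));   // {branch -> label}
        if (i->next && isDefine(i->next) && !isUnconditionalBranch(i))
          edges.insert(std::make_pair(i, i->next));           // {insn[i-1] -> label[i]}
      }
      return edges;
    }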
3.1.3 Type analysis

Simple analysis is performed on the program to determine the input types of each instruction. This is necessary, for example, to unambiguously associate a concrete arithmetic instruction with a virtual instruction that has only one form (which might represent either an integer or floating point operation), and also to ensure that instructions having restricted types (shift instructions, memory references, etc., which are only defined for integer operands) are being used correctly.

For each instruction the types of the input arguments (if any) are verified to ensure that they are consistent with each other and with the output type of the instruction; if the output type is not known then it is inferred from the input types. The output type is stored in the instruction's output location for use in checking the input types of subsequent instructions.

The type checking phase also determines the "class" (void, integer, float, condition code, etc.) of the instruction's result. This class is stored in the output location for use during instruction selection. The combination of input classes and output class for a given instruction will be referred to below as the mode of the instruction.
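A hypothetical sketch of the per-instruction check described in the two paragraphs above, again reusing the invented Insn and Location types; opArity() and opResultClass() are stand-ins, and error handling is reduced to a single abort().

    #include <cstdlib>

    int opArity(Insn *);                       // invented: number of stack inputs
    int opResultClass(Insn *, int inputClass); // invented: explicit, or inferred from inputs

    // Hypothetical sketch of Section 3.1.3: verify that input classes agree,
    // infer the output class, and record it in the output location so that later
    // instructions (and instruction selection) can see it.
    void analyseTypes(Insn *first) {
      for (Insn *i = first;  i;  i = i->next) {
        int inputClass = 0;                    // 0 = "not yet known"
        Location *l = i->inputStack;
        for (int n = opArity(i);  n > 0;  --n, l = l->tail) {
          if (inputClass == 0)            inputClass = l->cls;
          else if (l->cls != inputClass)  abort();     // inconsistent operands
        }
        i->outputStack->cls = opResultClass(i, inputClass);
      }
    }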

3.1.4 Optimisations on the abstract representation

The VPU performs relatively few optimisations on the abstract program. The goal is to generate high-quality (but not necessarily optimal) code as quickly as possible. For a particular optimisation to be considered it must satisfy the following conditions:

• The VPU must be able to detect the opportunity for, and then implement, each optimisation in parallel with some other essential operation that would be necessary even for "unoptimised" compilation. In other words, all optimisations must be "piggy-backed" onto some other, required, traversal or manipulation of the abstract representation.

• Only optimisations that have a significant effect-to-cost ratio are considered.

• Global optimisations are not considered. (Their cost is always significant, requiring additional traversals of the abstract representation and/or the creation and maintenance of additional data structures.)

• Peephole optimisations, that transform particular sequences of instructions into a more efficient sequence, are also not considered. The costs associated with "pattern matching", and the need for an additional pass over the code, are not justified by the relatively small resulting gain in code quality [5].

Optimisations that do meet these criteria are constant folding, jump chain elimination and the removal of dead code. They can be performed (for example) in parallel with control flow or type analysis.

Constant folding is trivial. For a given instruction of arity n, if the topmost n elements of its input stack are never written and are associated with Ld instructions then:

• the n Ld instructions are deleted from the program;
• the constant result r of the operation is calculated;
• the original operation is transformed into Ld(r).

Dead code occurs when a basic block is unreachable or when a side effect-free sequence of instructions computes a result that is never used. Any instruction that has a side effect (such as a store) sets an attribute s on its input and output locations. For other operations of arity n, s is propagated from the output location to the topmost n locations on its input stack. During some subsequent traversal of the program, if s is unset for the input location of a Drop then both the Drop instruction and the instruction that generated the dropped location can be deleted from the program.³ (Since locations are shared between all instructions that refer to them, a Drop occurring along one control path will never delete an instruction generating a value that is used along an alternate path through the program.)

Elimination of unreachable code consists of finding the transitive closure of reachable blocks starting from the entry point of the function. Any block not marked as reachable can safely be deleted. The algorithm is trivial and can be performed in parallel with the construction of the flow graph.

Jump chain elimination is performed in parallel with the construction of the control flow graph. The transformations are as follows:

• Br+ → Bx(L) ⇒ Bx(L): pulls any destination branch forward into a referent unconditional branch; and
• Bx → Br+(L) ⇒ Bx(L): pulls an unconditional branch forward into any referent branch.

These transformations are applied repeatedly for each branch instruction in order to find the destination of the chain.

³ This is best done during a backwards traversal of the program, for example while assigning fixed and "constrained" registers to instruction output locations as described below.

3.1.5 Summary

At this point in the compilation we have:

• a linear program consisting of abstract instructions;
• an input stack and output stack attached to each instruction;
• a mode (type) associated with each instruction;
• a location in the emulation stack associated with each value used in the program (but not yet associated with any physical machine location);
• certain guaranteed conditions that simplify the subsequent phases of compilation, most importantly: no conflicts (contradictions in location identity) between the input stack at each label definition and the output stacks at each instruction that feeds control into the label.

3.2 Allocation of physical resources

This phase has the following goals:

• associate each sequence of one or more abstract instructions with a sequence of zero or more concrete machine instructions;
• determine the architectural characteristics and constraints that might affect register allocation (e.g., incoming/outgoing argument locations or register selections imposed by particular machine instructions);
• allocate machine resources (registers, physical stack locations) to each reified value in the emulation stack while respecting architectural constraints, avoiding move chains and minimising the number of concrete instructions generated for each abstract instruction.

3.2.1 Instruction selection

Instruction selection determines which concrete machine instructions should be emitted to implement a given abstract instruction in the program.

Instruction selection and register allocation are intimately related. The selection of instructions determines when (and possibly which) registers should be allocated. Register allocation in turn can cause spill and reload code to be inserted into the program, which in turn will use registers (which have to be allocated). In any case, register selection cannot begin until an initial instruction selection has determined:

• whether a given combination of operation and input/output modes is supported directly by the hardware or whether a sequence of machine instructions is required to synthesise the required operation;
• whether or not a given literal value can appear as an immediate operand (according to its size and whether the hardware supports an immediate in the corresponding operand position);
• whether an operation must reify an integer value in a register (for example, returning a logical value as the result of a function call).

Several approaches to instruction selection are possible. The simplest is to implement an exhaustive set of instruction emitters that covers all possible combinations of operation × operand mode(s). This approach has severe drawbacks:

• the number of emitters undergoes combinatorial explosion (from the number of possible permutations of operand modes);
• exhaustive case analysis is required to determine the correct emitter for a given combination of operation and operand(s) (switches inside switches inside...);
• the case analysis code is difficult (or even impossible) to generate automatically. It must be written (and maintained) manually;
• the resulting code generator is relatively simple, but large (because of a high degree of repetition) and slow (because of the many conditional branches and indirect jumps in the case analysis).

At the other extreme is the iburg approach, which transforms a formal, bottom-up rewrite grammar into a program that finds minimal cost covers of arbitrary subtrees in the intermediate representation. Each cover represents a (sequence of) machine instruction(s) to be emitted. In general, by maximising the number of tree nodes consumed by each cover the code generator minimises the number of instructions generated. This approach suffers from search complexity and the need for backtracking in the algorithm, both of which slow down code generation noticeably and progressively as more highly-optimised covers are added to the grammar to deal with obscure cases. (This is especially significant when all other phases of compilation are designed to minimise compilation time.)

The VPU takes an intermediate approach. It uses a table-driven, non-backtracking algorithm and a few simple heuristics to determine an optimal instruction sequence for a given combination of operation and input/output modes (in effect, a minimal cost cover for a "subtree" of depth one in the intermediate representation).
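Figure 10 (below) encodes an operation's result and operand classes as the numeric signature Tr*16 + Ta*4 + Tb. The fragment that follows is a hypothetical sketch of the non-backtracking lookup such a table makes possible; EmitterTable and forceNextOperandIntoRegister() are invented names, and the precise retry order is the one described in the text after Figure 11.

    #include <cstdlib>
    #include <map>

    typedef void (*Emitter)(Insn *);
    typedef std::map<int, Emitter> EmitterTable;   // one table per operation: sig -> emitter

    // Invented stand-in for the coercion described below; returns false once
    // every operand is already in a register.
    bool forceNextOperandIntoRegister(Insn *);

    // Hypothetical sketch: table-driven selection of an emitter for one abstract
    // instruction, retrying with operands progressively coerced into registers.
    Emitter selectEmitter(Insn *i, EmitterTable &table) {
      for (;;) {
        Location *a = i->inputStack;
        Location *b = a ? a->tail : 0;
        int sig = 16 * i->outputStack->cls          // Tr
                +  4 * (a ? a->cls : 0)             // Ta
                +      (b ? b->cls : 0);            // Tb
        EmitterTable::iterator it = table.find(sig);
        if (it != table.end())
          return it->second;                        // later cached in the instruction
        if (!forceNextOperandIntoRegister(i))       // operands exhausted: the
          abort();                                  //   description file is incomplete
      }
    }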

Figure 10 illustrates the table used to select machine instructions for an abstract instruction. (The nodes of the "tree" in the VPU's abstract program are the locations in the simulation stack rather than the operations themselves.) These tables are generated trivially from a processor description file, part of which is shown in Figure 11.

    mode(op) = Tr*16 + Ta*4 + Tb        V = 0, R4 = 1, I4 = 2, CC = 3, ...

    op    Tr  Ta  Tb   mode       gen
    add   R4  R4  R4   111 = 21   { add   $0, $1, $2 }
          R4  R4  I4   112 = 22   { addi  $0, $1, $2 }
          R4  I4  R4   121 = 25   { addi  $0, $2, $1 }
          CC  R4  R4   311 = 53   { addi. $0, $1, $2 }
    sub   R4  R4  R4   111 = 21   { sub   $0, $1, $2 }
          R4  R4  I4   112 = 22   { subi  $0, $1, $2 }
          R4  I4  R4   121 = 25   { subi  $0, $2, $1 }
          CC  R4  R4   311 = 53   { subi. $0, $1, $2 }
    ...

Figure 10: Table-driven instruction selection in the VPU. The output and input operand mode(s) are encoded as a numeric signature. A table associated with each operation maps mode signatures onto emitter functions, each of which deals with one particular combination of modes. (In this diagram the function pointers are elided and instead the assembler template within the emitter is shown in its place.)

    RI4: Add(RI4, LI4)  { #[  addi  r($0), r($1), $2     ]# }
    RI4: Add(RI4, RI4)  { #[  add   r($0), r($1), r($2)  ]# }
    CCR: Cmp(RI4, LI4)  { #[  cmpi  r($1), $2            ]# }
    RI4: Cmp(RI4, LI4)  { #[  cmpi  r($1), $2            ]#
                          setcc($0, insn->op) }
    ...

Figure 11: Part of the processor description for PowerPC. Each line describes the instruction(s) to generate for a given combination of input and output modes (such as RI4, representing a 4-byte integer register). The assembler between '#[' and ']#' delimiters is converted into code that generates the corresponding binary instructions (by the ccg preprocessor, see Section 3.3). The positional arguments refer to the modes in each "rule" and are replaced with expressions representing the physical location (register, stack offset, constant) corresponding to that mode. (setcc is a function that emits code to place 0 or 1 in a register depending on the state of the condition codes and a given logical relation.)

For a given operation, the input and output mode(s) are combined into a numeric signature. Instruction selection searches the table associated with the operation to find an emitter function matching the signature (which it stores in the instruction for use during final assembly). If no emitter is found (which means the mode is illegal) then the first input operand that is neither a register nor a constant is forced into a register and the search repeated. If no match is found with only register and literal inputs then the first non-register operand is converted to a register and the search repeats. If there is still no match when all operands are in registers then the table (which must provide register-register modes for all instructions) is necessarily incomplete, indicating an error in the machine description file itself.

This algorithm is much faster than BURG-style instruction selection and yet results in a similar quality of generated code on RISC processors.

After instruction selection we know precisely:

• the final class (constant, integer/floating point register, void, etc.) of each location in the emulation stack (and hence the required mode for every abstract operation);
• the locations for which machine registers must be allocated;
• the emitter function corresponding to each operation for its particular mode (cached in the instruction for use during final code generation, as described below).

3.2.2 Register allocation

Before final register allocation can begin, the code generator must satisfy any constraints imposed by the architecture on the choice of registers. Three kinds of constraints must be dealt with:

• input constraints: for example, on the Pentium we have no choice but to place dividends in register eax;
• output constraints: for example, on the Pentium we have no choice but to retrieve the quotient and remainder from the register pair eax:edx;
• volatile registers: for example, on the PowerPC registers r3 through r12 are clobbered across function calls.

(The need to pass outgoing arguments, and find incoming arguments, in particular registers is just a particular combination of the above constraints.)

A separate pass is made through the program to preallocate constrained registers in their associated emulation stack locations. The bulk of the algorithm is as follows. For each instruction, in reverse order (from the last to the first):

• if the input is required in a particular register and this register is not flagged as clobbered in the instruction, then
  – assign the register to the instruction's output location;
• if the instruction clobbers one or more registers, then iterate over the instruction's output stack,
  – adding the register(s) to the set of clobbered registers for each emulation stack location (final register allocation will avoid allocating a register to a given location if it is marked clobbered in that location); and
  – removing any preallocated register for the location if it coincides with one of the clobbered register(s).

Figure 12 (diagram): Allocation of constrained registers. An initial backwards pass is made through the program. The final Ret instruction requires its input in r3 (on the PowerPC). The earlier call instruction requires its two arguments in r3 and r4 and also clobbers registers 5 through 10 in all locations beneath them on the stack (represented here by the mask '!fe').

Final register allocation can now be performed. The allocator creates a bit mask for each register class (integer, float, etc.) representing the set of available registers in that class and then iterates (forwards) over the program. For each instruction:

• if the instruction consumes inputs then add any registers associated with its input locations to the appropriate mask;
• if the instruction generates a value in a register and the output location has not yet been allocated a register, then remove a register from the appropriate mask and assign it to the output location;
• if the instruction occurs at a basic block boundary (branch, label definition or call) then rebuild the register masks by
  – resetting them to their initial state and
  – iterating over the instruction's output stack, removing all registers encountered from the mask.

This process is illustrated in Figures 12 through 14.

Figure 13 (diagram): The situation before final allocation begins. Since locations are shared, any register constraints and clobbered register sets are instantaneously propagated forwards and backwards to all instructions that might be affected.

Figure 14 (diagram): Final register allocation. A forward pass is made through the program to complete register allocation. Register allocation respects the clobbered register masks stored in each location. The presence of the call instruction sets this mask to !fe for the lowest (shown) location in the stack, thereby preventing a call-clobbered register being allocated to it; instead it is allocated the call-saved register r31.

3.2.3 Register spill and reload

If the register allocator runs out of available registers then it must choose a register to free up for allocation, by spilling it into the stack and then reloading it later (sometime before the instruction that uses its value). It is difficult to determine the optimal choice of register to spill without employing expensive algorithms; however a good choice (and frequently optimal) is to spill the register that is most distantly used (MDU). An excellent approximation to the MDU register is available immediately to the allocator in the emulation stack attached to each instruction. The deepest location containing a register of the required class typically corresponds to the MDU. The allocator therefore scans the stack (below the location for which it is trying to allocate) to find this register and then inserts code just before the current instruction to spill it into a stack location. This is illustrated in Figure 15.

Figure 15 (diagram): Spilling a register. The instruction Bar requires its input (a constant) in a register but none are available. The allocator scans the input stack to find the deepest register of the required class, in this case r31. The corresponding location is changed to refer to a location in the runtime stack (frame offset 8 in this example) and the register reused.

Register reload is performed implicitly by rerunning instruction selection for instructions that consume spilled locations. If the new mode is legal then the instruction can consume the value directly from the spilled location and no further action is needed. Otherwise the selection algorithm will convert the spilled input argument(s) to register modes (as described earlier), insert a 'move' pseudo-instruction into the program to mark the reload, and reallocate registers for the newly-inserted instruction. Figure 16 illustrates this process.

Figure 16 (diagram): Reloading a register. The spilled location no longer represents a legal mode for its subsequent use in the Bar instruction. The instruction selection algorithm reconverts the location into a register (such that Bar's mode is legal), reallocates a register to hold the reloaded value and inserts a Move pseudo-instruction into the program to effect the reload.

3.3 Final assembly

All that remains is to iterate over the code and call the emitter function attached to each abstract instruction in the program. (The emitter function appropriate for a given operation and operand modes is found during instruction selection and stored in the abstract instruction to avoid having to rescan the tables.) The code generator performs this iteration twice.

The first iteration generates code but does not write any instructions into memory. Instead the program counter (PC) is initialised to zero and the emitter called for each instruction in turn; for each Define instruction, the value of the PC is stored in the associated local label. At the end of this first iteration the PC will contain the size of the final code and each local label will contain the offset of the label relative to the first instruction in the generated code. Memory is then allocated for the final code (by calling malloc() or some other, application-defined memory allocator) at some address M. Each local label then has M added to its (relative) value to convert it to its final, absolute address. Finally, the PC is set to M and a second iteration made over the program to generate binary instructions in memory at their final locations.

The assembler templates for binary instructions are written using a runtime assembler generator called ccg. A full description of it is beyond the scope of this paper, but it should be noted that the cost of assembling binary instructions using ccg is very low: within the emitter functions, an average of 3.5 instructions are executed for each instruction generated in memory.

4 Evaluation

At least five metrics are important when evaluating a dynamic code generator: the size of the code generator itself, its compilation speed, the memory requirements during compilation, the size of the generated native code and the execution speed of that code.

The VPU is a little over 10,600 lines of C++ source code. Table 1 shows the size of the compiled library (without symbols): about 170 KBytes and 125 KBytes on PowerPC and Intel architectures, respectively.

    CPU        OS          libvpu.a   emit.o
    PowerPC    Darwin       158,725   52,212
    PowerPC    GNU/Linux    175,612   52,088
    Intel 386  GNU/Linux    124,791   31,374

Table 1: Compiled size of the VPU for 10,600 lines of C++ source code. Approximately one third of the compiled size, but less than 5% of the source code, is accounted for by the instruction emitters (inlined calls to ccg macros). With a little work the size of these emitters could be reduced significantly (by replacing the inline-expanded macros with function calls) at the cost of slightly slower final instruction assembly.

Table 2 shows the compilation speed, averaged over 16 different functions of various sizes, compiled (and then free()ed) 10,000 times in a loop. On three-year-old PowerPC hardware the VPU compiles 288,000 virtual instructions per second (generating a little over 1 MByte of code), and about 656,000 instructions per second (for a little over 1.8 MBytes of native code) on Intel Pentium. On current hardware the figures are 610,000 instructions/second (2.5 MBytes of native code) and 1.6 million instructions/second (over 4 MBytes of native code) on PowerPC and Pentium, respectively.

    CPU         clock      v-insns/sec   binary/sec
    PowerPC G3  400 MHz        288,400   1.1 MB
    PowerPC G4  1 GHz          610,000   2.5 MB
    Intel P3    1 GHz          656,108   1.8 MB
    Intel P4    3.6 GHz      1,611,111   4.1 MB

Table 2: Compilation speed. The third column shows the number of virtual instructions compiled per second, and the final column the amount of native code generated per second. These figures were calculated by compiling 20 different input functions (of between 7 and 80 virtual instructions) 10,000 times in a loop. A total of 4,350,000 virtual instructions were compiled into 17.5 MBytes (PowerPC) or 11.7 MBytes (Intel) of native code, taking between 15 seconds (on the slowest machine) and 2.7 seconds (on the fastest).

Figure 17 shows the generated code size plotted as a function of the number of virtual instructions per function for a much larger program (a dynamic compiler and runtime support for a complete, Lisp-like programming language). The native code size is approximately 4.4 bytes per virtual instruction on the PowerPC.

Figure 17 (plot): Compiled native code size c (bytes) per n virtual instructions, for each function of a medium-sized program (a Lisp-like dynamic language and its runtime system). The dotted line is a linear fit of the data points: c = 4.4n.

Figure 18 shows the memory requirements during compilation (for the same Lisp-like language and runtime system). Instantiating a VPU costs a little under 1 KByte of memory, with an additional 120 bytes required per virtual instruction added to that function. (All of this memory, other than that required to hold the native code, is released by the VPU once code generation is complete.)

Figure 18 (plot): Memory overhead m (bytes) per n virtual instructions in each function of a medium-sized program. The dotted line is a linear fit of the data points: m = 908 + 120n.

The code produced by the VPU is typically between 10% and 20% larger than that produced by 'gcc -O2' for an equivalent program. Numerical benchmarks run at between 90% and 115% of the speed of the equivalent programs compiled with 'gcc -O2'.

The VPU has been used to implement many different kinds of language runtime support; for example, dynamic binding (for dispatch to virtual functions) with an inline cache. Dispatching through a VPU-generated inline cache costs approximately 1.66 times the cost of a statically-compiled C function call.

5 Conclusions

The VPU is a plug-and-play dynamic code generator that provides application support for runtime generation of C ABI-compatible native code. Integrating the VPU into any application is trivial, after which it can be used to generate arbitrary functions (from simple "partial evaluation" type optimisations through compiling scripting languages into native code for better performance). It is also an ideal component for the core of portable "just-in-time" compilers for dynamic languages, where a VPU-based dynamic code generator can be added with very little work.

Several projects are currently using the VPU aggressively. It is the execution engine for the YNVM [10], a dynamic, interactive, incremental compiler for a language with C-like semantics, a Lisp-like syntax (in other words, a C compiler in which programs and data are the same thing) and the same performance as statically-compiled, optimised C. The JNJVM [11] is a highly-reflexive Java virtual machine built entirely within the YNVM that uses the VPU directly to generate code for Java methods. Outside the domain of languages, the YNVM (with a VPU inside) has been used to create C/SPAN [8], a self-modifying web cache that modifies its cache policies (by dynamically recompiling new ones) in response to fluctuations in web traffic and network conditions.

References

[1] http://www.cminusminus.org

[2] http://www.cs.virginia.edu/zephyr/papers.html

[3] http://cs1.cs.nyu.edu/leunga/www/MLRISC/Doc/html/INTRO.html

[4] http://research.sun.com/research/self/papers/papers.html

[5] M. Chen, K. Olukotun, Targeting Dynamic Compilation for Embedded Environments. 2nd USENIX Java Virtual Machine Research and Technology Symposium (Java VM '02), San Francisco, California, August 2002, pp. 151–164.

[6] D. R. Engler, VCODE: a Retargetable, Extensible, Very Fast Dynamic Code Generation System. SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 1996, pp. 160–170.

[7] A. Krall, Efficient JavaVM Just-in-Time Compilation. International Conference on Parallel Architectures and Compilation Techniques, Paris, 1998, pp. 205–212.

[8] F. Ogel, S. Patarin, I. Piumarta and B. Folliot, C/SPAN: a Self-Adapting Web Proxy Cache. Workshop on Distributed Auto-adaptive and Reconfigurable Systems, ICDCS 2003, Providence, Rhode Island, May 2003.

[9] I. Piumarta, CCG: A Tool For Writing Dynamic Code Generators. OOPSLA '99 Workshop on Simplicity, Performance and Portability in Virtual Machine Design, Denver, CO, November 1999.

[10] I. Piumarta, YNVM: dynamic compilation in support of software evolution. OOPSLA '01 Workshop on Engineering Complex Object Oriented Systems for Evolution, Tampa Bay, Florida, October 2001.

[11] G. Thomas, F. Ogel, I. Piumarta and B. Folliot, Dynamic Construction of Flexible Virtual Machines, submitted to Interpreters, Virtual Machines and Emulators (IVME '03), San Diego, California, June 2003.

[12] http://www-sor.inria.fr/projects/vvm/realizations/ccg/ccg.html

[13] http://www.gnu.org/software/lightning/lightning.html