Compiling Scheme to JVM Bytecode: a Performance Study

Bernard Paul Serpette
Inria Sophia-Antipolis
2004 route des Lucioles – B.P. 93
F-06902 Sophia-Antipolis, Cedex
[email protected]
http://www.inria.fr/oasis/Bernard.Serpette

Manuel Serrano
Inria Sophia-Antipolis
2004 route des Lucioles – B.P. 93
F-06902 Sophia-Antipolis, Cedex
[email protected]
http://www.inria.fr/mimosa/Manuel.Serrano

ABSTRACT

We have added a Java virtual machine (henceforth JVM) bytecode generator to the optimizing Scheme-to-C compiler Bigloo. We named this new compiler BiglooJVM. We have used this new compiler to evaluate how suitable the JVM bytecode is as a target for compiling strict functional languages such as Scheme. In this paper, we focus on the performance issue. We have measured the execution time of many Scheme programs when compiled to C and when compiled to JVM. We found that for each benchmark, at least one of our hardware platforms ran the BiglooJVM version in less than twice the time taken by the Bigloo version. In order to deliver fast programs, the generated JVM bytecode must be carefully crafted to benefit from the speedup of just-in-time compilers.

Categories and Subject Descriptors

D.3.1 [Programming Languages]: Language Classifications—applicative (functional) languages; D.3.4 [Programming Languages]: Processors—compilers; I.1.3 [Symbolic and Algebraic Manipulation]: Languages and Systems—evaluation strategies

General Terms

Languages, Experimentation, Measurement, Performance

Keywords

Functional languages, Scheme, Compilation, Java virtual machine

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. ICFP’02, October 4-6, 2002, Pittsburgh, Pennsylvania, USA. Copyright 2002 ACM 1-58113-487-8/02/0010 ...$5.00.

1. INTRODUCTION

Many implementors of high-level languages have already ported their compilers and interpreters to the Java Virtual Machine [21]. According to Tolksdorf’s web page [34], there are more than 130 compilers targeting JVM bytecode, for all kinds of languages. In the case of functional languages (henceforth FLs), several papers describing such systems have been published (Haskell [36], Scheme [7, 23, 9], and SML [5]).

JVM bytecode is an appealing target because:

• It is highly portable. Producing JVM bytecode, it is possible to “compile once and run everywhere”. For instance, the BiglooJVM Scheme-to-JVM compiler that is presented in this paper is implemented on Unix but the very same bootstrapped version also runs on Microsoft Windows.

• The standard runtime environment for Java contains a large set of libraries and features: widget libraries, database access, network connection, sound libraries, etc. A compiler which compiles a language to JVM bytecode could make these fancy features available to programs written in that language.

• A lot of programming environments and tools are designed for the JVM. Some have no counterpart for other languages (especially the ones that rely on the high-level features of Java such as its automatic memory management). Producing JVM bytecode allows these tools to be reused. For instance, we have already customized the Jinsight system [8] for profiling Scheme compiled programs.

• The JVM is designed for hosting high-level, object-oriented languages. It provides constructions and facilities such as dynamic type testing, garbage collection and subtype polymorphism. These runtime features are also frequently required for FLs.

1.1 Performance of JVM executions

In spite of the previously quoted advantages, compiling to JVM (or to Java) raises an important issue: what about performance? Is it possible to deliver fast applications using a FL-to-JVM compiler? Current Java implementations are known not to be very fast and, in addition, because the JVM is designed and tuned for Java, one might wonder whether it is possible to implement a correct mapping from a FL to JVM without an important performance penalty. The aim of this paper is to provide answers to these questions.

We have added a new code generator to the existing optimizing Scheme-to-C compiler Bigloo. We call this new compiler BiglooJVM. Because the only difference between Bigloo and BiglooJVM is the runtime system, BiglooJVM is a precise tool to evaluate JVM performance. We have used it as a test-bed for benchmarking. We have compared the performance of a wide range of Scheme programs when compiled to native code via C and when compiled to JVM. We have used 21 Scheme benchmarks, implemented by different authors, in different programming styles, ranging from 18-line programs to 100,000-line programs (see Appendix A for a short description). We have tested these programs on three architectures (Linux/x86, Solaris/Sparc and Digital Unix/Alpha). We have found the ratios BiglooJVM/Bigloo quite similar on Linux/x86 and Sparc. This is not surprising as both configurations use the same JVM (HotSpot™ 2.0). We have thus decided not to present the Sparc results in this paper.

It would be pointless to measure the performance penalty imposed by the JVM if our compiler is a weak one. In order to “validate” the BiglooJVM compiler we have compared it to other compilers. In a first step, we have compared it to two other Scheme compilers: i) Kawa [7] because it is the only other Scheme compiler that produces JVM code and ii) Gambit because it is a popular, portable and widely used Scheme-to-C compiler. In a second step, we have compared BiglooJVM to other compilers producing JVM code: iii) MLj [5] and, iv) Sun’s Java compiler. We present these comparisons in Section 2.

1.2 The Bigloo and BiglooJVM compilers

Bigloo is an optimizing compiler for the Scheme programming language [18]. It compiles modules into C files. Its runtime library uses the Boehm and Weiser garbage collector [6]. Each module consists of a sequence of Scheme definitions and top level expressions. After reading and expanding a source file, Bigloo builds an abstract syntax tree (AST) that passes from stage to stage. Many stages implement high-level optimizations, that is, optimizations that are hardware independent, such as source-to-source [27] transformations or data flow reductions [29]. Other stages implement simplification of the AST, such as the one that compiles Scheme closures [25, 30, 26]. Another stage handles polymorphism [20, 28]. That is, for each variable of the source code, it selects a type that can be polymorphic, i.e., the type denoting any possible Scheme value, or monomorphic, which can denote the type of primitive hardware values, such as integers or double precision floating point numbers. At the C code generation point, the AST is entirely type-annotated and closures are allocated as data structures. Producing the C code requires just one additional transformation: the expressions of the AST are turned into C statements. In BiglooJVM, the bytecode generator takes the place of this C statement generator (Figure 1).

[Figure 1: The architecture of the Bigloo compiler. Diagram labels: Scheme module, Parser, Optimizer, Scheme runtime; Bigloo C generator, C runtime, .o, a.out; BiglooJVM JVM generator, JVM runtime, .class, a.jar.]

Reusing all the Bigloo compilation stages is beneficial for the quality of the produced JVM bytecode and for the simplicity of the new resulting compiler. The JVM bytecode generator only represents 12% of the code of the whole compiler (7,000 lines of Scheme code for the JVM generator to be compared to the 57,700 lines of Scheme code for the whole compiler). The JVM runtime system is only 5,000 lines of Java code which have to be compared with 30,000 lines of Scheme code that are common to both JVM and C back-ends. This can also be compared with 33,000 lines of C code (including the Garbage Collector which is 23,000 lines long) of the C runtime system.

The first versions of BiglooJVM, a mere 2,000 lines of Scheme code, were delivering applications with extremely poor performance. For instance, the JVM version of the Cgc benchmark (see Appendix A), when run on a Linux/x86 architecture, was more than 100 times slower than the corresponding C version. Modifying the produced JVM class files, we have been able to speed up this benchmark and it is now “only” 3 times slower than the C version. Many small problems were responsible for the initial poor performance. For instance, we found that some sequences of JVM instructions prevent the just-in-time compiler (JIT) from compiling methods. We also found that on some 32-bit architectures, using 64-bit integers is prohibitive. We found that some JIT compilers (Sun HotSpot and others) have a size limit per function above which they stop compiling. Finally, we found that some JIT compilers refuse to compile functions that include functional loops, that is, loops that push values on the stack.

We present our experience with JVM bytecode generation in this paper. Most of the presented information is independent of the source language to be compiled so it could be beneficial to anyone wishing to implement a compiler producing JVM bytecode.

1.3 Overview

In Section 2 we present the performance evaluation focusing on the difference between Bigloo and BiglooJVM. In Section 3 we present the specific techniques deployed in the BiglooJVM code generator and the BiglooJVM runtime system. In Section 4 we present the important tunings we have applied to the code generator and the runtime system to tame the JIT compilers. In Section 5, we compare the BiglooJVM compiler with other systems.

2. PERFORMANCE EVALUATIONS

We present in this section the performance evaluation of BiglooJVM. In a first step, we compare it to Bigloo (that is, the C back-end of Bigloo). Then, we compare it with other Scheme implementations. Finally, we compare it with non-Scheme implementations that produce JVM bytecode.

2.1 Settings

To evaluate performance, we have used two different architectures:

• Linux/x86: An AMD/Thunderbird 800 MHz, 256 MB, running Linux 2.2, Java HotSpot™ of the JDK 1.3.0, used in “client” mode.
