Optimizations in the Cibyl Binary Translator for J2ME Devices

Optimizations in the Cibyl binary translator for J2ME devices Simon Kagstr˚ om¨ ,Hakan˚ Grahn, and Lars Lundberg Department of Systems and Software Engineering, School of Engineering, Blekinge Institute of Technology Ronneby, Sweden {ska, hgr, llu}@bth.se Abstract ing Cibyl, our goals have been to produce compact translated code with performance close to native Java code for The Java J2ME platform is one of the largest software plat- common cases. Compared to implementing a Java byte- forms available, and often the only available development code backend for a C/C++ compiler, the binary translation platform for mobile phones, which is a problem when port- approach can require less engineering, since Java bytecode ing C or C++ applications. The Cibyl binary translator (which is type-safe and modeled for the characteristics of targets this problem, translating MIPS binaries into Java high-level Java code) might not be a good match for the bytecode to run on J2ME devices. This paper presents compiler structure. The general design of Cibyl has been the optimization framework used by Cibyl to provide com- described in an earlier paper [8], and this paper focuses on pact and well-performing translated code. Cibyl optimizes optimizations made to reduce the size and improve the per- expensive multiplications/divisions, floating point support, formance of the translated binaries. function co-location to Java methods and provides a peep- hole optimizer. The paper also evaluates Cibyl perfor- The optimizations we employ for Cibyl share some sim- mance both in a real-world GPS navigation application ilarities with regular compiler optimizations, e.g., use of where the optimizations increase display update frequency function inlining and constant propagation, but is also sig- with around 15% and a comparison against native Java nificantly different. Since the GCC compiler has already and the NestedVM binary translator where we show that optimized the high-level C code, the goal of the Cibyl bi- Cibyl can provide significant advantages for common code nary translator is to make the translation into Java bytecode patterns. as efficient as possible. Because of this, the binary translation optimizations we apply are mostly local in scope, act- ing on a small set of adjacent instructions or the opposite 1. Introduction act on large entities such as entire functions. The main contributions of this paper are the following. A large majority of the mobile phones sold today We first describe the set of optimizations we make and how come with support for the Java 2 Platform, Micro Edition these improve size and performance. We then perform a (J2ME) [17], and the installation base can be measured in benchmark on an application ported with Cibyl to illustrate billions of units [14]. Mobile phones are also quickly be- the optimizations in a real-world setting. Finally, we com- coming more and more powerful, having processing speed pare Cibyl against native Java and another binary translator, and memory comparable to desktop computers of a few NestedVM [1] in a micro benchmark (an implementation years ago. J2ME is a royalty-free Java development envi- of the A* algorithm) to study performance characteristics ronment for mobile phones, and is often the only available in detail. The performance results show that Cibyl is sig- method of extending the software installed on the phone. nificantly faster than NestedVM and close to native Java This poses a severe problem when porting C or C++ appli- performance on the cases we target. cations to J2ME devices, which can often require a com- plete rewrite in Java of the software, possibly assisted by The rest of the paper is structured as follows. In Sec- automated tools [3, 6, 10, 11]. tion 2 we introduce the Cibyl binary translator. The main Cibyl is a binary translator which targets this problem. part of the paper then follows in Section 3 where we de- Cibyl translates MIPS binaries compiled with GCC into scribe the optimizations performed by Cibyl and Section 4 Java bytecode, and provides means to integrate with na- where we evaluate our optimizations. Section 5 describes tive J2ME classes. Cibyl therefore allows C and C++ pro- related work and finally we conclude and present future di- grams to be ported to run on J2ME phones. When design- rections in Section 6. Java/J2ME Generate syscall Syscalls.java API wrappers CRunTime.java Java Compiler etc C source C compiler / Compiled MIPS to java Jasmin Java preverifier, Midlet.jar code linker program binary translator assembler archiver Figure 1. Translation process in Cibyl. Gray boxes show unmodified third-party tools. 2. Cibyl if the program causes an exception, the call chain emit- ted by the JVM will be human readable. The 1-1 map- Cibyl uses the GCC [16] compiler to produce a ping also enables some optimizations, which will be dis- MIPS I [7] binary, which is thereafter translated into Java cussed later. We handle register-indirect function calls spe- bytecode by the Cibyl tools. Figure 1 shows the translation cially since Java bytecode does not support function point- process with Cibyl, where we use a set of tools to translate ers. To support function pointers, we generate a special the MIPS binary. Apart from the binary translator, which “call table” method that switches on the function address outputs Java bytecode assembly to Jasmin [12], we also and calls the corresponding method indirectly. Indirect have a stub code generator to produce stubs for calls into jumps, generated by GCC for example in some switch native Java code. When translating, Cibyl uses Java local statements, are handled similarly through a “jump table” variables to represent MIPS registers, which contributes to in the method which switches on possible jump destina- producing efficient and compact code compared to using tions in the method (which can be found through the ELF class member variables or static class variables. The MIPS relocation information). instruction set is well suited for binary translation, and most arithmetic instructions can be directly translated to a se- Compared to the integer instruction set, the MIPS float- quence of loading source registers (local variables) on the ing point instruction set is more difficult to translate [8]. Java operand stack, performing the operation and storing Floating point is therefore supported by a hybrid approach into the destination register. where we use the GCC soft-float support, but implement We use a Java integer array to represent memory as seen the runtime support functions using native Java floats. GCC by the MIPS binary. This means that 32-bit memory ac- generates calls to functions such as addsf3 for a float- cesses are performed efficiently by simply indexing the ing point add, passing an integer representation of the memory array with the address divided by four, but also source registers, and our hybrid approach implements this that 8- and 16-bit accesses need an extra step of masking through a Java method that converts the integer representa- and shifting the value in memory to get the correct result. tion to real floats, performs the add and returns the integer- Since a common pattern is to use the same base register representation of the result. This solution provides a trade- repeatedly with different offsets, we pre-calculate the ar- off between implementation complexity and performance, ray index and use special memory access registers for these with a very simple implementation but less performance cases. To reduce space, we also perform the more expen- than an implementation of the MIPS FPU instruction set. sive 8- and 16-bit accesses through functions in the runtime support instead of generating bytecode directly. Similarly, expensive arithmetic operations such as unsigned multiplications are also implemented in the runtime layer. Since 3. Optimizations 32-bit access is easiest to support efficiently, Cibyl focuses on performance for this case. Cibyl uses a 1-1 mapping between C functions and We perform a number of optimizations in Cibyl apart generated Java methods, which brings a number of ben- from the general code generation optimizations described efits. First, this mapping enables the J2ME profiler to above to improve performance and reduce the size of the produce meaningful output for Cibyl programs. Second, generated Java class files. 3.1. 32-bit multiplications/divisions ity through emitting bytecode instructions directly, replac- ing the function call. The MIPS instruction for multiplication always produce This allows us to reduce the overhead of floating point a 64-bit result, split in the hi/lo register pair. We trans- operations significantly, at the cost of a slightly larger translate this to Java bytecode by casting the source registers to lated file. With this approach, we output Java bytecode for 64-bit longs, perform the multiplication, split the resulting floating point operations directly to inline the GCC helper value and place it in hi/lo. As expected, this generates functions for soft floats. We use this functionality for other many instructions in Java bytecode, and is also fairly in- purposes as well, e.g., to throw and catch Java exceptions efficient and we perform it in the runtime support to save from C code. space. Often, however, only the low 32 bits of the value is ac- 3.4. Function co-location tually used and in these cases we perform a 32-bit multiplication which can be done natively in Java bytecode. This is One large source of overhead is method calls, i.e., trans- safe since the lo/hi registers are not communicated across lated C function calls. Since Cibyl uses local variables for functions in the MIPS ABI, so if the hi register is unused the register representation, it needs to pass up to 7 argu- we simply skip producing it.

Load more