Safe and Efficient Hardware Specialization of Java Applications
Total Page:16
File Type:pdf, Size:1020Kb
Safe and Efficient Hardware Specialization of Java Applications Matt Welsh Computer Science Division University of California, Berkeley Berkeley, CA 94720-1776 USA [email protected] Abstract take full advantage of these interfaces, instead fo- cusing on the other aspects of performance, such as Providing Java applications with access to low-level compilation [18], garbage collection [1], and thread system resources, including fast network and I/O inter- performance [2]. faces, requires functionality not provided by the Java Traditionally, Java applications make use of low- Virtual Machine instruction set. Currently, Java appli- level system functionality through the use of native cations obtain this functionality by executing code writ- ten in a lower-level language, such as C, through a native methods, which are written in a language such as C. method interface. However, the overhead of this inter- To bind native method code to the Java application, face can be very high, and executing arbitrary native a native method interface is used, which has been code raises serious protection and portability concerns. standardized across most JVMs as Sun Microsys- Jaguar [37] provides Java applications with efficient tems' Java Native Interface [29]. However, the use of access to hardware resources through a bytecode spe- native methods raises two important concerns. The cialization technique which transforms Java bytecode first is performance: the cost of traversing the na- sequences to make use of inlined Jaguar bytecode which tive method interface can be quite high, especially implements low-level functionality. Jaguar bytecode is when a large amount of data must be transferred portable and type-exact, making it both safer and more across the Java-native code boundary. The second efficient than native methods. Jaguar requires that the is safety: invoking arbitrary native code from a Java target JVM or compiler recognizes Jaguar bytecode, application effectively negates the protection guar- which is a superset of the Java bytecode instruction set. antees of the Java Virtual Machine. These two prob- We describe two implementations of Jaguar: one based lems conflate, as programmers tend to write more on a static Java compiler, and the other which uses a application code in the native language to amortize standard JVM coupled with a Jaguar-enabled just-in- the cost of crossing the native interface. time compiler. The JIT compiler applies code patches We have developed a system called Jaguar [37] to the original Java bytecode, stored in the form of an- which provides applications with efficient and safe notations in the Java class file. We demonstrate that access to low-level system resources. This is accom- Jaguar-specialized Java applications perform as well as plished through a bytecode specialization technique C for a set of communication benchmarks, and that the in which certain Java bytecode sequences are trans- code patching technique involves minimal compile-time lated to low-level code which is capable of perform- overhead. ing functions not allowed by the JVM, such as direct memory access. Because this low-level code is in- lined into the Java application at compile time, the 1 Introduction overhead of the native interface is avoided. Also, aggressive optimizations can be performed on the Java [14] is becoming increasingly popular for combined application and inlined Jaguar code. large-scale server applications, including Internet Our original prototype of Jaguar, described services [27, 34], databases [16], and parallel pro- in [37], made use of a Java just-in-time (JIT) com- cessing [40, 24]. To obtain high performance, these piler which recognized certain bytecode sequences applications need to make use of specialized hard- and translated them to x86 machine code directly. ware and O/S interfaces, such as fast cluster net- This approach is both non-portable and error-prone, works [32], asynchronous I/O [23], and raw disk ac- requiring that the machine code sequences be tai- cess [17]. To date, few Java systems have striven to lored for the particular combination of JVM, JIT, and machine architecture. It also requires non- usually have special requirements for buffer alloca- trivial changes to the JIT compiler to recognize and tion. However, this requirement runs counter to the translate Java bytecode sequences. existing Java model in which all objects and arrays This paper presents a new design based on trans- are allocated from a single heap, managed by the lating Java bytecode to Jaguar bytecode, which is a JVM's garbage collector. superset of the Java instruction set. Jaguar byte- Native methods are the traditional mechanism code is used to represent low-level operations oth- by which these operations are implemented in Java. erwise unsupported by the JVM. Jaguar bytecode Native methods permit the binding of arbitrary contains two additional instructions | peek and code written in a lower-level language (such as C) poke | which are used to perform direct memory to the Java application. Apart from raising serious access. Application programmers are not permit- protection and safety concerns, native methods are ted to make use of these instructions; rather, they not well-suited for providing fine-grained, efficient are restricted to use within Java-to-Jaguar trans- access to system resources. As we demonstrated lation rules. Because Jaguar bytecode is portable, in [37], the Java Native Interface itself imposes a type-exact, and inlined directly into the Java appli- large performance penalty, making them suitable cation, it is both safer and more efficient than using only for relatively long operations which pass only native methods for low-level operations. a small amount of data across the Java-native code We present two implementations of Jaguar based boundary. Furthermore, native methods limit ex- upon this design. The first makes use of a static pressiveness, as native code can only be bound to Java compiler to translate Jaguar bytecode to ma- method invocation. Field access, operators, and chine code. The second uses a modified JIT com- other Java operations cannot hook into the native piler which applies a \patch" to produce Jaguar code interface in any way. bytecode at compile time; this mechanism requires Several projects have attempted to bind fast Jaguar support only in the JIT compiler, and is communication layers into the Java environment fully compatible with unmodified JVMs. Both ap- through the use of native methods. Native method proaches relegate the (relatively expensive) trans- bindings to MPI [13] and PVM [33] have been de- lation from Java to Jaguar bytecode to a portable scribed, however, neither of these have considered front-end compiler, minimizing the complexity im- performance issues with respect to obtaining low la- pact on the back-end compilers. tency or high bandwidth. Little work has been done to support other types of I/O, although Sun Mi- crosystems is currently considering new APIs sup- 2 Motivation and Background porting asynchronous I/O interfaces [25]. Another approach is to extend the JVM or Much previous work has addressed the CPU- Java compiler to provide special support for certain related aspects of Java performance, including com- classes, such as communication buffers [9], bulk I/O pilation [28, 18, 11], thread synchronization [2], and interfaces [4], or JVM-internal data structures [8]. garbage collection [30]. However, Java I/O per- While this technique can provide a point solution formance remains largely uninvestigated. Imple- for particular communication and I/O interfaces, it menting high-performance communication and I/O lacks generality and requires that significant new in Java requires two classes of operations which functionality be incorporated into each JVM imple- the Java Virtual Machines does not directly sup- mentation. In addition, this functionality must it- port. The first is direct access to I/O device inter- self be trusted, much in the same that native meth- faces, including user-level network interfaces [35, 32] ods are. and raw disk I/O [17]. These mechanisms gener- The Jaguar approach is motivated by the obser- ally require the use of specialized system calls, or vation that the sort of low-level operations required even memory-mapped interfaces which circumvent for enabling high-performance communication and the O/S kernel entirely. The second is the use of I/O are generally short and easily expressed as a se- explicitly-managed memory regions. For example, quence of simple instructions (e.g., accessing a par- user-level network interfaces often require that com- ticular memory address, or invoking a system call). munication buffers be pinned to physical memory This suggests that such operations can be inlined for direct access by the NI hardware; these pages into the compiled Java bytecode stream for perfor- must be allocated from a special pool or pinned dy- mance, and that some form of static analysis could namically by the O/S or NI [36]. Memory-mapped be performed to guarantee safety or type-exactness. files are often used for I/O, and raw disk interfaces Jaguar is based on the concept of bytecode spe- Instruction Stack effect Java Machine <type>peek ..., int address Bytecode Jaguar Code ) Translation ..., <type> value Rules <type>poke ..., int address, <type> value ... ) Jaguar Front-end Bytecode Back-end Compiler Compiler Figure 2: Effect of Jaguar instructions on the Java stack. <type> represents one of the Java primitive types byte, short, int, or long. Figure 1: The design of the Jaguar system. A front-end compiler applies a set of translation rules to a Java class- file to produce a Jaguar classfile, which is then compiled to machine code. Our design is illustrated in Fig- by a back-end compiler to machine code. ure 1. An example of a Jaguar translation rule is to convert a call to a particular method (repre- cialization in which Java bytecode is specialized sented in Java bytecode using the invokevirtual, at compile time to perform low-level operations.