Exploring Performance Improvement for Java-based Scientific Simulations

Xiaorong Xiang∗, Yingping Huang, Gregory Madey
Department of Computer Science and Engineering
University of Notre Dame, Notre Dame, IN 46556 USA
Telephone: 574-631-7596
Email: [email protected], [email protected], [email protected]

Abstract

There has been growing interest in using the Java programming language in scientific and engineering applications. This is because Java offers several features which other traditional languages (C, C++, and FORTRAN) lack, including portability, a garbage collection mechanism, built-in threads, and the RMI mechanism. However, the historically poor performance of Java has stopped it from being widely used in scientific applications. Although research and development on Java, resulting in JIT compilers, JVM improvements, and high performance compilers, has done much for runtime environment optimization and has significantly speeded up Java programs, we still believe that creating hand-optimized Java source code, and optimizing code for speed, reliability, scalability, and maintainability, are also crucial during the program development stage. In this paper, we demonstrate an agent-based simulation model that is built using the Java programming language and the Swarm simulation library. We analyze the performance of this simulation model from several aspects: runtime optimization, database access, object usage, and parallel and distributed computing. This simulation model possesses most of the characteristics which general scientific simulations have. These techniques and analysis approaches can also be generally used in other scientific simulations using Java.

1 Introduction

C, C++, and FORTRAN have traditionally been used for modeling scientific applications. Since the Java programming language was introduced by Sun Microsystems in the mid-1990s, there has been growing interest in using it in scientific and engineering applications. The reason for this is that Java offers several features which other traditional languages lack. Scientists use various platforms (such as Windows, Unix, and MacOS) for their scientific studies. It is impossible to deploy an application written in C, C++, or FORTRAN from one platform to another without rebuilding the application or changing the code. One of the most attractive features of Java is its portability: "write once, run anywhere." The Java runtime environment (JVM) provides an automatic garbage collection feature that reduces the burden of explicitly managing memory for the programmer. The Java built-in threads implementation and Java's Remote Method Invocation (RMI) mechanism make it easy to support parallel and distributed computing.

Although there are many attractive features provided by the Java language, and the J2EE architecture makes Java a potential language for scientific applications, performance remains a prime concern for program developers using Java. The way portability is implemented in the JVM imposes a penalty on performance. Ashworth (1999) [1] discussed several issues related to the use of Java for high performance scientific applications.

However, much research has been done to reduce the performance gap between Java and other programming languages. This research includes Just-in-Time (JIT) compilers that compile the bytecode into native code on-the-fly just before execution, adaptive compiler technology (Sun's HotSpot VM), third-party optimizing compilers that compile the Java source code to optimized bytecode, and high performance compilers that compile the Java source to native code for a particular architecture (e.g., the IBM High-Performance Compiler for Java for the RS6000 architecture). Bull (2001) [2] rewrote the Java Grande Benchmarks in C and FORTRAN. Bull also compared the performance between these languages in different Java runtime environments on different hardware platforms. The results demonstrate that the performance gap is quite small on some platforms.

Runtime environment optimization is an important aspect to study in order to achieve high performance. Determining and understanding the factors that affect the performance of scientific applications from a software engineering perspective, and identifying and eliminating the bottlenecks that limit scalability during the development stage, are also necessary for high performance scientific applications.

In this paper, several approaches for analyzing and improving the performance of a particular scientific simulation, which is used to study Natural Organic Matter (NOM), have been described. The NOM simulator was built using the Java programming language and the Swarm library from the Santa Fe Institute [3]. The NOM simulation model is a typical distributed, stochastic scientific application that uses an agent-based modeling approach. It generates a substantial data set that must be stored in a remote database and be able to be manipulated to produce useful information. The application simulates the behavior of a large number of molecules. An application of the NOM simulation model needs to run for a long time, often for days. Several aspects of the simulation model have been analyzed, including runtime optimization, database access, object usage, and parallel and distributed computing.

The NOM simulation model possesses the characteristics which typical scientific applications have. These techniques and analytical approaches can generally be used in other scientific applications. We expect that our experiences can help other scientific application developers find a suitable way of tuning and achieving higher performance for their applications.

2 Related works

More and more implementations of Java-based simulation environments and toolkits [4][5][6][3] indicate that the Java language and Java-based technologies have been, and will continue to be, widely used in simulation modeling and implementation.

Traditionally, a Java program is compiled into bytecode using a compiler, and a Java Virtual Machine (JVM) is needed to read in and interpret the bytecode. GCJ (http://gcc.gnu.org/java/), the GNU Compiler for the Java language, can compile Java source code into either bytecode or native code. GCJ has been integrated into GCC. Bothner (2003) [7] discussed the advantages, features, and limitations of GCJ in detail. Ladd (2003) [8] did several benchmarks using GCJ on the Linux platform and showed a performance gain of GCJ over other JVMs. The GCJ compiler is a project under development, and several limitations still exist; for example, GCJ cannot compile every Java program. Using GCJ, a Java application with the Swarm library can be compiled into native code, but doing so does not improve the performance.

There are many other runtime environment optimizers and high performance compilers. The IBM alphaWorks High-Performance Compiler for Java can be used on the OS/2, AIX, and Windows NT platforms. JOVE (http://www.instantiations.com/jove/) can only be used on Windows machines. The TowerJ environment (http://www.towerj.com), developed by Tower Technology Corporation, is another example of a high performance compiler; it is available for the Solaris and Linux platforms.

For large scale scientific applications written in Java, scalability can be improved using a parallel programming model. The parallel model can run with a single JVM in a shared memory multiprocessor system, or with multiple JVMs in a distributed memory system. The Java built-in threads mechanism is a convenient method for implementing parallelism in shared memory environments. However, for large scale applications that require large memory and CPU time, distributing the application over multiple JVMs in a distributed memory system, with some message passing mechanism for inter-VM communication, is a suitable way to address the requirements.

The standard Java Thread class is appropriate for the parallel programming paradigm in a single JVM environment. Since most scientific applications are CPU bound, the number of threads should match the number of processors in the hardware architecture in order to avoid context switching and achieve the best performance. Additionally, thread creation and destruction should be avoided by creating and managing a thread pool.

OpenMP [9], an open standard for shared memory directives, defines directives for FORTRAN, C, and C++. OpenMP provides a portable and scalable model that offers a simple and flexible interface for developing parallel applications in shared memory systems. JOMP [10] provides a set of OpenMP-like directives and library routines for supporting shared memory parallel programming in Java. It uses Java threads as the underlying parallel model and is most useful for parallelizing scientific applications at the loop level.

For distributed computing, Java provides a communication mechanism using sockets and RMI (Remote Method Invocation).
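The thread-pool guidance earlier in this section — one worker per processor, threads reused rather than created per task — can be sketched with the java.util.concurrent executors, which appeared in J2SE 5.0 after the hand-managed pools of this era; the class and method names below are illustrative, not code from the NOM model:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class PoolSketch {
    // CPU-bound work (sum of squares 1..n) split across a fixed-size pool
    // matched to the processor count, so threads are reused, not churned.
    static long sumSquares(int n) {
        int procs = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(procs);
        try {
            List<Future<Long>> parts = new ArrayList<>();
            for (int p = 0; p < procs; p++) {
                final int offset = p;
                // Each task sums the strided slice offset+1, offset+1+procs, ...
                parts.add(pool.submit(() -> {
                    long s = 0;
                    for (long i = 1 + offset; i <= n; i += procs) s += i * i;
                    return s;
                }));
            }
            long total = 0;
            for (Future<Long> f : parts) total += f.get();
            return total;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(PoolSketch.sumSquares(1000)); // prints 333833500
    }
}
```

Sizing the pool to `availableProcessors()` is exactly the context-switch-avoidance rule stated above for CPU-bound work.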

Java RMI [11] is a message passing paradigm based on the RPC (Remote Procedure Call) mechanism. RMI is primarily intended for use in the client-server model rather than the peer-to-peer communication model. On the other hand, the explicit use of sockets is too low-level for developing a parallel application [12].

In C, C++, and FORTRAN, an explicit message passing interface (MPI) [13] standard has been defined for supporting communication of an application in cluster environments. MPI is the most widely-used standard for inter-process communication in high-performance computing. MPICH [14] and LAM MPI [15] are two successful examples of portable MPI implementations in traditional languages. Programming with MPI is relatively straightforward because it supports the single program multiple data (SPMD) model of parallel computing, wherein a group of processes cooperate by executing identical program images on local data values.

In 1998, a group of researchers of the Java Grande Forum worked on a specification of an MPI-like application programming interface for message passing in Java (MPJ) [16]. The current implementations of MPJ can be separated in two ways: as a wrapper to existing native MPI libraries, or written in pure Java. NPAC's mpiJava [17] is an example of the wrapper approach, using the Java Native Interface (JNI) to execute a native call. The JMPI project [18] implements message passing with Java RMI and object serialization. The jmpi implementation [19] is built upon the JPVM system. MPIJ [20] is a Java-based implementation of MPI integrated with DOGMA (Distributed Object Group Metacomputing Architecture). An MPJ implementation in pure Java is usually slower than wrapper implementations over existing MPI libraries, but pure Java implementations are more reliable, stable, and secure [19]. Getov (2001) [12] did an experiment that used the IBM High Performance Compiler for Java (HPCJ), which generates native code for the RS6000 architecture, to evaluate the performance of MPJ on an IBM SP2 distributed-memory parallel machine. The results show that when using such a compiler, the MPJ communication components are as fast as those of MPI.

OptimizeIt is a Java/J2EE performance tuning tool developed by the Borland company. It is a commercial tool, and a trial version can be downloaded from their Web site (http://www.borland.com/optimizeit/). OptimizeIt can display information about heap allocation, garbage collection, active threads, and class loading in the form of graphs. The CPU sampling information is also displayed in a tree structure. These data can be exported to a formatted text file (HTML). Information about object allocation and deallocation can be viewed in a user-selected order. OptimizeIt is a powerful tool that detects memory leaks and CPU performance bottlenecks in Java applications.

3 Simulation Model

Natural organic matter (NOM) is a mixture of heterogeneous molecules that come from animal and plant material in the natural environment. It plays a crucial role in ecological and bio-geochemical processes such as the evolution of soils, the transport of pollutants, and the global biochemical and geochemical cycling of elements [21]. NOM, a prevalent constituent of natural waters, is highly reactive with mineral surfaces [21]. While NOM is transported through soil pores by water, it can be adsorbed onto or desorbed from mineral surfaces. Sorption of NOM is an important consideration in the treatment of drinking water. The evolution of NOM over time from precursor molecules to mineralization is an important research area in a wide range of disciplines, including biology, geochemistry, ecology, soil science, and water resources.

NOM, micro-organisms, and their environment form a complex system. The global phenomena of a complex system can often be observed by simulating the dynamic behavior of individual components and their interactions in the system. Complex systems often have emergent properties. The evolution of NOM over time from precursor molecules to eventual mineralization involves various molecular transformations. These transformations involve chemical reactions, adsorption, aggregation, and physical transport in soil, ground, or surface waters. In order to provide scientists a test bed for their theoretical analysis and experimental results in NOM study, a Web-based simulator was built. We expect that the system will help scientists better understand the NOM complex system by providing them with information for predicting the properties of the NOM system over time.

The NOM simulation model is built upon the J2EE architecture running on a distributed cluster. This cluster has multiple dual-processor PCs running the RedHat Linux 8.0 and Windows 2000 operating systems. The machines in the cluster include an HTTP server, simulation servers, database servers, a reports server, and a data mining server.

The NOM simulation system is an agent-based stochastic model that can model NOM, mineral surfaces, and microbial interactions near the surface of the soil. In the stochastic model of the evolution of NOM in discrete time and space, NOM is represented as a large number of discrete molecules with varying chemical and physical properties. Individual molecules can be transported through the soil medium via water flow, adsorb on the soil particle surfaces, and react with other molecules, micro-organisms, and the environment. Individual molecules are represented as agents with their own complex structures. These molecules move along with the water flow and react with each other in a 2D discrete grid, i.e., a rectangular lattice composed of multiple cells. Each molecule can occupy at most one cell, and each cell can host at most one molecule. During the execution of the simulation, each molecule may move to another location, experience adsorption to or desorption from a particular site, and chemically react with other molecules.

At the beginning of a simulation, simulation parameters are read from the database into the simulation engine. The total number of molecules in the simulation can be determined by specifying the discrete grid size and the molecule density. The simulation time, defined by the user, is divided into a very large number of equal, discrete, and independent time steps. In each time step, each molecule in the system is chosen in random order, and the behavior of this individual is determined by a set of rules.

4 Performance Analysis

We analyzed the performance of this simulation model using the OptimizeIt tuning tool.

4.1 Data Structure

Selecting the appropriate data structure is important in a scientific application. Different data structures have significant impacts on the performance, and no single data structure is appropriate for all situations. The best way to decide which type of data structure to use in an application is to create some benchmarks that reflect how the structure is used in the application.

In the NOM simulation model, each molecule object needs to be held in a collection. Individual molecule objects are accessed randomly at each time step. They can also be added to or removed from this collection. The Java 2 platform provides two types of List structures: ArrayList and LinkedList. The ArrayList is a random access list, and the LinkedList is a sequential access list. The same operation has very different performance characteristics for the different types of List. Positional access is an operation used to access objects through their index in the List. Positional access is a linear-time operation in a sequential access list, and a constant-time operation in a random access list. However, the remove operation for the random access list is more expensive than for the sequential access list [22].

An algorithm for manipulating random access lists can produce quadratic behavior when it is applied to sequential access lists. In the NOM simulation model, in order to randomize the order of accessing the molecules in a List, the List needs to be shuffled at the beginning of each time step. The Collections class in the Java 2 platform provides a shuffle method to randomize the order of objects in a List. The shuffle algorithm used in the implementation has linear time complexity for random access lists and quadratic complexity for sequential access lists. In order to avoid the expensive operation that would result from shuffling a sequential access list, the shuffle method converts the list into an array before executing the shuffle, and converts the shuffled array back into the list.

Three benchmarks have been created in order to test the performance, using different List structures and operations. These benchmarks reflect exactly how the data structures have been used in the NOM simulation model. In benchmark A, the MoleculeList is implemented using ArrayList, objects are accessed using get(), and the MoleculeList is randomized using shuffle(). In benchmark B, the MoleculeList is implemented using LinkedList, objects are accessed using get(), and shuffle() is used. In benchmark C, the MoleculeList is implemented using LinkedList, objects are accessed using an iterator, and shuffle() is used. In each benchmark, the add and remove operations are measured, and the overall performance is measured to reflect the NOM application behavior.

Figure 1 shows the relationship between the different types of List with different operations. It also illustrates the behavior of the different operations when the List size is doubled. In the NOM application, molecules are uniformly distributed on the grid; doubling the grid size, therefore, will double the molecule number and the List size.

The overall performance gain for ArrayList has been offset by the add and remove operations. By doubling the grid size, the time for randomly accessing the LinkedList increases more than 6 times. For this particular application, the ArrayList is the best choice for implementing the MoleculeList. It achieves a maximum 2.8 speedup in this benchmark with grid size 400 X 300. When the grid size becomes larger, the speedup increases.
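As a rough illustration of the positional-access gap that these benchmarks measure, the following micro-benchmark (hypothetical code, not the paper's benchmark) sweeps both list types by index after a Collections.shuffle:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
import java.util.Random;

public class ListBench {
    // Sum the list via positional get(): O(n) total for an ArrayList,
    // but O(n^2) total for a LinkedList, since each get(i) walks the links.
    static long sumByIndex(List<Integer> list) {
        long sum = 0;
        for (int i = 0; i < list.size(); i++) sum += list.get(i);
        return sum;
    }

    public static void main(String[] args) {
        int n = 20_000; // stand-in for the molecule count; the NOM grids are larger
        List<Integer> array = new ArrayList<>();
        List<Integer> linked = new LinkedList<>();
        for (int i = 0; i < n; i++) { array.add(i); linked.add(i); }

        // Collections.shuffle copies a large non-RandomAccess list to an array
        // first, so both shuffles run in linear time.
        Collections.shuffle(array, new Random(42));
        Collections.shuffle(linked, new Random(42));

        long t0 = System.nanoTime();
        sumByIndex(array);
        long tArray = System.nanoTime() - t0;

        t0 = System.nanoTime();
        sumByIndex(linked);
        long tLinked = System.nanoTime() - t0;

        System.out.println("ArrayList  indexed sweep: " + tArray + " ns");
        System.out.println("LinkedList indexed sweep: " + tLinked + " ns");
    }
}
```

Iterating the LinkedList with an iterator instead of get() (benchmark C) removes the quadratic walk, which is why the choice in the paper hinges on how the collection is actually accessed.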

Figure 1: Performance comparison of different operations for ArrayList and LinkedList; the simulation runs for 500 time steps. Top: grid size is 400 X 300. Bottom: grid size is 800 X 300.

4.2 Object reuse

Objects play a vital role in object-oriented programming languages such as C++ and Java. The Java Virtual Machine (JVM) can automatically manage memory using the "garbage collection" mechanism. Java developers can allocate objects as necessary without considering deallocation, while C++ programmers must manually specify where an object in the heap is to be reclaimed by coding a "delete" statement. Excessively creating objects not only increases the memory footprint and the CPU time spent on garbage collection, it also increases the possibility of memory leaks.

Sosnoski (1999) [23] showed that the time for object allocation in a Java program running on the JVM is 50 percent longer than in the equivalent C++ code. This is caused by the overhead of adding internal information, used to help the garbage collection process, when allocating objects in the heap.

Different JVMs — different versions of the Sun JVM and the IBM JVM — use various techniques to automatically discover objects that are no longer being referred to and to recycle the memory periodically. Despite the automatic nature of the garbage collection process, a potential disadvantage is that it adds an overhead that can affect program performance, even in those JVMs, such as Sun's HotSpot JRE, where the garbage collection job runs in a separate thread.

Although the JVM is responsible for reclaiming unused memory, Java programmers still need to put effort into making it clear to the JVM which objects are no longer being referenced [24]. For example, in some situations a programmer needs to set an object reference to null manually in order to dereference the object. Although the memory leaks that are common in C++ are less likely to happen in Java, they can still occur due to poor design or simple coding errors.

An elegant way of reducing the overhead of object creation and destruction, and of improving performance, is object reuse. It can also reduce the probability of potential memory leaks. In order to reuse a certain type of object, several steps need to be followed:

• Isolating objects
Due to the overhead of object reuse, such as managing the object pools, only objects that need to be created and destroyed frequently are suited to this technique. For a large scale application, a powerful profiling tool is necessary to help developers detect this kind of object; OptimizeIt has been chosen in the development process of the NOM simulation model. In the NOM simulation model, at each time step a certain number of molecules can enter or leave the system. The molecule objects need to be created and destroyed frequently, and they are candidates for potential reuse.

• Optimizing object size
The size of objects has an effect not only on the memory footprint but also on the CPU time. It is worthwhile to estimate the size of a particular object and the number of instances of a given class in memory. A trade-off can be made by either reducing the object size or keeping the precision.

• Reinitializing objects
Although Sun's HotSpot VMs radically improved the performance of object allocation and garbage collection, object allocation and instantiation still have a significant cost, especially when the object size is small. A micro-benchmark was created for testing the time of creation versus re-initialization of the Molecule objects in the NOM simulation model. In this benchmark, 1000 Molecule objects are created, then reinitialized to their original state. The average creation time for these Molecule objects is 53.85 milliseconds, while the average reinitialization time is 3.9 milliseconds.

• Creating an object pool
In order to reuse certain types of objects, these objects need to be kept in a collection in memory, the so-called object pool. An object pool is used to store a free list of objects. Generally, an object pool can be implemented using a Vector, LinkedList, ArrayList, or a raw array. Selecting a suitable type of data structure depends on the type of operations used for managing the pool. In the NOM simulation model, the object pool has been implemented as a First-In-First-Out (FIFO) queue. When a new molecule object needs to be created, the method first checks the object pool. If there is an available object in the pool, the object is removed from the queue and reinitialized; if there is no object available, a new object is created. When a molecule object leaves the system, it is added at the end of the queue. The data structure chosen for the object pool implementation is LinkedList.

Object reuse is a simple and elegant way to conserve memory and enhance speed. By sharing and reusing objects, processes or threads are not slowed down by the instantiation and loading time of new objects, or by the overhead of excessive garbage collection.

4.3 Database connection and database query

For an application written in the Java programming language, communication with a database can be accomplished through a Java Database Connectivity (JDBC) driver, with all database input/output via SQL (Structured Query Language). Database vendors provide their own JDBC drivers that conform to the common Java API defined by Sun. Using JDBC allows developers to change the database location, port, and database vendor with minimal changes in code. JDBC offers several advanced techniques that allow the programmer to write high performance queries [25]. These techniques are presented here to show their significant impact on performance and scalability.

• Prepared statements
Query processing is the process of resolving a SQL query. It can be broken down into three basic phases: query parsing, query plan generation and optimization, and plan execution [25]. The query parsing phase is a syntax-checking process for the string-based SQL query that ensures the query statement is legal. If a SQL statement, or a set of similar SQL statements, needs to be executed repeatedly, the cost of the parsing process can be reduced by caching the previously parsed queries. JDBC provides this function with a PreparedStatement object. Unlike the Statement object, which needs to be sent to the database for parsing each time, the PreparedStatement is compiled in advance and can be executed as many times as needed. The speedup of a query varies according to the type of query (SELECT, INSERT, UPDATE, DELETE) and the complexity of the query (whether more than one table is involved).

• Batch updates
For a large scale scientific application, the application and the database normally reside on physically distributed machines. Substantial network latency can lead to very inefficient query execution. JDBC provides the batch update approach, which can decrease the network latency effect by executing a number of queries in one network round trip. The query processing, however, is not necessarily faster when using the batch update approach instead of the PreparedStatement. The trade-off comes in two forms: (1) batch updates can only be combined with a Statement object, and (2) the data needs to be sent from the client to the server in one large round trip. The batch update approach offers more benefit when the network connection is slow.

• Transaction management
In SQL terms, a transaction is a series of operations that must be executed as a single logical unit. When a connection is created using JDBC, the database is set in autocommit mode and each SQL statement is treated as a separate transaction. In order to allow two or more statements to be grouped into a transaction, the autocommit mode needs to be disabled using Connection.setAutoCommit(false). Treating a group of operations as a transaction is a safe way to guarantee the integrity of a database. For example, if a bank customer wishes to transfer funds from a savings account to a checking account, two update operations need to be sent. If these two calls are treated as two separate transactions, one may succeed while the other fails due to the network or other factors. This results in data that is not consistent in the database.

Explicitly committing a group of SQL statements is not only a safe approach, but also has a large performance impact, because the overhead of the commit operation is reduced. In the NOM model, all the data and information pertaining to the reacted molecules in the system are stored in the database at the end of each time step. The insertions of the data for all the reacted molecules at each time step are treated as one transaction to ensure data consistency in the system.

The NOM core simulation engine connects to the Oracle database using the JDBC thin driver. Figure 2 shows the performance comparison for data insertions using five approaches. The simulation runs for 96 time steps, with a total of 1782 insertions. The benchmark consists of five cases. In case 1, each insertion for each reacted molecule was treated as a single transaction with a Statement object. In case 2, each insertion for each reacted molecule was treated as a single transaction with a PreparedStatement object. In case 3, a group of insertions for every reacted molecule in one time step is treated as a single transaction with a Statement object. In case 4, a group of insertions for every reacted molecule in one time step is treated as a single transaction with a PreparedStatement object. In case 5, a batch update approach is applied.

4.4 Parallel data output with Java threads

Large-scale scientific applications always involve outputting large data sets. They are not only CPU bound processes, but also I/O bound processes, especially when the database server and the application server are on different machines. This is a common architecture design in large-scale scientific applications.

If the simulation server has multiple processors, multithreading can be used to overlap the computation and communication. More specifically, parallelism can be achieved by overlapping the computation and the I/O. There are, however, trade-offs between the simplicity of programming and the performance. When an application involves not only data reads but also data writes, several programming issues need to be considered to prevent deadlock and race conditions.

In the NOM simulation model, large amounts of data need to be written to the database at each time step.
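The per-time-step write path from case 4 — one PreparedStatement reused for every row, and one transaction per time step — can be sketched as follows; the table name, column names, and JDBC URL handling are assumptions, since the paper does not publish its schema:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class TimeStepWriter {
    // Hypothetical table and columns standing in for the NOM schema.
    static String insertSql() {
        return "INSERT INTO reacted_molecule (time_step, molecule_id, mass) VALUES (?, ?, ?)";
    }

    // One transaction per time step: the statement is parsed once and
    // executed per row, and a single commit covers all of the step's rows.
    static void writeStep(Connection con, int step, int[] ids, double[] mass) throws SQLException {
        con.setAutoCommit(false);                  // group the inserts into one transaction
        try (PreparedStatement ps = con.prepareStatement(insertSql())) {
            for (int i = 0; i < ids.length; i++) {
                ps.setInt(1, step);
                ps.setInt(2, ids[i]);
                ps.setDouble(3, mass[i]);
                ps.executeUpdate();
            }
            con.commit();                          // one round of commit overhead per step
        } catch (SQLException e) {
            con.rollback();                        // keep the time step all-or-nothing
            throw e;
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length == 0) return;              // needs a real JDBC URL to run
        try (Connection con = DriverManager.getConnection(args[0])) {
            writeStep(con, 1, new int[]{1, 2}, new double[]{12.0, 18.0});
        }
    }
}
```

The rollback in the catch block is what preserves the per-step consistency the text describes: either all of a step's reacted molecules are recorded, or none are.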
The average time for one record insertion using a PreparedStatement object with transaction management is 4.7 milliseconds, as shown in the previous section. In order to parallelize the data writes to the database, a buffer (FIFO queue) is allocated and an extra thread is created. While the computational thread adds objects to the queue, the I/O thread removes objects from the queue and writes the data to the database. If there are no objects in the queue, the data-writing thread executes busy waiting. If the number of objects in the queue is equal to the queue size, the computation thread waits. In order to safely add to and remove from the queue, all accesses to the queue are synchronized.

The NOM simulation model has been tested and run for various numbers of time steps, from 96 to 579. Refer to Figure 3. By using a separate thread for data output, there is an average 1.3 speedup relative to the single-thread model for the NOM application.
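A minimal sketch of this producer/consumer arrangement: the paper guards a LinkedList with synchronized blocks and busy waiting, while this version uses a bounded ArrayBlockingQueue (a later java.util.concurrent class) to get the same wait-when-full/wait-when-empty behavior; the database write itself is stubbed out:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class OverlappedOutput {
    static final int POISON = -1; // sentinel telling the writer thread to stop

    // Returns how many records the I/O thread consumed.
    static int run(int records) {
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(64);
        int[] written = new int[1];
        Thread io = new Thread(() -> {             // stands in for the database writer
            try {
                int r;
                while ((r = buffer.take()) != POISON) {   // blocks when the buffer is empty
                    // ... here the real model issues the ~4.7 ms JDBC insert ...
                    written[0]++;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        io.start();
        try {
            for (int i = 0; i < records; i++) buffer.put(i); // blocks when the buffer is full
            buffer.put(POISON);
            io.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return written[0];
    }

    public static void main(String[] args) {
        System.out.println(OverlappedOutput.run(1000)); // prints 1000
    }
}
```

The bounded capacity reproduces the paper's rule that the computation thread waits when the queue is full, which stops the simulation from racing arbitrarily far ahead of the slower database writes.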

Figure 2: Comparison of data insertions using five approaches in the NOM simulation model.

Case 4, the PreparedStatement object with transaction management, has the best performance in the NOM application. It offers a 3.03 times speedup relative to case 1 in this benchmark.

Figure 3: Overlap of the computation and I/O using Java threads. The left figure shows the number of data insertions over the time steps. The right figure shows the speedup from using a separate thread for data output.

4.5 Choosing a JVM

Various Java Virtual Machines (JVMs), such as the IBM JVM and the Sun HotSpot JVM, have been implemented in conformance with the Java Virtual Machine Specification [26]. Sun Microsystems implements two types of HotSpot JVM, the Client VM and the Server VM, to meet the different requirements of different applications. Compared with server-side programs, client-side programs often require a smaller RAM footprint and a faster start-up time. The two HotSpot JVMs share the same runtime portion, but the main difference between them is found in their compiler technologies. The Server VM contains a highly advanced adaptive compiler that includes many of the optimization technologies used in C++ compilers [22].

Figure 4: Performance comparison for a NOM application run in the Sun Client VM and the Sun Server VM with different grid sizes.

The NOM simulation model application has been benchmarked using the two different runtime modes of Sun JVM 1.4.1_01 on RedHat Linux 8.0, for 500 time steps and 1500 time steps. Figure 4 shows that as the grid size and the number of time steps increase, the Server VM offers higher performance than the Client VM. The results show that choosing different JVMs can produce significant differences in performance.

Ladd (2003) [8] showed benchmark results for both the Sun and IBM JVMs, versions 1.3.1 and 1.4.1_01, on a Linux platform. According to Ladd, JVM version 1.3.1 has a higher performance level than version 1.4.1, and the IBM JVM has a better performance level than the Sun JVM. Choosing the appropriate JVM for a particular scientific application also involves consideration of the hardware architecture and the operating system.

4.6 Scalability

The scalability of the NOM simulation model involves two aspects: the required total number of simulation time steps, and the grid size. Two parallel programming models are implemented for the NOM simulation model. The Java thread version is implemented using built-in Java threads and runs on a single JVM. The distributed memory model is implemented using the mpiJava library with LAM MPI.

In the sequential implementation for simulating the NOM complex system, the behavior of individual molecules is simulated using the agent-based modeling approach. Molecules reside in cells on a 2D grid, and individual molecules can be transported through the soil medium via water flow. At each time step, molecules can move from one cell to another when a random event occurs. The time for finishing a simulation is largely determined by the number of molecules in the system and the number of time steps.

4.6.1 Java threads version

In order to parallelize the programming model, the original grid has been equally separated into two subset grids (e.g., an 800X300 grid is separated into two 400X300 grids). Two threads are created; each thread has its own grid object to place molecules and a collection to hold molecules. In each time step, the computation for individual molecules is executed on the two threads concurrently.

When a molecule moves across the boundary, the molecule is removed from the grid and placed into a local buffer in the current thread. At the end of the time step, a barrier is used to synchronize the two threads. After both threads reach this state, one of the threads executes the exchange operation by updating the state of the two grids and the two MoleculeLists. The threads are synchronized again before this thread finishes setting the boundary condition. Figure 5 shows the design.

Page 8 condition. Figure 5 shows the design. rank 0 rank 1 rank 2

JVM JVM JVM

JVM a c

Thread Thread time 0 1 step b d i Molecule Molecule List0 List1 Grid 0 Grid 1 Grid 2 a

Right Left Grid send send left b send Grid buffer buffer buffer

send right buffer Exchange Exchange buffer0 buffer1 end Barrier of time Barrier step Thread receive i left block 0 receive receive send buffer Exchange remove buffer and buffer X molecules Thread receive recv 1 right wait

Molecule add molecules List X Barrier

Update Update Update List& List& List& put grid grid grid a molecule b Right to Left Grid new Grid location

begin of Barrier time step Thread Thread i+1 0 1 Figure 6: The design for parallelism of NOM simula- tion model using MPJ.
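The two-thread design described above (local buffers for boundary-crossing molecules, a barrier at the end of each step, and an exchange performed by a single thread) can be sketched with java.util.concurrent.CyclicBarrier, whose barrier action runs exactly once per step in the last-arriving thread. The class, field names, and toy movement rule are illustrative assumptions, not the paper's Swarm-based code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

// Sketch of the two-thread grid split: each worker moves its molecules,
// parks boundary-crossers in a local buffer, and the CyclicBarrier action
// (run by whichever thread arrives last) performs the exchange.
public class TwoThreadGrid {
    static class Molecule { int x; Molecule(int x) { this.x = x; } }

    final List<Molecule>[] grids = new List[] { new ArrayList<>(), new ArrayList<>() };
    final List<Molecule>[] outBuffers = new List[] { new ArrayList<>(), new ArrayList<>() };
    final int half = 400;  // boundary column between the two 400x300 subgrids

    // Barrier action: both threads are blocked in await() while this runs,
    // so the exchange needs no extra locking.
    final CyclicBarrier barrier = new CyclicBarrier(2, () -> {
        grids[1].addAll(outBuffers[0]); outBuffers[0].clear();
        grids[0].addAll(outBuffers[1]); outBuffers[1].clear();
    });

    void step(int id) throws InterruptedException, BrokenBarrierException {
        List<Molecule> mine = grids[id];
        for (int i = mine.size() - 1; i >= 0; i--) {
            Molecule m = mine.get(i);
            m.x += (id == 0) ? 1 : -1;  // toy move: drift toward the boundary
            boolean crossed = (id == 0) ? m.x >= half : m.x < half;
            if (crossed) { mine.remove(i); outBuffers[id].add(m); }
        }
        barrier.await();  // exchange happens here, then both threads continue
    }
}
```

Each thread calls step(id) once per time step; because the exchange is the barrier action, no single thread needs to be designated the exchanger by hand.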

Figure 5: The design for parallelism of the NOM simulation model using Java threads.

Figure 6: The design for parallelism of the NOM simulation model using MPJ.

4.6.2 MPJ version

In the distributed memory model, each processor runs its own copy of the code. The ModelSwarm object contains a subset of the original grid. These subsets have equal size in order to ensure the computational balance of each node in the cluster. The basic design is similar to the Java thread model: each node in the cluster processes its own computation on its subset of the grid. When a molecule crosses the boundary, it is added into a local buffer. MPI.COMM_WORLD.Barrier is used to synchronize the processes at each time step. Molecule objects that cross the boundary are sent to the neighboring grids using the blocking send and receive modes MPI.COMM_WORLD.Sendrecv, MPI.COMM_WORLD.Send, and MPI.COMM_WORLD.Recv. Figure 6 shows this design. The molecule list and grid on each machine are updated after all the sending and receiving are finished.

4.6.3 Performance results

In order to measure the performance of the parallel implementations, a Linux cluster was built. The cluster consists of 4 PCs; each PC has dual 650 MHz processors and runs the RedHat Linux 8.0 operating system. The Java code was compiled with Sun's JDK 1.4.1_01 and executed on Sun's Java Virtual Machine.

Figure 7 presents the results for the sequential version and the Java thread version of the NOM application, run for 1500 time steps with various grid sizes. Both versions ran on a single dual-CPU Intel computer.

The experiments for parallelizing the NOM simulation model with the distributed memory model were made on the cluster of four PCs. LAM MPI 6.5.9 and mpiJava were used to build the execution environment. The NOM application was run on 2 and 4 machines.

Figure 8 shows the performance comparison between the sequential programming model and the MPJ model running on 4 nodes in the cluster; both models ran for 500 and 1500 time steps. The figure shows that the communication between nodes and the maintenance of the grid and MoleculeList offset the performance gained by distributing the job to different nodes when the problem size is small.

Figure 7: Performance comparison between the sequential programming model and the Java thread model in the NOM application (one thread vs. two threads on a single dual-CPU computer).

Figure 8: Performance comparison between the sequential programming model and the MPJ model running on 4 machines for 1500 time steps.

As the problem size grows (larger grid size or longer time steps), however, a performance improvement appears from distributing the job across nodes. Compared to the ideal performance gain of a factor of 4, the efficiency is low. Figure 9 illustrates the case where the job is distributed on four nodes of the cluster and run for 500 time steps with no synchronization between nodes: instead of sending the molecules that cross the boundary to other nodes, they are wrapped around to the other side of their subset grid. The figure shows that as the grid size increases, the speedup approaches the ideal linear speedup of 4.
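The efficiency remark above follows from the standard definitions of speedup and parallel efficiency. The helper below states them explicitly; the timings used are hypothetical, for illustration only.

```java
// Speedup and parallel efficiency as used in the comparisons above.
public class ParallelMetrics {
    // speedup = sequential time / parallel time
    static double speedup(double tSequential, double tParallel) {
        return tSequential / tParallel;
    }
    // efficiency = speedup / node count; 1.0 would be ideal linear speedup
    static double efficiency(double tSequential, double tParallel, int nodes) {
        return speedup(tSequential, tParallel) / nodes;
    }
    public static void main(String[] args) {
        // hypothetical timings: 400 s sequentially vs. 160 s on 4 nodes
        System.out.println(speedup(400.0, 160.0));       // 2.5
        System.out.println(efficiency(400.0, 160.0, 4)); // 0.625
    }
}
```

So a run that is 2.5x faster on 4 nodes is only 62.5% efficient, which is the gap between the measured and the ideal speedup discussed above.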

5 Conclusion

In this paper, several approaches for improving the performance and scalability of a typical scientific application, the NOM simulation model, have been presented. These approaches are summarized in Table 1. The speedup varies with the problem size and the situation; the speedup values in the table come from the benchmarks described in the previous sections, and each represents the approximate highest performance gain in its benchmark.

Figure 9: Comparison of the performance between the sequential programming model and the MPJ model on 4 machines, without synchronization and molecule exchange.

Besides the approaches listed in the table, using a native code compiler to compile the Java source code to native code can increase the performance of some applications.

Program profiling is a crucial step for high performance computation in Java-based applications. Two major aspects, CPU time and memory usage, need to be monitored.

Selecting the appropriate runtime environment for a particular application is important. Besides the JVMs evaluated here, the IBM JVM is also worth investigating.

Using Java built-in threads to parallelize Java applications is a convenient approach. How much performance can be gained from this parallelism depends on the JVM implementation and on how efficiently the operating system handles Java threads. Our experiments were run on a dual-CPU PC under the Linux operating system; it would be valuable to extend them to the Windows or Solaris operating systems.

Distributing jobs over multiple machines in a cluster environment is an efficient approach for large problems. However, the communication among nodes and the maintenance of the grid and list offset the performance gain. To reduce this overhead, instead of synchronizing all the processes at each time step, they can be synchronized every 5 time steps or more. For this particular application, it is also possible to distribute the job over multiple machines using MPJ and to combine the results in the database at the end of the simulation; there is then no communication at all after the job is distributed. However, validation is necessary for these two approaches.

Table 1: Summary of the performance improvements

Approach              Speedup  Comments
Data structure        2.8      Using ArrayList provides higher overall performance than using LinkedList in the benchmark.
Object reuse          -        Reduces the memory footprint, the garbage collection cycle, and the overhead of object allocation and deallocation. The performance gain is relatively small compared to the overall computation time in the NOM simulation model.
JDBC                  3        JDBC tuning can affect the performance of data I/O.
Parallel data output  1.3      Overlapping the computation and I/O using Java threads can improve the performance.
Java runtime          1.4      The Sun Server VM has higher performance than the Sun Client VM for large-scale scientific applications.
Java threads model    1.1      The performance of the Java threads model largely depends on the JVM implementation and on how efficiently the operating system handles threads.
MPJ model             1.5      Distributing the job to 4 nodes in a cluster gives relatively larger performance gains when the problem size is big and the communication between nodes is minimized. However, the gain is still much lower than the ideal performance gain (4).

References

[1] Mike Ashworth. The potential of Java for high performance applications. In The First International Conference on the Practical Application of Java, pages 19-33, 1999.

[2] J. M. Bull, L. A. Smith, L. Pottage, and R. Freeman. Benchmarking Java against C and Fortran for scientific applications. In Proceedings of the 2001 Joint ACM-ISCOPE Conference on Java Grande, pages 97-105, June 2001.

[3] Swarm development group. http://www.swarm.org.

[4] Yung-Hsin Wang and Szu-Hsuan Ho. Implementation of a DEVS-JavaBean simulation environment. In The 34th Annual Simulation Symposium, pages 333-338, Seattle, Washington, 2001.

[5] Peter H. M. Jacobs, Niels A. Lang, and Alexander Verbraeck. D-SOL: A distributed Java based discrete event simulation architecture. In The 2002 Winter Simulation Conference, pages 793-800, 2002.

[6] Repast. http://repast.sourceforge.net/.

[7] Per Bothner. Compiling Java with GCJ. Linux Journal, Jan 2003. http://www.linuxjournal.com/article.php?sid=4860.

[8] Scott Robert Ladd. Benchmarking compilers and languages for ia32. http://www.coyotegulch.com/reviews/almabench.html, 01 2003.

[9] OpenMP Architecture Review Board. OpenMP C and C++ application programming interface. Technical report, OpenMP Architecture Review Board, 1998. Available from http://www.openmp.org.

[10] Mark Bull and Mark Kambites. JOMP: An OpenMP-like interface for Java. In ACM 2000 Java Grande Conference. ACM, 2000. Available from http://www.epcc.ed.ac.uk/research/jomp.

[11] Sun Microsystems. Java remote method invocation specification. Technical report, Sun Microsystems, 1998. Available at http://java.sun.com/products/jdk/rmi/.

[12] V. Getov, G. von Laszewski, M. Philippsen, and I. Foster. Multiparadigm communications in Java for grid computing. Communications of the ACM, 44(10):118-125, 2001.

[13] MPI. http://www-unix.mcs.anl.gov/mpi/.

[14] MPICH: A portable implementation of MPI. http://www-unix.mcs.anl.gov/mpi/mpich/.

[15] LAM/MPI parallel computing. http://www.lam-mpi.org/.

[16] Bryan Carpenter, Vladimir Getov, Glenn Judd, Anthony Skjellum, and Geoffrey Fox. MPJ: MPI-like message passing for Java. Concurrency: Practice and Experience, 12(11), 2000.

[17] Mark Baker, Bryan Carpenter, Geoffrey Fox, Sung Hoon Ko, and Sang Lim. mpiJava: An object-oriented Java interface to MPI. In International Workshop on Java for Parallel and Distributed Computing, IPPS/SPDP 1999, April 1999.

[18] Steven Morin, Israel Koren, and C. Mani Krishna. JMPI: Implementing the message passing standard in Java. In International Parallel and Distributed Processing Symposium: IPDPS 2002 Workshops. IEEE, 2002.

[19] Kivanc Dincer. Ubiquitous message passing interface implementation in Java: jmpi. In 13th International Parallel Processing Symposium and 10th Symposium on Parallel and Distributed Processing. IEEE, 1999.

[20] Glenn Judd, Mark Clement, and Quinn Snell. DOGMA: Distributed object group management architecture. In Concurrency: Practice and Experience, volume 10. ACM 1998 Workshop on Java for High-Performance Network Computing, 1998.

[21] Steve Cabaniss. Modeling and stochastic simulation of NOM reactions. Working paper, July 2002.

[22] Steve Wilson and Jeff Kesselman. Java Platform Performance: Strategies and Tactics, Appendix B. Addison-Wesley, 2000.

[23] Dennis M. Sosnoski. Java performance programming, part 1: Smart object-management saves the day. JavaWorld, Nov 1999. http://www.javaworld.com/javaworld/jw-11-1999/jw-11-performance.html.

[24] Steve Wilson and Jeff Kesselman. Java Platform Performance: Strategies and Tactics, Appendix A. Addison-Wesley, 2000.

[25] Greg Barish. Building Scalable and High-Performance Java Web Applications Using J2EE Technology. Addison-Wesley, 2002.

[26] Tim Lindholm and Frank Yellin. The Java Virtual Machine Specification, Second Edition. Addison-Wesley, 1999.
