Implicit Java Array Bounds Checking on 64-bit Architectures*

Chris Bentley, Scott A. Watterson, David K. Lowenthal, Barry Rountree
Department of Computer Science

The University of Georgia
{cbentley, saw, dkl, rountree}@cs.uga.edu

*This research was supported in part by a State of Georgia Yamacraw grant as well as NSF Grant No. 0234285.

ABSTRACT

Interest in using Java for high-performance parallel computing has increased in recent years. One obstacle that has inhibited Java from widespread acceptance in the scientific community is the language requirement that all array accesses must be checked to ensure they are within bounds. In practice, array bounds checking in scientific applications may increase execution time by more than a factor of 2. Previous research has explored optimizations to statically eliminate bounds checks, but the dynamic nature of many scientific codes makes this difficult or impossible.

Our approach is instead to create a new Java implementation that does not generate explicit bounds checks. It instead places arrays inside of Index Confinement Regions (ICRs), which are large, isolated, mostly unmapped virtual memory regions. Any array reference outside of its bounds will cause a protection violation; this provides implicit bounds checking. Our results show that our new Java implementation reduces the overhead of bounds checking from an average of 63% to an average of 9% on our benchmarks.

Categories and Subject Descriptors
D.3.4 [Programming Languages]: Processors—optimization

General Terms
Measurement, Performance

Keywords
Array-Bounds Checking, Java, Virtual Memory

1. INTRODUCTION

Interest is growing in the scientific programming community in using Java for high performance computing (HPC). Some reasons for this include attractive features such as portability, expressiveness, and safety. In addition, threads are built into the Java language. Finally, rapid progress is being made in Java compiler technology.

However, several obstacles currently prevent Java from becoming widely accepted for HPC. One such obstacle is the Java language specification requirement that all array accesses be checked to ensure they are within bounds. This often causes a run-time performance penalty, because in general, a Java compiler must ensure all array references are within bounds by adding explicit checks preceding an array access. If the reference is outside the bounds of the array, a run-time error is generated. Unfortunately, this simple solution adds overhead to all run-time array accesses.

While array accesses are infrequent in some applications, scientific programs tend to be array intensive, which means that checking array bounds can significantly increase execution time. For Java to have any chance to be widely used for HPC applications, array bounds checking overhead must be alleviated.

The current state of the art for reducing bounds checking overhead is to use static analysis to eliminate explicit checks by proving an array reference is within its bounds [4]. However, there are several problems with this approach. First, the dynamic nature of many scientific codes makes static elimination difficult or impossible. This is borne out by our manual inspection of the NAS benchmark suite [2]. Second, even when static analysis is possible, currently available compilers may have difficulty removing explicit checks.

Rather than attempting to eliminate explicit bounds checks statically, our goal is to use compiler and operating system support to implicitly check array bounds in Java programs. We leverage the 64-bit address space of modern architectures to reduce the cost of array bounds checks. This is a potentially useful technique for scientific applications, which increasingly are difficult to analyze statically. If static analysis cannot eliminate bounds checks, then the compiler must insert n bounds checks for an n-dimensional array reference. Instead, we perform no static analysis of the program, and we insert zero checks for array bounds. We place each array object in an index confinement region (ICR). An ICR is an isolated virtual memory region, of which only a small portion, corresponding to valid array data, is mapped and permissible to access. The rest of the ICR is unmapped, and any access to that portion will cause a hardware protection fault. This achieves implicit bounds checking for all array references. While ICRs can be implemented on most modern architecture/operating system combinations, they are primarily intended for 64-bit machines. This allows allocation of hundreds of thousands of 32GB ICRs, each of which is large enough to implicitly catch any illegal access to an array of double-precision numbers.

We made two primary modifications to gcj, the GNU Java implementation. First, we modified array allocation (new) so that array objects are allocated inside of ICRs. Second, we modified the way gcj generates array indexing code so that illegal accesses will fall into an unmapped memory area.

Because our new Java implementation utilizes ICRs, dense access patterns are replaced with sparse ones. This has negative consequences in the TLB, cache, and memory. Accordingly, we modified the Linux kernel to customize virtual addressing to optimize for a sparse access pattern. Furthermore, a process can choose to use our customized scheme, so that regular processes are unaffected by our kernel modifications.

Our results on a 900 MHz Itanium-2 show that the performance of our new Java implementation that uses ICRs is superior to gcj-generated code that uses explicit bounds checks. Specifically, our approach reduces the bounds checking overhead for scientific Java benchmarks from an average of 63% to an average of only 9%. In addition, all of the benchmarks performed better using ICRs rather than full compiler bounds checking.

The remainder of this paper is organized as follows. The next section describes related work. Section 3 provides implementation details, and Section 4 presents performance results. Section 5 discusses issues arising in this work. Finally, Section 6 concludes.

2. RELATED WORK

A significant body of work exists on static analysis to eliminate array bounds checks. Kolte and Wolfe [12] perform partial redundancy analysis to hoist array bound checks outside of loops. Their algorithm was based on that described by Gupta [10], who formulated the problem as a dataflow analysis. In ABCD [4], Bodik et al. implemented a demand-driven approach to bounds checking in Java. Artigas et al. [1] address the problem by introducing a new array class with the concept of a safe region. Xi and Pfenning [17] introduce the notion of dependent types to remove array bound checks in ML; Xi and Xia [18] extend this idea to Java. Rugina and Rinard [15] provide a new framework using inequality constraints for dealing with pointers and array indices, which works for both statically and dynamically allocated areas.

All the above analyses are performed at compile time. This has the advantage of avoiding run-time overhead but fails when either (1) the code is too complicated to prove anything about array references, or (2) the code depends on input data. The former includes cases where, for example, different arrays are passed (as actual parameters) to functions from several different call sites. The latter includes applications where indices cannot be determined at compile time. As one example, Table 1 shows the results of our inspection of several benchmarks from the NAS suite. In particular, our inspection of the CG and FT benchmarks shows that it is extremely difficult and impractical to prove that array references are within bounds. Furthermore, even when most or all checks are potentially removable, we are not aware of a Java compiler that will successfully eliminate all removable checks. In these cases compile-time schemes must fall back to general run-time checking. Instead, our Java implementation decreases the cost of bounds checking and does not depend on static analysis to do so.

Program   Dynamic Percentage of Removable Checks
FT        0%
MG        90%
BT        100%
CG        7%

Table 1: Percentage of bounds checks, for four NAS programs, that we believe could be eliminated statically. In practice, we are not aware of a Java compiler that can successfully eliminate all of the removable checks indicated above.

A similar approach to ours is to use segmentation for no-cost bounds checking [6]. The basic idea is to place an array in a segment and set the segment limit to the size of the array. This technique is effective for small one-dimensional arrays, because automatic checking is done on both ends of the array. However, typically one end must be explicitly checked for large arrays. Furthermore, because there are a limited number of segments that can be simultaneously active (four on the x86, for example), full bounds checking must be used for some arrays if there are more live arrays than this maximum. Most importantly, multidimensional arrays cannot be supported. This is because the segment limit prevents only an access past the allocated memory for the entire array; an illegal access in one of the first n-1 dimensions that happens to fall within the allocated memory for the array will not be caught. While this provides some degree of security, it cannot produce semantically correct Java programs.

Electric Fence [14], which places a single unmapped page on either side of an allocated memory area, bears some similarity to ICRs. Electric Fence automatically catches overruns on one end. However, it does not handle arbitrary array references, such as references past the unmapped page. In contrast, our Java implementation is able to catch any illegal reference.

Our approach bears some similarities to Millipede [11], a software DSM system. Millipede avoids thrashing by placing distinct variables (that would generally be allocated on the same page) on different pages at their appropriate offsets; then, both pages are mapped to the same physical page. Different protections can then be used on each variable, because protections are done at the virtual page level.

Our extended and customizable virtual memory abstraction (xvm) [3] is used only by those processes that use ICRs. This means that regular processes are unaffected. This is somewhat reminiscent of microkernels such as Mach [19], which provide flexibility to application-level processes (such as allowing an external pager). However, our modification is inside the kernel as opposed to at the application level.

There has been work on how to utilize 64-bit architectures, mostly from the viewpoint of protection. For example, [5] describes Opal, which places all processes in a single 64-bit virtual address space. This allows for a more flexible protection structure.

Finally, array bounds checking is often mentioned as a technique for preventing buffer overflows. Several have studied the general overflow problem; this includes using a gcc patch along with a canary to detect overflows [8]. Another compile-time solution, RAD [7], involves modifying the compiler to store return addresses in a safe location. This solution retains binary compatibility because stack frames are not modified.

3. IMPLEMENTATION

This section describes implementation details on an Itanium-2, which is a 64-bit machine. First, we discuss our implementation of index confinement regions (ICRs). Next, we discuss our modifications to the GNU Java implementation, gcj. Finally, we describe our modifications to the IA-64 Linux kernel.

3.1 Index Confinement Regions

An Index Confinement Region (ICR) is a large, isolated region of virtual memory. When an array is placed in an appropriately-sized ICR, references to this array are confined within the ICR. For example, consider placing a one-dimensional integer array, indexed only by 32-bit expressions, within an ICR. If the size of the ICR is chosen to be at least 16GB and the array is placed at the beginning of the ICR, it is impossible to generate an array reference that is outside of the ICR. A reference below the lower end of the ICR is not possible because of three factors. First, arrays in Java are limited to 2^31 entries by the language specification. Second, negative index expressions are not permitted by the Java language specification. Third, our Java compiler (as well as others) takes advantage of these language restrictions by treating index expressions as unsigned, so that if a negative 32-bit index expression is generated by a program, it is converted to a positive index expression larger than 2^31 - 1. Note that this simple optimization cannot be performed by a C or C++ compiler, because negative index expressions are legal in those languages.

An ICR must be large enough so that any 32-bit index expression will result in an access within the ICR. In general, the ICR size is the product of 4GB (2^32) and the size of the array element type; this can be calculated at allocation time. Although each ICR is several GB in size, a 64-bit virtual address space permits the allocation of millions of ICRs.

Figure 1 shows two regions and their associated arrays. In an ICR, pages in which the actual array resides are mapped with read and write permission. All other pages in the ICR are unmapped. This is achieved using mmap (with MAP_FIXED). Because in general the array size is not a multiple of the page size, one end of the array that is within a mapped page is not implicitly protected. We align arrays so that the upper limit of the array is at a page boundary, leaving the bottom end of the array unaligned[1]. This allows automatic bounds checking, because an access to an unmapped page results in a protection violation. Leaving the front end of the array unprotected (not page aligned) does not matter because of the treatment of negative indices described above.

[1] Because we use right-aligned placement of arrays, ICRs require one extra page at the right end to ensure proper protection.

[Figure 1: Two index confinement regions. Each array is isolated from any other program data; bounds checking is done automatically through unmapped pages.]
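To make the mapping concrete, the following C sketch shows one way the allocation just described could be realized with mmap and MAP_FIXED. It is a minimal illustration under simplifying assumptions: the function name icr_alloc, the single allocation cursor, and the base address are ours, not gcj's, and error handling beyond the mmap failure case is omitted.

```c
/*
 * Minimal sketch of ICR-style array placement (our illustration, not gcj's
 * actual runtime code). Assumptions: 4KB pages, a single static cursor that
 * hands out region bases, and an arbitrary starting address for the ICR area.
 */
#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

#define PAGE_SIZE     4096UL
#define ICR_AREA_BASE 0x2000000000UL   /* illustrative; the real layout is OS-specific */

static uintptr_t next_icr = ICR_AREA_BASE;

/* Allocate an nelem-element array of elem_size-byte elements in its own ICR. */
void *icr_alloc(size_t nelem, size_t elem_size)
{
    size_t icr_size   = (4UL << 30) * elem_size;            /* 4GB x element size */
    size_t array_size = nelem * elem_size;
    size_t mapped     = (array_size + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1);

    uintptr_t icr_base = next_icr;
    next_icr += icr_size + PAGE_SIZE;      /* one extra guard page at the right end */

    /*
     * Right-align the array: its last byte ends on a page boundary, so the
     * first out-of-bounds element falls on an unmapped page. All other pages
     * of the ICR are never mapped, so any bad index faults in hardware.
     */
    uintptr_t map_at = icr_base + icr_size - mapped;
    void *p = mmap((void *)map_at, mapped, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    return (char *)p + (mapped - array_size);   /* pointer to element 0 */
}
```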

The ICR abstraction extends naturally to n dimensions, because Java has no true multidimensional arrays. Instead, Java represents an n-dimensional array as vectors of vectors. As described above, Java compilers already can avoid a lower-bound check for each vector. Hence, because each vector is placed in an ICR, the high-bound check is also unnecessary and no checks are required for an n-dimensional array reference.

Because Java requires arrays to be allocated via the keyword new, the runtime system must be modified to place arrays in ICRs. Additionally, the Java compiler must be modified so that implicit bounds checking can be performed. The next section describes how we modified the gcj compiler and runtime libraries to facilitate the ICR-based allocation of Java arrays.

3.2 GNU Java Implementation

We created a new Java implementation, based on gcj, that makes use of ICRs. We made two primary modifications. First, we changed the way arrays are indexed and allocated, which involves both compiler and run-time library changes. Second, we changed the GNU backend so that it would conform to the Java language specification.

3.2.1 Array Indexing and Allocation

Java requires that array index expressions produce a positive offset that is within the allocated space of the array. This means that conceptually, gcj must produce two checks per access: one to check that the index is positive and one to verify the index is less than the length of the array. The length of each array is stored as a field in the array object. However, as mentioned above, gcj optimizes these checks into a single check by sign-extending indices and comparing the index to the array length as an unsigned value. Therefore, any negative index becomes a very large positive index that is guaranteed to be larger than the length of the array. This is due to the restriction that the number of array elements in a Java array is limited to the maximum signed integer (2^31 - 1).

We modified gcj to target ICRs. First, we had to disable bounds checking in the compiler, which was easily done as gcj provides a compile-time flag for this purpose. Second, because the precise location of ICRs is not known at compile time, we cannot simply sign-extend the index expression, as this might result in an access to a mapped portion of a different ICR when added to the array base address (see Figure 2). While this is not a problem in standard gcj, because an explicit comparison is made to the upper bound, our implementation does no comparisons. As a result, we must map a negative index expression to a page that is guaranteed to be unmapped. Therefore, instead of sign-extending the index expression, we zero-extend it. This ensures that any negative expression becomes an unsigned expression precisely in the range of index expressions (2^31, 2^32 - 1). Such an indexing expression will result in an access to an unmapped page within the ICR for that array.

In Java, all arrays are allocated dynamically with the keyword new, which calls a runtime library routine appropriate for the array type (object or primitive). These particular routines are NewObjectArray and NewPrimArray, respectively. Each of these functions eventually calls malloc to actually allocate the array. When using ICRs, we modified both of these routines, replacing malloc with a call to mmap with (1) the target address of the next available ICR and (2) the MAP_FIXED flag (as described in the previous section). These methods extend naturally to n-dimensional arrays, as the first n-1 dimensions are arrays of objects. The keyword new invokes NewMultiArray, which invokes itself recursively, calling NewObjectArray until all of the first n-1 dimensions are allocated. Finally, the last dimension will be allocated using NewPrimArray if the type is primitive; otherwise, it is allocated using NewObjectArray.
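The difference between the two indexing sequences can be sketched in C as follows. checked_load mirrors the single unsigned comparison that standard gcj emits, while icr_load mirrors our compare-free, zero-extended sequence; the helper throw_array_index_out_of_bounds is a hypothetical stand-in for the runtime's exception path, not an actual gcj routine.

```c
/*
 * Sketch of the two indexing strategies (illustrative; not gcj's generated
 * code). 'len' stands for the length field stored in the Java array object.
 */
#include <stdint.h>

void throw_array_index_out_of_bounds(void);    /* hypothetical exception path */

/* Standard gcj: the two conceptual checks collapse into one unsigned compare;
 * a negative index reinterprets as a huge unsigned value, so it also fails. */
double checked_load(const double *base, uint32_t len, int32_t i)
{
    if ((uint32_t)i >= len)
        throw_array_index_out_of_bounds();
    return base[i];
}

/* Our ICR scheme: no compare at all. The index is zero-extended, so a negative
 * i becomes a value in [2^31, 2^32 - 1]; the resulting address stays inside
 * this array's ICR and lands on an unmapped page, faulting in hardware. */
double icr_load(const double *base, int32_t i)
{
    uint64_t zext = (uint32_t)i;               /* zero-extend, never sign-extend */
    return *(const double *)((const char *)base + zext * sizeof(double));
}
```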

           

[Figure 2: Pictorial description of an illegal array reference and its corresponding sign-extended index expression versus zero-extended index expression with ICRs. The sign-extended reference (standard Java compiler) can land in a neighboring ICR, while the zero-extended reference (our Java compiler) stays within the referenced array's ICR.]

Because all ICR support is implemented in the compiler, run-time libraries, and operating system, no source code modification is necessary to allocate arrays in ICRs. However, allocating arrays in ICRs poses many challenges to the operating system and architecture. Section 3.3 discusses the impact of ICRs on the memory hierarchy along with OS support for ICRs.

3.2.2 GNU Backend Modifications

As described above, our ICR technique allows gcj to avoid generating array bound checks in the intermediate code. Potentially, this could enable aggressive optimizations, specifically instruction scheduling, that could result in a violation of the Java semantics. The particular situation we face is that without bounds checks, the optimizer might reorder array references, which is a potentially unsafe optimization. (This is because we catch out-of-bounds accesses via an OS exception, but the compiler is not aware of this.)

To prevent this problem, we modified the GNU backend so that it would not reorder array references. This was done in two steps. First, the syntax tree created by gcj is inspected, and whenever an array reference tree node is found, a bit in its RTL representation is set. Second, the instruction scheduler scans for this bit and moves the RTL corresponding to an array reference earlier in the generated code only if it is not reordered with another array reference.

[Figure 3: Address space layout and address translation for regular processes with a three-level page table in Linux, using 4KB pages.]

3.3 Linux Support for ICRs

While our new Java implementation allocates arrays in ICRs, obviating the need for bounds checks, the ICR abstraction itself places significant pressure on the memory hierarchy—causing cache conflicts and internal fragmentation. Smaller page sizes lessen this problem, though the address space available in Linux (and hence the number of ICRs that can be used) decreases with the page size due to the three-level linear page table in Linux. A more subtle problem is that ICRs force a sparse access pattern, which causes the kernel to consume significant amounts of memory to hold page tables.

To mitigate these problems, we have designed and implemented an abstraction we call xvm [3] to provide an application process with an extended, customizable virtual memory. As we are interested in ICR-based programs, we use the extended virtual address space to allocate as many ICRs as are needed and a customized virtual addressing scheme to reduce memory consumed by page tables.

The first part of implementing an xvm is to increase the address space size. Linux uses a three-level page table (PT) for translation. Borrowing Linux terminology, we refer to a page table as a directory, so the first, second, and third levels of the PT are denoted L1PD, L2PD, and L3PD. The L3PD contains entries that map the virtual page to a physical frame. In standard IA-64 Linux (see Figure 3, largely borrowed from [13]), a 4KB page size provides a 320GB address space, which is far too small for any of our benchmarks, while a 64KB page size provides 20PB of address space, which is sufficient for all of our benchmarks. We modified the 4KB kernel directory structure to allow the large virtual address space allowed by the 64KB kernel, while maintaining for ICRs the cache and memory benefits of the 4KB page size. Simply stated, we modified the L2PD and L3PD to be held in several consecutive pages. This means that the L2PD and L3PD directories each consume N consecutive pages and need log2(N) additional bits for their index. For example, increasing the L2PD and L3PD to 512 pages each results in an available virtual address space of over 32PB, which is adequate space for hundreds of thousands of ICRs. Figure 4 shows the new scheme.

The second part is customizing the virtual addressing scheme. As mentioned above, the sparse access patterns imposed by ICRs violate the principle of locality. For example, a 4KB directory page contains 512 entries, so in an extended virtual address space as described above, each L3PD contains 256K entries (512 pages x 512 entries per page), assumed to represent contiguous virtual memory. Hence, one L3PD represents 1GB (256K entries x 4KB) of virtual memory. Unfortunately, arrays placed in ICRs are at least 4GB from one another. This means that when using the scheme shown in Figure 4, two different array references require allocation of two L3PDs—even though typically only one entry in each L3PD will be used. As a result, memory consumption due to directories is increased considerably. This is a well-known problem with extremely sparse address spaces, which are generally better served using a hashed or clustered page table [16].

However, ICRs exhibit regular sparse patterns; this is especially true for multidimensional arrays, where a given dimension has vectors with identical sizes and types. We take advantage of this by swapping bits between the L3PD and the L2PD indices, so that the L3PD can hold many ICRs (rather than one). Each page of the L3PD contains entries for consecutive pages, so ICRs that have consecutive pages mapped use consecutive entries in the L3PD (up to a limit). Our implementation significantly reduces the internal fragmentation of the directories, leading to a reduction in memory usage by processes using ICRs. Without our customized addressing scheme, the kernel runs out of memory due to the large number of directory allocations in Multigrid and Fourier Transform. With our scheme, both programs run successfully.

To avoid increasing the minimum size and memory usage of processes that do not use ICRs, an xvm is used exclusively via a new system call, enable_xvm (invoked via the Java runtime library), which converts a standard Linux 3-level page table into a multi-page, multilevel extended version (as shown in Figure 4). This allows only those processes that make use of ICRs to incur the memory overhead due to the larger directories when using xvm. Further details on our xvm design and implementation can be found in [3].
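As a rough illustration of how widening the L2PD and L3PD grows the reachable address space, the following C sketch counts the coverage of a three-level table before and after the change. It deliberately ignores the IA-64 region bits and reserved directory entries, so its totals are only in the same ballpark as the 320GB and 32PB figures quoted above.

```c
/*
 * Back-of-the-envelope model of the reachable address space under a
 * three-level page table. Parameters are ours; the real IA-64 Linux layout
 * reserves region bits and some L1 entries, so its exact figures differ.
 */
#include <stdio.h>

int main(void)
{
    unsigned long long page_size = 4096, pte_size = 8;
    unsigned long long entries = page_size / pte_size;        /* 512 per page */

    /* Standard kernel: every directory level is a single page. */
    unsigned long long standard = entries * entries * entries * page_size;

    /* xvm: the L2PD and L3PD each span 512 consecutive pages, i.e. each level
     * gains log2(512) = 9 extra index bits. */
    unsigned long long pages_per_dir = 512;
    unsigned long long xvm = entries * (entries * pages_per_dir) *
                             (entries * pages_per_dir) * page_size;

    printf("single-page directories : %llu GB\n", standard >> 30);
    printf("512-page L2PD and L3PD  : %llu PB\n", xvm >> 50);
    return 0;
}
```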

[Figure 4: Address space layout for processes using our xvm abstraction. The L2PD and L3PD are each 512 pages instead of 1 page.]

4. EXPERIMENTAL RESULTS

We examined the performance of both our new Java implementation as well as standard gcj on a variety of Java applications. These applications include the Java version of the NAS Parallel Benchmark Suite 3.0 [9], some simple hand-written scientific kernels, and a synthetic array program. While the NAS programs can be executed with multiple threads, we run the serial versions because our experimental platform is a uniprocessor. We emphasize that the technique used in our new Java implementation is completely applicable to multithreaded Java programs.

Each NAS program uses the Class W input size. The NAS Java programs include CG, a conjugate gradient program; FT, a Fourier transform; IS, an integer sorting program; LU, a regular-sparse, block triangular system solution; MG (and MG3D), a multigrid solver; and two computational fluid dynamics simulations, BT and SP. Other than MG3D, the Java versions of the NAS suite use linearized arrays rather than multidimensional ones; this will be discussed later. Our hand-written kernels use multidimensional arrays and include MM, matrix multiplication; JAC, a Jacobi iteration program; and TOM, the TomcatV mesh generation program from SPEC92, which we converted to Java. The input sizes for these three programs were 896x896, 896x896, and 768x768, respectively, and were chosen by finding the size that resulted in execution time with no bounds checking taking 30 seconds. In addition, we include three versions of a synthetic benchmark, S1D, S2D, and S3D, with array sizes 1M, 1000x1000, and 100x100x100, respectively. The total number of elements in each of these arrays is the same. This benchmark simply repeatedly updates each element of an array and is used to study how ICRs and bounds checking scale with dimensionality (a sketch of its loop nest appears below). We used maximum optimization (-O6) to compile our programs with both Java implementations. In addition, we compiled all benchmarks directly to executable programs rather than Java bytecodes.
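For reference, the synthetic benchmark's inner computation is essentially the following loop nest, shown here as a C sketch of the S3D case; the benchmark we ran is the Java equivalent, and the iteration count is illustrative.

```c
/*
 * Sketch of the S3D synthetic kernel (shown in C; the actual benchmark is the
 * Java equivalent). S1D and S2D run the same update over a 1M-element and a
 * 1000x1000 array, so only the dimensionality changes. The iteration count
 * and the use of a static array are our simplifications.
 */
#define N 100
#define ITERS 1000

static double a[N][N][N];

void s3d(void)
{
    for (int it = 0; it < ITERS; it++)
        for (int i = 0; i < N; i++)
            for (int j = 0; j < N; j++)
                for (int k = 0; k < N; k++)
                    a[i][j][k] += 1.0;
}
```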

We performed our experiments on a 900MHz Itanium-2 with 1.5GB memory, a 16KB L1 instruction cache, 16KB L1 data cache, 256KB L2 cache, and 1.5MB L3 cache. The operating system is Debian Linux version 2.4.19. We use wallclock times for measurement; all experiments were run when the machine was unused.

The rest of this section is organized as follows. First, we present the overall execution times of several programs. Then, we further examine some of the results through inspection of hardware-level counters.

4.1 Overall Execution Times

Figure 5 shows the execution time of each version of the NAS Java benchmarks. The baseline version, labeled No Checks, is compiled with gcj using a compile-time flag to disable bounds checks. All other versions of each program are normalized to this value. Full Checks is the default program produced by gcj; all bounds are checked via compare instructions in the code. Each access to an n-dimensional array incurs a single upper bound check for each dimension. Finally, Java ICRs is our new method that competes with full compiler bounds checking. Note that No Checks is not legal according to the Java language specification—we use it only as a baseline to determine overheads.

Java ICRs is superior to Full Checks on all NAS Java benchmarks. The average normalized overhead of Full Checks is 39%, while the average overhead for Java ICRs is only 5%. The overhead of Full Checks primarily comes from extra instructions executed (compares) and extra cache reads (some of which are misses) to load the bounds.

As previously mentioned, the NAS benchmarks use linearized arrays. Each multidimensional array (in the original Fortran version) is transformed into a 1-dimensional array (in the Java version). This was done intentionally by the NAS development team to reduce the cost of bounds checking [9]. Also, because Fortran references arrays in column-major order, linearizing arrays avoids the need to interchange loops or transpose array dimensions for efficient execution in a row-major environment such as Java. Both Full Checks and Java ICRs benefit from this conversion—the former because of fewer bounds checks (only one total check is needed) and the latter because of lower memory hierarchy overhead (only one total ICR per array is needed).

However, linearized arrays do not allow legitimate bounds checking. An access beyond the bound of the first dimension may not be detected, as it may fall in the allocated area of the array. We are currently working to de-linearize the NAS Java Suite to fully test ICRs on them and currently have results from Multigrid, which we denote MG3D. This program was produced by modifying f2java to (1) generate Java code using multidimensional arrays[2] and (2) transpose arrays to row-major order. Then, we modified by hand the main arrays to be four-dimensional as in the NAS C version of MG, because the Fortran version of MG uses common blocks and f2java does not properly translate those. Because the translation is a time-consuming task—as a straight translation using f2java does not produce completely correct code—we also tested several representative multidimensional programs written by hand: MM, JAC, and TOM, as well as the three synthetic programs.

[2] The original f2java linearizes arrays.

Figure 6 shows the execution times of our synthetic benchmarks, our hand-written scientific kernels, and MG3D. Notice that for the different synthetic versions, the cost of Full Checks increases much faster than that of Java ICRs as dimensionality increases. This is due to the additional checking overhead caused by the increase in the number of dimensions. The increase in overhead of Java ICRs is due to stress in the memory hierarchy caused by fragmentation, which increases the number of cache and TLB misses. The penalty for Full Checks on the three kernels averages 46%, while Java ICRs averages only 6%. MG3D is the worst performing benchmark for Java ICRs, and it is still 24% better than Full Checks. This is strong evidence that the de-linearized NAS suite will perform much better with Java ICRs than with Full Checks. Combined with the fact that our previous study showed that the C versions of these programs [3] perform well with ICRs relative to explicit bounds checks, we believe that the NAS multidimensional Java programs will also perform well.

It is important to note that the NAS multidimensional suite is almost a worst case for our ICR technique, because the NAS programs use tiling to improve locality. This decreases the size of each dimension, causing decreased memory system performance (see below). One of the reasons for the significant overhead with Full Checks is that all checks (for all dimensions) occur in the innermost loop of a loop nest. Theoretically, it is possible in some cases to hoist invariant checks out of the innermost loop. Hence, we investigated the cause of this lack of code motion, keeping in mind that code motion can be difficult to implement due to aliasing. Because gcj is simply a front end to the gcc backend, we concluded that the backend of gcc usually cannot hoist any checks in our benchmarks—a reminder that while algorithms for code motion are mature, in practice it can be hard to legally move code without risking modification of program semantics.

[Figure 5: Execution times for each program version on each of the NAS benchmarks (using linearized arrays). All times are normalized to the No Checks version, as it is the baseline; this means that smaller bars are better. Using Java ICRs is better than Full Checks in all programs. Note that the benchmarks are explained at the beginning of Section 4. (Programs shown: BT, CG, FT, IS, LU, MG, SP, Average; bars: No Checks, Full Checks, Java-ICR.)]

[Figure 6: Execution times for each program version on each of the hand-written benchmarks as well as MG3D. All times are normalized to the No Checks version, as it is the baseline; this means that smaller bars are better. Using Java ICRs is better than Full Checks in all programs. Note that the benchmarks are explained at the beginning of Section 4. (Programs shown: JAC, MG3D, MM, S1D, S2D, S3D, TOM, Average; bars: No Checks, Full Checks, Java-ICR.)]

[Figure 7: Low-level performance counter results for Full Checks and Java ICRs for TOM, S2D, S3D, and MG3D. All times are normalized to the No Checks version. (Panels: Java-ICR and Full Checks normalized hardware counters; counters shown: instructions, L1 misses, L2 misses, L3 misses, and TLB misses.)]

4.2 Low-Level Performance Details

The performance of Java ICRs is better than that of Full Checks in all of our benchmarks. However, a better understanding of the overheads caused by Full Checks and those caused by Java ICRs will help explain why in general Java ICRs performs so much better than Full Checks.

We chose four of our test programs to examine in detail: TOM, S2D, S3D, and MG3D. For TOM and S2D, Java ICRs has almost no overhead, while Full Checks has over a factor of two. For S3D and MG3D, Java ICRs has close to a 50% overhead for each, while Full Checks has about 120% and 60%, respectively. We examined the following counters for each program: total instructions executed, TLB misses, and all levels of cache (see Figure 7). This clearly shows that the poor performance of Full Checks on TOM has two causes. First, there is an increase by about a factor of 2 in instructions due to bounds checks. The Itanium is a VLIW machine, creating bundles of instructions to make use of instruction-level parallelism (ILP). In many cases, the original code exhibited poor ILP, allowing bounds checking to be folded into the empty slots in the bundles. However, this was not always the case, so Full Checks still pays a heavy penalty for TOM. Second, the Full Checks versions have significantly more L1 accesses and misses. This is due to fetching the array length information for comparison.

While the time for Java ICRs is a vast improvement over Full Checks, ICRs do incur some overhead. Also shown in Figure 7 is the large increase in TLB misses for S3D and MG3D. In general, the degradation of TLB performance when using ICRs is dependent on array size and array access patterns. In particular, small array sizes increase fragmentation because each ICR must start on a new page. Typically, as the number of dimensions of an array increases, the size of each dimension tends to decrease. For example, Class W MG3D uses an array size of 64x64x64, whereas TOM uses an array size of 768x768. The TLB will miss much more frequently compared to Full Checks (which packs consecutive rows) when array sizes are small. The cache misses do not increase significantly (other than for MG3D), mostly due to the use of a 4KB page—separate tests (not shown) revealed that using a 16KB or 64KB page size caused a large increase in L3 cache misses when using ICRs. The severe fragmentation from MG3D also causes performance degradation in all levels of the cache hierarchy. Both of these aspects are shown in Figure 7.

To further demonstrate the effect of the size of the last dimension on TLB performance, consider S2D and S3D. For S3D, which uses an array of size 100x100x100, Java ICRs has a 42% overhead compared to No Checks. Notice that the number of TLB misses for S3D Java ICRs (Figure 7) is 16 times more than No Checks and Full Checks. This is due to the small size (100) of the last dimension, which causes fragmentation. Because the size of each dimension tends to increase as the number of dimensions decreases, we chose an array size of 1000x1000 for S2D. With that size, S2D has a Java ICRs overhead of 6%. This large improvement over S3D is due to fewer TLB misses, as there is less internal fragmentation.

In general, we see a tradeoff between Java ICRs and Full Checks; the overhead of the former is in the memory hierarchy, while the overhead of the latter is in the increase in the number of instructions. Even with this substantial pressure on the memory hierarchy, Java ICRs significantly reduces the average penalty for performing bounds checks in Java.

Finally, we investigate the effect of our GNU backend modifications. Recall that to avoid potentially unsafe Java code from being generated from our modified gcj, we had to limit the aggressiveness of the instruction scheduler. Specifically, we disallowed reordering of array references.

Table 2 shows results of different instruction scheduling schemes for the NAS benchmarks. The first column shows performance with a naive, yet correct, scheme to prevent reordering of array references. Those results are produced by completely disabling the instruction scheduler. As can be seen, this degrades performance significantly.

Program   No Scheduling   Full Scheduling   Modified Scheduling
BT        90.5            55.0              66.9
CG        7.05            6.64              6.65
FT        5.63            4.05              4.06
IS        7.17            6.90              6.90
LU        280             172               175
MG        8.44            7.29              7.29
SP        263             177               184

Table 2: Effect of different instruction scheduling schemes (times in seconds).

The second column shows full instruction scheduling, which results in good performance but may result in an incorrect Java program. The third column gives the performance of our Java implementation. As can be seen from the table, the degradation in performance is negligible (less than 4%) for most of the programs. This indicates that memory references were rarely, if at all, reordered by the instruction scheduler. However, our implementation ensures that such reordering does not occur.

Only one program, BT, benefits significantly from array reference reordering (which is illegal in Java). Specifically, performance using our modified scheduler is about 20% worse than if full scheduling is used. This is likely due to the presence of large basic blocks containing primarily array references (e.g., the functions add and x_solve), which increases the probability that an unmodified instruction scheduler can (illegally) reorder a subset of these references.
In system support for low-cost array bounds checking on 64-bit particular an unsafe region permits no aggressive optimization, in- architectures. Technical report, University of Georgia, Nov. gcj cluding array bound checks. In this case, our will eliminate 2003. bounds checks. So, our ICRs could in principle be integrated into [4] R. Bodik, R. Gupta, and V. Sarkar. ABCD: eliminating array Ninja, providing a significant performance improvement for the bounds checks on demand. In SIGPLAN Conference on large number of programs that defy static analysis. In fact, our Design and Implementation, pages manual inspection showed there were several such programs in the 321–333, 2000. NAS suite (see Section 2). This is also confirmed by the reported results of the Ninja compiler itself, which was unable to cover [5] J. S. Chase, H. M. Levy, M. J. Feeley, and E. D. Lazowska. a significant loop computation in TOM, resulting in poor perfor- Sharing and protection in a single address space operating mance[1]. Furthermore, more complex programs are unlikely to system. ACM Transactions on Computer Systems, receive complete coverage from safe regions. 12(4):271–307, May 1994. Second, our current implementation places all Java arrays in ICRs. [6] T. Chiueh. Personal communication. Oct. 2002. In fact, only multi-dimensional arrays that are indexed in a row- [7] T. Chiueh and F. Hsu. RAD: A compile-time solution to wise manner are suitable for our technique (to avoid excessive TLB buffer overflow attacks. In International Conference on misses). We could extend our Java implementation to handle arrays Distributed Computing Systems, Apr 2001. accessed in a column-wise manner by either (1) transposing the ar- [8] C. Cowan, C. Pu, D. Maier, J. Walpole, P. Bakke, S. Beattie, ray or (2) avoiding placing it in an ICR and performing traditional A. Grier, P. Wagle, Q. Zhang, and H. Hinton. StackGuard: bounds checks. However, in the benchmarks used in this work, our Automatic adaptive detection and prevention of current technique was sufficient. buffer-overflow attacks. In Proc. 7th USENIX Security Conference, pages 63–78, Jan 1998. [9] M. Frumkin, M. Schultz, H. Jin, and J. Yan. Implementation 6. CONCLUSION of the NAS parallel benchmarks in Java. NAS-02-009, NASA Ames Research Center. This paper has introduced a new technique to check array bounds [10] R. Gupta. Optimizing array bound checks using flow in Java programs implicitly, rather than the more traditional explicit analysis. ACM Letters on Programming Languages and way. We use compiler and operating system support to remove Systems, 2(1-4):135–150, March–December 1993. all bounds checks in Java programs. This means that instead of Ò [11] A. Itzkovitz and A. Schuster. Multiview and Millipage - bounds checks for an Ò-dimensional array reference, zero checks fine-grain sharing in page-based DSMs. In Operating are inserted. The basic idea is to place each array object in a index Systems Design and Implementation, pages 215–228, 1999. confinement region (ICR), which is an isolated virtual memory re- gion. The rest of a ICR is unmapped, and any access to that portion [12] P. Kolte and M. Wolfe. Elimination of redundant array will cause a hardware protection fault. Combined with the array subscript range checks. In SIGPLAN Conference on size restriction in Java, we are able to conform to Java semantics Programming Language Design and Implementation, pages without adding any bounds checks. 270–278, 1995. 
In order to obtain this improvement, it was necessary to (1) cre- [13] D. Mosberger and S. Eranian. IA-64 Linux Kernel: Design ate a new Java implementation to perform special array allocations and Implementation. Prentice-Hall, 2002. as well as (2) use a small (4KB) page size with a large virtual ad- [14] B. Perens. Electric Fence dress space. Combined, this reduces overhead in the cache as well (http://sunsite.unc.edu/pub/linux/devel/lang/c/electricfence- as fragmentation in main memory due to program data as well as 2.0.5.tar.gz). page table data. We created a large virtual address space via an [15] R. Rugina and M. C. Rinard. Symbolic bounds analysis of abstraction we called xvm, which provides an extended, customiz- pointers, array indices, and accessed memory regions. In able virtual memory, with little, if any, effect on other processes. In SIGPLAN Conference on Programming Language Design particular, ICRs average a small 9% overhead, while full compiler and Implementation, pages 182–195, June 2000. bounds checking averages nearly 63%. Overall, we believe that [16] M. Talluri, M. D. Hill, and Y. A. Khalidi. A new page table reducing the penalty for array bounds checking in Java will make for 64-bit address spaces. In Symposium on Operating Java a more attractive language for parallel and HPC applications. Systems Principles, Dec. 1995. [17] H. Xi and F. Pfenning. Eliminating array bound checking [19] M. Young, J. Avadis Tevanian, R. Rashid, D. G. J. Eppinger, through dependent types. In SIGPLAN Conference on J. Chew, W. Bolosky, D. Black, and R. Baron. The duality of Programming Language Design and Implementation, pages memory and communication in the implementation of a 249–257, 1998. multiprocessor operating system. In 11th Symposium on [18] H. Xi and S. Xia. Towards array bound check elimination in Operating Systems Principles, Nov. 1987. Java virtual machine langauge. In CASCON ’99, pages 110–125, 1999.