THE MAGAZINE OF USENIX & SAGE August 2001 • Volume 26 • Number 5

Special Focus Issue: Clustering
Guest Editor: Joseph L. Kaiser
inside:

CONFERENCE REPORTS
1st Java™ VM

The Advanced Computing Systems Association & The System Administrators Guild

First Java™ Virtual Machine Research and Technology Symposium (JVM ’01)
MONTEREY, CALIFORNIA
APRIL 23-24, 2001

KEYNOTE: REAL TIME VIRTUAL MACHINES
Greg Bollella, Sun Microsystems; David Hardin, aJile Systems
Summarized by V.N. Venkatakrishnan

[Photo: l to r: Saul Wold, David Hardin, Greg Bollella]

The invited talk of the conference was the presentation on real-time virtual machines. Greg Bollella introduced the topic by presenting the scenario in embedded systems today. The presence of networking everywhere and the demand for building large, complex systems are two of the reasons for the inevitability of increasing software complexity. In this scenario, the two paradigms of system development (the one for business, personal, and Web computing and the other for device, scientific, and industrial computing) are moving toward a collision in this era of computing. Greg pointed out that the migration of function from large devices to the handheld is an emerging trend. The real-time factor is important in hand-held devices because customers are used to a system response in real time (e.g., phone). Greg then presented a case example of JPL’s mission data system and explained the role of real-time software in such an example.

Having motivated the listeners on the subject, Greg explained the technical aspects of a real-time system, using a car’s electrical components as an example. He dispelled the popular myth that a real-time system is a fast system; rather, it is a system which includes time as an integral part of its computation.

There are several reasons why the Java language is an ideal choice for implementing such systems. As an advanced OO language, Java has a large set of libraries, a common set of APIs, and automatic memory management, and belongs to all the layers in software abstraction. A real-time JVM would thus support building various embedded system software, not just applications. But there are numerous barriers to achieving this: application-level unpredictability, hardware latencies, x86 context switch latencies, and inherent unpredictability due to various functions in the JVM such as scheduling and garbage collection.

In David Hardin’s presentation, the approach taken by aJile is to implement the JVM directly with simple, low-cost, low-power hardware. JVM bytecodes are native instructions, and this supports real-time threads in hardware using Java thread primitives as instructions. This enables the entire system to be written in Java, with no C code or assembly required. Such an implementation has provided the fastest real-time Java performance.

Greg continued the discussion with the Java real-time specification, emphasizing such issues as scheduling, memory management, concurrency, and physical memory access. The implementation of JSR was scheduled for presentation to the expert group by April 30, and final release of the specification is in progress. The discussion culminated with a spectacular demo-presentation of piano-playing robot hands controlled by two different real-time virtual machines. After attending the talk, one was convinced that despite the various problems posed by hardware, OS, and JVM, real-time applications can be successfully built using Java. Further information on this project can be obtained by contacting Greg at [email protected].

SESSION: CODE GENERATORS
Summarized by V.N. Venkatakrishnan

THE JAVA HOTSPOT SERVER COMPILER
Michael Paleczny, Christopher Vick, and Cliff Click, Sun Microsystems

How can the performance of a JVM improve through optimization of frequently executed application code? Michael Paleczny’s talk addressed this research question through the presentation of the Java HotSpot Virtual Machine. The client version provides very fast compilation times and a small footprint with modest levels of optimization. The server version applies more aggressive optimizations to achieve improved asymptotic performance. These optimizations include class-hierarchy-aware inlining, fast-path/slow-path idioms, global value-numbering, optimistic constant propagation, optimal instruction selection, graph-coloring register allocation, and peephole optimization.

Michael described the runtime environment that both the compiler and generated code execute within, followed by the structure of the server compiler. Then he described some of the phases of compilation, discussing solutions for specific language and runtime issues. Finally, he outlined the directions for future work on the compiler, which include range checks, loop unrolling, instruction scheduling, and a new inline policy. Further information about this work can be obtained from michael.paleczny@eng.sun.com.

CAN A SHAPE ANALYSIS WORK AT RUNTIME?
Jeff Bogda, Ambuj Singh, UC Santa Barbara

A shape analysis is a program analysis that can identify runtime objects that do not need to be placed in the global heap and do not require any locking. It has been shown through previous research that these two optimizations speed up some applications significantly. Since the shape analysis requires a complete call graph, it has not been implemented in the JVM.

After illustrating the purpose and some history of shape analysis, Jeff Bogda’s talk went on to describe his approach to building an incremental shape analysis that analyzes an executing program. The analysis is done through an experimental framework in which the executing application is instrumented so that the analysis is performed at key points in the program execution. Jeff then described three approaches to performing shape analysis: immediate propagation, where the analysis is done before the method execution; delayed propagation, which delays the analysis until an appropriate time; and persistent propagation, which utilizes results from previous executions.

Jeff discussed the various trade-offs in these approaches. The experiments suggest a strategy which consults the results of the previous executions and delays the initial analysis until the end of the first execution.

For more information on this work, the reader may visit www.cs.ucsb.edu/~bogda or contact Jeff at [email protected].

SABLEVM: A RESEARCH FRAMEWORK FOR THE EFFICIENT EXECUTION OF JAVA BYTECODE
Étienne M. Gagnon, Laurie J. Hendren, McGill University

SableVM is an open-source virtual machine for Java intended as a research framework for efficient execution of Java bytecode. The framework is essentially composed of an extensible bytecode interpreter using state-of-the-art and innovative techniques. Written in C and assuming minimal system dependencies, the interpreter emphasizes high-level techniques to support efficient execution.

SableVM introduces several innovative ideas. A bidirectional layout for object instances groups reference fields sequentially, which allows efficient garbage collection. A sparse interface virtual table layout reduces the cost of interface method calls to that of normal virtual calls. Another important feature is the inclusion of a technique to improve thin locks by eliminating busy-wait in the presence of contention. In his talk, Gagnon presented SPEC benchmarks that demonstrated the efficiency of this research framework.

This paper won the best student paper award at the conference. Further details on this work can be obtained from the author ([email protected]) and at the Web site (http://www.sablevm.org/).

[Photo: Saul Wold presenting Best Student Paper Award to Étienne Gagnon]

SESSION: JVM INTEGRITY
Summarized by V.N. Venkatakrishnan

DYNAMIC TYPE CHECKING IN JALAPEÑO
Bowen Alpern, Anthony Cocchi, and David Grove, IBM T.J. Watson Research Center

Jalapeño is a JVM for servers. In any JVM, one must sometimes check whether a value of one type can be treated as a value of another type. The overhead for such dynamic type checking can be a significant factor in the running time of some Java programs. Bowen Alpern’s talk presented a variety of techniques for performing these checks, each tailored to a particular restricted case that commonly arises in Java programs. By exploiting compile-time information to select the most applicable technique to implement each dynamic type check, the run-time overhead of dynamic type checking can be significantly reduced.

Bowen introduced the topic by going over the Java type system and the basic types. He then presented the main contributions of this research. This work suggests maintaining three data structures operationally close to every Java object. The most important of these is a display of superclass identifiers of the object’s class. With this array, most dynamic type checks can be performed in four instructions. It also suggests that an equality test of the runtime type of an array and the declared type of the variable that contains it can be an important short-circuit check for object array stores. Together, these techniques result in significant performance improvements on some benchmarks.

The code that implements these techniques is not available in the public domain. The system is available for academic purposes; one may contact the author at [email protected]. Information about the project is available at http://www.research.ibm.com/jalapeno.

PROOF LINKING: DISTRIBUTED VERIFICATION OF JAVA CLASSFILES IN THE PRESENCE OF MULTIPLE CLASS LOADERS
Philip W.L. Fong, Robert D. Cameron, Simon Fraser University

Computations involving bytecode verification can be expensive. To offload this burden within Java Virtual Machines (JVM), distributed verification systems may be created. This can be done using any one of a number of verification protocols, based on such techniques as proof-carrying code and signed verification by trusted authorities. Fong’s research advocates the adoption of a previously proposed mobile code verification architecture, proof linking, as a standard infrastructure for performing distributed verification in the JVM. Proof linking supports various distributed verification protocols. Fong also presented an extension of this work to handle multiple class loaders. Further details on this work can be obtained from the author at [email protected].

JVM SUSCEPTIBILITY TO MEMORY ERRORS
Deqing Chen, University of Rochester; Alan Messer, Philippe Bernadat, and Guangrui Fu, HP Labs; Zoran Dimitrijevic, University of California, Santa Barbara; David Jeun Fung Lie, Stanford University; Durga Mannaru, Georgia Institute of Technology; Alma Riska, William and Mary College; and Dejan Milojicic, HP Labs

Deqing Chen presented a series of experiments to investigate memory error susceptibility using a JVM and four Java benchmark applications. Chen’s work was woven around the fact that except for very high-end systems, little attention is being paid to high availability. This is particularly true for transient memory errors, which typically cause the entire system to fail. To bring systems closer to mainframe class availability, addressing memory errors at all levels of the system is important. The experiments were done using the technique of fault injection. To increase detection of silent data corruption, JVM data structure checksums were examined. The results that were presented indicated that the JVM’s heap area has a higher memory error susceptibility than its static data area and that up to 39% of all memory errors in the JVM and application could be detected. Such techniques will allow commodity systems to be made much more robust and less prone to transient errors. For further information on this work, the author can be contacted by email at [email protected].

WORK-IN-PROGRESS REPORTS
Summarized by Chiasen (Charles) Chung

IMPLEMENTING JNI IN JAVA FOR JALAPEÑO
Ton Ngo, Steve Smith, IBM T.J. Watson Research Center

This talk addressed the advantages and implications of the JNI implementation in Jalapeño, which is a JVM written in Java and developed at the IBM T.J. Watson Research Center. In order for the JNI functions to reuse the same internal reflection interface in Jalapeño, the implementation is written in Java rather than in C as might be expected. This approach has two benefits: 1) changes in Jalapeño are transparent to the JNI implementation; 2) despite being a native interface, the JNI functions are portable to any platform where Jalapeño is installed.

When a native method is invoked in Jalapeño, a special static method is called to resolve the native method with the corresponding native procedure. The JVM then generates the prologue and epilogue to establish the transition frames from Java to C code. The entry from C to Java code is through JNI functions defined in the specification. In Jalapeño, these are methods collected in a special Java class, and they are compiled dynamically, with special prologue and epilogue to handle the transition.

To resolve references in Jalapeño JNI, each Java object to be passed to native code will be assigned an ID and then stored in a side stack. The native code accesses these objects based on their IDs. In a garbage collection cycle, Jalapeño JNI checks for live references in the native stack frames against the side stack.

The implementation of JNI on the PowerPC/AIX platform has been completed, while the Intel/Linux platform is still under development. The group is currently researching threading for long executions of native methods and issues concerning interaction between Java and native programs. More information can be obtained at http://www.research.ibm.com/jalapeno.

JAREC: RECORD/REPLAY FOR MULTI-THREADED JAVA PROGRAMS
Mark Christiaens, Stijn Fonck, Dries Naudts, Michiel Ronsse, Koen De Bosschere, Ghent University

Debugging multi-threaded programs is difficult because thread races are hard to reenact, thus introducing non-determinism into the debugging. To solve this problem, Mark Christiaens suggested a two-phase “record/replay” technique. JaRec is a program that records and replays the interaction sequence between threads in Java programs using the two monitor operations (enter and exit). Every thread has a Lamport clock which is incremented when the thread leaves or enters a monitor. During the record phase, a trace of the interaction between the threads based on this clock value is generated. These Lamport clock values are recorded in the trace file as timestamps. By forcing the order in which threads enter the monitors based on these timestamps, the thread execution and interaction sequence can be reproduced exactly. Synchronization is forced by waiting for a thread to report.

Both the record and replay phases in JaRec are implemented using the Java Virtual Machine Profiler Interface. The record phase is near completion and the group is currently implementing the replay phase of the system.

KAFFEMIK – A DISTRIBUTED JVM FEATURING A SINGLE ADDRESS SPACE
Johan Andersson, Trinity College

Kaffemik is a scalable distributed JVM based on Kaffe VM. It is designed to run large-scale Java server applications by using clustered workstations. The goal of this project is to investigate scalability issues in a distributed JVM and to improve performance in large-scale Java applications.

Kaffemik is designed as a single JVM abstraction over the cluster by implementing a single address space architecture across all the nodes based on the global memory management protocol. On top of the common local thread operations, Kaffemik supports internode synchronization and remote-node thread creation.

Preliminary benchmark results show that Kaffemik starts local threads significantly faster than remote threads, but is much slower starting local threads compared to Kaffe. Remote threads are even more expensive due to the overhead induced by page-faults.

The current Kaffemik prototype shows that it is costly to implement distributed applications over high-speed clusters on single address space architectures. The next step in the project is to implement a two-level (global and local) memory allocator. A garbage collector for the global memory is also needed, but it is not addressed in this paper.

A JAVA COMPILER FOR MANY MEMORY MODELS
Sam Midkiff, IBM T.J. Watson Research Center

The Java memory model is heavily coupled into the programming language. In hopes of overcoming its various flaws, a new memory model has been proposed. Instead of fixing the memory model, this talk focused on defining the memory model as a property of the code being compiled.

Sam Midkiff proposes a Java compiler that accepts a “.class” file annotated with a memory-model specification. The compiler first represents the program using the Concurrent Static Single Assignment (CSSA) form. Escape analysis is applied to determine the order in which variables should be accessed according to the memory model. Next, the program represented in the CSSA graph is optimized. Finally, the compiler produces an executable that maps the program onto the underlying hardware consistency model.

This work explores the development of a compiler that supports programmable memory models. The relative efficiency of different memory models running on common hardware can be investigated. More information can be obtained from http://www.research.ibm.com/people/m/midkiff/.

STATE CAPTURE AND RESOURCE CONTROL FOR JAVA: THE DESIGN AND IMPLEMENTATION OF THE AROMA VIRTUAL MACHINE
Niranjan Suri, University of West Florida

Aroma VM is a research VM designed to address some of the limitations of current Java VMs. The capabilities for Aroma were motivated by the needs of mobile agent systems and distributed systems.

Aroma provides two key capabilities: the ability to capture the execution state (of either the complete VM or individual threads) and the ability to control the resources used by Java programs running within the VM. The state capture capabilities are useful for load-balancing and survivable systems. The resource-control capabilities are useful for protecting against denial of service attacks, accounting for resource usage, and as a foundation for quality of service. Aroma currently provides both rate and quantity controls for CPU, disk, and network resources.

There is no Just-in-Time compiler for Aroma currently, but there are plans to integrate freely available JIT compilers (such as OpenJIT) in the future. More information on Aroma VM can be obtained from http://nomads.coginst.uwf.edu/.

OPENJIT2: THE DESIGN AND IMPLEMENTATION OF APPLICATION FRAMEWORK FOR JIT COMPILERS
Fuyuhiko Maruyama, Satoshi Matsuoka, Hirotaka Ogawa, Naoya Maruyama, Tokyo Institute of Technology; Kouya Shimura, Fujitsu Laboratories

OpenJIT2 is a JIT compiler for Java written in Java that is based on the “open compilers” construction technique. It serves not only as a JIT compiler but also as an application framework for JIT compilers. This framework allows multiple coexisting JITs to compile different parts of a program.

In the OpenJIT system, each instantiated compiler is a set of Java objects that compiles at least one method. The selection of methods to be compiled is determined through an interface that is based on method attributes. If the attribute does not specify a particular compiler (a set of compilet objects) to be used, the default baseline compiler will be selected.

Both the baseline compiler and compilets are constructed using the OpenJIT2 framework and class library. Without the limitations of OpenJIT1’s relatively simple internal structure, OpenJIT2 uses complex compiler modules to carry out analysis, program transformation, and optimization during compilation. The preliminary result shows that the baseline compiler will have reasonable compilation speed as an optimizing compiler compared with IBM’s jitc and Jalapeño’s optimizing compiler.

The first version of OpenJIT2 is expected to be completed by the second quarter of 2001. Once OpenJIT2 is complete, a more comprehensive runtime performance evaluation will be carried out.
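The selection rule in the OpenJIT2 summary (use the compiler named by a method attribute, otherwise fall back to the baseline compiler) can be sketched roughly as follows. Everything here is hypothetical: CompilerSelector, MethodCompiler, and the string-valued attribute are illustrative names and are not part of the OpenJIT2 class library.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of attribute-driven compiler selection; not OpenJIT2's API. */
final class CompilerSelector {
    /** Stand-in for a compiler (a set of compilet objects in the summary's terms). */
    interface MethodCompiler {
        String compile(String methodName);
    }

    private final Map<String, MethodCompiler> byAttribute = new HashMap<>();
    // Assumption: the baseline compiler is always present and needs no registration.
    private final MethodCompiler baseline = methodName -> "baseline:" + methodName;

    void register(String attributeValue, MethodCompiler compiler) {
        byAttribute.put(attributeValue, compiler);
    }

    /** attributeValue may be null when the method carries no compiler attribute. */
    String compile(String methodName, String attributeValue) {
        // An unknown or absent attribute falls back to the baseline compiler.
        MethodCompiler chosen = byAttribute.getOrDefault(attributeValue, baseline);
        return chosen.compile(methodName);
    }
}
```

Keeping the fallback inside the selector means several compilers can coexist without every method having to name one.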

SESSION: THREADING
Summarized by Okehee Goh

AN EXECUTABLE FORMAL JAVA VIRTUAL MACHINE THREAD MODEL
J. Strother Moore and George M. Porter, University of Texas at Austin

This presentation describes a research project in which formal methods are applied to the Java Virtual Machine (JVM). “Formal methods” is the idea of using mathematics to model and prove things about computing systems. Certain aspects of the JVM are modeled, including classes, objects, dynamic method resolution, and threads. A benefit of modeling software in a mathematical notation is that theorems can be proved about the model. These proofs can be mechanically checked via a theorem prover. This paper discusses several such theorems about the JVM and bytecode programs for it. The theorems were proven with the ACL2 theorem prover.

ACL2 (A Computational Logic for Applicative Common Lisp) is a theorem prover for a functional programming language based on Common Lisp. The JVM is modeled in ACL2 by defining a simulator for it. The state of the JVM consists of three components: a collection of threads, a heap, and a class table. The semantics of each bytecode is represented as a function that transforms the state.

There are certain differences between this model and the JVM. For example, the model does not support bounded arithmetic or exceptions. Many such features were omitted to make it easier to explore alternative modeling and proof techniques. There is ample evidence from other ACL2 case studies that such features can be added without unduly complicating the analysis. Complicated features of JVM bytecode programs, such as thread synchronization, can be analyzed using this mathematical model. Eventually, it should be possible to prove properties about the JVM itself, such as that the bytecode verifier is correct. Because the JVM is a very good abstraction of models such as Java, this will eventually permit mechanically checked correctness proofs about Java software.

More details about ACL2 are available at http://www.cs.utexas.edu/users/moore/acl2. The case studies using ACL2 are at http://www.cs.utexas.edu/users/moore/publications.

TRADE: A TOPOLOGICAL APPROACH TO ON-THE-FLY RACE DETECTION IN JAVA PROGRAMS
Mark Christiaens and Koen De Bosschere, ELIS, Ghent University, Belgium

The worst type of bug occurring in multi-threaded programs is a data race, which occurs when multiple threads modify a common variable in an unordered fashion while they execute. Data races are normally hard to find because they are non-deterministic and non-local.

TRaDe models the ordering of instructions performed by threads through the use of vector clocks. To detect data races, an access history is constructed for every object. When a new read or write operation occurs, it is compared to the previous operations to uncover data-race conditions. However, because the size of each vector clock is proportional to the number of threads, the memory and time consumption is very costly. One way to minimize this cost is to reduce the number of objects for which an access history must be maintained. Objects are divided into two types: local objects, accessible to one thread, and global objects, accessible to several threads. Because only global objects have the potential to be involved in a race, access to those objects must be checked, and the JVM instructions that can change the topology of the object interconnection graph must be observed.

In a benchmark using an implementation of the TRaDe method in the Sun JVM 1.2.1, TRaDe is 1.62 times faster than existing commercial products, with comparable memory requirements. The overhead of data-race detection is still large when compared to normal execution. The authors plan to reduce this gap by applying static analysis techniques such as “escape analysis.”

SESSION: JVM POTPOURRI
Summarized by Johan Andersson

THE HOTSPOT SERVICEABILITY AGENT: AN OUT-OF-PROCESS HIGH-LEVEL DEBUGGER FOR A JAVA VIRTUAL MACHINE
Kenneth Russell, Lars Bak, Sun Microsystems

This talk demonstrated a really useful debugging tool built with Java, the HotSpot Serviceability Agent (SA). This is a set of APIs for the Java programming language, developed to help developers recover high-level state from a HotSpot JVM or core file, making it possible to examine high-level abstract data types. When examining a JVM with a traditional C/C++ debugger, all this high-level information is gone, since these debuggers only deal with raw bits.

The SA can attach to a remote process or a core file, read remote-process memory, and look up symbols in remote processes. In principle, the Solaris version of SA launches a native debugger called dbx to actually interface with a remote process. It then loads a core file or attaches to a running HotSpot JVM process. This allows transparent examination of either live processes or core files, which makes it suitable for debugging the JVM itself or Java applications. In order to examine the high-level data types in Java, the APIs in the SA mirror the C++ structures found in the HotSpot JVM.

Kenneth Russell demonstrated the features found in the SA’s APIs, which seemed to be very useful. It was very easy to traverse the heap and stack, get histograms of allocated objects, and look up symbols. In the future, the SA APIs, which are currently available for Solaris and Windows, will be ported to [...]. Russell said the APIs haven’t been included in the JDK yet, but they are working on making this technology available for end users. The SA sources are currently available to licensees in the HotSpot source bundles.

MORE EFFICIENT NETWORK CLASS LOADING THROUGH BUNDLING
David Hovemeyer, William Pugh, University of Maryland

David Hovemeyer presented bundling, a technique for transferring files over a network. Files that tend to be needed in the same program execution and that are loaded close together are placed together into groups called bundles. Hovemeyer presented an algorithm to divide a collection of files into bundles based on profiles of file-loading behavior. The main motivation for bundling is to improve the performance of network class loading in Java, by transferring as few bytes as possible to make best use of available bandwidth. This is very useful in areas of wireless computing, where bandwidth is a scarce resource.

Before Hovemeyer introduced the bundling algorithm, he discussed the alternatives. The first alternative involves downloading individual files: no unneeded files are transferred, but for each file that is, the cost is high in terms of network latency. The other alternatives are to use monolithic archives such as JAR, thus risking transfer of unwanted files, or to use individual-class loading with on-the-fly compression, which can be time-consuming.

Hovemeyer and Pugh’s bundling approach is a hybrid of the above alternatives, combining the advantages of each. The collection of files making up the application is divided into bundles, which are then compressed. The basic idea is to avoid transferring files that are not used and to transfer files in an order that matches the order of request by the client. The problem is how to divide the collection of files into bundles. To solve this, Hovemeyer talked about establishing class-loading profiles, which can be determined by using training sets of applications to record the order and time at which each class was loaded during execution. The bundling algorithm then uses this information to group the files into bundles, according to the average use in the class-loading profiles.

The experimental results indicated that bundling is a good compromise between on-demand loading and monolithic archives. The results also showed that bundling is no worse than the JAR format when used on an application not included in the training set.

The bundling algorithm is described in detail in the paper. Links to related research done at the University of Maryland can be found at http://www.cs.umd.edu/~pugh/java/.

DETERMINISTIC EXECUTION OF JAVA’S PRIMITIVE BYTECODE OPERATIONS
Fridtjof Siebert, University of Karlsruhe; Andy Walter, Forschungszentrum Informatik (FZI)

Siebert started his talk by presenting the problems with real-time Java and gave a brief definition of Java real-time. To provide Java with real-time support, all operations must be carried out in constant time, or at least the upper bounds for the execution times of Java bytecode operations must be known. Essentially, the worst-case execution time for object allocations, dynamic calls, class initialization, type checking, and monitors must be determined.

The talk presented a JVM called Jamaica, which implements a deterministic JVM and a hard real-time garbage collector (GC). First, Siebert discussed the typical mark-and-sweep GC, followed by a presentation on how garbage collection and memory allocation are implemented in Jamaica to guarantee a hard upper bound for an allocation. To avoid memory fragmentation, compacting or moving garbage collection techniques are usually employed. However, Jamaica takes a new turn on this issue in order to avoid fragmentation altogether. The heap is divided into small, fixed-sized blocks (32 bytes). An object, depending on its size, is assembled as a linear list of possibly non-contiguous blocks. With this model there is no need to defragment memory and move objects. When a block is allocated, the GC scans a certain number of blocks. This approach can guarantee that the system does not run out of memory, as well as guaranteeing an upper bound for the garbage collection work for the allocation of one block of memory.

The rest of the talk focused on how to obtain deterministic bytecode execution. Most bytecode operations can be implemented directly as a short sequence of machine instructions that executes in constant time. These operations include access to local variables and the Java stack, arithmetic instructions, comparisons, and branches. Siebert briefly discussed this but focused more on the bytecodes where deterministic implementation is not straightforward: for example, class initialization, type checking, and method invocation. The details of this can be found in the paper.

Finally, Jamaica’s performance was compared to Sun’s JDK implementation using SPECjvm98. The results suggested that performance comparable with Sun’s non-deterministic implementations can be reached, for example by tuning the compiler and by direct generation of machine code instead of using C as the current intermediate representation.

For more information, contact the authors or visit http://www.aicas.com.
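The fixed-size-block heap described in the Jamaica summary can be sketched as below. This is a simplified illustration under stated assumptions, not Jamaica’s implementation: the class BlockHeap is hypothetical, blocks are modeled as array indices, each block is assumed to spend 4 of its 32 bytes on a link to the next block, and the incremental garbage collection work done at each allocation is not modeled.

```java
/** Hypothetical sketch of a heap built from fixed-size blocks; not Jamaica's implementation. */
final class BlockHeap {
    static final int BLOCK_BYTES = 32;   // block size quoted in the talk summary
    static final int PAYLOAD_BYTES = 28; // assumption: 4 bytes of each block hold the link
    private static final int NIL = -1;

    private final int[] next; // next[i] links block i to the following block in its list
    private int freeHead;
    private int freeCount;

    BlockHeap(int blockCount) {
        next = new int[blockCount];
        for (int i = 0; i < blockCount; i++) {
            next[i] = (i + 1 < blockCount) ? i + 1 : NIL;
        }
        freeHead = (blockCount > 0) ? 0 : NIL;
        freeCount = blockCount;
    }

    /** Number of blocks needed to hold an object of the given size. */
    static int blocksFor(int objectBytes) {
        return Math.max(1, (objectBytes + PAYLOAD_BYTES - 1) / PAYLOAD_BYTES);
    }

    /** Returns the index of the object's first block, or -1 if too few blocks are free. */
    int allocate(int objectBytes) {
        int needed = blocksFor(objectBytes);
        if (needed > freeCount) {
            return NIL;
        }
        // Take the first 'needed' blocks of the free list, contiguous or not.
        int head = freeHead;
        int tail = head;
        for (int i = 1; i < needed; i++) {
            tail = next[tail];
        }
        freeHead = next[tail];
        next[tail] = NIL;
        freeCount -= needed;
        return head;
    }

    /** Returns every block of the object to the free list. */
    void free(int head) {
        int tail = head;
        int count = 1;
        while (next[tail] != NIL) {
            tail = next[tail];
            count++;
        }
        next[tail] = freeHead;
        freeHead = head;
        freeCount += count;
    }

    int freeBlocks() {
        return freeCount;
    }
}
```

Since any free block can serve any request, freeing every second single-block object still leaves room for a multi-block object of the same total size, which is why no compaction pass is needed in this model.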

SESSION: GARBAGE COLLECTION
Summarized by Hughes Hilton

MOSTLY ACCURATE STACK SCANNING
Katherine Barabash, Niv Buchbinder, Tamar Domani, Elliot K. Kolodner, Yoav Ossia, Shlomit S. Pinter, Ron Sivan, and Victor Umansky, IBM Haifa Research Laboratory; Janice Shepherd, IBM T.J. Watson Research Laboratory

A garbage collector must scan registers and the stacks in order to find objects which can be collected. Typically, there are three types of garbage collector: conservative, type-accurate, or conservative with respect to roots. All three have advantages and disadvantages.

A conservative collector is very simple to implement and has a low performance penalty. However, it must retain some garbage because it is not absolutely positive about what is garbage and what is not. This uncertainty also prohibits object relocation, which means that the heap cannot be compacted, degrading performance over time.

A type-accurate collector is much more complex to implement and is very expensive in terms of performance. However, all object references are known with certainty and therefore all garbage is collected. Objects can also be moved so that memory may be compacted. Type-accuracy also adds the factor that threads may be stopped only where type maps exist. Creating maps at every instruction can be very voluminous (although maps may be compressed somewhat). Certain algorithms, such as polling and patching, allow for better performance but are still comparatively expensive.

Lastly, a conservative approach with respect to roots scans the stack conservatively, but uses object type information to scan objects accurately. This is a compromise between the other two types of garbage collectors and works well. It allows object relocation and is used widely in Java Virtual Machines. However, compaction and some other GC algorithms are still difficult with this method of scanning.

The contribution of this paper is to propose another type of stack scanning: mostly accurate with respect to roots. In this method, the stack is scanned accurately only where it is easy to do so (most stack frames) and scanned conservatively otherwise. Therefore most objects can be relocated (allowing compaction), and the performance hit is minimal. Also, threads can be stopped anywhere. Further information about projects of IBM’s Haifa research group is available at http://www.haifa.il.ibm.com/projects/systems/Runtime_Subsystems.html.

HOT-SWAPPING BETWEEN A MARK&SWEEP AND A MARK&COMPACT GARBAGE COLLECTOR IN A GENERATIONAL ENVIRONMENT
Tony Printezis, University of Glasgow

Two algorithms for generational garbage collection that are often implemented in JVMs are Mark&Sweep and Mark&Compact. The main difference between the two is that Mark&Compact compacts the remaining objects to consolidate free space after garbage collection. These two algorithms are considered here as applied to the old generation of the system; they share the same algorithm for young garbage collections (that is, copying).

The Mark&Sweep algorithm is slightly faster than Mark&Compact in most cases, because Mark&Sweep provides 200-300% faster collection for old objects, although old objects are usually not garbage collected as often as young objects. However, memory fragmentation can occur in a Mark&Sweep system, which can affect long-term performance.

The Mark&Compact algorithm is 10-20% faster in collecting the younger generation of objects because it provides faster allocation of objects to old space (which occurs during young garbage collection). Young garbage collection can occur up to 1000 times more often than old garbage collection, and it must also be taken into account that Mark&Compact defragments memory.

The performance difference between these two types of generational collection is fairly minimal and depends on the behavior of the application involved. However, what if a garbage collector could hot swap between the two types and get the best of both worlds? That was the question that Tony Printezis asked, and the subject of his paper.

The requirements set forth by Printezis for a hot-swapping garbage collector are fairly rigid. It must swap back and forth in constant time, incur a minimal performance penalty from swapping, be flexible, and make minimal changes to the Mark&Sweep and Mark&Compact algorithms. In order to develop the switching algorithm, Printezis had to use a fake byte array class to make a free chunk of memory look like garbage to the Mark&Compact collector, while still looking like a free chunk to the Mark&Sweep collector. He used a simple heuristic for when to swap. Mark&Sweep was used mostly for old garbage collections, but if linear allocation of objects from the young generation to the old generation failed a lot, one pass was made with Mark&Compact to defragment the memory.

In benchmarks, the hot-swapping algorithm fared well. It was the fastest of the three garbage collectors in two of the six benchmarks, and those benchmarks it did not win were very close. Also, the fact that the algorithm prevents memory fragmentation must be taken into account when considering the results. In the future, Printezis wants to develop more complex swapping heuristics, but preliminary results look very promising.

PARALLEL GARBAGE COLLECTION FOR SHARED MEMORY MULTIPROCESSORS
Christine H. Flood, David Detlefs, Sun Microsystems Laboratories; Nir Shavit, Tel-Aviv University; Xiolan Zhang, Harvard University

Since Java is being used increasingly with shared-memory multiprocessor systems, it makes sense that those systems should employ garbage collection algorithms that can take advantage of multiple processors to increase performance. This paper describes how Christine Flood and her fellow researchers parallelized two sequential, stop-the-world garbage collection algorithms: a two-space copying algorithm (semispaces) and a Mark&Sweep algorithm with sliding compaction (Mark&Compact).

Load balancing is a big problem for parallel [...] garbage collection. With eight processors, there was as much as a 5.5x performance gain. The team concluded that parallel garbage collection must be used to avoid [...]

[...] database. Next the VM will restore each heap record. Since the OS can move records in the heap segments, the VM needs to update the pointers.
After all the allel garbage collection. The key to load bottlenecks in large, multi-threaded pointers have been updated, each module balancing is correctly and efficiently par- applications. The contents of this paper of the VM restores their state from the titioning the task of tracing the object and other works appear on Sun’s site at: content in the Store header field before graph. This task does not lend itself to http://www.sun.com/research/jtech/. execution of the application continues. static partitioning, which is too expen- When a program finally terminates, the sive. Another solution might be over-par- SESSION: SMALL DEVICES VM will remove the Store data from the titioning by making more chunks than database. needed and having each processor get a Summarized by Chiasen (Charles) Chung chunk and come back for more. The A program often needs external states or problem with this algorithm is that the AUTOMATIC PERSISTENT MEMORY data that are not under the control of the size of the problem is not necessarily MANAGEMENT FOR THE SPOTLESS JAVA program runtime system. Spotless VM known. The solution is a work-stealing VIRTUAL MACHINE ON THE PALM CONNECTED supports persistence in these states algorithm. In work stealing, threads that ORGANIZER through the implementation of an inter- have work copy some of it to auxiliary Daniel Schneider, Bernd Mathiske, face “External.”External data have to syn- queues, where it is available to be stolen Matthias Ernst, and Matthew Seidl, Sun chronize with the internal data when the by other threads that do not have work to Microsystems, Inc. program is suspended or resumed. To do. PalmOS does not support automatic achieve this, Spotless uses a protocol multi-tasking capabilities. To achieve adopted from the Tycoon-2 system. 
In parallelizing the semispaces algorithm, that, programmers have to implement Flood and her team used work-stealing low-level event callbacks using the OS Disabling write protection creates a new queues to represent the set of objects to database API to suspend and reload their dimension of safety issues for PalmOS. It be scanned, rather than Cheney’s copy applications. The talk proposes an alter- is arguable whether a well-implemented and scan pointers (used traditionally). To native approach to allow transparent VM will not cross its boundary, but hard- avoid contention when many threads multi-tasking support for Java programs ware restriction is suggested. More infor- were allocating objects into space at the running on Spotless VM, a predecessor of mation on Spotless Java Virtual Machine same time, they had each thread allocate KVM. is available at relatively large regions called local alloca- http://www.research.sun.com/spotless/. tion buffers (LABs). To restrict open memory access, the OS provided a simple database API. The API ENERGY BEHAVIOR OF JAVA APPLICATIONS Mark&Compact consists of four phases not only accesses a small subset of RAM FROM THE MEMORY PERSPECTIVE that must be parallelized: marking, for- for the application program but is also N. Vijaykrishnan, M. Kandemir, S. Kim, ward-pointer installation (sweeping), ref- costly. Thus, the database API is bypassed S. Tomar, A. Sivasubramaniam, and M. erence redirection, and compaction. The by calling an undocumented system call J. Irwin, Pennsylvania State University researchers did the mark phase in parallel to disable memory protection. The byte- With mobile and wireless computing using work-stealing queues. They han- code interpreter in the persistent Spotless gaining popular ground, battery lifespan dled the forward-pointer installation by VM still resides in the dynamic memory, has become a growing concern. N. over-partitioning the heap. 
They imple- but all the Java data (including the byte- Vijaykrishnan’s presentation addressed mented the reference redirection phase codes and thread data) are stored in the the energy behavior of the memory sys- by treating the scanning of the young static memory. tem during the execution of Java pro- generation as a single task and reusing grams. It has been observed that memory A program is first started by creating a the previous partitioning done in the for- systems consume a large fraction of the new Store in the resource database tag of ward-pointer installation phase for the overall memory energy. Load/store are the type “appl.”When the program is sus- old generation. Finally, they parallelized the instructions that access the most pended, the VM automatically saves the the compaction phase by using larger- memory, consuming more than 50% of current state of the application by closing grained region partitioning. the total energy in both interpreted and the persistent Store in a controlled man- JIT-compiled programs. As data them- In benchmarks it was found that with the ner. To resume the suspended Spotless selves, byte-codes need to be fetched teams’ algorithms, the more processors VM, it will be retrieved from the Store working, the greater the advantage in from memory, and so interpreters are

20 Vol. 26, No. 5 ;login: August 2001 Java-II, which is running onREALOS. which isrunning Java-II, PersonalJava onto 3.02isported pico- Sun’s bytecode execution performance. ware cache forthestackto improve the thehard- picoJava-II takes advantage of area asaJava memory forward stack, which usesastraight- JVM, traditional Unlike is aJava chip developed atFujitsu. picoJava-II real hardware stackmachine. ona machine running a software virtual This talkfocusedonusingpicoJava-II as Laboratories Ltd. Takashi Aoki,Takeshi Eto,Fujitsu O ~mdl/. can befoundat More onthetalk information sumption. code thatactuallyreduces con- energy produce compilerdesigned will native awell- JIT modeisquite significant, consumed by dynamic compilation in energy Although aware. tors beenergy collec-that heapallocators andgarbage reused across different applicationsand recommended thatclassfilesshouldbe itwas To improve consumption, energy inversely to proportional thecache size consumption is ment thatenergy It wasfoundintheirexperi- is anissue. thus datalocality than on-chip accesses; accesses arememory more expensive cache missessince off-chip quency of consumption isalsodependentonfre- energy accesses, memory frequency of thanthe Other collection. and garbage dynamic methodcompilation, loading, class- heavily inthree areas: lizes memory EVM uti- Java applications, execution of actual the and“db.” Beside on “javac” emphasis with JVM98 benchmark suite, on theseven applicationsfrom theSPEC isbased Theexperiment in experiments. Labs Virtual Machine forResearch used ExactVM (EVM) istheJVMfrom Sun piled code. more memory-intensive thanJIT-com- THE THE N R EAL S OFTWARE H ARDWARE ;login: http://www.cse.psu.edu/ V IRTUAL S TACK M M ACHINE CIEFOR ACHINE JVM ’01 system programming. cache incoherency problem complicates aggregate thestack stacksforsolving of thepresence Lastly, theJava method. 
of picoJava-II follows thecallingconvention theCcompiler of be more efficientif theJNIimplementationcan Next, design. and datacaches complicates software coherency between stack thelackof First, problems encountered intheresearch. open there are anumber of ever, How- petitive JIT-compiled with code. It isalsocom- conventional Cinterpreter. processor better issignificantly thanthe The testing indicates thattheJava micro- it. modified before thehardware canaccept JCChasto be internal of datastructure The performance andreduce code size. alJava to improve class-loading isatool available onPerson- (JCC) pact JavaCodeCom- thenext frame. of start additional computation to resolve the requiring site direction (downward), issue isthatthestackgrows intheoppo- Another before accessing thestackframe. hastoformer beflushedfrequently the between thestackanddatacache, Since there isnocoherency mance. improve bytecode execution perfor- picoJava-II hasa64-word stackcache to picoJava-II. bytecode execution of engine order PersonalJava to port onto thedirect ous modificationshave to bemadein Numer- tecture fromJVMs. traditional picoJava-II hasadifferentarchi- engine Saul Wold &Étienne Gagnon 21 CONFERENCE REPORTS