A Platform for Research Into Object-Level Trace Generation

View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by CiteSeerX A Platform for Research into Object-Level Trace Generation by James Foucar B.S., Computer Science, University of Texas at Austin, 2003 THESIS Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science Computer Science The University of New Mexico Albuquerque, New Mexico December, 2006 c 2006, James Foucar iii Acknowledgments Special thanks to my advisor, Darko Stefanović, for lots of help, guidance, ideas, and support. Thanks to Steve Blackburn, Matthew Hertz, and Hajime Inoue for giving help and correspondence during this project’s development. Thanks to my committee members Patrick Bridges, Hajime Inoue, and Darko Stefanović. Thanks to my friends at Sandia National Labs, especially Rich, for giving support and donating the computational power needed to complete this project. Thanks to Mom and Dad for their support and helping me proof-read this document. This material is based upon work supported by the National Science Foundation (grants CCR-0085792, CCF-0238027, CCF-0540600). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not neces- sarily reflect the views of the sponsors. iv A Platform for Research into Object-Level Trace Generation by James Foucar ABSTRACT OF THESIS Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science Computer Science The University of New Mexico Albuquerque, New Mexico December, 2006 A Platform for Research into Object-Level Trace Generation by James Foucar B.S., Computer Science, University of Texas at Austin, 2003 M.S., Computer Science, University of New Mexico, 2006 Abstract An object-level trace, which we will hereafter refer to as a garbage collection trace (GC- trace), enumerates all the events that happen during program execution that affect the heap. Therefore, GC-tracing is an excellent way to examine the behavior of an object-oriented system. Knowing behavior expedites many forms of virtual machine research including garbage collection research. GC-traces were originally generated by the painfully slow brute-force technique. Subsequently, this technique was replaced by the Merlin algorithm which can generate GC-traces at a fraction of the cost of the brute-force algorithm. Our work introduces a new GC-tracing system. Our system uses the Merlin algorithm within the context of a general shadow heap that is independent of the virtual machine. The shadow heap supports many optimizations to the Merlin algorithm and also supports analysis and verification. We examine the advantages and disadvantages of our approach, the viability of using a C++ library for analysis in conjunction with a Java-in-Java virtual machine, the various costs of integrating our shadow heap with a virtual machine, and vi the effectiveness of our GC-tracing optimizations. Finally, we compare our GC-tracing system with the GC-tracing system that is currently bundled with JikesRVM. vii Contents 1 Introduction 1 2 Background Material 2 2.1 ObjectBasedSystems............................ 2 2.2 VirtualMachines .............................. 3 2.3 GarbageCollectionvs. Explicitdeallocation . ........ 4 2.4 JikesRVM.................................. 10 2.5 GarbageCollectionTrace. 19 2.6 Merlin.................................... 23 3 Hertz’s Merlin System 28 3.1 High-levelDesign.............................. 28 3.2 Howitworksindetail............................ 29 3.3 TraceI/OIssues............................... 32 4 Shadow-heap-based system 35 viii Contents 4.1 High-levelDesign.............................. 35 4.2 Basicshadowheapalgorithm. 37 4.3 IntegrationwithJikesRVM . 39 4.4 DesignDecisions .............................. 41 4.5 ProjectHistory ............................... 45 5 Comparing the Two Systems 50 5.1 Performance................................. 50 5.2 Robustness ................................. 53 5.3 Capabilities ................................. 55 5.4 GC-TraceDistortion ............................ 56 5.5 GC-Tracequality .............................. 58 5.6 Flexibility.................................. 59 5.7 SoftwareEngineering............................ 61 5.8 Conclusion ................................. 62 6 Optimizations 64 6.1 BootRemsetHeap .............................. 64 6.2 RootDiffHeap................................ 69 6.3 SkipGCUpdateHeap ............................ 73 6.4 Eventbuffering ............................... 75 ix Contents 6.5 Missedopportunities ............................ 76 7 Verification 78 7.1 Eventgranularity:Debugalgorithms . .... 78 7.2 Algorithmicgranularity: Verifier Algorithms . ........ 79 7.3 Difficulties with verification against other systems . .......... 82 8 Results 84 8.1 Performance................................. 84 8.2 Verification ................................. 90 9 Related Work 101 9.1 Oracular................................... 101 9.2 Wolczkopatent ............................... 102 9.3 ShadowProcessing ............................. 103 10 Contributions 104 10.1ShadowHeap ................................ 104 10.2 Optimized Shadow Heaps and Corresponding Verifiers . ....... 105 10.3 AnewGC-tracingsystemforJikesRVM . 105 10.4 ImprovementstoHertz’ssystem . 105 10.5Tools..................................... 105 x Contents 10.6Analysis................................... 106 11 Things Learned 107 11.1 GCAlgorithms ............................... 107 11.2 GC-TracingAlgorithms. 107 11.3JikesRVM.................................. 108 11.4 SoftwareDesign............................... 108 11.5 SystemsProgramming . 108 12 Future Work 109 13 Conclusion 110 A Glossary 113 Glossary 114 B Class diagrams 128 C Usage 132 References 134 xi Chapter 1 Introduction GC-tracing is important tool for garbage collection research. We felt that the GC-tracing system that is bundled with JikesRVM, a popular open source VM code, had some room for improvement and we set out to create a new GC-tracing system that works with JikesRVM. The goals for this system were speed, robustness, and independence from JikesRVM. We decided that these goals could be best achieved by removing the GC-tracing system out of JikesRVM and placing it in an independent C++ library. The creation of this system gave us some good insight into the flaws of the implementation and approach of the previous system. We were able to fix some of these flaws; however, other flaws were inherent to the core algorithm used by both systems. We also gained some insight into the merits of our design approach and GC-tracing in general. In this document, we present our new GC-tracing system, the insights we have gained from our research, and a thorough evaluation of both GC-tracing systems. 1 Chapter 2 Background Material In this chapter we introduce the foundational concepts that our research is based upon. We also introduce the research virtual machine that we worked with and the Merlin algorithm [10, 11] that is the core of our GC-tracing system. Terms that are defined in the glossary are in italics. 2.1 Object Based Systems A large proportion of today’s programming projects are realized using object-oriented (OO) languages. The OO approach is based on the philosophy [21] that related data, and the set of operations on that data, should be encapsulated into a single independent entity: an object. The implementation details of the object and its data are hidden from the code outside the object. Each object allows other objects to interact with it through a well- defined interface which is enumerated in its class definition. An object is an instantiation of its class. The logic of an OO program is usually divided into small pieces called methods. Each method defines an operation upon the object. The operations themselves are typically small and primarily consist of messages to other objects to invoke their methods. 2 Chapter 2. Background Material The OO paradigm is popular because it is believed to be well suited to managing and reducing complexity in large-scale projects. This paradigm produces simple and flexible designs that closely correlate with the problem domain. OO code is also considered to be easier to read and maintain than non-OO code. For these reasons, the OO approach has become ubiquitous in industrial software engineering. 2.2 Virtual Machines When a C program is compiled, it creates a binary executable that consists of a long list of instructions that will be executed by the processor. The only protections from error that the programmer has available are those provided by the operating system. For example, most operating systems provide a paged virtual memory system that protects applications from clobbering memory outside of their virtual address space. In other words, virtual memory systems protect programs from each other. Some memory systems will raise exceptions if a program tries to read/write to a memory segment of the address space that should not be read/written. However, there are still a large number of troublesome types of memory errors that will not be caught by a programmer’s operating system. In a virtual machine environment [17], the virtual machine (VM) is an additional layer between a program and the operating system. Programs are compiled into a format that the virtual machine understands and to execute the program one must execute the virtual machine and provide the compiled program as input. While this approach incurs resource overhead [9], it has the advantage of

Load more