Accurate and Efficient Memory Monitoring in a Java Virtual Machine

Submitted by Dipl.-Ing. Philipp Lengauer Submitted at Institute for System Software Supervisor and 1st Examiner o.Univ.-Prof. Dipl.-Ing. Dr. Dr.h.c. Hanspeter Accurate and Efficient Mössenböck 2nd Examiner Associate Professor Memory Monitoring in a Doc. Ing. Petr Tuma, Dr. Java Virtual Machine February, 2017 Doctoral Thesis to obtain the academic degree of Doktor der Technischen Wissenschaften in the Doctoral Program Engineering Sciences JOHANNES KEPLER UNIVERSITAT¨ LINZ Altenbergerstraße 69 4040 Linz, Osterreich¨ www.jku.at DVR 0093696 Abstract Modern applications are often written in an object-oriented style. They allocate large amounts of objects and rely on automatic garbage collection to reclaim them again. Memory-related problems such as out-of-memory exceptions, long garbage collection pauses or performance degradation due to excessive garbage collection activity are hard to find without proper tools. Common memory monitoring tools record the memory state of an application either by sampling (i.e., by periodically taking a snapshot of the memory state) or by instrumentation (i.e., by adding code snippets that record memory events such as object allocations or garbage collector activity). Sampling is often too coarse to track down intricate memory problems, while instrumentation often generates too much overhead. This thesis presents the design and implementation of AntTracks, a novel Java virtual machine that is able to record memory events at object-level with negligible distortion of the application's behavior. The design includes a novel format for memory events designed to be compact and GC-independent, enabling a generic post-processing tool to analyze the memory behavior without any internal knowledge about the GC algorithm that generated the trace. Also, we describe how to generate these events efficiently in just-in- time compiled code and how to increase the trace accuracy with VM-internal information. The thesis also addresses the fact that capturing events at object-level leads to traces that are too big be stored in their full length. It proposes and evaluates several techniques for keeping traces small and manageable. We also present novel techniques for parsing and analyzing such traces. To the best of our knowledge, our approach is the first that captures information at object-level and still only requires as much heap memory as the monitored application. The AntTracks tool is able to show an overview of the memory behavior throughout the application's lifetime. It can reproduce the heap for any point in time, enabling detailed analyses about the heap state. Furthermore, the tool can analyze the changes to the heap over time, enabling analyses about trends in the heap. We provide both a qualitative evaluation, showing that the AntTracks virtual machine only imposes very little overhead compared to other state-of-the-art approaches, as well as a functional evaluation, showing how the recorded information can be used to track down memory-related performance problems. i ii Kurzfassung Moderne Programme sind oft im objektorientierten Stil implementiert. Das heißt, sie legen große Mengen an Objekten an und verlassen sich auf die automatische Speicherbereinigung, um sie wieder freizugeben. Sollten sich nun Probleme im Speicher auftun, sind diese jedoch ohne geeignete Tools/Hilfsprogramme nur schwer zu finden. Aktuelle Werkzeuge beobachten das Speicherverhalten auf zwei Arten: entweder durch Abtasten (dh. durch periodisches Erfassen des gesamten Speicherzustandes) oder durch Instrumentierung (dh. durch Einfügenvon Anweisungen um einzelne Allokationen aufzurechnen). Abtasten ist allerdings oft zu grob, um komplizierte Speicherprobleme aufzuspüren,währendInstrumentierung große zusätzliche Laufzeitkosten verursacht. Diese Arbeit entwirft und implementiert AntTracks, eine neuartige Java Virtuelle Maschine, die in der Lage ist, Speicherereignisse auf Objektebene aufzuzeichnen, dabei aber das Verhalten des beobachteten Pro- gramms nur minimal verzerrt. Ihr Design enthältein neues Format fürkompakte und implementierungsun- abhängigeSpeicherereignisse, die von dem AntTracks Werkzeug verarbeitet und analysiert werden können, ohne, dass es Wissen über den benutzten Algorithmus zur automatischen Speicherbereinigung benötigt. Weiterhin wird beschrieben, wie diese Ereignisse effizient in synchron erstelltem Code aufgezeichnet werden können und wie man deren Genauigkeit auf Basis von internen Informationen der virtuellen Maschine verbessern kann. Außerdem beschäftigtsich die Arbeit auch mit dem Problem, dass die Erfassung auf Ob- jektebene zu sehr vielen kleinen Ereignissen führt, die nicht alle auf unbestimmte Zeit gespeichert werden können.Abschließend werden neuartige Techniken zur Analyse von Speicherereignissen aufgezeigt. Nach unserem Wissen, ist dies der erste Ansatz, der die Informationen auf Objektebene verarbeiten kann und dabei maximal genauso viel Speicher wie das überwachte Programm benötigt.Das AntTracks Werkzeug ist in der Lage eine Ubersicht¨ über das Speicherverhalten der gesamten Lebensdauer des Programms zu zeigen. Das bedeutet, es kann den Speicher fürjeden Zeitpunkt rekonstruieren, was detaillierte Analysen über den Inhalt ermöglicht. Diese Arbeit bietet sowohl eine qualitative Bewertung, als auch eine funktionale Bewertung der AntTracks. Dabei werden die Verzerrung des Programms und die zusätzlichen Laufzeitkosten währendder Uberwachung¨ angezeigt, und darüber hinaus wird beschrieben, wie man mit den aufgezeichneten Ereignissen speicherbed- ingte Probleme finden kann. iii iv Contents Abstract i Kurzfassung iii Table of Contents viii 1 Introduction 1 1.1 Research Questions..........................................2 1.1.1 Quantifying Application Behavior Distortion.......................2 1.1.2 Accurate and Efficient Monitoring.............................2 1.1.3 Memory-related Performance Problems..........................2 1.2 Structure...............................................2 1.3 Contributions.............................................3 2 State of the Art 5 2.1 Allocations..............................................5 2.1.1 Free List Allocations.....................................5 2.1.2 Bump Pointer Allocations..................................6 2.1.3 Thread-local Allocation Buffers...............................6 2.2 Garbage Collection..........................................7 2.2.1 Stop-the-world Collectors with Safepoints.........................7 2.2.2 Generational Collectors and Card Tables..........................8 2.2.3 Stop and Copy Collector...................................8 2.2.4 Mark and Compact Collector................................9 2.2.5 Oracle's ParallelOld Collector................................9 2.2.6 Oracle's Garbage First Collector.............................. 10 2.2.7 Oracle's Concurrent Mark and Sweep Collector...................... 11 2.2.8 Red Hat's Shenandoah Collector.............................. 12 2.2.9 Azul's C4 Collector...................................... 12 2.3 Memory Monitoring Techniques................................... 12 2.3.1 API-based Monitoring.................................... 12 2.3.2 Sampling-based Monitoring................................. 13 2.3.3 Instrumentation-based Monitoring............................. 13 2.3.4 Event-based Monitoring................................... 14 v 2.3.5 Comparison.......................................... 14 2.4 Monitoring Tools........................................... 15 3 Observer Effect 25 3.1 Run Time............................................... 26 3.2 GC Count............................................... 26 3.3 GC Time............................................... 27 3.4 Memory Usage............................................ 28 4 Low-overhead Object and GC Tracing 29 4.1 Common Case Optimizations.................................... 30 4.1.1 Compiled Code Allocations................................. 31 4.1.2 Thread-local Allocations and Thread-local Moves..................... 31 4.1.3 Small Array Allocations................................... 34 4.1.4 Survivor Ratio........................................ 34 4.1.5 Clustered Moves....................................... 34 4.2 Events................................................. 37 4.2.1 Phase Events......................................... 37 4.2.2 Bookkeeping Events..................................... 39 4.2.3 Allocation Events....................................... 40 4.2.4 Move Events......................................... 44 4.2.5 Deallocation Events..................................... 45 4.3 Symbols................................................ 45 4.3.1 Types............................................. 45 4.3.2 Allocation Sites........................................ 45 4.4 Efficient Trace Generation...................................... 46 4.4.1 Event Buffering........................................ 46 4.4.2 Event Firing......................................... 49 4.5 Temporal Ordering of Events.................................... 57 4.6 Trace Portability........................................... 58 4.7 Continuous Tracing.......................................... 58 4.7.1 Compression......................................... 58 4.7.2 Rotation............................................ 60 4.8 Pointer Tracing............................................ 65 4.8.1 Move Events with Pointer Information........................... 65 4.8.2 Dirty Pointer Tracing by Exploiting the Card Table................... 67

Load more