ROLP: Runtime Object Lifetime Profiling for Big Data Memory Management

ROLP: Runtime Object Lifetime Profiling for Big Data Memory Management Rodrigo Bruno Duarte Patrício José Simão INESC-ID / Instituto Superior Técnico INESC-ID / Instituto Superior Técnico INESC-ID / ISEL University of Lisbon, Portugal University of Lisbon, Portugal Portugal [email protected] [email protected] [email protected] Luis Veiga Paulo Ferreira INESC-ID / Instituto Superior Técnico INESC-ID / Instituto Superior Técnico University of Lisbon, Portugal University of Lisbon, Portugal [email protected] [email protected] Abstract lifetimes next to each other). This is the case of, for example, Low latency services such as credit-card fraud detection and credit-card fraud detection or website targeted advertise- website targeted advertisement rely on Big Data platforms ment systems that rely on low-latency Big Data platforms (e.g., Lucene, Graphchi, Cassandra) which run on top of mem- (such as graph-based computing or in-memory databases) ory managed runtimes, such as the JVM. These platforms, to answer requests within a limited amount of time (usually however, suffer from unpredictable and unacceptably high specified in Service Level Agreements, SLAs). Such pauses in pause times due to inadequate memory management deci- these platforms delay application requests which can easily sions (e.g., allocating objects with very different lifetimes break SLAs. next to each other, resulting in memory fragmentation). This The root cause of this problem has been previously identi- leads to long and frequent application pause times, breaking fied [7, 15, 16] and can be decomposed in two sub-problems: Service Level Agreements (SLAs). This problem has been i) Big Data platforms hold large volumes of data in memory, previously identified and results show that current memory and ii) data stays in memory for a long period of time (from management techniques are ill-suited for applications that the Garbage Collector perspective), violating the widely ac- hold in memory massive amounts of middle to long-lived cepted assumption that most objects die young [21, 35]. Hold- objects (which is the case for a wide spectrum of Big Data ing this middle to long-lived data in memory leads to an applications). excessive Garbage Collection (GC) effort, as described in Sec- Previous works try to reduce such application pauses tion 2. This results in long and frequent application pauses, by allocating objects off-heap or in special allocation re- which compromise application SLAs, and demonstrates that gions/generations, thus alleviating the pressure on memory current GC techniques, available in most industrial Java Vir- management. However, all these solutions require a combi- tual Machines (such as OpenJDK HotSpot) are not suited for nation of programmer effort and knowledge, source code a wide spectrum of Big Data Applications. access, or off-line profiling, with clear negative impact on Recent works try to take advantage of application knowl- programmer productivity and/or application performance. edge to better adapt Big Data platforms to current GC tech- This paper presents ROLP, a runtime object lifetime pro- niques (in order to alleviate GC’s work). This can be done filing system. ROLP profiles application code at runtime in either through: i) manual refactoring of the application code, arXiv:1804.00702v1 [cs.DC] 9 Mar 2018 order to identify which allocation contexts create objects ii) adding code annotations, or iii) static bytecode rewriting. with middle to long lifetimes, given that such objects need to The modified code reduces the GC effort by either using 1 be handled differently (regarding short-lived ones). This pro- off-heap memory, or by redirecting allocations to scope lim- 2 filing information greatly improves memory management ited allocation regions or generations, leading to reduced decisions, leading to long tail latencies reduction of up to GC effort to collect memory. However, previous works have 51% for Lucene, 85% for GraphChi, and 60% for Cassandra, several drawbacks as they require: i) the programmer to with negligible throughput and memory overhead. ROLP is change application code, and to know the internals of GC implemented for the OpenJDK 8 HotSpot JVM and it does not require any programmer effort or source code access. 1Off-heap is a way to allocate objects outside the scope of the GC.When 1 Introduction using off-heap memory, the programmer is responsible for allocating and collecting memory (by deallocating all previously allocated objects). Big Data applications suffer from unpredictable and unac- 2Scope limited allocation regions are used to allocate objects which are ceptably high pause times due to inadequate memory man- automatically freed (with no GC effort) when the current scope is terminated. agement decisions (e.g., allocating objects with very different Such objects cannot escape scopes. to understand how it can be improved; ii) source code ac- In sum, the contributions of this work are the following: cess, which can be difficult if libraries or code inside the Java Development Kit needs to be modified; and iii) workloads • an online allocation profiling technique that can be to be stable and known beforehand, since off-line profiling used at runtime to help generational garbage collectors performs code optimizations towards a specific workload. taking better memory management decisions, namely Opposed to previous works, the goal of this work is to for object allocation; make GC pauses as short as possible using online application • the implementation of ROLP, integrated with NG2C, profiling. The profiling technique must not have a negative for HotSpot, a widely used production JVM. By im- impact on the application throughput and on memory usage, plementing ROLP in such JVM, both researchers and and should not rely on any programmer effort or previous companies are able to easily test and take advantage application execution traces (such as those used for offline of ROLP; profiling). Profiling information should be used to improve • an extensive set of performance experiments that com- GC decisions, in particular, to allocate objects with similar pare ROLP with G1 (current HotSpot default collector, lifetimes close to each other. Thus, ROLP strives to improve described in detail in Section 2), and NG2C (that takes GC decisions by producing runtime object allocation statis- advantage of programmer knowledge combined with tics. This profiling information must be accurate enough to off-line profiling for allocating objects with similar avoid the need for any programmer intervention or off-line lifetimes close to each other). application profiling. ROLP must reduce application pause- times with negligible throughput or memory overhead, and 2 Background it must work as a pluggable component (which can be en- This section provides background on generational GC, ex- abled or disabled at lunch time) for current OpenJDK JVM plaining its key insights, and why most implementations implementations. available in industrial JVMs are not suited for low-latency To attain such goals, ROLP, a runtime object lifetime pro- Big Data platforms. NG2C, a pretenuring N-Generational filer (which runs inside the JVM), maintains estimations of collector, which ROLP builds upon is also discussed. objects’ lifetimes. This information is obtained by account- ing the number of both allocated and survivor objects. The 2.1 Generational Garbage Collection number of allocated and survivor objects is kept in a global table for each allocation context. This information can then Generational GC is, nowadays, a widely used technique to be used by GC implementations to collocate objects with improve garbage collection [20]. It is based on the observa- similar lifetimes, thus reducing memory fragmentation and tion that objects have different lifetimes and, therefore, in object promotion, the two reasons for long tail latencies in order to optimize the collection process, objects with shorter current generational collectors. lifetimes should be collected more frequently than middle Dynamically perceiving application allocation behavior to long-lived objects. To take advantage of this observation, at runtime is not trivial as no profiling information from the heap (memory space where application objects live) is previous runs can be used, i.e., the profiler must only use divided into generations (from youngest to oldest). Most information acquired at runtime. This removes the need for generational collectors allocate all objects in the youngest offline profiling and ensures that the profiling information generation and, as objects survive collections, survivor ob- is optimized for the current workload. jects are copied to older generations, which are collected less ROLP is implemented inside the OpenJDK 8 HotSpot JVM, frequently. one of the most widely used industrial JVMs. In order to track Most popular JVMs, and specifically the most recent HotSpot allocation site allocations and allocation contexts, profiling collector, called Garbage First (G1) [13], also takes advan- code is inserted during bytecode Just-In-Time compilation tage of the weak generational hypothesis [35] which, as [30]. To take advantage of profiling information, ROLP is previously mentioned, states that most objects die young. integrated with NG2C [7], an N-Generational GC (based on By relying on this hypothesis, G1 divides the heap into two Garbage First [13]) that can allocate/pretenure objects into generation: young (where

ROLP: Runtime Object Lifetime Profiling for Big Data Memory Management

CS 403/503: Programming Languages

C++/CLI Language Specification

C# Programming in Depth Lecture 4: Garbage Collection & Exception

Generating Object Lifetime Traces with Merlin

Modern C++ Object-Oriented Programming

Name, Scope and Binding (2) in Text: Chapter 5

Memory Management Programming Guide

Contents Credits & Contacts

Objective-C Fundamentals

Lecture 7: Binding Time and Storage

Object Lifetime Prediction in Java

On the Prediction of Java Object Lifetimes