Exploring the Barrier to Entry - Incremental Generational Garbage Collection for Haskell

Exploring the Barrier to Entry - Incremental Generational Garbage Collection for Haskell A.M. Cheadle & A.J. Field S. Marlow & S.L. Peyton Jones Imperial College London Microsoft Research, Cambridge famc4,[email protected] fsimonmmar,[email protected] R.L. While The University of Western Australia, Perth [email protected] ABSTRACT Eventually, however, the region(s) containing long-lived We document the design and implementation of a \produc- objects (the \older" generation(s)) will fill up and it will be tion" incremental garbage collector for GHC 6.2. It builds necessary to perform a so-called major collection. on our earlier work (Non-stop Haskell) that exploited GHC's Major collections are typically expensive operations be- dynamic dispatch mechanism to hijack object code pointers cause the older generations are usually much larger than so that objects in to-space automatically scavenge them- the young one. Furthermore, collecting an old generation selves when the mutator attempts to \enter" them. This requires the collection of all younger generations so, regard- paper details various optimisations based on code speciali- less of the actual number of generations, the entire heap sation that remove the dynamic space, and associated time, will eventually require collection. Thus, although genera- overheads that accompanied our earlier scheme. We de- tional collection ensures a relatively small mean pause time, tail important implementation issues and provide a detailed the pause time distribution has a \heavy tail" due to the in- evaluation of a range of design alternatives in comparison frequent, but expensive, major collections. This renders the with Non-stop Haskell and GHC's current generational col- technique unsuitable for applications that have real-time re- lector. We also show how the same code specialisation tech- sponse requirements, for example certain interactive or real- niques can be used to eliminate the write barrier in a gen- time control systems. erational collector. The traditional way to reduce the variance of the pause times is to perform the garbage collection incrementally: Categories and Subject Descriptors: D.3.4 [Program- rather than collect the whole heap at once, a small amount ming Languages]: Processors|Memory management of collection is performed periodically. In this way the activ- (garbage collection) ities of the executing program (referred to as the mutator) and garbage collector are interleaved. General Terms: Algorithms, Design, Experimentation, One way to achieve incremental collection in the context Measurement, Performance, Languages of copying garbage collectors [8] is to use a read barrier to prevent the mutator from accessing live objects that have 1 Keywords: Incremental garbage collection, yet to be copied. This is the basis of Baker's algorithm [2] . Non-stop Haskell In our earlier paper [7] we described a low-cost mechanism for supporting the read barrier in the GHC implementa- 1. INTRODUCTION tion of Haskell with a single-generation heap. This exploits Generational garbage collection [24] is a well-established the dynamic dispatch mechanism of GHC and works by in- technique for reclaiming heap objects no longer required by tercepting calls to object evaluation code (in GHC this is a program. Short-lived objects are reclaimed quickly and called the object's entry code) in the case where the garbage efficiently, and long-lived objects are promoted to regions of collector is on and where the object has been copied but the heap which are subject to relatively infrequent collec- not yet scavenged. This `hijacking' mechanism directs the tions. It is therefore able to manage large heap spaces with call to code that automatically scavenges the object (\self- generally short pause times, these predominantly reflecting scavenging" code) prior to executing its entry code; this the time to perform minor collections. eliminates the need for an explicit read barrier. In implementing this \barrierless" scheme we had to mod- ify the behaviour of objects copied during garbage collection so that an attempt to enter the object lands in the Permission to make digital or hard copies of all or part of this work for self-scavenging code. At the same time, we had to devise a personal or classroom use is granted without fee provided that copies are mechanism for invoking the object's normal entry code after not made or distributed for profit or commercial advantage and that copies 1 bear this notice and the full citation on the first page. To copy otherwise, to An alternative is to use replication [13] which replaces the republish, to post on servers or to redistribute to lists, requires prior specific read barrier with a write barrier, but at the expense of addi- permission and/or a fee. tional work in maintaining a log of outstanding writes. This ISMM'04, October 24–25, 2004, Vancouver, British Columbia, Canada. paper focuses on Baker's approach, although the replication Copyright 2004 ACM 1-58113-945-4/04/0010 ...$5.00. approach is the subject of current work. execution of the self-scavenging code. We chose to do this by and compare the approach with that proposed by Ro- augmenting copied objects with an extra word which retains jemo in [16] (Section 5). a reference to the original object entry code after the entry code pointer has been hijacked. This extra word comes at 2. BACKGROUND a price: in addition to the overhead of managing it, it also influences the way memory is used. For example, in a fixed- We assume that the reader is familiar with Baker's in- size heap a space overhead reduces the amount of memory cremental collection algorithm [2] and the fundamentals of available for new object allocation which can reduce the av- generational garbage collection [24]. However, we review erage time between garbage collections and increase the total some notation. number of collections. In this paper we assume that all collection is performed An alternative approach is to build, at compile time, spe- by copying live objects from a from-space to a to-space. In cialised entry code for each object type that comes in two Baker's algorithm copying is synonymous with evacuation. flavours: one that executes normal object entry code and Evacuated objects are scavenged incrementally and the set one that first scavenges the object and then executes its en- of objects that have been evacuated but not scavenged con- try code. This eliminates the need for the extra word as stitute the collector queue. we can simply flip from one to the other when the object is A read barrier is required on each object access to ensure copied during garbage collection. The cost is an increase in that objects in from-space are copied to to-space before the the size of the static program code (code bloat). mutator is allowed to access them. A significant component The main objective of this paper is to detail how this of this barrier code is a test that first determines whether code specialisation can be made to work in practice and to the garbage collector is on (GC-ON) before applying the evaluate the effect of removing the dynamic space overhead from-space test. on both the mutator and garbage collector. Although the In generational collection we shall assume in the discus- idea is simple in principle it interacts in various subtle ways sions that follow that there are just two generations: the with GHC. young generation and the old generation, although our im- Specialisation also facilitates other optimisations, for ex- plementations can be configured to handle an arbitrary num- ample the elimination of the write barrier associated with ber of generations. We assume that objects are aged in ac- generational collectors. We show how the basic scheme can cordance with the number of collections they survive. We be extended to do this, although in the end we chose not to distinguish object tenuring, the process of ageing an object implement the extension as the average performance gain is within a generation, from object promotion, whereby an ob- shown to be small in practice. However, for collectors with ject is deemed sufficiently old for it to be relocated to the a more expensive write barrier, or write-intensive applica- next oldest generation. tions, one might consider the optimisation worthwhile. The remainder of this section focuses on the implementa- Experimental evaluation shows that a modest increase in tion platform that is the subject of this paper. static code size (25% over and above stop-and-copy and an 2.1 GHC and the STG-machine additional 15% over our earlier implementation of Non-stop Haskell) buys us an incremental generational collector that GHC (the Glasgow Haskell Compiler) implements the increases average execution time by a modest 4.5% and re- Spineless Tagless G-machine (STG) [21, 18, 19], which is duces it by 3.5% compared with stop-and-copy and Non-stop a model for the compilation of lazy functional languages. Haskell respectively, for our chosen benchmarks. Our solu- In the STG-machine every program object is represented tion is a compromise that exploits specialisation to remove as a closure. The first field of each closure is a pointer to the expensive read barrier but retains the write barrier to statically-allocated entry code, above which sits an info ta- yield an acceptable overhead in the static code size. ble that contains static information relating to the object's The paper makes the following contributions: type, notably its layout. An example is shown in Figure 1 for an object with four fields, the first two of which are • We describe how to eliminate the read barrier in GHC pointers (pointers always precede non-pointers). The layout using object specialisation and compare it to our previ- information is used by the collector when evacuating and ous scheme which pays a space overhead on all copied scavenging the closure.

Exploring the Barrier to Entry - Incremental Generational Garbage Collection for Haskell

Memory Diagrams and Memory Debugging

Automatic Detection of Uninitialized Variables

Memory Debugging

GC Assertions: Using the Garbage Collector to Check Heap Properties

Heapviz: Interactive Heap Visualization for Program Understanding and Debugging

Efficient and Programmable Support for Memory Access Monitoring And

Secure Virtual Architecture: Using LLVM to Provide Memory Safety to the Entire Software Stack

Debugging Memory Problems on Cray XT Supercomputers with Totalview Debugger

Detecting Memory Related Errors Using a Valgrind Tool Suite and Detecting Data Races Using Thread Sanitizer Clang

Introduction to C++ for Java Users

Thread & Memory Debugger

SCARPE: a Technique and Tool for Selective Capture and Replay of Program Executions∗