Automatic Garbage Collection

The alternative to manual deallocation of heap space is garbage collection.

Compiler-generated code tracks pointer usage. When a heap object is no longer pointed to, it is garbage, and is automatically collected for subsequent reuse.

Many garbage collection techniques exist. Here are some of the most important approaches:

Reference Counting

This is one of the oldest and simplest garbage collection techniques.

A reference count field is added to each heap object. It counts how many references to the heap object exist. When an object’s reference count reaches zero, it is garbage and may be collected.

The reference count field is updated whenever a reference is created, copied, or destroyed.

When a reference count reaches zero and an object is collected, all pointers in the collected object are also followed and the corresponding reference counts decremented.
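The bookkeeping described above can be sketched as follows. This is a minimal illustration rather than any particular runtime's implementation; the names `Obj`, `inc`, `dec`, `assign` and `freed` are assumptions made for the example, and Python objects stand in for heap cells.

```python
class Obj:
    """A heap object with a reference count field and pointer fields."""
    def __init__(self, name):
        self.name = name
        self.count = 0      # number of references to this object
        self.fields = {}    # pointer fields: field name -> Obj or None

freed = []  # names of collected objects, in collection order

def inc(obj):
    """A reference to obj was created or copied."""
    if obj is not None:
        obj.count += 1

def dec(obj):
    """A reference to obj was destroyed; collect obj if its count hits zero."""
    if obj is None:
        return
    obj.count -= 1
    if obj.count == 0:
        freed.append(obj.name)
        # Follow the pointers in the collected object and decrement
        # the counts of the objects they reference.
        for target in obj.fields.values():
            dec(target)
        obj.fields.clear()

def assign(obj, field, target):
    """Store a pointer in obj.field, updating both reference counts."""
    inc(target)                  # count the new reference first
    dec(obj.fields.get(field))   # then drop the overwritten reference
    obj.fields[field] = target

# A global pointer p references a, and a references b.
a, b = Obj("a"), Obj("b")
p = a
inc(a)
assign(a, "link", b)

# Setting p to null destroys the only reference to a; b is collected in turn.
dec(p)
p = None
print(freed)  # ['a', 'b']
```

Incrementing the new target before decrementing the old one keeps an object alive when a field is reassigned to the value it already holds.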

© © CS 536 Spring 2005 427 CS 536 Spring 2005 428

As shown below, reference counting has difficulty with circular structures.

[Figure: Global pointer P; an object with Reference Count = 2, Link, Data; an object with Reference Count = 1, Link, Data.]

If pointer P is set to null, the object’s reference count is reduced to 1. Both objects have a non-zero count, but neither is accessible through any external pointer. The two objects are garbage, but won’t be recognized as such.

If circular structures are common, then an auxiliary technique, like mark-sweep collection, is needed to collect garbage that reference counting misses.

Mark-Sweep Collection

Many collectors, including mark & sweep, do nothing until heap space is nearly exhausted.

Then the collector executes a marking phase that identifies all live heap objects.

Starting with global pointers and pointers in stack frames, it marks reachable heap objects. Pointers in marked heap objects are also followed, until all live heap objects are marked.

After the marking phase, any object not marked is garbage that may be freed. We then sweep through the heap, collecting all unmarked objects. During the sweep phase we also clear all marks from heap objects found to be still in use.
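The two phases can be sketched as follows, under simplifying assumptions: the heap is a Python list, each object has a single pointer field `link`, and the roots are given explicitly. The names `Obj`, `mark` and `sweep` are made up for the illustration. The example also shows that a cycle unreachable from the roots is reclaimed, which reference counting alone would miss.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.marked = False
        self.link = None   # a single pointer field, for simplicity

def mark(roots):
    """Marking phase: mark every object reachable from the roots."""
    stack = [r for r in roots if r is not None]
    while stack:
        obj = stack.pop()
        if obj.marked:
            continue
        obj.marked = True
        if obj.link is not None:
            stack.append(obj.link)   # follow pointers in marked objects

def sweep(heap):
    """Sweep phase: visit every heap object, free the unmarked ones,
    and clear the marks on the objects still in use."""
    live, freed = [], []
    for obj in heap:
        if obj.marked:
            obj.marked = False
            live.append(obj)
        else:
            freed.append(obj)
    return live, freed

# x and y point to each other but no root reaches them; z is reachable.
x, y, z = Obj("x"), Obj("y"), Obj("z")
x.link, y.link = y, x
heap = [x, y, z]
roots = [z]

mark(roots)
heap, freed = sweep(heap)
print([o.name for o in heap], [o.name for o in freed])  # ['z'] ['x', 'y']
```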

Mark-sweep garbage collection is illustrated below.

[Figure: mark-sweep example. Labels: Global pointer, Global pointer, Internal pointer; Object 1, Object 3, Object 5.]

Objects 1 and 3 are marked because they are pointed to by global pointers. Object 5 is marked because it is pointed to by object 3, which is marked. Shaded objects are not marked and will be added to the free-space list.

In any mark-sweep collector, it is vital that we mark all accessible heap objects. If we miss a pointer, we may fail to mark a live heap object and later incorrectly free it. Finding all pointers is a bit tricky in languages like Java, C and C++, that have pointers mixed with other types within data structures, implicit pointers to temporaries, and so forth. Considerable information about data structures and frames must be available at run-time for this purpose.

In cases where we can’t be sure if a value is a pointer or not, we may need to do conservative garbage collection.

In mark-sweep garbage collection all heap objects must be swept. This is costly if most objects are dead. We’d prefer to examine only live objects.


Compaction

After the sweep phase, live heap objects are distributed throughout the heap space. This can lead to poor locality. If live objects span many memory pages, paging overhead may be increased. Cache locality may be degraded too.

We can add a compaction phase to mark-sweep garbage collection.

After live objects are identified, they are placed together at one end of the heap. This involves another tracing phase in which global, local and internal heap pointers are found and adjusted to reflect the object’s new location.

Pointers are adjusted by the total size of all garbage objects between the start of the heap and the current object. This is illustrated below:

[Figure: compaction example. Labels: Global pointer, Adjusted Global pointer, Adjusted internal pointer; Object 1, Object 3, Object 5.]

Compaction merges together freed objects into one large block of free heap space. Fragments are no longer a problem.

Moreover, heap allocation is greatly simplified. Using an “end of heap” pointer, whenever a heap request is received, the end of heap pointer is adjusted, making heap allocation no more complex than stack allocation.
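The pointer adjustment rule and the “end of heap” allocation can be sketched as follows. This is a simplified model, not a real allocator: addresses are plain integers, liveness is assumed to have been computed by an earlier marking phase, and the names `Obj`, `compact` and `allocate` as well as the sizes in the example are hypothetical.

```python
class Obj:
    def __init__(self, addr, size, live, ptrs=()):
        self.addr = addr         # start address in the heap
        self.size = size
        self.live = live         # result of an earlier marking phase
        self.ptrs = list(ptrs)   # internal pointers, stored as addresses

def compact(heap, roots):
    """Slide live objects toward the start of the heap.

    heap is a list of Obj in address order; roots is a list of addresses.
    Returns (live objects, adjusted roots, end-of-heap pointer).
    """
    new_addr = {}   # old address -> new address
    garbage = 0     # total size of garbage between heap start and object
    for obj in heap:
        if obj.live:
            new_addr[obj.addr] = obj.addr - garbage
        else:
            garbage += obj.size
    survivors = [o for o in heap if o.live]
    for obj in survivors:
        obj.ptrs = [new_addr[p] for p in obj.ptrs]   # adjust internal pointers
        obj.addr = new_addr[obj.addr]
    roots = [new_addr[r] for r in roots]             # adjust global/local pointers
    end = survivors[-1].addr + survivors[-1].size if survivors else 0
    return survivors, roots, end

def allocate(end_of_heap, size):
    """Bump allocation: return (address of the new object, new end of heap)."""
    return end_of_heap, end_of_heap + size

# Objects at 0, 24 and 56 are live; the object at 24 points to the one at 56.
heap = [Obj(0, 16, True), Obj(16, 8, False), Obj(24, 16, True, [56]),
        Obj(40, 16, False), Obj(56, 8, True)]
live, roots, end_of_heap = compact(heap, [0, 24])
print([(o.addr, o.ptrs) for o in live], roots, end_of_heap)
new_obj, end_of_heap = allocate(end_of_heap, 12)
```

In the example, the object at address 24 moves down by 8 (one garbage object precedes it) and the object at 56 moves down by 24, so the internal pointer becomes 32.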

Because pointers are adjusted, compaction may not be suitable for languages like C and C++, in which it is difficult to unambiguously identify pointers.

Copying Collectors

Compaction provides many valuable benefits. Heap allocation is simple and efficient. There is no fragmentation problem, and because live objects are adjacent, paging and cache behavior is improved.

An entire family of garbage collection techniques, called copying collectors, is designed to integrate copying with recognition of live heap objects. Copying collectors are very popular and are widely used.

Consider a simple copying collector that uses semispaces. We start with the heap divided into two halves: the from and to spaces.
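A minimal sketch of a semispace collector of this kind follows; the slides after this point walk through the same steps (tracing roots, leaving forwarding pointers, then traversing the copied objects). Python lists stand in for the two spaces, and the names `Obj`, `copy` and `collect` and the object graph in the example are assumptions made for illustration.

```python
class Obj:
    def __init__(self, name):
        self.name = name
        self.fields = []      # pointers to other heap objects
        self.forward = None   # forwarding pointer, set once the object is copied

def copy(obj, to_space):
    """Copy obj into the to space (at most once); return its new location."""
    if obj is None:
        return None
    if obj.forward is None:
        clone = Obj(obj.name)
        clone.fields = list(obj.fields)   # these still point into the from space
        obj.forward = clone               # leave a forwarding pointer behind
        to_space.append(clone)            # next available position in the to space
    return obj.forward

def collect(roots):
    """Copy everything reachable from roots; return (new roots, to space)."""
    to_space = []
    new_roots = [copy(r, to_space) for r in roots]
    scan = 0
    while scan < len(to_space):           # traverse all copied objects
        obj = to_space[scan]
        obj.fields = [copy(f, to_space) for f in obj.fields]
        scan += 1
    return new_roots, to_space

# Hypothetical from space: objects 1 and 3 are roots and both point to 5;
# objects 2 and 4 are garbage and are never touched.
o1, o2, o3, o4, o5 = (Obj(str(i)) for i in range(1, 6))
o1.fields = [o5]
o3.fields = [o5]

new_roots, to_space = collect([o1, o3])
print([o.name for o in to_space])
```

After `collect` returns, the to space would take over the role of the from space, and `len(to_space)` plays the part of the end of heap pointer. The forwarding pointer is what ensures object 5 is copied only once even though two objects reference it.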


Initially, we allocate heap requests from the from space, using a simple “end of heap” pointer. When the from space is exhausted, we stop and do garbage collection.

Actually, though, we don’t collect garbage. We collect live heap objects; garbage is never touched.

We trace through global and local pointers, finding live objects. As each object is found, it is moved from its current position in the from space to the next available position in the to space.

The pointer is updated to reflect the object’s new location. A “forwarding pointer” is left in the object’s old location in case there are multiple pointers to the same object.

This is illustrated below:

[Figure: labels: Global pointer, Global pointer, Internal pointer; Object 1, Object 3, Object 5 (From Space); To Space.]

The from space is completely filled. We trace global and local pointers, moving live objects to the to space and updating pointers. This is illustrated in Figure 0.1. (Dashed arrows are forwarding pointers.) We have yet to handle pointers internal to copied heap objects. All copied heap objects are traversed. Objects referenced are copied and internal pointers are updated. Finally, the to

and from spaces are interchanged, and heap allocation resumes just beyond the last copied object. This is illustrated in Figure 0.2.

[Figure 0.1: Copying Garbage Collection (b). Labels: Object 5 (From Space); Internal pointer; Object 1, Object 3 (To Space); Global pointer, Global pointer.]

[Figure 0.2: Copying Garbage Collection (c). Labels: To Space; Internal pointer; Object 1, Object 3, Object 5 (From Space); Global pointer, Global pointer; End of Heap pointer.]

The biggest advantage of copying collectors is their speed. Only live objects are copied; deallocation of dead objects is essentially free. In fact, garbage collection can be made, on average, as fast as you wish: simply make the heap bigger. As the heap gets bigger, the time between collections increases, reducing the number of times a live object must be copied. In the limit, objects are never copied, so garbage collection becomes free!

Of course, we can’t increase the size of heap memory to infinity. In fact, we don’t want to make the heap so large that paging is required, since swapping pages to disk is dreadfully slow. If we can make the heap large enough that the lifetime of most heap objects is less than the time between collections, then deallocation of short-lived objects


will appear to be free, though longer-lived objects will still exact a cost.

Aren’t copying collectors terribly wasteful of space? After all, at most only half of the heap space is actually used. The reason for this apparent inefficiency is that any garbage collector that does compaction must have an area to copy live objects to. Since in the worst case all heap objects could be live, the target area must be as large as the heap itself. To avoid copying objects more than once, copying collectors reserve a to space as big as the from space. This is essentially a space-time trade-off, making such collectors very fast at the expense of possibly wasted space.

If we have reason to believe that the time between garbage collections will be greater than the average lifetime of most heap objects, we can improve our use of heap space. Assume that 50% or more of the heap will be garbage when the collector is called. We can then divide the heap into 3 segments, which we’ll call A, B and C. Initially, A and B will be used as the from space, utilizing 2/3 of the heap. When we copy live objects, we’ll copy them into segment C, which will be big enough if half or more of the heap objects are garbage. Then we treat C and A as the from space, using B as the to space for the next collection. If we are unlucky and more than 1/2 the heap contains live objects, we can still get by. Excess objects are copied onto an auxiliary data space (perhaps the

stack), then copied into A after all live objects in A have been moved. This slows collection down, but only rarely (if our estimate of 50% garbage per collection is sound). Of course, this idea generalizes to more than 3 segments. Thus if 2/3 of the heap were garbage (on average), we could use 3 of 4 segments as from space and the last segment as to space.

Generational Techniques

The great strength of copying collectors is that they do no work for objects that are born and die between collections. However, not all heap objects are so short-lived. In fact, some heap objects are very long-lived. For example, many programs create a dynamic data structure at their start, and utilize that structure throughout the program. Copying collectors handle long-lived objects poorly. They are repeatedly traced and moved between semispaces without any real benefit.

Generational garbage collection techniques [Ungar 1984] were developed to better handle objects with varying lifetimes. The heap is divided into two or more generations, each with its own to and from space. New objects are allocated in the youngest generation, which is collected most frequently. If an object survives across one or more collections of the youngest generation, it is “promoted” to the next older generation, which is collected less often. Objects that


survive one or more collections of this generation are then moved to the next older generation. This continues until very long-lived objects reach the oldest generation, which is collected very infrequently (perhaps even never).

The advantage of this approach is that long-lived objects are “filtered out,” greatly reducing the cost of repeatedly processing them. Of course, some long-lived objects will die and these will be caught when their generation is eventually collected.

An unfortunate complication of generational techniques is that although we collect older generations infrequently, we must still trace their pointers in case they reference an object in a newer generation. If we don’t do this, we may mistake a live object for a dead one. When an object is promoted to an older generation, we can check to see if it contains a pointer into a younger generation. If it does, we record its address so that we can trace and update its pointer. We must also detect when an existing pointer inside an object is changed. Sometimes we can do this by checking “dirty bits” on heap pages to see which have been updated. We then trace all objects on a page that is dirty. Otherwise, whenever we assign to a pointer that already has a value, we record the address of the pointer that is changed. This information then allows us to only trace those objects in older

generations that might point to younger objects.

Experience shows that carefully designed generational garbage collectors can be very effective. They focus on objects most likely to become garbage, and spend little overhead on long-lived objects. Generational garbage collectors are widely used in practice.

Conservative Garbage Collection

The garbage collection techniques we’ve studied all require that we identify pointers to heap objects accurately. In strongly typed languages like Java or ML, this can be done. We can table the addresses of all global pointers. We can include a code value in a frame (or use the return address stored in a frame) to determine the routine a frame corresponds to. This allows us to then determine what offsets in the frame contain pointers. When heap objects are allocated, we can include a type code in the object’s header, again allowing us to identify pointers internal to the object.

Languages like C and C++ are weakly typed, and this makes identification of pointers much harder. Pointers may be type-cast into integers and then back into pointers. Pointer arithmetic allows pointers into the middle of an object. Pointers in frames and heap objects need not be initialized, and may contain random values. Pointers may overlay integers in unions,


making the current type a dynamic property.

As a result of these complications, C and C++ have the reputation of being incompatible with garbage collection. Surprisingly, this belief is false. Using conservative garbage collection, C and C++ programs can be garbage collected.

The basic idea is simple: if we can’t be sure whether a value is a pointer or not, we’ll be conservative and assume it is a pointer. If what we think is a pointer isn’t, we may retain an object that’s really dead, but we’ll find all valid pointers, and never incorrectly collect a live object. We may mistake an integer (or a floating value, or even a string) for a pointer, so compaction in any form can’t be done. However, mark-sweep collection will work.

Garbage collectors that work with ordinary C programs have been developed [BW 1988]. User programs need not be modified. They simply are linked to different library routines, so that malloc and free properly support the garbage collector. When new heap space is required, dead heap objects may be automatically collected, rather than relying entirely on explicit free commands (though frees are allowed; they sometimes simplify or speed heap reuse).

With garbage collection available, C programmers need not worry about explicit heap management. This reduces programming effort and eliminates errors in which objects are

prematurely freed, or perhaps never freed. In fact, experiments have shown [Zorn 93] that conservative garbage collection is very competitive in performance with application-specific manual heap management.
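The conservative marking rule described in this section (treat any value that could be a pointer as a pointer) can be sketched as follows. This is a toy model, not the collector cited above: the heap is a dictionary from start address to size and contents, words are plain integers, and the names `find_object` and `conservative_mark` and all addresses are hypothetical.

```python
import bisect

def find_object(word, starts, sizes):
    """Return the start address of the allocated object containing word,
    or None if word does not fall inside any allocated object."""
    i = bisect.bisect_right(starts, word) - 1
    if i >= 0 and word < starts[i] + sizes[starts[i]]:
        return starts[i]
    return None

def conservative_mark(root_words, heap):
    """heap maps start address -> (size, list of words stored in the object).
    Every word that lands inside an object is assumed to be a pointer,
    including pointers into the middle of an object."""
    starts = sorted(heap)
    sizes = {s: heap[s][0] for s in starts}
    marked = set()
    work = list(root_words)
    while work:
        target = find_object(work.pop(), starts, sizes)
        if target is not None and target not in marked:
            marked.add(target)
            work.extend(heap[target][1])   # scan the words inside the object
    return marked

heap = {
    1000: (16, [1032]),   # holds a real pointer to the object at 1032
    1016: (16, []),       # unreachable
    1032: (8, [7]),       # holds a small integer, which matches no object
    1040: (24, []),       # dead, but a root integer happens to look like a pointer
}
# Root words from globals and frames: a pointer, an integer, and an integer
# (1050) that falls inside the object at 1040.
marked = conservative_mark([1000, 42, 1050], heap)
garbage = sorted(set(heap) - marked)
print(sorted(marked), garbage)
```

The object at 1040 is retained only because the integer 1050 falls inside it. That is the cost of being conservative, and it is also why objects cannot be moved: the collector cannot safely rewrite a word that may be an integer.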
