Uniprocessor Garbage Collection Techniques

Paul R. Wilson
University of Texas
Austin, Texas, USA
(wilson@cs.utexas.edu)

Abstract

We survey basic garbage collection algorithms, and variations such as incremental and generational collection. The basic algorithms include reference counting, mark-sweep, mark-compact, copying, and treadmill collection. Incremental techniques can keep garbage collection pause times short, by interleaving small amounts of collection work with program execution. Generational schemes improve efficiency and locality by garbage collecting a smaller area more often, while exploiting typical lifetime characteristics to avoid undue overhead from long-lived objects.

Automatic Storage Reclamation

Garbage collection is the automatic reclamation of computer storage [Knu, Coh, App]. While in many systems programmers must explicitly reclaim heap memory at some point in the program, by using a "free" or "dispose" statement, garbage collected systems free the programmer from this burden. The garbage collector's function is to find data objects that are no longer in use and make their space available for reuse by the running program. An object is considered garbage (and subject to reclamation) if it is not reachable by the running program via any path of pointer traversals. Live (potentially reachable) objects are preserved by the collector, ensuring that the program can never traverse a "dangling pointer" into a deallocated object.

This paper is intended to be an introductory survey of garbage collectors for uniprocessors, especially those developed in the last decade. For a more thorough treatment of older techniques, see [Knu, Coh].

Motivation

Garbage collection is necessary for fully modular programming, to avoid introducing unnecessary inter-module dependencies. A routine operating on a data structure should not have to know what other routines may be operating on the same structure, unless there is some good reason to coordinate their activities. If objects must be deallocated
explicitly, some module must be responsible for knowing when other modules are not interested in a particular object.

Since liveness is a global property, this introduces nonlocal bookkeeping into routines that might otherwise be orthogonal, composable, and reusable. This bookkeeping can reduce extensibility, because when new functionality is implemented, the bookkeeping code must be updated.

The unnecessary complications created by explicit storage allocation are especially troublesome because programming mistakes often introduce erroneous behavior that breaks the basic abstractions of the programming language, making errors hard to diagnose.

Failing to reclaim memory at the proper point may lead to slow memory leaks, with unreclaimed memory gradually accumulating until the process terminates or swap space is exhausted. Reclaiming memory too soon can lead to very strange behavior, because an object's space may be reused to store a completely different object while an old pointer still exists. The same memory may therefore be interpreted as two different objects simultaneously, with updates to one causing unpredictable mutations of the other.

[Footnote: This paper will appear in the proceedings of the International Workshop on Memory Management (St. Malo, France, September), in the Springer-Verlag Lecture Notes in Computer Science series.]

[Footnote: We use the term object loosely, to include any kind of structured data record, such as Pascal records or C structs, as well as full-fledged objects with encapsulation and inheritance, in the sense of object-oriented programming.]

These bugs are particularly dangerous because they often fail to show up repeatably, making debugging very difficult; they may never show up at all until the program is stressed in an unusual way. If the allocator happens not to reuse a particular object's space, a dangling pointer may not cause a problem. Later, in the field, the application may crash when it makes a different set of memory demands, or is linked with a
different allocation routine. A slow leak may not be noticeable while a program is being used in normal ways, perhaps for many years, because the program terminates before too much extra space is used. But if the code is incorporated into a long-running server program, the server will eventually exhaust its swap space and crash.

Explicit allocation and reclamation lead to program errors in more subtle ways as well. It is common for programmers to statically allocate a moderate number of objects, so that it is unnecessary to allocate them on the heap and decide when and where to reclaim them. This leads to fixed limitations on software, making it fail when those limitations are exceeded, possibly years later when memories (and data sets) are much larger. This "brittleness" makes code much less reusable, because the undocumented limits cause it to fail even if it is being used in a way consistent with its abstractions. (For example, many compilers fail when faced with automatically generated programs that violate assumptions about "normal" programming practices.)

These problems lead many applications programmers to implement some form of application-specific garbage collection within a large software system, to avoid most of the headaches of explicit storage management. Many large programs have their own data types that implement reference counting, for example. Unfortunately, these collectors are often both incomplete and buggy, because they are coded up for a one-shot application. The garbage collectors themselves are therefore often unreliable, as well as being hard to use because they are not integrated into the programming language. The fact that such kludges exist despite these problems is a testimony to the value of garbage collection, and it suggests that garbage collection should be part of programming language implementations.

In the rest of this paper, we focus on garbage collectors that are built into a language implementation. The usual arrangement is that the allocation
routines of the language (or imported from a library) perform special actions to reclaim space, as necessary, when a memory request is not easily satisfied. (That is, calls to the "deallocator" are unnecessary because they are implicit in calls to the allocator.)

Most collectors require some cooperation from the compiler (or interpreter) as well: object formats must be recognizable by the garbage collector, and certain invariants must be preserved by the running code. Depending on the details of the garbage collector, this may require slight changes to the code generator, to emit certain extra information at compile time, and perhaps execute different instruction sequences at run time [Boe, WH, DMH]. (Contrary to widespread misconceptions, there is no conflict between using a compiled language and garbage collection; state-of-the-art implementations of garbage-collected languages use sophisticated optimizing compilers.)

The Two-Phase Abstraction

Garbage collection automatically reclaims the space occupied by data objects that the running program can never access again. Such data objects are referred to as garbage. The basic functioning of a garbage collector consists, abstractly speaking, of two parts:

1. Distinguishing the live objects from the garbage in some way, or garbage detection, and
2. Reclaiming the garbage objects' storage, so that the running program can use it.

In practice, these two phases may be functionally or temporally interleaved, and the reclamation technique is strongly dependent on the garbage detection technique.

In general, garbage collectors use a "liveness" criterion that is somewhat more conservative than those used by other systems. In an optimizing compiler, a value may be considered dead at the point that it can never be used again by the running program, as determined by control flow and data flow analysis. A garbage collector typically uses a simpler, less dynamic criterion, defined in terms of a root set and reachability from these roots.

At the point when garbage collection occurs, all globally visible variables of active procedures are considered live, and so are the local variables of any active procedures. The root set therefore consists of the global variables, local variables in the activation stack, and any registers used by active procedures. Heap objects directly reachable from any of these variables could be accessed by the running program, so they must be preserved. In addition, since the program might traverse pointers from those objects to reach other objects, any object reachable from a live object is also live. Thus the set of live objects is simply the set of objects on any directed path of pointers from the roots.

[Footnote: Typically, garbage collection occurs when allocation of an object has been attempted by the running program, but there is not sufficient free memory to satisfy the request. The allocation routine calls a garbage collection routine to free up space, then allocates the requested object.]

Any object that is not reachable from the root set is garbage, i.e., useless, because there is no legal sequence of program actions that would allow the program to reach that object. Garbage objects therefore can't affect the future course of the computation, and their space may be safely reclaimed.

Object Representations

Throughout this paper, we make the simplifying assumption that heap objects are self-identifying, i.e., that it is easy to determine the type of an object at run time. Implementations of statically typed garbage collected languages typically have hidden "header" fields on heap objects, i.e., an extra field containing type information, which

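As an illustration of the reachability criterion described under The Two-Phase Abstraction, the following is a minimal sketch in Python, not taken from the paper. It assumes a toy heap modeled as a dictionary from object names to the names they point to; the names `reachable`, `garbage`, `heap`, and `roots` are hypothetical and chosen only for this example. The first function corresponds loosely to garbage detection (finding everything on a directed path of pointers from the roots), and the second to identifying the storage that could be reclaimed.

```python
from collections import deque


def reachable(roots, pointers):
    """Return the set of objects on any directed path of pointers from the roots.

    roots    -- iterable of object names held in globals, stack slots, registers
    pointers -- dict mapping each object name to the names it points to
    (Both are illustrative stand-ins for a real heap and root set.)
    """
    live = set()
    worklist = deque(roots)
    while worklist:
        obj = worklist.popleft()
        if obj in live:
            continue  # already visited; this also makes cycles terminate
        live.add(obj)
        worklist.extend(pointers.get(obj, ()))
    return live


def garbage(heap, roots):
    """Return the heap objects that are not reachable from the root set."""
    return set(heap) - reachable(roots, heap)


# Toy heap: C and D point to each other but nothing live points to them;
# E points to a live object, but nothing points to E.
heap = {"A": ["B"], "B": [], "C": ["D"], "D": ["C"], "E": ["A"]}
roots = ["A"]

live_objects = reachable(roots, heap)
dead_objects = garbage(heap, roots)
```

In this toy heap only A and B are live. The cycle formed by C and D is garbage under the reachability criterion even though each object is still pointed to, and E is garbage even though it points to a live object, because liveness follows pointers from the roots rather than into them.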