Abstract Mo dels of

Greg Morrisett Matthias Felleisen Rob ert Harp er

Carnegie Mellon Rice University Carnegie Mellon

[email protected] [email protected] [email protected]

1 Memory Safety Abstract

Advanced programming languages manage memory allo ca- Most sp eci cations of garbage collectors concentrate on the

tion and deallo cation automatically. Automatic memory low-level algorithmic details of how to nd and preserveac-

managers, or garbage collectors, signi cantly facilitate the cessible ob jects. Often, they fo cus on bit-level manipula-

programming pro cess b ecause programmers can rely on the tions such as \scanning stack frames," \marking ob jects,"

language implementation for the delicate tasks of nding \tagging data," etc. While these details are imp ortantin

and freeing unneeded ob jects. Indeed, the presence of a some contexts, they often obscure the more fundamental as-

garbage collector ensures memory safety in the same way p ects of memory management: what ob jects are garbage and

that a typ e system guarantees typesafety: no program writ- why?

ten in an advanced programming language will crash due We develop a series of calculi that are just low-level

to problems while allo cation, access, and enough that we can express allo cation and garbage collec-

deallo cation are transparent. However, in contrast to typ e tion, yet are suciently abstract that wemay formally prove

systems, memory management strategies and particularly the correctness of various memory management strategies.

garbage collectors rarely come with a compact formulation By making the heap of a program syntactically apparent, we

and a formal pro of of soundness. Since garbage collectors can sp ecify memory actions as rewriting rules that allo cate

work on the machine representations of abstract values, the values on the heap and automatically dereference p ointers

very idea of providing a pro of of memory safety sounds unre- to such ob jects when needed. This formulation p ermits the

alistic given the lack of simple mo dels of memory op erations. sp eci cation of garbage collection as a relation that removes

The recently develop ed syntactic approaches to the sp ec- p ortions of the heap without a ecting the outcome of the

i cation of language semantics by Felleisen and Hieb [11 ] evaluation.

and Mason and Talcott [18 , 19 ] are the rst execution mo - Our high-level approach allows us to sp ecify in a compact

els that are intensional enough to p ermit the sp eci cation manner a wide variety of memory managementtechniques,

of memory management actions and yet are suciently ab- including standard trace-based garbage collection (i.e., the

stract to p ermit compact pro ofs of imp ortant prop erties. family of copying and mark/sweep collection algorithms),

Starting from the  -S calculus of Felleisen and Hieb, we generational collection, and typ e-based, tag-free collection.

v

design compact sp eci cations of a numb er of memory man- Furthermore, since the de nition of garbage is based on the

agement ideas and proveseveral correctness theorems. semantics of the underlying language instead of the conser-

The basic idea underlying the developmentof our gar- vative approximation of inaccessibility,we are able to sp ecify

bage collection calculi is the representation of a program's and prove the idea that typ e inference can b e used to collect

run-time memory as a global series of syntactic declarations. some ob jects that are accessible but never used.

The program evaluation rules allo cate large ob jects in the



This work was sp onsored in part by the Advanced Research

global declaration, which represents the heap, and automat-

Pro jects Agency (ARPA), CSTO, under the title \The Fox Pro ject:

ically dereference p ointers to such ob jects when needed. As

Advanced Development of Systems Software," ARPA Order No. 8313,

issued by ESD/AVS under Contract No. F19628{91{{0168, Wright

a result, garbage collection can b e sp eci ed as any relation

Lab oratory, Aeronautical Systems Center, Air Force Materiel Com-

that removes p ortions of the current heap without a ecting

mand, USAF, and ARPAgrant No. F33615-93-1-1330. Views and

the result of a program's execution.

conclusions contained in this do cument are those of the authors and

In Section 2, we present a small functional programming

should not b e interpreted as necessarily representing ocial p olicies

or endorsements, either expressed or implied, of Wright Lab oratory

language, gc, with a rewriting semantics that makes allo-

or the United States Government.

cation explicit. We de ne a semantic notion of garbage col-

lection for gc and prove that there is no optimal collection

strategy that is computable. In Section 3, we sp ecify the

\free-variable" garbage collection rule which mo dels trace-

based collectors including mark/sweep and copying collec-

tors. We prove that the free-variable rule is correct and

provide two \implementations" at the syntactic level: the

rst corresp onds to a copying collector, the second to a gen-

erational one.

In Section 4, we formalize so-called \tag-free" collec-

tion algorithms for explicitly-typ ed, monomorphic languages

suchasPascal and Algol [7 , 29, 8]. Weshowhowtorecover

lus [11 ]. Roughly sp eaking, this kind of semantics describ es necessary shap e information ab out values from typ es during

an abstract machine whose states are programs and whose garbage collection. We are able to prove the correctness of

instructions are relations b etween programs. The desired - the garbage collection algorithm by using a well known type

nal state of this abstract machine is an answer program (A) preservation argument.

whose b o dy is a p ointer to some value, suchasaninteger, In Section 5, we justify our semantic de nition of garbage

in the heap. byshowing that Milner-style typ e inference can b e used to

Each rewriting step of a program letrec H in e pro ceeds prove that an ob ject is semantically garbage even though

according to a simple algorithm. If the body of the pro- the ob ject is still reachable. While previous authors have

gram, e ,isnotavariable, it is partitioned into an evalua- sketched this idea [3 , 5, 14, 12], we are the rst to presenta

tion context E (an expression with a hole [ ] in the place formal pro of of this result. The pro of is obtained by casting

of a sub-expression), which represents the control state, and the well known interpretation of typ es as logical relations

an instruction expression I , which roughly corresp onds to into our framework.

a program counter: e = E [I ]. The instruction expression Section 6 discusses related work and Section 7 closes with

0

determines the next expression e and anychanges to the a summary.

0

heap resulting in a new heap H . Putting the pieces to- Due to a lack of space, most of the pro ofs for lemmas

gether yields the next program in the evaluation sequence: are omitted in this pap er. However, full details may be

0 0

letrec H in E [e ] . Each instruction determines one transi- recovered from our companion technical rep ort [20 ].

tion rule of the abstract machine. Formally, a rule denotes a

relation b etween programs. A set of rules denotes the union

2 Mo deling Allo cation: gc

of the resp ective relations.

We use the following conventions: Let G be a set of

Syntax: The syntax of gc (see Figure 1) is that of a con-

0

program relations, and let P and P b e programs. Then,

ventional, higher-order, applicative programming language

G

based on the -calculus. Following the tradition of func-

0 0

P 7! P means P rewrites to P according to one

tional programming, a gc program (P ) consists of some

G



of the rules in G and 7! is the re exive, transitive

mutually recursive de nitions (H ) and an expression (e).

G

closure of 7!.

The global de nitions are useful for de ning mutually recur-

sive pro cedures, but their primary purp ose here is to repre-

G + r is the union of G with the rule r .

sent the run-time heap of a program. In general, there can

be cycles in a heap, so we use letrec instead of let as the

A program P is irreducible with resp ect to G i there

G

0 0

binding form. Expressions are either variables (x), integers

is no rule in G and no P suchthatP 7! P .

(i), pairs (he ;e i), pro jections ( ), abstractions (x:e), or

1 2 i

G

0  0 0

P + P means that P 7! P and P is irreducible

G

applications (e e ).

1 2

with resp ect to G.

Formally, the heap is a series of pairs, called bindings ,

consisting of variables and heap values. Heap values are a

P * means that there exists an in nite sequence of

G

semantically signi cant subset of expressions. The order of

G G G

programs P such that P 7! P 7! P 7!.

i 1 2

the bindings is irrelevantandeachvariable must b e b ound

to at most one heap value in a heap. Hence, we treat heaps

Figure 1 de nes the set of evaluation contexts and in-

as sets and, when convenient, as nite functions. Wewrite

struction expressions for gc . The de nition of evaluation

Dom (H ) to denote the b ound variables of H ,andRng (H )

contexts (E ) re ects the left-to-right, call-by-value evalua-

to denote the heap values b ound in H . We sometimes refer

tion order of the language. All terms to the left of the path

to variables b ound in the heap as locations or pointers.

from the ro ot to the hole are variables; the terms on the

The language contains two binding constructs: x:e

right are arbitrary. Instruction expressions (I ) consist of

binds x in e and letrec H in e binds the variables in Dom (H )

heap values (h), applications of (p ointers to heap-allo cated)

to the expressions Rng (H ) in both the values in Rng (H )

pro cedures to (p ointers to heap-allo cated) values, and pro-

and in e. Following convention, we consider programs to

jections of (p ointers to heap-allo cated) tuples.

b e equivalent up to a consistent -conversion of b ound vari-

The evaluation rules for gc re ect the intentions b ehind

ables.

our choice of instruction expressions. The transition rule al-

Considering programs equivalent mo dulo -conversion

lo c mo dels the allo cation of values in the heap by binding

and the treatment of heaps as sets instead of sequences hides

the value to a new variable and using this variable in its place

many of the complexities of memory management. In par-

in the program. Note that the \]" notation carries the im-

ticular, programs are automatically considered equivalentif

plicit requirement that the newly allo cated variable x cannot

the heap is re-arranged and lo cations are re-named as long

b e in the domain of the heap H . The transition pro j sp ec-

as the \graph" of the program is preserved. This abstraction

i es how a pro jection instruction extracts the appropriate

allows us to fo cus on the issues of determining what bind-

comp onent from a pointer to a heap-allo cated pair. Sim-

ings in the heap are garbage without sp ecifying how such

ilarly, app is a transliteration of the conventional -value

bindings are represented in a real machine.

rule into our mo di ed setting. It binds the formal parame-

ter of a heap-allo cated pro cedure to the value of the p ointer

0

given as the actual argument, and places the expression part

Mathematical Notation: We use X ] X to denote the

0 0

of the pro cedure into the evaluation context. Multiple appli-

union of two disjoint sets, X and X . We use H ] H to

cations of the same pro cedure require -conversion to ensure

denote the union of two heaps whose domains are disjoint.

that the formal parameter do es not con ict with bindings

Weusefe =xge to denote capture-avoiding substitution of

1 2

1

already in the heap . WeuseR to abbreviate the union of

the expression e for the free variable x in the expression e .

1 2

0 0

the rules allo c , pro j , and app .

WeuseX n X to denote fx 2 X j x 62 X g.

1

An alternative rule for application substitutes the actual argu-

ment(y ) for the formal (z ) within e and p erforms no allo cation. This

Semantics: The rewriting semantics for gc is an adapta-

rule is essentially equivalentto app, but the de nition ab ovesimpli-

tion of the standard reduction function of the  -S calcu-

v es the pro ofs of Section 5.

Programs:

(variables) x; y ; z 2 Var

(integers) i 2 Int ::=  j 2 j 1 j 0 j 1 j 2 j 

(expressions) e 2 Exp ::= x j i jhe ;e ij  e j  e j x:e j e e

1 2 1 2 1 2

(heap values) h 2 Hval ::= i jhx ;x ij x:e

1 2

(heaps) H 2 Heap ::= fx = h ;: ::;x = h g

1 1 n n

(programs) P 2 Prog ::= letrec H in e

(answers) A 2 Ans ::= letrec H in x

Evaluation Contexts and Instruction Expressions:

(contexts) E 2 Ctxt ::= []jhE; eijhx; E ij  E j Eej xE

i

(instructions) I 2 Instr ::= h j  x j xy

i

Rewriting Rules

allo c

(allo c ) letrec H in E [h] 7! letrec H ]fx = hg in E [x]



i

(pro j ) letrec H in E [ x] 7! letrec H in E [x ] (H (x)= hx ;x i and i =1; 2)

i i 1 2

app

(app) letrec H in E [xy] 7! letrec H ]fz = H (y )g in E [e] (H (x)= z :e)

Figure 1: The Syntax and Op erational Semantics of gc

Unfortunately, there can be no optimal garbage collector The irreducible programs of gc are either answers or

b ecause determining whether a binding is garbage or not is stuck programs. The latter corresp ond to machine states

undecidable. that result from the misapplication of primitive program

op erations or unb ound variables.

Prop osition 2.4 (Garbage Undecidable) Determining

if a binding is garbage in an arbitrary closed gc program is

De nition 2.1 (Stuck Programs) Aprogram is stuck if

undecidable.

it is of one of the fol lowing forms:

Pro of (sketch): We can reduce the halting problem to an

letrec H in E [ x] (x 62 Dom (H ) or H (x) 6= hx ;x i)

i 1 2

optimal collector by taking an arbitrary program, adding

a binding to the heap and mo difying the program so that

letrec H in E [xy] (x 62 Dom (H ) or H (x) 6= z :e or

if it terminates, it accesses the extra binding. An optimal

y 62 Dom (H ))

collector will collect the binding if and only if the original

program do es not terminate. 2

All programs either diverge or evaluate to an answer or

a stuck program. Put di erently, the evaluation pro cess

de nes a partial function from gc programs to irreducible

3 Reachability-Based Garbage Collection

programs [11 , 30 ].

Since computing an optimal collection is undecidable, a gar-

A Semantic De nition of Garbage Since the semantics of

bage collection algorithm must conservatively approximate

gc makes the allo cation of values explicit, including the

the set of garbage bindings. Most garbage collectors com-

implicit p ointer dereferencing in the language, we can also

pute the reachable set of bindings in aprogram given the

de ne what it means to garbage collect a value in the heap

variables in use in the current instruction expression and

and then analyze some basic prop erties. A binding x = h

control state. All reachable bindings are preserved; the oth-

in the heap of a program is garbage if removing the binding

ers are eliminated.

has no \observable" e ect on running the program. In our

Following Felleisen and Hieb [11 ], reachabilityingc is

case, we consider only integer results and non-termination

formalized by considering free variables. The following \free-

to b e observable.

variable" GC rule describ es bindings as garbage if there are

no references to these bindings in the other bindings, nor in

De nition 2.2 (Kleene Equivalence)

the currently evaluating expression:

(P ;G ) ' (P ;G ) means P + letrec H in x where

1 1 2 2 1 G 1

1

fv

H (x)=i if and only if P + letrec H in y and H (y )=

1 2 G 2 2

2

letrec H ] H in e 7! letrec H in e

1 2 1

(fv)

i. If G = G = R , then we simply write P ' P .

1 2 1 2

if Dom (H ) \ FV(letrec H in e)=;

2 1

A binding is garbage if removing it results in a program that

The fv rule is correct in that it only removes garbage, and

is Kleene equivalent to the original program:

thus computes valid collections. The keys to the pro of of

correctness of fv are a postponement lemma and a diamond

De nition 2.3 (Garbage) If P = letrec H ] fx =

lemma.

hg in e, then the binding x = h is garbage with respect to P

fv R

i P ' letrec H in e. A collection ofaprogram is the same

Lemma 3.1 (Postp onement) If P 7! P 7! P , then

1 2 3

fv R

program with some garbage bindings removed. An optimal 0 0

7! P . there exists a P 7! P such that P

3 1

2 2

col lection of a program is a program with as many garbage

bindings removedaspossible.

Pro of (sketch): By cases on the elements of R . 2

fv R

0

partitioning. It is p ossible to formulate an abstract version Lemma 3.2 (Diamond) If P 7! P and P 7! P ,

1 2 1

2

fv R

0

of such a mechanism, the free-variable tracing algorithm, by

7! P . 7! P and P then there exists a P such that P

3 3 3 2

2

lifting the ideas of mark-sweep and copying collectors to the

level of program syntax.

Pro of (sketch): Assume P = letrec H ] H in E [I ],

1 1 2

fv

We adopt the terminology of copying collection in the

7! P . We can easily P = letrec H in E [I ], and P

2 2 1 1

description of the free-variable tracing algorithm. We use

R

0

showby case analysis on the elements of R that if P 7! P

1

2

two heaps and a set: a \from-heap" (H ), a \scan-set" (S ),

f

0

where P = letrec H ] H ] H in E [e], for some H and

1 2 3 3

2

and a \to-heap" (H ). The from-heap is the set of bindings

t

fv R

0

in the current program and the to-heap will become the 7! P and P 7! P where P = letrec H ] e, then P

3 2 3 3 1

2

set of bindings preserved by the algorithm. The scan-set H in E [e]. 2

3

records the set of variables reachable from the to-heap that

With the Postp onement and Diamond Lemmas in hand,

have not yet b een moved from the from-heap to the to-heap.

it is straightforward to showthatfv is a correct GC rule.

The scan-set is often referred to as the \frontier."

The b o dy of the algorithm pro ceeds as follows: Avari-

fv

0 0

Theorem 3.3 (Correctness of fv) If P 7! P ,then P

able x is removed from S suchthatH has a binding for x.

f

is a col lection of P .

If no such lo cations are in S , the algorithm terminates. Oth-

erwise, it scans the heap value h to which x isboundinthe

0

Pro of: Let P = letrec H ] H in e and let P =

1 2

from-set H , lo oking for free variables. For each y 2 FV(h),

f

fv

0

letrec H in e such that P 7! P . Wemust show P evalu-

1

it checks to see if y has already b een forwarded to the to-set

0

ates to an integer value i P evaluates to the same integer.

H . Only if y is not b ound in H do es it add the variable to

t t

0

Supp ose P + letrec H in x and H (x) = i. By induc-

R

the scan-set S . This ensures that a variable moves at most

tion on the numb er of rewriting steps using the Postp one-

once from the from-heap to the scan-set.

ment Lemma, we can show that P + letrec H ] H in x

R 2

Formulating the free-variable tracing algorithm as a

and clearly (H ] H )(x) = H (x) = i. Now supp ose

2

rewriting system is easy. It requires only one rule that re-

P + letrec H in x and H (x)= i. By induction on the num-

R

lates triples of from-sets, scan-sets, and to-sets:

b er of rewriting steps using the Diamond Lemma, weknow

0 0 0

that there exists an H such that P + letrec H in x and

hH ]fx = hg;S ]fxg;H i =)

R

t

f

fv

0

hH ;S [ (FV(h) n (Dom (H ) ]fxg));H ]fx = hgi

t t

f

letrec H in x 7! letrec H in x. Thus, x must be bound in

0 0

H and since fv only drops bindings, H (x)=i. 2

Initially the free variables of the evaluation context and in-

This theorem shows that a single application of fv results

struction expression, which corresp ond to the \ro ots" of a

in a Kleene equivalent program. A real implementation in-

computation, are placed in S . Computing the free variables

terleaves garbage collection with evaluation. The following

of the context represents the scanning of the \stack" of a

theorem shows that adding fv to R preserves evaluation.

conventional implementation while computing the free vari-

ables of the instruction expression corresp onds to scanning

Theorem 3.4 For al l programs P , (P; R) ' (P; R + fv ).

the \registers." The initial tuple is re-written until we reach

a state where no variable in the scan-set is bound in the

Pro of: Clearly any evaluation under R can be simu-

from-heap. At this p oint, wehave forwarded enough bind-

lated by R + fv simply by not p erforming any fv steps.

ings to the to-heap. This leads to the following free-variable

Thus, if P + A then P + A. Now supp ose P +

R

R+fv R+fv

tracing algorithm rule:

letrec H in x and H (x )=i. Then there exists a nite

1 1 1 1

fva

0

rewriting sequence using R + fv as follows:

letrec H in e 7! letrec H in e

 00 0

(fva) if hH; FV(e); ;i =) hH ;S;H i

R+fv R+fv R+fv R+fv

00

P 7! P 7! P 7!7! letrec H in x

1 2 1 1

and Dom (H ) \ S = ;

We can showby induction on the numb er of rewriting steps

Clearly, the algorithm always terminates since the size of the

in this sequence, using the Postp onement Lemma, that all

from-heap strictly decreases with eachstep. Furthermore,

fv steps can b e p erformed at the end of the evaluation se-

this new rewriting rule is a subrelation of the rule fv, which

quence. This provides us with an alternativeevaluation se-

implies the correctness of the algorithm.

quence where all the R steps are p erformed at the b eginning:

fva fv

0 0

Theorem 3.5 If P 7! P , then P 7! P .

R R R R fv

0 0 0

P 7! P 7! P 7! 7! P 7!

n

1 2

fv fv fv

P 7! P 7!  7! letrec H in x

Pro of: Let P = letrec H in e be a gc program. The rst

n+1 n+2 1 1

step is to prove the basic invariants of the garbage collection



Since fv do es not a ect the expression part of a program

rewriting system: If hH; FV(e); ;i =) hH ;S;H i, then

t

f

0

0

and only removes bindings from the heap, P = letrec H ]

1

H ] H = H and FV(letrec H in e)= S . NowletP =

n

t t

f

fva

H in x for some H . Thus, P + letrec H ] H in x. Since

0

2 2 R 1 2

letrec H in e and supp ose P 7! P . Then,

1

H (x)=i,(H ] H )(x)= i. 2

1 1 2



hH ] H ; FV(e); ;i =) hH ;S;H i:

1 2 2 1

3.1 The Free-Variable Tracing Algorithm

and Dom (H ) \ S = ;. By the invariants,

2

0

FV(letrec H in e)= S ,soDom (H ) \ FV(P )=;. Conse-

1 2

The free-variable GC rule is a speci cation of a garbage col-

fv

0

lection algorithm. It assumes some mechanism for partition-

quently, P 7! P . 2

ing the set of bindings into two disjoint pieces such that one

set of bindings is unreachable from the second set of bindings

If we require that a collection algorithm pro duce a closed

and the b o dy of the program. Real garbage collection algo-

program, then fva is \optimal" in the following weak sense:

rithms need a deterministic mechanism for generating this

fva

0 0

algorithm on the program letrec H in e where H ;H is a if P is a closed program and P 7! P , then P has the

2 1 2

generational partition of the original program's heap. The fewest bindings needed to keep the program closed with-

resulting heap is glued onto H to pro duce a collection of out a ecting evaluation. Assuming each step in the free-

1

the program. variable tracing algorithm takes time prop ortional to the

size (in symb ols) of the heap ob ject forwarded to the to-

gen

0

heap, the time cost of the algorithm is prop ortional to the

letrec H in e 7! letrec H ] H in e

1

2

fva

0

amount of data preserved, not the total amount of data in

(gen)

if letrec H in e 7! letrec H in e

2

2

the original heap.

and H ;H form a generational partition of H

1 2

The rule's soundness follows directly from the Generational

3.2 Generational Garbage Collection

Collection theorem, together with the soundness of the free-

The free-variable tracing algorithm examines al l of the

variable tracing algorithm.

reachable bindings in the heap to determine that a set of

The third reason generational collection is imp ortantis

bindings may be removed. By carefully partitioning the

that empirical evidence shows that \ob jects tend to die

heap into smaller heaps, a garbage collector can scan less

young" [28 ]. That is, recently allo cated bindings are more

than the whole heap and still free signi cant amounts of

likely to b ecome garbage in a small number of evaluation

memory. A generational partition of a program's heap is a

steps. Thus, if we place recently allo cated bindings in

sequence of sub-heaps ordered in sucha way that \older"

younger generations, we can concentrate our collection ef-

generations never have p ointers to \younger" generations.

forts on these generations, ignoring older generations, and

still eliminate most of the garbage.

De nition 3.6 (Generational Partition)

A generational partition of a heap H is a sequenceofheaps

4 Garbage Collection via Typ e Recovery

H ;H ;:::;H such that H = H ] H ]  ]H and for al l i

1 2 n 1 2 n

such that 1  i

i i+1 i+2 n

The delimiters and other tokens of the abstract syntax mark

;. The H are referred to as generations and H is said to

i i

or \tag" heap values with enough information that wecan

be an older generation than H if i

j

distinguish pairs from functions, p ointers from integers, etc.

This allows us to navigate through the memory unambigu-

Given a generational partition of a program's heap, a

ously, but placing tags on heap values and stripping them o

free-variable based garbage collector can eliminate a set of

to p erform a computation can imp ose a heavy overhead on

bindings in younger generations without lo oking at anyolder

the running time and space requirements of programs [26 ].

generations.

An alternative to tagging is the use of typ es to determine the

Theorem 3.7 (Generational Collection)

shap e of an ob ject. If typ es are determined at compile time

Let H ;:::;H be a generational partition of the heap

and evaluation maintains enough information that the typ es

1 n

2 1

), and ] H of P = letrec H in e. Suppose H = (H

of reachable ob jects can always b e recovered, then there is

i

i i

2 1

Dom (H ) \ FV (letrec H ] H ]]H in e)=;. Then

no need to tag values.

i+1 n

i i

fv

2

Anumb er of researchers have made attempts to explore

) in e. P 7! letrec (H n H

i

this alternative [7 , 8 , 3 , 13 , 27, 2], but none of them presented

2

concise characterizations of the underlying techniques with

Pro of: Wemust show that Dom (H ) \ FV(letrec (H n

i

2

correctness pro ofs. In this section, we present the basic idea

) in e)=;. Since H ; ;H is a generational partition H

1 n

i

b ehind typ e-recovery based garbage collection. Wethenin-

of H , for all j ,1  j

j j +1 n

2

tro duce gc-mono, an explicitly typ ed, monomorphic vari-

Hence, FV(H ]]H ) \ Dom (H )=;. Now,

1 i1

i

antof gc. Weshow howto adapt the free-variable trac-

2 2

ing algorithm to recover typ es of ob jects in the heap and

FV(letrec H n H in e) \ Dom (H )

i i

2 2

to use these typ es in the traversal of heap ob jects instead

) ) [ FV(e)) \ Dom (H = (FV(H n H

i i

1

of abstract syntax. The pro of of correctness for this gar-

]]H ) [ FV(e)) = (FV(H ]]H ) [ FV(H

n 1 i1

i

2

bage collection algorithm is given by extending the pro of of

\Dom (H )

i

soundness of the typ e system for gc-mono.

2

= (FV(H ]]H ) \ Dom (H ))[

1 i1

i

1 2

((FV(H ]  ] H ) [ FV(e)) \ Dom (H ))

n

i i

2 1

4.1 gc-mono

)) ]  ] H ) [ FV(e)) \ Dom (H = ;[((FV(H

n

i i

1 2

= FV(letrec H ]  ] H in e) \ Dom (H )

n

i i

gc-mono is an explicitly typ ed, monomorphic variant of

= ;

gc. Thesetoftyp es ( )of gc-mono contains the conven-

tional basic typ es and constructed typ es for typing a func-

2

tional programming language like gc . The expressions of

gc-mono are the same as for gc, except that each raw Generational collection is imp ortantfor three practical

expression (u) is paired with some typ e information (see reasons: First, evaluation of closed, pure gc programs

Figure 2). Heap values are not paired with their typ e, re- makes it easy to maintain generational partitions.

ecting the fact that the memory is \almost tag free." Of

Theorem 3.8 Let P = letrec H in e be a closed program.

course, some typ e information is recorded within heap values

R

that are functions. In a low-level mo del that uses closures

If H ;:::;H is a generational partition of H and P 7!

1 n

0 0

(an explicit value environment paired with co de), this cor-

in e, then H ;:::;H ;H is a generational letrec H ] H

1 n

0

resp onds to maintaining a type environment, recording the

partition of H ] H .

typ es of the values in the closure's environment. The typ e

environment for a closure can b e computed at compile time The second reason generational collection is imp ortant

and shared among multiple instances, just like the co de. is that, given a generational partition, we can directly use

The evaluation contexts (E ) consist of a raw context (U ) the free-variable tracing algorithm to generate a collection

and a typ e ( ). Rawcontexts are lled with instructions (I ) of a program. The following rule invokes the free-variable

Typ es:

(typ es)  2 Type ::= int j    j  ! 

1 2 1 2

Programs:

(expressions) e 2 TExp ::= (u :  )

u 2 UExp ::= x j i jhe ;e ij  e j x::e j e e

1 2 i 1 2

(heap values) h 2 Hval ::= i jhx ;x ijx::e

1 2

(heaps) H 2 Heap ::= fx = h ;:: :;x = h g

1 1 n n

(programs) P 2 Prog ::= letrec H in e

(answers) A 2 Ans ::= letrec H in (x :  )

Evaluation Contexts and Instructions:

(contexts) E 2 TCtxt ::= (U :  )

U 2 UCtxt ::= []jhE; eij h(x: );Eij  E j Eej (x: ) E

i

(instructions) I 2 Instr ::= i jh(x : ); (x : )ij x::e j  (x: ) j (x: )(y : )

1 1 2 2 i 1 2

Rewriting Rules:

allo c

(allo c-int ) letrec H in E [i] 7! letrec H ]fx = ig in E [x]

allo c

(allo c-pair ) letrec H in E [hx : ;x : i] 7! letrec H ]fx = hx ;x ig in E [x]

1 1 2 2 1 2

allo c

(allo c-fn ) letrec H in E [y ::e] 7! letrec H ]fx = y ::eg in E [x]



i

(pro j ) letrec H in E [ (x: )] 7! letrec H in E [x ] (H (x)= hx ;x i;i=1; 2)

i i 1 2

app

0

(app) letrec H in E [(x: )(y : )] 7! letrec H ]fz = H (y )g in E [e] (H (x)= z : :e)

1 2

Figure 2: The Syntax and Op erational Semantics of gc-mono

the heap rule requires \guessing" the typ es of the values in which are a subset of the raw expressions (u). The evalu-

the heap and verifying these guesses simultaneously. The ation rules for gc-mono, named RM , are variants of the

third judgement, .P, asserts the well-typing of a complete rules for gc that largely ignore the typ e information on the

program. sub-expressions of the program's b o dy. Allo cation of inte-

The calculus gc-mono is type sound in that evaluation gers, tuples and functions strips the typ e tag o the heap

of well formed programs cannot get stuck[30]. Akey to the value b efore placing it in the heap. Allo cation of tuples also

pro of of soundness is a typepreservation lemma: removes the tags from the comp onents of the tuple. The

removal of typ e information corresp onds to the passage of

RM

0

Lemma 4.2 (Typ e Preservation) If .P and P 7! P ,

avalue from co de to data and do es not necessarily re ect a

0

then .P .

runtime cost. Pro jection and application are essentially the

same as for gc. Note that substitution of a result expres-

Theorem 4.3 (Typ e Soundness) If .P then either P

sion for an instruction o ccurs \in place," and hence the typ e

RM

0 0

is an answer or else there exists some P such that P 7! P

of the instruction is ascrib ed to the expression.

0

and .P .

The notion of a stuck state is adapted in accordance with

the typ e structure of the language.

4.2 Using Typ es in Garbage Collection

De nition 4.1 (gc-mono Stuck Programs) A

Sp ecifying a valid garbage collection rule that exploits typ es

program is stuck if it is of one of the fol lowing forms:

is straightforward. The key prop erty that the rule must

gc

0

letrec H in E [ (x: )] (x 62 Dom (H ) or

preserveis typepreservation. That is, if P 7! P ,andP is

i

0

H (x) 6= hx ;x i)

well typ ed, then P mustbewell typ ed. One waytoachieve

1 2

letrec H in E [(x: )(y : )] (x or y 62 Dom (H ) or

this goal is to makeitinto a side-condition of the GC rule:

1 2

H (x) 6= z : :e)

mono

letrec H ] H in e 7! letrec H in e

1 2 1

(mono)

if . letrec H in e

The static semantics of gc-mono consists of four judge-

1

ments ab out program comp onents. The rst judgement,

Theorem 4.4 For al l wel l formed gc-mono programs P ,

. e, is a binary relation between a typ e assignment ,

(P; RM ) ' (P; RM + mono ).

mapping a nite set of variables to typ es, and a typ ed ex-

pression e . Figure 3 contains the fairly conventional infer-

Pro of: Since mono preserves typ es, the typ e sound-

ence rules for expressions that generate the static semantics.

ness pro of trivially extends to the dynamic semantics with

Typing heaps and complete programs requires three addi-

mono. This implies that, since P is well formed, P can-

tional judgements. The rst is .h :  which asserts that

not get stuck under either system. Since bindings are only

the heap value h has typ e  under the assumptions in .

dropp ed and not up dated by mono, the results of evaluat-

The judgement is de ned using inference rules (not shown

ing under either system must b e the same. 2

here) similar to the expression-level rules. The second is

0

.H :,which asserts that the variables given typ e  in

0

Like the free-variable rule of gc , mono needs to b e re- are b ound to heap values in H of the appropriate typ e,

ned b efore it can serve as the basis for an implementation. under the assumptions in . The judgement's de nition via

(var) (int )

]fx:  g . (x :  ) . (i : int)

. (u :    ) . (u : ) . (u : )

1 2 1 1 2 2

(pro j ) (i =1; 2) (pair )

. (h(u : ); (u :  )i :    ) . ( (u :    ):  )

2 1 2 2 1 2 i 1 2 i

]fx: g . (u :  ) . (u :  !  ) . (u : )

1 2 1 1 2 2 1

(fn) (app)

. (x: :(u:  ):  !  ) . ((u :  !  )(u :  ):  )

1 2 1 2 1 1 2 2 1 2

0 0 0

; .H : .e 8 x 2 Dom ( ) : ] .H(x): (x)

(prog ) (heap)

0

. letrec H in e

.H :

Figure 3: The Static Semantics of gc-mono

0 0

= ( [ MTA(Tag[ ](h)) n where

s

t s

The re nement is similar to the one from fv to fva but

0

= ]fx: g

t

t

uses the typ es emb edded in programs to traverse heap val-

ues. The basis for the collection algorithm is a function that

The algorithm is initialized by taking the program heap H

determines a minimal, with resp ect to set-inclusion, typ e as-

as the initial from-heap and the minimal typ e assignmentof

signment forany expression e such that .e:

the program expression as .Ateach step in the algorithm,

s

describ es the typ es of all lo cations that are immediately

s

MTA(x: ) = fx: g

reachable from e or H , but have not yet b een forwarded to

t

MTA(i: ) = ;

H . describ es the typ es of all lo cations that havebeen

t t

MTA(he ;e i :    ) = MTA(e ) [ MTA(e )

2

1 2 1 2 1 2

forwarded to H .

t

MTA( e :  ) = MTA(e)

i

When a variable x is found in with typ e  and x is

s

MTA(x: :e :  !  ) = MTA(e) nfx:  g

1 1 2 1

b ound in H to the heap value h, the collector forwards

f

MTA(e e :  ) = MTA(e ) [ MTA(e )

1 2 1 2

the binding x = h to H and adds x: to . It then uses

t t

Tag [ ] to traverse h, placing the necessary typ e information

Lemma 4.5 If .e,thenMTA(e) .e and MTA(e)  .

on the comp onents so that MTA can determine the heap

value's minimal typ e assignment. This step provides the

If P = letrec H in e and P is closed, then MTA(e)de-

lo cations (and their typ es) that are immediately reachable

termines the typ es of the lo cations in the heap H that are

from h. Finally, the collector adds each of these lo cations to

immediately reachable from the expression e. A garbage

unless they have already b een forwarded to H .

s t

collector can use this typ e information together with the

Using this algorithm instead of an aprioripartitioning

following Tag function to traverse the reachable heap values

of the heaps, mono-a b ecomes a high-level sp eci cation of

based on their typ es instead of their abstract syntax.

a collector whose traversal of heap values uses typ es instead

of tag information:

Tag[int](i) = (i: int)

Tag[   ](hx ;x i) = (h(x : ); (x : )i :    )

1 2 1 2 1 1 2 2 1 2

mono-a

0

letrec H in e 7! letrec H in e

Tag[ !  ](x: :e) = ((x: :e):  !  )

1 2 1 1 1 2

if

(mono-a )

mono

 00 0

Tag is a curried function that takes a typ e and then takes a

hH; MTA(e); ;; ;i =) hH ; ;;H ; i

heap value of that typ e, and annotates that heap value with

The following de nition gives the primary invariants of enough information to turn it backinto an expression. It

the algorithm: is imp ortanttonotethatTag pattern-matches and op erates

according to the typ e argumentgiven and not the abstract

De nition 4.7 (mono-a Invariants) hH ; ;H ; i

s t t

f

syntax of the heap value given. MTA can b e used on the

satis es the mono-a invariants with respect to a program

resulting expression to nd the minimal typ e assignment for

letrec H in e and type assignment i :

0

the heap value. This provides us with the free lo cations

1. H = H ] H

t

f

and their typ es for the heap value. The following lemma

2. ] .e

s t

summarizes the relationship b etween Tag and MTA:

3. Dom ( )  Dom (H )

s

f

4. .H :

s t t

Lemma 4.6 If ; . H : ]fx:  g then there exists an h

5. ] 

s t 0

such that fx = hg  H , MTA(Tag[ ](h)) . h :  , and

MTA(Tag[ ](h))  ]fx: g

Intuitively, the invariants guarantee (1) that each bind-

ing is accounted for, (2) every binding needed for e is in the

Equipp ed with MTA, we can now rede ne the free-

scan-set or to-heap, (3) the scan-set corresp onds to bindings

variable tracing algorithm so that it uses Tag to traverse

in the from-heap, (4) the scan-set holds all free variables in

heap values. The algorithm is sp eci ed in the same manner

the to-heap, and (5) the scan-set and to-heap agree with .

0

as fva, i.e., as a rewriting system among tuples of the form

hH ; ;H ; i where H is the from-heap, is a typ e as-

s t t s

f f

Lemma 4.8 If ; . H : , T has the mono-a invariant

0

signment corresp onding to the scan-set, H is the to-heap,

mono

t

0 0

properties with respect to P and ,and T =) T ,then T

0

and is a typ e assignment for the to-heap:

t

2

The garbage collection rewriting system only maintains to sim-

t

mono

hH ]fx = hg; ]fx: g;H ; i =)

s t t

f plify the presentation and pro of; an implementation will not haveto

0 0

construct .

hH ; ;H ]fx = hg; i t

t

f

s t

binding) will have no observable e ect on evaluation. That has the mono-a invariant properties with respect to P and

is, the inference collection scheme shows that .

0

letrec fx =1;x =2;x =0;x = hx ;x ig in  x

The correctness of the algorithmic typ e-based garbage

1 2 3 4 1 3 1 4

collection rule can now easily b e veri ed.

is Kleene equivalent to the original program. Nowby apply-

mono-a

ing the free-variable rule, we can conclude that the binding

Theorem 4.9 If P is a wel l formedprogram and P 7!

mono

0 0

x = 2 can b e safely collected.

2

P ,then P 7! P .

We start by considering the original gc language as an

implicitly typ ed, monomorphic language, where the typ es of

Pro of: Wemust verify that the mono-a garbage collec-

the language are the same as for gc-mono except for the

tion rule preserves typabilityin order for mono to apply.

addition of typ e variables:

That is, wemust show that if P is a well formed program,

mono-a

0 0

and P 7! P , then .P .

(typ es)  2 Type ::= t j int j    j  ! 

1 2 1 2

Let P = letrec H in e and supp ose .P. Then, for some

, ; .H : and .e. If

0 0 0

By implicitly typ ed, we mean that the terms of the lan-

guage are not decorated with typ es as in gc-mono. Weadd

mono

 00 0

hH; MTA(e); ;; ;i =) hH ; ;;H ; i;

t

typ e variables to the set of typ es so that eachwell-formed

expression has a principal or most general typ e (explained

0

then by Lemma 4.8 taking = ;,we knowthat; .H :

s t

below).

and .e . Thus, bytheprog rule of the static semantics,

t

Typ e inference is the pro cess of decorating gc programs

0

. letrec H in e. 2

with typ es so that the resulting program typ e checks un-

der the gc-mono rules. (Refer to Figure 1 for the syntax

Furthermore, the algorithm never gets stuck, so it always

and dynamic semantics of gc and Figure 3 for the typing

applies:

rules for gc-mono.) Alternatively, we may directly sp ec-

ify a set of typing rules for gc programs by taking the

Theorem 4.10 If P is a wel l formedprogram, then there

mono-a

0 0

typing rules for gc-mono and erasing the typ e information

exists a program P such that P 7! P .

from the terms, resulting in the inference system of Figure 4.

These rules de ne judgements of the form ` e :  , where

Pro of: If the algorithm takes a step, the size of the

isatyp e assignment, e is a gc expression, and  isatyp e.

from-heap strictly decreases, so we know the collection ei-

Agiven gc expression can havemultiple typing deriva-

ther terminates or gets stuck. Here we show that the col-

tions according to these rules and consequently multiple typ-

lection cannot get stuck, so it must terminate. Supp ose

mono



ings, but an expression's typings may be ordered so that

P = letrec H in e and hH; MTA(e); ;; ;i =) hH ; ]

s

f

there is a most general, or principal typing and every other

fx:  g;H ; i. By Lemma 4.8, weknow that x 2 Dom (H ),

t t

f

typing is an instance of this principal typing and is thus

since the domain of the scan typ e-assignment is a subset

derivable.

of the from-heap. Consequently, there exists an h such

mono

0

We will showthatifwe can nd a typing for a program

]fx = hg and hH ; ]fx: g;H ; i =) that H = H

s t t

f f

f

that assigns a heap lo cation a typ e variable, then the con-

0 0 0 0  0

; ;H ]fx = hg; i with appropriate and . hH

t

s t t s

f

tents of that lo cation has no e ect on the rest of evaluation.

2

Consequently,any p ointers contained in the lo cation's bind-

ing do not need to b e scanned and traced during garbage

Extending the mono-a collection algorithm to work for

collection. The intuition b ehind the theorem is that a lo ca-

a language with explicit p olymorphism, where typ es are

tion's typ e is unconstrained only if the lo cation is not used in

passed to p olymorphic routines at run time as suggested by

some manner that would constrain the typ e. Consequently,

various language implementors [21 , 1, 27, 15 ], is straightfor-

we can replace the binding in the heap with any binding

ward b ecause enough information is preserved byevaluation

wecho ose. In particular, if the lo cation is b ound to a large

to always reconstruct the typ e of a p olymorphic ob ject. In

heap value, we can bind the lo cation to an integer or dummy

our technical rep ort [20 ], weshowhowthismay b e accom-

piece of co de without a ecting evaluation. This replacement

plished.

allows us to collect any bindings that used to b e reachable

through this binding without knowing anything ab out the

5 Collecting Reachable Garbage Using Typ e Inference

shap e of the original heap value.

The pro of of the theorem relies up on a semantic inter-

Thus far, wehave only considered sp eci cations and algo-

pretation of typ es as logical relations. In our case, the rela-

rithms for collecting unreachable bindings. In this section,

tions are a typ e-indexed family of binary relations relating

weshow that by using typ e inference during the garbage col-

programs to programs, answers to answers, and heaps to

lection pro cess, some bindings that are reachable can still b e

heaps. The relations are contrived so that, if two programs

safely collected. That is, typ e inference can b e used to prove

are related, then they are Kleene equivalent, so one program

that an ob ject is garbage even though it is reachable.

converges to an answer i the other converges to a related

For a simple example, consider the following gc pro-

answer and related answers at base typ e (int) yield equal val-

gram:

ues. Roughly sp eaking, the relations are logically extended

to relate answers of functional typ e (!) if, whenever such

letrec fx =1;x =2;x = hx ;x i;x = hx ;x ig in  x

1 2 3 2 2 4 1 3 1 4

answers are applied to appropriately related answers, the

resulting computations yield related results. Our pro of only Every binding in the heap is accessible from the program's

covers programs without cycles in their heaps, but it should expression ( x ), so the free-variable based collection rules

1 4

be p ossible to extend our arguments to all programs (see can collect nothing. But clearly the program will never

below for more details). dereference x nor x . The inference-based collection scheme

3 2

The relations are de ned in Figure 5 by induction on describ ed in this section will allow us to conclude that re-

typ es. The de nitions are parameterized by an arbitrary placing the binding x = hx ;x i with x =0 (orany other

3 2 2 3

(var) (int )

]fx:  g`x :  ` i : int

` e :    ` e :  ` e : 

1 2 1 1 2 2

(pro j) (i =1; 2) (tuple )

`he ;e i :    `  e : 

1 2 1 2 i i

]fx: g`e :  ` e :  !  ` e : 

1 2 1 1 2 2 1

(fn) (app)

` x:e :  !  ` e e : 

1 2 1 2 2

0 0 0

;` H : ` e : 

8 x 2 Dom ( ) : ] ` H (x): (x)

(prog ) (heap)

0

` letrec H in e : 

` H :

Figure 4: Typ e Inference Rules for gc

relational interpretation of typ e variables, . If t is a typ e for some fresh y and y . Wemust show that the twore-

1 2

variable, then (t) determines some xed, but arbitrary re- sulting answers are appropriately related.

0 0 0

lation b etween answer programs. This is consistent with the Let H and H b e heaps such that H ]fy = x:egH

i i

1 2

i

idea that well-typ ed programs have an implicit \8" quanti- for i =1; 2, and

er for the typ e variables in a program. The parameteriza-

0 0

 j= letrec H in z  letrec H in z : 

1 2 1

tion of the interpretation of typ e variables makes it straight-

1 2

forward to extend the de nition of the relations to account

for some z and z . Wemust showthat

1 2

for predicative p olymorphism.

Twowell-typ ed programs P and P are related at a typ e

1 2

0 0

 j= letrec H in y z  letrec H in y z :  :

1 1 2 2 2

1 2

 , written  j= P  P :  i , whenever one of the pro-

1 2

grams terminates with an answer, then the other program

00 0 0

Taking H = H ]fx = H (z )g, this follows if

i

i i i

terminates with a related answer at typ e  .

00 00

Two answers A and A are related at typ e  , written

1 2

 j= letrec H in e  letrec H in e :  :

2

1 2

 j= A  A :  , as follows: If  isatyp e variable t, then

1 2

the answers are related i they are in the relation (t). The

By the induction hyp othesis, it suces to showthat

relation is extended to the other typ es in a natural fashion.

00 00

 j= H  H :]fx :  g:

For example, if  is an arrow typ e  !  , the answers

1

1 2

1 2

are related i , whenever we apply the answer variables to

since e has a smaller typing derivation than x:e. By the

related arguments at typ e  , we get related programs at

1

lemma's hyp othesis, we knowthatj= H  H :,and

1 2

typ e  . Even though the relations b etween programs and

2

00

 since heaps remain related under extensions,  j= H

1

answers are de ned in terms of one another, the relations

00

:. By assumption, H

2

are well-founded b ecause the size of the typ e index always

decreases when one relation refers to another.

00 00

 j= letrec H in z  letrec H in z :  :

1 2 1

1 2

The de nition of the relations ensures that related pro-

grams remain related even if more bindings are added to

00 0 0

Since H = H ]fx = H (z )g, it is easy to showthat

i

i i i

the programs' heaps. Since evaluation only adds new bind-

00 00

ings and leaves existing bindings intact, it is clear that eval-

in x :  in x  letrec H  j= letrec H

1

2 1

uation preserves the relations. If the language p ermitted

00 00

assignment, then this prop ertywould not necessarily hold.

Consequently, H and H have related heap-values for each

1 2

00 00

For the statement of the following lemma, we need to

variable in ]fx :  g and thus  j= H  H :]fx :  g.

1 1

1 2

extend the logical relation on answers to heaps. Two heaps,

The other cases follow in a similar manner. 2

H and H , are related at a context , written  j= H 

1 2 1

Ourgoalistoshow that if is a typ e assignment, e an

H : if for all variables x in , the answers letrec H in x

2 1

expression, and H a heap suchthat` e :  and ;` H :

and letrec H in x are related at (x).

2

0

, then letrec H in e is Kleene equivalentto letrec H in e

The following lemma is the key to establishing our result:

0

where H is de ned as follows:

an expression is related to itself in the context of anytwo

related heaps.

0

H = fx = H (x) j (x) 62 Tvarg] fx =0j (x) 2 Tvarg

Lemma 5.1 For al l  : Tvar ! P (Ans  Ans), if `

This follows from Lemma 5.1 if we can show that, taking

e :  and  j= H  H : , then  j= letrec H in e 

1 2 1

 (t)tobetheeverywhere-de ned relation on answer pro-

0

letrec H in e :  .

2

0

grams,  j= H  H :. This in turn follows if wecan

0

Pro of (sketch): By induction on the derivation of ` show that  j= H  H : (H is related to itself ), since

0

0

e :  . Here wegivetheinteresting case (fn). Supp ose `  (t) relates every program and H (x) di ers from H (x)

0

x:e :  !  and  j= H  H :. By allo c, only when (x)=t.

1 2 1 2

Unfortunately, we cannot directly show that a well

allo c

formed heap is related to itself ! The problem is that if weat-

letrec H in x:e 7! letrec H ]fy = x:eg in y

1 1 1 1

allo c

tempt to argue by induction on the derivation of ;`H :,

letrec H in x:e 7! letrec H ]fy = x:eg in y

2 2 2 2

the uses of the heap rule require that we assume what we

Computations:

 j= P  P :  i ` P :  and ` P :  and

1 2 1 2

P + A implies P + A and  j= A  A :  and

1 R 1 2 R 2 1 2

P + A implies P + A and  j= A  A : 

2 R 2 1 R 1 1 2

Answers:

 j= A  A : t i hA ;A i2 (t)

1 2 1 2

 j= letrec H in x  letrec H in x : int i H (x )=H (x )=i

1 1 2 2 1 1 2 2

 j= letrec H in x  letrec H in x :    i  j= letrec H in  x  letrec H in  x : 

1 1 2 2 1 2 1 1 1 2 1 2 1

and

 j= letrec H in  x  letrec H in  x : 

1 2 1 2 2 2 2

0 0

 j= letrec H in x  letrec H in x :  !  i 8H ;H ;y ;y ;

1 1 2 2 1 2 1 2

1 2

0 0

 j= letrec H ] H in y  letrec H ] H in y : 

1 1 2 2 1

1 2

implies

0 0

 j= letrec H ] H in x y  letrec H ] H in x y : 

1 1 1 2 2 2 2

1 2

Figure 5: Relational Interpretation of Typ es

0

are trying to prove. The same problem is encountered when Thus, letrec H ] H in e + letrec H in x i letrec H ]

1 2 a

1

using logical relations to reason ab out conventional calculi H in e + letrec H in y and

2

b

with recursion or iteration op erators.

 j= letrec H in x  letrec H in y : :

0 a

If we forbid cycles in the heap, then we can transform b

the derivation of the heap's well formedness into a derivation

Supp ose H (x)=i (or H (y )= i). Then  must b e int since

a

b

that only uses a let-style rule instead of the recursive letrec-

 62 Tvar by the rst hyp othesis. By the de nition of  at

style heap rule:

int, H (y )= i (or H (x)=i). Thus,

a

b

] ` h :  ` H :

1 2 1 2

0

(let-heap )

] H in e: letrec H ] H in e ' letrec H

2 1 2

1

` H ]fx = hg : ]fx: g

1 2

2

Consequently, if a heap is cycle free, we mayshow byin-

duction on the derivation using the let-heap rule that the

Using the Inference GC theorem, wecannowshowthat

heap is related to itself.

the binding x = hx ;x i in the program:

3 2 2

Lemma 5.2 If ` H : ,  j= H  H : and H is

1 2 1 2 1 letrec fx =1;x =2;x = hx ;x i;x = hx ;x ig in  x

1 2 3 2 2 4 1 3 1 4

cycle free, then  j= H ] H  H ] H : ] .

1 2 1 2

can be replaced by x = 0. Taking = fx : t ;x : t g,

3 1 2 2 3 3

Finally, we can state and prove the following Inference

= fx : int;x : int  t g, H = fx =2;x = hx ;x ig,

2 1 4 3 1 2 3 2 2

GC sp eci cation: Given a cycle-free program, if we can nd

H = fx =1;x = hx ;x ig it is easy to show that the pre-

2 1 4 1 3

atyping that assigns a heap lo cation a typ e variable, then

conditions of the theorem hold. Consquently, replacing H

1

0

that lo cation can b e b ound to \0" without e ecting evalua-

with H = fx =0;x =0g results in a Kleene equivalent

2 3

1

tion.

program:

Theorem 5.3 (Inference GC) Let

letrec fx =1;x =0;x =0;x = hx ;x ig in  x :

1 2 3 4 1 3 1 4

= fx : t ;::: ;x :t g and H = fx = h ;::: ;x = h g

1 1 1 n n 1 1 1 n n

0

= fx =0;:: :;x =0g. If and H

1 n

Nowbyinvoking fv,we can collect the binding x =0:

1

2

1. ] ` e :  ( 62 Tvar), and

1 2

letrec fx =1;x =0;x = hx ;x ig in  x

1 3 4 1 3 1 4

2. ` H : ,and

1 2 2

and we know that the resulting program is Kleene equivalent

to the original program.

3. 9S:;`H : S ,and

1 1

4. H is cycle free,

2

6 Related Work

0

then letrec H ] H in e ' letrec H ] H in e.

1 2 2

1

The literature on garbage collection in sequential program-

ming languages per se contains few pap ers that attempt to

Pro of: Taking  (t)tobetheeverywhere de ned rela-

0

0

provide a compact characterization of algorithms or correct-

: holds trivially. By Lemma 5.2, tion,  j= H  H

1 0 1

1

ness pro ofs. Demers et al. [10 ] give a mo del of memory pa-

since H is cycle free and ` H : ,we know that

2 1 2 2

rameterized by an abstract notion of a \p oints-to" relation.

0

As a result, they can characterize reachability-based algo-

 j= H ] H  H ] H : ] :

0 1 2 2 1 2

1

rithms including mark-sweep, copying, generational, \con-

Since ] ` e :  ,weknowby Lemma 5.1 that

servative," and other sophisticated forms of garbage collec-

1 2

tion. However, their mo del is intentionally divorced from

0

 j= letrec H ] H in e  letrec H ] H in e : :

0 1 2 2

1

the programming language and cannot take advantage of

7 Summary and Future Work any semantic prop erties of evaluation, suchastyp e preser-

vation. Consequently, their framework cannot mo del the

Our pap er provides a unifying framework for a variety of gar-

typ e-based collectors of Sections 4 and 5. Nettles [22 ] pro-

bage collection ideas including standard copying and mark-

vides a concrete sp eci cation of a copying garbage collection

sweep collection, generational collection, tag-free collection,

algorithm using the Larch sp eci cation language. Our sp ec-

and inference-based collection. By making allo cation and

i cation of the free-variable tracing algorithm is essentially

the heap explicit, we are able to reason ab out memory man-

a high-level, one-line description of his sp eci cation.

agement using traditional -calculus techniques. In particu-

Hudak gives a denotational mo del that tracks reference

lar, we are able to make strong connections b etween garbage

counts for a rst-order language [16 ]. He presents an ab-

collection and typ e theory. By abstracting awaysuchlow-

straction of the mo del and gives an algorithm for comput-

level details as evaluation stacks, registers, and addresses,

ing approximations of reference counts statically. Chirimar,

we are able to formulate complicated collection algorithms

Gunter, and Riecke give a framework for proving invari-

in a compact manner and yet give a formal pro of of correct-

ants regarding memory management for a language with a

ness.

linear typ e system [9 ]. Their low-level semantics sp eci es

The framework is exible, as demonstrated by the

explicit memory management based on .

breadth of topics covered in this pap er, and we exp ect that

Both Hudak and Chirimar et al. assume a weak approxi-

the framework can b e extended to deal with other work on

mation of garbage (reference counts). Barendsen and Smet-

garbage collection. In the future, we hop e to extend the

sers give a Curry-liketyp e system for functional languages

Inference GC theorem of Section 5 to cover programs with

extended with uniqueness information that guarantees an

cyclic heaps. One approachistointerpret typ es as CPOs

ob ject is only \lo cally acccessible" [6 ]. This provides a com-

and prove that programs have a suitable compactness prop-

piler enough information to determine when certain ob jects

erty: every cyclic heap is appropriately approximated by

may b e garbage collected or over-written.

some nite \unwinding" of the heap. An alternative ap-

Tolmach [27 ] built a typ e-recovery collector for a vari-

proach mightbetoprovide a purely syntactic argumentthat

ant of SML that passes typ e information to p olymorphic

lo cations assigned a typ e variable never arise in an \active"

routines during execution, e ectively implementing a p oly-

p osition during evaluation. We also hop e to provide a pro of

morphic version of our language and collector describ ed in

of correctness for an algorithm that simultaneously infers

Section 4. Aditya and Caro gaveatyp e-recovery algorithm

typ es and p erforms garbage collection.

for an implementation of Id that uses a technique that ap-

p ears to b e equivalenttotyp e passing [1 ] and Aditya, Flo o d,

and Hicks extended this work to garbage collection for Id [2 ].

8 Acknowledgements

Over the past few years, a numb er of pap ers on inference-

based collection in monomorphic [7 , 29 , 8] and p olymor-

Thanks to Edo Biagoni, Andrzej Filinski, Mark Leone, Jack

phic [3 , 13 , 14 , 12] languages app eared in the literature.

Morrisett, Chris Okasaki, Andrew Tolmach, and Jeannette

App el [3 ] argued informally that \tag-free" collection is p os-

Wing for pro ofreading earlier drafts of this do cumentand

sible for p olymorphic languages suchas SML bya combi-

for providing valuable insight. This manuscript is submitted

nation of recording information statically and p erforming

for publication with the understanding that the U.S. Gov-

what amounts to typ e inference during the collection pro-

ernment is authorized to repro duce and distribute reprints

cess, though the connections b etween inference and collec-

for Governmental purp oses, notwithstanding anycopyright

tion were not made clear. Baker [5 ] recognized that Milner-

notation thereon.

style typ e inference can be used to prove that reachable

ob jects can be safely collected, but did not give a formal

References

accountofthisresult. Goldb erg and Gloger [14 ] recognized

that it is not p ossible to reconstruct the concrete typ es of

[1] S. Aditya and A. Caro. Compiler-directed typ e reconstruc-

tion for p olymorphic languages. In Proceedings of the Con-

all reachable values in an implementation of an ML-style

ference on Functional Programming Languages and Com-

language that do es not pass typ es to p olymorphic routines.

puter Architecture, pages 74{82, Cop enhagen, Denmark,

They gave an informal argument based on traversal of stack

June 1993.

frames to show that suchvalues are semantically garbage.

[2] S. Aditya, C. Flo o d, and J. Hicks. Garbage collection for

Fradet [12 ] gave another argument based on Reynolds's ab-

strongly-typ ed languages using run-time typ e reconstruc-

straction/parametricity theorem [24 ]. Fradet's formulation

tion. In Proceedings of the 1994 ACM Conference on Lisp

is closer to ours than Goldb erg and Gloger's, since he rep-

and Functional Programming, pages 12{23, Orlando, FL,

resented the evaluation \stack" as a source-language term.

June 1994.

However, none of these pap ers give a complete formulation

[3] A. W. App el. Runtime tags aren't necessary. Journal of Lisp

of the underlying dynamic and static semantics of the lan-

and Symbolic Computation, 2:153{162, 1989.

guage and thus the pro ofs of correctness are necessarily ad

hoc.

[4] Z. M. Ariola, M. Felleisen, J. Maraist, M. Odersky, and

P.Wadler. A call-by-need lamb da calculus. In Conference

Finally, Purushothaman and Seaman [23 , 25 ] and

Record of the 22nd Annual ACM Symposium on Principles

Launchbury [17 ] have prop osed \natural" semantics for cal l-

of Programming Languages, San Francisco, CA, Jan. 1995.

by-need (lazy) languages where the semantic ob jects include

an explicit heap. This allows sharing and memoization of

[5] H. Baker. Unify and conquer (garbage, up dating,

...) in functional languages. In Proceedings of the 1990 ACM

computations to be expressed in the semantics. More re-

Conference on Lisp and Functional Programming, pages

cently, Ariola et al. [4 ] have presented a purely syntactic

218{226, Nice, France, June 1990.

theory of the call-by-need -calculus that is largely compat-

ible with our work.

[6] E. Barendsen and S. Smetsers. Conventional and unique-

ness typing in graph rewrite systems. In Proceedings of the

13th Conference on the Foundations of Software Technol-

ogy and Theoretical Computer Science 1993, Bombay,New

York, NY, 1993. Springer-Verlag. Extended abstract.

[24] J. Reynolds. Typ es, abstraction, and parametric p olymor- [7] P. Branquart and J. Lewi. Ascheme for storage allo cation

phism. In Proceedings of Information Processing 83,pages and garbage collection for Algol-68. In Algol-68 Implementa-

513{523, 1983. tion. North-Holland Publishing Company, Amsterdam, 1970.

[8] D. E. Britton. Heap storage management for the program-

[25] J. M. Seaman. AnOperational Semantics of Lazy Evalua-

ming language Pascal. Master's thesis, University of Arizona,

tion for Analysis. PhD thesis, Pennsylvania State University,

1975.

1993.

[9] J. Chirimar, C. A. Gunter, and J. G. Riecke. Proving mem-

[26] P. Steenkiste and J. Hennessey. Tags and typ e checking

ory managementinvariants for a language based on linear

in LISP: Hardware and software approaches. In Proceed-

logic. In Proceedings of the 1992 ACM ConferenceonLisp

ings of the Second International ConferenceonArchitectural

and Functional Programming, pages 139{150, San Francisco,

Support for Programming Languages and Operating Systems

CA, June 1992.

(ASPLOS-II), pages 50{59, Palo Alto, CA, Oct. 1987.

[10] A. Demers, M. Weiser, B. Hayes, H. Bo ehm, D. Bobrow,

[27] A. Tolmach. Tag-free garbage collection using explicit typ e

and S. Shenker. Combining generational and conservative

parameters. In Proceedings of the 1994 ACM Conference

garbage collection: Framework and implementations. In

on Lisp and Functional Programming, pages 1{11, Orlando,

ConferenceRecord of the 17th Annual ACM Symposium on

FL, June 1994.

Principles of Programming Languages, pages 261{269, San

[28] D. Ungar. Generational scavenging: A non-disruptivehigh

Francisco, CA, Jan. 1990.

p erformance storage management reclamation algorithm. In

[11] M. Felleisen and R. Hieb. The revised rep ort on the syntactic

ACM SIGPLAN Software Engineering Symposium on Prac-

theories of sequential control and state. Technical Rep ort 89-

tical Software Development Environments, pages 15{167,

100, Rice University, June 1989. Also app ears in: Theoretical

Pittsburgh, PA, Apr. 1984.

Computer Science, 102, 1992.

[29] P.Wo don. Metho ds of garbage collection for Algol-68. In

[12] P.Fradet. Collecting more garbage. In Proceedings of the

Algol-68 Implementation. North-Holland Publishing Com-

1994 ACM Conference on Lisp and Functional Program-

pany, Amsterdam, 1970.

ming, pages 24{33, Orlando, FL, June 1994.

[30] A. Wright and M. Felleisen. A syntactic approachtotyp e

[13] B. Goldb erg. Tag-free garbage collection for strongly

soundness. Technical Rep ort TR91-160, Department of

typ ed programming languages. In Proceedings of the ACM

Computer Science, Rice University, Apr. 1991.

SIGPLAN '91 Conferenceon Programming Language De-

sign and Implementation, pages 165{176, Toronto, Ontario,

Canada, June 1991.

[14] B. Goldb erg and M. Gloger. Polymorphic typ e reconstruc-

tion for garbage collection without tags. In Proceedings of

the 1992 ACM Conference on Lisp and Functional Program-

ming, pages 53{65, San Francisco, CA, June 1992.

[15] R. Harp er and G. Morrisett. Compiling p olymorphism us-

ing intensional typ e analysis. In ConferenceRecord of the

22nd Annual ACM Symposium on Principles of Program-

ming Languages, San Francisco, CA, Jan. 1995.

[16] P. Hudak. A semantic mo del of reference counting and its

abstraction. In Proceedings of the 1986 ACM Conference

on Lisp and Functional Programming, pages 351{363, Cam-

bridge, MA, 1986.

[17] J. Launchbury. A natural semantics for lazy evaluation. In

ConferenceRecord of the 20th Annual ACM Symposium on

Principles of Programming Languages, Charleston, SC, Jan.

1993.

[18] I. Mason and C. Talcott. Reasoning ab out programs with

e ects. In Proceedings of Programming Language Implemen-

tation and Logic Programming, LNCS 582. Springer-Verlag,

1990.

[19] I. Mason and C. Talcott. Equivalences in functional lan-

guages with e ects. Journal of Functional Programming,

2(1), 1991.

[20] G. Morrisett, M. Felleisen, and R. Harp er. Ab-

stract mo dels of memory management. Technical Re-

port CMU{CS{95{110, Scho ol of Computer Science,

Carnegie Mellon University, Jan. 1995. Available as

ftp://reports.adm.cs.c mu. edu/ usr /ano n/1 995/ CMU -CS- 95-

110.ps.

[21] R. Morrison, A. Dearle, R. C. H. Connor, and A. L. Brown.

An ad ho c approach to the implementation of p olymorphism.

ACM Trans. Prog. Lang. Syst., 13(3):342{371, July 1991.

[22] S. Nettles. A Larch sp eci cation of copying garbage collec-

tion. Technical Rep ort CMU{CS{92{219, Scho ol of Com-

puter Science, Carnegie Mellon University, Dec. 1992.

[23] Purushothaman and J. Seaman. An adequate op erational

semantics of sharing in lazy evaluation. In Proceedings of

the 4th European Symposium on Programming, LNCS 582.

Springer-Verlag, 1992.