Abstract Mo dels of Memory Management
Greg Morrisett Matthias Felleisen Rob ert Harp er
Carnegie Mellon Rice University Carnegie Mellon
[email protected] [email protected] [email protected]
1 Memory Safety Abstract
Advanced programming languages manage memory allo ca- Most sp eci cations of garbage collectors concentrate on the
tion and deallo cation automatically. Automatic memory low-level algorithmic details of how to nd and preserveac-
managers, or garbage collectors, signi cantly facilitate the cessible ob jects. Often, they fo cus on bit-level manipula-
programming pro cess b ecause programmers can rely on the tions such as \scanning stack frames," \marking ob jects,"
language implementation for the delicate tasks of nding \tagging data," etc. While these details are imp ortantin
and freeing unneeded ob jects. Indeed, the presence of a some contexts, they often obscure the more fundamental as-
garbage collector ensures memory safety in the same way p ects of memory management: what ob jects are garbage and
that a typ e system guarantees typesafety: no program writ- why?
ten in an advanced programming language will crash due We develop a series of calculi that are just low-level
to dangling pointer problems while allo cation, access, and enough that we can express allo cation and garbage collec-
deallo cation are transparent. However, in contrast to typ e tion, yet are suciently abstract that wemay formally prove
systems, memory management strategies and particularly the correctness of various memory management strategies.
garbage collectors rarely come with a compact formulation By making the heap of a program syntactically apparent, we
and a formal pro of of soundness. Since garbage collectors can sp ecify memory actions as rewriting rules that allo cate
work on the machine representations of abstract values, the values on the heap and automatically dereference p ointers
very idea of providing a pro of of memory safety sounds unre- to such ob jects when needed. This formulation p ermits the
alistic given the lack of simple mo dels of memory op erations. sp eci cation of garbage collection as a relation that removes
The recently develop ed syntactic approaches to the sp ec- p ortions of the heap without a ecting the outcome of the
i cation of language semantics by Felleisen and Hieb [11 ] evaluation.
and Mason and Talcott [18 , 19 ] are the rst execution mo d- Our high-level approach allows us to sp ecify in a compact
els that are intensional enough to p ermit the sp eci cation manner a wide variety of memory managementtechniques,
of memory management actions and yet are suciently ab- including standard trace-based garbage collection (i.e., the
stract to p ermit compact pro ofs of imp ortant prop erties. family of copying and mark/sweep collection algorithms),
Starting from the -S calculus of Felleisen and Hieb, we generational collection, and typ e-based, tag-free collection.
v
design compact sp eci cations of a numb er of memory man- Furthermore, since the de nition of garbage is based on the
agement ideas and proveseveral correctness theorems. semantics of the underlying language instead of the conser-
The basic idea underlying the developmentof our gar- vative approximation of inaccessibility,we are able to sp ecify
bage collection calculi is the representation of a program's and prove the idea that typ e inference can b e used to collect
run-time memory as a global series of syntactic declarations. some ob jects that are accessible but never used.
The program evaluation rules allo cate large ob jects in the
This work was sp onsored in part by the Advanced Research
global declaration, which represents the heap, and automat-
Pro jects Agency (ARPA), CSTO, under the title \The Fox Pro ject:
ically dereference p ointers to such ob jects when needed. As
Advanced Development of Systems Software," ARPA Order No. 8313,
issued by ESD/AVS under Contract No. F19628{91{C{0168, Wright
a result, garbage collection can b e sp eci ed as any relation
Lab oratory, Aeronautical Systems Center, Air Force Materiel Com-
that removes p ortions of the current heap without a ecting
mand, USAF, and ARPAgrant No. F33615-93-1-1330. Views and
the result of a program's execution.
conclusions contained in this do cument are those of the authors and
In Section 2, we present a small functional programming
should not b e interpreted as necessarily representing ocial p olicies
or endorsements, either expressed or implied, of Wright Lab oratory
language, gc, with a rewriting semantics that makes allo-
or the United States Government.
cation explicit. We de ne a semantic notion of garbage col-
lection for gc and prove that there is no optimal collection
strategy that is computable. In Section 3, we sp ecify the
\free-variable" garbage collection rule which mo dels trace-
based collectors including mark/sweep and copying collec-
tors. We prove that the free-variable rule is correct and
provide two \implementations" at the syntactic level: the
rst corresp onds to a copying collector, the second to a gen-
erational one.
In Section 4, we formalize so-called \tag-free" collec-
tion algorithms for explicitly-typ ed, monomorphic languages
suchasPascal and Algol [7 , 29, 8]. Weshowhowtorecover
lus [11 ]. Roughly sp eaking, this kind of semantics describ es necessary shap e information ab out values from typ es during
an abstract machine whose states are programs and whose garbage collection. We are able to prove the correctness of
instructions are relations b etween programs. The desired - the garbage collection algorithm by using a well known type
nal state of this abstract machine is an answer program (A) preservation argument.
whose b o dy is a p ointer to some value, suchasaninteger, In Section 5, we justify our semantic de nition of garbage
in the heap. byshowing that Milner-style typ e inference can b e used to
Each rewriting step of a program letrec H in e pro ceeds prove that an ob ject is semantically garbage even though
according to a simple algorithm. If the body of the pro- the ob ject is still reachable. While previous authors have
gram, e ,isnotavariable, it is partitioned into an evalua- sketched this idea [3 , 5, 14, 12], we are the rst to presenta
tion context E (an expression with a hole [ ] in the place formal pro of of this result. The pro of is obtained by casting
of a sub-expression), which represents the control state, and the well known interpretation of typ es as logical relations
an instruction expression I , which roughly corresp onds to into our framework.
a program counter: e = E [I ]. The instruction expression Section 6 discusses related work and Section 7 closes with
0
determines the next expression e and anychanges to the a summary.
0
heap resulting in a new heap H . Putting the pieces to- Due to a lack of space, most of the pro ofs for lemmas
gether yields the next program in the evaluation sequence: are omitted in this pap er. However, full details may be
0 0
letrec H in E [e ] . Each instruction determines one transi- recovered from our companion technical rep ort [20 ].
tion rule of the abstract machine. Formally, a rule denotes a
relation b etween programs. A set of rules denotes the union
2 Mo deling Allo cation: gc
of the resp ective relations.
We use the following conventions: Let G be a set of
Syntax: The syntax of gc (see Figure 1) is that of a con-
0
program relations, and let P and P b e programs. Then,
ventional, higher-order, applicative programming language
G
based on the -calculus. Following the tradition of func-
0 0
P 7 ! P means P rewrites to P according to one
tional programming, a gc program (P ) consists of some
G
of the rules in G and 7 ! is the re exive, transitive
mutually recursive de nitions (H ) and an expression (e).
G
closure of 7 !.
The global de nitions are useful for de ning mutually recur-
sive pro cedures, but their primary purp ose here is to repre-
G + r is the union of G with the rule r .
sent the run-time heap of a program. In general, there can
be cycles in a heap, so we use letrec instead of let as the
A program P is irreducible with resp ect to G i there
G
0 0
binding form. Expressions are either variables (x), integers
is no rule in G and no P suchthatP 7 ! P .
(i), pairs (he ;e i), pro jections ( ), abstractions (x:e), or
1 2 i
G
0 0 0
P + P means that P 7 ! P and P is irreducible
G
applications (e e ).
1 2
with resp ect to G.
Formally, the heap is a series of pairs, called bindings ,
consisting of variables and heap values. Heap values are a
P * means that there exists an in nite sequence of
G
semantically signi cant subset of expressions. The order of
G G G
programs P such that P 7 ! P 7 ! P 7 !.
i 1 2
the bindings is irrelevantandeachvariable must b e b ound
to at most one heap value in a heap. Hence, we treat heaps
Figure 1 de nes the set of evaluation contexts and in-
as sets and, when convenient, as nite functions. Wewrite
struction expressions for gc . The de nition of evaluation
Dom (H ) to denote the b ound variables of H ,andRng (H )
contexts (E ) re ects the left-to-right, call-by-value evalua-
to denote the heap values b ound in H . We sometimes refer
tion order of the language. All terms to the left of the path
to variables b ound in the heap as locations or pointers.
from the ro ot to the hole are variables; the terms on the
The language contains two binding constructs: x:e
right are arbitrary. Instruction expressions (I ) consist of
binds x in e and letrec H in e binds the variables in Dom (H )
heap values (h), applications of (p ointers to heap-allo cated)
to the expressions Rng (H ) in both the values in Rng (H )
pro cedures to (p ointers to heap-allo cated) values, and pro-
and in e. Following convention, we consider programs to
jections of (p ointers to heap-allo cated) tuples.
b e equivalent up to a consistent -conversion of b ound vari-
The evaluation rules for gc re ect the intentions b ehind
ables.
our choice of instruction expressions. The transition rule al-
Considering programs equivalent mo dulo -conversion
lo c mo dels the allo cation of values in the heap by binding
and the treatment of heaps as sets instead of sequences hides
the value to a new variable and using this variable in its place
many of the complexities of memory management. In par-
in the program. Note that the \]" notation carries the im-
ticular, programs are automatically considered equivalentif
plicit requirement that the newly allo cated variable x cannot
the heap is re-arranged and lo cations are re-named as long
b e in the domain of the heap H . The transition pro j sp ec-
as the \graph" of the program is preserved. This abstraction
i es how a pro jection instruction extracts the appropriate
allows us to fo cus on the issues of determining what bind-
comp onent from a pointer to a heap-allo cated pair. Sim-
ings in the heap are garbage without sp ecifying how such
ilarly, app is a transliteration of the conventional -value
bindings are represented in a real machine.
rule into our mo di ed setting. It binds the formal parame-
ter of a heap-allo cated pro cedure to the value of the p ointer
0
given as the actual argument, and places the expression part
Mathematical Notation: We use X ] X to denote the
0 0
of the pro cedure into the evaluation context. Multiple appli-
union of two disjoint sets, X and X . We use H ] H to
cations of the same pro cedure require -conversion to ensure
denote the union of two heaps whose domains are disjoint.
that the formal parameter do es not con ict with bindings
Weusefe =xge to denote capture-avoiding substitution of
1 2
1
already in the heap . WeuseR to abbreviate the union of
the expression e for the free variable x in the expression e .
1 2
0 0
the rules allo c , pro j , and app .
WeuseX n X to denote fx 2 X j x 62 X g.
1
An alternative rule for application substitutes the actual argu-
ment(y ) for the formal (z ) within e and p erforms no allo cation. This
Semantics: The rewriting semantics for gc is an adapta-
rule is essentially equivalentto app, but the de nition ab ovesimpli-
tion of the standard reduction function of the -S calcu-
v es the pro ofs of Section 5.
Programs:
(variables) x; y ; z 2 Var
(integers) i 2 Int ::= j 2 j 1 j 0 j 1 j 2 j
(expressions) e 2 Exp ::= x j i jhe ;e ij e j e j x:e j e e
1 2 1 2 1 2
(heap values) h 2 Hval ::= i jhx ;x ij x:e
1 2
(heaps) H 2 Heap ::= fx = h ;: ::;x = h g
1 1 n n
(programs) P 2 Prog ::= letrec H in e
(answers) A 2 Ans ::= letrec H in x
Evaluation Contexts and Instruction Expressions:
(contexts) E 2 Ctxt ::= []jhE; eijhx; E ij E j Eej xE
i
(instructions) I 2 Instr ::= h j x j xy
i
Rewriting Rules
allo c
(allo c ) letrec H in E [h] 7 ! letrec H ]fx = hg in E [x]
i
(pro j ) letrec H in E [ x] 7 ! letrec H in E [x ] (H (x)= hx ;x i and i =1; 2)
i i 1 2
app
(app) letrec H in E [xy] 7 ! letrec H ]fz = H (y )g in E [e] (H (x)= z :e)
Figure 1: The Syntax and Op erational Semantics of gc
Unfortunately, there can be no optimal garbage collector The irreducible programs of gc are either answers or
b ecause determining whether a binding is garbage or not is stuck programs. The latter corresp ond to machine states
undecidable. that result from the misapplication of primitive program
op erations or unb ound variables.
Prop osition 2.4 (Garbage Undecidable) Determining
if a binding is garbage in an arbitrary closed gc program is
De nition 2.1 (Stuck Programs) Aprogram is stuck if
undecidable.
it is of one of the fol lowing forms:
Pro of (sketch): We can reduce the halting problem to an
letrec H in E [ x] (x 62 Dom (H ) or H (x) 6= hx ;x i)
i 1 2
optimal collector by taking an arbitrary program, adding
a binding to the heap and mo difying the program so that
letrec H in E [xy] (x 62 Dom (H ) or H (x) 6= z :e or
if it terminates, it accesses the extra binding. An optimal
y 62 Dom (H ))
collector will collect the binding if and only if the original
program do es not terminate. 2
All programs either diverge or evaluate to an answer or
a stuck program. Put di erently, the evaluation pro cess
de nes a partial function from gc programs to irreducible
3 Reachability-Based Garbage Collection
programs [11 , 30 ].
Since computing an optimal collection is undecidable, a gar-
A Semantic De nition of Garbage Since the semantics of
bage collection algorithm must conservatively approximate
gc makes the allo cation of values explicit, including the
the set of garbage bindings. Most garbage collectors com-
implicit p ointer dereferencing in the language, we can also
pute the reachable set of bindings in aprogram given the
de ne what it means to garbage collect a value in the heap
variables in use in the current instruction expression and
and then analyze some basic prop erties. A binding x = h
control state. All reachable bindings are preserved; the oth-
in the heap of a program is garbage if removing the binding
ers are eliminated.
has no \observable" e ect on running the program. In our
Following Felleisen and Hieb [11 ], reachabilityingc is
case, we consider only integer results and non-termination
formalized by considering free variables. The following \free-
to b e observable.
variable" GC rule describ es bindings as garbage if there are
no references to these bindings in the other bindings, nor in
De nition 2.2 (Kleene Equivalence)
the currently evaluating expression:
(P ;G ) ' (P ;G ) means P + letrec H in x where
1 1 2 2 1 G 1
1
fv
H (x)=i if and only if P + letrec H in y and H (y )=
1 2 G 2 2
2
letrec H ] H in e 7 ! letrec H in e
1 2 1
(fv)
i. If G = G = R , then we simply write P ' P .
1 2 1 2
if Dom (H ) \ FV(letrec H in e)=;
2 1
A binding is garbage if removing it results in a program that
The fv rule is correct in that it only removes garbage, and
is Kleene equivalent to the original program:
thus computes valid collections. The keys to the pro of of
correctness of fv are a postponement lemma and a diamond
De nition 2.3 (Garbage) If P = letrec H ] fx =
lemma.
hg in e, then the binding x = h is garbage with respect to P
fv R
i P ' letrec H in e. A collection ofaprogram is the same
Lemma 3.1 (Postp onement) If P 7 ! P 7 ! P , then
1 2 3
fv R
program with some garbage bindings removed. An optimal 0 0
7 ! P . there exists a P 7 ! P such that P
3 1
2 2
col lection of a program is a program with as many garbage
bindings removedaspossible.
Pro of (sketch): By cases on the elements of R . 2
fv R
0
partitioning. It is p ossible to formulate an abstract version Lemma 3.2 (Diamond) If P 7 ! P and P 7 ! P ,
1 2 1
2
fv R
0
of such a mechanism, the free-variable tracing algorithm, by
7 ! P . 7 ! P and P then there exists a P such that P
3 3 3 2
2
lifting the ideas of mark-sweep and copying collectors to the
level of program syntax.
Pro of (sketch): Assume P = letrec H ] H in E [I ],
1 1 2
fv
We adopt the terminology of copying collection in the
7 ! P . We can easily P = letrec H in E [I ], and P
2 2 1 1
description of the free-variable tracing algorithm. We use
R
0
showby case analysis on the elements of R that if P 7 ! P
1
2
two heaps and a set: a \from-heap" (H ), a \scan-set" (S ),
f
0
where P = letrec H ] H ] H in E [e], for some H and
1 2 3 3
2
and a \to-heap" (H ). The from-heap is the set of bindings
t
fv R
0
in the current program and the to-heap will become the 7 ! P and P 7 ! P where P = letrec H ] e, then P
3 2 3 3 1
2
set of bindings preserved by the algorithm. The scan-set H in E [e]. 2
3
records the set of variables reachable from the to-heap that
With the Postp onement and Diamond Lemmas in hand,
have not yet b een moved from the from-heap to the to-heap.
it is straightforward to showthatfv is a correct GC rule.
The scan-set is often referred to as the \frontier."
The b o dy of the algorithm pro ceeds as follows: Avari-
fv
0 0
Theorem 3.3 (Correctness of fv) If P 7 ! P ,then P
able x is removed from S suchthatH has a binding for x.
f
is a col lection of P .
If no such lo cations are in S , the algorithm terminates. Oth-
erwise, it scans the heap value h to which x isboundinthe
0
Pro of: Let P = letrec H ] H in e and let P =
1 2
from-set H , lo oking for free variables. For each y 2 FV(h),
f
fv
0
letrec H in e such that P 7 ! P . Wemust show P evalu-
1
it checks to see if y has already b een forwarded to the to-set
0
ates to an integer value i P evaluates to the same integer.
H . Only if y is not b ound in H do es it add the variable to
t t
0
Supp ose P + letrec H in x and H (x) = i. By induc-
R
the scan-set S . This ensures that a variable moves at most
tion on the numb er of rewriting steps using the Postp one-
once from the from-heap to the scan-set.
ment Lemma, we can show that P + letrec H ] H in x
R 2
Formulating the free-variable tracing algorithm as a
and clearly (H ] H )(x) = H (x) = i. Now supp ose
2
rewriting system is easy. It requires only one rule that re-
P + letrec H in x and H (x)= i. By induction on the num-
R
lates triples of from-sets, scan-sets, and to-sets:
b er of rewriting steps using the Diamond Lemma, weknow
0 0 0
that there exists an H such that P + letrec H in x and
hH ]fx = hg;S ]fxg;H i =)
R
t
f
fv
0
hH ;S [ (FV(h) n (Dom (H ) ]fxg));H ]fx = hgi
t t
f
letrec H in x 7 ! letrec H in x. Thus, x must be bound in
0 0
H and since fv only drops bindings, H (x)=i. 2
Initially the free variables of the evaluation context and in-
This theorem shows that a single application of fv results
struction expression, which corresp ond to the \ro ots" of a
in a Kleene equivalent program. A real implementation in-
computation, are placed in S . Computing the free variables
terleaves garbage collection with evaluation. The following
of the context represents the scanning of the \stack" of a
theorem shows that adding fv to R preserves evaluation.
conventional implementation while computing the free vari-
ables of the instruction expression corresp onds to scanning
Theorem 3.4 For al l programs P , (P; R) ' (P; R + fv ).
the \registers." The initial tuple is re-written until we reach
a state where no variable in the scan-set is bound in the
Pro of: Clearly any evaluation under R can be simu-
from-heap. At this p oint, wehave forwarded enough bind-
lated by R + fv simply by not p erforming any fv steps.
ings to the to-heap. This leads to the following free-variable
Thus, if P + A then P + A. Now supp ose P +
R
R+fv R+fv
tracing algorithm rule:
letrec H in x and H (x )=i. Then there exists a nite
1 1 1 1
fva
0
rewriting sequence using R + fv as follows:
letrec H in e 7 ! letrec H in e
00 0
(fva) if hH; FV(e); ;i =) hH ;S;H i
R+fv R+fv R+fv R+fv
00
P 7 ! P 7 ! P 7 !7 ! letrec H in x
1 2 1 1
and Dom (H ) \ S = ;
We can showby induction on the numb er of rewriting steps
Clearly, the algorithm always terminates since the size of the
in this sequence, using the Postp onement Lemma, that all
from-heap strictly decreases with eachstep. Furthermore,
fv steps can b e p erformed at the end of the evaluation se-
this new rewriting rule is a subrelation of the rule fv, which
quence. This provides us with an alternativeevaluation se-
implies the correctness of the algorithm.
quence where all the R steps are p erformed at the b eginning:
fva fv
0 0
Theorem 3.5 If P 7 ! P , then P 7 ! P .
R R R R fv
0 0 0
P 7 ! P 7 ! P 7 ! 7 ! P 7 !
n
1 2
fv fv fv
P 7 ! P 7 ! 7 ! letrec H in x
Pro of: Let P = letrec H in e be a gc program. The rst
n+1 n+2 1 1
step is to prove the basic invariants of the garbage collection
Since fv do es not a ect the expression part of a program
rewriting system: If hH; FV(e); ;i =) hH ;S;H i, then
t
f
0
0
and only removes bindings from the heap, P = letrec H ]
1
H ] H = H and FV(letrec H in e)= S . NowletP =
n
t t
f
fva
H in x for some H . Thus, P + letrec H ] H in x. Since
0
2 2 R 1 2
letrec H in e and supp ose P 7 ! P . Then,
1
H (x)=i,(H ] H )(x)= i. 2
1 1 2
hH ] H ; FV(e); ;i =) hH ;S;H i:
1 2 2 1
3.1 The Free-Variable Tracing Algorithm
and Dom (H ) \ S = ;. By the invariants,
2
0
FV(letrec H in e)= S ,soDom (H ) \ FV(P )=;. Conse-
1 2
The free-variable GC rule is a speci cation of a garbage col-
fv
0
lection algorithm. It assumes some mechanism for partition-
quently, P 7 ! P . 2
ing the set of bindings into two disjoint pieces such that one
set of bindings is unreachable from the second set of bindings
If we require that a collection algorithm pro duce a closed
and the b o dy of the program. Real garbage collection algo-
program, then fva is \optimal" in the following weak sense:
rithms need a deterministic mechanism for generating this
fva
0 0
algorithm on the program letrec H in e where H ;H is a if P is a closed program and P 7 ! P , then P has the
2 1 2
generational partition of the original program's heap. The fewest bindings needed to keep the program closed with-
resulting heap is glued onto H to pro duce a collection of out a ecting evaluation. Assuming each step in the free-
1
the program. variable tracing algorithm takes time prop ortional to the
size (in symb ols) of the heap ob ject forwarded to the to-
gen
0
heap, the time cost of the algorithm is prop ortional to the
letrec H in e 7 ! letrec H ] H in e
1
2
fva
0
amount of data preserved, not the total amount of data in
(gen)
if letrec H in e 7 ! letrec H in e
2
2
the original heap.
and H ;H form a generational partition of H
1 2
The rule's soundness follows directly from the Generational
3.2 Generational Garbage Collection
Collection theorem, together with the soundness of the free-
The free-variable tracing algorithm examines al l of the
variable tracing algorithm.
reachable bindings in the heap to determine that a set of
The third reason generational collection is imp ortantis
bindings may be removed. By carefully partitioning the
that empirical evidence shows that \ob jects tend to die
heap into smaller heaps, a garbage collector can scan less
young" [28 ]. That is, recently allo cated bindings are more
than the whole heap and still free signi cant amounts of
likely to b ecome garbage in a small number of evaluation
memory. A generational partition of a program's heap is a
steps. Thus, if we place recently allo cated bindings in
sequence of sub-heaps ordered in sucha way that \older"
younger generations, we can concentrate our collection ef-
generations never have p ointers to \younger" generations.
forts on these generations, ignoring older generations, and
still eliminate most of the garbage.
De nition 3.6 (Generational Partition)
A generational partition of a heap H is a sequenceofheaps
4 Garbage Collection via Typ e Recovery
H ;H ;:::;H such that H = H ] H ] ]H and for al l i
1 2 n 1 2 n
such that 1 i i i+1 i+2 n The delimiters and other tokens of the abstract syntax mark ;. The H are referred to as generations and H is said to i i or \tag" heap values with enough information that wecan be an older generation than H if i j distinguish pairs from functions, p ointers from integers, etc. This allows us to navigate through the memory unambigu- Given a generational partition of a program's heap, a ously, but placing tags on heap values and stripping them o free-variable based garbage collector can eliminate a set of to p erform a computation can imp ose a heavy overhead on bindings in younger generations without lo oking at anyolder the running time and space requirements of programs [26 ]. generations. An alternative to tagging is the use of typ es to determine the Theorem 3.7 (Generational Collection) shap e of an ob ject. If typ es are determined at compile time Let H ;:::;H be a generational partition of the heap and evaluation maintains enough information that the typ es 1 n 2 1 ), and ] H of P = letrec H in e. Suppose H = (H of reachable ob jects can always b e recovered, then there is i i i 2 1 Dom (H ) \ FV (letrec H ] H ]]H in e)=;. Then no need to tag values. i+1 n i i fv 2 Anumb er of researchers have made attempts to explore ) in e. P 7 ! letrec (H n H i this alternative [7 , 8 , 3 , 13 , 27, 2], but none of them presented 2 concise characterizations of the underlying techniques with Pro of: Wemust show that Dom (H ) \ FV(letrec (H n i 2 correctness pro ofs. In this section, we present the basic idea ) in e)=;. Since H ; ;H is a generational partition H 1 n i b ehind typ e-recovery based garbage collection. Wethenin- of H , for all j ,1 j j j +1 n 2 tro duce gc-mono, an explicitly typ ed, monomorphic vari- Hence, FV(H ]]H ) \ Dom (H )=;. Now, 1 i 1 i antof gc. Weshow howto adapt the free-variable trac- 2 2 ing algorithm to recover typ es of ob jects in the heap and FV(letrec H n H in e) \ Dom (H ) i i 2 2 to use these typ es in the traversal of heap ob jects instead ) ) [ FV(e)) \ Dom (H = (FV(H n H i i 1 of abstract syntax. The pro of of correctness for this gar- ]]H ) [ FV(e)) = (FV(H ]]H ) [ FV(H n 1 i 1 i 2 bage collection algorithm is given by extending the pro of of \Dom (H ) i soundness of the typ e system for gc-mono. 2 = (FV(H ]]H ) \ Dom (H ))[ 1 i 1 i 1 2 ((FV(H ] ] H ) [ FV(e)) \ Dom (H )) n i i 2 1 4.1 gc-mono )) ] ] H ) [ FV(e)) \ Dom (H = ;[((FV(H n i i 1 2 = FV(letrec H ] ] H in e) \ Dom (H ) n i i gc-mono is an explicitly typ ed, monomorphic variant of = ; gc. Thesetoftyp es ( )of gc-mono contains the conven- tional basic typ es and constructed typ es for typing a func- 2 tional programming language like gc . The expressions of gc-mono are the same as for gc, except that each raw Generational collection is imp ortantfor three practical expression (u) is paired with some typ e information (see reasons: First, evaluation of closed, pure gc programs Figure 2). Heap values are not paired with their typ e, re- makes it easy to maintain generational partitions. ecting the fact that the memory is \almost tag free." Of Theorem 3.8 Let P = letrec H in e be a closed program. course, some typ e information is recorded within heap values R that are functions. In a low-level mo del that uses closures If H ;:::;H is a generational partition of H and P 7 ! 1 n 0 0 (an explicit value environment paired with co de), this cor- in e, then H ;:::;H ;H is a generational letrec H ] H 1 n 0 resp onds to maintaining a type environment, recording the partition of H ] H . typ es of the values in the closure's environment. The typ e environment for a closure can b e computed at compile time The second reason generational collection is imp ortant and shared among multiple instances, just like the co de. is that, given a generational partition, we can directly use The evaluation contexts (E ) consist of a raw context (U ) the free-variable tracing algorithm to generate a collection and a typ e ( ). Rawcontexts are lled with instructions (I ) of a program. The following rule invokes the free-variable Typ es: (typ es) 2 Type ::= int j j ! 1 2 1 2 Programs: (expressions) e 2 TExp ::= (u : ) u 2 UExp ::= x j i jhe ;e ij e j x::e j e e 1 2 i 1 2 (heap values) h 2 Hval ::= i jhx ;x ijx::e 1 2 (heaps) H 2 Heap ::= fx = h ;:: :;x = h g 1 1 n n (programs) P 2 Prog ::= letrec H in e (answers) A 2 Ans ::= letrec H in (x : ) Evaluation Contexts and Instructions: (contexts) E 2 TCtxt ::= (U : ) U 2 UCtxt ::= []jhE; eij h(x: );Eij E j Eej (x: ) E i (instructions) I 2 Instr ::= i jh(x : ); (x : )ij x::e j (x: ) j (x: )(y : ) 1 1 2 2 i 1 2 Rewriting Rules: allo c (allo c-int ) letrec H in E [i] 7 ! letrec H ]fx = ig in E [x] allo c (allo c-pair ) letrec H in E [hx : ;x : i] 7 ! letrec H ]fx = hx ;x ig in E [x] 1 1 2 2 1 2 allo c (allo c-fn ) letrec H in E [y ::e] 7 ! letrec H ]fx = y ::eg in E [x] i (pro j ) letrec H in E [ (x: )] 7 ! letrec H in E [x ] (H (x)= hx ;x i;i=1; 2) i i 1 2 app 0 (app) letrec H in E [(x: )(y : )] 7 ! letrec H ]fz = H (y )g in E [e] (H (x)= z : :e) 1 2 Figure 2: The Syntax and Op erational Semantics of gc-mono the heap rule requires \guessing" the typ es of the values in which are a subset of the raw expressions (u). The evalu- the heap and verifying these guesses simultaneously. The ation rules for gc-mono, named RM , are variants of the third judgement, .P, asserts the well-typing of a complete rules for gc that largely ignore the typ e information on the program. sub-expressions of the program's b o dy. Allo cation of inte- The calculus gc-mono is type sound in that evaluation gers, tuples and functions strips the typ e tag o the heap of well formed programs cannot get stuck[30]. Akey to the value b efore placing it in the heap. Allo cation of tuples also pro of of soundness is a typepreservation lemma: removes the tags from the comp onents of the tuple. The removal of typ e information corresp onds to the passage of RM 0 Lemma 4.2 (Typ e Preservation) If .P and P 7 ! P , avalue from co de to data and do es not necessarily re ect a 0 then .P . runtime cost. Pro jection and application are essentially the same as for gc. Note that substitution of a result expres- Theorem 4.3 (Typ e Soundness) If .P then either P sion for an instruction o ccurs \in place," and hence the typ e RM 0 0 is an answer or else there exists some P such that P 7 ! P of the instruction is ascrib ed to the expression. 0 and .P . The notion of a stuck state is adapted in accordance with the typ e structure of the language. 4.2 Using Typ es in Garbage Collection De nition 4.1 (gc-mono Stuck Programs) A Sp ecifying a valid garbage collection rule that exploits typ es program is stuck if it is of one of the fol lowing forms: is straightforward. The key prop erty that the rule must gc 0 letrec H in E [ (x: )] (x 62 Dom (H ) or preserveis typepreservation. That is, if P 7 ! P ,andP is i 0 H (x) 6= hx ;x i) well typ ed, then P mustbewell typ ed. One waytoachieve 1 2 letrec H in E [(x: )(y : )] (x or y 62 Dom (H ) or this goal is to makeitinto a side-condition of the GC rule: 1 2 H (x) 6= z : :e) mono letrec H ] H in e 7 ! letrec H in e 1 2 1 (mono) if . letrec H in e The static semantics of gc-mono consists of four judge- 1 ments ab out program comp onents. The rst judgement, Theorem 4.4 For al l wel l formed gc-mono programs P ,