TIL: A Type-Directed Optimizing Compiler for ML
D. Tarditi, G. Morrisett, P. Cheng, C. Stone, R. Harper, and P. Lee
School of Computer Science
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213-3891

To appear in the ACM SIGPLAN '96 Conference on Programming Language Design and Implementation.

This research was sponsored in part by the Advanced Research Projects Agency CSTO under the title "The Fox Project: Advanced Languages for Systems Software", ARPA Order No. C533, issued by ESC/ENS under Contract No. F19628-95-C-0050, and in part by the National Science Foundation under Grant No. CCR-9502674, and in part by the Isaac Newton Institute for Mathematical Sciences, Cambridge, England. David Tarditi was also partly supported by an AT&T Bell Labs PhD Scholarship. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing official policies, either expressed or implied, of the Advanced Research Projects Agency, the U.S. Government, the National Science Foundation or AT&T.

1 Introduction

We are investigating a new approach to compiling Standard ML (SML) based on four key technologies: intensional polymorphism [23], nearly tag-free garbage collection [12, 46, 34], conventional functional language optimization, and loop optimization. To explore the practicality of our approach, we have constructed a compiler for SML called TIL, and are thus far encouraged by the results: on DEC ALPHA workstations, programs compiled by TIL are roughly three times faster, do one-fifth the total heap allocation, and use one-half the physical memory of programs compiled by SML of New Jersey (SML/NJ). However, our results are still preliminary: we have not yet investigated how to improve compile time; TIL takes about eight times longer to compile programs than SML/NJ. Also, we have not yet implemented the full module system of SML, although we do provide support for structures and separate compilation. Finally, we expect the performance of programs compiled by TIL to improve significantly as we tune the compiler and implement more optimizations.

Two key issues in the compilation of advanced languages such as SML are the presence of garbage collection and type variables. Most compilers use a universal representation for values of unknown or variable type. In particular, values are forced to fit into a tagged machine word; values larger than a machine word are represented as pointers to tagged, heap-allocated objects. This approach supports fast garbage collection and efficient polymorphic functions, but can result in inefficient code when types are known at compile time. Even with recent advances in SML compilation, such as Leroy's representation analysis [28], values must be placed in a universal representation before being stored in updateable data structures (e.g., arrays) or recursive data structures (e.g., lists).

Intensional polymorphism and tag-free garbage collection eliminate the need to use a universal representation when compiling polymorphic languages. TIL uses these technologies to represent many data values "naturally". For example, TIL provides tag-free, unallocated, word-sized integers; aligned, unboxed floating-point arrays; and unallocated multi-argument functions. These natural representations and calling conventions not only improve the performance of SML programs, but also allow them to interoperate with legacy code written in languages such as C and Fortran. When types are unknown at compile time, TIL may produce machine code which is slower and bigger than conventional approaches. This is because types must be constructed and passed to polymorphic functions, and polymorphic functions must examine the types at run-time to determine appropriate execution paths. However, when types are known at compile time, no overhead is incurred to support polymorphism or garbage collection.

Because these technologies make polymorphic functions slower, it becomes important to eliminate as many polymorphic functions at compile time as is possible. Inlining and uncurrying are well-known techniques for eliminating polymorphic and higher-order functions. We have found that for the benchmarks used here, these techniques eliminate all polymorphic functions and all but a few higher-order functions when programs are compiled as a whole.

We have also found that applying traditional loop optimizations to recursive functions, such as common sub-expression elimination and invariant removal, is important. In fact, these optimizations reduce execution time by a median of 39%.

An important property of TIL is that all optimizations and the key transformations are performed on typed intermediate languages (hence the name TIL). Maintaining correct type information throughout optimization is necessary to support both intensional polymorphism and garbage collection, both of which require type information at run time. By using strongly-typed intermediate languages, we ensure that type information is maintained in a principled fashion, instead of relying upon ad hoc invariants. In fact, using the intermediate forms of TIL, an "untrusted" compiler can produce fully optimized intermediate code, and a client can automatically verify the type integrity of the code. We have found that this ability has a strong engineering benefit: type-checking the output of each optimization or transformation helps us identify and eliminate bugs in the compiler.
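
As a rough illustration of that engineering benefit, the following Python sketch re-checks a toy typed expression language after every pass. Nothing here is TIL code: the node classes, `typecheck`, `constant_fold`, `buggy_pass`, and `run_passes` are all hypothetical names invented for the example, and the toy language has only integers and booleans rather than TIL's intensionally polymorphic types.

```python
from dataclasses import dataclass


# Hypothetical toy IR; not TIL's Lmli.
@dataclass(frozen=True)
class IntLit:
    value: int


@dataclass(frozen=True)
class BoolLit:
    value: bool


@dataclass(frozen=True)
class Add:
    left: object
    right: object


@dataclass(frozen=True)
class Less:
    left: object
    right: object


@dataclass(frozen=True)
class If:
    cond: object
    then: object
    other: object


def typecheck(e):
    """Return "int" or "bool" for a well-typed term; raise TypeError otherwise."""
    if isinstance(e, IntLit):
        return "int"
    if isinstance(e, BoolLit):
        return "bool"
    if isinstance(e, (Add, Less)):
        if typecheck(e.left) != "int" or typecheck(e.right) != "int":
            raise TypeError(f"non-integer operand in {e!r}")
        return "int" if isinstance(e, Add) else "bool"
    if isinstance(e, If):
        if typecheck(e.cond) != "bool":
            raise TypeError(f"non-boolean condition in {e!r}")
        branch_type = typecheck(e.then)
        if typecheck(e.other) != branch_type:
            raise TypeError(f"branches disagree in {e!r}")
        return branch_type
    raise TypeError(f"unknown node {e!r}")


def constant_fold(e):
    """A type-preserving pass: evaluate operations whose operands are literals."""
    if isinstance(e, (Add, Less)):
        left, right = constant_fold(e.left), constant_fold(e.right)
        if isinstance(left, IntLit) and isinstance(right, IntLit):
            if isinstance(e, Add):
                return IntLit(left.value + right.value)
            return BoolLit(left.value < right.value)
        return type(e)(left, right)
    if isinstance(e, If):
        cond = constant_fold(e.cond)
        if isinstance(cond, BoolLit):
            return constant_fold(e.then if cond.value else e.other)
        return If(cond, constant_fold(e.then), constant_fold(e.other))
    return e


def buggy_pass(e):
    """A deliberately wrong pass: it rewrites every comparison to the integer 0."""
    if isinstance(e, Less):
        return IntLit(0)
    if isinstance(e, Add):
        return Add(buggy_pass(e.left), buggy_pass(e.right))
    if isinstance(e, If):
        return If(buggy_pass(e.cond), buggy_pass(e.then), buggy_pass(e.other))
    return e


def run_passes(e, passes):
    """Apply each pass, re-checking the term and its type after every one."""
    expected = typecheck(e)
    for p in passes:
        e = p(e)
        actual = typecheck(e)  # raises if the pass produced an ill-typed term
        if actual != expected:
            raise TypeError(f"{p.__name__} changed the type from {expected} to {actual}")
    return e


program = If(Less(IntLit(1), IntLit(2)), Add(IntLit(3), IntLit(4)), IntLit(0))
folded = run_passes(program, [constant_fold])
```

In this sketch `folded` is `IntLit(7)`, whereas `run_passes(program, [buggy_pass])` raises `TypeError` because the rewritten condition is no longer a boolean. The point is only that the check localizes the fault to the pass that introduced it.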
In the remainder of this paper, we describe the technologies used by TIL in detail, give an overview of the structure of TIL, present a detailed example showing how TIL compiles ML code, and give performance results of code produced by TIL.

2 Overview of the Technologies

This section contains a high-level overview of the technologies we use in TIL.

2.1 Intensional Polymorphism

Intensional polymorphism [23] eliminates restrictions on data representations due to polymorphism, separate compilation, abstract datatypes, and garbage collection. It also supports efficient calling conventions (multiple arguments passed in registers) and tag-free polymorphic, structural equality.

With intensional polymorphism, types are constructed and passed as values at run time to polymorphic functions, and these functions can branch based on the types. For example, when extracting a value from an array, TIL uses a typecase expression to determine the type of the array and to select the appropriate specialized subscript operation:

    fun sub[α](x : α array, i : int) =
      typecase α of
        int    => intsub(x, i)
      | float  => floatsub(x, i)
      | ptr(τ) => ptrsub(x, i)

If the type of the array can be determined at compile-time, then an optimizer can eliminate the typecase:

    sub[float](a, 5)  ↪  floatsub(a, 5)

However, intensional polymorphism comes with two costs. First, we must construct and pass representations of types to polymorphic functions at run time. Furthermore, we must compile polymorphic functions to support any possible representation and insert typecase constructs to select the appropriate code paths. Hence, the code we generate for polymorphic functions is both bigger and slower, and minimizing polymorphism becomes quite important.

Second, in order to use type information at run time, for both intensional polymorphism and tag-free garbage collection, we must propagate types through each stage of compilation. To address this second problem, almost all compilation stages, including optimization and closure conversion, are expressed as type-directed, type-preserving translations to strongly-typed intermediate languages.

The key difficulty with using typed intermediate languages is formulating a type system that is expressive enough to statically type check terms that branch on types at run time, such as sub. The type system used in TIL is based on the approach suggested by Harper and Morrisett [23, 33]. Types themselves are represented as expressions in a simply-typed λ-calculus extended with an inductively generated base kind (the monotypes), and a corresponding induction elimination form. The induction elimination form is essentially a "Typecase" at the type level; this allows us to write type expressions that track the run-time control flow of term-level typecase expressions. Nevertheless, the type system used by TIL remains both sound and decidable. This implies that at any stage during optimization, we can automatically verify the type integrity of the code.

2.2 Conventional and Loop-Oriented Optimizations

Program optimization is crucial to reducing the cost of intensional polymorphism, improving loops and recursive functions, and eliminating higher-order and polymorphic functions. TIL employs optimizations found in conventional functional language compilers, including inlining, uncurrying, dead-code elimination, and constant-folding. In addition, TIL does a set of generalized "loop-oriented" optimizations to improve recursive functions. These optimizations include common-subexpression elimination, invariant removal, and array-bound check removal. In spite of the large number of different optimizations, each optimization produces type-correct code.

TIL applies optimizations across entire compilation units. This makes it more likely that inlining and uncurrying will eliminate higher-order functions, which are likely to interfere with the loop-oriented optimizations. Since the optimizations are applied to entire compilation units (which may be whole programs), we paid close attention to algorithmic efficiency of individual optimization passes. Most of the passes have an O(N log N) worst-case asymptotic complexity (excluding checking types for equality), where N is program size.

2.3 Nearly Tag-Free Garbage Collection

Nearly tag-free garbage collection uses type information to eliminate data representation restrictions due to garbage collection. The basic idea is to record enough representation information at compile time so that, at any point where a garbage collection can occur, it is possible to determine whether or not values are pointers and hence must be traced by the garbage collector. Recording the information at compile time makes it possible for code to use untagged representations. Unlike so-called conservative collectors (see for example [10, 14]), the information recorded by TIL is sufficient to collect all unreachable objects.

Collection is "nearly" tag-free because tags are placed only on heap-allocated data structures (records and arrays); values in registers, on the stack, and within data structures remain tagless. We construct the tags for monomorphic records and arrays at compile time. For records or arrays with unknown component types, we may need to construct tags partially at run time. As with other polymorphic operations, we use intensional polymorphism to construct these tags.

Registers and components of stack frames are not tagged. Instead, we generate tables at compile time that describe the layout of registers and stack frames. We associate these tables with the addresses of call sites within functions at compile time. When garbage collection is invoked, the collector scans the stack, using the return address of each frame as an index into the table. The collector looks up the layout of each stack frame to determine which stack locations to trace. We record additional liveness information for each variable to avoid tracing pointers that are no longer needed.

This approach is well-understood for monomorphic languages requiring garbage collection [12]. Following Tolmach [46], we extended it to a polymorphic language as follows: when a variable whose type is unknown is saved in a stack frame, the type of the variable is also saved in the stack frame. However, unlike Tolmach, we evaluate substitutions of ground types for type variables eagerly instead of lazily. This is due in part to technical reasons (see [33, Chapter 7]), and in part to avoid a class of space leaks that might result with lazy substitution.

3 Compilation Phases of TIL

Figure 1 shows the various compilation phases of TIL. The phases through and including closure conversion use a typed intermediate language. The phase after closure conversion uses an untyped language where variables are annotated with garbage collection information. The low-level phases of the compiler use languages where registers are annotated with garbage collection information.

The following sections describe the phases of TIL and the intermediate languages they use in more detail.

    Phase                                  Work done in the phase
    -------------------------------------  ------------------------------------------------------
    Front end                              parse, elaborate, eliminate pattern matching
    (typed intermediate languages)
    Conversion to Lmli                     introduce intensional polymorphism,
                                           choose data representations
    Type-directed optimization             flatten args, flatten constructors, box floats
    Conventional and loop optimizations    do inlining, uncurrying, constant-folding, CSE,
                                           invariant removal, etc.
    Closure conversion                     close functions, choose environment representations
    (untyped language with gc info)
    Conversion to untyped language         calculate gc info for variables,
      with gc info                         choose representation for types
    (registers annotated with gc info)
    Conversion to RTL                      choose machine representation for variables,
                                           introduce tagging for records and arrays
    Register allocation                    do graph-coloring register allocation,
                                           construct tables for gc
    Assembly

    Figure 1: Phases of the TIL compiler

3.1 Front-end

The first phase of TIL uses the front-end of the ML Kit compiler [8] to parse and elaborate (type check) SML source code. The Kit produces annotated abstract syntax for all of SML and then compiles a subset of this abstract syntax to an explicitly-typed core language called Lambda. The compilation to Lambda eliminates pattern matching and various derived forms.

We extended Lambda to support signatures, structures (modules), and separate compilation. Each source module is compiled to a Lambda module with an explicit list of imported modules and their signatures. Imported signatures may include transparent definitions of types defined in other modules; hence TIL supports a limited form of translucent [22] or manifest types [29]. Currently, the mapping to Lambda does not handle signatures, nested structures, or functors. In principle, however, all of these constructs are supported by TIL's intermediate languages.

3.2 Lmli and Type-Directed Optimizations

Lmli, which stands for λ_i^ML [23], is an intensionally polymorphic language that provides explicit support for constructing, passing, and analyzing types at run-time. We use these constructs in the translation of Lambda to Lmli to provide efficient data representations for user-defined datatypes, multi-argument functions, tag-free polymorphic equality, and specialized arrays.

After the conversion from Lambda to Lmli, TIL performs a series of type-directed optimizations. SML provides only single-argument functions; multiple arguments are passed in a record. The first optimization, argument flattening, translates each function which takes a record as an argument to a function which takes the components of the record as multiple arguments. These arguments are passed in registers, avoiding allocation to create the record and memory operations to access record components. If a function takes an argument of variable type α, then we use typecase to determine the proper calling convention, according to the instantiation of α at run time.

As with functions, datatype constructors in SML take a single argument. For example, the cons data constructor (::) for an α list takes a single record, consisting of an α value and an α list value. Naively, such a constructor is represented as a pair consisting of a tag (e.g., cons), and a pointer to the record containing the α value and the α list value. The tag is a small integer value used to distinguish among the constructors of a datatype (e.g., nil vs. ::). Constructor flattening rewrites all constructors that take records as arguments so that the components of the records are flattened. In addition, constructor flattening eliminates tag components when they are unneeded. For example, cons applied to (hd,tl) is simply represented as a pointer to the pair (hd,tl), since such a pointer can always be distinguished from nil. If the constructor takes an argument of unknown type α, then we use typecase to determine the proper representation, according to the instantiation of α at run time.

Because lists are used often in SML, the SML/NJ compiler also flattens cons cells (and other constructors). However, in violation of the SML Definition [31], SML/NJ prevents programmers from abstracting the type of these constructors, in order to prevent representation mismatches between definitions of abstract datatypes and their uses [3]. In contrast, TIL supports fully abstract datatype components, but uses intensional polymorphism to determine representations of abstract datatypes, potentially at run time.

In addition to specializing calling conventions and datatypes, the conversion from Lambda to Lmli makes polymorphic equality explicit as a term in the language. Also, arrays are specialized into one of three cases: int arrays, float arrays, and pointer arrays. Intensional polymorphism is used to select the appropriate creation, subscript, and update operations for polymorphic arrays.

Finally, TIL boxes all floating-point values, except for values stored in floating-point arrays. We chose to box floats to make record operations faster, since typical SML code manipulates many records but few floats. The issue is that floating-point values are 64 bits, while other scalars and pointers are 32 bits. If floats were unboxed, then record offset calculations could not always be done at compile time. Fortunately, the optimizer later eliminates unnecessary box/unbox operations during the constant-folding phase, so straight-line floating-point code still runs fast.

In all, the combination of type-directed optimizations reduces running times by roughly 40% and allocation by 50% [33, Chapter 8]. However, much of this improvement can be realized by other techniques; for example, SML/NJ uses Leroy's unboxing technique to achieve comparable improvements for calling conventions [42]. The advantage of our approach is that we use a single mechanism (intensional polymorphism) to specialize calling conventions, flatten constructors, unbox floating-point arrays, and eliminate tags for both polymorphic equality and garbage collection.

3.3 Optimizations

TIL employs an extensive set of optimizations. The optimizations include most of those typically done by compilers for functional languages. They also include loop-oriented optimizations, such as invariant removal, applied to recursive functions.

TIL first translates Lmli to a subset of Lmli called Bform. Bform, based on A-Normal-Form [18], is a more regular intermediate language than Lmli that facilitates optimization. The translation from Lmli names all intermediate computations and binds them to variables by a let-construct. It also names all potentially heap-allocated values, including strings, records and functions. Finally, it allows nested
let expressions only within switches (branch expressions). Hence, the translation from Lmli to Bform linearizes and names nested computations and values.

After translation to Bform, TIL performs the following conventional transformations:

• alpha-conversion: Bound variables are uniquely renamed.

• dead-code elimination: Unreferenced, pure expressions and functions are eliminated.

• uncurrying: Curried functions are transformed to multi-argument functions, when possible.

• constant folding: Arithmetic operations, switches, and typecases on constant values are reduced, as well as projections from known records.

• sinking: Pure expressions used in only one branch of a switch are pushed into that branch. However, such expressions are not pushed into function definitions.

• inlining: Non-escaping functions that are called only once are always inlined. Small, non-recursive functions are inlined in a bottom-up pass. Recursive functions are never (directly) inlined.

• inlining switch continuations: The continuation of a switch is inlined when all but one branch raises an exception. For example, the expression

      let x = if y then e2 else raise e3
      in e4
      end

  is transformed to

      if y then let x = e2 in e4 end else raise e3.

  This makes expressions in e2 available within e4 for optimizations like common sub-expression elimination.

• minimizing fix: Mutually-recursive functions are broken into sets of strongly connected components. This improves inlining and dead-code elimination, by separating non-recursive and recursive functions.

In addition to these standard functional language transformations, TIL also applies loop-oriented optimizations to recursive functions:

• common subexpression elimination (CSE): Given an expression

      let x = e1
      in e2
      end

  if e1 is pure or the only effect it may have is to raise an exception, then all occurrences of e1 in e2 are replaced with x. The only expressions that are excluded from CSE are side-effecting expressions and function calls.

• eliminating redundant switches: Given an expression

      let x = if z then
                let y = if z then e1 else e2
                in ...

  the nested if statement is replaced by e1, since z is always true at that point.

• invariant removal: Using the call graph, we calculate the nesting depth of each function. (Nesting depth is analogous to loop-nesting depth in languages like C.) TIL assigns a let-bound variable and the expression it binds a nesting depth equal to that of the nearest enclosing function. For every pure expression e, if all free variables of e have a nesting depth less than e, TIL moves the definition of e right after the definition of the free variable with the highest lexical nesting depth.

• hoisting: All constant expressions are hoisted to the top of the program. An expression is a constant expression if it uses only constants or variables bound to constant expressions.

• eliminating redundant comparisons: A set of simple arithmetic relations of the form x < y is propagated top-down through the program. A "rule-of-signs" abstract interpretation is used to determine signs of variables. This information is used to eliminate array-bounds checks and other tests.

TIL applies the optimizations as follows: first, it performs a round of reduction optimizations, including dead-code elimination, constant folding, inlining functions called once, CSE, eliminating redundant switches, and invariant removal. These optimizations do not increase program size and should result in faster code. It iterates these optimizations until no further reductions occur. Then it performs switch-continuation inlining, sinking, uncurrying, comparison elimination, fix minimizing, and inlining. The entire process, starting with the reduction optimizations, is iterated two or more times.

3.4 Closure conversion

TIL uses a type-directed, abstract closure conversion in the style suggested by Minamide, Morrisett, and Harper [32] to convert Lmli-Bform programs to Lmli-Closure programs. Lmli-Closure is an extension of Lmli-Bform that provides constructs for explicitly constructing closures and their environments.

For each escaping Bform function, TIL generates a closed piece of code, a type environment, and a value environment. The code takes the free type variables and free value variables of the original function as extra arguments. The types and values corresponding to these free variables are placed in records. These records are paired with the code to form an abstract closure. TIL uses a flat environment representation for type and value environments [5].

For known functions, TIL generates closed code but avoids creating environments or a closure. Following Kranz [27], we modify the call sites of known functions to pass free variables as additional arguments.

TIL closes over only variables which are function arguments or are bound within functions. The locations of other "top-level" variables are resolved at compile-time through traditional linking, so their values do not need to be stored in a closure.
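
The flat-environment scheme for escaping functions can be sketched on a toy untyped lambda calculus. The Python below is only an illustration under simplifying assumptions: it ignores types (so there is no type environment), treats every function as escaping, and uses invented names (`Code`, `MkClosure`, `convert`, `run`) that are not part of TIL. It does follow the flat-environment idea, in which each closure record holds copies of exactly the free variables of its function.

```python
from dataclasses import dataclass


# Source terms of a hypothetical toy language (untyped, unlike Lmli-Bform).
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Add:
    left: object
    right: object


@dataclass(frozen=True)
class Lam:
    param: str
    body: object


@dataclass(frozen=True)
class App:
    fn: object
    arg: object


# Target-only terms produced by closure conversion.
@dataclass(frozen=True)
class Code:
    env_names: tuple  # order of the free variables in the flat record
    param: str
    body: object      # refers only to env_names and param


@dataclass(frozen=True)
class MkClosure:
    code: Code
    env: tuple        # one Var per entry of code.env_names


def free_vars(e):
    """Free variables of a source term."""
    if isinstance(e, Var):
        return {e.name}
    if isinstance(e, Const):
        return set()
    if isinstance(e, Add):
        return free_vars(e.left) | free_vars(e.right)
    if isinstance(e, Lam):
        return free_vars(e.body) - {e.param}
    if isinstance(e, App):
        return free_vars(e.fn) | free_vars(e.arg)
    raise TypeError(f"unknown node {e!r}")


def convert(e):
    """Replace every Lam by closed code paired with a flat environment record."""
    if isinstance(e, (Var, Const)):
        return e
    if isinstance(e, Add):
        return Add(convert(e.left), convert(e.right))
    if isinstance(e, App):
        return App(convert(e.fn), convert(e.arg))
    if isinstance(e, Lam):
        names = tuple(sorted(free_vars(e)))
        code = Code(names, e.param, convert(e.body))
        return MkClosure(code, tuple(Var(n) for n in names))
    raise TypeError(f"unknown node {e!r}")


def run(e, env):
    """Evaluate a converted term; closed code sees only its record and argument."""
    if isinstance(e, Const):
        return e.value
    if isinstance(e, Var):
        return env[e.name]
    if isinstance(e, Add):
        return run(e.left, env) + run(e.right, env)
    if isinstance(e, MkClosure):
        # Copy the current values of the free variables into a flat record.
        return (e.code, tuple(run(v, env) for v in e.env))
    if isinstance(e, App):
        code, record = run(e.fn, env)
        arg = run(e.arg, env)
        local = dict(zip(code.env_names, record))
        local[code.param] = arg
        return run(code.body, local)
    raise TypeError(f"unknown node {e!r}")


# fn n => fn x => x + n, applied to 3 and then to 4.
add_n = Lam("n", Lam("x", Add(Var("x"), Var("n"))))
prog = App(App(add_n, Const(3)), Const(4))
result = run(convert(prog), {})
```

Here `result` is 7, and the inner function's code lists `("n",)` as its only environment entry. Because `run` evaluates a code body using nothing but the record and the argument, any variable that conversion failed to capture would surface as a `KeyError`, which is the sense in which the generated code is closed.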

3.5 Conversion to an untyped language

To simplify the conversion to low-level assembly code, TIL translates Lmli-Closure programs to an untyped language called Ubform. Ubform is a much simpler language than Lmli, since similar type-level and term-level constructs are collapsed to the same term-level constructor. For example, in the translation from Lmli-Closure to Ubform, TIL replaces typecase with a conventional switch expression. This simplifies generation of low-level code, since there are many fewer cases.

TIL annotates variables with representation information that tells the garbage collector what kinds of values variables must contain (e.g., pointers, integers, floats, or pointers to code). The representation of a variable x may be unknown at compile time, in which case the representation information is the name of the variable y that will contain the type of x at run time.

3.6 Conversion to RTL

Next TIL converts Ubform programs to RTL, a register-transfer language similar to ALPHA or other RISC-style assembly language. RTL provides an infinite number of pseudo-registers, each of which is annotated with representation information. Representation information is extended to include locatives, which are pointers into the middle of objects. Pseudo-registers containing locatives are never live across a point where garbage collection can occur. RTL also provides heavy-weight function call and return mechanisms, and a form of interprocedural goto for implementing exceptions.

The conversion of Ubform to RTL decides whether Ubform variables will be represented as constants, labels, or pseudo-registers. It also eliminates exceptions, inserts tagging operations for records and arrays, and inserts garbage collection checks.

3.7 Register allocation and assembly

Before doing register allocation, TIL converts RTL programs to ALPHA assembly language with extensions similar to those for RTL. Then TIL uses conventional graph-coloring register allocation to allocate physical registers for the pseudo-registers. It also generates tables describing layout and garbage collection information for each stack frame, as described in Section 2.3. Finally, TIL generates actual ALPHA assembly language and invokes the system assembler, which does instruction scheduling and creates a standard object file.

4 An example

This section shows an ML function as it passes through the various stages of TIL. The following SML code defines a dot product function that is the inner loop of the integer matrix multiply benchmark:

    val sub2 : 'a array2 * int * int -> 'a

    fun dot(cnt,sum) =
      if cnt<bound then
        let val sum' = sum + sub2(A,i,cnt) * sub2(B,cnt,j)
        in dot(cnt+1,sum')
        end
      else sum

The function sub2 is a built-in 2-d array subscript function which the front end expands to

    fun sub2 ({columns,rows,v}, s:int, t:int) =
      if s<0 orelse s>=rows orelse t<0 orelse t>=columns
      then raise Subscript
      else unsafe_sub1(v, s * columns + t)

Figures 2 through 7 show the actual intermediate code created as dot and sub2 pass through the various stages of TIL. For readability, we have renamed variables, erased type information, and performed some minor optimizations, such as eliminating selections of fields from known records.

Figure 2 shows the functions after they have been converted to Lmli. The sub2 function takes a type as an argument. A function parameterized by a type is written as Λt., while a function parameterized by a value is written as λi. In the dot function, the sub2 function is first applied to a type and then applied to its actual values. Each function takes only one argument, often a record, from which fields are selected. The quality of code at this level is quite poor: there are eight function applications, four record constructions, and numerous checks for array bounds.

Figure 3 shows the Lmli fragment after it has been converted to Lmli-Bform. Functions have been transformed to take multiple arguments instead of records and every intermediate computation is named.

Figure 4 shows the Lmli-Bform fragment after it has been optimized. All the function applications in the body of the loop have been eliminated. psub_ai(av,a) is an application of the (unsafe) integer array subscript primitive. All of the comparisons for array bounds checking have been safely eliminated, and the body of the loop consists of 9 expressions. This loop could be improved even further; we have yet to implement any form of strength reduction and induction variable elimination.

Figure 5 shows the Lmli-Bform fragment after it has been converted to Ubform. Each variable is now annotated with representation information, to be used by the garbage collector. INT denotes integers and TRACE denotes pointers to tagged objects. The function is now closed, since it was closure converted before converting to Ubform.

Figure 6 shows the Ubform fragment after it has been converted to RTL. Every pseudo-register is now annotated with precise representation information for the collector. The representation information has been extended to include LOCATIVE, which denotes pointers into the middle of tagged objects. Locatives cannot be live across garbage-collection points. The (*) indicates points where the psub_ai primitive has been expanded to two RTL instructions. This indicates that induction-variable elimination would also be profitable at the RTL level. The return instruction's operand is a pseudo-register containing the return address.

Figure 7 shows the actual DEC ALPHA assembly language generated for the dot function. The code between L1 and L3 corresponds to the RTL code. The other code is epilogue and prologue code for entering and exiting the function. Note that no tagging operations occur anywhere in this function.

    sub2 =
      let fix f = Λty.
        let fix g = λarg.
          let a = (#0 arg)
              s = (#1 arg)
              t = (#2 arg)
              columns = (#0 a)
              rows = (#1 a)
              v = (#2 a)
              check =
                let test1 = plst_i(s,0)
                in Switch_enum test1 of
                     1 => λ.enum(1)
                   | 0 => λ.
                       let test2 = pgte_i(s,rows)
                       in Switch_enum test2 of
                            1 => λ.enum(1)
                          | 0 => λ.
                              let test3 = plst_i(t,0)
                              in Switch_enum test3 of
                                   1 => λ.enum(1)
                                 | 0 => λ.pgte_i(t,columns)
                              end
                       end
          in Switch_enum check of
               1 => λ.raise Subscript
             | 0 => λ.unsafe_sub1 [ty] {v, t + s * columns}
          end
        in g
      in f
      end

    fix dot =
      λi.let cnt = (#0 i)
             sum = (#1 i)
             d = plst_i(cnt,bound)
         in Switch_enum d of
              1 => λ.let sum' = sum + ((sub2 [int]) {A,i,cnt}) *
                                      ((sub2 [int]) {B,cnt,j})
                     in dot{cnt+1,sum'}
                     end
            | 0 => λ.sum
         end

    Figure 2: After conversion to Lmli

    sub2 = ...
    fix dot =
      λcnt,sum.
        let test = plst_i(cnt, bound)
            r = Switch_enum test of
                  1 => λ.
                    let a = sub2[int]
                        b = a(A,i,cnt)
                        c = sub2[int]
                        d = c(B,cnt,j)
                        e = b*d
                        f = sum+e
                        g = cnt+1
                        h = dot(g,f)
                    in h
                    end
                | 0 => λ.sum
        in r
        end

    Figure 3: Lmli-Bform before optimization

    fix dot =
      λcnt,sum.
        let test = plst_i(cnt,bound)
            r = Switch_enum test of
                  1 => λ.
                    let a = t1 + cnt
                        b = psub_ai(av,a)
                        c = columns * cnt
                        d = j+c
                        e = psub_ai(bv,d)
                        f = b*e
                        g = sum+f
                        h = 1+cnt
                        i = dot(h,g)
                    in i
                    end
                | 0 => λ.sum
                end
        in r
        end

    Figure 4: Lmli-Bform after optimization

    fix dot =
      λbound:INT,columns:INT,bv:TRACE,av:TRACE,t1:INT,
       j:INT,cnt:INT,sum:INT.
        let test:INT = pgtt_i(bound,cnt)
            r:INT = Switchint test of
                      1 => let a:INT = t1 + cnt
                               b:INT = psub_ai(av,a)
                               c:INT = columns * cnt
                               d:INT = j + c
                               e:INT = psub_ai(bv,d)
                               f:INT = b*e
                               g:INT = sum+f
                               h:INT = 1+cnt
                               i:INT = dot(bound,columns,bv,
                                           av,t1,j,h,g)
                           in i
                           end
                    | 0 => sum
        in r
        end : INT

    Figure 5: After conversion to Ubform

    dot(([bound(INT),columns(INT),bv(TRACE),
          av(TRACE),t1(INT),j(INT),cnt(INT),
          sum(INT)],[]))
    {
    L0:     pgt    bound(INT), cnt(INT), test(INT)
            bne    test(INT), L1
            mv     sum(INT), result(INT)
            br     L2
    L1:     addl   t1(INT), cnt(INT), a(INT)
    (*)     s4add  a(INT), av(TRACE), t2(LOCATIVE)
    (*)     ldl    b(INT), 0(t2(LOCATIVE))
            mull   columns(INT), cnt(INT), c(INT)
            addl   j(INT), c(INT), d(INT)
    (*)     s4add  d(INT), bv(TRACE), t3(LOCATIVE)
    (*)     ldl    e(INT), 0(t3(LOCATIVE))
            mull/v b(INT), e(INT), f(INT)
            addl/v sum(INT), f(INT), g(INT)
            addl/v cnt(INT), 1, h(INT)
            trapb
            mv     h(INT), cnt(INT)
            mv     g(INT), sum(INT)
            br     L0
    L2:     return retreg(LABEL)
    }

    Figure 6: After conversion to RTL

            .ent Lv2851_dot_205955
     # arguments   : [$bound,$0] [$columns,$1] [$bv,$2]
     #               [$av,$3] [$t1,$4] [$j,$5]
     #               [$cnt,$6] [$sum,$7]
     # results     : [$result,$0]
     # return addr : [$retreg,$26]
     # destroys    : $0 $1 $2 $3 $4 $5 $6 $7 $27
    Lv2851_dot_205955:
            .mask (1 << 26), -32
            .frame $sp, 32, $26
            .prologue 1
            ldgp   $gp, ($27)
            lda    $sp, -32($sp)
            stq    $26, ($sp)
            stq    $8, 8($sp)
            stq    $9, 16($sp)
            mov    $26, $27
    L1:
            cmplt  $6, $0, $8
            bne    $8, L2
            mov    $7, $1
            br     $31, L3
    L2:
            addl   $4, $6, $8
            s4addl $8, $3, $8
            ldl    $8, ($8)
            mull   $1, $6, $9
            addl   $5, $9, $9
            s4addl $9, $2, $9
            ldl    $9, ($9)
            mullv  $8, $9, $8
            addlv  $7, $8, $7
            addlv  $6, 1, $6
            trapb
            br     $31, L1
    L3:
            mov    $1, $0
            mov    $27, $26
            ldq    $8, 8($sp)
            ldq    $9, 16($sp)
            lda    $sp, 32($sp)
            ret    $31, ($26), 1
            .end Lv2851_dot_205955

    Figure 7: Actual DEC ALPHA assembly language

5 Performance

In this section, we compare the performance of programs compiled by TIL against programs compiled by the SML/NJ compiler. We measure execution time, heap allocation, physical memory requirements, executable size, and compile time. We also measure the effect of loop optimizations. Further performance analysis of TIL appears in Morrisett's [33] and Tarditi's theses [45].

5.1 Benchmarks

Table 1 describes the benchmark programs, which range in size from 62 lines to about 2000 lines of code. Some of these programs have been used previously for measuring ML performance [5, 16]. The benchmarks cover a range of application areas including scientific computing, list-processing, systems programming, and compilers.

    Program        Lines  Description
    -------------  -----  ------------------------------------------------------------
    Checksum         241  Checksum fragment from the Foxnet [7], doing 5000 checksums
                          on a 4096-byte array.
    FFT              246  Fast Fourier transform, multiplying polynomials up to
                          degree 65,536.
    Knuth-Bendix     618  An implementation of the Knuth-Bendix completion algorithm.
    Lexgen          1123  A lexical-analyzer generator [6], processing the lexical
                          description of Standard ML.
    Life             146  The game of Life implemented using lists [39].
    Matmult           62  Integer matrix multiply, on 200x200 integer arrays.
    PIA             2065  The Perspective Inversion Algorithm [47] deciding the
                          location of an object in a perspective video image.
    Simple           870  A spherical fluid-dynamics program [17], run for 4
                          iterations with grid size of 100.

    Table 1: Benchmark Programs

We compiled the programs as single closed modules. For Lexgen and Simple, which are standard benchmarks [5], we eliminated functors by hand because TIL does not yet support the full SML module language. Because whole programs were given to the compiler, we found that the optimizer naturally eliminated all polymorphic functions. Consequently, for this benchmark suite, there was no run-time cost to support intensional polymorphism.

We extended the built-in ML types with safe 2-dimensional arrays. The 2-d array operations do bounds checking on each dimension and then use unsafe 1-d array operations. Arrays are stored in column-major order.

5.2 Comparison against SML/NJ

We compared the performance of TIL against SML/NJ in several dimensions: execution time, total heap allocation, physical memory footprint, the size of the executable, and compilation time.

For TIL, we compiled programs with all optimizations enabled. For SML/NJ, we compiled programs using the default optimization settings. We used a recent internal release of SML/NJ (a variant of version 1.08), since it produces code that is about 35% faster than the current standard release (0.93) of SML/NJ [41].

TIL always prefixes a set of operations on to each module that it compiles, in order to facilitate optimization. This "inline" prelude contains 2-d array operations, commonly-used list functions, and so forth. To avoid handicapping SML/NJ, we created separate copies of the benchmark programs for SML/NJ, and placed equivalent "prelude" code at the beginning of each program by hand.

Since TIL creates stand-alone executables, we used the exportFn facility of SML/NJ to create stand-alone programs. The exportFn function of SML/NJ dumps part of the heap to disk and throws away the interactive system.

We measured execution time on DEC ALPHA AXP 3000/300LX workstations, running OSF/1, version 2.0, using the UNIX getrusage function. For SML/NJ, we started timing after the heap had been reloaded. For TIL, we measured the entire execution time of the process, including load time. We made 5 runs of each program on an unloaded workstation and chose the lowest execution time. Each workstation had 96 MBytes of physical memory, so paging was not a factor in the measurements.

We measured total heap allocation by instrumenting the TIL run-time system to count the bytes allocated. We used existing instrumentation in the SML/NJ run-time system.

    [Figure 8: TIL Execution Time Relative to SML/NJ. Bar chart; vertical axis from 25% to 125%; benchmarks lexgen, Cksum, FFT, KB, Life, Mmult, PIA, SIMPLE.]

Figure 10 presents the relative maximum amounts of physical memory used. On average, TIL programs use half the memory used by SML/NJ programs. We see that floating-
p oint programs use the least amount of memory relativeto We measured the maximum amountofphysical memory comparable SML/NJ programs. We sp eculate that this is during execution using getrusage.Weusedthesize pro- due to TIL's abilitytokeep oating values unboxed when gram to measure the size of executables for TIL. For SML/NJ, stored in arrays. we used the size program to measure the size of the run- TIL stand-alone programs are ab out half the size of stand- time system and then added the size of the heap created alone heaps and the runtime system of SML/NJ. The di er- by exportFn. Finally,we measured end-to-end compilation ence in size is mostly due to the di erent sizes of the runtime time, includin g time to assemble les pro duced byTIL. systems and standard libraries for the two compilers. (TIL's Figures 8 through 11 present the measurements (the raw runtime system is ab out 100K, while SML/NJ's runtime is numb ers are in App endix ??). For eachbenchmark, mea- ab out 425K.) The program sizes for TIL con rm that gener- surements for TIL were normalized to those for SML/NJ ating tables for nearly tag-free garbage collection consumes and then graphed. SML/NJ p erformance is the 100% mark a mo dest amount of space, and that the inlinin g strategy on all the graphs. used by TIL pro duces co de of reasonable size. Figure 8 presents relative running times. On average, Figure 11 compares compilation times for TIL and SML/NJ. programs compiled by TIL run 3.3 times faster than pro- SML/NJ do es much b etter than TIL when it comes to com- grams compiled by SML/NJ. In fact, all programs except pilation time, compiling ab out eight times faster. However, Knuth-Bendix and Life are substantially faster when com- wehaveyet to tune TIL for compilation sp eed. piled by TIL. We sp eculate that less of a sp eed-up is seen for Knuth-Bendix and Life b ecause they makeheavy use of list-pro cessing , which SML/NJ do es a go o d job of compiling. 
5.3 Lo op-Oriented Optimizations Figure 9 compares the relativeamounts of heap allo ca- tion. On average, the amount of data heap-allo cated bythe We also investigated the e ect of the lo op-oriented optimiza- TIL program is ab out 17% of the amount allo cated bythe tions (CSE, invariantremoval, hoisting, comparison elimi- SML/NJ program. This is not surprising, b ecause TIL uses nation, and redundant switch elimination ). For eachbench- astack while SML/NJ allo cates frames on the heap. mark, we compared p erformance with the lo op optimiza- 9 Exec time /Exec time opt noopt Heap Allo c /Heap Allo c opt noopt 125% 100% 100% 75% 75% 50% 50% 25% 25% Lexgen Cksum FFT KB Life Mmult PIA SIMPLE Lexgen Cksum FFT KB Life Mmult PIA SIMPLE Figure 9: TIL Heap Allo cation Relative to SML/NJ Figure 12: E ects of Lo op Optimizations tions against p erformance without the lo op optimization s. Figure 12 presents the ratios of execution time with the lo op optimizations to execution time without the lo op optimiza- tions, and similar ratios for total heap allo cation. The lo op optimizations reduce execution time by 0 to 83%, with a 100% median reduction of 39%. The e ect on heap allo cation ranges from an increase of 20% to a decrease of 96.5%, with 75% a median decrease of 10%. For matmult, the matrix multiplicati on function is small enough that the optimizer inlines it, making the array di- 50% mensions known. If the array dimensions are held unknown, then the lo op optimization s sp eed up matmult by a factor of 25% 2.5. 6 Related Work Lexgen Cksum FFT KB Life Mmult PIA SIMPLE Morrison et al. used an \ad-ho c" approach to implement Figure 10: TIL Physical Memory Used Relative to SML/NJ p olymorphism in their implementation of Napier '88 [35]. In particular, they passed representations of typ es to p olymor- phic routines at run-time to determine b ehavior. However, to our knowledge, Napier '88 did not use typ es to implement tag-free garbage collection. 
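To make the idea of passing type representations at run time concrete, the following is a minimal sketch in SML. It is not taken from Napier '88 or from TIL: the datatypes tyrep and value, the functions sizeOf and equal, and the byte sizes 4 and 8 are all hypothetical, and the universal value type exists only so that the sketch type-checks in plain SML.

```sml
(* Hypothetical run-time representation of a type (an assumption of this
   sketch, not the representation used by Napier '88 or TIL). *)
datatype tyrep =
    INT
  | REAL
  | PAIR of tyrep * tyrep

(* Hypothetical universal value type, needed only so that the sketch
   type-checks in plain SML. *)
datatype value =
    I of int
  | R of real
  | P of value * value

(* Size in bytes of a flattened value, chosen by examining the type
   representation. The sizes 4 and 8 are assumed for illustration. *)
fun sizeOf INT = 4
  | sizeOf REAL = 8
  | sizeOf (PAIR (a, b)) = sizeOf a + sizeOf b

(* Equality whose behavior is determined by the type representation
   passed at run time; mismatched shapes compare unequal. *)
fun equal INT (I x, I y) = x = y
  | equal REAL (R x, R y) = Real.== (x, y)
  | equal (PAIR (a, b)) (P (x1, x2), P (y1, y2)) =
      equal a (x1, y1) andalso equal b (x2, y2)
  | equal _ _ = false

(* Usage: the caller passes the type representation explicitly. *)
val pairTy = PAIR (INT, REAL)
val sz = sizeOf pairTy
val same = equal pairTy (P (I 1, R 2.0), P (I 1, R 2.0))
val differ = equal pairTy (P (I 1, R 2.0), P (I 2, R 2.0))
```

In this sketch the routine that receives pairTy decides how to compare or measure its argument only by inspecting that representation, which is the sense in which behavior is determined at run time.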
Also, there is no description of the internals of the Napier '88 compiler, nor is there an account of the performance of code generated by the compiler.

Figure 10: TIL Physical Memory Used Relative to SML/NJ

Figure 11: TIL Compilation Time Relative to SML/NJ

Peyton Jones and Launchbury suggested that types could be used to unbox values in a polymorphic language [26]. However, they only supported a limited set of "unboxed types" (ints and floats) and restricted these types from instantiating type variables. Later, Leroy suggested a general approach for unboxing values based on the ML type system [28]. Leroy's approach has been extended and implemented elsewhere [38, 24, 42], including the SML/NJ compiler. It does not support unboxed array components nor flattened, recursive datatypes. Tolmach [46] combined Leroy's approach with tag-free garbage collection. However, he used an ad hoc approach to propagate type information to the collector.

Other researchers have suggested that polymorphism should be eliminated entirely at compile time [9, 25, 21], in the style of C++ templates [44]. This prevents separate compilation of a polymorphic definition from its uses. In contrast, intensional polymorphism, and in particular the intermediate forms of TIL, support separate compilation of polymorphic definitions, though we have yet to take advantage of this.

Tag-free garbage collection was originally proposed for monomorphic languages like Pascal, but has been used elsewhere [12, 11, 48, 15]. Britton suggested associating type information with return addresses on the stack [12]. Appel suggested extending this technique to ML by using unification [4]. Goldberg and Gloger improved Appel's algorithm [20, 19]. None of the unification-based algorithms were implemented due to the complexity of the algorithms and the overhead of performing unification during garbage collection.

Aditya, Flood, and Hicks used type-passing to support fully tag-free garbage collection for Id [1]. Independently, Tolmach [46] implemented a type-passing garbage collection algorithm for ML. Our approach differs from others by using "nearly" tag-free collection. In particular, records and arrays on the heap are tagged. Another difference is that we calculate type environments eagerly, while the other implementations construct type environments lazily during garbage collection.

Loop-oriented optimizations are well-known for imperative languages [2]. However, few results are reported for Lisp, Scheme, and ML. Appel [5] and Serrano [40] report common-subexpression elimination optimizations similar to ours. Appel found that CSE was not useful in the SML/NJ compiler. Serrano restricted CSE to pure expressions, while our CSE handles expressions which may raise exceptions.

7 Conclusions and future work

Our results show that for core-SML programs compiled as a whole, intensional polymorphism can remove restrictions on data representation, yet cost literally nothing due to the effectiveness of optimization. They also show that loop optimizations can improve program performance significantly.

These results suggest that ML can be compiled as well as conventional languages such as Pascal. TIL produces code that is similar in many important respects to code produced by Pascal and C compilers. For example, most function calls are known, since few higher-order functions are left, integers are untagged, and most code is monomorphic.

There are numerous areas that we would like to investigate further. We would like to explore the effect of separate compilation. With separate compilation, polymorphic functions may be compiled separately from their uses, leading to some cost for intensional polymorphism. We would like to measure this cost and explore what kinds of optimizations can reduce it.

Another direction we would like to investigate is how this approach performs for larger programs. We would like to add support for more of the ML module system, since large ML programs make extensive use of the module system. We would also like to improve TIL's compile times, so that large programs can also be compiled as a whole.

Finally, we would like to continue improving the performance of ML programs. We would like to extend our register allocation strategy along the lines of Chow [13] or Steenkiste [43]. We would also like to investigate more loop optimizations, such as strength-reduction, induction-variable elimination, and loop unrolling. On a more speculative note, we would like to explore stack allocation of data structures.

References

[1] Shail Aditya, Christine Flood, and James Hicks. Garbage collection for strongly-typed languages using run-time type reconstruction. In LFP '94 [30], pages 12-23.

[2] Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley Publishing Company, 1986.

[3] Andrew Appel. A critique of Standard ML. Journal of Functional Programming, 3(4):391-429, October 1993.

[4] Andrew W. Appel. Runtime tags aren't necessary. Lisp and Symbolic Computation, (2):153-162, 1989.

[5] Andrew W. Appel. Compiling with Continuations. Cambridge University Press, 1992.

[6] Andrew W. Appel, James S. Mattson, and David Tarditi. A lexical analyzer generator for Standard ML. Distributed with Standard ML of New Jersey, 1989.

[7] Edoardo Biagioni, Robert Harper, Peter Lee, and Brian Milnes. Signatures for a network protocol stack: A systems application of Standard ML. In LFP '94 [30], pages 55-64.

[8] Lars Birkedal, Nick Rothwell, Mads Tofte, and David N. Turner. The ML Kit, Version 1. Technical Report 93/14, DIKU, 1993.

[9] Guy E. Blelloch. NESL: A nested data-parallel language (version 2.6). Technical Report CMU-CS-93-129, School of Computer Science, Carnegie Mellon University, April 1993.

[10] Hans-Juergen Boehm. Space-efficient conservative garbage collection. In PLDI '93 [36], pages 197-206.

[11] P. Branquart and J. Lewi. A scheme for storage allocation and garbage collection for Algol-68. In Algol-68 Implementation. North-Holland Publishing Company, Amsterdam, 1970.

[12] Dianne Ellen Britton. Heap storage management for the programming language Pascal. Master's thesis, University of Arizona, 1975.

[13] Fred C. Chow. Minimizing register usage penalty at procedure calls. In Proceedings of the ACM SIGPLAN '88 Conference on Programming Language Design and Implementation, pages 85-94, Atlanta, Georgia, June 1988. ACM.

[14] A. Demers, M. Weiser, B. Hayes, H. Boehm, D. Bobrow, and S. Shenker. Combining generational and conservative garbage collection: Framework and implementations. In Conference Record of the 17th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, San Francisco, California, January 1990. ACM.

[15] Amer Diwan, Eliot Moss, and Richard Hudson. Compiler support for garbage collection in a statically typed language. In Proceedings of the ACM SIGPLAN '92 Conference on Programming Language Design and Implementation, pages 273-282, San Francisco, CA, June 1992. ACM.

[16] Amer Diwan, David Tarditi, and Eliot Moss. Memory-System Performance of Programs with Intensive Heap Allocation. Transactions on Computer Systems, August 1995.

[17] K. Ekanadham and Arvind. SIMPLE: An exercise in future scientific programming. Technical Report Computation Structures Group Memo 273, MIT, Cambridge, MA, July 1987. Simultaneously published as IBM/T. J. Watson Research Center Research Report 12686, Yorktown Heights, NY.

[18] Cormac Flanagan, Amr Sabry, Bruce F. Duba, and Matthias Felleisen. The essence of compiling with continuations. In PLDI '93 [36], pages 237-247.

[19] Benjamin Goldberg. Tag-free garbage collection in strongly typed programming languages. In Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation, pages 165-176, Toronto, Canada, June 1991. ACM.

[20] Benjamin Goldberg and Michael Gloger. Polymorphic type reconstruction for garbage collection without tags. In Proceedings of the 1992 ACM Conference on Lisp and Functional Programming, pages 53-65, San Francisco, California, June 1992. ACM.

[21] Cordelia Hall, Simon L. Peyton Jones, and Patrick M. Sansom. Unboxing using specialisation. In D. Turner, K. Hammond, and P.M. Sansom, editors, Functional Programming, 1994. Springer-Verlag, 1995.

[22] Robert Harper and Mark Lillibridge. A type-theoretic approach to higher-order modules with sharing. In POPL '94 [37], pages 123-137.

[23] Robert Harper and Greg Morrisett. Compiling polymorphism using intensional type analysis. In Conference Record of the 22nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 130-141, San Francisco, California, January 1995. ACM.

[24] Fritz Henglein and Jesper Jørgensen. Formally optimal boxing. In POPL '94 [37], pages 213-226.

[25] M.P. Jones. Partial evaluation for dictionary-free overloading. Research Report YALEU/DCS/RR-959, Yale University, New Haven, Connecticut, USA, April 1993.

[26] Simon Peyton Jones and John Launchbury. Unboxed values as first-class citizens. In Proceedings of the Conference on Functional Programming and Computer Architecture, volume 523 of Lecture Notes in Computer Science, pages 636-666. ACM, Springer-Verlag, 1991.

[27] David Kranz, Richard Kelsey, Jonathan Rees, Paul Hudak, James Philbin, and Norman Adams. ORBIT: An Optimizing Compiler for Scheme. In Proceedings of the SIGPLAN '86 Symposium on Compiler Construction, pages 219-233, Palo Alto, California, June 1986. ACM.

[28] Xavier Leroy. Unboxed objects and polymorphic typing. In Conference Record of the 19th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 177-188, Albuquerque, NM, January 1992. ACM.

[29] Xavier Leroy. Manifest types, modules, and separate compilation. In POPL '94 [37], pages 109-122.

[30] Proceedings of the 1994 ACM Conference on Lisp and Functional Programming, Orlando, Florida, June 1994. ACM.

[31] Robin Milner, Mads Tofte, and Robert Harper. The Definition of Standard ML. MIT Press, 1990.

[32] Y. Minamide, G. Morrisett, and R. Harper. Typed closure conversion. In Conference Record of the 23rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, St. Petersburg, Florida, January 1996. ACM.

[33] Greg Morrisett. Compiling with Types. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, December 1995. Published as Technical Report CMU-CS-95-226.

[34] Greg Morrisett, Matthias Felleisen, and Robert Harper. Abstract models of memory management. In ACM Conference on Functional Programming and Computer Architecture, pages 66-77, La Jolla, June 1995.

[35] R. Morrison, A. Dearle, R. C. H. Connor, and A. L. Brown. An ad hoc approach to the implementation of polymorphism. ACM Transactions on Programming Languages and Systems, 13(3):342-371, July 1991.

[36] Proceedings of the ACM SIGPLAN '93 Conference on Programming Language Design and Implementation, Albuquerque, New Mexico, June 1993. ACM.

[37] Conference Record of the 21st Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Portland, Oregon, January 1994. ACM.

[38] Eigil Rosager Poulsen. Representation analysis for efficient implementation of polymorphism. Technical report, Department of Computer Science (DIKU), University of Copenhagen, April 1993. Master Dissertation.

[39] Chris Reade. Elements of Functional Programming. Addison-Wesley, Reading, Massachusetts, 1989.

[40] Manuel Serrano and Pierre Weis. 1+1 = 1: an optimizing CAML compiler. Technical Report 2264, INRIA, June 1994.

[41] Zhong Shao. Compiling Standard ML for Efficient Execution on Modern Machines. PhD thesis, Princeton University, Princeton, New Jersey, November 1994.

[42] Zhong Shao and Andrew W. Appel. A type-based compiler for Standard ML. In Proceedings of the ACM SIGPLAN '95 Conference on Programming Language Design and Implementation, pages 116-129, La Jolla, California, June 1995. ACM.

[43] Peter Steenkiste. Advanced register allocation. In Peter Lee, editor, Topics in Advanced Language Implementation. MIT Press, 1990.

[44] Bjarne Stroustrup. The C++ Programming Language, 2nd Edition. Addison-Wesley, 1991.

[45] David R. Tarditi. Optimizing ML. PhD thesis, School of Computer Science, Carnegie Mellon University, 1996. Forthcoming.

[46] Andrew Tolmach. Tag-free garbage collection using explicit type parameters. In LFP '94 [30], pages 1-11.

[47] Kevin G. Waugh, Patrick McAndrew, and Greg Michaelson. Parallel implementations from function prototypes: a case study. Technical Report Computer Science 90/4, Heriot-Watt University, Edinburgh, August 1990.

[48] P.L. Wodon. Methods of garbage collection for Algol-68. In Algol-68 Implementation. North-Holland Publishing Company, Amsterdam, 1970.