
TIL: A Type-Directed Optimizing Compiler for ML

D. Tarditi, G. Morrisett, P. Cheng, . Stone, R. Harper, and P. Lee

School of Computer Science
Carnegie Mellon University
5000 Forbes Avenue
Pittsburgh, PA 15213-3891

We are investigating a new approach to compiling Standard ML (SML) based on four key technologies: intensional polymorphism [23], nearly tag-free garbage collection [12, 46, 34], conventional functional language optimization, and loop optimization. To explore the practicality of our approach, we have constructed a compiler for SML called TIL, and are thus far encouraged by the results: On DEC ALPHA workstations, programs compiled by TIL are roughly three times faster, do one-fifth the total heap allocation, and use one-half the physical memory of programs compiled by SML of New Jersey (SML/NJ). However, our results are still preliminary: we have not yet investigated how to improve compile time, and TIL takes about eight times longer to compile programs than SML/NJ. Also, we have not yet implemented the full module system of SML, although we do provide support for structures and separate compilation. Finally, we expect the performance of programs compiled by TIL to improve significantly as we tune the compiler and implement more optimizations.

This research was sponsored in part by the Advanced Research Projects Agency CSTO under the title "The Fox Project: Advanced Languages for Systems Software", ARPA Order No. C533, issued by ESC/ENS under Contract No. F19628-95-C-0050, and in part by the National Science Foundation under Grant No. CCR-9502674, and in part by the Isaac Newton Institute for Mathematical Sciences, Cambridge, England. David Tarditi was also partly supported by an AT&T Bell Labs PhD Scholarship. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing official policies, either expressed or implied, of the Advanced Research Projects Agency, the U.S. Government, the National Science Foundation or AT&T.

To appear in the ACM SIGPLAN '96 Conference on Programming Language Design and Implementation.

1 Introduction

Two key issues in the compilation of advanced languages such as SML are the presence of garbage collection and type variables. Most compilers use a universal representation for values of unknown or variable type. In particular, values are forced to fit into a tagged machine word; values larger than a machine word are represented as pointers to tagged, heap-allocated objects. This approach supports fast garbage collection and efficient polymorphic functions, but can result in inefficient code when types are known at compile time. Even with recent advances in SML compilation, such as Leroy's representation analysis [28], values must be placed in a universal representation before being stored in updateable data structures (e.g., arrays) or recursive data structures (e.g., lists).

Intensional polymorphism and tag-free garbage collection eliminate the need to use a universal representation when compiling polymorphic languages. TIL uses these technologies to represent many data values "naturally". For example, TIL provides tag-free, unallocated, word-sized integers; aligned, unboxed floating-point arrays; and unallocated multi-argument functions. These natural representations and calling conventions not only improve the performance of SML programs, but also allow them to interoperate with legacy code written in languages such as C and Fortran.

When types are unknown at compile time, TIL may produce machine code which is slower and bigger than conventional approaches. This is because types must be constructed and passed to polymorphic functions, and polymorphic functions must examine the types at run time to determine the appropriate code paths. However, when types are known at compile time, no overhead is incurred to support polymorphism or garbage collection.

Because these technologies make polymorphic functions slower, it becomes important to eliminate as many polymorphic functions at compile time as possible. Inlining and uncurrying are well-known techniques for eliminating polymorphic and higher-order functions. We have found that for the benchmarks used here, these techniques eliminate all polymorphic functions and all but a few higher-order functions when programs are compiled as a whole.

We have also found that applying traditional loop optimizations, such as common sub-expression elimination and invariant removal, to recursive functions is important. In fact, these optimizations reduce execution time by a median of 39%.

An important property of TIL is that all optimizations and the key transformations are performed on typed intermediate languages (hence the name TIL). Maintaining correct type information throughout optimization is necessary to support both intensional polymorphism and garbage collection, both of which require type information at run time. By using strongly-typed intermediate languages, we ensure that type information is maintained in a principled fashion, instead of relying upon ad hoc invariants. In fact, using the intermediate forms of TIL, an "untrusted" compiler can produce fully optimized intermediate code, and a client can automatically verify the type integrity of the code. We have found that this ability has a strong engineering benefit: type-checking the output of each optimization or transformation helps us identify and eliminate bugs in the compiler.
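The engineering benefit of re-checking types after every pass can be illustrated in miniature. The sketch below is a hypothetical toy, not TIL's Lmli or Bform: a two-type expression language, a checker (typeOf), one correct pass (constFold), one deliberately wrong pass (badFold), and a driver (runChecked) that re-type-checks the output of each pass. All of these names are our own.

```sml
(* Hypothetical toy typed IL; not TIL's Lmli or Bform. *)
datatype ty = IntTy | BoolTy

datatype exp =
    IntLit of int
  | BoolLit of bool
  | Var of string
  | Add of exp * exp
  | Less of exp * exp
  | If of exp * exp * exp
  | Let of string * exp * exp

exception TypeError of string

(* typeOf env e: the type of e under env, or raise TypeError. *)
fun typeOf env (IntLit _) = IntTy
  | typeOf env (BoolLit _) = BoolTy
  | typeOf env (Var x) =
      (case List.find (fn (y, _) => y = x) env of
           SOME (_, t) => t
         | NONE => raise TypeError ("unbound variable " ^ x))
  | typeOf env (Add (a, b)) = (expect env IntTy a; expect env IntTy b; IntTy)
  | typeOf env (Less (a, b)) = (expect env IntTy a; expect env IntTy b; BoolTy)
  | typeOf env (If (c, t, f)) =
      let
        val () = expect env BoolTy c
        val tt = typeOf env t
      in
        expect env tt f; tt
      end
  | typeOf env (Let (x, e1, e2)) = typeOf ((x, typeOf env e1) :: env) e2

(* expect env t e: check that e has type t under env. *)
and expect env t e =
      if typeOf env e = t then () else raise TypeError "type mismatch"

(* A simplified constant-folding pass. *)
fun constFold (Add (a, b)) =
      (case (constFold a, constFold b) of
           (IntLit m, IntLit n) => IntLit (m + n)
         | (a', b') => Add (a', b'))
  | constFold (Less (a, b)) =
      (case (constFold a, constFold b) of
           (IntLit m, IntLit n) => BoolLit (m < n)
         | (a', b') => Less (a', b'))
  | constFold (If (c, t, f)) =
      (case constFold c of
           BoolLit true => constFold t
         | BoolLit false => constFold f
         | c' => If (c', constFold t, constFold f))
  | constFold (Let (x, e1, e2)) = Let (x, constFold e1, constFold e2)
  | constFold e = e

(* A deliberately wrong "optimization": folds a comparison to an integer. *)
fun badFold (Less (IntLit m, IntLit n)) = IntLit (if m < n then 1 else 0)
  | badFold e = e

(* Apply each pass in turn and re-check its output; a pass that produces an
   ill-typed term, or changes the program's type, is reported at once. *)
fun runChecked passes e =
  let
    val t = typeOf [] e
    fun step (pass, e) =
      let val e' = pass e
      in
        if typeOf [] e' = t then e'
        else raise TypeError "pass changed the program's type"
      end
  in
    foldl step e passes
  end

val prog = Let ("x", Add (IntLit 1, IntLit 2),
                If (Less (Var "x", IntLit 5), Var "x", IntLit 0))
val folded = runChecked [constFold] prog
val caught =
  (ignore (runChecked [badFold] (Less (IntLit 1, IntLit 2))); false)
  handle TypeError _ => true
```

Here runChecked [constFold] succeeds and leaves the program's type (int) unchanged, while running badFold on Less (IntLit 1, IntLit 2) raises TypeError, because the rewritten term has type int rather than bool. A checker for TIL's languages must also handle type abstraction and typecase, which this toy omits.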

In the remainder of this paper, we describe the technologies used by TIL in detail, give an overview of the structure of TIL, present a detailed example showing how TIL compiles ML code, and give performance results of code produced by TIL.

2 Overview of the Technologies

This section contains a high-level overview of the technologies we use in TIL.

2.1 Intensional Polymorphism

Intensional polymorphism [23] eliminates restrictions on data representations due to polymorphism, separate compilation, abstract datatypes, and garbage collection. It also supports efficient calling conventions (multiple arguments passed in registers) and tag-free polymorphic, structural equality.

With intensional polymorphism, types are constructed and passed as values at run time to polymorphic functions, and these functions can branch based on the types. For example, when extracting a value from an array, TIL uses a typecase expression to determine the type of the array and to select the appropriate specialized subscript operation:

    fun sub[α](x: α array, i: int) =
        typecase α of
            int    => intsub(x, i)
          | float  => floatsub(x, i)
          | ptr(τ) => ptrsub(x, i)

If the type of the array can be determined at compile time, then an optimizer can eliminate the typecase:

    sub[float](a, 5)  ↪  floatsub(a, 5)

However, intensional polymorphism comes with two costs. First, we must construct and pass representations of types to polymorphic functions at run time. Furthermore, we must compile polymorphic functions to support any possible representation and insert typecase constructs to select the appropriate code paths. Hence, the code we generate for polymorphic functions is both bigger and slower, and minimizing polymorphism becomes quite important.

Second, in order to use type information at run time for both intensional polymorphism and tag-free garbage collection, we must propagate types through each stage of compilation. To address this second problem, almost all compilation stages, including optimization and closure conversion, are expressed as type-directed, type-preserving translations to strongly-typed intermediate languages.

The key difficulty with using typed intermediate languages is formulating a type system that is expressive enough to statically type check terms that branch on types at run time, such as sub. The type system used in TIL is based on the approach suggested by Harper and Morrisett [23, 33]. Types themselves are represented as expressions in a simply-typed λ-calculus extended with an inductively generated base kind (the monotypes), and a corresponding induction elimination form. The induction elimination form is essentially a "Typecase" at the type level; this allows us to write type expressions that track the run-time control flow of term-level typecase expressions. Nevertheless, the type system used by TIL remains both sound and decidable. This implies that at any stage during optimization, we can automatically verify the type integrity of the code.

2.2 Conventional and Loop-Oriented Optimizations

Program optimization is crucial to reducing the cost of intensional polymorphism, improving loops and recursive functions, and eliminating higher-order and polymorphic functions. TIL employs optimizations found in conventional functional language compilers, including inlining, uncurrying, dead-code elimination, and constant folding. In addition, TIL does a set of generalized "loop-oriented" optimizations to improve recursive functions. These optimizations include common-subexpression elimination, invariant removal, and array-bound check removal. In spite of the large number of different optimizations, each optimization produces type-correct code.

TIL applies optimizations across entire compilation units. This makes it more likely that inlining and uncurrying will eliminate higher-order functions, which are likely to interfere with the loop-oriented optimizations. Since the optimizations are applied to entire compilation units (which may be whole programs), we paid close attention to the algorithmic efficiency of individual optimization passes. Most of the passes have an O(N log N) worst-case asymptotic complexity (excluding checking types for equality), where N is program size.

2.3 Nearly Tag-Free Garbage Collection

Nearly tag-free garbage collection uses type information to eliminate data representation restrictions due to garbage collection. The basic idea is to record enough representation information at compile time so that, at any point where a garbage collection can occur, it is possible to determine whether or not values are pointers and hence must be traced by the garbage collector. Recording the information at compile time makes it possible for code to use untagged representations. Unlike so-called conservative collectors (see for example [10, 14]), the information recorded by TIL is sufficient to collect all unreachable objects.

Collection is "nearly" tag-free because tags are placed only on heap-allocated data structures (records and arrays); values in registers, on the stack, and within data structures remain tagless. We construct the tags for monomorphic records and arrays at compile time. For records or arrays with unknown component types, we may need to construct tags partially at run time. As with other polymorphic operations, we use intensional polymorphism to construct these tags.

Registers and components of stack frames are not tagged. Instead, we generate tables at compile time that describe the layout of registers and stack frames. We associate these tables with the addresses of call sites within functions at compile time. When garbage collection is invoked, the collector scans the stack, using the return address of each frame as an index into the table. The collector looks up the layout of each stack frame to determine which stack locations to trace. We record additional liveness information for each variable to avoid tracing pointers that are no longer needed.

This approach is well understood for monomorphic languages requiring garbage collection [12]. Following Tolmach [46], we extended it to a polymorphic language as follows: when a variable whose type is unknown is saved in a stack frame, the type of the variable is also saved in the stack frame. However, unlike Tolmach, we evaluate substitutions of ground types for type variables eagerly instead of lazily. This is due in part to technical reasons (see [33, Chapter 7]), and in part to avoid a class of space leaks that might result with lazy substitution.

3 Compilation Phases of TIL

Figure 1 shows the various compilation phases of TIL. The phases through and including closure conversion use a typed intermediate language. The phase after closure conversion uses an untyped language where variables are annotated with garbage collection information. The low-level phases of the compiler use languages where registers are annotated with garbage collection information. The following sections describe the phases of TIL and the intermediate languages they use in more detail.

    Front end: parse, elaborate, eliminate pattern matching
    Conversion to Lmli: introduce intensional polymorphism, choose data representations
    Type-directed optimization: flatten args, flatten constructors, box floats
    Conventional and loop optimizations: do inlining, uncurrying, constant-folding, CSE, invariant removal, etc.
    Closure conversion: close functions, choose environment representations
      (the phases above operate on typed intermediate languages)
    Conversion to untyped language with gc info: calculate gc info for variables, choose representation for types
    Conversion to RTL: choose machine representation for variables, introduce tagging for records and arrays
    Registers annotated with gc info: do graph-coloring register allocation, construct tables for gc
    Assembly

Figure 1: Phases of the TIL compiler

3.1 Front-end

The first phase of TIL uses the front-end of the ML Kit compiler [8] to parse and elaborate (type check) SML source code. The Kit produces annotated abstract syntax for all of SML and then compiles a subset of this abstract syntax to an explicitly-typed core language called Lambda. The compilation to Lambda eliminates pattern matching and various derived forms.

We extended Lambda to support signatures, structures (modules), and separate compilation. Each source module is compiled to a Lambda module with an explicit list of imported modules and their signatures. Imported signatures may include transparent definitions of types defined in other modules; hence TIL supports a limited form of translucent [22] or manifest types [29]. Currently, the mapping to Lambda does not handle signatures, nested structures, or functors. In principle, however, all of these constructs are supported by TIL's intermediate languages.

3.2 Lmli and Type-Directed Optimizations

Lmli, which stands for λ^ML_i [23], is an intensionally polymorphic language that provides explicit support for constructing, passing, and analyzing types at run time. We use these constructs in the translation of Lambda to Lmli to provide efficient data representations for user-defined datatypes, multi-argument functions, tag-free polymorphic equality, and specialized arrays.

After the conversion from Lambda to Lmli, TIL performs a series of type-directed optimizations. SML provides only single-argument functions; multiple arguments are passed in a record. The first optimization, argument flattening, translates each function which takes a record as an argument to a function which takes the components of the record as multiple arguments. These arguments are passed in registers, avoiding allocation to create the record and memory operations to access record components. If a function takes an argument of variable type α, then we use typecase to determine the proper calling convention, according to the instantiation of α at run time.

As with functions, datatype constructors in SML take a single argument. For example, the cons data constructor (::) for an α list takes a single record, consisting of an α value and an α list value. Naively, such a constructor is represented as a pair consisting of a tag (e.g., cons) and a pointer to the record containing the value and the list value. The tag is a small integer value used to distinguish among the constructors of a datatype (e.g., nil vs. ::). Constructor flattening rewrites all constructors that take records as arguments so that the components of the records are flattened. In addition, constructor flattening eliminates tag components when they are unneeded. For example, cons applied to (hd,tl) is simply represented as a pointer to the pair (hd,tl), since such a pointer can always be distinguished from nil. If the constructor takes an argument of unknown type, then we use typecase to determine the proper representation, according to the instantiation of α at run time.

Because lists are used often in SML, the SML/NJ compiler also flattens cons cells (and other constructors). However, in violation of the SML Definition [31], SML/NJ prevents programmers from abstracting the type of these constructors, in order to prevent representation mismatches between definitions of abstract datatypes and their uses [3]. In contrast, TIL supports fully abstract datatype components, but uses intensional polymorphism to determine representations of abstract datatypes, potentially at run time.

In addition to specializing calling conventions and datatypes, the conversion from Lambda to Lmli makes polymorphic equality explicit as a term in the language. Also, arrays are specialized into one of three cases: int arrays, float arrays, and pointer arrays. Intensional polymorphism is used to select the appropriate creation, subscript, and update operations for polymorphic arrays.

Finally, TIL boxes all floating-point values, except for values stored in floating-point arrays. We chose to box floats to make record operations faster, since typical SML code manipulates many records but few floats. The issue is that floating-point values are 64 bits, while other scalars and pointers are 32 bits. If floats were unboxed, then record offset calculations could not always be done at compile time. Fortunately, the optimizer later eliminates unnecessary box/unbox operations during the constant-folding phase, so straight-line floating-point code still runs fast.

In all, the combination of type-directed optimizations reduces running times by roughly 40% and allocation by 50% [33, Chapter 8]. However, much of this improvement can be realized by other techniques; for example, SML/NJ uses Leroy's unboxing technique to achieve comparable improvements for calling conventions [42]. The advantage of our approach is that we use a single mechanism (intensional polymorphism) to specialize calling conventions, flatten constructors, unbox floating-point arrays, and eliminate tags for both polymorphic equality and garbage collection.

3.3 Optimizations

TIL employs an extensive set of optimizations. The optimizations include most of those typically done by compilers for functional languages. They also include loop-oriented optimizations, such as invariant removal, applied to recursive functions.

TIL first translates Lmli to a subset of Lmli called Bform. Bform, based on A-Normal Form [18], is a more regular intermediate language than Lmli that facilitates optimization. The translation from Lmli names all intermediate computations and binds them to variables by a let-construct. It also names all potentially heap-allocated values, including strings, records and functions. Finally, it allows nested

let expressions only within switches (branch expressions). Hence, the translation from Lmli to Bform linearizes and names nested computations and values.

After translation to Bform, TIL performs the following conventional transformations:

• alpha-conversion: Bound variables are uniquely renamed.

• dead-code elimination: Unreferenced, pure expressions and functions are eliminated.

• uncurrying: Curried functions are transformed to multi-argument functions, when possible.

• constant folding: Arithmetic operations, switches, and typecases on constant values are reduced, as well as projections from known records.

• sinking: Pure expressions used in only one branch of a switch are pushed into that branch. However, such expressions are not pushed into function definitions.

• inlining: Non-escaping functions that are called only once are always inlined. Small, non-recursive functions are inlined in a bottom-up pass. Recursive functions are never (directly) inlined.

• inlining switch continuations: The continuation of a switch is inlined when all but one branch raises an exception. For example, the expression

      let x = if y then e2 else raise e3
      in e4
      end

  is transformed to

      if y then let x = e2 in e4 end else raise e3.

  This makes expressions in e2 available within e4 for optimizations like common sub-expression elimination.

• minimizing fix: Mutually-recursive functions are broken into sets of strongly connected components. This improves inlining and dead-code elimination, by separating non-recursive and recursive functions.

In addition to these standard functional language transformations, TIL also applies loop-oriented optimizations to recursive functions:

• common subexpression elimination (CSE): Given an expression

      let x = e1
      in e2
      end

  if e1 is pure or the only effect it may have is to raise an exception, then all occurrences of e1 in e2 are replaced with x. The only expressions that are excluded from CSE are side-effecting expressions and function calls.

• eliminating redundant switches: Given an expression

      let x = if z then
                let y = if z then e1 else e2
                in ...

  the nested if statement is replaced by e1, since z is always true at that point.

• invariant removal: Using the […], we calculate the nesting depth of each function. (Nesting depth is analogous to loop-nesting depth in languages like C.) TIL assigns a let-bound variable and the expression it binds a nesting depth equal to that of the nearest enclosing function. For every pure expression e, if all free variables of e have a nesting depth less than that of e, TIL moves the definition of e right after the definition of the free variable with the highest lexical nesting depth.

• hoisting: All constant expressions are hoisted to the top of the program. An expression is a constant expression if it uses only constants or variables bound to constant expressions.

• eliminating redundant comparisons: A set of simple arithmetic relations of the form x < y is propagated top-down through the program. A "rule-of-signs" abstract interpretation is used to determine signs of variables. This information is used to eliminate array-bounds checks and other tests.

TIL applies the optimizations as follows: first, it performs a round of reduction optimizations, including dead-code elimination, constant folding, inlining functions called once, CSE, eliminating redundant switches, and invariant removal. These optimizations do not increase program size and should result in faster code. It iterates these optimizations until no further reductions occur. Then it performs switch-continuation inlining, sinking, uncurrying, comparison elimination, fix minimizing, and inlining. The entire process, starting with the reduction optimizations, is iterated two or more times.

3.4 Closure conversion

TIL uses a type-directed, abstract closure conversion in the style suggested by Minamide, Morrisett, and Harper [32] to convert Lmli-Bform programs to Lmli-Closure programs. Lmli-Closure is an extension of Lmli-Bform that provides constructs for explicitly constructing closures and their environments.

For each escaping Bform function, TIL generates a closed piece of code, a type environment, and a value environment. The code takes the free type variables and free value variables of the original function as extra arguments. The types and values corresponding to these free variables are placed in records. These records are paired with the code to form an abstract closure. TIL uses a flat environment representation for type and value environments [5].

For known functions, TIL generates closed code but avoids creating environments or a closure. Following Kranz [27], we modify the call sites of known functions to pass free variables as additional arguments.

TIL closes over only variables which are function arguments or are bound within functions. The locations of other "top-level" variables are resolved at compile time through traditional linking, so their values do not need to be stored in a closure.
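The difference between escaping and known functions can be illustrated by hand in SML. This sketch is ours, not TIL output: mkAffine, affineEnv, affineCode, mkAffineClosure, applyClosure, and affineKnown are hypothetical names. Because SML has no existential types, the environment type is visible in the closure's type here, whereas TIL's abstract closures hide it; there is also no type environment, since nothing in the example is polymorphic.

```sml
(* Source function: the returned function has free variables a and b. *)
fun mkAffine (a : int, b : int) = fn (x : int) => a * x + b

(* Closure-converted form. The code is closed: it reads its free variables
   from an explicit, flat environment record passed as an extra argument. *)
type affineEnv = {a : int, b : int}

fun affineCode (env : affineEnv, x : int) = #a env * x + #b env

(* Building the closure pairs the closed code with its environment. *)
fun mkAffineClosure (a : int, b : int) : (affineEnv * int -> int) * affineEnv =
  (affineCode, {a = a, b = b})

(* Calling an escaping closure: project the code, pass it the environment. *)
fun applyClosure ((code, env), x) = code (env, x)

(* Known function (every call site visible): no environment or closure is
   built; the free variables are passed as additional arguments instead. *)
fun affineKnown (a : int, b : int, x : int) = a * x + b

val c = mkAffineClosure (3, 4)
val ys = List.map (fn x => applyClosure (c, x)) [0, 1, 2]
```

All three forms compute the same values; only where the free variables a and b live differs: in the source closure, in an explicit record, or in the argument list.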

3.5 Conversion to an untyped language

To simplify the conversion to low-level assembly code, TIL translates Lmli-Closure programs to an untyped language called Ubform. Ubform is a much simpler language than Lmli, since similar type-level and term-level constructs are collapsed to the same term-level constructor. For example, in the translation from Lmli-Closure to Ubform, TIL replaces typecase with a conventional switch expression. This simplifies generation of low-level code, since there are many fewer cases.

TIL annotates variables with representation information that tells the garbage collector what kinds of values variables must contain (e.g., pointers, integers, floats, or pointers to code). The representation of a variable x may be unknown at compile time, in which case the representation information is the name of the variable y that will contain the type of x at run time.

3.6 Conversion to RTL

Next, TIL converts Ubform programs to RTL, a register-transfer language similar to ALPHA or other RISC-style assembly languages. RTL provides an infinite number of pseudo-registers, each of which is annotated with representation information. Representation information is extended to include locatives, which are pointers into the middle of objects. Pseudo-registers containing locatives are never live across a point where garbage collection can occur. RTL also provides heavy-weight function call and return mechanisms, and a form of interprocedural goto for implementing exceptions.

The conversion of Ubform to RTL decides whether Ubform variables will be represented as constants, labels, or pseudo-registers. It also eliminates exceptions, inserts tagging operations for records and arrays, and inserts garbage collection checks.

3.7 Register allocation and assembly

Before doing register allocation, TIL converts RTL programs to ALPHA with extensions similar to those for RTL. Then TIL uses conventional graph-coloring register allocation to allocate physical registers for the pseudo-registers. It also generates tables describing layout and garbage collection information for each stack frame, as described in Section 2.3. Finally, TIL generates actual ALPHA assembly language and invokes the system assembler, which does […] and creates a standard object file.

4 An example

This section shows an ML function as it passes through the various stages of TIL. The following SML code defines a dot product function that is the inner loop of the integer matrix multiply benchmark:

    val sub2 : 'a array2 * int * int -> 'a

    fun dot(cnt,sum) =
        if cnt < bound then
          let val sum' = sum + sub2(A,i,cnt) * sub2(B,cnt,j)
          in dot(cnt+1,sum')
          end
        else sum

The function sub2 is a built-in 2-d array subscript function which the front end expands to

    fun sub2 ({columns,rows,v}, s:int, t:int) =
        if s < 0 orelse s >= rows orelse t < 0 orelse t >= columns
        then raise Subscript
        else unsafe_sub1(v, s * columns + t)

Figures 2 through 7 show the actual intermediate code created as dot and sub2 pass through the various stages of TIL. For readability, we have renamed variables, erased type information, and performed some minor optimizations, such as eliminating selections of fields from known records.

Figure 2 shows the functions after they have been converted to Lmli. The sub2 function takes a type as an argument. A function parameterized by a type is written as Λt., while a function parameterized by a value is written as λi. In the dot function, the sub2 function is first applied to a type and then applied to its actual values. Each function takes only one argument, often a record, from which fields are selected. The quality of code at this level is quite poor: there are eight function applications, four record constructions, and numerous checks for array bounds.

Figure 3 shows the Lmli fragment after it has been converted to Lmli-Bform. Functions have been transformed to take multiple arguments instead of records, and every intermediate computation is named.

Figure 4 shows the Lmli-Bform fragment after it has been optimized. All the function applications in the body of the loop have been eliminated. psub_ai(av,a) is an application of the (unsafe) integer array subscript primitive. All of the comparisons for array bounds checking have been safely eliminated, and the body of the loop consists of 9 expressions. This loop could be improved even further; we have yet to implement any form of […] and induction variable elimination.

Figure 5 shows the Lmli-Bform fragment after it has been converted to Ubform. Each variable is now annotated with representation information, to be used by the garbage collector. INT denotes integers and TRACE denotes pointers to tagged objects. The function is now closed, since it was closure converted before converting to Ubform.

Figure 6 shows the Ubform fragment after it has been converted to RTL. Every pseudo-register is now annotated with precise representation information for the collector. The representation information has been extended to include LOCATIVE, which denotes pointers into the middle of tagged objects. Locatives cannot be live across garbage-collection points. The (*) indicates points where the psub_ai primitive has been expanded to two RTL instructions. This indicates that induction-variable elimination would also be profitable at the RTL level. The return instruction's operand is a pseudo-register containing the return address.

Figure 7 shows the actual DEC ALPHA assembly language generated for the dot function. The code between L1 and L3 corresponds to the RTL code. The other code is prologue and epilogue code for entering and exiting the function. Note that no tagging operations occur anywhere in this function.

5 Performance

In this section, we compare the performance of programs compiled by TIL against programs compiled by the SML/NJ

    sub2 =
      let fix f = Λty.
            let fix g = λarg.
                  let a = (#0 arg)
                      s = (#1 arg)
                      t = (#2 arg)
                      columns = (#0 a)
                      rows = (#1 a)
                      v = (#2 a)
                      check =
                        let test1 = plst_i(s,0)
                        in Switch_enum test1 of
                             1 => λ.enum(1)
                           | 0 => λ.
                               let test2 = pgte_i(s,rows)
                               in Switch_enum test2 of
                                    1 => λ.enum(1)
                                  | 0 => λ.
                                      let test3 = plst_i(t,0)
                                      in Switch_enum test3 of
                                           1 => λ.enum(1)
                                         | 0 => λ.pgte_i(t,columns)
                                      end
                               end
                        end
                  in Switch_enum check of
                       1 => λ.raise Subscript
                     | 0 => λ.unsafe_sub1 [ty] {v, t + s * columns}
                  end
            in g
            end
      in f
      end

    fix dot =
      λi. let cnt = (#0 i)
              sum = (#1 i)
              d = plst_i(cnt,bound)
          in Switch_enum d of
               1 => λ.let sum' = sum + ((sub2 [int]) {A,i,cnt}) *
                                       ((sub2 [int]) {B,cnt,j})
                      in dot {cnt+1,sum'}
                      end
             | 0 => λ.sum
          end

Figure 2: After conversion to Lmli

    sub2 = ...

    fix dot = λcnt,sum.
      let test = plst_i(cnt, bound)
          r = Switch_enum test of
                1 => λ.
                  let a = sub2[int]
                      b = a(A,i,cnt)
                      c = sub2[int]
                      d = c(B,cnt,j)
                      e = b*d
                      f = sum+e
                      g = cnt+1
                      h = dot(g,f)
                  in h
                  end
              | 0 => λ.sum
      in r
      end

Figure 3: Lmli-Bform before optimization

    fix dot = λcnt,sum.
      let test = plst_i(cnt,bound)
          r = Switch_enum test of
                1 => λ.
                  let a = t1 + cnt
                      b = psub_ai(av,a)
                      c = columns * cnt
                      d = j + c
                      e = psub_ai(bv,d)
                      f = b*e
                      g = sum+f
                      h = 1+cnt
                      i = dot(h,g)
                  in i
                  end
              | 0 => λ.sum
      in r
      end

Figure 4: Lmli-Bform after optimization

    fix dot =
      λbound:INT,columns:INT,bv:TRACE,av:TRACE,t1:INT,
       j:INT,cnt:INT,sum:INT.
      let test:INT = pgtt_i(bound,cnt)
          r:INT =
            Switch_int test of
              1 =>
                let a:INT = t1 + cnt
                    b:INT = psub_ai(av,a)
                    c:INT = columns * cnt
                    d:INT = j + c
                    e:INT = psub_ai(bv,d)
                    f:INT = b*e
                    g:INT = sum+f
                    h:INT = 1+cnt
                    i:INT = dot(bound,columns,bv,
                                av,t1,j,h,g)
                in i
                end
            | 0 => sum
      in r
      end : INT

Figure 5: After conversion to Ubform

dot(([bound(INT),columns(INT),bv(TRACE),

av(TRACE),t1(INT),j(INT),cnt(INT),

sum(INT)],[]))

f L0: pgt bound (INT) , cnt(INT) , test(INT)

bne test(INT),L1

mv sum(INT),result (INT)

br L2

L1: addl t1(INT) , cnt(INT) , a(INT)

(*) s4add a(INT) , av(TRACE) , t2(LOCATIVE)

(*) ldl b(INT) , 0(t2(LOCATIVE))

mull columns(INT) , cnt(INT) , c (INT)

addl j(INT) , c(INT) , d (INT)

(*) s4add d (INT) , bv(TRACE) , t3(LOCATIVE)

.ent Lv2851 dot 205955

(*) ldl e (INT) , 0(t3 (LOCATIVE))

# arguments : [$bound,$0] [$columns,$1] [$bv,$2]

mull/v b (INT) , e (INT) , f(INT)

# [$av,$3] [$t1,$4] [$j,$5]

addl/v sum(INT) , f (INT) , g (INT)

# [$cnt,$6] [$sum,$7]

addl/v cnt(INT) , 1 , h (INT)

# results : [$result,$0]

trapb

# return addr : [$retreg,$26]

mv h (INT),cnt(INT)

# destroys :$0$1$2$3$4$5$6$7$27

mv g (INT),sum(INT)

dot 205955: Lv2851

br L0

.mask (1 << 26), -32

L2: return retreg(LABEL) g

.frame $sp, 32, $26

.prologue 1

Figure 6: After conversion to RTL ldgp $gp, ($27)

lda $sp, -32($sp)

stq $26, ($sp)

stq $8, 8($sp)

compiler. We measure execution time, heap allo cation , phys-

stq $9, 16($sp)

ical memory requirements, size, and compile time.

mov $26, $27

We also measure the e ect of lo op optimizations. Further

L1:

p erformance analysis of TIL app ears in Morrisett's [33] and

cmplt $6, $0, $8

Tarditi's theses [45].

bne $8, L2

mov $7, $1

br $31, L3

5.1 Benchmarks

L2:

addl $4, $6, $8

Table 1 describ es the b enchmark programs, which range in

s4addl $8, $3, $8

size from 62 lines to ab out 2000 lines of co de. Some of these

ldl $8, ($8)

programs have b een used previously for measuring ML p er-

mull $1, $6, $9

formance [5, 16]. The b enchmarks cover a range of appli-

addl $5, $9, $9

cation areas includin g scienti c , list-pro cessin g,

s4addl $9, $2, $9

systems programming, and compilers.

ldl $9, ($9)

mullv $8, $9, $8

We compiled the programs as single closed mo dules. For

addlv $7, $8, $7

Lexgen and Simple, which are standard b enchmarks [5], we

addlv $6, 1, $6

eliminated functors by hand b ecause TIL do es not yet sup-

trapb

p ort the full SML mo dule language. Because whole pro-

br $31, L1

grams were given to the compiler, we found that the opti-

L3:

mizer naturally eliminated all p olymorphic functions. Con-

mov $1, $0

sequently, for this b enchmark suite, there was no run-time

mov $27, $26

cost to supp ort intensional p olymorphism.

ldq $8, 8($sp)

We extended the built-in ML typ es with safe 2-dimensional

ldq $9, 16($sp)

lda $sp, 32($sp)

arrays. The 2-d array op erations do b ounds checking on each

ret $31, ($26), 1

dimension and then use unsafe 1-d array op erations. Arrays

dot 205955 .end Lv2851

are stored in column-ma jor order.

Figure 7: Actual DEC ALPHA assembly language
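The bounds-checked 2-d subscript (the `sub2` of Figure 4) and the `dot` loop that uses it can be sketched in Python as follows. This is an illustration under stated assumptions, not TIL's implementation: a 2-d array is modeled as the triple (columns, rows, v) that Figure 4 projects with `#0`, `#1`, and `#2`. The element index `t + s * columns` is copied from Figure 4. The names `Array2` and `Subscript` are hypothetical Python stand-ins, and starting the loop at `cnt = 0, sum = 0` is an assumption, since the SML fragment does not show the initial call.

```python
from typing import List, NamedTuple


class Subscript(Exception):
    """Stand-in for SML's Subscript exception (hypothetical Python name)."""


class Array2(NamedTuple):
    """2-d array as the triple that Figure 4 projects with #0, #1, #2."""
    columns: int    # (#0 a)
    rows: int       # (#1 a)
    v: List[int]    # (#2 a): flat 1-d storage


def sub2(a: Array2, s: int, t: int) -> int:
    # Bounds-check each dimension, mirroring `check` in Figure 4.
    if s < 0 or s >= a.rows or t < 0 or t >= a.columns:
        raise Subscript()
    # Then index the flat storage at t + s * columns, the role played by
    # unsafe_sub1. (Python lists still check bounds; this sketch relies
    # only on the explicit test above.)
    return a.v[t + s * a.columns]


def dot(A: Array2, B: Array2, i: int, j: int, bound: int) -> int:
    # Loop form of the tail-recursive dot. It starts from cnt = 0, sum = 0,
    # which is an assumption. Python ints do not overflow, unlike the
    # trapping mullv/addlv instructions in Figure 7.
    cnt, total = 0, 0
    while cnt < bound:
        total += sub2(A, i, cnt) * sub2(B, cnt, j)
        cnt += 1
    return total


if __name__ == "__main__":
    A = Array2(columns=2, rows=2, v=[1, 2, 3, 4])
    print(dot(A, A, 0, 1, 2))  # prints 10: the (0, 1) entry of A times A
```

After inlining, the optimizer in Figure 4 can see that `columns` and the base offsets do not change across iterations. That is why the loop body in Figures 5 to 7 reduces to a few adds, multiplies, and loads.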

5.2 Comparison against SML/NJ

We compared the performance of TIL against SML/NJ in several dimensions: execution time, total heap allocation, physical memory footprint, the size of the executable, and compilation time.

For TIL, we compiled programs with all optimizations enabled. For SML/NJ, we compiled programs using the default optimization settings. We used a recent internal release of SML/NJ (a variant of version 1.08), since it produces code that is about 35% faster than the current standard release (0.93) of SML/NJ [41].

Program       Lines  Description
Checksum        241  Checksum fragment from the Foxnet [7], doing 5000 checksums
                     on a 4096-byte array.
FFT             246  Fast Fourier transform, multiplying polynomials up to
                     degree 65,536.
Knuth-Bendix    618  An implementation of the Knuth-Bendix completion
                     algorithm.
Lexgen         1123  A lexical-analyzer generator [6], processing the lexical
                     description of Standard ML.
Life            146  The game of Life implemented using lists [39].
Matmult          62  Integer matrix multiply, on 200x200 integer arrays.
PIA            2065  The Perspective Inversion Algorithm [47] deciding the
                     location of an object in a perspective video image.
Simple          870  A spherical fluid-dynamics program [17], run for 4
                     iterations with grid size of 100.

Table 1: Benchmark Programs

TIL always prefixes a set of operations onto each module that it compiles, in order to facilitate optimization. This "inline" prelude contains 2-d array operations, commonly-used list functions, and so forth. To avoid handicapping SML/NJ, we created separate copies of the benchmark programs for SML/NJ, and placed equivalent "prelude" code at the beginning of each program by hand.

[Figure 8: TIL Execution Time Relative to SML/NJ. Benchmarks: Cksum, FFT, KB, Life, Lexgen, Mmult, PIA, SIMPLE.]

Since TIL creates stand-alone executables, we used the exportFn facility of SML/NJ to create stand-alone programs. The exportFn function of SML/NJ dumps part of the heap to disk and throws away the interactive system.

We measured execution time on DEC ALPHA AXP 3000/300LX workstations, running OSF/1, version 2.0, using the UNIX getrusage function. For SML/NJ, we started timing after the heap had been reloaded. For TIL, we measured the entire execution time of the process, including load time. We made 5 runs of each program on an unloaded workstation and chose the lowest execution time. Each workstation had 96MBytes of physical memory, so paging was not a factor in the measurements.
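The timing procedure above (five runs, keep the lowest, read times through getrusage) can be scripted roughly as follows. This is a sketch under stated assumptions, not the authors' harness. It uses Python's `resource` module, which wraps getrusage on Unix, and it sums user and system CPU time, although the text does not say which components were used. It times the whole child process, which matches the TIL setup (load time included) but not the SML/NJ setup, where timing began after the heap was reloaded. The helper names `child_cpu_seconds` and `best_of` are hypothetical.

```python
import resource
import subprocess
import sys
from typing import List, Sequence


def child_cpu_seconds() -> float:
    """User + system CPU seconds used so far by terminated, waited-for children."""
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return usage.ru_utime + usage.ru_stime


def best_of(cmd: Sequence[str], runs: int = 5) -> float:
    """Run cmd `runs` times and return the lowest CPU time observed."""
    times: List[float] = []
    for _ in range(runs):
        before = child_cpu_seconds()
        subprocess.run(cmd, check=True)  # blocks until the child exits and is reaped
        times.append(child_cpu_seconds() - before)
    return min(times)


if __name__ == "__main__":
    # A trivial child process stands in for a compiled benchmark binary.
    print(f"{best_of([sys.executable, '-c', 'pass']):.3f} s")
```

Taking the minimum rather than the mean discards runs that were slowed by unrelated system activity. RUSAGE_CHILDREN is cumulative, so each run is measured as a before/after difference.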

We measured total heap allocation by instrumenting the TIL run-time system to count the bytes allocated. We used existing instrumentation in the SML/NJ run-time system. We measured the maximum amount of physical memory during execution using getrusage. We used the size program to measure the size of executables for TIL. For SML/NJ, we used the size program to measure the size of the runtime system and then added the size of the heap created by exportFn. Finally, we measured end-to-end compilation time, including time to assemble files produced by TIL.

Figures 8 through 11 present the measurements (the raw numbers are in Appendix ??). For each benchmark, measurements for TIL were normalized to those for SML/NJ and then graphed. SML/NJ performance is the 100% mark on all the graphs.

Figure 8 presents relative running times. On average, programs compiled by TIL run 3.3 times faster than programs compiled by SML/NJ. In fact, all programs except Knuth-Bendix and Life are substantially faster when compiled by TIL. We speculate that less of a speed-up is seen for Knuth-Bendix and Life because they make heavy use of list processing, which SML/NJ does a good job of compiling.

Figure 9 compares the relative amounts of heap allocation. On average, the amount of data heap-allocated by the TIL program is about 17% of the amount allocated by the SML/NJ program. This is not surprising, because TIL uses a stack while SML/NJ allocates frames on the heap.

[Figure 9: TIL Heap Allocation Relative to SML/NJ. Benchmarks: Cksum, FFT, KB, Life, Lexgen, Mmult, PIA, SIMPLE.]

Figure 10 presents the relative maximum amounts of physical memory used. On average, TIL programs use half the memory used by SML/NJ programs. We see that floating-point programs use the least amount of memory relative to comparable SML/NJ programs. We speculate that this is due to TIL's ability to keep floating-point values unboxed when stored in arrays.

TIL stand-alone programs are about half the size of the stand-alone heaps and runtime system of SML/NJ. The difference in size is mostly due to the different sizes of the runtime systems and standard libraries for the two compilers. (TIL's runtime system is about 100K, while SML/NJ's runtime is about 425K.) The program sizes for TIL confirm that generating tables for nearly tag-free garbage collection consumes a modest amount of space, and that the inlining strategy used by TIL produces code of reasonable size.

Figure 11 compares compilation times for TIL and SML/NJ. SML/NJ does much better than TIL when it comes to compilation time, compiling about eight times faster. However, we have yet to tune TIL for compilation speed.

5.3 Loop-Oriented Optimizations

We also investigated the effect of the loop-oriented optimizations (CSE, invariant removal, hoisting, comparison elimination, and redundant switch elimination). For each benchmark, we compared performance with the loop optimizations against performance without the loop optimizations. Figure 12 presents the ratios of execution time with the loop optimizations to execution time without the loop optimizations, and similar ratios for total heap allocation. The loop optimizations reduce execution time by 0 to 83%, with a median reduction of 39%. The effect on heap allocation ranges from an increase of 20% to a decrease of 96.5%, with a median decrease of 10%.

[Figure 12: Effects of Loop Optimizations. Panels: Exec time(opt) / Exec time(noopt) and Heap Alloc(opt) / Heap Alloc(noopt), for Cksum, FFT, KB, Life, Lexgen, Mmult, PIA, SIMPLE.]

For matmult, the matrix multiplication function is small enough that the optimizer inlines it, making the array dimensions known. If the array dimensions are held unknown, then the loop optimizations speed up matmult by a factor of 2.5.
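The ratios plotted in Figure 12 and the median reductions quoted for them involve a small amount of arithmetic. The following is a minimal sketch of that arithmetic. The function names and the example inputs are hypothetical, chosen only to show the computation. They are not the paper's data. The same ratio, with the SML/NJ measurement as the baseline, is what Figures 8 through 11 plot, where 100% means parity.

```python
from statistics import median
from typing import Dict


def ratios(measured: Dict[str, float], baseline: Dict[str, float]) -> Dict[str, float]:
    """Per-benchmark measured / baseline (1.0 means no change, i.e. 100%)."""
    return {name: measured[name] / baseline[name] for name in measured}


def median_reduction_percent(measured: Dict[str, float],
                             baseline: Dict[str, float]) -> float:
    """Median of (1 - ratio) * 100 across benchmarks; negative values are increases."""
    return median((1.0 - r) * 100.0 for r in ratios(measured, baseline).values())


if __name__ == "__main__":
    # Hypothetical numbers, purely to show the arithmetic.
    with_loop_opts = {"a": 1.0, "b": 3.0, "c": 5.0}
    without_loop_opts = {"a": 2.0, "b": 4.0, "c": 5.0}
    print(ratios(with_loop_opts, without_loop_opts))  # {'a': 0.5, 'b': 0.75, 'c': 1.0}
    print(median_reduction_percent(with_loop_opts, without_loop_opts))  # 25.0
```

A ratio above 1.0 (for example, the heap-allocation increase of up to 20% reported above) shows up as a negative reduction. A median is less sensitive than a mean to a single benchmark with an extreme ratio.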

[Figure 10: TIL Physical Memory Used Relative to SML/NJ. Benchmarks: Cksum, FFT, KB, Life, Lexgen, Mmult, PIA, SIMPLE.]

[Figure 11: TIL Compilation Time Relative to SML/NJ. Benchmarks: Cksum, FFT, KB, Life, Lexgen, Mmult, PIA, SIMPLE.]

6 Related Work

Morrison et al. used an "ad hoc" approach to implement polymorphism in their implementation of Napier '88 [35]. In particular, they passed representations of types to polymorphic routines at run-time to determine behavior. However, to our knowledge, Napier '88 did not use types to implement tag-free garbage collection. Also, there is no description of the internals of the Napier '88 compiler, nor is there an account of the performance of code generated by the compiler.

Peyton Jones and Launchbury suggested that types could be used to unbox values in a polymorphic language [26]. However, they only supported a limited set of "unboxed types" (ints and floats) and restricted these types from instantiating type variables. Later, Leroy suggested a general approach for unboxing values based on the ML type system [28]. Leroy's approach has been extended and implemented elsewhere [38, 24, 42], including the SML/NJ compiler. It does not support unboxed array components nor flattened, recursive datatypes. Tolmach [46] combined Leroy's approach with tag-free garbage collection. However, he used an ad hoc approach to propagate type information to the collector.

Other researchers have suggested that polymorphism should be eliminated entirely at compile time [9, 25, 21], in the style of C++ templates [44]. This prevents separate compilation of a polymorphic definition from its uses. In contrast, intensional polymorphism, and in particular the intermediate forms of TIL, support separate compilation of polymorphic definitions, though we have yet to take advantage of this.

Tag-free garbage collection was originally proposed for monomorphic languages like Pascal, but has been used elsewhere [12, 11, 48, 15]. Britton suggested associating type information with return addresses on the stack [12]. Appel suggested extending this technique to ML by using unification [4]. Goldberg and Gloger improved Appel's algorithm [20, 19]. None of the unification-based algorithms were implemented due to the complexity of the algorithms and the overhead of performing unification during garbage collection.

Aditya, Flood, and Hicks used type-passing to support fully tag-free garbage collection for Id [1]. Independently, Tolmach [46] implemented a type-passing garbage collection algorithm for ML. Our approach differs from others by using "nearly" tag-free collection. In particular, records and arrays on the heap are tagged. Another difference is that we calculate type environments eagerly, while the other implementations construct type environments lazily during garbage collection.

Loop-oriented optimizations are well-known for imperative languages [2]. However, few results are reported for Lisp, Scheme, and ML. Appel [5] and Serrano [40] report common-subexpression elimination optimizations similar to ours. Appel found that CSE was not useful in the SML/NJ compiler. Serrano restricted CSE to pure expressions, while our CSE handles expressions which may raise exceptions.

7 Conclusions and future work

Our results show that for core-SML programs compiled as a whole, intensional polymorphism can remove restrictions on data representation, yet cost literally nothing due to the effectiveness of optimization. They also show that loop optimizations can improve program performance significantly.

These results suggest that ML can be compiled as well as conventional languages such as Pascal. TIL produces code that is similar in many important respects to code produced by Pascal and C compilers. For example, most function calls are known, since few higher-order functions are left, integers are untagged, and most code is monomorphic.

There are numerous areas that we would like to investigate further. We would like to explore the effect of separate compilation. With separate compilation, polymorphic functions may be compiled separately from their uses, leading to some cost for intensional polymorphism. We would like to measure this cost and explore what kinds of optimizations can reduce it.

Another direction we would like to investigate is how this approach performs for larger programs. We would like to add support for more of the ML module system, since large ML programs make extensive use of the module system. We would also like to improve TIL's compile times, so that large programs can also be compiled as a whole.

Finally, we would like to continue improving the performance of ML programs. We would like to extend our register allocation strategy along the lines of Chow [13] or Steenkiste [43]. We would also like to investigate more loop optimizations, such as strength reduction, induction-variable elimination, and loop unrolling. On a more speculative note, we would like to explore stack allocation of data structures.

References

[1] Shail Aditya, Christine Flood, and James Hicks. Garbage collection for strongly-typed languages using run-time type reconstruction. In LFP '94 [30], pages 12–23.

[2] Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles, Techniques, and Tools. Addison-Wesley Publishing Company, 1986.

[3] Andrew Appel. A critique of Standard ML. Journal of Functional Programming, 3(4):391–429, October 1993.

[4] Andrew W. Appel. Runtime tags aren't necessary. Lisp and Symbolic Computation, (2):153–162, 1989.

[5] Andrew W. Appel. Compiling with Continuations. Cambridge University Press, 1992.

[6] Andrew W. Appel, James S. Mattson, and David Tarditi. A lexical analyzer generator for Standard ML. Distributed with Standard ML of New Jersey, 1989.

[7] Edoardo Biagioni, Robert Harper, Peter Lee, and Brian Milnes. Signatures for a network protocol stack: A systems application of Standard ML. In LFP '94 [30], pages 55–64.

[8] Lars Birkedal, Nick Rothwell, Mads Tofte, and David N. Turner. The ML Kit, Version 1. Technical Report 93/14, DIKU, 1993.

[9] Guy E. Blelloch. NESL: A nested data-parallel language (version 2.6). Technical Report CMU-CS-93-129, School of Computer Science, Carnegie Mellon University, April 1993.

[10] Hans-Juergen Boehm. Space-efficient conservative garbage collection. In PLDI '93 [36], pages 197–206.

[11] P. Branquart and J. Lewi. A scheme for storage allocation and garbage collection for Algol-68. In Algol-68 Implementation. North-Holland Publishing Company, Amsterdam, 1970.

[12] Dianne Ellen Britton. Heap storage management for the programming language Pascal. Master's thesis, University of Arizona, 1975.

[13] Fred C. Chow. Minimizing register usage penalty at procedure calls. In Proceedings of the ACM SIGPLAN '88 Conference on Programming Language Design and Implementation, pages 85–94, Atlanta, Georgia, June 1988. ACM.

[14] A. Demers, M. Weiser, B. Hayes, H. Boehm, D. Bobrow, and S. Shenker. Combining generational and conservative garbage collection: Framework and implementations. In Conference Record of the 17th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, San Francisco, California, January 1990. ACM.

[15] Amer Diwan, Eliot Moss, and Richard Hudson. Compiler support for garbage collection in a statically typed language. In Proceedings of the ACM SIGPLAN '92 Conference on Programming Language Design and Implementation, pages 273–282, San Francisco, CA, June 1992. ACM.

[16] Amer Diwan, David Tarditi, and Eliot Moss. Memory-System Performance of Programs with Intensive Heap Allocation. Transactions on Computer Systems, August 1995.

[17] K. Ekanadham and Arvind. SIMPLE: An exercise in future scientific programming. Technical Report Computation Structures Group Memo 273, MIT, Cambridge, MA, July 1987. Simultaneously published as IBM/T. J. Watson Research Center Research Report 12686, Yorktown Heights, NY.

[18] Cormac Flanagan, Amr Sabry, Bruce F. Duba, and Matthias Felleisen. The essence of compiling with continuations. In PLDI '93 [36], pages 237–247.

[19] Benjamin Goldberg. Tag-free garbage collection in strongly typed programming languages. In Proceedings of the ACM SIGPLAN '91 Conference on Programming Language Design and Implementation, pages 165–176, Toronto, Canada, June 1991. ACM.

[20] Benjamin Goldberg and Michael Gloger. Polymorphic type reconstruction for garbage collection without tags. In Proceedings of the 1992 ACM Conference on Lisp and Functional Programming, pages 53–65, San Francisco, California, June 1992. ACM.

[21] Cordelia Hall, Simon L. Peyton Jones, and Patrick M. Sansom. Unboxing using specialisation. In D. Turner, K. Hammond, and P. M. Sandom, editors, Functional Programming, 1994. Springer-Verlag, 1995.

[22] Robert Harper and Mark Lillibridge. A type-theoretic approach to higher-order modules with sharing. In POPL '94 [37], pages 123–137.

[23] Robert Harper and Greg Morrisett. Compiling polymorphism using intensional type analysis. In Conference Record of the 22nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 130–141, San Francisco, California, January 1995. ACM.

[24] Fritz Henglein and Jesper Jørgensen. Formally optimal boxing. In POPL '94 [37], pages 213–226.

[25] M. P. Jones. Partial evaluation for dictionary-free overloading. Research Report YALEU/DCS/RR-959, Yale University, New Haven, Connecticut, USA, April 1993.

[26] Simon Peyton Jones and John Launchbury. Unboxed values as first-class citizens. In Proceedings of the Conference on Functional Programming and Computer Architecture, volume 523 of Lecture Notes in Computer Science, pages 636–666. ACM, Springer-Verlag, 1991.

[27] David Kranz, Richard Kelsey, Jonathan Rees, Paul Hudak, James Philbin, and Norman Adams. ORBIT: An Optimizing Compiler for Scheme. In Proceedings of the SIGPLAN '86 Symposium on Compiler Construction, pages 219–233, Palo Alto, California, June 1986. ACM.

[28] Xavier Leroy. Unboxed objects and polymorphic typing. In Conference Record of the 19th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, pages 177–188, Albuquerque, NM, January 1992. ACM.

[29] Xavier Leroy. Manifest types, modules, and separate compilation. In POPL '94 [37], pages 109–122.

[30] Proceedings of the 1994 ACM Conference on Lisp and Functional Programming, Orlando, Florida, June 1994. ACM.

[31] Robin Milner, Mads Tofte, and Robert Harper. The Definition of Standard ML. MIT Press, 1990.

[32] Y. Minamide, G. Morrisett, and R. Harper. Typed closure conversion. In Conference Record of the 23rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, St. Petersburg, Florida, January 1996. ACM.

[33] Greg Morrisett. Compiling with Types. PhD thesis, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, December 1995. Published as Technical Report CMU-CS-95-226.

[34] Greg Morrisett, Matthias Felleisen, and Robert Harper. Abstract models of memory management. In ACM Conference on Functional Programming and Computer Architecture, pages 66–77, La Jolla, June 1995.

[35] R. Morrison, A. Dearle, R. C. H. Connor, and A. L. Brown. An ad hoc approach to the implementation of polymorphism. ACM Transactions on Programming Languages and Systems, 13(3):342–371, July 1991.

[36] Proceedings of the ACM SIGPLAN '93 Conference on Programming Language Design and Implementation, Albuquerque, New Mexico, June 1993. ACM.

[37] Conference Record of the 21st Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Portland, Oregon, January 1994. ACM.

[38] Eigil Rosager Poulsen. Representation analysis for efficient implementation of polymorphism. Technical report, Department of Computer Science (DIKU), University of Copenhagen, April 1993. Master Dissertation.

[39] Chris Reade. Elements of Functional Programming. Addison-Wesley, Reading, Massachusetts, 1989.

[40] Manuel Serrano and Pierre Weis. 1+1 = 1: an optimizing CAML compiler. Technical Report 2264, INRIA, June 1994.

[41] Zhong Shao. Compiling Standard ML for Efficient Execution on Modern Machines. PhD thesis, Princeton University, Princeton, New Jersey, November 1994.

[42] Zhong Shao and Andrew W. Appel. A type-based compiler for Standard ML. In Proceedings of the ACM SIGPLAN '95 Conference on Programming Language Design and Implementation, pages 116–129, La Jolla, California, June 1995. ACM.

[43] Peter Steenkiste. Advanced register allocation. In Peter Lee, editor, Topics in Advanced Language Implementation. MIT Press, 1990.

[44] Bjarne Stroustrup. The C++ Programming Language, 2nd Edition. Addison-Wesley, 1991.

[45] David R. Tarditi. Optimizing ML. PhD thesis, School of Computer Science, Carnegie Mellon University, 1996. Forthcoming.

[46] Andrew Tolmach. Tag-free garbage collection using explicit type parameters. In LFP '94 [30], pages 1–11.

[47] Kevin G. Waugh, Patrick McAndrew, and Greg Michaelson. Parallel implementations from function prototypes: a case study. Technical Report Computer Science 90/4, Heriot-Watt University, Edinburgh, August 1990.

[48] P. L. Wodon. Methods of garbage collection for Algol-68. In Algol-68 Implementation. North-Holland Publishing Company, Amsterdam, 1970.