Title RegionBased Memory Management

Authors Mads Tofte and JeanPierre Talpin

Aliation Mads Tofte Department of Computer Science Universityof

Cop enhagen JeanPierre Talpin IRISA Campus de Beaulieu France

An earlier version of this work was presented at the st ACM SIGPLANSIGACT Sym

p osium on Principles of Programming Languages Portland Oregon January

Abstract

This pap er describ es a memory management disciplin e for programs

that p erform dynamic memory allo cation and deallo cation At runtime

all values are put into regions The store consists of a stack of regions

All p oints of region allo cation and deallo cation are inferred automatically

using a typ e and eect based program analysis The scheme do es not

assume the presence of a garbage collector

The scheme was rst presented byTofte and Talpin subse

quently it has b een tested in The ML Kit with Regions a regionbased

garbagecollection free implementation of the Standard ML Core language

which includes recursive datatyp es higherorder functions and up datable

references Birkedal et al Elsman and Hallenb erg

This pap er denes a regionbased dynamic semantics for a skeletal pro

gramming language extracted from Standard ML We present the inference

system which sp ecies where regions can b e allo cated and deallo cated and

a detailed pro of that the system is sound with resp ect to a standard se

mantics

We conclude by giving some advice on how to write programs that run

well on a stack of regions based on practical exp erience with the ML Kit

Intro duction

Computers have nite memory Very often the total of memory allo cated by

a program as it is run far exceeds the size of the memory Thus a practical

discipline of programming must provide some form of memory recycling

One of the key achievements of early work in programming languages is the

invention of the notion of blo ck structure and the asso ciated implementation tech

nology of stackbased memory management for recycling of memory In blo ck

structured languages every p oint of allo cation is matched bya point of deallo ca

tion and these p oints can easily b e identied in the source programNaur

Dijkstra Prop erly used the stack discipline can result in very ecient use

of memory the maximum memory usage b eing b ounded by the depth of the call

stack rather than the numb er of memory allo cations

The stack discipline has its limitations however as witnessed by restrictions

in the typ e systems of blo ckstructured languages For example pro cedures are

typically prevented from returning lists or pro cedures as results There are two

main reasons for such restrictions

First for the stackdiscipline to work the size of a value must b e known at

the latest when space for that value is allo cated This allows for example arrays

which are lo cal to a pro cedure and have their size determined by the arguments

of the pro cedure by contrast it is not in general p ossible to determine howbig

a list is going to b ecome when generation of the list b egins

alues must com Second for the stackdiscipline to work the lifetime of v

ply with the allo cation and deallo cation scheme asso ciated with blo ck structure

When pro cedures are values there is a danger that a pro cedure value refers to

values whichhave b een deallo cated For example consider the following program

let x

in fn y x y

end

This expression is an application of a function denoted by letendtothe

number The function has formal parameter y and b o dy x y where

stands for rst pro jection fn is pronounced in SML Thus the op erator

expression is supp osed to evaluate to fn y x y where x is b ound to

the pair so that the whole expression evaluates to the pair However

if we regard the letend construct as a blo ck construct rather than just a

lexical scop e we see why a stackbased implementation would not work we

cannot deallo cate the space for x at the end since the rst comp onentof x is

still needed by the function which is returned by the en tire let expression

One way to ease the limitations of the stack discipline is to allow program

mer controlled allo cation and deallo cation of memory as is done in C C has

r r r r

Figure The store is a stack of regions every region is uniquely identied bya

region name eg r and is depicted bya box in the picture

two op erations malloc and free for allo cation and deallo cation resp ectively

Unfortunately it is in general very hard for a programmer to know when a blo ck

of memory do es not contain any livevalues and may therefore b e freed conse

quently this solution very easily leads to socalled spaceleaks ie to programs

that use much more memory than exp ected

Functional languages like Haskell and Standard ML and some ob jectoriented

languages eg JAVA instead let a separate routine in the runtime system the

garbage col lectortake care of deallo cation of memory Knuth Baker

Lieb erman Allo cation is done by the program often at a very high rate

In our example the three expressions fn y x y and

x y each allo cate memoryeach time they are evaluated The part of memory

used for holding suchvalues is called the heap the role of the garbage collector

is to recycle those parts of the heap that hold only dead values ie values which

are of no consequence to the rest of the computation

Garbage collection can b e very fast provided the computer has enough mem

ory Indeed there is a much quoted argument that the amortized cost of copying

garbage collection tends to zero as memory tends to innity App el page

It is not the case however that languages such as Standard ML frees the

programmer completely from having to worry ab out memory management To

write ecient SML programs one must understand the p otential dangers of for

example accidental copying or survival of large data structures If a program

is written without concern for space usage it maywell use much more memory

than one would like even if the problem is lo cated using a space proler for

example turning a spacewasting program into a spaceecient one may require

ma jor changes to the co de

The purp ose of the work rep orted in this pap er is to advo cate a compromise

between the two extremes completely manual vs completely automatic memory

management We prop ose a memory mo del which the user may use when pro

gramming memory can b e thought of as a stack of regions see Figure Each

region is like a stackofunb ounded size which grows upwards in the picture until

the region in its entirety is p opp ed o the region stack For example a typical

use of a region is to hold a list A program analysis automatically identies pro

gram p oints where entire regions can b e allo cated and deallo cated and decides

for eachvaluepro ducing expression into which region the value should b e put

More sp ecicallywe translate every welltyp ed source language expression

einto a target language expression e which is identical with e except for

certain region annotations The evaluation of e corresp onds step for step to the

evaluation of eTwo forms of annotations are

e at

letregion in e end

The rst form is used whenever e is an expression which directly pro duces a

value Constant expressions abstractions and tuple expressions fall into this

category The is a region variable it indicates that the value of e is to b e put

in the region b ound to

The second form intro duces a region variable with lo cal scop e e At run

time rst an unused region identied bya region name r is allo cated and

b ound to Then e is evaluated probably using the region named r Finally

the region is deallo cated The letregion expression is the only wayofintro

ducing and eliminating regions Hence regions are allo cated and deallo cated in

a stacklike manner

The target program which corresp onds to the ab ove source program is

e letregion

in letregion

in let x at at at

in y x y at at

end

end

at

end

We shall step through the evaluation of this expression in detail in Section

Brieyevaluation starts in a region stack with three regions and

and and evaluation then allo cates and deallo cates three more regions

at the end and contain the nal result

The scheme forms the basis of the ML Kit with Regions a compiler for the

Standard ML Core language including higherorder functions references and

recursive datatyp es The region inference rules we describ e in this pap er address

life times only A solution to the other problem handling values of unknown size

is addressed in Birkedal et al An imp ortant optimisation turns out to b e

to distinguish b etween regions whose size can b e determined statically and those

that cannot The former can b e allo cated on a usual stack

Using C terminology region analysis infers where to insert calls to malloc and

free but b eware that the analysis has only b een develop ed in the context of

Standard ML and relies on the fact that SML is rather more strongly typ ed than

C For a strongly typ ed imp erative language likeJAVA region inference mightbe

useful for freeing memory unlikeCJAVA do es not have free For readers who

are interested in co de generation App endix A shows the threeaddress program

which the ML Kit pro duces from the ab ove program using b oth region inference

and the additional optimisations describ ed in Birkedal et al However

this pap er is primarily ab out the semantics of regions not their implementation

Exp erience with the Kit is that prop erly used the region scheme is strong

enough to execute demanding b enchmarks and to make considerable space sav

ings compared to a garbagecollected system Birkedal et al Wehave found

that most of the allo cation is handled well by the automatic region analysis o c

casionally it is to o conservative and here a garbage collector would probably b e

useful esp ecially if the programmer do es not know the region inference rules

for now wehavechosen instead to make usually small transformations to the

source programs to make them more region friendly We shall describ e some

of those transformations towards the end of this pap er

Avery imp ortant prop erty of our implementation scheme is that programs are

executed as they are written with no additional costs of unb ounded size see

App endix A for a detailed example The memory management directives which

are inserted are each constant time op erations This op ens up the p ossibilityof

using languages with the p ower of Standard ML for applications where guarantees

ab out time and space usage are crucial for example in real time programming or

emb edded systems

The key problem which is addressed in this pap er is to prove that the region

inference system is safe in particular that deallo cation really is safe when the

analysis claims that it is safe

We do this as follows We rst dene a standard op erational semantics for our

skeletal source language giving b oth a static and a dynamic semantics Section

We then dene a regionbased op erational semantics for a target language the

target language is identical to the source language except that programs have

b een annotated with region information Section In the dynamic semantics or

the source language there is no notion of store in the target language semantics

however there is a store which is organised as a stack of regions Wethen

sp ecify the translation from source language to target language in the form of an

inference system Section We then dene a representation relation b etween

values in a standard semantics for our skeletal language and values in a region

based semantics Section and show that for every sub expression e of the

original program as far as the rest of the computation after the evaluation of

e is concerned e and its image in the target program evaluate to related values

when evaluated in related environments Section Restricting attention to

what the rest of the computation can observe turns out to b e crucial some

connections b etween values in the source language semantics and in the region

based semantics are lost when memory is reused in the regionbased semantics

The key p oint is that on that part of target machine which can b e observed by

the rest of the computation every value used in the source language is faithfully

represented byavalue in the target language

This representation relation is dened as the maximal xed p oint of a certain

monotonic op erator Prop erties of the relation are proved using a metho d of

pro of whichwe call rulebasedcoinduction Section

Algorithms for region inference are b eyond the scop e of this pap er however

we shall give some hints ab out how the region inference rules we present can b e

implemented Section

Related Work

The main dierences b etween the region stack and the traditional stack discipline

for blo ckstructured languages are as follows First when a value is created in our

scheme it is not necessarily put into the topmost region In the case of function

closures for example the closure is put as far down the stack as is necessary in

order to b e sure that the closure still exists should it ever b e accessed Second not

all regions have a size which can b e determined at the time the region is allo cated

Finally the scheme works for higherorder functions and recursive datatyp es and

allo cation is based on the basis of the typ e system of the language not the

grammar

Ruggieri and Murtagh prop ose a stack of regions in conjunction with a

traditional heap Each region is asso ciated with an activation record this is not

necessarily the case in our scheme They use a combination of interpro cedural

and intrapro cedural dataow analysis to nd suitable regions to put values in

Weuseatyp einference based analysis and this is crucial for the handling of

p olymorphism and higherorder functions

Inoue et al presentaninteresting technique for compiletime analy

sis of runtime garbage cells in lists Their metho d inserts pairs of HOLD and

RECLAIM instructions in the target language HOLD holds on to a p ointer p

say to the ro ot cell of its argument and RECLAIM collects those cells that are

reachable from p and t the path description HOLD and RECLAIM pairs are

nested so the HOLD p ointers can b e held in a stack not entirely unlike our stack

of regions In our scheme however the unit of collection is one entire region

ie there is no traversal of values in connection with region collection The path

descriptions of Inoue et al make it p ossible to distinguish b etween the individ

ual memb ers of a list This is not p ossible in our scheme as we treat all the

elements of the same list as equal Inoue et al rep ort a reclamation rate

for garbage list cells pro duced by Quicksort Inoue page We obtain

a reclamation rate but for word for al l garbage pro duced byQuicksort

without garbage collection Tofte and Talpin

Hudak describ es a reference counting scheme for a rstorder callby

value functional language Turner Wadler and Mossin use a typ e system

inspired by linear logic to distinguish b etween variables which are used at most

once and variables whichmay b e used more than once These analyses provide

somewhat dierent information from ours we only distinguish b etween no use

and p erhaps some use

George describ es an implementation scheme for typ ed lambda ex

pressions in socalled simple form together with a transformation of expressions

into simple form The transformation can result in an increase in the number of

evaluation steps by an arbitrarily large factor George page George

also presents an implementation scheme which do es not involve translation al

though this relies on not using callbyvalue reduction when actual parameters

are functions

The device we use for grouping values according to regions is unication of

region variables using essentially the idea of Baker namely that twovalue

pro ducing expressions e and e should b e given the same at annotation if

e and e and only if typ e checking directly or indirectly unies the typ e of

Baker do es not prove safetyhowever nor do es he deal with p olymorphism

To obtain go o d separation of lifetimes we use explicit region polymorphismby

whichwe mean that regions can b e given as arguments to functions at runtime

For example a declaration of the successor function fun succxx is

compiled into

fun succ xletregion

in x at at

end

Note that succ has b een decorated with two extra formal region parameters en

closed in square brackets to distinguish them from value variables suchas x

The new succ function has typ e scheme

fget put g

int int

meaning that for any and the function accepts an integer at and pro duces

an integer at p erforming a get op eration on region and a put op eration

on region in the pro cess Now succ will put its result in dierent regions

dep ending on the context

succ at succ y

We make the additional provision that a recursive function f can call itself

with region arguments which are dierent from its formal region parameters and

whichmaywell b e lo cal to the b o dy of the recursive function Such lo cal regions

resemble the activation records of the classical stack discipline

We use ideas from eect inference Lucassen Lucassen and Giord

Jouvelot and Giord to nd out where to wrap letregion in

end around an expression Most work on eect inference uses the word ef

fect with the meaning sideeect or in concurrent languages communication

eectNielson and Nielson However our eects are sideeects relative

to the underlying regionbased store mo del irresp ective of whether these eects

stem from imp erative features or not

The idea that eect inference makes it p ossible to delimit regions of memory

and delimit their lifetimes go es back to early work on eect systems Lucassen

et al Lucassen and Giord call it eect maskingtheyprovethat

side eect masking is sound with resp ect to a store semantics where regions are

not reused Talpin and Talpin and Jouvelot present a p olymorphic

eect system with side eect masking and prove that it is sound with resp ect

to a store semantics where regions are not reused

The rst version of the pro of of the present pap er was recorded in a technical

rep ort Tofte and Talpin which in turn was used as the basis for the

pro of outline in Tofte and Talpin In order to simplify the pro ofs several

mo dications to the early pro ofs have b een done The main dierences are a we

have adopted value p olymorphism which simplies the pro ofs in various ways in

particular a dicult lemma Lemma in Tofte and Talpin could b e

eliminated b the dynamic semantics of the target language has b een extended

with region environments c the denition of consistency has b een strengthened

to prevent closures with free region variables these used to complicate the pro of

d the pro ofs have b een rewritten and reorganised around the idea of rulebased

coinduction

Aiken et al havedevelop ed a program analysis which can b e used as

a p ostpass to the analysis describ ed in the present pap er Their analysis makes

it p ossible to delay the allo cation of regions and to promote the deallo cation

sometimes leading to asymptotic improvements in space usage and never leading

to worse results than region inference without their analysis added

The source language SExp

The skeletal language treated in this pap er is essentially Milners p olymorphically

typ ed lamb da calculus Milner We assume a denumerably innite set Var

of program variablesWeuse x and f to range over variables Finally c ranges

over integer constants The grammar for the source language is

e c j x j xe j e e j let x e in e end

j letrec f xe in e end

Let SExp denote the set of source language expressions The addition of pairs

and tuples to the theory is straightforward References exceptions and recursive

datatyp es have b een added in the implementation but correctness of the trans

lation of these constructs has not b een proved Callcc concurrency primitives

and other substantial extensions of Standard ML have not b een studied Nor is it

clear whether region inference can b e made to b ear on lazy functional languages

The fact that ML is typ ed is essential the fact that it has p olymorphism is not

essential for what follows

Notation

In the rest of this pap er we shall use the following terminologyAnite map is a

map with nite domain Given sets A and B the set of nite maps from A to B

n

is denoted A B The domain and range of a nite map f are denoted Domf

and Rng f resp ectivelyWhenf and g are nite maps f g is the nite map

whose domain is Domf Domg and whose value is g x if x Domg and

f x otherwise F or anymapf and set Awe write f A to mean the restriction

of f to AWe sometimes write a tuple of region variables for example in the

form ie without parentheses and commas

k

We often need to select comp onents of tuples for example the region name of

an address In such cases we rely on variable names to indicate which comp onent

is b eing selected For example r of a means the region name comp onentof

a As we shall see an address is a pair of the form ro where r is a region

name and o is an oset

Static Semantics for Source

Following Damas and Milner wehave ML types and ML type schemes

dened by

ML ML ML

int j j ML typ e

ML ML

ML typ e scheme n

n

ranges over a denumerably innite set TyVar of type variablesAnML where

ML ML ML

typ e is an instance of an ML typ e scheme written

n

ML ML ML ML ML ML ML

if there exist suchthat

n

n n

ML

An ML type environment is a nite map from program variables to ML

ML

typ e schemes We use TE to range over typ e environments When o is an ML

typ e typ e scheme or typ e environment ftv o denotes the set of typ e variables

that o ccur free in o

In Milners original typ e discipline p olymorphism is asso ciated with letIt

has turned out that there are advantages to restricting p olymorphism so that in

let x e in e end x only gets a typ e scheme if e is a syntactic value In

the present language a syntactic value is an integer constantoralamb da abstrac

tion Besides making it easier to prove soundness in connection with references

and other language extensions imp osing this restriction also makes the pro ofs of

correctness of region inference simpler wehave done b oth In fact we shall take

the restriction one step further and only allow p olymorphism in connection with

letrec Any program which satises the value restriction can b e turned into

an equivalent program which only has letrecp olymorphismby simply turning

every let x e in e end into letrec x z e in e x x end where

x and z are fresh variables In the theory that follows we therefore only have

p olymorphism in connection with letrec with this conven tion let x e in

e end is just syntactic sugar for xe e weshow the rules for let even so

to make it easier to follow the examples

ML

ML ML ML

TE x

ML

ML

TE x

ML

ML ML

TE fx ge

ML

ML ML

TE xe

ML ML

ML ML ML

TE e TE e

ML

ML

TE e e

ML ML

ML ML ML

TE e TE fx ge

ML

ML

TE let x e in e end

ML ML

ML ML

TE ff g xe f gftv TE

n

ML

ML ML

TE ff ge

n

ML

ML

TE letrec f x e in e end

Dynamic Semantics for Source

A nonrecursive closure is a triple hx e E iwhereE is an environmentiea

nite map from variables to values We use E to range over environments the

set of environments is denoted Env A recursive closure takes the form hx e E f i

where f is the name of the recursive function in question A value is either an

integer constant or a closure We use v to range over values the set of values is

denoted Val

Evaluation rules app ear b elow They allow one to infer statements of the

form E e v read in environment E the expression e evaluates to value v

A closure representing a recursive function is unrolled just b efore it is applied

rule

Expressions E e v

E c c

E xv

E x v

E xe hx e E i

E e hx e E i E e v E fx v g e v

E e e v

E e hx e E fi E e v

E ff hx e E fig fx v ge v

E e e v

E e v E fx v ge v

E let x e in e v

E ff hx e Efig e v

E letrec f xe in e v

The target language TExp

We assume a denumerably innite set RegVar f g of region variables

we use to range over region variables The grammar for the target language

TExp is

e c j x j f at j xe at

n

j e e j let x e in e end

j letrec f xat e in e end

k

in e j letregion

As is common functions are represented by closures but regionp olymorphic

functions intro duced by letrec f x are represented by socalled

region function closures which are dierent from closures In the expression

form xe at the indicates the region into which the closure representing

xe should b e put Hence the at qualies xe not e In

letrec f xat e in e end

k

the indicates where the region function closure for f should b e put A sub

sequent application f at extracts this region function closure

n

from the store applies it to actual arguments and creates a function

k

closure in

For any nite set f g of region variables k we write

k

letregion in e for letregion in letregion in e

k k

We shall not present a separate static semantics for the target language for

such a semantics can b e extracted from the translation rules in Section We

thus pro ceed to the dynamic semantics

Dynamic Semantics for Target

Assume a denumerably innite set RegName fr rg of region names

weuse r to range over region names Region names serve to identify regions at

runtime Further assume a denumerable innite set OSet of osetswe use o

to range over osets

A region is a nite map from osets to storable values A storable value is

either an integer constant a function closure or a region function closure We use

sv to range over storable values the set of storable values is denoted StoreVal

is a nite map from program variables to values We A variable environment

use VE to range over variable environments the set of variable environments is

denoted TargetEnv A region environment is a nite map from region variables

to region names Weuse R to range over region environments the set of region

environments is denoted RegEnv A function closure is a quadruple hx e VERi

where x is a program variable e is a target language expression and VE and

R give meaning to the free program and region variables of xe A region

function closure is a tuple of the form h xeVERi Region function

k

closures represent regionp olymorphic functions the region variables

k

are required to b e distinct and are referred to as the formal parameters of the

region function closure

An address is a pair ro of a region name and an oset Weusea to range

over addresses and Addr to denote over the set of addresses For any address a

we write r of a to mean the rst comp onent ie the region name of aAstore

is a nite map from region names to regions We use s to range over stores the

set of stores is denoted Store

A value is an address We use v to range over values the set of values is

denoted TargetVal

We shall b e brief ab out indirect addressing whenever a ro is an address

we write sa to mean sr o Similarlywe write s fro sv g as a shorthand

for s fr sr fo sv gg Moreover we dene the planar domain of

s written Pdoms to b e the nite set fro Addr j r Doms o

Domsr g Finallywe write s nnfr g read s without r to mean the store

s Doms nfr g

The inference rules for the dynamic semantics of TExp are shown b elow They

allow one to infer sentences of the form s VE R e v s read In store

s variable environment VE and region environment R the target expression e

evaluates to value v and a perhaps modied store s

Rule is the evaluation rule for application of a region function closure

A function closure is created from the region closure One can imagine that

aruntimeerror o ccurs if the premises cannot b e satised for example b ecause

However the correctness pro of shows that the premises DomR for som

i i

always can b e satised for programs that result from the translation

Rule concerns regionp olymorphic and p ossibly recursive functions To

keep down the numb er of constructs in the target language weha vechosen to

combine the intro duction of recursion and region p olymorphism in one language

construct Functions dened with letrec need not b e recursive so one can also

use the letrec construct to dene region functions that pro duce nonrecursive

functions Rule creates a region closure in the store and handles recursion by

creating a cycle in the store rst a fresh address is chosen by sideconditions

r R o Domsr the environment VE VE ff rog is stored

in the region function closure h xe VE Ri which in turn is stored in

k

the fresh address chosen earlier Any reference to f in e will then yield the region

function closure itself by rule as desired since letrec intro duces recursion

Moreover in any function application the op erator expression will evaluate to

apointer to an ordinary function closure h xeVE R ieven if the

k

op erator expression is of the the form f at Consequentlya

k

single rule for function application suces

Finally the pushing and p opping of the region stack is seen in Rule

s V E R e v s Expressions

Rr o Domsr

s V E R c at ros fro cg

VEx v

s V E x v s

VEf a sa h xeVE R i

k

r R o Domsr sv hx e V E R f R i k gi

i

i

at ros fro sv g s VE R f

k

r R o Domsr

s VE R xe at ros fro hx e V E R ig

s V E R e a s s a hx e VE R i

s VE R e v s s VE fx v gR e v s

s V E R e e v s

s VE R e v s s VE fx v gR e v s

s V E R let x e in e v s

r R o Domsr VE VE ff rog

s fro h xe VE RigVE R e v s

k

s VE R letrec f xat e in e v s

k

r Doms s fr fggVER f r ge v s

s V E R letregion in e v s nnfr g

Wenow illustrate the use of the rules bytwo examples commentonthe

design decisions emb o died in the rules and nally prove some prop erties ab out

the semantics

Example Function Values

Let us consider the evaluation of the expression e from Section Since

and o ccur free in e they must b e allo cated b efore the evaluation of e

b egins We show three snapshots from the evaluation of e namely a just after

the closure has b een allo cated b just b efore the closure is applied and c at

the end we assume six regions with names r r which b ecome b ound to

resp ectively Notice the dangling but harmless p ointer at b



hy x yat fx g f r gi

r r r r r r

a



hy x yat fx g f r gi

r r r r r

b



r r r

c

Example Region p olymorphism

This example illustrates region p olymorphism and the use of p olymorphic re

cursion Consider the following source expression which computes the th Fi

b onacci numb er

letrec fibx if x then

else if x then

else fibxfibx

in fib end

The corresp onding target expression is shown in Figure In the target

expression the fib function takes two arguments namely which is the region

where x is lo cated and which is the place where fib is supp osed to put

its result Due to the presense of p olymorphic recursion in the region inference

system the recursive calls of fib use regions dierent from and and the

two recursive calls use separate regions For example the rst call rst reserves

space for the result of the call then reserves space for the actual argument

then creates the actual argument p erforms the call deallo cates the actual

argument and uses the result till it can b e discarded after the

letregion

in letrec fib at x int

if

then at

else if then at

else letregion

in letregion

in fib at

letregion inxat at end

end

letregion

at in fib

letregion in x at at end

end at

end

in letregion

in fib at

at

end

end

end

Figure The Fib onacci function annotated with regions The result will b e a

single integer in

The letrec stores the following cyclic region function closure in the store at

some new address a

h x if ffib ag f r r gi

Assuming that is b ound to r the application of fib to near the end of

the program stores the following function closure in the region denoted by

hx if ffib ag f r r r r gi

We see that region inference has pro duced allo cations and deallo cations very

similar to those of a traditional stackbased implementation Indeed the maximal

memory usage in this example is prop ortional to the maximumdepthofthe

recursion as it would b e in a pure stack discipline

Design Choices

The regionbased semantics relies on a numb er of design choices some of which

are crucial

First it is crucial that the sets RegName and OSet can b e any denumerable

sets We do not assume that these sets are ordered or that there is anynotionof

address lo calityThus no particular physical implementation of the region stack

is built into the theory This is essential since real computers have a at address

space whereas the region stack conceptually is twodimensional The particular

implementation choice used in the ML Kit is describ ed in Birkedal et al

Second it is crucial that the semantics uses socalled at environments the

alternative linked environments is to represent the environmentasa linked

list of environment frames This is a p opular representation in blo ckstructured

languages and in some functional languages With linked environments closure

creation is cheap but it do es not work with regions at least if the environment

frames are intersp ersed with regions on one stack In example it is essential

that we copytheenvironmentinto the closure for y xyat so that

and hence the binding for x is not destroyed when weleave the scop e of x and

p op the stack

There are also some inessential choices There is no need to representall

ob jects b oxed in the ML Kit integers and other values that t in one machine

word are represented unboxed Recursion could probably have b een implemented

using unfolding of closures rather than cycles in the store Finally there is no

deep need to keep the region environment and the variable environment separate

in closures the ML Kit merges the two but wedosotomake it clear that region

names are not values

Prop erties of regionbased evaluation

We can now state formally that the complete evaluation of an expression do es

not decrease the store For arbitrary nite maps f and f wesaythatf

extends f written f f if Domf Domf and for all x Domf

f x f x Wethensaythat s succ eeds s written s w s or s v s if

Doms Doms and s r s r for all r Doms

Lemma If s VE R e v s then Doms Doms and s v s

The pro of is a straightforward induction on the depth of inference of s V E RE

e v s The formula Doms Doms in Lemma expresses that the store

resulting from the elab oration has neither more nor less regions than the store

in which the evaluation b egins although other regions mayhave b een allo cated

temp orarily during the evaluation The evaluation of e may write values in ex

isting regions so it is p ossible to have sr s r for some r However e never

removes or overwrites any of the values that are in s

Syntactic Equality of Expressions

Let e b e a target expression The set of program variables that o ccur free in e

is written fpve The set of region variables that o ccur free in e is frve

Both in the source language and in the target language we shall consider

two expressions equal if they can b e obtained from each other by renaming

of b ound variables This extends to closures For example hx e VE i and

hx e VE i are considered equal if VE VE and x e and x e are

equal in the ab ove sense Moreover weeven allow that the free variables of

x e may b e a renaming of the free variables of x e provided of course that

the corresp onding change has b een made in the domain of VE to obtain VE

Lo osely sp eaking this corresp onds to admitting value environments as declara

tions and then allowing the usual renamings p ermitted in an expression of the

form let VE in x e end Finallywe consider hx e V E i and hx e V E i

equal if VE fpvxe VE fpvxe This allows us to intro duce and

delete unused program variables in the domains of environments inside closures

Similarly for any region closure h x e V E R i we allo w the renamings of x

fpve and frve and the intro duction or elimination of unused program variables

that one would exp ect if the closure were written let VE R in x e end

Equality on semantic ob jects in each of the two dynamic semantics is then

dened to b e the smallest equivalence relation which is closed under the three

transformations describ ed ab ove

Region inference

The rules that sp ecify which translations are legal are called the region inference

rules In Section we present region typ es and other semantic ob jects that o ccur

in the region inference rules the rules themselves are presented in Section In

Sections and we state and prove prop erties of the region inference system

for example that the translation is a renement of Milners typ e discipline

Semantic ob jects

Region typ es

We assume three denumerably innite pairwise disjoint sets

TyVar typ e variables

or p RegVar region variables

EectVar eect variables

Toavoid to o many subscripts and primes we use b oth p for place and to

range over region variables An atomic eect is a term of the form

put j get j atomic eect

We use to range over atomic eects An eect is a nite set of atomic eects

We use to range over eects For a concrete example the eect of expression

e in Example is fput put put g

Typ es and typ es with places are given by

int j j typ e

typ e with place

In a function typ e

the ob ject is called an arrow eectFormally an arrow eect is a pair of an

eect variable and an eect we refer to and as the hand le and the latent

eect resp ectively If a function f has typ e then the latent eect is to b e

interpreted as the eect of evaluating the b o dy of f Eect variables are useful

for expressing dep endencies b etween eects For example the target expression

e fxfx at at

can b e given typ e

fput g

 

e

fget g

 

In the last o ccurrence of indicates that for all e and e of the appropriate

typ e if e evaluates to some function g and e evaluates to some value v then

the evaluation of e e e ma yinvolve an application of g As it happ ens the

evaluation would indeed involve an application of g but the typ e do es not express

that

Equalityoftyp es is dened by term equality as usual but up to set equal

ity of latent eects For example the arrow eects fput get g and

fget putg are considered equal

One mightwonder whywehave a pair on the function arrow rather

than just say an eect The reason is that the region inference algorithms

we use rely on unication just as ML typ e inference do es Damas and Milner

Thus the eect sets on function arrows p ose a problem for the existence

of principal uniers A solution is to use arrow eects together with certain

invariants ab out the use of eect variables The basic idea is that eect variables

uniquely stand for eects if and b oth o ccur in a pro of tree formed

by the inference algorithm and then it will also b e the case that

Moreover if twoarrow eects and b oth o ccur in a pro of tree and

then the presence of in implies that subsumes the

entire eect which stands for With these representation invariants and

using the sp ecial notion of substitution dened b elow one can prove the existence

of principal uniers even though typ es contain eects which are sets A

detailed accountofhow this is done is b eyond the scop e of this pap er Also the

invariants mentioned ab ove are not needed for proving the soundness of region

inference so we shall not consider them in what follows

Substitution

A type substitution is a map from typ e variables to typ es weuse S to range

t

over typ e substitutions A region substitution is a map from region variables to

region variables weuseS to range over region substitutions An eect substi

r

tution is a map from eect variables to arrow eects we use S to range over

e

eect substitutions A substitution is a triple S S S weuseS to range over

t r e

substitutions Substitution on typ es region variables and eects is dened as

follows Let S S S S then

t r e

Eects

S fputS j put g

r

fgetS j get g

r

f j S f g g

e

Typ es and Region Variables

S intint S S S S

t r

S S S

S

where S S S S

e

For a concrete example consider the substitution S S S S where

r t e

fget put g if

S

e

otherwise

int if or

S

t

otherwise

S for all

r

refer to Nowwehave where and

fget put g fput g

   

int int

S

e

fget get put g

   

int int

This more sp ecic typ e for e is appropriate if e o ccurs in the following application

expression

e n int n at at

for which one will then b e able to infer the typ e and place

fget get put g

   

int int

In applying substitutions to semantic ob jects with b ound names eg a typ e

scheme b ound variables are rst renamed to avoid capture when necessary

Substitutions comp ose Id is the identity substitution

The support of a typ e substitution S written SuppS is the set f

t t

TyVar j S g Similarly for region substitutions The support of an

t

eect substitution S written SuppS is the set f EectVar j S g

e e e

The supp ort of a substitution S S S S written Supp S is dened as

t r e

SuppS Supp S SuppS Whenever S S and S are nite maps of

t r e t r e

the appropriate typ es we take the lib erty to consider the triple S S S a

t r e

substitution without explicitly extending the nite maps to total maps

Typ e schemes

Typ e schemes resemble the typ e schemes of Damas and Milner but with

additional quantication over region variables and eect variables

simple typ e scheme

j comp ound typ e scheme

k n m

where n k andm The following denitions are stated for comp ound

typ e schemes but are easily extended to simple typ e schemes For a typ e scheme

thebound variables of written bv is the set

k n m

f g

k n m

We sometimes write the sequences of b ound variables as vectors and

resp ectivelyTwotyp e schemes are equivalent if they can b e obtained from each

other by renaming and reordering of b ound variables A typ e is an instanceof

written if there exists a substitution S such that SuppS bv and

S When wewant to make S explicit wesay that is an instance of

yp e schemes have the same instances via S written via S Equivalentt

We sometimes write as a shorthand for the simple typ e scheme not

to b e confused with the comp ound typ e scheme since comp ound typ e

schemes have a sp ecial signicance they are used exclusively as typ es of region

p olymorphic functions even for those regionp olymorphic functions that takean

empty list of actual region parameters The underlining serves to makeitclear

whether a typ e scheme is to b e regarded as simple or comp ound

TyVar

p or RegVar

EectVar

Eect FinAtomicEect

AtomicEect EectVar PutEect GetEect

put PutEect RegVar

get GetEect RegVar

Typ e TyVar ConsTyp e FunTyp e

SimpleTyp eScheme Typ e

Comp oundTyp eScheme

m

k n

RegVar TyVar EectVar Typ e

k n m

Typ eScheme SimpleTyp eScheme Comp oundTyp eScheme

ConsTyp e fintg

FunTyp e Typ eWithPlace ArrowEect Typ eWithPlace

ArrowEect EectVar Eect

Typ eWithPlace Typ e RegVar

n

TE TyEnvVar Typ eScheme RegVar

Figure Semantic ob jects of region inference

A type environment is a nite map from program variables to pairs of the

form We use TE to range over typ e environments

The semantic ob jects are summarised in Figure The notion of free variables

extend to larger semantic ob jects suchastyp e environments For example a

typ e variable is said to o ccur free in TE if it o ccurs free in TE x for some x

frv A denotes the set of region variables that o ccur For any semantic ob ject A

free in A ftvA denotes the set of typ e variables that o ccur free in AfevA

denotes the set of eect variables that o ccur free in A and fv A denotes the

union of the ab ove

The inference system

The inference rules allow the inference of statements of the form

TE e e

read in TE e translates to e which has type and place and eect The region

inference rules are nondeterministic given TE and e there may b e innitely

many e and satisfying TE e e This nondeterminism is

convenient to express typ ep olymorphism but we also use it to express freedom

in the choice of region variables Indeed the region inference rules allow one to

put all values in a single region although in practice this would b e the worst

p ossible choice

Regionbased translation of expressions TE e e

TE c c at int fputg

TE x

TE x x

TE f

k

via S fget putg

TE f f S S at

k

TE fx ge e

frve frv TE

TE xe xe at fputg

TE e e TE e e

TE e e e e f getg

TE e e

TE fx ge e

TE let x e in e end let x e in e end

gxe xe TE ff at

fv fvTE

ge e TE ff

TE letrec f xe in e end

letrec f xat e in e end

TE e e frv TE

TE e letregion in e end nfput getg

TE e e fevTE

TE e e nfg

In Rule note that the eect of referring to x is empty this is b ecause the

eects only relate to access of the region store s not the environments VE and

R In Rule the instances of the b ound region variables b ecome actual region

parameters in the target expression The resulting eect includes get and

put for we access the region closure in and create an ordinary function

closure in

In rule the eect of creating the function closure at region is simply

fputpg Following Talpin and Jouvelot one is allowed to makethe

information ab out the function less precise by increasing the latent eect This

is useful in cases where two expressions must have the same functional typ e

including the latent eects on the arrows but mayevaluate to dierent closures

The freedom to increase eects is also useful when one wants to prove that every

welltyp ed Expprogram of Milner can b e translated with the region

inference rules see Lemma b elow We shall explain the sidecondition

frve frvTE in a moment

In Rule we see that the latent eect is brought out when the function is

applied The get in the resulting eect is due to the fact that wemust access

the closure at in order to p erform the function application

In Rule note that f is regionp olymorphic but not typ ep olymorphic

inside e its own bodyIne however f is p olymorphic in typ es regions and

eects Without the limitation on typ ep olymorphism inside e region inference

would not b e decidable

Rule concerns the intro duction of letregion expressions The basic idea

which go es back to early work on eect systems Lucassen et al is this

Supp ose TE e e and assume that is a region variable whichdoes

not o ccur free in TE or in typically o ccurs free in indicating that is

used in the computation of e Then is purely local to the evaluation of e in

the sense that the rest of the computation wil l not access any value storedin

Example

Once again consider the expression e from Section Let e b e the sub expression

e let x at at at

x y at at in y

end

The typ e environment in force when this expression is pro duced is TE fg the

typ e and place of e is

fget put g



int int int

and the eect of e is fput put put put g Note that

is the only region variable which o ccurs free in but o ccurs free neither

in TE nor in Rule allows us to discharge resulting in the eect

fput put put g and the letregion in e ut

Next Rule allows one to discharge an eect variable from the eect no

letregion is intro duced since the discharge do es not inuence evaluation

Weowe an explanation for the sidecondition frve frvTE in Rule

Often but not always it is the case that every region variable which o ccur free

in a translated expression o ccurs free either in the typ e or in the eect of the

expression Here is an example where this fails to hold

fg f x f at at x at at int

where fput put get put g Here we see that is free in

the target expression but o ccurs free neither in the eect nor in the resulting typ e

and place The reason is that at is dead co de it can never b e executed

In Rule we demand that there b e no such free region variables in the b o dy of

the lamb da abstraction This sidecondition can always b e satised by applying

Rule rep eatedly if necessary just b efore applying rule However the side

condition simplies the soundness pro of sp ecically the pro of of Lemma

As mentioned earlier the region inference rules give rise to a static semantics

for the target language one just consistency replaces sentences of the TE e

e by TE e However we prefer the present form which emphasises

the the rules sp ecify a translation

Region inference is a renement of Milners typ e sys

tem

In this section we prove that the region inference system is a renement of Milners

typ e discipline Milner in the sense that an expression can b e translated

with the region rules if and only if it is welltyp ed according to Milners typ e

discipline as dened in Section In particular this shows that the problem

of determining whether a closed expression can b e regionannotated is decidable

We rst show that an expression can b e translated only if it is welltyp ed To

this end we dene a function for pro jection from semantic ob jects in the

region rules to the semantic ob jects in the Milner rules

intint

TE TE

Lemma If TE e e then TE e

The pro of is a straightforward induction on the depth of TE e e ut

e Next we show that every welltyp ed term can b e translated To this end w

dene a relation Rbetween Milners ob jects and ours Let b e some xed

region variable and let b e some xed eect variable The basic idea is to

cho ose everywhere we need a region variable in the translation and to cho ose

fget put g everywhere we need an arrow eect in the translation

Unfortunatelywe cannot simply make R a map b ecause of the distinction b e

tween simple and comp ound typ e schemes So we dene R inductively as follows

R R

fget put g

   

R int R int

R

R R R R

R R R R

DomTE DomTE x DomTE TE x RTE x

TERTE

Clearly for every TE there exists a TE such that TERTE

Lemma If TE e and TERTE then TE e e for

some e and which satisfy R frv f g frve f g and

fget put g

Pro of By induction on the depth of inference of TE e We show only t wo

cases as the rest are straightforward

e x By assumption wehave TE x and SinceTERTE we then

have TE x for some which satises RNow may b e simple

or comp ound but if it is comp ound it has no quantied region variables Let

b e the unique typ e with place satisfying RThen and

the desired conclusion follows either by Rule or by Rule

e xe Here for some and and TE xe must

have b een inferred from the premise TE fx ge Wehave

TE fx g R TE fx g where is the unique typ e with place

related to By induction there exist e and such that TE fx g

e f g and fget put g frv f g frve e

Now Rule conveniently allows us to use this inclusion to prove TE xe

fget put g

   

fput g from which the desired re xe at

sults follows ut

Substitution lemma

Lemma For al l substitutions S ifTE e e then S TE e

S e S S

The pro of is a straightforward induction on the depth of the inference of TE

e e using appropriate variants of S in the case for letrec

Next we shall state a lemma to the eect that the op eration of making typ e

schemes in the typ e environment more typ ep olymorphic do es not decrease the set

of p ossible translations Formallywesay that is at least as typ ep olymorphic

as written w if and are identical or and are b oth comp ound

and for some Furthermore we write TE w TE if DomTE

if TE x and TE x DomTE and for all x DomTE

then w and

Lemma If TE e e and TE w TE then TE e e

We omit the pro of which is a straightforward induction on the depth of

inference of TE e e We note however that the similar statement

concerning region p olymorphism replacing by is not

true b ecause applications of region functions in the target expression can b e

aected bysucha change

Fortunately it is precisely the ability to make assumed typ e schemes more

typ ep olymorphic that we need

Using Eects to Describ e Continuations

For the pro of of the soundness of the translation scheme we need to relate the

values of the dynamic semantics of the source and target language We refer to

this relation as the consistency relation

Since all values are addresses in the target language semantics the consistency

relation must involve stores Consistency also naturally dep ends on typ es at typ e

int source level integers can only b e consistent with p ointers to integers in the

target at a functional typ e only closures can b e related and so on The region

inference rules yield expressions typ es with places and eects all of whichcan

contain free o ccurrences of region variables To relate these region variables to the

region names whichidentify regions at runtime we need a region environment

R and the following denition

Denition Aregion environment R connects eect to store siffrv

s DomR and for al l frv R Dom

Based on these considerations assume that wehave dened consistency as a

relation

RegEnv Typ eWithPlace Val Store TargetVal

C

where R v s v is read in region environment R and store s source value

C

v is consistent with target value v at type with place The obvious idea

would now b e somehow to lift this relation rst from typ es with places to typ e

schemes R vsv and then bypointwise extension to environments

C

R TE E s VE Wemight then try to prove the following statement

C

Conjecture If TE e e and E e v and R TE esVE

C

and R connects to s then there exists a store s and a target value v such that

s VE R e v s and R v s v

C

However there is a problem with this conjecture Informally it states that con

sistency is preserved byevaluation Unfortunatelywe cannot exp ect that to

hold To see what the problem is consider example once more According

to the conjecture at p oint b we should have that the source language closure

hy x y fx gi and the closure found in region r are consistent

In a sense they are consistent application of the two closures map consistent

arguments to consistent results But notice that the consistency which used to

exist b etween the source environment fx g and its representation in the

target semantics was partly destroyed when the region r was p opp ed from the

region stack Thus we see that intuitively sp eaking consistency gradually de

teriorates during computation The saving factor it turns out is that there is

always enough consistency left for the rest of the computation to succeed without

running into any of the inconsistencies

To make these intuitions precise we need some notion of consistency with

resp ect to the rest of the computation One p ossibilityistowork explicitly

with continuations or evaluation contexts However wehave not explored this

p ossibility since all we need for the purp ose of the soundness pro of is a very

simple summary of which regions are accessed by the rest of the computation

Sp ecically it suces to summarise the rest of the computation by an eect

which describ es which of the currently existing regions are accessed by the rest

of the computation Thus we dene a relation

RegEnv Typ eWithPlace Val Store TargetVal Eect

C

where R v s v also written R v s v wrt is read at type

C C

with place inregion environment R and store s source value v is consistent

with target value v with respect to the eect where represents the eect of

g the rest of the computation In our example is fput get put

connected via the region environment to regions r r and r The fact that the

rest of the computation do es not access the current contents of r is evident from

the fact that no region variable free in is connected to r Thatiswhy the

environments in the two closures are consistent with resp ect to the rest of the

computation The second version of our conjecture b ecomes

Conjecture If TE e e and E e v and R TE esVEwrt

C

and R connects to s then there exists a store s and a target value

v such that s VE R e v s and R v s v wrt

C

In other words if we start out with consistency to cover b oth the evaluation of

e whose eect is and the rest of the computation whose eect is then

after the computation of e we will have enough consistency left for the rest of

the computation

However Conjecture is not quite strong enough to b e proved by induction

Consider a source language closure hx e E i and a target closure hx e VERi

whichwe think of as representing hx e E i When the source closure is applied

the b o dy e will b e evaluated in an environment E fx v gwherev is the

argument to the function Assuming that v is some target value consistent with

v the corresp onding evaluation in the target language takes the form s V E

fx v gR e However the region environment in which e is evaluated

is not necessarily the same as the region environment R which is in force at

the p oint where the application takes place for more regions mayhavebeen

allo cated since the closure was created Moreover R is imp ortant for establishing

that E fx v g and VE fx v g are consistent since v and v will b e

known to b e consistentinR not in R And wemust establish consistency of

E fx v g and VE fx v g in order to use induction to prove that the

results of the function applications are consistent

Example

Consider the target expression

letregion

inletxat

in letregion

in let f y xy at at

in letregion

in fat

end

end

end

end

end

Consider the p oint of the evaluation just after the closure for f has b een

created Let us say that the region environmentis R f r r

r g Then the store is

o gR ig s fr fg r fo g r fo hy xy at fx r

x x f

We can reasonable exp ect to have

R fx int g fx gs fx r o g wrt

C

x

where fget get put gwhich is the neteect of the remainder

of the computation at that p oint Exp ect b ecause wehave not dened yet

C

Next consider the p oint where the actual argumentto f has b een stored the

closure for f has b een fetched and we are just ab out to evaluate the b o dy of f

Now the region environment has b ecome R R f r g the store has

b ecome s s fr fo gg and we can reasonably exp ect to have

R int s r o wrt

C

where fget get put g ie the eect of the continuation at that

p oint From and we can reasonably exp ect to obtain

R fx int y int g

C

fx y gs fx r o y r o g wrt

x

But evaluation of the function b o dy is going to take place in R see rule Thus

the theorem needs to b e strong enough to handle the situation that the region

environment in which consistency is established is not the same as the region

environment in which the expression is evaluated Incidentally this is similar to

the situation in blo ckstructured languages where an an inner blo ck can call a

function declared in an enclosing blo ck Indeed it app ears that although the

variable environments do not ob ey a stack discipline the region environments

do ut

We therefore prove that the theorem holds not just for R but also for other

region environments R which agree with R

Denition Let R and R beregion environments and let beaneect We

say that R and R agree on if R frvR frv

Wearenow able to state the main theorem whichwe shall prove once we

have dened the consistency relation

and R TE E s VE wrt and Theorem If TE e e

C

E e v and R connects to s and R and R agreeon and

frve Dom R then there exist s and v such that s V E R e v s and

R vs v wrt

C

The premise frve Dom R is included only to make the pro of simpler it

helps to ensure that closures in the target language will not contain free region

variables

Note that we use the eect of the rest of the computation as an approximation

to what data is live the notion usually employed b y garbage collectors namely

that data is live if it is reachable in the memory graph is incomparable wehave

already seen that data whichisreachable in the memory graph is actually dead

and can b e deallo cated using region inference conversely sometimes data which

wekeep alive in a region is not actually used by the rest of the computation and

a garbage collector would detect it

Consistency

For simplicitywe rst present the consistency relation in the form of inference

rules without reference to the underlying mathematics We shall later explain

that the rules can b e viewed as describing a maximal xed p oint of a certain

monotonic op erator For now it suces to read the rules as follows the conclu

sion of a rule holds if and only if the premises hold

Rules characterize consistency b etween source values and storable tar

get values sv dened in Section These rules are used in Rules and

to characterize consistency b etween source and target values recall that target

values are addresses It is precisely in rules Rule and we see the signi

cance of the idea of representing the rest of the computation by the eect if

get then any claim ab out consistency of values at region is allowed

for then denotes garbage However by rule if v ro Pdoms

alue stored at address v has to b e consistentwiththe and r R then the v

source value v as describ ed by rules and Recall that ro Pdoms

abbreviates r Doms o Domsr Rule says that consistency of

environments is the p ointwise extension of consistency of values

Rule should b e straightforward In rule note that TE do es not o c

cur in the conclusion of the rule one has to invent a TE which can justify

the target expression as a compilation result of the source expression Also the

environments E and VE must b e consistentat TE The region environment R

may b e regarded as the region environment which is in force when the closures

are applied as wesaw earlier this is not necessarily the same as the region en

vironment whichwas in force when the target closure was created R in the

rule For the purp ose of the soundness theorem we clearly need to knowthat

R and R are related somehow and it turns out that it suces to require that

they agree on The condition frv e R ensures that the target closure con

tains no free region variables the two rst premises of the rule already ensure

that fpve DomVE ie that the closure contains no free program vari

ables Again this is go o d hygiene which is useful in the pro ofs sp ecically of

Lemma

Rule is similar to Rule but deals with recursion For the premises to

b e satised TE mush have f in its domain Moreover since recursion is handled

by unfolding in the source language semantics it is E ff hx e E f ig and

VE that have to b e consistent rather than just E and VE

Rule is similar to Rule but it relates recursive closures and region

function closures at comp ound typ e schemes For simple typ e schemes one uses

Rule together with rules

Typ es and Storable Values R v s sv wrt

C

i Int

R intisiwrt

C

TE xe xe at fputg

R TE E s VE wrt

C

R and R agree on frve DomR

R hx e E ishx e VER i wrt

C

TE xe xe at fputg

R TE E ff hx e E f igsVE wrt

C

R and R agree on frve DomR

R hx e E f ishx e VE R i wrt

C

Typ e Schemes and Storable Values R vssv wrt

C

TE ff gxe xe at fputg

bv fv TE

k n m

R and R agree on frve DomR f g

k

R TE ff gE ff hx e E f igsVE wrt

C

R hx e E f ish xe VE R i wrt

C

k

R vssvwrt

C

R vssvwrt

C

Typ e Schemes and addresses R vsv wrt

C

v ro Rr v Pdoms R vssv wrt

C

R vsv wrt

C

get

R vsv wrt

C

Environments R TE E s VE wrt

C

Dom TE DomE DomVE

x Dom TE R TE xExsVEx wrt

C

R TE E s VE wrt

C

The relation is dened as the maximal xed p oint of an op erator F C

C P

C where means p owerset and C is dened by

P P

C RegEnv Typ eWithPlace Val Store StoreVal Eect

RegEnv Typ eScheme RegVar Val Store StoreVal Eect

RegEnv Typ eScheme RegVar Val Store TargetVal Eect

RegEnv TyEnv Env Store TargetEnv Eect

The memb ers of C are referred to as consistency claims Weuse to range

over claims and to range over sets of claims For example a claim of the form

R vssv is read it is claimed that storable value sv is consistent

with source value v and has typ e scheme and resides at in the store s and

region environment R with resp ect to eect

Note that C is a complete lattice Wenow dene an op erator

P F

C C The denition is expressed using the syntax of inference rules

P P

but it could equally well b e expressed as a nonrecursive denition by cases for

given C is dened as the unique set f C j can b e inferred

F F

Since the rules a very similar to rules we by one of the inference rules g

shall not explain them further

Typ es and Storable Values R v s sv

F

i Int

R intisi

F

TE xe xe at fputg

R TE E s VE

R and R agree on frve DomR

R hx e E ishx e VER i

F

TE xe xe at fputg

R TE E ff hx e E f igsVE

R and R agree on frve DomR

R hx e E f ishx e VER i

F

Typ e Schemes and Storable Values R vssv

F

TE ff g xe xe at fputg

bv fvTE

k n m

R and R agree on frve DomR f g

k

R TE ff gE ff hx e E f igsVE

R hx e E f ish xe VE R i

F

k

R vssv

R vssv

F

R vsv Typ e Schemes and addresses

F

v ro Rr v Pdoms R vssv

R vsv

F

get

R vsv

F

Environments R TE E sVE

F

Dom TE DomE DomVE

x Dom TER TE xExsVEx

R TE E s VE

F

The op erator is monotonic implies Thus byTarskis

F F F

xed p oint theorem there exists a greatest xed p oint for and this greatest

F

xed p oint is also the greatest set satisfying Let b e this greatest

F

xed p oint

Denition We take to be and we write for example R v s v wrt

C C

to mean R v s v

C

We use coinduction to prove prop erties of the consistency relation to provethat

a set of claims is consistent ie that it suces to prove

F

Prop erties of Consistency

In this section we prove imp ortant lemmas ab out the consistency relation

C

Besides b eing useful in the pro of of the main theorem Theorem they address

issues suchaswhy it is safe to reuse a deallo cated region even when there are dead

p ointers into it The lemmas will b e proved using a sp ecial style of coinductive

pro of whichwe call rulebased coinduction

Rulebased Coinduction

Rulebased coinductive pro of is a style of pro of whichmakes it p ossible to present

a coinductive pro of in a form which resembles ordinary induction on depth of

inference The scenario is that a set C is given together with an op erator

C C which is monotonic with resp ect to set inclusion is dened

F P P F

by a nite set of inference rules in our case Rules Let b e the maximal

f C j gNow consider a lemma which xed p ointof

F F

states that for some given relation R C C

C if and R then

Let f C j R gWe refer formally to the memb ers of

R

as the consequences of the lemma Then can b e stated By the

R R

principle of coinduction it suces to prove ie that

F

R R

C if there exists suchthat R then

F

R

Thus the coinductive pro of can b e organised as follows takeany C Let

b e such that RShow ie show that can be inferred

F

R

by the inference rules that dene using only premises which are themselves

F

consequences of the lemma Often this is proved by a case analysis on note

not since implies that can b e inferred by an application of one of

the rules that dene from premises which are themselves in Note that

F

proving is equivalent to inferring using the xedp oint rules

F

R

for in our case Rules and only using premises which are themselves

F

i

consequences of the lemma ie i R Thus we can word the

i i

i

coinductive pro of almost as if it were a normal inductive pro of on the depth of

inference related to mininal xed p oints using the xed p oint rules for rather

F

than the rules that dene

F

We name this style of coinductive pro of rulebasedcoinductionWe empha

sise that a rulebased coinductive pro of is not a pro of on depth of inference

for the coinductive pro of establishes claims that are not conclusions of any

nite pro of tree constructed by the xed p oint rules

Preservation of consistency

The rst lemma states that consistency is preserved under decreasing eect and

increasing store This is to b e exp ected it is easier to obtain consistency with

resp ect to an observer if the observer observes a little rather than a lot and the

larger the store is the easier it is for it to contain bits of target values which are

consistent with a given source value

Lemma If R v s v wrt and and s v s

C

then R v s v wrt

C

Lemma is a sp ecial case of the following lemma

Lemma If R vs v wrt and and R and R agreeon

C

and s Rng R frv v s then R vs v wrt Similarly

C

for the other forms of

C

Notice that the domain of s need not b e a subset of the domain of s for

Lemma to apply This is crucial in the pro of of the main theorem in the case

for letregion Here s will b e the store resulting from a computation which

involves lo cal regions s will b e the result of removing the lo cal regions from s

The region variables that are free in but not in will b e the variables of the

lo cal regions

Pro of We prove Lemma and the corresp onding statements concerning the

other forms of consistency by rulebased coinduction The cases for the inference

rules to are arranged according to judgement forms In all cases we

assume

R and R agree on

s Rng R frv v s

R v s sv wrt Typ es and Storable Values

C

Assume

R vs sv wrt

C

By the remarks in Section it suces to prove that R vs sv wrt can

C

b e inferred using Rules from premises which are themselves conclusions of

the lemma

Recall that Rules express that is a xedp ointof one has if

C F

and only if either the premises ie the formulae ab ove the line of Rule

hold or the premises of Rule hold or the premises of Rule hold We deal

with each case in turn

Rule Here int for some and v sv i for some i Int But

then R vs svwrt by Rule

C

Rule Here there exist TE x e E e VE R such that is inferred

from premises

TE xe xe at fputg

R TE E s VEwrt

C

R and R agree on frv e DomR

and v hx e E i and sv hx e VER i But then by and

wehave

R and R agree on

Obviously R agrees with itself on and by and s Rng R

frv v s Thus using also and wehave that the claim

R TE E s VEwrt

C

is a consequence of the lemma Thus by Rule on and wehave

R vs sv wrt as desired since is a consequence of the lemma

C

Rule Similar to the previous case

Typ e Schemes and Storable Values R vssv wrt

C

Assume R vs sv wrt which can b e inferred byRuleorby

C

Rule The case for Rule is similar to the case for Rule So consider the

case for Rule Here takes the form and wehave R vs svwrt

C

Thus the claim R vs svwrt is a conseqence of the lemma

C

But then by Rule wehave R vs sv wrt as required since the

C

premise used ie R vs svwrt is a consequence of the lemma

C

Typ e Schemes and addresses R vsv wrt

C

Assume

R vs v wrt

C

inferred by Rule or Rule Case analysis



Strictly sp eaking we should saywehave that the claim R TE E s VE isacon

 

sequence of the lemma but the chosen formulation seems easier to read so we adopt it

throughout

get Then get so by there exist r o such that v ro

and

R r

v Pdoms

R vs s v wrt

C

By on wehave

R r

Thus and give

v Pdoms and s v s v

By and wehave that the claim R vs s v wrt

C

is a consequence of the lemma ie by that the claim

R vs s v wrt

C

is a consequence of the lemma Thus Rule on and gives

R vs v wrt since the premise used is a consequences of the

C

lemma

get Then R vs v wrt by Rule

C

Environments R TE E s VE wrt

C

The case for Rule is straightforward

Region renaming

In order to prove that reuse of old regions is safe Lemma weshallwant

to rename region variables that o ccur free in some semantic ob ject A but do not

o ccur free in the eect of the rest of the computation to other region variables

that do not o ccur free in the eect of the rest of the computation Let S be a

r

region substitution The yield of S written YieldS is the set fS j

r r r

SuppS g

r

Denition Let A be a semantic object let be an eect and let S

S S S be a substitution We say that S is a region renaming of A with

t r e

and resp ect to if S frv A is injective SuppS YieldS frv

r r

SuppS Supp S

e t

It is not in general the case that R v s v wrt implies R S vsv wrt

C C

for all substitutions S the reason is that S might map region variables in the

set frv n frvtovariables that are free in thereby making consistency

harder to achieve However the following sp ecial case holds

Lemma If R v s v wrt and S is a region renaming of with

C

respect to then R S vsv wrt Similarly for the other consistency

C

judgement forms

Intuitively as far as is concerned a region variable frv n frv denotes

a garbage region which is no dierent from any other garbage region

Pro of By rulebased coinduction on R v s v wrt and the other con

C

sistency judgement forms The cases are ordered according to judgement forms

Typ es and Storable Values R v s sv wrt

C

Assume that S is a region renaming of with resp ect to and that

R v s svwrt

C

Now must b e the conclusion of one of the following rules

Rule By wehave int for some and v sv Int Thus

R S vssv wrt

C

Rule By there exist TE x e e R E and VE such that

TE xe xe at fputg

R TE E sVEwrt

C

R and R agree on frve DomR

v hx e E i sv hx e VE R i

o prove where E E The reason for intro ducing E will b ecome clear later T

R S vssv wrt we wish to nd TE R and e satisfying

C

TE xe xe at S S fputS g

R TE E sVEwrt

C

R and R agree on frve DomR

sv hx e VER i

and that the claim is itself a consequence of the lemma Comparing

to b e and a tempting idea is simply to apply S throughout taking e

S e However S is not necessarily a region renaming on TE sowould not

necessarily b e a consequence of the lemma

Therefore let f g frvTE n frv and let f g b e distinct

n

n

new region variables new in the sense that f g frv S Let

n

S S f j i ngletTE S TE and let e S e Then

i

i

S is a region renaming of TE with resp ect to Further R is dened

as follows Let DomR b e frve Since must have b een inferred by

Rule wehave frv e frv TE Thus S is injective on frve Then for

every region variable frv e there exists one and only one region v ariable

frve such that S Dene R tobe R By these denitions

hx e VER i and hx e VER i are equal By Lemma on and the fact

that S S we obtain as desired Notice that R and R agree on

since S is a region renaming with resp ect to Thus also holds Then

by Lemma on wehave R TE E sVEwrt But then since S

C

is a region renaming of TE with resp ect to wehave that the claim is

itself a consequence of the lemma as desired Finally Rule on gives

vssv wrt as desired R S

C

Rule Almost identical to the previous case use E E ff hx e E f ig

and v hx e E f i instead of E E and v hx e E i Conclude using Rule

instead of using Rule

R vssv wrt Typ e Schemes and Storable Values

C

Assume that S that S is a region renaming of with resp ect

to and that

R vssv wrt

C

Then is the conclusion of one of the following rules

Rule Then there exist TE f x e e VE

k n m

and R such that

TE ff g xe xe at fputg

and bv fvTE

k n m

R and R agree on frve DomR f g

k

R TE ff gE ff hx e E f igsVE wrt

C

v hx e E f i sv h xe VE R i

k

As in the previous two cases S is not necessarily a region renaming of TE

old old

g f g frvTE n frv ff g Let f

k

l

new new

Let f g b e distinct new region variables new in the sense that

l

new new old new old

f gfrv S Let S S fg f

l l

new

g fg Then

l

S is a region renaming on f gTE with resp ect to

k

Let TE S TE and let e S e By Lemma on wehave

TE ff S gxe xe at S fput g

where wehaveused S SinceS is the identityonevery typ e and eect

variable wehave

S S S S

k n m

Moreover

fS S g f g f g fvTE

k n m

since S is injectiveonfrvf gTE Dene R as follows Let DomR

k

frve nfS S gFrom and Rule we get frv e frvTE

k

ff g By for every e there exists a unique frve such

that S Let R R The closures h xe VER i and

k

hS S xe VE R i are now equal Moreover by R and R agree

k

on But then by wehave

agree on frv e DomR fS S g R and R

k

By Lemma on using that R and R agree on we get

R TE ff gE ff hx e E f igsVEwrt

C

Notice that S is a region renaming of TE ff g with resp ect to Thus

from get get that the claim

R TE ff S gE ff v gsVE wrt

C

is a consequence of the lemma By Rule on and we

have

R S p hx e E f ishS S xe VE R i wrt

C

k

which is the desired result

Rule By and Rule wehave that is simple and takes the form

and R vssv wrt Thus the claim R S vssv wrt is a

C C

consequence of the lemma Thus R S vssv wrt asdesired

C

The cases for the remaining rules Rules are straightforward ut

Region allo cation

Consistency is not in general preserved under increasing eects or decreasing

stores For example for all addresses awehave f rg int fgawrt

C

if but not if fgetg since the store is emptyYet there is one

p oint where we do need to increase eects namely in the case of the main pro of

concerning expressions of the form

e letregion in e end

Starting from an assumption of the form R TE E sVE wrt we wish to

C

extend s with a new region yielding s s fr fgg increase to

fput getg the get and put eects representing the eects of e on the

new region and still b e able to claim R f r gTE E s VE wrt

C

fput getg That this is p ossible is not trivial for the region r mayhave

b een in use earlier and there mayeven b e dead p ointers into the old region named

r However if we extend the observing eect with a region variable whichisnot

free in the typ e environment then consistency really is preserved

Lemma If R TE E s VE wrt and frvTE r Doms and

C

frv fg then R f r gTE E s fr fggVEwrt Similarly

C

for the other forms of

C

Pro of The pro of is by rulebased coinduction We assume

frv fg

r Doms

For brevity let s s fr fggWenowhave a case analysis with one case

for each of Rules to

Typ es and Storable Values R v s sv wrt

C

Assume

R vssvwrt

C

frv

Then is the conclusion of one of the following rules

Rule Here v sv i for some i Intand int Hence R f

C

r g vs sv wrt by Rule itself

Rule Here is inferred from premises

TE xe xe at fput g

R TE E s VE wrt

C

R and R agree on frve DomR

v hx e E i and sv hx e VE R i

Without loss of generalitywe can assume

frvTE

for if frvTE we can do the following Let b e a fresh region variable fresh

in the sense that frvTE Consider the substitution S f gBy

and Lemma on wehave

S TE xe xS e at fput g

Moreover S is a region renaming of TE with resp ect to so Lemma on

gives

R STE EsVEwrt

C

Let R b e the region environment dened as follows If DomR then let

R R Otherwise let R have domain DomR DomR nfgf g and

values

R if

R

R if

Let sv hx S e VER i Since frve DomR wehavethat sv and sv are

equal and frvS e DomR Also R and R agree on since neither

nor is free in Th us by Lemma on wehave

R STE EsVEwrt

C

Thus we can assume that holds

By and wehave that the claim

R f r gTE E s VEwrt

C

is itself a conclusion of the lemma Moreover from and wehave

R f r g and R f r g agree on

By Rule on and the fact that frve DomR f r g

we get

vs sv wrt R f r g

C

where sv hx e VER f r gi By and Rule wehave frve

frvTE soby and wehave frve Thus sv and sv are equal

thus is the desired result

Rule Similar to the previous case

Typ e Schemes and Storable Values R vssv wrt

C

Assume

R vssvwrt

C

frv

where must b e the conclusion of one of the following rules

Rule Here is comp ound and there exist TE f x e

k n

R and VE suchthat

m

TE ff gxe xe at fput g

bv fv TE

k n m

R and R agree on frve DomR f g

k

R TE ff gE ff hx e E f igsVEwrt

C

v hx e E f i and sv h xe VE R i

k

As in the case for Rule wemay assume

frvTE fx g

without loss of generality By and we get that the claim

R f r gTE ff gE ff hx e E f igs VE wrt

C

is a consequence of the lemma Let R R f r g and let R R f r g

By and wehave

and R agree on R

Thus by Rule on and wehave

h xe VE R i wrt R vs

C

k

From and Rule wehave frve frvTE ff g This with

gives that if frve then f gThus sv and h xe VE R i

k k

are equal so really is the desired result

Rule Here is simple Write in the form Then frv

by By and Rule wehave R vssv wrt But then

C

the claim R f r g vs sv wrt is a consequence of the

C

lemma Thus R f r g vs svwrt byRule

C

Typ e Schemes and addresses R vsv wrt

C

Assume

R vsv wrt

C

frv

Then is the conclusion of one of the following rules

Rule Here R r of v v Pdomsand

R vssv wrt

C

By wehaveR f r g R r of v Since r Domswe

have v Pdoms ands v sv By and wehave that the claim

vs s v wrt is a consequence of the lemma R f r g

C

Then by Rule wehave R f r g vs v wrt as desired

C

Rule Since get and and by wehave get

Thus R f r g vs v wrt by Rule itself

C

Environments R TE E s VE wrt

C

The case for Rule is straightforward ut

Lemma If R TE E s VEwrt then R TE E s VE wrt fg

C C

Similarly for the other forms of

C

Pro of Straightforward coinductive pro of

Recursion

The source and target languages handle recursion dierently The source language

unrolls a closure each time a recursive function is applied see Rule In the

target language a closure for a recursive function contains a p ointer back to itself

see Rule To prove the correctness of our translation wemust showthat

the two representations are consistent at the p oint where we create the cycle in

the store

Lemma If R TE E s VE wrt and is a compound type scheme

C

with bv fvTE andTE ff gxe xe at

fputg and R and R agreeon and frve DomR frv and

Rr and r Doms and o Domsr then

ff hx e E f ig R TE ff gE

C

s fro h x e VE R igVE wrt

where VE VE ff rog

Pro of Let TE TE ff g E E ff hx e E f ig VE

VE ff rog and s s fro h x e VE R ig By Lemma it

suces to prove

R TE E s VE wrt

C

The pro of is by coinduction Let

q R hx e E f is h x e VE R i

q R hx e E f is ro

q R TE E s VE

Let fq q q g and show We consider q q and q in turn

F

q Since q and with bv fvTE and TE ff

gxe xe at fputg and R agrees with itself on and

frve DomR frvwehave q by rule

F

q If get then q by Rule Assume get SinceR

F

and R agree on wehave R Rr Since also r Doms andq

wehave q byrule

F

q By Lemma on R TE E s VEwrt wehave R TE E s VEwrt

C C

Thus DomTE DomE DomVE and for every x DomTE wehave

R TE xExs VExwrt ie for x f R TE xE xs VE xwrt

C C

by Rule Since also q wehave q

F

Pro of of the Correctness of the Translation

This section is the pro of of Theorem The pro of is by depth of the derivation

of E e v each with an inner induction on the depth of inference of TE

e e There are cases one for each rule in the dynamic semantics of the

source language For each of these cases the inner induction consists of a base

case in which TE e e was inferred by one of the syntaxdirected rules

ie rules plus an inductive step where Rule or was applied It

turns out the the inner inductive steps are indep endentof esowe start out by

doing them once and for all Then we deal with each of the syntaxdirected

cases

In all the cases we assume

TE e e

R TE E s VEwrt

C

E e v

R connects to s

R and R agree on

frve Dom R

Inner inductive step a Rule was applied Assume takes the form

TE e letregion in e end

and is inferred by Rule from premises

TE e e

nfput getg

frv TE

By Lemma we can cho ose such that frv aswell as

Thus frvTE Let r b e an address satisfying r Doms Let

s fr fggThenby Lemma on we get R R f r g and s

R TE E s VEwrt

C

Let R R f r g By wehave

R connects to s

and by

R and R agree on

By wehave

frve Dom R

By the inner induction applied to and there

exist s and v suchthat

v s s VE R e

R vs v wrt

C

Let s s nnfr g Rule on gives

s VE R letregion in e end v s

Note that R and R agree on as frv Also s Rng R

frv v s by and Then by Lemma on we get

R vs v wrt as required

C

Inner inductive step b Rule was applied Assume is inferred byRule

on premises TE e e nfg and fevTE By Lemma

on weget R TE E s VE wrt Also R connects to s

C

R and R agree on and frve DomR Thus by the inner induction

there exist s and v such that s V E R e v s and R vs v wrt

C

as desired

The syntaxdirected cases

Constant Rule Since R connects fputg to s and R and R agree on

fputg wehavethat r R exists and r Doms Take o Domsr

By Rule we then have s VE R c at ros fro cg Letting v

e furthermore get R intvs v wrt ro and s s fro cg w

C

by and as desired

Variable Rule There are two cases dep ending on whether TE asso ciates a

simple or a comp ound typ e scheme with the variable We deal with each of these

in turn

Variable with simple typ e scheme Assume was inferred using Rule

Then e e x for some variable x Moreover TE x p for some p

and simple Let b e the typ e for which Then pand

The evaluation must have b een by Rule so wehave v E x

Let s s By and wehave x DomVE Thus letting v

VEx wehave s V E R x v s as desired By Rule on wehav e

R pvs v wrt ie R pvs v wrt as desired recall

C C

that we identify and

Variable with comp ound typ e scheme Assume was obtained by Rule

Then e isavariable f ande is of the form f S S atp and

k

p for some and

TE f f S S atp p

k

was inferred by application of Rule to the premises

TE f p

k

via S

fgetp putpg

Then must have b een inferred by Rule so wehave v E f By

and f DomTE wehave

wrt R p vsv

C

where v VEf Since getp the denition of rules and gives

C

v Pdoms and r of v Rp and v is a recursive closure

v hx e E f i

and sv h x e VE R i for some e VE and R Furthermore

k

there exist TE and such that

n m

R TE ff p gE ff v gsVE wrt

C

TE ff p g x e x e at p p fputp g

bv fvTE p

R and R agree on

frve Dom R f g

k

Without loss of generalitywe can assume that are chosen so as to

k

satisfy

f g frv

k

By and wehave R p Doms Let r R p Let o be an

oset not in Domsr Let v r o let R R f R S

i i

i k g and let sv hx e VE R i Notice that R S exists by Let

i

s s fr o svgItfollows from Rule that

s V E R f S S atp v s

k

as desired It remains to prove

R pvs v wrt

C

enow consult Rules concerning Ifgetp we are done But even W

C

if getp wehave v Pdoms andr of v r R p as required by

Rule It remains to prove

R pvs sv wrt

C

Let TE TE ff p g Since must have b een inferred by Rules

and we equally have

at p p fputpg x e TE x e

From and f gfrv weget

k

R and R agree on

From Lemma on we get

R TE E ff v gs VE wrt

C

From weget

Dom R frve

By Rule on and wehave R pvs hx e VE R iwrt

C

as desired

Lamb da abstraction Rule Assume was inferred by Rule then

takes the following form

at p fputpg TE xe xe

Moreover was inferred by Rule yielding

v hx e Ei

Since R connects to s wehave Rp Doms Let r Rpandleto be an

oset not in Domsr Let v roands s fv hx e VER igBy

wehave R pr Thus by rule wehave

at p v s s VE R xe

Notice that R TE E s VE wrt by Lemma and Also frve

C

Dom R by Thus by Rules and or by wehave R v s v wrt

C

as required

Application of nonrecursive closure Rule Here e e e for some e and e

and e e e for some e and e and was inferred by Rule on premises



p TE e e

TE e e

f getpg

Moreover was inferred by Rule on premises

v hx e E i E e v

E e v

E fx v ge v

Let f getpg ie the eect that remains after the com

putation of e Note that so from and we

get

R TE E s VEwrt

C

R connects to s

R and R agree on

Also from weget

frve Dom R frve Dom R

By induction on and there exist s and v

such that

s v s VE R e



R pv s v wrt

C

Thus by the denition of tells us that v Notice that getp

C

Pdoms and r of v R p and there exist e VE TE and R such that

s VE R i hx e v



TE x e x e at p p fput pg

R TE E s VE wrt

C

R and R agree on

frve Dom R

g ie the eect that remains after the computation of Let f getp

e By Lemma on wehave s v s Furthermore wehave

so by Lemma on wehave

R TE E s VEwrt

C

Also from and we get

R connects to s

R and R agree on

By induction on and there exist s and v

such that

v s s VER e

R v s v wrt

C

Let TE TE fx gNow must have b een inferred by Rules

and Thus there exists a suchthat and

TE e e

Wehave s v s by Lemma on By Lemma on and

we then have

E E s VE wrt R T

C

and by Lemma on and weget

wrt R v s v

C

Let E E fx v g and let VE VE fx v g Combining

and we get

wrt s VE E R TE

C

Also by and s v s weget

R connects to s

and by

R and R agree on

Then by induction on and there exist s

and v such that

v s s VE R e

R vs v wrt

C

From and we get s VE R e e v s as desired

Moreover by Lemma on and wehave R vs v wrt as

C

desired

Application of recursive closure Rule This case is similar to the previous

case but we include it for the sake of completeness Wehave e e e for some

e and e and e e e for some e and e and by Rule there exist p

and such that



p TE e e

TE e e

fgetpg

Also assume that was inferred by application of Rule on premises

E e v v hx e E fi

E e v

E ff v g fx v ge v

To use induction rst time we split the eect into where

fgetpg By and weha ve

R TE E s VEwrt

C

R connects to s

R and R agree on

Also by wehave

frve Dom R frve Dom R

By induction on and there exist v and s

such that

s VE R e v s



R pv s v wrt

C

Notice that getp Thus by and the rules for Rules and

C

wehave v Pdoms and r of v R p and there exist e VE TE and

R such that

s v hx e VE R i



TE x e x e at p p fputpg

wrt R TE E ff v gs VE

C

R and R agree on

frve Dom R

To use induction a second time we split the remaining eect into

where fgetpg Wehave s v s by Lemma Then by

Lemma on wehave

R TE E s VEwrt

C

Moreover and imply

R connects to s

R and R agree on

By induction on and there exist s and v

such that

s VER e v s

R v s v wrt

C

Let TE TE fx gNow must have b een inferred by Rules

and with and Thus there exists an eect

TE e e

By Lemma on and wehave

R TE E ff v gs VE wrt

C

Let E E ff v g fx v g and let VE VE fx v g From

and and wehave

wrt s VE E R TE

C

From weget

R and R agree on

By and s v s we get

connects to s R

By induction on and there exist s and v

such that

s VE R e v s

R vs v wrt

C

Rule on and gives s V E R e e v s as desired

Moreover Lemma on and gives the desired R vs v wrt

C

let expressions Rule Assume was inferred by Rule then

takes the form

TE let x e in e end let x e in e end

Moreover and must b e inferred by Rules and from the premises

TE e e p

TE fx p ge e

E e v

ge v E fx v

Let b e the eect that remains after the evaluation of e ie let

Note that soby and wehave

R TE E s VEwrt

C

R connects to s

R and R agree on

By wehave

Dom R Dom R frve frve

By induction on and there exist s and v

such that

s v s VE R e

R p v s v wrt

C

By Lemma on we get

R p v s v wrt

C

By Lemma on we get

R TE E s VEwrt

C

Combining these two we get

R TE fx p g g wrt E fx v gs VE fx v

C

By and and s v s wehave

R connects to s

R and R agree on

By induction on and there exist s and v

such that

s VE fx v gR e v s

R vs v wrt

C

ver by Rule on and Here is one of the desired results Moreo

we get the desired s V E R let x e in e end v s

letrecRule In this case takes the form

TE letrec f xe in e end

letrec f xat p e in e end

k

and is inferred by application of Rule to the premises

TE ff pgxe xe at p p

k

fv fvTE

TE ff pge e

where and Moreover was inferred by Rule on

k

the premise

E ff hx e Efig e v

Since must ha ve b een inferred by Rules and wehave fputpg

By and wehave R pRp Doms Let r Rp Let o be

an oset with o Domsr Let v r o Let VE VE ff v g and

let s s fv h xe VE R ig By Lemma on weha ve

k

that

TE ff pgxe xe at p p

Let TE TE ff pg and let E E ff hx e Efig By

wehave

frve Dom R f g frve Dom R

k

have By Lemma on and we

R TE E s VE wrt Thenby Lemma weget

C

R TE E s VE wrt

C

Also by and we get

R connects to s

R and R agree on

By induction on and there exist s and v

such that

v s s VE R e

R vs v wrt

C

From and Rule we get

s VE R letrec f xat p e in e end v s

k

Now and are the desired results

This concludes the pro of of Theorem

Algorithms

The algorithms used for implementing the region inference rules in the ML Kit

will not b e describ ed here We shall give a brief overview however First ordinary

ML typ e inference is p erformed using Milners algorithm W extended to all of

Core ML The output of this phase is an explicitly typ ed lamb da term e say

Then region inference is done in two phases First e is decorated with fresh region

and eect variables everywhere a region and eect variable will b e required in

an explicitly typ ed version the fully region annotated target expression This

phase is called spreading During spreading every recursive function f of typ e

ML ML

scheme sayisgiven the most general typ e scheme whichhas as its

pro jection in the sense of Section For example a letrecb ound int int

function will b e given typ e scheme int int The spreading

phase p erforms the unications suggested by the inference rules For example

the two o ccurrences of in Rule suggest a unication of the typ es and places

of op erator and op erand Spreading employs rules and as aggressively as

p ossible ie after every application of rules and The resulting

program call it e iswellannotated with regions except for the fact that the

typ e schemes assumed for recursive functions may b e to o general compared to

the typ e schemes that were inferred for the lamb da expressions which dene the

functions

The second phase is called xedpoint resolution and takes e as input For

each recursive function in e the region inference steps unication intro duc

tion of letregions etc are iterated using less and less general typ e schemes for

the recursive functions till a xed p oint is reached This is similar in spirit to

Mycrofts algorithms for full p olymorphic recursion Mycroft

It is p ossible to extend the notion of principal uniers for typ es to a notion of

principal unier for regionannotated typ es even though regionannotated typ es

contain eects This relies on invariants ab out arrow eects that were outlined

in Section One can prove that every twotyp es and that have the same

underlying ML typ e have a most general unier provided all the arrow eects in

and satisfy the invariants

The reason for the separation of spreading and xedp oint resolution is that

unless one takes care the iteration used to handle the p olymorphic region re

cursion do es not terminate In particular there is a danger of arrow eects that

growever larger as more fresh region and eect variables are generated The

division into spreading and xedp oint resolution solves this problem by only

generating fresh variables during the spreading phase It can then b e shown

that the second phase always terminates This approach do es not always give

principal typ es for there are cases where that function in the xedp oint resolu

tion which is resp onsible for forming typ e schemes is refused the opp ortunityto

quantify region and eect variables even though it is p ermitted by the inference

rules When this happ ens the implementation simply printsawarning ab out

the p ossible loss of principal typ es and continues with a lessthanprincipal typ e

scheme Fortunately this happ ens rather infrequently in practice and since the

soundness result of the present pap er shows the correctness for al l derivations

TE e e safety is not violated

Language Extensions

In this section we outline some of the extensions that have b een made to the region

inference rules in order to handle references exceptions and recursive datatyp es

in the ML Kit

References

Assume primitives ref and for creating a reference dereferencing and

assignment resp ectivelyFor the purp ose of region inference these can b e treated

as variables with the following typ e schemes

fput g



ref ref

fget g



ref

fput put g

 

ref unit

The most interesting of these is assignment The new contents of the reference is

ya pointer or bya word if the value is in unboxed representation represented b

The assignment up dates the reference with this p ointer or word Thus there is

a put eect on the region where the reference resides The assignmentdoesnot

make a copy the stored value Thus assignment is a constant time op eration but

the downside is that the old and the new contents must b e in the same regions

see the two o ccurrences of in the typ e for Thus for values with b oxed

representation all the dierent contents of the reference will b e kept alive for

as long as the reference is live In mostly functional programs this do es not

seem to b e a serious problem and even if there are many sideeects one can still

exp ect reasonable memory usage as long as the references are relatively short

lived Longlived references that contain b oxed values and are assigned freshly

created contents often are hostile to region inference

Exceptions

Our approach here is simpleminded exception values are put into global regions

Every evaluation of an exception declaration gives rise to an allo cation in some

global region Application of a unary exception constructor to an argument forces

the argument to b ein global regions as well Thus if one constructs may exception

values using unary exception constructors one gets a space leak indeed the

space leaking region in Figure contains constructed exception values If

one uses nullary constructors only there is only going to b e one allo cation for

eachevaluation of each exception declaration

Recursive Datatyp es

So far every typ e constructor has b een paired with one region variable For

values of recursive datatyp es additional region variables the socalled auxiliary

region variables are asso ciated with typ e constructors For example consider

the declaration of the list datatyp e

datatype a list nil of a a list

The regionannotated version of the typ e list takes the form list



where stands for a region which contains the list elements contains the spine

of the list ie the constructors nil and and is an auxiliary region which

contains the pairs to which is applied Thus lists are kept very b oxed

owords the rst is a tag saying I am in region every cons cell takes up tw

cons and the second is a p ointer to the pair to which is applied The region

is called auxiliary b ecause it holds values which are internal to the datatyp e

declaration there will b e one auxiliary region for eachtyp e constructor or pro d

uct typ e formation in each constructor in the datatyp e However all o ccurrences

of the typ e constructor b eing declared are put in the same region Hence

receives typ e

fput g



list list

 

Sequential datatyp e declarations p ose an interesting design problem

datatype t C of int

datatype t C of t t

datatype t C of t t

datatype t C oft t

i i i

In the declaration of t should one givethetwo o ccurrences of t on the right

i i

hand side the same or dierent regions If one gives them the same regions one

intro duces unnecessary sharing if one gives them dierent regions the number

of auxiliary region variables grows exp onentially in ipotentially leading to slow

region inference A third p ossibility is to put a limit on the numb er of auxiliary

region variables one will allow Wehavechosen the second solution spreading re

gions as much as p ossible but a systematic empirical study of dierent solutions

has not b een conducted

Strengths and Weaknesses

The region inference rules were rst implemented in a prototyp e system Tofte

and Talpin and then in the ML Kit Birkedal et al Neither of

these systems use garbage collection This section records some of the exp erience

gained from these systems with sp ecial emphasis on how details of the region

inference rules inuence memory management We rst illustrate consequences

of the region inference rules by a series of small but complete examples Then

we rep ort a few results from larger b enchmarks run on the ML Kit Throughout

we use Standard ML syntax Milner et at roughly fun is translated into

letrec and val into let

Small examples

The examples are group ed according to the general p oint they are intended to

make

Polymorphic Recursion

Generally sp eaking p olymorphic region recursion favours recursive functions that

have a balanced call tree as opp osed to an iterative computation where the call

tree is a list We illustrate this with two examples The rst is the exp onential

version of the Fib onacci function

fun fib n if n then else fibn fibn

val fib fib

Due to region p olymorphism the two recursive calls of fib use dierent regions

lo cal to the b o dy see Figure The memory usage app ears in Figure

The next example called reynolds Birkedal et al is a depthrst

search in a tree using a predicate to record the path from the ro ot to the present

no de

datatype a tree

Lf

Br of a a tree a tree

fun mktree Lf

mktree n let val t mktreen

in Brntt

end

fun search p Lf false

search p Brxtt

if p x then true

else search fn y yx orelse p y t

max inf reg max n reg max stack nal

i ii iii iv

fib

appel

appel

reynolds

reynolds

string

string

Figure Memory usage when running sample programs on the ML Kit with

Regions Version a i Maximal space in bytes used for variable size regions

one region page is bytes ii Maximal space in bytes used for xed size

regions iii Maximal stack size during execution in bytes iv Number of bytes

holding values at the end of the computation regions on stack data in variable

sized regions

orelse

search fn y yx orelse p y t

val reynolds search fn false mktree

Due to the p olymorphic recursion the recursivecallof search do es not put the

closures for fn y yx orelse p y in the same region as p so the space

usage will b e prop ortional to the depth of the tree This leads to go o d memory

utilisation Figure

By contrast consider the rstorder variant called reynolds which uses a

list to represent the path It is obtained by replacing the search function of

reynolds by

fun memberx false

memberxxrest

xx orelse memberx rest

fun search p Lf false

search p Brxtt

if memberxp then true

else search xp t orelse

search xp t

val reynolds search mktree

As wesaw in Section region inference do es not distinguish b etween a list

and its tail so all cons cells one for each no de in the tree are put in the same

region This gives p o or memory utilisation the dierence from reynolds being

exp onential in the depth of the tree Figure More generally in connection

with a recursive datatyp e one should not count on p olymorphic recursion to

separate the lifetimes of a value v of that typ e and other values of the same typ e

contained in v

Tail recursion

Another common pattern of computation is iteration This is b est implemented

using a recursive function whose typ e scheme takes the form note

that the argument and result typ es are the same even after region annotation

Such a function is called a region endomorphism Here is how to write a simple

lo op to sum the numb ers to

fun sump as acc p

sumaccn sumnaccn

val sumit sum

In ML all functions in principle takes one argument in his case a tuple and

that is how it is implemented in the ML Kit One might think that pairs

would pile up in one region however an analysis called the storage mode analysis

Birkedal et al discovers that the region can b e reset just b efore each pair

is written so that in fact the region will only ever contain one pair Memory

usage is indep endentofthenumb er of iterations in this example By contrast

the nontailrecursiveversion

fun sum

sum n n sumn

val sumit sum

uses stack space prop ortional to the numb er of iterations

The next program appelisa variant of a program in App el

fun s nil

si si

fun length

lengthxxs length xs

valN

fun fnx

let val z length x

in if n then else fn s N

end

val appel fNnil

Here fn nil uses space N although N should b e enough The problem

is that at each iteration a list of length N is created put in a fresh region and

then passed to the recursive call which only uses the list to compute z The list

however stays live till the end of the recursive call Rule and tell us that

the b ound x will b e allo cated throughout the evaluation of the b o dy of fThe

cure in this case is not to use the p olymorphic recursion

fun fp as nx

let val z length x

in if n then else fif true then n s N else p

end

val appel fNnil

Now the storage mo de analysis will discover that the region containing the en

tire list can b e reset at each iteration this is tail call optimisation for recursive

datatyp es The ab ove transformation is a rather indirect way of instructing the

region inference algorithm that one do es not want p olymorphic recursion and if

the optimiser eliminated the conditional it would not even have the desired ef

fect It would probably b e b etter to allow programmers to state their intentions

directly Memory consumption is in Figure

Higherorder functions

If a function f is lamb dab ound it is not regionp olymorphic Rule For

example consider

fun foldl f acc acc

foldl f acc xxs foldl f faccx xs

fun concat l foldl op l

fun blanks

blanks n blanksn

valN

val string concatblanks N

Despite the fact that foldl is regionp olymorphic the lamb dab ound f is not so

all applications of the concatenation op erator in concat will put their results

in the same region leading to N space usage To obtain N space usage

one sp ecializes foldl to uncurries the resulting function and turns it into a

region endomorphism

fun concatp as acc p

concataccxxs concata ccxxs

fun concatl concat l

fun blanks

blanks n blanksn

val string concatblanks

Larger Benchmarks

Anumb er of b enchmarks from the New Jersey Standard ML b enchmark suite

have b een p orted to the Kit and compared space and time usage against execu

tion as standalone programs under Standard ML of New Jerseyversion The

largest b enchmark is Simple lines a program which originally used arrays

of oating p ointnumb ers extensivelyTo make it run on the Kit whichdoes

not supp ort arrays arrays were translated into lists of references so the p orted

program is probably not indicativeofhow one would write the program without

arrays to start with Life lines uses lists very extensively Mandelbrot

lines uses oating p oints extensively KnuthBendix lines do es extensive

dynamic allo cation of data structures that represent terms

Initially programs often use more space when running on the Kit for example

Figure shows a region prole for the original version of the KnuthBendix

b enchmark pro duced using Hallenb ergs region proler Hallenb erg The

region proler can also pinp oint the program p oints which are resp onsible for

space leaks The source program is then changed to make it more region friendly

Interestingly transformations that are go o d for region inference often are go o d

for SMLNJ to o see KnuthBendix in Figure for an example This is not

very surprising when the static analysis is able to infer shorter lifetimes it may

well b e b ecause the values actually need to b e live for a shorter time and this is

go o d for garbage collection to o The region prole of the improved KnuthBendix

completion is shown in Figure see Figure for a comparison with SML of New

Jerseyversion

Automatic Program Transformation

Apart from functions that are delib erately written as region endomorphisms the

general rule is that the more regions are separated the b etter since it makes more

aggressive recycling of memory p ossible The Kit p erforms optimisations which

separate regions These include replacing let x e in e end by e e xin

cases where e is a syntactic value and either x o ccurs at most once in e or the

value denoted by e is not larger than some given constant Another optimisation

which is implemented is sp ecialisation of curried functions as in the string

example ab ove however the Kit do es not attempt to turn functions into region

endomorphisms whichwas the last thing we did in string As a matter of

principle the Kit avoids optimisations which can lead to increased memory usage

Also useful is the ability of the region inference to suggest where space leaks

may b e exp ected If a function has comp ound typ e scheme

and contains an atomic eect of the form put where is not amongst the

b ound region variables then one quite p ossibly has a space leak every call of rp2ps - Region profiling Fri Jul 5 11:02:46 1996 Maximum allocated bytes in regions: 32665880.

bytes r122inf

r63inf

r64inf

r65inf 25M r66inf

r78inf

20M r79inf r67inf

r68inf

15M r75inf r143inf

rDesc

10M r1886inf

r1799fin

r1840fin

5M r1852inf

r1853inf

r12inf 0M

5.2 25.2 45.2 65.2 85.2 105.2 125.2 145.2 165.2 seconds

Figure Region prole for KnuthBendix b efore optimisations One region

of unb ounded size indicated as rinf in the picture is resp onsible for most

of the space leak Additional proling reveals that a single program p ointthe

application of an exception constructor to a constant string is resp onsible for all

values in that region rp2ps - Region profiling Fri Jul 5 11:40:45 1996 Maximum allocated bytes in regions: 186868.

bytes r2806inf r2807inf r2808inf 160k r2809inf rDesc 140k r2813inf r2810inf 120k stack r2811inf 100k r2812inf r2814inf

80k r2485fin r2639fin r2504inf 60k r2679inf r2505inf 40k r2635fin r2621fin 20k r2616inf OTHER 0k

5.1 25.1 45.1 65.1 85.1 105.1 125.1 145.1 165.1 185.1 205.1 seconds

Figure Region prole for KnuthBendix after optimisations

Time Time Space Space

Kit SMLNJ Kit SMLNJ

Mandelbrot orig

Life orig

Life impr

KnuthBendix orig

KnuthBendix impr

Simple orig

Figure Comparison b etween standalone programs created with the ML Kit

using the HP PARISC co de generator and SML of New Jersey resp ectively

Here orig means original program while impr means improved for region

inference All times are user time in seconds on an HP s measured

using the unix time command Space is maximal resident memory in kilobytes

measured with top and includes co de and runtime system All values are average

over three runs

the function mightputa value into some region which is external to the function

If in addition do es not o ccur free in that is all the more reason for concern

for the value will not even b e part of the result of the function In other words

the function has a sideeect at the implementation level This can easily happ en

even when there are no sideeects in the source program

In such cases the implementation simply issues a short warning This turns

out to b e very useful in practice

Another usage of the inferred information is the ability to detect dead co de

Consider the rule for letregion Rule If put and get then

whatever value that was put into was never used For example this can detect

that the functions f and g b elow are never used

let

fun fx x

fun gx ffx

in

fn x fn g

end

Conclusion

As has b een shown with the previous examples it is not the case that every ML

program automatically runs well on a stack of regions Often one has to pro

gram in a region friendly style aided by proling to ols to nd space leaks Thus

programming with regions is dierent from usual ML programming where one

relies on a garbage collector for memory management On the other hand the

region discipline oers what we feel is an attractivecombination of the conve

nience of an expressive and the ability to reason ab out the

time and space p erformance of programs The relationship b etween the abstract

mo del of the regions presented in this pap er and the concrete implementation is

close enough that one can use the abstract mo del combined with the proling

to ols mentioned earlier to tune programs often resulting in very space ecient

programs that are executed as written with no added costs of unb ounded size

Acknowledgments

It would have b een imp ossible to assess the practical use of the region inference

rules without the software develop ed by the ML Kit with Regions development

team Lars Birkedal wrote the compiler from regionannotated lamb daterms to

C together with a runtime system in C Martin Elsman and Niels Hallenb erg ex

tended this work to HP PARISC co de generation including register allo cation

and instruction scheduling Magnus Vejlstrup develop ed the multiplicity infer

ence for inferring region sizes Niels Hallenb erg implemented the region proler

Peter Sestoft and Peter Bertelsen conducted thorough tests of the system and

improved the storage mo de analysis The rst author wishes to thank Mikkel

Thorup and Bob Paige for generously providing algorithmic exp ertice sp eci

cally on graph algorithms their input was very imp ortant for the detailed design

and implementation of the region inference algorithms in the Kit The depthrst

search algorithms in Section were suggested by John Reynolds Finallywe

wish to thank the referees for many constructive suggestions and comments

References

A Aiken M Fahndrich and R Levien Better static memory management

Improving regionbased analysis of higherorder languages In Proc of the

ACM SIGPLAN ConferenceonProgramming Languages and Implemen

tation PLDI pages La Jolla CA June ACM Press

A W App el Compiling with ContinuationsCambridge University Press

H Baker List pro cessing in real time on a serial computer Communications

of the ACM April

H G Baker Unify and conquer garbage collection up dating aliasing

in functional languages In Proceedings of the ACM Conference on Lisp

and pages June

L Birkedal M Tofte and M Vejlstrup From region inference to von Neu

mann machines via region representation inference In Proceedings of the

rdACM SIGPLANSIGACT Symposium on Principles of Programming

Languages pages ACM Press January

J M L D K Giord P Jouvelot and M Sheldon Fx reference man

ual Technical Rep ort MITLCSTR MIT Lab oratory for Computer

Science Sept

L Damas and R Milner Principal typ e schemes for functional programs

In Proc th Annual ACM Symp on Principles of Programming Languages

pages Jan

E W Dijkstra Recursive programming Numerische Math

Also in Rosen Programming Systems and Languages McGrawHill

M Elsman and N Hallenb erg An optimizing backend for the ML Kit

using a stack of regions Student Pro ject Department of Computer Science

University of Cop enhagen DIKU July

M George Transformations and reduction strategies for typ ed lambda

expressions ACM Transactions on Programming Languages and Systems

Oct

P Hudak A semantic mo del of reference counting and its abstraction In

ACM Symposium on List and Functional Programming pages

P Jouvelot and D Giord Algebraic reconstruction of typ es and eects

In Proceedings of the th ACM Symposium on Principles of Programming

Languages POPL

H S Katsuro Inoue and H Yagi Analysis of functional programs to detect

runtime garbage cells ACM Transactions on Programming Languages and

Systems

D E Knuth Fundamental Algorithmsvolume of The Art of Computer

Programming AddisonWesley

H Lieb erman and C Hewitt A realtime garbage collector based on the

lifetimes of ob jects Communications of the ACM June

J Lucassen and D Giord Polymorphic eect systems In Proceedings of

the ACM Conference on Principles of Programming Languages

J M Lucassen Types and Eects towards the integration of functional

and imperative programming PhD thesis MIT Lab oratory for Computer

Science MITLCSTR

R Milner A theory of typ e p olymorphism in programming J Computer

and System Sciences

R Milner M Tofte and R Harp er The Denition of StandardML MIT

Press

A Mycroft Polymorphic typ e schemes and recursive denitions In Proc

th Int Conf on Programming LNCS

Peter Naur ed Revised rep ort on the algorithmic language Algol

Comm ACM

C Ruggieri and T P Murtagh Lifetime analysis of dynamically allo cated

ob jects In Proceedings of the th Annual ACM Symposium on Principles

of Programming Languages pages January

JPTalpin Theoretical and practical asp ects of typ e and eect infer

ence Do ctoral Dissertation May Also available as Research Rep ort

EMPCRIA Ecole des Mines de Paris

JPTalpin and P Jouvelot Polymorphic typ e region and eect inference

Journal of Functional Programming

M Tofte and JPTalpin A theory of stack allo cation in p olymorphi

cally typ ed languages Technical Rep ort DIKUrep ort Departmentof

Computer Science University of Cop enhagen

M Tofte and JPTalpin Implementing the callbyvalue lamb dacalculus

using a stack of regions In Proceedings of the st ACM SIGPLANSIGACT

Symposium on Principles of Programming Languages pages ACM

Press January

Index

s V E R e v s evaluation of The index refers to sections where the

target expression concepts are intro duced For exam

TE e e region inference ple the entry region name r RegName

rules Figure means that the no

Addr see address tion of region name is intro duced in

address a or ro Addr RegName Sections and app ears in Fig

OSet ure and that metavariable r ranges

agreementbetween region environments over region names throughout the pa

p er

arrow eect

region arguments

at allo cation directive

in typ e schemes

bvboundvariables of typ e scheme

mo dication of nite maps

c see integer constant

restriction of nite map

C domain for consistency

nn restriction of store

C

n

A B nite maps

coinduction

ML ML

claim of consistency see instance

closure in dynamic semantics function abstraction

source language hx e E i or hx e E f i see typ e variable

sequence of typ e variables

see claim of consistency

target language hx e VERi or

h xeVERi set of claims

k

connecting an eect to a store maximal xed p ointof

F

consistency see eect variable

Dom domain of nite map sequence of eect variables

see arrow eect

E see environment

Eect Figure see region variable

EectVar see eect variable sequence of region variables

eect typ e

variable typ e scheme

ML

ML typ e

atomic

ML

eect substitution S ML typ e sc heme

e

Envseeenvironment hx e E i hx e E f i hx e VERi or

environment see also typ e environ h xeVERi see clo

k

ment and region environment sure

ML

ML

in dynamic semantics of source TE e typ e rules for source

n

E Env Var Val

E e v evaluation of source ex

pressions

in dynamic semantics of target VE region substitution S

r

n

region variable or p

TargetEnvVar Addr

Rng range of nite map

SExp source language

equaivalence of typ e schemes

TE typ e environment

f see program variable

ML

TE ML typ e environment

monotonic op erator on sets of claims

F

TExp target language

s see store

fev free eect variables

sa

fpv free program variables

S see substitution

frv free region variables

S see eect substitution

e

ftv free typ e variables

S see region substitution

r

fv free typ e region and eect vari

S see typ e substitution

t

ables

Store see store

get get eect

n

instance store s Store RegName Region

ML

in source language

in target language StoreVal see value storable

integer constantc substitution S

letregion supp ort Supp

o see oset sv see value storable

of pro jection TargetEnv see environment

oset o TargetVal see value

p see region variable TyV ar see typ e variable

p owerset constructor typ e

P

planar domain of a store Pdom typ e with place Typ eWithPlace

program variable x or f Typ e RegVar Figure

put put eect Typ eWithPlace see typ e with place

n

r see region name

typ e environmentTE Var Typ eScheme

R see region environment

RegVar

RegEnv see region environment

Typ eScheme Figure

RegName see region name

typ e scheme

n

Region OSet StoreVal see typ e substitution S

t

also region typ e variable

region see also Region typ e with place

region allo cation Val see value

region environmen tR RegEnv value

n

source language v Val

RegVar RegName

storable sv StoreVal

region function closure h xeVERi

k

target language v or a TargetVal

see closure

Addr

region name r RegNameFig

VE see environment

ure

target language v

region renaming

A App endix Example ThreeAddress Co de

The threeaddress co de which the ML Kit pro duces on the waytoHPPARISC

co de for the example given in Section is shown b elow Temp orary variables start

with V Fixed registers are used for the stack p ointer SP and for function call

and return stdArg stdClos stdRes In this example the compiler discovers

that all regions can b e represented on the stack in other cases letregion and

end translate into calls of runtime system pro cedures that resemble lightweight

malloc and free op erations

LABEL main

AllocRegionV allocate global region rho

begin LETREGION rho rho

MoveSPV V SP ie rho

OffsetSPSP

MoveSPV rho

OffsetSPSP

begin APP non tail call

begin operator

begin LETREGION rho eliminated

begin LET

begin RECORD

MoveVV allocate storage for record

MoveV represents

StoreIndexLVV store component of record

MoveV represents

StoreIndexLVV store component of record

StoreIndexLV tag

MoveVV save address of record as result

end of RECORD

LET scope

MoveVV allocate storage for closure for FN y

StoreIndexLLabV store code pointer in closure

MoveVV

StoreIndexLVV save free variable x in closure

FetchVarsV

MoveVV

StoreIndexLVV save free variable rho in closure

MoveVV save address of closure as result

end LET

end LETREGION rho eliminated

end operator begin operand

MoveV represents

end operand

PushLab push return address

MoveVstdClos

MoveVstdArg

FetchIndexLstdClosV fetch code address from closure

JmpV

LABEL return address

MovestdResV

end APP

OffsetSPSP end LETREGION rho

OffsetSPSP end LETREGION rho

HALT

LABEL code for function FN y

begin RECORD

FetchVarsV

MoveVV

AllocMemLVV allocate storage for record at rho

FetchIndexLstdClosV access variable x

FetchIndexLVV extract component from record

StoreIndexLVV store component of record

MovestdArgV access variable y

StoreIndexLVV store component of record

StoreIndexLV tag

MoveVstdRes save address of record as result

end of RECORD

return

PopV

JmpV