A transformation-based optimiser for Haskell

Simon L Peyton Jones

Department of Computing Science, University of Glasgow, Glasgow G12 8QQ

Email: simonpj@dcs.gla.ac.uk

WWW: http://www.dcs.gla.ac.uk/~simonpj

André L M Santos

Departamento de Informática, Universidade Federal de Pernambuco

Email: alms@di.ufpe.br

WWW: http://www.di.ufpe.br/~alms

October

Abstract

Many compilers do some of their work by means of correctness-preserving, and hopefully performance-improving, program transformations. The Glasgow Haskell Compiler (GHC) takes this idea of "compilation by transformation" as its war-cry, trying to express as much as possible of the compilation process in the form of program transformations.

This paper reports on our practical experience of the transformational approach to compilation, in the context of a substantial compiler.

This paper is based in part on Peyton Jones [1996] and Peyton Jones, Partain & Santos [1996]. It will appear in Science of Computer Programming.

Introduction

Using correctness-preserving transformations as a compiler optimisation is a well-established technique (Aho, Sethi & Ullman; Bacon, Graham & Sharp). In the functional programming area especially, the idea of compilation by transformation has received quite a bit of attention (Appel; Fradet & Metayer; Kelsey; Kelsey & Hudak; Kranz; Steele).

A transformational approach to compiler construction is attractive for two reasons:

• Each transformation can be implemented, verified, and tested separately. This leads to a more modular compiler design, in contrast to compilers that consist of a few huge passes, each of which accomplishes a great deal.

• In any framework, transformational or otherwise, each optimisation often exposes new opportunities for other optimisations (the "cascade effect"). This makes it difficult to decide a priori what the best order to apply them might be. In a transformational setting it is easy for compiler-writers to "plug and play", by re-ordering transformations, applying them more than once, or trading compilation time for code quality by omitting some. It allows a late commitment to phase ordering.

This paper reports on our experience in applying transformational techniques in a particularly thorough-going way to the Glasgow Haskell Compiler (GHC) (Peyton Jones et al.), a compiler for the non-strict functional language Haskell (Hudak et al.). Among other things, this paper may serve as a useful jumping-off point and annotated bibliography for those interested in the compiler. The following distinctive themes emerge, all of which are elaborated later in the paper:

• We frequently find a close interplay between theory and practice, a particularly satisfying aspect of functional-language research.

• Often, a single transformation elegantly generalises a textbook compiler optimisation, or effectively subsumes several such optimisations.

• Our compiler infers types for the source program, but then maintains types throughout the compilation process. We have found this to be a big win, for two reasons: it supports a powerful consistency check on the correctness of the compiler, and it provides information that is used to drive some optimisations (notably strictness analysis).

Overview

Haskell is a non-strict, purely functional language. It is a relatively large language, with a rich syntax and type system, designed for full-scale application programming.

The overall structure of our compiler is conventional:

• The front end parses the source, does scope analysis and type inference, and translates the program into a small intermediate language called the Core language. This latter stage is called desugaring.

• The middle consists of a sequence of Core-to-Core transformations, and forms the subject of this paper.

• The back end translates the resulting Core program into C, whence it is compiled to machine code (Peyton Jones).

In what sense does this structure perform "compilation by transformation"? After all, since the middle just transforms one Core program to another, it is presumably optional, and indeed our compiler has this property. The idea, however, is to do as much work as possible in the middle, leaving the irreducible minimum in the front and back ends. The front end should concentrate entirely on scope analysis, type inference, and a simple-minded translation to Core[1], ignoring efficiency. The back end should include optimisations only if they cannot be done by a Core-to-Core transformation. This paper describes several examples of optimisations that are traditionally done in the desugarer or in the code generator, which we instead re-express as Core-to-Core transformations.

[1] Why do we not instead translate to Core and then typecheck? Because one gets much better error messages from the typechecker if it is looking at source code rather than desugared source code. Furthermore, to require the translation to Core to preserve exactly Haskell's type-inference properties places undesirable extra constraints on the translation. Lastly, the translation of Haskell's system of overloading into Core can only be done in the knowledge of the program's typing.

In short, just about everything that could be called an optimisation (and optimisations constitute the bulk of what most quality compilers do) appears in the middle.

In practice we find that transformations fall into two groups:

• A large set of simple, local transformations (e.g. beta reduction). These transformations are all implemented by a single, relatively complex, compiler pass that we call the simplifier. The complexity arises from the fact that the simplifier tries to perform as many transformations as possible during a single pass over the program, exploiting the "cascade effect". (It would be unreasonably inefficient to perform just one at a time, starting from the beginning each time.) Despite these efforts, the result of one simplifier pass often still contains opportunities for further simplifier transformations, so we apply the simplifier repeatedly until no further transformations occur, with a fixed maximum number of iterations to avoid pathological behaviour (a driver-loop sketch follows this list).

• A small set of complex, global transformations (e.g. strictness analysis, specialising overloaded functions), each of which is implemented as a separate pass. Most consist of an analysis phase, followed by a transformation pass that uses the analysis results to identify appropriate sites for the transformation. Many also rely on a subsequent pass of the simplifier to "clean up" the code they produce, thus avoiding the need to duplicate transformations already embodied in the simplifier.
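
As a concrete illustration of the simplifier's driver loop, here is a minimal sketch in Haskell (ours, not GHC's actual code; the pass is assumed to report how many transformations fired):

  -- Iterate a pass to quiescence, with a cap to avoid pathological cases.
  runToQuiescence :: Int -> (a -> (a, Int)) -> a -> a
  runToQuiescence cap pass prog
    | cap <= 0  = prog                      -- hit the iteration limit
    | otherwise = case pass prog of
        (prog', 0) -> prog'                 -- no transformations fired: done
        (prog', _) -> runToQuiescence (cap - 1) pass prog'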

We have taken the "plug and play" idea to an extreme, allowing the sequence of transformation passes to be completely specified on the command line.

Rather than give a superficial overview of everything, we focus in this paper on three aspects of our compiler that play a key role in compilation by transformation:

• The Core language itself.

• Two groups of transformations implemented by the simplifier: inlining and beta reduction, and transformations involving case expressions.

• Two global transformation passes: one that performs and exploits strictness analysis, and one that moves bindings to improve allocation and sharing.

We conclude with a brief enumeration of the other main transformations incorporated in GHC, a short discussion of separate compilation, some measurements of the performance improvements achievable by transformation, and a summary of the lessons we learned from our experience.

Program      Prog    ->  Bind_1 ; ... ; Bind_n                        n >= 1

Binding      Bind    ->  var = Expr                                   Non-recursive
                     |   rec var_1 = Expr_1 ; ... ; var_n = Expr_n    Recursive, n >= 1

Expression   Expr    ->  Expr Atom                                    Application
                     |   Expr ty                                      Type application
                     |   \ var_1 ... var_n -> Expr                    Lambda abstraction, n >= 1
                     |   /\ tyvar_1 ... tyvar_n -> Expr               Type abstraction, n >= 1
                     |   case Expr of Alts                            Case expression
                     |   let Bind in Expr                             Local definition
                     |   con var_1 ... var_n                          Constructor, n >= 0
                     |   prim var_1 ... var_n                         Primitive, n >= 0
                     |   Atom

Atoms        Atom    ->  var                                          Variable
                     |   Literal                                      Unboxed object

Literals     Literal ->  integer | float | ...

Alternatives Alts    ->  Calt_1 ; ... ; Calt_n ; Default              n >= 0
                     |   Lalt_1 ; ... ; Lalt_n ; Default              n >= 0

Constr. alt  Calt    ->  con var_1 ... var_n -> Expr                  n >= 0
Literal alt  Lalt    ->  Literal -> Expr
Default alt  Default ->  NoDefault
                     |   var -> Expr

Figure 1: Syntax of the Core language

The Core language

The Core language clearly plays a pivotal role. Its syntax is given in Figure 1, and consists essentially of the lambda calculus augmented with let and case.

Though we do not give explicit syntax for them here, the Core language includes algebraic data type declarations, exactly as in any modern functional language. For example, in Haskell one might declare the type of trees thus:

  data Tree a = Leaf a | Branch (Tree a) (Tree a)

This declaration implicitly defines constructors Leaf and Branch, that are used to construct data values, and can be used in the pattern of a case alternative. Booleans, lists, and tuples are simply pre-declared algebraic data types:

  data Boolean = False | True
  data List a = Nil | Cons a (List a)
  data Tuple3 a b c = T3 a b c   -- One for each size of tuple

Throughout the paper we take a few liberties with the syntax: we allow ourselves infix operators (e.g. E1 + E2), and special syntax for lists ([] for Nil and infix : for Cons) and tuples (e.g. (a,b,c)). We allow multiple definitions in a single let expression to abbreviate a sequence of nested let expressions, and often use layout instead of curly brackets and semicolons to delimit case alternatives. We use an upper-case identifier, such as E, to denote an arbitrary expression.

A Core expression is in weak head normal form (or WHNF) if it is a lambda abstraction, constructor application, variable, or literal.
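
For example (our illustrations), \x -> x+1, Cons y ys, v, and 42 are all WHNFs, whereas f v and let v = sin x in f v are not: evaluating either of the latter requires real work before a lambda or a head constructor is exposed.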

The operational reading

The Core language is, of course, a functional language, and can be given the usual denotational semantics. However, a Core program also has a direct operational interpretation. If we are to reason about the usefulness of a transformation we must have some model for how much it costs to execute it, so an operational interpretation is very desirable. In what follows we give an informal operational model, but it can readily be formalised along the lines described by Launchbury.

Like any higher-order language, the operational model for Core requires a garbage-collected heap. The heap contains:

• Data values, such as list cells, tuples, booleans, integers, and so on.

• Function values, such as \x -> x+1 (the function that adds 1 to its argument).

• Thunks (or suspensions), that represent suspended (i.e. as yet unevaluated) values.

Thunks are the implementation mechanism for Haskell's non-strict semantics. For example, consider the Haskell expression f (sin x) y. Translated to Core, the expression would look like this:

  let v = sin x
  in f v y

The let allocates a thunk in the heap for sin x and then, when it subsequently calls f, passes a pointer to the thunk. The thunk records all the information needed to compute its body (sin x in this case), but it is not evaluated before the call. If f ever needs the value of v it will force the thunk, which provokes the computation of sin x. When the thunk's evaluation is complete, the thunk itself is updated (i.e. overwritten) with the now-computed value. If f needs the value of v again, the heap object now contains its value instead of the suspended computation. If f never needs v, then the thunk is not evaluated at all.

The two most important operational intuitions about Core are as follows:

1. let bindings (and only let bindings) perform heap allocation. For example:

     let v = sin x
     in
     let w = (p,q)
     in
     f v w

   Operationally, the first let allocates a thunk for sin x, and then evaluates the let's body. This body consists of the second let expression, which allocates a pair (p,q) in the heap, and then evaluates its body in turn. This body consists of the call f v w, so the call is now made, passing pointers to the two newly-allocated objects.

   In our implementation, each allocated object (be it a thunk or a value) consists only of a code pointer together with a slot for each free variable of the right-hand side of the let binding. Only one object is allocated, regardless of the size of the right-hand side (older implementations of graph reduction do not have this property). We do not attempt to share environments between thunks (Appel; Kranz et al.).

2. case expressions (and only case expressions) perform evaluation. For example:

     case x of
       []     -> ...
       (y:ys) -> y + g ys

   The operational understanding is as follows: evaluate x, and then scrutinise it to see whether it is an empty list or a Cons cell of form y:ys, continuing execution with the appropriate alternative. If x is an as-yet-unevaluated thunk, the act of evaluating it is typically implemented by saving live variables and a return address on the stack, and jumping to the code stored inside the thunk. When evaluation is complete, the now-evaluated thunk returns to the saved return address.

case expressions subsume conditionals, of course. The Haskell expression if C then E1 else E2 is desugared to

  case C of { True -> E1; False -> E2 }

The syntax in Figure 1 requires that function arguments must be atoms (that is, variables or literals)[2], and now we can see why. If the language allowed us to write

  f (sin x) (p,q)

the operational behaviour would still be exactly as described above, with a thunk and a pair allocated as before. The let form is simply more explicit. Furthermore, the let form gives us the opportunity of moving the binding for v elsewhere, if that turns out to be desirable, which the apparently-simpler form does not. Lastly, the let form is more economical, because many transformations on let expressions (concerning strictness, for example) would have to be duplicated for function arguments if the latter were non-atomic.

[2] This syntax is becoming quite widely used (Ariola et al.; Flanagan et al.; Launchbury; Peyton Jones; Tarditi et al.).

It is also important to note where atoms are not required. In particular, the scrutinee of a case expression is an arbitrary expression, not just an atom. For example, the following is quite legitimate:

  case (reverse xs) of { ... }

Operationally, there is no need to build a thunk for reverse xs and then evaluate it; instead, we can simply save a return address, load xs into an argument register, and jump to the code for reverse. Again, the operational model determines the syntax.

Polymorphism

Like any compiler for a strongly-typed language, GHC infers the type of every expression and variable. An obvious question is: can this type assignment be maintained through the translation to the Core language, and through all the subsequent transformations that are applied to the program? If so, both transformations and code generator might (and in GHC sometimes do) take advantage of type information to generate better code.

In a monomorphic language the answer is a clear "yes", but matters are not initially so clear in a polymorphic setting. The trouble is that program transformation involves type manipulation. Consider, for example, the usual composition function, compose, whose type is

  compose :: (b -> c) -> (a -> b) -> a -> c

In an untyped Core language, compose might be defined like this:

  compose f g x = let y = g x in f y

Now suppose that we wished to unfold a particular call to compose, say

  compose show double v

where v is an Int, double doubles it, and show converts the result to a String. The result of unfolding the call to compose is an instance of the body of compose, thus:

  let y = double v in show y

Now we want to be able to identify the type of every variable and sub-expression, so we must calculate the type of y. In this case, it has type Int, but in another application of compose it may have a different type. For example, if we inline compose in another call

  compose toUpper show v

(where toUpper converts a String to upper case) we obtain

  let y = show v in toUpper y

and here y has type String.

This difficulty arises because y's type in the body of compose itself is just a type variable. Evidently, in a polymorphic world it is insufficient merely to tag every variable of the original program with its type, because this information does not survive across program transformations.

What, then, is to be done? Clearly, the program must be decorated with type information in some way, and every program transformation must be sure to preserve it. Deciding exactly how to decorate the program, and how to maintain these decorations correctly during transformation, seemed rather difficult at first. We finally realised that an off-the-shelf solution was available, namely the second-order lambda calculus (Girard; Reynolds).

The idea is that every polymorphic function, such as compose, has a type abstraction for each universally-quantified polymorphic variable in its type (a, b and c in the case of compose), and whenever a polymorphic function is called, it is passed extra type arguments to indicate the types to which its polymorphic type variables are to be instantiated. The definition of compose now becomes:

  compose = /\a b c ->
            \(f::b->c) (g::a->b) (x::a) ->
            let (y::b) = g x in f y

The function takes three type parameters (a, b and c), as well as its value parameters f, g and x. The types of the latter can now be given explicitly, as can the type of the local variable y. A call of compose is now given three extra type arguments, which instantiate a, b and c, just as the "normal" arguments instantiate f, g and x. For example, the call of compose we looked at earlier is now written like this:

  compose Int Int String show double v

It is now simple to unfold this call, by instantiating the body of compose with the supplied arguments, to give the expression

  let (y::Int) = double v in show y

Notice that the let-bound variable y is now automatically attributed the correct type.

In short, the second-order lambda calculus provides us with a well-founded notation in which to express and transform polymorphically-typed programs. It turns out to be easy to introduce the extra type abstractions and applications as part of the type inference process.

Other compilers for polymorphic languages are beginning to carry type information through to the back end, and use it to generate better code. Shao & Appel use type information to improve data representation, though the system they describe is monomorphic after the front end. Our implementation uses type abstractions and applications only to keep the compiler's types straight; no types are passed at runtime. It is possible to take the idea further, however, and pass types at runtime to specialise data representations (Morrison et al.), give fast access to polymorphic records (Ohori), or guide garbage collection (Tolmach). The most recent and sophisticated work is that of Harper & Morrisett.

Inlining and beta reduction

The first transformation that we discuss is inlining. Functional programs often consist of a myriad of small functions (functional programmers treat functions the way C programmers treat macros), so good inlining is crucial. Compilers for conventional languages get a modest performance improvement from inlining (Davidson & Holler), while functional-language compilers gain considerably more[3] (Appel; Santos). Inlining removes some function-call overhead, of course, but an equally important factor is that inlining brings together code that was previously separated, and thereby often exposes a cascade of new transformation opportunities. We therefore implement inlining in the simplifier.

We have found it useful to identify three distinct transformations related to inlining:

• Inlining itself replaces an occurrence of a let-bound variable by (a copy of) the right-hand side of its definition. Notice that inlining is not limited to function definitions: any let-bound variable can potentially be inlined. (Remember, though, that occurrences of a variable in an argument position are not candidates for inlining, because they are constrained to be atomic.)

• Dead code elimination discards let bindings that are no longer used; this usually occurs when all occurrences of a variable have been inlined.

• Beta reduction replaces (\x -> E) A by E[A/x]. (An analogous transformation deals with type applications.)
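
To see the three transformations working together, consider this illustrative sequence (our example):

  let f = \v -> v+1 in f 10
    ==> let f = \v -> v+1 in (\v -> v+1) 10    (inline f)
    ==> (\v -> v+1) 10                         (dead code elimination: f is unused)
    ==> 10+1                                   (beta reduction)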

Beta reduction is particularly simple in our setting. Since the argument A is bound to be atomic, there is no risk of duplicating a redex, and we can simply replace x by A throughout E. There is a worry about name capture, however: what if A is also bound in E? We avoid this problem by the simple expedient of renaming every identifier as we go, which costs little extra since we have to construct a new, transformed expression anyway. Whilst beta reduction is simple, inlining is more interesting.

Simple inlining

It is useful to distinguish two cases of inlining:

WHNFs. If the variable concerned is bound to a weak head normal form (WHNF), that is, an atom, lambda abstraction, or constructor application, then it can be inlined without risking the duplication of work. The only down-side might be an increase in code size.

Non-WHNFs. Otherwise, inlining carries the risk of loss of sharing and hence the duplication of work. For example, in

  let x = f v in x+x

it might be unwise to inline x, because then f v would be evaluated twice instead of once. Informally, we say that a transformation is W-safe if it guarantees not to duplicate work.

[3] This difference may soon decrease, as the increased use of object-oriented languages leads to finer-grained procedures (Calder, Grunwald & Zorn).

In the case of WHNFs the trade-off is simply between code size and the benefit of inlining. Atoms and constructor applications are easy: they are always small enough to inline. (Recall that constructor applications must have atomic arguments.)

Functions, in contrast, can be large, so the effect of unrestricted inlining on code size can be substantial. Like most compilers, we use a heuristic, but no formal analysis, for deciding when to inline functions. More precisely, we compute the "space penalty" of inlining a function at a call site as follows:

1. Compute the size (in syntax nodes) of the body of the function.

2. Subtract one for each argument, since we are going to replace a function call with an instance of the body.

3. Lastly, subtract a discount for each argument that (a) is scrutinised by a case expression in the function body, and (b) is bound to a constructor at the call site. If inlining is performed, a case-of-known-constructor transformation will throw away all but one branch of the case expression, hence the discount. (The discount is very crude, however: it is just a constant.)

If the space penalty thus computed is smaller than some fixed, command-line-settable constant, then we inline the function at the call site.
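
The computation is simple enough to express directly; the following Haskell sketch is ours (the discount and threshold are illustrative placeholders, not GHC's actual constants):

  data ArgInfo = ArgInfo
    { scrutinisedInBody :: Bool    -- (a) scrutinised by a case in the function body
    , conAtCallSite     :: Bool }  -- (b) bound to a constructor at this call site

  inlinePenalty :: Int -> [ArgInfo] -> Int
  inlinePenalty bodySize args =
    bodySize - length args - discount * length discounted
    where
      discount   = 2   -- the crude constant discount
      discounted = [ a | a <- args, scrutinisedInBody a, conAtCallSite a ]

  shouldInline :: Int -> Int -> [ArgInfo] -> Bool
  shouldInline threshold bodySize args = inlinePenalty bodySize args < threshold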

For non-WHNFs, attention focuses on how the variable is used. If the variable occurs just once, then presumably it is safe to inline it. Our first approach was to perform a simple occurrence analysis that records, for each variable, how many places it is used, and to use this information to guide the inlinings done by the simplifier. There are three complications with this naive approach.

The first is practical. As mentioned earlier, the simplifier tries to perform as many transformations as possible during a single pass over the program. However, many transformations (notably beta reduction and inlining itself) change the number of occurrences of a variable. Our current solution to this problem is to do a great deal of book-keeping to keep occurrence information up to date. (Appel & Jim do something similar.)

The second complication is that a variable may occur multiple times with no risk of duplicating work, namely if the occurrences are in different alternatives of a case expression. In this case, the only issue to consider is the trade-off between code size and inlining benefit.

Most seriously, though, inlining based on naive occurrence counting is not W-safe! Consider this expression:

  let x = f v
      g = \y -> x+y
  in g a + g b

If we replace the (single) occurrence of x by f v, we will recompute the call to f every time g is called, rather than sharing it among all calls to g. Our current solution is conservative: we never inline inside a lambda abstraction. It turns out, though, that this approach is sometimes too conservative. In higher-order programs where lots of inlining is happening, it is not unusual to find functions that are sure to be called only once, so it would be perfectly safe to inline inside them.

Using linearity

Because of these complications, the book-keeping required to track occurrence information has gradually grown into the most intricate and bug-prone part of the simplifier. Worse, work-duplication bugs manifest themselves only as performance problems, and may go unnoticed for a long time.[4] This complexity is especially irritating because we have a strong intuitive notion of whether a variable can be "used more than once", and that intuitive notion is an invariant of W-safe transformations. That suggests that a linear type system would be a good way to identify variables that can safely be inlined even though they occur inside lambdas, or that cannot safely be inlined even though they (currently) occur only once. Just as all transformations preserve the ordinary typing of an expression, so W-safe transformations preserve the linear type information too, and hence guarantee not to duplicate work.

Unfortunately, most linear type systems are inappropriate because they do not take account of call-by-need evaluation. For example, consider the expression

  let x = 3*4
      y = x+1
  in y + y

Under call-by-need evaluation, even though y is evaluated many times, x will be evaluated only once. Most linear systems would be too conservative, and would attribute a non-linear type to x as well as to y, preventing x from being inlined.

Thus motivated, we have developed a linear type system that does take account of call-by-need evaluation (Turner, Wadler & Mossin). The type system assigns a type of Int^w to y in the above example, the superscript w ("many") indicating that y might be evaluated more than once. However, it assigns a type of Int^1 to x, indicating that x can be evaluated at most once, and hence can W-safely be inlined.

The type system is capable of dealing with "usage polymorphism". For example, consider this definition of apply:

  apply f x = f x

In a particular application (apply g y), whether or not y is used more than once depends on whether g uses its argument more than once. So the type of apply is[5]

  apply :: (a^u -> b^v) -> a^u -> b^v

the two occurrences of u indicating that the usage u of g's argument is the same as that of y.

Our implementation of this linear type system is incomplete, so we do not yet have practical experience of its utility, but we are optimistic that it will provide a systematic way of addressing an area we have only dealt with informally to date, and which has bitten us badly more than once.

[4] One such bug caused the compiler (which is, of course, written in Haskell) to rebuild its symbol table from scratch every time a variable was looked up in the table. The compiler worked perfectly, albeit somewhat slowly, and it was months before we noticed (Sansom).

[5] In fact, for the purposes of this paper, we have simplified the type a little.

Transforming conditionals

Most compilers have special rules to optimise conditionals. For example, consider the expression

  if not x then E1 else E2

No decent compiler would actually negate the value of x at runtime! Let us see, then, what happens if we simply turn the transformation handle. After desugaring the conditional, and inlining the definition of not, we get

  case (case x of {True -> False; False -> True}) of
    True  -> E1
    False -> E2

Here, the outer case scrutinises the value returned by the inner case. This observation suggests that we could move the outer case inside the branches of the inner one, thus:

  case x of
    True  -> case False of {True -> E1; False -> E2}
    False -> case True  of {True -> E1; False -> E2}

Notice that the originally-outer case expression has been duplicated, but each copy is now scrutinising a known value, and so we can make the obvious simplification to get exactly what we might originally have hoped for:

  case x of
    True  -> E2
    False -> E1

Both of these transformations are generally applicable. The second, the case-of-known-constructor transformation, eliminates a case expression that scrutinises a known value. This is always a Good Thing, and many other transformations are aimed at exposing opportunities for such case elimination; we consider another useful variant of case elimination below. The first, which we call the case-of-case transformation, is certainly correct in general, but it appears to risk duplicating E1 and/or E2. We turn to this question next.

Join points

How can we gain the benefits of the case-of-case transformation without risking code duplication? A simple idea is to make local definitions for the right-hand sides of the outer case, like this:

  case (case S of {True -> R1; False -> R2}) of
    True  -> E1
    False -> E2
    ==>
  let e1 = E1; e2 = E2
  in case S of
       True  -> case R1 of {True -> e1; False -> e2}
       False -> case R2 of {True -> e1; False -> e2}

Now E1 and E2 are not duplicated, though we incur instead the cost of implementing the bindings for e1 and e2. In the not example, though, the two inner cases are eliminated, leaving only a single occurrence of each of e1 and e2, so their definitions will be inlined, leaving exactly the same result as before.

We certainly cannot guarantee that the newly-introduced bindings will be eliminated, though. Consider, for example, the expression

  if x || y then E1 else E2

Here || is the boolean disjunction operation, defined thus:

  a || b = case a of {True -> True; False -> b}

Desugaring the conditional and inlining || gives:

  case (case x of {True -> True; False -> y}) of
    True  -> E1
    False -> E2

Now, applying the (new) case-of-case transformation:

  let e1 = E1; e2 = E2
  in case x of
       True  -> case True of {True -> e1; False -> e2}
       False -> case y    of {True -> e1; False -> e2}

Unlike the not example, only one of the two inner cases simplifies, so only e2 will certainly be inlined, because e1 is still mentioned twice:

  let e1 = E1
  in case x of
       True  -> e1
       False -> case y of {True -> e1; False -> E2}

The interesting thing here is that e1 plays exactly the role of a label in conventional compiler technology. Given the original conditional, a C compiler will "short-circuit" the evaluation of the condition if x turns out to be True, generating code like:

      if (x) goto l1;
      if (y) goto l1;
      goto l2;
  l1: ...code for E1...; goto l3;
  l2: ...code for E2...
  l3: ...

Here, l1 is a label where two possible execution paths (if x is True, or if x is False and y is True) join up; we call it a "join point". That suggests, in turn, that our code generator should be able to implement the binding for e1, not by allocating a thunk as it would usually do, but rather by simply jumping to some common code (after perhaps adjusting the stack pointer) wherever e1 is subsequently evaluated. Our compiler does exactly this. Rather than somehow mark e1 as special, the code generator does a simple syntactic escape analysis to identify variables whose evaluation is certain to take place before the stack retreats, and implements their evaluation as a simple adjust-stack-and-jump. As a result, we get essentially the same code as a C compiler for our conditional.

Seen in this light, the act of inlining e2 is what a conventional compiler might call "jump elimination". A good C compiler would probably eliminate the jump to l2, thus:

      if (x) goto l1;
      if (y) goto l1;
  l2: ...code for E2...
  l3: ...
  l1: ...code for E1...; goto l3;

Back in the functional world, if E1 is small then the inliner might decide to inline e1 at its two occurrences regardless, thus eliminating a jump in favour of a slight increase in code size. Conventional compilers do this too, notably in the case where the code at the destination of a jump is just another jump, which would correspond in our setting to E1 being just a simple variable.

The point is not that these transformations achieve anything that conventional compiler technology does not, but rather that a single mechanism (inlining), which is needed anyway, deals uniformly with jump elimination as well as its more conventional effects.

Generalising join points

Does all this work generalise to data types other than booleans? At first one might think that the answer is "yes, of course", but in fact the modified case-of-case transformation is simply nonsense if the originally-outer case expression binds any variables. For example, consider the expression

  f (if b then B1 else B2)

where f is defined thus:

  f as = case as of {[] -> E1; (b:bs) -> E2}

Desugaring the if and inlining f gives:

  case (case b of {True -> B1; False -> B2}) of
    []     -> E1
    (b:bs) -> E2

But now, since E2 may mention b and bs, we cannot let-bind a new variable e2 as we did before. The solution is simple, though: simply let-bind a function e2 that takes b and/or bs as its arguments. Suppose, for example, that E2 mentions bs but not b. Then we can perform a case-of-case transformation thus:

  let e1 = E1; e2 = \bs -> E2
  in case b of
       True  -> case B1 of {[] -> e1; (b:bs) -> e2 bs}
       False -> case B2 of {[] -> e1; (b:bs) -> e2 bs}

All the inlining mechanism discussed above for eliminating the bindings for e1 and e2 (if possible) works just as before. Furthermore, even if e2 is not inlined, the code generator can still implement e2 efficiently: a call to e2 is compiled to a code sequence that loads bs into a register, adjusts the stack pointer, and jumps to the join point.

This goes beyond what conventional compiler technology achieves. Our join points can now be parameterised by arguments that embody the differences between the execution paths that led to that point. Better still, the whole setup works for arbitrary user-defined data types, not simply for booleans and lists.

Generalising case elimination

Earlier, we discussed the case-of-known-constructor transformation, which eliminates a case expression. There is a useful variant of this transformation that also eliminates a case expression. Consider the expression

  if null xs then r else tail xs

where null and tail are defined as you might expect:

  null as = case as of {[] -> True;         (b:bs) -> False}
  tail cs = case cs of {[] -> error "tail"; (d:ds) -> ds}

After the usual inlining we get

  case (case xs of {[] -> True; (b:bs) -> False}) of
    True  -> r
    False -> case xs of
               []     -> error "tail"
               (d:ds) -> ds

Now we can do the case-of-case transformation as usual, giving (after a few extra steps):

  case xs of
    []     -> r
    (b:bs) -> case xs of
                []     -> error "tail"
                (d:ds) -> ds

Now it is obvious that the inner evaluation of xs is redundant, because in the (b:bs) branch of the outer case we know that xs is certainly of the form (b:bs)! Hence we can eliminate the inner case, selecting the (d:ds) alternative, but substituting b for d and bs for ds:

  case xs of
    []     -> r
    (b:bs) -> bs
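
Schematically (our formulation of the rule), whenever an inner case scrutinises a variable already scrutinised by an enclosing case, the inner case can be replaced by the matching alternative, with the enclosing pattern's binders substituted for its own:

  case x of { ...; C v1 ... vn -> ... (case x of { ...; C w1 ... wn -> E; ... }) ... }
    ==>
  case x of { ...; C v1 ... vn -> ... E[v1/w1, ..., vn/wn] ... }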

We will see another application of this form of case elimination when we discuss unboxed data types below.

Summary

We have described a few of the most important transformations involving case expressions, but there are quite a few more, including case merging, dead alternative elimination, and default elimination. They are described in more detail by Santos, who also provides measurements of their frequency.

Like many good ideas, the case-of-case transformation (limited to booleans, but including the idea of using let-bound variables as join points) was incorporated in Steele's Rabbit compiler for Scheme (Steele). We re-invented it, and generalised it for case expressions and parameterised join points. Let-bound join points are also extremely useful when desugaring complex pattern matching. Lacking join points, most of the standard descriptions are complicated by a special FAIL value, along with special semantics and compilation rules, to express the "joining up" of several execution paths when a pattern fails to match (Augustsson; Peyton Jones).

Unboxed data types and strictness analysis

Consider the expression x+y, where x and y have type Int. Because Core is non-strict, x and y must each be represented by a pointer to a possibly-unevaluated object. Even if x, say, is already evaluated, it will still therefore be represented by a pointer to a boxed value in the heap. The addition operation must evaluate x and y as necessary, unbox them, add them, and box the result.

Exposing boxing to transformation

Where arithmetic operations are cascaded, we would like to avoid boxing the result of one operation only to unbox it immediately in the next. Similarly, in the expression x+x we would like to avoid evaluating and unboxing x twice. Such boxing/unboxing optimisations are usually carried out by the code generator, but it would be better to find a way to express them as program transformations.

We have achieved this goal as follows. Instead of regarding the data types Int, Float, and so on as primitive, we define them using ordinary algebraic data type declarations:

  data Int   = I# Int#
  data Float = F# Float#

Here, Int# is the truly-primitive type of unboxed integers, and Float# is the type of unboxed floats.[6] The constructors I# and F# are, in effect, the boxing operations. Now we can express the previously-primitive operation + thus:[7]

  (+) a b = case a of
              I# a# -> case b of
                         I# b# -> case a# +# b# of
                                    r# -> I# r#

where +# is the primitive addition operation on unboxed values. You can read this definition as "evaluate and unbox a, do the same to b, add the unboxed values giving r#, and return a boxed version thereof".

[6] The symbol # has no significance to the compiler; we use it simply as a lexical reminder that the identifier has an unboxed type, or takes arguments of unboxed type.

[7] You may wonder why we write case a# +# b# of { r# -> I# r# } instead of the more obvious I# (a# +# b#). The reason is that the latter is not a term in the Core language: constructor arguments must be atoms. Furthermore, the alternative let r# = a# +# b# in I# r# isn't legal either, because r# is an unboxed value and hence can't be heap-allocated by let. So the case expression is really just an eager let-binding.
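
This is not only a compiler-internal story: with GHC's MagicHash extension, the same definition can be written in source Haskell today. The following is a runnable analogue of the Core definition above (GHC.Exts exports Int#, the I# constructor, and +#):

  {-# LANGUAGE MagicHash #-}
  import GHC.Exts (Int (I#), Int#, (+#))

  plusInt :: Int -> Int -> Int
  plusInt a b =
    case a of
      I# a# -> case b of
        I# b# -> case a# +# b# of    -- eager, unboxed addition
          r# -> I# r#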

Now, simple transformations do the Right Thing to x+x. We begin by inlining + to give:

  case x of
    I# a# -> case x of
               I# b# -> case a# +# b# of
                          r# -> I# r#

But now the inner case can be eliminated (by the generalised case elimination described earlier), since it is scrutinising a known value, x, giving the desired outcome:

  case x of
    I# a# -> case a# +# a# of
               r# -> I# r#

Similar transformations, this time involving case-of-case, ensure that in expressions such as x+y*z the intermediate result is never boxed. The details are given by Peyton Jones & Launchbury, but the important points are these:

• By making the Core language somewhat more expressive (i.e. adding unboxed data types), we can expose many new evaluation and boxing operations to program transformation.

• Rather than a few ad hoc optimisations in the code generator, the full range of transformations can now be applied to the newly-exposed code.

• Optimising evaluation and unboxing may itself expose new transformation opportunities; for example, a function body may become small enough to inline.

Exploiting strictness analysis

Strictness analysers attempt to figure out whether a function is sure to evaluate its argument, giving the compiler the opportunity to evaluate the argument before the call, instead of building a thunk that is forced later on. There is an enormous literature on strictness analysis itself, but virtually none explaining how to exploit its results, apart from general remarks that the code generator can use it. Our approach is to express the results of strictness analysis as a program transformation, for exactly the reasons mentioned at the end of the previous section.

As an example, consider the factorial function with an accumulating parameter, which in Haskell might look like this:

  afac :: Int -> Int -> Int
  afac a 0 = a
  afac a n = afac (n*a) (n-1)

Translated into the Core language, it would take the following form:

  one = I# 1#

  afac a n = case n of
               I# n# -> case n# of
                          0# -> a
                          _  -> let a' = n*a
                                    n' = n - one
                                in afac a' n'

In a naive implementation, this function sadly uses linear space to hold a growing chain of unevaluated thunks for a.

Now suppose that the strictness analyser discovers that afac is strict in both its arguments. Based on this information, we split it into two functions, a wrapper and a worker, thus:

  afac a n = case a of
               I# a# -> case n of
                          I# n# -> afac# a# n#

  one = I# 1#

  afac# a# n# = let n = I# n#
                    a = I# a#
                in case n of
                     I# n# -> case n# of
                                0# -> a
                                _  -> let a' = n*a
                                          n' = n - one
                                      in afac a' n'

The wrapper, afac, implements the original function by evaluating the strict arguments and passing them unboxed to the worker, afac#. When it is created, the wrapper is marked as "always-inline-me", which makes the simplifier extremely keen to inline it at every call site, thereby effectively moving the argument evaluation to the call site.

The code for the worker starts by reconstructing the original arguments in boxed form, and then concludes with the original, unchanged code for afac. Re-boxing the arguments may be correct, but it looks like a weird thing to do, because the whole point was to avoid boxing the arguments at all! Nevertheless, let us see what happens when the simplifier goes to work on afac#. It just inlines the definitions of *, -, and afac itself, and applies the transformations described earlier. A few moments' work should convince you that the result is this:

  afac# a# n# = case n# of
                  0# -> I# a#
                  _  -> case n# *# a# of
                          a'# -> case n# -# 1# of
                                   n'# -> afac# a'# n'#

Bingo! afac# is just what we hoped for: a strict, constant-space, efficient factorial function. The re-boxing bindings have vanished, because a case elimination transformation has left them as dead code. Even the recursive call is made directly to afac#, rather than going via afac; it is worth noticing the importance of inlining the wrapper in the body of the worker, even though the two are mutually recursive. Meanwhile, the wrapper afac acts as an "impedance-matcher" to provide a boxed interface to afac#.
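
The same worker/wrapper pair can be written directly in source Haskell; here is a runnable sketch (ours), in which wafac plays the role of afac# above:

  {-# LANGUAGE MagicHash #-}
  import GHC.Exts (Int (I#), Int#, (*#), (-#))

  -- Wrapper: evaluates and unboxes both arguments, then calls the worker.
  afac :: Int -> Int -> Int
  afac (I# a#) (I# n#) = wafac a# n#

  -- Worker: loops entirely on unboxed values, in constant space.
  wafac :: Int# -> Int# -> Int
  wafac a# n# = case n# of
    0# -> I# a#
    _  -> wafac (n# *# a#) (n# -# 1#)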

A simple strictness and absence analyser

GHC uses a rather simple strictness analyser, the idea being to get a large fraction of the benefit of more sophisticated strictness analysis with a small fraction of the effort.[8] More specifically, GHC uses a simple higher-order abstract interpretation over a domain that includes just top, bottom, functions, and finite products. It is described by Peyton Jones & Partain, but we briefly review the main design choices in the rest of this section.

Fixpoints and widening

The main challenge in abstract interpretation is usually that of finding the fixpoints of recursive abstract functions. We take a simple approach: after each iteration we widen the abstract value so that it is easy to compare with the previous iteration. This results in a loss of accuracy, but it is fast and easy to implement.

The widening operator we use is this: given an abstract function, we find out whether it is strict in each of its arguments independently, and treat the vector of results as the widened approximation to the function. For example, consider the function

  f x y z = if x then f x z y else y

The zeroth approximation to the abstract value of f, f_0, is by definition bottom. We write this approximation in widened form thus: f_0 = SSS (the "S" stands for "strict"), to indicate that f_0 is strict in all three arguments. After one iteration we find that f_1 = SSL (the "L" stands for "lazy"), because z is not used in the else branch. After two iterations, using f_1 in the recursive call to f, we find that f_2 = SLL, and this turns out to be the fixpoint. There are well-known pitfalls with this approach (Clack & Peyton Jones), but they can be avoided, at the expense of accuracy, by the simple expedient of always using the widened function in the recursive call.
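
The iteration for f can be rendered as a toy Haskell program (ours; the step function is hand-derived from f's definition rather than computed by a general abstract interpreter):

  data Demand = S | L deriving (Eq, Show)

  -- Strict in an argument only if both branches of the 'if' are.
  andD :: Demand -> Demand -> Demand
  andD S S = S
  andD _ _ = L

  -- Given the widened approximation [dx,dy,dz], compute the next one.
  step :: [Demand] -> [Demand]
  step [_, d2, d3] =
    [ S              -- x: scrutinised by the 'if' itself
    , d3 `andD` S    -- y: third argument of the recursive call; used in 'else'
    , d2 `andD` L ]  -- z: second argument of the call; unused in 'else'

  fixpoint :: [Demand] -> [Demand]
  fixpoint d = let d' = step d in if d' == d then d' else fixpoint d'

  -- fixpoint [S,S,S] evaluates to [S,L,L], i.e. SLL, as in the text.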

A side benefit is that there is an obvious textual representation of the abstract value of a function, which we use for conveying strictness information between modules (see the discussion of separate compilation later in the paper).

In reality, we have found it essential to widen non-recursive functions as well, for a reason that is not initially obvious. Consider the following non-recursive definitions:

  f x = case x of
          []     -> ...
          (p:ps) -> ...

  g y = case y of
          C1 a b -> f a
          C2 c d -> f c
          C3 e h -> f e

[8] Of course, it is impossible to know whether one has achieved this goal without implementing a sophisticated analyser as well, which we have not done.

If we do not widen the abstract values for f and g, then consider what happens when the abstract interpreter finds a call such as (g a). It applies the abstract value of g to the abstract value of a; this application will take the least upper bound of the three case alternatives in g's right-hand side. Each of these three will evaluate a call to the abstract f, and each of these calls will take the least upper bound of f's two case alternatives. There is a multiplicative effect, in which every branch of every conditional reachable from a particular call is evaluated! In normal evaluation, of course, only one branch is taken from a conditional, but abstract interpretation in effect takes all of them. This turns out to be far too slow in practice. The solution is to trade accuracy for time: simply widen non-recursive functions as well as recursive ones. Once g is widened, its right-hand side is no longer evaluated at every call. For example, to apply the widened function value SSL, the abstract evaluator simply checks to see whether either of its first two arguments is bottom; if so it returns bottom, and otherwise top.
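
Continuing the sketch above, applying a widened value is just a scan over the strict positions (AbsVal is our two-point domain of abstract values):

  data AbsVal = Bot | Top deriving (Eq, Show)

  applyWidened :: [Demand] -> [AbsVal] -> AbsVal
  applyWidened ds vs
    | or [ v == Bot | (S, v) <- zip ds vs ] = Bot  -- some strict argument is bottom
    | otherwise                             = Top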

Products and absence analysis

Suppose we have the following function definition:

  f :: (Int,Int) -> Int
  f p = E

It is relatively easy for the strictness analyser to discover not only f's strictness in the pair p, but also f's strictness in the two components of the pair; it is for precisely this reason that the abstract domain includes finite products. For example, suppose that the strictness analyser discovers that f is strict both in p and in the first component of p, but not in the second. Given this information, we can transform the definition of f into a worker and a wrapper, like this:

  f p = case p of
          (x,y) -> case x of
                     I# x# -> f# x# y

  f# x# y = let x = I# x#
                p = (x,y)
            in E

The pair is passed to the worker unboxed (i.e. the two components are passed separately), and so is the first component of the pair.

We soon learned that "looking inside" non-recursive data structures in this way exposed a new opportunity: absence analysis. What if f does not use the second component of the pair at all? Then it is a complete waste of time to pass y to f# at all. Whilst it is unusual for programmers to write functions with arguments that are completely unused, it is rather common for them to write functions that do not use some parts of their arguments. We therefore perform both strictness analysis and absence analysis, and use the combined information to guide the worker/wrapper split.

Matters are more complicated if the argument type is recursive, or has more than one constructor. In such cases we revert to the simple two-point abstract domain.

Notice the importance of type information to the whole endeavour: the type of a function guides the "resolution" of the strictness analysis and the worker/wrapper splitting.

Strict let bindings

An important, but less commonly discussed, outcome of strictness analysis is that it is possible to tell whether a let binding is strict; that is, whether the variable bound by the let is sure to be evaluated in the body. If so, there is no need to build a thunk. Consider the expression

  let x = R in E

where x has type Int, and E is strict in x. Using a similar strategy to the worker/wrapper scheme, we can transform to

  case R of { I# x# -> let x = I# x# in E }

We call this the let-to-case transformation. As before, the re-boxing binding for x will usually be eliminated by subsequent transformation. If x has a recursive or multi-constructor type, then we transform instead to this:

  case R of { x -> E }

This expression simply generates code to evaluate R, bind the (boxed) result to x, and then evaluate E. This is still an improvement over the original let expression, because no thunk is built.
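
For instance (our example, taking R = g y and E = x+1, where g returns an Int), the let-to-case transformation and the subsequent simplifications proceed like this:

  let x = g y in x+1
    ==> case g y of { I# x# -> let x = I# x# in x+1 }
    ==> case g y of { I# x# -> case x# +# 1# of r# -> I# r# }   (inline x and +, then simplify)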

Code motion

Consider the following expression:

  let v = let w = R
          in w+1
  in B

A semantically-equivalent expression, which differs only in the positioning of the binding for w, is this:

  let w = R
  in let v = w+1
     in B

While the two expressions have the same value, the second is likely to be more efficient to evaluate than the first. (We will say why this is so when we discuss local transformations below.) A good compiler should transform the first expression into the second. However, the difference in efficiency is modest, and the transformation between the two seems almost too easy to merit serious study; as a result, not much attention has been paid to transformations of this kind. We call them "let-floating" transformations, because they concern the exact placement of let or letrec bindings (in the example, it is the binding for w which is floated from one place to another). They correspond closely to code motion in conventional compilers.

We have found it useful to identify three distinct kinds of let-floating transformations:

• Floating inwards moves bindings as far inwards as possible.

• The full laziness transformation floats selected bindings outside enclosing lambda abstractions.

• Local transformations "fine-tune" the location of bindings.

After describing the three transformations, we describe how they are implemented in GHC; later in the paper we quantify their effectiveness. More detailed measurements are presented in Peyton Jones, Partain & Santos.

Floating inwards

The floating-inward transformation is based on the following observation: other things being equal, the further inward a binding can be moved, the better. For example, consider

  let x = y+1
  in case z of
       []     -> x*x
       (p:ps) -> ...

Here, the binding for x is used in only one branch of the case, so it can be moved into that branch:

  case z of
    []     -> let x = y+1
              in x*x
    (p:ps) -> ...

Moving the binding inwards has at least three distinct benefits:[9]

+ The binding may never be "executed". In the example, z might turn out to be of the form (p:ps), in which case the code which deals with the binding for x is not executed. Before the transformation, a thunk for x would be allocated regardless of the value of z.

+ Strictness analysis has a better chance. It is more likely that, at the point at which the binding is now placed, it is known that the bound variable is sure to be evaluated. This in turn may enable other strictness-related transformations to be performed. In our example, instead of allocating a thunk for x, GHC will simply evaluate y, increment it, and square the result, allocating no thunks at all (using the let-to-case transformation described earlier).

+ Redundant evaluations may be eliminated. It is possible that the RHS will "see" the evaluation state of more variables than before. To take a similar example:

    let x = case y of (a,b) -> a
    in
    case y of
      (p,q) -> x+p

  If the binding of x is moved inside the case branch, we get:

    case y of
      (p,q) -> let x = case y of (a,b) -> a
               in
               x+p

  Now the compiler can spot that the inner case for y is in the RHS of an enclosing case which also scrutinises y. It can therefore eliminate the inner case, to give:

    case y of
      (p,q) -> p+p

[9] We indicate advantages with "+" and disadvantages with "-"; the symbol "o" indicates moot points.

The first two benefits may also accrue if a binding is moved inside the RHS of another binding. For example, floating inwards would transform

  let x = v+w
      y = x+x
  in
  B

(where B does not mention x) into

  let y = let x = v+w in x+x
  in
  B

The alert reader will notice that this transformation is precisely the opposite of that given at the start of this section on code motion, a point we return to when discussing local transformations below. This example also illustrates another, minor, effect of moving bindings around:

o Floating can change the size of the thunks allocated. Recall that, in our implementation, each let(rec) binding allocates a heap object that has one slot for each of its free variables. The more free variables there are, the larger the object that is allocated. In the example, floating x into y's RHS removes x from y's free variables, but adds v and w. Whether y's thunk thereby becomes bigger or smaller depends on whether v and/or w were already free in y.

So far we have suggested that a binding can usefully be floated inward "as far as possible", that is, to the point where it can be floated no further in while still keeping all the occurrences of its bound variable in scope. There is an important exception to this rule: it is dangerous to float a binding inside a lambda abstraction, as we discussed when considering inlining. If the abstraction is applied many times, each application will instantiate a fresh copy of the binding. Worse, if the binding contains a reducible expression, the latter will be re-evaluated each time the abstraction is applied.

The simple solution is never to float a binding inside a lambda abstraction, and that is what our compiler currently does (although we plan in future to use guidance from the linear-type information discussed earlier). But what if the binding is inside the abstraction to start with? We turn to this question next.

Full laziness

Consider the definition

  f xs = letrec
           g y = let n = length xs
                 in ...g...n...
         in
         ...g...

Here, the length of xs will be recomputed on each recursive call to g. This recomputation can be avoided by simply floating the binding for n outside the \y-abstraction:

  f xs = let n = length xs
         in
         letrec
           g y = ...g...n...
         in
         ...g...

This transformation is called full laziness. It was originally invented by Hughes (Hughes; Peyton Jones), who presented it as a variant of the supercombinator lambda-lifting algorithm. Peyton Jones & Lester subsequently showed how to decouple full laziness from lambda lifting, by regarding it as an exercise in floating letrec bindings outwards.

Whereas the float-in transformation avoids pushing bindings inside lambda abstractions, the full laziness transformation actively seeks to do the reverse, by floating bindings outside an enclosing lambda abstraction.

The full laziness transformation can save a great deal of repeated work, and it sometimes applies in non-obvious situations. One example we came across in practice is part of a program which performed the Fast Fourier Transform (FFT). The programmer wrote a list comprehension similar to the following:

  xs_dot = map do_cos [k * thetas n | k <- [1..n]]

What he did not realise is that the expression (thetas n) was recomputed for each value of k! The list comprehension (syntactic sugar) was translated into the Core language, where the (thetas n) appeared inside a function body. The full laziness transformation lifted (thetas n) out past the lambda, so that it was only computed once.

A potential shortcoming of the full laziness transformation, as so far described, is this: it seems unable to float out an expression that is free in a lambda abstraction but is not letrec-bound. For example, consider

  f = \x -> case x of
              []     -> g y
              (p:ps) -> ...

Here, the sub-expression (g y) is free in the \x-abstraction, and might potentially be an expensive computation which could be shared among all applications of f. It is simple enough in principle to address this shortcoming, by simply let-binding (g y), thus:

  f = \x -> case x of
              []     -> let a = g y in a
              (p:ps) -> ...

Now the binding for a can be floated out like any other binding.

The full laziness transformation may give rise to large gains, but at the price of making worse all the things that floating inwards makes better. Hence, the full laziness transformation should only be applied when there is some chance of a benefit. For example, it should not be used if either of the following conditions holds:

• The RHS of the binding is already a value, or reduces to a value with a negligible amount of work. If the RHS is a value, then no work is saved by sharing it among many invocations of the same function (though some allocation may be saved).

• The lambda abstraction is applied no more than once, information that should be made available by the linear type inference system discussed earlier.

There is a final disadvantage to full laziness, which is much more slippery: it may cause a space leak. Consider

  f = \x -> let a = enumerate n in B

where (enumerate n) returns the list of integers between 1 and n. Is it a good idea to float the binding for a outside the \x-abstraction? Certainly, doing so would avoid recomputing a on each call of f. On the other hand, a is pretty cheap to recompute, and if n is large the list might take up a lot of store. It might even turn a constant-space algorithm into a linear-space one, or even worse!

In fact, as our measurements show, space leaks do not seem to be a problem for real programs. We are, however, rather conservative about floating expressions to the top level, where for tiresome reasons they are harder to garbage collect.

Local transformations

The third set of transformations consists of local rewrites, which fine-tune the placement of bindings. There are just three such transformations:

  (let v = R in B) A          ==>  let v = R in (B A)

  case (let v = R in B) of    ==>  let v = R
    ...                            in case B of
                                        ...

  let x = let v = R in R'     ==>  let v = R
  in B                             in let x = R'
                                      in B

Each of the three has an exactly equivalent form when the binding being floated outwards is a letrec. The third also has a variant when the outer binding is a letrec: in this case, the binding being floated out is combined with the outer letrec to make a larger letrec. Subsequent dependency analysis (discussed below, under composing the pieces) will split up the enlarged group again if it is possible to do so.

The first two transformations are always beneficial. They do not change the number of allocations, but they do give other transformations more of a chance. For example, the first, which we call let-float-from-application, moves a let outside an application. Doing so cannot make things worse, and sometimes makes things better: for example, B might be a lambda abstraction which can then be applied to A. The second, let-float-from-case, floats a let(rec) binding outside a case expression, which might improve matters if, for example, B was a constructor application.

The third transformation the letoatfromlet transformation which oats a letrec binding

from the RHS of another letrec binding is more interesting It has the following advantages

- Floating a binding out may reveal a WHNF. For example, consider the expression

    let x = let v = R in (v,v)
    in B

  When this expression is evaluated, a thunk will be allocated for x. When (and if) x is evaluated by B, the contents of the thunk will be read back into registers, its value (the pair (v,v)) computed, and the heap-allocated thunk for x overwritten with the pair.

  Floating the binding for v out would instead give

    let v = R
        x = (v,v)
    in B

  When this expression is evaluated, a thunk will be allocated for v and a pair for x. In other words, x is allocated in its final form. No update will take place when x is evaluated: a significant saving in memory traffic.

- There is a second reason why revealing a normal form may be beneficial: B may contain a case expression which scrutinises x, thus:

    case x of (p,q) -> E

  Now that x is revealed as being bound to the pair (v,v), this expression is easily transformed to

    E[v/p, v/q]

  using the case-of-known-constructor transformation.

- Floating v's binding out may reduce the number of heap-overflow checks. A heap-overflow check is necessary before each sequence of let(rec) bindings, to ensure that a large enough contiguous block of heap is available to allocate all of the bindings in the sequence. For example, the expression

    let v = R
        x = (v,v)
    in B

  requires a single check to cover the allocation of both v and x. On the other hand, if the definition of v is nested inside the RHS of x, then two checks are required.

These advantages are all very well, but the let-float-from-let transformation also has some obvious disadvantages; after all, it was precisely the reverse of this transformation which we advocated when discussing the floating-inward transformation. Specifically, there are two disadvantages:

- If x is not evaluated, then an unnecessary allocation for v would be performed. However, the strictness analyser may be able to prove that x is sure to be evaluated, in which case the let-float-from-let transformation is always beneficial.

- It is less likely that the strictness analyser will discover that v is sure to be evaluated. This suggests that the strictness analyser should be run before performing the let-float-from-let transformation.

Given these conflicting trade-offs, there seem to be four possible strategies for local let-floating:

Never: no local let-floating is performed at all.

Strict: bindings are floated out of strict contexts only, namely applications, case scrutinees, and the right-hand sides of strict lets. (A strict let is one whose bound variable is sure to be evaluated by the body of the let.)

WHNF: like Strict, but in addition a binding is floated out of a let(rec) right-hand side if doing so would reveal a WHNF.

Always: like Strict, but in addition any binding at the top of a let(rec) right-hand side is floated out.

We explore the practical consequences of these four strategies in the measurements reported below.

Composing the pieces

We have integrated the three let-floating transformations into GHC. The full laziness and float-inwards transformations are implemented as separate passes. In contrast, the local let-floating transformations are implemented by the simplifier. Among the transformations performed by the simplifier is dependency analysis, which splits each letrec binding into its minimal strongly-connected components. Doing this is sometimes valuable, because it lets the resulting groups be floated independently.
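As a small, hypothetical illustration, consider a letrec in which only part of the group is genuinely recursive:

    letrec x = ...y...
           y = ...x...
           z = ...x...
    in B
      ===>
    letrec x = ...y...
           y = ...x...
    in let z = ...x...
       in B

The binding for z is not part of the recursive loop, so dependency analysis splits it off, after which it can be floated independently of the x/y group.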

We perform the transformations in the following order:

1. Do the full laziness transformation.

2. Do the float-inwards transformation. This won't affect anything floated outwards by full laziness: any such bindings will be parked just outside a lambda abstraction.

3. Perform strictness analysis.

4. Do the float-inwards transformation again.

Between each of these passes the simplifier is applied.

We do the float-inwards pass before strictness analysis because it helps to improve the results of strictness analysis. The desirability of performing the float-inwards transformation again, after strictness analysis, surprised us. Consider the following function:

    f x y = if y == 0
            then error ("Divide by zero: " ++ show x)
            else x / y

The strictness analyser will find f to be strict in x (because a call to error is semantically equivalent to bottom), and hence x will be passed to f in unboxed form. However, the then branch needs x in boxed form, to pass to show. The post-strictness float-inwards transformation floats a binding that re-boxes x into the appropriate branches of any conditionals in the body of f, thereby avoiding the overhead of re-boxing x in the common case of taking the else branch.
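Here is a minimal sketch of the shape of the resulting code, in GHC's unboxed-value notation (the worker f' and the variable names are illustrative, not GHC's actual output):

    f' = \x# y -> let x = I# x# in          -- x re-boxed once, up front
                  case y == 0 of
                    True  -> error ("Divide by zero: " ++ show x)
                    False -> ...x#...y...

    -- after the post-strictness float-inwards pass:
    f' = \x# y -> case y == 0 of
                    True  -> let x = I# x#  -- re-boxing confined to this branch
                             in error ("Divide by zero: " ++ show x)
                    False -> ...x#...y...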

The implementation of the float-in transformation and local let-floating is straightforward, but the full laziness transformation has a few subtleties, as we discuss next.

Implementing full laziness

We use a two-pass algorithm to implement full laziness:

1. The first pass annotates each let(rec) binder with its level number. (In fact all binders are annotated, but only those on let(rec)s are subsequently looked at.) In general, level numbers are defined like this:

   - The level number of a let-bound variable is the maximum of the level numbers of its free variables and its free type variables.

   - The level number of a letrec-bound variable is the maximum of the level numbers of the free variables of all the RHSs in the group, less the letrec-bound variables themselves.

   - The level number of a lambda-bound variable is one more than the number of enclosing lambda abstractions.

   - The level number of a case- or type-lambda-bound variable is the number of enclosing ordinary lambda abstractions.

2. The second pass uses the level numbers on let(rec)s to float each binding outward, to just outside the lambda which has a level number one greater than that on the binding.

Notice that a binding is floated out just far enough to escape all the lambdas which it can escape, and no further; this is consistent with the idea that bindings should be as far in as possible. There is one exception: bindings with level number zero are floated right to the top level. Notice too that a binding is not moved at all unless it will definitely escape a lambda.
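The following is a minimal sketch of the level-numbering idea, for a toy Core with ordinary lambdas and non-recursive lets only; the data type and names are ours, not GHC's:

    import qualified Data.Set as S

    type Level = Int

    data Expr = Var String
              | App Expr Expr
              | Lam String Expr          -- ordinary lambda
              | Let String Expr Expr     -- let v = rhs in body

    freeVars :: Expr -> S.Set String
    freeVars (Var v)     = S.singleton v
    freeVars (App f a)   = freeVars f `S.union` freeVars a
    freeVars (Lam x b)   = S.delete x (freeVars b)
    freeVars (Let v r b) = freeVars r `S.union` S.delete v (freeVars b)

    -- The level of a let binder is the maximum level of its RHS's free
    -- variables.  The environment records a level for every in-scope
    -- variable: depth+1 for a variable bound under 'depth' enclosing
    -- lambdas, or a previously computed level for a let binder.
    binderLevel :: [(String, Level)] -> Expr -> Level
    binderLevel env rhs = maximum (0 : [ l | (v, l) <- env, v `S.member` fvs ])
      where fvs = freeVars rhs

The second pass would then carry each annotated binding outwards until the next enclosing lambda has a strictly greater level number, or to the top level if the binding's level is zero.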

This algorithm is largely as described by Peyton Jones & Lester, but there are a few complications in practice. Firstly, type variables are a nuisance. For example, suppose that f and k are bound outside the following \x-abstraction:

    \x -> /\a -> ...(let v = f a k in ...)...

We'd like to float out the v = f a k binding, but we can't, because then the type variable a would be out of scope. The rules above give a the same level number as x (assuming there are no intervening lambdas), which will ensure that the binding isn't floated out of a's scope. Still, there are some particularly painful cases, notably pattern-matching failure bindings such as:

    fail = error a "Pattern fail"

We really would like this to get lifted to the top level, despite its free type variable a. There are two approaches: ignore the problem of out-of-scope type variables, or fix it up somehow. We take the latter approach, using the following procedure: if a binding v = e has free type variables whose maximum level number is strictly greater than that of the ordinary variables, then we abstract over the offending type variables a1..an, thus:

    v = let v' = /\a1..an -> e in v' a1..an

Now v is given the usual level number, taking type variables into account, while v' is given the maximum level number of the ordinary free variables only (the type variables a1..an are no longer free in v''s right-hand side).

The reason this is a bit half-baked is that some subsequent binding might mention v; in theory it too could be floated out, but it will get pinned inside the binding for v. It is the binding for v' which floats. Still, our strategy catches the common cases.
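Applied to the fail binding above, the procedure gives (a sketch):

    fail = let fail' = /\a -> error a "Pattern fail"
           in fail' a

fail' has no free type variables (and no ordinary free variables apart from the top-level error), so it gets level number zero and floats to the top level, while fail itself stays put.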

The second complication is that there is a penalty associated with floating a binding between two adjacent lambdas. For example, consider the binding

    f = \x -> \y -> ...(let v = length x in ...)...

It would be possible to float the binding for v between the lambdas for x and y, but the result would be two functions of one argument instead of one function of two arguments, which is less efficient. There would be a gain only if a partial application of f to one argument was applied many times. Indeed, our measurements indicate that allowing lambdas to be split in this way resulted in a significant loss of performance (see Santos's thesis for the figures; we do not present them here). Our pragmatic solution is therefore to treat the lambdas for x and y as a single lambda group, and to give a single level number to all the variables bound by a group. As a result, lambda groups are never split.

The third complication is that we are paranoid about giving bindings a level number of zero, because that will mean they float right to the top level, where they might cause a space leak. In our implementation, all top-level values are retained for the whole life of the program. It would be possible for the garbage collector to figure out which of them cannot be referred to again, and hence which could safely be garbage collected, but doing so adds complexity and slows both the mutator and the garbage collector. We use several heuristics which sometimes decide, conservatively, to leave a binding exactly where it is. If this happens, rather than giving the binding level number zero, it is given a level number equal to the number of enclosing lambdas, so that it will not be moved by the second pass.

Other GHC transformations

We have focused so far on particular aspects of GHC's transformation system. This section briefly summarises the other main transformations performed by GHC.

The simplifier contains many more transformations than those described in the preceding sections. A full list can be found in Peyton Jones & Santos and in Santos's thesis; the latter also contains detailed measurements of the frequency and usefulness of each transformation.

The specialiser uses partial evaluation to create specialised versions of overloaded functions, using much the same technique as that described by Jones.

Eta expansion is an unexpectedly useful transformation (see Gill's thesis). We found that other transformations sometimes produce expressions of the form:

    let f = \x -> (let ... in \y -> E)
    in B

If f is always applied to two arguments in B, then we can W-safely (that is, without risk of duplicating work) transform the expression to:

    let f = \x y -> (let ... in E)
    in B

It turns out that a lambda abstraction that binds multiple arguments can be implemented much more efficiently than a nested series of lambdas. The most elegant way to achieve the transformation is to perform an eta-expansion (the opposite of eta reduction) on f's right-hand side:

    \x -> R   ===>   \x -> \a -> R a

Once that is done, normal beta reduction will make the application to a cancel with the \y, to give the desired overall effect.
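Spelling the steps out on the example above (a is a fresh variable):

    \x -> (let ... in \y -> E)
      ===>  \x -> \a -> (let ... in \y -> E) a    -- eta-expand
      ===>  \x -> \a -> let ... in (\y -> E) a    -- let-float-from-application
      ===>  \x -> \a -> let ... in E[a/y]         -- beta-reduce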

The crucial question is this: when is eta expansion guaranteed to be W-safe? Unsurprisingly, this turns out to be yet another fruitful application of the linear type system sketched earlier.

Deforestation is a transformation that removes intermediate lists (Wadler). For example, in the expression sum (map double xs), an intermediate list (map double xs) is created, only to be consumed immediately by sum. Successful deforestation removes this intermediate list, giving a single-pass algorithm that traverses the list xs, doubling each element before adding it to the total.

Full-blown Wadler-style deforestation for higher-order programs is difficult; the only example we know of is described by Marlow, and even that does not work for large programs. Instead, we developed a new, more practical, technique called short-cut deforestation (Gill, Launchbury & Peyton Jones). As the name implies, our method does not remove all intermediate lists, but in exchange it is relatively easy to implement. Gill describes the technique in detail and gives measurements of its effectiveness: even on programs written without deforestation in mind, the transformation reduces execution time by a few percent, averaged over a range of programs. This is a rather disappointing result, but we believe that there is potential for improving it considerably.

Deforestation also allows the desugaring of list comprehensions to be simplified considerably, moving a group of optimisations from the desugarer to the deforester; the details are in Gill, Launchbury & Peyton Jones. A sketch of the short-cut rule itself follows.
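To make the short cut concrete, here is a minimal runnable rendering of the build/foldr rule (the names mapB and double are ours):

    {-# LANGUAGE RankNTypes #-}

    -- build abstracts a list over its cons and nil...
    build :: (forall b. (a -> b -> b) -> b -> b) -> [a]
    build g = g (:) []

    -- ...so that the single rewrite rule of short-cut deforestation,
    --      foldr k z (build g)  ===>  g k z
    -- is valid: parametricity guarantees that g can do nothing with the
    -- list constructors except apply them.

    mapB :: (a -> b) -> [a] -> [b]          -- map in build/foldr form
    mapB f xs = build (\c n -> foldr (c . f) n xs)

    double :: Int -> Int
    double x = 2 * x

    -- If sum is written as foldr (+) 0, then sum (mapB double xs)
    -- rewrites to foldr ((+) . double) 0 xs: one pass, no intermediate list.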

Lambda lifting is a well-known transformation that replaces local function declarations with global ones, by adding their free variables as extra parameters (Johnsson). For example, consider the definition:

    f = \x -> letrec g = \y -> ...x...y...g...
              in ...g...

Here x is free in the definition of g. By adding x as an extra argument to g we can transform the definition to:

    f  = \x -> ...(g' x)...
    g' = \x -> \y -> ...x...y...(g' x)...

Some back ends require lambda-lifted programs. Our code generator can handle local functions directly, so lambda lifting is not required. Even so, it turns out that lambda lifting is sometimes beneficial, while on other occasions the reverse is the case: the exact opposite of lambda lifting, lambda dropping (also known as the static argument transformation), sometimes improves performance. Santos's thesis discusses the trade-off in detail. GHC implements both lambda lifting and the static argument transformation; each buys only a small performance gain, a percentage point or two on average.
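For the example above, the static argument transformation runs the derivation in the opposite direction: since g''s first argument x never changes between recursive calls, it can be captured by an enclosing scope instead:

    g' = \x -> \y -> ...x...y...(g' x)...
      ===>
    g' = \x -> letrec g = \y -> ...x...y...g...
               in g

Each recursive call then passes one fewer argument, at the cost of allocating a closure for the local g.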

Separate compilation

GHC makes a serious attempt to propagate transformations across modules. When a module M is compiled, as well as producing M's object code, the compiler also emits M's interface file, which contains, inter alia, three sorts of information:

Scope information: the names and defining module of each class, type, and value exported by M. This tells an importing module what names are exported by M, and hence are brought into scope by importing M.

Type information: the type declarations of the classes, types, and values defined in M, whether exported or not. (Haskell allows an exported value to have a type mentioning type constructors that are not exported, and the compiler needs access to the latter.)

Implementation information: for each value defined in M, the interface file gives:

- its definition in the Core language, if it is smaller than some fixed threshold (this allows it to be inlined at call sites in other modules);

- if it is overloaded, which specialised instances of the value have been compiled;

- its strictness information;

- its linearity information;

- its arity (how many arguments it takes).

Providing such implementation information allows an importing module to take advantage of detailed knowledge about M, but of course it also increases the coupling between modules: if M is recompiled and some of this implementation information changes, then each module that imports M must be recompiled too. We leave the choice to the programmer: implementation information is written into interface files if and only if the -O flag is specified when invoking the compiler. In this way the programmer can trade compilation time for run-time efficiency.

Effect of the transformations

Whilst every transformation we have described was motivated by particular examples, it is far from obvious that each will deliver measurable performance gains when applied to 'average' programs. In this section we present quantitative measurements of some of the most important transformations described above.

Setup

We measured the effect of our transformations on a sample of programs from our nofib test suite (Partain); the exact number varied between experiments, which were carried out over an extended period. Many of these programs are 'real' applications; that is, application programs written by someone other than ourselves to solve a particular problem. None was designed as a benchmark, and they range in size from a few hundred to a few thousand lines of Haskell. Our results are emphatically not best-case results on toy programs.

It is difficult to present the effect of many interacting transformations in a modular way. If we measure the effect of switching them on one at a time, we risk being either over-optimistic (because an otherwise-unoptimised program is a very soft target) or over-pessimistic (because one transformation relies on another to exploit its effects). Switching them off one at a time suffers from the opposite objections, but at least it faithfully indicates the cost of omitting that transformation from a production compiler.

Accordingly, most results are given as percentage changes from the base case, in which all transformations are enabled. A value greater than 100% means more than the base case; less than 100% means less than the base case. Whether that is good or bad depends on what is being measured.

Space precludes listing the results for each individual program. Instead we report just the average effect, where for 'average' we use the geometric mean, since we are averaging performance ratios (Fleming & Wallace).
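That is, for performance ratios r1..rn relative to the base case, we report the n-th root of their product. A one-line Haskell rendering (the name geomean is ours):

    geomean :: [Double] -> Double
    geomean rs = product rs ** (1 / fromIntegral (length rs))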

We concentrate on the following measures

Instruction count: how many instructions are taken to execute the program. This measure is independent of cache locality and paging, which on today's architectures can sometimes dominate all other effects put together. Nevertheless, it is a more portable measure, and our wall-clock-time measurements (which we made as a sanity check) mostly track the instruction-count measure in practice.

Heap allocation: how much heap is allocated by the program during its execution. If an optimisation reduces allocation by a larger factor than instruction count, it is likely that a smaller proportion of instructions are memory cycles, and hence that execution time may decrease by a larger fraction than the instruction count. The reverse is also true, of course.

Maximum residency: the maximum amount of live data during execution. This number directly affects the cost of garbage collection, and is the best measure of the space consumption of a program. The residency numbers were gathered by sampling the amount of live data at frequent intervals, using the garbage collector; frequent sampling means that any spikes in live memory usage are unlikely to be missed.

Binary size: the size of the compiled code, excluding the symbol table.

Compile time

For each set of measurements we recompiled the standard prelude and libraries with the specified set of transformations enabled, so that the results reflect the effect on the entire program, not only on the application part of it.

Like all quantitative measurements (and especially average measurements), the numbers we present should not be read uncritically. In particular:

- The averages often conceal large individual variations. It is not uncommon to find that a particular transformation has a small effect on most programs, but a dramatic effect on a few.

- Gathering these measurements is a substantial exercise, taking gigabytes of disc space and weeks of CPU time. The set of benchmark programs was not the same in every case (hence the varying sample sizes), nor were all the measures collected in every experiment.

Nevertheless, we believe that the figures present a reasonably truthful picture of the relative importance of the different transformations. The sources we cite elsewhere in this paper give much more detailed breakdowns of many of the figures we summarise here.

Overall gains from transformation

[Figure: Overall effect of transformations. Bar chart of average change, relative to the base case in which all optimisations are enabled, in instructions executed, heap allocated, binary size, compilation time, and execution time, for the configurations None, Minimal, and Simplifier-only.]

The overall gains from transformations are presented in the figure above. The x-axis represents various compilation options, each compared to a baseline in which all optimisations are enabled:

None: no transformations at all. It might be argued that this makes the transformations look unreasonably effective, because the desugarer is written on the assumption that a subsequent simplification pass will clear up much of its litter.

Minimal: approximates the effect of a more plausible desugarer with no further transformations. We approximated this setup by performing a single, non-iterative, run of the simplifier with most transformations disabled; the only important transformations that remain are beta reduction and the inlining of trivial bindings (ones that bind a variable to another variable or literal).

Simplifier only: all the global transformations are switched off, leaving only a full run of the simplifier, iterated up to a fixed maximum number of times (although this limit was never reached).

Overall, switching off all transformations increases instruction count dramatically. The fairer Minimal case still increases instruction count a great deal, while switching off everything except the simplifier costs much less. The minimum compilation time (less than half that of full optimisation) is achieved by the Simplifier-only case, so this is what GHC uses when compiling without the -O flag.

The following sections investigate individual transformations in more detail

The simplifier

It does not make sense to measure the effect of switching the simplifier off while leaving all the other transformations on, since they all rely on the simplifier to clean up after them and to exploit their effects. In effect, the simplifier is part of every transformation, so it cannot sensibly be measured in isolation.

[Figure: Frequency of transformations. Pie chart of the relative frequency of the most common transformations: let-float-from-let 32%, inlining 17%, beta-reduction 13%, case-of-known-constructor 10%, let-float-from-application 9%, case-of-case 8%, with let-to-case, let-float-from-case, and others making up the remainder.]

The simplifier implements a large number of separate transformations, and it certainly makes sense to ask how often each is used. The figure above answers this question, giving the relative frequency of the most common transformations.

We did not include in the transformation counts the following two transformations, which would otherwise dominate the pie chart:

- dead code elimination (unused bindings and unreachable case alternatives);

- inlining of trivial bindings (ones that bind variables to other variables or literals).

These two transformations almost always occur as a by-product of some other transformation (which is thereby made easier to implement), so their frequency is mostly a consequence of the implementation strategy of other transformations.

[Figure: Turning off individual transformations. Average change in instructions executed and heap allocated when case-of-case, escape analysis, and eta expansion are each switched off.]

We also measured the effect of turning off individual transformations, one at a time, relative (as always) to the full-optimisation base case. The figure above shows three interesting cases; let-floating is dealt with separately below.

- Switching off the case-of-case transformation increases instruction count substantially.

- An earlier section described how to use let bindings to express join points, asserting that a simple analysis in the code generator suffices to identify these special join-point bindings. The second column in the figure shows that the analysis is not in fact very important: switching it off gives only a small increase in instruction count, albeit with a larger increase in heap allocation.

- Eta expansion has a substantial individual effect: switching it off costs noticeably in both instructions and allocation.

Inlining

The effect of inlining is summarised in the figure below. The x-axis is calibrated by the following inlining strategies:

Off: inlining is turned off, except for trivial bindings of variables to variables or literals.

One occ: trivial bindings, and variables or functions that occur only once in a W-safe context,

are inlined.

[Figure: Effect of inlining strategies. Average change, relative to the base case, in instructions executed, heap allocated, binary size, compilation time, and number of functions inlined, for the strategies Off, One occ, and Threshold(n) for n = 0, 1, 2, 4, 8, 16, 32.]

Threshold(n): any non-recursive binding is inlined if it is W-safe to do so, and either it occurs just once or its space penalty (discussed earlier) is less than the given threshold n.
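A minimal sketch of the Threshold(n) decision, with hypothetical names:

    -- wSafe:   inlining here cannot duplicate work
    -- occs:    number of occurrences of the binder
    -- penalty: the space penalty of inlining the RHS
    shouldInline :: Bool -> Int -> Int -> Int -> Bool
    shouldInline wSafe occs penalty n = wSafe && (occs == 1 || penalty < n)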

As well as the usual measures (instruction count, heap allocation, compilation time), we also show the number of functions that are actually inlined. These graphs show the following effects:

- The bigger effects come directly from inlining variables and functions with one occurrence; reasonable gains then continue up to a modest threshold, beyond which further gains are minimal. Although many more functions are inlined with larger thresholds, this is not reflected in the number of instructions executed; i.e. we quickly reach a point where more inlining is almost useless.

- Binary size remains virtually unaltered, which means that most of the larger functions being inlined do not occur many times in the program.

- Compilation time actually decreases initially, since we end up with less to do in later phases of the compiler. With the highest inline threshold we measured, compilation takes appreciably longer than in the One occ case.

Strictness analysis

The effect of strictness analysis is shown in the figure below: disabling strictness analysis increases execution time considerably, and the number of objects allocated in the heap is also a lot higher, since we will have many more lets.

[Figure: Strictness analysis. Average change in execution time and objects allocated when strictness analysis is turned off.]

Code motion

As mentioned earlier, we identify three kinds of let-floating: floating inwards, full laziness, and local let-floating. The figure below presents the average effects of these let-floating transformations, relative to the fully-optimised base case.

FI off presents the effect of turning off the float-inwards transformation; this increases instruction count by a modest amount.

FL off presents the effect of turning off the full laziness transformation; results here are somewhat variable, but the average penalty is clear.

Local LF off presents the effect of turning off all local let-floating; this too carries a noticeable cost.

All Floating off presents the effect of turning off all let-floating, i.e. floating inwards, full laziness, and all kinds of local let-floating. The total penalty in instruction count is, perhaps surprisingly, about equal to the sum of the three individual effects, and it is substantial. The execution-time sanity check bears this out, showing a corresponding improvement from let-floating.

The figure also shows the effect of the four variants of local let-floating described earlier. In the baseline case (full optimisation) we use the WHNF strategy, because that appears to give the best results. The other three strategies (Never, Strict, and Always) are presented in the columns Local LF off, Local LF strict, and Local LF always. All three are indeed worse than the baseline strategy: Strict and Never are very bad, while Always is only a little worse.

[Figure: Let-floating. Average change, relative to the base case, in instructions executed, heap allocated, maximum residency, average residency, and execution time for the configurations FI off, FL off, Local LF off, Local LF strict, Local LF always, and All Floating off.]

Lessons and conclusions

What general lessons about compilation by transformation have we learned from our experience?

The interaction of theory and practice is genuine, not simply window dressing. Apart from aspects already mentioned (the second-order lambda calculus, linear type systems, strictness and absence analysis), here are three other examples, described elsewhere:

- We make extensive use of monads (Wadler), particularly to express input/output (Peyton Jones & Wadler) and stateful computation (Launchbury & Peyton Jones). Monads allow us to express imperative algorithms in a purely functional setting. In particular, the compiler can freely use its entire armoury of transformations on stateful computations expressed using monads; in contrast, optimising compilers for Lisp or ML must perform some kind of effects analysis to infer which functions are pure and which may have side effects, and many optimisations must be disabled for the latter. Since monads simply make explicit the otherwise-implicit flow dependencies, it is not clear that we get better code than the analyse-and-disable approach, but we certainly get a simpler compiler.

- Parametricity, a deep semantic consequence of polymorphism, turns out to be crucial in establishing the correctness of cheap deforestation (Gill, Launchbury & Peyton Jones) and the secure encapsulation of stateful computation (Launchbury & Peyton Jones).

- GHC's time and space profiler is based on a formal model of cost attribution (Sansom; Sansom & Peyton Jones), an unusual property for a highly operational activity such as profiling. In this case the implementation came first, but the subtleties caused by non-strictness and higher-order functions practically drove us to despair, and forced us to develop a formal foundation.

Plug and play really works. The modular nature of a transformational compiler, and its late commitment to the order of transformation, is a big win. The ability to run a transformation pass twice (at least when going for maximum optimisation) is sometimes very useful. All this really only applies to compiler writers, however: almost all compiler users will be content to use the bundles of flags conjured up by the standard -O ('please optimise') and -O2 ('please optimise a lot') options.

It is hard to estimate the cost of this plug-and-play approach. Would the compiler be faster if several passes were amalgamated into a single giant pass? In one case, namely the simplifier, we have indeed combined many small transformations into a single pass. For the larger transformations the benefits are probably modest: each does a substantial task, so the cost reduction from eliminating the intermediate data structure is probably small, and it would come at a high programming cost. It might be more attractive to develop automatic techniques for fusing successive passes together, perhaps by generalising the short-cut deforestation technique mentioned above to arbitrary data structures and explicitly-recursive functions (Launchbury & Sheard).

The cascade effect is important. One transformation really does expose opportunities for another. Transformational passes are also easier to write in the knowledge that subsequent transformations can be relied on to clean up their results: for example, a transformation that wants to substitute x for y in an expression E can simply produce (\y -> E) x, leaving the simplifier to perform the substitution later.

The compiler needs a lot of bullets in its gun. It is common for one particular transformation to have a dramatic effect on a few programs and a very modest effect on most others. There is no substitute for applying a large number of transformations, each of which will 'hit' some programs.

Some non-obvious transformations are important. We found that it was important to add a significant number of obviously-correct transformations that would never apply directly to any reasonable source program. For example:

    case (error "Wurble") of { ... }   ===>   error "Wurble"

(error is a function that prints its argument string and halts execution; semantically, its value is just bottom.) No programmer would write a case expression that scrutinises a call to error, but such case expressions certainly show up after transformation. For example, consider the expression

    if head xs then E1 else E2

After desugaring and inlining head, we get:

    case (case xs of { [] -> error "head"; (p:ps) -> p }) of
      True  -> E1
      False -> E2

Applying the case-of-case transformation (described earlier) makes one copy of the outer case scrutinise the call to error.
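That is, we obtain something like the following (a sketch; in practice E1 and E2 are shared via join points, as described earlier, rather than duplicated):

    case xs of
      []     -> case (error "head") of { True -> E1; False -> E2 }
      (p:ps) -> case p of { True -> E1; False -> E2 }

The first alternative is now exactly a case expression scrutinising a call to error, which the transformation above collapses to error "head".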

Other examples of non-obvious transformations include eta expansion and absence analysis, both discussed earlier. We identified these extra transformations by eyeballing the code produced by the transformation system, looking for code that could be improved.

Elegant generalisations of traditional optimisations have often cropped up, which either extend the reach of an optimisation or express it as a special case of some other transformation that is already required. Examples include jump elimination, copy propagation, boolean short-circuiting, and loop-invariant code motion. Similar generalisations are discussed by Steele.

Maintaining types is a big win. It is sometimes tiresome, but never difficult, for each transformation to maintain type correctness. (This is true for the transformations we have implemented; some transformations, notably closure conversion, require more work, and indeed a more sophisticated type system than the second-order lambda calculus (Harper & Morrisett).) On the other hand, it is sometimes indispensable to know the type of an expression, notably during strictness analysis. Maintaining types throughout compilation is becoming more popular (Shao & Appel; Tarditi et al).

Perhaps the largest single benefit came from an unexpected quarter: it is very easy to check a Core program for type correctness. While developing the compiler we run Core Lint (the Core type-checker) after every transformation pass, which turns out to be an outstandingly good way to detect incorrect transformations. Before we used Core Lint, bogus transformations usually led to a core dump when running the transformed program, followed by a long gdb hunt to isolate the cause. Now, most bogus transformations are identified much earlier, and much more precisely. One of the stupidest things we did was to delay writing Core Lint.

Cross-module optimisation is important. Functional programmers make heavy use of libraries, abstract data types, and modules. It is highly desirable that inlining, strictness analysis, specialisation, and so on work between modules. For example, many abstract data types export very small functions that would probably be implemented as macros in C; with cross-module inlining they can be inlined at every call site, gaining most of the advantages of macros without the burden of the macro processor's strange semantics. Like the object-oriented community (Chambers, Dean & Grove), we regard a serious assault on global, cross-module optimisation as the most plausible next 'big win'.


Acknowledgements

The Glasgow Haskell Compiler was built by many people, including Will Partain, Jim Mattson, Kevin Hammond, Andy Gill, Patrick Sansom, Cordelia Hall, and Simon Marlow. I'm very grateful to Sigbjorn Finne, Hanne Nielson, Will Partain, Patrick Sansom, and Phil Trinder, and to three anonymous referees, for helpful feedback on drafts of this paper.

The Glasgow Haskell Compiler is freely available at

    http://www.dcs.gla.ac.uk/fp/software/ghc/

References

A. V. Aho, R. Sethi & J. D. Ullman, Compilers: principles, techniques and tools, Addison-Wesley.

A. W. Appel, Compiling with Continuations, Cambridge University Press.

A. W. Appel & T. Jim, Shrinking lambda-expressions in linear time, Department of Computer Science, Princeton University.

Z. Ariola, M. Felleisen, J. Maraist, M. Odersky & P. Wadler, A call-by-need lambda calculus, in ACM Symposium on Principles of Programming Languages (POPL), San Francisco, ACM.

L. Augustsson, Compiling lazy functional languages, part II, PhD thesis, Dept of Computer Science, Chalmers University, Sweden.

D. F. Bacon, S. L. Graham & O. J. Sharp, Compiler transformations for high-performance computing, ACM Computing Surveys.

B. Calder, D. Grunwald & B. Zorn, Quantifying behavioural differences between C and C++ programs, Journal of Programming Languages.

C. Chambers, J. Dean & D. Grove, A framework for selective recompilation in the presence of complex intermodule dependencies, in Proc International Conference on Software Engineering, Seattle.

C. D. Clack & S. L. Peyton Jones, Generating parallelism from strictness analysis, Internal Note, Department of Computer Science, University College London.

J. W. Davidson & A. M. Holler, A study of a C function inliner, Software: Practice and Experience.

C. Flanagan, A. Sabry, B. Duba & M. Felleisen, The essence of compiling with continuations, SIGPLAN Notices.

P. J. Fleming & J. J. Wallace, How not to lie with statistics: the correct way to summarise benchmark results, CACM.

P. Fradet & D. Le Métayer, Compilation of functional languages by program transformation, ACM Transactions on Programming Languages and Systems.

A. Gill, J. Launchbury & S. L. Peyton Jones, A short cut to deforestation, in Proc Functional Programming Languages and Computer Architecture (FPCA), Copenhagen, ACM.

A. J. Gill, Cheap deforestation for non-strict functional languages, PhD thesis, Department of Computing Science, Glasgow University.

J.-Y. Girard, Une extension de l'interprétation de Gödel à l'analyse, et son application à l'élimination de coupures dans l'analyse et la théorie des types, in 2nd Scandinavian Logic Symposium, J. E. Fenstad (ed), North Holland.

R. Harper & G. Morrisett, Compiling polymorphism using intensional type analysis, in ACM Symposium on Principles of Programming Languages (POPL), San Francisco, ACM.

P. Hudak, S. L. Peyton Jones, P. L. Wadler, Arvind, B. Boutel, J. Fairbairn, J. Fasel, M. Guzman, K. Hammond, J. Hughes, T. Johnsson, R. Kieburtz, R. S. Nikhil, W. Partain & J. Peterson, Report on the functional programming language Haskell, Version 1.2, SIGPLAN Notices.

R. J. M. Hughes, The design and implementation of programming languages, PhD thesis, Programming Research Group, Oxford.

T. Johnsson, Lambda lifting: transforming programs to recursive equations, in Proc IFIP Conference on Functional Programming and Computer Architecture, Jouannaud (ed), LNCS, Springer Verlag.

M. P. Jones, Dictionary-free overloading by partial evaluation, in ACM SIGPLAN Workshop on Partial Evaluation and Semantics-Based Program Manipulation (PEPM), Orlando, ACM.

R. Kelsey, Compilation by program transformation, PhD thesis, Department of Computer Science, Yale University.

R. Kelsey & P. Hudak, Realistic compilation by program transformation, in Proc ACM Conference on Principles of Programming Languages, ACM.

D. A. Kranz, ORBIT: an optimising compiler for Scheme, PhD thesis, Department of Computer Science, Yale University.

D. A. Kranz, R. Kelsey, J. Rees, P. Hudak, J. Philbin & N. Adams, ORBIT: an optimising compiler for Scheme, in Proc SIGPLAN Symposium on Compiler Construction, ACM.

J. Launchbury, A natural semantics for lazy evaluation, in ACM Symposium on Principles of Programming Languages (POPL), Charleston, ACM.

J. Launchbury & S. L. Peyton Jones, Lazy functional state threads, in SIGPLAN Symposium on Programming Language Design and Implementation (PLDI), Orlando, ACM.

J. Launchbury & T. Sheard, Warm fusion, in Proc Functional Programming Languages and Computer Architecture (FPCA), La Jolla, ACM.

S. Marlow, Deforestation for higher-order functional programs, PhD thesis, Department of Computing Science, University of Glasgow.

R. Morrison, A. Dearle, R. C. H. Connor & A. L. Brown, An ad hoc approach to the implementation of polymorphism, ACM Transactions on Programming Languages and Systems.

A. Ohori, A compilation method for ML-style polymorphic record calculi, in ACM Symposium on Principles of Programming Languages (POPL), Albuquerque, ACM.

W. D. Partain, The nofib benchmark suite of Haskell programs, in Functional Programming, Glasgow, J. Launchbury & P. M. Sansom (eds), Workshops in Computing, Springer Verlag.

S. L. Peyton Jones, The Implementation of Functional Programming Languages, Prentice Hall.

S. L. Peyton Jones, Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine, Journal of Functional Programming.

S. L. Peyton Jones, Compilation by transformation: a report from the trenches, in European Symposium on Programming (ESOP), Linkoping, Sweden, LNCS, Springer Verlag.

S. L. Peyton Jones, C. V. Hall, K. Hammond, W. D. Partain & P. L. Wadler, The Glasgow Haskell compiler: a technical overview, in Proceedings of Joint Framework for Information Technology Technical Conference, Keele, DTI/SERC.

S. L. Peyton Jones & J. Launchbury, Unboxed values as first class citizens, in Functional Programming Languages and Computer Architecture (FPCA), Boston, Hughes (ed), LNCS, Springer Verlag.

S. L. Peyton Jones & D. Lester, A modular fully-lazy lambda lifter in Haskell, Software: Practice and Experience.

S. L. Peyton Jones & W. D. Partain, Measuring the effectiveness of a simple strictness analyser, in Functional Programming, Glasgow, K. Hammond & J. T. O'Donnell (eds), Workshops in Computing, Springer Verlag.

S. L. Peyton Jones, W. D. Partain & A. Santos, Let-floating: moving bindings to give faster programs, in Proc International Conference on Functional Programming (ICFP), Philadelphia, ACM.

S. L. Peyton Jones & A. Santos, Compilation by transformation in the Glasgow Haskell Compiler, in Functional Programming, Glasgow, K. Hammond, D. N. Turner & P. M. Sansom (eds), Workshops in Computing, Springer Verlag.

S. L. Peyton Jones & P. L. Wadler, Imperative functional programming, in ACM Symposium on Principles of Programming Languages (POPL), Charleston, ACM.

J. C. Reynolds, Towards a theory of type structure, in International Programming Symposium, Springer Verlag LNCS.

P. M. Sansom, Execution profiling for non-strict functional languages, PhD thesis, Department of Computer Science, University of Glasgow (also available as a technical report, by FTP from ftp.dcs.glasgow.ac.uk).

P. M. Sansom & S. L. Peyton Jones, Time and space profiling for non-strict higher-order functional languages, in ACM Symposium on Principles of Programming Languages (POPL), San Francisco, ACM.

A. Santos, Compilation by transformation in non-strict functional languages, PhD thesis, Department of Computing Science, Glasgow University.

Z. Shao & A. W. Appel, A type-based compiler for Standard ML, in SIGPLAN Symposium on Programming Language Design and Implementation (PLDI), La Jolla, ACM.

G. L. Steele, Rabbit: a compiler for Scheme, AI Technical Report, MIT Lab for Computer Science.

D. Tarditi, G. Morrisett, P. Cheng, C. Stone, R. Harper & P. Lee, TIL: a type-directed optimizing compiler for ML, in SIGPLAN Symposium on Programming Language Design and Implementation (PLDI), Philadelphia, ACM.

A. Tolmach, Tag-free garbage collection using explicit type parameters, in ACM Symposium on Lisp and Functional Programming, Orlando, ACM.

D. N. Turner, P. L. Wadler & C. Mossin, Once upon a type, in Proc Functional Programming Languages and Computer Architecture (FPCA), La Jolla, ACM.

P. L. Wadler, Deforestation: transforming programs to eliminate trees, Theoretical Computer Science.

P. L. Wadler, The essence of functional programming, in ACM Symposium on Principles of Programming Languages (POPL), Albuquerque, ACM.