Compiling Haskell by program transformation

a rep ort from the trenches

Simon L Peyton Jones

Department of Computing Science University of Glasgow G QQ

Email simonpjdcsglaacuk WWW httpwwwdcsglaacuksimonpj

January

Abstract

Many do some of their work by means of correctnesspre se rving and hop e

fully p erformanceimproving program transformations The Glasgow Haskell

GHC takes this idea of compilation by transformation as its warcry trying to express

as much as p ossible of the compilation pro cess in the form of program transformations

This pap er rep orts on our practical exp erience of the transformational approach to

compilation in the context of a substantial compiler

The paper appears in the Proceedings of the European Symposium on Programming

Linkoping April

Introduction

Using correctnesspreserving transformations as a compiler optimisation is a wellestablished

technique Aho Sethi Ullman Bacon Graham Sharp In the functional

programming area esp ecially the idea of compilation by transformation has received quite

a bit of attention App el Fradet Metayer Kelsey Kelsey Hudak

Kranz Steele

A transformational approach to compiler construction is attractive for two reasons

Each transformation can b e implemented veried and tested separately This leads

to a more mo dular compiler design in contrast to compilers that consist of a few huge

passes each of which accomplishes a great deal

In any framework transformational or otherwise each optimisation often exp oses new

opp ortunities for other optimisations the cascade eect This makes it dicult

to decide a priori what the b est order to apply them might b e In a transformational

setting it is easy to plug and play by reordering transformations applying them

more than once or trading compilation time for co de quality by omitting some It

allows a late commitment to phase ordering

This pap er rep orts on our exp erience in applying transformational techniques in a particularly

thoroughgoing way to the Glasgow Haskell Compiler GHC Peyton Jones et al a

compiler for the nonstrict functional language Haskell Hudak et al Among other

things this pap er may serve as a useful jumpingo p oint and annotated bibliography for

those interested in the compiler

A p ervasive theme is the close interplay b etween theory and practice a particularly satisfying

asp ect of functionallanguage research

Overview

Haskell is a nonstrict purely functional language It is a relatively large language with a

rich syntax and type system designed for fullscale application programming

The overall structure of the compiler is conventional

The front end parses the source do es scop e analysis and type inference and translates

the program into a small intermediate language called the Core language This latter

stage is called desugaring

The middle consists of a sequence of CoretoCore transformations and forms the sub

ject of this pap er

The back end co degenerates the resulting Core program into C whence it is compiled

to machine co de Peyton Jones

To exploit the advantages of compilation by transformation mentioned ab ove we have worked

particularly hard to move work out of the front and back ends esp ecially the latter and

reexpress it in the form of a transformation We have taken the plug and play idea to

an extreme allowing the sequence of transformation passes to b e completely sp ecied on the

command line

In practice we nd that transformations fall into two groups

A large set of simple lo cal transformations eg b eta reduction

These transformations are all implemented by a single relatively complex compiler pass

that we call the simplier The complexity arises from the fact that the simplier tries

to p erform as many transformations as p ossible during a single pass over the program

exploiting the cascade eect It would b e unreasonably inecient to p erform just one

at a time starting from the b eginning each time Despite these eorts the result of one

simplier pass often still contains opp ortunities for further simplier transformations

so we apply the simplier rep eatedly until no further transformations o ccur with a set

maximum to avoid pathological b ehaviour

A small set of complex global transformations eg strictness analysis sp ecialising

overloaded functions each of which is implemented as a separate pass Most consist

of an analysis phase followed by a transformation pass that uses the analysis results

to identify appropriate sites for the transformation Many also rely on a subsequent

pass of the simplier to clean up the co de they pro duce thus avoiding the need to

duplicate transformations already embo died in the simplier

Rather than give a sup ercial overview of everything we fo cus in this pap er on three asp ects

of our compiler that play a key role in compilation by transformation

Program Prog Bind ::: Bind n

n

Binding Bind var Expr Nonrecursive

j rec var Expr ::: var Expr Recursive n

n n

Expression Expr Expr Atom Application

j Expr ty Type application

j var ::: var Expr Lambda abstraction

n

j tyvar ::: tyvar Expr Type abstraction

n

j case Expr of Alts Case expression

j let Bind in Expr Lo cal denition

j con var ::: var Constructor n

n

j prim var ::: var Primitive n

n

j Atom

Atoms Atom var Variable

j Literal Unboxed Ob ject

Literals Literal integer j oat j :::

Alternatives Alts Calt ::: Calt Default n

n

j Lalt ::: Lalt Default n

n

Constr alt Calt Con var ::: var Expr n

n

Literal alt Lalt Literal Expr

Default alt Default NoDefault

j var Expr

Figure Syntax of the Core language

The Core language itself Section

Two groups of transformations implemented by the simplier inlinin g and b eta reduc

tion Section and transformations involving case expressions Section

One global transformation pass the one that p erforms and exploits strictness analysis

Section

We conclude with a brief enumeration of the other main transformations incorp orated in GHC

Section and a summary of the lessons we learned from our exp erience Section

The Core language

The Core language clearly plays a pivotal role Its syntax is given in Figure and consists

essentially of the lambda calculus augmented with let and case

Though we do not give explicit syntax for them here the Core language includes algebraic data

type declarations exactly as in any mo dern language For example

in Haskell one might declare the type of trees thus

data Tree a Leaf a Branch Tree a Tree a

This declaration implicitl y denes constructors Leaf and Branch that are used to construct

data values and can b e used in the pattern of a case alternative Bo oleans lists and tuples

are simply predeclared algebraic data types

data Boolean False True

data List a Nil Cons a List a

data Tuple a b c T a b c One for each size of tuple

Throughout the pap er we take a few lib erties with the syntax we allow ourselves inx

op erators eg E E and sp ecial syntax for lists for Nil and inx for Cons and

tuples eg abc We allow multiple denitions in a single let expression to abbreviate

a sequence of nested let expressions and often use layout instead of curly brackets and

semicolons to delimit case alternatives We use an upp ercase identier such as E to denote

an arbitrary expression

The op erational reading

The Core language is of course a functional language and can b e given the usual denotational

semantics However a Core program also has a direct operational interpretation If we are to

reason ab out the usefulness of a transformation we must have some mo del for how much it

costs to execute it so an op erational interpretation is very desirable

The op erational mo del for Core requires a garbagecollected heap The heap contains

Data values such as list cells tuples b o oleans integers and so on

Function values such as x x the function that adds to its argument

Thunks or susp ensions that represent susp ended ie as yet unevaluated values

Thunks are the implementation mechanism for Haskells nonstrict semantics For example

consider the Haskell expression f sin x y Translated to Core the expression would lo ok

like this

let v sin x

in f v y

The let allo cates a thunk in the heap for sin x and then when it subsequently calls f passes

a p ointer to the thunk The thunk records all the information needed to compute its b o dy

sin x in this case but it is not evaluated b efore the call If f ever needs the value of v it

will force the thunk which provokes the computation of sin x When the thunks evaluation

is complete the thunk itself is updated ie overwritten with the nowcomputed value If f

needs the value of v again the heap ob ject now contains its value instead of the susp ended

computation If f never needs v then the thunk is not evaluated at all

The two most imp ortant op erational intuitions ab out Core are as follows

let bindings and only let bindings perform heap al location For example

let v sin x

in

let w pq

in

f v w

Op erationally the rst let allo cates a thunk for sin x and then evaluates the lets

b o dy This b o dy consists of the second let expression which allo cates a pair pq in

the heap and then evaluates its b o dy in turn This b o dy consists of the call f v w so

the call is now made passing p ointers to the two newlyallo cated ob jects

In our implementation each allo cated ob ject b e it a thunk or a value consists only

of a co de p ointer together with a slot for each free variable of the righthand side of

the let binding Only one ob ject is allo cated regardless of the size of the righthand

side older implementations of graph reduction do not have this prop erty We do not

attempt to share environments b etween thunks App el Kranz et al

case expressions and only case expressions perform evaluation For example

case x of

yys y g ys

The op erational understanding is evaluate x and then scrutinise it to see whether

it is an empty list or a Cons cell of form yys continuing execution with the

appropriate alternative

case expressions subsume conditionals of course The Haskell expression if C E E

is desugared to

case C of True E False E

The syntax in Figure requires that function arguments must b e atoms that is variables

or literals and now we can see why If the language allowed us to write

f sin x pq

the op erational b ehaviour would still b e exactly as describ ed in ab ove with a thunk and

a pair allo cated as b efore The let form is simply more explicit Furthermore the let form

gives us the opp ortunity of moving the binding for v elsewhere if that turns out to b e desirable

which the apparentlysimpler form do es not Lastly the let form is more economical b ecause

many transformations on let expressions concerning strictness for example would have to

b e duplicated for function arguments if the latter were nonatomic

It is also imp ortant to note where atoms are not required In particular the scrutinee of a

case expression is an arbitrary expression not just an atom For example the following is

quite legitimate

1

This syntax is b ecoming quite widely used Ariola et al Flanagan et al Flanagan et al

Launchbury Peyton Jones

case reverse xs of

Op erationally there is no need to build a thunk for reverse xs and then evaluate it rather

we can simply save a return address and call reverse xs Again the op erational mo del

determines the syntax

Polymorphism

Like any compiler for a stronglytyped language GHC infers the type of every expression

and variable An obvious question is can this type assignment b e maintained through the

translation to the Core language and through all the subsequent transformations that are

applied to the program If so b oth transformations and co de generator might and in GHC

sometimes do take advantage of type information to generate b etter co de

In a monomorphic language the answer is a clear yes but matters are not initially so

clear in a p olymorphic setting The trouble is that program transformation involves type

manipulation Consider for example the usual comp osition function compose whose type

is

compose :

The function might b e dened like this in an untyped Core language

compose f g x let y g x in f y

Now supp ose that we wished to unfold a particular call to compose say

compose show double v

where v is an Int double doubles it and show converts the result to a String The result

of unfolding the call to compose is an instance of the b o dy of compose thus

let y double v in show y

Now we want to b e able to identify the type of every variable and subexpression so we must

calculate the type of y In this case it has type Int but in another application of compose

it may have a dierent type All this is b ecause its type in the b o dy of compose itself is

just a type variable It is clear that in a p olymorphic world it is insucient merely to

tag every variable of the original program with its type b ecause this information do es not

survive across program transformations

What then is to b e done Clearly the program must b e decorated with type information in

some way and every program transformation must b e sure to preserve it Deciding exactly

how to decorate the program and how to maintain these decorations correctly during trans

formation seemed rather dicult at rst We nally realised that an otheshelf solution

was available namely the secondorder lambda calculus Girard Reynolds

The idea is that every p olymorphic function such as compose has a type abstraction for each

universallyquantied p olymorphic variable in its type ; ; and in the case of compose

and whenever a p olymorphic function is called it is passed extra type arguments to indicate

the types to which its p olymorphic type variables are to b e instantiated The denition of

compose now b ecomes

compose a b c

fbc gab xa

let yb g x in f y

The function takes three type parameters a b and c as well as its value parameters f g and

x The types of the latter can now b e given explicitly as can the type of the lo cal variable

y A call of compose is now given three extra type arguments which instantiate a b and c

just as the normal arguments instantiate f g and x For example the call of compose we

lo oked at earlier is now written like this

compose Int Int String show double v

It is now simple to unfold this call by instantiating the b o dy of compose with the supplied

arguments to give the expression

let yInt double v in show y

Notice that the letb ound variable y is now automatically attributed the correct type

In short the secondorder lambda calculus provides us with a wellfounded notation in which

to express and transform p olymorphicallytyped programs It turns out to b e easy to intro

duce the extra type abstractions and applications as part of the type inference pro cess

Other compilers for p olymorphic languages are b eginning to carry type information through

to the back end and use it to generate b etter co de Shao App el use type information

to improve data representation though the system they describ e is monomorphic after the

front end Our implementation uses type abstractions and applications only to keep the

compilers types straight no types are passed at runtime It is p ossible to take the idea

further however and pass types at runtime to sp ecialise data representations Morrison et

al give fast access to p olymorphic records Ohori guide garbage collection

Tolmach The most recent and sophisticated work is Harp er Morrisett

Inlinin g and b eta reduction

Functional programs often consist of a myriad of small functions functional programmers

treat functions the way C programmers treat macros so go o d inlinin g is crucial Compilers

for conventional languages get p erformance improvement from inlining Davidson

Holler while functional language compilers gain App el Santos

Inlining removes some functioncall overhead of course but an equally imp ortant

factor is that inlining brings together co de that was previously separated and thereby often

exp oses a cascade of new transformation opp ortunities We therefore implement inlining in

the simplier

We have found it useful to identify three distinct transformations related to inlinin g

Inlining itself replaces an o ccurrence of a letb ound variable by a copy of the righthand

side of its denition Notice that inlining is not limited to function denitions any

letb ound variable can p otentially b e inlined Remember though that o ccurrences

of a variable in an argument p osition are not candidates for inlinin g b ecause they are

constrained to b e atomic

Dead co de elimination discards let bindings that are no longer used this usually o ccurs

when all o ccurrences of a variable have b een inlined

2

This dierence may so on decrease as the increased use of ob jectoriented languages leads to nergained

pro cedures Calder Grunwald Zorn

Beta reduction replaces xE A by EAx An analogous transformation deals with

type applications

Beta reduction is particularly simple in our setting Since the argument A is b ound to b e

atomic there is no risk of duplicating a redex and we can simply replace x by A throughout

E There is a worry ab out name capture however what if A is also b ound in E We avoid this

problem by the simple exp edient of renaming every identier as we go which costs little extra

since we have to construct a new transformed expression anyway Whilst b eta reduction is

simple inlining is more interesting

Simple inlining

It is useful to distinguish two cases of inlinin g

WHNFs If the variable concerned is b ound to a weak head normal form WHNF that

is an atom lambda abstraction or constructor application then it can b e inlined

without risking the duplication of work The only downside might b e an increase in

co de size

NonWHNFs Otherwise inlining carries the risk of loss of sharing and hence the duplica

tion of work For example

let x f in xx

it might b e b e unwise to inline x b ecause then f would b e evaluated twice instead

of once Informally we say that a transformation is W safe if it guarantees not to

duplicate work

In the case of WHNFs everything is as one would exp ect The tradeo is b etween co de size

and the b enet of inlining and like any compiler we have a variety of heuristics but no

formal analysis for deciding when a function is small enough to inline Many functions are

small though and co de size can actually decrease when they are inlined b oth b ecause the

calling co de is eliminated and also b ecause of other consequential transformations that are

exp osed

The other sorts of WHNF an atom or constructor application is always small enough to

inline Recall that constructor applications must have atomic arguments

For nonWHNFs attention fo cuses on how the variable is used If the variable o ccurs just

once then presumably it is safe to inline it Our rst approach was to p erform a simple

o ccurrence analysis that records for each variable how many places it is used and use this

information to guide the inlinings done by the simplier There are three complications with

this naive approach

The rst is practical As mentioned earlier the simplier tries to p erform as many transfor

mations as p ossible during a single pass over the program However many transformations

notably b eta reduction and inlining itself change the number of o ccurrences of a variable

Our current solution to this problem is to do a great deal of b o okkeeping to keep o ccurrence

information up to date App el Jim do es something similar

The second complication is that a variable may o ccur multiple times with no risk of duplicating

work namely if the o ccurrences are in dierent alternatives of a case expression In this case

the only issue to consider is the tradeo b etween co de size and inlining b enet

Lastly inlining based on naive o ccurrence counting is not W safe Consider this expression

let x f

g y x

in g ag b

If we replace the single o ccurrence of x by f we will recompute the call to f every time

g is called rather than sharing it among all calls to g Our current solution is conservative

we never inline inside a lambda abstraction It turns out though that this approach is

sometimes to o conservative In higherorder programs where lots of inlining is happ ening it

is not unusual to nd functions that are sure to b e called only once so it would b e p erfectly

safe to inline inside them

Using linearity

Because of these complications the b o okkeeping required to track o ccurrence information

has gradually grown into the most intricate and bugprone part of the simplier Worse work

duplication bugs manifest themselves only as p erformance problems and may go unnoticed

for a long time This complexity is esp ecially irritating b ecause we have a strong intuitive

notion of whether a variable can b e used more than once and that intuitive notion is

an invariant of W safe transformations That suggests that a linear type system would b e

a go o d way to identify variables that can safely b e inlined even though they o ccur inside

lambdas or that cannot safely b e inlined even though they currently o ccur only once Just

as all transformations preserve the ordinary typing of an expression Section so W safe

transformations preserve the linear type information to o and hence guarantee not to duplicate

work

Unfortunately most linear type systems are inappropriate b ecause they do not take account

of callbyneed evaluation For example consider the expression

let x

y x

in y y

Under call by need evaluation even though y is evaluated many times x will b e evaluated

only once Most linear systems would b e to o conservative and would attribute a nonlinear

type to x as well as y preventing x from b eing inlined

Thus motivated we have developed a linear type system that does take account of call by

!

need evaluation Wadler Turner The type system assigns a type of Int to y in

the ab ove example the sup erscript ! indicating that y might b e evaluated more than once

However it assigns a type of Int to x indicating that x can b e evaluated at most once and

hence can W safely b e inlined

The type system is capable of dealing with usage p olymorphism For example consider

3

One such bug caused the compiler which is of course written in Haskell to rebuild its symbol table from

scratch every time a variable was lo oked up in the table The compiler worked p erfectly alb eit somewhat

slowly and it was months b efore we noticed Sansom

this denition of apply

apply f x f x

In a particular application apply g y whether or not y is used more than once dep ends

on whether g uses its argument more than once So the type of apply is

u v u v

u ; v : ; :

u

The two o ccurrences of indicate that the usage u of gs argument is the same as that of y

Our implementation of this linear type system is incomplete so we do not yet have practical

exp erience of its utility but we are optimistic that it will provide a systematic way of ad

dressing an area we have only dealt with informally to date and which has bitten us badly

more than once

Transforming conditionals

Most compilers have sp ecial rules to optimise conditionals For example consider the expres

sion

if not x then E else E

No decent compiler would actually negate the value of x at runtime Let us see then what

happ ens if we simply turn the transformation handle After desugaring the conditional and

inlini ng the denition of not we get

case case x of True False False True of

True E

False E

Here the outer case scrutinises the value returned by the inner case This observation

suggests that we could move the outer case inside the the branches of the inner one thus

case x of

True case False of True E False E

False case True of True E False E

Notice that the originallyouter case expression has b een duplicated but each copy is now

scrutinising a known value and so we can make the obvious simplication to get exactly what

we might originally have hop ed

case x of

True E

False E

Both of these transformations are generally applicable The second the caseofknown

constructor transformation eliminates a case expression that scrutinises a known value

This is always a Go o d Thing and many other transformations are aimed at exp osing opp or

tunities for such case elimination We consider another useful variant of case elimination in

Section The rst which we call the caseofcase transformation is certainly correct in

general but it app ears to risk duplicating E andor E We turn to this question next

4

In fact for the purp oses of this pap er we have simplied the type a little

Join p oints

How can we gain the b enets of the caseofcase transformation without risking co de dupli

cation A simple idea is to make lo cal denitions for the righthand sides of the outer case

like this

case case S of True R False R of

True E

False E

let e E e E

in case S of

True case R of True e False e

False case R of True e False e

Now E and E are not duplicated though we incur instead the cost of implementing the

bindings for e and e In the not example though the two inner cases are eliminated

leaving only a single o ccurrence of each of e and e so their denitions will b e inlined

leaving exactly the same result as b efore

We certainly cannot guarantee that the newlyintroduced bindings will b e eliminated though

Consider for example the expression

if x y then E else E

Here is the b o olean disjunction op eration dened thus

a b case a of True True False b

Desugaring the conditional and inlinin g gives

case case x of True True False y of

True E

False E

Now applying the new caseofcase transformation

let e E e E

in case x of

True case True of True e False e

False case y of True e False e

Unlike the not example only one of the two inner cases simplies so only e will certainly

b e inlined b ecause e is still mentioned twice

let e E

in case x of

True e

False case y of True e False E

The interesting thing here is that e plays exactly the role of a lab el in conventional compiler

technology Given the original conditional a C compiler will shortcircuit the evaluation of

the condition if x turns out to b e True generating co de like

if x goto l

if y goto l

goto l

l code for E goto l

l code for E

l

Here l is a lab el where two p ossible execution paths if x is True or if x is False and y

is True join up we call it a join p oint That suggests in turn that our co de generator

should b e able to implement the binding for e not by allo cating a thunk as it would usually

do but rather by simply jumping to some common co de after p erhaps adjusting the stack

p ointer wherever e is subsequently evaluated Our compiler do es exactly this Rather than

somehow mark e as sp ecial the co de generator do es a simple syntactic escap e analysis to

identify variables whose evaluation is certain to take place b efore the stack retreats and

implements their evaluation as a simple adjuststackandjump As a result we get essentially

the same co de as a C compiler for our conditional

Seen in this light the act of inlining E is what a conventional compiler might call jump

elimination A go o d C compiler would probably eliminate the jump to l thus

if x goto l

if y goto l

l code for E

l

l code for E goto l

Back in the functional world if E is small then the inliner might decide to inline e at its

two o ccurrences regardless thus eliminating a jump in favour of a slight increase in co de size

Conventional compilers do this to o notably in the case where the co de at the destination of

a jump is just another jump which would corresp ond in our setting to E b eing just a simple

variable

The p oint is not that the transformations achieve anything that conventional compiler tech

nology do es not but rather that a single mechanism inlining which is needed anyway deals

uniformly with jump elimination as well as its more conventional eects

Generalising join p oints

Do es all this work generalise to data types other than b o oleans At rst one might think

the answer is yes of course but in fact the mo died caseofcase transformation is simply

nonsense if the originallyouter case expression binds any variables For example consider

the expression

f if b then B else B

where f is dened thus

f as case as of E bbs E

Desugaring the if and inlinin g f gives

case case b of True B False B of

E

bbs E

But now since E may mention b and bs we cannot letbind a new variable e as we did

b efore The solution is simple though simply letbind a function e that takes b andor

bs as its arguments Supp ose for example that E mentions bs but not b Then we can

p erform a caseofcase transformation thus

let e E e bs E

in case b of

True case B of e bbs e bs

False case B of e bbs e bs

All the inlinin g mechanism discussed ab ove for eliminating the binding for e if p ossible works

just as b efore Furthermore even if e is not inlined the co de generator can still implement

e eciently a call to e is compiled to a co de sequence that loads bs into a register adjusts

the stack p ointer and jumps to the join p oint

This go es b eyond what conventional compiler technology achieves Our join p oints can now

b e parameterised by arguments that embo dy the dierences b etween the execution paths that

led to that p oint Better still the whole setup works for arbitrary userdened data types

not simply for b o oleans and lists

Generalising case elimination

Earlier we discussed the caseofknownconstructor transformation that eliminates a case

expression There is a useful variant of this transformation that also eliminates a case

expression Consider the expression

if null xs then r else tail xs

where null and tail are dened as you might exp ect

null as case as of True bbs False

tail cs case cs of error tail dds ds

After the usual inlining we get

case case xs of True bbs False of

True r

False case xs of

error tail

dds ds

Now we can do the caseofcase transformation as usual giving after a few extra steps

case xs of

r

bbs case xs of

error tail

dds ds

Now it is obvious that the inner evaluation of xs is redundant b ecause in the bbs branch

of the outer case we know that xs is certainly of the form bbs Hence we can eliminate

the inner case selecting the dds alternative but substituting b for d and bs for ds

case xs of

r

bbs bs

We will see another application of this form of case elimination in Section

Summary

We have describ ed a few of the most imp ortant transformations involving case expressions

but there are quite a few more including case merging dead alternative elimination and

default elimination They are describ ed in more detail by Santos who also provides

measurements of their frequency

Like many go o d ideas the caseofcase transformation limited to b o oleans but including

the idea of using letb ound variables as join p oints was incorp orated in Steeles Rabbit

compiler for Scheme Steele We reinvented it and generalised it for case expressions

and parameterised join p oints letb ound join p oints are also extremely useful when desug

aring complex pattern matching Lacking join p oints most of the standard descriptions are

complicated by a sp ecial FAIL value along with sp ecial semantics and compilation rules to

express the joining up of several execution paths when a pattern fails to match Augustsson

Peyton Jones

Unboxed data types and strictness analysis

Consider the expression xy where x and y have type Int Because Core is nonstrict x and

y must each b e represented by a p ointer to a p ossiblyunevaluated ob ject Even if x say is

already evaluated it will still therefore b e represented by a p ointer to a b oxed value in the

heap The addition op eration must evaluate x and y as necessary unbox them add them

and b ox the result

Where arithmetic op erations are cascaded we would like to avoid b oxing the result of one

op eration only to unbox it immediately in the next Similarly in the expression xx we

would like to avoid evaluating and unboxing x twice

Exp osing b oxing to transformation

Such b oxingunboxing optimisations are usually carried out by the co de generator but it

would b e b etter to nd a way to express them as program transformations We have achieved

this goal as follows Instead of regarding the data types Int Float and so on as primitive

we dene them using algebraic data type declarations

data Int I Int

data Float F Float

Here Int is the trulyprimitive type of unboxed integers and Float is the type of unboxed

oats The constructors I and F are in eect the b oxing op erations The characters

are merely cues to the human reader the compiler treats as part of a name like any other

letter Now we can express the previouslyprimitive op eration thus

a b case a of

I a case b of

I b case a b of

r I r

where is the primitive addition op eration on unboxed values You can read this denition

as evaluate and unbox a do the same to y add the unboxed values giving r and return a

b oxed version thereof

Now simple transformations do the Right Thing to xx We b egin by inlining to give

case x of

I a case x of

I b case a b of

r I r

But now the inner case can b e eliminated Section since it is scrutinising a known value

x giving the desired outcome

case x of

I a case a a of

r I r

Similar transformations this time involving caseofcase ensure that in expressions such as

xyz the intermediate result is never b oxed The details are given by Peyton Jones

Launchbury but the imp ortant p oints are these

By making the Core language somewhat more expressive ie adding unboxed data

types we can exp ose many new evaluation and b oxing op erations to program transfor

mation

Rather than a few ad hoc optimisations in the co de generator the full range of trans

formations can now b e applied to the newlyexp osed co de

Optimising evaluation and unboxing may itself exp ose new transformation opp ortuni

ties for example a function b o dy may b ecome small enough to inline

Strictness analysis

Strictness analysers attempt to gure out whether a function is sure to evaluate its argument

giving the opp ortunity for the compiler to evaluate the argument b efore the call instead of

building a thunk that is forced later on There is an enormous literature on strictness analysis

itself but virtual ly none explaining how to exploit its results apart from general remarks that

the co de generator can use it Our approach is to express the results of strictness analysis

as a program transformation for exactly the reasons mentioned at the end of the previous

section

As an example consider the factorial function with an accumulating parameter which in

Haskell might lo ok like this

afac Int Int Int

afac a a

afac a n afac na n

Translated into the Core language it would take the following form

one I

afac a n case n of

I n case n of

a

n let a na n none

in afac a n

In a naive implementation this function sadly uses linear space to hold a growing chain of

unevaluated thunks for a

Now supp ose that the strictness analyser discovers that afac is strict in b oth its arguments

Based on this information we split it into two functions a wrapper and a worker thus

afac a n case a of I a case n of I n afac a n

one I

afac a n let n I n a I a

in case n of

I n case n of

a

n let a na n none

in afac a n

The wrapp er afac implements the original function by evaluating the strict arguments and

passing them unboxed to the worker afac The wrapp er is also marked as alwaysinline

me which makes the simplier extremely keen to inline it at every call site thereby eectively

moving the argument evaluation to the call site

The co de for the worker starts by reconstructing the original arguments in b oxed form and

then concludes with the original unchanged co de for afac Reb oxing the arguments may b e

correct but it lo oks like a weird thing to do b ecause the whole p oint was to avoid b oxing the

arguments at all Nevertheless let us see what happ ens when the simplier go es to work on

afac It just inlines the denitions of and afac itself and applies the transformations

describ ed earlier A few moments work should convince you that the result is this

afac a n case n of

I a

n case n a of

a case n of

n afac a n

Bingo afac is just what we hop ed for a strict constantspace ecient factorial function

The reb oxing bindings have vanished b ecause a case elimination transformation has left

them as dead co de Even the recursive call is made directly to afac rather than going

via afac it is worth noticing the imp ortance of inlinin g the wrapp er in the b o dy of the

worker even though the two are mutually recursive Meanwhile the wrapp er afac acts as

an imp edancematcher to provide a b oxed interface to afac

Data structures

We have found it very worthwhile to extend the strictness analyser a bit further Supp ose we

have the following function denition

f IntInt Int

f p E

It is relatively easy for the strictness analyser to discover not only fs strictness in the pair

p but also fs strictness in the two comp onents of the pair For example supp ose that the

strictness analyser discovers that f is strict b oth in p and in the rst comp onent of p but not

in the second Given this information we can transform the denition of f into a worker and

a wrapp er like this

f p case p of xy case x of I x f x y

f x y let x I x p xy

in E

The pair is passed to the worker unboxed ie the two comp onents are passed separately

and so is the rst comp onent of the pair

We so on learned that lo oking inside nonrecursive data structures in this way exp osed a new

opp ortunity absence analysis What if f do es not use the second comp onent of the pair at all

Then it is a complete waste of time to pass y to f at all Whilst it is unusual for programmers

to write functions with arguments that are completely unused it is rather common for them

to write functions that do not use some parts of their arguments We therefore p erform

b oth strictness analysis and absence analysis and use the combined information to guide the

workerwrapper split

Matters are more complicated if the argument type is recursive or has more than one con

structor In these cases we are content simply to evaluate the argument b efore the call as

describ ed in the next section

Notice the imp ortance of type information to the whole endeavour The type of a function

guides the resolution of the strictness analysis and the workerwrapper splitting

Strict let bindings

An imp ortant but less commonly discussed outcome of strictness analysis is that it is p ossible

to tell whether a let binding is strict that is whether the variable b ound by the let is sure

to b e evaluated in the b o dy If so there is no need to build a thunk Consider the expression

let x R in E

where x has type Int and E is strict in x Using a similar strategy to the workerwrapper

scheme we can transform to

case R of I x let x I x in E

As b efore the reb oxing binding for x will b e eliminated by subsequent transformation If x

has a recursive or multiconstructor type then we transform instead to this

case R of x E

This expression simply generates co de to evaluate R bind the b oxed result to x and then

evaluate E This is still an improvement over the original let expression b ecause no thunk is

built

Summary

Strictness analysis exploited via unboxed data types is a very worth while analysis and

transformation Even the relatively simple analyser we use improves execution time by

averaged across a wide range of programs Peyton Jones Partain

Other GHC transformations

We have fo cused so far on three particular asp ects of GHCs transformation system This

section briey summarises the other main transformations p erformed by GHC

The simplier contains many more transformations than those describ ed in Sections and

A full list can b e found in Peyton Jones Santos and Santos the latter

also contains measurements of the frequency and usefulness of each transformation

The sp ecialiser uses partial evaluation to create sp ecialised versions of overloaded functions

Letoating is a group of transformations that concern the placement of let bindings and

hence determine where allo cation o ccurs There are three main letoating transforma

tions

Floating inwards moves bindings as near their site of use as p ossible

The ful l laziness transformation oats constant subexpressions out of lambda ab

stractions Hughes Peyton Jones Lester it generalises the stan

dard idea of lo opinvariant co de motion Aho Sethi Ullman

Local letoating netunes the lo cation of each let binding

Details of all three are given by Peyton Jones Partain Santos along with

detailed measurements Letoating alone gives an average improvement in execution

time of around

Eta expansion is an unexp ectedlyuseful transformation Gill Chapter We found

that other transformations sometimes pro duce expressions of the form

let f x let in y E

in B

If f is always applied to two arguments in B then we can W safely that is without

risk of duplicating work transform the expression to

let f x y let in E

in B

It turns out that a lambda abstraction that binds multiple arguments can b e imple

mented much more eciently than a nested series of lambdas The most elegant way

to achieve the transformation is to p erform an etaexpansion the opp osite of eta

reduction on fs right hand side

x R x a R a

Once that is done normal b eta reduction will make the application to a cancel with

the y to give the desired overall eect

The crucial question is this when is eta expansion guaranteed to b e W safe Unsur

prisingly this turns out to b e another fruitful application for the linear type system

sketched in Section

Deforestation is a transformation that removes intermediate lists Wadler For ex

ample in the expression sum map double xs an intermediate list map double xs

is created only to b e consumed immediately by sum Successful deforestation removes

this intermediate list giving a single pass algorithm that traverses the list xs doubling

each element b efore adding it to the total

Fullblown Wadlerstyle deforestation for higherorder programs is dicult the only ex

ample we know of is describ ed by Marlow and even that do es not work for large

programs Instead we developed a new more practical technique called short cut de

forestation Gill Launchbury Peyton Jones As the name implies our metho d

do es not remove all intermediate lists but in exchange it is relatively easy to implement

Gill describ es the technique in detail and gives measurements of its eectiveness

Even on programs written without deforestation in mind the transformation reduces

execution time by some averaged over a range of programs

Lambda lifting is a wellknown transformation that replaces lo cal function declarations

with global ones by adding their free variables as extra parameters Johnsson

For example consider the denition

f x letrec g y xyg

in g

Here x is free in the denition of g By adding x as an extra argument to g we can

transform the denition to

f x g x

g x y xyg x

Some back ends require lambdalifted programs Our co de generator can handle lo cal

functions directly so lambda lifting is not required Even so it turns out that lambda

lifting is sometimes b enecial but on other occasions the reverse is the case That is the

exact opp osite of lambda lifting lambda dropping also known as the static argument

transformation sometimes improves p erformance Santos Chapter discusses

the tradeo in detail GHC implements b oth lambda lifting and the static argument

transformation Each buys only a small p erformance gain a p ercentage p oint or two

on average

The average p erformance improvements mentioned in this pap er are geometric means taken

over the large nofib suite of b enchmark programs many of which are real applications Par

tain They are emphatically not b estcase results on toy programs Nevertheless they

should b e taken only as a crude summary of the general scale of the eect the pap ers cited

give much more detail

Lessons and conclusions

What general lessons ab out compilation by transformation have we learned from our exp eri

ence

The interaction of theory and practice is genuine not simply window dressing Apart

from asp ects already mentioned second order lambda calculus linear type systems

strictness and absence analysis here are three other examples describ ed elsewhere

We make extensive use of monads Wadler particularly to express in

putoutput Peyton Jones Wadler and stateful computation Launchbury

Peyton Jones

Parametricity a deep semantic consequence of p olymorphism turns out to b e

crucial in establishing the correctness of cheap deforestation Gill Launchbury

Peyton Jones and secure encapsulation of stateful computation Launch

bury Peyton Jones

GHCs time and space proler is based on a formal mo del of cost attribution

Sansom Sansom Peyton Jones an unusual prop erty for a highly

op erational activity such as proling In this case the implementation came rst

but the subtleties caused by nonstrictness and higherorder functions practically

drove us to despair and forced us to develop a formal foundation

Plug and play really works The mo dular nature of a transformational compiler and its

late commitment to the order of transformation is a big win The ability to run a

transformation pass twice at least when going for maximum optimisation is sometimes

very useful

The cascade eect is imp ortant One transformation really do es exp ose opp ortunities

for another Transformational passes are easier to write in the knowledge that subse

quent transformations can b e relied on to clean up the result of a transformation

For example a transformation that wants to substitute x for y in an expression E can

simply pro duce yE x leaving the simplier to p erform the substitution later

The compiler needs a lot of bullets in its gun It is common for one particular trans

formation to have a dramatic eect on a few programs and a very mo dest eect on

most others There is no substitute for applying a large number of transformations

each of which will hit some programs

Some nonobvious transformations are imp ortant We found that it was imp ortant to

add a signicant number of obviouslycorrect transformations that would never apply

directly to any reasonable source program For example

case error Wurble of error Wurble

error is a function that prints its argument string and halts execution Semantically

its value is just b ottom No programmer would write a case expression that scrutinises

a call to error but such case expressions certainly show up after transformation For

example consider the expression

if head xs then E else E

After desugaring and inlining head we get

case case xs of error head pps p of

True E

False E

Applying the caseofcase transformation Section makes one copy of the outer

case scrutinise the call to error

Other examples of nonobvious transformations include eta expansion Section and

absence analysis Section We identied these extra transformations by eyeballing

the co de pro duced by the transformation system lo oking for co de that could b e im

proved

Elegant generalisations of traditional optimisations have often cropp ed up that either ex

tend the reach of the optimisation or express it as a sp ecial case of some other trans

formation that is already required Examples include jump elimination copy propaga

tion b o olean shortcircuiting and lo opinvariant co de motion Similar generalisations

are discussed by Steele

Maintaining types is a big win It is sometimes tiresome but never dicult for each

transformation to maintain type correctness On the other hand it is sometimes indis

p ensable to know the type of an expression notably during strictness analysis

Perhaps the largest single b enet came from an unexp ected quarter it is very easy to

check a Core program for type correctness While developing the compiler we run Core

Lint the Core typechecker after every transformation pass which turns out to b e

an outstandingly go o d way to detect incorrect transformations Before we used Core

Lint b ogus transformations usually led to a core dump when running the transformed

program followed by a long gdb hunt to isolate the cause Now most b ogus transforma

tions are identied much earlier and much more precisely One of the dumbest things

we did was to delay writing Core Lint

Crossmo dule optimisation is imp ortant Functional programmers make heavy use of

libraries abstract data types and mo dules It is essential that inlining strictness

analysis sp ecialisation and so on work b etween mo dules So far we have achieved this

goal by generating increasingly baro que textual interface les to convey information

from the exp orting mo dule to the imp orting one As the information b ecomes more

elab orate this approach is less and less attractive Like the ob jectoriented community

Chambers Dean Grove we regard a serious assault on global crossmo dule

optimisation as the most plausible next big win

Acknowledgements

The Glasgow Haskell Compiler was built by many p eople including Will Partain Jim Matt

son Kevin Hammond Andy Gill AndreSantos Patrick Sansom Cordelia Hall and Simon

Marlow Im very grateful to Sigb jorn Finne Hanne Nielson Will Partain Patrick Sansom

and Phil Trinder for helpful feedback on drafts of this pap er

The Glasgow Haskell Compiler is freely available at

httpwwwdcsglaacukfpsoftwareghchtml

References

AV Aho R Sethi JD Ullman Compilers principles techniques and to ols Addison

Wesley

AW App el Compiling with Cambridge University Press

AW App el T Jim Shrinking LambdaExpressions in Linear Time Department of

Computer Science Princeton University

Z Ariola M Felleisen J Maraist M Odersky P Wadler Jan A call by need lambda

calculus in nd ACM Symp osium on Principles of Programming Languages San

Francisco ACM

L Augustsson Compiling lazy functional languages part I I PhD thesis Dept Comp

Sci Chalmers University Sweden

DF Bacon SL Graham OJ Sharp Dec Compiler transformations for high

p erformance computing ACM Computing Surveys

B Calder D Grunwald B Zorn Dec Quantifying b ehavioural dierences b etween C

and C programs Journal of Programming Languages

C Chambers J Dean D Grove Apr A framework for selective recompilation in the

presence of complex intermodule dep endencies in Pro c International Conference on

Software Engineering Seattle

JW Davidson AM Holler A study of a C function inliner Software Practice and

Exp erience

C Flanagan A Sabry B Duba M Felleisen June The essence of compiling with

continuations SIGPLAN Notices

P Fradet D Le Metayer Jan Compilation of functional languages by program trans

formation ACM Transactions on Programming Languages and Systems

A Gill J Launchbury SL Peyton Jones June A short cut to deforestation in Pro c

Functional Programming Languages and Computer Architecture Cop enhagen ACM

AJ Gill Jan Cheap deforestation for nonstrict functional languages PhD thesis

Department of Computing Science Glasgow University

J Girard Une extension de linterpretation de Godela lanalyse et son application a

lelimination de coupures dans lanalyse et la theorie des types in nd Scandinavian

Logic Symp osium JE Fenstad ed North Holland

R Harp er G Morrisett Jan Compiling p olymorphism using intensional type analy

sis in nd ACM Symp osium on Principles of Programming Languages San Fran

cisco ACM

P Hudak SL Peyton Jones PL Wadler Arvind B Boutel J Fairbairn J Fasel M Guzman

K Hammond J Hughes T Johnsson R Kieburtz RS Nikhil W Partain J Peterson

May Rep ort on the functional programming language Haskell Version

SIGPLAN Notices

RJM Hughes July The design and implementation of programming languages PhD

thesis Programming Research Group Oxford

Thomas Johnsson Lambda lifting transforming programs to recursive equations

in Pro c IFIP Conference on Functional Programming and Computer Architecture

Jouannaud ed LNCS Springer Verlag

R Kelsey May Compilation by program transformation YALEUDCSRR PhD

thesis Department of Computer Science Yale University

R Kelsey P Hudak Jan Realistic compilation by program transformation in Pro c

ACM Conference on Principles of Programming Languages ACM

DA Kranz May ORBIT an optimising compiler for Scheme PhD thesis Department

of Computer Science Yale University

DA Kranz R Kelsey J Rees P Hudak J Philbin N Adams ORBIT an optimis

ing compiler for Scheme in Pro c SIGPLAN Symp osium on Compiler Construction

ACM

J Launchbury Jan A natural semantics for lazy evaluation in th ACM Symp osium

on Principles of Programming Languages Charleston ACM

J Launchbury SL Peyton Jones June Lazy functional state threads in SIGPLAN

Symp osium on Programming Language Design and Implementation PLDI Or

lando ACM

S Marlow March Deforestation for Higher Order Functional Programs PhD thesis

Department of Computing Science University of Glasgow

R Morrison A Dearle RCH Connor AL Brown July An ad ho c approach to the

implementation of p olymorphism ACM Transactions on Programming Languages

and Systems

A Ohori Jan A compilation metho d for MLstyle p olymorphic record calculi in th

ACM Symp osium on Principles of Programming Languages Albuquerque ACM

WD Partain The nob Benchmark Suite of Haskell Programs in Functional Pro

gramming Glasgow J Launchbury PM Sansom eds Workshops in Comput

ing Springer Verlag

SL Peyton Jones The Implementation of Functional Programming Languages Prentice

Hall

SL Peyton Jones Apr Implementing lazy functional languages on sto ck hardware the

Spineless Tagless Gmachine Journal of Functional Programming

SL Peyton Jones CV Hall K Hammond WD Partain PL Wadler March The

Glasgow Haskell compiler a technical overview in Pro ceedings of Joint Framework

for Information Technology Technical Conference Keele DTISERC

SL Peyton Jones J Launchbury Sept Unboxed values as rst class citizens in

Functional Programming Languages and Computer Architecture Boston Hughes

ed LNCS Springer Verlag

SL Peyton Jones D Lester May A mo dular fullylazy lambda lifter in Haskell

Software Practice and Exp erience

SL Peyton Jones WD Partain Measuring the eectiveness of a simple strictness

analyser in Functional Programming Glasgow K Hammond JT ODonnell

eds Workshops in Computing Springer Verlag

SL Peyton Jones WD Partain A Santos May Letoating moving bindings to

give faster programs in Pro c International Conference on Functional Programming

Philadelphia ACM

SL Peyton Jones A Santos Compilation by transformation in the Glasgow Haskell

Compiler in Functional Programming Glasgow K Hammond DN Turner

PM Sansom eds Workshops in Computing Springer Verlag

SL Peyton Jones PL Wadler Jan Imp erative functional programming in th

ACM Symp osium on Principles of Programming Languages Charleston ACM

JC Reynolds Towards a theory of type structure in International Programming

Symp osium Springer Verlag LNCS

PM Sansom Sept Execution proling for nonstrict functional languages PhD

thesis Technical Rep ort FP Department of Computer Science Univer

sity of Glasgow ftpftpdcsglasgowacukpubglasgowfptech reports

executionprofilingpsZ FP

PM Sansom SL Peyton Jones Jan Time and space proling for nonstrict higher

order functional languages in nd ACM Symp osium on Principles of Programming

Languages San Francisco ACM

A Santos Sept Compilation by transformation in nonstrict functional languages

PhD thesis Department of Computing Science Glasgow University

Z Shao AW App el June A typebased compiler for Standard ML in SIGPLAN

Symp osium on Programming Language Design and Implementation PLDI La

Jolla ACM

GL Steele Rabbit a compiler for Scheme AITR MIT Lab for Computer Sci

ence

A Tolmach June Tagfree garbage collection using explicit type parameters in ACM

Symp osium on Lisp and Functional Programming Orlando ACM

PL Wadler Deforestation transforming programs to eliminate trees Theoretical

Computer Science

PL Wadler Jan The essence of functional programming in th ACM Symp osium

on Principles of Programming Languages Albuquerque ACM

PL Wadler DN Turner June Once up on a type in Pro c Functional Programming

Languages and Computer Architecture La Jolla ACM