Secrets of the Glasgow Haskell Compiler inliner

Simon Peyton Jones and Simon Marlow

Microsoft Research Ltd, Cambridge

[email protected] [email protected]

September 1, 1999

Abstract

Higher-order languages, such as Haskell, encourage the programmer to build abstractions by composing functions. A good compiler must inline many of these calls to recover an efficient program.

In principle, inlining is dead simple: just replace the call of a function by an instance of its body. But any compiler-writer will tell you that inlining is a black art, full of delicate compromises that work together to give good performance without unnecessary code bloat.

The purpose of this paper is, therefore, to articulate the key lessons we learned from a full-scale "production" inliner, the one used in the Glasgow Haskell Compiler. We focus mainly on the algorithmic aspects, but we also provide some indicative measurements to substantiate the importance of various aspects of the inliner.

1 Introduction

One of the trickiest aspects of a compiler for a functional language is the handling of inlining. In a functional-language compiler, inlining subsumes several other optimisations that are traditionally treated separately, such as copy propagation and jump elimination. As a result, effective inlining is particularly crucial in getting good performance.

The Glasgow Haskell Compiler (GHC) is an optimising compiler for Haskell that has evolved over a period of about ten years. We have repeatedly been through a cycle of looking at the code it produces, identifying what could be improved, and going back to the compiler to make it produce better code. It is our experience that the inliner is a lead player in many of these improvements. No other single aspect of the compiler has received so much attention.

The purpose of this paper is to report on several algorithmic aspects of GHC's inliner, focusing on aspects that were not obvious to us; that is to say, aspects that we got wrong to begin with. Most papers about inlining focus on how to choose whether or not to inline a function called from many places. This is indeed an important question, but we have found that we had to deal with quite a few other less obvious, but equally interesting, issues. Specifically, we describe the following:

• A major issue for any compiler, especially for one that inlines heavily, is name capture. Our initial brute-force solution involved inconvenient plumbing, and we have now evolved a simple and effective alternative (Section 3).

• At first we were very conservative about inlining recursive definitions; that is, we did not inline them at all. But we found that this strategy occasionally behaved very badly. After a series of failed hacks we developed a simple, obviously-correct algorithm that does the job beautifully (Section 4).

• Because the compiler does so much inlining, it is important to get as much as possible done in each pass over the program. Yet one must steer a careful path between doing too little work in each pass, requiring extra passes, and doing too much work, leading to an exponential-cost algorithm. GHC now identifies three distinct moments at which an inlining decision may be taken for a particular definition. We explain why in Section 6.

• When inlining an expression it is important to retain the expression's lexical environment, which gives the bindings of its free variables. But at the inline site, the compiler might know more about the dynamic state of some of those free variables; most notably, a free variable might be known to be evaluated at the inline site, but not at its original occurrence. Some key transformations make use of this extra information, and lacking it will cause an extra pass over the code. We describe how to exploit our name-capture solution to support accurate tracking of both lexical and dynamic environments (Section 7).

None of the algorithms we describe is individually very surprising. Perhaps because of this, the literature on the subject is very sparse, and we are not aware of published descriptions of any of our algorithms. Our contribution is to abstract some of what we have learned, in the hope that we may help others avoid the mistakes that we made.

For the sake of concreteness we focus throughout on GHC, but we stress that the lessons we learned are applicable to any compiler for a functional language, and indeed perhaps to other languages too.

2 Preliminaries

We will assume the use of a pure, non-strict, strongly-typed intermediate language, called the GHC Core language. GHC is itself written in Haskell, so we define the Core language by giving its data type definition in Haskell:

type Program = [Bind]

data Bind = NonRec Var Expr
          | Rec [(Var, Expr)]

data Expr = Var Var
          | App Expr Expr
          | Lam Var Expr
          | Let Bind Expr
          | Const Const [Expr]
          | Case Expr Var [Alt]
          | Note Note Expr

type Alt        -- Case alternative
  = (Const, [Var], Expr)

data Const      -- Constant
  = Literal Literal
  | DataCon DataCon
  | PrimOp PrimOp
  | DEFAULT

The Core language consists of the lambda calculus augmented with let-expressions (both non-recursive and recursive), case expressions, data constructors, literals, and primitive operations. In presenting examples we will use an informal, albeit hopefully clear, concrete syntax. We will feel free to use infix operators, and to write several bindings in a single non-recursive let-expression as shorthand for a sequence of let-expressions.

A program (Program) is simply a sequence of bindings, in dependency order. Each binding (Bind) can be recursive or non-recursive, and the right hand side of each binding is an expression (Expr). The constructors for variables (Var), application (App), lambda abstraction (Lam), and let-expressions (Let) should be self-explanatory. A constant application (Const) is used for literals, data constructor applications, and applications of primitive operators; the number of arguments must match the arity of the constant, and the constant cannot be DEFAULT. Likewise, the number of bound variables in a case alternative (Alt) always matches the arity of the constant; and the latter cannot be a PrimOp. The Note form of Expr allows annotations to be attached to an expression; the only impact on the inliner is discussed in Section 7.6.

Case expressions (Case) should be self-explanatory, except for the Var argument to Case. Consider the following Core expression:

case reverse xs of ys {
  (a:as) -> ys
  []     -> error "urk"
}

The unusual part of this construct is the binding occurrence of "ys", immediately after the "of". The semantics is that ys is bound to the result of evaluating the scrutinee, reverse xs in this case, which makes it possible to refer to this value in the alternatives. This detail has no impact on the rest of this paper (indeed, we omit the extra binder in our examples) but we have found that it makes several transformations more simple and uniform, so we include it here for the sake of completeness.

GHC's actual intermediate language is very slightly more complicated than that given here. It is an explicitly-typed language based on Fω, and supports polymorphism through explicit type abstraction and application. It turns out that doing so adds only one new constructor to the Expr type, and adds nothing to the substance of this paper, so we do not mention it further. The main point is that this paper omits no aspect essential to a full-scale implementation of Haskell.

2.1 What is inlining?

Given a definition x = E, one can inline x at a particular occurrence by replacing the occurrence by E. We use upper case letters, such as "E", to stand for arbitrary expressions, and "==>" to indicate a program transformation. For example:

let { f = \x -> x*3 } in f (a + b) - c
  ==>
(a+b)*3 - c

We have found it useful to identify three distinct transformations that collectively implement what we informally describe as "inlining":

• Inlining itself replaces an occurrence of a let-bound variable by (a copy of) the right-hand side of its definition. Inlining f in the example above goes like this:

  let { f = \x -> x*3 } in f (a + b) - c
    ==> [inline f]
  let { f = \x -> x*3 } in (\x -> x*3) (a + b) - c

  Notice that not all the occurrences of f need be inlined, and hence that the original definition of f must, in general, be retained.

• Dead code elimination discards bindings that are no longer used; this usually occurs when all occurrences of a variable have been inlined. Continuing our example gives:

  let { f = \x -> x*3 } in (\x -> x*3) (a + b) - c
    ==> [dead f]
  (\x -> x*3) (a + b) - c

• β-reduction simply rewrites a lambda application (\x -> E) A to let {x = A} in E. Applying β-reduction to our running example gives:

  (\x -> x*3) (a + b) - c
    ==> [beta]
  let { x = a+b } in x*3 - c

The first of these is the tricky one; the latter two are easy. In particular, beta reduction simply creates a let binding.
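As a concrete illustration, these three transformations can be written almost directly against a cut-down version of the Core type. The following sketch is ours, not GHC's actual code: it uses String variables, a non-recursive Let only, and a naive traversal that ignores the name-capture problem discussed in Section 3.

-- A cut-down Core, just enough to express the three transformations
-- (an illustrative sketch, not GHC's actual code).
data Expr = Var String
          | App Expr Expr
          | Lam String Expr
          | Let String Expr Expr     -- non-recursive let only
  deriving Show

-- Inlining: replace occurrences of x by its right-hand side.
-- The binding itself is retained; dead-code elimination removes it
-- later if no occurrences are left.  (Naive: handles shadowing of x
-- but ignores name capture; see Section 3 for the real story.)
inline :: String -> Expr -> Expr -> Expr
inline x rhs = go
  where
    go (Var y)     = if y == x then rhs else Var y
    go (App f a)   = App (go f) (go a)
    go (Lam y b)   = Lam y (if y == x then b else go b)
    go (Let y r b) = Let y (go r) (if y == x then b else go b)

-- Beta reduction: (\x -> E) A  ==>  let x = A in E
beta :: Expr -> Expr
beta (App (Lam x e) a) = Let x a e
beta e                 = e

Note that beta creates a let binding rather than substituting A into E; whether that binding can then safely be inlined is exactly the work-duplication question taken up below.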

In a lazy, purely functional language, inlining and dead-code elimination are both unconditionally valid, or meaning-preserving. Neither is valid, in general, in a language permitting side effects, such as Standard ML or Scheme. In particular, notice that inlining is valid, regardless of

• the number of occurrences of x,

• whether or not the binding for x is recursive,

• whether or not E has free variables (that is, inlining of nested definitions is perfectly fine), and

• the syntactic form of E (notably, whether or not it is a lambda abstraction).

Concerning the last of these items, notice that we unconventionally use the term "inline" equally for both functions and non-functions. Continuing the example, x can now be inlined, and then dropped as dead code, thus:

let { x = a+b } in x*3 - c
  ==> [inline x]
let { x = a+b } in (a+b)*3 - c
  ==> [dead x]
(a+b)*3 - c

In this case, x is used exactly once, but we sometimes also inline non-functions that are used several times. Consider:

let x = (a,b)
in
...x...(case x of { (p,q) -> p+1 })...

By inlining x we can then eliminate the case to give

let x = (a,b)
in
...x...(a+1)...

In a similar way, when given bindings such as x=y, inlining subsumes copy propagation.

2.2 Factors affecting inlining

To say that inlining is valid does not mean that it is desirable. Inlining might increase code size, or duplicate work, so we need to be careful about when to do it. There are three distinct factors to consider:

• Does any code get duplicated, and if so, how much? For example, consider

  let f = \v -> ...big... in (f 3, f 4)

  where "...big..." is a large expression. Then inlining f would not duplicate any work (f will still be called twice), but it will duplicate the code for f's body. Bloated programs are bad (increased compilation time, lower cache hit rates), but inlining can often reduce code size by exposing new opportunities for transformations. GHC uses a number of heuristics to determine whether an expression is small enough to duplicate.

• Does any work get duplicated, and if so, how much? For example, consider

  let x = foo 1000 in x+x

  where foo is expensive to compute. Inlining x would result in two calls to foo instead of one. Work can be duplicated even if x only appears once:

  let x = foo 1000
      f = \y -> x * y
  in ...(f 3)...(f 4)...

  If we inline x at its (single) occurrence site, foo will be called every time f is. The general rule is that we must be careful when inlining inside a lambda. It is not hard to come up with examples where a single inlining that duplicates work gives rise to an arbitrarily large increase in run time. GHC is therefore very conservative about work duplication. In general, GHC never duplicates work unless it is sure that the duplication is a small, bounded amount.

• Are any transformations exposed by inlining? For example, consider the bindings:

  f = \x -> E
  g = \ys -> map f ys

  Suppose we were to inline f inside g, thus:

  g = \ys -> map (\x -> E) ys

  No code is duplicated by doing so, but a small bounded amount of work is duplicated, because the closure for (\x -> E) would have to be allocated each time g was called. It is often worth putting up with this work duplication, because inlining f exposes new transformation opportunities at the inlining site. But in this case, nothing at all would be gained by inlining f, because f is not applied to anything.

These considerations imply that inlining is not an optimisation "by itself". The direct effects of careful inlining are small: it may duplicate code or a (constant amount of) work, and usually saves a call or jump (albeit not invariably; see the example in the last bullet above). It is the indirect effects that we are really after: the main reason for inlining is that it often exposes new transformations, by bringing together two code fragments that were previously separate. Thus, in general, inlining decisions must be influenced by context.

2.3 Work duplication

If x is inlined in more than one place, or inlined inside a lambda, we have to worry about work duplication. When will such work duplication be bounded? Answer: at least in the cases when x's right hand side is:

• A variable.

• A constructor application.

• A lambda abstraction.

• An expression that is sure to diverge.

Constructor applications require careful treatment. Consider:

x = (f y, g y)
h = \z -> case x of
            (a,b) -> ...

It would plainly be a disaster, in general, to inline x inside the body of h, since that would duplicate the calls to f and g. Yet we want to inline x so that it can cancel with the case. GHC therefore maintains the invariant that every constructor application has only arguments that can be duplicated with no cost: variables, literals, and type applications. We call such arguments trivial expressions, so the invariant is called the trivial-constructor-argument invariant. Once established, this invariant is easy to maintain (see Section 7.2).

The last case, that of divergent computations, is more surprising, but it is useful in practice. Consider:

sump = \xs ->
  let
    fail = error ("sump" ++ show xs)
  in let rec
    go = \xs -> case xs of
                  []     -> 0
                  (x:xs) -> if x<0 then fail
                            else x + go xs
  in
  go xs

Here error is the standard Haskell function that prints an error message and brings execution to a halt. Semantically, its value is just ⊥, the divergent value. In this example, sump adds up the elements of a list, but reports an error if any element is negative. As it stands, a closure for fail will be allocated every time sump is called. It is perfectly OK to inline fail, because if fail is ever called, execution is going to halt anyway, so there is no work-duplication issue. If we do that, no closure is allocated; instead, error is called directly if an element turns out to be less than zero.

GHC has a predicate whnfOrBot that identifies expressions that are in WHNF or are certainly divergent:

whnfOrBot :: Expr -> Bool

One could easily imagine extending whnfOrBot to cover cases where a small amount of work (other than allocation) is duplicated, such as a few machine instructions.

3 Name capture

It is well known that any transformation-based compiler must be concerned about name capture [Bar85]. Consider, for example:

let x = a+b in
let a = 7 in
x+a

It is obviously quite wrong to inline x to give:

let a = 7 in
(a+b) + a

because the a that was free in x's right hand side has been captured by the let binding for a.

3.1 The sledge hammer

Earlier versions of GHC used a sledge-hammer approach to avoid the name-capture problem: during inlining, GHC would simply rename, or clone, every single bound variable, to give:

let s796 = 7
in (a+b) + s796

This renaming made use of a supply of fresh names that, in this example, has arbitrarily renamed a to s796. This approach suffers from two disadvantages:

• It allocates far more fresh names than are actually necessary, and there is sure to be a compile-time performance cost to this.

• Plumbing the supply of fresh names to the places those names are required is sometimes very painful.

Why is there a compile-time performance cost to the sledge-hammer approach? Because a variable is a structure containing a name; to rename the variable we must copy the structure, inserting the new name. The substitution mapping old names to new names becomes larger. Finally, if the substitution is empty we can sometimes avoid looking at an expression or type at all; but if all names are cloned the substitution is never empty.

If the compiler were written in an impure language, fresh names could be allocated by side effect, but GHC is written in Haskell, which does not have side effects. Using the trees of [ARS94] is the best solution we know of, but it still involves plumbing a tree of fresh names everywhere they might be needed. Worse, the fresh names usually aren't needed, but the tree is nevertheless built. This is deeply irritating: loads of allocation for no purpose whatsoever. Finally, even if we were not worried about performance, it is sometimes extremely painful to get the name supply to where it is needed. For example, in a typed intermediate language it should be possible to have a function:

exprType :: Expr -> Type

that figures out the type of an expression. But suppose the expression is something like:

filter Int pred xs

The function filter has the polymorphic type

filter :: forall a. (a -> Bool) -> [a] -> [a]

So to figure out the type of the subexpression (filter Int) we must instantiate filter's type, substituting Int for a. Oh no! Substitution! That can, in general, give rise to name capture. So we need to feed a name supply to exprType:

exprType :: NameSupply -> Expr -> Type

This "solution" is deeply unattractive, and the situation is only different in its cosmetics if the name supply is hidden in a monad. Something better is required.
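To make the plumbing cost concrete, here is a minimal sketch (our own names, not GHC's) of the explicit-supply style: the supply is threaded in and out of every function that might need a fresh name.

-- A trivially simple name supply: a counter.  Every function that
-- might allocate a fresh name must take the supply as an argument
-- and return the depleted supply as part of its result.
type NameSupply = Int

fresh :: String -> NameSupply -> (String, NameSupply)
fresh base n = (base ++ show n, n + 1)

Hiding this threading in a state monad tidies the syntax but, as noted above, changes only the cosmetics: the supply must still reach every call site that could possibly rename a variable.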

3.2 The rapier

Suppose we write the call subst M [E/x] to mean the result of substituting E for x in M. The standard rule for substitution [Bar85] when M is a lambda abstraction is:

subst (λx.M) [E/x] = λx.M
subst (λx.M) [E/y] = λx.(subst M [E/y])
    if x does not occur free in E

If the side condition does not hold, one must rename the bound variable x to something else. The brute-force solution does this renaming regardless.

Suppose that we lacked a name supply, but instead knew the free variables of E. Then we could test the side condition easily and, in the common case where there is no name capture, find that there was no need to rename x. But what if x was free in E? Then we need to come up with a fresh name for x that is not free in E. A simple approach is to try a variant of x, say "x1". If that, too, is free in E, try "x2", and so on.

When we finally discover a name, xn, that is not free in E, we can augment the substitution to map x to xn and apply this substitution to M, the body of the lambda. In general, then, we must simultaneously substitute for several variables at once.

To make this work at all, though, we need to know the free variables of E, or, more generally, the free variables of the range of the substitution. One way to find this is simply to compute the free variables directly from E, but if E is large this might be costly. However, it suffices to know any superset of these free variables. One obvious choice is the set of all variables that are in scope. If we made this choice, then we would end up renaming any bound variable for which there was an enclosing binding. We call this the no-shadowing strategy, for obvious reasons. The no-shadowing strategy will rename some variables when it is not strictly necessary to do so, but it has the desirable property of idempotence: a complete pass of the simplifier that happens to make no transformations will clone no variables. This is a good thing. Usually, some parts of the program being compiled are fully-transformed before others; the no-shadowing strategy reduces gratuitous "churning" of variable names.

Thus, we are led to a substitution algorithm that has three parameters, instead of two: the expression to which the substitution is applied, the substitution itself, σ, and the set of in-scope variables, θ:

subst (λx.M) σ θ = λx.(subst M (σ \ x) (θ ∪ {x}))
    if x ∉ θ
subst (λx.M) σ θ = λy.(subst M (σ[x ↦ y]) (θ ∪ {y}))
    where y ∉ θ

Notice how conveniently the set of in-scope variables can be maintained. Almost all the time, it simply travels everywhere with the substitution; we shall see some interesting exceptions to this general rule in Section 7.1.

There is one other important subtlety in this algorithm: in the case where x is not in θ we must delete x from the substitution, denoted σ \ x. How could x be in the domain of the substitution, but not be in scope? Perhaps because we are indeed substituting for x as a result of some enclosing inlining. It certainly happens in practice (we have the scars to show for it) though only in situations that are too convoluted to present here.

Occasionally, the set of in-scope variables is not conveniently to hand when starting a substitution. In that case, it is easy to find the set of free variables of the range of the substitution, and use that to get the process started.

3.3 Choosing a new name

The other choice that must be made in the algorithm is how to choose a fresh name, in the (hopefully rare) cases where that proves necessary. We could just try x1, x2, and so on, but there is a danger that once x1...x20 are in scope, then any new x will take 20 tries before finding x21. A simple way out is to compute some sort of hash value from the set of in-scope variables, and use that, together perhaps with the variable to be renamed, to choose a new name. Indeed, simply using the number of enclosing binders as the new variable name gives something not unlike de Bruijn numbers (see Section 3.5). The nice thing is that any old choice will do; the only issue is how many iterations it takes to find an unused variable.

3.4 Measurements

We made some simple measurements of the effectiveness of our approach. We compiled the entire nofib suite, some 370 Haskell modules, comprising around 50,000 lines of code in total [Par92]. The size of each module varied from a few dozen lines to a thousand lines or so.

Figure 1 summarises how many "tries" it took to find a variable name that was not in scope. The columns show what proportion of binders required zero, one, two, 3-9, and 10 or more attempts to find a variable name that was not already in scope. We measured these proportions separately for each module, and then took the arithmetic mean of the resulting figures. The "min" (resp. "max") rows show the smallest (resp. largest) proportions encountered among the entire set of modules.

             Number of attempts
           0      1      2    3-9    10+
  Mean  93.2    1.3    0.7    1.6    3.2
  Min   0.94      0      0      0      0
  Max    100     10   6.13   18.2     94

  Figure 1: Cloning rates (% of binders)

The zero column corresponds to the situation where the binder is not shadowed; as expected, this is the case for the vast majority (93%) of binders. Our hash function (we simply picked an arbitrary member of the in-scope set as a hash value) is obviously too simple, though: on average 3.2% of all binders required more than ten attempts to find a fresh name, and in one pathological module almost all binders required more than ten attempts. This pathological case suggests that there is plenty of room for improvement in the hash function.
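The substitution algorithm of Section 3.2 transcribes almost directly into Haskell. The following sketch is our transcription, not GHC's code; it uses a minimal expression type with Data.Map and Data.Set, and picks new names by the simple "try x1, x2, ..." scheme rather than a hash-based one.

import qualified Data.Map as Map
import qualified Data.Set as Set

data Expr = Var String | App Expr Expr | Lam String Expr

type Subst   = Map.Map String Expr   -- sigma
type InScope = Set.Set String        -- theta

subst :: Subst -> InScope -> Expr -> Expr
subst sig _  (Var x)   = Map.findWithDefault (Var x) x sig
subst sig th (App f a) = App (subst sig th f) (subst sig th a)
subst sig th (Lam x m)
  | not (x `Set.member` th)
    -- No clash: keep x, but delete it from the substitution;
    -- x may be in sig's domain because of an enclosing inlining.
  = Lam x (subst (Map.delete x sig) (Set.insert x th) m)
  | otherwise
    -- Clash: rename x to the first variant not already in scope.
  = Lam y (subst (Map.insert x (Var y) sig) (Set.insert y th) m)
  where
    y = head [ v | i <- [1 :: Int ..]
                 , let v = x ++ show i
                 , not (v `Set.member` th) ]

One subtlety worth noting: in the renaming case the new binder y is added to the in-scope set, so later clashes against y are detected in exactly the same way as clashes against original names.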

3.5 Other approaches

Another well-known approach to the name-capture problem is to use de Bruijn numbers [dB80]. Apart from being entirely unreadable, this approach suffers from the disadvantage that when pushing a substitution inside a lambda, the entire range of the substitution must have its de Bruijn numbers adjusted. That operation can be carried out lazily, to avoid a complexity explosion when pushing a substitution inside multiple lambdas, but that means yet more administration.

It is far from clear that using de Bruijn numbers gains any efficiency, and they carry a considerable cost in terms of the opacity of the resulting program. Programmers will not care about this, but compiler writers do.

There is one fairly compelling reason for using de Bruijn numbers. Precisely because they do discard the original variable names, many more common sub-expressions can arise. These CSEs increase sharing of the compiler's representation of the program; they do not in general represent run-time sharing. However this compile-time sharing can be particularly important when dealing with types, which can get large. Shao, for example, reports substantial savings when using de Bruijn numbers for types together with hash-consing [SLM98]. However, our types are smaller than his (we are not compiling SML modules) so type sizes only become an issue for deliberately pathological programs with exponential-sized types.

Another popular approach to the name-capture problem is this: establish the invariant that every bound variable is unique in the whole program. Then, since only inlining can duplicate an expression, we can maintain the invariant by cloning all the locally-bound variables of an inlined expression. There are three difficulties here. First, we found in practice that (in GHC at least) there were quite a few transformations that had to do extra work to maintain the global-uniqueness invariant. Secondly, this strategy will do more cloning than is really necessary. Thirdly, cloning the local binders of an inlined expression implies a whole extra pass over that expression, prior to simplifying the expression in its new context. Our approach, of maintaining an in-scope set, combines the cloning pass with the simplification pass, and simultaneously reduces the amount of cloning that has to be done.

3.6 Summary

Our new substitution algorithm is a simple re-working of the standard algorithm in [Bar85]. What is interesting is that the resulting algorithm seems quite practical. Even if the compiler were written in a language where name-supply plumbing was not an issue, maintaining the set of in-scope variables makes it easy to reduce the amount of cloning that is done.

In GHC, a variable's name is actually a pair of a string and a unique number. The unique is used for comparisons, but the string is used when printing (optionally augmented with the unique if there is a danger of ambiguity). When we do need to clone a name, we invent a new unique, but keep the same print-name. This makes it possible to print dumps of intermediate code that still contain names that relate to the original source program.

4 Ensuring termination

Inlining, together with beta reduction, corresponds closely to compile-time evaluation of the program, so we must clearly be concerned about ensuring that the compiler terminates. We start from a secure base: it is a fact that Fω is strongly normalising. This is a complicated way of saying that the process of reducing every reducible expression (redex) in an Fω program will surely terminate. However, GHC's intermediate language extends Fω. These extensions introduce non-termination in two distinct ways:

Recursive bindings. If a recursively-bound variable is inlined at one of its occurrences, that will introduce a new occurrence of the same variable. Unless restricted in some way, inlining could go on for ever.

Recursive data types. Consider the following Haskell definition for loop:

  data T = C (T -> Int)

  g = \y -> case y of
              C h -> h y

  loop = g (C g)

Here, g is small and non-recursive, so when processing g (C g), g will be inlined. But the inlined call very soon rewrites to g (C g), which is just the expression we started with. The problem here is that the data type T is recursive, and it appears contravariantly in its own definition [How96].

Of these two forms of divergence, the former is an immediate and pressing problem, since almost any interesting Haskell program involves recursion. The rest of this section focuses entirely on recursive definitions.

In contrast, the latter situation is rather rare, and (embarrassingly) GHC can still be persuaded to diverge by such examples. The most straightforward solution is to spot such contravariant data types, and disable the case-elimination transformation

case C g of { C h -> ...h... }
  ==>
...g...

The question of spotting contravariant data types is complicated by the fact that Haskell data types can be parameterised and mutually recursive. The MLj compiler [BKR98] restricts data type declarations somewhat, but does perform the analysis for exactly this reason.

Before discussing recursive bindings, it is worth noting two other possible sources of divergence that a Haskell compiler does not have to deal with. Firstly, in an untyped setting (such as a Scheme compiler) one can easily construct terms such as

(\x -> x x) (\x -> x x)

This expression is not explicitly recursive, but it nevertheless reduces to itself. However, the strong-normalisation theorem for Fω tells us that such terms simply must be ill-typed.

Secondly, side effects (which Haskell lacks) can create a recursive structure. For example:

(let ((foo a-special-value)
      (bar a-special-value))
  (begin
    (set! foo (lambda ..bar..))
    (set! bar (lambda ..foo..))
    body))

Here, foo and bar are mutable locations, each of which is updated to refer to the other.

4.1 The problem

From now on we focus our attention on recursive bindings. We call a group of bindings wrapped in rec a recursive group. Unrestricted inlining of non-recursive bindings is safe, but unrestricted inlining of recursive bindings might lead to non-termination. One obvious thing to do, therefore, is to ensure that each recursive group really is recursive. To discover this, we regard each variable in the group as a node, and we record an edge from f to g if f's right hand side mentions g (so f depends on g). The resulting collection of nodes and edges describes a graph, called the dependency graph, whose strongly connected components are the smallest possible recursive groups [Pey87]. To exploit this observation, GHC constructs the dependency graph for each let rec, and analyses its strongly-connected components. If there is more than one component, the let rec is split into a nest of recursive and non-recursive lets. GHC performs this analysis regularly; quite often, groups that were mutually-recursive fall into separate strongly-connected components as a result of earlier transformations.

So much is well known. But what do we do when we are faced with a genuinely recursive group? The simplest thing to do is not to inline any recursively-bound variables at all, and that is what earlier versions of GHC did. But this conservative strategy loses obviously-useful optimisation opportunities. Consider a recursive group of bindings:

let rec
  f = \x -> ...g...
  g = \y -> ...f...
in
...f...

By convention, other variables of interest, such as g in this case, are assumed not to be free in ...f... . Since only f is called outside the rec, we can inline g at its (unique) call site to give:

let rec
  f = \x -> ...(...f...)...
in
...f...

Here, the gain is modest. But sometimes inlining in recs is critically important. Consider this:

let
  eq = ...
in let rec
  d = (eq, neq)
  neq = \a b -> case d of
                  (e,n) -> not (e a b)
in
...

GHC generates code quite like this for an "Eq dictionary". A "dictionary" is a bundle of related "methods" for operating on values of a particular type. Here, the Eq dictionary, d, is a pair of methods (ordinary functions), eq and neq; the intention is that eq is a function that determines whether its arguments are equal, and neq determines whether they are unequal.

In this example, the neq method is specified by selecting the eq method from the dictionary d, calling it, and negating its result. You might think that it would be more straightforward to call eq directly, but this code is generated by the compiler from class and instance declarations in the Haskell source code. We found that it was very hard, in general, to call the appropriate method directly; it was much easier to allow the front end to generate naive code, and let the simplifier take care of the rest.

In this particular example, d and neq are genuinely mutually recursive. Yet, if d were inlined in the body of neq, the case would cancel with the pair constructor, leading to the following:

let
  eq = ...
  neq = \a b -> not (eq a b)
  d = (eq, neq)
in
...

Now everything is non-recursive, the definition of neq is improved, and inlining opportunities in the rest of the program are improved.

This is not an isolated or artificial example. Compiling Haskell's type-class-based overloading, using the dictionary-passing encoding sketched above, gives rise to pervasive recursion through these dictionaries. Failing to unravel the recursion has a devastating effect on performance, because overloaded functions include equality, ordering, and all numeric operations, some of which show up in almost any inner loop. We originally went to great lengths in the front end to avoid generating unnecessary dictionary recursion but, no matter how hard we tried, some unnecessary recs still showed up. Our new approach uses a much simpler translation scheme, along with an inliner that does a good job of inlining rec-bound variables. This approach has the merit that it works equally well for complex recursions written by the programmer, though admittedly these are much less common.

4.2 The solution

The real problem with recursive bindings is that they can

1

Thanks to Manuel Serrano for p ointing this out.

make the inliner fall into an in nite lo op. The key insightis

this: 7

  * The inliner cannot loop if every cycle in the dependency graph is broken by a variable that is never inlined.

The conservative scheme works by never inlining any recursively-bound variable, but that is over-kill, as we saw in the example in Section 4.1:

    rec
      d   = (eq, neq)
      neq = \a b -> case d of
                      (e,n) -> not (e a b)

we obtained much better results by inlining d (but not neq) than by inlining neither. The dependency graph for this group forms a circle, thus:

    [Diagram: d and neq, with an arc in each direction, forming a two-node cycle]

To prevent the inliner diverging, it suffices to choose either d or neq, and refrain from inlining it. In a more complicated situation, however, it might not be at all obvious which variables suffice to break all the loops. For example, consider this more complex dependency graph:

    [Diagram: a five-node dependency graph over f, g, h, p and q]

In this graph, we can break all the loops by picking g alone, or f and q, or h and p, or a variety of other pairs. To exploit this idea, we enhance the standard rec-breaking dependency analysis described above, in the following way. For each rec group, we construct its dependency graph, and then execute the following algorithm:

  1. Perform a strongly-connected component analysis of the dependency graph.

  2. For each strongly-connected component of the graph, perform the following steps, treating the components in topologically-sorted order; that is, deal first with the component that does not refer to any of the other components, and so on.

     (a) If the component is a singleton that does not depend on itself, do nothing.

     (b) Otherwise, choose a single variable, the loop-breaker, that will not be inlined. This choice is made using a heuristic we discuss shortly (Section 4.3).

     (c) Take the dependency graph of the component (a subset of the original graph), and delete all the edges in this graph that terminate at the loop-breaker.

     (d) Repeat the entire algorithm for this new dependency graph, starting with Step 1.

The result of the algorithm is an ordered list of bindings with the following property: the only forward references are to loop-breakers. The bindings are still, of course, mutually recursive, but all the non-loop-breakers can be treated exactly like non-recursive lets so far as the inliner is concerned: their definition occurs before any of their uses, and inlining them cannot cause non-termination. For example, consider the five-node dependency graph given above. It forms a single strongly-connected component. Suppose we pick q as a loop breaker; we delete arcs leading to it and perform the strongly-connected component analysis again. The reduced dependency graph has three strongly-connected components, namely {p}, {f,g,h}, and {q}:

    [Diagram: the reduced graph, with the deleted arcs leading to q shown dashed]

We use dashed arcs for the arcs that are deleted in step (c). Suppose now that we choose f as the loop breaker. Now we have no strongly connected components left in the reduced graph:

    [Diagram: the further-reduced graph, with the deleted arcs leading to f and q shown dashed]

Notice that the only forward arcs are the dashed arcs leading to loop breakers. Reconstructing the recursive group in topologically sorted order (left to right in the diagrams) gives:

    rec
      p  = ...q...
      h  = ...f...
      g  = ...h...
      f* = ...g...
      q* = ...g...

The "*" indicates the loop breakers. Only the loop breakers are referred to in the group earlier than they are defined, considering the definitions top to bottom. This is a wonderful property. As we shall see later (Section 6), inlining even non-recursive let-bound variables is far from straightforward, and having to worry about recursion would only make it worse. The beauty of the loop-breaking algorithm means that recursive lets can be treated essentially identically to non-recursive lets, thereby factoring the problem into two independent pieces: first cut the loops, and then treat recursive and non-recursive bindings uniformly.
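The algorithm above can be sketched directly with the SCC analysis from Data.Graph. This is our own illustrative stand-in, not GHC's code: in particular, the head-of-component choice of breaker is arbitrary here, whereas GHC uses the scoring heuristic of Section 4.3.

```haskell
import Data.Graph (SCC (..), stronglyConnComp)

-- A binding is a variable paired with the variables that its
-- right-hand side mentions.  Following steps 1 and 2(a)-(d) above,
-- 'loopBreakers' returns the variables chosen as loop breakers.
loopBreakers :: Ord v => [(v, [v])] -> [v]
loopBreakers binds = concatMap breakSCC (stronglyConnComp graph)
  where
    graph = [ (v, v, deps) | (v, deps) <- binds ]

    -- Step 2(a): a singleton that does not depend on itself.
    breakSCC (AcyclicSCC _) = []
    -- Steps 2(b)-(d): pick a breaker (here, arbitrarily, the first
    -- variable of the component), delete the edges leading to it,
    -- and repeat the whole algorithm on what remains.
    breakSCC (CyclicSCC vs) = breaker : loopBreakers rest
      where
        breaker = head vs
        rest    = [ (v, filter (/= breaker) deps)
                  | (v, deps) <- binds, v `elem` vs, v /= breaker ]
```

For a group with the dependencies of the reconstructed example above (p depends on q, h on f, g on h, f on g, q on g), the only cycle is f-g-h, so this sketch selects a single breaker from that cycle, after which every remaining component is acyclic.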

4.3 Selecting the loop breaker

There are two criteria that one might use to select a loop breaker:

  * Try not to select a variable that it would be very beneficial to inline.

  * Try to select a variable that will break many loops.

GHC currently uses only the first of these criteria. The second is a bit tricky to predict, and we have not explored using it. To evaluate the first criterion, GHC crudely "scores" each variable by how keen GHC is to inline it. Specifically, we pick the first of the following criteria that applies to the binding in question:

Score = 3, if the right hand side is just a constant or variable. In this case the binding will certainly be inlined.

Score = 3, if the variable occurs just once (counting both the right hand sides of the rec itself and the body of the let). The variable is likely to be inlined if it occurs only once.

Score = 2, if the right hand side is a constructor application. Thus, we avoid selecting d in the example in Section 4.1, because its right hand side is a pair.

Score = 1, if the variable has rewrite rules or specialisations attached to it. Details of this are beyond the scope of this paper.

Score = 0, otherwise.

Then we pick a loop breaker by arbitrarily choosing one of the variables with lowest score. While this scoring mechanism is very crude, it seems adequate. In practice, we have never come across a rec in which a different choice of loop breaker would have made a significant difference. This amounts to anecdotal evidence only; we have not tried systematically to measure the effectiveness of loop-breaker choice.

4.4 Other approaches

A much more common approach to termination, taken by both [Ser97] and [WD97], is to bound both the effort that the inliner is prepared to invest, and the size of the expression it is prepared to build, when inlining a particular call. If either limit is exceeded, the inliner abandons the attempt to inline the call. Bounding effort deals with expressions, such as (\x->x x)(\x->x x), that do not grow, but do not terminate either. The effort bound is typically set quite high, to allow for cascading transformations, so an effort bound alone might produce very large residual programs; that is why the size bound is necessary as well.

A variant of the approach retains a stack of inlinings that have been begun but not completed. When examining a call, the function is not inlined if an inlining of that same function is already in progress, or "pending". In effect, that function becomes the loop breaker, but it is chosen dynamically rather than statically.

This approach has the very great merit that it deals readily with all forms of non-termination: recursive functions, recursive data types, untyped languages and side effects, for example, all cause no problems. The difficulty with this approach in our setting is that the simplifier is applied repeatedly, a dozen times or more, between applying other transformations (strictness analysis, let-floating, etc). If each iteration accepts a given amount of code growth, or effort applied, then each iteration might unroll a recursive function further. The effort/size bound mechanism uses an auxiliary parameter (the effort/size budget) that is not recorded in the tree between successive iterations of the simplifier; it records the state of the inliner itself.

Our approach does not have this problem: successive applications of the simplifier will eventually terminate. However, our more static analysis required that recursive functions and recursive data types be handled differently, which is undesirable. And yet more would be needed in an untyped or impure setting.

A quite separate, complementary, approach to inlining recursive functions is variously described by [App94] ("loop headers"), [Ser97] ("labels-inline"), [DS97] ("lambda-dropping"), and [San95] ("the static argument transformation"). The common idea is to turn a recursive function definition into a non-recursive function containing a local, recursive definition. Thus we can, for example, transform the standard recursive definition of map:

    map = \f xs -> case xs of
                     []   -> []
                     x:xs -> f x : map f xs

into the following non-recursive definition:

    map = \f xs ->
      let mp = \xs -> case xs of
                        []   -> []
                        x:xs -> f x : mp xs
      in mp xs

With the original definition, inlining would simply unroll a finite number of iterations of map. With the new definition, inlining map creates a new, specialised function definition for mp into which the particular f used at the call site can be inlined, perhaps resulting in better code; claimed benefits range from 1% to 10%. The overall effect is much better than that achieved by simply unrolling the original definition of map: unrolling a loop reduces the overheads of the loop itself, whereas creating a specialised function, mp, reduces the cost of the computation in each iteration of the loop.

The static argument transformation may indeed be useful, but it is orthogonal to the main thrust of this paper. It is best considered as a separate transformation, performed on map before inlining is begun, that enhances the effectiveness of inlining.

4.5 Results

It is hard to offer convincing measurements for the effectiveness of the loop-breaker algorithm, because GHC is now built in the expectation that recs that can be broken will be. Nevertheless, Figure 2 gives some indicative results. It shows the effect of switching the loop-breaker algorithm off, by marking every rec-bound variable as a loop breaker.
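The scoring heuristic of Section 4.3 can be sketched as follows. Rhs, the occurrence count and the rules flag are deliberately simplified stand-ins of our own for information GHC's occurrence analyser actually records; they are not GHC's types.

```haskell
import Data.List (minimumBy)
import Data.Ord  (comparing)

-- Simplified stand-in for a Core right-hand side.
data Rhs = ConstOrVar | ConApp | OtherRhs

-- Pick the first criterion that applies, as in Section 4.3.
score :: Rhs -> Int -> Bool -> Int
score rhs occurrences hasRules
  | ConstOrVar <- rhs = 3  -- will certainly be inlined
  | occurrences == 1  = 3  -- likely to be inlined
  | ConApp <- rhs     = 2  -- e.g. the pair bound to d in Section 4.1
  | hasRules          = 1  -- rewrite rules or specialisations attached
  | otherwise         = 0

-- The loop breaker is an (arbitrary) variable with the lowest score.
pickLoopBreaker :: [(String, Int)] -> String
pickLoopBreaker = fst . minimumBy (comparing snd)
```

For the d/neq group of Section 4.1, d's right-hand side is a constructor application (score 2) while neq's lambda scores 0, so neq is chosen as the breaker and d remains inlinable.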

    Allocations    No libs    Libs too
    Mean           +23%       +78%
    Min            -15%       0%
    Max            +200%      +1125%

Figure 2: Effect on total allocation of switching off the loop-breaker algorithm

The "Mean" row shows the geometric mean of the ratio between the switched-off version and the baseline version; we use a geometric mean because we are averaging ratios [FW86]. The "Min" and "Max" rows show the most extreme ratios we found.

The effects are dramatic. The column headed "No libs" has the loop-breaking algorithm switched off when compiling the application, but not when compiling the standard libraries. The column "Libs too" shows the effect of switching off the loop-breaking algorithm when compiling the standard libraries as well. The importance of the libraries is that they contain implementations of arithmetic over basic types; if that is compiled badly then performance suffers horribly. We are investigating the strange -15% figure, which suggests that switching off loop breakers improved at least one program.

4.6 Summary

In retrospect, the algorithm is entirely obvious, yet we spent ages trying half-baked hacks, none of which quite worked, before finally biting the bullet and finding it quite tasty. It is more likely to be important for compilers for lazy languages than for strict ones, because only non-strict languages allow recursive data structures, and it is there that the most important performance implications show up. However, as our first example demonstrated, even where no data structures are involved, useful improvements can be had.

All of this is entirely orthogonal to the question of loop unrolling. A loop breaker could be inlined a fixed number of times to gain the effect of loop unrolling.

5 Overall architecture

The GHC inliner tries to do as much inlining as possible in a single pass. Since inlining often reveals new opportunities for further transformations, the inliner is actually part of GHC's simplifier, which performs a large number of local transformations [PJS98]. In this section we give an overview of the simplifier to set the scene for the rest of the paper.

5.1 The simplifier

The simplifier takes a substitution, a set of in-scope variables, an expression, and a "context", and delivers a simplified expression:

    simplExpr :: Subst -> InScopeSet
              -> InExpr -> Context
              -> OutExpr

The real simplifier's type is a bit more complicated than this: it takes an argument that enables or disables individual transformations; it gathers statistics about how many transformations are performed; and it takes a name supply, to use when it has to conjure up a fresh name not based on an existing name.(2) However, we will not need to consider these aspects here.

(2) We could certainly do without this name supply, by conjuring up names based on an arbitrary base name, but it turns out that it can conveniently piggy-back on the monadic plumbing for the other administrative arguments.

The substitution and in-scope set perform precisely the roles described in Section 3, but, as we shall see, they both have further uses. The context tells the simplifier something about the context in which the expression appears (e.g. it is applied to some arguments, or it is the scrutinee of a case expression). This context information is important when making inlining decisions (Section 7.5).

We refer to an un-processed expression as an "in-expression", and an expression that has already been processed as an "out-expression", and similarly for variables. The reasons for making these distinctions will become apparent (Section 6.2).

    type InVar   = Var
    type InExpr  = Expr
    type InAlt   = Alt

    type OutVar  = Var
    type OutExpr = Expr
    type OutAlt  = Alt

As indicated in Section 2, the simplifier treats an entire Haskell module (which GHC treats as a compilation unit) as a sequence of bindings, some recursive and some not. It deals with each of these bindings in turn, just as if they were in a nested sequence of lets.

5.2 The occurrence analyser

It is clear that whether to inline x depends a great deal on how often x occurs in E. Before each run of the simplifier, GHC runs an occurrence analyser, a bottom-up pass that annotates each binder with an indication of how it occurs, chosen from the following list:

LoopBreaker. The occurrence analyser executes the dependency-graph algorithm we discussed in Section 4.1, marking loop breakers, and sorting the bindings in each rec so that only loop breakers are referred to by an earlier definition in the sequence. Building the dependency graph uses precisely the information that the occurrence analyser is gathering anyway, namely information about where the bound variables of the rec occur.

Dead. The binder does not occur at all. For a let binder (whether recursive or not), the binding can be discarded, and the occurrence analyser does so immediately, so that it does not need to analyse the right hand sides.

OnceSafe. The binder occurs exactly once, and that occurrence is not inside a lambda, nor is it a constructor argument. Inlining is unconditionally safe; it duplicates neither code nor work. Section 2.2 explained why we must not inline an arbitrary expression inside a lambda, and also described the trivial-constructor-argument invariant.

MultiSafe. The binder occurs at most once in each of several distinct case branches; none of these occurrences is inside a lambda. For example:

    case xs of
      []   -> y+1
      x:xs -> y+2

In this expression, y occurs only once in each case branch. Inlining y may duplicate code, but it will not duplicate work.

OnceUnsafe. The binder occurs exactly once, but inside a lambda. Inlining will not duplicate code, but it might duplicate work (Section 2.2).

MultiUnsafe. The binder may occur many times, including inside lambdas. Variables exported from the module being compiled are also marked MultiUnsafe, since the compiler cannot predict how often they are used.

Notice that we have three variants of "occurs once": OnceSafe, MultiSafe, and OnceUnsafe. We have found all three to be important.

Some lambdas are certain to be called at most once. Consider:

    let x = foo 1000
        f = \y -> x+y
    in case a of
         []   -> f 3
         b:bs -> f 4

Here f cannot be called more than once, so no work will be duplicated by inlining x, even though its occurrence is inside a lambda. Hence, it would be better to give x an occurrence annotation of OnceSafe, rather than OnceUnsafe.

We call such lambdas one-shot lambdas, and mark them specially. They certainly occur in practice; for example, they are constructed as join points by the case-of-case transformation (for details see [PJS98]). We are still working on a type-based analysis for identifying one-shot lambdas [WP99]. Details of this analysis are beyond the scope of this paper, but our point here is that they are beautifully easy to exploit: the occurrence analyser simply ignores them when it is gathering its "inside-lambda" information.

5.3 Summary

The overall plan for GHC's simplifier is therefore as follows:

    while something-happened && iterations < 4
    do
      perform occurrence analysis
      simplify the result
    end

The simplifier alternates between occurrence analysis and simplification, until the latter indicates that no transformations occurred, or until some arbitrary number (currently 4) of iterations has occurred. This entire algorithm is applied between other major passes, such as specialisation, strictness analysis [PP93], or let-floating [PPS96].

GHC is capable of wholesale inlining across module boundaries. Whenever GHC compiles a module M it writes an "interface file", M.hi, that contains GHC-specific information about M, including the full Core-language definitions for any top-level definitions in M that are smaller than a fixed threshold. This threshold is chosen so that few, if any, larger functions could possibly be inlined, regardless of the calling context. When compiling any module, A, that imports M, GHC slurps in M.hi, and is thereby equipped to inline calls in A to M's exports. Since the definition of a function exported from M might refer to values not exported from M, GHC dumps into M.hi the transitive closure of all sufficiently small functions reachable from M's exports. Values that are not exported from M may not be mentioned directly by the programmer, but may nevertheless be inlined by the inliner.

The consequence of all this is that A may need to be recompiled if M changes. There is no avoiding this, except by disabling cross-module inlining via a command-line flag. GHC goes to some trouble to add version stamps to every inlining in M.hi so that it can deduce whether or not A really needs to be recompiled.

6 The three-phase inlining strategy

After considerable experimentation, GHC now makes an inlining decision about a particular let-bound variable at no fewer than three distinct moments. In this section we explain why. Consider again the expression:

    let x = E in B

PreInlineUnconditionally. When the simplifier meets the expression for the first time, it considers whether to inline x unconditionally in B. It does so if and only if x is marked OnceSafe (see Section 5.2). In this case, the simplifier does not touch E at all; it simply binds x to E in its current substitution, discards the binding completely, and simplifies B using this extended substitution. This is the main use of the substitution beyond dealing with name capture, but it needs a little care, as we discuss in Section 6.2.

Notice, crucially, that the right hand side of the definition is processed only once, namely at the occurrence site. It turns out that this is very important. If the right hand side is processed when the let is encountered, and then again at the occurrence of the variable, the complexity of the simplifier becomes exponential in program size. Why? Because the right hand side is processed twice; and it might have a let whose right hand side is then processed twice each time; and so on. In retrospect this is obvious, but it was very puzzling at the time!

PostInlineUnconditionally. If the pre-inline test fails, the simplifier next simplifies the right hand side, E, to produce E'.

It then again considers whether to inline x unconditionally in B. It decides to do so if and only if:

  * x is not exported from this module (exported definitions must not be discarded), and

  * x is not a loop breaker, and

  * E' is trivial; that is, a literal or variable.(3) Neither work nor code is duplicated if a trivial expression is inlined.

(3) Or, in the real compiler, a type application.

If so, then again the binding is dropped, and x is mapped to E' in the substitution.

This case is quite common; it corresponds to copy propagation in a conventional compiler. It often arises as a result of β-reduction. For example, consider the definitions:

    f = \x -> E
    t = f a

If f is inlined, we get a redex, and thence:

    f = \x -> E
    t = let x = a in E

The interesting question is why we do not make this test at the PreInlineUnconditionally stage, something we discuss below.

CallSiteInline. If neither of the above holds, GHC retains the let binding, and adds x to the in-scope set. While processing B, at every occurrence of x, GHC considers whether to inline x. This decision is based on a fairly complex heuristic, that we discuss in Section 7. If the decision is "Yes", then GHC needs to have access to x's definition; this can be achieved quite elegantly, as we discuss in Section 6.3.

6.1 Why three-phase?

An obvious question is this: why not combine PostInlineUnconditionally with PreInlineUnconditionally? That is, before processing E, why not look to see if it is trivial (e.g. a variable), and if so inline it unconditionally? Doing so is a huge, but rather subtle, mistake.

The mistake is to do with the correctness of the pre-computed occurrence information. Suppose we have:

    let
      a = ...big...
      b = a
    in
      ...b...b...b...

a will be marked OnceSafe, and hence will be inlined unconditionally. But if PreInlineUnconditionally now sees that b's right-hand side is just a, and inlines b everywhere, a now effectively occurs in many places. This is a disaster, because a is now inlined unconditionally in many places.

The cause of this disaster is that a's occurrence information was rendered invalid by our decision to inline b. Several solutions suggest themselves (for example, provide some mechanism for fixing a's occurrence information, or get the occurrence analyser to propagate b's occurrences to a) and we tried some of them. They are all complicated, and the result was a bug farm.

We finally discovered the three-phase inline mechanism we have described. It is simple, and obviously correct. The PreInlineUnconditionally phase only inlines a variable x if x occurs once, not inside a lambda. That means that the occurrence information for any variable, y, free in x's right hand side is unaffected by the inlining.

On the other hand, once the right hand side has been processed, if y is going to be inlined unconditionally, then that will have happened already. In our example, PreInlineUnconditionally will decide to inline a. Now the simplifier moves on to the binding for b. PreInlineUnconditionally declines to inline, so the right hand side of b is processed; a is inlined, and a processed version of ...big... is produced. This is not trivial, so PostInlineUnconditionally declines too.

Another obvious question is whether PostInlineUnconditionally could be omitted altogether, leaving CallSiteInline to do its work. Here the answer is clearly "yes"; PostInlineUnconditionally is just an optimisation that allows trivial bindings to be dropped a little earlier than would otherwise be the case. To summarise, the key feature of our three-phase inlining strategy is that it allows the use of simple, pre-computed occurrence information, while still avoiding the exponential blowup that can occur if PreInlineUnconditionally is omitted.

6.2 The substitution

As we mentioned at the start of Section 6, the simplifier carries along (a) the current substitution, and (b) the set of variables in scope. But since the simplifier is busy transforming the expression and cloning variables, we have to be more precise:

  * The domain of the substitution is in-variables.

  * The in-scope set consists of out-variables.

We discussed in-variables and out-variables in Section 5. But what is the range of the substitution? When used for cloning or PostInlineUnconditionally the range was an out-expression, but when used in PreInlineUnconditionally the range was an in-expression. But watch out! Since we are, in effect, deferring the simplification of the in-expression, we must also record the substitution appropriate to the original site of the expression. Thus we are led to the following definition for the substitution:

    type Subst    = FiniteMap InVar SubstRng
    data SubstRng = DoneEx OutExpr
                  | SuspEx InExpr Subst

A DoneEx is straightforward, and is used both by the name-cloning mechanism, and by PostInlineUnconditionally. A SuspEx ("Susp" for "suspended") is used by PreInlineUnconditionally, and pairs an in-expression with the substitution appropriate to its let binding; you can think of it as a suspended application of simplExpr. Notice that we do not capture the in-scope set as well.

Why not? Because we must use the in-scope set appropriate to the occurrence site; Section 7.1 amplifies this point.

6.3 The in-scope set

We mentioned earlier (Section 6) that the simplifier needs access to a let-bound variable's right-hand side at its occurrence sites. All we need is to turn the in-scope set into a finite mapping:

    type InScopeSet = FiniteMap OutVar Definition
    data Definition = Unknown
                    | BoundTo OutExpr OccInfo
                    | NotAmong [DataCon]

Whether or not a variable is in scope can be answered by looking in the domain of the in-scope set (we still call it a "set" for old times' sake). But the range of the mapping records what value the variable is bound to:

Unknown is used for variables bound in lambda and case patterns. We don't know what value such a variable is bound to.

BoundTo is used for let-bound variables (both recursive and non-recursive), and records the right-hand side of the definition and the occurrence information left with the binding by the occurrence analyser. The latter is needed when making the inlining decision at occurrence sites.

NotAmong is described shortly.

The in-scope set is also a convenient place to record information that is valid in only part of a variable's scope. Consider:

    \x -> ...(case x of (a,b) -> E)...

When processing E, but not in the "..." parts, x is known to be bound to (a,b). So, when processing the alternative of a case expression whose scrutinee is a variable, it is easy for the simplifier to modify the in-scope set to record x's binding. Why is this useful? Because E might contain another case expression scrutinising x:

    ...case x of (p,q) -> F...

By inlining (a,b) for x, we can eliminate this case altogether. This turns out to be a big win [PJS98].

The NotAmong variant of the Definition type allows the simplifier to record negative information:

    case x of
      Red     -> ...
      Blue    -> ...
      Green   -> ...
      DEFAULT -> E

The DEFAULT alternative matches any constructors other than Red, Blue, and Green. GHC supports such DEFAULT alternatives directly, rather than requiring case expressions to be exhaustive, which is dreadful for large data types. Inside E, what is known about x? What we know is that it is not bound to Red, Blue, or Green. This can be useful; if E contains a case expression that scrutinises x, we can eliminate any alternatives that cannot possibly match. Similarly, the expression x `seq` F inside E can be transformed to just F, since NotAmong implies that x is evaluated.(4) Even the value NotAmong [] is useful: it signals that the variable is evaluated, without specifying anything about its value.

(4) The expression E1 `seq` E2 evaluates E1, discards the result, and then evaluates and returns E2.

The in-scope set, extended to be an in-scope mapping, plays the role of a dynamic environment. It records knowledge of the value of each in-scope variable, including knowledge that may be true for only part of that variable's scope. The nice thing is that this dynamic knowledge can elegantly be carried by the in-scope set, which we need anyway. The details of the transformations that exploit that dynamic knowledge are beyond the scope of this paper.

Almost all the time, the substitution and in-scope set travel together. But that is not always the case, as we discuss in Section 7.1.

6.4 Measurements

Figure 3 gives some simple measurements of the relative frequency of each form of inlining. We used the same set of benchmark programs as in Section 3.4, gathered statistics on how often each sort of inlining was used, and averaged these separately-calculated proportions. We took arithmetic means of the percentages, because here we are averaging "slices of the pie", so the "Mean" line should still sum to 100%.

            Pre      Post     CallSite
    Mean    47.4%    17.4%    35.2%
    Min     0.25%    0.92%    0.72%
    Max     80%      95%      98%

Figure 3: Relative frequency of inlining

The figures indicate that on average, each sort of inlining is actually used in practice, and that each dominates in some programs.

6.5 Summary

We can summarise the binding-site effects on the substitution and in-scope set as follows. Suppose that we encounter the binding x = E with a substitution subst, and an in-scope set in-scope.

PreInlineUnconditionally. The substitution is extended by binding x to (SuspEx E subst). The in-scope set is not changed.

PostInlineUnconditionally. The substitution is extended by binding x to (DoneEx E'), where E' is the simplified version of E. The in-scope set is not changed.

Otherwise. If x is not already in scope, the substitution is not changed, but the in-scope set is extended by binding x to E'. If x is already in scope, then a new variable name x' is invented (Section 3.3); the substitution is extended by binding x to (DoneEx x'), and the in-scope set is extended by binding x' to E'.

extended by binding x to (DoneEx x'), and the in-scope set is extended by binding x' to E'.

This concludes the discussion of what happens at the binding site of a variable. Now we consider what happens at its occurrences.
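The three binding-site outcomes can be modelled directly. The following is an executable sketch, not GHC's code: Var, Expr, Decision, and the "append 1" fresh-name convention are invented stand-ins for the machinery of Sections 3 and 6.

```haskell
-- Executable sketch of the Section 6.5 summary; NOT GHC's code.
import qualified Data.Map as Map

type Var  = String
type Expr = String                 -- stand-in for out-expressions

data SubstRhs = SuspEx Expr Subst  -- un-simplified E, with its captured substitution
              | DoneEx Expr        -- already-simplified out-expression
              deriving (Eq, Show)
type Subst   = Map.Map Var SubstRhs
type InScope = Map.Map Var Expr

data Decision
  = PreInline          -- PreInlineUnconditionally fired
  | PostInline Expr    -- PostInlineUnconditionally fired; carries E'
  | KeepBinding Expr   -- otherwise; carries E'

-- How processing the binding x = E affects (substitution, in-scope set):
bindSite :: (Subst, InScope) -> Var -> Expr -> Decision -> (Subst, InScope)
bindSite (sub, ins) x e PreInline       = (Map.insert x (SuspEx e sub) sub, ins)
bindSite (sub, ins) x _ (PostInline e') = (Map.insert x (DoneEx e') sub, ins)
bindSite (sub, ins) x _ (KeepBinding e')
  | not (x `Map.member` ins) = (sub, Map.insert x e' ins)
  | otherwise                = (Map.insert x (DoneEx x') sub, Map.insert x' e' ins)
  where
    x' = x ++ "1"  -- stands in for the fresh name of Section 3.3
```

The shadowing branch of KeepBinding is the one that earns its keep: it is exactly the clone-and-substitute step of Section 3.3.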

7 Occurrences

When the simplifier finds an occurrence of a variable, it first looks up the variable in the substitution (Section 7.1), and then decides whether to inline it (Section 7.2).

7.1 Looking up in the substitution

When the simplifier encounters the occurrence of a variable (the latter being an InVar), it must be looked up in the substitution:

    simplExpr sub ins (Var v) cont
      = case lookup sub v of
          Nothing           -> considerInline ins v cont
          Just (SuspEx e s) -> simplExpr s ins e cont
          Just (DoneEx e)   -> simplExpr empty ins e cont

The variable might not be in the substitution at all; for example, it might be a variable that did not need to be renamed. In that case, the next thing to do is to consider inlining it. The substitution can be discarded at this point, because the inlining (if any) is already an out-expression. Incidentally, notice that the variable we previously thought of as an InVar is now an OutVar. This is one reason that InVar and OutVar are simply synonyms for Var, rather than being truly distinct types.

If the substitution maps the variable to a SuspEx, then the simplifier is tail-called again, passing the captured substitution and the current in-scope set. The substitution and the in-scope set usually travel together, but here they do not. We must use the in-scope set from the occurrence site, because that describes what variables are in scope there, and the substitution from the definition site.
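The interplay of the two lookups can be exercised on a toy expression type. The sketch below is illustrative only: V, App, Lam, and the binder handling in the Lam case (real GHC clones binders as in Section 3; deleting from the substitution is a shortcut) are our own simplifications.

```haskell
-- Toy model of the Section 7.1 lookup; NOT GHC's code.
import qualified Data.Map as Map

type Var = String
data Expr = V Var | App Expr Expr | Lam Var Expr deriving (Eq, Show)

data SubstRhs = SuspEx Expr Subst | DoneEx Expr
type Subst = Map.Map Var SubstRhs

simplExpr :: Subst -> Expr -> Expr
simplExpr sub (V v)     = simplVar sub v
simplExpr sub (App f a) = App (simplExpr sub f) (simplExpr sub a)
simplExpr sub (Lam x e) = Lam x (simplExpr (Map.delete x sub) e)
                          -- shortcut: real GHC would clone x (Section 3)

simplVar :: Subst -> Var -> Expr
simplVar sub v = case Map.lookup v sub of
  Nothing           -> V v                    -- an OutVar: go on to consider inlining
  Just (SuspEx e s) -> simplExpr s e          -- resume with the *captured* substitution
  Just (DoneEx e)   -> simplExpr Map.empty e  -- crucially, the *empty* substitution
```

With f bound to (DoneEx (V "x")) and x bound to (DoneEx (V "x1")), looking up f yields V "x", not V "x1": the empty substitution in the DoneEx case is what prevents the double substitution discussed below.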

The third case is when the variable maps to (DoneEx e). In this case you might think we were done. But suppose e was a variable. Then we should consider inlining it, given the current context cont, which differs from that at the variable's definition site. What if e was a partial application of a function? Again, the context might now indicate that the function should be inlined. So the simple thing to do is simply to pass e to simplExpr again. But notice that we give it the empty substitution! Consider this example:

    \x -> let
            f = x
          in
            \x -> ...f..f...

When the binding for f is encountered, PostInlineUnconditionally will extend the substitution, binding f to (DoneEx x). When the \x is encountered, the substitution will again be extended to bind x to (DoneEx x1), because x is already in scope. Now, when we replace the occurrence of f by x, we must not apply the same substitution again, which would replace x by x1! The right thing to do is to continue with the empty substitution.

The code is simple enough, but it took us a long time before the interplay between the substitution and the in-scope set became as simple and elegant as it now is.

7.2 Inlining at an occurrence site

Once the simplifier has found a variable that is not in the substitution (and hence is an OutVar), we need to decide whether to inline it (CallSiteInline from Section 6). The first thing to do is to look up the variable in the in-scope set:

    considerInline ins v cont
      = case lookup ins v of
          Nothing -> error "Not in scope"
          Just (BoundTo rhs occ)
            | inline rhs occ cont -> simplExpr empty ins rhs cont
          Just other -> rebuild (Var v) cont

If the (dynamic) information is BoundTo, and the predicate inline says "yes, go ahead", we simply tail-call the simplifier, passing the in-scope set and the empty substitution, as in the DoneEx case of the substitution. In all other cases we give up on inlining. The function rebuild, which we do not discuss further here, simply combines the variable with its context.

The inline predicate is the interesting bit. It looks first at the variable's occurrence information:

    inline :: OutExpr -> OccInfo -> Context -> Bool
    inline rhs LoopBreaker cont = False
    inline rhs OnceSafe    cont = error "inline: OnceSafe"
    inline rhs MultiSafe   cont = inlineMulti rhs cont
    inline rhs OnceUnsafe  cont = whnfOrBot rhs && not (veryBoring cont)
    inline rhs MultiUnsafe cont = whnfOrBot rhs && inlineMulti rhs cont

The LoopBreaker case is obvious. The OnceSafe case should never happen, because PreInlineUnconditionally will have already inlined the binding.

The OnceUnsafe case uses the whnfOrBot predicate (Section 2.2) to ensure that inlining will not happen if there is any work duplication. However, as noted in Section 2.2, even if the variable occurs just once, it is not always a good idea to inline it. The veryBoring predicate has type

    veryBoring :: Context -> Bool

It examines the context, returning False if there is anything at all interesting about it, namely if and only if:

  - The variable is applied to one or more arguments.
  - The variable is the scrutinee of a case.

Notice that if a variable is the argument of a constructor, it is in a veryBoring context, and so it will not be inlined, thus maintaining the trivial-constructor-argument invariant (Section 2.2).
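Assuming toy stand-ins for whnfOrBot (Section 2.2) and inlineMulti (Section 7.3), the dispatch on occurrence information can be made runnable. Expr, the crude size metric, and the threshold of 5 are invented for illustration.

```haskell
-- Runnable toy of the Section 7.2 inline predicate; stand-ins throughout.
data Expr = Var String | Lit Int | Lam String Expr | App Expr Expr deriving Show
data OccInfo = LoopBreaker | OnceSafe | MultiSafe | OnceUnsafe | MultiUnsafe
data Context = Stop | AppCxt Context | CaseCxt Context

whnfOrBot :: Expr -> Bool        -- values duplicate no work when inlined
whnfOrBot (App _ _) = False
whnfOrBot _         = True

veryBoring :: Context -> Bool
veryBoring (AppCxt _)  = False   -- applied: a beta-redex may be exposed
veryBoring (CaseCxt _) = False   -- scrutinised: case-of-known-constructor may fire
veryBoring Stop        = True

size :: Expr -> Int              -- crude stand-in for Section 7.4's size metric
size (App f a) = 1 + size f + size a
size (Lam _ e) = 1 + size e
size _         = 1

inlineMulti :: Expr -> Context -> Bool   -- toy stand-in for Section 7.3
inlineMulti rhs _ = size rhs <= 5

inline :: Expr -> OccInfo -> Context -> Bool
inline _   LoopBreaker _    = False
inline _   OnceSafe    _    = error "inline: OnceSafe"  -- PreInlineUnconditionally got there first
inline rhs MultiSafe   cont = inlineMulti rhs cont
inline rhs OnceUnsafe  cont = whnfOrBot rhs && not (veryBoring cont)
inline rhs MultiUnsafe cont = whnfOrBot rhs && inlineMulti rhs cont
```

Note how a single-occurrence value is still refused in a Stop context, exactly the veryBoring behaviour described above.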

The MultiSafe and MultiUnsafe cases deal with the situation where there is more than one occurrence of the variable. Both make use of inlineMulti to do the bulk of the work; in addition, MultiUnsafe uses whnfOrBot to avoid work duplication.

Incidentally, since whnfOrBot rhs depends only on rhs, it is actually (lazily) cached in the BoundTo constructor rather than being re-calculated at each occurrence site.

7.3 Inlining multiple-occurrence variables

Now we are left with the case of inlining a variable that occurs many times.

    inlineMulti :: OutExpr -> Context -> Bool
    inlineMulti rhs cont
      | noSizeIncrease rhs cont = True
      | boring rhs cont         = False
      | otherwise               = smallEnough rhs cont

bution here, though unlike some prop osals smallEnough is

context-sensitive:

The third case of inlineMulti is the function that every

inliner has: is the function small enough to inline? The rst

smallEnough :: Expr -> Context -> Bool

two cases are less obvious. The second case deals with the

For the record, however, the algorithm is as follows. We

situations like this:

compute the size of the function body having rst split

let

o its formal parameters, namely the lamb das at the top.

f = \x -> E

From this size we subtract:

in

... let g = \y z -> f y, f z in ... ...

 The size of the call.

There is very little p oint in inlining f at these two sites,

 An argument discount for each argument extracted

b ecause we can guarantee that no new transformations b e-

from the context that a has dynamic information

yond those already p erformed on f itself  will b e enabled by

other than Unknown , and b is scrutinised bya case ,

doing so; the only saving is the call to f , and there is a co de

or applied to an argument, in the function b o dy.

duplication cost to pay. Howdowe know that no transfor-

 A result discount if the context is not boring and

mations will b e enabled? Because: a the arguments y and

the function body returns an explicit constructor or

z are lamb da-b ound and hence uninformative; and b the

lamb da.

result of b oth calls are simply stored in a data structure.

The predicate boring takes an expression (the one we are considering inlining) and a context in which it would be inlined.

    boring :: Expr -> Context -> Bool

Corresponding to our example above, boring returns True if both

(a) All the arguments to which the function is applied are types, or variables that have dynamic information of Unknown; and

(b) After consuming enough arguments from the context to satisfy the lambdas at the top of the function, the remaining context is veryBoring.

Even if the context is boring, however, it is still worthwhile inlining the function if the result of doing so is no bigger than the call [App92]. That is what the predicate noSizeIncrease tests. Again, one might expect this case to be rare, but it isn't. For example, Haskell data constructors are curried functions, but in GHC's intermediate language constructor applications are saturated (Section 2). We bridge this gap by producing a function definition for each constructor, such as:

    cons = \x xs -> Cons {x,xs}

where the Cons {x,xs} is the saturated constructor application. In reality there are a few type abstractions and applications too, but the idea is the same. These definitions also make a convenient place to perform argument evaluation (and perhaps unboxing) for strict constructors. For the simple definitions, such as cons, it is clearly better to inline the definition, even if the context is boring.

Notice that the first case is required even though smallEnough is sure to return True if noSizeIncrease does. Why? Because otherwise the second case might decide that the context is boring and decline to inline.
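The ordering argument can be seen in a tiny executable model; the predicates below are invented stand-ins for the real ones, keyed on a bare size rather than an expression.

```haskell
-- Toy model of the three-way test in inlineMulti (Section 7.3); stand-ins only.
data Context = Boring | Interesting deriving Eq

noSizeIncrease, boring, smallEnough :: Int -> Context -> Bool
noSizeIncrease sz _ = sz <= 1          -- inlined body no bigger than the call itself
boring _ cont       = cont == Boring
smallEnough sz _    = sz <= 10

inlineMulti :: Int -> Context -> Bool
inlineMulti sz cont
  | noSizeIncrease sz cont = True      -- must come first: rescues tiny bodies in boring contexts
  | boring sz cont         = False
  | otherwise              = smallEnough sz cont
```

Swap the first two guards and a constructor-wrapper-sized body in a boring context would be refused, which is exactly the failure mode the paragraph above describes.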

7.4 Size matters

We have now finally arrived at the smallEnough predicate, the main aspect of this paper for which there is a reasonable (albeit small) literature. We do not claim any new contribution here, though (unlike some proposals) smallEnough is context-sensitive:

    smallEnough :: Expr -> Context -> Bool

For the record, however, the algorithm is as follows. We compute the size of the function body, having first split off its formal parameters, namely the lambdas at the top. From this size we subtract:

  - The size of the call.
  - An argument discount for each argument (extracted from the context) that (a) has dynamic information other than Unknown, and (b) is scrutinised by a case, or applied to an argument, in the function body.
  - A result discount if the context is not boring and the function body returns an explicit constructor or lambda.

If the result of this computation is smaller than the inline threshold then we inline the function. The argument discount, result discount, and inline threshold are all settable from the command line. Santos gives more details of GHC's heuristics [San95, Section 6.3].
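As a back-of-envelope sketch of this computation (every number below, the discounts, the threshold and the call size, is invented; GHC's real values are command-line flags):

```haskell
-- Sketch of the Section 7.4 arithmetic; all constants are illustrative.
data ArgInfo = Unknown | Known   -- what the context tells us about an argument

smallEnough :: Int                 -- size of the body, lambdas stripped
            -> [(ArgInfo, Bool)]   -- per argument: (info, used in a case/application?)
            -> Bool                -- body returns a constructor/lambda in a non-boring context?
            -> Bool
smallEnough bodySize args resultDiscountApplies =
    bodySize - callSize - argDiscounts - resultDiscount < threshold
  where
    callSize       = 1 + length args                         -- size of the call we replace
    argDiscounts   = sum [ argDiscount | (Known, True) <- args ]
    resultDiscount = if resultDiscountApplies then 2 else 0  -- invented value
    argDiscount    = 2                                       -- invented value
    threshold      = 8                                       -- invented value
```

The point of the shape, rather than the constants, is that informative, scrutinised arguments make an otherwise too-big function acceptable.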

7.5 The context

It should by now be clear that the context of an expression plays a key role in inlining decisions. For a long time we passed in a variety of ad hoc flags indicating various things about the context, but we have now evolved a much more satisfactory story. The context is a little like a continuation, in that it indicates how the result of the expression is consumed. But this continuation must not be represented as a function, because we must be able to ask questions of it, as the earlier sub-sections indicate.

So GHC's contexts are defined by the following data type:

    data Context
      = Stop
      | AppCxt  InExpr Subst Context
      | CaseCxt InVar [InAlt] Subst Context
      | ArgCxt  (OutExpr -> OutExpr)
      | InlineCxt Context

The Stop context is used when beginning simplification of a lazy function argument, or the right hand side of a let binding. The AppCxt context indicates that the expression under consideration is to be applied to an argument. The argument is as yet un-simplified, and must be paired with its substitution. Similarly, the CaseCxt context is used when simplifying the scrutinee of a case expression.

simplExpr simply recurses into the expression, building a context "stack" as it goes. Here, for example, is what simplExpr does for App and Case nodes:

    simplExpr sub ins (App f a) cont
      = simplExpr sub ins f (AppCxt a sub cont)
    simplExpr sub ins (Case e b alts) cont
      = simplExpr sub ins e (CaseCxt b alts sub cont)

[Figure 4: Effect of inlining threshold. Geometric-mean allocation (%, 50-110) plotted against binary size (%, 75-110), one point per inline-threshold setting (1, 2, 4, 8, 12, 40).]
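The recursion above amounts to a left-spine walk that accumulates frames. A self-contained toy version, with Expr and Context cut down to the two interesting frames (names and shapes are ours, not GHC's):

```haskell
-- Toy "context stack" construction in the style of Section 7.5.
data Expr    = Var String | App Expr Expr | Case Expr [(String, Expr)]
               deriving (Eq, Show)
data Context = Stop | AppCxt Expr Context | CaseCxt [(String, Expr)] Context
               deriving (Eq, Show)

-- Descend to the head of an expression, pushing a frame per consumer.
toHead :: Expr -> Context -> (Expr, Context)
toHead (App f a)     cont = toHead f (AppCxt a cont)
toHead (Case e alts) cont = toHead e (CaseCxt alts cont)
toHead e             cont = (e, cont)
```

For f x y the head is f and the context records both pending arguments, innermost first; this is precisely the data structure the inline predicate interrogates.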

We have already seen how useful it is to know the context of a variable occurrence. The context also makes it easy to perform other transformations, such as the case-of-known-constructor transformation:

    case (a,b) of { (p,q) -> E }
      ==>
    let {p=a; q=b} in E

simplExpr just matches a constructor application with a CaseCxt continuation.
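A toy version of that match, assuming a cut-down Expr with saturated constructor applications (Con) and multi-binder alternatives; the names are ours, not GHC's:

```haskell
-- Sketch of case-of-known-constructor via a CaseCxt frame (Section 7.5).
data Expr = Var String | Con String [Expr]
          | Let [(String, Expr)] Expr
          deriving (Eq, Show)
data Context = Stop | CaseCxt [(String, [String], Expr)] Context
               deriving (Eq, Show)

-- When a saturated constructor application meets a CaseCxt frame, rewrite
--   case C a b of { C p q -> E }   ==>   let {p = a; q = b} in E
knownCon :: Expr -> Context -> Maybe (Expr, Context)
knownCon (Con c args) (CaseCxt alts cont) =
  case [ (bs, rhs) | (c', bs, rhs) <- alts, c' == c ] of
    (bs, rhs) : _ -> Just (Let (zip bs args) rhs, cont)
    []            -> Nothing
knownCon _ _ = Nothing
```

The returned pair is the rewritten expression together with the remaining context, ready for further simplification.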

The next case, ArgCxt, is used when simplifying the argument of a strict function or primitive operator. Here, a genuine, functional continuation is used, because no more needs to be known about the continuation.

The InlineCxt context is discussed in the next subsection.

In practice, GHC's simplifier has another couple of constructors in the Context data type, but they are more peripheral so we do not discuss them here.

7.6 INLINE pragmas

Like some other languages, GHC allows the programmer to specify that a function should be inlined at all its occurrences, as a pragma in the Haskell source language:

    {-# INLINE f #-}
    f x = ...

GHC also allows the Haskell programmer to ask the compiler to inline a function at a particular call site, thus:

    ...(inline f a b)...

The function inline has type ∀a. a -> a, and is semantically the identity function. Operationally, though, it asks that f be inlined at this call site. Such per-occurrence inline pragmas are less commonly offered by compilers [Bak92].
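This pseudo-function survives in today's GHC, exported as inline from GHC.Exts; a minimal usage sketch, assuming a reasonably modern GHC:

```haskell
import GHC.Exts (inline)

-- Semantically the identity; operationally a per-call-site inlining request.
double :: Int -> Int
double x = x + x

callSite :: Int -> Int
callSite n = inline double n   -- ask for double to be inlined here only
```

Because inline is the identity, the program's meaning is unchanged whether or not the request is honoured.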

Both these pragmas are translated to constructors in the Note data type, which itself can be attached to an expression (Section 2):

    data Note = ...
      | InlineMe      -- {-# INLINE #-}
      | InlinePlease  -- inline

If they are so similar in the Core language, why do they appear so different in Haskell? Haskell allows functions to be defined by pattern-matching, using multiple equations, so there is no convenient syntactic place to ask for f to be inlined everywhere. At an occurrence site, however, it is natural just to use a pseudo-function.

The effects of InlineMe and InlinePlease are as follows:

  - The effect of InlineMe is to make the enclosed expression look very small, which in turn makes the smallEnough predicate reply True. When simplExpr finds an InlineMe in a non-boring context, it drops the InlineMe, because its work is done.

  - The effect of InlinePlease is to push an InlineCxt onto the context stack. The smallEnough predicate returns True if it finds such a context, regardless of the size of the expression.

There is an important subtlety, however. Consider

    g = \a b -> ...big...

    {-# INLINE f #-}
    f = \x -> g x y

and suppose that this is the only occurrence of g. Should we inline g in f's right hand side? By no means! The programmer is asking that f be replicated, but not g! The right thing to do is to switch off all inlining when processing the body of an InlineMe; when f is inlined, then (and only then) g will get its chance.

7.7 Measurements

As mentioned in Section 7.4, our implementation makes use of an "inline threshold" to determine whether a given expression is small enough to inline. Figure 4 shows the effect of varying this threshold on the (geometric mean of) binary size and allocation. We use allocation instead of run-time because allocation is easy to measure repeatably, and is a somewhat reliable proxy for run-time, with the notable exception of some very small programs.

The actual values for the threshold are fairly arbitrary, and are affected by some of the other parameters: discounts for evaluated arguments and so on. What is more interesting is the shape of the graph. As expected, beyond a certain

point, binary sizes increase without having any dramatic effect on the efficiency of the program. The graph also shows that setting the threshold too low (i.e. less than 2) has a dramatic effect on both binary size and run-time. Essentially very little call-site inlining is being performed below this threshold, and even less inter-module inlining is happening (because this is covered by call-site inlining only; we can't see the binding).

The jump between threshold values 1 and 2 is caused by the fact that even functions marked {-# INLINE #-} are not inlined at a threshold of 1. The "wrapper" functions generated by strictness analysis are of this form, and if these wrappers are not inlined performance drops dramatically.

Making measurements is very instructive: we were surprised by the rather small performance increases as the threshold is increased beyond 2, and plan to investigate this further.

8 Related work

There is a modest literature on inlining applied to imperative programming languages, such as C and FORTRAN; some recent examples are [DH92, CMCH92, CHT91, CHT92]. In these works the focus is exclusively on procedures defined at the top level. The benefits are found to be fairly modest (in the 10-20% range), but the cost in terms of code bloat is also very modest. Considerable attention is paid to the effect on register allocation of larger basic blocks, which we do not consider at all.

It seems self-evident that the benefits of inlining are strongly related to both language and programming style. Functional languages encourage the use of abstractions, so the benefits of inlining are likely to be greater. Indeed, Appel reports benefits in the range 15-25% for the Standard ML of New Jersey compiler [App92], while Santos reports average benefits of around 40% for Haskell programs [San95]. Chambers reports truly dramatic factors of 4 to 55 for his SELF compiler [Cha92]; SELF takes abstraction very seriously indeed!

The most detailed and immediately-relevant work we have found is for two Scheme compilers. Waddell and Dybvig report performance improvements of 10-100% in the Chez Scheme compiler [WD97], while Serrano found a more modest 15% benefit for the Bigloo Scheme compiler [Ser95, Ser97]. Both use a dynamic, effort/size budget scheme to control termination. Waddell and Dybvig's inliner uses an explicitly-encoded context parameter that plays exactly the role of our Context (Section 7.5).

A completely different approach to the inlining problem is discussed by [AJ97]. In this paper the focus is on inlining functions that are called precisely once, something that we have been very concerned with. Appel and Jim show that this transformation, along with a handful of others (including dead-code elimination), are normalising and confluent, a very desirable property. Their focus is then on finding an efficient algorithm for applying the transformations exhaustively. Their solution involves adjusting the results of the occurrence analysis phase as transformations proceed. Their initial algorithm has worst-case quadratic complexity, but they also propose a more subtle (and unimplemented) linear-time variant. We too are concerned about efficient application of transformation rules, but our set of transformations is much larger, and includes general inlining, so their results are not directly applicable to our setting. Nevertheless, it is a unique and inspiring approach.

Copious measurements of many transformations in GHC (not only inlining) can be found in Santos's thesis [San95]; although these measurements are now several years old, we believe that the general outlines are unlikely to have changed dramatically. [PJS98] contains briefer, but more up-to-date, measurements.

9 Conclusion

This paper has told a long story. Inlining seems a relatively simple idea, but in practice it is complicated to do a good job. The main contribution of the paper is to set down, in sometimes-gory detail, the lessons that we have learned over nearly a decade of tuning our inliner. Everyone who tries to build a transformation-based compiler has to grapple with these issues but, because they are not crisp or sexy, there is almost no literature on the subject. This paper is a modest attempt to address that lack.

Acknowledgements

We warmly thank Nick Benton, Oege de Moor, Andrew Kennedy, John Matthews, Sven Panne, Alastair Reid, Julian Seward, and the four IDL Workshop referees, for comments on drafts of this paper. Special thanks are due to Manuel Chakravarty, Manuel Serrano, Oscar Waddell, and Norman Ramsey, for their particularly detailed and thoughtful remarks.

References

[AJ97] AW Appel and T Jim. Shrinking lambda-expressions in linear time. Journal of Functional Programming, 7(5):515-541, September 1997.

[App92] AW Appel. Compiling with continuations. Cambridge University Press, 1992.

[App94] AW Appel. Loop headers in lambda-calculus or CPS. Lisp and Symbolic Computation, 7:337-343, 1994.

[ARS94] L Augustsson, M Rittri, and D Synek. On generating unique names. Journal of Functional Programming, 4(1):117-123, January 1994.

[Bak92] HG Baker. Inlining semantics for subroutines which are recursive. ACM Sigplan Notices, 27(12):39-49, December 1992.

[Bar85] HP Barendregt. The lambda calculus: its syntax and semantics. Number 103 in Studies in Logic. North Holland, 1985.

[BKR98] Nick Benton, Andrew Kennedy, and George Russell. Compiling Standard ML to Java bytecodes. In ICFP98 [ICF98], pages 129-140.

[Cha92] C Chambers. The Design and Implementation of the SELF Compiler, an Optimizing Compiler for Object-Oriented Programming Languages. Technical report STAN-CS-92-1240, Stanford University, Department of Computer Science, March 1992.

[CHT91] KD Cooper, MW Hall, and L Torczon. An experiment with inline substitution. Software Practice and Experience, 21(6):581-601, June 1991.

[CHT92] K Cooper, M Hall, and L Torczon. Unexpected side effects of inline substitution: a case study. ACM Letters on Programming Languages and Systems, 1(1):22-31, 1992.

[CMCH92] PP Chang, SA Mahlke, WY Chen, and W-M Hwu. Profile-guided automatic inline expansion for C programs. Software Practice and Experience, 22(5):349-369, May 1992.

[dB80] N de Bruijn. A survey of the project AUTOMATH. In JP Seldin and JR Hindley, editors, To HB Curry: essays on combinatory logic, lambda calculus, and formalism, pages 579-606. Academic Press, 1980.

[DH92] JW Davidson and AM Holler. Subprogram inlining: a study of its effects on program execution time. IEEE Transactions on Software Engineering, 18(2):89-102, February 1992.

[DS97] O Danvy and UP Schultz. Lambda-dropping: transforming recursive equations into programs with block structure. In ACM SIGPLAN Symposium on Partial Evaluation and Semantics-Based Program Manipulation (PEPM '97), volume 32 of SIGPLAN Notices, pages 90-106, Amsterdam, June 1997. ACM.

[FW86] PJ Fleming and JJ Wallace. How not to lie with statistics: the correct way to summarise benchmark results. CACM, 29(3):218-221, March 1986.

[How96] BT Howard. Inductive, co-inductive, and pointed types. In ICFP96 [ICF96].

[ICF96] ACM SIGPLAN International Conference on Functional Programming (ICFP'96), Philadelphia, May 1996. ACM.

[ICF98] ACM SIGPLAN International Conference on Functional Programming (ICFP'98), Baltimore, September 1998. ACM.

[Par92] WD Partain. The nofib benchmark suite of Haskell programs. In J Launchbury and PM Sansom, editors, Functional Programming, Glasgow 1992, Workshops in Computing, pages 195-202. Springer Verlag, 1992.

[Pey87] SL Peyton Jones. The Implementation of Functional Programming Languages. Prentice Hall, 1987.

[PJS98] SL Peyton Jones and A Santos. A transformation-based optimiser for Haskell. Science of Computer Programming, 32(1-3):3-47, September 1998.

[PP93] SL Peyton Jones and WD Partain. Measuring the effectiveness of a simple strictness analyser. In K Hammond and JT O'Donnell, editors, Functional Programming, Glasgow 1993, Workshops in Computing, pages 201-220. Springer Verlag, 1993.

[PPS96] SL Peyton Jones, WD Partain, and A Santos. Let-floating: moving bindings to give faster programs. In ICFP96 [ICF96].

[San95] A Santos. Compilation by transformation in non-strict functional languages. PhD thesis, Department of Computing Science, Glasgow University, September 1995.

[Ser95] M Serrano. A fresh look to inlining decision. In 4th International Computer Symposium (ICS'95), Mexico City, Mexico, November 1995.

[Ser97] M Serrano. Inline expansion: when and how? In International Symposium on Programming Languages, Implementations, Logics, and Programs (PLILP'97), September 1997.

[SLM98] Z Shao, C League, and S Monnier. Implementing typed intermediate languages. In ICFP98 [ICF98], pages 313-323.

[WD97] O Waddell and RK Dybvig. Fast and effective procedure inlining. In 4th Static Analysis Symposium, number 1302 in Lecture Notes in Computer Science, pages 35-52. Springer Verlag, September 1997.

[WP99] K Wansbrough and SL Peyton Jones. Once upon a polymorphic type. In 26th ACM Symposium on Principles of Programming Languages (POPL'99), pages 15-28, San Antonio, January 1999. ACM.