Secrets of the Glasgow Haskell Compiler inliner

Simon Marlow                Simon Peyton Jones
Microsoft Research Ltd, Cambridge
[email protected]    [email protected]

September 1, 1999
Abstract

Higher-order languages, such as Haskell, encourage the programmer to build abstractions by composing functions. A good compiler must inline many of these calls to recover an efficiently executable program.

In principle, inlining is dead simple: just replace the call of a function by an instance of its body. But any compiler-writer will tell you that inlining is a black art, full of delicate compromises that work together to give good performance without unnecessary code bloat.

The purpose of this paper is, therefore, to articulate the key lessons we learned from a full-scale "production" inliner, the one used in the Glasgow Haskell compiler. We focus mainly on the algorithmic aspects, but we also provide some indicative measurements to substantiate the importance of various aspects of the inliner.

1 Introduction

One of the trickiest aspects of a compiler for a functional language is the handling of inlining. In a functional-language compiler, inlining subsumes several other optimisations that are traditionally treated separately, such as copy propagation and jump elimination. As a result, effective inlining is particularly crucial in getting good performance.

The Glasgow Haskell Compiler (GHC) is an optimising compiler for Haskell that has evolved over a period of about ten years. We have repeatedly been through a cycle of looking at the code it produces, identifying what could be improved, and going back to the compiler to make it produce better code. It is our experience that the inliner is a lead player in many of these improvements. No other single aspect of the compiler has received so much attention.

The purpose of this paper is to report on several algorithmic aspects of GHC's inliner, focusing on aspects that were not obvious to us; that is to say, aspects that we got wrong to begin with. Most papers about inlining focus on how to choose whether or not to inline a function called from many places. This is indeed an important question, but we have found that we had to deal with quite a few other less obvious, but equally interesting, issues. Specifically, we describe the following:

- A major issue for any compiler, especially for one that inlines heavily, is name capture. Our initial brute-force solution involved inconvenient plumbing, and we have now evolved a simple and effective alternative (Section 3).

- At first we were very conservative about inlining recursive definitions; that is, we did not inline them at all. But we found that this strategy occasionally behaved very badly. After a series of failed hacks we developed a simple, obviously-correct algorithm that does the job beautifully (Section 4).

- Because the compiler does so much inlining, it is important to get as much as possible done in each pass over the program. Yet one must steer a careful path between doing too little work in each pass, requiring extra passes, and doing too much work, leading to an exponential-cost algorithm. GHC now identifies three distinct moments at which an inlining decision may be taken for a particular definition. We explain why in Section 6.

- When inlining an expression it is important to retain the expression's lexical environment, which gives the bindings of its free variables. But at the inline site, the compiler might know more about the dynamic state of some of those free variables; most notably, a free variable might be known to be evaluated at the inline site, but not at its original occurrence. Some key transformations make use of this extra information, and lacking it will cause an extra pass over the code. We describe how to exploit our name-capture solution to support accurate tracking of both lexical and dynamic environments (Section 7).

None of the algorithms we describe is individually very surprising. Perhaps because of this, the literature on the subject is very sparse, and we are not aware of published descriptions of any of our algorithms. Our contribution is to abstract some of what we have learned, in the hope that we may help others avoid the mistakes that we made.

For the sake of concreteness we focus throughout on GHC, but we stress that the lessons we learned are applicable to any compiler for a functional language, and indeed perhaps to compilers for other languages too.
2 Preliminaries

We will assume the use of a pure, non-strict, strongly-typed intermediate language, called the GHC Core language. GHC is itself written in Haskell, so we define the Core language by giving its data type definition in Haskell:

  type Program = [Bind]

  data Bind = NonRec Var Expr
            | Rec [(Var, Expr)]

  data Expr = Var   Var
            | App   Expr Expr
            | Lam   Var Expr
            | Let   Bind Expr
            | Const Const [Expr]
            | Case  Expr Var [Alt]
            | Note  Note Expr

  type Alt = (Const, [Var], Expr)   -- Case alternative

  data Const     -- Constant
    = Literal Literal
    | DataCon DataCon
    | PrimOp  PrimOp
    | DEFAULT

The Core language consists of the lambda calculus augmented with let-expressions (both non-recursive and recursive), case expressions, data constructors, literals, and primitive operations. In presenting examples we will use an informal, albeit hopefully clear, concrete syntax. We will feel free to use infix operators, and to write several bindings in a single non-recursive let-expression as shorthand for a sequence of let-expressions.

A program (Program) is simply a sequence of bindings, in dependency order. Each binding (Bind) can be recursive or non-recursive, and the right hand side of each binding is an expression (Expr). The constructors for variables (Var), application (App), lambda abstraction (Lam), and let-expressions (Let) should be self-explanatory. A constant application (Const) is used for literals, data constructor applications, and applications of primitive operators; the number of arguments must match the arity of the constant, and the constant cannot be DEFAULT. Likewise, the number of bound variables in a case alternative (Alt) always matches the arity of the constant; and the latter cannot be a PrimOp. The Note form of Expr allows annotations to be attached to the tree. The only impact on the inliner is discussed in Section 7.6.

Case expressions (Case) should be self-explanatory, except for the Var argument to Case. Consider the following Core expression:

  case reverse xs of ys {
    (a:as) -> ys
    []     -> error "urk"
  }

The unusual part of this construct is the binding occurrence of "ys", immediately after the "of". The semantics is that ys is bound to the result of evaluating the scrutinee, reverse xs in this case, which makes it possible to refer to this value in the alternatives. This detail has no impact on the rest of this paper (indeed, we omit the extra binder in our examples), but we have found that it makes several transformations more simple and uniform, so we include it here for the sake of completeness.

GHC's actual intermediate language is very slightly more complicated than that given here. It is an explicitly-typed language based on System Fω, and supports polymorphism through explicit type abstraction and application. It turns out that doing so adds only one new constructor to the Expr type, and adds nothing to the substance of this paper, so we do not mention it further. The main point is that this paper omits no aspect essential to a full-scale implementation of Haskell.

2.1 What is inlining?

Given a definition x = E, one can inline x at a particular occurrence by replacing the occurrence by E. We use upper case letters, such as "E", to stand for arbitrary expressions, and "==>" to indicate a program transformation. For example:

  let { f = \x -> x*3 } in f (a + b) - c
  ==>
  (a+b)*3 - c

We have found it useful to identify three distinct transformations that collectively implement what we informally describe as "inlining":

- Inlining itself replaces an occurrence of a let-bound variable by a copy of the right-hand side of its definition. Inlining f in the example above goes like this:

    let { f = \x -> x*3 } in f (a + b) - c
    ==> [inline f]
    let { f = \x -> x*3 } in (\x -> x*3) (a + b) - c

  Notice that not all the occurrences of f need be inlined, and hence that the original definition of f must, in general, be retained.

- Dead code elimination discards bindings that are no longer used; this usually occurs when all occurrences of a variable have been inlined. Continuing our example gives:

    let { f = \x -> x*3 } in (\x -> x*3) (a + b) - c
    ==> [dead f]
    (\x -> x*3) (a + b) - c

- β-reduction simply rewrites a lambda application (\x -> E) A to let { x = A } in E. Applying β-reduction to our running example gives:

    (\x -> x*3) (a + b) - c
    ==> [beta]
    let { x = a+b } in x*3 - c

The first of these is the tricky one; the latter two are easy. In particular, β-reduction simply creates a let binding.
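These three transformations can be played out mechanically on a cut-down expression type. The sketch below is illustrative only: a hypothetical miniature, not GHC's actual Expr or simplifier, and its inline pass assumes binders are never shadowed (the no-shadowing discipline discussed in Section 3).

```haskell
-- A miniature expression type, just enough to demonstrate the three
-- transformations of Section 2.1.  Hypothetical; not GHC's Core.
data Expr
  = Var String
  | Lit Int
  | App Expr Expr
  | Lam String Expr
  | Let String Expr Expr        -- non-recursive let only
  | Op  String Expr Expr        -- infix primitive, e.g. Op "*" e (Lit 3)
  deriving (Eq, Show)

-- Inlining: replace occurrences of a let-bound variable by its RHS.
-- (Naive: assumes no binder in e shadows x.)
inline :: String -> Expr -> Expr -> Expr
inline x rhs = go
  where
    go (Var y) | y == x   = rhs
    go (App f a)          = App (go f) (go a)
    go (Lam y b) | y /= x = Lam y (go b)
    go (Let y r b)        = Let y (go r) (if y == x then b else go b)
    go (Op o a b)         = Op o (go a) (go b)
    go e                  = e

-- Dead code elimination: drop a let whose binder no longer occurs.
dead :: Expr -> Expr
dead (Let x _ b) | not (occurs x b) = b
dead e = e

occurs :: String -> Expr -> Bool
occurs x (Var y)     = x == y
occurs x (App f a)   = occurs x f || occurs x a
occurs x (Lam y b)   = y /= x && occurs x b
occurs x (Let y r b) = occurs x r || (y /= x && occurs x b)
occurs x (Op _ a b)  = occurs x a || occurs x b
occurs _ _           = False

-- Beta-reduction: (\x -> E) A  ==>  let { x = A } in E
beta :: Expr -> Expr
beta (App (Lam x e) a) = Let x a e
beta e = e
```

Note that beta produces a let rather than substituting A into E directly, exactly as in the paper: the let keeps work-duplication decisions separate from beta-reduction itself.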
In a lazy, purely functional language, inlining and dead-code elimination are both unconditionally valid, or meaning-preserving. Neither is valid, in general, in a language permitting side effects, such as Standard ML or Scheme. In particular, notice that inlining is valid regardless of

- the number of occurrences of x,

- whether or not the binding for x is recursive,

- whether or not E has free variables (that is, inlining of nested definitions is perfectly fine), and

- the syntactic form of E (notably, whether or not it is a lambda abstraction).

Concerning the last of these items, notice that we unconventionally use the term "inline" equally for both functions and non-functions. Continuing the example, x can now be inlined, and then dropped as dead code, thus:

  let { x = a+b } in x*3 - c
  ==> [inline x]
  let { x = a+b } in (a+b)*3 - c
  ==> [dead x]
  (a+b)*3 - c

In this case, x is used exactly once, but we sometimes also inline non-functions that are used several times. Consider:

  let x = (a,b)
  in ...x...(case x of { (p,q) -> p+1 })...

By inlining x we can then eliminate the case to give

  let x = (a,b)
  in ...x...(a+1)...

In a similar way, when given bindings such as x = y, inlining subsumes copy propagation.

2.2 Factors affecting inlining

To say that inlining is valid does not mean that it is desirable. Inlining might increase code size, or duplicate work, so we need to be careful about when to do it. There are three distinct factors to consider:

- Does any code get duplicated, and if so, how much? For example, consider

    let f = \v -> ...big... in (f 3, f 4)

  where "...big..." is a large expression. Then inlining f would not duplicate any work (f will still be called twice), but it will duplicate the code for f's body. Bloated programs are bad (increased compilation time, lower cache hit rates), but inlining can often reduce code size by exposing new opportunities for transformations. GHC uses a number of heuristics to determine whether an expression is small enough to duplicate.

- Does any work get duplicated, and if so, how much? For example, consider

    let x = foo 1000 in x+x

  where foo is expensive to compute. Inlining x would result in two calls to foo instead of one.

  Work can be duplicated even if x only appears once:

    let x = foo 1000
        f = \y -> x * y
    in ...(f 3)...(f 4)...

  If we inline x at its single occurrence site, foo will be called every time f is. The general rule is that we must be careful when inlining inside a lambda.

  It is not hard to come up with examples where a single inlining that duplicates work gives rise to an arbitrarily large increase in run time. GHC is therefore very conservative about work duplication. In general, GHC never duplicates work unless it is sure that the duplication is a small, bounded amount.

- Are any transformations exposed by inlining? For example, consider the bindings:

    f = \x -> E
    g = \ys -> map f ys

  Suppose we were to inline f inside g, thus:

    g = \ys -> map (\x -> E) ys

  No code is duplicated by doing so, but a small bounded amount of work is duplicated, because the closure for (\x -> E) would have to be allocated each time g was called. It is often worth putting up with this work duplication, because inlining f exposes new transformation opportunities at the inlining site. But in this case, nothing at all would be gained by inlining f, because f is not applied to anything.

These considerations imply that inlining is not an optimisation "by itself". The direct effects of careful inlining are small: it may duplicate code or a constant amount of work, and usually saves a call or jump, albeit not invariably (see the example in the last bullet above). It is the indirect effects that we are really after: the main reason for inlining is that it often exposes new transformations, by bringing together two code fragments that were previously separate. Thus, in general, inlining decisions must be influenced by context.

2.3 Work duplication

If x is inlined in more than one place, or inlined inside a lambda, we have to worry about work duplication. When will such work duplication be bounded? Answer: at least in the cases when x's right hand side is:

- A variable.

- A constructor application.

- A lambda abstraction.
- An expression that is sure to diverge.

Constructor applications require careful treatment. Consider:

  x = (f y, g y)
  h = \z -> case x of
              (a,b) -> ...

It would plainly be a disaster, in general, to inline x inside the body of h, since that would duplicate the calls to f and g. Yet we want to inline x so that it can cancel with the case. GHC therefore maintains the invariant that every constructor application has only arguments that can be duplicated with no cost: variables, literals, and type applications. We call such arguments trivial expressions, so the invariant is called the trivial-constructor-argument invariant. Once established, this invariant is easy to maintain (see Section 7.2).

The last case, that of divergent computations, is more surprising, but it is useful in practice. Consider:

  sump = \xs ->
    let
      fail = error ("sump " ++ show xs)
    in letrec
      go = \xs -> case xs of
                    []     -> 0
                    (x:xs) -> if x < 0 then fail
                              else x + go xs
    in go xs

Here error is the standard Haskell function that prints an error message and brings execution to a halt. Semantically, its value is just ⊥, the divergent value. In this example, sump adds up the elements of a list, but reports an error if any element is negative. As it stands, a closure for fail will be allocated every time sump is called. It is perfectly OK to inline fail, because if fail is ever called, execution is going to halt anyway, so there is no work-duplication issue. If we do that, no closure is allocated; instead, error is called directly if an element turns out to be less than zero.

GHC has a predicate whnfOrBot that identifies expressions that are in WHNF or are certainly divergent:

  whnfOrBot :: Expr -> Bool

One could easily imagine extending whnfOrBot to cover cases where a small amount of work (other than allocation) is duplicated, such as a few machine instructions.

3 Name capture

It is well known that any transformation-based compiler must be concerned about name capture [Bar85]. Consider, for example:

  let x = a+b in
  let a = 7 in
  x+a

It is obviously quite wrong to inline x to give:

  let a = 7 in
  (a+b) + a

because the a that was free in x's right hand side has been captured by the let binding for a.

3.1 The sledge hammer

Earlier versions of GHC used a sledge-hammer approach to avoid the name-capture problem: during inlining, GHC would simply rename, or clone, every single bound variable, to give:

  let s796 = 7
  in (a+b) + s796

This renaming made use of a supply of fresh names that, in this example, has (arbitrarily) renamed a to s796. This approach suffers from two disadvantages:

- It allocates far more fresh names than are actually necessary, and there is sure to be a compile-time performance cost to this.

- Plumbing the supply of fresh names to the places those names are required is sometimes very painful.

Why is there a compile-time performance cost to the sledge-hammer approach? Because a variable is a structure containing a name; to rename the variable we must copy the structure, inserting the new name. The substitution mapping old names to new names becomes larger. Finally, if the substitution is empty we can sometimes avoid looking at an expression or type at all; but if all names are cloned the substitution is never empty.

If the compiler were written in an impure language, fresh names could be allocated by side effect, but GHC is written in Haskell, which does not have side effects. Using the trees of [ARS94] is the best solution we know of, but it still involves plumbing a tree of fresh names everywhere they might be needed. Worse, the fresh names usually aren't needed, but the tree is nevertheless built. This is deeply irritating: loads of allocation for no purpose whatsoever. Finally, even if we were not worried about performance, it is sometimes extremely painful to get the name supply to where it is needed. For example, in a typed intermediate language it should be possible to have a function

  exprType :: Expr -> Type

that figures out the type of an expression. But suppose the expression is something like:

  filter Int pred xs

The function filter has the polymorphic type

  filter :: forall a. (a -> Bool) -> [a] -> [a]

So to figure out the type of the subexpression (filter Int) we must instantiate filter's type, substituting Int for a. Oh no! Substitution! That can, in general, give rise to name capture. So we need to feed a name supply to exprType:

  exprType :: NameSupply -> Expr -> Type

This "solution" is deeply unattractive, and the situation is only different in its cosmetics if the name supply is hidden in a monad. Something better is required.
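The plumbing burden can be made concrete with a small sketch. The NameSupply representation below, an infinite list of uniques, is a hypothetical stand-in rather than GHC's actual type; the point is only that every site that might rename must receive, split, and consume a supply.

```haskell
-- Sketch of explicit name-supply plumbing (hypothetical types,
-- not GHC's actual definitions).
type NameSupply = [Int]          -- an infinite stream of fresh uniques

-- Hand half of the supply to each of two sub-computations.
splitSupply :: NameSupply -> (NameSupply, NameSupply)
splitSupply ns = ( [n | (n, i) <- zip ns [0 :: Int ..], even i]
                 , [n | (n, i) <- zip ns [0 :: Int ..], odd  i] )

-- Take one fresh unique, returning the depleted supply.
freshName :: NameSupply -> (Int, NameSupply)
freshName (n:ns) = (n, ns)
freshName []     = error "NameSupply exhausted"

-- The consequence described above: even a type-of-expression
-- function must be fed a supply, because instantiating a
-- polymorphic type performs substitution:
--
--   exprType :: NameSupply -> Expr -> Type
```

Every caller of freshName must remember to use the returned supply, and every fork in the computation must call splitSupply; this is exactly the "inconvenient plumbing" that the rest of Section 3 sets out to avoid.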
3.2 The rapier

Suppose we write the call subst M [E/x] to mean the result of substituting E for x in M. The standard rule for substitution [Bar85] when M is a lambda abstraction is:

  subst (λx.M) [E/x] = λx.M
  subst (λx.M) [E/y] = λx.(subst M [E/y])   if x does not occur free in E

If the side condition does not hold, one must rename the bound variable x to something else. The brute-force solution does this renaming regardless.

Suppose that we lacked a name supply, but instead knew the free variables of E. Then we could test the side condition easily and, in the common case where there is no name capture, find that there was no need to rename x. But what if x was free in E? Then we need to come up with a fresh name for x that is not free in E. A simple approach is to try a variant of x, say "x1". If that, too, is free in E, try "x2", and so on.

When we finally discover a name, xn, that is not free in E, we can augment the substitution to map x to xn and apply this substitution to M, the body of the lambda. In general, then, we must simultaneously substitute for several variables at once.

To make this work at all, though, we need to know the free variables of E, or, more generally, the free variables of the range of the substitution. One way to find this is simply to compute the free variables directly from E, but if E is large this might be costly. However, it suffices to know any superset of these free variables. One obvious choice is the set of all variables that are in scope. If we made this choice, then we would end up renaming any bound variable for which there was an enclosing binding. We call this the no-shadowing strategy, for obvious reasons. The no-shadowing strategy will rename some variables when it is not strictly necessary to do so, but it has the desirable property of idempotence: a complete pass of the simplifier that happens to make no transformations will clone no variables. This is a good thing. Usually, some parts of the program being compiled are fully-transformed before others; the no-shadowing strategy reduces gratuitous "churning" of variable names.

Thus, we are led to a substitution algorithm that has three parameters, instead of two: the expression to which the substitution is applied, the substitution itself, θ, and the set of in-scope variables, φ:

  subst (λx.M) θ φ = λx.(subst M (θ \ x) (φ ∪ {x}))     if x ∉ φ
  subst (λx.M) θ φ = λy.(subst M (θ[x ↦ y]) (φ ∪ {y}))   where y ∉ φ

Notice how conveniently the set of in-scope variables can be maintained. Almost all the time, it simply travels everywhere with the substitution; we shall see some interesting exceptions to this general rule in Section 7.1.

There is one other important subtlety in this algorithm: in the case where x is not in φ we must delete x from the substitution, denoted θ \ x. How could x be in the domain of the substitution, but not be in scope? Perhaps because we are indeed substituting for x as a result of some enclosing inlining. It certainly happens in practice (we have the scars to show for it), though only in situations that are too convoluted to present here.

Occasionally, the set of in-scope variables is not conveniently to hand when starting a substitution. In that case, it is easy to find the set of free variables of the range of the substitution, and use that to get the process started.

3.3 Choosing a new name

The other choice that must be made in the algorithm is how to choose a fresh name, in the (hopefully rare) cases where that proves necessary. We could just try x1, x2, and so on, but there is a danger that once x1...x20 are in scope, any new x will take 20 tries before finding x21. A simple way out is to compute some kind of hash value from the set of in-scope variables, and use that, together perhaps with the variable to be renamed, to choose a new name. Indeed, simply using the number of enclosing binders as the new variable name gives something not unlike de Bruijn numbers (see Section 3.5). The nice thing is that any old choice will do; the only issue is how many iterations it takes to find an unused variable.

3.4 Measurements

We made some simple measurements of the effectiveness of our approach. We compiled the entire nofib suite, some 370 Haskell modules, comprising around 50,000 lines of code in total [Par92]. The size of each module varied from a few dozen lines to a thousand lines or so.

Figure 1 summarises how many "tries" it took to find a variable name that was not in scope. The columns show what proportion of binders required zero, one, two, 3-9, and 10 or more attempts to find a variable name that was not already in scope. We measured these proportions separately for each module, and then took the arithmetic mean of the resulting figures. The "Min" (resp. "Max") rows show the smallest (resp. largest) proportions encountered among the entire set of modules.

        Number of attempts (% of binders)
          0      1     2     3-9    10+
  Mean   93.2    1.3   0.7    1.6    3.2
  Min     0.94   0     0      0      0
  Max   100     10     6.13  18.2   94

  Figure 1: Cloning rates

The zero column corresponds to the situation where the binder is not shadowed; as expected, this is the case for the vast majority (93%) of binders. Our hash function (we simply picked an arbitrary member of the in-scope set as a hash value) is obviously too simple, though: on average 3.2% of all binders required more than ten attempts to find a fresh name, and in one pathological module almost all binders required more than ten attempts. This pathological case suggests that there is plenty of room for improvement in the hash function.
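The two-rule substitution of Section 3.2 transcribes almost directly into Haskell. The following is an illustrative sketch for a bare lambda calculus only (GHC's real substitution also covers lets, cases, constructors, and types), and it uses the simple try-x1-x2-... search rather than the hash-based choice of Section 3.3:

```haskell
import qualified Data.Map as Map
import qualified Data.Set as Set

-- A bare lambda calculus, for illustration only.
data Term = V String | A Term Term | L String Term
  deriving (Eq, Show)

type Subst   = Map.Map String Term
type InScope = Set.Set String

-- subst e theta phi: apply theta to e, where phi is (a superset of)
-- the variables in scope, including the free variables of theta's
-- range.  Bound variables are renamed only when they collide with
-- phi: the no-shadowing strategy of Section 3.2.
subst :: Term -> Subst -> InScope -> Term
subst (V x) theta _ =
  case Map.lookup x theta of
    Just e  -> e
    Nothing -> V x
subst (A f a) theta phi = A (subst f theta phi) (subst a theta phi)
subst (L x m) theta phi
  | not (x `Set.member` phi) =
      -- No capture possible: keep x, but delete it from the
      -- substitution (theta \ x) and add it to the in-scope set.
      L x (subst m (Map.delete x theta) (Set.insert x phi))
  | otherwise =
      -- x is already in scope: clone it to y and extend theta.
      let y = freshFor phi x
      in L y (subst m (Map.insert x (V y) theta) (Set.insert y phi))

-- Try x1, x2, ... until a name not in scope is found.  (GHC instead
-- derives a candidate from a hash of the in-scope set; Section 3.3.)
freshFor :: InScope -> String -> String
freshFor phi x =
  head [ y | n <- [1 :: Int ..]
           , let y = x ++ show n
           , not (y `Set.member` phi) ]
```

For instance, applying [x ↦ a b] to (λa. x a) with {a, b} in scope clones the binder a to a1, so the free a of the range is not captured.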
3.5 Other approaches

Another well-known approach to the name-capture problem is to use de Bruijn numbers [dB80]. Apart from being entirely unreadable, this approach suffers from the disadvantage that when pushing a substitution inside a lambda, the entire range of the substitution must have its de Bruijn numbers adjusted. That operation can be carried out lazily, to avoid a complexity explosion when pushing a substitution inside multiple lambdas, but that means yet more administration.

It is far from clear that using de Bruijn numbers gains any efficiency, and they carry a considerable cost in terms of the opacity of the resulting program. Programmers will not care about this, but compiler writers do.

There is one fairly compelling reason for using de Bruijn numbers. Precisely because they do discard the original variable names, many more common sub-expressions can arise. These CSEs increase sharing of the compiler's representation of the program; they do not in general represent run-time sharing. However, this compile-time sharing can be particularly important when dealing with types, which can get large. Shao, for example, reports substantial savings when using de Bruijn numbers for types together with hash-consing [SLM98]. However, our types are smaller than his (we are not compiling SML modules), so type sizes only become an issue for deliberately pathological programs with exponential-sized types.

Another popular approach to the name-capture problem is this: establish the invariant that every bound variable is unique in the whole program. Then, since only inlining can duplicate an expression, we can maintain the invariant by cloning all the locally-bound variables of an inlined expression. There are three difficulties here. First, we found in practice (in GHC at least) that there were quite a few transformations that had to do extra work to maintain the global-uniqueness invariant. Secondly, this strategy will do more cloning than is really necessary. Thirdly, cloning the local binders of an inlined expression implies a whole extra pass over that expression, prior to simplifying the expression in its new context. Our approach, of maintaining an in-scope set, combines the cloning pass with the simplification pass, and simultaneously reduces the amount of cloning that has to be done.

3.6 Summary

Our new substitution algorithm is a simple re-working of the standard algorithm in [Bar85]. What is interesting is that the resulting algorithm seems quite practical. Even if the compiler were written in a language where name-supply plumbing was not an issue, maintaining the set of in-scope variables makes it easy to reduce the amount of cloning that is done.

In GHC, a variable's name is actually a pair of a string and a unique number. The unique is used for comparisons, but the string is used when printing (optionally augmented with the unique if there is a danger of ambiguity). When we do need to clone a name, we invent a new unique, but keep the same print-name. This makes it possible to print dumps of intermediate code that still contain names that relate to the original source program.

4 Ensuring termination

Inlining, together with β-reduction, corresponds closely to compile-time evaluation of the program, so we must clearly be concerned about ensuring that the compiler terminates. We start from a secure base: it is a fact that Fω is strongly normalising. This is a complicated way of saying that the process of reducing every reducible expression (redex) in an Fω program will surely terminate. However, GHC's intermediate language extends Fω. These extensions introduce non-termination in two distinct ways:

- Recursive bindings. If a recursively-bound variable is inlined at one of its occurrences, that will introduce a new occurrence of the same variable. Unless restricted in some way, inlining could go on for ever.

- Recursive data types. Consider the following Haskell definition for loop:

    data T = C (T -> Int)

    g = \y -> case y of
                C h -> h y

    loop = g (C g)

  Here, g is small and non-recursive, so when processing (g (C g)), g will be inlined. But the inlined call very soon rewrites to g (C g), which is just the expression we started with.

  The problem here is that the data type T is recursive, and it appears contravariantly in its own definition [How96].

Of these two forms of divergence, the former is an immediate and pressing problem, since almost any interesting Haskell program involves recursion. The rest of this section focuses entirely on recursive definitions.

In contrast, the latter situation is rather rare, and (embarrassingly) GHC can still be persuaded to diverge by such examples. The most straightforward solution is to spot such contravariant data types, and disable the case-elimination transformation

  case C g of { C h -> ...h... }
  ==>
  ...g...

The question of spotting contravariant data types is complicated by the fact that Haskell data types can be parameterised and mutually recursive. The MLj compiler [BKR98] restricts data type declarations somewhat, but does perform the analysis for exactly this reason.

Before discussing recursive bindings, it is worth noting two other possible sources of divergence that a Haskell compiler does not have to deal with. Firstly, in an untyped setting (such as a Scheme compiler) one can easily construct terms such as

  (\x -> x x) (\x -> x x)
This expression is not explicitly recursive, but it nevertheless reduces to itself. However, the strong-normalisation theorem for Fω tells us that such terms simply must be ill-typed.

Secondly, side effects (which Haskell lacks) can create a recursive structure. For example [1]:

  (let ((foo a-special-value)
        (bar a-special-value))
    (begin
      (set! foo (lambda ..bar..))
      (set! bar (lambda ..foo..))
      body))

Here, foo and bar are mutable locations, each of which is updated to refer to the other.

[1] Thanks to Manuel Serrano for pointing this out.

4.1 The problem

From now on we focus our attention on recursive bindings. We call a group of bindings wrapped in rec a recursive group. Unrestricted inlining of non-recursive bindings is safe, but unrestricted inlining of recursive bindings might lead to non-termination. One obvious thing to do, therefore, is to ensure that each recursive group really is recursive. To discover this, we regard each variable in the group as a node, and we record an edge from f to g if f's right hand side mentions g (so f depends on g). The resulting collection of nodes and edges describes a graph, called the dependency graph, whose strongly connected components are the smallest possible recursive groups [Pey87]. To exploit this observation, GHC constructs the dependency graph for each letrec, and analyses its strongly-connected components. If there is more than one component, the letrec is split into a nest of recursive and non-recursive lets. GHC performs this analysis regularly; quite often, groups that were mutually recursive fall into separate strongly-connected components as a result of earlier transformations.

So much is well known. But what do we do when we are faced with a genuinely recursive group? The simplest thing to do is not to inline any recursively-bound variables at all, and that is what earlier versions of GHC did. But this conservative strategy loses obviously-useful optimisation opportunities. Consider a recursive group of bindings:

  letrec
    f = \x -> ...g...
    g = \y -> ...f...
  in
  ...f...

By convention, other variables of interest, such as g in this case, are assumed not to be free in ...f.... Since only f is called outside the rec, we can inline g at its unique call site to give:

  letrec
    f = \x -> ...(...f...)...
  in
  ...f...

Here, the gain is modest. But sometimes inlining in recs is critically important. Consider this:

  let
    eq = ...
  in letrec
    d   = (eq, neq)
    neq = \a b -> case d of
                    (e,n) -> not (e a b)
  in
  ...

GHC generates code quite like this for an "Eq dictionary". A "dictionary" is a bundle of related "methods" for operating on values of a particular type. Here, the Eq dictionary, d, is a pair of methods (ordinary functions), eq and neq; the intention is that eq is a function that determines whether its arguments are equal, and neq determines whether they are unequal.

In this example, the neq method is specified by selecting the eq method from the dictionary d, calling it, and negating its result. You might think that it would be more straightforward to call eq directly, but this code is generated by the compiler from class and instance declarations in the Haskell source code. We found that it was very hard, in general, to call the appropriate method directly; it was much easier to allow the front end to generate naive code, and let the simplifier take care of the rest.

In this particular example, d and neq are genuinely mutually recursive. Yet, if d were inlined in the body of neq, the case would cancel with the pair constructor, leading to the following:

  let
    eq  = ...
    neq = \a b -> not (eq a b)
    d   = (eq, neq)
  in
  ...

Now everything is non-recursive, the definition of neq is improved, and inlining opportunities in the rest of the program are improved.

This is not an isolated or artificial example. Compiling Haskell's type-class-based overloading, using the dictionary-passing encoding sketched above, gives rise to pervasive recursion through these dictionaries. Failing to unravel the recursion has a devastating effect on performance, because overloaded functions include equality, ordering, and all numeric operations, some of which show up in almost any inner loop. We originally went to great lengths in the front end to avoid generating unnecessary dictionary recursion but, no matter how hard we tried, some unnecessary recs still showed up. Our new approach uses a much simpler translation scheme, along with an inliner that does a good job of inlining rec-bound variables. This approach has the merit that it works equally well for complex recursions written by the programmer, though admittedly these are much less common.

4.2 The solution

The real problem with recursive bindings is that they can make the inliner fall into an infinite loop. The key insight is this:
    The inliner cannot loop if every cycle in the dependency
    graph is broken by a variable that is never inlined.

The conservative scheme works by never inlining any recursively-bound variable, but that is overkill, as we saw in the example in Section 4.1:

    rec
      d = (eq, neq)
      neq = \ a b -> case d of
                       (e,n) -> not (e a b)

we obtained much better results by inlining d but not neq than by inlining neither. The dependency graph for this group forms a circle, thus:

    d <----> neq

To prevent the inliner diverging, it suffices to choose either d or neq, and refrain from inlining it. In a more complicated situation, however, it might not be at all obvious which variables suffice to break all the loops. For example, consider this more complex dependency graph:

    [diagram: a five-node dependency graph over f, g, h, p, and q,
     forming a single strongly-connected component]

In this graph, we can break all the loops by picking g alone, or f and q, or h and p, or a variety of other pairs. To exploit this idea, we enhance the standard rec-breaking dependency analysis described above, in the following way. For each rec group, we construct its dependency graph, and then execute the following algorithm:

  1. Perform a strongly-connected component analysis of the dependency graph.

  2. For each strongly-connected component of the graph, perform the following steps, treating the components in topologically-sorted order; that is, deal first with the component that does not refer to any of the other components, and so on.

     (a) If the component is a singleton that does not depend on itself, do nothing.

     (b) Otherwise, choose a single variable, the loop-breaker, that will not be inlined. This choice is made using a heuristic we discuss shortly (Section 4.3).

     (c) Take the dependency graph of the component (a subset of the original graph), and delete all the edges in this graph that terminate at the loop-breaker.

     (d) Repeat the entire algorithm for this new dependency graph, starting with Step 1.

The result of the algorithm is an ordered list of bindings with the following property: the only forward references are to loop-breakers. The bindings are still, of course, mutually recursive, but all the non-loop-breakers can be treated exactly like non-recursive lets so far as the inliner is concerned: their definition occurs before any of their uses, and inlining them cannot cause non-termination. For example, consider the five-node dependency graph given above. It forms a single strongly-connected component. Suppose we pick q as a loop breaker; we delete arcs leading to it and perform the strongly-connected component analysis again. The reduced dependency graph has three strongly-connected components, namely {p}, {f, g, h}, and {q}:

    [diagram: the reduced graph, with dashed arcs for the arcs
     deleted in step (c)]

We use dashed arcs for the arcs that are deleted in step (c). Suppose now that we choose f as the loop breaker. Now we have no strongly connected components left in the reduced graph:

    [diagram: the fully reduced graph; its only forward arcs are
     the dashed arcs leading to the loop breakers f and q]

Notice that the only forward arcs are the dashed arcs leading to loop breakers. Reconstructing the recursive group in topologically sorted order (left to right in the diagrams) gives:

    rec
      p  = ...q...
      h  = ...f...
      g  = ...h...
      f* = ...g...
      q* = ...g...

The "*" indicates the loop breakers. Only the loop breakers are referred to in the group earlier than they are defined, considering the definitions top to bottom. This is a wonderful property. As we shall see later (Section 6), inlining even non-recursive let-bound variables is far from straightforward, and having to worry about recursion would only make it worse. The beauty of the loop-breaking algorithm means that recursive lets can be treated essentially identically to non-recursive lets, thereby factoring the problem into two independent pieces: first cut the loops, and then treat recursive and non-recursive bindings uniformly.
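Steps 1 and 2 above can be sketched in a few lines of Haskell, using the strongly-connected component analysis from Data.Graph. This is an illustrative reconstruction, not GHC's code: a binding is just a name paired with the names its right-hand side mentions, and score stands in for the heuristic of Section 4.3 (lower means a better loop-breaker candidate).

```haskell
import Data.Graph (SCC(..), stronglyConnComp)

-- A binding: a binder paired with the binders its right-hand side mentions.
type Binding = (String, [String])

-- Order a recursive group and mark loop breakers (True), so that the
-- only forward references in the result are to loop breakers.
-- stronglyConnComp returns components reverse-topologically sorted,
-- i.e. dependencies before dependents, which is just the order we need.
loopBreak :: (String -> Int) -> [Binding] -> [(String, Bool)]
loopBreak score = concatMap comp . stronglyConnComp . map node
  where
    node b@(v, deps) = (b, v, deps)

    -- (a) a singleton that does not depend on itself: nothing to do
    comp (AcyclicSCC (v, _)) = [(v, False)]

    -- (b)-(d) otherwise: choose a loop breaker (lowest score), delete
    -- the edges that terminate at it, and repeat on the new graph
    comp (CyclicSCC bs) = map mark (loopBreak score cut)
      where
        lb  = snd (minimum [ (score v, v) | (v, _) <- bs ])
        cut = [ (v, filter (/= lb) deps) | (v, deps) <- bs ]
        mark (v, isLb) = (v, isLb || v == lb)
```

Running this on a five-node graph with an edge set chosen to be consistent with the example in this section (the paper's figure itself is not reproduced here), and with scores arranged so that q and then f are picked, reproduces the ordering p, h, g, f*, q* shown above.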
4.3 Selecting the loop breaker

There are two criteria that one might use to select a loop breaker:

  - Try not to select a variable that it would be very beneficial to inline.

  - Try to select a variable that will break many loops.

GHC currently uses only the first of these criteria. The second is a bit tricky to predict, and we have not explored using it. To evaluate the first criterion, GHC crudely "scores" each variable by how keen GHC is to inline it. Specifically, we pick the first of the following criteria that applies to the binding in question:

  - Score = 3, if the right hand side is just a constant or variable. In this case the binding will certainly be inlined.

  - Score = 3, if the variable occurs just once (counting both the right hand sides of the rec itself and the body of the let). The variable is likely to be inlined if it occurs only once.

  - Score = 2, if the right hand side is a constructor application. Thus, we avoid selecting d in the example in Section 4.1, because its right hand side is a pair.

  - Score = 1, if the variable has rewrite rules or specialisations attached to it. Details of this are beyond the scope of this paper.

  - Score = 0, otherwise.

Then we pick a loop breaker by arbitrarily choosing one of the variables with lowest score. While this scoring mechanism is very crude, it seems adequate. In practice, we have never come across a rec in which a different choice of loop breaker would have made a significant difference. This amounts to anecdotal evidence only; we have not tried systematically to measure the effectiveness of loop-breaker choice.

4.4 Other approaches

A much more common approach to termination, taken by both [Ser97] and [WD97], is to bound both the effort that the inliner is prepared to invest, and the size of the expression it is prepared to build, when inlining a particular call. If either limit is exceeded, the inliner abandons the attempt to inline the call. Bounding effort deals with expressions, such as (\x->x x)(\x->x x), that do not grow, but do not terminate either. The effort bound is typically set quite high, to allow for cascading transformations, so an effort bound alone might produce very large residual programs; that is why the size bound is necessary as well.

A variant of the approach retains a stack of inlinings that have been begun but not completed. When examining a call, the function is not inlined if an inlining of that same function is already in progress, or "pending". In effect, that function becomes the loop breaker, but it is chosen dynamically rather than statically.

This approach has the very great merit that it deals readily with all forms of non-termination: recursive functions, recursive data types, untyped languages and side effects, for example, all cause no problems. The difficulty with this approach in our setting is that the simplifier is applied repeatedly, a dozen times or more, between applying other transformations (strictness analysis, let-floating, etc.). If each iteration accepts a given amount of code growth, or effort applied, then each iteration might unroll a recursive function further. The effort/size bound mechanism uses an auxiliary parameter (the effort/size budget) that is not recorded in the tree between successive iterations of the simplifier; it records the state of the inliner itself.

Our approach does not have this problem: successive applications of the simplifier will eventually terminate. However, our more static analysis required that recursive functions and recursive data types be handled differently, which is undesirable. And yet more would be needed in an untyped or impure setting.

A quite separate, complementary, approach to inlining recursive functions is variously described by [App94] ("loop headers"), [Ser97] ("labels-inline"), [DS97] ("lambda-dropping"), and [San95] ("the static argument transformation"). The common idea is to turn a recursive function definition into a non-recursive function containing a local, recursive definition. Thus we can, for example, transform the standard recursive definition of map:

    map = \f xs -> case xs of
                     []   -> []
                     x:xs -> f x : map f xs

into the following non-recursive definition:

    map = \f xs ->
      let mp = \xs -> case xs of
                        []   -> []
                        x:xs -> f x : mp xs
      in mp xs

With the original definition, inlining would simply unroll a finite number of iterations of map. With the new definition, inlining map creates a new, specialised function definition for mp into which the particular f used at the call site can be inlined, perhaps resulting in better code; claimed benefits range from 1% to 10%. The overall effect is much better than that achieved by simply unrolling the original definition of map; unrolling a loop reduces the overheads of the loop itself, whereas creating a specialised function, mp, reduces the cost of the computation in each iteration of the loop.

The static argument transformation may indeed be useful, but it is orthogonal to the main thrust of this paper. It is best considered as a separate transformation, performed on map before inlining is begun, that enhances the effectiveness of inlining.

4.5 Results

It is hard to offer convincing measurements for the effectiveness of the loop-breaker algorithm, because GHC is now built in the expectation that recs that can be broken will be. Nevertheless, Figure 2 gives some indicative results. It shows the effect of switching the loop-breaker algorithm
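The scoring heuristic of Section 4.3 can be rendered as a small Haskell function. This is an illustrative sketch, not GHC's code: Expr is a cut-down stand-in for GHC's Core type, and the occurrence count and rules flag are passed in directly rather than looked up in the compiler's environment.

```haskell
-- A cut-down stand-in for GHC's Core expression type (illustrative).
data Expr = Lit Int
          | Var String
          | ConApp String [Expr]   -- constructor application
          | Lam String Expr

-- Score a rec-bound variable; lower scores make better loop breakers.
-- The first matching criterion wins, as in Section 4.3.
score :: Expr -> Int -> Bool -> Int
score rhs occs hasRules
  | trivial rhs = 3     -- constant or variable: certain to be inlined
  | occs == 1   = 3     -- occurs just once: likely to be inlined
  | conApp rhs  = 2     -- constructor application, e.g. d's pair
  | hasRules    = 1     -- rewrite rules or specialisations attached
  | otherwise   = 0
  where
    trivial (Lit _)     = True
    trivial (Var _)     = True
    trivial _           = False
    conApp (ConApp _ _) = True
    conApp _            = False
```

A loop breaker is then chosen arbitrarily among the variables with the lowest score, as described above.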
off, by marking every rec-bound variable as a loop breaker.

    Allocations    No libs    Libs too
    Mean            +23%       +78%
    Min             -15%         0%
    Max            +200%     +1125%

    Figure 2: Effect on total allocation of switching off the
    loop-breaker algorithm

The "Mean" row shows the geometric mean of the ratio between the switched-off version and the baseline version; we use a geometric mean because we are averaging ratios [FW86]. The "Min" and "Max" rows show the most extreme ratios we found.

The effects are dramatic. The column headed "No libs" has the loop-breaking algorithm switched off when compiling the application, but not when compiling the standard libraries. The column "Libs too" shows the effect of switching off the loop-breaking algorithm when compiling the standard libraries as well. The importance of the libraries is that they contain implementations of arithmetic over basic types; if that is compiled badly then performance suffers horribly. We are investigating the strange -15% figure, which suggests that switching off loop breakers improved at least one program.

4.6 Summary

In retrospect, the algorithm is entirely obvious, yet we spent ages trying half-baked hacks, none of which quite worked, before finally biting the bullet and finding it quite tasty. It is more likely to be important for compilers for lazy languages than for strict ones, because only non-strict languages allow recursive data structures, and it is there that the most important performance implications show up. However, as our first example demonstrated, even where no data structures are involved, useful improvements can be had.

All of this is entirely orthogonal to the question of loop unrolling. A loop breaker could be inlined a fixed number of times to gain the effect of loop unrolling.

5 Overall architecture

The GHC inliner tries to do as much inlining as possible in a single pass. Since inlining often reveals new opportunities for further transformations, the inliner is actually part of GHC's simplifier, which performs a large number of local transformations [PJS98]. In this section we give an overview of the simplifier to set the scene for the rest of the paper.

5.1 The simplifier

The simplifier takes a substitution, a set of in-scope variables, an expression, and a "context", and delivers a simplified expression:

    simplExpr :: Subst -> InScopeSet
              -> InExpr -> Context
              -> OutExpr

The real simplifier's type is a bit more complicated than this: it takes an argument that enables or disables individual transformations; it gathers statistics about how many transformations are performed; and it takes a name supply, to use when it has to conjure up a fresh name not based on an existing name². However, we will not need to consider these aspects here.

² We could certainly do without this name supply, by conjuring up names based on an arbitrary base name, but it turns out that it can conveniently piggy-back on the monadic plumbing for the other administrative arguments.

The substitution and in-scope set perform precisely the roles described in Section 3, but, as we shall see, they both have further uses. The context tells the simplifier something about the context in which the expression appears (e.g. it is applied to some arguments, or it is the scrutinee of a case expression). This context information is important when making inlining decisions (Section 7.5).

We refer to an un-processed expression as an "in-expression", and an expression that has already been processed as an "out-expression", and similarly for variables. The reasons for making these distinctions will become apparent (Section 6.2).

    type InVar   = Var
    type InExpr  = Expr
    type InAlt   = Alt

    type OutVar  = Var
    type OutExpr = Expr
    type OutAlt  = Alt

As indicated in Section 2, the simplifier treats an entire Haskell module (which GHC treats as a compilation unit) as a sequence of bindings, some recursive and some not. It deals with each of these bindings in turn, just as if they were in a nested sequence of lets.

5.2 The occurrence analyser

It is clear that whether to inline x depends a great deal on how often x occurs in E. Before each run of the simplifier, GHC runs an occurrence analyser, a bottom-up pass that annotates each binder with an indication of how it occurs, chosen from the following list:

LoopBreaker. The occurrence analyser executes the dependency-graph algorithm we discussed in Section 4.1, marking loop breakers, and sorting the bindings in each rec so that only loop breakers are referred to by an earlier definition in the sequence. Building the dependency graph uses precisely the information that the occurrence analyser is gathering anyway, namely information about where the bound variables of the rec occur.

Dead. The binder does not occur at all. For a let binder (whether recursive or not), the binding can be discarded, and the occurrence analyser does so immediately, so that it does not need to analyse the right hand sides.

OnceSafe. The binder occurs exactly once, and that occurrence is not inside a lambda, nor is a constructor argument. Inlining is unconditionally safe; it duplicates neither code nor work. Section 2.2 explained why we must not inline an arbitrary expression inside a lambda, and also described the trivial-constructor-argument invariant.

MultiSafe. The binder occurs at most once in each of several distinct case branches; none of these occurrences is inside a lambda. For example:

    case xs of
      []   -> y+1
      x:xs -> y+2

In this expression, y occurs only once in each case branch. Inlining y may duplicate code, but it will not duplicate work.

OnceUnsafe. The binder occurs exactly once, but inside a lambda. Inlining will not duplicate code, but it might duplicate work (Section 2.2).

MultiUnsafe. The binder may occur many times, including inside lambdas. Variables exported from the module being compiled are also marked MultiUnsafe, since the compiler cannot predict how often they are used.

Notice that we have three variants of "occurs once": OnceSafe, MultiSafe, and OnceUnsafe. We have found all three to be important.

Some lambdas are certain to be called at most once. Consider:

    let x = foo 1000
        f = \y -> x+y
    in case a of
         []   -> f 3
         b:bs -> f 4

Here f cannot be called more than once, so no work will be duplicated by inlining x, even though its occurrence is inside a lambda. Hence, it would be better to give x an occurrence annotation of OnceSafe, rather than OnceUnsafe.

We call such lambdas one-shot lambdas, and mark them specially. They certainly occur in practice; for example, they are constructed as join points by the case-of-case transformation (for details see [PJS98]). We are still working on a type-based analysis for identifying one-shot lambdas [WP99]. Details of this analysis are beyond the scope of this paper, but our point here is that they are beautifully easy to exploit: the occurrence analyser simply ignores them when it is computing its "inside-lambda" information.

5.3 Summary

The overall plan for GHC's simplifier is therefore as follows:

    while something-happened && iterations < 4
    do
        perform occurrence analysis
        simplify the result
    end

The simplifier alternates between occurrence analysis and simplification, until the latter indicates that no transformations occurred, or until some arbitrary number (currently 4) of iterations has occurred. This entire algorithm is applied between other major passes, such as specialisation, strictness analysis [PP93], or let-floating [PPS96].

GHC is capable of wholesale inlining across module boundaries. Whenever GHC compiles a module M it writes an "interface file", M.hi, that contains GHC-specific information about M, including the full Core-language definitions for any top-level definitions in M that are smaller than a fixed threshold. This threshold is chosen so that few, if any, larger functions could possibly be inlined, regardless of the calling context. When compiling any module, A, that imports M, GHC slurps in M.hi, and is thereby equipped to inline calls in A to M's exports. Since the definition of a function exported from M might refer to values not exported from M, GHC dumps into M.hi the transitive closure of all sufficiently small functions reachable from M's exports. Values that are not exported from M may not be mentioned directly by the programmer, but may nevertheless be inlined by the inliner.

The consequence of all this is that A may need to be recompiled if M changes. There is no avoiding this, except by disabling cross-module inlining via a command-line flag. GHC goes to some trouble to add version stamps to every inlining in M.hi so that it can deduce whether or not A really needs to be recompiled.

6 The three-phase inlining strategy

After considerable experimentation, GHC now makes an inlining decision about a particular let-bound variable at no fewer than three distinct moments. In this section we explain why. Consider again the expression:

    let x = E in B

PreInlineUnconditionally. When the simplifier meets the expression for the first time, it considers whether to inline x unconditionally in B. It does so if and only if x is marked OnceSafe (see Section 5.2). In this case, the simplifier does not touch E at all; it simply binds x to E in its current substitution, discards the binding completely, and simplifies B using this extended substitution. This is the main use of the substitution beyond dealing with name capture, but it needs a little care, as we discuss in Section 6.2.

Notice, crucially, that the right hand side of the definition is processed only once, namely at the occurrence site. It turns out that this is very important. If the right hand side is processed when the let is encountered, and then again at the occurrence of the variable, the complexity of the simplifier becomes exponential in program size. Why? Because the right hand side is processed twice; and it might have a let whose right hand side is then processed twice each time; and so on. In retrospect this is obvious, but it was very puzzling at the time!

PostInlineUnconditionally. If the pre-inline test fails, the simplifier next simplifies the right hand side, E, to
produce E'. It then again considers whether to inline x unconditionally in B. It decides to do so if and only if:

  - x is not exported from this module (exported definitions must not be discarded), and

  - x is not a loop breaker, and

  - E' is trivial, that is, a literal or variable³. Neither work nor code is duplicated if a trivial expression is inlined.

³ Or, in the real compiler, a type application.

If so, then again the binding is dropped, and x is mapped to E' in the substitution.

This case is quite common; it corresponds to copy propagation in a conventional compiler. It often arises as a result of beta-reduction. For example, consider the definitions:

    f = \x -> E
    t = f a

If f is inlined, we get a redex, and thence:

    f = \x -> E
    t = let x = a in E

The interesting question is why we do not make this test at the PreInlineUnconditionally stage, something we discuss below.

CallSiteInline. If neither of the above holds, GHC retains the let binding, and adds x to the in-scope set. While processing B, at every occurrence of x, GHC considers whether to inline x. This decision is based on a fairly complex heuristic, that we discuss in Section 7. If the decision is "Yes", then GHC needs to have access to x's definition; this can be achieved quite elegantly, as we discuss in Section 6.3.

6.1 Why three-phase?

An obvious question is this: why not combine PostInlineUnconditionally with PreInlineUnconditionally? That is, before processing E, why not look to see if it is trivial (e.g. a variable), and if so inline it unconditionally? Doing so is a huge, but rather subtle, mistake.

The mistake is to do with the correctness of the pre-computed occurrence information. Suppose we have:

    let
      a = ...big...
      b = a
    in
    ...b...b...b...

a will be marked OnceSafe, and hence will be inlined unconditionally. But if PreInlineUnconditionally now sees that b's right-hand side is just a, and inlines b everywhere, a now effectively occurs in many places. This is a disaster, because a is now inlined unconditionally in many places.

The cause of this disaster is that a's occurrence information was rendered invalid by our decision to inline b. Several solutions suggest themselves (for example, provide some mechanism for fixing a's occurrence information; or get the occurrence analyser to propagate b's occurrences to a) and we tried some of them. They are all complicated, and the result was a bug farm.

We finally discovered the three-phase inline mechanism we have described. It is simple, and obviously correct. The PreInlineUnconditionally phase only inlines a variable x if x occurs once, not inside a lambda. That means that the occurrence information for any variable, y, free in x's right hand side is unaffected by the inlining.

On the other hand, once the right hand side has been processed, if y is going to be inlined unconditionally, then that will have happened already. In our example, PreInlineUnconditionally will decide to inline a. Now the simplifier moves on to the binding for b. PreInlineUnconditionally declines to inline, so the right hand side of b is processed; a is inlined, and a processed version of ...big... is produced. This is not trivial, so PostInlineUnconditionally declines too.

Another obvious question is whether PostInlineUnconditionally could be omitted altogether, leaving CallSiteInline to do its work. Here the answer is clearly "yes"; PostInlineUnconditionally is just an optimisation that allows trivial bindings to be dropped a little earlier than would otherwise be the case. To summarise, the key feature of our three-phase inlining strategy is that it allows the use of simple, pre-computed occurrence information, while still avoiding the exponential blowup that can occur if PreInlineUnconditionally is omitted.

6.2 The substitution

As we mentioned at the start of Section 6, the simplifier carries along (a) the current substitution, and (b) the set of variables in scope. But since the simplifier is busy transforming the expression and cloning variables, we have to be more precise:

  - The domain of the substitution is in-variables.

  - The in-scope set consists of out-variables.

We discussed in-variables and out-variables in Section 5. But what is the range of the substitution? When used for cloning or PostInlineUnconditionally the range was an out-expression, but when used in PreInlineUnconditionally the range was an in-expression. But watch out! Since we are, in effect, deferring the simplification of the in-expression, we must also record the substitution appropriate to the original site of the expression. Thus we are led to the following definition for the substitution:

    type Subst    = FiniteMap InVar SubstRng
    data SubstRng = DoneEx OutExpr
                  | SuspEx InExpr Subst

A DoneEx is straightforward, and is used both by the name-cloning mechanism, and by PostInlineUnconditionally. A SuspEx ("Susp" for "suspended") is used by PreInlineUnconditionally, and pairs an in-expression with the substitution appropriate to its let binding; you can think of it as a suspended application of simplExpr. Notice that we do not
capture the in-scope set as well. Why not? Because we must use the in-scope set appropriate to the occurrence site; Section 7.1 amplifies this point.

              Pre      Post     CallSite
    Mean     47.4%    17.4%      35.2%
    Min      0.25%    0.92%      0.72%
    Max       80%      95%        98%

    Figure 3: Relative frequency of inlining

6.3 The in-scope set

We mentioned earlier (Section 6) that the simplifier needs access to a let-bound variable's right-hand side at its occurrence sites. All we need is to turn the in-scope set into a finite mapping:

    type InScopeSet = FiniteMap OutVar Definition
    data Definition = Unknown
                    | BoundTo OutExpr OccInfo
                    | NotAmong [DataCon]

Whether or not a variable is in scope can be answered by looking in the domain of the in-scope set (we still call it a "set" for old times' sake). But the range of the mapping records what value the variable is bound to:

Unknown is used for variables bound in lambda and case patterns. We don't know what value such a variable is bound to.

BoundTo is used for let-bound variables (both recursive and non-recursive), and records the right-hand side of the definition and the occurrence information left with the binding by the occurrence analyser. The latter is needed when making the inlining decision at occurrence sites.

NotAmong is described shortly.

The in-scope set is also a convenient place to record information that is valid in only part of a variable's scope. Consider:

    \x -> ...case x of (a,b) -> E...

When processing E, but not in the "..." parts, x is known to be bound to (a,b). So, when processing the alternative of a case expression whose scrutinee is a variable, it is easy for the simplifier to modify the in-scope set to record x's binding. Why is this useful? Because E might contain another case expression scrutinising x:

    ...case x of (p,q) -> F...

By inlining (a,b) for x, we can eliminate this case altogether. This turns out to be a big win [PJS98].

The NotAmong variant of the Definition type allows the simplifier to record negative information:

    case x of
      Red     -> ...
      Blue    -> ...
      Green   -> ...
      DEFAULT -> E

The DEFAULT alternative matches any constructors other than Red, Blue, and Green. (GHC supports such DEFAULT alternatives directly, rather than requiring case expressions to be exhaustive, which is dreadful for large data types.) Inside E, what is known about x? What we know is that it is not bound to Red, Blue, or Green. This can be useful; if E contains a case expression that scrutinises x, we can eliminate any alternatives that cannot possibly match. Similarly, the expression x `seq` F inside E can be transformed to just F, since NotAmong implies that x is evaluated⁴. Even the value NotAmong [] is useful: it signals that the variable is evaluated, without specifying anything about its value.

⁴ The expression E1 `seq` E2 evaluates E1, discards the result, and then evaluates and returns E2.

The in-scope set, extended to be an in-scope mapping, plays the role of a dynamic environment. It records knowledge of the value of each in-scope variable, including knowledge that may be true for only part of that variable's scope. The nice thing is that this dynamic knowledge can elegantly be carried by the in-scope set, which we need anyway. The details of the transformations that exploit that dynamic knowledge are beyond the scope of this paper.

Almost all the time, the substitution and in-scope set travel together. But that is not always the case, as we discuss in Section 7.1.

6.4 Measurements

Figure 3 gives some simple measurements of the relative frequency of each form of inlining. We used the same set of benchmark programs as in Section 3.4, gathered statistics on how often each sort of inlining was used, and averaged these separately-calculated proportions. We took arithmetic means of the percentages, because here we are averaging "slices of the pie", so the "Mean" line should still sum to 100%.

The figures indicate that on average, each sort of inlining is actually used in practice, and that each dominates in some programs.

6.5 Summary

We can summarise the binding-site effects on the substitution and in-scope set as follows. Suppose that we encounter the binding x = E with a substitution subst, and an in-scope set in-scope.

PreInlineUnconditionally. The substitution is extended by binding x to SuspEx E subst. The in-scope set is not changed.

PostInlineUnconditionally. The substitution is extended by binding x to DoneEx E', where E' is the simplified version of E. The in-scope set is not changed.

Otherwise. If x is not already in scope, the substitution is not changed, but the in-scope set is extended by binding x to E'. If x is already in scope, then a new variable name x' is invented (Section 3.3); the substitution is
extended by binding x to DoneEx x', and the in-scope set is extended by binding x' to E'.

This concludes the discussion of what happens at the binding site of a variable. Now we consider what happens at its occurrences.

7 Occurrences

When the simplifier finds an occurrence of a variable, it first looks up the variable in the substitution (Section 7.1), and then decides whether to inline it (Section 7.2).

7.1 Looking up in the substitution

When the simplifier encounters the occurrence of a variable, the latter (being an InVar) must be looked up in the substitution:

    simplExpr sub ins (Var v) cont
      = case lookup sub v of
          Nothing           -> considerInline ins v cont
          Just (SuspEx e s) -> simplExpr s ins e cont
          Just (DoneEx e)   -> simplExpr empty ins e cont

The variable might not be in the substitution at all; for example, it might be a variable that did not need to be renamed. In that case, the next thing to do is to consider inlining it. The substitution can be discarded at this point, because the inlining (if any) is already an out-expression. Incidentally, notice that the variable we previously thought of as an InVar is now an OutVar. This is one reason that InVar and OutVar are simply synonyms for Var, rather than being truly distinct types.

If the substitution maps the variable to a SuspEx, then the simplifier is tail-called again, passing the captured substitution, and the current in-scope set. The substitution and the in-scope set usually travel together, but here they do not. We must use the in-scope set from the occurrence site

replace x by x1! The right thing to do is to continue with the empty substitution.

The code is simple enough, but it took us a long time before the interplay between the substitution and the in-scope set became as simple and elegant as it now is.

7.2 Inlining at an occurrence site

Once the simplifier has found a variable that is not in the substitution (and hence is an OutVar), we need to decide whether to inline it (CallSiteInline from Section 6). The first thing to do is to look up the variable in the in-scope set:

    considerInline ins v cont
      = case lookup ins v of
          Nothing -> error "Not in scope"
          Just (BoundTo rhs occ) | inline rhs occ cont
                     -> simplExpr empty ins rhs cont
          Just other -> rebuild (Var v) cont

If the dynamic information is BoundTo, and the predicate inline says "yes, go ahead", we simply tail-call the simplifier, passing the in-scope set and the empty substitution, as in the DoneEx case of the substitution. In all other cases we give up on inlining. The function rebuild, which we do not discuss further here, simply combines the variable with its context.

The inline predicate is the interesting bit. It looks first at the variable's occurrence information:

    inline :: OutExpr -> OccInfo -> Context -> Bool
    inline rhs LoopBreaker cont = False
    inline rhs OnceSafe    cont = error "inline: OnceSafe"
    inline rhs MultiSafe   cont = inlineMulti rhs cont
    inline rhs OnceUnsafe  cont = whnfOrBot rhs &&
                                  not (veryBoring cont)
    inline rhs MultiUnsafe cont = whnfOrBot rhs &&
inlineMulti rhs cont
b ecause that describ es what variables are in scop e there,
and the substitution from the de nition site.
The LoopBreaker case is obvious. The OnceSafe case should
never happ en, b ecause PreInlineUnconditionally will have
The third case is when the variable maps to DoneEx e. In
already inlined the binding.
this case you might think wewere done. But supp ose e was
a variable. Then we should consider inlining it, given the
The OnceUnsafe case uses the whnfOrBot predicate Sec-
current context cont , which di ers from that at the vari-
tion 2.2, to ensure that inlining will not happ en if there
able's de nition site. What if e was a partial application
is anywork duplication. However, as noted in Section 2.2,
of a function? Again, the context mightnow indicate that
even if the variable o ccurs just once, it is not alwaysagood
the function should b e inlined. So the simple thing to do
idea to inline it. The veryBoring predicate has typ e
is simply to pass e to simplExpr again. But notice that we
veryBoring :: Context -> Bool
give it the empty substitution! Consider this example:
It examines the context, returning False if there is anything
\x -> let
at all interesting ab out it, namely if and only if:
f = x
in
\x -> ...f..f...
The variable is applied to one or more arguments.
When the binding for f is encountered, PostInlineUncondi-
The variable is the scrutinee of a case .
tionally will extend the substitution, binding f to DoneEx x .
When the \x is encountered, the substitution will again b e
Notice that if a variable is the argument of a constructor,
extended to bind x to DoneEx x1, b ecause x is already in
it is in a veryBoring context, and so it will not b e inlined,
scop e. Now, when we replace the o ccurrence of f by x,we
thus maintaining the trivial-constructor-argumentinvariant
must not apply the same substitution again, whichwould
Section 2.2. 14
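The interplay described in Section 7.1, in particular the rule that a DoneEx result must be processed under the empty substitution, can be modelled in miniature. The toy model below is ours, not GHC's code: Expr, apply and applyWrong are hypothetical names, and the in-scope set and context are elided.

```haskell
module Main where

-- Toy model of the Section 7.1 substitution discipline (hypothetical,
-- much simplified).  A variable maps either to a suspended expression
-- with its captured substitution (SuspEx), or to a finished
-- out-expression (DoneEx).
import qualified Data.Map as Map

type Var = String
data Expr = Var Var | App Expr Expr deriving (Eq, Show)

data SubstRng = SuspEx Expr Subst   -- un-simplified, with captured subst
              | DoneEx Expr         -- already an out-expression
type Subst = Map.Map Var SubstRng

-- Correct: a SuspEx resumes with its CAPTURED substitution; a DoneEx
-- is final, so we continue with the empty substitution (here: stop).
apply :: Subst -> Expr -> Expr
apply s (App f a) = App (apply s f) (apply s a)
apply s (Var v)   = case Map.lookup v s of
  Nothing            -> Var v
  Just (SuspEx e s') -> apply s' e
  Just (DoneEx e)    -> e

-- Wrong: re-applies the same substitution to a DoneEx result.
applyWrong :: Subst -> Expr -> Expr
applyWrong s (App f a) = App (applyWrong s f) (applyWrong s a)
applyWrong s (Var v)   = case Map.lookup v s of
  Nothing            -> Var v
  Just (SuspEx e s') -> applyWrong s' e
  Just (DoneEx e)    -> applyWrong s e   -- substitutes twice!

main :: IO ()
main = do
  -- From the example in the text: f was bound to DoneEx x by
  -- PostInlineUnconditionally, and the inner \x was renamed to x1.
  let s = Map.fromList [ ("f", DoneEx (Var "x"))
                       , ("x", DoneEx (Var "x1")) ]
  print (apply s (Var "f"))        -- Var "x": the occurrence of f becomes x
  print (applyWrong s (Var "f"))   -- Var "x1": f wrongly captured by the inner x
```

Running the two versions side by side exhibits exactly the capture bug that the empty-substitution rule prevents.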
The MultiSafe and MultiUnsafe cases deal with the situation where there is more than one occurrence of the variable. Both make use of inlineMulti to do the bulk of the work; in addition, MultiUnsafe uses whnfOrBot to avoid work duplication.

Incidentally, since whnfOrBot rhs depends only on rhs, it is actually lazily cached in the BoundTo constructor rather than being re-calculated at each occurrence site.

7.3 Inlining multiple-occurrence variables

Now we are left with the case of inlining a variable that occurs many times.

inlineMulti :: OutExpr -> Context -> Bool
inlineMulti rhs cont
  | noSizeIncrease rhs cont = True
  | boring rhs cont         = False
  | otherwise               = smallEnough rhs cont

The third case of inlineMulti is the function that every inliner has: is the function small enough to inline? The first two cases are less obvious. The second case deals with situations like this:

let
  f = \x -> E
in
  ...(let g = \y z -> (f y, f z) in ...)...

There is very little point in inlining f at these two sites, because we can guarantee that no new transformations beyond those already performed on f itself will be enabled by doing so; the only saving is the call to f, and there is a code duplication cost to pay. How do we know that no transformations will be enabled? Because: (a) the arguments y and z are lambda-bound, and hence uninformative; and (b) the results of both calls are simply stored in a data structure.

The predicate boring takes an expression (the one we are considering inlining) and a context in which it would be inlined.

boring :: Expr -> Context -> Bool

Corresponding to our example above, boring returns True if both:

(a) All the arguments to which the function is applied are types, or variables that have dynamic information of Unknown; and

(b) After consuming enough arguments from the context to satisfy the lambdas at the top of the function, the remaining context is veryBoring.

Even if the context is boring, however, it is still worthwhile inlining the function if the result of doing so is no bigger than the call [App92]. That is what the predicate noSizeIncrease tests. Again, one might expect this case to be rare, but it isn't. For example, Haskell data constructors are curried functions, but in GHC's intermediate language constructor applications are saturated (Section 2). We bridge this gap by producing a function definition for each constructor, such as:

cons = \x xs -> Cons {x,xs}

where the Cons {x,xs} is the saturated constructor application. In reality there are a few type abstractions and applications too, but the idea is the same. These definitions also make a convenient place to perform argument evaluation (and perhaps unboxing) for strict constructors. For the simple definitions, such as cons, it is clearly better to inline the definition, even if the context is boring.

Notice that the first case is required even though smallEnough is sure to return True if noSizeIncrease does. Why? Because otherwise the second case might decide that the context is boring and decline to inline.

7.4 Size matters

We have now finally arrived at the smallEnough predicate, the main aspect of this paper for which there is a reasonable (albeit small) literature. We do not claim any new contribution here, though unlike some proposals smallEnough is context-sensitive:

smallEnough :: Expr -> Context -> Bool

For the record, however, the algorithm is as follows. We compute the size of the function body, having first split off its formal parameters, namely the lambdas at the top. From this size we subtract:

- The size of the call.

- An argument discount for each argument extracted from the context that (a) has dynamic information other than Unknown, and (b) is scrutinised by a case, or applied to an argument, in the function body.

- A result discount if the context is not boring and the function body returns an explicit constructor or lambda.

If the result of this computation is smaller than the inline threshold then we inline the function. The argument discount, result discount, and inline threshold are all settable from the command line. Santos gives more details of GHC's heuristics [San95, Section 6.3].

7.5 The context

It should by now be clear that the context of an expression plays a key role in inlining decisions. For a long time we passed in a variety of ad hoc flags indicating various things about the context, but we have now evolved a much more satisfactory story. The context is a little like a continuation, in that it indicates how the result of the expression is consumed. But this continuation must not be represented as a function, because we must be able to ask questions of it, as the earlier sub-sections indicate.

So GHC's contexts are defined by the following data type:

data Context
  = Stop
  | AppCxt  InExpr Subst Context
  | CaseCxt InVar [InAlt] Subst Context
  | ArgCxt  (OutExpr -> OutExpr)
  | InlineCxt Context
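Because the context is a data type rather than a function, predicates can pattern-match on it. Here is a minimal sketch of the veryBoring test from Section 7.2 over a cut-down Context; the elided payloads and the exact treatment of InlineCxt are our assumptions, not GHC's definitions.

```haskell
module Main where

-- Cut-down Context: the InExpr/Subst/alternative payloads are elided
-- (a hypothetical simplification; not GHC's actual type).
data Context = Stop
             | AppCxt Context     -- applied to an argument
             | CaseCxt Context    -- scrutinised by a case
             | InlineCxt Context

-- A context is "very boring" unless there is something interesting
-- about it: the expression is applied to an argument, or is the
-- scrutinee of a case (Section 7.2).  Looking through InlineCxt is
-- an assumption of this sketch.
veryBoring :: Context -> Bool
veryBoring (AppCxt _)    = False
veryBoring (CaseCxt _)   = False
veryBoring (InlineCxt c) = veryBoring c
veryBoring Stop          = True

main :: IO ()
main = mapM_ (print . veryBoring)
             [Stop, AppCxt Stop, CaseCxt Stop, InlineCxt (AppCxt Stop)]
```

The point of the representation is precisely that such questions are cheap pattern matches, which a functional continuation could not answer.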
[Figure 4: Effect of inlining threshold. The figure plots geometric-mean allocations (%) against binary size (%), one point for each setting of the inline threshold.]

The Stop context is used when beginning simplification of a lazy function argument, or the right hand side of a let binding. The AppCxt context indicates that the expression under consideration is to be applied to an argument. The argument is as yet un-simplified, and must be paired with its substitution. Similarly, the CaseCxt context is used when simplifying the scrutinee of a case expression.

simplExpr simply recurses into the expression, building a context "stack" as it goes. Here, for example, is what simplExpr does for App and Case nodes:

simplExpr sub ins (App f a) cont
  = simplExpr sub ins f (AppCxt a sub cont)

simplExpr sub ins (Case e b alts) cont
  = simplExpr sub ins e (CaseCxt b alts sub cont)
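This recursion can be pictured in miniature. The sketch below is hypothetical (a two-constructor expression type, and an AppCxt that records only the argument): walking to the head of a nested application pushes one frame per argument, so the variable at the head can see, from its context, how many arguments it is applied to.

```haskell
module Main where

-- Miniature of the context "stack" (hypothetical, greatly cut down).
data Expr = Var String | App Expr Expr

data Context = Stop | AppCxt Expr Context

-- Walk to the head of the application spine, building the context as
-- we go, and report the head variable with the number of AppCxt frames.
simplExpr :: Expr -> Context -> (String, Int)
simplExpr (App f a) cont = simplExpr f (AppCxt a cont)
simplExpr (Var v)   cont = (v, depth cont)
  where
    depth Stop         = 0
    depth (AppCxt _ c) = 1 + depth c

main :: IO ()
main = print (simplExpr (App (App (Var "f") (Var "x")) (Var "y")) Stop)
-- ("f",2): f is applied to two arguments, which an inlining predicate
-- can discover simply by inspecting the context
```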
We have already seen how useful it is to know the context of a variable occurrence. The context also makes it easy to perform other transformations, such as the case-of-known-constructor transformation:

case (a,b) of { (p,q) -> E }
  ==>
let {p=a; q=b} in E

simplExpr just matches a constructor application with a CaseCxt continuation.

The next case, ArgCxt, is used when simplifying the argument of a strict function or primitive operator. Here, a genuine, functional continuation is used, because no more needs to be known about the continuation.

The InlineCxt context is discussed in the next subsection.

In practice, GHC's simplifier has another couple of constructors in the Context data type, but they are more peripheral so we do not discuss them here.

7.6 INLINE pragmas

Like some other languages, GHC allows the programmer to specify that a function should be inlined at all its occurrences, as a pragma in the Haskell source language:

{-# INLINE f #-}
f x = ...

GHC also allows the Haskell programmer to ask the compiler to inline a function at a particular call site, thus:

...inline f a b...

The function inline has type forall a. a -> a, and is semantically the identity function. Operationally, though, it asks that f be inlined at this call site. Such per-occurrence inline pragmas are less commonly offered by compilers [Bak92].

Both these pragmas are translated to constructors in the Note data type, which itself can be attached to an expression (Section 2):

data Note = ...
          | InlineMe      -- {-# INLINE #-}
          | InlinePlease  -- inline

If they are so similar in the Core language, why do they appear so different in Haskell? Haskell allows functions to be defined by pattern-matching, using multiple equations, so there is no convenient syntactic place to ask for f to be inlined everywhere. At an occurrence site, however, it is natural just to use a pseudo-function.

The effects of InlineMe and InlinePlease are as follows:

- The effect of InlineMe is to make the enclosed expression look very small, which in turn makes the smallEnough predicate reply True. When simplExpr finds an InlineMe in a non-boring context, it drops the InlineMe, because its work is done.

- The effect of InlinePlease is to push an InlineCxt onto the context stack. The smallEnough predicate returns True if it finds such a context, regardless of the size of the expression.

There is an important subtlety, however. Consider

g = \a b -> ...big...

{-# INLINE f #-}
f = \x -> g x y

and suppose that this is the only occurrence of g. Should we inline g in f's right hand side? By no means! The programmer is asking that f be replicated, but not g! The right thing to do is to switch off all inlining when processing the body of an InlineMe; when f is inlined, then (and only then) g will get its chance.

7.7 Measurements

As mentioned in Section 7.4, our implementation makes use of an "inline threshold" to determine whether a given expression is small enough to inline. Figure 4 shows the effect of varying this threshold on the geometric mean of binary size and allocation. We use allocation instead of run-time because allocation is easy to measure repeatably, and is a somewhat reliable proxy for run-time, with the notable exception of some very small programs.

The actual values for the threshold are fairly arbitrary, and are affected by some of the other parameters: discounts for evaluated arguments and so on. What is more interesting is the shape of the graph. As expected, beyond a certain
point, binary sizes increase without having any dramatic effect on the efficiency of the program. The graph also shows that setting the threshold too low (i.e. less than 2) has a dramatic effect on both binary size and run-time. Essentially very little call-site inlining is being performed below this threshold, and even less inter-module inlining is happening, because this is covered by call-site inlining only; we can't see the binding.

The jump between threshold values 1 and 2 is caused by the fact that even functions marked {-# INLINE #-} are not inlined at a threshold of 1. The "wrapper" functions generated by strictness analysis are of this form, and if these wrappers are not inlined performance drops dramatically.

Making measurements is very instructive: we were surprised by the rather small performance increases as the threshold is increased beyond 2, and plan to investigate this further.

8 Related work

There is a modest literature on inlining applied to imperative programming languages, such as C and FORTRAN; some recent examples are [DH92, CMCH92, CHT91, CHT92]. In these works the focus is exclusively on procedures defined at the top level. The benefits are found to be fairly modest (in the 10-20% range), but the cost in terms of code bloat is also very modest. Considerable attention is paid to the effect on register allocation of larger basic blocks, which we do not consider at all.

It seems self-evident that the benefits of inlining are strongly related to both language and programming style. Functional languages encourage the use of abstractions, so the benefits of inlining are likely to be greater. Indeed, Appel reports benefits in the range 15-25% for the Standard ML of New Jersey compiler [App92], while Santos reports average benefits of around 40% for Haskell programs [San95]. Chambers reports truly dramatic factors of 4 to 55 for his SELF compiler [Cha92]; SELF takes abstraction very seriously indeed!

The most detailed and immediately-relevant work we have found is for two Scheme compilers. Waddell and Dybvig report performance improvements of 10-100% in the Chez Scheme compiler [WD97], while Serrano found a more modest 15% benefit for the Bigloo Scheme compiler [Ser95, Ser97]. Both use a dynamic, effort/size budget scheme to control termination. The Chez Scheme inliner uses an explicitly-encoded context parameter that plays exactly the role of our Context (Section 7.5).

A completely different approach to the inlining problem is discussed by [AJ97]. In this paper the focus is on inlining functions that are called precisely once, something that we have been very concerned with. Appel and Jim show that this transformation, along with a handful of others including dead-code elimination, are normalising and confluent, a very desirable property. Their focus is then on finding an efficient algorithm for applying the transformations exhaustively. Their solution involves adjusting the results of the occurrence analysis phase as transformations proceed. Their initial algorithm has worst-case quadratic complexity, but they also propose a more subtle (and unimplemented) linear-time variant. We too are concerned about efficient application of transformation rules, but our set of transformations is much larger, and includes general inlining, so their results are not directly applicable to our setting. Nevertheless, it is a unique and inspiring approach.

Copious measurements of many transformations in GHC (not only inlining) can be found in Santos's thesis [San95]; although these measurements are now several years old, we believe that the general outlines are unlikely to have changed dramatically. [PJS98] contains briefer, but more up-to-date, measurements.

9 Conclusion

This paper has told a long story. Inlining seems a relatively simple idea, but in practice it is complicated to do a good job. The main contribution of the paper is to set down, in sometimes-gory detail, the lessons that we have learned over nearly a decade of tuning our inliner. Everyone who tries to build a transformation-based compiler has to grapple with these issues but, because they are not crisp or sexy, there is almost no literature on the subject. This paper is a modest attempt to address that lack.

Acknowledgements

We warmly thank Nick Benton, Oege de Moor, Andrew Kennedy, John Matthews, Sven Panne, Alastair Reid, Julian Seward, and the four IDL Workshop referees, for comments on drafts of this paper. Special thanks are due to Manuel Chakravarty, Manuel Serrano, Oscar Waddell, and Norman Ramsey, for their particularly detailed and thoughtful remarks.

References

[AJ97] AW Appel and T Jim. Shrinking lambda-expressions in linear time. Journal of Functional Programming, 7(5):515-541, September 1997.

[App92] AW Appel. Compiling with continuations. Cambridge University Press, 1992.

[App94] AW Appel. Loop headers in lambda-calculus or CPS. Lisp and Symbolic Computation, 7:337-343, 1994.

[ARS94] L Augustsson, M Rittri, and D Synek. On generating unique names. Journal of Functional Programming, 4(1):117-123, January 1994.

[Bak92] HG Baker. Inlining semantics for subroutines which are recursive. ACM Sigplan Notices, 27(12):39-49, December 1992.

[Bar85] HP Barendregt. The lambda calculus: its syntax and semantics. Number 103 in Studies in Logic. North Holland, 1985.

[BKR98] Nick Benton, Andrew Kennedy, and George Russell. Compiling Standard ML to Java bytecodes. In ICFP98 [ICF98], pages 129-140.
[Cha92] C. Chambers. The Design and Implementation of the SELF Compiler, an Optimizing Compiler for Object-Oriented Programming Languages. Technical report STAN-CS-92-1240, Stanford University, Department of Computer Science, March 1992.

[CHT91] KD Cooper, MW Hall, and L Torczon. An experiment with inline substitution. Software Practice and Experience, 21:581-601, June 1991.

[CHT92] K. Cooper, M. Hall, and L. Torczon. Unexpected Side Effects of Inline Substitution: A Case Study. ACM Letters on Programming Languages and Systems, 1(1):22-31, 1992.

[CMCH92] PP Chang, SA Mahlke, WY Chen, and W-M Hwu. Profile-guided automatic inline expansion for C programs. Software Practice and Experience, 22:349-369, May 1992.

[dB80] N de Bruijn. A survey of the project AUTOMATH. In JP Seldin and JR Hindley, editors, To HB Curry: essays on combinatory logic, lambda calculus, and formalism, pages 579-606. Academic Press, 1980.

[DH92] JW Davidson and AM Holler. Subprogram inlining: a study of its effects on program execution time. IEEE Transactions on Software Engineering, 18:89-102, February 1992.

[DS97] O Danvy and UP Schultz. Lambda-dropping: transforming recursive equations into programs with block structure. In ACM SIGPLAN Symposium on Partial Evaluation and Semantics-Based Program Manipulation (PEPM '97), volume 32 of SIGPLAN Notices, pages 90-106, Amsterdam, June 1997. ACM.

[FW86] PJ Fleming and JJ Wallace. How not to lie with statistics: the correct way to summarise benchmark results. CACM, 29(3):218-221, March 1986.

[How96] BT Howard. Inductive, co-inductive, and pointed types. In ICFP96 [ICF96].

[ICF96] ACM SIGPLAN International Conference on Functional Programming (ICFP'96), Philadelphia, May 1996. ACM.

[ICF98] ACM SIGPLAN International Conference on Functional Programming (ICFP'98), Baltimore, September 1998. ACM.

[Par92] WD Partain. The nofib benchmark suite of Haskell programs. In J Launchbury and PM Sansom, editors, Functional Programming, Glasgow 1992, Workshops in Computing, pages 195-202. Springer Verlag, 1992.

[Pey87] SL Peyton Jones. The Implementation of Functional Programming Languages. Prentice Hall, 1987.

[PJS98] SL Peyton Jones and A Santos. A transformation-based optimiser for Haskell. Science of Computer Programming, 32(1-3):3-47, September 1998.

[PP93] SL Peyton Jones and WD Partain. Measuring the effectiveness of a simple strictness analyser. In K Hammond and JT O'Donnell, editors, Functional Programming, Glasgow 1993, Workshops in Computing, pages 201-220. Springer Verlag, 1993.

[PPS96] SL Peyton Jones, WD Partain, and A Santos. Let-floating: moving bindings to give faster programs. In ICFP96 [ICF96].

[San95] A Santos. Compilation by transformation in non-strict functional languages. Ph.D. thesis, Department of Computing Science, Glasgow University, September 1995.

[Ser95] M. Serrano. A fresh look to inlining decision. In 4th International Computer Symposium (ICS'95), Mexico City, Mexico, November 1995.

[Ser97] M Serrano. Inline expansion: when and how? In International Symposium on Programming Languages: Implementations, Logics, and Programs (PLILP'97), September 1997.

[SLM98] Z Shao, C League, and S Monnier. Implementing typed intermediate languages. In ICFP98 [ICF98], pages 313-323.

[WD97] O Waddell and RK Dybvig. Fast and effective procedure inlining. In 4th Static Analysis Symposium, number 1302 in Lecture Notes in Computer Science, pages 35-52. Springer Verlag, September 1997.

[WP99] K Wansbrough and SL Peyton Jones. Once upon a polymorphic type. In 26th ACM Symposium on Principles of Programming Languages (POPL'99), pages 15-28, San Antonio, January 1999. ACM.