Secrets of the Glasgow Haskell Compiler inliner

Simon Peyton Jones and Simon Marlow

Microsoft Research Ltd, Cambridge

[email protected] [email protected]

September 1, 1999

Abstract

Higher-order languages, such as Haskell, encourage the programmer to build abstractions by composing functions. A good compiler must inline many of these calls to recover an efficient program.

In principle, inlining is dead simple: just replace the call of a function by an instance of its body. But any compiler-writer will tell you that inlining is a black art, full of delicate compromises that work together to give good performance without unnecessary code bloat.

The purpose of this paper is, therefore, to articulate the key lessons we learned from a full-scale "production" inliner, the one used in the Glasgow Haskell Compiler. We focus mainly on the algorithmic aspects, but we also provide some indicative measurements to substantiate the importance of various aspects of the inliner.

1 Introduction

One of the trickiest aspects of a compiler for a functional language is the handling of inlining. In a functional-language compiler, inlining subsumes several other optimisations that are traditionally treated separately, such as copy propagation and jump elimination. As a result, effective inlining is particularly crucial in getting good performance.

The Glasgow Haskell Compiler (GHC) is an optimising compiler for Haskell that has evolved over a period of about ten years. We have repeatedly been through a cycle of looking at the code it produces, identifying what could be improved, and going back to the compiler to make it produce better code. It is our experience that the inliner is a lead player in many of these improvements. No other single aspect of the compiler has received so much attention.

The purpose of this paper is to report on several algorithmic aspects of GHC's inliner, focusing on aspects that were not obvious to us; that is to say, aspects that we got wrong to begin with. Most papers about inlining focus on how to choose whether or not to inline a function called from many places. This is indeed an important question, but we have found that we had to deal with quite a few other less obvious, but equally interesting, issues. Specifically, we describe the following:

• A major issue for any compiler, especially for one that inlines heavily, is name capture. Our initial brute-force solution involved inconvenient plumbing, and we have now evolved a simple and effective alternative (Section 3).

• At first we were very conservative about inlining recursive definitions; that is, we did not inline them at all. But we found that this strategy occasionally behaved very badly. After a series of failed hacks we developed a simple, obviously-correct algorithm that does the job beautifully (Section 4).

• Because the compiler does so much inlining, it is important to get as much as possible done in each pass over the program. Yet one must steer a careful path between doing too little work in each pass, requiring extra passes, and doing too much work, leading to an exponential-cost algorithm. GHC now identifies three distinct moments at which an inlining decision may be taken for a particular definition. We explain why in Section 6.

• When inlining an expression it is important to retain the expression's lexical environment, which gives the bindings of its free variables. But at the inline site, the compiler might know more about the dynamic state of some of those free variables; most notably, a free variable might be known to be evaluated at the inline site, but not at its original occurrence. Some key transformations make use of this extra information, and lacking it will cause an extra pass over the code. We describe how to exploit our name-capture solution to support accurate tracking of both lexical and dynamic environments (Section 7).

None of the algorithms we describe is individually very surprising. Perhaps because of this, the literature on the subject is very sparse, and we are not aware of published descriptions of any of our algorithms. Our contribution is to abstract some of what we have learned, in the hope that we may help others avoid the mistakes that we made.

For the sake of concreteness we focus throughout on GHC, but we stress that the lessons we learned are applicable to any compiler for a functional language, and indeed perhaps to other languages too.

2 Preliminaries

We will assume the use of a pure, non-strict, strongly-typed intermediate language, called the GHC Core language. GHC is itself written in Haskell, so we define the Core language by giving its data type definition in Haskell:

type Program = [Bind]

data Bind = NonRec Var Expr
          | Rec [(Var, Expr)]

data Expr = Var Var
          | App Expr Expr
          | Lam Var Expr
          | Let Bind Expr
          | Const Const [Expr]
          | Case Expr Var [Alt]
          | Note Note Expr

type Alt        -- Case alternative
  = (Const, [Var], Expr)

data Const      -- Constant
  = Literal Literal
  | DataCon DataCon
  | PrimOp PrimOp
  | DEFAULT

The Core language consists of the lambda calculus augmented with let-expressions (both non-recursive and recursive), case expressions, data constructors, literals, and primitive operations. In presenting examples we will use an informal, albeit hopefully clear, concrete syntax. We will feel free to use infix operators, and to write several bindings in a single non-recursive let-expression as shorthand for a sequence of let-expressions.

A program (Program) is simply a sequence of bindings, in dependency order. Each binding (Bind) can be recursive or non-recursive, and the right hand side of each binding is an expression (Expr). The constructors for variables (Var), application (App), lambda abstraction (Lam), and let-expressions (Let) should be self-explanatory. A constant application (Const) is used for literals, data constructor applications, and applications of primitive operators; the number of arguments must match the arity of the constant, and the constant cannot be DEFAULT. Likewise, the number of bound variables in a case alternative (Alt) always matches the arity of the constant; and the latter cannot be a PrimOp. The Note form of Expr allows annotations to be attached to an expression; the only impact on the inliner is discussed in Section 7.6.

Case expressions (Case) should be self-explanatory, except for the Var argument to Case. Consider the following Core expression:

case reverse xs of ys {
  (a:as) -> ys
  []     -> error "urk"
}

The unusual part of this construct is the binding occurrence of "ys", immediately after the "of". The semantics is that ys is bound to the result of evaluating the scrutinee, reverse xs in this case, which makes it possible to refer to this value in the alternatives. This detail has no impact on the rest of this paper (indeed, we omit the extra binder in our examples) but we have found that it makes several transformations more simple and uniform, so we include it here for the sake of completeness.

GHC's actual intermediate language is very slightly more complicated than that given here. It is an explicitly-typed language based on Fω, and supports polymorphism through explicit type abstraction and application. It turns out that doing so adds only one new constructor to the Expr type, and adds nothing to the substance of this paper, so we do not mention it further. The main point is that this paper omits no aspect essential to a full-scale implementation of Haskell.

2.1 What is inlining?

Given a definition x = E, one can inline x at a particular occurrence by replacing the occurrence by E. We use upper case letters, such as "E", to stand for arbitrary expressions, and "==>" to indicate a program transformation. For example:

let { f = \x -> x*3 } in f (a + b) - c
  ==>
(a+b)*3 - c

We have found it useful to identify three distinct transformations that collectively implement what we informally describe as "inlining":

• Inlining itself replaces an occurrence of a let-bound variable by (a copy of) the right-hand side of its definition. Inlining f in the example above goes like this:

  let { f = \x -> x*3 } in f (a + b) - c
    ==> [inline f]
  let { f = \x -> x*3 } in (\x -> x*3) (a + b) - c

  Notice that not all the occurrences of f need be inlined, and hence that the original definition of f must, in general, be retained.

• Dead code elimination discards bindings that are no longer used; this usually occurs when all occurrences of a variable have been inlined. Continuing our example gives:

  let { f = \x -> x*3 } in (\x -> x*3) (a + b) - c
    ==> [dead f]
  (\x -> x*3) (a + b) - c

• β-reduction simply rewrites a lambda application (\x -> E) A to let {x = A} in E. Applying β-reduction to our running example gives:

  (\x -> x*3) (a + b) - c
    ==> [beta]
  let { x = a+b } in x*3 - c

The first of these is the tricky one; the latter two are easy. In particular, beta reduction simply creates a let binding.
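As a concrete illustration, these three transformations can be written almost directly against a cut-down version of the Core type. The following sketch is ours, not GHC's actual code: it uses String variables, a non-recursive Let only, and a naive traversal that ignores the name-capture problem discussed in Section 3.

-- A cut-down Core, just enough to express the three transformations
-- (an illustrative sketch, not GHC's actual code).
data Expr = Var String
          | App Expr Expr
          | Lam String Expr
          | Let String Expr Expr     -- non-recursive let only
  deriving Show

-- Inlining: replace occurrences of x by its right-hand side.
-- The binding itself is retained; dead-code elimination removes it
-- later if no occurrences are left.  (Naive: handles shadowing of x
-- but ignores name capture; see Section 3 for the real story.)
inline :: String -> Expr -> Expr -> Expr
inline x rhs = go
  where
    go (Var y)     = if y == x then rhs else Var y
    go (App f a)   = App (go f) (go a)
    go (Lam y b)   = Lam y (if y == x then b else go b)
    go (Let y r b) = Let y (go r) (if y == x then b else go b)

-- Beta reduction: (\x -> E) A  ==>  let x = A in E
beta :: Expr -> Expr
beta (App (Lam x e) a) = Let x a e
beta e                 = e

Note that beta creates a let binding rather than substituting A into E; whether that binding can then safely be inlined is exactly the work-duplication question taken up below.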

In a lazy, purely functional language, inlining and dead-code elimination are both unconditionally valid, or meaning-preserving. Neither is valid, in general, in a language permitting side effects, such as Standard ML or Scheme. In particular, notice that inlining is valid, regardless of

• the number of occurrences of x,

• whether or not the binding for x is recursive,

• whether or not E has free variables (that is, inlining of nested definitions is perfectly fine), and

• the syntactic form of E (notably, whether or not it is a lambda abstraction).

Concerning the last of these items, notice that we unconventionally use the term "inline" equally for both functions and non-functions. Continuing the example, x can now be inlined, and then dropped as dead code, thus:

let { x = a+b } in x*3 - c
  ==> [inline x]
let { x = a+b } in (a+b)*3 - c
  ==> [dead x]
(a+b)*3 - c

In this case, x is used exactly once, but we sometimes also inline non-functions that are used several times. Consider:

let x = (a,b)
in
...x...(case x of { (p,q) -> p+1 })...

By inlining x we can then eliminate the case to give

let x = (a,b)
in
...x...(a+1)...

In a similar way, when given bindings such as x=y, inlining subsumes copy propagation.

2.2 Factors affecting inlining

To say that inlining is valid does not mean that it is desirable. Inlining might increase code size, or duplicate work, so we need to be careful about when to do it. There are three distinct factors to consider:

• Does any code get duplicated, and if so, how much? For example, consider

  let f = \v -> ...big... in (f 3, f 4)

  where "...big..." is a large expression. Then inlining f would not duplicate any work (f will still be called twice), but it will duplicate the code for f's body. Bloated programs are bad (increased compilation time, lower cache hit rates), but inlining can often reduce code size by exposing new opportunities for transformations. GHC uses a number of heuristics to determine whether an expression is small enough to duplicate.

• Does any work get duplicated, and if so, how much? For example, consider

  let x = foo 1000 in x+x

  where foo is expensive to compute. Inlining x would result in two calls to foo instead of one. Work can be duplicated even if x only appears once:

  let x = foo 1000
      f = \y -> x * y
  in ...(f 3)...(f 4)...

  If we inline x at its (single) occurrence site, foo will be called every time f is. The general rule is that we must be careful when inlining inside a lambda. It is not hard to come up with examples where a single inlining that duplicates work gives rise to an arbitrarily large increase in run time. GHC is therefore very conservative about work duplication. In general, GHC never duplicates work unless it is sure that the duplication is a small, bounded amount.

• Are any transformations exposed by inlining? For example, consider the bindings:

  f = \x -> E
  g = \ys -> map f ys

  Suppose we were to inline f inside g, thus:

  g = \ys -> map (\x -> E) ys

  No code is duplicated by doing so, but a small bounded amount of work is duplicated, because the closure for (\x -> E) would have to be allocated each time g was called. It is often worth putting up with this work duplication, because inlining f exposes new transformation opportunities at the inlining site. But in this case, nothing at all would be gained by inlining f, because f is not applied to anything.

These considerations imply that inlining is not an optimisation "by itself". The direct effects of careful inlining are small: it may duplicate code or a (constant amount of) work, and usually saves a call or jump (albeit not invariably; see the example in the last bullet above). It is the indirect effects that we are really after: the main reason for inlining is that it often exposes new transformations, by bringing together two code fragments that were previously separate. Thus, in general, inlining decisions must be influenced by context.

2.3 Work duplication

If x is inlined in more than one place, or inlined inside a lambda, we have to worry about work duplication. When will such work duplication be bounded? Answer: at least in the cases when x's right hand side is:

• A variable.

• A constructor application.

• A lambda abstraction.

• An expression that is sure to diverge.

Constructor applications require careful treatment. Consider:

x = (f y, g y)
h = \z -> case x of
            (a,b) -> ...

It would plainly be a disaster, in general, to inline x inside the body of h, since that would duplicate the calls to f and g. Yet we want to inline x so that it can cancel with the case. GHC therefore maintains the invariant that every constructor application has only arguments that can be duplicated with no cost: variables, literals, and type applications. We call such arguments trivial expressions, so the invariant is called the trivial-constructor-argument invariant. Once established, this invariant is easy to maintain (see Section 7.2).

The last case, that of divergent computations, is more surprising, but it is useful in practice. Consider:

sump = \xs ->
  let
    fail = error ("sump" ++ show xs)
  in let rec
    go = \xs -> case xs of
                  []     -> 0
                  (x:xs) -> if x<0 then fail
                            else x + go xs
  in
  go xs

Here error is the standard Haskell function that prints an error message and brings execution to a halt. Semantically, its value is just ⊥, the divergent value. In this example, sump adds up the elements of a list, but reports an error if any element is negative. As it stands, a closure for fail will be allocated every time sump is called. It is perfectly OK to inline fail, because if fail is ever called, execution is going to halt anyway, so there is no work-duplication issue. If we do that, no closure is allocated; instead, error is called directly if an element turns out to be less than zero.

GHC has a predicate whnfOrBot that identifies expressions that are in WHNF or are certainly divergent:

whnfOrBot :: Expr -> Bool

One could easily imagine extending whnfOrBot to cover cases where a small amount of work (other than allocation) is duplicated, such as a few machine instructions.

3 Name capture

It is well known that any transformation-based compiler must be concerned about name capture [Bar85]. Consider, for example:

let x = a+b in
let a = 7 in
x+a

It is obviously quite wrong to inline x to give:

let a = 7 in
(a+b) + a

because the a that was free in x's right hand side has been captured by the let binding for a.

3.1 The sledge hammer

Earlier versions of GHC used a sledge-hammer approach to avoid the name-capture problem: during inlining, GHC would simply rename, or clone, every single bound variable, to give:

let s796 = 7
in (a+b) + s796

This renaming made use of a supply of fresh names that, in this example, has arbitrarily renamed a to s796. This approach suffers from two disadvantages:

• It allocates far more fresh names than are actually necessary, and there is sure to be a compile-time performance cost to this.

• Plumbing the supply of fresh names to the places those names are required is sometimes very painful.

Why is there a compile-time performance cost to the sledge-hammer approach? Because a variable is a structure containing a name; to rename the variable we must copy the structure, inserting the new name. The substitution mapping old names to new names becomes larger. Finally, if the substitution is empty we can sometimes avoid looking at an expression or type at all; but if all names are cloned the substitution is never empty.

If the compiler were written in an impure language, fresh names could be allocated by side effect, but GHC is written in Haskell, which does not have side effects. Using the trees of [ARS94] is the best solution we know of, but it still involves plumbing a tree of fresh names everywhere they might be needed. Worse, the fresh names usually aren't needed, but the tree is nevertheless built. This is deeply irritating: loads of allocation for no purpose whatsoever. Finally, even if we were not worried about performance, it is sometimes extremely painful to get the name supply to where it is needed. For example, in a typed intermediate language it should be possible to have a function:

exprType :: Expr -> Type

that figures out the type of an expression. But suppose the expression is something like:

filter Int pred xs

The function filter has the polymorphic type

filter :: forall a. (a -> Bool) -> [a] -> [a]

So to figure out the type of the subexpression (filter Int) we must instantiate filter's type, substituting Int for a. Oh no! Substitution! That can, in general, give rise to name capture. So we need to feed a name supply to exprType:

exprType :: NameSupply -> Expr -> Type

This "solution" is deeply unattractive, and the situation is only different in its cosmetics if the name supply is hidden in a monad. Something better is required.
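To make the plumbing cost concrete, here is a minimal sketch (our own names, not GHC's) of the explicit-supply style: the supply is threaded in and out of every function that might need a fresh name.

-- A trivially simple name supply: a counter.  Every function that
-- might allocate a fresh name must take the supply as an argument
-- and return the depleted supply as part of its result.
type NameSupply = Int

fresh :: String -> NameSupply -> (String, NameSupply)
fresh base n = (base ++ show n, n + 1)

Hiding this threading in a state monad tidies the syntax but, as noted above, changes only the cosmetics: the supply must still reach every call site that could possibly rename a variable.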

3.2 The rapier

Suppose we write the call subst M [E/x] to mean the result of substituting E for x in M. The standard rule for substitution [Bar85] when M is a lambda abstraction is:

subst (λx.M) [E/x] = λx.M
subst (λx.M) [E/y] = λx.(subst M [E/y])
    if x does not occur free in E

If the side condition does not hold, one must rename the bound variable x to something else. The brute-force solution does this renaming regardless.

Suppose that we lacked a name supply, but instead knew the free variables of E. Then we could test the side condition easily and, in the common case where there is no name capture, find that there was no need to rename x. But what if x was free in E? Then we need to come up with a fresh name for x that is not free in E. A simple approach is to try a variant of x, say "x1". If that, too, is free in E, try "x2", and so on.

When we finally discover a name, xn, that is not free in E, we can augment the substitution to map x to xn and apply this substitution to M, the body of the lambda. In general, then, we must simultaneously substitute for several variables at once.

To make this work at all, though, we need to know the free variables of E, or, more generally, the free variables of the range of the substitution. One way to find this is simply to compute the free variables directly from E, but if E is large this might be costly. However, it suffices to know any superset of these free variables. One obvious choice is the set of all variables that are in scope. If we made this choice, then we would end up renaming any bound variable for which there was an enclosing binding. We call this the no-shadowing strategy, for obvious reasons. The no-shadowing strategy will rename some variables when it is not strictly necessary to do so, but it has the desirable property of idempotence: a complete pass of the simplifier that happens to make no transformations will clone no variables. This is a good thing. Usually, some parts of the program being compiled are fully-transformed before others; the no-shadowing strategy reduces gratuitous "churning" of variable names.

Thus, we are led to a substitution algorithm that has three parameters, instead of two: the expression to which the substitution is applied, the substitution itself, σ, and the set of in-scope variables, θ:

subst (λx.M) σ θ = λx.(subst M (σ \ x) (θ ∪ {x}))
    if x ∉ θ
subst (λx.M) σ θ = λy.(subst M (σ[x ↦ y]) (θ ∪ {y}))
    where y ∉ θ

Notice how conveniently the set of in-scope variables can be maintained. Almost all the time, it simply travels everywhere with the substitution; we shall see some interesting exceptions to this general rule in Section 7.1.

There is one other important subtlety in this algorithm: in the case where x is not in θ we must delete x from the substitution, denoted σ \ x. How could x be in the domain of the substitution, but not be in scope? Perhaps because we are indeed substituting for x as a result of some enclosing inlining. It certainly happens in practice (we have the scars to show for it) though only in situations that are too convoluted to present here.

Occasionally, the set of in-scope variables is not conveniently to hand when starting a substitution. In that case, it is easy to find the set of free variables of the range of the substitution, and use that to get the process started.

3.3 Choosing a new name

The other choice that must be made in the algorithm is how to choose a fresh name, in the (hopefully rare) cases where that proves necessary. We could just try x1, x2, and so on, but there is a danger that once x1...x20 are in scope, then any new x will take 20 tries before finding x21. A simple way out is to compute some sort of hash value from the set of in-scope variables, and use that, together perhaps with the variable to be renamed, to choose a new name. Indeed, simply using the number of enclosing binders as the new variable name gives something not unlike de Bruijn numbers (see Section 3.5). The nice thing is that any old choice will do; the only issue is how many iterations it takes to find an unused variable.

3.4 Measurements

We made some simple measurements of the effectiveness of our approach. We compiled the entire nofib suite, some 370 Haskell modules, comprising around 50,000 lines of code in total [Par92]. The size of each module varied from a few dozen lines to a thousand lines or so.

Figure 1 summarises how many "tries" it took to find a variable name that was not in scope. The columns show what proportion of binders required zero, one, two, 3-9, and 10 or more attempts to find a variable name that was not already in scope. We measured these proportions separately for each module, and then took the arithmetic mean of the resulting figures. The "min" (resp. "max") rows show the smallest (resp. largest) proportions encountered among the entire set of modules.

             Number of attempts
           0      1      2    3-9    10+
  Mean  93.2    1.3    0.7    1.6    3.2
  Min   0.94      0      0      0      0
  Max    100     10   6.13   18.2     94

  Figure 1: Cloning rates (% of binders)

The zero column corresponds to the situation where the binder is not shadowed; as expected, this is the case for the vast majority (93%) of binders. Our hash function (we simply picked an arbitrary member of the in-scope set as a hash value) is obviously too simple, though: on average 3.2% of all binders required more than ten attempts to find a fresh name, and in one pathological module almost all binders required more than ten attempts. This pathological case suggests that there is plenty of room for improvement in the hash function.
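The substitution algorithm of Section 3.2 transcribes almost directly into Haskell. The following sketch is our transcription, not GHC's code; it uses a minimal expression type with Data.Map and Data.Set, and picks new names by the simple "try x1, x2, ..." scheme rather than a hash-based one.

import qualified Data.Map as Map
import qualified Data.Set as Set

data Expr = Var String | App Expr Expr | Lam String Expr

type Subst   = Map.Map String Expr   -- sigma
type InScope = Set.Set String        -- theta

subst :: Subst -> InScope -> Expr -> Expr
subst sig _  (Var x)   = Map.findWithDefault (Var x) x sig
subst sig th (App f a) = App (subst sig th f) (subst sig th a)
subst sig th (Lam x m)
  | not (x `Set.member` th)
    -- No clash: keep x, but delete it from the substitution;
    -- x may be in sig's domain because of an enclosing inlining.
  = Lam x (subst (Map.delete x sig) (Set.insert x th) m)
  | otherwise
    -- Clash: rename x to the first variant not already in scope.
  = Lam y (subst (Map.insert x (Var y) sig) (Set.insert y th) m)
  where
    y = head [ v | i <- [1 :: Int ..]
                 , let v = x ++ show i
                 , not (v `Set.member` th) ]

One subtlety worth noting: in the renaming case the new binder y is added to the in-scope set, so later clashes against y are detected in exactly the same way as clashes against original names.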

3.5 Other approaches

Another well-known approach to the name-capture problem is to use de Bruijn numbers [dB80]. Apart from being entirely unreadable, this approach suffers from the disadvantage that when pushing a substitution inside a lambda, the entire range of the substitution must have its de Bruijn numbers adjusted. That operation can be carried out lazily, to avoid a complexity explosion when pushing a substitution inside multiple lambdas, but that means yet more administration.

It is far from clear that using de Bruijn numbers gains any efficiency, and they carry a considerable cost in terms of the opacity of the resulting program. Programmers will not care about this, but compiler writers do.

There is one fairly compelling reason for using de Bruijn numbers. Precisely because they do discard the original variable names, many more common sub-expressions can arise. These CSEs increase sharing of the compiler's representation of the program; they do not in general represent run-time sharing. However this compile-time sharing can be particularly important when dealing with types, which can get large. Shao, for example, reports substantial savings when using de Bruijn numbers for types together with hash-consing [SLM98]. However, our types are smaller than his (we are not compiling SML modules) so type sizes only become an issue for deliberately pathological programs with exponential-sized types.

Another popular approach to the name-capture problem is this: establish the invariant that every bound variable is unique in the whole program. Then, since only inlining can duplicate an expression, we can maintain the invariant by cloning all the locally-bound variables of an inlined expression. There are three difficulties here. First, we found in practice that (in GHC at least) there were quite a few transformations that had to do extra work to maintain the global-uniqueness invariant. Secondly, this strategy will do more cloning than is really necessary. Thirdly, cloning the local binders of an inlined expression implies a whole extra pass over that expression, prior to simplifying the expression in its new context. Our approach, of maintaining an in-scope set, combines the cloning pass with the simplification pass, and simultaneously reduces the amount of cloning that has to be done.

3.6 Summary

Our new substitution algorithm is a simple re-working of the standard algorithm in [Bar85]. What is interesting is that the resulting algorithm seems quite practical. Even if the compiler were written in a language where name-supply plumbing was not an issue, maintaining the set of in-scope variables makes it easy to reduce the amount of cloning that is done.

In GHC, a variable's name is actually a pair of a string and a unique number. The unique is used for comparisons, but the string is used when printing (optionally augmented with the unique if there is a danger of ambiguity). When we do need to clone a name, we invent a new unique, but keep the same print-name. This makes it possible to print dumps of intermediate code that still contain names that relate to the original source program.

4 Ensuring termination

Inlining, together with beta reduction, corresponds closely to compile-time evaluation of the program, so we must clearly be concerned about ensuring that the compiler terminates. We start from a secure base: it is a fact that Fω is strongly normalising. This is a complicated way of saying that the process of reducing every reducible expression (redex) in an Fω program will surely terminate. However, GHC's intermediate language extends Fω. These extensions introduce non-termination in two distinct ways:

Recursive bindings. If a recursively-bound variable is inlined at one of its occurrences, that will introduce a new occurrence of the same variable. Unless restricted in some way, inlining could go on for ever.

Recursive data types. Consider the following Haskell definition for loop:

  data T = C (T -> Int)

  g = \y -> case y of
              C h -> h y

  loop = g (C g)

Here, g is small and non-recursive, so when processing g (C g), g will be inlined. But the inlined call very soon rewrites to g (C g), which is just the expression we started with. The problem here is that the data type T is recursive, and it appears contravariantly in its own definition [How96].

Of these two forms of divergence, the former is an immediate and pressing problem, since almost any interesting Haskell program involves recursion. The rest of this section focuses entirely on recursive definitions.

In contrast, the latter situation is rather rare, and (embarrassingly) GHC can still be persuaded to diverge by such examples. The most straightforward solution is to spot such contravariant data types, and disable the case-elimination transformation

case C g of { C h -> ...h... }
  ==>
...g...

The question of spotting contravariant data types is complicated by the fact that Haskell data types can be parameterised and mutually recursive. The MLj compiler [BKR98] restricts data type declarations somewhat, but does perform the analysis for exactly this reason.

Before discussing recursive bindings, it is worth noting two other possible sources of divergence that a Haskell compiler does not have to deal with. Firstly, in an untyped setting (such as a Scheme compiler) one can easily construct terms such as

(\x -> x x) (\x -> x x)

This expression is not explicitly recursive, but it nevertheless reduces to itself. However, the strong-normalisation theorem for Fω tells us that such terms simply must be ill-typed.

Secondly, side effects (which Haskell lacks) can create a recursive structure. For example:

(let ((foo a-special-value)
      (bar a-special-value))
  (begin
    (set! foo (lambda ..bar..))
    (set! bar (lambda ..foo..))
    body))

Here, foo and bar are mutable locations, each of which is updated to refer to the other.

4.1 The problem

From now on we focus our attention on recursive bindings. We call a group of bindings wrapped in rec a recursive group. Unrestricted inlining of non-recursive bindings is safe, but unrestricted inlining of recursive bindings might lead to non-termination. One obvious thing to do, therefore, is to ensure that each recursive group really is recursive. To discover this, we regard each variable in the group as a node, and we record an edge from f to g if f's right hand side mentions g (so f depends on g). The resulting collection of nodes and edges describes a graph, called the dependency graph, whose strongly connected components are the smallest possible recursive groups [Pey87]. To exploit this observation, GHC constructs the dependency graph for each let rec, and analyses its strongly-connected components. If there is more than one component, the let rec is split into a nest of recursive and non-recursive lets. GHC performs this analysis regularly; quite often, groups that were mutually-recursive fall into separate strongly-connected components as a result of earlier transformations.

So much is well known. But what do we do when we are faced with a genuinely recursive group? The simplest thing to do is not to inline any recursively-bound variables at all, and that is what earlier versions of GHC did. But this conservative strategy loses obviously-useful optimisation opportunities. Consider a recursive group of bindings:

let rec
  f = \x -> ...g...
  g = \y -> ...f...
in
...f...

By convention, other variables of interest, such as g in this case, are assumed not to be free in ...f... . Since only f is called outside the rec, we can inline g at its (unique) call site to give:

let rec
  f = \x -> ...(...f...)...
in
...f...

Here, the gain is modest. But sometimes inlining in recs is critically important. Consider this:

let
  eq = ...
in let rec
  d = (eq, neq)
  neq = \a b -> case d of
                  (e,n) -> not (e a b)
in
...

GHC generates code quite like this for an "Eq dictionary". A "dictionary" is a bundle of related "methods" for operating on values of a particular type. Here, the Eq dictionary, d, is a pair of methods (ordinary functions), eq and neq; the intention is that eq is a function that determines whether its arguments are equal, and neq determines whether they are unequal.

In this example, the neq method is specified by selecting the eq method from the dictionary d, calling it, and negating its result. You might think that it would be more straightforward to call eq directly, but this code is generated by the compiler from class and instance declarations in the Haskell source code. We found that it was very hard, in general, to call the appropriate method directly; it was much easier to allow the front end to generate naive code, and let the simplifier take care of the rest.

In this particular example, d and neq are genuinely mutually recursive. Yet, if d were inlined in the body of neq, the case would cancel with the pair constructor, leading to the following:

let
  eq = ...
  neq = \a b -> not (eq a b)
  d = (eq, neq)
in
...

Now everything is non-recursive, the definition of neq is improved, and inlining opportunities in the rest of the program are improved.

This is not an isolated or artificial example. Compiling Haskell's type-class-based overloading, using the dictionary-passing encoding sketched above, gives rise to pervasive recursion through these dictionaries. Failing to unravel the recursion has a devastating effect on performance, because overloaded functions include equality, ordering, and all numeric operations, some of which show up in almost any inner loop. We originally went to great lengths in the front end to avoid generating unnecessary dictionary recursion but, no matter how hard we tried, some unnecessary recs still showed up. Our new approach uses a much simpler translation scheme, along with an inliner that does a good job of inlining rec-bound variables. This approach has the merit that it works equally well for complex recursions written by the programmer, though admittedly these are much less common.

4.2 The solution

The real problem with recursive bindings is that they can

1

Thanks to Manuel Serrano for p ointing this out.

make the inliner fall into an in nite lo op. The key insightis

this: 7

  * The inliner cannot loop if every cycle in the dependency graph is broken by a variable that is never inlined.

The conservative scheme works by never inlining any recursively-bound variable, but that is over-kill, as we saw in the example in Section 4.1:

    rec
      d   = (eq, neq)
      neq = \a b -> case d of
                      (e,n) -> not (e a b)

we obtained much better results by inlining d (but not neq) than by inlining neither. The dependency graph for this group forms a circle, thus:

    [Diagram: d and neq, with an arc in each direction, forming a two-node cycle]

To prevent the inliner diverging, it suffices to choose either d or neq, and refrain from inlining it. In a more complicated situation, however, it might not be at all obvious which variables suffice to break all the loops. For example, consider this more complex dependency graph:

    [Diagram: a five-node dependency graph over f, g, h, p and q]

In this graph, we can break all the loops by picking g alone, or f and q, or h and p, or a variety of other pairs. To exploit this idea, we enhance the standard rec-breaking dependency analysis described above, in the following way. For each rec group, we construct its dependency graph, and then execute the following algorithm:

  1. Perform a strongly-connected component analysis of the dependency graph.

  2. For each strongly-connected component of the graph, perform the following steps, treating the components in topologically-sorted order; that is, deal first with the component that does not refer to any of the other components, and so on.

     (a) If the component is a singleton that does not depend on itself, do nothing.

     (b) Otherwise, choose a single variable, the loop-breaker, that will not be inlined. This choice is made using a heuristic we discuss shortly (Section 4.3).

     (c) Take the dependency graph of the component (a subset of the original graph), and delete all the edges in this graph that terminate at the loop-breaker.

     (d) Repeat the entire algorithm for this new dependency graph, starting with Step 1.

The result of the algorithm is an ordered list of bindings with the following property: the only forward references are to loop-breakers. The bindings are still, of course, mutually recursive, but all the non-loop-breakers can be treated exactly like non-recursive lets so far as the inliner is concerned: their definition occurs before any of their uses, and inlining them cannot cause non-termination. For example, consider the five-node dependency graph given above. It forms a single strongly-connected component. Suppose we pick q as a loop breaker; we delete arcs leading to it and perform the strongly-connected component analysis again. The reduced dependency graph has three strongly-connected components, namely {p}, {f,g,h}, and {q}:

    [Diagram: the reduced graph, with the deleted arcs leading to q shown dashed]

We use dashed arcs for the arcs that are deleted in step (c). Suppose now that we choose f as the loop breaker. Now we have no strongly connected components left in the reduced graph:

    [Diagram: the further-reduced graph, with the deleted arcs leading to f and q shown dashed]

Notice that the only forward arcs are the dashed arcs leading to loop breakers. Reconstructing the recursive group in topologically sorted order (left to right in the diagrams) gives:

    rec
      p  = ...q...
      h  = ...f...
      g  = ...h...
      f* = ...g...
      q* = ...g...

The "*" indicates the loop breakers. Only the loop breakers are referred to in the group earlier than they are defined, considering the definitions top to bottom. This is a wonderful property. As we shall see later (Section 6), inlining even non-recursive let-bound variables is far from straightforward, and having to worry about recursion would only make it worse. The beauty of the loop-breaking algorithm means that recursive lets can be treated essentially identically to non-recursive lets, thereby factoring the problem into two independent pieces: first cut the loops, and then treat recursive and non-recursive bindings uniformly.
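The algorithm above can be sketched directly with the SCC analysis from Data.Graph. This is our own illustrative stand-in, not GHC's code: in particular, the head-of-component choice of breaker is arbitrary here, whereas GHC uses the scoring heuristic of Section 4.3.

```haskell
import Data.Graph (SCC (..), stronglyConnComp)

-- A binding is a variable paired with the variables that its
-- right-hand side mentions.  Following steps 1 and 2(a)-(d) above,
-- 'loopBreakers' returns the variables chosen as loop breakers.
loopBreakers :: Ord v => [(v, [v])] -> [v]
loopBreakers binds = concatMap breakSCC (stronglyConnComp graph)
  where
    graph = [ (v, v, deps) | (v, deps) <- binds ]

    -- Step 2(a): a singleton that does not depend on itself.
    breakSCC (AcyclicSCC _) = []
    -- Steps 2(b)-(d): pick a breaker (here, arbitrarily, the first
    -- variable of the component), delete the edges leading to it,
    -- and repeat the whole algorithm on what remains.
    breakSCC (CyclicSCC vs) = breaker : loopBreakers rest
      where
        breaker = head vs
        rest    = [ (v, filter (/= breaker) deps)
                  | (v, deps) <- binds, v `elem` vs, v /= breaker ]
```

For a group with the dependencies of the reconstructed example above (p depends on q, h on f, g on h, f on g, q on g), the only cycle is f-g-h, so this sketch selects a single breaker from that cycle, after which every remaining component is acyclic.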

4.3 Selecting the loop breaker

There are two criteria that one might use to select a loop breaker:

  * Try not to select a variable that it would be very beneficial to inline.

  * Try to select a variable that will break many loops.

GHC currently uses only the first of these criteria. The second is a bit tricky to predict, and we have not explored using it. To evaluate the first criterion, GHC crudely "scores" each variable by how keen GHC is to inline it. Specifically, we pick the first of the following criteria that applies to the binding in question:

Score = 3, if the right hand side is just a constant or variable. In this case the binding will certainly be inlined.

Score = 3, if the variable occurs just once (counting both the right hand sides of the rec itself and the body of the let). The variable is likely to be inlined if it occurs only once.

Score = 2, if the right hand side is a constructor application. Thus, we avoid selecting d in the example in Section 4.1, because its right hand side is a pair.

Score = 1, if the variable has rewrite rules or specialisations attached to it. Details of this are beyond the scope of this paper.

Score = 0, otherwise.

Then we pick a loop breaker by arbitrarily choosing one of the variables with lowest score. While this scoring mechanism is very crude, it seems adequate. In practice, we have never come across a rec in which a different choice of loop breaker would have made a significant difference. This amounts to anecdotal evidence only; we have not tried systematically to measure the effectiveness of loop-breaker choice.

4.4 Other approaches

A much more common approach to termination, taken by both [Ser97] and [WD97], is to bound both the effort that the inliner is prepared to invest, and the size of the expression it is prepared to build, when inlining a particular call. If either limit is exceeded, the inliner abandons the attempt to inline the call. Bounding effort deals with expressions, such as (\x->x x)(\x->x x), that do not grow, but do not terminate either. The effort bound is typically set quite high, to allow for cascading transformations, so an effort bound alone might produce very large residual programs; that is why the size bound is necessary as well.

A variant of the approach retains a stack of inlinings that have been begun but not completed. When examining a call, the function is not inlined if an inlining of that same function is already in progress, or "pending". In effect, that function becomes the loop breaker, but it is chosen dynamically rather than statically.

This approach has the very great merit that it deals readily with all forms of non-termination: recursive functions, recursive data types, untyped languages and side effects, for example, all cause no problems. The difficulty with this approach in our setting is that the simplifier is applied repeatedly, a dozen times or more, between applying other transformations (strictness analysis, let-floating, etc). If each iteration accepts a given amount of code growth, or effort applied, then each iteration might unroll a recursive function further. The effort/size bound mechanism uses an auxiliary parameter (the effort/size budget) that is not recorded in the tree between successive iterations of the simplifier; it records the state of the inliner itself.

Our approach does not have this problem: successive applications of the simplifier will eventually terminate. However, our more static analysis required that recursive functions and recursive data types be handled differently, which is undesirable. And yet more would be needed in an untyped or impure setting.

A quite separate, complementary, approach to inlining recursive functions is variously described by [App94] ("loop headers"), [Ser97] ("labels-inline"), [DS97] ("lambda-dropping"), and [San95] ("the static argument transformation"). The common idea is to turn a recursive function definition into a non-recursive function containing a local, recursive definition. Thus we can, for example, transform the standard recursive definition of map:

    map = \f xs -> case xs of
                     []   -> []
                     x:xs -> f x : map f xs

into the following non-recursive definition:

    map = \f xs ->
      let mp = \xs -> case xs of
                        []   -> []
                        x:xs -> f x : mp xs
      in mp xs

With the original definition, inlining would simply unroll a finite number of iterations of map. With the new definition, inlining map creates a new, specialised function definition for mp into which the particular f used at the call site can be inlined, perhaps resulting in better code; claimed benefits range from 1% to 10%. The overall effect is much better than that achieved by simply unrolling the original definition of map: unrolling a loop reduces the overheads of the loop itself, whereas creating a specialised function, mp, reduces the cost of the computation in each iteration of the loop.

The static argument transformation may indeed be useful, but it is orthogonal to the main thrust of this paper. It is best considered as a separate transformation, performed on map before inlining is begun, that enhances the effectiveness of inlining.

4.5 Results

It is hard to offer convincing measurements for the effectiveness of the loop-breaker algorithm, because GHC is now built in the expectation that recs that can be broken will be. Nevertheless, Figure 2 gives some indicative results. It shows the effect of switching the loop-breaker algorithm off, by marking every rec-bound variable as a loop breaker.
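The scoring heuristic of Section 4.3 can be sketched as follows. Rhs, the occurrence count and the rules flag are deliberately simplified stand-ins of our own for information GHC's occurrence analyser actually records; they are not GHC's types.

```haskell
import Data.List (minimumBy)
import Data.Ord  (comparing)

-- Simplified stand-in for a Core right-hand side.
data Rhs = ConstOrVar | ConApp | OtherRhs

-- Pick the first criterion that applies, as in Section 4.3.
score :: Rhs -> Int -> Bool -> Int
score rhs occurrences hasRules
  | ConstOrVar <- rhs = 3  -- will certainly be inlined
  | occurrences == 1  = 3  -- likely to be inlined
  | ConApp <- rhs     = 2  -- e.g. the pair bound to d in Section 4.1
  | hasRules          = 1  -- rewrite rules or specialisations attached
  | otherwise         = 0

-- The loop breaker is an (arbitrary) variable with the lowest score.
pickLoopBreaker :: [(String, Int)] -> String
pickLoopBreaker = fst . minimumBy (comparing snd)
```

For the d/neq group of Section 4.1, d's right-hand side is a constructor application (score 2) while neq's lambda scores 0, so neq is chosen as the breaker and d remains inlinable.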

    Allocations    No libs    Libs too
    Mean           +23%       +78%
    Min            -15%       0%
    Max            +200%      +1125%

Figure 2: Effect on total allocation of switching off the loop-breaker algorithm

The "Mean" row shows the geometric mean of the ratio between the switched-off version and the baseline version; we use a geometric mean because we are averaging ratios [FW86]. The "Min" and "Max" rows show the most extreme ratios we found.

The effects are dramatic. The column headed "No libs" has the loop-breaking algorithm switched off when compiling the application, but not when compiling the standard libraries. The column "Libs too" shows the effect of switching off the loop-breaking algorithm when compiling the standard libraries as well. The importance of the libraries is that they contain implementations of arithmetic over basic types; if that is compiled badly then performance suffers horribly. We are investigating the strange -15% figure, which suggests that switching off loop breakers improved at least one program.

4.6 Summary

In retrospect, the algorithm is entirely obvious, yet we spent ages trying half-baked hacks, none of which quite worked, before finally biting the bullet and finding it quite tasty. It is more likely to be important for compilers for lazy languages than for strict ones, because only non-strict languages allow recursive data structures, and it is there that the most important performance implications show up. However, as our first example demonstrated, even where no data structures are involved, useful improvements can be had.

All of this is entirely orthogonal to the question of loop unrolling. A loop breaker could be inlined a fixed number of times to gain the effect of loop unrolling.

5 Overall architecture

The GHC inliner tries to do as much inlining as possible in a single pass. Since inlining often reveals new opportunities for further transformations, the inliner is actually part of GHC's simplifier, which performs a large number of local transformations [PJS98]. In this section we give an overview of the simplifier to set the scene for the rest of the paper.

5.1 The simplifier

The simplifier takes a substitution, a set of in-scope variables, an expression, and a "context", and delivers a simplified expression:

    simplExpr :: Subst -> InScopeSet
              -> InExpr -> Context
              -> OutExpr

The real simplifier's type is a bit more complicated than this: it takes an argument that enables or disables individual transformations; it gathers statistics about how many transformations are performed; and it takes a name supply, to use when it has to conjure up a fresh name not based on an existing name.(2) However, we will not need to consider these aspects here.

(2) We could certainly do without this name supply, by conjuring up names based on an arbitrary base name, but it turns out that it can conveniently piggy-back on the monadic plumbing for the other administrative arguments.

The substitution and in-scope set perform precisely the roles described in Section 3, but, as we shall see, they both have further uses. The context tells the simplifier something about the context in which the expression appears (e.g. it is applied to some arguments, or it is the scrutinee of a case expression). This context information is important when making inlining decisions (Section 7.5).

We refer to an un-processed expression as an "in-expression", and an expression that has already been processed as an "out-expression", and similarly for variables. The reasons for making these distinctions will become apparent (Section 6.2).

    type InVar   = Var
    type InExpr  = Expr
    type InAlt   = Alt

    type OutVar  = Var
    type OutExpr = Expr
    type OutAlt  = Alt

As indicated in Section 2, the simplifier treats an entire Haskell module (which GHC treats as a compilation unit) as a sequence of bindings, some recursive and some not. It deals with each of these bindings in turn, just as if they were in a nested sequence of lets.

5.2 The occurrence analyser

It is clear that whether to inline x depends a great deal on how often x occurs in E. Before each run of the simplifier, GHC runs an occurrence analyser, a bottom-up pass that annotates each binder with an indication of how it occurs, chosen from the following list:

LoopBreaker. The occurrence analyser executes the dependency-graph algorithm we discussed in Section 4.1, marking loop breakers, and sorting the bindings in each rec so that only loop breakers are referred to by an earlier definition in the sequence. Building the dependency graph uses precisely the information that the occurrence analyser is gathering anyway, namely information about where the bound variables of the rec occur.

Dead. The binder does not occur at all. For a let binder (whether recursive or not), the binding can be discarded, and the occurrence analyser does so immediately, so that it does not need to analyse the right hand sides.

OnceSafe. The binder occurs exactly once, and that occurrence is not inside a lambda, nor is it a constructor argument. Inlining is unconditionally safe; it duplicates neither code nor work. Section 2.2 explained why we must not inline an arbitrary expression inside a lambda, and also described the trivial-constructor-argument invariant.

MultiSafe. The binder occurs at most once in each of several distinct case branches; none of these occurrences is inside a lambda. For example:

    case xs of
      []   -> y+1
      x:xs -> y+2

In this expression, y occurs only once in each case branch. Inlining y may duplicate code, but it will not duplicate work.

OnceUnsafe. The binder occurs exactly once, but inside a lambda. Inlining will not duplicate code, but it might duplicate work (Section 2.2).

MultiUnsafe. The binder may occur many times, including inside lambdas. Variables exported from the module being compiled are also marked MultiUnsafe, since the compiler cannot predict how often they are used.

Notice that we have three variants of "occurs once": OnceSafe, MultiSafe, and OnceUnsafe. We have found all three to be important.

Some lambdas are certain to be called at most once. Consider:

    let x = foo 1000
        f = \y -> x+y
    in case a of
         []   -> f 3
         b:bs -> f 4

Here f cannot be called more than once, so no work will be duplicated by inlining x, even though its occurrence is inside a lambda. Hence, it would be better to give x an occurrence annotation of OnceSafe, rather than OnceUnsafe.

We call such lambdas one-shot lambdas, and mark them specially. They certainly occur in practice; for example, they are constructed as join points by the case-of-case transformation (for details see [PJS98]). We are still working on a type-based analysis for identifying one-shot lambdas [WP99]. Details of this analysis are beyond the scope of this paper, but our point here is that they are beautifully easy to exploit: the occurrence analyser simply ignores them when it is gathering its "inside-lambda" information.

5.3 Summary

The overall plan for GHC's simplifier is therefore as follows:

    while something-happened && iterations < 4
    do
      perform occurrence analysis
      simplify the result
    end

The simplifier alternates between occurrence analysis and simplification, until the latter indicates that no transformations occurred, or until some arbitrary number (currently 4) of iterations has occurred. This entire algorithm is applied between other major passes, such as specialisation, strictness analysis [PP93], or let-floating [PPS96].

GHC is capable of wholesale inlining across module boundaries. Whenever GHC compiles a module M it writes an "interface file", M.hi, that contains GHC-specific information about M, including the full Core-language definitions for any top-level definitions in M that are smaller than a fixed threshold. This threshold is chosen so that few, if any, larger functions could possibly be inlined, regardless of the calling context. When compiling any module, A, that imports M, GHC slurps in M.hi, and is thereby equipped to inline calls in A to M's exports. Since the definition of a function exported from M might refer to values not exported from M, GHC dumps into M.hi the transitive closure of all sufficiently small functions reachable from M's exports. Values that are not exported from M may not be mentioned directly by the programmer, but may nevertheless be inlined by the inliner.

The consequence of all this is that A may need to be recompiled if M changes. There is no avoiding this, except by disabling cross-module inlining via a command-line flag. GHC goes to some trouble to add version stamps to every inlining in M.hi so that it can deduce whether or not A really needs to be recompiled.

6 The three-phase inlining strategy

After considerable experimentation, GHC now makes an inlining decision about a particular let-bound variable at no fewer than three distinct moments. In this section we explain why. Consider again the expression:

    let x = E in B

PreInlineUnconditionally. When the simplifier meets the expression for the first time, it considers whether to inline x unconditionally in B. It does so if and only if x is marked OnceSafe (see Section 5.2). In this case, the simplifier does not touch E at all; it simply binds x to E in its current substitution, discards the binding completely, and simplifies B using this extended substitution. This is the main use of the substitution beyond dealing with name capture, but it needs a little care, as we discuss in Section 6.2.

Notice, crucially, that the right hand side of the definition is processed only once, namely at the occurrence site. It turns out that this is very important. If the right hand side is processed when the let is encountered, and then again at the occurrence of the variable, the complexity of the simplifier becomes exponential in program size. Why? Because the right hand side is processed twice; and it might have a let whose right hand side is then processed twice each time; and so on. In retrospect this is obvious, but it was very puzzling at the time!

PostInlineUnconditionally. If the pre-inline test fails, the simplifier next simplifies the right hand side, E, to produce E'.

It then again considers whether to inline x unconditionally in B. It decides to do so if and only if:

  * x is not exported from this module (exported definitions must not be discarded), and

  * x is not a loop breaker, and

  * E' is trivial; that is, a literal or variable.(3) Neither work nor code is duplicated if a trivial expression is inlined.

(3) Or, in the real compiler, a type application.

If so, then again the binding is dropped, and x is mapped to E' in the substitution.

This case is quite common; it corresponds to copy propagation in a conventional compiler. It often arises as a result of β-reduction. For example, consider the definitions:

    f = \x -> E
    t = f a

If f is inlined, we get a redex, and thence:

    f = \x -> E
    t = let x = a in E

The interesting question is why we do not make this test at the PreInlineUnconditionally stage, something we discuss below.

CallSiteInline. If neither of the above holds, GHC retains the let binding, and adds x to the in-scope set. While processing B, at every occurrence of x, GHC considers whether to inline x. This decision is based on a fairly complex heuristic, that we discuss in Section 7. If the decision is "Yes", then GHC needs to have access to x's definition; this can be achieved quite elegantly, as we discuss in Section 6.3.

6.1 Why three-phase?

An obvious question is this: why not combine PostInlineUnconditionally with PreInlineUnconditionally? That is, before processing E, why not look to see if it is trivial (e.g. a variable), and if so inline it unconditionally? Doing so is a huge, but rather subtle, mistake.

The mistake is to do with the correctness of the pre-computed occurrence information. Suppose we have:

    let
      a = ...big...
      b = a
    in
      ...b...b...b...

a will be marked OnceSafe, and hence will be inlined unconditionally. But if PreInlineUnconditionally now sees that b's right-hand side is just a, and inlines b everywhere, a now effectively occurs in many places. This is a disaster, because a is now inlined unconditionally in many places.

The cause of this disaster is that a's occurrence information was rendered invalid by our decision to inline b. Several solutions suggest themselves (for example, provide some mechanism for fixing a's occurrence information, or get the occurrence analyser to propagate b's occurrences to a) and we tried some of them. They are all complicated, and the result was a bug farm.

We finally discovered the three-phase inline mechanism we have described. It is simple, and obviously correct. The PreInlineUnconditionally phase only inlines a variable x if x occurs once, not inside a lambda. That means that the occurrence information for any variable, y, free in x's right hand side is unaffected by the inlining.

On the other hand, once the right hand side has been processed, if y is going to be inlined unconditionally, then that will have happened already. In our example, PreInlineUnconditionally will decide to inline a. Now the simplifier moves on to the binding for b. PreInlineUnconditionally declines to inline, so the right hand side of b is processed; a is inlined, and a processed version of ...big... is produced. This is not trivial, so PostInlineUnconditionally declines too.

Another obvious question is whether PostInlineUnconditionally could be omitted altogether, leaving CallSiteInline to do its work. Here the answer is clearly "yes"; PostInlineUnconditionally is just an optimisation that allows trivial bindings to be dropped a little earlier than would otherwise be the case. To summarise, the key feature of our three-phase inlining strategy is that it allows the use of simple, pre-computed occurrence information, while still avoiding the exponential blowup that can occur if PreInlineUnconditionally is omitted.

6.2 The substitution

As we mentioned at the start of Section 6, the simplifier carries along (a) the current substitution, and (b) the set of variables in scope. But since the simplifier is busy transforming the expression and cloning variables, we have to be more precise:

  * The domain of the substitution is in-variables.

  * The in-scope set consists of out-variables.

We discussed in-variables and out-variables in Section 5. But what is the range of the substitution? When used for cloning or PostInlineUnconditionally the range was an out-expression, but when used in PreInlineUnconditionally the range was an in-expression. But watch out! Since we are, in effect, deferring the simplification of the in-expression, we must also record the substitution appropriate to the original site of the expression. Thus we are led to the following definition for the substitution:

    type Subst    = FiniteMap InVar SubstRng
    data SubstRng = DoneEx OutExpr
                  | SuspEx InExpr Subst

A DoneEx is straightforward, and is used both by the name-cloning mechanism, and by PostInlineUnconditionally. A SuspEx ("Susp" for "suspended") is used by PreInlineUnconditionally, and pairs an in-expression with the substitution appropriate to its let binding; you can think of it as a suspended application of simplExpr. Notice that we do not capture the in-scope set as well.

Why not? Because we must use the in-scope set appropriate to the occurrence site; Section 7.1 amplifies this point.

6.3 The in-scope set

We mentioned earlier (Section 6) that the simplifier needs access to a let-bound variable's right-hand side at its occurrence sites. All we need is to turn the in-scope set into a finite mapping:

    type InScopeSet = FiniteMap OutVar Definition
    data Definition = Unknown
                    | BoundTo OutExpr OccInfo
                    | NotAmong [DataCon]

Whether or not a variable is in scope can be answered by looking in the domain of the in-scope set (we still call it a "set" for old times' sake). But the range of the mapping records what value the variable is bound to:

Unknown is used for variables bound in lambda and case patterns. We don't know what value such a variable is bound to.

BoundTo is used for let-bound variables (both recursive and non-recursive), and records the right-hand side of the definition and the occurrence information left with the binding by the occurrence analyser. The latter is needed when making the inlining decision at occurrence sites.

NotAmong is described shortly.

The in-scope set is also a convenient place to record information that is valid in only part of a variable's scope. Consider:

    \x -> ...(case x of (a,b) -> E)...

When processing E, but not in the "..." parts, x is known to be bound to (a,b). So, when processing the alternative of a case expression whose scrutinee is a variable, it is easy for the simplifier to modify the in-scope set to record x's binding. Why is this useful? Because E might contain another case expression scrutinising x:

    ...case x of (p,q) -> F...

By inlining (a,b) for x, we can eliminate this case altogether. This turns out to be a big win [PJS98].

The NotAmong variant of the Definition type allows the simplifier to record negative information:

    case x of
      Red     -> ...
      Blue    -> ...
      Green   -> ...
      DEFAULT -> E

The DEFAULT alternative matches any constructors other than Red, Blue, and Green. GHC supports such DEFAULT alternatives directly, rather than requiring case expressions to be exhaustive, which is dreadful for large data types. Inside E, what is known about x? What we know is that it is not bound to Red, Blue, or Green. This can be useful; if E contains a case expression that scrutinises x, we can eliminate any alternatives that cannot possibly match. Similarly, the expression x `seq` F inside E can be transformed to just F, since NotAmong implies that x is evaluated.(4) Even the value NotAmong [] is useful: it signals that the variable is evaluated, without specifying anything about its value.

(4) The expression E1 `seq` E2 evaluates E1, discards the result, and then evaluates and returns E2.

The in-scope set, extended to be an in-scope mapping, plays the role of a dynamic environment. It records knowledge of the value of each in-scope variable, including knowledge that may be true for only part of that variable's scope. The nice thing is that this dynamic knowledge can elegantly be carried by the in-scope set, which we need anyway. The details of the transformations that exploit that dynamic knowledge are beyond the scope of this paper.

Almost all the time, the substitution and in-scope set travel together. But that is not always the case, as we discuss in Section 7.1.

6.4 Measurements

Figure 3 gives some simple measurements of the relative frequency of each form of inlining. We used the same set of benchmark programs as in Section 3.4, gathered statistics on how often each sort of inlining was used, and averaged these separately-calculated proportions. We took arithmetic means of the percentages, because here we are averaging "slices of the pie", so the "Mean" line should still sum to 100%.

            Pre      Post     CallSite
    Mean    47.4%    17.4%    35.2%
    Min     0.25%    0.92%    0.72%
    Max     80%      95%      98%

Figure 3: Relative frequency of inlining

The figures indicate that on average, each sort of inlining is actually used in practice, and that each dominates in some programs.

6.5 Summary

We can summarise the binding-site effects on the substitution and in-scope set as follows. Suppose that we encounter the binding x = E with a substitution subst, and an in-scope set in-scope.

PreInlineUnconditionally. The substitution is extended by binding x to (SuspEx E subst). The in-scope set is not changed.

PostInlineUnconditionally. The substitution is extended by binding x to (DoneEx E'), where E' is the simplified version of E. The in-scope set is not changed.

Otherwise. If x is not already in scope, the substitution is not changed, but the in-scope set is extended by binding x to E'. If x is already in scope, then a new variable name x' is invented (Section 3.3); the substitution is extended by binding x to (DoneEx x'), and the in-scope set is extended by binding x' to E'.

extended by binding x to (DoneEx x'), and the in-scope set is extended by binding x' to E'.

This concludes the discussion of what happens at the binding site of a variable. Now we consider what happens at its occurrences.
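The three binding-site outcomes can be modelled directly. The following is an executable sketch, not GHC's code: Var, Expr, Decision, and the "append 1" fresh-name convention are invented stand-ins for the machinery of Sections 3 and 6.

```haskell
-- Executable sketch of the Section 6.5 summary; NOT GHC's code.
import qualified Data.Map as Map

type Var  = String
type Expr = String                 -- stand-in for out-expressions

data SubstRhs = SuspEx Expr Subst  -- un-simplified E, with its captured substitution
              | DoneEx Expr        -- already-simplified out-expression
              deriving (Eq, Show)
type Subst   = Map.Map Var SubstRhs
type InScope = Map.Map Var Expr

data Decision
  = PreInline          -- PreInlineUnconditionally fired
  | PostInline Expr    -- PostInlineUnconditionally fired; carries E'
  | KeepBinding Expr   -- otherwise; carries E'

-- How processing the binding x = E affects (substitution, in-scope set):
bindSite :: (Subst, InScope) -> Var -> Expr -> Decision -> (Subst, InScope)
bindSite (sub, ins) x e PreInline       = (Map.insert x (SuspEx e sub) sub, ins)
bindSite (sub, ins) x _ (PostInline e') = (Map.insert x (DoneEx e') sub, ins)
bindSite (sub, ins) x _ (KeepBinding e')
  | not (x `Map.member` ins) = (sub, Map.insert x e' ins)
  | otherwise                = (Map.insert x (DoneEx x') sub, Map.insert x' e' ins)
  where
    x' = x ++ "1"  -- stands in for the fresh name of Section 3.3
```

The shadowing branch of KeepBinding is the one that earns its keep: it is exactly the clone-and-substitute step of Section 3.3.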

7 Occurrences

When the simplifier finds an occurrence of a variable, it first looks up the variable in the substitution (Section 7.1), and then decides whether to inline it (Section 7.2).

7.1 Looking up in the substitution

When the simplifier encounters the occurrence of a variable (the latter being an InVar), it must be looked up in the substitution:

    simplExpr sub ins (Var v) cont
      = case lookup sub v of
          Nothing           -> considerInline ins v cont
          Just (SuspEx e s) -> simplExpr s ins e cont
          Just (DoneEx e)   -> simplExpr empty ins e cont

The variable might not be in the substitution at all; for example, it might be a variable that did not need to be renamed. In that case, the next thing to do is to consider inlining it. The substitution can be discarded at this point, because the inlining (if any) is already an out-expression. Incidentally, notice that the variable we previously thought of as an InVar is now an OutVar. This is one reason that InVar and OutVar are simply synonyms for Var, rather than being truly distinct types.

If the substitution maps the variable to a SuspEx, then the simplifier is tail-called again, passing the captured substitution and the current in-scope set. The substitution and the in-scope set usually travel together, but here they do not. We must use the in-scope set from the occurrence site, because that describes what variables are in scope there, and the substitution from the definition site.
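The interplay of the two lookups can be exercised on a toy expression type. The sketch below is illustrative only: V, App, Lam, and the binder handling in the Lam case (real GHC clones binders as in Section 3; deleting from the substitution is a shortcut) are our own simplifications.

```haskell
-- Toy model of the Section 7.1 lookup; NOT GHC's code.
import qualified Data.Map as Map

type Var = String
data Expr = V Var | App Expr Expr | Lam Var Expr deriving (Eq, Show)

data SubstRhs = SuspEx Expr Subst | DoneEx Expr
type Subst = Map.Map Var SubstRhs

simplExpr :: Subst -> Expr -> Expr
simplExpr sub (V v)     = simplVar sub v
simplExpr sub (App f a) = App (simplExpr sub f) (simplExpr sub a)
simplExpr sub (Lam x e) = Lam x (simplExpr (Map.delete x sub) e)
                          -- shortcut: real GHC would clone x (Section 3)

simplVar :: Subst -> Var -> Expr
simplVar sub v = case Map.lookup v sub of
  Nothing           -> V v                    -- an OutVar: go on to consider inlining
  Just (SuspEx e s) -> simplExpr s e          -- resume with the *captured* substitution
  Just (DoneEx e)   -> simplExpr Map.empty e  -- crucially, the *empty* substitution
```

With f bound to (DoneEx (V "x")) and x bound to (DoneEx (V "x1")), looking up f yields V "x", not V "x1": the empty substitution in the DoneEx case is what prevents the double substitution discussed below.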

The third case is when the variable maps to (DoneEx e). In this case you might think we were done. But suppose e was a variable. Then we should consider inlining it, given the current context cont, which differs from that at the variable's definition site. What if e was a partial application of a function? Again, the context might now indicate that the function should be inlined. So the simple thing to do is simply to pass e to simplExpr again. But notice that we give it the empty substitution! Consider this example:

    \x -> let
            f = x
          in
            \x -> ...f..f...

When the binding for f is encountered, PostInlineUnconditionally will extend the substitution, binding f to (DoneEx x). When the \x is encountered, the substitution will again be extended to bind x to (DoneEx x1), because x is already in scope. Now, when we replace the occurrence of f by x, we must not apply the same substitution again, which would replace x by x1! The right thing to do is to continue with the empty substitution.

The code is simple enough, but it took us a long time before the interplay between the substitution and the in-scope set became as simple and elegant as it now is.

7.2 Inlining at an occurrence site

Once the simplifier has found a variable that is not in the substitution (and hence is an OutVar), we need to decide whether to inline it (CallSiteInline from Section 6). The first thing to do is to look up the variable in the in-scope set:

    considerInline ins v cont
      = case lookup ins v of
          Nothing -> error "Not in scope"
          Just (BoundTo rhs occ)
            | inline rhs occ cont -> simplExpr empty ins rhs cont
          Just other -> rebuild (Var v) cont

If the (dynamic) information is BoundTo, and the predicate inline says "yes, go ahead", we simply tail-call the simplifier, passing the in-scope set and the empty substitution, as in the DoneEx case of the substitution. In all other cases we give up on inlining. The function rebuild, which we do not discuss further here, simply combines the variable with its context.

The inline predicate is the interesting bit. It looks first at the variable's occurrence information:

    inline :: OutExpr -> OccInfo -> Context -> Bool
    inline rhs LoopBreaker cont = False
    inline rhs OnceSafe    cont = error "inline: OnceSafe"
    inline rhs MultiSafe   cont = inlineMulti rhs cont
    inline rhs OnceUnsafe  cont = whnfOrBot rhs && not (veryBoring cont)
    inline rhs MultiUnsafe cont = whnfOrBot rhs && inlineMulti rhs cont

The LoopBreaker case is obvious. The OnceSafe case should never happen, because PreInlineUnconditionally will have already inlined the binding.

The OnceUnsafe case uses the whnfOrBot predicate (Section 2.2) to ensure that inlining will not happen if there is any work duplication. However, as noted in Section 2.2, even if the variable occurs just once, it is not always a good idea to inline it. The veryBoring predicate has type

    veryBoring :: Context -> Bool

It examines the context, returning False if there is anything at all interesting about it, namely if and only if:

  - The variable is applied to one or more arguments.
  - The variable is the scrutinee of a case.

Notice that if a variable is the argument of a constructor, it is in a veryBoring context, and so it will not be inlined, thus maintaining the trivial-constructor-argument invariant (Section 2.2).
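Assuming toy stand-ins for whnfOrBot (Section 2.2) and inlineMulti (Section 7.3), the dispatch on occurrence information can be made runnable. Expr, the crude size metric, and the threshold of 5 are invented for illustration.

```haskell
-- Runnable toy of the Section 7.2 inline predicate; stand-ins throughout.
data Expr = Var String | Lit Int | Lam String Expr | App Expr Expr deriving Show
data OccInfo = LoopBreaker | OnceSafe | MultiSafe | OnceUnsafe | MultiUnsafe
data Context = Stop | AppCxt Context | CaseCxt Context

whnfOrBot :: Expr -> Bool        -- values duplicate no work when inlined
whnfOrBot (App _ _) = False
whnfOrBot _         = True

veryBoring :: Context -> Bool
veryBoring (AppCxt _)  = False   -- applied: a beta-redex may be exposed
veryBoring (CaseCxt _) = False   -- scrutinised: case-of-known-constructor may fire
veryBoring Stop        = True

size :: Expr -> Int              -- crude stand-in for Section 7.4's size metric
size (App f a) = 1 + size f + size a
size (Lam _ e) = 1 + size e
size _         = 1

inlineMulti :: Expr -> Context -> Bool   -- toy stand-in for Section 7.3
inlineMulti rhs _ = size rhs <= 5

inline :: Expr -> OccInfo -> Context -> Bool
inline _   LoopBreaker _    = False
inline _   OnceSafe    _    = error "inline: OnceSafe"  -- PreInlineUnconditionally got there first
inline rhs MultiSafe   cont = inlineMulti rhs cont
inline rhs OnceUnsafe  cont = whnfOrBot rhs && not (veryBoring cont)
inline rhs MultiUnsafe cont = whnfOrBot rhs && inlineMulti rhs cont
```

Note how a single-occurrence value is still refused in a Stop context, exactly the veryBoring behaviour described above.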

The MultiSafe and MultiUnsafe cases deal with the situation where there is more than one occurrence of the variable. Both make use of inlineMulti to do the bulk of the work; in addition, MultiUnsafe uses whnfOrBot to avoid work duplication.

Incidentally, since whnfOrBot rhs depends only on rhs, it is actually (lazily) cached in the BoundTo constructor rather than being re-calculated at each occurrence site.

7.3 Inlining multiple-occurrence variables

Now we are left with the case of inlining a variable that occurs many times.

    inlineMulti :: OutExpr -> Context -> Bool
    inlineMulti rhs cont
      | noSizeIncrease rhs cont = True
      | boring rhs cont         = False
      | otherwise               = smallEnough rhs cont

bution here, though unlike some prop osals smallEnough is

context-sensitive:

The third case of inlineMulti is the function that every

inliner has: is the function small enough to inline? The rst

smallEnough :: Expr -> Context -> Bool

two cases are less obvious. The second case deals with the

For the record, however, the algorithm is as follows. We

situations like this:

compute the size of the function body having rst split

let

o its formal parameters, namely the lamb das at the top.

f = \x -> E

From this size we subtract:

in

... let g = \y z -> f y, f z in ... ...

 The size of the call.

There is very little p oint in inlining f at these two sites,

 An argument discount for each argument extracted

b ecause we can guarantee that no new transformations b e-

from the context that a has dynamic information

yond those already p erformed on f itself  will b e enabled by

other than Unknown , and b is scrutinised bya case ,

doing so; the only saving is the call to f , and there is a co de

or applied to an argument, in the function b o dy.

duplication cost to pay. Howdowe know that no transfor-

 A result discount if the context is not boring and

mations will b e enabled? Because: a the arguments y and

the function body returns an explicit constructor or

z are lamb da-b ound and hence uninformative; and b the

lamb da.

result of b oth calls are simply stored in a data structure.

The predicate boring takes an expression (the one we are considering inlining) and a context in which it would be inlined.

    boring :: Expr -> Context -> Bool

Corresponding to our example above, boring returns True if both

(a) All the arguments to which the function is applied are types, or variables that have dynamic information of Unknown; and

(b) After consuming enough arguments from the context to satisfy the lambdas at the top of the function, the remaining context is veryBoring.

Even if the context is boring, however, it is still worthwhile inlining the function if the result of doing so is no bigger than the call [App92]. That is what the predicate noSizeIncrease tests. Again, one might expect this case to be rare, but it isn't. For example, Haskell data constructors are curried functions, but in GHC's intermediate language constructor applications are saturated (Section 2). We bridge this gap by producing a function definition for each constructor, such as:

    cons = \x xs -> Cons {x,xs}

where the Cons {x,xs} is the saturated constructor application. In reality there are a few type abstractions and applications too, but the idea is the same. These definitions also make a convenient place to perform argument evaluation (and perhaps unboxing) for strict constructors. For the simple definitions, such as cons, it is clearly better to inline the definition, even if the context is boring.

Notice that the first case is required even though smallEnough is sure to return True if noSizeIncrease does. Why? Because otherwise the second case might decide that the context is boring and decline to inline.
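The ordering argument can be seen in a tiny executable model; the predicates below are invented stand-ins for the real ones, keyed on a bare size rather than an expression.

```haskell
-- Toy model of the three-way test in inlineMulti (Section 7.3); stand-ins only.
data Context = Boring | Interesting deriving Eq

noSizeIncrease, boring, smallEnough :: Int -> Context -> Bool
noSizeIncrease sz _ = sz <= 1          -- inlined body no bigger than the call itself
boring _ cont       = cont == Boring
smallEnough sz _    = sz <= 10

inlineMulti :: Int -> Context -> Bool
inlineMulti sz cont
  | noSizeIncrease sz cont = True      -- must come first: rescues tiny bodies in boring contexts
  | boring sz cont         = False
  | otherwise              = smallEnough sz cont
```

Swap the first two guards and a constructor-wrapper-sized body in a boring context would be refused, which is exactly the failure mode the paragraph above describes.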

7.4 Size matters

We have now finally arrived at the smallEnough predicate, the main aspect of this paper for which there is a reasonable (albeit small) literature. We do not claim any new contribution here, though (unlike some proposals) smallEnough is context-sensitive:

    smallEnough :: Expr -> Context -> Bool

For the record, however, the algorithm is as follows. We compute the size of the function body, having first split off its formal parameters, namely the lambdas at the top. From this size we subtract:

  - The size of the call.
  - An argument discount for each argument (extracted from the context) that (a) has dynamic information other than Unknown, and (b) is scrutinised by a case, or applied to an argument, in the function body.
  - A result discount if the context is not boring and the function body returns an explicit constructor or lambda.

If the result of this computation is smaller than the inline threshold then we inline the function. The argument discount, result discount, and inline threshold are all settable from the command line. Santos gives more details of GHC's heuristics [San95, Section 6.3].
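As a back-of-envelope sketch of this computation (every number below, the discounts, the threshold and the call size, is invented; GHC's real values are command-line flags):

```haskell
-- Sketch of the Section 7.4 arithmetic; all constants are illustrative.
data ArgInfo = Unknown | Known   -- what the context tells us about an argument

smallEnough :: Int                 -- size of the body, lambdas stripped
            -> [(ArgInfo, Bool)]   -- per argument: (info, used in a case/application?)
            -> Bool                -- body returns a constructor/lambda in a non-boring context?
            -> Bool
smallEnough bodySize args resultDiscountApplies =
    bodySize - callSize - argDiscounts - resultDiscount < threshold
  where
    callSize       = 1 + length args                         -- size of the call we replace
    argDiscounts   = sum [ argDiscount | (Known, True) <- args ]
    resultDiscount = if resultDiscountApplies then 2 else 0  -- invented value
    argDiscount    = 2                                       -- invented value
    threshold      = 8                                       -- invented value
```

The point of the shape, rather than the constants, is that informative, scrutinised arguments make an otherwise too-big function acceptable.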

7.5 The context

It should by now be clear that the context of an expression plays a key role in inlining decisions. For a long time we passed in a variety of ad hoc flags indicating various things about the context, but we have now evolved a much more satisfactory story. The context is a little like a continuation, in that it indicates how the result of the expression is consumed. But this continuation must not be represented as a function, because we must be able to ask questions of it, as the earlier sub-sections indicate.

So GHC's contexts are defined by the following data type:

    data Context
      = Stop
      | AppCxt  InExpr Subst Context
      | CaseCxt InVar [InAlt] Subst Context
      | ArgCxt  (OutExpr -> OutExpr)
      | InlineCxt Context

The Stop context is used when beginning simplification of a lazy function argument, or the right hand side of a let binding. The AppCxt context indicates that the expression under consideration is to be applied to an argument. The argument is as yet un-simplified, and must be paired with its substitution. Similarly, the CaseCxt context is used when simplifying the scrutinee of a case expression.

simplExpr simply recurses into the expression, building a context "stack" as it goes. Here, for example, is what simplExpr does for App and Case nodes:

    simplExpr sub ins (App f a) cont
      = simplExpr sub ins f (AppCxt a sub cont)
    simplExpr sub ins (Case e b alts) cont
      = simplExpr sub ins e (CaseCxt b alts sub cont)

[Figure 4: Effect of inlining threshold. Geometric-mean allocation (%, 50-110) plotted against binary size (%, 75-110), one point per inline-threshold setting (1, 2, 4, 8, 12, 40).]
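The recursion above amounts to a left-spine walk that accumulates frames. A self-contained toy version, with Expr and Context cut down to the two interesting frames (names and shapes are ours, not GHC's):

```haskell
-- Toy "context stack" construction in the style of Section 7.5.
data Expr    = Var String | App Expr Expr | Case Expr [(String, Expr)]
               deriving (Eq, Show)
data Context = Stop | AppCxt Expr Context | CaseCxt [(String, Expr)] Context
               deriving (Eq, Show)

-- Descend to the head of an expression, pushing a frame per consumer.
toHead :: Expr -> Context -> (Expr, Context)
toHead (App f a)     cont = toHead f (AppCxt a cont)
toHead (Case e alts) cont = toHead e (CaseCxt alts cont)
toHead e             cont = (e, cont)
```

For f x y the head is f and the context records both pending arguments, innermost first; this is precisely the data structure the inline predicate interrogates.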

We have already seen how useful it is to know the context of a variable occurrence. The context also makes it easy to perform other transformations, such as the case-of-known-constructor transformation:

    case (a,b) of { (p,q) -> E }
      ==>
    let {p=a; q=b} in E

simplExpr just matches a constructor application with a CaseCxt continuation.
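A toy version of that match, assuming a cut-down Expr with saturated constructor applications (Con) and multi-binder alternatives; the names are ours, not GHC's:

```haskell
-- Sketch of case-of-known-constructor via a CaseCxt frame (Section 7.5).
data Expr = Var String | Con String [Expr]
          | Let [(String, Expr)] Expr
          deriving (Eq, Show)
data Context = Stop | CaseCxt [(String, [String], Expr)] Context
               deriving (Eq, Show)

-- When a saturated constructor application meets a CaseCxt frame, rewrite
--   case C a b of { C p q -> E }   ==>   let {p = a; q = b} in E
knownCon :: Expr -> Context -> Maybe (Expr, Context)
knownCon (Con c args) (CaseCxt alts cont) =
  case [ (bs, rhs) | (c', bs, rhs) <- alts, c' == c ] of
    (bs, rhs) : _ -> Just (Let (zip bs args) rhs, cont)
    []            -> Nothing
knownCon _ _ = Nothing
```

The returned pair is the rewritten expression together with the remaining context, ready for further simplification.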

The next case, ArgCxt, is used when simplifying the argument of a strict function or primitive operator. Here, a genuine, functional continuation is used, because no more needs to be known about the continuation.

The InlineCxt context is discussed in the next subsection.

In practice, GHC's simplifier has another couple of constructors in the Context data type, but they are more peripheral so we do not discuss them here.

7.6 INLINE pragmas

Like some other languages, GHC allows the programmer to specify that a function should be inlined at all its occurrences, as a pragma in the Haskell source language:

    {-# INLINE f #-}
    f x = ...

GHC also allows the Haskell programmer to ask the compiler to inline a function at a particular call site, thus:

    ...(inline f a b)...

The function inline has type ∀a. a -> a, and is semantically the identity function. Operationally, though, it asks that f be inlined at this call site. Such per-occurrence inline pragmas are less commonly offered by compilers [Bak92].
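This pseudo-function survives in today's GHC, exported as inline from GHC.Exts; a minimal usage sketch, assuming a reasonably modern GHC:

```haskell
import GHC.Exts (inline)

-- Semantically the identity; operationally a per-call-site inlining request.
double :: Int -> Int
double x = x + x

callSite :: Int -> Int
callSite n = inline double n   -- ask for double to be inlined here only
```

Because inline is the identity, the program's meaning is unchanged whether or not the request is honoured.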

Both these pragmas are translated to constructors in the Note data type, which itself can be attached to an expression (Section 2):

    data Note = ...
      | InlineMe      -- {-# INLINE #-}
      | InlinePlease  -- inline

If they are so similar in the Core language, why do they appear so different in Haskell? Haskell allows functions to be defined by pattern-matching, using multiple equations, so there is no convenient syntactic place to ask for f to be inlined everywhere. At an occurrence site, however, it is natural just to use a pseudo-function.

The effects of InlineMe and InlinePlease are as follows:

  - The effect of InlineMe is to make the enclosed expression look very small, which in turn makes the smallEnough predicate reply True. When simplExpr finds an InlineMe in a non-boring context, it drops the InlineMe, because its work is done.

  - The effect of InlinePlease is to push an InlineCxt onto the context stack. The smallEnough predicate returns True if it finds such a context, regardless of the size of the expression.

There is an important subtlety, however. Consider

    g = \a b -> ...big...

    {-# INLINE f #-}
    f = \x -> g x y

and suppose that this is the only occurrence of g. Should we inline g in f's right hand side? By no means! The programmer is asking that f be replicated, but not g! The right thing to do is to switch off all inlining when processing the body of an InlineMe; when f is inlined, then (and only then) g will get its chance.

7.7 Measurements

As mentioned in Section 7.4, our implementation makes use of an "inline threshold" to determine whether a given expression is small enough to inline. Figure 4 shows the effect of varying this threshold on the (geometric mean of) binary size and allocation. We use allocation instead of run-time because allocation is easy to measure repeatably, and is a somewhat reliable proxy for run-time, with the notable exception of some very small programs.

The actual values for the threshold are fairly arbitrary, and are affected by some of the other parameters: discounts for evaluated arguments and so on. What is more interesting is the shape of the graph. As expected, beyond a certain

point, binary sizes increase without having any dramatic effect on the efficiency of the program. The graph also shows that setting the threshold too low (i.e. less than 2) has a dramatic effect on both binary size and run-time. Essentially very little call-site inlining is being performed below this threshold, and even less inter-module inlining is happening (because this is covered by call-site inlining only; we can't see the binding).

The jump between threshold values 1 and 2 is caused by the fact that even functions marked {-# INLINE #-} are not inlined at a threshold of 1. The "wrapper" functions generated by strictness analysis are of this form, and if these wrappers are not inlined performance drops dramatically.

Making measurements is very instructive: we were surprised by the rather small performance increases as the threshold is increased beyond 2, and plan to investigate this further.

8 Related work

There is a modest literature on inlining applied to imperative programming languages, such as C and FORTRAN; some recent examples are [DH92, CMCH92, CHT91, CHT92]. In these works the focus is exclusively on procedures defined at the top level. The benefits are found to be fairly modest (in the 10-20% range), but the cost in terms of code bloat is also very modest. Considerable attention is paid to the effect on register allocation of larger basic blocks, which we do not consider at all.

It seems self-evident that the benefits of inlining are strongly related to both language and programming style. Functional languages encourage the use of abstractions, so the benefits of inlining are likely to be greater. Indeed, Appel reports benefits in the range 15-25% for the Standard ML of New Jersey compiler [App92], while Santos reports average benefits of around 40% for Haskell programs [San95]. Chambers reports truly dramatic factors of 4 to 55 for his SELF compiler [Cha92]; SELF takes abstraction very seriously indeed!

The most detailed and immediately-relevant work we have found is for two Scheme compilers. Waddell and Dybvig report performance improvements of 10-100% in the Chez Scheme compiler [WD97], while Serrano found a more modest 15% benefit for the Bigloo Scheme compiler [Ser95, Ser97]. Both use a dynamic, effort/size budget scheme to control termination. Waddell and Dybvig's inliner uses an explicitly-encoded context parameter that plays exactly the role of our Context (Section 7.5).

A completely different approach to the inlining problem is discussed by [AJ97]. In this paper the focus is on inlining functions that are called precisely once, something that we have been very concerned with. Appel and Jim show that this transformation, along with a handful of others (including dead-code elimination), are normalising and confluent, a very desirable property. Their focus is then on finding an efficient algorithm for applying the transformations exhaustively. Their solution involves adjusting the results of the occurrence analysis phase as transformations proceed. Their initial algorithm has worst-case quadratic complexity, but they also propose a more subtle (and unimplemented) linear-time variant. We too are concerned about efficient application of transformation rules, but our set of transformations is much larger, and includes general inlining, so their results are not directly applicable to our setting. Nevertheless, it is a unique and inspiring approach.

Copious measurements of many transformations in GHC (not only inlining) can be found in Santos's thesis [San95]; although these measurements are now several years old, we believe that the general outlines are unlikely to have changed dramatically. [PJS98] contains briefer, but more up-to-date, measurements.

9 Conclusion

This paper has told a long story. Inlining seems a relatively simple idea, but in practice it is complicated to do a good job. The main contribution of the paper is to set down, in sometimes-gory detail, the lessons that we have learned over nearly a decade of tuning our inliner. Everyone who tries to build a transformation-based compiler has to grapple with these issues but, because they are not crisp or sexy, there is almost no literature on the subject. This paper is a modest attempt to address that lack.

Acknowledgements

We warmly thank Nick Benton, Oege de Moor, Andrew Kennedy, John Matthews, Sven Panne, Alastair Reid, Julian Seward, and the four IDL Workshop referees, for comments on drafts of this paper. Special thanks are due to Manuel Chakravarty, Manuel Serrano, Oscar Waddell, and Norman Ramsey, for their particularly detailed and thoughtful remarks.

References

[AJ97] AW Appel and T Jim. Shrinking lambda-expressions in linear time. Journal of Functional Programming, 7(5):515-541, September 1997.

[App92] AW Appel. Compiling with continuations. Cambridge University Press, 1992.

[App94] AW Appel. Loop headers in lambda-calculus or CPS. Lisp and Symbolic Computation, 7:337-343, 1994.

[ARS94] L Augustsson, M Rittri, and D Synek. On generating unique names. Journal of Functional Programming, 4(1):117-123, January 1994.

[Bak92] HG Baker. Inlining semantics for subroutines which are recursive. ACM Sigplan Notices, 27(12):39-49, December 1992.

[Bar85] HP Barendregt. The lambda calculus: its syntax and semantics. Number 103 in Studies in Logic. North Holland, 1985.

[BKR98] Nick Benton, Andrew Kennedy, and George Russell. Compiling Standard ML to Java bytecodes. In ICFP98 [ICF98], pages 129-140.

[Cha92] C Chambers. The Design and Implementation of the SELF Compiler, an Optimizing Compiler for Object-Oriented Programming Languages. Technical report STAN-CS-92-1240, Stanford University, Department of Computer Science, March 1992.

[CHT91] KD Cooper, MW Hall, and L Torczon. An experiment with inline substitution. Software Practice and Experience, 21(6):581-601, June 1991.

[CHT92] K Cooper, M Hall, and L Torczon. Unexpected side effects of inline substitution: a case study. ACM Letters on Programming Languages and Systems, 1(1):22-31, 1992.

[CMCH92] PP Chang, SA Mahlke, WY Chen, and W-M Hwu. Profile-guided automatic inline expansion for C programs. Software Practice and Experience, 22(5):349-369, May 1992.

[dB80] N de Bruijn. A survey of the project AUTOMATH. In JP Seldin and JR Hindley, editors, To HB Curry: essays on combinatory logic, lambda calculus, and formalism, pages 579-606. Academic Press, 1980.

[DH92] JW Davidson and AM Holler. Subprogram inlining: a study of its effects on program execution time. IEEE Transactions on Software Engineering, 18(2):89-102, February 1992.

[DS97] O Danvy and UP Schultz. Lambda-dropping: transforming recursive equations into programs with block structure. In ACM SIGPLAN Symposium on Partial Evaluation and Semantics-Based Program Manipulation (PEPM '97), volume 32 of SIGPLAN Notices, pages 90-106, Amsterdam, June 1997. ACM.

[FW86] PJ Fleming and JJ Wallace. How not to lie with statistics: the correct way to summarise benchmark results. CACM, 29(3):218-221, March 1986.

[How96] BT Howard. Inductive, co-inductive, and pointed types. In ICFP96 [ICF96].

[ICF96] ACM SIGPLAN International Conference on Functional Programming (ICFP'96), Philadelphia, May 1996. ACM.

[ICF98] ACM SIGPLAN International Conference on Functional Programming (ICFP'98), Baltimore, September 1998. ACM.

[Par92] WD Partain. The nofib benchmark suite of Haskell programs. In J Launchbury and PM Sansom, editors, Functional Programming, Glasgow 1992, Workshops in Computing, pages 195-202. Springer Verlag, 1992.

[Pey87] SL Peyton Jones. The Implementation of Functional Programming Languages. Prentice Hall, 1987.

[PJS98] SL Peyton Jones and A Santos. A transformation-based optimiser for Haskell. Science of Computer Programming, 32(1-3):3-47, September 1998.

[PP93] SL Peyton Jones and WD Partain. Measuring the effectiveness of a simple strictness analyser. In K Hammond and JT O'Donnell, editors, Functional Programming, Glasgow 1993, Workshops in Computing, pages 201-220. Springer Verlag, 1993.

[PPS96] SL Peyton Jones, WD Partain, and A Santos. Let-floating: moving bindings to give faster programs. In ICFP96 [ICF96].

[San95] A Santos. Compilation by transformation in non-strict functional languages. PhD thesis, Department of Computing Science, Glasgow University, September 1995.

[Ser95] M Serrano. A fresh look to inlining decision. In 4th International Computer Symposium (ICS'95), Mexico City, Mexico, November 1995.

[Ser97] M Serrano. Inline expansion: when and how? In International Symposium on Programming Languages, Implementations, Logics, and Programs (PLILP'97), September 1997.

[SLM98] Z Shao, C League, and S Monnier. Implementing typed intermediate languages. In ICFP98 [ICF98], pages 313-323.

[WD97] O Waddell and RK Dybvig. Fast and effective procedure inlining. In 4th Static Analysis Symposium, number 1302 in Lecture Notes in Computer Science, pages 35-52. Springer Verlag, September 1997.

[WP99] K Wansbrough and SL Peyton Jones. Once upon a polymorphic type. In 26th ACM Symposium on Principles of Programming Languages (POPL'99), pages 15-28, San Antonio, January 1999. ACM.