Compiling Haskell by Program Transformation
Total Page:16
File Type:pdf, Size:1020Kb
Compiling Haskell by program transformation a rep ort from the trenches Simon L Peyton Jones Department of Computing Science University of Glasgow G QQ Email simonpjdcsglaacuk WWW httpwwwdcsglaacuksimonpj January Abstract Many compilers do some of their work by means of correctnesspre se rving and hop e fully p erformanceimproving program transformations The Glasgow Haskell Compiler GHC takes this idea of compilation by transformation as its warcry trying to express as much as p ossible of the compilation pro cess in the form of program transformations This pap er rep orts on our practical exp erience of the transformational approach to compilation in the context of a substantial compiler The paper appears in the Proceedings of the European Symposium on Programming Linkoping April Introduction Using correctnesspreserving transformations as a compiler optimisation is a wellestablished technique Aho Sethi Ullman Bacon Graham Sharp In the functional programming area esp ecially the idea of compilation by transformation has received quite a bit of attention App el Fradet Metayer Kelsey Kelsey Hudak Kranz Steele A transformational approach to compiler construction is attractive for two reasons Each transformation can b e implemented veried and tested separately This leads to a more mo dular compiler design in contrast to compilers that consist of a few huge passes each of which accomplishes a great deal In any framework transformational or otherwise each optimisation often exp oses new opp ortunities for other optimisations the cascade eect This makes it dicult to decide a priori what the b est order to apply them might b e In a transformational setting it is easy to plug and play by reordering transformations applying them more than once or trading compilation time for co de quality by omitting some It allows a late commitment to phase ordering This pap er rep orts on our exp erience in applying transformational techniques in a particularly thoroughgoing way to the Glasgow Haskell Compiler GHC Peyton Jones et al a compiler for the nonstrict functional language Haskell Hudak et al Among other things this pap er may serve as a useful jumpingo p oint and annotated bibliography for those interested in the compiler A p ervasive theme is the close interplay b etween theory and practice a particularly satisfying asp ect of functionallanguage research Overview Haskell is a nonstrict purely functional language It is a relatively large language with a rich syntax and type system designed for fullscale application programming The overall structure of the compiler is conventional The front end parses the source do es scop e analysis and type inference and translates the program into a small intermediate language called the Core language This latter stage is called desugaring The middle consists of a sequence of CoretoCore transformations and forms the sub ject of this pap er The back end co degenerates the resulting Core program into C whence it is compiled to machine co de Peyton Jones To exploit the advantages of compilation by transformation mentioned ab ove we have worked particularly hard to move work out of the front and back ends esp ecially the latter and reexpress it in the form of a transformation We have taken the plug and play idea to an extreme allowing the sequence of transformation passes to b e completely sp ecied on the command line In practice we nd that transformations fall into two groups A large set of simple lo cal transformations eg constant folding b eta reduction These transformations are all implemented by a single relatively complex compiler pass that we call the simplier The complexity arises from the fact that the simplier tries to p erform as many transformations as p ossible during a single pass over the program exploiting the cascade eect It would b e unreasonably inecient to p erform just one at a time starting from the b eginning each time Despite these eorts the result of one simplier pass often still contains opp ortunities for further simplier transformations so we apply the simplier rep eatedly until no further transformations o ccur with a set maximum to avoid pathological b ehaviour A small set of complex global transformations eg strictness analysis sp ecialising overloaded functions each of which is implemented as a separate pass Most consist of an analysis phase followed by a transformation pass that uses the analysis results to identify appropriate sites for the transformation Many also rely on a subsequent pass of the simplier to clean up the co de they pro duce thus avoiding the need to duplicate transformations already embo died in the simplier Rather than give a sup ercial overview of everything we fo cus in this pap er on three asp ects of our compiler that play a key role in compilation by transformation Program Prog Bind ::: Bind n n Binding Bind var Expr Nonrecursive j rec var Expr ::: var Expr Recursive n n n Expression Expr Expr Atom Application j Expr ty Type application j var ::: var Expr Lambda abstraction n j tyvar ::: tyvar Expr Type abstraction n j case Expr of Alts Case expression j let Bind in Expr Lo cal denition j con var ::: var Constructor n n j prim var ::: var Primitive n n j Atom Atoms Atom var Variable j Literal Unboxed Ob ject Literals Literal integer j oat j ::: Alternatives Alts Calt ::: Calt Default n n j Lalt ::: Lalt Default n n Constr alt Calt Con var ::: var Expr n n Literal alt Lalt Literal Expr Default alt Default NoDefault j var Expr Figure Syntax of the Core language The Core language itself Section Two groups of transformations implemented by the simplier inlinin g and b eta reduc tion Section and transformations involving case expressions Section One global transformation pass the one that p erforms and exploits strictness analysis Section We conclude with a brief enumeration of the other main transformations incorp orated in GHC Section and a summary of the lessons we learned from our exp erience Section The Core language The Core language clearly plays a pivotal role Its syntax is given in Figure and consists essentially of the lambda calculus augmented with let and case Though we do not give explicit syntax for them here the Core language includes algebraic data type declarations exactly as in any mo dern functional programming language For example in Haskell one might declare the type of trees thus data Tree a Leaf a Branch Tree a Tree a This declaration implicitl y denes constructors Leaf and Branch that are used to construct data values and can b e used in the pattern of a case alternative Bo oleans lists and tuples are simply predeclared algebraic data types data Boolean False True data List a Nil Cons a List a data Tuple a b c T a b c One for each size of tuple Throughout the pap er we take a few lib erties with the syntax we allow ourselves inx op erators eg E E and sp ecial syntax for lists for Nil and inx for Cons and tuples eg abc We allow multiple denitions in a single let expression to abbreviate a sequence of nested let expressions and often use layout instead of curly brackets and semicolons to delimit case alternatives We use an upp ercase identier such as E to denote an arbitrary expression The op erational reading The Core language is of course a functional language and can b e given the usual denotational semantics However a Core program also has a direct operational interpretation If we are to reason ab out the usefulness of a transformation we must have some mo del for how much it costs to execute it so an op erational interpretation is very desirable The op erational mo del for Core requires a garbagecollected heap The heap contains Data values such as list cells tuples b o oleans integers and so on Function values such as x x the function that adds to its argument Thunks or susp ensions that represent susp ended ie as yet unevaluated values Thunks are the implementation mechanism for Haskells nonstrict semantics For example consider the Haskell expression f sin x y Translated to Core the expression would lo ok like this let v sin x in f v y The let allo cates a thunk in the heap for sin x and then when it subsequently calls f passes a p ointer to the thunk The thunk records all the information needed to compute its b o dy sin x in this case but it is not evaluated b efore the call If f ever needs the value of v it will force the thunk which provokes the computation of sin x When the thunks evaluation is complete the thunk itself is updated ie overwritten with the nowcomputed value If f needs the value of v again the heap ob ject now contains its value instead of the susp ended computation If f never needs v then the thunk is not evaluated at all The two most imp ortant op erational intuitions ab out Core are as follows let bindings and only let bindings perform heap al location For example let v sin x in let w pq in f v w Op erationally the rst let allo cates a thunk for sin x and then evaluates the lets b o dy This b o dy consists of the second let expression which allo cates a pair pq in the heap and then evaluates its b o dy in turn This b o dy consists of the call f v w so the call is now made passing p ointers to the two newlyallo cated ob jects In our implementation each allo cated ob ject b e it a thunk or a value consists only of a co de p ointer together with a slot for each free variable of the righthand side of the let binding Only one ob ject is allo cated regardless of the size of the righthand