A transformation-based optimiser for Haskell

Simon L Peyton Jones

Department of Computing Science, University of Glasgow, Glasgow G12 8QQ

Email: simonpj@dcs.gla.ac.uk

WWW: http://www.dcs.gla.ac.uk/~simonpj

André L M Santos

Departamento de Informática, Universidade Federal de Pernambuco

Email: alms@di.ufpe.br

WWW: http://www.di.ufpe.br/~alms

October

Abstract

Many compilers do some of their work by means of correctness-preserving, and hopefully performance-improving, program transformations. The Glasgow Haskell Compiler (GHC) takes this idea of "compilation by transformation" as its war-cry, trying to express as much as possible of the compilation process in the form of program transformations.

This paper reports on our practical experience of the transformational approach to compilation, in the context of a substantial compiler.

This paper is based in part on Peyton Jones [1996] and Peyton Jones, Partain & Santos [1996]. It will appear in Science of Computer Programming.

Introduction

Using correctness-preserving transformations as a compiler optimisation is a well-established technique (Aho, Sethi & Ullman; Bacon, Graham & Sharp). In the functional programming area especially, the idea of compilation by transformation has received quite a bit of attention (Appel; Fradet & Metayer; Kelsey; Kelsey & Hudak; Kranz; Steele).

A transformational approach to compiler construction is attractive for two reasons:

• Each transformation can be implemented, verified, and tested separately. This leads to a more modular compiler design, in contrast to compilers that consist of a few huge passes, each of which accomplishes a great deal.

• In any framework, transformational or otherwise, each optimisation often exposes new opportunities for other optimisations (the "cascade effect"). This makes it difficult to decide a priori what the best order to apply them might be. In a transformational setting it is easy for compiler-writers to "plug and play", by re-ordering transformations, applying them more than once, or trading compilation time for code quality by omitting some. It allows a late commitment to phase ordering.

This paper reports on our experience in applying transformational techniques in a particularly thorough-going way to the Glasgow Haskell Compiler (GHC) (Peyton Jones et al.), a compiler for the non-strict functional language Haskell (Hudak et al.). Among other things, this paper may serve as a useful jumping-off point and annotated bibliography for those interested in the compiler. The following distinctive themes emerge, all of which are elaborated later in the paper:

• We frequently find a close interplay between theory and practice, a particularly satisfying aspect of functional-language research.

• Often, a single transformation elegantly generalises a textbook compiler optimisation, or effectively subsumes several such optimisations.

• Our compiler infers types for the source program, but then maintains types throughout the compilation process. We have found this to be a big win, for two reasons: it supports a powerful consistency check on the correctness of the compiler, and it provides information that is used to drive some optimisations (notably strictness analysis).

Overview

Haskell is a non-strict, purely functional language. It is a relatively large language, with a rich syntax and type system, designed for full-scale application programming.

The overall structure of our compiler is conventional:

• The front end parses the source, does scope analysis and type inference, and translates the program into a small intermediate language called the Core language. This latter stage is called desugaring.

• The middle consists of a sequence of Core-to-Core transformations, and forms the subject of this paper.

• The back end translates the resulting Core program into C, whence it is compiled to machine code (Peyton Jones).

In what sense does this structure perform "compilation by transformation"? After all, since the middle just transforms one Core program to another, it is presumably optional, and indeed our compiler has this property. The idea, however, is to do as much work as possible in the middle, leaving the irreducible minimum in the front and back ends. The front end should concentrate entirely on scope analysis, type inference, and a simple-minded translation to Core[1], ignoring efficiency. The back end should include optimisations only if they cannot be done by a Core-to-Core transformation. This paper describes several examples of optimisations that are traditionally done in the desugarer or in the code generator, which we instead re-express as Core-to-Core transformations.

[1] Why do we not instead translate to Core and then typecheck? Because one gets much better error messages from the typechecker if it is looking at source code rather than desugared source code. Furthermore, to require the translation to Core to preserve exactly Haskell's type-inference properties places undesirable extra constraints on the translation. Lastly, the translation of Haskell's system of overloading into Core can only be done in the knowledge of the program's typing.

In short, just about everything that could be called an optimisation (and optimisations constitute the bulk of what most quality compilers do) appears in the middle.

In practice we find that transformations fall into two groups:

• A large set of simple, local transformations (e.g. beta reduction). These transformations are all implemented by a single, relatively complex, compiler pass that we call the simplifier. The complexity arises from the fact that the simplifier tries to perform as many transformations as possible during a single pass over the program, exploiting the "cascade effect". (It would be unreasonably inefficient to perform just one at a time, starting from the beginning each time.) Despite these efforts, the result of one simplifier pass often still contains opportunities for further simplifier transformations, so we apply the simplifier repeatedly until no further transformations occur, with a fixed maximum number of iterations to avoid pathological behaviour (a driver-loop sketch follows this list).

• A small set of complex, global transformations (e.g. strictness analysis, specialising overloaded functions), each of which is implemented as a separate pass. Most consist of an analysis phase, followed by a transformation pass that uses the analysis results to identify appropriate sites for the transformation. Many also rely on a subsequent pass of the simplifier to "clean up" the code they produce, thus avoiding the need to duplicate transformations already embodied in the simplifier.
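
As a concrete illustration of the simplifier's driver loop, here is a minimal sketch in Haskell (ours, not GHC's actual code; the pass is assumed to report how many transformations fired):

  -- Iterate a pass to quiescence, with a cap to avoid pathological cases.
  runToQuiescence :: Int -> (a -> (a, Int)) -> a -> a
  runToQuiescence cap pass prog
    | cap <= 0  = prog                      -- hit the iteration limit
    | otherwise = case pass prog of
        (prog', 0) -> prog'                 -- no transformations fired: done
        (prog', _) -> runToQuiescence (cap - 1) pass prog'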

We have taken the "plug and play" idea to an extreme, allowing the sequence of transformation passes to be completely specified on the command line.

Rather than give a superficial overview of everything, we focus in this paper on three aspects of our compiler that play a key role in compilation by transformation:

• The Core language itself.

• Two groups of transformations implemented by the simplifier: inlining and beta reduction, and transformations involving case expressions.

• Two global transformation passes: one that performs and exploits strictness analysis, and one that moves bindings to improve allocation and sharing.

We conclude with a brief enumeration of the other main transformations incorporated in GHC, a short discussion of separate compilation, some measurements of the performance improvements achievable by transformation, and a summary of the lessons we learned from our experience.

Program      Prog    ->  Bind_1 ; ... ; Bind_n                        n >= 1

Binding      Bind    ->  var = Expr                                   Non-recursive
                     |   rec var_1 = Expr_1 ; ... ; var_n = Expr_n    Recursive, n >= 1

Expression   Expr    ->  Expr Atom                                    Application
                     |   Expr ty                                      Type application
                     |   \ var_1 ... var_n -> Expr                    Lambda abstraction, n >= 1
                     |   /\ tyvar_1 ... tyvar_n -> Expr               Type abstraction, n >= 1
                     |   case Expr of Alts                            Case expression
                     |   let Bind in Expr                             Local definition
                     |   con var_1 ... var_n                          Constructor, n >= 0
                     |   prim var_1 ... var_n                         Primitive, n >= 0
                     |   Atom

Atoms        Atom    ->  var                                          Variable
                     |   Literal                                      Unboxed object

Literals     Literal ->  integer | float | ...

Alternatives Alts    ->  Calt_1 ; ... ; Calt_n ; Default              n >= 0
                     |   Lalt_1 ; ... ; Lalt_n ; Default              n >= 0

Constr. alt  Calt    ->  con var_1 ... var_n -> Expr                  n >= 0
Literal alt  Lalt    ->  Literal -> Expr
Default alt  Default ->  NoDefault
                     |   var -> Expr

Figure 1: Syntax of the Core language

The Core language

The Core language clearly plays a pivotal role. Its syntax is given in Figure 1, and consists essentially of the lambda calculus augmented with let and case.

Though we do not give explicit syntax for them here, the Core language includes algebraic data type declarations, exactly as in any modern functional language. For example, in Haskell one might declare the type of trees thus:

  data Tree a = Leaf a | Branch (Tree a) (Tree a)

This declaration implicitly defines constructors Leaf and Branch, that are used to construct data values, and can be used in the pattern of a case alternative. Booleans, lists, and tuples are simply pre-declared algebraic data types:

  data Boolean = False | True
  data List a = Nil | Cons a (List a)
  data Tuple3 a b c = T3 a b c   -- One for each size of tuple

Throughout the paper we take a few liberties with the syntax: we allow ourselves infix operators (e.g. E1 + E2), and special syntax for lists ([] for Nil and infix : for Cons) and tuples (e.g. (a,b,c)). We allow multiple definitions in a single let expression to abbreviate a sequence of nested let expressions, and often use layout instead of curly brackets and semicolons to delimit case alternatives. We use an upper-case identifier, such as E, to denote an arbitrary expression.

A Core expression is in weak head normal form (or WHNF) if it is a lambda abstraction, constructor application, variable, or literal.
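
For example (our illustrations), \x -> x+1, Cons y ys, v, and 42 are all WHNFs, whereas f v and let v = sin x in f v are not: evaluating either of the latter requires real work before a lambda or a head constructor is exposed.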

The operational reading

The Core language is, of course, a functional language, and can be given the usual denotational semantics. However, a Core program also has a direct operational interpretation. If we are to reason about the usefulness of a transformation we must have some model for how much it costs to execute it, so an operational interpretation is very desirable. In what follows we give an informal operational model, but it can readily be formalised along the lines described by Launchbury.

Like any higher-order language, the operational model for Core requires a garbage-collected heap. The heap contains:

• Data values, such as list cells, tuples, booleans, integers, and so on.

• Function values, such as \x -> x+1 (the function that adds 1 to its argument).

• Thunks (or suspensions), that represent suspended (i.e. as yet unevaluated) values.

Thunks are the implementation mechanism for Haskell's non-strict semantics. For example, consider the Haskell expression f (sin x) y. Translated to Core, the expression would look like this:

  let v = sin x
  in f v y

The let allocates a thunk in the heap for sin x and then, when it subsequently calls f, passes a pointer to the thunk. The thunk records all the information needed to compute its body (sin x in this case), but it is not evaluated before the call. If f ever needs the value of v it will force the thunk, which provokes the computation of sin x. When the thunk's evaluation is complete, the thunk itself is updated (i.e. overwritten) with the now-computed value. If f needs the value of v again, the heap object now contains its value instead of the suspended computation. If f never needs v, then the thunk is not evaluated at all.

The two most important operational intuitions about Core are as follows:

1. let bindings (and only let bindings) perform heap allocation. For example:

     let v = sin x
     in
     let w = (p,q)
     in
     f v w

   Operationally, the first let allocates a thunk for sin x, and then evaluates the let's body. This body consists of the second let expression, which allocates a pair (p,q) in the heap, and then evaluates its body in turn. This body consists of the call f v w, so the call is now made, passing pointers to the two newly-allocated objects.

   In our implementation, each allocated object (be it a thunk or a value) consists only of a code pointer together with a slot for each free variable of the right-hand side of the let binding. Only one object is allocated, regardless of the size of the right-hand side (older implementations of graph reduction do not have this property). We do not attempt to share environments between thunks (Appel; Kranz et al.).

2. case expressions (and only case expressions) perform evaluation. For example:

     case x of
       []     -> ...
       (y:ys) -> y + g ys

   The operational understanding is as follows: evaluate x, and then scrutinise it to see whether it is an empty list or a Cons cell of form y:ys, continuing execution with the appropriate alternative. If x is an as-yet-unevaluated thunk, the act of evaluating it is typically implemented by saving live variables and a return address on the stack, and jumping to the code stored inside the thunk. When evaluation is complete, the now-evaluated thunk returns to the saved return address.

case expressions subsume conditionals, of course. The Haskell expression if C then E1 else E2 is desugared to

  case C of { True -> E1; False -> E2 }

The syntax in Figure 1 requires that function arguments must be atoms (that is, variables or literals)[2], and now we can see why. If the language allowed us to write

  f (sin x) (p,q)

the operational behaviour would still be exactly as described above, with a thunk and a pair allocated as before. The let form is simply more explicit. Furthermore, the let form gives us the opportunity of moving the binding for v elsewhere, if that turns out to be desirable, which the apparently-simpler form does not. Lastly, the let form is more economical, because many transformations on let expressions (concerning strictness, for example) would have to be duplicated for function arguments if the latter were non-atomic.

[2] This syntax is becoming quite widely used (Ariola et al.; Flanagan et al.; Launchbury; Peyton Jones; Tarditi et al.).

It is also important to note where atoms are not required. In particular, the scrutinee of a case expression is an arbitrary expression, not just an atom. For example, the following is quite legitimate:

  case (reverse xs) of { ... }

Operationally, there is no need to build a thunk for reverse xs and then evaluate it; instead, we can simply save a return address, load xs into an argument register, and jump to the code for reverse. Again, the operational model determines the syntax.

Polymorphism

Like any compiler for a strongly-typed language, GHC infers the type of every expression and variable. An obvious question is: can this type assignment be maintained through the translation to the Core language, and through all the subsequent transformations that are applied to the program? If so, both transformations and code generator might (and in GHC sometimes do) take advantage of type information to generate better code.

In a monomorphic language the answer is a clear "yes", but matters are not initially so clear in a polymorphic setting. The trouble is that program transformation involves type manipulation. Consider, for example, the usual composition function, compose, whose type is

  compose :: (b -> c) -> (a -> b) -> a -> c

In an untyped Core language, compose might be defined like this:

  compose f g x = let y = g x in f y

Now suppose that we wished to unfold a particular call to compose, say

  compose show double v

where v is an Int, double doubles it, and show converts the result to a String. The result of unfolding the call to compose is an instance of the body of compose, thus:

  let y = double v in show y

Now we want to be able to identify the type of every variable and sub-expression, so we must calculate the type of y. In this case, it has type Int, but in another application of compose it may have a different type. For example, if we inline compose in another call

  compose toUpper show v

(where toUpper converts a String to upper case) we obtain

  let y = show v in toUpper y

and here y has type String.

This difficulty arises because y's type in the body of compose itself is just a type variable. Evidently, in a polymorphic world it is insufficient merely to tag every variable of the original program with its type, because this information does not survive across program transformations.

What, then, is to be done? Clearly, the program must be decorated with type information in some way, and every program transformation must be sure to preserve it. Deciding exactly how to decorate the program, and how to maintain these decorations correctly during transformation, seemed rather difficult at first. We finally realised that an off-the-shelf solution was available, namely the second-order lambda calculus (Girard; Reynolds).

The idea is that every polymorphic function, such as compose, has a type abstraction for each universally-quantified polymorphic variable in its type (a, b and c in the case of compose), and whenever a polymorphic function is called, it is passed extra type arguments to indicate the types to which its polymorphic type variables are to be instantiated. The definition of compose now becomes:

  compose = /\a b c ->
            \(f::b->c) (g::a->b) (x::a) ->
            let (y::b) = g x in f y

The function takes three type parameters (a, b and c), as well as its value parameters f, g and x. The types of the latter can now be given explicitly, as can the type of the local variable y. A call of compose is now given three extra type arguments, which instantiate a, b and c, just as the "normal" arguments instantiate f, g and x. For example, the call of compose we looked at earlier is now written like this:

  compose Int Int String show double v

It is now simple to unfold this call, by instantiating the body of compose with the supplied arguments, to give the expression

  let (y::Int) = double v in show y

Notice that the let-bound variable y is now automatically attributed the correct type.

In short, the second-order lambda calculus provides us with a well-founded notation in which to express and transform polymorphically-typed programs. It turns out to be easy to introduce the extra type abstractions and applications as part of the type inference process.

Other compilers for polymorphic languages are beginning to carry type information through to the back end, and use it to generate better code. Shao & Appel use type information to improve data representation, though the system they describe is monomorphic after the front end. Our implementation uses type abstractions and applications only to keep the compiler's types straight; no types are passed at runtime. It is possible to take the idea further, however, and pass types at runtime to specialise data representations (Morrison et al.), give fast access to polymorphic records (Ohori), or guide garbage collection (Tolmach). The most recent and sophisticated work is that of Harper & Morrisett.

Inlining and beta reduction

The first transformation that we discuss is inlining. Functional programs often consist of a myriad of small functions (functional programmers treat functions the way C programmers treat macros), so good inlining is crucial. Compilers for conventional languages get a modest performance improvement from inlining (Davidson & Holler), while functional-language compilers gain considerably more[3] (Appel; Santos). Inlining removes some function-call overhead, of course, but an equally important factor is that inlining brings together code that was previously separated, and thereby often exposes a cascade of new transformation opportunities. We therefore implement inlining in the simplifier.

We have found it useful to identify three distinct transformations related to inlining:

• Inlining itself replaces an occurrence of a let-bound variable by (a copy of) the right-hand side of its definition. Notice that inlining is not limited to function definitions: any let-bound variable can potentially be inlined. (Remember, though, that occurrences of a variable in an argument position are not candidates for inlining, because they are constrained to be atomic.)

• Dead code elimination discards let bindings that are no longer used; this usually occurs when all occurrences of a variable have been inlined.

• Beta reduction replaces (\x -> E) A by E[A/x]. (An analogous transformation deals with type applications.)
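
To see the three transformations working together, consider this illustrative sequence (our example):

  let f = \v -> v+1 in f 10
    ==> let f = \v -> v+1 in (\v -> v+1) 10    (inline f)
    ==> (\v -> v+1) 10                         (dead code elimination: f is unused)
    ==> 10+1                                   (beta reduction)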

Beta reduction is particularly simple in our setting. Since the argument A is bound to be atomic, there is no risk of duplicating a redex, and we can simply replace x by A throughout E. There is a worry about name capture, however: what if A is also bound in E? We avoid this problem by the simple expedient of renaming every identifier as we go, which costs little extra since we have to construct a new, transformed expression anyway. Whilst beta reduction is simple, inlining is more interesting.

Simple inlining

It is useful to distinguish two cases of inlining:

WHNFs. If the variable concerned is bound to a weak head normal form (WHNF), that is, an atom, lambda abstraction, or constructor application, then it can be inlined without risking the duplication of work. The only down-side might be an increase in code size.

Non-WHNFs. Otherwise, inlining carries the risk of loss of sharing and hence the duplication of work. For example, in

  let x = f v in x+x

it might be unwise to inline x, because then f v would be evaluated twice instead of once. Informally, we say that a transformation is W-safe if it guarantees not to duplicate work.

[3] This difference may soon decrease, as the increased use of object-oriented languages leads to finer-grained procedures (Calder, Grunwald & Zorn).

In the case of WHNFs the trade-off is simply between code size and the benefit of inlining. Atoms and constructor applications are easy: they are always small enough to inline. (Recall that constructor applications must have atomic arguments.)

Functions, in contrast, can be large, so the effect of unrestricted inlining on code size can be substantial. Like most compilers, we use a heuristic, but no formal analysis, for deciding when to inline functions. More precisely, we compute the "space penalty" of inlining a function at a call site as follows:

1. Compute the size (in syntax nodes) of the body of the function.

2. Subtract one for each argument, since we are going to replace a function call with an instance of the body.

3. Lastly, subtract a discount for each argument that (a) is scrutinised by a case expression in the function body, and (b) is bound to a constructor at the call site. If inlining is performed, a case-of-known-constructor transformation will throw away all but one branch of the case expression, hence the discount. (The discount is very crude, however: it is just a constant.)

If the space penalty thus computed is smaller than some fixed, command-line-settable constant, then we inline the function at the call site.
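
The computation is simple enough to express directly; the following Haskell sketch is ours (the discount and threshold are illustrative placeholders, not GHC's actual constants):

  data ArgInfo = ArgInfo
    { scrutinisedInBody :: Bool    -- (a) scrutinised by a case in the function body
    , conAtCallSite     :: Bool }  -- (b) bound to a constructor at this call site

  inlinePenalty :: Int -> [ArgInfo] -> Int
  inlinePenalty bodySize args =
    bodySize - length args - discount * length discounted
    where
      discount   = 2   -- the crude constant discount
      discounted = [ a | a <- args, scrutinisedInBody a, conAtCallSite a ]

  shouldInline :: Int -> Int -> [ArgInfo] -> Bool
  shouldInline threshold bodySize args = inlinePenalty bodySize args < threshold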

For non-WHNFs, attention focuses on how the variable is used. If the variable occurs just once, then presumably it is safe to inline it. Our first approach was to perform a simple occurrence analysis that records, for each variable, how many places it is used, and to use this information to guide the inlinings done by the simplifier. There are three complications with this naive approach.

The first is practical. As mentioned earlier, the simplifier tries to perform as many transformations as possible during a single pass over the program. However, many transformations (notably beta reduction and inlining itself) change the number of occurrences of a variable. Our current solution to this problem is to do a great deal of book-keeping to keep occurrence information up to date. (Appel & Jim do something similar.)

The second complication is that a variable may occur multiple times with no risk of duplicating work, namely if the occurrences are in different alternatives of a case expression. In this case, the only issue to consider is the trade-off between code size and inlining benefit.

Most seriously, though, inlining based on naive occurrence counting is not W-safe! Consider this expression:

  let x = f v
      g = \y -> x+y
  in g a + g b

If we replace the (single) occurrence of x by f v, we will recompute the call to f every time g is called, rather than sharing it among all calls to g. Our current solution is conservative: we never inline inside a lambda abstraction. It turns out, though, that this approach is sometimes too conservative. In higher-order programs where lots of inlining is happening, it is not unusual to find functions that are sure to be called only once, so it would be perfectly safe to inline inside them.

Using linearity

Because of these complications, the book-keeping required to track occurrence information has gradually grown into the most intricate and bug-prone part of the simplifier. Worse, work-duplication bugs manifest themselves only as performance problems, and may go unnoticed for a long time.[4] This complexity is especially irritating because we have a strong intuitive notion of whether a variable can be "used more than once", and that intuitive notion is an invariant of W-safe transformations. That suggests that a linear type system would be a good way to identify variables that can safely be inlined even though they occur inside lambdas, or that cannot safely be inlined even though they (currently) occur only once. Just as all transformations preserve the ordinary typing of an expression, so W-safe transformations preserve the linear type information too, and hence guarantee not to duplicate work.

Unfortunately, most linear type systems are inappropriate because they do not take account of call-by-need evaluation. For example, consider the expression

  let x = 3*4
      y = x+1
  in y + y

Under call-by-need evaluation, even though y is evaluated many times, x will be evaluated only once. Most linear systems would be too conservative, and would attribute a non-linear type to x as well as to y, preventing x from being inlined.

Thus motivated, we have developed a linear type system that does take account of call-by-need evaluation (Turner, Wadler & Mossin). The type system assigns a type of Int^w to y in the above example, the superscript w ("many") indicating that y might be evaluated more than once. However, it assigns a type of Int^1 to x, indicating that x can be evaluated at most once, and hence can W-safely be inlined.

The type system is capable of dealing with "usage polymorphism". For example, consider this definition of apply:

  apply f x = f x

In a particular application (apply g y), whether or not y is used more than once depends on whether g uses its argument more than once. So the type of apply is[5]

  apply :: (a^u -> b^v) -> a^u -> b^v

the two occurrences of u indicating that the usage u of g's argument is the same as that of y.

Our implementation of this linear type system is incomplete, so we do not yet have practical experience of its utility, but we are optimistic that it will provide a systematic way of addressing an area we have only dealt with informally to date, and which has bitten us badly more than once.

[4] One such bug caused the compiler (which is, of course, written in Haskell) to rebuild its symbol table from scratch every time a variable was looked up in the table. The compiler worked perfectly, albeit somewhat slowly, and it was months before we noticed (Sansom).

[5] In fact, for the purposes of this paper, we have simplified the type a little.

Transforming conditionals

Most compilers have special rules to optimise conditionals. For example, consider the expression

  if not x then E1 else E2

No decent compiler would actually negate the value of x at runtime! Let us see, then, what happens if we simply turn the transformation handle. After desugaring the conditional, and inlining the definition of not, we get

  case (case x of {True -> False; False -> True}) of
    True  -> E1
    False -> E2

Here, the outer case scrutinises the value returned by the inner case. This observation suggests that we could move the outer case inside the branches of the inner one, thus:

  case x of
    True  -> case False of {True -> E1; False -> E2}
    False -> case True  of {True -> E1; False -> E2}

Notice that the originally-outer case expression has been duplicated, but each copy is now scrutinising a known value, and so we can make the obvious simplification to get exactly what we might originally have hoped for:

  case x of
    True  -> E2
    False -> E1

Both of these transformations are generally applicable. The second, the case-of-known-constructor transformation, eliminates a case expression that scrutinises a known value. This is always a Good Thing, and many other transformations are aimed at exposing opportunities for such case elimination; we consider another useful variant of case elimination below. The first, which we call the case-of-case transformation, is certainly correct in general, but it appears to risk duplicating E1 and/or E2. We turn to this question next.

Join points

How can we gain the benefits of the case-of-case transformation without risking code duplication? A simple idea is to make local definitions for the right-hand sides of the outer case, like this:

  case (case S of {True -> R1; False -> R2}) of
    True  -> E1
    False -> E2
    ==>
  let e1 = E1; e2 = E2
  in case S of
       True  -> case R1 of {True -> e1; False -> e2}
       False -> case R2 of {True -> e1; False -> e2}

Now E1 and E2 are not duplicated, though we incur instead the cost of implementing the bindings for e1 and e2. In the not example, though, the two inner cases are eliminated, leaving only a single occurrence of each of e1 and e2, so their definitions will be inlined, leaving exactly the same result as before.

We certainly cannot guarantee that the newly-introduced bindings will be eliminated, though. Consider, for example, the expression

  if x || y then E1 else E2

Here || is the boolean disjunction operation, defined thus:

  a || b = case a of {True -> True; False -> b}

Desugaring the conditional and inlining || gives:

  case (case x of {True -> True; False -> y}) of
    True  -> E1
    False -> E2

Now, applying the (new) case-of-case transformation:

  let e1 = E1; e2 = E2
  in case x of
       True  -> case True of {True -> e1; False -> e2}
       False -> case y    of {True -> e1; False -> e2}

Unlike the not example, only one of the two inner cases simplifies, so only e2 will certainly be inlined, because e1 is still mentioned twice:

  let e1 = E1
  in case x of
       True  -> e1
       False -> case y of {True -> e1; False -> E2}

The interesting thing here is that e1 plays exactly the role of a label in conventional compiler technology. Given the original conditional, a C compiler will "short-circuit" the evaluation of the condition if x turns out to be True, generating code like:

      if (x) goto l1;
      if (y) goto l1;
      goto l2;
  l1: ...code for E1...; goto l3;
  l2: ...code for E2...
  l3: ...

Here, l1 is a label where two possible execution paths (if x is True, or if x is False and y is True) join up; we call it a "join point". That suggests, in turn, that our code generator should be able to implement the binding for e1, not by allocating a thunk as it would usually do, but rather by simply jumping to some common code (after perhaps adjusting the stack pointer) wherever e1 is subsequently evaluated. Our compiler does exactly this. Rather than somehow mark e1 as special, the code generator does a simple syntactic escape analysis to identify variables whose evaluation is certain to take place before the stack retreats, and implements their evaluation as a simple adjust-stack-and-jump. As a result, we get essentially the same code as a C compiler for our conditional.

Seen in this light, the act of inlining e2 is what a conventional compiler might call "jump elimination". A good C compiler would probably eliminate the jump to l2, thus:

      if (x) goto l1;
      if (y) goto l1;
  l2: ...code for E2...
  l3: ...
  l1: ...code for E1...; goto l3;

Back in the functional world, if E1 is small then the inliner might decide to inline e1 at its two occurrences regardless, thus eliminating a jump in favour of a slight increase in code size. Conventional compilers do this too, notably in the case where the code at the destination of a jump is just another jump, which would correspond in our setting to E1 being just a simple variable.

The point is not that these transformations achieve anything that conventional compiler technology does not, but rather that a single mechanism (inlining), which is needed anyway, deals uniformly with jump elimination as well as its more conventional effects.

Generalising join points

Does all this work generalise to data types other than booleans? At first one might think that the answer is "yes, of course", but in fact the modified case-of-case transformation is simply nonsense if the originally-outer case expression binds any variables. For example, consider the expression

  f (if b then B1 else B2)

where f is defined thus:

  f as = case as of {[] -> E1; (b:bs) -> E2}

Desugaring the if and inlining f gives:

  case (case b of {True -> B1; False -> B2}) of
    []     -> E1
    (b:bs) -> E2

But now, since E2 may mention b and bs, we cannot let-bind a new variable e2 as we did before. The solution is simple, though: simply let-bind a function e2 that takes b and/or bs as its arguments. Suppose, for example, that E2 mentions bs but not b. Then we can perform a case-of-case transformation thus:

  let e1 = E1; e2 = \bs -> E2
  in case b of
       True  -> case B1 of {[] -> e1; (b:bs) -> e2 bs}
       False -> case B2 of {[] -> e1; (b:bs) -> e2 bs}

All the inlining mechanism discussed above for eliminating the bindings for e1 and e2 (if possible) works just as before. Furthermore, even if e2 is not inlined, the code generator can still implement e2 efficiently: a call to e2 is compiled to a code sequence that loads bs into a register, adjusts the stack pointer, and jumps to the join point.

This goes beyond what conventional compiler technology achieves. Our join points can now be parameterised by arguments that embody the differences between the execution paths that led to that point. Better still, the whole setup works for arbitrary user-defined data types, not simply for booleans and lists.

Generalising case elimination

Earlier, we discussed the case-of-known-constructor transformation, which eliminates a case expression. There is a useful variant of this transformation that also eliminates a case expression. Consider the expression

  if null xs then r else tail xs

where null and tail are defined as you might expect:

  null as = case as of {[] -> True;         (b:bs) -> False}
  tail cs = case cs of {[] -> error "tail"; (d:ds) -> ds}

After the usual inlining we get

  case (case xs of {[] -> True; (b:bs) -> False}) of
    True  -> r
    False -> case xs of
               []     -> error "tail"
               (d:ds) -> ds

Now we can do the case-of-case transformation as usual, giving (after a few extra steps):

  case xs of
    []     -> r
    (b:bs) -> case xs of
                []     -> error "tail"
                (d:ds) -> ds

Now it is obvious that the inner evaluation of xs is redundant, because in the (b:bs) branch of the outer case we know that xs is certainly of the form (b:bs)! Hence we can eliminate the inner case, selecting the (d:ds) alternative, but substituting b for d and bs for ds:

  case xs of
    []     -> r
    (b:bs) -> bs
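
Schematically (our formulation of the rule), whenever an inner case scrutinises a variable already scrutinised by an enclosing case, the inner case can be replaced by the matching alternative, with the enclosing pattern's binders substituted for its own:

  case x of { ...; C v1 ... vn -> ... (case x of { ...; C w1 ... wn -> E; ... }) ... }
    ==>
  case x of { ...; C v1 ... vn -> ... E[v1/w1, ..., vn/wn] ... }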

We will see another application of this form of case elimination when we discuss unboxed data types below.

Summary

We have described a few of the most important transformations involving case expressions, but there are quite a few more, including case merging, dead alternative elimination, and default elimination. They are described in more detail by Santos, who also provides measurements of their frequency.

Like many good ideas, the case-of-case transformation (limited to booleans, but including the idea of using let-bound variables as join points) was incorporated in Steele's Rabbit compiler for Scheme (Steele). We re-invented it, and generalised it for case expressions and parameterised join points. Let-bound join points are also extremely useful when desugaring complex pattern matching. Lacking join points, most of the standard descriptions are complicated by a special FAIL value, along with special semantics and compilation rules, to express the "joining up" of several execution paths when a pattern fails to match (Augustsson; Peyton Jones).

Unboxed data types and strictness analysis

Consider the expression x+y, where x and y have type Int. Because Core is non-strict, x and y must each be represented by a pointer to a possibly-unevaluated object. Even if x, say, is already evaluated, it will still therefore be represented by a pointer to a boxed value in the heap. The addition operation must evaluate x and y as necessary, unbox them, add them, and box the result.

Exposing boxing to transformation

Where arithmetic operations are cascaded, we would like to avoid boxing the result of one operation only to unbox it immediately in the next. Similarly, in the expression x+x we would like to avoid evaluating and unboxing x twice. Such boxing/unboxing optimisations are usually carried out by the code generator, but it would be better to find a way to express them as program transformations.

We have achieved this goal as follows. Instead of regarding the data types Int, Float, and so on as primitive, we define them using ordinary algebraic data type declarations:

  data Int   = I# Int#
  data Float = F# Float#

Here, Int# is the truly-primitive type of unboxed integers, and Float# is the type of unboxed floats.[6] The constructors I# and F# are, in effect, the boxing operations. Now we can express the previously-primitive operation + thus:[7]

  (+) a b = case a of
              I# a# -> case b of
                         I# b# -> case a# +# b# of
                                    r# -> I# r#

where +# is the primitive addition operation on unboxed values. You can read this definition as "evaluate and unbox a, do the same to b, add the unboxed values giving r#, and return a boxed version thereof".

[6] The symbol # has no significance to the compiler; we use it simply as a lexical reminder that the identifier has an unboxed type, or takes arguments of unboxed type.

[7] You may wonder why we write case a# +# b# of { r# -> I# r# } instead of the more obvious I# (a# +# b#). The reason is that the latter is not a term in the Core language: constructor arguments must be atoms. Furthermore, the alternative let r# = a# +# b# in I# r# isn't legal either, because r# is an unboxed value and hence can't be heap-allocated by let. So the case expression is really just an eager let-binding.
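
This is not only a compiler-internal story: with GHC's MagicHash extension, the same definition can be written in source Haskell today. The following is a runnable analogue of the Core definition above (GHC.Exts exports Int#, the I# constructor, and +#):

  {-# LANGUAGE MagicHash #-}
  import GHC.Exts (Int (I#), Int#, (+#))

  plusInt :: Int -> Int -> Int
  plusInt a b =
    case a of
      I# a# -> case b of
        I# b# -> case a# +# b# of    -- eager, unboxed addition
          r# -> I# r#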

Now, simple transformations do the Right Thing to x+x. We begin by inlining + to give:

  case x of
    I# a# -> case x of
               I# b# -> case a# +# b# of
                          r# -> I# r#

But now the inner case can be eliminated (by the generalised case elimination described earlier), since it is scrutinising a known value, x, giving the desired outcome:

  case x of
    I# a# -> case a# +# a# of
               r# -> I# r#

Similar transformations, this time involving case-of-case, ensure that in expressions such as x+y*z the intermediate result is never boxed. The details are given by Peyton Jones & Launchbury, but the important points are these:

• By making the Core language somewhat more expressive (i.e. adding unboxed data types), we can expose many new evaluation and boxing operations to program transformation.

• Rather than a few ad hoc optimisations in the code generator, the full range of transformations can now be applied to the newly-exposed code.

• Optimising evaluation and unboxing may itself expose new transformation opportunities; for example, a function body may become small enough to inline.

Exploiting strictness analysis

Strictness analysers attempt to figure out whether a function is sure to evaluate its argument, giving the compiler the opportunity to evaluate the argument before the call, instead of building a thunk that is forced later on. There is an enormous literature on strictness analysis itself, but virtually none explaining how to exploit its results, apart from general remarks that the code generator can use it. Our approach is to express the results of strictness analysis as a program transformation, for exactly the reasons mentioned at the end of the previous section.

As an example, consider the factorial function with an accumulating parameter, which in Haskell might look like this:

  afac :: Int -> Int -> Int
  afac a 0 = a
  afac a n = afac (n*a) (n-1)

Translated into the Core language, it would take the following form:

  one = I# 1#

  afac a n = case n of
               I# n# -> case n# of
                          0# -> a
                          _  -> let a' = n*a
                                    n' = n - one
                                in afac a' n'

In a naive implementation, this function sadly uses linear space to hold a growing chain of unevaluated thunks for a.

Now suppose that the strictness analyser discovers that afac is strict in both its arguments. Based on this information, we split it into two functions, a wrapper and a worker, thus:

  afac a n = case a of
               I# a# -> case n of
                          I# n# -> afac# a# n#

  one = I# 1#

  afac# a# n# = let n = I# n#
                    a = I# a#
                in case n of
                     I# n# -> case n# of
                                0# -> a
                                _  -> let a' = n*a
                                          n' = n - one
                                      in afac a' n'

The wrapper, afac, implements the original function by evaluating the strict arguments and passing them unboxed to the worker, afac#. When it is created, the wrapper is marked as "always-inline-me", which makes the simplifier extremely keen to inline it at every call site, thereby effectively moving the argument evaluation to the call site.

The code for the worker starts by reconstructing the original arguments in boxed form, and then concludes with the original, unchanged code for afac. Re-boxing the arguments may be correct, but it looks like a weird thing to do, because the whole point was to avoid boxing the arguments at all! Nevertheless, let us see what happens when the simplifier goes to work on afac#. It just inlines the definitions of *, -, and afac itself, and applies the transformations described earlier. A few moments' work should convince you that the result is this:

  afac# a# n# = case n# of
                  0# -> I# a#
                  _  -> case n# *# a# of
                          a'# -> case n# -# 1# of
                                   n'# -> afac# a'# n'#

Bingo! afac# is just what we hoped for: a strict, constant-space, efficient factorial function. The re-boxing bindings have vanished, because a case elimination transformation has left them as dead code. Even the recursive call is made directly to afac#, rather than going via afac; it is worth noticing the importance of inlining the wrapper in the body of the worker, even though the two are mutually recursive. Meanwhile, the wrapper afac acts as an "impedance-matcher" to provide a boxed interface to afac#.
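
The same worker/wrapper pair can be written directly in source Haskell; here is a runnable sketch (ours), in which wafac plays the role of afac# above:

  {-# LANGUAGE MagicHash #-}
  import GHC.Exts (Int (I#), Int#, (*#), (-#))

  -- Wrapper: evaluates and unboxes both arguments, then calls the worker.
  afac :: Int -> Int -> Int
  afac (I# a#) (I# n#) = wafac a# n#

  -- Worker: loops entirely on unboxed values, in constant space.
  wafac :: Int# -> Int# -> Int
  wafac a# n# = case n# of
    0# -> I# a#
    _  -> wafac (n# *# a#) (n# -# 1#)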

A simple strictness and absence analyser

GHC uses a rather simple strictness analyser, the idea being to get a large fraction of the benefit of more sophisticated strictness analysis with a small fraction of the effort.[8] More specifically, GHC uses a simple higher-order abstract interpretation over a domain that includes just top, bottom, functions, and finite products. It is described by Peyton Jones & Partain, but we briefly review the main design choices in the rest of this section.

Fixpoints and widening

The main challenge in abstract interpretation is usually that of finding the fixpoints of recursive abstract functions. We take a simple approach: after each iteration we widen the abstract value so that it is easy to compare with the previous iteration. This results in a loss of accuracy, but it is fast and easy to implement.

The widening operator we use is this: given an abstract function, we find out whether it is strict in each of its arguments independently, and treat the vector of results as the widened approximation to the function. For example, consider the function

  f x y z = if x then f x z y else y

The zeroth approximation to the abstract value of f, f_0, is by definition bottom. We write this approximation in widened form thus: f_0 = SSS (the "S" stands for "strict"), to indicate that f_0 is strict in all three arguments. After one iteration we find that f_1 = SSL (the "L" stands for "lazy"), because z is not used in the else branch. After two iterations, using f_1 in the recursive call to f, we find that f_2 = SLL, and this turns out to be the fixpoint. There are well-known pitfalls with this approach (Clack & Peyton Jones), but they can be avoided, at the expense of accuracy, by the simple expedient of always using the widened function in the recursive call.
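
The iteration for f can be rendered as a toy Haskell program (ours; the step function is hand-derived from f's definition rather than computed by a general abstract interpreter):

  data Demand = S | L deriving (Eq, Show)

  -- Strict in an argument only if both branches of the 'if' are.
  andD :: Demand -> Demand -> Demand
  andD S S = S
  andD _ _ = L

  -- Given the widened approximation [dx,dy,dz], compute the next one.
  step :: [Demand] -> [Demand]
  step [_, d2, d3] =
    [ S              -- x: scrutinised by the 'if' itself
    , d3 `andD` S    -- y: third argument of the recursive call; used in 'else'
    , d2 `andD` L ]  -- z: second argument of the call; unused in 'else'

  fixpoint :: [Demand] -> [Demand]
  fixpoint d = let d' = step d in if d' == d then d' else fixpoint d'

  -- fixpoint [S,S,S] evaluates to [S,L,L], i.e. SLL, as in the text.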

A side benefit is that there is an obvious textual representation of the abstract value of a function, which we use for conveying strictness information between modules (see the discussion of separate compilation later in the paper).

In reality, we have found it essential to widen non-recursive functions as well, for a reason that is not initially obvious. Consider the following non-recursive definitions:

  f x = case x of
          []     -> ...
          (p:ps) -> ...

  g y = case y of
          C1 a b -> f a
          C2 c d -> f c
          C3 e h -> f e

[8] Of course, it is impossible to know whether one has achieved this goal without implementing a sophisticated analyser as well, which we have not done.

If we do not widen the abstract values for f and g, then consider what happens when the abstract interpreter finds a call such as (g a). It applies the abstract value of g to the abstract value of a; this application will take the least upper bound of the three case alternatives in g's right-hand side. Each of these three will evaluate a call to the abstract f, and each of these calls will take the least upper bound of f's two case alternatives. There is a multiplicative effect, in which every branch of every conditional reachable from a particular call is evaluated! In normal evaluation, of course, only one branch is taken from a conditional, but abstract interpretation in effect takes all of them. This turns out to be far too slow in practice. The solution is to trade accuracy for time: simply widen non-recursive functions as well as recursive ones. Once g is widened, its right-hand side is no longer evaluated at every call. For example, to apply the widened function value SSL, the abstract evaluator simply checks to see whether either of its first two arguments is bottom; if so it returns bottom, and otherwise top.
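
Continuing the sketch above, applying a widened value is just a scan over the strict positions (AbsVal is our two-point domain of abstract values):

  data AbsVal = Bot | Top deriving (Eq, Show)

  applyWidened :: [Demand] -> [AbsVal] -> AbsVal
  applyWidened ds vs
    | or [ v == Bot | (S, v) <- zip ds vs ] = Bot  -- some strict argument is bottom
    | otherwise                             = Top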

Products and absence analysis

Suppose we have the following function definition:

  f :: (Int,Int) -> Int
  f p = E

It is relatively easy for the strictness analyser to discover not only f's strictness in the pair p, but also f's strictness in the two components of the pair; it is for precisely this reason that the abstract domain includes finite products. For example, suppose that the strictness analyser discovers that f is strict both in p and in the first component of p, but not in the second. Given this information, we can transform the definition of f into a worker and a wrapper, like this:

  f p = case p of
          (x,y) -> case x of
                     I# x# -> f# x# y

  f# x# y = let x = I# x#
                p = (x,y)
            in E

The pair is passed to the worker unboxed (i.e. the two components are passed separately), and so is the first component of the pair.

We soon learned that "looking inside" non-recursive data structures in this way exposed a new opportunity: absence analysis. What if f does not use the second component of the pair at all? Then it is a complete waste of time to pass y to f# at all. Whilst it is unusual for programmers to write functions with arguments that are completely unused, it is rather common for them to write functions that do not use some parts of their arguments. We therefore perform both strictness analysis and absence analysis, and use the combined information to guide the worker/wrapper split.

Matters are more complicated if the argument type is recursive, or has more than one constructor. In such cases we revert to the simple two-point abstract domain.

Notice the importance of type information to the whole endeavour: the type of a function guides the "resolution" of the strictness analysis and the worker/wrapper splitting.

Strict let bindings

An important, but less commonly discussed, outcome of strictness analysis is that it is possible to tell whether a let binding is strict; that is, whether the variable bound by the let is sure to be evaluated in the body. If so, there is no need to build a thunk. Consider the expression

  let x = R in E

where x has type Int, and E is strict in x. Using a similar strategy to the worker/wrapper scheme, we can transform to

  case R of { I# x# -> let x = I# x# in E }

We call this the let-to-case transformation. As before, the re-boxing binding for x will usually be eliminated by subsequent transformation. If x has a recursive or multi-constructor type, then we transform instead to this:

  case R of { x -> E }

This expression simply generates code to evaluate R, bind the (boxed) result to x, and then evaluate E. This is still an improvement over the original let expression, because no thunk is built.
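
For instance (our example, taking R = g y and E = x+1, where g returns an Int), the let-to-case transformation and the subsequent simplifications proceed like this:

  let x = g y in x+1
    ==> case g y of { I# x# -> let x = I# x# in x+1 }
    ==> case g y of { I# x# -> case x# +# 1# of r# -> I# r# }   (inline x and +, then simplify)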

Code motion

Consider the following expression:

  let v = let w = R
          in w+1
  in B

A semantically-equivalent expression, which differs only in the positioning of the binding for w, is this:

  let w = R
  in let v = w+1
     in B

While the two expressions have the same value, the second is likely to be more efficient to evaluate than the first. (We will say why this is so when we discuss local transformations below.) A good compiler should transform the first expression into the second. However, the difference in efficiency is modest, and the transformation between the two seems almost too easy to merit serious study; as a result, not much attention has been paid to transformations of this kind. We call them "let-floating" transformations, because they concern the exact placement of let or letrec bindings (in the example, it is the binding for w which is floated from one place to another). They correspond closely to code motion in conventional compilers.

We have found it useful to identify three distinct kinds of let-floating transformations:

• Floating inwards moves bindings as far inwards as possible.

• The full laziness transformation floats selected bindings outside enclosing lambda abstractions.

• Local transformations "fine-tune" the location of bindings.

After describing the three transformations, we describe how they are implemented in GHC; later in the paper we quantify their effectiveness. More detailed measurements are presented in Peyton Jones, Partain & Santos.

Floating inwards

The floating-inward transformation is based on the following observation: other things being equal, the further inward a binding can be moved, the better. For example, consider

  let x = y+1
  in case z of
       []     -> x*x
       (p:ps) -> ...

Here, the binding for x is used in only one branch of the case, so it can be moved into that branch:

  case z of
    []     -> let x = y+1
              in x*x
    (p:ps) -> ...

Moving the binding inwards has at least three distinct benefits:[9]

+ The binding may never be "executed". In the example, z might turn out to be of the form (p:ps), in which case the code which deals with the binding for x is not executed. Before the transformation, a thunk for x would be allocated regardless of the value of z.

+ Strictness analysis has a better chance. It is more likely that, at the point at which the binding is now placed, it is known that the bound variable is sure to be evaluated. This in turn may enable other strictness-related transformations to be performed. In our example, instead of allocating a thunk for x, GHC will simply evaluate y, increment it, and square the result, allocating no thunks at all (using the let-to-case transformation described earlier).

+ Redundant evaluations may be eliminated. It is possible that the RHS will "see" the evaluation state of more variables than before. To take a similar example:

    let x = case y of (a,b) -> a
    in
    case y of
      (p,q) -> x+p

  If the binding of x is moved inside the case branch, we get:

    case y of
      (p,q) -> let x = case y of (a,b) -> a
               in
               x+p

  Now the compiler can spot that the inner case for y is in the RHS of an enclosing case which also scrutinises y. It can therefore eliminate the inner case, to give:

    case y of
      (p,q) -> p+p

[9] We indicate advantages with "+" and disadvantages with "-"; the symbol "o" indicates moot points.

The first two benefits may also accrue if a binding is moved inside the RHS of another binding. For example, floating inwards would transform

  let x = v+w
      y = x+x
  in
  B

(where B does not mention x) into

  let y = let x = v+w in x+x
  in
  B

The alert reader will notice that this transformation is precisely the opposite of that given at the start of this section on code motion, a point we return to when discussing local transformations below. This example also illustrates another, minor, effect of moving bindings around:

o Floating can change the size of the thunks allocated. Recall that, in our implementation, each let(rec) binding allocates a heap object that has one slot for each of its free variables. The more free variables there are, the larger the object that is allocated. In the example, floating x into y's RHS removes x from y's free variables, but adds v and w. Whether y's thunk thereby becomes bigger or smaller depends on whether v and/or w were already free in y.

So far we have suggested that a binding can usefully be floated inward "as far as possible", that is, to the point where it can be floated no further in while still keeping all the occurrences of its bound variable in scope. There is an important exception to this rule: it is dangerous to float a binding inside a lambda abstraction, as we discussed when considering inlining. If the abstraction is applied many times, each application will instantiate a fresh copy of the binding. Worse, if the binding contains a reducible expression, the latter will be re-evaluated each time the abstraction is applied.

The simple solution is never to float a binding inside a lambda abstraction, and that is what our compiler currently does (although we plan in future to use guidance from the linear-type information discussed earlier). But what if the binding is inside the abstraction to start with? We turn to this question next.

Full laziness

Consider the definition

  f xs = letrec
           g y = let n = length xs
                 in ...g...n...
         in
         ...g...

Here, the length of xs will be recomputed on each recursive call to g. This recomputation can be avoided by simply floating the binding for n outside the \y-abstraction:

  f xs = let n = length xs
         in
         letrec
           g y = ...g...n...
         in
         ...g...

This transformation is called full laziness. It was originally invented by Hughes (Hughes; Peyton Jones), who presented it as a variant of the supercombinator lambda-lifting algorithm. Peyton Jones & Lester subsequently showed how to decouple full laziness from lambda lifting, by regarding it as an exercise in floating letrec bindings outwards.

Whereas the float-in transformation avoids pushing bindings inside lambda abstractions, the full laziness transformation actively seeks to do the reverse, by floating bindings outside an enclosing lambda abstraction.

The full laziness transformation can save a great deal of repeated work, and it sometimes applies in non-obvious situations. One example we came across in practice is part of a program which performed the Fast Fourier Transform (FFT). The programmer wrote a list comprehension similar to the following:

  xs_dot = map do_cos [k * thetas n | k <- [1..n]]

What he did not realise is that the expression (thetas n) was recomputed for each value of k! The list comprehension (syntactic sugar) was translated into the Core language, where the (thetas n) appeared inside a function body. The full laziness transformation lifted (thetas n) out past the lambda, so that it was only computed once.

A potential shortcoming of the full laziness transformation, as so far described, is this: it seems unable to float out an expression that is free in a lambda abstraction but is not letrec-bound. For example, consider

  f = \x -> case x of
              []     -> g y
              (p:ps) -> ...

Here, the sub-expression (g y) is free in the \x-abstraction, and might potentially be an expensive computation which could be shared among all applications of f. It is simple enough in principle to address this shortcoming, by simply let-binding (g y), thus:

  f = \x -> case x of
              []     -> let a = g y in a
              (p:ps) -> ...

Now the binding for a can be floated out like any other binding.

The full laziness transformation may give rise to large gains, but at the price of making worse all the things that floating inwards makes better. Hence, the full laziness transformation should only be applied when there is some chance of a benefit. For example, it should not be used if either of the following conditions holds:

• The RHS of the binding is already a value, or reduces to a value with a negligible amount of work. If the RHS is a value, then no work is saved by sharing it among many invocations of the same function (though some allocation may be saved).

• The lambda abstraction is applied no more than once, information that should be made available by the linear type inference system discussed earlier.

There is a final disadvantage to full laziness, which is much more slippery: it may cause a space leak. Consider

  f = \x -> let a = enumerate n in B

where (enumerate n) returns the list of integers between 1 and n. Is it a good idea to float the binding for a outside the \x-abstraction? Certainly, doing so would avoid recomputing a on each call of f. On the other hand, a is pretty cheap to recompute, and if n is large the list might take up a lot of store. It might even turn a constant-space algorithm into a linear-space one, or even worse!

In fact, as our measurements show, space leaks do not seem to be a problem for real programs. We are, however, rather conservative about floating expressions to the top level, where for tiresome reasons they are harder to garbage collect.

Local transformations

The third set of transformations consists of local rewrites, which fine-tune the placement of bindings. There are just three such transformations:

  (let v = R in B) A          ==>  let v = R in (B A)

  case (let v = R in B) of    ==>  let v = R
    ...                            in case B of
                                        ...

  let x = let v = R in R'     ==>  let v = R
  in B                             in let x = R'
                                      in B

Each of the three has an exactly equivalent form when the binding being floated outwards is a letrec. The third also has a variant when the outer binding is a letrec: in this case, the binding being floated out is combined with the outer letrec to make a larger letrec. Subsequent dependency analysis (discussed below, under composing the pieces) will split up the enlarged group again if it is possible to do so.

The first two transformations are always beneficial. They do not change the number of allocations, but they do give other transformations more of a chance. For example, the first, which we call let-float-from-application, moves a let outside an application. Doing so cannot make things worse, and sometimes makes things better: for example, B might be a lambda abstraction which can then be applied to A. The second, let-float-from-case, floats a let(rec) binding outside a case expression, which might improve matters if, for example, B was a constructor application.

The third transformation the letoatfromlet transformation which oats a letrec binding

from the RHS of another letrec binding is more interesting It has the following advantages

- Floating a binding out may reveal a WHNF. For example, consider the expression

    let x = let v = R in (v,v)
    in B

  When this expression is evaluated, a thunk will be allocated for x. When (and if) x is evaluated by B, the contents of the thunk will be read back into registers, its value (the pair (v,v)) computed, and the heap-allocated thunk for x overwritten with the pair.

  Floating the binding for v out would instead give

    let v = R
        x = (v,v)
    in B

  When this expression is evaluated, a thunk will be allocated for v and a pair for x. In other words, x is allocated in its final form. No update will take place when x is evaluated: a significant saving in memory traffic.

- There is a second reason why revealing a normal form may be beneficial: B may contain a case expression which scrutinises x, thus:

    case x of (p,q) -> E

  Now that x is revealed as being bound to the pair (v,v), this expression is easily transformed to

    E[v/p, v/q]

  using the case-of-known-constructor transformation.

- Floating v's binding out may reduce the number of heap-overflow checks. A heap-overflow check is necessary before each sequence of let(rec) bindings, to ensure that a large enough contiguous block of heap is available to allocate all of the bindings in the sequence. For example, the expression

    let v = R
        x = (v,v)
    in B

  requires a single check to cover the allocation of both v and x. On the other hand, if the definition of v is nested inside the RHS of x, then two checks are required.

These advantages are all very well, but the let-float-from-let transformation also has some obvious disadvantages; after all, it was precisely the reverse of this transformation which we advocated when discussing the floating-inward transformation. Specifically, there are two disadvantages:

- If x is not evaluated, then an unnecessary allocation for v would be performed. However, the strictness analyser may be able to prove that x is sure to be evaluated, in which case the let-float-from-let transformation is always beneficial.

- It is less likely that the strictness analyser will discover that v is sure to be evaluated. This suggests that the strictness analyser should be run before performing the let-float-from-let transformation.

Given these conflicting trade-offs, there seem to be four possible strategies for local let-floating:

Never: no local let-floating is performed at all.

Strict: bindings are floated out of strict contexts only, namely applications, case scrutinees, and the right-hand sides of strict lets. (A strict let is one whose bound variable is sure to be evaluated by the body of the let.)

WHNF: like Strict, but in addition a binding is floated out of a let(rec) right-hand side if doing so would reveal a WHNF.

Always: like Strict, but in addition any binding at the top of a let(rec) right-hand side is floated out.

We explore the practical consequences of these four strategies in the measurements reported below.

Composing the pieces

We have integrated the three let-floating transformations into GHC. The full laziness and float-inwards transformations are implemented as separate passes. In contrast, the local let-floating transformations are implemented by the simplifier. Among the transformations performed by the simplifier is dependency analysis, which splits each letrec binding into its minimal strongly-connected components. Doing this is sometimes valuable, because it lets the resulting groups be floated independently.
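As a small, hypothetical illustration, consider a letrec in which only part of the group is genuinely recursive:

    letrec x = ...y...
           y = ...x...
           z = ...x...
    in B
      ===>
    letrec x = ...y...
           y = ...x...
    in let z = ...x...
       in B

The binding for z is not part of the recursive loop, so dependency analysis splits it off, after which it can be floated independently of the x/y group.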

We perform the transformations in the following order:

1. Do the full laziness transformation.

2. Do the float-inwards transformation. This won't affect anything floated outwards by full laziness: any such bindings will be parked just outside a lambda abstraction.

3. Perform strictness analysis.

4. Do the float-inwards transformation again.

Between each of these passes the simplifier is applied.

We do the float-inwards pass before strictness analysis because it helps to improve the results of strictness analysis. The desirability of performing the float-inwards transformation again, after strictness analysis, surprised us. Consider the following function:

    f x y = if y == 0
            then error ("Divide by zero: " ++ show x)
            else x / y

The strictness analyser will find f to be strict in x (because a call to error is semantically equivalent to bottom), and hence x will be passed to f in unboxed form. However, the then branch needs x in boxed form, to pass to show. The post-strictness float-inwards transformation floats a binding that re-boxes x into the appropriate branches of any conditionals in the body of f, thereby avoiding the overhead of re-boxing x in the common case of taking the else branch.
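Here is a minimal sketch of the shape of the resulting code, in GHC's unboxed-value notation (the worker f' and the variable names are illustrative, not GHC's actual output):

    f' = \x# y -> let x = I# x# in          -- x re-boxed once, up front
                  case y == 0 of
                    True  -> error ("Divide by zero: " ++ show x)
                    False -> ...x#...y...

    -- after the post-strictness float-inwards pass:
    f' = \x# y -> case y == 0 of
                    True  -> let x = I# x#  -- re-boxing confined to this branch
                             in error ("Divide by zero: " ++ show x)
                    False -> ...x#...y...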

The implementation of the float-in transformation and local let-floating is straightforward, but the full laziness transformation has a few subtleties, as we discuss next.

Implementing full laziness

We use a two-pass algorithm to implement full laziness:

1. The first pass annotates each let(rec) binder with its level number. (In fact all binders are annotated, but only those on let(rec)s are subsequently looked at.) In general, level numbers are defined like this:

   - The level number of a let-bound variable is the maximum of the level numbers of its free variables and its free type variables.

   - The level number of a letrec-bound variable is the maximum of the level numbers of the free variables of all the RHSs in the group, less the letrec-bound variables themselves.

   - The level number of a lambda-bound variable is one more than the number of enclosing lambda abstractions.

   - The level number of a case- or type-lambda-bound variable is the number of enclosing ordinary lambda abstractions.

2. The second pass uses the level numbers on let(rec)s to float each binding outward, to just outside the lambda which has a level number one greater than that on the binding.

Notice that a binding is floated out just far enough to escape all the lambdas which it can escape, and no further; this is consistent with the idea that bindings should be as far in as possible. There is one exception: bindings with level number zero are floated right to the top level. Notice too that a binding is not moved at all unless it will definitely escape a lambda.
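The following is a minimal sketch of the level-numbering idea, for a toy Core with ordinary lambdas and non-recursive lets only; the data type and names are ours, not GHC's:

    import qualified Data.Set as S

    type Level = Int

    data Expr = Var String
              | App Expr Expr
              | Lam String Expr          -- ordinary lambda
              | Let String Expr Expr     -- let v = rhs in body

    freeVars :: Expr -> S.Set String
    freeVars (Var v)     = S.singleton v
    freeVars (App f a)   = freeVars f `S.union` freeVars a
    freeVars (Lam x b)   = S.delete x (freeVars b)
    freeVars (Let v r b) = freeVars r `S.union` S.delete v (freeVars b)

    -- The level of a let binder is the maximum level of its RHS's free
    -- variables.  The environment records a level for every in-scope
    -- variable: depth+1 for a variable bound under 'depth' enclosing
    -- lambdas, or a previously computed level for a let binder.
    binderLevel :: [(String, Level)] -> Expr -> Level
    binderLevel env rhs = maximum (0 : [ l | (v, l) <- env, v `S.member` fvs ])
      where fvs = freeVars rhs

The second pass would then carry each annotated binding outwards until the next enclosing lambda has a strictly greater level number, or to the top level if the binding's level is zero.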

This algorithm is largely as described by Peyton Jones & Lester, but there are a few complications in practice. Firstly, type variables are a nuisance. For example, suppose that f and k are bound outside the following \x-abstraction:

    \x -> /\a -> ...(let v = f a k in ...)...

We'd like to float out the v = f a k binding, but we can't, because then the type variable a would be out of scope. The rules above give a the same level number as x (assuming there are no intervening lambdas), which will ensure that the binding isn't floated out of a's scope. Still, there are some particularly painful cases, notably pattern-matching failure bindings such as:

    fail = error a "Pattern fail"

We really would like this to get lifted to the top level, despite its free type variable a. There are two approaches: ignore the problem of out-of-scope type variables, or fix it up somehow. We take the latter approach, using the following procedure: if a binding v = e has free type variables whose maximum level number is strictly greater than that of the ordinary variables, then we abstract over the offending type variables a1..an, thus:

    v = let v' = /\a1..an -> e in v' a1..an

Now v is given the usual level number, taking type variables into account, while v' is given the maximum level number of the ordinary free variables only (the type variables a1..an are no longer free in v''s right-hand side).

The reason this is a bit half-baked is that some subsequent binding might mention v; in theory it too could be floated out, but it will get pinned inside the binding for v. It is the binding for v' which floats. Still, our strategy catches the common cases.
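Applied to the fail binding above, the procedure gives (a sketch):

    fail = let fail' = /\a -> error a "Pattern fail"
           in fail' a

fail' has no free type variables (and no ordinary free variables apart from the top-level error), so it gets level number zero and floats to the top level, while fail itself stays put.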

The second complication is that there is a penalty associated with floating a binding between two adjacent lambdas. For example, consider the binding

    f = \x -> \y -> ...(let v = length x in ...)...

It would be possible to float the binding for v between the lambdas for x and y, but the result would be two functions of one argument instead of one function of two arguments, which is less efficient. There would be a gain only if a partial application of f to one argument was applied many times. Indeed, our measurements indicate that allowing lambdas to be split in this way resulted in a significant loss of performance (see Santos's thesis for the figures; we do not present them here). Our pragmatic solution is therefore to treat the lambdas for x and y as a single lambda group, and to give a single level number to all the variables bound by a group. As a result, lambda groups are never split.

The third complication is that we are paranoid about giving bindings a level number of zero, because that will mean they float right to the top level, where they might cause a space leak. In our implementation, all top-level values are retained for the whole life of the program. It would be possible for the garbage collector to figure out which of them cannot be referred to again, and hence which could safely be garbage collected, but doing so adds complexity and slows both the mutator and the garbage collector. We use several heuristics which sometimes decide, conservatively, to leave a binding exactly where it is. If this happens, rather than giving the binding level number zero, it is given a level number equal to the number of enclosing lambdas, so that it will not be moved by the second pass.

Other GHC transformations

We have focused so far on particular aspects of GHC's transformation system. This section briefly summarises the other main transformations performed by GHC.

The simplifier contains many more transformations than those described in the preceding sections. A full list can be found in Peyton Jones & Santos and in Santos's thesis; the latter also contains detailed measurements of the frequency and usefulness of each transformation.

The specialiser uses partial evaluation to create specialised versions of overloaded functions, using much the same technique as that described by Jones.

Eta expansion is an unexpectedly useful transformation (see Gill's thesis). We found that other transformations sometimes produce expressions of the form:

    let f = \x -> (let ... in \y -> E)
    in B

If f is always applied to two arguments in B, then we can W-safely (that is, without risk of duplicating work) transform the expression to:

    let f = \x y -> (let ... in E)
    in B

It turns out that a lambda abstraction that binds multiple arguments can be implemented much more efficiently than a nested series of lambdas. The most elegant way to achieve the transformation is to perform an eta-expansion (the opposite of eta reduction) on f's right-hand side:

    \x -> R   ===>   \x -> \a -> R a

Once that is done, normal beta reduction will make the application to a cancel with the \y, to give the desired overall effect.
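Spelling the steps out on the example above (a is a fresh variable):

    \x -> (let ... in \y -> E)
      ===>  \x -> \a -> (let ... in \y -> E) a    -- eta-expand
      ===>  \x -> \a -> let ... in (\y -> E) a    -- let-float-from-application
      ===>  \x -> \a -> let ... in E[a/y]         -- beta-reduce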

The crucial question is this: when is eta expansion guaranteed to be W-safe? Unsurprisingly, this turns out to be yet another fruitful application of the linear type system sketched earlier.

Deforestation is a transformation that removes intermediate lists (Wadler). For example, in the expression sum (map double xs), an intermediate list (map double xs) is created, only to be consumed immediately by sum. Successful deforestation removes this intermediate list, giving a single-pass algorithm that traverses the list xs, doubling each element before adding it to the total.

Full-blown Wadler-style deforestation for higher-order programs is difficult; the only example we know of is described by Marlow, and even that does not work for large programs. Instead, we developed a new, more practical, technique called short-cut deforestation (Gill, Launchbury & Peyton Jones). As the name implies, our method does not remove all intermediate lists, but in exchange it is relatively easy to implement. Gill describes the technique in detail and gives measurements of its effectiveness: even on programs written without deforestation in mind, the transformation reduces execution time by a few percent, averaged over a range of programs. This is a rather disappointing result, but we believe that there is potential for improving it considerably.

Deforestation also allows the desugaring of list comprehensions to be simplified considerably, moving a group of optimisations from the desugarer to the deforester; the details are in Gill, Launchbury & Peyton Jones. A sketch of the short-cut rule itself follows.
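To make the short cut concrete, here is a minimal runnable rendering of the build/foldr rule (the names mapB and double are ours):

    {-# LANGUAGE RankNTypes #-}

    -- build abstracts a list over its cons and nil...
    build :: (forall b. (a -> b -> b) -> b -> b) -> [a]
    build g = g (:) []

    -- ...so that the single rewrite rule of short-cut deforestation,
    --      foldr k z (build g)  ===>  g k z
    -- is valid: parametricity guarantees that g can do nothing with the
    -- list constructors except apply them.

    mapB :: (a -> b) -> [a] -> [b]          -- map in build/foldr form
    mapB f xs = build (\c n -> foldr (c . f) n xs)

    double :: Int -> Int
    double x = 2 * x

    -- If sum is written as foldr (+) 0, then sum (mapB double xs)
    -- rewrites to foldr ((+) . double) 0 xs: one pass, no intermediate list.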

Lambda lifting is a well-known transformation that replaces local function declarations with global ones, by adding their free variables as extra parameters (Johnsson). For example, consider the definition:

    f = \x -> letrec g = \y -> ...x...y...g...
              in ...g...

Here x is free in the definition of g. By adding x as an extra argument to g we can transform the definition to:

    f  = \x -> ...(g' x)...
    g' = \x -> \y -> ...x...y...(g' x)...

Some back ends require lambda-lifted programs. Our code generator can handle local functions directly, so lambda lifting is not required. Even so, it turns out that lambda lifting is sometimes beneficial, while on other occasions the reverse is the case: the exact opposite of lambda lifting, lambda dropping (also known as the static argument transformation), sometimes improves performance. Santos's thesis discusses the trade-off in detail. GHC implements both lambda lifting and the static argument transformation; each buys only a small performance gain, a percentage point or two on average.
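For the example above, the static argument transformation runs the derivation in the opposite direction: since g''s first argument x never changes between recursive calls, it can be captured by an enclosing scope instead:

    g' = \x -> \y -> ...x...y...(g' x)...
      ===>
    g' = \x -> letrec g = \y -> ...x...y...g...
               in g

Each recursive call then passes one fewer argument, at the cost of allocating a closure for the local g.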

Separate compilation

GHC makes a serious attempt to propagate transformations across modules. When a module M is compiled, as well as producing M's object code, the compiler also emits M's interface file, which contains, inter alia, three sorts of information:

Scope information: the names and defining module of each class, type, and value exported by M. This tells an importing module what names are exported by M, and hence are brought into scope by importing M.

Type information: the type declarations of the classes, types, and values defined in M, whether exported or not. (Haskell allows an exported value to have a type mentioning type constructors that are not exported, and the compiler needs access to the latter.)

Implementation information: for each value defined in M, the interface file gives:

- its definition in the Core language, if it is smaller than some fixed threshold (this allows it to be inlined at call sites in other modules);

- if it is overloaded, which specialised instances of the value have been compiled;

- its strictness information;

- its linearity information;

- its arity (how many arguments it takes).

Providing such implementation information allows an importing module to take advantage of detailed knowledge about M, but of course it also increases the coupling between modules: if M is recompiled and some of this implementation information changes, then each module that imports M must be recompiled too. We leave the choice to the programmer: implementation information is written into interface files if and only if the -O flag is specified when invoking the compiler. In this way the programmer can trade compilation time for run-time efficiency.

Effect of the transformations

Whilst every transformation we have described was motivated by particular examples, it is far from obvious that each will deliver measurable performance gains when applied to 'average' programs. In this section we present quantitative measurements of some of the most important transformations described above.

Setup

We measured the effect of our transformations on a sample of programs from our nofib test suite (Partain); the exact number varied between experiments, which were carried out over an extended period. Many of these programs are 'real' applications; that is, application programs written by someone other than ourselves to solve a particular problem. None was designed as a benchmark, and they range in size from a few hundred to a few thousand lines of Haskell. Our results are emphatically not best-case results on toy programs.

It is difficult to present the effect of many interacting transformations in a modular way. If we measure the effect of switching them on one at a time, we risk being either over-optimistic (because an otherwise-unoptimised program is a very soft target) or over-pessimistic (because one transformation relies on another to exploit its effects). Switching them off one at a time suffers from the opposite objections, but at least it faithfully indicates the cost of omitting that transformation from a production compiler.

Accordingly, most results are given as percentage changes from the base case, in which all transformations are enabled. A value greater than 100% means more than the base case; less than 100% means less than the base case. Whether that is good or bad depends on what is being measured.

Space precludes listing the results for each individual program. Instead we report just the average effect, where for 'average' we use the geometric mean, since we are averaging performance ratios (Fleming & Wallace).
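That is, for performance ratios r1..rn relative to the base case, we report the n-th root of their product. A one-line Haskell rendering (the name geomean is ours):

    geomean :: [Double] -> Double
    geomean rs = product rs ** (1 / fromIntegral (length rs))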

We concentrate on the following measures

Instruction count: how many instructions are taken to execute the program. This measure is independent of cache locality and paging, which on today's architectures can sometimes dominate all other effects put together. Nevertheless, it is a more portable measure, and our wall-clock-time measurements (which we made as a sanity check) mostly track the instruction-count measure in practice.

Heap allocation: how much heap is allocated by the program during its execution. If an optimisation reduces allocation by a larger factor than instruction count, it is likely that a smaller proportion of instructions are memory cycles, and hence that execution time may decrease by a larger fraction than the instruction count. The reverse is also true, of course.

Maximum residency: the maximum amount of live data during execution. This number directly affects the cost of garbage collection, and is the best measure of the space consumption of a program. The residency numbers were gathered by sampling the amount of live data at frequent intervals, using the garbage collector; frequent sampling means that any spikes in live memory usage are unlikely to be missed.

Binary size: the size of the compiled code, excluding the symbol table.

Compile time

For each set of measurements we recompiled the standard prelude and libraries with the specified set of transformations enabled, so that the results reflect the effect on the entire program, not only on the application part of it.

Like all quantitative measurements (and especially average measurements), the numbers we present should not be read uncritically. In particular:

- The averages often conceal large individual variations. It is not uncommon to find that a particular transformation has a small effect on most programs, but a dramatic effect on a few.

- Gathering these measurements is a substantial exercise, taking gigabytes of disc space and weeks of CPU time. The set of benchmark programs was not the same in every case (hence the varying sample sizes), nor were all the measures collected in every experiment.

Nevertheless, we believe that the figures present a reasonably truthful picture of the relative importance of the different transformations. The sources we cite elsewhere in this paper give much more detailed breakdowns of many of the figures we summarise here.

Overall gains from transformation

[Figure: Overall effect of transformations. Bar chart of average change, relative to the base case in which all optimisations are enabled, in instructions executed, heap allocated, binary size, compilation time, and execution time, for the configurations None, Minimal, and Simplifier-only.]

The overall gains from transformations are presented in the figure above. The x-axis represents various compilation options, each compared to a baseline in which all optimisations are enabled:

None: no transformations at all. It might be argued that this makes the transformations look unreasonably effective, because the desugarer is written on the assumption that a subsequent simplification pass will clear up much of its litter.

Minimal: approximates the effect of a more plausible desugarer with no further transformations. We approximated this setup by performing a single, non-iterative, run of the simplifier with most transformations disabled; the only important transformations that remain are beta reduction and the inlining of trivial bindings (ones that bind a variable to another variable or literal).

Simplifier only: all the global transformations are switched off, leaving only a full run of the simplifier, iterated up to a fixed maximum number of times (although this limit was never reached).

Overall, switching off all transformations increases instruction count dramatically. The fairer Minimal case still increases instruction count a great deal, while switching off everything except the simplifier costs much less. The minimum compilation time (less than half that of full optimisation) is achieved by the Simplifier-only case, so this is what GHC uses when compiling without the -O flag.

The following sections investigate individual transformations in more detail

The simplifier

It does not make sense to measure the effect of switching the simplifier off while leaving all the other transformations on, since they all rely on the simplifier to clean up after them and to exploit their effects. In effect, the simplifier is part of every transformation, so it cannot sensibly be measured in isolation.

[Figure: Frequency of transformations. Pie chart of the relative frequency of the most common transformations: let-float-from-let 32%, inlining 17%, beta-reduction 13%, case-of-known-constructor 10%, let-float-from-application 9%, case-of-case 8%, with let-to-case, let-float-from-case, and others making up the remainder.]

The simplifier implements a large number of separate transformations, and it certainly makes sense to ask how often each is used. The figure above answers this question, giving the relative frequency of the most common transformations.

We did not include in the transformation counts the following two transformations, which would otherwise dominate the pie chart:

- dead code elimination (unused bindings and unreachable case alternatives);

- inlining of trivial bindings (ones that bind variables to other variables or literals).

These two transformations almost always occur as a by-product of some other transformation (which is thereby made easier to implement), so their frequency is mostly a consequence of the implementation strategy of other transformations.

[Figure: Turning off individual transformations. Average change in instructions executed and heap allocated when case-of-case, escape analysis, and eta expansion are each switched off.]

We also measured the effect of turning off individual transformations, one at a time, relative (as always) to the full-optimisation base case. The figure above shows three interesting cases; let-floating is dealt with separately below.

- Switching off the case-of-case transformation increases instruction count substantially.

- An earlier section described how to use let bindings to express join points, asserting that a simple analysis in the code generator suffices to identify these special join-point bindings. The second column in the figure shows that the analysis is not in fact very important: switching it off gives only a small increase in instruction count, albeit with a larger increase in heap allocation.

- Eta expansion has a substantial individual effect: switching it off costs noticeably in both instructions and allocation.

Inlining

The effect of inlining is summarised in the figure below. The x-axis is calibrated by the following inlining strategies:

Off: inlining is turned off, except for trivial bindings of variables to variables or literals.

One occ: trivial bindings, and variables or functions that occur only once in a W-safe context,

are inlined.

[Figure: Effect of inlining strategies. Average change, relative to the base case, in instructions executed, heap allocated, binary size, compilation time, and number of functions inlined, for the strategies Off, One occ, and Threshold(n) for n = 0, 1, 2, 4, 8, 16, 32.]

Threshold(n): any non-recursive binding is inlined if it is W-safe to do so, and either it occurs just once or its space penalty (discussed earlier) is less than the given threshold n.
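A minimal sketch of the Threshold(n) decision, with hypothetical names:

    -- wSafe:   inlining here cannot duplicate work
    -- occs:    number of occurrences of the binder
    -- penalty: the space penalty of inlining the RHS
    shouldInline :: Bool -> Int -> Int -> Int -> Bool
    shouldInline wSafe occs penalty n = wSafe && (occs == 1 || penalty < n)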

As well as the usual measures (instruction count, heap allocation, compilation time), we also show the number of functions that are actually inlined. These graphs show the following effects:

- The bigger effects come directly from inlining variables and functions with one occurrence; reasonable gains then continue up to a modest threshold, beyond which further gains are minimal. Although many more functions are inlined with larger thresholds, this is not reflected in the number of instructions executed; i.e. we quickly reach a point where more inlining is almost useless.

- Binary size remains virtually unaltered, which means that most of the larger functions being inlined do not occur many times in the program.

- Compilation time actually decreases initially, since we end up with less to do in later phases of the compiler. With the highest inline threshold we measured, compilation takes appreciably longer than in the One occ case.

Strictness analysis

The effect of strictness analysis is shown in the figure below: disabling strictness analysis increases execution time considerably, and the number of objects allocated in the heap is also a lot higher, since we will have many more lets.

[Figure: Strictness analysis. Average change in execution time and objects allocated when strictness analysis is turned off.]

Code motion

As mentioned earlier, we identify three kinds of let-floating: floating inwards, full laziness, and local let-floating. The figure below presents the average effects of these let-floating transformations, relative to the fully-optimised base case.

FI off presents the effect of turning off the float-inwards transformation; this increases instruction count by a modest amount.

FL off presents the effect of turning off the full laziness transformation; results here are somewhat variable, but the average penalty is clear.

Local LF off presents the effect of turning off all local let-floating; this too carries a noticeable cost.

All Floating off presents the effect of turning off all let-floating, i.e. floating inwards, full laziness, and all kinds of local let-floating. The total penalty in instruction count is, perhaps surprisingly, about equal to the sum of the three individual effects, and it is substantial. The execution-time sanity check bears this out, showing a corresponding improvement from let-floating.

The figure also shows the effect of the four variants of local let-floating described earlier. In the baseline case (full optimisation) we use the WHNF strategy, because that appears to give the best results. The other three strategies (Never, Strict, and Always) are presented in the columns Local LF off, Local LF strict, and Local LF always. All three are indeed worse than the baseline strategy: Strict and Never are very bad, while Always is only a little worse.

[Figure: Let-floating. Average change, relative to the base case, in instructions executed, heap allocated, maximum residency, average residency, and execution time for the configurations FI off, FL off, Local LF off, Local LF strict, Local LF always, and All Floating off.]

Lessons and conclusions

What general lessons about compilation by transformation have we learned from our experience?

The interaction of theory and practice is genuine, not simply window dressing. Apart from aspects already mentioned (the second-order lambda calculus, linear type systems, strictness and absence analysis), here are three other examples, described elsewhere:

- We make extensive use of monads (Wadler), particularly to express input/output (Peyton Jones & Wadler) and stateful computation (Launchbury & Peyton Jones). Monads allow us to express imperative algorithms in a purely functional setting. In particular, the compiler can freely use its entire armoury of transformations on stateful computations expressed using monads; in contrast, optimising compilers for Lisp or ML must perform some kind of effects analysis to infer which functions are pure and which may have side effects, and many optimisations must be disabled for the latter. Since monads simply make explicit the otherwise-implicit flow dependencies, it is not clear that we get better code than the analyse-and-disable approach, but we certainly get a simpler compiler.

- Parametricity, a deep semantic consequence of polymorphism, turns out to be crucial in establishing the correctness of cheap deforestation (Gill, Launchbury & Peyton Jones) and the secure encapsulation of stateful computation (Launchbury & Peyton Jones).

- GHC's time and space profiler is based on a formal model of cost attribution (Sansom; Sansom & Peyton Jones), an unusual property for a highly operational activity such as profiling. In this case the implementation came first, but the subtleties caused by non-strictness and higher-order functions practically drove us to despair, and forced us to develop a formal foundation.

Plug and play really works. The modular nature of a transformational compiler, and its late commitment to the order of transformation, is a big win. The ability to run a transformation pass twice (at least when going for maximum optimisation) is sometimes very useful. All this really only applies to compiler writers, however: almost all compiler users will be content to use the bundles of flags conjured up by the standard -O ('please optimise') and -O2 ('please optimise a lot') options.

It is hard to estimate the cost of this plug-and-play approach. Would the compiler be faster if several passes were amalgamated into a single giant pass? In one case, namely the simplifier, we have indeed combined many small transformations into a single pass. For the larger transformations the benefits are probably modest: each does a substantial task, so the cost reduction from eliminating the intermediate data structure is probably small, and it would come at a high programming cost. It might be more attractive to develop automatic techniques for fusing successive passes together, perhaps by generalising the short-cut deforestation technique mentioned above to arbitrary data structures and explicitly-recursive functions (Launchbury & Sheard).

The cascade effect is important. One transformation really does expose opportunities for another. Transformational passes are also easier to write in the knowledge that subsequent transformations can be relied on to clean up their results: for example, a transformation that wants to substitute x for y in an expression E can simply produce (\y -> E) x, leaving the simplifier to perform the substitution later.

The compiler needs a lot of bullets in its gun. It is common for one particular transformation to have a dramatic effect on a few programs and a very modest effect on most others. There is no substitute for applying a large number of transformations, each of which will 'hit' some programs.

Some non-obvious transformations are important. We found that it was important to add a significant number of obviously-correct transformations that would never apply directly to any reasonable source program. For example:

    case (error "Wurble") of { ... }   ===>   error "Wurble"

(error is a function that prints its argument string and halts execution; semantically, its value is just bottom.) No programmer would write a case expression that scrutinises a call to error, but such case expressions certainly show up after transformation. For example, consider the expression

    if head xs then E1 else E2

After desugaring and inlining head, we get:

    case (case xs of { [] -> error "head"; (p:ps) -> p }) of
      True  -> E1
      False -> E2

Applying the case-of-case transformation (described earlier) makes one copy of the outer case scrutinise the call to error.
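That is, we obtain something like the following (a sketch; in practice E1 and E2 are shared via join points, as described earlier, rather than duplicated):

    case xs of
      []     -> case (error "head") of { True -> E1; False -> E2 }
      (p:ps) -> case p of { True -> E1; False -> E2 }

The first alternative is now exactly a case expression scrutinising a call to error, which the transformation above collapses to error "head".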

Other examples of non-obvious transformations include eta expansion and absence analysis, both discussed earlier. We identified these extra transformations by eyeballing the code produced by the transformation system, looking for code that could be improved.

Elegant generalisations of traditional optimisations have often cropped up, which either extend the reach of an optimisation or express it as a special case of some other transformation that is already required. Examples include jump elimination, copy propagation, boolean short-circuiting, and loop-invariant code motion. Similar generalisations are discussed by Steele.

Maintaining types is a big win. It is sometimes tiresome, but never difficult, for each transformation to maintain type correctness. (This is true for the transformations we have implemented; some transformations, notably closure conversion, require more work, and indeed a more sophisticated type system than the second-order lambda calculus (Harper & Morrisett).) On the other hand, it is sometimes indispensable to know the type of an expression, notably during strictness analysis. Maintaining types throughout compilation is becoming more popular (Shao & Appel; Tarditi et al).

Perhaps the largest single benefit came from an unexpected quarter: it is very easy to check a Core program for type correctness. While developing the compiler we run Core Lint (the Core type-checker) after every transformation pass, which turns out to be an outstandingly good way to detect incorrect transformations. Before we used Core Lint, bogus transformations usually led to a core dump when running the transformed program, followed by a long gdb hunt to isolate the cause. Now, most bogus transformations are identified much earlier, and much more precisely. One of the stupidest things we did was to delay writing Core Lint.

Cross-module optimisation is important. Functional programmers make heavy use of libraries, abstract data types, and modules. It is highly desirable that inlining, strictness analysis, specialisation, and so on work between modules. For example, many abstract data types export very small functions that would probably be implemented as macros in C; with cross-module inlining they can be inlined at every call site, gaining most of the advantages of macros without the burden of the macro processor's strange semantics. Like the object-oriented community (Chambers, Dean & Grove), we regard a serious assault on global, cross-module optimisation as the most plausible next 'big win'.


Acknowledgements

The Glasgow Haskell Compiler was built by many people, including Will Partain, Jim Mattson, Kevin Hammond, Andy Gill, Patrick Sansom, Cordelia Hall, and Simon Marlow. I'm very grateful to Sigbjorn Finne, Hanne Nielson, Will Partain, Patrick Sansom, and Phil Trinder, and to three anonymous referees, for helpful feedback on drafts of this paper.

The Glasgow Haskell Compiler is freely available at

    http://www.dcs.gla.ac.uk/fp/software/ghc/

References

A. V. Aho, R. Sethi & J. D. Ullman, Compilers: principles, techniques and tools, Addison-Wesley.

A. W. Appel, Compiling with Continuations, Cambridge University Press.

A. W. Appel & T. Jim, Shrinking lambda-expressions in linear time, Department of Computer Science, Princeton University.

Z. Ariola, M. Felleisen, J. Maraist, M. Odersky & P. Wadler, A call-by-need lambda calculus, in ACM Symposium on Principles of Programming Languages (POPL), San Francisco, ACM.

L. Augustsson, Compiling lazy functional languages, part II, PhD thesis, Dept of Computer Science, Chalmers University, Sweden.

D. F. Bacon, S. L. Graham & O. J. Sharp, Compiler transformations for high-performance computing, ACM Computing Surveys.

B. Calder, D. Grunwald & B. Zorn, Quantifying behavioural differences between C and C++ programs, Journal of Programming Languages.

C. Chambers, J. Dean & D. Grove, A framework for selective recompilation in the presence of complex intermodule dependencies, in Proc International Conference on Software Engineering, Seattle.

C. D. Clack & S. L. Peyton Jones, Generating parallelism from strictness analysis, Internal Note, Department of Computer Science, University College London.

J. W. Davidson & A. M. Holler, A study of a C function inliner, Software: Practice and Experience.

C. Flanagan, A. Sabry, B. Duba & M. Felleisen, The essence of compiling with continuations, SIGPLAN Notices.

P. J. Fleming & J. J. Wallace, How not to lie with statistics: the correct way to summarise benchmark results, CACM.

P. Fradet & D. Le Métayer, Compilation of functional languages by program transformation, ACM Transactions on Programming Languages and Systems.

A. Gill, J. Launchbury & S. L. Peyton Jones, A short cut to deforestation, in Proc Functional Programming Languages and Computer Architecture (FPCA), Copenhagen, ACM.

A. J. Gill, Cheap deforestation for non-strict functional languages, PhD thesis, Department of Computing Science, Glasgow University.

J.-Y. Girard, Une extension de l'interprétation de Gödel à l'analyse, et son application à l'élimination de coupures dans l'analyse et la théorie des types, in 2nd Scandinavian Logic Symposium, J. E. Fenstad (ed), North Holland.

R. Harper & G. Morrisett, Compiling polymorphism using intensional type analysis, in ACM Symposium on Principles of Programming Languages (POPL), San Francisco, ACM.

P. Hudak, S. L. Peyton Jones, P. L. Wadler, Arvind, B. Boutel, J. Fairbairn, J. Fasel, M. Guzman, K. Hammond, J. Hughes, T. Johnsson, R. Kieburtz, R. S. Nikhil, W. Partain & J. Peterson, Report on the functional programming language Haskell, Version 1.2, SIGPLAN Notices.

R. J. M. Hughes, The design and implementation of programming languages, PhD thesis, Programming Research Group, Oxford.

T. Johnsson, Lambda lifting: transforming programs to recursive equations, in Proc IFIP Conference on Functional Programming and Computer Architecture, Jouannaud (ed), LNCS, Springer Verlag.

M. P. Jones, Dictionary-free overloading by partial evaluation, in ACM SIGPLAN Workshop on Partial Evaluation and Semantics-Based Program Manipulation (PEPM), Orlando, ACM.

R. Kelsey, Compilation by program transformation, PhD thesis, Department of Computer Science, Yale University.

R. Kelsey & P. Hudak, Realistic compilation by program transformation, in Proc ACM Conference on Principles of Programming Languages, ACM.

D. A. Kranz, ORBIT: an optimising compiler for Scheme, PhD thesis, Department of Computer Science, Yale University.

D. A. Kranz, R. Kelsey, J. Rees, P. Hudak, J. Philbin & N. Adams, ORBIT: an optimising compiler for Scheme, in Proc SIGPLAN Symposium on Compiler Construction, ACM.

J. Launchbury, A natural semantics for lazy evaluation, in ACM Symposium on Principles of Programming Languages (POPL), Charleston, ACM.

J. Launchbury & S. L. Peyton Jones, Lazy functional state threads, in SIGPLAN Symposium on Programming Language Design and Implementation (PLDI), Orlando, ACM.

J. Launchbury & T. Sheard, Warm fusion, in Proc Functional Programming Languages and Computer Architecture (FPCA), La Jolla, ACM.

S. Marlow, Deforestation for higher-order functional programs, PhD thesis, Department of Computing Science, University of Glasgow.

R. Morrison, A. Dearle, R. C. H. Connor & A. L. Brown, An ad hoc approach to the implementation of polymorphism, ACM Transactions on Programming Languages and Systems.

A. Ohori, A compilation method for ML-style polymorphic record calculi, in ACM Symposium on Principles of Programming Languages (POPL), Albuquerque, ACM.

W. D. Partain, The nofib benchmark suite of Haskell programs, in Functional Programming, Glasgow, J. Launchbury & P. M. Sansom (eds), Workshops in Computing, Springer Verlag.

S. L. Peyton Jones, The Implementation of Functional Programming Languages, Prentice Hall.

S. L. Peyton Jones, Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine, Journal of Functional Programming.

S. L. Peyton Jones, Compilation by transformation: a report from the trenches, in European Symposium on Programming (ESOP), Linkoping, Sweden, LNCS, Springer Verlag.

S. L. Peyton Jones, C. V. Hall, K. Hammond, W. D. Partain & P. L. Wadler, The Glasgow Haskell compiler: a technical overview, in Proceedings of Joint Framework for Information Technology Technical Conference, Keele, DTI/SERC.

S. L. Peyton Jones & J. Launchbury, Unboxed values as first class citizens, in Functional Programming Languages and Computer Architecture (FPCA), Boston, Hughes (ed), LNCS, Springer Verlag.

S. L. Peyton Jones & D. Lester, A modular fully-lazy lambda lifter in Haskell, Software: Practice and Experience.

S. L. Peyton Jones & W. D. Partain, Measuring the effectiveness of a simple strictness analyser, in Functional Programming, Glasgow, K. Hammond & J. T. O'Donnell (eds), Workshops in Computing, Springer Verlag.

S. L. Peyton Jones, W. D. Partain & A. Santos, Let-floating: moving bindings to give faster programs, in Proc International Conference on Functional Programming (ICFP), Philadelphia, ACM.

S. L. Peyton Jones & A. Santos, Compilation by transformation in the Glasgow Haskell Compiler, in Functional Programming, Glasgow, K. Hammond, D. N. Turner & P. M. Sansom (eds), Workshops in Computing, Springer Verlag.

S. L. Peyton Jones & P. L. Wadler, Imperative functional programming, in ACM Symposium on Principles of Programming Languages (POPL), Charleston, ACM.

J. C. Reynolds, Towards a theory of type structure, in International Programming Symposium, Springer Verlag LNCS.

P. M. Sansom, Execution profiling for non-strict functional languages, PhD thesis, Department of Computer Science, University of Glasgow (also available as a technical report, by FTP from ftp.dcs.glasgow.ac.uk).

P. M. Sansom & S. L. Peyton Jones, Time and space profiling for non-strict higher-order functional languages, in ACM Symposium on Principles of Programming Languages (POPL), San Francisco, ACM.

A. Santos, Compilation by transformation in non-strict functional languages, PhD thesis, Department of Computing Science, Glasgow University.

Z. Shao & A. W. Appel, A type-based compiler for Standard ML, in SIGPLAN Symposium on Programming Language Design and Implementation (PLDI), La Jolla, ACM.

G. L. Steele, Rabbit: a compiler for Scheme, AI Technical Report, MIT Lab for Computer Science.

D. Tarditi, G. Morrisett, P. Cheng, C. Stone, R. Harper & P. Lee, TIL: a type-directed optimizing compiler for ML, in SIGPLAN Symposium on Programming Language Design and Implementation (PLDI), Philadelphia, ACM.

A. Tolmach, Tag-free garbage collection using explicit type parameters, in ACM Symposium on Lisp and Functional Programming, Orlando, ACM.

D. N. Turner, P. L. Wadler & C. Mossin, Once upon a type, in Proc Functional Programming Languages and Computer Architecture (FPCA), La Jolla, ACM.

P. L. Wadler, Deforestation: transforming programs to eliminate trees, Theoretical Computer Science.

P. L. Wadler, The essence of functional programming, in ACM Symposium on Principles of Programming Languages (POPL), Albuquerque, ACM.