Selective Lambda Lifting
SEBASTIAN GRAF, Karlsruhe Institute of Technology, Germany
SIMON PEYTON JONES, Microsoft Research, UK

Lambda lifting is a well-known transformation, traditionally employed for compiling functional programs to supercombinators. However, more recent abstract machines for functional languages like OCaml and Haskell tend to do closure conversion instead for direct access to the environment, so lambda lifting is no longer necessary to generate machine code. We propose to revisit selective lambda lifting in this context as an optimising code generation strategy and conceive heuristics to identify beneficial lifting opportunities. We give a static analysis for estimating impact on heap allocations of a lifting decision. Performance measurements of our implementation within the Glasgow Haskell Compiler on a large corpus of Haskell benchmarks suggest modest speedups.

CCS Concepts: • Software and its engineering → Compilers; Functional languages; Procedures, functions and subroutines;

Additional Key Words and Phrases: Haskell, Lambda Lifting, Spineless Tagless G-machine, Compiler Optimization

ACM Reference Format:
Sebastian Graf and Simon Peyton Jones. 2019. Selective Lambda Lifting. Proc. ACM Program. Lang. 1, ICFP, Article 1 (January 2019), 17 pages.

1 INTRODUCTION

The ability to define nested auxiliary functions referencing variables from outer scopes is essential when programming in functional languages. Take this Haskell function as an example:

    f a 0 = a
    f a n = f (g (n `mod` 2)) (n - 1)
      where
        g 0 = a
        g n = 1 + g (n - 1)

To generate code for nested functions like g, a typical compiler either applies lambda lifting or closure conversion. The Glasgow Haskell Compiler (GHC) chooses to do closure conversion [Peyton Jones 1992]. In doing so, it allocates a closure for g on the heap, with an environment containing an entry for a.
Now imagine we lambda lifted g before closure conversion:

    g↑ a 0 = a
    g↑ a n = 1 + g↑ a (n - 1)

    f a 0 = a
    f a n = f (g↑ a (n `mod` 2)) (n - 1)

The closure for g and the associated heap allocation completely vanished in favour of a few more arguments at the call site! The result looks much simpler. And indeed, in concert with the other optimisations within GHC, the above transformation makes f effectively non-allocating, resulting in a speedup of 50%.

Authors' addresses: Sebastian Graf, Karlsruhe Institute of Technology, Karlsruhe, Germany; Simon Peyton Jones, Microsoft Research, Cambridge, UK.
2019. 2475-1421/2019/1-ART1 $15.00 https://doi.org/
Proc. ACM Program. Lang., Vol. 1, No. ICFP, Article 1. Publication date: January 2019.

So should we just perform this transformation on any candidate? We have to disagree. Consider what would happen to the following program:

    f :: [Int] -> [Int] -> Int -> [Int]
    f a b 0 = a
    f a b 1 = b
    f a b n = f (g n) a (n `mod` 2)
      where
        g 0 = a
        g 1 = b
        g n = n : h
          where
            h = g (n - 1)

Because of laziness, this will allocate a thunk for h. Closure conversion will then allocate an environment for h on the heap, closing over g. Lambda lifting yields:

    g↑ a b 0 = a
    g↑ a b 1 = b
    g↑ a b n = n : h
      where
        h = g↑ a b (n - 1)

    f a b 0 = a
    f a b 1 = b
    f a b n = f (g↑ a b n) a (n `mod` 2)

The closure for g is gone, but h now closes over n, a and b instead of n and g. Moreover, this h-closure is allocated for each iteration of the loop, so we have reduced allocation by one closure for g, but increased allocation by one word in each of N allocations of h. Apart from making f allocate 10% more, this also incurs a slowdown of more than 10%.

So lambda lifting is sometimes beneficial, and sometimes harmful: we should do it selectively.
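Both running examples can be written out as ordinary Haskell to check that lifting preserves results. The following is a minimal sketch, not the paper's implementation: the names f1, g1Lifted, f1Lifted, f2, g2Lifted, f2Lifted and agree are ours, and the result type of the second example is written as [Int] so that it type-checks. The sketch says nothing about allocation; observing that would require compiling with GHC and inspecting runtime statistics (for instance by running the program with +RTS -s), and the outcome depends on the optimisation level.

```haskell
-- First example: the local g closes over a.
f1 :: Int -> Int -> Int
f1 a 0 = a
f1 a n = f1 (g (n `mod` 2)) (n - 1)
  where
    g 0 = a
    g m = 1 + g (m - 1)

-- Lambda-lifted variant: the former free variable a becomes a parameter.
g1Lifted :: Int -> Int -> Int
g1Lifted a 0 = a
g1Lifted a m = 1 + g1Lifted a (m - 1)

f1Lifted :: Int -> Int -> Int
f1Lifted a 0 = a
f1Lifted a n = f1Lifted (g1Lifted a (n `mod` 2)) (n - 1)

-- Second example: the thunk h closes over m and g.
f2 :: [Int] -> [Int] -> Int -> [Int]
f2 a _ 0 = a
f2 _ b 1 = b
f2 a b n = f2 (g n) a (n `mod` 2)
  where
    g 0 = a
    g 1 = b
    g m = m : h
      where
        h = g (m - 1)

-- Lambda-lifted variant: h now closes over m, a and b instead.
g2Lifted :: [Int] -> [Int] -> Int -> [Int]
g2Lifted a _ 0 = a
g2Lifted _ b 1 = b
g2Lifted a b m = m : h
  where
    h = g2Lifted a b (m - 1)

f2Lifted :: [Int] -> [Int] -> Int -> [Int]
f2Lifted a _ 0 = a
f2Lifted _ b 1 = b
f2Lifted a b n = f2Lifted (g2Lifted a b n) a (n `mod` 2)

-- Compare original and lifted versions on a few small, non-negative inputs.
agree :: Bool
agree =
  and [f1 a n == f1Lifted a n | a <- [0 .. 3], n <- [0 .. 20]]
    && and [f2 [7] [8, 9] n == f2Lifted [7] [8, 9] n | n <- [0 .. 20]]

main :: IO ()
main = print agree
```

Only non-negative arguments are tested, since the local g of either example does not terminate on negative input.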
This work is concerned with identifying exactly when lambda lifting improves performance, providing a new angle on the interaction between lambda lifting and closure conversion. These are our contributions:

• We derive a number of heuristics fueling the lambda lifting decision from concrete operational deficiencies in section 3.
• Integral to one of the heuristics, in section 4 we provide a static analysis estimating closure growth, conservatively approximating the effects of a lifting decision on the total allocations of the program.
• We implemented our lambda lifting pass in the Glasgow Haskell Compiler as part of its optimisation pipeline, operating on its Spineless Tagless G-machine (STG) language. The decision to do lambda lifting this late in the compilation pipeline is a natural one, given that accurate allocation estimates aren't easily possible on GHC's more high-level Core language.
• We evaluate our pass against the nofib benchmark suite (section 6) and find that our static analysis soundly predicts changes in heap allocations. The measurements confirm the reasoning behind our heuristics in section 3.

Our approach builds on and is similar to many previous works, which we compare to in section 7.

    Variables         f, g, x, y ∈ Var
    Expressions       e ∈ Expr ::= x              Variable
                                |  f x̄            Function call
                                |  let b in e     Recursive let
    Bindings          b ∈ Bind ::= f = r
    Right-hand sides  r ∈ Rhs  ::= λx̄ → e
    Programs          p ∈ Prog ::= f x̄ = e; e'

    Fig. 1. An STG-like untyped lambda calculus

2 OPERATIONAL BACKGROUND

Typically, the choice between lambda lifting and closure conversion for code generation is mutually exclusive and is dictated by the targeted abstract machine, like the G-machine [Kieburtz 1985] or the Spineless Tagless G-machine [Peyton Jones 1992], as is the case for GHC.
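The grammar of fig. 1 can be transcribed into a small Haskell data type, together with the free-variable computation that determines what a let-bound lambda closes over. This is a sketch under our own assumptions: the constructor names, the use of lists for argument, parameter and binding sequences, and the helpers freeVars and freeVarsRhs are hypothetical and not taken from the paper or from GHC. Unlike a real compiler, the sketch does not distinguish top-level functions, so their names show up as free variables too.

```haskell
import Data.List (nub, (\\))

type Var = String

-- Expressions: variable, function call with atomic arguments, recursive let.
data Expr
  = V Var
  | App Var [Var]
  | Let [Bind] Expr
  deriving (Show)

-- A binding f = r, where the right-hand side r is a lambda abstraction.
data Bind = Bind Var Rhs
  deriving (Show)

data Rhs = Lam [Var] Expr
  deriving (Show)

-- Free variables of an expression, without duplicates.
freeVars :: Expr -> [Var]
freeVars (V x) = [x]
freeVars (App f xs) = nub (f : xs)
freeVars (Let bs e) =
  nub (concatMap bindFvs bs ++ freeVars e) \\ [f | Bind f _ <- bs]
  where
    bindFvs (Bind _ rhs) = freeVarsRhs rhs

-- Free variables of a right-hand side: those of the body minus the parameters.
-- These are the variables a closure for the binding would have to capture.
freeVarsRhs :: Rhs -> [Var]
freeVarsRhs (Lam xs e) = freeVars e \\ xs

-- let f = \a b -> k x y a b in f p q   (k stands in for an unspecified body)
exampleRhs :: Rhs
exampleRhs = Lam ["a", "b"] (App "k" ["x", "y", "a", "b"])

exampleLet :: Expr
exampleLet = Let [Bind "f" exampleRhs] (App "f" ["p", "q"])

main :: IO ()
main = do
  print (freeVarsRhs exampleRhs)
  print (freeVars exampleLet)
```

Lists with nub keep the sketch within base; a set type would be the more usual choice for larger programs.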
Let's clear up what we mean by doing lambda lifting before closure conversion and the operational effect of doing so.

2.1 Language

Although the STG language is tiny compared to typical surface languages such as Haskell, its definition [Marlow and Jones 2004] still contains much detail irrelevant to lambda lifting. This section will therefore introduce an untyped lambda calculus that will serve as the subject of optimisation in the rest of the paper.

2.1.1 Syntax. As can be seen in fig. 1, we extended untyped lambda calculus with let bindings, just as in Johnsson [1985]. Inspired by STG, we also assume A-normal form (ANF) [Sabry and Felleisen 1993]:

• Every lambda abstraction is the right-hand side of a let binding
• Arguments and heads in an application expression are all atomic (e.g., variable references)

Throughout this paper, we assume that variable names are globally unique. Similar to Johnsson [1985], programs are represented by a group of top-level bindings and an expression to evaluate.

Whenever there's an example in which the expression to evaluate is not closed, assume that free variables are bound in some outer context omitted for brevity. Examples may also compromise on adhering to ANF for readability (in particular regarding giving all complex subexpressions a name), but we will point out the details if need be.

2.1.2 Semantics. Since our calculus is a subset of the STG language, its semantics follows directly from Marlow and Jones [2004]. An informal treatment of operational behavior is still in order to express the consequences of lambda lifting.

Since every application only has trivial arguments, all complex expressions had to be bound to a let in a prior compilation step. Consequently, heap allocation happens almost entirely at let bindings closing over free variables of their RHSs, with the exception of intermediate partial applications resulting from over- or undersaturated calls.

Put plainly: If we manage to get rid of a let binding, we get rid of one source of heap allocation since there is no closure to allocate during closure conversion.

2.2 Lambda Lifting vs. Closure Conversion

The trouble with nested functions is that nobody has come up with concrete, efficient computing architectures that can cope with them natively. Compilers therefore need to rewrite local functions in terms of global definitions and auxiliary heap allocations.

One way of doing so is closure conversion, where references to free variables are lowered as field accesses on a record containing all free variables of the function, the closure environment. The environment is passed as an implicit parameter to the function body, which in turn is insensitive to lexical scope and can be floated to top-level. After this lowering, all functions are regarded as closures: a pair of a code pointer and an environment.

    data EnvF = EnvF { x :: Int, y :: Int }

    let f = λa b → ... x ... y ...      CC f     f⋆ env a b = ... x env ... y env ...;
    in f 4 2                           ====>    let f = (f⋆, EnvF x y)
                                                in (fst f) (snd f) 4 2

Closure conversion leaves behind a heap-allocated let binding for the closure¹.

Compare this to how lambda lifting gets rid of local functions. Johnsson [1985] introduced it for efficient code generation of lazy functional languages to G-machine code [Kieburtz 1985]. Lambda lifting converts all free variables of a function body into parameters. The resulting function body can be floated to top-level, but all call sites must be fixed up to include its former free variables.

    let f = λa b → ... x ... y ...      LL f     f↑ x y a b = ... x ... y ...;
    in f 4 2                           ====>    f↑ x y 4 2

The key difference from closure conversion is that there is no heap allocation at f's former definition site anymore.
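Both lowerings of f can be spelled out as ordinary Haskell. The following is a minimal sketch under stated assumptions: the paper leaves the body of f unspecified, so x * a + y * b is a stand-in of ours, and the names envX, envY, fCode, fLifted, original, viaClosure and viaLifting are hypothetical. It illustrates the shape of the two transformations at source level only; GHC performs them on STG, and its optimiser may well eliminate the pair built in viaClosure.

```haskell
-- Environment record holding the free variables x and y of f.
data EnvF = EnvF {envX :: Int, envY :: Int}

-- Closure-converted code for f: free variables become field accesses
-- on an explicitly passed environment.
fCode :: EnvF -> Int -> Int -> Int
fCode env a b = envX env * a + envY env * b

-- Lambda-lifted f: free variables become leading parameters.
fLifted :: Int -> Int -> Int -> Int -> Int
fLifted x y a b = x * a + y * b

-- The original program, with f as a local function closing over x and y.
original :: Int -> Int -> Int
original x y =
  let f = \a b -> x * a + y * b
   in f 4 2

-- After closure conversion: a let binding for the closure remains,
-- here modelled as a pair of code and environment.
viaClosure :: Int -> Int -> Int
viaClosure x y =
  let f = (fCode, EnvF x y)
   in fst f (snd f) 4 2

-- After lambda lifting: no let binding remains, only extra arguments
-- at the call site.
viaLifting :: Int -> Int -> Int
viaLifting x y = fLifted x y 4 2

main :: IO ()
main = print (original 3 5, viaClosure 3 5, viaLifting 3 5)
```

All three variants compute the same value; what differs is whether a binding for f survives at its former definition site.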