
Making a Faster Curry with Extensional Types

Paul Downen, Zachary Sullivan, Zena M. Ariola — University of Oregon, Eugene, Oregon, USA
Simon Peyton Jones — Microsoft Research, Cambridge, UK

Abstract

Curried functions apparently take one argument at a time, which is slow. So optimizing compilers for higher-order languages invariably have some mechanism for working around currying by passing several arguments at once, as many as the function can handle, which is known as its arity. But such mechanisms are often ad-hoc, and do not work at all in higher-order functions. We show how extensional, call-by-name functions have the correct behavior for directly expressing the arity of curried functions. And these extensional functions can stand side-by-side with functions native to practical programming languages, which do not use call-by-name evaluation. Integrating call-by-name with other evaluation strategies in the same intermediate language expresses the arity of a function in its type and gives a principled and compositional account of multi-argument curried functions. An unexpected, but significant, bonus is that our approach is equally suitable for a call-by-value language and a call-by-need language, and it can be readily integrated into an existing compilation framework.

CCS Concepts • Software and its engineering → Semantics; Compilers.

Keywords arity, extensionality, type systems

ACM Reference Format:
Paul Downen, Zachary Sullivan, Zena M. Ariola, and Simon Peyton Jones. 2019. Making a Faster Curry with Extensional Types. In Proceedings of the 12th ACM SIGPLAN International Haskell Symposium (Haskell '19), August 22–23, 2019, Berlin, Germany. ACM, New York, NY, USA, 13 pages. https://doi.org/10.1145/3331545.3342594

1 Introduction

Consider these two definitions:

    f1 = λx. let z = h x x in λy. e y z
    f2 = λx. λy. let z = h x x in e y z

It is highly desirable for an optimizing compiler to η expand f1 into f2. The function f1 takes only a single argument before returning a heap-allocated function closure; then that closure must subsequently be called by passing the second argument. In contrast, f2 can take both arguments at once, without constructing an intermediate closure, and this can make a huge difference to run-time performance in practice [Marlow and Peyton Jones 2004]. But this η expansion could be bad if (h x x) was expensive, because in a call like (map (f2 3) xs), the expensive computation of (h 3 3) would be performed once for each element of xs, whereas in (map (f1 3) xs) the value of (h 3 3) would be computed only once. An optimizing compiler should not cause an asymptotic slow-down!

So the question becomes "does (h x x) do serious work?" We should transform f1 into f2 only if it does not. But what exactly do we mean by "serious work?" For example, suppose we knew that h was defined as h = λp.λr.λq.blah; that is, h cannot begin to do any work until it is given three arguments. In this case, it is almost certainly a good idea to η expand f1 and replace it with f2. For this reason GHC—an optimizing compiler for Haskell—keeps track of the arity of every in-scope variable, such as h, and uses that information to guide such transformations.
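To see the trade-off concretely, here is a small runnable Haskell sketch of the two definitions above; the functions h and e are made up for illustration, and the program should be compiled without optimization (-O0) so that GHC's own full-laziness pass does not itself float the shared computation and hide the difference.

    import Debug.Trace (trace)

    -- A stand-in for an expensive function h; the trace message marks each
    -- time the "serious work" actually happens.
    h :: Int -> Int -> Int
    h a b = trace "computing h" (a * b + a + b)

    e :: Int -> Int -> Int
    e y z = y + z

    f1 :: Int -> Int -> Int
    f1 = \x -> let z = h x x in \y -> e y z   -- arity 1: z is shared by every later call
    f2 :: Int -> Int -> Int
    f2 = \x -> \y -> let z = h x x in e y z   -- arity 2: z is recomputed on every call

    main :: IO ()
    main = do
      print (sum (map (f1 3) [1 .. 5]))   -- "computing h" printed once
      print (sum (map (f2 3) [1 .. 5]))   -- "computing h" printed five times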
This notion of arity is not unique to Haskell or GHC: the same impact of η expanding f1 to f2, with its potential performance improvement and risk of disastrous slow-down, is just as applicable in eager languages like OCaml [Dargaye and Leroy 2009]. In fact, the risks are even more dire in OCaml, where inappropriate η expansion can change the result of a program due to side effects like exceptions and mutable state. So arity information is crucial for correctly applying useful optimizations.

The problem is that the very notion of "arity" is a squishy, informal one, rooted in operational intuitions rather than solid invariants or principled theory. In this paper we resolve that problem by characterizing arity formally in a type system. However, the way we do so differs from previous work [Bolingbroke and Peyton Jones 2009], which integrates types with calling conventions (by which they mean arity) and would require quite a large change for GHC to move from a lazy-by-default to a strict-by-default intermediate language. Instead, our idea is to leverage multiple calling conventions (by which we mean, e.g., call-by-name versus call-by-value) to solve the issues surrounding arity and η laws with types. The advantage of this approach is that it is less monolithic, and can be added to a variety of existing systems without disrupting orthogonal parts of their semantics or implementation, making it easier to incorporate into a serious compiler like GHC. Supporting this idea, we make these contributions:

• (Section 3) We describe our approach by starting with a standard call-by-value polymorphic λ-calculus, Fv, and extending it with a new function type that can express higher arities. We call the extended language eFv, and give a self-contained meaning of arity in terms of the type system and equational theory of eFv.
• (Section 4) To demonstrate the minimal impact on an existing implementation, we show the extension to a standard call-by-value abstract machine which is needed to perform fast higher-arity function calls, but otherwise keeping the same semantics as Fv.
• (Section 5) Since we intend to preserve the original semantics of the above machine, we need to compile eFv programs in a way that makes their evaluation strategy explicit to a stock call-by-value implementation. We show how to execute our extended intermediate language by performing a simple translation from eFv to eFv⁻ based on η expansion, which is shown to be correct.
• (Section 6) This approach does not just apply to call-by-value languages. We explain how the same extension can be made to a call-by-need λ-calculus by likewise formalizing the notion of arity in Haskell programs.

We believe our approach could be integrated well with levity polymorphism [Eisenberg and Peyton Jones 2017] (discussed in Section 7) by extending their solution for polymorphism of non-uniform representations to polymorphism of non-uniform arities. Unlike previous work on optimizing curried functions (which we discuss in Section 8) our approach grows directly from deep roots in logic; in particular call-by-push-value [Levy 2001] and polarity [Danos et al. 1997; Munch-Maccagnoni 2013; Zeilberger 2009], which tell us the optimal evaluation strategy for a type based on its logical structure. Polarity brings two optimizations for compiling lazy functional languages—arity and call-by-value data representations [Peyton Jones and Launchbury 1991]—under the same umbrella.

2 The Key Idea

Informally, we say the arity of a function is the number of arguments it must receive before "doing serious work." So if f has arity 3, then f, (f 3), and (f 3 9) are all partial applications of f; the only time f will compute anything is when it is given three arguments. We begin by explaining why arity is important, before intuitively introducing our new approach.

2.1 Motivation

There are several reasons why an optimizing compiler might care about the arity of a function:

• A function of arity n can be compiled into code that takes its n arguments simultaneously, passed in machine registers. This is much, much faster than taking arguments one at a time, and returning an intermediate function closure after each application.
• In Haskell, the expression (seq e1 e2) evaluates e1 to weak head-normal form, and then returns e2. Now suppose we define

    loop1, loop2 :: Int -> Int
    loop1 = \x -> loop1 x
    loop2 = loop2

(seq loop1 True) evaluates loop1 to a function closure, and returns True. In contrast, (seq loop2 True) simply diverges because loop2 diverges. So Haskell terms do not always enjoy η equivalence; in general, λx.e x ≠ e. But η equivalence does hold if e has arity greater than 0—in other words, if e is sure to evaluate to a closure—so knowing the arity of an expression can unlock optimizations such as discarding unnecessary seq calls. For example, (seq loop1 True) becomes True since loop1 has arity 1. But this arity information is not seen in the type: loop1 and loop2 have the same type, but different arities (1 versus 0). (A runnable illustration of this point appears below.)
• The trouble with limiting η equivalence is not unique to seq in Haskell: the same issue arises in eager functional languages, too. For example, consider similar definitions in OCaml:

    let rec loop1 x = loop1 x;;
    let rec loop2 x y = loop2 x y;;

As before, loop1 and loop2 appear to be η equivalent, but they are not the same function. For example, let f = loop1 5 in true diverges whereas let f = loop2 5 in true returns true. This is because with eager evaluation, all closures are computed in advance when bound; loop2 5 evaluates to a closure but loop1 5 loops forever. So both eager and lazy functional languages can have the same essential problem with restricted η equivalence, which can block optimizations.
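The seq point above can be observed directly in GHC; a minimal sketch (the second print never returns, so it is left commented out):

    loop1, loop2 :: Int -> Int
    loop1 = \x -> loop1 x   -- arity 1: already a function closure
    loop2 = loop2           -- arity 0: evaluating it never reaches a closure

    main :: IO ()
    main = do
      print (seq loop1 True)     -- prints True: loop1 is in weak head-normal form
      -- print (seq loop2 True)  -- never returns (GHC's runtime may report it as <<loop>>)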

• Fast calls for higher-order functions are a problem in • Values of type (σ { τ ), such as f1 or f2, are still fully GHC and OCaml today. Consider this higher-order first-class: they can be passed to a function, returned function: as a result, or stored in a data structure. zipWith :: (a -> b -> ) -> [a] -> [b] -> [c] • At run-time, if f : τ1 { ... τn { ρ, where ρ is not zipWith f (a:as) (b:bs) = f a b : zipWith f as bs of form ρ1 { ρ2, then f is bound to a heap-allocated zipWith f _ _ = [] function closure of arity exactly n. This differs from The call to f in the body of zipWith is an “unknown an ordinary function д : τ1 → ... τn → ρ in a call-by- call” [Marlow and Peyton Jones 2004] where the com- value language because there is no guarantee that д is piler knows nothing about the arity of f. It might bound to a closure which takes n arguments before per- expect one argument, or two, or even three (so that forming work. And it is even further different from an zipWith would return a list of functions). Such un- ordinary function in a call-by-need language because known calls impose runtime overhead, which is frus- д itself might be bound to an unevaluated thunk. trating because the vastly-common case is that f is an • But what if the result type ρ was a type variable a? Is arity-2 function. it possible that a could be instantiated by (ρ1 { ρ2), thereby changing f ’s arity and making nonsense of 2.2 The Key Idea: A New Function Arrow our claim that types statically encode arity? No: in our system you cannot instantiate a type variable with Our key idea is to add a new function type (σ { τ ), intro- such a type, which corresponds to similar existing duced with λex.e, alongside the existing one (σ → τ ). With this idea, we can express that zipWith’s first argument f restrictions in GHC. For example, GHC does not let is a function that takes precisely two arguments with the type variable be instantiated with an unboxed, or un- following type: lifted type [Eisenberg and Peyton Jones 2017]. This restriction is enforced by our type system and our new f :: a b c { { function arrow fits neatly into this framework. Now the call in zipWith’s body can be fast, passing two arguments in registers with no runtime arity checks. This 2.3 Adding Higher-Arity Functions to an new type can be added locally, when compiling zipWith, Intermediate Language without looking at its call sites (see Section 2.3). One of the major advantages to our approach to function The new type has the following intuitions: arity is that it is non-disruptive to orthogonal concerns of • Terms of type (σ { τ ) enjoy unconditional η equiva- an intermediate language. Rather than a monolithic solu- tion like [Bolingbroke and Peyton Jones 2009], requiring lence: if e : σ { τ then (λex. e x) = e. • The type (σ { τ ) is unlifted; that is, it is not inhabited significant investment by changing the entire language, we by ⊥ (a divergent value). This is why η equivalence show here how to seamlessly add extensional higher-arity holds unconditionally; it is nonsensical to have a pro- functions to a λ-based calculus. Worded more formally, our gram of type (σ { τ ). solution to higher-arity functions is a conservative extension • This η equivalence also preserves work equivalence; of an existing base language, meaning that the semantics and that is, η expansion does not change the number of re- implementation of the original features in the base language duction steps in a program run. For example, consider are unchanged. 
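Before moving on, the "unknown call" problem from the zipWith example above can be made concrete in plain Haskell: nothing in zipWith's ordinary type pins down the arity of f, so one and the same definition is used at several different arities.

    pairwiseSums :: [Int]
    pairwiseSums = zipWith (+) [1, 2, 3] [10, 20, 30]           -- f consumes both arguments

    partialSums :: [Int -> Int]
    partialSums = zipWith (\a b c -> a + b + c) [1, 2] [3, 4]   -- f has arity 3: a list of functions comes back

    main :: IO ()
    main = do
      print pairwiseSums
      print (map ($ 100) partialSums)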
In practical terms, being a conservative ex- these two programs tension is important in the context of a compiler because it means that its other orthogonal jobs, like translating the full let f1 : Int { Int let f2 : Int { Int source language into the smaller core intermediate represen- f = let x = ack 2 3 1 f2 = λye . let x = ack 2 3 tation, do not need to be changed. And optimizations that in λye .x + y in x + y were performed on the base intermediate language—while

inmap f1 [1..1000] inmap f2 [1..1000] they may now make use of arity information provided by types if it is advantageous to do so—are still correct to apply With an ordinary λ, one would expect these two pro- directly as-is in the extended calculus. gram to behave differently: in f1, the (expensive) func- While we anticipate that expert programmers may want tion ack would be called once, with x’s value being to write code that uses ({) directly, our main focus is on shared by the 1000 calls of f1. But in f2, ack would using it internally, in the intermediate language of a compiler. be called once for each of the 1000 invocations of f2. How, then, do higher-arity functions fit into the compiler With our new lambda λe, however, the two are precisely ? Since there is no need to modify the front-end of equivalent, and in both cases ack is called 1000 times. the compiler (the translation from the source language to In effect, the binding for f1 is not memoised as a thunk, the intermediate language), the program can be first trans- but is intentionally treated in a call-by-name fashion, a lated without generating any ({) functions, pessimistically point we will return to. assuming arity 0. The arity analysis that is already being Haskell ’19, August 22–23, 2019, Berlin, Germany Paul Downen, Zachary Sullivan, Zena M. Ariola, and Simon Peyton Jones performed can then be formally captured in the types of Types functions by replacing (→) with ({). But just arbitrarily , s, t ∈ ::= V Source (CBV) changing types is not sound! So how can the optimizer im- | E Extensional (CBN) prove function arity when it is entwined with types? a,b, c ∈ TyVar The so-called worker/wrapper transform is the standard τ, σ, ρ ∈ Type ::= a Type variable way that GHC uses to move information from the definition | σ → τ CBV function of a function to its call sites [Gill and Hutton 2009], and is | ∀a.τ CBV polymorphism especially useful for type-changing transformations. The | σ { τ CBN function general technique is amenable to several optimizations, most | e∀a.τ CBN polymorphism notably [Peyton Jones and Launchbury 1991]. Here, we use worker/wrapper to improve the arity of types. For example, Expressions we can split the standard Haskell zipWith function into a x,y, z ∈ Var worker and wrapper like so (where \x{e means λex.e) e,u,v ∈ Expr ::= x Variable wzipWith :: (a { b { c) { [a] { [b] { [c] | e д Application wzipWith f (a:as) (b:bs) = f a b : wzipWith f as bs | λx.e CBV abstraction wzipWith f _ _ = [] | λex.e CBN abstraction | error τ Erroneous result zipWith :: (a -> b -> c) -> [a] -> [b] -> [c] zipWith f = wzipWith (\a { \b { f a b) д ∈ Arg ::= u | σ The worker wzipWith is an arity 3 function, and its first x, a ∈ BoundVar ::= x:τ Term binding argument is an arity 2 function. That way, the call to f in | a Type binding the body of wzipWith can be a fast binary call, even though Shorthand the definition of f is unknown. The purpose of the wrap- r r r per is to serve as an interface change, which can update λ x.e σ → τ ∀ a.τ V V V individual call sites by inlining. For example, we can make λ x.e = λx.e σ → τ = σ → τ ∀ a.τ = ∀a.τ E E E the following simplifications in a call to zipWith, where the λ x.e = λex.e σ → τ = σ { τ ∀ a.τ = e∀a.τ has also been split into the arity 0 wrap- per (||) :: Bool -> Bool -> Bool and arity 2 worker Figure 1. Syntax of F : call-by-value system F extended |~| :: Bool { Bool { Bool: ev v with call-by-name λ-abstractions. 
zipWith (||) {- inline -} ==> wzipWith (\a { \b { (||) a b) {- inline -} ==> wzipWith (\a { \b { (|~|) a b) {- eta -} ==> wzipWith (|~|) It may not look as if much has happened, but the new code is some type σ → τ . Expressions also allow for polymorphism much better: the higher-order call to (|~|) in the wzipWith with the analogous syntax of the form λa.e for abstracting loop is now a fast call to a known-arity function. a type variable a in the polymorphic type ∀a.τ , and e σ for specializing the polymorphic term e with the specific type σ. 3 Extending an Intermediate Language We also include the expression error τ to allow for the possi- With Higher-Arity Curried Functions bility of run-time errors when a value of type τ is expected. Note that this addition is not fundamental, it just shows the Since the choice of the beginning base calculus is rather ar- robustness of the approach. bitrary, we choose a modest starting point: the call-by-value Our extension with higher-arity functions is highlighted system F, referred to here as Fv . We make this choice only for in the figure, and consists of one new form of expression and illustrative purposes due to the fact that it is simple to state two new types. In types, we introduce a new form of function, formally, but still serves as a common core for strict func- σ { τ . The purpose of this new function type is that (unlike tional languages which can speak about important issues → arrows) multiple { arrows chain together into a single, such as polymorphism. However, the exact same extension higher-arity abstraction. Intuitively, this chaining could be can be applied to a lazy λ-calculus—serving as an intermedi- viewed in terms of multiple-argument functions (similar to ate representation for Haskell programs—as well, which we [Bolingbroke and Peyton Jones 2009]) like so discuss later in Section6.
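Since the ({) arrow is not available in source Haskell, the worker/wrapper split shown in Section 2.3 can only be approximated with ordinary (->). The following sketch shows the shape of the split; the names wzipWith', zipWith', and orWorker are ours, not GHC's, and the comments describe the intended rewriting rather than something GHC performs automatically.

    -- Worker: expects its function argument to be a genuine binary function, so
    -- the call  f a b  in the loop can be compiled as a saturated, known-arity call.
    wzipWith' :: (a -> b -> c) -> [a] -> [b] -> [c]
    wzipWith' f (a:as) (b:bs) = f a b : wzipWith' f as bs
    wzipWith' _ _      _      = []

    -- Wrapper: a thin interface change that η-expands its argument and calls the
    -- worker; inlining it at call sites is what exposes the fast call.
    zipWith' :: (a -> b -> c) -> [a] -> [b] -> [c]
    zipWith' f = wzipWith' (\a b -> f a b)
    {-# INLINE zipWith' #-}

    -- A hand-written "arity 2 worker" for (||), playing the role of (|~|) above.
    orWorker :: Bool -> Bool -> Bool
    orWorker True  _ = True
    orWorker False b = b

    example :: [Bool]
    example = zipWith' (||) [True, False] [False, False]
    -- after inlining zipWith':   wzipWith' (\a b -> (||) a b) ...
    -- after the η/worker rewrite: wzipWith' orWorker ...

    main :: IO ()
    main = print example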

τ { σ { ρ ≈ (τ, σ) { ρ

3.1 Syntax

We give the syntax of eFv —a simple language based on a con- ventional system Fv for expressing higher-arity functions—in however we do not require that all the arguments of a func- Fig.1. Expressions in eFv include the usual λ-abstractions and tion be grouped together, allowing for the usual currying applications (of the forms λx:σ .e and e u) for functions of and even for higher-arity functions in eFv . Making a Faster Curry with Extensional Types Haskell ’19, August 22–23, 2019, Berlin, Germany
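The grammar of Fig. 1 transcribes directly into a small Haskell datatype; the encoding below is our own illustration, not an artifact of the paper (strings stand in for variables, and the two calling conventions are a separate Conv tag).

    -- Calling conventions ("kinds" of types): V for the base call-by-value
    -- fragment, E for the extensional (call-by-name) extension.
    data Conv = V | E
      deriving (Eq, Show)

    data Type
      = TyVar String             -- a
      | Arrow Conv Type Type     -- Arrow V σ τ ≈ σ → τ,   Arrow E σ τ ≈ σ { τ
      | Forall Conv String Type  -- Forall V a τ ≈ ∀a.τ,   Forall E a τ ≈ ∀̃a.τ
      deriving (Eq, Show)

    data Expr
      = Var String
      | App Expr Arg              -- e g
      | Lam Conv String Type Expr -- Lam V ≈ λx:σ.e,   Lam E ≈ λ̃x:σ.e
      | TyLam Conv String Expr    -- type abstraction, in both conventions
      | Err Type                  -- error τ
      deriving (Eq, Show)

    data Arg
      = TermArg Expr              -- g ::= u
      | TypeArg Type              --     | σ
      deriving (Eq, Show)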

We also introduce a new form of polymorphism, e∀a.τ that Typing environments similarly chains together along with {. Incrementally chain- Γ ::= x1 : τ1 ... xn : τn ing polymorphism with function arguments is another ad- Classifying types τ : r vantage of our approach over [Bolingbroke and Peyton Jones 2009]. In order to view them both as a polymorphic multi- a : V σ → τ : V ∀a.τ : V σ { τ : E e∀a.τ : E argument function, we would have to combine polymorphic abstraction and functions into a single monolithic type like Checking types Γ ⊢ e : τ ⊢ Var ⊢ Error e∀a.τ { σ ≈ (a, τ ) { σ Γ, x : τ x : τ Γ error τ : τ Γ, x : σ ⊢ e : τ Γ ⊢ e : σ → τ Γ ⊢ u : σ ( ) →I →E with the understanding that in a, τ { σ, the type variable Γ ⊢ λx:σ .e : σ → τ Γ ⊢ e u : τ a introduced on the left-hand side of the arrow is in scope Γ ⊢ e : τ a < FV (Γ) Γ ⊢ e : ∀a.τ σ : V for both τ and σ. Such monolithic types immediately be- ∀I ∀E Γ ⊢ λa.e : ∀a.τ Γ ⊢ e σ : τ [σ/a] come more complex because e∀s and {s could be nested in any way, requiring a notion of telescoping in the in- Γ, x : σ ⊢ e : τ Γ ⊢ e : σ { τ Γ ⊢ u : σ {I {E put of a function that reaches into its output. For example, Γ ⊢ λex:σ .e : σ { τ Γ ⊢ e u : τ e∀a.τ { e∀b.σ { ρ chains together as the multi-argument Γ ⊢ e : τ a < FV (Γ) function (a, τ,b, σ) { ρ, where a is in scope for τ , σ, and Γ ⊢ e : e∀a.τ σ : V e∀I ∀E ρ, but b is only in scope for σ and ρ. Our approach keeps Γ ⊢ λea.e : e∀a.τ Γ ⊢ e σ : τ [σ/a] e higher-arity functions and polymorphism separate from one another, completely avoiding the need for telescoping. Figure 2. Type system of eFv . To go along with the two new forms of types, we also have a new form of abstraction: λex:σ .e for introducing a function of type σ { τ and λea.e for introducing polymorphism of the base system Fv language, there is only one kind of type type e∀a.τ (where e is of type τ ). The two new types share (denoted here by V). However, since our extension has added the same syntax for application as in the base system Fv . The a new kind of types (denoted by E), checking whether τ : r primary reason to distinguish λ from λe is for the purpose holds points out the dichotomy between the two. And this of determining the types of annotated terms: without the distinction can be done by just reading the head connective distinction, we don’t know whether λx:a.x should be of type of the type. As such, the base language features (type vari- a → a or a { a absent additional context. ables a, arrows →, and quantifiers ∀) are all assigned the Distinguishing the base language features from the exten- kind V. In contrast, the extension (arrows { and quantifiers sion is an essential ingredient in our approach: it allows us e∀) are assigned the kind E. to preserve the semantics for the original V- of the lan- Also note that, for now, we choose not to add E type vari- guage, which remains untouched while we add the semantics ables in addition to the existing V ones inherited from the for new E-features. And it is important to know which of the base system F, since this extension is not necessary for cap- two we are dealing with (the original base language or the turing arity in types. We will further discuss the ramifications extension) because the two semantics need not (and in this of this choice later in Section7. case will not) be the same. Having the distinction lets us add Next, Fig.2 shows the typing rules. 
The rules for the base an entirely new evaluation strategy as a conservative exten- language are conventional for system Fv , showing how to sion to a seemingly incompatible base calculus. To achieve type check variables as well as abstractions and applications this goal, we just need to distinguish between the new types for both functions and polymorphism. The highlighted typ- introduced by the extension and the original types of Fv . This ing rules are essentially the same as the ones for system is done by classifying types as either V (for the base Fv types) Fv , except that they refer to the new forms of arrow and or E (for the extensional types being added on top), which quantifier. are two different basic kinds of types. In ordinary system Fv , kinds are not necessary, since every type has the same kind 3.3 Equational Theory (sometimes written ⋆). Classifying types as just V or E can In order to reason about programs in terms of high-level be seen as a rather rudimentary kind system. But this style syntactic rewriting, we give an equational theory for eFv with of classification makes it possible to extend the kind system the axioms presented in Fig.3. Notice that the first two with more advanced features in a more full-fledged system. axioms, βV and ∀ are the usual rules for resolving applications of function and type abstraction in the call-by-value system 3.2 Type System Fv . In particular, the βV rule is limited to substituting only We present the type system for eFv in Fig.2. Again, the ex- arguments which are values (syntactically, a variable or a tension to the base language is highlighted in the figure. In λ-abstraction). That’s because in call-by-value, arguments Haskell ’19, August 22–23, 2019, Berlin, Germany Paul Downen, Zachary Sullivan, Zena M. Ariola, and Simon Peyton Jones
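The τ : r classification of Fig. 2 only inspects the head connective of a type, so over the Type encoding sketched in Section 3.1 it is a three-line function (again, our own illustration):

    -- The judgement τ : r from Fig. 2: base connectives are V, extensional ones are E.
    classify :: Type -> Conv
    classify (TyVar _)      = V   -- type variables range over base (V) types only
    classify (Arrow c _ _)  = c   -- σ → τ : V,   σ { τ : E
    classify (Forall c _ _) = c   -- ∀a.τ : V,    ∀̃a.τ : E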

thereby avoiding the above counter example. However, even V ∈ Value ::= x | λx.e admitting this restricted η axiom does not solve the problem r (βV)(λ x:σ .e) V = e[V /x] of explicating arity by expanding out every λ allowed by a type. For example, if we have some unknown function (∀) (λra.e) σ = e[σ/a] z : a → a → a, then the above η→ axiom gets stuck after r (βE)( λ x:σ .e) u = e[u/x](σ : E) the first expansion:

(η{) λex:σ .(e x) = e : σ { τ (x < FV (e)) z =η→ λx:a.(z x) ,η→ λx:a.λy:a.(z x y) (η∀) λea.(e a) = e : e∀a.τ (a < FV (e)) e This is because the partial application z x might already result in an error, so η→ no longer applies. Figure 3. Equational theory of eFv . In order to fully η-expand a function of any higher arity, we must ensure that even erroneous expressions of type are evaluated first before the call is resolved, so that only σ { τ can be safely η-expanded without changing the result values may be passed to the function. For example, of a program. The consequence of this constraint is that the only context in which we are ever allowed to evaluate an (λz:a.λx:a.x)(error a) , λx:a.x expression of type σ { τ is a calling context that already has is not a valid equation because the two sides give vastly dif- an argument ready. This restriction on evaluation follows ferent results: evaluating the left-hand side causes an error at the call-by-name strategy, making it safe to η-expand any run-time when the argument error a is encountered, whereas number of λes like for the expression e : e∀a.a { a { a the right-hand side is already a value. e =η λea.(e a) Besides the usual call-by-value system Fv axioms, we have e∀ additional axioms to handle higher-arity functions in F . First ( ) ev =η{ λea.λex:a. e a x of all, we have the full version of the β axiom, denoted here =η λea.λex:a.λye :a.(e a x y) by βE, which can substitute expressions which are not values. { However, just allowing such a rule to apply whenever would Note that as a side constraint necessary for satisfying the full be dangerous—as illustrated by the inequality above. How η axiom: expressions of type τ : E are not programs and can- can we reconcile the difference between call-by-name and not be evaluated. Allowing such expressions to be programs call-by-value functions coexisting in the same language? We would let us observe the difference between error (a { b) can use the type of the argument to determine when the full and λex:a.(error (a { b) x) which, again, must be equivalent. call-by-name β axiom is safe. If the of a function has type σ : E, then any expression may be substituted since the argument should be handled in a call-by-name way. 4 Adding Higher-Arity Calls to an Abstract Otherwise, if the function’s parameter has type σ : V, then Machine this is a call-by-value application, and only values may be Now, lets consider how to run programs, with the goal of substituted. This side condition allows us to safely mix both performing higher-arity function calls to a stock implementa- β axioms in the same language. tion [Peyton Jones 1992]. As is the running theme, we begin The primary motivation, though, is the addition of the full with a standard call-by-value abstract machine for system Fv η axioms η and η in Fig.3. These are the equations that and only add an extension to function applications and calls { e∀ allow us to use types to statically expand any expression which is capable of passing multiple at once. The with the appropriate number of λes to match its arity. As machine in question is specified by the operational seman- usual, the η axioms only apply to expressions of the correct tics in Fig.4, and eagerly evaluates all function arguments in type (for example, expanding 5 to λx.(5 x) is nonsensical). a right-to-left order. 
Evaluated function parameters can be We really do need to use a call-by-name understanding either types (for specializing polymorphism) or closures, and for expressions of type σ { τ and e∀a.τ in order to support the stack consists of a pending function waiting for its argu- these η axioms. For example, applying the analogous η axiom ment (eB ◦ S) or a chain of parameters which are annotated ′ ′′ to an erroneous expression of type a → b is not an equality with the calling convention of their function (P r P s P t S). The purpose of keeping track of the calling convention of error (a → b) , λx:a.(error (a → b) x) function arguments is to remember the difference from the because the left-hand side evaluates to an error while the base system Fv (for which r will always be V) and the new right-hand side is a value. The safe version of the η axiom in form of extensional functions during β-reduction. Our only ∗ the call-by-value λ-calculus must be restricted like so significant extension to this base semantics is found inthe β rule. When limited to just expressions from the base call-by- (η→) λx:σ .(V x) = V : σ → τ (x < FV (V )) ∗ value system Fv , the β performs the usual single-argument Making a Faster Curry with Extensional Types Haskell ’19, August 22–23, 2019, Berlin, Germany

Runtime state 4.1 Differences in Evaluation Order ∈ r . W WHNF ::= λ x e Abstraction value If we just run an arbitrary eFv expression, then the result P ∈ Param ::= WB Closure parameter might be wrong! That’s because the machine in Fig.4 uses | σ Type parameter just a call-by-value evaluation order, and has no idea that S ∈ Stack ::= ε Empty stack some expressions should be treated in a call-by-name way | eB ◦ S Function application | P r S Application chain as specified by the axioms in Fig.3. For example, we have ∈ B Env ::= ε Empty environment (λx:σ {τ .I)(error σ {τ ) = I | [WB/x]B Closure binding βE | [σ/a]B Type binding where I = λa.λex:a.x is the , but instead at run-time we get an error (i.e. a stuck state): ′ ′ ′ Machine steps ⟨e|B|S⟩ 7→ ⟨e |B |S ⟩ ⟨(λx:σ {τ .I)(error σ {τ ) | ε | ε⟩ (β∗)⟨ λrx.e | B | P r S⟩ 7→ ⟨e′ | B′ | S ′⟩ 7→ ⟨error σ {τ | ε | (λx:σ {τ .I)ε ◦ ε⟩ (app∗(e, [P/x]B, S) = (e′, B′, S ′)) ̸7→ ′ ′ (var)⟨x | B | S⟩ 7→ ⟨W | B | S⟩ ([WB /x] ∈ B) In this case, the equational theory gives the intended result (arg)⟨e u | B | S⟩ 7→ ⟨u | B | (eB) ◦ S⟩ and the machine is wrong. But we don’t want to modify the (fun)⟨W | B | (eB′) ◦ S⟩ 7→ ⟨e | B′ | (WB) t S⟩(e : τ : t) existing machine, since it is already perfectly serviceable for (spec)⟨e σ | B | S⟩ 7→ ⟨e | B | σ t S⟩(e : τ : t) the base language. Instead, we should elaborate the input programs them- ∗ ′ ′ ′ Multi-arity application app (e, B, S) = (e , B , S ) selves so that they give the correct result even on a purely ∗ ∗ call-by-value machine. Thankfully we don’t need to do any- app (λex.e, B, P E S) = app (e, [P/x]B, S) ∗ ′ ′ thing as drastic as continuation-passing style in order to app (e, B, S) = (e, B, S)(e , λex.e and S , P E S ) make the evaluation order explicit. Rather,η expansion comes in to solve the problem for us. As a pleasant side benefit, just - η-expanding the call-by-name expression to make the arity Figure 4. Abstract machine for eFv explicit in the syntax also makes the evaluation order ex- plicit! So there is no need to change the implementation of the machine; strategically η expanding the program gives function application like so the correct result as in: ⟨λx.e | B | P V S⟩ 7→ ⟨e | [P/x]B | S⟩ ⟨(λx:σ {τ .I)(λye :σ .error σ {τ y) | ε | ε⟩ 7→ ⟨λy:σ .error σ τ y | ε | (λx:σ τ .I)ε ◦ ε⟩ However, β∗ is also capable of passing multiple arguments e { { at once during a single step, as defined by app∗(e, B, S). The 7→ ⟨λx:σ {τ .I | ε | (λye :σ .error σ {τ y)ε V ε⟩ idea is that, once the first argument is being passed toa 7→ ⟨I | [(λy:σ .error σ {τ y)ε/x]ε | ε⟩ function closure (of any r), then all the following arguments e for the next E-function calls are passed, too, at the same 4.2 Run-Time Arity Mismatch time. For example, a ternary function call beginning with a V closure and followed by two E calls is performed by the There is a new kind of error that might occur at run-time: it following single β∗ step: may be the case that the machine gets stuck if the arity of a function and the call stack do not match. For example, there ′ ′′ E ⟨λx.λey.λez.(f z y x) | B | P V P E P E ε⟩ could be too many -parameters and not enough λes like so ′′ ′ 7→ ⟨f z y x | [P /z][P /y][P/x]B | ε⟩ ⟨λx:σ {τ .λye :ρ.x | B | P1 V P2 E P3 E ε⟩ ̸7→

All three parameters are substituted in exactly one step. In This is a stuck configuration because the multiple-argument this sense, the semantics in Fig.4 explicitly models how to application is not defined in this case, since there add higher-arity function calls to a standard abstract machine. are not enough λes to match the last E-parameter remaining on the stack: But if we want to run programs of eFv on this stock call-by- ∗ value machine, we will need to compile eFv programs first app ((λye :ρ.x), [P1/x]B, (P2 E P3 E ε)) to implement the semantics of eFv that was given in Fig.3. ∗ = app (x, [P2/y][P1/x]B, (P3 E ε)) In particular, in order to reflect eFv semantics with this stock call-by-value machine, the compilation we describe next in In this case, too, applying the missing η-expansion to sup- Section5 will have to resolve the following two issues. ply all the λes allowed by the type fixes the dynamic arity Haskell ’19, August 22–23, 2019, Berlin, Germany Paul Downen, Zachary Sullivan, Zena M. Ariola, and Simon Peyton Jones
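Over the same toy encoding from Section 3.1, the multi-arity application function app∗ of Fig. 4 is a small recursive helper that keeps substituting as long as an extensional λ̃ meets an E-parameter on the stack. The Param type and the naive subst below are simplified stand-ins for the machine's closures and environments, not the machine itself.

    -- A stack parameter: an already-evaluated argument tagged with the calling
    -- convention of the function expecting it.
    data Param = Param Conv Arg

    -- app* from Fig. 4: once one argument is being passed, keep passing parameters
    -- as long as the function is an extensional λ̃ and the next stack entry is an
    -- E-parameter; otherwise stop (possibly in a stuck, arity-mismatched state).
    appStar :: Expr -> [Param] -> (Expr, [Param])
    appStar (Lam E x _ body) (Param E arg : rest) = appStar (subst x arg body) rest
    appStar e stack                               = (e, stack)

    -- Naive, capture-unaware substitution; only good enough for this illustration.
    subst :: String -> Arg -> Expr -> Expr
    subst x (TermArg u) (Var y) | x == y = u
    subst x a (App e g)                  = App (subst x a e) (substArg x a g)
    subst x a (Lam c y t e)     | y /= x = Lam c y t (subst x a e)
    subst x a (TyLam c b e)     | b /= x = TyLam c b (subst x a e)
    subst _ _ e                          = e

    substArg :: String -> Arg -> Arg -> Arg
    substArg x a (TermArg u) = TermArg (subst x a u)
    substArg _ _ g           = g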

Type-driven E e д : τ (invariant, e д : τ ) Restricted η-long typing rules Γ ⊢η e : τ J K ηVar ηError (λe)E λex.e ε : τ = λex.E e ε : σ (e : σ) Γ, x : τ ⊢ x : τ Γ ⊢ error τ : τ J K J K η η ′ E r (η{)E e д : σ {τ = λex:σ .E e д, x : τ (e , λex:σ .e ) Γ, x : σ ⊢η e : τ Γ, x : σ ⊢ e : σ → τ → η J K J K ′ r r η I η→Ex (η )E e д : ∀a.τ = λa.E e д, a : τ (e λa.e ) Γ ⊢η λ x:σ .e : σ → τ Γ, x : σ ⊢η e x : τ e∀ e e , e J K J K r E (V)E e д : τ = C e д (τ : V) Γ ⊢η e : σ → τ Γ ⊢η u : σ η→E J K J K Γ ⊢ e u : τ Type-generic C e д η ⊢E ( ) J K Γ η e : τ a < FV Γ Γ ⊢ e : ∀a.τ σ : V (var)C x д = x д r r η∀I η∀E J K Γ ⊢η λ a.e : ∀ a.τ Γ ⊢ e σ : τ [σ/a] (err)C error τ д = error τ д J K ⊢E (λ)C λrx.e д = (λrx.E e ε : τ ) д (e : τ ) Checking η-expansion Θ; Γ η e : τ J K J K E E (app)C e u д = C e (E u ε : σ),д (u : σ) Γ, x : τ ⊢η e : σ Γ ⊢η e : τ a < FV (Γ) J K J K J K η{I ηe∀ (spec)C e σ д = C e σ,д Γ ⊢E λx:τ .e : τ σ Γ ⊢E λa:k.e : ∀a:k.τ J K J K η e { η e e τ : V Γ ⊢η e : τ - E V Figure 5. Pre-processing transformation from eFv to eFv Γ ⊢η e : τ

- mismatch problem: Figure 6. Type system of eFv .

⟨λx:σ {τ .λye :ρλez:σ .(x z) | B | P1 V P2 E P3 E ε⟩ 7→ ⟨x z | [P /z][P /y][P /x]B | ε⟩ 3 2 1 λemissing from e (in the η{ and η steps). C e ε handles all λe J K The opposite case is when there are too few E-parameters the other cases by descending down a chain of applications (in the app and spec steps). To do so, both transformations on the stack to fill all of the λes. For example, we could have are parameterized by a list, д, of the arguments surround- ⟨(λx:a{a.λye :a.x y) Ia | ε | ε⟩ ing the expression. So in their full generality, we have the ∗ η-expansion transformation E e д : τ (where e д : τ ) and 7→ ⟨λx:a{a.λy:a.x y | ε | Iaε V ε⟩ e the helping C e д. J K ̸7→ The purposeJ K of this compilation is to transform an eFv program into an equivalent one that gives the correct where Ia = λex:a.x : a { a. But notice that the type of the answer. The first fact about the transformation is that itjust input program was a { a, which is call-by-name (E), not elaborates the original expression by selectively applying η. call-by-value (V)! As pointed out previously in the context of the equational theory, validating the η axioms requires us to 1 (η Elaboration). For a typed eFv expression e restrict programs to expressions of V-types. This condition such that Γ ⊢ e : τ , already rejects the term (λx:a{a.λye :a.x y) Ia as a valid input 1. e =η η (E e ε : τ ), and program because it has the type a { a. As it turns out, { e∀ J K 2. e =η η (C e ε). this same condition on the final type of result that well- { e∀ J K typed programs are allowed to produce also eliminates the The invariants created by the η-elaboration process can possibility that there are not enough parameters on the stack be captured in the form of a type system, shown in Fig.6. to complete a higher-arity function call. This type system is a restriction of Fig.2 by mandating η- expansion in the arguments to function calls and underneath 5 Compiling Extensional Functions any λ (either λ or λe). We refer to the strict subset of eFv ex- We just saw some examples of how strategic η expansion pressions which are well-typed according to this restricted can let us run extensional, higher-arity functions on a stock system (i.e., expressions e such that Γ ⊢η e : τ ) as the exe- call-by-value machine. But is there a general procedure for - cutable sub-language eFv . compiling any eFv expression? Here, we demonstrate a trans- formation for inserting all the missing λes allowed by the Proposition 2 (Compilation). For a typed eFv expression e types of sub-expressions via η and η expansion, thereby ⊢ { e∀ such that Γ e : τ , making the arity implied by types explicit in the syntax of 1. Γ ⊢E (E e ε : τ ) : τ is a F- expression, and the program. This transformation is given in Fig.5 and con- η ev J K - sists of two parts. E e ε : τ uses the type τ of e to insert any 2. Γ ⊢η (C e ε) : τ is a eFv expression when τ : V. J K J K Making a Faster Curry with Extensional Types Haskell ’19, August 22–23, 2019, Berlin, Germany
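A simplified core of Fig. 5's elaboration can likewise be sketched over the Section 3.1 encoding: wrap one explicit λ̃ (or extensional type abstraction) for every ({)-arrow or ∀̃ that the type of e promises, applying e to the freshly introduced names. This is an illustrative approximation, not the paper's implementation; the full transformation also descends under existing λ̃s and into the arguments of applications (the C⟦·⟧ part), which is omitted here.

    etaExpandE :: Int -> Expr -> Type -> Expr
    etaExpandE n e (Arrow E s t) =
      Lam E x s (etaExpandE (n + 1) (App e (TermArg (Var x))) t)
      where x = "x" ++ show n
    etaExpandE n e (Forall E a t) =
      TyLam E a (etaExpandE (n + 1) (App e (TypeArg (TyVar a))) t)
    etaExpandE _ e _ = e          -- head connective is V: nothing left to expand

    -- Example: an unknown z : a { a { a elaborates to  λ̃x0:a. λ̃x1:a. z x0 x1
    exampleExpansion :: Expr
    exampleExpansion =
      etaExpandE 0 (Var "z")
        (Arrow E (TyVar "a") (Arrow E (TyVar "a") (TyVar "a")))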

Runtime State functions interact with sharing and memoization in a call- W ∈ WHNF ::= λrx.e by-need language like Haskell?” P ∈ Param ::= x | σ S ∈ Stack ::= ε Empty stack 6.1 A Higher-Arity Lazy Machine | P r S Application Chain To avoid repetition, we will only discuss the interesting dif- | !x; S Thunk update ferences that laziness poses when compared with Sections3 H ∈ Heap ::= ε Empty heap | x := e; H Thunk allocation to5. In order to model laziness and sharing, we can use the standard construct let x:τ = u in e which should be read as “first evaluate e and remember x = u; only evaluate u if x is Machine steps ⟨e|H |S⟩ 7→ ⟨e ′|H ′|S ′⟩ needed in e.” This understanding of let-bindings is formal- (β∗)⟨ λrx.e | H | P r S⟩ 7→ ⟨e′ | H | S ′⟩ ized by the call-by-need abstract machine in Fig.7. In this machine, function parameters cannot be arbitrary expres- ( ∗( [ / ], ) ( ′, ′)) app e P x S = e S sions; they are pointers into the heap H which are allocated (push)⟨e P | H | S⟩ 7→ ⟨e | H | P t S⟩(e : τ : t) by the alloc step. As such, they are only ever evaluated when (alloc)⟨let x = u in e | H | S⟩ 7→ ⟨e[y/x] | y := u; H | S⟩ the variable is referenced (by the force step) after which its (force)⟨x | x := e; H | S⟩ 7→ ⟨e | x := e; H | !x; S⟩ binding will be updated with its value (by the memo step). (memo)⟨W | x := e; H | !x; S⟩ 7→ ⟨W | x := W ; H | S⟩ Just like with the call-by-value machine of Section4, this lazy machine requires some pre-processing step for elaborat- ∗ ′ ′ Multi-arity application app (e, S) = (e , S ) ing the evaluation strategy of extensional functions. The elab- ∗ ∗ oration for lazy evaluation is the same sort of η-expansion app (λex.e, P E S) = app (e[P/x], S) as the one in Section5, with a few differences. First, ∗ ′ ′ app (e, S) = (e, S)(e , λex.e and S , P E S ) since function parameters must be variables, we should take care to name arguments with a let. Second, to compile a let, Figure 7. A lazy abstract machine extended with higher- we should η-expand the bound expression, and push all the arity function calls surrounding arguments into its body. These are expressed by the following new lines for the transformation (where we use a list of parameters P instead of arguments): We are now equipped to properly run eFv expressions by - first compiling to the sub-language eFv . Doing so lets us use C e x P = C e x, P the same implementation of the base call-by-value system J K J K C e u P = C let x:τ = u in e x P (x , u : τ ) Fv (with support for faster multiple-argument function calls) J K J K while still satisfying the mixed call-by-value and call-by- C let x:τ = u in e P = let x:τ = E u ε : τ in C e P J K J K J K name semantics of eFv . 6.2 Forcing Thunks - Definition 1 (Programs). An expression e is an eFv program Haskell-like languages have the capability of forcing thunks if ε ⊢η e : τ and τ : V. even when their value is not yet needed using an operation seq - like to evaluate them to weak-head normal form. As Proposition 3 (Operational soundness). For any eFv pro- mentioned in Section2, seq is precisely the operation that gram e, if e = V (for some value V according to Fig.3) then ∗ invalidates η expansion for lazy function closures. This is ⟨e|ε|ε⟩ 7→ ⟨W |B|ε⟩ (for some W and B). unacceptable for our approach, since we heavily rely on the η law to make the evaluation strategy explicit at run-time. 
6 Sharing and Lazy Evaluation Thankfully, the same solution we used to differentiate So far, we showed how to extend an existing language with the evaluation strategy of the base language and the higher- higher-arity using extensional functions to get the eFv calcu- arity function —that is, distinguishing the two types of lus. But the starting point—the call-by-value system Fv —was programs—also solves the issue of inappropriate forcing due not important; we needed to begin somewhere for the pur- to seq. In Section3, there wasn’t even the possibility of a call- pose of illustration. Because we formally distinguish the by-name type variable, so the type of seq : ∀a.∀b.a → b → base language (Fv ) in the extended calculus (eFv ) in order b prevents it from being applied to higher-arity functions, to preserve two different semantics, we can choose a base thereby rescuing their η law. Even if we were to go to a language with any evaluation strategy and apply the same ex- more expressive system with arity polymorphism (discussed tension. Of particular interest, we could have instead started next in Section7), the type variables a and b only range with an intermediate language that uses lazy evaluation, and over lazy (call-by-need) types, which rules out extensional achieve higher-arity functions using the same technique. The (call-by-name) ones. Note that if our lazy base calculus has question then becomes “how do higher-arity, extensional a polymorphic case expression of the form case e of x → u Haskell ’19, August 22–23, 2019, Berlin, Germany Paul Downen, Zachary Sullivan, Zena M. Ariola, and Simon Peyton Jones which forces e, then it too needs to follow the interface of id) and binary function (like const) as follows: seq described above; e cannot have a call-by-name type. const : ∀a.e∀b.a { b { a 6.3 Work Preservation const = λa.λeb.λex:a.λye :b.x Even though both call-by-need and call-by-name are non- id (∀a.a { a) id : (∀a.a { a) strict evaluation strategies, they still have a crucial differ- ence: call-by-need shares the work done to evaluate bindings, id (∀a.e∀b.a { b { a) const : ∀a.e∀b.a { b { a whereas call-by-need repeats work every time the binding In this sense, the polymorphism allowed by F is analogous to is used. So we must take care that adding call-by-name to ev [Bolingbroke and Peyton Jones 2009]. But F is more expres- a call-by-need language does not accidentally cause work ev sive, because in addition to call-by-value function closures duplication, as that could have a severe performance penalty. (which might have a higher arity by starting with ∀ or → With our methodology, work is preserved automatically because we distinguish work-sharing bindings (call-by-need) and continuing with multiple e∀s or {s) it also allows for from ones that are intended to be repeated (call-by-name). “naked” higher-arity functions of kind E. If we bind a variable of a call-by-need type, then we intend Now suppose that we just added type variables of kind to share its work, so the η law does not apply in general E (we will distinguish the two kinds of type variables by which prevents duplication. But if an expression e has a a : V versus a : E at their binding site à la system Fω ). What call-by-name type, then we do not plan to share its work goes wrong? We could now write the following alternative identity and constant functions over E types: anyway because it must be treated as λex.(e x). 
This gives an optimizer more freedom to η expand, which is something idE : e∀a:E.a { a = λea:E.λex:a.x call-by-need compilers are sensitive about doing because it may duplicate work. Moreover, distinguishing the two constE : e∀a:E.e∀b:E.a { b { a = λea:E.λeb:E.λex:a.λye :b.x kinds of function types allows us to simplify the runtime But remember, in order to be able to evaluate programs representation of functions. Call-by-name functions can be a with a stock implementation, we need to be able to fully direct pointer to a closure instead of a pointer to a thunk like η-expand them! This makes both the evaluation strategy and call-by-need functions, which avoids the need to dynamically the arity of each function call fully explicit in the syntax check whether the closure has been evaluated yet. of the program. But how much expansion is necessary? In both of the above cases, it depends on the instantiation of 7 Arity Polymorphism a, which can include any number of additional { arrows which must “fuse” with the preceding ones. For example, if In Section3, we began with a calculus (system Fv ) that had a notion of polymorphism, and added to it higher-arity cur- we specialize the a in idE to the type of the identity function, ried functions. The base language and the extension were we learn that we are missing one λe, and so that special case distinguished by different kinds of types (written V versus should be expanded once, but if we specialize a to the type E, respectively). A consequence of this decision meant that of the , it is instead missing two λes, like so: there was only polymorphism over the original V types, but idE (e∀a:E.a{a) =η λex:e∀a:E.a{a.λye :a.x y not over the new E types. For example, if we just added a pair- { ing constructor Pair : ∀a.∀b.a → b → (a,b), then we could idE (e∀a:E.e∀a:E.a{b{a) only instantiate Pair with call-by-value (V) types, but not =η λex:e∀a:E.e∀b:E.a{b{a.λye :a.λez:b.x y z extensional types (E). On the surface, putting higher-arity { functions inside a pair seems reasonable. But to quantify over In each case, depending on the type of the higher-order higher-arity types, there are some serious issues surrounding parameter x, x will be called at a different arity, which is no arity polymorphism to consider. longer known at compile-time in the definition site of idE. Notice that the polymorphism already available in eFv does The problem with polymorphic types a of kind E is that allow for polymorphism over functions that might have dif- E types posses a form of non-uniformity: they may vary ferent arity. For example, consider the identity function in their calling convention in terms of the number of argu- ments they expect. Unlike V types for which arity checking is id : ∀a.a { a done dynamically at run-time (“unknown” functions in the parlance of [Marlow and Peyton Jones 2004]), the different id = λa.λex:a.x arity of E types is “known” at compile-time. This property is essential for fast code generation: the call site of an E func- Even though a is a V-type variable, it can still be instantiated tion type will just assume a particular arity for passing some with call-by-value function types that are called with multi- number of arguments at once and call that function without ple arguments. For example, id can be applied to unary (like consulting its meta-data. 
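In source-Haskell terms, the point about V-polymorphism ranging over functions of any arity can be seen with today's GHC already: an ordinary type variable can be instantiated with function types of different arities, because every such instance is uniformly a single closure pointer (plain Haskell, no ({) involved).

    -- The ordinary polymorphic identity: 'a' may itself be a function type.
    identity :: a -> a
    identity x = x

    applyUnary :: Int
    applyUnary = identity (\x -> x + 1) 5          -- a := Int -> Int

    applyBinary :: Int
    applyBinary = identity (\x y -> x + y) 2 3     -- a := Int -> Int -> Int

    main :: IO ()
    main = print (applyUnary, applyBinary)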
As such, the arity of E types must Making a Faster Curry with Extensional Types Haskell ’19, August 22–23, 2019, Berlin, Germany be statically predictable, which conflicts with generic type specific type is unknown. This extension could give arich variables a : E. language of kinds with the goal of statically tracking: There is another, better-known, non-uniformity problem 1. the specific evaluation strategy of function arguments with polymorphism: data representation. For example, con- (e.g., call-by-value versus call-by-name) as in Section3, sider the projection function: snd : (a,b) → b = λ(x,y).y. 2. the number of arguments the function accepts before When both a and b are unknown polymorphic types, how doing serious work (the traditional notion of arity), can we generate code for extracting the second component 3. the representation of each argument (so that specific of the pair? This is quite a complicated question when the parameter-passing code is known at the call site), and values of different types may be represented by different 4. the representation of its result (so that higher-arity sizes and layouts in memory. If the pair (x,y) is stored con- functions can be composed). tiguously in memory, then the offset to the beginning of y The purpose of such an extension would be to maintain depends on the size of the value x: when x is a character it information about both representations and arity in a com- may 8-bits, but when it is a double-precision floating point positional way: arities and representations should still be number it takes up 64-bits. Likewise, we need to know the inferred from kinds no matter how deeply nested functions size of y to know how many bytes to move. And the situa- and data structures become. tion is further exacerbated when certain types of values, like floating-points, require special code for parameter passing. 8 Related Work A common solution to these problems with polymorphism is to enforce uniform representation: every type is constrained 8.1 Uncurrying to values represented by pointer-sized data. With this so- The classic way to handle multi-argument function calls is lution, all of the above problems vanish, but at the cost of to uncurry the function [Bolingbroke and Peyton Jones 2009; extra boxing: if a value cannot fit within the constraints ofa Dargaye and Leroy 2009; Hannan and Hicks 1998]. Similar pointer-like shape, then there is an extra level of indirection to the work here, they too focus on solving fast higher-arity by replacing the value with a pointer to the value. The indi- calls inside higher-order functions. rection of boxing is costly in practice, and so an optimizing Like us, uncurrying techniques aim to encode arity in compiler should use unboxed values when possible, but this types, resulting in the introduction of new function types. may come at the cost of disabling polymorphism. Both [Dargaye and Leroy 2009] and [Bolingbroke and Pey- An interesting approach to reconciling unboxing with ton Jones 2009] use a compound, multiple-argument function polymorphism was introduced by [Eisenberg and Peyton type (σ0,..., σn) → τ for uncurrying. As discussed in Sec- Jones 2017] called “levity polymorphism.” One can view lev- tion3, multi-argument functions requires special support ity polymorphism as a conservative extension of a conven- in the presence of polymorphism, which ends up entwining tional functional language relying on uniform representation the two different language features into one. 
This involves with the following novel features: a telescoped introduction of type variables in the (many) arguments of the function which remain in scope for the 1. unboxed types (e.g., and floating-point num- return type. Our approach based on extensional functions bers) with a variety of different representations, completely avoids this technical problem since the structure 2. polymorphism over types with a known (i.e., monomor- of our higher-arity types ∀a.τ and σ { τ are exactly the phic) representation, and e same as the original curried definition of functions. 3. polymorphism over representations themselves. In an approach more similar to ours, [Hannan and Hicks The key idea to make levity polymorphism work is to en- 1998] introduce an intermediate representation for their un- rich kinds to be informative enough to determine the repre- currying transformation that involves adding annotated func- sentations of values, even when the specific type of those tion types. The function type σ →⇑ τ fuses with τ when it is values are yet unknown. Pleasantly, points 1 and 2 together also a function, and (σ →ϵ τ ) is a standard curried function do not cause issue for compilability. Only point 3 could be type. Our (σ { τ ) functions also have a fusion property, but trouble, which encapsulates all of the above issues we dis- we do not require the τ be function. [Hannan and cussed about compiling non-uniform polymorphism. And Hicks 1998]’s work avoids the problem of polymorphism thankfully, [Eisenberg and Peyton Jones 2017] gives a simple, that other uncurrying techniques have by sticking to the fundamental, requirement to ensure compilability: Hindley-Milner type system, where quantifiers only appear in type schemes, but this is significantly less expressive than Never move or store a levity-polymorphic value. the unrestricted quantifiers of system F[Girard et al. 1989]. We believe that the same basic idea can be applied to Thus far, uncurrying-based arity presents a monolithic enhance the polymorphism over functions with non-uniform approach which fuses several distinct language features and arity. As future work, we would like to further enrich the can require switching to an entirely different evaluation strat- language of kinds to fill in arity information even when the egy like [Bolingbroke and Peyton Jones 2009], which can Haskell ’19, August 22–23, 2019, Berlin, Germany Paul Downen, Zachary Sullivan, Zena M. Ariola, and Simon Peyton Jones interfere with the rest of the compiler pipeline. As a gen- while optimizations freely manipulate the structure of terms. eral design goal, our technique aims to be as non-disruptive To apply the standard worker/wrapper transformation, the as possible, maintaining the same overall structure of pro- type is derived from counting λs and this is used to create grams and types for both lazy and strict languages (avoiding the worker and wrapper. The wrapper η expands higher- both explicit uncurrying and thunking), and only requiring order functions to produce higher-arity ({) functions. These that types inform us of the evaluation strategy. Thus, our functions must appear fully applied in the wrapper which technique is better suited for integrating into existing im- may require further η expansion. We exploit the fact that plementations because of its lower initial investment. As call-by-name functions can always appear fully applied due a pleasant side benefit, our approach based on integrating to the η law in order to generate saturated applications. 
multiple evaluation strategies introduces opportunities for optimizations unrelated to arity. For instance, in Haskell, 8.3 Polarity a call to f :: Int -> Int must first perform a dynamic Our inspiration for the treatment of arity came from polarity check to determine if f needs to be evaluated. In contrast, and call-by-push-value [Levy 2001; Munch-Maccagnoni 2013; calling f :: Int { Int can just immediately call f without Zeilberger 2009], which study a type-based mixture of - that dynamic check, because it must be bound to a closure. uation strategies in programming languages from the fields Just uncurrying a higher-arity function does not express this of and type theory. These systems enjoy the strongest difference of evaluation strategy. possible extensionality laws (a.k.a. η axioms) for every type [Munch-Maccagnoni 2009], even in the presence of recur- 8.2 Arity in Compilation sion and computational effects. To this end, every type has The importance of function arity became apparent in the an “innate” evaluation strategy which allows for mixing of once-pervasive categorical abstract machine [Cousineau et al. call-by-value constructs (like data types) with call-by-name 1985] wherein curried functions would allocate many inter- constructs (like functions). mediate closures. With the goal of fast multi-arity function Polarity has been used in the past to address other is- calls, the Krivine [Krivine 2007] and ZINC [Leroy 1990] ab- sues relevant to intermediate languages and optimization, stract machines repeatedly push arguments onto the stack like resolving the conflict between “strong sum” types and and an n-ary function will consume all n arguments it needs. non-termination [Munch-Maccagnoni and Scherer 2015]. Re- Another approach to speeding up curried programs is pre- cently, they have been extended with call-by-need constructs sented by [Marlow and Peyton Jones 2004] which combines [Downen and Ariola 2018; McDermott and Mycroft 2019] both statically and dynamically known arity information to used to support Haskell-like languages. Investigating the make fast calls. Statically, the compiler knows that a function consequences of these polarized languages in a practical declared with 2 manifest lambdas has the arity 2. If a function setting has lead us to find that the practical concerns ofa of known arity is fully applied, then the compiler can gener- function’s arity and representation are actually semantic ate code that fuses the applications. For the cases where the properties of the function’s type. arity is unknown at compile time, e.g., inside higher-order Interestingly, types with an innate evaluation strategy functions like zipWith, there is a dynamic check for when have arisen independently in practice. For example, consider calls can be fused. The crux of these approaches is that either what happens if we have a type Int# for real, 64-bit ma- the compiler must propagate statically-known arity infor- chine suitable for storing directly in a register. In mation to the call site at compile-time, or there must be a an expression like f (g 1), where f and g both have type dynamic check for arity information at run-time. Int# -> Int#, is (g 1) evaluated before or after f is called? The usage of arity information has long been an issue Since a value of type Int# is represented by a 64-bit machine in compilers leading to the development of complex arity integer, there is no way to delay the call to g. 
8.2 Arity in Compilation

The importance of function arity became apparent in the once-pervasive categorical abstract machine [Cousineau et al. 1985], wherein curried functions would allocate many intermediate closures. With the goal of fast multi-arity function calls, the Krivine [Krivine 2007] and ZINC [Leroy 1990] abstract machines repeatedly push arguments onto the stack, and an n-ary function will consume all n arguments it needs. Another approach to speeding up curried programs is presented by [Marlow and Peyton Jones 2004], which combines both statically and dynamically known arity information to make fast calls. Statically, the compiler knows that a function declared with 2 manifest lambdas has arity 2. If a function of known arity is fully applied, then the compiler can generate code that fuses the applications. For the cases where the arity is unknown at compile time, e.g., inside higher-order functions like zipWith, there is a dynamic check for when calls can be fused. The crux of these approaches is that either the compiler must propagate statically-known arity information to the call site at compile time, or there must be a dynamic check for arity information at run time.
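As a simplified Haskell rendering of this static/dynamic split (our own example, not taken from [Marlow and Peyton Jones 2004]):

    module AritySplit where

    -- Declared with two manifest λs, so the compiler statically knows
    -- that add has arity 2.
    add :: Int -> Int -> Int
    add = \x -> \y -> x + y

    -- A saturated call to a statically known function: both arguments
    -- can be passed at once, with no intermediate closure.
    known :: Int
    known = add 3 4

    -- Inside zipWith, the arity of its function argument is unknown at
    -- compile time, so the generated code needs a run-time check before
    -- the two applications can be fused.
    unknown :: [Int]
    unknown = zipWith add [1, 2, 3] [4, 5, 6]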
The usage of arity information has long been an issue in compilers, leading to the development of complex arity analyzers [Breitner 2014; Xu and Peyton Jones 2005] which use the information to η expand and to float λ-abstractions into and out of contexts, all while being careful not to duplicate work. More recent work performs cardinality analysis (which checks the usage of expressions) to apply the same transformations [Sergey et al. 2014] and also to generate non-updateable thunks at run time if they will only be evaluated once. Our goal is to improve the frequency of multi-arity function calls, which is orthogonal to analysis. We can use either simple analysis, like syntactically-visible λ-abstractions and applications, or those more thorough methods.

We seek to secure arity information inside the types of our intermediate language: a place where it will be preserved while optimizations freely manipulate the structure of terms. To apply the standard worker/wrapper transformation, the type is derived by counting λs, and this is used to create the worker and wrapper. The wrapper η expands higher-order functions to produce higher-arity ({) functions. These functions must appear fully applied in the wrapper, which may require further η expansion. We exploit the fact that call-by-name functions can always appear fully applied, due to the η law, in order to generate saturated applications.

8.3 Polarity

Our inspiration for the treatment of arity came from polarity and call-by-push-value [Levy 2001; Munch-Maccagnoni 2013; Zeilberger 2009], which study a type-based mixture of evaluation strategies in programming languages from the fields of logic and type theory. These systems enjoy the strongest possible extensionality laws (a.k.a. η axioms) for every type [Munch-Maccagnoni 2009], even in the presence of recursion and computational effects. To this end, every type has an "innate" evaluation strategy, which allows for mixing of call-by-value constructs (like data types) with call-by-name constructs (like functions).

Polarity has been used in the past to address other issues relevant to intermediate languages and optimization, like resolving the conflict between "strong sum" types and non-termination [Munch-Maccagnoni and Scherer 2015]. Recently, polarized systems have been extended with call-by-need constructs [Downen and Ariola 2018; McDermott and Mycroft 2019] used to support Haskell-like languages. Investigating the consequences of these polarized languages in a practical setting has led us to find that the practical concerns of a function's arity and representation are actually semantic properties of the function's type.

Interestingly, types with an innate evaluation strategy have arisen independently in practice. For example, consider what happens if we have a type Int# for real, 64-bit machine integers suitable for storing directly in a register. In an expression like f (g 1), where f and g both have type Int# -> Int#, is (g 1) evaluated before or after f is called? Since a value of type Int# is represented by a 64-bit machine integer, there is no way to delay the call to g. Passing an unboxed integer in a register means it is already a value, so it must be evaluated ahead of time (call-by-value). This insight lets GHC handle unboxed values [Peyton Jones and Launchbury 1991].
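This situation can be reproduced directly with GHC's unboxed integers; the sketch below is ours (the module name and bindings are hypothetical):

    {-# LANGUAGE MagicHash #-}
    module UnboxedOrder where

    import GHC.Exts (Int (I#), Int#, (*#), (+#))

    g :: Int# -> Int#
    g x = x *# 2#

    f :: Int# -> Int#
    f y = y +# 1#

    -- (g 1#) is evaluated before f is entered: an Int# argument is
    -- passed in a register, so there is no closure that could postpone
    -- the call to g. (Top-level bindings may not themselves have
    -- unlifted type, so the result is reboxed with I#.)
    example :: Int
    example = I# (f (g 1#))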
There are different ways of formalizing polarity in a type system, and the closest one to our approach here is [Munch-Maccagnoni 2009; Munch-Maccagnoni and Scherer 2015], wherein type constructors can be applied to any types. Other systems [Levy 2001; Zeilberger 2008] impose stronger constraints on types. In the notation here, we could restrict σ { τ : E so that σ : V and τ : E. This style requires "polarity shifts": conversions between the V and E kinds of types. With shifts, we could remove the original type σ → τ : V, but that would go against our desire for a conservative extension; it fundamentally changes the semantics of the base language, changing the front-end translation into the intermediate language. However, this alternative could have its own merits. For example, GHC already uses monomorphic versions of these shifts to represent boxing and unboxing explicitly in programs (e.g., the unboxed Int# type is primitive, and the lazy Int is derived from it). Further investigation is required to evaluate the relative practical merits of these two styles.

9 Conclusion

This work follows the tradition of designing calculi that faithfully capture the practical issue of optimizing curried function calls. On one hand, higher-order functions are very important for expressiveness, and currying can dramatically reduce code size [Arvind and Ekanadham 1988]. On the other, efficiency is also a concern; no one wants to be penalized for writing elegant programs! Compiler writers have many techniques to solve these problems, and we think some of these can be put on solid ground with polarization and extensionality. Interestingly, the solutions for making fast calls across higher-order functions and for unboxing appear to be dual to one another. As a pleasant side effect, this approach is suitable for both strict and lazy functional languages. We have implemented the ideas presented here as an extension of GHC's Core intermediate language as a proof of concept (available at https://github.com/zachsully/ghc/tree/eta-arity).

Acknowledgments

This material is based upon work supported by the National Science Foundation under Grant No. CCF-1719158 and Grant No. CCF-1423617. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

References

Arvind and Kattamuri Ekanadham. 1988. Future Scientific Programming on Parallel Machines. J. Parallel Distrib. Comput. 5, 5 (1988), 460–493.
Maximilian C. Bolingbroke and Simon L. Peyton Jones. 2009. Types Are Calling Conventions. In Proceedings of the 2nd ACM SIGPLAN Symposium on Haskell (Haskell '09). 1–12.
Joachim Breitner. 2014. Call Arity. In Trends in Functional Programming - 15th International Symposium, TFP 2014, Soesterberg, The Netherlands, May 26-28, 2014, Revised Selected Papers. 34–50.
Guy Cousineau, Pierre-Louis Curien, and Michel Mauny. 1985. The Categorical Abstract Machine. In Functional Programming Languages and Computer Architecture, FPCA 1985, Nancy, France, September 16-19, 1985, Proceedings. 50–64.
Vincent Danos, Jean-Baptiste Joinet, and Harold Schellinx. 1997. A New Deconstructive Logic: Linear Logic. Journal of Symbolic Logic 62, 3 (1997), 755–807.
Zaynah Dargaye and Xavier Leroy. 2009. A Verified Framework for Higher-Order Uncurrying Optimizations. Higher-Order and Symbolic Computation 22, 3 (2009), 199–231.
Paul Downen and Zena M. Ariola. 2018. Beyond Polarity: Towards a Multi-Discipline Intermediate Language with Sharing. In 27th EACSL Annual Conference on Computer Science Logic, CSL 2018, September 4-7, 2018, Birmingham, UK. 21:1–21:23.
Richard A. Eisenberg and Simon Peyton Jones. 2017. Levity Polymorphism. In Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI 2017, Barcelona, Spain, June 18-23, 2017. 525–539.
Andy Gill and Graham Hutton. 2009. The Worker/Wrapper Transformation. J. Funct. Program. 19, 2 (March 2009), 227–251.
Jean-Yves Girard, Paul Taylor, and Yves Lafont. 1989. Proofs and Types. Cambridge University Press, New York, NY, USA.
John Hannan and Patrick Hicks. 1998. Higher-Order unCurrying. In POPL '98, Proceedings of the 25th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, San Diego, CA, USA, January 19-21, 1998. 1–11.
Jean-Louis Krivine. 2007. A Call-By-Name Lambda-Calculus Machine. Higher-Order and Symbolic Computation 20, 3 (2007), 199–207.
Xavier Leroy. 1990. The ZINC Experiment: An Economical Implementation of the ML Language. Technical Report 117. INRIA.
Paul Blain Levy. 2001. Call-By-Push-Value. Ph.D. Dissertation. Queen Mary and Westfield College, University of London.
Simon Marlow and Simon L. Peyton Jones. 2004. Making a Fast Curry: Push/Enter vs. Eval/Apply for Higher-Order Languages. In Proceedings of the Ninth ACM SIGPLAN International Conference on Functional Programming, ICFP 2004, Snow Bird, UT, USA, September 19-21, 2004. 4–15.
Dylan McDermott and Alan Mycroft. 2019. Extended Call-by-Push-Value: Reasoning About Effectful Programs and Evaluation Order. In Programming Languages and Systems, Luís Caires (Ed.). Springer International Publishing, Cham, 235–262.
Guillaume Munch-Maccagnoni. 2009. Focalisation and Classical Realisability. In Computer Science Logic: 23rd International Workshop, CSL 2009, 18th Annual Conference of the EACSL (CSL 2009), Erich Grädel and Reinhard Kahle (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg, 409–423.
Guillaume Munch-Maccagnoni. 2013. Syntax and Models of a Non-Associative Composition of Programs and Proofs. Ph.D. Dissertation. Université Paris Diderot.
Guillaume Munch-Maccagnoni and Gabriel Scherer. 2015. Polarised Intermediate Representation of Lambda Calculus with Sums. In 2015 30th Annual ACM/IEEE Symposium on Logic in Computer Science (LICS 2015). 127–140.
Simon L. Peyton Jones. 1992. Implementing Lazy Functional Languages on Stock Hardware: The Spineless Tagless G-machine. Journal of Functional Programming 2, 2 (1992), 127–202.
Simon L. Peyton Jones and John Launchbury. 1991. Unboxed Values as First Class Citizens in a Non-Strict Functional Language. In Proceedings of the 5th ACM Conference on Functional Programming Languages and Computer Architecture. Springer-Verlag, London, UK, 636–666.
Ilya Sergey, Dimitrios Vytiniotis, and Simon Peyton Jones. 2014. Modular, Higher-Order Cardinality Analysis in Theory and Practice. In Proceedings of the 41st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '14). 335–347.
Dana N. Xu and Simon L. Peyton Jones. 2005. Arity Analysis. (2005). Working notes.
Noam Zeilberger. 2008. On the Unity of Duality. Annals of Pure and Applied Logic 153, 1 (2008), 66–96.
Noam Zeilberger. 2009. The Logical Basis of Evaluation Order and Pattern-Matching. Ph.D. Dissertation. Carnegie Mellon University.