Levity Polymorphism Richard A
Total Page:16
File Type:pdf, Size:1020Kb
Bryn Mawr College Scholarship, Research, and Creative Work at Bryn Mawr College Computer Science Faculty Research and Computer Science Scholarship 2017 Levity Polymorphism Richard A. Eisenberg Bryn Mawr College, [email protected] Simon Peyton Jones Microsoft Research, Cambridge, [email protected] Let us know how access to this document benefits ouy . Follow this and additional works at: http://repository.brynmawr.edu/compsci_pubs Part of the Programming Languages and Compilers Commons Custom Citation R.A. Eisenberg and S. Peyton Jones 2017. "Levity Polymorphism." Proceeding PLDI 2017 Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation: 525-539. This paper is posted at Scholarship, Research, and Creative Work at Bryn Mawr College. http://repository.brynmawr.edu/compsci_pubs/69 For more information, please contact [email protected]. Levity Polymorphism Richard A. Eisenberg Simon Peyton Jones Bryn Mawr College, Bryn Mawr, PA, USA Microsoft Research, Cambridge, UK [email protected] [email protected] Abstract solves the problem, but it is terribly slow (Section 2.1). Thus Parametric polymorphism is one of the linchpins of modern motivated, most polymorphic languages also support some typed programming, but it comes with a real performance form of unboxed values that are represented not by a pointer penalty. We describe this penalty; offer a principled way to but by the value itself. For example, the Glasgow Haskell reason about it (kinds as calling conventions); and propose Compiler (GHC), a state of the art optimizing compiler for levity polymorphism. This new form of polymorphism allows Haskell, has supported unboxed values for decades. abstractions over calling conventions; we detail and verify But unboxed values come at the price of convenience restrictions that are necessary in order to compile levity- and re-use (Section 3). In this paper we describe an elegant polymorphic functions. Levity polymorphism has created new approach to reconciling high performance code with new opportunities in Haskell, including the ability to general- pervasive polymorphism. Our contributions are these: ize nearly half of the type classes in GHC’s standard library. • We present (Section 4) a principled way to reason about CCS Concepts • Software and its engineering ! Poly- compiling polymorphic functions and datatypes. Types morphism; Compilers; Functional languages are categorized into kinds, where each kind describes the Keywords unboxed types, compilation, polymorphism memory layout of its types. Thus the kind determines the calling convention of functions over its types. 1. The Cost of Polymorphism • Having a principled way to describe memory layout and Consider the following Haskell function: calling convention, we go one step further and embrace levity polymorphism, allowing functions to be abstracted bTwice :: 8 a: Bool ! a ! (a ! a) ! a over choices of memory layout provided that they never bTwice b x f = case b of True ! f (f x) move or store data with an abstract representation (Section False ! x 5). We believe we are the first to describe and implement The function is polymorphic1 in a; that is, the same function levity polymorphism. works regardless of the type of x, provided f and x are • It is tricky to be sure precisely when levity polymorphism compatible. When we say “the same function” we usually is permissible. We give a formal proof that our rules mean “the same compiled code for bTwice works for any guarantee that levity-polymorphic functions can indeed be type of argument x”. But the type of x influences the calling compiled into concrete code (Section 6). Along the way, convention, and hence the executable code for bTwice! For we were surprised to be unable to find any prior literature example, if x were a list, it would be passed in a register proving that the widespread technique of compiling via A- pointing into the heap; if it were a double-precision float, it normal form [7] is semantics-preserving. A key part of that would be passed in a special floating-point register; and so proof (unrelated to levity polymorphism) is challenging, on. Thus, sharing code conflicts with polymorphism. and we leave it as an open problem (Section 6.4). A simple and widely-used solution is this: represent every • With levity polymorphism in hand, new possibilities open value uniformly, as a pointer to a heap-allocated object. That up—including the ability to write an informative kind 1 We use the term polymorphism to refer to parametric polymorphism only. for (!) and to overload operations over both boxed and unboxed types. We explore these in Section 7. We are not the first to use kinds in this way (see Section 9), but we take the idea much further than any other compiler we know, with happy consequences. Levity polymorphism does not make code go faster; rather, it makes highly-performant code more convenient to write and more re-usable. Levity [Copyright notice will appear here once ’preprint’ option is removed.] polymorphism is implemented in GHC 8.0.1. 1 2017/4/14 2. Background: Performance through Boxed Unboxed Int Unboxed Types Lifted Bool We begin by describing the performance challenges that our Int paper tackles. We use the language Haskell2 and the compiler Unlifted ByteArray # # Char GHC as a concrete setting for this discussion, but many of # our observations apply equally to other languages supporting polymorphism. We discuss other languages and compilers in Figure 1. Boxity and levity, with examples Section 9. them. Given these unboxed values, the boxed versions can be 2.1 Unboxed Values defined in Haskell itself; GHC does not treat them specially. Consider this loop, which computes the sum of the integers For example: 1 :: n: data Int = I# Int# sumTo :: Int ! Int ! Int plusInt :: Int ! Int ! Int sumTo acc 0 = acc plusInt (I i )(I i ) = I (i + i ) sumTo acc n = sumTo (acc + n)(n − 1) # 1 # 2 # 1 # 2 Here Int is an ordinary algebraic data type, with one data GHC represents values of type Int as a pointer to a two-word constructor I , that has one field of type Int . The function heap-allocated cell; the first word is a descriptor, while the # # plusInt simply pattern matches on its arguments, fetches their second has the actual value of the Int. If sumTo used this contents (i and i , both of type Int ), adds them using (+ ), representation throughout, it would be unbearably slow. Each 1 2 # # and boxes the result with I . iteration would evaluate its second argument,3 follow the # pointer to get the value, and test it against zero; in the non- 2.2 Boxed vs. Unboxed and Lifted vs. Unlifted zero case it would allocate thunks for (acc + n) and (n − 1), and then iterate. In contrast, a C compiler would use a three- In general, a boxed value is represented by a pointer into the machine-instruction loop, with no memory traffic whatsoever. heap, while an unboxed value is represented by the value The performance difference is enormous. For this loop, on a itself. It follows that an unboxed value cannot be a thunk; typical machine 10,000,000 iterations executes in less than arguments of unboxed type can only be passed by value. 0.01s when using unboxed Ints, but takes more 2s when using Haskell also requires consideration of levity—that is, the boxed integers. choice between lifted and unlifted. A lifted type is one that is For performance-critical code, GHC therefore allows lazy. It is considered lifted because it has one extra element the programmer to write code over explicitly-unboxed val- beyond the usual ones, representing a non-terminating com- ues [20]. For example, GHC provides a built-in data type putation. For example, Haskell’s Bool type is lifted, meaning that three Bools are possible: True, False, and ?. An unlifted Int# of unboxed integers [20], represented not by a pointer but by the integer itself. Now we can rewrite sumTo like this4 type, on the other hand, is strict. The element ? does not exist in an unlifted type. sumTo# :: Int# ! Int# ! Int# Because Haskell represents lazy computation as a heap- sumTo# acc 0# = acc allocated thunk, all lifted types must also be boxed. However, sumTo# acc n = sumTo# (acc +# n)(n −# 1#) it is possible to have boxed, unlifted types. Figure 1 summa- We had to use different arithmetic operations and literals, but rizes the relationship between boxity and levity, providing apart from that the source code looks just the same. But the examples of the three possible points in the space. compiled code is very different; we get essentially the same code as if we had written it in C. 2.3 Unboxed Tuples GHC’s strictness analyzer and other optimizations can of- Along with the unboxed primitive types (such as Int# and ten transform sumTo into sumTo#. But it cannot guarantee Double#), Haskell has support for unboxed tuples. A normal, to do so, so performance-conscious programmers often pro- boxed tuple—of type, say, (Int; Bool)—is represented by a gram with unboxed values directly. As well as Int#, GHC heap-allocated vector of pointers to the elements of the tuple. provides a range of other unboxed types, such as Char#, and Accordingly, all elements of a boxed tuple must also be boxed. Double#, together with primitive operations that operate on Boxed tuples are also lazy, although this aspect of the design is a free choice. 2 GHC extends Haskell in many ways to better support high-performance Originally conceived to support returning multiple values code, so when we say “Haskell” we will always mean “GHC Haskell”. from a function, an unboxed tuple is merely Haskell syntax 3 Haskell is a lazy language, so the second argument might not be evaluated.