Yhc.Core – from Haskell to Core
by Dimitry Golubovsky, Neil Mitchell and Matthew Naylor

The Yhc compiler is a hot-bed of new and interesting ideas. We present Yhc.Core – one of the most popular libraries from Yhc. We describe what we think makes Yhc.Core special, and how people have used it in various projects including an evaluator, and a Javascript code generator.

What is Yhc Core?

The York Haskell Compiler (Yhc) [1] is a fork of the nhc98 compiler [2], started by Tom Shackell. The initial goals included increased portability, a platform independent bytecode, integrated Hat [3] support and generally being a cleaner code base to work with. Yhc has been going for a number of years, and now compiles and runs almost all Haskell 98 programs and has basic FFI support – the main thing missing is the Haskell base library.

Yhc.Core is one of our most successful libraries to date. The original nhc compiler used an intermediate core language called PosLambda – a basic lambda calculus extended with positional information. The language was neither a subset nor a superset of Haskell. In particular there were unusual constructs and all names were stored in a symbol table. There was also no defined external representation.

When one of the authors required a core Haskell language, after evaluating GHC Core [4], it was decided that PosLambda was closest to what was desired but required substantial clean up. Rather than attempt to change the PosLambda language, a task that would have been decidedly painful, we chose instead to write a Core language from scratch. When designing our Core language, we took ideas from both PosLambda and GHC Core, aiming for something as simple as possible. Due to the similarities to PosLambda we have written a translator from our Core language to PosLambda, which is part of the Yhc compiler.
The Monad.Reader

Our idealised Core language differs from GHC Core in a number of ways:

- Untyped – originally this was a restriction of PosLambda, but now we see this as a feature, although not everyone agrees.
- Syntactically a subset of Haskell.
- Minimal name mangling.

All these features combine to create a Core language which resembles Haskell much more than Core languages in other Haskell compilers. As a result, most Haskell programmers can feel at home with relatively little effort.

Because the Core language is much simpler, it takes less effort to learn, and the number of projects depending on it has grown rapidly. We have tried to add facilities to the libraries for common tasks, rather than duplicating them separately in projects. As a result the Core library now has facilities for dealing with primitives, removing recursive lets, reachability analysis, strictness analysis, simplification, inlining and more.

One of the first features we added to Core was whole program linking – any Haskell program, regardless of the number of modules, can be collapsed into one single Yhc.Core module. While this breaks separate compilation, it simplifies many types of analysis and transformation. If such an analysis turns out to be successful then breaking the dependence on whole program compilation is a worthy goal – but this approach allows developers to pay that cost only when it is needed.

A Small Example

To give a flavour of what Core looks like, it is easiest to start with a small program:

    head2 (x:xs) = x

    map2 f [] = []
    map2 f (x:xs) = f x : map2 f xs

    test x = map2 head2 x

Compiling with yhc -showcore Sample.hs generates:

    Sample.head2 v220 =
        case v220 of
            (:) v221 v222 -> v221
            _ -> Prelude.error Sample._LAMBDA228

    Sample._LAMBDA228 =
        "Sample: Pattern match failure in function at 9:1-9:15."

    Sample.map2 v223 v224 =
        case v224 of
            [] -> []
            (:) v225 v226 -> (:) (v223 v225) (Sample.map2 v223 v226)

    Sample.test v227 = Sample.map2 Sample.head2 v227

The generated Core can be treated as a subset of Haskell, with many restrictions:

- Case statements only examine their outermost constructor
- No type classes
- No where statements
- Only top-level functions
- All names are fully qualified
- All constructors and primitives are fully applied

Yhc.Core.Overlay

We provide many library functions to operate on Core, but one of our most unusual features is the overlay concept. Overlays specify modifications to be made to a piece of code – which functions should be replaced, which ones inserted, which data structures modified. By combining a Core file with an overlay, modifications can be made after translation from Haskell to Core. This idea originated in the Mozilla project [5], and is used successfully to enable extensions in Firefox, and elsewhere throughout their platform.

To take a simple example, in Haskell there are two common definitions for reverse:

    reverse = foldl (flip (:)) []

    reverse [] = []
    reverse (x:xs) = reverse xs ++ [x]

The first definition uses an accumulator, and takes O(n) time. The second definition requires O(n^2) time, as each element is appended to the end of the already reversed remainder of the list. Clearly a Haskell compiler should pick the first variant. However, a program analysis tool may wish to use the second variant as it may present fewer analysis challenges. The overlay mechanism allows this to be done easily. The first step is to write an overlay file:

    global_Prelude'_reverse [] = []
    global_Prelude'_reverse (x:xs) = global_Prelude'_reverse xs ++ [x]

This Overlay file contains a list of functions whose definitions we would like to replace. Any function that previously called Prelude.reverse will now invoke this new copy. For a program to insert an overlay, both Haskell files need to be compiled to Core, then the overlay function is called.
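The replacement step can be pictured with a small, self-contained sketch. It does not use the Yhc.Core API: the names Program, applyOverlay, baseProgram, reverseOverlay and patched are hypothetical, and definitions are represented as plain strings purely for illustration. The only point it makes is that a definition named in the overlay takes the place of the one in the base program, while everything else is left alone.

```haskell
-- Hypothetical stand-ins for illustration only; the real library works
-- on Core data structures, not on strings.
type Name       = String
type Definition = String
type Program    = [(Name, Definition)]

-- Definitions named in the overlay replace those in the base program,
-- overlay-only definitions are added, and everything else is kept.
applyOverlay :: Program -> Program -> Program
applyOverlay overlay base =
  overlay ++ [ def | def@(name, _) <- base, name `notElem` map fst overlay ]

-- A two-function "program" with the accumulator-based reverse.
baseProgram :: Program
baseProgram =
  [ ("Prelude.reverse", "foldl (flip (:)) []")
  , ("Main.test",       "Prelude.reverse xs")
  ]

-- An overlay that swaps in the naive definition.
reverseOverlay :: Program
reverseOverlay =
  [ ("Prelude.reverse", "naive recursive definition using ++") ]

patched :: Program
patched = applyOverlay reverseOverlay baseProgram
```

In this sketch Main.test is untouched, yet looking up Prelude.reverse in patched now yields the overlay's definition, which is the behaviour described for callers of Prelude.reverse.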
But we need not stop at simply replacing the reverse function. Yhc defines an IO type as a function over the World type, but for some applications this may not be appropriate. We can redefine IO as:

    data IO a = IO a

    global_Monad'_IO'_return a = IO a
    global_Monad'_IO'_'gt'gt (IO a) b = b
    global_Monad'_IO'_'gt'gt'eq (IO a) f = f a
    global_YHC'_Internal'_unsafePerformIO (IO a) = a

The Overlay mechanism supports escape characters – 'gt is the > character – allowing us to replace the bind and return methods.

We have found that with Overlays a compiler can be customized for many different tasks, without causing conflicts. With one code base, we can allow different programs to modify the libraries to suit their needs. Taking the example of Int addition, there are at least three different implementations in use: Javascript native numbers, binary arithmetic on a Haskell data type and abstract interpretation.

Semantics of Yhc Core

In this section an evaluator for Yhc Core programs is presented in the form of a literate Haskell program. The aim is to define the informal semantics of Core programs while demonstrating a full, albeit simple, application of the Yhc.Core library.

    module Main where

    import Yhc.Core
    import System
    import Monad

Our evaluator is based around the function whnf that takes a Core program (of type Core) along with a Core expression (of type CoreExpr) and reduces that expression until it has the form of:

- a data constructor with unevaluated arguments, or
- an unapplied lambda expression.

In general, data values in Haskell are tree-shaped. The function whnf is often said to "reduce an expression to head normal form" because it reveals the head (or root) of a value's tree and no more. Strictly speaking, when the result of reduction could be a functional value (i.e. a lambda expression), and the body of that lambda is left unevaluated, then the result is said to be in "weak head normal form" – this explains the strange acronym. The type of whnf is:

    whnf :: Core -> CoreExpr -> CoreExpr

Defining it is a process of taking each kind of Core expression in turn, and asking "how do I reduce this to weak head normal form?" As usual, it makes sense to define the base cases first, namely constructors and lambda expressions:

    whnf p (CoreCon c) = CoreCon c
    whnf p (CoreApp (CoreCon c) as) = CoreApp (CoreCon c) as
    whnf p (CoreLam (v:vs) e) = CoreLam (v:vs) e

Notice that a constructor may take one of two forms: stand-alone with no arguments, or as a function application to a list of arguments. Also, because of the way our evaluator is designed, we may encounter lambda expressions with no arguments. Hence, only lambdas with arguments represent a base case. For the no-arguments case, we just shift the focus of reduction to the body:

    whnf p (CoreLam [] e) = whnf p e

Currently, lambda expressions do not occur in the Core output of Yhc. They are part of the Core syntax because they are useful conceptually, particularly when manipulating (and evaluating) higher-order functions.

Moving on to case-expressions, we first reduce the case subject, then match it against each pattern in turn, and finally reduce the body of the chosen alternative. In Core, we can safely assume that patterns are at most one constructor deep, so reduction of the subject to WHNF is sufficient.

    whnf p (CoreCase e as) = whnf p (match (whnf p e) as)

We defer the definition of match for the moment. To reduce a let-expression, we substitute the let-bindings in the body of the let. This is easily done using the Core function replaceFreeVars.
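The clauses described so far can be tried out on a toy expression type. The sketch below is self-contained and deliberately does not import Yhc.Core: Expr, Pat, subst, patVars, match and example are hypothetical stand-ins (subst plays the role the text assigns to replaceFreeVars, and this match is only one simple guess, since the article's own definition is deferred). It assumes non-recursive lets and globally unique variable names, so substitution never captures a variable.

```haskell
-- A toy expression type, loosely modelled on the constructors named in
-- the text. It is NOT the real Yhc.Core CoreExpr type.
data Expr
  = Con String                  -- data constructor
  | Var String                  -- variable
  | App Expr [Expr]             -- application to a list of arguments
  | Lam [String] Expr           -- lambda, possibly with no arguments
  | Case Expr [(Pat, Expr)]     -- case with one-level patterns
  | Let [(String, Expr)] Expr   -- non-recursive let (assumption)
  deriving (Eq, Show)

data Pat = PCon String [String] | PDefault
  deriving (Eq, Show)

patVars :: Pat -> [String]
patVars (PCon _ vs) = vs
patVars PDefault    = []

-- Replace free variables. Shadowed names are dropped from the
-- environment; capture is not handled (names are assumed unique).
subst :: [(String, Expr)] -> Expr -> Expr
subst env expr = case expr of
    Var v     -> maybe (Var v) id (lookup v env)
    Con c     -> Con c
    App f xs  -> App (subst env f) (map (subst env) xs)
    Lam vs e  -> Lam vs (subst (without vs) e)
    Case e as -> Case (subst env e)
                      [ (p, subst (without (patVars p)) b) | (p, b) <- as ]
    Let bs e  -> Let [ (v, subst env b) | (v, b) <- bs ]
                     (subst (without (map fst bs)) e)
  where
    without vs = filter ((`notElem` vs) . fst) env

-- Pick the first alternative whose pattern fits the WHNF subject and
-- bind the pattern variables to the constructor's arguments.
match :: Expr -> [(Pat, Expr)] -> Expr
match _ [] = error "match: no alternative applies"
match subject ((pat, body) : rest) = case (pat, subject) of
    (PDefault, _) -> body
    (PCon c vs, Con c')
      | c == c' && null vs -> body
    (PCon c vs, App (Con c') args)
      | c == c' && length vs == length args -> subst (zip vs args) body
    _ -> match subject rest

-- Only the clauses discussed in the text; anything else is rejected.
whnf :: Expr -> Expr
whnf (Con c)          = Con c
whnf (App (Con c) as) = App (Con c) as
whnf (Lam (v:vs) e)   = Lam (v:vs) e
whnf (Lam [] e)       = whnf e
whnf (Case e as)      = whnf (match (whnf e) as)
whnf (Let bs e)       = whnf (subst bs e)
whnf other            = error ("whnf: not covered by this sketch: " ++ show other)

-- let xs = Cons One Nil in case xs of { Cons h t -> h; _ -> Zero }
example :: Expr
example =
  Let [("xs", App (Con "Cons") [Con "One", Con "Nil"])]
      (Case (Var "xs")
         [ (PCon "Cons" ["h", "t"], Var "h")
         , (PDefault, Con "Zero") ])
```

Here whnf example evaluates to Con "One": the let is substituted away, the subject is already a constructor application, the first alternative matches, and h is bound to the first argument. Unlike the article's whnf, this sketch takes no program argument, because it has no top-level functions to look up.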
