Write You a Haskell Building a Modern Functional Compiler from first Principles

Total Page:16

File Type:pdf, Size:1020Kb

Write You a Haskell Building a Modern Functional Compiler from first Principles Write You a Haskell Building a modern functional compiler from first principles Stephen Diehl January 2015 (Draft) Write You a Haskell by Stephen Diehl Copyright © 2014-2016. www.stephendiehl.com is written work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. You may reproduce and edit this work with attribution for all non-commercial purposes. e included source is released under the terms of the MIT License. Git commit: 4e913157bb4ffb7900cd25ef88445acff642ece2 Contents Introduction 7 Goals ............................................... 7 Prerequisites ............................................ 7 Concepts 8 Functional Languages ...................................... 8 Static Typing ........................................... 9 Functional Compilers ...................................... 10 Parsing .............................................. 11 Desugaring ............................................ 12 Type Inference .......................................... 12 Transformation .......................................... 13 Code Generation ......................................... 13 Haskell Basics 15 Functions ............................................. 15 Datatypes ............................................. 16 Values ............................................... 16 Pattern matching ......................................... 17 Recursion ............................................. 18 Laziness .............................................. 19 Higher-Kinded Types ....................................... 20 Typeclasses ............................................ 20 Operators ............................................. 21 Monads .............................................. 21 Applicatives ............................................ 23 Monoids ............................................. 23 Deriving .............................................. 24 IO ................................................. 24 Monad Transformers ....................................... 25 1 Text ................................................ 30 Cabal & Stack .......................................... 30 Resources ............................................. 31 Parsing 32 Parser Combinators ........................................ 32 NanoParsec ............................................ 32 Parsec ............................................... 38 Evaluation ............................................. 41 REPL ............................................... 42 Soundness ............................................. 43 Full Source ............................................ 44 Lambda Calculus 45 SKI Combinators ......................................... 47 Implementation .......................................... 47 Substitution ............................................ 48 Conversion and Equivalences .................................. 49 Reduction ............................................. 50 Let ................................................. 51 Everything Can Be a λ term ................................... 51 Recursion ............................................. 52 Pretty Printing .......................................... 54 Full Source ............................................ 55 Type Systems 56 Rules ............................................... 56 Type Safety ............................................ 57 Types ............................................... 58 Small-Step Semantics ....................................... 59 Observations ........................................... 63 Simply Typed Lambda Calculus ................................. 64 2 Type Checker ........................................... 64 Evaluation ............................................. 67 Observations ........................................... 67 Notation Reference ........................................ 68 Full Source ............................................ 69 Evaluation 70 Evaluation Models ........................................ 70 Call-by-value ........................................... 71 Call-by-name ........................................... 72 Call-by-need ........................................... 73 Higher Order Abstract Syntax (HOAS) ............................. 74 Parametric Higher Order Abstract Syntax (PHOAS) ...................... 76 Embedding IO .......................................... 78 Full Source ............................................ 79 Hindley-Milner Inference 81 Syntax ............................................... 82 Polymorphism .......................................... 83 Types ............................................... 83 Context .............................................. 84 Inference Monad ......................................... 85 Substitution ............................................ 85 Unification ............................................ 87 Generalization and Instantiation ................................. 89 Typing Rules ........................................... 90 Constraint Generation ...................................... 94 Typing ............................................... 95 Constraint Solver ......................................... 96 Worked Examples ........................................ 97 Interpreter ............................................. 99 Interactive Shell .......................................... 100 3 Observations ........................................... 103 Full Source ............................................ 105 Design of ProtoHaskell 106 Haskell: A Rich Language .................................... 106 Scope ............................................... 107 Intermediate Forms ........................................ 108 Compiler Monad ......................................... 109 Engineering Overview 110 REPL ............................................... 110 Parser ............................................... 112 Renamer ............................................. 113 Datatypes ............................................. 113 Desugaring ............................................ 113 Core ................................................ 116 Type Classes ............................................ 117 Type Checker ........................................... 118 Interpreter ............................................. 119 Error Reporting .......................................... 119 Frontend 119 Data Declarations ........................................ 122 Function Declarations ...................................... 123 Fixity Declarations ........................................ 124 Typeclass Declarations ...................................... 125 Wired-in Types .......................................... 126 Traversals ............................................. 127 Misc Infrastructure 129 Repline .............................................. 129 Command Line Arguments ................................... 129 GraphSCC ............................................ 129 4 Optparse Applicative ....................................... 129 Full Source ............................................ 129 Resources ............................................. 130 Extended Parser 131 Toolchain ............................................. 131 Alex ................................................ 133 Happy ............................................... 134 Syntax Errors ........................................... 136 Type Error Provenance ...................................... 137 Indentation ............................................ 140 Extensible Operators ....................................... 142 Full Source ............................................ 145 Resources ............................................. 145 Datatypes 146 Algebraic data types ....................................... 146 Isorecursive Types ......................................... 147 Memory Layout ......................................... 148 Pattern Scrutiny ......................................... 149 Syntax ............................................... 151 GHC.Generics .......................................... 151 Wadler’s Algorithm ........................................ 152 Full Source ............................................ 153 Renamer 154 Renaming Pass .......................................... 154 Full Source ............................................ 154 LLVM 155 JIT Compilation ......................................... 155 LLVM ............................................... 175 Types ............................................... 176 5 Variables ............................................. 178 Instructions ............................................ 178 Data ................................................ 179 Blocks ............................................... 179 Control Flow ........................................... 180 Calls ................................................ 188 Memory .............................................. 188 GetElementPtr .......................................... 189 Casts ............................................... 189 Toolchain ............................................. 189 llvm-general ............................................ 191 Code Generation (LLVM) 191 Resources ............................................. 191 6 Introduction Goals Off we go on our Adventure in Haskell Compilers! It will be intense, long, informative, and hopefully fun. It’s important to stress several points about the goals before we start
Recommended publications
  • No-Longer-Foreign: Teaching an ML Compiler to Speak C “Natively”
    Electronic Notes in Theoretical Computer Science 59 No. 1 (2001) URL: http://www.elsevier.nl/locate/entcs/volume59.html 16 pages No-Longer-Foreign: Teaching an ML compiler to speak C “natively” Matthias Blume 1 Lucent Technologies, Bell Laboratories Abstract We present a new foreign-function interface for SML/NJ. It is based on the idea of data- level interoperability—the ability of ML programs to inspect as well as manipulate C data structures directly. The core component of this work is an encoding of the almost 2 complete C type sys- tem in ML types. The encoding makes extensive use of a “folklore” typing trick, taking advantage of ML’s polymorphism, its type constructors, its abstraction mechanisms, and even functors. A small low-level component which deals with C struct and union declarations as well as program linkage is hidden from the programmer’s eye by a simple program-generator tool that translates C declarations to corresponding ML glue code. 1 An example Suppose you are an ML programmer who wants to link a program with some C rou- tines. The following example (designed to demonstrate data-level interoperability rather than motivate the need for FFIs in the first place) there are two C functions: input reads a list of records from a file and findmin returns the record with the smallest i in a given list. The C library comes with a header file ixdb.h that describes this interface: typedef struct record *list; struct record { int i; double x; list next; }; extern list input (char *); extern list findmin (list); Our ml-nlffigen tool translates ixdb.h into an ML interface that corre- sponds nearly perfectly to the original C interface.
    [Show full text]
  • Assignment 4 Computer Science 235 Due Start of Class, October 4, 2005
    Assignment 4 Computer Science 235 Due Start of Class, October 4, 2005 This assignment is intended as an introduction to OCAML programming. As always start early and stop by if you have any questions. Exercise 4.1. Twenty higher-order OCaml functions are listed below. For each function, write down the type that would be automatically reconstructed for it. For example, consider the following OCaml length function: let length xs = match xs with [] -> 0 | (_::xs') = 1 + (length xs') The type of this OCaml function is: 'a list -> int Note: You can check your answers by typing them into the OCaml interpreter. But please write down the answers first before you check them --- otherwise you will not learn anything! let id x = x let compose f g x = (f (g x)) let rec repeated n f = if (n = 0) then id else compose f (repeated (n - 1) f) let uncurry f (a,b) = (f a b) let curry f a b = f(a,b) let churchPair x y f = f x y let rec generate seed next isDone = if (isDone seed) then [] else seed :: (generate (next seed) next isDone) let rec map f xs = match xs with [] -> [] Assignment 4 Page 2 Computer Science 235 | (x::xs') -> (f x) :: (map f xs') let rec filter pred xs = match xs with [] -> [] | (x::xs') -> if (pred x) then x::(filter pred xs') else filter pred xs' let product fs xs = map (fun f -> map (fun x -> (f x)) xs) fs let rec zip pair = match pair with ([], _) -> [] | (_, []) -> [] | (x::xs', y::ys') -> (x,y)::(zip(xs',ys')) let rec unzip xys = match xys with [] -> ([], []) | ((x,y)::xys') -> let (xs,ys) = unzip xys' in (x::xs, y::ys) let rec
    [Show full text]
  • What I Wish I Knew When Learning Haskell
    What I Wish I Knew When Learning Haskell Stephen Diehl 2 Version This is the fifth major draft of this document since 2009. All versions of this text are freely available onmywebsite: 1. HTML Version ­ http://dev.stephendiehl.com/hask/index.html 2. PDF Version ­ http://dev.stephendiehl.com/hask/tutorial.pdf 3. EPUB Version ­ http://dev.stephendiehl.com/hask/tutorial.epub 4. Kindle Version ­ http://dev.stephendiehl.com/hask/tutorial.mobi Pull requests are always accepted for fixes and additional content. The only way this document will stayupto date and accurate through the kindness of readers like you and community patches and pull requests on Github. https://github.com/sdiehl/wiwinwlh Publish Date: March 3, 2020 Git Commit: 77482103ff953a8f189a050c4271919846a56612 Author This text is authored by Stephen Diehl. 1. Web: www.stephendiehl.com 2. Twitter: https://twitter.com/smdiehl 3. Github: https://github.com/sdiehl Special thanks to Erik Aker for copyediting assistance. Copyright © 2009­2020 Stephen Diehl This code included in the text is dedicated to the public domain. You can copy, modify, distribute and perform thecode, even for commercial purposes, all without asking permission. You may distribute this text in its full form freely, but may not reauthor or sublicense this work. Any reproductions of major portions of the text must include attribution. The software is provided ”as is”, without warranty of any kind, express or implied, including But not limitedtothe warranties of merchantability, fitness for a particular purpose and noninfringement. In no event shall the authorsor copyright holders be liable for any claim, damages or other liability, whether in an action of contract, tort or otherwise, Arising from, out of or in connection with the software or the use or other dealings in the software.
    [Show full text]
  • Idris: a Functional Programming Language with Dependent Types
    Programming Languages and Compiler Construction Department of Computer Science Christian-Albrechts-University of Kiel Seminar Paper Idris: A Functional Programming Language with Dependent Types Author: B.Sc. Finn Teegen Date: 20th February 2015 Advised by: M.Sc. Sandra Dylus Contents 1 Introduction1 2 Fundamentals2 2.1 Universes....................................2 2.2 Type Families..................................2 2.3 Dependent Types................................3 2.4 Curry-Howard Correspondence........................4 3 Language Overview5 3.1 Simple Types and Functions..........................5 3.2 Dependent Types and Functions.......................6 3.3 Implicit Arguments...............................7 3.4 Views......................................8 3.5 Lazy Evaluation................................8 3.6 Syntax Extensions...............................9 4 Theorem Proving 10 4.1 Propositions as Types and Terms as Proofs................. 10 4.2 Encoding Intuitionistic First-Order Logic................... 12 4.3 Totality Checking................................ 14 5 Conclusion 15 ii 1 Introduction In conventional Hindley-Milner based programming languages, such as Haskell1, there is typically a clear separation between values and types. In dependently typed languages, however, this distinction is less clear or rather non-existent. In fact, types can depend on arbitrary values. Thus, they become first-class citizens and are computable like any other value. With types being allowed to contain values, they gain the possibility to describe prop- erties of their own elements. The standard example for dependent types is the type of lists of a given length - commonly referred to as vectors - where the length is part of the type itself. When starting to encode properties of values as types, the elements of such types can be seen as proofs that the stated property is true.
    [Show full text]
  • Commonspad Ocaml Library Programmer’S Manual
    Commonspad OCaml Library Programmer’s manual Yoann Padioleau [email protected] December 29, 2009 Copyright c 2009 Yoann Padioleau. Permission is granted to copy, distribute and/or modify this doc- ument under the terms of the GNU Free Documentation License, Version 1.3. 1 Short Contents 1 Introduction 4 I Common 7 2 Overview 8 3 Basic 14 4 Basic types 31 5 Collection 46 6 Misc 61 II OCommon 66 7 Overview 67 8 Ocollection 68 9 Oset 71 10 Oassoc 73 11 Osequence 74 12 Oarray 75 13 Ograph 77 14 Odb 81 2 III Extra Common 82 15 Interface 84 16 Concurrency 89 17 Distribution 90 18 Graphic 91 19 OpenGL 92 20 GUI 93 21 ParserCombinators 94 22 Backtrace 100 23 Glimpse 101 24 Regexp 103 25 Sexp and binio 104 26 Python 105 Conclusion 106 A Indexes 107 B References 108 3 Contents 1 Introduction 4 1.1 Features . 4 1.2 Copyright . 5 1.3 Source organization . 6 1.4 API organization . 6 1.5 Acknowledgements . 6 I Common 7 2 Overview 8 3 Basic 14 3.1 Pervasive types and operators . 14 3.2 Debugging, logging . 15 3.3 Profiling . 17 3.4 Testing . 18 3.5 Persitence . 20 3.6 Counter . 21 3.7 Stringof . 21 3.8 Macro . 22 3.9 Composition and control . 22 3.10 Concurrency . 24 3.11 Error management . 24 3.12 Environment . 25 3.13 Arguments . 28 3.14 Equality . 29 4 Basic types 31 4.1 Bool . 31 4.2 Char . 31 4.3 Num .
    [Show full text]
  • Teach Yourself Perl 5 in 21 Days
    Teach Yourself Perl 5 in 21 days David Till Table of Contents: Introduction ● Who Should Read This Book? ● Special Features of This Book ● Programming Examples ● End-of-Day Q& A and Workshop ● Conventions Used in This Book ● What You'll Learn in 21 Days Week 1 Week at a Glance ● Where You're Going Day 1 Getting Started ● What Is Perl? ● How Do I Find Perl? ❍ Where Do I Get Perl? ❍ Other Places to Get Perl ● A Sample Perl Program ● Running a Perl Program ❍ If Something Goes Wrong ● The First Line of Your Perl Program: How Comments Work ❍ Comments ● Line 2: Statements, Tokens, and <STDIN> ❍ Statements and Tokens ❍ Tokens and White Space ❍ What the Tokens Do: Reading from Standard Input ● Line 3: Writing to Standard Output ❍ Function Invocations and Arguments ● Error Messages ● Interpretive Languages Versus Compiled Languages ● Summary ● Q&A ● Workshop ❍ Quiz ❍ Exercises Day 2 Basic Operators and Control Flow ● Storing in Scalar Variables Assignment ❍ The Definition of a Scalar Variable ❍ Scalar Variable Syntax ❍ Assigning a Value to a Scalar Variable ● Performing Arithmetic ❍ Example of Miles-to-Kilometers Conversion ❍ The chop Library Function ● Expressions ❍ Assignments and Expressions ● Other Perl Operators ● Introduction to Conditional Statements ● The if Statement ❍ The Conditional Expression ❍ The Statement Block ❍ Testing for Equality Using == ❍ Other Comparison Operators ● Two-Way Branching Using if and else ● Multi-Way Branching Using elsif ● Writing Loops Using the while Statement ● Nesting Conditional Statements ● Looping Using
    [Show full text]
  • The Glasgow Haskell Compiler User's Guide, Version 4.08
    The Glasgow Haskell Compiler User's Guide, Version 4.08 The GHC Team The Glasgow Haskell Compiler User's Guide, Version 4.08 by The GHC Team Table of Contents The Glasgow Haskell Compiler License ........................................................................................... 9 1. Introduction to GHC ....................................................................................................................10 1.1. The (batch) compilation system components.....................................................................10 1.2. What really happens when I “compile” a Haskell program? .............................................11 1.3. Meta-information: Web sites, mailing lists, etc. ................................................................11 1.4. GHC version numbering policy .........................................................................................12 1.5. Release notes for version 4.08 (July 2000) ........................................................................13 1.5.1. User-visible compiler changes...............................................................................13 1.5.2. User-visible library changes ..................................................................................14 1.5.3. Internal changes.....................................................................................................14 2. Installing from binary distributions............................................................................................16 2.1. Installing on Unix-a-likes...................................................................................................16
    [Show full text]
  • Constrained Type Families (Extended Version)
    Constrained Type Families (extended version) J. GARRETT MORRIS, e University of Edinburgh and e University of Kansas 42 RICHARD A. EISENBERG, Bryn Mawr College We present an approach to support partiality in type-level computation without compromising expressiveness or type safety. Existing frameworks for type-level computation either require totality or implicitly assume it. For example, type families in Haskell provide a powerful, modular means of dening type-level computation. However, their current design implicitly assumes that type families are total, introducing nonsensical types and signicantly complicating the metatheory of type families and their extensions. We propose an alternative design, using qualied types to pair type-level computations with predicates that capture their domains. Our approach naturally captures the intuitive partiality of type families, simplifying their metatheory. As evidence, we present the rst complete proof of consistency for a language with closed type families. CCS Concepts: •eory of computation ! Type structures; •So ware and its engineering ! Func- tional languages; Additional Key Words and Phrases: Type families, Type-level computation, Type classes, Haskell ACM Reference format: J. Garre Morris and Richard A. Eisenberg. 2017. Constrained Type Families (extended version). PACM Progr. Lang. 1, 1, Article 42 (January 2017), 38 pages. DOI: hp://dx.doi.org/10.1145/3110286 1 INTRODUCTION Indexed type families (Chakravarty et al. 2005; Schrijvers et al. 2008) extend the Haskell type system with modular type-level computation. ey allow programmers to dene and use open mappings from types to types. ese have given rise to further extensions of the language, such as closed type families (Eisenberg et al.
    [Show full text]
  • Twelf User's Guide
    Twelf User’s Guide Version 1.4 Frank Pfenning and Carsten Schuermann Copyright c 1998, 2000, 2002 Frank Pfenning and Carsten Schuermann Chapter 1: Introduction 1 1 Introduction Twelf is the current version of a succession of implementations of the logical framework LF. Previous systems include Elf (which provided type reconstruction and the operational semantics reimplemented in Twelf) and MLF (which implemented module-level constructs loosely based on the signatures and functors of ML still missing from Twelf). Twelf should be understood as research software. This means comments, suggestions, and bug reports are extremely welcome, but there are no guarantees regarding response times. The same remark applies to these notes which constitute the only documentation on the present Twelf implementation. For current information including download instructions, publications, and mailing list, see the Twelf home page at http://www.cs.cmu.edu/~twelf/. This User’s Guide is pub- lished as Frank Pfenning and Carsten Schuermann Twelf User’s Guide Technical Report CMU-CS-98-173, Department of Computer Science, Carnegie Mellon University, November 1998. Below we state the typographic conventions in this manual. code for Twelf or ML code ‘samp’ for characters and small code fragments metavar for placeholders in code keyboard for input in verbatim examples hkeyi for keystrokes math for mathematical expressions emph for emphasized phrases File names for examples given in this guide are relative to the main directory of the Twelf installation. For example ‘examples/guide/nd.elf’ may be found in ‘/usr/local/twelf/examples/guide/nd.elf’ if Twelf was installed into the ‘/usr/local/’ directory.
    [Show full text]
  • Session Arrows: a Session-Type Based Framework for Parallel Code Generation
    MENG INDIVIDUAL PROJECT IMPERIAL COLLEGE LONDON DEPARTMENT OF COMPUTING Session Arrows: A Session-Type Based Framework For Parallel Code Generation Supervisor: Prof. Nobuko Yoshida Author: Dr. David Castro-Perez Shuhao Zhang Second Marker: Dr. Iain Phillips June 19, 2019 Abstract Parallel code is notorious for its difficulties in writing, verification and maintenance. However, it is of increasing importance, following the end of Moore’s law. Modern pro- grammers are expected to utilize the power of multi-core CPUs and face the challenges brought by parallel programs. This project builds an embedded framework in Haskell to generate parallel code. Combining the power of multiparty session types with parallel computation, we create a session typed monadic language as the middle layer and use Arrow, a general interface to computation as an abstraction layer on top of the language. With the help of the Arrow interface, we convert the data-flow of the computation to communication and generate parallel code according to the communication pattern between participants involved in the computation. Thanks to the addition of session types, not only the generated code is guaranteed to be deadlock-free, but also we gain a set of local types so that it is possible to reason about the communication structure of the parallel computation. In order to show that the framework is as expressive as usual programming lan- guages, we write several common parallel computation patterns and three algorithms to benchmark using our framework. They demonstrate that users can express computa- tion similar to traditional sequential code and gain, for free, high-performance parallel code in low-level target languages such as C.
    [Show full text]
  • Haskell Quick Syntax Reference
    Haskell Quick Syntax Reference A Pocket Guide to the Language, APIs, and Library — Stefania Loredana Nita Marius Mihailescu www.allitebooks.com Haskell Quick Syntax Reference A Pocket Guide to the Language, APIs, and Library Stefania Loredana Nita Marius Mihailescu www.allitebooks.com Haskell Quick Syntax Reference: A Pocket Guide to the Language, APIs, and Library Stefania Loredana Nita Marius Mihailescu Bucharest, Romania Bucharest, Romania ISBN-13 (pbk): 978-1-4842-4506-4 ISBN-13 (electronic): 978-1-4842-4507-1 https://doi.org/10.1007/978-1-4842-4507-1 Copyright © 2019 by Stefania Loredana Nita and Marius Mihailescu This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. Trademarked names, logos, and images may appear in this book. Rather than use a trademark symbol with every occurrence of a trademarked name, logo, or image we use the names, logos, and images only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject to proprietary rights.
    [Show full text]
  • Programming Inin C++C++ 11.11
    ProgrammingProgramming inin C++C++ 11.11. ControlControl structurestructure IIII ! Iteration statements - while statement - for statement - do statement ! break and continue statements ! goto statement ! Comma operator ! Null statement ! Conditional expression operator ! Summary 1 Introduction to programming in C++ for engineers: 11. Control structure II IterationIteration statements:statements: whilewhile statementstatement The while statement takes the general form: while (control) statement The control expression which is of arithmetic type is evaluated before each execution of what is in a statement. This statement is executed if the expression is not zero and then the test is repeated. There is no do associated with the while!!! int n =5; double gamma = 1.0; while (n > 0) { gamma *= n-- } If the test never fails then the iteration never terminates. 2 Introduction to programming in C++ for engineers: 11. Control structure II IterationIteration statements:statements: forfor statementstatement The for statement has the general form: for (initialize; control; change) statement The initialize expression is evaluated first. If control is non-zero statement is executed. The change expression is then evaluated and if control is still non-zero statement is executed again. Control continues to cycle between control, statement and change until the control expression is zero. long unsigned int factorial = 1; for(inti=1;i<=n; ++i) { factorial *= i; } Initialize is evaluated only once! 3 Introduction to programming in C++ for engineers: 11. Control structure II IterationIteration statements:statements: forfor statementstatement It is also possible for any (or even all) of the expressions (but not the semicolons) to be missing: int factorial = 1,i=1; for (; i<= n; ) { factorial *= i++; { If the loop variable is defined in the initialize expression the scope of it is the same as if the definition occurred before the for loop.
    [Show full text]