A compiler for Cada
Submitted May 2013 in partial fulfilment of the conditions of the award of the degree of MSci (Hons) Computer Science.
Michael Benjamin Gale School of Computer Science
University of Nottingham
I hereby declare that this dissertation is all my own work, except as indicated in the text:
Signature
Date / /
Abstract
Monads have evolved into a powerful and versatile tool for functional programming, but remain a challenging concept for those versed in other paradigms. However, their usefulness encourages us to explore means by which they can be made more accessible. While efforts have been made to introduce syntactic constructs for this purpose, such as the do-notation and monad comprehensions in Haskell, little attention has been paid to developing syntax specifically for the unique helper functions which accompany individual monads.
High-level programming languages aim to hide frequently used machinery in an underlying system. In cases where no such abstractions are available, programmers may have to write repetitive or difficult code. For this reason, it is desirable to enrich functional programming languages with abstractions for some frequently used monads.
Our goal in this dissertation will be to explore the role of the state monad, to design a functional programming language which makes effective use of our observations, and to implement a compiler for it in Haskell. In the process of developing this language, which we shall name Cada, we will explore the foundations of functional programming and learn that implementing a compiler for a modern functional language is nothing to be afraid of.
Contents
1 Introduction
  1.1 Project description
  1.2 Recommended reading
  1.3 Acknowledgements
2 Background
  2.1 A tale of two calculi
    2.1.1 Recursion
    2.1.2 Simply-typed λ-calculus
  2.2 System F
    2.2.1 Church vs Curry
    2.2.2 Hindley-Milner
  2.3 Monads
    2.3.1 In Haskell
    2.3.2 The State monad
    2.3.3 Monad transformers
3 Language design
  3.1 Putting it all together
  3.2 Related work
4 Language specification
  4.1 Summary
  4.2 Module system
  4.3 Type system
    4.3.1 Kinds
    4.3.2 Kind inference
  4.4 Lexical grammar
    4.4.1 Program structure
    4.4.2 Identifiers
    4.4.3 Literals
  4.5 Context-free grammar
    4.5.1 Algebraic data types
    4.5.2 Type classes and instances
    4.5.3 Expressions
  4.6 Denotational and operational semantics
  4.7 Standard library
5 Implementation
  5.1 Architecture
    5.1.1 Directory layout
  5.2 Configuration
  5.3 Lexical analysis
    5.3.1 Tokens
    5.3.2 Multi-line comments
  5.4 Parser
    5.4.1 Parser monad
    5.4.2 Left-recursion
    5.4.3 Ambiguities
    5.4.4 Locations
  5.5 Abstract syntax tree
    5.5.1 Construction and validation
  5.6 Pretty printing
    5.6.1 Pretty printing combinators
    5.6.2 Pretty printing of abstract representations
  5.7 Type system
    5.7.1 Type environments
  5.8 Module system
    5.8.1 Interface files
  5.9 Kind inference
  5.10 Conversion
  5.11 Type inference
    5.11.1 Type errors
    5.11.2 Type functions
  5.12 Backend
    5.12.1 Code generation
    5.12.2 Runtime support
    5.12.3 Entry point
  5.13 Testing
6 Conclusion
  6.1 Evaluation of the implementation
  6.2 Future work
    6.2.1 Constructors and modifiers
    6.2.2 Generalisation
    6.2.3 Different type inference algorithm
    6.2.4 Language extension for Haskell
A Appendix
  A.1 System F
  A.2 Hindley-Milner
  A.3 Lexical grammar
  A.4 Context-free grammar
  A.5 Abstract syntax tree
  A.6 Type system
1 Introduction
Computer systems are responsible for the control of many aspects of our modern lives, ranging from wrist watches to rockets. These ever-increasing responsibilities mean that we are obliged to ensure that our software is more reliable than ever. At the same time, the growing complexity of programs poses challenges as to how we can achieve this goal.
Traditionally, software engineers ran their programs through sets of rigorous test cases to ensure that there were no bugs. While this approach rules out a reasonable number of faults, it has never been able to track down all of them. To make matters worse, the growing size of software means that more and more time is needed to design a comprehensive set of test cases. It becomes apparent that a different solution is needed to counteract the increasingly high chance of a fault going unnoticed.
Purely functional programming languages provide a promising answer to this problem. If we model computations using mathematical functions, as opposed to procedures which manipulate a computer's memory, then we can formally prove properties about programs using common techniques such as mathematical induction. Purity is its own worst enemy, however, as it forces us to make everything about functions explicit. Tasks which are easily accomplished in imperative languages can quickly turn into major obstacles for such a language. An even bigger and seemingly insurmountable problem is the need for side-effects, such as terminal input and output, in real software systems.
Despite these obstacles, a series of research discoveries has turned the tide in favour of pure languages. Moggi (1991) introduced monads, a concept which originates in category theory, to functional programming as an approach to structuring programs. Wadler (1992) was then quick to use them as abstractions for effects such as failure and exceptions.
Finally, Peyton Jones (2001) showed that we can safely hide side-effects using a monad if we keep such impure computations separate from pure functions with the help of a type system such as Haskell's (Peyton Jones, 2002).
These breakthroughs have laid the foundations which were required to make purely functional programming languages an attractive option for the software industry. As a result, monads have become the superstars of functional programming and have even made their way into languages of other paradigms.
Our focus will remain on Haskell, however, where syntactic sugar for monads has been added to the language in the shape of the do-notation, list comprehensions, and the more general monad comprehensions. However, we can observe that, with the exception of list comprehensions, all of these notations are compatible with any monad. The obvious implication of this is that none of the syntactic sugar makes use of the helper functions which accompany individual monads. Many of the effects which were implemented using monads in Wadler's paper have no corresponding syntactic abstractions, despite their significance in software development.

1.1 Project description

We argue that software programs typically have to maintain a lot of state, such as database connections, file handles, and configuration values, which may have to be read or changed. If it were not for the state monad, all of this information would have to be explicitly passed from function to function. Even with the help of a monad, this approach still makes it much more complicated to achieve what would be trivial tasks in imperative or object-oriented languages, and requires programmers to understand many of the inner workings of monads.
It is in the nature of high-level programming languages to provide abstractions for such machinery. This project's goal will therefore be to explore modern, purely functional languages in depth through the design of Cada, a language based on Haskell 98, and an implementation of a compiler for it. We will also investigate the role of mutable state in the context of this language.
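To make the argument concrete, the sketch below (our own illustration; the labelling example and every name in it are hypothetical, not part of the project) contrasts explicit state threading with a minimal hand-rolled State monad:

```haskell
-- A minimal State monad, defined from scratch so the sketch is self-contained.
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s -> let (a, s') = g s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State mf <*> State mx = State $ \s ->
    let (f, s')  = mf s
        (x, s'') = mx s'
    in (f x, s'')

instance Monad (State s) where
  State m >>= f = State $ \s -> let (a, s') = m s in runState (f a) s'

get :: State s s
get = State $ \s -> (s, s)

put :: s -> State s ()
put s = State $ \_ -> ((), s)

-- Explicit style: a counter must be passed in and returned by hand.
labelExplicit :: [String] -> Int -> ([String], Int)
labelExplicit []       n = ([], n)
labelExplicit (x : xs) n =
  let (rest, n') = labelExplicit xs (n + 1)
  in  ((show n ++ ": " ++ x) : rest, n')

-- Monadic style: the counter is threaded implicitly by (>>=).
label :: [String] -> State Int [String]
label = mapM $ \x -> do
  n <- get
  put (n + 1)
  return (show n ++ ": " ++ x)

main :: IO ()
main = do
  print (labelExplicit ["a", "b"] 0)
  print (fst (runState (label ["a", "b"]) 0))
```

Both functions compute the same labelled list, but in the monadic version the counter never appears in the program text between get and put; abstractions of this kind are what this project seeks to support syntactically.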
It makes sense that tools which simplify the work of software engineers will make them more productive and contribute to the usefulness of a language. Thus we wish to accomplish the following goals:
• Gain an understanding of the relevant programming language theory, such as the λ-calculus and its typed variants, as well as of concepts in functional programming, including monads and equational reasoning. Introductions to all these topics are given in chapter 2.
• Identification of recurring patterns in which the state monad is used. This will serve as the foundation for the design of the abstractions we wish to provide in our language. We note that our aim is to reduce redundant code and to hide as much of the underlying machinery as possible. We present our observations and ideas in chapter 3.
• Design of syntactic constructs based on the findings of the previous goal in a way such that they are compatible with the usual definitions of the state monad and state monad transformer. This will enable us to work with functions in existing libraries. A formal specification of our language is described in chapter 4.
• Implementation of a compiler for the resulting language to apply the results of the preceding goals, understand the limitations we had to impose, incorporate the ideas we have developed, and to test and compare programs written using the new abstractions and without them in comparable languages. We discuss the details of our implementation in chapter 5.
Our intention is not, however, to develop a fully featured general-purpose programming language, as compilers can be very large and complex pieces of software. Therefore, it is vital that we limit our scope to the aspects which are significant to our investigation, by imposing the following limitations:
• Runtime performance is not a primary concern for us. Therefore, it would be undesirable for us to spend a significant amount of time on the implementation of a code generator which, for a functional programming language, would also rely on an extensive runtime system. Luckily, we have the option to either implement an interpreter in our implementation language or to compile to another high-level language.
• As a result of the above, we are also not too worried about optimising the generated code. If we compile to an existing language, then the respective compiler will most likely take care of optimisations for us.
• Some features of general-purpose languages, such as floating-point numbers, are vital for graphics processing and scientific calculations, but their correct implementation is often very time consuming. More importantly, they are not required for us to accomplish our goals and we may therefore give such functionality a low priority.
1.2 Recommended reading

This dissertation is written for undergraduate students in computer science with no expert knowledge of the field. We assume that readers are familiar with a functional programming language, preferably Haskell, and with formal specifications of programming languages. Some good reference texts in the respective areas include Programming in Haskell (Hutton, 2007) and Types and Programming Languages (Pierce, 2002).
1.3 Acknowledgements

This dissertation has benefited greatly from the input of the members of the Functional Programming Laboratory at the University of Nottingham. In particular, I am thankful for the useful advice and support from Henrik Nilsson, Laurence E. Day and Florent Balestrieri. I am very grateful for the discussions I have had with Alan Mycroft at the Computer Laboratory of the University of Cambridge and Simon Peyton Jones at Microsoft Research. Lastly, many thanks go to my supervisor, Graham Hutton, who has been very supportive from the conception to the end of this project, and especially during the final few weeks.
2 Background
Imperative programming languages have traditionally formed the dominant tool for the development of computer software, due to their abstractions for the underlying hardware which permit developers to quickly write powerful and efficient programs for virtually any task. However, the few restrictions on what programmers may do in them quickly lead to a situation where it becomes impossible to verify that programs do what they are supposed to. High-profile failures in safety-critical systems, such as the arithmetic overflow error aboard the Ariane 5 rocket, bear witness to that (Dowson, 1997).
Purely functional programming languages aim to solve this dilemma through the elimination of side-effects, thus enabling us to use equational reasoning to prove properties of programs. However, pure programs on their own are useless as there is, for example, no way to interact with a user through a terminal, read data from a sensor, or write to a file. Peyton Jones (2001) proposed a workaround for this problem, which allows impure computations to be embedded in a pure language by getting the type system to keep both worlds separate.
Since then, languages like Haskell have gained much popularity outside of the academic community, but many tasks remain easier to accomplish in imperative or object-oriented languages. In the remainder of this chapter we will look at the key ideas in functional programming needed to understand why this is the case and to establish the theoretical background of this project.

2.1 A tale of two calculi

According to the Church-Turing Thesis, only values of functions which can be represented using a Turing machine are computable and no other process can carry out more powerful computations. Other models, such as the λ-calculus (Church, 1941), have been shown to be equivalent to Turing machines and thus reinforce this hypothesis. This may be surprising, given the simplicity of terms in this model:

e = x | λx.e | (e e)
These three productions correspond to variables, abstractions, and function application. We distinguish between free and bound variables, where abstractions are used to bind names in subterms. For example, x is bound in λx.x, but free in λy.x. There is only one reduction rule, known as β-reduction, which describes function application:
(λx.e1) e2 ⇒β [x → e2]e1
A subterm where β-reduction may be applied is known as a β-redex. If a term con- tains none, then we say that it is in β-normal form. Substitution in the λ-calculus is capture-avoiding. If, for example, e2 contains names which are bound in e1, then they will be given new names. Two terms in the λ-calculus are equivalent up to renaming, which we describe as α-conversion:
λx.e ⇔α λy.[x → y]e    where y is not free in e
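The capture-avoiding substitution and renaming described above can be sketched in Haskell (our own illustration, not code from this dissertation; variables are represented as Strings, and fresh names are generated by priming):

```haskell
import Data.List (union, (\\))

-- Untyped λ-terms with String variables.
data Term = V String | Lam String Term | App Term Term
  deriving (Eq, Show)

-- The free variables of a term: a binder removes its name from the set.
free :: Term -> [String]
free (V x)     = [x]
free (Lam x e) = free e \\ [x]
free (App l r) = free l `union` free r

-- subst x s e computes [x -> s]e, α-renaming binders that would
-- otherwise capture free variables of s.
subst :: String -> Term -> Term -> Term
subst x s (V y)
  | y == x          = s
  | otherwise       = V y
subst x s (App l r) = App (subst x s l) (subst x s r)
subst x s (Lam y e)
  | y == x          = Lam y e                          -- x is shadowed
  | y `elem` free s = let y' = fresh y (free s `union` free e)
                      in Lam y' (subst x s (subst y (V y') e))
  | otherwise       = Lam y (subst x s e)

-- Keep priming a name until it clashes with nothing in scope.
fresh :: String -> [String] -> String
fresh y used | y `elem` used = fresh (y ++ "'") used
             | otherwise     = y

main :: IO ()
main = print (subst "x" (V "y") (Lam "y" (App (V "x") (V "y"))))
```

In the example in main, the binder y must be renamed before substituting, since the substituted term mentions a free y.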
For the purpose of reasoning about our programs, let us introduce one final transfor- mation, known as η-conversion:
f ⇔η λx.f x    where x is not free in f
These are all of the basic principles of the λ-calculus we will need, and one may now wonder how anything useful can be computed using such a simple system. The solution may not be immediately obvious: in order to do this, we first require some notion of data, but the syntax of the λ-calculus does not provide for this. Instead, data can be encoded in the λ-calculus using the Church encoding. For example, a Church numeral represents a natural number n ∈ N using n-many applications of some variable f to some other variable x:
ZERO ≡ λf.λx.x
ONE  ≡ λf.λx.f x
TWO  ≡ λf.λx.f (f x)
...
Lists, pairs, etc. can be encoded in a similar fashion. We are now able to perform computations using higher-order functions, which expect their arguments to be data in this encoding:

PLUS ≡ λm.λn.λf.λx.m f (n f x)
This term can be used to add two natural numbers in the Church encoding, producing the resulting Church numeral. As an example, suppose we want to calculate 1 + 1 in the λ-calculus; we can do this simply using β-reduction:
  PLUS ONE ONE
≡   { definition of PLUS }
  (λm.λn.λf.λx.m f (n f x)) ONE ONE
⇒β  { β-reduction for m/ONE and n/ONE }
  λf.λx.ONE f (ONE f x)
⇒β  { ONE f x reduces to f x }
  λf.λx.ONE f (f x)
⇒β  { ONE f (f x) reduces to f (f x) }
  λf.λx.f (f x)
≡   { definition of TWO }
  TWO
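The Church encoding translates directly into Haskell (a sketch of our own, using the RankNTypes extension so that a numeral may be used at any result type):

```haskell
{-# LANGUAGE RankNTypes #-}

-- A Church numeral applies its first argument n times to its second.
type Church = forall a. (a -> a) -> a -> a

zero, one, two :: Church
zero = \_ x -> x
one  = \f x -> f x
two  = \f x -> f (f x)

-- Addition, exactly as the PLUS term in the text.
plus :: Church -> Church -> Church
plus m n = \f x -> m f (n f x)

-- Recover an ordinary Int by instantiating a numeral at Int,
-- applying it to (+ 1) and 0.
toInt :: Church -> Int
toInt n = n (+ 1) 0

main :: IO ()
main = print (toInt (plus one one))  -- prints 2
```

Evaluating plus one one and converting the result reproduces the β-reduction above.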
Note that definitions are merely used for our convenience and do not form part of the syntax of the λ-calculus. Thus, our definitions cannot be recursive and must be acyclic. However, we know that recursion is essential to construct loops which, in turn, are required for Turing completeness.

2.1.1 Recursion

Recursion can be encoded in the λ-calculus using a fixed-point combinator. In other words, we can describe a higher-order function which calculates the fixed-point of some other function. The most commonly used of these fixed-point combinators is the Y-combinator shown below:
Y ≡ λf.(λx.f (x x)) (λx.f (x x))
Conceptually, recursion requires a function to be able to reference itself. The Y-combinator accomplishes this by providing the specified function with a continuation which allows it to do this. To see how it works, consider the example below:
  (λf.(λx.f (x x)) (λx.f (x x))) g
⇒β  { β-reduction for f/g }
  (λx.g (x x)) (λx.g (x x))
⇒β  { β-reduction for x/(λx.g (x x)) }
  g ((λx.g (x x)) (λx.g (x x)))
≡   { η-expansion, definition of Y }
  g (Y g)
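In Haskell, where recursion is already available at the definition level, a combinator with the behaviour of Y can be written directly (a sketch; the factorial example is our own, not from the text):

```haskell
-- fix is definable directly: Haskell's own recursion and laziness play
-- the role of the self-application trick in the Y-combinator.
fix :: (a -> a) -> a
fix f = f (fix f)

-- A hypothetical example: factorial without explicit recursion; the
-- "recursive call" arrives as the continuation argument rec.
factorial :: Integer -> Integer
factorial = fix (\rec n -> if n == 0 then 1 else n * rec (n - 1))

main :: IO ()
main = print (factorial 5)  -- prints 120
```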
We can observe that g is now applied to the initial term we had, thus giving it a continuation through which it may refer to itself. Note that this term is an example of one which has no β-normal form, because we could choose to repeatedly evaluate Y g, as the evaluation order is unspecified. As a result, we say that the λ-calculus is not strongly normalising, as there exist terms for which evaluation may not terminate.

2.1.2 Simply-typed λ-calculus

Previously, we have talked informally about expectations we have about the arguments of functions. For example, we said that m and n in PLUS should be Church numerals, but there is no way for us to enforce such a restriction. This is where one motivation for types comes from: if we can describe formally what the properties of terms should be, and if there is a way to check this, then our programs are less likely to get stuck or do something unexpected. In the simply-typed λ-calculus (or λ→), we introduce a new syntax for types which we will use to describe terms:
τ = τ → τ | T ∈ B
Types are either function types, from one type to another, or drawn from the set of base types B. The term syntax changes slightly, such that all abstractions must be annotated with a type for the variable which they are binding:
e = x | λx : τ.e | (e e)
For example, if we choose B = { Nat }, then we could describe an identity function which expects its argument to be a natural number:
IDNAT ≡ λx : Nat.x
Of course, defining such terms is in itself not good enough, as we need a way to verify the soundness of our constructions. For example, we would expect IDNAT to be described as a function of type Nat → Nat, and we would want a term such as IDNAT IDNAT to be rejected, because the argument of the first IDNAT is not of type Nat. For this purpose, there exists a set of typing judgements which is used to determine the types of terms in λ→. Each such rule is given in the following format:
Premise1    ...    Premisen
───────────────────────────
        Conclusion
We write a list of premises (pre-conditions) above a line, below which we can find a conclusion. If all premises are met, then we can conclude whatever is written below the line. If no rule applies, then a term is said to be ill-typed. Figure 2.1 contains the typing judgements for λ→, where ⊢ denotes the ternary typing relation between contexts, terms, and types.
(x : τ) ∈ Γ
─────────── TVar
Γ ⊢ x : τ

Γ, x : σ ⊢ e : τ
──────────────────── TAbs
Γ ⊢ λx : σ.e : σ → τ

Γ ⊢ e1 : σ → τ    Γ ⊢ e2 : σ
──────────────────────────── TApp
Γ ⊢ e1 e2 : τ
Figure 2.1: Typing judgements for λ→
In order to derive that, for example, IDNAT is of type Nat → Nat, we can construct a proof tree by starting with a rule where the term in the conclusion matches the format of our current term. TAbs is our only option here, but in more complicated deduction systems there may be more choice. Note that all of the variables in the typing judgements above, such as τ, x, and e1, are metavariables and can be replaced with the corresponding terms in our derivation:
(x : Nat) ∈ Γ, x : Nat
────────────────────── TVar
Γ, x : Nat ⊢ x : Nat
────────────────────────── TAbs
Γ ⊢ λx : Nat.x : Nat → Nat
Γ is the typing environment or context, which can be thought of as a set containing all known typings. Initially, we do not make any assumptions about the contents of Γ in the derivation above, but add to it when a new typing is introduced in TAbs.
Finally, let us note that the simply-typed λ-calculus is strongly normalising, as terms which are not guaranteed to have a normal form, such as the Y-combinator, cannot be typed and are thus rejected. However, this means that, unlike the λ-calculus, λ→ is not Turing complete. Should recursion be required, it is possible to cheat a little by adding a typing fix : (τ → τ) → τ for all types τ, where fix has the operational semantics of the Y-combinator.
While λ→ in the presentation we have shown here is not particularly well suited for use in programming languages and serves a more theoretical purpose, Plotkin (1977) presented a programming language, similar to the simply-typed λ-calculus, with the addition of types and literals for the booleans and natural numbers, as well as recursion in the style outlined above. Readers may find an implementation of this language in Haskell, written by the author of this dissertation, online.
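The typing judgements of Figure 2.1 can be turned into a small checker almost verbatim (a sketch of our own, not the implementation mentioned above; Nat is taken as the only base type):

```haskell
-- Types: the base type Nat, or function types built with :->.
data Ty = TNat | Ty :-> Ty deriving (Eq, Show)

-- Church-style terms: every Lam annotates its bound variable with a type.
data Tm = Var String | Lam String Ty Tm | App Tm Tm
  deriving Show

-- The context Γ as an association list of typings.
type Ctx = [(String, Ty)]

-- One equation per typing judgement; Nothing means "ill-typed".
typeOf :: Ctx -> Tm -> Maybe Ty
typeOf ctx (Var x)     = lookup x ctx                  -- TVar
typeOf ctx (Lam x t e) = do                            -- TAbs
  r <- typeOf ((x, t) : ctx) e
  return (t :-> r)
typeOf ctx (App e1 e2) = do                            -- TApp
  t1 <- typeOf ctx e1
  t2 <- typeOf ctx e2
  case t1 of
    a :-> b | a == t2 -> return b
    _                 -> Nothing

idNat :: Tm
idNat = Lam "x" TNat (Var "x")

main :: IO ()
main = do
  print (typeOf [] idNat)              -- the expected Nat → Nat
  print (typeOf [] (App idNat idNat))  -- rejected, as argued above
```

As discussed in the text, IDNAT is assigned Nat → Nat, while IDNAT IDNAT is rejected because the argument is not of type Nat.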
2.2 System F

Suppose that we are using the simply-typed λ-calculus as a programming language with B = { Bool, Nat } and wish to define the identity function. This task would be easy in the untyped λ-calculus, as we can simply write λx.x. However, in λ→, we need to specify a type for x. If we say that x is of type Bool, then our identity function can only be applied to arguments of type Bool, but not to natural numbers. We can attempt to solve this dilemma by defining an identity function for each type:
IDBOOL ≡ λx : Bool.x
IDNAT  ≡ λx : Nat.x
...
Despite having a solution to our initial problem, we now have to give a definition for every single type. However, there are infinitely-many types and we can observe that the two definitions are almost identical, except for the different types in the abstraction. Girard (1971) and Reynolds (1974) independently discovered a way to get around this problem, through the addition of type abstractions and universally quantified type variables to λ→, resulting in a calculus now known as System F or the second-order λ-calculus. A full reference of this system may be found in appendix A.1.
In order to construct a polymorphic identity function, we need to parametrise it over some type using the newly added type abstractions. We will refer to this style of polymorphism as parametric polymorphism:
ID ≡ Λα.λx : α.x
Here, Λ is a type abstraction which can bind a type variable to some type, in the same way an abstraction binds a variable to some term. Let us demonstrate how the type of the above term may be derived:
(x : α) ∈ Γ, x : α
────────────────── TVar
Γ, x : α ⊢ x : α
──────────────────── TAbs
Γ ⊢ λx : α.x : α → α    α ∉ TV(Γ)
───────────────────────────────── TTyAbs
Γ ⊢ Λα.λx : α.x : ∀α.α → α
We conclude that ID : ∀α.α → α which can simply be read as “for all types α, ID is a function from α to α”. For example, we can now reconstruct our earlier definitions of IDBOOL and IDNAT using ID by applying it to the respective types:
IDBOOL ≡ ID Bool
IDNAT  ≡ ID Nat
It is easy to show that these terms yield the previous types, using the TTyApp rule:
Γ, ID : ∀α.α → α ⊢ ID : ∀α.α → α
──────────────────────────────────────── TTyApp
Γ, ID : ∀α.α → α ⊢ ID Bool : Bool → Bool
Similarly to our previous experience with the simply-typed λ-calculus, our brief presentation of System F is largely of theoretical use, to understand how we may describe terms using polymorphic types. In order to make this calculus practical for use as the foundation of a programming language, it would be beneficial to get rid of type annotations in terms. Consider the following example, where we define Church booleans and functions on them:

TRUE  ≡ Λα.λx : α.λy : α.x
FALSE ≡ Λα.λx : α.λy : α.y
AND ≡ λx : ∀α.α → α → α.λy : ∀α.α → α → α.x (∀α.α → α → α) y FALSE
As we can see, the type annotations in the example above make the definition of AND incredibly verbose. If programmers were required to write programs in this style, they might easily become frustrated.

2.2.1 Church vs Curry

Our definitions of λ→ and System F are in the Church style, where λ-abstractions must annotate the variable they are binding with a type. For use in programming languages, we prefer the Curry style, where we take terms from the pure λ-calculus and assign types from e.g. the simply-typed λ-calculus to them, through a process known as type reconstruction or type inference. Unfortunately for us, Wells (1998) proved that this process is undecidable for System F types whose rank is greater than 2. The rank of a type is determined by the greatest depth of a quantifier to the left of an arrow:
f : ∀α.(α → α)                    Rank-1 type
g : ∀α.(α → (∀β.β))               Rank-1 type
h : (∀α.α → α) → Bool             Rank-2 type
i : ((∀α.α → α) → Bool) → Bool    Rank-3 type
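In GHC, types of higher rank can be written with the RankNTypes extension, although anything beyond rank 1 must be annotated rather than inferred (a sketch of our own; GHC is assumed):

```haskell
{-# LANGUAGE RankNTypes #-}

-- A rank-2 type: the argument g must itself be polymorphic.
h :: (forall a. a -> a) -> Bool
h g = g True

-- A pair-returning variant: g is used at two different types in one body,
-- which only typechecks because of the rank-2 annotation.
fun :: (forall a. a -> a) -> (Int, Bool)
fun g = (g 3, g True)

main :: IO ()
main = print (fun id)  -- prints (3,True)
```

Removing the signature of fun makes GHC reject the definition, since it will not infer a higher-rank type for g.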
To better understand the intuition behind this problem, let us consider the following term where we use some syntactic sugar to denote pairs:
FUN ≡ λg.(g 3, g True)
What should the System F type of this term be? There are multiple permissible types, including (∀α.α → α) → (Nat, Bool) and (∀α.α → Nat) → (Nat, Nat). This in itself is not a problem, because we could just arbitrarily choose one type. However, if this term is part of a larger one and fails to match another type further down the line, we need to go back and choose differently. Of course, there are infinitely-many choices to try, one of which may be right, such that the problem becomes undecidable.

2.2.2 Hindley-Milner

Instinctively, one may now conclude that it would make sense to design a programming language based on the pure λ-calculus with System F types which are restricted to rank 2. However, the result probably would not be all that useful because, for starters, we would have to let the type inference algorithm choose arbitrary typings in some cases, such as our previous example. Additionally, it may be difficult to decide when a term should be given a polymorphic type. For example, if the identity function occurs deeply nested within some larger expression, would it be safe to assign a polymorphic type to it?
Hindley (1969) and later Milner (1978) described a system which now forms the foundation of many functional programming languages, including Haskell and ML. Their idea was to make a clear distinction between polymorphic and monomorphic types, thus restricting polymorphic types to at most rank 1. The syntax of this system, which we shall refer to as Hindley-Milner from here on, is shown in appendix A.2. We can note that this distinction carries through to the typing judgements, where polytypes are only permitted if we use σ. However, we need to solve the issue of having to decide when terms should have polymorphic types and when they should not. Hindley-Milner solves this problem using what is now known as let-polymorphism.
Polymorphic typings may be introduced at any point using let-expressions, with the help of TGen, which generalises a monomorphic type to a polymorphic type by adding quantifiers for the free type variables. As an example, let us show how the polymorphic type of the identity function may be derived:

(x : α) ∈ Γ, x : α
────────────────── TVar
Γ, x : α ⊢ x : α
──────────────── TAbs
Γ ⊢ λx.x : α → α    α ∉ FV(Γ)
───────────────────────────── TGen
Γ ⊢ λx.x : ∀α.α → α

(id : ∀α.α → α) ∈ Γ, id : ∀α.α → α
────────────────────────────────── TVar
Γ, id : ∀α.α → α ⊢ id : ∀α.α → α

Combining the two sub-derivations:

Γ ⊢ λx.x : ∀α.α → α    Γ, id : ∀α.α → α ⊢ id : ∀α.α → α
─────────────────────────────────────────────────────── TLet
Γ ⊢ let id = λx.x in id : ∀α.α → α

As we can see, the rule TLet expects a polytype σ to be added to the type environment, thus forcing us to use TGen first in this case, because most other typing judgements, such as TAbs, expect and provide monomorphic types. Indeed, type variables will never be instantiated with polytypes. This is accomplished with the help of TInst, which allows us to instantiate type variables using the specialisation rule shown in Figure A.4. In the following example, we assume that (id : ∀α.α → α) ∈ Γ to derive the type of id applied to itself:
(id : ∀α.α → α) ∈ Γ
─────────────────── TVar
Γ ⊢ id : ∀α.α → α    ∀α.α → α ⊑ β → β
───────────────────────────────────── TInst
Γ ⊢ id : β → β                                        (⋆)

(id : ∀α.α → α) ∈ Γ
─────────────────── TVar
Γ ⊢ id : ∀α.α → α    ∀α.α → α ⊑ (β → β) → β → β
─────────────────────────────────────────────── TInst
Γ ⊢ id : (β → β) → β → β                              (⋆⋆)

Γ ⊢ id : (β → β) → β → β   (⋆⋆)    Γ ⊢ id : β → β   (⋆)
─────────────────────────────────────────────────────── TApp
Γ ⊢ id id : β → β
We carefully choose to instantiate α with β → β in the left sub-derivation so that it later matches the type returned by the right sub-derivation. In order for Hindley-Milner to become truly useful, we need a sound and complete algorithm which is able to make such choices for us. Luckily, the system was presented together with Algorithm W (Figure A.5), which was proven to be sound and complete by Damas and Milner (1982). This algorithm encodes the deductive system used above, with three points of interest: polytypes obtained from Γ are always instantiated with fresh type variables, monotypes are generalised in let-bindings before they are added to the type environment, and unification (Robinson, 1965) is used to find the most-general unifiers for types obtained from the sub-terms in applications, in order to prevent the algorithm from having to predict the right types during instantiation.

2.3 Monads

Both an advantage and a disadvantage of the λ-calculi we have seen is that everything must be explicit, as functions are pure in the sense that, given the same set of arguments, they will always compute the same result. Moggi (1991) discovered that monads, which are abstract structures in category theory, can be used to model effectful computations.
While we will not concern ourselves too much with category theory in this dissertation, a short introduction to the topic, aimed at readers with backgrounds in computer science, may be found in Basic Category Theory for Computer Scientists (Pierce, 1991). Category Theory (Awodey, 2006) and Categories for the Working Mathematician (Mac Lane, 1998) provide more comprehensive texts.
2.3.1 In Haskell

Suppose that Hask is the category of all Haskell objects; then an object in this category is just a type. In other words, Hask corresponds to the kind of types. We denote this kind using ⋆ in the syntax of Haskell. An endofunctor over Hask is thus a function on the type level whose domain and codomain are both ⋆, which is written as ⋆ → ⋆. Any type m of this kind can be a monad if there exist functions of types a → m a and m a → (a → m b) → m b which adhere to the identity and associativity laws shown below, collectively known as the monad laws:
Left identity:   return a ≫= f  =  f a
Right identity:  m ≫= return    =  m
Associativity:   (m ≫= f) ≫= g  =  m ≫= (λx → f x ≫= g)
We define an appropriately named type class to represent monads, which suitable types can be made an instance of:
class Monad m where
    return :: a → m a
    (≫=)   :: m a → (a → m b) → m b
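A standard instance of this class (our illustration, using GHC's built-in Maybe and writing ≫= in its ASCII form >>=) models computations which may fail:

```haskell
-- Maybe is a monad: return wraps a value with Just, and (>>=)
-- short-circuits as soon as any step produces Nothing.
safeDiv :: Int -> Int -> Maybe Int
safeDiv _ 0 = Nothing
safeDiv x y = Just (x `div` y)

-- Two divisions sequenced with (>>=): any failure aborts the whole chain.
calc :: Int -> Int -> Int -> Maybe Int
calc x y z = safeDiv x y >>= \q -> safeDiv q z

main :: IO ()
main = do
  print (calc 100 5 2)  -- Just 10
  print (calc 100 0 2)  -- Nothing
```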
As an example for such a monad, we will consider a simple expression language which represents the untyped λ-calculus. To begin, we will define an appropriate data type, parametrised over a type a which we will use for variables:
data Expr a = Var a
            | Abs a (Expr a)
            | App (Expr a) (Expr a)
This data type has one parameter which is not applied to any other types on the right-hand side of the definition, and therefore the kind of Expr is ⋆ → ⋆, as we would expect. In order to see whether this type is a monad, we need to create an instance of the Monad type class and find definitions for the return and ≫= functions which satisfy the monad laws. Finding such definitions is typically just a "game with types" and, by the Curry-Howard correspondence, this is not too surprising, as we are essentially trying to find proofs for the existence of functions with the given types. There is only one way to wrap a value of type a into Expr and thus it is easy to give a definition for return:
return = Var
Our goal for ≫= is to apply a function f :: a → Expr b to all values of type a which are embedded in a value of type Expr a. This is only possible if we traverse a given expression recursively and use pattern matching on each of Expr’s constructors to determine what we should do. Depending on the constructor, we can either apply f to a value of type a or call ourselves recursively on a subexpression:
(Var x)   ≫= f = f x
(Abs x e) ≫= f = Abs x (e ≫= f)
(App l r) ≫= f = App (l ≫= f) (r ≫= f)
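The complete instance can be collected into a runnable sketch. Two assumptions are made here: ASCII (>>=) replaces the typeset symbol, and the binder in Abs is given the fixed type String so that the definition typechecks (if binders share the variable type a, the Abs case cannot produce the Expr b that (>>=) requires); the Functor and Applicative instances are boilerplate required by modern GHC:

```haskell
import Control.Monad (ap)

-- Assumption: binders have the fixed type String, so (>>=) may change
-- the variable type from a to b without touching binders.
data Expr a = Var a
            | Abs String (Expr a)
            | App (Expr a) (Expr a)
            deriving (Eq, Show)

instance Functor Expr where
  fmap f (Var x)   = Var (f x)
  fmap f (Abs x e) = Abs x (fmap f e)
  fmap f (App l r) = App (fmap f l) (fmap f r)

instance Applicative Expr where
  pure  = Var
  (<*>) = ap

instance Monad Expr where
  (Var x)   >>= f = f x
  (Abs x e) >>= f = Abs x (e >>= f)
  (App l r) >>= f = App (l >>= f) (r >>= f)

-- Binding replaces each variable by an expression: here, "x" by (λy. y).
substituted :: Expr String
substituted = App (Var "f") (Var "x") >>= \v ->
                if v == "x" then Abs "y" (Var "y") else Var v
```

Here substituted evaluates to App (Var "f") (Abs "y" (Var "y")): every occurrence of the variable "x" has been replaced.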
Interestingly, the effect of this monad is that of substitution, which happens to be the only means by which computation can be performed in the λ-calculus. However, one last task remains: we need to show that our definitions adhere to all of the monad laws. Luckily, it is very easy to do this using equational reasoning which we gave an introduction to previously. The proofs for the two identity laws are thus trivial as they work by rewriting one side of the equation and induction on m, respectively. Proving that the associativity law holds also works by induction, but is slightly more challenging. To demonstrate how this works and to show that our Expr monad does indeed obey the monad laws, we will give the proof below. As it turns out, the proofs for various other monads have a similar structure so that understanding this proof will enable us to understand the others as well. Inductive proofs always begin with the base case(s) which, in this case, is just Var a:
(Var a ≫= f ) ≫= g = { applying ≫= } f a ≫= g = { η-expansion } (λx → f x ≫= g) a = { unapplying ≫= } Var a ≫= (λx → f x ≫= g)
This was simple, although the η-expansion step may not be immediately obvious. We now have our induction hypothesis and can quickly proceed with the inductive cases:
(Abs a e ≫= f ) ≫= g = { applying ≫= } Abs a (e ≫= f ) ≫= g = { applying ≫= } Abs a ((e ≫= f ) ≫= g) 20 Michael B. Gale
= { induction hypothesis } Abs a (e ≫= (λx → f x ≫= g)) = { unapplying ≫= } Abs a e ≫= (λx → f x ≫= g)
We omit the final case for App e1 e2 as it is mostly the same as that for Abs a e, except that the induction hypothesis needs to be applied twice. However, we have shown how easy it is to reason about pure functional programs in order to prove properties of them.

2.3.2 The State monad

As hinted at previously, we are particularly interested in the state monad, whose effect is mutable state. We know that pure functions have no side-effects and therefore cannot manipulate mutable variables. While this is easily done in imperative programs, state needs to be passed explicitly from function to function in a pure language. Let us consider a classic example, in which we wish to relabel a tree of natural numbers with increasing values from the left-most to the right-most leaf of the tree. We begin with a data type for such trees:
data Tree = Leaf Int | Node Tree Tree

In order to keep track of the last number we have assigned to a leaf, we need a counter which has to be increased every time a leaf is updated. Because we are working with a pure language and all values are immutable, we need to keep track of it explicitly:
label :: Tree → Int → (Tree, Int)
label (Leaf _)   n = (Leaf n, n + 1)
label (Node l r) n = let (l′, n′)  = label l n
                         (r′, n′′) = label r n′
                     in (Node l′ r′, n′′)

Clearly this approach is rather complicated and even an experienced programmer could easily lose track of which iteration of n to use. This is a simple example, and the problem only becomes worse in more realistic scenarios. However, after some thought we can identify a pattern here. Any function which wants to consume a state value needs to accept it as an argument before, optionally, returning an updated version of it in a pair, together with the main result of the computation. In other words, we have functions of the form s → (a, s), which we will refer to as state transformers, where s is the type of the state and a is the type of the main result returned by the function.
It would be good if we could get a monad to take care of this, but we need a type of kind ⋆ → ⋆, while s → (a, s) has kind ⋆. However, we can define a new data type with a single constructor for this purpose:
newtype State s a = S { runState :: s → (a, s) }
Data types need to be parametrised over all type variables which appear on the right- hand side, and we therefore end up with a type of kind ⋆ → ⋆ → ⋆. Luckily, we can just partially apply this type to only one universally quantified argument in the instance declaration for Monad, to end up with a type of the right kind:
instance Monad (State s) where
    return v   = S (λs → (v, s))
    (S m) ≫= f = S (λs → let (r, s′) = m s
                             (S m′)  = f r
                         in m′ s′)
The definition of ≫= may seem confusing at first, but it is once again just a game with types. We are given two arguments whose types are State s a and a → State s b which, if we pattern match on the first argument, give us a function of type s → (a, s). On the right-hand side we expect a value of type State s b, and the only way to construct a value of type State is using the S constructor which, in turn, requires an argument of type s → (b, s). We then use a λ-abstraction to define a new function, bringing a value of type s into scope.

Now we have set up the framework required for ≫= to work, and can simply put the puzzle pieces together in the right order. We apply the function of type s → (a, s) to the value of type s, giving us access to its two results. The first result, of type a, can be used as the argument for f :: a → State s b. This yields a value of type State s b, which lets us retrieve a function of type s → (b, s). Finally, we apply this function to the second result of m s, leaving us with the desired value of type (b, s). We can observe how this structure resembles that of our initial definition of the label function.

Using this monad, we can now define label′ in a cleaner manner. Firstly, we define a helper function named fresh which returns the current state and increments it by one for future use:

fresh :: State Int Int
fresh = S (λs → (s, s + 1))
The type of fresh is State Int Int, indicating that the type of the state is Int and that this state transformer also returns a value of the same type. Our new definition of the relabelling function has type Tree → State Int Tree to reflect that it needs an additional argument of type Tree and also returns a tree:
label′ :: Tree → State Int Tree
label′ (Leaf _)   = do n ← fresh
                       return (Leaf n)
label′ (Node l r) = do l′ ← label′ l
                       r′ ← label′ r
                       return (Node l′ r′)
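For reference, the definitions so far can be collected into a self-contained, runnable sketch. ASCII syntax replaces the typeset symbols, and the Functor and Applicative instances are boilerplate required by modern GHC which the text omits:

```haskell
import Control.Monad (ap, liftM)

newtype State s a = S { runState :: s -> (a, s) }

-- Boilerplate required by GHC's Applicative-Monad hierarchy.
instance Functor (State s) where
  fmap = liftM

instance Applicative (State s) where
  pure v = S (\s -> (v, s))
  (<*>)  = ap

instance Monad (State s) where
  S m >>= f = S (\s -> let (r, s') = m s
                           S m'    = f r
                       in m' s')

-- Return the current counter and increment it for future use.
fresh :: State Int Int
fresh = S (\s -> (s, s + 1))

data Tree = Leaf Int | Node Tree Tree deriving (Eq, Show)

label' :: Tree -> State Int Tree
label' (Leaf _)   = do n <- fresh
                       return (Leaf n)
label' (Node l r) = do l' <- label' l
                       r' <- label' r
                       return (Node l' r')

-- Run the state transformer with an initial counter of 0.
relabel :: Tree -> Tree
relabel t = fst (runState (label' t) 0)
```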
Here we make use of Haskell’s do-notation, which is just syntactic sugar for ≫= where, for example, the notation n ← fresh translates to fresh ≫= λn → . . .. Our new definition of label does not reference the state at any point, allowing us to focus on the implementation of the relabelling algorithm itself.

2.3.3 Monad transformers

After the previous section, one may be very excited by the knowledge that monads can be used to hide ugly machinery, with the aim of structuring code in such a way that one may focus on only its important aspects. However, once the initial euphoria has worn off and we start playing with monads for other effects, ranging from exceptions to non-determinism, we will inevitably run into the question of how multiple such effects may be used at once.

Sadly, it is not possible to simply compose the effects of two or more arbitrary monads with each other. However, if we look back at the definitions of return and ≫= we have seen so far, we note that they are pure functions; that is, those functions do not have any effects themselves. So what if we were to define a monad in terms of functions which already have an effect? Monads which are defined in this way are called monad transformers, and they allow us to construct stacks of monads and thus of effects. In order to define a monad transformer version of, for example, State, we define a new data type with an additional parameter for the type of the underlying monad:
newtype StateT s m a = ST { runStateT :: s → m (a, s) }
This type can then be made an instance of Monad in a similar fashion as State, except that we need to wrap the results of the functions we use as arguments for ST into the m type. However, the only thing we know about m is that it is an instance of Monad, such that there are return and ≫= functions for it:
instance Monad m ⇒ Monad (StateT s m) where
    return v    = ST (λs → return (v, s))
    (ST m) ≫= f = ST (λs → do (r, s′) ← m s
                              let (ST m′) = f r
                              in m′ s′)
The definition of return is relatively simple; note that the return on the right-hand side is the generic return function of the underlying monad m. Equally, ≫= makes use of the generic ≫= function, although this is hidden in the do-notation. Everything else has remained the same as before. If we use StateT, we get both the effect of mutable state as well as the effect(s) of the underlying monad. Of course, m may be a monad transformer itself, such that we can stack as many effects as we like. Note that State can be defined in terms of StateT with the help of the identity monad:
newtype Identity a = Identity { runIdentity :: a }
instance Monad Identity where
    return = Identity
    (Identity x) ≫= f = f x
Now State can be defined as a simple type alias, which gives us the benefit that all helper functions such as get, put, modify, etc. only need to be defined for StateT:
type State s = StateT s Identity
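The full construction can be sketched as a runnable module, again in ASCII syntax and with the Functor/Applicative boilerplate modern GHC requires. The helpers get and put are written against StateT once, as described above; their definitions here are the usual ones and are assumptions relative to the text:

```haskell
import Control.Monad (ap, liftM)

newtype StateT s m a = ST { runStateT :: s -> m (a, s) }

instance Monad m => Functor (StateT s m) where
  fmap = liftM

instance Monad m => Applicative (StateT s m) where
  pure v = ST (\s -> return (v, s))
  (<*>)  = ap

instance Monad m => Monad (StateT s m) where
  ST m >>= f = ST (\s -> do (r, s') <- m s
                            let ST m' = f r
                            m' s')

newtype Identity a = Identity { runIdentity :: a }

instance Functor Identity where
  fmap f (Identity x) = Identity (f x)

instance Applicative Identity where
  pure = Identity
  Identity f <*> Identity x = Identity (f x)

instance Monad Identity where
  Identity x >>= f = f x

-- State is recovered as a special case of StateT ...
type State s = StateT s Identity

runState :: State s a -> s -> (a, s)
runState m = runIdentity . runStateT m

-- ... so helpers like get and put need only be defined once, for StateT.
get :: Monad m => StateT s m s
get = ST (\s -> return (s, s))

put :: Monad m => s -> StateT s m ()
put s' = ST (\_ -> return ((), s'))

-- A small computation in the pure stack: read the counter, then bump it.
tick :: State Int Int
tick = do n <- get
          put (n + 1)
          return n
```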
3 Language design
In the previous chapter we learnt about many of the key ideas in functional programming and why they are great for software engineering; now that we are familiar with them, it makes sense to also look at some of the shortcomings. In particular, we will focus on mutable state, how it is supported by languages of other paradigms, and how we might enable functional languages to achieve similar conveniences.

If we consider the Von Neumann architecture, which most modern computers are based on, we know that program execution requires a processing unit and memory to store code as well as data. Computation is then performed through the mutation of data in memory. That is, any data we store in the memory of a computer is mutable, meaning that it can be changed by programs whenever they request to do so.

Imperative programming languages, such as C (Kernighan and Ritchie, 1975), are designed with this architecture in mind. Therefore, programs written in these languages perform their work through the storage and manipulation of values in memory. Object-oriented languages, such as C++ (Stroustrup, 1997), add tools which allow programmers to structure their data in more meaningful and reusable ways.

In practice, languages in these two paradigms have turned out to be very popular for many different types of applications. This makes sense, due to the lack of constraints they impose on programmers. Additionally, object-oriented languages have the benefit that classes are a nice way of modelling objects and their relationships, similarly to how we would describe them in the real world.

However, it is the lack of constraints which has led to many problems. Most of us will have experienced software which crashes, produces incorrect results or misbehaves in some other way. Indeed, these two things are closely related, because it is difficult to reason about all possible configurations a computer may be in when we run a program.
This makes it difficult to verify the correctness of such programs, as there are often too many cases to consider. Instead, software engineers depend on testing to see if their programs work roughly as expected.
In some ways, this approach may be seen as the equivalent of getting an architect to design a house, then building it, and finally throwing bricks at it to test that it will not collapse. Of course, this is not what happens in reality, because engineers are able to use the properties of the materials they are using to calculate what their constructions are capable of.

We know that pure languages are not plagued by these problems, as we can use equational reasoning to prove facts about our programs before we unleash them on the world. At the same time, this imposes many more restrictions on how programs may be written. This frequently means that real-world problems which are easy to solve in imperative languages are suddenly much more difficult to implement in a language such as Haskell.

Kiselyov and Lämmel (2005), among others, attempted to encode common features of object-oriented programming languages in Haskell. However, they focused too strongly on the replication of such features, rather than on finding functional approaches to tackling the same goals. This led them to use impure functions throughout their implementation, thus defeating the point of using a pure language.

We wish to find a pure interpretation of objects which remains compatible with existing libraries and proven techniques. In other words, we are more interested in an elegant approach to working with the effect of mutable state than in encoding object-oriented principles. However, we shall draw inspiration from such languages to understand how they work and what parallels there are to, for example, the state monad.

As an example, let us consider a class in C++ which implements a vector in two-dimensional space, consisting of two coordinates x and y and a function which adds the x-coordinate of another vector to itself:

class Point2D {
public:
    int x;
    int y;
    void add(Point2D∗ p) { this→x += p→x; }
};
The code above is roughly equivalent to the following definitions in C, ignoring any additional data required to support more advanced object-oriented features:
struct Point2D { int x; int y; };
void add(Point2D∗ this, Point2D∗ p) { this→x += p→x; }
As we can see, a class simply consists of some data, which can be represented using a data structure, and procedures with an implicit first argument containing a reference to an object’s data. Objects can thus be viewed as values whose type is a class. However, we should be able to replicate all this in Haskell. Let us begin with a data type to replace the data structure:
data Point2D = Point2D { x :: Int, y :: Int }
We use the record syntax here to get a Haskell compiler to automatically generate appropriate projections for us. Of course, values are immutable, so that we need to construct new ones if we wish to update them. Maintaining the same naming conventions, we can define the add function as:
add :: Point2D → Point2D → Point2D add this p = Point2D (x this + x p) (y this)
Of course, the type signature seems somewhat reminiscent of the state monad and we can convert our initial definition of add to one which makes use of the monad:
add :: Point2D → State Point2D () add p = modify $ λthis → Point2D (x this + x p) (y this)
This definition more closely resembles the add method in the C++ version of our example, as the state is now implicit. Sadly, our Haskell version is not nearly as nice as the object-oriented one, because we have to manually construct a new value of type Point2D in order to update the state. However, it is considered good practice to define getters and setters for instance variables in object-oriented languages:
class Point2D {
private:
    int x;
    int y;
public:
    int getX() { return x; }
    void setX(int x) { this→x = x; }
};
One justification for this pattern is that getters and setters provide greater abstraction of the details of the implementation from the interface of a class. Indeed, they are so popular that there exists an abstraction for this pattern in, for example, C# (Hejlsberg et al., 2003), where one may simply write the following to add an instance variable to a class and to automatically generate getters/setters for it:
public int X { get; set; }
The language is then clever enough to work out whether the getter or setter is needed when X is referenced, depending on the context. It seems that this arrangement accomplishes what we are looking for: it hides the gory details of the implementation and it can be generated automatically. To see how it works using the state monad, consider the following definition of a getter for x using the gets function:
gets :: (s → a) → State s a
gets p = S (λs → (p s, s))
getX :: State Point2D Int
getX = gets x
Indeed, all we need to automatically generate a getter for a field is the name of the corresponding projection function. Setters can be defined using modify, similarly to our previous definition of add:
setX :: Int → State Point2D ()
setX v = modify $ λ(Point2D _ y) → Point2D v y
All we need to know here is the number of parameters the constructor has and the index of the field we are trying to set. We can now give a new definition of add which makes use of these getters and setters:
add :: Point2D → State Point2D ()
add p = do v ← getX
           setX (v + x p)
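Collecting the pieces above into a self-contained, runnable sketch (ASCII syntax; the hand-rolled State monad and a modify helper of the usual shape are included so that the block stands alone):

```haskell
import Control.Monad (ap, liftM)

newtype State s a = S { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap = liftM

instance Applicative (State s) where
  pure v = S (\s -> (v, s))
  (<*>)  = ap

instance Monad (State s) where
  S m >>= f = S (\s -> let (r, s') = m s in runState (f r) s')

gets :: (s -> a) -> State s a
gets p = S (\s -> (p s, s))

modify :: (s -> s) -> State s ()
modify f = S (\s -> ((), f s))

data Point2D = Point2D { x :: Int, y :: Int } deriving (Eq, Show)

getX :: State Point2D Int
getX = gets x

setX :: Int -> State Point2D ()
setX v = modify (\(Point2D _ b) -> Point2D v b)

-- The object-oriented add, with the implicit "this" threaded by State.
add :: Point2D -> State Point2D ()
add p = do v <- getX
           setX (v + x p)
```

For example, snd (runState (add (Point2D 3 0)) (Point2D 1 2)) yields Point2D 4 2: only the x-coordinate of the implicit state has changed.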
Another important concept in object-oriented programming is that of inheritance, by which a class may be derived from another in order to extend it. For example, a class Child which inherits from another class Parent will contain all of its instance variables, methods, and so on. This is useful to model relationships between objects, but also introduces subtype polymorphism (Cardelli, 1984). This is generally the preferred type of polymorphism in such languages, but is of interest to us in that we wish to model abstractions similarly.

3.1 Putting it all together

We have now gained an overview of some of the key concepts in object-oriented languages which programmers interact with on a daily basis. In summary, classes provide an easy and reusable method to define data structures, as well as operations which manipulate them. Let us now attempt to replicate this idea in a Haskell-like language without side-effects, but with the aim of removing all references to the state monad.

Classes can be thought of as syntactic sugar for data structures and procedures which operate on them. Since we know that their basic functionality can be emulated using the state monad, we propose the following syntax, inspired by declarations for algebraic data types, which allows a programmer to declare a state type:
state T α0 ... αn {        (n ≥ 0)
    x0 :: τ0
    ...
    xm :: τm               (m ≥ 0)
}
The intuition here is that a compiler will generate a new ADT TData with a single constructor K whose parameters are given by τ0 to τm:
data TData α0 ... αn = K τ0 ... τm
This in itself is little more than a different notation for a data type, but this is changed by the addition of a type alias for an instance of the state monad, whose state is a value of type TData:
type T α0 ... αn = State (TData α0 ... αn)
Additionally, we have highlighted the importance of getters and setters in object-oriented languages. Therefore, a compiler will generate these, as well as a projection function, for every typing xi :: τi in a state declaration:
xi :: TData α0 ... αn → τi
xi (K y0 ... ym) = yi
xi.get :: T τi
xi.get = gets xi
xi.set :: τi → T ()
xi.set v = modify (λ(K y0 ... ym) → K [yi → v](y0 ... ym))

All the basic building blocks we need to work with state can now be generated for us, and one may use them to write programs. However, let us note that this notation and its translation are not yet very flexible, as they do not allow us to make use of the state monad transformer. To resolve this shortcoming, we will once again draw inspiration from C++ to make a small addition to the syntax:
state T α0 ... αn [: Tp τ′0 ... τ′k] {        (n ≥ 0)
    x0 :: τ0
    ...
    xm :: τm                                  (m ≥ 0)
}
If the optional addition to the header of the state declaration is present, our translation for the type alias T changes to produce:
type T α0 ... αn = StateT (TData α0 ... αn) (Tp τ′0 ... τ′k)

In our interpretation, inheritance now corresponds to stacks of effects. We assume that Tp is a monad such that, for instance, a state declaration whose parent is IO will possess the underlying effect of impure computation. Equally, if Tp is another instance of the state monad, then one may access its state using lift, which corresponds to the parent or base keywords in languages like Java and C#.

Lastly, we note that the getters and setters generated in the translation process are never explicitly named in the syntax. It may be unintuitive for programmers to reference functions which they have not named. Therefore, we will also design new types of statement which can be used to reference them via the name of the field they manipulate. These may be used in a setting like Haskell’s do-notation:
p <: xi        xi.get ≫= λp → . . .        Getter statement
e >: xi        xi.set e [≫= λ_ → ...]        Setter statement

We assume p to be a single pattern and e to be an expression. The translations for both are shown to the right of each statement and are noteworthy in that they use the name of the field to reference an appropriate getter or setter. Note that setters may appear at the end of a block of statements, in addition to expressions in e.g. the do-notation.

3.2 Related work

Now that we have identified a suitable syntax for our goals and the translation rules required to implement it, we need to think about how this may be implemented. We could design a new language based on our ideas or add them to an existing one. Any open-source compiler can of course be modified, but the size and complexity of the source code often makes this a challenging option.

Haskell is a popular platform for research into programming languages, and efforts have been made to simplify the process by which extensions can be added to popular implementations of the language. The Glasgow Haskell Compiler provides various facilities for this purpose, ranging from support for plugins to Template Haskell (Sheard and Peyton Jones, 2002), which can be used to generate code from meta-languages at compile-time. However, only the generated code is type checked by the compiler1 and error messages which may arise could therefore be uninformative.

Erdweg et al. (2012) designed and implemented SugarHaskell, a tool which applies syntax extensions, defined as regular Haskell modules, to source files in order to convert them into ordinary Haskell code. While this would make it very easy to implement our abstractions on top of Haskell, it would lead to very confusing error messages, as the Haskell compiler is not aware of any transformations which took place before it was invoked.

A reasonably popular library for working with state in Haskell is lens2, which uses
1 The author notes that changes to Template Haskell were implemented in GHC in May 2013 which change this behaviour.
2
Template Haskell to automatically generate getters and setters for records. It is similar to our designs in this way, but also differs greatly in that the Haskell compiler has no additional knowledge of what the authors of the library refer to as lenses:
data Lens a b = Lens { getter :: a → b, setter :: b → a → a }
This means that programs which use this library need to keep track of the relevant information at runtime, decreasing performance through the need to unbox values of type Lens whenever a getter or setter is required. Our design does not suffer from this issue, as our compiler will be aware of the names of the relevant functions and can reference them when required by the program. This also makes our approach more suitable for optimisations and better error messages.

Additionally, lenses work on data types, such that they cannot be used to automatically generate type functions for configurations of monad stacks; users therefore still need to be aware of the mechanics of monad transformers and the state monad.
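To illustrate the representation quoted above, here is a minimal sketch of such a first-class getter/setter pair for a record field. The names xLens and over are invented for this example and do not reflect the lens library's actual API, which is considerably more general:

```haskell
-- A first-class getter/setter pair, as in the quoted Lens type.
data Lens a b = Lens { getter :: a -> b, setter :: b -> a -> a }

data Point2D = Point2D { px :: Int, py :: Int } deriving (Eq, Show)

-- A lens focusing on the x-coordinate (illustrative names only).
xLens :: Lens Point2D Int
xLens = Lens px (\v p -> p { px = v })

-- Every use goes through the runtime Lens value -- the indirection the
-- text contrasts with compiler-generated, directly referenced accessors.
over :: Lens a b -> (b -> b) -> a -> a
over l f a = setter l (f (getter l a)) a
```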
4 Language specification
One may argue that it is easy to write code for a compiler, but whether or not it will work as one expects depends on a thorough specification of the language which is being implemented. It would be unfortunate if we were to spend months on the implementation of the various components of a compiler, only to discover that, for example, some terms in our language cannot be typed.

In this chapter, our aim will be to give a specification of the Cada language, beginning with the theoretical representation of its type system (section 4.3), followed by the specification of its concrete syntax (section 4.5). Additionally, we will consider more practical aspects of the language design, such as the module system (section 4.2) and the standard library (section 4.7).

4.1 Summary

Before we dive into the details of the language specification, however, let us note that Cada is designed after Haskell98 (Peyton Jones, 2002) and thus shares many similarities with it. To make it easier for readers with previous knowledge of Haskell or similar functional programming languages to understand the details of this specification, we briefly summarise the key differences below.

Many of the changes are in the concrete syntax, where we lose Haskell’s layout rules and favour a more C-like syntax instead. The justification for this is mainly one of flexibility and simplicity, since the additional complexity introduced by such a system does not help us to accomplish the goals we set out for this project. Of course, we also have the constructs from the previous chapter, which are unique to Cada.

Some of the syntactic sugar which can be found in Haskell is also omitted, such as guards, list comprehensions and sections. While these features provide convenience to the users of the language, they are neither interesting to implement nor would they contribute significant advantages over equivalent notations. Lastly, we do not allow users to declare operator fixities and precedences.
Indeed, all operators, including those that are built-in, share the same precedence.
4.2 Module system

In order to avoid having to repeatedly write the same code, or to copy and paste existing code into a new project, Cada programs consist of modules. Each source file corresponds to one module and may be compiled separately from the rest of a program. Upon successful compilation of a module, an interface file is generated for it in addition to an object file. A module may import other modules, subject to them having existing interface files.

An interface file contains all information relevant for the type system to make use of the corresponding module’s declarations. This includes the typings of value definitions, algebraic data types, type functions, type classes, and class instances. The details of all these structures are discussed in section 4.3.

The name of a module should match that of the source file which contains it. For
example, a file should be home to a module named Program. Hierarchical names are supported, such that Cada.Class.Eq corresponds to a file at path