Continuation Calculus
Eindhoven University of Technology
Master’s thesis
Bram Geron
Award date: 2013
Supervised by Herman Geuvers
Assessment committee: Herman Geuvers, Hans Zantema, Alexander Serebrenik
Final version

Contents

1 Introduction
  1.1 Foreword
  1.2 The virtues of continuation calculus
    1.2.1 Modeling programs
    1.2.2 Ease of implementation
    1.2.3 Simplicity
  1.3 Acknowledgements
2 The calculus
  2.1 Introduction
  2.2 Definition of continuation calculus
  2.3 Categorization of terms
  2.4 Reasoning with CC terms
    2.4.1 Fresh names
    2.4.2 Term equivalence
    2.4.3 Program substitution and union
  2.5 Data
    2.5.1 Call-by-name and call-by-value functions
  2.6 Example: list multiplication
    2.6.1 Correctness proofs
3 Relation to programming languages
  3.1 ML+ syntax
  3.2 Example programs in ML+
  3.3 Using data types in ML+
  3.4 Reduction semantics of ML+
  3.5 Translation
    3.5.1 Inert terms
    3.5.2 Translation to CC
4 Relation to lambda calculus
  4.1 Embedding lambda calculus in continuation calculus
    4.1.1 The subset λ0
    4.1.2 CPS transformation
    4.1.3 Supercombinator transformation
    4.1.4 Defunctionalization
  4.2 Embedding continuation calculus in lambda calculus
    4.2.1 Functionalization
    4.2.2 Cycle elimination
5 Related work
6 Conclusion and future work
A Proofs
  A.1 General
  A.2 Program substitution and union
  A.3 Term equivalence

Chapter 1 Introduction

1.1 Foreword

This thesis is about continuation calculus, or CC for short, a novel way of formally modeling programs. The calculus was initially developed by the author as a simple and uniform compilation target for programs that could subsequently be executed reasonably efficiently, so that functional languages could readily be built on it. Similar goals are fulfilled by the abstract lambda calculus [3], or the more practical spineless tagless G-machine [24], STG for short, used by the popular Glasgow Haskell Compiler. Continuation calculus is an attempt to attain the simplicity of lambda calculus in a calculus that is straightforward to implement and natively supports continuations. The calculus was originally designed for a toy call-by-value language, but further examination revealed that call-by-name languages are also modeled by CC.

The author’s research has focused on the definition of CC, how it relates to programming languages, and reasoning with CC programs. In this thesis, we try to sketch a complete picture of why CC is useful and how it can be used. Although the broad scope makes it impossible to give in-depth proofs of all intended properties, we do give formal proofs for some specific properties.

The research has produced a forthcoming paper in collaboration with the author’s supervisor Herman Geuvers.
[13] The paper makes up Chapter 2 and Appendix A of this thesis, with only minor changes.

Research is still ongoing on the subject of ML+, an exploratory programming language to formalize the interplay of call-by-name and call-by-value. Although the author wants to research this topic further in time, and the current text on it is rough, he thinks that its semantics as given are already meaningful, and that the translation to CC is correct. Thus, ML+ backs the idea that CC supports modeling mixed call-by-name and call-by-value code, and makes concrete how this can be done in practice.

This introduction will continue by describing three particular qualities of continuation calculus. Firstly, we explain what functional programs are, the significance of call-by-value and call-by-name, and the added value of continuations. The latter feature is modeled by CC, but not by lambda calculus or STG. Although extensions of lambda calculus do model continuations, they lose some of its simplicity. Secondly, we explain why continuation calculus is more straightforward to implement than lambda calculus. If we are looking for a code representation with the hope of eventually executing it, it is important that we do not force idiosyncrasies into the representation that cause otherwise-unneeded complexity over the whole chain. Finally, we argue that continuation calculus is a much simpler representation than STG.

Our claim is not that continuation calculus is the best in all three qualities: powerful, close-to-the-metal, and simple. The individual virtues are perhaps much better addressed by satisfiability modulo theories [19], assembly language, and a one instruction set computer [21], respectively. Instead, our claim is that continuation calculus addresses all three qualities quite well: a sweet spot.

These virtues are expected to make continuation calculus attractive for numerous people who work with languages.
Programming language designers should find CC a handy tool to express when computations are done, how data is grouped, and how data flows between control points. Programming language implementors should find it straightforward to make a simple implementation of continuation calculus, and hopefully even a fast one; furthermore, the structure that CC offers in the form of names with a fixed arity should help implementors to optimize implementations. Finally, programmers interested in optimizing their code, with a proof that the improved code has the same functionality, can be helped by term equivalence and theorems on CC, once the appropriate mapping between CC and programming languages has been developed further.

After this introductory section, we will briefly introduce the types of programming languages that we model with CC, and will explain the properties that we aim to find. We continue in Chapter 2 with an explanation of the calculus, a formal definition, and some mathematical tools to work with CC. We also explain a concrete program written in CC by relating it to a version in a more conventional programming language, and we prove its correctness. In Chapter 3, we show how programs can systematically be encoded in continuation calculus. For this purpose, we introduce a toy programming language called ML+, which supports all three of call-by-value, call-by-name, and continuations. We show how programs in ML+ can be encoded in continuation calculus. Finally, we explore the connection between continuation calculus and lambda calculus in Chapter 4.

The author wants to remark that an evaluator, usable both online and offline, is available through http://bgeron.nl/cc. It has helped the author to correct bugs in hand-coded CC programs. Hopefully, the evaluator may also aid the intuition of the reader. Some demo programs are included, and all programs in this thesis should be testable.

1.2 The virtues of continuation calculus

1.2.1 Modeling programs

Continuation calculus models programs, specifically functional programs. By functional, we mean that entities passed around from one code block to another do not change. The dominant programming style in functional programming is to invoke a function, which will return a result. This is in contrast to what is sometimes called imperative programming: a style in which shared memory locations are written to. Components in such programs often communicate by modifying shared memory. One might say that imperative programs have side effects, and functional programs don’t.

Return-based programming, often also known as functional programming, enables programming techniques that aid modularity [15]. This is a necessary aspect of the long-term quality of software. Another important characteristic of functional programming is that it is easier to restructure software written by other people, because there can be no hidden interfaces. In effect, each subprogram becomes analyzable on its own.

Because functional languages are so structured, it is feasible to analyze them mathematically. Such analysis can provide certainty that particular changes in the program do not introduce faulty behavior. Furthermore, it can give programmers a comprehensive mental model of how their subprograms can be used. Finally, functional language developers may choose to evolve the language in a manner that retains programmers’ mental models of the language, aided by this analysis. In effect, such analysis yields orthogonal languages.

Call-by-what?

Functional languages can broadly be divided into two styles: call-by-need (or lazy) and call-by-value. These styles are distinguished most easily by considering how the computer evaluates the program. In call-by-value, evaluation order always follows the structure of program code, descending into the functions that it calls.
In call-by-need, the program continuously generates terms that depend on one another, and a term is only elaborated when the computer finds its value essential.
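
The difference between the two styles can be illustrated with a small sketch. Python itself is call-by-value, so call-by-need is simulated here with explicit memoizing thunks. The names `expensive`, `Thunk`, `first_by_value`, and `first_by_need` are hypothetical and do not come from this thesis; the sketch only illustrates the two evaluation styles and is not meant to show how CC models them.

```python
# Illustrative sketch only: all names are hypothetical, not from the thesis.
# `log` records the order in which arguments are actually evaluated.
log = []


def expensive(label, value):
    """Record that this argument was evaluated, then return its value."""
    log.append(label)
    return value


class Thunk:
    """A delayed computation that is run at most once (call-by-need)."""

    def __init__(self, compute):
        self._compute = compute
        self._done = False
        self._value = None

    def force(self):
        # Evaluate on first demand, then reuse the stored result.
        if not self._done:
            self._value = self._compute()
            self._done = True
            self._compute = None
        return self._value


def first_by_value(a, b):
    # Call-by-value: both arguments were evaluated before the body runs.
    return a


def first_by_need(a, b):
    # Call-by-need: only the argument that is demanded gets evaluated.
    return a.force()


def run_by_value():
    log.clear()
    result = first_by_value(expensive("a", 1), expensive("b", 2))
    return result, list(log)


def run_by_need():
    log.clear()
    result = first_by_need(
        Thunk(lambda: expensive("a", 1)),
        Thunk(lambda: expensive("b", 2)),
    )
    return result, list(log)
```

Under these assumptions, `run_by_value()` evaluates both arguments even though the second is never used, while `run_by_need()` evaluates only the first. Forcing the same thunk twice runs its computation once, which is what separates call-by-need from plain call-by-name.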