
Eindhoven University of Technology

MASTER

Continuation calculus

Geron, B.

Award date: 2013




Master’s thesis

Bram Geron

Supervised by Herman Geuvers

Assessment committee: Herman Geuvers, Hans Zantema, Alexander Serebrenik

Final version

Contents

1 Introduction
  1.1 Foreword
  1.2 The virtues of continuation calculus
    1.2.1 Modeling programs
    1.2.2 Ease of implementation
    1.2.3 Simplicity
  1.3 Acknowledgements

2 The calculus
  2.1 Introduction
  2.2 Definition of continuation calculus
  2.3 Categorization of terms
  2.4 Reasoning with CC terms
    2.4.1 Fresh names
    2.4.2 Term equivalence
    2.4.3 Program substitution and union
  2.5 Data
    2.5.1 Call-by-name and call-by-value functions
  2.6 Example: list multiplication
    2.6.1 Correctness proofs

3 Relation to programming languages
  3.1 ML+ syntax
  3.2 Example programs in ML+
  3.3 Using data types in ML+
  3.4 Reduction of ML+
  3.5 Translation
    3.5.1 Inert terms
    3.5.2 Translation to CC

4 Relation to lambda calculus
  4.1 Embedding lambda calculus in continuation calculus
    4.1.1 The subset λ0
    4.1.2 CPS transformation
    4.1.3 Supercombinator transformation
    4.1.4 Defunctionalization
  4.2 Embedding continuation calculus in lambda calculus
    4.2.1 Functionalization
    4.2.2 Cycle elimination

5 Related work

6 Conclusion and future work

A Proofs
  A.1 General
  A.2 Program substitution and union
  A.3 Term equivalence

Chapter 1

Introduction

1.1 Foreword

This thesis is about continuation calculus, or CC for short, a novel way of formally modeling programs. The calculus was initially developed by the author as a simple and uniform compilation target for programs, one that could subsequently be executed reasonably efficiently, such that functional languages could be readily built on it. Similar goals are fulfilled by the abstract lambda calculus [3], or the more practical spineless tagless G-machine [24], STG for short, used by the popular Haskell compiler GHC. Continuation calculus is an attempt to attain the simplicity of lambda calculus in a calculus that is straightforward to implement, and natively supports continuations. The calculus was originally designed for a toy call-by-value language, but further examination revealed that call-by-name languages are also modeled by CC.

The author’s research has focused on the definition of CC, how it relates to programming languages, and reasoning with CC programs. In this thesis, we try to sketch a complete picture of why CC is useful and how it can be used. Although the broad scope makes it impossible to give in-depth proofs of all intended properties, we do give formal proofs for some specific properties.

The research has produced a forthcoming paper in collaboration with the author’s supervisor Herman Geuvers [13]. The paper makes up Chapter 2 and Appendix A of this thesis, with only minor changes. Research is still ongoing on the subject of ML+, an exploratory language used to formalize the interplay of call-by-name and call-by-value. Although the author intends to do further research on this topic in time, and the current text on it is rough, he thinks that its semantics as given are already meaningful, and that the translation to CC is correct. Thus, ML+ backs the idea that CC supports modeling mixed call-by-name and call-by-value code, and concretizes how this can be done in practice.

This introduction will continue by describing three particular qualities of continuation calculus.
Firstly, we explain what functional programs are, the significance of call-by-value and call-by-name, and the added value of continuations. The latter feature is modeled by CC, but not by λ or STG. Even though continuations can be modeled by extensions of lambda calculus, such extensions lose some of its simplicity. Secondly, we explain why continuation calculus is more straightforward to implement than lambda calculus. If we are looking for a code representation with the hope of eventually executing it, it is important that we do not force idiosyncrasies into the representation that cause otherwise-unneeded complexity over the whole chain. Finally, we argue that continuation calculus is a much simpler representation than STG.

Our claim is not that continuation calculus is the best in all three qualities: powerful, close-to-the-metal, and simple. The individual virtues are perhaps much better addressed by satisfiability modulo theories [19], assembly language, and a one instruction set computer [21], respectively. Instead, we claim that continuation calculus addresses all three qualities quite well: a sweet spot.

These virtues are expected to make continuation calculus attractive for numerous people who work with programming languages. Programming language designers should find CC a handy tool to express when computations are done, how data is grouped, and how data flows between control points. Programming language implementors should find it straightforward to make a simple implementation of continuation calculus, and hopefully even a fast one; furthermore, the structure that CC offers in the form of names with a fixed arity should help implementors to optimize implementations. Finally, programmers interested in optimizing their code, with a proof that the improved code has the same functionality, can be helped using term equivalence on CC once the appropriate mapping between CC and programming languages has been deepened.

After this introductory section, we will briefly introduce the types of programming languages that we model with CC, and will explain the properties that we aim to establish. We continue in Chapter 2 with an explanation of the calculus, a formal definition, and some mathematical tools to work with CC. We also explain a concrete program written in CC by relating it to a version in a more conventional programming language, and we prove its correctness. In Chapter 3, we show how programs can systematically be encoded in continuation calculus. For this purpose, we introduce a toy programming language called ML+, which supports all three of call-by-value, call-by-name, and continuations. We show how programs in ML+ can be encoded in continuation calculus. Finally, we explore the connection between continuation calculus and lambda calculus in Chapter 4.

The author wants to remark that there is an online and offline evaluator available through http://bgeron.nl/cc. It has helped the author to find and correct bugs in hand-coded CC programs. Hopefully, the evaluator may also aid the intuition of the reader. Some demo programs are included, and all programs in this thesis should be testable.

1.2 The virtues of continuation calculus

1.2.1 Modeling programs

Continuation calculus models programs, specifically functional programs. By functional, we mean that entities passed around from one code block to another do not change. The dominant programming style in functional programs is to invoke a function, which will return a result. This is in contrast to what is sometimes called imperative programming: a style in which shared memory locations are written to. Components in such programs often communicate by modifying shared memory. One might say that imperative programs have side effects, and functional programs don’t.

Return-based programming, often also known as functional programming, enables programming techniques that aid modularity [15]. This is a necessary aspect of the long-term quality of software. Another important characteristic of FP is that it is easier to restructure software written by other people, because there can be no hidden interfaces. In effect, each subprogram becomes analyzable on its own.

Because functional languages are so structured, it is feasible to analyze them mathematically. Such analysis can provide certainty that particular changes in the program do not introduce faulty behavior. Furthermore, it can give programmers a comprehensive mental model of how their subprograms can be used. Finally, functional language developers may choose to evolve the language in a manner that retains programmers’ mental models of the language, aided by this analysis. In effect, such analysis yields orthogonal languages.

Call-by-what? Functional languages can broadly be divided into two styles: call-by-need (or lazy) and call-by-value. These styles are distinguished most easily by considering how the computer evaluates the program. In call-by-value, evaluation order always follows the structure of program code, descending into the functions that it calls. In call-by-need, the program continuously generates terms that depend on one another, which are only elaborated when the computer finds it essential. So when the computer is computing the sum of two naturals, it finds where either

natural came from, and executes that code before the addition. The origin of the naturals may be much further away than the calling site.

This difference has broad implications. For one, call-by-need allows one to work with “infinite data structures”, if only a finite part of that data is ever required for program execution. Speed is also affected in practice, for two reasons. On the one hand, call-by-need does not compute subresults that are not used. On the other hand, call-by-value has a predictable control flow, which helps the CPU to efficiently execute the machine code [20].

Continuation calculus models both call-by-value and call-by-name. The latter is functionally equivalent to call-by-need (infinite data structures are supported), but there are no facilities to remember the results of previously-done computations when they are reusable.

Continuations Another useful feature included in some call-by-value languages, and modeled by us, is continuations. When a program “takes a continuation”, it reifies what is currently planned to be the rest of the program’s execution. This powerful construct allows the program to go back to decisions made earlier, and change them; or it allows two computations to run in an interleaved fashion, perhaps to use the result of the one that finishes first [14]. At its core, continuations liberate practical control flow from the syntax tree, so that the program may follow a structure that is most natural for humans. Besides in call-by-value languages, continuations have recently been described for a call-by-name language [23]. However, the three features seem not to have been modeled together so far. Continuations may be used to model a form of exceptions [17]. In this thesis, we use only undelimited continuations.

1.2.2 Ease of implementation

Continuation calculus is more straightforward to implement than lambda calculus, for three reasons. The three reasons contribute to the view that continuation calculus is “closer to the machine” than lambda calculus.

Firstly, continuation calculus separates code and data, a practice that is commonly called defunctionalization [7]. A continuation calculus program is always evaluated by looking up the head of the current term, and executing the corresponding rule. This rule may be precompiled, as the set of rules is known at compile time. Such precompilation is also possible in lambda calculus, but is frequently preceded by a defunctionalization [1].

Besides separating code and data, continuation calculus eliminates the need for contexts. In lambda calculus, it does not suffice to evaluate on the top level: a term of the form (λx.M) t1 ··· tk can only β-reduce (λx.M) t1 or a term in M. Finding the correct evaluation position is unnecessary in continuation calculus.

Lastly, there is only a single reduction order in continuation calculus. The semantics of a higher-level language are specified in the translation to CC, so that the intermediate CC program is free of evaluation order, and can be mixed with the CC translation of programs in higher-level languages with a different evaluation order. Terms in lambda calculus have an implicitly intended (but hidden) evaluation order. Although the CPS translation [25] allows lambda terms to be simulated in other evaluation orders, the existence of the translation disproves the universality of lambda calculus for the purpose of expressing program meaning.
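To make the head-lookup evaluation concrete, here is a minimal sketch of a CC evaluation step in Python. The pair-based term encoding, the rule format, and the boolean rules are our own illustrative assumptions, not part of CC’s definition; the point is that one step never searches for a redex inside a term, it only inspects the head.

```python
# A CC term is a name (a string) or a dot-application (left, right);
# A.B.C is encoded left-associatively as (("A", "B"), "C").
def head(t):
    return t if isinstance(t, str) else head(t[0])

def args(t):
    """Arguments t1..tk of n.t1.....tk, left to right."""
    out = []
    while not isinstance(t, str):
        out.append(t[1])
        t = t[0]
    return out[::-1]

def subst(t, env):
    if isinstance(t, str):
        return env.get(t, t)
    return (subst(t[0], env), subst(t[1], env))

def step(rules, t):
    """One evaluation step: fire the rule for head(t), or None if t is final."""
    if head(t) not in rules:
        return None                     # undefined head: no context search, just stop
    params, rhs = rules[head(t)]
    if len(args(t)) != len(params):     # incomplete or invalid term
        return None
    return subst(rhs, dict(zip(params, args(t))))

# Booleans as rules: True.t.f -> t, False.t.f -> f
rules = {"True": (["t", "f"], "t"), "False": (["t", "f"], "f")}
term = (("True", "A"), "B")             # the term True.A.B
print(step(rules, term))                # -> A
```

Because the head alone determines which rule fires, evaluation is deterministic and each rule can be compiled ahead of time, as described above.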

1.2.3 Simplicity

Lambda and continuation calculus have very few language-specific features. For instance, neither acknowledges the existence of data constructors and case distinction, both present in the spineless tagless G-machine, which was mentioned earlier in the introduction. Such programming language features can be encoded in the existing facilities: lambda abstraction and application, and CC rules and applications. This is in stark contrast with the spineless tagless G-machine, which gives special treatment to data constructors, case distinction, fixed-precision integers, and operations on them. Such

features are not necessary for expressiveness, and could theoretically be removed. All features of lambda and continuation calculus are crucial to their Turing completeness: the abstraction, application, variables, and β-reduction of λ, and the rules, polyadic names, dot, and reduction of CC.

1.3 Acknowledgements

The author would like to thank Herman Geuvers for his supervision: without his faith in the project, continuation calculus would have remained mere loose ideas in the author’s head. The author would also like to thank the reviewers of the forthcoming paper [13] for their helpful comments, together with Alexander Serebrenik and Hans Zantema, and other friends who have given useful feedback.

Chapter 2

The calculus

2.1 Introduction

Continuation calculus looks a bit like term rewriting [27] and a bit like λ-calculus, and it has ideas from both. A term in CC is of the shape

n.t1. ··· .tk,

where n is a name and the ti are themselves terms. The “dot” is a binary operator that associates to the left. Note that terms do not contain variables. A program P is a list of program rules of the form

n.x1. ··· .xk −→ u

where the xi are all different variables and u is a term over variables x1 . . . xk. This program rule is said to define n, and we make sure that in a program P there is at most one definition of n. Here, CC already deviates from term rewriting, where one would have, for example:

Add(0, m) −→ m
Add(S(n), m) −→ S(Add(n, m))

These syntactic case distinctions, or pattern matchings, are not possible in CC. The meaning of the program rule n.x1. ··· .xk −→ u is that a term n.t1. ··· .tk evaluates to u[~x := ~t]: the variables ~x in u are replaced by the respective terms ~t. A peculiarity of CC is that one cannot evaluate “deep in a term”: we do not evaluate inside any of the ti, and if we have a term n.t1. ··· .tm, where m > k, this term does not evaluate. (This will even turn out to be a “meaningless” term.)

We exemplify how CC works by explaining the natural numbers: how they are represented in CC and how one can program addition on them. A natural number is either 0, or S(m) for m a natural number. We shall have a name Zero and a name S. The number m will be represented by S.(···.(S.Zero)···), with m times S. So the numbers 0 and 3 are represented by the terms Zero and S.(S.(S.Zero)). The only way to extract information from a natural number m is to “transfer control” to that natural number. Execution should continue in one code path when m = 0, and execution should continue in different code when m ≥ 1. This becomes possible by postulating the following rules for Zero and S:

Zero.z.s −→ z
S.x.z.s −→ s.x

It is up to the programmer to construct a term z and a term s. We observe that we can separate the cases t = Zero and t = S.x by reducing t.z.s → z or t.z.s → s.x.
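The case distinction above can be tried out with a small Python sketch (the pair-based term encoding is our own toy representation; the branch names “IfZero” and “IfSucc” are arbitrary fresh names, chosen here for illustration):

```python
# Sketch: the Zero/S rules and the case distinction t.z.s, using a toy
# term encoding (a name is a string; a dot-application is a pair).
def head(t):
    return t if isinstance(t, str) else head(t[0])

def args(t):
    out = []
    while not isinstance(t, str):
        out.append(t[1]); t = t[0]
    return out[::-1]

def subst(t, env):
    if isinstance(t, str):
        return env.get(t, t)
    return (subst(t[0], env), subst(t[1], env))

def step(rules, t):
    if head(t) not in rules or len(args(t)) != len(rules[head(t)][0]):
        return None
    params, rhs = rules[head(t)]
    return subst(rhs, dict(zip(params, args(t))))

def dot(*parts):                 # dot("S", "Zero", "z", "s") builds S.Zero.z.s
    t = parts[0]
    for p in parts[1:]:
        t = (t, p)
    return t

rules = {
    "Zero": (["z", "s"], "z"),                # Zero.z.s -> z
    "S":    (["x", "z", "s"], ("s", "x")),    # S.x.z.s  -> s.x
}

two = ("S", ("S", "Zero"))                    # S.(S.Zero) represents 2
print(step(rules, dot("Zero", "IfZero", "IfSucc")))   # -> IfZero
print(step(rules, dot(two, "IfZero", "IfSucc")))      # -> ('IfSucc', ('S', 'Zero'))
```

As the second line shows, transferring control to a nonzero natural hands the predecessor S.Zero to the successor branch, exactly as the rule S.x.z.s −→ s.x prescribes.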

We will now implement call-by-value (“CBV”) addition in CC on these natural numbers. The idea of CC is that a function does not just produce an output value, but passes it to the next function, the continuation. So we are looking for a term AddCBV that behaves as follows:

AddCBV.⟨⟨m⟩⟩.⟨⟨p⟩⟩.r ↠ r.⟨⟨m + p⟩⟩   (2.1)

for all m, p, r, where ↠ is the multi-step evaluation, and ⟨⟨l⟩⟩ are the terms that represent a natural number l. Term r indicates where evaluation should continue after the computation of ⟨⟨m + p⟩⟩. Equation (2.1) is the specification of AddCBV. We will use the following algorithm:

0 + p = p
S(m) + p = m + S(p)

To program AddCBV, we have to give a rule of the shape AddCBV.x.y.r −→ t. We need to make a case distinction on the first argument x. If x = Zero, then the result of the addition is y, so we pass control to r.y. If x = S.u, then control should eventually transfer to AddCBV.u.(S.y).r. Let us write down a first approximation of AddCBV:

AddCBV.x.y.r −→ x.(r.y).t

The term t is yet to be determined. Now control transfers to r.y when x = Zero, or to t.u when x = S.u. From t.u, control should eventually transfer to AddCBV.u.(S.y).r. Let us write down a naive second approximation of AddCBV, in which we introduce a helper name B.

AddCBV.x.y.r −→ x.(r.y).B
B.u −→ AddCBV.u.(S.y).r

Unfortunately, the second line is not a valid rule: y and r are variables in the right-hand side of B, but do not occur in its left-hand side. We can fix this by replacing B with B.y.r in both rules.

AddCBV.x.y.r −→ x.(r.y).(B.y.r)
B.y.r.u −→ AddCBV.u.(S.y).r

This is a general procedure for representing data types and functions over data in CC. We can now prove the correctness of AddCBV by showing (simultaneously by induction on m) that

AddCBV.⟨⟨m⟩⟩.⟨⟨p⟩⟩.r ↠ r.⟨⟨m + p⟩⟩
B.⟨⟨p⟩⟩.r.⟨⟨m′⟩⟩ ↠ r.⟨⟨m′ + p + 1⟩⟩

We formally define and characterize continuation calculus in the following sections. In Section 2.5, we define the meaning of ⟨⟨·⟩⟩, which allows us to give a specification for call-by-name (“CBN”) addition, AddCBN:

AddCBN.⟨⟨m⟩⟩.⟨⟨p⟩⟩ ∈ ⟨⟨m + p⟩⟩

This statement means that AddCBN.⟨⟨m⟩⟩.⟨⟨p⟩⟩ is equivalent to and compatible with S.(···(S.Zero)···), with m + p times S. The precise meaning of this statement will be given in Definition 32 and Remark 33. Note that while AddCBV.⟨⟨m⟩⟩.⟨⟨p⟩⟩.r ↠ r.⟨⟨m + p⟩⟩ involved reduction, it involves no reduction to make the term AddCBN.⟨⟨m⟩⟩.⟨⟨p⟩⟩ compatible with S.(···(S.Zero)···): this computation is delayed until the case distinction between m + p = 0 and m + p ≥ 1 is needed, analogously to the call-by-name paradigm.

The terms AddCBV and AddCBN are of a different kind. Nonetheless, we will see in Section 2.5.1 how call-by-value and call-by-name functions can be used together. We show additional examples with FibCBV and FibCBN in Section 2.5.1. Furthermore, we model and prove a program with call/cc in Section 2.6.
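The AddCBV rules derived above can be executed mechanically. The following Python sketch uses a toy evaluator (our own encoding: names are strings, dot-applications are pairs; “R” is an arbitrary fresh name standing for the return continuation) and checks specification (2.1) for small inputs:

```python
# Sketch: executing the AddCBV/B rules on small inputs with a toy evaluator.
def head(t):
    return t if isinstance(t, str) else head(t[0])

def args(t):
    out = []
    while not isinstance(t, str):
        out.append(t[1]); t = t[0]
    return out[::-1]

def subst(t, env):
    if isinstance(t, str):
        return env.get(t, t)
    return (subst(t[0], env), subst(t[1], env))

def step(rules, t):
    if head(t) not in rules or len(args(t)) != len(rules[head(t)][0]):
        return None
    params, rhs = rules[head(t)]
    return subst(rhs, dict(zip(params, args(t))))

def run(rules, t, fuel=10000):
    for _ in range(fuel):
        nxt = step(rules, t)
        if nxt is None:
            return t
        t = nxt
    raise RuntimeError("no final reduct reached")

def dot(*parts):
    t = parts[0]
    for p in parts[1:]:
        t = (t, p)
    return t

def church(n):                    # <<n>> = S.( ... (S.Zero) ... ), n times S
    t = "Zero"
    for _ in range(n):
        t = ("S", t)
    return t

rules = {
    "Zero":   (["z", "s"], "z"),
    "S":      (["x", "z", "s"], ("s", "x")),
    "AddCBV": (["x", "y", "r"], dot("x", ("r", "y"), dot("B", "y", "r"))),
    "B":      (["y", "r", "u"], dot("AddCBV", "u", ("S", "y"), "r")),
}

# AddCBV.<<2>>.<<1>>.R should evaluate to R.<<3>>, matching specification (2.1).
result = run(rules, dot("AddCBV", church(2), church(1), "R"))
print(result == ("R", church(3)))     # -> True
```

Such a finite test of course only illustrates specification (2.1); the inductive proof above is what establishes it for all m and p.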

2.2 Definition of continuation calculus

Definition 1 (names). There is an infinite set N of names. Concrete names are typically denoted as upper-case letters (A, B, . . .), or capitalized words (True, False, And, . . .); we refer to any name using n and m.

Interpretation. Names are used by programs to refer to ‘functionality’, and will serve the role of constructors, function names, as well as labels within a function.

Definition 2 (universe). The set U of terms in continuation calculus is generated by:

U ::= N | U.U

where . (dot) is a binary constructor. The dot is neither associative nor commutative, and there shall be no overlap between names and dot-applications. We will often use M, N, t, u to refer to terms. If we know that a term is a name, we often use n, m. We sometimes use lower-case words that describe its function, e.g. abort, or letters, e.g. r for a ‘return continuation’. The dot is read left-associative: when we write A.B.C, we mean (A.B).C.

Interpretation. Terms by themselves do not denote any computation, nor do they have any value of themselves. We inspect value terms by ‘dotting’ other terms on them, and observing the reduction behavior. If for instance b represents a boolean value, then b.t.f reduces to t if b represents true; b.t.f reduces to f if b represents false.

Definition 3 (head, length). All terms have a head, which is defined inductively:

head(n ∈ N) = n
head(a.b) = head(a).

The head of a term is always a name. The length of a term is determined by the number of dots traversed towards the head.

length(n ∈ N) = 0
length(a.b) = 1 + length(a).

This definition corresponds to left-associativity: length(n.t1.t2. ··· .tk) = k.

Definition 4 (variables). There is an infinite set V of variables. Terms are not variables, nor is the result of a dot application ever a variable. Variables are used in CC rules as formal parameters to refer to terms. We will use lower-case letters or words, or x, y, z to refer to variables. Note that we use similar notations for both variables and terms. However, variables exist only in rules, so we expect no confusion.

Definition 5 (rules). Rules consist of a left-hand and a right-hand side, generated by:

LHS ::= N | LHS.V   where every variable occurs at most once
RHS ::= N | V | RHS.RHS

Therefore, any right-hand side without variables is a term in U. A combination of a left-hand and a right-hand side is a rule only when all variables in the right-hand side also occur in the left-hand side.

Rules ::= LHS −→ RHS   where all variables in RHS occur in LHS

A rule is said to define the name in its left-hand side; this name is also called the head. The length of a left-hand side is equal to the number of variables in it.

Definition 6 (program). A program is a finite set of rules, where no two rules define the same name. We denote a program by P.

Programs = {P ⊆ Rules | P is finite and head(·) is injective on the LHSes in P}

The domain of a program is the set of names defined by its rules.

dom(P) = {head(rule) | rule ∈ P}

We will frequently extend programs: an extension of a program P is a program that is a superset of P.

Definition 7 (evaluation). A term can be evaluated under a program. Evaluation consists of zero or more sequential steps, which are all deterministic. For some terms and programs, evaluation never terminates. We define the evaluation through the partial successor function nextP(·) : U ↦ U. We define nextP(t) when P defines head(t), and length(t) equals the length of the corresponding left-hand side.

nextP(n.t1.t2. ··· .tk) = r[~x := ~t]   when “n.x1.x2. ··· .xk −→ r” ∈ P

It is allowed that k = 0:

nextP(n) = r   when “n −→ r” ∈ P

More informally, we write M →P N when nextP(M) = N. The reflexive and transitive closure of →P will be denoted ↠P. When M ↠ N, then we call N a reduct of M, and M is said to be defined. When nextP(M) is not defined, we write that M is final. Notation: M ↓P. We also combine the notations: if nextP(M) = N and nextP(N) is undefined, we may write M →P N ↓P. We will often leave the subscript P implicit: M → N ↓.

In Section 2.3, we divide the final terms in three groups: undefined terms, incomplete terms, and invalid terms. Thus, these are the three cases where nextP(M) is undefined.

Definition 8 (termination). A term M is said to be terminating under a program P, notation M ↠P ↓, when it has a final reduct: ∃N ∈ U : M ↠P N ↓P. We often leave the subscript P implicit.
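Definition 7 translates almost directly into code. The following Python sketch of nextP uses a toy term encoding of our own (names are strings, dot-applications are pairs) and is an illustration of the definition, not part of it:

```python
# Sketch of next_P from Definition 7. next_P(t) is defined exactly when
# head(t) has a rule whose left-hand side has the same length as t.
def head(t):
    return t if isinstance(t, str) else head(t[0])

def length(t):
    return 0 if isinstance(t, str) else 1 + length(t[0])

def args(t):
    out = []
    while not isinstance(t, str):
        out.append(t[1]); t = t[0]
    return out[::-1]

def subst(t, env):
    if isinstance(t, str):
        return env.get(t, t)
    return (subst(t[0], env), subst(t[1], env))

def next_p(program, t):
    """Partial successor function: the unique next term, or None if t is final."""
    if head(t) not in program:
        return None                              # undefined term
    params, rhs = program[head(t)]
    if length(t) != len(params):
        return None                              # incomplete or invalid term
    return subst(rhs, dict(zip(params, args(t))))

# The program "A -> B; B -> C", with k = 0 rules.
program = {"A": ([], "B"), "B": ([], "C")}
print(next_p(program, "A"))          # -> B
print(next_p(program, "C"))          # -> None  (C is final: undefined)
print(next_p(program, ("A", "X")))   # -> None  (A.X is final: length 1, LHS length 0)
```

The three None-returning branches correspond precisely to the three groups of final terms introduced in Section 2.3.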

2.3 Categorization of terms

A program divides all terms into four disjoint categories: undefined, incomplete, complete, and invalid. A term’s evaluation behavior depends on its category, to which the term’s arity is crucial.

Definition 9. The name n has arity k if P contains a rule of the form n.x1 . .xk q. ··· −→ A term t has arity k i if it is of the form n.q1. .qi, where n has arity k (k i). − ··· ≥ Definition 10. Term t is defined in P if head(t) dom(P ), otherwise we say that t is undefined. Given a t that is defined, we say that ∈

• t is complete if the arity of t is 0
• t is incomplete if the arity of t is j > 0
• t is invalid if it has no arity (that is, t is of the form n.q1. ··· .qi, where n has arity k < i)

The four categories have distinct characteristics.

Undefined terms. Term M is undefined iff M.N is undefined. This does not depend on term N. Extension of the program causes undefined terms to remain undefined or become incomplete, complete, or invalid.

Interpretation. Because variables are not part of a term in continuation calculus, we use undefined names instead for similar purposes.¹ This means that all CC terms are ‘closed’ in the lambda calculus sense.

The remaining three categories contain defined terms: terms with a head ∈ dom(P). Extension of the program does not change the category of defined terms.

Incomplete terms. If M is incomplete, then M.N can be incomplete or complete. Interpretation. There are four important classes of incomplete terms.

• Data terms (see Section 2.5). If d represents ck(v1, ··· , vnk) of a data type D with m constructors, then ∀t1 . . . tm ∈ U : d.~t ↠ tk.~v. Examples:

∀t1, t2 ∈ U : Zero.t1.t2 ↠ t1   (Zero represents 0)
∀t1, t2 ∈ U : S.(S.(S.Zero)).t1.t2 ↠ t2.(S.(S.Zero))   (S.(S.(S.Zero)) represents S(S(S(0))))

Or, using more mnemonic variables z and s:

∀z, s ∈ U : Zero.z.s ↠ z   (Zero represents 0)
∀z, s ∈ U : S.(S.(S.Zero)).z.s ↠ s.(S.(S.Zero))   (S.(S.(S.Zero)) represents S(S(S(0))))

Call-by-name function terms. These are terms f such that f.v1. .vk is a data term • for all ~v in the appropriate domain. Example using Figure··· 2.1: ∈ JDK z, s : AddCBN .Zero.Zero.z.s z ∀ ∈ U  z, s : AddCBN .(S.Zero).(S.(S.Zero)).z.s s.(AddCBN .Zero.(S.(S.Zero))) ∀ ∈ U  Recall that AddCBN .(S.Zero).(S.(S.Zero)) is a data term that represents 3. The second reduction shows that 1+CBN2 = S(x), for some x represented by AddCBN .Zero.(S.(S.Zero)). Call-by-value function terms. These are terms f of arity n + 1 such that for all ~v in a • certain domain, r : f.v1. .vn.r  r.t with data term t depending only on ~v, not on r. Example: ∀ ∈ U ···

r : AddCBV .(S.Zero).(S.(S.Zero)).r r.(S.(S.(S.Zero))) ∀ ∈ U  Return continuations. These represent the state of the program, parameterized over • some values. Imagine a C program fragment “return abs(2 - ?);”. If we were to resume execution from such fragment, then the program would run to completion, but it is necessary to first fill in the question mark. If r represents the above program fragment, then r.3 represents the completed fragment “return abs(2 - 3);”. If a return continuation has arity n, then it corresponds to a program fragment with n question marks.

Invalid terms. All invalid terms will be considered equivalent. If M is invalid, then M.N is also invalid.

Complete terms. This is the set of terms that have a successor. If M is complete, then M.N is invalid.

¹ This is substantiated by Theorem 12.

Common definitions
Zero.z.s −→ z
S.m.z.s −→ s.m
Nil.ifempty.iflist −→ ifempty
Cons.n.l.ifempty.iflist −→ iflist.n.l

Call-by-value functions
AddCBV.x.y.r −→ x.(r.y).(AddCBV′.y.r)
AddCBV′.y.r.x′ −→ AddCBV.x′.(S.y).r

FibCBV.x.r −→ x.(r.Zero).(FibCBV1.r)
FibCBV1.r.y −→ y.(r.(S.Zero)).(FibCBV2.r.y)
FibCBV2.r.y.y′ −→ FibCBV.y.(FibCBV3.r.y′)
FibCBV3.r.y′.fiby −→ FibCBV.y′.(FibCBV4.r.fiby)
FibCBV4.r.fiby.fiby′ −→ AddCBV.fiby.fiby′.r

Call-by-name functions
AddCBN.x.y.z.s −→ x.(y.z.s).(AddCBN′.y.s)
AddCBN′.y.s.x′ −→ s.(AddCBN.x′.y)

FibCBN.x.z.s −→ x.z.(FibCBN1.z.s)
FibCBN1.z.s.y −→ y.(s.Zero).(FibCBN2.z.s.y)
FibCBN2.z.s.y.y′ −→ AddCBN.(FibCBN.y).(FibCBN.y′).z.s

Figure 2.1: Continuation calculus representations of + and fib. The functions are applied in a different way, as shown in Figure 2.2. This incompatibility is already indicated by the different arity: arity(AddCBV) = 3 ≠ arity(AddCBN) = 4, and arity(FibCBV) = 2 ≠ arity(FibCBN) = 3. Figure 2.2 shows how to use the four functions.
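The call-by-value rules of Figure 2.1 can be executed with a small Python sketch (our own toy encoding: names are strings, dot-applications are pairs; “R” is an arbitrary fresh name used as return continuation; primed names are written with an ASCII apostrophe):

```python
# Sketch: running FibCBV and AddCBV from Figure 2.1 with a toy evaluator.
def head(t):
    return t if isinstance(t, str) else head(t[0])

def args(t):
    out = []
    while not isinstance(t, str):
        out.append(t[1]); t = t[0]
    return out[::-1]

def subst(t, env):
    return env.get(t, t) if isinstance(t, str) else (subst(t[0], env), subst(t[1], env))

def step(rules, t):
    if head(t) not in rules or len(args(t)) != len(rules[head(t)][0]):
        return None
    params, rhs = rules[head(t)]
    return subst(rhs, dict(zip(params, args(t))))

def run(rules, t, fuel=100000):
    for _ in range(fuel):
        nxt = step(rules, t)
        if nxt is None:
            return t
        t = nxt
    raise RuntimeError("no final reduct reached")

def dot(*parts):
    t = parts[0]
    for p in parts[1:]:
        t = (t, p)
    return t

def church(n):
    t = "Zero"
    for _ in range(n):
        t = ("S", t)
    return t

rules = {
    "Zero":    (["z", "s"], "z"),
    "S":       (["m", "z", "s"], ("s", "m")),
    "AddCBV":  (["x", "y", "r"], dot("x", ("r", "y"), dot("AddCBV'", "y", "r"))),
    "AddCBV'": (["y", "r", "x'"], dot("AddCBV", "x'", ("S", "y"), "r")),
    "FibCBV":  (["x", "r"], dot("x", ("r", "Zero"), ("FibCBV1", "r"))),
    "FibCBV1": (["r", "y"], dot("y", ("r", ("S", "Zero")), dot("FibCBV2", "r", "y"))),
    "FibCBV2": (["r", "y", "y'"], dot("FibCBV", "y", dot("FibCBV3", "r", "y'"))),
    "FibCBV3": (["r", "y'", "fiby"], dot("FibCBV", "y'", dot("FibCBV4", "r", "fiby"))),
    "FibCBV4": (["r", "fiby", "fiby'"], dot("AddCBV", "fiby", "fiby'", "r")),
}

# FibCBV.<<6>>.R should evaluate to R.<<8>>, since fib(6) = 8.
print(run(rules, dot("FibCBV", church(6), "R")) == ("R", church(8)))   # -> True
```

Note how the helper names FibCBV1–FibCBV4 thread the return continuation r through the two recursive calls, exactly as the rules in the figure prescribe.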

2.4 Reasoning with CC terms

This section sketches the nature of continuation calculus through theorems. All proofs are included in the appendix.

2.4.1 Fresh names

Definition 11. When a name fr does not occur in the program under consideration, then we call fr a fresh name. Furthermore, all fresh names that we assume within theorems, lemmas, and definitions are understood to be different. When we say fr is fresh for some objects, then it is additionally required that fr is not mentioned in those objects. We can always assume another fresh name, because programs are finite and there are infinitely many names.

Interpretation. Fresh names allow us to reason on arbitrary terms, much like free variables in lambda calculus.

Theorem 12. Let M,N be terms, and let name fr be fresh. The following bi-implications hold:

M ↠ N ⇐⇒ ∀t ∈ U : M[fr := t] ↠ N[fr := t]
M ↓ ⇐⇒ ∃t ∈ U : M[fr := t] ↓

Lemma 13 (determinism). Let M, ~t, ~u be terms, and let m, n be undefined names in P. If M ↠P m.t1. ··· .tk and M ↠P n.u1. ··· .ul, then m.t1. ··· .tk = n.u1. ··· .ul.

Remark 14. If m or n is defined, this may not hold. For instance, in the program “A −→ B; B −→ C”, we have A ↠ B and A ↠ C, yet B ≠ C.

2.4.2 Term equivalence

Besides syntactic equality (=), we introduce two equivalences on terms: common reduct (=P) and observational equivalence (≈P).

Definition 15. Terms M, N have a common reduct if M ↠ t ↞ N for some term t. Notation: M =P N.

Proposition 16. Suppose M =P N with N final. Then M ↠ N.

Common reduct is a strong equivalence, comparable to β-conversion for lambda calculus. Terms M ≠ N can only have a common reduct if at least one of them is complete. This makes pure =P unsuitable for relating data or function terms, which are incomplete. In fact, =P is not a congruence. To remedy this, we define an observational equivalence in terms of termination.

Definition 17. Terms M and N are observationally equivalent under a program P, notation M ≈P N, when for all extension programs P′ ⊇ P and terms X:

X.M ↓P′ ⇐⇒ X.N ↓P′

We may write M ≈ N if the program is implicit. Examples: AddCBV.⟨m⟩.⟨0⟩ ≈ AddCBV.⟨0⟩.⟨m⟩ and ⟨0⟩ ≈ ⟨True⟩, but ⟨0⟩ ≉ ⟨1⟩; see Section 2.5.

Lemma 18. ≈ is a congruence. In other words, if M ≈ M′ and N ≈ N′, then M.N ≈ M′.N′.

Characterization The reduction behavior of complete terms divides them into three classes. Observational equivalence distinguishes the classes.

• Nontermination. When M is nonterminating and the program is extended, M remains nonterminating. If the reduction path of M is finite, we call it terminating, and we may write M ↠ ↓. This is shorthand for ∃N ∈ U : M ↠ N ↓.

• Proper reduction to an incomplete or invalid term. All such M are observationally equivalent to an invalid term. When the program is extended, such terms remain in their execution class.

• Proper reduction to an undefined term. Observational equivalence distinguishes terms M, N if the head of their final term is different. Therefore, there are infinitely many subclasses. When the program is extended, the final term may become defined. This can cause such M to fall in a different class.

The following proposition and theorem show that ≈ distinguishes the three equivalence classes: if M ≈ N, then M and N are in the same class.

Proposition 19. If M ≈ N, then M ↠ ↓ ⇐⇒ N ↠ ↓.

Theorem 20. Let M ≈ N and M ↠ fr.t1. ··· .tk ↓ with fr ∉ dom(P). Then N ↠ fr.u1. ··· .uk ↓ for some ~u.

Retrieving observational equivalence Complete terms with a common reduct are observationally equal. If M, N are incomplete, but they have common reducts when extended with terms, then also M ≈ N.

Theorem 21. Let M, N be terms with arity k. If M.t1. ··· .tk =P N.t1. ··· .tk for all ~t, then M ≈ N.

Corollary 22. Suppose M =P N and arity(M) = arity(N) = 0. Then M ≈ N.

Remark 23. M ↠ N does not always imply M ≈ N if arity(N) > 0. For instance, take the following program:

Goto.x → x
Omega.x → x.x

Then Goto.Omega → Omega, an incomplete term. We cannot 'fix' Goto.Omega by appending another term: Goto.Omega.Omega is invalid. Name Goto is defined for one 'operand' term, and the superfluous Omega term cannot be 'memorized' as with lambda calculus. On the other hand, Omega.Omega → Omega.Omega is nonterminating. Hence, Goto.Omega → Omega but Goto.Omega ≉ Omega. Note that Goto.Omega ≉ Omega is only possible because arity(Goto.Omega) ≠ arity(Omega).

2.4.3 Program substitution and union

Definition 24 (fresh substitution). Let n₁, …, nₖ be names, and m₁, …, mₖ be fresh for M, all different. Then M[n⃗ := m⃗] is equal to M where all occurrences of n⃗ are simultaneously replaced by m⃗, respectively. The fresh substitution P[n⃗ := m⃗] replaces all n⃗ by m⃗ in both left- and right-hand sides of the rules of P.

We can combine two programs by applying a fresh substitution to one of them, and taking the union. As the following theorems show, this preserves most interesting properties.

Theorem 25. Suppose that P′ ⊇ P is an extension program, and M, N are terms. Then the left-hand equations hold. Let σ denote a fresh substitution [n⃗ := m⃗]. Then the right-hand equations hold.

M →_P N ⟹ M →_{P′} N        M →_P N ⟺ Mσ →_{Pσ} Nσ
M ↠_P N ⟹ M ↠_{P′} N        M ↠_P N ⟺ Mσ ↠_{Pσ} Nσ
M ↓_P   ⟸ M ↓_{P′}          M ↓_P   ⟺ Mσ ↓_{Pσ}
M =_P N ⟹ M =_{P′} N        M =_P N ⟺ Mσ =_{Pσ} Nσ
M ≈_P N ⟹ M ≈_{P′} N        M ≈_P N ⟺ Mσ ≈_{Pσ} Nσ

Remark 26. Names n⃗ are not mentioned in Mσ and Pσ, so we can apply Theorem 25 with σ⁻¹ on Mσ and Pσ.

Theorem 27. Suppose that P′ extends P, but the names in dom(P′ \ P) are not mentioned in M, N, or P. Then M ≈_P N ⟺ M ≈_{P′} N.

2.5 Data

In this section, we show how to program some standard data in continuation calculus. We first give a canonical representation of data as CC terms. We then give essential semantic characteristics, and show that other terms have those characteristics as well. Observational equivalence guarantees that termination of the whole program is only dependent on those characteristics. In fact, it will prove possible to implement “call-by-name values”, which delay computation until it is needed, by relying on those characteristics.

Standard representation of data In Section 2.1, we postulated terms for natural numbers in continuation calculus. We will now give this standard representation formally, as well as the representation of booleans and natural lists.

Definition 28. For a mathematical object o, we define a standard representation ⟨o⟩ of that object as a CC term, which we call a data term. We postulate that the rules in the middle column are included in programs that use the corresponding terms.

booleans:            ⟨True⟩ = True              True.t.f → t
                     ⟨False⟩ = False            False.t.f → f
naturals:            ⟨0⟩ = Zero                 Zero.z.s → z
                     ⟨m + 1⟩ = S.⟨m⟩            S.x.z.s → s.x
lists of naturals:   ⟨[]⟩ = Nil                 Nil.e.c → e
                     ⟨m : l⟩ = Cons.⟨m⟩.⟨l⟩     Cons.x.xs.e.c → c.x.xs

Theorem 29. ⟨True⟩ ≉ ⟨False⟩.

Proof. Observe that for all t, f: ⟨True⟩.t.f → t and ⟨False⟩.t.f → f. Take two fresh names t and f. Contraposition of Theorem 20 proves ⟨True⟩.t.f ≉ ⟨False⟩.t.f. Because ≈ is a congruence with respect to dot, we can conclude ⟨True⟩ ≉ ⟨False⟩.
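The reduction discipline behind Definition 28 is small enough to interpret directly: a rule for name n fires exactly when n is applied to as many operands as the rule has parameters. The following sketch is not part of the thesis; the term representation (nested tuples), the rule table, and all function names are our own, and only the six data rules above are preloaded.

```python
# Terms are tuples: ("name", n) for a name or variable, ("dot", t, u) for t.u.
def N(n):
    return ("name", n)

def dot(t, *us):
    for u in us:
        t = ("dot", t, u)   # the dot is left-associative
    return t

# The standard data rules of Definition 28: name -> (parameters, right-hand side).
DATA_RULES = {
    "True":  (["t", "f"], N("t")),
    "False": (["t", "f"], N("f")),
    "Zero":  (["z", "s"], N("z")),
    "S":     (["x", "z", "s"], dot(N("s"), N("x"))),
    "Nil":   (["e", "c"], N("e")),
    "Cons":  (["x", "xs", "e", "c"], dot(N("c"), N("x"), N("xs"))),
}

def spine(t):
    """Head name and operand list of a term."""
    args = []
    while t[0] == "dot":
        args.append(t[2])
        t = t[1]
    return t[1], list(reversed(args))

def subst(env, t):
    if t[0] == "name":
        return env.get(t[1], t)
    return ("dot", subst(env, t[1]), subst(env, t[2]))

def step(program, t):
    """One reduction step, or None if t is final under this program."""
    head, args = spine(t)
    rule = program.get(head)
    if rule is None:
        return None
    params, rhs = rule
    if len(params) != len(args):   # under- or over-applied head: no step
        return None
    return subst(dict(zip(params, args)), rhs)

def evaluate(program, t):
    """Reduce to a final term (loops forever on nonterminating terms)."""
    while True:
        t2 = step(program, t)
        if t2 is None:
            return t
        t = t2
```

For instance, `evaluate(DATA_RULES, dot(N("S"), N("Zero"), N("z"), N("s")))` returns `dot(N("s"), N("Zero"))`, matching ⟨1⟩.z.s ↠ s.⟨0⟩.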

Similar results hold for N and ListN, but we do not provide a proof here.

A broader definition The behavioral essence of these data terms is that they take a continuation for each constructor, and they continue execution in the respective continuation, augmented with the constructor arguments. For instance, ⟨0⟩.z.s ↠ z and ⟨n + 1⟩.z.s ↠ s.⟨n⟩. We can capture this essence in the following term sets; ⟦N⟧ and ⟦List_N⟧ are the smallest sets satisfying the following equalities.

⟦B⟧ = {M ∈ 𝒰 | ∀t, f ∈ 𝒰 : M.t.f ↠ t ∨ M.t.f ↠ f}
⟦N⟧ = {M ∈ 𝒰 | (∀z, s ∈ 𝒰 : M.z.s ↠ z) ∨ ∃x ∈ ⟦N⟧ ∀z, s ∈ 𝒰 : M.z.s ↠ s.x}
⟦List_N⟧ = {M ∈ 𝒰 | (∀e, c ∈ 𝒰 : M.e.c ↠ e) ∨ ∃x ∈ ⟦N⟧, xs ∈ ⟦List_N⟧ ∀e, c ∈ 𝒰 : M.e.c ↠ c.x.xs}

Remark 30. These sets are dependent on the program. The sets are monotone with respect to program extension: if M ∈ ⟦B⟧, ⟦N⟧, or ⟦List_N⟧ for a program, then M is also in the corresponding set for any extension program.

The sets include other terms besides ⟨True⟩, ⟨False⟩, ⟨n⟩, and ⟨l⟩. Consider the following program fragment, which implements the ≤ operator on natural numbers.

Leq.x.y.t.f → x.t.(Leq′.y.t.f)
Leq′.y.t.f.x′ → Leq.y.x′.f.t

Given naturals m, p and this program fragment, Leq.⟨m⟩.⟨p⟩ ∈ ⟦B⟧. Even more, Leq.⟨m⟩.⟨p⟩ ≈ ⟨m ≤ p⟩. In general, it follows from Theorem 21 that all M ∈ ⟦B⟧ are observationally equivalent to ⟨True⟩ or ⟨False⟩. The appendix contains a proof of the analogous statement for ⟦N⟧:

Proposition 31. All terms in ⟦N⟧ are observationally equivalent to ⟨k⟩ for some k.

For further reasoning, it is useful to split up ⟦N⟧ in parts as follows.

Definition 32. For a natural number k, the set ⟪k⟫ is defined as {M ∈ ⟦N⟧ | M ≈ ⟨k⟩}. We define ⟪b⟫ and ⟪l⟫ analogously for booleans b and lists of naturals l.

With this definition, we may say Leq.⟨3⟩.⟨4⟩ ∈ ⟪True⟫. In fact, if a ∈ ⟪3⟫ and b ∈ ⟪4⟫, then Leq.a.b ∈ ⟪True⟫.² To support this pattern of reasoning, we allow ⟪·⟫ to be lifted into term positions, ⟪m⟫ then denoting a term in ⟪m⟫. The resulting statements are implicitly quantified universally and existentially, and are usable in proof chains.

Remark 33. For data terms, we would like to reason and compute with equivalence classes of representations, ⟪k⟫, instead of with the representations themselves, ⟨k⟩. Of course, a CC program will always compute with a term (and not with an equivalence class of terms), but we would like this computation to only depend on the characterization of the equivalence class. For example, we want to compute a CBN addition function AddCBN such that ∀m, p ∈ N, ∀t ∈ ⟪m⟫, ∀u ∈ ⟪p⟫ : AddCBN.t.u ∈ ⟪m + p⟫. As a specification, we want to summarize this as:

AddCBN.⟪m⟫.⟪p⟫ ∈ ⟪m + p⟫
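The behavior that places Leq.⟨m⟩.⟨p⟩ in ⟦B⟧ can be mimicked directly in an ordinary language by representing naturals exactly as ⟦N⟧ prescribes: as functions that choose between two continuations. The sketch below is ours, not the thesis's code; it transcribes the two Leq rules one-to-one.

```python
# A natural as in [[N]]: given a "zero" result and a "successor"
# continuation, it continues in exactly one of them.
def zero(z, s):
    return z

def suc(n):
    return lambda z, s: s(n)

# Transcription of   Leq.x.y.t.f   -> x.t.(Leq'.y.t.f)
#                    Leq'.y.t.f.x' -> Leq.y.x'.f.t
def leq(x, y, t, f):
    return x(t, lambda x1: leq(y, x1, f, t))
```

Here `leq(zero, suc(zero), "yes", "no")` evaluates to `"yes"` and `leq(suc(zero), zero, "yes", "no")` to `"no"`, mirroring Leq.⟨0⟩.⟨1⟩ ∈ ⟪True⟫ and Leq.⟨1⟩.⟨0⟩ ∈ ⟪False⟫: the recursion swaps t and f on each step, just as the CC rules swap the continuations.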

We will also summarize a statement of the form ∀t₁ ∈ ⟪m⟫ ∃t₂ ∈ ⟪m⟫ ∃t₃ ∈ ⟪l⟫ : A.t₁ ↠ B.t₂.t₃ with the shorthand A.⟪m⟫ ↠ B.⟪m⟫.⟪l⟫. If we know A.⟪m⟫ ↠ B.⟪m⟫.⟪l⟫ and B.⟪m⟫.⟪l⟫ ↠ C.⟪m⟫, then we may logically conclude

∀t₁ ∈ ⟪m⟫ ∃t₂ ∈ ⟪m⟫ ∃t₃ ∈ ⟪l⟫ ∃t₄ ∈ ⟪m⟫ : A.t₁ ↠ B.t₂.t₃ ↠ C.t₄,

which we will summarize as A.⟪m⟫ ↠ B.⟪m⟫.⟪l⟫ ↠ C.⟪m⟫. Analogous statements of this form, and longer series, will be summarized in a similar way. In particular, it will suit us to also use → and =_P in longer derivations.

²To see this, observe Leq.a.b ≈ Leq.⟨3⟩.⟨4⟩ ≈ ⟨True⟩ by congruence, then use Theorems 20 and 12.

Example: delayed addition We will program a different addition on natural numbers: one that delays work as long as possible, like in call-by-name programming languages. We use the following algorithm, for natural numbers m, p:

0 + p = p
S(m) + p = S(m + p)

The resulting name AddCBN will be a 'call-by-name' function, with specification AddCBN.⟪m⟫.⟪p⟫ ∈ ⟪m + p⟫, so we have to build a rule for AddCBN. Because AddCBN.⟪m⟫.⟪p⟫ ∈ ⟦N⟧, arity(AddCBN) = 4. We reduce the specification with a case distinction on the first argument.

AddCBN.⟪0⟫.⟪p⟫.z.s =_P ⟪p⟫.z.s    (AddCBN.⟪0⟫.⟪p⟫ has the same specification as ⟪p⟫)
AddCBN.⟪S(m)⟫.⟪p⟫.z.s ↠ s.⟪m + p⟫

We must make the case distinction by using the specified behavior of the first argument. This suggests a rule of the form AddCBN.x.y.z.s → x.(y.z.s).(s.(AddCBN.x′.y)). It almost works:

AddCBN.⟪0⟫.⟪p⟫.z.s ↠ ⟪p⟫.z.s
AddCBN.⟪S(m)⟫.⟪p⟫.z.s ↠ s.(AddCBN.x′.⟪p⟫).⟪m⟫

However, variable x′ is not in the left-hand side, so this is not a valid rule. Furthermore, if x = S(x′), then x′ would be erroneously appended to s.(AddCBN.x′.y). We fix AddCBN with a helper name AddCBN′, with specification AddCBN′.⟪p⟫.s.⟪m⟫ ↠ s.⟪m + p⟫.

AddCBN.x.y.z.s → x.(y.z.s).(AddCBN′.y.s)
AddCBN′.y.s.x′ → s.(AddCBN.x′.y)

This version conforms to the specification.

AddCBN.⟪0⟫.⟪p⟫.z.s ↠ ⟪p⟫.z.s = ⟪0 + p⟫.z.s
AddCBN.⟪S(m)⟫.⟪p⟫.z.s ↠ AddCBN′.⟪p⟫.s.⟪m⟫ → s.(AddCBN.⟪m⟫.⟪p⟫) = s.⟪m + p⟫
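The two rules for AddCBN and AddCBN′ can be replayed in the same two-continuation representation of naturals used above; the helper name corresponds to the inner lambda below. This is our own illustrative sketch, not the thesis's code.

```python
# Naturals as in [[N]]: functions over a zero result and a successor continuation.
def zero(z, s):
    return z

def suc(n):
    return lambda z, s: s(n)

# AddCBN.x.y.z.s  -> x.(y.z.s).(AddCBN'.y.s)
# AddCBN'.y.s.x'  -> s.(AddCBN.x'.y)
def add_cbn(x, y):
    # The result is itself a number in [[N]]; no recursion happens
    # until the result is applied to concrete continuations z, s.
    return lambda z, s: x(y(z, s), lambda x1: s(add_cbn(x1, y)))

def to_int(n):
    """Force a delayed number in order to observe its value."""
    return n(0, lambda n1: 1 + to_int(n1))
```

The call `add_cbn(suc(suc(zero)), suc(zero))` just builds a closure; only `to_int` forces it, returning 3. This mirrors how AddCBN.⟪m⟫.⟪p⟫ is directly a data term that performs one S-step each time its continuations are supplied.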

2.5.1 Call-by-name and call-by-value functions We regard two kinds of functions. We call them call-by-name and call-by-value, by analogy with the evaluation strategies for lambda calculus. Figure 2.1 defines a CBN and CBV version of addition on naturals and the Fibonacci function. Figure 2.2 shows how to use them. It also illustrates that the CBV function performs work eagerly, while the CBN function delays work until it is needed: hence the analogy.

• Call-by-name functions are terms f such that f.v₁.⋯.vₖ is a data term for all v⃗ in a certain domain. Example specifications for such f:

AddCBN.⟪m⟫.⟪p⟫ ∈ ⟪m + p⟫
FibCBN.⟪m⟫ ∈ ⟪fib(m)⟫

• Call-by-value functions are terms f of arity n + 1 such that for all v⃗ in a certain domain, ∀r : f.v₁.⋯.vₙ.r ↠ r.t with data term t depending only on v⃗, not on r. Example specifications for such f:

∀r : AddCBV.⟪m⟫.⟪p⟫.r ↠ r.⟪m + p⟫
∀r : FibCBV.⟪m⟫.r ↠ r.⟨fib(m)⟩

The output of FibCBV is always a standard representation. Because our implementation of AddCBV does not inspect the second argument, its output may not be a standard integer. An example of this is shown in Figure 2.2. We leave formal proofs of the specifications for future work.

Call-by-value fib(7): To apply f to x⃗, evaluate f.x⃗.r ↠ r.y for some r. Then y is the result. The result of fib(7) is 13, obtained in 362 reduction steps: FibCBV.7.fr ↠ fr.13.

Call-by-name fib(7): To apply f to x⃗, write f.x⃗. This is directly a data term, no reduction happens. By the specification of FibCBN, we know FibCBN.7 ∈ ⟪13⟫. No reduction is involved.

Both 13 and FibCBN.7 can be used in other functions, like +. Because they are observationally equivalent, they can be substituted for each other in a term. That does not affect termination, or the head of the final term if that is undefined (Theorem 20). However, substituting 13 for FibCBN.7 may make the evaluation shorter.

13 +CBV 0 is obtained in 41 steps:
AddCBV.13.0.fr ↠ fr.13 in 41 steps

FibCBN.7 +CBV 0 is obtained in 304 steps:
AddCBV.(FibCBN.7).0.fr ↠ fr.13 in 304 steps (263 more)

Our implementation of AddCBV does not examine the right argument, as the converse addition shows.

AddCBV.0.13.fr ↠ fr.13 in 2 steps
AddCBV.0.(FibCBN.7).fr ↠ fr.(FibCBN.7) in 2 steps

Figure 2.2: Calculating fib(7), fib(7) + 0, and 0 + fib(7) using call-by-value and call-by-name. Effectively, FibCBN delays computation until it is needed. A natural number n stands for S.(⋯(S.Zero)⋯) with n occurrences of S.

2.6 Example: list multiplication

To illustrate how control is fundamental to continuation calculus, we give an example program that multiplies a list of natural numbers, and show how an escape from a loop can be modeled without a special operator in the natural CC representation. We use an ML-like programming language for this example, and show the corresponding call-by-value program for CC. The naive way to compute the product of a list is as follows:

In ML:

let rec listmult₁ l = match l with
  | [] → 1
  | (x : xs) → x · listmult₁ xs

In CC:

ListMult.l.r → l.(r.(S.Zero)).(C.r)
C.r.x.xs → ListMult.xs.(PostMult.x.r)
PostMult.x.r.y → Mult.x.y.r

Note that if l contains a zero, then the result is always zero. One might wish for a more efficient version that skips all numbers after the zero.

In ML:

let rec listmult₂ l = match l with
  | [] → 1
  | (x : xs) → match x with
      | 0 → 0
      | x′ + 1 → x · listmult₂ xs

In CC:

ListMult.l.r → l.(r.(S.Zero)).(B.r)
B.r.x.xs → x.(r.Zero).(C.r.x.xs)
C.r.x.xs.x′ → ListMult.xs.(PostMult.x.r)
PostMult.x.r.y → Mult.x.y.r

However, listmult₂ is not so efficient either: if the list is of the form [x₁ + 1, ⋯, xₖ + 1, 0], then we only avoid multiplying 0 · listmult₂ []. The other multiplications are all of the form n · 0 = 0. We also want to avoid execution of those surrounding multiplications. We can do so if we extend ML with the call/cc operator, which creates alternative exit points that are invokable as a function.

let listmult₃ l = call/cc λabort. A l
  where A = function
    | [] → 1
    | (x : xs) → match x with        (B)
        | 0 → abort 0
        | x′ + 1 → x · A xs          (C)

(The boxed labels are not syntax, but are used to relate listmult₃ to Figure 2.3.)
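The role of call/cc in listmult₃ is plain continuation-passing: the abort continuation is just the outer return continuation pre-applied to 0. A sketch of that structure in continuation-passing style (our own illustrative code, not the formal translation defined later in Section 3.5):

```python
# List multiplication in continuation-passing style. Invoking 'abort'
# jumps straight to the outer continuation r with 0, discarding every
# pending multiplication -- like constructing r.Zero in the CC program.
def listmult(l, r):
    def go(l, k):
        if not l:
            return k(1)
        x, xs = l[0], l[1:]
        if x == 0:
            return r(0)                    # abort: bypass all pending k's
        return go(xs, lambda p: k(x * p))  # extend the continuation
    return go(l, r)
```

With the identity as return continuation, `listmult([3, 1, 2], lambda n: n)` yields 6, while `listmult([3, 0, 2], lambda n: n)` returns 0 without performing any of the multiplications stacked up in k.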

While listmult₃ is not readily expressible in actual ML or lambda calculus, it is natural to express in CC: we list the program in Figure 2.3. These programs are a CPS translation of listmult₃, with one exception: the variable abort in Figure 2.3 corresponds to the application of abort to 0 in listmult₃. Note that in CC, abort is obtained simply by constructing r.Zero. The variable r globally corresponds to the return continuation that is implicit in ML. Continuation calculus requires us to explicitly thread variables through the continuations.

Continuation calculus:

— Assume m, m′, p ∈ N, l ∈ List_N, r, r′ ∈ 𝒰.
ListMult.xs.r → A.xs.r.(r.Zero)
    Theorem. ListMult.⟪l⟫.r ↠ r.⟪product l⟫
— Assume r.Zero =_P r′.
A.xs.r.abort → xs.(r.(S.Zero)).(B.r.abort)
    Lemma. A.⟪l⟫.r.r′ =_P r.⟪product l⟫
B.r.abort.x.xs → x.abort.(C.r.abort.x.xs)
    ⟹ B.r.r′.⟪m⟫.⟪l⟫ =_P r.⟪m · product l⟫
C.r.abort.x.xs.x′ → A.xs.(PostMult.x.r).abort
    ⟹ C.r.r′.⟪m⟫.⟪l⟫.x′ =_P r.⟪m · product l⟫
PostMult.x.r.y → Mult.x.y.r
    ⟹ PostMult.⟪m⟫.r.⟪p⟫ ↠ r.⟪m · p⟫
Mult.x.y.r → y.(r.Zero).(PostMult.x.(PostAdd.x.r))
    Assumption. Mult.⟪m⟫.⟪p⟫.r ↠ r.⟪m · p⟫
Usage. ListMult.⟪[3, 1, 2]⟫.r ↠ r.⟪6⟫

Haskell equivalent:

-- Assume l, xs ∈ [N], x, x′, y ∈ N.
listmult₄ l r = A l r (r 0)
    ⟹ listmult₄ l r = r (product l)
A l r abort = case l of
    [] → r 1
    x : xs → B r abort x xs
    ⟹ A l r (r 0) = r (product l)
B r abort x xs = case x of
    0 → abort
    y + 1 → C r abort x xs y
    ⟹ B r (r 0) x xs = r (x · product xs)
C r abort x xs x′ = A xs (PostMult x r) abort
    ⟹ C r (r 0) x xs x′ = r (x · product xs)
PostMult x r y = r (x · y)
Usage. 6 == listmult₄ [3, 1, 2] id

Figure 2.3: Left: 'fast' list multiplication in continuation calculus (CC). Right: Haskell program with equivalent semantics. Statements after ⟹ serve to guide the reader. The theorem and lemma are proven in Section 2.6.1.

2.6.1 Correctness proofs This section proves that ListMult in Figure 2.3 is correct. The idea is to assume that a program contains the listed definitions, and Mult behaves according to the specification; then Theorem 36 proves the specification of ListMult in that program. We need two lemmas. Firstly, we show that name A conforms to its specification. This is done by induction on list l. Furthermore, we need a lemma on the quick exit of PostMult.

Lemma 34. The specification of A is satisfied. That is, assume l ∈ List_N, r, r′ ∈ 𝒰 such that r.⟨0⟩ =_P r′. Then A.⟪l⟫.r.r′ =_P r.⟪product l⟫.

Proof. We use induction on l, and make a three-way case distinction.

Case 1. Base case: l = []. Then:

A.⟪[]⟫.r.r′
→ ⟪[]⟫.(r.(S.Zero)).(B.r.r′)       by definition
↠ r.(S.Zero)                       by definition of ⟪[]⟫
= r.⟪product []⟫                   S.Zero ∈ ⟪1⟫ = ⟪product []⟫

Case 2. l = (0 : l′). Then:

A.⟪0 : l′⟫.r.r′
→ ⟪0 : l′⟫.(r.(S.Zero)).(B.r.r′)   by definition of A
↠ B.r.r′.⟪0⟫.⟪l′⟫                  by definition of ⟪0 : l′⟫
→ ⟪0⟫.r′.(C.r.r′.⟪0⟫.⟪l′⟫)         by definition of B
↠ r′                               by definition of ⟪0⟫
=_P r.Zero                         by assumption
= r.⟪product (0 : l′)⟫             Zero ∈ ⟪0⟫ = ⟪product (0 : l′)⟫

Case 3. l = (m + 1 : l′). Then:

A.⟪m + 1 : l′⟫.r.r′
→ ⟪m + 1 : l′⟫.(r.(S.Zero)).(B.r.r′)    by definition of A
↠ B.r.r′.⟪m + 1⟫.⟪l′⟫                   by definition of ⟪m + 1 : l′⟫
→ ⟪m + 1⟫.r′.(C.r.r′.⟪m + 1⟫.⟪l′⟫)      by definition of B
↠ C.r.r′.⟪m + 1⟫.⟪l′⟫.⟪m⟫               by definition of ⟪m + 1⟫
→ A.⟪l′⟫.(PostMult.⟪m + 1⟫.r).r′        by definition of C
=_P PostMult.⟪m + 1⟫.r.⟪product l′⟫     by induction, if r′ =_P PostMult.⟪m + 1⟫.r.⟨0⟩
→ Mult.⟪m + 1⟫.⟪product l′⟫.r           by definition of PostMult
↠ r.⟪(m + 1) · product l′⟫              specification of Mult
= r.⟪product (m + 1 : l′)⟫              mathematics

This chain proves that A.⟪m + 1 : l′⟫.r.r′ =_P r.⟪product (m + 1 : l′)⟫. The third case requires Lemma 35, which is proved below. This completes the induction, yielding:

A.⟪l⟫.r.r′ =_P r.⟪product l⟫  for all l ∈ List_N, r ∈ 𝒰.

Lemma 35. Let x ∈ 𝒰, l ∈ List_N, r, r′ ∈ 𝒰 and r.⟨0⟩ =_P r′. Then PostMult.x.r.⟨0⟩ =_P r′.

Proof. By the following chain.

PostMult.x.r.⟨0⟩
→ Mult.x.⟨0⟩.r                                by definition of PostMult
→ ⟨0⟩.(r.Zero).(PostMult.x.(PostAdd.x.r))     by definition of Mult
→ r.Zero = r.⟨0⟩                              by definition of ⟨0⟩
=_P r′                                        by assumption

Theorem 36. The specification of ListMult is satisfied. That is: assume l ∈ List_N, r ∈ 𝒰. Then ListMult.⟪l⟫.r ↠ r.⟪product l⟫.

Proof. We fill in r′ = r.Zero in the specification of A; then r.Zero =_P r.⟨0⟩ is satisfied by definition of ⟨0⟩.

A.⟪l⟫.r.(r.Zero) =_P r.⟪product l⟫  for all l ∈ List_N, r ∈ 𝒰

If we temporarily take r to be a fresh name, then we can change =_P into ↠ with Proposition 16.

A.⟪l⟫.r.(r.Zero) ↠ r.⟪product l⟫  for all l ∈ List_N

We can generalize this again with Theorem 12:

A.⟪l⟫.r.(r.Zero) ↠ r.⟪product l⟫  for all l ∈ List_N, r ∈ 𝒰

Now our main correctness result follows rather straightforwardly.

ListMult.⟪l⟫.r → A.⟪l⟫.r.(r.Zero)    by definition
↠ r.⟪product l⟫                      as just shown

Chapter 3

Relation to programming languages

In this section, we argue the connection between continuation calculus and various programming languages: call-by-value languages, call-by-name languages, languages with continuations, and languages that support any combination of the three. We do this by introducing a new programming language, called ML+, that supports a mix of call-by-value and call-by-name, and continuations. There are separate expressions to construct a call-by-name function (bynameₙ [x₁, ⋯, xₖ] → asif e), a call-by-value function (byval [x₁, ⋯, xₖ](y₁, ⋯, y_{k′}) → return e), and applications of either: e[e₁, ⋯, eₖ] respectively e[e₁, ⋯, eₖ](e′₁, ⋯, e′_{k′}).

This section begins with an explanation of the core syntax, and some syntactic sugar to aid writing programs in ML+. It continues with some example ML+ programs in Section 3.2. The reduction of these programs is formalized with SOS semantics for ML+ in Section 3.4. In Section 3.5, the meaning is defined again: now implicitly, by a translation to continuation calculus.

One may note that the ML+ examples in Section 3.2 share function names with examples in Chapter 2. This is not a coincidence: we have tried to obtain the CC examples we hand-coded in Chapter 2, but now starting from a more intuitive source language. A back-of-the-envelope translation of the ML+ examples showed that they translate to the CC programs in Chapter 2, modulo some simple optimizations. We leave a more in-depth account of these examples as well as the development of these optimizations for future research.

This section contains preliminary work, but the author opines that it is informative enough to include in this thesis. The reader is warned that this section may be slightly rough and incomplete.

3.1 ML+ syntax

e ::= C^k_{i/n}                                   (constructor)
   | x                                            (bound variable)
   | f                                            (global function)
   | e(e₁, ⋯, eₖ)                                 (application to k arguments, if the function has arity k)
   | e[e₁, ⋯, eₖ]                                 (augmentation with k arguments, if the function has arity ≥ k)
   | ΛΠ^{l₁,⋯,lₙ}_{e₁,⋯,eₙ} e                     (deconstructor for type of n cases)
   | let x = e in e′                              (local binding)
   | let x = cc in e                              (get current continuation)
   | byval [x₁, ⋯, xₖ](y₁, ⋯, y_{k′}) → return e  (inline CBV function)
   | bynameₙ [x₁, ⋯, xₖ] → asif e                 (inline CBN function for type of n cases)

Syntactic sugar:

   | if e then e_t else e_f                       (pattern-match on booleans)
   | match e with                                 (pattern-match on naturals)
       | 0 → e₀
       | 1 + y → e_S
   | match e with                                 (pattern-match on lists)
       | [] → e_Nil
       | y :: ys → e_Cons
   | catch x in e                                 (catch static exception)
   | throw e_exc in e_val                         (raise static exception)

Def ::= f = byval [x₁, ⋯, xₖ](y₁, ⋯, y_{k′}) → return e   (CBV function definition)
     | f = bynameₙ [x₁, ⋯, xₖ] → asif e                   (CBN function definition)

New in this syntax is the 'lambdapi' operator for data deconstruction, written as the superposition of a lambda and a pi symbol: ΛΠ. Expression ΛΠ^{l₁,⋯,lₙ}_{e₁,⋯,eₙ} e is evaluated as follows: first, e is evaluated to a value of a type of n cases. If the value is of a type of fewer or more cases, evaluation halts. A case distinction is made on this resulting value, to find that the value corresponds to C^k_{i/n}[v₁, ⋯, vₖ]: the i'th data constructor with arity k for a type of n cases, with arguments v⃗. If k = lᵢ, the expression reduces to eᵢ[](v₁, ⋯, vₖ), else evaluation halts.

The combination of lambda and pi was chosen because the Λ operator customarily distinguishes values of a sum type, and the π operator customarily projects values of a product type. This data construction/deconstruction mechanism is further illustrated in Section 3.3.

Syntactic sugar We define if, match, catch, and throw to be equivalent to other syntax as follows. In effect, the four operators are shorthand notation.

if e then e_t else e_f   ≝  ΛΠ^{0,0}_{byval []() → return e_t, byval []() → return e_f} e

match e with             ≝  ΛΠ^{0,1}_{byval []() → return e₀, byval [](y) → return e_S} e
  | 0 → e₀
  | 1 + y → e_S

match e with             ≝  ΛΠ^{0,2}_{byval []() → return e_Nil, byval [](y, ys) → return e_Cons} e
  | [] → e_Nil
  | y :: ys → e_Cons

catch x in e             ≝  let x = cc in e

throw e_exc in e_val     ≝  ΛΠ^ε_ε e_exc[e_val]

3.2 Example programs in ML+

These are the ML+ programs that we demo in the paper. Their translations seem identical or almost identical to the CC programs in the paper, judging by an informal execution of the translation from Section 3.5.

AddCBV = byval [x, y]() → return match x with
  | 0 → y
  | 1 + x′ → AddCBV [x′, S[y]]

which is unsugared to:

AddCBV = byval [x, y]() → return ΛΠ^{0,1}_{byval []() → return y, byval [](x′) → return AddCBV [x′, C^1_{2/2}[y]]} x
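Read under the intended call-by-value semantics, AddCBV is an accumulator recursion: it peels one S off the first argument and wraps it onto the second until the first argument is exhausted. A sketch of that behavior with naturals as plain data (our own illustrative code, not ML+ syntax):

```python
# Naturals as plain data, built like Zero and S[.] in ML+.
ZERO = ("Zero",)

def S(n):
    return ("S", n)

# match x with 0 -> y | 1 + x' -> AddCBV[x', S[y]]
# The recursive call is a tail call, so a loop suffices.
def add_cbv(x, y):
    while x != ZERO:
        x, y = x[1], S(y)
    return y

def to_int(n):
    i = 0
    while n != ZERO:
        i, n = i + 1, n[1]
    return i
```

Note the eagerness: `add_cbv` traverses its whole first argument before anything else can happen, in contrast with the delayed AddCBN of Chapter 2.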

AddCBN = byname₂ [x, y] → asif match x with
  | 0 → y
  | x′ + 1 → S[AddCBN [x′, y]]

which is unsugared to:

AddCBN = byname₂ [x, y] → asif ΛΠ^{0,1}_{byval []() → return y, byval [](x′) → return C^1_{2/2}[AddCBN [x′, y]]} x

FibCBV = byval [x]() → return match x with
  | 0 → 0
  | 1 + y → match y with
      | 0 → 1
      | 1 + y′ → AddCBV [FibCBV [y](), FibCBV [y′]()]

FibCBN = byname₂ [x] → asif match x with
  | 0 → 0
  | 1 + y → match y with
      | 0 → 1
      | 1 + y′ → AddCBN [FibCBN [y], FibCBN [y′]]

ListMult = byval [l]() → return
  let return = cc in
  A [l](return[0])

A = byval [l](abort) → return match l with
  | [] → 1
  | x :: xs → match x with
      | 0 → ΛΠ^ε_ε abort
      | 1 + x′ → Mult[x, A[xs](abort)]()

3.3 Using data types in ML+

Every data type in ML+ is a sum-of-products type.

T = (T₁¹ × ⋯ × T₁^{k₁}) + ⋯ + (Tₙ¹ × ⋯ × Tₙ^{kₙ})

The k-ary constructor of case i, out of n cases, is denoted C^k_{i/n}, such that elements of T are denoted C^k_{i/n}[v₁, ⋯, vₖ].

Deconstructing such elements happens with the 'lambdapi' operator, written as the superposition of a lambda and a pi symbol: ΛΠ. For each case i of type T, we must supply a deconstructor fᵢ; then

ΛΠ^{l₁,⋯,lₙ}_{f₁,⋯,fₙ} C^k_{i/n}[v₁, ⋯, vₖ] =_β fᵢ(v₁, ⋯, vₖ)

Example: empty type The empty type ⊥ has zero cases: the empty sum. Although it has no constructors, it is possible to write a call-by-name function that returns an empty type. Such a function must diverge (loop infinitely or throw an exception). Hence, we can use ΛΠ^ε_ε e to model 'control flow to e' from which control does not return.
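The evaluation contract of ΛΠ can be stated operationally: a constructor value carries its case index i (out of n) and its arguments, and the deconstructor dispatches to the i'th supplied function, halting when the case count or the arity does not match. A small illustrative sketch (names are our own; Python functions stand in for the ΛΠ^{l₁,⋯,lₙ}_{f₁,⋯,fₙ} notation):

```python
# A value of a sum-of-products type: constructor index i out of n cases,
# together with a tuple of k arguments.
def con(i, n, args):
    return ("C", i, n, tuple(args))

# The lambdapi deconstructor: one function per case, plus the expected
# arity l_i of each case. Evaluation halts (here: an exception) when
# the case count or the arity does not match.
def lambdapi(arities, fs, v):
    _tag, i, n, args = v
    if len(fs) != n or arities[i - 1] != len(args):
        raise ValueError("evaluation halts: case/arity mismatch")
    return fs[i - 1](*args)
```

With booleans encoded as con(1, 2, []) and con(2, 2, []), `lambdapi([0, 0], [f1, f2], v)` behaves like ΛΠ^{0,0}_{f₁,f₂} v in the boolean example below.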

Example: unit type The unit type 1 has one case: the empty product. It has constructor C^0_{1/1}, and its only value is C^0_{1/1}[]. It can be trivially deconstructed with

ΛΠ^0_{f₁} C^0_{1/1}[] = f₁()

Example: booleans The set of booleans is the sum of two empty products: B = 1̄ + 1̄. Its constructors are C^0_{1/2} and C^0_{2/2}, and its values are True = C^0_{1/2}[] and False = C^0_{2/2}[]. We can implement the ternary if-then-else operator as follows:

if b then f₁() else f₂()  ≜  ΛΠ^{0,0}_{f₁,f₂} b

The rules allow to deconstruct that to

ΛΠ^{0,0}_{f₁,f₂} True = f₁()
ΛΠ^{0,0}_{f₁,f₂} False = f₂()

We write B = 1̄ + 1̄ (with 1̄ the empty product) instead of B = 1 + 1 to avoid the implication that the booleans are a sum-of-products-of-sum-of-products type (namely unary products of the unit type 1).

Example: natural numbers The naturals can be typed N = 1̄ + N: the sum of an empty and a unary product. Its constructors are therefore C^0_{1/2} and C^1_{2/2}, and its values Zero = C^0_{1/2}[] and S[n] = C^1_{2/2}[n] (for n a natural number). We can implement a basic pattern match as follows:

case n of
  | 0 → f₁()
  | 1 + n′ → f₂(n′)
≜ ΛΠ^{0,1}_{f₁,f₂} n

The rules allow to deconstruct that to

ΛΠ^{0,1}_{f₁,f₂} Zero = f₁()
ΛΠ^{0,1}_{f₁,f₂} S[n′] = f₂(n′)

Example: lists of natural numbers Cons-lists can be typed List_N = 1̄ + N × List_N: the sum of an empty and a binary product. Its constructors are therefore C^0_{1/2} and C^2_{2/2}, and its values Nil = C^0_{1/2}[] and Cons[n, l] = C^2_{2/2}[n, l] (for n ∈ N, l ∈ List_N). We can implement a basic pattern match as follows:

case l of
  | Nil → f₁()
  | Cons[n, l′] → f₂(n, l′)
≜ ΛΠ^{0,2}_{f₁,f₂} l

The rules allow to deconstruct that to

ΛΠ^{0,2}_{f₁,f₂} Nil = f₁()
ΛΠ^{0,2}_{f₁,f₂} Cons[n, l] = f₂(n, l)

Remark 37. True = Zero = Nil, using these encodings. This applies also to the data representation that we have given in Section 2.5.

3.4 Reduction semantics of ML+

For the semantics, it is necessary to define what a value is in ML+.

v ::= C^k_{i/n}                                   (constructor)
   | x                                            (bound variable)
   | f                                            (global function)
   | v[v₁, ⋯, vₖ]                                 (augmentation with k arguments, if the function has arity ≥ k)
   | byval [x₁, ⋯, xₖ](y₁, ⋯, y_{k′}) → return e  (inline CBV function)
   | bynameₙ [x₁, ⋯, xₖ] → asif e                 (inline CBN function for type of n cases)

Values in ML+ don't reduce. All the expressions that are not values are nonvalues, and they are reducible. Reduction happens within a context C{}: an 'expression with a hole'; this hole is in the spot where reduction happens. The set Context is generated as follows:

C ::= {}                                          (empty context)
   | C[e₁, ⋯, eₖ]                                 (augmentation, 0 ≤ k)
   | v[v₁, ⋯, vⱼ, C, e_{j+2}, ⋯, eₖ]              (augmentation, 0 ≤ j < k)
   | C(e₁, ⋯, eₖ)                                 (application, 0 ≤ k)
   | v(v₁, ⋯, vⱼ, C, e_{j+2}, ⋯, eₖ)              (application, 0 ≤ j < k)
   | ΛΠ^{l₁,⋯,lₙ}_{e₁,⋯,eₙ} C                     (deconstructor, 0 ≤ n)
   | let x = C in e                               (local binding)

and, of course, the domain-specific syntactic sugar. Reduction will happen according to our definitions ∆, which is a set of definitions of the form

Def ::= f = byval [x₁, ⋯, xₖ](y₁, ⋯, y_{k′}) → return e   (CBV function definition)
     | f = bynameₙ [x₁, ⋯, xₖ] → asif e                   (CBN function definition),

as defined before. The same variable f shall not appear more than once on the left-hand side of an equals sign. Right-hand sides may refer to any function symbol, even their 'own' function symbol. Using this, we can define our reduction semantics. We shall let e, eᵢ denote expressions, C{} a context, and v, vᵢ values.

  "f = byval [x₁, ⋯, xₖ](y₁, ⋯, y_{k′}) → return e" ∈ ∆    where x⃗′, y⃗′ are fresh variables
  ∆ ⊢ C{f} →_β C{byval [x′₁, ⋯, x′ₖ](y′₁, ⋯, y′_{k′}) → return e[x⃗′/x⃗, y⃗′/y⃗]}

  "f = bynameₙ [x₁, ⋯, xₖ] → asif e" ∈ ∆    where x⃗′ are fresh variables
  ∆ ⊢ C{f} →_β C{bynameₙ [x′₁, ⋯, x′ₖ] → asif e[x⃗′/x⃗]}

  ∆ ⊢ C{v[v₁, ⋯, vⱼ][v_{j+1}, ⋯, vₖ]} →_β C{v[v₁, ⋯, vₖ]}

  ∆ ⊢ C{let x = v in e} →_β C{e[v/x]}

  ∆ ⊢ C{(byval [x₁, ⋯, xₖ](y₁, ⋯, y_{k′}) → return e)[v₁, ⋯, vₖ](u₁, ⋯, u_{k′})} →_β C{e[v⃗/x⃗, u⃗/y⃗]}

  ∆ ⊢ C{ΛΠ^{l₁,⋯,lₙ}_{e₁,⋯,eₙ} ((bynameₙ [x₁, ⋯, xⱼ] → asif e)[v₁, ⋯, vⱼ])} →_β C{ΛΠ^{l₁,⋯,lₙ}_{e₁,⋯,eₙ} e[v⃗/x⃗]}

  ∆ ⊢ C{ΛΠ^{l₁,⋯,lₙ}_{e₁,⋯,eₙ} C^{lᵢ}_{i/n}[v₁, ⋯, v_{lᵢ}]} →_β C{eᵢ[](v₁, ⋯, v_{lᵢ})}

  C ≠ {}
  ∆ ⊢ C{ΛΠ^ε_ε e} →_β ΛΠ^ε_ε e    (ΛΠ^ε_ε e can never return)

  ∆ ⊢ C{let x = cc in e} →_β C{let x = (byname₀ [y] → asif ΛΠ^ε_ε C{y}) in e}    (bind continuation)

Note that we don't have a reduction rule for variables x, because we substitute in their values at application.

3.5 Translation

Our translation will consist of a number of steps. Firstly, we identify a set ℐ of 'inert' ML+ terms, which have a direct translation into CC terms, without CC rules. Secondly, we will identify a set of 'standard' ML+ terms, which include the 'inert' terms, but whose translation adds one rule. Thirdly, we will explain how we can transform the rest of ML+ into such 'standard' ML+ terms.

3.5.1 Inert terms

We identify a set ℐ of the 'inert' ML+ expressions. All such ML+ expressions are guaranteed to terminate. The CC translation ⟨e⟩ of an expression e ∈ ℐ represents the ML+ value e, and the only rules involved are the standard data constructors. Set ℐ is generated as follows.

ℐ ::= C^k_{i/n}                                      (general constructor)
   | x                                               (local variable)
   | f                                               (defined function)
   | ℐ[ℐ, ⋯, ℐ]                                      (augmentation)
   | byval [x₁, ⋯, xₖ](y₁, ⋯, y_{k′}) → return Expr  (inline CBV function)
   | bynameₙ [x₁, ⋯, xₖ] → asif Expr                 (inline CBN function)

Syntactic sugar:
   | true | false                                    (boolean constructors)
   | 0 | 1 + ℐ                                       (number constructors)
   | [] | ℐ :: ℐ                                     (list constructors)

Remark 38. The function ⟨·⟩ : ℐ → CC defined here is an extension of the function ⟨·⟩ defined in our paper, to appear in the proceedings of COS 2013.

⟨x⟩ = x′    where x′ is a CC variable corresponding to ML+ variable x
⟨f⟩ = F     where F is a CC name corresponding to f, and the program shall include ⌈f⌉, defined later
⟨C^k_{i/n}⟩ = C_{i,n,k}    and the program shall include
    C_{i,n,k}.x₁.⋯.xₖ.f₁.⋯.fₙ → fᵢ.x₁.⋯.xₖ
⟨e[e₁, ⋯, eₖ]⟩ = ⟨e⟩.⟨e₁⟩.⋯.⟨eₖ⟩
⟨t⟩ = F.z₁.⋯.zⱼ    if t = byval [x₁, ⋯, xₖ](y₁, ⋯, y_{k′}) → return t′, z⃗ = fv(t), F is a fresh CC name, and the program shall include

F.z₁.⋯.zⱼ.x₁.⋯.xₖ.r.y₁.⋯.y_{k′} → t̄′

⟨t⟩ = F.z₁.⋯.zⱼ    if t = bynameₙ [x₁, ⋯, xₖ] → asif t′, z⃗ = fv(t), F is a fresh CC name, and the program shall include

F.z₁.⋯.zⱼ.x₁.⋯.xₖ.r₁.⋯.rₙ → t̲′ₙ

Syntactic sugar: when the program includes

⟨true⟩ = True                True.t.f → t
⟨false⟩ = False              False.t.f → f
⟨0⟩ = Zero                   Zero.z.s → z
⟨1 + e⟩ = S.⟨e⟩              S.m.z.s → s.m
⟨[]⟩ = Nil                   Nil.e.c → e
⟨e₁ :: e₂⟩ = Cons.⟨e₁⟩.⟨e₂⟩  Cons.x.xs.e.c → c.x.xs

3.5.2 Translation to CC

Each ML+ term lives either in a call-by-value context, in case the innermost enclosing binder is byval, or a call-by-name context, in case the innermost enclosing binder is byname. For CBN contexts, we furthermore specify the binder arity k. We give the translation t̄ in CC for ML+ terms in CBV context, and the translation t̲ₙ for terms in n-ary CBN context. This is analogous to Plotkin [25], where M̄ stands for the CPS translation of a 'CBV λ term', and M̲ stands for the CPS translation of a 'CBN λ term'.

The free variables of t̄ will consist of the 'return continuation' variable r, in addition to the free variables of the source term t. The free variables of t̲ₙ will consist of the 'case continuation' variables r₁, ⋯, rₙ, in addition to the free variables of the source term t. Correct execution of the CC term s̄ will require that rules ⌈s⌉ are included in the program. Similarly, correct execution of s̲ₙ depends on rules ⌊s⌋ₙ.

We will give a direct translation for a lot of ML+ terms t. For other terms, we indicate that t is equivalent to a different ML+ term t′ by writing t ⇝ t′. In such cases, we define the translations t̄, t̲ₙ as the translation of the rewritten term. Note that we only have to execute these rewritings on the top level. In the following translation scheme, assume that s, sⱼ, t, tⱼ are ML+ terms, and i, iⱼ ∈ ℐ. With n, nⱼ we indicate terms that are not inert.

Call-by-value

Standard cases:

s = i:                           s̄ = r.⟨i⟩
                                 ⌈s⌉ = ∅
s = i(i₁, ⋯, iₖ):                s̄ = ⟨i⟩.⟨i₁⟩.⋯.⟨iₖ⟩.r
                                 ⌈s⌉ = ∅
s = ΛΠ^{l₁,⋯,lₖ}_{i₁,⋯,iₖ} i:    s̄ = ⟨i⟩.(⟨i₁⟩.r).⋯.(⟨iₖ⟩.r)
                                 ⌈s⌉ = ∅
s = let x = n in t:              s̄ = n̄[(F.y₁.⋯.yⱼ.r)/r]    for F fresh, y⃗ = fv(t) \ {x}
                                 ⌈s⌉ = ⌈n⌉ ∪ ⌈t⌉ ∪ {F.y⃗.r.x → t̄}
s = let x = cc in t:             s̄ = t̄[r/x]
                                 ⌈s⌉ = ⌈t⌉

For all t ⇝ t′ below, we define ⟨t⟩ = ⟨t′⟩ and ⌈t⌉ = ⌈t′⌉.

    let x = i(i1, ⋯, ik) in t       ⇝  t[i(i1, ⋯, ik)/x]
    n(t1, ⋯, tk)                    ⇝  let x = n in x(t1, ⋯, tk)
    i(i1, ⋯, ij, n, tj+2, ⋯, tk)    ⇝  let x = n in i(i1, ⋯, ij, x, tj+2, ⋯, tk)
    n[t1, ⋯, tk]                    ⇝  let x = n in x[t1, ⋯, tk]
    i[i1, ⋯, ij, n, tj+2, ⋯, tk]    ⇝  let x = n in i[i1, ⋯, ij, x, tj+2, ⋯, tk]

Call-by-name

Standard patterns:

    s = i                            ⟨s⟩ₙ = ⟨i⟩.r1.⋯.rn
                                     ⌊s⌋ₙ = ∅

    s = let x = i(i1, ⋯, ik) in e    ⟨s⟩ₙ = ⟨i⟩.⟨i1⟩.⋯.⟨ik⟩.(G.r1.⋯.rn.y⃗)    for G fresh, y⃗ = fv(e) ∖ {x}
                                     ⌊s⌋ₙ = {G.r1.⋯.rn.y⃗.x −→ ⟨e⟩ₙ}

    s = ΛΠ^{l1,⋯,lm}_{i1,⋯,im} i     ⟨s⟩ₙ = (⟨i⟩.(⟨i1⟩.r).⋯.(⟨im⟩.r))[Invₙ.r1.⋯.rn/r]
                                     ⌊s⌋ₙ = {Invₙ.r1.⋯.rn.x −→ x.r1.⋯.rn}

    s = let x = cc in e              ⟨s⟩ₙ = ⟨e⟩ₙ[(Invₙ.r1.⋯.rn)/x]
                                     ⌊s⌋ₙ = ⌊e⌋ₙ ∪ {Invₙ.r1.⋯.rn.y −→ y.r1.⋯.rn}

For all t ⇝ t′ below, we define ⟨t⟩ₙ = ⟨t′⟩ₙ and ⌊t⌋ₙ = ⌊t′⌋ₙ.

    let x = i in t                              ⇝  t[i/x]
    t(t1, ⋯, tk)                                ⇝  let x = t(t1, ⋯, tk) in x
    t[t1, ⋯, tk]                                ⇝  let x = t[t1, ⋯, tk] in x      unless t and all ti are inert
    let x = n(t1, ⋯, tk) in t                   ⇝  let y = n in let x = y(t1, ⋯, tk) in t
    let x = i(i1, ⋯, ij, n, tj+2, ⋯, tk) in t   ⇝  let y = n in let x = i(i1, ⋯, ij, y, tj+2, ⋯, tk) in t
    n[t1, ⋯, tk]                                ⇝  let x = n in x[t1, ⋯, tk]
    i[i1, ⋯, ij, n, tj+2, ⋯, tk]                ⇝  let x = n in i[i1, ⋯, ij, x, tj+2, ⋯, tk]

Either call-by-value or call-by-name

For all t ⇝ t′ below, we define ⟨t⟩ = ⟨t′⟩, ⌈t⌉ = ⌈t′⌉, ⟨t⟩ₙ = ⟨t′⟩ₙ, and ⌊t⌋ₙ = ⌊t′⌋ₙ.

    let x = (let y = t in t′) in t″             ⇝  let y = t in let x = t′ in t″
    ΛΠ^{l1,⋯,ln}_{e1,⋯,en} n                    ⇝  let x = n in ΛΠ^{l1,⋯,ln}_{e1,⋯,en} x
    ΛΠ^{l1,⋯,lm}_{e1,⋯,ej−1, n, ej+1,⋯,em} t    ⇝  ΛΠ^{l1,⋯,lm}_{e1,⋯,ej−1, (byval [](x1,⋯,x_{lj}) → return n(x1,⋯,x_{lj})), ej+1,⋯,em} t

Conjecture 39. The rewrites ⇝ terminate.

The proof is left for future research. We also leave for future research an equivalence on ML+ terms: it is hoped that a natural equivalence on ML+ terms t and u coincides with observational equivalence on the CC translations of t and u. We finally leave for future research the following conjecture.

Conjecture 40. An ML+ term in normal form with respect to ⇝, under either call-by-value or call-by-name, is translated to a unique CC term and program. Conversely, all ML+ terms that are translatable to CC are in normal form with respect to ⇝.

Chapter 4

Relation to lambda calculus

As explained in Chapter 1, lambda calculus partially fulfills the same goals as continuation calculus: it models functional programs, supports reasoning about them, and is very simple. Continuation calculus is more “powerful” in a sense, because continuations are expressible in pure continuation calculus. In lambda calculus, one can only use continuations if all other code is also CPS transformed. As it happens, continuation calculus is quite similar to a subset of lambda calculus: the terms that are reducible using a novel β0-reduction. This reduction is equivalent to a series of n β-reductions in a row, but β0 is only applicable in specific situations: when the n β-reductions can be done “simultaneously” on the top level. If we regard only β0-reduction, then it suddenly becomes possible to use continuations again, because there are no enclosing terms that an inner term cannot “cancel”. Such cancellation can also be done in λC by the C or A operator [12], or by invoking a continuation; it cannot be done in vanilla lambda calculus.

We call λ0 the set of lambda terms on which β0 reduction simulates call-by-name reduction. Continuation calculus is much like λ0, but is slightly stricter, and defunctionalized. Every code point (lambda abstraction) in CC is given a name with a fixed arity. This makes explicit the control points and the data flow between control points. Continuation calculus is slightly more explicit in that it disambiguates forms such as λx.λy.M: is that a function which is supposed to take two arguments, or a one-argument function that happened to get a function substituted into its body? In the second case, if the function is called with two arguments, it is easier for the programmer to spot where the program went wrong. Furthermore, one can check syntactically whether a term is a CC term. This is not the case for λ0. As it happens, when we simulate lambda calculus in continuation calculus, we can recognize these differences between CC and λ in the steps we take.
Firstly, we encode the chosen reduction strategy (by-name or by-value) in the term with the CPS transformation. Secondly, we make data flow between the control points explicit using the supercombinator transformation. Thirdly, we make control flow more explicit using defunctionalization. These steps are summarized in Figure 4.1; we indicate the term sets in which we can program with continuations. We explain the transformations from λ to CC and from CC to λ in more detail in Sections 4.1 and 4.2, respectively.

4.1 Embedding lambda calculus in continuation calculus

In this section, we explain how to simulate a lambda term in CC, mimicking either the call-by-name strategy or the call-by-value strategy. Our approach will use a subset λ0 of lambda terms, on which a new β0 reduction is confluent with the call-by-name reduction. The approach will consist of three steps:

1. CPS transformation. We use a continuation passing style transformation to transform a lambda term M to a certain subset λ0 of lambda terms, described below. We use M̄ if we want to mimic call-by-name lambda calculus, and M̲ if we want to mimic call-by-value lambda calculus. Plotkin [25] proves that M̄ resp. M̲ simulates the execution of M in call-by-name resp. call-by-value. He proves this for slightly different definitions of M̄ and M̲, which are βη-convertible to our definitions.

2. Supercombinator transformation. The resulting lambda term is transformed to a supercombinator. (The term supercombinator will be explained shortly.) This step is a β expansion.

3. Defunctionalization. Each lambda abstraction is replaced by a name. The lambda applications are changed to dot-applications. We make a program with a rule for every lambda abstraction. Defunctionalization is closely related to lambda lifting [8]: both transform a block-structured program to recursive equations.

[Figure 4.1 diagram: ellipses for λC ⊇ λ ⊇ λ0, supercombinators, and CC; dashed arrows for the CPS transformation (λC → λ0), the supercombinator transformation (λ0 → λ0 ∩ SC), defunctionalization (λ0 ∩ SC → acyclic CC), elimination of cyclic dependencies (CC → acyclic CC), and functionalization (acyclic CC → λ0 ∩ supercombinator).]

Figure 4.1: Relation between lambda calculus, lambda calculus with control, and continuation calculus (CC). The ellipses form a Venn diagram of term properties. The five dashed arrows indicate transformations from one circle to another, and are described in subsections. For example, the CPS transformation takes a term in λC, which may also be in λ or be a supercombinator, and the result of the transformation is a λ0 term, which may again be a supercombinator. The shaded areas are the terms where we can program with continuations.

We first give the subset λ0 of lambda terms, and then we detail the three steps.

4.1.1 The subset λ0

We shall use a variant of lambda calculus as an intermediate language, which we shall call λ0. The terms of λ0 shall be a subset of the terms of lambda calculus. Reduction will be different: there shall only be the top-level rule β0, which is new.

Definition 41 (β0-reduction). For lambda terms M, N1, ⋯, Nn and n ≥ 1,

    (λx1⋯xn.M) N1 ⋯ Nn  →β0  M[N⃗/x⃗]    at the top level

Note that the call-by-name beta rule of lambda calculus simulates β0: whenever M →β0 N, then M reduces to N by βN in n steps. The reverse is not true: β0 does not always simulate β-reduction. An application of β0 will correspond to one reduction step in continuation calculus.
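To make the contrast concrete, the following Python sketch implements β0 and βN as top-level step functions on a tuple encoding of λ terms, ('var', x) | ('lam', x, body) | ('app', f, a). The encoding and the naive substitution, which assumes all binder names are distinct, are our own illustrative choices, not part of the calculus.

```python
# Tuple-encoded lambda terms: ('var', x) | ('lam', x, body) | ('app', f, a).
# Naive simultaneous substitution; assumes all binder names are distinct,
# so no capture-avoidance is needed (an illustrative simplification).
def subst(term, env):
    tag = term[0]
    if tag == 'var':
        return env.get(term[1], term)
    if tag == 'lam':
        inner = {k: v for k, v in env.items() if k != term[1]}
        return ('lam', term[1], subst(term[2], inner))
    return ('app', subst(term[1], env), subst(term[2], env))

def spine(term):
    """Split a top-level application spine into (head, [N1, ..., Nn])."""
    args = []
    while term[0] == 'app':
        args.append(term[2])
        term = term[1]
    return term, args[::-1]

def beta0_step(term):
    """(lam x1...xn. M) N1 ... Nn  ->  M[N/x]  (n >= 1), at the top level.
    Returns None when beta0 is stuck: fewer leading lambdas than arguments."""
    head, args = spine(term)
    xs = []
    while head[0] == 'lam' and len(xs) < len(args):
        xs.append(head[1])
        head = head[2]
    if not args or len(xs) < len(args):
        return None
    return subst(head, dict(zip(xs, args)))   # all n substitutions at once

def betaN_step(term):
    """(lam x. M) N1 N2 ... Nn  ->  M[N1/x] N2 ... Nn, at the top level."""
    head, args = spine(term)
    if not args or head[0] != 'lam':
        return None
    out = subst(head[2], {head[1]: args[0]})
    for a in args[1:]:
        out = ('app', out, a)
    return out
```

On (λx.λy.x) A B, beta0_step substitutes both arguments simultaneously, while betaN_step needs two steps; on (λx.x) A B, beta0_step is stuck although betaN still makes progress, matching the nonconfluent shapes discussed below.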

Remark 42. Substitution in a lambda term replaces only the free variables. (This is a common definition [26].)

We formally define βN-reduction, which is our name for β-reduction in the call-by-name strategy.

Definition 43 (βN-reduction). For lambda terms M, N1, ⋯, Nn and n ≥ 1,

    (λx.M) N1 N2 ⋯ Nn  →βN  M[N1/x] N2 ⋯ Nn    at the top level

Of course, we may wonder if the β0 and βN reductions are always confluent. We know that if M →βN N and M →β0 N′, then N ↠βN N′. So if both reductions are possible, then they have a common reduct. The reductions are thus nonconfluent exactly when βN makes progress but β0 fails to. That is the case exactly for these terms:

    (λx1⋯xm.M N) t1 ⋯ tn  ↛β0    (m < n), and
    (λx1⋯xm.xi) t1 ⋯ tn   ↛β0    (m < n)

We define λ0 as the set of terms on which β0 and βN are confluent.

Definition 44 (λ0). The subset λ0 of lambda terms consists of those lambda terms M such that whenever M ↠β0 N in 0 or more steps, and N →βN N′, then there exists N″ such that N →β0 N″ and N′ ↠βN N″.

Now because β0 and βN reduction are deterministic, the β0 and βN reductions are confluent on terms in λ0. This is provable by induction.

Proving inhabitance of λ0. We can find some terms in λ0 using a typing. Assume some opaque type ⊥, that is: it shall be underivable that Γ, M : ⊥ ⊢ M N : τ for some τ. Now consider lambda terms M such that all subterms λx1⋯xm.N, with N not an abstraction, are of some type τ1 → ⋯ → τm → ⊥. Then β reduction, and in particular β0 and βN reduction, preserves this property. Furthermore, we can never encounter a well-typed term of the form (λx1⋯xm.N) t1 ⋯ tn, for N not an abstraction and m < n, because (λx1⋯xm.N) t1 ⋯ tm : ⊥ and (λx1⋯xm.N) t1 ⋯ tm+1 : τ would be in violation of the opacity of ⊥. Note that we did not specify a concrete ⊥ here on purpose. Any conventional type system in which ⊥ is opaque, and which preserves types over β reduction, suffices.

We now have the sufficient machinery to define each of the three steps to simulate lambda calculus using continuation calculus.

4.1.2 CPS transformation

The first step of converting λ terms to CC is to perform a continuation passing style (CPS) transformation. We give two transformations: ·̄, to simulate λ terms as by call-by-name, and ·̲, to simulate λ terms as by call-by-value. The result is a term in λ0 that simulates the original λ term. We prove inhabitance of λ0, but leave the simulation proofs to future work. The simulation proofs are expected to take a form that is very similar to the proof in Plotkin [25].

Call-by-name. We define our call-by-name CPS transformation, similar to that of Plotkin, as a function ·̄ : λ → λ defined as follows:

    x̄       = λk.x k
    (λx.M)¯ = λk.k (λxκ.M̄ κ)
    (M N)¯  = λk.M̄ (λm.m N̄ k)

For each application of this function, all k, κ, m, a, b, c should be taken fresh. This CPS transformation is η-convertible to Plotkin’s transformation for variables, abstraction, and application.

Remark 45. For a complete transformation from λC to λ0, the C operator needs a transformation. The author believes that a correct transformation would be C̄ = λk.k (λaκ.a (λbκ.κ b) (λc.c)). However, he was unable to develop this belief so far, for instance by working out examples, by a typing as seen below, by similarity to C as researched by others, or by a proof.

Theorem 46. The image of ·̄ is in λ0.

In the proof, we type the terms using recursive typing [4]. Assume a base type ⊥, and a recursive type Ψ = ¬¬(Ψ → Ψ); ¬τ stands for τ → ⊥. Then we claim that M̄ : Ψ for all closed M. Furthermore, we claim that all the lambda abstractions in the image are, in fact, in λ0.

Proof. We will use a typing context Γ(M) = {x : Ψ | x ∈ fv(M)}, that is, a context in which all free variables of M have type Ψ.

The proof goes by induction on the size of M. We use the following induction hypothesis: Γ(M) ⊢ M̄ : Ψ ∧ M̄ ∈ λ0 ∧ all abstractions inside M̄ are in λ0. This implies that ⊢ M̄ : Ψ for closed M.

• Base case.

      Γ(x), k : ¬(Ψ → Ψ) ⊢ x : Ψ                     by definition of Γ(x)
    ⇒ Γ(x), k : ¬(Ψ → Ψ) ⊢ x : ¬(Ψ → Ψ) → ⊥          by definition of Ψ
    ⇒ Γ(x), k : ¬(Ψ → Ψ) ⊢ x k : ⊥
    ⇒ Γ(x) ⊢ λk.x k : ¬(Ψ → Ψ) → ⊥
    ⇒ Γ(x) ⊢ λk.x k : Ψ                               by definition of Ψ

  We see that the abstraction x̄ = λk.x k is of some type τ → ⊥, so x̄ ∈ λ0.

• Inductive case 1: λx.M.

      Γ(λx.M), x : Ψ ⊢ M̄ : Ψ                         induction hypothesis, Γ(M) = Γ(λx.M) ∪ {x : Ψ}
    ⇒ Γ(λx.M), x : Ψ ⊢ M̄ : ¬¬(Ψ → Ψ)                 by definition of Ψ
    ⇒ Γ(λx.M), x : Ψ, κ : ¬(Ψ → Ψ) ⊢ M̄ κ : ⊥
    ⇒ Γ(λx.M) ⊢ λxκ.M̄ κ : Ψ → ¬(Ψ → Ψ) → ⊥
    ⇒ Γ(λx.M) ⊢ λxκ.M̄ κ : Ψ → Ψ                      by definition of Ψ
    ⇒ Γ(λx.M), k : ¬(Ψ → Ψ) ⊢ k (λxκ.M̄ κ) : ⊥
    ⇒ Γ(λx.M) ⊢ λk.k (λxκ.M̄ κ) : ¬¬(Ψ → Ψ)
    ⇒ Γ(λx.M) ⊢ λk.k (λxκ.M̄ κ) : Ψ                   by definition of Ψ

  We have seen that λxκ.M̄ κ : τ1 → τ2 → ⊥ and λk.k (λxκ.M̄ κ) : τ3 → ⊥ for some τ⃗. These type judgements, together with the induction hypothesis, show that (λx.M)¯ as well as all contained abstractions are in λ0.

• Inductive case 2: M N.

      Γ(M) ⊢ M̄ : Ψ                                    induction hypothesis
      Γ(N) ⊢ N̄ : Ψ                                    induction hypothesis
    ⇒ Γ(N), k : ¬(Ψ → Ψ), m : Ψ → Ψ ⊢ m N̄ : Ψ = ¬¬(Ψ → Ψ)
    ⇒ Γ(N), k : ¬(Ψ → Ψ), m : Ψ → Ψ ⊢ m N̄ k : ⊥
    ⇒ Γ(N), k : ¬(Ψ → Ψ) ⊢ λm.m N̄ k : ¬(Ψ → Ψ)
    ⇒ Γ(M N), k : ¬(Ψ → Ψ) ⊢ M̄ (λm.m N̄ k) : ⊥        Γ(M N) = Γ(M) ∪ Γ(N)
    ⇒ Γ(M N) ⊢ λk.M̄ (λm.m N̄ k) : ¬¬(Ψ → Ψ)
    ⇒ Γ(M N) ⊢ λk.M̄ (λm.m N̄ k) : Ψ                   by definition of Ψ

  We have seen that λm.m N̄ k : τ1 → ⊥ and λk.M̄ (λm.m N̄ k) : τ2 → ⊥ for some τ⃗. These type judgements, together with the induction hypothesis, show that (M N)¯ as well as all contained abstractions are in λ0. ∎
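The call-by-name CPS transformation above can be sketched in Python on a tuple encoding of λ terms; the encoding and the counter-based fresh-name generator are our own illustrative choices.

```python
# Tuple-encoded lambda terms: ('var', x) | ('lam', x, body) | ('app', f, a).
counter = 0
def fresh(base):
    global counter
    counter += 1
    return f'{base}{counter}'

def cbn(term):
    """Call-by-name CPS transformation:
         x     -> lam k. x k
         lx.M  -> lam k. k (lam x. lam kap. cbn(M) kap)
         M N   -> lam k. cbn(M) (lam m. m cbn(N) k)"""
    tag = term[0]
    if tag == 'var':
        k = fresh('k')
        return ('lam', k, ('app', term, ('var', k)))
    if tag == 'lam':
        k, kap = fresh('k'), fresh('kap')
        inner = ('lam', term[1], ('lam', kap, ('app', cbn(term[2]), ('var', kap))))
        return ('lam', k, ('app', ('var', k), inner))
    k, m = fresh('k'), fresh('m')
    body = ('app', cbn(term[1]),
            ('lam', m, ('app', ('app', ('var', m), cbn(term[2])), ('var', k))))
    return ('lam', k, body)

def fv(term):
    """Free variables of a tuple-encoded lambda term."""
    tag = term[0]
    if tag == 'var':
        return {term[1]}
    if tag == 'lam':
        return fv(term[2]) - {term[1]}
    return fv(term[1]) | fv(term[2])
```

The image of a closed term stays closed, and every output is an abstraction λk.…, which is consistent with the claim that M̄ : Ψ for closed M.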

Call-by-value. After the call-by-name CPS translation in the previous paragraph, we give a similar CPS translation ·̲ : λ → λ that simulates λ terms as call-by-value.

    x̲       = λk.k x
    (λx.M)̲ = λk.k (λxκ.M̲ κ)
    (M N)̲  = λk.M̲ (λm.N̲ (λn.m n k))

For each application of this function, all k, κ, m, n should be taken fresh. This CPS transformation is η-convertible to Plotkin’s CPS transformation for call-by-value, for variables, abstraction, and application.

Remark 47. The author has not found a concrete definition of C̲ yet that is compatible with the next theorem. Such a definition of C̲ is necessary to make ·̲ a transformation from λC to λ0. However, the author is convinced of its existence through similar results by others [12].

Theorem 48. The image of ·̲ is in λ0.

The proof is similar to the proof of Theorem 46. We type the terms using recursive typing. Assume a base type ⊥, and a recursive type Φ = Φ → ¬¬Φ. Then we claim M̲ : ¬¬Φ for all closed M. Furthermore, we claim that all the lambda abstractions in the image are in λ0.

Proof. We will use a typing context Γ(M) = {x : Φ | x ∈ fv(M)}: all free variables of M have type Φ.

The proof goes by induction on the size of M. We use the following induction hypothesis: Γ(M) ⊢ M̲ : ¬¬Φ ∧ M̲ ∈ λ0 ∧ all abstractions inside M̲ are in λ0. This implies that ⊢ M̲ : ¬¬Φ for closed M.

• Base case.

      Γ(x), k : ¬Φ ⊢ k x : ⊥
    ⇒ Γ(x) ⊢ λk.k x : ¬¬Φ

  We have seen that x̲ = λk.k x is of some type τ → ⊥, so x̲ ∈ λ0.

• Inductive case 1: λx.M.

      Γ(λx.M), x : Φ ⊢ M̲ : ¬¬Φ                       induction hypothesis, Γ(M) = Γ(λx.M) ∪ {x : Φ}
    ⇒ Γ(λx.M), x : Φ, κ : ¬Φ ⊢ M̲ κ : ⊥
    ⇒ Γ(λx.M) ⊢ λxκ.M̲ κ : Φ → ¬Φ → ⊥
    ⇒ Γ(λx.M) ⊢ λxκ.M̲ κ : Φ                          by definition of Φ
    ⇒ Γ(λx.M), k : ¬Φ ⊢ k (λxκ.M̲ κ) : ⊥
    ⇒ Γ(λx.M) ⊢ λk.k (λxκ.M̲ κ) : ¬¬Φ

  We have seen that (λx.M)̲ = λk.k (λxκ.M̲ κ) : τ1 → ⊥ and λxκ.M̲ κ : τ2 → τ3 → ⊥ for some τ⃗. These type judgements, together with the induction hypothesis, show that (λx.M)̲ as well as all contained abstractions are in λ0.

• Inductive case 2: M N.

      Γ(M) ⊢ M̲ : ¬¬Φ                                  induction hypothesis
      Γ(N) ⊢ N̲ : ¬¬Φ                                  induction hypothesis
      m : Φ, n : Φ, k : ¬Φ ⊢ m : Φ → ¬¬Φ              by definition of Φ
    ⇒ m : Φ, n : Φ, k : ¬Φ ⊢ m n : ¬¬Φ
    ⇒ m : Φ, n : Φ, k : ¬Φ ⊢ m n k : ⊥
    ⇒ m : Φ, k : ¬Φ ⊢ λn.m n k : ¬Φ
    ⇒ Γ(N), m : Φ, k : ¬Φ ⊢ N̲ (λn.m n k) : ⊥
    ⇒ Γ(N), k : ¬Φ ⊢ λm.N̲ (λn.m n k) : ¬Φ
    ⇒ Γ(M N), k : ¬Φ ⊢ M̲ (λm.N̲ (λn.m n k)) : ⊥       Γ(M N) = Γ(M) ∪ Γ(N)
    ⇒ Γ(M N) ⊢ λk.M̲ (λm.N̲ (λn.m n k)) : ¬¬Φ

  We have seen three type judgements: (M N)̲ = λk.M̲ (λm.N̲ (λn.m n k)) : τ1 → ⊥, λm.N̲ (λn.m n k) : τ2 → ⊥, and λn.m n k : τ3 → ⊥, for some τ⃗. The three type judgements, together with the induction hypothesis, show that (M N)̲ as well as all contained abstractions are in λ0. ∎
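The call-by-value transformation above can be sketched in the same way; as before, the tuple encoding and the fresh-name counter are our own illustrative choices.

```python
# Tuple-encoded lambda terms: ('var', x) | ('lam', x, body) | ('app', f, a).
counter = 0
def fresh(base):
    global counter
    counter += 1
    return f'{base}{counter}'

def cbv(term):
    """Call-by-value CPS transformation:
         x     -> lam k. k x
         lx.M  -> lam k. k (lam x. lam kap. cbv(M) kap)
         M N   -> lam k. cbv(M) (lam m. cbv(N) (lam n. m n k))"""
    tag = term[0]
    if tag == 'var':
        k = fresh('k')
        return ('lam', k, ('app', ('var', k), term))
    if tag == 'lam':
        k, kap = fresh('k'), fresh('kap')
        inner = ('lam', term[1], ('lam', kap, ('app', cbv(term[2]), ('var', kap))))
        return ('lam', k, ('app', ('var', k), inner))
    k, m, n = fresh('k'), fresh('m'), fresh('n')
    apply_mn = ('lam', n, ('app', ('app', ('var', m), ('var', n)), ('var', k)))
    body = ('app', cbv(term[1]), ('lam', m, ('app', cbv(term[2]), apply_mn)))
    return ('lam', k, body)

def fv(term):
    """Free variables of a tuple-encoded lambda term."""
    tag = term[0]
    if tag == 'var':
        return {term[1]}
    if tag == 'lam':
        return fv(term[2]) - {term[1]}
    return fv(term[1]) | fv(term[2])
```

The difference from the call-by-name sketch is confined to the variable case (the value x is passed to k directly) and the application case (the argument is evaluated before m is applied).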

4.1.3 Supercombinator transformation

The second step in simulating lambda calculus with continuation calculus is to make data flow explicit, or in λ terms: we make sure that abstractions have no free variables. This notion is captured by supercombinators, which were originally conceived by Hughes [16] as a useful subset of λ terms to perform optimizations on.

Definition 49. A supercombinator S of arity n is a lambda expression of the form

    λx1x2⋯xn.E    (n ≥ 0)

where E is a variable or an application, such that

1. S has no free variables, and
2. all lambda abstractions inside E are supercombinators.

By “lambda abstractions inside E”, we mean the lambda abstractions that are not direct children of other abstractions in E. We rephrase a transformation by Bird [5] that transforms lambda terms to a supercombinator; this supercombinator is a β expansion of the lambda term.

    SC(x)      = x
    SC(M N)    = SC(M) SC(N)
    SC(λx⃗.M)   = (λy⃗.λx⃗.SC(M)) y⃗    where y⃗ are the free variables of λx⃗.M
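Bird's transformation admits a direct sketch on the same kind of tuple encoding, ('var', x) | ('lam', x, body) | ('app', f, a); abstracting the free variables in sorted order is an arbitrary illustrative choice that the scheme above leaves open.

```python
# Tuple-encoded lambda terms: ('var', x) | ('lam', x, body) | ('app', f, a).
def fv(term):
    """Free variables of a tuple-encoded lambda term."""
    tag = term[0]
    if tag == 'var':
        return {term[1]}
    if tag == 'lam':
        return fv(term[2]) - {term[1]}
    return fv(term[1]) | fv(term[2])

def sc(term):
    """Bird's supercombinator transformation:
       SC(x) = x;  SC(M N) = SC(M) SC(N);
       SC(lam xs. M) = (lam ys. lam xs. SC(M)) ys,  ys = fv(lam xs. M)."""
    tag = term[0]
    if tag == 'var':
        return term
    if tag == 'app':
        return ('app', sc(term[1]), sc(term[2]))
    xs, body = [], term
    while body[0] == 'lam':          # lam x~ is the maximal run of binders
        xs.append(body[1])
        body = body[2]
    ys = sorted(fv(term))            # free variables of the whole abstraction
    out = sc(body)
    for x in reversed(xs):
        out = ('lam', x, out)
    for y in reversed(ys):
        out = ('lam', y, out)
    for y in ys:                     # beta-expansion: re-apply the free vars
        out = ('app', out, ('var', y))
    return out
```

For λx. y x this yields (λy.λx. y x) y, which β-reduces back to the original term, and the abstraction inside the result has no free variables.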

Conjecture 50. If M ∈ λ0, then SC(M) ∈ λ0. Furthermore, if M is closed, then SC(M) is too.

4.1.4 Defunctionalization

The resulting lambda term SC(M) after two steps is a supercombinator in λ0. The third and final step in our simulation is to convert SC(M) to a CC program and CC term. This practice is closely related to lambda lifting [8].

The procedure goes as follows. Let A be the set of lambda abstractions within SC(M̄) that are not direct children of another lambda abstraction. By the supercombinator assumption, all such m ∈ A are closed. We generate a name N_m for each m ∈ A. We use the following translation from λ0 terms to CC terms. (From here on, M will refer to any lambda term, not specifically the original input lambda term.)

    ccify(x)         = x
    ccify(M N)       = ccify(M).ccify(N)
    ccify(λx1⋯xn.M)  = N_{λx1⋯xn.M}    if M is not an abstraction

We take the following program:

    P = { N_{λx1⋯xn.M}.x1.⋯.xn −→ ccify(M)  |  λx1⋯xn.M ∈ A and M not an abstraction }

Note that because all terms in A are closed, all CC variables in the right-hand sides also occur in their left-hand sides. Furthermore, because SC(M̄) is a supercombinator, it follows straightforwardly that there are no variables in ccify(SC(M̄)), so ccify(SC(M̄)) is a well-formed CC term. (Recall that CC terms do not contain variables.)
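The translation and the generated program can be sketched as follows. The tuple encoding, the name scheme N0, N1, …, and the use of an explicit variable constructor inside rule right-hand sides are our own illustrative choices; a one-step CC reducer is included to illustrate the simulation.

```python
# CC terms: ('name', N) | ('cvar', x) | ('dot', t, u); rule variables occur
# only inside right-hand sides. Lambda terms: ('var', x) | ('lam', x, body)
# | ('app', f, a).
def ccify(term, rules):
    """Defunctionalize a supercombinator: each abstraction becomes a name,
    lambda application becomes dot-application; 'rules' collects the program."""
    tag = term[0]
    if tag == 'var':
        return ('cvar', term[1])
    if tag == 'app':
        return ('dot', ccify(term[1], rules), ccify(term[2], rules))
    xs, body = [], term
    while body[0] == 'lam':
        xs.append(body[1])
        body = body[2]
    name = f'N{len(rules)}'          # one fresh name per abstraction
    rules[name] = None               # reserve the name before recursing
    rules[name] = (xs, ccify(body, rules))
    return ('name', name)

def cc_step(term, rules):
    """One CC reduction: n.t1...tk -> r[t/x] when there is a rule
    n.x1...xk -> r with exactly k arguments; otherwise the term halts."""
    args, head = [], term
    while head[0] == 'dot':
        args.append(head[2])
        head = head[1]
    args.reverse()
    if head[0] != 'name':
        return None
    xs, rhs = rules[head[1]]
    if len(xs) != len(args):
        return None                  # wrong number of arguments: halt
    env = dict(zip(xs, args))
    def sub(t):
        if t[0] == 'cvar':
            return env.get(t[1], t)
        if t[0] == 'dot':
            return ('dot', sub(t[1]), sub(t[2]))
        return t
    return sub(rhs)
```

On E = (λx.λy.x) (λz.z) (λw.w), ccify produces N0.N1.N2 with rules N0.x.y −→ x, N1.z −→ z, N2.w −→ w; one CC step yields N1 = ccify(λz.z), mirroring E →β0 λz.z.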

Theorem 51. For any supercombinator lambda term E, continuation calculus reduction on ccify(E) closely simulates β0 reduction on E. By “closely simulates”, we mean that for all E, one of two cases applies. Either E →β0 E′ ⟺ ccify(E) →CC ccify(E′), or both ccify(E) ↓CC and E →β0 E′ for some E′ in β0 normal form.

In the first case (bi-implication), we see that CC simulates β0-reduction in a single step. In the second case, both the β0 and CC executions terminate, but the CC execution terminates one step earlier.

Proof. The proof will have the following structure. In the first part, we assume that E →β0 E′ for some E′. We distinguish two cases, and prove either ccify(E) →CC ccify(E′) (and thus the right-implication), or ccify(E) ↓CC ∧ E′ ↓β0 (and thus the right disjunct). In the second part, we assume ccify(E) →CC ccify(E′) for some E′ and prove E →β0 E′. The two parts together suffice to prove the theorem.

So for the first part, let us further examine the ⇒ direction. A term in λ0 reduces using β0 if the term is of the following shape:

    E = (λx1⋯xn.M) N1 ⋯ Nn  →β0  E′ = M[N⃗/x⃗]    (n ≥ 1)

We distinguish two cases.

Case 1. M = λx_{n+1}.M′. Then there is no β0-reduction possible on M[N⃗/x⃗] = λx_{n+1}.M′[N⃗/x⃗]. In continuation calculus, arity(N_{λx1⋯x_{n+1}.M′}) ≥ n + 1, so ccify(E) = N_{λx1⋯x_{n+1}.M′}.ccify(N1).⋯.ccify(Nn) already halts. Continuation calculus thus halts one step earlier, and CC reduction on ccify(E) “closely simulates” β0 reduction on E.

Case 2. M is not an abstraction. Then ccify(E) = N_{λx1⋯xn.M}.ccify(N1).⋯.ccify(Nn), and there is a rule N_{λx1⋯xn.M}.x1.⋯.xn −→ ccify(M). Continuation calculus reduces the term:

    N_{λx1⋯xn.M}.ccify(N1).⋯.ccify(Nn)  →CC  ccify(M)[ccify(N1)/x1, ⋯, ccify(Nn)/xn]

With Lemma 52 below, we get the following equivalent proposition:

    N_{λx1⋯xn.M}.ccify(N1).⋯.ccify(Nn)  →CC  ccify(M[N⃗/x⃗])

We thus have what was to be proven:

    ccify(E) →CC ccify(E′)

For the ⇐ direction, first note that ccify(·) is injective. By virtue of being a lambda term, E is of the general form E = (λx1⋯xk.M) N1 ⋯ Nn with k ≥ 0, n ≥ 0 and M not an abstraction. CC reduction of ccify(E) is only possible when k = n: ccify(E) →CC ccify(M)[ccify(N1)/x1, ⋯, ccify(Nn)/xn]. But if k = n, then E →β0 M[N⃗/x⃗]. Lemma 52 will now prove ccify(M)[ccify(N1)/x1, ⋯, ccify(Nn)/xn] = ccify(M[N⃗/x⃗]).

This concludes the proof, on the premise of Lemma 52. ∎

Lemma 52. For any lambda term M with fv(M) ⊆ {x1, ⋯, xn} such that all abstractions inside M are supercombinators¹, and for all lambda terms N⃗:

    ccify(M)[ccify(N1)/x1, ⋯, ccify(Nn)/xn] = ccify(M[N⃗/x⃗]).

Proof. By induction on M.

• Base case: M = some xi. Then ccify(xi) = xi. Consequently,

      ccify(xi)[ccify(N1)/x1, ⋯, ccify(Nn)/xn]
    = xi[ccify(N1)/x1, ⋯, ccify(Nn)/xn]
    = ccify(Ni)
    = ccify(xi[N⃗/x⃗]).

• Inductive case 1: M = λy1⋯ym.N. As M is a supercombinator, it has no free variables, and as substitution replaces only free variables, any substitution applied to M equals M. Furthermore, ccify(M) = N_M, which has no variables, so any substitution applied to ccify(M) equals ccify(M). It remains to be trivially proven that ccify(M) = ccify(M).

• Inductive case 2: M = t u. Then fv(t) ∪ fv(u) ⊆ {x1, ⋯, xn}, and we have the following derivation.

      ccify(t u)[ccify(N1)/x1, ⋯, ccify(Nn)/xn]
    = (ccify(t).ccify(u))[ccify(N1)/x1, ⋯, ccify(Nn)/xn]      by definition of ccify(·)
    = ccify(t)[ccify(N1)/x1, ⋯, ccify(Nn)/xn]
      . ccify(u)[ccify(N1)/x1, ⋯, ccify(Nn)/xn]               by definition of substitution
    = ccify(t[N⃗/x⃗]) . ccify(u[N⃗/x⃗])                           by induction
    = ccify(t[N⃗/x⃗] u[N⃗/x⃗])                                    by definition of ccify(·)
    = ccify((t u)[N⃗/x⃗])                                       by definition of substitution ∎

4.2 Embedding continuation calculus in lambda calculus

In this section, we explain how to simulate continuation calculus using lambda calculus. Our embedding will be simple (dot is translated to λ-application), but the embedding will be partial, due to a fundamental difference between lambda and continuation calculus: application of too many arguments must stop execution in continuation calculus. Checking whether the right number of arguments is applied would involve additional bookkeeping to keep track of the number of arguments. We feel that such bookkeeping is often unjustified, so we give the simple embedding of CC in λ. The embedding consists of two phases: first, we eliminate cycles in the CC program, then we translate the system trivially to λ calculus (‘functionalization’). Functionalization is closely related to lambda dropping [8]: both transform a system of equations to a block-structured program. We give the latter phase first.

¹By “abstractions inside M”, we mean the lambda abstractions that are not direct children of other abstractions in M. This is consistent with Definition 49.

4.2.1 Functionalization

The gist of the embedding is that acyclic CC programs correspond to a system of mutually recursively defined lambda terms. The functionalization phase translates acyclic CC programs, and terms on such programs, to lambda calculus. The first step in the phase is to translate the CC program to a system of λ equations, by replacing −→ by = and dots by λ-applications. As an example, consider a CC program on the left, and the corresponding lambda system on the right.

    A.x −→ D.D.(B.B).x.x                           A = λx. D D (B B) x x
    B.self.x.y.z −→ x.(D.D.(self.self)).(z.y.z)    B = λself λx λy λz. x (D D (self self)) (z y z)
    C.x.y −→ y.x                                   C = λx λy. y x
    D.self.b.x.y −→ b.y.(x.(self.self.b)).x        D = λself λb λx λy. b y (x (self self b)) x

Because the source CC program is acyclic, the system of equations can be solved unambiguously by substitution.²

    A = λx. (λself λb λx λy. b y (x (self self b)) x)
            (λself λb λx λy. b y (x (self self b)) x)
            ([fill in B here] [fill in B here]) x x
    B = λself λx λy λz. x ((λself λb λx λy. b y (x (self self b)) x)
            (λself λb λx λy. b y (x (self self b)) x) (self self)) (z y z)
    C = λx λy. y x
    D = λself λb λx λy. b y (x (self self b)) x

Now a CC term using the above CC program can be embedded in lambda calculus analogously: replace dots by λ-applications and replace A, B, C, D by their λ counterparts. It is straightforward to see that reduction on a CC term is simulated by a single β0-reduction in the corresponding term. There is one exception: if the arity of a CC name is 0, then the lambda translations of that name and its successor are already the same. The converse is not always true: there are terminating CC terms with a reducing λ translation, for instance those of the form n.t1.⋯.tk, where arity(n) > k ≥ 1. However, the λ translation will not be β0-reducible any more after a single β0-reduction. As with Theorem 51, we may say that β0-reduction closely simulates CC reduction.

The remainder of this section addresses how to remove cycles from a CC program.
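The functionalization phase can be sketched as follows, representing an acyclic CC program as an insertion-ordered dict in which each rule refers only to names defined later; that representation, and the tuple encodings, are our own illustrative choices.

```python
# CC rule right-hand sides: ('name', N) | ('cvar', x) | ('dot', t, u);
# lambda terms: ('var', x) | ('lam', x, body) | ('app', f, a).
def functionalize(program):
    """Functionalize an acyclic CC program: '-->' becomes '=', dots become
    lambda application, and the equations are solved bottom-up by
    substitution. 'program' maps a name to (params, rhs); each rhs may only
    refer to names that come later in the dict (insertion order)."""
    solved = {}
    for name, (params, rhs) in reversed(list(program.items())):
        def tolam(t):
            if t[0] == 'cvar':
                return ('var', t[1])
            if t[0] == 'name':
                return solved[t[1]]   # acyclicity: already solved below
            return ('app', tolam(t[1]), tolam(t[2]))
        term = tolam(rhs)
        for p in reversed(params):
            term = ('lam', p, term)
        solved[name] = term
    return solved

# Rule C.x.y --> y.x from the example becomes C = lam x. lam y. y x:
rule_c = {'C': (['x', 'y'], ('dot', ('cvar', 'y'), ('cvar', 'x')))}
print(functionalize(rule_c)['C'])
# ('lam', 'x', ('lam', 'y', ('app', ('var', 'y'), ('var', 'x'))))
```

Solving bottom-up means each name is substituted by an already-closed lambda term, so the output contains no leftover names.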

4.2.2 Cycle elimination

In the previous section, we have shown how to simulate acyclic CC programs using lambda calculus. By a cycle, we mean names N0, N1, ⋯, Nn such that the definition of each Ni refers to N_{i+1}, and N0 = Nn. We can also functionalize a cyclic CC program, by first eliminating the cycles in the rules. This phase works iteratively: some names are replaced by terms, and the corresponding rules are changed.

To perform this phase, we first order the rules of a program. The specific order does not matter; we use the order to ensure that each rule refers only to names defined below it. Then, we fix individual rules of the program, from bottom to top; we fix those rules that refer to themselves or

²Due to space considerations, we abbreviate the solution of A.

a name above them. This fixing consists in adding variables to the left-hand side, and substituting other terms for the names. The result of the fixing step is a replacement rule that does not refer to names above it; additionally, the rest of the program is updated to use the replacement rule. This procedure thus transforms the program into an equivalent program in which each rule refers only to names strictly below it.

Fixing a single rule goes as follows. Consider a rule R that defines N; we label the rule here only for clarity.

    R : N.x1.⋯.xk −→ r    with r a term over x⃗

Assume that M1, ⋯, Mj are the names that N refers to and that are above N in the program. Then we replace rule R with

    R′ : N′.self.m1.⋯.mj.x1.⋯.xk −→ r[(self.self.m1.⋯.mj)/N, m1/M1, ⋯, mj/Mj]

Furthermore, we replace N by (N 0.N 0.M1. .Mj) in the other rules, as well as in the terms we want to evaluate. Because our procedure works··· from bottom to top, N is only used in rules above R0, hence only has to be replaced there. Therefore, the invariant is kept: all rules below R0 still only refer to names strictly below them. After fixing all rules that need fixing, the resulting program is cycle-free.

Example. We give an example program with the cycle B, D, B.

    R1 : A.x −→ D.x.x
    R2 : B.x.y.z −→ x.D.(z.y.z)
    R3 : C.x.y −→ y.x
    R4 : D.x.y −→ B.y.(x.D).x

Rule R4 refers to a name defined above R4, so we fix that rule. We replace R4 with:

    R4′ : D′.self.b.x.y −→ b.y.(x.(self.self.b)).x

Furthermore, we replace D by D′.D′.B in R1 and R2. The current program is as follows.

    R1 : A.x −→ D′.D′.B.x.x
    R2 : B.x.y.z −→ x.(D′.D′.B).(z.y.z)
    R3 : C.x.y −→ y.x
    R4′ : D′.self.b.x.y −→ b.y.(x.(self.self.b)).x

Now rule R2 refers to itself. We fix R2 to obtain the following rule:

    R2′ : B′.self.x.y.z −→ x.(D′.D′.(self.self)).(z.y.z)

Furthermore, we replace B by B′.B′ in R1. The resulting program is as follows.

    R1 : A.x −→ D′.D′.(B′.B′).x.x
    R2′ : B′.self.x.y.z −→ x.(D′.D′.(self.self)).(z.y.z)
    R3 : C.x.y −→ y.x
    R4′ : D′.self.b.x.y −→ b.y.(x.(self.self.b)).x

This acyclic program was functionalized in Section 4.2.1.

Remark 53. It can be beneficial to choose a smart order prior to removing cycles. For instance, had A been the bottommost rule, then we would have fixed not two but three rules.
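The fixing of a single rule can be sketched as follows. The tuple representation of CC right-hand sides is our own illustrative choice, and the sketch generates variable names m1, ⋯, mj where the example uses the mnemonic b.

```python
# CC rule right-hand sides: ('name', N) | ('cvar', x) | ('dot', t, u).
def dots(head, *args):
    """Left-nested dot-application: dots(h, a, b) stands for h.a.b."""
    for a in args:
        head = ('dot', head, a)
    return head

def fix_rule(name, params, rhs, above):
    """Fix rule 'name.params --> rhs', which refers to itself and/or to the
    names in 'above' (defined above it). Returns the replacement rule and
    the term that replaces 'name' in the rules above it."""
    prime = name + "'"
    ms = [f'm{i + 1}' for i in range(len(above))]
    # self.self.m1...mj stands for the rule's own (fixed) definition:
    self_term = dots(('cvar', 'self'), ('cvar', 'self'),
                     *[('cvar', m) for m in ms])
    env = {name: self_term}
    env.update({M: ('cvar', m) for M, m in zip(above, ms)})
    def sub(t):
        if t[0] == 'name':
            return env.get(t[1], t)
        if t[0] == 'dot':
            return ('dot', sub(t[1]), sub(t[2]))
        return t
    new_rule = (prime, ['self'] + ms + params, sub(rhs))
    replacement = dots(('name', prime), ('name', prime),
                       *[('name', M) for M in above])   # N'.N'.M1...Mj
    return new_rule, replacement
```

Applied to R4 (D.x.y −→ B.y.(x.D).x, with B above D), this returns the rule D′.self.m1.x.y −→ m1.y.(x.(self.self.m1)).x together with the replacement term D′.D′.B, which is the R4′ step of the example up to the choice of variable name.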

Chapter 5

Related work

For a long time, lambda calculus has been the canonical formalism to model functional programs. It has been accommodated with various control extensions to model the addition of various types of control, among which undelimited and delimited continuations [9, 11, 12, 10]. The λµ calculus [22, 2] is another way to represent control-like operators. Research has been done on encoding control operators into lambda terms using the CPS transformation, as well as on the CPS-transformed subset of lambda calculus [25, 6]. Furthermore, computation and value have been separated using call-by-push-value [18], supercombinators [16], and the derived spineless tagless G-machine [24]. The author has not found calculi that combine continuations with a separation of control and data.

Chapter 6

Conclusion and future work

Lambda calculus has given the world a way to reason uniformly on functional programs, in a variety of ways. Yet, it seems not a natural formalism to model programs with undelimited continuations: such programs are typically modeled in extensions of λ, such as λC, which have additional rules. The meaning of a term of λC can be expressed in λ calculus by means of the CPS translation. Indeed, certain term sets in λ calculus admit taking continuations in one way or another: the image of the CPS transformation admits continuations, for instance, but also the term set λ0, which is based on a modified reduction relation.

In this thesis, we have argued that continuation calculus, or CC in short, is a development of the λ0 set. It is possible to take continuations in CC, as in λ0 and λC, but CC is simpler than both λ0 and λC. Besides its support of continuations and its simplicity, CC is more straightforward to implement than either. There is no hidden evaluation order, and there is no hidden state that persists over reductions: all state is already involved in every reduction, in a uniform way.

On the user side, we have shown how to use CC by implementing a simple programming language (ML+) using continuation calculus. This language supports mixing call-by-value (CBV) and call-by-name (CBN) functions. Hence, the translation implicitly shows how CC is suitable for modeling CBV languages, CBN languages, and languages that support both evaluation orders.

On the philosophical side, we have given an observational equivalence between CC terms: we can replace subterms of a term by observationally equivalent subterms, such that the “meaning” does not change.
If the CC terms of two ML+ terms can be proven observationally equivalent, this means that the ML+ terms are equivalent in meaning, and a program transformation from one to the other is warranted.¹ We have given relations between programs and elementary modifications, and used the constructed vocabulary to prove correct a program that uses continuations for nonlocal control.

The breadth of these results shows that not only is continuation calculus a suitable and elegant representation of functional code with continuations, it additionally suits a wide range of uses. Research can continue in various directions, depending on the most valued use. For instance, the real-world efficiency of continuation calculus as an intermediate language is still to be tested: how fast is a naive implementation, and is it easy to make it performant?

On the calculation side, transformations in lambda calculus have been the subject of a large body of research. We have proven the behavior of an optimized list multiplication program in CC in Section 2.6.1, by applying theory we developed. But in light of the similarities between lambda and continuation calculus, what properties and techniques from lambda calculus are transferable to CC?

On the language side, we may be able to extend continuation calculus with special names with defined behavior with side effects. We cannot expect the existing equivalences to continue to hold;

1The ML+ terms are equivalent if we define that the meaning of an ML+ term is the meaning of its corresponding CC term, which itself is defined only up to observational equivalence. Although the author believes that such a definition of “meaning” is what people would expect in ML+, and although this belief is substantiated by examples such as Proposition 31, there is still theory to be made and proofs to be found.

it would be interesting to examine what equivalences make sense on such a system in general, or on a system for a specific type of side effect. Moreover, it can be interesting to search for a middle course: a calculus that allows side effects in a limited way, to aid the algebraic properties and programmer’s expectations of terms in that calculus.

Even without side effects, the ML+ language of Chapter 3 promises insight into the topic of mixed by-name, by-value, and continuation languages. This insight can be fortified by trying to derive Pédrot’s semantics for ctML, a call-by-name ML with control [23], from ML+. In general, the work in progress is essential for ML+ to mature.

Whatever the specific use of CC or possible descendants, a type system is currently missing in continuation calculus. Initial results by the author and his supervisor suggest that a type system may be used to prevent evaluation from flowing from a well-typed term to an incomplete, invalid, or undefined term. Simultaneously, most of the expressiveness of CC seems to be maintained under the typing. A further investigation into a typing on CC should shed more light on the interplay between CC and typings, be they simple or more advanced, and on the concrete guarantees that such typings yield.

Bibliography

[1] A. W. Appel and T. Jim. Continuation-passing, closure-passing style. In Proceedings of the 16th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, POPL ’89, pages 293–302, New York, NY, USA, 1989. ACM.

[2] Zena M. Ariola and Hugo Herbelin. Control reduction theories: the benefit of structural substitution. Journal of Functional Programming, 18(4):373–419, 2008.

[3] Henk Barendregt. The Lambda Calculus: its syntax and semantics, volume 103 of Studies in Logic and the Foundations of Mathematics. Elsevier, second edition, 1984.

[4] Henk Barendregt, Wil Dekkers, and Richard Statman. Lambda Calculus with Types. Perspectives in Logic. Cambridge University Press, July 2013.

[5] R.S. Bird. A formal development of an efficient supercombinator compiler. Science of Computer Programming, 8(2):113–137, 1987.

[6] Olivier Danvy and Andrzej Filinski. Representing Control: a Study of the CPS Transformation. Mathematical Structures in Computer Science, 2:361–391, 1992.

[7] Olivier Danvy and Lasse R. Nielsen. Defunctionalization at work. In Proceedings of the 3rd ACM SIGPLAN international conference on Principles and practice of declarative programming, PPDP ’01, pages 162–174, New York, NY, USA, 2001. ACM.

[8] Olivier Danvy and Ulrik P. Schultz. Lambda-dropping: transforming recursive equations into programs with block structure. In Proceedings of the 1997 ACM SIGPLAN symposium on Partial evaluation and semantics-based program manipulation, PEPM ’97, pages 90–106, New York, NY, USA, 1997. ACM.

[9] R. Kent Dybvig, Simon Peyton Jones, and Amr Sabry. A monadic framework for delimited continuations. Journal of Functional Programming, 17:687–730, 2007.

[10] Matthias Felleisen. The theory and practice of first-class prompts. In Proceedings of the 15th ACM SIGPLAN-SIGACT symposium on Principles of programming languages, pages 180–190. ACM, 1988.

[11] Matthias Felleisen and Daniel P. Friedman. Control operators, the SECD-machine, and the λ-calculus. In 3rd Working Conference on the Formal Description of Programming Concepts, pages 193–219. North-Holland Publishing, 1986.

[12] Matthias Felleisen, Daniel P. Friedman, Eugene Kohlbecker, and Bruce Duba. A syntactic theory of sequential control. Theoretical Computer Science, 52(3):205–237, 1987.

[13] Bram Geron and Herman Geuvers. Continuation calculus. In U. de’Liguoro and A. Saurin, editors, First Workshop on Control Operators and their Semantics. EPTCS, to appear.

[14] Paul Graham. On Lisp: Advanced Techniques for Common Lisp. Prentice Hall, 1993.

[15] J. Hughes. Why Functional Programming Matters. The Computer Journal, 32(2):98–107, 1989.

[16] R.J.M. Hughes. Super-combinators: a new implementation method for applicative languages. In Proceedings of the 1982 ACM symposium on LISP and functional programming, pages 1–10. ACM, 1982.

[17] Andrew Kennedy. Compiling with continuations, continued. In Proceedings of the 12th ACM SIGPLAN international conference on Functional programming, ICFP ’07, pages 177–190, New York, NY, USA, 2007. ACM.

[18] P.B. Levy. Call-by-push-value. PhD thesis, Queen Mary, University of London, 2001.

[19] Leonardo de Moura and Nikolaj Bjørner. Z3: An Efficient SMT Solver. In C.R. Ramakrishnan and Jakob Rehof, editors, Tools and Algorithms for the Construction and Analysis of Systems, volume 4963 of Lecture Notes in Computer Science, pages 337–340. Springer Berlin Heidelberg, 2008.

[20] Alan Mycroft. The theory and practice of transforming call-by-need into call-by-value. In Bernard Robinet, editor, International Symposium on Programming, volume 83 of Lecture Notes in Computer Science, pages 269–281. Springer Berlin Heidelberg, 1980.

[21] Peter J. Nürnberg, Uffe K. Wiil, and David L. Hicks. A Grand Unified Theory for Structural Computing. In David L. Hicks, editor, Metainformatics, volume 3002 of Lecture Notes in Computer Science, pages 1–16. Springer Berlin Heidelberg, 2004.

[22] Michel Parigot. λµ-calculus: An algorithmic interpretation of classical natural deduction. In Logic Programming and Automated Reasoning, volume 624 of Lecture Notes in Computer Science, pages 190–201. Springer Berlin Heidelberg, 1992.

[23] Pierre-Marie Pédrot. Étude de ctML. L3 internship report. École Normale Supérieure de Lyon. Retrieved from http://www.pps.univ-paris-diderot.fr/~pedrot/publications.html.

[24] Simon L. Peyton Jones. Implementing lazy functional languages on stock hardware: the Spineless Tagless G-machine. Journal of Functional Programming, 2:127–202, 1992.

[25] G.D. Plotkin. Call-by-name, call-by-value and the λ-calculus. Theoretical Computer Science, 1(2):125–159, 1975.

[26] Peter Selinger. Lecture Notes on the Lambda Calculus. arXiv:0804.3434v1, 2008.

[27] Terese. Term Rewriting Systems, volume 55 of Cambridge Tracts in Theoretical Computer Science. Cambridge University Press, April 2003.

Appendix A

Proofs

We first prove the theorems in Section 2.4.1, then those in Section 2.4.3, and finally those in Section 2.4.2. The theorems within a subsection are not proved in order, and are interspersed with lemmas.

A.1 General

Proposition 54. Let name fr be not mentioned in term M and program P. Then fr is not mentioned in any reduct of M.

Proof. By induction and by definition of next(M).

Theorem 55. Let M, N ∈ U, P a program, and fr a name not mentioned in P. The following equivalences hold:

M → N ⇐⇒ ∀t ∈ U : M[fr := t] → N[fr := t]   (1)
M is final ⇐⇒ ∃t ∈ U : M[fr := t] is final   (2)
M  N ⇐⇒ ∀t ∈ U : M[fr := t]  N[fr := t]   (3)
M↓ ⇐⇒ ∃t ∈ U : M[fr := t]↓   (4)

This theorem implies Theorem 12.

Proof.

(⇐1). Fill in t = fr.

(⇒1). Since next_P(M) exists, head(M) must be in the domain of P. Because fr ∉ dom(P), we know head(M) ≠ fr. Let M = n.u_1.⋯.u_k and “n.x_1.⋯.x_k → r” ∈ P, where n is a name. Then M[fr := t] = n.u_1[fr := t].⋯.u_k[fr := t] → r[~x := ~u[fr := t]]. Since fr is not mentioned in r, the last term is equal to r[~x := ~u][fr := t] = N[fr := t].

(⇐2). Assume M[fr := t] is final. Then head(M[fr := t]) ∉ dom(P) or length(M[fr := t]) ≠ arity(head(M[fr := t])). If head(M) = fr, then M is final, so assume head(M) ≠ fr. Then head(M) = head(M[fr := t]) and length(M) = length(M[fr := t]), so also M is final.

(⇒2, ⇐3, ⇒4). Fill in t = fr.

(⇒3).  is the reflexive and transitive closure of →.

(⇐4). Suppose that M[fr := t]  N↓ in k steps. If on the contrary M is not terminating, then M  M′ in k + 1 steps. By repeated application of (⇒1), also M[fr := t]  M′[fr := t] in k + 1 steps. Contradiction.
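Equivalence (1) lends itself to a mechanical check on small instances: substituting a term for a name FR that is not mentioned in the program commutes with next_P. The sketch below (Python, with an assumed pair-based term representation; the helper names are hypothetical, not from the thesis) verifies this on one example.

```python
# Sketch illustrating Theorem 55, equivalence (1): if FR is not mentioned
# in program P, then next_P(M[FR := t]) = next_P(M)[FR := t].
# The (head, [args]) term representation is an assumption of this note.

def subst(term, env):
    """Replace rule variables in `term` by the bound argument terms."""
    head, args = term
    args = [subst(a, env) for a in args]
    if head in env:
        h, more = env[head]
        return (h, more + args)
    return (head, args)

def next_step(program, term):
    """next_P(M): defined iff head(M) is in dom(P) and the lengths match."""
    head, args = term
    if head not in program:
        return None
    params, rhs = program[head]
    if len(params) != len(args):
        return None
    return subst(rhs, dict(zip(params, args)))

def subst_name(term, fr, t):
    """M[fr := t]: replace the name fr by the term t everywhere in M."""
    head, args = term
    args = [subst_name(a, fr, t) for a in args]
    if head == fr:
        th, targs = t
        return (th, targs + args)
    return (head, args)

# Zero.z.s -> z   and   S.x.z.s -> s.x, as in Proposition 31.
P = {
    "Zero": (["z", "s"], ("z", [])),
    "S": (["x", "z", "s"], ("s", [("x", [])])),
}

M = ("Zero", [("FR", []), ("B", [])])       # Zero.FR.B; FR not in dom(P)
t = ("S", [("Zero", [])])                   # an arbitrary closed term
lhs = next_step(P, subst_name(M, "FR", t))  # next(M[FR := t])
rhs = subst_name(next_step(P, M), "FR", t)  # next(M)[FR := t]
assert lhs == rhs
```

The assertion passes because the rule for Zero never mentions FR, which is exactly the freshness hypothesis of the theorem.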

Proof of Lemma 13 (determinism). We assumed that M  m.~t↓ and M  n.~u↓. So m.~t and n.~u are both the term at the end of the execution path of M; we see that they must be equal.

Proof of Proposition 16 (M  t↓  N then M  N). By assumption, M  t↓ and t  N. If t  N in 1 or more steps, then we could not have had t↓. Thus t  N in 0 steps: N = t.

A.2 Program substitution and union

These are proofs of theorems in Section 2.4.3.

Proof of Theorem 25, equivalences 1/3. Let M = h.t_1.⋯.t_k, where h is a name. We have the following cases.

1. h ∉ dom(P). All names in dom(Pσ) \ dom(P) are fresh, so also hσ ∉ dom(Pσ). We see that both next_P(M) and next_{Pσ}(Mσ) are undefined.

2. h ∈ dom(P). Then some rule “h.x_1.⋯.x_l → r” is in P, while “hσ.x_1.⋯.x_l → rσ” is in Pσ. If k ≠ l, then both next_P(M) and next_{Pσ}(Mσ) are undefined.

If k = l, then next_P(M) = r[~x := ~t], and next_{Pσ}(Mσ) = rσ[~x := ~tσ]. We note that the domains of σ and [~x := ~tσ] are disjoint because names are never variables. We can therefore do the substitutions in parallel: rσ[~x := ~tσ] = r[~n := ~m, ~x := ~tσ]. Because the result of σ is never in dom(σ) (all m_i are fresh), we can even put [~n := ~m] at the end: r[~n := ~m, ~x := ~tσ] = r[~x := ~tσ][~n := ~m] = r[~x := ~tσ]σ. Also, because the result of σ is never in dom(σ), we know that σσ = σ. We find that r[~x := ~tσ]σ = r[~x := ~t]σ = Nσ. This completes the proof.

Proof of Theorem 25, equivalences 2/4. The second equivalence follows by transitivity from equivalence 1. The fourth equivalence is then trivial.

Proof of Theorem 25, implications 1–4. next_P(M) exists iff a rule in P defines it; by P ⊆ P′ that rule also defines next_{P′}(M). This proves implications 1 and 3. The second implication follows using the structure of M _P N. Then the fourth implication follows trivially.

Proof of Theorem 25, implication 5. We have to show that for all P″ ⊇ P′ and X ∈ U, X.M↓_{P″} ⇐⇒ X.N↓_{P″}. This follows from M ≈_P N because P ⊆ P′ ⊆ P″.

Proof of Theorem 25, equivalence 5. We show the left-implication. The right-implication then follows from Mσ ≈_{Pσ} Nσ ⟹ Mσσ⁻¹ ≈_{Pσσ⁻¹} Nσσ⁻¹, because σσ⁻¹ is the identity substitution. So suppose Mσ ≈_{Pσ} Nσ, and let program Q ⊇ P and term X be given. We prove X.M↓_Q ⇔ X.N↓_Q by the following chain:

X.M↓_Q
⇔ Xσ.Mσ↓_{Qσ}   (Theorem 25, equivalences 2/3)
⇔ Xσ.Nσ↓_{Qσ}   (Mσ ≈_{Pσ} Nσ and Qσ ⊇ Pσ)
⇔ X.N↓_Q   (Theorem 25, equivalences 2/3)

Lemma 56. Assume program P′ ⊇ P and M ∈ U such that dom(P′ \ P) is not mentioned in P or M. Then M↓_P ⇔ M↓_{P′}.

Proof. Regard next_P(M) and next_{P′}(M). The names in dom(P′ \ P) are not mentioned in M, so either both next_P(M) and next_{P′}(M) are defined and equal, or both are undefined. The names in dom(P′ \ P) are still not mentioned in M’s successor, so the previous sentence applies to all reducts of M. We find that M has a final reduct in P iff it has one in P′, hence M↓_P ⇔ M↓_{P′}.
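Lemma 56 can likewise be illustrated concretely: adding a rule for a name that neither P nor M mentions leaves M's evaluation unchanged. The following Python sketch runs the same term under P and under an extension (the term representation and rule format are assumptions of this note, not fixed by the thesis text).

```python
# Sketch illustrating Lemma 56: extending a program with a rule whose name
# is not mentioned in P or M does not change M's evaluation.
# Terms are (head, [args]); this representation is an assumption.

def subst(term, env):
    head, args = term
    args = [subst(a, env) for a in args]
    if head in env:
        h, more = env[head]
        return (h, more + args)
    return (head, args)

def next_step(program, term):
    head, args = term
    if head not in program:
        return None
    params, rhs = program[head]
    if len(params) != len(args):
        return None
    return subst(rhs, dict(zip(params, args)))

def evaluate(program, term, max_steps=10_000):
    for _ in range(max_steps):
        nxt = next_step(program, term)
        if nxt is None:
            return term
        term = nxt
    raise RuntimeError("no final reduct within max_steps")

P = {
    "Zero": (["z", "s"], ("z", [])),
    "S": (["x", "z", "s"], ("s", [("x", [])])),
}
# P' adds a rule for a name "Fresh" that P and M do not mention.
P_ext = dict(P, Fresh=(["x"], ("x", [])))

M = ("S", [("Zero", []), ("A", []), ("B", [])])   # S.Zero.A.B
assert evaluate(P, M) == evaluate(P_ext, M)
```

Both runs end in the same final term, since every next-step computation only ever consults the rules for names actually occurring in the reducts of M.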

Proof of Theorem 27. The right-implication is already proven by Theorem 25, so we prove the left-implication. Suppose program P′ ⊇ P, but dom(P′ \ P) is not mentioned in M, N. Suppose furthermore program Q ⊇ P and X ∈ U. Then we have to prove X.M↓_Q ⇔ X.N↓_Q. Q is not required to be a superset of P′; it may even define some names differently than P′. Although we know that ∆ = dom(P′ \ P) is not used in M or N, any name ∈ ∆ could be used in X. We want to compare X.M and X.N on an extension program of Q, so we will make sure that X does not accidentally refer to names in ∆. We will rename all d ∈ ∆ within X and P′.

Take a substitution σ = [d_i := d_i′ | d_i ∈ ∆] that renames all d ∈ ∆ to names fresh for M, N, X, P′, Q. We know that M = Mσ, N = Nσ, and P = Pσ, because all d ∈ ∆ are not mentioned in M, N, or P. Now note that (X.M)σ = Xσ.M and (X.N)σ = Xσ.N do not contain a name in ∆, nor does any such name occur in Qσ.

Take Q′ = P′ ∪ Qσ. Then Q′ is a program because dom(Qσ \ P) has no overlap with dom(P′ \ P) = ∆. Furthermore, Q′ is an extension program of both P′ and Qσ. We apply Lemma 56 to see that

Xσ.M↓_{Qσ} ⇔ Xσ.M↓_{Q′}   (A.1)
and Xσ.N↓_{Qσ} ⇔ Xσ.N↓_{Q′}.

We can thus make the following series of bi-implications.

X.M↓_Q ⇔ Xσ.M↓_{Qσ}   (Theorem 25)
⇔ Xσ.M↓_{Q′}   (A.1)
⇔ Xσ.N↓_{Q′}   (M ≈_{P′} N, P′ ⊆ Q′)
⇔ Xσ.N↓_{Qσ}   (A.1)
⇔ X.N↓_Q   (Theorem 25)

Because we showed X.M↓_Q ⇔ X.N↓_Q, we can conclude that M ≈_P N.

A.3 Term equivalence

Proposition 57. =_P is an equivalence relation.

Proof. Suppose A =_P C and C =_P E; then there exist B, D such that A  B  C  D  E. Suppose that C  B in k steps and C  D in l steps. Without loss of generality, k ≤ l. By determinism of →, C  B in k steps and B  D in l − k steps. Then A  B  D. We see that D is a common reduct of A and E.

Proposition 58. ≈ is an equivalence relation.

Proof. Reflexivity and symmetry are trivial. We have to prove transitivity: if M ≈_P N and N ≈_P O, and P ⊆ P′, then X.M↓_{P′} ⇔ X.O↓_{P′}. We know from the premises that X.M↓_{P′} ⇔ X.N↓_{P′} and X.N↓_{P′} ⇔ X.O↓_{P′}.

Lemma 59. If X  Y, then X↓ ⇔ Y↓.

Proof. By induction on the number of steps s in X  Y. If X = Y, then trivial, so assume s ≥ 1. This implies the existence of a term X′ such that X → X′. If there exists Z such that Y  Z↓, then X  Y  Z↓. Reversely, assume X  Z↓ for some Z. Because X ≠ Y and by determinism of → we know X → X′  Z↓ and X → X′  Y. By induction on X′  Y we get Y  Z↓.

Proof of Proposition 19 (M ≈ N then M↓ ⇔ N↓). Take a fresh name X, and define P′ = P ∪ {X.t → t}.

M↓_P ⇔ M↓_{P′}   (evaluation of M never contains a head in dom(P′ \ P))
⇔ X.M↓_{P′}   (X.M →_{P′} M, → deterministic)
⇔ X.N↓_{P′}   (M ≈_P N)
⇔ N↓_{P′}   (X.N →_{P′} N, → deterministic)
⇔ N↓_P   (evaluation of N never contains a head in dom(P′ \ P))

Lemma 60. If X → Y, then X.t_1.⋯.t_k↓ for k > 0.

Proof. next(M) exists iff the length of the corresponding left-hand side is equal to length(M), and length(X.t_1.⋯.t_k) = length(X) + k. The corresponding left-hand side is the same for X and X.~t.

Proof of Lemma 18 (≈ is a congruence). Let P′ ⊇ P be an extension program. We must prove that for all X, X.(M.N)↓_{P′} ⇔ X.(M′.N′)↓_{P′}. Extend P′ to P″ = P′ ∪ {A.m.n → X.(m.n), B.n.m → X.(m.n)}. Note that by Lemma 59,

X.(M.N)↓_{P″} ⇔ A.M.N↓_{P″} ⇔ B.N.M↓_{P″}
and X.(M′.N′)↓_{P″} ⇔ A.M′.N′↓_{P″} ⇔ B.N′.M′↓_{P″},

so we can make the following chain:

X.(M.N)↓_{P″}
⇔ A.M.N↓_{P″}
⇔ A.M.N′↓_{P″}   (N ≈ N′)
⇔ B.N′.M↓_{P″}
⇔ B.N′.M′↓_{P″}   (M ≈ M′)
⇔ X.(M′.N′)↓_{P″}.

Now by Lemma 56, X.(M.N)↓_{P′} ⇔ X.(M′.N′)↓_{P′}, which was to be shown.

Lemma 61. Let M, N ∈ U and k ≥ 0. Let names fr_1, ⋯, fr_k be not mentioned in M, N, P. Suppose M.fr_1.⋯.fr_k → M′  t  N′ ← N.fr_1.⋯.fr_k. Let name n be not mentioned in M, N, P. Then ∀X ∈ U : X[n := M]↓ ⇔ X[n := N]↓.

Proof. Suppose that X[n := M]  X′↓ in n steps. We will show that X[n := N]↓. The other direction holds by symmetry. The proof goes by induction on n. Because next(M.fr_1.⋯.fr_k) and next(N.fr_1.⋯.fr_k) exist, we know that arity(M) = arity(N) = k. Regard head(X). We distinguish four cases:

Case 1. head(X) = n, and length(X) ≠ k. Then arity(X[n := N]) is undefined or not zero, hence X[n := N]↓.

Case 2. head(X) = n, and length(X) = k. Then there exist u_1, ⋯, u_k such that X = n.u_1.⋯.u_k. Then:

X[n := M] = M.u_1[n := M].⋯.u_k[n := M]
→ M′[~fr := ~u[n := M]]   (~fr fresh for M, P)
= M′[~fr := ~u][n := M]   (n fresh for M′)
 t[~fr := ~u][n := M]   (~fr fresh for M, P)

Analogously, X[n := N]  t[~fr := ~u][n := N]. We know t[~fr := ~u][n := M]  X′↓ in at most n − 1 steps, and n is not mentioned in t[~fr := ~u], so using the induction hypothesis we get t[~fr := ~u][n := N]↓. Hence, X[n := N]↓.

Case 3. head(X) ≠ n, and arity(X) ≠ 0 or undefined. Then arity(X[n := M]) = arity(X[n := N]) = arity(X) ≠ 0 or undefined, hence X[n := M]↓ and X[n := N]↓.

Case 4. head(X) ≠ n, and arity(X) = 0. Then next(X[n := M]) = next(X)[n := M] and next(X[n := N]) = next(X)[n := N]. We assumed X[n := M]  X′↓ in n steps, so next(X)[n := M]  X′↓ in at most n − 1 steps. We can therefore apply the induction hypothesis to find next(X)[n := N]↓.

Proof of Theorem 21. Suppose P′ ⊇ P is an extension program. We must prove X.M↓ ⇔ X.N↓. Take ~fr fresh for P′, X, M, N. Because arity(M.fr_1.⋯.fr_k) = arity(N.fr_1.⋯.fr_k) = 0, they both have a successor, say M′ and N′. By definition of =_P and determinism of →, we know M.fr_1.⋯.fr_k → M′  t  N′ ← N.fr_1.⋯.fr_k. Then by Lemma 61, we know X.M↓_{P′} ⇔ X.N↓_{P′}.

Proof of Theorem 20 (M  fr.t_1.⋯.t_k, M ≈ N then N  fr.u_1.⋯.u_k). Because fr ∉ dom(P), we know M  fr.t_1.⋯.t_k↓. By Proposition 19, N  N′↓. Suppose on the contrary that head(N′) ≠ fr or length(N′) ≠ k. We will deduce an impossibility. Make an extension program P′ = P ∪ {fr.x_1.⋯.x_k → fr.x_1.⋯.x_k}. Then M  fr.t_1.⋯.t_k is nonterminating under P′. But by definition of next, N′↓_{P′}. This contradicts Proposition 19 and Theorem 27, which prove that N is nonterminating under P′. Hence we conclude N′ = fr.u_1.⋯.u_k for some terms ~u.

Proof of Proposition 31. Let ⟦N⟧_0 = {M ∈ U | ∀z, s ∈ U : M.z.s  z} and ⟦N⟧_{k+1} = {M ∈ U | ∃x ∈ ⟦N⟧_k ∀z, s ∈ U : M.z.s  s.x}. Then every ⟦N⟧_k ⊆ ⟦N⟧, and ∪_{k∈ℕ} ⟦N⟧_k satisfies the defining equation of ⟦N⟧, so ⟦N⟧ = ∪_{k∈ℕ} ⟦N⟧_k.

Suppose M ∈ ⟦N⟧; then M is in some ⟦N⟧_k. We proceed by induction on k. If k = 0, then Theorem 21 shows M ≈ ⟨0⟩. If k ≥ 1, then by the induction hypothesis there is some x ∈ ⟦N⟧_{k−1} such that for all z, s, M.z.s  s.x. Observe that M.z.s =_P S.x.z.s for all z, s, so Theorem 21 shows us M ≈ S.x. We get S.x ≈ S.⟨k − 1⟩ = ⟨k⟩ by the induction hypothesis and Lemma 18, from which we get the result.
