Lecture Notes Semantics Course WS 2013

Gert Smolka with Steven Schäfer Saarland University

February 3, 2014

Preface

The course is about the theory of programming languages. We study in depth a number of standard models for programming languages. In these notes, the term programming language includes logical languages like the language of the proof assistant Coq. We use Coq throughout the course to formalize the models we study. Coq will provide us with an example of a powerful logical programming language. We assume that you are familiar with Coq (first half of the ICL course). With Coq we mean several things: The underlying Coq, the Coq program- ming language, and the Coq programming system. In this course you will learn a lot about the type theory underlying Coq. However, we assume that you are already familiar with the basic ideas of type theory and their realization in Coq.

3

Contents

1 Untyped , First Encounter 1 1.1 Terms ...... 1 1.2 SubstitutionandBetaReduction ...... 2 1.3 BetaEquivalence...... 4 1.4 ChurchNumerals ...... 5 1.4.1 SuccessorFunction ...... 5 1.4.2 , , ...... 6 1.4.3 More Equivalences for Church Numerals ...... 7 1.5 Pairs, Predecessor, and Primitive Recursion ...... 7 1.5.1 Pairs ...... 7 1.5.2 PrimitiveRecursion ...... 8 1.5.3 Predecessor...... 8 1.6 ChurchNumeralsinCoq...... 8 1.6.1 Predicativity of Coq’s Universe Type ...... 9 1.6.2 ChurchExponentiation...... 10 1.7 Fixed Point Combinators and Recursive Functions ...... 11 1.8 ScottNumerals...... 12 1.9 SK-TermsandSK-Reduction...... 13 1.10 CombinatoryLogic ...... 14 1.11 AbstractionOperatorforCL ...... 16 1.12 ChurchNumeralsinCL ...... 17 1.13 Call-by-ValueReduction ...... 17 1.14 Call-by-NameReduction ...... 19 1.15 Summary ...... 19

2 Inductive Predicates 23 2.1 PredicatesandRelations...... 23 2.2 EvenNumbers ...... 24 2.2.1 InductiveDefinitionsasSpecifications ...... 25 2.3 PencilNotationforInductiveProofs...... 27 2.4 Exercises ...... 28

5 Contents

3 Abstract Reduction Systems 31 3.1 BasicDefinitions...... 31 3.2 ReflexiveTransitiveClosure...... 31 3.2.1 ProofofTransitivity...... 33 3.2.2 BinaryDefinition...... 34 3.2.3 PowerDefinition...... 35 3.3 NormalForms ...... 35 3.4 EquivalenceClosure...... 35 3.5 Church-Rosser-LikeProperties ...... 36 3.6 ReductionOperators ...... 38 3.7 Abstract Analysis of Reflexive Transitive Closure ...... 39 3.8 StrongNormalization...... 40

4 Lambda Calculus, Formalization 43 4.1 Terms and Substitutions in de Bruijn Representation ...... 43 4.2 BasicSubstitutionLaws ...... 44 4.3 BetaReduction...... 45 4.4 ClosedTerms...... 45 4.5 FormalizationinCoq ...... 46

5 Type Theory 49 5.1 Simply Typed Lambda Calculus (STLC) ...... 49 5.2 Gödel’sT ...... 51 5.3 Girard’sF ...... 52 5.4 LogicalInterpretation...... 55 5.5 FwithSingle-SortedSyntax ...... 56 5.6 FwithUniformSyntax ...... 56

5.7 Fω andCC:KindsandConversion ...... 58 5.8 CalculusofConstructions...... 60 5.9 NumbersasInductiveType ...... 61 5.10 DependentPairTypes ...... 63 5.11 RecommendedReading ...... 65

6 PCF 67

Bibliography 67

6 1 Untyped Lambda Calculus, First Encounter

The basic model in the theory of programming languages is the untyped lambda calculus [Church 1932]. The untyped lambda calculus can be seen as a small functional programming language that can express all computable functions. Turing showed in 1937 that his machines and untyped lambda calculus have equivalent computational power. While Turing’s machines compute with strings, the untyped lambda calculus computes with functions. We start with a informal presentation of the untyped lambda calculus and discuss important ideas and results. Later in the course we will formalize the lambda calculus and prove some results. From a programming point of view, every object of the untyped lambda cal- culus is a function. Thus other objects like numbers must be represented as functions. Coq encompasses a typed version of the lambda calculus. Thus experience with Coq will be helpful in understanding the untyped lambda calculus. The standard reference for untyped lambda calculus is Barendregt [1]. A more gentle textbook introduction is Hindley and Seldin [2].

1.1 Terms

The abstract syntax of the untyped lambda calculus consists of terms.

s, t ::= x | λx.s | st (x ∈ N)

Terms of the form λx.s are called abstractions and introduce local variables. One often says that the symbol λ acts as a variable binder. A term is closed if no variable occurs free in it (i.e., every variable occurrence is local). Closed terms are also called combinators. Terms that are equal up to consistent renaming of bound variables are iden- tified. For instance, (λx.x) = (λy.y). We call this important assumption the α-law. The α-law considerably complicates the formal definition of terms (to be studied later).

1 1 Untyped Lambda Calculus, First Encounter

The closed terms of the untyped lambda calculus describe functions. Every function of the lambda calculus takes one argument. However, since functions return functions, we can apply a function to any number of arguments. This is reflected in a basic notation for terms:

stu ⇝ (st)u

Here is a list of prominent combinators:

I := λx.x K := λxy.x B := λfgx.f(gx) C := λfxy.fyx S := λfgx.fx(gx) ω := λx.xx Ω := ωω

The combinator B can be seen as a composition operator. Sometimes we will write s ◦ t for Bst.

1.2 Substitution and Beta Reduction

The idea that terms describe functions is captured by a computation rule known as β-reduction. For the definition of β-reduction we need the notion of substitu- tion. x The notation st stands for the term that is obtained from the term s by re- placing every free occurrence of the variable x with the term t. From the α-law it follows that we have

y (λx.fxy)x = λz.fzx y (λx.fxy)x ≠ λx.fxx if x, y, z, and f are distinct variables. One speaks of capture-free substitution. A β-redex is a term of the form (λx.s)t. A β-reduction replaces a β-redex x (λx.s)t with the term st . We write s ≻ t if t can be obtained from s by reducing some β-redex in s. For instance, we have

x (λx.s)t ≻ st

2 1.2 Substitution and Beta Reduction for a top level β-reduction. The formal definition of β-reduction is inductive.

s ≻ s′ s ≻ s′ t ≻ t′ x ′ ′ ′ (λx.s)t ≻ st λx.s ≻ λx.s st ≻ s t st ≻ st

We write s ≻n t if t can be obtained from s with nβ-reductions, and s ≻∗ t if s ≻n t for some n ≥ 0 (...reflexive transitive closure). Here are examples.

ωω ≻ ωω SKK ≻2 λx.Kx(Kx) ≻ I

The first example shows that β-reduction does not always terminate. This is to be expected for a Turing-complete system. A term is normal if it cannot be β-reduced. Obviously, a term is normal if and only if it contains no β-redex. While the combinators I, K, S, and ω are normal, the combinator Ω is not. Given two terms s and t, we say that s evaluates to t and write s ⇓ t if s ≻∗ t and t is normal. If s evaluates to t, we say that t is a normal form of s. Note that the term Ω has no normal form. A main result about the lambda calculus says that every term has at most a normal form. An interpreter for the lambda calculus is an algorithm that given a term com- putes a normal form of the term whenever there is one. From the term KIΩ we learn that the naive strategy that reduces some beta redex as long as there is one does not suffice for an interpreter (since KIΩ ≻ KIΩ but also KIΩ ≻ I). It is known that the strategy that always reduces the leftmost outermost β-redex finds a normal form whenever there is one. In Coq we have β-redexes and β-reduction for typed terms. The typing en- sures that β-reduction always terminates. Thus every term has a normal form in Coq. We say that a term is weakly normalizing if it has a normal form. A term is strongly normalizing if there is no infinite β-reduction chain issuing from it. The term KIΩ is weakly normalizing but not strongly normalizing. If a term is strongly normalizing, we can compute a normal form of it by just reducing β-redexes as long as we can. We can define strong normalization inductively.

∀t. s ≻ t → SN t SN s

According to this definition, normal terms are strongly normalizing since they do not have successors and thus all their successors are strongly normalizing.

3 1 Untyped Lambda Calculus, First Encounter

Exercise 1 (Substitution) Convince yourself that the following equations hold.

x x x (st)u = su tu x sx = s x st = s if x not free in s y y (λx.s)t = λx.st if x ≠ y and x not free in t x λx.s = λy.sy if y not free in s

1.3 Beta Equivalence

The lambda calculus comes with a canonical equivalence relation on terms. In- formally, two closed terms are equivalent if and only if they describe the same function. We leave the notion of described function informal and define equiva- lence of terms formally. Beta equivalence is the equivalence closure β-reduction. The inductive defi- nition of β-equivalence looks as follows.

s ≻ t s ≡ t s ≡ t t ≡ u s ≡ t s ≡ s t ≡ s s ≡ u

Beta equivalence satisfies s ≻∗ t → s ≡ t and the congruence laws

s ≡ s′ → λx.s ≡ λx.s′ s ≡ s′ → t ≡ t′ → st ≡ s′t′

Another important property of β-equivalence is substitutivity.

x x s ≡ t → su ≡ tu A nontrivial property of β-equivalence is the Church-Rosser property:

s ≡ t → ∃u. s ≻∗ u ∧ t ≻∗ u

Thus two terms are β-equivalent if and only if there is a term to which they both reduce. Here are three important consequences of the Church-Rosser property (all of them have straightforward proofs).

s ≡ t → s ⇓ u → t ⇓ u s ≡ t → t normal → s ⇓ t s ≡ t → s, t normal → s = t

4 1.4 Church Numerals

These facts are of great importance for computational correctness proofs. If we want to argue that a term s evaluates to a term t or that two terms are equivalent as it comes to evaluation, we can use equational reasoning based on β-equivalence. This is much easier than arguing about β-reduction directly. Beta equivalence is an example of what is called a convertibility relation. We will say that two terms are convertible if they are β-equivalent. Beta equivalence does not satisfy the so-called η-law:

λx.sx ≡ s if x is not free in s

This is in contrast to the convertibility relation underlying Coq, which satisfies a typed version of the η-law. It is possible to consider untyped lambda calculus with both β- and η-reduction. For now, we will just consider β-reduction.

Exercise 2 Prove the following equivalences. a) SKK ≡ I b) BCCfxy ≡ f xy

1.4 Church Numerals

Church encoded a n as a function iterating a given function n- times:

0 := λfx.x b 1 := λfx.fx b 2 := λfx.f(fx) b n := λfx.f nx b We call the term n the Church numeral for n. Note that the Church numerals are normal combinators.b This ensures that numerals for different numbers are not β-equivalent (Church-Rosser property). Sometimes it is helpful to think of a numeral n as an operator λf.f n that applied to a function f yields the function f n. b

1.4.1

We express the successor function as the following normal combinator.

succ := λnfx.f(nfx)

5 1 Untyped Lambda Calculus, First Encounter

The proof of the correctness statement

succ n ⇓ Sn b d is straightforward:

succ n ≻ λfx. f(nfx) ≻2 λfx. f(f nx) = Sn b b d At this point an explicit definition of the notation snt is helpful.

s0t := t sSnt := s(snt)

We have the following equivalence for the successor function (f and g are dis- tinct variables).

succ f g ≡ g ◦ f g

1.4.2 Addition, Multiplication, Exponentiation

The following equations fully characterize addition, multiplication, and exponen- tiation of natural numbers.

0 + n = n 0 · m = 0 m0 = 1 Sm + n = S(m + n) Sm · n = n + m · n mSn = m · mn

We refer to these equations as Dedekind equations. They provide the basis for our encoding of addition, multiplication, and exponentiation.

add := λm.m succ mul := λmn.m(add n) 0 b exp := λmn.n(mul m) 1 b One can show that add, mul, and exp satisfy the Dedekind equations for Church numerals modulo β-equivalence (e.g., add 0 n ≡ n). With natural induction we b can then prove the following equivalences: b b

add m n ≡ m + n c b Æ mul m n ≡ m · n c b Æ exp m n ≡ mn c b d Since the right hand sides of the equivalences are normal, we obtain the following correctness properties with the Church-Rosser property.

add m n ⇓ m + n c b Æ mul m n ⇓ m · n c b Æ exp m n ⇓ mn c b d

6 1.5 Pairs, Predecessor, and Primitive Recursion

1.4.3 More Equivalences for Church Numerals

One can show the following β-equivalences.

m + n ≡ λf . mf ◦ nf Æ c b m · n ≡ m ◦ n Æ c b The equivalences can be used to encode addition and multiplication. In the lambda calculus with β- and η-reduction, we have the following equivalence for exponentiation.

mn ≡ nm d This equivalence does not hold without η-reduction:

m0 = 1 = λfx.fx 6≡ λx.x ≺ 0m d b b

1.5 Pairs, Predecessor, and Primitive Recursion

At first, writing a predecessor function for Church numerals seems difficult. The trick is to iterate on pairs. We start from the pair (0, 0) and iterate n-times to obtain the pair (n,n − 1).

(0, 0) ⇝ (1, 0) ⇝ (2, 1) ⇝ ··· ⇝ (n,n − 1)

1.5.1 Pairs

We encode pairs as follows.

pair := λxyf.fxy fst := λp.p(λxy.x) snd := λp.p(λxy.y)

The following equivalences are easy to prove:

fst (pair xy) ≡ x snd (pair xy) ≡ y

7 1 Untyped Lambda Calculus, First Encounter

1.5.2 Primitive Recursion

Primitive recursion is a definition scheme for functions on the natural numbers introduced by Peano in 1889 as a compagnon to natural induction. We will define a primitive recursion combinator prec satisfying the equivalences

prec x f 0 ≡ x b prec x f Sn ≡ f n(prec x f n) d b b The trick is to iterate on pairs:

(0,s) ⇝ (1, t 0 s) ⇝ (2, t 1 (t 0 s)) ⇝ ··· b b b b b b This leads to the following definition.

a := λx. pair 0 x b step := λf p. pair (succ (fst p)) (f (fst p) (snd p) prec := λxf n. snd (n (step f) (ax))

Showing the first correctness equivalence for prec is easy. For the second cor- rectness equivalence we need the following lemma.

Sn (step f) (ax) ≡ pair Sn (f n(snd (n(step f) (ax)))) d d b b The lemma follows by induction on n.

1.5.3 Predecessor

The predecessor operation can now be expressed with primitive recursion.

pred := prec 0 (λxy.x) b The correctness proof is straightforward using the correctness equivalences for prec.

pred 0 ≡ 0 b b pred Sn ≡ n d b

1.6 Church Numerals in Coq

We will now represent Church numerals and their operations in Coq and prove the correctness of the operations. This will deepen our understanding of Church numerals and raise some interesting issues about Coq. Since Coq is typed we must represent Church numerals as typed functions. We represent Church numerals as members of the type

8 1.6 Church Numerals in Coq

Definition Nat : Prop := ∀ X : Prop, (X → X) → X → X.

It is crucial that the variable X ranges over propositions rather than general types. This will be explained later. We define a function N that maps a number n to the numeral n. b Definition zero : Nat := fun Xfx ⇒ x. Definition succ : Nat → Nat := fun nXfx ⇒ f (n X f x). Definition N : nat → Nat := fun n ⇒ nat_iter n succ zero.

Use the command Compute N 7 to see the Church numeral for 7. Following the Dedekind equations, we express addition, multiplication, and exponentiation as follows.

Definition add : Nat → Nat → Nat := fun m ⇒ m Nat succ. Definition mul : Nat → Nat → Nat := fun m n ⇒ m Nat (add n) (N 0). Definition exp : Nat → Nat → Nat := fun m n ⇒ n Nat (mul m) (N 1).

We can now prove the following correctness statements.

• succ (N n) = N (S n) • add (N m) (N n) = N (m + n).

• mul (N m) (N n) = N (m * n). • exp (N m) (N n) = N (pow m n).

All proofs are straightforward and are based on the characteristic equations for the operations, which hold by conversion. The proofs for addition and multipli- cation are by induction on m, and the proof for exponentiation is by induction on n, as one would expect from the definitions. The correctness proof for addi- tion looks as follows.

Lemma add_correct m n : add (N m) (N n) = N (m + n). Proof. induction m; simpl. − reflexivity. − change (add (N (S m)) (N n)) with (succ (add (N m) (N n))). now rewrite IHm. Qed.

1.6.1 Predicativity of Coq’s Universe Type

We now explain why the definition

Definition Nat : Type := ∀ X : Type, (X → X) → X → X.

9 1 Untyped Lambda Calculus, First Encounter does not work. The reason is that Type is a predicative universe. This means that function of a type A := ∀X : Type. s can only be applied to types that are smaller than A. In particular, if f is a function of type A, then f cannot be applied to A. As a consequence, the definition

Definition add : Nat → Nat → Nat := fun m ⇒ m Nat succ. will not type check since m : Nat is applied to Nat. In contrast, Prop is an impredicative universe where a size restriction on types does not exist. The universe Type can not be made impredicative since this would result in an inconsistent system where False is provable. This is a basic fact of logic that cannot be massaged away. Since Nat is a proposition, it follows that we express numerals and operations on numerals as proofs. So our representation of Church numerals shows that Coq’s proof language has considerable computational power. Since numerals are proofs, we cannot show in Coq that the embedding function N : nat → Nat is injective (because of the elim restriction). Nevertheless, we can observe from the outside that N yields different numerals for different numbers.

1.6.2 Church Exponentiation

Since Coq has η-conversion, we can prove the equation mn = mn and use it to obtain an exponentiation function. We speak of Church exponentiationd c b . For the proof to go through, we need suitable equations for multiplication and addition. The following works.

Definition add : Nat → Nat → Nat := fun mnXfx ⇒ mXf(nXfx). Definition mul : Nat → Nat → Nat := fun mnXf ⇒ m X (n X f). Definition exp : Nat → Nat → Nat := fun mnX ⇒ n (X → X) (m X). Lemma add_correct m n : N (m + n) = add (N m) (N n). Lemma mul_correct m n : N (m * n) = mul (N m) (N n). Lemma exp_correct m n : N (pow m n) = exp (N m) (N n).

Interestingly, the above encodings of addition, multiplication, and exponentia- tion will type check if Nat is defined with Type rather than Prop since they do not require an application of a numeral to the type Nat. However, once we en- code the predecessor operation, the typing problem will reoccur and cannot be avoided.

10 1.7 Fixed Point Combinators and Recursive Functions

1.7 Fixed Point Combinators and Recursive Functions

A fixed point of a function f is an argument x such that fx = x. A fixed point combinator is a combinator R such that

Rs ≡ s(Rs) for every term s. We can see a fixed point combinator as a function that yields a fixed point for every function of the lambda calculus. Turing defined a fixed point combinator T as follows:1

A := λxf. f(xxf) Θ := AA

Since Θf ≻3 f(Θf), we have that Θ is a fixed point combinator. It follows that every function of the lambda calculus has a fixed point. With a fixed point combinator we can construct recursive functions. To do so, we represent the desired recursive function as an ordinary function F := λfx.s where f serves as the name of the recursive function. We call F the functional for the recursive function. Given a fixed point combinator R, the term RF describes the desired recursive function:

f RF ≡ F(RF) ≡ λx.sRF

This straightforward construction of recursive functions may be surprising at first. What is technically needed are functional arguments and self application of functions.

Exercise 3 Prove that the following term is a fixed point combinator.

Y := λf. (λx.f(xx))(λx.f(xx))

Exercise 4 Prove that no normal fixed point combinator exists. The following facts are helpful (WN means weakly normalizing).

WN s → WN (sx) s ⇓ t → xs ⇓ xu

Exercise 5 Let s be a term and f and x be distinct variables. Find a term t such f that tx ≡ st . Prove that your term t satisfies the equivalence. One says that t solves the equation fx = s for the function variable f . Note that the existence of t justifies the usual equational form of recursive definitions.

1 Barendregt [1] writes Θ for T .

11 1 Untyped Lambda Calculus, First Encounter

1.8 Scott Numerals

Church numerals are not the simplest representation of numbers. Given that we have unrestricted functional recursion, it suffices that the functions represent- ing the numbers provide for the case analysis coming with natural numbers (as match does in Coq). In fact, it suffices that the numerals satisfy the following equivalences (n is the numeral for n):

0 xf ≡ x Snxf ≡ f n

Note that the Church numerals do not satisfy these equivalences. However, the definition of numerals satisfying the equivalences is straightforward:

0 := λxf.x Sn := λxf. f n

We call these numerals Scott numerals in honor of Dana Scott who identified the scheme for general constructor types (the trick is always to model the match coming with the constructor type). Note that Scott numerals are normal combi- nators, and that Scott numerals for different numbers are different normal terms. It is straightforward to define functions that yield successors and predecessors.

succ := λnxf. fn pred := λn. n 0 I

The equivalence pred (succ s) ≡ s follows for every term s by β-reduction:

pred (succ s) ≻ succ s 0 I ≻3 Is ≻ s

Following the Dedekind equations, we can now define addition, multiplication, and exponentiation as follows:

add := Θ (λfmn. mn(λm′. succ (f m′n))) mul := Θ (λfmn. m 0 (λm′. add n(fm′n))) b exp := Θ (λfmn. n 1 (λn′. mul m(fmn′))) b Exercise 6 Verify that our representation of pairs is Scott style.

Exercise 7 Do the following for Scott numerals. a) Verify the correctness of addition, multiplication, and exponentiation. b) Define and verify a factorial function.

12 1.9 SK-Terms and SK-Reduction c) Define and verify a higher-order function for primitive recursion.

Exercise 8 Represent lists in Scott style. a) Define nil and cons. b) Define and verify hd and tl. c) Define and verify length and concatenation.

1.9 SK-Terms and SK-Reduction

SK-terms are defined inductively: 1. Variables are SK-terms. 2. K and S are SK-terms. 3. st is an SK-term if s and t are SK-terms.

Theorem 1 Every term is equivalent to an SK-term.

The proof of the theorem is based on an abstraction operator that for a vari- able x and an SK-term s yields an SK-term xs such that xs ≡ λx.s. The abstrac- tion operator is defined by structural recursion on SK-terms:

xx := SKK xs := Ks if x not free in s xst := S xs xt otherwise

Exercise 9 Prove xs ≡ λx.s for every variable x and every SK-term s.

We now define a translation operator [s] translating every term into an equiv- alent SK-term by structural recursion on terms.

[x] := x [st] := [s][t] [λx.s] := x[s]

Exercise 10 Let s be a term. Prove that [s] is an SK-term such that [s] ≡ s.

Weak reduction s ≻w t is a binary relation inductively defined on terms:

′ ′ s ≻w s t ≻w t ′ ′ Kst ≻w s Sstu ≻w su(tu) st ≻w s t st ≻w st

Here are facts about weak reduction.

13 1 Untyped Lambda Calculus, First Encounter

• Weak reduction does not involve substitution. • Weak reduction cannot happen below lambda (i.e., inside abstractions). • Weak reduction applies only to β-redexes of the form Kst or Sstu. 2 3 • If s ≻w t, then s ≻ t or s ≻ t.

• If s ≻w t and s is an SK-term, then t is an SK-term. In other words, SK-terms are closed under weak reduction.

∗ Exercise 11 Find two terms s and t such that s ≻ t but not [s] ≻w [t]. Hint: Consider λx.Ix.

1.10

Suppose we just consider SK-terms and weak reduction. It turns out that this subsystem of the lambda calculus is still Turing-complete. Moreover, in this system there is no need for lambdas, all we need are two constants taking the role of the combinators S and K. The resulting system is known as combinatory logic (CL).2 The terms of the CL are defined as follows.

s, t ::= x | K | S | st (x ∈ N)

Note that S and K are now constants rather than λ-terms. The reduction relation for combinatory logic is defined like weak reduction.

s ≻ s′ t ≻ t′ Kst ≻ s Sstu ≻ su(tu) st ≻ s′t st ≻ st′

Equivalence of CL-terms is defined analogously to the lambda calculus.

s ≻ t s ≡ t s ≡ t t ≡ u s ≡ t s ≡ s t ≡ s s ≡ u

As in the lambda calculus, equivalence in CL satisfies the Church-Rosser prop- erty. Thus normal forms are unique and correctness proofs in CL have the struc- ture familiar from lambda calculus. Moreover, leftmost outermost reduction yields a normal form whenever there exists one.

2 CL originated in Haskell Curry’s doctoral dissertation in 1930 (supervised by Hilbert in Göttin- gen). Some of the ideas including the combinators S and K already appeared in 1924 in a very readable paper by Moses Schönfinkel [3].

14 1.10 Combinatory Logic

Since CL has no binders, the definition of substitution is straightforward.

x xu := u x yu := y if x ≠ y x Ku := K x Su := S x x x (st)u = su tu

Reduction and equivalence in CL are substitutive. We define combinators describing identity and composition.

I := SKK B := S(KS)K

The reductions verifying the correctness of our definitions are easy to show.

Ix ≻∗ x Bxyz ≻∗ x(yz)

We represent the natural numbers in Scott-style.

0 := S(KK)I Sn := K(SI(K n))

The correctness of the representation is easy to verify.

0 xf ≻∗ x Snxf ≻∗ f n

There is also a fixed point combinator.

ω := SII A := B(SI)ω Θ := AA

We have

ωx ≻∗ xx Θf ≻∗ f(Θf)

15 1 Untyped Lambda Calculus, First Encounter

1.11 Abstraction Operator for CL

We have already seen that S and K can simulate abstraction in the lambda calcu- lus. Thus it is not surprising that we can define an abstraction operator for CL. We define an abstraction operator for CL following Hindley and Seldin [2].3

xx := I xs := Ks if x does not occur in s xsx := s if x does not occur in s xst := S xs xt otherwise

Note that x does not occur in the term xs. The defined abstraction in CL be- haves much like the native lambda abstraction in lambda calculus. We have the following [2].

x ∗ x ( s)t ≻ st ∗ x1... xn (x1... xn s)t ... t ≻ s if x ... x are distinct variables not appearing in t ... t 1 n t1... tn 1 n 1 n x y x y ( s)u = (su ) if x ≠ y and x does not occur in u

There is an important difference between native abstraction in the lambda calcu- lus and defined abstraction in CL. Native abstraction satisfies the so-called ξ-law

s ≡ t → λx.s ≡ λx.t while defined abstraction does not. For instance, with

s := Sxyz t := xz(yz) we have

xs = S(SS(Ky))(Kz) xt = S(SI(Kz))(K(yz))

Thus s ≡ t and xs 6≡ xt in CL (since xs and xt are normal and different). Speaking operationally, the failure of the ξ-law means that we cannot reduce inside defined abstractions in CL. We remark that functional programming lan- guages like ML or Haskell do not provide for reduction inside their native ab- stractions. So the failure of the ξ-law does not affect the Turing-completeness of CL.

3 There are many possible abstraction operators for CL. In our definition, the third equation may be omitted and the second equation may be restricted to variables and the constants K and S. Both modifications result in larger terms. The third equation can be seen as an η-reduction.

16 1.12 Church Numerals in CL

We can compile the terms of the lambda calculus into terms of CL. However, this does not provide a faithfull implementation of β-reduction. For instance, we have λx.Kxx ≻∗ λx.x but xKxx = SK(SKK) and xx = SKK are different normal forms. The problem is that lambda calculus provides for reductions inside abstractions and that the translation with S takes away redexes. One may consider weak β-reduction that cannot take place within abstrac- tions. However, such a weak lambda calculus is crippled since it does not en- joy the Church-Rosser property. For instance, the term K(II) would have two normal forms in such a calculus (λx.I and λx.II). Recall that CL enjoys the Church-Rosser property.

x ∗ x Exercise 12 Prove ( s)t ≻ st in CL.

1.12 Church Numerals in CL

The canonical translation of Church numerals to CL does not work since succ 0 ≻3 1 requires two β-reductions inside an abstraction. Thus using the canonical translation to CL, succ 0 will not reduce to 1. The problem can be circumvented by using the following definitions in CL.

zero := KI succ := SB n := succnzero b Exercise 13 Prove nfx ≻∗ f nx in CL. b Exercise 14 Prove the following in lambda calculus using the definitions from CL.

zero ≻∗ λfx.x succ ≻∗ λnfx. f(nfx) n ≻∗ λfx. f nx b

1.13 Call-by-Value Reduction

We return to the lambda calculus and consider a restricted version of β-reduction known as call-by-value reduction (CBV). In this context a value is a term of the form λx.s. The inductive definition of CBV is as follows.

′ ′ t is a value s ≻V s s is a value t ≻V t x ′ ′ (λx.s)t ≻V st st ≻V s t st ≻V st

17 1 Untyped Lambda Calculus, First Encounter

Every normal term is CBV-normal, but (λx.xx)x and λx.II are examples of CBV- normal terms that are not normal. CBV-reduction is deterministic, that is, if ′ ′ s ≻V t and s ≻V t , then t = t . The lambda calculus with cal-by-value reduction is still Turing-complete. Numbers can be represented with Scott numerals (see Section 1.8):

0 := λxf.x Sn := λxf. f n

The successor and predecessor functions

succ := λnxf. fn pred := λn. n 0 I still reduce as they should

succ n ≻V Sn

pred (succ n) ≻V n

There is also a combinator providing for recursive functions.

AV := λxf. f(λy.xxfy)

ΘV := AV AV

For every abstraction s we have

Θ 2 Θ V s = AV AV s ≻V s(λy. AV AV sy) = s(λy. V sy)

So ΘV is a fixed point combinator for abstractions up to η-conversion. This suffices for recursive functions since ΘV s and λy. ΘV sy do the same for all arguments. Combinators like ΘV are often called call-by-value fixed point com- binators. Call-by-value reduction is more complex than unrestricted β-reduction since the elegant interplay between β-reduction and β-equivalence is lost. The reason for considering CBV-reduction is that is used in most execution-oriented pro- gramming languages. We will return to CBV-reduction when we consider the typed programming language PCF.

Exercise 15 Compute the normal forms of the following terms in CBV-lambda calculus. a) II(II) b) K(ω(λx.yΩ))

18 1.14 Call-by-Name Reduction

1.14 Call-by-Name Reduction

There is also call-by-name reduction (CBN), which is defined as follows.

′ s ≻N s x ′ (λx.s)t ≻N st st ≻N s t

Θ Θ ∗ Θ With CBN-reduction, is still a fixed point combinator, that is, f ≻N f( f). Moreover, the basic operations on Scott numerals work as expected:

0 xf ≻N x

Snxf ≻N f n

succ n ≻N Sn

pred (succ n) ≻N n

However, since argument terms are never evaluated, there is a problem with nested operations:

succ (succ 0) ≻N λxf.f(succ 0) ≠ 2

We conclude that call-by-name reduction does not make sense for pure lambda calculus.

1.15 Summary

We have considered three different untyped reduction systems: • λβη, lambda calculus with β- and η-reduction. • λβ, lambda calculus with β-reduction. • CL, combinatory logic with K- and S-reduction. All three systems satisfy the Church-Rosser property, where equivalence in each case is defined as the equivalence closure the reduction relation. Reduction can take at every subterm, which makes equivalence congruent (i.e., compatible with the term structure). This setup provides for equational correctness proofs. Com- putation reduces a term until a normal form is reached. If always the leftmost- outermost redex is reduced, a normal form will be reached if there exists one. The Church-Rosser property ensures that the reachability of a normal form is preserved by arbitrary reduction steps. All three systems compute with functions and only with functions. All three systems have a fixed point combinator R such that Rf ≻∗ f(Rf). Such every definable function has a fixed point.

19 1 Untyped Lambda Calculus, First Encounter

All three systems can define functions in equational style:

fx1 ... xn := s

This means that there is an algorithm that giben the equation constructs a term t such that the equivalence f tx1 ... xn ≡ st holds. In the nonrecursive case

t := λx1 ... xn.s does the job. In the recursive case

t := R(λfx1 ... xn.s) does the job where R is some fixed point combinator. In CL, the lambdas used above can be simulated with an abstraction operator. Constructor-based data structures can be represented with Scott’s encoding. For numbers, pairs, and lists we obtain the following.

zero := λxf.x succ a := λxf .f a pair a b := λf .f ab nil := λxf.x cons a b := λxf .f ab

The trick is that a constructor yields the match for the constructed value, where the match is represented as a function taking a “continuation” argument for every constructor. Thus reduction can in fact perform the match. One can also verify that the constructors are injective on normal forms. Moreover, different constructors for the same data structure always yield different results . From the above it is clear that all three systems are Turing-complete. Never- theless, the tree systems differ in reduction power, where CL is weaker than λβ, and λβ is weaker than λβη. For instance, Church’s exponentiation works only in λβη, and the canonical translation of Church numerals does not work in CL.

20 1.15 Summary

The idea underlying the Church numerals extends to constructor-based data structures in general. Here are the encodings fur numbers, pairs, and lists.

zero := λxf.x succ a := λxf.f(axf) pair a b := λf .f ab nil := λxf.x cons a b := λxf.fa(bxf)

We have switched the argument order for numbers to reveal the similarity with Scott’s encoding. Church’s encoding refines Scott’s encoding in that “recursive” constructor arguments are committed to the continuations given for the match. This way Church’s encoding builds in a particular form of primitive recursion.

21

2 Inductive Predicates

Inductive definitions of predicates are often informally presented by giving in- ference rules for atomic propositions obtained with the predicate. In Coq’s type theory inductive definitions are a basic feature. In this chapter we present the intersection model for inductive predicates, which obtains inductive predicates without using inductive definitions. By studying the intersection model we obtain a deeper understanding of inductive predicates and the accompanying induction principles. The intersection model for inductive predicates refines the intersection model for the inductive definitions of sets. The standard reference for inductive defini- tions in set theory is a 1977 handbook chapter by Peter Aczel [?]. There is an accompanying Coq development formalizing the definitions and proofs of this chapter.

2.1 Predicates and Relations

Predicates in type theory correspond to relations in set theory. A predicate is a function

X1 →···→ Xn → Prop while a relation is a subset of

X1 ×···× Xn

A main difference between predicates and relations is that relations are exten- sional while predicates are not. That is, we may have different predicates p and q that are equivalent:

p ≈ q := ∀x1 ... xn. px1 ... xn ↔ qx1 ... xn

In Coq it is consistent to assume that equivalent predicates are equal, but we will not make this assumption. We define subsumption of predicates as follows.

p≼q := ∀x1 ... xn. px1 ... xn → qx1 ... xn

23 2 Inductive Predicates

For relations R and S, subsumption is simply set inclusion R ⊆ S. Note that equivalence of predicates is an equivalence relation and that subsumption of predicates is a preorder (i.e., a reflexive and transitive predicate).

2.2 Even Numbers

We present the intersection model at the example of an inductive predicate hold- ing for the even natural numbers. We present the definition of the predicate using inference rules.

even n even 0 even (S(S n))

Formally, the definition give us a predicate

even : nat → Prop and three base lemmas

B1 : even 0 B2 : ∀n. even n → even (S(S n)) BI : ∀p. p0 → (∀n. even n → pn → p(S(S n))) → even  p

We say that the lemmas B1 and B2 are the introduction principles and the lemma BI is the induction principle coming with the definition. We call the lemmas B1, B2, and BI the base lemmas for the predicate even. As it comes to formal proofs, we make the crucial assumption that all we know about even are the base lemmas. The introduction lemmas B1 and B2 allow us to derive propositions even n according to the inference rules. The induction lemma BI allows us to show that even subsumes a given predicate p. To do so, we have to show for each inference rule that p propagates from the recursive premises of the rule to the conclusion of the rule. The second rule for even has a single premise which is recursive. The propagation condition for this rule is

∀n. even n → pn → p(S(S n)) where the premise pn is the so-called inductive hypothesis. We get an inductive hypothesis for every recursive premise of a rule.

24 2.2 Even Numbers

2.2.1 Inductive Definitions as Specifications

We can see the base lemmas as a specification

spec : (nat → Prop) → Prop of the predicate even. This leaves open how the predicate even is obtained. We can ask whether the specification has a unique solution (up to equivalence) and whether we can construct a solution of the specification not using a native inductive definition. The answer to both questions is yes. We formulate the specification for even with three defining predicates that abstract over the base lemmas.

D1 q := q0 D2 q := ∀n, qn → q(S(S n)) DI q := ∀p. p0 → (∀n. qn → pn → p(S(S n))) → q  p spec q := D1 q ∧ D2 q ∧ DI q

Showing that the specification has at most one solution (up to equivalence) is straightforward. Moreover, we can show that the intersection

I := λn. ∀p. D1 p → D2 p → pn of all predicates satisfying the introduction principles is a solution of the spec- ification. Showing that I satisfies D1 and D2 is straightforward. To show that I satisfies the induction principle DI one first shows that I satisfies the property

DL q := ∀p. D1 p → D2 p → q  p

DL I is almost the induction principle DI I we want to show. In fact, we obtain a proof of DI I by applying the proof of DL I to the predicate λn. In ∧ pn. That a predicate q satisfies D1, D2, and DL means that q is the least predicate satisfying D1 and D2. The predicates DI and DL both formulate induction principles. In fact, we can specify even with either DI or DL since we have

DI p → DL p D1 p → D2 p → DL p → DI p

We call ML the pure induction principle and MI the augmented induction prin- ciple.

25 2 Inductive Predicates

We speak of the intersection model for inductive predicates since we construct the inductive predicate as the intersection of all predicates satisfying the intro- duction principles. The intersection model is commonly used in to obtain the least set satisfying given closure properties. It is important to understand that the predicates D1, D2, DI, and DL can be obtained algorithmically from the defining inference rules. If we define even with an inductive definition in Coq, the value constructors of the definition provide the base lemmas B1 and B2. The remaining base lemma BI (the augmented induction principle) will be provided under the name even_ind. Coq’s induction tactic is a convenient way to apply BI. Coq establishes even_ind with a proof term using the fix and match coming with the inductive defini- tion of even. If even is defined inductively, we can apply the tactics destruct and inversion to assumptions for propositions obtained with even. The uses of destruct and inversion can always be simulated with the induction principle BI. What we have shown at the example of even will carry over to all inductive definitions accepted by Coq.

Exercise 16 Study the intersection model for inductive predicates at the exam- ple of the evenness predicate using Coq. a) Define predicates D1, D2, and DI such that

spec q := D1 q ∧ D2 q ∧ DI q

is a specification for evenness predicates. b) Show that the specification spec has at most one solution up to equivalence. c) Define DL and prove

DI p → DL p D1 p → D2 p → DL p → DI p d) Define the predicate

even n := ∀p. D1 p → D2 p → pn

and show that it satisfies the specification spec. e) Prove the following facts. i) even 4 ii) ¬even 1 iii) even (S(S n)) → even n iv) even n → ¬even (Sn)

26 2.3 Pencil Notation for Inductive Proofs

Hint: Use the tactic refine to apply the induction principle. Note that (ii) and (iv) can be shown with the induction principle L, while (iii) requires the induction principle BI. f) Define an evenness predicate using an inductive definition and prove that it satisfies the specification spec.

2.3 Pencil Notation for Inductive Proofs

We prove three claims about the predicate even to demonstrate a pencil notation for inductive proofs. Make sure you understand that the proofs only use the induction principle BI for even.

Proof (1) ¬even 1 ∀n. even n → n ≠ 1 Induction on even 1. Claim trivial 0 ≠ 1 2.A : even n S(Sn) ≠ 1 IH : n ≠ 1

Claim trivial 

The proof first equivalence transforms the claim so that the induction principle can be used to do case analysis on the derivation of even 1. The inductive hy- pothesis is not used. In Coq the tactic remember can be used for the equivalence transformation of the initial claim.

Proof (2) ∀n. even (S(S n)) → even n ∀k. even k →∀n. k = S(S n) → even n Induction on even 1. Claim trivial ∀n. 0 = S(S n) → even n 2.A : even k S(Sk) = S(S n) → even n IH : ∀n. k = S(S n) → even n

Claim follows 

As before, the proof uses the induction principle to do a case analysis. The inductive hypothesis is not used. We have given the inductive hypotheses in the proofs above so that you know what it is. In the next pencil proof the use of the inductive hypothesis is essential.

27 2 Inductive Predicates

Proof ∀n. even n → ¬even (S n) Induction on even 1. Claim follows with (1) ¬even 1 2.A : even n ¬even (S(S(S n))) IH : ¬even (S n)

Claim follows with (2) and IH 

Exercise 17 Simulate the above pencil proofs with Coq.

2.4 Exercises

Exercise 18 (Reachability) Let a type X, a predicate R : X → X → Prop and a point a : X be given. We define a predicate “R can reach a from x” inductively:

Rxy reach R a y reach R a a reach Rax a) Define the predicate reach R a with the intersection method in Coq and show that it satisfies the base lemmas coming with the inference rules. b) Define R∗ with a native inductive definition in Coq and prove R∗ ≈ reach R.

Exercise 19 (Termination) Let a type X and a predicate R : X → X → Prop be given. We define a predicate “R terminates on x” inductively:

Rx  ter R ter R x a) Define the predicate ter R with the intersection method in Coq and show that it satisfies the base lemmas coming with the inference rule. b) Define SN R with a native inductive definition in Coq and prove SN R ≈ ter R.

Exercise 20 (Knaster-Tarski) Let X be a type and F : (X → Prop) → (X → Prop) be a monotone predicate (i.e., ∀p q. p  q → Fp  Fq). Find a predicate I such that you can prove the following. The intersection p⊓q abbreviates the predicate λx. px ∧ qx. a) Fp  p → I  p b) FI  I c) FI ≈ I

28 2.4 Exercises d) p  I → Fp  I e) F(I ⊓ p)  I f) F(I ⊓ p)  p ↔ I  p Hint: The problem is a translation of a special case of the Knaster-Tarski fixed point theorem from set theory to type theory. Google to find out more about the theorem and its proof. The proof of the Knaster-Tarski theorem is a classical example for the use of the intersection method in Mathematics.

29

3 Abstract Reduction Systems

We are aiming at proving the Church-Rosser property of combinatory logic and lambda calculus. The proofs factorize into general results about abstract re- duction systems and certain verifications for the concrete systems. An abstract reduction system is a reduction system where the notion of term is kept abstract. The architecture of the confluence proofs can be fully developed at the level of abstract reduction systems, and this will be the topic of this chapter. We will develop our definitions both in set theory and type theory. The type- theoretic definitions are refinements of the set-theoretic definitions. There is an accompanying Coq development formalizing all definitions and proofs.

3.1 Basic Definitions

Set-theoretically, an abstract reduction system (ARS) consists of a set X and a relation R ⊆ X × X. We can see an ARS as a possibly infinite directed graph. Following this view, we call the elements of X nodes and the elements of R edges. If Rxy, we say that X is a predecessor of y, and that y is a successor of x. Type-theoretically, an abstract reduction system consists of a type X and a predicate R : X → X → Prop.

3.2 Reflexive Transitive Closure

Given an ARS, we want to define the reflexive transitive closure R∗ of R. Note that this amounts to an operator mapping a relation R to a relation R∗. There are many different definitions of R∗. While all definitions yield the same rela- tion or equivalent predicates, the particular definition chosen matters since it contributes proof principles. In particular, we want a simple yet powerful induc- tion principle for R∗. We choose the following inductive definition, called linear definition in the following.

Rxy R∗yz R∗xx R∗xz

31 3 Abstract Reduction Systems

The canonical induction principle for this definition has one case for each rule.

py → (∀xy. Rxy → R∗yz → py → px) → ∀x. R∗xy → px

The first rule yields a base case. The second rule yields an inductive step with a single inductive hypothesis. We formalize the definition of R∗ as follows in Coq. Variable X : Type. Implicit Types R : X → X → Prop. Inductive star R : X → X → Prop := | starR x : star R x x | starCxyz : R x y → star R y z → star R x z. Given this definition, Coq generates an induction principle quantifying over both arguments of R∗ (check star_ind):

(∀x. pxx) → (∀xyz. Rxy → R∗yz → pyz → pxz) → ∀xy. R∗xy → pxy

This induction principle is unnecessarily complex. The reason is that the argu- ment y of R∗xy is a uniform parameter mathematically but is not declared as such by our definition. In fact, in Coq we can make y a uniform parameter only if we switch the argument order of R∗ (because parameters have to come first). Given this situation, we decide to work with the induction principle Coq gener- ates. It subsumes the canonical induction principle shown above but produces unnecessarily complex inductive hypotheses. In the above inductive definition of star it is possible to make the argument x of star R x y a non-uniform parameter. This leads to more compact matches and destructs but does not change the induction principle generated. Exercise 21 Establish the canonical induction principle for R∗ in Coq. Lemma star_canonical_ind R (p : X → Prop) y : p y → (∀ x x’, R x x’ → star R x’ y → p x’ → p x) → ∀ x, star R x y → p x.

Exercise 22 Show in Coq that the two induction principles for R∗ are equivalent. To do so, assume a type X and two relations R and S in a section. Then formulate the two induction principles (S taking the role of R∗) and prove them equivalent. Interestingly, the equivalence does not depend on the rules (i.e., constructors) defining R∗.

32 3.2 Reflexive Transitive Closure

Exercise 23 Given X and R, show that the constructors and the induction princi- ple uniquely determine R∗. To do so, assume X and R and two relations S and S′ with corresponding constructors and induction principles and show that S and S′ are equivalent.

Exercise 24 Establish the following right-to-left induction principle for R∗.

Lemma star_right_ind R (p : X → Prop) x : p x → (∀ y y’, star R x y’ → p y’ → R y’ y → p y) → ∀ y, star R x y → p y.

3.2.1 Proof of Transitivity

From the definition of R∗ it is clear that R∗ is reflexive (1st rule) and subsumes R (2nd rule and reflexivity). What is not obvious is that R∗ is transitive. In fact, showing the transitivity of R∗ requires an inductive proof.

Claim Let R∗ be obtained with the linear definition. Then R∗ is transitive.

Proof Let R∗xy and R∗yz. We show R∗xz by induction on the derivation of R∗xy. If R∗xy is obtained with the first rule of the definition, we have x = y and the claim follows. If R∗xy is obtained with the second rule, we have Rxx′ and R∗x′y. The inductive hypothesis applies to R∗x′z and yields R∗x′z. (The inductive hypothesis applies since R∗x′z has a shorter derivation than R∗xy.)

The claim follows with the second rule of the definition. 

Here is a more explicit and less verbose presentation of the proof that is easier to verify.

Proof

A : R∗xy B : R∗yz Show: R∗xz Induction A 1.x = y Claim trivial 2. Rxx′ ∧ R∗x′y IH : R∗x′z

Claim follows with 2nd rule 

33 3 Abstract Reduction Systems

Exercise 25 Simulate the above transitivity proof with a Coq script. First use the induction tactic and notice that you get a more complex inductive hypothesis. The reason is that the induction tactic is using the automatically generated in- duction principle star_ind, which quantifies over both variables. Then redo the proof using the canonical induction principle (use the tactic refine to apply it). This should give you exactly the proof shown above.

Exercise 26 Prove the following properties of R∗ in Coq. a) Monotonicity: R≼S → R∗ ≼S∗ b) Minimality: If R≼S and S is reflexive and transitive, then R∗ ≼S. c) Idempotence: (R∗)∗ ≈ R∗ d) Interpolation: R≼S≼R∗ → R∗ ≈ S∗

3.2.2 Binary Definition

Here is a second definition of the reflexive transitive closure called binary defi- nition in the following.

R#xy R#yz Rxy Rxx R#xz R#xy

This time it is obvious that R# is reflexive (1st rule), transitive (2nd rule), and subsumes R (3rd rule). In fact, the inductive definition of R# faithfully realizes what the term reflexive transitive closure says. This comes at the cost of a com- plex induction principle with 3 cases, where the second case comes with two inductive hypotheses. To show that R∗ ≈ R#, we have to verify that the rules of one system can simulate the rules of the other system. It is easy to see that the binary system can simulate the two rules of the linear system. For instance, we have to verify

Rxy → R#yz → R#xz to show that the binary system can simulate the 2nd rule of the liner system. We also have to verify that the linear system can simulate each rule of the binary system. The first and the second rule are straightforward. That we can simulate the third rule follows by the transitivity lemma for the linear system. To show R∗ ≈ R# in Coq, we need two inductions. An induction on R∗ is needed for R∗ ≼ R#, and an induction on R# is needed for R# ≼ R∗.

Exercise 27 Carry out the definition of R# in Coq and show the equivalence with R∗.

34 3.3 Normal Forms

3.2.3 Power Definition

Here is another set-theoretic definition of R∗ that is commonly used in the liter- ature.

∗ R := [ Rn n∈N R0 := { (x,x) | x ∈ X } Rn+1 := R ◦ Rn R ◦ S := { (x,z) | ∃y. Rxy ∧ Syz }

This definition relies on natural recursion and does not require an understand- ing of general inductive definitions. The definition inherits the linear induction principle of the natural numbers.

Exercise 28 Carry out the power definition in Coq and prove the equivalence with the linear definition.

3.3 Normal Forms

reducible R x := ∃y. Rxy normal R x := ¬reducible R x x ⇓R y := R∗xy ∧ normal R y

Exercise 29 Prove R∗xy → normal R x → x = y.

3.4 Equivalence Closure

We define the equivalence closure R≡ using the transitive closure.

R≡ := (R↔)∗ equivalence closure R↔ := R ∪ R−1 symmetric closure R−1 := { (y,x) | (x,y) ∈ R } inverse

This gives us a linear recursion principle for R≡. In Coq, we refine the above definition to the inductive definition

Inductive ecl R : X → X → Prop := | eclR x : ecl R x x | eclCxyz : R x y → ecl R y z → ecl R x z | eclSxyz : Ryx → ecl R y z → ecl R x z.

35 3 Abstract Reduction Systems

This yields a convenient linear induction principle building in the case analysis coming with the symmetric closure R ∪ R−1. ≡ We write x ≡R y for R xy.

Exercise 30 Prove the following properties of R≡ in Coq. a) Transitivity and symmetry. b) Monotonicity: R≼S → R≡ ≼S≡ c) Minimality: If R≼S and S is an equivalence, then R≡ ≼S. d) Idempotence: (R≡)≡ ≈ R≡ e) Interpolation: R≼S≼R≡ → R≡ ≈ S≡

3.5 Church-Rosser-Like Properties

We will use the following definitions for the analysis of the Church-Rosser prop- erty.

x ↓R y := ∃z. Rxz ∧ Ryz joinable R↓ := { (x,y) | x ↓R y } functional R := ∀xyz. Rxy → Rxz → y = z diamond R := ∀xyz. Rxy → Rxz → y ↓R z confluent R := diamond (R∗) ∗ semi-confluent R := ∀xyz. Rxy → R∗xz → y ↓R z Church-Rosser R := R≡ ⊆ R↓

We obtain the following results.

functional R → semi-confluent R diamond R → semi-confluent R confluent R ↔ semi-confluent R Church-Rosser R ↔ semi-confluent R confluent R → functional (⇓R)

All results have illuminating diagram-based proof sketches. The proofs often use induction on R∗. Translating the diagram-based proof sketches into Coq scripts is routine. Often it is necessary to strengthen the inductive claim by reverting assumptions (implicit in the diagram-based sketches). Here is a textual proof of the second result. Setting up the inductive claim as shown is essential.

Claim diamond R → semi-confluent R.

36 3.5 Church-Rosser-Like Properties

Proof

A : diamond R ∗ B : R∗xz Show: ∀y. Rxy → y ↓R z Induction B 1.x = z Claim follows 2. Rxx′ ∧ R∗x′z ∗ IH : ∀y. Rx′y → y ↓R z ∗ Rxy Show: y ↓R z Rx′v ∧ Ryv by A R∗vu ∧ R∗zu by IH R∗yu

Claim follows 

The properties we have defined for reduction predicates are all extensional. Extensionality is defined as follows.

extensional p := ∀RS. R ≈ S → pR → pS

Exercise 31 Simulate the textual proof shown above with a Coq script using the canonical induction principle for R∗.

Exercise 32 Prove that every semi-confluent relation is Church-Rosser. Start with a diagram-based proof sketch, then give the textual proof, and finally write proof scripts in Coq. Do the proof in Coq first with the canonical induction principle so that is simulates your textual proof. Then do the proof with the induction tactic and use automation tactics when convenient.

Exercise 33 Let R be Church-Rosser. Prove the following. R a) x ≡R y → normal R y → x ⇓ y b) x ≡R y → normal R x → normal R y → x = y

Exercise 34 Prove that the given definitions of confluence and Church-Rosser are extensional in Coq.

Exercise 35 Explain why sym R := (R = λxy. Ryx) cannot be used as a defi- nition of symmetry in Coq.

37 3 Abstract Reduction Systems

3.6 Reduction Operators

Reduction operators are a tool for proving confluence. A reduction operator is a function ρ : X → X. The triangle property for reduction operators is defined as follows.

triangle R ρ := ∀xy. Rxy → Ry(ρx)

The proof of the following facts is straightforward.

triangle R ρ → diamond R R∗ = S∗ → triangle S ρ → confluent R

The second fact gives us a method for proving the Church-Rosser property of lambda calculus and combinatory logic. For the given reduction relation R (β- reduction or weak reduction) we construct a reduction relation S (called parallel reduction) and a reduction operator ρ such that R∗ = S∗ and triangle R ρ. The existence of S and ρ implies the Church-Rosser property of R. Reduction operators can also be used to compute normal forms with the fol- lowing algorithm: Given x, iterate ρ on x until a normal term ρnx is reached. This simple algorithm is correct if ρ satisfies the following properties.

sound R ρ := ∀x. R∗x(ρx) cofinal R ρ := ∀xy. R∗xy → ∃n. R∗y(ρnx)

It is not difficult to prove the following facts.

cofinal R ρ → confluent R sound R ρ → normal R x → ρx = x sound R ρ → cofinal R ρ → (x ⇓R y ↔ ∃n. y = ρnx ∧ normal R y) triangle R ρ → cofinal R ρ triangle R ρ → reflexive R → sound R ρ

Barendregt [1] calls reduction operators strategies. He shows that left-most outermost reduction yields a sound and cofinal reduction operator for lambda calculus. Barendregt [1] does not say that reduction operators can be used to prove confluence. Takahashi [4] uses reduction operators and the triangle prop- erty to prove confluence.

Exercise 36 Let ρ be a reduction operator satisfying the triangle property for R. Prove the following. a) reflexive R → sound R ρ

38 3.7 Abstract Analysis of Reflexive Transitive Closure b) Rxy → R(ρx)(ρy) c) Rxy → R(ρnx)(ρny) d) cofinal R ρ

Exercise 37 Let ρ be a reduction operator that is sound for R. Prove the follow- ing. a) normal R x → ρx = x b) cofinal R ρ → (x ⇓R y ↔ ∃n. y = ρnx ∧ normal R y)

3.7 Abstract Analysis of Reflexive Transitive Closure

We have already seen several constructions of R∗. All of them used inductive definitions, either directly or through the natural numbers. We will now see how R∗ can be specified, constructed, and analyzed without using inductive defini- tions. We will do the following: • We give a specification of R∗. • We show that the specification has at most one solution. • We construct a solution of the specification not using inductive types. The techniques we introduce here apply to inductive predicates in general. They are related to the impredicative definitions of the logical connectives. We say that a relation S is closed under left-composition with a relation R if

lcomp R S := ∀xyz. Rxy → Syz → Sxz

We define R∗ as the intersection of all relations that are reflexive and closed under left-composition with R.

star R x y := ∀p, reflexive p → lcomp R p → pxy

Note that this defines a function mapping relations to relations. We refer to this definition of R∗ as impredicative definition. It is straightforward to show that R∗ the least relation that is reflexive and closed under left-composition with R. This provides us with a specification for R∗.

least R S := ∀p, reflexive p → lcomp R p → S  p spec R S := reflexive S ∧ lcomp R S ∧ least R S

We already know

spec R (star R)

39 3 Abstract Reduction Systems

It is also easy to show that the specification has at most one solution up to equivalence:

spec R S → spec R S′ → S ≈ S′

If we want to know whether some other definition of R∗ is equivalent to the im- predicative definition, we just show that the other definition satisfies the spec- ification. This is, for instance, straightforward to do for the linear inductive definition we discussed earlier in this chapter. We can now ask whether the induction principles we gave before for R∗ can be obtained directly from the specification of R∗. This is in fact the case. We define

ind R S := ∀p. reflexive p → (∀xyz. Rxy → Syz → pyz → pxz) → S  p and show

spec R S → ind R S

The trick is to use least R S as an induction principle with the predicate

qxy := Sxy ∧ pxy

We can take the view that spec R specifies R∗ as the least relation satisfying two closure properties. The closure properties correspond to the rules of the inductive definition, and the requirement that R∗ is the least relation satisfy- ing the closure properties yields a primitive but sufficiently powerful induction principle. From the perspective of proof rules we can see the closure properties as in- troduction rules and the induction principle as an elimination rule.

3.8 Strong Normalization

A node x of an ARS is strongly normalizing if every successor of x is strongly normalizing. This is an interesting inductive definition with a single defining rule. In formal notation the defining rule looks as follows.

∀y. Rxy → SN R y SN R x

We have the following fact.

SN R x ↔ ∀y. Rxy → SN R y

40 3.8 Strong Normalization

Make sure you understand the proof of this fact. From left to right one analyses the derivation of the assumption SN R x (tactic destruct in Coq). From right to left one constructs a derivation of the claim SN R x (tactic constructor in Coq). It is easy to show that every normal node is strongly normalizing (since it has no successors).

normal R x → SN R x

It follows that every node in a finite directed acyclic graph is strongly normal- izing. We would expect that a node x with Rxx is not strongly normalizing. Proving this fact requires induction.

Claim Rxx → SN R x → ⊥

Proof Show A : SN Rx Rxx → ⊥ Induction A IH : ∀y. Rxy → Ryy → ⊥

Claim follows by IH 

Coq infers the canonical induction principle for SN:

(∀x. (∀y. Rxy → SN R y) → (∀y. Rxy → py) → px) → ∀x. SN R x → px

Since there is a single defining rule there is a single inductive step. The first premise of the inductive step is the premise of the defining rule, and the second premise of the inductive step in the inductive hypothesis. Here is the proof term for the induction principle for SN.

fun R p step ⇒ fix f x A := match A with SNI g ⇒ step x g (fun y B ⇒ f y (g y B)) end

Note the higher-order structural recursion: The function g obtained from A for all arguments yields a derivation structurally smaller than A. This property is an essential feature of Coq’s type theory.

∗ locally_confluent R := ∀xyz. Rxy → Rxz → y ↓R z

Lemma 2 (Newman) A strongly normalizing relation is confluent if it is locally confluent.

Exercise 38 Find a locally confluent relation that is not confluent.

41

4 Lambda Calculus, Formalization

A major difference between combinatory logic and lambda calculus is that the lambda calculus comes with a variable binder introducing local variables. In fact, the lambda calculus is the prototypical syntactic system with a variable binder. The presence of a variable binder considerable complicates substitution, which due to beta reduction is a basic ingredient of the lambda calculus. We will study the lambda calculus with de Bruijn’s term representation where numeric argument references replace the local variables of the standard notation.

4.1 Terms and Substitutions in de Bruijn Representation

Following de Bruijn, we represent the terms of the untyped lambda calculus with numeric variable references:

s, t ::= n | st | λs (n ∈ N)

There are no local names. Argument references and free variables are repre- sented with numbers. Here are 2 examples:

λf xy. f xy λλλ2ˆ1ˆ0ˆ λf.f(λx.fx(λy.fxy)) λ0ˆ(λ1ˆ0ˆ(λ2ˆ1ˆ0ˆ))

A substitution is a function N → term. A renaming is a function N → N. We consider renamings to be substitutions. We use σ and τ to denote substitutions and ξ and ζ to denote renamings. The application of substitutions to terms is defined with two mutually recursive operators.

! : (N → term) → term → term ↑ : (N → term) → N → term σ ! n := σn σ ! st := (σ ! s)(σ ! t) σ ! λs := λ(↑σ ! s) ↑σ 0 := 0 ↑σ (Sn) := S ! σn

43 4 Lambda Calculus, Formalization

The application operator ! recurses through terms where the substitution to be applied is transformed with the up-operator ↑ when it is pushed below a lambda. The mutual recursion terminates since it goes through the following stages:

σ ! s ⇝ ξ ! s ⇝ ξ ! n

4.2 Basic Substitution Laws

Application of the identity substitution id = λn.n leaves a term unchanged:

id ! s = s

For many proofs it is crucial to have a composition operation for substitutions such that the composition law holds:

σ ! τ ! s = σ ·τ ! s

The definition of the composition operator is straightforward:

(σ ·τ)n := σ ! τn

The proof of the composition law is interesting. By induction on s it reduces to the distribution law ↑(σ ·τ) = ↑σ ·↑τ

The distribution law follows from two instances of the composition law:

σ ! ξ ! s = σ ·ξ ! s ξ ! τ ! s = ξ·τ ! s

The instances of the composition laws follow by induction on s and two instances of the distribution law:

↑(σ ·ξ) = ↑σ ·↑ξ ↑(ξ·τ) = ↑ξ·↑τ

The first instance is obvious. The second instance follows with the first instance of the composition law and the S-law, which holds by definitional equality.

S ·τ = ↑τ ·S

44 4.3 Beta Reduction

With composition substitutions yield an operator monoid satisfying the com- position and the identity law.

σ ·(τ ·µ) = (σ ·τ)·µ id·σ = σ = σ ·id σ ! τ ! s = σ ·τ ! s id ! s = s

The up-operator is a monomorphism on this monoid.

↑(σ ·τ) = ↑σ ·↑τ ↑id = id

4.3 Beta Reduction

Now that we have substitutions we can define beta reduction:

s ≻ s′ s ≻ s′ t ≻ t′ (λs)t ≻ βt ! s λs ≻ λs′ st ≻ s′t st ≻ st′

The operator β : term → N → term is defined as follows:

βt 0 := t βt (Sn) := n

By induction on the reduction relation ≻ one can show that beta reduction is substitutive: s ≻ t → σ ! s ≻ σ ! t

The interesting case of the induction follows with the composition law and the equation β(σ ! t)·↑σ = σ ·βt which in turn follows from the equation βt·S = id.

4.4 Closed Terms

We have not formalized the notion of free variables of terms. We now for- malize the notion of closed term. To do so, we define an inductive predicate

45 4 Lambda Calculus, Formalization dclosed : N → term → Prop such that dclosed d s is derivable if no variable n ≥ d is free in s.

n < d dclosed ds dclosed d t dclosed (Sd) s dclosed dn dclosed d (st) dclosed d (λs)

4.5 Formalization in Coq

The formalization in Coq has to resolve two complications: 1. Renamings are not substitutions (no subtyping). 2. Substitutions are not extensional (no functional extensionality). The first complication is resolved by defining the operators ↑ and ! first for re- namings and then for substitutions. This eliminates the mutual recursion be- tween ↑ and ! since we now have separate operators for renamings and substi- tutions. A conversion operator is provided mapping renamings to their corre- sponding substitutions. One shows that the application operators for renamings and substitutions agree modulo conversion:

ξ ! ! s = sub ξ ! s

The lack of functional extensionality can be resolved by working with equiv- alence of substitutions rather than equality.

σ ≈ τ :=∀n. σn = τn

The application operators for renamings and substitutions satisfy equivalence laws

ξ ≈ ζ → ξ ! ! s = ζ ! ! s σ ≈ τ → σ ! s = τ ! s saying that equivalent substitutions yield identical results. The equivalence laws can be shown by induction on s.

Exercise 39 Show the equivalence laws for renamings and substitutions.

Exercise 40 Prove the composition law.

Exercise 41 Prove that beta reduction is substitutive.

Exercise 42 Prove σ ! s = s for every substitution and every closed term. You need two lemmas for the predicate dclosed.

46 4.5 Formalization in Coq

Exercise 43 Prove that closedness of terms is a decidable property.

Exercise 44 Prove that a term s is closed if and only if S ! s = s. Hint: Show ↑nS ! s = s → dclosed ns first.

47

5 Type Theory

We study a number of typed lambda calculi. We start with a basic calculus STLC (simply typed lambda calculus) and then consider several increasingly more ex- pressive extensions known as T, F, Fω, CC, and CCω. We also consider the exten- sion of CCω with inductive types. All systems we consider are strongly normal- izing and can be seen as subsystems of the calculus underlying Coq. The systems we consider are known as type theories. Type theories combine a computational interpretation with a logical interpretation. This double inter- pretation is known as Curry-Howard correspondence. The logical interpretation is concerned with propositions and proofs. Propositions appear as types and proofs appear as the elements of types. F can express quantified propositional logic as well as primitive recursive functions on number.

In basic type theories like F or CCω, the logical and the computational inter- pretation of the theory are separated in that the theory cannot state and prove theorems about the computational objects it can define. This difficulty can be overcome by extending the theories with inductive types providing the compu- tational objects.

5.1 Simply Typed Lambda Calculus (STLC)

The syntax of STLC consists of types and terms. There is a confluent reduction relation on terms and a typing relation that relates contexts, terms, and types. We consider the de Bruijn version of STLC where abstractions specify argument types.

Types

We assume a set of type variables X and define types as follows.

A,B ::= X | A → B

49 5 Type Theory

Terms

We assume a set of term variables x and define terms as follows.

s, t ::= x | st | λx : A.s

A formal representation of terms will represent abstractions without local names using de Bruijn indices. Note that abstractions carry an argument type. This is the de Bruijn version of STLC. There is also a Curry version of STLC where abstractions do not carry argument types. The terms of the Curry version of STLC are identical with the terms of the untyped lambda calculus.

Reduction

Reduction can contract a β-redex everywhere.

x (λx : A.s)t ≻ st

Note that reduction ignores the type annotations in abstractions. Reduction in STLC is confluent.

Contexts

A context

Γ = {x1 : A1,...,xn : An} is a finite collection of variable declarations x : A where no variable is declared more than once. The order of the declarations does not matter.

Typing Relation

The typing relation Γ ⊢ s : A is defined inductively.

(x : A) ∈ Γ Γ ⊢ s : A → B Γ ⊢ t : A Γ ,x : A ⊢ s : B x ∉ Γ Γ ⊢ x : A Γ ⊢ st : B Γ ⊢ λx : A.s : A → B

Note that in the rule for abstractions x must not be declared in Γ . This is fine since the local variable x is a notational device that can be renamed. In the de Bruin representation Γ is a stack of types and the variable and abstraction rules look as follows:

Γ ⊢ n : A Γ ,A ⊢ s : B Γ ,A ⊢ 0 : A Γ ,B ⊢ Sn : A Γ ⊢ λA.s : A → B

50 5.2 Gödel’s T

Properties of Typing

• Subject reduction If s ≻ s′ and Γ ⊢ s : A, then Γ ⊢ s′ : A. • Canonical form If ⊢ s : A → B and s is normal, then s = λx : A.t for some t. • Unique type If Γ ⊢ s : A and Γ ⊢ s : B, then A = B. • Strong normalization If Γ ⊢ s : A, then s is strongly normalizing. • Decidability There is a decision algorithm for Γ ⊢ s : A. A logic system with simply typed lambda terms was first studied by Church [1940]. A computational system similar to STLC was first studied by Curry and Feys [1958]. They considered a system without type annotations not having the unique type property. One speaks of implicit typing or Curry-style typing. The version of STLC with type annotations presented here is known as de Bruijn version.

Exercise 45 Formalize STLC in Coq.

5.2 Gödel’s T

T is a simply typed lambda calculus extended with numbers and primitive recur- sion. Numbers are accommodated with a base type N and two constructors O and S.

A,B ::= N | A → B s, t ::= O | Ss | prec s t (xy.u) | x | st | λx : A.s

Note that there are no type variables. The syntactic form for primitive recursion introduces two local variables x and y in the third constituent. The local vari- ables x and y must be distint and are subject to alpha renaming. A formalization would realize the local variables with de Bruijn indices. Reduction is defined with the reduction rule for beta redexes and two reduc- tion rules for primitive recursion.

prec O t (xy.u) ≻ t x y prec (S s)t(xy.u) ≻ u s prec s t (xy.u)

The reduction rules for primitive recursion become more readable if we abbrevi- ate the syntactic form (xy.u) with ρ and express substitution with application.

prec O t ρ ≻ t prec (S s) t ρ ≻ ρs(prec stρ)

51 5 Type Theory

The reduction relation of T is confluent. The typing rules of T are the typing rules of STLC plus rules for the new syntactic forms.

Γ ⊢ s : N Γ ⊢ O : N Γ ⊢ Ss : N

Γ ⊢ s : N Γ ⊢ t : A Γ ,x : N, y : A ⊢ u : A x, y ∉ Γ prec s t (xy.u) : A

The typing relation of T satisfies the properties we have stated for STLC, where the canonical form property now reads as follows: • If ⊢ s : N and s is normal, then s = SnO for some n. • If ⊢ s : A → B and s is normal, then s = λx : A.t for some t. T can be seen as a definition of the class of higher-order primitive recursive functions. It is straightforward to express Ackermann’s function in T. Acker- mann’s function is not first-order primitive recursive. Gödel sketched T in a seminal paper in 1958. He showed how proofs in an systems could be expressed in T, and argued that the strong normal- ization of T yields the consistency of the arithmetic system. A strong normaliza- tion proof for T was first given by Tait in 1967.

Exercise 46 It is possible to formulate T with simpler terms as follows:

s, t ::= O | S | prec | x | st | λx : A.s

In this version the operator prec relies on abstractions obtained with λ. There are advantages and disadvantages to this approach. a) Express the term prec s t (xy.u) of the old system in the new system. b) Express the operator prec of the new system in the old system. c) Give the typing rules for the terms S and prec in the new system. d) Give the reduction rules for redexes obtained with prec in the new system. e) Explain why the unique type property is lost in the new system. f) State the canonical form property of the new system.

Exercise 47 Formalize T in Coq using de Bruijn indices.

5.3 Girard’s F

F extends STLC with polymorphic types ∀X.A whose members are functions mapping a type X to a member of A where A may depend on X. The most

52 5.3 Girard’s F straightforward polymorphic type is ∀X.X → X. The single member of this type is the polymorphic identity function λX.λx : X.x. In Coq, the universe for the type variables of F would be Prop.1 F is also known as the polymorphic lambda calculus. F can express the natural numbers with primitive recursion using Church nu- merals. In fact, F subsumes T. The type of natural numbers can be expressed in F as follows:2 nat :=∀X. X → (X → X) → X

The types and terms of F are defined as follows:

A,B ::= X | A → B |∀X.A s, t ::= x | st | λx : A.s | sA | λX.s

Reduction can contract β-redexes everywhere.

x (λx : A.s)t ≻ st X (λX.s)A ≻ sA

We assume that type variables and term variables are distinct. The reduction relation of F is confluent. Since F has binders for both term and type variables, contexts must account for both kinds of variables. There are different ways to define contexts for F. We follow Harper [2013] and complement Γ with a separate context ∆ for type variables, where ∆ is just a finite set of type variables. We first define a typing relation ∆ ⊢ A.

X ∈ ∆ ∆ ⊢ A ∆ ⊢ B ∆, X ⊢ A X ∉ ∆ ∆ ⊢ X ∆ ⊢ A → B ∆ ⊢∀X.A

Note that in the rule for abstractions X must not occur in ∆. This is fine since the local variable X is a notational device that can be renamed. Informally, ∆ ⊢ A holds if and only if ∆ contains all variables that are free in A.

1 In Coq, the universe Prop is impredicative while the universes behind Type are predicative. 2 The arguments of Church numerals are swapped so that the canonical elements of nat are exactly the Church numerals.

53 5 Type Theory

Next we define the primary typing relation ∆ Γ ⊢ s : A. (x : A) ∈ Γ ∆Γ ⊢ x : A

∆Γ ⊢ s : A → B ∆Γ ⊢ t : A ∆ ⊢ A ∆Γ ⊢ s : ∀X.B ∆Γ ∆Γ X ⊢ st : B ⊢ sA : BA

∆ ⊢ A ∆Γ ,x : A ⊢ s : B ∆, X Γ ⊢ s : A x ∉ Γ X ∉ ∆ ∆Γ ⊢ λx : A.s : A → B ∆Γ ⊢ λX.s : ∀X.A It is also necessary to define valid contexts. A pair ∆ Γ is valid if ∆ ⊢ A for every declaration (x : A) ∈ Γ . Informally, ∆Γ is valid if and only if ∆ contains all variables that are free in Γ . For subject reduction to hold we need that ∆Γ ⊢ s : A is derivable and in addition that ∆ Γ is valid. Note that the empty context 0 0 is valid. For valid contexts, the typing relation of F satisfies the properties we have stated for STLC. The canonical form property for F looks as follows: • If ⊢ s : A → B and s is normal, then s = λx : A.t for some t. • If ⊢ s : ∀X.A and s is normal, then s = λX.t for some t. Exercise 48 Convince yourself that the typing x : X ⊢ (λX. (λx : X.x)x)(∀X.X) : ∀X.X is derivable. Subject reduction does not hold for the typing since the term re- duces to x. Explain why the context of the typing is not valid. What happens if we employ the valid context {X} {x : X}?

Exercise 49 A canonical element of a type A is a normal term s such that ⊢ s : A. One can show that the canonical elements of the type ∀X. X → (X → X) → X are exactly the Church numerals with swapped argument order. Show that the type ∀X. (X → X) → X → X has the ordinary Church numerals as canonical elements and an extra canonical element representing the number 1.

Exercise 50 Express primitive recursion for nat in F. Use Coq to type check your solution. Also use Coq to compute with your solution.

Exercise 51 Define a datatype list nat in F. Realize the constructors nil and cons and a fold function.

Exercise 52 Typing judgements can always be reduced to typing judgements for empty contexts. Let ∆ = {X1,...,Xn} and Γ = {x1 : A1,...,xn : An} be such that ∆ Γ is a valid context. Given a term s and a type A, find a term t and a type B such that ∆Γ ⊢ s : A if and only if ⊢ t : B.

54 5.4 Logical Interpretation

5.4 Logical Interpretation

The types of STLC and F can be seen as logical propositions. A typing judgement Γ ⊢ s : A says that s is a proof of the proposition A under the assumptions in Γ . Under this interpretation, the typing rules are the natural deduction rules for implication and universal quantification. If we omit the proof terms, the typing rules of F are the ND rules for implication and universal quantification.

A ∈ Γ ∆Γ ⊢ A

∆ ⊢ A → B ∆Γ ⊢ A ∆ ⊢ A ∆Γ ⊢∀X.B ∆Γ ∆Γ X ⊢ B ⊢ BA

∆ ⊢ A ∆Γ ,A ⊢ B ∆, X Γ ⊢ A X ∉ ∆ ∆Γ ⊢ A → B ∆Γ ⊢∀X.A

Note that ∆ keeps track of the propositional variables in use, and that Γ col- lects the propositions for wich assume proofs. We can represent both ∆ and Γ as sets. The system is set up such that the wellformedness of contexts is not automatically checked. That is, we need ∆ ⊢ Γ in addition to ∆Γ ⊢ A. F can express falsity, conjunction, disjunction, and existential quantification:

⊥ := ∀Z.Z A ∧ B := ∀Z. (A → B → Z) → Z A ∨ B := ∀Z. (A → Z) → (B → Z) → Z ∃X.A := ∀Z. (∀X. A → Z) → Z

F is an amazing system. On the one hand, F is a computational system that can express types for numbers, lists, and trees together with a rich class of prim- itive recursive functions. On the other hand, F constitutes an intuitionistic proof system for quantified propositional logic. There is one thing missing, however: F cannot reason about the computational objects it can express. From the logical point of view, term reduction in STLC and F simplifies proofs. Subject reduction tells us that proofs are preserved by reduction, and strong nor- malization tells us that every provable proposition has a normal proof. Showing that ⊥ has no normal proof is not difficult in F. Together with strong normal- ization and subject reduction this yields the consistency of F (that is, ⊥ has no proof).

55 5 Type Theory

5.5 F with Single-Sorted Syntax

A formalization of the two-sorted (types and terms) presentation of F with de Bruijn indices is a tedious enterprise since three different application operations for substitutions are needed. Things become easier if we switch to a single-sorted presentation where general terms represent both proper terms and types. We are familiar with a single-sorted syntax from Coq. We employ the following terms.

s,t,A,B = x | P | A → B |∀x.A | st | λx : A.s

The term P serves as universe for the types of F.3 A context is a finite set of declarations x : A where no variable is declared more than once. Declarations of the form x : P introduce type variables. Thus there is no need anymore for a separate context ∆ for type variables. The typing relation Γ ⊢ s : A is defined as follows. (x : A) ∈ Γ Γ ⊢ A : P Γ ⊢ B : P Γ ,x : P ⊢ A : P x ∉ Γ Γ ⊢ x : A Γ ⊢ A → B : P Γ ⊢∀x.A : P

Γ ⊢ s : A → B Γ ⊢ t : A Γ ⊢ s : ∀x.B Γ ⊢ A : P Γ Γ x ⊢ st : B ⊢ sA : BA

Γ ⊢ A → B : P Γ ,x : A ⊢ s : B Γ ⊢∀x.A : P Γ ,x : P ⊢ s : A x ∉ Γ x ∉ Γ Γ ⊢ λx : A.s : A → B Γ ⊢ λx : P.s : ∀x.A

Valid contexts are defined as follows: Γ valid Γ valid Γ ⊢ A : P x ∉ Γ x ∉ Γ 0 valid Γ ,x : P valid Γ ,x : A valid

The rules ensure that every variable x free in a type A of a declaration y : A in Γ has a declaration x : P in Γ .

Exercise 53 Explain why ⊢ P : P cannot be derived.

Exercise 54 Formalize F in Coq using single-sorted syntax and de Bruijn indices.

5.6 F with Uniform Syntax

The presentation of F can be further simplified. As is, F comes with two function types that require separate rules for type formation, application, and abstrac- tion. This duplication can be eliminated by working with a single function type

3 P plays the role of Prop in Coq.

56 5.6 F with Uniform Syntax

∀x : A.B. If the variable x is not free in B, ∀x : A.B represents the ordinary func- tion type A → B. Moreover, ∀x : P.A represents the function type ∀x.A. General function types ∀x : A.B are called products. Products will give us an elegant representation of F that will be the basis for richer systems.4 We also provide a second universe T that acts as type of the universe P. The uniform presentation of F employs the following terms:

s,t,A,B = x | P | T |∀x : A.A | st | λx : A.s (n ∈ N)

A context is a finite set of declarations x : A where no variable is declared twice. We define the typing relation Γ ⊢ s : A as follows.

Γ ⊢ P : T

(x : A) ∈ Γ Γ ⊢ x : A

Γ ⊢ A : u Γ ,x : A ⊢ B : P x ∉ Γ Γ ⊢∀x : A.B : P

Γ ⊢ s : ∀x : A.B Γ ⊢ t : A Γ x ⊢ st : Bt

Γ ⊢∀x : A.B : u Γ ,x : A ⊢ s : B x ∉ Γ Γ ⊢ λx : A.s : ∀x : A.B

The metavariable u ranges over the universes P and T. Valid contexts are defined as follows:

Γ valid Γ ⊢ A : u x ∉ Γ 0 valid Γ ,x : A valid

If Γ is a valid context, we have the following: 1. If Γ ⊢ s : T, then s = P. 2. If Γ ⊢ s : P, then s corresponds to a type of the two-sorted presentation of F. In particular, if s contains a term ∀x : A.B with A ≠ P, then x is not free in B. Hence ∀x : A.B represents the type A → B.

4 Calling function types products is common in type theory. There is a name clash with cartesian products A × B in set theory, whose elements are pairs rather than functions.

57 5 Type Theory

5.7 Fω and CC: Kinds and Conversion

Consider the logical interpretation of F. The abstraction

λX : P. X → ⊥ describes a negation operator. F cannot type this abstraction since it does not provide the type P → P of the abstraction. Types of this form are often called kinds. It is straightforward to extend the uniform presentation of F with kinds. We just need an additional product rule providing the function types represent- ing kinds. Γ ⊢ A : T Γ ,x : A ⊢ B : T x ∉ Γ Γ ⊢∀x : A.B : T

We can now type the negation function λX : P. X → ⊥ with P → P. We also have available the kind (P → P) → P for the function describing the existential quanti- fier. Consider the well-typed application

(λX : P. X → ⊥)⊥

It reduces to the implication ⊥ → ⊥, which is provable. As is, our typing rules do not provide a proof for the equivalent term (λX : P. X → ⊥)⊥. This can be fixed by adding the conversion rule Γ ⊢ s : A Γ ⊢ B : P A ≡β B Γ ⊢ s : B where ≡β stands for β-equivalence. We have now arrived at a system know as Fω. Reduction in Fω is β-reduction and is confluent. For valid contexts, Fω has the properties we have stated for STLC, where types are now unique up to β-equivalence. From confluence and strong normalization it follows that β-equivalence of well-typed terms is decid- able. The canonical form property now reads as follows: • If ⊢ s : ∀x : A.B and s is normal, then s = λx : A.t for some t. • If ⊢ s : P and s is normal, then s has the form ∀x : A.B. • If ⊢ s : T, then s has the form s ::= P | s → s.

Fω contains F as a subsystem (omit the conversion rule and the product rule for kinds). F in turn contains the STLC as a subsystem: Restrict the product rule of F to Γ ⊢ A : P Γ ,x : A ⊢ B : P x ∉ Γ Γ ⊢∀x : A.B : P

58 5.7 Fω and CC: Kinds and Conversion

In this formulation of STLC a typing Γ ⊢ s : A is only derivable if every type variable free in A is declared in Γ . We observe the following: • STLC has product types for P × P. • F has product types for P × P and T × P.

• Fω has product types for P × P, T × P, and T × T.

It turns out that Fω can be extended with the missing products for P × T. This yields the basic calculus of constructions CC designed by Coquand and Huet in 1985 as the starting point for Coq. Since all four products are allowed, a single product rule suffices for CC (u and v range over the universes P and T):

Γ ⊢ A : v Γ ,x : A ⊢ B : u x ∉ Γ Γ ⊢∀x : A.B : u

For CC, the conversion rules is generalized so that conversion becomes possible both at P and T.

Γ ⊢ s : A Γ ⊢ B : u A ≡β B Γ ⊢ s : B

For valid contexts, the typing relation of CC enjoys the properties we have stated for STLC, where the unique types and the canonical forms properties need to be modified. We remark that the strong normalization proof gets harder with each step we climb up the ladder from STLC to CC.

Fω appeared in Girard’s habilitation in 1972. Girard uses a three-sorted syn- tax distinguishing between terms, types, and kinds and proves strong normaliza- tion of Fω. Pierce’s textbook presents Fω with three-sorted syntax. Single-sorted syntax and general products appeared in the work of de Bruijn and Martin-Löf and are used in Coquand and Huet’s initial presentation of the calculus of con- structions in 1985. See the handbook chapter of Barendregt and Geuvers [2001] for a comprehensive discussion of CC-like systems called pure type systems.

Exercise 55 Find a term s so that ⊢ s : (λX : P.P)⊥ is derivable. Write down the derivation in detail.

Exercise 56 Give functions describing conjunction, disjunction, and existential quantification in Fω. Start by stating the types for this functions. Check your results with Coq. Prove in Coq that your definitions are equivalent to Coq’s predefined versions of conjunction, disjunction, and existential quantification.

59 5 Type Theory

5.8 Calculus of Constructions

In CC, the universe T has no type. Consequently, we cannot express function types like T → P. From the naive use of Coq one would expect T : T. This is, however, not the case. In fact, extending CC with a rule T : T yields an inconsistent system where ⊥ has a proof that has no normal form. A similar result was first shown by Girard [1972] and is know as Girard’s paradox. A shorter proof of the inconsistency result was given by Hurkens in 1995. The bottom line summarizing these results says that a type theory is inconsistent if it has a universe that contains itself as a member. The effect of u : u can be obtained to some extend by a hierarchy of infinitely many universes U0 : U1 : U2 : ··· such that U0 ⊆ U1 ⊆ U2 ⊆ ··· . The idea of an infinite and cumulative hierarchy of universes is known from set theory and was first employed for type theories by Martin-Löf. We now consider the following terms:

s,t,A,B = x | Un |∀x : A.A | st | λx : A.s (x ∈ N, n ∈ N)

As before, a context is a finite set of declarations x : A where no variable is declared twice. We start with the following typing rules (u ranges over the uni- verses Un):

Γ ⊢ Un : Un+1

(x : A) ∈ Γ Γ ⊢ x : A

Γ ⊢ A : u Γ ,x : A ⊢ B : u x ∉ Γ Γ ⊢∀x : A.B : u

Γ ⊢ s : ∀x : A.B Γ ⊢ t : A Γ x ⊢ st : Bt

Γ ⊢ A : u Γ ,x : A ⊢ s : B x ∉ Γ Γ ⊢ λx : A.s : ∀x : A.B

Γ ⊢ s : A Γ ⊢ B : u A ≡β B Γ ⊢ s : B

Γ ⊢ s : A A  B Γ ⊢ s : B

60 5.9 Numbers as Inductive Type

The subtyping relation employed by the final subtyping rule is defined as fol- lows: B  B′ m < n ′ A  A Um  Un ∀x : A.B ∀x : A.B

Subtyping gives us a cumulative hierarchy U0 ⊆ U1 ⊆ U2 ⊆ ··· . Subtyping is needed so that the product rule works as expected. For instance, ∀x : U1.x is accommodated as an element of U2. Without the subtyping rule, ∀x : U1.x cannot be typed. Note that the product rule closes every universe under taking products. More- over, if A : u and B : v for some universes u and v, then the product ∀x : A.B is an element of either u or v (the subtyping rule is essential here).

One could hope that the system we have arrived at subsumes F if U0 is taken as P. However, this is not the case since ∀x : U0.x is not an element of U0. This can be fixed by adding a special product rule for U0:

Γ ⊢ A : u Γ ,x : A ⊢ B : U0 x ∉ Γ Γ ⊢∀x : A.B : U0

One says that this rule makes U0 impredicative. The system we have arrived at is a calculus of constructions known as CCω. CCω is the basic type theory underlying Coq. An extension of CCω is studied in Luo [1994]. It turns out that U0 is the only universe that can be made impredicative with- out losing consistency [Harper and Pollak, 1991]. Valid contexts are defined as before: Γ valid Γ ⊢ A : u x ∉ Γ 0 valid Γ ,x : A valid

For valid contexts, the typing relation of CCω enjoys the properties we have stated for STLC. The unique types property must be modified to least types unique up to β-equivalence. The canonical forms property needs to be adapted as well. In addition to the properties stated for STLC, CCω satisfies the following property. • Propagation If Γ valid and Γ ⊢ s : A, then there exists a universe u such that Γ ⊢ A : u.

A strong normalization proof for CCω can be found in Luo [1994].

5.9 Numbers as Inductive Type

The logical and the computational interpretation of CCω are separated in that we cannot express and prove propositions about numbers and primitive recursive

61 5 Type Theory functions. The situation changes if we accommodate numbers with an inductive type rather than with functions. Recall that we obtained T from STLC by adding the numbers as an inductive type. While T is a purely computational system, the situation changes if we extend CCω with inductive numbers since we can now type primitive recursion much more generally than before. It is now possible to express and prove propositions about numbers. It turns out that inductive proofs can be formulated as primitive recursive functions. Type theories with dependent function types and inductive types were for- mulated first by Martin-Löf in 1972. In contrast to CCω, Martin-Löf’s systems restrict all universes to be predicative. The initial versions of Coq did not pro- vide inductive types. Inductive types appeared in Coq around 1991. From the Coq perspective the eliminator for numbers is the most straightfor- ward combination of match and fix. One may think of the eliminator as a match extended with direct structural recursion.

We extend the terms of CCω as follows.

s,t,A,B = ··· | N | O | S s | elimA s t1 t2

We say that N, O, and S are constructors and that elim is an eliminator. The eliminator gives us primitive recursion and inductive proofs. The recursion is obtained with two reduction rules.

elimA O t1 t2 ≻ t1

elimA (S s) t1 t2 ≻ t2 s(elimA s t1 t2)

Coming from Coq, it is helpful to think of the term elimA s t1 t2 as a recursive match for s : nat. The terms t1 and t2 represent the two branches of the match (t1 is the branch for O and t2 is the branch for S). The second argument of t2 takes the result of the direct structural recursion. The term A represents the so-called return type function of the match. This function is present in Coq but is usually inferred automatically. Here are the typing rules for the new constructs.

Γ ⊢ s : N

Γ ⊢ N : U1 Γ ⊢ O : N Γ ⊢ S s : N

Γ ⊢ A : N → u Γ ⊢ s : N Γ ⊢ t1 : AO Γ ⊢ t2 : ∀n : N. An → A(Sn)

Γ ⊢ elimA s t1 t2 : As

Note that the instance u = U0 of the typing rule for the eliminator yields the induction principle for natural numbers.

62 5.10 Dependent Pair Types

The extension of CCω with inductive numbers preserves the important prop- erties of the typing relation. Given an inductive type definition, Coq automatically derives a function rep- resenting the eliminator for the type. For the inductive definition of nat, the elimination function is declared with the name nat_rect. The theories underlying early versions of Coq reduced match and fix to elimi- nators. So match and fix appeared as syntactic convenience for eliminators. Cur- rent versions of Coq provide for more permissive uses of match and fix. So far a fully satisfying foundation for the permissive use of match and fix is missing.

5.10 Dependent Pair Types

The elements of a function type ∀x : A.B are functions that take an element x of A to an element of B, where the type B may depend on x. Dependent pair types Σ x : A.B apply this typing idea to pairs: The elements of Σ x : A.B are pairs (x,y) such that x is an element of A and y is an element of B, where B may depend on x. Following the notation (due to Martin-Löf), dependent pair types are often called Σ-types.

We now extend CCω with dependent pair types. The usual eliminators for pair types are the projections π1 (first component) and π2 (second component). We will also consider a more expressive eliminator, which represents the match coming with the inductive definition of dependent pair types in Coq.

We extend the terms of CCω as follows:

s,t,A,B = ··· | Σ x : A.B | (s,t)A | π1 s | π2 s | elimA s t

The reduction rules for the new eliminators are as follows.

π1 (s1,s2)A ≻ s1

π2 (s1,s2)A ≻ s2

elimA (s1,s2)B t ≻ ts1 s2

There is an additional subtyping rule for pair types.

A  A′ B  B′ Σ x : A.B  Σ x : A′.B′

63 5 Type Theory

Here are the typing rules for the new constructors and eliminators.

Γ ⊢ A : u Γ ,x : A ⊢ B : u x ∉ Γ , u ≠ U0 Γ ⊢ Σ x : A.B : u

Γ Γ Γ x ,x : A ⊢ B : u ⊢ s : A ⊢ t : Bs x ∉ Γ , u ≠ U0 Γ ⊢ (s,t)Σ x : A.B : Σ x : A.B

Γ ⊢ s : Σ x : A.B Γ ⊢ s : Σ x : A.B Γ ⊢ Γ ⊢ x π1 s : A π2 s : Bπ1 s

Γ ⊢ C : (Σ x : A.B) → u

Γ ⊢ s : Σ x : A.B Γ ⊢ t : ∀x : A. ∀y : B. C(x,y)Σ x : A.B x ≠ y Γ ⊢ elimC s t : Cs

The first rule asserts that all universes but U0 are closed under taking dependent pair types. The impredicative universe U0 is excluded to avoid inconsistency. The rules for pairing and projection express basic intuitions. Pairs are annotated with their types so that the decidability of the typing relation is preserved. The rule for the eliminator elim corresponds to the typing rule for the matches Coq provides if Σ-types are defined inductively. The argument C of elim represents the return type function of the match.

It is not difficult to express the projections π1 and π2 with the eliminator elim. On the other hand, expressing the eliminator elim with the projections π1 and π2 seems impossible. The extension of CCω with dependent pair types preserves the important properties of the typing relation. Proofs can be found in Luo [1994]. In the logical interpretation, function types appear as implications and univer- sal quantifications. Likewise, pair types appear as conjunctions and existential quantifications. Existential quantifications can be expressed with function types if the uni- verse of propositions is impredicative. For pair types in predicative universes the coding does not work since the coded pair types appear as members of the next higher universe.

Exercise 57 Study dependent pair types in Coq. a) Define dependent pair types inductively. b) Define a function elim that models the eliminator for dependent pair types. c) Define the projections π1 and π2 using elim. d) Prove the η-law s = (π1 s,π2 s) using elim. e) Convince yourself that you cannot prove the η-law using the projections.

64 5.11 Recommended Reading f) Convince yourself that you cannot define elim using the projections. Do not use matches except when you define elim.

5.11 Recommended Reading

• The first chapter of the homotopy type theory (HOTT) book [2013] gives an informal presentation of type theory relating it to set theory. There is also a discussion of dependent pair types.

• Luo’s book [1994]. Comprehensive presentation of CCω with pair types. Com- plete proofs. Philosophical background. • Handbook chapter of Barendregt and Geuvers [2001]. Covers pure type sys- tems generalizing CC.

• Girard’s book [1990]. Presents the story from STLC to Fω. • Martin-Löf’s An Intuionistic Theory of Types. Revised version of a paper from 1972 at http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.131.926. Official references to Martin-Löf’s papers appear in the HOTT book.

65

6 PCF

PCF is a Turing complete system with general recursion. We present PCF with call-by-value reduction (CBV).

67

Bibliography

[1] Henk P. Barendregt. The Lambda Calculus: Its Syntax an Semantics. North- Holland, 1984. Revised Edition.

[2] J. Roger Hindley and Jonathan P. Seldin. Lambda-Calculus and Combinators, an Introduction. Cambridge University Press, 2008.

[3] Moses Schönfinkel. Über die Bausteine der mathematischen Logik. Mathema- tische Annalen, 92:305–316, 1924.

[4] Masako Takahashi. Parallel reductions in lambda calculus. Inf. Comput., 118(1):120–127, 1995.

69