
The Theory of the Foundations of Mathematics - 1870 to 1940


Mark Scheffer (Version 1.0)


Mark Scheffer, id. 415968, e-mail: [email protected]. Last changes: March 22, 2002. This report is part of a practical component of the Computing study at the Eindhoven University of Technology.

To work on the foundations of mathematics, two things are needed: Love and Blood.

- Anonymous quote, 2001.

Contents

1 Introduction

2 Cantor’s paradise
  2.1 The beginning of set-theory
  2.2 Basic concepts

3 Mathematical constructs in set-theory
  3.1 Some mathematical concepts
  3.2 Relations
  3.3 Functions
  3.4 Induction Methods
    3.4.1 Induction
    3.4.2 Deduction
    3.4.3 The of induction
  3.5 Real numbers
    3.5.1 Dedekind’s cuts
    3.5.2 Cantor’s chains of segments
    3.5.3 Cauchy-sequences
    3.5.4 Properties of the three definitions
  3.6 Infinite sets
  3.7 The Continuum Hypothesis
  3.8 Cardinal and Ordinal and
    3.8.1 Cardinal numbers and Cantor’s paradox
    3.8.2 Ordinal numbers and Burali-Forti’s Paradox

4 Peano and Frege
  4.1 Peano’s arithmetic
  4.2 Frege’s work

5 Russell
  5.1 Russell’s paradox
  5.2 Consequences and philosophies
  5.3 Zermelo Fraenkel
    5.3.1 Axiomatic set theory
    5.3.2 Zermelo Fraenkel (ZF)

6 Hilbert
  6.1 Hilbert’s proof theory
  6.2 Hilbert’s 23 problems

7 Types
  7.1 Russell and Whitehead’s
  7.2 Ramsey, Hilbert and Ackermann
  7.3 Quine

8 Gödel
  8.1 Informally: Gödel’s incompleteness theorems
  8.2 Formally: Gödel’s Incompleteness Theorems
    8.2.1 On formally undecidable propositions
    8.2.2 The impossibility of an ‘internal’ of
    8.2.3 Gödel numbering and a concrete proof of G1, G2 and G3
  8.3 Gödel’s and Peano
  8.4 Consequences
  8.5 Neumann-Bernays-Gödel axioms

9 Church and Turing
  9.1 Turing and Turing Machine
  9.2 Church and the Lambda Calculus
  9.3 The Church-Turing thesis

10 Conclusion

A Timeline and Images

Mathematical notation

Many different notations have been developed for set theory and logic. Most notations that we use are standard today; other notations that we use are introduced in the text.

Mathematical Logic

Symbol | Name | Also described as
∧ | conjunction | and
∨ | disjunction | (inclusive) or
¬ | negation | not
ϕ(x) | propositional function |
→ | implication | if ... then
↔ | bi-implication | if and only if, iff
≡ | equivalence | is equivalent to
∀ | universal quantifier | for all
∃ | existential quantifier | there exists
∃! | unique existential quantifier | there exists a unique

In most places we have chosen to use the following notation¹ to denote quantifications:

(Q x : range : term) denotes the accumulation, by the quantifier Q, of term over all values of the dummy x that satisfy range

Consider a general pattern (Q x₀,...,xₙ : ϕ(x₀,...,xₙ) : t(x₀,...,xₙ)), with Q a quantifier, ϕ a boolean expression in terms of the dummies x₀, ..., xₙ, and t(x₀,...,xₙ) the term of the quantification. The quantification is the accumulation of the values t(x₀,...,xₙ), using an operator or relation indicated by Q, over all values (x₀,...,xₙ) for which ϕ(x₀,...,xₙ) holds.

¹ Notation originally due to E.W. Dijkstra.

This notation is suitable for manipulation and unambiguous in the sense that it explicitly indicates the quantifier Q, the dummies and the range of the dummies, which is given by the boolean expression ϕ (i.e. it exactly determines the domain of the quantification). This allows us to reason about general properties of quantifications in a way in which the (scopes of the) bound variables are clearly identified. Note that this type of quantification is only suitable for binary operations that are symmetric (commutative) and associative.
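To make the accumulation reading concrete, here is a minimal Python sketch (our own illustration, not part of the original notation) that evaluates (Q x : ϕ(x) : t(x)) over a finite domain: the quantifier Q is modelled by a binary operator together with its unit, ϕ filters the dummies, and t computes the term. The name `quantify`, its parameters and the finite stand-ins for N are assumptions of the sketch; a range over all of N cannot be evaluated by enumeration. Since `functools.reduce` folds in enumeration order, the result is only independent of that order when the operator is commutative and associative, which matches the restriction stated above.

```python
from functools import reduce
import operator


def quantify(op, unit, domain, rng, term):
    """Accumulate term(x) with op over every x in a finite domain with rng(x).

    Models (Q x : rng(x) : term(x)); unit is the result for an empty range.
    """
    return reduce(op, (term(x) for x in domain if rng(x)), unit)


# (Σx : 0 ≤ x ≤ 5 : x²), drawing the dummy from a finite stand-in for N
squares = quantify(operator.add, 0, range(100), lambda x: 0 <= x <= 5, lambda x: x * x)

# (∃x : x ∈ N : x³ − x² = 18), searched only in 0..99 (an assumption of this sketch)
exists = quantify(operator.or_, False, range(100), lambda x: True,
                  lambda x: x ** 3 - x ** 2 == 18)

# (∀x :: x ∈ A → x ∈ B) for two small example sets A and B
A, B = {1, 2}, {1, 2, 3}
forall = quantify(operator.and_, True, A | B, lambda x: True,
                  lambda x: (x not in A) or (x in B))

print(squares, exists, forall)  # 55 True True
```

The existential search succeeds at x = 3, since 27 − 9 = 18.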

Example: (Σx : 0 ≤ x ≤ 5 : x²) = 0² + 1² + 2² + 3² + 4² + 5² = Σ_{x=0}^{5} x² = 55

Example: (∃x : x ∈ N : x³ − x² = 18) ≡ ‘there exists a natural x such that x³ − x² = 18’

If the range admits all possible values of the dummy (here: x), or if it is clear what the range of a dummy is, we can omit it.

Example: (∀x : true : x ∈ A → x ∈ B) ≡

(∀x :: x ∈ A → x ∈ B) ≡

‘all elements of A are also elements of B’

Chapter 1

Introduction

Pure mathematics is, in its way, the poetry of logical ideas.

-

This report covers the most important developments and theory of the foundations of mathematics in the period of 1870 to 1940. The tale of the foundations is fairly familiar in general terms and for its philosophical content; here the main emphasis is laid on the mathematical theory. The history of the foundations of mathematics is complicated and is a many-sided story; with this article I do not aim to give a definitive or complete version, but to capture what I consider the essence of the theoretical developments, and to present them in a clear and modern setting. Some basic mathematical knowledge of set theory and logic is presupposed.

By the middle of the nineteenth century, certain logical problems (for example paradoxes around the notions of infinity, the infinitesimal and continuity) at the heart of mathematics had inspired a movement, led by German mathematicians, to provide mathematics with more rigorous foundations.

This is where the theory of this report begins, with the emergence of set theory through the German mathematician Cantor. In section 2.1 we informally describe how work on a problem concerning trigonometric series gradually led Cantor to his theory of sets (section 2.2). As a result of the work of Weierstrass, Dedekind and Cantor, analysis had been provided with much more sophisticated foundations. The notion of infinitesimal had been banished, ‘real’ numbers had been provided with a logically consistent definition (section 3.5), continuity had been redefined and, more controversially, a whole new branch of arithmetic had been invented which addressed itself to the problems (e.g. paradoxes) of infinity (sections 3.6, 3.7). In 1895 Cantor discovered a paradox (section 3.8.1) that he did not publish but communicated to Hilbert in 1896. In 1897 it was rediscovered in a slightly different form by Burali-Forti (section 3.8.2). Cantor and Burali-Forti could not resolve this paradox, but it was not taken very seriously, partly because it appeared in a rather technical area.

The Italian mathematician Peano (section 4.1) was able to show that the whole of arithmetic could be founded upon a system that uses three basic notions and five initial axioms. At the same time the German mathematician Frege (section 4.2) worked on developing a logical foundation for mathematics. Like Peano, Frege wanted to put mathematics on firm ground. But Frege’s grounds were strictly logical; he followed an approach that was later called logicism.

The British mathematician Russell noted Peano’s work and later that of Frege. Soon thereafter he showed (section 5.1) how notions like the ‘set of all sets’ could be self-contradictory (i.e. paradoxical) and pointed out the difficulties that arose with self-referential terms. The paradox that Russell found did not only occur in specific technical areas, but affected all of the axiomatic systems underlying mathematics at the time (section 5.1). But since the paradoxes could be avoided in most practical applications of set theory, confidence in set theory as a proper foundation of mathematics remained. Axiomatic set theory (section 5.3.1) was an attempt to arrive at a theory without paradoxes. Various responses to the paradox (section 5.2) led to new sets of axioms for set theory. The two main approaches are by the German mathematicians Zermelo and Fraenkel (section 5.3), and by the Hungarian von Neumann, the Hungarian-Austrian Gödel and the Swiss Bernays (section 8.5). It also led to the emergence of the ‘intuitionistic’ school of mathematics by the Dutch mathematician Brouwer (not covered here) and to a theory of types, proposed by Russell himself with the help of his former teacher, the English mathematician Whitehead. Despite the paradox, Russell and Whitehead still claimed that all mathematics could be founded on a mathematical logic; this belief was given a definite presentation in their work ‘Principia Mathematica’ (section 7.1). Various consequences followed (section 7.3) and new conceptions of logic arose (by Wittgenstein and Ramsey, see section 7.2).

At the turn of the century, the German mathematician Hilbert listed certain important problems concerning the foundations of mathematics and mathematics in general (section 6.2). To overcome paradoxes and other problems that arose in existing systems, Hilbert developed a theory of axiomatic systems (section 6.1). He then encouraged his student Zermelo to use this axiomatic method to develop the first set of axioms for set theory (section 5.3.2). Hilbert subsequently made more precise demands on any proposed set of axioms for mathematics (section 6.1) in terms of consistency, completeness and decidability.

In 1931 Gödel showed that consistency and completeness could not both be attained (chapter 8). Gödel’s work left open Hilbert’s question of decidability. The English mathematician Turing proved in 1936 that there are undecidable problems: after formalizing the notion of algorithm with his concept of the Turing machine, he gave the so-called halting problem, which cannot be solved by any Turing machine (section 9.1). The American mathematician Church (independently) obtained the same result, but with another formalization of the notion of an algorithm, using his computational model of the lambda calculus (section 9.2). In section 9.3 we state that these two notions are equivalent and correspond to the intuitive notion of algorithm or computability. In chapter 10 I summarize the theory of the foundations of mathematics, before giving my own opinion and making some suggestions for future work.

This article is part of the practical component of my study of computing science, and was for a large part written in 8 weeks at Heriot-Watt University in Edinburgh under the supervision of Prof. F. Kamareddine. I want to thank Rob Nederpelt and the section of the computing science department of the Eindhoven University of Technology for making this possible. Rob Nederpelt always inspired me to continue working on this report and was patient in explaining difficult proofs to me. And last but not least, I want to thank Fairouz Kamareddine for her support and positive motivation, and Boukje Nouwen (as she breathes a sigh of relief that this is (I think) the last revision) for the typesetting and editing of large parts of this document and for helping me in many ways to finish this article in such a short period of time.

Chapter 2

Cantor’s paradise

2.1 The beginning of set-theory

Perhaps the most surprising thing about mathematics is that it is so surprising. The rules which we make up at the beginning seem ordinary and inevitable, but it is impossible to foresee their consequences. These have only been found out by long study, extending over many centuries. Much of our knowledge is due to a comparatively few great mathematicians such as Newton, Euler, Gauss, or Riemann; few careers can have been more satisfying than theirs. They have contributed to human even more lasting than great literature, since it is independent of .

- Titchmarsh, E. C. in [88]

By the late 19th century the discussions about the foundations of had become the focus for a running debate about the of the branches of mathematics ([23, last paragraph of section 35, page 69/70]). Although there had been no conscious plan leading in that direction, the stage was set for a consideration of questions about the fundamental nature of mathematics.

In the study of logic, the work of the English mathematician in the 1850s ([49, chapter 2.S4, page 51]), and that of the American Charles Sanders Peirce around 1880 ([49, page 187]), had contributed to the development of a symbolism to explore logical deductions, and in the logician (see [98]) had directed keen attention to fundamental questions.

All of these debates came together through the pioneering work of the German mathematician Georg Cantor on the concept of a set. Cantor had begun work in this area because of his interest in Riemann’s theory of trigonometric series.

In Germany at the university of Halle, the direction of Cantor’s research turned away from number theory and towards analysis. This was due to Heine, one of his senior colleagues at Halle, who challenged Cantor to prove the open problem on the uniqueness of representation of a function as a trigonometric series (see [30, section 5.2, page 182]). Starting from the work on trigonometric series and on functions of a complex variable done by the German mathematician Riemann (see [75]) in 1854, Cantor in 1870 showed ([30, page 182]) that such a function can be represented in only one way by a trigonometric series. Consideration of the collections of numbers (originally termed ‘point sets’, see [30, section 5.2, page 184]) that would not conflict with such a representation led him, first, in 1872, to define irrational numbers in terms of convergent sequences of rational numbers (or quotients of integers, see section 3.5.2) and then to begin his major lifework, the theory of sets and the concept of transfinite numbers.

2.2 Basic concepts

The essence of mathematics lies in its freedom.

- Georg Cantor, quoted in [58]

In 1874 Cantor published his first article on set-theory. A set, wrote Cantor (in ‘Untersuchungen über die Grundlagen der Mengenlehre I’, published in [20, page 261-281]), is “a collection of definite, distinguishable objects of perception or thought conceived as a whole”. In this report we use a similar definition of the concept of a set.

What is a set? A (finite or infinite) collection of objects, that is considered as a single, abstract object.

A set is sometimes also called an aggregate, or (as it was first called by Riemann (see [31, page 88]) and later by the mathematician Russell) a manifold. The objects are also called elements or members of the set.

We denote a set by listing its elements between ‘{’ and ‘}’, and membership of an element in a set by the membership relation ∈.

Example: If we consider a set that contains natural numbers, we write 4 ∈ {2, 3, 4, 5} to indicate that 4 is an element of the set {2, 3, 4, 5}. We write 4 ∉ {7, 8, 9} to indicate that 4 is not an element of the set {7, 8, 9}.

In a mathematical context we mostly consider sets of numbers and functions. We denote the well-known sets of natural numbers by N (this set is also called the naturals), the integers by Z, the fractional numbers by Q (this set is also called the rationals) and the reals by R (this set is also called the continuum). The objects of a set can themselves also be sets.

What is set theory? A branch of mathematics that deals with the properties of well-defined collections of objects, which may be of a mathematical nature, such as numbers or functions, or not.

Cantor defined ([49, page 288]) two sets A and B to be identical (equal), notation A = B, if and only if A and B have the same elements. When set-theory was later axiomatized, this definition also became known as the

Axiom of extensionality: A = B := (∀x :: (x ∈ A ↔ x ∈ B))

Example: {3, 3, 7} = {7, 3} and {2, {3, 4}} ≠ {{2, 3}, 4}
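Python's `frozenset` offers a convenient way to experiment with this axiom, because frozensets are hashable and can therefore be elements of other sets. The following sketch (the helper `extensionally_equal` is our own, hypothetical name) checks (∀x :: x ∈ A ↔ x ∈ B) directly; for finite sets it suffices to look at the elements of A ∪ B, since no other object belongs to either set.

```python
def extensionally_equal(a, b):
    """Axiom of extensionality on finite sets: (∀x :: x ∈ a ↔ x ∈ b).

    Only elements of a ∪ b can make the two memberships differ,
    so it suffices to check those.
    """
    return all((x in a) == (x in b) for x in a | b)


# frozenset is used because a set must be hashable to be an element of a set
left = frozenset({3, 3, 7})   # duplicates collapse: this is {3, 7}
right = frozenset({7, 3})
nested_a = frozenset({2, frozenset({3, 4})})
nested_b = frozenset({frozenset({2, 3}), 4})

print(extensionally_equal(left, right))         # True
print(extensionally_equal(nested_a, nested_b))  # False
```

Python's built-in `==` on sets already compares by elements, so the helper only spells out what the axiom says.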

The relation ‘is a subset of’, notation ⊆, indicates that one set is contained in the other:

Definition of subset: A ⊆ B := (∀x :: x ∈ A → x ∈ B)

Definition of proper subset: A ⊂ B := A ⊆ B ∧ A ≠ B
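Both definitions translate almost literally into Python for finite sets. In this sketch (helper names are our own) the universal quantifier becomes `all(...)`, and only elements of A need to be inspected, because for any x ∉ A the implication x ∈ A → x ∈ B is trivially true.

```python
def is_subset(a, b):
    """A ⊆ B := (∀x :: x ∈ A → x ∈ B); only x ∈ A can falsify the implication."""
    return all(x in b for x in a)


def is_proper_subset(a, b):
    """A ⊂ B := A ⊆ B ∧ A ≠ B."""
    return is_subset(a, b) and a != b


print(is_subset({1, 2}, {1, 2, 3}), is_proper_subset({1, 2}, {1, 2}))  # True False
```

Python's built-in set comparisons `<=` and `<` implement the same two relations.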

We often want to create a new set from a given set by selecting elements that have certain properties. For example we take the set of powers of three or the set of all even numbers (to be exact: the set containing those elements of the set of natural numbers that have the property of being divisible by 2). This principle was used by Cantor; we call it the unrestricted or naive comprehension principle because it later turned out to be untenable (see sections 3.8 and 5.1).

Comprehension principle: For all properties ϕ there is precisely one set, denoted by {x | ϕ(x)}, whose elements are exactly those objects which have the property ϕ.

We thus have that y ∈ {x | ϕ(x)} ↔ ϕ(y). As a consequence (by taking ϕ(x) = false for all x), there is at least one set that has no elements: the empty set, denoted by ∅.

Theorem: (∃!x :: (∀y :: y ∉ x))
Proof: If we take ϕ to be false, the comprehension principle says that ‘there is precisely one set whose elements are exactly those objects which have the property false’. In mathematical notation: (∃!x :: (∀y :: y ∈ x ↔ false)). This is equivalent to saying that no element y can be a member of x: (∃!x :: (∀y :: y ∉ x)). From now on, we denote this unique set x by ∅ and call it the empty set.

Corollary: (∀a :: ∅ ⊆ a)
Proof: We want to prove that (∀a :: ∅ ⊆ a) or, using the definition of the subset relation: (∀a :: (∀x :: x ∈ ∅ → x ∈ a)). From the previous theorem we know that (∀y :: y ∉ ∅). This yields (∀x :: false → x ∈ a), which is true.
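Python's set comprehensions give a feel for the comprehension principle, with one difference worth pointing out as an aside: a comprehension always selects from an existing collection, whereas Cantor's unrestricted principle forms {x | ϕ(x)} out of all objects whatsoever. The sketch below uses a hypothetical finite universe `U` and a helper `comprehend` (both our own names); it builds the set of even numbers, obtains the empty set by taking ϕ = false, and checks the corollary ∅ ⊆ a, which holds vacuously.

```python
U = range(20)  # hypothetical finite collection that every comprehension draws from


def comprehend(phi):
    """{x ∈ U | phi(x)}: the elements of U that have the property phi."""
    return frozenset(x for x in U if phi(x))


evens = comprehend(lambda x: x % 2 == 0)
empty = comprehend(lambda x: False)

# (∀a :: ∅ ⊆ a): "every element of ∅ lies in a" is vacuously true
print(empty == frozenset(), all(x in evens for x in empty))  # True True
```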

Using the comprehension principle we can create new sets from given sets, so we can now introduce some operations on sets by applying the comprehension principle. But before we do that, we first introduce some general (i.e. regardless of whether the operations are set-theoretic or not) properties of operations: idempotence, commutativity, associativity and distributivity. Although Cantor did not formulate these properties as such, they are used in calculus and are useful in the set theory that follows in this chapter. Suppose ⊕ and ⊗ are binary¹ operations on a certain domain and E, F and G are elements of that domain (for example sets), on which we have defined the relation ‘=’.

Definition of idempotence: ⊕ is idempotent := (∀E :: E ⊕ E = E)

Definition of commutativity: ⊕ is commutative := (∀E,F :: E ⊕ F = F ⊕ E)

Definition of associativity: ⊕ is associative := (∀E,F,G :: (E ⊕ F ) ⊕ G = E ⊕ (F ⊕ G))

Definition of distributivity: ⊕ is distributive² over ⊗ := (∀E,F,G :: E ⊕ (F ⊗ G) = (E ⊕ F) ⊗ (E ⊕ G))

¹ These properties can also be generalized to operations of arbitrary arity, but this will not be necessary for our discussion.
² This form of distributivity is also called left-distributivity, as opposed to right-distributivity: ⊕ is right-distributive over ⊗ := (∀E,F,G :: (E ⊗ F) ⊕ G = (E ⊕ G) ⊗ (F ⊕ G)). In ordinary mathematics this distinction is often left out for commutative operations; for example, we simply say that × is distributive over + (when in fact it is both left- and right-distributive).

The symbol ∪ is employed to denote the union of two sets. Thus, the set A ∪ B is defined as the set that consists of all elements belonging to set A or to set B (or to both).

Definition of union: A ∪ B := {x | x ∈ A ∨ x ∈ B}

The intersection operation is denoted by the symbol ∩. A ∩ B is defined as the set composed of all elements that belong to both A and B.

Definition of intersection: A ∩ B := {x | x ∈ A ∧ x ∈ B}

Any two sets the intersection of which is the empty set are said to be disjoint. A collection of sets is called (pairwise) disjoint or mutually exclusive if any two distinct sets in it are disjoint.

Example: The operations union and intersection on sets are both idempotent, commutative and associative.
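The properties of operations defined above can be checked mechanically on a small finite domain. The following Python sketch (all helper names are our own) takes all subsets of {1, 2, 3} as the domain, confirms the example for union and intersection, checks both distributive laws between them, and shows that set difference is neither idempotent nor commutative. A finite check like this illustrates the laws; it does not prove them for arbitrary sets.

```python
from itertools import combinations, product


def all_subsets(universe):
    """Every subset of a finite universe, as frozensets: our test domain."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def idempotent(op, dom):
    return all(op(e, e) == e for e in dom)


def commutative(op, dom):
    return all(op(e, f) == op(f, e) for e, f in product(dom, repeat=2))


def associative(op, dom):
    return all(op(op(e, f), g) == op(e, op(f, g)) for e, f, g in product(dom, repeat=3))


def distributive(op, over, dom):
    """Left-distributivity: E op (F over G) = (E op F) over (E op G)."""
    return all(op(e, over(f, g)) == over(op(e, f), op(e, g))
               for e, f, g in product(dom, repeat=3))


def union(a, b):
    return a | b


def intersection(a, b):
    return a & b


def difference(a, b):
    return a - b


dom = all_subsets({1, 2, 3})
for op in (union, intersection):
    print(op.__name__, idempotent(op, dom), commutative(op, dom), associative(op, dom))
print(distributive(intersection, union, dom), distributive(union, intersection, dom))
print(idempotent(difference, dom), commutative(difference, dom))  # False False
```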

The difference of sets B and A, denoted B − A, contains those elements of B that are not in A.

Definition of difference: B − A := {x | x ∈ B ∧ x ∉ A}

If A ⊆ B we often call the difference B − A the relative complement of A in B. We then call B the universe, and if it is clear what the universe is we often denote the relative complement of A by Aᶜ. From the definitions that we have introduced so far, we can deduce three properties that are known as the laws of reciprocity. The second and third law are also known as the laws of De Morgan, named after the English mathematician Augustus De Morgan:

First law of reciprocity: A ⊆ B ↔ Aᶜ ⊇ Bᶜ
Second law of reciprocity: (A ∪ B)ᶜ = Aᶜ ∩ Bᶜ
Third law of reciprocity: (A ∩ B)ᶜ = Aᶜ ∪ Bᶜ
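As a sanity check (not a proof), the three laws of reciprocity can be verified exhaustively for all pairs of subsets of a small universe. In this sketch the universe U = {0, ..., 4} is an arbitrary assumption, and `complement` computes the relative complement with respect to U.

```python
from itertools import combinations

U = frozenset(range(5))  # hypothetical universe for this check


def subsets(universe):
    """All subsets of a finite universe, as frozensets."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)]


def complement(a):
    """Relative complement of a in the universe U."""
    return U - a


pairs = [(a, b) for a in subsets(U) for b in subsets(U)]
first_law = all((a <= b) == (complement(a) >= complement(b)) for a, b in pairs)
second_law = all(complement(a | b) == complement(a) & complement(b) for a, b in pairs)
third_law = all(complement(a & b) == complement(a) | complement(b) for a, b in pairs)
print(first_law, second_law, third_law)  # True True True
```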

We define the powerset of V, denoted by P(V), as the set of all subsets of V. Note that even if V = ∅, this operation creates a larger set from the given set V, since P(∅) = {∅}.

Definition of powerset: P(V) := {A | A ⊆ V}

Given a set V, we thus have that (∀y :: y ∈ P(V) ↔ y ⊆ V)
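For a finite set V the powerset can be enumerated by collecting all combinations of its elements of every size. This minimal sketch (the name `powerset` is our own) returns P(V) as a frozenset of frozensets and illustrates the remark above: P(∅) = {∅} already has one element more than ∅, and a three-element set has 2³ = 8 subsets.

```python
from itertools import combinations


def powerset(v):
    """P(V) := {A | A ⊆ V}, returned as a frozenset of frozensets."""
    items = list(v)
    return frozenset(
        frozenset(c) for r in range(len(items) + 1) for c in combinations(items, r)
    )


print(powerset(set()))           # frozenset({frozenset()})
print(len(powerset({1, 2, 3})))  # 8
```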

We can extend the union of a pair of sets to an arbitrary collection of sets A; the union ⋃A is then defined as the set of all objects which belong to at least one set in the collection A. We can do the same for the intersection ⋂A.
Definition: ⋃A := {x | (∃y :: y ∈ A ∧ x ∈ y)}
Definition: ⋂A := {x | (∀y :: y ∈ A → x ∈ y)}
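A minimal Python sketch of the generalized union and intersection, for a finite collection of finite sets (helper names are our own). One design choice deserves mention: for an empty collection the definition of ⋂A admits every object, since the condition (∀y :: y ∈ A → x ∈ y) is then vacuously true. The sketch only considers candidates taken from ⋃A and therefore returns the empty set in that case, which deliberately deviates from the definition.

```python
def big_union(collection):
    """⋃A := {x | (∃y :: y ∈ A ∧ x ∈ y)}."""
    return frozenset(x for y in collection for x in y)


def big_intersection(collection):
    """⋂A := {x | (∀y :: y ∈ A → x ∈ y)}, with candidates taken from ⋃A.

    For an empty collection the definition would admit every object;
    this sketch returns the empty set instead.
    """
    return frozenset(x for x in big_union(collection)
                     if all(x in y for y in collection))


A = [{1, 2, 3}, {2, 3, 4}, {3, 5}]
print(sorted(big_union(A)), sorted(big_intersection(A)))  # [1, 2, 3, 4, 5] [3]
```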

We can divide a set of objects into a partition, that is a family of subsets that are mutually exclusive and jointly exhaustive. Assume P is a set of subsets of X.

Definition of partition: P is a partition of X := X = ⋃{A | A ∈ P} ∧ (∀A, B : A, B ∈ P : A = B ∨ A ∩ B = ∅)
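The two conditions of the definition, jointly exhaustive (the blocks cover X) and mutually exclusive (distinct blocks are disjoint), can be tested directly for finite sets. In this sketch, `is_partition` and the example families are our own illustrations.

```python
def is_partition(p, x):
    """P partitions X: ⋃P = X and any two distinct blocks of P are disjoint."""
    covers = frozenset(e for block in p for e in block) == frozenset(x)
    disjoint = all(a == b or not (a & b) for a in p for b in p)
    return covers and disjoint


X = {1, 2, 3, 4}
good = {frozenset({1, 2}), frozenset({3}), frozenset({4})}
overlapping = {frozenset({1, 2}), frozenset({2, 3, 4})}  # covers X, but shares 2
print(is_partition(good, X), is_partition(overlapping, X))  # True False
```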

In this chapter I have made extensive use of [30] in section 2.1 and [17] in section 2.2.

Chapter 3

Mathematical constructs in set-theory

3.1 Some mathematical concepts

The mathematician is entirely free, within the limits of his imagination, to construct what world he pleases. What he is to imagine is a matter for his own caprice; he is not thereby discovering the fundamental of the universe nor becoming acquainted with the ideas of God. If he can find, in , sets of entities which obey the same logical scheme as his mathematical entities, then he has applied his mathematics to the external world; he has created a branch of science.

- J.W.N. Sullivan in Aspects of Science, 1925

Now that we have the apparatus of set-theory available, we will see that it is not just a separate branch of mathematics, but that we can define some basic mathematical constructs in set-theory. In this section we consider ordered pairs and the cartesian product, which we need before we can treat relations (in section 3.2) and functions (in section 3.3).

First we consider the mathematical concept of an ordered pair. Compared to a ‘normal’ pair, where two pairs are considered equal if they have the same elements, we want an ordered pair ⟨a, b⟩ to also have the property that the elements appear in the same order:

(∀a, b, c, d :: ⟨a, b⟩ = ⟨c, d⟩ ↔ a = c ∧ b = d)

We can now easily verify that the following definition (see [17, chapter 8]) in set-theory satisfies the desired property.

Definition of ordered pair¹: ⟨a, b⟩ := {{a}, {a, b}}

As the cartesian product A × B is by definition the set of all ordered pairs ⟨a, b⟩ with a ∈ A and b ∈ B, we can now use the same definition in set-theory:

Definition of cartesian product: A × B := {⟨a, b⟩ | a ∈ A ∧ b ∈ B}
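Kuratowski's encoding ⟨a, b⟩ := {{a}, {a, b}} can be built from nothing but (frozen)sets, which makes it easy to check the characteristic property of ordered pairs by brute force on a few small values, and then to form A × B from it. The helper names `kpair` and `cartesian` are our own.

```python
from itertools import product


def kpair(a, b):
    """Kuratowski pair ⟨a, b⟩ := {{a}, {a, b}}, built only from (frozen)sets."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def cartesian(A, B):
    """A × B := {⟨a, b⟩ | a ∈ A ∧ b ∈ B}."""
    return frozenset(kpair(a, b) for a in A for b in B)


# Check ⟨a, b⟩ = ⟨c, d⟩ ↔ a = c ∧ b = d for all a, b, c, d in a small range
values = range(3)
pair_property = all(
    (kpair(a, b) == kpair(c, d)) == (a == c and b == d)
    for a, b, c, d in product(values, repeat=4)
)
print(pair_property, len(cartesian({1, 2}, {"x", "y", "z"})))  # True 6
```

Note that ⟨a, a⟩ collapses to {{a}}, yet the property still holds.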

Let V = {Vᵢ | i ∈ I} be a set of sets. We now define the cartesian product of a set of sets, denoted by ×V or ×_{i∈I} Vᵢ. The definition uses the concept of a function, which will be introduced in section 3.3.

Definition of cartesian product of a set of sets: ×V := {f : I → ⋃_{i∈I} Vᵢ | (∀i : i ∈ I : f(i) ∈ Vᵢ)}

¹ Representation originally by Kuratowski, see [49, page 294].

3.2 Relations

Mathematicians do not study objects, but relations between objects. Thus, they are free to replace some objects by others as long as the relations remain unchanged. Content to them is irrelevant: they are interested in form only.

- J.H. Poincaré

In mathematics, a (binary) relation assigns to each pair of elements, taken from two given sets, either true or false: the elements of the pair are related or they are not. We formalize this notion in set-theory.

Definition of binary relation: R is a binary relation between X and Y := R ⊆ X × Y

Note: We can easily generalize this definition to n-ary relations: R is an n-ary relation on X₁, ..., Xₙ := R ⊆ X₁ × X₂ × ... × Xₙ, for n ∈ N. We call n the arity of the relation.

Example: We have already seen the definitions of the subset and proper subset relations in section 2.2. There we defined the set R ⊆ X × Y implicitly by using a statement; only those pairs are in R for which the statement holds (here we are in fact using the comprehension principle of page 16). We will continue to use statements to define relations.

We define the following shorthand notation (sometimes also written in infix notation as xRy): R(x, y) := ⟨x, y⟩ ∈ R.

Example: The mathematical expression ‘x < y’ is thus shorthand for ‘⟨x, y⟩ ∈ R’, with R representing the ‘less than’ relation.

Example: The relation < on the naturals (i.e. between N and N) can be defined as:

{ ⟨0, 1⟩, ⟨1, 2⟩, ⟨2, 3⟩, ...,
  ⟨0, 2⟩, ⟨1, 3⟩, ⟨2, 4⟩, ...,
  ⟨0, 3⟩, ⟨1, 4⟩, ⟨2, 5⟩, ...,
  ... }

On a relation R we can define the concepts of domain and range.

Definition of domain, range:
dom(R) := {x ∈ X | (∃y : y ∈ Y : R(x, y))}
ran(R) := {y ∈ Y | (∃x : x ∈ X : R(x, y))}

If we define the identity relation on X, we want it to have the usual property that id_X(x) = x for all x ∈ X (see for example [3, section 1.9.5.b, page 30]). In set-theory, we denote the identity relation on V by I_V.

Definition of identity relation: I_V := {⟨x, y⟩ ∈ V × V | x = y}

Assume R is a binary relation on a set X (i.e. R ⊆ X × X). As we did for operations in section 2.2, we can also define some general properties of relations. Note that we have already defined an equality relation ‘=’ on X on page 16. We can either explicitly state on which domain a property holds (e.g. R is reflexive on X) or leave this implicit (e.g. simply R is reflexive).

Definition of reflexivity: R is reflexive := (∀x : x ∈ X : R(x, x))

Definition of symmetry: R is symmetric := (∀x, y : x, y ∈ X : R(x, y) → R(y, x))

Definition of anti-symmetry: R is anti-symmetric := (∀x, y : x, y ∈ X : R(x, y) ∧ R(y, x) → x = y)

Definition of transitivity: R is transitive := (∀x, y, z : x, y, z ∈ X : R(x, y) ∧ R(y, z) → R(x, z))

Definition of connectivity: R is connective := (∀x, y : x, y ∈ X : R(x, y) ∨ (x = y) ∨ R(y, x))

Definition of equivalence: R is an equivalence relation := R is reflexive, symmetric and transitive

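Note: A relation that is not symmetric need not be anti-symmetric; ‘not symmetric’ and ‘anti-symmetric’ are different notions. (The term asymmetric is usually reserved for the stronger property (∀x, y : x, y ∈ X : R(x, y) → ¬R(y, x)).)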

Example: The subset relation is reflexive, anti-symmetric (note that the proof of anti-symmetry uses the axiom of extensionality of page 16) and transitive, but not connective.
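The properties of relations can be tested by brute force on a finite set. In this Python sketch a relation R on X is represented as a set of tuples, which is our stand-in for a set of ordered pairs ⟨x, y⟩; the helper names are our own. It reproduces the example above for the subset relation on P({1, 2}), and adds two further relations on {0, ..., 5} for comparison: ‘less than’ and ‘same remainder modulo 3’.

```python
from itertools import combinations


def reflexive(R, X):
    return all((x, x) in R for x in X)


def symmetric(R, X):
    return all((y, x) in R for x in X for y in X if (x, y) in R)


def anti_symmetric(R, X):
    return all(x == y for x in X for y in X if (x, y) in R and (y, x) in R)


def transitive(R, X):
    return all((x, z) in R
               for x in X for y in X for z in X
               if (x, y) in R and (y, z) in R)


def connective(R, X):
    return all((x, y) in R or x == y or (y, x) in R for x in X for y in X)


# The subset relation on P({1, 2}), mirroring the example above
P = [frozenset(c) for r in range(3) for c in combinations([1, 2], r)]
subset_rel = {(a, b) for a in P for b in P if a <= b}
print([f(subset_rel, P) for f in (reflexive, anti_symmetric, transitive, connective)])
# [True, True, True, False]

# Two relations on X = {0, ..., 5} for comparison
X = range(6)
less_than = {(x, y) for x in X for y in X if x < y}
same_mod_3 = {(x, y) for x in X for y in X if (x - y) % 3 == 0}
```

Here `same_mod_3` is an equivalence relation, while `less_than` is transitive and connective but not reflexive.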

If R is an equivalence relation on a set X, we denote the equivalence class of x with respect to R by [x]R.

Definition of equivalence class: [x]R := {y ∈ X | R(x, y)}

If R is an equivalence relation on X, the quotient set X/R of X modulo R is the set of equivalence classes [x]R for all x ∈ X.

Definition of quotient set: X/R := {[x]R | x ∈ X}
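As a small illustration, the following sketch computes equivalence classes and the quotient set for the relation ‘same remainder modulo 3’ on X = {0, ..., 5}. The relation is again represented as a set of tuples, and the helper names are our own. The resulting classes are pairwise disjoint and together cover X, so the quotient set is a partition of X.

```python
X = range(6)
R = {(x, y) for x in X for y in X if (x - y) % 3 == 0}  # same remainder modulo 3


def equivalence_class(x, R, X):
    """[x]R := {y ∈ X | R(x, y)}."""
    return frozenset(y for y in X if (x, y) in R)


def quotient_set(X, R):
    """X/R := {[x]R | x ∈ X}; duplicate classes collapse automatically."""
    return frozenset(equivalence_class(x, R, X) for x in X)


print(sorted(sorted(c) for c in quotient_set(X, R)))  # [[0, 3], [1, 4], [2, 5]]
```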

We now continue to build on the concept of relations, by categorizing them based on the properties they have. An important property of relations is the ability to compare and order elements. Suppose X and Y are sets, and R is a relation on X.

Definition of (weak) partial ordering: R is a (weak) partial ordering := R is reflexive, anti-symmetric and transitive (on X)

Definition of quasi ordering: R is a quasi ordering := R is irreflexive and transitive

Definition of strict partial ordering: R is a strict partial ordering := R is irreflexive, anti-symmetric and transitive

Definition of (total or linear) ordering: R is a (total or linear) ordering := R is irreflexive, anti-symmetric, transitive and connective

Definition of well-ordering: R is a well-ordering := R is an ordering on X and each nonempty subset of X has a least element

Definition of well-foundedness: A set S is well-founded by a relation R := S is partially ordered by R and contains no infinite descending chains

A set S contains an infinite descending chain C iff C ⊆ S ∧ C ≠ ∅ ∧ C has no minimal element.

Theorem: (without proof) Any subset of a well-founded set is also well-founded.

Now that we can speak of a set of which the elements are ordered by a relation R, we define the well-known concepts of (immediate) successor and predecessor.

Definition of (immediate) predecessor: An element x₁ ∈ X is a predecessor of an element x₂ ∈ X (with respect to an ordering R on X) := R(x₁, x₂) ∧ ¬R(x₂, x₁). x₁ is an immediate predecessor of x₂ if in addition (¬∃x₃ : x₃ ∈ X ∧ x₃ ≠ x₁ ∧ x₃ ≠ x₂ : R(x₁, x₃) ∧ R(x₃, x₂))

Definition of (immediate) successor: An element x₂ ∈ X is a successor of an element x₁ ∈ X (with respect to an ordering R on X) := R(x₁, x₂) ∧ ¬R(x₂, x₁). x₂ is an immediate successor of x₁ if in addition (¬∃x₃ : x₃ ∈ X ∧ x₃ ≠ x₁ ∧ x₃ ≠ x₂ : R(x₁, x₃) ∧ R(x₃, x₂))

Note that with these definitions it can easily be proved that if a relation R on X is an ordering, then each element has at most one immediate predecessor and at most one immediate successor; if X is moreover finite, then each element except the smallest has exactly one immediate predecessor and each element except the largest has exactly one immediate successor. The notions of smallest and largest elements will be introduced hereafter. In the literature the immediate successor or predecessor is sometimes called just successor or predecessor. Sometimes the term ‘direct’ is used instead of ‘immediate’, or we simply speak of the ‘next’ or ‘previous’ value.

When R is a partial ordering we often denote it by the symbol ⪯, and when it is a quasi ordering by ≺. Now we can distinguish elements based on their order. Let X be a set, partially ordered by ⪯, and let Y be a subset of X.

Definition of minimal element: x is a minimal element of X := x ∈ X ∧ (¬∃y : y ∈ X ∧ y ≠ x : y ⪯ x)

Definition of maximal element: x is a maximal element of X := x ∈ X ∧ (¬∃y : y ∈ X ∧ y ≠ x : x ⪯ y)

Definition of least element: x is a least (also called smallest or first) element of X := x ∈ X ∧ (∀y : y ∈ X : x ⪯ y)

Definition of greatest element: x is a greatest (also called maximum, largest or last) element of X := x ∈ X ∧ (∀y : y ∈ X : y ⪯ x)

Definition of lowerbound: x is a lowerbound for Y in X := x ∈ X ∧ (∀y : y ∈ Y : x ⪯ y)

Definition of upperbound: x is an upperbound for Y in X := x ∈ X ∧ (∀y : y ∈ Y : y ⪯ x)

Definition of infimum: x is an infimum for Y in X := x is the greatest lowerbound for Y in X

Definition of supremum: x is a supremum for Y in X := x is the smallest upperbound for Y in X

Example: Let X = {4, 6, 12, 24, 36} and R(x, y) := x is a divisor of y. Then R is a (weak) partial order, but not a strict partial order, not a quasi order (it is reflexive rather than irreflexive) and not a (total) order. 4 and 6 are minimal elements of X, but X has no least element. Viewing X as a subset of the positive naturals ordered by divisibility, 1 is a lowerbound for X, and 2 is the infimum of X.
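The claims in this example can be checked by brute force. In this Python sketch (our own), R is the divisibility test. Since 1 and 2 do not belong to X itself, lower bounds are searched in the hypothetical ambient set {1, ..., 36}; the infimum is then the lower bound that every other lower bound divides.

```python
X = {4, 6, 12, 24, 36}


def divides(x, y):
    """R(x, y) := x is a divisor of y."""
    return y % x == 0


minimal = {x for x in X if not any(divides(y, x) for y in X if y != x)}
least = {x for x in X if all(divides(x, y) for y in X)}
connective = all(divides(x, y) or divides(y, x) for x in X for y in X)

# Lower bounds are searched in the hypothetical ambient set {1, ..., 36}
ambient = range(1, 37)
lower_bounds = {d for d in ambient if all(divides(d, y) for y in X)}
infimum = {d for d in lower_bounds if all(divides(e, d) for e in lower_bounds)}

print(minimal, least, lower_bounds, infimum, connective)  # {4, 6} set() {1, 2} {2} False
```

The infimum found here coincides with the greatest common divisor of the elements of X.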

The so-called least number principle says that any non-empty subset of the natural numbers has a least element. This principle can be shown (a proof can be found in [59, page 7]) to be equivalent to the principles of weak and strong induction, that will be introduced in section 3.4.

Example: The relation < on the naturals is an example of a total ordering on N. From the so-called least number principle we can conclude that N is also well-ordered by <. We prove the latter.

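Proof: We know that < is an ordering on N, so it remains to show that (∀A : A ⊆ N ∧ A ≠ ∅ : A has a least element). We first show by induction on n that every non-empty subset of {0, ..., n} has a least element. For n = 0 the only non-empty subset is {0}, whose least element is 0. For the step from n to n + 1, let A ⊆ {0, ..., n + 1} be non-empty. If A ∩ {0, ..., n} = ∅, then A = {n + 1}, and n + 1 is the least element of A. If A ∩ {0, ..., n} ≠ ∅, the induction hypothesis gives a least element m of A ∩ {0, ..., n}; the only other possible element of A is n + 1 > m, so m is also the least element of A. Finally, an arbitrary non-empty A ⊆ N contains some element n; then A ∩ {0, ..., n} is a non-empty subset of {0, ..., n}, and its least element is also the least element of A, because all remaining elements of A are greater than n.

3.3 Functions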

In mathematics, a function maps each element of an input set to exactly one element of an output set; in other words it is a special kind of relation that indicates, for each pair of elements of the input and output set, whether it belongs to the function or not. More precisely, f is a function or mapping from X to Y means that f assigns to each x ∈ X a uniquely determined y ∈ Y, notation f(x) = y. We can define this notion in set-theory by using a relation between X and Y such that for each x ∈ X there is a unique y ∈ Y such that ⟨x, y⟩ ∈ f.

Definition of function: f is a function from a set X to a set Y, notation f : X → Y := f ⊆ X × Y ∧ (∀x : x ∈ X : (∃!y : y ∈ Y : ⟨x, y⟩ ∈ f))

The definitions of domain and range as given in the subsection about relations can now also be used for functions. We now introduce a notation for the set of all functions f : X → Y .

Definition of Y^X: Y^X := {f ∈ P(X × Y) | f is a function from X to Y}

As we did before for relations and operations, we now define some general properties for functions.

Definition of injective: f : X → Y is injective or an injection := (∀x1, x2 : x1, x2 ∈ X : x1 ≠ x2 → f(x1) ≠ f(x2))

Definition of surjective: f : X → Y is surjective or a surjection := (∀y : y ∈ Y : (∃x : x ∈ X : y = f(x)))

Definition of bijective: f : X → Y is bijective or a bijection := f is surjective and f is injective

If f is bijective, f is also called a (one-to-one) correspondence between X and Y .
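To make these definitions concrete, here is a minimal Python sketch for finite sets. Following the set-theoretic definition, a function is represented as a set of pairs (x, y) ⊆ X × Y; the helper names are hypothetical and only serve as illustration:

```python
def is_function(f, X, Y):
    """f ⊆ X × Y and every x ∈ X has exactly one partner y."""
    return (all(x in X and y in Y for (x, y) in f)
            and all(sum(1 for (a, _) in f if a == x) == 1 for x in X))

def value_at(f, x):
    """f(x): the unique y with (x, y) ∈ f."""
    return next(y for (a, y) in f if a == x)

def is_injective(f, X):
    return all(value_at(f, x1) != value_at(f, x2)
               for x1 in X for x2 in X if x1 != x2)

def is_surjective(f, X, Y):
    return all(any(value_at(f, x) == y for x in X) for y in Y)

def is_bijective(f, X, Y):
    return is_injective(f, X) and is_surjective(f, X, Y)

X, Y = {0, 1, 2}, {"a", "b", "c"}
f = {(0, "a"), (1, "b"), (2, "c")}   # a one-to-one correspondence
g = {(0, "a"), (1, "a"), (2, "b")}   # 0 and 1 collide, "c" is never hit
print(is_function(f, X, Y), is_bijective(f, X, Y))                      # True True
print(is_function(g, X, Y), is_injective(g, X), is_surjective(g, X, Y))  # True False False
```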

Example: We have the following property: f : X → Y is surjective ↔ Ran(f) = Y.

Example: f : N → [−2π, 2π], with f(x) = sin(x), is a function (and hence also a relation). g : [−2π, 2π] → N, with g(x) = y iff x = sin(y), is a relation, but not a function: for example, there is no y ∈ N with sin(y) = 2.

We will now consider two special kinds of functions: the sequence and the identity function.

Definition of sequence: s is a sequence of X := s is a function from N to X (i.e. s ∈ X^N)

Definition of identity function: The identity function idX := idX : X → X and (∀x : x ∈ X : idX (x)=x)

We now introduce some operations on functions in set-theory. We can easily check that these definitions correspond to mathematical operations.

Definition of composition: The composition g◦f of two functions f : A → B and g : B → C := the function g ◦ f : A → C with g ◦ f(x)=g(f(x)), for all x ∈ A

Definition of inverse function: The inverse of a bijection f : X → Y := the function f −1 : Y → X with (∀y : y ∈ Y : f −1(y)=x ↔ y = f(x))

Definition of restricted function: The restriction of a function f : X → Y to X0, with X0 ⊆ X := the function f↾X0 : X0 → Y with (∀x : x ∈ X0 : f↾X0(x) = f(x))
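The three operations can be sketched in Python for finite functions. Here a dict stands for the set of pairs ⟨x, f(x)⟩; this representation and the function names are assumptions made for the illustration:

```python
def compose(g, f):
    """g ∘ f for finite functions f: A -> B and g: B -> C given as dicts."""
    return {x: g[f[x]] for x in f}

def inverse(f):
    """Inverse of an injective f, as a function from the range of f back to its domain."""
    inv = {y: x for x, y in f.items()}
    if len(inv) != len(f):
        raise ValueError("f is not injective, so it has no inverse")
    return inv

def restrict(f, X0):
    """Restriction of f to a subset X0 of its domain."""
    return {x: f[x] for x in X0}

f = {1: "a", 2: "b", 3: "c"}
g = {"a": True, "b": False, "c": True}
print(compose(g, f))            # {1: True, 2: False, 3: True}
print(inverse(f))               # {'a': 1, 'b': 2, 'c': 3}
print(compose(inverse(f), f))   # {1: 1, 2: 2, 3: 3}, i.e. the identity on {1, 2, 3}
```

Composing a bijection with its inverse yields the identity function, which gives a quick sanity check of the definitions.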

Just as in , we can now combine a set and relations on that set into a structure.

Definition of (relational) structure: ⟨X, R0, ..., Rp⟩ is a (relational) structure := X is a set and R0, ..., Rp are relations on X

The concept of a structure enables us to abstract from the exact set and relations, and reason about sets of structures instead. There also is a useful definition for equivalence of structures, called isomorphism.

Let R = ⟨X, R0, ..., Rp⟩ and S = ⟨Y, S0, ..., Sp⟩ be two structures, such that (∀i : 0 ≤ i ≤ p : the arity of Ri and Si is ni + 1).

Definition of isomorphism: f is an isomorphism between R and S := f is a bijection from X to Y and (∀i : 0 ≤ i ≤ p : (∀x0, ..., xni : x0, ..., xni ∈ X : Ri(x0, ..., xni) ↔ Si(f(x0), ..., f(xni))))

With the notion of isomorphism, we can now abstract over structures. When two structures are similar (the sets are of the same size and the relationships between the elements in one structure are retained between the images of those elements in the other structure), we call them isomorphic.

Definition of isomorphic: Two structures R and S are isomorphic, notation R ≅ S := there exists an isomorphism from R to S

Definition of automorphism: f is an automorphism of R := f is an isomorphism from R to R

Example: An isomorphism from the structure ⟨N, <⟩ to ⟨Neven, <⟩ is given by f : N → Neven, with f(n) = 2n. f is not an isomorphism from ⟨N, ⊕⟩ to ⟨N, <⟩, with a ⊕ b := b divides a (f is not even surjective onto N).
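For finite structures with one binary relation, the definition of isomorphism can be checked by brute force. The sketch below is hypothetical illustration code (relations as sets of pairs, f as a dict). Since ⟨N, <⟩ is infinite, it only checks f(n) = 2n on the initial segment {0, ..., 9}, which illustrates the example but does not prove it:

```python
from itertools import product

def is_isomorphism(f, X, R, Y, S):
    """Check that the dict f is an isomorphism from <X, R> to <Y, S>,
    where R and S are binary relations given as sets of pairs."""
    bijective = (set(f) == X and set(f.values()) == Y
                 and len(set(f.values())) == len(X))
    preserves = all(((x0, x1) in R) == ((f[x0], f[x1]) in S)
                    for x0, x1 in product(X, repeat=2))
    return bijective and preserves

def less_than(A):
    """The relation < on A, as a set of pairs."""
    return {(a, b) for a in A for b in A if a < b}

n = 10
X = set(range(n))
Y = {2 * k for k in range(n)}

f = {k: 2 * k for k in X}              # order preserving
h = {k: 2 * (n - 1 - k) for k in X}    # a bijection that reverses the order
print(is_isomorphism(f, X, less_than(X), Y, less_than(Y)))  # True
print(is_isomorphism(h, X, less_than(X), Y, less_than(Y)))  # False
```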

Example: The function g : R+ → R with g(x) = log(x) is an isomorphism between ⟨R+, ∗⟩ and ⟨R, +⟩, because g is a bijection (its inverse is the exponential function) and for all r1, r2 ∈ R+, log(r1 ∗ r2) = log(r1) + log(r2).
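The key property log(r1 ∗ r2) = log(r1) + log(r2) can be spot-checked numerically. This is a floating-point sanity check on random samples (using Python's natural logarithm `math.log`; any base would do), not a proof:

```python
import math
import random

random.seed(0)
for _ in range(1000):
    r1 = random.uniform(1e-3, 1e3)
    r2 = random.uniform(1e-3, 1e3)
    # g(r1 * r2) should equal g(r1) + g(r2), up to rounding errors.
    assert math.isclose(math.log(r1 * r2), math.log(r1) + math.log(r2),
                        rel_tol=1e-9, abs_tol=1e-12)

# exp undoes log, reflecting that log is a bijection from R+ onto R.
print(math.isclose(math.exp(math.log(5.0)), 5.0))  # True
```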

Example: An automorphism of any structure ⟨A, R0, ..., Rp⟩ is the identity function idA : A → A, where idA = {⟨a, a⟩ | a ∈ A}. Also, the function f(x) = 2x is an automorphism of ⟨R, <⟩.

3.4 Induction Methods

There is a tradition of opposition between adherents of induction and deduction. In my view it would be just as sensible for the two ends of a worm to quarrel.

- A. Whitehead, quoted in [76]

3.4.1 Induction

Induction is a method of reasoning from a part to a whole, from particulars to generals, or from the individual to the universal. It should not be confused with the mathematical principle of induction (treated in section 3.4.3). In ordinary induction we examine a certain number of cases and then generalize. Reasoning by analogy, where a conclusion is drawn from an analogous situation, is also a primitive form of induction (see [23, page 6]).

Example of induction:2

Coffee shop burger no. 1 was greasy.
Coffee shop burger no. 2 was greasy.
...
Coffee shop burger no. 100 was greasy.
Therefore, all coffee shop burgers are greasy (or: the next coffee shop burger will be greasy).

So in induction the conclusion contains information that was not contained in the premisses. This is the source of uncertainty in inductions: inductions are strengthened as confirming instances pile up, but they can never bring certainty (unless every possible case is actually examined, in which case they become deductions). As said in [49, page 366], the broad difference between deductive and inductive reasoning is that in deduction the conclusion asserts less than the premisses, whereas in induction it asserts more. In chapter 14, section 3 of [49] there is a more detailed treatment of inductive reasoning, including a distinction between determinative and conceptual induction. In both these kinds of induction, the conclusion goes beyond the premisses (or the evidence).

2 Example from: Peter Suber, Philosophy department, Earlham College.

3.4.2 Deduction

Mathematics, in its widest significance, is the development of all types of formal, necessary, deductive reasoning.

- A. Whitehead, quoted in [100]

In contrast to induction, deduction is a method of reasoning that is based on a rigorous proof: a derivation (using fixed rules called a system of logic) of one statement (the conclusion) from one or more statements (the premisses), i.e. a chain of statements, each of which is either a premiss or a consequence of a statement occurring earlier in the proof. In deductive reasoning, we are not directly concerned with the truth of the conclusion but rather with whether the conclusion does or does not follow from the premisses. If the conclusion follows from the premisses, we say that our reasoning is valid; if it does not, we say that our reasoning is invalid.

The Greeks found deductive reasoning, not empirical procedures, the method to establish mathematical truths. This usage is a generalization of what the Greeks called the syllogism (see [49, chapter 1, sections 5 and 6]), but a syllogism is now recognized as merely a special case of a deduction. Also, the traditional view that deduction proceeds from the general to the specific has been abandoned as incorrect by most logicians. Some experts regard all valid inferences as deductive in form and for this and other reasons reject the supposed contrast between deduction and induction. The German mathematician Hilbert greatly contributed to deductive reasoning, as we will see when we introduce his approach (also known as the axiomatic method) in chapter 6. Logic, in a mathematical context, can be seen as the theory of the formal structure of deductive reasoning. The logic of Hilbert's proof theory (see section 6.1) and of Russell's Principia Mathematica (see section 7.1) is a form of reasoning with deductive certainty, although others have proposed different formalizations of deductive logic (see [49, page 121]). Originally based on Aristotle's logic, the deductive method has become more subtle and complex and is now based on modern symbolic logic.

3.4.3 The principle of induction

Informal

The principle of induction, also known as mathematical induction, is an important process for proving theorems. It was even used by Peano to define the concept of natural numbers (see section 4.1, axiom 3). ‘Mathematical induction’ is unfortunately named, for it is unambiguously a form of deduction. The name was probably inspired by the fact that, just like induction, it generalizes to a whole set from a smaller sample. But, as we will see, mathematical induction concludes with deductive certainty.

The informal structure of the proof of a theorem by mathematical induc- tion is fairly simple:

1) Basis. Prove that the theorem holds for a specific case (which often is minimal for a given ordering of the elements). This case is also called the base case.

2) Induction step. Prove a rule that says that if the theorem holds for an arbitrary element, it is true for the next case. This often is a rule of heredity that tells us that the theorem is true for the immediate successor case of an arbitrary element if it is true for the arbitrary element itself. The claim that the theorem is true for an arbitrary element is called the induction hypothesis.

3) Conclusion. Together, 1 and 2 imply that the theorem holds for all cases starting with the base case. If you didn’t use the minimal case in step 1, then you have proven only that the theorem holds for that case and its successors, not for all possible cases.

The induction step can take two forms, which correspond to two forms of mathematical induction. Again we assume there is an ordering of the elements with +1 the immediate successor relation.

Weak: prove that if the theorem holds for an arbitrary element n, then it holds for the element n + 1.

Strong: prove that if the theorem holds for all elements up to some arbitrary element n, then it holds for the element n + 1.

We will now formally state the principle of induction. This is important, since many mistakes are made in applying the principle. It does not go without saying that if we are to use mathematical induction to prove that some theorem applies to ‘all possible cases’, then those cases must somehow be enumerable and in some way linked to the integers. And we have to be able to speak about the minimal case, the nth case, the successor of a given case, etc.

Formal

Suppose that we want to prove that a property ϕ(s) holds for all s ∈ S. The induction principle assumes that S is a well-founded set and that every element except the smallest has an immediate predecessor. This condition is also expressed by saying that S is inductive. The structure of an inductive set in fact resembles that of the naturals: if we have the axioms (see section 4.1) ‘0 is in N’ and ‘if x is in N then x + 1 is in N’, the set N is inductive. In case the set S is the naturals, we also refer to the principle as natural induction. The principle presupposes the following two conditions:

Al S is a set, well-founded by relation R (such that ‘+’ denotes the immediate successor of an element with respect to the relation R) and with smallest element e

Bl Every element except e has a (unique) immediate predecessor, and ϕ is a property of elements of S

If Al and Bl hold, we can use the induction principle.

Definition of the (weak) (mathematical) induction principle: if

Cl ϕ(e) (i.e. e has property ϕ)

Dl (∀s : s ∈ S : ϕ(s) → ϕ(s+)) (i.e. if s ∈ S has property ϕ, then the (unique) immediate successor of s also has property ϕ)

then the property ϕ holds for every element in S.

Step Cl is also called the base of a proof by induction, step Dl is also called the induction step, and ϕ(s) is called the induction hypothesis.

Proof: Suppose S is a well-founded set and every element except the smallest, denoted e, has an immediate predecessor, and suppose that a property ϕ is true for e, as well as for the immediate successor s+ ∈ S of every s ∈ S for which it is true. We now prove by contradiction that ϕ holds for all s ∈ S. Suppose that ϕ is not true for all s ∈ S. Let N be the set of elements of S for which ϕ is not true, i.e. N = {s ∈ S | ¬ϕ(s)}; by our assumption N is non-empty. By the theorem of page 26 we also know that if S is well-founded, any subset of S is also well-founded, thus N contains a smallest element n. If n = e, we have a contradiction, since ϕ(e) holds. If n > e, n has an immediate predecessor, denoted n−. Since n is the smallest element for which ϕ doesn't hold, ϕ must hold for n−. But then by Dl, ϕ must also hold for the immediate successor of n−, that is n: contradiction. Thus ϕ must be true for all s ∈ S.

As we mentioned before, this principle can be generalized in several ways. One way is to prove in step Cl that ϕ holds for a (possibly non-minimal) case b ∈ S. In step Dl we then show that (∀s : s ∈ S ∧ s ≥ b : ϕ(s) → ϕ(s+)). The conclusion then is that the property ϕ holds for all elements in S that are greater than or equal to b.

We now show (with a counterexample) why the additional property Bl, that every element except the smallest must have an immediate predecessor, is necessary for the induction principle. Consider the natural numbers with the ordering ≺ defined as follows:

• if n and m are both even, then n ≺ m if n < m

• if n and m are both odd, then n ≺ m if n < m

• if n is even and m is odd, we always define n ≺ m

We can check that N is well-founded by ≺, but not every element (for example 1) has an immediate predecessor. We take for ϕ the property of being even. The smallest element in the ordering is 0, which is even. Also, if s has property ϕ then so does the successor of s. That is because in our ordering, the successor of an even number is always the next even number, never an odd number, and if s has property ϕ, then s must be even.

Therefore, if only conditions Al, Cl and Dl were required, the induction principle would let us conclude that every natural number is even: contradiction!
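The counterexample can be explored in code. The Python sketch below encodes ≺ with the sort key (parity, value), so all even numbers precede all odd numbers; the function names are hypothetical. It shows that taking successors from 0 never leaves the even numbers, and that 1 has no immediate predecessor because for every even n, the number n + 2 lies strictly between n and 1:

```python
def precedes(n: int, m: int) -> bool:
    """n ≺ m: all even numbers first (in their usual order), then all odd numbers."""
    return (n % 2, n) < (m % 2, m)

def successor(s: int) -> int:
    """Immediate ≺-successor of s: the next number with the same parity."""
    return s + 2

def is_immediate_predecessor(n: int, m: int) -> bool:
    """n ≺ m with nothing strictly in between. If some element lies between
    n and m, one lies below max(n, m) + 3 (e.g. n + 2 for n even, m odd)."""
    if not precedes(n, m):
        return False
    return not any(precedes(n, k) and precedes(k, m)
                   for k in range(max(n, m) + 3))

# Starting from the base case 0, repeatedly taking successors only reaches evens.
reached = [0]
for _ in range(20):
    reached.append(successor(reached[-1]))
print(all(n % 2 == 0 for n in reached))                         # True

# No candidate is an immediate predecessor of 1, while 3 does have one (namely 1).
print(any(is_immediate_predecessor(n, 1) for n in range(100)))  # False
print(is_immediate_predecessor(1, 3))                           # True
```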

There is, however, a principle with weaker assumptions, called transfinite induction, which, suitably stated, applies to every well-ordered set. But first we consider a stronger principle that is based on the same assumptions (Al and Bl) as the weak induction principle.

Principle of strong (mathematical) induction: The same as for (weak) induction, but with Cl and Dl replaced by

El (∀x : x ∈ S : (∀y : y ∈ S : R(y, x) → ϕ(y)) → ϕ(x)) (i.e. for all x ∈ S we have ϕ(x) if all R-predecessors y of x have property ϕ)

Sometimes this is also informally stated, using the infamous three dots, as (∀s : s ∈ S : (ϕ(e) ∧ ϕ(e+) ∧ ... ∧ ϕ(s)) → ϕ(s+)).

Proof: Suppose ⟨X, R⟩ is a structure such that Al, Bl and El hold. Again we use proof by contradiction, and assume (∃x : x ∈ X : ¬ϕ(x)). Thus {x ∈ X | ¬ϕ(x)} is non-empty and has a smallest element e (since ⟨X, R⟩ is well-founded). We now have ¬ϕ(e) ∧ (∀z : z ∈ X : R(z, e) → ϕ(z)). According to El (substitute z for y, X for S, and take e for x) we then have ϕ(e): contradiction.

Note that the base case is not really left out, since it is implicitly present in the quantification: taking the smallest element e for x, e has no R-predecessors, so El requires ϕ(e) outright. This form of induction, when applied to ordinals (any set of ordinals is well-ordered, and hence well-founded; ordinals are introduced in section 3.8.2), is called transfinite induction.

Principle of transfinite induction3: The same as for strong induction, but instead of Al and Bl as assumptions, it can be applied to any set S that is well-ordered by a relation R, with smallest element e.

3 Sometimes this principle is called the Principle of Complete Induction, for example in [4], but this is less common.

Examples of such sets are sets of ordinals or of cardinals; the principle even applies to the class of all ordinals. A proof by transfinite induction typically needs to distinguish three cases:

1. s is a minimal element

2. s has an immediate predecessor (i.e. the set of elements which are smaller than s has a largest element). In this case we can apply normal induction.

3. s has no immediate predecessor (i.e. s is a so-called limit ordinal, see also section 3.8.2). The case for limit ordinals is typically approached by noting that a limit ordinal b is (by definition) the union of all ordinals a < b.

Proof: The proof of the principle of transfinite induction is similar to the proof of the strong induction principle.

Clearly, all three given principles are equivalent, since we proved them to be true. These proofs, however, are based on an underlying set of axioms (the so-called ZF axioms and the Peano axioms, which will be introduced in section 5.3 and chapter 4 respectively)4. Independently of these proofs, we can also establish the equivalence of the principles by showing that they imply each other. As an example, we now prove that (mathematical) induction is a special case of transfinite induction, for the set of natural numbers. To prove this it suffices to show that (Cl and Dl) ↔ El.

4 Mathematical induction on the natural numbers does not require an extra axiom on top of Zermelo-Fraenkel set theory: in ZF, the axiom of infinity provides the natural numbers as the smallest inductive set, and the induction principle follows from this. It should not be confused with the ‘Well-Ordering principle’ (also known as the well-ordering theorem, see section 3.8.2), which states that every set can be well-ordered. The three statements known as ‘Axiom of Choice’, ‘Zorn’s Lemma’ and ‘Well-Ordering principle’ are equivalent, meaning that if you assume one of them to be true, the others follow as consequences, but none of them can be proven from the other axioms of ZF set theory alone. There are also other equivalent statements that are sometimes used (such as Zermelo’s postulate), and it is a nice exercise to prove the equivalence of these statements.

Normal induction (IND):

(∀ϕ :: ϕ(0) ∧ (∀k : k ∈ N : ϕ(k) → ϕ(k +1))→ (∀n : n ∈ N : ϕ(n)))

Transfinite induction (TFIND):

(∀ψ :: (∀q : q ∈ N : (∀p : p ∈ N : p < q → ψ(p)) → ψ(q)) → (∀n : n ∈ N : ψ(n)))

We can prove the equivalence of IND and TFIND in two ways: in a con- structive way or with a proof by contradiction. We give both proofs.

Proof by Contradiction: (from: [17])

It suffices to prove that IND’ ≡ TFIND’, with

IND’ ≡ (∀ϕ :: ϕ(0) ∧ (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)))

TFIND’ ≡ (∀ψ :: (∀q : q ∈ N : (∀p : p ∈ N : p < q → ψ(p)) → ψ(q)))

Proof of TFIND’ → IND’: Assume ϕ is a property. We assume TFIND’, and instantiate ψ with the property ϕ. We now want to prove IND’. If we take q =0, (∀p : p ∈ N : p<0 → ϕ(p)) is trivially true. Thus we have ϕ(0). We now prove by contradiction that (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)). Assume k ∈ N,ϕ(k) ∧¬ϕ(k + 1). That means the condition of TFIND’ (∀p : p ∈ N : p

Proof of IND’ → TFIND’: Assume IND’, instantiate ϕ with ψ. For all properties ψ we have to prove (∀q : q ∈ N :(∀p : p ∈ N : p0. Suppose we have (∀q : q ∈ N :(∀p : p ∈ N : p< q → ψ(p)). By IND’ we also know that (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)), and thus ϕ(q) also holds for all q>0. Hereby we have proved TFIND’.

Constructive Proof:

Proof of TFIND → IND: Assume TFIND, and let ϕ be a property. We now need to prove that ϕ(0) ∧ (∀k : k ∈ N : ϕ(k) → ϕ(k +1)) → (∀n : n ∈ N : ϕ(n)). Assume ϕ(0) ∧ (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)). We want to use TFIND to conclude (∀n : n ∈ N : ϕ(n)). TFIND gives us: (∀k : k ∈ N :(∀l : l ∈ N : l0, and (∀l : l ∈ N : l0.

Proof of IND → TFIND: Assume ψ is a property. Also assume that (i): (∀k : k ∈ N :(∀l : l ∈ N : l

Structural Induction

In many cases we do not want to prove properties about the integers or similar well-ordered sets. In such cases straightforward induction is not always useful. However, forms of induction can also be appropriate when trying to prove properties about structures defined recursively. This generalized induction principle is known as structural induction. It is useful when objects are built up from more primitive objects: if we can show the primitive objects have the desired property, and that the act of building preserves that property, then we have shown that all objects must have the property. The inductive hypothesis (i.e., the assumption) is that something is true for ‘simpler’ forms of an object, and we then prove that it holds for ‘more complex’ forms. ‘Complexity’ can be defined in several ways: the most common way is to say that one object is more complex than another if it includes that other object as a subpart, but this need not always be the case.

A general treatment of recursively defined structures (formal definition of structural induction over recursive datatypes) will be presented in a later version of this report.

Example: We show that mathematical induction is an instance of the general notion of structural induction over values of recursively defined types, in a later version of this report.

Example: As an example of the use of mathematical induction we prove the binomial theorem. The binomial theorem states that for all x, y ∈ R and n ∈ N we have

$$\mathrm{EQ} \equiv (x+y)^n = \sum_{j=0}^{n} \binom{n}{j} x^{n-j} y^j$$

We call the left-hand side of this equality LHS, and the right-hand side RHS, and abbreviate the equality by EQ. We assume two real numbers x and y and prove EQ by induction on n.

Basis case: For n = 0, EQ clearly holds, since both sides are 1. For some reason, most textbooks take n = 1 as the basis, in which case LHS is simply x + y, and RHS is

$$\binom{1}{0} x^{1-0} y^0 + \binom{1}{1} x^{1-1} y^1 = x + y$$

Induction case: We assume EQ is true for n = k and have to show that it is then also true for n = k + 1:

$$(x+y)^{k+1} = \sum_{j=0}^{k+1} \binom{k+1}{j} x^{k+1-j} y^j$$

First, we rewrite the left side of this equation, using the induction hypothesis in the third step:

$$\mathrm{LHS} = (x+y)^{k+1} = (x+y)^k (x+y) = \left(\sum_{j=0}^{k} \binom{k}{j} x^{k-j} y^j\right)(x+y) = \sum_{j=0}^{k} \binom{k}{j} x^{k-j+1} y^j + \sum_{j=0}^{k} \binom{k}{j} x^{k-j} y^{j+1}$$

In the right side of the equation, we use Pascal's identity:

$$(\forall k, n : k, n \in \mathbb{N} \wedge 0 < k \le n : \binom{n+1}{k} = \binom{n}{k} + \binom{n}{k-1})$$

Applying it to the terms with 1 ≤ j ≤ k, and using $\binom{k+1}{0} = \binom{k}{0}$ and $\binom{k+1}{k+1} = \binom{k}{k}$ for the two outermost terms, we get

$$\mathrm{RHS} = \sum_{j=0}^{k} \binom{k}{j} x^{k+1-j} y^j + \sum_{j=1}^{k+1} \binom{k}{j-1} x^{k+1-j} y^j$$

and

$$\mathrm{LHS} = \sum_{j=0}^{k} \binom{k}{j} x^{k-j+1} y^j + \sum_{j=0}^{k} \binom{k}{j} x^{k-j} y^{j+1}$$

The first sums of LHS and RHS are the same, and we can see that the second sums are also equal, by doing a dummy transformation (let i = j − 1):

$$\sum_{j=1}^{k+1} \binom{k}{j-1} x^{k+1-j} y^j = \sum_{i=0}^{k} \binom{k}{i} x^{k-i} y^{i+1}$$

So LHS = RHS, and we can conclude that EQ holds for all x, y ∈ R and n ∈ N.
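As a sanity check (not a proof), EQ and Pascal's identity can be tested on small cases. This Python sketch uses integers instead of reals so that the arithmetic is exact, and relies on `math.comb` for the binomial coefficients:

```python
from math import comb

def binomial_rhs(x, y, n):
    """RHS of EQ: the sum over j = 0..n of C(n, j) * x^(n-j) * y^j."""
    return sum(comb(n, j) * x ** (n - j) * y ** j for j in range(n + 1))

# EQ on a small grid of integer values (Python evaluates 0 ** 0 as 1).
for n in range(8):
    for x in range(-3, 4):
        for y in range(-3, 4):
            assert (x + y) ** n == binomial_rhs(x, y, n)

# Pascal's identity, as used in the induction step.
for n in range(10):
    for k in range(1, n + 1):
        assert comb(n + 1, k) == comb(n, k) + comb(n, k - 1)

print("EQ and Pascal's identity hold on all tested cases")
```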

Example: We give an example of a proof about binary trees using structural induction. First we define a data structure for binary trees. For this example we will use a definition in the notation of the language Z to describe recursive data structures. The structure of a binary tree is well known: a tree is either a leaf or made up of two subtrees glued together by a node.

TREE ::= leaf | node < TREE × TREE >

An example of such a tree is node(leaf, node(node(leaf, leaf), leaf)). We now define the size of a tree by counting both the leaves and the nodes. The basic idea of the definition is that we define the size of a tree inductively over the structure, saying how the size of a given tree is calculated from the sizes of its parts. Again we define the size in the language Z, by first declaring its type and then saying how it is defined in each of the two cases:

size : TREE → N
∀ t1, t2 : TREE •
    size(leaf) = 1 ∧
    size(node(t1, t2)) = 1 + size(t1) + size(t2)

Similarly, we make two new definitions about trees:

leaves : TREE → N
nodes : TREE → N
∀ t1, t2 : TREE •
    leaves(leaf) = 1 ∧
    leaves(node(t1, t2)) = leaves(t1) + leaves(t2) ∧
    nodes(leaf) = 0 ∧
    nodes(node(t1, t2)) = 1 + nodes(t1) + nodes(t2)
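The Z definitions translate almost one-to-one into recursive functions. The following Python sketch is an illustrative encoding (the classes `Leaf` and `Node` are our own choice, not part of the Z specification); it evaluates size, leaves and nodes on the example tree and checks the theorem below for that tree:

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Leaf:
    pass

@dataclass(frozen=True)
class Node:
    left: "Tree"
    right: "Tree"

Tree = Union[Leaf, Node]

def size(t: Tree) -> int:
    return 1 if isinstance(t, Leaf) else 1 + size(t.left) + size(t.right)

def leaves(t: Tree) -> int:
    return 1 if isinstance(t, Leaf) else leaves(t.left) + leaves(t.right)

def nodes(t: Tree) -> int:
    return 0 if isinstance(t, Leaf) else 1 + nodes(t.left) + nodes(t.right)

leaf = Leaf()
t = Node(leaf, Node(Node(leaf, leaf), leaf))  # node(leaf, node(node(leaf, leaf), leaf))
print(size(t), leaves(t), nodes(t))  # 7 4 3
assert size(t) == leaves(t) + nodes(t)
```

Each function has one case per constructor of TREE, which is exactly the shape a proof by structural induction follows.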

We now want to prove the following theorem by structural induction on the size of the tree t.

Theorem: For all trees t, size(t) = leaves(t) + nodes(t).

Proof: Let t, t′, t1 and t2 be of type TREE. We prove the theorem by induction on the size of t.

Base case: Assume t = leaf. Then size(t) = size(leaf) = 1. Also, leaves(t) + nodes(t) = leaves(t) + 0 = 1 + 0 = 1.

Induction case: Assume t = node(t1, t2). The induction hypothesis says that the theorem holds for all t′ with size(t′) < size(t), in particular for t1 and t2. Then size(t) = size(node(t1, t2)) = 1 + size(t1) + size(t2) = (apply induction hypothesis to t1 and t2) 1 + (leaves(t1) + nodes(t1)) + (leaves(t2) + nodes(t2)). And leaves(t) + nodes(t) = leaves(node(t1, t2)) + nodes(node(t1, t2)) = (leaves(t1) + leaves(t2)) + (1 + nodes(t1) + nodes(t2)) = (commutativity and associativity of +) 1 + (leaves(t1) + nodes(t1)) + (leaves(t2) + nodes(t2)). Both expressions are equal, so size(t) = leaves(t) + nodes(t), which completes the proof.

3.5 Real numbers

What do we mean when we say ‘continuum’? Here is a description Albert Einstein gave on page 83 of [21]:

The surface of a marble table is spread out in front of me. I can get from any point on this table to any other point by passing continuously from one point to a ‘neighboring’ one, and repeating this process a (large) number of times, or, in other words, by going from point to point without executing ‘jumps’. I am sure the reader will appreciate with sufficient clearness what I mean here by ‘neighboring’ and by ‘jumps’ (if he is not too pedantic). We express this property of the surface by describing the latter as a continuum.

People have been using the concept of real numbers for a long time (the Babylonians, for example, already calculated with roots many centuries B.C., see [12]). In order for set theory to cover the fundamental structures of analysis, a precise and formal basis for the real numbers was needed. Even simple equations have no solutions if all we know are the rational numbers (for example, there is no rational x such that x² = x ∗ x = 2).

When Cantor developed his set theory, it was well known that each type of number could be constructed as the limit of a sequence of numbers of another type. But it became clear that, especially in connection with theorems asserting the existence of some limit relations (see [30, page 182]), the proof might require irrational numbers to be defined in terms of rational ones, in order to avoid of existence involved in the theorem. Cauchy and Heine tried to define the irrational or real numbers in the second half of the 19th century. In 1872 Cantor and Dedekind followed with their precise definition of the real numbers. We first present the three methods (of Dedekind, Cantor and Cauchy) of defining the reals in terms of rationals and then show that they are identifiable.

3.5.1 Dedekind’s cuts

As professor in the Polytechnic School in Zürich I found myself for the first time obliged to lecture upon the elements of the differential calculus and felt more keenly than ever before the lack of a really scientific foundation for arithmetic.

- R. Dedekind, in the opening of the paper in which Dedekind’s cuts were introduced.

Dedekind defined a cut to determine a real number. A cut is a partition of an ordered set (such as the rational numbers) into two disjoint non-empty parts, all the members of one of which are less than all the members of the other. Dedekind used the point at which the set is partitioned5 to define a real number.

Definition of a (Dedekind) cut: Given an ordering < on a set V, a subset C ⊆ V is a cut in V :=

1) C ≠ ∅ ∧ C ≠ V

2) (∀a, b : a, b ∈ V : a ∈ C ∧ b < a → b ∈ C)

3) C does not have a greatest element

Example: {x ∈ Q | x < 0 ∨ x² < 2} is a cut in Q. Notice that we can also describe the same cut as {x ∈ Q | x < 0 ∨ x⁴ < 4}.

Each real number r can now be defined by a cut C in Q if r is the supremum of C. Each cut then determines a unique real number (see paragraph 3.5.4). We want to identify cuts that define the same real number, such as, for example, the cuts described by {x ∈ Q | x < 0 ∨ x² < 2} and {x ∈ Q | x < 0 ∨ x⁴ < 4}.

Definition of (Dedekind) cut equivalence: A cut C1 is equivalent to a cut C2, notation C1 ∼ C2 := C1 and C2 have the same supremum r

We can now define R_Dedekind as the set of all equivalence classes of all cuts in Q: R_Dedekind := {C ⊆ Q | C is a cut in Q}/∼.

5 Actually, Dedekind’s original definition did not use a partition but a slightly more complex division. For details see the link ‘Dedekind cuts’ at http://zax.mine.nu/stage.

Example: {x ∈ Q | x² < 2} has √2 as supremum. We can identify the real number √2 with the equivalence class of all cuts that have √2 as supremum.
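The cut for √2 can be handled with exact rational arithmetic. The Python sketch below assumes the cut is written as C = {x ∈ Q | x < 0 ∨ x² < 2}, so that it is closed downwards. It gives a membership test and, for condition 3), an explicit larger element: for a ≥ 0 in C, the rational b = (2a + 2)/(a + 2) satisfies b − a = (2 − a²)/(a + 2) > 0 and b² − 2 = 2(a² − 2)/(a + 2)² < 0:

```python
from fractions import Fraction

def in_sqrt2_cut(x: Fraction) -> bool:
    """Membership test for the cut C = {x ∈ Q | x < 0 ∨ x² < 2}."""
    return x < 0 or x * x < 2

def larger_element(a: Fraction) -> Fraction:
    """Given a ∈ C, return some b ∈ C with b > a, so C has no greatest element.

    For a ≥ 0 take b = (2a + 2) / (a + 2). Then
      b - a  = (2 - a²) / (a + 2)    > 0, and
      b² - 2 = 2(a² - 2) / (a + 2)²  < 0.
    """
    if a < 0:
        return Fraction(0)
    return (2 * a + 2) / (a + 2)

a = Fraction(7, 5)  # 1.4, an element of C
for _ in range(4):
    b = larger_element(a)
    assert in_sqrt2_cut(b) and b > a
    a = b
print(a, float(a))  # a rational in C, closer to √2 than 7/5
```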

3.5.2 Cantor’s chains of segments

In mathematics the art of proposing a question must be held of higher value than solving it.

- A thesis defended in Cantor’s doctoral examination.

Cantor defined a chain of segments to determine a real number (see also [17, chapter 12]). This is a sequence of nested, shrinking intervals in Q, the limit of which determines a unique real number.

Definition of chain of segments: ⟨⟨an, bn⟩⟩n∈N is a chain of segments (in V) :=

1) (∀n : n ∈ N : an ∈ V ∧ bn ∈ V)

2) (∀n : n ∈ N : an ≤ an+1 ≤ bn+1 ≤ bn)

3) (∀n : n ∈ N : bn − an ≤ 2^(−n))

Example: Consider the following chain of segments in Q: ⟨⟨1, 2⟩, ⟨1.4, 1.5⟩, ⟨1.41, 1.42⟩, ⟨1.414, 1.415⟩, ...⟩. Each segment ‘includes’ √2.
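One way to generate this chain (an assumption about how the ‘...’ continues) is to truncate √2 to n decimal digits: an = ⌊√2 · 10^n⌋ / 10^n and bn = an + 10^(−n). The Python sketch below computes the segments exactly with `math.isqrt` and `Fraction`, and checks conditions 2) and 3) of the definition for the first twelve segments:

```python
from fractions import Fraction
from math import isqrt
from typing import Tuple

def sqrt2_segment(n: int) -> Tuple[Fraction, Fraction]:
    """n-th segment <a_n, b_n>: a_n = floor(√2 · 10^n) / 10^n, b_n = a_n + 10^-n."""
    scale = 10 ** n
    a = Fraction(isqrt(2 * scale * scale), scale)  # isqrt(2·10^2n) = floor(√2·10^n)
    return a, a + Fraction(1, scale)

chain = [sqrt2_segment(n) for n in range(12)]
for n, (a, b) in enumerate(chain):
    assert a * a <= 2 <= b * b            # √2 lies in [a_n, b_n]
    assert b - a <= Fraction(1, 2 ** n)   # condition 3): b_n - a_n <= 2^(-n)
for (a0, b0), (a1, b1) in zip(chain, chain[1:]):
    assert a0 <= a1 <= b1 <= b0           # condition 2): the segments are nested
print([(float(a), float(b)) for a, b in chain[:4]])
```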

Note that ⟨⟨an, bn⟩⟩n∈N (notation ⟨⟨an, bn⟩⟩ when it is clear which set V is meant) is actually a sequence, and in 3) a bound is put on the speed of convergence. We now want to be able to say when two chains are equivalent.

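Definition of chain equivalence: The chains of segments ⟨⟨an, bn⟩⟩ and ⟨⟨cn, dn⟩⟩ are equivalent, notation ⟨⟨an, bn⟩⟩ ∼ ⟨⟨cn, dn⟩⟩ := (∀k : k ∈ N : bk ≥ ck ∧ dk ≥ ak) (i.e. for every k, the k-th segments of both chains overlap)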

Theorem: ∼ is an equivalence relation on the set of all chains of segments of Q

Each equivalence class of chains of segments in Q now uniquely determines a real number r. To be precise, r is determined by ⟨⟨an, bn⟩⟩/∼ if (∀n : n ∈ N : an ≤ r ≤ bn).

We can now define R_Cantor as the set of all equivalence classes of chains of segments in Q: R_Cantor := {⟨⟨an, bn⟩⟩n∈N | ⟨⟨an, bn⟩⟩n∈N is a chain of segments in Q}/∼

3.5.3 Cauchy-sequences

Men pass away, but their deeds abide.

- Augustin-Louis Cauchy, his last words, quoted in [22].

Cauchy defined a Cauchy sequence to determine a real number. His sequence of numbers defines a real number by letting its terms come arbitrarily close to that real number.

Definition of Cauchy sequence: With ≤ a partial order on a set V6, {an}n∈N is a Cauchy sequence in V :=

1) (∀n : n ∈ N : an ∈ V)

2) (∀k : k ∈ N : (∃p : p ∈ N : (∀n, m : n, m ∈ N : n, m > p → |an − am| ≤ 2^(−k))))

Example: The informally defined sequences (using ‘...’ to indicate an infinite continuation) {1, 1.4, 1.414, 1.4142, 1.41421, 1.414213, ...} and {1, 1.414, 1.4121, ...} are both Cauchy sequences. In the first sequence, each an+1 lies at least as close to √2 as an.
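Condition 2) can be checked on finitely many terms of the first sequence, assuming it consists of the decimal truncations an = ⌊√2 · 10^n⌋ / 10^n. Since |an − am| ≤ 10^(−min(n, m)) ≤ 2^(−min(n, m)), the choice p = k suffices. The Python sketch below checks this for k < 10 on a finite window of indices, which illustrates the definition but is of course no proof:

```python
from fractions import Fraction
from math import isqrt

def a(n: int) -> Fraction:
    """a_n: √2 truncated to n decimal digits, as an exact rational."""
    scale = 10 ** n
    return Fraction(isqrt(2 * scale * scale), scale)

for k in range(10):
    p = k  # candidate p for this k
    for n in range(p + 1, p + 15):
        for m in range(p + 1, p + 15):
            assert abs(a(n) - a(m)) <= Fraction(1, 2 ** k)

print([str(a(n)) for n in range(4)])
```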

We also denote a Cauchy sequence {an}n∈N simply by an. We now want to be able to say when two Cauchy sequences are equivalent.

6 V is in general an ordered, commutative field. We will not further discuss this here, and for the rest of this paragraph take V = Q.

Definition of Cauchy sequence equivalence: The sequences an and bn are equivalent, notation an ∼ bn := limn→∞(an) = limn→∞(bn)

Note that in the definition of equivalence the hitherto undefined notion of a limit is used. With the following definition we can formalize the notion of a limit.

Definition of sequence convergence: A sequence {an}n∈N of elements of a set V is said to converge to a sequence {bn}n∈N, notation limn→∞(an) = limn→∞(bn) := (∀k : k ∈ N : (∃p, q : p, q ∈ N : (∀n, m : n, m ∈ N ∧ n > p ∧ m > q : |an − bm| < 2^(−k))))

Note: convergence is usually defined in terms of real numbers, but we cannot use such a definition here because we have yet to define the reals. The number r is then called the limit of the sequence an, notation limn→∞(an) = r, if (∀k : k ∈ N : (∃p : p ∈ N : (∀n : n ∈ N ∧ n > p : |an − r| < 2^(−k)))). A sequence is said to diverge if it does not converge.

Theorem: Any convergent sequence {an}n∈N is bounded and has a unique limit.

Proof: First we prove the uniqueness. Suppose the sequence has two limits, c and c′. Take any k ∈ N. Then from the definition of convergence there is a p such that |an − c| < 2^(−k) if n > p. Also, there is an integer p′ such that |an − c′| < 2^(−k) if n > p′. Adding the two inequalities we get, using the triangle inequality (∀a, b :: |a + b| ≤ |a| + |b|), for n > p ∧ n > p′: |c − c′| = |(an − c′) + (c − an)| ≤ |an − c′| + |an − c| < 2 ∗ 2^(−k). Hence |c − c′| < 2 ∗ 2^(−k) for all k ∈ N. This means c = c′, thus the limit is indeed unique. Now we prove boundedness. The sequence converges to c, so we can take, for example, k = 1. Then there is a p such that |aj − c| < 2^(−1) for j > p. We then have, again using the triangle inequality, that |aj| ≤ |aj − c| + |c| < 2^(−1) + |c| for j > p. Then the sequence is bounded by M = max{|a0|, |a1|, ..., |ap|, 1 + |c|}.

Each real number can now be defined by an equivalence class of Cauchy sequences: r is determined by the equivalence class of an if r = limn→∞(bn) for each sequence bn in that equivalence class.

We can now define R_Cauchy as the set of all equivalence classes of Cauchy sequences in Q: R_Cauchy := {{an}n∈N | {an}n∈N is a Cauchy sequence in Q}/∼

3.5.4 Properties of the three definitions

Before these definitions for real numbers were given, we intuitively thought of the reals as infinite sequences of (decimal) digits. In the rest of this section we assume that by R we mean this set of reals, i.e. all infinite sequences of decimal digits. We can now check whether the three new definitions indeed are correct ways to identify real numbers:

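1) Q is a chain of segments $[a_n, b_n]$ → $(\exists! c : c \in \mathbb{R} : (\forall n : n \in \mathbb{N} : a_n \le c \le b_n))$

2) C is a cut in $\mathbb{Q}$ → $(\exists! c : c \in \mathbb{R} : c = \sup(C))$

3) $\{a_n\}_{n\in\mathbb{N}}$ is a Cauchy sequence → $(\exists! c : c \in \mathbb{R} : \lim_{n\to\infty} a_n = c)$

Then we can check for every newly defined set X of reals that: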

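a) it contains a countable subset D that is dense in X, i.e. $(\forall r_1, r_2 : r_1, r_2 \in X \wedge r_1 < r_2 : (\exists q : q \in D : r_1 < q < r_2))$;

b) every non-empty subset of X that is bounded above has a supremum in X.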

Every linearly ordered set without endpoints for which a) and b) hold is order-isomorphic to R. So if a definition satisfies a) and b), it possesses the properties we intuitively want the real numbers to have: it can be proven that the reals so defined are totally ordered, that they are densely ordered, and that the ordering is continuous.

3.6 Infinite sets

Our minds are finite, and yet even in these circumstances of finitude we are surrounded by possibilities that are infinite, and the purpose of life is to grasp as much as we can out of that infinitude.

- A.N. Whitehead in [76]

The size of a finite set V, notation |V|, can be defined as the number of elements it has. For infinite sets, however, counting the elements never ends. Cantor was concerned with the problem of measuring the sizes of infinite sets (he was led to it by questions about singularities, see [30, chapter 4]) and proposed an elegant solution. He observed that two finite sets have the same size if and only if the elements of one set can be paired with the elements of the other; this method compares sets without resorting to counting and can be extended to infinite sets.

This leads to the concept of an equivalence relation between sets (two equivalent sets are also said to be 'of the same power', 'equipotent' or 'equipollent', see [30, page 229]).

Definition of set equivalence: A set V is equivalent to a set W, notation V ∼ W := there is a bijection f : V → W

It is simple to check that ∼ has the properties of an equivalence relation, i.e. it is reflexive, symmetric and transitive. But if we consider ∼ to be a true relation, we need the concept of 𝒱, the set of all sets: ∼ ⊆ 𝒱 × 𝒱. But the existence of 𝒱 is paradoxical, see section 3.8.

This new method of comparing the numbers of elements of sets is reflected in the notion of the cardinality of a set, and led to the surprising result that there are many levels of infinity. Before we present a proof of this result, using Cantor's famous diagonalization method, we first introduce some more definitions.

Postulate for Cardinal numbers: With every set V is associated a well-defined abstract entity $\overline{V}$, called the cardinal number of V, such that $V \sim W \leftrightarrow \overline{V} = \overline{W}$. We can think of $\overline{V}$ as denoting the property, common to all sets in the equivalence class of V, of being equivalent (as defined above) to V.

It proved difficult, however, to arrive at an exact definition of cardinality from this postulate. Cantor regarded cardinals as special abstract entities of a new kind. In 1884 the German mathematician Frege came up with his own definition of cardinal numbers. He discussed it with the mathematician Russell, and they proposed the idea of defining $\overline{V}$ as $V/{\sim}$, the equivalence class of V modulo ∼. The postulate for cardinal numbers then follows at once. Frege also denoted finite cardinal numbers as natural numbers: ∅ = 0, {∅} = 1, {∅, {∅}} = 2, .... This Frege-Russell definition would become standard until, as we will see in section 3.8, it became known that this definition could also lead to a paradox.

Cantor used the Hebrew letter aleph to name the different levels of infinity. The cardinality of the set of natural numbers is by definition called aleph-null or aleph-nought, notation ℵ0. The 'next levels' of infinity are called ℵ1, ℵ2, .... Since it was unknown at which level the cardinality of the set of reals lies, Cantor denoted it by c. If we assume the continuum hypothesis (see section 3.7), which says that there is no level of infinity between the cardinalities of N and R, the cardinality of the set of reals can also be denoted by aleph-one, notation ℵ1.

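Property of cardinality: Given the cardinality $\overline{V}$ of a set V, we have

• If V is finite: $\overline{V}$ = the number of elements of V

• If V is infinite: $\overline{V} = \aleph_i$ for some level of infinity i; in particular $\overline{V} = \aleph_0$ when there exists a bijection between V and the set N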

Sometimes the cardinality of a set V is also denoted by |V|, after the size of a set V. A more rigorous treatment of cardinal numbers will be given in section 3.8.1. This new concept enabled Cantor to define further concepts for the analysis of infinite sets. It also inspired others to analyze the properties of infinite sets.

No other question has ever moved so profoundly the spirit of man, no other idea has so fruitfully stimulated his intellect; yet no other concept stands in greater need of clarification than that of the infinite.

- D. Hilbert, quoted in [96]

In the rest of this section we will present some of the results of the research on infinite sets.

Definition of finite: A set V is finite := (∃n : n ∈ N : V ∼ {x ∈ N | x < n})

Definition of infinite: A set V is infinite := V is not finite

Definition of Dedekind infinite: A set V is Dedekind infinite := (∃W : W ⊂ V : V ∼ W), where ⊂ denotes a proper subset

Theorem: V is Dedekind infinite ↔ V is infinite (from [17])
Proof: Both directions go via the statement N ≤1 V, i.e. that there is an injection from N into V. We prove the two implications of the theorem separately:

V is Dedekind infinite → V is infinite: V is Dedekind infinite, i.e. there exists a W ⊂ V such that V ∼ W, i.e. there exists a bijection f : V → W. Because W is a proper subset of V, there exists an a ∈ V such that a ∉ W. Consider the function g : N → V, defined recursively by g(0) = a and g(k + 1) = f(g(k)). We now have to show that g is an injection, i.e. for all i, j ∈ N: i ≠ j → g(i) ≠ g(j). We use induction on i:

i = 0: if 0 ≠ j then g(0) = a ∉ W and g(j) = f(g(j − 1)) ∈ W, so g(0) ≠ g(j).

i = k + 1: assume k + 1 ≠ j; then we can prove g(k + 1) ≠ g(j) by induction on j:

j = 0: g(0) = a ∉ W and g(k + 1) ∈ W, so g(k + 1) ≠ g(0). j = l + 1: we know k + 1 ≠ j = l + 1, so k ≠ l. By the induction hypothesis g(k) ≠ g(l). Since f is a bijection we also have that f(g(k)) ≠ f(g(l)), i.e. g(k + 1) ≠ g(l + 1), or g(i) ≠ g(j).

So g is an injection from N into V, i.e. N ≤1 V. Since no finite set admits an injection from N (it would have to map more than n distinct elements into n elements), V is infinite.
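The recursive function g in this direction of the proof can be computed for a concrete Dedekind infinite set. In the following Python sketch we choose, purely as an illustration, V = N, the proper subset W of even numbers, the bijection f(n) = 2n and the element a = 1 ∉ W; these choices and the function names are hypothetical. The list of values shows the injective orbit a, f(a), f(f(a)), ... that the induction argument guarantees.

```python
def f(n: int) -> int:
    """A bijection from V = N onto its proper subset W = {0, 2, 4, ...}."""
    return 2 * n

def in_w(x: int) -> bool:
    """Membership test for W, the even natural numbers."""
    return x % 2 == 0

a = 1               # an element of V that is not in W
assert not in_w(a)

def g(k: int) -> int:
    """g(0) = a and g(k + 1) = f(g(k)), unfolded as a loop."""
    value = a
    for _ in range(k):
        value = f(value)
    return value

values = [g(k) for k in range(20)]
print(values[:6])                        # [1, 2, 4, 8, 16, 32]
print(len(set(values)) == len(values))   # True: no value repeats
```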

V is Dedekind infinite ← V is infinite: since V is infinite, N ≤1 V (this uses the axiom of choice, see the theorem V is infinite → N ≤1 V later in this section), so there exists an injection f : N → V. We show that W := V − {f(0)}, clearly a proper subset of V (W ⊂ V), is equivalent to V (W ∼ V). The following function g is a bijection from V to W: g(f(i)) = f(i + 1) for all i ∈ N, and g(x) = x if x ≠ f(i) for all i ∈ N.
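The bijection g shifts the copy of N inside V one step along and leaves everything else fixed, which frees the position f(0). The Python sketch below assumes, for illustration only, V = Z (the integers) with the injection f(i) = i; the names `f_inverse` and `window` are hypothetical. Since Z is infinite, the checks are restricted to a finite window.

```python
def f(i: int) -> int:
    """An injection from N into V = Z; here simply the inclusion i -> i."""
    return i

def f_inverse(x: int):
    """Return i with f(i) = x, or None if x is not in the range of f."""
    return x if x >= 0 else None

def g(x: int) -> int:
    """g(f(i)) = f(i + 1), and g(x) = x for x outside the range of f."""
    i = f_inverse(x)
    return f(i + 1) if i is not None else x

window = range(-50, 50)
images = [g(x) for x in window]
print(f(0) in images)                    # False: g maps into W = V - {f(0)}
print(len(set(images)) == len(images))   # True: g is injective on the window
```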

Definition of countable: A set V is countable, also called denumerable := V is finite or V ∼ N

Definition of uncountable: A set V is uncountable := V is not countable

Definition of denumeration: A denumeration of a set V is a bijection f : N → V

Cantor then proved that N, Z and Q all have the same cardinality and also called these sets countably infinite.

Theorem: Q is countable
Proof: We give a bijection from N to Q by listing all elements of Q. Consider a table with all fractions $\frac{a}{b}$ ($a \in \mathbb{N}$, $b \in \mathbb{N}^+$), with the fraction $\frac{a}{b}$ in the $a$-th row and the $b$-th column. If we listed all elements row by row, we would not obtain a correspondence between N and Q, since the list would never get to the second row. By listing the elements along the diagonals (south-west to north-east), starting from the north-west corner, we do obtain a correspondence between N and Q. Because $\frac{2}{2} = \frac{1}{1}$, etc., we skip an element whenever it would cause a repetition. (This lists the non-negative rationals; the negative ones can be included by following each $q > 0$ with $-q$.) We can also give an injection from Q into N, whose range is an infinite subset of N and hence equivalent to N: for each fraction $\frac{a}{b} \in \mathbb{Q}$ with $a$ and $b$ relatively prime, let $f(\frac{a}{b}) := \frac{1}{2}(a + b)(a + b + 1) + b$.
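The diagonal listing can be written as a generator. The following Python sketch is an illustration under our own conventions: the table entry in row a and column b is a/b, each diagonal a + b = d is walked from south-west (b = 1) to north-east (a = 0), and repeated values are skipped. The helper `pairing` uses Cantor's pairing value (a + b)(a + b + 1)/2 + b, the standard choice for the injection into N; both function names are hypothetical.

```python
from fractions import Fraction
from itertools import count, islice

def rationals():
    """Yield every non-negative rational exactly once, diagonal by diagonal.

    Row a (a = 0, 1, 2, ...) and column b (b = 1, 2, ...) hold a/b. Diagonal d
    contains the entries with a + b = d; we walk it from south-west (large a,
    b = 1) to north-east (a = 0, b = d) and skip values already listed.
    """
    seen = set()
    for d in count(1):
        for b in range(1, d + 1):
            q = Fraction(d - b, b)
            if q not in seen:
                seen.add(q)
                yield q

def pairing(q: Fraction) -> int:
    """Cantor's pairing value of a reduced fraction a/b (injective on Q+)."""
    a, b = q.numerator, q.denominator
    return (a + b) * (a + b + 1) // 2 + b

first = list(islice(rationals(), 10))
print([str(q) for q in first])
# ['0', '1', '2', '1/2', '3', '1/3', '4', '3/2', '2/3', '1/4']
```

Note that the generator's position counter gives the bijection N → Q≥0, while `pairing` only gives an injection in the other direction, which is all the theorem needs.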

An example of an uncountable set is the set of real numbers, R. Cantor first proved that R is uncountable in 1873 (published in 1874); the proof below uses the technique he introduced later, in 1891, called diagonalization (also known as the diagonal method), see [17, page 99].

Theorem: R is uncountable
Proof: Suppose there is a bijection f between N and R. We contradict this by finding an x in R that is not paired with anything in N. We construct this x by choosing its first fractional digit arbitrarily, but never 0 or 9 and different from the first fractional digit of f(1); its second fractional digit is likewise different from 0, 9 and the second fractional digit of f(2), etc. Continuing this way down the diagonal of the table of digits, we obtain all digits of x. Then x is not f(n) for any n, because the n-th fractional digit of x differs from the n-th fractional digit of f(n). Note that we avoid the problem of certain numbers such as 2.3999... and 2.4000... being equal by never selecting a 9 or a 0. Similarly, we can use this diagonalization method to show that N ≁ {0, 1}^N.
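The digit-by-digit choice in the proof can be run on any finite initial part of the supposed table. In this Python sketch, which is only an illustration, `rows[i]` holds the first fractional digits of f(i + 1), and the rule "take 1, unless the diagonal digit is 1, then take 2" is one hypothetical way of picking a digit that differs from the diagonal and avoids 0 and 9.

```python
def diagonal_number(rows: list[str]) -> str:
    """Return fractional digits x_1 x_2 ... x_n with x_i different from rows[i-1][i-1].

    rows[i] holds the fractional digits of f(i + 1). Every chosen digit lies
    in 1..8, so x never ends in an infinite tail of 0s or 9s, which avoids two
    decimal expansions naming the same real number.
    """
    digits = []
    for i, row in enumerate(rows):
        d = row[i]
        digits.append("1" if d != "1" else "2")
    return "".join(digits)

table = ["14159265", "71828182", "41421356", "73205080"]
x = diagonal_number(table)
print("0." + x)  # 0.2211
```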

Theorem: (∀V :: P(V ) ∼{0, 1}V ). (see [17, page 98]) Proof: We show that there is a bijection K from P(V )to{0, 1}V .For W ⊆ V ,defineK(W ) (also denoted KW ), the characteristic function of W , as: KW (v)=1ifv ∈ W KW (v)=0ifv/∈ W .

We now show that K is a bijection from P(V )to{0, 1}W :

1) K is injective: let W_1, W_2 ⊆ V and suppose W_1 ≠ W_2, which means that there is an element w ∈ V such that (w ∈ W_1 ∧ w ∉ W_2) ∨ (w ∉ W_1 ∧ w ∈ W_2). Then we have (K_{W_1}(w) = 1 ∧ K_{W_2}(w) = 0) ∨ (K_{W_1}(w) = 0 ∧ K_{W_2}(w) = 1), and thus (∃w : w ∈ V : K_{W_1}(w) ≠ K_{W_2}(w)), i.e. K_{W_1} ≠ K_{W_2}.

2) K is surjective: suppose g ∈ {0, 1}^V. Let W_g = {v ∈ V | g(v) = 1}. Then (∀v : v ∈ V : K_{W_g}(v) = 1 ↔ g(v) = 1), thus (∀v : v ∈ V : K_{W_g}(v) = g(v)), and g = K_{W_g}.
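For a finite set V the two halves of this proof can be checked exhaustively. The Python sketch below, an illustration with the hypothetical three-element set V = (a, b, c), represents K_W as a tuple of bits (one position per element of V) and confirms that K hits every bit pattern exactly once and that W_g inverts it.

```python
from itertools import combinations, product

def characteristic(W: frozenset, V: tuple) -> tuple:
    """K_W as a tuple: position j holds 1 if V[j] is in W, else 0."""
    return tuple(1 if v in W else 0 for v in V)

def subset_of(g: tuple, V: tuple) -> frozenset:
    """The inverse direction: W_g = {v in V | g(v) = 1}."""
    return frozenset(v for v, bit in zip(V, g) if bit == 1)

V = ("a", "b", "c")
power_set = [frozenset(c) for r in range(len(V) + 1) for c in combinations(V, r)]
images = {characteristic(W, V) for W in power_set}

print(len(power_set), len(images))               # 8 8  (K is injective)
print(images == set(product((0, 1), repeat=3)))  # True (K is surjective)
```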

We can define an ordering relation ≤1 between sets: we say that V ≤1 W if there is an injection from V to W. Then V <1 W of course means that V ≤1 W holds but not V ∼ W. This relation on the set of cardinals depends only on the cardinals themselves and not on the choice of the particular sets V and W. The relation ≤1 is reflexive and transitive. Cantor also conjectured that ≤1 is a partial order on cardinals, i.e. that it is also antisymmetric. This was later proven independently by the two mathematicians F. Bernstein and E. Schröder (see [59, page 39]).

We give two theorems that are based on the relations <1 and ≤1:

Theorem: (without proof) (∀V : V is a non-empty set : V <1 P(V))
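Although the theorem is stated without proof, the usual argument is again a diagonalization: for any function h : V → P(V), the set D = {v ∈ V | v ∉ h(v)} differs from every h(v) at the element v, so h is not surjective. The Python sketch below checks this for every one of the 8^3 = 512 functions on a hypothetical three-element set; it illustrates the idea for a finite V and is not a proof for infinite sets.

```python
from itertools import combinations, product

V = (0, 1, 2)
subsets = [frozenset(c) for r in range(len(V) + 1) for c in combinations(V, r)]

def diagonal_set(h: dict) -> frozenset:
    """Cantor's diagonal set D = {v in V | v not in h(v)}."""
    return frozenset(v for v in V if v not in h[v])

def never_surjective() -> bool:
    """Check that every h: V -> P(V) misses its own diagonal set."""
    for choice in product(subsets, repeat=len(V)):
        h = dict(zip(V, choice))
        if diagonal_set(h) in h.values():
            return False
    return True

print(never_surjective())  # True
```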

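Theorem: V is Dedekind infinite ↔ N ≤1 V
Proof: This follows directly from the proof of the theorem on page 53 (V is Dedekind infinite ↔ V is infinite), in which both directions pass through N ≤1 V, together with the definition of infinite.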

Although we have seen that N is countable but R is not, we might still think that some smaller interval of the reals can be paired with the naturals.

Theorem: N ≁ [0, 1]
Proof of Poincaré (see [17]): We show there is no bijection f : N → [0, 1]; in particular (∀f : (f : N → [0, 1]) : f is not surjective). We do this by constructing for every function f : N → [0, 1] a y ∈ [0, 1] such that (∀n : n ∈ N : f(n) ≠ y). We construct this y by means of a chain of segments (see paragraph 3.5.2). Let f : N → [0, 1], and let $S_n$ be an infinite chain of segments such that

1) $(\forall i : i \in \mathbb{N} : f(i) \notin S_i)$

2) $(\forall i : i \in \mathbb{N} : S_{i+1} \subseteq S_i)$

3) $(\forall i : i \in \mathbb{N} : |S_i| = 3^{-i-1})$, with $|S_i|$ being the length of segment $S_i$.

We can construct such a chain of segments. For $S_0$, divide $[0, 1]$ into three equal parts of length $3^{-1}$; at least one of them does not contain $f(0)$. Likewise, if we divide a segment $S_n = [p_n, q_n]$ into three equal parts (each of length $3^{-n-2}$), at least one of these parts does not contain $f(n + 1)$, since a point lies in at most two of the three closed parts. We take this part for $S_{n+1}$. The constructed chain of segments determines (see paragraph 3.5.2) a real number $y$ with $(\forall n : n \in \mathbb{N} : y \in S_n)$, and thus certainly $y \in [0, 1]$. We also have $(\forall n : n \in \mathbb{N} : f(n) \notin S_n \wedge y \in S_n)$, so $(\forall n : n \in \mathbb{N} : y \neq f(n))$.
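The construction of the chain of segments is effective for any finite list of values f(0), ..., f(m). This Python sketch, an illustration with hypothetical names and a hypothetical list of values, computes the segments with exact fractions, always keeping the first third that avoids the current value; any point of the last segment then differs from all listed values.

```python
from fractions import Fraction

def nested_thirds(values):
    """Return segments S_0, S_1, ... with f(i) not in S_i and |S_i| = 3**-(i+1).

    values[i] plays the role of f(i) in [0, 1]. Each step splits the previous
    closed segment (starting from [0, 1]) into three equal closed parts and
    keeps the first part that does not contain f(i); a point lies in at most
    two of the three parts, so such a part always exists.
    """
    lo, hi = Fraction(0), Fraction(1)
    segments = []
    for v in values:
        third = (hi - lo) / 3
        parts = [(lo + j * third, lo + (j + 1) * third) for j in range(3)]
        lo, hi = next((p, q) for p, q in parts if not p <= v <= q)
        segments.append((lo, hi))
    return segments

values = [Fraction(1, 2), Fraction(0), Fraction(1, 3), Fraction(2, 9), Fraction(1)]
segments = nested_thirds(values)
lo, hi = segments[-1]
y = (lo + hi) / 2   # lies in every S_i, hence differs from every f(i)
print(y in values)  # False
```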

The following theorem gives a way to prove the equivalence of sets:

Theorem of Cantor-Bernstein: V ≤1 W ∧ W ≤1 V → V ∼ W
Proof: Assume V ≤1 W and W ≤1 V. Then there are injections f : V → W and g : W → V. Since g is an injection with Dom(g) = W, it is a bijection from W onto Ran(g), so W ∼ Ran(g); it therefore suffices to prove Ran(g) ∼ V. Since Ran(g) ⊆ V and g ◦ f is an injection from V to Ran(g), we have V ≤1 Ran(g). And since for all W and V, W ⊆ V ∧ V ≤1 W → V ∼ W (see the lemma below), we have Ran(g) ∼ V. By the symmetry and transitivity of ∼ we conclude V ∼ Ran(g) ∼ W.

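Lemma: W ⊆ V ∧ V ≤1 W → V ∼ W
Proof: Suppose W ⊆ V and V ≤1 W. Then there is an injection h : V → W. Let $A_0 := V - W$, and $(\forall n : n \in \mathbb{N} : A_{n+1} := h(A_n))$. We now give the desired bijection $k : V \to W$:

• $k(a) := a$ if $a \notin \bigcup_n A_n$
• $k(a) := h(a)$ if $a \in \bigcup_n A_n$

(In both cases $k(a) \in W$: if $a \notin \bigcup_n A_n$ then $a \notin A_0$, so $a \in W$.) We show that k is a bijection:

• k is injective: suppose a ≠ b; then k(a) ≠ k(b), by a case analysis on $a \notin \bigcup_n A_n \wedge b \notin \bigcup_n A_n$, $a \notin \bigcup_n A_n \wedge b \in \bigcup_n A_n$, $a \in \bigcup_n A_n \wedge b \notin \bigcup_n A_n$, and $a \in \bigcup_n A_n \wedge b \in \bigcup_n A_n$. In the first and last cases, k(a) ≠ k(b) follows from the definition of k and the injectivity of h. In the mixed cases, if say $b \in A_m$, then $k(b) = h(b) \in A_{m+1} \subseteq \bigcup_n A_n$, whereas $k(a) = a \notin \bigcup_n A_n$.

• k is surjective: suppose $w \in W$, thus $w \notin A_0$. Again we use a case analysis:
  – if $w \notin \bigcup_n A_n$ then $w = k(w)$.
  – if $w \in \bigcup_n A_n$, say $w \in A_p$: since $w \notin A_0$, $p \ge 1$. Thus there is a $w' \in A_{p-1}$ such that $w = h(w') = k(w')$.

Example: We prove that (a, b) ∼ [0, 1] for all a, b ∈ R with a < b by using the theorem of Cantor-Bernstein. We first prove that (0, 1) ∼ [0, 1] and then that (0, 1) ∼ (a, b). Then, by the symmetry and transitivity of ∼, we can conclude that (a, b) ∼ [0, 1].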

Proof of (0, 1) ∼ [0, 1]: The identity function $id_{(0,1)} : (0, 1) \to [0, 1]$ is an injection from (0, 1) to [0, 1], so (0, 1) ≤1 [0, 1]. The function $f(x) = \frac{1}{3}(x + 1)$ is an injection from [0, 1] to (0, 1) (its range is $[\frac{1}{3}, \frac{2}{3}]$), so [0, 1] ≤1 (0, 1). By the theorem of Cantor-Bernstein we now know that (0, 1) ∼ [0, 1].

Proof of (0, 1) ∼ (a, b): The function f(x)=(b − a)x + a is a bijection from (0, 1) to (a, b).

Using the Cantor-Bernstein theorem we can also prove that (a, b) ∼ (0, 1) ∼ R ∼ R^n ∼ {0, 1}^N ∼ P(N) ∼ N^N, for all n ∈ N, n ≥ 1.
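The bijection k from the lemma used in the Cantor-Bernstein proof can be tried out on a concrete instance. In this Python sketch we choose, purely for illustration, V = N, W = N − {0, 1} and the injection h(n) = 2n + 2, so that A_0 = {0, 1}; the names `in_union` and `k` are hypothetical. Membership of the union of the A_n is decided by following h backwards, which terminates here because every preimage is smaller than its image.

```python
def h(n: int) -> int:
    """An injection from V = N into W = N - {0, 1}, a subset of V."""
    return 2 * n + 2

A0 = {0, 1}  # A_0 = V - W

def in_union(a: int) -> bool:
    """Decide whether a lies in A_0 ∪ A_1 ∪ ... by following h backwards."""
    while a not in A0:
        if a >= 2 and a % 2 == 0:   # a = h(m) for m = (a - 2) // 2
            a = (a - 2) // 2
        else:                       # a is neither in A_0 nor in the range of h
            return False
    return True

def k(a: int) -> int:
    """The lemma's bijection: apply h on the union of the A_n, else fix a."""
    return h(a) if in_union(a) else a

print([k(a) for a in range(9)])  # [2, 4, 6, 3, 10, 5, 14, 7, 8]
```

Elements such as 3 and 8 are not reachable from A_0 and stay where they are, while the chain 0 → 2 → 6 → 14 → ... is shifted along by h, freeing the positions 0 and 1.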

Theorem: V is infinite → N ≤1 V
Proof: V is infinite and thus not empty. We take one element x_0 ∈ V. Next, we take an element x_1 ∈ V − {x_0}. We can repeat this infinitely often (i.e. for all n we can select an x_{n+1} ∈ V − {x_0, ..., x_n}, which is non-empty because V is infinite), if we assume that it is always possible to select an element from any non-empty set (see the axiom of choice below). In this way we get a countable subset of V, namely {x_0, x_1, x_2, ...}, and n ↦ x_n is an injection from N to V. The only assumption we have made here is the so-called axiom of choice.

Axiom of choice (AC): Given any set W of non-empty sets V, there is a function f which assigns to each member V of W an element f(V) of V.

This definition was first proposed as an axiom in an article by Zermelo in 1908 (translated in [93, pages 199-215]). Such a function f is called a choice function for W. The axiom can be restricted by limiting it to those families W of a particular cardinality. Since the axiom is provable for any finite W, the weakest non-trivial case occurs when W is denumerable (see page 54 for the definition of denumerable). This case is known as the Denumerable axiom (of choice). Zermelo regarded AC as already implicitly used by mathematicians. In response, others asked how this assumption had arisen in mathematics, where it had been used implicitly, and when exactly it can or cannot be avoided. Zermelo had used AC to prove the well-ordering theorem, and the controversy over this proof of 1904 (see [63, page 310]) led him to axiomatize set theory (see section 5.3.1). We can add AC to set theory based on the axioms of Zermelo and Fraenkel (ZF, see section 5.3), in which case it is termed ZFC (ZF supplemented by the Axiom of Choice). For more details on the role of AC, we refer to section 5.3 and [63]. See http://zax.mine.nu/stage and click on 'links' for some quotes about AC.

An instance of the following theorem (given without proof) of the British mathematician F.P. Ramsey is often used in combinatorics. The notation $V^n$ in this theorem is defined as the set of all subsets of V with n elements, i.e. $V^n := \{X \subseteq V \mid X \text{ has } n \text{ elements}\}$.

Theorem of Ramsey: If V is a denumerable set and $f : V^n \to \{0, 1, \ldots, m - 1\}$ with $n, m \in \mathbb{N}$ and $n, m \ge 1$, then $(\exists W : W \subseteq V : W \text{ is denumerable and } f \text{ is constant on } W^n)$.

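Theorem: R^2 ∼ R ∼ (0, 1)
Proof: We have R ∼ (0, 1) if there is a bijection between (0, 1) and R. Indeed, there exists such a bijection $f : (0, 1) \to \mathbb{R}$, defined as $f(x) = \tan(\frac{\pi}{2}(2x - 1))$. Thus: R ∼ (0, 1). Now consider an element of $(0, 1)^2$, that is, two real numbers between 0 and 1. We can map these numbers to an element $r \in (0, 1)$ by alternately taking the next digit of each of the two numbers (using, for each number, the decimal expansion that does not end in an infinite tail of 9s). For example, we map (0.76584..., 0.13275...) to 0.7163528745.... This mapping is an injection from $(0, 1)^2$ to $(0, 1)$ (but not a surjection: 0.191919... would require the second number to be 0.999... = 1), and $x \mapsto (x, \frac{1}{2})$ is an injection in the other direction, so by the theorem of Cantor-Bernstein $(0, 1)^2 \sim (0, 1)$. Since $\mathbb{R} \sim (0, 1)$ implies $\mathbb{R}^2 \sim (0, 1)^2$, and ∼ is transitive, we know that R^2 ∼ R ∼ (0, 1).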
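Digit interleaving and the tangent bijection are easy to state as functions. The Python sketch below works on finite digit strings only (a real number has infinitely many digits), so it illustrates the mapping and its left inverse rather than the full injection; the names `interleave`, `split` and `to_real` are hypothetical, and `to_real` uses floating point.

```python
import math

def interleave(x_digits: str, y_digits: str) -> str:
    """Merge the fractional digits of two numbers in (0, 1), alternately."""
    return "".join(a + b for a, b in zip(x_digits, y_digits))

def split(z_digits: str) -> tuple[str, str]:
    """Recover the two digit strings (the left inverse of interleave)."""
    return z_digits[0::2], z_digits[1::2]

def to_real(x: float) -> float:
    """The bijection f(x) = tan(pi/2 * (2x - 1)) from (0, 1) onto R."""
    return math.tan(math.pi / 2 * (2 * x - 1))

z = interleave("76584", "13275")
print("0." + z)   # 0.7163528745
print(split(z))   # ('76584', '13275')
```

Because `split` recovers both inputs, interleaving is injective; surjectivity fails for expansions whose odd or even digits end in all 9s, which is why the proof falls back on Cantor-Bernstein.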

Theorem: P(N) ∼ (0, 1)
Proof: First we show that P(N) ≤1 R. Suppose V ∈ P(N); map V to the decimal 0.a_0a_1a_2..., with a_i = 1 if i ∈ V and a_i = 0 otherwise. Since these expansions contain only the digits 0 and 1, distinct sets give distinct numbers, so this injection proves that P(N) ≤1 R, and hence, since R ∼ (0, 1), also P(N) ≤1 (0, 1). Now we give an injection from (0, 1) to P(N): assume r ∈ (0, 1), i.e. r = 0.a_1a_2... with 0 ≤ a_i ≤ 9. We want to identify numbers such as 0.3999... and 0.4000.... Therefore we assume that there is no i ∈ N such that a_n = 9 for all n > i, n ∈ N. Then we map r to the set {1a_1, 1a_1a_2, ...} of natural numbers (written in decimal notation and prefixed with a 1, so that leading zeros are preserved). Clearly, this mapping is well-defined and injective. For example, r = 0.17803... is mapped to the set {11, 117, 1178, 11780, 117803, ...}. Thus (0, 1) ≤1 P(N), and by the theorem of Cantor-Bernstein P(N) ∼ (0, 1).
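Both injections in this proof act digit by digit, so they can be sketched on finite prefixes. In the Python sketch below, `prefix_set` maps the first digits of r to the corresponding finite part of its set of natural numbers, and `set_to_decimal` produces the first n digits of the 0/1 decimal for a set V ⊆ N (digit i is 1 iff i ∈ V, counting from i = 0). Both names are hypothetical, and working on finite prefixes is an assumption of the illustration.

```python
def prefix_set(digits: str) -> set[int]:
    """Map the digits of 0.a1 a2 a3 ... to {1a1, 1a1a2, 1a1a2a3, ...}.

    The leading 1 keeps leading zeros (as in 0.05...) visible.
    """
    return {int("1" + digits[:i]) for i in range(1, len(digits) + 1)}

def set_to_decimal(members: set[int], n: int) -> str:
    """First n fractional digits for V ⊆ N: digit i is 1 iff i is in V."""
    return "".join("1" if i in members else "0" for i in range(n))

print(sorted(prefix_set("17803")))  # [11, 117, 1178, 11780, 117803]
print(set_to_decimal({0, 2, 3}, 5))  # 10110
```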

Corollary: P(N) ∼ R
Proof: This follows directly from P(N) ∼ (0, 1), (0, 1) ∼ R, and the transitivity of ∼.

3.7 The Continuum Hypothesis

We still think that the study of the size of the continuum should be our guiding for further research in set theory.

- Judah Haim in [33]

After showing that the real numbers cannot be put into one-to-one correspondence with the natural numbers (see section 3.6), Cantor hypothesized in 1877 that every infinite subset of R is either denumerable or equivalent to the continuum. This hypothesis was first published in 1878 in [13] and later became known as:

The Continuum Hypothesis (CH): (N ≤1 A ≤1 R) → (A ∼ N ∨ A ∼ R)

This hypothesis (as given in [17, page 128]) is also known in many other forms, of which we will mention and explain the most important ones. We can immediately see that the following version of CH is equivalent to the given definition: 'any set of real numbers is either finite, countably infinite or has the same cardinality as the entire set of reals'. This means that 'the number of real numbers is the next level of infinity above the number of natural numbers' (see also [30, page 197]).

As we saw in section 3.6, Cantor defined the cardinality of the natural numbers to be ℵ0, and the next levels of infinity to be ℵ1, ℵ2, ℵ3, etc. He also named the cardinality of the reals c, for continuum. Cantor's original formulation of CH was: (B) $c = \aleph_1$. Since Cantor also proved that P(N) ∼ R (see page 59), we can also state CH as: (C) $\mathcal{P}(\mathbb{N}) \sim \aleph_1$. The cardinality of the power set of any set X is equal to the cardinality of $\{0, 1\}^X$ (see page 55), often denoted as $2^X$, so another formulation of CH is: (D) $2^{\aleph_0} = \aleph_1$⁷ (see [31]). Although (B) leads us to think about sizes of reals, (C) about subsets and (D) about cardinal arithmetic, these formulations are all equivalent in ZFC. We will not go into details of less precise or more dependent formulations such as 'what is the cardinality of the set of points on a geometrical line?'.

7 Actually, in this formulation we have identified the cardinalities ℵ0 and ℵ1 with the sets that have these cardinalities.

Some of the theory that is needed in the remaining part of this section, for the generalized continuum hypothesis, will be introduced in later chapters. If you are not familiar with the notations that are used, you might want to skip the remaining part of this section and get back to it later.

In 1908 the German mathematician Felix Hausdorff proposed the following generalization of CH (also called the aleph hypothesis):

The Generalized Continuum Hypothesis (GCH): $(\forall r : r \text{ is an ordinal} : 2^{\aleph_r} = \aleph_{r+1})$

For a definition and the notation of ordinal numbers, we refer to section 3.8.2. Obviously (see section 5.3) we have that ZF + GCH ⊢ CH. Note that also ZF + GCH ⊢ AC (so we don't need ZFC once we have GCH). Cantor and many other great mathematicians spent years trying to prove CH or its negation (Cantor tried to prove his hypothesis by using a decomposition theorem; for details see [31, page 117]), but did not succeed. This problem was so important that Hilbert (see section 6.2) put it first in his list of 23 problems.

In 1938 significant progress was made when the mathematician Gödel proved that CH is consistent with ZFC (see section 5.3.2) by constructing a model of ZFC + CH (he later discussed the problem in his article 'What is Cantor's continuum problem?'). Since Gödel had also proved his famous incompleteness theorems (see chapter 8), people suspected that CH was one of the statements of ZFC that can neither be proved nor disproved. Mathematicians suspected that CH was undecidable in ZFC, but it took until 1963 before this was proved by Paul Cohen in [15].

To do that he used a new technique called forcing. Forcing is a combinatorial technique for proving statements consistent with the axioms of set theory. Cohen used it to prove that the negation of AC and the negation of CH are consistent with the axioms of set theory (AC and CH were already known to be consistent). Essentially it consists of a method of performing the following algorithm: start with a model of set theory M. Construct an object X not in M with certain properties. Consider the smallest model M' with X an element of M' and M a subset of M' (this is done in such a way that the construction of M' is implicit in the construction of X). For more details on forcing, see [51] and [81].

Thus Cohen constructed a model of ZFC + ¬CH and this, along with Gödel's model of ZFC + CH, showed that CH is undecidable in ZFC. So either CH or ¬CH could be added as an axiom of ZFC. But since neither of these statements seems axiomatic or 'self-evident', they have, unlike AC, not been adopted as axioms of set theory. Mathematicians either accept this incompleteness in set theory or try to find more intuitive axioms that will help decide it. In other words, the question remains which intuitive axioms of set theory we need to make it more complete, and whether, with some axiom system for set theory, the continuum hypothesis is true.

3.8 Cardinal and Ordinal numbers and Paradoxes

Every transfinite consistent multiplicity, that is, every transfinite set, must have a definite aleph as its cardinal number.

- Georg Cantor

3.8.1 Cardinal numbers and Cantor's Paradox

In section 3.6 we already encountered cardinal numbers and the notion of set equivalence. After defining the equivalence of sets (see page 51), Cantor realized that all sets that are equivalent to a given set V have a common property. He identified this property with the cardinal number $\overline{V}$ of a set V, a property that abstracts from the nature and order of the elements of a set.

Example: Consider the following sets: A = {1, 2, 3}, B = {3, 2, 1}, C = {{4}, 7, {a, b}}, D = {1, {4}}. We can say that A ∼ B ∼ C, or (equivalently) $\overline{A} = \overline{B} = \overline{C}$. We also have A ≁ D, or $\overline{A} \neq \overline{D}$. Note that in this example the equality '=' between cardinal numbers is a new type of equality that is defined as $\overline{A} = \overline{B} \leftrightarrow A \sim B$.

We can see that cardinality abstracts from the order and nature of the elements, and for finite sets the cardinal number can be identified with the ordinary 'number of elements'. Therefore we identify the cardinal number of a finite set of n elements with the natural number n. We denote the smallest infinite (or transfinite) cardinal number by ℵ0. As we have already seen on page 52, this is the cardinal number of N or of any denumerable infinite set. Cantor denoted the 'next' levels of infinity by ℵ1, ℵ2, ....

The next question was how to pass from the abstract notion of cardinal numbers to actual cardinal numbers, i.e. one wanted to regard cardinal numbers as objects of the mathematical system. It turned out to be quite a problem to define the cardinal $\overline{V}$ of a set V as an object of set theory. In some systems, such as Quine's 'New Foundations' (see section 7.3), the definition of the cardinal $\overline{V}$ of V poses no problem: $\overline{V}$ can be defined as the set of all sets equivalent to V. But this definition of cardinal numbers (first given by Frege, see section 3.6) can lead to a paradox that was first found by Cantor.

Cantor's paradox: The set of all sets is its own power set. Therefore, by Cantor's theorem (V <1 P(V)), the cardinality of the set of all sets must be bigger than itself.

In axiomatic set theory however (e.g. in ZF, see section 5.3), without the unrestricted comprehension axiom, there is no set which contains all sets equivalent to V . With this paradox the need arose to find a new definition of cardinals in a context without the unrestricted comprehension axiom, such that traditional paradoxes could no longer be derived.

Several new definitions of cardinal numbers were then proposed, based on ordinal numbers (for which we refer to the next section⁸). The following definition, which comes from the mathematician von Neumann, is now the standard definition for cardinal numbers.

Definition of Cardinal number (or initial number): A cardinal number α := an ordinal number α with the property (∀γ : γ is an ordinal : α ∼ γ → α ≤ γ)

For each set V we can prove (see [17, section 2.10]) that there exists exactly one cardinal number α satisfying V ∼ α (the proof uses AC). We call this unique α the cardinality or cardinal number of the set V, and it is also denoted by $\overline{V}$.

In other words, with the axiom of choice we can develop the theory of ordinals in the von Neumann way and define $\overline{V}$ to be the least ordinal α equivalent to V. The existence of such an α is guaranteed by the well-ordering theorem. If we have the axiom of foundation among our axioms, then even in the absence of the axiom of choice we can define $\overline{V}$ as the set of all sets W of least rank among those equivalent to V (see [1]). In the absence of both the axiom of choice and the axiom of foundation, the operation $\overline{V}$ is undefinable (see [1]).

For more information on the definition and calculus of cardinal numbers, we refer to [59, chapter 6], [25] and [34].

8 The rest of this section depends on concepts that are defined in later chapters.

3.8.2 Ordinal numbers and Burali-Forti's Paradox

We already introduced Cantor's concept of cardinal number in section 3.6, and saw in the previous paragraph that it abstracts from the order and nature of the elements of a set. Cantor also defined another property of sets, the ordinal number, that only abstracts from the nature of the elements of a set, but retains the order in which they are given.

Here we consider sets with a total ordering (see page 25). Recall that for a well-ordered set, in addition, each non-empty subset has a first member in the given ordering (see also section 3.2). In the case of ordered sets, the concept of equivalence is now replaced by the sharper concept of similarity. We consider two ordered sets V and W similar, notation V ≃ W, if there is a bijection between V and W that preserves all order relations. Note that we have already seen this relation as the concept of isomorphism ('is isomorphic to', see page 31), and note that ≃ is an equivalence relation. Instead of saying that two sets are similar, we can also say that they are of the same order type.

Definition of an Order Type: An equivalence class under the ≃ (isomorphism) relation

The equivalence class to which an ordered set V belongs is called the order type of V. All well-ordered sets that are similar to a given set V have a common property. Cantor identified this property with the ordinal number of the well-ordered set V, a property that only abstracts from the nature of the elements of a set. And just as for cardinals (see section 3.8.1), the question was posed how to define ordinal numbers as part of set theory. In 1883 Cantor defined in [13] an ordinal number as the order type of a well-ordered set.

Definition of Ordinal Number (Cantor): A well-ordered set V has ordinal number o := o is the order type of V

If a set is finite and simply ordered, it is well-ordered and it has an ordinal number. The ordinal number of that set is the same, regardless of the order of the elements. For each finite and simply ordered set, we can therefore identify the (finite) cardinal number with the ordinal number.

Example: 0=∅;1={0};2={0, 1};3={0, 1, 2} are ordinal numbers.

The smallest infinite ordinal number is called ω. This is the ordinal num- ber of the sequence {0, 1, 2, 3,...}, which can be seen as N or as the sequence of finite cardinal numbers in their ‘natural’ order. We introduce some other transfinite ordinals by example (from [10, page 66]).

Example:

If we call the set ∅ '0', the next set '1', etc., then consider the union of all these sets: {0, 1, 2, ...}. This is another ordinal, called ω, and it is the first non-finite ordinal. It has a successor, ω ∪ {ω}, called ω + 1. More ordinals can be obtained by continuing this succession, and taking the union of all these ordinals yields an ordinal we call ω ∗ 2, etc. The natural numbers in reverse order are denoted ∗ω.

V1 = {2, 3, 4, ..., 1} ; V2 = {3, 4, 5, ..., 1, 2}

V3 = {1, 3, 5, ..., 2, 4, 6, ...} ; V4 = {..., 3, 2, 1}

V5 = {1, 3, 5, ..., 6, 4, 2} ; V6 = {1, 11, 21, ..., 2, 12, 22, ...}

N = ω ; V1 = ω + 1 ; V2 = ω + 2 ; V3 = ω + ω = ω ∗ 2

V4 = ∗ω ; V5 = ω + ∗ω ; V6 = ω ∗ 10

(V4 and V5 are not well-ordered, so ∗ω and ω + ∗ω are order types but not ordinal numbers.)
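Each of the orderings V1, V3, V5 and V6 above rearranges the positive natural numbers, and can be written as a sort key. The Python sketch below is an illustration with hypothetical key functions; sorting a finite sample only shows how the elements are arranged, not the order type itself, which depends on the whole infinite set.

```python
def v1_key(n: int):
    """V1 = {2, 3, 4, ..., 1}: 1 comes after all other numbers."""
    return (n == 1, n)

def v3_key(n: int):
    """V3 = {1, 3, 5, ..., 2, 4, 6, ...}: all odd numbers before all even ones."""
    return (n % 2 == 0, n)

def v5_key(n: int):
    """V5 = {1, 3, 5, ..., 6, 4, 2}: odd numbers ascending, then even ones descending."""
    return (n % 2 == 0, n if n % 2 == 1 else -n)

def v6_key(n: int):
    """V6 = {1, 11, 21, ..., 2, 12, 22, ...}: ten blocks, grouped by (n - 1) mod 10."""
    return ((n - 1) % 10, n)

sample = range(1, 11)
print(sorted(sample, key=v1_key))  # [2, 3, 4, 5, 6, 7, 8, 9, 10, 1]
print(sorted(sample, key=v3_key))  # [1, 3, 5, 7, 9, 2, 4, 6, 8, 10]
print(sorted(sample, key=v5_key))  # [1, 3, 5, 7, 9, 10, 8, 6, 4, 2]
```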

For ordinal numbers n of N and m of M we say that n

Unfortunately, a situation similar to that for cardinal numbers was found for ordinal numbers. In 1897 Burali-Forti, the Italian assistant of the mathematician Peano, found that this definition can give rise to a paradox (see [18, page 259]).

The Burali-Forti Paradox: The set of all ordinal numbers, taken in their natural order, forms a well-ordered series, and therefore also has an ordinal number Ω. But the ordinal number of any initial segment of the ordinals exceeds every number in that segment; since the set of all ordinals is such a segment, Ω exceeds every ordinal number whatsoever, including Ω itself.

This led to new proposals for definitions of ordinal numbers. Below we present another definition, given by von Neumann in [61]. In 1923 he pointed out that among all well-ordered sets having a given Cantorian ordinal as their order type, there is a particular one with some very special properties. Von Neumann defined this particular set as the ordinal of that order type.

Definition of ordinal number: A set α is an ordinal number :=

1) α is a well-ordered set with the binary relation ∈ as its ordering

2) (∀β :: β ∈ α → β ⊂ α)

With this definition of ordinal numbers, the Burali-Forti paradox can no longer be derived: the collection of all ordinals is well-ordered by ∈ and 2) also holds for it (a proof is given in [59, section 4.2]), so if it were a set it would itself be an ordinal and thus an element of itself, which is impossible; hence, in axiomatic set theory, there is no set of all ordinals. According to this definition, the empty set is an ordinal number. This ordinal number is also denoted by 0. Similarly we denote the ordinal numbers {0} by 1, {0, 1} by 2, {0, 1, 2} by 3, etc. In other words: 0 = ∅, 1 = {∅}, 2 = {∅, {∅}}, .... These ordinal numbers, which are finite sets, are called finite ordinal numbers. The finite ordinal numbers are identified with the natural numbers. The set ω = {0, 1, 2, ...} of all natural numbers is also an ordinal number. An ordinal number that is an infinite set, like ω, is called a transfinite ordinal number. For every well-ordered set V, there exists exactly one ordinal number isomorphic to V.
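Finite von Neumann ordinals can be built literally as nested sets. The Python sketch below, with hypothetical function names, constructs n = {0, 1, ..., n − 1} from the empty set using frozensets and tests the two defining conditions. It only handles finite sets: for a finite set, "well-ordered by ∈" reduces to ∈ being irreflexive, transitive and total, since every finite linear order is a well-order. Transfinite ordinals such as ω cannot be represented this way.

```python
def von_neumann(n: int) -> frozenset:
    """The finite ordinal n = {0, 1, ..., n - 1}, built from the empty set."""
    alpha = frozenset()
    for _ in range(n):
        alpha = alpha | {alpha}   # successor step: alpha ∪ {alpha}
    return alpha

def is_ordinal(alpha: frozenset) -> bool:
    """Check conditions 1) and 2) for a finite set of frozensets.

    For a finite set, 'well-ordered by ∈' reduces to: ∈ is irreflexive,
    transitive and total on alpha (a finite linear order is a well-order).
    """
    elems = list(alpha)
    condition_2 = all(beta <= alpha for beta in elems)   # β ∈ α → β ⊆ α
    irreflexive = all(b not in b for b in elems)
    total = all(b == c or b in c or c in b for b in elems for c in elems)
    transitive = all(b in d for b in elems for c in elems for d in elems
                     if b in c and c in d)
    return condition_2 and irreflexive and total and transitive

three = von_neumann(3)
print(len(three), is_ordinal(three))    # 3 True
one = von_neumann(1)
print(is_ordinal(frozenset({one})))     # False: {1} does not satisfy 2)
```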

Definition of ordinal number of a well-ordered set V: The ordinal number of a well-ordered set V := the ordinal number isomorphic to V

A detailed treatment of ordinal calculus based on this definition of ordinal numbers is outside the scope of this report. In the remainder of this section we will only define the most common concepts.

As we saw in section 3.2, we also write α ∈ β (we denote ordinals by lower-case Greek letters) as α < β, which defines an ordering on the ordinal numbers. The least ordinal number is of course 0, and the ordering of the finite ordinal numbers coincides with the usual ordering of the natural numbers. The least transfinite ordinal is ω (see also 5.3.2). The ordering ≤, defined by α ≤ β := α < β ∨ α = β, is a linear ordering and a well-ordering of the ordinal numbers. Therefore we can apply transfinite induction (see page 37) to ordinal numbers.

For any ordinal number α, the set α + 1 = {γ | γ ≤ α} = α ∪ {α} (the segment of all ordinals up to and including α, called the successor of α) is also an ordinal number, and α is the unique predecessor of α + 1. A transfinite ordinal without a predecessor is called a limit ordinal number, and all the other ordinal numbers are called isolated ordinal numbers. The first limit ordinal number is ω. For any set V of ordinal numbers, {γ | (∃η : η ∈ V : γ < η)}, i.e. the union of all members of V, is an ordinal number, the supremum of V.
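Successor, predecessor and supremum can be sketched for finite ordinals, again with frozensets standing in for sets. This is only an illustration under our own assumptions (the function names are hypothetical). In the finite case every nonzero ordinal has a predecessor, so `predecessor` returns `None` only for 0 here; a limit ordinal such as ω cannot be built from finitely many frozensets.

```python
from functools import reduce


def successor(alpha: frozenset) -> frozenset:
    """{γ | γ ≤ α} = α ∪ {α}: the ordinal directly after α."""
    return alpha | {alpha}


def von_neumann(n: int) -> frozenset:
    """Finite von Neumann ordinal n = {0, 1, ..., n-1}."""
    alpha = frozenset()
    for _ in range(n):
        alpha = successor(alpha)
    return alpha


def predecessor(alpha: frozenset):
    """Return the unique β with successor(β) == α, or None if α has none."""
    for beta in alpha:
        if successor(beta) == alpha:
            return beta
    return None


def supremum(ordinals) -> frozenset:
    """sup V = {γ | (∃η : η ∈ V : γ < η)}, i.e. the union of all members of V."""
    return reduce(lambda acc, eta: acc | eta, ordinals, frozenset())


three = von_neumann(3)
print(predecessor(three) == von_neumann(2))                        # True
print(predecessor(von_neumann(0)))                                 # None
print(supremum([von_neumann(1), three, von_neumann(2)]) == three)  # True
```

Taking the union works because ordinals are nested (1 ⊂ 2 ⊂ 3), so the union of a set of ordinals is the least ordinal that is not smaller than any of them.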

A full treatment of the theory of ordinal numbers is omitted here. Rigorous study has produced a complete calculus of ordinal numbers and significant results. We only mention here the so-called well-ordering theorem, which Cantor had accepted as true (see [18, page 257]) but which was first proved rigorously by Zermelo in 1904 (using the axiom of choice).

Well-Ordering Theorem: Every set can be well-ordered.

This means that ordinals give us a way of ‘counting’ any set, even if it is not finite. The particular significance of the well-ordering theorem lies in the possibility of applying the principle of mathematical induction (which is well known for denumerable sets, see section 3.4.3) to any arbitrary well-ordered set, and hence, after well-ordering it, to any set. Ordinal numbers form the basis of transfinite induction, which is a generalization of the principle of induction.

We now have the following properties (given without proof):

• Two finite linearly ordered sets have the same order type if and only if they have the same cardinal number

• Cantor’s theorem: the cardinality of any set is strictly smaller than the cardinality of the set of all its subsets (i.e. there is no largest cardinal number)

• If two sets have the same ordinal number, they have the same cardinal number, but not necessarily vice versa
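The diagonal argument behind Cantor’s theorem can be checked by brute force on a small finite set. For any map f from S to its power set, the set D = {x ∈ S | x ∉ f(x)} cannot be a value of f: if D = f(y), then y ∈ D ↔ y ∉ f(y) = D. The sketch below is only an illustration on a finite example, and the function names are our own.

```python
from itertools import combinations, product


def powerset(s):
    """All subsets of the finite set s, as frozensets."""
    items = sorted(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def diagonal_set(s, f):
    """Cantor's set D = {x ∈ s | x ∉ f(x)}."""
    return frozenset(x for x in s if x not in f[x])


def no_surjection_onto_powerset(s) -> bool:
    """Check every map f: s → P(s) and confirm D is never a value of f."""
    elems = sorted(s)
    for images in product(powerset(s), repeat=len(elems)):
        f = dict(zip(elems, images))
        if diagonal_set(s, f) in f.values():
            return False
    return True


print(len(powerset({0, 1, 2})))                # 8 = 2^3
print(no_surjection_onto_powerset({0, 1, 2}))  # True
```

For infinite sets no exhaustive check is possible. The same construction of D is then the whole proof.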

For more information and theory on cardinal numbers, ordinal calculus and set theory we refer to two classical books on set theory: [25] and [34]. The first gives a good introduction to set theory and presupposes little mathematical knowledge; the latter is more suitable for readers with experience in set theory.

Chapter 4

Peano and Frege

4.1 Peano’s arithmetic

Questions that pertain to the foundations of mathematics, although treated by many in recent times, still lack a satisfactory solution. The difficulty has its main source in the ambiguity of language.

- Peano in the opening of the paper ‘Arithmetices principia, nova methodo exposita’, in which he introduces axioms for the natural numbers

The Italian mathematician Giuseppe Peano (1858-1932) worked successively on the infinitesimal calculus, on the foundations of mathematics and on linguistic studies. After his work on calculus (see Peano’s first publication [65]) and geometry (see [66] [67]), Peano became particularly interested in number theory, also known as arithmetic. Like Dedekind (see quote on page 46), Peano became aware of the lack of rigor in mathematics through his experience in teaching infinitesimal calculus.

What is number theory? The field of mathematics consisting of the study of the properties of the natural numbers

Since then, Peano strove for rigor, for an abstract mathematics. He came to the conclusion that mathematics must be constructed, independently of intuition or , in a way that absolutely guarantees the of its theorems.

In order to satisfy this requirement he devoted himself to the transformation of mathematics into a self-contained system, and rewrote mathematics in symbolic form as an axiomatic system (see section 6.1), based exclusively on postulated primitive notions and primitive propositions. To discard intuition, he first renounced ordinary language (because it is often insufficient and imprecise) and sought a new mathematical symbolism, consisting entirely of neutral symbols. Second, he formalized the logic of the mathematical argument, to replace intuitive reasoning by the application of a limited number of explicitly stated logical rules.

So Peano formalized both the language and the logic of the mathematical argument, and to that end developed parts of symbolic logic and formalized propositional and predicate calculus. This development was rudimentary and would later be worked out in full detail by Russell and Whitehead in ‘Principia Mathematica’ (1910, see section 7.1). He introduced letters to denote propositions and propositional functions (Peano’s logic notation) and the symbol ∈ for the membership relation of a set.

The work of formalization of mathematics was published in the journal ‘Rivista di Matematica’ (which Peano had founded himself) and in ‘Formulario Mathematico’, a series of 5 books that is also known as ‘Formulaire de Mathématique’1. In 1899 he axiomatized the arithmetic of cardinal numbers, to be published in the third volume of ‘Formulario Mathematico’ in 1901. Peano based the foundations of arithmetic on 5 axioms (see [31, page 227]), which are formulated with the help of three undefined (primitive) terms whose meaning is assumed to be known:

a) N (the set of natural numbers)

b) 0 (the particular natural number zero)

c) a+ (the immediate successor of the natural number a)

1The original ‘Formulaire de Mathématique’ was called ‘Formulario Mathematico’ when the final version appeared in 1908, because at that time Peano consistently used Latino sine flexione (Interlingua), his simplified form of Latin, for all his mathematical publications.

Definition of the Peano axioms for the natural numbers:

1) 0 ∈ N (zero is a natural number)

2) a ∈ N → a+ ∈ N (the immediate successor of any number is a number)

3) 0 ∈ S ∧ (∀x :: x ∈ S → x+ ∈ S) → N ⊂ S (if a set S contains zero, and contains the immediate successor x+ of every number x that it contains, then S includes the whole of N)

4) a, b ∈ N ∧ a+ = b+ → a = b (no two different numbers have the same immediate successor)

5) a ∈ N → a+ ≠ 0 (zero is not the immediate successor of any number)

Axiom 3 formalizes the principle known as mathematical induction. In ZF (see section 5.3) the five axioms of Peano can be derived. For more information on the Peano axioms, I refer to [31, chapter 5], [49, pages 146-147] and [64, appendix A].

After defining the natural numbers, Peano used recursive definitions to define the arithmetical sum, product and other operations, and he derived much of elementary number theory.

Example: Peano defined the sum a + b by recursion on b: a + 0 = a, a + (b+) = (a + b)+. Similarly we can define the product a ∗ b: a ∗ 0 = 0, a ∗ (b+) = (a ∗ b) + a.
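The recursive definitions of sum and product can be run directly on a Peano-style representation of the natural numbers. In this sketch, the data type `Nat`, its field `pred`, and the helpers `succ`, `from_int` and `to_int` are our own hypothetical choices. Only the recursion equations in `add` and `mul` come from the example above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Nat:
    """A natural number in Peano style: 0, or the successor a+ of some a."""
    pred: Optional["Nat"] = None   # None encodes 0; otherwise this value is pred+


ZERO = Nat()


def succ(a: Nat) -> Nat:
    return Nat(a)


def add(a: Nat, b: Nat) -> Nat:
    """a + 0 = a,  a + (b+) = (a + b)+   (recursion on b)."""
    if b.pred is None:
        return a
    return succ(add(a, b.pred))


def mul(a: Nat, b: Nat) -> Nat:
    """a * 0 = 0,  a * (b+) = (a * b) + a   (recursion on b)."""
    if b.pred is None:
        return ZERO
    return add(mul(a, b.pred), a)


def from_int(n: int) -> Nat:
    a = ZERO
    for _ in range(n):
        a = succ(a)
    return a


def to_int(a: Nat) -> int:
    n = 0
    while a.pred is not None:
        a, n = a.pred, n + 1
    return n


print(to_int(add(from_int(2), from_int(3))))  # 5
print(to_int(mul(from_int(3), from_int(4))))  # 12
```

Structural equality of the frozen dataclass also lets us spot-check axioms 4 and 5 on small numbers: `succ` never yields `ZERO`, and distinct numbers have distinct successors.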

Peano then showed how the rationals and reals can be formally obtained from the natural numbers, and further considered elementary analysis and geometry. In later years, Peano turned away from the foundations of mathematics and devoted most of his time to his new international auxiliary language Interlingua. He invented this language (see [49, pages 148-150]) in an attempt to reduce the grammatical structure of Latin and create a universal language. His mathematical work was to have a profound influence on mathematical thought, but his language Interlingua received little response.

4.2 Frege’s work

As I think about acts of integrity and grace, I realize that there is nothing in my knowledge to compare with Frege’s dedication to truth. His entire life was on the verge of completion, much of his work had been ignored to the benefit of men infinitely less capable, his second volume was about to be published, and upon finding that his fundamental assumption was in error, he responded with intellectual pleasure clearly submerging any feelings of disappointment. It was superhuman and a telling indication of that of which men are capable if their dedication is to creative work and knowledge instead of cruder efforts to dominate and be known.

- B. Russell about Frege, in [93, page 127]

The German mathematician and philosopher Gottlob Frege (1848-1925) was one of the founders of modern symbolic logic, putting forward the (logistic) view that mathematics is reducible to logic. He wrote many important papers on philosophy. Frege once said ‘every good mathematician is at least half a philosopher, and every good philosopher is at least half a mathematician’. Famous is his for the existence of God, but we will not discuss his philosophical writings here. We will mention his three most important works on the foundations of mathematics: Begriffsschrift, Grundlagen der Arithmetik and Grundgesetze der Arithmetik.

Begriffsschrift

Like Peano, Frege invented a logical symbolism, to which he gave the name ‘Begriffsschrift’ (in English known as ‘Concept script’). We will not treat the symbolism of Begriffsschrift in full detail here (it can be found in [49, pages 175-182] and in [31, pages 177-199]), but give a few examples of his new logic and describe the rest of his work in general terms. Frege rejected the subject/predicate regimentation on which Aristotelian logic depends, and recognized (though not as the first) that the patterns of Aristotle cannot always be used to evaluate inferences correctly.

Example: Certain obvious inferences, such as:

If Joe doesn’t wear a kilt, then Joe is not Scottish.

Joe doesn’t wear a kilt.

Therefore, Joe is not Scottish.

do not fall under the patterns of traditional (Aristotelian) logic. Actually, this is another kind of inference, which contains a conditional expression of the form:

if B then A

B

Therefore, A.

Frege adopted this new rule in the system of logic of his Begriffsschrift. With arbitrary expressions for A and B, the rule later became known as modus ponens. A logic that evaluates these sorts of expressions is called a propositional logic.

What is propositional calculus (or sentential calculus)? A symbolic system for treating compound propositions and their logical relationships. Compound propositions are formed via a set of formation rules using the standard symbols ∧, ∨, →, ¬; basic propositions are simple, unanalyzed propositions.

Frege based his propositional calculus on 6 axioms: for all x, y and z:

1 x → (y → x)

2 (x → (y → z)) → ((x → y) → (x → z))

3 (x → (y → z)) → (y → (x → z))

4 (x → y) → (¬y → ¬x)

5 ¬¬x → x

6 x → ¬¬x
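A quick way to see that the six axioms are sound is to check them with truth tables. This is a semantic check of our own, not Frege’s method: Frege derived theorems syntactically from the axioms, and a truth table only confirms that each axiom is true under every assignment of truth values. The names `implies`, `FREGE_AXIOMS` and `is_tautology` are hypothetical. The same table also confirms that modus ponens never leads from true premises to a false conclusion.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Material implication p → q."""
    return (not p) or q


# Frege's six propositional axioms, as functions of the truth values of x, y, z.
FREGE_AXIOMS = {
    1: lambda x, y, z: implies(x, implies(y, x)),
    2: lambda x, y, z: implies(implies(x, implies(y, z)),
                               implies(implies(x, y), implies(x, z))),
    3: lambda x, y, z: implies(implies(x, implies(y, z)),
                               implies(y, implies(x, z))),
    4: lambda x, y, z: implies(implies(x, y), implies(not y, not x)),
    5: lambda x, y, z: implies(not (not x), x),
    6: lambda x, y, z: implies(x, not (not x)),
}


def is_tautology(formula) -> bool:
    """True if the formula holds under all 8 assignments to x, y, z."""
    return all(formula(x, y, z) for x, y, z in product([False, True], repeat=3))


def modus_ponens_is_sound() -> bool:
    """Whenever 'if B then A' and B are both true, A is true as well."""
    return all(a for a, b in product([False, True], repeat=2) if implies(b, a) and b)


print({n: is_tautology(ax) for n, ax in FREGE_AXIOMS.items()})
print(modus_ponens_is_sound())
```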

Derivations in the propositional calculus were based on two procedures of and the rule of modus ponens. For the full calculus of predicates, three additional axioms were needed. For all x, y and (propositional functions) F:

7 (x = y) → (F(x) → F(y))

8 x = x

9 (∀x :: F(x)) → F(y)

Frege presented this new logic in his ‘Begriffsschrift’ in 1879. It consists of three parts. In the first part he provides a list of inferences from which, he claims, all of logic can be derived. In the second part Frege demonstrates the completeness of his logic (i.e. all inferences that can be shown to be valid using the techniques of Aristotelian or propositional logic can also be shown to be valid using only Frege’s laws and rules of inference). The third part of Begriffsschrift shows that logic alone suffices to establish the validity of certain inferences (about properties that are hereditary in so-called ‘ancestral sequences’). He also showed that mathematical induction (see section 3.4.3) can be replaced by a principle about ancestral sequences that depends only on logical laws.
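Frege’s ‘ancestral’ can be illustrated on a small finite domain. The usual statement of his definition is that y follows x in the R-sequence if y has every R-hereditary property F that all R-successors of x have. In the sketch below, properties are modelled as subsets of the domain, the definition is checked by brute force over all of them, and the result is compared with the transitive closure of R. The names and the finite setting are our own assumptions. Frege’s definition quantifies over all properties and is meant for infinite domains such as N, which this sketch cannot reproduce. For the successor relation it expresses exactly the induction-like principle that a hereditary property holding for the successors of x holds for everything that follows x.

```python
from itertools import combinations


def transitive_closure(relation):
    """Smallest transitive relation containing `relation` (a set of pairs)."""
    closure = set(relation)
    while True:
        new = {(a, d) for (a, b) in closure for (c, d) in closure if b == c}
        if new <= closure:
            return closure
        closure |= new


def frege_follows(x, y, relation, domain) -> bool:
    """y has every R-hereditary property F that holds for all R-successors
    of x (properties are modelled as subsets of the finite domain)."""
    for r in range(len(domain) + 1):
        for subset in combinations(domain, r):
            F = set(subset)
            hereditary = all(b in F for (a, b) in relation if a in F)
            covers_successors = all(b in F for (a, b) in relation if a == x)
            if hereditary and covers_successors and y not in F:
                return False   # found a hereditary property that y lacks
    return True


domain = list(range(5))
succ_rel = {(n, n + 1) for n in range(4)}     # successor relation on {0, ..., 4}
closure = transitive_closure(succ_rel)
print(all(frege_follows(x, y, succ_rel, domain) == ((x, y) in closure)
          for x in domain for y in domain))   # True
```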

Grundlagen der Arithmetik

Throughout his work Frege developed (as the first) the main thesis of logicism, that mathematics is reducible to logic. But to that end, he had to do more than develop a new logical symbolism. His next book, ‘Die Grundlagen der Arithmetik’ (1884), was devoted to the ‘foundations of arithmetic’, based on the concept of (cardinal) number. He put forward the logicist philosophy that arithmetic could be founded upon logic alone, and he discussed the work of others in detail (see [49, pages 184-185]). In [31, page 183] we learn more about Frege’s philosophy. In the introduction of his book Frege announced his three guiding principles:

1) Always to separate sharply the psychological from the logical, the sub- jective from the objective

2) Never to ask for the meaning of a word in isolation, but only in the context of a proposition

3) Never to lose sight of the distinction between concept and object

In his book he presented his own theory of numbers, and wanted to show that all the truths of arithmetic are derivable from logical laws and definitions alone. He did this by sketching the proofs, without giving the official Begriffsschrift proofs of the truths of arithmetic. Before Frege could do that he needed a new version of Begriffsschrift, to meet the new requirements of his formalization of the concept of number, and also to fill in pieces that were simply missing.

Grundgesetze der Arithmetik

In his next three papers, ‘Function and Concept’, ‘On Sense and Meaning’ and ‘On Concept and Object’ (1892), he introduced all the modifications that he was to make to his language, Begriffsschrift, and to his logical system. During that period he also completed his definitions of the natural numbers and some of the proofs of simple truths of arithmetic from these definitions and logical laws. His new logical calculus included a symbolic representation of the of any given proposition, which provided a shorter notation for many Begriffsschrift propositions. The calculus also had several other new logical and arithmetical symbols, one of the most important being a notation for what Frege called the ‘course-of-values’ of a function. The course-of-values of a propositional function ϕ, denoted by Frege as ἐϕ(ε), represents the truth values of ϕ for all possible values of the argument (here ε). We denote it as cov and define equal course-of-values by cov(f) = cov(g) ↔ (∀a :: f(a) = g(a)); this principle is Frege’s Basic Law V. In 1893, Frege published the first volume of his ‘Grundgesetze der Arithmetik’, the ‘Basic Laws of Arithmetic’. It set out the new version of logic and began the proofs that were to make the project successful. In the second part Frege wanted to define the natural numbers and some basic laws governing them and, in the third part, he would define the real numbers and lay the foundations for expressing analysis in terms of logic. In 1902, when volume 2 was in press, he received a now famous letter from the English mathematician and logician Russell (see chapter 5), who pointed out, with great modesty, that a contradiction could be derived in Frege’s system (see section 5.1). This contradiction would later be named after Russell and become known as ‘Russell’s paradox’.

Hardly anything more unwelcome can befall a scientific writer than one of the foundations of his edifice be shaken after his work is finished. I have been placed in this position by a letter of Mr. Bertrand Russell just as printing of the second volume was nearing completion. ...

- The first paragraph of the appendix from Frege’s ‘Grundgesetze der Arithmetik’

After many letters between the two (see for example [93, pages 124-128]), Frege modified one of his axioms and explained in an appendix to the book that this was done to restore the consistency of the system. However, with this modified axiom many of the theorems of volume 1 do not go through, and Frege must have known this. He probably never realized that even with the modified axiom the system is inconsistent, since this was not shown until after Frege’s death in 1925, by Leśniewski (see [85]).

The scope of Frege’s Grundgesetze is similar to that of Principia Mathematica (to be discussed in section 7.1), and both aimed at a logistic basis for mathematics, but thanks to Russell’s theory of types Principia Mathematica did not contain the paradox. Frege’s contribution to the foundations of mathematics was therefore largely indirect (through Principia Mathematica, see [49, page 181]). Although Frege attracted only a small audience in his lifetime, he was a major influence on Peano and Russell, and in the years since, his influence on contemporary philosophy, especially on thought about language and logic, has become ubiquitous.

In this text I have made extensive use of the excellent books [98] and [97] about Frege, which contain much more about Frege and his work, and of chapter 4.5 from [31] and chapter 6, section 4 from [49].

Chapter 5

Russell

The fact that all Mathematics is Symbolic Logic is one of the greatest discoveries of our age; and when this fact has been established, the remainder of the principles of mathematics consists in the analysis of Symbolic Logic itself.

- B. Russell in Principles of Mathematics, 1903

The English logician and philosopher Bertrand Russell (1872-1970) published in his long life an incredible number of books on logic, the theory of knowledge and many other topics. He certainly was one of the most important logicians and philosophers of the twentieth century.

Russell’s private life, affairs, imprisonment, his social and political campaigns and advocacy of both pacifism and nuclear disarmament are certainly interesting, but we will not discuss these subjects here (for more information and references on Russell’s life and work, see [62], [80] and [31, chapters 6, 7, 11 and sections 8.2, 8.3, 8.4, 8.8.3, 8.9.2, 10.1, 10.2.1]). I quote the following assessment from [73]: “Bertrand Russell had one of the most widely varied and persistently influential intellects of the 20th century. During most of his active life, a span of three generations, Russell had at any time more than 40 books in print ranging over philosophy, mathematics, science, , sociology, education, history, , politics and polemic. The extent of his influence resulted partly from his amazing efficiency in applying his intellect (he normally wrote at the rate of 3,000 largely unaltered words a day) and partly from the deep humanitarian feeling that was the mainspring of his actions. This feeling expressed itself consistently at the frontier of social change through what he himself would have called a liberal anarchistic, left-wing, and skeptical atheist temperament.”

Here, we will focus on Russell’s contributions to the foundations of mathematics. His contributions relating to mathematics include his discovery of Russell’s paradox, his defense of logicism (the view that mathematics is, in some significant sense, reducible to formal logic), his introduction of the theory of types, and his refining and popularizing of the first-order predicate calculus. Along with Kurt Gödel (see chapter 8), he is usually credited with being one of the two most important logicians of the twentieth century. We will look at each of these contributions in more detail.

Russell discovered the paradox which bears his name in 1901, while working on his ‘Principles of Mathematics’ (1903). The paradox and the closely related vicious circle principle are discussed in section 5.1. Russell’s own response to the paradox came with the introduction of types (see chapter 7). Using the vicious circle principle, also adopted by Henri Poincaré, together with Russell’s so-called ‘no-class’ theory of classes, Russell was then able to explain why the unrestricted comprehension axiom (see section 2.1) fails: propositional functions, such as ‘x is a set’, should not be applied to themselves, since self-application would involve a vicious circle. On this view, it is possible to refer to a collection of objects for which a given condition (or predicate) holds only if they are all at the same level or ‘type’.

Although first introduced by Russell in 1903 in the Principles, his theory of types finds its mature expression in his 1908 article ‘Mathematical Logic as Based on the Theory of Types’ and in the monumental work he co-authored with Alfred North Whitehead, ‘Principia Mathematica’ (1910, 1912, 1913). Principia Mathematica and the theory of types will be treated in detail in chapter 7. The theory admits of two versions, the ‘simple theory’ and the ‘ramified theory’. Both versions of the theory later came under attack. For some, they were too weak since they failed to resolve all of the known paradoxes. For others, they were too strong since they disallowed many mathematical definitions which, although consistent, violated the vicious circle principle. Russell’s response to the second of these objections was to introduce, within the ramified theory, the axiom of reducibility. Although this axiom successfully lessened the vicious circle principle’s scope of application, many claimed that it was simply too ad hoc to be justified philosophically.

Of equal significance during this same period was Russell’s defense of logicism, the theory that mathematics was in some important sense reducible to logic. First defended in his Principles, and later in more detail in ‘Principia Mathematica’, Russell’s logicism consisted of two main theses. The first is that all mathematical truths can be translated into logical truths or, in other words, that the vocabulary of mathematics constitutes a proper subset of that of logic. The second is that all mathematical proofs can be recast as logical proofs or, in other words, that the theorems of mathematics constitute a proper subset of those of logic.

Like Gottlob Frege, Russell based his defense of logicism on the idea that numbers may be identified with sets of sets and that number-theoretic statements may be explained in terms of quantifiers and identity. It followed that number-theoretic operations could be explained in terms of set-theoretic operations such as intersection, union, and the like. In ‘Principia Mathematica’ Whitehead and Russell were able to provide detailed derivations of many major theorems in set theory, finite and transfinite arithmetic, and elementary measure theory. A fourth volume on geometry was planned but never completed.

For more information on Russell’s theory of types and about Principia Mathematica, we refer to chapter 7. In this chapter we used parts of [73] and [39].

5.1 Russell’s paradox

I hoped sooner or later to arrive at a perfect mathematics which should leave no room for doubts, and bit by bit to extend the of certainty from mathematics to other .

- Russell, in [78]

Paradoxes have been known for a long time, but particularly with the introduction of more formal systems at the end of the 19th century, paradoxes became more influential on the foundations of mathematics. Before we describe Russell’s famous paradox, we first define the notion of a paradox.

What is a paradox? A paradox is a statement which appears self-contradictory or contrary to expectations; it is also known as an antinomy.

In an axiomatic system (see section 6.1) a paradox is a derivation that leads to a contradictory statement.

A paradox is properly something which is contradictory to general opinion; but is frequently used to signify something self-contradictory [...] Paralogism, by its etymology, is best fitted to signify an offence against the formal rules of inference.

- De Morgan, in [31, page 310]

In [86], three ‘paradox threats’ are identified: when systems are complex, formal or designed for computers, there is often not enough intuition to notice inconsistencies. With the formalizations described above, the systems of Cantor (see chapter 2), Peano (see section 4.1), Frege (see section 4.2), not to mention Russell himself, were at risk. And indeed, in 1902 Russell pointed out a paradox in Frege’s ‘Grundgesetze der Arithmetik’. The paradox turned out to lie at the very basis of mathematics, since it could be formulated in all the systems mentioned above. We first formulate the paradox in Cantor’s set theory:

Russell’s paradox: Let R = {x | x ∉ x}. Then R ∈ R ↔ R ∉ R

In 1901 Russell studied Cantor’s work [31, section 6.6.1] and, after noting that some sets belonged to themselves while the rest did not, he showed that the set of all sets which do not belong to themselves belongs to itself if it does not, and does not belong to itself if it does. Russell also expressed this paradox in terms of predicates, and in that form first presented his discovery in a letter to Frege (see [93, page 124] and the quote on page 78).
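The predicate form of the paradox can be imitated in Python by letting functions stand for predicates, so that self-application becomes a function applied to itself. This is only an analogy under our own assumptions (the names `russell` and `evaluate_self_application` are hypothetical): Python is not a logical system, and an endless recursion merely mirrors the fact that R(R) cannot consistently be given either truth value. Self-application as such is harmless, as the call with `callable` shows.

```python
def russell(x) -> bool:
    """R(x) := not x(x): 'x is a predicate that does not apply to itself'."""
    return not x(x)


def evaluate_self_application(pred):
    """Try to compute pred(pred); return None if evaluation never settles."""
    try:
        return pred(pred)
    except RecursionError:
        return None   # russell(russell) would have to equal not russell(russell)


print(evaluate_self_application(callable))  # True: harmless self-application
print(evaluate_self_application(russell))   # None: R(R) has no stable value
```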

Since Peano’s system was based on the set theory of Cantor, Peano’s work also contained the paradox. In Frege’s work (Grundgesetze der Arithmetik) self-application was not possible, so R ∈ R was not allowed, but the paradox could still be expressed using Frege’s notion (see page 77) of the course-of-values of a function. If we define equal course-of-values cov by cov(f) = cov(g) ↔ (∀a :: f(a) = g(a)), we can derive the paradox in Frege’s work as follows (see also [86, page 7] for a slightly different proof):

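Define f(x) := ¬(∀ϕ :: cov(ϕ) = x → ϕ(x)), and let K := cov(f).

¬f(K)

≡ {def. f}

¬¬(∀ϕ :: cov(ϕ) = K → ϕ(K))

≡ {elim. ¬¬}

(∀ϕ :: cov(ϕ) = K → ϕ(K))

⇒ {instantiate ϕ with f}

cov(f) = K → f(K)

≡ {def. K, so cov(f) = K holds; elim. →}

f(K)

Hence ¬f(K) ⇒ f(K), so f(K) must hold. But f(K) states that there is some ϕ with cov(ϕ) = K and ¬ϕ(K). Since cov(ϕ) = K = cov(f), the definition of equal course-of-values gives ϕ(a) = f(a) for all a, so in particular ¬f(K). Together this yields f(K) ∧ ¬f(K), a contradiction.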
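The construction of f and K in this derivation can be replayed mechanically in a finite model. The sketch below takes an arbitrary assignment cov from predicates to objects of a small domain, builds f and K exactly as defined in the derivation, and finds a predicate ϕ with cov(ϕ) = cov(f) but ϕ ≠ f. Such a ϕ violates Frege’s law cov(f) = cov(g) ↔ (∀a :: f(a) = g(a)). The encoding of predicates as tuples of truth values and all names are our own assumptions. A finite model can never satisfy that law anyway (there are 2ⁿ predicates but only n objects), so this only illustrates that the text’s f and K always yield a counterexample.

```python
from itertools import product


def predicates(domain):
    """All predicates on a finite domain, as tuples of truth values."""
    return list(product([False, True], repeat=len(domain)))


def russell_witness(domain, cov):
    """Given any assignment cov: predicate -> object, follow the derivation
    and return (phi, f) with cov(phi) == cov(f) but phi != f."""
    preds = predicates(domain)
    pos = {x: i for i, x in enumerate(domain)}
    # f(x) := ¬(∀ϕ :: cov(ϕ) = x → ϕ(x)): some ϕ with cov(ϕ) = x is false at x
    f = tuple(any(cov[phi] == x and not phi[pos[x]] for phi in preds)
              for x in domain)
    K = cov[f]
    # The derivation shows f(K) holds, so such a ϕ exists; it differs from f at K.
    phi = next(p for p in preds if cov[p] == K and not p[pos[K]])
    return phi, f


domain = (0, 1, 2)
preds = predicates(domain)
ok = True
for values in product(domain, repeat=len(preds)):   # every possible cov
    cov = dict(zip(preds, values))
    phi, f = russell_witness(domain, cov)
    ok = ok and cov[phi] == cov[f] and phi != f
print(ok)   # True: no cov satisfies cov(f) = cov(g) → f = g
```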

The paradox had a big influence, since it could be formulated in all these systems, and from a contradiction every statement of such a system can be derived.

In the eyes of many mathematicians (e.g. Hilbert, Brouwer) it therefore appeared that no proof could be trusted once it was discovered that the logic underlying all mathematics was inconsistent. Russell’s paradox arises as a result of naive set-theory’s so-called unrestricted or naive comprehension axiom (see page 16). Cantor created this axiom with the intuition that any coherent condition may be used to determine a set. But that means that the condition ϕ that determines a set V = {x | ϕ(x)} may depend on the whole set V , i.e. it allows impredicative definitions (see below for the definition of impredicative). Most attempts at resolving Russell’s paradox have therefore concentrated on various ways of restricting or abandoning this axiom.

Before we consider the consequences of the discovery of the paradox, we first take a further look at the nature of the paradox, following Russell’s own analysis. While Russell was writing ‘The Principles’, his attention was attracted by what is now known as Cantor’s paradox and (according to a letter he wrote to the French mathematician Jourdain) he found that there was something wrong with his earlier refutation of Cantor’s paradox (see [29, section 7]). He removed his earlier refutation from ‘The Principles’, and his revised diagnosis uncovered a true paradox. As we have already seen, he summarized this discovery and the reasoning that led to it in a second letter to Frege.

After discovering his famous paradox, Russell traced the paradox back to what he called the ‘vicious circle principle’. The ‘vicious circle’ after which his principle is named arises from the assumption that a set of objects may contain members which can only be defined by means of the set as a whole. Therefore, Russell said that statements are illegitimate and meaningless if they involve a set of objects containing members which presuppose this (total or whole) set of objects. That means a statement is only legitimate if all propositions it contains refer to already defined sets.

Definition of impredicative: A definition is impredicative if it involves a set V that has a member v ∈ V whose definition depends on V.1

1Note that a direct implementation of this definition as a new axiom of set theory is not possible. We might rephrase the definition as ‘whatever set contains an apparent element, that element must not be dependent on that set’. This might be implemented by fixing ‘an apparent element’ of a set and then expressing its independence of the other elements of that set. This independence means that, regardless of the nature of the elements of the set, the nature of the apparent element remains the same. The ‘nature’ of the elements can be seen as all the members of that element (or, in case the element is an individual, the nature of the apparent element can be seen as that individual). This leads us to the following axiom: (∀X :: (∀x : x ∈ X : x = a → (∀x : x ∈ X ∧ x = x : x = b(x) → a ∈ X))). Clearly this does not avoid the paradox of Russell. We consider a set X := R = {x | x ∉ x} and an element x ∈ R, i.e. we have x ∉ x. Despite the fact that the set X is ‘too large’, the axiom does not prohibit the existence of the set X. The axiom tells us x = a → (∀x : x ∈ R ∧ x = x : x = b(x) → a ∈ R). In other words, we can change each element in R except x, and the nature of x should not depend on it. The only thing we know about x is that x ∉ x and x ∈ R. So to obtain a contradiction we have to show that x ∈ x ∨ x ∉ R. Now we can change all x into any value b(x), but still we will have x ∉ x and x ∈ R. So unfortunately this most ‘direct’ attempt to solve the paradox fails.

In a sense, impredicative definitions are thus circular, and they were considered the cause of the paradoxes. For more information about impredicativity, see [57, section 15.3].

Definition of Vicious Circle Principle2: Definitions, assumptions or statements involving all of a set must not be a part or an element of that set. In other words, impredicative definitions should be avoided.

In terms of set theory we can formulate the principle as: no set V is allowed to contain members v definable only in terms of V, or members v involving or presupposing V.

Vicious circle fallacies are arguments that are condemned by the vicious circle principle. Such arguments do not necessarily lead to false conclusions (since fallacious arguments can lead to true conclusions).

In Principia Mathematica (see [31, section 7.2]), Russell assembles a collection of seven different paradoxes, all of which are based on the same circular type of reasoning, and then resolves them by making their circularity explicit. We will now mention eight of the most well-known paradoxes, most of which stem from a violation of the vicious circle principle.

2Russell formulated it originally as ‘Whatever involves all of a collection must not be one of the collection’. Or, as formulated in [49, page 113]: ‘If, provided a certain collection had a total, it would have members only definable in terms of that total, then the said collection has no total’. Another formulation of [87] says ‘No entity can be defined in terms of a totality of which it is itself a possible member’.

1 Russell’s paradox (1903), which we have discussed in this section. The impredicativity is clear in the definition of the set that contains all sets that are not members of themselves. There are many popularizations of this paradox; one of them is from Russell himself (1919) and concerns the plight of the barber of a certain village who has enunciated the principle that he shaves all and only those persons of the village who do not shave themselves. The paradox is then formed by the question ‘Does the barber shave himself?’.

2 Burali-Forti’s paradox (1897), which we have discussed in section 3.8.2. The impredicativity comes from the ordinal number of the naturally ordered set of all ordinal numbers.

3 Cantor’s paradox, which we have discussed in section 3.8.1. The im- predicativity comes from the cardinal number of the set of all sets.

4 The liar’s paradox: We quote from [49, page 127]: “If a man says ‘I am lying’, his utterance is self-contradictory, and it cannot be either true or false. The oldest form of this particular paradox, in the words of Principia Mathematica, is that of Epimenides the Cretan, ‘who said that all Cretans were liars, and all other statements made by Cretans were certainly lies’.”.

5 Richard’s paradox: The French schoolteacher Jules Richard (1862-1956) published a paradox in [74] in 1905. He considered the set V of all non-terminating decimals that can be defined in a finite number of words. By arranging V as a sequence and applying Cantor’s diagonal argument to the members of V, he produced a non-terminating decimal that differs from every member of V, although it is itself defined in a finite number of words and should therefore belong to V.

6 Paradox of definitions. Again we quote from [49]: “The possible definitions of specific ordinal numbers can be arranged in a sequence, and there are therefore at most ℵ0 of them. But the totality of ordinal numbers is not denumerable, and so there exist ordinal numbers which cannot be individually defined. Among such indefinable ordinals there is a least, and thus it appears that the description ‘the least indefinable ordinal’ yields a definition of an entity that cannot be defined.”.

7 Berry’s paradox: “The least integer not nameable in fewer than nineteen syllables” is itself a name that contains only eighteen syllables.

8 The Grelling-Nelson paradox: The German philosopher Kurt Grelling (1886-1942) published a paradox in 1908 together with his friend Leonard Nelson (1882-1927). As described in [31, page 336]: “Some words can be predicated of themselves: in English, ‘word’ is a word, ‘noun’ is a noun, and so on. This property is called ‘autological’, and is obviously itself autological. Other English words are not autological; ‘German’, say, or ‘verb’. They are called ‘heterological’ - but this word is heterological if and only if it is not so.”.
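The diagonal step used in Richard’s paradox (item 5) can be illustrated with a short Python sketch. The modelling choices are our own assumptions: a decimal 0.d0d1d2... is represented as a function from digit positions to digits, only finitely many decimals are listed, and the function name `diagonal` is hypothetical. Changing the n-th digit of the n-th listed decimal produces a decimal that differs from every decimal in the list.

```python
from typing import Callable, List

Decimal = Callable[[int], int]  # digit position n (0-based) -> digit of 0.d0 d1 d2 ...


def diagonal(listed: List[Decimal]) -> Decimal:
    """Return a decimal whose n-th digit differs from the n-th digit of listed[n].

    Only the digits 5 and 6 are used, so the result has no second expansion
    (as 0.0999... = 0.1000... does) and really differs from every listed decimal.
    """
    def d(n: int) -> int:
        if n < len(listed) and listed[n](n) == 5:
            return 6
        return 5
    return d


def digits(x: Decimal, k: int) -> List[int]:
    """The first k digits of x."""
    return [x(n) for n in range(k)]


listed = [lambda n: 3,          # 0.333...
          lambda n: 1 - n % 2,  # 0.1010...
          lambda n: 5]          # 0.555...
d = diagonal(listed)
print(digits(d, 5))  # [5, 5, 6, 5, 5]
```

In Richard’s setting the listed decimals are those definable in finitely many words, and the few lines defining the diagonal decimal are themselves such a definition; this is where the paradox arises.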

The first three paradoxes are logical paradoxes that can be formulated within Cantor’s set theory. The remaining five are mainly paradoxes of naming; they are of a semantic kind. All these paradoxes have stimulated fundamental research, especially Russell’s paradox, which revealed the vicious circle principle and first showed the need for a theory of types or some other restriction of the power of the comprehension axiom.

5.2 Consequences and philosophies

Perhaps the greatest paradox of all is that there are paradoxes in mathematics.

- E. Kasner and J. Newman quoted in [46]

The various proposals to overcome this paradox led to various philosophies of mathematics. One proposal was to reconstruct set theory on an axiomatic basis (this axiomatic method was first suggested by Hilbert, see section 6.1) sufficiently restrictive to exclude the paradoxes. Hilbert and other formalists had the basic idea to allow the use of only well-defined and finitely constructible objects, together with rules of inference that were deemed to be absolutely certain.

In 1908 the mathematician Zermelo made the first attempt to formulate proper axioms for set theory such that the paradox is not deducible, but most other parts of set theory are. This attempt was successful and, after a refinement by the mathematician Fraenkel, led to the ZF axiom system (see section 5.3), which is still the most accepted basis today. Subsequent refinements to ZF have been made by Skolem, and later by the three mathematicians von Neumann, Bernays and Gödel (see section 8.5). Russell’s own response to the paradox came with the introduction of his theory of types in his Principia Mathematica (see chapter 7). Russell had already laid out a first version of his theory to eliminate the paradoxes in 1908. Since self-application (R ∈ R) caused a contradiction, he decided to suppress it. With this approach he assigned types to variables (as types he took sets) and allowed expressions such as x ∈ y only if the type of x is one less (in some order) than the type of y. The outlawing of impredicative definitions seemed a solution to the known paradoxes in set theory. But it turned out that there are essential and accepted parts of mathematics that contain impredicative definitions. This was a serious problem for Russell’s solution, despite the fact that many instances of impredicative definitions in mathematics could be circumvented. We quote from [22, page 265]: “In 1918, the German mathematician Hermann Weyl (1885-1955) tried to construct as much of analysis as possible from the natural number system without the use of impredicative definitions. Although he succeeded in obtaining a considerable part of analysis, he was unable to derive the important theorem that every nonempty set of real numbers having an upper bound has a least upper bound”. Other attempts towards a solution for the paradoxes focus on the foundations of logic.
Luitzen Brouwer and the intuitionists took this approach and tried to prevent the paradoxes by denying the principle of the excluded middle (which states that any mathematical statement is either true or false). Brouwer first attacked the logical foundations of mathematics in his doctoral thesis in 1907; this formed the beginning of the Intuitionist School. The intuitionists had the basic idea that one cannot assert the existence of a mathematical object unless one can also indicate how to go about constructing it.

In the period after the discovery of the paradoxes, we distinguish three main philosophies of mathematics: logicism, intuitionism and formalism.

What is Logicism? A school of mathematical thought which holds the thesis that mathematics is a part of (or a branch of) logic.

Logicists contend that all of mathematics can be deduced from pure logic, without the use of any specifically mathematical concepts, such as number or set. The first ideas date back to Leibniz (1666), and the actual reduction of mathematics to logic was started by Dedekind (1888) and Frege (1884-1903) and later continued by Peano, and by Whitehead and Russell (in Principia Mathematica, 1910-1913).

What is Intuitionism? A school of mathematical thought introduced by the 20th-century Dutch mathematician L.E.J. Brouwer (1881-1966) that contends that the primary objects of mathematical discourse are mental constructions governed by self-evident laws.

Intuitionists have challenged many of the oldest principles of mathematics as being non-constructive (and hence meaningless). They proposed that a proof in mathematics should be accepted only if it constructed the mathematical entity it talked about, and not if it merely showed that the entity ‘could’ be constructed or that supposing its non-existence would result in a contradiction.

Brouwer had the fundamental insight that such nonconstructive arguments will be avoided if one abandons a principle of classical logic (which lies, for example, behind one of De Morgan’s laws). This is the principle of the excluded third (or excluded middle), which asserts that for every proposition ϕ, either ϕ or ¬ϕ; or equivalently that, for every ϕ, ¬¬ϕ implies ϕ. This principle is basic to classical logic and had already been enunciated by Aristotle, though with some reservations, as he pointed out that the statement “there will be a sea battle tomorrow” is neither true nor false.
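The difference between the two readings can be made concrete in a proof assistant whose logic, without its classical axioms, is intuitionistic. The following Lean 4 sketch is our own illustration, and the theorem names are hypothetical: the double negation of the excluded middle has a purely constructive proof, whereas the excluded middle itself is obtained only by invoking the classical axiom `Classical.em` from Lean’s core library.

```lean
-- Constructive: no classical axiom is used.
-- h : ¬(p ∨ ¬p); from h we build a proof of ¬p, hence of p ∨ ¬p, and refute h.
theorem not_not_excluded_middle (p : Prop) : ¬¬(p ∨ ¬p) :=
  fun h => h (Or.inr (fun hp => h (Or.inl hp)))

-- Classical: the excluded middle itself needs Lean's classical axiom.
theorem excluded_middle_classical (p : Prop) : p ∨ ¬p :=
  Classical.em p
```

In a Lean 4 installation, `#print axioms not_not_excluded_middle` should report no axioms, while `#print axioms excluded_middle_classical` should list `Classical.choice` among its dependencies.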

Because of the weight it places on mental apprehension through construction of purported mathematical entities, intuitionism is sometimes also called constructivism. A still more severe form of constructivism which we will not further discuss is strict finitism, in which one rejects infinite sets. More information on intuitionism can be found in [60].

What is Formalism? A school of mathematical thought introduced by the 20th-century mathematician David Hilbert, which holds that all mathematics can be reduced to rules for manipulating formulas without any reference to the meanings of the formulas.

Formalists contend that it is the mathematical symbols themselves, and not any meaning that might be ascribed to them, that are the basic objects of mathematical thought. Hilbert’s program, called formalism, was to concentrate on the syntax of mathematics and to study its consistency. A statement should be a theorem, that is, provable within the syntax of mathematics.

These three philosophies do not necessarily contradict each other, and all of them are still advocated today. Whether the logicist thesis has been established seems to be a matter of opinion. Though successful, it can be questioned on the ground that the systematic development of logic presupposes mathematical ideas in its formulation. The intuitionists succeeded in rebuilding large parts of present-day mathematics, but a large part is still wanting, making intuitionist mathematics less powerful and in many respects much more complicated than classical mathematics. These are serious objections to the intuitionistic approach, but it is generally conceded that its methods do not lead to contradictions, and some hope for a new intuitionist reconstruction of mathematics carried out in a different and more successful way. Unfortunately for the formalists, a consequence of Gödel’s incompleteness theorem (see chapter 8) is that the consistency of mathematics can be proved only in a language which is stronger than the language of mathematics itself. Yet formalism is not dead: most pure mathematicians are tacit formalists, but the naive attempt to prove the consistency of mathematics in a weaker system had to be abandoned. From [11, item from ] we learn that most mathematicians of all three philosophies are also philosophical realists: “While no one, except an extremist intuitionist, will deny the importance of the language of mathematics, most mathematicians are also philosophical realists who believe that the words of this language denote entities in the real world. Following the Swiss mathematician Paul Bernays (1888-1977), this position is also called Platonism, since Plato believed that mathematical entities really exist.”. For more information about realism, see [57].

5.3 Zermelo Fraenkel

5.3.1 Axiomatic set theory

After the discovery of Russell’s paradox, it became clear that set theory needed a new and more rigorous basis. Hilbert’s proof theory, which will be treated in more detail in section 6.1, offered a way to put set theory on firm and hopefully consistent grounds. The so-called calculus was a first formalization of Cantor’s set theory, but it lacked the precision of Hilbert’s later theories and was inconsistent because it still contained the (naive) comprehension principle in some form (see page 16). The first real axiomatization of set theory was given in 1908 by the German mathematician Ernst Zermelo in [101]. The attitude adopted in his axiomatic development of set theory is that it is not necessary to know what ‘sets’ are, what the ‘things’ are that are their elements, or what the ‘membership relation’ means [49, see page 288, paragraph 1]. Zermelo instead postulated a domain B of abstract objects and represented the elements or ‘things’ of this domain by the letters a, b, c, .... He then defined the primitive notions of equality and membership: a = b states that ‘a’ and ‘b’ designate the same ‘thing’; a ∈ b is defined on the domain B, and if a ∈ b holds, we call b a set and a an element of this set. Thus some, but not necessarily all, objects of B are sets. The assumptions adopted about these notions are called the axioms of the theory. Its theorems are the axioms together with the statements that can be deduced from the axioms using the rules of inference (see also chapter 6), for example by a system of logic. Criteria for the choice of axioms have been identified by several people (see Hilbert’s theory in chapter 6, or [49, last of page 287]). The most accepted criteria (more formally defined in chapter 6) include:

1. Consistency of the system (it should be impossible to derive both a statement and its negation; in other words, the paradoxes should be avoided).

2. Plausibility (the axioms should be in accord with intuitive beliefs about sets, see [60]).

3. Completeness (richness of the theory: the desirable results of Cantorian set theory ought to be derivable as theorems).

In the next subsection we present the set of axioms that Zermelo chose and that formed the basis for all future axiomatizations of set theory (see also section 8.5).

5.3.2 Zermelo Fraenkel (ZF) Axioms

Zermelo formulated his axiomatic system in 1908; Fraenkel’s extensions date from 1922. In the same year (1922) the Norwegian mathematician Thoralf Skolem (1887-1963) proposed a formal language for formulating the theory.

Zermelo noted that the sets involved in a derivation of the paradoxes are very large³ (for Cantor’s paradox it is the set of all sets (see section 3.8.1), for Russell’s paradox it is the set of all sets which are not members of themselves (see section 5.1), and for the Burali-Forti paradox (see section 3.8.2) it is the set of all well-orderings). Therefore he wanted to restrict the size of sets, and he changed the (naive) comprehension principle into his separation axiom, such that the paradox could no longer be derived:

Separation Axiom: (∀z∃y∀x :: (x ∈ y ↔ x ∈ z ∧ ϕ(x))) For every set z and definite4 property ϕ of sets there exists a set whose ele- ments are exactly those of z having the property ϕ.

There are also certain limitations on the property ϕ (i.e. it should be definite) that we will mention later in section 8.5. We show that the standard derivation of Russell’s paradox cannot be applied when the naive comprehension axiom is replaced by the separation axiom.

Let R = {x | x ∈ Z ∧ x ∉ x}. Then

R ∈ R ↔ R ∈ Z ∧ R ∉ R
      → R ∉ R, contradiction.

R ∉ R ↔ R ∉ Z ∨ R ∈ R
      ← R ∉ Z

³ The term proper class is sometimes used to refer to these ‘excessively large’ sets; all other sets are then referred to as improper classes. This means all sets are classes, but not every class is a set. A class that is not a set is called a proper class.
⁴ See section 8.5 for the definition of the concept of definiteness.

In both equations above we can only conclude that R ∈ R ↔ R ∉ R if we know that R ∈ Z. Since R ∈ Z cannot be derived (on the contrary, the two lines together yield R ∉ Z), Russell’s derivation of his paradox does not apply.

However, this fact alone does not guarantee that no paradox exists, as claimed in some articles; it merely shows that the separation axiom does not permit the construction of paradoxical sets with elements defined in terms of the sets themselves. Until consistency is proved, there might be other, less obvious ways to construct a paradox.
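The argument above can be checked mechanically on small examples. The sketch below uses a toy model of our own (it is not Zermelo’s construction): a finite universe given as a dictionary that maps each object to the frozenset of its elements, so that self-membership is allowed. The function names are hypothetical. Separating R = {x ∈ Z | x ∉ x} from any Z never yields the element set of an object that belongs to Z.

```python
import random
from collections.abc import Callable, Hashable


def separate(members: dict, z: Hashable, prop: Callable[[Hashable], bool]) -> frozenset:
    """Separation: the elements of z that have the property prop."""
    return frozenset(x for x in members[z] if prop(x))


def russell_part(members: dict, z: Hashable) -> frozenset:
    """R = {x ∈ z | x ∉ x}; self-membership is allowed in this toy model."""
    return separate(members, z, lambda x: x not in members[x])


def r_outside_z(members: dict, z: Hashable) -> bool:
    """True if no element of z has exactly the elements of R."""
    r = russell_part(members, z)
    return all(members[w] != r for w in members[z])


def random_universe(n: int, p: float, seed: int) -> dict:
    """A random membership relation on the objects 0, ..., n-1."""
    rng = random.Random(seed)
    return {a: frozenset(b for b in range(n) if rng.random() < p) for a in range(n)}


# a = {a} contains itself, b = {} does not, z = {a, b}
example = {"a": frozenset({"a"}), "b": frozenset(), "z": frozenset({"a", "b"})}
print(russell_part(example, "z"))  # frozenset({'b'})
print(r_outside_z(example, "z"))   # True
```

Whatever universe is chosen, `r_outside_z` cannot return False: if some w ∈ Z had exactly the elements of R, then w ∈ R ↔ w ∉ w ↔ w ∉ R.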

We now give all of the ZF axioms that constitute set theory. The first seven axioms are those that were originally formulated by Zermelo. Axioms 8 and 9 were later added by Fraenkel and von Neumann respectively. The axioms 1 through 8 are the original set of the Zermelo-Fraenkel axioms.

In the definitions below we use several shorthand notations. If we wish, however, we can express these definitions in full detail, such that the notation of each expression does not depend on previous axioms. For example, in axiom 8 we use the quantifier ∃! to denote that there is exactly one y, in axiom 9 we use the symbols ∩ and ∅, and in axiom 6 we use ⊆, writing x ⊆ z as a shorthand for (∀y :: y ∈ x → y ∈ z). The separation and substitution axioms are actually axiom schemes.

The Zermelo-Fraenkel axioms:

1. Extensionality axiom (or axiom of determination): (∀x, y, z :: (z ∈ x ↔ z ∈ y) → x = y) Sets are uniquely determined by their members, or to be exact: if every element of a set x is at the same time an element of y, and conversely, then x = y.

2. Axiom of the empty set: (∃x∀y :: y ∉ x) There is an (improper, see also the footnote on page 93) set, the ‘null’ or ‘empty’ set, which contains no elements at all.

3. Separation axiom: (∀z∃y∀x :: x ∈ y ↔ x ∈ z ∧ ϕ(x)), ϕ is definite and does not contain y. For every set z there exists a set y whose elements are exactly those of z having the property ϕ.

4. Pairing axiom: (∀a, b :: (∃y∀x :: x ∈ y ↔ x = a ∨ x = b)) Given two sets a and b there exists a set whose elements are exactly a and b.

5. Sum-set axiom or Union axiom: (∀z∃y∀x :: x ∈ y ↔ (∃w :: w ∈ z ∧ x ∈ w)) For every set z there exists a set y whose elements are exactly those objects occurring in at least one element of z.

6. Power set axiom: (∀z∃y∀x :: x ∈ y ↔ x ⊆ z) For every set z there is a set y whose elements are exactly the subsets of z.

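7. Axiom of infinity: (∃z :: ∅ ∈ z ∧ (∀a : a ∈ z : {a} ∈ z)) There exists a successor set, i.e. a set that contains ∅ and, with every element a, also {a}. (This is Zermelo’s original form; the now more common form uses the successor a ∪ {a} instead of {a}.)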

8. Axiom of replacement or axiom of substitution (by Fraenkel): (∀x∃!y :: ϕ(x, y)) → (∀a :: (∃b∀y :: y ∈ b ↔ (∃x : x ∈ a : ϕ(x, y)))) The image of a set under an operation ϕ (functional property) is again a set.

9. Axiom of foundation or axiom of regularity (by von Neumann): (∀a :: a ≠ ∅ → (∃b :: b ∈ a ∧ b ∩ a = ∅)) Every non-empty set is disjoint from at least one of its elements.
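The finite sets obtainable from axioms 2, 4, 5 and 6, and the first elements of a successor set from axiom 7, can be imitated with Python’s immutable `frozenset`, whose equality is extensional as axiom 1 demands. This is a sketch under that assumption, and all helper names are hypothetical. It shows Zermelo’s successor a ↦ {a} next to von Neumann’s a ↦ a ∪ {a}; every set produced this way is finite.

```python
from itertools import combinations

EMPTY = frozenset()  # axiom 2: the empty set


def pair(a, b):
    """Axiom 4: the set whose elements are exactly a and b."""
    return frozenset({a, b})


def union(z):
    """Axiom 5: the set of all objects occurring in some element of z."""
    return frozenset(x for w in z for x in w)


def power(z):
    """Axiom 6: the set of all subsets of z."""
    elems = list(z)
    return frozenset(frozenset(c)
                     for k in range(len(elems) + 1)
                     for c in combinations(elems, k))


def zermelo_succ(a):
    """Successor step of axiom 7 as stated above: a -> {a}."""
    return frozenset({a})


def von_neumann_succ(a):
    """The now more common successor step: a -> a ∪ {a}."""
    return a | {a}


def first_elements(succ, n):
    """The first n elements of the smallest successor set built with succ."""
    out = [EMPTY]
    while len(out) < n:
        out.append(succ(out[-1]))
    return out


nat = first_elements(von_neumann_succ, 4)                  # 0, 1, 2, 3
print([len(x) for x in nat])                               # [0, 1, 2, 3]
print([len(x) for x in first_elements(zermelo_succ, 4)])   # [0, 1, 1, 1]
print(len(power(nat[3])))                                  # 8
print(union(pair(nat[1], nat[2])) == nat[2])               # True
```

The helpers can be applied any finite number of times, but each result stays finite; only axiom 7 provides a whole successor set at once.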

Theorem: (from [49, chapter 11]) The domain B itself (see page 92) is not a set. Proof: Suppose V is any given set. Then⁵ V has a subset W that consists of those elements of V that are not members of themselves. But then W is not an element of itself (because in that case we would have W ∈ W, while W consists of elements that are not members of themselves). But if W were an element of V − W, we would also have W ∈ W. This means that W is not a member of V. But W is certainly in B, and therefore B is not the same as V. Thus B cannot coincide with any set at all.

⁵ Since the property x ∉ x is definite. See section 8.5 for the definition of the concept of definiteness.

The theory is not complete, since many statements are independent of ZF. The following two statements, which are independent of the previous axioms, have a more dubious status (and are not part of standard ZF):

10. Axiom of choice (AC): (∀x :: (∃f : f is a function : Dom(f) = x − {∅} ∧ Ran(f) ⊆ ⋃x ∧ (∀a : a ∈ Dom(f) : f(a) ∈ a))) Every set x has a choice function.

Definition of choice function: A function f is called a choice function for the set V := Dom(f) = V − {∅} ∧ (∀v : v ∈ Dom(f) : f(v) ∈ v)

11. Generalized Continuum Hypothesis (GCH):

For any cardinal $\aleph_r$: $|\{0,1\}^{\aleph_r}| = \aleph_{r+1}$

In 1908 Felix Hausdorff proposed this generalization of CH. Another formulation of this axiom and more information are given in section 3.6. In the remainder of this section, we will give a short explanation of the nature of the other axioms. For more detailed information, we refer to section 8.5 and to the rich literature on set theory that is available (for example [17], [24], [49, chapter 11], [28]).
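Returning to axiom 10: for a finite family of finite, non-empty sets, a choice function can simply be written down, which is why the axiom is significant only for infinite families. A minimal Python sketch, assuming that the elements of each member are comparable so that the least one can be chosen (the function name `choice_function` is hypothetical):

```python
def choice_function(family):
    """Map every non-empty member v of the family to an element f(v) ∈ v.

    Taking the least element works because each member is finite and its
    elements are comparable; no such explicit rule exists in general.
    """
    return {v: min(v) for v in family if v}


family = {frozenset({3, 1}), frozenset({7}), frozenset()}
f = choice_function(family)
print(f[frozenset({3, 1})])  # 1
```

As in the definition of a choice function, the domain of `f` is the family with the empty set removed.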

The axioms are not minimal. For example, as we have already seen in section 2.2⁶, the axiom of the empty set can be deduced from the separation axiom. We also have empty set axiom + substitution axiom ⊢ separation axiom. We have also seen in section 2.2 how we can define basic operations with the extensionality and separation axioms. The pairing, sum and powerset axioms, together with the extensionality axiom, ensure uniqueness of the pairs, sums and powersets of sets. With these axioms alone we can already create an infinite number of sets. However, each set constructed with axioms 1 to 6 only has a finite number of elements. It is the infinity axiom that we need to create infinite sets. These sets are not unique, but the smallest successor set, denoted ω, is unique. We call its elements the natural numbers. With this axiom we can now also prove the principle of induction for ω (see section 3.4.3). The substitution axiom says that whenever ϕ is a property of sets such that to every x there is exactly one y for which ϕ(x, y), and a is a set, then there exists a set whose elements are exactly those y for which an x ∈ a exists such that ϕ(x, y). The foundation axiom says that each non-empty set has ∈-minimal (epsilon-minimal) elements (see below). An implication of this axiom is that there is no function f defined on ω such that (∀i : i ∈ ω : f(i + 1) ∈ f(i)). For a motivation and analysis of the role of the foundation axiom we refer to [17, section 2.1].

⁶ The existence of the empty set in section 2.2 was actually derived from the comprehension principle, but the result can similarly be obtained from the separation axiom.

Definition of epsilon-minimal: An element b ∈ a is epsilon-minimal in a := b ∩ a = ∅
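On finite sets modelled by Python’s `frozenset`, an epsilon-minimal element can be found by direct search. This sketch rests on an assumption: such values cannot contain themselves or form ∈-cycles, so they always satisfy the foundation axiom and the search succeeds on every non-empty input. The helper name `epsilon_minimal` is hypothetical.

```python
def epsilon_minimal(a):
    """Return some b ∈ a with b ∩ a = ∅; a must be a non-empty frozenset of frozensets."""
    for b in a:
        if not (b & a):  # b shares no element with a
            return b
    raise ValueError("a has no epsilon-minimal element")


zero = frozenset()
one = frozenset({zero})
two = frozenset({zero, one})
print(epsilon_minimal(frozenset({one, two})) == one)  # True, since 1 ∩ {1, 2} = ∅
```

For {0, 1, 2} the only epsilon-minimal element is 0, since 1 and 2 both contain 0.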

Another corollary of the foundation axiom is that no set is an element of itself; in particular, there is no set which has itself as its only element. Note that to prevent the paradoxes we need the separation axiom, not the foundation axiom.

The origin of the axiom of choice was Cantor’s recognition of the importance of being able to well-order arbitrary sets, i.e., to define an ordering relation for a given set such that each nonempty subset has a least element. The virtue of a well-ordering for a set is that it offers a means of proving that a property holds for each of its elements by a process (transfinite induction) similar to mathematical induction. Zermelo (1904) gave the first proof that any set can be well-ordered. His proof employed a set-theoretic principle that he called the axiom of choice, which, shortly thereafter, was shown to be equivalent to the so-called well-ordering theorem. One form of this principle is expressed as the axiom of choice. A choice function for a set A ‘chooses’ an element from each non-empty subset in A. If x is a nonempty set whose elements are nonempty sets, then there exists a function f with domain x such that f(a) ∈ a for every member a of x. For a more detailed discussion of the axiom of choice we refer to [17, section 2.9]. Intuitively, the axiom asserts the possibility of making a simultaneous choice of an element in every nonempty member of any set; this guarantee accounts for its name. The assumption is significant only when the set has infinitely many members. Zermelo was the first to state the axiom explicitly, although it had been used, essentially unnoticed, earlier. It soon became the subject of vigorous controversy because of its nonconstructive nature. There are a few mathematicians who feel that the use of the axiom of choice is improper, but to the vast majority it, or an equivalent assertion, has become an indispensable and commonplace tool. For this discussion of the axiom of choice we have used [63], [77] and [11]. A discussion of the Generalized Continuum Hypothesis can be found in section 3.7.

Chapter 6

Hilbert

The further a mathematical theory is developed, the more harmoniously and uniformly does its construction proceed, and unsuspected relations are disclosed between hitherto separated branches of the science.

- Hilbert, quoted in [76]

David Hilbert (1862-1943) was a German mathematician who reduced geometry to a series of axioms and contributed substantially to the establishment of the formalistic foundations of mathematics. His first work was on invariant theory, and in 1888 he proved his famous Basis theorem (see [5]). After that he did significant work in the field of algebraic number theory, and published his ‘Zahlbericht’, or ‘Report on the theory of numbers’, in 1897. In 1899 he published the ‘Grundlagen der Geometrie’ (which appeared in English as ‘The Foundations of Geometry’ in 1902), which contained (see [31, section 4.7.2]) what would become a widely accepted set of 21 axioms for Euclidean geometry and an analysis of their significance. The axiomatic method that Hilbert used (for geometry, although its concept is more general and can be applied far beyond the domain of geometry, see also [57, section 14.7]) will be treated in section 6.1. A substantial part of Hilbert’s fame rests on a list of 23 mathematical problems he outlined in 1900 and posed as a challenge for the next century. Some of these problems were related to the foundations of mathematics (see section 6.2). In 1905 Hilbert attempted to lay a firm foundation of mathematics by proving its consistency, resulting in the two volumes of ‘Grundlagen der Mathematik’ that were intended to lead to a proof theory. Although in 1931 Kurt Gödel showed this goal to be unattainable (see chapter 8), the work Hilbert had done on the foundations of mathematics nevertheless remained influential in the development of logic. Hilbert’s work on integral equations in about 1909 (see [45]) led to research in functional analysis and established the basis for his work on infinite-dimensional space, later called Hilbert space (see [22, page 232]). When Hilbert was made an honorary citizen of Königsberg he gave an address which ended with six famous words, showing his enthusiasm for mathematics and his optimism about solving mathematical problems: “There are absolutely no unsolvable problems. Instead of the foolish ignorabimus [Latin for ‘we shall not know’], our answer is on the contrary: Wir müssen wissen, Wir werden wissen” [We must know, We shall know].

6.1 Hilbert’s proof theory

Hilbert formalized mathematical theories in order to turn them into well-defined objects of discussion, thus making possible the new kind of investigation to which he gave the new name meta-mathematics. Hilbert was the first to emphasize that strict formalization of a theory involves the total abstraction from meaning, the result being called a formal system or formalism. In its structure, a formalized theory is no longer a system of meaningful propositions but one of sentences as sequences of words, which in turn are sequences of letters (a symbolic language). Hilbert’s method of making the formal system as a whole the object of mathematical study is called metamathematics or proof theory.

What is metamathematics? The study of mathematics itself (with respect to formalized mathematical systems, metamathematics thus consists of statements about the signs and formulas occurring within axiomatic systems). One of the primary goals of metamathematics is to determine the nature of mathematical reasoning.

After Hilbert presented an axiomatic development of geometry in ‘Grundlagen der Geometrie’ (1899), he devoted himself to the much greater task of applying his new metamathematical method to pure mathematics as a whole. Or, as Hilbert wrote in 1917: “Since the examination of the consistency is a task that cannot be avoided, it appears necessary to axiomatize logic itself and to prove that number theory and set theory are only parts of logic”. Hilbert took a formal(istic) approach to achieve this logistic goal (logicism is the study that uses logic as the basis of mathematics, and formalists attempted to axiomatize mathematics successfully; see also the philosophies in section 5.2). To this end Hilbert identified three properties that an axiomatic system should have: it should be decidable, complete and consistent. In order to define these notions, we first have to make some other concepts precise.

Definition of an axiom: A proposition that is regarded as true without proof

Definition of free variable: A variable that is not bound within the scope of a quantifier

An axiom that does not contain any variables is also called an axiom statement; an axiom with free variables is called an axiom scheme, and each free variable is to be quantified over all well-formed formulas.

Definition of statement (or sentence): A well-formed formula with no free variables

Of the systems that Hilbert’s proof theory applies to, we here consider those susceptible to Gödel’s incompleteness theorem (which will be presented in chapter 8).

Definition of an STGA language: A language¹ L is Susceptible to Gödel’s argument (STGA) if it consists of:

1 E, a denumerable set of (well-formed) expressions (also called formulas) of L

2 S ⊆ E, sentences of L (i.e. with no free variables)

3 P ⊆ S, provable sentences of L

4 R ⊆ S, refutable sentences of L

5 H ⊆ E, predicates of L (i.e. with free variables, H ∩ S = ∅). For convenience, we here assume predicates to have exactly one variable.

6 A function ϕ : E × N → E; ϕ assigns to every expression E ∈ E and every n ∈ N an expression E(n), such that for every H ∈ H and every n ∈ N, H(n) is a sentence (H(n) ∈ E, hence H(n) ∈ S). We can think of such a function ϕ as a substitution function. Informally, the sentence H(n) expresses the proposition that the number n belongs to the set defined by H.

The following set is the only one that depends on the semantics of the expressions, and it is normally determined by a model that we accept as representing the truth. The model should be distinguished from the set of derivation rules that (syntactically or mechanically) determine whether sentences are provable or refutable. It is important to realize that the truth of a sentence is not the same as the provability of that sentence.

¹ Sometimes also called system, since it not only defines a language but also includes the (dis)provability and truth of expressions.

7 T⊆S, true sentences of L. This set can be determined by a model (see page 107)

First, we give an intuitive explanation of this definition. In most parts of mathematics, not every sequence of symbols is meaningful or useful. Therefore we only consider the so-called well-formed formulas E. Some of these formulas (also called propositions) do not contain free variables; we name them sentences (S). Some of them are provable from the axiomatic system (i.e. they can be derived from the axioms and derivation rules of the axiomatic system) and are elements of P. Others are refutable, also called disprovable (i.e. their negation can be derived from the axioms and derivation rules of the axiomatic system), and are elements of R. These notions only depend on whether the sentence is derivable from the axiomatic system and are independent of the truth of the sentence. We call the set of true sentences T (the other sentences are false). Other formulas have free variables, i.e. they behave like functions of those variables. We call them predicates (H). We also assume there exists a function ϕ that assigns to every expression H ∈ H and natural number n a sentence H(n).
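To make the roles of E, S, H, ϕ and T more tangible, here is a minimal Python sketch of a toy language of our own (it is not a language discussed in the text): expressions are strings such as "x+x=4", the only variable is the letter x, the substitution function replaces x by a numeral, and the model that determines T is ordinary addition of natural numbers. All function names are hypothetical.

```python
def free_variables(expr: str) -> set:
    """In this toy language the only variable is the letter 'x'."""
    return {"x"} if "x" in expr else set()


def is_sentence(expr: str) -> bool:
    """A sentence is an expression without free variables."""
    return not free_variables(expr)


def substitute(expr: str, n: int) -> str:
    """Toy version of the substitution function: replace x by the numeral for n."""
    return expr.replace("x", str(n))


def true_in_model(sentence: str) -> bool:
    """Toy model: 'a+b+...=c+...' is true iff both sums are equal."""
    left, right = sentence.split("=")

    def value(side: str) -> int:
        return sum(int(term) for term in side.split("+"))

    return value(left) == value(right)


H = "x+x=4"                    # a predicate with one free variable
print(is_sentence(H))          # False
print(substitute(H, 2))        # 2+2=4
print([n for n in range(10) if true_in_model(substitute(H, n))])  # [2]
```

Here the predicate H defines the set {n | H(n) ∈ T} = {2}. Provability (P and R) is deliberately absent: it would require axioms and derivation rules, which is exactly the distinction between truth and provability made above.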

What is an Axiomatic System? An axiomatic system (sometimes also called formal axiomatic system) is a logical system that gives rise to an STGA language and has an explicitly stated finite set of axioms from which provable sentences can be derived (using a finite set of derivation rules)

The set of axioms and derivation rules determines which sentences of L are provable and which are not. The axiomatic system also contains a syntax definition that determines the well-formedness of expressions of L. Normally, the syntax definition of an axiomatic system consists of an alphabet of symbols and a set of rules. We show that this notion of an axiomatic system gives rise to a language that falls under the class of STGA languages. Such an axiomatic system A is often defined as follows:

Definition of axiomatic system: An axiomatic system A consists of:

• An alphabet Σ, consisting of a finite number of constants (with their arities) and variables.

• A recursive definition of a syntax, determining which formulas are well- formed formulas.

• An initially determined and fixed set of axioms and derivation rules (also called transformation rules or rules of inference).

The recursive definition over the given alphabet gives us the set of expressions. The variables enable us to form predicates. The set of axioms and derivation rules lets us prove or refute sentences. Ideally, we want the provable sentences to coincide with the sentences we intuitively consider true (P = T) and the refutable sentences to coincide with those we consider false. We call a system with this property correct. We now give an example of a definition of a simple axiomatic system.

Example: axiomatic system A1

• Σ = {∨², ¬¹, (⁰, )⁰, ∀², x⁰, y⁰, R0², true⁰, false⁰} The numbers that are written in superscript denote the arity of the relations; a constant or variable is a 0-ary relation.

• ϕ is a well-formed formula if it

0. is one of the constants true and false.

1. is an Ri(x1, ..., xj), with Ri a relation with arity j, and x1, ..., xj variables or constants.

2. has the form ϕ1 ∨ ϕ2, ϕ1 ∧ ϕ2, (ϕ1), ¬ϕ1 or ∀xi(ϕ1), where ϕ1 and ϕ2 are smaller formulas and xi is some variable from Σ.

• For all variables x, variables or constants c and d, and well-formed formulas ϕ, the derivation rules are:

R0(c, d) → true      ∀x(ϕ) → false

¬false → true      ¬true → false

true ∧ ϕ → ϕ      false ∧ ϕ → false

true ∨ ϕ → true      ϕ ∨ true → true

The STGA language that can be constructed on the basis of A1, denoted by L_A1², consists of the following parts:

1. E is the set of usual mathematical predicates formed by the symbols of the given alphabet (so E includes the binary relation R0).

2. S is the set of those expressions without free variables (i.e. propositions).

3. The provable sentences P are those that the derivation rules reduce to true. For example, ¬false ∧ R0(false, true) → true ∧ R0(false, true) → true ∧ true → true.

4. The refutable sentences R are those that the derivation rules reduce to false. For example, ∀y(false ∨ y) ∧ true → false ∧ true → false.

5. The predicates are those expressions with one free variable.

6. For each such predicate we can replace the free variable by a formula that is represented³ by a natural number, and obtain a proposition.

7. The definition of an axiomatic system does not include a model. If we think of the standard logic that is used in practice, we can see that all formulas except those containing the ∀-symbol are derivable if and only if they are true.

We now introduce some concepts related to STGA languages and axiomatic systems. We assume that A is an axiomatic system that gives rise to an STGA language L.

Definition of derivable: A formula ϕ is derivable in L := ϕ ∈ P. A formula ϕ is derivable from an axiomatic system A, notation A ⊢ ϕ := there is an axiom ai of A and a sequence of formulas ϕ1,...,ϕl such that ϕ1 = ai, ϕl = ϕ and each ϕi follows from the preceding formulas and the axioms of A by the derivation rules of A.

² Sometimes it is also said that an axiomatic system A1 gives rise to a language $L_A$.

³ An example of such a bijective function between a predicate and a set of natural numbers will be given in section 8.2.

We call the sequence of formulas ϕ1,...,ϕl in a derivation of the statement ϕ a proof π of the statement ϕ. When A ⊢ ϕ, we also write ϕ ∈ A.

Example:

A1 ⊢ ¬false ∧ R0(false, true)

A1 ⊬ ∀x)x¬ (since the formula is not well-formed, i.e. it cannot be generated by the syntax definition)

A1 ⊬ ∀y(false ∨ y) ∧ true (since it cannot be derived to be true by the derivation rules; it is a refutable sentence)
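Whether a string is well-formed at all (the reason given for the second example) is a purely syntactic question that can be checked mechanically. The following Python sketch is a recursive-descent recognizer for the formation rules of A1. Several details are assumptions made only for illustration: ¬ binds tighter than ∧ and ∧ tighter than ∨ (the report fixes no precedence), bare variables count as 0-ary relations, ∧ and the comma are accepted as symbols although Σ does not list them, and the names `tokenize`, `Parser`, `parse` and `is_wff` are hypothetical.

```python
import re


class NotWellFormed(Exception):
    """Raised when a string is not generated by the formation rules of A1."""


KEYWORDS = ("R0", "true", "false")   # multi-character symbols
SINGLE = set("∨∧¬()∀,xy")            # one-character symbols (∧ and ',' are assumptions)
VARIABLES = {"x", "y"}
ATOMS = VARIABLES | {"true", "false"}


def tokenize(s):
    # Keywords are tried first; any other non-space character is a one-symbol token.
    tokens = re.findall(r"R0|true|false|\S", s)
    for tok in tokens:
        if tok not in KEYWORDS and tok not in SINGLE:
            raise NotWellFormed(f"{tok!r} is not a symbol of the alphabet")
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise NotWellFormed(f"unexpected {tok!r} at token {self.pos}")
        self.pos += 1
        return tok

    def disjunction(self):                # lowest precedence: ∨
        left = self.conjunction()
        while self.peek() == "∨":
            self.take()
            left = ("or", left, self.conjunction())
        return left

    def conjunction(self):                # middle precedence: ∧
        left = self.unary()
        while self.peek() == "∧":
            self.take()
            left = ("and", left, self.unary())
        return left

    def unary(self):
        tok = self.take()
        if tok == "¬":
            return ("not", self.unary())
        if tok == "∀":
            var = self.take()
            if var not in VARIABLES:
                raise NotWellFormed("∀ must be followed by a variable")
            self.take("(")
            body = self.disjunction()
            self.take(")")
            return ("forall", var, body)
        if tok == "(":
            inner = self.disjunction()
            self.take(")")
            return inner
        if tok == "R0":
            self.take("(")
            c = self.take()
            self.take(",")
            d = self.take()
            self.take(")")
            if not {c, d} <= ATOMS:
                raise NotWellFormed("R0 takes variables or constants as arguments")
            return ("R0", c, d)
        if tok in ATOMS:                  # constants, and variables as 0-ary relations
            return tok
        raise NotWellFormed(f"unexpected {tok!r}")


def parse(s):
    parser = Parser(tokenize(s))
    tree = parser.disjunction()
    if parser.peek() is not None:
        raise NotWellFormed(f"trailing {parser.peek()!r}")
    return tree


def is_wff(s):
    try:
        return parse(s) is not None
    except NotWellFormed:
        return False
```

With these assumptions, `is_wff("¬false ∧ R0(false, true)")` and `is_wff("∀y (false ∨y) ∧ true")` both hold, while `is_wff("∀x)x¬")` is false because ∀x must be followed by an opening parenthesis. Note that the second string is well-formed but still not derivable; well-formedness and derivability are separate questions.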

Hilbert proposed a program to reformulate all of mathematics as a formal axiomatic theory and to prove this theory consistent, i.e. free from contradiction. The standard method used to prove the consistency of axiomatic systems was to give a ‘model’. A model for an axiomatic theory is simply a system of objects, chosen from some other theory, that satisfies the axioms.

This means we can relate axiomatic systems to existing systems by means of a model, also called an interpretation or structure. A model of a formal axiomatic theory is a well-defined mathematical system with the particular structure that is characterized by the theory.

Definition of universe: the set of values that the variables of an axiomatic system may take.

Definition of a model: A universe together with an assignment of n-ary relations to n-ary constants, and a corresponding assignment of the variables.

We define a model M for an axiomatic system A by M = (U, P1,...,Pk), with U a universe for A and P1,...,Pk the relations corresponding to the symbols R1,...,Rk of A. If a formula ϕ is true in the model M (i.e. under the interpretation of the relation symbols by the corresponding relations), notation M ⊨ ϕ, we say that M is a model of ϕ.

Example: Let M1 = (N, ≤) be a model for the axiomatic system A1. Then M1 ⊨ ∀x∀y(x ≤ y ∨ y ≤ x), but M1 ⊭ ∀x∀y(x ≤ y ∧ y ≤ x). Note that instead of using the relation symbol R0, we immediately took its interpretation ≤.
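Truth in M1 of a formula of the form ∀x∀y ψ(x, y) cannot be established by enumerating all of N, but a finite search can still find counterexamples, and any counterexample found in an initial segment of N is also a counterexample in M1. The Python sketch below illustrates this for the two formulas of the example; the helper names and the choice of the segment {0, ..., 9} are assumptions. For the first formula the search finds no counterexample, which is evidence for, but not a proof of, its truth in M1.

```python
from itertools import product


def counterexample(universe, pred):
    """Return the first pair (x, y) with pred(x, y) false, or None if every pair satisfies it."""
    for x, y in product(universe, repeat=2):
        if not pred(x, y):
            return (x, y)
    return None


def leq(a, b):
    """Interpretation of the relation symbol in M1: the usual order on N."""
    return a <= b


segment = range(10)  # assumption: {0, ..., 9} as a finite stand-in for N

total = counterexample(segment, lambda x, y: leq(x, y) or leq(y, x))   # ∀x∀y(x ≤ y ∨ y ≤ x)
both = counterexample(segment, lambda x, y: leq(x, y) and leq(y, x))   # ∀x∀y(x ≤ y ∧ y ≤ x)
print(total)   # None: no counterexample on the segment
print(both)    # (0, 1): 0 ≤ 1 but not 1 ≤ 0, so the formula also fails in M1
```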

A theory Th of a model M, notation Th(M), is the set of true statements in the language of that model.

Definition of a theory: Th(M) := {ϕ | ϕ is a statement and M ⊨ ϕ}

So now we can say that Hilbert was looking for an axiomatic system for which logic can be a model. Hilbert required such an axiomatic system to have the properties of consistency, completeness and decidability. We will now introduce these concepts, along with some other properties of axiomatic systems. Since the properties of an axiomatic system A give rise to corresponding properties of the language $L_A$, we distinguish in each definition between the property of a language and that of an axiomatic system.

Definition of decidability: A language L is decidable := (∀ϕ :: ϕ ∈ P ∨ ϕ ∈ R). An axiomatic system A is decidable := there is an algorithm that decides, for every formula ϕ, in a finite number of steps whether or not A ⊢ ϕ (see also [49, page 270]).

Definition of consistency: A language L is consistent := ¬(∃s : s ∈ S : s ∈ P ∧ s ∈ R), i.e. P ∩ R = ∅: no sentence is both provable and refutable in L. An axiomatic system A is consistent := ¬(∃ϕ :: A ⊢ ϕ ∧ A ⊢ ¬ϕ) (i.e. for no formula ϕ is it possible to derive both ϕ and ¬ϕ) (see also [49, page 240]).

A language L is inconsistent if it is not consistent. Clearly, L is inconsistent if P and R are not disjoint. Note that consistency and decidability do not refer to T, but only concern P and R. The following definitions of completeness, soundness and correctness also depend on the truth set T (and therefore on the model that determines that truth set).

Definition of completeness: A language L is complete for a model M := (∀ϕ :: M ⊨ ϕ → ϕ ∈ P). An axiomatic system A is complete for a model M := (∀ϕ :: M ⊨ ϕ → A ⊢ ϕ) (i.e. all true statements in the model are derivable/provable).

A language L is incomplete if it is not complete. Note that the statement (∀ϕ :: M ⊨ ϕ → A ⊢ ϕ) is equivalent to (∀ϕ :: A ⊬ ϕ → M ⊭ ϕ), i.e. all statements ϕ that are not derivable/provable are also not true in the model.

Definition of soundness: A language L is sound for a model M := (∀ϕ :: ϕ ∈ P → M ⊨ ϕ). An axiomatic system A is a sound axiomatization for a model M := (∀ϕ :: A ⊢ ϕ → M ⊨ ϕ) (i.e. if a statement ϕ is derivable/provable, it is true in the model).

Definition of correctness: A language L is correct for a model M := P ⊆ T ∧ R ∩ T = ∅ (i.e. every provable sentence is true and every refutable sentence is false (not true)). An axiomatic system A is correct for a model M := A is sound for M and A is complete for M.
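If a language has only finitely many sentences, the language-level definitions above reduce to operations on the finite sets S, P, R and T. The Python sketch below assumes such a finite language, takes ϕ to range over the sentences S, and uses hypothetical function names. It also brute-forces the theorem that follows ("correct implies consistent") over every choice of P, R and T within a three-sentence S; this illustrates the theorem but is no substitute for its proof.

```python
from itertools import chain, combinations


def is_consistent(P, R):
    return not (P & R)                  # P ∩ R = ∅


def is_decidable(S, P, R):
    return S <= (P | R)                 # every sentence is provable or refutable


def is_complete(P, T):
    return T <= P                       # every true sentence is provable


def is_sound(P, T):
    return P <= T                       # every provable sentence is true


def is_correct(P, R, T):
    return P <= T and not (R & T)       # P ⊆ T and R ∩ T = ∅


def subsets(S):
    items = sorted(S)
    return [set(c) for c in chain.from_iterable(
        combinations(items, k) for k in range(len(items) + 1))]


def theorem_holds_on(S):
    """Check 'correct implies consistent' for every choice of P, R, T ⊆ S."""
    return all(is_consistent(P, R)
               for P in subsets(S) for R in subsets(S) for T in subsets(S)
               if is_correct(P, R, T))


S = {"s1", "s2", "s3"}
T = {"s1"}                              # sentences true in some hypothetical model
P, R = {"s1"}, {"s2", "s3"}
print(is_consistent(P, R), is_decidable(S, P, R), is_correct(P, R, T))   # True True True
print(theorem_holds_on(S))                                               # True
```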

Theorem: If L is correct, it is consistent.
Proof: This follows directly from the definitions of correctness and consistency, because if P is a subset of T and T is disjoint from R, then P must be disjoint from R.

6.2 Hilbert’s 23 problems

Who of us would not be glad to lift the veil behind which the future lies hidden: to cast a glance at the next level of our science and at the secrets of its development during future centuries? What particular goals will there be toward which the leading mathematical spirits of coming generations will strive? What new methods and new facts in the wide and rich field of mathematical thought will the next centuries disclose?

- D. Hilbert, in the opening of his speech to the 1900 Congress in Paris

In 1900 Hilbert presented his list of 23 mathematical problems to the International Congress of Mathematicians in Paris, urging them upon the attention of his contemporaries. His famous address was important and still influences and stimulates mathematical research all over the world today. It was not only a collection of problems, but also an expression of his philosophy of mathematics (see also the formalist viewpoint in section 5.2) and a collection of problems important to that philosophy. Many of the problems have since been solved, and each solution attracted considerable attention (some were even mathematical breakthroughs). Some of these problems, however, remain unsolved to this day. In 2000, following in Hilbert's footsteps, the Clay Mathematics Institute (see http://zax.mine.nu/interests/questions/clay.htm) made a new list of 7 problems, the Millennium Prize Problems, to be solved in this century.

Among those problems is one of Hilbert's original problems (number 8): it requires a solution to the Riemann hypothesis, which is usually considered to be the most important unsolved problem in mathematics. We mention some of the original problems that are related to the foundations of mathematics. For complete information on the 23 (or 25?, see [32]) original problems of Hilbert, see the articles [41] and [40], also available online [42].

• Problem 1: Cantor’s problem of the cardinal number of the continuum. This problem is also known as the Continuum Hypothesis and is extensively covered in section 3.7.

• Problem 2: The consistency of the axioms of arithmetic. The question is whether it can be shown that the axioms on which arithmetic is based are consistent. Gödel later showed that no consistent formal system that contains arithmetic can prove its own consistency (see chapter 8). A consistency proof might still be given by some other metamathematical argument that cannot be expressed within the system itself.

• Problem 6: Mathematical treatment of the axioms of physics. This problem asks to treat in the same manner, by means of axioms, those physical sciences in which mathematics plays an important part; in the first rank are the theory of probabilities and mechanics. So far no complete axiomatization of physics has been found.

• Problem 9: Proof of the most general law of reciprocity in algebraic number theory. For any field of numbers, the law of reciprocity (for more references see http://www.mathematik.uni-bielefeld.de/~kersten/hilbert/prob9.html) is to be proved for the residues of the lth power, when l denotes an odd prime, and further when l is a power of 2 or a power of an odd prime. This problem has not yet been solved in full generality.

• Problem 10: Decidability of solvability of Diophantine equations. This question asks, ‘given a Diophantine equation with any number of unknown quantities and with rational integral numerical coefficients, to devise a process according to which it can be determined by a finite number of operations whether the equation is solvable in rational integers’. In modern terminology the problem asks for an algorithm that tests whether a polynomial has an integral root. A root of a polynomial is an assignment of values to its variables such that the value of the polynomial is 0. A root is an integral root if all variables are assigned integer values. Some polynomials have an integral root (for example, $6x^3yz^2 + 3xy^2 - x^3 - 10$ has an integral root at x = 5, y = 3 and z = 0) and some do not. Hilbert did not use the term algorithm but rather ‘a process according to which it can be determined by a finite number of operations’. In order to solve this problem this notion had to be made more precise (this was done by Turing, see section 9.1). Also, Hilbert asked that an algorithm be devised. Thus he apparently assumed that such an algorithm exists, but we now know that this problem is algorithmically unsolvable. In 1970, the young Russian mathematician Yuri Matijasevic, building on the work of Martin Davis, Hilary Putnam and Julia Robinson, showed that no algorithm exists for testing whether a polynomial has integral roots.

• Problem 23: Further development of the methods of the calculus of variations. Of the 23 problems Hilbert posed, this one is the least definite, since it involves the general question of extending the calculus of variations, which basically is the theory of the variation of functions. With some examples that we will not treat here, Hilbert justified the need for an extension of the differential and integral calculus (for more references see http://www.mathematik.uni-bielefeld.de/~kersten/hilbert/prob23.html).
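The difference between Problem 10 as Hilbert posed it and what is actually achievable can be made concrete in code. Searching the integer points in ever larger boxes halts as soon as a root is found, so it confirms solvability whenever a root exists; but if no root exists, an unbounded search runs forever. Such a search is therefore only a semi-decision procedure, not the algorithm Hilbert asked for. The Python sketch below uses a hypothetical cutoff `max_radius` so that it always terminates; a result of None within the cutoff proves nothing about larger points. The function names are illustrative.

```python
from itertools import product


def find_integral_root(poly, nvars, max_radius):
    """Search integer points shell by shell (max |coordinate| = r for r = 0, 1, ...).

    Without max_radius this search halts exactly when a root exists, so it is only a
    semi-decision procedure; returning None here proves nothing about larger points.
    """
    for r in range(max_radius + 1):
        for point in product(range(-r, r + 1), repeat=nvars):
            if max(abs(c) for c in point) == r and poly(*point) == 0:
                return point
    return None


def p(x, y, z):
    """The example polynomial 6x^3yz^2 + 3xy^2 - x^3 - 10 from the text."""
    return 6 * x**3 * y * z**2 + 3 * x * y**2 - x**3 - 10


print(p(5, 3, 0))                                       # 0: the root given in the text
root = find_integral_root(p, 3, 6)
print(root, p(*root))
print(find_integral_root(lambda x: x * x - 2, 1, 50))   # None within the bound
```

In the last line, x² − 2 indeed has no integral root, but the search alone could never establish that; this is exactly the gap that Matijasevic's result shows cannot be closed by any algorithm.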

At the end of his article, Hilbert says that he does not believe that mathematics will, like other sciences, split into separate branches whose connection becomes ever looser, but that the organic unity of mathematics is inherent in the nature of this science, for mathematics is the foundation of all exact knowledge of natural phenomena. For a more detailed assessment of Hilbert’s view, see [49, section 12.4] and [31, section 4.7].

Chapter 7

Types

7.1 Russell and Whitehead’s Principia Mathematica

Logic has become more mathematical and mathematics has become more logical. The consequence is that it has now become wholly impossible to draw a line between the two; in fact, the two are one. They differ as boy and man; logic is the youth of mathematics and mathematics is the manhood of logic.

- B. Russell in [79, page 194]

In section 4.1 we saw that with the postulates he presented, Peano stated and organized the fundamental laws of number theory, the core of mathematics. If statements satisfying these postulates could be derived in logic, this would show that (at least part of) mathematics is founded in pure logic. As we have seen in section 4.2, Frege was an adherent of logicism, whose goal is to derive all of mathematics from logic alone. Unfortunately, the language that he created was inconsistent, as we have learned from Russell’s paradox in section 5.1. In his 1908 paper ‘Mathematical Logic as Based on the Theory of Types’, Russell laid out a theory to eliminate the paradoxes. In Principia Mathematica, Bertrand Russell and his teacher, the mathematician Alfred Whitehead, presented this theory, which prevents the paradoxes while still allowing many of the operations Frege considered desirable. The theory of types basically says that all sets and other entities have a logical ‘type’, that these types can be ordered, and that sets are always constructed from specified members of lower types. We will look at the theory of types in more detail in section 7.2.

Principia Mathematica (often simply called ‘the Principia’) consists of three volumes and was named after the ‘Philosophiae naturalis principia mathematica’ of the English physicist and mathematician Isaac Newton. Unlike Newton’s book, however, it does not deal with the application of mathematical techniques to physics, but with their application to logic and mathematics itself. With their mathematical treatment of the principles of mathematics, Russell and Whitehead intended to summarize the recent work in logic, to give a revolutionary and systematic development of mathematical logic, and to derive basic mathematical principles from the principles of logic alone. Their collaboration began in 1903, when Whitehead and Russell were both in the initial stages of preparing second volumes to earlier books on related topics: Whitehead’s 1898 ‘A Treatise on Universal Algebra’ and Russell’s 1903 ‘The Principles of Mathematics’. Their work overlapped considerably, and they began collaborating on what would become ‘Principia Mathematica’. The approach of Russell and Whitehead was essentially that of Frege: to define mathematical entities (like numbers) in pure logic and then derive their fundamental properties. Indeed, their definition of the natural numbers was basically the same as Frege’s, but unlike him, they opted to avoid the philosophical aspects and justifications. Although the ‘Principia’ was largely successful, the axiom of infinity and the axiom of reducibility were still criticized as too ad hoc to be justified philosophically.
In 1919 Russell published an account of the philosophy behind his work, ‘Introduction to Mathematical Philosophy’, which was accessible to a broad audience and has therefore been the main source through which Russell’s logicist view of mathematics has become known.

I quote the following assessment of Principia Mathematica from [91]: “In addition to its notation (much of it borrowed from Peano), its masterful development of logical systems for propositional and predicate logic, and its overcoming of difficulties that had beset earlier logical theories and logistic conceptions, the Principia offered discussions of functions, definite descriptions, truth, and logical laws that had a deep influence on discussions in analytical philosophy and logic throughout the 20th century. What is perhaps missing is any hesitation or perplexity about the limits of logic: whether this logic is, for example, provably consistent, complete, or decidable, or whether there are concepts expressible in natural languages but not in this logical notation. This is somewhat odd, given the well-known list of problems posed by Hilbert in 1900 that came to animate 20th-century logic, especially German logic. The Principia is a work of confidence and mastery and not of open problems and possible difficulties and shortcomings; it is a work closer to the naive progressive elements of the Jahrhundertwende than to the agonizing fin de siecle.” We would like to add that with their very formal and accurate build-up of mathematics, Russell and Whitehead not only managed to avoid the paradoxes but also created one of the most impressive and complicated works of all time, which is considered, next to Aristotle’s Organon, to be the most influential book on logic ever written.

In the next section we will further investigate Russell’s theory of types. The English mathematician Frank Plumpton Ramsey (1903-1930) offered criticism of the theory of types that was accommodated in later editions of Principia Mathematica. The result of this is the ‘deramified theory of types’ that will be treated in subsequent sections, together with a later simplification of this theory by the German mathematicians Hilbert and Wilhelm Ackermann (1896-1962). The mathematician Alonzo Church also published articles on type systems, but did not develop his typed version of the lambda calculus before the 1940s, and his work thereby falls outside the scope of this report (1870-1940). We will only summarize his work in this paragraph. The main difference between the type structure of Russell and that of Church is that the former is set-based with a linear ordering of types and the latter is function-based with a non-linear order of types. The system that emerged from Church’s lambda calculus (see section 9.2) was extended with simple types in 1940 to prevent paradoxes, similar to the extension of logical set theory with simple types by Russell in 1910 to avoid the paradoxes. Church also proposed another logical set theory in 1974.

[...] in the simple theory of types it is well known that the individuals may be dispensed with if classes and relations of all types are retained; or one may abandon also classes and relations of the lowest type, retaining only those of higher type. In fact any finite number of levels at the bottom of the hierarchy of types may be deleted. But this is no reduction in the variety of entities, because the truncated theory of types, by appropriate deletions of entities in each type, can be made isomorphic to the original hierarchy - and indeed the continued adequacy of the truncated hierarchy to the original purposes depends on this isomorphism.

- A. Church in ‘The need for abstract entities’.

Organization of Principia Mathematica

The nearly 2,000-page Principia Mathematica starts with a short preface that explains what it wants to demonstrate, namely that pure mathematics can be based on logic alone and requires no other primitive notions. Russell classifies statements that involve logical constants only (such as the laws of reciprocity, see page 18 of Principia Mathematica) as pure mathematics, and other mathematical assertions that also refer to non-logical contents (such as the statement that (perceptual) space is three-dimensional) as part of applied mathematics. The belief was then expressed that pure mathematics was sufficient to include all traditional mathematics. Then, after an introduction, the first volume introduces a symbolic logic that is based on a small set of axioms, and then lays out the propositional and predicate calculi. Built upon these, Whitehead and Russell define types, sets, relations and their properties, and basic operations on sets. The second volume continues with a purely logical theory of cardinal and relation arithmetic. This allowed them to introduce basic arithmetic, including addition, multiplication and exponentiation, of both finite cardinals and relations. The volume ends with a general theory of simply ordered sets (series), which is followed by a logical basis for subjects such as convergent sequences, continuity, limits and derivatives. The third volume was meant to prepare the ground for the fourth and concluding volume on geometry (which was never completed), and contained a theory of numbers that was called ‘measurement’. It starts with a theory of well-ordered sets, finite, infinite and continuous series, the negative integers, ratios and the real numbers, and finally vectors, coordinates and basic geometric notions. More details about the organization of Principia Mathematica and a critical assessment of its work can be found in [31, chapter 7, and specifically section 7.8].

The symbolic logic and notation of Principia Mathematica

Russell and Whitehead opted for Peano’s more modern notation instead of Frege’s Begriffsschrift. Unlike Frege, Russell and Whitehead treated functions as first-class citizens. A good introduction to the logical calculus and the specific notation used in Principia Mathematica can be found in [49, sections 3.2 and 3.3] and [31, sections 7.2, 7.3, 7.7 and 7.8].

Russell’s theory of types

Russell’s 1908 paper included a categorization of most of the important contradictions of that time, and an analysis of their common characteristics. To prevent the paradoxes he catalogued, Russell formulated the vicious circle principle (see page 85) and implemented it using types in Principia Mathematica (for details see [31, section 7.9] and [49, sections 3.2 and 3.3]).

What is a type? A type is the range of significance of a propositional function, that is, the collection of arguments for which the said function is significant and has values.

The type of a variable in a proposition is fixed by all the values the function is concerned with, i.e. by the totality over which the variable ranges. This division of objects into types (the type of an object can be seen as a property of that object) is necessary to conform to the vicious circle principle, i.e. to make sure that ‘whatever contains an apparent variable must not be a possible value of that variable’. This can be established by making sure that whatever contains an apparent variable is of a different and higher type than the possible values of that variable. This linear order of types prevents vicious circles, since the variables contained in an object determine the type of that object.

Russell then defined an individual as something that is not a proposition but a constant, destitute of complexity. We can now categorize propositions by their types. First-order propositions are elementary propositions that contain only individuals; second-order propositions are propositions with first-order propositions as variables and possibly propositions of lower than first-order types. This can be continued, such that (n+1)th-order propositions contain propositions of order n and possibly others of order smaller than n.

We now also restrict relations like ∈ so that x ∈ y is only significant when y is of a type one level higher than x, and we always confine quantifiers to a single level. This way of restricting propositions prevents the paradoxes, but it can be shown to be needlessly restrictive in some cases.

For more information about types in Principia Mathematica, see [31, section 7.9] and [49, section 3.3]. For a formalization (in modern notation) of Russell’s Ramified Theory of Types (RTT), we refer to [86, chapter 3]. This reference is in turn partly based on [52], [53], [54] and [43], all of which discuss RTT in some context. A detailed introduction to the (symbolic) logic and notation of Principia Mathematica, as well as a formal introduction to RTT, STT, NF and ML (see section 7.3), is to be included in a later version of this report.

7.2 Ramsey, Hilbert and Ackermann

Suppose a contradiction were to be found in the axioms of set theory. Do you seriously believe that a bridge would fall down?

- F.P. Ramsey, quoted in [58]

Ramsey published his first major work ‘The Foundations of Mathematics’ (see [69, pages 105-142]) in 1925. In this publication he attempted to improve Principia Mathematica in two ways. First, he proposed dropping the axiom of reducibility which, he writes, is “[...] certainly not self-evident and there is no reason to suppose it true; and if it were true, this would be a happy accident and not a logical necessity, for it is not a tautology.” Second, he suggested simplifying Russell’s theory of types by regarding certain semantic paradoxes as linguistic. He accepted Russell’s solution for removing the logical paradoxes of set theory arising from, for example, ‘the set of all sets which are not members of themselves’. However, the semantic paradoxes such as ‘this is a lie’ are, Ramsey claims, quite different and depend on the meaning of the word ‘lie’. These he removed with his reinterpretation of the axiom of reducibility. After his suggestions, Russell’s theory became known as the ramified theory of types (RTT), and Ramsey’s modification of the theory as the deramified theory of types. For more detailed information about the history of deramification, we refer to [86, chapter 4].

Hilbert, together with Ackermann (see [2]), simplified Russell’s theory of types by removing the orders, resulting in what has become known as the ‘simple theory of types’ (STT). We quote from page 115 of [49]: “[In the simple theory of types,] every individual or individual variable is said to be of type i; and if a predicate or predicate variable ϕ(x1,...,xn) has arguments x1,...,xn of types τ1,...,τn respectively, then ϕ(x1,...,xn) is said to be of type (τ1,...,τn). Thus, for example, any predicate with two individual arguments is of type (i, i), while a predicate with a single argument that is itself a predicate with two individual arguments is of type (i, i, (i, i)). Having introduced the hierarchy of types in this way, we shall now require bound variables to be of some definite type. Every quantifier will then range over the totality of all entities of the same type as the bound variable. When this is done, we have a very comprehensive logical calculus which is secure against vicious circularity”.
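The type discipline quoted above amounts to a simple check that can be sketched in a few lines: a predicate of type (τ1,...,τn) may only be applied to arguments of exactly those types. In the Python sketch below, the string "i" and Python tuples stand for the types of the simple theory of types; this encoding and the names `predicate_type` and `check_application` are hypothetical. It shows that self-application ϕ(ϕ), the pattern behind Russell’s paradox, is rejected, because an argument would have to have the same type as the predicate it is passed to, while a predicate’s type is always strictly larger than the types of its arguments.

```python
# Hypothetical encoding of simple types: the string "i" is the type of individuals,
# and the tuple (t1, ..., tn) is the type of a predicate whose arguments have types t1..tn.
I = "i"


def predicate_type(*arg_types):
    """Type (t1, ..., tn) of a predicate taking arguments of the given types."""
    return tuple(arg_types)


def check_application(pred_type, *arg_types):
    """Accept phi(x1, ..., xn) only if phi's type lists exactly the argument types."""
    if not isinstance(pred_type, tuple):
        raise TypeError("an individual cannot be applied to arguments")
    if tuple(arg_types) != pred_type:
        raise TypeError(f"predicate of type {pred_type} cannot take arguments {arg_types}")
    return True


binary = predicate_type(I, I)          # a relation between two individuals: (i, i)
print(binary)                          # ('i', 'i')
print(check_application(binary, I, I)) # True: well-typed

unary = predicate_type(I)              # a property of individuals: (i)
try:
    check_application(unary, unary)    # phi(phi): the self-application behind the paradox
except TypeError as err:
    print("rejected:", err)
```

The sketch only checks applications; the requirement that every quantifier range over a single type is not modelled here.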

A further discussion and formalization (in the form of Church’s simply typed lambda calculus λ→c) of the simple theory of types can be found in [86].

7.3 Quine

Just as the introduction of the irrational numbers ... is a convenient myth [which] simplifies the laws of arithmetic ... so physical objects are postulated entities which round out and simplify our account of the flux of existence ... The conceptional scheme of physical objects is [likewise] a convenient myth, simpler than the literal truth and yet containing that literal truth as a scattered part

- Quine, quoted in [50]

Willard Van Orman Quine (1908-2000) was an American philosopher and logician who became interested in the work of Russell. An alternative to Russell’s system is one that allows a single universe of all types (or all sets). In Russell’s theory such an object is too big, but according to others, including Quine, having a set of all sets or a type of all types is legitimate as long as we do not permit forming all subsets. If there is some restriction on which subsets can be formed, for example by requiring a stratified predicate to define the subset, then no contradiction will result. Quine proposed in [94, pages 80-101] a system called New Foundations (NF) based on this idea. To restrict the way subsets are formed, Quine restricted the comprehension axiom to:

NFC(omprehension) Axiom: ∃x∀y :: (y ∈ x ↔ ϕ(y)), where x is not free in ϕ(y) and ϕ(y) is stratified

In [86, footnote 4], we find two definitions of stratification.

Definition of heterogeneous stratification: A well-formed formula ϕ is heterogeneously stratified := there is a function f from the variables and constants of ϕ to the natural numbers such that for each atomic well-formed formula F(x1,...,xn) of ϕ, f(F) = 1 + (max : 1 ≤ i ≤ n : f(xi)).

Definition of homogeneous stratification: A well-formed formula ϕ is homogeneously stratified := ϕ is heterogeneously stratified and for the corresponding function f we also have f(xi) = f(xj) for 1 ≤ i, j ≤ n.

With the NFC axiom, Russell’s paradox is prevented, since the formula ϕ ≡ x ∉ x is not stratified.
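Whether a formula can be stratified is a mechanical question: each atom x ∈ y imposes the constraint f(y) = f(x) + 1, and a stratification exists exactly when these constraints are jointly satisfiable. The Python sketch below checks this by propagating levels through the constraint graph. It is restricted to membership atoms, with the usual condition f(y) = f(x) + 1 for NF; equality atoms and the general F(x1,...,xn) form of the definitions above are not covered, and the name `stratify` and the pair-list input format are assumptions.

```python
from collections import defaultdict, deque


def stratify(memberships):
    """Find f with f(y) = f(x) + 1 for every atom x ∈ y, or return None if none exists.

    memberships: list of (x, y) pairs, one per atomic subformula x ∈ y.
    """
    edges = defaultdict(list)
    for x, y in memberships:
        edges[x].append((y, 1))      # f(y) - f(x) = 1
        edges[y].append((x, -1))     # f(x) - f(y) = -1
    level = {}
    for start in list(edges):
        if start in level:
            continue
        level[start] = 0
        queue = deque([start])
        while queue:                 # propagate levels through this connected component
            v = queue.popleft()
            for w, d in edges[v]:
                if w not in level:
                    level[w] = level[v] + d
                    queue.append(w)
                elif level[w] != level[v] + d:
                    return None      # two paths demand different levels: not stratified
    if not level:
        return {}
    lowest = min(level.values())
    return {v: n - lowest for v, n in level.items()}   # shift into the natural numbers


print(stratify([("x", "x")]))                  # x ∈ x (the atom of x ∉ x): None
print(stratify([("x", "y"), ("y", "z")]))      # x ∈ y ∧ y ∈ z: {'x': 0, 'y': 1, 'z': 2}
print(stratify([("x", "y"), ("y", "x")]))      # x ∈ y ∧ y ∈ x: None
```

Negation does not change the atoms of a formula, so x ∉ x fails for the same reason as x ∈ x: it would require f(x) = f(x) + 1.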

We quote from [86, page 3]: “NF is weak for mathematical induction and the axiom of choice is not compatible with NF. We cannot prove Peano’s axiom[s] in it, unless we assume the existence of a class with m + 1 elements. Also, NF is said to lack motivation because its axiom of comprehension is justified only on technical grounds and one’s mental image of set theory does not lead to such an axiom. To overcome some of the difficulties, Quine adopted similar measures to NBG (Neumann-Bernays-Gödel, see section 8.5) set theory[, and developed another non-iterative set theory called ML (Mathematical Logic), first presented in [70]]. Like NBG, ML contains a bifurcation of classes into elements and non-elements. Sets can enjoy the property of being full objects whereas classes cannot. ML was obtained from NF by replacing (NFC) by two axioms, one for class existence and one for elementhood. The rule of class existence provides [...] the existence of the classes of all elements satisfying any condition ϕ, stratified or not. The rule of elementhood is such as to provide the elementhood of just those classes which exist for NF. Therefore, the two axioms of comprehension for ML [are]: Comprehension by a set: (∃y∀x :: x ∈ y ↔ ϕ(x)), where ϕ(x) is stratified with set variables only in which y does not occur free. Impredicative comprehension by a class: (∃y∀x :: x ∈ y ↔ ϕ(x)), where ϕ(x) is any formula in which y does not occur free. ML was liked both for the manipulative convenience we regain in it and the symmetrical universe it furnishes. It was however proved subject to the Burali-Forti paradox”.

For more information, we refer to [70], [71], [72] and the website http://diamond.boisestate.edu/~holmes/holmes/nf.html.

Chapter 8

Gödel

The development of mathematics towards greater precision has led, as is well known, to the formalization of large tracts of it, so that one can prove any theorem using nothing but a few mechanical rules. [...] It will be shown below that this is not the case, that on the contrary there are in the two systems mentioned [viz. Principia Mathematica and ZF] relatively simple problems in the theory of integers that cannot be decided on the basis of the axioms.

- K. Gödel, in the opening of the paper introducing the incompleteness theorem (1931)

8.1 Informally: Gödel’s incompleteness theorems

No system of Hilbert’s type in which the integers (or Peano’s arithmetic, see section 4.1) can be defined can be both consistent and complete. At the time this seemed unthinkable, but in 1931 Kurt Gödel (born in 1906 in Brünn, Austria-Hungary, now Brno, Czech Republic) presented mathematicians with the astounding and melancholy conclusion that the axiomatic method has certain limitations, which rule out the possibility that even ordinary arithmetic (as given by Peano) can ever be fully axiomatized. As a corollary of this theorem, he proved that it is impossible to establish the internal logical consistency of a very large class of deductive systems. This result provoked a reappraisal of the philosophies of mathematics.


Gödel’s famous incompleteness theorem and the corresponding corollary are also called the first and the second incompleteness theorem. Gödel was able to show that, if an axiomatic system of formalized arithmetic is wide enough, then

1. If the system is consistent, it is necessarily incomplete, in the sense that there exists a formula ϕ of the system such that neither ϕ nor its negation is derivable (see also section 8.2 for the definition of incompleteness), and

2. If the system is consistent, then no proof of its consistency is possible which can be formalized within it (see also section 8.2 for the definition of consistency).

We first indicate in this section (in 8 steps, following the lines of Gödel's original proof) the main lines of both theorems, and provide a more rigorous and exact proof of the theorems in section 8.2 and the sections that follow.
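Step 1 below relies on coding formulas as numbers. Gödel’s 1931 paper encoded a sequence of symbols as a product of powers of successive primes, and unique prime factorization makes that coding injective and effectively reversible. The Python sketch below illustrates this idea. The symbol table `SYMBOL_CODES` is a hypothetical choice that differs from Gödel’s own, and `gn` and `decode` are illustrative names, with `gn` following the report’s notation gn(ϕ).

```python
# Hypothetical symbol codes; Gödel's paper fixes its own table, which differs.
SYMBOL_CODES = {"0": 1, "S": 3, "=": 5, "¬": 7, "∨": 9, "(": 11, ")": 13, "x": 15}
CODE_TO_SYMBOL = {code: sym for sym, code in SYMBOL_CODES.items()}


def primes():
    """Yield 2, 3, 5, 7, ... by trial division (adequate for short formulas)."""
    found = []
    candidate = 2
    while True:
        if all(candidate % p for p in found):
            found.append(candidate)
            yield candidate
        candidate += 1


def gn(formula):
    """Gödel number: the product of p_k ** code(k-th symbol) over the successive primes p_k."""
    n = 1
    for p, symbol in zip(primes(), formula):
        n *= p ** SYMBOL_CODES[symbol]
    return n


def decode(n):
    """Invert gn by reading off prime exponents; every code is >= 1, so no gaps occur."""
    symbols = []
    for p in primes():
        if n == 1:
            break
        exponent = 0
        while n % p == 0:
            n //= p
            exponent += 1
        symbols.append(CODE_TO_SYMBOL[exponent])
    return "".join(symbols)


print(gn("0=0"))            # 2**1 * 3**5 * 5**1 = 2430
print(decode(gn("S0=S0")))  # S0=S0
```

The same idea applied once more, with the Gödel numbers of the formulas ϕ1,...,ϕl as exponents, codes a whole proof as a single natural number.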

1 The formulas of an axiomatic system (their syntax) are precisely defined and built up from a finite alphabet of symbols. Proofs are nothing but finite sequences of formulas and can therefore be replaced by numbers. With such a representation, the Gödel numbering, Gödel obtained a well-ordering of all well-formed formulae of an axiomatic system S (for the theorem itself, Gödel assumed S to be ω-consistent; see sections 8.2 and 8.3 for more details). Gödel then showed how to represent metamathematical concepts such as ‘formula’, ‘proof-’ and ‘provable formula’ by natural numbers. We define gn(ϕ) to be the Gödel number corresponding to the well-formed formula ϕ of S.
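The coding idea of step 1 can be sketched in Python. This is a minimal illustration, not Gödel's 1931 scheme: the symbol table `SYMBOLS` is a hypothetical assignment for a tiny arithmetic alphabet (`~` for ¬, `v` for ∨, `A` for ∀). A formula s1 s2 ... sk receives the number 2^c(s1) · 3^c(s2) · ... · pk^c(sk), where pk is the kth prime and c is the symbol code; unique prime factorization guarantees that the formula can be read back from its number.

```python
# Hypothetical symbol codes for a tiny arithmetic language (illustration only;
# Goedel's 1931 paper used a different assignment).
SYMBOLS = {"0": 1, "S": 2, "=": 3, "(": 4, ")": 5, "~": 6, "v": 7, "A": 8, "x": 9}
DECODE = {code: sym for sym, code in SYMBOLS.items()}


def primes(k: int) -> list[int]:
    """Return the first k primes by trial division."""
    found: list[int] = []
    candidate = 2
    while len(found) < k:
        if all(candidate % p for p in found):
            found.append(candidate)
        candidate += 1
    return found


def gn(formula: str) -> int:
    """Goedel number: product of p_i ** code(symbol_i) over the formula."""
    number = 1
    for p, sym in zip(primes(len(formula)), formula):
        number *= p ** SYMBOLS[sym]
    return number


def decode(number: int) -> str:
    """Invert gn by reading off the exponent of each successive prime."""
    symbols = []
    k = 1
    while number > 1:
        p = primes(k)[-1]
        exponent = 0
        while number % p == 0:
            number //= p
            exponent += 1
        symbols.append(DECODE[exponent])
        k += 1
    return "".join(symbols)


print(gn("S0=S0"))      # 808500 = 2^2 * 3^1 * 5^3 * 7^2 * 11^1
print(decode(808500))   # S0=S0
```

Because every symbol code is at least 1, no prime in the product is skipped, which is what lets `decode` stop as soon as the remaining number is 1.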

2 We consider a formula prov(ϕ) of S, stating that ϕ is a provable formula. Precisely, we define prov(ϕ) := ‘ϕ is a provable formula’. A class sign is a formula with just one free variable. We suppose that the class signs are ordered by a function R with domain N, such that R(n) is the nth class sign. By [R(n); q] we denote the formula obtained by replacing the free variable in R(n) by (the numeral for) q.

3 We now define a set K of natural numbers by n ∈ K ↔ ¬prov([R(n); n]). Since the notions used in this definition are all expressible in S, there is also a formula with one free variable (i.e. a class sign) that expresses n ∈ K. We call this class sign C. So there is a natural number q such that C = R(q). We now show that the proposition G ≡ [R(q); q] is unprovable in S. Since¹ this formula says that q ∈ K, that is ¬prov([R(q); q]), we can say that G is a proposition that asserts of itself that it is not provable.

4 We show that a proof of G would yield a proof of ¬G and vice versa, and hence that G is undecidable:

• Suppose G is provable. A given proof can be checked inside S, so S then also proves prov([R(q); q]), i.e. prov(G). Since G says q ∈ K, i.e. ¬prov([R(q); q]), the statement prov(G) amounts to ¬G (by replacing the variable in the class sign C by q). So ¬G is provable as well.
• Suppose ¬G is provable. ¬G is ¬[R(q); q], which says q ∉ K, i.e. ¬¬prov([R(q); q]); this is equivalent to prov([R(q); q]), or prov(G): S proves that G is provable. If S is ω-consistent (Gödel’s original assumption, see section 8.3), S can only prove this when G really has a proof, so G is provable as well.

A proof of either G or ¬G therefore leads to a proof of both, and the system S would be inconsistent. So if we assume that S is consistent (and ω-consistent), neither G nor ¬G is provable: G is undecidable in S.

5 By a metamathematical consideration, however, we know that G is true: G asserts its own unprovability, and G is indeed unprovable (because it is undecidable). So there is a true statement of S (namely G) that is not provable in S: the system S is incomplete!

6 If we add G as an axiom, we can apply the argument of the previous five steps to the extended system in exactly the same way. We then obtain another formula G′, since the proposition defined in step 3 states ‘this formula is not provable’, in other words ‘this formula does not follow from the axioms’; the proposition therefore depends on the set of axioms. Hence, as I. Grattan-Guinness aptly puts it in [31, page 510], the system S is ‘essentially incompleteable’.

7 Gödel then showed that ‘if arithmetic is consistent, it is incomplete’. We want to prove this conditional statement as a whole. We denote its antecedent by A: ‘arithmetic is consistent’. We have already seen in section 6.1 that this means that there is at least one formula ϕ of arithmetic that is not provable. So we can express A ≡ (∃y :: (∀x :: ¬(x is a proof of y))). A system is incomplete if there is a true statement that is not provable. Thus we can represent the conclusion of the conditional statement by G.

¹ By replacing the free variable in the class sign C, which expresses that n ∈ K, by q.

8 We can now formally prove A → G within S (see section 8.2 for the proof). This means that if A were provable, we would know (by modus ponens, the rule of detachment) that G is provable. But we have already seen that, unless S is inconsistent, G is not provable; thus if S is consistent, A is not provable! That means that if arithmetic is consistent, its consistency cannot be established by metamathematical reasoning formalized within arithmetic (this is Gödel’s theorem 11, see [93, page 614]). Or, as expressed in [31, page 510], ‘any set S of consistent formulae of PM cannot include the formula F asserting its consistency’.

8.2 Formally: Gödel’s Incompleteness Theorems

The first incompleteness theorem says that Principia Mathematica, or any other system in which arithmetic can be developed, is essentially incomplete: in any consistent, effectively given set of arithmetical axioms there are statements that are true but cannot be derived from the set.

The second theorem says that it is impossible to give a metamathematical proof of the consistency of a system comprehensive enough to contain the whole of arithmetic, unless the proof itself employs rules of inference that differ in certain essential respects from the derivation rules used to derive theorems within the system.

In the following two subsections we first give an abstract version of Gödel’s first and second incompleteness theorems and investigate the class of languages that the theorems apply to; in the third subsection we then fill in the details by giving a specific Gödel numbering for arithmetic. In the next sections we apply the theorems to the system of Peano Arithmetic and that of Principia Mathematica, and discuss the consequences of the incompleteness theorems.

8.2.1 On formally undecidable propositions

We assume there is an STGA language L and investigate the conditions on a system L under which Gödel showed that there is a true sentence that is not provable in L (i.e. (∃t : t ∈ T : t ∉ P)). We define the following concepts: a predicate H expresses a set of numbers A := (∀n :: H(n) ∈ T ↔ n ∈ A); A is expressible in L if A is expressed by some predicate of L. Note that expressibility in L concerns only T, and not P and R.

Theorem: Not every set of numbers is expressible.
Proof: (from [84]) Since the expressions of L are finite strings over a finite alphabet, there are only denumerably many expressions, and hence only denumerably many predicates, of L. But (by Cantor’s theorem, see page 69) there are non-denumerably many sets of natural numbers. Therefore, not every set of numbers is expressible in L.
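Cantor’s theorem rests on a diagonal argument, the same device that drives the diagonal function d and the Gödel sentences of this section. A minimal sketch, assuming sets of natural numbers are represented by their membership tests and using a hypothetical enumeration `multiples_of_successor`: for any enumeration S0, S1, S2, ... the set D = {n : n ∉ Sn} differs from each Sn at the argument n, so no enumeration (in particular, none produced by the predicates of L) contains every set.

```python
from typing import Callable

# A set of natural numbers, represented by its membership test.
NumberSet = Callable[[int], bool]


def diagonal_set(enumeration: Callable[[int], NumberSet]) -> NumberSet:
    """Cantor's diagonal set D = {n : n is not in S_n} for S_0, S_1, S_2, ..."""
    return lambda n: not enumeration(n)(n)


def multiples_of_successor(n: int) -> NumberSet:
    """Hypothetical enumeration: S_n is the set of multiples of n + 1."""
    return lambda m: m % (n + 1) == 0


D = diagonal_set(multiples_of_successor)

# D differs from every S_n at the argument n, so D is not any S_n.
# Running code can only check a finite prefix of the enumeration.
for n in range(1000):
    assert D(n) != multiples_of_successor(n)(n)
```

The same construction works for any enumeration passed to `diagonal_set`; the particular choice of `multiples_of_successor` is irrelevant to the argument.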

Let gn be a function that assigns to each expression a unique natural number (as in step 1 of section 8.1; here gn is a bijection between E and N). For any E ∈ E, we also call gn(E) the Gödel number of E. We will give a specific numbering in section 8.2.3. For this abstract treatment the only assumption² we make is that every number is the Gödel number of some expression.

We define En to be the inverse of gn, i.e. gn(En) = n. The diagonalization of En, for En ∈ H, is defined as En(n). We define d(n) to be the Gödel number of the diagonalization of En, that is d(n) := gn(En(n)), and call d the diagonal function of the system. For each set of natural numbers A, we define A∗ to be the set of all numbers n such that d(n) ∈ A, i.e. n ∈ A∗ ↔ d(n) ∈ A. For any set of natural numbers A, we define its complement ∼A to be the set of all natural numbers not in A. The complement operation ∼ binds more strongly than ∗, i.e. ∼A∗ is to be read as (∼A)∗.
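The definitions of gn, En, d and A∗ can be made concrete in a toy setting. This is a sketch under stated assumptions, not the numbering of section 8.2.3: the alphabet `ALPHABET` is hypothetical, expressions are arbitrary strings over it, gn is bijective base-13 numeration (so that, as assumed above, every number is the Gödel number of some expression), and E(n) is taken to be E with its variable `x` replaced by the decimal numeral of n. Sets of numbers are represented by membership tests.

```python
from typing import Callable

NumberSet = Callable[[int], bool]
ALPHABET = "0123456789xP~"  # hypothetical toy alphabet; 'x' is the free variable
K = len(ALPHABET)


def gn(expr: str) -> int:
    """Bijective base-K numbering: every natural number codes exactly one string."""
    n = 0
    for ch in expr:
        n = n * K + ALPHABET.index(ch) + 1
    return n


def expr_of(n: int) -> str:
    """E_n, the inverse of gn (0 codes the empty expression)."""
    chars = []
    while n > 0:
        n, r = divmod(n - 1, K)
        chars.append(ALPHABET[r])
    return "".join(reversed(chars))


def apply_to(expr: str, n: int) -> str:
    """E(n): replace the free variable x by the decimal numeral of n."""
    return expr.replace("x", str(n))


def d(n: int) -> int:
    """Diagonal function d(n) = gn(E_n(n))."""
    return gn(apply_to(expr_of(n), n))


def star(a: NumberSet) -> NumberSet:
    """A* = {n : d(n) in A}."""
    return lambda n: a(d(n))


def complement(a: NumberSet) -> NumberSet:
    """~A = {n : n not in A}."""
    return lambda n: not a(n)


h = gn("~Px")           # 2364
print(expr_of(d(h)))    # ~P2364, the diagonalization of ~Px
```

Here `star(complement(a))` corresponds to ∼A∗ read as (∼A)∗, following the binding convention above.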

Abstract form of Gödel’s first theorem: Let P be the set of Gödel numbers of all the provable sentences. If the set ∼P∗ is expressible in L and L is correct, then there is a true sentence of L not provable in L.
Proof: (based on [84]) Suppose L is correct and ∼P∗ is expressible in L by a predicate H with Gödel number h. Let G be the diagonalization of H (i.e. the sentence H(h)). We show that G is true but not provable in L. H expresses ∼P∗ in L, i.e. H(n) is true ↔ n ∈ ∼P∗ for all n ∈ N. In particular, H(h) is true ↔ h ∈ ∼P∗. We have h ∈ ∼P∗ ↔ d(h) ∈ ∼P ↔ d(h) ∉ P. But since h is the Gödel number of H, by the definition of d, d(h) is the Gödel number of H(h), and so d(h) ∈ P ↔ H(h) is provable in L, and d(h) ∉ P ↔ H(h) is not provable in L. Now we have: H(h) is true ↔ H(h) is not provable in L. This means that H(h) is either true and not provable in L, or false but provable in L. The latter alternative violates the hypothesis that L is correct. Hence H(h) must be true but not provable in L.
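The last step of the proof above is a purely propositional case split, which can be checked mechanically. The sketch below is an illustration only (the function name is hypothetical): it enumerates the four combinations of ‘H(h) is true’ and ‘H(h) is provable’, keeps those compatible with H(h) is true ↔ H(h) is not provable, and, optionally, with correctness (whatever is provable is true).

```python
from itertools import product


def admissible_cases(assume_correct: bool) -> list[tuple[bool, bool]]:
    """(is_true, is_provable) pairs for G = H(h) allowed by the proof's facts."""
    cases = []
    for is_true, is_provable in product([False, True], repeat=2):
        # H(h) is true <-> H(h) is not provable (derived in the proof)
        self_reference = is_true == (not is_provable)
        # correctness of L: whatever is provable is true
        correct = is_true or not is_provable
        if self_reference and (correct or not assume_correct):
            cases.append((is_true, is_provable))
    return cases


print(admissible_cases(assume_correct=False))  # [(False, True), (True, False)]
print(admissible_cases(assume_correct=True))   # [(True, False)]
```

Without correctness both alternatives named in the proof remain; adding correctness leaves only ‘true but not provable’.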

Note that in this proof we have not defined the set T by a model, but determined the truth of G by a metamathematical argument, just as in step 5 of section 8.1; such an argument is nevertheless generally accepted by mathematicians. Note also that the proposition G corresponds to the proposition G of point 3 of section 8.1, since H(h) is a proposition that expresses of itself that it is not provable.

² This assumption is made for technical reasons, as it makes the proof simpler; Gödel’s original numbering did not have this restriction.

Theorem: If L is correct and the set ∼P∗ is expressible in L, then L is incomplete.
Proof: By the previous theorem, a system L that is correct and in which the set ∼P∗ is expressible contains a sentence G that is true but not provable. G is not refutable either, since by correctness every refutable sentence is false, while G is true. Hence G is undecidable in L, and L is incomplete.

That is where the name incompleteness theorem comes from. By this theorem it follows immediately that if a system is correct (and hence consistent) and the set ∼P∗ is expressible in that system (which we will later see is true for a system of basic arithmetic), then it is incomplete. Note that this corresponds to the statement A → G of point 8 in section 8.1. When we study a particular language L, such as a system containing Peano’s arithmetic or the system of Principia Mathematica, we have to verify the assumption that ∼P∗ is expressible in L. We can do this by separately verifying the following conditions:

G1: For any set A expressible in L, the set A∗ is expressible in L.
G2: For any set A expressible in L, the set ∼A is expressible in L.
G3: The set P is expressible in L.

Theorem: G1 ∧ G2 ∧ G3 → ∼P∗ is expressible in L.
Proof: G1 and G2 imply that for any expressible set A, ∼A∗ is expressible in L: by G2, ∼A is expressible, and then by G1, (∼A)∗ is expressible. In particular, if P is expressible in L (i.e. G3 holds), ∼P∗ is expressible in L.

Before we prove a general form of Gödel’s second incompleteness theorem, we introduce some more definitions.

A sentence En is a Gödel sentence for a set A of natural numbers if either En is true and its Gödel number lies in A, or En is false and its Gödel number lies outside A, i.e. En is a Gödel sentence for A if and only if En ∈ T ↔ n ∈ A.

Diagonal Lemma: For any set A, if A∗ is expressible in L, then there is a Gödel sentence for A.

Proof: Suppose H is a predicate that expresses A∗ in L; let h be its Gödel number. Then d(h) is the Gödel number of H(h). For any number n, H(n) is true ↔ n ∈ A∗; in particular, H(h) is true ↔ h ∈ A∗ ↔ d(h) ∈ A. Since d(h) is the Gödel number of H(h), H(h) is a Gödel sentence for A.
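The construction in this proof, applying a predicate to (the number of) itself, can be imitated with strings, in the spirit of Quine’s formulation of the liar. In this sketch, which is an analogy and not a model of L, a predicate is a Python string with a placeholder `{}` for its free variable, `repr` plays the role of the numeral of the Gödel number, and the hypothetical function `diag` stands for En ↦ En(n). Provability itself is not modelled; the point is only that G names an expression whose diagonalization is G itself, without any circular definition.

```python
import ast


def diag(template: str) -> str:
    """Diagonalize: substitute the quotation of `template` for its '{}'."""
    return template.format(repr(template))


# H(x): "the diagonalization of x is not provable"
H = "the diagonalization of {} is not provable"
G = diag(H)
print(G)
# the diagonalization of 'the diagonalization of {} is not provable' is not provable

# G talks about the diagonalization of the quoted expression ...
prefix, suffix = "the diagonalization of ", " is not provable"
named = ast.literal_eval(G[len(prefix):-len(suffix)])
# ... and that diagonalization is G itself: G asserts its own unprovability.
assert diag(named) == G
```

The `{}` inside the quoted copy is left untouched, because `str.format` only interprets the braces of the template, not those of the inserted argument.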

Lemma: If L satisfies G1, then for any set A expressible in L, there is a Gödel sentence for A.
Proof: L satisfies G1, thus for any expressible set A, A∗ is expressible in L. Now we can apply the previous lemma to conclude that there is a Gödel sentence for A.

With the diagonal lemma we can also prove the first theorem as follows: since ∼P∗ is expressible in L, by the diagonal lemma there is a Gödel sentence G for ∼P. A Gödel sentence for ∼P is (by the definition of a Gödel sentence) a sentence which is true if and only if it is not provable in L. So for any correct system L, a Gödel sentence for ∼P is a sentence which is true but not provable in L: if it were false, it would be provable, contradicting correctness.

8.2.2 The impossibility of an ‘internal’ proof of consistency

With the diagonal lemma we can also prove a general form of Gödel’s second theorem, which was first formulated in this form by the Polish mathematician Alfred Tarski.

A general form of Gödel’s second theorem (by Tarski)

1. The set ∼T∗ is not expressible in L.
2. If condition G1 holds, then ∼T is not expressible in L.

3. If conditions G1 and G2 both hold, then the set T is not expressible in L (i.e. for systems for which G1 and G2 hold, truth within the system is not definable within the system).
Proof: To begin with, there cannot possibly be a Gödel sentence for the set ∼T, because such a sentence would be true if and only if its Gödel number was not the Gödel number of a true sentence, and this is absurd.

1. If ∼T∗ were expressible in L, then by the diagonal lemma there would be a Gödel sentence for the set ∼T, which we have just shown is impossible. Therefore, ∼T∗ is not expressible in L.
2. Suppose condition G1 holds. Then if ∼T were expressible in L, the set ∼T∗ would be expressible in L, violating (1).
3. If G2 also holds, then if T were expressible in L, ∼T would also be expressible in L, violating (2).

Now that we have seen both theorems in a general form, we will consider particular mathematical languages, starting with first-order arithmetic, on which we can build in section 8.3 to prove the incompleteness of systems based on Peano’s arithmetic and of other systems.

8.2.3 Gödel numbering and a concrete proof of G1, G2 and G3

This section will be completed in a later version of this document. For the moment we refer to Gödel’s original work, which can be found in [93].

8.3 Gödel’s theorem and Peano Arithmetic

The classification of the various modes of syllogisms, when they are exact, has little importance in mathematics. In the mathematical sciences are found numerous forms of reasoning irreducible to syllogisms.

- G. Peano in [68, page 379]

There are various incompleteness proofs of Peano Arithmetic (with and without ). We mention three of them. The simplest uses a truth set defined by Tarski and shows that every axiomatizable subsystem of N (the of arithmetic) is incomplete. However, this proof of Gödel’s first theorem cannot be formalized in arithmetic (since the truth set is not expressible in arithmetic), and it rests on the assumption that Peano Arithmetic is correct, i.e. that every sentence provable in Peano Arithmetic is true. Gödel’s original incompleteness proof involves the much weaker assumption of ω-consistency.

Definition of simple consistency: An axiomatic system A is simply consistent := no sentence is both provable and refutable in A

Definition of ω-inconsistent: An axiomatic system A is ω-inconsistent := there is a predicate F(w) (in one free variable w) such that the sentence (∃w :: F(w)) is provable but all the sentences F(0), F(1), ... are refutable. A is ω-consistent if it is not ω-inconsistent.

Definition of ω-incomplete: An axiomatic system A is ω-incomplete := there is a predicate F(w) (in one free variable w) such that all the sentences F(0), F(1), ... are provable but the sentence (∀w :: F(w)) is not provable.

Gödel’s original proof was based on the assumption of ω-consistency and shows that every axiomatizable ω-consistent system in which all true Σ0-sentences are provable is incomplete. This proof is formalizable in Peano Arithmetic (which is necessary for Gödel’s second theorem), and it also shows that any axiomatizable system A that is simply consistent and in which all true Σ0-sentences are provable is ω-incomplete. The third proof (1936) is due to Rosser and uses the even weaker assumption of simple consistency. It is based on an axiomatic system by the American mathematician Raphael Robinson (1911-1995), which we refer to as R. It shows that every axiomatizable simply consistent extension of R is incomplete, but to this end uses a more elaborate sentence than the Gödel sentence, which asserts its own unprovability.

We intend to include the three proofs in a later version of this document. They can be found in [84], in a particular presentation that does not use the concept of a model for axiomatic systems and that sometimes attaches different meanings to established definitions; nevertheless, in our opinion it contains one of the best discussions of Gödel’s incompleteness theorems. In a later version of this document we will also show how, given the proof of incompleteness of Peano Arithmetic, Gödel’s theorems apply to Principia Mathematica.

We quote K. Gödel on the first page of [27]:

The most comprehensive formal systems that have been set up hitherto are the system of Principia Mathematica on the one hand and the Zermelo-Fraenkel axiom system of set theory (further developed by J. von Neumann) on the other. These two systems are so comprehensive that in them all methods of proof today used in mathematics are formalized, that is, reduced to a few axioms and rules of inference. One might therefore conjecture that these axioms and rules of inference are sufficient to decide any mathematical question that can at all be formally expressed in these systems. It will be shown that this is not the case, that on the contrary there are in the two systems mentioned relatively simple problems in the theory of integers that cannot be decided on the basis of the axioms.

8.4 Consequences

I had a lot of conversations with him [Gödel] and a lot of disagreements. Like most others, I was hard to convince about the incompleteness theorem. There was at the time a tendency, which I shared, to think that it was special to a certain type of formalization of logic and that a radical reformalization might have the effect that the Gödel argument did not apply. I persisted in that longer than I should have, and he was always trying to convince me otherwise.

- A. Church in an interview at Princeton University (1985)

In a later version of this document we will discuss the implications of Gödel’s theorem and show the reactions that followed the publication of his paper [27] in 1931.

8.5 Neumann-Bernays-Gödel axioms

There is an infinite set A that is not too big.

There’s no sense in being precise when you don’t even know what you’re talking about.

- John von Neumann (sources unknown)

Let us recapitulate the situation of the axiomatic theory of sets before we introduce the Neumann-Bernays-Gödel theory.

When Cantor introduced his set theory, he gave the informal definition (see page 16) of a set as ‘any comprehension into a whole M of definite and separate objects m of our intuition or thought’. After Hilbert proposed his proof theory, set theory was given a more rigorous basis, and axiomatic theories for Cantor’s sets were proposed. Cantor’s definition was replaced by the principle of comprehension (see page 16), which was adopted by Frege and Russell. Based on this principle a first formal theory of sets, called ‘ideal calculus’, was developed (not treated in detail here, see for example [36]). The antinomies of Burali-Forti and Russell, however, showed that this theory was inconsistent, and one way to restore consistency was to incorporate a theory of types into the system, as was done by Russell. At the same time, intuitionists tried to do mathematics without Cantor’s set theory at all. Others tried to overcome the inconsistencies by making Cantor’s set theory more rigidly axiomatic, and the most successful axiomatization of set theory was presented by Zermelo in 1908.

For him, the task was to axiomatize set theory in such a way that all contradictions are excluded, while the theory is still sufficiently wide for all that is valuable in it to be preserved. As we have seen in section 5.3, Zermelo postulated a domain of abstract objects (sets) and elements of this domain, took as primitive the notions of ‘equality’ and the ‘is an element of’ relation, and introduced 7 axioms. The comprehension axiom was replaced by the weaker separation axiom, which only allows new sets to be created from existing sets by means of definite predicates. Before we describe why the Hungarian mathematician von Neumann opposed this solution and came up with his own solution to the paradoxes, we will look at this separation axiom in more detail. Zermelo defined the separation axiom as follows:

Separation axiom: (∀z∃y∀x :: x ∈ y ↔ x ∈ z ∧ ϕ(x)), where ϕ is definite and does not contain y. For every set z there exists a set y whose elements are exactly those of z having the property ϕ.

The concept of definiteness in this axiom was defined by Zermelo as follows: “A question or assertion ϕ, the validity or invalidity of which is decided without arbitrariness by the basic laws of logic, is said to be ‘definite’ ”. We have already seen on page 93 that this axiom excludes the paradoxes of Russell and Burali-Forti and, as Kneebone remarks³ in [49, page 263], also the semantic paradoxes.

In [83], the Norwegian mathematician Skolem pointed out that the definition of ‘definiteness’ was rather vague, and he made the formulation ‘by the basic laws of logic’ precise. Fraenkel used Skolem’s idea to formulate the separation axiom in a new way (for details, see [49, pages 290-291]). In 1922 Fraenkel proposed the introduction of another axiom that allows the existence of larger cardinal numbers than had hitherto been possible. The foundation axiom of von Neumann rules out so-called extraordinary sets. A set V1 is extraordinary if there is an infinite sequence of sets V2, V3, ... such that V2 ∈ V1, V3 ∈ V2, etc. Von Neumann’s subsequent interest in set theory led to the second major axiomatization of set theory in the 1920s.

His formulation differed considerably from that of Zermelo and Fraenkel (see section 5.3), because the notion of function, rather than that of set, was taken as primitive. In a series of papers beginning in 1937, however, the Swiss logician Paul Bernays, a collaborator of the formalist David Hilbert, modified von Neumann’s approach in a way that brought it much closer to the theory of Zermelo and Fraenkel. In 1940, the Czech-born Kurt Gödel, known for his incompleteness proof (see section 8.1), further simplified the theory. This version is known as the Neumann-Bernays-Gödel (NBG) axioms.

³ We quote: “since a definite property is one that is decidable by the basic relations of the domain B [of sets, the abstract objects postulated by Zermelo], no such property as that of being definable in a finite number of words can be used in the definition of a set, and the semantic paradoxes are thus also excluded”.

Before we give the axioms, it is convenient to adopt the undefined notions of class and the membership relation (though, as is also true in Zermelo and Fraenkel, ∈ suffices). In the axioms we distinguish between capital Latin letters and lowercase Latin letters for the variables. The capital letters stand for variables that take classes (the totalities corresponding to certain properties) as values. A class is defined to be a set if it is a member of some class; those classes that are not sets are called proper classes. The lowercase letters are used as special restricted variables for sets.

Example: ‘for all x, A(x)’ stands for ‘for all X, if X is a set, then A(X)’; i.e. the condition holds for all sets. Intuitively, sets are intended to be those classes that are adequate for mathematics, and proper classes are thought of as those collections that are ‘so big’ that, if they were permitted to be sets, contradictions would follow. In the Neumann-Bernays-Gödel axioms, the classical paradoxes are avoided. This can be shown by proving in each case that the collection on which the paradox is based is a proper class, i.e. is not a set.

Theorem: With the Neumann-Bernays-Gödel axioms, the derivation of Russell’s paradox does not apply.

Proof: We show that R := {x | x is a set ∧ x ∉ x} is a class, but not a set. For all y we have that y ∈ R ↔ y is a set ∧ y ∉ y. We prove by contradiction that R is not a set. Suppose R is a set. Suppose first that R ∈ R. Then we have (taking R for y in the above statement) R ∈ R ↔ R is a set ∧ R ∉ R, so R ∉ R: contradiction. So we must have R ∉ R. Then, by our assumption, R is a set ∧ R ∉ R, and thus R ∈ R: contradiction. Since in both cases (R ∈ R and R ∉ R) we get a contradiction, our assumption that R is a set must be wrong.

The Neumann-Bernays-Gödel axioms (NBG):

1 Extensionality axiom (or axiom of determination): (∀X, Y :: (∀z :: z ∈ X ↔ z ∈ Y) → X = Y). Classes are uniquely determined by their members; to be exact: if every element (that is, every set) of a class X is at the same time an element of Y, and conversely, then X = Y.

2 Axiom of the empty set: (∃x∀y :: y ∉ x). There is an (improper, see also the footnote on page 93) set, the ‘null’ or ‘empty’ set, which contains no elements at all.
3 Axiom for class formation: (∃Y∀x :: x ∈ Y ↔ ϕ(x)), where ϕ is a proposition in which only set variables are bound by quantifiers. For every such property ϕ there exists a class Y whose elements are exactly the sets having the property ϕ.
4 Pairing axiom: (∀a, b :: (∃y∀x :: x ∈ y ↔ x = a ∨ x = b)). Given two sets a and b there exists a set whose elements are exactly a and b.
5 Sum-set axiom or Union axiom: (∀z∃y∀x :: x ∈ y ↔ (∃w :: w ∈ z ∧ x ∈ w)). For every set z there exists a set y whose elements are exactly those objects occurring in at least one element of z.
6 Power set axiom: (∀z∃y∀x :: x ∈ y ↔ x ⊆ z). For every set z there is a set y whose elements are exactly the subsets of z.
7 Axiom of infinity: (∃z :: ∅ ∈ z ∧ (∀a : a ∈ z : {a} ∈ z)). There exists a set that contains ∅ and, together with every element a, also {a}.
8 Axiom of choice: (∀x :: (∃f : f is a function : Dom(f) = x − {∅} ∧ (∀a : a ∈ Dom(f) : f(a) ∈ a))). Every set x has a choice function, which selects an element from each non-empty element of x.
9 Axiom of replacement or axiom of substitution (by Fraenkel): (∀x∃!y :: ϕ(x, y)) → (∀a :: (∃b∀y :: y ∈ b ↔ (∃x : x ∈ a : ϕ(x, y)))), where ϕ is a class. The image of a set under an operation (functional property) is again a set.
10 Axiom of restriction: X ≠ ∅ → (∃y : y ∈ X : X ∩ y = ∅). Every non-empty class is disjoint from one of its elements.
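Several of these axioms can be tried out on hereditarily finite sets. The sketch below models sets as nested Python `frozenset`s; this is an assumption for illustration only: every set here is finite, there are no proper classes, and the axiom of infinity can only be imitated by generating finitely many of the sets it demands. Equality of `frozenset`s is extensional, mirroring axiom 1. The function names are hypothetical.

```python
from itertools import combinations

EMPTY = frozenset()  # axiom 2: the empty set


def pair(a: frozenset, b: frozenset) -> frozenset:
    """Axiom 4: the set whose elements are exactly a and b."""
    return frozenset({a, b})


def union(z: frozenset) -> frozenset:
    """Axiom 5: the set of all elements of elements of z."""
    return frozenset(x for w in z for x in w)


def power(z: frozenset) -> frozenset:
    """Axiom 6: the set of all subsets of z."""
    elements = list(z)
    return frozenset(
        frozenset(c)
        for r in range(len(elements) + 1)
        for c in combinations(elements, r)
    )


def zermelo_numerals(count: int) -> list[frozenset]:
    """First members of a set as required by axiom 7: {}, {{}}, {{{}}}, ..."""
    result, current = [], EMPTY
    for _ in range(count):
        result.append(current)
        current = frozenset({current})
    return result


zero, one, two = zermelo_numerals(3)
print(union(pair(one, two)) == pair(zero, one))  # True
print(len(power(pair(zero, one))))               # 4
```

Here `union(pair(one, two))` collects the elements ∅ of {∅} and {∅} of {{∅}}, giving {∅, {∅}}.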

The axioms 1, 3, 9 and 10 differ from those of ZF. The third axiom (scheme) is presented in a form that facilitates comparison with the third axiom (scheme) of ZF. In a detailed development of NBG, however, there appears instead a list of seven axioms (not schemes) stating that, for each of certain conditions, there exists a corresponding class of all those sets satisfying the condition. From this finite set of axioms, each instance of the above scheme can be obtained as a theorem. When obtained in this way, the third axiom scheme of NBG is called the class existence theorem.

In contrast to the ninth axiom scheme of ZF (see section 5.3.2), that of NBG is not an axiom scheme but an axiom. Thus, with the comments above about the third axiom in mind, it follows that NBG has only a finite number of axioms. On the other hand, since the ninth axiom or scheme of ZF provides an axiom for each formula, ZF has infinitely many axioms. The finiteness of the axioms for NBG makes the logical study of the system simpler.

The relationship between the theories may be summarized by the statement that ZF is essentially the part of NBG that refers only to sets. We give the following theorems without proof:

Theorem: Every theorem of ZF is a theorem of NBG

Theorem: Any theorem of NBG that speaks only about sets is a theorem of ZF

Theorem: ZF is consistent if and only if NBG is consistent

Note that the fact that NBG avoids the classical paradoxes, and that there is no apparent way to derive any of them in ZF, does not settle the question of the consistency of either theory. All we know from the last theorem is that either both theories are consistent, or both are inconsistent.

Chapter 9

Church and Turing

9.1 Turing and Turing Machine

We may hope that machines will eventually compete with men in all purely intellectual fields.

- A. Turing in [38, page 46]

Alan Mathison Turing (1912-1954) was an English mathematician and logician who pioneered the field of computer theory and contributed important logical analyses of computing processes. Turing studied in Cambridge, worked there on probability theory and (unaware of Lindeberg’s earlier proof) rediscovered the central limit theorem. In 1936 he won the Smith’s Prize. As we have seen in the previous chapters, many mathematicians had attempted to eliminate all possible error from mathematics by establishing a formal, or purely algorithmic, procedure for establishing truth (the so-called formalist program). With his incompleteness theorem (see section 8.1), Kurt Gödel threw up an obstacle to this effort, for he showed that any useful mathematical axiom system is incomplete in the sense that there must exist propositions whose truth can never be decided (within the system). Turing was motivated by Gödel’s work to seek an algorithmic method of determining whether any given propositions were undecidable, with the ultimate goal of eliminating them from mathematics. Instead, he proved in his seminal paper ‘On Computable Numbers, with an Application to the Entscheidungsproblem’ (reprinted in [19]) that there cannot exist any such universal method of determination. We now consider this decision problem, or Entscheidungsproblem, in more detail.

Decidability was one of Hilbert’s requirements for an axiomatic system (see section 6.1). The problem of decidability asks whether there is an algorithm which, given any mathematical proposition, decides whether the proposition is true or false. Given a particular algorithm, it is usually easy to see that it decides certain propositions; it is much more difficult to prove that there is no algorithm at all that decides a given class of propositions. To this end Turing introduced a hypothetical computing device (later called the Turing machine). The Turing machine and the proof of undecidability are given later in this section.
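Turing’s negative answer rests on a diagonal argument. The sketch below shows its shape in the modern textbook form for halting, which is not Turing’s original 1936 formulation in terms of circle-free machines: given any claimed decider `halts`, one constructs a program that does the opposite of what `halts` predicts about it. To keep the sketch terminating, programs are Python functions that return the label "loops" instead of actually looping; the names `contrarian_for` and `refutes` are hypothetical.

```python
from typing import Callable

Program = Callable[[], str]          # returns "halts" or "loops" (stand-in for looping)
Decider = Callable[[Program], bool]  # claims to predict whether a program halts


def contrarian_for(halts: Decider) -> Program:
    """Diagonal program: behaves opposite to the decider's prediction about itself."""
    def contrarian() -> str:
        return "loops" if halts(contrarian) else "halts"
    return contrarian


def refutes(halts: Decider) -> bool:
    """True if `halts` is wrong about its own contrarian program."""
    g = contrarian_for(halts)
    return halts(g) != (g() == "halts")


candidates = [
    lambda p: True,                        # "everything halts"
    lambda p: False,                       # "nothing halts"
    lambda p: p.__name__.startswith("c"),  # an arbitrary syntactic test
]
print(all(refutes(h) for h in candidates))  # True
```

Whatever deterministic rule is plugged in as `halts`, its prediction about `contrarian_for(halts)` is wrong, which is the content of the undecidability argument.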

After this important publication Turing completed his Ph.D. in 1938 on systems of logic based on ordinals, under the direction of Alonzo Church (see section 9.2). During the war Turing worked on breaking German Enigma codes, and in 1948 he worked in Manchester on the construction of a new digital computer. He described a modern computer before technology had reached the point where its construction was a realistic possibility. His efforts in the construction of early computers and the development of early programming techniques were of prime importance. He also championed the theory that computers could eventually be constructed that would be capable of human thought, and he proposed a simple test, now known as the Turing test, to assess this capability. Turing’s papers on the subject are widely acknowledged as the foundation of research in artificial intelligence. In 1952 Turing published the first part of his theoretical study of morphogenesis, the development of pattern and form in living organisms.

The Turing Machine

Turing introduced his hypothetical computing device in 1936. He originally conceived the machine as a mathematical tool that could infallibly recognize undecidable propositions, i.e. those mathematical statements that, within a given formal axiomatic system (that includes at least arithmetic), cannot be shown to be either true or false. Gödel had demonstrated that such propositions exist in any such system. Turing instead proved that there can never exist any universal algorithmic method for determining whether a proposition is undecidable. This question was left open by Gödel, since the incompleteness theorem (see section 8.1) only stated that consistency and completeness could not be attained at the same time; that is, there were statements about numbers (in consistent systems), indubitably true, which could not be proved from finitely many rules. But the decidability of mathematical statements was not settled by Gödel’s theorem, because settling it requires a formal definition of (algorithmic) method in the formulation of the problem (or a definition of the notion of algorithm in the definition of decidability in section 6.1). To this end Turing introduced a machine that was later to be called the Turing machine, an idealized mathematical model that reduces the logical structure of any computing device to its essentials. By isolating the essential features of information processing, Turing was instrumental in the development of the modern digital computer. His model is often regarded as the conceptual basis of subsequent digital computers, which share his basic scheme of an input/output device (tape and head), memory (tape) and central processing unit (head and transition function).

Nowadays many models of computing devices are studied in computability and complexity theory. We will not cover restricted models such as finite automata and pushdown automata (and the corresponding notions of regular languages and context-free grammars). Instead we directly introduce Turing's much more powerful model, which we need in order to investigate mathematical problems in general.

The Turing Machine model uses an infinite tape as its unlimited memory, and has a tape head that can read and write symbols (of a set Γ) and move around the tape (to the L(eft) or R(ight)). We here assume the tape is right-infinite; this means the tape continues infinitely to the right but has a leftmost position. Initially the tape contains an input string of symbols from an input alphabet Σ and is blank (i.e. filled with a special blank symbol ⊔) everywhere else. The Turing Machine is in a state q of a set of states Q, and starts in an initial state q0. It uses a transition function δ that determines how it gets from one configuration (that is, the current state, the tape contents and the head location) to the next. Each transition consists of writing a symbol of the tape alphabet Γ to the tape and moving the tape head either Left or Right, and depends on the current state and the current symbol on the tape. This computation (i.e. sequence of transitions) continues until the Turing Machine enters either the (final) state q_accept or the (final) state q_reject, if it ever does. We can define a Turing Machine (sometimes called deterministic, since each transition is determined uniquely by the configuration) formally as a septuple:

Definition of a Turing Machine (TM): A Turing Machine (TM) := (Q, Σ, Γ, δ, q0, q_accept, q_reject) with:
1. Q is a finite set of states.
2. Σ is a finite input alphabet not containing the special blank symbol ⊔.
3. Γ is a finite tape alphabet, where ⊔ ∈ Γ and Σ ⊆ Γ.
4. δ : Q × Γ → Q × Γ × {L, R} is the transition function.
5. q0 ∈ Q is the start state.
6. q_accept ∈ Q is the accept state.
7. q_reject ∈ Q is the reject state, where q_reject ≠ q_accept.

We call configurations accepting configurations if the state is q_accept, rejecting configurations if the state is q_reject, and halting configurations if the state is either q_accept or q_reject. A start configuration C on input w is a configuration with state q0, the head on the leftmost position of the tape, and just w on the tape.
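The definitions above can be turned into a small simulator. The following Python sketch is only an illustration under stated assumptions: δ is a dictionary from (state, symbol) pairs to (state, symbol, move) triples, the character ‘_’ stands in for the blank ⊔, a missing δ entry is treated as a move to q_reject, and a head that is told to move left from the leftmost cell stays where it is. Because a TM need not halt, the hypothetical function run_tm gives up after a step budget and then returns None.

```python
from typing import Dict, Optional, Tuple

BLANK = "_"  # stands in for the blank symbol ⊔

# δ as a table: (state, read symbol) -> (next state, written symbol, "L" or "R")
Delta = Dict[Tuple[str, str], Tuple[str, str, str]]


def run_tm(delta: Delta, q0: str, q_accept: str, q_reject: str,
           w: str, max_steps: int = 10_000) -> Optional[str]:
    """Simulate a deterministic TM on a right-infinite tape from the start
    configuration on input w.

    Returns "accept" or "reject" once a halting configuration is reached, or
    None if max_steps transitions were taken without halting (M may loop).
    """
    tape = list(w) or [BLANK]  # input on the leftmost cells, blanks beyond
    state, head = q0, 0
    for _ in range(max_steps):
        if state in (q_accept, q_reject):
            break
        # Assumption: an undefined transition counts as a move to q_reject.
        state, write, move = delta.get((state, tape[head]), (q_reject, tape[head], "R"))
        tape[head] = write
        if move == "R":
            head += 1
            if head == len(tape):
                tape.append(BLANK)  # right-infinite tape: grow it on demand
        else:
            head = max(head - 1, 0)  # assumption: no move off the left end
    if state == q_accept:
        return "accept"
    if state == q_reject:
        return "reject"
    return None


# Hypothetical example machine: accept binary strings with an even number of 1s.
EVEN_ONES: Delta = {
    ("even", "0"): ("even", "0", "R"),
    ("even", "1"): ("odd", "1", "R"),
    ("odd", "0"): ("odd", "0", "R"),
    ("odd", "1"): ("even", "1", "R"),
    ("even", BLANK): ("acc", BLANK, "R"),
    ("odd", BLANK): ("rej", BLANK, "R"),
}
print(run_tm(EVEN_ONES, "even", "acc", "rej", "1001"))  # accept
```

The step budget is the only way a simulator can report anything about a machine that loops; a result of None therefore means "no verdict yet", not "rejects".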

After defining the Turing Machine, Turing made his famous proposal (known as Turing's thesis, see also section 9.3) for the concept of ‘computability by a Turing machine’. The proposal says that whenever there is an effective method for obtaining the values of a mathematical function (i.e. it is intuitively or effectively computable), the function can be computed by a Turing Machine. The converse claim, that every function computable by a Turing Machine is effectively computable, is trivial; and if the thesis is correct, we can reduce problems of the (non-)existence of effective methods to problems of the (non-)existence of Turing Machines. We quote one of Turing's formulations from [90]:

Turing’s Thesis: LCM’s [Logical Computing Machines, Turing’s expression for Turing Machines] can do anything that could be described as “rule of thumb” or “purely mechanical”.

We now introduce more of Turing’s theory of Turing Machines before we give his proof of undecidability.

We define a language to be a set of strings, a string being a finite sequence of alphabet symbols (i.e. w ∈ Σ∗ for every string w). We say that a TM M accepts input string w if a sequence of configurations C1, ..., Ck exists where

1. C1 is the start configuration of M on input w.

2. Each C_i yields C_{i+1} via the transition function δ of M.

3. Ck is an accepting configuration.

The set of strings that M accepts is called the language of M.

Definition of the language of a TM: The language of a TM M, notation L(M) := {w | w is a string that M accepts}.

Let w ∈ Σ∗. We now define a notion that covers the ability of a TM to end in the accept state when started with any string of a certain language.

Definition of Turing-recognizable: A language L is Turing-recognizable := there exists a TM M such that for all strings w:

1. with input w, M halts in q_accept if w ∈ L, and

2. with input w, M halts in q_reject or does not halt (loops) if w ∉ L.

If language L is recognized by a TM M in this sense, we say that M is an acceptor for L. We distinguish between recognizing and deciding capabilities.

Definition of Turing-decidable (or decidable): A language L is decided by a TM M := M is a deterministic TM such that for all strings w:

1. with input w, M halts in q_accept if w ∈ L, and

2. with input w, M halts in q_reject if w ∉ L.

If a language L is decided by a TM M we say that M is a decider for L.

There are several variants of Turing Machines, such as two-way infinite Turing Machines, multitape Turing Machines, non-deterministic Turing Machines and certain types of so-called enumerators. Most variants are equivalent in the sense that they can recognize the same set of languages (though not necessarily equally efficiently).

Example: We now give an example of a Turing Machine solving a mathematical problem by first defining it as a language problem. The problem (idea from [56]) is to design a Turing Machine that computes the function

f(x, y) = x + y if x ≥ y

f(x, y) = 0 if x < y

For simplicity, we assume x and y to be positive integers. First we have to choose a notation for representing positive integers, and decide what the initial situation of the tape is. We choose a unary notation in which any positive integer x is represented by w(x) ∈ {1}+, such that |w(x)| = x. We assume that w(x) and w(y) are on the tape in unary notation, separated by a single ‘0’, and with the read-write head on the leftmost symbol of w(x). We first describe how the sum of x and y can be calculated, then how the comparison x ≥ y can be made, and finally how to combine those two machines into a Turing Machine that computes the desired function.

Calculating the sum

To add the two numbers a and b, we only have to remove the separating ‘0’, so addition amounts to the concatenation of two strings. The following Turing Machine, called Adder, adds a and b and is relatively simple to construct: Adder = (Q, Σ, Γ, δ, q0, qA, qR), with

Q = {q0, q1, ..., q4}
Σ = {0, 1}
Γ = {0, 1, ⊔}
start state: q0
qA = q4
qR: not used (the Adder never rejects)

δ(q0, 1) = (q0, 1, R)
δ(q0, 0) = (q1, 1, R)
δ(q1, 1) = (q1, 1, R)
δ(q1, ⊔) = (q2, ⊔, L)
δ(q2, 1) = (q3, 0, L)
δ(q3, 1) = (q3, 1, L)
δ(q3, ⊔) = (q4, ⊔, R)

Note that we remove the ‘0’ by temporarily creating an extra ‘1’, a fact that is remembered by putting the machine into state q1. The transition δ(q2, 1) = (q3, 0, L) is needed to remove this ‘1’ at the end of the computation: it overwrites the last ‘1’ with a ‘0’, leaving w(a + b) followed by a single ‘0’. Finally, we move the read-write head back to the leftmost ‘1’. This is not strictly necessary in this example, because the machine is designed such that it will terminate right after any addition, but it is not harmful, and it is normally a good habit to let any action terminate in a state from which it is easy to take further transitions.
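The following Python sketch runs the Adder on concrete inputs. Two assumptions are made: ‘_’ stands in for the blank ⊔, and the tape is taken to be blank in both directions, because the final transition δ(q3, ⊔) = (q4, ⊔, R) needs a blank cell to the left of w(a) (on the right-infinite tape of the definition, the head would keep reading the leftmost ‘1’ and never find one). The helper names run and add are hypothetical.

```python
from collections import defaultdict

BLANK = "_"  # stands in for the blank symbol ⊔

# The Adder's transition function: (state, read) -> (next state, write, move).
ADDER = {
    ("q0", "1"): ("q0", "1", "R"),
    ("q0", "0"): ("q1", "1", "R"),
    ("q1", "1"): ("q1", "1", "R"),
    ("q1", BLANK): ("q2", BLANK, "L"),
    ("q2", "1"): ("q3", "0", "L"),
    ("q3", "1"): ("q3", "1", "L"),
    ("q3", BLANK): ("q4", BLANK, "R"),
}


def run(delta, start, final_states, tape_str, max_steps=10_000):
    """Run a deterministic TM on a tape that is blank in both directions.

    Returns (final state, non-blank tape contents, head position).
    """
    tape = defaultdict(lambda: BLANK, enumerate(tape_str))
    state, head = start, 0  # head starts on the leftmost input symbol
    for _ in range(max_steps):
        if state in final_states or (state, tape[head]) not in delta:
            break
        state, write, move = delta[(state, tape[head])]
        tape[head] = write
        head += 1 if move == "R" else -1
    cells = [i for i, s in tape.items() if s != BLANK]
    contents = "".join(tape[i] for i in range(min(cells), max(cells) + 1)) if cells else ""
    return state, contents, head


def add(a: int, b: int) -> int:
    """Compute a + b (a, b positive) by running the Adder on 1^a 0 1^b."""
    state, contents, _ = run(ADDER, "q0", {"q4"}, "1" * a + "0" + "1" * b)
    if state != "q4":
        raise RuntimeError("the Adder did not reach its accept state q4")
    return contents.count("1")


print(run(ADDER, "q0", {"q4"}, "110111"))  # ('q4', '111110', 0)
print(add(2, 3))                           # 5
```

The printed tape ‘111110’ shows w(5) followed by the single ‘0’ written by δ(q2, 1), with the head back on the leftmost ‘1’.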

Comparison

To compare two numbers a and b, we again assume they are written in the notation that we used before, separated by a single ‘0’. We will construct a Turing Machine, called Comparer, that halts in an accepting state if a ≥ b and in a rejecting state if a < b: Comparer = (Q, Σ, Γ, δ, q0, qA, qR), with

Q = {q0, q1, q2, q3, q4, q5, q6, q7}
Σ = {0, 1}
Γ = {0, 1, x, y, ⊔}
start state: q0
qA = q6
qR = q7

The transitions of δ can be grouped in several parts.

δ(q0, 1) = (q1,x,R)

δ(q1, 1) = (q1, 1,R)

δ(q1, 0) = (q2, 0,R)

δ(q2,y)=(q2,y,R)

δ(q2, 1) = (q3,y,L)

This set replaces the leftmost ‘1’ of a with ‘x’, then causes the read-write head to travel right to the first ‘1’ of b and replace it with the symbol ‘y’. When the dividing ‘0’ is passed, the machine enters state q2, indicating that it is now dealing with the number b. When the symbol ‘y’ has been written, the machine enters state q3, indicating that one ‘1’ of b (now a ‘y’) has been successfully paired with a ‘1’ of a (now an ‘x’). The next set of transitions reverses the direction, repositions the read-write head over the leftmost remaining ‘1’ of a, and returns control to the initial state:

δ(q3,y)=(q3,y,L)

δ(q3, 0) = (q4, 0,L)

δ(q4, 1) = (q4, 1,L)

δ(q4, x) = (q0, x, R)

The rewriting continues this way on an input string 1^a 0 1^b (a ‘1’s, a ‘0’, then b ‘1’s), stopping only when on one side no more ‘1’s can be replaced. In that case either the left side no longer contains any ‘1’s (a ≤ b), or the right side has run out of ‘1’s (a > b). If the left side no longer contains any ‘1’s, the transition δ(q4, x) = (q0, x, R) will leave the read-write head on a ‘0’ instead of a ‘1’.

δ(q0, 0) = (q5, x, L)   (a ≤ b)

δ(q2, ⊔) = (q6, ⊔, L)   (a > b)

In the first case we still have to check whether the right side has any ‘1’s left, to determine whether a = b. This is done in state q5.

δ(q5,x)=(q5,x,R)

δ(q5, 0) = (q5, 0,R)

δ(q5,y)=(q5,y,R)

δ(q5, 1) = (q7, y, R)   (a < b)

δ(q5, ⊔) = (q6, ⊔, L)   (a = b)
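To check that these transition groups really compare a and b, the Python sketch below runs them on unary inputs. It assumes ‘_’ for the blank ⊔ and treats q6 as the accepting and q7 as the rejecting halting state, since those are the states entered by the final transitions for a ≥ b and a < b respectively; the function names are hypothetical.

```python
from collections import defaultdict

BLANK = "_"  # stands in for the blank symbol ⊔

COMPARER = {
    # pair the next '1' of a (marked x) with the next '1' of b (marked y)
    ("q0", "1"): ("q1", "x", "R"),
    ("q1", "1"): ("q1", "1", "R"),
    ("q1", "0"): ("q2", "0", "R"),
    ("q2", "y"): ("q2", "y", "R"),
    ("q2", "1"): ("q3", "y", "L"),
    # walk back to the leftmost unmarked '1' of a
    ("q3", "y"): ("q3", "y", "L"),
    ("q3", "0"): ("q4", "0", "L"),
    ("q4", "1"): ("q4", "1", "L"),
    ("q4", "x"): ("q0", "x", "R"),
    # one side has run out of '1's
    ("q0", "0"): ("q5", "x", "L"),      # a <= b
    ("q2", BLANK): ("q6", BLANK, "L"),  # a > b
    # a <= b: does b still have an unmarked '1'?
    ("q5", "x"): ("q5", "x", "R"),
    ("q5", "0"): ("q5", "0", "R"),
    ("q5", "y"): ("q5", "y", "R"),
    ("q5", "1"): ("q7", "y", "R"),      # a < b
    ("q5", BLANK): ("q6", BLANK, "L"),  # a = b
}


def final_state(delta, start, tape_str, max_steps=10_000):
    """Run until no transition applies and return the state the machine stops in."""
    tape = defaultdict(lambda: BLANK, enumerate(tape_str))
    state, head = start, 0
    for _ in range(max_steps):
        key = (state, tape[head])
        if key not in delta:
            return state
        state, write, move = delta[key]
        tape[head] = write
        head += 1 if move == "R" else -1
    raise RuntimeError("step budget exhausted")


def at_least(a: int, b: int) -> bool:
    """True iff the Comparer halts in q6 on input 1^a 0 1^b, i.e. a >= b."""
    state = final_state(COMPARER, "q0", "1" * a + "0" + "1" * b)
    if state not in ("q6", "q7"):
        raise RuntimeError(f"unexpected halting state {state}")
    return state == "q6"


print([at_least(3, 1), at_least(2, 2), at_least(1, 3)])  # [True, True, False]
```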

Combining Turing Machines for complicated tasks

We now have to put together the Turing Machines Adder and Comparer to obtain the desired Turing Machine that computes the given function. We can do this by starting with the input a and b in the previously described notation and starting position, and using Comparer to determine whether or not a ≥ b. We index all states of Comparer with a C, i.e. the last transition will be δ(qC,2, ⊔) = (qC,6, ⊔, L) or δ(qC,5, ⊔) = (qC,6, ⊔, L) if a ≥ b, and δ(qC,5, 1) = (qC,7, y, R) if a < b. In the first case (a ≥ b), the Comparer should send a ‘start signal’ to the Adder, to give a + b as output. In the second case (a < b), it should send a start signal to a machine Eraser that erases the input from the tape, to give 0 as output.

We index the states of Adder and Eraser with A and E in the same way; setting δ(qC,6, ∗) = δ(qA,0, ∗) and δ(qC,7, ∗) = δ(qE,0, ∗), for every tape symbol ∗, brings the Adder respectively the Eraser into its initial state. The Adder respectively Eraser will then give the desired output, provided that the configuration in which the Comparer terminates is suitable as a start configuration for them. This is not yet the case: when the Comparer enters a final state, the tape no longer holds the initial representation of the numbers a and b, since the ‘1’s have been replaced by ‘x’s and ‘y’s, and the head is not on the leftmost symbol. We can easily fix this (it is just some extra work; you can try it as an exercise if you want) by letting Comparer, as its last actions before entering a final state, restore the original representation of a and b and move the head back to the leftmost symbol. The result is a Turing Machine that combines Comparer, Adder and Eraser to compute the function f. Similarly to this example, we can for example multiply two numbers a and b, and we can also translate macro-instructions like ‘if p then qj else qk’ (meaning that when we read ‘p’ on tape, the Turing Machine goes into state qj and otherwise into state qk), and even combine them into complicated subprograms that can be invoked repeatedly whenever needed. (End of Example)

The Halting Problem

After introducing the notion of a TM in [89], Turing answered Hilbert’s decision problem for mathematical logic (in German called ‘Entscheidungsproblem’) in the negative. The Entscheidungsproblem asks whether there exists a definite method or algorithm which (at least in principle) can be applied to any given mathematical proposition to decide whether that proposition is provable. We now identify the notion of an algorithm with the notion of a Turing Machine, and a set of propositions with a language, i.e. a set of strings. If we look at the definition of decidability in section 6.1, we have that for all formulas ϕ an algorithm, i.e. a TM, exists that decides whether ϕ is true or not. If we code ϕ by means of a string, and this is always possible (see the previous example for a demonstration), we can reformulate the problem as: for every language L, does there exist a TM M that decides, for all strings w, whether w ∈ L? We now show that this is not possible for all problems (i.e. languages) by giving a specific problem, the Halting problem, that is not decidable.

The Halting problem is the problem of testing whether a TM accepts a given input string. We define the problem by stating it as a language problem, and asking whether that language is decidable.

Definition of the Halting problem: H := {⟨M, w⟩ | M is a TM and M accepts w}, where ⟨M, w⟩ denotes a string encoding the TM M together with the input string w. Is H decidable? (That is: is there a TM that decides, for every string ⟨M, w⟩, whether it belongs to H or not? Or, using Turing’s thesis (see section 9.3): is there an algorithm that decides, for every TM M and string w, whether M accepts w?)

Theorem: H is recognizable. Proof (by Turing): The following TM U, also called the Universal Turing Machine because it is capable of simulating any other Turing Machine, recognizes H. We define U informally, because a detailed definition of the septuple such a TM consists of (see the definition of a TM) is a lot of work.

Description of the Universal Turing Machine: U = “On input ⟨M, w⟩, where M is a TM and w is a string: 1. Simulate M on input w. 2. If M ever enters its accept state, accept; if M ever enters its reject state, reject.” Note that this TM loops on input ⟨M, w⟩ if M loops on w, which is why this machine does not decide H. If the algorithm had some way to determine that M was not halting on w, it could reject ⟨M, w⟩. Hence H is sometimes called the Halting problem. As Turing demonstrated, an algorithm has no way to make this determination.

Theorem: H is undecidable (see also [82, page 165]). Proof (by Turing): We assume H is decidable and obtain a contradiction. Suppose D is a decider for H, defined by D(⟨M, w⟩) := “• accept if M accepts w • reject if M does not accept w”. Now we construct a new TM O with D as a subroutine. This new TM calls D to determine what M does when the input to M is its own description ⟨M⟩. Once O has determined this information, it does the opposite. That is, it rejects if M accepts and accepts if M does not accept. The following is a description of O: O = “On input ⟨M⟩, where M is a TM:

1. Run D on input ⟨M, ⟨M⟩⟩.

2. Output the opposite of what D outputs; that is, if D accepts, reject, and if D rejects, accept.”

We summarize the behavior of O as follows: O(⟨M⟩) = “

• accept if M does not accept ⟨M⟩

• reject if M accepts ⟨M⟩”

Now we obtain the contradiction by running O with its own description ⟨O⟩ as input. In that case we get: O(⟨O⟩) = “

• accept if O does not accept ⟨O⟩

• reject if O accepts ⟨O⟩”

Either way O does the opposite of what it does, which is impossible. Thus neither O nor D can exist.
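The self-reference in this proof can be made concrete with a small Python sketch. It is an illustration, not a proof: a "machine" is modelled (as an assumption) by a Python function returning True for accept and False for reject, a function is passed to itself in place of its description ⟨M⟩, and only finitely many candidate deciders are tried. The names build_O and D_is_wrong_about_O are hypothetical.

```python
from typing import Callable

# Illustrative model: a machine is a function of one argument returning
# True (accept) or False (reject); a claimed decider D takes a machine and an input.
Machine = Callable[[object], bool]
ClaimedDecider = Callable[[Machine, object], bool]


def build_O(D: ClaimedDecider) -> Machine:
    """The machine O from the proof: run D on <M, <M>> and do the opposite."""
    def O(M: Machine) -> bool:
        return not D(M, M)
    return O


def D_is_wrong_about_O(D: ClaimedDecider) -> bool:
    """Compare D's verdict on <O, <O>> with what O actually does on <O>."""
    O = build_O(D)
    claimed = D(O, O)  # D's claim: "O accepts <O>"
    actual = O(O)      # O's actual behaviour on input <O>
    return claimed != actual


candidates = [lambda M, w: True, lambda M, w: False, lambda M, w: callable(w)]
print([D_is_wrong_about_O(D) for D in candidates])  # [True, True, True]
```

Whatever a halting candidate D answers about ⟨O, ⟨O⟩⟩, O is built to do the opposite, so every candidate is wrong on at least this one input.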

Turing wrote in his last publication about the interpretation of unsolvable problems, such as the Halting problem for Turing machines:

These ... may be regarded as going some way towards a demonstration, within mathematics itself, of the inadequacy of ‘reason’ unsupported by common sense.

- Alan Turing

In this section I have made extensive use of [38] [92] for information on the life and work of Turing and [89] [82] [19] for the theory of TMs and the Halting problem. Another valuable source of information on Turing’s life and work is the website http://www.turing.org.uk/

9.2 Church and the Lambda Calculus

Alonzo Church (1903-1995) was an American mathematician whose work is of major importance in mathematical logic, recursion theory and theoretical computer science. One of his most important contributions to logic is his invention in the 1930s of the lambda calculus. He is also remembered for Church’s thesis, published in 1936 in [14, pages 345-363], stating that the lambda calculus can be used to embody a correct formalization of the notion of computability (see section 9.3). The notion of lambda definability is conceptually the basis for the discipline of functional programming, and the lambda calculus is also the basis for type theory. Church was also a founding editor of the Journal of Symbolic Logic, which started in 1936. He had 31 doctoral students, including famous mathematicians such as Turing, Kleene, Kemeny and Smullyan. We now introduce the lambda calculus (Church’s formalization of the notion of effective calculability) in a modern setting, using [9, chapter 4].

Application and abstraction

First we introduce the basic concepts of λ-calculus. A formalization follows thereafter. The lambda calculus has only two basic operations, abstraction and application.

• Abstraction is for constructing functions: for an expression E we introduce λx . E to denote the abstraction of E over x, i.e. ‘the function of x which computes E’. Example¹: λx . x + 1, λn . n × n, etc. We will later see how to define a recursive function; this is not so easy since we do not have function names.

• (Function) application: The expression F A denotes that F is considered as a function (an algorithm) applied to input A. The original lambda calculus theory is type-free, so we also consider F F, that is, F applied to itself. Example: (λx . x + 1) 4, (λn . n × n) 7, etc.

¹ Note that in some examples we have simplified the notation for the clarity of the example, since in pure lambda calculus we do not have arithmetic symbols like + and ×; we can, however, encode these operations in the pure lambda calculus, as we will mention later.

These two notions become very powerful once we introduce the rule of beta reduction, which allows us to apply an abstraction to an argument and, for example, rewrite (λx . x + 1) 4 to 4 + 1. Similarly (λn . n × n) 7 can be reduced to 7 × 7. Arbitrary nesting is also allowed: ((λn . λx . (x + 1) × n) 7) 4 can be reduced to (λx . (x + 1) × 7) 4 and then to (4 + 1) × 7. As in ordinary mathematics, the names of the variables are irrelevant to the rules that can be applied, which allows a renaming of the variables (also known as dummy transformation). This rule in lambda calculus is called alpha conversion. For example, alpha conversion allows us to rewrite λn . n n to λx . x x, since they are essentially the same function.

Note that we also want to use functions as variables and arguments: ((λf . (λn . λx . f x × n) 7) (λy . y + 1)) 4 should reduce to the earlier expression. But above we only have functions of one argument; we now introduce functions with more arguments, while avoiding new notations. We can solve this problem by iterating applications, a technique often called currying after the American mathematician H.B. Curry, who made it popular.

Example: f(x, y) = 3 × x + y can be written as F1 ≡ λx . (λy . 3 × x + y). Then f(4, 5) is written (F1 4) 5, that is ((λx . (λy . 3 × x + y)) 4) 5, which can be reduced (by using beta reduction) to 3 × 4 + 5.
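Python lambdas give a quick way to experiment with currying. The sketch below is only an analogy under an assumption worth stating: Python evaluates arguments eagerly and has built-in arithmetic, so it mirrors abstraction and application, not the reduction rules themselves. The names f, F1 and curry are illustrative.

```python
from typing import Callable

# f(x, y) = 3 × x + y as an ordinary two-argument function
def f(x: int, y: int) -> int:
    return 3 * x + y

# F1 ≡ λx . (λy . 3 × x + y): a function of x that returns a function of y
F1: Callable[[int], Callable[[int], int]] = lambda x: lambda y: 3 * x + y


def curry(g: Callable[[int, int], int]) -> Callable[[int], Callable[[int], int]]:
    """Turn a two-argument function g into λx . λy . g(x, y)."""
    return lambda x: lambda y: g(x, y)


add_twelve = F1(4)      # partial application: the function λy . 3 × 4 + y
print(add_twelve(5))    # 17, i.e. f(4, 5)

# ((λn . λx . (x + 1) × n) 7) 4, the nested example from the text
print((lambda n: lambda x: (x + 1) * n)(7)(4))  # 35, i.e. (4 + 1) × 7
```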

The above explanation and examples give an idea of what lambda calculus is. We will now work towards a more formal definition of lambda calculus. The system of lambda calculus is based on the structure of Abstract Reduction Systems (ARS). The terms of the ARS then coincide with the inductively defined lambda terms, and the reduction relation will be β-reduction. So before we formally define the lambda calculus, we introduce the most relevant theory of abstract reduction systems.

Abstract Reduction Systems

Definition of Abstract Reduction System (ARS): An abstract reduction system A := a structure ⟨A, →⟩ consisting of a set A and a binary relation → on A (i.e. → ⊆ A × A). The relation is also called reduction or rewrite relation. If for a, b ∈ A we have a → b, we call b a one-step reduct of a.

The transitive and reflexive closure of → is written ↠ (or alternatively →∗). This means ↠ is the smallest relation on A satisfying, for all a, b, c ∈ A, (closure of →) if a → b then a ↠ b, (reflexive) a ↠ a, and (transitive) if a ↠ b and b ↠ c then a ↠ c. Thus a ↠ b if and only if there exists a finite sequence of reduction steps a ≡ a0 → a1 → ... → an ≡ b. This sequence may be empty, in which case a ≡ b. Here ≡ denotes (syntactic) identity of elements of A, i.e. a ≡ b if and only if a and b are the same element of A.

Definition of Normal Form: A term a ∈ A of an ARS ⟨A, →⟩ is a normal form := there is no b ∈ A such that a → b. Furthermore, b ∈ A has a normal form if and only if b ↠ a for some normal form a ∈ A.

Definition of Weakly Normalizing: The reduction relation → of an ARS ⟨A, →⟩ is weakly normalizing (or weakly terminating) := every a ∈ A has a normal form. In this case we also say that A is weakly normalizing.

Definition of Strongly Normalizing: The reduction relation → of an ARS ⟨A, →⟩ is strongly normalizing (also called terminating, well-founded or noetherian) := there exists no infinite reduction a0 → a1 → a2 → ..., with an ∈ A for all n ∈ N.

Lemma: If an ARS is strongly normalizing, it is weakly normalizing.

Proof: We prove the contraposition: if ⟨A, →⟩ is not weakly normalizing then ⟨A, →⟩ is not strongly normalizing. Suppose ⟨A, →⟩ is not weakly normalizing. Then there is an a0 ∈ A without a normal form. Since a0 has no normal form, a0 is certainly not a normal form itself, so there is an a1 ∈ A such that a0 → a1. Now a1 has no normal form either, since any normal form of a1 would also be a normal form of a0. Thus we get an element a2 ∈ A such that a1 → a2. Repeating this process yields an infinite reduction a0 → a1 → a2 → ....

Definition of Unique Normal Form: The reduction relation → of an ARS ⟨A, →⟩ has the unique normal form property := for all a, b, c ∈ A such that a ↠ b, a ↠ c, and b, c are normal forms, we have b ≡ c.

Lemma: An ARS with the unique normal form property is not always weakly normalizing. Proof: For instance, the abstract reduction system with a single element a and the rewrite rule a → a has no normal forms, so it trivially has the unique normal form property and is not weakly normalizing.

Definition of Local Confluence: A reduction relation → of an ARS is called locally confluent or weakly confluent (also weakly Church-Rosser) := for all a, b, c ∈ A with a → b and a → c there exists a d ∈ A such that b ↠ d and c ↠ d.

Definition of Confluence: A reduction relation → of an ARS is called confluent (or has the Church-Rosser property, or is Church-Rosser) := for all a, b, c ∈ A with a ↠ b and a ↠ c there exists a d ∈ A such that b ↠ d and c ↠ d.

Lemma: If a reduction relation has the unique normal form property and is weakly normalizing, then it is confluent. Proof: Suppose we have a ↠ b and a ↠ c. Since → is weakly normalizing, there are normal forms b' and c' such that b ↠ b' and c ↠ c'. By transitivity we also have a ↠ b' and a ↠ c', and thus by the unique normal form property b' ≡ c'. Hence b ↠ b' and c ↠ b', so b' is a common reduct of b and c.

Lemma: If → is confluent then → has the unique normal form property. Proof: Suppose a ↠ b, a ↠ c, and b, c are normal forms. By confluence, there exists a d such that b ↠ d and c ↠ d. Since b and c are normal forms, we must have b ≡ d and c ≡ d, thus b ≡ c.
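For a finite ARS, all the properties defined above can be checked mechanically, which makes it easy to test the lemmas on small examples. The Python sketch below assumes the ARS is given as a dictionary in which every element of A is a key mapped to its set of one-step reducts; all function names are hypothetical. It uses the fact that in a finite ARS an infinite reduction exists exactly when the reduction graph contains a cycle.

```python
from typing import Dict, Hashable, Set

# A finite ARS <A, ->: every element of A is a key, mapped to its one-step reducts.
ARS = Dict[Hashable, Set[Hashable]]


def reducts(ars: ARS, a) -> Set[Hashable]:
    """All b with a ->> b (reflexive-transitive closure), by graph search."""
    seen, stack = {a}, [a]
    while stack:
        for b in ars[stack.pop()]:
            if b not in seen:
                seen.add(b)
                stack.append(b)
    return seen


def normal_forms(ars: ARS, a) -> Set[Hashable]:
    """The normal forms b with a ->> b."""
    return {b for b in reducts(ars, a) if not ars[b]}


def weakly_normalizing(ars: ARS) -> bool:
    return all(normal_forms(ars, a) for a in ars)


def strongly_normalizing(ars: ARS) -> bool:
    """Finite ARS: an infinite reduction exists iff the graph has a cycle."""
    status = {a: "new" for a in ars}

    def on_cycle(a) -> bool:  # depth-first search for a back edge
        status[a] = "active"
        for b in ars[a]:
            if status[b] == "active" or (status[b] == "new" and on_cycle(b)):
                return True
        status[a] = "done"
        return False

    return not any(status[a] == "new" and on_cycle(a) for a in ars)


def unique_normal_forms(ars: ARS) -> bool:
    return all(len(normal_forms(ars, a)) <= 1 for a in ars)


def confluent(ars: ARS) -> bool:
    """For all a ->> b and a ->> c there is a common reduct d."""
    return all(reducts(ars, b) & reducts(ars, c)
               for a in ars
               for b in reducts(ars, a)
               for c in reducts(ars, a))


LOOP = {"a": {"a"}}                              # the example from the lemma
WN_NOT_SN = {"a": {"a", "b"}, "b": set()}        # a ->> b, but also a -> a -> ...
NOT_CR = {"a": {"b", "c"}, "b": set(), "c": set()}
print(weakly_normalizing(LOOP), unique_normal_forms(LOOP))  # False True
```

The second example is weakly but not strongly normalizing, so the converse of the first lemma fails; the third has two distinct normal forms of a and is therefore not confluent.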

Syntax

Now that we have seen the basic principles of lambda calculus, we will give a more formal definition. We formally define the syntax of the lambda calculus by giving its grammar.

Definition of the Syntax of Lambda Terms: Lambda Term E := C | v | (E1 E2) | (λv . E), with

• C ranges over a set of constants (we will use the constant names a, b, c, ... for elements of C)

• v ranges over a (denumerable) set of variables (using v, w, x, ...)

• (E1 E2) denotes a combination involving the application of one expression (E1) to another (E2). The subexpression E1 is referred to as the operator and E2 is referred to as the operand

• (λv . E) denotes an abstraction. Informally it denotes a function of v which produces result E. The subexpression E is referred to as the body of the abstraction and v is called the bound variable of the abstraction

We also call lambda terms simply ‘terms’ or ‘expressions’. Notational conventions: to achieve a minimal notation, we drop parentheses whenever possible, and assume:

• Association to the left for iterated application: F E1 E2 ... En denotes (...((F E1) E2) ... En),

• Association to the right for iterated abstraction: λx1 . λx2 . ... λxn . E, or shortly λx1 x2 ... xn . E, denotes λx1 . (λx2 . (...(λxn . E) ...)).

Example: We can write the expression F1 of the previous example as λx y . 3 × x + y, and λv . E1 E2 means (λv . (E1 E2)).

Free/Bound Variables and α-conversion

We distinguish between free and bound occurrences of variables in an expression. An occurrence of v in E is said to be bound if it occurs within a subexpression of E of the form λv . E1, and the occurrence is said to be free otherwise.

Example: n occurs free in λx . (x + 1) × n, whereas x occurs bound in this expression. Both n and x occur bound in λn . (λx . x + 1) × n. Further, x occurs both bound and free in (λx . x + 1) × x (the second occurrence of ‘x’ in this expression is bound, the third occurrence is free).

Definition of free variables: The free variables of a term E, denoted by FV(E), is a set of variables defined recursively by:

• FV(C)=∅,

• FV(v)={v},

• FV(E1E2)=FV(E1) ∪ FV(E2),

• FV(λv . E)=FV(E) −{v}.

An expression E is said to be closed if FV(E)=∅.

Example: The expression λz . (λx . z + x)(λy . y × z) is closed.
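The recursive definition of FV translates directly into code. In this Python sketch, lambda terms are represented (as an assumption, not a standard library API) by four frozen dataclasses mirroring the grammar, and infix operators such as + are encoded as curried applications of constants. The checks reproduce the examples above.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Const:      # C: a constant such as '+', '×' or '1'
    name: str


@dataclass(frozen=True)
class Var:        # v: a variable
    name: str


@dataclass(frozen=True)
class App:        # (E1 E2): E1 is the operator, E2 the operand
    fun: "Term"
    arg: "Term"


@dataclass(frozen=True)
class Lam:        # (λv . E): bound variable v, body E
    var: str
    body: "Term"


Term = Union[Const, Var, App, Lam]


def fv(e: Term) -> FrozenSet[str]:
    """FV(E), following the four clauses of the definition."""
    if isinstance(e, Const):
        return frozenset()
    if isinstance(e, Var):
        return frozenset({e.name})
    if isinstance(e, App):
        return fv(e.fun) | fv(e.arg)
    return fv(e.body) - {e.var}


def op(name: str, left: Term, right: Term) -> Term:
    """Infix helper: 'left name right' encoded as ((name left) right)."""
    return App(App(Const(name), left), right)


# λz . (λx . z + x)(λy . y × z)
closed = Lam("z", App(Lam("x", op("+", Var("z"), Var("x"))),
                      Lam("y", op("×", Var("y"), Var("z")))))
print(fv(closed))  # frozenset(), so the expression is closed
```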

α-conversion

We consider two terms as ‘equivalent’ if they only differ in their bound variables. So λx . x and λy . y are considered being equivalent. But we must distinguish λx . y + x and λy . y + y, since one has a free occurrence of y and the other not. Note also that λxy . xy and λxy . yx are not equivalent. The renaming process is called α-conversion, and allows us to change the name of a bound variable, as long as we do so consistently. It is formally defined as the equivalence relation generated by the following reduction:

Definition of α-reduction: λx . E →α λy . E', where E' is obtained from E by replacing all free occurrences of x in E by y, provided y is fresh, that is, y neither occurs as a free variable nor as a bound variable in the expression E (i.e. it does not occur in E).

Expressions that can be made textually equivalent by renaming bound variables are called α-convertible or alpha(betically) equivalent. When two lambda terms E1 and E2 are α-convertible in this sense we write E1 ≡α E2, and often also E1 ≡ E2.

Example: Some α-conversions: λx . x + 1 ≡α λy . y + 1 and λx . (λy . x × y) y ≡α λx . (λz . x × z) y. However, λx . (λy . y × x) ≢α λy . (λy . y × y), because the y that would replace x in (λy . y × x) gets bound by the inner λy.

From now on, two λ-terms are considered (syntactically) equal if they are α-convertible to each other.

Substitution

We now formally define the concept of substitution of a variable in lambda terms.

Definition of Substitution: The substitution of expression E for each free occurrence of v in expression E0, denoted by E0[E/v], is defined by induction on the structure of E0 as:

• C[E/v] ≡ C
• x[E/v] ≡ E if x ≡ v
• x[E/v] ≡ x if x ≢ v
• (E1 E2)[E/v] ≡ (E1[E/v])(E2[E/v])
• (λx . E1)[E/v] ≡ λx . E1 if x ≡ v
• (λx . E1)[E/v] ≡ λx . (E1[E/v]) if x ≢ v and x ∉ FV(E)
• (λx . E1)[E/v] ≡ λy . ((E1[y/x])[E/v]) if x ≢ v and x ∈ FV(E), where y ≢ v and y ∉ FV(E1 E)

Example: (λx . z + 7 × x)[x + 3/z] ≡ λy . (z + 7 × y)[x + 3/z] ≡ λy . (x + 3) + 7 × y.
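The substitution clauses can be implemented one by one, including the renaming step that avoids variable capture. In the Python sketch below, terms use the same four-constructor representation as the grammar (an illustrative assumption), and fresh variables are drawn from the arbitrary naming scheme v0, v1, ..., so the y of the worked example comes out as v0.

```python
from dataclasses import dataclass
from typing import FrozenSet, Union


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"


@dataclass(frozen=True)
class Lam:
    var: str
    body: "Term"


Term = Union[Const, Var, App, Lam]


def fv(e: Term) -> FrozenSet[str]:
    if isinstance(e, Const):
        return frozenset()
    if isinstance(e, Var):
        return frozenset({e.name})
    if isinstance(e, App):
        return fv(e.fun) | fv(e.arg)
    return fv(e.body) - {e.var}


def fresh(avoid) -> str:
    """First name v0, v1, ... not in `avoid` (the naming scheme is arbitrary)."""
    i = 0
    while f"v{i}" in avoid:
        i += 1
    return f"v{i}"


def subst(e0: Term, e: Term, v: str) -> Term:
    """E0[E/v], clause by clause as in the definition of substitution."""
    if isinstance(e0, Const):                 # C[E/v] ≡ C
        return e0
    if isinstance(e0, Var):                   # x[E/v] ≡ E if x ≡ v, else x
        return e if e0.name == v else e0
    if isinstance(e0, App):                   # substitute in both parts
        return App(subst(e0.fun, e, v), subst(e0.arg, e, v))
    x, body = e0.var, e0.body
    if x == v:                                # v is bound here: nothing to do
        return e0
    if x not in fv(e):                        # no risk of capture
        return Lam(x, subst(body, e, v))
    y = fresh(fv(body) | fv(e) | {v})         # rename bound x to a fresh y first
    return Lam(y, subst(subst(body, Var(y), x), e, v))


def op(name: str, left: Term, right: Term) -> Term:
    return App(App(Const(name), left), right)


# (λx . z + 7 × x)[x + 3/z]: the free x of the argument must not be captured
lhs = Lam("x", op("+", Var("z"), op("×", Const("7"), Var("x"))))
print(subst(lhs, op("+", Var("x"), Const("3")), "z"))
```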

The following lemma tells us that substitution behaves well; it can be proven by induction on the structure of λ-terms.

Lemma: For all terms E0, E1, E2 and variables x, y such that x ≢ y and x ∉ FV(E2):

E0[E1/x][E2/y] ≡ E0[E2/y][E1[E2/y]/x].

Reduction System for the Lambda Calculus

As we have seen with an example at the beginning of this section, the main rule for the lambda calculus is the beta reduction rule, that we can now formally define.

Definition of β-reduction: β-reduction is the compatible relation gener- ated by (λv . E1)E2 →β E1[E2/v], with the rules:

if E1 →β E2 then E1 E →β E2 E,
if E1 →β E2 then E E1 →β E E2,
if E1 →β E2 then λv . E1 →β λv . E2.

Any term matching the left-hand side of a reduction rule is called a redex, so any expression of the form (λv . E1) E2 is called a β-redex. β-reduction is the reduction relation →β of the pure lambda calculus. We often write → resp. ↠ instead of →β and ↠β. We use =β (or sometimes simply =) to denote the equivalence relation generated by →β. Note the difference between ≡ (α) and = (β).

Example: (λn x . (x + 1) × n) 7 4 →β (λx . (x + 1) × 7) 4 →β (4 + 1) × 7.

Example: This example illustrates the need for α-conversion during β-reduction, even if distinct names are chosen from the start. Define TWICE ≡ λf . λx . f(f x), then

(λy . yy)TWICE

→β TWICE TWICE

≡ (λf . λx . f(fx)) TWICE

→β λx . TWICE (TWICE x)

≡ λx . TWICE ((λf . λx . f(fx))x)

→β λx . TWICE ((λx . f(fx))[x/f]) (Note the name clash)

≡α λx . TWICE ((λy . f(fy))[x/f])

≡ λx . TWICE (λy . x(xy))

→β ...

Example:

1. (λx . x + 1)((λy . y × y) 3) ↠β (3 × 3) + 1 in two different ways (contracting either the outer or the inner redex first), so different reduction paths are possible.

2. Ω ≡ (λx . xx)(λx . xx) →β (λx . xx)(λx . xx) →β ···, thus infinite sequences of steps are possible: β-reduction is not always terminating. This corresponds to ‘self-reproducing programs’.

3. (λx . xxx)(λx . xxx) →β (λx . xxx)(λx . xxx)(λx . xxx) →β ···,so terms can even become arbitrarily large.

4. (λy . c)((λx . xxx)(λx . xxx)) → c, but also (λy . c)((λx . xxx)(λx . xxx)) → (λy . c)((λx . xxx)(λx . xxx)(λx . xxx)) and the latter term can be reduced to c or again to a longer term, etc.
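The examples above can be replayed with a small β-reducer. The Python sketch below reuses the term representation and capture-avoiding substitution from the definitions, and contracts the leftmost-outermost redex at each step; this reduction strategy is our own choice for the illustration, since the text does not fix one. Because reduction need not terminate, the hypothetical function normalize gives up after a step budget and returns None.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Union


@dataclass(frozen=True)
class Const:
    name: str


@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class App:
    fun: "Term"
    arg: "Term"


@dataclass(frozen=True)
class Lam:
    var: str
    body: "Term"


Term = Union[Const, Var, App, Lam]


def fv(e: Term) -> FrozenSet[str]:
    if isinstance(e, Const):
        return frozenset()
    if isinstance(e, Var):
        return frozenset({e.name})
    if isinstance(e, App):
        return fv(e.fun) | fv(e.arg)
    return fv(e.body) - {e.var}


def fresh(avoid) -> str:
    i = 0
    while f"v{i}" in avoid:
        i += 1
    return f"v{i}"


def subst(e0: Term, e: Term, v: str) -> Term:
    """Capture-avoiding substitution E0[E/v]."""
    if isinstance(e0, Const):
        return e0
    if isinstance(e0, Var):
        return e if e0.name == v else e0
    if isinstance(e0, App):
        return App(subst(e0.fun, e, v), subst(e0.arg, e, v))
    if e0.var == v:
        return e0
    if e0.var not in fv(e):
        return Lam(e0.var, subst(e0.body, e, v))
    y = fresh(fv(e0.body) | fv(e) | {v})
    return Lam(y, subst(subst(e0.body, Var(y), e0.var), e, v))


def step(e: Term) -> Optional[Term]:
    """One leftmost-outermost β-step, or None if e is a β-normal form."""
    if isinstance(e, App):
        if isinstance(e.fun, Lam):                  # β-redex (λv . E1) E2
            return subst(e.fun.body, e.arg, e.fun.var)
        reduced = step(e.fun)
        if reduced is not None:
            return App(reduced, e.arg)
        reduced = step(e.arg)
        return None if reduced is None else App(e.fun, reduced)
    if isinstance(e, Lam):
        reduced = step(e.body)
        return None if reduced is None else Lam(e.var, reduced)
    return None


def normalize(e: Term, max_steps: int = 100) -> Optional[Term]:
    """Reduce to β-normal form, or return None if max_steps is exceeded."""
    for _ in range(max_steps):
        reduced = step(e)
        if reduced is None:
            return e
        e = reduced
    return None


half = Lam("x", App(Var("x"), Var("x")))
OMEGA = App(half, half)                             # Ω ≡ (λx . x x)(λx . x x)
print(normalize(OMEGA))                             # None: Ω only reduces to itself
print(normalize(App(Lam("y", Const("c")), OMEGA)))  # Const(name='c')
```

With this strategy the outer redex of (λy . c) Ω is contracted first, so the argument Ω is discarded and the normal form c is reached, as in example 4.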

Although we already saw that λ-calculus is neither weakly nor strongly nor- malizing, it does have the important confluence property. First we introduce the following definition of the diamond property that we use to prove that →β is confluent. To prevent confusion in the notation we will from now on also use the implication symbol ⇒.

Definition of the Diamond Property: A binary relation → on the lambda terms Λ satisfies the diamond property, notation → ⊨ ♦ := (∀M, M1, M2 : M, M1, M2 ∈ Λ : (M → M1 ∧ M → M2) ⇒ (∃M3 : M3 ∈ Λ : M1 → M3 ∧ M2 → M3))

Note that →β has the Church-Rosser property if and only if ↠β satisfies the diamond property.

Lemma: Let → be a binary relation on a set Λ with ↠ its transitive, reflexive closure, and let → ⊨ ♦. Then ↠ ⊨ ♦.

Proof: Assume → is a binary relation on a set Λ with ↠ its transitive, reflexive closure, and → ⊨ ♦. We now have to prove that ↠ ⊨ ♦. Suppose M, L, K ∈ Λ, M ↠ L and M ↠ K. We then have to prove (∃N : N ∈ Λ : L ↠ N ∧ K ↠ N). Let

(*) M ≡ M0 → M1 → ...→ Mn ≡ L, for some n ∈ N

(**) M ≡ K0 → K1 → ...→ Km ≡ K, for some m ∈ N

We now apply a technique called induction loading (for more information, see the links on http://zax.mine.nu/stage/) to prove that K and L have a common reduct N. To be precise, we show that l(m, n) holds for all m, n ∈ N, with l(m, n) := there exist terms N(i, j) ∈ Λ, for i, j ∈ N with 0 ≤ i ≤ n ∧ 0 ≤ j ≤ m, such that:

(a) N(i, 0) ≡ Mi if 0 ≤ i ≤ n

(b) N(0,j) ≡ Kj if 0 ≤ j ≤ m

(c) N(i, j) → N(i, j + 1) if 0 ≤ i ≤ n ∧ 0 ≤ j < m

(d) N(i, j) → N(i + 1, j) if 0 ≤ i < n ∧ 0 ≤ j ≤ m

Clearly, when l(m, n) is true for all m, n ∈ N, K and L have a common reduct: by (a) and (c), L ≡ N(n, 0) ↠ N(n, m), and by (b) and (d), K ≡ N(0, m) ↠ N(n, m). So the only remaining proof obligation is to show that l(m, n) holds for all m, n ∈ N. We prove this by induction on n. Base case (n): n = 0

(a) Let N(0, 0) be M0; then (a) holds trivially by reflexivity of ‘≡’.

(b) Let N(0, j) be Kj for 0 ≤ j ≤ m; then (b) also holds trivially.

Note that this is valid in combination with the definition under (a) since N(0, 0) ≡ M0 ≡ M ≡ K0.

(c) N(i, j) → N(i, j + 1) holds because i = 0 and (**).

(d) N(i, j) → N(i + 1, j) holds trivially because n = 0 yields an empty range for i.

Induction case (n): Induction hypothesis (i.h.-n): suppose that for n = k, k ∈ N, the statement l(m, n) is true for all m ∈ N. We now prove the statement for n = k + 1. We do this by induction on m. Base case (m): m = 0

(a) Let N(k + 1, 0) be Mk+1 (the N(i, 0) with 0 ≤ i ≤ k are given by i.h.-n); then (a) holds trivially.

(b) Since j = 0, this amounts to N(0, 0) ≡ K0.

This is true because of our previous definition N(0, 0) ≡ M0 and the fact that M0 ≡ M ≡ K0. (c) holds trivially, because m = 0 yields an empty range for j.

(d) N(i, j) → N(i +1,j) because j = 0 and (*).

Induction case (m): Induction hypothesis (i.h.-m): suppose that for m = r and n = k +1, r ∈ N, the statement l(m, n) is true. We now prove the statement for m = r +1.

(a) N(i, 0) ≡ Mi for 0 ≤ i ≤ k + 1 follows from i.h.-n.

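(b) N(0, j) ≡ Kj for 0 ≤ j ≤ r + 1 holds by the definition of N(0, j) in the base case n = 0. (c) and (d) We already know from the induction hypotheses that N(i, j) → N(i, j + 1) holds for 0 ≤ i ≤ k + 1 ∧ 0 ≤ j < r, and that N(i, j) → N(i + 1, j) holds for 0 ≤ i < k + 1 ∧ 0 ≤ j ≤ r. It remains to construct the terms N(i, r + 1) for 1 ≤ i ≤ k + 1 (N(0, r + 1) ≡ Kr+1 is already given, and N(0, r) → N(0, r + 1) by (**)). We do this for i = 1, ..., k + 1 in turn: since N(i − 1, r) → N(i, r) and N(i − 1, r) → N(i − 1, r + 1), the diamond property of → yields an N(i, r + 1) with N(i, r) → N(i, r + 1) and N(i − 1, r + 1) → N(i, r + 1). This establishes (c) and (d) for m = r + 1 and completes both inductions.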

We can now sketch the proof² of the following fundamental theorem of the untyped lambda calculus:

² The lines of the proof are due to W. Tait and P. Martin-Löf (see [6, section 3.2]), but as far as I know this is the first proof that formalizes the above lemma to a reasonable extent.

Theorem (Church, Rosser): →β is confluent. Proof: By the previous lemma, if a binary relation on a set satisfies the diamond property, then so does its reflexive transitive closure. Suppose we have a binary relation →partial-β on the set Λ whose reflexive transitive closure is ↠β. If we prove that →partial-β satisfies the diamond property, then by the previous lemma ↠β satisfies the diamond property, i.e. →β is confluent. A concrete definition of →partial-β, a proof that its reflexive transitive closure is indeed ↠β, and a proof that →partial-β satisfies the diamond property can be found on pages 60-62 of [6].

Theorem: The λ-calculus has the unique normal form property. Proof: Suppose that a term a ∈ Λ has two normal forms n1 ∈ Λ and n2 ∈ Λ. This means that there is no b ∈ Λ such that n1 →β b or n2 →β b. But a ↠β n1 ∧ a ↠β n2, so by the Church-Rosser property we know (∃c : c ∈ Λ : n1 ↠β c ∧ n2 ↠β c). Since n1 and n2 admit no reduction step, both reductions to c must have length zero, and therefore n1 ≡ c ≡ n2.

Example: All constants are normal forms, as well as x, λx.x, λx.xx, yy, ....

Note that the term (λx.xx)(λx.xx) cannot be reduced to a normal form: its only redex reduces it in one step to itself. Confluence is a fundamental property for functional programming; we rely on it when we evaluate programs by rewriting, knowing that we never have to backtrack an evaluation (this is also one of the main differences with logic programming).
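Both the behaviour of (λx.xx)(λx.xx) and the fact that different reduction orders end in the same normal form can be checked with a small β-reducer. The sketch below is hypothetical and not taken from the report: it represents terms with de Bruijn indices (a variable is a number counting the enclosing λs up to its binder), which sidesteps variable capture, and implements two reduction strategies, leftmost-outermost and leftmost-innermost.

```python
# Terms: ("var", i) | ("lam", body) | ("app", f, a), with i a de Bruijn index.

def shift(t, d, cutoff=0):
    """Add d to every free variable index >= cutoff."""
    if t[0] == "var":
        return ("var", t[1] + d) if t[1] >= cutoff else t
    if t[0] == "lam":
        return ("lam", shift(t[1], d, cutoff + 1))
    return ("app", shift(t[1], d, cutoff), shift(t[2], d, cutoff))


def subst(t, j, s):
    """Replace variable j by s in t (indices are adjusted under binders)."""
    if t[0] == "var":
        return s if t[1] == j else t
    if t[0] == "lam":
        return ("lam", subst(t[1], j + 1, shift(s, 1)))
    return ("app", subst(t[1], j, s), subst(t[2], j, s))


def beta(body, arg):
    """Contract the redex (λ.body) arg."""
    return shift(subst(body, 0, shift(arg, 1)), -1)


def step_normal(t):
    """One leftmost-outermost β-step, or None if t is a normal form."""
    if t[0] == "app" and t[1][0] == "lam":
        return beta(t[1][1], t[2])
    if t[0] == "lam":
        r = step_normal(t[1])
        return None if r is None else ("lam", r)
    if t[0] == "app":
        r = step_normal(t[1])
        if r is not None:
            return ("app", r, t[2])
        r = step_normal(t[2])
        return None if r is None else ("app", t[1], r)
    return None


def step_applicative(t):
    """One leftmost-innermost β-step, or None if t is a normal form."""
    if t[0] == "lam":
        r = step_applicative(t[1])
        return None if r is None else ("lam", r)
    if t[0] == "app":
        r = step_applicative(t[1])
        if r is not None:
            return ("app", r, t[2])
        r = step_applicative(t[2])
        if r is not None:
            return ("app", t[1], r)
        if t[1][0] == "lam":
            return beta(t[1][1], t[2])
    return None


def normalize(t, step, max_steps=1000):
    """Reduce t with the given strategy; None if no normal form within max_steps."""
    for _ in range(max_steps):
        nxt = step(t)
        if nxt is None:
            return t
        t = nxt
    return None


I = ("lam", ("var", 0))                           # λx.x
DELTA = ("lam", ("app", ("var", 0), ("var", 0)))  # λx.xx
OMEGA = ("app", DELTA, DELTA)                     # (λx.xx)(λx.xx)
term = ("app", DELTA, ("app", I, I))              # (λx.xx)((λy.y)(λz.z))

print(normalize(term, step_normal))       # ('lam', ('var', 0))
print(normalize(term, step_applicative))  # ('lam', ('var', 0))
print(normalize(OMEGA, step_normal, 50))  # None
```

On the example term both strategies reach λz.z, as confluence predicts, while Ω keeps reducing to itself until the step limit is hit. Confluence guarantees that a normal form is unique, not that every strategy finds it: the innermost strategy loops on (λx.λy.y)Ω, where the outermost one terminates.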

In the λ-calculus we have defined in this section, we can represent natural numbers and basic operations on them. We will not show this here; most books on the lambda calculus give examples of how to do basic arithmetic in the lambda calculus. The λ-calculus represents a certain class of (partial) functions on the integers. By a classical result of the American mathematician Stephen C. Kleene (1909-1994), this is exactly the set of (partial) recursive functions. The proof can be found in [6, theorem 9.2.16]. Church also considered the set of functions that can be calculated in his λ-calculus, and conjectured the following thesis:

Church’s thesis (1936) The set of effectively computable functions, i.e. functions that intuitively (effectively) can be computed, is the same as the set of functions that can be defined in λ-calculus.

A more formal version and detailed treatment of Church’s thesis can be found in section 9.3.
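As noted above, natural numbers and basic arithmetic can be represented in the λ-calculus, although the report does not show how. For readers who want a glimpse, here is a small sketch of Church's standard encoding, in which the numeral n is the function that applies f exactly n times. Using Python lambdas as a stand-in for λ-terms is our own assumption for illustration: Python evaluates eagerly and is typed at run time, so this only mimics the untyped λ-calculus on simple examples like these.

```python
# Church numerals as Python lambdas (illustration only).
zero = lambda f: lambda x: x                                  # λf.λx.x
succ = lambda n: lambda f: lambda x: f(n(f)(x))               # λn.λf.λx.f (n f x)
plus = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))  # λm.λn.λf.λx.m f (n f x)
times = lambda m: lambda n: lambda f: m(n(f))                 # λm.λn.λf.m (n f)


def to_int(n):
    """Decode a Church numeral by counting how often it applies f."""
    return n(lambda k: k + 1)(0)


def from_int(k):
    """Encode a non-negative Python int as a Church numeral."""
    n = zero
    for _ in range(k):
        n = succ(n)
    return n


two, three = from_int(2), from_int(3)
print(to_int(plus(two)(three)))   # 5
print(to_int(times(two)(three)))  # 6
```

Assigning lambdas to names is unusual Python style, but here it keeps each definition visually close to the λ-term in its comment.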

Alan Turing proved in 1937 that the class of Turing computable functions is the same as the class of functions definable in λ-calculus.

So Turing Machines are exactly as powerful as the λ-calculus. Both models capture the intuitive idea of computation; this important thesis is the subject of the next section.

9.3 The Church-Turing thesis

The Church-Turing thesis concerns the intuitive notion of an algorithm (or effective or mechanical method) in logic and mathematics. The notion of an algorithm or an effective method is an informal one, and attempts to characterize this effectiveness lacked rigor, mainly because the key requirement that the method demands no insight or ingenuity is left unexplicated.

One of Turing's achievements in his paper of 1936 (reprinted in [19] and available online at http://www.abelard.org/turpap2/tp2-ie.asp) was to present a formally exact predicate with which the informal predicate ‘can be calculated by means of an algorithm or effective method’ may be replaced. The formal concept proposed by Turing is that of computability by a Turing Machine (see section 9.1). He introduced this thesis in [90] in the course of arguing that the ‘Entscheidungsproblem’ (decision problem) for the predicate calculus is unsolvable.
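To make ‘computability by a Turing Machine’ concrete, here is a minimal simulator. It is a hypothetical sketch, not from the report: the rule-table format, the blank symbol `_` and the example machine (a unary successor) are assumptions chosen for brevity.

```python
def run_tm(rules, tape, state="scan", blank="_", max_steps=10_000):
    """Simulate a one-tape Turing Machine.

    rules maps (state, symbol) -> (symbol_to_write, move, next_state),
    where move is -1 (left) or +1 (right). The machine halts when no rule
    applies. Returns the final tape contents without surrounding blanks.
    """
    cells = dict(enumerate(tape))
    head = 0
    for _ in range(max_steps):
        symbol = cells.get(head, blank)
        if (state, symbol) not in rules:
            break  # no applicable rule: halt
        write, move, state = rules[(state, symbol)]
        cells[head] = write
        head += move
    else:
        raise RuntimeError("machine did not halt within max_steps")
    if not cells:
        return ""
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, blank) for i in range(lo, hi + 1)).strip(blank)


# Hypothetical example machine: unary successor (n ones -> n + 1 ones).
SUCC = {
    ("scan", "1"): ("1", +1, "scan"),  # move right over the input
    ("scan", "_"): ("1", +1, "halt"),  # write one extra 1, then halt
}

print(run_tm(SUCC, "111"))  # 1111
```

The `max_steps` bound is only a practical safeguard: whether an arbitrary machine halts cannot be decided in general, so a simulator can only give up, not answer.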

Turing's thesis: TMs can do anything that could be described as intuitively computable.

Church also presented in [14] a formally exact way to express this notion of intuitive computability. Turing's method was, however, more obvious and more general than Church's, since the latter only considered functions of positive integers. In order to calculate the values of such functions, Church introduced his lambda calculus and specified the notion of a recursive function (see section 9.2).

Church’s thesis: A function of positive integers is effectively computable only if it is recursive

The reverse implication is also referred to as the converse of Church's thesis. The class of lambda-definable functions and the class of recursive functions were later shown to be identical. This was established for functions of positive integers by Church and the American mathematician Kleene (see [47], [14]). After learning of Church's proposal, Turing quickly established that the apparatus of lambda-definability and his own apparatus of computability were equivalent ([89], page 263).

Theorem: Lambda-definability and Turing Machine-computability are equivalent. Proof: See [89, page 263] for a proof that Turing’s machines and Church’s lambda calculus can compute the same set of functions.

Although Turing and Church chose different ways to formalize the intuitive notion of effective computability, identifying it respectively with computability by a Turing Machine and with definability in the lambda calculus, both methods are equivalent. After this proof of equivalence, Kleene introduced the term ‘Church-Turing thesis’ to refer to either of the two equivalent theses ([48], page 232).

Church-Turing thesis: The intuitive notion of an algorithm coincides with that of a Turing Machine algorithm or, equivalently, with the calculable functions of the lambda calculus.

There are a number of misunderstandings of the Church-Turing thesis, collected in [16]. Turing did not show that:

• Any problem can be solved ‘by instructions, explicitly stated rules or procedures’
• A universal TM ‘can compute any function that any computer, with any architecture can compute’ (Turing said nothing about the limits of what can be computed by a machine)
• Whatever can be calculated by a machine (working on finite data in accordance with a finite program of instructions) is Turing-machine-computable (this is known as Thesis M, see [16])
• Any process that can be given a systematic mathematical description (or a ‘precise enough characterization of a set of steps’, or that is ‘scientifically describable’ or ‘scientifically explicable’) can be simulated by a TM (this is known as Thesis S, see [16])

Since the word ‘computable’ is often tied by definition to effective calculability, the Church-Turing thesis is often stated as ‘All computable functions are computable by a Turing Machine’ (a function is said to be computable if and only if there is an effective procedure for determining its values).

To summarize: to define the concept of an algorithm, Church used a notational system, the lambda calculus, while Turing used a theoretical computing device, the Turing Machine. On the face of it they seemed very different from one another, but the two definitions turned out to be equivalent, in the sense that each picks out the same set of mathematical functions. The Church-Turing thesis is the assertion that this set contains every function whose values can be obtained by a method or algorithm corresponding to our intuitive notion of effective computability. Clearly, if there were functions of which the informal (intuitive) predicate, but not the formal one, were true, then the formal predicate would be less general than the informal one and so could not reasonably be employed to replace it. When the thesis is expressed in terms of Turing's formal concept, it is appropriate to call it the Turing thesis, and likewise Church's thesis in the case of Church. It is generally agreed amongst mathematicians and logicians that ‘computable by means of a TM’ is an accurate rendering of the informal notion in question.

Chapter 10

Conclusion

It is a profoundly erroneous truism, repeated by all copy books and by eminent people, when they are making speeches, that we should cultivate the habit of thinking of what we are doing. The precise opposite is the case. Civilization advances by extending the number of important operations which we can perform without thinking about them. ... The study of mathematics is apt to commence in disappointment ... We are told that by its aid the stars are weighed and the billions of molecules in a drop of water are counted. Yet, like the ghost of Hamlet’s father, this greatest science eludes the efforts of our mental weapons to grasp it.

- A. Whitehead, in [99]

When I started my study of the foundations of mathematics, I did not quite know what to expect. By now I have learned that the foundations of mathematics can be a fascinating and important subject. Learning this new subject was an interesting challenge, but sometimes hard work, when I had to go through numerous books that were either full of details or too vague and philosophical. Most books that I found on the foundations of mathematics were either very detailed and descriptive (the book [31] by I. Grattan-Guinness has an unmatched level of detail and exactness) or treat only a part of the theory that was developed from 1890 to 1940 (for example, [17] gives an excellent introduction to set theory). One of the better ones, though relatively unknown, is the book by G.T. Kneebone [49], which is quite complete and still considerably theoretical. One of my motivations for writing this article was to present the theory properly; hopefully that makes it clearer and more enjoyable. Some of the good literature used, such as the books just mentioned, can be found in the references at the end of this report.

At the same time, I also tried to briefly introduce the reader to the historical context of the most important developments. Most undergraduate courses I have taken gave little or no information about the history behind the theory; the emphasis was on the accumulation of mathematical knowledge. I believe that attention to history in education can not only make the study of mathematics more interesting, but also help the growth of mathematical understanding and the appreciation of the current form of the theory.

I want to conclude this report with a summary of the theory and my own view on the project, and with some ideas for future work.

The project

At the beginning of the 20th century Hilbert said that we should formalize all of mathematics, i.e. mathematical reasoning. This ‘project’ (from now on I will refer to it as the project) has been the central theme of this report. When reading about the work and biographies of all the brilliant men who devoted themselves to this problem, you can (at least, that is what happened to me) get caught up in this fascinating philosophical question.

To most people, however, this all seems very impractical. We all know you can make a popular or start your own business on the web and make a million dollars in one year if you're lucky. And when it comes to verifying mathematical proofs and making reliable software, a formal basis is rarely used: the human mind is still the most important tool, and other techniques, such as model checking, are preferred. It might be worth writing another article on how and why, in that respect, the more practical working mathematicians and the more theoretical logicians (or formalists, if you prefer) grew apart. But let's first go back to the project.

The attempt to formalize mathematical reasoning is not new: the Greeks already regarded rationality as the supreme goal. We can think of Plato and Reason, or, as Russell¹ would say, of Pythagoras and Rationality! Aristotle made a big step in formalizing reasoning with his patterns of reasoning known as syllogisms. Logic was developed further ever since, with important contributions from De Morgan, Leibniz and especially Boole. Because of his interest in theology and God (see [31, chapter 3] and also [30, section 5.8, page 203]), Cantor became obsessed with the notion of the infinite and developed his theory of infinite sets. With Cantor, mathematics became more abstract, and some people regarded his set theory as a disease. Poincaré, the great French mathematician, said² (from [95]): “Later generations will regard Mengenlehre (set theory) as a disease from which one has recovered.” Peano and Frege, as we have seen in chapter 4, brought mathematical reasoning to an even higher level of formalization. So far, so good. But there turned out to be some problems, and although Cantor had already noticed them (see Cantor's paradox in section 3.8), it was Russell who spread the bad news to everyone with his paradox. At that point Hilbert proposed to solve these problems with a formal axiomatic method, and he formulated his famous three requirements of consistency, completeness and decidability.

This proposal of Hilbert to formalize mathematics led to the development of several axiomatic systems, such as those of Zermelo and Fraenkel, and of Gödel, Bernays and von Neumann. Russell and Whitehead made their own attempt to formalize mathematics with their theory of types. But although all of these attempts were fruitful to a certain extent, in total they all failed, and it took Gödel and Turing to show that ‘the project’ in fact could not be done. Formalizing mathematics so that we have absolute truth is not possible! But these works of Gödel and Turing were new and complicated, and not everyone clearly recognized their importance. Even nowadays, few people are familiar with the details of their work, and we often see confusion between notions like ‘checking the proof of a statement’ and ‘checking whether a statement is true (or not)’. There is also much confusion about the exact implications of Gödel's and Turing's work. Gödel showed that in every consistent axiomatic system that contains (enough) arithmetic, one can construct a statement that can be neither proved nor disproved within that system. Turing later formalized the notion of computability to show that there is no mechanical procedure to decide whether a given statement (of first-order predicate logic) is provable or not.

¹ Although rationality is more commonly associated with Plato, Russell always insisted on attributing it to Pythagoras (see [62]).

² Whether or not he actually said this is a matter of debate amongst historians of mathematics.

At first this was a shock, but then mathematicians said (and again it would be nice to write an article about the different responses of mathematicians and logicians): so what? We should do mathematics exactly the way we have always done it; this does not apply to the problems I care about. Indeed, mathematicians continued with their work, and the theorems of Gödel and Turing had little or no impact in practice on how we (should) do mathematics. The only effect the project might have had on working mathematicians is that they have become a bit more precise in their use of language and in writing their proofs. Some, of course, were inspired by problems such as Hilbert's 23 problems. But all this theoretical work has had another consequence, which I was made aware of through a videotaped lecture by G.J. Chaitin on the internet. I quote him on Hilbert's attempt to formalize all mathematics, after the publication of the theorems of Gödel and Turing: “It failed in that precise technical sense. But in fact it succeeded magnificently, not formalization of reasoning, but formalization of [...] has been the great technological success of our time - programming languages! So if you look at the history of the beginning of this century you'll see papers by logicians studying the foundations of mathematics in which they had predicate calculi. Now you look back and you say this is clearly a programming language! [...] If you look at Turing's paper of course there is a machine language [...]. Or, as von Neumann said: the universal Turing Machine is really the notion of a general purpose programmable computer - and that's the idea of software. [...] If you look at papers by Alonzo Church you see the lambda calculus, which is a functional programming language. If you look at Gödel's original paper you see what looks like LISP, it's very close to LISP”.
As Chaitin showed, there are numerous examples of unexpected offspring of theoretical research, and not all of the foundational work is so impractical after all! As G.J. Chaitin concluded in his speech, this is the way “we're all benefiting from the glorious failure of this project!”. Now this is not entirely true, but it is true that theoretical studies, as he says, “don't have spin-off in dollars right away, but sometimes they have vastly unexpected consequences”. Formal methods and studies have not always done a good job of promoting themselves; maybe we can emphasize this aspect and show that technology often advances through fascinating impractical ideas.

Status of the project

That brings us to the question whether the problem of the foundations of mathematics, many decades after Hilbert formulated it, is now settled once and for all. The short answer is: it is not. Even from the amount of interesting material on current research that is available on the internet alone, we can conclude that there is still a lot of work to do on the foundations of mathematics. I am considering creating an online version of this document with more background information and links.

Although Gödel and Turing showed that it is impossible to formalize even basic arithmetic completely, let alone the whole of mathematics, it is still possible to formalize parts of mathematics (for example, geometry) successfully. As P. Andrews says in [4], “attempts to understand the nature of reasoning and to build sophisticated information systems which can draw logical conclusions may be regarded as part of an endeavor to fashion more powerful intellectual tools for coping with the increasingly complex problems which confront mankind.” In that respect, formalization is not restricted to mathematical reasoning; it can also be applied to other disciplines (such as physics, chemistry or even the social sciences). In particular, the development of software and computer systems will be facilitated by a formalization of theories. Although total formalization of parts of mathematics is very useful, it is not the focus of most current research: most people believe that (at least for the near future) the human mind will remain the one to decide whether a given mathematical statement is true or not.

Ideas for future work and the distinction between mathematics and software

And although a machine cannot determine whether an arbitrary mathematical statement is true, we can try to develop an axiomatic system in which as many of the interesting statements³ as possible can be proved. This is useful because, even though every sufficiently strong axiomatic system is incomplete and contains undecidable statements, whenever we claim that a statement of the system is decidable and back this up with a concrete and completely formalized (dis)proof within that system, we still have a way to decide mechanically whether or not the proof is correct for the given statement.

³ As interesting statements, we consider all statements in the (everyday) work of practicing mathematicians. These ‘practical’ statements do not include the specific, purely theoretical statements that Gödel invented for his incompleteness theorem.

The question then is whether the set of statements for which we can do this still forms a sufficiently interesting part of mathematics. Part of our investigation has to be to find out how many practical mathematical proofs contain ‘meta-arguments’, in other words which classes of proofs will fall outside our system. Although we want to change mathematics itself as little as possible, this might also be a necessary option⁴. To borrow the subtitle of P. Andrews' book [4]: ‘to truth through proof’. This should be the first goal for the near future:

(1) Investigate which parts of mathematics can(not) be formalized (i.e. which contain ‘meta-arguments’), which formalization is most usable and allows most parts of (practical) mathematics to be formalized, and totally formalize proof checking for as many parts of mathematics as possible.

Formalization is not only important for checking the correctness of mathematical theories, which are becoming ever more complex. Many models in physics and chemistry depend on underlying mathematical theorems, and the success of a model depends on the correctness of those theorems. Also, we are becoming more and more dependent on automated systems, in particular computers and software. There is a growing need for reliable (that is, correctly specified and working according to the specifications) software, not only for (safety-)critical systems, but also in everyday applications. A formal approach can be used to prove the correctness not only of mathematical statements but also of computer programs. This brings us to an important point: the distinction between mathematics and software.

Instead of the proofs of mathematical statements, we are then checking the derivation steps of program derivations. I want to emphasize this difference, since it is often unclear or left implicit which of the two is meant when arguments for or against formalistic studies are given. We have to realize that we can never obtain a 100% guarantee of the correctness of any algorithm, since we also depend on the correctness of the proof checker. That is why we have to try to keep the proof checker as simple, small and intuitive as possible (see also the ‘de Bruijn criterion’ in [26, pages 4 and 26]). Analogously, we can never obtain a 100% guarantee of the correctness of any mathematical statement, since we learned from Gödel that the consistency of a (sufficiently strong, consistent) axiomatic system cannot be proved within that system; therefore we had better also try to keep the axiomatic system as simple, small and intuitive as possible (we could see this as a variant of the de Bruijn criterion for axiomatic systems). Nevertheless, any such implementation of a proof checker would give us the highest degree of certainty possible.

⁴ For a successful formalization of parts of mathematics we therefore do not only look at the axiomatic system; it might also require us to restrict certain parts of mathematics so that they contain fewer undecidable statements, or to rewrite certain existing proofs into a form that is permitted by the system.
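The difference between checking a proof and deciding truth can be made concrete with a deliberately tiny proof checker. This is a hypothetical sketch, not a system from the report: it handles only propositional formulas built with →, and derivations that use hypotheses and modus ponens. Checking is a single loop over the proof lines, in the spirit of the de Bruijn criterion, even though finding such a proof may be hard.

```python
def imp(a, b):
    """The formula a -> b."""
    return ("->", a, b)


def check(hypotheses, proof, goal):
    """Return True iff `proof` is a correct derivation of `goal`.

    Formulas are atoms (strings) or implications built with imp().
    Each proof line is (formula, justification), where the justification is
    ("hyp",) for a formula from `hypotheses`, or ("mp", i, j) when earlier
    line i proves some A and earlier line j proves A -> formula.
    """
    for k, (formula, just) in enumerate(proof):
        if just == ("hyp",):
            if formula not in hypotheses:
                return False
        elif len(just) == 3 and just[0] == "mp":
            i, j = just[1], just[2]
            if not (0 <= i < k and 0 <= j < k):
                return False  # may only refer to earlier lines
            if proof[j][0] != imp(proof[i][0], formula):
                return False  # modus ponens does not match
        else:
            return False  # unknown rule
    return bool(proof) and proof[-1][0] == goal


hyps = ["p", imp("p", "q"), imp("q", "r")]
proof = [
    ("p", ("hyp",)),
    (imp("p", "q"), ("hyp",)),
    ("q", ("mp", 0, 1)),
    (imp("q", "r"), ("hyp",)),
    ("r", ("mp", 2, 3)),
]
print(check(hyps, proof, "r"))  # True
bad = proof[:2] + [("r", ("mp", 0, 1))]
print(check(hyps, bad, "r"))    # False
```

The checker never searches for anything: it only verifies each supplied step against a fixed rule, which is why it can be kept small enough to be trusted.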

Software and Proof Checking

I would also like to remark that proof checking for programs can only give us a way to verify the correctness of programs. At least as important (for obtaining correct programs) is the correct construction of programs. This is the focus of the work in the area of programming. At the Eindhoven University of Technology, for example, the techniques of E.W. Dijkstra are used to derive correct programs from their specification. Unfortunately, the advocates of both areas (proof checking/verification vs. construction/derivation) mostly promote their own approach, while a combination of both could give the best results. Although there has been some minor work on formalizing these proof techniques and on combining formal methods and program derivations (see for example [26]), cooperation is still minimal. If we go one step further back in the process of creating correct software, the success of any piece of software depends on the correctness of its specification. These first phases of software engineering (identifying user requirements and specifications) can also be adapted to comply with the methods of program derivation and formal proof checkers (note that we use the term ‘proof checker’ not only for mathematics, i.e. for checking mathematical statements, but also for the software variant: for checking algorithms and program derivations). And since we can never obtain a 100% guarantee of the correctness of software (it depends, for example, on the correctness of the specifications and of the proof checker itself), additional techniques can also be used as a verification method to improve reliability even further. Therefore I argue for an integrated approach: only the combination of all of the methods mentioned can give us the highest reliability (i.e. the highest chance of correctness of software). Such an integrated approach requires research and cooperation between the various branches representing the methods mentioned before and, ultimately, incorporation into the software engineering process.

Mathematics and Proof Checking

Let's go back to proof checking of mathematical statements. We mentioned the first goal of investigating and formalizing proof checking. As a next step (2) we can think of building proof assistants. Proof assistants not only check proofs for us but also help us in making them: they are tools that combine a proof development system and a proof checker. A good article about proof assistants based on dependent type systems can be found in [8]. Another interesting article on computer-assisted mathematics is [7], which includes a short history of computations versus proofs in mathematics. The notion of ‘helping’ or ‘assisting’ in making proofs might be considered vague. For complicated statements, we can think of tools that keep track of the context of the proof and of the remaining proof obligations, and that even fill in parts of the proofs for us automatically.
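What ‘keeping track of the remaining proof obligations’ looks like in practice can be seen in a small example. We use Lean 4 here purely as an illustration; it is our own choice of system and is not discussed in the report. Each tactic transforms the current goal, and the assistant displays what is left to prove.

```lean
-- Each tactic reduces the current goal; Lean reports the remaining obligation.
theorem chain (p q r : Prop) (hp : p) (hpq : p → q) (hqr : q → r) : r := by
  apply hqr  -- remaining goal: q
  apply hpq  -- remaining goal: p
  exact hp   -- no goals left; the kernel then checks the finished proof term
```

The user supplies the strategy, while the system does the bookkeeping and the final check, which is exactly the division of labour between proof development and proof checking described above.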

Proof assistants should make it easier for us to prove mathematical theorems. Then (3) we can think of building a standard library of proved mathematics. After a proof checker has confirmed the correctness of a given mathematical statement and its corresponding proof, they can be stored in such a library. It could be accessible to everyone via the internet and could even be used by the previously mentioned automated proving methods of proof assistants. And although the quality of mathematical work is not as evident as the quality of physical products, this could be the long-awaited ‘quality stamp’ for mathematics. There have already been attempts to build standard libraries of mathematics (see the Mizar project at http://www.mizar.org/ and the PRL project at http://www.cs.cornell.edu/Info/Projects/NuPRL/nuprl.html), but they lack the formal basis that has to be provided by (1) and (2). Barendregt and his group have formalized parts of algebra using a theorem prover. This shows that it is possible to formalize large parts of mathematics, but the process of formalizing mathematics itself is still too direct and informal and needs to be developed further. Much of value has come out of attempts at what are here called phases (2), (3) and (4), but for a successful result this is premature: we first have to start thoroughly at the beginning, with (1). Work in this direction was done in [44], where a syntax-driven derivation system is presented for a formal language of mathematics called Weak Type Theory. This is a start of a more rigorous approach to the translation of mathematical texts (statements and proofs).

We see the extension of proof assistants with more intelligent and sophisticated automated proving methods as the final phase (4) of future work. Part of the branch of automated proving consists of classical theorem proving methods (for example automated induction). New methods come from areas such as neural networks, genetic and DNA computing, and possibly, in the future, even quantum computing.

I want to end these ideas by summarizing the steps that lie ahead of us in a new project.

The new project (for mathematics):

1 Investigate which parts of mathematics can(not) be formalized (i.e. which contain ‘meta-arguments’), which formalization is most usable and allows most parts of (practical) mathematics to be formalized, and totally formalize proof checking for as many parts of mathematics as possible

2 Build a proof assistant (probably based on some form of WTT and some form of TT)

3 Build a standard library (archive) of proved mathematics

4 Further develop automated proving techniques (to build into the proof assistant)

And similarly we can formulate the new project for computer systems:

The new project (for software construction):

1 Formalize as much of program derivation checking as possible

2 Build a programming assistant (environment) based on a suitable (and preferably popular) programming language

3 Build a standard library of reusable correct software (i.e. suitable for component-based software engineering) and its specification

4 Further develop automated proving and program derivation techniques

One of the most important questions, part of step (1), has so far been avoided in this conclusion: what should we take as the basis of mathematics? This is one of the most difficult questions and, as we have seen, many great scientists have thought about it. There is currently no consensus on the best approach, and I am not in a position to give a well-argued opinion. Thorough research of the alternatives will have to yield the best approach and show which choice of foundational system is most usable in practice. The only thing I can say is that recently most people seem to favor type theory over category theory, relational calculi and also set theory. P.J. Scott, for example, favors type theory over category theory in the introduction of [55]. H. Barendregt gives arguments for the use of type theory over set theory in [7], and we quote from [4, the second page of the preface]: “[People prefer the approach they are most familiar with.] However, those familiar with both type theory and axiomatic set theory recognize that in some ways the former provides a more natural vehicle than the latter for formalizing what mathematicians actually do”. On the other hand, at http://www.rbjones.com/rbjpuc/logic/jrh0111.htm we find a detailed assessment of the choice of a foundational system, with advantages of set theory over type theory. Also, several new types of logic have been proposed, such as IF logic (see [37]) and several so-called ‘fuzzy logics’, but so far they seem to lack the precision, formalization and proofs needed to support claims that they can be used successfully as a foundation for mathematics. A final remark on the debate between type theory and axiomatic set theory as a foundational basis: if there is a mapping from the axioms of (some form of) set theory into (some form of) type theory and vice versa, then type-theoretic expressions have their counterparts in set theory. It is interesting to investigate whether among such mappings there is indeed a bijection. That would show that both theories are equivalent in expressive power, so that the debate can turn to the question which theory is more intuitive and useful.

Some do not really believe in a successful formalization of mathematics, but rather see the indeterminacies in mathematical representations and the undecidabilities in any formal system as the source of problem solving and creative power (see [87, page 174]). This standpoint was already expressed in 1807 by the German philosopher Hegel (1770-1831) in [35]: “Against this it must be maintained that truth is not a minted coin that can be handed over ready-made and pocketed as it is.”

I am aware of the limitations of this report. Many chapters are still informal, such as the treatment of Frege's work in chapter 4. The theory of types in chapter 7 and Gödel's incompleteness theorems in chapter 8 are not completely covered, and certain subjects closer to logic (such as intuitionism) are treated only minimally. My only excuse is that it is simply not possible to study all the original works in such a short period of time and to include all the theory in this report. I hope to complete this work at a later stage. It might also be worthwhile to extend (on both sides) the period covered by this report. Recently we have seen interesting new theories on category and type theory and even on the foundations of mathematics, such as Chaitin's results on randomness; he seems to have continued where Gödel and Turing left off. Finally, I would like to remark that the ‘new project’, consisting of the four steps mentioned in this conclusion, is just my own view of the work that lies ahead of us. To end with a concluding remark by Alan Turing, from his paper on the Turing test: “We can only see a short distance ahead, but we can see plenty there that needs to be done”.

Mark Scheffer, August 2001⁵

⁵ P.S. To those who wonder what the turtle and the elephant are doing on the cover of this report, I refer to the website http://zax.mine.nu/stage/.

Appendix A

Timeline and Images

Figure A.1: Luitzen Brouwer

Figure A.2: Georg Cantor

Drawings by Soshichi Uchii; photo of Quine by Kelly Wise; photo of Ramsey courtesy of Harcourt Brace Jovanovich.


Figure A.3: Richard Dedekind

Figure A.4: Gottlob Frege

Figure A.5: Kurt Gödel

Figure A.6: David Hilbert

Figure A.7: John von Neumann

Figure A.8: Giuseppe Peano

Figure A.9: Henri Poincaré

Figure A.10:

Figure A.11: Frank Plumpton Ramsey

Figure A.12: Bertrand Russell

Figure A.13: Alan Turing

Bibliography

[1] A.A. Fraenkel, Y. Bar-Hillel and A. Levy. Foundations of Set Theory. North-Holland Press, Amsterdam, second edition, 1973. First edition 1958.

[2] W. Ackermann and D. Hilbert. Grundzüge der theoretischen Logik, volume XXVII of Die Grundlehren der mathematischen Wissenschaften in Einzeldarstellungen. Springer-Verlag, first edition, 1928.

[3] J.H.J. Almering. Analyse. Delftse Uitgevers Maatschappij, 1993.

[4] P. Andrews. An introduction to mathematical logic and type theory: to truth through proof. Academic press, 1986.

[5] J. Backer and P. Rudnicki. Hilbert's basis theorem. Association of Mizar Users, University of Białystok, 2000. Published in Journal of Formalized Mathematics, volume 12.

[6] H. Barendregt. The Lambda Calculus: Its Syntax and Semantics, volume 103. Science Publishing Company, Inc., 1984.

[7] H. Barendregt and A.M. Cohen. Electronic Communication of Mathematics and the Interaction of Computer Algebra Systems and Proof Assistants. J. Symbolic Computation. Academic Press, 2001.

[8] H. Barendregt and H. Geuvers. Proof-checking using Dependent Type Systems, volume 2, chapter 18, pages 1149-1240 of Handbook of Automated Reasoning. Oxford Press, 2001.

[9] C.J. Bloo. Computational Models. TU/e Press, 2001. Manuscript originally started by H. Geuvers and J. Hooman.


[10] J. Breuer. Introduction to the Theory of Sets. Prentice-Hall, August 1958.

[11] Encyclopaedia Britannica. Item on P. Bernays. EB, 2000.

[12] K.S. Brown. Mathematics. Seanet, 1991.

[13] G. Cantor. Ein Beitrag zur Mannigfaltigkeitslehre. Journal f. reine und angew. Math., Gesammelte Abhandlungen, 84, pages 119-133, 1878. Translated in ‘Contributions to the foundation of the theory of transfinite numbers’ (translation from German) by Philip E. Jourdain, Dover Publishing, 1952.

[14] A. Church. An unsolvable problem in elementary number theory, volume 58. American Journal of Mathematics, 1936.

[15] P.J. Cohen. Set Theory and the Continuum Hypothesis. Benjamin, 1966.

[16] B.J. Copeland. The Church-Turing Thesis. Springer-Verlag, 1997. Item in Stanford Encyclopedia of Philosophy.

[17] D. van Dalen, H.C. Doets and H. de Swart. Sets: Naive, Axiomatic and Applied. Pergamon Press, 1978.

[18] J.W. Dauben. Georg Cantor, His Mathematics and Philosophy of the Infinite. Harvard University Press, 1979.

[19] M. Davis. The Undecidable: Basic Papers on Undecidable Propositions, Unsolvable Problems and Computable Functions. Raven Press, New York, 1965.

[20] Diverse. , 65. Springer-Verlag, Berlin, 1908.

[21] A. Einstein. Relativity: the special and general theory. Methuen Press, 1970.

[22] H. Eves. Mathematical Circles Revisited. Boston Press, 1971.

[23] H. Eves. Foundations and fundamental concepts of mathematics. Dover Publications Inc., Mineola, New York, third edition, 1997.

[24] A. Fraenkel. Einleitung in die Mengenlehre. Springer-Verlag, third edition, 1928.

[25] A.A. Fraenkel. Abstract Set Theory. North-Holland Press, Amsterdam, third edition, 1966. First edition in 1953.

[26] M. Franssen. Cocktail. Eindhoven University Press, 2000. Doctoral thesis.

[27] K. Gödel. On formally undecidable propositions of Principia Mathematica and related systems. Dover Publications, New York, 1992. English translation of Gödel's original 1931 publication of the incompleteness theorem. First published in 1962 by Basic Books, Inc., New York.

[28] D. Goldrei. Classic Set Theory: a guided independent study. Chapman and Hall, 1996.

[29] I. Grattan-Guinness. How did Russell write The Principles of Mathematics (1903)? McMaster University Library Press, 1997. In the Journal of the Bertrand Russell Archive.

[30] I. Grattan-Guinness. From the Calculus to Set Theory 1630-1910. Princeton University Press, 2000. First published in 1980 by G. Duckworth & Co, London.

[31] I. Grattan-Guinness. The Search for Mathematical Roots 1870-1940. Princeton University Press, 2000.

[32] I. Grattan-Guinness. A sideways look at Hilbert's Twenty-three Problems of 1900. Middlesex University Press, 2000.

[33] J. Haim. Introduction of the Israel Mathematical Conference Proceedings, volume 6. Bar-Ilan University Press, 1993.

[34] P.R. Halmos. Naive Set Theory. Van Nostrand Press, London, 1990.

[35] G.W.F. Hegel. Phänomenologie des Geistes. Reprint: Meiner, Hbg., 1807. English translation ‘The Phenomenology of Mind’ by J.B. Baillie in 1910, London.

[36] H. Hermes and H. Schulz. Mathematische Logik. Unknown, 1952. In Encyklopedia Mathematische Wissenschaften, I1, 1, I, page 58.

[37] J. Hintikka. The Principles of Mathematics Revisited. Cambridge University Press, 1996.

[38] A. Hodges. Turing. The Great Philosophers. Phoenix, 1997.

[39] A.D. Irvine. Bertrand Arthur William Russell. Stanford University Press, 2000.

[40] D. Joyce. Hilbert’s 1900 Address. Clark University, Worcester, 1997.

[41] D. Joyce. A list of Hilbert’s problems. Clark University, Worcester, 1997.

[42] D. Joyce. The Mathematical Problems of David Hilbert, http://aleph0.clarku.edu/~djoyce/hilbert/. Clark University, Worcester, 1997.

[43] F. Kamareddine and T. Laan. A reflection on Russell's ramified types and Kripke's hierarchy of truths. Journal of the Interest Group in Pure and Applied Logic, 4(2):195–213, 1996.

[44] F. Kamareddine and R. Nederpelt. A derivation system for a formal language of mathematics. To be published, July 2001.

[45] I. Kaplansky. Encyclopaedia Britannica, item on David Hilbert. EB, 1990.

[46] E. Kasner and J. Newman. Mathematicians and the imagination. New York Publishing, 1940.

[47] S.C. Kleene. Lambda-definability and recursiveness. Duke Mathematical Journal 2:340-353, 1936.

[48] S.C. Kleene. Mathematical Logic. New York, 1967.

[49] G.T. Kneebone. Mathematical logic and the foundations of mathematics. D. van Nostrand Company, 1963. Reprint 2001.

[50] J. Koenderink. Solid Shape. Cambridge, 1990.

[51] K. Kunen. Set theory: an introduction to independence proofs. New York Press, 1980.

[52] T. Laan. A formalization of the ramified type theory. TUE Computing Science Reports, 1994. Technical Report 94-33.

[53] T. Laan. The Evolution of Type Theory in Logic and Mathematics. PhD thesis, Eindhoven University of Technology, 1997.

[54] T. Laan and R.P. Nederpelt. A modern elaboration of the ramified theory of types. Studia Logica, 57(2/3):243–278, 1996.

[55] J. Lambek and P.J. Scott. Introduction to higher order logic. Cambridge University Press, 2001.

[56] P. Linz. An introduction to formal languages and automata. D.C. Heath and Company, 1990.

[57] J.R. Lucas. The conceptual roots of mathematics. Routledge Press, 2000.

[58] D. MacHale. Comic Sections. Dublin, 1993.

[59] Moshé Machover. Set theory, logic and their limitations. Cambridge University Press, 1996.

[60] P. Mancosu. From Brouwer to Hilbert, the debate on the foundations of mathematics in the 1920s. Oxford University Press, 1998.

[61] E. Maor. To infinity and beyond. Boston Press, 1987.

[62] R. Monk. Russell. The Great Philosophers. Routledge, 1999. First published in 1997 by Phoenix.

[63] G.H. Moore. Zermelo's axiom of choice: its origins, development and influence. Springer-Verlag, 1982.

[64] E. Nagel and J. R. Newman. Gödel's proof. New York University Press, 1986. First published in 1958.

[65] G. Peano. Calcolo differenziale e principii di calcolo integrale. Press, 1884.

[66] G. Peano. Applicazioni geometriche del calcolo infinitesimale. Turin Press, 1887.

[67] G. Peano. Calcolo geometrico secondo l'Ausdehnungslehre di H. Grassmann e preceduto dalle operazioni della logica deduttiva. Fratelli Bocca, Torino, 1888. English translation ‘Geometric Calculus: According to the Ausdehnungslehre of H. Grassmann’ by Lloyd Kannenberg, November 1999, Birkhäuser.

[68] G. Peano. Dizionario di matematica. Parte prima. Logica matematica. Unknown, 1901. In Ri(e)vista di mathematica, edited by Peano.

[69] L.J.J. Wittgenstein and P.M. Sullivan. The foundations of mathematics. Unknown, June 1927. Reprinted by F. P. Ramsey, June 1927, Theoria 61 (2) (1995), pages 105-142.

[70] W. Van Orman Quine. Mathematical Logic. Harvard University Press, 1951. Revised edition of Norton, New York 1940.

[71] W. Van Orman Quine. From a Logical Point of View: 9 Logico-Philosophical Essays. Harvard University Press, second edition, 1961. Cambridge, Massachusetts.

[72] W. Van Orman Quine. Set Theory and its Logic. Harvard University Press, 1963. Cambridge, Massachusetts.

[73] R.C.W. Bertrand Russell, entry in Encyclopaedia Britannica. EB, 2000.

[74] J. Richard. Les principes de mathématiques et le problème des ensembles. Revue générale des sciences pures et appliquées, 16, 1905. Published also in Acta Mathematica 30 (1906), pages 295-296.

[75] B. Riemann. Über die Hypothesen, welche der Geometrie zu Grunde liegen. Göttingen Press, 1854.

[76] N. Rose. Mathematical Maxims and Minims. Raleigh NC, 1988.

[77] H. Rubin and J.E. Rubin. Equivalents of the axiom of choice. North-Holland Press, Amsterdam, 1963.

[78] B. Russell. My philosophical development. London: George Allen and Unwin, New York: Simon and Schuster, 1959.

[79] B. Russell. Introduction to Mathematical Philosophy. The Great Philosophers. London: George Allen and Unwin; New York: The Macmillan Company, 1999. First published in 1997 by Phoenix.

[80] B. Russell. The autobiography of Bertrand Russell. Routledge, 2000.

[81] S. Shelah. Proper Forcing. Lecture Notes in Mathematics. Springer-Verlag, 1982.

[82] M. Sipser. Introduction to the theory of computation. PWS Publishing Company, Boston, 1997.

[83] A.T. Skolem. Einige Bemerkungen zur axiomatischen Begründung der Mengenlehre. Akademiska Bokhandeln, Helsinki, 1922. In ‘Matematikerkongressen i Helsingfors 4-7 juli 1922, Den femte skandinaviska matematikerkongressen’, pages 217-232. Reprinted in ‘Selected Works in Logic’, by A.T. Skolem, edited by Jens E. Fenstad, 1970, Universitetsforlaget, Oslo.

[84] R.M. Smullyan. Gödel's incompleteness theorems. Oxford Logic Guides. Oxford University Press, 1992.

[85] B. Sobocinski. L'analyse de l'antinomie Russellienne par Lesniewski. Unknown, 1950. Methodus I, pages 94-107, 220-228, 308-316; Methodus II, pages 237-257.

[86] F. Kamareddine, T. Laan and R. Nederpelt. Types in Logic and Mathematics before 1940, volume 8. Bulletin of Symbolic Logic, January 2002. To be published.

[87] M. Tiles. Mathematics and the image of reason. Routledge, 1991.

[88] E.C. Titchmarsh. Mathematical Maxims and Minims. Press, 1988.

[89] A.M. Turing. On computable numbers, with an application to the Entscheidungsproblem. Proceedings of the London Mathematical Society, Series 2, volume 42, pages 230-265, 1936. With corrections from Proceedings of the London Mathematical Society, Series 2, Vol. 43 (1937), pages 544-546. Reprinted with some annotations in ‘The Undecidable: Basic Papers on Undecidable Propositions, Unsolvable Problems and Computable Functions’, ed. Martin Davis, 1965, Raven Press, New York.

[90] A.M. Turing. Intelligent Machinery. National Physical Laboratory, 1948. National Physical Laboratory Report, reprinted in ‘Machine Intelligence 5’, edited by B. Meltzer and D. Michie, 1969, Edinburgh University Press.

[91] Unknown. Encyclopaedia Britannica; Item on Principia Mathematica. EB, 2000.

[92] Unknown. Encyclopaedia Britannica; Item on Turing. EB, 2000.

[93] J. van Heijenoort. From Frege to Gödel: source book in mathematical logic 1879-1931. Harvard University Press, 1967.

[94] W. van Orman Quine. New foundations for Mathematical Logic. The American Mathematical Monthly, 44(2), pages 70-80, February 1937.

[95] Various. The Mathematical Intelligencer, volume 13. Springer-Verlag, Berlin, 1991.

[96] J. von Neumann. Zur Einführung der transfiniten Zahlen. Acta Szeged, 1:199-208 [I, 3], 1923.

[97] J. Weiner. Frege in Perspective. Cornell, 1990.

[98] J. Weiner. Frege. Past Masters. Oxford University Press, 1999.

[99] A. Whitehead. An introduction to Mathematics. Williams and Norgate, London, 1911.

[100] A. Whitehead. A treatise on universal algebra. New York, 1960.

[101] E. Zermelo. Untersuchungen über die Grundlagen der Mengenlehre, I. Springer-Verlag, 1908. In Mathematische Annalen 65, 1908, pages 261-281.