The Theory of the Foundations of Mathematics - 1870 to 1940
Mark Scheffer (Version 1.0)

Mark Scheffer, id. 415968, e-mail: [email protected]. Last changes: March 22, 2002. This report is part of a practical component of the Computing Science study at the Eindhoven University of Technology.
To work on the foundations of mathematics, two things are needed: Love and Blood.
- Anonymous quote, 2001.

Contents
1 Introduction
2 Cantor’s paradise
  2.1 The beginning of set-theory
  2.2 Basic concepts
3 Mathematical constructs in set-theory
  3.1 Some mathematical concepts
  3.2 Relations
  3.3 Functions
  3.4 Induction Methods
    3.4.1 Induction
    3.4.2 Deduction
    3.4.3 The principle of induction
  3.5 Real numbers
    3.5.1 Dedekind’s cuts
    3.5.2 Cantor’s chains of segments
    3.5.3 Cauchy-sequences
    3.5.4 Properties of the three definitions
  3.6 Infinite sets
  3.7 The Continuum Hypothesis
  3.8 Cardinal and Ordinal numbers and Paradoxes
    3.8.1 Cardinal numbers and Cantor’s Paradox
    3.8.2 Ordinal numbers and Burali-Forti’s Paradox
4 Peano and Frege
  4.1 Peano’s arithmetic
  4.2 Frege’s work
5 Russell
  5.1 Russell’s paradox
  5.2 Consequences and philosophies
  5.3 Zermelo Fraenkel
    5.3.1 Axiomatic set theory
    5.3.2 Zermelo Fraenkel (ZF) Axioms
6 Hilbert
  6.1 Hilbert’s proof theory
  6.2 Hilbert’s 23 problems
7 Types
  7.1 Russell and Whitehead’s Principia Mathematica
  7.2 Ramsey, Hilbert and Ackermann
  7.3 Quine
8 Gödel
  8.1 Informally: Gödel’s incompleteness theorems
  8.2 Formally: Gödel’s Incompleteness Theorems
    8.2.1 On formally undecidable propositions
    8.2.2 The impossibility of an ‘internal’ proof of consistency
    8.2.3 Gödel numbering and a concrete proof of G1, G2 and G3
  8.3 Gödel’s theorem and Peano Arithmetic
  8.4 Consequences
  8.5 Neumann-Bernays-Gödel axioms
9 Church and Turing
  9.1 Turing and Turing Machine
  9.2 Church and the Lambda Calculus
  9.3 The Church-Turing thesis
10 Conclusion
A Timeline and Images

Mathematical Notations
Many different notations have been developed for set theory and logic. Most notations that we have used are standard today; other notations that we have used are introduced in the text.
Mathematical Logic
symbol   meaning                               also described as

∧        conjunction                           and
∨        disjunction (inclusive)               or
¬        negation                              not
ϕ(x)     propositional function
→        implication                           if ... then
↔        bi-implication                        if and only if, iff
≡        equivalence                           is equivalent to
∀        universal quantifier                  for all
∃        existential quantifier                exists
∃!       one-element existential quantifier    exists a unique
In most places we have chosen to use the following notation¹ to denote quantifications:

(relation : range : term) denotes the relationship over a set of terms ranging over range

Consider a general pattern (Qx : ϕ(x_0, ..., x_n) : t(x_0, ..., x_n)), with Q a quantifier, ϕ a boolean expression in terms of the dummies x_0, ..., x_n, and t(x_0, ..., x_n) the term of the quantification. The quantification is the accumulation of values t(x_0, ..., x_n) using an operator or relation indicated by Q, over all values (x_0, ..., x_n) for which ϕ(x_0, ..., x_n) holds.

¹ Notation originally due to E.W. Dijkstra.
This notation is suitable for formal manipulation and unambiguous in the sense that it explicitly indicates the quantifier Q, the dummies and the range of the dummies that is indicated by the boolean expression ϕ (i.e. it exactly determines the domain of the quantification). This allows us to reason about general properties of quantifications, in a way in which the (scopes of the) bound variables are clearly identified. Note that this type of quantification is only suitable for binary operations that are symmetric and associative.
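As a rough illustration of this pattern, the following Python sketch accumulates the term values over all dummies that satisfy the range. The helper name `quantify`, its parameters and the finite search domain `range(100)` are assumptions made for this illustration only; they are not part of the notation itself.

```python
from functools import reduce
import operator


def quantify(op, identity, domain, rng, term):
    """Accumulate term(x) with op over every x in domain for which rng(x) holds."""
    return reduce(op, (term(x) for x in domain if rng(x)), identity)


# (Σx : 0 ≤ x ≤ 5 : x²); the finite domain range(100) is an assumption.
total = quantify(operator.add, 0, range(100),
                 lambda x: 0 <= x <= 5, lambda x: x ** 2)

# (∃x : x ∈ N : x³ − x² = 18), searched over a finite slice of N only.
exists = quantify(operator.or_, False, range(100),
                  lambda x: True, lambda x: x ** 3 - x ** 2 == 18)

print(total, exists)
```

Both operators used here (addition and disjunction) are symmetric and associative, so the order in which the values are accumulated does not matter, in line with the restriction stated above.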
Example: (Σx : 0 ≤ x ≤ 5 : x²) = 0² + 1² + 2² + 3² + 4² + 5² = Σ_{x=0}^{5} x²

Example: (∃x : x ∈ N : x³ − x² = 18) ≡

‘there exists a natural number x such that x³ − x² = 18’

If the dummy ranges over all possible values, or if it is clear what the range of a variable is, we can omit the range.
Example: (∀x : true : x ∈ A → x ∈ B) ≡
(∀x :: x ∈ A → x ∈ B) ≡
‘all elements of A are also elements of B’

Chapter 1
Introduction
Pure mathematics is, in its way, the poetry of logical ideas.
This report covers the most important developments in the theory of the foundations of mathematics in the period from 1870 to 1940. The tale of the foundations is fairly familiar in general terms and for its philosophical content; here the main emphasis is laid on the mathematical theory. The history of the foundations of mathematics is a complicated and many-sided story; with this article I do not aim to give a definitive or complete version, but to capture what I consider the essence of the theoretical developments, and to present them in a clear and modern setting. Some basic mathematical knowledge of set theory and logic is presupposed.
By the middle of the nineteenth century, certain logical problems (for example paradoxes around the notions of infinity, the infinitesimal and continuity) at the heart of mathematics had inspired a movement, led by German mathematicians, to provide mathematics with more rigorous foundations.
This is where the theory of this report begins, with the emergence of set theory in the work of the German mathematician Cantor. In section 2.1 we informally describe how work on a problem concerning trigonometric series gradually led Cantor to his theory of sets (section 2.2). As a result of the work of Weierstrass, Dedekind and Cantor, pure mathematics had been provided with much more sophisticated foundations. The notion of infinitesimal had been banished, ‘real’ numbers had been provided with a logically consistent definition (section 3.5), continuity had been redefined and, more controversially, a whole new branch of arithmetic had been invented which addressed itself to the problems (e.g. paradoxes) of infinity (sections 3.6, 3.7). In 1895 Cantor discovered a paradox (section 3.8.1) that he did not publish but communicated to Hilbert in 1896. In 1897 it was rediscovered in a slightly different form by Burali-Forti (section 3.8.2). Cantor and Burali-Forti could not resolve this paradox, but it was not taken very seriously, partly because the paradoxes appeared in a rather technical region.
The Italian mathematician Peano (section 4.1) was able to show that the whole of arithmetic could be founded upon a system that uses three basic notions and five initial axioms. At the same time the German mathematician Frege (section 4.2) worked on developing a logical basis for mathematics. Like Peano, Frege wanted to put mathematics on firm grounds. But Frege’s grounds were strictly logical; he followed a development later called logicism, also known as the development of so-called mathematical logic.
The British mathematician Russell noted Peano’s work and later that of Frege. Soon thereafter he showed (section 5.1) how finite descriptions like ‘set of all sets’ could be self-contradictory (i.e. paradoxical) and pointed out the difficulties that arose with self-referential terms. The paradox that Russell found existed not only in specific technical regions but in all of the axiomatic systems underlying mathematics at that time. But since the paradoxes could be avoided in most practical applications of set theory, the belief in set theory as a proper foundation of mathematics remained. Axiomatic set theory (section 5.3.1) was an attempt to come to a theory without paradoxes. Various responses to the paradox (section 5.2) led to new sets of axioms for set theory. The two main approaches are by the German mathematicians Zermelo and Fraenkel (section 5.3), and by the Hungarian von Neumann, the Austrian Gödel and the Swiss Bernays (section 8.5). It also led to the emergence of the ‘intuitionistic’ philosophy of mathematics by the Dutch mathematician Brouwer (not covered here) and to a theory of types, proposed by Russell himself with the help of his former teacher, the English mathematician Whitehead. Despite the paradox, Russell and Whitehead still claimed that all mathematics could be founded on a mathematical logic; this belief was given a definite presentation in their work ‘Principia Mathematica’ (section 7.1). Various consequences followed (section 7.3) and new conceptions of logic arose (by Wittgenstein and Ramsey, see section 7.2).
At the turn of the century, the German mathematician David Hilbert listed certain important problems concerning the foundations of mathematics and mathematics in general (section 6.2). To overcome paradoxes and other problems that arose in existing systems, Hilbert developed a theory of axiomatic systems (section 6.1). He then encouraged his student Zermelo to use this axiomatic method to develop the first set of axioms for set theory (section 5.3.2). Hilbert subsequently made more precise demands on any proposed set of axioms for mathematics (section 6.1) in terms of consistency, completeness and decidability.
In 1931 Gödel showed that consistency and completeness could not both be attained (chapter 8). Gödel’s work left outstanding Hilbert’s question of decidability. The English mathematician Turing proved in 1936 that there are undecidable problems, by giving the so-called halting problem that cannot be solved by any algorithm (section 9.1), after formalizing the notion of algorithm with his concept of the Turing Machine. The American mathematician Church (independently) obtained the same result but with another formalization of the notion of an algorithm, using his computational model of lambda calculus (section 9.2). In section 9.3 we state that these two notions are equivalent and correspond to the intuitive notion of algorithm or computability. In chapter 10 I summarize the theory of the foundations of mathematics, before giving my own opinion and making some suggestions for future work.
This article is part of the practical component of my study of computing science, and was written for a large part in 8 weeks at the Heriot-Watt University in Edinburgh under supervision of prof. F. Kamareddine. I want to thank Rob Nederpelt and the formal methods section of the computing science department of the Eindhoven University of Technology for making this possible. Rob Nederpelt always inspired me to continue working on this report and was patient in explaining difficult proofs to me. And last but not least, I want to thank Fairouz Kamareddine for her support and positive motivation, and Boukje Nouwen (as she breathes a sigh of relief that this is (I think) the last revision) for the typesetting and editing of large parts of this document and for helping me in many ways to finish this article in such a short period of time.

Chapter 2
Cantor’s paradise
2.1 The beginning of set-theory
Perhaps the most surprising thing about mathematics is that it is so surprising. The rules which we make up at the beginning seem ordinary and inevitable, but it is impossible to foresee their consequences. These have only been found out by long study, extending over many centuries. Much of our knowledge is due to a comparatively few great mathematicians such as Newton, Euler, Gauss, or Riemann; few careers can have been more satisfying than theirs. They have contributed something to human thought even more lasting than great literature, since it is independent of language.
- Titchmarsh, E. C. in [88]
By the late 19th century the discussions about the foundations of geometry had become the focus for a running debate about the nature of the branches of mathematics ([23, last paragraph of section 35, page 69/70]). Although there had been no conscious plan leading in that direction, the stage was set for a consideration of questions about the fundamental nature of mathematics.
In the study of logic, the work of the English mathematician George Boole in the 1850s ([49, chapter 2.S4, page 51]), and the American Charles Sanders Peirce around 1880 ([49, page 187]), had contributed to the development of a symbolism to explore logical deductions, and in Germany the logician Gottlob Frege (see [98]) had directed keen attention to fundamental questions.
All of these debates came together through the pioneering work of the German mathematician Georg Cantor on the concept of a set. Cantor had begun work in this area because of his interest in Riemann’s theory of trigonometric series.
In Germany at the university of Halle, the direction of Cantor’s research turned away from number theory and towards analysis. This was due to Heine, one of his senior colleagues at Halle, who challenged Cantor to prove the open problem on the uniqueness of representation of a function as a trigonometric series (see [30, section 5.2, page 182]). Starting from the work on trigonometric series and on the function of a complex variable done by the German mathematician Bernhard Riemann (see [75]) in 1854, Cantor in 1870 showed ([30, page 182]) that such a function can be represented in only one way by a trigonometric series. Consideration of the collection of numbers (originally termed ‘point sets’, see [30, section 5.2, page 184]) that would not conflict with such a representation led him, first, in 1872, to define irrational numbers in terms of convergent sequences of rational numbers (or quotients of integers, see section 3.5.2) and then to begin his major lifework, the theory of sets and the concept of transfinite numbers.

2.2 Basic concepts
The essence of mathematics lies in its freedom.
- Georg Cantor, quoted in [58]
In 1874 Cantor published his first article on set-theory. A set, wrote Cantor (in ‘Untersuchungen über die Grundlagen der Mengenlehre I’, published in [20, page 261-281]), is “a collection of definite, distinguishable objects of perception or thought conceived as a whole”. In this report we use a similar description of the concept of a set.
What is a set? A (finite or infinite) collection of objects, that is considered as a single, abstract object.
A set is sometimes also called an aggregate, a class or a manifold (as it was first called by Riemann, see [31, page 88], and later by the mathematician Russell). The objects are also called elements or members of the set.

We denote a set by writing its elements between braces ‘{’ and ‘}’, and membership of an element in a set by the membership relation ∈.
Example: If we consider a set that contains natural numbers, we write 4 ∈ {2, 3, 4, 5} to indicate that 4 is an element of the set {2, 3, 4, 5}. We write 4 ∉ {7, 8, 9} to indicate that 4 is not an element of the set {7, 8, 9}.
In a mathematical context we mostly consider sets of numbers and functions. We denote the well-known set of natural numbers by N (this set is also called the naturals), the integers by Z, the fractional numbers by Q (this set is also called the rationals) and the reals by R (this set is also called the continuum). The objects of a set themselves can also be sets.
What is set theory? A branch of mathematics that deals with the properties of well-defined collections of objects, which may be of a mathematical nature, such as numbers or functions, or not.
Cantor defined ([49, page 288]) two sets A and B to be identical (equal), notation A = B, if and only if A and B have the same elements. When later set-theory was axiomatized, this definition became also known as the
Axiom of extensionality: A = B := (∀x :: (x ∈ A ↔ x ∈ B))
Example: {3, 3, 7} = {7, 3} and {2, {3, 4}} ≠ {{2, 3}, 4}
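The axiom of extensionality can be tried out on small finite sets. The sketch below uses Python's `frozenset` so that sets can themselves be members of sets; the helper `extensionally_equal` is a hypothetical name that simply spells out the axiom, and Python's built-in `==` on sets behaves the same way.

```python
def extensionally_equal(a, b):
    """A = B iff every element of A is in B and every element of B is in A."""
    return all(x in b for x in a) and all(x in a for x in b)


# Repetition and order of elements do not matter.
a = frozenset([3, 3, 7])
b = frozenset([7, 3])

# Nesting does matter: these two sets have different elements.
c = frozenset([2, frozenset([3, 4])])
d = frozenset([frozenset([2, 3]), 4])

print(extensionally_equal(a, b), extensionally_equal(c, d))
```

Here the first pair has the same elements, while the second pair does not: 2 is an element of the first nested set but not of the second.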
The relation ‘is a subset of’, notation ⊆, indicates that one set is contained in the other:
Definition of subset: A ⊆ B := (∀x :: x ∈ A → x ∈ B)
Definition of proper subset: A ⊂ B := (A ⊆ B ∧ A ≠ B)
We often want to create a new set from a given set by selecting elements that have certain properties. For example we take the set of powers of three or the set of all even numbers (to be exact: the set containing those elements of the set of natural numbers that have the property to be divisible by 2). This principle was used by Cantor, and we also call it the unrestricted or naive comprehension principle because it later (see sections 3.8 and 5.1) turned out to be untenable.
Comprehension principle: For all properties ϕ there is precisely one set, denoted by {x | ϕ(x)}, whose elements are exactly those objects which have the property ϕ.
We thus have that y ∈ {x | ϕ(x)} ↔ ϕ(y). As a consequence (by taking ϕ(x) = false for all x), there is at least one set that has no elements: the empty set, denoted by ∅.

Theorem: (∃!x :: (∀y :: y ∉ x))
Proof: If we take ϕ to be false, the comprehension principle says that ‘there is precisely one set whose elements are exactly those objects which have the property false’. In mathematical notation: (∃!x :: (∀y :: y ∈ x ↔ false)). This is equivalent to saying there is no element y that can be a member of x: (∃!x :: (∀y :: y ∉ x)). From now on, we denote this unique set x by ∅ and call it the empty set.

Corollary: (∀a :: ∅ ⊆ a)
Proof: We want to prove that (∀a :: ∅ ⊆ a) or, using the definition of the subset relation: (∀x :: x ∈ ∅ → x ∈ a). From the previous theorem we know that (∀y :: y ∉ ∅). This yields us (∀x :: false → x ∈ a), which is true.
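A set comprehension in Python gives a feel for the principle, with one important caveat: Python always selects from a given collection, so the sketch below is a restricted form of comprehension over an assumed finite universe, not the unrestricted principle stated above. The names `comprehension` and `universe` are hypothetical.

```python
def comprehension(universe, phi):
    """Return {x | x in universe and phi(x)} as a frozenset."""
    return frozenset(x for x in universe if phi(x))


universe = range(20)  # assumed finite universe, for illustration only

evens = comprehension(universe, lambda x: x % 2 == 0)
empty = comprehension(universe, lambda x: False)  # property 'false'

# The empty set is a subset of every set we try.
samples = [evens, frozenset(), frozenset([1, 2, 3])]
print(empty == frozenset(), all(empty <= s for s in samples))
```

Taking the property that is always false yields the empty set, and the subset check mirrors the corollary: the implication `x ∈ ∅ → x ∈ a` holds vacuously for each sample set.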
Using the comprehension principle we can create new sets from given sets. So now we can introduce some operations on sets, by applying the comprehension principle. But before we do that, we first introduce some general (i.e. regardless whether the operations are set-theoretic or not) properties of operations: idempotence, commutativity, associativity and distributivity. Although Cantor did not formulate these properties as such, they are used throughout calculus and are useful in the set theory that follows in this chapter. Suppose ⊕ and ⊗ are binary¹ operations on a certain domain and E, F and G are elements of that domain (for example sets), on which we have defined the equality relation ‘=’.

Definition of idempotence: ⊕ is idempotent := (∀E :: E ⊕ E = E)

Definition of commutativity: ⊕ is commutative := (∀E,F :: E ⊕ F = F ⊕ E)

Definition of associativity: ⊕ is associative := (∀E,F,G :: (E ⊕ F) ⊕ G = E ⊕ (F ⊕ G))

Definition of distributivity: ⊕ is distributive² over ⊗ := (∀E,F,G :: E ⊕ (F ⊗ G) = (E ⊕ F) ⊗ (E ⊕ G))

¹ These properties can also be generalized to operations of arbitrary arity, but this will not be necessary for our discussion.
² This form of distributivity is also called left-distributivity, as opposed to right-distributivity. ⊕ is right-distributive over ⊗ := (∀E,F,G :: (E ⊗ F) ⊕ G = (E ⊕ G) ⊗ (F ⊕ G)). In ordinary mathematics this distinction is often left out for commutative operations, and we for example simply say that × is distributive over + (when in fact it is both left- and right-distributive).
The symbol ∪ is employed to denote the union of two sets. Thus, the set A ∪ B is defined as the set that consists of all elements belonging either to set A or set B.
Definition of union: A ∪ B := {x | x ∈ A ∨ x ∈ B}
The intersection operation is denoted by the symbol ∩. A ∩ B is defined as the set composed of all elements that belong to both A and B.
Definition of intersection: A ∩ B := {x | x ∈ A ∧ x ∈ B}
Any two sets the intersection of which is the empty set are said to be disjoint. A collection of sets is called (pairwise) disjoint or mutually exclusive if any two distinct sets in it are disjoint.

Example: The operations union and intersection on sets are both idempotent, commutative and associative.
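The claim in this example can be checked by brute force on a small case. The following sketch tests the four properties defined earlier for union and intersection on all subsets of {0, 1, 2}; the helper names are hypothetical, and an exhaustive check on one finite domain is of course an illustration, not a proof.

```python
from itertools import combinations, product
import operator


def subsets(universe):
    """All subsets of a finite set, as frozensets."""
    items = list(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def is_idempotent(op, domain):
    return all(op(e, e) == e for e in domain)


def is_commutative(op, domain):
    return all(op(e, f) == op(f, e) for e, f in product(domain, repeat=2))


def is_associative(op, domain):
    return all(op(op(e, f), g) == op(e, op(f, g))
               for e, f, g in product(domain, repeat=3))


def is_left_distributive(op, over, domain):
    return all(op(e, over(f, g)) == over(op(e, f), op(e, g))
               for e, f, g in product(domain, repeat=3))


domain = subsets({0, 1, 2})
union, intersection = operator.or_, operator.and_  # | and & on frozensets

for name, op in (("union", union), ("intersection", intersection)):
    print(name, is_idempotent(op, domain), is_commutative(op, domain),
          is_associative(op, domain))
```

On this domain union also distributes over intersection and vice versa, whereas set difference (`operator.sub`) fails both idempotence and commutativity.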
The difference of sets B and A, denoted B − A, contains those elements of B, that are not in A.
Definition of difference: B − A := {x | x ∈ B ∧ x ∉ A}
If A ⊆ B we often call the difference B − A the relative complement of A in B. We then call B the universe, and if it is clear what the universe is we often denote the relative complement of A by A^C. From the definitions that we have introduced so far, we can deduce three properties that are known as the laws of reciprocity. The second and third law are also known as the laws of de Morgan, named after the English mathematician Augustus de Morgan:

First law of reciprocity: A ⊆ B ↔ A^C ⊇ B^C
Second law of reciprocity: (A ∪ B)^C = A^C ∩ B^C
Third law of reciprocity: (A ∩ B)^C = A^C ∪ B^C
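The three laws can be checked exhaustively for a small universe. In this sketch the universe {0, 1, 2, 3} and the helper `complement` are assumptions chosen for illustration; the check covers every pair of subsets of that universe and nothing more.

```python
from itertools import combinations

universe = frozenset(range(4))  # assumed small universe


def complement(a):
    """Relative complement of a in the fixed universe."""
    return universe - a


all_subsets = [frozenset(c) for r in range(len(universe) + 1)
               for c in combinations(sorted(universe), r)]

first = all((a <= b) == (complement(a) >= complement(b))
            for a in all_subsets for b in all_subsets)
second = all(complement(a | b) == complement(a) & complement(b)
             for a in all_subsets for b in all_subsets)
third = all(complement(a & b) == complement(a) | complement(b)
            for a in all_subsets for b in all_subsets)

print(first, second, third)
```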
We define the power set of V, denoted by P(V), as the set of all subsets of V. Note that if V ≠ ∅, this operation creates a larger set from a given set V.

Definition of powerset: P(V) := {A | A ⊆ V}

Given a set V, we thus have that (∀y :: y ∈ P(V) ↔ y ⊆ V)
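For a finite set the power set can be enumerated directly. The sketch below is one possible way to do this with `itertools`; the function name `power_set` is a hypothetical choice.

```python
from itertools import chain, combinations


def power_set(v):
    """Return P(v), the set of all subsets of the finite set v."""
    items = list(v)
    all_combinations = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1))
    return frozenset(frozenset(c) for c in all_combinations)


p = power_set({1, 2, 3})
print(len(p), frozenset([1, 3]) in p)
```

A set with n elements has 2^n subsets, so the power set of a three-element set has eight members, and each member y satisfies y ⊆ V as in the property above.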
We can extend the union of a pair of sets to any finite collection of sets; the union is then defined as the set of all objects which belong to at least one set in the collection A. We can do the same for the intersection.

Definition: ⋃A := {x | (∃y :: y ∈ A ∧ x ∈ y)}
Definition: ⋂A := {x | (∀y :: y ∈ A → x ∈ y)}
We can divide a set of objects into a partition, that is a family of subsets that are mutually exclusive and jointly exhaustive. Assume P is a set of subsets of X.
Definition of partition: P is a partition of X := X = ⋃{A | A ∈ P} ∧ (∀A, B : A, B ∈ P : A = B ∨ A ∩ B = ∅)
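The union and intersection of a collection, and the partition test built on them, can be sketched as follows for finite collections. The function names are hypothetical, and `big_intersection` is only defined here for a non-empty collection, which is an assumption of this sketch.

```python
from functools import reduce


def big_union(collection):
    """All x that belong to at least one set y in the collection."""
    return frozenset(x for y in collection for x in y)


def big_intersection(collection):
    """All x that belong to every set in a non-empty collection."""
    sets = [frozenset(y) for y in collection]
    return reduce(lambda a, b: a & b, sets)


def is_partition(p, x):
    """p covers x and any two blocks are equal or disjoint."""
    blocks = [frozenset(a) for a in p]
    covers = big_union(blocks) == frozenset(x)
    disjoint = all(a == b or not (a & b) for a in blocks for b in blocks)
    return covers and disjoint


print(is_partition([{1, 2}, {3}, {4, 5}], {1, 2, 3, 4, 5}))
print(is_partition([{1, 2}, {2, 3}], {1, 2, 3}))
```

The second collection covers {1, 2, 3} but is not a partition, because its two blocks share the element 2.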
In this chapter I have made extensive use of [30] in section 2.1 and [17] in section 2.2.

Chapter 3
Mathematical constructs in set-theory
3.1 Some mathematical concepts
The mathematician is entirely free, within the limits of his imagination, to construct what world he pleases. What he is to imagine is a matter for his own caprice; he is not thereby discovering the fundamental principles of the universe nor becoming acquainted with the ideas of God. If he can find, in experience, sets of entities which obey the same logical scheme as his mathematical entities, then he has applied his mathematics to the external world; he has created a branch of science.
- J.W.N. Sullivan in Aspects of Science, 1925
Now that we have this apparatus of set-theory available, we will see that it is not just a separate branch of mathematics, but that we can define some basic mathematical constructs in set-theory. In this section we will consider pairs and the cartesian product, necessary before we can treat relations (in section 3.2) and functions (in section 3.3).
First we consider the mathematical concept of an ordered pair ⟨a, b⟩. Compared to a ‘normal’ pair, where two pairs are considered equal if they have the same elements, we want an ordered pair to also have the property that the elements appear in the same order:

(∀a, b, c, d :: ⟨a, b⟩ = ⟨c, d⟩ ↔ a = c ∧ b = d)

We can now easily verify that the following definition (see [17, chapter 8]) in set-theory satisfies the desired property.

Definition of ordered pair¹: ⟨a, b⟩ := {{a}, {a, b}}

As the cartesian product A × B is by definition the set of all ordered pairs ⟨a, b⟩ with a ∈ A and b ∈ B, we can now use the same definition in set-theory:

Definition of cartesian product: A × B := {⟨a, b⟩ | a ∈ A ∧ b ∈ B}
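To see that a pure set can encode order, the sketch below builds ordered pairs with the usual Kuratowski encoding {{a}, {a, b}} out of `frozenset` values and checks the characteristic property on a small domain. The function names `pair` and `cartesian_product` are hypothetical, and the check over {0, 1, 2} is only an illustration.

```python
from itertools import product


def pair(a, b):
    """Kuratowski-style ordered pair {{a}, {a, b}} built from frozensets."""
    return frozenset([frozenset([a]), frozenset([a, b])])


def cartesian_product(a_set, b_set):
    """The set of all encoded pairs with first element in a_set, second in b_set."""
    return frozenset(pair(a, b) for a in a_set for b in b_set)


# Characteristic property: pairs are equal iff they agree componentwise.
small = range(3)
property_holds = all(
    (pair(a, b) == pair(c, d)) == (a == c and b == d)
    for a, b, c, d in product(small, repeat=4))

print(property_holds, len(cartesian_product({1, 2}, {3, 4})))
```

Note that `pair(1, 1)` collapses to {{1}}, which is still distinguishable from every pair with two different components.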
Let V = {V_i | i ∈ I} be a set of sets. We now define the cartesian product of a set of sets, denoted by ×V or ×_{i∈I} V_i. The definition uses the concept of a function, that will be introduced in section 3.3.

Definition of cartesian product of a set of sets: ×V := {f : I → ⋃_{i∈I} V_i | (∀i : i ∈ I : f(i) ∈ V_i)}
¹ Representation originally by Kuratowski, see [49, page 294].

3.2 Relations
Mathematicians do not study objects, but relations between objects. Thus, they are free to replace some objects by others as long as the relations remain unchanged. Content to them is irrelevant: they are interested in form only.

- J.H. Poincaré
In mathematics, a relation maps each element from an input set (called domain) to either true or false. We formalize this notion in set-theory.
Definition of binary relation: R is a binary relation between X and Y := R ⊆ X × Y
Note: We can easily generalize this definition for n-ary relations: R is an n-ary relation on X_1, ..., X_n := R ⊆ X_1 × X_2 × ... × X_n, for n ∈ N. We call n the arity of the relation.
Example: We have already seen the definitions of the subset and proper subset relations in section 2.2. There we defined the set R ⊆ X × Y implicitly by using a statement; only those pairs ⟨x, y⟩ for which the statement holds are elements of R.
We define the following shorthand notation (sometimes also written in infix notation as xRy): R(x, y) := ⟨x, y⟩ ∈ R
The mathematical expression ‘x < y’ is thus shorthand for ⟨x, y⟩ ∈ <.
Example: The relation < on the naturals (i.e. between N and N) can be defined as:

⟨0, 1⟩, ⟨1, 2⟩, ⟨2, 3⟩, ...
⟨0, 2⟩, ⟨1, 3⟩, ⟨2, 4⟩, ...
⟨0, 3⟩, ⟨1, 4⟩, ⟨2, 5⟩, ...
...
On a relation R we can define the concepts of domain and range.
Definition of domain, range:
dom(R) := {x ∈ X | (∃y : y ∈ Y : R(x, y))}
ran(R) := {y ∈ Y | (∃x : x ∈ X : R(x, y))}
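A finite slice of the relation < makes these definitions concrete. In the sketch below a relation is stored as a Python set of tuples, which stands in for a set of ordered pairs; the restriction to {0, ..., 4} and the names `less_than`, `dom` and `ran` are assumptions of the illustration.

```python
X = range(5)
Y = range(5)

# The relation < between X and Y as a set of pairs (a subset of X × Y).
less_than = {(x, y) for x in X for y in Y if x < y}


def dom(r):
    """All x for which some pair (x, y) is in r."""
    return {x for (x, _) in r}


def ran(r):
    """All y for which some pair (x, y) is in r."""
    return {y for (_, y) in r}


print(sorted(dom(less_than)), sorted(ran(less_than)))
```

On this slice 4 is missing from the domain, because nothing in Y is larger than it, and 0 is missing from the range.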
If we define the identity relation of X, we want it to have the usual property that id_X(x) = x for all x ∈ X (see for example [3, section 1.9.5.b, page 30]). In set-theory, we denote the identity relation on V by I_V.

Definition of identity relation: I_V := {⟨x, x⟩ | x ∈ V}
Assume R is a binary relation on a set X (i.e. R ⊆ X × X). As we did for operations in section 2.2, we can also define some general properties of relations. Note that we have already defined an equality relation ‘=’ in section 2.2. For each property we can either state explicitly on which domain it holds (e.g. R is reflexive on X) or leave this implicit (e.g. simply R is reflexive).
Definition of reflexivity: R is reflexive := (∀x : x ∈ X : R(x, x))
Definition of symmetry: R is symmetric := (∀x, y : x, y ∈ X : R(x, y) → R(y, x))
Definition of anti-symmetry: R is anti-symmetric := (∀x, y : x, y ∈ X : R(x, y) ∧ R(y, x) → x = y)
Definition of transitivity: R is transitive := (∀x, y, z : x, y, z ∈ X : R(x, y) ∧ R(y, z) → R(x, z))
Definition of connectivity: R is connective := (∀x, y : x, y ∈ X : R(x, y) ∨ (x = y) ∨ R(y, x))
Definition of equivalence: R is an equivalence relation := R is reflexive, symmetric and transitive
Note: Anti-symmetric is not the same as ‘not symmetric’, and it also differs from asymmetric, which usually means (∀x, y : x, y ∈ X : R(x, y) → ¬R(y, x)).
Example: The subset relation is reflexive, anti-symmetric (note that the proof of anti-symmetry uses the axiom of extensionality of section 2.2) and transitive, but not connective.
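The properties claimed for the subset relation can be checked on a small case. The sketch below stores a relation as a set of tuples and tests each definition by exhaustive enumeration over the power set of {0, 1}; the function names and the choice of domain are assumptions, and a finite check is an illustration rather than a proof.

```python
from itertools import combinations, product


def is_reflexive(r, xs):
    return all((x, x) in r for x in xs)


def is_symmetric(r, xs):
    return all((x, y) not in r or (y, x) in r
               for x, y in product(xs, repeat=2))


def is_antisymmetric(r, xs):
    return all(not ((x, y) in r and (y, x) in r) or x == y
               for x, y in product(xs, repeat=2))


def is_transitive(r, xs):
    return all(not ((x, y) in r and (y, z) in r) or (x, z) in r
               for x, y, z in product(xs, repeat=3))


def is_connective(r, xs):
    return all((x, y) in r or x == y or (y, x) in r
               for x, y in product(xs, repeat=2))


# The subset relation on the power set of {0, 1}, stored as a set of pairs.
power = [frozenset(c) for n in range(3) for c in combinations([0, 1], n)]
subset_rel = {(a, b) for a in power for b in power if a <= b}

print(is_reflexive(subset_rel, power), is_antisymmetric(subset_rel, power),
      is_transitive(subset_rel, power), is_connective(subset_rel, power))
```

Connectivity fails because {0} and {1} are different sets and neither is a subset of the other.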
If R is an equivalence relation on a set X, we denote the equivalence class of x with respect to R as [x]_R.

Definition of equivalence class: [x]_R := {y ∈ X | R(x, y)}

If R is an equivalence relation on X, the quotient set X/R of X modulo R is the set of equivalence classes [x]_R for all x ∈ X.

Definition of quotient set: X/R := {[x]_R | x ∈ X}
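As a small worked case, take X = {0, ..., 8} and relate two numbers when they leave the same remainder on division by 3. In the sketch below the relation is passed as a Python predicate rather than as a set of pairs, which is a simplification made for this illustration; the names are hypothetical.

```python
def equivalence_class(x, xs, r):
    """[x]_R: all y in xs that are related to x."""
    return frozenset(y for y in xs if r(x, y))


def quotient_set(xs, r):
    """X/R: the set of all equivalence classes."""
    return {equivalence_class(x, xs, r) for x in xs}


X = range(9)


def same_remainder(x, y):
    return x % 3 == y % 3


classes = quotient_set(X, same_remainder)
print(sorted(sorted(c) for c in classes))
```

The nine elements fall into three classes, and these classes are pairwise disjoint and together cover X.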
We now continue to build on the concept of relations, by categorizing them based on the properties they have. An important property of relations is the ability to compare and order elements. Suppose X and Y are sets, and R is a relation on X. We call R irreflexive if (∀x : x ∈ X : ¬R(x, x)).
Definition of (weak) partial ordering: R is a (weak) partial ordering := R is reflexive, anti-symmetric and transitive (on X)
Definition of quasi ordering: R is a quasi ordering := R is irreflexive and transitive
Definition of strict partial ordering: R is a strict partial ordering := R is irreflexive, anti-symmetric and transitive
Definition of (total or linear) ordering: R is a (total or linear) ordering := R is irreflexive, anti-symmetric, transitive and connective
Definition of well-ordering: R is a well-ordering := R is an ordering on X and each nonempty subset of X has a least element
Definition of well-foundedness: A set S is well-founded by a relation R := S is partially ordered by R and contains no infinite descending chains

A set S contains a set C that is an infinite descending chain iff C ⊆ S ∧ C ≠ ∅ ∧ C has no minimal element.
Theorem: (without proof) Any subset of a well-founded set is also well-founded.

Now that we can speak of a set of which the elements are ordered by a relation R, we define the well-known concepts of (immediate) successor and predecessor.
Definition of (immediate) predecessor: An element x_1 ∈ X is a predecessor of an element x_2 ∈ X (with respect to an ordering R on X) := R(x_1, x_2) ∧ ¬R(x_2, x_1). x_1 is an immediate predecessor of x_2 if in addition (¬∃x_3 : x_3 ∈ X ∧ x_3 ≠ x_1 ∧ x_3 ≠ x_2 : R(x_1, x_3) ∧ R(x_3, x_2))

Definition of (immediate) successor: An element x_2 ∈ X is a successor of an element x_1 ∈ X (with respect to an ordering R on X) := R(x_1, x_2) ∧ ¬R(x_2, x_1). x_2 is an immediate successor of x_1 if in addition (¬∃x_3 : x_3 ∈ X ∧ x_3 ≠ x_1 ∧ x_3 ≠ x_2 : R(x_1, x_3) ∧ R(x_3, x_2))
Note that with these definitions it can be easily proved that if a relation R on X is an ordering, then each element has at most one immediate predecessor and at most one immediate successor; the smallest element has no predecessor and the largest element has no successor. The notions of smallest and largest elements will be introduced hereafter. In the literature the immediate successor or predecessor is sometimes called just successor or predecessor. Sometimes we also see that the term ‘direct’ is used instead of ‘immediate’, or we simply speak of the ‘next’ or ‘previous’ value.
When R is a partial ordering we often denote it by the symbol ⪯, and when it is a quasi ordering by ≺. Now we can distinguish elements based on their order. Let X be a set, partially ordered by ⪯, and let Y be a subset of X.

Definition of minimal element: x is a minimal element of X := x ∈ X ∧ (¬∃y : y ∈ X ∧ y ≠ x : y ⪯ x)

Definition of maximal element: x is a maximal element of X := x ∈ X ∧ (¬∃y : y ∈ X ∧ y ≠ x : x ⪯ y)

Definition of least element: x is a least (also called smallest or first) element of X := x ∈ X ∧ (∀y : y ∈ X : x ⪯ y)

Definition of greatest element: x is a greatest (also called maximum, largest or last) element of X := x ∈ X ∧ (∀y : y ∈ X : y ⪯ x)

Definition of lowerbound: x is a lowerbound for Y in X := x ∈ X ∧ (∀y : y ∈ Y : x ⪯ y)

Definition of upperbound: x is an upperbound for Y in X := x ∈ X ∧ (∀y : y ∈ Y : y ⪯ x)
Definition of infimum: x is an infimum for Y in X := x is the greatest lowerbound for Y in X
Definition of supremum: x is a supremum for Y in X := x is the smallest upperbound for Y in X
Example: Let X = {4, 6, 12, 24, 36} and R(x, y) := x is a divisor of y. Then R is a (weak) partial ordering of X, but not a strict partial ordering or a quasi ordering (R is reflexive) and not a (total) ordering (neither R(4, 6) nor R(6, 4) holds). 4 and 6 are minimal elements of X, but X has no least element. If we regard X as a subset of N, ordered by the same relation, then 1 is a lowerbound for X, and 2 is the infimum of X.
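The divisibility example can be recomputed mechanically. The sketch below evaluates the definitions of minimal, maximal and least elements on X, and computes lower bounds and the infimum inside an ambient set {1, ..., 36}; that ambient set is an assumption of this illustration, chosen because lower bounds of X cannot exceed its elements.

```python
X = {4, 6, 12, 24, 36}


def divides(x, y):
    """R(x, y): x is a divisor of y (x is assumed to be non-zero)."""
    return y % x == 0


minimal = {x for x in X if not any(y != x and divides(y, x) for y in X)}
maximal = {x for x in X if not any(y != x and divides(x, y) for y in X)}
least = {x for x in X if all(divides(x, y) for y in X)}

# Lower bounds for X inside the assumed ambient set {1, ..., 36}.
ambient = range(1, 37)
lower_bounds = {x for x in ambient if all(divides(x, y) for y in X)}

# The infimum is the greatest lower bound with respect to divisibility.
infimum = {x for x in lower_bounds if all(divides(b, x) for b in lower_bounds)}

print(sorted(minimal), sorted(maximal), sorted(least),
      sorted(lower_bounds), sorted(infimum))
```

The lower bounds are exactly the common divisors of the elements of X, so the infimum coincides with their greatest common divisor.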
The so-called least number principle says that any non-empty subset of the natural numbers has a least element. This principle can be shown (a proof can be found in [59, page 7]) to be equivalent to the principles of weak and strong induction, that will be introduced in section 3.4.
Example: The relation < on the naturals is an example of a total ordering on N. From the so-called least number principle we can conclude that N is also well-ordered by <. We prove the latter.
Proof: We know that < is an ordering on N. We show by induction on the number of elements of A, notation | A |, that (∀A : A ⊆ N ∧ A ≠ ∅ : A has a least element). Suppose M = {0, ..., n}, n ∈ N. Let A ⊆ M. For | M | = 0 it is trivial that A is well-ordered. For | M | = n + 1, if A ∩ {0, ..., n} = ∅, n + 1 is a least element of A. If A ∩ {0, ..., n} ≠ ∅, we can apply the induction principle to conclude that A ∩ {0, ..., n} has a least element. The least element of A ∩ {0, ..., n} is also a least element of A ∩ {0, ..., n + 1}.

3.3 Functions
In mathematics, a function maps each element of an input set to exactly one element of an output set; in other words it is a special kind of relation in which each x ∈ X occurs in exactly one pair ⟨x, y⟩.

Definition of function: f is a function from a set X to a set Y, notation f : X → Y := f ⊆ X × Y ∧ (∀x : x ∈ X : (∃!y : y ∈ Y : ⟨x, y⟩ ∈ f)). We write f(x) = y for ⟨x, y⟩ ∈ f.
The definitions of domain and range as given in the subsection about relations can now also be used for functions. We now introduce a notation for the set of all functions f : X → Y .
Definition of Y^X: Y^X := {f ∈ P(X × Y) | f is a function from X to Y}
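For finite sets, both the definition of a function as a set of pairs and the set of all functions from X to Y can be made concrete. The sketch below is illustrative only: the names `is_function` and `function_space` and the sample sets are assumptions, not notation from the report.

```python
from itertools import product

def is_function(f, X, Y):
    """True if the relation f (a set of pairs) is a function from X to Y."""
    if not all(x in X and y in Y for (x, y) in f):
        return False  # f must be a subset of X × Y
    # Every x in X must occur as first component of exactly one pair.
    return all(sum(1 for (a, _) in f if a == x) == 1 for x in X)

def function_space(X, Y):
    """All functions from X to Y, each represented as a frozenset of pairs."""
    xs = sorted(X)
    return {frozenset(zip(xs, images))
            for images in product(sorted(Y), repeat=len(xs))}

X, Y = {1, 2}, {"a", "b", "c"}
f = {(1, "a"), (2, "a")}             # a function
g = {(1, "a"), (1, "b"), (2, "c")}   # 1 has two images: not a function
h = {(1, "a")}                       # 2 has no image: not a function

print(is_function(f, X, Y), is_function(g, X, Y), is_function(h, X, Y))
print(len(function_space(X, Y)))
```

For finite X and Y the function space has |Y| to the power |X| elements, which is one way to motivate the exponent-style notation.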
As we did before for relations and operations, we now define some general properties for functions.
Definition of injective: f : X → Y is injective or an injection := (∀x1, x2 : x1, x2 ∈ X : x1 ≠ x2 → f(x1) ≠ f(x2))
Definition of surjective: f : X → Y is surjective or a surjection := (∀y : y ∈ Y : (∃x : x ∈ X : y = f(x)))
Definition of bijective: f : X → Y is bijective or a bijection := f is surjective and f is injective
If f is bijective, f is also called a (one-to-one) correspondence between X and Y .
Example: We have the following property: f : X → Y is surjective ↔ Ran(f) = Y.
Example: f : N → [−2π, 2π], with f(x) = sin(x) is a function and a relation. g : [−2π, 2π] → N, with g(x) = y iff x = sin(y) is a relation, not a function.
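On finite sets the three properties can be tested directly. A minimal sketch, assuming a function is represented as a Python dict from arguments to images (the helper names and the two sample functions are hypothetical):

```python
def is_injective(f):
    """Distinct arguments have distinct images."""
    return len(set(f.values())) == len(f)

def is_surjective(f, Y):
    """Every element of Y is an image, i.e. Ran(f) = Y."""
    return set(f.values()) == set(Y)

def is_bijective(f, Y):
    return is_injective(f) and is_surjective(f, Y)

double = {n: 2 * n for n in range(5)}   # domain {0, ..., 4}
parity = {n: n % 2 for n in range(5)}   # domain {0, ..., 4}

print(is_bijective(double, {0, 2, 4, 6, 8}))   # bijection onto the even numbers up to 8
print(is_surjective(double, range(9)))         # odd numbers are never hit
print(is_injective(parity), is_surjective(parity, {0, 1}))
```

The second call illustrates that surjectivity depends on the chosen codomain Y, not only on the set of pairs.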
We will now consider two special kinds of functions: the identity function and the sequence.
Definition of sequence: s is a sequence of X := s is a function from N to X (i.e. s ∈ X^N)
Definition of identity function: The identity function id_X := id_X : X → X and (∀x : x ∈ X : id_X(x) = x)
We now introduce some operations on functions in set-theory. We can easily check that these definitions correspond to mathematical operations.
Definition of composition: The composition g ◦ f of two functions f : A → B and g : B → C := the function g ◦ f : A → C with (g ◦ f)(x) = g(f(x)), for all x ∈ A
Definition of inverse function: The inverse of a bijection f : X → Y := the function f^{-1} : Y → X with (∀x, y : x ∈ X ∧ y ∈ Y : f^{-1}(y) = x ↔ y = f(x))
Definition of restricted function: The restriction of a function f : X → Y to X0, with X0 ⊆ X := the function f|X0 : X0 → Y with (∀x : x ∈ X0 : f|X0(x) = f(x))
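The three operations just defined have direct counterparts for finite functions. The following sketch again assumes functions are stored as dicts; `compose`, `inverse` and `restrict` are illustrative names.

```python
def compose(g, f):
    """g ∘ f: apply f first, then g."""
    return {x: g[f[x]] for x in f}

def inverse(f):
    """Inverse of an injective f, obtained by swapping the components of each pair."""
    inv = {y: x for x, y in f.items()}
    if len(inv) != len(f):
        raise ValueError("f is not injective, so it has no inverse function")
    return inv

def restrict(f, X0):
    """Restriction of f to a subset X0 of its domain."""
    return {x: f[x] for x in X0}

f = {1: "a", 2: "b", 3: "c"}
g = {"a": 10, "b": 20, "c": 30}

print(compose(g, f))            # maps 1, 2, 3 to 10, 20, 30
print(compose(inverse(f), f))   # the identity function on {1, 2, 3}
print(restrict(f, {1, 3}))
```

Composing a bijection with its inverse gives the identity function on the domain, which is a quick sanity check of the definition of the inverse.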
Just as in algebra, we can now combine a set and relations on that set into a structure.
Definition of (relational) structure: ⟨X, R0,...,Rp⟩ is a (relational) structure := X is a set and R0,...,Rp are relations on X
The concept of a structure enables us to abstract from the exact set and relations, and reason about sets of structures instead. There also is a useful definition for equivalence of structures, called isomorphism.
Let R = ⟨X, R0,...,Rp⟩ and S = ⟨Y, S0,...,Sp⟩ be two structures, such that (∀i : 0 ≤ i ≤ p : the arity of Ri and Si is n_i + 1).
Definition of isomorphism: f is an isomorphism between R and S := f is a bijection from X to Y and (∀i : 0 ≤ i ≤ p : (∀x0,...,x_{n_i} : x0,...,x_{n_i} ∈ X : Ri(x0,...,x_{n_i}) ↔ Si(f(x0),...,f(x_{n_i}))))
With the notion of isomorphism, we can now abstract over structures. When two structures are similar (the sets are of the same size and the relationships between the elements in one structure are retained between images of those elements in the other structure), we call them isomorphic.
Definition of isomorphic: Two structures R and S are isomorphic, notation R ≅ S := there exists an isomorphism from R to S
Definition of automorphism: f is an automorphism of R := f is an isomorphism from R to R
Example: An isomorphism from structure ⟨N, <⟩ to ⟨N_even, <⟩ is given by f : N → N_even, with f(n) = 2n. f is not an isomorphism from ⟨N, ⊕⟩ to ⟨N, <⟩, with a ⊕ b := b divides a.
Example: The function g : R+ → R with g(x) = log(x) is an isomorphism between ⟨R+, ∗⟩ and ⟨R, +⟩, because g is a bijection and for all r1, r2 ∈ R+, log(r1 ∗ r2) = log(r1) + log(r2).
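The relation-preservation part of the isomorphism condition can be spot-checked on finite samples. The sketch below does not establish bijectivity and does not replace a proof; the helper `preserves`, the sample ranges and the tolerance are assumptions made for illustration.

```python
import math
from itertools import product

def preserves(rel_r, rel_s, f, sample):
    """Check rel_r(a, b) <-> rel_s(f(a), f(b)) for all pairs from a finite sample."""
    return all(rel_r(a, b) == rel_s(f(a), f(b))
               for a, b in product(sample, repeat=2))

def less(a, b):
    return a < b

def divided_by(a, b):
    """a ⊕ b: b divides a (only used for b >= 1 here)."""
    return a % b == 0

def double(n):
    return 2 * n

sample = range(1, 20)
double_keeps_order = preserves(less, less, double, sample)
double_maps_divides_to_less = preserves(divided_by, less, double, sample)

def log_turns_product_into_sum(r1, r2):
    """Floating-point check of log(r1 * r2) = log(r1) + log(r2)."""
    return math.isclose(math.log(r1 * r2), math.log(r1) + math.log(r2), abs_tol=1e-12)

positives = [0.5, 1.0, 2.0, 10.0]
log_ok = all(log_turns_product_into_sum(a, b) for a, b in product(positives, repeat=2))

print(double_keeps_order, double_maps_divides_to_less, log_ok)
```

A single failing pair is enough to refute an isomorphism claim, whereas passing on a sample is only supporting evidence.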
Example: An automorphism of ⟨A, R0,...,Rp⟩ is the identity function id_A : A → A, so id_A = {⟨a, a⟩ | a ∈ A}. Also, the function f(x) = 2x^3 is an automorphism of ⟨R, <⟩.

3.4 Induction Methods
There is a tradition of opposition between adherents of induction and deduction. In my view it would be just as sensible for the two ends of a worm to quarrel.
- A. Whitehead, quoted in [76]
3.4.1 Induction
Induction is a method of reasoning from a part to a whole, from particulars to generals, or from the individual to the universal. It should not be confused with the mathematical principle of induction (treated in section 3.4.3). In ordinary induction we examine a certain number of cases and then generalize. Reasoning by analogy, where a conclusion is drawn from an analogous situation, is also a primitive form of induction (see [23, page 6]).
Example of inductive reasoning²:
Coffee shop burger no. 1 was greasy
Coffee shop burger no. 2 was greasy
...
Coffee shop burger no. 100 was greasy
Therefore, all coffee shop burgers are greasy (or: the next coffee shop burger will be greasy).
So in induction the conclusion contains information that was not contained in the premisses. This is the source of uncertainty in inductions: inductions are strengthened as confirming instances pile up, but they can never bring certainty (unless every possible case is actually examined, in which case they become deductions). As said in [49, page 366], the broad difference between deductive and inductive reasoning is that in deduction the conclusion asserts less than the premisses, whereas in induction it asserts more. In chapter 14, section 3 of [49] there is a more detailed treatment of inductive reasoning, including a distinction between determinative and conceptual induction. In both these kinds of induction, the conclusion goes beyond the premisses (or the evidence).
² Example from: Peter Suber, Philosophy department, Earlham College.
3.4.2 Deduction
Mathematics, in its widest significance, is the development of all types of formal, necessary, deductive reasoning.
- A. Whitehead, quoted in [100]
In contrast to induction, deduction is a method of reasoning that is based on a rigorous proof: a derivation (using fixed rules called a system of logic) of one statement (the conclusion) from one or more statements (the premisses), i.e. a chain of statements, each of which is either a premise or a consequence of a statement occurring earlier in the proof. In deductive reasoning, we are not directly concerned with the truth of the conclusion but rather with whether the conclusion does or does not follow from the premisses. If the conclusion follows from the premisses, we say that our reasoning is valid; if it does not, we say that our reasoning is invalid.
The Greeks considered deductive reasoning, not empirical procedures, the method to establish mathematical facts. This usage is a generalization of what the Greek philosopher Aristotle called the syllogism (see [49, chapter 1, sections 5 and 6]), but a syllogism is now recognized as merely a special case of a deduction. Also, the traditional view that deduction proceeds from the general to the specific has been abandoned as incorrect by most logicians. Some experts regard all valid inferences as deductive in form and for this and other reasons reject the supposed contrast between deduction and induction. The German mathematician Hilbert greatly contributed to deductive reasoning, as we will see when we introduce his proof theory (also known as the axiomatic method) in chapter 6. Logic, in a mathematical context, can be seen as the theory of the formal structure of deductive reasoning. The logic of Hilbert’s metamathematics (see section 6.1) and that of Russell’s Principia Mathematica (see section 7.1) are forms of reasoning with deductive certainty, although others have proposed different formalizations of deductive logic (see [49, page 121]). Originally based on Aristotle’s logic, the deductive argument has become more subtle and complex and is now based on modern symbolic logic.
3.4.3 The principle of induction
Informal
The principle of induction, also known as mathematical induction, is an important process for proving theorems. It was even used by Peano to define the concept of natural numbers (see section 4.1, axiom 3). ‘Mathematical induction’ is unfortunately named, for it is unambiguously a form of deduction. The name was probably inspired by the fact that, just like induction, it generalizes to a whole set from a smaller sample. But, as we will see, mathematical induction concludes with deductive certainty.
The informal structure of the proof of a theorem by mathematical induction is fairly simple:
1) Basis. Prove that the theorem holds for a specific case (which often is minimal for a given ordering of the elements). This case is also called the base case.
2) Induction step. Prove a rule that says that if the theorem holds for an arbitrary element, it also holds for the next case. This often is a rule of heredity that tells us that the theorem is true for the immediate successor case of an arbitrary element if it is true for the arbitrary element itself. The claim that the theorem is true for an arbitrary element is called the induction hypothesis.
3) Conclusion. Together, 1 and 2 imply that the theorem holds for all cases starting with the base case. If you didn’t use the minimal case in step 1, then you have proven only that the theorem holds for that case and its successors, not for all possible cases.
The induction step can take two forms which correspond to two forms of mathematical induction. Again we assume there is an ordering of the elements with +1 the immediate successor relation.
Weak: prove that if the theorem holds for an arbitrary element n, then it holds for the element n + 1
Strong: prove that if the theorem holds for all elements up to some arbitrary element n, then it holds for the element n + 1
We will now formally state the principle of induction. This is important, since many mistakes are being made in applying the principle. It does not go without saying that if we are to use mathematical induction to prove that some theorem applies to ‘all possible cases’, then those cases must somehow be enumerable and in some way linked to the integers. And we have to be able to speak about the minimal case, the nth case, the successor of a given case, etc.
Formal
Suppose that we want to prove that a property ϕ(s) holds for all s ∈ S. The induction principle assumes that S is a well-founded set and every element except for the smallest has an immediate predecessor. This condition is also expressed by saying that S is inductive. The structure of an inductive set in fact resembles that of the naturals, i.e. if we have the axioms (see Peano axioms in section 4.1) 0 is in N and if x is in N then x + 1 is in N, the set N is inductive. In case the set S is the naturals, we also refer to the principle as natural induction. The principle presupposes the following two conditions:
(A) S is a set, well-founded by relation R (such that ‘+’ denotes the immediate successor of an element with respect to the relation R) and with smallest element e
(B) Every element except e has a (unique) immediate predecessor and ϕ is a property of elements of S
If (A) and (B) hold, we can use the induction principle.
Definition of the (weak) (mathematical) induction principle: if
(C) ϕ(e) (i.e. e has a property ϕ)
(D) (∀s : s ∈ S : ϕ(s) → ϕ(s⁺)) (i.e. if s ∈ S has property ϕ, then the (unique) immediate successor of s also has property ϕ)
then the property ϕ holds for every element in S
Step (C) is also called the base of a proof by induction, step (D) is also called the induction step, and ϕ(s) is called the induction hypothesis.
Proof: Suppose S is a well-founded set and every element except the smallest, denoted e, has an immediate predecessor, and suppose that a property ϕ is true for e, as well as for the immediate successor s⁺ ∈ S if it is true for s ∈ S. We now prove by contradiction that ϕ holds for all s ∈ S. Suppose that ϕ is not true for all s ∈ S. Let N be the set of elements of S for which ϕ is not true, i.e. N = {s ∈ S | ¬ϕ(s)}. By the theorem of page 26 we also know that if S is well-founded, any subset of S is also well-founded, thus N contains a smallest element n. If n = e, we have a contradiction. If n > e, n has an immediate predecessor, denoted n⁻. Since n is the smallest element for which ϕ doesn’t hold, ϕ must hold for n⁻. But then by (D), ϕ must also hold for the immediate successor of n⁻, that is n: contradiction. Thus ϕ must be true for all s ∈ S.
As we mentioned before, this principle can be generalized in several ways. One way is to prove in step (C) that ϕ holds for a (possibly non-minimal) case b ∈ S. In step (D) we then show that (∀s : s ∈ S ∧ s ≥ b : ϕ(s) → ϕ(s⁺)). The conclusion then is that the property ϕ holds for all elements in S that are greater than or equal to b.
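As an illustration of induction from a non-minimal base case, consider the property ϕ(s) := 2^s < s! on the naturals. This example is not taken from the report; it is a standard exercise chosen here because ϕ fails for s = 0, 1, 2, 3 and holds from b = 4 onwards. The base step and the induction step can be written out as:

```latex
\begin{align*}
\text{Base } (b = 4):\quad & 2^{4} = 16 < 24 = 4! \\
\text{Step } (s \geq 4):\quad & 2^{s+1} = 2 \cdot 2^{s} < 2 \cdot s! \leq (s+1) \cdot s! = (s+1)!
\end{align*}
```

The strict inequality in the step uses the induction hypothesis 2^s < s!, and the following inequality uses 2 ≤ s + 1. The conclusion is only that ϕ holds for all s ≥ 4, in line with the generalized principle.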
We now show (with proof by contradiction) why the additional property (B), that every element except the smallest must have an immediate predecessor, is necessary for the induction principle. Consider the natural numbers with the ordering defined as follows:
• if n and m are both even, then n ≺ m if n < m
• if n and m are both odd, then n ≺ m if n < m
• if n is even and m is odd, we always define n ≺ m

We can check that N is well-founded by ≺, but not every element (for example 1) has an immediate predecessor. We take for ϕ the property of being even. The smallest element in the ordering is 0, which is even. Also, if s has property ϕ then so does the successor of s. That is because in our ordering, the successor of an even number is always the next even number, never an odd number, and if s has property ϕ, then s must be even. Therefore (with only conditions (A), (C) and (D) holding) every natural number is even: contradiction!

There is however a weaker principle, called transfinite induction, which - suitably stated - does apply to every well-ordered set. But first we regard a stronger principle, that is based on the same assumptions ((A) and (B)) as the weak induction principle.

Principle of strong (mathematical) induction: The same as for (weak) induction, but instead of (C) and (D) with
(E) (∀x : x ∈ S : (∀y : y ∈ S : R(y, x) → ϕ(y)) → ϕ(x)) (i.e. for all x ∈ S we have ϕ(x) if all R-predecessors y of x have property ϕ)
Sometimes this is also informally stated using the infamous three dots as (∀s : s ∈ S : (ϕ(e) ∧ ϕ(e⁺) ∧ ... ∧ ϕ(s)) → ϕ(s⁺)).

Proof: Suppose ⟨X, R⟩ is a structure such that (A), (B) and (E) hold. Again we use proof by contradiction, and assume (∃x : x ∈ X : ¬ϕ(x)). Thus {x ∈ X | ¬ϕ(x)} is non-empty and has a smallest element e (since ⟨X, R⟩ is well-founded). We now have ¬ϕ(e) ∧ (∀z : z ∈ X : R(z, e) → ϕ(z)). According to (E) (substitute z for y, X for S, and take e for x) we then have ϕ(e): contradiction. Note that the base case is not really left out, since it is implicitly present in the quantification (take e for x). This form of induction, when applied to ordinals (ordinals form a well-ordered and hence well-founded set and are introduced in section 3.8.2) is called transfinite induction.
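Returning to the counterexample ordering above, in which every even number precedes every odd number, the failure of weak induction can be observed on a finite sample. This is a sketch under assumptions: the sort key, the helper names and the sample range 0..19 are illustrative, and a finite sample can only illustrate the argument, not prove it.

```python
def key(n):
    """Sort key for the modified ordering: all even numbers first, then all odd numbers."""
    return (n % 2, n)

def precedes(n, m):
    """n strictly precedes m in the modified ordering."""
    return key(n) < key(m)

def successor(n):
    """Immediate successor in the modified ordering: the next number of the same parity."""
    return n + 2

def is_even(n):
    return n % 2 == 0

sample = range(0, 20)
ordered = sorted(sample, key=key)

# Base: the smallest element of the ordering (0) is even.
base_ok = is_even(min(sample, key=key))

# Step: if s is even then its successor is even (vacuously true for odd s).
step_ok = all((not is_even(s)) or is_even(successor(s)) for s in sample)

# The predecessor condition fails: no element has 1 as its immediate successor.
predecessors_of_1 = [n for n in sample if successor(n) == 1]

print(ordered)
print(base_ok, step_ok, predecessors_of_1, is_even(1))
```

Base and step both hold, yet the property fails for 1, which is exactly the element without an immediate predecessor.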
Principle of transfinite induction³: The same as for strong induction, but instead of (A) and (B) as assumptions, it can be applied to any set S that is well-ordered by a relation R, and with smallest element e.

³ Sometimes this principle is called the Principle of Complete Induction, for example in [4], but this is less common.

Examples of such sets are the ordinals or cardinals, or even the class of all ordinals. A proof by transfinite induction typically needs to distinguish three cases:
1. s is a minimal element
2. s has an immediate predecessor (i.e. the set of elements which are smaller than s has a largest element). In this case we can apply normal induction.
3. s has no immediate predecessor (i.e. s is a so-called limit-ordinal, see also section 3.8.2)
The case for limit ordinals is typically approached by noting that a limit ordinal b is (by definition) the union of all ordinals a < b.

Proof: The proof of the principle of transfinite induction is similar to the proof of the strong induction principle.

Clearly, all three given principles are equivalent, since we proved them to be true. These proofs however are based on an underlying set of axioms (the so-called ZF axioms and the Peano axioms, that will be introduced in section 5.3 and chapter 4 respectively). Without these conditions (to be exact, without Peano’s induction axiom), we cannot directly prove the principles to be true from the ZF axioms alone⁴. In that case we can prove the equivalence of the principles by showing that they imply each other. As an example, we now prove that (mathematical) induction is a special case of transfinite induction, for the set of natural numbers. To prove this it suffices to show that ((C) and (D)) ↔ (E).

⁴ With only the fundamental axioms of Zermelo-Fraenkel set theory, it is not possible to prove mathematical induction. An extra axiom is needed, the infamous Axiom of Choice, or one of its equivalent forms.
The four statements known as ‘Axiom of Choice’, ‘Zorn’s Lemma’, ‘Well-Ordering principle’ (also known as well-ordering theorem, see section 3.8.2) and ‘Mathematical Induction Principle’ are all equivalent, meaning that if you assume one of them to be true, the others follow as consequences, but none of them can be proven from the other fundamental axioms in ZF set theory alone. There are also other equivalent statements that are sometimes used (such as Zermelo’s postulate), and it is a nice exercise to prove the equivalence of these statements.

Normal induction (IND): (∀ϕ :: ϕ(0) ∧ (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)) → (∀n : n ∈ N : ϕ(n)))
Transfinite induction (TFIND): (∀ψ :: (∀q : q ∈ N : (∀p : p ∈ N : p < q → ψ(p)) → ψ(q)) → (∀n : n ∈ N : ψ(n)))

We can prove the equivalence of IND and TFIND in two ways: in a constructive way or with a proof by contradiction. We give both proofs.

Proof by Contradiction: (from: [17]) It suffices to prove that IND’ ≡ TFIND’, with
IND’ ≡ (∀ϕ :: ϕ(0) ∧ (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)))
TFIND’ ≡ (∀ψ :: (∀q : q ∈ N : (∀p : p ∈ N : p < q → ψ(p)) → ψ(q)))

Proof of TFIND’ → IND’: Assume ϕ is a property. We assume TFIND’, and instantiate ψ with the property ϕ. We now want to prove IND’. If we take q = 0, (∀p : p ∈ N : p < 0 → ϕ(p)) is trivially true. Thus we have ϕ(0). We now prove by contradiction that (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)). Assume k ∈ N, ϕ(k) ∧ ¬ϕ(k + 1). That means the condition of TFIND’ (∀p : p ∈ N : p [...]

Proof of IND’ → TFIND’: Assume IND’, instantiate ϕ with ψ. For all properties ψ we have to prove (∀q : q ∈ N : (∀p : p ∈ N : p < q → ψ(p)) → ψ(q)). [...] 0. Suppose we have (∀q : q ∈ N : (∀p : p ∈ N : p < q → ψ(p))). By IND’ we also know that (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)), and thus ϕ(q) also holds for all q > 0. Hereby we have proved TFIND’.

Constructive Proof:
Proof of TFIND → IND: Assume TFIND, and let ϕ be a property. We now need to prove that ϕ(0) ∧ (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)) → (∀n : n ∈ N : ϕ(n)). Assume ϕ(0) ∧ (∀k : k ∈ N : ϕ(k) → ϕ(k + 1)). We want to use TFIND to conclude (∀n : n ∈ N : ϕ(n)). TFIND gives us: (∀k : k ∈ N : (∀l : l ∈ N : l < k → ϕ(l)) → ϕ(k)) → (∀n : n ∈ N : ϕ(n)). [...]

Proof of IND → TFIND: Assume ψ is a property. Also assume that (i): (∀k : k ∈ N : (∀l : l ∈ N : l < k → ψ(l)) → ψ(k)). [...]

Structural Induction
In many cases we do not want to prove properties about the integers or similar well-ordered sets. In such cases straight induction is not always useful. However, forms of induction can also be appropriate when trying to prove properties about structures defined recursively. This generalized induction principle is known as structural induction. It is useful when objects are built up from more primitive objects: if we can show the primitive objects have the desired property, and that the act of building preserves that property, then we have shown that all objects must have the property. The inductive hypothesis (i.e., the assumption) is to assume that something is true for ‘simpler’ forms of an object and then prove that it holds for ‘more complex’ forms. ‘Complexity’ can be defined in several ways: the most common way is to say that one object is more complex than another if it includes that other object as a subpart, but this need not always be the case. A general treatment of recursively defined structures (formal definition of structural induction over recursive datatypes) will be presented in a later version of this report.

Example: We show that mathematical induction is an instance of the general notion of structural induction over values of recursively defined types, in a later version of this report.

Example: As an example of the use of mathematical induction we prove the binomial theorem. The binomial theorem states that for all x, y ∈ R, and n ∈ N we have

EQ ≡ (x + y)^n = \sum_{j=0}^{n} \binom{n}{j} x^{n-j} y^{j}

We call the left-hand side of this equality LHS, and the right-hand side RHS, and abbreviate the equality by EQ. We assume two real numbers x and y and prove EQ by induction on n.

Basis case: For n = 0 the EQ clearly is correct, since both sides are 1. For some reason, most textbooks take n = 1 as the basis, in which case LHS is simply x + y, and RHS is

\binom{1}{0} x^{1-0} y^{0} + \binom{1}{1} x^{1-1} y^{1} = x + y

Induction case: We assume EQ is true for n = k and have to show that it is then also true for n = k + 1:

(x + y)^{k+1} = \sum_{j=0}^{k+1} \binom{k+1}{j} x^{k+1-j} y^{j}

First, we rewrite the left side of this equation:

LHS = (x + y)^{k+1} = (x + y)^{k} (x + y) =
(here in fact we are using the induction hypothesis)
\left( \sum_{j=0}^{k} \binom{k}{j} x^{k-j} y^{j} \right) (x + y) =
\sum_{j=0}^{k} \binom{k}{j} x^{k-j+1} y^{j} + \sum_{j=0}^{k} \binom{k}{j} x^{k-j} y^{j+1}

In rewriting the right side of the equation, we use Pascal’s identity:

(∀k, n : k, n ∈ N ∧ 0 < k ≤ n : \binom{n+1}{k} = \binom{n}{k-1} + \binom{n}{k})