
CS 226 Course Notes, Spring 2003

Anton Setzer∗ January 4, 2004

1 Introduction

1.1 The Topic of Computability Theory

A computable function is a function f : A → B such that there is a mechanical procedure for computing, for every a ∈ A, the result f(a) ∈ B. Computability theory is the study of computable functions. In computability theory we study the limits of this notion.

1.2 Examples

(a) exp : N → N, exp(n) := 2^n. (N = {0, 1, 2, . . .}.) It is trivial to write a program for computing exp in almost any programming language, so exp is computable. However: can we really compute, say, exp(10000000000000000000000000000000000)?

(b) Let String be the set of strings of ASCII symbols. Define a function check : String → {0, 1} by

    check(p) := 1    if p is a syntactically correct Pascal program,
                0    otherwise.
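An analogue of check can be written down for Python source instead of Pascal (a sketch; Python's built-in compile plays the role of the Pascal syntax checker and parses without running the program):

```python
def check(p):
    """Return 1 if the string p is a syntactically correct Python
    program, 0 otherwise.  compile() only parses p; it never runs it."""
    try:
        compile(p, "<string>", "exec")
        return 1
    except SyntaxError:
        return 0

print(check("x = 1 + 2"))   # 1
print(check("x = = 1"))     # 0
```

The existence of such a program is exactly what makes check computable in the intuitive sense.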

Clearly, the function check is computable (e.g. by the syntax checker, which is part of the Pascal compiler), provided one refers to a precisely defined notion of what a Pascal program is.

(c) Define a function terminate : String → {0, 1},

    terminate(p) := 1    if p is a syntactically correct Pascal program without interaction with the user, which terminates,
                    0    otherwise.

Is terminate computable? We will later see that terminate is an instance of the Turing Halting Problem, and therefore non-computable.

∗email [email protected], www http://www.cs.swan.ac.uk/∼csetzer/index.html, tel. 01792 51-3368, room 226. These notes are based on lecture notes by U. Berger.


(d) Define a function issortingfun : String → {0, 1},

    issortingfun(p) := 1    if p is a syntactically correct Pascal program, which has as input a list and returns a sorted list,
                       0    otherwise.

So issortingfun checks whether its input is a sorting function. Of course, it has to be specified precisely what it means for a program to take a list as input and return it (e.g. input from console and output on console).

If we could decide this problem, we could decide the previous one: take a Pascal program for which we have to check whether it terminates. Create a new Pascal program, which takes as input a list, then runs as the original one, until that one terminates. If it terminates, the new program takes the original list, sorts it and returns it. Otherwise the new program never terminates. The new program is a sorting function if and only if the original program terminated. Since the problem whether a program terminates is undecidable, the problem of verifying whether we have a sorting function is undecidable as well.
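The reduction just described can be sketched in code, if we simplify by modelling programs as parameterless Python callables rather than Pascal program texts (an assumption made only for illustration; the real construction transforms source code):

```python
def make_candidate_sorter(p):
    """Given a program p (modelled as a parameterless callable),
    build the new program: first run p to completion, then sort the
    input list.  The result is a genuine sorting function if and
    only if p terminates."""
    def candidate(lst):
        p()                 # loops forever if p does not terminate
        return sorted(lst)  # reached only if p terminated
    return candidate

# For a terminating p the construction yields a sorting function:
f = make_candidate_sorter(lambda: None)
print(f([3, 1, 2]))  # [1, 2, 3]
```

A decision procedure for issortingfun, applied to the constructed program, would therefore decide whether p terminates.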

1.3 Problems in Computability

In order to properly understand and answer the questions raised in the previous examples, we have to:

• First give a precise definition of what “computable” means.
  – That will be a mathematical definition.
    ∗ For showing that a function is computable, it suffices to show how it can be computed by a computer. For this an intuitive understanding of “computable” suffices.
    ∗ In order to show that a function is non-computable, however, we need to show that it cannot be computed in principle. For this we need a very precise notion of “computable”.
• Then provide evidence that our definition of “computable” is the correct one.
  – That will be a philosophical task.
• Develop methods for proving that certain functions are computable and certain functions are non-computable.

Questions related to the above are the following:

• Given that a function f : A → B can be computed, can it be done efficiently? (Complexity theory.)
• Can the task of deciding a given problem P1 be reduced to deciding another problem P2? (Reducibility theory.)

Other interesting questions, which are beyond the scope of this lecture, are:

• Can the notion of computability be extended to computations on infinite objects (like e.g. streams of data, real numbers, higher type operations)? (Higher and abstract computability theory.)
• What is the relationship between computing (producing actions, data etc.) and proving?

Computability theory involves three areas:

• Mathematics.
  – To give a precise definition of computability and analyse this concept.
• Philosophy.
  – To verify that the notions found, like “computable”, are the correct ones.
• Computer science.
  – To investigate the relationship between these theoretical concepts and computing in the real world.

Remark: In computability theory, one usually abstracts from limitations on time and space. A problem will be computable if it can be solved on an idealised computer, even if it would take longer than the lifetime of the universe to actually carry out the computation.

1.4 Remarks on the History of Computability Theory

• Leibniz (1670) built one of the first mechanical calculators. He was thinking about building a machine that could manipulate symbols in order to determine the truth values of mathematical statements. He noticed that a first step would be to introduce a precise formal language, and he was working on defining such a language.

Gottfried Wilhelm Leibniz (1646 – 1716)

• Hilbert (1900) poses in his famous list “Mathematical Problems” as the 10th problem the decidability of Diophantine equations.

David Hilbert (1862 – 1943)

• Hilbert (1928) poses the “Entscheidungsproblem” (decision problem).
  – He has (already in 1900) developed a theory for formalising mathematical proofs, and believes that it is complete and sound, i.e. that it proves exactly all true mathematical formulae.
  – Hilbert asks whether there is an algorithm which decides whether a mathematical formula is a consequence of his theory. Assuming that his theory is complete and sound, such an algorithm would decide the truth of all mathematical formulae expressible in the language of his theory.
  – The question whether there is an algorithm for deciding the truth of mathematical formulae is later called the “Entscheidungsproblem”.

• Gödel, Kleene, Post, Turing (1930s) introduce different models of computation and prove that they all define the same class of computable functions.

Kurt Gödel (1906 – 1978)

Stephen Cole Kleene (1909 – 1994)

Emil Post (1897 – 1954)

Alan Mathison Turing (1912 – 1954)

• Gödel (1931) proves his incompleteness theorem: any reasonable recursive theory is incomplete, i.e. there is a formula such that neither it nor its negation is provable. By the Church–Turing thesis to be established later (see below), it will follow that the recursive functions are the computable ones. Therefore no such theory proves all true formulae (true in the sense of being true in a certain model, which could be the standard structure assumed in mathematics). Therefore the “Entscheidungsproblem” is unsolvable: an algorithm for deciding the truth of mathematical formulae would give rise to a complete and sound theory fulfilling Gödel’s conditions.

• Church, Turing (1936) postulate that the models of computation established above define exactly the set of all computable functions (Church–Turing thesis).
• Both establish undecidable problems and conclude that the Entscheidungsproblem is unsolvable, even for a class of very simple formulae.
  – Church shows the undecidability of equality in the λ-calculus.
  – Turing shows the unsolvability of the halting problem. That problem turns out to be the most important undecidable problem.

Alonzo Church (1903 – 1995)

• Post (1944) studies degrees of unsolvability. This is the birth of degree theory.
• Matiyasevich (1970) solves Hilbert’s 10th problem negatively: the solvability of Diophantine equations is undecidable.

Yuri Vladimirovich Matiyasevich (∗ 1947)

• Cook (1971) introduces the complexity classes P and NP and formulates the problem whether P ≠ NP.

Stephen Cook (Toronto)

• Today:
  – The problem P ≠ NP is still open. Complexity theory has become a big research area.
  – Intensive study of computability on infinite objects (e.g. real numbers, higher type functionals) is carried out (e.g. Dr. Berger in Swansea).
  – Computability on inductive and co-inductive data types is studied.
  – Research on program synthesis from formal proofs is carried out (e.g. Dr. Berger in Swansea).
  – Concurrent and game-theoretic models of computation are developed (e.g. Prof. Moller in Swansea).
  – Automata theory is developed further.
  – Alternative models of computation are studied (quantum computing, genetic algorithms).
  – · · ·

• Remarks on the name “computability theory”:
  – The original name was recursion theory, since the mathematical concept claimed to cover exactly the computable functions is called “recursive function”.
  – This name was changed to computability theory during the last 10 years.
  – Many books still have the title “recursion theory”.

1.5 Administrative Issues

Lecturer: Dr. A. Setzer
Dept. of Computer Science, University of Wales Swansea, Singleton Park, SA2 8PP, UK
Room: Room 211, Faraday Building
Tel.: (01792) 513368
Fax: (01792) 295651
Email: [email protected]
Home page: http://www.cs.swan.ac.uk/∼csetzer/

Assessment:

• 80% Exam.
• 20% Coursework.

The course home page is located at http://www.cs.swan.ac.uk/∼csetzer/lectures/computability/03/index.html. There is an open version of the slides and a, for copyright reasons, password-protected version. The password is .

1.6 Plan for this Module

(Might be changed, since the lecturer is teaching this module for the first time.)

1. Introduction.
   • This section.
2. Encoding of data types into N.
   • We show how to encode elements of some data types as elements of N, the set of natural numbers.
   • This shows that by studying computability on natural numbers we treat computability on many other data types as well.
   • We also discuss the notions of countable vs. uncountable sets.
3. The Unlimited Register Machine (URM) and the halting problem.
   • We study one model of computation, the URM. It is one of the easiest to work with.
   • We show that the halting problem is undecidable.
4. Turing machines.
   • Turing machines as a more intuitive model of computation.
5. Algebraic view of computability.
   • We study the notion of primitive recursive functions.
   • We introduce the concept of a partial recursive function, a third model of computation.

6. Lambda-definable functions.
   • We study a fourth model of computation.
   • We show that the lambda-definable functions and the partial recursive functions form the same class.
7. Equivalence and the Church-Turing thesis.
   • We show that the four notions of computation above coincide.
   • We discuss the Church-Turing thesis, namely that the functions computable in an intuitive sense are exactly those which can be computed by one of the equivalent models of computation.
8. Enumeration of the computable functions.
   • We show how to enumerate in a computable way all computable functions.
9. Recursively enumerable predicates and the arithmetic hierarchy.
   • We study a notion which covers relations which are undecidable, but for which something can still be computed: we can in a computable way enumerate the elements fulfilling them.
   • We study the arithmetic hierarchy.
10. Reducibility.
   • We study the concept of reducibility.
   • We show Rice’s theorem, which allows us to show the undecidability of many sets.
11. Computational complexity.

   • We study basic complexity theory.

1.7 Aims of this Module

• To become familiar with fundamental models of computation and the relationship between them.
• To develop an appreciation for the limits of computation and to learn techniques for recognising unsolvable or unfeasible computational problems.

• To understand the historic and philosophical background of computability theory.
• To be aware of the impact of the fundamental results of computability theory on areas of computer science such as software engineering and artificial intelligence.
• To understand the close connection between computability theory and logic.
• To be aware of recent concepts and advances in computability theory.
• To learn fundamental proof techniques like induction and diagonalisation.

1.8 Literature

• N. J. Cutland: Computability: An Introduction to Recursive Function Theory. Cambridge University Press, 1984.
  – Main course book.
• H. R. Lewis and C. H. Papadimitriou: Elements of the Theory of Computation. 2nd Edition, Prentice Hall, 1998.
• M. Sipser: Introduction to the Theory of Computation. PWS, 1997.
• J. Martin: Introduction to Languages and the Theory of Computation. McGraw-Hill, 2003.
• J. E. Hopcroft, R. Motwani, J. D. Ullman: Introduction to Automata Theory, Languages, and Computation. Addison-Wesley, 2000.
  – Book on automata theory and context-free grammars.
• J. R. Hindley: Basic Simple Type Theory. Cambridge University Press, Cambridge Tracts in Theoretical Computer Science 42, 1997.
  – Best book on the λ-calculus.
• D. J. Velleman: How To Prove It. Cambridge University Press, 1994.
  – Basic mathematics. Recommended to fill any gaps in your mathematical background.
• E. Griffor (Ed.): Handbook of Computability Theory. North-Holland, 1999.
  – Very expensive. State of the art of computability theory, postgraduate level.

2 Encoding of Data Types into N

In this section we show how to encode elements of some data types as elements of N, the set of natural numbers. This shows that by studying computability on natural numbers we treat computability on many other data types as well. We also discuss the notions of countable vs. uncountable sets.

2.1 Countable and Uncountable Sets

Notation 2.1 If A is a finite set, let |A| be the number of elements in A.

Remark 2.2 One sometimes writes #A for |A|.

If A and B are finite sets, then |A| = |B| if and only if there is a bijection between A and B:

    [Diagrams: between two three-element sets a bijection exists; between sets of different sizes no bijection exists.]

For arbitrary (possibly infinite) sets, the above generalises as follows:

Definition 2.3 Two sets A and B have the same cardinality, written A ≃ B, if there exists a bijection between A and B, i.e. if there exist functions f : A → B and g : B → A which are inverse to each other: ∀x ∈ A. g(f(x)) = x and ∀x ∈ B. f(g(x)) = x.

It follows immediately:

Remark 2.4 If A and B are finite sets, then A ≃ B if and only if A and B have the same number of elements.

Lemma 2.5 ≃ is an equivalence relation, i.e. for all sets A, B, C we have:
(a) Reflexivity: A ≃ A.
(b) Symmetry: If A ≃ B, then B ≃ A.
(c) Transitivity: If A ≃ B and B ≃ C, then A ≃ C.

Proof:
(a): The function id : A → A, id(a) = a, is a bijection.
(b): If f : A → B is a bijection, so is its inverse f^{−1} : B → A.
(c): If f : A → B and g : B → C are bijections, so is the composition g ∘ f : A → C.

Theorem 2.6 A set A and its powerset P(A) := {B | B ⊆ A} never have the same cardinality:

    A ≄ P(A)

Proof: This is a typical diagonalisation argument. We first consider the case A = N and show that there is no bijection between N and P(N). Assume we have such a bijection f : N → P(N). We define a set C ⊆ N s.t. C ≠ f(n) for every n ∈ N. C = f(n) will be violated at element n:

• If n ∈ f(n), we do not add n to C; therefore n ∈ f(n) ∧ n ∉ C.
• If n ∉ f(n), we add n to C; therefore n ∉ f(n) ∧ n ∈ C.

Example: We take an arbitrary function f, and show how C is defined in this case:

    f(0) = {0, 1, 2, 3, 4, . . .}
    f(1) = {0, 2, 4, . . .}
    f(2) = {1, 3, . . .}
    f(3) = {0, 1, 3, 4, . . .}
    · · ·
    C = {1, 2, . . .}

We were considering above the nth element of f(n), so we were going through the diagonal in the above table. Therefore this argument is called a diagonalisation argument. So we obtain the following definition:

    C := {n ∈ N | n ∉ f(n)}.

Now C = f(n) is violated at element n: if n ∈ C, then n ∉ f(n); if n ∉ C, then n ∈ f(n). So C is not in the image of f, a contradiction.

In short, the above argument reads as follows: Assume f : N → P(N) is a bijection. Define C := {n ∈ N | n ∉ f(n)}. C is in the image of f. Assume C = f(n). Then

    n ∈ C ⇔ n ∉ f(n)    (definition of C)
          ⇔ n ∉ C       (C = f(n))

a contradiction.

General situation: The proof for the general situation is almost identical: Assume f : A → P(A) is a bijection. We define a set C s.t. C = f(a) is violated for every a:

    C := {a ∈ A | a ∉ f(a)}

C is in the image of f. Assume C = f(a). Then we have

    a ∈ C ⇔ a ∉ f(a)    (definition of C)
          ⇔ a ∉ C       (C = f(a))

a contradiction.
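For a finite analogue the diagonal construction can be run directly (a sketch: f is a purported enumeration of subsets of {0, . . . , n − 1}, given as a Python list):

```python
def diagonal(f):
    """Return C = { i | i not in f(i) }.  By construction C differs
    from every listed set f(i), witnessed at the diagonal position i."""
    return {i for i in range(len(f)) if i not in f[i]}

# The enumeration from the example above, restricted to {0,...,3}:
f = [{0, 1, 2, 3}, {0, 2}, {1, 3}, {0, 1, 3}]
C = diagonal(f)
print(C)  # {1, 2}
for i in range(len(f)):
    assert (i in C) != (i in f[i])   # C != f(i), witnessed at i
```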

Definition 2.7
• A set A is countable, if it is finite or A ≃ N.
• A set which is not countable is called uncountable.

Examples:

• N is countable.
  – Since N ≃ N.
• Z := {. . . , −2, −1, 0, 1, 2, . . .} is countable.
  – We can enumerate the elements of Z in the following way: 0, −1, +1, −2, +2, −3, +3, −4, +4, . . .. So we have the following map: 0 ↦ 0, 1 ↦ −1, 2 ↦ 1, 3 ↦ −2, 4 ↦ 2, etc. This map can be described as follows: g : N → Z,

        g(n) := n/2           if n is even,
                −(n + 1)/2    if n is odd.

    Exercise: Show that g is bijective.
• P(N) is uncountable.
  – P(N) is not finite.
  – N ≄ P(N).
• P({1, . . . , 10}) is finite, therefore countable.

Lemma 2.8 A set A is countable, if and only if there is an injective map g : A → N.

Remark 2.9 Intuitively, Lemma 2.8 expresses: A is countable, if we can assign to every element a ∈ A a unique code f(a) ∈ N. It is not required that each element of N occurs as a code.

Proof of Lemma 2.8:
“⇒”: Assume A is countable. Show there is an injective f : A → N.
• Case A is finite. Assume A = {a_0, . . . , a_n} with a_i ≠ a_j for i ≠ j. Define f : A → N, a_i ↦ i. f is injective.
• Case A is infinite. A is countable, so there exists a bijection from A into N, which is therefore injective.
“⇐”: Assume f : A → N is injective. Show A is countable. If A is finite, we are done. Assume A is infinite. Then f is for instance something like the following:

    [Diagram: an injective f with gaps in its image, e.g. f(a) = 1, f(b) = 4, f(c) = 5, f(d) = 9.]


In order to obtain a bijection g : A → N, we have to jump over the gaps in the image of f:

    [Diagram: g maps a ↦ 0, b ↦ 1, c ↦ 2, d ↦ 3, skipping the gaps in the image of f.]

So we have:

• f(a) = 1, which is the element number 0 in the image of f. g should instead map a to 0.
• f(b) = 4, which is the element number 1 in the image of f. g should instead map b to 1.
• etc.

1 is element number 0 in the image of f, because the number of elements f(a′) below f(a) is 0. 4 is element number 1 in the image of f, because the number of elements f(a′) below f(b) is 1. So in general we define g : A → N,

    g(a) := |{a′ ∈ A | f(a′) < f(a)}|

g is well defined: since f is injective, the number of a′ ∈ A s.t. f(a′) < f(a) is finite. We show that g is a bijection:

• g is injective:
  Assume a, b ∈ A, a ≠ b. Show g(a) ≠ g(b). By the injectivity of f we have f(a) ≠ f(b). Let for instance f(a) < f(b).

Then

    {a′ ∈ A | f(a′) < f(a)} ⊊ {a′ ∈ A | f(a′) < f(b)},

therefore

    g(a) = |{a′ ∈ A | f(a′) < f(a)}| < |{a′ ∈ A | f(a′) < f(b)}| = g(b),

so g(a) ≠ g(b).

• g is surjective:
  We define by induction on k, for k ∈ N, an element a_k ∈ A s.t. g(a_k) = k. Then the assertion follows.
  Assume we have already defined a_0, . . . , a_{k−1}.

    [Diagram: a_0, . . . , a_{k−1} are mapped by g to 0, . . . , k − 1; a is the element of A whose f-value is the least one above f(a_{k−1}).]

There exist infinitely many a′ ∈ A and f is injective, so there must be at least one a′ ∈ A s.t. f(a′) > f(a_{k−1}). Let n be minimal s.t. n = f(a) for some a ∈ A and n > f(a_{k−1}). Let a be the unique element of A s.t. f(a) = n. Then we have

    {a″ ∈ A | f(a″) < f(a)} = {a″ ∈ A | f(a″) < f(a_{k−1})} ∪ {a_{k−1}}.

Therefore

    g(a) = |{a″ ∈ A | f(a″) < f(a)}|
         = |{a″ ∈ A | f(a″) < f(a_{k−1})}| + 1
         = g(a_{k−1}) + 1
         = (k − 1) + 1
         = k.

Let a_k := a. Then g(a_k) = k.
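For a finite approximation, the bijection g from this proof can be computed directly (a sketch, with an injective f whose image has gaps as in the diagram; the element names are hypothetical):

```python
def g(A, f):
    """g(a) = |{ a2 in A | f(a2) < f(a) }|: the position of f(a) in
    the sorted image of f, so the gaps in the image are jumped over."""
    return {a: sum(1 for a2 in A if f[a2] < f[a]) for a in A}

A = ["a", "b", "c", "d"]
f = {"a": 1, "b": 4, "c": 5, "d": 9}   # injective, with gaps
print(g(A, f))  # {'a': 0, 'b': 1, 'c': 2, 'd': 3}
```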

Corollary 2.10
(a) If B is countable and g : A → B is injective, then A is countable.
(b) If A is uncountable and g : A → B is injective, then B is uncountable.
(c) If B is countable and A ⊆ B, then A is countable.
(d) If A is uncountable and A ⊆ B, then B is uncountable.

Proof:
(a): If f : B → N is an injection, so is f ∘ g : A → N.
(b): By (a). Why? (Exercise.)
(c): By (a). (What is g?; exercise.)
(d): By (c). Why? (Exercise.)

Lemma 2.11 Let A be a non-empty set. A is countable, if and only if there is a surjection h : N → A.

Proof: “⇒”: Assume A is non-empty and countable. Show there exists a surjection f : N → A.

• Case A is finite. Assume A = {a_0, . . . , a_n}. Define f : N → A,

        f(k) := a_k    if k ≤ n,
                a_0    otherwise.

  f is clearly surjective.
• Case A is infinite. A is countable, so there exists a bijection from N to A, which is therefore surjective.

“⇐”: Assume h : N → A is surjective. Show A is countable. Define g : A → N, g(a) := min{n | h(n) = a}. g(a) is well-defined, since h is surjective: there exists some n s.t. h(n) = a, therefore the minimal such n is well-defined. It follows that for all a ∈ A, h(g(a)) = a. Therefore g is injective: if a, a′ ∈ A with g(a) = g(a′), then a = h(g(a)) = h(g(a′)) = a′. By Lemma 2.8, A is countable.
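The injection g(a) = min{n | h(n) = a} from this direction of the proof can be computed by unbounded search, which terminates exactly because h is assumed surjective (a sketch with a hypothetical three-element A):

```python
def g(h, a):
    """Return the least n with h(n) = a.  The search terminates
    because h is assumed to be surjective."""
    n = 0
    while h(n) != a:
        n += 1
    return n

h = lambda n: ["x", "y", "z"][n % 3]       # a surjection N -> {x, y, z}
print([g(h, a) for a in ["x", "y", "z"]])  # [0, 1, 2]
```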

Lemma 2.12 The following sets are uncountable:
(a) P(N).
(b) F := {f | f : N → {0, 1}}.
(c) G := {f | f : N → N}.
(d) The set of real numbers, R.

Proof:
(a): P(N) is not finite, and P(N) ≄ N by Theorem 2.6.
(b): We introduce a bijection between F and P(N). Then, since P(N) is uncountable, it follows that F is uncountable.
Define for A ∈ P(N) the function χ_A : N → {0, 1},

    χ_A(n) := 1    if n ∈ A,
              0    otherwise.

χ_A is called the characteristic function of A. For instance, if A = {0, 2, 4, 6, 8, . . .}, then χ_A is as follows:

    [Graph: χ_A(n) = 1 for n = 0, 2, 4, 6, 8, . . . and χ_A(n) = 0 otherwise.]

χ is a function from P(N) to (N → {0, 1}), where we write the application of χ to an element A as χ_A instead of χ(A). χ has an inverse, namely the function χ^{−1} from (N → {0, 1}) into P(N), where for f : N → {0, 1},

    χ^{−1}(f) := {n ∈ N | f(n) = 1}.

For instance, if f : N → {0, 1}, f(n) = 1 if n is even and f(n) = 0 otherwise (i.e. f = χ_A for the set A of even numbers as above), then χ^{−1}(f) is as follows:

    [Graph: f(n) = 1 for n = 0, 2, 4, 6, 8, . . .; χ^{−1}(f) = {0, 2, 4, 6, 8, . . .}.]
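The passage between a set and its characteristic function can be checked in code (a sketch; since χ^{−1}(f) is in general an infinite set, only a finite prefix is inspected):

```python
def chi(A):
    """Characteristic function of a set A of natural numbers."""
    return lambda n: 1 if n in A else 0

def chi_inv(f, bound):
    """Finite prefix of chi^{-1}(f) = { n | f(n) = 1 }."""
    return {n for n in range(bound) if f(n) == 1}

A = {0, 2, 4, 6, 8}                  # the even numbers below 10
f = chi(A)
print([f(n) for n in range(9)])      # [1, 0, 1, 0, 1, 0, 1, 0, 1]
print(chi_inv(f, 9))                 # {0, 2, 4, 6, 8}
```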

We show that χ and χ^{−1} are in fact inverse:
χ^{−1} ∘ χ is the identity: If A ⊆ N, then

    χ^{−1}(χ_A) = {n ∈ N | χ_A(n) = 1}
                = {n ∈ N | n ∈ A}
                = A

χ ∘ χ^{−1} is the identity: If f : N → {0, 1}, then

    χ_{χ^{−1}(f)}(n) = 1 ⇔ n ∈ χ^{−1}(f)
                         ⇔ f(n) = 1

and

    χ_{χ^{−1}(f)}(n) = 0 ⇔ n ∉ χ^{−1}(f)
                         ⇔ f(n) ≠ 1
                         ⇔ f(n) = 0.

Therefore χ_{χ^{−1}(f)} = f. It follows that χ is bijective and F is uncountable.
(c): By (b), since F ⊆ G.
(d): A first idea is to define a function f_0 : F → R, f_0(g) = (0.g(0)g(1)g(2) · · ·)_2, where the right hand side is a number in binary format. If f_0 were injective, then, since F is uncountable, we could conclude that R is uncountable. The problem is that (0.a_0a_1 · · · a_k0111 · · ·)_2 and (0.a_0a_1 · · · a_k1000 · · ·)_2 denote the same real number, so f_0 is not injective. In order to avoid this problem, we modify f_0 so that we don’t obtain any binary expansions of the form (0.a_0a_1 · · · a_k111 · · ·)_2. We define instead f : F → R, f(g) = (0.g(0) 0 g(1) 0 g(2) 0 · · ·)_2, i.e. f(g) = (0.a_0a_1a_2 · · ·)_2, where

    a_k := 0         if k is odd,
           g(k/2)    otherwise.

If two binary expansions (b_0b_1 · · ·) and (c_0c_1 · · ·) don’t end in 111 · · ·, i.e. are not of the form (d_0d_1 · · · d_l111 · · ·), then

    (0.b_0b_1 · · ·)_2 = (0.c_0c_1 · · ·)_2 ⇔ (b_0b_1b_2 · · ·) = (c_0c_1c_2 · · ·)

Therefore

    f(g) = f(g′) ⇔ (0.g(0) 0 g(1) 0 g(2) 0 · · ·)_2 = (0.g′(0) 0 g′(1) 0 g′(2) 0 · · ·)_2
                 ⇔ (g(0) 0 g(1) 0 g(2) 0 · · ·) = (g′(0) 0 g′(1) 0 g′(2) 0 · · ·)
                 ⇔ (g(0) g(1) g(2) · · ·) = (g′(0) g′(1) g′(2) · · ·)
                 ⇔ g = g′,

so f is injective.

Remark on the Continuum Hypothesis: One can show that the four sets above all have the same cardinality. So there are two infinite sets, N and R, which have different cardinality. The question arises whether there is a set B with cardinality in between, i.e. s.t. there are injections from N into B and from B into P(N), but s.t. B ≄ N and B ≄ P(N). This question was Hilbert’s first problem. That there is no such set is called the Continuum Hypothesis. (R is called the continuum.) Paul Cohen has shown that this question is independent from ordinary set theory, i.e. we can neither show that such a B exists, nor that such a B doesn’t exist.

Paul Cohen (∗ 1934)

2.2 Encoding of Sequences into N

Definition 2.13 Let A, B be sets.

• A^k := {(a_1, . . . , a_k) | a_1, . . . , a_k ∈ A}. A^k is the set of k-tuples of elements of A, and is called the k-fold Cartesian product. Especially A^0 = {()}.
• A* := ⋃_{n∈N} A^n. A* is the set of tuples of A of arbitrary length, and is called the Kleene star.
• A → B := {f | f : A → B}. A → B is the set of functions from A to B and is called the function space.

Remark: A* can be considered as the set of strings having letters in the alphabet A.

We want to define a bijective function π : N^2 → N. π will be called the pairing function. To define such a bijection means that we have to enumerate all elements of N^2. Pairs of natural numbers can be enumerated in the following way:

         y    0    1    2    3    4
    x
    0         0    2    5    9   14
    1         1    4    8   13   19
    2         3    7   12   18   25
    3         6   11   17   24   32
    4        10   16   23   31   40

• We start in the top left corner and define π(0, 0) = 0.
• Then we go to the next diagonal: π(1, 0) = 1 and π(0, 1) = 2.
• Then we go to the next diagonal: π(2, 0) = 3, π(1, 1) = 4, π(0, 2) = 5.
• etc.

Note that the following naïve attempt to enumerate the pairs fails:

    [Table: the naïve enumeration walks through the first row only: π(0, 0), π(0, 1), π(0, 2), . . .]

In this enumeration, the first pair is (0, 0), which gets number 0. The next one is (0, 1), and it gets number 1. (0, 2) gets number 2, etc. But this way we never reach the pair (1, 0), since it takes infinitely long to enumerate the elements of the first row; this enumeration does not work. We will investigate the first solution, which worked. If we look at the diagonals, we see the following:

• For the pairs in each diagonal we have the property that x + y is constant.
  – The first diagonal, consisting of (0, 0) only, is given by x + y = 0.
  – The second diagonal, consisting of (1, 0), (0, 1), is given by x + y = 1.
  – The third diagonal, consisting of (2, 0), (1, 1), (0, 2), is given by x + y = 2.
  – Etc.
• The diagonal given by x + y = n consists of n + 1 pairs:
  – The first diagonal, given by x + y = 0, consists of (0, 0) only, i.e. of 1 pair.
  – The second diagonal, given by x + y = 1, consists of (1, 0), (0, 1), i.e. of 2 pairs.
  – The third diagonal, given by x + y = 2, consists of (2, 0), (1, 1), (0, 2), i.e. of 3 pairs.
  – Etc.

         y    0    1    2
    x
    0         0    2    5
    1         1    4
    2         3    7
    3         6

If we look at how many elements come before the pair (x_0, y_0) in the above order, we have the following:

• We have to count all elements of the previous diagonals. These are those given by x + y = n for n < x_0 + y_0.
  – In the above example, for the pair (2, 1) these are the diagonals given by x + y = 0, x + y = 1, x + y = 2.
  – The diagonal given by x + y = n has n + 1 elements, so in total we have Σ_{i=0}^{x+y−1} (i + 1) = 1 + 2 + · · · + (x + y) = Σ_{i=1}^{x+y} i.

  – Gauß showed, already as a student at school, that Σ_{i=1}^{n} i = n(n+1)/2. Therefore the above sum is (x+y)(x+y+1)/2.
• Further, we have to count all pairs in the current diagonal which occur in this ordering before the current one. These are y pairs.

  – Before (2, 1) there is only one pair, namely (3, 0).
  – Before (3, 0) there are 0 pairs.
  – Before (0, 2) there are 2 pairs, namely (2, 0), (1, 1).

• Therefore we get that there are in total (x+y)(x+y+1)/2 + y pairs before (x, y); therefore the pair (x, y) is the pair number (x+y)(x+y+1)/2 + y in this order.

We now have the following definition:

Definition 2.14

    π(x, y) := (x + y)(x + y + 1)/2 + y    (= (Σ_{i=1}^{x+y} i) + y)
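Definition 2.14 can be transcribed directly into code; a brute-force inverse, computed by searching for the diagonal as the surjectivity proof below does, is included as a sketch (the names pi and unpair are chosen here for illustration):

```python
def pi(x, y):
    """Cantor pairing function: pi(x, y) = (x+y)(x+y+1)/2 + y."""
    return (x + y) * (x + y + 1) // 2 + y

def unpair(z):
    """Inverse of pi: search for the diagonal k = x + y with
    k(k+1)/2 <= z < (k+1)(k+2)/2, then read off y and x."""
    k = 0
    while (k + 1) * (k + 2) // 2 <= z:
        k += 1
    y = z - k * (k + 1) // 2
    return k - y, y

# Reproduces the first diagonals of the table above:
print([pi(0, 0), pi(1, 0), pi(0, 1), pi(2, 0), pi(1, 1), pi(0, 2)])
# [0, 1, 2, 3, 4, 5]
```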

Exercise: Prove that Σ_{i=1}^{n} i = n(n+1)/2.

Lemma 2.15 π is bijective.

Proof: We show π is injective. We prove first that, if x + y < x′ + y′, then π(x, y) < π(x′, y′):

    π(x, y) = (Σ_{i=1}^{x+y} i) + y
            < (Σ_{i=1}^{x+y} i) + (x + y + 1)
            = Σ_{i=1}^{x+y+1} i
            ≤ (Σ_{i=1}^{x′+y′} i) + y′
            = π(x′, y′)

Assume now π(x, y) = π(x′, y′), and show x = x′ and y = y′. By the above we have x + y = x′ + y′.

Therefore

    y = π(x, y) − Σ_{i=1}^{x+y} i = π(x′, y′) − Σ_{i=1}^{x′+y′} i = y′

and x = (x + y) − y = (x′ + y′) − y′ = x′.

We show π is surjective: Assume n ∈ N. Show π(x, y) = n for some x, y ∈ N. The sequence (Σ_{i=1}^{k} i)_{k∈N} is strictly increasing. Therefore there exists a k s.t.

    a := Σ_{i=1}^{k} i ≤ n < Σ_{i=1}^{k+1} i    (∗)

So, in order to obtain π(x, y) = n, we need x + y = k. By y = π(x, y) − Σ_{i=1}^{x+y} i, we need to define y := n − a and, by k = x + y, x := k − y. By (∗) it follows that 0 ≤ y < k + 1, and therefore x, y ≥ 0. Further, π(x, y) = (Σ_{i=1}^{x+y} i) + y = (Σ_{i=1}^{k} i) + (n − Σ_{i=1}^{k} i) = n.

Since π is bijective, we can define π_0, π_1 as follows:

Definition 2.16 Let π_0 : N → N and π_1 : N → N be s.t. π_0(π(x, y)) = x and π_1(π(x, y)) = y.

Remark: π, π0, π1 are computable in an intuitive sense.

“Proof”: π is obviously computable. π_0(n) and π_1(n) can be computed by first searching for a k s.t. k(k+1)/2 ≤ n < (k+1)(k+2)/2, and then defining, as in the proof of the surjectivity of π (Lemma 2.15), π_1(n) := n − k(k+1)/2 and π_0(n) := k − π_1(n).

Remark 2.17 For all z ∈ N, π(π_0(z), π_1(z)) = z.

Proof: Assume z ∈ N and show z = π(π_0(z), π_1(z)). π is surjective, so there exist x, y s.t. π(x, y) = z. Then π(π_0(z), π_1(z)) = π(π_0(π(x, y)), π_1(π(x, y))) = π(x, y) = z.

We now want to encode elements of N^k as natural numbers. In order to encode an element (l, m, n) ∈ N^3 = ((N × N) × N), we first encode l, m as π(l, m) ∈ N, and then the complete triple as π(π(l, m), n). Similarly we encode (l, m, n, p) ∈ N^4 as π(π(π(l, m), n), p). So π^3(l, m, n) = π(π(l, m), n), π^4(l, m, n, p) = π(π(π(l, m), n), p), etc. The corresponding decoding functions π^k_i : N → N should return the ith component. For k = 3 we see that, if x = π^3(l, m, n) = π(π(l, m), n), then π_0(π_0(x)) = l, π_1(π_0(x)) = m, π_1(x) = n, so we define π^3_0(x) = π_0(π_0(x)), π^3_1(x) = π_1(π_0(x)), π^3_2(x) = π_1(x). Similarly, for k = 4 we have to define π^4_0(x) = π_0(π_0(π_0(x))), π^4_1(x) = π_1(π_0(π_0(x))), π^4_2(x) = π_1(π_0(x)), π^4_3(x) = π_1(x).

In general one defines for k ≥ 1, π^k : N^k → N, π^k(x_0, . . . , x_{k−1}) = π(· · · π(π(x_0, x_1), x_2) · · · , x_{k−1}), and π^k_0 : N → N, π^k_0(x) = π_0(· · · π_0(x) · · ·) (with π_0 applied k − 1 times), and for 0 < i < k, π^k_i(x) = π_1(π_0(· · · π_0(x) · · ·)) (with π_0 applied k − i − 1 times).

An inductive definition of π^k, π^k_i is as follows:

Definition 2.18
(a) A bijective function π^k : N^k → N. We define by induction on k, for k ∈ N, k ≥ 1:

    π^k : N^k → N
    π^1(x) := x
    π^{k+1}(x_0, . . . , x_k) := π(π^k(x_0, . . . , x_{k−1}), x_k)    for k > 0

(b) The inverse of π^k. We define by induction on k, for i, k ∈ N s.t. 1 ≤ k and 0 ≤ i < k:

    π^k_i : N → N
    π^1_0(x) := x
    π^{k+1}_i(x) := π^k_i(π_0(x))    for i < k
    π^{k+1}_k(x) := π_1(x)

Lemma 2.19
(a) For (x_0, . . . , x_{k−1}) ∈ N^k and i < k: x_i = π^k_i(π^k(x_0, . . . , x_{k−1})).
(b) For x ∈ N: x = π^k(π^k_0(x), . . . , π^k_{k−1}(x)).

Proof: Induction on k.
Base case k = 1:
(a): Let (x_0) ∈ N^1. Then π^1_0(π^1(x_0)) = x_0.
(b): Let x ∈ N. Then π^1(π^1_0(x)) = x.
Induction step k → k + 1: Assume the assertion has been shown for k.
(a): Let (x_0, . . . , x_k) ∈ N^{k+1}. Then, for i < k,

    π^{k+1}_i(π^{k+1}(x_0, . . . , x_k)) = π^k_i(π_0(π(π^k(x_0, . . . , x_{k−1}), x_k)))
                                        = π^k_i(π^k(x_0, . . . , x_{k−1}))
                                        = x_i    (by IH)

and

    π^{k+1}_k(π^{k+1}(x_0, . . . , x_k)) = π_1(π(π^k(x_0, . . . , x_{k−1}), x_k)) = x_k

(b): Let x ∈ N.

    π^{k+1}(π^{k+1}_0(x), . . . , π^{k+1}_k(x)) = π(π^k(π^{k+1}_0(x), . . . , π^{k+1}_{k−1}(x)), π^{k+1}_k(x))
                                               = π(π^k(π^k_0(π_0(x)), . . . , π^k_{k−1}(π_0(x))), π_1(x))
                                               = π(π_0(x), π_1(x))    (by IH)
                                               = x    (by Remark 2.17)
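Definition 2.18 can likewise be transcribed into code; this sketch iterates instead of recursing, which amounts to the same left-nested pairing (the helper names pi, unpair, pi_k, pi_k_i are chosen for illustration):

```python
def pi(x, y):
    """Cantor pairing: pi(x, y) = (x+y)(x+y+1)/2 + y."""
    return (x + y) * (x + y + 1) // 2 + y

def unpair(z):
    """Inverse of pi, by search for the diagonal k = x + y."""
    k = 0
    while (k + 1) * (k + 2) // 2 <= z:
        k += 1
    y = z - k * (k + 1) // 2
    return k - y, y

def pi_k(xs):
    """pi^k(x_0,...,x_{k-1}) = pi(...pi(pi(x_0,x_1),x_2)...,x_{k-1})."""
    z = xs[0]
    for x in xs[1:]:
        z = pi(z, x)
    return z

def pi_k_i(k, i, z):
    """The projection pi^k_i: apply pi_0 a total of (k-1-i) times,
    then pi_1 once (unless i = 0, where the remaining value is x_0)."""
    for _ in range(k - 1 - i):
        z = unpair(z)[0]
    return unpair(z)[1] if i > 0 else z

z = pi_k([3, 1, 4])
print([pi_k_i(3, i, z) for i in range(3)])  # [3, 1, 4]
```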

We now want to encode N^* into N. N^* = N^0 ∪ ⋃_{n≥1} N^n. N^0 = {⟨⟩}, and we can encode its only element ⟨⟩ as 0. For n ≥ 1, we have already defined an encoding π^n : N^n → N. Note that a natural number can (and in fact will) be both of the form π^n(a_0, …, a_{n−1}) and of the form π^k(b_0, …, b_{k−1}) for n ≠ k. In order to distinguish between those elements we have to add the length to the code, and encode (a_0, …, a_{n−1}) as an element of ⋃_{n≥1} N^n as π(n − 1, π^n(a_0, …, a_{n−1})). Note that we can use n − 1 instead of n as the first component, since we are referring here to n ≥ 1. In order to obtain a code for elements of ⋃_{n≥1} N^n which is different from the code for ⟨⟩, we add 1 to the above, so (a_0, …, a_{n−1}) is encoded as π(n − 1, π^n(a_0, …, a_{n−1})) + 1. One can now see that we have in fact obtained a bijection from N^* to N. The complete definition is as follows:

Definition 2.20 (a) A bijective function λx.⟨x⟩ : N^* → N. Define for x ∈ N^*, ⟨x⟩ ∈ N as follows:
⟨⟩ := 0,
⟨x_0, …, x_k⟩ := 1 + π(k, π^{k+1}(x_0, …, x_k)).
(b) The length of the code of an element of N^*: We define

lh : N → N,
lh(0) := 0,
lh(x) := π_0(x − 1) + 1 if x > 0.
lh(x) is called the length of x.

(c) We define for x ∈ N and i < lh(x), (x)_i ∈ N as follows:
(x)_i := π^{lh(x)}_i(π_1(x − 1)).

For lh(x) ≤ i, let (x)_i := 0.
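Definition 2.20 is directly executable. The Python sketch below (again assuming the Cantor-style pairing π(x, y) = (x + y)(x + y + 1)/2 + y, which is not spelled out in this excerpt) implements ⟨·⟩, lh and a decoder returning all components at once, and checks the round-trip properties of Lemma 2.21:

```python
def pi(x, y):                        # assumed Cantor-style pairing
    k = x + y
    return k * (k + 1) // 2 + y

def unpair(n):                       # returns (pi_0(n), pi_1(n))
    k = 0
    while (k + 1) * (k + 2) // 2 <= n:
        k += 1
    y = n - k * (k + 1) // 2
    return k - y, y

def pik(xs):                         # pi^k : N^k -> N, k >= 1
    code = xs[0]
    for x in xs[1:]:
        code = pi(code, x)
    return code

def encode(xs):                      # <x_0, ..., x_k>; <> = 0
    return 0 if not xs else 1 + pi(len(xs) - 1, pik(xs))

def lh(x):                           # length of the coded sequence
    return 0 if x == 0 else unpair(x - 1)[0] + 1

def decode(x):                       # ((x)_0, ..., (x)_{lh(x)-1})
    if x == 0:
        return []
    n, code = unpair(x - 1)
    xs = []
    for _ in range(n):               # peel off the last component n times
        code, last = unpair(code)
        xs.append(last)
    xs.append(code)                  # what is left is x_0
    return xs[::-1]

assert all(decode(encode(xs)) == xs
           for xs in ([], [5], [0, 1], [3, 1, 4, 1, 5]))
assert all(encode(decode(x)) == x for x in range(200))   # bijectivity
assert lh(encode([3, 1, 4])) == 3
```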

Lemma 2.21 (a) lh(⟨⟩) = 0, lh(⟨x_0, …, x_k⟩) = k + 1.
(b) For i ≤ k, (⟨x_0, …, x_k⟩)_i = x_i.
(c) For x ∈ N, x = ⟨(x)_0, …, (x)_{lh(x)−1}⟩.
Proof: (a) lh(⟨⟩) = lh(0) = 0.

lh(⟨x_0, …, x_k⟩) = π_0(⟨x_0, …, x_k⟩ − 1) + 1
= π_0(π(k, ⋯) + 1 − 1) + 1
= k + 1.

(b) lh(⟨x_0, …, x_k⟩) = k + 1. Therefore
(⟨x_0, …, x_k⟩)_i = π^{k+1}_i(π_1(⟨x_0, …, x_k⟩ − 1))
= π^{k+1}_i(π_1(1 + π(k, π^{k+1}(x_0, …, x_k)) − 1))
= π^{k+1}_i(π^{k+1}(x_0, …, x_k))
= x_i   (by Lemma 2.19 (a))

(c) If x = 0, then lh(x) = 0, and therefore ⟨(x)_0, …, (x)_{lh(x)−1}⟩ = ⟨⟩ = 0 = x.
If x > 0, let x − 1 = π(l, y). Then lh(x) = l + 1, (x)_i = π^{l+1}_i(y), and therefore
⟨(x)_0, …, (x)_{lh(x)−1}⟩ = ⟨π^{l+1}_0(y), …, π^{l+1}_l(y)⟩
= π(l, π^{l+1}(π^{l+1}_0(y), …, π^{l+1}_l(y))) + 1
= π(l, y) + 1   (by Lemma 2.19 (b))
= x.

Theorem 2.22 (a) If A is countable, so are A^k and A^*.
(b) If A, B are countable, so are A × B and A ∪ B.
(c) If A_n are countable sets for n ∈ N, so is ⋃_{n∈N} A_n.
(d) Q, the set of rational numbers, is countable.
Proof: (a) Assume A is countable. We show first that A^* is countable: there exists an injective f : A → N. Define f^* : A^* → N^*, f^*(a_0, …, a_n) = (f(a_0), …, f(a_n)). Then f^* is injective. λx.⟨x⟩ : N^* → N is bijective. Therefore (λx.⟨x⟩) ∘ f^* : A^* → N is injective, so A^* is countable. A^k ⊆ A^*, therefore A^k is countable as well.
(b) Assume A, B are countable. We show A × B and A ∪ B are countable.
If A = ∅ or B = ∅, then A × B = ∅, therefore countable, and A ∪ B = A or A ∪ B = B, therefore countable.
Assume A, B ≠ ∅. Then there exist surjections f : N → A and g : N → B. Then f × g : N^2 → A × B, (f × g)(n, m) := (f(n), g(m)), is surjective as well. Further, λx.(π_0(x), π_1(x)) : N → N^2 is surjective, therefore so is (f × g) ∘ (λx.(π_0(x), π_1(x))) : N → A × B. Therefore A × B is countable.
Further, h : N^2 → A ∪ B, h(0, n) := f(n), h(k, n) := g(n) for k > 0, is surjective, therefore so is h ∘ (λx.(π_0(x), π_1(x))) : N → A ∪ B.
(c) Assume A_n is countable for every n ∈ N. We show that A := ⋃_{n∈N} A_n is countable as well.
If all A_n are empty, so is ⋃_{n∈N} A_n. Assume A_k is non-empty for some k. By replacing every empty A_l by A_k we get a sequence of sets s.t. their union is the same as A. So without loss of generality we can assume that A_n ≠ ∅ for all n. The A_n are countable, so there exist surjections f_n : N → A_n. Then f : N^2 → A, f(n, m) := f_n(m), is surjective as well. Therefore f ∘ (λx.(π_0(x), π_1(x))) : N → A is surjective as well, and A is countable.
(d) Z × N is countable, since Z and N are countable. Let A := {(z, n) ∈ Z × N | n ≠ 0}. A ⊆ Z × N, therefore A is countable as well. Since A ≠ ∅ and countable, there exists a surjection f : N → A. Define g : A → Q, g(z, n) := z/n. g is surjective, therefore so is g ∘ f : N → Q. So we have defined a surjection from N onto Q, and Q is countable.

2.3 Reduction of Computability on some Data Types to Computability on N
We want to reduce computability on some data types A to computability on N. This avoids having to introduce the notion of computability for each of those types individually. In order to achieve this, we need to encode elements of A as elements of N, i.e. we need a function code : A → N, and we need to decode them again, i.e. we need a decoding function decode : N → A. If these functions are computable, then from a computable f : A → A we can obtain a computable f̃ : N → N by taking an element n ∈ N, decoding it into A, applying f, and then encoding the result back into a natural number:

              f
     A ---------> A
     ^            |
     | decode     | code
     |            v
     N ---------> N
              f̃

We also want to recover from f̃ the original function f. For this we need that, if we first encode an element a ∈ A as a natural number and then decode it again, we obtain the original value a, i.e. we need decode(code(a)) = a. If we have these properties, we say that A has a computable encoding into N. Since we are referring to the notion of “computable” in an intuitive sense, and since this is not a precise mathematical definition, we call the following an “informal definition”, and lemmata referring to it “informal lemmata”.

Informal Definition: A data type A has a computable encoding into N, if there exist in an intuitive sense computable functions code : A → N and decode : N → A such that decode(code(a)) = a for all a ∈ A.

Remark: Note that we do not require that code(decode(n)) = n for every n ∈ N. Every a ∈ A has a unique code in N, i.e. code is injective, as expressed by decode(code(a)) = a. However, not every n ∈ N needs to be a code for an element of A (which would imply code(decode(n)) = n): code need not be surjective.

The most common data type used is the data type of finite strings over a finite alphabet A; the set of such strings is A^*. Now A^* has a computable encoding into N:

Informal Lemma: If A is a finite set, then A^* has a computable encoding into N.

“Proof”: If A is empty, then A^* = {⟨⟩}, and we can define code : A^* → N, code(a) = 0, and decode : N → A^*, decode(n) = ⟨⟩. Both functions are clearly computable, and if a ∈ A^*, then a = ⟨⟩ and decode(code(a)) = decode(0) = ⟨⟩ = a.

Assume now A ≠ ∅, A = {a_0, …, a_n}, where a_i ≠ a_j for i ≠ j. Define f : A → N, f(a_i) = i, and g : N → A, g(i) = a_i for i ≤ n, g(i) = a_0 for i > n. f and g are in an intuitive sense computable, and therefore so are code : A^* → N, code(b_1, …, b_k) = ⟨f(b_1), …, f(b_k)⟩, and decode : N → A^*, decode(m) := (g((m)_0), …, g((m)_{lh(m)−1})).

We show decode(code(x)) = x for x ∈ A^*: Let x = (b_1, …, b_k). Then
decode(code(x)) = decode(code(b_1, …, b_k))
= decode(⟨f(b_1), …, f(b_k)⟩)
= (g(f(b_1)), …, g(f(b_k)))
= (b_1, …, b_k) = x.
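The “proof” above is itself a small program. The following Python sketch runs it for a hypothetical three-letter alphabet (the names `alphabet`, `code_str`, `decode_str` are illustration only, and the Cantor-style pairing π(x, y) = (x + y)(x + y + 1)/2 + y is again an assumption, as Lemma 2.15 is not reproduced here):

```python
def pi(x, y):                        # assumed Cantor-style pairing
    k = x + y
    return k * (k + 1) // 2 + y

def unpair(n):                       # returns (pi_0(n), pi_1(n))
    k = 0
    while (k + 1) * (k + 2) // 2 <= n:
        k += 1
    y = n - k * (k + 1) // 2
    return k - y, y

def encode_seq(xs):                  # <x_0, ..., x_k> as in Definition 2.20
    if not xs:
        return 0
    code = xs[0]
    for x in xs[1:]:
        code = pi(code, x)
    return 1 + pi(len(xs) - 1, code)

def decode_seq(x):                   # ((x)_0, ..., (x)_{lh(x)-1})
    if x == 0:
        return []
    n, code = unpair(x - 1)
    xs = []
    for _ in range(n):
        code, last = unpair(code)
        xs.append(last)
    xs.append(code)
    return xs[::-1]

alphabet = ['a', 'b', 'c']           # a hypothetical finite alphabet A

def code_str(s):                     # code : A* -> N, via f(a_i) = i
    return encode_seq([alphabet.index(ch) for ch in s])

def decode_str(m):                   # decode : N -> A*, via g from the proof
    return ''.join(alphabet[i] if i < len(alphabet) else alphabet[0]
                   for i in decode_seq(m))

assert all(decode_str(code_str(s)) == s for s in ['', 'a', 'cab', 'abcabc'])
```

Note that decode is total (every natural number decodes to some string), while code need not be surjective, exactly as in the remark above.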

Remark: Assume A, B have computable encodings code_A : A → N, decode_A : N → A and code_B : B → N, decode_B : N → B. If f : A → B is in an intuitive sense computable, then we can define an in an intuitive sense computable function f̃ : N → N, f̃ := code_B ∘ f ∘ decode_A:

              f
     A ---------> B
     ^ |          ^ |
     | |          | |
     | v          | v
     N ---------> N
              f̃

(on the left: decode_A upwards, code_A downwards; on the right: decode_B upwards, code_B downwards)

From f̃ we can recover f, since f = decode_B ∘ f̃ ∘ code_A (easy exercise). So by considering computability on N we cover computability on data types with a computable encoding into N as well.

2.4 Appendix: Some Mathematical Background
2.4.1 Injective – Surjective – Bijective
Definition Let f : A → B and A′ ⊆ A.
(a) f[A′] := {f(a) | a ∈ A′} is called the image of A′ under f.
(b) The image of A under f is called the image of f.

[Diagram: two pictures of a function f : A → B, illustrating the image f[A′] of a subset A′ ⊆ A under f, and the image f[A] of f.]

Definition 2.23 Let A, B be sets and f : A → B.
(a) f is injective (an injection, or one-to-one) if f applied to different elements of A gives different results: ∀a, b ∈ A. a ≠ b → f(a) ≠ f(b).
(b) f is surjective (a surjection, or onto) if every element of B is in the image of f: ∀b ∈ B. ∃a ∈ A. f(a) = b.
(c) f is bijective (a bijection, or a one-to-one correspondence) iff it is both surjective and injective.

If we visualise a function by drawing arrows from the elements a ∈ A to f(a) ∈ B, then we have the following:
• A function is injective if for every element of B there is at most one arrow pointing to it:

[Diagram: an injective and a non-injective function.]

• A function is surjective if for every element of B there is at least one arrow pointing to it:

[Diagram: a surjective and a non-surjective function.]

• A function is bijective if for every element of B there is exactly one arrow pointing to it:

[Diagram: a bijective function.]

• Note that, since we have a function, for every element of A there is exactly one arrow originating from it.

2.4.2 Sequences vs. Functions N → A
A function f : N → A is nothing but an infinite sequence of elements of A numbered by the elements of N, namely f(0), f(1), f(2), …, usually written as (f(n))_{n∈N}. We identify functions f : N → A with infinite sequences of elements of A. So the following all denote the same mathematical object:
• The function f : N → N, f(n) = 0 if n is odd, f(n) = 1 if n is even.
• The sequence (1, 0, 1, 0, 1, 0, …).
• The sequence (a_n)_{n∈N} where a_n = 0 if n is odd, a_n = 1 if n is even.

2.4.3 Partial Functions
A partial function f : A ∼→ B is the same as a function f : A → B, except that it need not be defined for all values: we do not require that for every a ∈ A there is a value f(a).
The key example of a partial function is the function computed by a computer program which takes an input a ∈ A and possibly returns an output f(a) ∈ B. (We assume that the program always returns the same result, which can be guaranteed by forbidding references to variables defined outside the scope of this function.) If the program applied to some a ∈ A returns a value b ∈ B, then f(a) is defined and equal to b. If the program applied to a does not terminate, then f(a) is undefined.
Other examples of partial functions are the function f : R ∼→ R, f(x) = 1/x, which is not defined for x = 0; or g : R ∼→ R, g(x) = √x, which is only defined for x ≥ 0.
Definition 2.24

• Let A, B be sets. A partial function f from A to B, written f : A ∼→ B, is a function f : A′ → B for some A′ ⊆ A. A′ is called the domain of f, written A′ = dom(f).
• Let f : A ∼→ B.
  – f(a) is defined, written f(a)↓, if a ∈ dom(f).
  – f(a) ≃ b (f(a) is partially equal to b) :⇔ f(a)↓ ∧ f(a) = b.
We want to work with terms like f(g(2), h(3)), where f, g, h are partial functions. The question here is what happens if g(2) or h(3) is undefined. There is a theory of partial functions in which f(g(2), h(3)) might be defined even if g(2) or h(3) is undefined. This makes sense for instance for the function f : N^2 ∼→ N, f(x, y) = 0. Such functions, which may be defined even if some of their arguments are undefined, are called non-strict, whereas functions which are defined only if all of their arguments are defined are called strict. In this lecture, functions will always be strict. Therefore a term like f(g(2), h(3)) is defined only if g(2) and h(3) are defined, and if f applied to the results of evaluating g(2) and h(3) is defined. f(g(2), h(3)) is then evaluated as for ordinary functions: we first compute g(2) and h(3), and then evaluate f applied to the results of those computations.

Definition 2.25

• For expressions t formed from constants, variables and partial functions we define t↓ and t ≃ b as follows:
  – If t = a is a constant, then t↓ is always true, and t ≃ b :⇔ a = b.
  – If t = x is a variable, then t↓ is always true, and t ≃ b :⇔ b = x.
  – f(t_1, …, t_n) ≃ b :⇔ ∃a_1, …, a_n. t_1 ≃ a_1 ∧ ⋯ ∧ t_n ≃ a_n ∧ f(a_1, …, a_n) ≃ b.
    f(t_1, …, t_n)↓ :⇔ ∃b. f(t_1, …, t_n) ≃ b.
• s↑ :⇔ ¬(s↓).
• We define for expressions s, t formed from constants and partial functions:
  s ≃ t :⇔ (s↓ ↔ t↓) ∧ (s↓ → ∃a, b. s ≃ a ∧ t ≃ b ∧ a = b).
• “t is total” means t↓.
• A function f : A ∼→ B is total iff ∀a ∈ A. f(a)↓.

Remark: Total partial functions are ordinary (non-partial) functions.
Remark: Quantifiers always range over defined elements. So by ∃m. f(n) ≃ m we mean: there exists a defined m s.t. f(n) ≃ m. So from f(n) ≃ g(k) we cannot conclude ∃m. f(n) ≃ m unless g(k)↓.
Remark 2.26 (a) If s ≃ a and s ≃ b, then a = b.
(b) For all terms t we have t↓ ⇔ ∃a. t ≃ a.
(c) f(t_1, …, t_n)↓ ⇔ ∃a_1, …, a_n. t_1 ≃ a_1 ∧ ⋯ ∧ t_n ≃ a_n ∧ f(a_1, …, a_n)↓.
Examples: Assume f : N ∼→ N, dom(f) = {n ∈ N | n > 0}, f(n) := n + 1 for n ∈ dom(f). Let g : N ∼→ N, dom(g) = {0, 1, 2}, g(n) := n. Then we have:
• f(1)↓, f(0)↑, f(1) ≃ 2, f(0) ≄ n for all n.
• g(f(0))↑, since f(0)↑.
• g(f(1))↓, since f(1)↓, f(1) ≃ 2 and g(2)↓.
• g(f(2))↑, since f(2)↓, f(2) ≃ 3, but g(3)↑.
• g(f(2)) ≃ f(0), since both expressions are undefined.
• g(f(1)) ≃ f(g(1)), since both sides are defined and equal to 2.
• g(f(2)) ≄ f(g(1)), since the left hand side is undefined and the right hand side is defined.
• f(f(1)) ≄ f(1), since both sides evaluate to different (defined) values: f(f(1)) ≃ 3, f(1) ≃ 2.
• +, ·, etc. can be treated as partial functions. So for instance
  – f(1) + f(2)↓, since f(1)↓, f(2)↓, and + is total.
  – f(1) + f(2) ≃ 5.
  – f(0) + f(1)↑, since f(0)↑.
Remark: Strict evaluation of functions corresponds to the so-called “call-by-value” evaluation of functions, as used in most imperative and some functional programming languages: when evaluating f(t_1, …, t_n), one first evaluates t_1, …, t_n, and then evaluates f applied to the results of this evaluation. An undefined value corresponds to an infinite computation, and if one of the t_i is undefined, the computation of f(t_1, …, t_n) doesn't terminate. In Haskell we have “call-by-name” evaluation, which means the t_i are evaluated only if they are needed in the computation of f.
For instance, if we have f : N^2 → N, f(x, y) = x, and t is an undefined term, then with call-by-name the term f(2, t) can be evaluated to 2, since we never need to evaluate t. So functions in Haskell are non-strict. In our setting, functions are strict, so f(2, t) is undefined.
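The examples of Remark 2.26 can be replayed mechanically if one models “undefined” by an explicit marker. This is only a sketch: Python's None stands in for divergence, which works here only because dom(f) and dom(g) in the example happen to be decidable; it does not model genuine non-termination.

```python
UNDEF = None          # stand-in for "the computation diverges"; usable here
                      # only because dom(f) and dom(g) are decidable

def strict(fn):
    """Strict (call-by-value) application: if any argument is undefined,
    the whole application is undefined."""
    def apply(*args):
        if any(a is UNDEF for a in args):
            return UNDEF
        return fn(*args)
    return apply

f = strict(lambda n: n + 1 if n > 0 else UNDEF)   # dom(f) = {n | n > 0}
g = strict(lambda n: n if n <= 2 else UNDEF)      # dom(g) = {0, 1, 2}

assert f(1) == 2 and f(0) is UNDEF
assert g(f(0)) is UNDEF        # f(0) undefined, so strictness propagates
assert g(f(1)) == 2            # f(1) ~ 2 and g(2) ~ 2
assert g(f(2)) is UNDEF        # f(2) ~ 3, but g(3) is undefined
assert g(f(1)) == f(g(1)) == 2
```

A non-strict f(x, y) = x, by contrast, would have to inspect x before ever touching y, which is exactly what the `strict` wrapper above rules out.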

2.4.4 λ-Notation
When using λ-notation in an informal context (i.e. outside the λ-calculus to be introduced later), by λx.t we mean the function mapping x to t. E.g. λx.x + 3 is the function f s.t. f(x) = x + 3, and λx.√x is the function mapping x to √x. This notation will be used if we want to introduce such a function without giving it a name. Note that we do not specify the domain and codomain – when this notation is used, they will either not matter or be clear from the context.

2.4.5 Some Standard Sets
• N is the set of natural numbers, i.e. N := {0, 1, 2, …}.
  – Note that 0 is a natural number.
  – When numbering the elements of a sequence, e.g. when counting the arguments of a function, we will usually start with 0:
    ∗ The 0th element of a sequence is what is usually called the first element. (So, when considering variables x_0, …, x_{n−1}, x_0 is the 0th variable.)
    ∗ The first element of a sequence is what is usually called the second element. (So, when considering variables x_0, …, x_{n−1}, x_1 is the first variable.)
    ∗ Etc.
• Z is the set of integers: Z := N ∪ {−x | x ∈ N}.
• Q is the set of rationals, i.e. Q := {x/y | x, y ∈ Z, y ≠ 0}.

• R is the set of real numbers.
• If A, B are sets, then A × B is the product of A and B, i.e. A × B := {(x, y) | x ∈ A ∧ y ∈ B}.
• If A is a set and k ∈ N, k > 0, then A^k is the set of k-tuples of elements of A, i.e. A^k := {(x_0, …, x_{k−1}) | x_0, …, x_{k−1} ∈ A}.
  We identify A^1 (which is formally {(x) | x ∈ A}) with A. (Note that, if one is pedantic, (x) is different from x: (x) is the tuple consisting of the one element x, whereas x is an element from which we haven't formed a tuple. Ignoring this difference usually doesn't cause problems.)

2.4.6 Relations, Predicates and Sets
• A predicate on a set A is a property P of elements of A. In this lecture, A will usually be N^k for some k ∈ N, k > 0.
• We write P(a) for “predicate P is true for the element a of A”.
• We often write “P(x) holds” for “P(x) is true”.
• We can use P(a) in formulas. Therefore:

  – ¬P(a) (“not P(a)”) means that P(a) is not true.
  – P(a) ∧ Q(a) means that both P(a) and Q(a) are true.
  – P(a) ∨ Q(a) means that P(a) or Q(a) is true (in particular, if both P(a) and Q(a) are true, then P(a) ∨ Q(a) is true as well).
  – ∀x ∈ B. P(x) means that P(x) is true for all elements x of the set B.
  – ∃x ∈ B. P(x) means that there exists an element x of the set B s.t. P(x) is true.
• In this lecture, “relation” is another word for “predicate”.
• We identify a predicate P on a set A with the set {x ∈ A | P(x)}. Therefore predicates and sets will be identified. E.g., if P is a predicate,
  – x ∈ P stands for x ∈ {x ∈ A | P(x)}, which is equivalent to P(x),
  – ∀x ∈ P. φ(x) for a formula φ stands for ∀x. P(x) → φ(x),
  – etc.
• An n-ary relation or predicate on N is a relation P ⊆ N^n. A unary, binary, ternary relation on N is a 1-ary, 2-ary, 3-ary relation on N, respectively.
• An n-ary function on N is a function f : N^n → N. A unary, binary, ternary function on N is a 1-ary, 2-ary, 3-ary function on N, respectively.

2.4.7 Other Notations
The “dot”-notation. When writing expressions such as ∀x. A(x) ∧ B(x), the convention is that if we have a dot after the quantifier together with a variable (in the example, the dot after “∀x”), then the scope of the quantifier includes as much as possible to the right. So in the example ∀x refers to A(x) ∧ B(x), and the formula expresses that for all x both A(x) and B(x) hold, or, using brackets, ∀x. (A(x) ∧ B(x)). However, in (A → ∀x. B(x) ∧ C(x)) ∨ D(x), ∀x refers only to B(x) ∧ C(x), since this is the maximum scope possible – it doesn't make sense to include “) ∨ D(x)” in the scope of ∀x as well. Similarly, in ∃x. A(x) ∧ B(x), ∃x refers to A(x) ∧ B(x), whereas in (A → ∃x. B(x) ∧ C(x)) ∨ D(x), ∃x refers to B(x) ∧ C(x).
This applies as well to λ-expressions, so λx.x + x is the function taking an x and returning x + x.

~x, ~y, etc. We will often refer to arguments of functions and relations to which we don't refer explicitly. An example are the variables x_0, …, x_{n−1} in the definition of a function f : N^{n+1} → N,

f(x_0, …, x_{n−1}, y) = g(x_0, …, x_{n−1}) if y = 0, and f(x_0, …, x_{n−1}, y) = h(x_0, …, x_{n−1}) if y > 0.

In order to avoid having to write out x_0, …, x_{n−1} over and over in such examples, we write ~x for it. In general, ~x stands for x_0, …, x_{n−1}, where n, i.e. how many variables are meant by ~x, is usually clear from the context. Examples:

• If f : N^{n+1} → N, then in f(~x, y), ~x needs to stand for n arguments, therefore ~x = x_0, …, x_{n−1}.
• If f : N^{n+2} → N, then in f(~x, y), ~x needs to stand for n + 1 arguments, so ~x = x_0, …, x_n.
• If P is an (n + 4)-ary relation, then in P(~x, y, z), ~x stands for x_0, …, x_{n+1}.
Similarly, we write ~y for y_0, …, y_{n−1}, where n is clear from the context; similarly for ~z, ~n, ~m, etc.

3 The Unlimited Register Machine (URM) and the Halting Problem

A model of computation consists of a set of partial functions together with methods describing how to compute those functions. We will usually consider models of computation which contain all computable functions. One usually aims at describing as simple a model of computation as possible, i.e. at minimising the constructs used, while still being able to represent all computable functions. This makes it easier to show, for another model of computation, that the first model can be interpreted in it. Furthermore, models of computation are used more for showing that something is not computable than for showing that something is computable, and this is easier to show if the model of computation is simple.
The URM (unlimited register machine) is one model of computation which is particularly easy to understand. It is a virtual machine, i.e. a description of how a computer would execute its programs; it is not intended to be actually implemented. It is an abstract virtual machine, so we do not intend to simulate this machine on another computer (although there are implementations) – the machine serves as a mathematical model, which is then investigated mathematically. For instance, one usually doesn't write programs for it – instead one shows that in principle there is a way of writing a certain program in this language. In fact it turns out to be difficult to write actual programs for the URM.
We will have a very low-level programming language – there will be no functions, objects, etc., not even while loops. The only loop construct will be the goto. So the URM does not support structured programming in any sense.
A URM is an idealised machine, as expressed by the word “unlimited”: we don't assume any bounds on the amount of memory or on the execution time – however, all values will be finite. There exist various variants of URMs; we will introduce here a URM which is particularly simple.

3.1 Syntax and Semantics of the URM
• The URM consists of
  – infinitely many registers R_0, R_1, R_2, …, each of which can store an arbitrarily big natural number;

  – a finite sequence of instructions I_0, I_1, I_2, …, I_n;
  – and a program counter PC, which can store a natural number. If it contains a number i with 0 ≤ i ≤ n, this means that it points to instruction I_i. If the PC is set to another number, the program will stop.

[Diagram: the registers R_0, R_1, R_2, …, the instruction sequence I_0, I_1, I_2, …, I_n, and the PC pointing into it. A solid line indicates the instruction being executed; a dotted line indicates that the program has terminated.]

• There are three kinds of URM instructions:

  – The successor instruction succ(n), where n ∈ N.
    If the PC points to this instruction, the URM performs the following operation: it adds 1 to register R_n and then increments the PC by one. So the PC then points to the next instruction, if there is one; otherwise the program stops.
  – The predecessor instruction pred(n), where n ∈ N.
    If the PC points to this instruction, the URM performs the following operation: if R_n contains a value > 0, it subtracts 1 from it, otherwise it leaves it as it is. Then the PC is incremented by 1.
  – The conditional jump instruction ifzero(n, q), where n, q ∈ N.
    If the PC points to this instruction, the URM performs the following operation:

    ∗ If R_n contains the value 0, then the PC is set to the value q – if there is an instruction I_q, the URM continues by executing that instruction; if there is no such instruction, the program will stop.
    ∗ If R_n contains a value > 0, then the PC is incremented by 1, so the program continues by executing the next instruction, or it stops if the current instruction is the last one.

• Note that a URM program I_0, …, I_n will refer only to finitely many registers, namely those explicitly occurring in I_0, …, I_n.

• If P = I_0, …, I_n is a URM program, it computes for each k a function P^(k) : N^k ∼→ N. P^(k)(a_0, …, a_{k−1}) is computed as follows:

  – Initialisation: Set the PC to 0, store a_0, …, a_{k−1} in registers R_0, …, R_{k−1}, respectively, and set all other registers to 0 (it suffices to do this for those registers referenced in the program).

  – Iteration: As long as the PC holds a value j ∈ {0, …, n}, execute instruction I_j. Continue with the next instruction as given by the PC.
  – Output: If the PC value is bigger than n, the program stops, and the function returns the value b contained in register R_0: P^(k)(a_0, …, a_{k−1}) := b. If the program never stops, then P^(k)(a_0, …, a_{k−1}) is undefined.
• A partial function f : N^k ∼→ N is URM-computable if f = P^(k) for some k ∈ N and some URM program P.

Remark: 1. A URM program P defines many URM-computable functions:

  – As a unary function P^(1) : N ∼→ N: one stores its argument in R_0, sets all other registers to 0 and then executes P.
  – As a binary function P^(2) : N^2 ∼→ N: one stores its two arguments in R_0, R_1, sets all other registers to 0 and then executes P.
  – As a ternary function P^(3) : N^3 ∼→ N: one stores its three arguments in R_0, R_1, R_2, sets all other registers to 0 and then executes P.
  – Etc.
2. For a partial function f to be computable we do not need to determine whether f(n) is defined or not. We only need, in case f(n)↓, to determine after a finite amount of time that f(n) is actually defined, and to compute the value of f(n). In case f(n)↑, we will wait infinitely long for an answer.
The Turing halting problem, which will be discussed later, is the problem of deciding whether f(n)↓ or f(n)↑. We will see later that this problem is undecidable.

If one wants to make sure that we always get an answer, one has to consider total computable functions f, i.e. computable functions which are always defined. In order to introduce total computable functions, we have to introduce first the partial computable functions and then take the subset of those functions which are total. There is no programming language (in which it is decidable whether a program is actually a program) in which all functions are total and which allows one to define all computable functions.
Example: The function f : N ∼→ N, f(x) ≃ 0, is URM-computable. We derive a URM program computing it in several steps.
Step 1: The program should, if initially R_0 contains x and all other registers contain 0, terminate with value 0 in register R_0. A higher-level program (not in the language of the URM) would be as follows:

R_0 := 0
Step 2: Since we only have successor and predecessor operations available, we replace the program by the following:
while R_0 ≠ 0 do { R_0 := R_0 ∸ 1 }
Here x ∸ y := max{x − y, 0}, i.e. x ∸ y = x − y if y ≤ x, and x ∸ y = 0 otherwise.
Step 3: We replace the while-loop by a goto:

LabelBegin : if R_0 = 0 then goto LabelEnd;
             R_0 := R_0 ∸ 1;
             goto LabelBegin;
LabelEnd :

Step 4: The last goto will be replaced by a conditional goto, depending on R_1 = 0. Since R_1 = 0 holds initially, and R_1 is never changed during the program, this jump will always be carried out.
LabelBegin : if R_0 = 0 then goto LabelEnd;
             R_0 := R_0 ∸ 1;
             if R_1 = 0 then goto LabelBegin;
LabelEnd :

Step 5: We translate the program into a URM program I_0, I_1, I_2:
I_0 = ifzero(0, 3)
I_1 = pred(0)
I_2 = ifzero(1, 0)
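As a sanity check, a URM is easy to simulate. The following Python sketch (an illustration, not part of the formal development) interprets programs given as lists of instructions and runs the Step-5 program; a step bound stands in for non-termination:

```python
def run_urm(program, args, max_steps=10_000):
    """Interpret a URM program, given as a list of instructions
    ('succ', n), ('pred', n) or ('ifzero', n, q), on arguments
    a_0, ..., a_{k-1}. Returns R_0 on termination; the step bound is a
    crude stand-in for non-termination (None = "gave up")."""
    regs = {i: a for i, a in enumerate(args)}   # all other registers are 0
    pc = 0
    for _ in range(max_steps):
        if not 0 <= pc < len(program):
            return regs.get(0, 0)               # PC left the program: halt
        ins = program[pc]
        if ins[0] == 'succ':
            regs[ins[1]] = regs.get(ins[1], 0) + 1
            pc += 1
        elif ins[0] == 'pred':
            regs[ins[1]] = max(regs.get(ins[1], 0) - 1, 0)
            pc += 1
        else:                                   # ('ifzero', n, q)
            pc = ins[2] if regs.get(ins[1], 0) == 0 else pc + 1
    return None

# The Step-5 program computing f(x) ~ 0:
P = [('ifzero', 0, 3), ('pred', 0), ('ifzero', 1, 0)]
assert all(run_urm(P, [x]) == 0 for x in range(10))
```

Note how the two halting conventions of the definition appear in the interpreter: the PC leaving {0, …, n} means termination with output R_0, and an empty program halts immediately.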

3.2 Translating Higher-Level Constructs into the Language of the URM
Remark on jump addresses. In the following, we will often insert a URM program P as part of another URM program P′, for instance as follows:
succ(0)
P
pred(0)
By this we mean that all jump addresses in P are adapted accordingly, so that they fit the new numbers of the instructions. In the example above, for instance, one has to add 1 to each jump address, since the new number of each instruction is its old number increased by 1.
Furthermore, when inserting a piece of program, we assume that whenever P terminates, it terminates with the PC equal to the number of the first instruction following it. So in the example above, if the execution of P as part of P′ terminates, then the next instruction is the instruction pred(0). This can be achieved by renaming jump addresses which jump outside of P accordingly.
In order to introduce more complex URM programs, we introduce some constructions for forming URM programs:

Labelled URM programs: We replace numbers as jump addresses by labels. A label is a name we assign to certain program lines. If mylabel refers to I_k, then ifzero(n, mylabel) stands for ifzero(n, k). It is trivial to translate a program with labels into an ordinary URM program. LabelEnd will be a special label, denoting the first instruction following the program. So the above program reads with labels as follows:

LabelBegin : I_0 = ifzero(0, LabelEnd)
             I_1 = pred(0)
             I_2 = ifzero(1, LabelBegin)

We can now omit the notation I_k = and write the program as follows:
LabelBegin : ifzero(0, LabelEnd)
             pred(0)
             ifzero(1, LabelBegin)
Replacement of registers by variable names. Similarly, we write variable names instead of registers. So, if we write x for R_0 and y for R_1, the above program reads as follows:
LabelBegin : ifzero(x, LabelEnd)
             pred(x)
             ifzero(y, LabelBegin)
Writing statements in a more readable way:

• x := x + 1; stands for succ(x).
• x := x ∸ 1; stands for pred(x).
• if x = 0 then goto mylabel; stands for ifzero(x, mylabel).

The above program now reads as follows:
LabelBegin : if x = 0 then goto LabelEnd;
             x := x ∸ 1;
             if y = 0 then goto LabelBegin;
Introduction of more complex statements. We now introduce some more complex statements, which translate back into URM programs, possibly using any of the new statements introduced before. Most new statements will require some extra registers, which are not used elsewhere in the program and are not used for storing the input and the output of the function. This is not a problem, since we have arbitrarily many registers available. The auxiliary registers will always be set to 0 at the end of the statement. Since all registers except the argument registers are initially 0, and since other statements don't change those registers, they will always be zero when such a statement is started. By “let aux be a new variable” we mean that aux is a variable denoting a new register. LabelEnd will in the following denote the next instruction following the instructions forming the complex statement.

(a) The statement
goto mylabel;
executes an unconditional jump to the statement with label mylabel. It stands for the (labelled) URM statement
if aux = 0 then goto mylabel;
where aux is a new variable.

(b) If ⟨Instructions⟩ is a sequence of instructions, the statement
while x ≠ 0 do { ⟨Instructions⟩; }
stands for the following URM program:

LabelLoop : if x = 0 then goto LabelEnd;
            ⟨Instructions⟩
            goto LabelLoop;

(c) The statement
x := 0;
sets the register associated with x to 0. It stands for the following program:
while x ≠ 0 do { x := x ∸ 1; };
(d) The statement
y := x;
sets the register associated with y to the value of x. All other registers (except for auxiliary registers), including x, will remain unchanged. Let aux be a new variable.
In case x and y denote the same variable, it stands for the empty URM program (no instructions). Otherwise the program is executed in two steps: First we decrement x and simultaneously increment aux, until x is 0. At the end x contains 0 and aux contains the original value of x. Then we set y to 0, and then, in a loop, we decrement aux and simultaneously increment x and y, until aux = 0. At the end, x and y contain the previous value of aux, which is the original value of x, and aux = 0. The complete program is as follows:

while x ≠ 0 do {
  x := x ∸ 1;
  aux := aux + 1; };
y := 0;
while aux ≠ 0 do {
  aux := aux ∸ 1;
  x := x + 1;
  y := y + 1; };
(e) Assume x, y, z denote different registers. The statement
x := y + z;
computes in x the sum of y and z. All other registers, including y, z, except for auxiliary ones, remain as they are. It is computed as follows (aux is an additional variable):
x := y;
aux := z;
while aux ≠ 0 do {
  aux := aux ∸ 1;
  x := x + 1; };
(f) Assume x, y, z denote different registers. The statement
x := y ∸ z;

computes in x the difference between y and z. If z is bigger than y, then x becomes 0. All other registers, including y, z, except for auxiliary ones, remain as they are. It is computed as follows (aux is an additional variable):
x := y;
aux := z;
while aux ≠ 0 do {
  aux := aux ∸ 1;
  x := x ∸ 1; };
(g) Assume x, y denote different registers, and let ⟨Statements⟩ be a sequence of statements. The statement
while x ≠ y do { ⟨Statements⟩; }
executes ⟨Statements⟩ as long as x ≠ y. It can be defined as follows (aux, aux0, aux1 are new variables):

aux0 := x ∸ y;
aux1 := y ∸ x;
aux := aux0 + aux1;
while aux ≠ 0 do {
    ⟨Statements⟩
    aux0 := x ∸ y;
    aux1 := y ∸ x;
    aux := aux0 + aux1;
};

We could now continue and introduce more and more complex statements into the language of URMs, and it should be clear by now that in principle one can translate high-level programming languages into URM programs. We will stop here, since we now have enough material to prove the next lemmata.
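As a sanity check on the derived statement from (d), here is a minimal sketch in Python (not part of the original notes; the register dictionary, the name aux, and the helper monus are our own) that executes the copy program y := x using only the operations available so far: increment, decrement by one (monus), and the while x ≠ 0 loop.

```python
def monus(a, b):
    """Truncated subtraction on naturals: a - b, but never below 0."""
    return a - b if a > b else 0

def copy(regs, x, y, aux="aux"):
    """Execute the derived URM statement y := x from (d), using only
    increment, monus-by-one and while-not-zero. aux is assumed fresh."""
    if x == y:
        return regs                      # empty program
    regs = dict(regs)
    regs.setdefault(aux, 0)
    while regs[x] != 0:                  # move the value of x into aux
        regs[x] = monus(regs[x], 1)
        regs[aux] += 1
    regs[y] = 0
    while regs[aux] != 0:                # restore x, build up y
        regs[aux] = monus(regs[aux], 1)
        regs[x] += 1
        regs[y] += 1
    return regs
```

For instance, copy({"x": 5, "y": 7}, "x", "y") yields registers with x = 5, y = 5 and aux = 0, exactly as the correctness argument in (d) predicts.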

3.3 Constructions for Introducing URM-Computable Functions

We introduce notation for some partial functions.

Definition 3.1 (a) Define the zero function zero : N ∼→ N, zero(x) = 0.
(b) Define the successor function succ : N ∼→ N, succ(x) = x + 1.
(c) Define for 0 ≤ i < n the projection function proj^n_i : N^n ∼→ N, proj^n_i(x_0, ..., x_{n-1}) = x_i.
(d) Assume g : (B_0 × ··· × B_{k-1}) ∼→ C and h_i : A ∼→ B_i (i = 0, ..., k − 1). Then define the composition of g with h_0, ..., h_{k-1} as g ∘ (h_0, ..., h_{k-1}) : A ∼→ C,

(g ∘ (h_0, ..., h_{k-1}))(a) :≃ g(h_0(a), ..., h_{k-1}(a)).

(e) Assume g : N^k ∼→ N, h : N^{k+2} ∼→ N. Then we define the function defined by primitive recursion from g and h, namely primrec(g, h) : N^{k+1} ∼→ N, as follows: Let f := primrec(g, h).

f(n_0, ..., n_{k-1}, 0) :≃ g(n_0, ..., n_{k-1})
f(n_0, ..., n_{k-1}, m + 1) :≃ h(n_0, ..., n_{k-1}, m, f(n_0, ..., n_{k-1}, m))

In the special case k = 0 it doesn't make sense to use g() – in this case let g be a natural number instead. So the case k = 0 reads as follows:

We define for n ∈ N, h : N^2 ∼→ N, f := primrec(n, h) : N ∼→ N:

f(0) :≃ n
f(m + 1) :≃ h(m, f(m))

(f) Let g : N^{n+1} ∼→ N. We define µ(g) : N^n ∼→ N,

µ(g)(x_0, ..., x_{n-1}) ≃ (µy.g(x_0, ..., x_{n-1}, y) ≃ 0)

where we define the partial expression (µy.g(x_0, ..., x_{n-1}, y) ≃ 0) as follows:

(µy.g(x_0, ..., x_{n-1}, y) ≃ 0) :≃
    the least y ∈ N s.t. g(x_0, ..., x_{n-1}, y) ≃ 0 and g(x_0, ..., x_{n-1}, y')↓ for 0 ≤ y' < y, if such y exists,
    undefined otherwise.

Examples for definition by primitive recursion.

• Addition can be defined using primitive recursion: Let f(x, y) := x + y. We have

f(x, 0) = x + 0 = x ,
f(x, y + 1) = x + (y + 1) = (x + y) + 1 = f(x, y) + 1 .

Therefore f(x, 0) = g(x) , f(x, y + 1) = h(x, y, f(x, y)) ,

where g : N → N, g(x) := x, and h : N^3 → N, h(x, y, z) := z + 1. So f = primrec(g, h).

• Multiplication can be defined using primitive recursion: Let f(x, y) := x · y. We have

f(x, 0) = x · 0 = 0 ,
f(x, y + 1) = x · (y + 1) = x · y + x = f(x, y) + x .

Therefore, we have

f(x, 0) = g(x) ,
f(x, y + 1) = h(x, y, f(x, y)) ,

where g : N → N, g(x) := 0, and h : N^3 → N, h(x, y, z) := z + x. So f = primrec(g, h).

• Define pred : N → N, pred(n) := n ∸ 1, i.e. pred(n) = n − 1 if n > 0, and pred(n) = 0 otherwise. pred can be defined using primitive recursion:

pred(0) = 0 ,
pred(x + 1) = x .

Therefore, we have

pred(0) = 0 ,
pred(x + 1) = h(x, pred(x)) ,

where h : N^2 → N, h(x, y) := x. Therefore, pred = primrec(0, h).

• x ∸ y can be defined using primitive recursion: Let f(x, y) := x ∸ y. We have

f(x, 0) = x ∸ 0 = x ,
f(x, y + 1) = x ∸ (y + 1) = (x ∸ y) ∸ 1 = pred(f(x, y)) .

Therefore,

f(x, 0) = g(x) , f(x, y + 1) = h(x, y, f(x, y)) ,

where g : N → N, g(x) := x, and h : N^3 → N, h(x, y, z) := pred(z). Therefore, f = primrec(g, h).

Examples for definition by µ.

• Let f : N^2 → N, f(x, y) := x ∸ y. Then µ(f)(x) ≃ (µy.f(x, y) ≃ 0) ≃ x.

• Let f : N ∼→ N, f(0)↑, f(n) = 0 for n > 0. Then (µy.f(y) ≃ 0)↑.

• Let f : N → N, f(n) := 1 if there exist primes p, q < 2n + 4 s.t. 2n + 4 = p + q, and f(n) := 0 otherwise. Then µy.f(y) ≃ 0 is the first n s.t. there don't exist primes p, q with 2n + 4 = p + q. Goldbach's conjecture says that every even number ≥ 4 is the sum of two primes. So Goldbach's conjecture is equivalent to (µy.f(y) ≃ 0)↑. It is one of the most important open problems in mathematics to show (or refute) Goldbach's conjecture. If we could decide whether a partial computable function is defined at a given argument (which we can't), we could decide Goldbach's conjecture.
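The schemes primrec and µ can be rendered directly as higher-order functions. The sketch below (Python; the names are our own, and undefinedness is modelled by non-termination, so mu really does loop for ever exactly when the µ-expression is undefined) reproduces the examples above:

```python
def primrec(g, h):
    """primrec(g, h): f(ns, 0) = g(ns), f(ns, m+1) = h(ns, m, f(ns, m)).
    For k = 0, g may be a plain natural number, as in the notes."""
    def f(*args):
        *ns, m = args
        acc = g if isinstance(g, int) else g(*ns)
        for i in range(m):                  # unfold the recursion bottom-up
            acc = h(*ns, i, acc)
        return acc
    return f

def mu(g):
    """mu(g)(xs) = least y with g(xs, y) = 0; the loop runs for ever
    (i.e. the value is undefined) if g is undefined at an earlier y
    or no such y exists."""
    def f(*xs):
        y = 0
        while g(*xs, y) != 0:
            y += 1
        return y
    return f

pred  = primrec(0, lambda x, y: x)                    # pred(x+1) = x
add   = primrec(lambda x: x, lambda x, y, z: z + 1)   # f(x,y+1) = f(x,y)+1
monus = primrec(lambda x: x, lambda x, y, z: pred(z)) # f(x,y+1) = pred(f(x,y))
```

With these definitions, mu(monus)(x) searches for the least y with x ∸ y = 0 and so returns x, matching the first µ-example.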

We want to show that the set of URM-computable functions is closed under the operations just introduced. The notion of URM-computable function is not directly suitable for this, since it refers to specific registers, and since the values of the arguments are destroyed by running the program.

Remark on µ. We need in the definition of µ the condition "g(x_0, ..., x_{n-1}, y')↓ for 0 ≤ y' < y". If we defined instead

(µ'y.g(x_0, ..., x_{n-1}, y) ≃ 0) :≃
    the least y ∈ N s.t. g(x_0, ..., x_{n-1}, y) ≃ 0, if such y exists,
    undefined otherwise,

then in general t := (µ'y.g(x_0, ..., x_{n-1}, y) ≃ 0) would be non-computable: Assume for instance g(x_0, ..., x_{n-1}, 1) ≃ 0. Before we have finished computing g(x_0, ..., x_{n-1}, 0), we don't know yet whether its value is 0, non-zero or undefined. But we have t ≃ 0 ⇔ g(x_0, ..., x_{n-1}, 0) ≃ 0, and t ≃ 1 ⇔ g(x_0, ..., x_{n-1}, 0) ≄ 0, where the latter includes the case g(x_0, ..., x_{n-1}, 0)↑. The above is only a heuristic, and does not yet show that we cannot compute t by some trick. However, one can reduce the decidability of the Turing halting problem to the problem of determining the value of t, and thereby show that t is non-computable for some functions g.

Lemma and Definition 3.2 Assume f : N^k ∼→ N is URM-computable. Assume x_0, ..., x_{k-1}, y, z_0, ..., z_l are variable names for different registers. Then one can define a URM program P which computes f(x_0, ..., x_{k-1}) and stores the result in y, in the following sense: If f(x_0, ..., x_{k-1})↓, then the program ends at the first instruction following this program, and stores the result in y. If f(x_0, ..., x_{k-1})↑, the program never terminates. Further, the program can be defined so that it doesn't change the arguments x_0, ..., x_{k-1} and the variables z_0, ..., z_l. We say that P is a URM program which computes y ≃ f(x_0, ..., x_{k-1}) and avoids z_0, ..., z_l.

Proof: Let P be a URM program s.t. P^(k) = f. Let u_0, ..., u_{k-1} be registers different from the above. By renumbering registers and jump addresses, we obtain a program P' which computes the result of f(u_0, ..., u_{k-1}) in u_0 (if it is defined – if not, it does not terminate), leaves the registers mentioned in the lemma unchanged, and which, if it terminates, terminates at the first instruction following P'. The following is a program as intended:

u_0 := x_0;
···
u_{k-1} := x_{k-1};
P'
y := u_0;

Lemma 3.3 (a) zero, succ and proj^n_i are URM-computable.
(b) If f : N^n ∼→ N and g_i : N^k ∼→ N are URM-computable, so is f ∘ (g_0, ..., g_{n-1}).
(c) If g : N^n ∼→ N and h : N^{n+2} ∼→ N are URM-computable, so is primrec(g, h).
(d) If g : N^{n+1} ∼→ N is URM-computable, so is µ(g).

Proof: Let x_i denote register R_i.

Proof of (a):

• zero is computed by the following program: x_0 := 0.
• succ is computed by the following program: x_0 := x_0 + 1.
• proj^n_k is computed by the following program: x_0 := x_k.

Proof of (b): Assume f : N^n ∼→ N and g_i : N^k ∼→ N are URM-computable. Show that f ∘ (g_0, ..., g_{n-1}) is computable. A plan for the program is as follows:

• The input is stored in registers x_0, ..., x_{k-1}. Let ~x := x_0, ..., x_{k-1}.
• First we compute g_i(~x) for i = 0, ..., n − 1 and store the result in register y_i.
• Then we compute f(y_0, ..., y_{n-1}) (which is ≃ f(g_0(~x), ..., g_{n-1}(~x))), and store the result in the output register x_0.

A problem could be that, when computing y_i ≃ g_i(~x),

• we might change one of the registers ~x, which are still to be used by the later computations of y_j ≃ g_j(~x) for j > i;
• we might change y_j for j < i, which contains the result of the previous computation of g_j(~x).

We have already seen in Lemma and Definition 3.2 that we can define programs which avoid changing their arguments and certain other registers. Therefore programs P_i as follows do exist:

• Let P_i be a URM program (i = 0, ..., n − 1) which computes y_i ≃ g_i(~x) and avoids y_j for j ≠ i. (Note that by our definition the arguments are not changed either.)

• Let Q be a URM program which computes x_0 ≃ f(y_0, ..., y_{n-1}).

A URM program R for computing f ∘ (g_0, ..., g_{n-1}) is defined as follows:

P_0
···
P_{n-1}
Q

We show R^(k)(~x) ≃ (f ∘ (g_0, ..., g_{n-1}))(~x).

• Case 1: For some i, g_i(~x)↑. The program will loop in P_i for the first such i, therefore R^(k)(~x)↑. Further, (f ∘ (g_0, ..., g_{n-1}))(~x)↑, so the program computes correctly.
• Case 2: g_i(~x)↓ for all i. The program will execute each P_i, computing y_i ≃ g_i(x_0, ..., x_{k-1}), and reach the beginning of program Q.

– Case 2.1: f(g_0(~x), ..., g_{n-1}(~x))↑. Then program Q will loop, and we have R^(k)(~x)↑ and (f ∘ (g_0, ..., g_{n-1}))(~x)↑.
– Case 2.2: Otherwise, the program will reach the end of Q and result in x_0 ≃ f(g_0(~x), ..., g_{n-1}(~x)), so R^(k)(~x) ≃ (f ∘ (g_0, ..., g_{n-1}))(~x).

In all cases, we have R^(k)(~x) ≃ (f ∘ (g_0, ..., g_{n-1}))(~x).

Proof of (c): Assume g : N^n ∼→ N and h : N^{n+2} ∼→ N are URM-computable. Let f := primrec(g, h). Show that f is URM-computable. Note that the defining equations for f are as follows (let ~n := n_0, ..., n_{n-1}):

• f(~n, 0) ≃ g(~n),
• f(~n, k + 1) ≃ h(~n, k, f(~n, k)).

So in order to compute f(~n, l) for l > 0 we have to make the following computations:

• Compute f(~n, 0) as g(~n).
• Compute f(~n, 1) as h(~n, 0, f(~n, 0)), using the previous result.
• Compute f(~n, 2) as h(~n, 1, f(~n, 1)), using the previous result.
• ···
• Compute f(~n, l) as h(~n, l − 1, f(~n, l − 1)), using the previous result.

A plan for the program is as follows:

• Let ~x := x_0, ..., x_{n-1}. Let y, z, u be new registers.
• We will compute f(~x, y) for y = 0, 1, 2, ..., x_n, and store the result in z.
  – Initially we start with y = 0 (which is the case, since all registers except the x_i for i = 0, ..., n initially contain 0), and need to compute z ≃ f(~x, 0). This is achieved by computing z ≃ g(~x). Then y = 0 and z ≃ f(~x, y).
  – In the step from y to y + 1 we assume that when entering one iteration of the loop we have z ≃ f(~x, y). We want to achieve that after increasing y by 1 we still have z ≃ f(~x, y). Then the loop invariant z ≃ f(~x, y) is preserved. This is obtained as follows:
    ∗ We first compute u ≃ h(~x, y, z) (≃ h(~x, y, f(~x, y)) ≃ f(~x, y + 1)).

    ∗ Then execute z := u (≃ f(~x, y + 1)).
    ∗ Finally execute y := y + 1.
    ∗ Then we have z ≃ f(~x, y) for the new value of y.
  – This loop is iterated as long as y hasn't reached x_n.

• Once y has reached x_n, z contains f(~x, y) ≃ f(~x, x_n).
• Execute x_0 := z.

Let therefore

• P be a URM program which computes z ≃ g(~x) and avoids y,
• Q be a URM program which computes u ≃ h(~x, y, z).

The following program R computes f (everything following % is a comment):

P                      % compute z ≃ g(~x)
while x_n ≠ y do {
    Q                  % compute u ≃ h(~x, y, z),
                       % which is ≃ h(~x, y, f(~x, y)) ≃ f(~x, y + 1)
    z := u;
    y := y + 1;
};
x_0 := z;

Correctness of this program: When P has terminated, we have y = 0 and z ≃ g(~x) ≃ f(~x, y). After each iteration of the while loop, we have y = y' + 1 and z ≃ h(~x, y', z'), where y', z' are the previous values of y, z, respectively. This amounts to having z ≃ f(~x, y). The loop terminates when y has reached x_n, and then z contains f(~x, y), which is then stored in x_0.

If P loops for ever, or in one of the iterations Q loops for ever, then R loops as well, R^(n+1)(~x, x_n)↑. But then we have as well f(~x, k)↑ for some k ≤ x_n, and therefore f(~x, l) is undefined for all l > k (f(~x, k + 1) ≃ h(~x, k, f(~x, k))↑, f(~x, k + 2) ≃ h(~x, k + 1, f(~x, k + 1))↑, etc.). Especially, f(~x, x_n)↑. Therefore f(~x, x_n) ≃ R^(n+1)(~x, x_n) (since both sides are undefined).

Proof of (d): Assume g : N^{n+1} ∼→ N is URM-computable. Show that µ(g) is URM-computable as well. Note that µ(g)(x_0, ..., x_{n-1}) is the minimal y s.t. g(x_0, ..., x_{n-1}, y) ≃ 0 (with all earlier values defined). Let ~x := x_0, ..., x_{n-1}, and let y, z be registers different from ~x.

Plan for the program:

• Compute g(~x, 0), g(~x, 1), ... until we find a k s.t. g(~x, k) ≃ 0. Then return k.
• This is carried out by executing z ≃ g(~x, y) and successively increasing y by 1 until we have z ≃ 0.
• Since we haven't introduced a repeat-until loop, we replace this by a while loop as follows:
  – Initially we set z to something ≠ 0.
  – As long as z ≠ 0, compute z ≃ g(~x, y) and then increase y by 1.
  – If the while loop terminates, we have y = n + 1 for the minimal n s.t. g(~x, n) ≃ 0.
  – Decrease y once, and store the result in x_0.

Let P be a program which computes z ≃ g(x_0, ..., x_{n-1}, y). Then the following program R computes µ(g) (note that initially y = 0, z = 0):

z := z + 1;
while z ≠ 0 do {
    P
    y := y + 1;
};
y := y ∸ 1;
x_0 := y;

By setting z initially to 1, we guarantee that the while loop is executed at least once. Further, initially y = 0. After each iteration of the while loop, we have y = y' + 1 and z ≃ g(x_0, ..., x_{n-1}, y'), where y' is the value of y before starting this iteration. If the loop terminates, then we therefore have z ≃ 0 and y = y' + 1 for the first value y' such that g(x_0, ..., x_{n-1}, y') ≃ 0. At the end, x_0 is set to that value. If P ever runs into a loop, then for some k we have g(~x, k)↑ and for all l < k, g(~x, l) ≄ 0; therefore µ(g)(~x)↑ and R^(n)(~x)↑. If P always terminates, but the while loop doesn't terminate, then there is no k s.t. g(~x, k) ≃ 0, and again µ(g)(~x)↑ and R^(n)(~x)↑.

3.4 The Undecidability of the Halting Problem

The undecidability of the Halting Problem was first proved in 1936 by Alan Turing in his paper "On computable numbers, with an application to the Entscheidungsproblem". We recast his proof using URMs instead of Turing machines. In the following, "computable" means "URM-computable". This will be justified later, when we argue that URM-computability coincides with the intuitive notion of computability (the Church-Turing thesis).

Definition 3.4 (a) A problem is an n-ary predicate M(x0, . . . , xn−1) of natural numbers, i.e. a property of n-tuples of natural numbers. (b) A problem M is decidable, if the characteristic function of M defined by

χ_M(x_0, ..., x_{n-1}) := 1 if M(x_0, ..., x_{n-1}) holds, and 0 otherwise,

is computable.

Example: The binary predicate

Multiple(x, y) :⇔ x is a multiple of y

is a problem. χ_M(x_0, ..., x_{n-1}) decides whether M(x_0, ..., x_{n-1}) holds (then it returns 1 for yes) or not (then it returns 0 for no). For instance,

χ_Multiple(x, y) = 1 if x is a multiple of y, and 0 if x is not a multiple of y.

This function is intuitively computable (and one can show in fact that it is URM-computable), therefore Multiple is decidable.

URM programs can be written as strings of ASCII symbols, i.e. as elements of A*, where A is the set of ASCII symbols, and such strings can be encoded as natural numbers. For a URM program P, let code(P) be the number encoding P. It is intuitively decidable whether a string of ASCII symbols is a URM program, and therefore it is intuitively decidable as well whether n = code(P) for a URM program P. With some effort one can show that this property can be decided by a URM program. However, the following problem is undecidable:

Definition 3.5 The Halting Problem is the following binary predicate:

Halt(x, y) :⇔ x = code(P) for a URM program P and P^(1)(y)↓.

Example: Let P be the URM program

ifzero(0, 0)

If the input is > 0, the program terminates immediately, and R_0 remains unchanged, so P^(1)(k) ≃ k for k > 0. If the input is = 0, the program loops for ever. Therefore, P^(1)(0)↑. Let e = code(P). Then Halt(e, n) holds for n > 0 and does not hold for n = 0.

Remark: We will see below that Halt is undecidable. However, the following partial function is computable:

WeakHalt(x, y) :≃ 1 if x = code(P) for a URM program P and P^(1)(y)↓, and undefined otherwise.

A program for computing WeakHalt(x, y) can be defined as follows: One first checks whether x encodes a valid URM program. If this is not the case, the program enters an infinite loop. Otherwise, one simulates the corresponding URM program P with input y. This can be done – it is a programming exercise to write a program which simulates a URM, and therefore this simulation is intuitively computable. It is not too complicated to show that there exists in fact a URM program carrying out the simulation. If the simulation stops, we output 1; otherwise the program loops for ever.

Theorem 3.6 The Halting Problem is undecidable.

Proof: Assume there exists a URM P s.t. P^(2) decides the Halting Problem. Therefore we have

P^(2)(x, y) ≃ 1 if x codes a URM program Q s.t. Q^(1)(y)↓, and ≃ 0 otherwise.

We argue now similarly to the proof that there is no surjection of N onto P(N): We define a computable function f : N ∼→ N such that, for every e, f differs on input e from the function computed by the URM with code e:

• If P^(2)(e, e) ≃ 1, i.e. e encodes a URM Q and Q^(1)(e)↓, then we let f(e)↑. Therefore Q^(1) ≠ f.
• If P^(2)(e, e) ≃ 0, i.e. e doesn't encode a URM, or it encodes a URM Q with Q^(1)(e)↑, then we let f(e)↓ and define f(e) :≃ 0. (We could have defined f(e) ≃ n for any other natural number n; it only matters that f(e) is defined.) Therefore, if e encodes a URM Q, we have Q^(1) ≠ f.

The complete definition is therefore as follows:

f(e) ≃ 0 if P^(2)(e, e) ≃ 0, and undefined otherwise.

f is not URM-computable, for f = R^(1) is violated at f(code(R)) vs. R^(1)(code(R)). Assume f were computed by R, i.e. f = R^(1). Then:

R^(1)(code(R))↓ ⇔ P^(2)(code(R), code(R)) ≃ 1    (property of P)
               ⇔ f(code(R))↑                     (def. of f)
               ⇔ R^(1)(code(R))↑ ,               (f = R^(1))

a contradiction. However, f can be intuitively computed (the program makes use of program P), and it can easily be shown that it is URM-computable. So we get a contradiction, and the assumption that P^(2) decides the Halting Problem is wrong. Hence, the Halting Problem is undecidable.
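The diagonal construction in the proof can be mirrored in ordinary code. The following Python sketch is hypothetical by nature: halts stands for the assumed decider P^(2), which by the theorem cannot actually exist; the sketch only shows how f is built from it.

```python
def diagonal(halts):
    """Given a claimed total decider halts(x, y) -> bool for 'the program
    coded by x halts on input y', build the function f from the proof:
    f(e) is undefined (loops) exactly when program e halts on input e."""
    def f(e):
        if halts(e, e):
            while True:      # f(e) undefined when program e halts on e
                pass
        return 0             # f(e) = 0 when program e does not halt on e
    return f

# If halts were correct and f had code r, then f(r) would halt
# iff halts(r, r) is False iff f(r) does not halt: a contradiction.
```

Running diagonal with any stub decider (e.g. one that always answers False) shows the defined branch; the contradiction arises only for a decider that is actually correct.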

Remark: The above proof can easily be adapted to any reasonable programming language in which one can define all computable functions. Such programming languages are called Turing-complete. For instance, Babbage's machine was, if one removes the restriction to finite memory, Turing-complete, since it had a conditional jump. Applied to standard programming languages, which are Turing-complete, the unsolvability of the halting problem means: it is not possible to write a program which checks whether another program terminates on a given input.

4 Turing Machines

4.1 Motivation

The URM is a model of computation which is very easy to understand. It has, however, two drawbacks:

(1) The execution of a single URM instruction, e.g. succ(n), might take arbitrarily long:

For instance, if R_n contains the binary number 11···1 (k ones), we have to replace it by the binary number 100···0 (a one followed by k zeros), i.e. we have to replace k ones by zeros. Since k is arbitrary, this single step might take an arbitrarily long time. Therefore URMs are unsuitable as a basis for defining the complexity of algorithms. Of course, there exist meta-theorems which relate complexity defined in terms of URM programs to complexity defined in terms of other models of computation. But for defining complexity a different notion is needed, and the Turing machine is currently the most widely accepted model of computation used for this purpose.

(2) We are aiming at a notion of computability which covers all possible ways of computing something, independently of any concrete machine. URMs are a model of computation which covers computers as they are currently used. However, there might be completely different notions of computation, based on symbolic manipulation of sequences of characters, where it might be more complicated to see directly that all such computations can be simulated by a URM. It is easier to see that such notions are covered by the Turing machine model of computation.

We will therefore introduce a second model of computation, the Turing machine (TM). We will later show that the sets of Turing-computable and of URM-computable functions coincide. The definition of computability proposed by Turing in 1936 is based on an analysis of how a human being (called an agent) carries out a computation on a piece of paper.

15 · 16 =
   15
    90
   240

In order to formulate this, we will make the following steps:

An algorithm should be deterministic, and therefore we assume that the agent uses only • finitely many symbols, which he puts at discrete positions on the paper.

1 5 · 1 6 =
    1 5 −
    − 9 0
    2 4 0

Side remark: If one has doubts whether this is always possible (we might squeeze a small symbol between two existing ones, which might violate having a grid of symbols), one can

go down to the level of pixels: We assume that the agent's eye can only distinguish pixels of a certain size. If we represent what the agent has written on a piece of paper as pixels, we obtain a grid. Each cell in this grid is a pixel, which contains one colour. Since the agent can only distinguish finitely many colours, each of which can be represented by one symbol, we obtain that the original piece of paper can be represented as a grid, each cell of which contains one of finitely many symbols. For instance, if one writes A1 on the paper, where A is black (written as B) and 1 is red (written as R), we obtain a representation such as:

B B R R B B R
R B B R B B R
B B B B R B B
R B B R B B R

• We don't have to work with a two-dimensional piece of paper; one can always assume that the whole paper is written on one long tape (a two-dimensional paper can be simulated by having a symbol which separates the lines of the paper). Each entry on this tape is called a cell.

··· 1 5 · 1 6 = CR 1 5 CR ···

• In the real situation, an agent can look at several cells at the same time. But the number of cells he can look at simultaneously is physically bounded. Looking at finitely many cells simultaneously can be simulated by looking at one symbol at a time and moving around, in order to observe the cells in the neighbourhood. Therefore, we assume that the agent looks only at one symbol at a time. This is modelled by having a head of the Turing machine, which indicates the position on the tape we are currently looking at.

··· 1 5 · 1 6 = CR 1 5 CR ···
          ↑
         Head

• In reality the agent can make larger jumps between positions on the tape, but the distance can only be finite, bounded by a fixed length corresponding to the physical ability of the agent to make one step. Such steps can be simulated by finitely many one-step movements. Therefore, we assume that the agent can move in one step only one cell to the left or to the right.
• The agent works purely mechanistically: he reads the current symbol and, depending on it, makes a movement or changes the symbol on the tape. The agent himself has only a finite amount of memory – data exceeding this has to be stored on the tape. Therefore there are only finitely many states of the agent, and depending on this state and the symbol at the head position, the agent will move to a next state, change the symbol, and make a movement of the head.

A final diagram of a Turing machine looks as follows:

··· 1 5 · 1 6 = CR 1 5 CR ···
          ↑
          s0

4.2 Definition of a Turing Machine

Using these considerations we arrive at the following definition of a Turing machine: A Turing machine is a quintuple (Σ, S, I, xy, s0), where

• Σ is a finite set of symbols, called the alphabet of the Turing machine. The symbols written on the tape are the symbols in Σ.
• S is a finite set of states.
• I is a finite set of quintuples (s, a, s', a', D), where s, s' ∈ S, a, a' ∈ Σ, D ∈ {L, R}, s.t. for every s ∈ S, a ∈ Σ there is at most one (s', a', D) s.t. (s, a, s', a', D) ∈ I. The elements of I are called instructions.
• xy ∈ Σ (a symbol for blank).
• s0 ∈ S (the initial state).

An instruction (s, a, s', a', D) ∈ I means the following: If the Turing machine is in state s, and the symbol at the position of the head is a, then

• the state is changed to s',
• the symbol at this position is changed to a',
• if D = L the head moves left,
• if D = R the head moves right.

For instance, assume we have the following instructions:

(s0, 1, s1, 0, R)
(s1, 6, s2, 7, L)

Then we have the following sequence of configurations:

• Initially:

··· 1 5 · 1 6 = CR 1 5 CR ···
          ↑
          s0

• The symbol is 1 and the state is s0, so the instruction (s0, 1, s1, 0, R) applies: replace this symbol by 0, change to state s1, move once to the right:

··· 1 5 · 0 6 = CR 1 5 CR ···
            ↑
            s1

• The symbol is 6 and the state is s1, so the instruction (s1, 6, s2, 7, L) applies: replace this symbol by 7, change to state s2, move once to the left:

··· 1 5 · 0 7 = CR 1 5 CR ···
          ↑
          s2
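This definition is directly executable. Below is a sketch of an interpreter (Python, our own; the tape is a dictionary from cell positions to symbols, with absent cells reading as blank, and instructions are keyed by (state, symbol), which reflects the determinism condition that each pair has at most one instruction). A step limit stands in for genuine non-termination. It is run on the two example instructions above.

```python
BLANK = "xy"   # the notes' rendering of the blank symbol

def run_tm(instructions, tape, head, state, max_steps=10_000):
    """instructions: dict mapping (state, symbol) -> (new_state, new_symbol, 'L' or 'R').
    Returns (tape, head, state) once no instruction applies (the machine stops)."""
    tape = dict(tape)                      # cells not present read as blank
    for _ in range(max_steps):
        symbol = tape.get(head, BLANK)
        if (state, symbol) not in instructions:
            return tape, head, state       # no applicable instruction: stop
        state, new_symbol, direction = instructions[(state, symbol)]
        tape[head] = new_symbol
        head += -1 if direction == "L" else 1
    raise RuntimeError("step limit reached; the machine may not terminate")

# The two example instructions (s0, 1, s1, 0, R) and (s1, 6, s2, 7, L):
instr = {("s0", "1"): ("s1", "0", "R"), ("s1", "6"): ("s2", "7", "L")}
tape0 = {0: "1", 1: "5", 2: ".", 3: "1", 4: "6", 5: "="}
tape, head, state = run_tm(instr, tape0, 3, "s0")   # head starts on the second 1
```

As in the configuration sequence above, the run rewrites the 1 to 0, the 6 to 7, and stops in state s2 with the head back on the rewritten cell.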

Example We develop a Turing machine over Σ = {0, 1, xy} (where xy stands for a blank entry) which does the following: Assume the tape initially contains a binary number, to the left and right of it there are only blanks, the head points to some digit of this number, and the Turing machine is in state s0, e.g.

··· 1 0 1 0 0 1 0 0 1 1 1 ···
          ↑
          s0

Then the Turing machine will stop with the head at the most significant bit of the result (the original binary number incremented by 1), and all other cells will be blank:

··· 1 0 1 0 0 1 0 1 0 0 0 ···
    ↑
    s3

So the Turing machine is ({0, 1, xy}, S, I, xy, s0), where we develop in the following the set of states S and the set of instructions I:

• First, we move the head to the least significant bit of the binary number. So, as long as the head shows a digit 0 or 1, we move the head right. If we encounter the symbol xy, we move once left, and the Turing machine continues in state s1.

So we have the states s0, s1 and the following instructions:

– (s0, 0, s0, 0, R)

– (s0, 1, s0, 1, R)

– (s0, xy, s1, xy, L).

• In the above example, at the end of this step the configuration of the TM is as follows:

··· 1 0 1 0 0 1 0 0 1 1 1 ···
                        ↑
                        s1

• Increasing a binary number b by 1 is done as follows. We have two cases.
  – If the number consists of ones only, i.e. b = (11···1)₂ with k ones, then b + 1 = (100···0)₂, a one followed by k zeros. b + 1 is obtained by replacing all ones by zeros, and then replacing the first blank symbol to the left by 1.
  – Otherwise the binary representation of the number contains a 0, followed by a finite block of ones reaching to the least significant bit. This includes the case where the least significant bit is 0, so the block of ones has length 0:
    ∗ Example 1: b = (0100010111)₂, one 0 followed by 3 ones.
    ∗ Example 2: b = (0100010010)₂, least significant digit is 0.
    Let b = (b_0 b_1 ··· b_k 0 11···1)₂ with l ones at the end. (In the special case l = 0 we have b = (b_0 b_1 ··· b_k 0)₂.) Then b + 1 is obtained by replacing the final block of ones by zeros and the 0 by 1: b + 1 = (b_0 b_1 ··· b_k 1 00···0)₂ with l zeros at the end. (In the special case l = 0 we have b + 1 = (b_0 b_1 ··· b_k 1)₂.)
  In general we have to replace, as long as we find ones starting from the right, these ones by zeros, moving left, until we encounter a 0 or a xy, which is then replaced by a 1.
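Stripped of the tape mechanics, this case analysis is ordinary binary increment with carry propagation. A sketch in Python (our own; it operates on the bit string directly rather than on a tape):

```python
def increment(bits):
    """Increment a binary number given as a string of '0'/'1' digits,
    mirroring the case analysis: rewrite the trailing block of ones
    to zeros, then turn the first 0 found (or a fresh leading cell,
    standing for the blank) into a 1."""
    bits = list(bits)
    i = len(bits) - 1
    while i >= 0 and bits[i] == "1":   # like state s1: rewrite 1 -> 0, move left
        bits[i] = "0"
        i -= 1
    if i >= 0:
        bits[i] = "1"                  # found a 0: replace it by 1
    else:
        bits.insert(0, "1")            # all ones: write 1 on the blank cell
    return "".join(bits)
```

On the running example, increment("10100100111") gives "10100101000", the tape contents the machine is required to produce.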

So we have the new state s2 and the following instructions:

– (s1, 1, s1, 0, L).

– (s1, 0, s2, 1, L).

– (s1, xy, s2, 1, L).

At the end, the head will be one cell to the left of the 1 just written, and the state will be s2. In the example above, the situation is as follows:

··· 1 0 1 0 0 1 0 1 0 0 0 ···
              ↑
              s2

• Finally, we have to move the head to the most significant bit, which is done as follows:

– (s2, 0, s2, 0, L).

– (s2, 1, s2, 1, L).

– (s2, xy, s3, xy, R).

The program will terminate in state s3; in the example above the final configuration is as follows:

··· 1 0 1 0 0 1 0 1 0 0 0 ···
    ↑
    s3

So the complete Turing machine is as follows:

({0, 1, xy},
 {s0, s1, s2, s3},
 {(s0, 0, s0, 0, R),
  (s0, 1, s0, 1, R),
  (s0, xy, s1, xy, L),
  (s1, 1, s1, 0, L),
  (s1, 0, s2, 1, L),
  (s1, xy, s2, 1, L),
  (s2, 0, s2, 0, L),
  (s2, 1, s2, 1, L),
  (s2, xy, s3, xy, R)},
 xy,
 s0)

4.3 Turing-Computable Functions

Definition 4.1 Let M = (Σ, S, I, xy, s0) be a Turing machine with {0, 1} ⊆ Σ. Then we define for every k ∈ N a partial function M^(k) : N^k ∼→ N, where M^(k)(a_0, ..., a_{k-1}) is computed as follows:

• Initialisation:
  – The head is at an arbitrary position.

  – Starting from this position, we write bin(a_0)xybin(a_1)xy···xybin(a_{k-1}) on the tape, where bin(a) is the binary string corresponding to a.
    ∗ E.g. if k = 3, a_0 = 0, a_1 = 3, a_2 = 2, then we write 0xy11xy10.
  – All other cells contain xy.

  – The state is set to s0.
• Iteration: As long as there is an instruction corresponding to the state of the TM and the symbol at the head, the TM performs the operation according to this instruction.
• Output:

  – Case 1: The TM stops. At any time, only finitely many cells contain a non-blank symbol: initially this is the case, and in finitely many steps only finitely many cell contents are changed. Therefore there cannot be an infinite sequence of symbols in {0, 1} starting from the head position to the right; let the m-th cell to the right of the head be the first one not containing 0 or 1. So the tape contains, starting from the head position, symbols b_0 b_1 ··· b_{m-1} c, where b_i ∈ {0, 1} and c ∉ {0, 1}. m might be 0 (if the symbol at the position of the head itself is not in {0, 1}).
    Let a = (b_0, ..., b_{m-1})₂ (in case m = 0, let a = 0). Then M^(k)(a_0, ..., a_{k-1}) ≃ a.
  – Case 2: Otherwise. Then M^(k)(a_0, ..., a_{k-1})↑.

Example: Let Σ = {0, 1, a, b, xy}.

• If the tape contains at the end, starting with the head position, 01010xy0101xy or 01010axy, the output is (01010)₂ = 10.
• If the tape contains at the end, starting with the head position, abxy, a, or xy, the output is 0.

Definition 4.2 f : N^k ∼→ N is Turing-computable, in short TM-computable, if f = M^(k) for some TM M, the alphabet of which contains {0, 1}.

Example: The TM treated in detail above shows that succ : N ∼→ N is Turing-computable.
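The input and output conventions of Definition 4.1 are easy to make precise in code. A sketch (Python, our own; we write the blank symbol xy literally, so the encoded tape strings look exactly like those in the examples):

```python
BLANK = "xy"   # the notes' rendering of the blank symbol

def encode_input(args):
    """Initial tape contents per Definition 4.1:
    bin(a0) xy bin(a1) xy ... xy bin(a_{k-1})."""
    return BLANK.join(format(a, "b") for a in args)

def decode_output(cells):
    """Output per Definition 4.1: read the symbols 0/1 from the head
    position rightwards, up to the first symbol not in {0, 1};
    an empty digit string counts as 0."""
    digits = []
    for c in cells:
        if c in "01":
            digits.append(c)
        else:
            break
    return int("".join(digits), 2) if digits else 0
```

For instance, encode_input([0, 3, 2]) yields "0xy11xy10" as in the k = 3 example, and decode_output("01010xy0101xy") yields 10 as in the output example.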

Theorem 4.3 If f : N^n ∼→ N is URM-computable, then it is as well Turing-computable, by a TM with alphabet {0, 1, xy}.

Proof: We first introduce the following notation: By saying the tape of a TM contains a_0, ..., a_l we mean that the tape contains, starting with the head position, a_0, ..., a_l, and all other cells of the tape contain xy. Furthermore, in this proof bin(n) will stand for any binary representation of n; for instance, bin(2) could be any of 10, 010, 0010, etc. This is needed since, when performing operations, we don't want to bother with normalising the representation of numbers on the tape, i.e. we don't want to deal with deleting possible leading zeros.

Assume f is URM-computable by a URM P, i.e. f = P^(n). Assume P refers only to registers R_0, ..., R_{l-1} and that the input registers (i.e. R_0, ..., R_{n-1}) of P are among R_0, ..., R_{l-1}, i.e. that l > n. We will define a Turing machine M which simulates P. This will be done as follows:

• If the registers R_0, ..., R_{l-1} of the URM P contain a_0, ..., a_{l-1}, this is modelled in the TM as the tape containing bin(a_0)xy···xybin(a_{l-1}).
• An instruction I_j will be simulated by finitely many states q_{j,0}, ..., q_{j,i} of the TM, with instructions for those states. The instructions and states will be defined in such a way that the following holds:
• Assume for the URM P
  – R_0, ..., R_{l-1} contain a_0, ..., a_{l-1},

– the URM is about to execute Ij .

Assume that after executing Ij the URM P arrives at a state where • January 4, 2004 CS 226 Computability Theory 4-7

– R0, . . . , Rl−1 contain b0, . . . , bl−1, – the PC contains j0. Then, if the configuration of the TM P is s.t. • – the tape contains bin(a )xybin(a )xy xybin(al ), 0 1 · · · −1 – and the state of is qj,0, then the TM will, after executing the corresponding instructions, arrive at a configuration, • in which

– the tape contains bin(b )xybin(b )xy xybin(bl ), 0 1 · · · −1 – the state is qj0 ,0.

For instance, if we simulate the instruction I_4 = pred(2), then we have the following: Assume the URM is about to execute instruction I_4 with register contents

  R_0  R_1  R_2
   2    1    3

So the PC initially contains 4. Then the URM will continue with the PC containing 5 and register contents

  R_0  R_1  R_2
   2    1    2

So, when simulating this in the TM, we want the following:

• If the TM is in state q_{4,0} with the tape containing bin(2)xybin(1)xybin(3),
• it should reach state q_{5,0} with the tape containing bin(2)xybin(1)xybin(2).

Furthermore, we need an initialisation for the TM, consisting of states q_{init,0}, ..., q_{init,j} and corresponding instructions, s.t.

• if the TM initially contains the configuration corresponding to arguments b_0, ..., b_{n-1} of the function to be defined, namely bin(b_0)xybin(b_1)xy···xybin(b_{n-1}),
• it will reach state q_{0,0} with the tape containing bin(b_0)xybin(b_1)xy···xybin(b_{n-1})xy 0xy0xy···xy0 (with l - n occurrences of 0, one for each of the registers R_n, ..., R_{l-1}).

Remark: We assume here that 0 is represented on the tape by the digit 0, enclosed by the blanks separating the binary numbers of the registers. We could as well represent it by the empty string, which means that we have the two enclosing blanks with no symbol in between. Then bin(0)xybin(0)xy···xybin(0)xy (l - n times) would just be xyxy···xy, so the initial configuration of the TM would already represent bin(b_0)xybin(b_1)xy···xybin(b_{n-1})xy bin(0)xybin(0)xy···xybin(0)xy, and no initialisation would be needed.

If we have achieved the above simulation steps, then a computation of P^(n)(a_0, ..., a_{n-1}) is simulated as follows: Assume the run of the URM, starting with R_i containing a_{0,i} = a_i for i = 0, ..., n-1 and a_{0,i} = 0 for i = n, ..., l-1, is as follows (so k_0 = 0):

  Instruction   R_0      R_1      ···  R_{n-1}    ···  R_{l-1}
  I_{k_0}       a_{0,0}  a_{0,1}  ···  a_{0,n-1}  ···  a_{0,l-1}
  I_{k_1}       a_{1,0}  a_{1,1}  ···  a_{1,n-1}  ···  a_{1,l-1}
  I_{k_2}       a_{2,0}  a_{2,1}  ···  a_{2,n-1}  ···  a_{2,l-1}
  ···

Then the corresponding TM will successively reach the following configurations:

  State        Tape contains
  q_{init,0}   bin(a_0)xybin(a_1)xy···xybin(a_{n-1})
  q_{k_0,0}    bin(a_{0,0})xybin(a_{0,1})xy···xybin(a_{0,l-1})
  q_{k_1,0}    bin(a_{1,0})xybin(a_{1,1})xy···xybin(a_{1,l-1})
  q_{k_2,0}    bin(a_{2,0})xybin(a_{2,1})xy···xybin(a_{2,l-1})
  ···

Example: We take the URM program P

  I_0 = ifzero(0, 3)
  I_1 = pred(0)
  I_2 = ifzero(1, 0)

The corresponding unary function P^(1) operates as follows: R_1 is initially zero and never changed, therefore R_1 is always 0 and the jump in I_2 is always executed. A run of the program therefore proceeds as follows: it checks whether R_0 contains 0. If yes, it stops. Otherwise it reduces R_0 by 1 and then jumps back to the beginning. So the program always terminates with R_0 containing 0, and we have P^(1)(n) ≃ 0.

A run of this program corresponding to a computation of P^(1)(2) is as follows:

  Instruction  R_0  R_1
  I_0           2    0
  I_1           2    0
  I_2           1    0
  I_0           1    0
  I_1           1    0
  I_2           0    0
  I_0           0    0
  I_3           0    0   URM stops

We start with instruction I_0, R_0 containing 2 and R_1 containing 0. The jump in I_0 is not executed, so we arrive at I_1 with register values unchanged. Then R_0 is reduced by 1 and we move to I_2. Then we jump back to I_0 with the register values unchanged. Etc.

The corresponding Turing machine M should simulate this program. A run of M^(1)(2) will start in state q_{init,0} with initial configuration bin(2)xy. In the initialisation part the TM expands this to bin(2)xybin(0)xy and arrives at state q_{0,0}. Then each step of the URM is simulated in the TM: when simulating I_0, the TM checks whether the first binary number on the tape is 0 or not; if it is 0, it switches to q_{3,0}, otherwise to q_{1,0}. When simulating I_1, it decreases the first binary number on the tape by one. When simulating I_2, it checks whether the second binary number on the tape is 0 or not; if it is zero, it switches to q_{0,0}, otherwise to q_{3,0}.

Assuming that we have introduced such a TM, we can write the configurations of the URM and of the TM in one table and obtain the following:

  Instruction  R_0  R_1   State of TM   Tape contains
                          q_{init,0}    bin(2)xy
  I_0           2    0    q_{0,0}       bin(2)xybin(0)xy
  I_1           2    0    q_{1,0}       bin(2)xybin(0)xy
  I_2           1    0    q_{2,0}       bin(1)xybin(0)xy
  I_0           1    0    q_{0,0}       bin(1)xybin(0)xy
  I_1           1    0    q_{1,0}       bin(1)xybin(0)xy
  I_2           0    0    q_{2,0}       bin(0)xybin(0)xy
  I_0           0    0    q_{0,0}       bin(0)xybin(0)xy
  I_3           0    0    q_{3,0}       bin(0)xybin(0)xy
  URM stops               TM stops
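The URM side of this table can be reproduced mechanically. The following is an illustrative interpreter sketch (the names `run_urm` and the instruction encoding are our own, not the notes' formal definition), run on the example program P:

```python
# Sketch of a URM interpreter.  Instructions are tuples; the machine halts
# as soon as the program counter leaves the program.

def run_urm(program, inputs, num_regs):
    regs = list(inputs) + [0] * (num_regs - len(inputs))
    pc = 0
    trace = []                       # (pc, register contents) before each step
    while 0 <= pc < len(program):
        trace.append((pc, tuple(regs)))
        op, *args = program[pc]
        if op == 'succ':
            regs[args[0]] += 1; pc += 1
        elif op == 'pred':
            regs[args[0]] = max(0, regs[args[0]] - 1); pc += 1
        elif op == 'ifzero':
            j, k = args
            pc = k if regs[j] == 0 else pc + 1
    return regs[0], trace

P = [('ifzero', 0, 3), ('pred', 0), ('ifzero', 1, 0)]
result, trace = run_urm(P, [2], 2)
print(result)                        # 0, i.e. P(1)(2) = 0
print([pc for pc, _ in trace])       # [0, 1, 2, 0, 1, 2, 0]
```

The printed program counters match the instruction column of the table above (the final I_3 row in the table is the halting step, at which no instruction is executed).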

Once we have introduced such a simulation, we can see that P^(n) = M^(n):

• If P^(n)(a_0, ..., a_{n-1}) ↓, say P^(n)(a_0, ..., a_{n-1}) ≃ j, then P will eventually stop with R_i containing some values b_i, where b_0 = j. The TM M, starting with bin(a_0)xy···xybin(a_{n-1}), will then eventually terminate in a configuration bin(b_0)xy···xybin(b_{l-1}), and therefore M^(n)(a_0, ..., a_{n-1}) ≃ b_0 = j.
• If P^(n)(a_0, ..., a_{n-1}) ↑, the URM P will loop, and the TM M will carry out the same steps as the URM and loop as well. Therefore M^(n)(a_0, ..., a_{n-1}) ↑, and again P^(n)(a_0, ..., a_{n-1}) ≃ M^(n)(a_0, ..., a_{n-1}).

Therefore we have P^(n) = M^(n), and we are done, provided we have defined the simulation. We describe informally how the instructions of the URM are simulated and how the initialisation is obtained.

• Initialisation. We start with the tape containing bin(a_0)xy···xybin(a_{n-1}) and have to extend this to bin(a_0)xy···xybin(a_{n-1})xy 0xy0xy···xy0 (with l - n occurrences of 0). This is achieved by moving the head to the end of the initial tape contents (by moving to the n-th blank to the right), then inserting, starting from the next blank, l - n times 0xy, and then moving back to the beginning (by moving to the l-th blank to the left).

• Simulation of URM instructions.

  – Simulation of instruction I_k = succ(j). We have to increase the (j+1)-st binary number on the tape by 1 (note that register number 0 is the first number on the tape, register number 1 the second, etc.). Initially the configuration is

      bin(c_0) xy bin(c_1) xy ··· xy bin(c_j) xy ··· xy bin(c_{l-1}) xy

    with the head on the first symbol of bin(c_0), in state q_{k,0}.
    ∗ We first move to the (j+1)-st blank to the right. Then we are at the blank following the (j+1)-st binary number.
    ∗ Now we perform the operation for increasing a binary number by 1 as described before. At the end we obtain

        bin(c_0) xy bin(c_1) xy ··· xy bin(c_j + 1) xy ··· xy bin(c_{l-1}) xy

      However, it might be that we needed to write a 1 over the separating blank (if the carry ran past the leading digit), in which case we have

        bin(c_0) xy ··· bin(c_{j-1}) bin(c_j + 1) xy ··· xy bin(c_{l-1}) xy

      with no blank left between bin(c_{j-1}) and bin(c_j + 1).
    ∗ If we are in the latter case, we have to shift all symbols to the left of the new leading 1 once to the left, in order to restore a separating xy between the j-th and the (j+1)-st entry. This can be achieved easily (shifting until one has reached the leftmost blank), and we obtain

        bin(c_0) xy bin(c_1) xy ··· bin(c_{j-1}) xy bin(c_j + 1) xy ··· xy bin(c_{l-1}) xy

    ∗ Otherwise, we move the head to the left until we have reached the (j+1)-st blank to the left, and then move it once to the right. We are then back at the beginning of the tape contents, and switch to state q_{k+1,0}.

  – Simulation of instruction I_k = pred(j). We have to decrease the (j+1)-st binary number on the tape by 1 (or leave it as it is, if it is zero). So, starting from

      bin(c_0) xy bin(c_1) xy ··· xy bin(c_j) xy ··· xy bin(c_{l-1}) xy

    we want to reach the configuration

      bin(c_0) xy bin(c_1) xy ··· xy bin(c_j ∸ 1) xy ··· xy bin(c_{l-1}) xy

    This is done as follows:
    ∗ We move, as before, to the end of the (j+1)-st number.
    ∗ Then we check whether the number consists only of zeros or not.
      · If it consists only of zeros, i.e. it represents 0, then pred(j) doesn't change anything.
      · Otherwise, the number is of the form b_0 ··· b_{k'} 1 0···0 with, say, l' trailing zeros. We have (b_0 ··· b_{k'} 1 0···0)_2 - 1 = (b_0 ··· b_{k'} 0 1···1)_2 with l' trailing ones. So we have to replace the binary string by b_0 ··· b_{k'} 0 1···1. This can be done similarly as for the successor operation.
    ∗ Finally we move back to the beginning.

  – Simulation of instruction I_k = ifzero(j, k'). The URM instruction does the following: if R_j contains zero, then the next instruction is I_{k'} (or the program stops, if there is no such instruction); otherwise the next instruction is I_{k+1} (or the program stops, if there is no such instruction). This is simulated as follows on the TM: it moves to the (j+1)-st binary number on the tape and checks whether it consists only of zeros. If yes, we switch to state q_{k',0}, otherwise we switch to state q_{k+1,0}.
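The tape-level effect of these two operations can be sketched in ordinary Python (this is an illustration of what the TM's state transitions achieve, not the transitions themselves): the registers are held as possibly unnormalised binary strings, as in the proof.

```python
# Sketch of the effect of simulating succ(j) and pred(j) on the tape:
# regs[j] is the (j+1)-st binary number, as a string of '0'/'1'.

def tape_succ(regs, j):
    """Increase regs[j] by 1, propagating the carry; a carry past the
    leading digit lengthens the string (the 'shift' case in the proof)."""
    bits = list(regs[j])
    i = len(bits) - 1
    while i >= 0 and bits[i] == '1':
        bits[i] = '0'            # 1-carries become 0
        i -= 1
    if i >= 0:
        bits[i] = '1'
    else:
        bits.insert(0, '1')      # carry ran past the leading digit
    regs[j] = ''.join(bits)

def tape_pred(regs, j):
    """Decrease regs[j] by 1: b...b 1 0...0 becomes b...b 0 1...1;
    a string consisting only of zeros (representing 0) stays unchanged."""
    bits = list(regs[j])
    i = len(bits) - 1
    while i >= 0 and bits[i] == '0':
        bits[i] = '1'            # trailing zeros become ones
        i -= 1
    if i >= 0:
        bits[i] = '0'            # the last 1 becomes 0
        regs[j] = ''.join(bits)
    else:
        regs[j] = '0' * len(regs[j])   # it was 0: restore and do nothing

regs = ['10', '011', '00']
tape_succ(regs, 0); print(regs[0])   # '11'
tape_succ(regs, 0); print(regs[0])   # '100' (string got longer)
tape_pred(regs, 1); print(regs[1])   # '010' (3 - 1 = 2, leading zero kept)
tape_pred(regs, 2); print(regs[2])   # '00'  (0 stays 0)
```

Note that, exactly as in the proof, no normalisation of leading zeros is performed.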

This completes the simulation of the URM P.

Remark: We will later show the other direction, namely that every TM-computable function is URM-computable. Therefore both models of computation define the same functions. This will be done by showing first that every TM-computable function is partial recursive (the partial recursive functions being a third model of computation), and then that every partial recursive function is URM-computable. One could as well show directly that every TM-computable function is URM-computable, but the route taken in this lecture is probably easier.

Extension to arbitrary alphabets. Let A be a finite alphabet s.t. xy ∉ A, and let B := A*. To a Turing machine T = (Σ, S, I, xy, s_0) with A ⊆ Σ corresponds a partial function T^(A,n) : B^n ⇀ B, where T^(A,n)(a_0, ..., a_{n-1}) is computed as follows:

• Initially write a_0 xy ··· xy a_{n-1} on the tape, with blanks everywhere else. Start in state s_0 on the leftmost position of a_0.
• Iterate the TM as before.
• In case of termination, the output of the function is c_0 ··· c_{l-1} if the tape contains, starting with the head position, c_0 ··· c_{l-1} d with c_i ∈ A and d ∉ A.
• Otherwise, the function value is undefined.

This notion is, modulo an encoding of A* into N, equivalent to the notion of Turing-computability on N. However, when considering complexity bounds it might be more appropriate, since the encoding and decoding of natural numbers might exceed the intended complexity bounds.

Remark on Turing-computable predicates: A predicate A is Turing-decidable iff χ_A is Turing-computable. However, instead of simulating χ_A, which amounts to finally writing the output of χ_A (a binary number 0 or 1) on the tape, it is more convenient to take a TM with two additional special states s_true and s_false, corresponding to truth and falsity of the predicate. Then we can say that a predicate is Turing-decidable if, when we initially write the inputs on the tape as before and start executing the TM, it always terminates in s_true or s_false, and it terminates in s_true iff the predicate holds for the inputs, and in s_false otherwise. This notion, which is equivalent to the first one, is usually taken as the basis for complexity considerations.

5 Algebraic View of Computability

The previous models of computation (URMs and TMs) were based on programming languages, which allow one to introduce all computable functions. In this section we discuss a model of computation in which the set of computable functions is generated from basic functions by certain operations. Thus we describe the set of computable functions as the least algebra of functions containing the basic functions and closed under these operations. This algebraic approach was proposed by Gödel and Kleene in 1936, and it is an elegant and mathematically rigorous definition of the set of computable functions. In order to show that a function is computable, it is usually easiest to show that it is computable in this model.

We will first introduce the primitive-recursive functions. This is a set of total functions which includes all functions that can realistically be computed, and many more, but it does not cover all computable functions. Then we will introduce the partial recursive functions, which form a complete model of computation. Finally we will show that the sets of URM-computable functions, of TM-computable functions and of partial recursive functions coincide.

5.1 The Primitive-Recursive Functions

Definition 5.1 We define inductively the set of primitive-recursive functions f, together with their arity, i.e. together with the k s.t. f : N^k → N.

We write "f : N^k → N is primitive-recursive" for "f is primitive-recursive with arity k", and N → N for N^1 → N.

• The following basic functions are primitive-recursive:
  – zero : N → N,
  – succ : N → N,
  – proj^k_i : N^k → N (0 ≤ i < k).
  (Remember that these functions have the defining equations
  – zero(n) = 0,
  – succ(n) = n + 1,
  – proj^k_i(a_0, ..., a_{k-1}) = a_i.)

• If g : N^k → N is primitive-recursive, and for i = 0, ..., k-1 the functions h_i : N^n → N are primitive-recursive, then g ∘ (h_0, ..., h_{k-1}) : N^n → N is primitive-recursive as well.
  (Remember that f := g ∘ (h_0, ..., h_{k-1}) has the defining equation
  – f(~x) = g(h_0(~x), ..., h_{k-1}(~x)).)
  Some notation: in the special case k = 1 we write g ∘ h instead of g ∘ (h), and we write g_0 ∘ g_1 ∘ g_2 ∘ ··· ∘ g_n for g_0 ∘ (g_1 ∘ (g_2 ∘ ··· ∘ g_n)).

• If g : N^n → N and h : N^{n+2} → N are primitive-recursive, then primrec(g, h) : N^{n+1} → N is primitive-recursive as well.
  (Remember that f := primrec(g, h) has the defining equations
  – f(~x, 0) = g(~x),
  – f(~x, n + 1) = h(~x, n, f(~x, n)).)

• If k ∈ N and h : N^2 → N is primitive-recursive, then primrec(k, h) : N → N is primitive-recursive as well.
  (Remember that f := primrec(k, h) has the defining equations
  – f(0) = k,
  – f(n + 1) = h(n, f(n)).)

Remark: That we defined this set inductively means that the set of primitive-recursive functions is the least set closed under the above operations, or that it is the set generated by those operations: the primitive-recursive functions are exactly those f for which we can form a term from zero, succ, proj^k_i, ∘(·, ..., ·) (i.e. if f, g_i are terms, so is f ∘ (g_0, ..., g_{n-1})) and primrec, provided we respect the arities of the functions as above.

Examples:

• primrec(proj^1_0, succ ∘ proj^3_2) : N^2 → N is prim. rec. (Here proj^1_0 : N → N and succ ∘ proj^3_2 : N^3 → N.)
  (We will see below that this is the function add : N^2 → N, add(x, y) := x + y.)
• primrec(0, proj^2_0) : N → N is prim. rec. (Here 0 ∈ N and proj^2_0 : N^2 → N.)
  (We will see below that this is the function pred : N → N, pred(x) := x ∸ 1, where ∸ denotes truncated subtraction.)

Definition 5.2 A relation R ⊆ N^n is a primitive-recursive relation if the characteristic function χ_R : N^n → N is primitive-recursive.

Examples:

• The identity id : N → N, id(n) = n, is primitive-recursive, since id = proj^1_0: we have proj^1_0 : N^1 → N and proj^1_0(n) = n = id(n).

• The constant function const_n : N → N, const_n(k) = n, is primitive-recursive, since const_n = succ ∘ ··· ∘ succ ∘ zero (with n occurrences of succ):

    (succ ∘ ··· ∘ succ ∘ zero)(k) = succ(succ(··· succ(zero(k)) ···))
                                  = succ(succ(··· succ(0) ···))
                                  = 0 + 1 + 1 + ··· + 1   (n ones)
                                  = n
                                  = const_n(k) .

• Addition is primitive-recursive. We have seen previously that add : N^2 → N, add(x, y) = x + y, follows the rules

    add(x, 0)     = x + 0 = x = g(x) ,
    add(x, y + 1) = x + (y + 1) = (x + y) + 1 = add(x, y) + 1 = h(x, y, add(x, y)) ,

  where g : N → N, g(x) = x, so g = id = proj^1_0 is prim. rec., and h : N^3 → N, h(x, y, z) := z + 1. We have h = succ ∘ proj^3_2, and therefore h is prim. rec.:

    (succ ∘ proj^3_2)(x, y, z) = succ(proj^3_2(x, y, z)) = succ(z) = z + 1 = h(x, y, z) .

  Therefore add = primrec(proj^1_0, succ ∘ proj^3_2).

• Multiplication is primitive-recursive. We have seen previously that mult : N^2 → N, mult(x, y) = x · y, follows the rules

    mult(x, 0)     = x · 0 = 0 = g(x) ,
    mult(x, y + 1) = x · (y + 1) = x · y + x = mult(x, y) + x
                   = add(mult(x, y), x) = h(x, y, mult(x, y)) ,

  where g : N → N, g(x) = 0, so g = zero is prim. rec., and h : N^3 → N, h(x, y, z) = add(z, x). We have h = add ∘ (proj^3_2, proj^3_0), and therefore h is prim. rec.:

    (add ∘ (proj^3_2, proj^3_0))(x, y, z) = add(proj^3_2(x, y, z), proj^3_0(x, y, z))
                                          = add(z, x) = h(x, y, z) .

  Therefore mult = primrec(zero, add ∘ (proj^3_2, proj^3_0)) is primitive-recursive.

Remark:

• Unless a direct reduction to the principle of primitive recursion is demanded, in order to show that a function f : N^{k+1} → N defined by using primrec is primitive-recursive, it suffices to show informally that
  – f(~x, 0) can be defined by an expression built from previously defined primitive-recursive functions, the parameters ~x, and constants.
    Example: f(x_0, x_1, 0) = (x_0 + x_1) · 3.
  – f(~x, y + 1) can be defined by an expression built from previously defined primitive-recursive functions, the parameters ~x, the recursion argument y, the recursion hypothesis f(~x, y), and constants.
    Example: f(x_0, x_1, y + 1) = (x_0 + x_1 + y + f(x_0, x_1, y)) · 3.
• Similarly, if one wants to verify that a function is primitive-recursive by giving a direct definition of it in terms of other primitive-recursive functions (i.e. by using previously defined primitive-recursive functions and composition), it suffices to show that f(~x) can be defined by an expression built from previously defined primitive-recursive functions, the parameters ~x, and constants.
  Example: f(x, y, z) = (x + y) · 3 + z.

All the following proofs will be given in this style and are therefore examples of this principle. We continue with the introduction of primitive-recursive functions:
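The operators ∘ and primrec can be transcribed directly into higher-order functions, which makes the terms above executable. The following is a sketch in Python (the names `compose` and `primrec` are ours; primitive recursion is unfolded by an ordinary loop):

```python
# Sketch: the basic functions and the two generating operations, as
# higher-order Python functions on total functions N^k -> N.

zero = lambda n: 0
succ = lambda n: n + 1

def proj(k, i):                      # proj^k_i
    return lambda *xs: xs[i]

def compose(g, *hs):                 # g o (h_0, ..., h_{k-1})
    return lambda *xs: g(*(h(*xs) for h in hs))

def primrec(g, h):                   # f(xs,0) = g(xs); f(xs,n+1) = h(xs,n,f(xs,n))
    def f(*args):
        *xs, n = args
        acc = g(*xs)
        for i in range(n):
            acc = h(*xs, i, acc)
        return acc
    return f

# The two derivations from the text, verbatim:
add  = primrec(proj(1, 0), compose(succ, proj(3, 2)))
mult = primrec(zero, compose(add, proj(3, 2), proj(3, 0)))
print(add(3, 4), mult(3, 4))         # 7 12
```
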

• The predecessor function pred is primitive-recursive, because of the equations

    pred(0)     = 0 ,
    pred(x + 1) = x ,

  where the induction step doesn't refer to the recursion hypothesis, but only to the recursion argument.

• The function f(x, y) = x ∸ y (truncated subtraction) is primitive-recursive, because of the equations

    f(x, 0)     = x ,
    f(x, y + 1) = pred(f(x, y)) .

• The signum function sig : N → N,

    sig(x) := 1, if x > 0 ,
    sig(x) := 0, if x = 0 ,

  is primitive-recursive, since sig(x) = x ∸ (x ∸ 1):

  – For x = 0 we have

      x ∸ (x ∸ 1) = 0 ∸ (0 ∸ 1) = 0 ∸ 0 = 0 = sig(x) .

  – For x > 0 we have

      x ∸ (x ∸ 1) = x - (x - 1) = x - x + 1 = 1 = sig(x) .

  Alternatively, one can show that sig is primitive-recursive by using the principle of primitive recursion and the equations

    sig(0)     = 0 ,
    sig(x + 1) = 1 .
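These three functions can be checked against their primitive-recursive equations with a small sketch (again plain Python; `primrec1` is our name for the unary form primrec(k, h)):

```python
# Sketch: pred, truncated subtraction and sig, computed exactly by their
# primitive-recursive equations.

def primrec1(k, h):                   # f(0) = k; f(n+1) = h(n, f(n))
    def f(n):
        acc = k
        for i in range(n):
            acc = h(i, acc)
        return acc
    return f

pred = primrec1(0, lambda n, fn: n)   # pred(0)=0, pred(x+1)=x

def monus(x, y):                      # f(x,0)=x, f(x,y+1)=pred(f(x,y))
    acc = x
    for _ in range(y):
        acc = pred(acc)
    return acc

sig = lambda x: monus(x, monus(x, 1))  # sig(x) = x -. (x -. 1)

print(pred(5), monus(3, 5), monus(5, 3))  # 4 0 2
print(sig(0), sig(7))                     # 0 1
```
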

• The relation "x < y", i.e. A(x, y) :⇔ x < y, is primitive-recursive, since χ_A(x, y) = sig(y ∸ x):

  – If x < y, then y ∸ x = y - x > 0, therefore sig(y ∸ x) = 1 = χ_A(x, y).
  – If ¬(x < y), i.e. x ≥ y, then y ∸ x = 0, therefore sig(y ∸ x) = 0 = χ_A(x, y).

• Consider the sequence of definitions of addition, multiplication and exponentiation:

  – Addition:

      n + 0       = n ,
      n + (m + 1) = (n + m) + 1 .

    Therefore, if we write (+1) for the function N → N, (+1)(n) = n + 1, then n + m = (+1)^m(n).

  – Multiplication:

      n · 0       = 0 ,
      n · (m + 1) = (n · m) + n .

    Therefore, if we write (+n) for the function N → N, (+n)(k) = k + n, then n · m = (+n)^m(0).

  – Exponentiation:

      n^0     = 1 ,
      n^{m+1} = (n^m) · n .

    Therefore, if we write (·n) for the function N → N, (·n)(m) = n · m, then n^m = (·n)^m(1).

  We can extend this sequence further by defining

  – Superexponentiation:

      superexp(n, 0)     = 1 ,
      superexp(n, m + 1) = n^{superexp(n, m)} .

    Therefore, if we write (n↑) for the function N → N, (n↑)(k) = n^k, then superexp(n, m) = (n↑)^m(1).

  – Supersuperexponentiation:

      supersuperexp(n, 0)     = 1 ,
      supersuperexp(n, m + 1) = superexp(n, supersuperexp(n, m)) .

  – Etc.

  This way we obtain a sequence of extremely fast growing functions. Traditionally, instead of considering such binary functions, one considers a sequence of unary functions which have a similar growth rate. These functions are called the Ackermann functions, and they will exhaust all primitive-recursive functions (in the sense of Lemma 5.9):

• For n ∈ N, the n-th branch of the Ackermann function, Ack_n : N → N, is defined by

    Ack_0(y)     = y + 1 ,
    Ack_{n+1}(y) = (Ack_n)^{y+1}(1) := Ack_n(Ack_n(··· Ack_n(1) ···))   (y + 1 applications of Ack_n) .

  Ack_n is primitive-recursive for each fixed n. We show this by induction on n:

  – Base case: Ack_0 = succ is primitive-recursive.
  – Induction step: Assume Ack_n is primitive-recursive. We show that Ack_{n+1} is primitive-recursive. We have

      Ack_{n+1}(0)     = Ack_n(1) ,
      Ack_{n+1}(y + 1) = Ack_n(Ack_{n+1}(y)) .

    Therefore Ack_{n+1} is primitive-recursive.

Remark:

  Ack_0(n) = n + 1 .
  Ack_1(n) = (Ack_0)^{n+1}(1) = 1 + 1 + ··· + 1   (1 plus (n+1) ones)
           = 1 + (n + 1) = n + 2 .
  Ack_2(n) = (Ack_1)^{n+1}(1) = 1 + 2 + ··· + 2   (1 plus (n+1) twos)
           = 1 + 2(n + 1) = 2n + 3 > 2n .
  Ack_3(n) = (Ack_2)^{n+1}(1) > 2 · 2 · ··· · 2 · 1   ((n+1) factors 2)
           = 2^{n+1} > 2^n .
  Ack_4(n) = (Ack_3)^{n+1}(1) ≥ 2^{2^{···^{2}}}   (a tower of (n+1) twos) .

Ack_5(n) will iterate Ack_4 n + 1 times, etc. So already for small n, Ack_5(n) will exceed the number of particles in the universe, and therefore Ack_5 is not realistically computable.
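The closed forms for the first branches can be verified by brute force. A sketch (our own `ack` helper; the iteration is written out directly, so only small n and y are feasible, precisely because of the growth rates just computed):

```python
# Sketch: the branches Ack_n, computed literally from
# Ack_0(y) = y+1 and Ack_n(y) = (Ack_{n-1})^{y+1}(1).

def ack(n, y):
    if n == 0:
        return y + 1
    acc = 1
    for _ in range(y + 1):       # apply Ack_{n-1} exactly y+1 times to 1
        acc = ack(n - 1, acc)
    return acc

print([ack(1, y) for y in range(5)])  # y+2:  [2, 3, 4, 5, 6]
print([ack(2, y) for y in range(5)])  # 2y+3: [3, 5, 7, 9, 11]
print([ack(3, y) for y in range(4)])  # [5, 13, 29, 61], each > 2^(y+1)
```
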

Whereas Ack_n for fixed n is primitive-recursive, we will show below that the uniform version of the Ackermann function, i.e. Ack : N^2 → N, Ack(x, y) := Ack_x(y), is not primitive-recursive.

5.2 Closure Properties of the Primitive-Recursive Functions

• The primitive-recursive relations are closed under union, intersection and complement: if R, S ⊆ N^n are primitive-recursive, then so are R ∪ S, R ∩ S and N^n \ R. Note that

  – (R ∪ S)(~x) ⇔ R(~x) ∨ S(~x):

      (R ∪ S)(~x) ⇔ ~x ∈ R ∪ S ⇔ ~x ∈ R ∨ ~x ∈ S ⇔ R(~x) ∨ S(~x) .

  – (R ∩ S)(~x) ⇔ R(~x) ∧ S(~x):

      (R ∩ S)(~x) ⇔ ~x ∈ R ∩ S ⇔ ~x ∈ R ∧ ~x ∈ S ⇔ R(~x) ∧ S(~x) .

  – (N^n \ R)(~x) ⇔ ¬R(~x):

      (N^n \ R)(~x) ⇔ ~x ∈ N^n \ R ⇔ ~x ∉ R ⇔ ¬R(~x) .

  Therefore the primitive-recursive predicates are essentially closed under ∨, ∧, ¬.

  Proof that the primitive-recursive predicates are closed under ∪, ∩ and complement:

  – χ_{R∪S}(~x) = sig(χ_R(~x) + χ_S(~x)) (and therefore R ∪ S is primitive-recursive):
    ∗ If R(~x) holds, then χ_R(~x) = 1, so χ_R(~x) + χ_S(~x) ≥ 1 and

        sig(χ_R(~x) + χ_S(~x)) = 1 = χ_{R∪S}(~x) .

    ∗ Similarly, if S(~x) holds, then χ_R(~x) + χ_S(~x) ≥ 1 and

        sig(χ_R(~x) + χ_S(~x)) = 1 = χ_{R∪S}(~x) .

    ∗ If neither R(~x) nor S(~x) holds, then χ_R(~x) + χ_S(~x) = 0, so

        sig(χ_R(~x) + χ_S(~x)) = 0 = χ_{R∪S}(~x) .

  – χ_{R∩S}(~x) = χ_R(~x) · χ_S(~x) (and therefore R ∩ S is primitive-recursive):
    ∗ If R(~x) and S(~x) hold, then

        χ_R(~x) · χ_S(~x) = 1 · 1 = 1 = χ_{R∩S}(~x) .

    ∗ If ¬R(~x) holds, then χ_R(~x) = 0, therefore

        χ_R(~x) · χ_S(~x) = 0 = χ_{R∩S}(~x) .

    ∗ Similarly, if ¬S(~x) holds, then

        χ_R(~x) · χ_S(~x) = 0 = χ_{R∩S}(~x) .

  – χ_{N^n\R}(~x) = 1 ∸ χ_R(~x) (and therefore N^n \ R is primitive-recursive):
    ∗ If R(~x) holds, then χ_R(~x) = 1, therefore

        1 ∸ χ_R(~x) = 0 = χ_{N^n\R}(~x) .

    ∗ If R(~x) does not hold, then χ_R(~x) = 0, therefore

        1 ∸ χ_R(~x) = 1 = χ_{N^n\R}(~x) .

• The predicates "x ≤ y" and "x = y" are primitive-recursive:

  – x ≤ y ⇔ ¬(y < x). The right hand side of this equivalence is primitive-recursive, since "y < x" is primitive-recursive and primitive-recursive predicates are closed under ¬.
  – x = y ⇔ x ≤ y ∧ y ≤ x. The right hand side of this equivalence is primitive-recursive, since "x ≤ y" and "y ≤ x" are primitive-recursive and primitive-recursive predicates are closed under ∧.

• The primitive-recursive functions are closed under definition by cases: assume g_1, g_2 : N^n → N are primitive-recursive and R ⊆ N^n is primitive-recursive. Then the function f : N^n → N,

    f(~x) := g_1(~x), if R(~x) ,
    f(~x) := g_2(~x), if ¬R(~x) ,

  is primitive-recursive, since

    f(~x) = g_1(~x) · χ_R(~x) + g_2(~x) · χ_{N^n\R}(~x) :

  – If R(~x) holds, then χ_R(~x) = 1 and χ_{N^n\R}(~x) = 0, so

      g_1(~x) · χ_R(~x) + g_2(~x) · χ_{N^n\R}(~x) = g_1(~x) = f(~x) .

  – If ¬R(~x) holds, then χ_R(~x) = 0 and χ_{N^n\R}(~x) = 1, so

      g_1(~x) · χ_R(~x) + g_2(~x) · χ_{N^n\R}(~x) = g_2(~x) = f(~x) .
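The characteristic-function identities used in these closure proofs can be spot-checked numerically; the following sketch (plain Python, with booleans coded as 0/1) is not part of the proof, just a sanity check:

```python
# Sketch: checking sig(a+b) = chi-union, a*b = chi-intersection and
# 1 -. a = chi-complement on sample predicates.

monus = lambda x, y: max(0, x - y)          # truncated subtraction
sig = lambda x: 1 if x > 0 else 0

def check(chi_R, chi_S):
    for x in range(20):
        r, s = chi_R(x), chi_S(x)
        assert sig(r + s) == max(r, s)       # union
        assert r * s == min(r, s)            # intersection
        assert monus(1, r) == 1 - r          # complement

check(lambda x: 1 if x % 2 == 0 else 0,      # R = even numbers
      lambda x: 1 if x < 5 else 0)           # S = numbers below 5
print('identities hold on the sample')
```
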

• The primitive-recursive functions are closed under bounded sums: if g : N^{n+1} → N is primitive-recursive, so is f : N^{n+1} → N,

    f(~x, y) := Σ_{z<y} g(~x, z) ,

  where Σ_{z<0} g(~x, z) := 0 and, for y > 0, Σ_{z<y} g(~x, z) := g(~x, 0) + g(~x, 1) + ··· + g(~x, y - 1). This follows, since f can be defined by primitive recursion:

    f(~x, 0)     = 0 ,
    f(~x, y + 1) = f(~x, y) + g(~x, y) .

  Here the last equation follows from

    f(~x, y + 1) = Σ_{z<y+1} g(~x, z) = (Σ_{z<y} g(~x, z)) + g(~x, y) = f(~x, y) + g(~x, y) .

• The primitive-recursive functions are closed under bounded products: if g : N^{n+1} → N is primitive-recursive, so is f : N^{n+1} → N,

    f(~x, y) := Π_{z<y} g(~x, z) ,

  where Π_{z<0} g(~x, z) := 1 and, for y > 0, Π_{z<y} g(~x, z) := g(~x, 0) · g(~x, 1) · ··· · g(~x, y - 1). This follows, since f can be defined by primitive recursion:

    f(~x, 0)     = 1 ,
    f(~x, y + 1) = f(~x, y) · g(~x, y) .

  Here the last equation follows from

    f(~x, y + 1) = Π_{z<y+1} g(~x, z) = (Π_{z<y} g(~x, z)) · g(~x, y) = f(~x, y) · g(~x, y) .

Example for closure under bounded products: f : N → N,

  f(n) := n! = 1 · 2 · ··· · n ,

is primitive-recursive, since

  f(n) = Π_{i<n} (i + 1) ,

i.e. f(n) = Π_{i<n} g(i) with g(i) := i + 1; note that also f(0) = 0! = 1 = Π_{i<0} (i + 1), the empty product.
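The bounded sum and bounded product, together with the factorial example, can be sketched directly (our own helper names; each loop is exactly the primitive recursion given above):

```python
# Sketch: bounded sum and bounded product as loops, plus the factorial.

def bounded_sum(g, xs, y):       # sum over z < y of g(xs, z); empty sum = 0
    acc = 0
    for z in range(y):
        acc = acc + g(*xs, z)    # f(xs, z+1) = f(xs, z) + g(xs, z)
    return acc

def bounded_prod(g, xs, y):      # product over z < y of g(xs, z); empty product = 1
    acc = 1
    for z in range(y):
        acc = acc * g(*xs, z)    # f(xs, z+1) = f(xs, z) * g(xs, z)
    return acc

fact = lambda n: bounded_prod(lambda i: i + 1, (), n)
print(fact(0), fact(5))                          # 1 120
print(bounded_sum(lambda x, z: x + z, (2,), 4))  # 2+3+4+5 = 14
```
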

• The primitive-recursive relations are closed under bounded quantification: if R ⊆ N^{n+1} is primitive-recursive, so are the relations

    R_1(~x, y) :⇔ ∀z < y. R(~x, z) ,
    R_2(~x, y) :⇔ ∃z < y. R(~x, z) .

  Proof for R_1:

    χ_{R_1}(~x, y) = Π_{z<y} χ_R(~x, z) :

  – If ∀z < y. R(~x, z) holds, then ∀z < y. χ_R(~x, z) = 1, therefore

      Π_{z<y} χ_R(~x, z) = 1 = χ_{R_1}(~x, y) .

  – If for some z < y we have ¬R(~x, z), then for this z we have χ_R(~x, z) = 0, therefore

      Π_{z<y} χ_R(~x, z) = 0 = χ_{R_1}(~x, y) .

  Proof for R_2:

    χ_{R_2}(~x, y) = sig(Σ_{z<y} χ_R(~x, z)) :

  – If ∀z < y. ¬R(~x, z), then ∀z < y. χ_R(~x, z) = 0, therefore

      sig(Σ_{z<y} χ_R(~x, z)) = sig(0) = 0 = χ_{R_2}(~x, y) .

  – If for some z < y we have R(~x, z), then for this z we have χ_R(~x, z) = 1, therefore

      Σ_{z<y} χ_R(~x, z) ≥ 1 ,

    and hence

      sig(Σ_{z<y} χ_R(~x, z)) = 1 = χ_{R_2}(~x, y) .

• The primitive-recursive functions are closed under bounded search: if R ⊆ N^{n+1} is a primitive-recursive predicate, then f : N^{n+1} → N, f(~x, y) := μz < y. R(~x, z), is primitive-recursive, where

    μz < y. R(~x, z) := the least z (< y) s.t. R(~x, z) holds, if such a z exists,
                        and y otherwise.

  Proof: Define

    Q(~x, y)  :⇔ R(~x, y) ∧ ∀z < y. ¬R(~x, z) ,
    Q'(~x, y) :⇔ ∀z < y. ¬R(~x, z) .

  Q and Q' are primitive-recursive. Q(~x, y) holds exactly if y is the minimal z s.t. R(~x, z) holds (i.e. if R(~x, y) holds, but R(~x, z) is false for all z < y). We show

    f(~x, y) = (Σ_{z<y} χ_Q(~x, z) · z) + χ_{Q'}(~x, y) · y :

  – If there is a z < y with R(~x, z), let z_0 := μz < y. R(~x, z) = f(~x, y). Then Q(~x, z_0) holds, Q(~x, z) is false for all other z < y, and Q'(~x, y) is false. Therefore

      χ_{Q'}(~x, y) · y = 0 ,

    and it follows that

      (Σ_{z<y} χ_Q(~x, z) · z) + χ_{Q'}(~x, y) · y = z_0 = f(~x, y) .

  – If there is no z < y with R(~x, z), then Q(~x, z) is false for all z < y, so

      ∀z < y. χ_Q(~x, z) · z = 0 .

    Furthermore, Q'(~x, y) holds, therefore

      χ_{Q'}(~x, y) · y = y ,

    and it follows that

      (Σ_{z<y} χ_Q(~x, z) · z) + χ_{Q'}(~x, y) · y = y = f(~x, y) .
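The arithmetical definition of bounded search via Q and Q' can be checked against a direct loop. A sketch (all helper names are ours, and Python booleans stand in for 0/1):

```python
# Sketch: mu z < y. R, once as a loop and once via the formula
# (sum_{z<y} chi_Q(z) * z) + chi_Q'(y) * y from the proof.

def mu(R, xs, y):
    for z in range(y):
        if R(*xs, z):
            return z
    return y                              # no witness below y

def mu_arith(R, xs, y):
    Q  = lambda *a: R(*a) and all(not R(*a[:-1], z) for z in range(a[-1]))
    Qp = lambda *a: all(not R(*a[:-1], z) for z in range(a[-1]))
    return sum(int(Q(*xs, z)) * z for z in range(y)) + int(Qp(*xs, y)) * y

R = lambda x, z: z * z >= x               # least z with z^2 >= x
for x in range(20):
    assert mu(R, (x,), 10) == mu_arith(R, (x,), 10)
print(mu(R, (10,), 10))                   # 4, since 4*4 >= 10 and 3*3 < 10
```
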

We now show that the functions which encode sequences of fixed or arbitrary length as natural numbers, and those which decode them, are primitive-recursive:

Lemma 5.3 The following functions are primitive-recursive:
(a) The pairing function π : N^2 → N.
    (Remember that π(n, m) encodes two natural numbers as one.)
(b) The projections π_0, π_1 : N → N.
    (Remember that π_0(π(n, m)) = n and π_1(π(n, m)) = m, so π_0, π_1 invert π.)
(c) π^k : N^k → N (k ≥ 1).
    (Remember that π^k(n_0, ..., n_{k-1}) encodes the sequence n_0, ..., n_{k-1}.)
(d) f : N^3 → N,

      f(x, k, i) = π^k_i(x), if i < k ,
      f(x, k, i) = x,        otherwise.

    (Remember that π^k_i(π^k(n_0, ..., n_{k-1})) = n_i for i < k.) We write π^k_i(x) for f(x, k, i), even if i ≥ k.
(e) The functions f_k : N^k → N, f_k(x_0, ..., x_{k-1}) = ⟨x_0, ..., x_{k-1}⟩.
    (Remember that ⟨x_0, ..., x_{k-1}⟩ encodes the sequence x_0, ..., x_{k-1} as one natural number. Note that it doesn't make sense to say that λx.⟨x⟩ : N* → N is primitive-recursive, since each primitive-recursive function has to have domain N^k for some k.)
(f) lh : N → N.
    (Remember that lh(⟨x_0, ..., x_{k-1}⟩) = k.)
(g) g : N^2 → N, g(x, i) = (x)_i.
    (Remember that (⟨x_0, ..., x_{k-1}⟩)_i = x_i for i < k.)

Proof:
(a)

      π(n, m) = (Σ_{i≤n+m} i) + m = (Σ_{i<n+m+1} i) + m ,

    so π is primitive-recursive, since the primitive-recursive functions are closed under bounded sums.
(b) π_0(x) and π_1(x) are at most x, so they can be recovered from π by bounded search; therefore π_0, π_1 are primitive-recursive.
(c) Proof by induction on k.
    k = 1: π^1(x) = x, so π^1 is primitive-recursive.
    k → k + 1: Assume that π^k is primitive-recursive. We show that π^{k+1} is primitive-recursive as well:

      π^{k+1}(x_0, ..., x_k) = π(π^k(x_0, ..., x_{k-1}), x_k) .

    Therefore π^{k+1} is primitive-recursive (using that π, π^k are primitive-recursive).
(d) We have

      π^1_0(x)     = x ,
      π^{k+1}_i(x) = π^k_i(π_0(x)), if i < k ,
      π^{k+1}_i(x) = π_1(x),        if i = k .

    Unfolding this recursion gives

      π^k_i(x) = π_1((π_0)^{k∸i∸1}(x)), if 0 < i < k ,
      π^k_i(x) = (π_0)^{k∸1}(x),        if i = 0 ,

    and therefore

      f(x, k, i) = x,                      if i ≥ k ,
      f(x, k, i) = π_1((π_0)^{k∸i∸1}(x)), if 0 < i < k ,
      f(x, k, i) = (π_0)^{k∸1}(x),        if i = 0 < k .

    Define g : N^2 → N,

      g(x, 0)     := x ,
      g(x, k + 1) := π_0(g(x, k)) ,

    which is primitive-recursive. Then g(x, k) = (π_0)^k(x), therefore

      f(x, k, i) = x,                    if i ≥ k ,
      f(x, k, i) = π_1(g(x, k ∸ i ∸ 1)), if 0 < i < k ,
      f(x, k, i) = g(x, k ∸ 1),          if i = 0 < k .

    So f is primitive-recursive (definition by cases).
(e)

      f_k(x_0, ..., x_{k-1}) = 1 + π(k ∸ 1, π^k(x_0, ..., x_{k-1}))

    is primitive-recursive (and the empty sequence is coded by ⟨⟩ = 0).
(f)

      lh(x) = 0,              if x = 0 ,
      lh(x) = π_0(x ∸ 1) + 1, if x ≠ 0 .

(g)

      (x)_i = π^{lh(x)}_i(π_1(x ∸ 1)) = f(π_1(x ∸ 1), lh(x), i)

    is primitive-recursive.
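The whole coding apparatus can be exercised with a sketch. We use the standard triangular-number closed form for π (which equals the bounded sum in (a)), and the projections are found by search, as in (b); the names `pair`, `unpair`, `seq`, `lh`, `at` are our own:

```python
# Sketch: the pairing function, its projections, and the sequence coding
# <x_0, ..., x_{k-1}> with lh and (x)_i, following Lemma 5.3.

def pair(n, m):                     # pi(n,m) = (sum of i <= n+m) + m
    return (n + m) * (n + m + 1) // 2 + m

def unpair(x):                      # returns (pi_0(x), pi_1(x))
    s = 0
    while (s + 1) * (s + 2) // 2 <= x:
        s += 1                      # s = n + m for the unique preimage
    m = x - s * (s + 1) // 2
    return s - m, m

def seq(*xs):                       # <x_0, ..., x_{k-1}>; <> = 0
    if not xs:
        return 0
    code = xs[0]                    # pi^k is nested to the left
    for x in xs[1:]:
        code = pair(code, x)
    return 1 + pair(len(xs) - 1, code)

def lh(x):
    return 0 if x == 0 else unpair(x - 1)[0] + 1

def at(x, i):                       # (x)_i
    code = unpair(x - 1)[1]
    for _ in range(lh(x) - 1 - i):
        code = unpair(code)[0]      # peel pi_0 until component i is on top
    return code if i == 0 else unpair(code)[1]

s = seq(7, 3, 9)
print(lh(s), at(s, 0), at(s, 1), at(s, 2))   # 3 7 3 9
```
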

Lemma and Definition 5.4 There exist primitive-recursive functions with the following properties:
(a) A function snoc : N^2 → N s.t. snoc(⟨x_0, ..., x_{n-1}⟩, x) = ⟨x_0, ..., x_{n-1}, x⟩.
    (The name snoc is derived from inverting the letters of cons. Our definition of the encoding of sequences corresponds to appending new elements at the end rather than at the beginning.)
(b) Functions last : N → N and beginning : N → N s.t.

      last(snoc(x, y))      = y ,
      beginning(snoc(x, y)) = x .

Proof:
(a) Define

      snoc(x, y) = ⟨y⟩,                            if x = 0 ,
      snoc(x, y) = 1 + π(lh(x), π(π_1(x ∸ 1), y)), otherwise ,

    so snoc is primitive-recursive. We have

      snoc(⟨⟩, y) = snoc(0, y) = ⟨y⟩ ,

    and, using lh(⟨x_0, ..., x_k⟩) = k + 1,

      snoc(⟨x_0, ..., x_k⟩, y)
        = snoc(1 + π(k, π^{k+1}(x_0, ..., x_k)), y)
        = 1 + π(k + 1, π(π_1((1 + π(k, π^{k+1}(x_0, ..., x_k))) ∸ 1), y))
        = 1 + π(k + 1, π(π_1(π(k, π^{k+1}(x_0, ..., x_k))), y))
        = 1 + π(k + 1, π(π^{k+1}(x_0, ..., x_k), y))
        = 1 + π(k + 1, π^{k+2}(x_0, ..., x_k, y))
        = ⟨x_0, ..., x_k, y⟩ .

(b) Proof for beginning: Define

      beginning(x) := ⟨⟩,                                      if lh(x) ≤ 1 ,
      beginning(x) := ⟨(x)_0⟩,                                 if lh(x) = 2 ,
      beginning(x) := 1 + π((lh(x) ∸ 1) ∸ 1, π_0(π_1(x ∸ 1))), otherwise .

    Let x = snoc(y, z). We show beginning(x) = y.
    Case lh(y) = 0: Then x = snoc(y, z) = ⟨z⟩, therefore lh(x) = 1 and

      beginning(x) = ⟨⟩ = y .

    Case lh(y) = 1: Then y = ⟨y_0⟩ for some y_0, and snoc(y, z) = ⟨y_0, z⟩, so lh(x) = 2 and

      beginning(x) = ⟨(x)_0⟩ = ⟨(⟨y_0, z⟩)_0⟩ = ⟨y_0⟩ = y .

    Case lh(y) > 1: Let lh(y) = n + 2,

      y = ⟨y_0, ..., y_{n+1}⟩ = 1 + π(n + 1, π^{n+2}(y_0, ..., y_{n+1})) .

    Then

      snoc(y, z) = 1 + π(n + 2, π(π_1(y ∸ 1), z)) ,

    so lh(snoc(y, z)) = n + 3, and therefore

      beginning(snoc(y, z))
        = 1 + π((lh(snoc(y, z)) ∸ 1) ∸ 1, π_0(π_1(snoc(y, z) ∸ 1)))
        = 1 + π(n + 1, π_0(π_1((1 + π(n + 2, π(π_1(y ∸ 1), z))) ∸ 1)))
        = 1 + π(n + 1, π_0(π_1(π(n + 2, π(π_1(y ∸ 1), z)))))
        = 1 + π(n + 1, π_0(π(π_1(y ∸ 1), z)))
        = 1 + π(n + 1, π_1(y ∸ 1))
        = 1 + π(n + 1, π_1((1 + π(n + 1, π^{n+2}(y_0, ..., y_{n+1}))) ∸ 1))
        = 1 + π(n + 1, π_1(π(n + 1, π^{n+2}(y_0, ..., y_{n+1}))))
        = 1 + π(n + 1, π^{n+2}(y_0, ..., y_{n+1}))
        = y .

    Proof for last: Define

      last(x) := (x)_{lh(x) ∸ 1} .

    If y = ⟨y_0, ..., y_{n-1}⟩, then

      last(snoc(y, z)) = last(⟨y_0, ..., y_{n-1}, z⟩)
                       = (⟨y_0, ..., y_{n-1}, z⟩)_{lh(⟨y_0, ..., y_{n-1}, z⟩) ∸ 1}
                       = (⟨y_0, ..., y_{n-1}, z⟩)_n
                       = z .
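The arithmetical definitions of snoc, last and beginning can be tested against the sequence coding. A self-contained sketch (helpers repeated so that the block runs on its own; all names are ours):

```python
# Sketch: snoc, last and beginning computed by the formulas of
# Lemma and Definition 5.4, over the Cantor-style sequence coding.

def pair(n, m): return (n + m) * (n + m + 1) // 2 + m
def unpair(x):
    s = 0
    while (s + 1) * (s + 2) // 2 <= x: s += 1
    m = x - s * (s + 1) // 2
    return s - m, m
def seq(*xs):
    if not xs: return 0
    code = xs[0]
    for x in xs[1:]: code = pair(code, x)
    return 1 + pair(len(xs) - 1, code)
def lh(x): return 0 if x == 0 else unpair(x - 1)[0] + 1
def at(x, i):
    code = unpair(x - 1)[1]
    for _ in range(lh(x) - 1 - i): code = unpair(code)[0]
    return code if i == 0 else unpair(code)[1]

def snoc(x, y):
    return seq(y) if x == 0 else 1 + pair(lh(x), pair(unpair(x - 1)[1], y))

def last(x):
    return at(x, lh(x) - 1)

def beginning(x):
    if lh(x) <= 1: return 0                  # <>
    if lh(x) == 2: return seq(at(x, 0))
    return 1 + pair(lh(x) - 2, unpair(unpair(x - 1)[1])[0])

assert snoc(seq(4, 5), 6) == seq(4, 5, 6)
assert last(seq(4, 5, 6)) == 6 and beginning(seq(4, 5, 6)) == seq(4, 5)
assert beginning(snoc(0, 9)) == 0 and last(snoc(0, 9)) == 9
print('snoc/last/beginning agree with the sequence coding')
```
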

Lemma 5.5 The primitive-recursive functions are closed under course-of-values recursion: if g : N^{n+2} → N is primitive-recursive, then f : N^{n+1} → N is primitive-recursive as well, where f is defined by

    f(~x, k) = g(~x, k, ⟨f(~x, 0), f(~x, 1), …, f(~x, k − 1)⟩) .

Remark: Informally this means: if we can define f(~x, y) by an expression which uses previously defined primitive-recursive functions, constants, ~x, y and any f(~x, z) s.t. z < y, then f is primitive-recursive – if we have such an expression, we can form a function g as in the lemma above.
Example: The Fibonacci numbers are primitive-recursive, i.e. the function fib : N → N, defined by

    fib(0) := 1 ,    fib(1) := 1 ,    fib(n) := fib(n − 1) + fib(n − 2), if n > 1,

is primitive-recursive, as shown by course-of-values recursion. (This is the inefficient implementation; the more efficient solution can be seen directly to be primitive-recursive.)
Proof that the primitive-recursive functions are closed under course-of-values recursion: Let

    f(~x, k) := g(~x, k, ⟨f(~x, 0), f(~x, 1), …, f(~x, k − 1)⟩) .

We show that f is primitive-recursive. Define h : N^{n+1} → N,

    h(~x, y) := ⟨f(~x, 0), f(~x, 1), …, f(~x, y − 1)⟩ .

(In particular, h(~x, 0) = ⟨⟩.) It follows that

    h(~x, 0) = ⟨⟩ ,
    h(~x, y + 1) = ⟨f(~x, 0), f(~x, 1), …, f(~x, y − 1), f(~x, y)⟩
      = snoc(⟨f(~x, 0), f(~x, 1), …, f(~x, y − 1)⟩, f(~x, y))
      = snoc(h(~x, y), g(~x, y, ⟨f(~x, 0), f(~x, 1), …, f(~x, y − 1)⟩))
      = snoc(h(~x, y), g(~x, y, h(~x, y))) .

Therefore h is primitive-recursive. Now we have that

    f(~x, y) = (⟨f(~x, 0), …, f(~x, y)⟩)_y = (h(~x, y + 1))_y

is primitive-recursive.
We now show that various functions which manipulate codes of sequences of natural numbers are primitive-recursive.
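The course-of-values scheme just proved can be sketched concretely in Python. This is a sketch only: Python lists stand in for the sequence codes ⟨…⟩, and the list hist plays the role of h.

```python
def course_of_values(g):
    # f(k) = g(k, <f(0), ..., f(k-1)>); hist accumulates the history h
    def f(k):
        hist = []
        for i in range(k + 1):
            hist.append(g(i, hist))   # g sees exactly f(0),...,f(i-1)
        return hist[k]
    return f

# Fibonacci via course-of-values recursion, as in the example above
fib = course_of_values(lambda k, prev: 1 if k < 2 else prev[k - 1] + prev[k - 2])
```

Note that the recursion never recomputes earlier values: as in the proof, the whole history is threaded through, which is what makes the scheme reducible to ordinary primitive recursion.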

Lemma and Definition 5.6 There exist primitive-recursive functions with the following properties:
(a) A function append : N^2 → N s.t.

    append(⟨n0, …, n_{k−1}⟩, ⟨m0, …, m_{l−1}⟩) = ⟨n0, …, n_{k−1}, m0, …, m_{l−1}⟩ .

We write n ∗ m for append(n, m).

(b) A function subst : N^3 → N s.t. if i < n then

    subst(⟨x0, …, x_{n−1}⟩, i, y) = ⟨x0, …, x_{i−1}, y, x_{i+1}, …, x_{n−1}⟩ ,

and if i ≥ n, then

    subst(⟨x0, …, x_{n−1}⟩, i, y) = ⟨x0, …, x_{n−1}⟩ .

This means that subst(x, i, y) is the result of substituting the ith element of the sequence x by y, if such an element exists – if not, x remains unchanged. We write x[i/y] for subst(x, i, y).
(c) A function substring : N^3 → N s.t., if i < n,

    substring(⟨x0, …, x_{n−1}⟩, i, j) = ⟨xi, x_{i+1}, …, x_{min(j−1, n−1)}⟩ ,

and if i ≥ n,

    substring(⟨x0, …, x_{n−1}⟩, i, j) = ⟨⟩ .

(d) A function half : N → N s.t. half(n) = k if n = 2k or n = 2k + 1.
(e) A function bin : N → N s.t. bin(n) = ⟨b0, …, bk⟩, where b0, …, bk is the binary representation of n in normal form (no leading zeros, unless n = 0), i.e. n = (b0 … bk)_2.
(f) A function bin^{−1} : N → N s.t. bin^{−1}(⟨b0, …, bk⟩) = n if (b0 … bk)_2 = n.
Proof: (a) We have

    append(⟨x0, …, xn⟩, 0) = append(⟨x0, …, xn⟩, ⟨⟩) = ⟨x0, …, xn⟩ ,

and for a nonempty second argument ⟨y0, …, ym⟩

    append(⟨x0, …, xn⟩, ⟨y0, …, ym⟩) = ⟨x0, …, xn, y0, …, ym⟩
      = snoc(⟨x0, …, xn, y0, …, y_{m−1}⟩, ym)
      = snoc(append(⟨x0, …, xn⟩, ⟨y0, …, y_{m−1}⟩), ym)
      = snoc(append(⟨x0, …, xn⟩, beginning(⟨y0, …, ym⟩)), last(⟨y0, …, ym⟩)) .

Therefore we have

    append(x, 0) = x ,
    append(x, y) = snoc(append(x, beginning(y)), last(y)), if y > 0.

One can see that beginning(y) < y for y > 0, therefore the last equations give a definition of append by course-of-values recursion, therefore append is primitive-recursive.
(b) We have

    subst(x, i, y) := x, if lh(x) ≤ i,
    subst(x, i, y) := snoc(beginning(x), y), if i + 1 = lh(x),
    subst(x, i, y) := snoc(subst(beginning(x), i, y), last(x)), if i + 1 < lh(x).

Therefore subst is definable by course-of-values recursion. (c) We can define

    substring(x, i, j) := ⟨⟩, if i ≥ lh(x),
    substring(x, i, j) := substring(beginning(x), i, j), if i < lh(x) and j < lh(x),
    substring(x, i, j) := snoc(substring(beginning(x), i, j), last(x)), if i < lh(x) ≤ j,

which is a definition by course-of-values recursion.
(d) half(x) = µy < x.(2·y = x ∨ 2·y + 1 = x).
(e)

    bin(x) := ⟨0⟩, if x = 0,
    bin(x) := ⟨1⟩, if x = 1,
    bin(x) := snoc(bin(half(x)), x ∸ 2·half(x)), if x > 1,

therefore bin is definable by course-of-values recursion.
(f)

    bin^{−1}(x) := 0, if lh(x) = 0,
    bin^{−1}(x) := (x)0, if lh(x) = 1,
    bin^{−1}(x) := bin^{−1}(beginning(x)) · 2 + last(x), if lh(x) > 1,

therefore definable by course-of-values recursion.
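The recursion equations for bin and bin^{−1} in (e) and (f) translate directly into Python. In this sketch, Python lists stand in for sequence codes, so snoc becomes list concatenation and beginning/last become slicing.

```python
def bin_digits(n):
    # bin(n): digits b0...bk, most significant first, no leading zeros
    if n < 2:
        return [n]
    # snoc(bin(half(n)), n - 2*half(n))
    return bin_digits(n // 2) + [n % 2]

def bin_inv(bs):
    # bin^{-1}(<b0,...,bk>) = bin^{-1}(beginning) * 2 + last
    if len(bs) <= 1:
        return bs[0] if bs else 0
    return bin_inv(bs[:-1]) * 2 + bs[-1]
```

Both recursions descend along beginning/half, which is exactly why course-of-values recursion suffices in the proof.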

5.3 Not All Computable Functions are Primitive-Recursive
All primitive-recursive functions are computable. However, there are computable functions which are not primitive-recursive. One such function is the Ackermann function:

Definition 5.7 The Ackermann function Ack : N^2 → N is defined as Ack(n, m) := Ack_n(m). Therefore Ack has the following recursion equations:

    Ack(0, y) = y + 1 ,
    Ack(x + 1, 0) = Ack(x, 1) ,
    Ack(x + 1, y + 1) = Ack(x, Ack(x + 1, y)) .
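The recursion equations can be executed directly. A sketch (memoisation via lru_cache keeps small arguments feasible, but the growth is so rapid that already Ack(4, 2) is far out of reach):

```python
import sys
from functools import lru_cache

sys.setrecursionlimit(100000)  # the nested recursion gets deep quickly

@lru_cache(maxsize=None)
def ack(x, y):
    if x == 0:
        return y + 1                      # Ack(0, y) = y + 1
    if y == 0:
        return ack(x - 1, 1)              # Ack(x+1, 0) = Ack(x, 1)
    return ack(x - 1, ack(x, y - 1))      # Ack(x+1, y+1) = Ack(x, Ack(x+1, y))
```

The closed forms Ack_1(n) = n + 2, Ack_2(n) = 2n + 3 and Ack_3(n) = 2^(n+3) − 3 can be checked on small inputs.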

Lemma 5.8 For all n, m the following holds:
(a) Ack(m, n) > n.
(b) Ack_m is strictly monotone, i.e. Ack(m, n + 1) > Ack(m, n).
(c) Ack(m + 1, n) > Ack(m, n).
(d) Ack(m, Ack(m, n)) < Ack(m + 2, n).
(e) Ack(m, 2n) < Ack(m + 2, n).
(f) Ack(m, 2^k · n) < Ack(m + 2k, n).
Proof: (a) Induction on m.
m = 0: Ack(0, n) = n + 1 > n.
m → m + 1: Side-induction on n.
n = 0: Ack(m + 1, 0) = Ack(m, 1) > 1 > 0, using the IH.
n → n + 1:

    Ack(m + 1, n + 1) = Ack(m, Ack(m + 1, n)) > Ack(m + 1, n) > n ,

using the main IH and then the side IH; therefore Ack(m + 1, n + 1) > n + 1.
(b) Case m = 0: Ack(0, n + 1) = n + 2 > n + 1 = Ack(0, n).
Case m = m′ + 1:

    Ack(m′ + 1, n + 1) = Ack(m′, Ack(m′ + 1, n)) > Ack(m′ + 1, n) ,

using (a).

(c) Induction on m.
m = 0:

    Ack(1, n) = n + 2 > n + 1 = Ack(0, n) .

m → m + 1: Side-induction on n.
n = 0: Ack(m + 2, 0) = Ack(m + 1, 1) > Ack(m + 1, 0), using (b).
n → n + 1:

    Ack(m + 2, n + 1) = Ack(m + 1, Ack(m + 2, n))
      > Ack(m, Ack(m + 2, n))      (main IH)
      > Ack(m, Ack(m + 1, n))      (side IH and (b))
      = Ack(m + 1, n + 1) .

(d) Case m = 0: Ack(0, Ack(0, n)) = n + 2 < 2n + 3 = Ack(2, n).
Assume now m > 0. Proof of the assertion by induction on n.
n = 0:

    Ack(m + 2, 0) = Ack(m + 1, 1) = Ack(m, Ack(m + 1, 0)) > Ack(m, Ack(m, 0)) ,

using (c) and (b).
n → n + 1:

    Ack(m + 2, n + 1) = Ack(m + 1, Ack(m + 2, n))
      > Ack(m + 1, Ack(m, Ack(m, n)))      (IH and (b))
      > Ack(m, Ack(m − 1, Ack(m, n)))      ((b) and (c))
      = Ack(m, Ack(m, n + 1)) .

(e) Case m = 0:

    Ack(m, 2n) = Ack(0, 2n) = 2n + 1 < 2n + 3 = Ack(2, n) = Ack(m + 2, n) .

Case m = m′ + 1: Induction on n.
n = 0:

    Ack(m′ + 1, 2n) = Ack(m′ + 1, 0) < Ack(m′ + 3, 0) = Ack(m′ + 3, n) .

n → n + 1:

    Ack(m′ + 1, 2n + 2) = Ack(m′, Ack(m′, Ack(m′ + 1, 2n)))
      < Ack(m′ + 2, Ack(m′ + 1, 2n))      (d)
      < Ack(m′ + 2, Ack(m′ + 3, n))       (IH and (b))
      = Ack(m′ + 3, n + 1) .

(f) Induction on k.
k = 0: trivial.
k → k + 1:

    Ack(m, 2^{k+1} · n) = Ack(m, 2 · 2^k · n)
      < Ack(m + 2, 2^k · n)      (e)
      < Ack(m + 2 + 2k, n)       (IH)
      = Ack(m + 2(k + 1), n) .

Lemma 5.9 Every primitive-recursive function f : N^n → N can be majorised by one branch of the Ackermann function, i.e. there exists an N s.t.

    f(x0, …, x_{n−1}) < Ack_N(x0 + ⋯ + x_{n−1})

for all x0, …, x_{n−1} ∈ N. In particular, if f : N → N is primitive-recursive, then there exists an N s.t.

    ∀x ∈ N. f(x) < Ack_N(x) .

Proof: We write Σ(~x) for x0 + ⋯ + x_{n−1}, if ~x = x0, …, x_{n−1}.
We prove the assertion by induction on the definition of primitive-recursive functions.
Basic functions:
zero: zero(x) = 0 < x + 1 = Ack_0(x).
succ: succ(x) = Ack(0, x) < Ack(1, x) = Ack_1(x).

proj^n_i:

    proj^n_i(x0, …, x_{n−1}) = xi < x0 + ⋯ + x_{n−1} + 1 = Ack_0(x0 + ⋯ + x_{n−1}) .

Composition: Assume the assertion holds for f : N^k → N and g_i : N^n → N. We show it for h := f ∘ (g0, …, g_{k−1}). Assume

    f(~y) < Ack_l(Σ(~y)) ,

and

    g_i(~x) < Ack_{m_i}(Σ(~x)) .

Let N := max{l, m0, …, m_{k−1}}. By 5.8 (c) it follows that

    f(~y) < Ack_N(Σ(~y)) ,
    g_i(~x) < Ack_N(Σ(~x)) .

Then, with M s.t. k < 2^M, we have

    h(~x) = f(g0(~x), …, g_{k−1}(~x))
      < Ack_N(g0(~x) + ⋯ + g_{k−1}(~x))
      < Ack_N(Ack_N(Σ(~x)) + ⋯ + Ack_N(Σ(~x)))      (b)
      = Ack_N(Ack_N(Σ(~x)) · k)
      < Ack_N(Ack_N(Σ(~x)) · 2^M)
      < Ack_{N+2M}(Ack_N(Σ(~x)))      (f)
      ≤ Ack_{N+2M}(Ack_{N+2M}(Σ(~x)))
      < Ack_{N+2M+2}(Σ(~x))      (d).

Primitive recursion, n ≥ 1: Assume the assertion holds for f : N^n → N and g : N^{n+2} → N. We show it for h := primrec(f, g) : N^{n+1} → N. Assume

    f(~x) < Ack_l(Σ(~x)) ,
    g(~x, y, z) < Ack_r(Σ(~x) + y + z) .

Let N := max{l, r}. Then

    f(~x) < Ack_N(Σ(~x)) ,
    g(~x, y, z) < Ack_N(Σ(~x) + y + z) .

We show

    h(~x, y) < Ack_{N+3}(Σ(~x) + y)

by induction on y.
y = 0:

    h(~x, 0) = f(~x) < Ack_N(Σ(~x)) < Ack_{N+3}(Σ(~x) + 0) .

y → y + 1:

    h(~x, y + 1) = g(~x, y, h(~x, y))
      < Ack_N(Σ(~x) + y + h(~x, y))
      < Ack_N(Σ(~x) + y + Ack_{N+3}(Σ(~x) + y))      (IH)
      < Ack_N(Ack_{N+3}(Σ(~x) + y) + Ack_{N+3}(Σ(~x) + y))      (a)
      = Ack_N(2 · Ack_{N+3}(Σ(~x) + y))
      < Ack_{N+2}(Ack_{N+3}(Σ(~x) + y))      (e)
      = Ack_{N+3}(Σ(~x) + y + 1) .

Primitive recursion, n = 0: Assume l ∈ N and g : N^2 → N. We show the assertion for h := primrec(l, g) : N → N. Define

    f′ : N → N, f′(x) := l ,
    g′ : N^3 → N, g′(x, y, z) := g(y, z) ,
    h′ : N^2 → N, h′ := primrec(f′, g′) .

Using the constructions already shown and the IH it follows that

    h′(x, y) < Ack_N(x + y)

for some N. Therefore

    h(y) = h′(0, y) < Ack_N(y) .

Lemma 5.10 The Ackermann function is not primitive-recursive.

Proof: Assume Ack were primitive-recursive. Then f : N → N, f(n) := Ack(n, n), would be primitive-recursive as well. Then there exists an N ∈ N s.t. f(n) < Ack(N, n) for all n; in particular

    Ack(N, N) = f(N) < Ack(N, N) ,

a contradiction.
Remark: We can show more directly that there are non-primitive-recursive computable functions. Assume all computable functions are primitive-recursive. Define h : N^2 → N as follows:

    h(e, n) := f(n), if e encodes a string in ASCII which is a term denoting a unary primitive-recursive function f,
    h(e, n) := 0, otherwise.

So h(e, n) computes, if e is a code for a primitive-recursive function f, the result of applying f to n. For other e the result of h(e, n) doesn't matter as long as it is defined; we have set it to 0. h can be considered as an interpreter for the primitive-recursive functions.
So if e is a code for f, then ∀n. f(n) = h(e, n), i.e. f = λn.h(e, n). Therefore for each primitive-recursive function f : N → N there exists an e s.t. f = λn.h(e, n) (choose e to be the code for f).
h is computable, since the defining laws for primitive-recursive functions give us a way of computing h(e, n). Using the assumption that all computable functions are primitive-recursive, it follows that h is primitive-recursive. Define

    f : N → N, f(n) := h(n, n) + 1 .

f is primitive-recursive, since h is primitive-recursive. But f is chosen in such a way that it cannot be of the form λn.h(e, n). This is violated at input e:

    f(e) = h(e, e) + 1 ≠ h(e, e) = (λn.h(e, n))(e) .

But since f is primitive-recursive, there exists a code e for f, and therefore f = λn.h(e, n) for some e, which is impossible by the above, so we get a contradiction. This completes the proof.
A short version of the proof reads as follows: Assume all computable functions were primitive-recursive. Define h as above. h is computable, therefore primitive-recursive. Define f as above, and let e be a code for f. Then we have

    f(n) = h(e, n) for all n.

But then

    h(e, e) = f(e) = h(e, e) + 1 ,

a contradiction.
Remark 2: The above proof can be used in other contexts as well. It shows that there is no programming language which computes all computable functions and in which all definable functions are total. If we had such a language, then we could define codes e for programs in this language, and therefore we could define h : N^2 → N,

    h(e, n) := f(n), if e is a program in this language for a unary function f,
    h(e, n) := 0, otherwise.

If f is a unary function computable in this language, then f = λn.h(e, n) for the code e of the program for f. Now define, as above, f(n) := h(n, n) + 1. h is computable, therefore so is f, therefore f is computable in this language. Let f = λn.h(e, n). Then we obtain

    h(e, e) + 1 = f(e) = (λn.h(e, n))(e) = h(e, e) ,

a contradiction.
Remark 3: The above proof also shows, for instance, that if we extend the primitive-recursive functions by adding the Ackermann function as a basic function, or any other functions, we still won't obtain all computable functions – the above argument can be used for such sets of functions in just the same way as for the set of primitive-recursive functions.
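The diagonal argument of these remarks can be played through on a toy scale. In this sketch, progs stands in for an enumeration of the (total) unary functions of some hypothetical language and h for its interpreter; both names are purely illustrative.

```python
# A tiny, hypothetical "language": finitely many total unary programs.
progs = [lambda n: 0, lambda n: n + 1, lambda n: 2 * n]

def h(e, n):
    # interpreter: run program e on input n; 0 for meaningless codes e
    return progs[e](n) if e < len(progs) else 0

def f(n):
    # the diagonal function: by construction it differs from program e at input e
    return h(n, n) + 1

# f cannot equal any enumerated program, since f(e) = h(e, e) + 1 != h(e, e)
disagreements = [f(e) != h(e, e) for e in range(len(progs))]
```

If the language could define f itself (as it could, were it complete and total), f would occur at some index e and disagree with itself at e, which is exactly the contradiction above.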

5.4 The Partial Recursive Functions
The primitive-recursive functions will allow us to define functions which
• compute the result of running n steps of a URM or TM,
• check whether a URM or TM has stopped,
• obtain, depending on n arguments, the initial configuration for computing P^(n) for a URM P or M^(n) for a TM M,
• extract, from a configuration of P or M in which this machine has stopped, the result obtained by P^(n) or M^(n).
We will show the existence of such functions for TMs in Subsection 5.6. However, the primitive-recursive functions do not allow us to compute an n s.t. after n steps the URM or TM has stopped. In order to obtain such an n, we will extend the set of primitive-recursive functions by closing it under µ as well. The resulting set will be called the set of partial recursive functions. Using the µ-operator, we will be able to find the first n s.t. a TM or URM stops. Together with the above we will then be able to show that all TM- and URM-computable functions are partial recursive.
Note that µ(f) might be partial even if f is total; therefore we necessarily obtain a set of partial functions rather than a set of total functions, hence the name partial recursive functions. The recursive functions will be the partial recursive functions which are total.

Definition 5.11 We define inductively the set of partial recursive functions f together with their arity, i.e. together with the k s.t. f : N^k ~→ N. We write "f : N^k ~→ N is partial recursive" for "f is partial recursive with arity k", and N for N^1.
• The following basic functions are partial recursive:
  – zero : N ~→ N,
  – succ : N ~→ N,
  – proj^k_i : N^k ~→ N (0 ≤ i < k).
• If g : N^k ~→ N is partial recursive, and for i = 0, …, k − 1, h_i : N^n ~→ N is partial recursive, then g ∘ (h0, …, h_{k−1}) : N^n ~→ N is partial recursive as well.
  As before we write g ∘ h instead of g ∘ (h), and g0 ∘ g1 ∘ g2 ∘ ⋯ ∘ gn for g0 ∘ (g1 ∘ (g2 ∘ ⋯ ∘ gn)).
• If g : N^n ~→ N and h : N^{n+2} ~→ N are partial recursive, then primrec(g, h) : N^{n+1} ~→ N is partial recursive as well.
• If k ∈ N and h : N^2 ~→ N is partial recursive, then primrec(k, h) : N ~→ N is partial recursive as well.
• If g : N^{k+2} ~→ N is partial recursive, then µ(g) : N^{k+1} ~→ N is partial recursive as well.
(Remember that f := µ(g) has the defining equation

    f(~x) ≃ min{k ∈ N | g(~x, k) ≃ 0 ∧ ∀l < k. g(~x, l)↓}, if such a k exists,
    f(~x) undefined, otherwise.)
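The µ-operator is unbounded search, and it is the sole source of partiality. A Python sketch (for a total g; the while-loop diverges exactly where µ(g) is undefined):

```python
def mu(g):
    # mu(g)(xs) = least k with g(*xs, k) = 0; loops forever if no such k exists,
    # mirroring the partiality of the mu-operator
    def f(*xs):
        k = 0
        while g(*xs, k) != 0:
            k += 1
        return k
    return f

# example: integer square root, rounded up, as a mu-search
isqrt_up = mu(lambda x, k: 0 if k * k >= x else 1)
```

Here isqrt_up(x) searches for the least k with k^2 ≥ x; the search is total for this g, but replacing the condition with one that never holds would give an everywhere-undefined (non-terminating) function.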

Definition 5.12
(a) A recursive function is a partial recursive function which is total.
(b) A recursive relation is a relation R ⊆ N^n s.t. χ_R is recursive.
Example: One can show that the Ackermann function is recursive. Note that it is not primitive-recursive.

5.5 Closure Properties of the Recursive Functions
• Every primitive-recursive function (relation) is recursive.
• The recursive functions and relations have the same closure properties as those discussed for the primitive-recursive functions and relations in Sect. 5.2, using essentially the same proofs.
• Let, for a predicate U ⊆ N^{n+1},

    µz.U(~n, z) := min{z | U(~n, z)}, if such a z exists,
    µz.U(~n, z) := undefined, otherwise.

Then, if U is recursive, the function

    f(~n) :≃ µz.U(~n, z)

is partial recursive.
Proof: f(~n) ≃ µz.(χ_{N^{n+1}\U}(~n, z) ≃ 0).

5.6 Equivalence of URM-Computable, TM-Computable and Partial Recursive Functions
Lemma 5.13 All partial recursive functions are URM-computable.

Proof: By Lemma 3.3.

Turing-computable functions are partial recursive. We are going to show that TM-computable functions are partial recursive. We will show an even stronger result: we will encode Turing machines M as natural numbers code(M) and show that for every n ∈ N there exists a partial recursive function f_n : N^{n+1} ~→ N s.t. for every TM M we have

    ∀~m ∈ N^n. f_n(code(M), ~m) ≃ M^(n)(~m) .

These functions f_n will be called universal partial recursive functions. The functions f_n are essentially interpreters for Turing machines – f_n evaluates (simulates) a Turing machine (given by its code e) applied to arguments ~m.
We will later write {e}^n(~m) for f_n(e, ~m). The brackets in {e}^n are called "Kleene brackets", since Kleene introduced this notation.

Encoding of Turing Machines as Natural Numbers.

We will encode TMs using the following steps:

• In general we sometimes use Gödel brackets ⌜·⌝ in order to denote the code of some object, e.g. one writes ⌜x⌝ for the code of x and ⌜y⌝ for the code of y.
• We assume some encoding ⌜x⌝ of the symbols x of the alphabet Σ as natural numbers, s.t.
  – ⌜0⌝ = 0,
  – ⌜1⌝ = 1,
  – ⌜␣⌝ = 2 (where ␣ is the blank symbol).
• We assume that each state q is encoded as a natural number ⌜q⌝, where ⌜q0⌝ = 0 for the initial state q0 of the TM.
• We encode the directions in the instructions by
  – ⌜L⌝ = 0,
  – ⌜R⌝ = 1.
• We assume that the alphabet consists of {0, 1, ␣} and those symbols mentioned in the instructions. Any other symbols of the TM will never occur during a run of the TM, therefore omitting those symbols doesn't change the behaviour of the TM. Therefore we don't need to write down the alphabet explicitly.
• Similarly, we can assume that the set of states consists of q0 and all states mentioned in the instructions, therefore we don't need to write down the set of states explicitly. Again, omitting any other states won't change the behaviour of the TM.
• Since we assumed that q0 is always the initial state with code 0, and that ␣ has code 2, we don't need to mention those symbols. Therefore a TM is given by its set of instructions.
• An instruction I = (q, a, q′, a′, D), where q, q′ are states, a, a′ are symbols and D ∈ {L, R}, will be encoded as

    code(I) = π^5(⌜q⌝, ⌜a⌝, ⌜q′⌝, ⌜a′⌝, ⌜D⌝) .

• A set of instructions {I0, …, I_{k−1}} will be encoded as ⟨code(I0), …, code(I_{k−1})⟩. This number is as well the encoding code(M) of the corresponding TM.

Encoding of the Configuration of a TM as a Natural Number.

The configuration of a TM can be given by:
• The code ⌜q⌝ of its state q.
• A segment (i.e. a consecutive sequence of cells) of the tape, which includes the cell the head is pointing to and all cells which are not blank. (Note that during a run of a TM only finitely many cells contain a symbol which is non-blank: initially this is the case, and in finitely many steps only finitely many cells are changed from blank to non-blank. Therefore there always exists a segment as just stated.) A segment a0, …, a_{n−1} is encoded as code(a0, …, a_{n−1}) := ⟨⌜a0⌝, …, ⌜a_{n−1}⌝⟩.
• The position of the head on this segment. This can be given as a number i s.t. 0 ≤ i < n, if the segment represented is a0, …, a_{n−1}.
A configuration of a TM, given by a state q, a segment a0, …, a_{n−1} and a head position i, will be encoded as π^3(⌜q⌝, code(a0, …, a_{n−1}), i).
Note that the segment represented is not uniquely determined, but that doesn't cause any problems. We only require that the functions generate and use arbitrary codes for configurations.

Primitive-Recursive Functions Simulating the Steps of a Turing Machine

We will now introduce primitive-recursive functions, as outlined at the beginning of Section 5.4, which create the initial configuration of a TM, make one step of the TM, check whether the TM has stopped, and extract the result from its configuration.

• We define primitive-recursive functions symbol, state : N → N, which extract the symbol at the head and the state of the TM from the current configuration:

    symbol(a) = (π^3_1(a))_{π^3_2(a)} ,
    state(a) = π^3_0(a) .

Note that a = π^3(⌜q⌝, ⟨⌜a0⌝, …, ⌜a_{n−1}⌝⟩, i).

• We define a primitive-recursive function lookup : N^3 → N s.t., if e is the code for a TM, q the code of a state and a the code of a symbol, then lookup(e, q, a) is a number π^3(⌜q′⌝, ⌜a′⌝, ⌜D⌝) for the first instruction of the TM e of the form (q, a, q′, a′, D), if it exists. If no such instruction exists, the result will be 0 (= π^3(0, 0, 0)). lookup is defined as follows:
  – First find, using bounded search, the index of the first instruction starting with q, a.
  – Then extract the corresponding values from this instruction.
Formally the definition is as follows:
  – Define an auxiliary primitive-recursive function g : N^3 → N,

    g(e, q, a) = µi ≤ lh(e). (π^5_0((e)_i) = q ∧ π^5_1((e)_i) = a) ,

which finds the index of the first instruction starting with q, a.
  – Now define

    lookup(e, q, a) := π^3(π^5_2((e)_{g(e,q,a)}), π^5_3((e)_{g(e,q,a)}), π^5_4((e)_{g(e,q,a)})) .

• There exists a primitive-recursive relation hasinstruction ⊆ N^3 s.t. hasinstruction(e, ⌜q⌝, ⌜a⌝) holds iff the TM corresponding to e has an instruction (q, a, q′, a′, D) for some q′, a′, D.
  – It is defined, using the function g from the previous item, by

    χ_hasinstruction(e, q, a) = sig(lh(e) ∸ g(e, q, a)) .

If there is such an instruction, then g(e, q, a) < lh(e), therefore lh(e) ∸ g(e, q, a) > 0. Otherwise g(e, q, a) = lh(e), and lh(e) ∸ g(e, q, a) = 0.
• There exists a primitive-recursive function next : N^2 → N s.t., if e encodes a TM and c is a code for a configuration, then next(e, c) is a code for the configuration after one step of the TM is executed, or equal to c, if the TM has halted. Informal description of next:

  – Assume the configuration is c = π^3(q, ⟨a0, …, a_{n−1}⟩, i).
  – Check, using hasinstruction, whether the TM has stopped. If it has stopped, return c.
  – Otherwise, use lookup to obtain the codes for the next state q′, the next symbol a′ and the direction D.
  – Replace in ⟨a0, …, a_{n−1}⟩ the ith element by a′. Let the result be x.
  – It might be that the head will leave the current segment in the next step. Then we have to extend the segment by one blank to the left or right:
    ∗ If the direction is ⌜D⌝ = ⌜L⌝ and i = 0, we have to extend the segment to the left by one blank. So the result of next is

      π^3(q′, ⟨⌜␣⌝⟩ ∗ x, 0) .

    ∗ If the direction is ⌜D⌝ ≠ ⌜L⌝ and i ≥ n − 1, we have to extend the segment to the right by one blank. Then the result of next is

      π^3(q′, x ∗ ⟨⌜␣⌝⟩, n) .

    ∗ Otherwise let i′ = i ∸ 1, if ⌜D⌝ = ⌜L⌝, and i′ = i + 1, if ⌜D⌝ ≠ ⌜L⌝. Then the result of next is

      π^3(q′, x, i′) .

The above can easily be formalised as a primitive-recursive function, using the previously defined functions, list operations (especially substitution and concatenation of lists) and case distinction.
• There exists a primitive-recursive predicate terminate ⊆ N^2 s.t. terminate(e, c) holds iff the TM e with configuration c stops:

    terminate(e, c) :⇔ ¬hasinstruction(e, state(c), symbol(c)) .

• There exists a primitive-recursive function iterate : N^3 → N s.t. iterate(e, c, n) is the result of iterating TM e with initial configuration c for n steps. The definition is as follows:

    iterate(e, c, 0) = c ,
    iterate(e, c, n + 1) = next(e, iterate(e, c, n)) .

• There exists a primitive-recursive function init^n : N^n → N s.t. init^n(~m) is the initial configuration for computing M^(n)(~m) for any TM M whose alphabet contains {0, 1, ␣}:

    init^n(m0, …, m_{n−1}) = π^3(⌜q0⌝, bin(m0) ∗ ⟨⌜␣⌝⟩ ∗ bin(m1) ∗ ⟨⌜␣⌝⟩ ∗ ⋯ ∗ ⟨⌜␣⌝⟩ ∗ bin(m_{n−1}), 0) .

• There exists a primitive-recursive function extract : N → N s.t., if c is a configuration in which the TM stops, extract(c) is the natural number corresponding to the result returned by the TM.
  – Assume c = π^3(q, x, i).
  – We first need to find the largest subsequence of x starting at i which consists of zeros and ones only; its end is given by

    j = µz < lh(x). (z ≥ i ∧ (x)_z ≠ 0 ∧ (x)_z ≠ 1) .

Now

    extract(c) = bin^{−1}(substring(x, i, j)) .

• There exist primitive-recursive predicates T^n ⊆ N^{n+2} s.t., if e encodes a TM M, then T^n(e, m0, …, m_{n−1}, k) holds iff M, started with the initial configuration corresponding to a computation of M^(n)(m0, …, m_{n−1}), stops after π0(k) steps and has final configuration π1(k):

    T^n(e, ~m, k) :⇔ terminate(e, iterate(e, init^n(~m), π0(k))) ∧ iterate(e, init^n(~m), π0(k)) = π1(k) .

• There exists a primitive-recursive function U : N → N s.t., if M is a TM with code(M) = e and T^n(e, ~m, k) holds, then U(k) is the result of M^(n)(~m):

    U(k) = extract(π1(k)) .

• Now it follows that, if e encodes a TM M, then

    M^(n)(~m) ≃ U(µz.T^n(e, ~m, z)) .
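The identity M^(n)(~m) ≃ U(µz.T^n(e, ~m, z)) separates a bounded, primitive-recursive step simulation from a single unbounded µ-search. A sketch with a toy machine in place of a real TM simulator; step and halted below are stand-ins for next and terminate, not implementations of them.

```python
def run(step, halted, c0):
    # decompose "run until halt" into a bounded simulator plus a mu-search,
    # mirroring U(mu z. T(e, m, z))
    def iterate(c, n):                      # the primitive-recursive part
        for _ in range(n):
            c = step(c)
        return c
    n = 0
    while not halted(iterate(c0, n)):       # the mu-search (may diverge)
        n += 1
    return iterate(c0, n)                   # "extract" the final configuration

# toy "machine": the configuration is a counter that counts down and halts at 0
final = run(lambda c: c - 1, lambda c: c == 0, 5)
```

The re-simulation from scratch at every n is deliberately wasteful; it matches the logical structure, where T^n re-runs iterate on the bound supplied by the search.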

So we have shown the following theorem:

Theorem 5.14 (Kleene’s Normal Form Theorem)

(a) There exist partial recursive functions f_n : N^{n+1} ~→ N s.t., if e is the code of a TM M, then

    f_n(e, ~m) ≃ M^(n)(~m) .

(b) There exist a primitive-recursive function U : N → N and primitive-recursive predicates T^n ⊆ N^{n+2} s.t. for the functions f_n introduced in (a) we have

    f_n(e, ~m) ≃ U(µz.T^n(e, ~m, z)) .

Definition 5.15 Let U, T^n be as in Kleene's Normal Form Theorem 5.14. Then we define for e ∈ N the partial recursive function {e}^n : N^n ~→ N by

    {e}^n(~m) ≃ U(µy.T^n(e, ~m, y)) .

{e}^n(~m) is pronounced "Kleene brackets e in n arguments applied to ~m".
Corollary 5.16

(a) For every Turing-computable function f : N^n ~→ N there exists an e ∈ N s.t.

    f = {e}^n .

(b) In particular, all Turing-computable functions are partial recursive.
Proof: (a) Immediate. (b) By (a), since {e}^n is partial recursive.

Theorem 5.17 The sets of URM-computable, of Turing-computable, and of partial recursive functions coincide.

Proof: By Theorem 4.3, every URM-computable function is TM-computable. By Corollary 5.16, every TM-computable function is partial recursive. By Lemma 5.13, all partial recursive functions are URM-computable.
Remark:

• By Kleene's Normal Form Theorem every Turing-computable function g : N^n ~→ N is of the form {e}^n. Since the Turing-computable functions are the partial recursive functions, we obtain the following:

    The partial recursive functions g : N^n ~→ N are exactly the functions {e}^n.

This means:
  – {e}^n is partial recursive for every e ∈ N.
  – For every partial recursive function g : N^n ~→ N there exists an e s.t. g = {e}^n.
• Therefore we can say that

    f_n : N^{n+1} ~→ N ,  f_n(e, ~x) ≃ {e}^n(~x)

forms a universal n-ary partial recursive function, since it encodes all n-ary partial recursive functions.
• So we can assign to each partial recursive function g a number, namely an e s.t. g = {e}^n.
  – Each number e denotes one partial recursive function {e}^n.
  – However, several numbers denote the same partial recursive function, i.e. there are e, e′ s.t. e ≠ e′ but {e}^n = {e′}^n.
(Proof: There are several algorithms for computing the same function. Therefore there are several Turing machines which compute the same function. These Turing machines have different codes e.)

Lemma 5.18 The set F of partial recursive functions (and therefore as well of Turing-computable and of URM-computable functions), i.e.

    F := {f : N^k ~→ N | k ∈ N ∧ f partial recursive} ,

is countable.
Proof: Every partial recursive function is of the form {e}^n for some e, n ∈ N. Therefore

    f : N^2 → F ,  f(e, n) := {e}^n

is surjective, and therefore so is

    g : N → F ,  g(e) := f(π0(e), π1(e)) .

Therefore F is countable.

6 The Church-Turing Thesis

We have introduced three models of computation:
• The URM-computable functions.
• The Turing-computable functions.
• The partial recursive functions.
Further, we have shown that all three models compute the same partial functions. Lots of other models of computation have been studied:
• The while programs.
• Symbol manipulation systems by Post and by Markov.
• Equational calculi by Kleene and by Gödel.
• The λ-definable functions.
• Any of the programming languages Pascal, C, C++, Java, Prolog, Haskell, ML (and many more).
• Lots of other models of computation.

One can show that the partial functions computable in these models of computation are again exactly the partial recursive functions. So all these attempts to define a complete model of computation result in the same set of partial recursive functions. Because of this, one states the

    Church-Turing Thesis: The (in an intuitive sense) computable partial functions are exactly the partial recursive functions (or, equivalently, the URM-computable or Turing-computable functions).

Note that this thesis is not a mathematical theorem but a philosophical thesis. Therefore the Church-Turing thesis cannot be proven; we can only provide philosophical evidence for it. This evidence comes from the following considerations and empirical facts:

• All complete models of computation suggested by researchers (including those mentioned above) define the same set of partial functions.
• Many of these models were carefully designed in order to capture intuitive notions of computability:
  – The Turing machine model captures the intuitive notion of computation on a piece of paper in a general sense.
  – The URM model captures the general notion of computability by a computer.
  – Symbolic manipulation systems capture the general notion of computability by manipulation of symbolic strings.
• No intuitively computable partial function which is not partial recursive has been found, despite many researchers trying.
• Through metamathematical investigations a strong intuition has been developed that in principle programs in any programming language can be simulated by Turing machines and URMs, and that therefore partial functions computable in such a language are partial recursive.

Because of this, only very few researchers seriously doubt the correctness of the Church-Turing thesis.
Remark: Because of the equivalence of the models of computation, it follows that the halting problem for any of the above-mentioned models of computation is undecidable. In particular, it is undecidable whether a program in one of the programming languages mentioned terminates. If we had a decision procedure which decides whether or not, say, a Pascal program terminates for given input, then we could, using a translation of URMs into Pascal programs, decide the halting problem for URMs, which is impossible.

7 Kleene’s Recursion Theorem

The main theorem in this section is Kleene's Recursion Theorem, which expresses that the partial recursive functions are closed under a very general form of recursion. In order to prove this we use the S-m-n theorem, an important lemma that is used in many proofs in computability theory.

7.1 The S-m-n Theorem
If f : N^{m+n} ~→ N is partial recursive and we fix the first m arguments (say l0, …, l_{m−1}), we obtain a function of n arguments

    g : N^n ~→ N ,  g(x0, …, x_{n−1}) ≃ f(l0, …, l_{m−1}, x0, …, x_{n−1}) ,

which is again partial recursive. The following theorem says that we can compute a Kleene index of g (i.e. an e′ s.t. g = {e′}^n) from a Kleene index of f and l0, …, l_{m−1} primitive-recursively: there exists a primitive-recursive function S^m_n s.t., if f = {e}^{m+n}, then g = {S^m_n(e, l0, …, l_{m−1})}^n.

Theorem 7.1 (S-m-n Theorem) For all m, n ∈ N there exists a primitive-recursive function

    S^m_n : N^{m+1} → N

s.t. for all l0, …, l_{m−1}, x0, …, x_{n−1} ∈ N

    {S^m_n(e, l0, …, l_{m−1})}^n(x0, …, x_{n−1}) ≃ {e}^{m+n}(l0, …, l_{m−1}, x0, …, x_{n−1}) .
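At the level of functions rather than codes, the theorem says that fixing the first m arguments is itself computable: S^m_n is partial application carried out on programs. A Python sketch of this reading, with functools.partial in the role of S^m_n (the function f and its argument values are hypothetical examples):

```python
from functools import partial

def f(l0, l1, x):
    # a 3-ary function; we fix the first m = 2 arguments
    return l0 * x + l1

# "S^2_1(e, 3, 5)", performed directly on the function rather than on a code
g = partial(f, 3, 5)

# g(x) = f(3, 5, x) for every x
check = [g(x) == f(3, 5, x) for x in range(10)]
```

The substance of the theorem is that this step can be done on Gödel numbers, uniformly and primitive-recursively, which is what the TM construction in the proof provides.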

Proof: We write ~x for x_0, ..., x_{n-1} and ~l for l_0, ..., l_{m-1}. Let M be the Turing machine encoded as e. The Turing machine M′ corresponding to S^m_n(e, ~l) should be such that

M′^n(~x) ≃ M^{n+m}(~l, ~x) .
Such a Turing machine can be defined as follows:

1. The initial configuration is that x_0, ..., x_{n-1} are written on the tape and the head is pointing to the leftmost bit:

··· ␣ ␣ bin(x_0) ␣ ··· ␣ bin(x_{n-1}) ␣ ␣ ···
        ↑ (head at the first bit of bin(x_0))

2. M′ first writes the binary representations of l_0, ..., l_{m-1}, separated by blanks, in front of this, and terminates this step with the head pointing to the most significant bit of bin(l_0). So we obtain after this first step the following configuration:

··· ␣ ␣ bin(l_0) ␣ ··· ␣ bin(l_{m-1}) ␣ bin(x_0) ␣ ··· ␣ bin(x_{n-1}) ␣ ␣ ···
        ↑ (head at the most significant bit of bin(l_0))

3. Then M′ will run M, starting in this configuration, and will terminate if M terminates. The result will be
≃ M^{m+n}(~l, ~x) ,
and in total we get therefore
M′^n(~x) ≃ M^{m+n}(~l, ~x)
as desired.
A code for M′ can be obtained from a code for M and from ~l as follows:

• One takes a Turing machine M″ which writes the binary representations of l_0, ..., l_{m-1} in front of its initial position (separated by a blank and with a blank at the end), and terminates at the leftmost bit. It is a straightforward exercise to write a code for the instructions of such a Turing machine, depending on ~l, and to show that the function defining it is primitive-recursive. Assume that the terminating state of M″ has Gödel number (i.e. code) s, and that all other states have Gödel numbers < s.

• Then one appends to the instructions of M″ the instructions of M, but with the states shifted so that the new initial state of M is the final state s of M″ (i.e. we add s to all the Gödel numbers of states occurring in M). This can be done primitive-recursively as well.

So a code for M′ can be defined primitive-recursively depending on a code e for M and on ~l, and S^m_n is the primitive-recursive function computing this. With this function it follows now that, if e is a code for a TM, then
{S^m_n(e, ~l)}^n(~x) ≃ {e}^{n+m}(~l, ~x) .
This equation holds even if e is not a code for a TM: in this case {e}^{m+n} interprets e as if it were the code for a valid TM M. (A code for such a valid TM is obtained
• by deleting any instruction code(q, a, q′, a′, D) in e s.t. there exists an instruction code(q, a, q″, a″, D′) occurring before it in the sequence e, and
• by replacing all directions > 1 by ⌜R⌝ = 1.)
e′ := S^m_n(e, ~l) will have the same deficiencies as e, but when applying the Kleene brackets it will be interpreted as a TM M′ obtained from e′ in the same way as we obtained M from e, and therefore

{e′}^n(~x) ≃ M′^n(~x) ≃ M^{n+m}(~l, ~x) ≃ {e}^{n+m}(~l, ~x) .
So we obtain the desired result in this case as well.
Notation: We will in the following usually omit the superscript n in {e}^n(m_0, ..., m_{n-1}), i.e. we will write {e}(m_0, ..., m_{n-1}) instead of {e}^n(m_0, ..., m_{n-1}). Further, {e} not applied to arguments and without superscript usually means {e}^1.

7.2 Kleene's Recursion Theorem

Theorem 7.2 (Kleene's Recursion Theorem). Let f : N^{n+1} ∼→ N be a partial recursive function. Then there exists an e ∈ N s.t.
{e}^n(x_0, ..., x_{n-1}) ≃ f(e, x_0, ..., x_{n-1}) .
Before proving this, we give some examples of the use of this theorem.
Examples:

1. There exists an e s.t.
{e}(x) ≃ e + 1 .
For showing this, take in the Recursion Theorem f(e, x) := e + 1. Then we get:

{e}(x) ≃ f(e, x) ≃ e + 1 .

Note that it would be rather difficult to find such an e directly (try it yourself). Such an e would be a code for a Turing machine which, independently of its input, writes its own code on the tape.

Remark: Such applications of the Recursion Theorem are usually not very useful. Usually, when using it, one doesn't use the index e directly, but only the application of {e} to some arguments.

2. The function fib computing the Fibonacci numbers can be seen to be recursive by using Kleene's Recursion Theorem. (This is a weaker result than what we obtained above, where we showed that fib is even primitive-recursive. However, it is a nice example which demonstrates very well the use of the Recursion Theorem.) Remember the defining equations for fib:
fib(0) = 1 ,
fib(1) = 1 ,
fib(n + 2) = fib(n) + fib(n + 1) .
From these equations we obtain
fib(n) = 1, if n = 0 or n = 1,
fib(n) = fib(n ∸ 2) + fib(n ∸ 1), otherwise.
We show using the Recursion Theorem that there exists a recursive function g : N → N which fulfils the same equations, i.e. s.t.
g(n) ≃ 1, if n = 0 or n = 1,
g(n) ≃ g(n ∸ 2) + g(n ∸ 1), otherwise.
This can be shown as follows: Define a partial recursive f : N^2 ∼→ N s.t.
f(e, n) ≃ 1, if n = 0 or n = 1,
f(e, n) ≃ {e}(n ∸ 2) + {e}(n ∸ 1), otherwise.
Now let e be s.t.
{e}(n) ≃ f(e, n) .
Then e fulfils the equations
{e}(n) ≃ 1, if n = 0 or n = 1,
{e}(n) ≃ {e}(n ∸ 2) + {e}(n ∸ 1), otherwise.
Let g = {e}. Then g fulfils the equations as desired:
g(n) ≃ 1, if n = 0 or n = 1,
g(n) ≃ g(n ∸ 2) + g(n ∸ 1), otherwise.
It is not a priori clear that g(x) = fib(x), since there might be several ways of solving the same recursion equations. (For instance, the recursion equation f(x) ≃ f(x) can be solved by any definition of f, e.g. by f(x) = 0, by f(x) = 1, and by f(x) ↑.) Therefore we show by induction on n that ∀n ∈ N. g(n) ≃ fib(n), and that therefore fib is recursive:

– In the base cases n = 0, 1 this is clear.
– In the induction step n → n + 1 with n ≥ 1 we have
g(n + 1) ≃ g(n ∸ 1) + g(n) ≃ fib(n ∸ 1) + fib(n) = fib(n + 1) ,
using the induction hypothesis in the second step.
In a similar way one can introduce arbitrary partial recursive functions g, where g(~n) refers to arbitrary other values g(~m). Such definitions correspond to the recursive definition of functions in many programming languages. For instance, in Java one defines the Fibonacci numbers recursively as follows:

public static int fib(int n){
    if (n == 0 || n == 1){
        return 1;
    } else {
        return fib(n-1) + fib(n-2);
    }
}

When programming functions recursively, we obtain functions which might terminate or might not terminate. Similarly, when defining functions using the Recursion Theorem, we obtain partial recursive functions, which might not always be defined, as in the following two examples:
3. There exists a partial recursive function g : N ∼→ N s.t.
g(x) ≃ g(x) + 1 .
(Take in Kleene's Recursion Theorem

f(e, x) ≃ {e}(x) + 1 ,
so we obtain an e s.t.
{e}(x) ≃ f(e, x) ≃ {e}(x) + 1 .
Then let g = {e}.)
It follows that g(x) ↑. For, if g(x) were defined, say g(x) = n, we would get n = n + 1. Note that, if g(x) is undefined, g(x) + 1 is undefined as well, so

g(x) ≃ g(x) + 1
holds in this case. The definition of g corresponds to the following definition in Java:

public static int g(int n){
    return g(n) + 1;
}

When executing g(x), Java loops (actually it will terminate with a stack overflow). (Note that a recursion equation for a function f cannot always be solved by setting f(x) ↑. E.g. the recursion equations for fib above can't be solved by setting fib(n) ↑.)
4. There exists a partial recursive function g : N ∼→ N s.t.
g(x) ≃ g(x + 1) + 1 .
Again g(x) is undefined, for if g(x) were defined, it would need to be infinitely big. A corresponding Java function will again loop for ever.

5. An interesting example, where the Recursion Theorem is of great help in showing that a function is partial recursive, is the Ackermann function Ack. Remember that Ack has the following defining equations:

Ack(0, y) = y + 1 ,
Ack(x + 1, 0) = Ack(x, 1) ,
Ack(x + 1, y + 1) = Ack(x, Ack(x + 1, y)) .

So we have
Ack(x, y) = y + 1, if x = 0,
Ack(x, y) = Ack(x ∸ 1, 1), if x > 0 and y = 0,
Ack(x, y) = Ack(x ∸ 1, Ack(x, y ∸ 1)), otherwise.
In order to show that Ack is recursive, we use Kleene's Recursion Theorem in order to introduce a partial recursive function g which fulfils the same equations, i.e.

g(x, y) ≃ y + 1, if x = 0,
g(x, y) ≃ g(x ∸ 1, 1), if x > 0 ∧ y = 0,
g(x, y) ≃ g(x ∸ 1, g(x, y ∸ 1)), if x > 0 ∧ y > 0.
(In detail this is shown as follows: There exists an e s.t.
{e}(x, y) ≃ y + 1, if x = 0,
{e}(x, y) ≃ {e}(x ∸ 1, 1), if x > 0 ∧ y = 0,
{e}(x, y) ≃ {e}(x ∸ 1, {e}(x, y ∸ 1)), if x > 0 ∧ y > 0.
Let g := {e}^2. Then g fulfils those equations.)
We show by induction on x that g(x, y) is defined and equal to Ack(x, y) for all x, y ∈ N:
– Base case x = 0: g(0, y) = y + 1 = Ack(0, y).
– Induction step x → x + 1: Assume
g(x, y) = Ack(x, y) .

We show g(x + 1, y) = Ack(x + 1, y) by side-induction on y:
∗ Base case y = 0:
g(x + 1, 0) ≃ g(x, 1) = Ack(x, 1) = Ack(x + 1, 0) ,
using the main induction hypothesis in the second step.
∗ Induction step y → y + 1:
g(x + 1, y + 1) ≃ g(x, g(x + 1, y))
             ≃ g(x, Ack(x + 1, y))    (side-IH)
             ≃ Ack(x, Ack(x + 1, y))    (main-IH)
             = Ack(x + 1, y + 1) .
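The defining equations of Ack translate directly into a deeply recursive Java program; this is a sketch for very small arguments only, since the values and the recursion depth explode far beyond int and the stack:

```java
// Direct transcription of the defining equations of the Ackermann function.
class AckDemo {
    static int ack(int x, int y) {
        if (x == 0) return y + 1;          // Ack(0, y) = y + 1
        if (y == 0) return ack(x - 1, 1);  // Ack(x+1, 0) = Ack(x, 1)
        return ack(x - 1, ack(x, y - 1));  // Ack(x+1, y+1) = Ack(x, Ack(x+1, y))
    }
    public static void main(String[] args) {
        System.out.println(ack(2, 3)); // prints 9
        System.out.println(ack(3, 3)); // prints 61
    }
}
```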

Proof of Kleene's Recursion Theorem: We write ~x for x_0, ..., x_{n-1}. Assume

f : N^{n+1} ∼→ N .
We have to find an e s.t.
∀~x ∈ N. {e}^n(~x) ≃ f(e, ~x) .
The idea is to define e = S^1_n(e_1, e_2) for some (yet unknown) e_1, e_2. Then we get

{e}^n(~x) ≃ {S^1_n(e_1, e_2)}^n(~x) ≃ {e_1}^{n+1}(e_2, ~x) .

So, in order to fulfil our original equation, we need to find e_1, e_2 s.t.

∀~x. {e_1}^{n+1}(e_2, ~x) ≃ f(S^1_n(e_1, e_2), ~x)
(the left-hand side being ≃ {e}^n(~x) and the right-hand side ≃ f(e, ~x)).
Now let e_1 be an index for the partial function

(y, ~x) ↦ f(S^1_n(y, y), ~x) ,
i.e. let e_1 be s.t.
{e_1}^{n+1}(y, ~x) ≃ f(S^1_n(y, y), ~x) .
Using this equation, the above equation becomes

f(S^1_n(e_1, e_2), ~x) ≃ f(S^1_n(e_2, e_2), ~x)
(the right-hand side being ≃ {e_1}^{n+1}(e_2, ~x)). We can fulfil this equation by defining e_2 := e_1. So an index solving the problem is

e = S^1_n(e_1, e_2) = S^1_n(e_1, e_1) ,
and we are done.
In short, the complete proof reads as follows: Let e_1 be s.t.
{e_1}^{n+1}(y, ~x) ≃ f(S^1_n(y, y), ~x) .
Let e := S^1_n(e_1, e_1). Then we have

{e}^n(~x) ≃ {S^1_n(e_1, e_1)}^n(~x)    (e = S^1_n(e_1, e_1))
        ≃ {e_1}^{n+1}(e_1, ~x)    (S-m-n Theorem)
        ≃ f(S^1_n(e_1, e_1), ~x)    (def. of e_1)
        ≃ f(e, ~x) .    (e = S^1_n(e_1, e_1))
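The whole construction can be replayed in a toy model of Kleene indices in which a "code" is a position in a program table and programs are Java functions (a sketch; all names are ours). The one subtlety is that s11 must be a genuine function of its arguments, as the real S^1_n is, so it is memoised; fix(ef) then returns an e with {e}(x) = f(e, x):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.IntBinaryOperator;
import java.util.function.IntUnaryOperator;

class RecThmDemo {
    static List<IntBinaryOperator> progs2 = new ArrayList<>(); // binary "programs", index = code
    static List<IntUnaryOperator> progs1 = new ArrayList<>();  // unary "programs", index = code
    static Map<Long, Integer> memo = new HashMap<>();

    // S^1_1 in the toy model; memoised so that s11(e, l) always yields the same code.
    static int s11(int e, int l) {
        long key = ((long) e << 32) | (l & 0xffffffffL);
        return memo.computeIfAbsent(key, k -> {
            progs1.add(x -> progs2.get(e).applyAsInt(l, x));
            return progs1.size() - 1;
        });
    }

    // Kleene's construction: given a code ef of f, return e with {e}(x) = f(e, x).
    static int fix(int ef) {
        // e1 codes the program (y, x) -> f(s11(y, y), x)
        progs2.add((y, x) -> progs2.get(ef).applyAsInt(s11(y, y), x));
        int e1 = progs2.size() - 1;
        return s11(e1, e1);
    }

    public static void main(String[] args) {
        progs2.add((c, x) -> c + 1);   // f(e, x) = e + 1, as in Example 1
        int ef = progs2.size() - 1;
        int e = fix(ef);
        System.out.println(progs1.get(e).applyAsInt(42) == e + 1); // prints true
    }
}
```

Running {e} on any input unfolds exactly as in the proof: {e}(x) = {e1}(e1, x) = f(s11(e1, e1), x) = f(e, x), where the memoisation guarantees that s11(e1, e1) computed inside the program is the same number e.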

8 Recursively Enumerable Predicates

8.1 Introduction

In this section we study predicates P ⊆ N^n which are non-decidable, but "half decidable". The official name is semi-decidable, or recursively enumerable. (The latter name will be explained later.) Remember that a predicate P is recursive (or, using the Church-Turing thesis, computable) if its characteristic function χ_P is recursive, so we have a "full" decision procedure:

P(x_0, ..., x_{n-1}) ⇔ χ_P(x_0, ..., x_{n-1}) = 1, i.e. answer "yes" ,
¬P(x_0, ..., x_{n-1}) ⇔ χ_P(x_0, ..., x_{n-1}) = 0, i.e. answer "no" .
A predicate P will be semi-decidable if there exists a partial recursive function f s.t.

P(x_0, ..., x_{n-1}) ⇔ f(x_0, ..., x_{n-1}) ↓ .
Therefore:

• If P(x_0, ..., x_{n-1}) holds, we will eventually know it – the algorithm for computing f will finally terminate, and then we know that P(x_0, ..., x_{n-1}) holds.

• If P(x_0, ..., x_{n-1}) doesn't hold, then the algorithm computing f will loop for ever, and we never get an answer.
So we have:

P(x_0, ..., x_{n-1}) ⇔ f(x_0, ..., x_{n-1}) ↓ , i.e. answer "yes" ,
¬P(x_0, ..., x_{n-1}) ⇔ f(x_0, ..., x_{n-1}) ↑ , i.e. no answer returned by f .
It seems at first sight that recursively enumerable sets are not interesting from a computational point of view. But it turns out that such sets occur in many practical applications. A typical example is a set A of the form

A(x) ⇔ there exists a proof in a certain formal system of a property ϕ(x) .
For instance, ϕ(x) might express the correctness of an algorithm, e.g. that the result of a digital circuit with x-bit inputs for multiplying two binary numbers is actually the product of its inputs. Under certain conditions ("soundness and completeness"; see other modules) we have:
• If we find a proof for ϕ(x), then ϕ(x) holds.
• If we don't find a proof, then ϕ(x) doesn't hold.
There are many formal systems such that the corresponding function f s.t.

A(x) ⇔ f(x) ↓
terminates in most practical cases very fast, despite the fact that for theoretical reasons it is known that it can take arbitrarily long before this happens. If one under these circumstances runs f(x), one usually assumes that, if one hasn't obtained an answer after a certain amount of time, A(x) is probably false, and then uses other means in order to determine the reason why A(x) is false. If A(x) holds, one expects the algorithm for f(x) to terminate within a certain amount of time. If this algorithm actually terminates within the expected amount of time, one knows that ϕ(x) is true.
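The shape of a semi-decision procedure – search for a witness, answer "yes" on success, run for ever otherwise – can be sketched in Java. (The predicate chosen here, "x is a perfect square", is of course even decidable; it is our own toy example, and only the shape of the procedure matters.)

```java
// Sketch of a semi-decision procedure for P(x) :⇔ ∃y. y·y = x.
// semiDecide returns (with the witness y) iff P(x) holds; otherwise it loops for ever.
class SemiDemo {
    static int semiDecide(int x) {
        for (int y = 0; ; y++) {
            if (y * y == x) return y; // witness found: answer "yes"
            // note: no "no" answer is ever produced
        }
    }
    public static void main(String[] args) {
        System.out.println(semiDecide(49)); // prints 7
    }
}
```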

8.2 Recursively Enumerable Predicates and their Canonical Numbering

Definition 8.1 Assume f : N^n ∼→ N is a partial function.
(a) The domain of f, in short dom(f), is defined as follows:

dom(f) := {(x_0, ..., x_{n-1}) ∈ N^n | f(x_0, ..., x_{n-1}) ↓} .
(b) The range of f, in short ran(f), is defined as follows:

ran(f) := {y ∈ N | ∃x_0, ..., x_{n-1}. (f(x_0, ..., x_{n-1}) ≃ y)} .

(c) The graph of f is the set G_f defined as

G_f := {(~x, y) ∈ N^{n+1} | f(~x) ≃ y} .
Remark:
• The notion "graph" used here has nothing to do with the notion of "graph" in graph theory.
• The graph G_f is essentially the graph we draw when visualising f. Take as an example
f : N ∼→ N , f(x) = x/2, if x is even; f(x) undefined, if x is odd.
Then we can draw f as follows:

[Figure: plot of f, with a cross at (x, x/2) for each even x from 0 to 6.]

In this example we have

G_f = {(0, 0), (2, 1), (4, 2), (6, 3), ...} .
These are exactly the coordinates of the crosses in the picture:

[Figure: the same plot, with the crosses labelled (0,0), (2,1), (4,2), (6,3).]

Definition 8.2
• A predicate A ⊆ N^n is recursively enumerable, in short r.e., if there exists a partial recursive function f : N^n ∼→ N s.t.
A = dom(f) .

• Sometimes recursively enumerable predicates are also called
– semi-decidable or
– semi-computable or
– partially computable.

Lemma 8.3
(a) Every recursive predicate is r.e.
(b) The halting problem, defined as
Halt^n(e, ~x) :⇔ {e}^n(~x) ↓ ,
is r.e., but not recursive.
Proof:
(a) Assume A ⊆ N^k is decidable. Then N^k \ A is recursive, therefore its characteristic function χ_{N^k\A} is recursive as well. Define
f : N^k ∼→ N , f(~x) :≃ (µy. χ_{N^k\A}(~x) ≃ 0) .
Note that y doesn't occur in the body of the µ-expression. Then we have:
• If A(~x), then
χ_{N^k\A}(~x) ≃ 0 ,
so
f(~x) ≃ (µy. χ_{N^k\A}(~x) ≃ 0) ≃ 0 ,
especially f(~x) ↓.
• If (N^k \ A)(~x), then
χ_{N^k\A}(~x) ≃ 1 ,
so there exists no y s.t. χ_{N^k\A}(~x) ≃ 0, therefore
f(~x) ≃ (µy. χ_{N^k\A}(~x) ≃ 0) is undefined ,
especially f(~x) ↑.
So we get
A(~x) ⇔ f(~x) ↓ ⇔ ~x ∈ dom(f) ,
and A = dom(f) is r.e.
(b) We have
Halt^n(e, ~x) :⇔ f_n(e, ~x) ↓ ,
where f_n is partial recursive as in Sect. 5, s.t.
{e}^n(~x) ≃ f_n(e, ~x) .
So
Halt^n = dom(f_n) is r.e.
We have seen above that Halt^n is non-computable, i.e. not recursive.

Theorem 8.4 (The sets W^n_e.) There exist r.e. predicates W^n ⊆ N^{n+1} s.t., if we define
W^n_e := {(x_0, ..., x_{n-1}) ∈ N^n | W^n(e, x_0, ..., x_{n-1})} ,
then we have the following:
• Each of the predicates W^n_e ⊆ N^n is r.e.
• For each r.e. predicate P ⊆ N^n there exists an e ∈ N s.t. P = W^n_e, i.e.
∀~x ∈ N. P(~x) ⇔ W^n_e(~x) .
In other words, the r.e. sets P ⊆ N^n are exactly the sets W^n_e, where e ranges over all natural numbers.

Remark:

• W^n is therefore a universal recursively enumerable set, which encodes all other recursively enumerable predicates.
• The theorem expresses that we can assign to every recursively enumerable predicate A a natural number, namely the e s.t. A = W^n_e.
– Each number denotes one predicate.
– But several numbers denote the same predicate, i.e. there are e, e' s.t. e ≠ e' but W^n_e = W^n_{e'}. (This is since there are e, e' s.t. e ≠ e' but {e}^n = {e'}^n.)
Proof idea for Theorem 8.4: If A is r.e., then A = dom(f) for some partial recursive f. Let f = {e}^n. Then A = W^n_e.
Proof of Theorem 8.4: Let f_n be s.t.

∀e ∈ N. ∀~x ∈ N^n. f_n(e, ~x) ≃ {e}^n(~x) .
Define
W^n := dom(f_n) .
We have

~x ∈ W^n_e ⇔ (e, ~x) ∈ W^n
        ⇔ f_n(e, ~x) ↓
        ⇔ {e}^n(~x) ↓
        ⇔ ~x ∈ dom({e}^n) .
Therefore
W^n_e = dom({e}^n) .
W^n is r.e., since f_n is partial recursive. Furthermore, we have for any set A ⊆ N^n:
A is r.e.  iff  A = dom(f) for some partial recursive f
         iff  A = dom({e}^n) for some e ∈ N
         iff  A = W^n_e for some e ∈ N.
This shows the assertion.

8.3 Characterisations of the Partial Recursive Functions

Theorem 8.5 (Characterisation of the recursively enumerable predicates). Let A ⊆ N^n. The following are equivalent:
(i) A is r.e.
(ii) A = {~x | ∃y. R(~x, y)} for some primitive-recursive predicate R.
(iii) A = {~x | ∃y. R(~x, y)} for some recursive predicate R.
(iv) A = {~x | ∃y. R(~x, y)} for some recursively enumerable predicate R.
(v) A = ∅ or
A = {(f_0(x), ..., f_{n-1}(x)) | x ∈ N}
for some primitive-recursive functions

f_i : N → N .
(vi) A = ∅ or
A = {(f_0(x), ..., f_{n-1}(x)) | x ∈ N}
for some recursive functions
f_i : N → N .
Remark:
(a) We can summarise Theorem 8.5 by saying that there are essentially three equivalent ways of defining that A ⊆ N^n is r.e.:
• A = dom(f) for some partial recursive f;
• A = ∅ or A is the image of primitive-recursive/recursive functions f_0, ..., f_{n-1};
• A = {~x | ∃y. R(~x, y)} for some primitive-recursive/recursive/r.e. R.
(b) Consider the special case n = 1. Taking (i), (v), (vi) we get the following equivalence:

• Let A ⊆ N. Then the following are equivalent:
– A is r.e.
– A = ∅ or A = ran(f) for some primitive-recursive f : N → N. (This is (v).)
– A = ∅ or A = ran(f) for some recursive f : N → N. (This is (vi).)

This means that A ⊆ N is r.e. iff A = ∅ or there exists a (primitive-)recursive function f which enumerates all its elements. This explains the name "recursively enumerable predicate".
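The enumeration behind (v) can be made concrete for n = 1: given a predicate R with A = {x | ∃y. R(x, y)} and a fixed element of A, the function f below enumerates A by decoding its input as a pair (a candidate and a witness). Here R, the pairing projections and the example set (the perfect squares, with default element 0) are our own choices for illustration:

```java
import java.util.Set;
import java.util.TreeSet;

// Enumerating an r.e. set: f(z) = π0(z) if R(π0(z), π1(z)), else a fixed element of A.
class EnumDemo {
    // inverse of the Cantor pairing pair(x, y) = (x+y)(x+y+1)/2 + x
    static int[] unpair(int z) {
        int w = (int) ((Math.sqrt(8.0 * z + 1) - 1) / 2);
        int t = w * (w + 1) / 2;
        return new int[] { z - t, w - (z - t) };
    }
    static boolean R(int x, int y) { return y * y == x; }  // A = perfect squares
    static int f(int z) {
        int[] p = unpair(z);
        return R(p[0], p[1]) ? p[0] : 0;                   // 0 is a fixed element of A
    }
    public static void main(String[] args) {
        Set<Integer> seen = new TreeSet<>();
        for (int z = 0; z < 2000; z++) seen.add(f(z));
        System.out.println(seen);   // an initial segment of the perfect squares
    }
}
```

Every value of f lies in A, and every element of A appears as f(z) for the z coding it together with its witness, so ran(f) = A.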

Proof of Theorem 8.5:
(i) ⇒ (ii):
Proof idea for (i) ⇒ (ii): If A is r.e., A = dom(f), then
A(~x) ⇔ f(~x) ↓
     ⇔ ∃y. the Turing machine for computing f(~x) terminates after y steps
     ⇔ ∃y. R(~x, y) ,
where
R(~x, y) ⇔ the Turing machine for computing f(~x) terminates after y steps .
R is primitive-recursive.
Detailed proof of (i) ⇒ (ii): (The actual predicate R we will take will be slightly different from that in the proof idea – it is technically easier to prove the theorem this way.) If A is r.e., then for some partial recursive function f : N^n ∼→ N we have
A = dom(f) .

Let f = {e}^n.
By Kleene's Normal Form Theorem there exist a primitive-recursive function U : N → N and a primitive-recursive predicate T_n ⊆ N^{n+2} s.t.
{e}^n(~x) ≃ U(µy. T_n(e, ~x, y)) .
Therefore

A(~x) ⇔ ~x ∈ dom(f)
     ⇔ ~x ∈ dom({e}^n)
     ⇔ U(µy. T_n(e, ~x, y)) ↓
     ⇔ µy. T_n(e, ~x, y) ↓    (U primitive-recursive, therefore total)
     ⇔ ∃y. T_n(e, ~x, y)
     ⇔ ∃y. R(~x, y) ,
where
R(~x, y) ⇔ T_n(e, ~x, y) .
Now R is primitive-recursive, and

A = {~x | ∃y. R(~x, y)} .
(ii) ⇒ (iii): Trivial.
(iii) ⇒ (iv): By Lemma 8.3.
(iv) ⇒ (ii): Assume
A = {~x | ∃y. R(~x, y)} ,
where R is r.e. By "(i) ⇒ (ii)" there exists a primitive-recursive predicate S s.t.
R(~x, y) ⇔ ∃z. S(~x, y, z) .
Therefore

A = {~x | ∃y. ∃z. S(~x, y, z)}
  = {~x | ∃y. S(~x, π_0(y), π_1(y))}
  = {~x | ∃y. R'(~x, y)} ,
where R'(~x, y) :⇔ S(~x, π_0(y), π_1(y)) is primitive-recursive.
(ii) ⇒ (v):
Proof idea for (ii) ⇒ (v), special case n = 1: Assume
• A = {x ∈ N | ∃y. R(x, y)}, where R is primitive-recursive,
• A ≠ ∅,
• y ∈ A fixed.
Define f : N → N recursive,
f(x) = π_0(x), if R(π_0(x), π_1(x)),
f(x) = y, otherwise.

Then A = ran(f).
Detailed proof for (ii) ⇒ (v): Assume A is not empty and R is primitive-recursive s.t.

A = {~x | ∃y. R(~x, y)} .

Let ~y = y_0, ..., y_{n-1} be some fixed elements s.t. A(~y) holds. Define for i = 0, ..., n−1
f_i(x) := π^{n+1}_i(x), if R(π^{n+1}_0(x), π^{n+1}_1(x), ..., π^{n+1}_{n-1}(x), π^{n+1}_n(x)),
f_i(x) := y_i, otherwise.
The f_i are primitive-recursive. We show

A = {(f_0(x), ..., f_{n-1}(x)) | x ∈ N} .
"⊇": Assume x ∈ N, and show
A(f_0(x), ..., f_{n-1}(x)) .
• If R(π^{n+1}_0(x), π^{n+1}_1(x), ..., π^{n+1}_{n-1}(x), π^{n+1}_n(x)), then
∃z. R(π^{n+1}_0(x), π^{n+1}_1(x), ..., π^{n+1}_{n-1}(x), z) ,
therefore
(π^{n+1}_0(x), π^{n+1}_1(x), ..., π^{n+1}_{n-1}(x)) ∈ A ,
therefore
A(f_0(x), ..., f_{n-1}(x)) .

• If ¬R(π^{n+1}_0(x), π^{n+1}_1(x), ..., π^{n+1}_{n-1}(x), π^{n+1}_n(x)), then

f_i(x) = y_i ,

therefore, by A(~y),
A(f_0(x), ..., f_{n-1}(x)) .

So in both cases we get that
A(f_0(x), ..., f_{n-1}(x)) ,
so
{(f_0(x), ..., f_{n-1}(x)) | x ∈ N} ⊆ A .
"⊆": Assume
A(x_0, ..., x_{n-1}) ,
and show
∃z. (f_0(z) = x_0 ∧ ... ∧ f_{n-1}(z) = x_{n-1}) .
We have for some y
R(x_0, ..., x_{n-1}, y) .
Let
z = π^{n+1}(x_0, ..., x_{n-1}, y) .
Then we have
x_i = π^{n+1}_i(z) , y = π^{n+1}_n(z) ,
therefore
R(π^{n+1}_0(z), π^{n+1}_1(z), ..., π^{n+1}_{n-1}(z), π^{n+1}_n(z)) ,
therefore for i = 0, ..., n−1
f_i(z) = π^{n+1}_i(z) = x_i ,
therefore

(x_0, ..., x_{n-1}) = (f_0(z), ..., f_{n-1}(z)) ∈ {(f_0(x), ..., f_{n-1}(x)) | x ∈ N} ,
and we have
A ⊆ {(f_0(x), ..., f_{n-1}(x)) | x ∈ N} .
Therefore we have shown
A = {(f_0(x), ..., f_{n-1}(x)) | x ∈ N} ,
and the assertion follows.
(v) ⇒ (vi): Trivial.
(vi) ⇒ (i):
Proof idea for (vi) ⇒ (i), special case n = 1: If A = ran(f), where f is recursive, then A = dom(g), where

g(x) ≃ (µy. f(y) ≃ x) .
g is partial recursive.
Detailed proof for (vi) ⇒ (i): If A is empty, then A is recursive, therefore r.e. Assume
A = {(f_0(x), ..., f_{n-1}(x)) | x ∈ N}
for some recursive functions f_i. Define
f : N^n ∼→ N
s.t.
f(x_0, ..., x_{n-1}) :≃ µx. (f_0(x) ≃ x_0 ∧ ... ∧ f_{n-1}(x) ≃ x_{n-1}) .
f can be written as

f(x_0, ..., x_{n-1}) :≃ µx. ( ((f_0(x) ∸ x_0) + (x_0 ∸ f_0(x))) +
                            ((f_1(x) ∸ x_1) + (x_1 ∸ f_1(x))) + ··· +
                            ((f_{n-1}(x) ∸ x_{n-1}) + (x_{n-1} ∸ f_{n-1}(x))) ≃ 0 ) ,
therefore f is partial recursive. Further we have

A(x_0, ..., x_{n-1}) ⇔ ∃x ∈ N. x_0 = f_0(x) ∧ ... ∧ x_{n-1} = f_{n-1}(x)
                  ⇔ f(x_0, ..., x_{n-1}) ↓ ,
therefore
A = dom(f) is r.e.

Theorem 8.6 (Relationship between recursive and r.e. predicates). A ⊆ N^k is recursive iff both A and N^k \ A are r.e.
Proof ideas: "⇒" is easy.
For "⇐": Assume R, S primitive-recursive s.t.
A(~x) ⇔ ∃y. R(~x, y) ,
(N^k \ A)(~x) ⇔ ∃y. S(~x, y) .
In order to decide A, search simultaneously for a y s.t. R(~x, y) holds and for a y s.t. S(~x, y) holds. If we find a y s.t. R(~x, y) holds, then A(~x) holds. If we find a y s.t. S(~x, y) holds, then ¬A(~x) holds.
Detailed proof:
"⇒": If A is recursive, then both A and N^k \ A are recursive, therefore as well r.e.
"⇐": Assume A, N^k \ A are r.e. Then there exist primitive-recursive predicates R and S s.t.

A = {~x | ∃y. R(~x, y)} ,
N^k \ A = {~x | ∃y. S(~x, y)} .
By
A ∪ (N^k \ A) = N^k
it follows that
∀~x. ((∃y. R(~x, y)) ∨ (∃y. S(~x, y))) ,
therefore as well
∀~x. ∃y. (R(~x, y) ∨ S(~x, y)) .    (∗)
Define
h : N^k → N , h(~x) :≃ µy. (R(~x, y) ∨ S(~x, y)) .
h is partial recursive. By (∗), h is total, so h is recursive. We show
A(~x) ⇔ R(~x, h(~x)) .
If A(~x), then
∃y. R(~x, y)
and
~x ∉ (N^k \ A) ,
therefore
¬∃y. S(~x, y) .
Therefore we have for the y found by h(~x) that R(~x, y) holds, i.e.

R(~x, h(~x)) .

On the other hand, if R(~x, h(~x)) holds then

∃y. R(~x, y) ,
therefore A(~x).
Now we get:
A = {~x | R(~x, h(~x))} is recursive.

Theorem 8.7 (Characterisation of partial recursive functions). Let f : N^n ∼→ N. Then
f is partial recursive ⇔ G_f is r.e.
Proof idea for "⇐": Assume R primitive-recursive s.t.

G_f(~x, y) ⇔ ∃z. R(~x, y, z) .

In order to compute f(~x), search for a y s.t. R(~x, π_0(y), π_1(y)) holds. f(~x) will be the first projection of this y.
Detailed proof:
"⇒": Assume f is partial recursive. Then f = {e}^n for some e ∈ N. By Kleene's Normal Form Theorem we have
f(~x) ≃ U(µy. T_n(~x, y))
for some primitive-recursive relation T_n ⊆ N^{n+1} and some primitive-recursive function U : N → N. Therefore

(~x, y) ∈ G_f ⇔ (f(~x) ≃ y) ⇔ ∃z. (T_n(~x, z) ∧ (∀z' < z. ¬T_n(~x, z')) ∧ U(z) = y) ,
therefore G_f is r.e.
"⇐": If G_f is r.e., then there exists a primitive-recursive predicate R s.t.

f(~x) ≃ y ⇔ (~x, y) ∈ G_f ⇔ ∃z. R(~x, y, z) .

Therefore, for any z s.t. R(~x, π_0(z), π_1(z)) holds, we have that

f(~x) ≃ π_0(z) .
Therefore
f(~x) ≃ π_0(µu. R(~x, π_0(u), π_1(u))) ,
and f is partial recursive.
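The "⇐" direction for n = 1 amounts to: to compute f(x), search through the numbers until one decodes to a point on the graph. In the sketch below the graph predicate needs no extra witness z, so the search runs directly over y; the graph chosen (y = x²) is our own toy example:

```java
// Recovering a partial recursive function from its r.e. graph (sketch, n = 1).
class GraphDemo {
    static boolean inGraph(int x, int y) { return y == x * x; } // G_f(x, y)
    static int f(int x) {                                       // f(x) ≃ µy. G_f(x, y)
        for (int y = 0; ; y++) if (inGraph(x, y)) return y;
    }
    public static void main(String[] args) {
        System.out.println(f(5)); // prints 25
    }
}
```

For x outside dom(f) (impossible for this toy graph, but possible in general) the search would run for ever, exactly matching f(x) ↑.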

8.4 Closure Properties of the Recursively Enumerable Predicates

Lemma 8.8 The recursively enumerable predicates are closed under the following operations:
(a) Union: If A, B ⊆ N^n are r.e., so is A ∪ B.
(b) Intersection: If A, B ⊆ N^n are r.e., so is A ∩ B.
(c) Substitution by recursive functions: If A ⊆ N^n is r.e. and f_i : N^k → N are recursive for i = 0, ..., n−1, so is
C := {(y_0, ..., y_{k-1}) ∈ N^k | A(f_0(y_0, ..., y_{k-1}), ..., f_{n-1}(y_0, ..., y_{k-1}))} .
(d) (Unbounded) existential quantification: If D ⊆ N^{n+1} is r.e., so is
E := {(x_0, ..., x_{n-1}) ∈ N^n | ∃y. D(x_0, ..., x_{n-1}, y)} .
(e) Bounded universal quantification: If D ⊆ N^{n+1} is r.e., so is
F := {(x_0, ..., x_{n-1}, z) ∈ N^{n+1} | ∀y < z. D(x_0, ..., x_{n-1}, y)} .
Proof: Let A, B ⊆ N^n be r.e. Then there exist primitive-recursive relations R, S s.t.

A = {(x_0, ..., x_{n-1}) ∈ N^n | ∃y. R(x_0, ..., x_{n-1}, y)} ,
B = {(x_0, ..., x_{n-1}) ∈ N^n | ∃y. S(x_0, ..., x_{n-1}, y)} .
(a), (b): One can easily see that

A ∪ B = {(x_0, ..., x_{n-1}) ∈ N^n | ∃y. (R(x_0, ..., x_{n-1}, y) ∨ S(x_0, ..., x_{n-1}, y))} ,
A ∩ B = {(x_0, ..., x_{n-1}) ∈ N^n | ∃y. (R(x_0, ..., x_{n-1}, π_0(y)) ∧ S(x_0, ..., x_{n-1}, π_1(y)))} ,
therefore A ∪ B and A ∩ B are r.e.

(c):
C = {~y | A(f_0(~y), ..., f_{n-1}(~y))}
  = {~y | ∃z. R(f_0(~y), ..., f_{n-1}(~y), z)} is r.e.
(d) follows from Theorem 8.5.
(e): Assume T is a primitive-recursive predicate s.t.

D = {(x_0, ..., x_{n-1}, y) ∈ N^{n+1} | ∃z. T(x_0, ..., x_{n-1}, y, z)} .
Then we get
F = {(~x, y) | ∀y' < y. D(~x, y')}
  = {(~x, y) | ∀y' < y. ∃z. T(~x, y', z)}
  = {(~x, y) | ∃z. ∀y' < y. T(~x, y', (z)_{y'})} is r.e.,
where in the last line we used that

{(~x, y, z) | ∀y' < y. T(~x, y', (z)_{y'})} is primitive-recursive .

Lemma 8.9 The r.e. predicates are not closed under complement, i.e. there exists an r.e. predicate A ⊆ N^n s.t. N^n \ A is not r.e.
Proof: Halt^n is r.e. If N^{n+1} \ Halt^n were r.e., then by Theorem 8.6 Halt^n would be recursive, which is not the case.
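Closure under intersection can be sketched for n = 1: a witness for (A ∩ B)(x) is a pair of witnesses, one for A and one for B, coded into one number by the pairing function. The predicates R, S and the sets (even numbers and multiples of 3) are our own toy choices:

```java
// Sketch of Lemma 8.8 (b): semi-deciding A ∩ B by searching for a coded pair of witnesses.
class InterDemo {
    // inverse of the Cantor pairing pair(x, y) = (x+y)(x+y+1)/2 + x
    static int[] unpair(int z) {
        int w = (int) ((Math.sqrt(8.0 * z + 1) - 1) / 2);
        int t = w * (w + 1) / 2;
        return new int[] { z - t, w - (z - t) };
    }
    static boolean R(int x, int y) { return x == 2 * y; }  // A = even numbers
    static boolean S(int x, int y) { return x == 3 * y; }  // B = multiples of 3
    static boolean witnessAB(int x, int y) {               // R(x, π0(y)) ∧ S(x, π1(y))
        int[] p = unpair(y);
        return R(x, p[0]) && S(x, p[1]);
    }
    static int semiDecideAB(int x) {                       // terminates iff x ∈ A ∩ B
        for (int y = 0; ; y++) if (witnessAB(x, y)) return y;
    }
    public static void main(String[] args) {
        semiDecideAB(12);                                  // 12 = 2·6 = 3·4: terminates
        System.out.println("12 is in A ∩ B");
    }
}
```

For x ∉ A ∩ B the search never finds a coded pair and the procedure runs for ever, as a semi-decision procedure may.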

9 Lambda-definable Functions

10 Reducibility

11 Computational Complexity