
Chapter 7: The Grand Unified Theory of Computation
slides © 2017, David Doty
ECS 220: Theory of Computation
based on “The Nature of Computation” by Moore and Mertens

- Great unifications in science

Newton: apples and planets follow the same laws of motion and gravity

Maxwell: electricity and magnetism are part of a single electromagnetic field, and its oscillations are what we call light

Grand Unified Theory of Physics: (TODO: combine gravity with quantum mechanics)

Turing (and others): all powerful models of computation known at the time (e.g., general recursive functions, λ-calculus, Turing machines) have equivalent power

Computing with steam

“I wish to God that these calculations had been executed by steam!” -- Charles Babbage (while checking astronomical tables and finding errors)

Any “well-behaved” f:ℝ→ℝ has a Taylor series expansion around a ∈ ℝ

$$f(x) = f(a) + \frac{f'(a)}{1!}(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \frac{f'''(a)}{3!}(x-a)^3 + \frac{f''''(a)}{4!}(x-a)^4 + \dots$$

cos(x) is approximated within 0.02 in the interval [0,π/2] by the 4th order Taylor expansion:

$$\cos x \approx 1 - \frac{x^2}{2} + \frac{x^4}{24}$$
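The 0.02 error bound can be checked numerically. The following is a minimal sketch, assuming the expansion is taken around a = 0; the function names and the sampling grid are illustrative choices, not part of the slides.

```python
import math


def cos_taylor4(x):
    """4th-order Taylor polynomial of cos around a = 0."""
    return 1 - x**2 / 2 + x**4 / 24


def max_abs_error(samples=1000):
    """Largest |cos(x) - cos_taylor4(x)| on an evenly spaced grid over [0, pi/2]."""
    xs = (i * (math.pi / 2) / samples for i in range(samples + 1))
    return max(abs(math.cos(x) - cos_taylor4(x)) for x in xs)
```

On this grid the largest error occurs at the right endpoint x = π/2, where the cosine is zero and the polynomial is slightly positive.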

7.1: Babbage's Vision and Hilbert's Dream

Method of finite differences

If p(x) is a degree-n polynomial, then Δp(x) = p(x+1) – p(x) is a degree-(n-1) polynomial.

x                                        0   1   2   3   4    5    6    7    8
p(x) = x^3 + 2x^2 – x + 1                1   3  15  43  93  171  283  435  633
Δp(x) = p(x+1) – p(x) = 3x^2 + 7x + 2    2  12  28  50  78  112  152  198  250
ΔΔp(x) = Δp(x+1) – Δp(x) = 6x + 10      10  16  22  28  34   40   46   52   58
ΔΔΔp(x) = ΔΔp(x+1) – ΔΔp(x) = 6          6   6   6   6   6    6    6    6    6
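The method of finite differences lets a machine tabulate a polynomial using additions only. Here is a small sketch of that idea; the function names are hypothetical, and the register-update order is one possible choice rather than a description of Babbage's actual mechanism.

```python
def leading_differences(values):
    """Given p(0), p(1), ..., p(n), return [p(0), Δp(0), ΔΔp(0), ...]."""
    leading = []
    row = list(values)
    while row:
        leading.append(row[0])
        # Next row of the difference table: consecutive differences.
        row = [b - a for a, b in zip(row, row[1:])]
    return leading


def tabulate(leading, count):
    """Produce p(0), p(1), ..., p(count-1) using only additions."""
    registers = list(leading)
    outputs = []
    for _ in range(count):
        outputs.append(registers[0])
        # Add each register's neighbour into it, lowest order first, so that
        # every addition uses the neighbour's value from before this step.
        for i in range(len(registers) - 1):
            registers[i] += registers[i + 1]
    return outputs
```

For the cubic p(x) = x^3 + 2x^2 – x + 1, the four values p(0), …, p(3) determine the registers, and `tabulate(leading_differences([1, 3, 15, 43]), 9)` regenerates the p(x) row of the table.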

The Difference Engine (mid 1800s)

[Figures: original design; replica]

detailed explanation of mechanism: https://www.youtube.com/watch?v=PFMBU17eo_4

The analytical engine (hypothetical design)

• Babbage: “The whole of arithmetic now appeared within the grasp of mechanism. A vague glimpse even of an Analytical Engine at length opened out, and I pursued with enthusiasm the shadowy vision.”
• store (memory): 1000 50-digit integers
• mill (CPU): addition, subtraction, multiplication, division
• programmability: punched cards, as used by the Jacquard loom
  • variable cards: copy number into mill
  • operation card: instruction for mill to execute
  • combinatorial cards: if-then, goto, for loop

The analytical engine: more than number crunching

• Ada Lovelace: “Again, it might act upon other things besides number, were objects found whose mutual fundamental relations could be expressed by those of the abstract science of operations… Supposing, for instance, that the fundamental relations of pitched sounds in the science of harmony and of musical composition were susceptible of such expression and adaptations, the engine might compose elaborate and scientific pieces of music of any degree of complexity or extent.”
• i.e., computation is not necessarily about numbers, but about manipulation in general

algorithmically-composed music: https://www.youtube.com/watch?v=Cbb08ifTzUk

The analytical engine: more than number crunching

• Ada Lovelace wrote a program for the Analytical Engine with two nested for loops to calculate Bernoulli numbers.
• Machines were not precise enough in the 1800s to enable the construction of a reliable Analytical Engine.
• The idea of a universally programmable computer came up again independently in the 1930s, to explore a much more abstract question about the foundations of mathematics.

The foundations of mathematics

• Hilbert’s problems (1900)
  • 23 problems presented in 1900 to the International Congress of Mathematicians
  • Hilbert’s 10th problem: “Specify a procedure which, in a finite number of operations, enables one to determine whether or not a given Diophantine equation [a polynomial equation with integer coefficients] with an arbitrary number of variables has an integer solution.”
  • Note: he’s not bothering even to ask whether one exists… such optimism!
• Even more optimism in 1928: the Entscheidungsproblem (“the decision problem”)
  • “The Entscheidungsproblem is solved if one knows a procedure that allows one to decide the validity of a given logical expression by a finite number of operations.”
  • e.g., decide if this is true: ∃x,y,z,n ∈ ℕ \ {0}: (n ≥ 3) ∧ (x^n + y^n = z^n)
• Subquestion for both Hilbert’s 10th problem and the Entscheidungsproblem: what exactly is a “procedure”?

The foundations of mathematics

• Goal: axiomatic foundation for mathematics: reduce all of mathematics to set theory and logic, creating a formal system powerful enough to prove all mathematical facts
• Other mathematical objects can be defined using only sets:

ℕ:
  0 = ∅ = {}
  1 = {0} = {{}}
  2 = {1} = {{{}}}
  3 = {2} = {{{{}}}}
  …
ordered pairs: (a,b) is defined as { {a}, {a,b} }
sequences: [x_0, x_1, x_2, …] is defined by the function f(n) = x_n
functions: f(n) = n^2 is defined by its graph { (n,n^2) | n ∈ ℕ }
ℝ: a pair of sequences of digits, e.g., [3] and [1,4,1,5,9,…]
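These set-only encodings can be tried out directly with Python's immutable `frozenset`. This is an illustrative sketch; the helper names are made up here, and it follows the slide's convention that each number is the set containing only its predecessor.

```python
def zero():
    """0 is the empty set."""
    return frozenset()


def successor(n):
    """n + 1 is the set whose only element is n, e.g. 1 = {0}."""
    return frozenset({n})


def pair(a, b):
    """Ordered pair (a, b) encoded as { {a}, {a, b} }."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def to_int(n):
    """Recover the ordinary integer by counting nesting depth."""
    depth = 0
    while n:
        (n,) = n  # unpack the single element
        depth += 1
    return depth
```

The encoding of pairs is order-sensitive even though sets are not: `pair(a, b)` and `pair(b, a)` are different sets whenever a ≠ b.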

Russell’s paradox

• One stumbling block: what objects exactly can “sets” contain?
  • other sets?
  • themselves? i.e., can we have a set S such that S ∈ S?
• Define R = {S: S ∉ S}. Then R ∈ R ⇔ R ∉ R.
• Solution: restricted set comprehension… one must define a larger set from which elements are being taken, i.e., for some existing set A we can write R = {S ∈ A: S ∉ S}

Zermelo-Fraenkel

Axiom 1. (Extensionality) If two sets have the same members, then they are equal:
  ∀x ∀y [∀z (z ∈ x ⇔ z ∈ y) ⇒ x = y]
Axiom 2. (Comprehension) Given any set z and any property ϕ, there is a subset of z consisting of those elements of z with the property ϕ: for any formula ϕ with free variables among x, z, w_1, …, w_n we have an axiom
  ∀z ∀w_1 … ∀w_n ∃y ∀x (x ∈ y ⇔ x ∈ z ∧ ϕ)
Axiom 3. (Pairing) For any sets x, y there is a set which has them as members:
  ∀x ∀y ∃z (x ∈ z ∧ y ∈ z)
Axiom 4. (Union) For any family F of sets, we can form a new set A which has as elements all elements which are in at least one member of F (maybe A has even more elements):
  ∀F ∃A ∀Y ∀x (x ∈ Y ∧ Y ∈ F ⇒ x ∈ A)
Axiom 5. (Power set) For any set x, there is a set which has as elements all subsets of x, and again possibly has more elements:
  ∀x ∃y ∀z (z ⊆ x ⇒ z ∈ y)
Axiom 6. (Infinity) There is a set that has infinitely many elements:
  ∃x [∅ ∈ x ∧ ∀y ∈ x (y ∪ {y} ∈ x)]
Axiom 7. (Replacement) If a function has domain a set, then its range is also a set: for each formula ϕ with free variables among x, y, A, w_1, …, w_n, the following is an axiom:
  ∀A ∀w_1 … ∀w_n [∀x ∈ A ∃!y ϕ ⇒ ∃Y ∀x ∈ A ∃y ∈ Y ϕ]
Axiom 8. (Foundation) Every nonempty set x has a member y which has no elements in common with x:
  ∀x [x ≠ ∅ ⇒ ∃y ∈ x (x ∩ y = ∅)]
Axiom 9. (Choice) Allows one to pick out elements from each of an infinite family of sets:
  ∀A [∀x ∈ A (x ≠ ∅) ∧ ∀x ∈ A ∀y ∈ A (x ≠ y ⇒ x∩y = ∅) ⇒ ∃B ∀x ∈ A ∃!y (y ∈ x ∧ y ∈ B)]

(Slide note: gives a way to start with one simple set, and then build other sets from that.)

Taken from http://euclid.colorado.edu/~monkd/m6730/gradsets02.pdf

Computation: building up from simple parts

• General theme: define some “atomic” operations, and define “computable” operations to be any way of composing them.
  • λ-calculus (Church)
  • general recursive functions (Gödel, Herbrand)
    • generalization of what’s now called “primitive recursive functions” (Skolem)
  • Turing machines (Turing)
• All these definitions are equivalent. Church-Turing Thesis: these models capture anything that could “be reasonably called a computation”
• Turing, 1936: Halting problem is undecidable
  • Hence the Entscheidungsproblem is undecidable, since the statement “This Turing machine halts” can easily be encoded as a mathematical formula.
• textbook: “Hilbert may have found the phenomenon of undecidability disappointing, but we find it liberating. Universal computation is everywhere, even in seemingly simple mathematical questions. It lends those questions an infinite richness, and proves that there is no mechanical procedure that can solve them once and for all. Mathematics will never end.”

If programs can run other programs, some programs must be non-halting

• Operating systems/Web browsers with Javascript
• Any decent programming language has an interpreter, i.e., a universal program:
  • U(P,x) = P(x)
• Now consider defining Boolean-returning program V that, on input P, runs P(P) and negates the answer, so V(P) = !P(P)
• What is the output of V(V)?
  • Not well-defined: i.e., V(V) doesn’t halt
• Any programming language powerful enough to write an interpreter for itself must contain programs that don’t halt
• Universality implies non-halting programs
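The self-application argument can be reproduced in a few lines of Python. This is a sketch under one assumption worth stating: a real Python process has a finite call stack, so the "non-halting" run of V(V) shows up as a `RecursionError` rather than an endless computation. The helper `v_on_itself_answers` is a hypothetical name added for illustration.

```python
def U(P, x):
    """Universal program: run program P on input x."""
    return P(x)


def V(P):
    """Run P on itself and negate the Boolean answer: V(P) = !P(P)."""
    return not U(P, P)


def v_on_itself_answers():
    """Return True if V(V) produces an answer, False if it recurses without end."""
    try:
        V(V)
    except RecursionError:
        # V(V) calls V(V) again before it can return anything.
        return False
    return True
```

On an ordinary total program such as `lambda p: True`, V behaves as expected and returns `False`; only the self-application has no well-defined answer.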

7.2: Universality and Undecidability

Relative sizes of sets

• Note that for all finite sets A,B, |A| ≥ |B| ⇔ ∃ onto function f:A→B
• Let’s take this as a definition of |A| ≥ |B| for any sets A,B
  • |{nonnegative even integers}| ≥ |ℕ|?
  • |[0,1]| ≥ |ℝ|?
  • |P(ℕ)| ≥ |[0,1]|? why?
  • |ℕ| ≥ |P(ℕ)|?
• Theorem (Cantor, 1874): |ℕ| < |P(ℕ)|, i.e., |ℕ| ≤ |P(ℕ)| and |ℕ| ≱ |P(ℕ)|

Diagonalization

Theorem (Cantor, 1874): No function f:ℕ→P(ℕ) is onto, i.e., for every f:ℕ→P(ℕ), there exists S ⊆ ℕ such that, for all n ∈ ℕ, f(n) ≠ S.

Proof. For all n let S_n = f(n) ⊆ ℕ. Define S = { n ∈ ℕ | n ∉ S_n }. For every n, the sets S and S_n disagree on whether they contain n, so S ≠ S_n. Hence S ∉ range(f).

        0  1  2  3  4  …
S_0     0  0  0  0  0
S_1     1  1  1  1  1
S_2     1  0  1  0  1
S_3     0  1  0  1  0
S_4     0  0  1  1  0
…
S       1  0  0  0  1  …

(Rows S_0, S_1, … make up range(f); a 1 indicates that the number is in the set.)

ℕ is countably infinite, but P(ℕ) is uncountably infinite: |ℕ| < |P(ℕ)|.
Nothing special about ℕ: for any set A, |A| < |P(A)|.
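The diagonal construction is easy to run on a finite corner of such a table. The sketch below uses the five rows shown on the slide; the names `ROWS` and `diagonal_set` are illustrative, and a finite table can only hint at the infinite argument.

```python
def diagonal_set(rows):
    """rows[n][m] == 1 means m is in S_n.

    Returns the membership bits of S = { n : n not in S_n }, i.e. the
    flipped diagonal of the table.
    """
    return [1 - rows[n][n] for n in range(len(rows))]


ROWS = [
    [0, 0, 0, 0, 0],  # S_0
    [1, 1, 1, 1, 1],  # S_1
    [1, 0, 1, 0, 1],  # S_2
    [0, 1, 0, 1, 0],  # S_3
    [0, 0, 1, 1, 0],  # S_4
]

S = diagonal_set(ROWS)
```

By construction S differs from row n in column n, so it cannot equal any row of the table.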

Diagonalization in computability

Consider defining a Boolean-returning program V where V(P) = !P(P). What is the output of V(V)? It is not well-defined, because V is itself a row in the table, so V(V) would have to equal !V(V). In other words, V(V) doesn’t halt, and the entries of the table below cannot all be true/false.

                          inputs
            P1        P2        P3        P4        …
programs
P1          P1(P1)    P1(P2)    P1(P3)    P1(P4)    …
P2          P2(P1)    P2(P2)    P2(P3)    P2(P4)
P3          P3(P1)    P3(P2)    P3(P3)    P3(P4)
P4          P4(P1)    P4(P2)    P4(P3)    P4(P4)
…           …
V           !P1(P1)   !P2(P2)   !P3(P3)   !P4(P4)   …

The Halting Problem

• HALTING
  • Given: program P and input x
  • Question: will P(x) halt? i.e., is the output of P(x) well-defined?
• DEC is the set of decision problems decidable by some program (that halts with an answer on all input instances)
• Theorem: HALTING ∉ DEC (Turing, 1936)

What if HALTING were decidable?

• Suppose there is a program HALTS such that, for every program P and input string x, HALTS(P,x) = true ⇔ P(x) halts.
• Would be handy for debugging!
• But much more profound uses… consider calling HALTS(P, "") for P =

  P(input):  # ignore input
    for t = 3 to ∞:
      for n = 3 to t:
        for x = 1 to t:
          for y = 1 to t:
            for z = 1 to t:
              if x^n + y^n == z^n:
                return (x,y,z,n)

  P halts if and only if Fermat’s Last Theorem has a counterexample, so HALTS would settle the theorem.
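A bounded version of this search can actually be run. The sketch below replaces the unbounded outer loop with a cutoff `t_max`, and adds a hypothetical `min_exponent` parameter so that the search can be sanity-checked on exponent 2, where solutions do exist.

```python
def fermat_search(t_max, min_exponent=3):
    """Search for x^n + y^n == z^n with n >= min_exponent and all values <= t_max.

    Returns the first (x, y, z, n) found, or None if the bounded search fails.
    """
    for t in range(min_exponent, t_max + 1):
        for n in range(min_exponent, t + 1):
            for x in range(1, t + 1):
                for y in range(1, t + 1):
                    for z in range(1, t + 1):
                        if x**n + y**n == z**n:
                            return (x, y, z, n)
    return None
```

With the default exponent bound the search comes back empty for any cutoff, as Fermat's Last Theorem guarantees; the point of the slide is that a halting decider would tell us this without running the loop at all.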

HALTING ∉ DEC

Suppose for the sake of contradiction that the program Halts(P,x) exists. Define the program

  Catch22(P):
    if Halts(P,P):
      loop forever
    else:
      return true

Then Catch22(Catch22) executes:

    if Halts(Catch22,Catch22):
      loop forever
    else:
      return true

What happens if we run Catch22(Catch22)? It halts if and only if Catch22(Catch22) does not halt, a contradiction, so the program Halts does not exist, i.e., HALTING ∉ DEC.
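One way to get a feel for the argument is to take any candidate halting tester and build the program that defeats it. In the sketch below, `refute` is a hypothetical helper: it never runs the potentially infinite loop, but reads off from the definition of Catch22 what the program would actually do given the candidate's prediction.

```python
def refute(halts):
    """Given a claimed halting tester halts(P, x), exhibit its mistake.

    Returns (predicted, actual): what the tester says about
    catch22(catch22), and what catch22(catch22) really does.
    """
    def catch22(p):
        if halts(p, p):
            while True:  # loop forever
                pass
        return True

    predicted = halts(catch22, catch22)
    # By construction, catch22(catch22) loops exactly when the tester
    # predicts halting, and returns exactly when it predicts looping.
    actual = not predicted
    return predicted, actual
```

Whatever Boolean the candidate returns on this input, the program does the opposite, so no total tester can be right everywhere.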

Showing undecidability using reductions

                                NP-hardness                            Undecidability
“original” hard problem         CIRCUIT-SAT                            HALTING
reduction                       polynomial-time computable function    computable function
                                f: {0,1}* → {0,1}*                     f: {0,1}* → {0,1}*
“typical” input/output types    formula, graph, list of integers       program

Another undecidable problem

FORTY-TWO:
  input: program Q
  question: is there a y such that Q(y) = 42?

To show FORTY-TWO is undecidable, we show that HALTING ≤ FORTY-TWO.

Reduction: Given input P and x, the reduction outputs the program Q_{P,x}:

  def Q(y):
    run P(x)
    return 42

e.g., if P = gcd and x = (60,24), where

  def gcd(a,b):
    if b==0:
      return a
    else:
      return gcd(b,a%b)

then Q_{P,x} =

  def Q(y):
    gcd(60,24)
    return 42

  def gcd(a,b):
    if b==0:
      return a
    else:
      return gcd(b,a%b)

Then P(x) halts ⇔ (∃y) Q_{P,x}(y)=42.
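In a language with closures, the reduction itself is a very short program. The following sketch uses a hypothetical function name, `reduce_halting_to_forty_two`, and represents the input x as a tuple of arguments.

```python
def reduce_halting_to_forty_two(P, x):
    """Map a HALTING instance (P, x) to a FORTY-TWO instance Q."""
    def Q(y):
        P(*x)      # run P(x); if this never returns, neither does Q
        return 42  # reached only if P(x) halts
    return Q


def gcd(a, b):
    """Euclid's algorithm, the example program from the slide."""
    return a if b == 0 else gcd(b, a % b)


Q = reduce_halting_to_forty_two(gcd, (60, 24))
```

The reduction only builds Q and never runs P, which is why it is computable even though HALTING is not. Since `gcd(60, 24)` halts, this particular Q returns 42 on every input y.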

Computable enumerability

• If a program P halts on input x, there is a finite proof of that fact.
  • HALTING(P,x) = ∃t HALTS-IN-TIME(P,x,t)
  • HALTS-IN-TIME is decidable
• Any decision problem with this property is called computably enumerable (c.e.); the set of c.e. problems is called CE.
• DEC vs. CE is analogous to P vs. NP
  • no time bounds defining DEC and CE
• a.k.a., recursively enumerable (r.e.)
• a problem is co-c.e. if its complement is c.e.

Computable enumerability

• Imagine a program P (an enumerator) with no input that runs forever, occasionally printing a number. The set of all printed numbers is printed(P).
• Exercise: Let L ⊆ ℕ. There is a program P such that L = printed(P) ⇔ L is c.e.
  • Recall: L is c.e. if ∃ program Q: L(x)=true ⇔ (∃w) Q(x,w)=true

• forward direction?

  def Q(x,w):
    i = 0
    for each y in printed(P):
      i = i+1
      if i == w:
        return x == y

• reverse direction? (first attempt: gets stuck forever on the first x ∉ L)

  def P():
    for x = 0,1,2,…:
      w = 0
      while !Q(x,w):
        w = w+1
      print x

• “dovetailing”

  def P():
    for t = 0,1,2,…:
      for x = 0 to t:
        for w = 0 to t:
          if Q(x,w):
            print x
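Dovetailing can be written as a Python generator. In this sketch the verifier `is_square_witness` is a made-up example of a decidable Q, and the `seen` set is an addition that suppresses the repeated printing the slide's version would do.

```python
from itertools import count, islice


def enumerate_ce(Q):
    """Yield every x for which some witness w makes Q(x, w) true.

    Stage t examines only x <= t and w <= t, so no single x can block
    the search, and every member of the set is eventually produced.
    """
    seen = set()
    for t in count():
        for x in range(t + 1):
            if x in seen:
                continue
            if any(Q(x, w) for w in range(t + 1)):
                seen.add(x)
                yield x


def is_square_witness(x, w):
    """Hypothetical decidable verifier: w witnesses that x is a perfect square."""
    return w * w == x


first_five = list(islice(enumerate_ce(is_square_witness), 5))
```

The generator runs forever if left alone, like the enumerator on the slide; `islice` is used here only to take a finite prefix of its output.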

Arithmetical hierarchy

• We saw HALTING ∈ CE, but HALTING ∉ DEC
• Corollary: DEC ⊊ CE, DEC ⊊ coCE, and CE ≠ coCE
• Exercise: DEC = CE ∩ coCE

• Define Σ_0 = Π_0 = DEC, Σ_1 = CE, Π_1 = coCE, and
  • A ∈ Σ_k if there is B ∈ Π_{k-1} such that, for all x ∈ {0,1}*, A(x) = (∃w) B(x,w)
  • Π_k = coΣ_k
• Unlike the polynomial hierarchy, we know this one is proper: Σ_k ⊊ Σ_{k+1} and Σ_k ≠ Π_k

TOTAL:
  Given: program P
  Question: does P halt on every input?

TOTAL(P) = true ⇔ (∀x ∈ {0,1}*)(∃t ∈ ℕ) P(x) halts in ≤ t steps, i.e., TOTAL ∈ Π_2

From undecidable problems to unprovable truths

• formal system: set of axioms
  • including rules of inference (e.g., modus ponens, “[A and (A ⇒ B)] ⇒ B”)
• theorem: statement that is provable by applying rules of inference to axioms
• consistency: formal system is consistent if no contradiction can be proved:
  • for each statement T, at most one of T or !T is provable
• completeness: formal system is complete if all statements can be resolved by proof:
  • for each statement T, at least one of T or !T is provable
• consistent and complete: for each statement T, exactly one of T or !T is provable.
  • A consistent and complete formal system fulfills Hilbert’s dream… all mathematical truths can be discovered by exploring the system’s proofs.
• Technicality… soundness: formal system is sound if all provable statements are true
  • possible for a system to be consistent but unsound... perhaps everything it proves is false!

Gödel’s Incompleteness Theorem

• No (expressive enough) formal system is both complete and consistent.
• Intuitively, Gödel showed that any sufficiently expressive formal system can be used to express a self-referential statement:
  • “This statement cannot be proved.”
  • If false, then it can be proved, so the system is not consistent.
  • If true, then it cannot be proved, so the system is not complete.

(As currently stated, these are mere tricks of language.)

Modern-day proof of (weak) Incompleteness Theorem

Intuition: since the halting problem is undecidable, infinitely many statements of the form “P(x) does not halt” are true but unprovable.

• Suppose for contradiction that the formal system is
  • expressive: statements about programs can be encoded in it
  • complete: for every statement T, either T or !T is provable.
  • consistent: for every statement T, either T or !T is not provable.
  • sound: for every statement T, if T is provable then it is true.
• To solve the problem HALTING, given P and x, do two searches in parallel:
  • search for a proof that P(x) halts
  • search for a proof that P(x) does not halt
• Whether P(x) halts or not, completeness and soundness imply that this is provable.
• One search will succeed, contradicting undecidability of the halting problem.
• So no system expressive enough to reason about programs is both complete and sound.

More details about the distinction between consistency and soundness: http://www.scottaaronson.com/blog/?p=710

Consistent Guessing

CONSISTENT-GUESSING:
  input: program P
  problem: if P(ε) accepts, then accept; if P(ε) rejects, then reject; and if P(ε) runs forever, then either accept or reject (but halt!)

• CONSISTENT-GUESSING is uncomputable. (Diagonalization shows this, similarly to the original halting problem.)

Modern-day proof of Incompleteness Theorem (fixed)

• If P(ε) accepts, then a transcript of the steps of computation is a proof. (*)
  Thus if ACCEPTS(P,ε) is true, then it is provable, and similarly for REJECTS(P,ε).
• Suppose the formal system is
  • expressive: statements about programs can be encoded in it
  • complete: for every statement T, either T or !T is provable.
  • consistent: for every statement T, either T or !T is not provable.
• To solve CONSISTENT-GUESSING with program C, given program P, do two searches in parallel:
  1. search for a proof that P(ε) accepts; accept if one is found
  2. search for a proof that P(ε) does not accept; reject if one is found
  Consistency and (*) imply that if P(ε) rejects, then there is no proof of 1 (so by completeness there is a proof of 2), and if P(ε) accepts, then there is no proof of 2 (so by completeness there is a proof of 1). So if P(ε) accepts, then C accepts, and if P(ε) rejects, then C rejects. So C solves CONSISTENT-GUESSING.
• One of the searches will succeed and be correct, contradicting the uncomputability of CONSISTENT-GUESSING.
• So no system expressive enough to reason about programs is both consistent and complete.

Counter machines

• Finite state machine with a fixed number of counters c1, c2, …, ck, each holding a nonnegative integer.

• Start with inputs n_1, n_2, …, n_l ∈ ℕ as values of c_1, c_2, …, c_l, and c_{l+1}, …, c_k start at 0.
• Finite-state machine, where each state is one of:
  • inc c: increment counter c
  • dec c: decrement counter c; no effect if c = 0
  • if c=0 goto i: if counter c is 0, then jump to state i
  • goto i (can be shorthand for if c=0 goto i for unused c)
• may also have accept/reject semantics, or interpret the final value of some counter as the output
• called “additive” counter machines in the textbook

7.6: Computation Everywhere

Example counter machines

input a, φ(a) = “a is odd”
  1. if a=0 goto 7
  2. dec a
  3. if a=0 goto 6
  4. dec a
  5. goto 1
  6. accept
  7. reject

input a, f(a) = 2a
  1. if a=0 goto 6
  2. dec a
  3. inc b
  4. inc b
  5. goto 1
  6. end

input a, f(a) = [a/2]
  1. while a>0:
  2.   dec a
  3.   dec a
  4.   inc b

The while loop

  1. while a>0:
  2.   (loop body)
  i. …

is a shorthand for

  1. if a=0 goto i
  2. (loop body)
  i-1. goto 1
  i. …

input a, f(a) = 2^a
  1. inc b
  2. while a>0:
  3.   dec a
  4.   while b>0:
  5.     dec b
  6.     inc c
  7.     inc c
  8.   while c>0:
  9.     dec c
  10.    inc b

inputs a,b, f(a,b) = ab
  1. while a>0:
  2.   dec a
  3.   while b>0:
  4.     dec b
  5.     inc c
  6.     inc d
  7.   while c>0:
  8.     dec c
  9.     inc b
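Counter machine programs in the line-number style are simple to interpret. The sketch below is one possible encoding, not a standard one: instructions are tuples, states are numbered from 1, and a `max_steps` budget (an assumption added here) guards against non-halting programs.

```python
def run(program, counters, max_steps=10_000):
    """Interpret a counter machine program.

    program  -- list of tuples such as ("inc", "a"), ("dec", "a"),
                ("ifzero", "a", 7), ("goto", 1), ("accept",), ("reject",), ("end",)
    counters -- dict mapping counter names to nonnegative integers
    Returns (final instruction name, counters).
    """
    counters = dict(counters)  # work on a copy
    state = 1
    for _ in range(max_steps):
        op = program[state - 1]
        if op[0] == "inc":
            counters[op[1]] = counters.get(op[1], 0) + 1
            state += 1
        elif op[0] == "dec":
            # Decrementing a zero counter has no effect.
            counters[op[1]] = max(0, counters.get(op[1], 0) - 1)
            state += 1
        elif op[0] == "ifzero":
            state = op[2] if counters.get(op[1], 0) == 0 else state + 1
        elif op[0] == "goto":
            state = op[1]
        else:  # accept / reject / end
            return op[0], counters
    raise RuntimeError("step budget exceeded")


IS_ODD = [("ifzero", "a", 7), ("dec", "a"), ("ifzero", "a", 6),
          ("dec", "a"), ("goto", 1), ("accept",), ("reject",)]

DOUBLE = [("ifzero", "a", 6), ("dec", "a"), ("inc", "b"),
          ("inc", "b"), ("goto", 1), ("end",)]
```

`IS_ODD` and `DOUBLE` transcribe the first two example programs; the `while` shorthand can be expanded by hand into `ifzero`/`goto` pairs before running.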

Alternative formalism of counter machines

• I made up the “line number” definition to make this model more palatable to scientists who have programmed, but have not seen finite-state machines.
• Equivalent to the following definition: a k-counter machine is a triple (Q, q_0, δ):
  • Q: finite set of states,
  • q_0: initial state,
  • δ: Q × {z,n}^k → Q × {+1,–1,0}^k
• δ(q, z, n, z) = (r, +1, –1, 0) means:
  • if the 3-counter machine is in state q, counters 1 and 3 are =0, counter 2 is >0,
  • then go to state r, increment counter 1, and decrement counter 2.

3-counter machines are Turing universal

(Assume Turing machine maintains a 1 at each end of the tape) Need third counter c to do the following operations on a and b:

Turing machine operation                         Counter machine implementation
read bit under tape head                         is a odd?
change bit under tape head                       inc/dec a
move tape head right                             set a = 2a (+ 1) ; set b = [b/2]
move tape head left                              set b = 2b (+ 1) ; set a = [a/2]
move tape head onto blank and change it to 1     if b=0 then set a = 2a + 1

[Figure: tape 1 0 0 1 1 1 0 0 0 1 1 _ in state q6, with the head on the sixth cell. The cells up to and including the head give a = 100111 (base 2) = 39; the cells to the right of the head, read from right to left, give b = 11000 (base 2) = 24.]
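The tape-as-two-integers encoding can be exercised on ordinary Python integers. In this sketch `a` holds the tape up to and including the head (head bit least significant) and `b` holds the rest of the tape reversed; the function names are illustrative, and Python's `//` and `%` stand in for the halving and parity subroutines a counter machine would run using its third counter.

```python
def read_bit(a):
    """The bit under the head is the parity of a."""
    return a % 2


def write_bit(a, bit):
    """Change the bit under the head by incrementing or decrementing a."""
    return a - (a % 2) + bit


def move_right(a, b):
    """Shift the head right: the lowest bit of b moves onto a."""
    return 2 * a + (b % 2), b // 2


def move_left(a, b):
    """Shift the head left: the lowest bit of a moves onto b."""
    return a // 2, 2 * b + (a % 2)
```

Starting from a = 39 and b = 24, one move to the right gives a = 78 (binary 1001110) and b = 12 (binary 1100), and moving left again restores the original pair.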

2-counter machines are (sort of) Turing universal
[Minsky 1967, Computation: Finite and Infinite Machines]

• To represent counter values (a,b,c) in a single counter x, let x = 2^a∙3^b∙5^c and y = 0.
  • To increment b, set x = 3x. (using y as a work counter)
  • To decrement a, set x = [x/2].
  • To test if c > 0, test if x ≡ 0 mod 5.
  • To start with a = n and b = c = 0, start with x = 2^n∙3^0∙5^0 = 2^n.
• If f:ℕ → ℕ is any computable function, this machine can start with x = 2^n and halt with x = 2^f(n).
• Caveat about encoding: there is no 2-counter machine that starts with x = n and halts with x = 2^n. [Schroeppel 1972, A Two Counter Machine Cannot Calculate 2^N]
  “Theorem: Any counter machine can be simulated by a 2-counter machine, provided an obscure coding is accepted for the input and output.”
• 2-counter machines can do universal computation on encoded inputs, but they cannot compute the encoding themselves.
• However, the fact that 2-counter machines can simulate arbitrary 3-counter machines implies that the Halting Problem for 2-counter machines is undecidable.

2-counter machines: finite automata on the plane

Finite automaton occupying a point (x,y) ∈ ℕ2. It cannot write anything, or see anything. It can sense if it is touching the southern wall, or western wall (or both). It can move north, south, east, or west based on its current state and 2 “wall bits”:

δ: S × {wall, no wall}^2 → S × {↑,↓,←,→}

There is an automaton A so that this problem is undecidable: given (x,y) ∈ ℕ^2, if started at (x,y), will A ever visit the origin?

Recursive functions

A first attempt at defining the slippery notion of “computable functions”

7.3: Building Blocks: Recursive Functions

Building functions f: ℕ^k → ℕ from scratch

Three “atomic” functions:
  constant: 0(x) = 0
  successor: S(x) = x+1
  projection: P_i(x_1,…,x_k) = x_i (note that P_1(x_1) is the identity function for arity 1 inputs)

(Most of what we know about primitive recursive functions is from the work of Rózsa Péter.)

Operations to create new functions. Let f and g be functions.
  composition: h = f ○ g defined by h(x) = f(g(x)), where x = (x_1,…,x_k) is an arbitrary number of variables.
  primitive recursion: h(x, 0) = f(x) and h(x, y) = g(x, y, h(x,y-1)); the recursive case is always 1 smaller, so guaranteed to reach the base case h(x, 0) eventually

def h(x,y):  # recursive pseudocode
  if y==0:
    return f(x)
  else:
    rec = h(x, y-1)
    return g(x, y, rec)

def h(x,y):  # iterative pseudocode
  z = f(x)
  for y’=1 to y:
    rec = z
    z = g(x, y’, rec)
  return z

Examples of primitive recursive functions

• addition: add(x, 0) = x and add(x, y) = S(add(x,y-1))   [i.e., x+0 = x and x+(y+1) = (x+y)+1]

add(x, 0) = P1(x) base case add(x, y) = g(x, y, add(x, y-1)) recursive case

where g = S ○ P3 i.e., g(x, y, z) = S(P3(x, y, z))

• multiplication: mult(x, 0) = 0 and mult(x, y) = add(mult(x,y-1), x)

• exponentiation: exp(x, 0) = 1 and exp(x, y) = mult(exp(x,y-1), x)
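The three examples can be assembled mechanically from a primitive-recursion combinator. This is a sketch with a hypothetical helper name, `primitive_recursion`, using a bounded `for` loop so that termination is evident from the code's shape.

```python
def S(n):
    """Successor."""
    return n + 1


def primitive_recursion(f, g):
    """Build h with h(x, 0) = f(x) and h(x, y) = g(x, y, h(x, y-1))."""
    def h(x, y):
        z = f(x)
        for i in range(1, y + 1):  # exactly y iterations, fixed in advance
            z = g(x, i, z)
        return z
    return h


# add(x, 0) = x and add(x, y) = S(add(x, y-1))
add = primitive_recursion(lambda x: x, lambda x, y, rec: S(rec))
# mult(x, 0) = 0 and mult(x, y) = add(mult(x, y-1), x)
mult = primitive_recursion(lambda x: 0, lambda x, y, rec: add(rec, x))
# exp(x, 0) = 1 and exp(x, y) = mult(exp(x, y-1), x)
exp = primitive_recursion(lambda x: 1, lambda x, y, rec: mult(rec, x))
```

Each level is built only from the level below it and the successor function, which mirrors how the definitions are stacked.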

Primitive programming with BLOOP

BLOOP programs have for loops (“bounded recursion”, i.e., repeat y times), but no while loops. In other words, before entering any loop, we must pre-compute how many total iterations it will take.

def add(x,y):
  z1 = x
  repeat y times:
    z1 = z1 + 1
  return z1

def mult(x,y):
  z2 = 0
  repeat y times:
    z1 = x
    repeat z2 times:
      z1 = z1 + 1
    z2 = z1
  return z2

def exp(x,y):
  z3 = 1
  repeat y times:
    z2 = 0
    repeat z3 times:
      z1 = x
      repeat z2 times:
        z1 = z1 + 1
      z2 = z1
    z3 = z2
  return z3

Iterated iteration: the Ackermann function

A_1(x,y) = x + y = x + 1 + … + 1          (y ones)
A_2(x,y) = x ∙ y = x + x + … + x          (y copies of x)
A_3(x,y) = x^y = x ∙ x ∙ … ∙ x            (y copies of x)
A_4(x,y) = x^(x^(…^x))                    (tower of y copies of x; “tetration”)

For i ≥ 3:

$$A_i(x,y) = \begin{cases} 1 & \text{if } y = 0 \\ A_{i-1}(x, A_i(x, y-1)) & \text{if } y > 0 \end{cases}$$

Knuth’s up-arrow notation (each x appears y times):
  x↑y = A_3(x,y) = x ∙ x ∙ … ∙ x
  x↑↑y = A_4(x,y) = x↑(x↑(x↑… x↑x))
  x↑↑↑y = A_5(x,y) = x↑↑(x↑↑(x↑↑… x↑↑x))
  …

Each “level” of the Ackermann function is primitive recursive

def A_i(x,y):                 # i nested loops
  z_i = 1
  repeat y times:
    z_{i-1} = 1
    repeat z_i times:
      z_{i-2} = 1
      repeat z_{i-1} times:
        …
          z_1 = x
          repeat z_2 times:
            z_1 = z_1 + 1
          z_2 = z_1
        …
    z_i = z_{i-1}
  return z_i

What if i is an input variable also? This is “The” Ackermann function (one variant):

  A(i,x,y) = A(i-1, x, A(i, x, y-1))

Any BLOOP program has a fixed number of nested loops, so it seems difficult to compute A with a single BLOOP program, and we seem to need i nested loops.

Time complexity characterization. f: ℕ^k → ℕ is primitive recursive if and only if there is a program P and i ∈ ℕ such that P(x) computes f(x) in at most A_i(2, n) steps for n = |x|, i.e., in 2n steps, or 2^n steps, or 2↑↑n = 2^(2^(…^2)) steps (a tower of height n), etc…

“The” Ackermann function is not primitive recursive

A_1(x,y) = x + y = x + 1 + … + 1          (y ones)
A_2(x,y) = x ∙ y = x + x + … + x          (y copies of x)
A_3(x,y) = x^y = x ∙ x ∙ … ∙ x            (y copies of x)
A_4(x,y) = x^(x^(…^x))                    (tower of y copies of x)

f(1) = A_1(2,1) = 2+1 = 3
f(2) = A_2(2,2) = 2+2 = 4
f(3) = A_3(2,3) = 2^3 = 2∙2∙2 = 8
f(4) = A_4(2,4) = 2↑(2↑(2↑2)) = 2^(2^(2^2)) = 2^16 = 65536
f(5) = A_5(2,5) = 2↑↑(2↑↑(2↑↑(2↑↑2)))
     = 2↑↑(2↑↑(2↑↑4))
     = 2↑↑(2↑↑65536)
     = 2↑↑(2^(2^(…^2)))   (inner tower of 65536 twos)
     = …

f(n) = A_n(2, n) grows faster than any single level A_i(2, n).

Diagonalization strikes again

                      n=1   n=2   n=3     n=4     n=5
A_1(2,n) = 2+n         3     4     5       6       7
A_2(2,n) = 2n          2     4     6       8       10
A_3(2,n) = 2^n         2     4     8       16      32
A_4(2,n) = 2↑↑n        2     4     16      65536   2^65536
A_5(2,n) = 2↑↑↑n       2     4     65536   ???     ???

f(n) = A_n(2,n)        3     4     8       65536   ???

Since each row grows faster than the one above it, f(n) = A_n(2, n) grows faster than any single row.
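The small entries of this table can be reproduced with a direct recursive definition. This is a sketch: levels 1 and 2 are given explicitly as addition and multiplication, higher levels use the recursion A(i,x,y) = A(i-1, x, A(i, x, y-1)) with A(i,x,0) = 1, and only entries small enough to compute quickly should be requested.

```python
def A(i, x, y):
    """Level-i Ackermann function with A_1 = addition and A_2 = multiplication."""
    if i == 1:
        return x + y
    if i == 2:
        return x * y
    if y == 0:
        return 1
    return A(i - 1, x, A(i, x, y - 1))


def f(n):
    """The diagonal f(n) = A_n(2, n)."""
    return A(n, 2, n)
```

Calling `f(5)` is hopeless in practice, since its value dwarfs anything a physical computer can store; that is the growth the diagonal argument exploits.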

The Ackermann function is computable

def A(n,x,y):  # recursive
  if n==2:
    return x*y
  else if y==0:
    return 1
  else:
    return A(n-1, x, A(n, x, y-1))

def A(n,x,y):  # iterative with a stack
  stack = [n]
  while !is_empty(stack):
    n = stack.pop()
    if n==2:
      y = x*y
    else if y==0:
      y = 1
    else:
      y = y – 1
      stack.push(n-1)
      stack.push(n)
  return y

Homework: stack can be represented as a single integer s; pop, push, and is_empty can be implemented as primitive recursive functions of s.

If we knew how many iterations the while loop would execute in advance, we could replace it with a bounded loop and show A is primitive recursive… but the number is essentially A(n,x,y) itself!

With the unbounded while loop, this is a FLOOP program (“free loops”); FLOOP is Turing universal.

μ-recursive functions (a.k.a. general recursive)

• Given a function f(x,y), we can define a new function h = μ_y f as
    h(x) = μ_y f(x,y) = min { y : f(x,y) = 0 }
• This corresponds to the (unbounded) while loop:

  def h(x):
    y = 0
    while f(x,y) != 0:
      y = y + 1
    return y

• Such a y may not exist!
• So μ-recursive functions may not be total (defined on all inputs)

μ-recursive functions: built by constant, successor, projection, composition, primitive recursion, and μ-“recursion”
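As a concrete instance, the μ operator can be written as a higher-order function. The example function `ceil_sqrt` is a made-up illustration chosen because its search always succeeds; for an f with no zero at some x, the returned h would simply run forever on that x.

```python
def mu(f):
    """Return h with h(x) = min { y : f(x, y) == 0 }, found by unbounded search."""
    def h(x):
        y = 0
        while f(x, y) != 0:
            y += 1
        return y
    return h


# Hypothetical example: the least y with y*y >= x, i.e. the ceiling of sqrt(x).
ceil_sqrt = mu(lambda x, y: 0 if y * y >= x else 1)
```

Nothing in the code bounds the number of loop iterations in advance, which is exactly what separates this construct from a BLOOP-style `repeat`.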

Partial recursive functions

• A total function is defined on all inputs.
• A partial function may be undefined on some inputs.
  • a partial function f undefined on input x corresponds to a program computing f that does not halt on input x

primitive recursive functions = BLOOP programs
          ⊊
total recursive functions = FLOOP/Python programs that always halt
          ⊊
partial recursive functions = FLOOP/Python programs

No total programming language can compute all the total recursive functions

• BLOOP was a failed attempt to capture all total recursive functions
  • Ackermann function is counterexample… computable, but not by any BLOOP program.
• But is there a more powerful programming language, whose programs always halt (like BLOOP), but that (unlike BLOOP) can compute all total recursive functions? … NO
• Suppose all programs in some language are total: they halt (are defined) on all inputs.
• So the “universal” function U(P,x) = P(x) is total, where P is a program and x its input.
• Using U as a subroutine, we can define a program V such that V(P) = P(P)+1 (by running U(P,P))… but then V(V) is undefined.
• So V is not total, hence not implementable in the programming language.
• V is simple to compute using U, so U is not computable by the programming language either.
• A similar argument shows BLOOP has no universal program; you can’t write a BLOOP interpreter in BLOOP.

A universal partial recursive function

• There is a FLOOP interpreter written in FLOOP.
  • i.e., there is a partial recursive function U such that, given any description ⟨f⟩ of a partial recursive function f and input x, U(⟨f⟩, x) = f(x):
  • i.e., U(⟨f⟩, x) is defined if and only if f(x) is defined, and if defined they are equal.
• The best way to understand why this is true is to think about modern programming languages like Python, where this is almost trivial.
• The best way to understand how much of a deep, unobvious insight this was in the 1930s is to read the “function” way of phrasing it.

Building a universal partial recursive function using pre-1936 technology: Gödel-numbering

recursive function source code: h: ℕ^k → ℕ is represented by a positive integer ⟨h⟩, where

  ⟨h⟩ = 7,                 if h = 0
        11,                if h = S
        13, 17, 19, …      if h = P_1, P_2, P_3, …
        2∙3^⟨f⟩∙5^⟨g⟩,      if h = f ○ g
        2^2∙3^⟨f⟩∙5^⟨g⟩,    if h(x, 0) = f(x) and h(x, y) = g(x, y, h(x,y-1))
        2^3∙3^⟨f⟩,          if h = μ_y f

textbook: “The universal function U(f, x) = f(x) is partial recursive. The proof of this claim is rather technical, and with a thankful nod to the pioneers of computation, we relegate its details to history.”
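A Gödel numbering of this kind is straightforward to compute for small expressions. The sketch below assumes the projection codes continue through the successive primes 13, 17, 19, … as the list suggests; the function names are illustrative. It encodes g = S ○ P_3, the helper used in the definition of addition.

```python
CODE_ZERO = 7        # code of the constant function 0
CODE_SUCCESSOR = 11  # code of the successor function S


def nth_prime(n):
    """Return the n-th prime (1-indexed) by trial division."""
    found, candidate = 0, 1
    while found < n:
        candidate += 1
        if all(candidate % d for d in range(2, int(candidate**0.5) + 1)):
            found += 1
    return candidate


def code_projection(i):
    """Code of P_i: 13, 17, 19, ... (assumed to be the primes from the 6th onward)."""
    return nth_prime(i + 5)


def code_composition(f_code, g_code):
    """Code of f ○ g is 2 * 3^<f> * 5^<g>."""
    return 2 * 3**f_code * 5**g_code


g_code = code_composition(CODE_SUCCESSOR, code_projection(3))
```

Here `g_code` is 2∙3^11∙5^19. The code of `add` itself has this 19-digit number as an exponent of 5, so it is far too large to write out, which gives some sense of why the construction is described as technical.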

recall add(x, 0) = P_1(x) and add(x, y) = g(x, y, add(x, y-1)), where g = S ○ P_3

so ⟨g⟩ = 2∙3^⟨S⟩∙5^⟨P_3⟩ = 2∙3^11∙5^19 = 6,757,621,765,136,718,750

and ⟨add⟩ = 2^2∙3^⟨P_1⟩∙5^⟨g⟩ = 2^2∙3^13∙5^6,757,621,765,136,718,750

λ-calculus

• Proposed by Alonzo Church as a definition of “computable” function.
• Basis for functional programming languages (LISP, Scheme, Clojure, Haskell, ML, OCaml, Elm)
• Easy to evaluate: interpreter for λ-calculus in 7 lines of Scheme: http://matt.might.net/articles/implementing-a-programming-language/

  ; eval takes an expression and an environment to a value
  (define (eval e env) (cond
    ((symbol? e)       (cadr (assq e env)))
    ((eq? (car e) 'λ)  (cons e env))
    (else              (apply (eval (car e) env) (eval (cadr e) env)))))

  ; apply takes a function and an argument to a value
  (define (apply f x)
    (eval (cddr (car f)) (cons (list (cadr (car f)) x) (cdr f))))

  ; read and parse stdin, then evaluate:
  (display (eval (read) '())) (newline)

7.4: Form is Function: The λ-Calculus

Status of the question “What is a computable function?”, April 1936

• Kurt Gödel and Jacques Herbrand: μ-recursive
• Alonzo Church: λ-definable
• Stephen Kleene (student of Church): showed μ-recursion and λ-calculus define equivalent functions.
• Most were not yet convinced that this is the right definition.
  • Church showed that the problem of finding whether a λ-expression “has a normal form” (corresponding to halting computation) is not solvable by λ-calculus… but what if Church was wrong in asserting everything “computable” is computable by λ-calculus?
  • Emil Post accused Church of attempting to “mask this identification under a definition.” [Finite combinatory processes, Formulation I, The Journal of Symbolic Logic 1 (1936), pp. 103–105]
  • Kleene: “I myself, perhaps unduly influenced by rather chilly receptions from audiences around 1933–35 to disquisitions on λ-definability, chose, after general recursiveness had appeared, to put my work in that format. . . . ”
• The existence of universal recursive functions and λ-expressions was known in a sense (Kleene normal form theorem), but difficult to construct, and the significance not fully appreciated.
• Meanwhile, across an ocean…

Turing’s applied philosophy
The first convincing argument for the correct definition of mechanical computability

“What does a Turing machine look like? You can certainly imagine some crazy looking machine, but a better approach is to look in a mirror.” Charles Petzold, The Annotated Turing

7.5: Turing's Applied Philosophy

What is a computer? They didn’t always look like this:

Computers once looked like this:

The most important paper in Computer Science

Computing is usually done by writing certain symbols on paper. We may suppose this paper is divided into squares like a child’s arithmetic book… the two-dimensional character of paper is no essential of computation. I assume then that the computation is carried out on one-dimensional paper, i.e. on a tape divided into squares… The behavior of the computer at any moment is determined by the symbols which he is observing, and his “state of mind” at that moment.

“On computable numbers, with an application to the Entscheidungsproblem” Alan Turing, Proceedings of the London Mathematical Society, 1936


Let us imagine the operations performed by the computer to be split up into “simple operations” which are so elementary that it is not easy to imagine them further divided. Every such operation consists of some change of the physical system consisting of the computer and his tape… We may suppose that in a simple operation not more than one symbol is altered. Any other changes can be split up into simple changes of this kind.

“On computable numbers, with an application to the Entscheidungsproblem” Alan Turing, Proceedings of the London Mathematical Society, 1936


It is always possible for the computer to break off from his work, to go away and forget all about it, and later to come back and go on with it. If he does this he must leave a note of instructions (written in some standard form) explaining how the work is to be continued. This note is the counterpart of the “state of mind.”

“On computable numbers, with an application to the Entscheidungsproblem” Alan Turing, Proceedings of the London Mathematical Society, 1936


We will suppose that the computer works in such a desultory manner that he never does more than one step at a sitting. The note of instructions must enable him to carry out one step and write the next note. Thus the state of progress of the computation at any stage is completely determined by the note of instructions and the symbols on the tape.

i.e., the computer updates by calculating δ(q1, b1) = (q2, b2, ±1) for states q1, q2 and symbols b1, b2

“On computable numbers, with an application to the Entscheidungsproblem” Alan Turing, Proceedings of the London Mathematical Society, 1936
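The update rule δ(q1, b1) = (q2, b2, ±1) from this slide can be sketched as a single Python function. This is a minimal illustration, not a definitive implementation. The names `step`, `delta`, and `BLANK`, the dictionary representation of the tape, and the bit-flipping example machine are all assumptions introduced here.

```python
BLANK = "_"  # assumed blank symbol for unwritten tape squares

def step(delta, state, tape, head):
    """Apply one transition delta(q1, b1) = (q2, b2, move), with move in {-1, +1}."""
    b1 = tape.get(head, BLANK)          # symbol currently under the head
    q2, b2, move = delta[(state, b1)]   # look up the "note of instructions"
    tape[head] = b2                     # at most one symbol is altered per step
    return q2, tape, head + move

# Hypothetical example machine: flip every bit, halt at the first blank.
delta = {
    ("flip", "0"): ("flip", "1", +1),
    ("flip", "1"): ("flip", "0", +1),
    ("flip", BLANK): ("halt", BLANK, -1),
}

state, tape, head = "flip", dict(enumerate("1011")), 0
while state != "halt":
    state, tape, head = step(delta, state, tape, head)

output = "".join(tape[i] for i in sorted(tape)).strip(BLANK)
print(output)  # 0100
```

The triple (state, tape, head) is the whole configuration: as in Turing's description, the next step depends only on the current state and the symbol under the head.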

The most important theorem in the Theory of Computing

Theorem (Turing, 1936). There is a universal Turing machine U that, on input (M,x), where M is a description of a Turing machine and x is a string, simulates M(x).
• Fairly straightforward (if tedious) to implement compared to other universal models such as μ-recursive functions.
• Gödel identified this ease of simulation, and thus the ease with which one can prove equivalence of models, or use diagonalization to show limitations, as “a kind of miracle”.
• The definition of a Turing machine is extremely robust to choices in the definition:
  • one-way or two-way infinite tapes
  • multiple tapes
  • 2D tapes
  • multiple heads
• All of these different kinds of machines can easily simulate each other.
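The shape of the theorem, one fixed program that takes a description of M together with an input x, can be sketched in Python. This is not a Turing machine itself, only an illustration of the interface. The textual rule format `q1 b1 q2 b2 D`, the names `parse` and `universal`, the default state names, and the step budget are all assumptions made for this sketch.

```python
BLANK = "_"  # assumed blank symbol

def parse(description):
    """Decode rules 'q1 b1 q2 b2 D' separated by ';' (D is L or R) into a table."""
    delta = {}
    for rule in description.split(";"):
        if rule.strip():
            q1, b1, q2, b2, d = rule.split()
            delta[(q1, b1)] = (q2, b2, -1 if d == "L" else +1)
    return delta

def universal(description, x, start="q0", halt="halt", max_steps=10_000):
    """Simulate the described machine on input x; return the final tape contents."""
    delta = parse(description)
    tape, state, head = dict(enumerate(x)), start, 0
    for _ in range(max_steps):
        if state == halt:
            return "".join(tape[i] for i in sorted(tape)).strip(BLANK)
        state, symbol, move = delta[(state, tape.get(head, BLANK))]
        tape[head] = symbol
        head += move
    # The budget only keeps this sketch from running forever.
    raise RuntimeError("step budget exhausted (the machine may not halt)")

# Hypothetical machine M: add 1 to a binary number.
# q0 scans right to the end of the input; c propagates the carry leftward.
INCREMENT = """
q0 0 q0 0 R; q0 1 q0 1 R; q0 _ c _ L;
c 1 c 0 L;   c 0 halt 1 L; c _ halt 1 L
"""

print(universal(INCREMENT, "1011"))  # 1100
print(universal(INCREMENT, "111"))   # 1000
```

The tape is a dictionary keyed by integer position, so it is two-way infinite. By the robustness noted above, that choice does not change what can be computed.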

The Grand Unification

f is computable by a Turing machine ⇔ f is λ-definable ⇔ f is μ-recursive

• Rightmost equivalence shown by Church and Kleene, published in April 1936.
• Turing (first transatlantic phone service: 1927)
  • submitted his paper in May 1936,
  • learned of the work of Church and Kleene,
  • added a proof of the leftmost equivalence in an appendix to the paper,
  • published November 1936.
• The skeptics (Gödel, Post, etc.) were immediately convinced by Turing’s physical argument that this model (and therefore also the other two) captured general “computation”.
• Straightforward to show all are equivalent to programmability in FLOOP/Python.
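One direction of the last point, that μ-recursive functions can be programmed in Python, can be sketched by writing each building block as a higher-order function. The helper names (`S`, `P`, `Z`, `compose`, `primitive_recursion`, `mu`) are assumptions chosen for this illustration. The recursion scheme follows the convention used earlier in these slides, add(x, y) = g(x, y, add(x, y-1)).

```python
def S(x):
    """Successor."""
    return x + 1

def Z(*args):
    """Constant zero."""
    return 0

def P(i):
    """Projection: P(i) returns its i-th argument (1-indexed)."""
    return lambda *args: args[i - 1]

def compose(f, *gs):
    """Composition: args -> f(g1(args), ..., gk(args))."""
    return lambda *args: f(*(g(*args) for g in gs))

def primitive_recursion(base, step):
    """h(x, 0) = base(x);  h(x, y) = step(x, y, h(x, y-1))."""
    def h(*args):
        *xs, y = args
        acc = base(*xs)
        for i in range(1, y + 1):
            acc = step(*xs, i, acc)
        return acc
    return h

def mu(f):
    """Minimization: least y with f(x, y) == 0 (loops forever if none exists)."""
    def m(*xs):
        y = 0
        while f(*xs, y) != 0:
            y += 1
        return y
    return m

# add(x, 0) = P1(x);  add(x, y) = S(P3(x, y, add(x, y-1)))
add = primitive_recursion(P(1), compose(S, P(3)))
# mul(x, 0) = 0;      mul(x, y) = add(x, mul(x, y-1))
mul = primitive_recursion(Z, compose(add, P(1), P(3)))

print(add(3, 4), mul(3, 4))  # 7 12
```

Only `mu` can fail to terminate, which is where partial functions, and with them the full power of Turing machines, enter the model.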

For detailed history see http://people.cs.uchicago.edu/~soare/History/

Church-Turing Thesis

Church-Turing Thesis: The universal Turing machine is capable of simulating any computing device that can be finitely described, where each step accesses and modifies a finite amount of information.
• This is not a mathematical statement but a physical law, and so is open to refutation in principle.
• By analogy, most attempts to build a perpetual motion machine (violating the 2nd Law of Thermodynamics) fail because they actually increase entropy somewhere that is not explicitly modeled.
• Similarly, most attempts to describe a “super-Turing” physical device smuggle an “infinite description” somewhere into the definition (e.g., neural nets with infinite-precision real weights).
