
INF551 Computational Logic:

Artificial Intelligence in Mathematical Reasoning

Stéphane Graham-Lengrand [email protected]

Practical information
Timetable: 9 sessions (lectures & practicals), Fridays 14:00-16:00 & 16:15-18:15, from 21st September

Room: Nicole-Reine Lepaute, Turing building

Website (slides, links to practical sheets, . . . ): http://www.enseignement.polytechnique.fr/informatique/INF551/

Evaluation: 25%: participation (once in the 8 remaining weeks: showing to the class

what you have done / tried to do from one week to the next)

75%: Exam on 21st December (using paper+laptop)

. . . If you need to be evaluated by a project, come and see me.

2 Course notes Distributed course notes = G. Dowek’s book

Available in English!

Course notes are self-contained.

Some of its material should have / may have been covered in one of your undergraduate courses, typically INF412 at Polytechnique.

This year in INF551 we shall:
• treat parts of the book not treated in the undergraduate curriculum
• develop other parts further
• make the practicals (TD) more practical / machine-based.

3 Lecture 0 Introduction

4 Reasoning and computing You have been reasoning since the first words you spoke, and computing since kindergarten

You encountered the definition of such mechanisms at undergrad. level (INF412)

Similar situation in the History of Science: reasoning and computing since Stone Age

. . . properly defined during XIXth or XXth century

5 Reasoning vs. computing Easier to compute than to reason (especially since computers)

Follow some computing rules. No intelligence needed, just time and space. Can be automated (done by a machine).

Question addressed by this course:

Can we use computing to reason?

6 What do we mean by “reasoning”? “The art of establishing truth.”

Since ancient times, the question is:

Is a particular mathematical statement true or false?

What do we mean by “true”?

Since ancient times: confronting statements with reality
First mathematical fields = arithmetic and geometry
cf. Plato (e.g. Meno)

Since then, the fields of mathematics have. . .
• diversified
• got further and further away from witnessable reality

7 XIXth century: Logic crisis Most notable example: the notion of infinity. How can truth be checked against reality?
Triggered by the works of Cantor, identifying ≠ notions of infinity:
R ≃ converging sequences in Q (1870-1872)
Bijections & |Q| ≠ |R| (1874), |P(A)| > |A|, |R| = |R²|; to Dedekind in 1877: "Je le vois, mais je ne le crois pas!" ("I see it, but I do not believe it!")
Continuum Hypothesis (1878)
1874-1884: developed these ideas into the theory of ordinals and cardinals
1883: Cantor's premisses of Mandelbrot's fractal theory
Huge impact on topology & measure theory (Borel-Lebesgue, . . . )
Criticised by e.g. Kronecker for not being able to produce sets in a finite number of "steps" from natural numbers: first ideas of constructivism.
To Lindemann, proving that π is transcendental: "Why study such problems when irrational numbers do not exist?"

Fear of infinity, fear of paradoxes/inconsistencies (many found between 1885-1905)

8 A standard example Suppose a and b are strictly positive integers.

a = b
a² = ab
a² − b² = ab − b²
(a − b)(a + b) = b(a − b)
a + b = b
b + b = b
2b = b
2 = 1

Teacher: “This is wrong! You divide by 0!” Pupil: “Can I not?” Reasoning is about rules (applying them correctly), more than about truth. ⇒ Computation?

10 Computation & the foundations of Mathematics From the 1870-1940 research period: the rules of logical reasoning were formalised. Combined with axioms to form the notion of formal proof, they
• shifted the problem of truth to the problem of provability: Is a statement true or false? ⇒ Is a statement provable or not?
• suggested that rigorous reasoning could be reduced to a mechanical task: "If controversies were to arise, there would be no more need of disputation between two philosophers than between two calculators. For it would suffice for them to take their pencils in their hands and to sit down at the abacus, and say to each other: Let us calculate." Leibniz (1677)

Inference rules & mathematical axioms: intricately tied together, "more-or-less" interchangeable, but rules are usually more "computer-friendly"
More generally: computation is intrinsically tied to the foundations of mathematics (which axioms & rules can be used to develop all mathematics & describe any problem that we would like a machine to solve?)

10 Logic gave birth to computers and A.I. General-purpose computers invented on paper (30s) before built in real life!

. . . for the very purpose of automating mathematical reasoning . . . by the very people who cleared up the foundations of mathematics (Goedel, Church, Turing)

“I propose to consider the question, ‘Can machines think?’ ” (1950)

“Artificial Intelligence” coined by John McCarthy at Dartmouth Conference (1956)

11 A very brief & approximate overview of A.I.

“Can machines think?”

Goals ranging from deduction, reasoning, problem solving to interaction with human / environment, NLP

McCarthy & Minsky founded the A.I. lab at MIT (1958), then McCarthy founded the Stanford A.I. lab (SAIL) (1963)

Approaches: Symbolic A.I., itself split between logic-based A.I. ("the neats": McCarthy, championing logic for A.I.) and cognitive simulation ("the scruffies": Minsky); and neural nets

Logic programming / Automated reasoning / Constraint solving versus Probabilistic methods / Uncertain reasoning / Machine learning

12 In this course: Logic-based approach to A.I. Several algorithmic disciplines related to proofs and their inference rules:

Algorithmics of proof-checking (easy), algorithmics of proof-search (Lect. 2-5), algorithmics of proof-reducing (Lect. 9)

Two questions remain central:
• Termination of such algorithms
• Determinism of such algorithms and computational cost

Some unexpected results: not only can proof-search be implemented (under minimalistic hypotheses) as a computational process, but the converse is true:

All computation can be seen as the search for a mathematical proof

Proof-search not just 1 small algorithmic domain ⇒ ubiquitous in CS

13 The plan Week 1: Introduction and review of undergrad. material Week 2,3,4,5: Algorithmics of proof-search Week 6,7,8: Modelling all mathematical problems in universal framework Week 9: Constructivism

14 Questions?

15 Lecture I What do you remember from your undergraduate programme?

16 Contents

I. Predicate logic
II. Meta-mathematics
III. Some theories
IV. The notion of proof & the soundness/completeness theorem
V. Computability

17 I. Predicate logic

18 Stating the obvious Logic is about statements, statements require language.

As usual, language = syntax + semantics

Predicate logic on two levels:
• terms (whose semantics are the mathematical objects you want to talk about)
• formulae (the aforementioned statements)

The rest of this section presents chapter 5 of INF412 course notes.

19 More generally, today’s slides indicate section numbers in course notes:

the section in the INF412 notes appears on the left of each slide title, the section in the INF551 notes on the right

http://www.enseignement.polytechnique.fr/informatique/ INF412/i.php/Main/Poly

20 5.1 Syntax of terms 1.2-1.3 Syntax of terms depends on:
• a term signature, i.e., a set of elements called term symbols, equipped with a function mapping term symbols to natural numbers called their arities
• a denumerable set of elements called variables

Terms are defined inductively as follows:
• a variable is a term
• for every term symbol f of arity n in the signature,

if t1,. . . ,tn are terms, then f(t1, . . . , tn) is a term

Do not see f(t1, . . . , tn) as a string with parentheses and commas: even though I use this as concrete syntax on my slides, what I am talking about is really a tree, with the root labelled with symbol f and n direct sub-trees t1, . . . , tn.

21 5.1 Syntax of terms 1.2-1.3 That was long and tedious. Here’s a more synthetic presentation of the same thing: Let Σ be a term signature. The set of terms is defined by

t, t1, t2,... ::= x | f(t1, . . . , tn) with x ranging over variables and f/n ranging over Σ

22 5.1 Syntax of formulae 1.2-1.3 Similarly: predicate signature = set of elements called predicate symbols, equipped with a function mapping predicate symbols to natural numbers called their arities

Let Ψ be a predicate signature. The set of (pre-)formulae is defined by

A, B, C, . . . ::= p(t1, . . . , tn) | > | ⊥ | ¬A | A ∧ B | A ∨ B | A ⇒ B | ∀xA | ∃xA with x ranging over variables and p/n ranging over Ψ

Again, we are talking about trees

But this is not finished!

23 5.1 Syntax of formulae 1.2-1.3 Formulae are not exactly trees, because ∀x p(x) is the same formula as ∀y p(y) (though technically ≠ trees)

Here x is a bound variable, as is x in ∫₀¹ f(x)dx

Set of formulae = set of such trees quotiented by the renaming of bound variables

The next two slides formalise this.

24 5.2.2 Free variables 1.2-1.3 Bound= not free

Free variables of terms, defined by induction on terms:

FV(x) := {x}

FV(f(t1, . . . , tn)) := FV(t1) ∪ · · · ∪ FV(tn)

Free variables of formulae, defined by induction on formulae:

FV(p(t1, . . . , tn)) := FV(t1) ∪ · · · ∪ FV(tn) FV(>) = FV(⊥) := ∅

FV(¬A) := FV(A)

FV(A ∧ B) = FV(A ∨ B) = FV(A ⇒ B) := FV(A) ∪ FV(B)

FV(∀xA) = FV(∃xA) := FV(A)\{x}

t (resp. A) is closed if FV(t) = ∅ (resp. FV(A) = ∅)
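As a sanity check of the two inductive definitions above, here is a minimal sketch in Python; the nested-tuple encoding of terms and formulae is my own illustration, not the notation of the course notes.

# Minimal sketch: terms/formulae as nested tuples (hypothetical encoding).
# A variable is ("var", name); an application is ("app", f, [t1, ..., tn]);
# an atom is ("pred", p, [t1, ..., tn]); a quantified formula is
# ("forall"/"exists", x, body); connectives are ("not", A), ("and"/"or"/"imp", A, B).

def fv(e):
    """Free variables, by induction on the tree structure."""
    tag = e[0]
    if tag == "var":
        return {e[1]}                                   # FV(x) = {x}
    if tag in ("app", "pred"):
        return set().union(set(), *[fv(t) for t in e[2]])
    if tag in ("top", "bot"):
        return set()
    if tag == "not":
        return fv(e[1])
    if tag in ("and", "or", "imp"):
        return fv(e[1]) | fv(e[2])
    if tag in ("forall", "exists"):
        return fv(e[2]) - {e[1]}                        # bound variable removed
    raise ValueError(f"unknown constructor {tag}")

# Example: FV(p(w) ∧ ∀x p(x + w)) = {"w"}
A = ("and", ("pred", "p", [("var", "w")]),
            ("forall", "x", ("pred", "p", [("app", "+", [("var", "x"), ("var", "w")])])))
assert fv(A) == {"w"}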

25 α-equivalence 1.2-1.3 Define the swap of 2 variables x and y in (all terms, and then) all formulae P, written (x y)P: everywhere x is written (free or bound), write y, and vice versa (easy definition by induction on terms and formulae)

∀xP is identified with ∀y (x y)P if y ∉ FV(P)
∃xP is identified with ∃y (x y)P if y ∉ FV(P)
α-equivalence := smallest equivalence relation on trees identifying the above at the root of trees or in their sub-trees

Example: p(w) ∧ ∀x p(x + w) and p(w) ∧ ∀y p(y + w) are α-equivalent

Why "if y ∉ FV(P)" (i.e. y is a fresh variable)?

(y = −1) ∧ ∃x(x × x = y) not the same as (y = −1) ∧ ∃y(y × y = x)

Formulae = equivalence classes of trees (modulo α-equivalence)

26 5.3 Model structure 2.1 Given a language made of a term signature Σ and a predicate signature Ψ, a (bivaluated) model structure of that language is:
• a non-empty set M
• for all f/n in Σ, a function f̂ from Mⁿ to M
• for all p/n in Ψ, a function p̂ from Mⁿ to {0, 1}
Example: A model structure for Σ = {0/0, S/1, +/2, ×/2} and Ψ = {=/2} can be

N := (ℕ, 0, n ↦ n + 1, (n, m) ↦ n + m, (n, m) ↦ n × m, (n, m) ↦ 1 if n = m and 0 if not)

What is the semantics of x + x? Depends on the value of x!

Valuation = function from (finite sets of) variables to M

27 5.3 Semantics of terms and formulae 2.1 Given a model structure as on the previous slide:

Semantics of terms: given a term t & a valuation φ covering FV(t), the semantics of t according to φ, denoted ⟦t⟧φ, is defined in M by induction on t:
⟦x⟧φ := φ(x)
⟦f(t1, . . . , tn)⟧φ := f̂(⟦t1⟧φ, . . . , ⟦tn⟧φ)

Semantics of formulae: given a formula A & a valuation φ covering FV(A), the semantics of A according to φ, denoted ⟦A⟧φ, is defined in {0, 1} by induction on A:
⟦p(t1, . . . , tn)⟧φ := p̂(⟦t1⟧φ, . . . , ⟦tn⟧φ)
⟦A ∧ B⟧φ := ∧̂(⟦A⟧φ, ⟦B⟧φ)      (and similarly for the other connectives)
⟦∀xA⟧φ := 1 if for all a ∈ M, ⟦A⟧φ,x↦a = 1, and 0 if not
⟦∃xA⟧φ := 1 if there is a ∈ M with ⟦A⟧φ,x↦a = 1, and 0 if not

. . . with ⊤̂ = 1, ⊥̂ = 0, ¬̂, ∧̂, ∨̂, ⇒̂ given by your favourite truth tables

28 5.4.1,6,6.2.1 Notation, remark and terminology 2.1 Notation To specify the model structure, say M, used to compute the semantics of term t (resp. formula A), we sometimes write ⟦t⟧φ^M (resp. ⟦A⟧φ^M) . . . but this is often omitted if the model structure is clear (simply writing ⟦t⟧φ or ⟦A⟧φ)

Remark When A is closed, no need for a valuation:

⟦A⟧φ is independent from φ (in which case we may simply write ⟦A⟧) . . . but valuations are useful for sub-formulae

Terminology A theory is a set of closed formulae

A model structure is said to be a model of a closed formula A (resp. a theory T) if ⟦A⟧ = 1 (resp. for every A in T, ⟦A⟧ = 1)

A closed formula A is a semantical consequence of a theory T , denoted T |= A, if every model of T is a model of A

29 We’re supposed to build mathematics from scratch, and we haven’t got to natural numbers yet, but so far you have used

• characters
• trees
• sets!

30 II. Meta-mathematics

31 Where the snake seems to bite its tail Mathematics = the art of formally studying concepts

Logic until INF412 (and until Frege) = the informal way you did mathematics

But logic itself can be formalised as an object of mathematical study

Example:
• Define the set of formulae
• Define the set of valid formulae

• But to do so requires a few mathematical concepts. Examples: numbers, trees, sets; how are these defined?
• If Logic itself is our object of study, which logic do we use to reason about it?

Impossible to do any mathematics ex-nihilo

32 Meta-mathematics To address this, Hilbert introduced the idea that the snake bites another snake’s tail:

Distinguish the object level and the meta level:

• object level = the logic / the theory / the rules. . . that we define / study
• meta level = the logic / the theory / the rules. . . that we use to reason about the object level

Remark: the two snakes have no reason to be of the same species i.e., object-level and meta-level have no reason to be the same logic

• Meta-level is usually implicit (just like when you studied geometry at school). If one day you formalise the meta-level, you will informally place yourself in a meta-meta-level, and so on. To start off doing anything, we must informally agree on some basic concepts like trees
• OR we can use a machine as a reference for the meta-level (cf., practicals), but you still have to trust the implementation

33 A stupid remark Look again at the definition of ⟦A⟧

We think we do semantics, giving a "meaning" to formulae. . .

. . . and "⟦A⟧ = 1" is a statement of the meta-level

But in fact, we are just translating A from the object-level to the meta-level:
⟦A ∧ B⟧ = 1 if ⟦A⟧ = 1 and ⟦B⟧ = 1
⟦∀xA⟧ = 1 if for all a ∈ . . . , ⟦A⟧x↦a = 1
. . .
∧ is translated as "and", ∀ is translated as "for all",. . .

Semantics = Syntax of the meta-level In other words, everything is syntax

34 Hilbert’s program (1920) Hilbert: that hierarchy of snakes/levels (object, meta, meta-meta,. . . ) is acceptable if the same language and logic can be used at all levels. His 1920 program: Find a language to express all mathematics.

Find a logic X made of finitely many axioms + a notion of proof ⊢X and, using X both as the meta-logic and as the object logic,
• Completeness: Prove in X (as meta-logic) that all "true" mathematical statements can be proved in X (as object logic)

i.e., Prove in X the theorem "If ⟦A⟧ = 1 then ⊢X A"
• Consistency: Prove in X that no contradiction can be proved in X

i.e., Prove in X the theorem "⊬X ⊥"
• Decidability:

Have an algorithm that terminates on all inputs A, outputting whether ⊢X A or ⊬X A
• X should be based on arithmetic (the only thing that undeniably exists), but all the mathematics we know should be derived from it

35 Hilbert’s program after the 30s • Language to express all mathematics: that of predicate logic

• Logic X, axioms to derive all mathematics: DEPENDS. Arithmetic itself (PA): weak. Equality + Zermelo-Fraenkel set theory (cf., Lect. 2 & 3): ok. In both cases: "few" axioms, but with an axiom schema

• Logic X, notion of proof: predicate logic
• Logic X, completeness: DEPENDS. "⟦A⟧ = 1" is ambiguous, as it depends on the model structure! If ⟦A⟧ = 1 is meant "in all model structures", then yes (Gödel's completeness theorem). If ⟦A⟧ = 1 is meant "in ℕ" (written ⟦A⟧^ℕ = 1) for PA, then no (incompleteness, cf. Lecture 3)
• Logic X, consistency: no; by Gödel's second incompleteness theorem, X cannot prove its own consistency

• Logic X, decidability (Entscheidungsproblem): no (cf. Lecture 3)

36 III. Some theories

37 6.1.3 Equality 1.5 Let Σ be a term signature and Ψ a predicate signature containing at least = /2

∀x (x = x)
∀x1 . . . ∀xi ∀x′i ∀xi+1 . . . ∀xn (xi = x′i ⇒ f(x1, . . . , xi, . . . , xn) = f(x1, . . . , x′i, . . . , xn)) for each f/n ∈ Σ
∀x1 . . . ∀xi ∀x′i ∀xi+1 . . . ∀xn (xi = x′i ⇒ p(x1, . . . , xi, . . . , xn) ⇒ p(x1, . . . , x′i, . . . , xn)) for each p/n ∈ Ψ

38 6.1.8 Arithmetic (Peano’s) 1.5 Σ = {0/0,S/1, +/2, ×/2}, and Ψ = {= /2}

Equality axioms, plus:
∀y (0 + y = y)
∀x ∀y (S(x) + y = S(x + y))
∀y (0 × y = 0)
∀x ∀y (S(x) × y = (x × y) + y)
∀x ∀y (S(x) = S(y) ⇒ x = y)
∀x ¬(0 = S(x))
∀x1 . . . ∀xn ((0/z)P ⇒ ∀x ((x/z)P ⇒ (S(x)/z)P) ⇒ ∀y (y/z)P)

for all formulae P with FV(P ) = {x1, . . . , xn, z}

Last line is an axiom schema (infinitely many axioms)!

Remark: N is a model of Peano’s arithmetic
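For concreteness, here is one instance of the induction schema written out in LaTeX; the choice P := (z + 0 = z) is my own illustration (it yields the axiom needed to prove ∀y (y + 0 = y), which is not among the other axioms):

% One instance of the induction schema, with P := (z + 0 = z):
\[
  (0 + 0 = 0)
  \;\Rightarrow\;
  \forall x\, \big( (x + 0 = x) \Rightarrow (S(x) + 0 = S(x)) \big)
  \;\Rightarrow\;
  \forall y\, (y + 0 = y)
\]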

39 Substitution (t/x)P 1.2 . . . is what you imagine it is.

Careful though when defining (t/x)(∀y P) and (t/x)(∃y P) !!!

What if y ∈ FV(t)?

Hint:
(t/x)(∀y P) = ∀y (t/x)P and (t/x)(∃y P) = ∃y (t/x)P if x ≠ y and y ∉ FV(t);
otherwise, rename y by swapping it with a fresh variable z: (y z)P

Many notions stacked on each other: variable swapping, then α-equivalence, then quotienting trees, then substitution

Many mistakes in books

Many mistakes in symbolic computation systems (programming languages, proof systems, . . . )
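Since this is precisely where mistakes happen, here is a minimal sketch of capture-avoiding substitution (t/x)P in Python, again on the hypothetical tuple encoding used in the earlier sketches; the fresh-name scheme z0, z1, . . . is an arbitrary choice.

# Sketch: capture-avoiding substitution (t/x)e on the tuple encoding used above.
import itertools

def fv(e):                                   # free variables (terms and formulae)
    tag = e[0]
    if tag == "var": return {e[1]}
    if tag in ("app", "pred"): return set().union(set(), *[fv(u) for u in e[2]])
    if tag == "not": return fv(e[1])
    if tag in ("and", "or", "imp"): return fv(e[1]) | fv(e[2])
    if tag in ("forall", "exists"): return fv(e[2]) - {e[1]}
    return set()

def fresh(avoid):
    return next(f"z{i}" for i in itertools.count() if f"z{i}" not in avoid)

def subst(e, x, t):
    """(t/x)e, renaming bound variables to avoid capturing the free variables of t."""
    tag = e[0]
    if tag == "var":
        return t if e[1] == x else e
    if tag in ("app", "pred"):
        return (tag, e[1], [subst(u, x, t) for u in e[2]])
    if tag == "not":
        return ("not", subst(e[1], x, t))
    if tag in ("and", "or", "imp"):
        return (tag, subst(e[1], x, t), subst(e[2], x, t))
    if tag in ("forall", "exists"):
        y, body = e[1], e[2]
        if y == x:                       # x is bound here: nothing to substitute
            return e
        if y in fv(t):                   # rename y to a fresh variable first
            z = fresh(fv(t) | fv(body) | {x})
            body = subst(body, y, ("var", z))
            y = z
        return (tag, y, subst(body, x, t))
    return e

# (y/x)(∃y (y = x)) renames the bound y, avoiding capture:
B = ("exists", "y", ("pred", "=", [("var", "y"), ("var", "x")]))
print(subst(B, "x", ("var", "y")))
# ('exists', 'z0', ('pred', '=', [('var', 'z0'), ('var', 'y')]))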

40 IV. The notion of proof & the soundness/completeness theorem

41 The definitions of validity & semantical consequence. . .
• . . . are annoying, as they heavily rely on the meta-level
• . . . are nowhere near offering an algorithmic way of checking (at least in predicate logic): even checking that a structure is a model of a formula may require infinitely many checks (∀ . . .)

Obsession of mathematicians: justify, by an argument of finite size, statements about infinitely many objects

In that respect, semantics = useless (translates object-level ∀ to meta-level "for all")

Looking for a more syntactic notion, with a finite object that
• justifies validity
• can be communicated and checked by someone else (or a computer)
⇒ The notion of proof

42 4,6.3.1 The notion of proof 1.4 Many different formalisms, defining what it means to prove A from T , denoted T ` A

In all of them (except exceptions) proofs are finite objects, with a notion of size (you can prove a property of all proofs by induction on their sizes)

Frege-Hilbert proofs: Proof of T ⊢ A = sequence of formulae such that:
• the last one is A
• every formula is either
  – a consequence of the previous ones by one of these two inference rules: modus ponens (from A ⇒ B and A, infer B) and generalisation (from A, infer ∀xA)
  – or a formula in T
  – or one of the "logical axioms" (see the INF412 course notes for their list, or practical #3)

43 4.3 Proofs as trees 1.4 By looking, in that sequence of formulae, at how they depend on each other by the two inference rules, you can also see a Frege-Hilbert proof as a tree
• whose internal nodes are labelled by instances of the two inference rules
• whose leaves are labelled by logical axioms or axioms from T
• whose conclusion is labelled by A

Example:

From the leaves A ⇒ B and A we infer B; from A ⇒ C and A we infer C; from B ⇒ C ⇒ D and B we infer C ⇒ D; from C ⇒ D and C we infer D. The resulting tree is a proof of A, A ⇒ B, A ⇒ C, B ⇒ C ⇒ D ⊢ D

Pushing the concept of proofs-as-trees further, Natural Deduction is an alternative formalism for proofs

44 4.3 Natural Deduction 1.4

axiom: Γ, A ⊢ A
⇒-intro: from Γ, A ⊢ B infer Γ ⊢ A ⇒ B          ⇒-elim: from Γ ⊢ A ⇒ B and Γ ⊢ A infer Γ ⊢ B
∀-intro: from Γ ⊢ P infer Γ ⊢ ∀x P (x ∉ FV(Γ))          ∀-elim: from Γ ⊢ ∀x P infer Γ ⊢ (t/x)P
∃-intro: from Γ ⊢ (t/x)P infer Γ ⊢ ∃x P          ∃-elim: from Γ ⊢ ∃x P and Γ, P ⊢ Q infer Γ ⊢ Q (x ∉ FV(Γ, Q))
∧-intro: from Γ ⊢ A1 and Γ ⊢ A2 infer Γ ⊢ A1 ∧ A2          ∧-elim: from Γ ⊢ A1 ∧ A2 infer Γ ⊢ Ai (i ∈ {1, 2})
∨-intro: from Γ ⊢ Ai infer Γ ⊢ A1 ∨ A2 (i ∈ {1, 2})          ∨-elim: from Γ ⊢ A1 ∨ A2, Γ, A1 ⊢ B and Γ, A2 ⊢ B infer Γ ⊢ B
⊤-intro: Γ ⊢ ⊤          ⊥-elim: from Γ ⊢ ⊥ infer Γ ⊢ A
¬-intro: from Γ, A ⊢ ⊥ infer Γ ⊢ ¬A          ¬-elim: from Γ ⊢ ¬A and Γ ⊢ A infer Γ ⊢ ⊥
Double-negation elimination: from Γ ⊢ ¬¬A infer Γ ⊢ A

45 6.3.4 Soundness theorem 2.2 Soundness theorem:

If T ` A then T |= A

Proof : Easy induction on the size of the proof, using truth tables. (just mind how you deal with free variables) Corollary: Let M be a model of T

If T ` A then M is a model of A By contraposition:

If M is not a model of A, then T 6` A This gives a method for showing that A cannot be proved from T :

find a model of T where ⟦A⟧ = 0

Particular case of consistency (A = ⊥):

Definition: T is consistent if ⊥ cannot be proved from T (i.e., T 6` ⊥). Consequence of Soundness: To prove that T is consistent, it suffices to find a model of T

46 Another consequence of soundness Let T be a theory and A a closed formula. Assume there is a model of T where ⟦A⟧ = 1 and another one where ⟦A⟧ = 0.

What can you say about T ` A? about T ` ¬A?

Hilbert seriously hoped that this would not be the case for T = PA

47 6.2-6.3 Completeness theorem - Goedel 1927 2.3 Completeness theorem: Assume the term and predicate signatures are denumerable.

If T |= A then T ` A

Proof : Suppose T ⊬ A. Clearly, T, ¬A ⊬ ⊥. If we could prove that every consistent theory has a model (see next slide), then that model of T, ¬A would be a model of T that is not a model of A. Qed.

Corollary: The compactness theorem

If T |= A then there is a finite subset Γ of T such that Γ |= A

Proof : By completeness, T ⊢ A. Since proofs are finite objects, a proof of T ⊢ A only uses finitely many axioms of T; call them Γ; we have Γ ⊢ A and by soundness Γ |= A.

48 6.3.5 If a theory is consistent then it has a model 2.3 Completion of a theory Lemma: If a theory T on a denumerable signature (Σ, Ψ) is consistent, then there is a consistent theory T′ ⊇ T on a denumerable signature (Σ′, Ψ) (with Σ′ ⊇ Σ) such that
1. for all closed formulae A, either T′ ⊢ A or T′ ⊢ ¬A
2. for all closed formulae ∃xA, if T′ ⊢ ∃xA then T′ ⊢ (t/x)A for some closed term t made from signature Σ′
Proof : see INF412 or INF551 course notes
Now, we define a model structure for the signature (Σ, Ψ):
• M is the set of closed terms on Σ′

• for every f/n in Σ, f̂ maps t1, . . . , tn to f(t1, . . . , tn)
• for every p/n in Ψ, p̂ maps t1, . . . , tn to 1 if T′ ⊢ p(t1, . . . , tn) and to 0 if not

Lemma: For all closed formulae A, if T′ ⊢ A then ⟦A⟧ = 1 in that model structure
Proof : Easy induction on the size of A, using 1. and 2.

This structure is a model of T (for all A ∈ T we have A ∈ T′ and therefore T′ ⊢ A)

49 V. Computability

50 7-8 Programs, computable functions and (semi-)decidable sets 3-4 Define a denumerable set of programs (e.g., Turing machines), each of them representing a partial function from N to N (i.e., a total function from N to N ∪ {∞}) We apply programs to numbers by interpreting them as the function they represent (if p is a program and n a number, we can write p(n))

Partial functions from N to N (or total functions from N to N ∪ {∞}) that are represented by programs are said to be computable

A program p terminates on n if p(n) 6= ∞

A set of natural numbers A is decidable (resp. semi-decidable) if the function from N to N ∪ {∞} mapping n to 1 if n ∈ A and to 0 if n ∉ A (resp. mapping n to 1 if n ∈ A and to ∞ if n ∉ A) is computable.

51 Generalising to other sets C, D, . . . Extend this to functions from C to D if you have injections ⌜·⌝C : C → N and ⌜·⌝D : D → N:
f : C → D is computable if the function from N to N mapping ⌜x⌝C to ⌜f(x)⌝D is computable

Formally, computability depends on the injections, but two injections ⌜·⌝, ⌜·⌝′ : C → N define the same notion of computability for C if ⌜x⌝ ↦ ⌜x⌝′ and its inverse (from N to N) are computable

Hence, 1 notion of computability for
• finite sets
• N² such that projections are computable
• Nⁿ such that projections are computable
• lists such that access to the head and access to the tail are computable
• trees such that access to sub-trees and node-labels are computable
etc. In practice, no ambiguity

52 8.1,8.3.3,8.7.2 Known results, easy proofs 3.4.1-3.4.2 Programs can be represented as numbers.

There is a universal program, i.e., a program u such that for any program p and any number n, u(⌜(⌜p⌝, n)⌝) = p(n)

The halting problem is undecidable: The set of all ⌜(⌜p⌝, n)⌝ such that p terminates on n is undecidable

The generalised halting theorem: Let A be a decidable set of programs such that every program in A always terminates (i.e., for all p ∈ A and all n ∈ N, p terminates on n). There is a total computable function that is not represented by a program in A

53 9.2.2 Back to logic 5.4 Formulae and proofs can be represented as numbers

Theorem: The set of all A such that PA ` A is semi-decidable:

Proof : Lemma (Checking a proof is decidable): The function mapping ⌜(⌜π⌝, ⌜A⌝)⌝ to 1 if π is a proof of PA ⊢ A, and to 0 otherwise, is computable by a program p. Given a closed formula A, enumerate all numbers n until p(⌜(n, ⌜A⌝)⌝) = 1. If PA ⊢ A, then the program will find a proof and terminate. If not it will run forever.

54 What’s next

Practical: try solving propositional problems by a computer

Next time: how people have done it before you

55 Questions?

56 Lecture II Automated reasoning mechanisms: propositional logic

57 The semantical view Remember: ˆ T ` A every model of T is a model of A (soundness & completeness)

• a model structure is either a model of A or a model of ¬A (⟦A⟧ ∈ {0, 1})

Provability (of A in theory T):

Do we have T ⊢ A? (YES / NO)
⇕
Is every model of T a model of A? (YES / NO)
⇕
Is there a model of T that is a model of ¬A? (NO / YES)
⇕
Is there a model of T, ¬A? (NO / YES)

Satisfiability (of ¬A with respect to theory T)

58 I. Propositional logic: solving SAT by DPLL

59 Propositional logic

T := ∅

class of formulae := those built from the following signatures:

Σ := ∅

Ψ := {p1/0, p2/0, . . . , pi/0,...} “Propositional variables”

No predicate symbol of arity ≥ 1 ⇒ formulae contain no terms

Formulae contain no terms ⇒ (t/x)A is A

(∀xA) ⇔ A

(∃xA) ⇔ A

60 General methodology To determine whether, in propositional logic, ` A, we simply determine whether there is a model of ¬A (i.e., a model structure satisfying ¬A, i.e., a model structure where ¬A = 1) J K What is a model structure for propositional logic? Applying definition of model structure:

A mapping from Ψ (the set of propositional variables) to {0, 1}

A formula is a finite object, using finitely many propositional variables

To determine whether there is a model of A, using propositional variables p1, . . . , pn it suffices to look at the model structures for p1, . . . , pn

How many? . . . 2ⁿ

The problem is decidable (enumerate every structure, and for each of them compute ⟦¬A⟧ with truth tables)

61 This can take time Are there easy cases?

1. If A is of the form l1 ∧ · · · ∧ lm,

where the li are literals (i.e., prop. variables or their negations), it's easy. 2 cases:
• If A does not feature both p and ¬p for any p, A is satisfiable: let the model structure map every li to 1
• If A features both p and ¬p for some p, A is unsatisfiable
2. If A is a disjunctive normal form (DNF), i.e., a disjunction of conjunctions of literals, it's easy: there is a model of A1 ∨ A2 iff there is a model of A1 or a model of A2, so take each conjunction of the disjunction and check whether one of them is satisfiable (then A is satisfiable) or all of them are unsatisfiable (then A is unsatisfiable).
In both 1 and 2, the question can be answered in linear time (in the size of the formula)
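A minimal sketch of cases 1 and 2 in Python; the representation of literals and clauses is my own choice for the example.

# Sketch: linear-time satisfiability for a conjunction of literals and for a DNF.
# A literal is (variable_name, polarity); a conjunction is a list of literals;
# a DNF is a list of such conjunctions.

def sat_conjunction(lits):
    """Return a model (dict) if the conjunction is satisfiable, else None."""
    model = {}
    for p, polarity in lits:
        if p in model and model[p] != polarity:
            return None              # both p and ¬p appear: unsatisfiable
        model[p] = polarity          # otherwise map the literal to 1
    return model

def sat_dnf(disjuncts):
    """A DNF is satisfiable iff one of its disjuncts is."""
    for conj in disjuncts:
        model = sat_conjunction(conj)
        if model is not None:
            return model
    return None

print(sat_conjunction([("p", True), ("q", False)]))          # {'p': True, 'q': False}
print(sat_conjunction([("p", True), ("p", False)]))          # None
print(sat_dnf([[("p", True), ("p", False)], [("q", True)]]))  # {'q': True}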

62 An example of 2. propositional variables = American swing states (Florida, Ohio, North Carolina, Virginia, Wisconsin, Colorado, Iowa, Nevada, New Hampshire)

mapped to 1 if won by Clinton, to 0 if won by Trump

A: “there is a tie in the number of electors”

63 (picture: the tree of possible swing-state scenarios)

64 If A is already expressed as

(¬Florida ∧ Ohio ∧ ¬NCarolina ∧ ¬Virginia ∧ Wisconsin ∧ ¬Colorado ∧ ¬Iowa ∧ ¬Nevada ∧ NHampshire)

∨ (¬Florida ∧ ¬Ohio ∧ NCarolina ∧ Virginia ∧ ¬Wisconsin ∧ ¬Colorado ∧ ¬Iowa ∧ ¬Nevada ∧ NHampshire)

∨ (¬Florida ∧ ¬Ohio ∧ ¬NCarolina ∧ Virginia ∧ Wisconsin ∧ Colorado ∧ ¬Iowa ∧ ¬Nevada ∧ ¬NHampshire)

∨ (¬Florida ∧ ¬Ohio ∧ ¬NCarolina ∧ Virginia ∧ ¬Wisconsin ∧ Colorado ∧ Iowa ∧ ¬Nevada ∧ NHampshire)

∨ (¬Florida ∧ ¬Ohio ∧ ¬NCarolina ∧ Virginia ∧ ¬Wisconsin ∧ Colorado ∧ ¬Iowa ∧ Nevada ∧ NHampshire)

then it’s easy to check satisfiability: each item of the big disjunction gives rise to a model so long as there is no inconsistency in it (p and ¬p in it)
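A brute-force check of that enumeration, as a sketch; the per-state elector counts below are assumed 2016 values (they are not given on the slides), with Clinton on 237 and Trump on 191 safe electors as on the next slides.

# Sketch: enumerate all 2^9 swing-state outcomes and keep those giving a 269-269 tie.
from itertools import product

swing = {"Florida": 29, "Ohio": 18, "NCarolina": 15, "Virginia": 13, "Wisconsin": 10,
         "Colorado": 9, "Iowa": 6, "Nevada": 6, "NHampshire": 4}   # assumed 2016 counts
clinton_safe, trump_safe = 237, 191

ties = []
for outcome in product([True, False], repeat=len(swing)):   # True = won by Clinton
    won = dict(zip(swing, outcome))
    clinton = clinton_safe + sum(n for s, n in swing.items() if won[s])
    trump = trump_safe + sum(n for s, n in swing.items() if not won[s])
    if clinton == trump:
        ties.append({s for s in swing if won[s]})

print(len(ties))          # 5 tie scenarios, matching the five disjuncts above
for t in ties:
    print(sorted(t))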

65 The worst case . . . is on the contrary when A is a conjunctive normal form (CNF): a conjunction of disjunctions of literals

In that case we can

• transform A into a propositionally equivalent DNF by distributing ∧ over ∨: A ∧ (B ∨ C) −→ (A ∧ B) ∨ (A ∧ C)

Exponential complexity!

• Or try to enumerate the 2ⁿ model structures

Exponential complexity!

We will choose a smart version of the latter: the incremental construction of a model structure that has a chance of working

DPLL

66 DPLL (stands for Davis–Putnam–Logemann–Loveland, 1962)

The idea of the incremental construction is to avoid investigating all 2ⁿ structures by efficiently cutting branches in the tree of possibilities
Working on the worst-case scenario: is a CNF satisfiable?
Terminology
• A literal l is a prop. variable p or its negation ¬p; its complement l̄ is defined by: p̄ := ¬p, and the complement of ¬p is p
• A clause C is a set of literals (to be understood as a disjunction)
The input is a set φ of clauses (to be understood as a conjunction of clauses) (in other words, we do not care about associativity and commutativity of ∧ and ∨)
The output is SAT (with a model) or UNSAT

67 Ingredients and notations We now organise the incremental construction of a model. The DPLL process will make transitions between states that are either the state UNSAT or a state of the form ∆ ∥ φ

where ∆ is a sequence of possibly tagged literals, i.e., literals, like l, and tagged literals, like lᵈ (also called decision literals)
Intuition:

• To each state ∆ ∥ φ where ∆ is consistent (does not contain both l and l̄) corresponds a partial model: the partial map assigning 1 to the literals in ∆ (regardless of tags)
• When we reach ∆ ∥ φ, all the possible model structures extending (that represented by) ∆ remain to be checked as potential models of φ . . . but not only! If ∆ contains a tagged literal lᵈ, i.e., ∆ = ∆1, lᵈ, ∆2, it means we have made the arbitrary decision of mapping l to 1, and all the structures

extending (that represented by) ∆1, l̄ also remain to be checked

68 Outcome of DPLL We start from ()kφ

We apply the transition rules until we no longer can

If we reach UNSAT, then φ is unsatisfiable

If we reach ∆ ∥ φ and no rule applies, the structure represented by ∆ is a model of φ

69 Transition rules lit(φ) denotes the set of literals appearing in φ

We write ∆ |= ¬C if, for every l ∈ C, we have l̄ appearing in ∆ (tagged or untagged)

i.e., there is no extension of (the structure represented by) ∆ that is a model of C

• Decide: ∆ ∥ φ → ∆, lᵈ ∥ φ where l ∉ ∆, l̄ ∉ ∆, l ∈ lit(φ)

• Backtrack: ∆1, lᵈ, ∆2 ∥ φ → ∆1, l̄ ∥ φ if there is C ∈ φ such that ∆1, lᵈ, ∆2 |= ¬C

and ∆2 has no decision literals

• Fail: ∆ ∥ φ → UNSAT if there is C ∈ φ such that ∆ |= ¬C and ∆ has no decision literals

70 Short-cutting the exploration of 2ⁿ possibilities The above rules are complete: they describe the depth-first exploration of the tree of possibilities (of height n). Apply Decide eagerly and you get the full exploration of 2ⁿ possibilities; apply Backtrack and Fail eagerly and you stop investigating a sub-tree as soon as it becomes clear that no model will be found there

To avoid even branching on a literal that is a consequence of previous ones, DPLL offers:
Unit propagation: ∆ ∥ φ → ∆, l ∥ φ if there is (C ∨ l) ∈ φ such that ∆ |= ¬C, and l ∉ ∆, l̄ ∉ ∆
Pure literal: ∆ ∥ φ → ∆, l ∥ φ if l ∉ ∆, l̄ ∉ ∆, l ∈ lit(φ) and l̄ ∉ lit(φ)
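For illustration, here is a compact Python sketch in the spirit of these rules; it is not the transition system itself (it is recursive, implements only Decide, unit propagation and backtracking, and omits the pure-literal rule), and the clause encoding by signed integers is my own choice.

# Sketch of DPLL: clauses are sets of non-zero integers (-p stands for ¬p).
def unit_propagate(clauses, assignment):
    """Repeatedly apply unit propagation; return None on a falsified clause."""
    changed = True
    while changed:
        changed = False
        for clause in clauses:
            if any(l in assignment for l in clause):
                continue                      # clause already satisfied
            undecided = [l for l in clause if -l not in assignment]
            if not undecided:
                return None                   # ∆ ⊨ ¬C: conflict
            if len(undecided) == 1:           # unit clause: propagate its literal
                assignment = assignment | {undecided[0]}
                changed = True
    return assignment

def dpll(clauses, assignment=frozenset()):
    """Return a satisfying set of literals, or None if the clauses are UNSAT."""
    assignment = unit_propagate(clauses, assignment)
    if assignment is None:
        return None
    undecided = {abs(l) for c in clauses for l in c} - {abs(l) for l in assignment}
    if not undecided:
        return assignment                     # no rule applies: model found
    p = min(undecided)                        # Decide, then backtrack if needed
    return dpll(clauses, assignment | {p}) or dpll(clauses, assignment | {-p})

# Example from the next slides: ¬1∨2, ¬3∨4, ¬5∨6, ¬6∨¬5∨¬2
phi = [frozenset({-1, 2}), frozenset({-3, 4}), frozenset({-5, 6}), frozenset({-6, -5, -2})]
print(dpll(phi))                                   # a model, e.g. containing -5
print(dpll([frozenset({1}), frozenset({-1})]))     # None (UNSAT)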

71 Termination and determinism System is not deterministic: you are still free to choose how you apply transition rules

Theorem: No matter the strategy applying the rules, DPLL always terminates.

Proof: To each state ∆ ∥ φ, let
• a be the number of model structures that remain to be checked (2ⁿ at first)
• b be the number of literals in lit(φ) not determined in ∆
(a, b) strictly decreases with each transition (lexicographically) ∎

Which strategy to use? In your interest: apply Decide as a last resort to factorize the gains of branch-cutting

But: applying other rules first requires checking side-conditions (look at all clauses?) computational cost

Efficient strategy needs to implement appropriate data structures and algorithms to perform eager application of other rules with efficient checks of side-conditions (e.g., technique of “2-watched literals”)

72 Example

Writing the clauses of φ as ¬1 ∨ 2, ¬3 ∨ 4, ¬5 ∨ 6, ¬6 ∨ ¬5 ∨ ¬2:

() ∥ φ
1ᵈ ∥ φ                     (Decide)
1ᵈ 2 ∥ φ                   (Unit propagation)
1ᵈ 2 3ᵈ ∥ φ                (Decide)
1ᵈ 2 3ᵈ 4 ∥ φ              (Unit propagation)
1ᵈ 2 3ᵈ 4 5ᵈ ∥ φ           (Decide)
1ᵈ 2 3ᵈ 4 5ᵈ 6 ∥ φ         (Unit propagation)
1ᵈ 2 3ᵈ 4 ¬5 ∥ φ           (Backtrack, on the conflict with ¬6 ∨ ¬5 ∨ ¬2)

Decision 1ᵈ incompatible with decision 5ᵈ

Decision 3ᵈ has nothing to do with it: finding that ¬5 must be in the model is lost information for the future branch ¬3

We would like to go directly to 1ᵈ 2 ¬5 ∥ φ

73 Improving on backtrack
Backjump: ∆1, lᵈ, ∆2 ∥ φ → ∆1, lbj ∥ φ if
1. there is C ∈ φ such that ∆1, lᵈ, ∆2 |= ¬C (the "conflict clause")
2. there is a clause C′ ∨ lbj with lit(C′ ∨ lbj) ⊆ lit(φ), the "backjump clause", such that
   • φ ⊢ C′ ∨ lbj (the backjump clause is a consequence of the previous clauses)
   • ∆1 |= ¬C′

3. lbj ∉ ∆1, l̄bj ∉ ∆1

Here we apply it with ∆1 = 1ᵈ 2, lᵈ = 3ᵈ, ∆2 = 4 5ᵈ 6, conflict clause C = ¬6 ∨ ¬5 ∨ ¬2 (indeed 1ᵈ 2 3ᵈ 4 5ᵈ 6 |= ¬(¬6 ∨ ¬5 ∨ ¬2)), and backjump clause C′ ∨ lbj = ¬2 ∨ ¬5 with C′ = ¬2 and lbj = ¬5, such that ¬1 ∨ 2, ¬3 ∨ 4, ¬5 ∨ 6, ¬6 ∨ ¬5 ∨ ¬2 ⊢ ¬2 ∨ ¬5
. . . and we get a transition from 1ᵈ 2 3ᵈ 4 5ᵈ 6 ∥ φ to 1ᵈ 2 ¬5 ∥ φ
Clearly, the difficulty (i.e., computational cost) is in inventing the backjump clause ¬2 ∨ ¬5; it can come from the analysis of the conflict

74 But the work is not wasted! We have learnt a clause (the Backjump clause) that we can reuse later

Learn: ∆ ∥ φ → ∆ ∥ φ, C if lit(C) ⊆ lit(φ) and φ ⊢ C

Typical application of this is after a Backjump

Careful in eagerly adding such clauses: having too many (redundant) clauses can also slow down the process

Sometimes it can be useful to remove a redundant clause: Forget: ∆ ∥ φ, C → ∆ ∥ φ if φ ⊢ C

Yet another potentially useful rule: Restart: ∆ ∥ φ → () ∥ φ, only useful if φ is different from the original set of clauses (clauses have been learnt)

75 Conclusions New rules, new possibilities ⟹ more chances of speeding up the process ⟹ more strategies needed to control how to apply them

Not in the scope of this course to investigate them.

Back to the original problem: deciding provability in propositional logic We wonder whether ` A

Assume ¬A to be, or reduce ¬A to, a CNF

Start DPLL with () ∥ ¬A

If UNSAT: then we have ` A If SAT: then we have 6` A

Of course, reducing ¬A to a CNF is also computationally costly (imagine if it is a DNF!), so there are ways to adapt DPLL to formulae that are not CNF

76 Questions?

77 Lecture III Extending propositional reasoning

78 In the previous lecture . . . we have seen a methodology to solve problems in propositional logic (a.k.a. SAT problems); it featured a main algorithm: DPLL

Many problems can be modelled as SAT problems

Can we extend our range of computer-aided reasoning techniques beyond propositional logic?

Example : Arithmetic can we decide whether the following holds? (2x > 1 ∧ (x > −3 ⇒ y > 0)) ⇒ x + y > 0

(implicitly: for all x, y)

79 Other example: the Clinton vs. Trump elections

In the previous lecture, I used the picture to illustrate the tree of the possible scenarios . . . and “expressed” the issue of whether there is a tie as a boolean DNF, merely enumerating the 5 models, which I had pre-computed from arithmetic considerations

80 Other example: the Clinton vs. Trump elections Using arithmetic literals allows the tie to be naturally expressed as a CNF, literally expressing that the number of electors for each candidate is the same, and noting that, before considering swing states, Clinton had secured 237 electors, and Trump 191:

• For each swing state s, with n_s electors, we have two arithmetic variables c_s and t_s (number of electors won by Clinton and Trump in that state) and we place in the CNF the following clauses

0 ≤ c_s ≤ n_s, 0 ≤ t_s ≤ n_s (four unit clauses)

n_s ≤ c_s ∨ n_s ≤ t_s and c_s ≤ 0 ∨ t_s ≤ 0 (two binary clauses)

• Add to the CNF:

237 + cFlorida + cOhio + cNCarolina + cVirginia + cWisconsin + cColorado + cIowa + cNevada + cNHampshire

= 191 + tFlorida + tOhio + tNCarolina + tVirginia + tWisconsin + tColorado + tIowa + tNevada + tNHampshire

Question: Can we extend propositional reasoning (as provided by e.g., DPLL) to integrate arithmetic concepts, and answer the question of whether there may be a tie?

81 Contents

I. Ground problems II. Quantifier elimination III. Trying to get full arithmetic

82 I. Ground problems

83 Simplex algorithm Input: a collection of multivariate polynomials of degree ≤ 1. In other words, each polynomial is of the form

a1x1 + . . . + amxm + b with coefficients in Z

Output: whether there exists an instantiation of the variables such that the value of each polynomial for that instantiation is positive

Problem can be expressed in matrix format: A · x ≥ −b, where A is the n × m matrix of the coefficients (aij), x the vector of variables (x1, . . . , xm) and b the vector of constant terms (b1, . . . , bn)

Problem can be solved by Linear Algebra methods

Natively provides solutions in Q
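As an illustration only (not the algorithm studied in the course), feasibility of such a system can be checked with an off-the-shelf linear-programming routine; the sketch below assumes SciPy is available and encodes a1x1 + . . . + amxm + b ≥ 0 as −A·x ≤ b.

# Sketch: feasibility of { a_i1*x1 + ... + a_im*xm + b_i >= 0 } via linear programming.
import numpy as np
from scipy.optimize import linprog

def feasible(A, b):
    """A: n x m coefficient matrix, b: length-n constants; test A @ x >= -b."""
    A = np.asarray(A, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = A.shape
    res = linprog(c=np.zeros(m),          # zero objective: pure feasibility check
                  A_ub=-A, b_ub=b,        # A @ x >= -b  <=>  -A @ x <= b
                  bounds=[(None, None)] * m)
    return res.x if res.success else None

# x - 1 >= 0 and -x + 3 >= 0: satisfiable (any x in [1, 3])
print(feasible([[1], [-1]], [-1, 3]))
# x - 1 >= 0 and -x - 1 >= 0: unsatisfiable
print(feasible([[1], [-1]], [-1, -1]))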

84 Application to automated reasoning in arithmetic Take first-order signature (Σ, Ψ) of Linear Arithmetic, where Σ = {0, 1, +, −} Ψ = {≤, ≥, <, >}

Axioms of Linear Arithmetic: 2 versions, depending on whether variables range over
• integers (Linear Integer Arithmetic - LIA)
• rationals or reals (Linear Rational/Real Arithmetic - LRA).
Whichever of LIA or LRA is used, every literal (of free variables among {x1, . . . , xm}) is equivalent to a literal of the form a1x1 + . . . + amxm + b ≥ 0 or a1x1 + . . . + amxm + b > 0

A conjunction of literals l1 ∧ · · · ∧ ln (free variables among {x1, . . . , xm}) forms an input of the simplex algorithm (massaging it slightly to handle strict inequalities). Simplex
• gives an instantiation in Q: l1 ∧ · · · ∧ ln consistent with LRA
• says there is no instantiation: l1 ∧ · · · ∧ ln inconsistent with LRA
Question: Can we use this decision procedure to determine whether an arbitrary formula over (Σ, Ψ) can be proved in LRA? (and what about LIA?)

85 More generally Consider a theory T with a ground decision procedure, i.e., an algorithm that

• takes as input a collection of literals l1, . . . , ln

• outputs whether the conjunction of the literals l1, . . . , ln is consistent with T

i.e., whether T, l1, . . . , ln ⊬ ⊥

Can we determine whether an arbitrary formula A is provable in T (i.e., T ` A)?

Idea: Adapt DPLL using the above decision procedure, to determine whether there is a model of T , ¬A

DPLL(T )

As in DPLL, it works when ¬A is (equivalent to) a CNF φ. Difference with DPLL: literals are no longer propositional variables or their negations but may be any literals (say, in the syntax of first-order logic) to which theory T now gives “semantical meaning” (e.g., l = (x < 5y) in arithmetic)

86 General architecture DPLL(T ) is therefore a technique to turn

an algorithm deciding whether there is a model of T , l1, . . . , ln into

an algorithm deciding whether there is a model of T , φ

DPLL(T ) enlarges the class of formulae that can be decided by the first algorithm

Now we keep the notation ∆|=¬C as before (syntactical entailment, T not involved) but we also write ∆|=T (semantical inconsistency) if the first decision algorithm says that there is no model of T , ∆ (i.e., no extension of (the structure represented by) ∆ is a model of T )

As in DPLL we start the algorithm with the empty model () ∥ φ. Then if we reach state UNSAT: φ (and therefore ¬A) inconsistent with T (i.e., T ⊢ A); state ∆ ∥ φ and no rule applies: φ (and therefore ¬A) consistent with T (i.e., T ⊬ A)

87 Transition rules Syntactical rules

• Decide: ∆ ∥ φ → ∆, lᵈ ∥ φ where l ∉ ∆, l̄ ∉ ∆, l ∈ lit(φ)

• Backtrack: ∆1, lᵈ, ∆2 ∥ φ → ∆1, l̄ ∥ φ if there is C ∈ φ such that ∆1, lᵈ, ∆2 |= ¬C

and ∆2 has no decision literals
• Fail: ∆ ∥ φ → UNSAT if there is C ∈ φ such that ∆ |= ¬C and ∆ has no decision literals
• Unit propagation: ∆ ∥ φ, C ∨ l → ∆, l ∥ φ, C ∨ l where ∆ |= ¬C, l ∉ ∆, l̄ ∉ ∆
Semantical rules (involve calling the decision procedure for the theory)

• Backtrack_T: ∆1, lᵈ, ∆2 ∥ φ → ∆1, l̄ ∥ φ if ∆1, lᵈ, ∆2 |=_T and ∆2 has no decision literals

• Fail_T:

∆ ∥ φ → UNSAT if ∆ |=_T and ∆ has no decision literals
• Theory Propagate:

∆ ∥ φ → ∆, l ∥ φ where ∆, l̄ |=_T, l ∈ lit(φ), l ∉ ∆, and l̄ ∉ ∆

88 Strategies First possibility: “syntactical rules” (first 4) applied eagerly ⇒ “semantical rules” only applied if propositional DPLL would return SAT with model ∆ amounts to running a SAT-solver that ignores semantics of literals, rejecting the models it proposes if they do not work for the theory until one works for the theory, or all propositional models are proposed and rejected

Advantage: can use on-the-shelf SAT-solver and decision procedure, only code to write is for the above loop

Second possibility: "semantical rules" applied before Decide cut branches early, before a partial assignment is completed, as soon as the already assigned values are inconsistent with T

To do better than that: if the partial assignment ∆ is inconsistent with T, it'd be good if the decision procedure returned some subset l1, . . . , ln of ∆ that is already inconsistent.

The smaller the better, as we could learn the clause l̄1 ∨ . . . ∨ l̄n and use it syntactically

89 Other rules

The extra rules LearnT , ForgetT , BackjumpT can be generalised to the case with such a theory T : consequences are now semantical consequences (consequences modulo the theory) of original clauses, rather than syntactical consequences

No change in Restart: ∆ ∥ φ → () ∥ φ

Be careful with the Pure literal rule! The rule (as such) is incorrect. Example: x < 5 is in a clause, ¬(x < 5) appears nowhere in the problem It is unsafe to assume x < 5 without loss of generality (imagine you must satisfy ¬(x < 2 + 3)!)

90 Examples of theories with ground decision procedure LRA, with the simplex algorithm.

LIA, with simplex+ “Branch & Bound” techniques to rule out non-integer solutions: Simplex gives a solution where x 7→ 3/4? Add to DPLL the clause (x≤0 ∨ x≥1), valid in LIA (but obviously not in LRA). Still, some more tricks needed to ensure that the whole algorithm terminates.

Equality for first-order terms (where function symbols have no semantical meaning) whether an inconsistency such as

a = b, c = f(a), d = f(b), c ≠ d |=_Eq

Arrays: theory on term signature { ·[·], write(·, ·, ·) } and predicate signature {=}. Axioms: those of equality, plus

∀a ∀i ∀j ∀v (i = j ⇒ write(a, i, v)[j] = v)

∀a ∀i ∀j ∀v (i ≠ j ⇒ write(a, i, v)[j] = a[j])

91 Conclusion for DPLL(T )

A whole field about whether, given a theory T1 and a theory T2 both with ground decision procedures, a ground decision procedure can be defined for the theory T1 ∪ T2

(answer is yes under some sufficient assumptions. . . )

DPLL and DPLL(T ) are useful tools to treat problems with very large inputs (thousands/millions of clauses)

Heavily used in industry to model systems, to verify programs (with tools such as Why3)

Limitations?

Not all formulae treated: no quantifier in A

Actually: if we want to prove A, DPLL and DPLL(T ) “treat” universal quantifier ∀, since the free variables of A (say {x1, . . . , xn}) behave as if universally quantified (∀x1 ... ∀xnA)

But what about ∃, and alternations of ∀ and ∃?

92 II. Quantifier elimination

93 General principle Definition A theory T admits quantifier elimination if for all (first-order) formulae F , there is a quantifier-free formula G such that T ` F ⇔ G

Reduction theorem Assume T is such that, for any formula F of the form

∃y(l1 ∧ · · · ∧ ln)

(where l1, . . . , ln are literals) there is a quantifier-free formula G such that T ` F ⇔ G Then T admits quantifier elimination.

Proof: By induction on the structure of formulae ∎

94 Example: total dense orders with no endpoints 1/3 Consider the following theory based on an empty term signature, an equality binary predicate = and another binary predicate <.

We define the theory of total dense orders with no endpoints as: Totality: ∀x∀y(x = y ∨ x < y ∨ y < x)

Transitivity: ∀x∀y∀z(x < y ∧ y < z ⇒ x < z)

Irreflexivity: ∀x¬(x < x)

Density: ∀x∀y(x < y ⇒ ∃z(x < z ∧ z < y))

No left end point: ∀x∃y(y < x)

No right end point: ∀x∃y(x < y)

Examples of models of this theory: the real and the rational numbers with the symbol < interpreted by the usual order relation.

95 Example: total dense orders with no endpoints 2/3 Theorem This theory admits quantifier elimination

Proof

We use the Reduction Theorem. We only have to consider formulae of the form

∃y(l1 ∧ · · · ∧ ln) We seek a quantifier-free formula equivalent to the above.

Negated literals among l1 . . . ln are replaced by small disjunctions by the totality axiom

∧ are distributed over disjunctions

∃y is distributed over disjunctions

We now simply have to find a quantifier-free formula for any formula F of the form

∃y(a1 ∧ · · · ∧ am) where each ai is an atom

96 Example: total dense orders with no endpoints 3/3

If ai does not feature y, we move it out of the scope of the quantifier

If ai is y = y we remove it

If ai is y = v (v is a variable necessarily different from y) we remove it, replace y by v everywhere in the scope of the quantifier, and remove the quantifier

If ai is y < y, the formula is equivalent to ⊥

After the above, the only atoms in the scope of the quantifier are: either of the form l < y or of the form y < r

Doing the steps above produces a formula of the form G ∧ ∃y((⋀_i l_i < y) ∧ (⋀_j y < r_j)) where y is not free in G
This is equivalent to G ∧ ⋀_{i,j} l_i < r_j (using density and the no-end-point axioms!)
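A small worked instance of that last step, written in LaTeX (the particular atoms are my own example):

% Worked instance, with l_1 = x, l_2 = u on the left of y and r_1 = z on its right:
\[
  \exists y\,(x < y \;\wedge\; u < y \;\wedge\; y < z)
  \quad\Longleftrightarrow\quad
  x < z \;\wedge\; u < z
\]
% left-to-right by transitivity; right-to-left by density, taking y strictly
% between the larger of x, u and z (the order being total).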

97 Other examples of first-order theories admitting quantifier elimination ˆ Linear arithmetic on natural numbers Presburger, 1929 ˆ Linear arithmetic on rational numbers (LRA) Fourier, 1827 - Motzkin, 1936 ˆ Non-linear arithmetic on rational/real numbers Tarski, 1948 ˆ Non-linear arithmetic on complex numbers Tarski, 1948

Note on Linear Integer Arithmetic (also applies to Presburger arithmetic):

With signature ({0, 1, +, −}, {≤, ≥, <, >}), LIA does not actually admit quantifier elimination.

For instance, the formula ∃x(y = x + x) is not equivalent to any quantifier-free formula

But if we add an infinite number of (unary) predicate symbols (n| )n≥2, with axioms describing n|y as ∃x(y = nx)...

. . . then it admits quantifier elimination (see “efficient” algorithm by Cooper -1972)

98 Conclusion for quantifier elimination Admitting quantifier-elimination does not necessarily provide algorithm to decide the validity of all formulae!

We still need algorithm to decide quantifier-free problems

In the case of Linear Integer Arithmetic, simplex algorithm is not sufficient: we have to treat the new predicate symbols (n| )n≥2

Can be done.

When quantifier-elimination does provide algorithm for all formulae, efficiency can be very bad! (Cooper’s reduction is doubly exponential) In practice: may often be more efficient to use an incomplete method (e.g., triggers in SMT-solving)

What about full integer arithmetic (Peano’s arithmetic PA, i.e., with multiplication)? Can we have an algorithm to decide whether PA ` A?

99 III. Trying to get full arithmetic

100 Recursive functions

Gödel'31 (Incompleteness): There exists a closed formula A such that PA ⊬ A and ⟦A⟧^N = 1
   (Easy1 ⇓ and Easy2 ⇑ relate it to the next statement)
Incompleteness': There exists a closed formula B

such that PA ⊬ B and PA ⊬ ¬B
   (Easy3 ⇑)
Church'36 (Undecidability): The set of all C such that PA ⊢ C is undecidable
   (Turing'37 ⇑, through the equivalence of recursive functions, the λ-calculus and Turing machines)
The halting problem is undecidable

101 (Same diagram as above, with additional arrows labelled Rosser'39, and with the three models of computation: recursive functions, λ-calculus, Turing machines.)

102 Easy 5.5 Easy1: take B := A, then soundness

Easy2: take A among {B, ¬B} s.t. ⟦A⟧^N = 1

Easy3: Simultaneously look for a proof of A and a proof of ¬A: g(⌜A⌝) := smallest n that encodes a proof of PA ⊢ A or a proof of PA ⊢ ¬A. 4 possibilities:
1. PA ⊢ A and PA ⊬ ¬A: g terminates and outputs a proof of PA ⊢ A
2. PA ⊬ A and PA ⊢ ¬A: g terminates and outputs a proof of PA ⊢ ¬A
3. PA ⊬ A and PA ⊬ ¬A: g does not terminate
4. PA ⊢ A and PA ⊢ ¬A: impossible by the existence of a model of PA (we have seen one: N)
Assuming completeness (3 never happens), g would always terminate & could be used for deciding PA ⊢ A

103 Not so easy: Turing 37 (Undecidability from halting problem) 5.3 We define a computable function F mapping each program f and each number n to a formula of arithmetic Af,n s.t.

PA ` Af,n iff f terminates on input n

By contradiction: If there were a program (call it G) to decide provability of formulae in PA, it would decide provability in PA of formulae of the form A_{f,n}. By composing with F (which maps (f, n) to A_{f,n}), we would have a program G ∘ F deciding whether "the program f terminates on input n" (the halting problem); contradiction.

104 In details: the Representation Theorem 5.3

Given a program f, construct a formula B_f with free variables among {x, y} s.t. the Representation Theorem holds: the following 3 assertions are equivalent
f(p) = q
PA ⊢ (q̄/y)(p̄/x)B_f
⟦(q̄/y)(p̄/x)B_f⟧^N = 1
where p̄ := S(. . . S(0) . . .) (p times)

In that case the following 3 assertions are equivalent:

f terminates on p
PA ⊢ ∃y (p̄/x)B_f
⟦∃y (p̄/x)B_f⟧^N = 1
All the hard work is in the construction of B_f from f (see INF412 or INF551 course notes)

Note that the function mapping f to B_f must be computable. QED

105 Questions?

106 Lecture IV Semi-decision procedures for proof-search: goal-directed techniques and unification

107 Automated reasoning techniques you know so far For propositional logic: DPLL

For propositional logic + decision procedure for universally quantified conjunctions of literals: DPLL(T )

When general quantifiers are involved: ˆ For some specific theories (if lucky): decision procedure given by quantifier elimination techniques (cf Lecture 3) ˆ Otherwise: in general, problem not decidable

108 Semi-decidability BUT: the problem of provability (in predicate logic) is still semi-decidable

Semi-decision algorithm for provability of T ⊢ A: Enumerate all trees (whose nodes are labelled by formulae). Check every time whether the tree is a proof of T ⊢ A

Correctness: If T ` A is provable, then at some point one of its proofs will be enumerated Otherwise the algorithm does not terminate

Comments: Sufficient to prove semi-decidability and Gödel's theorem. Useless in practice

Today and next week: better semi-decision algorithms, to be used in practice

109 Contents

I. Proof-search: from Natural Deduction to Sequent Calculus
II. The rules of Sequent Calculus
III. Proof-search in the cut-free Sequent Calculus

110 I. Proof-search: from Natural Deduction to Sequent Calculus

111 The proof system you know so far: Natural Deduction

Introduction rules:
⇒-intro: from Γ, P ⊢ Q infer Γ ⊢ P ⇒ Q
∧-intro: from Γ ⊢ P1 and Γ ⊢ P2 infer Γ ⊢ P1 ∧ P2
∨-intro: from Γ ⊢ Pi infer Γ ⊢ P1 ∨ P2 (i ∈ {1, 2})
⊤-intro: Γ ⊢ ⊤
∀-intro: from Γ ⊢ P infer Γ ⊢ ∀x P (x ∉ FV(Γ))
∃-intro: from Γ ⊢ (t/x)P infer Γ ⊢ ∃x P
¬-intro: from Γ, P ⊢ ⊥ infer Γ ⊢ ¬P

Elimination rules:
⇒-elim: from Γ ⊢ P ⇒ Q and Γ ⊢ P infer Γ ⊢ Q
∧-elim: from Γ ⊢ P1 ∧ P2 infer Γ ⊢ Pi (i ∈ {1, 2})
∨-elim: from Γ ⊢ P1 ∨ P2, Γ, P1 ⊢ Q and Γ, P2 ⊢ Q infer Γ ⊢ Q
⊥-elim: from Γ ⊢ ⊥ infer Γ ⊢ P
∀-elim: from Γ ⊢ ∀x P infer Γ ⊢ (t/x)P
∃-elim: from Γ ⊢ ∃x P and Γ, P ⊢ Q infer Γ ⊢ Q
¬-elim: from Γ ⊢ P and Γ ⊢ ¬P infer Γ ⊢ ⊥

Plus: axiom Γ, P ⊢ P, and double-negation elimination: from Γ ⊢ ¬¬P infer Γ ⊢ P

112 Principles of goal-directed proof-search Instead of enumerating all trees & then check if they are correct proof-trees of your goal. . .

. . . try to construct trees ˆ where each node (and its children) form a correct instance of inference rule ˆ whose root is labelled with your goal

Construct such a tree incrementally from its root, i.e., the goal Goal-directed

Incrementally = At each leaf, extend the proof-tree with new leaf: the old leaf should be the conclusion of an inference rule the new leaves should be its premisses

P ⊢ Q ⇒ (P ∧ Q)        by ⇒-intro, from
P, Q ⊢ P ∧ Q           by ∧-intro, from
P, Q ⊢ P  and  P, Q ⊢ Q     both by axiom

Let’s try an introduction rule. Another one. Let’s try an axiom.

113 Elimination rules Situation is not as good

Can always apply ∧-elim: from Γ ⊢ A ∧ B infer Γ ⊢ A

B must be guessed (not present in the conclusion), so should all possible B be enumerated?

An example:

From the axiom P ∧ Q ⊢ P ∧ Q, ∧-elim gives P ∧ Q ⊢ P

How were ∧ and Q guessed?

114 An asymmetry In natural deduction, while trying to prove T ` A,

Shape of A guides the choice of introduction rules

Shape of hypotheses in T does not guide the choice of elimination rules (at least directly)

115 The spirit of Sequent Calculus (Gentzen 1935) Introduction rules (work well): they are kept = right rules

Elimination rules: replaced by introduction rules for hypotheses = left rules

Example: ∧-left: from Γ, A, B ⊢ C infer Γ, A ∧ B ⊢ C ("what to do with this hypothesis?"). Back to our example: from P, Q ⊢ P (axiom), ∧-left gives P ∧ Q ⊢ P

116 Other left rules

∨-left: from Γ, A ⊢ C and Γ, B ⊢ C infer Γ, A ∨ B ⊢ C

¬-left: from Γ ⊢ A infer Γ, ¬A ⊢ B

⇒-left: from Γ ⊢ A and Γ, B ⊢ C infer Γ, A ⇒ B ⊢ C

⊥-left: Γ, ⊥ ⊢ A

117 Elimination of Double Negation In Natural Deduction: EDN saves copies of formulae on the left. Could be kept in Sequent Calculus: . . . or you can keep saved copies on the right

Compare two proofs of ¬¬(P ⇒ Q), P ⊢ Q. With saved copies on the left (via the ¬-rules), the derivation goes through ¬¬(P ⇒ Q), P, ¬Q ⊢ ⊥ and ends with an EDN step on ¬¬Q. Keeping several propositions on the right instead:

P ⊢ P (axiom)      P, Q ⊢ Q (axiom)
P, P ⇒ Q ⊢ Q                          (⇒-left)
P ⊢ ¬(P ⇒ Q), Q                       (¬-right)
¬¬(P ⇒ Q), P ⊢ Q                      (¬-left)

Sequents with several propositions on the right.

118 II. The rules of Sequent Calculus

119 Logical rules

⊤-right: Γ ⊢ ⊤, ∆          ⊥-left: Γ, ⊥ ⊢ ∆
∧-right: from Γ ⊢ A, ∆ and Γ ⊢ B, ∆ infer Γ ⊢ A ∧ B, ∆          ∧-left: from Γ, A, B ⊢ ∆ infer Γ, A ∧ B ⊢ ∆
∨-right: from Γ ⊢ A, B, ∆ infer Γ ⊢ A ∨ B, ∆          ∨-left: from Γ, A ⊢ ∆ and Γ, B ⊢ ∆ infer Γ, A ∨ B ⊢ ∆
⇒-right: from Γ, A ⊢ B, ∆ infer Γ ⊢ A ⇒ B, ∆          ⇒-left: from Γ ⊢ A, ∆ and Γ, B ⊢ ∆ infer Γ, A ⇒ B ⊢ ∆
¬-right: from Γ, A ⊢ ∆ infer Γ ⊢ ¬A, ∆          ¬-left: from Γ ⊢ A, ∆ infer Γ, ¬A ⊢ ∆

The axiom rule: Γ, A ⊢ A, ∆
From bottom to top: every rule decreases the number of connectives
Height of proof-trees is bounded! Decision algorithm . . . for propositional logic

120 What about quantifiers?

∀-right: from Γ ⊢ P, ∆ infer Γ ⊢ ∀x P, ∆ (x ∉ FV(Γ, ∆))          ∀-left: from Γ, ∀x P, (t/x)P ⊢ ∆ infer Γ, ∀x P ⊢ ∆

∃-right: from Γ ⊢ (t/x)P, ∃x P, ∆ infer Γ ⊢ ∃x P, ∆          ∃-left: from Γ, P ⊢ ∆ infer Γ, ∃x P ⊢ ∆ (x ∉ FV(Γ, ∆))

In Natural Deduction, hypotheses are permanent and can be used several times;

here, ∀-left and ∃-right need to keep a duplicate of the formula

121 The drinker’s theorem Consider the following statement:

“There is always someone such that, if he drinks, everybody drinks”

122 Proof - Informal Take the first guy you see, call it Bob.

Either Bob does not drink, in which case he satisfies the predicate “if he drinks, everybody drinks”

. . . or Bob drinks, in which case we have to check that everybody else drinks

If this is the case, then again Bob is the person we are looking for

If we find someone who does not drink, call it Derek, we change our mind and say that the guy we are looking for is Derek

123 Proof - Formal in sequent calculus: Let A be the formula ∃x (drinks(x) ⇒ ∀y drinks(y))

Reading the derivation bottom-up:
⊢ A                                                              ∃-right with t = Bob
⊢ drinks(Bob) ⇒ ∀y drinks(y), A                                  ⇒-right
drinks(Bob) ⊢ ∀y drinks(y), A                                    ∀-right (y fresh)
drinks(Bob) ⊢ drinks(y), A                                       ∃-right with t = y
drinks(Bob) ⊢ drinks(y), drinks(y) ⇒ ∀y′ drinks(y′), A           ⇒-right
drinks(Bob), drinks(y) ⊢ drinks(y), ∀y′ drinks(y′), A            axiom

124 How to translate a proof from Natural Deduction? By induction on the size of the proof

Translating a proof of the form: a proof π of Γ ⊢ A ∧ B, followed by ∧-elim giving Γ ⊢ A.

By induction hypothesis, we get a proof-tree π′ (in Sequent Calculus) of Γ ⊢ A ∧ B

π′ proves Γ ⊢ A ∧ B; from the axiom Γ, A, B ⊢ A, ∧-left gives Γ, A ∧ B ⊢ A; the cut rule then gives Γ ⊢ A

125 The cut rule First, we add the rule cut: from Γ ⊢ A, ∆ and Γ, A ⊢ ∆ infer Γ ⊢ ∆

Secondly, show that this rule is superfluous: cut-elimination

Why eliminate cuts? The cut formula A has to be guessed, which destroys all the advantages of Sequent Calculus

126 Two theorems Equivalence with Natural Deduction:

Γ ` A provable in ND iff provable in SC with cuts

Translations both ways

Cut-elimination:

Γ ` ∆ has a proof in SC with cuts iff it has a cut-free proof

Cut-elimination by step-by-step reduction of proof

127 A typical case

A cut on A ∧ B: the proof
π1 proves Γ, A, B ⊢ ∆, and ∧-left gives Γ, A ∧ B ⊢ ∆;   π2 proves Γ ⊢ A, ∆ and π3 proves Γ ⊢ B, ∆, and ∧-right gives Γ ⊢ A ∧ B, ∆;   cut gives Γ ⊢ ∆
reduces to:
π1 proves Γ, A, B ⊢ ∆ and π3 (weakened) proves Γ, A ⊢ B, ∆, so a cut on B gives Γ, A ⊢ ∆;   with π2 proving Γ ⊢ A, ∆, a cut on A gives Γ ⊢ ∆

The cut-elimination proof is by induction on the size of the formula being "cut" (here, a cut on A ∧ B is replaced by two "smaller" cuts, on A and on B, resp.)

128 III. Proof-search in the cut-free Sequent Calculus

129 Choices Never need to enumerate all possible propositions But. . . some choices still have to be made

1. Choice of sequent

The goals P, Q ⊢ P ? and P, Q ⊢ Q ?, obtained by ∧-right from P, Q ⊢ P ∧ Q

2. Choice of proposition to decompose, or application of axiom: e.g. in P ∧ Q ⊢ Q ∨ R

3. Choice of term: ∀-left: from Γ, ∀x A, (t/x)A ⊢ B infer Γ, ∀x A ⊢ B

130 Several kinds of choices Imagine you have to do “either A or B” General case: don’t know choose A, then if A fails choose B (backtrack) Irrelevant choice: don’t care no matter whether you choose A or B, the result will be the same (sequentialising independent tasks to be done)

1. Choice of sequent: don't care (e.g. P, Q ⊢ P and P, Q ⊢ Q after ∧-right on P, Q ⊢ P ∧ Q)

2. Choice of proposition to decompose, or application of axiom: don't care!? (e.g. P ∧ Q ⊢ Q ∨ R)

3. Choice of term (disregarding duplicates): don't know (∀-left: from Γ, (t/x)A ⊢ B infer Γ, ∀x A ⊢ B)

131 Finite and infinite choice Choice of Sequent: finite

Choice of Proposition: finite (if T finite)

Choice of term: infinite

Keeping the duplicate allows us to avoid backtracking (we can always re-attack the formula with a different term instead of backtracking)

Still, each time we have to choose 1 term among infinitely many:

  ──────────────────────────────────  axiom (?)
  p(f(f(c))) ⊢ p(f(t)), ∃x p(f(x))
  ──────────────────────────────────  ∃-right
  p(f(f(c))) ⊢ ∃x p(f(x))

Try c, f(c), f(f(c)), . . . for the term t. Which term to choose? How have we guessed it?

Can we delay the choice of the term? Let’s use a meta-variable X

132 Sequent Calculus with meta-variables: 1st attempt

∀-right:
  Γ ⊢ P, ∆
  ─────────────────  (x ∉ fv(Γ, ∆))
  Γ ⊢ (∀x P), ∆

∀-left:
  Γ, (∀x P), (X/x)P ⊢ ∆
  ──────────────────────  (X ∉ mv(Γ, ∆))
  Γ, (∀x P) ⊢ ∆

∃-right:
  Γ ⊢ (X/x)P, (∃x P), ∆
  ──────────────────────  (X ∉ mv(Γ, ∆))
  Γ ⊢ (∃x P), ∆

∃-left:
  Γ, P ⊢ ∆
  ─────────────────  (x ∉ fv(Γ, ∆))
  Γ, (∃x P) ⊢ ∆

Syntax must be enriched:

t ::= x | X | f(t1, . . . , tn) if f/n ∈ Σ mv(t) and mv(Γ) are the equivalent of fv(t) and fv(Γ) for meta-variables

133 Axiom and Substitution Example:

  r(986) ⊢ r(Y), (∃y r(y))
  ─────────────────────────  ∃-right
  r(986) ⊢ ∃y r(y)

We wish to say: it’s done by instantiating Y by 986

More generally, what to do for an axiom?

Γ, p(t1, . . . , tn) ` p(u1, . . . , un), ∆

It would be good to instantiate all meta-variables so that, for all i such that 1 ≤ i ≤ n, we have ti = ui.

Substitution: a partial function σ from meta-variables to terms (σ(X) = t), which can easily be extended to terms as follows:

  σ(f(t1, . . . , tn)) = f(σ(t1), . . . , σ(tn))
  σ(x) = x
  σ(X) = X if X ∉ domain(σ)
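As a small illustration (the type and function names below are mine, not the course's), terms with meta-variables and the application of a substitution can be written in OCaml as follows:

```ocaml
(* A minimal sketch of terms with meta-variables and of applying a
   substitution, following the three equations above. *)
type term =
  | Var of string               (* ordinary variables x, y, ... *)
  | Meta of string              (* meta-variables X, Y, ... *)
  | Fun of string * term list   (* f(t1, ..., tn) with f/n in the signature *)

(* a substitution is a partial map from meta-variables to terms *)
type subst = (string * term) list

let rec apply_subst (sigma : subst) (t : term) : term =
  match t with
  | Var x -> Var x                                          (* sigma(x) = x *)
  | Meta x ->
      (try List.assoc x sigma with Not_found -> Meta x)     (* X not in domain(sigma) *)
  | Fun (f, args) -> Fun (f, List.map (apply_subst sigma) args)
```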

134 Unifier and Propagation Formally, we look for a substitution σ such that for all i with 1 ≤ i ≤ n, we have

σ(ti) = σ(ui). σ is a unifier, i.e.

σ is a solution to the unification problem t1 = u1,. . . , tn = un

Example: let A := ∀x p(x, x) and B := ∃y (p(y, 0) ∧ p(y, S(0))). Searching for a proof of A ⊢ B:

A ⊢ B
A ⊢ p(Y, 0) ∧ p(Y, S(0)), B          (∃-right, introducing meta-variable Y)

then ∧-right opens two branches:

left branch:  A ⊢ p(Y, 0), B, then A, p(X, X) ⊢ p(Y, 0), B (∀-left, meta-variable X), which closes ok with σ(Y) = σ(X) = 0
right branch: A ⊢ p(Y, S(0)), B, then A, p(X′, X′) ⊢ p(Y, S(0)), B (∀-left, meta-variable X′), which closes ok with σ′(Y) = σ′(X′) = S(0)

σ and σ′ are incompatible: impossible to reconstruct a proof. As soon as one of them is chosen, we have to propagate this choice into the other branch.

135 Sequent Calculus with meta-variables: 2nd attempt

The state of all open branches (Γ1 ` ∆1) ... (Γn ` ∆n) is grouped in a datastructure :

Γ1 ` ∆1 o ... o Γn ` ∆n

∀-right:
  S o Γ ⊢ P, ∆
  ─────────────────────  (x ∉ fv(Γ, ∆))
  S o Γ ⊢ (∀x P), ∆

∀-left:
  S o Γ, (∀x P), (X/x)P ⊢ ∆
  ────────────────────────────  (X ∉ mv(Γ, ∆))
  S o Γ, (∀x P) ⊢ ∆

∃-right:
  S o Γ ⊢ (X/x)P, (∃x P), ∆
  ────────────────────────────  (X ∉ mv(Γ, ∆))
  S o Γ ⊢ (∃x P), ∆

∃-left:
  S o Γ, P ⊢ ∆
  ─────────────────────  (x ∉ fv(Γ, ∆))
  S o Γ, (∃x P) ⊢ ∆

Closing a branch by unification (the chosen unifier σ is propagated to the whole remaining state):

  σ(S)
  ─────────────────────────────────────────────  σ unifier of (ti = ui), 1 ≤ i ≤ n
  S o Γ, p(t1, . . . , tn) ⊢ p(u1, . . . , un), ∆

136 . . . and the logical rules are adapted

For each connective, the left-rule and right-rule:

⊥ (left):  from S, conclude S o Γ, ⊥ ⊢ ∆
⊤ (right): from S, conclude S o Γ ⊢ ⊤, ∆
¬ (left):  from S o Γ ⊢ A, ∆, conclude S o Γ, ¬A ⊢ ∆
¬ (right): from S o Γ, A ⊢ ∆, conclude S o Γ ⊢ ¬A, ∆
∨ (left):  from S o Γ, A ⊢ ∆ o Γ, B ⊢ ∆, conclude S o Γ, A ∨ B ⊢ ∆
∨ (right): from S o Γ ⊢ A, B, ∆, conclude S o Γ ⊢ A ∨ B, ∆
∧ (left):  from S o Γ, A, B ⊢ ∆, conclude S o Γ, A ∧ B ⊢ ∆
∧ (right): from S o Γ ⊢ A, ∆ o Γ ⊢ B, ∆, conclude S o Γ ⊢ A ∧ B, ∆
⇒ (left):  from S o Γ ⊢ A, ∆ o Γ, B ⊢ ∆, conclude S o Γ, A ⇒ B ⊢ ∆
⇒ (right): from S o Γ, A ⊢ B, ∆, conclude S o Γ ⊢ A ⇒ B, ∆

137 Unification Let’s come back to unifiers.

Questions:

Is there always a unifier for a problem t1 = t1′, . . . , tn = tn′ ?

How to find it in non-trivial cases?

... Robinson’s unification algorithm

138 Unification algorithm: an example Solutions to problem p(f(X)) = p(f(f(c))) are the same as those of problem

f(X) = f(f(c)) are the same as those of problem X = f(c) and this problem has one solution: the substitution X 7→ f(c).

139 Unification algorithm: general case Choose an equation in the system

ˆ f(t1, ..., tn) = f(u1, ..., un) −→ replace by t1 = u1, ..., tn = un

ˆ f(t1, ..., tn) = g(u1, ..., um) −→ fail

ˆ X = X −→ suppress

ˆ X = t (or t = X), X featured in t, t distinct from X, −→ fail

ˆ X = t (or t = X), X not featured in t, −→ substitute X by t in the rest of the system if solving it returns a substitution σ then return σ ∪ {X 7→ σ(t)}

The result is denoted mgu(t1 = t1′, . . . , tn = tn′) (most general unifier)
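Here is a hedged OCaml sketch of Robinson's algorithm on the same representation of terms as the earlier sketch (re-declared so that the snippet stands alone); it returns Some sigma, a most general unifier, or None on a clash or an occurs-check failure.

```ocaml
(* Terms with meta-variables, as before. *)
type term = Var of string | Meta of string | Fun of string * term list
type subst = (string * term) list

let rec apply_subst sigma = function
  | Var x -> Var x
  | Meta x -> (try List.assoc x sigma with Not_found -> Meta x)
  | Fun (f, args) -> Fun (f, List.map (apply_subst sigma) args)

(* occurs check: does meta-variable x appear in t? *)
let rec occurs x = function
  | Var _ -> false
  | Meta y -> x = y
  | Fun (_, args) -> List.exists (occurs x) args

let rec solve (eqs : (term * term) list) (sigma : subst) : subst option =
  match eqs with
  | [] -> Some sigma
  | (t, u) :: rest ->
      (match apply_subst sigma t, apply_subst sigma u with
       | Fun (f, ts), Fun (g, us) ->
           if f = g && List.length ts = List.length us
           then solve (List.combine ts us @ rest) sigma      (* decompose *)
           else None                                         (* clash: fail *)
       | Var x, Var y when x = y -> solve rest sigma
       | Meta x, Meta y when x = y -> solve rest sigma       (* X = X: suppress *)
       | Meta x, s | s, Meta x ->
           if occurs x s then None                           (* occurs check: fail *)
           else
             (* bind X to s and propagate the binding into sigma *)
             let sigma' =
               (x, s) :: List.map (fun (y, w) -> (y, apply_subst [ (x, s) ] w)) sigma
             in
             solve rest sigma'
       | _ -> None)                                          (* e.g. x = f(...) *)

let mgu t u = solve [ (t, u) ] []
```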

140 Is it finished?

Example Let P1 := ∀z p(z, z) and P2 := ∃x ∀y p(x, S(y))

Reading the derivation from its root upwards:

⊢ P1 ⇒ P2
P1 ⊢ P2                              (⇒-right)
P1 ⊢ (∀y p(X, S(y))), P2             (∃-right, introducing meta-variable X)
P1 ⊢ p(X, S(y)), P2                  (∀-right, y fresh)
P1, p(Z, Z) ⊢ p(X, S(y)), P2         (∀-left, introducing meta-variable Z)

and the last sequent closes with mgu(Z = X, Z = S(y)) : Z ↦ S(y), X ↦ S(y)

The term for x could not use y, which is freed at a later stage!

141 The trick

Example Let P1 := ∀z p(z, z) and P2 := ∃x ∀y p(x, S(y))

Reading the derivation from its root upwards:

⊢ P1 ⇒ P2
P1 ⊢ P2                                    (⇒-right)
P1 ⊢X (∀y p(X, S(y))), P2                  (∃-right: X is recorded in the annotation)
P1 ⊢X p(X, S(y(X))), P2                    (∀-right: y is replaced by y(X))
P1, p(Z, Z) ⊢X p(X, S(y(X))), P2           (∀-left, introducing meta-variable Z)

This is not ok, since there is no unifier for Z = X, Z = S(y(X)) (mgu(Z = X, Z = S(y(X))) = Fail)

Just a technical trick, or deeper remark?

Compare ⊢ ∃x1 . . . ∃xn ∀y P with ⊢ ∃x1 . . . ∃xn (Y(x1, . . . , xn)/y)P

Equiprovable! (cf next week)

142 Sequent Calculus with meta-variables This time it’s the correct version!

∀-right (x is replaced by a fresh symbol applied to the meta-variables recorded in Φ, written nx(Φ); in the example of the previous slide, y(X)):

  S o Γ ⊢Φ (nx(Φ)/x)P, ∆
  ─────────────────────────  (x ∉ fv(Γ, ∆))
  S o Γ ⊢Φ (∀x P), ∆

∀-left:
  S o Γ, (∀x P), (X/x)P ⊢Φ,X ∆
  ──────────────────────────────  (X ∉ mv(Γ, ∆))
  S o Γ, (∀x P) ⊢Φ ∆

∃-right:
  S o Γ ⊢Φ,X (X/x)P, (∃x P), ∆
  ──────────────────────────────  (X ∉ mv(Γ, ∆))
  S o Γ ⊢Φ (∃x P), ∆

∃-left:
  S o Γ, (nx(Φ)/x)P ⊢Φ ∆
  ─────────────────────────  (x ∉ fv(Γ, ∆))
  S o Γ, (∃x P) ⊢Φ ∆

Closing a branch by unification:

  σ(S)
  ──────────────────────────────────────────────  σ = mgu(t1 = u1, . . . , tn = un)
  S o Γ, p(t1, . . . , tn) ⊢Φ p(u1, . . . , un), ∆

143 Final remarks Using this Sequent Calculus for propositional logic, how does it compare to DPLL?

Not good (why?)

. . . but there are ways to adapt Sequent Calculus to be as efficient!

Next week: ˆ Using the above algorithmics to program: an introduction to logic programming ˆ Another proving technique that is not goal-directed: the resolution method

144 Questions?

145 Lecture V Introduction to logic programming, Resolution

146 Where we stand You now know how to perform proof-search in predicate logic

Today we

ˆ see how to program using logic

ˆ see alternative methodology for automated reasoning: resolution

Why treat them in the same lecture? they involve a common concept, that of clausal forms

147 Contents

I. Clausal forms II. Logic programming III. Resolution

148 I. Clausal forms

149 Definitions: from propositional to predicate logic You know literals and clauses from propositional logic (SAT-solving/DPLL) We now lift those concepts to predicate logic

ˆ literal (denoted l, l′, . . .) = atomic formula (positive literal) or negation of an atomic formula (negative literal)

ˆ clause (denoted C,C0,...) = disjunction of literals (implicitly or explicitly) universally quantified

∀x1 ... ∀xk l1 ∨ ... ∨ lp with {x1, . . . xk} = FV(l1 ∨ ... ∨ lp)

ˆ Let A be a (closed) formula of predicate logic. clausal form of A

= finite set of clauses C1,...,Cn such that A ` ⊥ iff C1,...,Cn ` ⊥

Many automated reasoning techniques are based on handling of clausal forms

Question: Does A always have a clausal form?

150 Producing clausal forms Searching for a formula B of the form
(∀x^1_1 . . . ∀x^1_{k1} (l^1_1 ∨ . . . ∨ l^1_{p1})) ∧ . . . ∧ (∀x^q_1 . . . ∀x^q_{kq} (l^q_1 ∨ . . . ∨ l^q_{pq}))
with {x^i_1, . . . , x^i_{ki}} = FV(l^i_1 ∨ . . . ∨ l^i_{pi}), and such that A ⊢ ⊥ iff B ⊢ ⊥

4 stages

151 Producing clausal forms 1. a prenex form of A: a formula logically equivalent to A, of the form

Q1x1 ...Qnxn C with all quantifiers Q1 ...Qn at the head of formula, and C quantifier-free

2. a skolemised prenex form of A: a closed formula B of the form

∀y1 ... ∀ym D where D is quantifier-free, such that A ` ⊥ iff B ` ⊥. (Every free variable or existentially quantified variable x has been substituted by new function symbols x(y1, . . . , yi))

152 Producing clausal forms 3. a skolemised prenex conjunctive normal form of A: same thing with D being a conjunction of disjunctions, i.e., we get a formula of the form
∀y1 . . . ∀ym ((l^1_1 ∨ . . . ∨ l^1_{p1}) ∧ . . . ∧ (l^q_1 ∨ . . . ∨ l^q_{pq}))

4. a clausal form of A: a conjunction of closed clauses, logically equivalent to a skolemised prenex conjunctive normal form of A, i.e., we get something of the form:
(∀x^1_1 . . . ∀x^1_{k1} (l^1_1 ∨ . . . ∨ l^1_{p1})) ∧ . . . ∧ (∀x^q_1 . . . ∀x^q_{kq} (l^q_1 ∨ . . . ∨ l^q_{pq}))
with {x^i_1, . . . , x^i_{ki}} = FV(l^i_1 ∨ . . . ∨ l^i_{pi})

153 Producing clausal forms Example: ¬(r(986) ⇒ ∃y r(y)) becomes r(986) ∧ ∀y ¬r(y), the latter is unsatisfiable iff the former is (i.e., iff r(986) ⇒ ∃y r(y) is valid)

To save space (but no logical information), we often retain from (∀x^1_1 . . . ∀x^1_{k1} C1) ∧ . . . ∧ (∀x^q_1 . . . ∀x^q_{kq} Cq) only the set of clauses C1, . . . , Cq (variables are all implicitly universally quantified), where disjunction is associative & commutative, i.e., clauses = (multi-)sets of literals
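As a small illustration, here is a hedged OCaml sketch of stage 3 only (putting a formula into conjunctive normal form), restricted to quantifier-free formulas; prenexing and skolemisation are not handled, and the type and function names are illustrative.

```ocaml
(* Quantifier-free formulas; negations are pushed to the atoms (NNF),
   then disjunction is distributed over conjunction (CNF). *)
type formula =
  | Atom of string
  | Not of formula
  | And of formula * formula
  | Or of formula * formula

(* negation normal form *)
let rec nnf = function
  | Atom _ as a -> a
  | Not (Atom _) as l -> l
  | Not (Not f) -> nnf f
  | Not (And (f, g)) -> Or (nnf (Not f), nnf (Not g))
  | Not (Or (f, g)) -> And (nnf (Not f), nnf (Not g))
  | And (f, g) -> And (nnf f, nnf g)
  | Or (f, g) -> Or (nnf f, nnf g)

(* clauses as lists of literals; a formula in NNF becomes a list of clauses *)
let rec cnf = function
  | And (f, g) -> cnf f @ cnf g
  | Or (f, g) ->
      (* distribute: every clause of f joined with every clause of g *)
      List.concat_map (fun c -> List.map (fun d -> c @ d) (cnf g)) (cnf f)
  | lit -> [ [ lit ] ]

let clausal_form f = cnf (nnf f)
```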

154 II. Logic programming

155 Remember Church’s theorem (1937) Undecidability: The set of all C such that PA ` C is undecidable Remember the proof (from the halting problem) We define a computable function F mapping each program f and each number n to a formula of arithmetic Af,n s.t.

PA ⊢ Af,n iff f terminates on input n. By contradiction: if there were a program (representing G below) to decide provability of formulae in PA, it would decide provability in PA of formulae of the form Af,n. By composing with F, we would have a program deciding whether “the program f terminates on input n” (the halting problem); contradiction.

[Diagram: F maps the pair (f, n) to the formula Af,n; G would decide provability, answering 0 or 1; the composition G ∘ F would then decide the halting problem.]

156 In details: the Representation Theorem 5.3

Given a program f , construct formula Bf with free variables among {x, y} s. t. the Representation Theorem holds: The following 3 assertions are equivalent

(1) f(p) = q
(2) PA ⊢ (q/y)(p/x)Bf
(3) ⟦(q/y)(p/x)Bf⟧N = 1
where p denotes the numeral S(. . . S(0) . . .) with p occurrences of S (and similarly for q)

All the hard work is in the construction of Bf from f (see INF412 or INF551 course notes)

157 What do we learn from this?

Given a computable function f , take the formula Bf

Given an integer input n in the domain of f , the only instance of Y such that PA ⊢ (Y/y)(n/x)Bf is Y = f(n)

How do you compute that instance? run proof-search as in Lecture 4!

If such an instance exists, proof-search will find it (otherwise it runs forever)

Proof-search can compute any computable thing

Proof-search is Turing-complete

Formulae of predicate logic form a programming language: “LOGIC PROGRAMMING”

158 A practical programming language Our “Logic Programming language” described above is currently ˆ given operational semantics by proof-search in PA ˆ using arbitrary formulae of predicate logic to represent programs (or at least those formulae used in the representation theorem)

This is inherited from our original goal to prove Church’s theorem (and then Goedel’s)

If instead the only goal is to have a Turing-complete language that is ˆ given operational semantics by some proof-search process ˆ using logical formulae to represent programs

. . . then we can build a much simpler “Logic Programming language” that is ˆ given operational semantics by proof-search in the empty theory ˆ using a very restricted syntax form of logical formulae to represent programs: Horn clauses

This language is called ProLog

159 Other Logic Programming languages Other choices are possible. Logic Programming languages form a family.

Their semantics is always tied ˆ to a precise methodology to perform proof-search ˆ to a class of formulae used for programs ˆ to a unification algorithm (for ProLog: Robinson’s)

Variant: Higher-Order Logic Programming( λ-ProLog) ˆ proof-search in higher-order logic (i.e., with simply-typed λ-terms, etc) ˆ bigger class of formulae: Hereditary Harrop formulae ˆ pattern unification, an algorithm more powerful than Robinson’s (but still decidable)

Variant: Constraint Logic Programming uses different proof-search algorithm, with specific treatment of some sub-formulae (constraints) in a particular theory (e.g., reals)

All Turing-complete (not more: cf., Church-Turing thesis), very high-level languages Today, we see basic ProLog

160 ProLog Prover implementing goal-directed proof-search, in sequent calculus, using meta-variables and Robinson’s algorithm, restricted to sequents of the form C1, . . . , Cn ⊢ l1 ∧ . . . ∧ lp where

C1, . . . , Cn are (universally quantified) Horn clauses with one positive literal, and l1, . . . , lp are positive literals with meta-variables.

Horn clause: a clause that has at most one positive literal.

Syntax: l:-l1,...,ln. stands for (l ∨ (¬l1) ∨ . . . ∨ (¬ln)) with n ≥ 1, i.e., (l1 ∧ . . . ∧ ln) ⇒ l; and l. stands for the case n = 0.

The conjunction of literals l1 ∧ . . . ∧ ln to prove, the query, is written ?:-l1,...,ln.

Property of this fragment of sequent calculus: the left-hand side of the sequent stays invariant all along proof-search. It is the programme, given in a .pl file.
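To make this concrete, here is a minimal OCaml sketch (not the actual ProLog machinery) of goal-directed proof-search on Horn clauses, simplified to the propositional case so that no unification is needed; the names clause, solve and the example program are illustrative only.

```ocaml
(* A program is a list of Horn clauses (head, body); a query is a list of
   goals to prove. The program itself never changes during the search. *)
type clause = { head : string; body : string list }

let rec solve program goals =
  match goals with
  | [] -> true                                   (* all goals proved *)
  | g :: rest ->
      (* try every clause whose head matches the current goal (backtracking) *)
      List.exists
        (fun c -> c.head = g && solve program (c.body @ rest))
        program

(* Example: p :- q, r.   q.   r.   with query ?- p. *)
let program =
  [ { head = "p"; body = [ "q"; "r" ] };
    { head = "q"; body = [] };
    { head = "r"; body = [] } ]

let () = assert (solve program [ "p" ])
```

As the slide notes, the left-hand side (the program) stays untouched throughout; only the list of goals evolves.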

161 III. Resolution

162 A(nother) semi-decision procedure for validity (=provability) Resolution method is a refutational method (i.e., tries to determine that negation of formula is unsat):

Let A be a formula of predicate logic. We want a (not-too-stupid) semi-decision procedure to answer the question: Is A provable/valid? i.e., Is ¬A unsatisfiable? i.e., Is B unsatisfiable ? . . . where B is. . . a clausal form of ¬A

163 Resolution method . . . is not goal-directed like sequent calculus

. . . is based instead on a saturation mechanism: from the clausal form of ¬A, produce new clauses such that if the former had a model (were satisfiable), then it would be a model of the latter

We enrich our database of “known” clauses we saturate it until we hit the empty clause ⊥

If the clausal form of ¬A were satisfiable, then ⊥ would be satisfiable too. Contradiction. So ¬A is unsat., and A is provable

Risk: saturation never terminates, the set of known clauses grows forever

164 Rules to produce new clauses There are only 2 of them!

Simplification:
  C ∨ l ∨ l′
  ────────────  σ = mgu(l = l′)
  σ(C ∨ l)

Resolution:
  C ∨ l        C′ ∨ ¬l′
  ──────────────────────  σ = mgu(l = l′)
  σ(C ∨ C′)

The above are controlled combinations of the (more easily understandable) rules:

  C            C ∨ l ∨ l        C ∨ l    C′ ∨ ¬l
  ─────        ──────────       ────────────────
  σ(C)         C ∨ l            C ∨ C′
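The following is only a hedged OCaml sketch of the saturation loop, restricted to ground (variable-free) clauses so that the mgu side condition degenerates into syntactic equality of atoms; in the real method σ is computed by unification as in the rules above.

```ocaml
(* Clauses are sets of literals; a literal is a sign and an atom name:
   (true, "p") is p, (false, "p") is ¬p. *)
module LitSet = Set.Make (struct
  type t = bool * string
  let compare = compare
end)

(* all resolvents of two clauses: remove a complementary pair, union the rest *)
let resolvents c1 c2 =
  LitSet.fold
    (fun (s, a) acc ->
      if LitSet.mem (not s, a) c2 then
        LitSet.union (LitSet.remove (s, a) c1) (LitSet.remove (not s, a) c2) :: acc
      else acc)
    c1 []

(* saturate until the empty clause appears (unsat) or no new clause is produced *)
let rec saturate clauses =
  if List.exists LitSet.is_empty clauses then true
  else
    let fresh =
      List.concat_map (fun c1 -> List.concat_map (resolvents c1) clauses) clauses
      |> List.filter (fun c -> not (List.exists (LitSet.equal c) clauses))
    in
    match fresh with [] -> false | _ -> saturate (fresh @ clauses)
```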

165 Example Want to prove r(986) ⇒ ∃y r(y)

Take its clausal form: Clauses r(986) and ¬r(y)

Take σ = mgu(986 = y), i.e., the substitution that maps y to 986 applying resolution rule to r(986) and ¬r(y) gives the empty clause ⊥

(resolution rule is to be understood bearing in mind that C ∨ ⊥ ≡ C)

. . . so our set of clauses r(986) and ¬r(y) was unsatisfiable

. . . so the formula ¬(r(986) ⇒ ∃y r(y)) was unsatisfiable

. . . so the formula r(986) ⇒ ∃y r(y) was valid

166 More generally. . . Soundness and Completeness

A set of clauses C1,...,Cn is unsatisfiable if and only if ⊥ can be obtained by saturating the set with the simplification and resolution rules

Proof Soundness is easy: for each of the two rules, just check that any model of the premisses is also a model of the conclusion (so a satisfiable set of clauses never produces ⊥). Completeness: hard

167 Last example The drinker’s theorem

Formula ∃x (p(x) ⇒ ∀y p(y)), to be proved, becomes: the set of clauses p(x) and ¬p(y(z)), to show unsatisfiable

Apply resolution rule:

with σ = mgu(x = y(z)), i.e., the substitution mapping x to y(z)

obtain the empty clause. QED

168 Conclusion Resolution used a lot in automated theorem provers for predicate logic (it’s currently the dominating approach for pure predicate logic): Vampire, E-prover, SPASS, Prover9, Otter, Carine, Gandalf, SNARK, . . .

They compete every year in the CASC competition: https://en.wikipedia.org/wiki/CADE_ATP_System_Competition

The secret is in how to apply the rules (strategies). We haven’t seen them, but many techniques exist: selection functions, orderings, machine-learned strategies, . . .

In presence of theories, user interaction, or more expressive logics, dominance of resolution is not as clear

Goal-directed proof-search offers more natural human interaction (cf., Coq)

Very active research area: mixing different approaches and trying to get the best of all worlds

169 Questions?

170 Lecture VI The theory of sets

171 Where we stand today We know computer-aided reasoning techniques for ˆ propositional logic ˆ some specific theories, with or without quantifiers ˆ predicate logic in general
Now: we want a systematic way of representing any problem (of mathematical nature) in a universal framework (language+logic+theory).
Why? Because then, the mechanics of solving problems in that framework =⇒ the mechanics of solving any problem.
Arithmetic? Too weak! . . . difficult to account for different notions of infinity.
Today: Proposal for such a framework = Set theory

172 Contents

I. The mathematics chat room II. Naive set theory & Russell’s paradox III. Zermelo-Fraenkel set theory IV. Doing without the bugged axiom schema V. Arithmetic in set theory VI. Church, Turing and Goedel crash the party again!

173 I. The mathematics chat room

174 The mathematics chat room 74-78: Georg enters chat. There are different notions of infinity! Here’s the notion of bijection!

78: Leopold says That’s all crap

78: Georg says I can’t prove the property that there is no cardinal between Q and R. Strange. Let’s call it the Continuum Hypothesis

79: Gottlob enters chat. Hi guys! I’ve formalised logic with quantified variables. anyone interested?

82: Ferdinand says π is transcendental

86: Leopold says lol! Irrational numbers do not exist, let alone transcendental numbers

175 The mathematics chat room 89: Giuseppe enters chat. Here’s an axiomatisation of arithmetic. Georg, you use a property that you cannot justify; let’s call that the axiom of choice.

91: Georg says You like it, yeah? Here are some more: axiom of infinity and axiom of the power set

91: Leopold says This is nonsense

91: Leopold left the chat.

93: Gottlob says I think Georg’s ideas are cool, I’ll extend my logic with them!

176 The mathematics chat room 95: Georg says Here are my latest thoughts about sets and ordinals.

95: David enters chat. That’s quite a lot of ordinals. Who’s got the biggest?

96: Georg to David (private): Bugger, the set of all ordinals doesn’t seem to be a set; let’s keep this private

97: Cesare enters chat. Hi Georg, the set of all ordinals doesn’t seem to be a set!

97: Georg says Go away, you misread my paper! Damn it, I’ve been busted.

99: Georg says Oh no! the cardinal of the set of all sets also seems problematic. . . And I still can’t prove the continuum hypothesis. Ok, now I’ll prove that Francis Bacon wrote Shakespeare’s plays.

177 The mathematics chat room 99: David says Now now, children, let’s clear up this mess. I have 23 problems to submit to you all. Let’s use Gottlob’s logical methodology more systematically.

02: Bertrand enters chat. Hold on, chaps. I dare say that Gottlob’s logical foundations are bugged too; and there is no such thing as the set of all sets.

03: Gottlob says You’ve undermined the whole of mathematics! I’m depressed.

03: Bertrand says Don’t worry, me and me pal Alfred will fix it with a Theory of Types!

08: Ernst enters chat. Don’t bother, here’s an axiomatisation of set theory without the paradox.

178 II. Naive set theory & Russell’s paradox

179 Naïve set theory Syntax: Empty term signature; Predicate signature: ∈ and =, arity 2

Notation: Let’s fix a particular sequence of variables y1, . . . , yi, . . . If A is a formula and t1, . . . , tp are p terms, then A[t1, . . . , tp] denotes (t1/y1, . . . , tp/yp)A

In Cantor’s work and Frege’s work, we have, for every formula A such that

FV(A) ⊆ {y1, x1, . . . , xn}, an axiom

∀x1 ... ∀xn∃c∀y (y ∈ c ⇔ A[y])

Informally: existence of the set {y | A[y]}, i.e., the set of all elements y satisfying A[y]

First instantiation: in particular we have ∃r∀y (y ∈ r ⇔ >) i.e., ∃r∀y (y ∈ r) (there is a set of all sets)

180 Russell’s paradox (1902) Second instantiation: in particular we have axiom R: ∃r∀y (y ∈ r ⇔ ¬y ∈ y)

What about r ∈ r? or is it ¬r ∈ r?

Clearly, R ` ∃r(r ∈ r ⇔ ¬r ∈ r)

Is this problematic?

Yes, since:

Lemma: If T ` (A ⇔ ¬A) then T ` ⊥

Proof: Clearly, (A ⇔ ¬A),A ` ¬A and since we also have (A ⇔ ¬A),A ` A, we get (A ⇔ ¬A),A ` ⊥ Therefore A ⇔ ¬A ` ¬A And finally A ⇔ ¬A ` ⊥ So indeed R ` ⊥

181 III. Zermelo-Fraenkel set theory

182 Fixing set theory with Separation (Zermelo 1908) Russell’s paradox can be quickly fixed:

for every formula A such that FV(A) ⊆ {y1, x1, . . . , xn},

have an axiom SA:

∀x1 ... ∀xn∀x∃c∀y (y ∈ c ⇔ (y ∈ x∧A[y])) Informally: existence of the set {y ∈ x | A[y]}, i.e., the set of all elements y in x satisfying A[y] Separation axiom(s) (x is split in two)

Corollary: (SA)A ` ¬(∃a∀y(y ∈ a)) There is no set of all sets Proof : Indeed S¬y∈y ` ∀a ∃r ∀y (y ∈ r ⇔ (y ∈ a ∧ ¬y ∈ y)) so S¬y∈y, (∃a∀y(y ∈ a)) ` ∃r ∀y (y ∈ r ⇔ (> ∧ ¬y ∈ y)) so S¬y∈y, (∃a∀y(y ∈ a)) ` R so S¬y∈y, (∃a∀y(y ∈ a)) ` ⊥.

183 Improving Separation with Replacement (Fraenkel 1922) Note: the separation schema is quite restrictive There are still instances of the bugged axiom schema that are desired but not allowed by separation.

Generalise separation schema into replacement schema:

for every formula A such that FV(A) ⊆ {y1, y2, x1, . . . , xn},

have an axiom RA:

∀x1 ... ∀xn [functional(A) ⇒ ∀x∃c∀y (y ∈ c ⇔ ∃z (z ∈ x ∧ A[z, y]))] where functional(A) abbreviates ∀z∀y∀y0 ((A[z, y] ∧ A[z, y0]) ⇒ y = y0)

Informally : “c = Im(A|x)” Lemma: Generalises separation

Proof : separation axiom SA can be proved from replacement axiom Ry1=y2∧A:

∀x1 ... ∀xn [functional(y1 = y2 ∧ A) ⇒ ∀x∃c∀y (y ∈ c ⇔ ∃z (z ∈ x ∧ z = y ∧ A[z]))]

(functional(y1 = y2 ∧ A) easy to prove)

184 Building bigger sets (motivation) Notice that neither the separation schema nor the replacement schema construct sets “bigger” than those already constructed.
Cantor 1891: a simple proof that there is a set strictly “bigger” than that of natural numbers.
Take the set ω of natural numbers, and its power set P(ω). Assume f is a surjective function from ω to P(ω) (assumption H).
Let a := {y ∈ ω | ¬y ∈ f(y)}, and let x ∈ ω be such that f(x) = a (f is surjective).
Question: x ∈ a? Answer: x ∈ a ⇔ (¬x ∈ f(x)) ⇔ (¬x ∈ a)
Conclusion: . . . , H ⊢ (A ⇔ ¬A) for some A, so . . . , H ⊢ ⊥ (by the previous Lemma), so . . . ⊢ ¬H
What did Cantor need in . . . for this? Existence of the set ω, existence of the power set of a set, Separation (to define a)

185 Building bigger sets (axioms) Power set axiom:

∀x∃z∀w (w ∈ z ⇔ (∀v (v ∈ w ⇒ v ∈ x)))

Informally: “z = P(x)”

Also useful Union axiom: ∀x∃z∀w (w ∈ z ⇔ (∃v (w ∈ v ∧ v ∈ x)))

Informally: “z = ⋃x”

186 Remaining axioms Infinity axiom:

∃I (∀x (Empty[x] ⇒ (x ∈ I)) ∧ ∀x∀y ((x ∈ I ∧ Succ[x, y]) ⇒ (y ∈ I))) where Empty[x] is the formula ∀y (¬(y ∈ x)) “x = ∅” and Succ[x, y] is the formula ∀z (z ∈ y ⇔ (z ∈ x ∨ z = x)) “y = x ∪ {x}”

Also useful, the Extensionality axiom: ∀x∀y ((∀z (z ∈ x ⇔ z ∈ y)) ⇒ x = y)

Removing the infinity axiom makes sense: you get the theory of finite sets

Removing the extensionality axiom makes sense: you get the intentional theory of sets

187 Zermelo-Fraenkel set theory (ZF ∗) ˆ Equality axioms ˆ Replacement axiom schema ˆ Power set axiom ˆ Union axiom ˆ Infinity ˆ Extensionality

188 IV. Doing without the bugged axiom schema

189 Notations and trivial constructions: intersections ˆ Let a ⊆ b be the formula ∀w (w ∈ a ⇒ w ∈ b) For every formula A, let ∀x ∈ a, A abbreviate ∀x (x ∈ a ⇒ A) and ∃x ∈ a, A abbreviate ∃x (x ∈ a ∧ A)

ˆ Let“ w ∈ a ∩ b” be the formula w ∈ a ∧ w ∈ b Let“ z = a ∩ b” be the formula ∀w (w ∈ z ⇔ “w ∈ a ∩ b”)

We have ZF ∗ ` ∀a∀b∃z “z = a ∩ b” (separation of a: z = {w ∈ a | w ∈ a ∩ b})

ˆ Let“ w ∈ a\b” be the formula w ∈ a ∧ ¬w ∈ b Let“ z = a\b” be the formula ∀w (w ∈ z ⇔ “w ∈ a\b”)

We have ZF ∗ ` ∀a∀b∃z “z = a\b” (separation of a again)

ˆ Let “w ∈ ⋂a” be the formula ∀y (y ∈ a ⇒ w ∈ y). Let “z = ⋂a” be the formula ∀w (w ∈ z ⇔ “w ∈ ⋂a”)

We have ZF ∗ ⊢ ∀a ((∃y(y ∈ a)) ⇒ ∃z “z = ⋂a”) (separation of y)

190 Basic constructions: singleton, doubleton ˆ Let “w ∈ {x}” be the formula w = x and let “z = {x}” be the formula ∀w (w ∈ z ⇔ “w ∈ {x}”)

ZF ∗ ` ∀x∃z “z = {x}” (separation of power set: z = {w ∈ P(x) | w ∈ {x}})

ˆ But what about“ {x1, x2}” ? What set can we separate to get: ∗ ZF ` ∀x1∀x2∃z∀w (w ∈ z ⇔ (w = x1 ∨ w = x2)) ˆ When separation fails, try replacement Let one[x] be the formula ∀y (y ∈ x ⇔ Empty[y])

Let two[x] be the formula ∀y (y ∈ x ⇔ (Empty[y] ∨ one[y]))

ZF ∗ ⊢ ∃x Empty[x] (pure logic+separation), call it 0
ZF ∗ ⊢ ∃x one[x] (power set of 0), call it 1
ZF ∗ ⊢ ∃x two[x] (power set of 1+separation), call it 2
ZF ∗ ⊢ ∀x ¬(Empty[x] ∧ one[x]) (pure logic)
ZF ∗ ⊢ ∀x1∀x2∃z∀w (w ∈ z ⇔ (w = x1 ∨ w = x2)) by replacement from 2, with A[z, y] := ((Empty[z] ∧ y = x1) ∨ (one[z] ∧ y = x2)) (and ZF ∗ ⊢ functional(A))

191 Basic constructions: binary unions, pairs, cartesian product ˆ Let “a ∪ b” be the set ⋃{a, b}. In other words, let “w ∈ a ∪ b” be the formula w ∈ a ∨ w ∈ b

let “z = a ∪ b” be the formula ∀w (w ∈ z ⇔ “w ∈ a ∪ b”)

ZF ∗ ⊢ ∀a∀b∃z “z = a ∪ b” (build {a, b} then use the union axiom)

ˆ Let “(x1, x2)” be the set {{x1}, {x1, x2}}

In other words, let“ z = (x1, x2)” be the formula

∀y (y ∈ z ⇔ ((∀w (w ∈ y ⇔ (w = x1))) ∨ (∀w (w ∈ y ⇔ (w = x1 ∨ w = x2)))))

∗ ZF ` ∀x1 ∀x2 ∃z “z = (x1, x2)” (see previous slide)

ˆ Let“ w ∈ a × b” be the formula ∃y1 ∃y2 (y1 ∈ a ∧ y2 ∈ b ∧ “w = (y1, y2)”)

Let“ z = a × b” be the formula ∀w (w ∈ z ⇔ “w ∈ a × b”)

ZF ∗ ` ∀a ∀b ∃z “z = a × b” (separation of P(P(a ∪ b)))

192 Basic constructions: functions A function f is represented in Set Theory as its graph: the set of pairs (x, f(x))

ˆ Let“ y = f(x)” be the formula

∃z ∈ f, “z = (x, y)” ˆ Let“ f : a −→ b” be the formula

∃z (f ⊆ z ∧ “z = a × b” ∧ ∀x∀y1∀y2 (“y1 = f(x)” ∧ “y2 = f(x)” ⇒ y1 = y2)) Let“ c = a −→ b” be the formula ∀f (f ∈ c ⇔ “f : a −→ b”) ZF ∗ ` ∀a ∀b ∃c “c = a −→ b” (separation of P(a × b)) ˆ Let“ f : a −→ b is injective” be the formula

“f : a −→ b” ∧ ∀x1∀x2∀y (“y = f(x1)” ∧ “y = f(x2)” ⇒ x1 = x2) ˆ Let“ f : a −→ b is surjective” be the formula

“f : a −→ b” ∧ ∀y (y ∈ b ⇒ ∃x “y = f(x)”) ˆ Let“ f : a −→ b is bijective” be the formula

“f : a −→ b is injective” ∧ “f : a −→ b is surjective”

193 Conclusion: Dropping bugged axiom is tedious and difficult Every construct justified by separation (or replacement) of previously constructed sets

Construction of bigger and bigger sets...... is done by explicit use of power set and union axioms

In some sense, ZF ∗ implements Kronecker’s idea: the sets that one wishes to talk about must be constructed in finitely many steps from basic sets . . . as opposed to the “virtual” sets that the bugged axiom allowed (e.g., the set of all sets) whose size is un-constrained by the size of previously constructed sets

Notice: Infinity axiom is not required in any of the above theorems.

Neither is Extensionality axiom.

Extensionality is required for uniqueness of the above constructions (exercise!). Without extensionality, you can have 2 different sets containing the same elements (echo: you can have 2 different programs computing the same input-output relation)

194 Important remarks Hope: set theory is consistent

Claim: All the mathematics we know can be done in set theory (possibly with the help of 0 to 3 extra axioms -see next week) Let’s start with arithmetic!

195 V. Arithmetic in set theory

196 Peano by Von Neumann Reminder: Empty[x] is the formula ∀y (¬(y ∈ x)) “x = ∅” Succ[x, y] is the formula ∀z (z ∈ y ⇔ (z ∈ x ∨ z = x)) “y = x ∪ {x}”

ZF ∗ ` ∃x Empty[x] (already done, without using extensionality)

ZF ∗ ` ∀x∃y Succ[x, y] (singleton+binary union, no extensionality)

ZF ∗ ` ∀x∀y ¬(Succ[x, y] ∧ Empty[y]) (kind of already done)

ZF ∗ ` ∀x∀y∀y0 ((Succ[x, y] ∧ Succ[x, y0]) ⇒ y = y0) (using extensionality)

In set theory, natural numbers are encoded as sets: 0 := ∅, 1 := {0}, 2 := {0, 1}, 3 := {0, 1, 2},...
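As a toy illustration (not part of the course notes), the von Neumann encoding can be mimicked in OCaml by representing a hereditarily finite set as the list of its elements; extensionality and duplicate elements are ignored here, the only point is to exhibit 0, 1, 2, . . .

```ocaml
(* A hereditarily finite "set" is just the list of its elements. *)
type vset = Set of vset list

let zero = Set []                        (* 0 := ∅ *)
let succ (Set xs as x) = Set (x :: xs)   (* y := x ∪ {x} *)

(* a von Neumann numeral n = {0, ..., n-1} has exactly n elements *)
let to_int (Set xs) = List.length xs

let () = assert (to_int (succ (succ (succ zero))) = 3)
```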

197 The set of natural numbers itself Remark: all natural numbers belong to the set I of infinity axiom

The set of natural numbers ω is the intersection of all subsets of I containing 0 and closed under successor:

Let H[a] be the formula (∀x (Empty[x] ⇒ x ∈ a)) ∧ (∀x∀y ((x ∈ a ∧ Succ[x, y]) ⇒ y ∈ a))

Let Nat[n] be the formula ∀x (H[x] ⇒ n ∈ x) Induction principle!

We have ZF ∗ ` ∃ω∀n (n ∈ ω ⇔ Nat[n]) (axiom of infinity+separation)

Remarks: Natural numbers defined such that induction principle works ZF ∗ ` H[ω]

Notation: ∀ωx, A (resp. ∃ωx, A) stands for ∀x ∈ ω, A (resp. ∃x ∈ ω, A)

198 Definition by induction It can be proved in ZF ∗ that if f : B × ω × A −→ A and h : B −→ A, then there is a unique function g : B × ω −→ A such that ˆ ∀b ∈ B, g(b, 0) = h(b) and ˆ ∀b ∈ B, ∀n ∈ ω, g(b, Sn) = f(b, n, g(b, n)), where Sn stands for “the set y ∈ ω such that Succ[n, y]”

Writing the above as a formula A such that ZF ∗ ` A can be done, but very long!

Those in INF412 should be reminded of the PC on recursive functions Everyone can have a look at Definition 3.1 of INF551 course notes

With this we can define two formulae Plus and Mult such that:
ZF ∗ ⊢ ∀ωx, ∀ωy, Empty[x] ⇒ Plus[x, y, y]
ZF ∗ ⊢ ∀ωx, ∀ωx′, ∀ωy, ∀ωz, ∀ωz′, (Succ[x, x′] ∧ Succ[z, z′] ∧ Plus[x, y, z]) ⇒ Plus[x′, y, z′]
ZF ∗ ⊢ ∀ωx, ∀ωy, Empty[x] ⇒ Mult[x, y, x]
ZF ∗ ⊢ ∀ωx, ∀ωx′, ∀ωy, ∀ωz, ∀ωz′, (Succ[x, x′] ∧ Plus[z, y, z′] ∧ Mult[x, y, z]) ⇒ Mult[x′, y, z′]
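Purely as a sanity check, the recursion equations for Plus and Mult can be transcribed as ordinary OCaml functions on machine integers (the set-theoretic development states the same equations as formulas over ω; the function names here are illustrative):

```ocaml
(* plus x y computes x + y, mult x y computes x * y, following the
   recursion equations: 0 + y = y, (S x) + y = S (x + y),
   0 * y = 0, (S x) * y = (x * y) + y. *)
let rec plus x y = if x = 0 then y else 1 + plus (x - 1) y
let rec mult x y = if x = 0 then 0 else plus (mult (x - 1) y) y

let () = assert (plus 2 3 = 5 && mult 2 3 = 6)
```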

199 VI. Church, Turing and Goedel crash the party again!

200 Back to Hilbert’s programme. . . Hilbert’s programme was about the existence of a logic X, based on arithmetic, such that. . .

In Lecture 1, I claimed Hilbert’s programme failed because of Church’s, Turing’s, and Goedel’s theorems in Peano’s arithmetic PA. Maybe PA was not the “right” X to accomplish Hilbert’s programme!

What about set theory?

If it fails too, what about other theories we have not thought about yet?

201 Poor languages Reminder: You have seen proofs for Church’s and Goedel’s theorems in PA

. . . using a language with symbols 0, =, S,. . .

What if no such symbols? (as in set theory)

More generally, let us consider a language L0 in which we can construct formulae ˆ N , “to be a natural number” ˆ Null, “to be zero” ˆ Succ, “to be the successor of. . . ”, ˆ P lus, “to be the addition of . . . and . . . ”, ˆ Mult, “to be the multiplication of . . . and . . . ” ˆ Eq, “to be two equal natural numbers”

In set theory, this is “simple”: N[n] is just n ∈ ω and Eq[n, m] is just N[n] ∧ N[m] ∧ n = m

202 Poor languages Now, did we really use all the axioms of PA to prove Church’s and Goedel’s theorems? (there are infinitely many of them, because of the induction schema)

Def: Let T0 be the theory that expresses with N , Null, Succ, P lus, Mult, Eq the axioms of PA (+, ×, =) without induction (see INF551 course notes, Def 5.2). Example:

∀x∀y∀x0∀y0 ((N[x] ∧ N[y] ∧ Succ[x, x0] ∧ Succ[y, y0] ∧ Eq[x0, y0]) ⇒ Eq[x, y])

Remark and Idea: the axioms of T0 are finitely many. . .

. . . but T0 is sufficient for the constructs used in the representation theorem (slide 29 of Lecture 3)

Def: N-model: Any structure for language L0 where N interprets (elements satisfying) N , 0 interprets Null, n 7→ n + 1 interprets Succ, + interprets P lus, × interprets Mult, = interprets Eq

203 Rich theories in poor languages

General Church’s theorem: Let T be a theory in L0, that has an N-model and where T0 can be proved. Provability in T (i.e., T `) is undecidable

Proof: we adapt the representation of programs as formulae by replacing
ˆ (S(t)/x)A by ∃x (Succ[x, t] ∧ A)
ˆ (0/x)A by ∃x (Null[x] ∧ A)
ˆ t = u by Eq[t, u]
ˆ . . .
The representation theorem is adapted, with T and its N-model instead of PA and N. To prove it we use the fact that T proves T0.

Application: Provability in ZF ∗ is undecidable.

What about inconsistent extensions? e.g. what if we take T0, ⊥ ?

204 Side-question: Poor theories in poor languages Specific Church’s theorem:

Provability in the empty theory (in language L0) is undecidable

Proof:

Let H be the conjunction of all axioms of T0 (there are finitely many!)

Obvious (e.g., from INF412): T0 ` A iff ` H ⇒ A

Examples: ˆ Language with 1 binary predicate symbol undecidable ˆ Language with 1 predicate symbol of arity > 1 undecidable ˆ Language with 1 unary predicate symbol and 1 term symbol of arity > 1 undecidable

205 Some decidable theories Provability in predicate logic (without axioms) is undecidable

But if symbols are governed by specific axioms, decidability can be recovered

Example: Presburger’s arithmetic (arithmetic with + but not ×)

Example: Euclid’s geometry

Basically, decidability has no monotonicity properties: if T1 ⊆ T2 ⊆ T3 and provability in T2 is decidable/undecidable, nothing can be said of decidability of provability in T1 or T3.

206 Godel’s¨ theorem The implication Church ⇒ Goedel still works!

General Goedel’s theorem:

Let T be a theory in L0, that has an N-model, and such that T is a decidable subset of the set of formulae.

There is a closed formula A such that neither T ` A nor T ` ¬A

(Call such a formula a Goedel formula)

Proof: Proof-checking is still decidable (since belonging to T is decidable). Proof-search is still semi-decidable. Run two proof-search algorithms in parallel, one on A, the other on ¬A. If Goedel’s theorem were false, the parallel execution of the two programs would systematically terminate, providing an algorithm to decide provability in T , contradicting Church.

207 Application

Take PA and your favourite N-model. Apply Goedel’s theorem and get a Goedel formula A1.

Take PA,A1 and extend your N-model. Apply General Goedel’s theorem and get a Goedel formula A2.

Take PA,A1,A2 and extend your N-model. Apply General Goedel’s theorem and get a Goedel formula A3. ... Goedel’s theorem will always provide new Goedel formulae.

To be compared to the completion theorem (see in Lecture 1), that states: A consistent theory can always be completed into another consistent theory where every closed formula A is such that either A is provable or ¬A is provable.

Where is the catch???

208 Questions?

209 Lecture VII The theory of functions

210 Where we stand today Remember: Last week we proposed Set theory as a universal framework (language+logic+theory) for representing any problem (of mathematical nature).
Aim: mechanics of solving problems in the universal framework =⇒ mechanics of solving any problem.
Good points of Set theory: ˆ Hard to do without sets ˆ Rather minimalistic (although it has the axiom schema of replacement) ˆ Nobody’s found a contradiction yet (always a plus) ˆ Can express arithmetic and (we believe) all the mathematics we’ve written so far
Bad points of Set theory: ˆ Provability is undecidable (but semi-decidable) ˆ Doing anything in set theory is long and tedious
In any case, Church / Goedel’s theorems =⇒ Ability to express arithmetic is incompatible with decidability of provability / completeness

211 Formalising problems in theory or in practice Why do we only “believe” that all the mathematics we’ve written so far can be done in Set theory?

Have you already seen books/articles/course notes expressed entirely in Set Theory?

Impact of ZF on the world of mathematics:

Not huge revolution on nature of mathematical activity

At best: thought process at the back of each mathematician’s mind raising the question at every reasoning step: “Would I be able to justify this step within ZF?”

Conviction that formalising maths in ZF can be done “in theory”

Mathematicians did not formalise their maths in ZF “in practice”

. . . at least until computers were invented

212 Using computers Proofs in ZF are too long, unreadable, cannot be processed by humans For a start: lacking definition mechanisms and convenient notations

Could those problems be overcome by using computers?

Can do parsing and pretty-printing =⇒ use for convenient interfacing between human and Set theory?

Set theory = low-level machine language, computer compiling human proofs into it?

Problem: Humans want to do big-step reasoning. Challenges: #1 Need a mechanical way to convert big-step reasoning into small-step reasoning (e.g., using predicate logic & set theory axioms) #2 Need a rigorous language to even express big-step reasoning

213 Challenge #1 is difficult to achieve with Set theory Big-step reasoning to small-step reasoning requires filling the “proof gaps” automatically Can an algorithm be smart enough to do so, using the right axioms at the right time?

Axioms are not very “computer-friendly”: proving 45 ∗ 67 = 3015 requires a huge proof in PA . . . and an even bigger proof in ZF, with complex use of the axioms. As 45 ∗ 67 can be trivially computed, using a computer to search for a proof of 45 ∗ 67 = 3015 in PA or ZF . . . is using a computer the wrong way

In summary, computers have more processing power, but no intuition

=⇒ so we need an alternative to ZF that is more computer-friendly

214 Computer-friendly stuff Sets are not very computer-friendly: one of the mathematical objects least convenient to implement

Integers, lists, trees, on the other hand, are computer-friendly (while not primary objects in Set theory)

So are Functions! (again, while not primary objects in Set theory)

. . . since they correspond to computer programs! (well, computable ones)

. . . but computer-friendly as long as one doesn’t ask whether 2 functions are extensionally equal (whether for all possible inputs, they produce the same output)

Remember: whether 2 Turing machines produce the same output, given the same input, is an undecidable problem!

Today: theory of sets =⇒ theory of functions (and we will drop extensionality)

215 Contents

I. Back to Russell’s paradox II. The λ-calculus III. Ensuring termination of λ-calculus with a typing system IV. Higher-Order Logic: using the typed λ-calculus to fix Frege’s logic

216 I. Back to Russell’s paradox

217 Frege’s system In fact, Frege’s Begriffschrift (1879) considers functions as more primitive than sets They are part of his logic (rather than the purpose of an axiomatic theory)

Frege distinguishes objects (which include natural numbers) and functions

ˆ Variables x, y, z, . . . represent objects, while variables f, g, h, . . . represent functions ˆ (The representation of) a function can be applied to (the representation of) an object, so f(x) represents an object ˆ Both kinds of variables can be quantified over: ∀x . . . and ∀f . . . ˆ predicates1 are those functions mapping objects to either 0 or 1 ˆ formulae are the various representations of 0 and 1 ˆ Given a formula A[x],

Frege allows the representation, in his syntax, of the function c ↦ ⟦A[x]⟧x↦c. In modern notation (Church’s), such a function is denoted λx.A[x] ˆ His system allows to prove the simplification property: ∀y (((λx.A[x])(y)) ⇔ A[y])

1 Frege called them “concepts”

218 Frege’s system Notice that we are already outside the syntax of (what you know to be) predicate logic, since ˆ there are 2 kinds of variables and terms (instead of 1): for objects and for functions ˆ formulae are particular object terms & λx.A[x] builds a function term from a formula ˆ moreover λx.A[x] and λy.A[y] are identified as the same term =⇒ notion of binder in the syntax of terms

But so far so good, no one has found any contradiction in this system

But in 1893, Frege adds a tool for creating objects from functions (to speak about equality of functions)

In the case of a predicate F , a term (written here ext(F )) denotes a particular object called the extension of the concept F , satisfying “Basic Law V” (Axiom 5 of Frege’s system): ∀F ∀G ((ext(F ) = ext(G)) ⇔ ∀x (F x ⇔ Gx))

219 Russell’s paradox (1902) Russell expresses his paradox both in terms of set theory and in Frege’s logical system (with functions). Let P be the following predicate term: λx.∃F (x = ext(F ) ∧ ¬F (x)). Clearly, P (ext(P )) means (λx.∃F (x = ext(F ) ∧ ¬F (x)))(ext(P )) and by the simplification property this is equivalent to ∃F (ext(P ) = ext(F ) ∧ ¬F (ext(P ))). Then by Basic Law V we know that ext(P ) = ext(F ) is equivalent to ∀x (P x ⇔ F x), so the above is equivalent to ¬P (ext(P )). In Frege’s system, there is a formula P (ext(P )) equivalent to its negation

=⇒ in Frege’s system there is a proof of ⊥. No set involved, but of course in substance ext(P ) is “the set of all objects satisfying P ”

220 In their own words. . . Original letter to Frege: “There is just one point where I have encountered a difficulty. You state that a function, too, can act as the indeterminate element. This I formerly believed, but now this view seems doubtful to me because of the following contradiction. Let w be the predicate: to be a predicate that cannot be predicated of itself. Can w be predicated of itself? From each answer its opposite follows. Therefore we must conclude that w is not a predicate.”

Frege: “A scientist can hardly meet with anything more undesirable than to have the foundation give way just as the work is finished. In this position I was put by a letter from Mr Bertrand Russell as the work was nearly through the press.”

221 Vicious circle It became quite clear that Russell’s paradox was about a “vicious circle”:

the liar saying “I am a liar”

In Frege’s system, the simplification property ∀y (((λx.A[x])(y)) ⇔ A[y]) can be seen as a computational rule, oriented left-to-right

Basic Law V allows us to forget about ext(·) and basically “simplify” (λx.¬x(x)) (λx.¬x(x)) to ¬((λx.¬x(x)) (λx.¬x(x)))

The paradox comes from the non-termination of the “simplification” process

222 II. The λ-calculus

223 Seeing computation in the Begriffschrift Church, following previous work by Curry, made a theory of this simplification process:

The λ-calculus Retaining only the necessary ingredients from Frege. We need: ˆ variables x, y, z, . . . ˆ λ-abstractions ˆ applications ˆ and that’s it In other words, λ-terms are defined by the following syntax: t, u, v, . . . ::= x | λx.t | t u

Notational conventions:

Implicit parentheses: the concrete syntax t0 t1 . . . tn means (... (t0 t1) . . . tn) Scope of λ-abstractions: when writing the concrete syntax λx. . . ., as much of ... as possible must be understood to be under the λ-abstraction, e.g., λx.xy means λx.(xy), not (λx.x)y

224 We have seen all this before Free variables: as you expect,

FV(x) := {x}

FV(λx.t) := FV(t)\{x}

FV(t u) := FV(t) ∪ FV(u), and x is bound in λx.t

The syntax is quotiented by α-equivalence, e.g., λx.x and λy.y are the same λ-term

Substitution (u/x)t is defined by induction on t in a way that avoids variable capture

226 Reduction One reduction rule:

  (λx.t) u −→root (u/x)t

with t, u ranging over terms and x over variables.

Redex: a term that can be reduced by −→root , i.e., an instance of (λx.t) u

But we do not only want to reduce at the root of terms, also inside them:

−→ is inductively defined by
ˆ if t −→root t′, then t −→ t′
ˆ if t −→ t′, then (t u) −→ (t′ u)
ˆ if u −→ u′, then (t u) −→ (t u′)
ˆ if t −→ t′, then (λx.t) −→ (λx.t′)

−→∗ is the reflexive and transitive closure of −→
←→∗ is the reflexive, transitive and symmetric closure of −→
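For readers who want to experiment, here is a hedged OCaml sketch of λ-terms and of one-step reduction, using de Bruijn indices to sidestep α-equivalence (a presentation choice of this sketch, not the one used in the course); normalise loops forever on non-normalising terms such as (λx.x x)(λx.x x).

```ocaml
(* Lambda-terms with de Bruijn indices: Var 0 is the nearest bound variable. *)
type term = Var of int | Lam of term | App of term * term

(* shift free indices >= c by d *)
let rec shift d c = function
  | Var n -> if n >= c then Var (n + d) else Var n
  | Lam t -> Lam (shift d (c + 1) t)
  | App (t, u) -> App (shift d c t, shift d c u)

(* substitute term s for index j in t *)
let rec subst j s = function
  | Var n -> if n = j then s else Var n
  | Lam t -> Lam (subst (j + 1) (shift 1 0 s) t)
  | App (t, u) -> App (subst j s t, subst j s u)

(* beta-reduce (λ.t) u : substitute u for index 0 in t, then unshift *)
let beta t u = shift (-1) 0 (subst 0 (shift 1 0 u) t)

(* one step of leftmost-outermost reduction, if any *)
let rec step = function
  | App (Lam t, u) -> Some (beta t u)
  | App (t, u) ->
      (match step t with
       | Some t' -> Some (App (t', u))
       | None -> (match step u with Some u' -> Some (App (t, u')) | None -> None))
  | Lam t -> (match step t with Some t' -> Some (Lam t') | None -> None)
  | Var _ -> None

(* keep reducing until irreducible (may not terminate) *)
let rec normalise t = match step t with Some t' -> normalise t' | None -> t

let id = Lam (Var 0)
let () = assert (normalise (App (id, id)) = id)
```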

226 Currification No need to model functions with several : a function f with 2 arguments (x, y) 7→ e[x, y] can be seen as a function g mapping one argument x to: the function that maps y to e[x, y] (x, y) 7→ e[x, y] equivalent to x 7→ (y 7→ e[x, y]) In λ-calculus syntax: λx.λy.e[x, y] To apply it, instead of writing f(x, y), write ((g x) y)

Example: (λx.λx′.x′) ((λy.y) z) ((λy.y) z′)

How many redexes? 3

Several ways in which we can reduce

In this case, they all end up with the same (irreducible) λ-term z′ (t is irreducible if it cannot be reduced by −→ , i.e., none of its sub-terms is a redex)

227 Confluence Is this a general property?

Yes!

Theorem: the relation −→ is confluent, i.e., if t −→∗ t1 and t −→∗ t2, then there exists u such that t1 −→∗ u and t2 −→∗ u

Proof: not today

228 Corollaries Corollary 1: the relation −→ is Church-Rosser, i.e., if t1 ←→∗ t2, then there exists u such that t1 −→∗ u and t2 −→∗ u

Proof: by induction on the number of “peaks”

Corollary 2: Irreducible forms are unique, i.e., given a λ-term t, there is at most one irreducible u such that t −→∗ u

Proof: trivial

Existence of u?

229 Back to Russell’s vicious circle ω = (λx.x x)(λx.x x): does it reduce to an irreducible term?

(λx.y) ω: does it?

Definition: t is weakly normalising if there exists an irreducible u such that t −→∗ u; t is strongly normalising if there is no infinite reduction sequence starting from t. Strongly normalising clearly implies weakly normalising (König’s lemma). The examples above show that the converse fails. Russell’s paradox is based on the absence of normalisation (using the negation connective):

(λx.¬(x x)) (λx.¬(x x)) −→ ¬((λx.¬(x x)) (λx.¬(x x))) −→ ¬¬((λx.¬(x x)) (λx.¬(x x))) −→ ¬¬¬((λx.¬(x x)) (λx.¬(x x))) . . .

230 III. Ensuring termination of λ-calculus with a typing system

231 Intuition The reason why some terms are non-normalising lies in the question of whether applying a function to itself (x x) makes mathematical sense

The paradox in Frege’s system lies in the question of whether applying a predicate to itself (P (P ), or even P (ext(P ))) makes mathematical sense

The paradox in naïve set theory lies in the question of whether it makes mathematical sense for a set to belong to itself (x ∈ x)

At least in the first two cases, this has to do with the

domain of definition of function x or predicate P

We need to control it to avoid paradoxes

232 Controlling domains and applications Easy in Set theory: if f : A −→ B and x ∈ A then writing f(x) “makes sense” and f(x) ∈ B

But we don’t need Set theory for that

Idea: Abstract away from sets to retain only the necessary ingredients for controlling domains of definition and applications of functions to arguments

x ∈ A becomes x : A

This is the notion of typing This is a purely syntactic notion

233 The simply-typed λ-calculus We consider some base types: a, b, etc

Syntax of types A, B, C, . . . ::= a | A → B Notational conventions on implicit parentheses:

the concrete syntax A1 → · · · → An → A0 means A1 → (· · · → (An → A0) ···)

Typing context ∆,...: finite map from λ-calculus variables to types

Notation: ∆ can be for instance x1 :A1, . . . , xn :An We can write ∆, x:A

Typing is a relation between 3 things: a context, a λ-term and a type defined inductively by typing rules:

  ─────────────────  (variable)
  ∆, x:A ⊢ x : A

  ∆ ⊢ t : A → B     ∆ ⊢ u : A              ∆, x:A ⊢ t : B
  ─────────────────────────── (application)  ────────────────────── (abstraction)
  ∆ ⊢ t u : B                                ∆ ⊢ λx.t : A → B
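Here is a hedged OCaml sketch of a checker for these rules, under one simplifying assumption of mine: each λ-abstraction is annotated with the type of its bound variable (Church style), whereas the terms of the slides carry no annotation (so the slides' setting would require inference rather than checking).

```ocaml
(* Simple types and annotated terms. *)
type ty = Base of string | Arrow of ty * ty

type term =
  | Var of string
  | Lam of string * ty * term   (* λx:A. t *)
  | App of term * term

exception Type_error

(* type_of ctx t computes A such that ctx ⊢ t : A, or raises Type_error *)
let rec type_of (ctx : (string * ty) list) (t : term) : ty =
  match t with
  | Var x -> (try List.assoc x ctx with Not_found -> raise Type_error)
  | Lam (x, a, body) -> Arrow (a, type_of ((x, a) :: ctx) body)
  | App (t, u) ->
      (match type_of ctx t with
       | Arrow (a, b) when type_of ctx u = a -> b
       | _ -> raise Type_error)

(* Example: λx:a. x has type a → a *)
let () =
  assert (type_of [] (Lam ("x", Base "a", Var "x")) = Arrow (Base "a", Base "a"))
```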

234 Properties of the typing system Remark: If ∆ ⊢ t : A then FV(t) is included in the domain of ∆

Substitution: If ∆, x:A ⊢ t : B and ∆ ⊢ u : A, then ∆ ⊢ (u/x)t : B. Proof: easy induction on t

Subject Reduction: If ∆ ⊢ t : A and t −→ t′ then ∆ ⊢ t′ : A. Proof: easy induction on the inductive property t −→ t′ (base case = root reduction; uses the above lemma)

Strong Normalisation: If ∆ ⊢ t : A then t is strongly normalising. Proof: not today

235 Easy extension Easy to add typed constants to the simply-typed λ-calculus

(Higher-order) signature: set of constants equipped with fixed types (instead of arities) Syntax of the λ-calculus with constants: t, u, v, . . . ::= x | c | λx.t | tu where c ranges over the signature

Constants play no role in computation. Same definition of reduction relation −→

Add 1 typing rule for constants:

∆ c: A if c:A is in the signature

Properties on previous slide still hold (constants behave like free variables that can never be bound)

236 IV. Higher-Order Logic: using the typed λ-calculus to fix Frege’s logic

237 Setting up the scene 2 base types: i (individuals), Prop (propositions)

As opposed to predicate logic, terms and formulae are made of the same syntax: that of λ-terms with constants

Signature of constants:

⊤ : Prop          ∧ : Prop → Prop → Prop

⊥ : Prop          ∨ : Prop → Prop → Prop

¬ : Prop → Prop   ⇒ : Prop → Prop → Prop

∀A :(A → Prop) → Prop

∃A :(A → Prop) → Prop

P ∧ Q abbreviation for ∧ PQ,. . .

∀Ax P abbreviation for ∀A λx.P ,...

238 Philosophy of the proof system We get a typing system deriving typing judgements of the form ∆ t: A

A “formula” P is now just a λ-term of type Prop (depends on context ∆!)

Next slide: a proof system deriving judgements of the form Γ `∆ P where ∆ is a typing context, P (the goal) is a formula according to ∆, and Γ (the hypotheses) a set of formulae according to ∆

240 The inference rules -part 1 You already know the rules for propositional logic (see INF551 course notes), adapted to the judgement Γ ⊢∆ P:

⇒-intro: from Γ, P ⊢∆ Q, infer Γ ⊢∆ P ⇒ Q
⇒-elim:  from Γ ⊢∆ P and Γ ⊢∆ P ⇒ Q, infer Γ ⊢∆ Q
∧-intro: from Γ ⊢∆ P1 and Γ ⊢∆ P2, infer Γ ⊢∆ P1 ∧ P2
∧-elim:  from Γ ⊢∆ P1 ∧ P2, infer Γ ⊢∆ Pi (i ∈ {1, 2})
∨-intro: from Γ ⊢∆ Pi and ∆ ⊢ Pj : Prop (with {i, j} = {1, 2}), infer Γ ⊢∆ P1 ∨ P2
∨-elim:  from Γ ⊢∆ P1 ∨ P2, Γ, P1 ⊢∆ Q and Γ, P2 ⊢∆ Q, infer Γ ⊢∆ Q
⊤-intro: Γ ⊢∆ ⊤, provided ∆ ⊢ A : Prop for every A ∈ Γ
⊥-elim:  from Γ ⊢∆ ⊥ and ∆ ⊢ P : Prop, infer Γ ⊢∆ P
¬-intro: from Γ, P ⊢∆ ⊥, infer Γ ⊢∆ ¬P
¬¬-elim: from Γ ⊢∆ ¬¬P, infer Γ ⊢∆ P
axiom:   Γ, P ⊢∆ P, provided ∆ ⊢ A : Prop for every A ∈ (Γ, P)

240 The inference rules -part2 We adapt the rules for quantifiers to the higher-order case

∀-intro: from Γ ⊢∆,x:A P, infer Γ ⊢∆ ∀A x P
∀-elim:  from Γ ⊢∆ ∀A x P and ∆ ⊢ t : A, infer Γ ⊢∆ (t/x)P
∃-intro: from Γ ⊢∆ (t/x)P and ∆, x:A ⊢ P : Prop, infer Γ ⊢∆ ∃A x P
∃-elim:  from Γ ⊢∆ ∃A x P and Γ, P ⊢∆,x:A Q, infer Γ ⊢∆ Q

Finally, we include computation within the reasoning:

conversion: from Γ ⊢∆ Q, ∆ ⊢ P : Prop and P ←→∗ Q, infer Γ ⊢∆ P

241 Property and equality

Property: Whenever Γ `∆ P we have ∆ P : Prop (invariant by each of the inference rules)

As usual, P ⇔ Q is an abbreviation for (P ⇒ Q) ∧ (Q ⇒ P )

Equality can be defined by Leibniz:

=A is λx.λy.∀A→PropP (P x ⇔ P y)

We can use the usual infix notation t =A u for ((=A t) u)

Exercises (practical next week):

Prove ⊢ ∀A x, x =A x

Prove that if Γ ⊢∆ P [t] and Γ ⊢∆ t =A u then Γ ⊢∆ P [u] (assuming ∆, y1:A ⊢ P : Prop, ∆ ⊢ t : A and ∆ ⊢ u : A)

242 HOL computer-friendly? HOL includes computation within reasoning. Our hope: encode arithmetic so that 45 ∗ 67 = 3015 can be proved in 1 reasoning step. ...

  ⊢ 3015 =nat 3015        45 ∗ 67 ←→∗ 3015
  ─────────────────────────────────────────  (conversion)
  ⊢ 45 ∗ 67 =nat 3015

We do this next week

You will have noticed that moving from predicate logic to HOL gets us much closer to the Coq system. . .

243 Questions?

244 Lecture VIII The λ-calculus: arithmetic and computability

245 One preliminary remark about HOL Take it for granted: ˆ Type-checking: The problem of knowing, for a λ-term t, a context ∆ and a type A, whether ∆ t: A, is decidable ˆ Typability: The problem of knowing, for a λ-term t, whether it is typable (there exist ∆ and A such that ∆ t: A), is also decidable Remark: The typing system is a fragment of Caml’s. If the Caml compiler can type-check, so can we.

Corollary: Proof checking in HOL is decidable In particular, it can be decided whether an application of rule Γ `∆ Q ∆ P : Prop P ←→∗ Q Γ `∆ P is correct, because of confluence and strong normalisation of P and Q

246 Contents

I. Arithmetic in Higher-Order Logic II. The λ-calculus beyond simple types III. Representing recursive functions in the λ-calculus IV. Intuitionistic logic (teaser)

247 I. Arithmetic in Higher-Order Logic

248 Representation of natural numbers Representation is guided by the desire to define functions by induction

3 is iterating a function three times

3 = λx.λf.(f (f (f x)))

p = λx.λf.(f (f . . . (f x) . . .))   with p occurrences of f

If h is the function that doubles its argument, what is (n 1 h)?

Every p can be typed:

p: a → (a → a) → a Let us abbreviate a → (a → a) → a by a0.

Every p is irreducible

Easy to check: every closed irreducible λ-term of type a0 is of the form p

249 So far we have seen four representations of natural numbers 3 is S(S(S(0))) (Peano)

3 is what all the sets of three elements have in common (Cantor)

3 is {0, 1, 2} (Von Neumann)

3 is λx.λf.(f (f (f x))) (Church)

250 Representation of functions, successor, addition Definition: F (a λ-term) represents a total function f from Nn to N if for all p1, . . . , pn, q such that f(p1, . . . , pn) = q, we have (F p1 . . . pn) −→∗ q

Successor: SUC := λn.λx.λf.f (n x f)

Addition: PLUS := λp.λq.λx.λf.p (q x f) f

Multiplication: TIMES := λp.λq.λx.λf.p x (λy.q y f)

Clearly, SUC : a0 → a0, PLUS : a0 → a0 → a0 and TIMES : a0 → a0 → a0
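As a sanity check only, Church numerals and the three combinators above can be written directly as OCaml functions, following the slides' convention that a numeral takes x first, then f (the course works with the λ-terms themselves, not with OCaml):

```ocaml
(* Church numerals as ordinary OCaml functions: p = fun x f -> f (... (f x)). *)
let zero = fun x _f -> x                          (* λx.λf.x *)
let suc n = fun x f -> f (n x f)                  (* SUC *)
let plus p q = fun x f -> p (q x f) f             (* PLUS *)
let times p q = fun x f -> p x (fun y -> q y f)   (* TIMES *)

(* read a numeral back as a machine integer *)
let to_int n = n 0 (fun k -> k + 1)

let three = suc (suc (suc zero))
let () = assert (to_int (times (plus three three) three) = 18)
```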

251 Peano’s axioms The following theorem can be proved in HOL:

` (TIMES 45 67) = 3015 in just a couple of steps! . . . without axioms . . . just using computing power of λ-term reduction in that sense, HOL much more computer-friendly than set theory

The following Peano’s axioms can be proved in HOL (see practical):

⊢ ∀a0 n, (PLUS 0 n) = n

⊢ ∀a0 n, ∀a0 m, (PLUS (SUC n) m) = SUC (PLUS n m)

⊢ ∀a0 n, ∀a0 m, (TIMES 0 m) = 0

⊢ ∀a0 n, ∀a0 m, (TIMES (SUC n) m) = PLUS m (TIMES n m)

Just like in ZF ∗ But this time without axioms!

252 . . . well, almost

The first one: `∀ a0 n (PLUS 0 n) = n actually requires a weak form of extensionality: ∀a→bf (f = λx.f x) Cannot be proved in HOL as defined last week. . . axiom? But can be proved in HOL if the notion of reduction is extended (2 reduction rules instead of one):

  (λx.t) u −→root (u/x)t

  λx.(t x) −→root t     if x ∉ FV(t)

All the theorems you know about λ-calculus still hold with the extra rule (confluence, strong normalisation of simply-typed terms, subject reduction, etc) Coq implements (an extension of) HOL. version ≤ 8.3 integrates only the first rule =⇒ weak extensionality needs to be added as an axiom version ≥ 8.4 integrates both rules =⇒ weak extensionality can be proved

253 Induction For every integer p we have in HOL a proof of

⊢ ∀a0→Prop P (P 0 ⇒ (∀a0 m, P m ⇒ P (SUC m)) ⇒ P p)

But it can be shown (not in INF551) that in HOL there is no proof of

⊢ ∀a0 n ∀a0→Prop P (P 0 ⇒ (∀a0 m, P m ⇒ P (SUC m)) ⇒ P n), i.e., you cannot prove (within HOL) that every inhabitant of a0 satisfies every

“hereditary property” (a predicate P such that P 0 and ∀a0 m, P m ⇒ P (SUC m)) Church’s trick: Let’s say that natural numbers are not all the inhabitants of a0 but only those satisfying every hereditary property, i.e., satisfying the predicate NATa: λn.∀a0→PropP (P 0 ⇒ (∀a0 m, P m ⇒ P (SUC m)) ⇒ P n)

Safety check: you can prove ` NATa 0 and `∀ a0 n, (NATa n) ⇒ (NATa (SUC n)) Now when we want to quantify over natural numbers, we write

∀a0 n, (NATa n) ⇒ · · · or ∃a0 n, (NATa n) ∧ · · ·

And now we can prove the induction principle (see practical):

`∀ a0 n, (NATa n) ⇒ ∀a0→PropP (P 0 ⇒ (∀a0 m, P m ⇒ P (SUC m)) ⇒ P n)

254 Missing Peano axioms Two of peano’s axioms are still missing

ˆ Zero is the successor of noone: ∀a0 n (NATa n) ⇒¬(0 = SUC n) cannot be proved (with or without the assumption (NATa n))

But this can actually be proved (see practical) when a has two distinct elements:

` (∃ax∃ay ¬(x = y)) ⇒ ∀a0 n ¬(0 = SUC n)

This is the case of Prop!

Therefore: `∀ Prop0 n ¬(0 = SUC n)

ˆ Successor is injective:

∀a0 n (NATa n) ⇒∀a0 m (NATa m) ⇒(SUC n = SUC m) ⇒ n = m

No trick there, we have to add it as an axiom

255 Church, Turing and Goedel crash the party again! Let HOL+ be HOL with ˆ the injectivity of successor ˆ Weak Extensionality (either as an axiom or as part of the reduction relation)

We have seen that HOL+ can fully express Peano’s arithmetic

(i.e., can prove the theory T0)

Sanction: If HOL+ is consistent (i.e., has a model, which would be an N-model), then ˆ Provability in HOL+ is undecidable(Church’s theorem) ˆ There is a Goedel formula in HOL+ (or in HOL)(Godel’s theorem) (cannot be proved and neither can its negation be proved)

The second point is due to the fact that proof-checking in HOL+ is still decidable

256 Consistency Well. . . is HOL+ (or even HOL) consistent? Nobody has found a contradiction yet

Simple types prevent the construction of paradoxes such as Russell’s: the term (λx.x x)(λx.x x) with infinite reduction is banned since x x cannot be typed!

Can we prove that HOL+ (or even HOL) is consistent?

Well, it depends which meta-logic we work in:

Working in set theory, we have proved that Peano’s Arithmetic is consistent (we have constructed an N-structure in ZF ∗)

Still working in set theory, we could prove that HOL+ (and HOL) is consistent (would only take a couple of hours) . . . and it heavily relies on the strong normalisation of λ-terms that are used

Morality: deep connection between consistency and strong normalisation

257 II. The λ-calculus beyond simple types

258 Turing-completeness? You will have noticed that our λ-terms are particular (and simple) Caml programs

Caml, like every decent programming language, is Turing-complete

Question: is the simply-typed λ-calculus Turing-complete?

Remember the Generalised Halting Problem (Lecture 3): Let T be a decidable subset of programs such that every program in T always terminates. Then T is not Turing-complete (there is a total computable function not represented by a program in T )

Fact 1: Being a λ-term of type a0 → a0 is a decidable property (Typing is decidable)

Fact 2: If t : a0 → a0, then for every numeral p (of type a0), we have t p : a0 . . . and therefore t p is strongly normalising (=⇒ terminates)

Conclusion: The simply-typed λ-calculus is not Turing-complete

259 What does the simply-typed λ-calculus lack for Turing-completeness? The source of non-termination in your usual programming language. For instance: ˆ the while loop in imperative languages (e.g., C) ˆ the general recursive calls let rec ...= ...in ... in functional languages (e.g., Caml)

In summary: strong normalisation deeply connected to the consistency of the logic using λ-terms strong normalisation incompatible with Turing-completeness

Very difficult (impossible?) to have a logic integrating all computations from a Turing-complete language

Let us investigate how to make the λ-calculus Turing-complete independently from any logic

260 Representation of functions in general Our definition of “representing a function” was designed for total functions

Let us extend this definition for partial functions as well

Definition: F (a λ-term) represents a function f from Nn to N ˆ if for all p1, . . . , pn, q such that f(p1, . . . , pn) = q, (F p1 . . . pn) −→∗ q

ˆ and if for all p1, . . . , pn such that f is not defined on p1, . . . , pn,

(F p1 . . . pn) does not reduce to an irreducible term

This defines a notion of “computable functions” (those represented by some λ-terms)

261 3 equivalent notions ˆ Recursive functions - Godel,¨ 1933 ˆ λ-calculus - Church, 1936 ˆ Turing machines, 1937

Rosser (1939): those three computational models coincide (they identify as “computable” the same set of functions)

In the next section, we prove that recursive functions can be represented by λ-terms

262 III. Representing recursive functions in the λ-calculus

263 Projections, zero, successor, composition We have already done most of the work!

Projections: λx1. .... λxn.xi

Constant functions returning 0: λx1. .... λxn.0

Successor λn.λx.λf.f (n x f)

Composition: If F represents f : Nᵐ −→ N and each Gi represents gi : Nⁿ −→ N, then

h : Nⁿ −→ N
(p1, . . . , pn) ↦ f(g1(p1, . . . , pn), . . . , gm(p1, . . . , pn))

is represented by

H = λx1. ... λxn.(F (G1 x1 ... xn) ... (Gm x1 ... xn))

265 This is a bit too naive, though If g is not defined on 4 and h is the constant function returning 0, then the composition f = h ◦ g is not defined on 4, but (F 4) = ((λy.((λx.0)(G y))) 4) −→∗ ((λx.0)(G 4)) −→∗ 0

The trick: a construct &. t&u is a λ-term meaning “t, provided that u reduces to some numeral p”

We can define t&u := (u t (λx.x))

It is easy to check that ˆ t&p−→∗ t ˆ t&u does not reduce to an irreducible term if u does not
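A small OCaml check of the first point (a hedged sketch; guard and three are names chosen for illustration):

    let three = fun x f -> f (f (f x))     (* the numeral 3 *)
    let guard t u = u t (fun x -> x)       (* t & u := (u t (λx.x)) *)
    let () = assert (guard 42 three = 42)  (* t & p reduces to t *)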

265 Representing recursive functions - part 1

Projections: λx1. .... λxn.((((xi&x1)&...&xi−1)&xi+1)&...&xn)

Constant functions returning 0: λx1. .... λxn.((0&x1)&...&xn)

Successor: λn.λx.λf.(f (n x f))

Plus: λp.λq.(((λx.λf.(p (q x f) f))&p)&q)

Times: λp.λq.(((λx.λf.(p x (λy.(q y f))))&p)&q)

χ<: λp.λq.(((p (K 1) T (q (K 0) T ))&p)&q) where K = λx.λy.x and T = λg.λh.(h g)
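A hedged OCaml sketch of the cores of Plus and Times above, with the & guards dropped (they only serve to propagate undefinedness, which cannot arise here):

    let zero = fun x _f -> x
    let succ n = fun x f -> f (n x f)
    let to_int n = n 0 (fun k -> k + 1)

    let plus p q = fun x f -> p (q x f) f            (* iterate f: q times, then p more times *)
    let times p q = fun x f -> p x (fun y -> q y f)  (* iterate "add q" p times, starting from x *)

    let two = succ (succ zero)
    let three = succ (succ (succ zero))
    let () = assert (to_int (plus two three) = 5)
    let () = assert (to_int (times two three) = 6)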

266 Representing recursive functions - part 2 Composition:

λx1. ... λxn.(H (G1 x1 ... xn) ... (Gm x1 ... xn))

If then else: Ifz = λx.λy.λz.(x y (λz′.z))

Iterating a function represented by t forever:

Yt = ((λx.(t (x x))) (λx.(t (x x))))

Yt −→ (t Yt) −→ (t (t Yt)) −→ · · ·

Minimisation: λx1. ... λxn.(YG′ x1 ... xn 0) where
G′ := λf.λx1. ... λxn.λxn+1.(Ifz (G x1 ... xn xn+1) xn+1 (f x1 ... xn (S xn+1)))
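The fixpoint Yt is what OCaml's let rec provides natively; a hedged sketch of minimisation written with it (mu and g are names chosen for illustration; mu g loops forever when g never returns 0, which is exactly the partiality minimisation introduces):

    let rec mu g x = if g x = 0 then x else mu g (x + 1)   (* least x >= start with g x = 0 *)

    (* least x with x*x >= 20 *)
    let () = assert (mu (fun x -> if x * x >= 20 then 0 else 1) 0 = 5)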

267 Conclusion Theorem:

If f(p1, ..., pn) = q, then (F p1 ... pn) −→∗ q

If f not defined on p1, ..., pn, then (F p1 ... pn) does not reduce to an irreducible term

Corollary: The untyped λ-calculus is Turing-complete (since recursive functions are)

In the above constructions, which λ-terms can be typed?

Which one cannot?

268 Posterity of λ-calculus Robin Milner: add to it primitive numerals (4 operations + test), a primitive fixpoint and a let construct: PCF

Add to that tree datatypes and pattern-matching: (Ca)ML, a real-life programming language

Coq uses a version of this where the fixpoint is controlled: you can write (the equivalent of) let rec f x = u when the type of x is a tree datatype, but the only recursive calls of f allowed in u are those applied to arguments that are clearly subtrees of x =⇒ ensures strong normalisation =⇒ ensures consistency, but drops Turing-completeness
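A sketch in OCaml syntax of the kind of fixpoint Coq accepts (Coq performs the subterm check; OCaml does not):

    type nat = Z | S of nat

    let rec plus m n = match m with
      | Z -> n
      | S m' -> S (plus m' n)     (* the only recursive call is on the subterm m' *)

    let () = assert (plus (S (S Z)) (S Z) = S (S (S Z)))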

λ-calculus used as a basis to approach new domains: e.g., polymorphic λ-calculus, parallel λ-calculus, λ-calculus for quantum computations,. . .

269 IV. Intuitionistic logic (teaser)

270 Constructivism Remember Kronecker's criticism of Cantor's work: Cantor's inability to explicitly construct the sets he talked about. Example: the set of all sets

Fortunately, the existence of such a set leads to inconsistency (Russell) and Zermelo-Fraenkel’s set theory is a step towards constructivism: Every set whose existence we claim needs to be justified by a construction, consisting of ˆ applying power set axiom or union axiom finitely many times (from the empty set or the set of natural numbers) ˆ and then using separation or replacement

Still, such constructions rely on axioms

Even better in HOL: there are no axioms! Every function whose existence we claim is actually a λ-term, i.e., a computer program. We can hardly get more explicit!

271 The drinker’s theorem Well. . . this is not completely true.

Let’s come back to the drinker’s theorem: “There is always someone such that, if he drinks, everybody drinks”

272 Proof - Informal Take the first guy you see, call him Bob.

Either Bob does not drink, in which case he satisfies the predicate “if he drinks, everybody drinks”

. . . or Bob drinks, in which case we have to check that everybody else drinks

If this is the case, then again Bob is the person we are looking for

If we find someone who does not drink, call him Derek, we change our mind and say that the guy we are looking for is Derek

273 Problem with this?

We have turned this into a formal proof of the formula ∃x (DRINKS(x) ⇒ ∀y DRINKS(y))

. . . using the rule: from Γ ` ¬¬A, infer Γ ` A

We can develop the proof in predicate logic, in HOL, with or without a theory

Problem with this: We have proved the theorem . . . but we are still incapable of identifying the person satisfying the property (or rather, our choice depends on the context)

In other words, we fail to provide a witness of existence

In other words, the logic we use does not have the witness property

The logic we use lacks a certain dose of constructivism

274 What’s next? Next Lecture: ˆ a way to modify every logical system we’ve talked about . . . to recover the witness property (and be more constructive) ˆ unveiling a new connection between proofs and programs: the Curry-Howard correspondence

275 Questions?

276 Lecture IX Intuitionistic logic(s) & the proofs-as-programs paradigm

277 The drinker’s theorem “There is always someone such that, if he drinks, everybody drinks”

Let DF be the statement of the drinker’s theorem:

∃x(DRINKS(x) ⇒ ∀y DRINKS(y))

278 Contents

I. Excluded-middle vs Double negation
II. Annoying things in logic so far
III. The proofs-as-programs paradigm

279 I. Excluded-middle vs Double negation

280 Two possible rules: double negation (from Γ ` ¬¬A, infer Γ ` A) and excluded middle (infer Γ ` A ∨ ¬A, from no premise)

. . . both allow deriving the rule “from Γ, ¬A ` A, infer Γ ` A”:
ˆ with double negation: from the premise Γ, ¬A ` A and the axiom Γ, ¬A ` ¬A, derive Γ, ¬A ` ⊥, hence Γ ` ¬¬A, hence Γ ` A;
ˆ with excluded middle: from Γ ` A ∨ ¬A, Γ, A ` A and the premise Γ, ¬A ` A, conclude Γ ` A by ∨-elimination.

. . . and the two rules are interchangeable, in that one rule can be derived using the other:

ˆ Excluded middle from double negation: from the axiom Γ, ¬(A ∨ ¬A), A ` A, derive Γ, ¬(A ∨ ¬A), A ` A ∨ ¬A, hence (with the hypothesis ¬(A ∨ ¬A)) Γ, ¬(A ∨ ¬A), A ` ⊥, hence Γ, ¬(A ∨ ¬A) ` ¬A, hence Γ, ¬(A ∨ ¬A) ` A ∨ ¬A, and finally Γ ` A ∨ ¬A by the derived rule above (applied to A ∨ ¬A).

ˆ Double negation from excluded middle: from the premise Γ ` ¬¬A we get Γ, ¬A ` ¬¬A, which together with the axiom Γ, ¬A ` ¬A gives Γ, ¬A ` ⊥, hence Γ, ¬A ` A, and finally Γ ` A by the derived rule above.

281 II. Annoying things in logic so far

282 Formal proof of the Drinker’s theorem

Reading the proof from its leaves down to its conclusion (each sequent follows from the previous ones):

. . . ` ¬DRINKS(y)   and   . . . ` DRINKS(y)   (axioms)

¬DF, DRINKS(BOB), ¬DRINKS(y), DRINKS(y) ` ⊥

¬DF, DRINKS(BOB), ¬DRINKS(y), DRINKS(y) ` ∀z DRINKS(z)

¬DF, DRINKS(BOB), ¬DRINKS(y) ` DRINKS(y) ⇒ ∀z DRINKS(z)

¬DF, DRINKS(BOB), ¬DRINKS(y) ` ∃x (DRINKS(x) ⇒ ∀z DRINKS(z))   which, with the axiom ¬DF, . . . ` ¬DF, gives

¬DF, DRINKS(BOB), ¬DRINKS(y) ` ⊥

¬DF, DRINKS(BOB) ` ¬¬DRINKS(y)

¬DF, DRINKS(BOB) ` DRINKS(y)

¬DF, DRINKS(BOB) ` ∀y DRINKS(y)

¬DF ` DRINKS(BOB) ⇒ ∀y DRINKS(y)

¬DF ` ∃x (DRINKS(x) ⇒ ∀y DRINKS(y)),   i.e.  ¬DF ` DF,   hence (by the derived rule)   ` DF

283 Problem with this We have proved the theorem . . . but we are still incapable of identifying the person satisfying the property (or rather, our choice depends on the context)

In other words, we fail to provide a witness of existence

In other words, the logic we use does not have the witness property

The logic we use lacks a certain dose of constructivism

284 Lack of witness, another example Predicate P : assuming P (0), ¬P (2)

Can we prove that there is an integer x such that P (x) ∧ ¬P (S(x))? P (0), ¬P (2) ` ∃x (P (x) ∧ ¬P (S(x)))

Is there an integer n such that we can prove P (n) ∧ ¬P (S(n))? P (0), ¬P (2) ` P (n) ∧ ¬P (S(n))

Concrete example: Let u0 := √2, ux+1 := ux^√2, and P (x) be “ux is irrational”

Do we have P (0) and ¬P (2)? Yes: u0 = √2 is irrational, while u2 = (√2^√2)^√2 = √2^2 = 2 is rational

Applying the above: There is x such that P (x) and ¬P (x + 1)
Therefore: There is an irrational r (:= ux) such that r^√2 is rational
r is either √2 or √2^√2, depending on whether √2^√2 is rational or not

285 The annoying mismatch Remark:
` A ∧ B if and only if both ` A and ` B: the object-level ∧ matches the meta-level “and”
` ∀x A[x] if and only if for all terms t we have ` A[t] (t not necessarily closed): the object-level ∀ matches the meta-level “for all”

Clearly: if either ` A or ` B then ` A ∨ B, and if there is a term t such that ` A[t] then ` ∃x A[x]

But if you have . . .                                . . . you don't necessarily have

` ∃x A[x]                                            an n such that ` A[n]

Example: ` ∃x (P (x) ∨ ¬P (S(x)))                    an n such that ` P (n) ∨ ¬P (S(n))

` A ∨ B                                              either ` A or ` B

Example: ` A ∨ ¬A                                    either ` A or ` ¬A (e.g., A a Gödel formula)

For ∨ and ∃, there is a mismatch between the object-level and the meta-level

286 The culprit and how to fix the mismatch In all our examples, the mismatch is entirely due to: ˆ the Law of Excluded Middle ˆ or, equivalently, the Elimination of Double Negation.

The fix is easy: disallow those laws. You get what is called Intuitionistic Logic(s), as opposed to Classical Logic(s)

The distinction can be made for propositional logic, predicate logic, higher-order logic, etc.

The claim: we recover a full match between object-level and meta-level ˆ ` A ∧ B if and only if both ` A and ` B ˆ `∀ x A[x] if and only if for all terms t we have ` A[t] ˆ ` A ∨ B if and only if either ` A or ` B ˆ `∃ x A[x] if and only if there is a term t such that ` A[t]

The above match works in the empty theory, not in an arbitrary theory! (imagine the theory with axiom A ∨ B, or the theory with axiom ∃x A)

287 III. The proofs-as-programs paradigm

288 Now that we got rid of Excluded Middle/Elimination of Double Negation. . . let's look at the fragment of propositional logic concerning implication only, side by side with the typing rules for the λ-calculus:

Axiom:          Γ, P ` P                                   ∆, x:A ` x : A

Elimination:    from Γ ` P ⇒ Q and Γ ` P, infer Γ ` Q      from ∆ ` t : A → B and ∆ ` u : A, infer ∆ ` t u : B

Introduction:   from Γ, P ` Q, infer Γ ` P ⇒ Q             from ∆, x:A ` t : B, infer ∆ ` λx.t : A → B

What can we say? Γ ←→ ∆, P ←→ A, Q ←→ B, ⇒ ←→ →

289 More precisely Propositions are Types Proofs are Programs

Every proof tree in (the implication fragment of) intuitionistic logic can be annotated to be the typing tree of some λ-term (λ-calculus variables annotate hypotheses, a λ-term annotates the conclusion)

Conversely, every typing tree, for some λ-term t, can be turned into a proof tree in (the implication fragment of) intuitionistic logic, simply by hiding variables and λ-term annotations

This is the Curry-Howard isomorphism
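A hedged OCaml illustration: for these closed terms, the type OCaml infers, read with → as ⇒, is precisely the propositional formula they prove.

    let i = fun x -> x               (* 'a -> 'a                     i.e.  P ⇒ P        *)
    let k = fun x _y -> x            (* 'a -> 'b -> 'a               i.e.  P ⇒ (Q ⇒ P)  *)
    let s = fun x y z -> x z (y z)   (* ('a -> 'b -> 'c) -> ('a -> 'b) -> 'a -> 'c      *)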

290 Re-expressing the rules using annotations Let’s use ˆ α, β, etc. for the variables annotating hypotheses (not to confuse with the variables x, y, etc. in the terms of predicate logic) ˆ M , N , etc. for the λ-terms annotating proof-trees (not to confuse with the terms of predicate logic t, u,. . . )

Γ, α:P ` α: P

From Γ, α:P ` M : Q, infer Γ ` λα.M : P ⇒ Q.          From Γ ` M : P ⇒ Q and Γ ` N : P, infer Γ ` M N : Q.

What about the other connectives?

We can extend the λ-calculus to account for the introduction and elimination rules of the other connectives

291 ∧, ∨

∧-introduction (pairing): from Γ ` M : P1 and Γ ` N : P2, infer Γ ` (M, N) : P1 ∧ P2
∧-elimination (projections): from Γ ` M : P1 ∧ P2, infer Γ ` πi(M) : Pi (i ∈ {1, 2})

P1 ∧ P2 is a product type “P1 ∗ P2” (e.g., in OCaml)

∨-introduction (injections): from Γ ` M : Pi, infer Γ ` inji(M) : P1 ∨ P2 (i ∈ {1, 2})

∨-elimination (case analysis): from Γ ` M : P1 ∨ P2, Γ, α1:P1 ` N1 : Q and Γ, α2:P2 ` N2 : Q,

infer Γ ` (match M with inj1(α1) ↦ N1 | inj2(α2) ↦ N2) : Q

P1 ∨ P2 is a sum type “P1 + P2”
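A hedged OCaml sketch of these two slides: pairs realise ∧, and a two-constructor variant realises ∨ (the type sum and the constructors Inj1/Inj2 are names chosen here for illustration).

    type ('a, 'b) sum = Inj1 of 'a | Inj2 of 'b

    (* a proof of A ∧ B ⇒ B ∧ A *)
    let swap (a, b) = (b, a)

    (* a proof of A ∨ B ⇒ B ∨ A, using the match construct (∨-elimination) *)
    let comm = function
      | Inj1 a -> Inj2 a
      | Inj2 b -> Inj1 b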

292 ∀, ∃, ⊥

∀-introduction: from Γ ` M : P, infer Γ ` λx.M : ∀x P (x not free in Γ)
∀-elimination: from Γ ` M : ∀x P, infer Γ ` M t : {t/x}P

∃-introduction: from Γ ` M : {t/x}P, infer Γ ` ⟨t, M⟩ : ∃x P
∃-elimination: from Γ ` M : ∃x P and Γ, α:P ` N : Q, infer Γ ` (let ⟨x, α⟩ = M in N) : Q (x not free in Γ, Q)

⊥-elimination: from Γ ` M : ⊥, infer Γ ` abort(M) : P

¬P defined as P ⇒ ⊥, and ⊤ defined as ¬⊥ (i.e., ⊥ ⇒ ⊥)

293 Summing up the syntax

                Intro-constructs          Elim-constructs

M, N, . . . ::= α                                                                          (axiom)
            | λα.M                        | M N                                            ⇒
            | (M, N)                      | πi(M)                                          ∧
            | inji(M)                     | match M with inj1(α1) ↦ N1 | inj2(α2) ↦ N2     ∨
            | λx.M                        | M t                                            ∀
            | ⟨t, M⟩                      | let ⟨x, α⟩ = M in N                            ∃
            |                             | abort(M)                                       ⊥

294 Reductions

(λα.M) N −→root {N/α}M
πi((M1, M2)) −→root Mi
match inji(M) with inj1(α1) ↦ N1 | inj2(α2) ↦ N2 −→root {M/αi}Ni
(λx.M) t −→root {t/x}M
let ⟨x, α⟩ = ⟨t, M⟩ in N −→root {t/x, M/α}N

+ some permutation rules such as

(match M with inj1(α1) ↦ N1 | inj2(α2) ↦ N2) N
−→root match M with inj1(α1) ↦ (N1 N) | inj2(α2) ↦ (N2 N)

πi(match M with inj1(α1) ↦ N1 | inj2(α2) ↦ N2)
−→root match M with inj1(α1) ↦ πi(N1) | inj2(α2) ↦ πi(N2)
. . .

295 Results from Lecture 7 still hold Remark: If Γ ` M : P then FV(M) is included in the domain of Γ

Substitution: If Γ, α:P ` M : Q and Γ ` N : P, then Γ ` {N/α}M : Q

Subject Reduction: If Γ ` M : P and M −→ M′ then Γ ` M′ : P

Through the Curry-Howard isomorphism, this describes a proof transformation process

Strong Normalisation: If Γ ` M : P then M is strongly normalising (careful with the permutation rules, though)

The process of transforming proofs terminates, producing proofs of a particular shape: the typing trees of irreducible λ-terms

Corollary: Every theorem P that has a proof in a theory T , also has a proof of that shape

296 Shape of those proofs in the empty theory Theorem: 1. Any closed, irreducible and typed λ-term is an intro-construct 2. There is no closed, irreducible λ-term of type ⊥

Proof: by simultaneous induction on the size of λ-terms (and 2. easy consequence of 1.)

297 Corollary (still in the empty theory) Consistency: intuitionistic predicate logic without axioms is consistent

Proof: If ⊥ has a proof, it also has a proof whose λ-term is an intro-construct. Impossible.

Witness property: if `∃ x P [x] then there is a term t such that ` P [t]

Proof: The proof can be transformed into a proof annotated by an intro-construct, necessarily ht, Mi, which provides t and the proof of ` P [t]

Disjunction property: if ` P1 ∨ P2 then either ` P1 or ` P2

Proof: The proof can be transformed into a proof annotated by an intro-construct, necessarily inji(M), where M annotates a proof of ` Pi

Conclusion: In the empty theory, we recover a full match between object-level and meta-level (in the sense discussed before)

Remark: the Law of Excluded Middle would break all of the above

298 In non-empty theories Axioms are labelled by variables α, β, etc., without computational role

The theorem about the shape of irreducible typed λ-terms no longer holds (the λ-terms are no longer closed)

In some theories (e.g., PA), a computational role can be given to axioms

Theorem holds again, and its corollaries:

Consistency, Witness property, Disjunction property

299 Programming by proving In arithmetic, does ∀x ∃y (x = 2 × y ∨ x = 2 × y + 1) have a proof in intuitionistic logic? Yes, by induction on x!

What about ∃y (25 = 2 × y ∨ 25 = 2 × y + 1) ?

What is the witness? How do you compute it?

An intuitionistic proof of ∀x ∃y (x = 2 × y ∨ x = 2 × y + 1) is a program that computes the half of x (here, by recursion on x!)

Its execution mechanism is the proof-transformation process described before

The program is correct with respect to the specification x = 2 × y ∨ x = 2 × y + 1
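A hedged OCaml sketch of the program hiding in that proof (the names parity, Even, Odd and half are chosen here for illustration): the induction on x becomes recursion on x, and the result carries both the witness y and the disjunct that holds.

    type parity = Even of int | Odd of int      (* Even y:  x = 2*y     Odd y:  x = 2*y + 1 *)

    let rec half x =
      if x = 0 then Even 0                      (* base case: 0 = 2*0 *)
      else match half (x - 1) with              (* induction hypothesis on x - 1 *)
        | Even y -> Odd y                       (* x - 1 = 2y    hence x = 2y + 1   *)
        | Odd y  -> Even (y + 1)                (* x - 1 = 2y+1  hence x = 2(y + 1) *)

    let () = assert (half 25 = Odd 12)          (* the witness for 25 is y = 12 *)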

See other examples in the practical

300 Questions?

301