MATHEMATICAL LOGIC

YIANNIS N. MOSCHOVAKIS

I. Propositional Logic, PL
II. First Order Logic, FOL
III. Gödel's Incompleteness Theorem
IV. Computability
V. Recursion and Programming
VI. Alternative Logics
VII. Set Theory

Glossary

Church-Turing Thesis: Claim that every computable function can be computed by a Turing machine.
Computability theory: Study of computable functions on the natural numbers.
Continuum hypothesis: Conjecture that there are only two sizes of infinite sets of real numbers.
Database: Finite, typically relational structure.
First order logic: Mathematical model of the part of language built up from the propositional connectives and the quantifiers.
Incompleteness phenomenon: Gödel's discovery, that sufficiently strong axiomatic theories cannot decide all propositions which they can express.
Model theory: Study of formal definability in first order structures.
Paradox: Counterintuitive truth.
Peano arithmetic: Axiomatic theory of natural numbers.
Proof theory: Study of inference in formal systems independently of their interpretation.
Propositional connectives: The linguistic constructs "and", "not", "or" and "implies".
Quantifiers: The linguistic constructs "there exists" and "for all".
Turing machine: Mathematical model of computing device with unbounded memory.
Unsolvable problem: A problem whose solution requires a non-existent algorithm.

Narrowly construed, mathematical logic is the study of definition and inference in mathematical models of fragments of language, especially the first order logic fragment. Logic has made critical contributions to the foundations of science, especially through the work of Kurt Gödel, and it also has numerous applications. For set theory and theoretical computer science, these applications are so important that parts of these fields are normally included in the modern, broad conception of the discipline.

I. Propositional Logic, PL

Each logic L has a syntax which delineates the grammatically correct linguistic expressions of L, a semantics which assigns meaning to the correct expressions, and a structured system of proofs which specifies the rules by which some L-expressions can be inferred from others.

There are other words to describe these things: formal language is sometimes used to describe a plain syntax, formal system often identifies a syntax together with an inference system (but without an interpretation), and abstract logic has been used to refer to a syntax together with an interpretation, leaving inference aside. It is, however, a fundamental feature of logic that it draws clean distinctions and studies the connections among these three aspects of language. We explain them first in the simplest example of the "logic of propositions", which is part of many important logics.

A. Propositional Syntax

The symbols of PL are the connectives

    ¬ (not)    & (and)    ∨ (or)    → (implies, if-then)

the two parentheses '(', ')', and an infinite list of (formal) propositional variables P_0, P_1, P_2, ... which intuitively stand for declarative propositions, things like 'John loves Mary' or '3 is a prime number'. It has only one category of grammatically correct expressions, the formulas, which are strings
(finite sequences) of symbols defined inductively by the following conditions:

F1. Each P_i is a formula.
F2. If A and B are formulas, then so are the expressions

    ¬A    (A & B)    (A ∨ B)    (A → B)

For example, if P and Q are propositional variables, then (P → Q) and (P ∨ ¬P) are formulas, which we read as "if P then Q" and "either P or not P".

The inductive definition gives a precise specification of exactly which strings of symbols are formulas, and also ensures that each formula is either prime, i.e., just a variable P_i, or it can be constructed in exactly one way from its simpler immediate parts, by one of the connectives. This makes it possible to prove properties of formulas and to define operations on them by structural induction on their definition.

More propositional connectives can be introduced as "abbreviations" of formula combinations, e.g.,

    A ↔ B ≡ ((A → B) & (B → A))
    A ∨ B ∨ C ≡ (A ∨ (B ∨ C)).

B. Propositional Semantics

If B stands for some true proposition, then ¬B is false, independently of the "meaning" or internal structure of B. This is an instance of a general Compositionality Principle for PL: The truth value of a formula depends only on the truth values of its immediate parts. The semantics of PL comprise the rules for computing truth values, and they can be summarized in Table 1, where 1 stands for 'truth' and 0 for 'falsity'.

    A   B   ¬A   (A & B)   (A ∨ B)   (A → B)
    1   1   0       1         1         1
    1   0   0       0         1         0
    0   1   1       0         1         1
    0   0   1       0         0         1

    Table 1. Truth value semantics.

By the first line of this table, for example, if A and B are both true, then ¬A is false while (A & B), (A ∨ B) and (A → B) are all true. Notice that if A is false, then (A → B) is reckoned to be true no matter what the truth value of B is, so that 'if the moon is made of cheese, then 1 + 1 = 5' is true (on the plausible assumption that the moon is not made of cheese). This material implication assumed by Propositional Logic has been attacked as counterintuitive, but it agrees with mathematical practice and it is the only useful interpretation of implication which accords with the Compositionality Principle.

Using these rules, we can construct for each formula A a truth table which tabulates its truth value under all assignments of truth values to the variables. For example, the truth table for (Q → P) consists of the first three columns of Table 2, while the first two and the last column give the truth table for (P → (Q → P)).

    P   Q   (Q → P)   (P → (Q → P))
    1   1      1            1
    1   0      1            1
    0   0      1            1
    0   1      0            1

    Table 2.

If n variables occur in a formula A, then the truth table for A has 2^n rows and determines an n-ary bit function v_A, with arguments and values in the two-element set {1, 0}. By the Definitional Completeness Theorem, every n-ary bit function is v_A for some A, so that the formulas of PL provide definitions (or "symbolic representations") for all bit functions.

A formula A is a semantic consequence of a set of formulas T (or T-valid) if every assignment to the variables which satisfies (makes true) all the formulas in T also satisfies A. We write

    T ⊨ A ⇔ A is T-valid,

and ⊨ A in the important special case when T is empty, in which case A is called a tautology. A formula A is satisfiable if some assignment satisfies it, i.e., if ¬A is not a tautology. Let

    A ∼ B ⇔ {A} ⊨ B and {B} ⊨ A
          ⇔ ⊨ A ↔ B,

and call A and B equivalent if A ∼ B.
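The truth-value rules of Table 1 can be tried out mechanically. The following is a minimal sketch, not part of the original article: formulas are represented as nested tuples, and the names Var, Not, And, Or, Implies, truth_table, is_tautology, is_satisfiable and equivalent are all illustrative choices. Rows are listed in the order of Table 1 (1 before 0), with variables sorted by name.

```python
from itertools import product

# Formulas as nested tuples; these constructors are illustrative names only.
def Var(name):
    return ("var", name)

def Not(a):
    return ("not", a)

def And(a, b):
    return ("and", a, b)

def Or(a, b):
    return ("or", a, b)

def Implies(a, b):
    return ("implies", a, b)

def variables(formula):
    """Sorted list of the variable names occurring in the formula."""
    if formula[0] == "var":
        return [formula[1]]
    names = set()
    for part in formula[1:]:
        names.update(variables(part))
    return sorted(names)

def value(formula, assignment):
    """Truth value (1 or 0) computed by structural recursion, as in Table 1."""
    tag = formula[0]
    if tag == "var":
        return assignment[formula[1]]
    if tag == "not":
        return 1 - value(formula[1], assignment)
    a = value(formula[1], assignment)
    b = value(formula[2], assignment)
    if tag == "and":
        return a & b
    if tag == "or":
        return a | b
    if tag == "implies":
        return (1 - a) | b  # material implication: false only when a=1, b=0
    raise ValueError(f"unknown connective: {tag}")

def truth_table(formula):
    """List of (bits, value) rows, one per assignment: 2**n rows for n variables."""
    names = variables(formula)
    rows = []
    for bits in product((1, 0), repeat=len(names)):
        rows.append((bits, value(formula, dict(zip(names, bits)))))
    return rows

def is_tautology(formula):
    return all(v == 1 for _, v in truth_table(formula))

def is_satisfiable(formula):
    return any(v == 1 for _, v in truth_table(formula))

def equivalent(a, b):
    """A ~ B exactly when (A <-> B) is a tautology."""
    return is_tautology(And(Implies(a, b), Implies(b, a)))

P, Q = Var("P"), Var("Q")
print(truth_table(Implies(Q, P)))
print(is_tautology(Implies(P, Implies(Q, P))))
print(equivalent(Implies(P, Q), Or(Not(P), Q)))
```

The exhaustive loop over all 2^n assignments is the simplest possible method and is exponential in the number of variables; it is meant only to make the definitions concrete.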
Equivalent formulas define the same bit function, and they can be substituted for each other without changing truth values. Clearly

    (A → B) ∼ (¬A ∨ B),

so that the implication connective is superfluous. In fact, every formula is equivalent to one in disjunctive normal form, i.e., a disjunction A_1 ∨ ··· ∨ A_k where each A_i is a conjunction of variables or negations of variables (literals).

C. Applications to Circuits

Each formula A with n variables can be realized by a switching circuit C(A) with n inputs and one output, so that C(P_i) consists of just one input-output edge, C(A & B) is constructed by joining C(A) and C(B) with an and-gate, etc. Figure 1 exhibits the circuit for ((P_1 & P_2) → P_3) using the equivalent formula without implications, so that only ¬-, &- and ∨-gates are required.

    Fig. 1. The circuit for (¬(P_1 & P_2) ∨ P_3).

These are restricted circuits, of fan-in (maximum number of edges into a node) 2 and fan-out 1, but the Definitional Completeness Theorem implies that every n-ary bit function can be computed by some formula circuit C(A).

There are basically two useful measures of circuit complexity, and both of them are faithfully mirrored in formulas. The number of gates of C(A) is exactly the number of connectives in A and measures size complexity (construction cost), while the depth of C(A), which measures the time complexity of computation, is exactly the rank of A, defined inductively so that rk(P_i) = 1, rk(A & B) = max(rk(A), rk(B)) + 1 and similarly for the other connectives. One can now use natural manipulations of formulas to construct circuits which compute a given bit function with minimum size or time complexity, or to establish optimality results for the computation of bit functions by appealing to the formula representations of the circuits which realize them. For example, using disjunctive normal forms, one sees immediately that (if we do not care about cost) every n-ary bit function can be computed by an unbounded fan-in circuit in no more than 3 time units. There is, in general, a substantial trade-off between the size and time complexity of the circuits which compute a given bit function.

D. The Satisfiability Problem

The assertion that "C(A) and C(B) never give the same output on the same inputs" means precisely that "(A ↔ ¬B) is a tautology", so that to detect that A and B do not have this safety property we need to determine whether the formula ¬(A ↔ ¬B) is satisfiable.

Because of such natural formulations of "error detection" for circuits relative to given specifications, it is very important to find efficient algorithms for determining whether a given formula is satisfiable. The problem is of non-deterministically polynomial time complexity (NP), because it can be resolved by guessing ("non-deterministically") some assignment and then verifying that it satisfies A in a number of steps which is bounded by a polynomial in the length of A; and it is NP-complete, i.e., every NP-problem can be "reduced" to it by a polynomial reduction. This is a basic result of S. Cook, who introduced the complexity class NP, showed that it contains a large number of important problems, and asked if it coincides with the (seemingly) smaller class P of "feasible", deterministically polynomial time problems. The question whether P = NP is the fundamental open problem of complexity theory; it amounts simply to the question whether the satisfiability problem can be solved by a deterministic, polynomial algorithm.

E. Propositional Inference

A proof of a formula A from a set of hypotheses T is any finite sequence

    A_0, A_1, ..., A_{n−1}, A

which ends with A, and such that each A_i is either in T, or a PL-axiom, or follows from previously listed formulas by a rule of inference. To make this notion precise we need to specify a set of PL-axioms and rules of inference; and for these to be useful, it should be that they are few and easy to understand, and that the formulas provable from T are exactly the T-tautologies.

We need just one, binary inference rule:

    A    (A → B)
    ------------  (Modus Ponens)
         B

This is sound, i.e., {A, (A → B)} ⊨ B, so that if A and (A → B) are both T-tautologies, then so is B.

An axiom is any instance of the following axiom schemes, where A, B and C are arbitrary formulas and we have omitted several parentheses which pedantry would require:

    (1)  A → (B → A)
    (2)  (A → B) → ((A → (B → C)) → (A → C))
    (3)  A → (B → (A & B))
    (4)  (A & B) → A          (4′)  (A & B) → B
    (5)  A → (A ∨ B)          (5′)  B → (A ∨ B)
    (6)  (A → C) → ((B → C) → ((A ∨ B) → C))
    (7)  (A → B) → ((A → ¬B) → ¬A)
    (8)  ¬¬A → A

These are all tautologies, and so every formula provable from T is T-valid. We write

    T ⊢ A ⇔ there is a proof of A from T,

and it is not hard now to establish the basic

Soundness and Completeness Theorem for PL. For all sets T and any A,

    T ⊨ A ⇔ T ⊢ A.

F. Boolean Algebras

A Boolean algebra is a set B with at least two distinct elements 0 and 1, a unary complementation operation ′, and binary infimum ∩ and supremum ∪ operations such that certain properties hold. The standard example is the set P(M) of all subsets of some non-empty set M, with 0 = ∅, 1 = M and the usual complementation, intersection and union operations, which for a singleton M gives the two-element set {1, 0} of truth values; but there are others, e.g., the set of all finite and co-finite subsets of some infinite set, the set of all "closed and open" subsets of a topological space, etc.

Each formula A with n variables defines an n-ary function on every Boolean algebra B, simply by letting the propositional variables range over B and replacing ¬, &, ∨ and → by ′, ∩, ∪ and ⇒ respectively, where

    x ⇒ y = x′ ∪ y

on B. Now the axioms for a Boolean algebra ensure that every propositional axiom defines a function with constant value 1; in fact the particular choice of axiomatization for Boolean algebras (and there are many) is quite irrelevant as long as this fact obtains. The Completeness Theorem then implies that two formulas A and B define the same n-ary operation on all Boolean algebras exactly when A ∼ B, i.e., when A and B define the same bit function.

Boolean algebras have many important applications in mathematics (to measure theory, among other things), and they are the subject of the classical Stone Representation Theorem which identifies them all (up to isomorphism) with subalgebras of powerset algebras. In logic they are mostly used through the "non-standard" Boolean semantics of this subsection, which extend to richer logics and provide a powerful tool for independence (unprovability) results.

II. First Order Logic, FOL

Consider the claim:

    If everybody has a mother, and every mother loves her children, then everybody is loved by somebody.

It is certainly true, it has the "linguistic form" of many similar (more substantial) claims in mathematics, and it appears to be true by virtue of its form and not because of any special properties of the words "mother", "love", etc. First Order Logic makes it possible to express complex assertions of this type and to show that they are true by logic alone.
The symbolic expression of this one will be

    [(∀x)(∃y)M(x, y) & (∀x)(∀y)[M(x, y) → L(y, x)]] → (∀x)(∃y)L(y, x),

give-or-take a few parentheses and brackets which will be required to make the syntax completely precise.

A. First Order Syntax

The symbols of FOL are the propositional connectives, the parentheses, the quantifiers

    ∀ (for all)    ∃ (there exists)

the comma ',', the identity symbol '=', an infinite list v_0, v_1, ... of individual variables which will denote arbitrary objects in some domain, and for each n = 0, 1, ..., two infinite lists of function and relational symbols

    f_0^n, f_1^n, ...,    P_0^n, P_1^n, ...,

which will stand for n-ary functions and relations on the objects.

There are two categories of grammatically correct expressions in FOL, terms and formulas, defined recursively by the following conditions.

T1. Each variable v_i is a term.
T2. If t_1, ..., t_n are terms, then (the string) f_i^n(t_1, ..., t_n) is also a term. When n = 0, we write simply f_i^0.
F1. If t_1, ..., t_n are terms, then the expressions

    t_1 = t_2    P_i^n(t_1, ..., t_n)

are formulas, the latter written simply P_i when n = 0.
F2. If A and B are formulas, then so are the expressions

    ¬A    (A & B)    (A ∨ B)    (A → B)

F3. If A is a formula, then so are the expressions

    (∀v_i)A    (∃v_i)A

Notice that by the notational convention in F1, all PL-formulas are also FOL-formulas.

This logic is called first order because quantification is only allowed over individuals; if we add formula formation rules

    (∀P_i^n)A    (∃P_i^n)A

we obtain the formulas of second order logic, SOL.

Consider the simple formula

    (1)  (∃v_2)(¬v_2 = v_1 & P_1^1(v_2)).

Its "translation" into English by the reading of the symbols we have introduced is

    some object other than v_1 has the property P_1^1

which is exactly how we would translate the result of substituting v_3 for v_2 in it,

    (∃v_3)(¬v_3 = v_1 & P_1^1(v_3)).

This is because both occurrences of v_2 in (1) are bound by the quantifier ∃v_2, just as the occurrences of x are bound by the dx in ∫_0^1 x^2 dx and can be replaced by y without changing the meaning of the definite integral. On the other hand, the occurrence of v_1 in (1) is free, because it is not within the scope of any quantifier, and so the interpretation of v_1 clearly affects the meaning of (1).

Using the same simple example, consider the results of substituting f_0^1(v_3) and f_0^1(v_2) for v_1 in (1),

    (∃v_2)(¬v_2 = f_0^1(v_3) & P_1^1(v_2)),
    (∃v_2)(¬v_2 = f_0^1(v_2) & P_1^1(v_2)).

The first of these says of f_0^1(v_3) what (1) says of v_1, but the second says that "something is not a fixed point of f_0^1 and has property P_1^1", which is quite different, evidently because the variable v_2 in f_0^1(v_2) is "caught" by the quantifier ∃v_2. The first is a free substitution (causing no confusion) while the second is not. We will denote the result of substituting the term t for the free occurrences of the variable x in some formula A by

    A{x :≡ t}

and we will tacitly assume that all substitutions are free.

Formulas of FOL are too messy to write down, and so we often resort to "informal descriptions" of them like the example about mothers loving their children above, recipes,
really, from which the full, grammatically correct formula could (in principle) be constructed.

B. First Order Semantics

Whether (1) is true or false depends on the object v_1, on the function f_0^1, on the property P_1^1, and (most significantly) on the range of objects over which we interpret the existential quantifier: where do we search for things which may or may not satisfy P_1^1?

To interpret the formulas of FOL we must be given a domain D and an interpretation ı, a function which assigns an object ı(v_i) in D to each individual variable, an n-ary function ı(f_i^n) on D to each n-ary function symbol f_i^n, and an n-ary relation ı(P_i^n) on D to each P_i^n. Using these, first we extend ı inductively to all terms by

    ı(f_i^n(t_1, ..., t_n)) = (ı(f_i^n))(ı(t_1), ..., ı(t_n)),

so that ı(t) is some object in D. To assign truth values to formulas, define first, for each variable x and d in D, the update

    ı{x := d},

which agrees with ı on all function and relation symbols, and also on all individual variables, except that it assigns d to x. With the help of this basic operation, we can state in Table 3 the classical Tarski truth conditions which determine the truth of formulas relative to a fixed domain D and an interpretation ı.

    ı ⊨ t_1 = t_2               ⇔  ı(t_1) = ı(t_2)
    ı ⊨ P_i^n(t_1, ..., t_n)    ⇔  (ı(P_i^n))(ı(t_1), ..., ı(t_n))
    ı ⊨ ¬A                      ⇔  ı ⊭ A
    ı ⊨ (A & B)                 ⇔  ı ⊨ A and ı ⊨ B
    ı ⊨ (A ∨ B)                 ⇔  ı ⊨ A or ı ⊨ B
    ı ⊨ (A → B)                 ⇔  ı ⊭ A or ı ⊨ B
    ı ⊨ (∀v_i)A                 ⇔  for all d in D, ı{v_i := d} ⊨ A
    ı ⊨ (∃v_i)A                 ⇔  for some d in D, ı{v_i := d} ⊨ A

    Table 3. The Tarski truth conditions.

The truth value of a formula A relative to an interpretation ı is 1 if ı ⊨ A and 0 otherwise, and the Compositionality Principle extends to FOL in a straightforward manner and implies the following basic fact: the truth value of A relative to ı depends only on the values of ı on the function and relation symbols which occur in A, and on the values ı(x) for the individual variables which occur free in A.

The Tarski conditions do nothing more than translate formulas into English, in effect identifying FOL with a precisely formulated, small but very expressive fragment of natural language.

C. Structures

A vocabulary (or signature) is any finite sequence σ = {f_1, ..., f_k, P_1, ..., P_l} of function and relation symbols, and FOL(σ) is the part of FOL whose formulas involve only the function and relation symbols of σ. The idea is to think of the symbols f_1, ..., f_k and P_1, ..., P_l as constants, denoting fixed functions and relations on some set D, and to use the formulas of FOL(σ) to study definability in structures

    M = (D_M, f_1, ..., f_k, P_1, ..., P_l)

of vocabulary σ, where the universe D_M of M is any non-empty set, and the f_1, ..., f_k, P_1, ..., P_l listed in M are functions and relations which can be assigned to the vocabulary symbols, e.g., such that the function f_i is n-ary if the symbol f_i is n-ary. An M-assignment is any function α from the variables to D_M, and it extends naturally to an interpretation α_M by associating each function symbol f_i with the function f_i and each relation symbol P_i with the relation P_i; the standard notation for structure satisfaction is

    M, α ⊨ A ⇔ α_M ⊨ A.

Formulas of FOL(σ) with no free variables are called sentences and (by the Compositionality Principle) they are simply true or false in every σ-structure, without reference to any assignment. They define properties of structures. We write

    M ⊨ A ⇔ for any (and hence all) α, M, α ⊨ A    (A a sentence),

and if M ⊨ A, we say that M satisfies A or is a model of A.

While sentences define properties of structures, formulas with free variables can be used to define relations on structures.
If, for example, A has at most one free variable x, we set

    R_A(d) ⇔ M, α{x := d} ⊨ A,

where α is any assignment, since its only relevant value is updated in this definition. In the same way, formulas with n free variables define n-ary relations on σ-structures, the first order definable relations of M. A function f : D_M^n → D_M is first order definable if its graph

    G_f(x_1, ..., x_n, w) ⇔ w = f(x_1, ..., x_n)

is first order definable. Some examples:

A directed graph is a structure G = (D, E), where E is a binary "edge" relation on the set of "nodes" D, and it is a graph (undirected) if it satisfies the sentence

    (∀x)(∀y)[E(x, y) → E(y, x)].

Complete graphs (cliques) are characterized by the sentence

    (∀x)(∀y)E(x, y),

while "diameter ≤ 2" is defined by

    (∀x)(∀y)[x = y ∨ E(x, y) ∨ (∃z)[E(x, z) & E(z, y)]].

Finite directed and undirected graphs are used to model many notions in computer science, e.g., circuits.

A semigroup (monoid) with identity is a structure (S, e, ·) where the identity e is some specified member of S, · is a binary "multiplication" on S, and the following sentences are true:

    (∀x)(∀y)(∀z)[x · (y · z) = (x · y) · z],
    (∀x)(x · e = x & e · x = x).

Here and in the sequel we write t_1 · t_2 rather than the pedantically correct ·(t_1, t_2).

In addition to semigroups, there are groups, rings, fields and ordered fields, vector spaces, and any number of other structures which are the stuff of "abstract" algebra. These classes of structures are all characterized by first order axioms, and the use of methods from logic is becoming increasingly important in their study.

Two structures M_1 and M_2 are isomorphic if some one-to-one correspondence between their universes carries the functions and relations of M_1 to those of M_2. Isomorphic structures satisfy the same first order sentences, but the converse is not true, as we will see in II-F.

D. Databases

In the most general terms, a database is just a finite structure, typically relational, i.e., without functions, only relations. "Finite" does not mean "small" or "simple", and in the interesting applications databases are huge structures of large and complex vocabularies, with basic relations such as "x is an employee born in year n", "y is the supervisor of x", etc. Properties of structures are usually called queries in database theory, and one of the main tasks in the field is to develop representations for databases which support fast algorithms for updating (entering new information in the base) and data testing (determining the truth or falsity of queries). As it happens, both updating and data testing are very efficient for first order queries, and so database systems, including the industry standard SQL, make heavy use of methods from first order logic.

Motivated by Database Theory, a good deal of research has been done since the 1970s in Finite Model Theory, the mathematical and logical study of finite structures. For a rather surprising, basic result, let

    Prob_σ[M ⊨ A : |D_M| = n] = the proportion of σ-structures of size n which satisfy A,

where structures are counted "up to isomorphism".

The FOL 0-1 Law. For each sentence A of FOL(σ) in a relational vocabulary, either

    lim_{n→∞} Prob_σ[M ⊨ A : |D_M| = n] = 1,

or

    lim_{n→∞} Prob_σ[M ⊨ A : |D_M| = n] = 0,

i.e., either A or ¬A is asymptotically true.
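For finite structures, the Tarski truth conditions can be executed directly, which makes the graph examples and the proportion Prob_σ concrete. The sketch below is illustrative only: the tuple encoding of formulas and the names holds, proportion, SYMMETRIC and DIAMETER_2 are assumptions, not notation from the article. It also counts labeled directed graphs on {0, ..., n−1} rather than structures "up to isomorphism", a simplification made here for brevity.

```python
from itertools import product

def holds(formula, domain, edges, env):
    """Evaluate a formula over the finite structure (domain, E) by the Tarski
    conditions; env maps variable names to elements of the domain."""
    tag = formula[0]
    if tag == "E":
        return (env[formula[1]], env[formula[2]]) in edges
    if tag == "eq":
        return env[formula[1]] == env[formula[2]]
    if tag == "not":
        return not holds(formula[1], domain, edges, env)
    if tag == "and":
        return holds(formula[1], domain, edges, env) and holds(formula[2], domain, edges, env)
    if tag == "or":
        return holds(formula[1], domain, edges, env) or holds(formula[2], domain, edges, env)
    if tag == "implies":
        return (not holds(formula[1], domain, edges, env)) or holds(formula[2], domain, edges, env)
    if tag == "forall":
        _, x, body = formula
        # the update env{x := d}, for every d in the domain
        return all(holds(body, domain, edges, {**env, x: d}) for d in domain)
    if tag == "exists":
        _, x, body = formula
        return any(holds(body, domain, edges, {**env, x: d}) for d in domain)
    raise ValueError(f"unknown formula tag: {tag}")

# (forall x)(forall y)[E(x, y) -> E(y, x)]
SYMMETRIC = ("forall", "x", ("forall", "y",
             ("implies", ("E", "x", "y"), ("E", "y", "x"))))

# (forall x)(forall y)[x = y or E(x, y) or (exists z)[E(x, z) & E(z, y)]]
DIAMETER_2 = ("forall", "x", ("forall", "y",
              ("or", ("eq", "x", "y"),
               ("or", ("E", "x", "y"),
                ("exists", "z", ("and", ("E", "x", "z"), ("E", "z", "y")))))))

def proportion(sentence, n):
    """Fraction of labeled directed graphs on n nodes satisfying the sentence
    (labeled counting is an assumption of this sketch)."""
    domain = range(n)
    pairs = [(a, b) for a in domain for b in domain]
    total = satisfied = 0
    for bits in product((0, 1), repeat=len(pairs)):
        edges = {p for p, bit in zip(pairs, bits) if bit}
        total += 1
        satisfied += holds(sentence, domain, edges, {})
    return satisfied / total

path = {(0, 1), (1, 0), (1, 2), (2, 1)}  # undirected path 0 - 1 - 2
print(holds(SYMMETRIC, range(3), path, {}))
print(holds(DIAMETER_2, range(3), path, {}))
print(proportion(SYMMETRIC, 2), proportion(SYMMETRIC, 3))
```

The enumeration in proportion visits 2^(n·n) graphs, so it is usable only for very small n; it shows the quantity whose limit the 0-1 Law describes and is not a way to compute that limit.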
More advanced work in this area is concerned primarily with the algorithmic analysis of queries on finite structures, especially in logics richer than FOL.

E. Arithmetic

Most basic is the structure of arithmetic

    N = (ℕ, 0, 1, +, ·),

where ℕ = {0, 1, ...} is the set of (non-negative) natural numbers and + and · are the operations of addition and multiplication. The first order definable relations and functions on N are called arithmetical, and they obviously include addition, multiplication and the ordering on ℕ, which is defined by the formula

    x ≤ y ≡ (∃z)[x + z = y].

By a basic Lemma of Gödel, if a function f is determined from arithmetical functions g and h by the equations

    (2)  f(0, x⃗) = g(x⃗)
         f(y + 1, x⃗) = h(f(y, x⃗), y, x⃗),

then f is also arithmetical. Thus exponentiation x^y is arithmetical, with g(x) = 1, h(w, y, x) = w · x, and, with some work, so is the function p(x) which enumerates the prime numbers,

    p(0) = 2, p(1) = 3, p(2) = 5, ....

In fact, the scheme of Primitive Recursion (2) is the basic method by which functions are introduced in number theory, so that, with some work, all fundamental number theoretic relations and functions are arithmetical, and all celebrated theorems and open problems of the theory of numbers are expressed by first order sentences of N. These include the Prime Number Theorem, Fermat's Last (Wiles') Theorem, and the (still open) question whether there exist infinitely many twin pairs of prime numbers.

F. Model Theory

The mathematical theory of structures starts with the following basic result:

Compactness and Skolem-Löwenheim Theorem. If every finite subset of a set of sentences T has a model, then T has a countable model.

For an impressive application, let (in the vocabulary of arithmetic)

    Δ_0 ≡ 0,    Δ_{m+1} ≡ (Δ_m + 1),

so that the numeral Δ_m is about the simplest term which denotes the number m; add a constant c to the language, and let

    T = {A : N ⊨ A} ∪ {Δ_0 ≤ c, Δ_1 ≤ c, Δ_2 ≤ c, ...}.

Every finite subset S of T has a model, namely

    N_S = (ℕ, 0, 1, +, ·, m),

where the object m which interprets c is some number bigger than all the numerals which occur in formulas of S. So T has a countable model

    N_T = (ℕ*, 0*, 1*, +*, ·*, c*),

and then N* = (ℕ*, 0*, 1*, +*, ·*) is a structure for the vocabulary of arithmetic which satisfies all the first order sentences true in the "standard" structure N but is not isomorphic with N, because it has in it some object c* which is "larger" than all the interpretations of the numerals Δ_0, Δ_1, .... It follows that, with all its expressiveness, First Order Logic does not capture the isomorphism type of complex structures such as N.

These non-standard models of arithmetic were constructed by Skolem in the 1930s. Later, in the 1950s, Abraham Robinson constructed by the same methods non-standard models of analysis, and provided firm foundations for the classical Calculus of Leibniz with its infinitesimals and "infinitely large" real numbers.

Model Theory has advanced immensely since the early work of Tarski, Abraham Robinson and Malcev. Especially with the contributions of Shelah in the 1970s and, more recently, Hrushovski, it has become one of the most mathematically sophisticated branches of logic, with substantial applications to algebra and number theory.

G. First Order Inference

The proof system of First Order Logic is an extension of that for Propositional Logic,
first by identity axioms which ensure that = is an equivalence relation and a congruence for all function and relation symbols, e.g., for unary function symbols,

    (∀x)(∀y)[x = y → f(x) = f(y)].

In addition, there are two axioms for the quantifiers,

    A{x :≡ t} → (∃x)A        (∀x)A → A{x :≡ t},

assuming that the term substitutions are free; and there are two new inference rules,

      C → A              A → C
    -----------       -----------
    C → (∀x)A         (∃x)A → C

which can be used only when the variable x is not free in C. Proofs from a set T of FOL(σ) sentences are defined exactly as for PL, and we set again

    T ⊢ A ⇔ there is a proof of A from T.

Notice that without the restriction on the quantifier rules, the sequence

    P(x) → P(x),    P(x) → (∀x)P(x),    (∃x)P(x) → (∀x)P(x)

would be a proof of (∃x)P(x) → (∀x)P(x), which is, obviously, not valid. With the restriction, however, for every structure M, if every M-assignment satisfies the hypothesis of either new rule, then every M-assignment satisfies the conclusion, so that the quantifier inference rules are sound.

H. Gödel's Completeness Theorem

A model of a set of sentences T in FOL(σ) is any structure M which satisfies every A in T, in symbols

    M ⊨ T ⇔ for all A in T, M ⊨ A.

We also write

    T ⊨ A ⇔ for all M, M ⊨ T ⟹ M ⊨ A,

which extends to FOL(σ) the semantic consequence relation of PL. From the comments above:

Soundness Theorem for FOL. If T ⊢ A, then T ⊨ A.

The fundamental fact about First Order Logic is the converse of this result:

Completeness of FOL. If T ⊨ A, then T ⊢ A.

It may be argued that the semantic consequence relation T ⊨ A captures the intuitive notion "A follows from the assumptions in T by logic alone", in the sense that it ensures that A is true whenever all the hypotheses in T are true, independently of the meaning of the function and relation symbols. Granting that and considering the strong expressibility of First Order Logic discussed in II-C above, we may then argue further that the Completeness Theorem answers definitively (for science) the ancient question of what follows from what by logic alone: a proposition A follows from certain assumptions T as a matter of logic (and independently of the facts), if A and T can all be expressed faithfully as FOL(σ) assertions about some σ-structure M, and T ⊢ A. On this view, it is hard to overemphasize the importance of this result for the foundations of mathematics and science.

Incidentally, there is an obvious extension of the Tarski conditions to Second Order Logic, e.g.,

    ı ⊨ (∀P_i^n)A ⇔ for all n-ary P on D, ı{P_i^n := P} ⊨ A.

However, there is no useful Completeness Theorem for SOL, as we will see in IV-F.

I. Proof Theory

If Model Theory is the study of semantics independently of inference, then Proof Theory can be viewed as the mathematical investigation of formal proofs independently of interpretation. This has always been one of the most active research areas of logic, and it has been invigorated in recent years by its substantial applications to computer science, including automated deduction, an important component of artificial intelligence. Key to these applications (and the basic result of Proof Theory) is the Extended Normal Form Theorem of Gentzen, whose somewhat weaker (but simpler) Herbrand version is fairly easy to describe.

There are four Herbrand inference rules, and they apply to n-ary disjunctions

    A_1 ∨ ··· ∨ A_n.

Two of them are structural, and they clearly preserve meaning: you can interchange the order of the disjuncts, or delete one of two occurrences of the same disjunct. The other two are quantifier rules,

    A_1 ∨ ··· ∨ A_n{x :≡ t}         A_1 ∨ ··· ∨ A_n
    -----------------------         -------------------  (∗)
    A_1 ∨ ··· ∨ (∃x)A_n             A_1 ∨ ··· ∨ (∀x)A_n

where the ∗ indicates that the ∀-rule can only be used if the variable x is not free in its conclusion. The result applies only to sentences without identity and in prenex normal form, i.e., looking like

    (Q_1 x_1) ··· (Q_n x_n)B

where each Q_i is ∀ or ∃ and B is quantifier-free.

Herbrand's Theorem. Every provable =-free sentence A of FOL(σ) in prenex form can be derived from a provable quantifier-free disjunction by the four Herbrand rules.

The restriction to prenex sentences is not essential, because every formula can be converted to an equivalent prenex one by the application of simple rules which can be added to the system.

The theorem asserts (in part) that every provable sentence A has a "normal" proof, in which only formulas of "quantifier rank" no greater than A occur. This is a powerful tool for proof-theoretic studies. As for applications, all automated deduction systems use Herbrand-like inference systems (or their Gentzen variants), and the programming language PROLOG is based entirely on this idea.

The proof of Herbrand's Theorem is constructive: an algorithm is defined, which computes for each proof Π of a prenex sentence A a Herbrand proof Π′, and then it is shown by simple, combinatorial arguments that Π′, indeed, proves A. The additional, effective content is significant for the foundational applications of the theorem (for example to consistency proofs), and also in the applications to automated deduction.

It should be emphasized that the simplistic slogans "Model Theory = no inference" and "Proof Theory = no semantics" are often honored in the breach: like the Completeness Theorem, most fundamental results of logic are about connections between truth and proof, and some of the deepest results in one part of the discipline depend on methods and ideas from the other.

III. Gödel's Incompleteness Theorem

Having established that FOL proves all logical truths, it is natural to ask if it can also prove (from some natural set of axioms) all mathematical truths. This is not possible, by Gödel's fundamental result, whose special case for arithmetical truths we discuss in this section.

A. The Incompleteness of Peano Arithmetic

The classical Peano axioms for arithmetic comprise the properties of the successor

    (3)  x + 1 ≠ 0        x + 1 = y + 1 → x = y,

the recursive definitions of addition and multiplication,

    (4)  x + 0 = x
         x + (y + 1) = (x + y) + 1,

    (5)  x · 0 = 0
         x · (y + 1) = x · y + x,

and the Induction Axiom which cannot be expressed fully in First Order Logic. Its Second Order Logic version is

    (∀P)[[P(0) & (∀x)(P(x) → P(x + 1))] → (∀x)P(x)],

and the best we can do in FOL is to adopt the Axiom Scheme

    (6)  [A{y :≡ 0} & (∀x)(A{y :≡ x} → A{y :≡ x + 1})] → (∀x)A{y :≡ x}.

The set PA of (first order) Peano axioms is obtained by taking the correctly spelled versions of all the formulas in (3)–(6) and adding enough universal quantifiers in front of them so that they become sentences. This is a very strong set of axioms: it can prove all simple properties of numbers and most of their deep properties too, although proving a theorem from PA is harder than proving it using, say, methods from analysis, and number theorists distinguish and value "elementary proofs" in PA.

Gödel's First Incompleteness Theorem. There is a sentence g in FOL(0, 1, +, ·), such that N ⊨ g but PA ⊬ g.

One's first thought is that we can overcome this "incompleteness phenomenon" by strengthening PA, perhaps by adding Gödel's own g to it, or by using the Second Order Logic version of the Induction Axiom along with a suitable axiomatization of Second Order Logic. None of this helps: Gödel's fundamental discovery is that first order truth in N (and every other sufficiently rich structure) simply cannot be presented usefully as an "axiomatic theory". We will make this precise in a more general version of the Incompleteness Theorem in the next section.

B. Coding (Gödel numbering)

The basic ingredients of the proof of the Incompleteness Theorem are coding and self-reference.

In analytic geometry we "code" (represent) points in the plane by pairs of real numbers, their coordinates, so we can translate geometrical questions into algebraic problems and solve them by calculation.

… p(i) is the i'th prime number. For example, the (correctly spelled) prime formula

    +(v_1, 0) = v_0

has the horrendously large code

    2^13 · 3^5 · 5^16 · 7^9 · 11^11 · 13^6 · 17^10 · 19^15.

The size of codes is irrelevant: what matters is that every string of symbols (and hence every term, formula and proof) has a code from which it can be reconstructed, by the Unique Factorization Theorem for numbers; and (more significantly) that PA is powerful enough to express and prove simple properties of formulas and proofs, thus translated into properties of numbers. For example, if Δ_n is the numeral denoting n, as above, then PA can prove all true, basic relations among numerals, e.g.,

    m + n = k ⟹ PA ⊢ Δ_m + Δ_n = Δ_k.

Less trivially, the basic (coded) proof relation

    Proof_PA(a, p) ⇔ a is the code of some sentence A and p is the code of a proof of A from PA

is defined by some formula Proof_PA with v_1 and v_2 free, and PA can prove its basic properties, e.g.,

    Proof_PA(a, p) ⟹ PA ⊢ Proof_PA{v_1 :≡ Δ_a, v_2 :≡ Δ_p}.
G¨odel’s basic idea is to code the syntactic Similarly, the relation objects of FOL(0, 1, +, ·)—terms, formulas, D(a,p) ⇔ proofs—by natural numbers, so that their a is the code of some formula A properties are translated into properties of with only v1 free, and p is the numbers, which can then be expressed in code of a PA-proof of PA FOL(0, 1, +, ·) and (perhaps) proved in . A{v :≡ ∆ } Since all syntactic objects are strings of 1 a is defined by some formula D with just v , v symbols, if we view a proof A1,...,An−1 as 1 2 a sequence of formulas separated by com- free. Set mas, it is enough to code strings, and we A ≡ (∀v2)¬D can do this in (at least) one simple minded so that only v1 is free in A, and if a is the way: we enumerate the symbols of the lan- code of A, set guage g ≡ A{v1 :≡ ∆a}. ¬ & ∨ → ( ) ∀ ∃ , =0 1+ · v0 v1 ·· 1 2 3 4 5678910111213141516 ·· Unscrambling the definitions, g asserts that and we set there is no PA-proof of A{v1 :≡ ∆a}; but g is v n0 n1 n2 nm A{ 1 :≡ ∆a}, so that g claims its own [a0a1a2 · · · am] = 2 3 5 · · · p(m) , unprovability; and a careful analysis of the where ni is the code of the symbol ai and situation shows that, indeed, g cannot be 12 YIANNIS N. MOSCHOVAKIS provable in PA, else PA would prove a con- A. Turing machines g tradiction. This also shows, that is true. A Turing machine M is determined by a It is not that simple, of course, and much finite alphabet S = {s ,...,s }, a finite delicate analysis and computation must be M 0 k set QM = {q0,...,qm} of (internal) states, done to establish that D(a,p) is arithmeti- and a finite table of transitions of the form cal and to derive a formal contradiction from the assumption that g is PA-provable. 
Key q,s 7→ q′,s′,m to the proof is the “self-reference” in the ′ ′ definition of D(a,p), which uses the coding, where q, q are states, s,s are in SM or the and the argument depends on the strength special “blank” symbol , and the move m (not the weakness) of the axiomatic system is −1, 0 or +1. No two transitions are acti- PA. Coding and self-reference have become vated by the same pair q,s on the left. We standard tools of logic since G¨odel’s work, imagine that, at any moment, M is in some and they have found substantial applica- internal state q and sits in front of an infinite tions in many areas, including computer sci- “tape” with symbols in some of its cells. The ence and set theory. machine can only “see” the symbol s just in front of it, and does nothing (halts) unless IV. Computability one of its transitions is activated by the pair q,s; in which case it switches to state q′, it It is easy to determine whether an arbi- replaces s by s′ on the tape, and it moves n trary equation a0 +a1x+· · ·+anx = 0 with integer coefficients a0,...,an has integer so- b a b b b lutions, since every integer root must divide 6 =⇒ 6 a0, and so all we have to do is to test the q q′ finitely many divisors of a0. The problem is not so easy for equations in k unknowns Fig. 2. q,a 7→ q′, , −1. r1 r2 rk (7) ar1,...,rn x1 x2 · · · xk = 0, left (if it can), right or none-at-all, depend- r1 ··· rk≤n + X+ ing on whether m is −1, +1 or 0. and it is much more interesting, in fact A machine M starts computing facing the to find an algorithm which deter- leftmost cell, with an arbitrary string input mines whether (7) has a solution u0u1 ··· um−1 v0v1 ··· vn−1 is No. 10 in David Hilbert’s famous 1900 list 6 6 of 23 open problems in mathematics. Dio- q0 qt phantine equations are notoriously difficult to solve, and one might suspect that no al- gorithm would do the job, but how can you u = u0 · · · um−1 on the tape, and it may diverge prove such an assertion? 
Using ideas and (never halt), for example if u = 11 techniques from G¨odel’s work and motivated and M has the two transitions by questions arising from it, logicians devel- q0, 1 7→ q0, 1, +1 q0, 7→ q0, 1, +1 oped in the 30s a tool for establishing abso- lute unsolvability results of this kind which If it halts, then its output on u is the string led to some spectacular applications, includ- M[u] = v0 · · · vn−1 at the left end of the ing a rigorous proof of the unsolvability of tape, until the first blank (and it is possi- Hilbert’s 10th. ble that M[u] is empty.) The most direct approach was by Turing, Finally, M computes a string function ∗ ∗ who reasoned that algorithms should be im- f : S1 → S2 if S1 ∪ S2 ⊆ SM and for every ∗ plemented by “mechanical devices” and in- string u ∈ S1 , M[u] = f(u). By identifying troduced “abstract machines” that can per- each natural number n with the string || · · · | form symbolic computations some ten years of n + 1 tallies from the one-member alpha- before digital computers were invented. bet {|} (unary notation), the notion covers MATHEMATICAL LOGIC 13 functions whose arguments or values are ei- claiming to capture the notion of “com- ther strings or numbers. Moreover, if we putable” from different perspectives, includ- code strings by numbers as above, then the ing Church’s λ-definable functions, Post’s transformation u 7→ [u] and its inverse can canonical systems, the general recursive be computed by a Turing machine, so that a functions of G¨odel, Herbrand and Kleene, string function is Turing computable exactly Kleene’s µ-recursive functions and, in the when its “coded version” is computable, and forties, Markov’s (formal) algorithms; each we can safely confuse the two notions. of these was proved equivalent to Turing computability, and the “simulation tech- B. 
The Church-Turing Thesis niques” developed for these proofs make it seem very unlikely that some algorithm will Turing argued persuasively that the sym- ever be discovered which cannot be simu- bolic computations of any “finite mechanical lated by a Turing machine. device” with access to unbounded memory It should be emphasized, however, that can be simulated by one of his machines, and the Church-Turing Thesis does not provide he has been fully justified by the subsequent a rigorous definition for the notion of algo- developments in computers. Church had al- rithm, which remains informal. Complex- ready made an equivalent (though less well ity results about algorithms are rigorously justified) claim, and so the new fundamental grounded on various so-called computation principle carries both famous names: models which embody diverse features of ac- The Church-Turing Thesis: A string tual computers. When we simulate these ∗ ∗ function f : S1 → S2 is computable if and models by Turing machines, the time and only if it can be computed by a Turing ma- space complexity of computations increase chine M on some alphabet SM ⊇ S1 ∪ S2. substantially, and so we cannot claim that The Church-Turing Thesis cannot be rig- the informal algorithm has been faithfully orously proved, as it identifies the intuitive, modeled. On the other hand, the time com- informal notion of “computability” with the plexity increase is bound by a polynomial precise, mathematical property of Turing factor for all the known simulations, so that computability. Within mathematics, it is the class P of polynomial problems can be officially a definition, much like the defini- defined in terms of Turing machines without tions of arclength or area in terms of in- ambiguity. tegrals. 
But mathematical definitions are Turing-computable functions are also not entirely arbitrary: when we “define” the called recursive, because of the basic G¨o- length of the circumference of a circle of ra- del-Herbrand-Kleene characterization men- dius r by an integral which computes out to tioned above. 2πr, we fully expect that if we draw such a circle and measure its circumference, it will C. Unsolvable Problems turn out to be 2πr, within the margin of er- ∗ ror of our measurements. Similarly, when we A set of strings (or problem) Q ⊆ S from prove that a certain string function f is not a finite “alphabet” S is computable (recur- Turing computable, we fully expect that no- sive, solvable, decidable) if some Turing ma- body will ever discover an algorithm which chine M computes its characteristic func- computes f, because no such algorithm ex- tion ists. This is the standard method of appli- 1 if u ∈ Q, cation of the Thesis. cQ(u) = (0 otherwise, Evidence for the Church-Turing The- sis comes from Turing’s analysis, from the otherwise it is unsolvable or undecidable. sixty-odd years of failed attempts to contra- The definitions apply to problems about dict it, and from the robustness of the no- natural numbers, coded in unary; to prob- tion of Turing computability. Many classes lems about FOL-formulas, by identifying of functions were defined in the thirties (for example) each variable vi by a similar 14 YIANNIS N. MOSCHOVAKIS sequence vv · · · v of i+1 v’s, so that the syn- D. Undecidable Theories FOL tax of is based on a finite vocabulary; A theory T in FOL(σ) is any set of sen- and to relations (sets of n-tuples) on strings tences closed under consequence, or numbers, by thinking of u1,...,un as a single string. T ⊢ A =⇒ A ∈ T. 
Each Turing machine can be represented by a string of 0’s and 1’s which codes its al- The two basic examples are theories of σ- phabet, internal states and transitions, and structures this leads to the first and most basic unsolv- Th(M) = {A | M |= A}, ability result, due to Turing: The Halting problem: It is undecidable and axiomatic theories of the form whether an arbitrary Turing machine M halts on an arbitrary binary string u. T = Th(T0) = {A : T0 ⊢ A}, For the proof, Turing constructed a uni- where T0 is a decidable set of axioms T0. versal machine U which can simulate every The terminology is natural, because we other, i.e., would certainly demand of any “axioma- tization” that it can be decided effectively U[M, u] = M[u], if M is the code ofM. whether an arbitrary sentence is an axiom. Every decidable theory T is axiomatiz- This treatment of programs as data is, of able since Th(T ) = T when T is a theory, course, routine today. but the converse fails, in general, and in par- All unsolvability results are (ultimately) ticular for T0 = ∅ when the vocabulary is not established by reducing the Halting Problem trivial: to them, i.e., showing that if such-and-such Church’s Theorem: If the vocabulary σ a function were computable, then the Halt- includes at least one binary function or re- ing Problem would be solvable. The proofs lation symbol, then it is undecidable for a are often difficult and generally depend on sentence A of FOL(σ) whether ⊢ A. results specific to the field in which the prob- lem arises. A FOL(σ)-theory T is consistent if it does In mathematics, the problems which have not contain a contradiction A & ¬A, and it been proved unsolvable include: is complete if for every sentence A, either A or ¬A is in T . 
It is easy to verify that every Hilbert’s 10th: Whether a given Diophan- consistent, axiomatizable, complete theory is tine equation has integer solutions (Matija- decidable, and we can use this to formulate sevich, following work of Martin Davis, Hi- and prove a very general version of the G¨odel lary Putnam and Julia Robinson). Incompleteness Theorem. The key tool is The Word Problem for Groups: Whether the notion of translation. two words denote the same element in a Suppose T1 and T2 are theories, perhaps finitely generated, finitely presented group in different vocabularies σ1 and σ2—e.g., T1 (P. Novikov, W. Boone). might by Th(PA), and T2 might be some ax- The Homeomorphism Problem for 4- iomatic set theory. A translation of T1 into manifolds: Whether the orientable n-ma- T2 is a computable string function ρ which nifolds represented by two triangulations are assigns a sentence ρ(A) of FOL(σ2) to every homeomorphic, for n ≥ 4 (A. Markov). This sentence A of FOL(σ1) and preserves propo- problem is solvable for 2-manifolds, by their sitional logic and T1-inference, i.e., classical representation as spheres with han- dles, and it is still open for 3-manifolds, T2 ⊢ ρ(¬A) ↔¬ρ(A) pending (among other things) the resolu- T2 ⊢ ρ(A & B) ↔ ρ(A) & ρ(B) tion of the Poincar´eConjecture. T1 ⊢ A =⇒ T2 ⊢ ρ(A). There is also a large number of unsolvable problems in Computer Science. Notice that the identity function ρ(A) = A MATHEMATICAL LOGIC 15 translates every theory into itself. Wilkie’s Theorem, that every set in R which The G¨odel Incompleteness Theorem (Ros- is first order definable using exponentials is ser’s form). If T is a consistent, axiomati- a finite union of intervals. zable theory and Peano arithmetic Th(PA) is translatable into T , then T is undecidable E. The Second Incompleteness Theo- and hence incomplete. 
rem In short, every consistent axiomatic sys- What sorts of true sentences are not prov- tem in which a reasonable amount of mathe- able in sufficiently strong axiomatizable the- matics can be developed is undecidable and ories? If T = Th(T ) is axiomatizable in incomplete. 0 FOL(σ), then the (coded) proof relation To state the strongest corresponding re- sult about theories of structures, we need ProofT (a,p) ⇔ the simple fact that every computable set is a is the code of some sentence arithmetical, essentially due to G¨odel. A in FOL(σ) and p is the code Tarski’s Theorem. If Th(N) is translat- of a proof of A from T able into Th(M), then Th(M) is not arith- is Turing computable, and hence arithmeti- metical, a fortiori it is not decidable. cal. Using this, we can construct a sentence To apply Tarski’s Theorem, you need (in ConcisT in the vocabulary of PA which ex- effect) to give a first order definition of the presses naturally the consistency of T and natural numbers within the given structure. establish the following: One of the first results of this type was the G¨odel’s Second Incompleteness Theorem undecidabilty of the theory of rational num- (Rosser’s form). If T is consistent, axiom- bers Th(Q, 0, 1, +, ·) (Julia Robinson), but atizable and ρ translates Th(PA) in T , then there are many others, and there are also T cannot prove the translation ρ(ConsisT ) of many difficult open problems in this area. its consistency sentence. On the other hand, many interesting the- The theorem makes it clear that we can- ories are decidable, including the following: not axiomatize a substantial part of math- • The theory Th(N, 0, 1, +) of arithmetic ematics in any way whatsoever so that the without multiplication (Presburger). consistency of the system can be established • The theory Th(Q, ≤). This coincides “constructively”: because the (presumably with the theory of every dense, linear or- simple) “constructive methods” we would be dering without endpoints. 
willing to use in a consistency proof should • The theory Th(C, 0, 1, +, ·) of the com- be part of the “substantial part of mathe- plex number field, which coincides with the matics” we want to axiomatize. Beyond its theory of every algebraically closed field of obvious foundational significance, the Sec- characteristic 0 (Tarski, Abraham Robin- ond Incompleteness Theorem has numer- son). ous applications, especially in comparing the • The theory Th(R, 0, 1, +, ·, ≤) of the or- strength of various hypotheses in Axiomatic dered field of real numbers, which coincides Set Theory. with the theory of every real closed field (Tarski). F. Hierarchies The classical result here is Tarski’s decid- 0 ability of the ordered field of real numbers, A set Q of strings or numbers is Σ2 if which (using coordinates) implies that Eu- clidean geometry is decidable, in a sense triv- u ∈ Q ⇔ (∃x1)(∀x2)R(u, x1,x2), ializing much of ancient Greek mathematics! where the quantified variables range over It is still open whether the extended the- y natural numbers and the matrix R is com- ory Th(R, 0, 1, +, ·, ≤, ↑) (with x ↑ y = x putable, and it is Π0 if, for all u for x > 0) is decidable, but there has been 3 substantial progress in this problem with u ∈ Q ⇔ (∀x1)(∃x2)(∀x3)R(u, x1,x2,x3) 16 YIANNIS N. MOSCHOVAKIS with the same restrictions. The definitions b a b b b extend naturally to all k, and we also set 6 6 0 0 0 ′ ∆k =Σk ∩ Πk. q =⇒ q Kleene, who introduced these classes, showed ? ? that 011 10 1 0 1 1 1 1 1 ∆0 = the class of recursive sets, 1 Fig. 3. q,a, 0 7→ q′, , 1, −1, +1 0 0
Σ1 Σ2
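The string coding of Section III-B can be tried out directly. The following Python sketch is only an illustration under stated assumptions: the function names (symbol_code, code_symbol, encode, decode) and the convention of writing the variable v_i as the token "vi" are introduced here and are not part of the text. It computes [a_0 a_1 ··· a_m] = 2^{n_0} 3^{n_1} ··· p(m)^{n_m} from the symbol enumeration of that section, and it recovers the string from its code by repeated division, which is what the Unique Factorization Theorem guarantees is possible.

```python
from itertools import count

# Symbols with codes 1..14, in the order enumerated in Section III-B.
# Assumption of this sketch: the variable v_i is written "vi" and has code 15 + i.
SYMBOLS = ["¬", "&", "∨", "→", "(", ")", "∀", "∃", ",", "=", "0", "1", "+", "·"]


def symbol_code(sym):
    """Return the numerical code of a single symbol."""
    if sym in SYMBOLS:
        return SYMBOLS.index(sym) + 1
    if sym.startswith("v") and sym[1:].isdigit():
        return 15 + int(sym[1:])
    raise ValueError(f"unknown symbol: {sym!r}")


def code_symbol(code):
    """Inverse of symbol_code."""
    if 1 <= code <= len(SYMBOLS):
        return SYMBOLS[code - 1]
    if code >= 15:
        return f"v{code - 15}"
    raise ValueError(f"not a symbol code: {code}")


def primes():
    """Yield 2, 3, 5, ... by trial division (adequate for short strings)."""
    found = []
    for n in count(2):
        if all(n % p for p in found):
            found.append(n)
            yield n


def encode(symbols):
    """Code of the string a_0 ... a_m: the product of p(i) ** code(a_i)."""
    code = 1
    for p, sym in zip(primes(), symbols):
        code *= p ** symbol_code(sym)
    return code


def decode(code):
    """Recover the list of symbols from a code by dividing out successive primes."""
    if code < 1:
        raise ValueError("codes are positive integers")
    symbols = []
    for p in primes():
        if code == 1:
            break
        exponent = 0
        while code % p == 0:
            code //= p
            exponent += 1
        if exponent == 0:
            raise ValueError("not the code of a string")
        symbols.append(code_symbol(exponent))
    return symbols
```

For the prime formula +(v_1, 0) = v_0, written as the token list ["+", "(", "v1", ",", "0", ")", "=", "v0"], encode returns 2^13 · 3^5 · 5^16 · 7^9 · 11^11 · 13^6 · 17^10 · 19^15, and decode applied to that number returns the same token list.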
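The machine model of Section IV-A (transitions q, s ↦ q′, s′, m, a tape that is unbounded to the right, output read off the left end of the tape up to the first blank) is simple enough to simulate. The sketch below is a hedged illustration, not a definitive implementation: representing the transition table as a Python dictionary, using a space character for the blank symbol, the state names, and the max_steps cutoff are all assumptions of this example. The cutoff is needed because a simulator cannot decide in general whether a machine diverges, so it gives up and returns None after a fixed number of steps.

```python
BLANK = " "  # assumed stand-in for the special blank symbol


def run(transitions, tape_input, start="q0", max_steps=10_000):
    """Simulate a Turing machine given as {(q, s): (q2, s2, move)}.

    Returns the output string if the machine halts within max_steps steps,
    and None otherwise (the machine may or may not diverge in that case).
    """
    tape = dict(enumerate(tape_input))  # cell index -> symbol
    state, pos = start, 0               # the machine starts facing the leftmost cell
    for _ in range(max_steps):
        seen = tape.get(pos, BLANK)
        if (state, seen) not in transitions:
            # No transition is activated: the machine halts.
            # The output is the string at the left end, up to the first blank.
            out = []
            i = 0
            while tape.get(i, BLANK) != BLANK:
                out.append(tape[i])
                i += 1
            return "".join(out)
        state, written, move = transitions[(state, seen)]
        tape[pos] = written
        pos = max(0, pos + move)        # move left only "if it can"
    return None


# Successor in unary notation (n is written as n + 1 tallies):
# run right over the tallies, write one more tally on the first blank, halt.
SUCCESSOR = {
    ("q0", "|"): ("q0", "|", +1),
    ("q0", BLANK): ("q1", "|", 0),
}

# The divergent machine from the text: it keeps writing 1s and moving right.
LOOP = {
    ("q0", "1"): ("q0", "1", +1),
    ("q0", BLANK): ("q0", "1", +1),
}
```

With these tables, run(SUCCESSOR, "|||") returns "||||" (the number 2 is mapped to 3), while run(LOOP, "11", max_steps=100) returns None because no halting configuration is reached within the step limit.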
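The one-variable case mentioned at the start of Section IV (every integer root of a_0 + a_1 x + ··· + a_n x^n must divide a_0, so it suffices to test the finitely many divisors of a_0) can be written out as a short procedure. The Python sketch below is an illustration under assumptions: the function name integer_roots and the coefficient-list convention [a_0, ..., a_n] are chosen here, and the treatment of the case a_0 = 0 (where 0 is a root and a factor of x is divided out before applying the divisor test) is an addition of this sketch, since the argument in the text presupposes a_0 ≠ 0.

```python
def integer_roots(coeffs):
    """Integer roots of a_0 + a_1 x + ... + a_n x^n, for coeffs = [a_0, ..., a_n]."""
    coeffs = list(coeffs)
    if not any(coeffs):
        raise ValueError("the zero polynomial has every integer as a root")

    roots = set()
    # If a_0 = 0 then 0 is a root; divide by x until the constant term is nonzero.
    while coeffs[0] == 0:
        roots.add(0)
        coeffs = coeffs[1:]
    a0 = abs(coeffs[0])

    def value(x):
        """Evaluate the (reduced) polynomial at x by Horner's rule."""
        total = 0
        for a in reversed(coeffs):
            total = total * x + a
        return total

    # Every nonzero integer root divides a_0, so only its divisors are tested.
    for d in range(1, a0 + 1):
        if a0 % d == 0:
            for candidate in (d, -d):
                if value(candidate) == 0:
                    roots.add(candidate)
    return sorted(roots)
```

For example, integer_roots([2, -3, 1]) tests the candidates ±1 and ±2 for x^2 − 3x + 2 and returns [1, 2], while integer_roots([1, 0, 1]) returns the empty list for x^2 + 1. No comparably simple search works for equation (7) in several unknowns, which is the point of the discussion that follows in the text.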