MATHEMATICAL LOGIC

YIANNIS N. MOSCHOVAKIS

I. Propositional Logic, PL.
II. First Order Logic, FOL.
III. Gödel's Incompleteness Theorem.
IV. Computability.
V. Recursion and Programming.
VI. Alternative Logics.
VII. Set Theory.

Glossary

Church-Turing Thesis: Claim that every computable function can be computed by a Turing machine.
Computability theory: Study of computable functions on the natural numbers.
Continuum hypothesis: Conjecture that there are only two sizes of infinite sets of real numbers.
Database: Finite, typically relational structure.
First order logic: Mathematical model of the part of language built up from the propositional connectives and the quantifiers.
Incompleteness phenomenon: Gödel's discovery, that sufficiently strong axiomatic theories cannot decide all propositions which they can express.
Model theory: Study of formal definability in first order structures.
Paradox: Counterintuitive truth.
Peano arithmetic: Axiomatic theory of natural numbers.
Proof theory: Study of inference in formal systems independently of their interpretation.
Propositional connectives: The linguistic constructs “and”, “not”, “or” and “implies”.
Quantifiers: The linguistic constructs “there exists” and “for all”.
Turing machine: Mathematical model of computing device with unbounded memory.
Unsolvable problem: A problem whose solution requires a non-existent algorithm.

Narrowly construed, mathematical logic is the study of definition and inference in mathematical models of fragments of language, especially the first order logic fragment. Logic has made critical contributions to the foundations of science, especially through the work of Kurt Gödel, and it also has numerous applications. For set theory and theoretical computer science, these applications are so important that parts of these fields are normally included in the modern, broad conception of the discipline.

I. Propositional Logic, PL

Each logic L has a syntax which delineates the grammatically correct linguistic expressions of L, a semantics which assigns meaning to the correct expressions, and a structured system of proofs which specifies the rules by which some L-expressions can be inferred from others.

There are other words to describe these things: formal language is sometimes used to describe a plain syntax, formal system often identifies a syntax together with an inference system (but without an interpretation), and abstract logic has been used to refer to a syntax together with an interpretation, leaving inference aside. It is, however, a fundamental feature of logic that it draws clean distinctions and studies the connections among these three aspects of language. We explain them first in the simplest example of the “logic of propositions”, which is part of many important logics.

A. Propositional Syntax

The symbols of PL are the connectives

    ¬ (not)    & (and)    ∨ (or)    → (implies, if-then)

the two parentheses ‘(’, ‘)’, and an infinite list of (formal) propositional variables P_0, P_1, P_2, ... which intuitively stand for declarative propositions, things like ‘John loves Mary’ or ‘3 is a prime number’. It has only one category of grammatically correct expressions, the formulas, which are strings

(finite sequences) of symbols defined inductively by the following conditions:

F1. Each P_i is a formula.
F2. If A and B are formulas, then so are the expressions

    ¬A    (A & B)    (A ∨ B)    (A → B)

For example, if P and Q are propositional variables, then (P → Q) and (P ∨ ¬P) are formulas, which we read as “if P then Q” and “either P or not P”.

The inductive definition gives a precise specification of exactly which strings of symbols are formulas, and also ensures that each formula is either prime, i.e., just a variable P_i, or it can be constructed in exactly one way from its simpler immediate parts, by one of the connectives. This makes it possible to prove properties of formulas and to define operations on them by structural induction on their definition.

More propositional connectives can be introduced as “abbreviations” of formula combinations, e.g.,

    A ↔ B ≡ ((A → B) & (B → A))
    A ∨ B ∨ C ≡ (A ∨ (B ∨ C)).

B. Propositional Semantics

If B stands for some true proposition, then ¬B is false, independently of the “meaning” or internal structure of B. This is an instance of a general Compositionality Principle for PL: The truth value of a formula depends only on the truth values of its immediate parts. The semantics of PL comprise the rules for computing truth values, and they can be summarized in Table 1, where 1 stands for ‘truth’ and 0 for ‘falsity’.

    A   B   ¬A   (A & B)   (A ∨ B)   (A → B)
    1   1   0    1         1         1
    1   0   0    0         1         0
    0   1   1    0         1         1
    0   0   1    0         0         1

    Table 1. Truth value semantics.

By the first line of this table, for example, if A and B are both true, then ¬A is false while (A & B), (A ∨ B) and (A → B) are all true. Notice that if A is false, then (A → B) is reckoned to be true no matter what the truth value of B, so that ‘if the moon is made of cheese, then 1 + 1 = 5’ is true (on the plausible assumption that the moon is not made of cheese). This material implication assumed by Propositional Logic has been attacked as counterintuitive, but it agrees with mathematical practice and it is the only useful interpretation of implication which accords with the Compositionality Principle.

Using these rules, we can construct for each formula A a truth table which tabulates its truth value under all assignments of truth values to the variables. For example, the truth table for (Q → P) consists of the first three columns of Table 2, while the first two and the last column give the truth table for (P → (Q → P)).

    P   Q   (Q → P)   (P → (Q → P))
    1   1   1         1
    1   0   1         1
    0   0   1         1
    0   1   0         1

    Table 2.

If n variables occur in a formula A, then the truth table for A has 2^n rows and determines an n-ary bit function v_A, with arguments and values in the two-element set {1, 0}. By the Definitional Completeness Theorem, every n-ary bit function is v_A for some A, so that the formulas of PL provide definitions (or “symbolic representations”) for all bit functions.

A formula A is a semantic consequence of a set of formulas T (or T-valid) if every assignment to the variables which satisfies (makes true) all the formulas in T also satisfies A. We write

    T |= A ⇔ A is T-valid,

and |= A in the important special case when T is empty, in which case A is called a tautology. A formula A is satisfiable if some assignment satisfies it, i.e., if ¬A is not a tautology. Let

    A ∼ B ⇔ {A} |= B and {B} |= A
          ⇔ |= A ↔ B,

and call A and B equivalent if A ∼ B.
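The truth-table semantics above can be sketched in a few lines of Python. This is only an illustration: the representation of formulas as nested tuples and the function names (`value`, `truth_table`, `is_tautology`, `is_satisfiable`, `equivalent`) are assumptions made for this sketch, not notation from the article.

```python
from itertools import product

# Formulas as nested tuples (a representation assumed for this sketch):
#   "P"                  a propositional variable
#   ("not", A)           ¬A
#   ("and", A, B)        (A & B)
#   ("or", A, B)         (A ∨ B)
#   ("implies", A, B)    (A → B)

def variables(formula):
    """Return the set of variable names occurring in the formula."""
    if isinstance(formula, str):
        return {formula}
    return set().union(*(variables(part) for part in formula[1:]))

def value(formula, assignment):
    """Compute the truth value (1 or 0) by the rules of Table 1."""
    if isinstance(formula, str):
        return assignment[formula]
    op, *parts = formula
    vals = [value(part, assignment) for part in parts]
    if op == "not":
        return 1 - vals[0]
    if op == "and":
        return min(vals)
    if op == "or":
        return max(vals)
    if op == "implies":
        return max(1 - vals[0], vals[1])
    raise ValueError(f"unknown connective: {op}")

def truth_table(formula):
    """List (row, value) pairs for all 2^n assignments, variables sorted by name."""
    names = sorted(variables(formula))
    rows = []
    for bits in product((1, 0), repeat=len(names)):
        rows.append((bits, value(formula, dict(zip(names, bits)))))
    return rows

def is_tautology(formula):
    return all(v == 1 for _, v in truth_table(formula))

def is_satisfiable(formula):
    return any(v == 1 for _, v in truth_table(formula))

def equivalent(a, b):
    """A ∼ B exactly when A ↔ B is a tautology."""
    return is_tautology(("and", ("implies", a, b), ("implies", b, a)))
```

For instance, `is_tautology(("implies", "P", ("implies", "Q", "P")))` checks the last column of Table 2, and `equivalent(("implies", "A", "B"), ("or", ("not", "A"), "B"))` checks that implication can be expressed with ¬ and ∨. The search is exhaustive over all 2^n rows, so it is practical only for formulas with few variables.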

Equivalent formulas define the same bit function, and they can be substituted for each other without changing truth values. Clearly

    (A → B) ∼ (¬A ∨ B),

so that the implication connective is superfluous. In fact, every formula is equivalent to one in disjunctive normal form, i.e., a disjunction A_1 ∨ ··· ∨ A_k where each A_i is a conjunction of variables or negations of variables (literals).

C. Applications to Circuits

Each formula A with n variables can be realized by a switching circuit C(A) with n inputs and one output, so that C(P_i) consists of just one input-output edge, C(A & B) is constructed by joining C(A) and C(B) with an and-gate, etc. Figure 1 exhibits the circuit for ((P_1 & P_2) → P_3) using the equivalent formula without implications, so that only ¬-, &- and ∨-gates are required.

    [Fig. 1. The circuit for (¬(P_1 & P_2) ∨ P_3).]

These are restricted circuits, of fan-in (maximum number of edges into a node) 2 and fan-out 1, but the Definitional Completeness Theorem implies that every n-ary bit function can be computed by some formula circuit C(A).

There are basically two useful measures of circuit complexity, and both of them are faithfully mirrored in formulas. The number of gates of C(A) is exactly the number of connectives in A and measures size complexity (construction cost), while the depth of C(A), which measures the time complexity of computation, is exactly the rank of A, defined inductively so that rk(P_i) = 1, rk(A & B) = max(rk(A), rk(B)) + 1 and similarly for the other connectives. One can now use natural manipulations of formulas to construct circuits which compute a given bit function with minimum size or time complexity, or to establish optimality results for the computation of bit functions by appealing to the formula representations of the circuits which realize them. For example, using disjunctive normal forms, one sees immediately that (if we do not care about cost), every n-ary bit function can be computed by an unbounded fan-in circuit in no more than 3 time units. There is, in general, a substantial trade-off between the size and time complexity of the circuits which compute a given bit function.

D. The Satisfiability Problem

The assertion that “C(A) and C(B) never give the same output on the same inputs” means precisely that “(A ↔ ¬B) is a tautology”, so that to detect that A and B do not have this safety property we need to determine whether the formula ¬(A ↔ ¬B) is satisfiable.

Because of such natural formulations of “error detection” for circuits relative to given specifications, it is very important to find efficient algorithms for determining whether a given formula is satisfiable. The problem is of non-deterministically polynomial time complexity (NP), because it can be resolved by guessing (“non-deterministically”) some assignment and then verifying that it satisfies A in a number of steps which is bounded by a polynomial in the length of A; and it is NP-complete, i.e., every NP-problem can be “reduced” to it by a polynomial reduction. This is a basic result of S. Cook, who introduced the complexity class NP, showed that it contains a large number of important problems, and asked if it coincides with the (seemingly) smaller class P of “feasible”, deterministically polynomial time problems. The question whether P = NP is the fundamental open problem of complexity theory; it amounts simply to the question whether the satisfiability problem can be solved by a deterministic, polynomial algorithm.

E. Propositional Inference

A proof of a formula A from a set of hypotheses T is any finite sequence

    A_0, A_1, ..., A_{n−1}, A

which ends with A, and such that each A_i is either in T, or a PL-axiom, or follows from previously listed formulas by a rule of inference. To make this notion precise we need to specify a set of PL-axioms and rules of inference; and for these to be useful, it should be that they are few and easy to understand, and that the formulas provable from T are exactly the T-valid formulas.

We need just one, binary inference rule:

    A    (A → B)
    -------------  (Modus Ponens)
          B

This is sound, i.e., {A, (A → B)} |= B, so that if A and (A → B) are both T-valid, then so is B.

An axiom is any instance of the following axiom schemes, where A, B and C are arbitrary formulas and we have omitted several parentheses which pedantry would require:

    (1) A → (B → A)
    (2) (A → B) → ((A → (B → C)) → (A → C))
    (3) A → (B → (A & B))
    (4) (A & B) → A        (4′) (A & B) → B
    (5) A → (A ∨ B)        (5′) B → (A ∨ B)
    (6) (A → C) → ((B → C) → ((A ∨ B) → C))
    (7) (A → B) → ((A → ¬B) → ¬A)
    (8) ¬¬A → A

These are all tautologies, and so every formula provable from T is T-valid. We write

    T ⊢ A ⇔ there is a proof of A from T,

and it is not hard now to establish the basic

Soundness and Completeness Theorem for PL. For all sets T and any A,

    T |= A ⇔ T ⊢ A.

F. Boolean Algebras

A Boolean algebra is a set B with at least two, distinct elements 0 and 1, a unary complementation operation ′, and binary infimum ∩ and supremum ∪ operations such that certain properties hold. The standard example is the set P(M) of all subsets of some non-empty set M, with 0 = ∅, 1 = M and the usual complementation, intersection and union operations, which for a singleton M gives the two-element set {1, 0} of truth values; but there are others, e.g., the set of all finite and co-finite subsets of some infinite set, the set of all “closed and open” subsets of a topological space, etc.

Each formula A with n variables defines an n-ary function on every Boolean algebra B, simply by letting the propositional variables range over B and replacing ¬, &, ∨ and → by ′, ∩, ∪ and ⇒ respectively, where

    x ⇒ y = x′ ∪ y

on B. Now the axioms for a Boolean algebra ensure that every propositional axiom defines a function with constant value 1 (in fact the particular choice of axiomatization for Boolean algebras, and there are many, is quite irrelevant as long as this fact obtains); and then the Completeness Theorem implies that two formulas A and B define the same n-ary operation on all Boolean algebras exactly when A ∼ B, i.e., when A and B define the same bit function.

Boolean algebras have many important applications in mathematics (to measure theory, among other things), and they are the subject of the classical Stone Representation Theorem which identifies them all (up to isomorphism) with subalgebras of powerset algebras. In logic they are mostly used through the “non-standard” Boolean semantics of this subsection, which extend to richer logics and provide a powerful tool for independence (unprovability) results.

II. First Order Logic, FOL

Consider the claim:

    If everybody has a mother, and every mother loves her children, then everybody is loved by somebody.

It is certainly true, it has the “linguistic form” of many similar (more substantial) claims in mathematics, and it appears to be true by virtue of its form and not because of any special properties of the words “mother”, “love”, etc. First Order Logic makes it possible to express complex assertions of this type and to show that they are true by logic alone. The symbolic expression of this one will be

    [(∀x)(∃y)M(x, y) & (∀x)(∀y)[M(x, y) → L(y, x)]] → (∀x)(∃y)L(y, x),

give-or-take a few parentheses and brackets which will be required to make the syntax completely precise.

A. First Order Syntax

The symbols of FOL are the propositional connectives, the parentheses, the quantifiers

    ∀ (for all)    ∃ (there exists)

the comma ‘,’, the identity symbol ‘=’, an infinite list v_0, v_1, ... of individual variables which will denote arbitrary objects in some domain, and for each n = 0, 1, ..., two infinite lists of function and relation symbols

    f^n_0, f^n_1, ...,    P^n_0, P^n_1, ...,

which will stand for n-ary functions and relations on the objects.

There are two categories of grammatically correct expressions in FOL, terms and formulas, defined recursively by the following conditions.

T1. Each variable v_i is a term.
T2. If t_1, ..., t_n are terms, then (the string) f^n_i(t_1, ..., t_n) is also a term. When n = 0, we write simply f_i.
F1. If t_1, ..., t_n are terms, then the expressions

    t_1 = t_2    P^n_i(t_1, ..., t_n)

are formulas, the latter written simply P_i when n = 0.
F2. If A and B are formulas, then so are the expressions

    ¬A    (A & B)    (A ∨ B)    (A → B)

F3. If A is a formula, then so are the expressions

    (∀v_i)A    (∃v_i)A

Notice that by the notational convention in F1, all PL-formulas are also FOL-formulas.

This logic is called first order because quantification is only allowed over individuals; if we add formula formation rules

    (∀P^n_i)A    (∃P^n_i)A

we obtain the formulas of second order logic, SOL.

Consider the simple formula

    (1) (∃v_2)(¬ v_2 = v_1 & P^1_1(v_2)).

Its “translation” into English by the reading of the symbols we have introduced is

    some object other than v_1 has the property P^1_1

which is exactly how we would translate the result of substituting v_3 for v_2 in it,

    (∃v_3)(¬ v_3 = v_1 & P^1_1(v_3)).

This is because both occurrences of v_2 in (1) are bound by the quantifier ∃v_2, just as the occurrences of x are bound by the dx in ∫_0^1 x^2 dx and can be replaced by y without changing the meaning of the definite integral. On the other hand, the occurrence of v_1 in (1) is free, because it is not within the scope of any quantifier, and so the interpretation of v_1 clearly affects the meaning of (1).

Using the same simple example, consider the results of substituting f^1_0(v_3) and f^1_0(v_2) for v_1 in (1),

    (∃v_2)(¬ v_2 = f^1_0(v_3) & P^1_1(v_2)),
    (∃v_2)(¬ v_2 = f^1_0(v_2) & P^1_1(v_2)).

The first of these says of f^1_0(v_3) what (1) says of v_1, but the second says that “something is not a fixed point of f^1_0 and has property P^1_1”, which is quite different, evidently because the variable v_2 in f^1_0(v_2) is “caught” by the quantifier ∃v_2. The first is a free substitution (causing no confusion) while the second is not. We will denote the result of substituting the term t for the free occurrences of the variable x in some formula A by

    A{x :≡ t}

and we will tacitly assume that all substitutions are free.

Formulas of FOL are too messy to write down, and so we often resort to “informal descriptions” of them like the example about mothers loving their children above, recipes,

really, from which the full, grammatically correct formula could (in principle) be constructed.

B. First Order Semantics

Whether (1) is true or false depends on the object v_1, on the function f^1_0, on the property P^1_1, and (most significantly) on the range of objects over which we interpret the existential quantifier: where do we search for things which may or may not satisfy P^1_1?

To interpret the formulas of FOL we must be given a domain D and an interpretation ı, a function which assigns an object ı(v_i) in D to each individual variable, an n-ary function ı(f^n_i) on D to each n-ary function symbol f^n_i, and an n-ary relation ı(P^n_i) on D to each P^n_i. Using these, first we extend ı inductively to all terms by

    ı(f^n_i(t_1, ..., t_n)) = (ı(f^n_i))(ı(t_1), ..., ı(t_n)),

so that ı(t) is some object in D. To assign truth values to formulas, define first, for each variable x and d in D, the update

    ȷ = ı{x := d},

which agrees with ı on all function and relation symbols, and also on all individual variables, except that ȷ(x) = d. With the help of this basic operation, we can state in Table 3 the classical Tarski truth conditions which determine the truth of formulas relative to a fixed domain D and an interpretation ı.

    ı |= t_1 = t_2              ⇔  ı(t_1) = ı(t_2)
    ı |= P^n_i(t_1, ..., t_n)   ⇔  (ı(P^n_i))(ı(t_1), ..., ı(t_n))
    ı |= ¬A                     ⇔  ı ⊭ A
    ı |= (A & B)                ⇔  ı |= A and ı |= B
    ı |= (A ∨ B)                ⇔  ı |= A or ı |= B
    ı |= (A → B)                ⇔  ı ⊭ A or ı |= B
    ı |= (∀v_i)A                ⇔  for all d in D, ı{v_i := d} |= A
    ı |= (∃v_i)A                ⇔  for some d in D, ı{v_i := d} |= A

    Table 3. The Tarski truth conditions.

The truth value of a formula A relative to an interpretation ı is 1 if ı |= A and 0 otherwise, and the Compositionality Principle extends to FOL in a straightforward manner and implies the following basic fact: the truth value of A relative to ı depends only on the values of ı on the function and relation symbols which occur in A, and on the values ı(x) for the individual variables which occur free in A. The Tarski conditions do nothing more than translate formulas into English, in effect identifying FOL with a precisely formulated, small but very expressive fragment of natural language.

C. Structures

A vocabulary (or signature) is any finite sequence σ = {f_1, ..., f_k, P_1, ..., P_l} of function and relation symbols, and FOL(σ) is the part of FOL whose formulas involve only the function and relation symbols of σ. The idea is to think of f_1, ..., f_k and P_1, ..., P_l as constants, denoting fixed functions and relations on some set D, and to use the formulas of FOL(σ) to study definability in structures

    M = (D_M, f_1, ..., f_k, P_1, ..., P_l)

of vocabulary σ, where the universe D_M of M is any non-empty set, and the f_1, ..., f_k, P_1, ..., P_l listed in M are functions and relations on D_M which can be assigned to the correspondingly named vocabulary symbols, e.g., such that the function interpreting the symbol f_i is n-ary if the symbol f_i is n-ary. An M-assignment is any function α from the variables to D_M, and it extends naturally to an interpretation α_M by associating each symbol f_i with the function f_i and each symbol P_i with the relation P_i; the standard notation for structure satisfaction is

    M, α |= A ⇔ α_M |= A.

Formulas of FOL(σ) with no free variables are called sentences and (by the Compositionality Principle) they are simply true or false in every σ-structure, without reference to any assignment. They define properties of structures. We write

    M |= A ⇔ for any (and hence all) α, M, α |= A    (A a sentence),

and if M |= A, we say that M satisfies A or is a model of A.

While sentences define properties of structures, formulas with free variables can be used to define relations on structures. If, for example, A has at most one free variable x, we set

    R_A(d) ⇔ M, α{x := d} |= A,

where α is any assignment, since its only relevant value is updated in this definition. In the same way, formulas with n free variables define n-ary relations on σ-structures, the first order definable relations of M. A function f : D_M^n → D_M is first order definable if its graph

    G_f(x_1, ..., x_n, w) ⇔ w = f(x_1, ..., x_n)

is first order definable. Some examples:

A directed graph is a structure G = (D, E), where E is a binary “edge” relation on the set of “nodes” D, and it is a graph (undirected) if it satisfies the sentence

    (∀x)(∀y)[E(x, y) → E(y, x)].

Complete graphs (cliques) are characterized by the sentence

    (∀x)(∀y)E(x, y),

while “diameter ≤ 2” is defined by

    (∀x)(∀y)[x = y ∨ E(x, y) ∨ (∃z)[E(x, z) & E(z, y)]].

Finite directed and undirected graphs are used to model many notions in computer science, e.g., circuits.

A semigroup (monoid) with identity is a structure (S, e, ·) where the identity e is some specified member of S, · is a binary “multiplication” on S, and the following sentences are true:

    (∀x)(∀y)(∀z)[x · (y · z) = (x · y) · z],
    (∀x)(x · e = x & e · x = x).

Here and in the sequel we write t_1 · t_2 rather than the pedantically correct ·(t_1, t_2).

In addition to semigroups, there are groups, rings, fields and ordered fields, vector spaces, and any number of other structures which are the stuff of “abstract” algebra. These classes of structures are all characterized by first order axioms, and the use of methods from logic is becoming increasingly important in their study.

Two structures M_1 and M_2 are isomorphic if some one-to-one correspondence between their universes carries the functions and relations of M_1 to those of M_2. Isomorphic structures satisfy the same first order sentences, but the converse is not true, as we will see in II-F.

D. Databases

In the most general terms, a database is just a finite structure, typically relational, i.e., without functions, only relations. “Finite” does not mean “small” or “simple”, and in the interesting applications databases are huge structures of large and complex vocabularies, with basic relations such as “x is an employee born in year n”, “y is the supervisor of x”, etc. Properties of structures are usually called queries in database theory, and one of the main tasks in the field is to develop representations for databases which support fast algorithms for updating (entering new information in the base) and data testing (determining the truth or falsity of queries). As it happens, both updating and data testing are very efficient for first order queries, and so database systems, including the industry standard SQL, make heavy use of methods from first order logic.

Motivated by Database Theory, a good deal of research has been done since the 1970s in Finite Model Theory, the mathematical and logical study of finite structures. For a rather surprising, basic result, let

    Prob_σ[M |= A : |D_M| = n] = the proportion of σ-structures of size n which satisfy A,

where structures are counted “up to isomorphism”.

The FOL 0-1 Law. For each sentence A of FOL(σ) in a relational vocabulary, either

    lim_{n→∞} Prob_σ[M |= A : |D_M| = n] = 1,

or

    lim_{n→∞} Prob_σ[M |= A : |D_M| = n] = 0,

i.e., either A or ¬A is asymptotically true.
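Data testing for the graph sentences of II-C can be sketched directly, because over a finite universe the Tarski conditions turn ∀ into `all` and ∃ into `any`. The following Python is only an illustration: the function names and the representation of a directed graph as a universe `D` plus a set `E` of ordered pairs are assumptions of the sketch. The helper `proportion` enumerates labelled graphs on {0, ..., n−1}, which is a simplification of the count “up to isomorphism” used in the definition of Prob_σ.

```python
from itertools import product

def is_undirected(D, E):
    # (∀x)(∀y)[E(x, y) → E(y, x)]
    return all((x, y) not in E or (y, x) in E for x, y in product(D, repeat=2))

def is_complete(D, E):
    # (∀x)(∀y)E(x, y)
    return all((x, y) in E for x, y in product(D, repeat=2))

def has_diameter_at_most_2(D, E):
    # (∀x)(∀y)[x = y ∨ E(x, y) ∨ (∃z)[E(x, z) & E(z, y)]]
    return all(
        x == y or (x, y) in E or any((x, z) in E and (z, y) in E for z in D)
        for x, y in product(D, repeat=2)
    )

def proportion(sentence, n):
    """Fraction of labelled directed graphs on n nodes satisfying the sentence.

    Assumption of this sketch: graphs are counted as labelled structures,
    not up to isomorphism. There are 2^(n*n) of them, so keep n tiny.
    """
    D = range(n)
    pairs = list(product(D, repeat=2))
    total = satisfied = 0
    for bits in product((0, 1), repeat=len(pairs)):
        E = {pair for pair, bit in zip(pairs, bits) if bit}
        total += 1
        satisfied += sentence(D, E)
    return satisfied / total
```

For example, with `D = {0, 1, 2}` and `E = {(0, 1), (1, 0), (1, 2), (2, 1)}` (an undirected path), `is_undirected` and `has_diameter_at_most_2` hold while `is_complete` fails. Each check takes time polynomial in the size of the structure, with the exponent given by the number of nested quantifiers.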

More advanced work in this area is concerned primarily with the algorithmic analysis of queries on finite structures, especially in logics richer than FOL.

E. Arithmetic

Most basic is the structure of arithmetic

    N = (ℕ, 0, 1, +, ·),

where ℕ = {0, 1, ...} is the set of (non-negative) natural numbers and + and · are the operations of addition and multiplication. The first order definable relations and functions on N are called arithmetical, and they obviously include addition, multiplication and the ordering on ℕ, which is defined by the formula

    x ≤ y ≡ (∃z)[x + z = y].

By a basic Lemma of Gödel, if a function f is determined from arithmetical functions g and h by the equations

    (2)  f(0, x⃗) = g(x⃗)
         f(y + 1, x⃗) = h(f(y, x⃗), y, x⃗),

then f is also arithmetical. Thus exponentiation x^y is arithmetical, with g(x) = 1, h(w, y, x) = w · x, and, with some work, so is the function p(x) which enumerates the prime numbers,

    p(0) = 2, p(1) = 3, p(2) = 5, ....

In fact, the scheme of Primitive Recursion (2) is the basic method by which functions are introduced in number theory, so that, with some work, all fundamental number theoretic relations and functions are arithmetical, and all celebrated theorems and open problems of the theory of numbers are expressed by first order sentences of N. These include the Prime Number Theorem, Fermat's Last (Wiles') Theorem, and the (still open) question whether there exist infinitely many twin pairs of prime numbers.

F. Model Theory

The mathematical theory of structures starts with the following basic result:

Compactness and Skolem-Löwenheim Theorem. If every finite subset of a set of sentences T has a model, then T has a countable model.

For an impressive application, let (in the vocabulary of arithmetic)

    ∆_0 ≡ 0,    ∆_{m+1} ≡ (∆_m + 1),

so that the numeral ∆_m is about the simplest term which denotes the number m, add a constant c to the language, and let

    T = {A : N |= A} ∪ {∆_0 ≤ c, ∆_1 ≤ c, ∆_2 ≤ c, ...}.

Every finite subset S of T has a model, namely

    N_S = (ℕ, 0, 1, +, ·, m),

where the object m which interprets c is some number bigger than all the numerals which occur in formulas of S. So T has a countable model

    N*_T = (ℕ*, 0*, 1*, +*, ·*, c*),

and then N* = (ℕ*, 0*, 1*, +*, ·*) is a structure for the vocabulary of arithmetic which satisfies all the first order sentences true in the “standard” structure N but is not isomorphic with N, because it has in it some object c* which is “larger” than all the interpretations of the numerals ∆_0, ∆_1, .... It follows that, with all its expressiveness, First Order Logic does not capture the isomorphism type of complex structures such as N.

These non-standard models of arithmetic were constructed by Skolem in the 1930s. Later, in the 1950s, Abraham Robinson constructed by the same methods non-standard models of analysis, and provided firm foundations for the classical Calculus of Leibniz with its infinitesimals and “infinitely large” real numbers.

Model Theory has advanced immensely since the early work of Tarski, Abraham Robinson and Malcev. Especially with the contributions of Shelah in the 1970s and, more recently, Hrushovski, it has become one of the most mathematically sophisticated branches of logic, with substantial applications to algebra and number theory.

G. First Order Inference

The proof system of First Order Logic is an extension of that for Propositional Logic,

first by identity axioms which ensure that = is an equivalence relation and a congruence for all function and relation symbols, e.g., for unary function symbols,

    (∀x)(∀y)[x = y → f(x) = f(y)].

In addition, there are two axioms for the quantifiers,

    A{x :≡ t} → (∃x)A        (∀x)A → A{x :≡ t},

assuming that the term substitutions are free; and there are two new inference rules,

       C → A                A → C
    -------------        -------------
    C → (∀x)A            (∃x)A → C

which can be used only when the variable x is not free in C. Proofs from a set T of FOL(σ) sentences are defined exactly as for PL, and we set again

    T ⊢ A ⇔ there is a proof of A from T.

Notice that without the restriction on the quantifier rules, the sequence

    P(x) → P(x),    P(x) → (∀x)P(x),    (∃x)P(x) → (∀x)P(x)

would be a proof of (∃x)P(x) → (∀x)P(x), which is, obviously, not valid. With the restriction, however, for every structure M, if every M-assignment satisfies the hypothesis of either new rule, then every M-assignment satisfies the conclusion, so that the quantifier inference rules are sound.

H. Gödel's Completeness Theorem

A model of a set of sentences T in FOL(σ) is any structure M which satisfies every A in T, in symbols

    M |= T ⇔ for all A in T, M |= A.

We also write

    T |= A ⇔ for all M, M |= T ⟹ M |= A,

which extends to FOL(σ) the semantic consequence relation of PL. From the comments above:

Soundness Theorem for FOL. If T ⊢ A, then T |= A.

The fundamental fact about First Order Logic is the converse of this result:

Completeness of FOL. If T |= A, then T ⊢ A.

It may be argued that the semantic consequence relation T |= A captures the intuitive notion A follows from the assumptions in T by logic alone, in the sense that it ensures that A is true whenever all the hypotheses in T are true, independently of the meaning of the function and relation symbols. Granting that and considering the strong expressibility of First Order Logic discussed in II-C above, we may then argue further that the Completeness Theorem answers definitively (for science) the ancient question of what follows from what by logic alone: a proposition A follows from certain assumptions T as a matter of logic (and independently of the facts), if A and T can all be expressed faithfully as FOL(σ) assertions about some σ-structure M, and T ⊢ A. On this view, it is hard to overemphasize the importance of this result for the foundations of mathematics and science.

Incidentally, there is an obvious extension of the Tarski conditions to Second Order Logic, e.g.,

    ı |= (∀P^n_i)A ⇔ for all n-ary P on D, ı{P^n_i := P} |= A.

However, there is no useful Completeness Theorem for SOL, as we will see in IV-F.

I. Proof Theory

If Model Theory is the study of semantics independently of inference, then Proof Theory can be viewed as the mathematical investigation of formal proofs independently of interpretation. This has always been one of the most active research areas of logic, and it has been invigorated in recent years by its substantial applications to computer science, including automated deduction, an important component of artificial intelligence. Key to these applications, and the basic result of Proof Theory, is the Extended Normal Form Theorem of Gentzen, whose somewhat weaker (but simpler) Herbrand version is fairly easy to describe.

There are four Herbrand inference rules, and they apply to n-ary disjunctions

    A_1 ∨ ··· ∨ A_n.

Two of them are structural, and they clearly preserve meaning: you can interchange the order of the disjuncts, or delete one of two occurrences of the same disjunct. The other two are quantifier rules,

    A_1 ∨ ··· ∨ A_n{x :≡ t}          A_1 ∨ ··· ∨ A_n
    -----------------------          --------------------  (∗)
    A_1 ∨ ··· ∨ (∃x)A_n              A_1 ∨ ··· ∨ (∀x)A_n

where the ∗ indicates that the ∀-rule can only be used if the variable x is not free in its conclusion. The result applies only to sentences without identity and in prenex normal form, i.e., looking like

    (Q_1 x_1) ··· (Q_n x_n)B

where each Q_i is ∀ or ∃ and B is quantifier-free.

Herbrand's Theorem. Every provable =-free sentence A of FOL(σ) in prenex form can be derived from a provable quantifier-free disjunction by the four Herbrand rules.

The restriction to prenex sentences is not essential, because every formula can be converted to an equivalent prenex one by the application of simple rules which can be added to the system.

The theorem asserts (in part) that every provable sentence A has a “normal” proof, in which only formulas of “quantifier rank” no greater than A occur. This is a powerful tool for proof-theoretic studies. As for applications, all automated deduction systems use Herbrand-like inference systems (or their Gentzen variants), and the programming language PROLOG is based entirely on this idea.

The proof of Herbrand's Theorem is constructive: an algorithm is defined, which computes for each proof Π of a prenex sentence A a Herbrand proof Π′, and then it is shown by simple, combinatorial arguments that Π′, indeed, proves A. The additional, effective content is significant for the foundational applications of the theorem (for example to consistency proofs), and also in the applications to automated deduction.

It should be emphasized that the simplistic slogans “Model Theory = no inference” and “Proof Theory = no semantics” are often honored in the breach: like the Completeness Theorem, most fundamental results of logic are about connections between truth and proof, and some of the deepest results in one part of the discipline depend on methods and ideas from the other.

III. Gödel's Incompleteness Theorem

Having established that FOL proves all logical truths, it is natural to ask if it can also prove (from some natural set of axioms) all mathematical truths. This is not possible, by Gödel's fundamental result, whose special case for arithmetical truths we discuss in this section.

A. The Incompleteness of Peano Arithmetic

The classical Peano axioms for arithmetic comprise the properties of the successor

    (3)  x + 1 ≠ 0
         x + 1 = y + 1 → x = y,

the recursive definitions of addition and multiplication,

    (4)  x + 0 = x
         x + (y + 1) = (x + y) + 1,

    (5)  x · 0 = 0
         x · (y + 1) = x · y + x,

and the Induction Axiom which cannot be expressed fully in First Order Logic. Its Second Order Logic version is

    (∀P)[[P(0) & (∀x)(P(x) → P(x + 1))] → (∀x)P(x)],

and the best we can do in FOL is to adopt the Axiom Scheme

    (6)  [A{y :≡ 0} & (∀x)(A{y :≡ x} → A{y :≡ x + 1})] → (∀x)A{y :≡ x}.

The set PA of (first order) Peano axioms is obtained by taking the correctly spelled versions of all the formulas in (3)–(6) and adding enough universal quantifiers in front of them so that they become sentences. This is a very strong set of axioms: it can prove all simple properties of numbers and most of their deep properties too, although proving a theorem from PA is harder than proving it using, say, methods from analysis, and number theorists distinguish and value “elementary proofs” in PA.

Gödel's First Incompleteness Theorem. There is a sentence g in FOL(0, 1, +, ·), such that N |= g but PA ⊬ g.

One's first thought is that we can overcome this “incompleteness phenomenon” by strengthening PA, perhaps add Gödel's own g to it, or use the Second Order Logic version of the Induction Axiom along with a suitable axiomatization of Second Order Logic. None of this helps: Gödel's fundamental discovery is that first order truth in N (and every other sufficiently rich structure) simply cannot be presented usefully as an “axiomatic theory”. We will make this precise in a more general version of the Incompleteness Theorem in the next section.

B. Coding (Gödel numbering)

The basic ingredients of the proof of the Incompleteness Theorem are coding and self-reference.

In analytic geometry we “code” (represent) points in the plane by pairs of real numbers, their coordinates, so we can translate geometrical questions into algebraic problems and solve them by calculation.

... p(i) is the i'th prime number. For example, the (correctly spelled) prime formula

    +(v_1, 0) = v_0

has the horrendously large code

    2^13 · 3^5 · 5^16 · 7^9 · 11^11 · 13^6 · 17^10 · 19^15.

The size of codes is irrelevant: what matters is that every string of symbols (and hence every term, formula and proof) has a code from which it can be reconstructed, by the Unique Factorization Theorem for numbers; and (more significantly) that PA is powerful enough to express and prove simple properties of formulas and proofs, thus translated into properties of numbers. For example, if ∆_n is the numeral denoting n, as above, then PA can prove all true, basic relations among numerals, e.g.,

    m + n = k ⟹ PA ⊢ ∆_m + ∆_n = ∆_k.

Less trivially, the basic (coded) proof relation

    Proof_PA(a, p) ⇔ a is the code of some sentence A and p is the code of a proof of A from PA

is defined by some formula, also written Proof_PA, with v_1 and v_2 free, and PA can prove its basic properties, e.g.,

    Proof_PA(a, p) ⟹ PA ⊢ Proof_PA{v_1 :≡ ∆_a, v_2 :≡ ∆_p}.
G¨odel’s basic idea is to code the syntactic Similarly, the relation objects of FOL(0, 1, +, ·)—terms, formulas, D(a,p) ⇔ proofs—by natural numbers, so that their a is the code of some formula A properties are translated into properties of with only v1 free, and p is the numbers, which can then be expressed in code of a PA-proof of PA FOL(0, 1, +, ·) and (perhaps) proved in . A{v :≡ ∆ } Since all syntactic objects are strings of 1 a is deﬁned by some formula D with just v , v symbols, if we view a proof A1,...,An−1 as 1 2 a sequence of formulas separated by com- free. Set mas, it is enough to code strings, and we A ≡ (∀v2)¬D can do this in (at least) one simple minded so that only v1 is free in A, and if a is the way: we enumerate the symbols of the lan- code of A, set guage g ≡ A{v1 :≡ ∆a}. ¬ & ∨ → ( ) ∀ ∃ , =0 1+ · v0 v1 ·· 1 2 3 4 5678910111213141516 ·· Unscrambling the deﬁnitions, g asserts that and we set there is no PA-proof of A{v1 :≡ ∆a}; but g is v n0 n1 n2 nm A{ 1 :≡ ∆a}, so that g claims its own [a0a1a2 · · · am] = 2 3 5 · · · p(m) , unprovability; and a careful analysis of the where ni is the code of the symbol ai and situation shows that, indeed, g cannot be 12 YIANNIS N. MOSCHOVAKIS provable in PA, else PA would prove a con- A. Turing machines g tradiction. This also shows, that is true. A Turing machine M is determined by a It is not that simple, of course, and much ﬁnite alphabet S = {s ,...,s }, a ﬁnite delicate analysis and computation must be M 0 k set QM = {q0,...,qm} of (internal) states, done to establish that D(a,p) is arithmeti- and a ﬁnite table of transitions of the form cal and to derive a formal contradiction from the assumption that g is PA-provable. 
Key q,s 7→ q′,s′,m to the proof is the “self-reference” in the ′ ′ deﬁnition of D(a,p), which uses the coding, where q, q are states, s,s are in SM or the and the argument depends on the strength special “blank” symbol , and the move m (not the weakness) of the axiomatic system is −1, 0 or +1. No two transitions are acti- PA. Coding and self-reference have become vated by the same pair q,s on the left. We standard tools of logic since G¨odel’s work, imagine that, at any moment, M is in some and they have found substantial applica- internal state q and sits in front of an inﬁnite tions in many areas, including computer sci- “tape” with symbols in some of its cells. The ence and set theory. machine can only “see” the symbol s just in front of it, and does nothing (halts) unless IV. Computability one of its transitions is activated by the pair q,s; in which case it switches to state q′, it It is easy to determine whether an arbi- replaces s by s′ on the tape, and it moves n trary equation a0 +a1x+· · ·+anx = 0 with integer coeﬃcients a0,...,an has integer so- b a b b b lutions, since every integer root must divide 6 =⇒ 6 a0, and so all we have to do is to test the q q′ ﬁnitely many divisors of a0. The problem is not so easy for equations in k unknowns Fig. 2. q,a 7→ q′, , −1. r1 r2 rk (7) ar1,...,rn x1 x2 · · · xk = 0, left (if it can), right or none-at-all, depend- r1 ··· rk≤n + X+ ing on whether m is −1, +1 or 0. and it is much more interesting, in fact A machine M starts computing facing the to ﬁnd an algorithm which deter- leftmost cell, with an arbitrary string input mines whether (7) has a solution u0u1 ··· um−1 v0v1 ··· vn−1 is No. 10 in David Hilbert’s famous 1900 list 6 6 of 23 open problems in mathematics. Dio- q0 qt phantine equations are notoriously diﬃcult to solve, and one might suspect that no al- gorithm would do the job, but how can you u = u0 · · · um−1 on the tape, and it may diverge prove such an assertion? 
Using ideas and (never halt), for example if u = 11 techniques from G¨odel’s work and motivated and M has the two transitions by questions arising from it, logicians devel- q0, 1 7→ q0, 1, +1 q0, 7→ q0, 1, +1 oped in the 30s a tool for establishing abso- lute unsolvability results of this kind which If it halts, then its output on u is the string led to some spectacular applications, includ- M[u] = v0 · · · vn−1 at the left end of the ing a rigorous proof of the unsolvability of tape, until the ﬁrst blank (and it is possi- Hilbert’s 10th. ble that M[u] is empty.) The most direct approach was by Turing, Finally, M computes a string function ∗ ∗ who reasoned that algorithms should be im- f : S1 → S2 if S1 ∪ S2 ⊆ SM and for every ∗ plemented by “mechanical devices” and in- string u ∈ S1 , M[u] = f(u). By identifying troduced “abstract machines” that can per- each natural number n with the string || · · · | form symbolic computations some ten years of n + 1 tallies from the one-member alpha- before digital computers were invented. bet {|} (unary notation), the notion covers MATHEMATICAL LOGIC 13 functions whose arguments or values are ei- claiming to capture the notion of “com- ther strings or numbers. Moreover, if we putable” from diﬀerent perspectives, includ- code strings by numbers as above, then the ing Church’s λ-deﬁnable functions, Post’s transformation u 7→ [u] and its inverse can canonical systems, the general recursive be computed by a Turing machine, so that a functions of G¨odel, Herbrand and Kleene, string function is Turing computable exactly Kleene’s µ-recursive functions and, in the when its “coded version” is computable, and forties, Markov’s (formal) algorithms; each we can safely confuse the two notions. of these was proved equivalent to Turing computability, and the “simulation tech- B. 
The Church-Turing Thesis niques” developed for these proofs make it seem very unlikely that some algorithm will Turing argued persuasively that the sym- ever be discovered which cannot be simu- bolic computations of any “ﬁnite mechanical lated by a Turing machine. device” with access to unbounded memory It should be emphasized, however, that can be simulated by one of his machines, and the Church-Turing Thesis does not provide he has been fully justiﬁed by the subsequent a rigorous deﬁnition for the notion of algo- developments in computers. Church had al- rithm, which remains informal. Complex- ready made an equivalent (though less well ity results about algorithms are rigorously justiﬁed) claim, and so the new fundamental grounded on various so-called computation principle carries both famous names: models which embody diverse features of ac- The Church-Turing Thesis: A string tual computers. When we simulate these ∗ ∗ function f : S1 → S2 is computable if and models by Turing machines, the time and only if it can be computed by a Turing ma- space complexity of computations increase chine M on some alphabet SM ⊇ S1 ∪ S2. substantially, and so we cannot claim that The Church-Turing Thesis cannot be rig- the informal algorithm has been faithfully orously proved, as it identiﬁes the intuitive, modeled. On the other hand, the time com- informal notion of “computability” with the plexity increase is bound by a polynomial precise, mathematical property of Turing factor for all the known simulations, so that computability. Within mathematics, it is the class P of polynomial problems can be oﬃcially a deﬁnition, much like the deﬁni- deﬁned in terms of Turing machines without tions of arclength or area in terms of in- ambiguity. tegrals. 
But mathematical deﬁnitions are Turing-computable functions are also not entirely arbitrary: when we “deﬁne” the called recursive, because of the basic G¨o- length of the circumference of a circle of ra- del-Herbrand-Kleene characterization men- dius r by an integral which computes out to tioned above. 2πr, we fully expect that if we draw such a circle and measure its circumference, it will C. Unsolvable Problems turn out to be 2πr, within the margin of er- ∗ ror of our measurements. Similarly, when we A set of strings (or problem) Q ⊆ S from prove that a certain string function f is not a ﬁnite “alphabet” S is computable (recur- Turing computable, we fully expect that no- sive, solvable, decidable) if some Turing ma- body will ever discover an algorithm which chine M computes its characteristic func- computes f, because no such algorithm ex- tion ists. This is the standard method of appli- 1 if u ∈ Q, cation of the Thesis. cQ(u) = (0 otherwise, Evidence for the Church-Turing The- sis comes from Turing’s analysis, from the otherwise it is unsolvable or undecidable. sixty-odd years of failed attempts to contra- The deﬁnitions apply to problems about dict it, and from the robustness of the no- natural numbers, coded in unary; to prob- tion of Turing computability. Many classes lems about FOL-formulas, by identifying of functions were deﬁned in the thirties (for example) each variable vi by a similar 14 YIANNIS N. MOSCHOVAKIS sequence vv · · · v of i+1 v’s, so that the syn- D. Undecidable Theories FOL tax of is based on a ﬁnite vocabulary; A theory T in FOL(σ) is any set of sen- and to relations (sets of n-tuples) on strings tences closed under consequence, or numbers, by thinking of u1,...,un as a single string. T ⊢ A =⇒ A ∈ T. 
Each Turing machine can be represented by a string of 0’s and 1’s which codes its al- The two basic examples are theories of σ- phabet, internal states and transitions, and structures this leads to the ﬁrst and most basic unsolv- Th(M) = {A | M |= A}, ability result, due to Turing: The Halting problem: It is undecidable and axiomatic theories of the form whether an arbitrary Turing machine M halts on an arbitrary binary string u. T = Th(T0) = {A : T0 ⊢ A}, For the proof, Turing constructed a uni- where T0 is a decidable set of axioms T0. versal machine U which can simulate every The terminology is natural, because we other, i.e., would certainly demand of any “axioma- tization” that it can be decided eﬀectively U[M, u] = M[u], if M is the code ofM. whether an arbitrary sentence is an axiom. Every decidable theory T is axiomatiz- This treatment of programs as data is, of able since Th(T ) = T when T is a theory, course, routine today. but the converse fails, in general, and in par- All unsolvability results are (ultimately) ticular for T0 = ∅ when the vocabulary is not established by reducing the Halting Problem trivial: to them, i.e., showing that if such-and-such Church’s Theorem: If the vocabulary σ a function were computable, then the Halt- includes at least one binary function or re- ing Problem would be solvable. The proofs lation symbol, then it is undecidable for a are often diﬃcult and generally depend on sentence A of FOL(σ) whether ⊢ A. results speciﬁc to the ﬁeld in which the prob- lem arises. A FOL(σ)-theory T is consistent if it does In mathematics, the problems which have not contain a contradiction A & ¬A, and it been proved unsolvable include: is complete if for every sentence A, either A or ¬A is in T . 
It is easy to verify that every Hilbert’s 10th: Whether a given Diophan- consistent, axiomatizable, complete theory is tine equation has integer solutions (Matija- decidable, and we can use this to formulate sevich, following work of Martin Davis, Hi- and prove a very general version of the G¨odel lary Putnam and Julia Robinson). Incompleteness Theorem. The key tool is The Word Problem for Groups: Whether the notion of translation. two words denote the same element in a Suppose T1 and T2 are theories, perhaps ﬁnitely generated, ﬁnitely presented group in diﬀerent vocabularies σ1 and σ2—e.g., T1 (P. Novikov, W. Boone). might by Th(PA), and T2 might be some ax- The Homeomorphism Problem for 4- iomatic set theory. A translation of T1 into manifolds: Whether the orientable n-ma- T2 is a computable string function ρ which nifolds represented by two triangulations are assigns a sentence ρ(A) of FOL(σ2) to every homeomorphic, for n ≥ 4 (A. Markov). This sentence A of FOL(σ1) and preserves propo- problem is solvable for 2-manifolds, by their sitional logic and T1-inference, i.e., classical representation as spheres with han- dles, and it is still open for 3-manifolds, T2 ⊢ ρ(¬A) ↔¬ρ(A) pending (among other things) the resolu- T2 ⊢ ρ(A & B) ↔ ρ(A) & ρ(B) tion of the Poincar´eConjecture. T1 ⊢ A =⇒ T2 ⊢ ρ(A). There is also a large number of unsolvable problems in Computer Science. Notice that the identity function ρ(A) = A MATHEMATICAL LOGIC 15 translates every theory into itself. Wilkie’s Theorem, that every set in R which The G¨odel Incompleteness Theorem (Ros- is ﬁrst order deﬁnable using exponentials is ser’s form). If T is a consistent, axiomati- a ﬁnite union of intervals. zable theory and Peano arithmetic Th(PA) is translatable into T , then T is undecidable E. The Second Incompleteness Theo- and hence incomplete. 
rem In short, every consistent axiomatic sys- What sorts of true sentences are not prov- tem in which a reasonable amount of mathe- able in suﬃciently strong axiomatizable the- matics can be developed is undecidable and ories? If T = Th(T ) is axiomatizable in incomplete. 0 FOL(σ), then the (coded) proof relation To state the strongest corresponding re- sult about theories of structures, we need ProofT (a,p) ⇔ the simple fact that every computable set is a is the code of some sentence arithmetical, essentially due to G¨odel. A in FOL(σ) and p is the code Tarski’s Theorem. If Th(N) is translat- of a proof of A from T able into Th(M), then Th(M) is not arith- is Turing computable, and hence arithmeti- metical, a fortiori it is not decidable. cal. Using this, we can construct a sentence To apply Tarski’s Theorem, you need (in ConcisT in the vocabulary of PA which ex- eﬀect) to give a ﬁrst order deﬁnition of the presses naturally the consistency of T and natural numbers within the given structure. establish the following: One of the ﬁrst results of this type was the G¨odel’s Second Incompleteness Theorem undecidabilty of the theory of rational num- (Rosser’s form). If T is consistent, axiom- bers Th(Q, 0, 1, +, ·) (Julia Robinson), but atizable and ρ translates Th(PA) in T , then there are many others, and there are also T cannot prove the translation ρ(ConsisT ) of many diﬃcult open problems in this area. its consistency sentence. On the other hand, many interesting the- The theorem makes it clear that we can- ories are decidable, including the following: not axiomatize a substantial part of math- • The theory Th(N, 0, 1, +) of arithmetic ematics in any way whatsoever so that the without multiplication (Presburger). consistency of the system can be established • The theory Th(Q, ≤). This coincides “constructively”: because the (presumably with the theory of every dense, linear or- simple) “constructive methods” we would be dering without endpoints. 
willing to use in a consistency proof should • The theory Th(C, 0, 1, +, ·) of the com- be part of the “substantial part of mathe- plex number ﬁeld, which coincides with the matics” we want to axiomatize. Beyond its theory of every algebraically closed ﬁeld of obvious foundational signiﬁcance, the Sec- characteristic 0 (Tarski, Abraham Robin- ond Incompleteness Theorem has numer- son). ous applications, especially in comparing the • The theory Th(R, 0, 1, +, ·, ≤) of the or- strength of various hypotheses in Axiomatic dered ﬁeld of real numbers, which coincides Set Theory. with the theory of every real closed ﬁeld (Tarski). F. Hierarchies The classical result here is Tarski’s decid- 0 ability of the ordered ﬁeld of real numbers, A set Q of strings or numbers is Σ2 if which (using coordinates) implies that Eu- clidean geometry is decidable, in a sense triv- u ∈ Q ⇔ (∃x1)(∀x2)R(u, x1,x2), ializing much of ancient Greek mathematics! where the quantiﬁed variables range over It is still open whether the extended the- y natural numbers and the matrix R is com- ory Th(R, 0, 1, +, ·, ≤, ↑) (with x ↑ y = x putable, and it is Π0 if, for all u for x > 0) is decidable, but there has been 3 substantial progress in this problem with u ∈ Q ⇔ (∀x1)(∃x2)(∀x3)R(u, x1,x2,x3) 16 YIANNIS N. MOSCHOVAKIS with the same restrictions. The deﬁnitions b a b b b extend naturally to all k, and we also set 6 6 0 0 0 ′ ∆k =Σk ∩ Πk. q =⇒ q Kleene, who introduced these classes, showed ? ? that 011 10 1 0 1 1 1 1 1 ∆0 = the class of recursive sets, 1 Fig. 3. q,a, 0 7→ q′, , 1, −1, +1 0 0

              Σ⁰₁           Σ⁰₂
    Δ⁰₁              Δ⁰₂              ···
              Π⁰₁           Π⁰₂

and that a non-empty set Q is Σ⁰₁ exactly when it is recursively (or computably) enumerable, i.e., if

    Q = {f(0), f(1), ...}

with some recursive f : N → S*. Moreover, these classes increase properly and exhaust the arithmetical sets. A similar hierarchy

    Σ¹ₖ, Π¹ₖ, Δ¹ₖ

for the analytical (second-order definable) sets is constructed by allowing the quantified variables to range over the unary functions α : N → N and the matrix to be arithmetical, so that all arithmetical sets are in Δ¹₁.

These hierarchies classify the analytical sets of natural numbers and strings by the logical complexity of their (simplest) definitions, and they are powerful tools in the theory of definability. For example,

    every axiomatizable theory is Σ⁰₁.

This rules out an axiomatization of Second Order Logic SOL, whose set of valid sentences (on the empty vocabulary) is not analytical. Somewhat surprisingly, it also rules out an axiomatization of the theory

    T_f = {A | for all finite (D, E), (D, E) ⊨ A}

of finite graphs, which is Π⁰₁ but not Σ⁰₁ (Trachtenbrot).

G. Turing reducibility

Imagine a Turing machine with a second query tape which it handles exactly like its primary tape, implementing somewhat more complex transitions of the form

    q, s1, s2 ↦ q′, s1′, s2′, m1, m2

It also has a special query state q?, and when it goes into q?, the computation stops and does not resume until some external agent (the oracle) replaces the contents on the query tape by some string.

A string function f is computable relative to some given g if it can be computed by such an oracle machine, provided each time q? is reached, the string u on the query tape is replaced by the value g(u). We let

    f ≤_T g ⇔ f is computable in g,

and we extend this notion of Turing reducibility to sets of natural numbers via their characteristic functions.

It is not hard to show that there exist Turing-incomparable sets of numbers (Kleene-Post). In fact, there exist Turing-incomparable recursively enumerable sets, but this was quite hard to prove and it was a celebrated open question for some twelve years, known as Post's Problem. The simultaneous, independent discovery in 1956 by Friedberg and Muchnik of the priority method which proved it initiated an intense study of Turing reducibility which is still, today, one of the most active research areas of logic, the largest (and technically most sophisticated) part of computability or recursion theory.

V. Recursion and Programming

In its most general form, a recursive definition of a function x is expressed by a recursive (or fixed point) equation

    (8)  x(t) = f(t, x),

where the functional f(t, x) provides a method for computing each value x(t), perhaps using (“calling”) other values of x in the process. It is possible to characterize the computable functions on the natural numbers using simple recursive equations of this form, generalizations of the primitive recursive definition (2) in III-E. Though conceptually less direct than Turing's approach through idealized machines, this modeling of computability by “recursiveness” provides a powerful tool for establishing properties of computable functions, and it is especially useful in the theory of programming languages.

A. Recursive equations

Not every recursive equation (8) has a solution x, and some have many, e.g., the trivial x(t) = x(t) which is satisfied by every function. The basic result which guarantees canonical solutions to a large class of recursive equations comes from the theory of partially ordered sets.

A partially ordered set or poset is a structure (D, ≤_D), where ≤_D is a binary relation and for all x, y, z in D,

    x ≤_D x,    [x ≤_D y & y ≤_D z] ⟹ x ≤_D z,
    [x ≤_D y & y ≤_D x] ⟹ x = y;

a subset C of D is a chain if every two members of C are ≤_D-comparable, i.e., x ≤_D y or y ≤_D x; and a poset D is complete if every chain in D has a supremum (least upper bound).

Every complete poset has a least element ⊥ (the supremum of the empty chain!), and every set A can be turned into a flat poset A_⊥ by adding a “bottom” below all its otherwise incomparable elements.

[Fig. 4. Flat poset: the elements 0, 1, 2, ... side by side above ⊥.]

Other, basic examples include the set of all subsets of a set A (under ⊆) and the set of all (finite and infinite) sequences from some set, under “extension”. The Cartesian product of complete posets is complete, and, more importantly, if W is complete, then the function spaces of all arbitrary, monotone or Scott-continuous mappings π : D → W are also complete, with the pointwise partial ordering

    π ≤ ρ ⇔ for all x, π(x) ≤ ρ(x).

Here π : D → W is monotone if

    x ≤_D y ⟹ π(x) ≤_W π(y),

and it is Scott-continuous if, in addition, for every chain C in D,

    π(supremum(C)) = supremum(π[C]).

The Least-Fixed-Point Theorem. If (D, ≤) is a complete poset and π : D → D is monotone, then the recursive equation

    (9)  x = π(x)

has a least solution.

The theorem is proved by setting recursively

    (10)  x0 = ⊥,    xn+1 = π(xn).

In the simplest case, which is sufficient for the applications to programming languages, the mapping π is Scott-continuous, and then

    x = supremum{x0, x1, ...}

is the least fixed point of π. For the full result we need to extend the iteration (10) into the “transfinite”, using recursion on ordinal numbers.

There is a rich theory of complete posets and various kinds of mappings on them, mostly motivated by the applications to programming, but also by earlier work in abstract recursion, the generalization of computability to abstract structures.

B. Programming Languages

From the mathematical point of view, a programming language P is very much like a logic, with a syntax, a semantics, and an implementation, which plays the role of an inference system.

The syntax is generally much more complex than that of logics, with many different categories of grammatically correct expressions. There are variables of various kinds, some of them for functions of specified types; constants which are meant to denote acts of interaction with the environment (input, output, interrupts); and various ways of combining grammatically correct expressions to produce new ones, using programming constructs like composition, “while loops”, functional abstraction and recursion. Some closed expressions (with no free variables) corresponding to the “sentences” of a logic are singled out, typically called programs. With all this complexity, the “grammar” is still specified by an induction, as it is for logics, so that it is again possible to prove properties of correct expressions and to define operations on them by structural induction.

In the denotational semantics introduced by Dana Scott, a programming language P is interpreted in a structure (D, –) whose universe D is a complete poset, the domain. The points of D may include concrete data (words from some finite alphabet), but also functions of various sorts and complex mathematical structures which model computations, interactions, etc. For each correct expression A and each assignment α to the variables, the denotation [[A]](α) is a point in D, determined by a structural induction of the following general form: first a (Scott-continuous) recursive equation (9) is constructed from α and the denotations of the parts of A, and then we take

    [[A]](α) = the least fixed point of [x = π(x)].

The use of recursive equations is absolutely essential here, to interpret the iteration and recursive constructs which are at the heart of programming languages.

The implementation is a function which assigns to each program A a “machine” M_A (or, more concretely, code in the machine language of some processor) which computes the denotation [[A]] of A. In the simplest case, [[A]] might just be a sequence of external acts, like “printing” some file or drawing some picture on a monitor; more often [[A]] is a function relating input to output, or a “strategy” in some game, by which the machine responds to a sequence of external stimuli. As with inference systems, implementations come in a great variety of shapes and forms (compilers and interpreters, to name two), but they must have the basic soundness property, that M_A “computes” [[A]] in a well understood way which relates the abstract (mathematical) denotations of programs to the behavior of machines.

Even with this grossly oversimplified description, it should be clear that the basic methodology of logic (the clean distinction between syntax, semantics and inference) has had an immense influence on the development of programming languages; and that the fundamental, related notions of symbolic computation and recursion introduced by logicians in the 30s are essential to the understanding of programming languages.

In the other direction, the study of programming languages, spurred by the need for applications, has introduced a host of interesting problems in logic, chief among them the question of logic of programs: what are the natural formal languages and inference systems in which the fundamental properties of programs can be expressed and rigorously proved? Much work has been done on this, but it is fair to say that the question is still open, and a formidable challenge to logicians and computer scientists.

VI. Alternative Logics

From the many alternative logics which are obtained by changing the syntax, semantics or inference system of First Order Logic, we consider, very briefly, just two.

A. Modal and Temporal Logic

Modal Logic goes back to Aristotle, the traditional founder of logic, who took necessity as one of the basic linguistic constructs worthy of logical study. The modern syntax is obtained by adding to FOL the propositional box operator □, so that with each formula A we have the formula □A (with the same free variables), read necessarily A. The possibility operator is defined by the abbreviation ◇A ≡ ¬□¬A.

Modal formulas are interpreted in Kripke structures

    M = (W, s0, {M_s | s ∈ W}, R),

of a specified vocabulary σ, where W is some set of possible worlds; s0 is a specified “actual world”; each M_s is a σ-structure associated with the world s; and R(s, t) is an accessibility relation on the worlds, intuitively standing for “t is a possible alternative to s”. There are no fixed, general assumptions about the accessibility relation or the interpretations of the given relations on the various worlds; it could be, for example, that “Mary is John's wife” in the actual world s0, but in alternative possible worlds John's wife might be Ellen, John may not have a wife, or he may not even exist. Assignments associate objects in the possible worlds to individual variables, and the basic, semantic relation M_s, α ⊨ A is defined by the Tarski conditions (for structures) with the additional clause

    M_s, α ⊨ □A ⇔ for all t, if R(s, t), then M_t, α ⊨ A.

For example, if R is transitive,

    R(s, t) & R(t, t′) ⟹ R(s, t′),

then the formula

    (11)  □A → □□A

is satisfied by all assignments, in all possible worlds, while it may fail for some A in non-transitive structures. Finally,

    M, α ⊨ A ⇔ M_s0, α ⊨ A.

Different conceptions of “necessity” can be modeled by placing appropriate restrictions on the accessibility relation, for example that it be transitive, linear, etc., and there is a question of constructing a suitable inference system and proving the appropriate Completeness Theorem in each case. A great deal of interesting work has been done in this area, much of it motivated by puzzles in the philosophy of language.

If we take W = N for the set of possible worlds, with s0 = 0 and R(s, t) ⇔ s ≤ t, and if we read □A as “from now on A”, we get one version of Temporal Logic, very useful for applications to computing systems. The worlds are interpreted by the states of some finite state machine, the propositional variables stand for properties of states, and the propositional formulas (which suffice) can express interesting properties of the system, especially if we augment the language with some additional, natural primitives like Next with the truth condition

    M_s ⊨ Next A ⇔ M_{s+1} ⊨ A.

For example, □(p → Next q) says that “every state which has property p is followed immediately by one which has property q”, and ◇□p says that “p will eventually become and remain true”, both interesting properties of finite state machines. This temporal logic is decidable, and so are various extensions of it, in which essentially all interesting liveness and fairness properties of finite state machines can be expressed, so that one can mechanically verify the “correct behavior” of finite state machines. The relevant algorithms are practical, if not simple, they are used commercially, and they provide a spectacular example of the emerging field of applied logic.

B. Intuitionistic Logic

First Order Intuitionistic Logic FOL_I has the same syntax as FOL, and almost the same inference system: we simply replace the Double Negation Law ¬¬A → A, (8) in I-E, by the weaker

    (8)_I  ¬A → (A → B).

Kripke has established a Completeness Theorem for FOL_I using a variation of his semantics for Modal Logic, and this is useful for obtaining unprovability results for FOL_I. The language, however, is meant to be understood constructively, and so it is not really possible to explain its semantics fully within classical mathematics. Aside from philosophical concerns, the real interest of Intuitionistic Logic comes from the proof theory of FOL_I, which, somewhat surprisingly, also has important applications to Computer Science. Some sample results:

(1) For any two sentences A and B,

    ⊢_I A ∨ B ⟹ ⊢_I A or ⊢_I B,

and hence ⊬_I p ∨ ¬p.

(2) In Heyting Arithmetic, i.e., the axiom system PA of III-A with Intuitionistic Logic, for any sentence (∃x)A,

    PA ⊢_I (∃x)A ⟹ for some n, PA ⊢_I A{x :≡ Δ_n}.

(3) If PA ⊢_I (∀x)(∃y)A and (∀x)(∃y)A is a sentence (no free variables), then there is a computable function f, such that for all n,

    PA ⊢_I A{x :≡ Δ_n, y :≡ Δ_{f(n)}}.

This last result is obtained with Kleene's Realizability Theory, and it illustrates the following general principle: from a constructive proof of (∀x)(∃y)R(x, y), we can extract an algorithm which computes for each x, some y such that R(x, y). There are obvious applications of this idea in Computer Science, and much of the current research in Intuitionism is motivated by it.

VII. Set Theory

Sets are collections into a whole of definite and separate objects of our intuition or thought, according to Georg Cantor, who initiated their mathematical study in the mid 1870s. Thus the basic relation of the theory is membership (∈),

    x ∈ A ⇔ x is a member of A,

and a set is completely determined by its members,

    A = B ⇔ (for all x)[x ∈ A ⇔ x ∈ B].

Finite sets can be simply enumerated, e.g., A = {0, 5, 7}. Infinite sets are usually specified by means of some condition P(x) which characterizes their members, and we write

    A = {x | P(x)}

to indicate that A “is the set of all objects which satisfy P(x)”.

combinatorics to the transfinite. The best set-theoretic results are about the interaction between these two poles of the subject.

At about the same time as Cantor's original contributions, Gottlob Frege initiated an effort to create a foundation of mathematics on the basis of set theory. Frege's approach was different (he took “function” rather than “set” as his primitive notion) and his original program was overly ambitious and failed. He had the right basic idea, however, that all objects of classical mathematics can be “defined within set theory”, so that their properties can be (ultimately) derived from properties of sets. It took some time for this to take hold, but it is fair to say that since the 1930s, set theory has been the official language of mathematics, just as mathematics is the official language of science. This richness of the field makes it fertile ground for logical investigations, and it is not an accident that logicians have been involved with set theory from the beginning.

A. Cardinal Arithmetic

There are exactly as many left shoes in a (normal) shoe store as there are right shoes, and we can be sure of this without counting, because of the obvious one-to-one correspondence between left and right shoes. The principle here is that equivalent sets have the same number of members,

    (12)  |A| = |B| ⇔ A ∼_c B,

where A ∼_c B indicates that some one-to-one correspondence exists between the members of A and the members of B, and

    |X| = the number of objects in the set X.
This is a basic tool in mathematics: we Cantor was led to the study of arbitrary, count a set A by establishing a one-to-one abstract sets in his eﬀort to understand the correspondence between its members and structure of some speciﬁc sets of real num- the members of some already-counted set B. pointsets bers or , and the theory which he Moreover, if we set created still exhibits today these two re- lated but separate concerns. The theory of A 4c B ⇔ for some subset C ⊆ B,A ∼c C, pointsets or descriptive set theory is primar- then, obviously, ily a theory of deﬁnability on the real num- 4 bers, and it is characterized by its applica- (13) |A| ≤ |B| ⇔ A c B, tions to other ﬁelds of mathematics, espe- and we can often prove indirectly that there cially analysis. Abstract set theory is pri- are objects in B which are not in A by show- marily a theory of counting, an extension of ing (using arithmetic) that |A| < |B|, so MATHEMATICAL LOGIC 21 that B ⊆ A is impossible. As examples of “proofs by counting”, Cantor proposed to associate a cardinal Cantor showed ﬁrst that number |X| with every (ﬁnite or inﬁnite) set (16) ℵ + ℵ = ℵ · ℵ = ℵ X, so that (12) and (13) hold, and then to 0 0 0 0 0 use similar counting and (inﬁnite) cardinal (basically because of (14)), and arithmetic techniques in the study of arbi- c ℵ0 trary sets. One might expect problems, be- = 2 . cause a ﬁnite set cannot be equivalent with Both of these facts are easy, but they sup- one of its proper subsets (by the so-called port the computation Pigeonhole Principle), while c2 = 2ℵ0 · 2ℵ0 = 2ℵ0+ℵ0 = 2ℵ0 = c, (14) N = {0, 1, 2,...} ∼c {0, 2, 4,...} which means that there is a one-to-one via the correspondence f(n) = 2n. Can- correspondence between the line and the tor showed that, despite this “paradox”, his plane—and, hence, between the line and cardinal arithmetic is a powerful tool with real n-space, for every n! 
This was new, it important applications in almost all areas was surprising, and it was proved by “plain of mathematics. arithmetic”. Eventually it motivated the de- Cantor’s ﬁrst fundamental discovery was velopment of dimension theory, whose basic that there are (at least) two inﬁnite sizes of result is that there is no continuous, one-to- sets: if one correspondence of real n-space with real ℵ0 = |N|, c = |R| = |the real numbers| m-space unless n = m. Moreover, the set of rational numbers is countable, and so is are the cardinal numbers of the two most the set of algebraic numbers, the solutions basic sets in mathematics, then of polynomial equations c (15) ℵ0 < . 2 n a0 + a1x + a2x + · · · + x = 0 A set A is countable if |A| ≤ ℵ0, otherwise it is, like R, uncountable. with integer coeﬃcients. Thus, since R is To deﬁne the arithmetical operations on uncountable, “by simple counting” there ex- (possibly inﬁnite) cardinal numbers, choose ist transcendental (not algebraic) real num- sets K, L with no members in common so bers, a famous result of Liouville’s whose that κ = |K|, λ = |L|, and set original proof had rested on delicate conver- gence arguments for inﬁnite series. It was a κ + λ = |K ∪ L|, “killer application” which made set theory κ · λ = K × L, instantly known (and somewhat notorious) κλ = |(L → K)|. in the mathematical community. Cardinal addition and multiplication sat- Here the union K ∪L is the set of all objects isfy the following absorption laws which ba- which belong to either K or L; the Carte- sically trivializes them in the inﬁnite case: sian product K × L is the set of all ordered pairs (x, y) with x ∈ K and y ∈ L; and if 0 <κ ≤ λ and λ is inﬁnite, the function space (L → K) is the set of all then κ + λ = κ · λ = λ. functions f : L → K. 
If κ and λ are ﬁnite, we get the usual sum, product and exponen- For exponentiation, however, Cantor ex- tial, noting, in particular, that there are κλ tended (15) to the general inequality functions from a set of size λ to one of size κ. κ < 2κ, Moreover, all the familiar arithmetical iden- tities hold—e.g., addition and multiplication which provides inﬁnitely many distinct “or- are associative and commutative, multipli- ders of inﬁnity”, perhaps what people meant cation distributes over addition, κ0 = 1, and when they referred to Cantor’s Paradise. µ Exponentiation is the source of the deep- λ+µ λ µ λ λ·µ κ = κ · κ , κ = κ . est questions about inﬁnite sets, chief among 22 YIANNIS N. MOSCHOVAKIS them Cantor’s Generalized Continuum Hy- (where many of the applications lie) well or- pothesis (GCH), the claim that for all inﬁ- derable? The natural ordering of R won’t nite κ, do, since (for example) R has no least el-

κ + ement, and it is hard to imagine how one (GCH) 2 = κ could arrange all the real numbers into a = the least cardinal number > κ. transﬁnite sequence, with each point fol-

ℵ0 + lowed by its successor and every non-empty The “ordinary” case (CH) 2 = ℵ0 was No. 1 in Hilbert’s list, it dominated set-theoretic subset having a least element. The Contin- research in the 20th century and, in a sense, uum Problem (whether CH is true or not) it is still open today. and this Well Ordering Problem were the In addition to the cardinal numbers, central open problems in set theory at the which count the members of a set turn of the 20th century. one, two, three, . . . B. The paradoxes in the ﬁnite case, Cantor also introduced in- Cantor developed his theory on the ba- ﬁnite versions of the ordinal numbers sis of the following General Comprehension ﬁrst, second, third, . . . Principle which ﬂows naturally from his “deﬁnition” of sets quoted in the beginning which assign position in a sequence. These of this section: every deﬁnite (unambigu- are associated with “transﬁnite sequences”, ous) property P (x) of mathematical objects, well ordered structures i.e., (A, ≤), where ≤ has an extension, the set A = {x | P (x)} linear ordering is a on A (so that x ≤ y or which “collects into a whole” all the objects every non- y ≤ x, for all x, y in A) and which satisfy P (x), so that empty subset of A has a least element. Ev- ery ordinal number α has a successor α + 1 (17) x ∈ A ⇔ P (x). which deﬁnes “the next position”, and ev- But this is not generally true: because if ery set of ordinal numbers A has a least up- per bound sup A. The least inﬁnite ordinal R = {x | is a set and x∈ / x}, number ω deﬁnes the ﬁrst position with in- then, from (17), 0, 1, 2, ...ω, ω + 1, ω + 2, ... R ∈ R ⇔ R is a set and R∈ / R ⇔ R∈ / R ω · 2, ω · 2 + 1, ... which is absurd. The argument was dis- ﬁnitely many predecessors, and it is a limit covered in 1902 by Bertrand Russell, and it ordinal, without an immediate predecessor. was not the ﬁrst contradiction in set theory. 
Ordinal arithmetic has fewer direct appli- However, earlier “paradoxes” (some of them cations than the arithmetic of cardinal num- known to Cantor) were technical, not un- bers, but well ordered structures and ordinal like the paradoxes with inﬁnitesimals which numbers are the fundamental tools in the had been commonplace in the Calculus some study of transﬁnite iteration, which is rich years earlier, and it was thought that they in applications. In a typical case, a function would go away in a careful development of f : A → B is deﬁned by recursion on some the subject. The Russell Paradox is not well ordered structure (A, ≤), and then the technical, it goes to the heart of the nature crucial properties of f are established by in- of sets, and it threw the mathematical com- duction along ≤. Moreover, the exact spec- munity into a spin. iﬁcation of the relation ≤ is often unimpor- L. E. J. Brouwer initiated the intuitionis- tant: all that matters is that some relation tic program which denies that abstract sets well orders A, in other words that A be well are meaningful objects of study and also re- orderable. jects some of the basic principles of logic. Mathematical objects cannot be said to “ex- (WOP) Is every set well orderable? ist” in any sense independent of (mental) Speciﬁcally, is the set R of real numbers “mathematical activity”; and to prove that MATHEMATICAL LOGIC 23 some x has property P , one must construct reformulation of set theory which eventually some speciﬁc object x which has property prevailed. P —it is not enough to derive a contradiction What has prevailed is Axiomatic Set The- from the assumption that no x has property ory, ﬁrst proposed in 1904 by Zermelo as a P . 
Intuitionism had a strong inﬂuence in pragmatic way to avoid the paradoxes by re- the philosophy of mathematics and remains building Cantor’s set theory on the basis of a vibrant ﬁeld of study within logic, but it a few set-theoretic principles which are ba- never carried much favor with mathemati- sic, simple and well understood by their uses cians: too much of classical mathematics in classical mathematics. Formalists can ac- must be thrown out to satisfy its tenets. cept it, as nothing more but the choice of Hilbert proposed to “save” classical math- a speciﬁc set of axioms, whose “truth” is ematics from the paradoxes and Brouwer’s irrelevant—if, at all, meaningful. But it is attack by formalizing as large a part of it as the realists who, in the end, have received possible in some ﬁrst order, axiomatic the- the greatest comfort from axiomatic set the- ory T , and then establishing the consistency ory: because the systematic development of of T by absolutely safe, ﬁnitistic methods. consequences of the axioms eventually led Formalism is the reading of Hilbert’s Pro- to a narrower, more concrete concept of set, gram as a philosophical view: it alleges that which ultimately justiﬁed the axioms. once T is chosen, then T is all there is— Much of modern logic was created in re- there is nothing more to mathematics but sponse to the challenge of the set-theoretic the study of the inference relation T ⊢ A, paradoxes, and that is another reason why with no reference to meaning. Aside from the discipline is so intimately tied with set the impact of G¨odel’s Second Incomplete- theory. ness Theorem (IV-E) which weakens it, For- malism also fails to account for the applica- C. 
Zermelo-Fraenkel Set Theory tions of mathematics: it is hard to see how There are eight axioms in ZFC (Zermelo- the existence or not of certain patterns of Fraenkel Set Theory with Choice), and it is meaningless symbols can have any bearing assumed that they are interpreted over some on the escape velocity of a rocket. given domain of sets V, which comes en- From those reluctant to abandon the tra- dowed with a binary membership condition, ditional, realist view that mathematical ob- x ∈ y. The formal theory ZFC is obtained jects are, well, real, no matter how abstract by expressing these axioms by sentences of and diﬃcult to pin down, Russell ﬁrst pro- FOL(∈), and it requires inﬁnitely many sen- posed to replace set theory by his famous tences, because the Replacement Axiom 5 theory of types: it is claimed (roughly, and requires an axiom scheme. Here we will de- in the later simple version due to Ramsey), scribe them brieﬂy and informally, with a that every mathematical object is of a cer- few interspersed comments. tain (natural number) type n, and that ev- 1. Extensionality. Two sets are equal ery set A is of some successor type n + 1, exactly when the have the same members. such that the members of A are of the im- 2. Empty set and Pairing. There is a mediately preceding type n. Type theory set ∅ with no members, and for any two sets is awkward to apply and it yields only a a, b, there is a set {a, b} whose members are poor shadow of Cantor’s set theory, albeit exactly a and b. without the paradoxes. It never gained fa- 3. Unionset. For each set A, there is a vor as a true alternative to set theory, al- set ∪A whose members are the members of though it has been studied extensively as the members of A, a logical system, it has found its own ap- t ∈∪A ⇔ (∃x)[x ∈ A & t ∈ x]. plications (especially recently, to program- ming languages), and many of its fundamen- 4. Powerset. 
For each set A, there is a tal ideas were eventually incorporated in the set P(A) whose members are all the subsets of A. 24 YIANNIS N. MOSCHOVAKIS
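The operations postulated by Axioms 2 to 4 can be tried out on small, hereditarily finite sets. The following is only a minimal sketch, assuming Python `frozenset` values as stand-ins for sets; the names `pair`, `union_set` and `powerset` are illustrative and do not come from the article. It also checks, on a few finite examples, the inequality $\kappa < 2^{\kappa}$ in the form $|A| < |\mathcal{P}(A)|$.

```python
from itertools import combinations

# Hereditarily finite sets modeled as nested frozensets (an assumption of
# this sketch; the axioms themselves speak about arbitrary sets).
EMPTY = frozenset()


def pair(a, b):
    """Axiom 2 (Pairing): the set {a, b}."""
    return frozenset({a, b})


def union_set(A):
    """Axiom 3 (Unionset): the members of the members of A."""
    return frozenset(t for x in A for t in x)


def powerset(A):
    """Axiom 4 (Powerset): the set of all subsets of A."""
    elems = list(A)
    return frozenset(
        frozenset(c)
        for r in range(len(elems) + 1)
        for c in combinations(elems, r)
    )


one = pair(EMPTY, EMPTY)   # {0}, writing 0 for the empty set
two = pair(EMPTY, one)     # {0, {0}}
A = pair(one, two)         # {{0}, {0, {0}}}

if __name__ == "__main__":
    print(union_set(A) == two)                 # union of A collects 0 and {0}
    print(len(powerset(two)), 2 ** len(two))   # both counts agree
```

By Extensionality, `pair(a, a)` is the singleton `{a}`, which is why `one` has exactly one member. The finite check of $|A| < |\mathcal{P}(A)|$ is of course no substitute for Cantor's argument, which covers infinite sets as well.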

An operation $F : V \to V$ is definite if it is first order definable with parameters, i.e.,

\[ F(x) = G(x, a_1, \ldots, a_k) \]

where $G(x, y_1, \ldots, y_k)$ is first order definable, II-C.

5. Replacement. The image

\[ F[A] = \{F(x) \mid x \in A\} \]

of a set $A$ by a definite operation $F$ is a set. (This was formulated in the 30s, primarily by Skolem, and it is much stronger than Zermelo's original Separation Axiom.)

For the next two axioms, we need the notion of function $f : A \to B$ from one set to another, which is not among our primitives, and so we need to "reduce" the notion of function to that of set. The trick is well known: first we fix some ordered pair operation $(x, y)$ which satisfies the key property

\[ (x, y) = (x', y') \iff x = x' \mathrel{\&} y = y', \tag{18} \]

and then we model a function $f$ by its graph,

\[ G_f = \{(x, y) \in A \times B \mid y = f(x)\}, \]

which is just a set with some special properties. It is common to use the so-called Kuratowski pair operation

\[ (x, y) = \{\{x\}, \{x, y\}\}, \]

but there are many others, and all that is needed is some operation which satisfies (18).

6. Infinity. There is a set $I$ and a one-to-one function $f : I \to I$ which is not onto $I$, $f[I] \subsetneq I$.

Next comes Zermelo's chief contribution:

[Figure: a relation $R \subseteq A \times B$ with a choice function $f$.]

7. Axiom of Choice (AC). For each binary relation $R \subseteq A \times B$,

\[ (\forall x \in A)(\exists y \in B)\,(x, y) \in R \implies (\exists f : A \to B)(\forall x \in A)\,(x, f(x)) \in R. \]

In effect, AC postulates a function $f$ which makes a choice $f(x)$ from the non-empty set $\{y \mid (x, y) \in R\}$, "simultaneously", for each $x \in A$. If $B$ carries a well ordering $\le$, we could take

\[ f(x) = \text{the } \le\text{-least } y \text{ such that } (x, y) \in R; \]

Zermelo showed that, conversely, AC implies that every set is well orderable, and identified numerous examples where the seemingly controversial AC is routinely used in mathematics. Somewhat later Hartogs showed that AC is also equivalent with the cardinal comparability property

\[ (\forall A, B)[A \preceq_c B \lor B \preceq_c A], \]

without which there is no cardinal arithmetic, and this limited further opposition to AC to those who were willing to abandon completely Cantor's Paradise.

The last axiom of ZFC involves the cumulative hierarchy of sets, which is defined by recursion on the ordinal numbers as follows:

\[ V_0 = \emptyset, \qquad V_{\alpha+1} = \mathcal{P}(V_\alpha), \qquad V_\lambda = \bigcup_{\alpha < \lambda} V_\alpha \quad (\text{if } \lambda \text{ is limit}). \]

8. Foundation. Every set is a member of some $V_\alpha$.

This is a limiting axiom, not needed for the development of Cantor's set theory or its applications, but it is important because it codifies within the axiomatic theory a conception of set which replaced in the 1930s Cantor's free-wheeling notion of a "collection into a whole": each set is reached starting with "nothing" (the empty set $\emptyset$), by "indefinite" (never ending) "iteration" of the powerset operation. Admittedly more complex than Cantor's, this notion of grounded set prohibits the circular constructions which lead to the paradoxes, and it can be described intuitively in sufficiently clear terms to justify the axioms.

To see how classical mathematics can be developed on the basis of these seven axioms, consider first arithmetic. A number system is a triple $(N, 0, S)$ such that $N$ is a set, $0 \in N$, $S : N \to N$ is a one-to-one function which is never 0, and

\[ [X \subseteq N \mathrel{\&} 0 \in X \mathrel{\&} S[X] \subseteq X] \implies X = N; \]

we prove that there exists a number system and that every two number systems are isomorphic, and then we choose some specific number system and call its members the natural numbers. The real numbers are identified with some complete ordered field, once we prove that one such exists and any two are order isomorphic, and so forth for other structures. This process of "defining" (more accurately: modeling faithfully) mathematical structures in set theory has found such widespread acceptance in mathematics that "to make a notion precise" is now viewed as synonymous with "defining it in set theory".

D. Independence Results

It is, perhaps, ironic that the axiomatization of set theory made it possible to formulate and prove its own limitations. Let ZF be the theory with axioms 1–6, i.e., without the Axiom of Choice:

Theorem. If ZF is consistent, then so are the theories ZFC+GCH (Gödel, 1938) and ZFC+¬CH (Paul Cohen, 1963).

In effect, ZFC can neither disprove nor prove the Continuum Hypothesis, unless a contradiction can be obtained from its "constructive" core. In addition, Cohen showed that ZF cannot prove the Axiom of Choice, and established several additional consistency and independence results.

Gödel's proof uses an inner model, a subcollection of our intended universe of sets $V$: using only axioms 1–6, he defines a certain collection $L$ of constructible sets and shows that if we re-interpret "set" to mean "member of $L$", then all the axioms of ZFC as well as GCH are true. Cohen's forcing method builds "virtual universes" which are "larger" than $V$, and so he must describe them indirectly. This can be done with Boolean-valued models: a collection $M \subset V$ and a binary condition $E$ on $M$ are defined, and then it is shown that, for a certain (complete) Boolean algebra $B$, the Boolean semantics of the "structure" $(M, E)$ assign 1 to all the theorems of ZFC but something other than 1 to CH.

In both of these proofs, logic plays an essential role which goes much beyond providing the context in which their claims can be made precise. For example, the constructible universe $L$ is defined by iterating the operation of taking all first order definable subsets rather than $\mathcal{P}(A)$ in the cumulative hierarchy of sets, and then a strong version of the Skolem-Löwenheim Theorem is used at a crucial point to show that GCH holds in $L$. Through the work, initially, of Robert Solovay for forcing and Ronald Jensen for constructibility, these theories have been much generalized and continue to be very active research areas of logic, with important applications to analysis, algebra and topology.

E. Current Research in Set Theory

In one direction, set theory is more involved now with applications than ever before. Especially fruitful has been the development in the 1960s of effective descriptive set theory, which incorporates methods from recursion theory into the study of definability on the continuum to yield very substantial applications to analysis.

Beyond the applications, set theory has attempted to confront the fundamental problem posed by the independence results: what does one do with the Continuum Problem, now that we know that it cannot be settled in ZFC? Some have adopted a formalist view, that it is meaningless to ask "whether CH is true or not", and that "set theory is the study of all models of ZFC". This is a very active area of research.

In another direction, people have looked for new axioms, extending ZFC, which might provide the needed answers, and a great deal of research has been done in this direction since the 1960s. Generally speaking, two kinds of axioms have been considered. Large cardinal axioms are plausible generalizations of the Axiom of Infinity, which, however, have very few direct consequences for the continuum. Determinacy hypotheses postulate that certain (fairly simple) infinite games on the natural numbers are determined; somewhat technical and not especially plausible, these axioms answer most definability questions about the real numbers that are independent of ZFC, although, unfortunately, they cannot settle the Continuum Problem. In a fundamental advance made in the 1980s, Donald A. Martin, John Steel and Hugh Woodin showed that the plausible large cardinal axioms imply the fruitful determinacy hypotheses, and so a "unified", very strong extension of ZFC has been created which is the subject of much current research. Unfortunately, it does not solve the Continuum Problem, and so the search goes on.

It may well be that set theory will continue to be dominated in the 21st century by the search for an answer to the Continuum Problem, as it certainly was during the century just ended.
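
As a concrete companion to Section C, the key property (18) of ordered pairs and the first few levels of the cumulative hierarchy can be computed directly. This is only a sketch under stated assumptions: sets are modeled by Python `frozenset` values, the pair used is the standard Kuratowski pair $\{\{x\}, \{x, y\}\}$, and the names `kpair`, `powerset`, `V` and `satisfies_18` are hypothetical helpers introduced here for illustration.

```python
from itertools import combinations


def kpair(x, y):
    """Kuratowski pair (x, y) = {{x}, {x, y}}, built from frozensets."""
    return frozenset({frozenset({x}), frozenset({x, y})})


def powerset(A):
    """The set of all subsets of the finite set A."""
    elems = list(A)
    return frozenset(
        frozenset(c)
        for r in range(len(elems) + 1)
        for c in combinations(elems, r)
    )


def V(n):
    """Finite level V_n of the cumulative hierarchy: V_0 is empty and
    V_{k+1} is the powerset of V_k. Keep n small: V_5 already has
    65536 members."""
    level = frozenset()
    for _ in range(n):
        level = powerset(level)
    return level


def satisfies_18(domain):
    """Check property (18) for all x, y, u, v in a finite domain:
    kpair(x, y) == kpair(u, v) exactly when x == u and y == v."""
    return all(
        (kpair(x, y) == kpair(u, v)) == ((x, y) == (u, v))
        for x in domain for y in domain
        for u in domain for v in domain
    )


if __name__ == "__main__":
    print([len(V(n)) for n in range(5)])
    print(V(2) <= V(3))          # each level is contained in the next
    print(satisfies_18(V(3)))    # (18) holds on this small domain
```

Only the successor clause of the recursion is exercised here; the limit clause $V_\lambda = \bigcup_{\alpha<\lambda} V_\alpha$ has no finite counterpart, so the sketch says nothing about infinite levels.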


University of California, Los Angeles, and University of Athens