
SECOND-ORDER LOGIC, OR: HOW I LEARNED TO STOP WORRYING AND LOVE THE INCOMPLETENESS THEOREMS

ETHAN JERZAK

Abstract. First-order logic enjoys a nearly universal status as the language in which mathematics is done—or at least, in which it is ideally done. Whence this priority? In this paper, I present some of the basic principles of first-order logic, and then give an introduction to second-order theories. I shall demonstrate that second-order logic is not reducible to first-order logic, and I will sketch some of the meta-logical results that make first-order logic so appealing and that fail for higher orders. The emphasis shall be more on explication than on detailed proof: my aim is to sort out what one gains and loses by limiting himself to first-order quantifiers, and to see what second-order theories can offer in terms of expressive power.

Contents

1. History
2. First-order Logic
2.1. Syntax
2.2. Semantics
2.3. Deductive Systems
3. First-Order Metalogical Results
4. First-Order Limitations
4.1. Skolem's Paradox
4.2. 'A Paradoxical Situation'
4.3. Ultraproducts
5. Second-Order Logic
5.1. Deductive Systems
5.2. Semantics
6. Second-Order Metaresults
7. Conclusion: Living without Completeness
Acknowledgments
References

Date: 21 August 2009.

1. History

These historical considerations may well seem extraneous to the contemporary mathematician—or, if not exactly extraneous, at least like a skimmable first section written merely to orient oneself vis-à-vis certain incidental timelines. It is, as one says, of merely historical interest. But nonetheless I proceed with it, resolutely ignoring the consequent jeers and slanders of the ahistoricists. This is not a paper on mathematics; it is a paper on the foundations of mathematics, and as such historical considerations are of more than merely cursory interest. Progress in mathematics as such is basically unidirectional, as long as one does not forget the fundamental theorems. Progress in the foundations of mathematics, on the contrary, is sketchy and multifarious, precisely because there is no definitive notion of deduction for these more philosophical questions. We may know more mathematics than Kant, but this does not exclude the possibility that his theory of its foundations remains more correct. To history, then, without further apology.

Mathematics today conforms to a self-conception that is relatively new (certainly fewer than 150 years old). Roughly, the contemporary mathematician lays down axioms and definitions more or less arbitrarily, and then works out the consequences of these axioms according to a deductive system of some sort. No longer does the mathematician hold that his axioms are in some way the 'correct' axioms for describing the world; the analyst remains neutral on whether the universe really looks like R3 or not. The focus instead is on giving deductive proofs from these axioms and definitions. Some conventional shorthand is generally allowed—to write out even fairly basic proofs fully would be unacceptably time-consuming—but generally one considers his proof valid if any competent person could easily translate it into a valid argument in some first-order language, had he only the time and motivation. (Of course, the level of dogmatism on this point varies between the different branches, but most of the mainstream divisions embody it to some degree.) This notion of mathematical proof differs greatly from its historical predecessors.
Most proofs prior to Frege were written in something very close to plain English (or Greek, German, or French, as the case may have been). Kant (1724–1804) thought that a mathematical proof was not a purely analytic thing; the pure imagination was involved in determining constructibility, such that going through a proof was a genuinely active process that only a rational agent with all of the cognitive faculties could undertake. The notion of a purely formal language developed first with Gottlob Frege (1848–1925), whose Begriffsschrift was supposed to bridge 'gaps' in the natural language proofs and resolve certain contradictions that ambiguities in natural languages allowed. For example, the Intermediate Value Theorem was taken to require pure intuition, not logical proof, until Bolzano supplied an analytic proof from the definition of continuity in 1817. Frege wanted to similarly ground all of mathematics.[1] This project spawned logicism, the idea that mathematics could be reduced in its essentials to logic (thereby making mathematics an analytic, not synthetic, discipline). It is here that logic began to take the form in which we know it today, and, indeed, where the notions of first-order logic became separated from higher orders. Logic emerged as a purely formal deductive system that could be studied in its own right, and upon which all mathematics was supposed to be constructed. The development of formal logic, then, is historically bound up with a certain brand of foundationalism. One must find firm ground on which the structures of mathematics can be secured, and the way one does it is generally via what became today's first-order logic. I shall give a brief overview of bread-and-butter first-order logic, in order to then give a slightly more detailed overview of second-order logic and compare the foundational merit of each.

[1] One notable exception is geometry. Frege actually agreed with Kant that geometric proofs were in fact synthetic, and required pure intuition in addition to logical deduction.

2. First-order Logic

In its broadest sense, we take 'logic' to mean 'the study of correct reasoning'. We want to formalize such intuitive notions as 'valid argument' and 'proof'. To accomplish this goal, we will develop a formal system into which we can translate what we take to be correct arguments in English and ensure that they are indeed complete and correct. Ideally, we will develop a formal language such that an English argument is valid if and only if it is modeled by a formally valid inference in our language. Needless to say, this is a goal perhaps too ambitious for mere mortals. Thus, we shall restrict ourselves to the language commonly spoken by mathematicians. The important thing to keep in mind is that logic is a model for correct reasoning whose justification stems, at least initially, from our plain-English intuitions of what constitutes a correct argument. We seek a language that captures this intuition as faithfully as possible. One may, if one wishes, be on guard against a sort of epistemological circle here: we use intuitions to develop and justify a system of correct reasoning, and in turn make determinations about the correctness of our English-language arguments based on whether they are successfully modeled in that formal language. This worry I leave aside for now; let us at least explore the circle before trying to find a way out of it.

First-order logic consists of two main parts: the syntax and the semantics. Roughly speaking, the syntax formally determines which strings of symbols are permissible for the language; the semantics assigns meanings to these permissible expressions. As we will see, in standard first-order logic, these two notions will coincide nicely. The exposition of first-order logic that follows is a bit hasty, and serves mainly to establish the notation that will be drawn upon in later sections.

2.1. Syntax. The syntax of a language consists of an alphabet together with formation rules. For first-order logic, we have an alphabet consisting of logical symbols that retain the same meaning no matter the application:

• Quantifiers: ∀ (universal) and ∃ (existential)
• Logical connectives: ¬ (negation), ∧ (conjunction), ∨ (disjunction), → (conditional), and ↔ (biconditional)
• Punctuation: ( and )
• A set of variables: {x0, x1, x2, ...}
• Identity: = (optional)

And we also have some non-logical symbols, depending on the application. Non-logical symbols represent relations, predicates, functions, and constants on whatever we take to be our domain of discourse. For example, if we were doing group theory, we would want the symbols {1, ∗} always to function in the same way between different groups. If we were doing arithmetic, we would want {0, 1, +, ∗, <} at our disposal. In general, we allow the set of non-logical symbols to be as large as one pleases. It can be any cardinality.[2] Call a first-order language with a set K of non-logical symbols L1K. If it has equality, call it L1K=.

A set of symbols alone is insufficient for making a meaningful language; we also need to know how we can put those symbols together. Just as we cannot say in English "Water kill John notorious ponder," we want to rule out pseudo-formulas like "∀(→ x12∨." We therefore define the well-formed formulas by formation rules. I omit the details, which are tedious and can be found in any competent text on first-order logic. One first defines the terms inductively as any variable xn or any expression f(t1...tn) where f is an n-place function symbol and the ti are terms. One then specifies certain formulas as atomic: usually those of the form P(t1...tn), where P is any n-place relation symbol and t1...tn are terms. One then wants to make provisions for building complex formulas out of the atomic ones, through rules such as: if φ and ψ are formulas, then ¬φ, φ → ψ, and (∀xi)φ(xi) are all formulas. These three rules are actually sufficient to generate all the formulas, once one trims away the excess connectives and quantifiers. (All the others can be expressed in terms of these three. For example, we define (∃xi)Φ(xi) to be ¬(∀xi)¬Φ(xi).) We call any variable that is not quantified (i.e. an xi not preceded by (∀xi) or (∃xi)) a free variable. A variable that is quantified is a bound variable.
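The formation rules lend themselves to a simple recursive implementation. The following toy sketch (my own encoding; none of these names come from the paper) represents formulas as nested tuples and checks well-formedness using only the three official rules, with ∃ defined away exactly as in the text:

```python
# A toy check of the formation rules: formulas are nested tuples, built from
# atomic formulas by the three rules above (negation, conditional, universal
# quantification). The tags "atom", "not", "implies", "forall" are my own.

def is_formula(phi):
    """Return True iff the nested tuple is a well-formed formula."""
    if phi[0] == "atom":                      # P(t1...tn): ("atom", "P", "x1", ...)
        return len(phi) >= 2
    if phi[0] == "not":                       # ¬φ
        return len(phi) == 2 and is_formula(phi[1])
    if phi[0] == "implies":                   # φ → ψ
        return len(phi) == 3 and is_formula(phi[1]) and is_formula(phi[2])
    if phi[0] == "forall":                    # (∀xi)φ(xi)
        return len(phi) == 3 and isinstance(phi[1], str) and is_formula(phi[2])
    return False

def exists(var, phi):
    """(∃xi)Φ(xi) is defined, as in the text, to be ¬(∀xi)¬Φ(xi)."""
    return ("not", ("forall", var, ("not", phi)))

# (∃x0)P(x0) unfolds into the official three-rule form and checks out:
assert is_formula(exists("x0", ("atom", "P", "x0")))
# ...while a pseudo-formula is rejected:
assert not is_formula(("forall", ("atom", "P"), "x0"))
```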
Note that everything above only tells us how we can put terms together; it leaves the question of what they mean untouched. We have as yet no idea how ∀ and ∃ differ. For that, we need a semantics.

2.2. Semantics. Again, I assume prior acquaintance with first-order semantics. This section mainly establishes the symbolism. A model of L1K(=) is a structure M = ⟨d, I⟩, in which d, the domain of the model, is a non-empty set, and I is an interpretation function that assigns items from d to the non-logical symbols in K. For example, if a is an individual constant symbol in K, then I(a) is a member of d. If A is a binary relation symbol, then I(A) is a subset of d × d. A variable-assignment s is a function from the variables of L1K(=) to d. Each formula is assigned a truth-value in the standard inductive way. An atomic formula P(t1...tn) is assigned to 'true' if ⟨d1, ..., dn⟩ ∈ I(P), where d1, ..., dn are the evaluations of the terms t1, ..., tn and I(P) is the interpretation of P (which by assumption is a subset of dⁿ). It is assigned to 'false' otherwise. A formula built out of simpler formulas from the purely logical connectives (¬ or →) gets treated in the standard truth-functional way. Finally, a quantified formula (∀xi)Φ(xi) is true according to M and s if Φ(xi) is true for every pair composed of the interpretation M and some variable assignment s′ that differs from s only on the value of xi. This captures the idea that (∀xi)Φ(xi) is true if every possible choice of a value for xi causes Φ(xi) to be true.

Given the above, we are in a position to define the satisfaction relation. If a formula Φ evaluates to 'true' under a given interpretation M and variable assignment s, we say that M and s satisfy Φ, or M, s ⊨ Φ. If Φ is a sentence, i.e. a formula with no free variables, we can simply write M ⊨ Φ.
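For finite models, the inductive truth clauses above can be run mechanically. The following is a minimal sketch (my own encoding, not the paper's): the domain is a Python set, the interpretation maps each relation symbol to a set of tuples, and the quantifier clause tries every assignment s′ differing from s only on the bound variable:

```python
# A toy evaluator for M, s ⊨ Φ over a finite model, following the inductive
# truth definition: atomic, ¬, →, and ∀ (with ∃ definable from these).

def satisfies(domain, interp, s, phi):
    kind = phi[0]
    if kind == "atom":                        # P(x1...xn): look up the tuple
        _, pred, *vars_ = phi
        return tuple(s[v] for v in vars_) in interp[pred]
    if kind == "not":
        return not satisfies(domain, interp, s, phi[1])
    if kind == "implies":
        return (not satisfies(domain, interp, s, phi[1])) or \
               satisfies(domain, interp, s, phi[2])
    if kind == "forall":                      # true under every xi-variant of s
        _, var, body = phi
        return all(satisfies(domain, interp, {**s, var: e}, body)
                   for e in domain)
    raise ValueError(phi)

# Model: domain {0, 1, 2} with "Less" interpreted as the strict order.
d = {0, 1, 2}
I = {"Less": {(a, b) for a in d for b in d if a < b}}
# (∀x)¬Less(x, x) — irreflexivity — holds in this model:
assert satisfies(d, I, {}, ("forall", "x", ("not", ("atom", "Less", "x", "x"))))
# (∀x)(∀y)Less(x, y) does not:
assert not satisfies(d, I, {}, ("forall", "x",
                                ("forall", "y", ("atom", "Less", "x", "y"))))
```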

[2] One might protest that this provision already takes certain notions, like cardinality, for granted. In general, when constructing a language, we always have to do it in an ontologically richer metalanguage. In this case, we presuppose the language of set theory. This revelation would provide evidence against a position like Quine's, which holds that first-order logic has epistemological priority because it is ontologically presuppositionless.

A formula Φ is semantically valid (also called a logical truth) if M, s ⊨ Φ for every M and s. A formula Φ is satisfiable if there is some model M and assignment s such that M, s ⊨ Φ. Finally, we say that Φ is a semantic consequence of Γ (denoted Γ ⊨ Φ) if Γ ∪ {¬Φ} is not satisfiable.

2.3. Deductive Systems. The above gives one way of talking about one sentence 'following' from another. It uses the semantic notion of all possible models. A deductive system is another way of determining inference—one that relies on syntax alone. The deductive system explicated below is called D1. First, I stipulate the following axiom schemes. Any formula obtained by substituting formulas for the Greek letters is an axiom.

• Φ → (Ψ → Φ)
• (Φ → (Ψ → χ)) → ((Φ → Ψ) → (Φ → χ))
• (¬Φ → ¬Ψ) → (Ψ → Φ)
• (∀xi)Φ(xi) → Φ(t)

Let Γ be a set of formulas and let Φ be a single formula. A deduction of Φ from Γ is a finite sequence Φ1...Φn such that Φn is exactly Φ, and, for all i ≤ n, either Φi ∈ Γ, or Φi is an axiom, or Φi follows from previous formulas in the sequence by one of these rules of inference:

• If you have Φ and Φ → Ψ, you can infer Ψ.
• If you have Φ → Ψ(xi), you can infer Φ → (∀xi)Ψ(xi), as long as you make sure that xi does not occur free in Φ or in any member of Γ.

And that's it. We could spend a lot of time showing that these are sufficient to generate all the other axioms you want to use (like existential introduction, modus tollens, etc.), but it hardly seems worth it. The authors of standard logic texts have kindly worked out the details. However, if you have identity in your language, you will also want two more axioms:

• (∀xi)(xi = xi)
• (∀xi)(∀xj)(xi = xj → (Φ(xi) → Φ(xj))), as long as xj is free for xi in Φ(xi).

If there is a deduction of Φ from Γ, we write Γ ⊢ Φ. This gives a purely syntactic notion of deducibility. A 'proof' in any first-order theory means a proof according to this deductive system. Recall that we now have two criteria for when one formula 'follows from' another. We have on the one hand our semantic criterion, which has to do with every model of one formula being a model of the other. On the other hand, we have our syntactic criterion, which operates according to this formal, non-semantic deductive system. It is then natural to ask: do these two notions of consequence match? That is, if Φ is a semantic consequence of Γ, is Φ syntactically deducible from Γ, and vice-versa? This question motivates a meta-logical investigation.
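To illustrate how D1 operates, here is the standard textbook deduction (not taken from this paper) of Φ → Φ from the empty set of premises, using only the first two axiom schemes and modus ponens:

```latex
\begin{align*}
1.\quad & \Phi \to ((\Phi \to \Phi) \to \Phi)
  && \text{axiom scheme 1, with } \Psi := \Phi \to \Phi\\
2.\quad & \bigl(\Phi \to ((\Phi \to \Phi) \to \Phi)\bigr) \to
  \bigl((\Phi \to (\Phi \to \Phi)) \to (\Phi \to \Phi)\bigr)
  && \text{axiom scheme 2, with } \Psi := \Phi \to \Phi,\ \chi := \Phi\\
3.\quad & (\Phi \to (\Phi \to \Phi)) \to (\Phi \to \Phi)
  && \text{modus ponens, 1, 2}\\
4.\quad & \Phi \to (\Phi \to \Phi)
  && \text{axiom scheme 1, with } \Psi := \Phi\\
5.\quad & \Phi \to \Phi
  && \text{modus ponens, 4, 3}
\end{align*}
```

Even this trivial logical truth takes five lines in D1, which is some indication of why one leaves such bookkeeping to the standard texts.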

3. First-Order Metalogical Results

For the most part, I will only state the well-known results for L1K(=). I will give the sketchiest of ideas for how to prove the following results, many of which involve a tedious argument by induction on complexity.[3]

[3] Complexity is something that we define for formulas solely to be able to do induction on them easily. It can be done any number of ways. Usually, the atomic formulas will have complexity zero, and each 'adding one' component will increase the complexity by one. (So that, for instance, if Φ has complexity n, (∀xi)Φ(xi) will have complexity n + 1.)

Theorem 3.1 (Soundness). If Γ ⊢ Φ then Γ ⊨ Φ.

Proof. Basically, we show that the logical axioms are logically implied by anything, and that both modus ponens and our ∀-introduction rule preserve logical implication. Suppose that all the logical axioms are valid (checking this is a good exercise). Then we can supply a simple inductive proof that any formula Φ deducible from Γ is logically implied by Γ.

Case One: Φ is a logical axiom. In this case there is nothing to prove.

Case Two: Φ ∈ Γ. Then there is also nothing to prove.

Case Three: We obtain Φ via modus ponens from Ψ and Ψ → Φ, where Γ ⊨ Ψ and Γ ⊨ (Ψ → Φ). But those together semantically imply Γ ⊨ Φ, since every model of Ψ and Ψ → Φ must be a model of Φ.

Case Four: Somewhere, we obtain Φ → (∀xi)Ψ(xi) from Φ → Ψ(xj), where xj does not occur free in Φ or in any member of Γ. This is the only tricky part of the proof. Suppose that Γ ⊨ Φ → Ψ(xj). We need to show that Γ ⊨ Φ → (∀xi)Ψ(xi). Let M be a model of Γ that contains all the names of Φ and (∀xi)Ψ(xi). We must show that M is a model of Φ → (∀xi)Ψ(xi). First of all, it is possible that xj ∈ dom(M). If so, let M′ be the same model as M except map xj ↦ yj, some dummy variable. Since xj does not occur free anywhere, this modification changes no truth values. Thus, M′ is a model of Φ → Ψ(xj). Suppose that there is some xk ∈ dom(M′) such that ¬Ψ(xk). Let M″ be a model with the same domain and truth values as M′, but add a new variable to M″ called xj with the same properties and relations as xk ∈ dom(M′). Note that M″ again changes no truth-values from M′. Then ¬Ψ(xj) is true in M″. But since M″ is a model of Φ → Ψ(xj), this implies that ¬Φ is true in M″. But then ¬Φ is true in M′, and thus also in M. Therefore, either ¬Φ is true in M or (∀xi)Ψ(xi) is true in M, which was what we wanted to show. □

Soundness tells us that we will never be able to prove a contradiction in L1K(=)—whatever implication we can prove syntactically is also semantically valid.
It is natural to wonder whether the converse holds: given a semantically valid inference, is there a proof of this inference in the deductive calculus? One of the most appealing results of first-order logic will be that the answer to this question is yes. Soundness tells us that we have not added an axiom scheme to the deductive calculus that will lead us astray; completeness tells us that we have taken care of everything, so to speak. First, a lemma.

Lemma 3.2. The following are equivalent:
• If Γ ⊨ Φ, then Γ ⊢ Φ.
• Any consistent set of formulas is satisfiable.

Proof. A set of formulas Γ is consistent if there is no Φ such that Γ ⊢ Φ and Γ ⊢ ¬Φ. A set Γ is satisfiable if there is a model M and assignment s such that M, s ⊨ χ for all χ ∈ Γ.

(⇒) Suppose the former, and suppose for contradiction that Γ is a consistent set of formulas that is unsatisfiable. Then for any Φ, we have Γ ⊨ Φ. (An unsatisfiable set of formulas semantically implies anything: semantic implication means that every model M of Γ is also a model of Φ, but this is vacuously true if there is no model of Γ.) Then Γ ⊨ Φ and Γ ⊨ ¬Φ. But by hypothesis this implies that Γ ⊢ Φ and Γ ⊢ ¬Φ, contradicting the consistency of Γ.

(⇐) Suppose the latter, and suppose Γ ⊨ Φ. Then by definition Γ ∪ {¬Φ} is unsatisfiable. Thus by hypothesis Γ ∪ {¬Φ} is inconsistent. But Γ ∪ {¬Φ} is inconsistent if and only if Γ ⊢ Φ. □

Theorem 3.3 (Completeness, Gödel 1930). If Γ ⊨ Φ then Γ ⊢ Φ.

Proof. This proof is by no means straightforward. It depends crucially on a principle of infinity (namely, that models with infinite domains exist).
If we did not allow such models, the completeness theorem would not hold, because there are satisfiable sentences that have no finite models (for example, the first-order Peano axiomatization of arithmetic). A sketch of the proof using the above lemma goes as follows. Begin with a consistent set Γ, and extend Γ to a set ∆ of formulas for which:
• Γ ⊆ ∆.
• ∆ is consistent, and for any formula Φ, either Φ ∈ ∆ or ¬Φ ∈ ∆.
• For any formula Φ and variable xi, there is a constant c such that ¬(∀xi)Φ(xi) → ¬Φ(c) is in ∆.

We then form a model M in which members of Γ not containing '=' are satisfied. Dom(M) is the set of equivalence classes of terms, where t1 ∼ t2 if Th(M) ⊨ t1 = t2. (Th(M) is just the set of semantic consequences of M.) For any P, ⟨t1...tn⟩ ∈ I(P) ⇔ P(t1...tn) ∈ ∆. Finally we extend this model to include formulas involving '='. The difficulty, of course, is showing in detail that this can be done, which I here neglect. □

Completeness is one of the most appealing results of L1K(=). It, together with soundness, yields an immediate and useful consequence:

Theorem 3.4 (Compactness). Let Γ be a set of formulas. Then Γ is satisfiable if and only if every finite subset of Γ is satisfiable.

Proof. (⇒) This implication is trivial. Any model of Γ is eo ipso a model of any ∆ ⊆ Γ.

(⇐) If Γ is not satisfiable, then by completeness it is inconsistent, i.e. Γ ⊢ (Φ ∧ ¬Φ) in D1. But by definition a deduction in D1 contains only finitely many premises. So there is some finite subset ∆ ⊆ Γ such that ∆ ⊢ (Φ ∧ ¬Φ) in D1. By soundness, ∆ cannot be satisfiable. Thus, if every finite subset ∆ of Γ is satisfiable, so is Γ. □

Finally, we have two related theorems about the sizes of models for first-order theories.

Theorem 3.5 (Downward Löwenheim–Skolem). Let M be a model of L1K(=) of cardinality κ. Then M has a submodel M′ whose domain has cardinality at most τ < κ (for any τ we pick) such that, for every assignment s and formula Φ, M, s ⊨ Φ if and only if M′, s ⊨ Φ.

Proof.
The idea is to find a subset of dom(M) of a particular cardinality that is closed under all of the elementary operations. One first selects any subset of dom(M) with the desired cardinality, and then takes its closure, showing that the closure of this subset under all the elementary operations still has cardinality at most τ. The only difficulty occurs with existential quantifiers: in, say, (∃xi)P(xi), we might have the original xi outside of our chosen subset. We have to show that we can still find another xj in the closure of the subset that satisfies P. □

Theorem 3.6 (Upward Löwenheim–Skolem). Let Γ be a set of formulas of L1K(=) with arbitrarily large finite models. Then, for every infinite cardinal κ, there is a model of Γ whose domain has cardinality κ.

Proof. The idea here is simply to bloat our language with new constant symbols. Recall that the set of non-logical terminology could be of any cardinality. So given Γ, a set of formulas with arbitrarily large finite models, we can simply add new constants, stipulating that they do not equal one another. We have, say, (∃xi)(∃xj)...(xi ≠ xj)... for whatever number of symbols we want. Every finite subset of these axioms is still satisfiable by some model M of Γ, since Γ has arbitrarily large models. Thus, by compactness, the entire set of axioms, together with Γ, is satisfiable, and therefore there is a model of them all. This model cannot have cardinality less than however many constants we added. □

4. First-Order Limitations

Most of the limitations of first-order logic that I discuss here follow from the previous two theorems. Together, they imply that for any first-order axiomatization, if that axiom system is satisfiable with arbitrarily large models, then it has models of any cardinality above a certain point.
Thoralf Skolem was the first to thematize the oddities that arise from these theorems, publishing in 1922 a 'paradox' that he intended to undermine first-order set theory as a solid foundation for mathematics. The resolution of the paradox is well known—unlike the liar's paradox and Russell's paradox, this is not an actual antinomy—but it nonetheless places set theory in what Skolem calls a "paradoxical state of affairs," and provides a good entryway to the limitations of first-order expressibility. I will sketch the so-called paradox below, presuming a working knowledge of first-order set theory (in this case, ZFC).

4.1. Skolem's Paradox. The basic idea is this: from the first-order axioms of ZFC we can use Cantor's theorem to prove that there are uncountable sets—sets that cannot be put in one-to-one correspondence with the (standard) natural numbers. But since ZFC is a first-order theory, if it is consistent (and therefore satisfiable) with infinite models, it must have models of any cardinality. Then we have a model of ZFC that is countable. But we proved within ZFC itself that there are uncountable sets! It seems, then, that the (translated) proposition 'There are uncountable sets' should be true in any model. Thus, we have a model where there are no uncountable sets, and a statement proved from the axioms of ZFC saying that any model must have uncountable sets.

As bad as the above sounds, it turns out that if we think about it a bit more carefully, there is no strict paradox here, and we needn't consign ZFC to inconsistency. The resolution goes something like this. First, we must say more carefully what it means for there to be a certain 'function' in ZFC.[4] Our intuition of a function is something like a rule that assigns elements of one set to elements in another set. That is, given x ∈ X and y ∈ Y, a function such that f(x) = y is just something that takes x from X and plops it into y in Y. We can reduce this intuitive notion more formally to a purely set-theoretic expression: f(x) = y just in case a certain relation holds between the ordered pair ⟨x, y⟩—and this ordered pair can just be represented as the unordered set {x, {x, y}}. So the first horn of the apparent paradox says: we can prove from ZFC that for any model M there is a set X ∈ dom(M) such that no set F ∈ dom(M)[5] puts X in a one-to-one correspondence with N. The second horn of the paradox says that there is a model M′ of ZFC in which there is a set F that puts X in one-to-one correspondence with N. And once we realize this, we see that there is no paradox at all. The isomorphism set F required for countability is in M′, but not in M. So from within the model M, it is true that no set F will put X in one-to-one correspondence with N; but it is also true that X and N can be put in one-to-one correspondence, namely, in the model M′. Thus, the two statements are consistent.

4.2. 'A Paradoxical Situation'. Although we have shown that Skolem's paradox yields no strict contradiction, it nonetheless points toward a deeper insight about first-order logic and the limitations on what it can express. It illustrates in a quite intuitively jarring way the manner in which L1K(=) is, in a sense, blind to the notion of cardinality. Where does this realization leave our project?

[4] On its face, a function between our objects, whatever they are, is a second-order thing. We have to show that, with the language of set theory, it can be made a first-order thing. On this more later.
Recall that our goal in developing a rigorous logical language was partly to be able to state the informal arguments that mathematicians regularly make in some sort of formal language. Many of the tools that mathematicians regularly use, after this paradox comes to the fore, get quite complicated when speaking a first-order language. We speak, in normal mathematical settings, of the natural numbers, or the real numbers. With the preceding observations in hand, we can no longer do this naively. There can be no first-order axiomatization that captures what we mean by the natural numbers—in particular, none that will express the crucial notion that everything in dom(N) follows 0 by some iteration of the successor function S. By compactness there are non-standard countable models, and by Löwenheim–Skolem, there are even uncountable models. In order to talk about the natural numbers, then, we have to resort to talking about a particular model of an axiom scheme, one that cannot be captured by any axioms of a pure first-order language. This is a general fact about first-order languages. There are myriad concepts that mathematicians take to be non-problematic, like cardinality, that the language we have developed cannot capture. There is, in fact, a powerful tool for determining whether a certain concept can be made first-order expressible, and I will briefly state some of the results.

[5] In the language of sets, an isomorphism F between X and Y is a set whose elements are of the form {x, {x, y}} where x ∈ X, y ∈ Y, such that (∀y ∈ Y)(∃x ∈ X)({x, {x, y}} ∈ F), and also (∀x1, x2 ∈ X)(({x1, {x1, y}} ∈ F ∧ {x2, {x2, y}} ∈ F) → x1 = x2).

4.3. Ultraproducts. This section presupposes prior acquaintance with finitely additive {0, 1} measures. I will prove one proposition that will put the following on somewhat firm ground. First, a definition: a filter F on a set X is a collection of subsets of X such that
• X ∈ F
• ∅ ∉ F
• If A, B ∈ F then A ∩ B ∈ F.
• If A ∈ F and A ⊂ B then B ∈ F.

If, for every subset A of X, either A ∈ F or Aᶜ = X − A ∈ F, we call F an ultrafilter. An ultrafilter, incidentally, is equivalent to a finitely additive {0, 1} measure: just take the sets in F to have measure one, and those not in F to have measure zero. Now, there is one trivial possibility that we usually have to rule out: a perfectly good ultrafilter on X is the principal ultrafilter, which dictates that a single point has measure one. Any interesting ultrafilter should be non-principal. If no single point has measure one, then we can show by the intersection axiom that no finite set of points has measure one. This proves that the only ultrafilters on a finite set are principal. The following proposition shows that, for infinite sets, the world is more interesting.

Proposition 4.1. Every infinite set X has a non-principal ultrafilter.

Proof. The proof uses Zorn's lemma, and therefore the axiom of choice. We will start out with any old filter F on X. Our aim is to show that we can extend F to an ultrafilter. Let F be the poset of all filters on X under inclusion (such that Fi ≤ Fj if every set in Fi is also in Fj). By Zorn's lemma, there is a maximal element Fm in this poset. Now assume for contradiction that Fm is not an ultrafilter: suppose that there is some set A such that A ∉ Fm and Aᶜ ∉ Fm. Define F′m to agree with Fm everywhere, but let it also include A (together with whatever else the filter axioms then require). Then F′m is a filter greater than Fm, contradicting our assumption that Fm was a maximal element in F. Now we just have to rule out the possibility that all of these filters were principal. Just stipulate that, for all the filters we are considering, every set whose complement is a single point is in the filter (or: has measure one). The maximal element of a chain satisfying this property will also be non-principal, and by the above argument it must indeed be an ultrafilter. □
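The claim that every ultrafilter on a finite set is principal can be confirmed by brute force. The following sketch (my own encoding, for the three-element set only) enumerates every family of subsets of {0, 1, 2} and checks the filter axioms together with the ultrafilter condition:

```python
from itertools import combinations

def powerset(X):
    """All subsets of X, as frozensets."""
    xs = list(X)
    return [frozenset(c) for r in range(len(xs) + 1)
            for c in combinations(xs, r)]

def is_ultrafilter(F, X):
    """The four filter axioms, plus: A ∈ F or X − A ∈ F for every A ⊆ X."""
    subsets = powerset(X)
    if frozenset(X) not in F or frozenset() in F:
        return False
    if any(A & B not in F for A in F for B in F):        # closed under ∩
        return False
    if any(A <= B and B not in F for A in F for B in subsets):  # closed upward
        return False
    return all(A in F or frozenset(X) - A in F for A in subsets)

X = {0, 1, 2}
subsets = powerset(X)
ultras = []
for r in range(len(subsets) + 1):
    for combo in combinations(subsets, r):               # every family of subsets
        if is_ultrafilter(set(combo), X):
            ultras.append(frozenset(combo))

# The ultrafilters found are exactly the principal ones: for each point x,
# the family of all sets containing x.
principal = [frozenset(A for A in subsets if x in A) for x in X]
assert len(ultras) == 3 and set(ultras) == set(principal)
```

Of course, this only illustrates the finite case; Proposition 4.1 is precisely the statement that no such enumeration can ever exhibit the non-principal ultrafilters on an infinite set, since their existence rests on the axiom of choice.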

Say we have a family of structures Mi over some index set I, where each Mi has the same set of non-logical terminology. Say also that we have some finitely additive {0, 1} measure µ on I. We will want I to be infinite, and µ to give measure one to every cofinite subset of I; otherwise µ is principal and the thing we are about to define is much less interesting. Take the Cartesian product

∏_{i∈I} Mi

and define an equivalence relation on the product such that a ∼ b just in case ai = bi for almost all i ∈ I (where 'almost all' is defined by µ). We then quotient out by this equivalence relation, obtaining the ultraproduct of the Mi with respect to µ and I, denoted

∏_{i∈I} Mi/µ.

Of course, one must show that ∼ is indeed an equivalence relation, but this is not difficult; one merely notes that the union of two sets each with measure zero must itself have measure zero. The following theorem will show why ultraproducts are so useful for determining what concepts first-order logic can express.

Theorem 4.2 (Łoś). Let Φ(x1...xn) be a first-order formula, and let M = ∏_{i∈I} Mi/µ. Then, for [a1]...[an] ∈ M, M ⊨ Φ([a1]...[an]) if and only if, for almost every i ∈ I, Mi ⊨ Φ(a1(i)...an(i)).

Proof. (⇒) This is trivial. If a formula Φ([a1]...[an]) is true in M, then Φ holds in the equivalence class where, for all i ∈ I, Mi ⊨ Φ(a1(i)...an(i)). Everything else in this equivalence class differs from this only on a set of measure zero. Thus, if M ⊨ Φ([a1]...[an]), then for almost every i, Mi ⊨ Φ(a1(i)...an(i)).

(⇐) The proof is by induction on the complexity of Φ. If Φ is atomic, then for almost all i, we have that Mi ⊨ Φ(a1(i)...an(i)). In ∏_{i∈I} Mi, then, almost every element of the direct product has Φ(a1(i)...an(i)) true. But since the ultraproduct quotients out by sets of measure zero, we have that M ⊨ Φ([a1]...[an]). The inductive step is similar, but we should note that the existential case uses the axiom of choice: to go from Φ([a1]...[an]) to (∃x)Φ([a1]...[an]), we have to be able to form each ultraelement [ai] by picking some aj(i) for each i ∈ I that satisfies the existential formula. □

The above theorem gives us a powerful tool for disproving that something is first-order expressible. Given some property, if we take the ultraproduct of a bunch of structures almost all of which satisfy that property, we can prove that we cannot formulate that property in a first-order way by showing that the ultraproduct does not satisfy that property. (For properties that we know to be first-order, it gives us a powerful tool for showing what certain ultrastructures look like.) Let's have an example.

Example 4.3. Suppose we are doing graph theory. This is a fairly simple first-order language, where for the constants K we have a set of vertices V and a set of edges E such that for any v1, v2 ∈ V, either E(v1, v2) or ¬E(v1, v2). We stipulate also that (∀v1)¬E(v1, v1), and (∀v1, v2 ∈ V)[E(v1, v2) → E(v2, v1)]. It is fairly easy to show that certain things are first-order expressible, just by stating them thus. For example, we can represent 'has a three-cycle' in a first-order way with the formula: (∃v1, v2, v3)[E(v1, v2) ∧ E(v2, v3) ∧ E(v3, v1)].
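On finite graphs, such a first-order sentence can be evaluated mechanically. Here is a quick sketch (my own adjacency-set encoding, not anything from the paper) that checks the three-cycle formula by letting the three existential quantifiers range over the vertex set:

```python
def cycle_graph(n):
    """The n-cycle: vertices 0..n-1, with an edge between consecutive vertices."""
    V = set(range(n))
    E = {frozenset({i, (i + 1) % n}) for i in V}
    return V, E

def has_three_cycle(V, E):
    """Evaluate (∃v1,v2,v3)[E(v1,v2) ∧ E(v2,v3) ∧ E(v3,v1)] by brute force.
    Edges are two-element frozensets, so E is automatically symmetric, and
    frozenset({a, a}) = {a} is never an edge (no vertex is adjacent to itself)."""
    return any(frozenset({a, b}) in E and
               frozenset({b, c}) in E and
               frozenset({c, a}) in E
               for a in V for b in V for c in V)

assert has_three_cycle(*cycle_graph(3))        # the triangle has a three-cycle
assert not has_three_cycle(*cycle_graph(4))    # the square does not
```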
(Note that we do not need to stipulate here that v1 ≠ v2, etc., because no vertex can be connected to itself with an edge, so that if v1 = v2, the formula is unsatisfiable.) It is a bit more difficult to prove that something isn't first-order expressible, because it takes a lot of time to check every possible formula. The standard way of doing it is using Löwenheim-Skolem. For example, to prove that there is no first-order formula that is true of all and only finite graphs, we can simply use theorem 3.6 above: if a formula Φ is true for all finite graphs, then it has arbitrarily large finite models. But then by 3.6 it has an infinite model of any cardinality, so that there are models of Φ that also include infinite graphs. This is a perfectly fine thing to do, but based on our knowledge of ultraproducts we have a much more amusing way of doing such proofs. Take, for instance, the property of connectedness. Can we find a first-order formula Φ that holds for all and only connected graphs? We first observe that for any n we can indeed find a formula that says 'G has a path from v1 to v2 of length n.' When n = 2, for example, we could say that there is a path of length two from v1 to v2 if (∃v3)(E(v1, v3) ∧ E(v3, v2)). It is easy to generalize this formula to any n. We seek, then, something that says that given any two vertices in G, there is a path of length one between them, or a path of length two, or a path of length three, etc. This is an infinite disjunction, though, and infinite disjunctions are disallowed. But perhaps we are not being clever enough, and an entirely different approach will give the desired property. The following argument will show that this cannot be the case. The strategy is to construct a bunch of connected graphs, take their ultraproduct, and show that the ultraproduct is itself not connected. By theorem 4.2, then, it will follow that connectedness cannot be expressed by any first-order formula. Each Gi will just be the i-cycle.
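Since the language of graphs is so small, these fixed-n formulas can be evaluated by brute force on concrete finite graphs. Here is a minimal Python sketch (the helper names are mine, not part of the paper's formal apparatus); the recursion in `has_path_of_length` is only a meta-level convenience for generating the fixed-n first-order formula described above.

```python
from itertools import product

def make_cycle(n):
    """The n-cycle graph: vertices 0..n-1, symmetric irreflexive edge relation."""
    V = list(range(n))
    E = {(v, (v + 1) % n) for v in V} | {((v + 1) % n, v) for v in V}
    return V, E

def has_three_cycle(V, E):
    """Evaluates (∃v1, v2, v3)[E(v1, v2) ∧ E(v2, v3) ∧ E(v3, v1)] by brute force."""
    return any((a, b) in E and (b, c) in E and (c, a) in E
               for a, b, c in product(V, repeat=3))

def has_path_of_length(V, E, u, w, n):
    """Evaluates '(∃v_1)...(∃v_{n-1})[E(u, v_1) ∧ ... ∧ E(v_{n-1}, w)]'.
    Each fixed n corresponds to one finite first-order formula, as in the text;
    the recursion merely unrolls it."""
    if n == 1:
        return (u, w) in E
    return any((u, v) in E and has_path_of_length(V, E, v, w, n - 1) for v in V)

assert has_three_cycle(*make_cycle(3))       # the 3-cycle satisfies the formula
assert not has_three_cycle(*make_cycle(5))   # the 5-cycle does not
assert has_path_of_length(*make_cycle(6), 0, 3, 3)  # the path 0-1-2-3
```

No single formula in this family can say 'some path of some length exists'; that is exactly the infinite disjunction the text rules out.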
This defines a countably infinite collection of graphs, so there is some nonprincipal ultrafilter µ on the index set N. Furthermore, whatever this µ is, almost all of the Gi must be connected, because all of them are. We can then take the ultraproduct G = ∏_{i∈N} Gi/µ. So, we ask: what does G look like? We can prove that, whatever it is, it cannot have any cycles of any length. There cannot be a cycle of length 3, because almost all Gi have no cycle of length three, and having a cycle of any particular length is first-order expressible. Obviously, then, G has no cycles of length n for any n, because almost all Gi have no cycle of length n. Given a vertex, the property that this vertex has a particular degree is first-order: thus, every vertex in G must have degree two, as does every vertex in every Gi. The only connected graph with no cycles in which every vertex has degree two is the graph isomorphic to Z, with a vertex at every integer and an edge between any two integers n and m where |n − m| = 1. G therefore consists of disjoint copies of Z. But then G is certainly not connected. If connectedness were first-order expressible, then G would be connected; G is not connected; therefore connectedness is not first-order expressible. This concludes what I wish to say about first-order logic on its own. The idea I wish to stress is that, while L1K(=) has quite appealing metalogical results, viz., the soundness and completeness theorems, it places limits on the expressibility of certain crucial mathematical notions. In the next section, I shall develop standard second-order logic, to see what one gains and loses by allowing higher-order quantifiers.
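The finite facts this argument quotes about almost every Gi (connectedness, vertex degree two, the absence of 3-cycles) can all be verified mechanically on the cycles themselves; only the passage to the ultraproduct is non-constructive. A small Python sketch (helper names are my own):

```python
def cycle_graph(i):
    """Gi, the i-cycle: vertices 0..i-1 with edges between cyclic neighbours."""
    V = list(range(i))
    E = {(v, (v + 1) % i) for v in V} | {((v + 1) % i, v) for v in V}
    return V, E

def degree(V, E, v):
    return sum(1 for u in V if (v, u) in E)

def has_cycle_of_length_3(V, E):
    return any((a, b) in E and (b, c) in E and (c, a) in E
               for a in V for b in V for c in V)

def is_connected(V, E):
    """Depth-first search from an arbitrary vertex reaches everything."""
    seen, frontier = {V[0]}, [V[0]]
    while frontier:
        v = frontier.pop()
        for u in V:
            if (v, u) in E and u not in seen:
                seen.add(u)
                frontier.append(u)
    return seen == set(V)

# All Gi are connected; for i > 3, Gi has no 3-cycle and every vertex has
# degree 2. These are the 'almost everywhere' facts transferred to G by Łoś.
for i in range(4, 25):
    V, E = cycle_graph(i)
    assert is_connected(V, E)
    assert not has_cycle_of_length_3(V, E)
    assert all(degree(V, E, v) == 2 for v in V)
```

The non-constructive ingredient is the ultrafilter itself: a nonprincipal µ is supplied by the axiom of choice, not by an explicit construction, so G can be reasoned about only through theorem 4.2.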

5. Second-Order Logic
The completeness theorem for first-order logic suggests, intuitively speaking, that we have developed the 'right' semantics and the 'right' deductive system for a system with quantifiers and variables that can range over objects. As we will show, for second-order logic there can be no such match, no matter what semantics and deductive system we use. Thus, there are competing versions of second-order logic, especially with respect to the semantics. In this paper I will stick to standard second-order logic, which is the most obvious extension of first-order logic to include second-order variables. Other ways of doing it exist, each with its own merits and limitations. The two most notable extensions different from the standard way that I shall develop are, first, a free-variable-only extension, where we allow a free variable to be a relation but we do not allow quantifiers to range over relations, properties, or functions, and second, full second-order logic with the Henkin semantics, where for every model we limit what collections of objects each second-order variable can range over. The Henkin semantics actually yields metalogical results almost as appealing as those for first-order logic, but it comes with most of first-order logic's baggage. I shall therefore limit myself to full second-order logic with the standard semantics.

We begin with L1K, first-order logic without equality. We add to L1K relation and function variables, with universal quantifiers to bind them. We will call standard second-order logic with a set K of constants L2K. In all that follows, capital letters like X and Y will stand for relation variables, lowercase letters like f and g will stand for function variables, and lowercase letters like x and y will stand for object variables. A function or relation variable of the form Xⁿ or fⁿ indicates that the function or relation is n-place. Greek letters stand for formulas, and Γ and ∆ stand for sets of formulas. I shall also hereafter abbreviate things like R(x1, ..., xn) as R⟨x⟩ₙ. We define the existential quantifier in the usual way: (∃X)Φ ↔ ¬(∀X)¬Φ, and (∃f)Φ ↔ ¬(∀f)¬Φ. So, for example, if I wanted to say 'there is a property that applies to nothing', I could: (∃X)(∀x)¬Xx. You may be fretting about the lack of identity as a logical symbol, but we can define it purely in terms of second-order variables:

x = y : (∀X)(Xx ↔ Xy).
For the purposes of this paper, I will not distinguish between properties/relations and sets. Intuitively, there does seem to be a difference in sense, namely, that properties/relations seem intensional while sets are purely extensional. In fact, from a logical standpoint, we can define sets, relations, and functions all in terms of one another (usually one chooses sets to be primitive). I can define f⟨x⟩ₙ = ⟨y⟩ₘ to be the ordered set (which, as we saw above, can be rewritten, albeit not efficiently, as an unordered set) X = ⟨⟨x⟩ₙ, ⟨y⟩ₘ⟩. A relation R⟨x⟩ₙ can be encoded by the set X = ⟨x⟩ₙ. This is where differences in sense can become tricky, at least in non-mathematical settings: the set of people defined by the property of having a liver may well be the same as the set of people defined by the property of having a kidney, but these properties still seem to be different. In the world of mathematics, though, these differences are less important. Our concern here is mostly in the range of these objects and the logical structure that this range implies, not in the subtleties of their intuitive senses. It is easy to see from the above how to create n-order logics: simply introduce new variables to range over sets of (and therefore relations among and functions between) (n − 1)-order variables. It is a fact, however, which I shall by no means prove, that one gains little in expressive power by going beyond second-order variables. Second-order logic is, in a certain sense, enough.
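On a finite domain, this Leibnizian definition of identity can be tested directly, since the standard semantics lets X range over the whole powerset. A brute-force Python sketch (the function names are mine, purely illustrative):

```python
from itertools import chain, combinations

def all_subsets(domain):
    """Every subset of the domain — the range of the second-order variable X."""
    return chain.from_iterable(combinations(domain, r) for r in range(len(domain) + 1))

def leibniz_equal(x, y, domain):
    """x = y : (∀X)(Xx ↔ Xy), with the quantifier brute-forced over the powerset."""
    return all((x in X) == (y in X) for X in map(set, all_subsets(domain)))

d = {0, 1, 2}
assert leibniz_equal(1, 1, d)
assert not leibniz_equal(1, 2, d)  # the singleton {1} separates 1 from 2
```

Note that the definition leans on the standard semantics: in a Henkin model whose relation domain omits the separating singletons, distinct objects could come out 'equal' under it.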

5.1. Deductive Systems. The deductive system D2 for L2K is a straightforward extension of D1. One takes everything from it, and merely adds the following axioms:
• (∀Xⁿ)Φ(Xⁿ) → Φ(T), where T is either an n-place relation variable free for Xⁿ in Φ or a non-logical n-place relation letter.
• From Φ → Ψ(X), you can infer Φ → (∀X)Ψ(X), as long as you make sure that X does not occur free in Φ or in any premise.
Of course, using the above equivalence between functions and relations, one gets similar deductive rules that can abbreviate the unwieldy translation. Basically, in either of the two above axioms, replace each n-place relation variable with an n-place function variable, and change everything else in the obvious way. We will also require the following two axioms, the first being more essential, the latter more contentious:
• Axiom scheme of comprehension: (∃Xⁿ)(∀⟨x⟩ₙ)(Xⁿ⟨x⟩ₙ ↔ Φ⟨x⟩ₙ).
• Axiom of choice: (∀Xⁿ⁺¹)[(∀⟨x⟩ₙ)(∃y)(Xⁿ⁺¹⟨⟨x⟩ₙ, y⟩) → (∃fⁿ)(∀⟨x⟩ₙ)(Xⁿ⁺¹⟨⟨x⟩ₙ, f⟨x⟩ₙ⟩)].
These formal definitions are a bit more opaque than the ones derived mutatis mutandis from D1, so they call for a bit of discussion. The axiom scheme of comprehension says that for every formula Φ of L2K, there is some relation Xⁿ with the same extension. This version of the axiom of choice says, in the antecedent, that for every sequence ⟨x⟩ₙ there is some y such that the sequence ⟨⟨x⟩ₙ, y⟩ satisfies Xⁿ⁺¹. The conditional says that, if this be the case, then there is some function that selects one such y for every ⟨x⟩ₙ. This axiom has a troubled history, but I have implicitly been assuming it (in the metatheory) throughout this paper. For example, the proof of Łoś's theorem depends on it. Other equally interesting things happen if we drop the axiom of choice, but that is a topic for another paper. My aim here is in finding a good way to formalize mainstream mathematics. Recall that we defined first-order equality in terms of relations.
To justify D2, then, we would have to show that we can derive the axioms for equality in D1 from the above axioms and the definition of equality in L2K. This is a good exercise.

5.2. Semantics. I shall present only the standard semantics, the most obvious extension of first-order semantics to L2K.⁶ We begin with the exact same structure: a model of L2K is a structure M = ⟨d, I⟩. A variable assignment s still assigns a member of d to each first-order variable, but now it also assigns a subset of dⁿ to each n-place relation variable, and a function from dⁿ to d to each n-place function variable. The denotation function for the terms of L2K just extends that for L1K:
• If M = ⟨d, I⟩ is a model and s is an assignment on M, the denotation of fⁿ⟨t⟩ₙ in M, s is the value of the function s(fⁿ) at the sequence of members of d denoted by the members of ⟨t⟩ₙ.
The relation of satisfaction is also an extension of that for L1K:
• If Xⁿ is a relation variable and ⟨t⟩ₙ is a sequence of n terms, then M, s ⊨ Xⁿ⟨t⟩ₙ if the sequence of members of d denoted by the members of ⟨t⟩ₙ is an element of s(Xⁿ).
• M, s ⊨ ∀XΦ if M, s′ ⊨ Φ for each assignment s′ that agrees with s at every variable except possibly X.

⁶The other two semantics in common use for L2K work roughly as follows: we can define the relation of quasi-satisfiability such that M, s quasi-satisfies Φ if M, s′ ⊨ Φ for every assignment s′ that agrees with s on the first-order variables. This definition can become the basis for a semantics for second-order logic with only free (no bound) second-order variables. The Henkin semantics for full L2K (with bound second-order variables) allows relation variables to range only over a fixed subset of relations on the domain, not necessarily all possible relations (and similarly for functions). So a Henkin model of L2K is a structure ⟨d, D, F, I⟩, where d is still the domain and I is still the interpretation function, but D and F are collections of relations and functions, respectively. The interpretation function is only allowed to assign to each relation variable a relation in D, and to each function variable a function in F. These limitations have some useful applications and appealing metaresults. In particular, the soundness and completeness theorems hold for L2K with the Henkin semantics. Unfortunately, this is only the case because the Henkin semantics effectively makes L2K identical to a many-sorted version of L1K, and therefore no more expressive.

• M, s ⊨ ∀fΦ if M, s′ ⊨ Φ for each assignment s′ that agrees with s at every variable except possibly f.
Everything else works exactly as you would expect. Φ is valid if for every M and s, we have that M, s ⊨ Φ. Φ is satisfiable if there is some M and s that satisfy Φ. Φ is a semantic consequence of Γ if every M, s that satisfy Γ also satisfy Φ, i.e. if Γ ∪ {¬Φ} is not satisfiable. There are a few other details to work out, but everything pretty much translates mutatis mutandis from L1K semantics to L2K standard semantics. Let us now look at the metaresults for second-order logic.

6. Second-Order Metaresults
As we will see, L2K is much less metalogically satisfying than L1K. Of particular consternation is the lack of completeness. There is one glimmer of hope, however, which is that our deductive system D2 is at least sound.
Theorem 6.1. Soundness of L2K: Let Γ be a set of formulas and Φ a single formula in L2K. If Γ ⊢D2 Φ, then Γ ⊨ Φ in the standard semantics.
Proof. Like the soundness proof for D1 and L1K, this proof is relatively straightforward. We dutifully check every axiom and rule of inference. Unlike the proof for D1, however, it will seem more dubious. I shall check only the two substantially new rules: the axiom scheme of comprehension and the axiom of choice. What follows will, I hope, seem very much like cheating and circular argumentation. Indeed, any attempt to 'prove' or 'validate' the axiom of choice should fall on skeptical ears. But we are not validating the axiom out of thin air as a metaphysical principle; we are showing that, given that it is an intuitively true axiom, the rule holds in the formal language L2K. Basically, then, to validate the axiom of choice for L2K, I am simply going to invoke that principle in the metatheory. Thus, it will seem as though I am using something to prove itself. But really this is not the case—or, if it is the case, then it is also the case for the proof of soundness for L1K. Recall that in that proof, we proved the validity of, say, modus ponens by saying:
Case Three: We obtain Φ via modus ponens from Ψ and Ψ → Φ, where Γ ⊨ Ψ and Γ ⊨ (Ψ → Φ). But those together semantically imply Γ ⊨ Φ, since every model of Ψ and Ψ → Φ must be a model of Φ.
This proof, too, uses the very principle it is trying to validate, by appealing to the fact that any model of Ψ and Ψ → Φ must be a model of Φ. The point is that the principle we invoke is in the metatheory, while the principle we prove is in the formal system itself. The same shall go for these two new axioms in L2K.
The aim is not to show that these axioms are metaphysically justified, but rather to show that they are allowed with respect to the semantics. Thus, this proof is not structurally different from that for D1; in order to argue that D1 retains epistemological priority, one would have to argue that the very metatheoretic principle of modus ponens itself is epistemologically prior to the axiom of choice or comprehension. That investigation leaves the realm of logic, and enters that of metaphysics. Anyhow, to business. Let us first prove that the axiom scheme of comprehension is sound. We want to show that, given a formula Φ with n free variables, we can find a relation Xⁿ with the same extension, i.e. for any ⟨x⟩ₙ, Xⁿ⟨x⟩ₙ holds if and only if Φ⟨x⟩ₙ is true. We will use the principle that every formula in the metatheory determines a relation, namely, the relation of satisfying that formula. We also assume (in the metatheory) that if d is a set and P a property, there is some set containing all and only the members of d that satisfy P. Given these two principles, the validity of the axiom scheme follows immediately: given Φ, there is some relation in the metatheory co-extensive with Φ, and we can restrict the domain of that relation to that of our original model by the above separation property. That will yield a perfectly good relation Xⁿ in M that has the same extension as Φ. A similar argument uses the axiom of choice to prove the validity of the axiom of choice. Start with any (n + 1)-place relation Xⁿ⁺¹ in which, for whatever first n variables we choose, there is some (n + 1)st variable y satisfying Xⁿ⁺¹. We want to show that there is some fⁿ that will pick out one such y for every sequence ⟨x⟩ₙ. But since we have such a y for every ⟨x⟩ₙ, we have a bunch of non-empty sets, one for each ⟨x⟩ₙ. By the (metatheoretic!) axiom of choice, there is some function that will pick out one such y for every appropriate sequence.
And by the principle of separation we used in the above paragraph, we can restrict the domain of fⁿ to whatever model we started with, without changing the desired property that this function satisfies Xⁿ⁺¹⟨⟨x⟩ₙ, f⟨x⟩ₙ⟩ for any ⟨x⟩ₙ. Such a function therefore exists in our model, and the axiom of choice (in D2) is semantically implied by anything. □
The natural thing to inquire now is whether L2K, like L1K, is complete. The answer will be negative, by Gödel's first incompleteness theorem. But this failure of completeness will at the same time give a quite appealing result: we can successfully characterize arithmetic in pure L2K. I assume familiarity with the first-order theory of arithmetic. The set of non-logical symbols is A = {0, s, +, ∗, <}. The only thing that changes in the second-order version is: instead of having an axiom scheme for induction, whereby we have to enumerate a separate axiom for each sentence in the language, we can simply state the full-blooded induction axiom:

(∀X)[(X0 ∧ (∀x)(Xx → Xsx)) → (∀x)Xx].
We already know that L1A has non-standard models—even uncountable models. The following will show that this slight embarrassment does not occur in L2A.
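The force of the single second-order axiom is that (∀X) ranges over every subset of the domain, and on a finite toy structure this can be checked exhaustively. The sketch below is mine and its 'clock' structures are not models of arithmetic (their successor wraps around); they serve only to show the axiom being brute-forced over the full powerset.

```python
from itertools import chain, combinations

def subsets(domain):
    """All subsets of a finite domain — the range of the quantifier (∀X)."""
    return chain.from_iterable(combinations(domain, r) for r in range(len(domain) + 1))

def induction_holds(domain, zero, succ):
    """Checks (∀X)[(X0 ∧ (∀x)(Xx → Xsx)) → (∀x)Xx] in the standard semantics."""
    for X in map(set, subsets(domain)):
        base = zero in X
        step = all((x not in X) or (succ(x) in X) for x in domain)
        if base and step and X != set(domain):
            return False  # a proper inductive subset: induction fails
    return True

# Z/5Z with s(x) = x+1 mod 5: successor reaches everything from 0.
assert induction_holds(range(5), 0, lambda x: (x + 1) % 5)
# Same domain, but 4 is unreachable from 0: {0,1,2,3} is a proper inductive subset.
assert not induction_holds(range(5), 0, lambda x: (x + 1) % 4)
```

The first-order scheme, by contrast, only ever constrains the definable subsets, which is what leaves room for non-standard models.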

Theorem 6.2. Categoricity of Arithmetic (Dedekind): Let M1 = ⟨d1, I1⟩ and M2 = ⟨d2, I2⟩ be two models of L2A. Let 0i, si, +i, and ∗i be the interpretations of the constants for i = 1, 2. It follows that M1 and M2 are isomorphic.

Proof. The strategy is to define a particular subset of d1 × d2 and prove that it is, first, a function, and second, an isomorphism. This is a routine proof, but an important result. Let S ⊆ d1 × d2 be called successor-closed if:

• ⟨01, 02⟩ ∈ S
• If ⟨a, b⟩ ∈ S, then ⟨s1a, s2b⟩ ∈ S.

Let f be the intersection of all successor-closed subsets of d1 × d2. f cannot be empty, because d1 × d2 is itself successor-closed and ⟨01, 02⟩ is in every successor-closed subset. Now, to the task of showing that f is not only a function but also an isomorphism. Four lemmas establish the theorem, and we make ample use of inductive arguments.

(1) For every a ∈ d1 there is some b ∈ d2 such that ⟨a, b⟩ ∈ f. Let P be the set in d1 for which this property holds; we must show that P contains everything in d1. Obviously, 01 ∈ P, since ⟨01, 02⟩ ∈ f. Now, let a ∈ P. There is some b ∈ d2 such that ⟨a, b⟩ ∈ f. But since f is successor-closed, ⟨s1a, s2b⟩ ∈ f. Thus s1a ∈ P, and by induction, P contains every member of d1.
(2) If ⟨a, b⟩ ∈ f and ⟨a, c⟩ ∈ f, then b = c. (Thus f is a function.) Let P be the subset of d1 for which the desired property holds. Suppose by contradiction that 01 is not in P. Then there is some c ≠ 02 such that ⟨01, c⟩ ∈ f. Set S = d1 × d2 − {⟨01, c⟩}. S is still successor-closed, but since ⟨01, c⟩ is not in S, it cannot be in the intersection of all successor-closed subsets, a contradiction. Thus 01 ∈ P. Now, let a ∈ P, and let b be the unique element of d2 such that ⟨a, b⟩ ∈ f. Clearly ⟨s1a, s2b⟩ ∈ f. Now just apply the same trick as before: suppose there is some c ≠ s2b such that ⟨s1a, c⟩ ∈ f. Let S′ = f − {⟨s1a, c⟩}. Now, clearly s1a ≠ 01, so ⟨01, 02⟩ ∈ S′. Say ⟨u, v⟩ ∈ S′. Then ⟨u, v⟩ ∈ f, and so ⟨s1u, s2v⟩ ∈ f. If u ≠ a, then s1u ≠ s1a, and so ⟨s1u, s2v⟩ ∈ S′. If, on the contrary, u = a, then since a ∈ P, v = b, and s2v = s2b ≠ c, we have ⟨s1u, s2v⟩ ∈ S′. Then S′ is successor-closed and f ⊆ S′, contradicting that f is the intersection of all successor-closed subsets.
(3) By (1) and (2), we can start writing f(a) = b instead of ⟨a, b⟩ ∈ f. f is one-to-one and onto d2. Let P be the subset of d2 such that for everything in P there is something in d1 that f sends to it. Clearly 02 ∈ P, because f(01) = 02. Let b ∈ P. Then there is some a ∈ d1 such that f(a) = b. But since f is successor-closed, f(s1a) = s2b, and thus s2b ∈ P. Thus f is onto. Now, let P be the subset of d1 such that, for any a, b ∈ P, if f(a) = f(b), then a = b. The same argument as (2) shows that 01 ∈ P. Now let a ∈ P and everything less than a be in P.
Then suppose by contradiction that there is some c ≠ s1a such that f(c) = f(s1a). Let S = f − {⟨c, f(c)⟩}, and use the same tricks as before to show that S must be successor-closed, contradicting that f is the intersection of all successor-closed subsets.
(4) f preserves the structure of the models. Obviously f(01) = 02. Since f is successor-closed, we must have that f(s1a) = s2f(a). We still have to check that f(a +1 b) = f(a) +2 f(b), and similarly for ∗. The arguments for these include no ideas not previously used in (1), (2), or (3). Thus, f is an isomorphism. □
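Since the least successor-closed subset is a bottom-up closure, a finite truncation of f can actually be computed for two concrete successor structures. The Python sketch below uses two toy models of my own (integers and unary numerals) and checks the functionality and injectivity lemmas on the truncation:

```python
def dedekind_map(zero1, succ1, zero2, succ2, steps):
    """Closure of {(0_1, 0_2)} under (a, b) -> (s_1 a, s_2 b): a finite
    truncation of the least successor-closed subset of d1 x d2."""
    pairs = set()
    a, b = zero1, zero2
    for _ in range(steps):
        pairs.add((a, b))
        a, b = succ1(a), succ2(b)
    return pairs

# Model 1: naturals as ints; Model 2 (illustrative): naturals as unary numerals.
pairs = dedekind_map(0, lambda n: n + 1, "0", lambda t: "s" + t, 6)
assert (0, "0") in pairs and (3, "sss0") in pairs
# Lemmas (2) and (3) on the truncation: the closure is a function, and injective.
assert len({a for a, _ in pairs}) == len(pairs) == len({b for _, b in pairs})
```

Of course, the theorem itself concerns the full infinite models; the computation only illustrates why the closure of ⟨01, 02⟩ under the paired successors is forced to be a structure-preserving bijection.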



This on its own is a very appealing result for L2K. We can finally, it seems, give an axiomatization that captures exactly what we mean when we say 'natural number'. (A similar construction works to show the categoricity of the real numbers.) It also refutes Löwenheim-Skolem for L2K: here we have an axiom system all of whose models are isomorphic and therefore countable (since the standard model N is countable). In fact, we can give a complete characterization of cardinality for L2K. The following sentence holds for all and only infinite sets:

INF(X): (∃f)[(∀x)(∀y)(f(x) = f(y) → x = y) ∧ (∀x)(Xx → Xf(x)) ∧ (∃y)(Xy ∧ (∀x)(Xx → fx ≠ y))]

This says that there is a one-to-one function from X to X whose range is a proper subset of X. And using it, we can characterize finitude:

FIN(X): ¬INF(X).
Neither of these formulas has any non-logical terminology. And I can similarly characterize any cardinality by simply creating a formula that says: X has cardinality at least κ and X is not isomorphic to a set of size κ. This shows that both Löwenheim-Skolem theorems fail for L2K.
Theorem 6.3. The standard semantics for L2K is not compact.
Proof. This theorem follows from the above. I can simply find a set of sentences, every finite subset of which is satisfiable, but which are not altogether satisfiable. The categoricity of arithmetic will serve us well: take the exact same construction we routinely use to prove that there are non-standard models of L1A, and add an infinite number of axioms, which say:
• (∃c)(c ≠ 0)
• (∃c)[(c ≠ 0) ∧ (c ≠ s0)]
• (∃c)[(c ≠ 0) ∧ (c ≠ s0) ∧ (c ≠ ss0)]
• ...
Every finite subset of these axioms is satisfiable in L2A (since they are true in the standard and only model), but all of them together are not, or else arithmetic would not be categorical. If L2K were compact, then L2A would be compact; L2A is not compact; therefore L2K is not compact. □
The last metalogical result we shall state is the incompleteness of D2—indeed, of any deductive system for L2K.
Theorem 6.4. Let D be any effective deductive system that is sound for L2A. Then D is not complete: there is a logical truth that is not a theorem of D. A fortiori, D2 is incomplete.
Proof. Let AR be the axioms of arithmetic, and let T be the set of sentences Φ with no relation or function variables such that ⊢D AR → Φ. Since D is effective, the set T is recursively enumerable. And since D is sound, every element of T is true of N. By Gödel's incompleteness theorem (1931), the set of true first-order sentences of arithmetic is not recursively enumerable. So, pick Ψ to be a true first-order sentence that is not in T. Then AR → Ψ is not provable in D, but AR → Ψ is a semantic logical truth. □
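The claim behind FIN can be model-checked by brute force on small domains: enumerate every self-map f of X and test the matrix of INF. On a finite set every injective self-map is surjective, so INF always fails, which is exactly what FIN records. A Python sketch (the function name is mine):

```python
from itertools import product

def satisfies_INF(X):
    """INF(X): ∃f [f injective ∧ f maps X into X ∧ ∃y ∈ X outside the range of f]."""
    X = list(X)
    for values in product(X, repeat=len(X)):   # every function f: X → X
        f = dict(zip(X, values))
        injective = len(set(f.values())) == len(X)
        misses_some_y = any(y not in f.values() for y in X)
        if injective and misses_some_y:
            return True
    return False

# On a finite set, injective self-maps are surjective, so INF fails: FIN holds.
for n in range(1, 5):
    assert not satisfies_INF(range(n))
```

On an infinite set like N, by contrast, the successor map witnesses INF, since it is injective and never hits 0; but that case is of course beyond any finite enumeration.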

7. Conclusion: Living without Completeness
The aim of the entire preceding investigation was to sort out, in a hasty and preliminary way, which is the 'right' logical language for 'grounding' (or: 'in which to do') mathematics. The single quotation marks are essential. Of course, this question is already at once vague and philosophically vested: it presupposes, in the first place, that mathematics is something that has somehow to be 'grounded', and it leaves the question of what constitutes the 'right' way to do it completely ambiguous. To be maximally foundational, this question must needs come after the more interesting but broader ontological inquiry: what, after all, is mathematics?

Is it the sort of thing that requires any grounding? What level of formalism is required for a compelling mathematical proof? If the formal language is to 'ground' anything, what in turn grounds it? And what, in God's name, does the existential quantifier mean? (This is related to the question: what is the ontological status of the elements in the domains of our models?) Different answers to this question will yield different views on how much one should allow as logic. Indeed, different answers will bring forth different views on how much to distinguish mathematics from logic proper in the first place.
In order to get a bit of a clearer view on how to begin to get a hold of these questions, I wish to delineate two rough conceptions of mathematics that are floating around in the entire above inquiry. The first, which we might call 'foundationalism', holds that mathematics is indeed something that requires a firm ground, and that the way to provide this ground—basically, to eliminate as much of the Kantian imagination as possible from the tools required for mathematics—is to develop a pure logical language in which mathematical proofs become purely analytic, not synthetic. We build mathematics out of things like modus ponens, which are supposed to be eternally valid. The second view denies the need for firm 'foundations', whatever those could be, and makes the somewhat more modest claim that the purpose of logic is something more akin to transparency; we wish to develop a formal language in order to state proofs to which we already intuitively assent in the most transparent way possible. We might call those in this camp 'semanticists'. On this view, mathematics becomes more an exercise, not of proving eternal truths from analytic a priori axioms, but rather of investigating the meanings of our already substantial ideas of various concepts.
Logic provides a formal language in which these concepts can, as it were, be investigated and modified to the fullest extent that we are capable of understanding. Here, modus ponens is valid because, upon understanding the meaning of 'if...then' clauses, it is impossible for us to so much as imagine a world in which the principle did not hold. It makes no claim that modus ponens is justified over and above our conceptual understanding. Note that these two views are related to, but not identical with, a sort of objectivist/subjectivist bent. The foundationalists will be more comfortable with the idea that the epistemic warrant for mathematical truths transcends the limitations of our minds, whereas the semanticists may well hold that hoping for such an epistemic warrant is foolhardy and that the most we can hope for is to discover the boundaries of our conceptual understanding. Nonetheless, these dichotomies are not equivalent: in particular, dismissing the possibility of an epistemic warrant that transcends the structure of our minds does not imply that everything in the discourse is purely subjective. We may well be beholden to the limits imposed upon our understanding in an unequivocal way. A third view, which I shall not discuss much, claims to dismiss all of these questions: mathematics has no purpose outside of itself, and any attempt to secure foundations for it or even to investigate its 'meaning' is speculative stupidity. A mathematician's world is a playground of axioms, where one makes up certain rules arbitrarily and sees what he can build out of them. Despite the anti-philosophical prejudice of this camp, its ontological presuppositions are not substantially different from the semanticist view; as long as they are doing mathematics proper and not speaking gibberish, they are following certain rules of inference like modus ponens that, if only implicitly, have some special inviolability (if only with respect to our mental dispositions).
What distinguishes them from the semanticists is merely their attitude toward philosophy. I do not intend, in this paper, to adjudicate between these two views on a metaphysical level. I wish merely to align them with what should naturally be their stances on which type of logic is preferable as a 'foundation' or 'language' for mathematics, based on the above metalogical results, and then to point out tempting epistemological inferences that nonetheless cannot be made on the basis of logic. The focus throughout will be on completeness. If one requires for mathematics an epistemic warrant that is supposed to transcend the particular disposition of the human mind, one naturally wants to make as few ontological presuppositions as possible. Given this requirement, there are two marks against higher-order languages that stand out immediately. The first is precisely that higher-order languages are so expressive. Since second-order logic can characterize arithmetic, real analysis, and even set theory, the reasoning goes, it must have all of the results of those theories as underlying ontological presuppositions. Indeed, this line of reasoning leads Quine to dismiss second-order logic as "set theory in sheep's clothing." Second-order logic cannot provide a ground for mathematics, because, in a certain sense, it already is mathematics. Using it to provide a ground would be patent circularity. The second and perhaps even more fundamental reason that a foundationalist grants first-order logic epistemic priority is that it is complete. Gödel's completeness theorem, on this view, validates L1K(=) as an ontologically presuppositionless, completely self-validating system. It is, in a sense, precisely the system in which to express pure analyticity and to rid the playing field of the need for anything metaphysically stronger than the pure understanding (in particular, the pure intuition).
Since anything significantly more expressive than L1K cannot be complete, it is univocally the correct language for grounding mathematics. The syntax and the semantics align perfectly, and therefore all that L1K does, in the end, is to bring tautologies to light, without making any claims about what sorts of things there are. It is a perfect tool for proving, given some presuppositions about what there is, what else must follow. Anything more expressive than it is relegated to the realm of incompleteness, where there are scary valid formulas that we cannot prove, and where we cannot prove the consistency of our language within itself. Using such an incomplete language requires a certain degree of faith, defeating the goal of trying to place mathematics beyond all possible doubt. Not only does L2K require a heavy-duty ontology; it also cannot have a self-validating match between the syntax and the semantics. We should therefore limit ourselves as much as possible to L1K(=), duly accepting the baggage that comes with it. On a non-foundationalist view, however, these appeals lose some of their luster, and the limitations on what L1K can express become more damning. There is little question that, if the point of mathematical logic is to provide a formal language to capture the intuitively compelling arguments that mathematicians regularly make, L2K does a better job than L1K. L1K is simply not strong enough to express many of the concepts that we regularly use unproblematically, like cardinality and graph connectedness. Some things that we can express can be stated only clumsily, like, for instance, the infinite list of axioms that comprises the induction scheme. We often afford L1K priority because of the completeness theorems and some of the other metalogical results.
But if the point of logic is not to ‘ground’ mathematics but rather to find a formal language to encapsulate it, incompleteness no longer creates as damning a problem. Less epistemological weight is placed on logic, and therefore the fact that there are semantic truths that we can never prove is less frightening. If the point is only to find a language in which to do mathematics, then, in a sense, we simply must resign ourselves to an incomplete language. The mathematics that we need that language to express is itself necessarily incomplete.

We face, therefore, a trade-off in choosing between languages. On the one hand, L1K(=) is complete, consistent, and compact, but cannot express many things in mathematics that get treated unproblematically. On the other, L2K can capture pretty much any notion we want to use in mathematics, but we pay the price of incompleteness and the failure of compactness. Deciding which to take as the more appropriate logic for doing mathematics depends largely on what one aims for logic to do. I conclude by raising a few concerns for each view, surveying possible arguments by which one could try to use some metalogical results to validate one view over the other.

Recall that one of the appeals of L1K, to foundationalists, is that it is ‘ontologically presuppositionless’. The evidence for this claim lies mostly in the completeness theorem. But let us look more closely at how we actually constructed L1K and proved the completeness theorem. From nothing comes nothing; we had to start out with a fairly rich metalanguage in order to construct L1K in the first place. I claim that the metalanguage required for L1K is no less ontologically rich than that required for L2K. In a sense, this will imply that, epistemologically speaking, L1K and L2K stand on more or less the same ground.
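One example of the metatheoretic richness at issue is Łoś's theorem from Section 4.3. In a standard textbook formulation (the ultraproduct notation here is generic, not quoted from earlier sections), it reads:

```latex
% Łoś's theorem: for an ultrafilter U on an index set I and structures A_i,
% a formula holds in the ultraproduct iff it holds in U-many factors:
\prod_{i \in I} \mathfrak{A}_i \,/\, \mathcal{U} \;\models\; \varphi\bigl[[f]\bigr]
  \quad\Longleftrightarrow\quad
  \{\, i \in I : \mathfrak{A}_i \models \varphi[f(i)] \,\} \in \mathcal{U}

% The step from \varphi to (\exists x)\varphi is where choice enters: given
% \{\, i : \mathfrak{A}_i \models (\exists x)\varphi \,\} \in \mathcal{U},
% one must simultaneously choose a witness in each factor structure to
% assemble an element of the ultraproduct satisfying \varphi.
```

It is precisely the existential step, marked in the second comment, that requires the full axiom of choice in the metatheory.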
Though L1K itself is not strong enough to say things about cardinality, or to do set theory, we implicitly assumed these tools while constructing L1K. K itself is a good old-fashioned set, which we allow to be of any cardinality. The entire construct of cardinality is presupposed by the Löwenheim–Skolem theorems. Łoś's theorem requires nothing less than the full axiom of choice for the transition from Φ to (∃x)Φ. Indeed, even Gödel's completeness theorem, the result that is supposed to guarantee the self-validating nature of L1K, must presuppose, in the metatheory, that there are formulas with infinite models. Thus, the same metatheoretic principles needed for the construction of L2K are needed for L1K. We use little in the metatheory of L2K that we do not use for L1K: the axiom of choice, the validity of basic rules of semantic inference, and so on.

Thus, to argue that L1K retains epistemological priority because of its limited ontology or completeness, one has somehow to show that, from an epistemological standpoint, a formal language can transcend the metalanguage that was used to construct it. There may well be a good way of doing this, but I cannot see immediately what it would be, and to deal with it properly would doubtless take far more room than I have here. Completeness may well still give L1K some sort of priority, but the argument for this inference must somehow show that a formal language can transcend the metalanguage required for its construction. Until one can do this, one might as well abandon completeness and embrace L2K for its expressive power. If what results from L1K is no more privileged than what results from L2K, there is no need to limit our discourse to the former.

Acknowledgments. I wish to thank my mentors, Jonathan Stephenson and Matthew Wright, for their orations on various fascinating mathematical topics throughout the REU. I wish in particular to thank Matthew Wright for providing feedback on this paper.
