
PREFACE

This special issue contains invited papers from the conference “Foundations of the Formal Sciences I”, which took place at the Humboldt-Universität zu Berlin in May 1999. The conference was part of the conference series “Foundations of the Formal Sciences”, which has been successfully bringing together young researchers from the different formal sciences, first at its inaugural meeting in Berlin and since then at the second meeting in Bonn in 2000. The reader can find more about the general aim of the conference series in the introductory paper “The Formal Sciences: Their Scope, Their Foundations, and Their Unity” by the first editor in this issue. As laid out in that article, we asked the participants to give very special one-hour talks: they were to address an interdisciplinary, non-specialist (yet informed) audience, but they were not to be mere surveys; instead, we expected understandable descriptions of the main techniques and lists of open problems with an interdisciplinary character.

The conference was a full success: researchers from fields as far apart as history of mathematics, applied computer science and higher set theory were able to talk to each other, and reached some level of general understanding of each other’s fields. We asked the authors for written versions of their talks that capture this interdisciplinary spirit, and indeed we received very fine articles that are written for the serious non-specialist reader with a research interest. The papers in this volume are not just compilations of interesting results from these areas but also fine explanations of the major concepts and proof techniques, as well as descriptions of the important research projects in these fields. Again, we refer the reader to the introduction, which contains a couple of remarks on how to use this volume.
We shall give the list of participants and the conference schedule, but first this is the place to express our gratitude towards the people without whom we would not have been able to organise the conference. First of all, we have to thank the Studienstiftung des deutschen Volkes and their representatives Dr. Hans-Ottmar Weyand and Dr. Niels Weidtmann. Their financial help was instrumental for the realization of the project as a whole. At the Humboldt-Universität zu Berlin, we thank the graduate students Mr Thoralf Räsch and Mr Michael Bruening, who provided technical assistance during the conference, and the administrative assistant of the Lehrstuhl für Mathematische Logik, Mrs Christa Dobers, who helped with the production of the conference programme. The Humboldt-Universität zu Berlin and in particular the Lehrstuhl für Mathematische Logik provided the lecture rooms and some technical equipment, for which we would like to thank Ronald Jensen as representative of all persons involved.

We would also like to thank Rudolf Rijgersberg of Kluwer Academic Publishers and John Symons, Meredith Enish, and Jaakko Hintikka for their interest and support, and for the rare opportunity to fill a double issue of this journal with the proceedings of a conference.

Synthese 133: 1–4, 2002. © 2002 Kluwer Academic Publishers. Printed in the Netherlands.

PARTICIPANTS

• Susana Balfego, Humboldt-Universität zu Berlin, Lehrstuhl für Mathematische Logik, Unter den Linden 6, D – 10099 Berlin
• Sebastian Bauer, Charlottenstraße 25, D – 13156 Berlin
• Christoph Benzmüller, Universität des Saarlandes, Fachbereich Informatik (FB 14), AG Deduktionssysteme, D – 66041 Saarbrücken
• Manuel Bodirsky, Rotenbergstraße 12, D – 66111 Saarbrücken
• Michael Bruening, Humboldt-Universität zu Berlin, Lehrstuhl für Mathematische Logik, Unter den Linden 6, D – 10099 Berlin
• Wolfgang Burr, Westfälische Wilhelms-Universität Münster, Institut für mathematische Logik und Grundlagenforschung, Einsteinstraße 62, D – 48149 Münster
• Antje Christensen, Klintholmvej 3, 2. th., 2700 Brønshøj, Denmark
• Hartmut Fitz, Wichertstraße 64/III, D – 10439 Berlin
• Gerhard Fotheringham, Technische Universität Berlin, Sekretariat TIB 4/2–1, Gustav-Meyer-Allee 25, D – 13355 Berlin
• Stefan Geschke, Freie Universität Berlin, Fachbereich Mathematik und Informatik, Arnimallee 2–6, D – 14195 Berlin
• Kai Hauser, Humboldt-Universität zu Berlin, Lehrstuhl für Mathematische Logik, Unter den Linden 6, D – 10099 Berlin
• Jan Jürjens, The University of Edinburgh, Laboratory for Foundations of Computer Science, James Clerk Maxwell Building, Room 3311, King’s Buildings, Mayfield Road, Edinburgh EH9 3JZ, Scotland
• Reinhard Kahle, Wilhelm-Schickard-Institut für Informatik, Eberhard-Karls-Universität Tübingen, Sand 13, D – 72076 Tübingen
• Peter Koepke, Rheinische Friedrich-Wilhelms-Universität Bonn, Mathematisches Institut, Beringstraße 6, D – 53115 Bonn
• Eberhard Knobloch, Technische Universität Berlin, Institut für Philosophie, Wissenschaftstheorie, Wissenschafts- und Technikgeschichte, Ernst-Reuter-Platz 7, D – 10587 Berlin
• Oliver Kutz, Erich-Weinert-Straße 26, D – 10439 Berlin
• Heiko Mantel, Deutsches Forschungszentrum für Künstliche Intelligenz GmbH, Stuhlsatzenhausweg 3, D – 66123 Saarbrücken
• Adrian R. D. Mathias, Institut de Recherche en Mathématiques et Informatique Appliquées, Université de la Réunion, 15 avenue René Cassin, BP 7151, F – 97715 Saint-Denis
• Ralph Matthes, Ludwig-Maximilians-Universität München, Institut für Informatik, Lehrstuhl für Theoretische Informatik, Oettingenstraße 67, D – 80538 München
• Guy Merlin Mbakop, Humboldt-Universität zu Berlin, Institut für Mathematik, Unter den Linden 6, D – 10099 Berlin
• Stephan Merz, Ludwig-Maximilians-Universität München, Institut für Informatik, Lehr- und Forschungseinheit Programmierung und Softwaretechnik, Oettingenstraße 67, D – 80538 München
• Hans Jürgen Prömel, Humboldt-Universität zu Berlin, Institut für Informatik, Lehr- und Forschungsgebiet Algorithmen und Komplexität, Unter den Linden 6, D – 10099 Berlin
• Thoralf Räsch, Humboldt-Universität zu Berlin, Lehrstuhl für Mathematische Logik, Unter den Linden 6, D – 10099 Berlin
• Michael Stolz, Eberhard-Karls-Universität Tübingen, Mathematisches Institut, Auf der Morgenstelle 10, D – 72076 Tübingen
• Christian Tapp, Hensenstraße 168, D – 48161 Münster
• Andreas Weiermann, Westfälische Wilhelms-Universität Münster, Institut für mathematische Logik und Grundlagenforschung, Einsteinstraße 62, D – 48149 Münster

SCHEDULE

Friday, May 7, 1999

13:00 – 13:20 Opening
MATHEMATICS I (Chair: Prömel)
13:30 – 14:20 Weiermann
14:30 – 15:20 Geschke
Coffee Break
HISTORY & PHILOSOPHY I (Chair: Koepke)
16:00 – 16:50 Stolz
17:00 – 17:50 Knobloch
Coffee Break
18:15 – 18:40 Mathias

Saturday, May 8, 1999

COMPUTER SCIENCE I (Chair: Kahle)
8:30 – 9:20 Prömel
9:30 – 10:20 Matthes
Coffee Break
11:00 – 11:50 Jürjens
LUNCH BREAK
HISTORY & PHILOSOPHY II (Chair: Rudolph)
13:30 – 14:20 Christensen
14:25 – 15:15 Hauser
Coffee Break
COMPUTER SCIENCE II (Chair: Koepke)
15:30 – 16:20 Mantel
16:30 – 17:20 Merz
Coffee Break
17:30 – 18:20 Benzmüller

Sunday, May 9, 1999

MATHEMATICS II (Chair: Weiermann)
8:30 – 9:20 Kahle
9:30 – 10:20 Burr
Coffee Break
11:00 – 11:50 Koepke
12:00 – 12:15 Closing

B.L., F.R.
Bonn, February 2001

BENEDIKT LÖWE

THE FORMAL SCIENCES: THEIR SCOPE, THEIR FOUNDATIONS, AND THEIR UNITY

Organizing a conference series with the title “Foundations of the Formal Sciences” obliges us to fill the terms “Formal Sciences” and, in particular, “Foundations of the Formal Sciences” with meaning. There are two very natural answers to the question “What are the Formal Sciences?”:

• Answer 1: “There is a profound duality in the classification of sciences according to their scientific approaches: some sciences are empirical, some are formal. The former deal with predictions and their falsification, the latter with the understanding of systems without empirical component, be it man-made systems (literary systems, the arts or social systems) or formal systems”.
• Answer 2: “Formal sciences are those that deal with the deductive analysis of formal systems (i.e., systems independent of direct human influence)”.

These two answers differ more from each other than it seems at first sight. Answer 2 is much stricter about what it allows to be called a formal science: for example, the literary sciences count as formal in the sense of Answer 1. Nonetheless, they do not (mainly) deal with formal systems, so they are not formal sciences in the sense of Answer 2.

Answer 1 rests on the traditional dichotomy between natural sciences and liberal arts. Stressing this dichotomy leads to ignoring those formal sciences that we want to talk about; even worse, the fact that anything computational is often seen as the handmaiden of the empirical sciences shifts mathematics and computer science towards the borderline between formal and empirical sciences in the sense of Answer 1.¹

Habermas (1967) mentions formal thinking as a method of analysis in the social sciences and the liberal arts, and identifies the combination of the lack of immediate empirical facts and their nomological character as the peculiarity of the social sciences. But the reader will search in vain for an unambiguous mention of the formal method as one of the fundamental parts of the sciences.

Synthese 133: 5–11, 2002. © 2002 Kluwer Academic Publishers. Printed in the Netherlands.

In the following, we shall not be using the traditional dichotomy, but we shall be talking about formal sciences in the sense of Answer 2. In this sense, and for the purpose of this paper and of the whole volume, we can (non-exclusively) subsume the subjects mathematics, theoretical philosophy, theoretical computer science, and formal linguistics under the term “Formal Sciences”.

It is much harder to say something about the term “Foundations”, because the identification of foundational work cannot be done at this level of abstraction. Instead, we shall use the term “Foundations” as a negative criterion. Obviously, none of the formal sciences is entirely formal in its methods and goals: all research done in each of those subjects is motivated, exemplified or corroborated by empirical or even hermeneutic methods. So we get a broad spectrum of non-formal factors in the research done in the formal sciences. When we talk about “Foundations of the Formal Sciences”, we want to exclude that part of the formal sciences that is closer to empirical methods than to formal methods.

The paradigm of research in the area we want to touch with the conference series has an empirical phenomenon as motivation, isolates a formal abstraction from it and investigates this formal system with purely deductive methods.

After identifying which part of the sciences the conference series is about, let us have a brief look at the formal sciences from their beginnings.² The formal sciences in our sense have been intrinsically multidisciplinary from their early beginnings. Early descriptions of formal thinking as a science can be found in Plato,³ and Aristotle’s Organon can clearly be seen as the first systematic treatment of a theory of thinking. Although it more specifically describes the dawn of logic than the dawn of formal sciences in general, the famous opening sentence of the Prior Analytics can be seen as a first agenda for the formal sciences:

For many centuries the formal sciences have been understood as the foundations of occidental knowledge and formed an integral part and the fundamental basis of the intellectual community under the headings of the logical sciences of the trivium and the mathematical sciences of the quadrivium.⁵

The Aristotelian and scholastic tradition accepted the formal sciences as a certain branch of the sciences with its very peculiar epistemic methods and standards. To put it in Kantian terms, the formal sciences dealt with the reine Anschauung as opposed to empirical data. They were thereby connected to the methodology of mathematics and logic, being part of both the philosophical tradition and the newly won applications of mathematical sciences to the natural sciences and engineering.

Both the object and the methods of the formal sciences were recognized as different from those of the natural and the social sciences. The Aristotelian demonstratio potissima had been isolated as a special formal principle of mathematics, and was the subject of many investigations in the late Middle Ages and the Renaissance.⁶

Just to give some examples (out of many possibilities) for the recognition of the methods and the scope of the formal sciences as a distinct field different from the natural sciences and the social sciences, let us give some quotes:

All that can fall within the compass of human understanding, being either, First, the nature of things, as they are in themselves, their relations, and the manner of operation: or, Secondly, that which man himself ought to do, as a rational and voluntary agent, for the attainment of any end, especially happiness: or, Thirdly, the ways and means whereby the knowledge of both the one and the other of these is attained and communicated; ... The third branch may be called σημειωτική or the doctrine of signs: the most usual whereof being words, it is aptly enough termed also λογική, logic: the business whereof is to consider the nature of signs the mind makes use of for the understanding of things, or conveying its knowledge to others.⁷

Even more pointedly, we can find the identification of the formal method in Kant:

Die alte griechische Philosophie teilte sich in drei Wissenschaften ab: die Physik, die Ethik und die Logik. Diese Einteilung ist der Natur der Sache vollkommen angemessen, und man hat an ihr nichts zu verbessern, als etwa nur das Prinzip derselben hinzuzutun, um sich auf solche Art teils ihrer Vollständigkeit zu versichern, teils die notwendigen Unterabteilungen richtig bestimmen zu können. Alle Vernunfterkenntnis ist entweder material und betrachtet irgend ein Objekt oder formal, und beschäftigt sich mit der Form des Verstandes und der Vernunft selbst und den allgemeinen Regeln des Denkens überhaupt, ohne Unterscheidung der Objekte.⁸

In the twentieth century, the stark contrast between the (empirical) natural sciences and the (hermeneutic) liberal arts and social sciences led to a tendency to subsume the formal sciences into one of those categories.⁹ Therefore, in some treatments of the philosophy of science, you will find the formal methods presented as a characteristic feature of the natural sciences. In Hempel (1965), both the deductive-nomological and the statistical explanations seem to be used to describe the natural sciences, and formal knowledge seems to appear only as the base theory. We already mentioned Habermas, who stressed the dichotomy between the natural sciences and the liberal arts (Habermas 1967). In Habermas (1968), he even categorizes different research processes, but again (as opposed to Locke and Kant in the above quotes) we will look for a mention of formal methods in vain:

Für drei Kategorien von Forschungsprozessen läßt sich ein spezifischer Zusammenhang von logisch-methodologischen Regeln und erkenntnisleitenden Interessen nachweisen. ... In den Ansatz der empirisch-analytischen Wissenschaften geht ein technisches, in den Ansatz der historisch-hermeneutischen Wissenschaften ein praktisches und in den Ansatz kritisch orientierter Wissenschaften jenes emanzipatorische Erkenntnisinteresse ein, das schon den traditionellen Theorien uneingestanden ... zugrunde lag.

But as we know, the twentieth century has also seen the glorious days of the formal sciences. The impact and unity of the formal sciences have changed significantly.

The impact of the formal sciences could not be greater: formal systems have taken over almost every aspect of the modern world, mainly in the form of computers. The complexity of the computations that we entrust these computers with is far beyond the limits of what human beings can possibly check. So formal methods become necessary to reflect upon what these machines do, and to give criteria for accepting or rejecting the conclusions that these computers arrive at. Other applications of formal methods lie within abstract mathematical thinking, which needs a firm foundation. The level of abstraction that some modern theories from physics reach is a clear indication that even the natural sciences cannot exist without a fundamental formal basis. So the formal sciences can display an abundance of applications that show their importance.

But on the other hand (or maybe even as a natural consequence of the multitude of applications), an increasing amount of complexity and specialization has taken a tighter and tighter grip on the particular fields of the formal sciences and has reduced the frequency of interdisciplinary work. Nowadays we are no longer shocked if researchers from different areas within the formal sciences do not even have the absolute minimum of knowledge that we expect from our own students. Giving talks to an interdisciplinary audience, we normally presuppose no knowledge whatsoever. Consequently these talks rarely get to the level of sophistication that is needed to actually understand the technical and formal content of our results. Instead, in interdisciplinary talks, we tend to stay within the realm of metaphors and vague intuitions, and use a language that is more apt to advertise our theory than to explain it.
Especially young researchers encounter difficulties when they try to build links between different areas. There are very few places in the world where a young researcher could acquire a broad knowledge in several areas of the formal sciences. Truly interdisciplinary graduate programs are a rarity, and even those that exist have to face the fact that the background knowledge of students from different areas is so strikingly different that it makes communication between them very difficult and thus effectively splits the student body of the graduate program into almost disjoint groups. Building bridges between the research areas needs willingness and the academic freedom to choose untrodden paths, but the career situation of young people often does not allow that.

Many conferences in the formal sciences fall into one of two categories: (a) specialized conferences in which representatives of a certain research area can discuss the particular problems of their field, and (b) large international conferences¹⁰ with plenary talks covering whole research areas by speakers of international renown, complemented by specialized sessions. We do not want to belittle the importance of conferences of these kinds. They are both important for the development of the formal sciences as such. But they rarely offer the opportunity to give a talk that encompasses several areas of the formal sciences and nonetheless goes into enough detail that joint work might actually evolve. We want to encourage this kind of talk in order to disseminate the interdisciplinary spirit among our predominantly young participants. The lack of opportunity for giving such talks was the main reason for founding the conference series “Foundations of the Formal Sciences”.¹¹

The conference series “Foundations of the Formal Sciences” tries to help with this general situation. We would like to provide a platform for young researchers in the different formal sciences to meet and to present general ideas, techniques and open questions that lie on the borders between the areas. We would like our meetings to result in a general understanding of what the other subjects in the formal sciences are dealing with and, in the optimal case, in interdisciplinary joint work that employs techniques and ideas from different areas to solve problems.

The conference series started with the Berlin meeting in May 1999, of which this special double issue of Synthese is the proceedings volume. The Berlin meeting did not have a particular topic, but it was supposed to encompass most of the formal sciences. Of course, it is hard to organize conferences as open as that without falling into the trap of becoming another conference with non-technical interdisciplinary talks. After the Berlin meeting, we therefore decided to take certain topics with a great interdisciplinary potential as the subjects of the further conferences in the FotFS series. Thus the series was continued in November 2000 with a conference on “Applications of Mathematical Logic to Philosophy and Linguistics”, which was held in Bonn and organized by Peter Koepke, Wolfgang Malzkorn, Thoralf Räsch, Rainer Stuhlmann-Laeisz and the present author. We shall have the third meeting of the series on the topic of “Complexity in Mathematics and Computer Science” in Vienna in September 2001. Information about the past and future conferences of the series can be found at http://www.math.uni-bonn.de/people/fotfs/.

The interdisciplinary character and the emphasis on young researchers and the education of the next generations of researchers in the formal sciences should also permeate the proceedings volumes of the series. The volumes are mainly written by young researchers and they should be used as reference texts by young researchers. We imagine that these proceedings could be used as the text basis of an interdisciplinary seminar for graduate students and as a compendium of certain areas and techniques of the formal sciences for a young researcher who autodidactically wants to broaden his or her knowledge of the related areas of the formal sciences.

We know that keeping alive the interdisciplinary spirit in the formal sciences is a huge task, but nevertheless we hope that with our conference series and the proceedings volume we are humbly providing a little help towards completing that task.

NOTES

1 This tendency has been traditionally very strong in philosophy departments in continental Europe, in particular in Germany. But, as the president of the Allgemeine Gesellschaft für Philosophie in Deutschland (AGPD), Wolfram Hogrebe, writes in a letter to the members of the AGPD, these times may belong to the past: “Wenn man die Entwicklung der Philosophie in den letzten Jahren des vergangenen Jahrhunderts überschlägt, fällt auf, dass ein ehedem vielleicht drohendes Auseinanderdriften von logischem Scharfsinn und hermeneutischer oder historischer Sensibilität gegenwärtig nicht mehr zu befürchten ist. Gerade die jüngeren Temperamente der Philosophie verbinden wieder analytische, phänomenologische und historische Tugenden. Dieser Umstand lässt alte Antagonismen als verblichen erscheinen” (Hogrebe 2000).
2 This brief introductory essay cannot aspire to be a fully-fledged analysis of the notion of the formal sciences throughout history and at the present time. It would be a valuable project to investigate the development of the awareness that the formal sciences are more than just a conditio sine qua non or a special case of the other sciences.
3 For example, in the Sophistes, 259c–264b.
4 Prior Analytics, 24a.
5 Note that part of the quadrivium (although commonly termed “mathematical”) does not belong to the formal sciences.
6 Cf. Jardine (1988, 693).
7 John Locke, An Essay Concerning Human Understanding (1690); (Wilburn 1947, Book IV, Chapter XXII).
8 Immanuel Kant, Grundlegung zur Metaphysik der Sitten (1785); (Buchenau and Cassirer 1922, 243).
9 This tendency is reiterated in the academic organization of many universities. As an example, the doctoral title conferred in Germany on mathematicians and computer scientists is “doctor rerum naturalium”.
10 As examples, we are thinking of the Meetings of the Association for Symbolic Logic or the International Congresses of Logic, Methodology and Philosophy of Science.
11 We do not claim that FotFS is the only venue for this. On several levels, conferences and conference series like FotFS have been held and will be held. As two examples out of several, let us mention the Southern African workshop “Logic, Universal Algebra, and Theoretical Computer Science” (LUATCS’99) and the interdisciplinary series of one-day conferences “Set Theory and Its Neighbours”, organized by Mirna Džamonja and Charles Morgan.

REFERENCES

Buchenau, Artur and Cassirer, Ernst (eds.): 1922, Schriften von 1783–1788 von Immanuel Kant, Berlin [Immanuel Kants Werke IV].
Habermas, Jürgen: 1967, ‘Zur Logik der Sozialwissenschaften’, Philosophische Rundschau Beiheft 5, 195.
Habermas, Jürgen: 1968, ‘Erkenntnis und Interesse, Frankfurter Antrittsvorlesung vom 28.6.1965’, in Jürgen Habermas, Technik und Wissenschaft als “Ideologie”, Frankfurt [edition suhrkamp 287].
Hempel, Carl G.: 1965, Aspects of Scientific Explanation and other Essays in the Philosophy of Science, New York.
Hogrebe, Wolfram: 2000, ‘Public Letter to the Members of the Allgemeine Gesellschaft für Philosophie in Deutschland eV.’, Bonn, April 13th, quoted from: http://www.uni-leipzig.de/~phi1os/agpd/mitteil.htm
Jardine, Nicholas: 1988, ‘Epistemology of the Sciences’, Chapter 19 of: Charles B. Schmitt et al. (eds), The Cambridge History of Renaissance Philosophy, Cambridge, pp. 685–711.
Wilburn, Raymond (ed.): 1947, An Essay Concerning Human Understanding, John Locke, London [Everyman’s Library 984].

Mathematisches Institut
Rheinische Friedrich-Wilhelms-Universität Bonn
Beringstrasse 6, D-53115 Bonn, Germany

ANDREAS WEIERMANN

SLOW VERSUS FAST GROWING∗

ABSTRACT. We survey a selection of results about majorization hierarchies. The main focus is on classical and recent results about the comparison between the slow and fast growing hierarchies.

1. THE EXTENDED GRZEGORCZYK HIERARCHY

Majorization hierarchies have been established as valuable scales for classifying various natural subclasses of the general recursive functions. The basic idea is as follows. To the class of functions under consideration one associates a hierarchy of increasing functions $(F_\alpha)$ indexed by ordinals $\alpha$ such that $F_\alpha$ dominates $F_\beta$ whenever $\beta < \alpha$. The computational complexity of a function $h$ is then given by the least $\alpha$ such that $h$ can be computed with time (or space) bound $F_\alpha$.

For illustration let us consider some well known examples. By recursion on natural numbers we define the Grzegorczyk hierarchy $(F_n)_{n<\omega}$ (of number-theoretic functions) as follows:

\[
\begin{aligned}
F_0(x) &:= x + 1,\\
F_{l+1}(x) &:= F_l^{x+1}(x).
\end{aligned}
\]
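The recursion can be traced on small arguments. The following Python sketch is only an illustration and not part of the original text; the helper names `iterate` and `F` are chosen here. It implements $F_0(x) = x + 1$ and $F_{l+1}(x) = F_l^{x+1}(x)$ literally, reading the superscript as an iteration count, so it is practical only for very small $l$ and $x$.

```python
def iterate(f, n, x):
    """Apply f to x exactly n times."""
    for _ in range(n):
        x = f(x)
    return x


def F(l, x):
    """Grzegorczyk hierarchy: F_0(x) = x + 1, F_{l+1}(x) = F_l^{x+1}(x)."""
    if l == 0:
        return x + 1
    # Level l iterates level l - 1 exactly x + 1 times, starting at x.
    return iterate(lambda v: F(l - 1, v), x + 1, x)


print([F(0, x) for x in range(5)])  # [1, 2, 3, 4, 5]
print([F(1, x) for x in range(5)])  # [1, 3, 5, 7, 9]
print([F(2, x) for x in range(4)])  # [1, 7, 23, 63]
```

For these levels the values agree with the closed forms $F_1(x) = 2x + 1$ and $F_2(x) = 2^{x+1}(x + 1) - 1$, which follow from the definition by induction.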

(Here the exponent denotes the number of iterations.) The hierarchy $(F_n)_{n<\omega}$ classifies the primitive recursive functions as follows. Every $F_l$ is primitive recursive. For any primitive recursive function $g : \mathbb{N}^k \to \mathbb{N}$ there exists an $l$ such that $g(\vec{x}) < F_l(\max\{\vec{x}\})$ for all $\vec{x} \in \mathbb{N}^k$. Moreover, a function $g$ is primitive recursive if and only if there exists a natural number $l$ such that $g$ can be computed on a Turing machine with time (or space) bound $F_l$. This characterization can be used, e.g., for proving interesting closure properties of the primitive recursive functions like closure under parameter recursion, simple nested recursion and unnested multiple recursion (Cichon and Weiermann 1997; Ritchie 1968).

Synthese 133: 13–29, 2002. © 2002 Kluwer Academic Publishers. Printed in the Netherlands.

Another prime example for majorization hierarchies is provided by the classification of the multiple recursive functions. Recall (cf. Rose 1984) that a function is multiple recursive if it can be defined by applications of primitive recursive operations and a finite number of applications of the following scheme for $f$:

\[
\begin{aligned}
f(x, y_1, \ldots, y_k) &:= 0 \quad \text{if } y_1 \cdot \ldots \cdot y_k = 0,\\
f(x, y_1 + 1, \ldots, y_k + 1) &:= g(x, y_1, \ldots, y_k, f_1, \ldots, f_k) \quad \text{otherwise,}
\end{aligned}
\]
where for $i = 1, 2, \ldots, k$ the number $f_i$ is given by
\[
\begin{aligned}
f_i = f\bigl(&x, y_1 + 1, \ldots, y_{i-1} + 1, y_i,\\
&f_{i,1}(x, y_1, \ldots, y_k, f(x, y_1 + 1, \ldots, y_{k-1} + 1, y_k)), \ldots,\\
&f_{i,k-i}(x, y_1, \ldots, y_k, f(x, y_1 + 1, \ldots, y_{k-1} + 1, y_k))\bigr)
\end{aligned}
\]
and where $g$ and $f_{i,j}$ (for $j = 1, \ldots, k - i$) have been defined previously. Thus a function $h$ is multiple recursive if it can be defined from primitive recursive functions via nested recursions along the lexicographic ordering on tuples of natural numbers of some fixed length.

The hierarchy of $(\ell + 2)$-ary Ackermann functions is defined as follows:

\[
\begin{aligned}
A_{\ell+2}(0, \ldots, 0, 0, z) &:= z + 1,\\
A_{\ell+2}(x_1, \ldots, x_n, y + 1, z) &:= A_{\ell+2}(x_1, \ldots, x_n, y, \cdot)^{z+1}(z),\\
A_{\ell+2}(\vec{x}_{k-1}, x_k + 1, 0, 0, \ldots, 0, z) &:= A_{\ell+2}(\vec{x}_{k-1}, x_k, z + 1, 0, \ldots, 0, z),
\end{aligned}
\]
where $A_{\ell+2}(x_1, \ldots, x_n, y, \cdot)$ denotes the function
\[
z \mapsto A_{\ell+2}(x_1, \ldots, x_n, y, z)
\]
and $\vec{x}_{k-1}$ abbreviates the argument sequence $x_1, \ldots, x_{k-1}$. The hierarchy $(A_{\ell+2})_{\ell<\omega}$ classifies the multiple recursive functions in the following sense. Each function $A_{\ell+2}$ is multiple recursive. For every multiple recursive function $g : \mathbb{N}^k \to \mathbb{N}$ there exists a natural number $\ell$ such that $g(\vec{x}) < A_{\ell+2}(1, 0, \ldots, 0, \max\{\vec{x}\})$ for all $\vec{x} \in \mathbb{N}^k$. Moreover, a function $g$ is multiple recursive if and only if there exists a natural number $\ell$ such that $g$ can be computed with time (or space) bound $x \mapsto A_{\ell+2}(1, 0, \ldots, 0, x)$.

More complex majorization hierarchies are obtained by extending the Grzegorczyk hierarchy or the hierarchy of $(\ell + 2)$-ary Ackermann functions into the transfinite. The most significant example is provided by the Schwichtenberg–Wainer hierarchy, which extends the Grzegorczyk hierarchy to the segment of ordinals less than $\varepsilon_0$, the least ordinal $\alpha$ such that $\omega^\alpha = \alpha$. Using the Cantor normal form theorem we can assign to each limit ordinal $\lambda < \varepsilon_0$ a canonical (or standard) fundamental sequence $\langle \lambda[x] : x < \omega \rangle$ as follows. If $\lambda = \omega^{\beta+1} \cdot (\gamma + 1)$ then $\lambda[x] := \omega^{\beta+1} \cdot \gamma + \omega^{\beta} \cdot (x + 1)$. If $\lambda = \omega^{\beta} \cdot (\gamma + 1)$ with $\beta$ a limit then $\lambda[x] := \omega^{\beta} \cdot \gamma + \omega^{\beta[x]}$. By recursion on ordinals we define

\[
\begin{aligned}
F_0(x) &:= x + 1,\\
F_{\alpha+1}(x) &:= F_\alpha^{x+1}(x),\\
F_\lambda(x) &:= F_{\lambda[x]}(x).
\end{aligned}
\]
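For small arguments the transfinite levels can be evaluated mechanically. The sketch below is an illustration under stated assumptions, not the author's construction: ordinals below $\varepsilon_0$ are encoded as Python tuples listing the exponents of their Cantor normal form in non-increasing order (so `()` is 0 and `((),)` is 1), and the names `nat`, `is_successor`, `fund` and `F` are hypothetical helpers. `fund` follows the two clauses for $\lambda[x]$ stated above, applied to the last Cantor normal form term of $\lambda$.

```python
ZERO = ()  # the ordinal 0: an empty Cantor normal form


def nat(n):
    """The finite ordinal n, i.e. omega^0 added n times."""
    return (ZERO,) * n


OMEGA = (nat(1),)        # omega = omega^1
OMEGA_OMEGA = (OMEGA,)   # omega^omega


def is_successor(a):
    """True if the last term of a is omega^0 = 1."""
    return a != ZERO and a[-1] == ZERO


def fund(lam, x):
    """Standard fundamental sequence lam[x] for a limit ordinal lam."""
    prefix, beta = lam[:-1], lam[-1]  # lam = prefix + omega^beta with beta > 0
    if is_successor(beta):
        # omega^(b+1) is replaced by omega^b * (x + 1)
        return prefix + (beta[:-1],) * (x + 1)
    # beta is a limit: omega^beta is replaced by omega^(beta[x])
    return prefix + (fund(beta, x),)


def F(a, x):
    """F_0(x) = x + 1, F_{a+1}(x) = F_a^{x+1}(x), F_lam(x) = F_{lam[x]}(x)."""
    if a == ZERO:
        return x + 1
    if is_successor(a):
        pred, value = a[:-1], x
        for _ in range(x + 1):
            value = F(pred, value)
        return value
    return F(fund(a, x), x)


print(fund(OMEGA, 3) == nat(4))           # omega[3] = 4
print(fund(OMEGA_OMEGA, 2) == (nat(3),))  # (omega^omega)[2] = omega^3
print(F(nat(2), 3), F(OMEGA, 1))          # 63 7
```

Direct evaluation breaks down almost immediately: already $F_\omega(2) = F_3(2)$ is far too large to compute this way, which is the point of calling the hierarchy fast growing.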

Note that this hierarchy naturally extends the Grzegorczyk hierarchy and moreover the hierarchy of $(\ell + 2)$-ary Ackermann functions, since
\[
A_{\ell+2}(x_1, \ldots, x_n, y, z) = F_{\omega^{\ell} \cdot x_1 + \cdots + \omega^{1} \cdot x_n + y}(z).
\]
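The identity can be spot-checked on the few arguments where the values stay small. The following sketch is illustrative only: `A` takes the arguments $(x_1, \ldots, x_n, y)$ as a tuple `args` together with `z`, and `F` is the finite-level hierarchy $F_0(x) = x + 1$, $F_{l+1}(x) = F_l^{x+1}(x)$. The function names and the tuple calling convention are assumptions made for this example. Since $\omega[z] = z + 1$ in the standard assignment, the case $A(1, 0, z) = F_\omega(z)$ is tested as $F_{z+1}(z)$.

```python
def iterate(f, n, x):
    """Apply f to x exactly n times."""
    for _ in range(n):
        x = f(x)
    return x


def F(l, x):
    """Finite levels: F_0(x) = x + 1, F_{l+1}(x) = F_l^{x+1}(x)."""
    if l == 0:
        return x + 1
    return iterate(lambda v: F(l - 1, v), x + 1, x)


def A(args, z):
    """Ackermann function with args = (x_1, ..., x_n, y) and last argument z."""
    if not any(args):
        return z + 1                      # A(0, ..., 0, 0, z) = z + 1
    *xs, y = args
    if y > 0:
        # A(xs, y + 1, z) iterates v -> A(xs, y, v) exactly z + 1 times on z.
        return iterate(lambda v: A((*xs, y - 1), v), z + 1, z)
    # y == 0: lower the last non-zero x_k and put z + 1 into the next slot.
    k = max(i for i, v in enumerate(xs) if v > 0)
    shifted = list(args)
    shifted[k] -= 1
    shifted[k + 1] = z + 1
    return A(tuple(shifted), z)


print(all(A((0, y), z) == F(y, z) for y in range(3) for z in range(4)))  # True
print(A((1, 0), 1), F(2, 1))  # 7 7
```

Larger arguments such as `A((1, 0), 2)` already lead to values that cannot be computed by this naive recursion.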

The Schwichtenberg–Wainer hierarchy $(F_\alpha)_{\alpha<\varepsilon_0}$ has been investigated extensively. It has been shown that it is equivalent to Kreisel’s ordinal recursive functions, a function class based on $<\varepsilon_0$-times iterated enumeration, and hierarchies based on nested and unnested $<\varepsilon_0$-recursion (Rose 1984; Schwichtenberg 1971; Wainer 1972). Perhaps the most notable result concerning the Schwichtenberg–Wainer hierarchy is that it yields a classification of the provably recursive functions of first order Peano arithmetic PA. Indeed, for any $\alpha < \varepsilon_0$ the function $F_\alpha$ is provably recursive in PA, i.e., there exists an $e \in \mathbb{N}$ such that $F_\alpha = \{e\}$ and
\[
\mathrm{PA} \vdash \forall x\, \exists y\, T(e, x, y)
\]
[where as usual $T$ denotes the Kleene predicate and $\{e\}(x) = U(\mu y\, T(e, x, y))$]. Moreover, if
\[
\mathrm{PA} \vdash \forall x\, \exists y\, A(x, y)
\]
for some $\Sigma_1$ formula $A$ (i.e., all unbounded quantifiers in $A$ are existential ones) then there is an $\alpha < \varepsilon_0$ such that for all $m \in \mathbb{N}$ there is an $n$

\[
\omega_f[x] := s_f(0) + \cdots + s_f(x) + x.
\]

This yields a fundamental sequence for $\omega$ such that $F_\omega(x) \ge s_f(x)$ for all $x \in \mathbb{N}$. Thus one vague assumption on naturalness for a system of fundamental sequences would be that (for the limits $\lambda$ in question) $\lambda[x]$ should be defined using only knowledge about $\lambda$ and $x$, in some effective (primitive recursive) way. This holds, e.g., for the standard system of fundamental sequences for the limits less than $\varepsilon_0$, and it also holds for the existing systems of fundamental sequences found in the literature on proof-theoretic ordinals (cf., e.g., Buchholz 1986; Schütte 1986). But the larger the proof-theoretic ordinal in question, the harder and more involved the definition of a system of fundamental sequences becomes.

Despite these problems it has nevertheless been possible to develop a smooth and rather general theory of fundamental sequences and majorization hierarchies (cf. Buchholz et al. 1994). For a given countable ordinal $\tau$ one assumes the existence of a function $N : \tau \to \mathbb{N}$ such that $N0 = 0$, $N(\alpha+1) \le N\alpha + 1$, and such that the set $\{\alpha < \tau : N\alpha < k\}$ is finite for all $k \in \mathbb{N}$. In applications where $\tau$ is an ordinal denoted by a finitely generated set of terms, one may define $N\alpha$ by counting the number of symbols occurring in the term notation for $\alpha$. Based on $N$ one may define (following a suggestion of Cichon)

$$\lambda[x] := \max\{\beta < \lambda : N\beta \le N\lambda + x\}$$

for any limit $\lambda < \tau$ (cf. Buchholz et al. 1994). The resulting assignment behaves nicely for defining majorization hierarchies, since one easily verifies that $\alpha < \beta$ yields $F_\alpha(x) < F_\beta(x)$ for all $x \ge N\alpha$. Moreover, if $N\alpha$ denotes, for a given $\alpha < \varepsilon_0$, the number of occurrences of the symbol $\omega$ in the Cantor normal form representation of $\alpha$, then the resulting hierarchy $(F_\alpha)_{\alpha<\varepsilon_0}$ matches up with the Schwichtenberg–Wainer hierarchy. The approach carries over to all proof-theoretic ordinals which have appeared in the literature so far (cf., e.g., Schütte (1977)). If we define

$$F_\alpha(x) := \max\bigl(\{x+1\} \cup \{F_\beta^{x+1}(x) : \beta < \alpha \ \&\ N\beta \le x\}\bigr)$$

then this definition yields a hierarchy which is equivalent to the one defined by using the norm-based fundamental sequences. This approach can also be used for developing a theory of majorization hierarchies along a given well-ordering $\prec$ on the natural numbers by putting $N(\mathrm{ord}(x)) := x$, where $\mathrm{ord}(x)$ is the order type of a limit element $x \in \mathrm{field}(\prec)$. This has been carried out in Möllerfeld and Weiermann (1996). A related approach of comparable generality has been developed by Friedman and Sheard (1995).
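The norm-based assignment described above can be made concrete for the ordinals below ε0. The following is only a minimal sketch under stated assumptions: the tuple encoding of Cantor normal forms and the names `norm`, `ordinals_with_norm` and `fund` are illustrative choices, not notation from the literature cited here. It takes Nα to be the number of occurrences of ω in the Cantor normal form of α and computes λ[x] by brute-force search over the finitely many ordinals of bounded norm.

```python
from functools import lru_cache

# Assumed encoding (illustrative only): an ordinal below epsilon_0 is the
# tuple of the exponents of its Cantor normal form, in non-increasing order,
# and every exponent is again such a tuple. Python's lexicographic tuple
# order then coincides with the ordinal order on these normal forms.
ZERO = ()
ONE = (ZERO,)          # omega^0
TWO = (ZERO, ZERO)     # omega^0 + omega^0
OMEGA = (ONE,)         # omega^1


def norm(a):
    """N0 = 0 and N(w^a1 + ... + w^am) = m + N(a1) + ... + N(am)."""
    return len(a) + sum(norm(e) for e in a)


@lru_cache(maxsize=None)
def ordinals_with_norm(n):
    """Tuple of all ordinals below epsilon_0 whose norm is exactly n."""
    if n == 0:
        return (ZERO,)
    found = []
    for j in range(n):                        # norm of the leading exponent
        for e in ordinals_with_norm(j):
            for rest in ordinals_with_norm(n - 1 - j):
                if not rest or rest[0] <= e:  # keep exponents non-increasing
                    found.append((e,) + rest)
    return tuple(found)


def fund(lam, x):
    """Norm-based lam[x] := max{b < lam : N(b) <= N(lam) + x}, lam nonzero."""
    bound = norm(lam) + x
    return max(b for n in range(bound + 1)
               for b in ordinals_with_norm(n) if b < lam)


if __name__ == "__main__":
    print(norm(OMEGA))                     # omega = w^(w^0) contains two omegas
    print(fund(OMEGA, 3) == (ZERO,) * 5)   # omega[3] is the natural number 5
    print(fund((OMEGA,), 0) == (TWO,))     # (w^w)[0] is w^2
```

The search is finite precisely because only finitely many ordinals have norm below a given bound, which is the finiteness condition imposed on N above.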

2. COMPARING THE SLOW AND FAST GROWING HIERARCHY IN THE TRADITIONAL SPIRIT

The hierarchy $(F_\alpha)$ is a typical example of a fast growing (iteration) hierarchy, since even for small values of $\alpha$ and $x$ the result $F_\alpha(x)$ turns out to be a rather large natural number. The definition of $F_\alpha$ may be interpreted as a complex iteration of the successor function. A somewhat less complex way of iterating the successor function is provided by the following definition of the so-called Hardy hierarchy, which was introduced in Hardy (1904):

$$H_0(x) := x,$$

$$H_{\alpha+1}(x) := H_\alpha(x+1),$$

$$H_\lambda(x) := H_{\lambda[x]}(x).$$

The Hardy hierarchy measures the length of descending sequences of ordinals of the form $\alpha > \alpha[x] > (\alpha[x])[x+1] > \ldots > (\ldots((\alpha[x])[x+1])\ldots)[x+n]$. For the standard assignment of fundamental sequences for the ordinals less than $\varepsilon_0$ one verifies immediately that $F_\alpha(x) = H_{\omega^\alpha}(x)$ holds for all $x \in \mathbb{N}$. Thus the hierarchy $(H_\alpha)$ contains rather fast growing functions even for comparatively small ordinal values. A slight modification of the definition of the Hardy hierarchy at successor levels leads us to the so-called slow growing or pointwise hierarchy:

$$G_0(x) := 0,$$

$$G_{\alpha+1}(x) := G_\alpha(x) + 1,$$

$$G_\lambda(x) := G_{\lambda[x]}(x).$$

The slow growing hierarchy measures the length of descending sequences of ordinals of the form $\alpha > \alpha[x] > (\alpha[x])[x] > \ldots > (\ldots((\alpha[x])[x])\ldots)[x]$, and the difference between these hierarchies does not seem to be very big at first sight. Nevertheless, if $(G_\alpha)$ is defined with respect to the standard assignment of limits for the ordinals less than $\varepsilon_0$, one verifies easily that

$$G_\alpha(x) = (x+1)^{G_\beta(x)} + G_\gamma(x)$$

for any $\alpha < \varepsilon_0$ of the form $\omega^\beta + \gamma > \beta, \gamma$. Thus $(G_\alpha)$ behaves like a natural homomorphism from a segment of the countable ordinals into the number-theoretic functions, and the hierarchy $(G_\alpha)_{\alpha<\varepsilon_0}$ consists of elementary functions only.

Cichon and Wainer commented in (Cichon and Wainer 1983, 401, lines 5–8): “Despite the usefulness of the Grzegorczyk and Hardy hierarchies, we would claim that the Slow-Growing Hierarchy is the most natural majorization hierarchy of all and its comparison with the fast growing hierarchies is therefore of immediate interest”. Further motivations for investigations on the slow growing hierarchy have been given, for example, by Girard (1981), from where we select some quotations [comments of the present author are in square brackets]: “The interest of the hierarchy γ lies in the fact that, in some sense, e is the number of steps needed to compute γe. Certainly, when e = 2e we need just one additional step to compute γ2e(n) from γe(n): add 1; but when e = e,e+1, this is not so clear. However, γ is a hierarchy which reflects more faithfully the computation process than others, like λ” (Girard 1981, 330, lines 31–35). [Here γ denotes the pointwise hierarchy whereas λ denotes a slight variant of the Hardy hierarchy. Furthermore e, e and e range over elements of Kleene’s ordinal notation system O, 2e denotes the successor of e and e,e+1 denotes the limit ordinal coded by e.] “In practice, one will often prefer λ, which can easily be composed. (Similarly, in practice, one will prefer proofs with cuts which are easier to handle.) However, γ is more interesting to study (as well as cut free proofs are more interesting to study)” (Girard 1981, 331, lines 5–8).

Historically the slow growing hierarchy was introduced to build a hierarchy over the predicative ordinals which matches the hierarchy $(F_\alpha)$ via an autonomous generation process exactly at $\Gamma_0$. This relationship was announced in Wainer (1970) and is claimed in Wainer (1972).
Some indications in this direction can be found in Vogel (1977). For a certain specific assignment of fundamental sequences Vogel proved that $(G_\alpha)_{\alpha<\varphi_\omega 0}$ matches with $(F_n)_{n<\omega}$. This means that for any $\alpha < \varphi_\omega 0$ there is an $n < \omega$ such that $G_\alpha(x) < F_n(x)$ for all $x \in \mathbb{N}$, and that for any $n < \omega$ there is an $\alpha < \varphi_\omega 0$ such that $F_n(x) < G_\alpha(x)$ for all $x \in \mathbb{N}$.

Another interesting relationship between $\varphi_\omega 0$, the first primitive recursively closed ordinal, and the Grzegorczyk hierarchy has been obtained by Hofbauer (1992). By folklore, $\varphi_\omega 0$ is an upper bound for the order type of any multiset path ordering over a finite signature. (See, for example, Dershowitz and Okada (1988) for a proof.) Hofbauer showed that the derivation lengths function for a finite rewrite system over a finite signature which can be proved terminating via the multiset path ordering can be bounded by a function $F_n$ for some $n \in \mathbb{N}$. Conversely, any primitive recursive function can be computed by a finite rewrite system $R$ over a finite signature such that the rules of $R$ are reducing under a multiset path ordering. Thus primitive recursion corresponds exactly to the rewrite computational complexity of the multiset path ordering. Interpreting Hofbauer’s result in the context of hierarchies, we see that in the case of the multiset path ordering the order type of the termination ordering for a rewrite system $R$ is related to the $R$ derivation lengths function via the slow growing hierarchy.
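The recursion equations for (Hα) and (Gα) given earlier in this section can be traced for small ordinals. The following is a minimal sketch, assuming the standard assignment below ε0 in the form (ω^α + λ)[x] = ω^α + λ[x], ω^(β+1)[x] = ω^β · (x + 1) and ω^λ[x] = ω^(λ[x]). The tuple encoding and all function names are illustrative, and the clauses F0(x) = x + 1, Fα+1(x) = Fα^(x+1)(x), Fλ(x) = Fλ[x](x) used for `fast` are the usual ones and are an assumption here, since they are not restated in this section.

```python
# Assumed encoding (illustrative only): an ordinal below epsilon_0 is the
# tuple of its Cantor normal form exponents in non-increasing order.
ZERO = ()
ONE = (ZERO,)          # omega^0
TWO = (ZERO, ZERO)
OMEGA = (ONE,)         # omega^1


def is_successor(a):
    """A nonzero ordinal is a successor iff its last term is omega^0."""
    return bool(a) and a[-1] == ZERO


def fund(a, x):
    """Standard fundamental sequence a[x] for a limit ordinal a."""
    head, e = a[:-1], a[-1]
    if is_successor(e):                    # w^(b+1)[x] = w^b * (x + 1)
        return head + (e[:-1],) * (x + 1)
    return head + (fund(e, x),)            # w^lam[x] = w^(lam[x])


def hardy(a, x):
    """H_0(x) = x, H_{a+1}(x) = H_a(x + 1), H_lam(x) = H_{lam[x]}(x)."""
    while a:
        if is_successor(a):
            a, x = a[:-1], x + 1
        else:
            a = fund(a, x)
    return x


def slow(a, x):
    """G_0(x) = 0, G_{a+1}(x) = G_a(x) + 1, G_lam(x) = G_{lam[x]}(x)."""
    steps = 0
    while a:
        if is_successor(a):
            a, steps = a[:-1], steps + 1
        else:
            a = fund(a, x)
    return steps


def fast(a, x):
    """Assumed clauses: F_0(x) = x + 1, F_{a+1}(x) = F_a^(x+1)(x)."""
    if not a:
        return x + 1
    if is_successor(a):
        for _ in range(x + 1):
            x = fast(a[:-1], x)
        return x
    return fast(fund(a, x), x)


def closed_form(a, x):
    """Replace omega by x + 1 throughout the Cantor normal form."""
    return sum((x + 1) ** closed_form(e, x) for e in a)


if __name__ == "__main__":
    print(slow((OMEGA,), 2), closed_form((OMEGA,), 2))   # G at w^w, x = 2
    print(hardy((TWO,), 2), fast(TWO, 2))                # H at w^2 vs F_2
```

In this sketch `closed_form` is the identity Gα(x) = (x + 1)^(Gβ(x)) + Gγ(x) unfolded along the whole normal form, so comparing it with `slow` checks that identity on the chosen inputs only.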

Vogel also obtained that $(G_\alpha)_{\alpha<\varphi_{\varepsilon_0}0}$ consists of $<\varepsilon_0$-recursive functions and that $(G_\alpha)_{\alpha<\Gamma_0}$ consists of $<\Gamma_0$-recursive functions. (Recall that a number-theoretic function $g$ is called $<\varepsilon_0$ ($<\Gamma_0$) recursive if there is an $\alpha < \varepsilon_0$ ($\alpha < \Gamma_0$) such that $g$ is primitive recursive in $F_\alpha$.) More precisely, Vogel introduced a hierarchy $(V_\alpha)_{\alpha<\Gamma_0}$ as follows (cf. Vogel (1977)):

$$V_0(a, x) := x^a,$$

$$V_{\alpha+1}(0, x) := V_\alpha(\cdot, x)^{x+1}(0),$$

$$V_{\alpha+1}(a+1, x) := V_\alpha(\cdot, x)^{x+1}(V_{\alpha+1}(a, x) + 1),$$

$$V_\lambda(a, x) := V_{\lambda[x]}(a, x).$$

He verified that $G_{\varphi\alpha\beta}(x) = V_\alpha(G_\beta(x), x)$. Hence any function $V_\alpha$ is $<\varepsilon_0$ recursive for $\alpha < \varepsilon_0$. But a straightforward verification yields $V_\alpha(a, x) = V_{G_\alpha(x)}(a, x)$, and thus any function from $(V_\alpha)_{\alpha<\varepsilon_0}$ is even elementary recursive in $F_\omega$. Thus for Vogel’s assignment there is no match between $(G_\alpha)$ and $(F_\alpha)$ at $\Gamma_0$.

A more comprehensive analysis of the relationship between $(F_\alpha)$ and $(G_\alpha)$ has been achieved by Girard. In his framework of $\Pi^1_2$-logic he proved in Girard (1981)¹ that for a certain specific assignment of fundamental sequences the hierarchy $(F_\alpha)_{\alpha<\varepsilon_0}$ matches with $(G_\alpha)_{\alpha<\eta_0}$, where $\eta_0$ denotes the Howard–Bachmann ordinal, i.e., the proof-theoretic ordinal of $\mathrm{ID}_1$, the theory of one non-iterated inductive definition. As a special case Girard obtained a match between the multiple recursive functions and the hierarchy $(G_\alpha)_{\alpha<\theta\Omega^\omega 0}$.

Another interesting relationship between $\theta\Omega^\omega 0$ (cf. Schütte (1977)) and the hierarchy $(F_\alpha)_{\alpha<\omega^\omega}$ has been established in Weiermann (1995). The ordinal $\theta\Omega^\omega 0$ is an upper bound for the order type of any lexicographic path ordering over a finite signature. It is shown in Weiermann (1995) that the derivation lengths function for a finite rewrite system over a finite signature which can be proved terminating via the lexicographic path ordering can be bounded by a function $F_\alpha$ for some $\alpha < \omega^\omega$. Conversely, any multiple recursive function can be computed by a finite rewrite system $R$ over a finite signature such that the rules of $R$ are reducing under a lexicographic path ordering. Thus multiple recursion corresponds exactly to the rewrite computational complexity of the lexicographic path ordering. Interpreting this result from the viewpoint of hierarchies, we see that in the case of the lexicographic path ordering the order type of a termination ordering for a rewrite system $R$ is related to the $R$ derivation lengths function via the slow growing hierarchy.

An attempt to explain this phenomenon in a more general context has been made in Weiermann (1998b). The relationship between $\varepsilon_0$ and $\eta_0$ obtained by Girard has later been re-obtained by several authors, even if different assignments of fundamental sequences are assumed (cf. Aczel (1980), Buchholz (1980), Cichon and Wainer (1983), Jervell (1979), Schmerl (1981), Schwichtenberg (1980)). For the case of the standard Buchholz ordinal notation system for $\eta_0$ and its associated system of natural fundamental sequences the result can be found, for example, in Arai (1993).

In his 1981 paper Girard indicates that $(F_\alpha)_{\alpha<\eta_n}$ matches with $(G_\alpha)_{\alpha<\eta_{n+1}}$, where $\eta_n$ denotes the proof-theoretic ordinal of the theory $\mathrm{ID}_{n+1}$ (which formalizes $(n+1)$-times iterated inductive definitions, cf. Feferman (1968)). Thus $\eta_{<\omega} := \sup_{n<\omega} \eta_n$ is the least ordinal where $(F_\alpha)$ and $(G_\alpha)$ match up.

According to Wainer, ordinals $\eta$ where $(F_\alpha)$ and $(G_\alpha)$ match up are called subrecursively inaccessible. Thus $\eta_{<\omega}$ is minimal subrecursively inaccessible. In the context of tree ordinals the subrecursive inaccessibility of $\eta_{<\omega}$ has been established by Wainer (1989), and in the case of the standard Buchholz notation system and its associated system of natural fundamental sequences by Arai (1993).

What can be said about the relationship between $(G_\alpha)$ and $(F_\alpha)$ for ordinals above $\eta_{<\omega}$? For recursive $\lambda$ let $\eta_{<\lambda}$ be the proof-theoretic ordinal of $\mathrm{ID}_{<\lambda}$. By reasoning via analogy one might conjecture that there is a match between $(F_\alpha)$ and $(G_\alpha)$ for any $\eta_{<\lambda}$. Surprisingly this is not the case for the standard Buchholz ordinal notations and their associated system of natural fundamental sequences. Although the matter becomes rather specialized now, we report (to the experts) some of the results which have been obtained so far. The reader who is not familiar with Buchholz style notations may skip the rest of this section.

Let |T| be the ordinal of a theory T and |T|γ the least ordinal γ such that the T-provably recursive functions are classified by $(G_\alpha)_{\alpha<\gamma}$. In Weiermann (1996a) the following results on |T|γ are obtained. (The results about $\varepsilon_0$, $\psi_0\varepsilon_{\Omega_n+1}$ and $\psi_0\Omega_\omega$ are already in Arai (1993).)

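A list of slow and fast growing ordinals.

formal system T                                                        |T|                                                  lower bound for |T|γ

PA                                                                     $\varepsilon_0$                                      $\psi_0\varepsilon_{\Omega_1+1}$
$\mathrm{ID}_n$                                                        $\psi_0\varepsilon_{\Omega_n+1}$                     $\psi_0\varepsilon_{\Omega_{n+1}+1}$
$\mathrm{ACA}_0 + (\Pi^1_1\text{-CA}) = \mathrm{ID}_{<\omega}$         $\psi_0\Omega_\omega$                                $\psi_0\Omega_\omega$
$\mathrm{ACA} + (\Pi^1_1\text{-CA}) = W\text{-}\mathrm{ID}_\omega$     $\psi_0(\Omega_\omega \cdot \varepsilon_0)$          $\psi_0(\Omega_{\Omega_1} \cdot \varepsilon_{\Omega_1+1})$
$\mathrm{ACA} + (\Pi^1_1\text{-CA}) + (\mathrm{BI}) = \mathrm{ID}_\omega$   $\psi_0\varepsilon_{\Omega_\omega+1}$           $\psi_0\varepsilon_{\Omega_{\Omega_1}+1}$
$\mathrm{ID}_{\omega+1}$                                               $\psi_0\varepsilon_{\Omega_{\omega+1}+1}$            $\psi_0\varepsilon_{\Omega_{\Omega_1+1}+1}$
...                                                                    ...                                                  ...
$\mathrm{ID}_{\omega+n}$                                               $\psi_0\varepsilon_{\Omega_{\omega+n}+1}$            $\psi_0\varepsilon_{\Omega_{\Omega_1+n}+1}$
...                                                                    ...                                                  ...
$\mathrm{ID}_{<\omega\cdot 2}$                                         $\psi_0\Omega_{\omega\cdot 2}$                       $\psi_0\Omega_{\Omega_1+\omega}$
$\mathrm{ID}_{\omega\cdot 2}$                                          $\psi_0\varepsilon_{\Omega_{\omega\cdot 2}+1}$       $\psi_0\varepsilon_{\Omega_{\Omega_1\cdot 2}+1}$
...                                                                    ...                                                  ...
$\mathrm{ID}_{\omega^2}$                                               $\psi_0\varepsilon_{\Omega_{\omega^2}+1}$            $\psi_0\varepsilon_{\Omega_{\Omega_1^2}+1}$
...                                                                    ...                                                  ...
$\mathrm{ID}_{\omega^n}$                                               $\psi_0\varepsilon_{\Omega_{\omega^n}+1}$            $\psi_0\varepsilon_{\Omega_{\Omega_1^n}+1}$
...                                                                    ...                                                  ...
$\mathrm{ACA} + (\Delta^1_2\text{-CR}) = \mathrm{ID}_{<\omega^\omega}$ $\psi_0\Omega_{\omega^\omega}$                       $\psi_0\Omega_{\Omega_1^\omega}$
$\mathrm{ID}_{\omega^\omega}$                                          $\psi_0\varepsilon_{\Omega_{\omega^\omega}+1}$       $\psi_0\varepsilon_{\Omega_{\Omega_1^{\Omega_1}}+1}$
...                                                                    ...                                                  ...
$\mathrm{ACA} + (\Delta^1_2\text{-CA}) = \mathrm{ID}_{<\varepsilon_0}$ $\psi_0\Omega_{\varepsilon_0}$                       $\psi_0\Omega_{\varepsilon_{\Omega_1+1}}$
$\mathrm{ID}_{\varepsilon_0}$                                          $\psi_0\varepsilon_{\Omega_{\varepsilon_0}+1}$       $\psi_0\varepsilon_{\Omega_{\varepsilon_{\Omega_1+1}}+1}$
$\mathrm{ID}_{\prec^*}$                                                $\psi_0\varepsilon_{\Omega_{\Omega_1}+1}$            $\psi_0\varepsilon_{\Omega_{\Omega_2}+1}$
...                                                                    ...                                                  ...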

Let $\Lambda_0$ be the least ordinal which is a fixed point of the $\aleph$ function (which as usual is denoted by $\alpha \mapsto \Omega_\alpha$). Then $\psi_0\Lambda_0$, the proof-theoretic ordinal of $(\Pi^1_1\text{-TR})_0$, is a comparatively large (proof-theoretic) ordinal. It is shown in Weiermann (1996a) that if $(G_\alpha)$ and $(F_\alpha)$ match at $\eta < \psi_0\Lambda_0$ then $\eta \in \{\psi_0\Omega_\omega, \psi_0\Omega_{\Omega_\omega}, \ldots\}$. Thus the subrecursively inaccessible ordinals are distributed sparsely and they do not have distinguished strong closure properties.

Remark: It is an open problem whether the ordinals in $\{\psi_0\Omega_\omega, \psi_0\Omega_{\Omega_\omega}, \ldots\}$ are really subrecursively inaccessible.

A recent application of the slow growing hierarchy has been given by Arai (1997). He showed that the consistency of PA follows from a principle of pointwise induction up to $\eta_0$, and he generalized his result to the theories $\mathrm{ID}_n$. Here we take the opportunity to include a new result which solves an open problem posed at the end of Arai’s paper. In the theorem we assume the notations from Arai (1997).

THEOREM. $I_k \vdash \forall n\, \exists m\; D_0 D_1^{k-1}(\Omega \cdot n)[1]^m = 0$.

Proof: For α<ε0 let

$$\psi\alpha := \max\bigl(\{0\} \cup \{\psi(\beta+1) : \beta < \alpha \ \&\ N\beta \le 3^{3^{N\alpha}}\}\bigr)$$

where $N\alpha$ denotes the number of occurrences of the symbol $\omega$ in the Cantor normal form of $\alpha$. Let $T$ be the notation system of Arai (1997). For $a \in T$ define $C_x a < \varepsilon_0$ as follows:

1. $C_x 0 := 0$.
2. $C_x(a_0, \ldots, a_k) := C_x a_0 \# \ldots \# C_x a_k$, where $\#$ denotes the natural sum.
3. $C_x D_1 0 := \omega$.
4. $C_x D_1 a := \omega^{C_x a + 1} \cdot x$ if $a \ne 0$.
5. $C_x D_0 0 := 1$.
6. $C_x D_0(a_0, \ldots, a_k) := x^{\psi(C_x a_k + \psi(\ldots + \psi(C_x a_0 + x)))}$.

The important point in this definition is the nested use of $\psi$ in the last item. By a brute force calculation one verifies $C_{x+2}\, a[x] < C_{x+2}\, a$ for $a \in T$. Now assume that $k \ge 2$. Then for any $n$ the least $m$ such that $D_0 D_1^{k-1}(\Omega \cdot n)[1]^m = 0$ is bounded by

$$C_3(D_0 D_1^{k-1}(\Omega \cdot n)) = 3^{\psi(C_3(D_1^{k-1}(\Omega \cdot n)) + 3)} < \psi(\omega_{k-2}(\omega^{n+2})) < \psi(\omega_k + n).$$

For $k = 1$ we obtain $C_3(D_0(\Omega \cdot n)) < \psi(\omega^2 + n)$. The result follows since $n \mapsto \psi(\omega_k + n)$ is provably recursive in $I_k$.

Similarly we obtain for any n ∈ IN the following strengthening of the last remark in Arai (1993):

$$I_k \vdash \forall d\, \exists m\, \{D_0 D_1^{k}(\Omega \cdot n)[d]^m = 0\}.$$

These results are optimal since we also obtained

$$I_k \nvdash \forall n\, \exists m\, \{D_0 D_1^{k}(\Omega \cdot n)[1]^m = 0\}$$

and

$$I_k \nvdash \forall d\, \exists m\, \{D_0 D_1^{k+1}(\Omega \cdot 1)[d]^m = 0\},$$

but even a proof sketch for these results is beyond the scope of this paper.

3. SOME UNEXPECTED RESULTS ABOUT THE SLOW GROWING HIERARCHY

The results obtained so far indicate that the slow growing hierarchy behaves (with respect to the choice of its underlying system of fundamental sequences) as stably as the hierarchy $(F_\alpha)$ does. It thus seems desirable to prove that the growth rate of the hierarchy $(G_\alpha)$ is not affected by small changes of the underlying system of fundamental sequences. In this section we survey recent results on the slow growing hierarchy and show that, somewhat surprisingly, the hierarchy $(G_\alpha)$ is extremely sensitive with respect to choices of its underlying system of fundamental sequences.

For a given countable ordinal $\tau$ let us call two assignments of fundamental sequences $\cdot[\cdot] : \tau \cap \mathrm{Lim} \times \mathbb{N} \to \tau$ and $\cdot[\cdot]' : \tau \cap \mathrm{Lim} \times \mathbb{N} \to \tau$ closely related if the resulting hierarchies $(F_\alpha)$ and $(F'_\alpha)$ match up at every limit $\lambda \le \tau$. Given $\tau$ and two closely related assignments $\cdot[\cdot]$ and $\cdot[\cdot]'$, it is an obvious question whether $(G_\alpha)$ and $(G'_\alpha)$ match up at some limit $\lambda \le \tau$.

In his inspiring paper (Cichon 1992), Cichon introduced a certain specific assignment $\cdot[\cdot]_C$ of fundamental sequences (for the limits below $\Gamma_0$) which is closely related to the standard assignment. By Vogel’s and Girard’s results we know that $(G_\alpha)_{\alpha<\varphi_\omega 0}$ matches with $(F_n)_{n<\omega}$. By appealing to such a type of hierarchy comparison theorem it is stated in Cichon (1992) that the resulting $(G_\alpha)_{\alpha<\varphi_\omega 0}$ and $(F_n)_{n<\omega}$ match up. In particular any function from $(G_\alpha)_{\alpha<\varepsilon_0}$ should then be primitive recursive, but a verification of this remained problematic. Let us consider the situation in some more detail. Put

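$$N_C 0 := 0,$$

$$N_C \alpha := \max\{m, 1 + N_C\alpha_1, \ldots, 1 + N_C\alpha_m\}$$

if $\varepsilon_0 > \alpha = \omega^{\alpha_1} + \cdots + \omega^{\alpha_m} > \alpha_1 \ge \ldots \ge \alpha_m$, and put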

λ[x]C := max{β<λ: NC β

The problem is to classify the resulting $(G^C_\alpha)_{\alpha<\varepsilon_0}$. In a first step we considered the following related problem. Recall that $N0 := 0$ and $N\alpha := m + N\alpha_1 + \cdots + N\alpha_m$ if $\alpha = \omega^{\alpha_1} + \cdots + \omega^{\alpha_m} > \alpha_1 \ge \ldots \ge \alpha_m$. If we put

$$\lambda[x] := \max\{\beta < \lambda : N\beta < N\lambda + x\}$$

then the resulting hierarchy $(G_\alpha)_{\alpha<\varepsilon_0}$ consists of elementary functions only. By reasoning via analogy one might conjecture that the same is true for $(G^C_\alpha)_{\alpha<\varepsilon_0}$. Nevertheless Arai came up with the following surprising result. For $\alpha < \varepsilon_0$ let

Aα(x) := max{Aβ (x) + 1 : β<α& NCβ ≤ NC α + x}.

Arai showed that $(A_\alpha)$ and $(F_\alpha)$ match at $\varepsilon_0$. This is surprising since the definition of $(A_\alpha)$ looks pointwise, and indeed, after tedious calculations it was finally possible to show that $(G^C_\alpha)$ matches with $(F_\alpha)$ at $\varepsilon_0$ (Weiermann 1997). For the first time it was possible to produce a slow growing hierarchy which is fast growing by using an assignment of fundamental sequences which is closely related to the standard one. Afterwards, it turned out that for several assignments of fundamental sequences (which are closely related to the standard one) $(G_\alpha)$ often matches with $(F_\alpha)$ at $\varepsilon_0$. We report some examples. For a real number $c \ge 1$ let

λ[x]c := max{β<λ: Nβ ≤ c · Nλ+ x}.

Then for $c = 1$ the induced slow growing hierarchy consists of elementary functions, but for $c > 1$ the induced slow growing hierarchy matches up with $(F_\alpha)_{\alpha<\varepsilon_0}$. The hierarchy $(G_\alpha)$ even becomes fast growing when its underlying system of fundamental sequences is defined as follows:

$(\omega^\alpha + \lambda)[x] := \omega^\alpha + \lambda[x+1]$; $\omega^{\beta+1}[x] := \omega^\beta \cdot (x+1)$; $\omega^\lambda[x] := \omega^{\lambda[x]}$

where $\omega^\alpha + \lambda > \lambda \in \mathrm{Lim}$. But not every slight modification of the standard assignment yields a fast growing hierarchy $(G_\alpha)_{\alpha<\varepsilon_0}$. Assume that the underlying system of fundamental sequences is defined as follows:

$(\omega^\alpha + \lambda)[x] := \omega^\alpha + \lambda[x]$; $\omega^{\beta+1}[x] := \omega^\beta \cdot 2^x$; $\omega^\lambda[x] := \omega^{\lambda[2^x]}$

where $\omega^\alpha + \lambda > \lambda \in \mathrm{Lim}$. Despite the occurrence of the exponential function, the hierarchy $(G_\alpha)_{\alpha<\varepsilon_0}$ consists of elementary functions only.

The results obtained so far indicate that the slow growing hierarchy is either slow or fast growing. It thus seems desirable to prove that this is always the case for natural assignments which are slight modifications of the standard assignment.

Warned by the counterexamples above, it might be no surprise for the reader to see that this is not possible. Indeed, let us consider Vogel’s notation system $T$ for $\Gamma_0$ and the subsystem $T^-$ obtained by deleting all terms in which the symbol for ordinal addition occurs. Then the order type of this system is equal to $\varepsilon_0$. $T^-$ is the least set of terms which contains 0 and is closed under applications of the binary function symbol $\varphi$. In $T^-$ the rôle of the successor function is played by $\alpha \mapsto \varphi 0\alpha$. The induced system of fundamental sequences for the remaining limits in $T^-$ is defined as follows:

1. $(\varphi_{\alpha+1}0)[x] := \varphi_\alpha^{x+1} 0$,
2. $(\varphi_{\alpha+1}(\beta+1))[x] := \varphi_\alpha^{x+1}(\varphi_{\alpha+1}\beta + 1)$,
3. $(\varphi_\alpha\lambda)[x] := \varphi_\alpha(\lambda[x])$,
4. $(\varphi_\lambda 0)[x] := \varphi_{\lambda[x]}0$,
5. $(\varphi_\lambda(\beta+1))[x] := \varphi_{\lambda[x]}(\varphi_\lambda\beta + 1)$.

The induced hierarchy $(G_\alpha)_{\alpha \in T^-}$ yields a classification of the functions which are elementary recursive in the Ackermann function.

In view of this Vogel-style result it is an immediate problem to find out the relationship between $(G_\alpha)$ and $(F_\alpha)$ in case both hierarchies are defined with respect to the standard assignment of fundamental sequences for the limits below $\Gamma_0$. After painful computations it turned out that $(G_\alpha)_{\alpha<\Gamma_0}$ matches the functions which are elementary in $F_\omega$ (Weiermann 2001b). Thus for the standard assignment the traditional relationship between $\Gamma_0$ and $\omega + 1$ is re-obtained.

Nevertheless Wainer’s 1972 claim about the relationship between $(G_\alpha)$ and $(F_\alpha)$ can be realized if the following slight modification of the standard assignment of fundamental sequences for the limits below $\Gamma_0$ is assumed. Let $T$ be Vogel’s notation system for $\Gamma_0$ from Vogel (1977). For $\alpha \in T$ let $N\alpha$ be the number of occurrences of $\varphi$ in $\alpha$. For any $\alpha \in T$ and $x < \omega$ we define $\alpha[x]$ recursively as follows.

1. $0[x] := 0$,
2. $(\alpha_1 + \cdots + \alpha_m)[x] := \alpha_1 + \cdots + \alpha_m[x]$,
3. $(\varphi 00)[x] := 0$,
4. $(\varphi 0(\beta+1))[x] := \varphi 0\beta \cdot (x+1)$,
5. $(\varphi(\alpha+1)0)[x] := \varphi_\alpha^{x+1}(0)$,
6. $(\varphi(\alpha+1)(\beta+1))[x] := \varphi_\alpha^{x+1}(\varphi(\alpha+1)\beta + 1)$,
7. $(\varphi\lambda 0)[x] := \varphi(\lambda[N\lambda + x])0$,
8. $(\varphi\lambda(\beta+1))[x] := \varphi(\lambda[N\lambda + N\beta + 1 + x])(\varphi\lambda\beta + 1)$,
9. $(\varphi\alpha\lambda)[x] := \varphi\alpha(\lambda[x])$.

Then the induced hierarchy $(G_\alpha)$ matches $(F_\alpha)$ at $\Gamma_0$ for the first time, and the latter assignment is closely related to the standard one (cf., e.g., Weiermann (2001a)). By variations of such definitions one obtains assignments of fundamental sequences such that the minimal subrecursively inaccessibles can, e.g., be chosen as $\theta\Omega^n 0$ for any $n \in \mathbb{N}$ and/or as $\theta\Omega^\omega 0$ (cf. Schütte (1977)).

In nearly all examples mentioned so far it turned out that slight modifications of the standard assignment induce slow growing hierarchies which grow at least as fast as the standard slow growing hierarchy. Nevertheless, if we consider the system of fundamental sequences defined via $\lambda[x] := \max\{\beta < \lambda : N\beta \le N\lambda + x\}$, then the induced slow growing hierarchy $(G_\alpha)_{\alpha<\varepsilon_0}$ becomes very slow growing in the sense that there is a fixed elementary recursive function which eventually dominates every such $G_\alpha$ for $\alpha < \varepsilon_0$. Since this result is new we indicate a proof for it. First one verifies the following assertions:

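1. $\alpha < \beta \ \&\ N\alpha \le x + 1 \Rightarrow G_\alpha(x) \le G_\beta(x)$,
2. $\omega^\lambda[x] = \omega^{\lambda[x]}$ if $\lambda \in \mathrm{Lim}$,
3. $(\omega^\alpha + \lambda)[x] = \omega^\alpha + \lambda[x]$ if $\omega^\alpha + \lambda > \lambda \in \mathrm{Lim}$,
4. $\omega^{\alpha+1}[x] = \omega^\alpha + \omega^{\alpha+1}[[x]]$, where $\gamma[[x]]$ is defined by $\gamma[[x]] = \max\{\beta < \gamma : N\beta \le x\}$.

Using these assertions one proves by induction on $\alpha$ that $\alpha = \omega^\beta + \gamma > \gamma$ implies

$$G_\alpha(x) \le (G_\beta(x) + 1) \cdot (G_{\omega^\beta[[x+1]]}(x) + 1) + G_\gamma(x). \qquad (6)$$

Let $B_\alpha(x)$ be the number of ordinals $\beta$ less than $\alpha$ such that $N\beta \le x$. Then $B_\alpha(x) \le 3^{3^x}$. Moreover $\alpha < \beta$ and $N\alpha \le x$ implies $B_\alpha(x) < B_\beta(x)$. The assertion (6) together with an induction on $\alpha$ yields $G_\alpha(x) \le 3^{3^{B_\alpha(x+1)}}$. Thus we obtain the following theorem.

THEOREM. Let $\cdot[\cdot] : (\varepsilon_0 \cap \mathrm{Lim}) \to \varepsilon_0$ be defined via $\lambda[x] := \max\{\beta < \lambda : N\beta \le N\lambda + x\}$ and let $(G_\alpha)_{\alpha<\varepsilon_0}$ be induced by $\cdot[\cdot]$. Then $G_\alpha(x) \le 3^{3^{3^{3^{x+1}}}}$ holds for $x \ge N\alpha$.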
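The quantities in this proof sketch can be computed by brute force for very small ordinals. The following is a minimal sketch, assuming an illustrative tuple encoding of Cantor normal forms and hypothetical function names (`fund`, `slow`, `count_below`). It takes Nα to be the number of occurrences of ω in the normal form, evaluates Gα under the norm-based assignment, and counts Bα(x). It checks the stated inequalities only on a handful of inputs and is no substitute for the proof.

```python
from functools import lru_cache

# Assumed encoding (illustrative only): an ordinal below epsilon_0 is the
# tuple of its Cantor normal form exponents in non-increasing order, so that
# Python's tuple order agrees with the ordinal order.
ZERO = ()
ONE = (ZERO,)
OMEGA = (ONE,)


def norm(a):
    """Number of occurrences of omega in the Cantor normal form of a."""
    return len(a) + sum(norm(e) for e in a)


@lru_cache(maxsize=None)
def ordinals_with_norm(n):
    """Tuple of all ordinals below epsilon_0 whose norm is exactly n."""
    if n == 0:
        return (ZERO,)
    found = []
    for j in range(n):
        for e in ordinals_with_norm(j):
            for rest in ordinals_with_norm(n - 1 - j):
                if not rest or rest[0] <= e:
                    found.append((e,) + rest)
    return tuple(found)


def below_with_norm_at_most(a, k):
    """All b < a with N(b) <= k."""
    return [b for n in range(k + 1) for b in ordinals_with_norm(n) if b < a]


def fund(lam, x):
    """lam[x] := max{b < lam : N(b) <= N(lam) + x}."""
    return max(below_with_norm_at_most(lam, norm(lam) + x))


def slow(a, x):
    """G_a(x) with respect to the norm-based assignment."""
    steps = 0
    while a:
        if a[-1] == ZERO:              # successor step
            a, steps = a[:-1], steps + 1
        else:                          # limit step
            a = fund(a, x)
    return steps


def count_below(a, x):
    """B_a(x): the number of ordinals b < a with N(b) <= x."""
    return len(below_with_norm_at_most(a, x))


if __name__ == "__main__":
    w_to_w = (OMEGA,)
    for x in range(3):
        print(x, slow(w_to_w, x), count_below(w_to_w, x + 1))
```

The search in `fund` terminates because only finitely many ordinals have norm below a given bound. That same finiteness is what the counting function Bα exploits in the argument above.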

Résumé: The results of this section indicate that “the” slow growing hierarchy is very sensitive with respect to the choice of its underlying system of fundamental sequences, and perhaps this hierarchy should be renamed. Following Girard’s suggestion, the term “pointwise hierarchy” seems adequate here. Nevertheless we believe that the results discussed in this section shed (at least in some examples) some light on the elusive notion of a standard or natural system of fundamental sequences. Assume that we are given an assignment $\cdot[\cdot]$ of fundamental sequences for the limit ordinals less than $\varepsilon_0$ which is closely related to the standard system. We consider $\cdot[\cdot]$ as standard if the induced hierarchy $(G_\alpha)_{\alpha<\varepsilon_0}$ matches with the elementary recursive functions, as does the standard slow growing hierarchy for the segment of ordinals below $\varepsilon_0$. One crucial property of $\cdot[\cdot]$ for being standard is a strong form of (Lipschitz) continuity in the case of ordinal addition. This should mean that the system $\cdot[\cdot]$ satisfies $(\omega^\alpha + \lambda)[x] = \omega^\alpha + \lambda[x]$ for $\omega^\alpha + \lambda > \lambda$, since, e.g., a condition of the form $(\omega^\alpha + \lambda)[x] = \omega^\alpha + \lambda[x+1]$ may lead to a fast growing induced hierarchy $(G_\alpha)_{\alpha<\varepsilon_0}$. In the case of ordinal exponentiation with respect to base $\omega$ the standardness conditions on $\cdot[\cdot]$ are much more relaxed. Thus conditions of the form $\omega^\lambda[x] = \omega^{\lambda[p(x)]}$ and $\omega^{\alpha+1}[x] = \omega^\alpha \cdot q(x)$, where $p$ and $q$ are elementary recursive, do not necessarily lead beyond elementary recursion (cf., e.g., Weiermann 1999).

Finally, for guaranteeing that the induced hierarchy $(G_\alpha)_{\alpha<\varepsilon_0}$ matches the elementary functions it is necessary that $\omega^{\alpha+1}[x]$ involves, for large $x$, at least a duplication of $\omega^\alpha$, since, as shown in the previous theorem, a definition of the form $\omega^{\alpha+1}[x] := \max\{\beta < \omega^{\alpha+1} : N\beta \le N\omega^{\alpha+1} + x\}$ may lead to a very slow growing induced hierarchy $(G_\alpha)_{\alpha<\varepsilon_0}$.

NOTES

∗ The author is a Heisenberg fellow of the Deutsche Forschungsgemeinschaft. 1 According to the referee, Girard obtained this result already in 1975.

REFERENCES

Ackermann, W.: 1940, Zur Widerspruchsfreiheit der reinen Zahlentheorie, Mathematische Annalen 117, 162–194.
Aczel, P.: 1980, Another Elementary Treatment of Girard’s Result Connecting the Slow and Fast Growing Hierarchies of Number-Theoretic Functions, manuscript.
Arai, T.: 1993, ‘A Slow Growing Analogue to Buchholz’ Proof’, Annals of Pure and Applied Logic 54, 101–120.
Arai, T.: 1997, ‘Consistency Proof Via Pointwise Induction’, Archive for Mathematical Logic 37, 149–165.
Arai, T.: 1998, ‘Variations on a Theme by Weiermann’, The Journal of Symbolic Logic 63, 897–925.
Blankertz, B. and Weiermann, A.: 1996, ‘How to Characterize Provably Total Functions by the Buchholz Operator Method’, in Petr Hajek (ed.), Gödel ’96, Logical Foundations of Mathematics, Computer Science and Physics – Kurt Gödel’s Legacy, Proceedings of a conference, Brno, Czech Republic, August 1996, Berlin [Lecture Notes in Logic 6], pp. 205–213.
Buchholz, W.: 1980, Three Contributions to the Conference on Recent Advances in Proof Theory, Oxford, mimeographed.
Buchholz, W.: 1986, ‘A New System of Proof-Theoretic Ordinal Functions’, Annals of Pure and Applied Logic 32, 195–207.
Buchholz, W., E. A. Cichon and A. Weiermann: 1994, ‘A Uniform Approach to Fundamental Sequences and Hierarchies’, Mathematical Logic Quarterly 40, 273–286.
Cichon, E. A.: 1992, ‘Termination Proofs and Complexity Characterisations’, in Peter Aczel, Harold Simmons and Stanley Wainer (eds), Proof Theory, A Selection of Papers from the Leeds Proof Theory Programme, an International Summer School and Conference on Proof Theory, Leeds University, UK, 24 July–2 August 1990, Cambridge, pp. 173–193.
Cichon, E. A. and S. S. Wainer: 1983, ‘The Slow-Growing and the Grzegorczyk Hierarchies’, Journal of Symbolic Logic 48, 399–408.
Cichon, E. A. and A. Weiermann: 1997, ‘Term Rewriting Theory for the Primitive Recursive Functions’, Annals of Pure and Applied Logic 83, 199–223.
Dershowitz, N. and M. Okada: 1988, ‘Proof-Theoretic Techniques for Term-Rewriting Theory’, in IEEE Computer Society (ed.), Proceedings of the Third Annual Symposium on Logic in Computer Science (LICS ’88), Edinburgh, Scotland, UK, 5–8 July 1988, Edinburgh, pp. 104–111.
Feferman, S.: 1968, ‘Systems of Predicative Analysis, II: Representations of Ordinals’, Journal of Symbolic Logic 33, 193–220.
Friedman, H. and M. Sheard: 1995, ‘Elementary Descent Recursion and Proof Theory’, Annals of Pure and Applied Logic 71, 1–47.
Girard, J. Y.: 1981, ‘$\Pi^1_2$-Logic, Part 1’, Annals of Mathematical Logic 21, 75–219.
Grzegorczyk, A.: 1953, ‘Some Classes of Recursive Functions’, Rozprawy Matematyczne 4.
Hardy, G.: 1904, ‘A Theorem Concerning the Infinite Cardinal Numbers’, Quarterly Journal of Mathematics 35, 87–94.
Hofbauer, D.: 1992, ‘Termination Proofs by Multiset Path Orderings Imply Primitive Recursive Derivation Lengths’, Theoretical Computer Science 105, 129–140.
Howard, W. A.: 1970, ‘Assignment of Ordinals to Terms for Primitive Recursive Functionals of Finite Type’, in A. Kino, J. Myhill and R. Wesley (eds), Intuitionism and Proof Theory, Proceedings of the summer conference at Buffalo N.Y. 1968, Amsterdam [Studies in Logic and the Foundations of Mathematics 8], pp. 443–458.
Jervell, H. R.: 1979, Homogeneous Trees, Lecture Notes at the University of München.
Möllerfeld, M. and A. Weiermann: 1996, A Uniform Approach to ≺-Recursion, Münster, preprint.
Ritchie, D. M.: 1968, Program Structure and Computational Complexity, Doctoral dissertation, Harvard University.
Rose, H. E.: 1984, Subrecursion: Functions and Hierarchies, Clarendon Press, Oxford.
Schmerl, U. R.: 1981, Über die schwach und die stark wachsende Hierarchie zahlentheoretischer Funktionen, Sitzungsberichte der Bayerischen Akademie der Wissenschaften, Mathematisch-Naturwissenschaftliche Klasse.
Schütte, K.: 1977, Proof Theory, Springer.
Schütte, K.: 1986/1987, ‘Majorisierungsrelationen und Fundamentalfolgen eines Ordinalzahlensystems von G. Jäger’, Archiv für Mathematische Logik 26, 29–55.
Schwichtenberg, H.: 1971, ‘Eine Klassifikation der ε0-rekursiven Funktionen’, Zeitschrift für mathematische Logik und Grundlagen der Mathematik 17, 61–74.
Schwichtenberg, H.: 1990, Homogeneous Trees and Subrecursive Hierarchies, Lecture at the University of München.
Vogel, H.: 1977, ‘Ausgezeichnete Folgen für prädikativ rekursive Ordinalzahlen und prädikativ rekursive Funktionen’, Zeitschrift für Mathematische Logik und Grundlagen der Mathematik 23, 435–438.
Wainer, S. S.: 1970, ‘A Subrecursive Hierarchy over the Predicative Ordinals’, in Wilfrid Hodges (ed.), Conference in Mathematical Logic, London, 1970, Berlin [Lecture Notes in Mathematics 255], pp. 350–351.
Wainer, S. S.: 1972, ‘Ordinal Recursion and a Refinement of the Ordinal Recursive Functions’, Journal of Symbolic Logic 37, 281–292.
Wainer, S. S.: 1989, ‘Slow Growing Versus Fast Growing’, Journal of Symbolic Logic 54, 608–614.
Weiermann, A.: 1995, ‘Termination Proofs by Lexicographic Path Orderings Imply Multiply Recursive Derivation Lengths’, Theoretical Computer Science 139, 355–362.
Weiermann, A.: 1996a, ‘Investigations on Slow Versus Fast Growing, Part I: How to Majorize Slow Growing Functions Nontrivially by Fast Growing Ones’, Archive for Mathematical Logic 34, 313–330.
Weiermann, A.: 1996b, ‘How to Characterize Provably Total Functions by Local Predicativity’, Journal of Symbolic Logic 61, 52–69.
Weiermann, A.: 1997, ‘Sometimes Slow Growing is Fast Growing’, Annals of Pure and Applied Logic 90, 91–99.
Weiermann, A.: 1998a, ‘How is it that Infinitary Methods can be Applied to Finitary Mathematics? Gödel’s T: A Case Study’, Journal of Symbolic Logic 63, 1348–1370.
Weiermann, A.: 1998b, ‘Bounding Derivation Lengths with Functions from the Slow Growing Hierarchy’, Archive for Mathematical Logic 37, 427–441.
Weiermann, A.: 1999, ‘What Makes a (Pointwise) Hierarchy Slow Growing?’, in S. Barry Cooper and John K. Truss (eds), Sets and Proofs, Invited Papers from the Logic Colloquium 97, University of Leeds, England, 6–13 July 1997, Cambridge [London Mathematical Society Lecture Notes Series 258], pp. 403–423.
Weiermann, A.: 2001a, ‘Γ0 May be Minimal Subrecursively Inaccessible’, Mathematical Logic Quarterly 47, 397–408.
Weiermann, A.: 2001b, ‘Some Interesting Connections between the Slow Growing Hierarchy and the Ackermann Function’, Journal of Symbolic Logic 66, 609–628.

Institut für Mathematische Logik und Grundlagenforschung der Westfälischen Wilhelms-Universität Münster Einsteinstr. 62, D-48149 Münster Germany E-mail: [email protected]

STEFAN GESCHKE

APPLICATIONS OF ELEMENTARY SUBMODELS IN GENERAL TOPOLOGY

ABSTRACT. Elementary submodels of some initial segment of the set-theoretic universe are useful in order to prove certain theorems in general topology as well as in algebra. As an illustration we give proofs of two theorems due to Arkhangel’skii concerning cardinal invariants of compact spaces.

1. INTRODUCTION

Several theorems in general topology, especially inequalities between certain cardinal invariants of a topological space, can be proved in the following way: For a given topological space X consider a space X0 which is small in some sense and approximates X sufficiently well. Calculate the cardinal invariant in question for X0 and show that this cardinal invariant is the same for X0 and X since X0 is a good approximation of X. The method of elementary submodels provides a uniform approach for generating small approximations of topological spaces as well as of other structures.

Typically, mathematics takes place in the set-theoretic universe (V, ∈), i.e., the class of all sets together with the usual ∈-relation. For a given topological space X we would like to use the Löwenheim-Skolem Theorem to get a small elementary submodel (M, ∈ ∩ M²) of the universe (V, ∈) such that X and its topology are contained in M, and then consider the space X0 which is what M thinks that X is. However, there are two problems. First of all, working in V we cannot get elementary submodels of (V, ∈). This follows from Gödel’s Second Incompleteness Theorem. Moreover, it is not clear what it really is that M thinks that X is. The first problem can be solved by taking elementary submodels not of the whole universe but of a sufficiently large initial segment of the universe. The solution of the second problem depends on the application we have in mind.

Synthese 133: 31–41, 2002. © 2002 Kluwer Academic Publishers. Printed in the Netherlands.

Much of the material presented here is contained in Just and Weese (1997). Much more on this topic can be found in Dow (1988).

2. MODELS OF SET THEORY

Let us briefly recall some basics from logic. Kunen (1980) and Chang and Keisler (1990) provide an excellent background in set theory and model theory, respectively. The language L of set theory is the first-order language with the binary relation symbol ∈. That is, the language L consists of the formulae over the alphabet

{∧, ¬, ∃, (, ), ∈, =} ∪ Var,

where Var is a countably infinite set of variables. As usual, we will freely use abbreviations like ‘⊆’, ‘⇒’, and ‘∃x ∈ y’ inside a formula. An L-structure is a pair (N, E) where N is a set and E is a binary relation on N, i.e., E ⊆ N^2. By the usual abuse of notation, sometimes we will identify (N, E) and N. If ϕ(x_1, ..., x_n) is a formula and a_1, ..., a_n ∈ N, then ϕ[a_1, ..., a_n] is the word obtained from ϕ by replacing x_i by a_i wherever x_i occurs freely, i.e., not in the scope of a quantifier ∃x_i, in ϕ for i = 1, ..., n. Abusing notation, we will refer to ϕ[a_1, ..., a_n] as a formula as well. As usual, by induction on the length of ϕ, one defines when (N, E) satisfies ϕ[a_1, ..., a_n]. We write (N, E) |= ϕ[a_1, ..., a_n] for ‘(N, E) satisfies ϕ[a_1, ..., a_n]’. (N, E) is a model of ϕ[a_1, ..., a_n] if it satisfies ϕ[a_1, ..., a_n]. If Σ is a set of formulae, (N, E) satisfies Σ or is a model of Σ if it satisfies all formulae in Σ. An L-structure (M, F) is called an elementary submodel of (N, E) if M ⊆ N and for all formulae ϕ(x_1, ..., x_n) and all a_1, ..., a_n ∈ M,

(M, F) |= ϕ[a_1, ..., a_n] if and only if (N, E) |= ϕ[a_1, ..., a_n].

We write (M, F) ≺ (N, E) if (M, F) is an elementary submodel of (N, E). Note that (M, F) ≺ (N, E) implies F = E ∩ M^2. In the following, we will simply write (M, E), respectively M, instead of (M, E ∩ M^2). The well-known Löwenheim-Skolem Theorem guarantees the existence of many small elementary submodels of a given structure (N, E).

THEOREM 2.1 (Löwenheim-Skolem). Let N be an L-structure and A ⊆ N infinite. Then there is M ⊆ N such that A ⊆ M, |M| = |A|, and M ≺ N.

Intuitively, (V, ∈) satisfies all the axioms of our standard set theory ZFC. However, V is a proper class and not a set. Thus we cannot get an elementary submodel of (V, ∈) from the Löwenheim-Skolem Theorem. Moreover, as mentioned in the introduction, Gödel’s Second Incompleteness Theorem implies that we cannot hope to get a model of ZFC at all working in ZFC alone. However, every proof of a theorem of ZFC uses only finitely many axioms. (Note that ZFC contains infinitely many axioms.) And we can get models of every finite part of ZFC.

Recall the use of ⋃ in set theory: For a set x, we set

⋃x := {z : z ∈ y for some y ∈ x}.

For every n ∈ N we define ⋃^n x recursively. Let ⋃^0 x := x and ⋃^n x := ⋃(⋃^(n−1) x) for every n ≥ 1. The transitive closure tc(x) of x is the set ⋃{⋃^n x : n ∈ N}. For a cardinal χ let H_χ := {x : |tc(x)| < χ}. Each H_χ is a set and (H_χ, ∈) satisfies some quite large part of ZFC. Moreover, the following holds:

THEOREM 2.2 (Reflection Principle). If ϕ_i(x_1, ..., x_n), i ∈ {1, ..., m}, are formulae and ρ is a cardinal, then there is χ > ρ such that for all a_1, ..., a_n ∈ H_χ and all i ∈ {1, ..., m},

(H_χ, ∈) |= ϕ_i[a_1, ..., a_n] if and only if ϕ_i[a_1, ..., a_n] holds in the universe.

Suppose ϕ_i(x_1, ..., x_n), i ∈ {1, ..., m}, and χ are as in the Reflection Principle and we have M ≺ (H_χ, ∈). Then for all a_1, ..., a_n ∈ M and all i ∈ {1, ..., m},

M |= ϕ_i[a_1, ..., a_n] if and only if ϕ_i[a_1, ..., a_n] holds in the universe.

Thus, with respect to a given set of finitely many formulae, M looks like an elementary submodel of the universe. In the situation above, the formulae ϕ_i are said to be absolute over M.
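The transitive closure and the sets H_χ defined earlier in this section can be made concrete for hereditarily finite sets. The following is only a minimal sketch, assuming the standard convention that tc(x) is the union of x, ⋃x, ⋃⋃x, and so on; the encoding of sets as nested Python frozensets and the names big_union, transitive_closure and in_H are illustrative choices, not notation from the text, and in_H only handles finite cardinals χ.

```python
def big_union(x):
    """Return the union of x, i.e. {z : z in y for some y in x}."""
    return frozenset(z for y in x for z in y)


def transitive_closure(x):
    """Return tc(x) for a hereditarily finite set x (nested frozensets)."""
    closure = set()
    level = frozenset(x)          # level 0 is x itself
    while level:                  # sets are well-founded, so levels run out
        closure |= level
        level = big_union(level)  # level n is the union of level n-1
    return frozenset(closure)


def in_H(x, chi):
    """Decide whether |tc(x)| < chi for a finite cardinal chi (illustration only)."""
    return len(transitive_closure(x)) < chi


# von Neumann ordinals 0, 1, 2, 3 encoded as nested frozensets
zero = frozenset()
one = frozenset({zero})
two = frozenset({zero, one})
three = frozenset({zero, one, two})

print(transitive_closure(two) == two)                 # ordinals are transitive sets
print(transitive_closure(frozenset({two})) == three)  # tc({2}) consists of 0, 1 and 2
print(in_H(frozenset({two}), 4), in_H(frozenset({two}), 3))
```

The loop mirrors the recursive definition level by level; it terminates because every membership chain in a hereditarily finite set is finite.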

3. TOPOLOGICAL SPACES

Let us recall a few basic notions from topology. See Engelking (1977) for a lot of information on general topology.

Let X be a topological space. X is Hausdorff if for any two distinct points x, y ∈ X there are disjoint open sets U, V ⊆ X such that x ∈ U and y ∈ V. X is compact if it is Hausdorff and for every family C of open sets with X = ⋃C there is a finite subfamily F of C such that X = ⋃F. It is an easy exercise to show that a continuous image of a compact space is compact, provided it is Hausdorff.

If Y is a subset of X, a subset U of Y is open in Y if it is of the form O ∩ Y for some open subset O of X. The topology on Y we have just defined is the subspace topology on Y with respect to X. Note that Y is compact with respect to this topology if X is compact and Y is a closed subset of X.

Two topological spaces X and Y are topologically the same if they are homeomorphic, that is, if there is a bijection f : X → Y such that both f and f^−1 are continuous. f is called a homeomorphism.

The topology of a space X can be described by giving a base for the topology. Let τ be the topology of X, i.e., the collection of open subsets of X. B ⊆ τ is a base of X if for all x ∈ X and all U ∈ τ with x ∈ U there is V ∈ B such that x ∈ V ⊆ U. If we do not require B to be a subset of τ, we get the notion of a network. N ⊆ P(X) is a network of X if for all x ∈ X and all U ∈ τ with x ∈ U there is V ∈ N such that x ∈ V ⊆ U. If B is a base of X, then the open subsets of X are precisely the unions of elements of B.

EXAMPLE 3.1. Consider the set R of real numbers with the usual topology, i.e., the topology consisting of unions of open intervals. The set {(p, q) : p, q ∈ Q, p < q} of open intervals with rational endpoints is a countable base of R.

3.1. Cardinal Invariants

A cardinal invariant of topological spaces is a mapping i assigning a cardinal i(X) to each space X such that i(X) = i(Y) if X and Y are homeomorphic. An easy example is the cardinality of a space. Clearly, any two spaces which are homeomorphic have the same cardinality. Two less trivial cardinal invariants are the weight and the network-weight. The weight w(X) of a topological space X is the least infinite cardinal κ such that X has a base of size at most κ. The network-weight nw(X) of a topological space X is the least infinite cardinal κ such that X has a network of size at most κ. Since every base of X is a network, nw(X) ≤ w(X). Example 3.1 shows nw(R) = w(R) = ℵ_0.
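The difference between a base and a network can be checked mechanically on a small finite space. The sketch below is only an illustration under stated assumptions: the three-point space, its nested topology and the helper names is_network, is_base and fs are hypothetical and not taken from the text. Since weight and network-weight are defined as infinite cardinals, a finite example cannot separate the two invariants; it only shows that a network need not consist of open sets.

```python
def is_network(family, topology):
    """Every open U and every x in U admit V in family with x in V and V a subset of U."""
    return all(
        any(x in V and V <= U for V in family)
        for U in topology
        for x in U
    )


def is_base(family, topology):
    """A base is a network all of whose members are open."""
    return all(V in topology for V in family) and is_network(family, topology)


def fs(*elements):
    """Shorthand for building a frozenset from its elements."""
    return frozenset(elements)


# Toy space X = {0, 1, 2} whose open sets are the empty set, {0}, {0, 1} and X
topology = {fs(), fs(0), fs(0, 1), fs(0, 1, 2)}
singletons = {fs(0), fs(1), fs(2)}        # a network, but {1} and {2} are not open
nested = {fs(0), fs(0, 1), fs(0, 1, 2)}   # a base

print(is_network(singletons, topology))
print(is_base(singletons, topology))
print(is_base(nested, topology))
```

The family of singletons is a network of every space, which is why the network-weight never exceeds the cardinality of an infinite space.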

4. ARHANGEL’SKII’S THEOREM

THEOREM 4.1 (Arhangel’skii, see Engelking (1977)). Let X and Y be compact. If there is a continuous mapping f : X → Y which is onto, then w(Y ) ≤ w(X).

For the proof let us first observe that the corresponding statement for network-weight holds for all topological spaces.

LEMMA 4.2. Let X and Y be topological spaces and let f : X → Y be continuous and onto. Then nw(Y) ≤ nw(X).
Proof. Let N be a network for X. Then N′ := {f[V] : V ∈ N} is a network for Y: Let U ⊆ Y be open and non-empty. Let y ∈ U. Let x ∈ X be such that f(x) = y. Since f is continuous, f^−1[U] is open. Clearly, x ∈ f^−1[U]. Thus, there is V ∈ N such that x ∈ V ⊆ f^−1[U]. Now f[V] ∈ N′ and y ∈ f[V] ⊆ U.
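The argument of Lemma 4.2 can be traced on a finite example. This is a hedged sketch, not part of the proof: the spaces, the map f (given as a dictionary) and the helper names is_network, preimage, is_continuous and image_family are all hypothetical choices made for illustration.

```python
def is_network(family, topology):
    """Every open U and every x in U admit V in family with x in V and V a subset of U."""
    return all(
        any(x in V and V <= U for V in family)
        for U in topology
        for x in U
    )


def preimage(f, U, domain):
    """Return the set of points of the domain that f maps into U."""
    return frozenset(x for x in domain if f[x] in U)


def is_continuous(f, domain, top_X, top_Y):
    """f is continuous if preimages of open sets are open."""
    return all(preimage(f, U, domain) in top_X for U in top_Y)


def image_family(f, family):
    """Return {f[V] : V in family}, the family called N' in the proof."""
    return {frozenset(f[x] for x in V) for V in family}


X = frozenset({0, 1, 2})
top_X = {frozenset(), frozenset({0}), frozenset({0, 1}), X}
Y = frozenset({"a", "b"})
top_Y = {frozenset(), frozenset({"a"}), Y}
f = {0: "a", 1: "a", 2: "b"}                  # continuous and onto

network_X = {frozenset({x}) for x in X}       # singletons form a network of X
network_Y = image_family(f, network_X)

print(is_continuous(f, X, top_X, top_Y))
print(is_network(network_Y, top_Y))
print(len(network_Y) <= len(network_X))
```

The last line reflects the cardinality estimate behind nw(Y) ≤ nw(X): taking images can merge members of the network but never creates new ones.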

Theorem 4.1 follows from this lemma once we know that for compact spaces weight and network-weight are the same. We give a proof of this fact using elementary submodels.

LEMMA 4.3. If X is compact, then w(X) = nw(X).
Proof. Let N be a network of X with κ := |N| = nw(X) and let τ be the topology of X. Let χ be sufficiently large. That is, let χ be large enough for H_χ to contain X and τ and such that all those finitely many formulae which we want to be absolute in the following proof are absolute over H_χ. We could write down these formulae by taking a close look at the rest of the proof. But this is not necessary since the Reflection Principle says that a suitable χ exists for any finite set of formulae. Now pick M ≺ H_χ such that N ∪ {N, X, τ} ⊆ M and |M| = κ. We claim that τ ∩ M is a base of X.

Let U ∈ τ be non-empty and let x ∈ U. For y ∈ X \ U there are disjoint open sets U_y and V_y such that x ∈ U_y and y ∈ V_y. Since N is a network, there are sets A_y, B_y ∈ N such that x ∈ A_y ⊆ U_y and y ∈ B_y ⊆ V_y. Since χ is sufficiently large,

H_χ |= ∃U′_y, V′_y ∈ τ (U′_y ∩ V′_y = ∅ ∧ A_y ⊆ U′_y ∧ B_y ⊆ V′_y).

Therefore,

M |= ∃U′_y, V′_y ∈ τ (U′_y ∩ V′_y = ∅ ∧ A_y ⊆ U′_y ∧ B_y ⊆ V′_y).

Now there are U′_y, V′_y ∈ τ ∩ M ⊆ τ as in this formula; in particular, x ∈ U′_y and y ∈ V′_y. Clearly, X \ U ⊆ ⋃{V′_y : y ∈ X \ U}. Since X \ U is compact, there is a finite set F ⊆ X \ U such that X \ U ⊆ ⋃{V′_y : y ∈ F}. Note that F does not have to be a subset of M. However, {U′_y : y ∈ F} is a subset of M. Since this set is finite, it can be defined in H_χ. More precisely, suppose {U′_y : y ∈ F} = {U_1, ..., U_n}. Let ϕ(z, x_1, ..., x_n) be the formula saying that the elements of z are precisely x_1, ..., x_n. Now ϕ[W, U_1, ..., U_n] holds in H_χ if and only if W = {U_1, ..., U_n}. Here we could argue as follows: ϕ is one of those formulae we want to be absolute over H_χ and thus, by the choice of χ, we have

H_χ |= ϕ[W, U_1, ..., U_n] if and only if W = {U_1, ..., U_n}.

However, it turns out that formulae which are as simple as ϕ are absolute over every H_θ, no matter what cardinal θ is. This is due to the fact that for every b ∈ H_θ and every a ∈ b we have a ∈ H_θ. This property of the H_θ’s is called transitivity. Since

H_χ |= (∃zϕ)[U_1, ..., U_n],

M |= (∃zϕ)[U_1, ..., U_n].

Thus {U′_y : y ∈ F} ∈ M. Therefore U′ := ⋂{U′_y : y ∈ F} ∈ M ∩ τ. Clearly, x ∈ U′ ⊆ X \ ⋃{V′_y : y ∈ F} ⊆ U. This finishes the proof of the claim and thus the proof of the lemma.

Proof of the theorem. Let X and Y be compact and f : X → Y continuous and onto. Then by Lemma 4.2 and Lemma 4.3,

w(X) = nw(X) ≥ nw(Y) = w(Y).

Theorem 4.1 does not hold if the spaces are only assumed to be Hausdorff.

EXAMPLE 4.4 (Alexandroff and Niemytzki, McAuley, see Engelking (1977)). Consider X := R^2 and let the topology τ on X be generated by the base B which is defined as follows: For each point (x, y) ∈ X and i ∈ N let

U_i(x, y) := {(a, b) ∈ R^2 : |(a, b) − (x, y)| < 1/(i+1)} and
C_i(x, y) := {(a, b) ∈ R^2 : |(a, b) − (x, y)| ≤ 1/(i+1)}.

Let

B := {U_i(x, y) : x, y ∈ R, y ≠ 0, and i ∈ N}
     ∪ {{(x, 0)} ∪ (U_i(x, 0) \ (C_i(x, 1/(i+1)) ∪ C_i(x, −1/(i+1)))) : x ∈ R, i ∈ N}.

This space is Hausdorff. It turns out that nw(X) = ℵ_0 while w(X) = 2^{ℵ_0}. Let N be a countable network of X. Let σ be the topology on X which is generated by N as a base. By the properties of a network, τ ⊆ σ. Thus (X, σ) is Hausdorff and the identity id_X : (X, σ) → (X, τ); x ↦ x is continuous. However, by definition of σ,

ℵ_0 = w(X, σ) < w(X, τ) = 2^{ℵ_0}.

5. BETTER SUBMODELS

In the proof of Lemma 4.3, we used the fact that for every elementary submodel M of H_χ a finite subset of M already is an element of M. On the other hand, for every M ≺ H_χ, N = ℵ_0 ∈ M. Clearly, N ⊆ M. Thus, if x ∈ M is countable, then M contains a bijection f : N → x. But now x = f[N] ⊆ M. The same argument shows that x ⊆ M if x ∈ M has size κ and κ ⊆ M. However, typically we do not know whether for some y ⊆ M, y is an element of M. The following lemma comes in handy.

LEMMA 5.1. Let δ be an ordinal and χ a cardinal. Suppose (M_α)_{α<δ} is a chain of elementary submodels of H_χ, i.e., for all α < δ, M_α ≺ H_χ and for all α, β < δ with α < β, M_α ≺ M_β. Then M := ⋃{M_α : α < δ} ≺ H_χ.

This lemma allows us to construct elementary submodels of H_χ with various closure properties.

LEMMA 5.2. Let χ, κ, and λ be infinite cardinals such that κ^λ = κ and λ < χ. Then for every A ⊆ H_χ with |A| ≤ κ there is M ≺ H_χ such that A ⊆ M, |M| ≤ κ, and for x ⊆ M with |x| ≤ λ, x ∈ M.
Proof. By the Löwenheim-Skolem Theorem, there is M_0 ≺ H_χ with A ⊆ M_0 and |M_0| ≤ κ. By induction on α < λ^+, construct a chain (M_α)_{α<λ^+} of elementary submodels of H_χ of size ≤ κ as follows:

For a limit ordinal α < λ^+ let M_α := ⋃{M_β : β < α}. By Lemma 5.1, M_α ≺ H_χ. By κ^λ = κ, λ < κ and thus α < κ. Since |M_β| ≤ κ for all β < α, |M_α| ≤ κ.

If α is a successor, say α = β + 1, let M_α ≺ H_χ be such that |M_α| ≤ κ, M_β ⊆ M_α, and for each x ⊆ M_β with |x| ≤ λ, x ∈ M_α. This is possible by the Löwenheim-Skolem Theorem together with the fact that M_β has not more than κ^λ = κ subsets of size ≤ λ.

M := ⋃{M_α : α < λ^+} works for the lemma: Clearly, A ⊆ M. Since λ^+ ≤ κ, |M| ≤ κ. By Lemma 5.1, M ≺ H_χ. Let x ⊆ M be of size ≤ λ. For each y ∈ x let α_y < λ^+ be such that y ∈ M_{α_y}. Let α := sup{α_y : y ∈ x}. Note that α < λ^+ since |x| ≤ λ. Now x ⊆ M_α and thus x ∈ M_{α+1} ⊆ M.

Using this lemma, we can give an easy proof of another famous theorem of Arhangel’skii. For a topological space X and x ∈ X, a family B of open subsets of X is called a local base at x if every element of B contains x and for every open set O ⊆ X containing x there is U ∈ B with U ⊆ O. X is first countable if for every x ∈ X there is an at most countable local base at x.

THEOREM 5.3 (Arhangel’skii, see Engelking (1977)). Let X be compact and first countable. Then |X| ≤ 2^{ℵ_0}.
Proof. Let τ be the topology of X. Let χ be large enough and pick M ≺ H_χ such that X, τ ∈ M, |M| ≤ 2^{ℵ_0}, and for all countable a ⊆ M, a ∈ M. M exists by Lemma 5.2.

Claim 1. X ∩ M is a closed subspace of X.

Let x ∈ X be in the closure of X ∩ M. Since X is first countable, there is a sequence (x_n)_{n∈N} in X ∩ M converging to x. Since every countable subset of M is an element of M and (x_n)_{n∈N} can be considered as a subset of N × (X ∩ M) ⊆ M, (x_n)_{n∈N} ∈ M. Since X is Hausdorff, x is the unique limit of (x_n)_{n∈N}. By compactness of X, M thinks that (x_n)_{n∈N} has a limit. Thus x ∈ M. This finishes the proof of the claim.

Clearly, the theorem follows from

Claim 2. X ⊆ M.

Suppose there is x ∈ X \ M. Since M knows that X is first countable, for every y ∈ X ∩ M, M contains a countable local base B_y at y. Since N ∈ M and hence N ⊆ M, B_y ⊆ M for all y ∈ X ∩ M. For each y ∈ X ∩ M pick U_y ∈ B_y such that x ∉ U_y. Clearly, X ∩ M ⊆ ⋃{U_y : y ∈ X ∩ M} and x ∈ X \ ⋃{U_y : y ∈ X ∩ M}. Since X ∩ M is compact by Claim 1, there is a finite set F ⊆ X ∩ M such that X ∩ M ⊆ ⋃{U_y : y ∈ F}. Since F ⊆ M is finite, {U_y : y ∈ F} ∈ M. Now

H_χ |= X \ ⋃{U_y : y ∈ F} ≠ ∅.

Thus

M |= X \ ⋃{U_y : y ∈ F} ≠ ∅.

Therefore X ∩ M ⊈ ⋃{U_y : y ∈ F}. A contradiction.

Note that for every M that contains all its subsets of size ≤ λ, we have |M| ≥ 2^λ. And even 2^{ℵ_0} can be large. Sometimes it is sufficient to consider models M ≺ H_χ with the property that all subsets of M of size ≤ λ are covered by elements of M of size ≤ λ. For example, for every n ∈ N and every set A ⊆ H_χ of size ℵ_n there is M ≺ H_χ such that A ⊆ M, |M| = ℵ_n, and for every countable subset x of M there is a countable set y ∈ M with x ⊆ y. Recently, models of this kind have been very useful in Fuchino et al. (2001). Another application of such models, more closely connected to the topic of this article, is Dow’s proof (Dow 1988) of the result of Hajnal and Juhasz (1980) that a topological space X has countable weight if every subspace of size at most ℵ_1 has countable weight.

Another important class of models are the internally approachables. One instance of internal approachability is V_κ-likeness. For a cardinal κ an elementary submodel M of some H_χ, χ > κ, is called V_κ-like if there is a chain (M_α)_{α<κ} of elementary submodels of H_χ with M = ⋃{M_α : α < κ} such that for each α, M_α has size less than κ and (M_β)_{β≤α} ∈ M_{α+1}. Note that if (M_β)_{β≤α} ∈ M_{α+1}, then M_α ∈ M_{α+1} since M_α ∈ H_χ, M_α can be defined in H_χ as the last element of the sequence (M_β)_{β≤α}, and M_{α+1} ≺ H_χ. Also note that every V_κ-like model has size κ. Among other nice properties, if κ is regular, every subset of a V_κ-like model M of size less than κ is covered by an element of M of size less than κ. For let a ⊆ M be of size less than κ. By regularity of κ, there is α < κ such that a ⊆ M_α. Now M_α ∈ M_{α+1} covers a, has size less than κ, and is contained in M.

Various kinds of internally approachable models have been successfully used by Shelah and others. For example, some so-called black box principles are formulated using internally approachable models. These principles hold in ZFC and can be used to construct structures with certain second order properties (e.g., in Mekler and Shelah (1993) or in Eklof and Mekler (1990)). That is, while constructing a structure by induction, one can keep track of its later endomorphisms. More on this topic will be contained in Shelah (in preparation). V_κ-like models were used in Fuchino and Soukup (1997) in order to characterize partial orderings with the so-called weak Freese-Nation property.

6. HOW TO GET APPROXIMATIONS OF X FROM M

What we really did in the proof of Lemma 4.3 was to consider the topology on X which is generated by the open subsets of X that are contained in the elementary submodel M. We basically showed that this topology coincides with the original topology on X.

Similar things happened in the proof of Theorem 5.3. Here we considered the space X ∩ M and showed that X ∩ M already coincides with X. However, there are many proofs using elementary submodels M of some H_χ where the approximation of a topological space X given by M is really smaller than X.

All we did so far was to consider a space obtained from X by thinning out the topology or by passing to a subspace of X. Another method, which is especially useful if the spaces under consideration are compact, is to pass to a quotient. Let X be compact and χ sufficiently large. For M ≺ H_χ define an equivalence relation ∼_M on X as follows:

x ∼_M y if and only if for all continuous f : X → R with f ∈ M, we have f(x) = f(y).

X/∼_M is compact and in some sense the most reasonable approximation of X we can obtain from M.

For a set A let [A]^{ℵ_0} denote the set of countably infinite subsets of A. C ⊆ [A]^{ℵ_0} is closed and unbounded in [A]^{ℵ_0} if every countable B ⊆ A is included in some element of C and the union of every countable chain in C is again an element of C. Note that by the Löwenheim-Skolem Theorem together with Lemma 5.1, for every infinite cardinal χ the set {M ∈ [H_χ]^{ℵ_0} : M ≺ H_χ} is closed and unbounded in [H_χ]^{ℵ_0}. Bandlow (1991) proposed the following type of characterization of a class K of compact spaces by a class F of continuous mappings:

A compact space X is in the class K if and only if for every sufficiently large χ there is a closed and unbounded subset C of [H_χ]^{ℵ_0} consisting of elementary submodels of H_χ such that for every M ∈ C the quotient map q : X → X/∼_M belongs to F.

For example, this works well in the case of openly generated compact spaces, which were studied by Ščepin (1981). Among other things, Ščepin proved that every openly generated compact space X satisfies the countable chain condition (c.c.c.), i.e., every family of pairwise disjoint non-empty open subsets of X is at most countable. Bandlow (1991) characterized openly generated compact spaces in terms of elementary submodels of H_χ’s using the ∼_M-approach and gave a simple proof of Ščepin’s result on the c.c.c. of openly generated spaces using his characterization. The class F of mappings used to characterize open generatedness is the class of open mappings. A continuous mapping is called open if the images of open sets under this mapping are again open.
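The equivalence relation ∼_M defined earlier in this section identifies two points exactly when no function available in M separates them. The purely combinatorial part of this construction can be sketched on a finite toy example. Everything concrete here is an assumption made for illustration: the finite point set stands in for the compact space X, the short list of Python functions stands in for the continuous real-valued functions belonging to M, and the name quotient_classes is hypothetical. Topology and elementarity play no role in the sketch.

```python
from collections import defaultdict


def quotient_classes(points, functions):
    """Group points that no function in the given family separates."""
    classes = defaultdict(set)
    for x in points:
        # two points are equivalent iff every function takes the same value on both
        signature = tuple(f(x) for f in functions)
        classes[signature].add(x)
    return {frozenset(c) for c in classes.values()}


points = range(6)
functions = [lambda x: x % 2, lambda x: x // 4]   # stand-ins for the functions in M

classes = quotient_classes(points, functions)
print(sorted(sorted(c) for c in classes))
```

Enlarging the family of functions can only refine the partition, which matches the idea that larger models M give finer approximations X/∼_M of X.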

REFERENCES

Bandlow, I.: 1991, ‘A Construction in Set-theoretic Topology by Means of Elementary Substructures’, Zeitschrift für Mathematische Logik 37, 467–480.
Chang, C. C. and Keisler, J.: 1990, Model Theory, Amsterdam [Studies in Logic and the Foundations of Mathematics 73].
Dow, A.: 1988, ‘An Introduction to Applications of Elementary Submodels to Topology’, Topology Proceedings 13, 17–72.
Eklof, P. and Mekler, A.: 1990, Almost Free Modules, Amsterdam [North-Holland Mathematical Library 46].
Engelking, R.: 1977, General Topology, Warszawa [Polska Akademia Nauk, Monografie Matematyczne 60].
Fuchino, S., Geschke, S., and Soukup, L.: 2001, ‘On the Weak Freese-Nation Property of P(ω)’, Archive for Mathematical Logic 40, 425–435.
Fuchino, S. and Soukup, L.: 1997, ‘More Set Theory Around the Weak Freese-Nation Property’, Fundamenta Mathematicae 154, 159–176.
Juhasz, I.: 1980, Cardinal Functions in Topology: Ten Years Later, Amsterdam [Mathematical Centre Tracts 123].
Just, W. and Weese, M.: 1997, Discovering Modern Set Theory II: Set-theoretic Tools for Every Mathematician, Providence [American Mathematical Society, Graduate Studies in Mathematics 18].
Kunen, K.: 1980, Set Theory, Amsterdam [Studies in Logic and the Foundations of Mathematics 102].
Mekler, A. and Shelah, S.: 1993, ‘Some Compact Logics – Results in ZFC’, Annals of Mathematics 137, 221–248.¹
Ščepin, V.: 1981, ‘Functors and Uncountable Powers of Compacts’, Russian Mathematical Surveys 36, 1–71.
Shelah, S.: Non Structure Theory, (in preparation).

Freie Universität Berlin II. Mathematisches Institut Arnimallee 3, D-14195 Berlin Germany E-mail: [email protected]

1 This is number 375 in Shelah’s publication list. It can be found at Shelah’s archive, which is located at http:/www.math.rutgers.edu/FTP_DIR/shelah.

MICHAEL STOLZ

THE HISTORY OF APPLIED MATHEMATICS AND THE HISTORY OF SOCIETY

ABSTRACT. Choosing the history of statistics and operations research as a case study, several ways of setting the development of 20th century applied mathematics into a social context are discussed. It is shown that there is ample common ground between these contextualizations and several recent research programs in general contemporary history. It is argued that a closer cooperation between general historians and historians of mathematics might further the integration of the internalist and externalist approaches within the historiography of mathematics.

1. INTRODUCTION

The historiography of mathematics is a field of research which incorporates several divergent traditions or methodological stances. It has proven convenient to divide the field roughly into an “internalist” and an “externalist” approach. The internalist one, for which Eberhard Knobloch’s contribution to the present volume may serve as an example, aims at reconstructing the development of mathematical concepts in the work of a particular mathematician or in the discussions of a certain group of mathematicians. A history of that kind is sometimes bound to presuppose on the part of the reader a thorough understanding of the relevant mathematical subject matter.

On the other hand, the externalist approach is characterized by an attempt to place a certain aspect of mathematics within a relevant extramathematical context. Since there are, of course, several kinds of context which might be deemed relevant, studies in the history of mathematics which are written in the externalist vein possibly do not have very much in common. So it is the first aim of the present paper to convey an idea of the variety of social contexts into which the history of mathematics can be set.

Whatever differences there may be, the externalist historiography of mathematics usually tends to present a relatively low level of technicality and thus to be accessible to the non-specialist. In particular,¹ general historians, who are chiefly interested in the history of society or of political ideas, and whose speciality may be the history of trade unions or of anti-communist intellectuals, can read studies of this kind without any insurmountable difficulties. It is the second aim of this paper to show that they might even consider this reading worthwhile. More specifically, it will be argued that several themes in the historiography of 20th century applied mathematics fit in with some recent research trends in general contemporary history and that a closer cooperation between historians of mathematics and general historians might favor the integration of externalist and internalist approaches in the historiography of mathematics.

Synthese 133: 43–57, 2002. © 2002 Kluwer Academic Publishers. Printed in the Netherlands.

By way of a case study, the paper focusses on those parts of 20th century applied mathematics which have entered into the toolkit of economics and management science. The main examples are statistics and operations research (OR). No attempt is made to elucidate the complex social dimensions of applied mathematics as a whole – a field whose demarcation as opposed to pure mathematics and to the natural and engineering sciences is in itself thoroughly social and historical. Nevertheless, the question whether OR belongs to the province of the historian of mathematics deserves some comment. In fact, the practitioners of OR show a tendency toward stressing its interdisciplinary character, especially the close connection between OR and the social sciences (cf., e.g., Lesourne (1990)). What is more, if the history of mathematics was confined to non-trivial theoretical achievements, the quantitative methods employed in management would to a considerable extent have to be excluded from this history. An illuminating discussion of these points can be found in the text of an address, given in 1982 not by a historian, but by a research mathematician, J. Barkley Rosser (Rosser 1982, 509–510). His retrospective views of his work in wartime OR deserve to be quoted at some length:

What is mathematics? I take the entirely pragmatic view that if a person’s associates thought the problem he or she was solving was a mathematical problem, then it was. Many of you will disagree with this. Indeed, many of the mathematicians involved in such enterprises during the War privately did not accept this definition. The attitude of many with the problems they were asked to solve was that the given problem was not really mathematics but, since an answer was needed urgently and quickly, they got on with it.

And there was another aspect. Problems that purported to require mathematical treatment were often not clearly formulated. A discussion between the person with the problem and a mathematician could result in a major reformulation. This usually resulted in a simplification. I shall count this also as mathematics. . . .

Is OR (operations research) mathematics? Nowadays, the practitioners insist that it is a separate discipline, and I guess by now [1982] it is. It is certainly not now taught in departments of mathematics. But, it grew out of mathematics. At the beginning of OR, during the war, it was mathematics according to my definition above, although some of the very good operators were physicists and chemists. The Air Force Generals and Navy Admirals thought it was wonderful stuff. You could not have convinced one of them that it was not mathematics.

Rosser’s text makes it quite clear that it is useful to distinguish between two perspectives on mathematics, the one from the inside and the one from the outside. The relevance of the latter becomes obvious when one wishes to account for the prestige that mathematics and mathematicians enjoyed during one period or another. Here what the “Air Force Generals and Navy Admirals” thought to be mathematics may prove crucial.

2. A HISTORY OF PERSONS

In this section, applied mathematics is set into a social context by treating mathematicians as individuals endowed with a specific biographical background and with patterns of social interaction. If one looks for factors that influenced the career path and outlook of those mathematicians who were to set up the mathematical tools of management in the postwar years, the importance of the Second World War can hardly be overestimated.² At this point, the case of J. Barkley Rosser, who was quoted above, proves revealing. The readers of a volume with the title “Foundations of the Formal Sciences” will probably mainly remember Rosser’s contributions to logic. In fact, his wartime work with a group which was charged with developing and producing rockets led him to devote a considerable part of his postwar work to applied mathematics and there, among others, to ballistics. And so he served as president not only of the Association for Symbolic Logic (1950–1953), but also of the Society for Industrial and Applied Mathematics (1964–1966) (Sacks 1989).

A characteristic feature of the mathematicians’ wartime experience was their participation in interdisciplinary groups of civilian scientists working for various units of the armed services.³ The first OR groups emerged in Britain before the onset of the war (McCloskey 1987a). In the process of the British pioneering development of the radar technology, which was undertaken against the backdrop of Germany’s vigorous policy of airborne rearmament, it had become apparent that the military efficacy of the new gadgets depended on a better understanding of the operations that the radar units were to perform. This “operational research”, which was soon to be extended to other military activities, e.g., anti-submarine warfare, was undertaken by interdisciplinary groups of usually civilian scientists who worked closely with the military commands. These OR groups consisted of scientists who came from a wide range of specialities, but typically included several physicists and mathematicians. “Blackett’s Circus”, as the most famous of the early OR groups was called,⁴ included besides Blackett, a physicist, three physiologists, two mathematical physicists, one astrophysicist, one Army officer, one surveyor with a background in anti-aircraft, and two mathematicians (McCloskey 1987a, 148). Even before Pearl Harbor, the British transferred their achievements in military science and technology to the United States. This transfer did not only lead to the foundation of the famous Radiation Laboratory at the MIT and to a major redirection of the American work on radar, but also inspired the establishment of OR groups in the U.S. armed services. Shortly after the American entry into World War II, the Navy had two OR organizations, the Army Air Forces had one, and their number multiplied during the following years (McCloskey 1987c, 911).

Apart from the OR groups, other teams played a crucial role. In the United States, the mobilization of science in the war was coordinated by the Office of Scientific Research and Development (OSRD), which was headed by Vannevar Bush. In November 1942, Bush created an “Applied Mathematics Panel” (AMP) to coordinate the mathematical assistance to the armed services. Many (but not all) of the mathematicians who were on leave from their universities to do war-related work were employed under contract with the AMP. To direct the panel, Bush chose Warren Weaver, an applied mathematician from the University of Wisconsin who had been the director of the Rockefeller Foundation’s Natural Sciences Division since 1934. “By the end of the war, AMP had undertaken almost 200 studies, nearly one-half of which represented direct requests from the armed services” (Rees 1980, 609, cf. Owens 1989). One of the major contractors for the AMP was the Statistical Research Group (SRG) at Columbia University. As opposed to the OR groups, the SRG was quite homogeneous, consisting of statisticians and economists. Another contractor in the field of statistics was Jerzy Neyman’s Statistical Laboratory at Berkeley.

A pertinent feature of the mathematical work done within the framework of an AMP contract was the inevitable client-orientation of research, which turned the research groups into consulting groups. Thus both the OR teams and the AMP contracting groups foreshadowed the “think tanks” of the postwar period, especially the RAND Corporation. RAND was founded in 1946, initially under the auspices of Douglas Aircraft Company, and soon became a key player in the further development of OR and in its moulding into systems analysis. Both systems analysis and RAND experts were to play a significant role at the Defense Department during the McNamara era. In addition to the work done for the military clients, since the 1960s RAND set out to develop tools for optimizing social policy, which perfectly fit in with the context of President Johnson’s Great Society programs.⁵ RAND soon was considered the example of a nonprofit advisory corporation and an icon of the Cold War.⁶

However, the case of the two wartime statistics groups mentioned above also aptly illustrates the tension between client-orientation and genuine mathematical research interests in statistics. On the one hand, Neyman dis- played a tendency to neglect the practical and computational aspects of the work arising from his contract for the sake of fundamental research, and his contract eventually was terminated by the AMP (Owens 1989, 292). On the other hand, one of the SRG’s lasting achievements in the realm of theoretical statistics was accomplished exactly because the client’s needs were taken seriously: One of the crucial tasks statistics had to fulfil in the wartime economy was quality control, especially in the production of military equipment. These quality control experiments involved two con- flicting aims. On the one hand, the degree of reliability of the procedures which seemed desirable in ordnance testing required large sample sizes, which were, on the other hand, consuming both time and money. There- fore, a Navy ordnance officer came up with a vague idea of a procedure which would allow, if certain conditions were met, to terminate an experi- ment earlier than planned, while maintaining the standards of reliability of the tests. These ideas, after undergoing a mathematical reformulation due to the statistician W. A. Wallis and the later Nobel laureate in economics, Milton Friedman, served as point of departure for Abraham Wald’s theory of ‘sequential analysis’,7 which was to grow out of his SRG work (Wallis 1980). The importance of client-orientated research groups should not, though, turn the historian’s attention away from the role which some individuals played in wartime research and science policy. In this context, the em- blematic figure is, of course, John von Neumann, who became one of the most influential mathematical consultants during the war. 
Amy Dahan Dalmedico views him as a paradigm case of an emergent species of mathematicians who differ both socially and culturally from prewar mathematicians. She lays stress on his role in shaping the political, scientific, and technical options of the United States, and on his contribution to blurring the boundaries between pure and applied mathematics, physics, and engineering science (Dahan Dalmedico 1996, esp. 173–174). Von Neumann was involved in wartime OR in various ways. For instance, he occasionally worked as a consultant with the Anti-Submarine Warfare Operations Research Group at the MIT Radiation Laboratory (Fortun and Schweber 1993, 603). He was particularly influential in incorporating game theory into the OR toolkit. Moreover, his decisive contributions to the development of the digital computer helped prepare the ground for the postwar applications of OR to far more complex problems than those which could be tackled during the war.8
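To make the early-stopping idea behind sequential analysis, mentioned above in connection with the SRG, more concrete, the following minimal Python sketch runs a sequential probability ratio test on a stream of pass/fail inspections. It is an illustration under stated assumptions, not a reconstruction of the SRG’s actual procedures: the function name `sprt_bernoulli`, the defect rates `p0` and `p1`, and the error levels `alpha` and `beta` are all hypothetical, and the stopping boundaries are Wald’s standard approximations.

```python
import math

def sprt_bernoulli(observations, p0=0.05, p1=0.20, alpha=0.05, beta=0.05):
    """Sequential probability ratio test for a defect rate.

    H0: defect rate = p0, H1: defect rate = p1 (with p1 > p0).
    All parameter values are hypothetical, chosen for illustration only.
    Returns (decision, number_of_observations_used).
    """
    # Wald's approximate stopping boundaries on the log-likelihood ratio.
    upper = math.log((1 - beta) / alpha)   # crossing it favours H1
    lower = math.log(beta / (1 - alpha))   # crossing it favours H0

    llr = 0.0  # running log-likelihood ratio of H1 against H0
    n = 0
    for n, defective in enumerate(observations, start=1):
        if defective:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        # The decision to stop depends only on the observations made so far.
        if llr >= upper:
            return "reject_h0", n
        if llr <= lower:
            return "accept_h0", n
    # Data exhausted before either boundary was crossed.
    return "continue", n
```

With these illustrative settings, a long run of defect-free items lets the test stop in favour of H0 well before the stream is exhausted, which is the saving in observations that the ordnance testers were after.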

A glance at the principal figures mentioned above points to another important lesson that can be learned from treating the history of 20th century mathematics as a history of mathematicians. It is in fact possible to depict this history as a history of transatlantic transfer. OR was imported to the United States from Britain during the early stages of the war, and in the postwar period it radiated from both countries onto the continent. Many of the outstanding American figures in this story had immigrated from Europe: John von Neumann, Jerzy Neyman, and Abraham Wald. If one also takes other specialities of applied mathematics into consideration, one can easily add many others, e.g., Theodor von Kármán, Richard Courant, Mark Kac, and William Feller. The best-known aspect of this migration is, of course, the case of the refugee mathematicians from Nazi Germany or from the parts of Europe occupied by the Wehrmacht. Nevertheless, several eminent mathematicians emigrated before 1933, and Neyman had been living in Britain for four years before he departed for Berkeley in 1938. Therefore, forced emigration was but one aspect of a long-term process of transatlantic exchange and transfer. This process contributed to consolidating the rather precarious position of applied mathematics within the mathematical landscape of the United States.9 But the most significant watershed in the history of American applied mathematics was, after all, the Second World War.

Arguably, the transatlantic contacts in the realm of mathematics fit in with the much broader topic of cultural transfer across the Atlantic Ocean.
Current research in contemporary history has been highlighting the role of emigration and remigration in spreading both American values and schemes of economic organization throughout western Europe, and the role that American foundations played in this process is now widely appreciated.10 To add a more specific point, the story outlined above could to some extent be depicted as the history of the Rockefeller Foundation. As director of the Rockefeller Foundation’s Natural Sciences Division, Warren Weaver was decisively involved in using Rockefeller funds to further the integration of German refugee mathematicians into American universities (Reingold 1981). In the postwar period, the foundation encouraged the voluntary migration of scientists by the award of scholarships. Some were to exert a lasting influence on the larger orientations of American research, e.g., in mathematical economics. This field was strongly inspired (and given a “Bourbakian” bias) by Gérard Debreu, who had received his training as a mathematician in Paris at the École Normale Supérieure and settled down in the U.S. at the end of the forties (Weintraub and Mirowski 1994).

3. A HISTORY OF VALUES

Up to the present point, this overview has linked applied mathematics to general trends in history by focussing on the patterns of social interaction the mathematicians were engaged in. A contextualization can also be achieved by bringing discourses about applied mathematics into play. These discourses can be related to political ideas or, more generally, to various kinds of values. So one can investigate the relationship between value-laden discourse about mathematics on the one hand, and the social identities of the “producers” and “consumers” of mathematics on the other. Once again, this strategy reflects a research trend in contemporary history, which seems to have been gaining momentum in recent years. There is a tendency to blur the boundaries between political history and history of technology, and to direct one’s attention to the interplay of developments in politics, economy, technology, and culture.11

During the 1950s and 60s, OR and its offspring, systems analysis, played an outstanding role in justifying political decisions, first with respect to armament policy, then, in the sixties, to welfare programs.12 This can be gleaned from the RAND Corporation’s influential position as an advisory body and from the pre-eminent prestige of OR at the Pentagon during the McNamara era. The function OR fulfilled in this context was twofold: It was both a tool and a symbol. The symbolic function can be aptly illustrated by the history of the management control system PERT (Program Evaluation Review Technique), which was employed in the development of the Navy’s Polaris missile during the second half of the 1950s. Hailed as a major breakthrough in management science, PERT also aroused considerable interest on the part of the public and contributed decisively to the prestige of the Polaris program.13 Nevertheless, Harvey M.
Sapolsky argues in his study of the Polaris project that, while the actual implementation of PERT was hampered by adverse circumstances and was, in fact, never completed, the Polaris staff was quite successful in using PERT to “sell” the missile program. “PERT did not build the Polaris, but it was extremely useful for those who did build the weapon system to have many people believe that it did” (Sapolsky 1972, 125).

Should the historians of applied mathematics set themselves the task of accounting for cases like PERT? Perhaps, provided that the partially mathematical character of these systems, or the thin veneer of mathematics acquired by management techniques, was actually instrumental in bringing about their enormous prestige. Theodore M. Porter’s discussion of a somewhat related topic, the establishment of cost-benefit analysis in the U.S. public works bureaucracy, suggests an affirmative answer and sketches a sociological explanation. His broader concern is with accounting for “the prestige and power of quantitative methods in the modern world” (Porter 1995, viii). Porter interprets the quantitative tools of management as “technologies of trust”, and sets out to show that their particular prestige has to be seen against the backdrop of “the American political context of systematic distrust” (Porter 1995, 149).

Two successive chapters of his book may serve to illustrate Porter’s point. Chapter 6 is devoted to one of the most prestigious corps of French state engineers, the Corps des Ponts et Chaussées, while chapter 7 deals with a somewhat analogous American body, the Army Corps of Engineers. The French corps, dating back to the prerevolutionary period, held a central place within the French public works bureaucracy.
During the 19th century, the need to assess the profitability of public works projects gave rise to a particular strand of economic thought, of which the work of Ponts et Chaussées engineer Jules Dupuit is a well-known example. Porter’s point is that these engineers, who typically had a strong command of mathematics and displayed much creativity in applying it to the problems of public works, nevertheless did not try to erect a formalized system which would allow the prospective benefit of a project to be calculated in a uniform manner. Instead, the final decision whether a project should be realized or not was arrived at in a rather informal way. According to Porter, this reliance on the engineers’ expertise reflects the prestige that the Corps des Ponts et Chaussées enjoyed within the meritocratic order of postrevolutionary France. “The standing of Ponts engineers was less a result of their technical knowledge than of the secure position they held in society. These were men who believed in their own capacity to make decisions. Within a body like the Corps des Ponts, informal discussion within a context of shared experience and personal trust was often sufficient to reach agreement. They felt no need to engage in the elaborate justificatory ritual of formal quantitative decision procedures unless threatened from outside by controversy and political pressures” (Porter 1995, 142).

By contrast, in the United States, the Army Corps of Engineers did not enjoy an unchallenged position of that kind. Part of its duties was the selection of flood control and water projects. The 1936 Flood Control Act required some kind of cost-benefit analysis in the process of assessing prospective projects, and restricted federal funds to those projects from which an excess of benefits over costs could be expected. In the decades before the Act, the Corps had practised an assessment of costs and benefits without developing standardized schemes for these analyses.
But from the 1940s, bureaucratic conflict between the Corps and other agencies required a more formalized way to justify the Corps’ decisions, which were at the same time challenged by powerful interest groups. Thus, from the early 1950s, the Corps began to multiply the numbers of economists and other social scientists in its ranks, who set out to provide cost-benefit analysis with an economic rationale. The techniques underwent a process of standardization and formalization, subsequently spread to all kinds of government expenditures, and were to inspire the economics of public health. According to Porter, the “transformation of cost-benefit analysis into a universal standard of rationality, backed up by thousands of pages of rules, cannot be attributed to the megalomania of experts, but rather to bureaucratic conflict in a context of overwhelming public distrust” (Porter 1995, 189).

Porter relates quantitative techniques (and thus mathematics, widely understood) to values via the concept of “technologies of trust”. The next case places OR within a context of value-laden ideas that is much more obviously political. Jonathan Rosenhead’s research into postwar OR in Britain leads to the conclusion that British OR was, above all, a left-wing phenomenon. In fact, he observes an “interpenetration of operational research and of socialist thinking” (Rosenhead 1989, 6). Many of the founding fathers of British OR were left-wing intellectuals, and one of the best-known proponents of a Marxist perspective on science, the x-ray crystallographer J. D. Bernal, was deeply involved in wartime OR.

In March 1946, Cecil Gordon, a biologist and wartime practitioner of OR, who had temporarily been a communist in the 1930s, was appointed to head a new unit at the Board of Trade, the Special Research Unit. At the same time, other administrative agencies, like the National Coal Board, established OR organizations as well.
Gordon participated in several committees whose discussions centered around the ways in which OR could become instrumental in achieving socialist goals. To cite but one example, the case of “consumer needs”: here the plan was to protect the peacetime consumers’ interests by using those OR measures that had been employed during the war to secure and optimize the supply of the armed services. In various reports, the committees advanced some ambitious schemes as to the role OR should henceforth play within the administration. Rosenhead summarizes them as follows (Rosenhead 1989, 17–18):

• the full use of OR to raise the level of national productivity;
• the application of OR to redesigning the government administrative machine more efficiently, to assess export prospects, to formulate criteria of efficiency for the newly nationalized industries, etc.;
• the establishment of OR groups in each government department and of a central unit for the study of broad national problems, as well as the encouragement of OR activity in nationalized and other key industries.

In the end, Rosenhead records the total failure of all attempts to firmly establish OR in the government apparatus. In October 1948, Gordon left the Board of Trade for reasons that can, according to Rosenhead, “only be a matter of speculation” (Rosenhead 1989, 15). But his earlier membership in the communist party may well have proven a liability given the onset of the Cold War. Subsequently there was a “nearly simultaneous collapse of OR elsewhere in government service” (Rosenhead 1989, 24). Rosenhead puts part of the blame on the obstructiveness of the civil service, but interprets the fate of British OR within the civilian government machine chiefly in the framework of the Cold War, where economic concepts that in some way or another were tied to central planning fared badly. There “was a change in the intellectual climate in which certain ideas and policies, including that of centralized state planning, became tarred with the brush of totalitarianism. Operational research, because of the socialist perspective of many of its originators, and more generally through the left-wing image which science and scientists had acquired since the 1930s, was exposed to this chill wind” (Rosenhead 1989, 24).

Rosenhead’s paper has been presented as an example of a certain kind of contextualization. Nevertheless, it reveals some specifics of the fate of OR in postwar Britain which differ considerably from the American case, where OR was a symbol rather than a victim of the Cold War. So it becomes quite clear that a political and social contextualization of the mathematical tools of management ultimately calls for a comparative approach. Much of the groundwork which is required for a comparative analysis still remains to be done, since the bulk of the available historical literature focusses on the English-speaking countries.

4. FURTHER PERSPECTIVES

In the previous section some evidence has been presented that discourses about applied mathematics in one way or another feature in the history of political and social thought. They may become associated with outright political ideas, such as socialism, or help symbolize the objectivity and impartiality of bureaucratic action. One thing that these contextualizations of mathematics have in common is that the content of the mathematics involved does not matter at all. One could be inclined to suspect that treating mathematics as a black box is in fact the hallmark of an externalist history. By way of conclusion, some indications shall be given that this is not the case.

If one wishes to relate the content of a mathematical theory to general history, an obvious idea is to investigate the history of mathematical modeling of social phenomena. As a second step, then, it is possible to explore whether this modeling has left its mark on the general history of ideas about society or even on some set of social practices. As far as the relationship between mathematics and the social sciences (as well as the related case of the biological sciences) is concerned, a considerable amount of research has already been done. In particular, the interrelation between 19th century statistics and the social and biological sciences of those days, which for example can be gleaned from the successive interpretations of the normal distribution, has aroused much interest.14 The introduction of probabilistic concepts into social sciences such as econometrics has been studied (Morgan 1990). The historical development of the mathematical models which are used in economic equilibrium theory and in biology has been traced in Ingrao and Israel (1990) and Israel (1996). Even for the post-1945 period, some aspects of the interplay between the social sciences and mathematics have been the object of detailed research.
Among the OR techniques, game theory has received special attention, and one can now appreciate the extent to which game theory influenced the modeling of competitive behavior not only in economics, but also in political science (Weintraub 1992). Historians of postwar psychology have pointed out that statistical decision theory had a sweeping impact on this field: First, it established methodological standards, and second, it served as a metaphor for human reasoning. Cognition came to be viewed as “intuitive statistics” (Gigerenzer and Murray 1987; Gigerenzer et al. 1989).

It is true that the research cited so far chiefly belongs to the province of the historian of science. In the light of Porter’s study of the bureaucratic practices of cost-benefit analysis and of Sapolsky’s account of the Polaris project, it seems wise to turn to the administrative aspects of the American large-scale armament projects if one wishes to assess the influence which the content of applied mathematics exerted not only on articulating theory, but also on controlling action. In his recent book “Rescuing Prometheus” (Hughes 1998), which tells the story of large technological projects like the SAGE air-defense system, Thomas P. Hughes devotes one chapter to the “Spread of the Systems Approach”. After reviewing the beginnings of OR, its moulding into systems analysis, the history of RAND and of McNamara’s “Whiz Kids”, Hughes narrates the career of Jay Forrester, the father of SAGE’s Whirlwind computer. In the 1960s, after having abandoned his engineering work in favor of a position at MIT’s Sloan School of Management, Forrester began studying the structure of social systems. He portrayed them as dynamic systems featuring complex feedback relationships, and set out to simulate them by computer modeling. In this vein, he published a series of books, “Industrial Dynamics” (1961),

“Urban Dynamics” (1969), and “World Dynamics” (1971). These models in turn inspired those which formed the basis of the famous “The Limits to Growth” (1972) report for the Club of Rome (D. Meadows et al.). This chapter from Hughes’ book makes it possible in principle to trace the gradual transformation of a mathematical tool (or of a paradigm of mathematical modeling) into a set of political ideas, which actually were not only ideas, but also ways of imagining or conceptualizing a complex world. But “The Spread of the Systems Approach” does not aim at giving a history of mathematical models, but at illustrating the spread of ideas that originated in the large-scale armament projects. Nevertheless, Hughes does not eschew mentioning some salient features of these models. Thus he points out that Forrester was convinced of the importance of nonlinear phenomena and incorporated them into his models, although this was only possible at the expense of simplicity (Hughes 1998, 183).

To sum up, what is called for is an integrated view, a historical study written both from the angle of tool-based management of large-scale systems (as in Hughes’ account), and from the angle of the internal structure of mathematical models (as in Giorgio Israel’s book (Israel 1996)). Of course, this would require a joint effort both on the part of internalist and externalist historians of mathematics, and on the part of the historians of technology, economy, economics, politics, and society. But in this way mathematics would become a legitimate object of the general history of ideas.

NOTES

1 Of course, the externalist historiography of mathematics is relevant to sociology as well, because mathematics is seen as a crucial test case for sociological theories of knowledge, cf. Bloor (1976) and Restivo (1992). It is beyond the scope of the present paper to elaborate on this point in any detail.
2 Of course, the alterations in the character of scientific work wrought by World War II have aroused considerable interest on the part of both historians of science and general historians. The main thrust of historical research, though, has been devoted to the physicists’ experience. Nevertheless, there is a considerable amount of published material, if largely commemorative, which reveals some characteristic traits of the mathematical war effort. For a recent survey article, covering also the mathematics relevant to physics and engineering, see Dahan Dalmedico (1996).
3 Fortun and Schweber point out that the interdisciplinary nature of the OR groups was not new. “Scientific management investigations – like the later OR ones – were interdisciplinary in character. There was nothing novel about the ‘mixed-team’ approach of OR. Industrial psychology grew out of the ‘scientific management’ investigations of the effect of repetitive actions on workers on the assembly line. Nor was it unusual to have physiologists on the team to help in the investigation of fatigue in the workplace” (Fortun and Schweber 1993, 624).
4 This group got its nickname because it was headed by Manchester physicist P. M. S. Blackett, who would be awarded the Nobel prize in 1948. Blackett wrote the most influential wartime text on the methods of OR and inspired the installation of OR groups in various British military units (McCloskey 1987a, 1987b).
5 See Fisher and Walker (1994), Jardini (1996), Hounshell (1997), and Hughes (1998).
6 See Smith (1966), Smith (1991), Waring (1995), and Edwards (1996).
7 “Sequential analysis is a method of statistical inference whose characteristic feature is that the number of observations required by the procedure is not determined in advance of the experiment. The decision to terminate the experiment depends, at each stage, on the results of the observations previously made. A merit of the sequential method, as applied to testing statistical hypotheses, is that test procedures can be constructed which require, on the average, a substantially smaller number of observations than equally reliable test procedures based on a predetermined number of observations. ... The sequential probability ratio test frequently results in a saving of about 50 per cent in the number of observations over the most efficient test procedure based on a fixed number of observations” (Wald 1947, 1).
8 This development also bears some relevance to gender studies. In the present context this can for example be illustrated by Wallis’ account of the SRG work. Here is how he described the duties of an assistant director of the group: “Bowker’s principal responsibility was to organize and manage the computing, which was done by about 30 young women, mostly mathematics graduates of Hunter or Vassar” (Wallis 1980, 322). It comes as no surprise that the 18 principal members of the SRG were all male (Wallis 1980, 324).
9 The theme of a marginalized applied mathematics in the interwar U.S. is frequently evoked in the commemorative literature, see Rees (1980), Lax (1989) and Prager (1972) together with the critical comment in Mac Lane (1989, 508–510). Historical studies have added several nuances to this picture, see Hanle (1982), Hunter (1996), Siegmund-Schultze (1998), and Hunter (1999). The last reference contains helpful information on how transatlantic contacts facilitated the creation of the community of American mathematical statisticians (Hunter 1999, 49–54).
10 See Mazon (1988), Gemelli (1997), Krohn and von zur Mühlen (1997), and Hochgeschwender (1998).
11 Some recent examples of this trend are Jordan (1994), Willeke (1995), Edwards (1996), and van Laak (1999).
12 See Jardini (1996) and Hounshell (1997).
13 “At the time of the first missile launch from the USS George Washington in 1960, press coverage of PERT was said by a naval public relations officer to have been almost as great as that devoted to Polaris” (Sapolsky 1972, 111).
14 E.g., Krüger et al. (1987) and Gigerenzer et al. (1989); see Porter (1994) for further references.

REFERENCES

Bloor, D.: 1976, Knowledge and Social Imagery, London.
Dahan Dalmedico, A.: 1996, ‘L’essor des mathématiques appliquées aux États-Unis: L’impact de la seconde guerre mondiale’, Revue d’histoire des mathématiques 2, 149–213.
Edwards, P. N.: 1996, The Closed World: Computers and the Politics of Discourse in Cold War America, Cambridge, MA.
Fortun, M. and S. S. Schweber: 1993, ‘Scientists and the Legacy of World War II: The Case of Operations Research (OR)’, Social Studies of Science 23, 595–642.
Fisher, G. H. and W. E. Walker: 1994, Operations Research and the RAND Corporation, Santa Monica, CA.
Gemelli, G.: 1997, ‘Les écoles de gestion en France et les fondations américaines (1930–1970)’, Entreprises et histoire 14–15, 11–28.
Gigerenzer, G. et al.: 1989, The Empire of Chance, Cambridge.
Gigerenzer, G. and D. J. Murray: 1987, Cognition as Intuitive Statistics, Hillsdale, NJ.
Hanle, P. A.: 1982, Bringing Aerodynamics to America, Cambridge, MA.
Hochgeschwender, M.: 1998, Freiheit in der Offensive? Der Kongreß für kulturelle Freiheit und die Deutschen, München.
Hounshell, D.: 1997, ‘The Cold War, RAND, and the generation of knowledge, 1946–1962’, Historical Studies in the Physical and Biological Sciences 27, 237–267.
Hughes, T. P.: 1998, Rescuing Prometheus, New York.
Hunter, P. W.: 1996, ‘Drawing the boundaries: Mathematical statistics in 20th-Century America’, Historia Mathematica 23, 7–30.
Hunter, P. W.: 1999, ‘An unofficial community: American mathematical statisticians before 1935’, Annals of Science 56, 47–68.
Ingrao, B. and G. Israel: 1990, The Invisible Hand: Economic Equilibrium in the History of Science, Cambridge, MA.
Israel, G.: 1996, La mathématisation du réel: essai sur la modélisation mathématique, Paris.
Jardini, D.: 1996, Out of the Blue Yonder: The RAND Corporation’s Diversification into Social Welfare Research, 1946–1968, Ph.D. dissertation, Carnegie Mellon University.
Jordan, J. M.: 1994, Machine-Age Ideology: Social Engineering and American Liberalism, 1911–1939, Chapel Hill, NC.
Krohn, C.-D. and P. von zur Mühlen (eds.): 1997, Rückkehr und Aufbau nach 1945: deutsche Remigranten im öffentlichen Leben Nachkriegsdeutschlands, Marburg.
Krüger, L. et al. (eds.): 1987, The Probabilistic Revolution, 2 volumes, Cambridge, MA.
Lax, P. D.: 1989, ‘The flowering of applied mathematics in America’, in P. Duren (ed.), A Century of Mathematics in America, Part II, Providence, RI, pp. 455–466.
Lesourne, J.: 1990, ‘OR and the Social Sciences’, Journal of the Operational Research Society 41, 1–7.
Mazon, B.: 1988, Aux origines de l’École des Hautes Études en Sciences Sociales: le rôle du mécénat américain (1920–1960), Paris.
McCloskey, J. F.: 1987a, ‘The Beginnings of Operations Research: 1934–1941’, Operations Research 35, 143–152.
McCloskey, J. F.: 1987b, ‘British Operational Research in World War II’, Operations Research 35, 453–470.
McCloskey, J. F.: 1987c, ‘U.S. Operations Research in World War II’, Operations Research 35, 910–925.
Morgan, M. S.: 1990, The History of Econometric Ideas, Cambridge.
Moutet, A.: 1997, Les logiques de l’entreprise: la rationalisation dans l’industrie française de l’entre-deux-guerres, Paris.
Mac Lane, S.: 1989, ‘The Applied Mathematics Group at Columbia in World War II’, in P. Duren (ed.), A Century of Mathematics in America, Part III, Providence, RI, pp. 495–515.
Owens, L.: 1989, ‘Mathematicians at War: Warren Weaver and the Applied Mathematics Panel, 1942–1945’, in D. E. Rowe and J. McCleary (eds.), The History of Modern Mathematics, Volume II: Institutions and Applications, Boston, pp. 287–305.
Porter, T. M.: 1994, ‘The Social Organization of Probability and Statistics’, in I. Grattan-Guinness (ed.), Companion Encyclopedia of the History and Philosophy of the Mathematical Sciences, London, pp. 1392–1398.
Porter, T. M.: 1995, Trust in Numbers: The Pursuit of Objectivity in Science and Public Life, Princeton, NJ.
Prager, W.: 1972, ‘Introductory Remarks’, Quarterly of Applied Mathematics 30, 1–9.
Rees, M.: 1980, ‘The Mathematical Sciences and World War II’, The American Mathematical Monthly 87, 607–621.
Reingold, N.: 1981, ‘Refugee Mathematicians in the United States of America’, Annals of Science 38, 313–338.
Restivo, S.: 1992, Mathematics in Society and History, Sociological Inquiries, Dordrecht.
Rosenhead, J.: 1989, ‘Operational Research at the Crossroads: Cecil Gordon and the Development of Post-War OR’, Journal of the Operational Research Society 40, 3–28.
Rosser, J. B.: 1982, ‘Mathematics and Mathematicians in World War II’, Notices of the American Mathematical Society 29, 509–515.
Sacks, G. E.: 1989, ‘John Barkley Rosser (1907–1989)’, Notices of the American Mathematical Society 36, 1367.
Sapolsky, H. M.: 1972, The Polaris System Development: Bureaucratic and Programmatic Success in Government, Cambridge, MA.
Smith, B. L.: 1966, The Rand Corporation: Case Study of a Nonprofit Advisory Corporation, Cambridge, MA.
Smith, J. A.: 1991, The Idea Brokers: Think Tanks and the Rise of the New Policy Elite, New York.
Siegmund-Schultze, R.: 1998, Mathematiker auf der Flucht vor Hitler, Braunschweig.
van Laak, D.: 1999, Weiße Elefanten: Anspruch und Scheitern technischer Großprojekte im 20. Jahrhundert, Stuttgart.
Wald, A.: 1947, Sequential Analysis, New York.
Wallis, W. A.: 1980, ‘The Statistical Research Group, 1942–1945’, Journal of the American Statistical Association 75, 320–333.
Waring, S. P.: 1991, Taylorism Transformed: Scientific Management Theory since 1945, Chapel Hill.
Waring, S. P.: 1995, ‘Cold Calculus: The Cold War and Operations Research’, Radical History Review 63, 28–51.
Weintraub, E. R. (ed.): 1992, Toward a History of Game Theory, Durham.
Weintraub, E. R. and P. Mirowski: 1994, ‘The Pure and the Applied: Bourbakism Comes to Mathematical Economics’, Science in Context 7, 245–272.
Willeke, S.: 1995, Die Technokratiebewegung in Nordamerika und Deutschland zwischen den Weltkriegen: eine vergleichende Analyse, Frankfurt/M.

Eberhard-Karls-Universität Tübingen
Mathematisches Institut
Auf der Morgenstelle 10
D-72076 Tübingen
Germany

Eberhard-Karls-Universität Tübingen
Seminar für Zeitgeschichte
Wilhelmstraße 36
D-72074 Tübingen
Germany

E-mail: [email protected]

EBERHARD KNOBLOCH

LEIBNIZ’S RIGOROUS FOUNDATION OF GEOMETRY BY MEANS OF RIEMANNIAN SUMS

ABSTRACT. In 1675, Leibniz elaborated the longest mathematical treatise he ever wrote, “On the arithmetical quadrature of the circle, the ellipse, and the hyperbola. A corollary is a trigonometry without tables”. It remained unpublished until 1993 and represents a comprehensive discussion of infinitesimal geometry. In this treatise, Leibniz laid the rigorous foundation of the theory of infinitely small and infinite quantities or, in other words, of the theory of quantified indivisibles. In modern terms, Leibniz introduced ‘Riemannian sums’ in order to demonstrate the integrability of continuous functions. The article deals with this demonstration, with Leibniz’s handling of infinitely small and infinite quantities, and with a general theorem regarding hyperboloids.

1. INTRODUCTION

In 1675, while still in Paris, Gottfried Wilhelm Leibniz elaborated his mathematical treatise “On the arithmetical quadrature of the circle, the ellipse, and the hyperbola. A corollary is a trigonometry without tables”. It was the longest mathematical treatise he ever wrote. It was published, posthumously, for the first time only in 1993 (Leibniz 1993). Leibniz originally wanted to submit it to the French Academy of Sciences in order to become a member of this institution. Hence, there can be no doubt that he rated it very highly, and with good reason, as I would like to show.

The treatise is a comprehensive study of infinitesimal geometry whose main subjects are conic sections and the logarithmic curve. This curve is dealt with on the reproduced page, which may give an impression of what the manuscript looks like (Figure 1). The original size of the manuscript is folio.

First of all, we have to comprehend the title. “Arithmetical quadrature” means a converging infinite series consisting of rational numbers. “Trigonometry without tables” means the expansion into infinite series of trigonometric functions like sin x, cos x, etc., so that trigonometric tables are no longer needed. The treatise consists of 51 theorems and 24 scholia, among them the well-known Leibniz criterion for the convergence of alternating series

Synthese 133: 59–73, 2002. © 2002 Kluwer Academic Publishers. Printed in the Netherlands.

Figure 1. Reproduced page of the manuscript.

(Proposition 49). In the treatise, Leibniz laid the rigorous foundation of the theory of infinitely small and infinite quantities or, in other words, of the theory of quantified indivisibles.

Leibniz had critically studied Galileo’s “Discourses and mathematical demonstrations concerning two new sciences”, where Galileo considered indivisibles as non-quantities. His own opinion differed fundamentally from Galileo’s approach regarding the infinite (Knobloch 1999). For Leibniz, mathematics was the science of quantities, which consequently cannot handle non-quantities as Galileo proposed. Hence, he defined: “Infinitely small quantities are smaller than any given quantity, but greater than zero; infinitely large quantities are larger than any given quantity”. To him, both kinds of quantities must be variable quantities. The decisive aspect is that, although they are fictive quantities (because they are introduced by a fiction), they are quantities nevertheless. It does not matter whether they appear in nature or not, because they make abbreviations possible in speaking, thinking, discovering, and proving. In Leibniz, Galileo’s non-quantities have become quantities which mathematicians can therefore handle, which means, above all, quantities with which they can calculate. Like Galileo, Leibniz emphasized the certainty, rigour, and precision of mathematics. Leibniz put the handling of infinitesimals or infinitely small quantities on a secure, finite basis which used quantities. Indivisibles have to be interpreted as infinitely small quantities of this type, as he underlined repeatedly, because indivisibles in the true sense of the word, for example points, are no more objects of mathematics than the unbounded, infinite asymptote.

I would like to discuss the following three issues:
• Leibniz’s sixth theorem: the integrability of certain curves.
• Leibniz’s arithmetic of the infinite: twelve rules.
• Leibniz’s 22nd theorem on hyperboloids.

2. LEIBNIZ’S SIXTH THEOREM: THE INTEGRABILITY OF CERTAIN CURVES

“The theorem is”, Leibniz said in his summary of “De quadratura arithmetica”, “most thorny. Therein it is overly carefully demonstrated that the procedure of constructing certain rectilinear step spaces and in equal fashion polygons can be continued to such a degree that they differ from each other or from curves by a quantity which is smaller than any given quantity. This is usually assumed by others in most cases” (Archimedes is meant). “Its reading can be passed over at the beginning. It serves, however, to lay the foundations of the whole method of indivisibles in the soundest way possible” (Leibniz 1993, 24).

Leibniz partially repeated these words immediately before he explained the theorem itself (Leibniz 1993, 28): “The reading of this proposition can be omitted if somebody does not want supreme rigour in demonstrating Proposition 7. And it will be better that it is disregarded at the beginning and that it will be read only after the whole subject has been understood, in order that its excessive exactness does not discourage the mind from the other by far more agreeable things by making it become weary prematurely. For it brings about only this that two spaces of which one passes into the other if we progress infinitely, approach each other to a difference which is smaller than any arbitrary assigned difference, even then when the number of inscriptions remains only finite”. Hence it is time to explain his Figure 2.

Figure 2.

Leibniz considers a curve through indexed points C (his example is a circular arc). The vertical axis is the x-axis, the horizontal axis is the y-axis. A new curve through indexed points D is constructed in the following way:

1. The tangents in C cut the x-axis in T.
2. The points D are the points of intersection of the perpendiculars on the x-axis through T with the ordinates through C of the first curve.
3. The secants through two subsequent points C cut the x-axis in M.
4. The points F are the points of intersection of the perpendiculars on the x-axis through M with the curve through the points D. The points N are the points of intersection of these perpendiculars with the ordinates through the points C of the first curve.

The construction is possible for every curve through points C (the first curve) provided it complies with three conditions:

(i) It is continuous.
(ii) There is no point of inflection.
(iii) There is no point with a vertical tangent.

The first condition is necessary in order to choose sufficiently close points C of the curve. The second condition makes sure that the constructed points T and M form a sequence which does not change direction: a point M related to two subsequent points T always lies between the two T. The third condition makes sure that there is always a point of intersection nF of the curve segment nD n+1D with the perpendicular nN nP.

The first condition is indispensable. If necessary, the two other conditions can be fulfilled by dividing the curve into several segments. Once the second curve has been constructed, the first curve can be omitted, because the quadrature – or in modern terms the integration – concerns the second curve, which has to be continuous and monotonically increasing or decreasing, as can be seen in Figure 3.

Theorem 6 maintains that the difference between the areas of the mixtilinear figure 1B 1D 2D 3D 3B 1B and of the step figure 1B 1N 1P 2N 2P 3B 1B can be made smaller than any given quantity. While the “common method of indivisibles” considered inscriptions and circumscriptions of mixtilinear figures, the step figure is neither an inscription nor a circumscription, but rather something in between. In modern terms: Leibniz demonstrated the integrability of a huge class of functions by means of Riemannian sums which depend on intermediate values of the partial integration intervals.

To this end he introduced two new notions: a new notion of equality and a new notion of sum of ordinates. While up to then two quantities were called equal if their difference was zero, Leibniz called two quantities equal if their difference can be made arbitrarily small, that is, infinitely small.

Figure 3.

While up to then infinitely small quantities had been summed up without “infinitely small” being well defined, Leibniz multiplied the sum of the ordinates by an infinitely small quantity, which was a well-defined notion. The proof consists of eight steps:

1. Partition
We choose a partition of the whole integration interval 1B 3B into a finite number of intervals whose lengths may differ from one another. An arbitrary rectangle over a partial integration interval is covered by an elementary rectangle (e.r.) reaching from the x-axis to the vertical through N and a complementary rectangle (c.r.) between two verticals through two subsequent points D. These rectangles overlap each other.
2. Estimation of the difference, taken absolutely, between the area of one elementary rectangle 1B 2B 1P 1N and one mixtilinear figure 1B 2B 2D 1D
We claim that |1B 2B 2D 1D − 1B 2B 1P 1N| < |1D 1E 2D|, where 1D 1E 2D is a complementary rectangle.
3. Demonstration of this claim by reduction
The common part 1B 2B 1P 1F 1D is subtracted from both areas, giving |1B 2B 2D 1D − 1B 2B 1P 1N| = |1D 1N 1F − 1F 1P 2D| < 1D 1E 2D. The two threelinear areas do not overlap and lie within the complementary rectangle 1D 1E 2D. Hence even the sum of their areas is smaller than the area of this complementary rectangle.

4. Estimation for all such elementary rectangles and mixtilinear figures
The previous demonstration is valid for all rectangles and figures. Hence, the difference between the sum of the areas of all mixtilinear figures and the sum of the areas of all elementary rectangles is smaller than the sum of the areas of all complementary rectangles. Leibniz uses the triangle inequality without saying so. If we use only two mixtilinear figures f1 and f2, and two elementary rectangles e.r.1 and e.r.2, we get indeed that

|f1 + f2| − |e.r.1 + e.r.2| ≤ |f1 + f2 − (e.r.1 + e.r.2)| ≤ |f1 − e.r.1| + |f2 − e.r.2|.

5. Sum of the areas of all complementary rectangles
Let C denote the sum of the areas of all complementary rectangles. C is smaller than the sum of their bases times their greatest height, or the sum of their bases times the common height provided that all rectangles have the same height. The sum of the bases is the difference between the greatest and the smallest ordinate, that is 1L 3D. Let hm denote the greatest height; then we have C < 1L 3D · hm.
6. Estimation of the difference between the area of the whole mixtilinear figure and that of all elementary rectangles
Let M denote the area of the whole mixtilinear figure and E the sum of the areas of the elementary rectangles. Then M − E can be bounded from above by the number 1L 3D · hm, i.e., M − E < 1L 3D · hm.
7. Reduction of the quantity 1L 3D · hm
This greatest height (an abscissa) can be chosen smaller than any given quantity, because the curve is continuous. Hence this quantity 1L 3D · hm can be made smaller than any given quantity.
8. Reduction of the difference M − E
The number M − E can be made smaller than any given quantity. Q.E.D.

Leibniz repeated the proof for the “common method of indivisibles”. In this case, which is illustrated in Figure 4, it becomes considerably easier: There is a curve 1N 2N 3N and a step figure 1N 1P 2N 2P 3N etc. The points N coincide with the points D. Again, there are elementary and complementary rectangles. The argumentation is based on five steps:

Figure 4.

1. Difference between the areas of the mixtilinear figure 1N 1B 3B 3N 2N 1N and the step figure
The area of the mixtilinear figure will be denoted by f and the area of the step figure will be denoted by s. Then we have that f − s is equal to the sum of the threelinear areas 1N 1P 2N, 2N 2P 3N etc. (the hatched areas).
2. Estimation of the difference f − s
We have that f − s

He saw the disadvantages of such a procedure (Leibniz 1993, 33): “I would have willingly preferred to omit this theorem”, he added in a comment on it, “because nothing is more alien to my mind than these scrupulous details of some authors which imply more ostentation than utility. For they consume time, so to speak, on certain ceremonies, include more trouble than ingenuity, and envelop the origin of inventions in blind night which is, as it seems to me, mostly more prominent than the inventions themselves. I do not deny, however, that it is in the interest of geometry to have the methods themselves and the principles of the inventions as well as some more outstanding theorems rigorously demonstrated. Hence, I believed that I had to give way a bit to the received opinions”.
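The estimate of Theorem 6 can be tried out numerically. The following Python sketch is a modern paraphrase rather than Leibniz's own construction: it assumes a monotone function over an ordinary horizontal axis, takes the midpoint of each subinterval as the intermediate point, and uses exact fractions. The names `step_sum` and `leibniz_bound` are hypothetical. It checks that the difference between the true area and the step figure stays below (greatest ordinate minus smallest ordinate) times the greatest subinterval width, the analogue of 1L 3D · hm.

```python
from fractions import Fraction as Fr

def step_sum(f, points):
    """Area of the step figure: one rectangle per subinterval, with its
    height taken at an intermediate point (here, by assumption, the midpoint)."""
    total = Fr(0)
    for left, right in zip(points, points[1:]):
        xi = (left + right) / 2
        total += f(xi) * (right - left)
    return total

def leibniz_bound(f, points):
    """(greatest ordinate - smallest ordinate) * greatest subinterval width,
    valid as stated only for a monotone f."""
    widest = max(right - left for left, right in zip(points, points[1:]))
    return abs(f(points[-1]) - f(points[0])) * widest

f = lambda x: x * x            # monotone and continuous on [0, 1]
exact_area = Fr(1, 3)          # area under x^2 between 0 and 1

# A partition with unequal subintervals, then a much finer one.
coarse = [Fr(0), Fr(1, 10), Fr(1, 4), Fr(1, 2), Fr(7, 10), Fr(1)]
fine = [Fr(i, 1000) for i in range(1001)]

err_coarse = abs(exact_area - step_sum(f, coarse))
err_fine = abs(exact_area - step_sum(f, fine))

print(float(err_coarse), float(leibniz_bound(f, coarse)))
print(float(err_fine), float(leibniz_bound(f, fine)))
```

Refining the partition shrinks the bound itself, which is the point of steps 7 and 8: the difference can be pushed below any given quantity while the number of rectangles stays finite.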

3. LEIBNIZ’S ARITHMETIC OF THE INFINITE: TWELVE RULES

In his treatise Leibniz used a dozen rules which constitute his arithmetic of the infinite. He just applied them without demonstrating them, only relying on the “law of continuity”: The rules of the finite remain valid in the domain of the infinite. They are as follows:

1. Finite + infinite = infinite
2.1 Finite ± infinitely small = finite
2.2 x, y finite, x = (y + infinitely small) ⇒ x − y ≈ 0 (not assignable difference)
3. Infinite1 − infinite2 = infinite3, if infinite1 > infinite2 (or infinite1 divided by infinite2 ≠ 1)
4. Infinite ± infinitely small = infinite
5. Finite times infinitely small = infinitely small
6. Infinite times infinitely small = infinite, or finite, or infinitely small (proof is needed)
7.1 Infinite times infinite = infinite
7.2 x^n infinite ⇒ x infinite
8. Infinite divided by infinite = finite, or infinite (proof is needed)
9. x infinitely small, y>0, y
11. Infinitely small divided by finite = finite divided by infinite = infinitely small (infinitesimal)
Corollary: finite divided by infinite = x divided by finite ⇒ x infinitely small
12. x divided by y = (x + infinitely small1) divided by (y + infinitely small2)

The laws 10. and 11. are particularly important, because they permit the demonstration that an unknown quantity is finite or infinitely small. I would like to follow Leibniz’s advice, that is to illustrate the use of the rules by an example.

Figure 5.

4. LEIBNIZ’S 22ND THEOREM ON HYPERBOLOIDS

I would like to illustrate Leibniz’s handling of these quantities by his Theorem 22 which reads as follows:

THEOREM 22. Let there be given an arbitrary hyperboloid y^m x^n = a (except the hyperbola yx = a). Then there are two infinitely long areas. The area between the asymptote A 0G, the arc 0C 1C of the curve, a finite abscissa A 1B and the corresponding finite ordinate is infinite, if m < n, and finite, if m > n.

As an example, let us consider yx^2 = a (i.e., m = 1 and n = 2), which is depicted in Figure 5. In this case, F1 is infinite and F2 is finite.
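The behaviour in this example can be illustrated in modern terms. The sketch below is not Leibniz's argument: it assumes ordinary integrals, writes the curve as y = (a / x^n)^(1/m), and uses the closed-form antiderivative of x^(−n/m). The function name `area_towards_asymptote` is hypothetical. The area is taken between a small abscissa `eps` and the abscissa 1, so letting `eps` shrink approximates the infinitely long area along one asymptote; swapping m and n corresponds to looking at the same curve from its other asymptote.

```python
def area_towards_asymptote(m, n, eps, a=1.0):
    """Area under y = (a / x**n)**(1/m) between x = eps and x = 1,
    computed from the antiderivative of x**(-n/m). Requires m != n,
    matching the exclusion of the ordinary hyperbola."""
    p = n / m
    return a ** (1 / m) * (1 - eps ** (1 - p)) / (1 - p)

for eps in (1e-2, 1e-4, 1e-6):
    grows = area_towards_asymptote(1, 2, eps)    # y x^2 = a: m = 1, n = 2
    bounded = area_towards_asymptote(2, 1, eps)  # exponents swapped: m = 2, n = 1
    print(eps, grows, bounded)
```

With m = 1 and n = 2 the values grow without bound as `eps` shrinks, while with the exponents swapped they stay below 2, in line with one area of the example being infinite and the other finite.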

Figure 6.

Leibniz’s demonstration of Theorem 22 relies on three theorems:

THEOREM 18. If y^m x^n = a or b x^n = y^m, the ratio of the zone between two ordinates, the arc of the curve and the axis to the so-called “conjugated zone” between the two corresponding abscissas, the same arc of the curve and the conjugated axis (that is the ratio of the two hatched zones to each other) is the same as m/n.

This theorem is visualized in Figure 6. For a proof, see the demonstration in Knobloch (1993, 80–82).

THEOREM 20. Let there be given three quantities X, Z, V. Let V + X have a finite ratio to V + Z which is not equal to one: (V + X)/(V + Z) ≠ 1.
1. If X, Z are finite, V is finite, as well.
2. If X or Z is infinite, V is infinite, as well.

For a proof, see the demonstration in Knobloch (1990, 51).

THEOREM 21. Let there be given a hyperboloid y^m x^n = a, A 0B an infinitely small abscissa, 0B 0C the corresponding infinite ordinate. The rectangle formed by this abscissa and this ordinate is infinitely great, if m < n, infinitely small, if m > n, and finite, if m = n.

For a proof, see the demonstration in Knobloch (1993, 83f.; 1994, 273–276). The proof of Theorem 22 consists of 11 steps.

Figure 7.

1. Infinite length
We claim that either F1 is infinitely long or 0B 0C is infinite, if A 0B is infinitely small.
Proof: By our rules, (A 1B)/(A 0B) is finite divided by infinitely small, which is infinite. But (A 1B)^n/(A 0B)^n = (0B 0C)^m/(1B 1C)^m = x/finite. The left ratio is infinite, according to rule 10. Hence either x is infinite (according to the corollary to rule 10), or 0B 0C is infinite (according to rule 7.2).
2. Partition/reduction
Let A 0B be infinitely small, that is, according to (1), let 0B 0C be infinite. We consider Figure 7: F = V + X + Z + R, this area is bounded by

1. the asymptote A 0G,
2. the infinite arc of the curve 1C 0C,
3. two segments A 1B, 1B 1C,
4. the infinitely small segment 0C 0G.

We reduce the affirmation to the following assertion: If R + Z is infinite/finite, then F is infinite/finite. The proof of the “reduced” affirmation will be contained in steps (3) through (6):

3. Application of Theorem 18
We have (V + X)/(V + Z) = m/n, where m/n ≠ 1, and m/n is finite, because yx = a is excluded.
4. Application of Theorem 20
X is finite. Hence we get the implication: If Z is infinite/finite, then V is infinite/finite. Let Z be finite/infinite, then V is finite/infinite. Hence either V + X or V + X + Z is finite/infinite, because the addition of a finite quantity X does not make an infinite/finite quantity finite/infinite.
5. The whole area F
R = A 0B · A 1G = infinitely small according to rule 5. Hence according to step (4): if Z is finite/infinite, then V + X + Z + R is finite/infinite, because the addition of an infinitely small quantity does not make a finite/infinite quantity infinite/finite.
6. Replacement of Z by Z + R
If Z + R is finite/infinite, then Z is finite/infinite, because the difference R is infinitely small; this means F is finite/infinite, or the affirmation of (2) is true.
7. Application of Theorem 21
Therefore we have to solve the question: When is Z + R finite or infinite? According to Theorem 21 we know: Z + R is infinite, if m < n, and infinitely small, if m > n. We notice: This is not the dichotomy we need.
8. Case 1: m < n … m > n instead of Z + R is infinitely small.

As a consequence, his proof is false here. Yet it can be corrected. To this end, we have to consider the following case: If Z + R is infinitely small, Z is infinitely small, because R is infinitely small. Theorem 20 does not include this case. As a consequence we have to investigate Theorem 20 in the case where Z is infinitely small.

9. Let us call this case Theorem 20a:
If (V + X)/(V + Z) is finite, V + X ≠ V + Z, X is finite, and Z is infinitely small, then V is finite.
Proof:
1) Suppose V is infinitely small. Then V + Z is infinitely small. V + X is finite. Hence (V + X)/(V + Z) is infinite according to rule 10. This is a contradiction to the assumptions.
2) Suppose V is infinite. Then V + Z is infinite, V + X is infinite. Hence (V + X) − (V + Z) is infinite. For if we subtract a smaller infinite from a greater infinite, whose ratio is not equal to 1, the remainder (difference) is infinite by the rules infinite1 divided by infinite2 = r, or infinite1 = r · infinite2, or infinite1 − infinite2 = (r − 1) · infinite2 (if r ≠ 1). But (V + X) − (V + Z) = X − Z is finite (it is finite minus infinitely small, hence finite). This is a contradiction to the foregoing result.
The steps 1) and 2) lead to the conclusion that V is finite.
10. Continuation of the main proof by means of Theorem 20a
If m > n, then Z + R is infinitely small, which means that Z is infinitely small. Hence either V is finite according to (9) or V + X + Z + R is finite.
11. Change of the coordinates
Every hyperbola has two asymptotes AG, AB. If m < n … n > m, that is if the area is infinitely great with regard to the asymptote AG, then the area with regard to the other asymptote is finite.

5. CONCLUSION

Leibniz’s treatise on the arithmetical quadrature of conic sections is of the highest mathematical interest, especially because of its methodological considerations. According to Leibniz, the best method must have four peculiarities: fertility, security, conclusiveness, and universality. His treatise shows how the method of indivisibles, that is, of infinitely small quantities, complies with all four conditions. The treatise itself gives overwhelming evidence of the fertility and universality of the method. Leibniz did not conceal how slippery calculation with the infinite is if it is not guided by the thread of a proof. The conclusiveness is demonstrated by means of Archimedes’s estimation process, in other words in the soundest way possible, as Leibniz put it. No mathematician ever contested this process: Leibniz justified his method by referring to the model of mathematical rigour.

REFERENCES

Knobloch, Eberhard: 1990, ‘L’infini dans les mathématiques de Leibniz’, in L’infinito in Leibniz, Problemi e terminologia, Simposio Internazionale Roma, 6–8 Novembre 1986, Roma, pp. 33–51.
Knobloch, Eberhard: 1993, ‘Les courbes analytiques simples chez Leibniz’, in Sciences et techniques en perspective, Vol. 26, pp. 74–96.
Knobloch, Eberhard: 1994, ‘The Infinite in Leibniz’s Mathematics – The Historiographical Method of Comprehension in Context’, in Kostas Gavroglu, Jean Christianidis and Efthymios Nicolaïdis (eds), Trends in the Historiography of Science, Dordrecht [Boston Studies in the Philosophy of Science 151], pp. 265–278.
Knobloch, Eberhard: 1999, ‘Galileo and Leibniz: Different Approaches to Infinity’, Archive for the History of Exact Sciences 54 (1999), 87–99.
Leibniz, Gottfried Wilhelm: 1993, De quadratura arithmetica circuli ellipseos et hyperbolae cujus corollarium est trigonometria sine tabulis, kritisch herausgegeben und kommentiert von Eberhard Knobloch, Göttingen [Abhandlungen der Akademie der Wissenschaften in Göttingen, Mathematisch-physikalische Klasse 3; 43].

Institut für Philosophie, Wissenschaftstheorie, Wissenschafts- und Technikgeschichte der Technischen Universität Berlin Ernst–Reuter–Platz 7 10587 Berlin Germany E-mail: [email protected]

A. R. D. MATHIAS

A TERM OF LENGTH 4 523 659 424 929

ABSTRACT. Bourbaki suggest that their definition of the number 1 runs to some tens of thousands of symbols. We show that that is a considerable under-estimate, the true number of symbols being that in the title, not counting 1 179 618 517 981 links between symbols that are needed to disambiguate the whole expression.

1. INTRODUCTION

Bourbaki, the self-perpetuating French group of mathematicians, are ill at ease with logic and foundations. Some signs of that are given in my essay The Ignorance of Bourbaki (Mathias 1992); Leo Corry has remarked in his book (Corry 1996, 318–9) that Bourbaki do not in their later volumes use the system they have so carefully set out in their Volume I; indeed, there was, according to copies of La Tribu unearthed by Corry, a considerable debate within Bourbaki as to whether the volume on La théorie des Ensembles should be written at all. That volume, indeed, has met criticism. Serre has remarked that few logicians like it. Godement, in the first hundred pages of his text Algèbre, though he follows Bourbaki’s exposition of logic and set theory, tells his readers to eschew formal reasoning. Another French scholar is quoted in Chouchan (1995, 124):

Jacques Roubaud parle même de l’effroyable premier livre sur la théorie des ensembles: un vrai désastre, auquel le monde a renoncé depuis longtemps. ... On ne se rend pas compte que c’est une présentation souvent fallacieuse.

One is tempted to ask “Why should Bourbaki have had a problem? Surely the task of setting up an axiomatic system of set theory as a basis for mathematics is quite straightforward”. A projected book, Danish Lectures on Bourbaki’s Foundations, will attempt an answer in the light of a re-examination of Bourbaki’s foundations made in response to the kind invitation of Professor Stig Andur Pedersen to speak at a meeting devoted to Bourbaki held at Roskilde in October 1998. The purpose of the present paper, which will form the first chapter of that book, is to make this point:

Synthese 133: 75–86, 2002. © 2002 Kluwer Academic Publishers. Printed in the Netherlands.

PROPOSITION 1.1. Bourbaki’s abbreviated structuralist definition of the number 1, when expanded into the primitive symbolism of the first edition of La Théorie des Ensembles, comprises 4 523 659 424 929 symbols together with 1 179 618 517 981 links between certain of those symbols.

That definition is quoted in the next paragraph. In §2 we review Bourbaki’s syntax; §§3–6 give the details of the calculation of the length of that formula, using the formalism of the original edition; in the seventh section we remark that its length is vastly increased by the formalism of the 1970 edition. Some brief comments on the psychological significance of these arithmetical freaks will be found in the final section.

Bourbaki’s Abbreviated Definition of 1

Chapters I and II of Bourbaki’s Théorie des Ensembles were published in 1954, and Chapter III in 1956. Among the primitive signs used was a reverse C, standing presumably for “couple”, to denote the ordered pair of two objects. Being typographically unable to reproduce that symbol, we use instead the symbol •. With that change, the footnote on page 55 of Chapter III reads

Bien entendu, il ne faut pas confondre le terme mathématique désigné (chap. I, §1, no 1) par le symbole “1” et le mot “un” du langage ordinaire. Le terme désigné par “1” est égal, en vertu de la définition donnée ci-dessus, au terme désigné par le symbole

τZ((∃u)(∃U)(u = (U, {∅}, Z) et U ⊂ {∅} × Z et (∀x)((x ∈ {∅}) ⇒ (∃y)((x, y) ∈ U))

et (∀x)(∀y)(∀y′)(((x, y) ∈ U et (x, y′) ∈ U) ⇒ (y = y′))

et (∀y)((y ∈ Z) ⇒ (∃x)((x, y) ∈ U)))).

Une estimation grossière montre que le terme ainsi désigné est un assemblage de plusieurs dizaines de milliers de signes (chacun de ces signes étant l’un des signes τ, □, ∨, ¬, =, ∈, •).

2. BOURBAKI’S SYNTAX

Bourbaki use the Hilbert operator but write it as τ rather than ε, which latter is visually too close to the sign ∈ for the membership relation. Bourbaki use the word assemblage, or, in their English translation, assembly, to mean a finite sequence of signs or letters, the signs being τ, □, ∨, ¬, =, ∈ and •. The substitution of the assembly A for each occurrence of the letter x in the assembly B is denoted by (A|x)B.

Bourbaki use the word relation to mean what in English-speaking countries is usually called a well-formed formula.

2.1. The rules of formation for τ-terms are these: let R be an assembly and x a letter; then the assembly τx (R) is obtained in three steps:
(2·1·0) form τR, of length one more than that of R;
(2·1·1) link that first occurrence of τ to all occurrences of x in R;
(2·1·2) replace all those occurrences of x by an occurrence of □.
In the result x does not occur. The point of that is that there are no bound variables; as variables become bound (by an occurrence of τ) they are replaced by □, and those occurrences of □ are linked to the occurrence of τ that binds them. The intended meaning is that τx (R) is some x of which R is true.

Links may be indicated by lines drawn under the formula. An example is given in Remark 5.3.

Certain assemblies are terms and certain are relations. These two classes are defined by a simultaneous recursion, presented in Godement (1963) thus:
T1: every letter is a term.
T2: if A and B are terms, the assembly •AB, in practice written (A, B), is a term.
T3: if A and T are terms and x a letter, then (A|x)T is a term.
T4: if R is a relation, and x a letter, then τx (R) is a term.
R1: if R and S are relations, the assembly ∨RS is a relation; in practice it will be written (R ∨ S).
R2: ¬R is a relation if R is.
R3: if R is a relation, x a letter, and A a term, then the assembly (A|x)R is a relation.
R4: if A and B are terms, =AB is a relation, in practice written A = B.
R5: if A and B are terms, the assembly ∈AB is a relation, in practice written A ∈ B.
That is all.

REMARK 2.2. Clauses T3 and R3 are, as pointed out to me by Solovay, redundant (if omitted, they can be established as theorems) and were added to Bourbaki’s original definition by Godement, presumably for pedagogical reasons.

REMARK 2.3. Note that every term begins with a letter, • or τ; every relation begins with =, ∈, ∨, or ¬. Hence no term is a relation.

Quantifiers are introduced as follows:

DEFINITION 2.4. (∃x)R is (τx(R) | x)R.

DEFINITION 2.5. (∀x)R is ¬(∃x)¬R.
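The formation rules and the two quantifier definitions can be tried out mechanically. The following Python sketch is only an illustration under simplifying assumptions: an assembly is modelled as a tuple of one-character tokens, links are not stored (each □ simply stands for one link), and the function names `tau`, `subst`, `exists` and `forall` are hypothetical.

```python
BOX = "□"

def tau(x, R):
    """τ_x(R): prefix a τ and replace every occurrence of the letter x by □."""
    return ("τ",) + tuple(BOX if s == x else s for s in R)

def subst(A, x, B):
    """(A|x)B: replace each occurrence of the letter x in B by the assembly A."""
    out = []
    for s in B:
        out.extend(A if s == x else (s,))
    return tuple(out)

def exists(x, R):
    """(∃x)R is (τ_x(R) | x)R, as in Definition 2.4."""
    return subst(tau(x, R), x, R)

def forall(x, R):
    """(∀x)R is ¬(∃x)¬R, as in Definition 2.5."""
    return ("¬",) + exists(x, ("¬",) + R)

R = ("∈", "x", "y")          # the relation x ∈ y, written in prefix form
ex = exists("x", R)
fa = forall("x", R)
print("".join(ex))           # ∈τ∈□yy
print("".join(fa))           # ¬¬∈τ¬∈□yy
```

Even in this tiny case the bound letter x disappears from the result, and the relation of length 3 grows to length 6 under ∃ and to length 9 under ∀.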

3. SOME CALCULATIONS

DEFINITION 3.1. We write (A) for the length of an assembly A, not counting any links that are there; oc(x, A) for the number of occurrences in A of the letter x, and λ(A) for the number of links in A, which will equal the number of occurrences of □.

PROPOSITION 3.2. If R is of length r, τx R is of length r + 1.
Proof: We have added a τ, and replaced each x by a □. q.e.d.

PROPOSITION 3.3. λ(τx (R)) = oc(x, R) + λ(R).

PROPOSITION 3.4. If x has m occurrences in R and y (distinct from x) has k occurrences in R, then in τx (R), x has no occurrences and y has k occurrences.

PROPOSITION 3.5. If x has m occurrences in R and y (distinct from x) has k occurrences in R, then in each of the formulæ (∃x)R and (∀x)R, x has no occurrences and y has (m + 1)k occurrences.
Proof: There are the original k occurrences of y, and each of the m x’s has been replaced by τx (R), in each of which y has k occurrences. q.e.d.

PROPOSITION 3.6. If R is of length r and has m occurrences of x, then the length of (∃x)R is r(m + 1).
Proof: Each replacement of x by τx (R) has increased the length by r. q.e.d.

PROPOSITION 3.7. If R is of length r and has m occurrences of x, then the length of (∀x)R is (r + 1)(m + 1) + 1.
Proof: The formula is ¬(∃x)¬R and ¬R is of length r + 1 and has m occurrences of x. q.e.d.

PROPOSITION 3.8. If x has m occurrences in R, and R has ℓ links, λ((∃x)R) = λ((∀x)R) = m(ℓ + m) + ℓ.

Proof: λ(τx (R)) = ℓ + m, by Proposition 3.3. Each occurrence of x in R is replaced by one of τx (R), and then there are the ℓ original links in R. q.e.d.

REMARK 3.9. A curiosity of this syntax, not needed for our present calculations, is that two trivially equivalent formulæ might have markedly different lengths. Thus if R has 2 occurrences of x, 5 of y and 3 of z, and is of length 50, the formula (∃x)(∃y)R will be of length 3900, with 234 occurrences of z, whereas the formula (∃y)(∃x)R will be of length 2400 with 144 occurrences of z.
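The bookkeeping of Propositions 3.5 to 3.8 lends itself to a small calculator. The sketch below is one possible encoding, not part of the original text: a formula is summarised by a triple (length, links, occurrences per letter), and the names `exists_stats` and `forall_stats` are hypothetical. The number of links of R is not given in Remark 3.9, so 0 is assumed; it does not affect the lengths or the occurrence counts.

```python
def exists_stats(stats, x):
    """Summary of (∃x)R from the summary of R (Propositions 3.5, 3.6, 3.8)."""
    length, links, occ = stats
    m = occ.get(x, 0)
    new_occ = {v: (m + 1) * k for v, k in occ.items() if v != x}
    return length * (m + 1), m * (links + m) + links, new_occ

def forall_stats(stats, x):
    """Summary of (∀x)R = ¬(∃x)¬R: the two ¬ signs add length only (3.7, 3.8)."""
    length, links, occ = stats
    inner_length, inner_links, inner_occ = exists_stats((length + 1, links, occ), x)
    return inner_length + 1, inner_links, inner_occ

# R of Remark 3.9: length 50, with 2 x's, 5 y's and 3 z's (0 links assumed).
R = (50, 0, {"x": 2, "y": 5, "z": 3})

x_outside = exists_stats(exists_stats(R, "y"), "x")   # (∃x)(∃y)R
y_outside = exists_stats(exists_stats(R, "x"), "y")   # (∃y)(∃x)R

print(x_outside[0], x_outside[2]["z"])   # 3900 234
print(y_outside[0], y_outside[2]["z"])   # 2400 144
```

The order dependence arises because the quantifier applied last multiplies the length by one more than the number of occurrences of its variable, and that number has already been inflated by the quantifier applied first.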

4. PARSING THAT TERM

We begin by repeating Bourbaki’s abbreviated term in open display, with y′ replaced by z:



τZ((∃u)(∃U)(u = (U, {∅}, Z) et U ⊂ {∅} × Z et
(∀x)((x ∈ {∅}) ⇒ (∃y)((x, y) ∈ U)) et
(∀x)(∀y)(∀z)(((x, y) ∈ U et (x, z) ∈ U) ⇒ (y = z)) et
(∀y)((y ∈ Z) ⇒ (∃x)((x, y) ∈ U)))).

That is of the form

τZ((∃u)(∃U)(A(u, U, Z) et B(U,Z) et C(U) et D(U) et E(U,Z))).

5. THE CLAUSES

5.1. E(U,Z) is of the form

(∀y)(∨ ¬ ∈ y Z ∈ • τ_x ∈ • □_x y U y U).

If we write

e(y, U, Z) =df ∨ ¬ ∈ y Z ∈ • τ_x ∈ • □_x y U y U

we see by inspection that e has 15 symbols (not counting subscripts, which represent links) and 1 link, 3 occurrences of y, 1 of Z and 2 of U. Hence E(U,Z) has (15 + 1)×(3 + 1) + 1 = 65 symbols, among them 4 occurrences of Z and 8 of U; and 3×(3 + 1) + 1 = 13 links.

5.2. D(U) is of the form (∀x)(∀y)(∀z)d(U,x,y,z), where d = d(U,x,y,z) is

∨ ¬ ¬ ∨ ¬ ∈ • x y U ¬ ∈ • x z U = y z;

by inspection d is of length 19, with 2 occurrences of U, 2 occurrences of x, 2 occurrences of y and 2 occurrences of z, and no links. Hence (∀z)d is of length (19 + 1)×(2 + 1) + 1 = 61, and has 6 occurrences of x, 6 of y and 6 of U, and 4 links; (∀y)(∀z)d is of length (61 + 1)×(6 + 1) + 1 = 435, and has 42 occurrences of x, and of U, and 6×(4 + 6) + 4 = 64 links; finally D(U), which is (∀x)(∀y)(∀z)d, is of length (435 + 1)×(42 + 1) + 1 = 18 749, and has 43 × 42 = 1806 occurrences of U, and 42×(64 + 42) + 64 = 4516 links.

REMARK 5.3. According to the footnote on page E.II 6 of Bourbaki, ∅ is

τ ¬ ¬ ¬ ∈ τ ¬ ¬ ∈ □ □ □,

or, with the links indicated by subscripts,

τ_x ¬ ¬ ¬ ∈ τ_y ¬ ¬ ∈ □_y □_x □_x.

So it has 3 links and 12 symbols. {x} is the term τy ∀z(z ∈ y ⇔ z = x) (slightly simplified from the actual definition as {x,x}). z ∈ y ⇔ z = x is

¬ ∨ ¬ ∨ ¬ ∈ z y = z x ¬ ∨ ¬ = z x ∈ z y

which has 20 symbols, with 4 occurrences of z, 2 of y, and 2 of x, and 0 links. Call that f(x,y,z). (∀z)f(x,y,z) therefore is of length (20 + 1)×(4 + 1) + 1 = 106, with 10 occurrences of y and 10 of x, and 16 links.

Therefore {x} is of length 107, with 10 occurrences of x and 26 links. Replacing each x by ∅, we find:

PROPOSITION 5.4. {∅} is of length 97 + 120 = 217, with 56 links.
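These counts can be confirmed by building the assemblies explicitly. The sketch below rests on the same simplifying assumptions as any token-level model: an assembly is a tuple of one-character tokens, every □ stands for one link, and the helper names (`tau`, `subst`, `exists`, `forall`, `iff`) are hypothetical. It constructs ∅ as τx((∀y)¬(y ∈ x)) and {x} in the simplified form τy((∀z)(z ∈ y ⇔ z = x)) used in Remark 5.3, then substitutes ∅ for x.

```python
BOX = "□"

def tau(x, R):
    """τ_x(R): prefix a τ and replace every occurrence of x by □."""
    return ("τ",) + tuple(BOX if s == x else s for s in R)

def subst(A, x, B):
    """(A|x)B: replace each occurrence of the letter x in B by A."""
    out = []
    for s in B:
        out.extend(A if s == x else (s,))
    return tuple(out)

def exists(x, R):
    return subst(tau(x, R), x, R)            # (∃x)R is (τ_x(R)|x)R

def forall(x, R):
    return ("¬",) + exists(x, ("¬",) + R)    # (∀x)R is ¬(∃x)¬R

def iff(A, B):
    """A ⇔ B written out as ¬∨¬∨¬AB¬∨¬BA."""
    return ("¬", "∨", "¬", "∨", "¬") + A + B + ("¬", "∨", "¬") + B + A

empty = tau("x", forall("y", ("¬", "∈", "y", "x")))
singleton = tau("y", forall("z", iff(("∈", "z", "y"), ("=", "z", "x"))))
singleton_of_empty = subst(empty, "x", singleton)

print("".join(empty))                                            # τ¬¬¬∈τ¬¬∈□□□
print(len(singleton), singleton.count("x"), singleton.count(BOX))
print(len(singleton_of_empty), singleton_of_empty.count(BOX))
```

Counting tokens and boxes in the results reproduces the lengths and link counts stated for ∅, {x} and {∅}.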

5.5. C(U) is of the form (∀x)c(x,U), where c(x, U) is ((x ∈ {∅}) ⇒ (∃y)((x, y) ∈ U)), that is,

∨ ¬ ∈ x {∅} ∈ • x τ_y ∈ • x □_y U U,

so replacing {∅} by its expansion into 217 symbols, we see that c(x, U) is of length 231 symbols, with 3 occurrences of x and 2 of U, and 57 links. Hence C(U) is of length (232 × 4) + 1 = 929 symbols, with 3×(57 + 3) + 57 = 237 links, and 8 occurrences of U.

5.6. The Cartesian product is defined by Bourbaki on page E.II.3. Expanding their notation for the class-forming operator, {w | R}, we have

X × Y = τw(∀z)((z ∈ w) ⇔ (∃x)(∃y)(z = (x, y) & x ∈ X & y ∈ Y)).

Write b0(x,y,z,X,Y) for (z = (x, y) & x ∈ X & y ∈ Y). A & B is ¬∨¬A¬B, so (A & B) & C is ¬∨¬¬∨¬A¬B¬C. So taking A to be z = (x, y), B to be x ∈ X and C to be y ∈ Y, we have that b0(x,y,z,X,Y) is

¬ ∨ ¬ ¬ ∨ ¬ = z • x y ¬ ∈ x X ¬ ∈ y Y,

which by inspection is of length 19, with 2 occurrences each of x and y, 1 each of z, X, and Y, and 0 links. Therefore (∃y)b0 is of length 19 × 3 = 57, with 6 occurrences of x, 3 each of z, X, and Y, and 4 links; (∃x)(∃y)b0 is of length 57 × 7 = 399, with 21 occurrences each of z, X and Y, and 64 links. Call that b1(z,X,Y).

A ⇔ B is ¬∨¬∨¬AB¬∨¬BA, so writing b(w, z, X, Y) for z ∈ w ⇔ b1(z,X,Y), we see that b is of length 8 + (2 × 3) + (2 × 399) = 812, with 2 occurrences of w, 44 occurrences of z, 42 occurrences each of X and Y, and 128 links. Therefore (∀z)b is of length (813 × 45) + 1 = 36 586, with 42 × 45 = 1890 occurrences each of X and Y, 90 occurrences of w, and 44×(128 + 44) + 128 = 7696 links. So we conclude that

PROPOSITION 5.7. The term X × Y is of length 36 587, has 90 + 7696 = 7786 links, and has 1890 occurrences each of X and Y.

Now we have seen that {∅} is of length 217, with 56 links. Replacing each occurrence of X by one of {∅}, we increase the length by 1890 × 216 = 408 240, and add 56 × 1890 = 105 840 links. So we conclude that:

COROLLARY 5.8. The term {∅}×Z has 1890 occurrences of Z in it, and is of length 444 827, with 7786 + 105 840 = 113 626 links.
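The bookkeeping of this section is mechanical enough to be sketched in code. The following Python fragment is an illustration only and is not the program written by Solovay; the class `Count` and all helper names are invented here, and ∅ is simply taken as a given term of 12 symbols with 3 links, as computed above. Each assembly is represented by its length, its number of links, and the number of occurrences of each variable.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Count:
    """Length, number of links, and variable occurrences of an assembly."""
    length: int
    links: int = 0
    occ: Counter = field(default_factory=Counter)


def var(name):
    return Count(1, 0, Counter({name: 1}))


def join(signs, *parts):
    """Put `signs` primitive signs in front of the concatenated parts."""
    occ = Counter()
    for p in parts:
        occ.update(p.occ)
    return Count(signs + sum(p.length for p in parts),
                 sum(p.links for p in parts), occ)


def neg(a): return join(1, a)                 # ¬A
def implies(a, b): return join(2, a, b)       # ∨¬AB
def conj(a, b): return join(4, a, b)          # ¬∨¬A¬B
def iff(a, b): return join(8, a, b, a, b)     # ¬∨¬∨¬AB¬∨¬BA
def rel(a, b): return join(1, a, b)           # ∈ab, =ab, and the pair •ab


def subst(a, x, t):
    """Replace every occurrence of the variable x in a by the term t."""
    k = a.occ[x]
    occ = Counter({v: n for v, n in a.occ.items() if v != x})
    for v, n in t.occ.items():
        occ[v] += k * n
    return Count(a.length + k * (t.length - 1), a.links + k * t.links, occ)


def tau(x, a):
    """τx(A): one extra sign; every x becomes a box linked to the τ."""
    occ = Counter({v: n for v, n in a.occ.items() if v != x})
    return Count(a.length + 1, a.links + a.occ[x], occ)


def exists(x, a): return subst(a, x, tau(x, a))      # (τx(A)|x)A
def forall(x, a): return neg(exists(x, neg(a)))      # ¬(∃x)¬A


x, y, z, w, X, Y = map(var, "xyzwXY")
EMPTY = Count(12, 3)                                 # ∅, taken as given

singleton = tau("y", forall("z", iff(rel(z, y), rel(z, x))))   # {x}
singleton_empty = subst(singleton, "x", EMPTY)                 # {∅}

b0 = conj(conj(rel(z, rel(x, y)), rel(x, X)), rel(y, Y))
b1 = exists("x", exists("y", b0))
product = tau("w", forall("z", iff(rel(z, w), b1)))            # X × Y
product_empty = subst(product, "X", singleton_empty)           # {∅} × Y

print(singleton_empty.length, product.length, product_empty.length)
```

Under these assumptions the sketch reproduces the figures of Propositions 5.4 and 5.7 and of Corollary 5.8 (with Y playing the role of Z).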

5.9. The formula U ⊂ V is (∀s)((s ∈ U) ⇒ (s ∈ V)); ((s ∈ U) ⇒ (s ∈ V)) is

∨¬∈sU∈sV, which is of length 8, with 2 occurrences of s, one each of U and V, and no links; hence U ⊂ V is of length 9 × 3 + 1 = 28, with 3 occurrences each of U and V, and 4 links. We conclude that the formula B(U,Z) ⇔df U ⊂ {∅}×Z is, when we replace each V by {∅}×Z, of length 28 + (3 × 444 826) = 1 334 506, with 3 occurrences of U, 5670 occurrences of Z and 4 + (3 × 113 626) = 340 882 links.

5.10. The triple (X,Y,Z) is defined to be ((X, Y), Z); in other words ••XYZ; so u = (X,Y,Z) is =u••XYZ, of length 7, with one occurrence each of u, X, Y, Z and no link; so u = (U, {∅}, Z) is of length 223, with one occurrence each of u, U, and Z, and 56 links.

6. THE GRAND TOTAL

The formula A & B is ¬∨¬A¬B, so ((((A & B) & C) & D) & E) is ¬∨¬¬∨¬¬∨¬¬∨¬A¬B¬C¬D¬E, which is 16 symbols plus those in A, B, C, D and E. So adding up: the number 1 is of the form τZ((∃u)(∃U)F(u,U,Z)) where F is the conjunction of A to E. We have this table:

formula      length    u      U      Z      links
A               223    1      1      1         56
B         1 334 506    0      3   5670    340 882
C               929    0      8      0        237
D            18 749    0   1806      0       4516
E                65    0      8      4         13
total:    1 354 472    1   1826   5675    345 704

From that we conclude that F is a formula of length 1 354 488, with 1 occurrence of u, 1826 occurrences of U, 5675 occurrences of Z, and 345 704 links. Hence (∃U)F is of length 1 354 488 × 1827 = 2 474 649 576, with 1827 occurrences of u, 1827 × 5675 = 10 368 225 occurrences of Z, and 1826 × (345 704 + 1826) + 345 704 = 634 935 484 links; and (∃u)(∃U)F is of length 2 474 649 576 × 1828 = 4 523 659 424 928 with 1828 × 1827 × 5675 = 18 953 115 300 occurrences of Z, and 1827 × (634 935 484 + 1827) + 634 935 484 = 1 160 665 402 681 links. Finally the term denoting the number 1 is of length one more than that, namely 4 523 659 424 929, with 18 953 115 300 + 1 160 665 402 681 = 1 179 618 517 981 links, without which, of course, the formula would be unreadable.
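The arithmetic of the last step can be checked with a few lines of Python. This is only a sketch: the function name `exists` and the dictionary of occurrence counts are conveniences introduced here, and the starting figures are taken from the totals in the table together with the 16 connective symbols.

```python
def exists(var, length, links, occ):
    """Counts for (∃var)A, given the counts for A.

    Each of the k occurrences of var is replaced by a copy of τvar(A),
    which has the same length as A plus one and k additional links.
    """
    k = occ[var]
    rest = {v: n * (k + 1) for v, n in occ.items() if v != var}
    return length * (k + 1), links + k * (links + k), rest


# F: the conjuncts A to E plus 16 connective symbols.
length, links, occ = 1_354_472 + 16, 345_704, {"u": 1, "U": 1826, "Z": 5675}

length, links, occ = exists("U", length, links, occ)   # (∃U)F
length, links, occ = exists("u", length, links, occ)   # (∃u)(∃U)F

# τZ adds one symbol and turns every occurrence of Z into a linked box.
term_length, term_links = length + 1, links + occ["Z"]
print(f"{term_length:,} symbols, {term_links:,} links")
```

With Python's arbitrary-precision integers there is no overflow to worry about, so the same few lines would serve for the much larger figures of the later edition, given the corresponding starting counts.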

7. THE LATER EDITION

In the combined 1970 edition of chapters I to IV, Bourbaki revert to the definition familiar to set theorists of the ordered pair of x and y as {{x}, {x,y}}. The corresponding footnote, on page E III 24 of that edition, is almost identical to the original, the only differences being the omission of a primitive symbol (the reverse C) for ordered pair, and the reference to Chapter I appearing more simply as (I, p. 15). Though there are good reasons for that change, it would mean, given the commitment of Bourbaki to the τ operator, an enormous increase in the number of symbols in their definition of the term 1, for •xy, instead of being of length 3 with one occurrence each of x and y, and no link, will be of length 4545, with 336 occurrences of x, 196 occurrences of y and 1114 links. X × Y will now be of length roughly \(3.1845912 \times 10^{18}\), with \(1.15067 \times 10^{18}\) links, and \(6.982221 \times 10^{14}\) occurrences each of X and of Y, and a program in Allegro Common Lisp written by Solovay yields these exact figures:

PROPOSITION 7.1. If the ordered pair (x, y) is introduced by definition rather than taken as a primitive, the term defining 1 will have 2 409 875 496 393 137 472 149 767 527 877 436 912 979 508 338 752 092 897 symbols, with 871 880 233 733 949 069 946 182 804 910 912 227 472 430 953 034 182 177 links.

At 80 symbols per line, 50 lines per page, 1,000 pages per book, the shorter version would occupy more than a million books, and the longer, \(6 \times 10^{47}\) books. I believe that the approach customary among set-theorists is simpler, whereby one takes the class-forming operator as a primitive symbol, and defines

0 =df {x |¬x = x} and

1 =df {x | x = 0}.

8. HIGHER CARDINALS

REMARK 8.1 (Solovay). Bourbaki’s definition of 1 as given will not generalise satisfactorily to higher cardinals as it omits a clause stating that the function u with graph U is 1-1. If one seeks an ad hoc definition of the number 1 in Bourbaki’s dialect of set theory, a shorter one – perhaps the shortest? – would be τZ ∃x(x ∈ Z et ∀y(y ∈ Z ⇒ y = x)), which runs to 176 symbols with 56 links.

REMARK 8.2. Rough calculations suggest that for large n, a Bourbachiste definition of n as some object a for which \(\exists x_1 \ldots \exists x_n\; a = \{x_1, \ldots, x_n\}\), with the x’s all distinct, will have over n exp(2exp(n + 1)) symbols, whereas von Neumann’s definition of n as the set of all m less than n, when written in the symbolism of a standard set-theoretic formalism with quantifiers taken as primitive, has O(2expn) symbols, whilst a yet shorter definition of von Neumann’s n would be the union of the class of all b such that ((b is a transitive set of transitive sets) and (\(\exists x_1 \ldots \exists x_n\; b = \{x_1, \ldots, x_n\}\) with the x’s all distinct)), which has \(O(n^{2})\) symbols; I learn from Solovay that that can be improved to O(log n log log n) or even O(log n) with “recycling of variables”.

9. DISCUSSION

Early reviewers such as Mostowski wrote that Bourbaki’s chosen foundations were “cumbersome”; I had not realised to what extent till I read a footnote in Bourbaki, reproduced in Godement, saying that the term for the number 1 would take some tens of thousands of signs to write out in full. I thought, “That must be false, surely only a couple of hundred”; and then the truth emerged.

I see in the hopeless unwieldiness of their system of logic, with its remarkable explosion in the length of formulæ, a possible explanation of the psychological stress suffered by some readers of Bourbaki. What will happen to a young innocent who decides to learn mathematics by reading Bourbaki, and to start with Volume I? It will tie him in knots. Either he will shut the book in disgust, or he will persevere and then he will be paralysed by the mental effort required to disentangle the formalism. Bourbaki themselves took the first course: as remarked by Corry, they shied away from their own foundations. I expect that they came to the conclusion that logic is crazy – they had to conclude that to protect their sanity; but were they aware that the picture of logic they were giving to their disciples is merely a grotesque distortion and diminution of a noble subject?

Is it too fanciful to see here, in this choice of formalism, with its unintuitive treatment of quantifiers, the reason for the phenomenon (which many mathematicians in various European countries have drawn to my attention whilst beseeching me not to betray their identity, lest the all-powerful Bourbachistes take revenge by depriving them progressively of research grants, office facilities and ultimately of employment) that where the influence of Bourbaki is strong, support for logic is weak?
How does one get the message across, to those who have accepted the Bourbachiste gospel, that logicians are actually not interested in such purposeless prolixity, still less do they advocate it as the proper intellectual framework for doing mathematics?

As mentioned above, I have in preparation a fuller discussion of this topic. Other chapters of Danish Lectures will discuss the foundational exposition in the first hundred pages of Godement’s 600-page text Algebra, Dieudonné’s essay on the philosophy of Bourbaki, sundry writings of other members of the Bourbaki school, and lastly, what is at the core of Bourbaki’s mistreatment of logic, the efforts of Hilbert and his school to use his ε operator to establish the consistency and completeness of mathematics.

For really the débâcle is hardly Bourbaki’s fault. The founding fathers, with Chevalley in the lead, were keen to introduce Hilbertian standards of rigour to France; and in their youthful enthusiasm they swallowed the Hilbertian promise of a complete and consistent mathematical system hook, line and sinker. Their formalism rests on a device of Hilbert with which he pursued the chimera of consistency proofs. In the analysis of a complete system, the ε-operator might make sense, but in an incomplete system, such as Gödel showed all moderately expressive reasonable fragments of mathematics to be, it becomes very tortuous, and not something to place at the centre of a serious exposition of mathematical truth.

NOTES

1 This paper owes its inception to the encouragement I have received from readers of The Ignorance of Bourbaki, and from the stimulating effect of Professor Pedersen’s invitation and hospitality. Discussions with Leo Corry during the Roskilde meeting have been most helpful. I am grateful to Robert Solovay and Tim Carlson for detecting an erroneous formula in an earlier draft of Proposition 3.8, and am further grateful to Solovay for further comments and corrections and for writing a program to give the exact figures for the yet lengthier symbolism discussed in §7. The first draft of the paper was written when I held an Associate Professorship at the Universidad de los Andes, Santa Fé de Bogotá, Colombia. It was put into final form whilst I was working on a research project at the Humboldt-Universität zu Berlin sponsored by the Deutsche Forschungsgemeinschaft.
2 Cf. Corry (1996, 319 and footnote 63 on p. 320).
3 The possible significance of the choice of the letter τ is to be discussed in a later chapter of my Danish Lectures.

REFERENCES

Bourbaki, Nicolas: 1939, Elements de mathematique, Les structures fondamentales de l’analyse, Theorie des ensembles, Livre 1, Fascicule de Resultats, Paris.
Bourbaki, Nicolas: 1954, Elements de mathématique, XVII, 1. part.: Les structures fondamentales de l’analyse, Livre I: Theorie des ensembles, Chapitre I: Description de la mathématique formelle, Chapitre II: Theorie des ensembles, Paris [Actualités Scientifiques et Industrielles 1212].
Bourbaki, Nicolas: 1956, Elements de mathématique, XVII, 1. part.: Les structures fondamentales de l’analyse, Livre I: Theorie des ensembles, Chapitre III: Ensembles ordonnes, Cardinaux, Nombres entiers, Paris [Actualités Scientifiques et Industrielles 1243].
Bourbaki, Nicolas: 1957, Elements de mathématique, XVII, 1. part.: Les structures fondamentales de l’analyse, Livre I: Theorie des ensembles, Chapitre IV: Structures, Paris [Actualités Scientifiques et Industrielles 1258].
Bourbaki, Nicolas: 1968, Elements of Mathematics, Theory of Sets, Paris.
Bourbaki, Nicolas: 1970, Elements de mathématique, Théorie des Ensembles, Nouvelle Edition, Paris.
Chouchan, M.: 1995, Nicolas Bourbaki, Faits et Légendes, Argenteuil.
Corry, Leo: 1996, Modern Algebra and the Rise of Mathematical Structures, Basel [Science Networks, Historical Studies 17].
Godement, Roger: 1963, Cours d’Algèbre, Paris (revised 1966).
Mathias, Adrian R. D.: 1992, The Ignorance of Bourbaki, Mathematical Intelligencer, Vol. 14, pp. 4–13 (also in Physis Riv. Internaz. Storia Sci (N.S.) Vol. 28, pp. 887–904; available in a Hungarian translation by András Racz as Bourbaki tévútjai, in A Természet Világa, 1998, III. különszáma).

Universidad de los Andes
Santa Fé de Bogotá
and
Humboldt-Universität zu Berlin
Germany
E-mail: [email protected]

Current address:
Département de Mathématiques
Université de la Réunion
Saint Denis de la Réunion
France
E-mail: [email protected]

HANS JÜRGEN PRÖMEL

LARGE NUMBERS, KNUTH’S ARROW NOTATION, AND RAMSEY THEORY

ABSTRACT. In the children’s book “The Phantom Tollbooth” by Norton Juster one can find the following passage: “Yes, please,” said Milo. “Can you show me the biggest number there is?” “I’d be delighted,” [the Mathemagician] replied, opening one of the closet doors. “We keep it right here. It took four miners just to dig it out.” Inside was the biggest 3 Milo had ever seen. It was fully twice as high as the Mathemagician. This is what children might consider to be a large number. The scope of this paper is to shed some light on numbers which adults – in former times and now – regard as large. Of course, the selection is arbitrary.

1. A HISTORICAL GLIMPSE

Numbers have always fascinated people. We start with a short look at the number system of the Romans and the Greeks which will lead us to Archimedes and to the largest number which appears in the literature of early occidental history.

1.1. The Roman System

The German word “Zahl” (number) goes back to the Old High German word “zala”, which in turn belongs (as do the English “tale” and the Dutch “taal”) to the Indo-European root “del-” (to notch, to carve). So, considering the origin of the word “Zahl”, it basically means “the notched” or “the carved”. Indeed, bones notched by prehistoric man 20,000 years ago are probably the oldest items which human beings used to help them count. Our ancestors cut notches into bones and later into pieces of wood in order to express the number of different things. But it is clear that the human ability to determine the number of notches on a bone accurately and quickly rarely exceeded the number four. Thus, if one tries to count with the aid of notches, the question arises how to express numbers larger than four in such a way that the encoded number can be recognized at the

Synthese 133: 87–105, 2002. © 2002 Kluwer Academic Publishers. Printed in the Netherlands.

Figure 1. A piece of wood with notches.1

first glance. The answer is easy and was found independently by several of our ancestors. After four equal notches, just change the form or position of the fifth one slightly to maintain readability of this sequence of lines. Moreover, some thousands of years ago several cultures developed the habit of doubling the symbol which they used for a fifth notch at every second occurrence. A typical example is given in Figure 1, which shows a piece of wood with notches made by Dalmatian herdsmen.

One can easily recognize three different kinds of marks which are very similar to the numerals I, V and X used by the Romans. In fact, the origin of the Etruscan and, hence, also of the Roman numerals is the use of different wooden notches. At the time of Caesar, then, the Romans used mainly the following symbols as numerals:

I    V    X    L    C    D = IↃ    M = CIↃ
1    5    10   50   100  500       1000

Using these symbols, the Romans created all numbers they wanted to express just by adding them. For example,

3888 = MMMDCCCLXXXVIII.
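The additive principle is simple enough to state as a short program. The following Python sketch is only an illustration: the function names are invented here, and it deliberately uses no subtractive forms such as IV or IX, so that reading a numeral is nothing more than adding up the values of its symbols.

```python
# The seven numerals listed above, largest first.
NUMERALS = [(1000, "M"), (500, "D"), (100, "C"), (50, "L"),
            (10, "X"), (5, "V"), (1, "I")]


def to_additive_roman(n):
    """Write n by repeating each numeral as often as it fits."""
    out = []
    for value, symbol in NUMERALS:
        count, n = divmod(n, value)
        out.append(symbol * count)
    return "".join(out)


def from_additive_roman(numeral):
    """Read a purely additive numeral: just add the symbol values."""
    values = {symbol: value for value, symbol in NUMERALS}
    return sum(values[ch] for ch in numeral)


print(to_additive_roman(3888))                   # MMMDCCCLXXXVIII
print(from_additive_roman("MMMDCCCLXXXVIII"))    # 3888
```

The length of the output grows with the size of the number, which is exactly the difficulty that forced the introduction of further numerals for larger values.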

The only possibility for them to express large numbers without writing very long sequences of these symbols was to create new numerals. For example, they also used the numerals

IↃↃ     CCIↃↃ    IↃↃↃ     CCCIↃↃↃ
5000    10,000   50,000   100,000

But they stuck (essentially) to their principle of expressing numbers by adding numerals from a limited toolbox (additive principle). By continuing the scheme shown above the Romans could also have created symbols for 500,000, 1,000,000 and so on. But they did not do so, probably because these symbols became confusing. So 23 (of the original 33) copies of CCCIↃↃↃ can still be seen on the Columna Rostrata dedicated to the Roman commander Duilius in the 3rd century B.C., describing the capture of 3,300,000 prisoners. The Romans did not even have a word for numbers larger than 100,000. For one million they used, as Plinius writes in the first century, “decies centena milia”, which is ten hundred thousand.

The symbol ∞ was a stylistic variation of CIↃ also occasionally used to express 1,000. John Wallis (1616–1703) proposed that this symbol should be used for infinity and mathematicians have been using it ever since.

It seems surprising that a culture such as that of the Romans, which maintained a high technological (and intellectual) level for centuries, kept a number system which did not allow them to calculate efficiently (a subject we have not touched upon at all here), and moreover still reflected an archaic way of thinking.

1.2. The Greek System

The Greeks, from about the fifth century B.C., used the notation illustrated in Figure 2.

Units             Tens              Hundreds
α Alpha    1      ι Iota     10     ρ Rho     100
β Beta     2      κ Kappa    20     σ Sigma   200
γ Gamma    3      λ Lambda   30     τ Tau     300
δ Delta    4      μ My       40     υ Ypsilon 400
ε Epsilon  5      ν Ny       50     φ Phi     500
ϝ Digamma  6      ξ Xi       60     χ Chi     600
ζ Zeta     7      ο Omikron  70     ψ Psi     700
η Eta      8      π Pi       80     ω Omega   800
ϑ Theta    9      ϙ Koppa    90     ϡ San     900

Figure 2. The Greek numbers.

To the 27 letters of the Greek alphabet they associated numbers divided into three groups of nine letters each. The first nine letters were used as units, the next nine as tens and finally the last nine as hundreds. Obviously, exactly 27 letters were needed to keep this working. For this reason, the Greeks kept as numerals three letters of Semitic origin which had vanished from the usual Greek alphabet earlier on. These letters are digamma (ϝ) or vau for 6, koppa (ϙ) or qoph for 90 and san (ϡ) or sampi for 900.

To express the numbers from 1,000 to 9,000, the Greeks recycled the letters associated to the numbers from 1 to 9 and used an additional apostrophe or iota indicating that the following number should be multiplied by 1,000, thus making use of the multiplicative principle instead of the additive one of the Romans.

ια      ιβ      ιγ      ιδ      ...   ιϑ
1000    2000    3000    4000    ...   9000

Moving to even larger numbers, the symbol for 10,000 was a capital M, the first letter of the Greek word μύριοι (myriad = ten thousand). To avoid confusion with the symbol for 40, the M usually got an exponent α = 1 or, as Aristarchos of Samos (∼310 – ∼230 B.C.) used it, an α preceding it. Correspondingly,

αM       βM       γM       δM       ...   ιϑϡϙϑM
10,000   20,000   30,000   40,000   ...   99,990,000.

So, for example, Aristarchos used this system to express the number

ιζροε M ιεωοε = 71,755,875 = 7,175 × 10,000 + 5,875

and surely all quantities of daily Greek life were expressible within this system. But the Greeks are famous for their mathematicians, their astronomers, their physicists – scientists with need for even larger numbers. For example, Apollonius of Perge (∼262 – ∼180 B.C.) invented powers of myriads. At about the same time Archimedes (∼287 – 212 B.C.) developed in his little book Ψαμμίτης (“De numero arenae” or “The Sand-Reckoner”, cf. Heath (1953)) an ingenious system, which led to probably the largest number ever used in the literature of early occidental history.
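The alphabetic system with myriads can be sketched as a small program. The Python fragment below is an illustration under stated assumptions: the function names are invented here, the glyphs ϝ, ϙ and ϡ stand in for digamma, koppa and san, and the mark ͵ is used for the sign that multiplies the following letter by 1,000 (the text mentions an apostrophe or an iota for this purpose).

```python
UNITS = "αβγδεϝζηθ"        # 1 .. 9
TENS = "ικλμνξοπϙ"         # 10 .. 90
HUNDREDS = "ρστυφχψωϡ"     # 100 .. 900
THOUSAND_MARK = "͵"        # assumed sign for "times 1,000"


def below_myriad(n):
    """Alphabetic numeral for 1 <= n <= 9999."""
    thousands, rest = divmod(n, 1000)
    hundreds, rest = divmod(rest, 100)
    tens, units = divmod(rest, 10)
    out = ""
    if thousands:
        out += THOUSAND_MARK + UNITS[thousands - 1]
    if hundreds:
        out += HUNDREDS[hundreds - 1]
    if tens:
        out += TENS[tens - 1]
    if units:
        out += UNITS[units - 1]
    return out


def greek(n):
    """Numeral for 1 <= n < 10**8, written as 'a M b' for a * 10,000 + b."""
    myriads, rest = divmod(n, 10_000)
    parts = []
    if myriads:
        parts.append(below_myriad(myriads) + " M")
    if rest:
        parts.append(below_myriad(rest))
    return " ".join(parts)


print(greek(71_755_875))   # the number of Aristarchos: 7175 M 5875
print(greek(10**8 - 1))    # the largest number expressible in this way
```

The design mirrors the text: letters are purely additive below 1,000, while the thousands mark and the myriad sign M are the two places where the multiplicative principle enters.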

1.3. The Sand-Reckoner

“There are some, king Gelon, who think that the number of the sand is infinite in multitude; ... But I will try to show you by means of geometrical proofs, which you will be able to follow, that, of the numbers named by me ... some exceed not only the number of the mass of sand equal in magnitude to the earth filled up in the way described, but also that of a mass equal in magnitude to the universe. ...”

Up to \(Q - 1 = 10^{8} - 1\)                        1st period, 1st order
up to \(Q^{2} - 1 = 10^{16} - 1\)                   1st period, 2nd order
...
up to \(Q^{6} - 1 = 10^{48} - 1\)                   1st period, 6th order
...
\(Q^{Q} - 1 = 10^{8 \cdot 10^{8}} - 1\)             1st period, Qth order
\(P = 10^{8 \cdot 10^{8}}\)
\(P \cdot Q - 1 = 10^{8(10^{8}+1)} - 1\)            2nd period, 1st order
...
\(P^{2} - 1 = 10^{16 \cdot 10^{8}} - 1\)            2nd period, Qth order
...
up to \(P^{Q} - 1 = 10^{8 \cdot 10^{16}} - 1\)      Qth period, Qth order

Figure 3. The hierarchy of Archimedes.

Unfortunately, the original description of Archimedes’ number system was lost. But he at least repeats the description of his numbers in the Sand-Reckoner to such an extent that it allows one to solve the problem he mentioned to the Syracusan king Gelon. In this note, we will not discuss the size of the universe to which Archimedes referred. This is based on one of the boldest astronomical speculations of antiquity, in which Aristarchos of Samos proposed putting the earth in motion about the sun. We will also not discuss the interesting geometric considerations in Archimedes’ Sand-Reckoner. We will just have a brief look at the numbers which are constructed there. So far, the largest number in the Greek system we have seen is

ιϑϡϙϑ M ιϑϡϙϑ = \(10^{8} - 1\).

Archimedes called the numbers up to this one “numbers of first order”. The numbers of second order then begin with a myriad myriads (say \(Q = 10^{8}\)) and end with \(Q^{2} - 1\). Then \(Q^{2}\) is the unit of third order numbers, which go as far as \(Q^{3} - 1\). Archimedes continued this way until he reached the numbers of Qth order, which end with \(Q^{Q} - 1 = (10^{8})^{10^{8}} - 1\). To express numbers even larger than this one, he extended his terminology by calling all numbers constructed so far numbers of first period. The second period, then, consequently begins with the number \((10^{8})^{10^{8}}\) (which we will call P) and continues to its Qth order, see Figure 3. The periods continue to the Qth period, where the construction of Archimedes stops.

That is, his system goes up to a myriad-myriad units of the myriad-myriadth order of the myriad-myriadth period – a number that would be written as a one followed by 80,000,000,000,000,000 zeros. To compare, the number of electrons in the universe (as far as it is presently known) is estimated to be just \(10^{87}\). Archimedes was of course aware that he could continue along these lines. He wrote that he stopped at this point because for his purpose it was enough to know the numbers up to this size. But surely, as he mentioned explicitly, one could go further. Moreover, it is notable that in connection with this work on large numbers Archimedes mentioned a principle that led to the invention of logarithms, in other words that the addition of orders of numbers (which is in his system the addition of the exponents of these numbers to the base \(Q = 10^{8}\)) corresponds to finding the product of numbers.

Archimedes concluded his investigation with a much smaller number than those he had constructed before.
“Hence the number of grains of sand which could be contained in a sphere of the size of our ‘universe’ is less than 1,000 units of the seventh order [in the first period] of numbers [or \(10^{51}\)].”

“I conceive that these things, king Gelon, will appear incredible to the great majority of people who have not studied mathematics, ...”
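The sizes in Archimedes' hierarchy are easiest to follow by tracking exponents of 10 rather than the numbers themselves. The following Python lines are a small sketch of that bookkeeping; the variable and function names are chosen here for illustration only.

```python
Q = 10**8   # a myriad myriads, the unit of the numbers of second order


def order_unit_exponent(k):
    """Exponent of 10 of the unit of the k-th order (first period): Q^(k-1)."""
    return 8 * (k - 1)


# The system ends at P^Q with P = Q^Q, that is at 10^(8 * Q * Q).
top_exponent = 8 * Q * Q
print(f"{top_exponent:,}")      # number of zeros after the leading one

# 1,000 units of the seventh order: 10^3 * Q^6.
sand_exponent = 3 + order_unit_exponent(7)
print(sand_exponent)

# The principle noted by Archimedes: adding orders multiplies numbers.
a, b = 3, 5
assert Q**a * Q**b == Q**(a + b)
```

Working with exponents is what makes the top of the hierarchy tractable at all: the number itself has far too many digits to write down, while its exponent is a mere seventeen-digit integer.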

2. RAMSEY NUMBERS AND RAMSEY THEORY

We will now make a big jump from Archimedes to the modern age and will try to find out what people nowadays consider to be a large number. Admittedly an unscientific approach is to look at the Guinness Book of Records (1997). There one will find that the largest number ever used in a mathematical proof is a bound which was published in 1977 and which is known as Graham’s number. This number is an upper bound which arose from a problem in a part of combinatorics called Ramsey theory. Moreover, we learn there that Graham’s number cannot be expressed using conventional notation of powers, and powers of powers. To write down this number a special arrow notation, developed by Knuth (1976), is needed. Before moving on to a discussion of Graham’s number, we will prepare the ground by looking at so-called Ramsey numbers.

2.1. Ramsey’s Theorem and Small Ramsey Numbers

In any collection of six people obviously either three of them mutually know each other or three of them mutually do not know each other. Assuming the relation of “knowing” is symmetric, we can turn this observation into the following mathematical statement: if one two-colors the pairs of a six-element set, say with red (for knowing each other) and blue (for not knowing each other), then there must exist a monochromatic three-element set (i.e., a triangle) either in color red or in color blue. This observation can be considered as the first nontrivial Ramsey result.

Why is this a “Ramsey” result? In 1928 Frank Plumpton Ramsey (1903–1930) wrote a paper “On a problem of formal logic” which was published in 1930 in the Proceedings of the London Mathematical Society (Ramsey 1930). It is this paper for which he became eponymous for a part of discrete mathematics nowadays known as Ramsey theory. His object was to give a decision procedure for the sentences of propositional logic. The need for such procedures – in present day terminology we would say algorithms – arose with the crisis of the foundations of mathematics around 1900.

It is ironic that a purely mathematical result from Ramsey’s paper has proved to be of greater consequence than the metamathematical investigations for which it was used as a tool. Even more, as was discovered later, the full strength of Ramsey’s theorem was not needed to find a decision procedure for the statements in the special class of first-order logic investigated by Ramsey. In abridged form, the key mathematical result in Ramsey’s paper is the following:

THEOREM (Ramsey 1930). For every pair s, t of positive integers there exists a least positive integer n = R(s,t) such that for every two-coloring of the pairs of an n-element set, say n = {0, ..., n − 1}, with colors red and blue, there exists either an s-element subset of n such that all its two-element subsets (pairs) are red (we say that there exists a red s-subset of n) or there exists a blue t-subset of n.

R(s,t) is called the Ramsey number of (s,t). As we have observed already, R(3,3) ≤ 6, and the reader will easily convince himself that there exists a two-coloring of the pairs of 5 = {0, 1, 2, 3, 4} without a monochromatic triangle, i.e., R(3,3) = 6. For almost seventy years, the problem of determining R(s,t), either exactly or asymptotically, has been one of the fundamental problems in combinatorics.

There is a vast literature on Ramsey numbers, and on small ones in particular. Besides the value of R(3,3) it is known, e.g., that R(4,4) = 18. But the value of R(5,5) is still unknown. The presently best bounds are 43 ≤ R(5,5) ≤ 49 and there is some evidence that 43 might be the truth. Paul Erdős (1913–1996), who was one of the most ingenious mathematicians of this century and who wrote more than a hundred papers on Ramsey theory, commented on this as follows:

“Suppose an evil spirit would tell us, ‘Unless you tell me the value of R(5, 5) I will exterminate the human race.’ Our best strategy would perhaps be to get all the computers and computer scientists to work on it. If he would ask for R(6, 6) our best bet would perhaps be to try to destroy him before he destroys us.”

A dynamic survey on small Ramsey numbers is available in the Electronic Journal of Combinatorics at www.combinatorics.org and is maintained by Radziszowski.
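The smallest case, R(3,3) = 6, is within reach of a brute-force check, which may help to make the definition concrete. The Python sketch below is an illustration with names chosen here: it exhibits a two-coloring of the pairs of {0, ..., 4} without a monochromatic triangle and then runs through all \(2^{15}\) two-colorings of the pairs of {0, ..., 5}.

```python
from itertools import combinations, product


def has_mono_triangle(n, colour):
    """colour maps each pair (i, j) with i < j to 0 (red) or 1 (blue)."""
    return any(colour[a, b] == colour[a, c] == colour[b, c]
               for a, b, c in combinations(range(n), 3))


def every_colouring_has_mono_triangle(n):
    pairs = list(combinations(range(n), 2))
    return all(has_mono_triangle(n, dict(zip(pairs, bits)))
               for bits in product((0, 1), repeat=len(pairs)))


# Pentagon colouring of the pairs of {0, ..., 4}: a pair is red if its
# points are neighbours on the 5-cycle 0-1-2-3-4-0 and blue otherwise.
pentagon = {(i, j): 0 if (j - i) % 5 in (1, 4) else 1
            for i, j in combinations(range(5), 2)}

print(has_mono_triangle(5, pentagon))          # False, hence R(3,3) > 5
print(every_colouring_has_mono_triangle(6))    # True, hence R(3,3) <= 6
```

The same exhaustive approach is hopeless for R(5,5): already at 43 points there are 903 pairs and therefore \(2^{903}\) colorings, which gives some substance to the remark of Erdős quoted above.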

2.2. Bounds on Ramsey Numbers

Though much effort has been spent on calculating small Ramsey numbers exactly, the real challenge is to determine the growth rate of Ramsey functions. This problem inspired many researchers and has led to new ideas and techniques whose effect went far beyond the original questions. Ramsey himself proved that R(t,t) ≤ t!, but he emphasized that this bound might be far too large. And he was right! Ramsey’s bound was first improved by Skolem (1933) who showed that \(R(t,t) \le 4^{t-1} - 1\). Shortly afterwards, Erdős and Szekeres (1935) rediscovered Ramsey’s theorem while solving a combinatorial problem in geometry. They showed in particular that

(1) R(s,t) ≤ R(s − 1,t) + R(s,t − 1).

To see this inequality, let n = R(s − 1,t) + R(s,t − 1) and assume a red-blue coloring of \([n]^2\), the pairs of n = {0, ..., n − 1}, to be given. Then, for every v ∈ n there are either at least R(s − 1,t) elements which form a red pair with v or there are at least R(s,t − 1) elements which form a blue pair with v. Hence, by choice of R(s − 1,t) and R(s,t − 1), there exists either a red s-subset of n or a blue t-subset. Using Pascal’s identity for binomial coefficients, inequality (1) leads to \(R(s,t) \le \binom{s+t-2}{s-1}\). Invoking Stirling’s formula then implies that

\[ R(t,t) \le \frac{1}{2\sqrt{\pi}} \, \frac{4^{t}}{\sqrt{t}} . \]

In view of the simplicity of the proof of the Erdős–Szekeres bound, it is amazing that over 50 years were to pass before the bound was improved significantly. In 1986, Rödl showed that \(R(t,t) / \binom{2t-2}{t-1} \longrightarrow 0\), and a short time later Thomason (1988) improved this by showing that

\[ R(t,t) \le t^{-\frac{1}{2} + \frac{A}{\sqrt{\log t}}} \binom{2t-2}{t-1} \]

for some constant A > 0. Since then, there has been no progress on the upper bound.
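The step from inequality (1) to the binomial bound can be illustrated numerically. The Python sketch below iterates (1) and compares the result with \(\binom{s+t-2}{s-1}\); the function name is chosen here, and the boundary values R(1,t) = R(s,1) = 1 are an assumption of the sketch (a single point trivially forms a red 1-subset and a blue 1-subset, since it contains no pairs).

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def es_bound(s, t):
    """Upper bound on R(s, t) obtained by iterating inequality (1)."""
    if s == 1 or t == 1:
        return 1        # assumed boundary values R(1, t) = R(s, 1) = 1
    return es_bound(s - 1, t) + es_bound(s, t - 1)


for s, t in [(3, 3), (4, 4), (5, 5), (3, 10)]:
    # Pascal's identity turns the recursion into a binomial coefficient.
    assert es_bound(s, t) == comb(s + t - 2, s - 1)
    print((s, t), es_bound(s, t))
```

For (3,3) the bound 6 is the true value, for (4,4) it gives 20 against the true value 18, and for (5,5) it gives 70 against the bounds 43 ≤ R(5,5) ≤ 49 quoted earlier.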

Considering the lower bounds for R(s,t), the situation is even more disappointing. For some time it was not even clear that an exponential lower bound existed. In his seminal paper (Erdős 1947), Erdős found a simple but beautiful argument which yields

\[ R(t,t) \ge \frac{1}{e\sqrt{2}} \, t \, 2^{t/2} . \]

His proof became a standard textbook example of the power of the probabilistic method. In fact, Erdős showed that

(2) if \(\binom{n}{t} 2^{1-\binom{t}{2}} < 1\) then \(R(t,t) > n\),

from which the above lower bound follows by simple calculation. To see (2), consider the set of all \(2^{\binom{n}{2}}\) two-colorings of \([n]^2\). How many monochromatic t-subsets do we expect on average? Each t-subset of n is a red t-subset for \(2^{\binom{n}{2}-\binom{t}{2}}\) colorings and a blue t-subset for the same number of colorings. Since there are \(\binom{n}{t}\) t-subsets, the average number of monochromatic t-subsets is \(\binom{n}{t} 2^{1-\binom{t}{2}}\). Hence, if this number is strictly smaller than 1, there must exist at least one coloring without a single monochromatic t-subset, i.e., R(t,t) > n.

This bound, too, remained unbeaten for over three decades. In 1977, Spencer showed, using a powerful probabilistic tool known as the “Lovász local lemma”, that

\[ R(t,t) \ge \frac{\sqrt{2}}{e} \, t \, 2^{t/2} , \]

thus gaining a factor of two. This remains the only appreciable gain from more than 50 years of work on this lower bound on R(t,t).

Comparing upper and lower bounds for R(t,t), we see in particular that \(R(t,t)^{1/t}\) lies between \(\sqrt{2}\) and 4, for large t. Until now, it is not even known whether the limit \(\lim_{t\to\infty} R(t,t)^{1/t}\) exists. The real challenge, of course, lies in determining this limit, provided it exists. Already in 1947, Erdős offered $100 for a solution of the former problem and $250 for an answer to the latter one. But both problems seem to be out of reach at present.
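Condition (2) is easy to evaluate exactly for small t. The following Python sketch, with a function name chosen here, finds the largest n for which \(\binom{n}{t} 2^{1-\binom{t}{2}} < 1\) and prints it next to the closed-form expression \(t\,2^{t/2}/(e\sqrt{2})\) for comparison; the comparison uses integers only, so no rounding is involved.

```python
from math import comb, e, sqrt


def erdos_lower_bound(t):
    """Largest n with C(n, t) * 2**(1 - C(t, 2)) < 1, so that R(t, t) > n."""
    n = t - 1   # C(t - 1, t) = 0, so the condition certainly holds here
    # C(n, t) * 2^(1 - C(t, 2)) < 1  is equivalent to  2 * C(n, t) < 2^C(t, 2).
    while 2 * comb(n + 1, t) < 2 ** comb(t, 2):
        n += 1
    return n


for t in (3, 4, 5, 10, 20):
    closed_form = t * 2 ** (t / 2) / (e * sqrt(2))
    print(t, erdos_lower_bound(t), round(closed_form, 1))
```

For small t the resulting bounds are weak (for t = 4 the argument only gives R(4,4) > 6, while R(4,4) = 18); the strength of the method lies in the exponential growth for large t.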

2.3. Bounds for R(3,t)

Of particular interest over the last decades was to determine the asymptotics of R(3,t), i.e., of the least n such that every two-coloring of the pairs of n which does not contain a red triangle must contain a blue t-subset.

Before discussing these asymptotics, we start with a remark concerning notation. Throughout this section, we are just interested in the rate of growth of R(3,t). Let f and g be arbitrary functions from N to N. We write f(n) = O(g(n)) if there are positive integers c and n0 such that for all n ≥ n0, f(n) ≤ c · g(n). Thus, informally f(n) = O(g(n)) means that f does not grow faster than g. We write f(n) = Ω(g(n)) if the opposite happens, that is, if g(n) = O(f(n)). Finally, f(n) = Θ(g(n)) indicates that f(n) = O(g(n)) and f(n) = Ω(g(n)). This is to say that f and g have precisely the same rate of growth.

Using this notation, the Erdős–Szekeres result implies that \(R(3,t) = O(t^{2})\). But in his 1947 paper, containing the famous exponential lower bound on R(t,t), Erdős mentioned that he was not even able to prove the nonlinearity of R(3,t). It was only in 1961 that he succeeded in showing, using a new and subtle probabilistic argument, that \(R(3,t) = \Omega\left(\frac{t^{2}}{\log^{2} t}\right)\), leaving only a gap of a factor of \(\log^{2} t\) between upper and lower bound.

More and more, determining R(3,t) became a cradle of methods and results, exceeding by far the original motivation, namely the challenge of determining R(s,t) in general. In 1980, Ajtai, Komlós and Szemerédi invented a novel method to improve the upper bound by a factor of log t, i.e., showing that \(R(3,t) = O\left(\frac{t^{2}}{\log t}\right)\). This new method proved in later years to be a very powerful tool in combinatorics and became known, in slightly refined form, as the “semirandom method” or the “Rödl nibble”. Later on, Shearer (1983, 1991) reduced the constant hidden in the O(·) and simplified the proof of the upper bound considerably. Much effort was also spent in improving Erdős’s lower bound.
But for several years, these attempts were only partly successful, improving only the constant obtained by Erdős. The decisive step forward was then made by Kim (1995), who improved the Erdős bound by the same factor by which Ajtai, Komlós and Szemerédi had improved the upper bound due to Erdős-Szekeres 15 years earlier. He showed that $R(3,t) = \Omega(t^2/\log t)$, and hence that

$R(3,t) = \Theta\left(\frac{t^2}{\log t}\right).$

For this remarkable result, Kim received the D. Ray Fulkerson Prize in 1997. In the citation for the prize-winning paper one can read that “the paper is a veritable cornucopia of modern techniques in the probabilistic method”. It is worth noting that the main tool in the proof of Kim is the semirandom method mentioned above, developed to a culmination point in his paper.

Figure 3. Some graphs.

2.4. Ramsey Theory for Graphs

Over the past seventy years, several generalizations and ramifications of Ramsey’s theorem have been obtained, providing deep insight into various structures. We will briefly touch upon two of the most significant extensions of Ramsey’s theorem before returning to Graham’s number. The first one, discussed in this section, generalises Ramsey’s theorem to graphs – the most important structure in discrete mathematics.

A graph is a pair $G = (V,E)$ where V denotes a finite set, the set of vertices of G, and $E \subseteq \binom{V}{2}$ is a subset of the pairs of V, the set of edges of G. Note that all graphs throughout this paper are finite. If $E = \binom{V}{2}$, then G is called a complete graph or a clique. A clique on n vertices is denoted by $K_n$. Let $G' = (V(G'),E(G'))$ and $H = (V(H),E(H))$ be graphs. Then $G'$ is an induced subgraph of H if $V(G') \subseteq V(H)$ and $E(G') = E(H) \cap \binom{V(G')}{2}$. Two graphs $G = (V(G),E(G))$ and $G'$ are isomorphic if there is a bijection $\varphi : V(G) \to V(G')$ such that $\{x,y\} \in E(G)$ if and only if $\{\varphi(x),\varphi(y)\} \in E(G')$. We say that H contains (a copy of) G if there is an induced subgraph $G'$ of H which is isomorphic to G. Note that, in Figure 4, $K_4$ is an induced subgraph of N, but $C_5$, the cycle of length 5, is not.

Using the notion of graphs, Ramsey’s theorem may be rephrased as follows: for every pair s,t of positive integers there exists a least positive integer $n = R(s,t)$ such that for every red-blue coloring of the edges of $K_n$ there exists either a red copy of $K_s$ in $K_n$ (i.e., a copy of $K_s$ in $K_n$ such that all its edges are red) or a blue copy of $K_t$ in $K_n$. To abbreviate this statement, we introduce the Ramsey arrow notation

Kn −→ (Ks ,Kt ).
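For very small parameters the arrow relation can be checked exhaustively. The sketch below is only an illustration (the function name `arrows` and the bitmask encoding are our own choices): it enumerates all red-blue colorings of the edges of $K_n$ and looks for a red $K_s$ or a blue $K_t$. It is feasible only for tiny n, since there are $2^{\binom{n}{2}}$ colorings.

```python
from itertools import combinations


def arrows(n: int, s: int, t: int) -> bool:
    """True if every red-blue edge coloring of K_n has a red K_s or a blue K_t."""
    edges = list(combinations(range(n), 2))
    bit = {e: 1 << i for i, e in enumerate(edges)}

    def clique_masks(size: int) -> list:
        # One bitmask per vertex subset, collecting the edges inside it.
        return [
            sum(bit[e] for e in combinations(c, 2))
            for c in combinations(range(n), size)
        ]

    red_masks, blue_masks = clique_masks(s), clique_masks(t)
    # A coloring is an integer: bit set = red edge, bit clear = blue edge.
    for coloring in range(1 << len(edges)):
        has_red = any((coloring & m) == m for m in red_masks)
        has_blue = any((coloring & m) == 0 for m in blue_masks)
        if not (has_red or has_blue):
            return False  # found a coloring avoiding both
    return True


if __name__ == "__main__":
    print(arrows(5, 3, 3), arrows(6, 3, 3))
```

The check confirms that six vertices suffice for a monochromatic triangle while five do not.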

Thus, for example, $K_6 \longrightarrow (K_3,K_3)$.

The first steps towards extending Ramsey’s theorem to graphs in general were made in the middle of the 1960s. In response to a question of Erdős and Hajnal (1967), Graham showed in 1968 that there exists a graph F on 8 vertices which does not contain a $K_6$ but for which, nevertheless, every red-blue coloring of the edges of F yields a monochromatic $K_3$. But what can be said about obtaining monochromatic copies of graphs different from $K_3$ or, more generally, different from cliques? A complete answer to this question was obtained independently by Rödl (1973) in his Master’s thesis, by Deuber (1975), and by Erdős, Hajnal and Pósa (1975). The latter two papers appeared in the proceedings of the conference dedicated to the 60th birthday of Erdős in 1973. The results of all three papers imply that for every pair G,H of graphs there exists a graph F such that

$F \longrightarrow (G,H)$,

in other words, for every red-blue coloring of the edges of F, there exists either a red copy of G or a blue copy of H in F. Let $R(G,H)$ denote the least n such that there exists a graph F on n vertices satisfying $F \longrightarrow (G,H)$. This is the induced Ramsey number of (G,H). It was Erdős who suggested, already in 1975, the study of $R(G,H)$. In 1984, he himself then wrote (with a little change in the notation): “Hajnal and I observed that if G and H have at most t vertices, then

(3) $R(G,H) \le 2^{2^{t^{1-\varepsilon}}}$.

We have never published the not entirely trivial proof of (3) since Hajnal and I thought that perhaps

(4) max R(G,H) = R(t,t).

Conjecture (4) is perhaps a little too optimistic, but we have no counterexample. Perhaps there is a better chance to prove

(5) $R(G,H) \le 2^{ct}$.”

It is still not known whether (5), or maybe even (4), holds, but (3) has been improved considerably, as we shall see below. However, first observe that if the answer to (5) is positive, then the result is best possible (up to the value of the constant c). This follows since

$R(K_t,K_t) = R(t,t) \ge \frac{1}{e\sqrt{2}}\, t\, 2^{t/2}$

by the result of Erdős proved in Section 2.2. On the other hand, (5) is certainly true for cliques. This follows from the Erdős-Szekeres result of Section 2.2, namely

$R(K_t,K_t) = R(t,t) \le \frac{1}{2\sqrt{\pi}} \cdot \frac{4^t}{\sqrt{t}}.$

The presently best general upper bound for graphs G,H on at most t vertices is

$R(G,H) \le 2^{ct(\log t)^2}$

for some absolute constant c, i.e., missing the purely exponential bound in t conjectured in (5) by a factor of $(\log t)^2$ in the exponent. This result is due to Kohayakawa, Prömel and Rödl (1997).

2.5. The Graham-Rothschild Theorem

In their by now classical paper, Graham and Rothschild (1971) introduced the concept of parameter sets or combinatorial cubes. The idea was to find a combinatorial abstraction of linear and affine vector spaces over finite fields. This was motivated by a conjecture of Gian-Carlo Rota (1932–1999), which proposed a geometric analogue to Ramsey’s theorem. In fact, the Ramsey theorem for n-parameter sets directly implies Rota’s conjecture for lower dimensional cases and, as it has turned out, the method used in the proof of this theorem also contains the seeds of the ideas needed to prove Rota’s conjecture in its full strength. This was done by Graham, Leeb and Rothschild (1972).

But the impact of parameter sets goes far beyond the proof of Rota’s conjecture. For example, Ramsey’s theorem itself is an immediate consequence of the Graham-Rothschild theorem. To a certain extent the Graham-Rothschild theorem can be viewed as the first theorem of a new age of Ramsey theory. It unifies seemingly different structures and results, displays the richness of the field of extensions of Ramsey’s theorem and attracted much attention to Ramsey theory – compare, e.g., Prömel and Voigt (1990).

Figure 4. Combinatorial lines in $2^3$.

To state the Graham-Rothschild theorem more precisely, we need some definitions. Let A be a finite set (alphabet). We are concerned with $A^n$, the set of n-tuples over A, and certain subsets of this set, called parameter sets or combinatorial cubes.

Zero-parameter sets are simply singleton elements of $A^n$. A one-parameter set (or combinatorial line) $L \subseteq A^n$ is a set of size |A| such that there exists a nonempty set $I \subseteq n$ of coordinates and for every $i \in n \setminus I$ there exists an element $a_i \in A$ such that

$L = \{(x_0,\dots,x_{n-1}) \mid x_i = x_j \text{ for all } i,j \in I \text{ and } x_i = a_i \in A \text{ for } i \notin I\}.$

Intuitively speaking, the set I consists of the moving coordinates and the coordinates $i \in n \setminus I$ are fixed.

There are $3^3 - 2^3 = 19$ combinatorial lines in $A^3$ when $A = 2 = \{0,1\}$. Some of them are indicated in Figure 5. Every combinatorial line can be represented by a one-parameter word $f \in (A \cup \{\lambda\})^n$ containing the parameter $\lambda$ at least once. L can be obtained from f by replacing $\lambda$ by elements of A. Thus the parameter $\lambda$ indicates the moving coordinates. For example, $L = \{(0,1,0),(1,1,1)\} \subseteq 2^3$ is represented by the one-parameter word $f = (\lambda,1,\lambda)$.

In general, an m-parameter set (or combinatorial m-cube) $M \subseteq A^n$ is given by an m-parameter word $f = (f_0,\dots,f_{n-1})$ where $f_i \in A \cup \{\lambda_0,\dots,\lambda_{m-1}\}$. We require that each parameter $\lambda_i$, $i < m$, occurs at least once in f. Writing $f \cdot (a_0,\dots,a_{m-1})$ for the n-tuple obtained from f by replacing each $\lambda_i$ by $a_i$, the set

$M = \{f \cdot (a_0,\dots,a_{m-1}) \mid (a_0,\dots,a_{m-1}) \in A^m\} \subseteq A^n$

is the combinatorial m-cube represented by f.

We denote by $A\binom{n}{m}$ the set of combinatorial m-cubes in $A^n$ (represented by m-parameter words). Observe that $A\binom{n}{0} = A^n$. Now suppose we identify A with the Galois field GF(q), for some prime power q. Then every $f \in GF(q)\binom{n}{m}$ represents an m-dimensional affine subspace of the n-dimensional vector space over GF(q). Note that in general, there exist additional m-dimensional affine subspaces, except in the case m = 0, where we have a bijective correspondence.

Graham and Rothschild (1971) proved that for every finite set A and for all positive integers k, s, and t there exists a positive integer n such that

$A\binom{n}{k} \longrightarrow \left(A\binom{s}{k}, A\binom{t}{k}\right),$

in other words, for every red-blue coloring of the combinatorial k-cubes in $A^n$ there exists either a red s-subcube or a blue t-subcube.

Why is this result an extension of Ramsey’s theorem? To see this, let us deduce Ramsey’s theorem from the Graham-Rothschild theorem. Let $A = 1 = \{0\}$, $k = 2$, choose s,t arbitrarily and let n be according to the Graham-Rothschild theorem. Consider the mapping $C : 1\binom{n}{2} \to [n]^2$ defined by $C(f) = \{i,j\}$ where i is the position of the first occurrence of $\lambda_0$ in f and j that of $\lambda_1$. Define $C_s : 1\binom{n}{s} \to [n]^s$ and $C_t : 1\binom{n}{t} \to [n]^t$ analogously, where $[n]^s$ denotes the s-element subsets of n and $[n]^t$ the t-element subsets. Clearly, $C$, $C_s$, and $C_t$ are surjective. Let D be a red-blue coloring of the pairs in n. Then define a red-blue coloring $D_C$ of the combinatorial two-subcubes of $1^n$ by $D_C(f) = D(C(f))$. By choice of n, there exists either a red s-subcube, r say, or a blue t-subcube in $1^n$. Without loss of generality, assume the former holds. Then $C_s(r)$ is an s-subset of n which is red with respect to D, completing the proof of Ramsey’s theorem.
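The representation of combinatorial cubes by parameter words can be made concrete in a few lines of Python. This is a sketch under our own encoding assumptions: a parameter $\lambda_i$ is encoded as the tuple `("λ", i)`, and the helper names `substitute`, `cube` and `one_parameter_words` are hypothetical, not taken from the literature.

```python
from itertools import product


def substitute(word, values):
    """Replace every parameter ("λ", i) in word by values[i]."""
    return tuple(values[e[1]] if isinstance(e, tuple) else e for e in word)


def cube(word, alphabet, m):
    """The combinatorial m-cube represented by an m-parameter word."""
    return {substitute(word, a) for a in product(alphabet, repeat=m)}


def one_parameter_words(alphabet, n):
    """All words over the alphabet plus one parameter that contain the parameter."""
    lam = ("λ", 0)
    letters = list(alphabet) + [lam]
    return [w for w in product(letters, repeat=n) if lam in w]


if __name__ == "__main__":
    lam0, lam1 = ("λ", 0), ("λ", 1)
    # The line represented by the word (λ, 1, λ) over the alphabet {0, 1}.
    print(cube((lam0, 1, lam0), (0, 1), 1))
    # A 2-cube in {0,1}^3 with pattern (λ0, λ1, λ0).
    print(cube((lam0, lam1, lam0), (0, 1), 2))
    # Number of one-parameter words of length 3 over {0, 1}.
    print(len(one_parameter_words((0, 1), 3)))
```

Distinct one-parameter words yield distinct lines whenever the alphabet has at least two letters, so counting words also counts lines.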
The Graham-Rothschild theorem for parameter sets was a first, but major, step in answering Rota’s conjecture in the affirmative. Only a short time later, Graham, Leeb and Rothschild settled this conjecture completely in their Pólya prize winning paper (Graham, Leeb and Rothschild 1972). They proved that for every finite field GF(q) and for all positive integers k, s and t there exists a positive integer n such that for every red-blue coloring of the k-dimensional affine subspaces of the n-dimensional vector space over GF(q), there exists either a red s-dimensional subspace or a blue t-dimensional subspace.

In order to come closer to this result and to be able to describe more affine subspaces of some vector space combinatorially, Graham and Rothschild first extended the concept of combinatorial cubes. For simplicity of presentation, we will restrict our considerations to the alphabet A = 2, having the field GF(2) in mind.

Let a parameter word be given. The difference now is that additional labels are added to each occurrence of a parameter. These labels are elements of the (additive) group $(Z_2, +)$. We require that the first occurrence of every parameter is labeled by the unit element 0; all other occurrences may be labeled arbitrarily by 0 or 1. An example of a labeled 1-parameter word in $2^3$ is $F = ((\lambda,0), 0, (\lambda,1))$. If f is a labeled m-parameter word in $2^n$ and g is a labeled k-parameter word in $2^m$, then the composition $f \cdot g$ is a labeled k-parameter word in $2^n$ which results from replacing the parameter $(\lambda_i,\alpha)$ in f by $\alpha + g_i \in 2$ if $g_i \in 2$, or by $(\lambda_j,\alpha+\beta)$ if $g_i = (\lambda_j,\beta)$. Hence, the labeled 1-parameter word F represents the generalized combinatorial line $L = \{(0,0,1),(1,0,0)\}$, which is not a combinatorial line in the sense described earlier. Note that there is a bijection between generalized combinatorial lines and 1-dimensional affine subspaces of the n-dimensional vector space over GF(2). In particular, any two elements of $2^n$ are joined by a generalized combinatorial line.

Extending their result on combinatorial cubes to generalized combinatorial cubes, Graham and Rothschild proved that for all positive integers s and t there exists a least positive integer $n = GR(s,t)$ such that for every red-blue coloring of the generalized combinatorial lines in $2^n$ there exists either a red generalized s-subcube or a blue generalized t-subcube. This already implies the case k = 1 of the partition theorem for affine spaces mentioned above.
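The labeled words above are easy to experiment with. In the following Python sketch (our own encoding, for illustration only: a labeled occurrence $(\lambda,\alpha)$ is the tuple `("λ", alpha)`, and the function names are hypothetical), a labeled 1-parameter word over GF(2) is evaluated, and a labeled word is constructed for the generalized line through any two distinct points.

```python
from itertools import product


def evaluate(word, a):
    """Substitute a in {0,1} for λ; a labeled occurrence (λ, α) becomes α + a mod 2."""
    return tuple((e[1] + a) % 2 if isinstance(e, tuple) else e for e in word)


def generalized_line(word):
    """The two points of the generalized combinatorial line represented by word."""
    return {evaluate(word, a) for a in (0, 1)}


def line_through(x, y):
    """A labeled 1-parameter word whose generalized line is {x, y}, for x != y."""
    differing = [i for i in range(len(x)) if x[i] != y[i]]
    if not differing:
        raise ValueError("points must be distinct")
    first = differing[0]
    # Moving coordinates get the label x[i] + x[first] mod 2, so the first
    # occurrence of λ carries the label 0, as required.
    return tuple(
        ("λ", x[i] ^ x[first]) if x[i] != y[i] else x[i] for i in range(len(x))
    )


if __name__ == "__main__":
    F = (("λ", 0), 0, ("λ", 1))
    print(generalized_line(F))
    print(line_through((0, 0, 1), (1, 0, 0)))
```

Running `line_through` over all pairs of distinct points of $2^n$ for small n gives a direct check of the remark that any two elements are joined by a generalized combinatorial line.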

3. KNUTH’S ARROW NOTATION AND GRAHAM’S NUMBER

“A hypercube has $2^n$ corners. ... What is the smallest dimension of a hypercube such that if the lines joining all pairs of corners are two-colored, a planar $K_4$ of one color will be forced? Ramsey’s theorem guarantees that the question has an answer only if the forced $K_4$ is not confined to a plane. The existence of an answer when the forced monochromatic $K_4$ is planar was first proved by Graham and Bruce L. Rothschild in a far-reaching generalization of Ramsey’s theorem that they found in 1970. Finding the actual number, however, is something else. In an unpublished proof Graham has recently established an upper bound, but it is a bound so vast that it holds the record for the largest number ever used in a serious mathematical proof” (Gardner 1977).

In the terminology of the previous section, this can be expressed as saying that Graham established an upper bound for GR(2, 2). Gardner continues: “To convey at least a vague notion of the size of Graham’s number we must first attempt to explain Knuth’s arrow notation.”

In 1976, Knuth introduced the following notation to express large numbers:

$xn$ abbreviates $x + x + \dots + x = nx$,
$x \uparrow n$ abbreviates $x \cdot x \cdots x = x^n$,
$x \uparrow\uparrow n$ abbreviates $x \uparrow x \uparrow \dots \uparrow x = x^{x^{\cdot^{\cdot^{x}}}}$,
$\vdots$
$x \uparrow^{k} n$ abbreviates $x \uparrow^{k-1} x \uparrow^{k-1} \dots \uparrow^{k-1} x$,

where $\uparrow^{k}$ stands for k arrows, there are n occurrences of x in each line, and the expressions are evaluated from the right. Obviously, these numbers grow very rapidly. For example, $3 \uparrow 3 = 27$, $3 \uparrow\uparrow 3 = 3 \uparrow 3 \uparrow 3 = 3^{3^3} = 7\,625\,597\,484\,987$, and $3 \uparrow\uparrow\uparrow 3 = 3 \uparrow\uparrow 3 \uparrow\uparrow 3 = 3 \uparrow\uparrow 3^{3^3} = 3^{3^{\cdot^{\cdot^{3}}}}$, where the number of threes is $3^{3^3}$. The reader is invited to imagine the size of $3 \uparrow\uparrow\uparrow\uparrow 3$. Following Gardner again, Graham’s number now can be expressed as
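The recursive structure of the arrow notation translates directly into code. The following Python sketch is an illustration (the function name `arrow` and its argument order are our own choices); it is usable only for very small arguments, since the values quickly exceed anything that can be stored.

```python
def arrow(x: int, k: int, n: int) -> int:
    """x followed by k arrows and n, for k >= 1 and n >= 1, evaluated from the right."""
    if k == 1:
        return x ** n
    # x ↑^k n is x ↑^(k-1) x ↑^(k-1) ... ↑^(k-1) x with n occurrences of x:
    # start with the rightmost x and apply x ↑^(k-1) (.) another n - 1 times.
    result = x
    for _ in range(n - 1):
        result = arrow(x, k - 1, result)
    return result


if __name__ == "__main__":
    print(arrow(3, 1, 3))  # 3 ↑ 3
    print(arrow(3, 2, 3))  # 3 ↑↑ 3
    print(arrow(2, 3, 3))  # 2 ↑↑↑ 3
```

Already `arrow(3, 3, 3)` is out of reach: it would require a tower of $3^{3^3}$ threes.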

3 ↑↑ ... ↑↑ 3
where the number of arrows is
3 ↑↑ ... ↑↑ 3
where the number of arrows is
3 ↑↑ ... ↑↑ 3
where the number of arrows is
et cetera, up to 60 lines later
3 ↑↑ ... ↑↑ 3
where the number of arrows is
3 ↑↑↑↑ 3

Surely this is a large number and surely beyond human comprehension, but, of course, very small as finite numbers go. Graham’s upper bound now says that

GR(2, 2) ≤ Graham’s number.

The alert reader will have observed that Knuth’s arrow notation is very similar to functions defined about 50 years earlier by Ackermann in 1928.

Let

$F_0(n) := n + 1$
$F_1(n) := F_0^{n+1}(n) = 2n + 1$
$F_2(n) := F_1^{n+1}(n) \ge 2^{n+1}$
$F_3(n) := F_2^{n+1}(n) \ge 2^{2^{\cdot^{\cdot^{\cdot^{2^{n+1}}}}}}$, a stack of $n+1$ twos
$\vdots$
$F_{k+1}(n) := F_k^{n+1}(n)$,

where the $n + 1$ in the exponent denotes the $(n + 1)$-fold iteration.

This hierarchy is called the Grzegorczyk hierarchy of primitive recursive functions. $F_2$ is an exponential function, $F_3$ is called the stack-of-twos function, $F_4$ is an iteration of the stack-of-twos function and so forth. Spencer suggests calling $F_4$ the wow-function, as its growth rate is already beyond imagination. However, all these functions are still primitive recursive. Moreover, for every primitive recursive function f there exist k and $n_0$ such that $f(n) \le F_k(n)$ for all $n \ge n_0$, i.e., the function $F_k$ eventually dominates f. Thus, using diagonalisation, we may define a function $F_\omega(n) := F_n(n)$ for every $n < \omega$ which is recursive but no longer primitive recursive. $F_\omega$ is called an Ackermann-type function (though Ackermann’s original definition is somewhat different). A rough estimate now shows that

Graham’s number is an upper bound and there is, of course, the possibility this number will be deflated in the future. In fact, it is believed that GR(2, 2) = 6 – but if true it would be a really large 6.
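The hierarchy $F_k$ defined above and its diagonal $F_\omega$ can be written down directly. The following Python sketch is for illustration only (the names `F` and `F_omega` are ours); it computes values by literal iteration and is therefore feasible only for the smallest arguments.

```python
def F(k: int, n: int) -> int:
    """F_0(n) = n + 1; F_{k+1}(n) is the (n + 1)-fold iteration of F_k applied to n."""
    if k == 0:
        return n + 1
    value = n
    for _ in range(n + 1):
        value = F(k - 1, value)
    return value


def F_omega(n: int) -> int:
    """Diagonal function F_ω(n) = F_n(n)."""
    return F(n, n)


if __name__ == "__main__":
    print([F(1, n) for n in range(5)])  # values of 2n + 1
    print([F(2, n) for n in range(5)])
    print(F(3, 1), F_omega(2))
    # F_omega(3) = F(3, 3) is already far too large to compute this way.
```

The contrast between `F_omega(2)` and the infeasible `F_omega(3)` gives some feeling for how quickly the diagonal function outgrows each fixed level of the hierarchy.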

NOTES

1 This figure is taken from Ifrah (1989).

REFERENCES

Das Guinness Buch der Rekorde 1998: 1997, Hamburg.
Alon, N. and J. Spencer: 1992, The Probabilistic Method; With an Appendix on Open Problems by Paul Erdős, Wiley-Interscience Series in Discrete Mathematics and Optimization, New York.
Boyer, C. B.: 1989, A History of Mathematics; With a Foreword by Isaac Asimov, revised and with a preface by Uta C. Merzbach, New York.
Dedron, P. and J. Itard: 1959, Mathématiques et mathématiciens, Collection sciences et savantes, Paris.
Erdős, P.: 1947, ‘Some Remarks on the Theory of Graphs’, Bulletin of the American Mathematical Society 53, 292–294.
Gardner, M.: 1977, ‘Mathematical Games’, Scientific American, pp. 18–28.
Graham, R. L., K. Leeb and B. L. Rothschild: 1972, ‘Ramsey’s Theorem for a Class of Categories’, Advances in Mathematics 8, 417–433.
Graham, R. L. and V. Rödl: 1987, ‘Numbers in Ramsey Theory’, in C. Whitehead (ed.), Surveys in Combinatorics 1987, Invited papers for the Eleventh British Combinatorial Conference (held at the University of London Goldsmith’s College, July 13–17, 1987), Cambridge, London Mathematical Society Lecture Note Series 123, pp. 111–153.
Graham, R. L. and B. L. Rothschild: 1971, ‘Ramsey’s Theorem for n-parameter Sets’, Transactions of the American Mathematical Society 159, 257–292.
Graham, R. L., B. L. Rothschild and J. Spencer: 1990, Ramsey Theory, Wiley-Interscience Series in Discrete Mathematics and Optimization, New York.
Heath, T. L. (ed.): 1953, The Works of Archimedes; The “Method” of Archimedes, New York.
Ifrah, G.: 1989, Universalgeschichte der Zahlen, Übersetzung aus dem Französischen von Alexander von Platen, Sonderausgabe, Mit 797 Abbildungen, Tabellen und Zeichnungen des Autors, Frankfurt.
Kim, J. H.: 1995, ‘The Ramsey Number R(3,t) has Order of Magnitude t²/log t’, Random Structures and Algorithms 7, 173–208.
Knuth, D. E.: 1976, ‘Mathematics and Computer Science: Coping with Finiteness’, Science 194, 1235–1242.
Kohayakawa, Y., H. J. Prömel and V. Rödl: 1997, ‘Induced Ramsey Numbers’, Combinatorica 18, 373–404.
Nešetřil, J.: 1995, ‘Ramsey Theory’, in R. Graham, M. Grötschel and L. Lovász (eds.), Handbook of Combinatorics, Amsterdam, pp. 1331–1404.
Prömel, H. J. and B. Voigt: 1990, ‘Graham-Rothschild Parameter Sets’, in J. Nešetřil and V. Rödl (eds.), Mathematics of Ramsey Theory, Collected papers of the Prague symposium on graph theory held in Prague, Czechoslovakia, Algorithms and Combinatorics 5, Berlin, pp. 113–149.
Ramsey, F. P.: 1930, ‘On a Problem of Formal Logic’, Proceedings of the London Mathematical Society 30, 264–286.
Wussing, H.: 1965, Mathematik in der Antike, Mathematik in der Periode der Sklavenhaltergesellschaft, Leipzig.

Hans Jürgen Prömel
Humboldt-Universität zu Berlin
Unter den Linden 6
10099 Berlin
Germany

RALPH MATTHES

TARSKI’S FIXED-POINT THEOREM AND LAMBDA CALCULI WITH MONOTONE INDUCTIVE TYPES

ABSTRACT. The new concept of lambda calculi with monotone inductive types is introduced, motivated by Tarski’s fixed-point theorem (in preorder theory) and by initial algebras and initial recursive algebras from category theory. These calculi are intended to serve as formalisms for studying iteration and primitive recursion on general inductively given structures. Special emphasis is placed on the behaviour of the rewrite rules motivated by the categorical approach, most notably on the question of strong normalization (i.e., the impossibility of an infinite sequence of successive rewrite steps). It is shown that this key property hinges on the concrete formulation. The canonical system of monotone inductive types, where monotonicity is expressed by a monotonicity witness, i.e., a term expressing monotonicity through its type, enjoys strong normalization. This is shown by an embedding into the traditional system of non-interleaving positive inductive types which, however, has to be enriched by the parametric polymorphism of system F. Restrictions to iteration on monotone inductive types already embed into system F alone, hence clearly displaying the difference between iteration and primitive recursion with respect to algorithms despite the fact that, classically, recursion is only a concept derived from iteration.

1. INTRODUCTION AND OVERVIEW

The aim is to generalize the notion of function defined by iteration or primitive recursion on N in the direction of more complex inductively generated structures than the naturals. As a secondary aim, the difference between iteration and primitive recursion shall be elucidated.

Our generalization draws its motivation from Tarski’s theorem, which guarantees for any given monotone function on a complete lattice an associated structure that allows reasoning by (generalized) induction. However, Tarski’s result has no direct connection to the notion of algorithm (as is required for studying iteration and primitive recursion). As a constructive refinement on the way to a syntactical analysis of algorithms, we review initial algebras and initial recursive algebras with respect to some functor on a category. To that aim, a minimal background on category theory is provided. Those semantical structures may be more easily studied by formalisms having only the bare bones of the semantical structures. The formalisms at hand are lambda calculi with inductive types. Again, a minimal background on lambda calculi will be given.

Synthese 133: 107–129, 2002. © 2002 Kluwer Academic Publishers. Printed in the Netherlands.

The first inductive types come by examples which all stick to the traditional paradigm of positivity: They are structures where the monotonicity of the induced operator on the complete lattice (à la Tarski) may be simply read off the shape of the inductive type (which hence is a trivial syntactic check). However, it is also possible to introduce systems having inductive types without that severe syntactic restriction. In order to formulate them, one has to introduce parametric polymorphism, i.e., universally quantified types. With their help, monotonicity of the type ρ with respect to the type variable α may be expressed as the type

∀α∀β.(α → β) → ρ → ρ[α := β].

If one writes ρ(α) for ρ and ρ(β) for ρ[α := β], this may be read as saying that for any α and β, whenever α is “contained” in β, then ρ(α) is “contained” in ρ(β).¹ The known systems of inductive types had syntactic restrictions on ρ and α which always guaranteed that closed terms of type ∀α∀β.(α → β) → ρ → ρ[α := β] existed whenever ρ and α were used to introduce an inductive type. The new system of monotone inductive types allows any ρ and α to give rise to an inductive type. However, the behaviour of iteration and primitive recursion is controlled by arbitrary terms of type

∀α∀β.(α → β) → ρ → ρ[α := β]

which need not exist beforehand but may already use iteration and primitive recursion without restriction. Moreover, those monotonicity witnesses need not be closed.

A priori, this approach gives much more freedom, and really seems to be the most far-reaching generalization of lambda calculus (without dependent types) possible in the spirit of Tarski’s theorem. However, if we restrict the system of monotone inductive types to allow iteration only, we can embed it into system F (which is the basic system of impredicative polymorphism without any inductive types). Therefore, it is crucial to have primitive recursion in the system. Nevertheless, the full system also embeds into the system of non-interleaving positive inductive types, showing that all the subtleties which were added to that relatively simple system do not even add algorithmic power. Note that the system of positive inductive types had to be extended by the capabilities of system F in order to make the embedding possible. Hence, it is already fully impredicative and therefore still inaccessible to proof-theoretic analysis. The definition and analysis of monotone inductive types without system F is still an area full of open questions.

In Section 2 we state Tarski’s theorem, give an example with natural numbers and introduce “extended induction”, which will later be the motivation for primitive recursion. Section 3 defines categories, functors and initial algebras for functors. Then a lambda calculus is developed to cover those notions syntactically. We leave open the problem of how to model the morphism part of the functors. In Section 4 examples are given which show how rich even non-interleaving positive inductive types are, and even more so those with interleaving. System F is also defined and a first solution is given to the mentioned problem of finding the action of types on morphisms. In Section 5 primitive recursion as opposed to iteration is introduced. Section 6 proceeds with original material from Matthes (1998) spelt out in a more transparent manner. First the reference system of monotone inductive types is given; then variants are dealt with, showing that reasonable formulations may lead to non-termination due to a lack of uniformity in the monotonicity witness; finally it is explained how an easy interpolation argument leads to the embedding of the term rewrite system of monotone inductive types into the (a priori much smaller) system of non-interleaving positive inductive types. In Section 7 an attempt is made to relate all this to other research.

In principle, the presentation is self-contained: Categories, functors, initial algebras, lambda calculus and inductive types are defined. Since most of the structures are intended as a motivation for the definitions in Section 6, no results are proved. In Section 6, too, we do not aim at proving results in full detail. This has all been done in my Ph.D. thesis (Matthes 1998) which, however, never tried to be readable to readers outside the field.

2. TARSKI’S FIXED POINT THEOREM

We formulate only the part of Tarski’s theorem (Tarski 1955) which will be used later.

THEOREM 1 (Tarski 1939). Let $(U, \subseteq, \wedge, \vee)$ be a complete lattice of sets (subsets of some fixed set) with $\subseteq$ being set inclusion, and let $\Phi : U \to U$ be monotone. Then $\mu\Phi := \bigwedge \{S \mid \Phi(S) \subseteq S\}$ (the infimum of the pre-fixed-points) is a pre-fixed-point of $\Phi$, i.e., $\Phi(\mu\Phi) \subseteq \mu\Phi$.

The set $\mu\Phi$ is called the set which is inductively generated by $\Phi$.

Note that we do not need U to be formed of sets with set inclusion as partial order, and that $\mu\Phi$ is even a fixed point of $\Phi$. As an example consider $U := P(R)$ with set-theoretic infimum and supremum, and with

$\Phi(S) := \{0\} \cup \{r + 1 \mid r \in S\}.$

Obviously, $\Phi$ is monotone. Set $N := \mu\Phi$. By the theorem, $0 \in N$ and for all $r \in N$, also $r + 1 \in N$. We now want to show that $r \in N \Rightarrow r^2 \in N$. We try to use the fact that N is the infimum of all pre-fixed-points of $\Phi$. Set $S := \{r \mid r^2 \in N\}$. If we manage to show that $\Phi(S) \subseteq S$, then we may infer $N \subseteq S$. Clearly, $0 \in S$. Now let $r \in S$. We have to show that $r + 1 \in S$, i.e., $(r + 1)^2 = r^2 + r + r + 1 \in N$. We may assume that we proved beforehand that addition does not lead out of N. We already know that $r^2 \in N$. But we do not know whether $r \in N$!

As a remedy to this failure of proof consider $S' := N \cap S$ and show $\Phi(S') \subseteq S$, because then, due to Tarski’s theorem and the monotonicity of $\Phi$, we also get $\Phi(S') \subseteq S'$ and therefore $N \subseteq S'$, which in turn implies $N \subseteq S$. It is clear that $\Phi(S') \subseteq S$ holds because we now have $r \in N$ in the crucial proof step.

It should be remarked that mathematicians usually take the second approach when showing a statement by induction. They do it intuitively and often even when the first approach would be more to the point – a fact hidden for mathematical induction because the natural numbers are commonly seen as given in total and not cut out of some larger universe of objects. It will become important for the aim of this work to make a difference.

Recall the following three properties of $\mu\Phi$:

(µ-I) $\Phi(\mu\Phi) \subseteq \mu\Phi$.
(µ-E) $\Phi(S) \subseteq S \Rightarrow \mu\Phi \subseteq S$.
(µ-E+) $\Phi(\mu\Phi \wedge S) \subseteq S \Rightarrow \mu\Phi \subseteq S$.

The last rule shall be called “extended induction” and may easily be inferred from (µ-I), (µ-E) and the monotonicity of $\Phi$, as in the example of natural numbers. (µ-E) is the rule of induction.
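On a finite lattice the infimum of the pre-fixed-points can be computed by brute force. The Python sketch below is an illustration under explicit assumptions: the universe is cut down to the six half-integers from 0 to 5/2 (so the operator is a truncated version of the example above), and the comparison with iteration from the empty set is an additional check that works for finite lattices; it is not part of the theorem as stated. All names are ours.

```python
from fractions import Fraction
from itertools import combinations

# Assumption: a small finite universe replacing the real numbers.
UNIVERSE = frozenset(Fraction(k, 2) for k in range(6))  # 0, 1/2, 1, ..., 5/2


def phi(s):
    """Truncated operator: {0} together with all r + 1 (r in s) still in the universe."""
    return frozenset({Fraction(0)} | {r + 1 for r in s if r + 1 in UNIVERSE})


def subsets(universe):
    items = list(universe)
    for size in range(len(items) + 1):
        for chosen in combinations(items, size):
            yield frozenset(chosen)


def mu_by_intersection():
    """Infimum (intersection) of all pre-fixed-points S, i.e. phi(S) ⊆ S."""
    result = UNIVERSE
    for s in subsets(UNIVERSE):
        if phi(s) <= s:
            result &= s
    return result


def mu_by_iteration():
    """Iterate phi from the empty set until nothing changes (finite case only)."""
    s = frozenset()
    while phi(s) != s:
        s = phi(s)
    return s


if __name__ == "__main__":
    mu = mu_by_intersection()
    print(sorted(mu))
    print(mu == mu_by_iteration(), phi(mu) == mu)
```

The computed set consists of the integers 0, 1, 2 of the truncated universe: the half-integers are excluded because some pre-fixed-point omits them.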

3. INITIAL ALGEBRAS AND ITERATION

In the previous section we argued strictly classically. Let us now look more closely at our relation ⊆. We want to add information why ⊆ holds by adding witnesses of ⊆. This may be seen as the generalization of the theory of preorders to category theory (in the sense of Eilenberg and Mac Lane (1942)²): We do not only have a morphism from some object A to an object B or do not have it, but consider in general a class of morphisms from A to B. Instead of least pre-fixed-points of some monotone “operator”, initial algebras w.r.t. some functor are studied.

A category consists of
• objects
• morphisms from some object to some object
• composition of morphisms
having several properties:
• The class of morphisms from an object A to A itself always contains a morphism called the identity on A.
• Composition respects types, i.e., one may compose morphisms iff the source object of the first morphism g is the same as the target object of the second morphism f, and the result g ◦ f is a morphism from the source object of f to the target object of g.
• Composition is associative and the identities are neutral elements with respect to composition (as long as composition with them is defined).
In case there is at most one morphism from any object to any object, this gives precisely the preorders (= reflexive and transitive binary relations) on the objects (the identities correspond to reflexivity, composability to transitivity).

A functor is
• a mapping from some category to a category,
• mapping objects to objects and morphisms to morphisms
and fulfilling several compatibility properties:
• A morphism is mapped to a morphism from the mapped source object to the mapped target object.
• Identities are mapped to identities.
• The composition of morphisms is mapped to the composition of the mapped morphisms.
In case of preorders this concept reduces to the monotone mappings (reflecting only the first compatibility property).
Instead of least pre-fixed-points of monotone operators we now study initial algebras of endofunctors. Let F be an endofunctor on some category. An initial F-algebra is a pair (M, c) with M an object of the category and c a morphism from F(M) to M such that for any other F-algebra, i.e., any pair (S, s) with S an object and s a morphism from F(S) to S, there is a unique morphism $E_s$ from M to S such that $E_s \circ c = s \circ F(E_s)$, where the morphism part of F is applied to $E_s$. The last equation may be depicted via the commuting diagram in Figure 1. Note that this equation trivially holds in case of a preorder.

[Commuting square: $c : F(M) \to M$ (top), $F(E_s) : F(M) \to F(S)$ (left), $E_s : M \to S$ (right), $s : F(S) \to S$ (bottom).]

Figure 1. Initial F-Algebra
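A familiar instance may help to read the diagram. The Python sketch below is our own illustration and rests on a standard assumption not spelled out in the text: in the category of sets, for the functor F(X) = 1 + X, the natural numbers with zero and successor form an initial F-algebra, and $E_s$ is iteration of s. Elements of F(X) are encoded as the tagged tuples `("zero",)` and `("succ", x)`; all names are hypothetical.

```python
def fmap(h, fx):
    """Morphism part of F(X) = 1 + X: apply h underneath the 'succ' tag."""
    return fx if fx[0] == "zero" else ("succ", h(fx[1]))


def c(fn):
    """Structure map c : F(N) -> N of the algebra of natural numbers."""
    return 0 if fn[0] == "zero" else fn[1] + 1


def iterate(s):
    """E_s : N -> S for an algebra s : F(S) -> S, computed by iterating s."""
    def e(n):
        value = s(("zero",))
        for _ in range(n):
            value = s(("succ", value))
        return value
    return e


def doubling(fx):
    """An example algebra on the integers: 'zero' goes to 1, 'succ' doubles."""
    return 1 if fx[0] == "zero" else 2 * fx[1]


if __name__ == "__main__":
    power_of_two = iterate(doubling)
    print([power_of_two(n) for n in range(6)])
    # Check E_s ∘ c = s ∘ F(E_s) on a few elements of F(N).
    samples = [("zero",)] + [("succ", n) for n in range(5)]
    print(all(power_of_two(c(x)) == doubling(fmap(power_of_two, x)) for x in samples))
```

Taking s to be c itself gives the identity on the natural numbers, as the uniqueness requirement demands.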

Earlier we had $\Phi(\mu\Phi) \subseteq \mu\Phi$; now we have a morphism from F(M) to M, hence c is the witness for the validity of (µ-I). Earlier we had to prove $\Phi(S) \subseteq S$ in order to conclude $\mu\Phi \subseteq S$; now we have to provide a morphism s from F(S) to S in order to get a morphism from M to S: $E_s$.

Our aim is the syntactic study of initial algebras via systems of terms with rewrite rules. We have to model the categorical notions such as morphism and composition. We use the lambda calculus as basic formalism for dealing with function(al)s. Instead of objects we simply have types, which are nothing but strings of symbols built up from symbols for variables with the help of type formers – most notably the binary →, which allows one to form the type ρ → σ from types ρ and σ. The type ρ → σ should be thought of as the type of functions from ρ to σ, i.e., of functionals whose arguments have type ρ and whose results have type σ. “Functionals” is the preferred name for those intuitive objects because the arguments may already be functions (and even functionals). Instead of morphisms from A to B we study terms of type ρ → σ, where terms of any type ρ should be thought of as denoting functionals of type ρ. Terms are again only syntactic objects. We have

• typed variables, denoted like $x^\rho$,
• lambda abstraction $\lambda x^\rho r^\sigma : \rho \to \sigma$ (if r already has type σ; after a colon or as a superscript we indicate the type of the term), modelling the functional $x \mapsto r(x)$ in general mathematical language (r(x) is r but with the dependency on x indicated, hence it is only metasyntactically blown up); in other words: lambda abstraction is functional abstraction,
• and application $r^{\rho \to \sigma} s^\rho : \sigma$, modelling application of a function taking arguments of type ρ to some argument of that type. (In general mathematics one would perhaps prefer to write r(s).) The identity of the argument type of the functional r and the type of the argument s is the restriction imposed on term formation by the type system.

At present there is no life in the system. Life is added by term rewriting: Clearly, if one applies x ↦ r(x) to s, the result is r(s). In the term system this is modelled by the rule of beta reduction,³ which allows one to replace (λx r)s by r[x := s], which is the term r after replacing every free occurrence of the variable x in r by s. We not only allow replacing the whole term in this fashion (which will be indicated by the binary relation ▷, hence being defined by (λx r)s ▷ r[x := s]) but may also do the replacement in some subterm of the term under consideration, and arrive at the binary one-step relation → on terms. It is well-known that for simply-typed lambda calculus (only type variables and →) those replacements cannot go on forever, independent of the strategy of replacement in case of choice. This fact is known as strong normalization of simply-typed lambda calculus.⁴ Together with confluence (which means that if one has a choice in replacement then both possible outcomes of replacement can be joined again by some further replacements) normalization implies that the least equivalence relation making a term and its replacement equivalent is decidable. This equivalence is called beta equality.⁵

Let us model initial F-algebras syntactically (Geuvers 1992). Fix a type ρ and some variable α which might occur in ρ. The mapping α → ρ shall take the position of the functor F. (Its action on types/objects is given by substitution, its action on terms of function type/morphisms will be discussed in the next section.) Instead of the object M we include the inductive type µαρ in our type system, the interpretation of which is the least pre-fixed-point of α → ρ. The situation is depicted in Figure 2.
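The beta rule and the one-step relation on subterms can be tried out mechanically. What follows is a minimal sketch only: it assumes untyped terms in de Bruijn notation (so no type checking is performed), a leftmost-outermost strategy, and a step bound called `fuel`; all function names are hypothetical and not taken from the article.

```python
# Terms in de Bruijn notation: ("var", n) | ("lam", body) | ("app", f, a)

def shift(t, d, cutoff=0):
    """Add d to every variable index >= cutoff (the free variables of t)."""
    if t[0] == "var":
        return ("var", t[1] + d) if t[1] >= cutoff else t
    if t[0] == "lam":
        return ("lam", shift(t[1], d, cutoff + 1))
    return ("app", shift(t[1], d, cutoff), shift(t[2], d, cutoff))

def subst(t, j, s):
    """Substitution t[j := s] for the variable with index j."""
    if t[0] == "var":
        return s if t[1] == j else t
    if t[0] == "lam":
        return ("lam", subst(t[1], j + 1, shift(s, 1)))
    return ("app", subst(t[1], j, s), subst(t[2], j, s))

def contract(body, arg):
    """The root rule: (lambda. body) arg  |>  body[0 := arg]."""
    return shift(subst(body, 0, shift(arg, 1)), -1)

def step(t):
    """One rewrite step at the leftmost-outermost redex, or None if t is normal."""
    if t[0] == "app":
        f, a = t[1], t[2]
        if f[0] == "lam":
            return contract(f[1], a)
        r = step(f)
        if r is not None:
            return ("app", r, a)
        r = step(a)
        return None if r is None else ("app", f, r)
    if t[0] == "lam":
        r = step(t[1])
        return None if r is None else ("lam", r)
    return None

def normalize(t, fuel=1000):
    """Rewrite until no redex is left; give up after `fuel` steps."""
    steps = 0
    while steps < fuel:
        r = step(t)
        if r is None:
            return t, steps
        t, steps = r, steps + 1
    raise RuntimeError("no normal form found within the step bound")

I = ("lam", ("var", 0))                        # lambda x. x
K = ("lam", ("lam", ("var", 1)))               # lambda x. lambda y. x
DELTA = ("lam", ("app", ("var", 0), ("var", 0)))
OMEGA = ("app", DELTA, DELTA)                  # (lambda x. xx)(lambda x. xx)
```

The typable term `K I I` reaches its normal form `I` after two steps, whereas the untypable `OMEGA` rewrites to itself, which is why the step bound is there.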

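\[
\begin{array}{ccc}
\rho[\alpha := \mu\alpha\rho] & \xrightarrow{\;C_{\mu\alpha\rho}\,.\;} & \mu\alpha\rho \\
{\scriptstyle (\alpha \to \rho)(\lambda x^{\mu\alpha\rho}.\, x E_\mu s)}\big\downarrow & & \big\downarrow{\scriptstyle \lambda x^{\mu\alpha\rho}.\, x E_\mu s} \\
\rho[\alpha := \sigma] & \xrightarrow{\;s\;} & \sigma
\end{array}
\]

Figure 2. Inductive Type with Iteration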

The morphism part of the initial F-algebra is replaced by a term formation rule:

(µ-I) If t is a term of type ρ[α := µαρ], then C_{µαρ} t is a term of type µαρ (in the diagram we used another mathematical means of expressing lambda abstraction: the dot).

Weak⁶ initiality is expressed by a term formation rule and a term rewrite rule:

(µ-E) If s is a term of type ρ[α := σ] → σ and r is a term of type µαρ, then r E_µ s is a term of type σ. (In the diagram, we use lambda abstraction instead of the notation .E_µ s with the dot as means of abstraction.)

(βµ) The new rewrite rule (added to beta reduction for →):

\[ (C_{\mu\alpha\rho}\, t^{\rho[\alpha:=\mu\alpha\rho]})\, E_\mu s \;\rhd\; s\,\bigl((\alpha \to \rho)(\lambda x^{\mu\alpha\rho}.\, x E_\mu s)\; t\bigr). \]

(r ▷ r′ means that a subterm r may be replaced by r′ in order to carry out one rewrite step →.) It will later become clear that (βµ) expresses iteration on µαρ.

This is not yet a proper definition unless we answer the following question: What is the counterpart of the functor on morphisms in our term world? More precisely, we have to give the definition of the term

\[ (\alpha \to \rho)(\lambda x^{\mu\alpha\rho}.\, x E_\mu s), \]

which has to have type ρ[α := µαρ] → ρ[α := σ].

4. POSITIVE INDUCTIVE TYPES

In order to formulate interesting examples which help us in finding the definition of (α → ρ)(λx^{µαρ}. x E_µ s) we first extend the type system by the type 1 and sum types, and the term system by the appropriate introduction and elimination rules for those type constructs. The rewrite rules are extended accordingly.⁷

We include 1 in our type system which serves as a singleton set, and also ρ + σ modelling the disjoint sum for any two types ρ and σ. The only inhabitant of 1 will be IN1: Add the term IN1 of type 1 (as an initial rule of the term system). It makes no sense to give an elimination rule for 1 and hence there is also no beta reduction rule for 1. Inhabitants of type ρ + σ are left injections of inhabitants of type ρ or right injections of inhabitants of type σ. This gives rise to the following term formation rules:

• If r is a term of type ρ, then INL_σ r is a term of type ρ + σ.
• If r is a term of type σ, then INR_ρ r is a term of type ρ + σ.

Elimination of terms of sum type is done by case analysis written as a term formation rule:

• If r is a term of type ρ + σ and s and t are terms of type τ, then r(x^ρ. s, y^σ. t) is a term of type τ. (The dot serves as an indication of binding, i.e., x and y are bound variables.)

The meaning of these rules is displayed by the beta reduction rules for sums:

• (INL_σ r)(x^ρ. s, y^σ. t) ▷ s[x := r].
• (INR_ρ r)(x^ρ. s, y^σ. t) ▷ t[y := r].

Let us consider several examples of inductive types: First define the type of natural numbers by nat := µα.1 + α. Our introduction rule gives us a term of type nat for any term of type 1 + nat. Thus zero and the successor are put together in this rule. We get them isolated by defining:

• 0 := C_nat(INL_nat IN1) : nat.
• S := λx^nat. C_nat(INR_1 x) : nat → nat.

Likewise, a term s of type 1 + σ → σ in the elimination rule pairs up the initial value of the iteration and the step function. We again isolate them by:

• s0 := s(INL_σ IN1) : σ.
• sS := λx^σ. s(INR_1 x) : σ → σ.

We now want to have that 0 E_µ s rewrites to s0, and that (S t) E_µ s and sS(t E_µ s) rewrite to the same term, so that E_µ behaves (equationally) as an iterator on naturals, meaning that (S(...(S 0)...)) E_µ s and sS(...(sS s0)...), with n occurrences of S and of sS, respectively, are equal in some sense (here: that they have a common reduct). An easy calculation shows that this is achieved by setting

\[ (\alpha \to 1 + \alpha)(\lambda x^{\mathsf{nat}}.\, x E_\mu s) := \lambda z^{1+\mathsf{nat}}.\; z\bigl(y^{1}.\, \mathsf{INL}_\sigma\, y,\; y^{\mathsf{nat}}.\, \mathsf{INR}_1((\lambda x^{\mathsf{nat}}.\, x E_\mu s)\, y)\bigr). \]

This will be an instance of the general method indicated below. α → 1 + α is monotone in quite a trivial sense: It is even strictly positive, i.e., α occurs only strictly positively (never to the left side of a →) in 1 + α.
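The calculation for nat can be replayed with tagged tuples standing in for terms. This is a sketch under stated assumptions: type annotations are dropped, Python evaluation replaces rewriting, and the helper names (`case`, `map_nat`, `iterate`, `algebra`) are hypothetical. `iterate` follows the shape of rule (βµ) with the monotonicity term for α → 1 + α given above.

```python
IN1 = ("IN1",)

def INL(r):
    return ("INL", r)

def INR(r):
    return ("INR", r)

def C(t):
    """Introduction (mu-I) for nat: wraps a term of type 1 + nat."""
    return ("C", t)

def case(r, left, right):
    """Sum elimination r(x. s, y. t), with the branches given as functions."""
    tag, payload = r
    return left(payload) if tag == "INL" else right(payload)

def map_nat(f):
    """(alpha -> 1 + alpha) applied to f: lambda z. z(y. INL y, y. INR (f y))."""
    return lambda z: case(z, INL, lambda y: INR(f(y)))

def iterate(r, s):
    """r E_mu s, following (C t) E_mu s |> s(map (lambda x. x E_mu s) t)."""
    _, t = r
    return s(map_nat(lambda x: iterate(x, s))(t))

zero = C(INL(IN1))

def succ(x):
    return C(INR(x))

def algebra(s0, sS):
    """Pair up an initial value and a step function into s : 1 + sigma -> sigma."""
    return lambda v: case(v, lambda _: s0, sS)
```

With s0 = 0 and sS adding 2, iterating over S(S(S 0)) applies the step function three times to the initial value.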

Now consider the type cont := µα.1 + ((α → nat) → nat) which may be seen as the type of continuations for programs whose results are natural numbers.⁸ Let us ignore the tasks of isolation dealt with in the previous example. Then we have as constructors of type cont the terms D : cont and C : ((cont → nat) → nat) → cont. We would like to have an inhabitant e of type cont → nat which completes in some sense the continuation it gets as an argument. It is specified as (= shall denote the equivalence imposed by rewriting):

• e D = 0.
• e(C f) = f e for any term f of type (cont → nat) → nat.

The trivial continuation is “completed” to give the trivial result 0, and a continuation given by some f representing a functional which calculates numbers given some function of type cont → nat, i.e., some “completion function”, is completed simply by applying this f to the “completion function” e being constructed. We see that the recursive call is by no means to e with some term smaller than C f in any sense. The term e which represents the function about to be defined is even fed in as an argument to the term f. But nevertheless this is an instance of iteration on cont if one defines (α → 1 + ((α → nat) → nat)) r^{ρ→σ} for arbitrary terms r : ρ → σ as

\[ \lambda x^{1+((\rho\to\mathsf{nat})\to\mathsf{nat})}.\; x\bigl(y^{1}.\, \mathsf{INL}_{(\sigma\to\mathsf{nat})\to\mathsf{nat}}\, y,\; y^{(\rho\to\mathsf{nat})\to\mathsf{nat}}.\, \mathsf{INR}_1(\lambda z^{\sigma\to\mathsf{nat}}.\, y(\lambda z_1^{\rho}.\, z(r z_1)))\bigr). \]

Later we cite a result that the definition given as above, but read as a term rewrite system (from left to right), is strongly normalizing. Note that cont is a nested inductive type in the sense that the inductive type nat appears in its definition. Note also that for cardinality reasons there cannot be a set-theoretic model interpreting the function type as the full set-theoretic function space.⁹ Nevertheless, the monotonicity of α → 1 + ((α → nat) → nat) is seen quite easily: α occurs only positively in 1 + ((α → nat) → nat) because every occurrence is to the left of an even number of →. Clearly, this type is not strictly positive.

A third example is given by the ρ-branching well-founded trees (for arbitrary type ρ): tree(ρ) := µβ.1 + (ρ → β). (β is assumed not to be free in ρ.) The dependency on β is strictly positive.

As a final example consider the highly branching trees after Berger: Tree := µα.1 + (tree(α) → α) consists of the well-founded trees whose branching degree is tree(Tree), i.e., the whole type of well-founded trees branching over Tree about to be defined! Monotonicity is still seen by one glance at the definition: α occurs only positively in 1 + (tree(α) → α) (because α occurs negatively in tree(α)). Because the fixed-point is formed with respect to the variable α occurring free (as a parameter) in the type tree(α) showing up in the definition of Tree, the latter type is called an interleaving non-strictly positive inductive type.
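The completion function e on cont specified above can be checked on sample continuations. The sketch below makes several assumptions: Python integers stand in for nat, evaluation replaces rewriting, and the names `fold`, `con_D`, `con_C`, `map_cont` and `iterate` are illustrative only. `map_cont` transcribes the monotonicity term for α → 1 + ((α → nat) → nat), and `iterate` follows rule (βµ).

```python
def INL(r):
    return ("INL", r)

def INR(r):
    return ("INR", r)

def fold(t):
    """Introduction (mu-I) for cont."""
    return ("C", t)

def case(r, left, right):
    """Sum elimination, with the two branches given as functions."""
    tag, payload = r
    return left(payload) if tag == "INL" else right(payload)

def map_cont(r):
    """lambda x. x(y. INL y, y. INR (lambda z. y (lambda z1. z (r z1))))."""
    return lambda x: case(
        x, INL, lambda y: INR(lambda z: y(lambda z1: z(r(z1)))))

def iterate(k, s):
    """k E_mu s, following (C t) E_mu s |> s(map (lambda x. x E_mu s) t)."""
    _, t = k
    return s(map_cont(lambda x: iterate(x, s))(t))

con_D = fold(INL(("IN1",)))          # the constructor D : cont

def con_C(f):
    """The constructor C : ((cont -> nat) -> nat) -> cont."""
    return fold(INR(f))

def step_term(v):
    """s : 1 + ((nat -> nat) -> nat) -> nat: 0 on the left, g(identity) on the right."""
    return case(v, lambda _: 0, lambda g: g(lambda n: n))

def e(k):
    """The completion function, defined by iteration on cont."""
    return iterate(k, step_term)

def f(k):
    """A sample functional of type (cont -> nat) -> nat."""
    return k(con_D) + 1

def g(k):
    """A second sample functional; it feeds C f to its argument."""
    return k(con_C(f)) + 10
```

For these samples, `e(con_C(f))` agrees with `f(e)`, although `f` applies its argument to `con_D`, which is not a subterm obtained by taking `con_C(f)` apart.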

LEMMA 1. If µαρ is only formed in case α occurs in ρ only positively (and ρ hereditarily obeys this restriction), then there are closed terms

\[ \mathsf{map}_{\alpha\to\rho,\tau_1,\tau_2} : (\tau_1 \to \tau_2) \to \rho[\alpha := \tau_1] \to \rho[\alpha := \tau_2]. \]

This is well-known (Leivant 1990; Howard 1992). For a careful definition yielding normal terms map see Matthes (1998, 5.1.1). As long as there is no interleaving of inductive types (which rules out Tree) the definition is quite straightforward. Interleaving requires the use of (µ-I) and (µ-E) in the definition of map.¹⁰ Now we may set for those types:

\[ (\alpha \to \rho)(\lambda x^{\mu\alpha\rho}.\, x E_\mu s) := \mathsf{map}_{\alpha\to\rho,\mu\alpha\rho,\sigma}(\lambda x^{\mu\alpha\rho}.\, x E_\mu s). \]

THEOREM 2. The term rewrite system of positive inductive types is strongly normalizing.

Proof. Reduction-preserving embedding into Girard’s (1972) (and Reynolds’ (1974)) system F.

System F is the polymorphic lambda calculus. We do not have inductive types but universal types ∀αρ (for any type ρ and any type variable α). The corresponding term formation rules are as follows:

• If r is a term of type ρ, then Λα r is a term of type ∀αρ, provided that α does not occur free in a type of a variable free in r (well motivated by the Curry-Howard-correspondence (Howard 1980) with second-order propositional logic).
• If r is a term of type ∀αρ, then rσ is a term of type ρ[α := σ].

The beta reduction rule for universal types: (Λα r)σ ▷ r[α := σ].

It was Girard who found a normalization proof for system F which is the impredicative variant of the Tait computability method (Tait 1967). Since Girard’s proof numerous proof variants have been published. They are inherently difficult because their means of proof have to go beyond second-order arithmetic. Nevertheless, there are quite short, modular and clear proofs available.¹¹

Let an embedding be a translation of a term rewrite system to another, where the types and terms are translated in a compatible fashion such that any rewrite step in the source system gives rise to at least one rewrite step between the translated terms in the target system. The most important fact concerning embeddings for our purposes is the preservation of strong normalization: If the target system of an embedding is strongly normalizing, so is the source system. (An infinite rewrite sequence in the source system would be translated into an infinite rewrite sequence in the target system.)

The proof that our system of positive inductive types embeds into system F may be done as usual. (Note however, that most authors do not really care about the simulation of rewrite steps and are content with preservation of beta equality and even neglect the need for lemmas on interchangeability of the translation function with substitution. For a careful presentation see Matthes (1998).)

5. PRIMITIVE RECURSION

Full primitive recursion is more than mere iteration as given by (βµ). For example, in Gödel’s T (for a thorough presentation see Girard et al. (1989)), which has primitive recursion on natural numbers, we may also access the previous number argument as an argument to the step function calculating the value of the function(al) for some number argument:¹² For any terms s0 : σ and sS : nat → σ → σ there is a term F : nat → σ (with free variables among those of s0 and sS) such that F 0 = s0 and F(S t) = sS t (F t) (= denotes the equality relation induced by the rewrite rules of Gödel’s T).¹³ Of course, it is possible to represent functions defined by primitive recursion by functions defined by iteration only. However, the behaviour of this encoding w.r.t. rewriting is not satisfactory: It only works for numerals (instead of arbitrary terms of type nat) and, e.g., the encoded predecessor function needs a linear number of steps and not only a constant number.¹⁴

In general, primitive recursion directly corresponds to “extended induction” as defined in Section 2: Φ(µΦ ∧ S) ⊆ S ⇒ µΦ ⊆ S. This principle has been derived from ordinary induction plus monotonicity plus the Tarski theorem. The derivation might be cast in the term calculus but again has an unsatisfactory rewriting behaviour (the problems with Gödel’s T mentioned above are nothing but an instance of this phenomenon). Therefore, we include the counterpart of “extended induction” as a new primitive. The formation of the binary infimum will be modelled by the categorical product and by product types: A product of the objects A and B of some category consists of some object A × B and (projection) morphisms π1^{A,B} from A × B to A and π2^{A,B} from A × B to B such that for any object C and morphisms l from C to A and r from C to B there is a unique morphism ⟨l, r⟩ from C to A × B such that π1^{A,B} ∘ ⟨l, r⟩ = l and π2^{A,B} ∘ ⟨l, r⟩ = r.

Therefore, we extend our lambda calculus by types ρ × σ for any types ρ and σ and term formation rules as follows:

• If r is a term of type ρ and s is a term of type σ, then ⟨r, s⟩ is a term of type ρ × σ.
• If r is a term of type ρ × σ, then rL is a term of type ρ and rR is a term of type σ.

The categorical equalities become beta reduction rules: ⟨r, s⟩L ▷ r and ⟨r, s⟩R ▷ s.

On the categorical side we now require that our initial F-algebra is also an initial recursive F-algebra (Geuvers 1992): For any object S and morphism s from F(M × S) to S there is a unique morphism E⁺s from M to S such that E⁺s ∘ c = s ∘ F(⟨Id, E⁺s⟩) where Id is the identity morphism on M. The equality is depicted in Figure 3.

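\[
\begin{array}{ccc}
F(M) & \xrightarrow{\;c\;} & M \\
{\scriptstyle F(\langle \mathrm{Id},\, E^{+}s\rangle)}\big\downarrow & & \big\downarrow{\scriptstyle E^{+}s} \\
F(M \times S) & \xrightarrow{\;s\;} & S
\end{array}
\]

Figure 3. Initial Recursive F-Algebra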

On the level of terms this gives a new formation rule:

• If r is a term of type µαρ and s is a term of type ρ[α := µαρ × σ] → σ, then r E_µ^+ s is a term of type σ.

The new rewrite rule is read off the diagram in Figure 4.

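\[
\begin{array}{ccc}
\rho[\alpha := \mu\alpha\rho] & \xrightarrow{\;C_{\mu\alpha\rho}\,.\;} & \mu\alpha\rho \\
{\scriptstyle (\alpha \to \rho)(\langle \mathrm{Id},\, \lambda x^{\mu\alpha\rho}.\, x E_\mu^{+} s\rangle)}\big\downarrow & & \big\downarrow{\scriptstyle \lambda x^{\mu\alpha\rho}.\, x E_\mu^{+} s} \\
\rho[\alpha := \mu\alpha\rho \times \sigma] & \xrightarrow{\;s\;} & \sigma
\end{array}
\]

Figure 4. Inductive Type with Primitive Recursion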

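If we restrict ourselves to positive inductive types as in Lemma 1 we may now give the rule (βµ+) of primitive recursion as follows:

\[ (C_{\mu\alpha\rho}\, t)\, E_\mu^{+} s \;\rhd\; s\,\Bigl(\mathsf{map}_{\alpha\to\rho,\,\mu\alpha\rho,\,\mu\alpha\rho\times\sigma}\bigl(\lambda x^{\mu\alpha\rho}.\, \langle x,\, (\lambda x^{\mu\alpha\rho}.\, x E_\mu^{+} s)\, x\rangle\bigr)\; t\Bigr) \]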
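For nat := µα.1 + α the rule of primitive recursion can be replayed on tagged tuples. This is a sketch under assumptions: Python pairs play the role of ⟨x, y⟩, Python integers serve as the result type σ where convenient, and the names `rec`, `pred`, `to_int` and `factorial` are hypothetical. Because Python evaluates eagerly, the recursive result is computed even when the step term ignores it, so the constant-step behaviour of the predecessor in the rewrite system is not reproduced here.

```python
IN1 = ("IN1",)

def INL(r):
    return ("INL", r)

def INR(r):
    return ("INR", r)

def C(t):
    return ("C", t)

def case(r, left, right):
    """Sum elimination, with the two branches given as functions."""
    tag, payload = r
    return left(payload) if tag == "INL" else right(payload)

def map_nat(f):
    """Monotonicity term for alpha -> 1 + alpha."""
    return lambda z: case(z, INL, lambda y: INR(f(y)))

def rec(r, s):
    """r E_mu^+ s: every recursive position x is replaced by <x, x E_mu^+ s>."""
    _, t = r
    return s(map_nat(lambda x: (x, rec(x, s)))(t))

zero = C(INL(IN1))

def succ(x):
    return C(INR(x))

def pred(n):
    """Predecessor: return the left component of the pair, i.e. the previous number."""
    return rec(n, lambda v: case(v, lambda _: zero, lambda p: p[0]))

def to_int(n):
    """Plain iteration, expressed by ignoring the left component of the pair."""
    return rec(n, lambda v: case(v, lambda _: 0, lambda p: p[1] + 1))

def factorial(n):
    """0! := 1 and (n + 1)! := n! * (n + 1); p is <previous number, previous result>."""
    return rec(n, lambda v: case(v, lambda _: 1,
                                 lambda p: p[1] * (to_int(p[0]) + 1)))

three = succ(succ(succ(zero)))
```

The factorial needs both components of the pair, which is exactly what iteration alone does not provide directly.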

THEOREM 3. The extended system of positive inductive types with iteration¹⁵ and primitive recursion is strongly normalizing.

Proof. For example, by an extension of an appropriate proof for system F. (One may also show – cf. Matthes (1998, 4.2.2) – an embedding into monotone inductive types explained in the next section and infer strong normalization from that system.)

It is preferable (and will be assumed in the sequel) that system F’s universal type quantification be included into the system of positive inductive types. It is easy to see that Lemma 1 may then be expressed more neatly by the existence of closed terms map_{α→ρ} of type

∀α∀β.(α → β) → ρ → ρ[α := β].

(Simply take Λα Λβ. map_{α→ρ,α,β} and also consider the case of universal quantification in the proof of that lemma.) The term map_{α→ρ} obviously witnesses the monotonicity of α → ρ as displayed by its type. And this monotonicity is derived from the syntactic requirement of positivity in the formation of µαρ. This directly leads us to the generalization studied in the next section.

6. MONOTONE INDUCTIVE TYPES

The observation central to this article:¹⁶

Strong normalization of systems of inductive types does not depend on a specific definition of the monotonicity witnesses map_{α→ρ}.

And further: The witness even need not be closed (conditional monotonicity). It moreover may be nothing but a variable (hypothetical monotonicity). However, in these cases we have to include the witnesses in our term formation rules (attached witnesses of monotonicity) such that the rewrite rules of iteration and primitive recursion have access to them.

6.1. A System of Monotone Inductive Types

It is like the extension of system F by positive inductive types, but with the following specification:

• µαρ may be formed without any restriction.
• The introduction rule for terms of type µαρ is changed to: If m is a term of type ∀α∀β.(α → β) → ρ → ρ[α := β] and t is a term of type ρ[α := µαρ], then C_{µαρ} m t is a term of type µαρ.
• The formation rules concerning E_µ and E_µ^+ are unchanged.
• Replace in both rewrite rules (pertaining to iteration and primitive recursion) the term map_{α→ρ,µαρ,τ} by m(µαρ)τ (for the types τ = σ and τ = µαρ × σ, respectively).

As mentioned above, the hypothesized monotonicity as expressed by an arbitrary term of type ∀α∀β.(α → β) → ρ → ρ[α := β] has to be accessible in the beta (i.e., iteration and primitive recursion) rules for µαρ. Here, we attach the monotonicity witness to the constructor C_{µαρ}.

THEOREM 4. This term rewrite system is strongly normalizing.

Proof. Without (βµ+) this could be done by an embedding into system F. The full system may be treated by a technically demanding extension of an appropriate normalization proof for system F (Matthes 1998, 9.5.1). (Below we will see that an easy embedding suffices to establish the result.)

6.2. Other Systems of Monotone Inductive Types

Normalization of the previous system does not carry over to arbitrary other reasonable definitions of systems of monotone inductive types, even with iteration only. We consider two modifications with the original introduction rule (if t : ρ[α := µαρ], then C_{µαρ} t : µαρ) but with witnesses of some weakened forms of monotonicity in the elimination rule for iteration. The choices are:

• m : ∀β.(µαρ → β) → ρ[α := µαρ] → ρ[α := β]
• m : ∀α.(α → σ) → ρ → ρ[α := σ]

In the first case, we are content with the first argument being fixed to µαρ; in the second case, we fix the second argument to the type σ of the functional defined by iteration on µαρ. In both cases we get as term formation rule: If r is a term of type µαρ, m is given as above and s is a term of type ρ[α := σ] → σ, then r E_µ m s is a term of type σ. We may now easily give modified iteration rules (because in the original system we always only used map_{α→ρ,µαρ,σ}, hence with first argument µαρ and second argument σ):¹⁷

• (C_{µαρ} t) E_µ m s ▷ s(mσ(λx^{µαρ}. x E_µ m s) t)
• (C_{µαρ} t) E_µ m s ▷ s(m(µαρ)(λx^{µαρ}. x E_µ m s) t)

LEMMA 2. One of the systems is strongly normalizing (even with primitive recursion added), the other not even weakly normalizing, i.e., there are terms of the other system which cannot be rewritten to a normal form (a term which cannot be rewritten any further).

Proof. Again, e.g., by an intricate extension of the normalization proof for system F in the positive case (Matthes 1998, 9.5.4),¹⁸ and by a counterexample in the negative case. The counterexample will now be given for the first system: Set ρ := α, choose arbitrary t^{µαα} and σ, and set s := λy^σ. y. Define

\[ m := \Lambda\beta\, \lambda z^{\mu\alpha\alpha\to\beta}\, \lambda u^{\mu\alpha\alpha}.\; z(C_{\mu\alpha\alpha}\, u). \]

m is even closed. If one left out C_{µαα} of the definition of m, we would have in essence the canonical monotonicity witness guaranteed by Lemma 1, and we would no longer profit from the special setting of the first argument. (C_{µαα} t) E_µ m s rewrites in two steps to mσ(λx^{µαα}. x E_µ m s) t and in three more steps to (λx^{µαα}. x E_µ m s)(C_{µαα} t), hence in a total of six steps to the initial term. This gives rise to an infinite loop which cannot be avoided by some other strategy of choosing the subterms to replace.

What is the reason for such an unfortunate behaviour of the first system? It is a lack of uniformity which is exploited in the definition of the monotonicity witness m shown above. The canonical monotonicity witness would essentially do the same at any type, i.e., it would be parametric; the present m only works for the type µαα (as first argument).

Only recently, a proposal has even been made for a non-normalizing extension of system F by some non-parametric rewrite rule for terms of an already inhabited type of system F (Harper and Mitchell 1999): Set σ := ∀α∀β.(α → α) → (β → β). Trivially, σ is inhabited by the closed term t := Λα Λβ λx^{α→α} λy^β. y. A new constant J of type σ is added to system F along with the following rewrite rule: J ρρ r^{ρ→ρ} ▷ r for any type ρ. Clearly, this definition exploits the equality of both type arguments. Our term t would behave quite differently: t ρρ r^{ρ→ρ} rewrites in three steps to λy^ρ. y and does not profit from the identity of the type arguments. The innocent-looking rule for J (after all, it is just a projection) also gives rise to an infinite loop: Set ρ := ∀α.α → α and ω := Λα. J ρα(λx^ρ. xρx) of type ρ. Then ωρω rewrites to J ρρ(λx^ρ. xρx)ω by beta reduction for universal types. The J-rule applies and yields (λx^ρ. xρx)ω, which in turn rewrites to the initial term ωρω.
I guess that the second variation on monotone inductive types discussed above is well-behaved simply because it is impossible to make non-parametric use of the additional freedom given by the less uniform type of the required monotonicity witnesses.

6.3. Embedding Monotone Inductive Types into Positive Inductive Types

Recall that we call types “non-interleaving” if every subexpression µαρ has no subexpression of the form µβσ with α free in µβσ.

THEOREM 5 (Main Theorem). The system of monotone inductive types with (full) primitive recursion and iteration defined in Subsection 6.1 embeds into the extension of system F by non-interleaving positive inductive types with (full) primitive recursion and iteration.¹⁹

Note that the reverse embedding is quite easy. Hence, both systems are equivalent with respect to rewrite behaviour, hence intensionally equivalent.

Proof. An amazingly simple transformation which is clarified by the concept of interpolation in second-order intuitionistic propositional logic.²⁰ Extensions of system F correspond via the Curry–Howard-isomorphism (Howard 1980) to intuitionistic second-order propositional logics in natural deduction formulation: the types are seen as the formulas, and the typed terms are considered to be the formal proofs of their respective types/formulas.

Fix α and ρ and let m : ∀α∀β.(α → β) → ρ → ρ[α := β]. Relative to the free variables of m, ρ is provably monotone in α. Relative to the same free variables, mβα proves (β → α) → ρ[α := β] → ρ. That is, ρ is proved from the two assumptions β → α and ρ[α := β]. Interpolation would mean that one finds an intermediate formula/type ρ_∀ such that ρ_∀ only has free type variables which are common to the assumptions and the conclusion,²¹ and such that the assumptions imply ρ_∀, and ρ_∀ in turn implies the conclusion. In intuitionistic second-order propositional logic, this is easily achieved by setting

\[ \rho_\forall := \forall\gamma.(\alpha \to \gamma) \to \rho[\alpha := \gamma] \]

(for a new type variable γ) provided that α occurs free in ρ. Otherwise, a trivial solution would be to take ρ as intermediate formula. Nevertheless, we always take ρ_∀ and check that there are terms of types (β → α) → ρ[α := β] → ρ_∀ and ρ_∀ → ρ with at most the free variables of m:

\[ \lambda z^{\beta\to\alpha}\, \lambda x^{\rho[\alpha:=\beta]}\, \Lambda\gamma\, \lambda y^{\alpha\to\gamma}.\; m\beta\gamma\,(\lambda h^{\beta}.\, y(z h))\, x \;:\; (\beta \to \alpha) \to \rho[\alpha := \beta] \to \rho_\forall \]

and

\[ q_\forall := \lambda x^{\rho_\forall}.\; x\alpha(\lambda y^{\alpha}.\, y) \;:\; \rho_\forall \to \rho. \]

We now substitute β by α in the first term, apply it to the identity on α and simplify in order to get the term

\[ p_\forall := \lambda x^{\rho}\, \Lambda\gamma\, \lambda y^{\alpha\to\gamma}.\; m\alpha\gamma\, y\, x \;:\; \rho \to \rho_\forall. \]

Therefore, ρ and ρ_∀ are even logically equivalent relative to the assumptions made in m. Note that ρ_∀ has only one free occurrence of α, and the occurrence is non-strictly positive (two times to the left of a →).

Finally, we are in the position to define the transformation of the types and the terms of the system of monotone inductive types in 6.1 into the system of non-interleaving positive inductive types. The only non-trivial case of the type transformation is (µαρ)′ := µα.(ρ′)_∀, i.e., if ρ has already been transformed into the non-interleaving positive inductive type ρ′, we transform µαρ into µα∀γ.(α → γ) → ρ′[α := γ]. Since α does not occur free in ρ′[α := γ], we rule out interleaving by this transformation.

The terms are transformed trivially (i.e., homomorphically) except for the cases pertaining to µ:

• (C_{µαρ} m t)′ := C_{(µαρ)′} t̂ with t̂ as follows: We assume that we already have translated t into t′ of type ρ′[α := (µαρ)′] and m into m′ and henceforth p_∀ into p′_∀ of type ρ′ → ρ′_∀, hence we may set t̂ := p′_∀[α := (µαρ)′] t′ of type ρ′_∀[α := (µαρ)′].
• (r E_µ s)′ := r′ E_µ ŝ with r′ the result of translating r and ŝ as follows: We assume that we already have s′ of type ρ′[α := σ′] → σ′ as the result of translating s and similarly q′_∀ of type ρ′_∀ → ρ′, and set ŝ := λz^{ρ′_∀[α := σ′]}. s′(q′_∀[α := σ′] z) of type ρ′_∀[α := σ′] → σ′.
• (r E_µ^+ s)′ := r′ E_µ^+ (λz^{ρ′_∀[α := (µαρ)′ × σ′]}. s′(q′_∀[α := (µαρ)′ × σ′] z)), justified by a similar argument.

Notice that t̂ needs m′ (via p′_∀) and that ŝ does not need m′ (because it only requires q′_∀). It is thus essential to have the monotonicity witnesses attached to the introduction rule (µ-I).²²

Is this translation of the type and term systems also an embedding of the term rewrite systems? It is indeed a simple consequence of the following

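LEMMA 3. Let h^{α→β} be a lambda abstraction and ρ be a positive inductive type. Then

\[ q_\forall[\alpha := \beta]\,\bigl(\mathsf{map}_{\alpha\to\rho_\forall,\alpha,\beta}\; h\; (p_\forall\, x^{\rho})\bigr) \;\to^{*}\; m\alpha\beta\, h\, x, \]

where →* means finitely many reduction steps of system F.

Proof. Note that

\[ \mathsf{map}_{\alpha\to\rho_\forall,\alpha,\beta} = \lambda f^{\alpha\to\beta}\, \lambda x^{\rho_\forall}\, \Lambda\gamma\, \lambda y^{\beta\to\gamma}.\; x\gamma(\lambda z^{\alpha}.\, y(f z)). \]

The rest is nothing but beta reduction in system F.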
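The statement of Lemma 3 can be checked extensionally once type abstraction and type application are erased. The sketch below rests on assumptions that are not part of the article: Python lists play the role of ρ, and the two witnesses `m_list` and `m_reversed` are hypothetical examples of terms m; the second one is deliberately not the canonical map.

```python
def p_forall(m):
    """p = lambda x. Lambda gamma. lambda y. m alpha gamma y x (types erased)."""
    return lambda x: lambda y: m(y)(x)

def q_forall(x):
    """q = lambda x. x alpha (lambda y. y) (types erased)."""
    return x(lambda y: y)

def map_forall(f):
    """map for alpha -> rho_forall: lambda f x. Lambda gamma. lambda y. x gamma (lambda z. y (f z))."""
    return lambda x: lambda y: x(lambda z: y(f(z)))

def m_list(f):
    """A witness for rho = lists over alpha: the usual map."""
    return lambda xs: [f(v) for v in xs]

def m_reversed(f):
    """Another term of the same shape that also reverses the list."""
    return lambda xs: [f(v) for v in reversed(xs)]

def lhs(m, h, x):
    """q(map h (p x)), the left-hand side of Lemma 3."""
    return q_forall(map_forall(h)(p_forall(m)(x)))

def double(n):
    return 2 * n
```

Both witnesses give the same result on the two sides, which matches the observation that the argument never inspects how m is defined.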

The proof of the main theorem is finished by checking that

\[ \bigl((C_{\mu\alpha\rho}\, m\, t)\, E_\mu s\bigr)' \;\to^{+}\; \bigl(s\,(m(\mu\alpha\rho)\sigma\,(\lambda x^{\mu\alpha\rho}.\, x E_\mu s)\, t)\bigr)' \]

and that

\[ \bigl((C_{\mu\alpha\rho}\, m\, t)\, E_\mu^{+} s\bigr)' \;\to^{+}\; \Bigl(s\,\bigl(m(\mu\alpha\rho)(\mu\alpha\rho \times \sigma)\,(\lambda x^{\mu\alpha\rho}.\, \langle x,\, (\lambda x^{\mu\alpha\rho}.\, x E_\mu^{+} s)\, x\rangle)\, t\bigr)\Bigr)', \]

where →⁺ denotes at least one rewrite step of the system of positive inductive types. (In order to get a mathematically proper proof one has to state several lemmata on compatibility with substitution.)

To sum up: In the world of interpolation it seems to be trivial, but for inductive types we get the collapse of monotone inductive types with (full) primitive recursion to non-interleaving positive inductive types (in presence of system F’s expressive power being crucial for the type transformation µαρ ↦ µα∀γ.(α → γ) → ρ[α := γ]).

7. RELATED WORK AND CONCLUSIONS

It seems that Mendler (1987b) first tried to give a definition of monotone inductive types in a system of dependent types. Unfortunately, the monotonicity proof has not been used in the definition of the additional rewrite rule for those types. One could ask how this system could be justified at all. There is an easy answer: The specific form of recursion used in that system is the same as that of the extension of system F also presented in Mendler (1987a). And Uustalu and Vene (1997) discovered – perhaps by understanding what Leivant had in mind in Leivant (1990, 308) – that Mendler’s (small) system (which has the superfluous requirement of positivity) embeds into non-interleaving positive inductive types.

Thorsten Altenkirch proposed in 1993 to formulate the recursion rule by using an arbitrary monotonicity witness (“term of functorial strength”) given beforehand. He also sketched a normalization proof by means of saturated sets.

The systems studied in this article correspond via the Curry-Howard-isomorphism to (second-order) propositional logics. Real applications demand predicate logics; the inductive types then become inductively defined predicates. Term rewriting becomes proof transformation. Uustalu studies a great variety of systems with inductively and coinductively defined predicates and gives embeddings between those systems (Uustalu 1998). The inductive types could then serve as the term language for program extraction from proofs carried out in those systems.

On the side of systems with monotonicity witnesses there is also the notion of monotone fixed-point types (Matthes 1999a) (reducible to non-interleaving positive fixed-point types which have a very nice direct proof of strong normalization) and the reduction of monotone inductive and coinductive types to non-interleaving positive fixed-point types (Matthes 1999b).

A major open question concerns the reducibility of monotone inductive types to positive inductive types in the absence of universal types. A first result in this direction by Altenkirch (1999) shows that in parametric equality theory the witnesses for positive inductive types are necessarily those found in Lemma 1 if they map the identity to the identity, something which the known witnesses for non-positive α → ρ fail to do.

Although there are still open questions I cannot imagine how to find a system expressing the results of Tarski’s theorem more generally than the systems of monotone inductive types, unless one introduces dependent types which are indeed an enormous realm of further study.

ACKNOWLEDGEMENTS

I am thankful for support by the Volkswagenstiftung. Thanks to Thorsten Altenkirch for many fruitful discussions and especially for his hint on using interpolation to explain the embedding. I am also thankful to the anonymous referee who provided so much useful criticism.

NOTES

1 Note that this suggests two different readings of the same →: implication and containment.
2 With respect to category theory, the present article is self-contained. For a textbook on category theory see Mac Lane (1998).
3 It is a reduction towards a term which cannot be rewritten any further (a normal form) due to strong normalization mentioned below.
4 Without types, normalization does not hold as may be seen from the generic example (λx.xx)(λx.xx) ▷ (xx)[x := λx.xx] ≡ (λx.xx)(λx.xx) which even loops.
5 There is a huge body of knowledge on first-order term rewrite systems not involving any binding of variables. The lambda calculi studied in this article heavily rest on the beta rule relating variable binding and substitution which makes them higher-order term rewrite systems. However, there is also a more subtle notion of higher-order term rewriting using lambda calculus as a framework, and consequently formulating the rewrite rules only up to beta (and eta) equality. See Mayr and Nipkow (1998) for confluence and van de Pol and Schwichtenberg (1995) for normalization of systems of this kind.
6 We do not require uniqueness.
7 Clearly, 1 and sum types are the counterparts to the categorical notions of terminal object and coproduct, respectively (Geuvers 1992). Finite sums would be finite coproducts and the empty sum be an initial object (typically represented by 0). Terminal objects are empty products. Binary products are needed for the definition of primitive recursion in Section 5.
8 Taken from an unpublished manuscript by Martin Hofmann.
9 Of course, the recursion-theoretic model of lambda calculus allows one to define the function e, the totality of which may then be proved. There is a wealth of models for types such as cont studied in the field of domain theory.
10 Nevertheless there is not much freedom left in the definition. See the discussion at the very end of the article.
11 For an introduction to this subject see Girard et al. (1989).
12 A typical example would be the factorial function ! : N → N specified by 0! := 1 and (n + 1)! := n! · (n + 1).
13 Note that σ may be an arbitrarily complex type, hence giving rise to a wider class of representable functions than only the classical primitive recursive functions. By taking σ := nat → nat one can already define Ackermann’s function which therefore is primitive recursive in our sense.
14 In Spławski and Urzyczyn (1999) evidence is given that there cannot be any reasonable encoding of inductive types into system F having a constant time predecessor. As a consequence, primitive recursion could not be simulated by iteration in any reasonable encoding.
15 As remarked earlier, (µ-E) appears in the definition of the terms map in case of interleaving inductive types, hence we need both iteration and recursion in our system. Still, it is possible to formulate a system without iteration (Matthes 1998, 5.1.3). One would hardly like to use it.
16 A precise formulation is given by the systems ESMIT and ISMIT in Matthes (1998).
17 Therefore, one could even think of a version with monotonicity witness of type (µαρ → σ) → ρ[α := µαρ] → ρ[α := σ] which, by the next lemma, would also induce a non-normalizing system.
18 The main theorem below will even cover this situation, hence there is in fact no need for that intricate extension.
19 Some more sophistication even shows that we do not need iteration in the target system of our embedding (Matthes 1998, 4.5.2 and 7).
20 Originally, the embedding has been found by defining a “dual” of a system proposed by Mendler (1987a) and a modified realizability interpretation inspired by Berger (1993) of some direct proof of normalization by means of saturated sets à la Tait (Tait 1975). In Matthes (1998) the intermediate system is motivated with the help of an easy lattice-theoretic observation. It was my colleague Thorsten Altenkirch who suggested recasting the proof in terms of interpolation.
21 In first-order logic, one instead requires that the function symbols of the interpolating formula occur in the assumptions as well as the conclusion.
22 However, a dual translation involving the existential quantifier allows one to embed systems of monotone inductive types with monotonicity witnesses attached to the elimination rules into the system of non-interleaving positive inductive types (Matthes 1998, 6.2.3).

REFERENCES

Altenkirch, T.: 1999, ‘Logical Relations and Inductive/Coinductive Types’, in G. Gottlob, E. Grandjean and K. Seyr (eds.), Computer Science Logic, 12th International Workshop, Brno, Czech Republic, August 24–28, 1998, Berlin. [Lecture Notes in Computer Science 1584], pp. 343–354.
Berger, U.: 1993, ‘Program Extraction from Normalization Proofs’, in M. Bezem and J. F. Groote (eds.), Typed Lambda Calculi and Applications, Berlin. [Lecture Notes in Computer Science 664], pp. 91–106.
Eilenberg, S. and Mac Lane, S.: 1942, ‘Natural Isomorphisms in Group Theory’, Proceedings of the National Academy of Sciences USA 28, pp. 537–543.
Geuvers, H.: 1992, ‘Inductive and Coinductive Types with Iteration and Recursion’, in B. Nordström, K. Pettersson and G. Plotkin (eds.), Proceedings of the 1992 Workshop on Types for Proofs and Programs, Båstad, Sweden, June, pp. 193–217. (Only published via ftp://ftp.cs.chalmers.se/pub/cs-reports/baastad.92/proc.dvi.Z)
Girard, J.-Y.: 1972, Interprétation Fonctionnelle et Élimination des Coupures dans l’Arithmétique d’Ordre Supérieur, Thèse de Doctorat d’État, Université de Paris VII.
Girard, J.-Y., Lafont, Y., and Taylor, P.: 1989, Proofs and Types, Cambridge. [Cambridge Tracts in Theoretical Computer Science 7].
Harper, R. and Mitchell, J. C.: 1999, ‘Parametricity and Variants of Girard’s J Operator’, Information Processing Letters 70, 1–5.
Howard, B.: 1992, ‘Fixed Points and Extensionality in Typed Functional Programming Languages’, Ph.D. thesis, Stanford University.
Howard, W. A.: 1980, ‘The Formulae-as-Types Notion of Construction’, in J. P. Seldin and J. R. Hindley (eds.), To H. B. Curry: Essays on Combinatory Logic, Lambda Calculus and Formalism, London, pp. 479–490.
Mac Lane, S.: 1998, Categories for the Working Mathematician, Berlin. [Graduate Texts in Mathematics 5].
Leivant, D.: 1990, ‘Contracting Proofs to Programs’, in P. Odifreddi (ed.), Logic and Computer Science, London. [APIC Studies in Data Processing 31], pp. 279–327.
Matthes, R.: 1998, ‘Extensions of System F by Iteration and Primitive Recursion on Monotone Inductive Types’, Doktorarbeit (Ph.D. thesis), University of Munich. (Available via the homepage http://www.tcs.informatik.uni-muenchen.de/~matthes)
Matthes, R.: 1999a, ‘Monotone Fixed-point Types and Strong Normalization’, in G. Gottlob, E. Grandjean and K. Seyr (eds.), Computer Science Logic, 12th International Workshop, Brno, Czech Republic, August 24–28, 1998, Berlin. [Lecture Notes in Computer Science 1584], pp. 298–312.
Matthes, R.: 1999b, ‘Monotone (Co)inductive Types and Positive Fixed-point Types’, Theoretical Informatics and Applications 33, 309–328 [FICS ’98 Proceedings].
Mayr, R. and Nipkow, T.: 1998, ‘Higher-Order Rewrite Systems and their Confluence’, Theoretical Computer Science 192, 3–29.
Mendler, N. P.: 1987a, ‘Recursive Types and Type Constraints in Second-order Lambda Calculus’, in Proceedings of the Second Annual IEEE Symposium on Logic in Computer Science, IEEE Computer Society Press, Ithaca, N.Y., pp. 30–36. (Forms a part of Mendler (1987b).)
Mendler, P. F.: 1987b, ‘Inductive Definition in Type Theory’, Technical Report 87-870, Cornell University, Ithaca, N.Y. (Ph.D. Thesis [Paul F. Mendler = Nax P. Mendler])
van de Pol, J. and Schwichtenberg, H.: 1995, ‘Strict Functionals for Termination Proofs’, in M. Dezani-Ciancaglini and G. Plotkin (eds.), Typed Lambda Calculi and Applications, 2nd International Conference, TLCA ’95, Edinburgh, GB, April 10–12, 1995, Proceedings, Berlin. [Lecture Notes in Computer Science 902], pp. 350–364.
Reynolds, J. C.: 1974, ‘Towards a Theory of Type Structure’, in B. Robinet (ed.), Programming Symposium, Berlin. [Lecture Notes in Computer Science 19], pp. 408–425.
Spławski, Z. and Urzyczyn, P.: 1999, ‘Type Fixpoints: Iteration vs. Recursion’, in Proceedings of the 4th ACM SIGPLAN International Conference on Functional Programming, September 27–29, 1999, Paris, France, ACM Digital Library. http://www.acm.org/pubs/contents/proceedings/fp/317636/.
Tait, W. W.: 1967, ‘Intensional Interpretations of Functionals of Finite Type I’, Journal of Symbolic Logic 32, 198–212.
Tait, W. W.: 1975, ‘A Realizability Interpretation of the Theory of Species’, in R. Parikh (ed.), Logic Colloquium Boston 1971/72, Berlin. [Lecture Notes in Mathematics 453], pp. 240–251.
Tarski, A.: 1955, ‘A Lattice-theoretical Fixpoint Theorem and Its Applications’, Pacific Journal of Mathematics 5, 285–309.
Uustalu, T.: 1998, ‘Natural Deduction for Intuitionistic Least and Greatest Fixedpoint Logics, with an Application to Program Construction’, Ph.D. thesis, Royal Institute of Technology, Kista, Sweden.
Uustalu, T. and Vene, V.: 1997, ‘A Cube of Proof Systems for the Intuitionistic Predicate µ-,ν-logic’, in M. Haveraaen and O. Owe (eds.), Selected Papers of the 8th Nordic Workshop on Programming Theory (NWPT ’96), Oslo, Norway, December 1996, Oslo. [Research Reports, Department of Informatics, University of Oslo 248], pp. 237–246.

Lehr- und Forschungseinheit für theoretische Informatik Institut für Informatik der Universität München Germany E-mail: [email protected]

JAN JÜRJENS

GAMES IN THE SEMANTICS OF PROGRAMMING LANGUAGES – AN ELEMENTARY INTRODUCTION

ABSTRACT. Mathematical models are an important tool in the development of software technology, including programming languages and algorithms. During the last few years, a new class of such models has been developed based on the notion of a mathematical game that is especially well-suited to address the interactions between the components of a system. This paper gives an introduction to these game-semantical models of programming languages, concentrating on motivating the basic intuitions and putting them into context.

1. INTRODUCTION

The importance of reliability in software products is by now well known (the most commonly known source of concern is the “year 2000 bug”). One aim in the design of new programming languages is to minimize the occurrence of mistakes through appropriate design (the classical example is to encourage programmers to program in a structured way, as achieved in the design of the programming language Pascal by providing programming constructs such as procedures). More suitable programming languages enable a more efficient and more secure development of software.

Providing mathematical models for programming languages is an important step in this direction. Their purpose is to serve as a basis for understanding and reasoning about how programs behave. On the one hand, they can be used for analysis and verification; on the other, there have been significant examples of the design of new programming language principles influenced by the mathematical foundations (for instance the influence of the lambda-calculus on the development of the functional language ML).

For a long time (ca. 1950–1980) these models were functional in nature: execution of a program was thought of as computation of a function (as opposed to an interactive process). Using this model allowed the development of the notion of correctness of a program with respect to its specification. These models can be classified as operational semantics and denotational semantics, respectively. Different structures also relating inputs to outputs in a functional or relational way (namely Turing machines) were employed to model computational complexity.

Synthese 133: 131–158, 2002. © 2002 Kluwer Academic Publishers. Printed in the Netherlands.

• Operational semantics employs an evaluation relation ⇓: M ⇓ c means that the “program” M converges to the canonical form (“value”) c (“big-step” semantics).
• In denotational semantics one employs mathematical methods to use one’s intuition about specific mathematical structures to reason about programming languages. Programs are interpreted compositionally in structures that traditionally are order-theoretic: A term op(M1, ..., Mn) consisting of an operation op and subterms Mi is interpreted by composing the interpretations of the operation and of the subterms:

⟦op(M1, ..., Mn)⟧ := ⟦op⟧(⟦M1⟧, ..., ⟦Mn⟧).
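The compositional clause above can be illustrated by a minimal Python sketch. It assumes a toy term language over the natural numbers; the table OPERATIONS and the operation names in it are hypothetical and serve only to show that the meaning of a term is computed from the meanings of its parts.

```python
from typing import Callable, Dict, Tuple, Union

# A term is either a numeral (int) or a tuple (op_name, subterm_1, ..., subterm_n).
Term = Union[int, Tuple]

# Hypothetical interpretation of each operation symbol as a Python function.
OPERATIONS: Dict[str, Callable[..., int]] = {
    "plus": lambda a, b: a + b,
    "times": lambda a, b: a * b,
    "succ": lambda a: a + 1,
}


def denote(term: Term) -> int:
    """Interpret a term by composing the interpretation of its operation
    with the interpretations of its subterms."""
    if isinstance(term, int):
        return term  # a numeral denotes itself
    op, *subterms = term
    return OPERATIONS[op](*(denote(m) for m in subterms))
```

The function never inspects how a subterm was evaluated, only its value, which is exactly the extensional character of this style of semantics.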

Both kinds of semantics have been very successful, but both also have disadvantages: Operational semantics is syntax-dependent and thus too explicit for a nice mathematical theory. In denotational semantics, programs are modeled extensionally (i.e., showing only input/output dependencies and no aspects of the actual computation process), which abstracts from their dynamics. While this has been an adequate approach to the traditional forms of (functional) computation, the rise of interest in distributed systems (of which the most commonly known is the internet) in recent years has called for a model that takes account of the interactions between the components of a system (or equivalently, between a system and its environment). Moreover, this approach also models appropriately the realization of functional computation. More speculatively, an intensional model of computation (i.e., one that reflects some properties of the process of computation) could perhaps also be used to model computation-related aspects like computational complexity.

These observations, beginning in 1992 with Abramsky and Jagadeesan (1992) (and independently Hyland and Ong (1992)), led to the construction of very satisfactory game-semantical models for linear logic (a resource-sensitive logic introduced in Girard (1987); another model had been given in Blass (1992), but with non-associative composition). These models were intensional in nature: thus the usual completeness results, stating that provability of a formula is reflected in the model, were strengthened to “full completeness” results where each proof is itself represented. Another games model for linear logic was given in Lamport (1994), while the ones in Lafont and Streicher (1991) or Mey (1994) (the latter for predicate logic without contractions) are not intensional.

Subsequently, this led in 1993 to the development of intensional game-theoretical models in the semantics of programming languages, independently by Abramsky et al. (1994), Hyland and Ong (2000) and Nickau (1996). These models proved to be very useful and provided, e.g., a solution for what is probably the best-known open problem in the semantics of programming languages, the “Full Abstractness Problem” for the programming language PCF (Plotkin 1977), by giving the first syntax-independent fully abstract model. PCF is a higher-order functional programming language that essentially is a fragment of any programming language with higher-order procedures (for instance any sufficiently expressive object-oriented programming language). Precursors to these game-theoretical models can be seen in Joyal (1977), where for the first time a category of games is defined, in the work of Kleene on recursive functionals, and in the work of Berry and Curien on sequential algorithms (Berry and Curien 1982).

In another line of research, game-semantical methods have so far had a number of other applications, including in Abramsky and Jagadeesan (1994) an alternative realization of the “Geometry of interaction” program (initiated in Girard (1989) and developed in a series of papers).

In this paper we would like to give an accessible introduction to the games model of PCF while concentrating on motivating the basic intuitions and putting them into context.

2. GAMES – INFORMAL DEVELOPMENT

Game theory was founded at the beginning of the twentieth century with works by Zermelo, Borel and von Neumann on parlour games. In the 1950’s John Nash made his famous contributions to non-cooperative game theory and to bargaining theory, for which he later received the Nobel Prize in Economics (together with J. C. Harsanyi and R. Selten). The use of game-theoretical methods in logic (the best-known example being the Ehrenfeucht-Fraïssé games used in (finite) model theory) also originated at the beginning of that century and is still a strong field of research. Similarly, game-theoretical methods have been used in models of concurrency/reactive systems (for an introduction to the latter field, cf. Merz 2002), the modeling of interactive protocols, natural language semantics (Hintikka games (Hintikka and Sandu 1997)) etc.

2.1. Lorenzen Games

The use of game theory in the semantics of programming languages is based upon work done by P. Lorenzen in the 1950’s on “dialogue games” (Lorenzen 1960; for a survey, cf. Felscher 1986). However, we will here consider a slight variation of the games considered there in order to make the connection to Game Semantics in the following subsection more explicit. There a sentence of the propositional calculus (i.e., a formula inductively constructed from atomic propositions p using the connectives ∨, ∧, ⇒, ¬) is interpreted via a two-player game between the “Proponent” trying to prove an assertion and the “Opponent” trying to disprove it. This is done recursively on the structure of the given formula. It can be done both for intuitionistic logic (in which essentially the law of the excluded middle is not required to hold) and for classical logic, and we will start by giving the rules for the former.

To make this more formal, we first need to be a little more precise: The players are called “1” and “2”, and at each point in the game, each of them can either attack or defend the (sub-)formula under consideration at that point. Thus the possible moves are: A player (1 or 2) can

• assert a formula (e.g., A ∨ B) or
• attack a (previously asserted) formula (in the notation employed below this will be denoted by a “?” under the attacked formula).

A play of a game then is a sequence of moves made in turns by the two players 1 and 2 according to the following rules.

• 1 starts by asserting a formula and then it is 2’s turn to move as the “attacker”.
• If the player whose turn it is to move is currently in the attacker role he can attack the formula φ asserted by the other player in the preceding move in the following way:

  − If the currently attacked formula is of the form A ∧ B he can attack one of the subformulas A or B (and moreover, he can later attack the other not yet attacked subformula).
  − If the currently attacked formula is of the form A ∨ B, A ⇒ B or ¬A then he can simply attack the whole formula.

• If in the preceding move one of the players attacked the formula φ, the other one can now make the following moves (in “defense” of φ) depending on the structure of φ:

  atomic: If φ is an atomic formula, this depends on which player currently is to move: 2 can assert φ, but 1 can only assert φ if 2 has previously asserted it.
  A ∧ B: He can simply assert the whole formula A ∧ B.
  A ∨ B: He can either assert A or B (under the proviso of the previous case if the chosen formula is atomic). (Note, however, that in this intuitionistic case he does not later have the option to also assert the other disjunct!)
  A ⇒ B: He can attack A. Instead (or additionally, at a later point of the game) he can assert B (under the above proviso).
  ¬A: He can attack A.

• If in each of the subplays the player in turn cannot move then the play stops. If there is an attack that could not be answered, this can only be an attack by 2, since 2 can assert atomic formulae ad libitum, and then 2 is the winner, and otherwise 1 is.

Note that the asymmetry of the rules with respect to atomic propositions comes from the fact that in order to show that a formula is valid (semantically), one has to show that it evaluates to true for any possible valuation of the atomic propositions involved.

These rules are pictured schematically in the following table:

1  A ∨ B                      1  A ∧ B
2  ?                          2  attack A or B
1  choose A or B              1  defend chosen formula
2  attack chosen formula

1  A ⇒ B                      1  ¬A
2  ?                          2  ?
1  attack A or defend B       1  attack A

A player has won a single play of the game corresponding to a formula if he made the last move that is allowed according to the above rules (i.e., which is “legal”). A strategy for a player p (say 1) is a function that assigns to every sequence of legal moves ending with a move of the other player (2) a move of his own (i.e., of 1). Thus one can obtain a play of a game by playing a strategy for 1 off against a strategy for 2. A strategy for p is said to be a winning strategy if p wins every play obtained by playing it off against a strategy of the other player. It then follows that winning strategies for 1 asserting a formula φ correspond exactly to proofs of φ in intuitionistic logic (in short we have the slogans “propositions-as-assertion-moves” and “proofs-as-winning-strategies”).

For illustration we present proofs via games for two formulae (here and in the following we present games by showing a typical run instead of all possible runs to increase readability):

Modus Ponens:                   Identity:
1  ((A ⇒ B) ∧ A) ⇒ B            1  A ⇒ A
2  ?                            2  ?
1  ?                            1  ?
2  ?                            2  A
1  ?                            1  A
2  A
1  A
2  B
1  B

Let us go through the game for the first formula in detail: Here 1 starts by asserting the formula φ = ((A ⇒ B) ∧ A) ⇒ B. By the rules given above and the structure of φ the only move 2 can then make is to attack the whole formula. 1 can then defend φ by either attacking the premiss (A ⇒ B) ∧ A (and possibly asserting the conclusion B later), or by asserting B immediately. According to the strategy represented by the above diagram she chooses the first option, which means that she must actually attack one of the conjuncts A ⇒ B and A, and so she attacks the first (only to attack the second later; in fact she would also succeed by doing it the other way around). Now 2 can either attack the premiss A, or simply assert B (note that the latter option would not exist for 1 by the above rules). In the latter case 1 would immediately have the winning move to assert the conclusion B of φ (which is then possible because 2 has asserted it first). Thus in this play 2 chooses instead to attack A. Now 1 makes use of her still existing option to attack the other conjunct A (that she has not attacked before) of the conjunction (A ⇒ B) ∧ A that has previously been under her attack. 2 can only defend A by simply asserting it. But then 1 is also allowed to defend A as the premiss of A ⇒ B by asserting it. The only option left to 2 is to finish off her defense of A ⇒ B by asserting B. But then again 1 may assert B as the conclusion of φ. Since there is no move left for 2, 1 wins this play, and in fact from the explanations given above it is clear that 1 has a winning strategy for this game. Thus we have given a proof of φ in intuitionistic logic.
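As a small illustration, the run for Modus Ponens can be replayed mechanically. The following Python sketch is only a partial check under simplifying assumptions: it verifies that the players alternate and that 1 asserts an atomic formula only after 2 has asserted it, and it does not implement the attack and defense rules for the connectives. The move encoding and all names are hypothetical.

```python
ATOMS = {"A", "B"}

# The run for ((A ⇒ B) ∧ A) ⇒ B, encoded as (player, kind, formula).
MODUS_PONENS_PLAY = [
    (1, "assert", "((A ⇒ B) ∧ A) ⇒ B"),
    (2, "attack", "((A ⇒ B) ∧ A) ⇒ B"),
    (1, "attack", "A ⇒ B"),
    (2, "attack", "A"),
    (1, "attack", "A"),
    (2, "assert", "A"),
    (1, "assert", "A"),
    (2, "assert", "B"),
    (1, "assert", "B"),
]


def respects_alternation_and_atom_rule(play):
    """Check that 1 moves first, that the players alternate, and that 1 asserts
    an atomic formula only if 2 has asserted it earlier in the play."""
    asserted_by_2 = set()
    for index, (player, kind, formula) in enumerate(play):
        expected_player = 1 if index % 2 == 0 else 2
        if player != expected_player:
            return False
        if kind == "assert" and formula in ATOMS:
            if player == 2:
                asserted_by_2.add(formula)
            elif formula not in asserted_by_2:
                return False
    return True


def winner(complete_play):
    """The player who made the last move of a complete play wins it."""
    return complete_play[-1][0]
```

On the run above the check succeeds and the last move belongs to 1, in line with the walk-through.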

We obtain a representation of proofs of classical logic if the above rules are weakened so that not only can both conjuncts of a conjunction subsequently be attacked, but also both disjuncts of a disjunction subsequently be defended:

1  A ∨ ¬A
2  ?
1  ¬A
2  ?
1  ?
2  A
1  A

2.2. Game Semantics

To lead over from Lorenzen Games to games in the semantics of programming languages we can make use of the Curry-Howard-Isomorphism. The idea is to view “propositions as types” and proofs of a proposition A as terms of type A in the λ-calculus (thus a proposition is interpreted as valid iff the corresponding type is inhabited). Recalling the above slogans this gives us “assertions-as-types” and “winning-strategies-as-terms” (note that for a strategy of the Proponent to be winning means that the corresponding term denotes a total function). So for example the natural number 3 ∈ N is represented by the following strategy:

1  N
2  q
1  3

Here we change our notation slightly to indicate the change of perspective: Firstly, instead of ? we write q. This is now interpreted as a request by the environment for an element of N. The difference to the preceding situation is that here our types are usually inhabited (while there the propositions were not always valid) and so the attention is turned from provability (the existence of some proof) to a specific proof. This we indicate by naming the proof (i.e., the element representing it; 3 in the above example) instead of simply asserting the proposition. Because in this setting the interpretation of the first move (which asserts the type of the game) differs fundamentally from that of the others, we will not consider it as a move here, so that the game starts with a question by 2 (this player is here renamed to E for “environment”, while 1 becomes S for “system”).

The interpretation of a question and its corresponding answer is then the delivery of the requested data by the system to the environment. This has the consequence that a player can repeatedly “attack” the same →-connective (which was not possible in Lorenzen Games) in order to get different inputs (see the examples below).

Here “system” and “environment” can take several interpretations: for instance, a computer system and its user, a computer and the other computers of its network, or in the program text a term and its context. The explicit distinction between system and environment from the beginning is an important difference to most other process models. (In CSP, for example, one does have two different operators for internal and external choice. On the other hand, Hoare takes the view that: “In choosing an alphabet, there is no need to make a distinction between events which are initiated by the object (perhaps choc) and those which are initiated by some agent outside the object (for example, coin). The avoidance of the concept of causality leads to considerable simplification in the theory and its application” (Hoare 1985, 24).) Note also that a value (that in standard denotational semantics is atomic) is here represented by an interactive process (“splitting the atom of computation”).

Under the Curry-Howard-Isomorphism, ⇒, ∧, ∨, true, false correspond to the function space →, the cartesian product ×, the disjoint sum +, the singleton 1 and the empty set ∅ respectively (and so we will adopt the latter notation). The above example Identity here instantiates to the identity function of type A → A (represented by the “copycat-strategy”):

      A  →  A
E           q
S     q
E     a
S           a

(Note that for clarity we write the request under the corresponding type, and not under the connective → as in Lorenzen Games.) Similarly, Modus Ponens corresponds to function application.

To give a few more examples:

Addition:

      N  →  (N  →  N)
E                  q
S     q
E     n
S            q
E            m
S                  n + m

λf : N → N. if f(0) = 0 then 1 else 0:

     (N  →  N)  →  N
E                  q
S           q
E     q
S     0
E           n            (n = 0 or n > 0)
S                  1 (if n = 0) or 0 (if n > 0)

The first example is a particular run of the strategy that, after being requested an output by the environment, itself requests the two input arguments, and then returns their sum. Note that the same function could be modelled by a different strategy, namely by the one that takes the arguments in the reverse order. This illustrates the intensionality of this model, and it in fact models the situation correctly also if one takes into account features of actual computation like store and control (see below). The second one is an example of a higher-order function: it pictures a strategy for the function

g := (λf : N → N. if f(0) = 0 then 1 else 0) : (N → N) → N

that takes a function f : N → N and returns 1 if the function has value 0 at input 0, otherwise 0. Note that g receives its input not “all at once” (as a first-order function would receive its input, e.g., a natural number), but in a “demand-driven” fashion. This is in accordance with the general way of representation by (finitary) interaction and is necessary to satisfactorily model programming languages, since on a computer one cannot deal directly with infinite objects (like functions with infinite domain), but only indirectly through a finite representation by a term or (as here) finite (but arbitrary) portions of it. Thus after being requested an output from the environment, g requests a value from its input f. In this run of the strategy, f in turn demands from g a value (its argument) and receives 0, whereupon it delivers its value at 0. If this value equals 0, g returns 1, otherwise 0 to the initial request (for simplification these two cases are depicted in the same diagram, so in fact the diagram represents two possible runs: the first one is obtained by substituting the first instances n = 0 and 1 in the last two lines, and the second one by using the second cases n > 0 and 0).
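The demand-driven behaviour of g can be sketched in Python with a coroutine. This is an illustration under simplifying assumptions, not the formal model: the environment's function f is an ordinary Python function that is assumed to ask for its argument exactly once per call, and the names g_strategy and play are hypothetical.

```python
def g_strategy():
    """Strategy for g = λf. if f(0) = 0 then 1 else 0, written as a coroutine.

    Each `yield x` is a request to the environment for the value f(x);
    the value sent back into the coroutine is the environment's answer.
    """
    value_at_zero = yield 0
    return 1 if value_at_zero == 0 else 0


def play(strategy, f):
    """Play `strategy` against the environment function f.

    Returns the final answer together with the trace of moves, each move
    being a pair (player, content) with player "E" or "S".
    """
    trace = [("E", "q")]                 # E requests an output
    run = strategy()
    try:
        argument = next(run)             # S wants a value of f ...
        while True:
            trace.append(("S", "q"))     # ... and asks f for its output
            trace.append(("E", "q"))     # f asks for its argument
            trace.append(("S", argument))
            answer = f(argument)
            trace.append(("E", answer))  # f delivers its value
            argument = run.send(answer)
    except StopIteration as stop:
        trace.append(("S", stop.value))  # S answers the initial request
        return stop.value, trace
```

Playing g_strategy against the identity function produces the six moves of the first run in the diagram; playing it against a function with nonzero value at 0 produces the second.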
Note that one can also model non-strict functions (a function is non-strict if it delivers a defined output even for an undefined input, in the domain-theoretic sense): The following function delivers 3 without looking at its input:

      N  →  N
E           q
S           3

By the above remarks about inputs to higher-order functions, one often requires several interactions between the function and its argument, as in the following example of the function λf.f(0) + f(1) : (N → N) → N that takes a function f : N → N as input and outputs the value f(0) + f(1) ∈ N (note that here we deviate from Lorenzen Games by allowing the same connective to be “attacked” twice, as indicated above):

     (N  →  N)  →  N
E                  q
S           q
E     q
S     0
E           n
S           q
E     q
S     1
E           m
S                  n + m

These interactions can also be nested in each other, as in the following example of the function λf.f(f(3)) : (N → N) → N that takes a function f : N → N as input and produces the value f(f(3)) ∈ N (where we introduce pointers to indicate which question the provided data refer to; this concept is defined more precisely in the next section):

     (N  →  N)  →  N
E                  q
S           q
E     q
S           q
E     q
S     3
E           n
S     n
E           m
S                  m

[Justification pointers of the original diagram not reproduced.]

Whereas in the example above one would not really need the pointers, since one could instead adopt the convention that each delivered datum refers to the “pending” question (this is the “well-bracketing” condition defined below), there are more complicated examples where this is not possible: For example, λf.f(λx.f(λy.y)) and λf.f(λx.f(λy.x)) would be identified without the use of pointers.

In modelling systems it is conceptually nice (and for more complicated systems even required in order to make modelling feasible) to model the different components and then obtain a model of the whole by putting together the models of the parts. In order to do this one needs to be able to compose the strategy representing one component (which in the above system/environment-distinction takes on the role of the “system”) with the strategy representing the joint behaviour of the other components (the “environment” of the former component).

To visualize composition of (i.e., interaction between) strategies, consider the following example of the composition of λn.(n, 3) : N → N × N (that maps a value n to (n, 3)) followed by the (uncurried) addition + : N × N → N:

          N   →   N  ×  N   →   N
                                q       Er
El               (q)                    Sr
Sl        q
El        n
Sl               (n)                    Er
El                     (q)              Sr
Sl                     (3)              Er
                              n + 3     Sr

The interaction takes place in the following way: The play starts in the right game with the question from Er in N. According to the strategy for + this prompts a question from Sr in the left factor of N × N. Now any question by Sr in the domain of the strategy on the right (and thus in the codomain of the strategy on the left) is in the game on the left interpreted as a move by El. This corresponds very nicely to the intuitive fact that every component of a whole is part of the environment for any other component. Now according to the strategy for λn.(n, 3), since El asks for the left factor of N × N, this results in a demand of input in N by Sl, and the response n by El is copied to the left factor of N × N. There it is interpreted as a move by Er and the game continues similarly. In the end, the strategy resulting from the composition is obtained by “hiding” the moves that are no longer in interaction with the overall environment (the ones in brackets in the middle). As expected, the corresponding function is λn.n + 3.
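The interaction just described, including the hiding of the middle moves, can be mimicked by a short Python sketch. It rests on simplifying assumptions that are not part of the formal model: a strategy of type A → B is a function that receives a callback for questioning its input and the question asked about its output, and the factor names "left" and "right" as well as all function names are hypothetical.

```python
def pair_with_three(ask_input, question):
    """Strategy for λn.(n, 3) : N → N × N.

    `question` names the requested factor ("left" or "right");
    `ask_input("q")` asks this strategy's own environment for n.
    """
    if question == "left":
        return ask_input("q")   # copy the input n to the left factor
    return 3


def addition(ask_input, question):
    """Strategy for (uncurried) + : N × N → N; asks for both factors in turn."""
    return ask_input("left") + ask_input("right")


def compose(sigma, tau, hidden):
    """Return the composite of sigma followed by tau.

    Moves in the shared middle game are appended to `hidden` instead of
    being shown to the overall environment.
    """
    def composite(ask_input, question):
        def ask_middle(middle_question):
            hidden.append(middle_question)   # a question by Sr, seen as El
            answer = sigma(ask_input, middle_question)
            hidden.append(answer)            # an answer by Sl, seen as Er
            return answer
        return tau(ask_middle, question)
    return composite


def run_on(strategy, n):
    """Play a strategy of type N → N against an environment supplying n."""
    visible = ["q"]                          # the initial question by E
    def ask_input(question):
        visible.append(question)             # S questions its input
        visible.append(n)                    # E answers with n
        return n
    result = strategy(ask_input, "q")
    visible.append(result)                   # S answers the initial question
    return result, visible
```

For example, with hidden = [] and plus_three = compose(pair_with_three, addition, hidden), the call run_on(plus_three, 4) yields the result 7; the visible moves are those of a strategy for λn.n + 3, while the four bracketed moves of the diagram end up in hidden.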

NOTE 1.

• Note that as with composition of functions one can only compose strategies with matching types: to form σ ; τ, we must have σ of the type A → B and τ of the type B → C for suitable A, B, C.
• In the setting of Lorenzen Games, composition of strategies gives us a natural proof for the transitivity of ⇒.
• Obviously there is a close relationship of the composition of strategies to the “parallel composition + hiding” in the process algebra CSP (because of the way our strategies are typed it is here possible to put these two constructions together without losing associativity; with the same idea one can also construct typed processes, cf., e.g., Abramsky (1996); Jürjens (1999b)).

• In addition to associativity we also have (partial) neutral elements wrt. composition (given by the identity strategies as presented above), so in fact we can form a category (see below).

As a special case of composition we get the application of strategies to their input: Define I to be the empty game with no moves (and one strategy, namely the one that does nothing). Then we can represent, e.g., the element (3, 5) ∈ N × N by the strategy

I  →  N  ×  N
      q
      3
            q
            5

and so 3 + 5 becomes in fact 8:

          I   →   N  ×  N   →   N
                                q       Er
El               (q)                    Sr
Sl               (3)                    Er
El                     (q)              Sr
Sl                     (5)              Er
                                8       Sr

With game semantics one can also model the key ingredients of imperative languages, namely commands and store, and furthermore one can define control operators that allow early escape from function evaluation. One of the nicest features of game semantics is that the abilities to use store resp. control correspond exactly to different kinds of internal properties of the strategies involved (this will be made more precise below).

3. DEFINITIONS

To put the above intuitive examples on a more solid foundation, we will now provide the underlying definitions. They appeared in McCusker (1998) and are essentially an adaptation of Hyland and Ong (2000), taking account of ideas in Abramsky et al. (2000).

3.1. Games and Strategies

DEFINITION 1. An arena is a structure A = (MA, λA, ⊢A) consisting of

• a set of moves MA,
• the labelling function λA : MA → {S, E} × {Q, A} (call moves labelled (S, l) resp. (E, l), for l ∈ {Q, A}, “S-moves” resp. “E-moves”, and moves labelled (l, Q) resp. (l, A), for l ∈ {E, S}, “questions” resp. “answers”) and
• the enabling relation ⊢A ⊆ (MA + {ι}) × MA (with ι ∉ MA; say m enables n if m ⊢A n; the idea is that during a play moves can be made only when they are enabled by earlier moves). Call a move that is enabled by ι “initial”.

subject to the following conditions:

• Initial moves are E-questions, and they are not enabled by any other moves besides ι.
• Answers can only be enabled by questions.
• Enabling alternates between E-moves and S-moves (i.e., an E-move can only enable an S-move and vice versa).
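The three conditions of Definition 1 are easy to check mechanically on a finite arena. The following Python sketch is a hedged illustration: the encoding of moves, labels and the enabling relation, the marker INITIAL standing for ι, and the finite fragment of the arena for N are all assumptions made for the example.

```python
INITIAL = "iota"  # stands for ι, which is not itself a move


def is_arena(moves, label, enables):
    """Check the arena conditions on a finite structure.

    moves:   set of move names
    label:   dict move -> (player, kind), player in {"S", "E"}, kind in {"Q", "A"}
    enables: set of pairs (m, n) with m in moves or m == INITIAL, and n in moves
    """
    for m, n in enables:
        if n not in moves or (m != INITIAL and m not in moves):
            return False
        if m == INITIAL:
            # Initial moves are E-questions without any other enabler.
            if label[n] != ("E", "Q"):
                return False
            if any(k != INITIAL for k, target in enables if target == n):
                return False
        else:
            # Answers can only be enabled by questions.
            if label[n][1] == "A" and label[m][1] != "Q":
                return False
            # Enabling alternates between E-moves and S-moves.
            if label[m][0] == label[n][0]:
                return False
    return True


# A finite fragment of the arena for N: one question q, answers 0, ..., 9.
NAT_MOVES = {"q"} | set(range(10))
NAT_LABEL = {"q": ("E", "Q"), **{n: ("S", "A") for n in range(10)}}
NAT_ENABLES = {(INITIAL, "q")} | {("q", n) for n in range(10)}
```

On this fragment is_arena(NAT_MOVES, NAT_LABEL, NAT_ENABLES) holds, while relabelling q as an S-question or letting one numeral enable another violates one of the conditions.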

DEFINITION 2.

• A justified sequence is a sequence s of moves together with a justification pointer from every non-initial move m to a move n earlier in s such that n ⊢A m. We say that (this occurrence of) the move n justifies m and write this as n · t · m with a pointer from m back to n (where · denotes concatenation, and supposing that t is the subsequence of moves between n and m). Note that justified sequences always start with E-questions.
• For a justified sequence s, we define the system view ⌈s⌉ and the environment view ⌊s⌋ of s by induction on the length of s:

⌈ε⌉ = ε.
⌈s · m⌉ = ⌈s⌉ · m, if m is an S-move.
⌈s · m⌉ = m, if m is initial.
⌈s · m · t · n⌉ = ⌈s⌉ · m · n, if n is an E-move justified by m.

⌊ε⌋ = ε.
⌊s · m⌋ = ⌊s⌋ · m, if m is an E-move.
⌊s · m · t · n⌋ = ⌊s⌋ · m · n, if n is an S-move justified by m.

• A justified sequence s is a legal position if

  − players alternate (if s = s1 · m · n · s2 and m is an E-move, then n is an S-move and vice versa) and
  − for any prefix t · m of s: if m is an S-move, then its justifier is in ⌈t⌉, and if m is a non-initial E-move, then its justifier is in ⌊t⌋.

Write LA for the set of legal positions of A.
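The inductive definition of the two views can be read as a simple backwards traversal of the sequence. The following is a minimal sketch in Python, not taken from the cited sources: the tuple representation of moves, the function name `view` and the sample play are assumptions made for illustration, and the sketch returns only the move names of the view, dropping the pointers.

```python
from typing import List, Optional, Tuple

# One move occurrence: (name, player, justifier). `player` is "E" or "S";
# `justifier` is the index of the justifying occurrence, or None if initial.
Move = Tuple[str, str, Optional[int]]


def view(seq: List[Move], owner: str) -> List[str]:
    """Names of the moves in the owner's view of seq (owner is "S" or "E")."""
    kept: List[str] = []
    i = len(seq) - 1
    while i >= 0:
        name, player, justifier = seq[i]
        kept.append(name)
        if player == owner:
            i -= 1          # own move: keep it and continue with the prefix
        elif justifier is None:
            break           # initial E-move in the system view: stop here
        else:
            i = justifier   # other player's move: jump back to its justifier
    return kept[::-1]


# Hypothetical play: E asks q1, S asks q2, E asks q4, S answers 0, E asks q3.
play = [("q1", "E", None), ("q2", "S", 0), ("q4", "E", 1),
        ("0", "S", 2), ("q3", "E", 1)]
system_view = view(play, "S")        # hides q4 and its answer
environment_view = view(play, "E")   # here the whole play
```

In the sample play the last E-move q3 points back to q2, so the system view skips the intermediate exchange q4 · 0, while the environment view keeps every move.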

DEFINITION 3.

• Let m be a move in a legal position s. We say that m is hereditarily justified by an occurrence of a move n in s if there is a subsequence of s starting with n and ending in m such that every move is justified by the preceding move in it. For a set I of (occurrences of) initial moves we write $s \upharpoonright I$ for the subsequence of s consisting of the moves hereditarily justified by a move of I.
• A game is a structure $A = (M_A, \lambda_A, \vdash_A, P_A)$ where

− $(M_A, \lambda_A, \vdash_A)$ is an arena and
− $P_A$ is a non-empty, prefix-closed subset of $L_A$ called the valid positions such that for $s \in P_A$ and I a set of initial moves of s we have $s \upharpoonright I \in P_A$.

• A (deterministic) strategy σ for a game A is a non-empty set of even-length positions from $P_A$ satisfying

− $s \cdot a \cdot b \in \sigma \Rightarrow s \in \sigma$ and
− $s \cdot a \cdot b,\ s \cdot a \cdot c \in \sigma \Rightarrow b = c$ and b and c have the same justifier (determinacy condition).
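The two closure conditions on strategies can be checked mechanically for a finite set of positions. The sketch below is a simplification under stated assumptions: positions are plain tuples of move names, justification pointers and membership in the valid positions are ignored, and the name `is_strategy` is hypothetical.

```python
from typing import Dict, Set, Tuple

Position = Tuple[str, ...]


def is_strategy(sigma: Set[Position]) -> bool:
    """Check non-emptiness, even length, even-prefix closure and determinacy."""
    if not sigma:
        return False
    if any(len(s) % 2 != 0 for s in sigma):
        return False
    # s.a.b in sigma implies s in sigma
    if any(len(s) >= 2 and s[:-2] not in sigma for s in sigma):
        return False
    # s.a.b and s.a.c in sigma implies b == c
    response: Dict[Position, str] = {}
    for s in sigma:
        if len(s) >= 2 and response.setdefault(s[:-1], s[-1]) != s[-1]:
            return False
    return True


answers_three = {(), ("q", "3")}               # always answers 3
undecided = {(), ("q", "3"), ("q", "4")}       # two responses to q
```

Here `answers_three` passes all checks, whereas `undecided` fails the determinacy condition because the position q has two different extensions.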

3.2. Composition of Strategies

Now we would like to model compositionality. It is convenient to do this in the framework of category theory, because that way we can make use of already existing results on models of PCF (or linear logic). A category consists of objects and morphisms. The objects of our category will be the games. To define the notion of morphism we first need to consider a construction on games:

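DEFINITION 4. Given games A and B, the game $A \multimap B$ is defined as follows ($A \multimap B$, as opposed to $A \to B$, is the usual notation for the morphism sets in models of linear logic):

\[
\begin{aligned}
M_{A \multimap B} &= M_A + M_B \\
\lambda_{A \multimap B} &= [\bar{\lambda}_A, \lambda_B] \\
\iota \vdash_{A \multimap B} m &\iff \iota \vdash_B m \\
m \vdash_{A \multimap B} n &\iff m \vdash_A n \ \lor\ m \vdash_B n \ \lor\ [\iota \vdash_B m \land \iota \vdash_A n] \quad \text{for } m \neq \iota \\
P_{A \multimap B} &= \{ s \in L_{A \multimap B} : s \upharpoonright A \in P_A \land s \upharpoonright B \in P_B \}
\end{aligned}
\]

(where $\bar{\lambda}_A$ means $\lambda_A$ with the S/E-labels inverted and $s \upharpoonright A$ is the subsequence of s consisting of moves from $M_A$).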

Now a morphism from a game A to a game B is a strategy on $A \multimap B$. After some auxiliary definitions we will give the definition of composition of strategies:

• For a sequence u of moves from games A, B, C with justification pointers define $u \upharpoonright B,C$ to be the subsequence of u consisting of moves from B and C (removing pointers that point to moves from A). Similarly define $u \upharpoonright A,B$. u is an interaction sequence of A, B, C if $u \upharpoonright A,B \in P_{A \multimap B}$ and $u \upharpoonright B,C \in P_{B \multimap C}$. Write the set of all such sequences as int(A, B, C).
• Suppose $u \in \mathrm{int}(A, B, C)$. By definition of $\multimap$, a pointer from an A-move a can only point to a B-move b if b is initial and its pointer points to an initial C-move c. Define $u \upharpoonright A,C$ to be the subsequence of u consisting of the moves of A and C where in the mentioned case the pointer from a is changed to point to c.
• Given strategies $\sigma : A \multimap B$, $\tau : B \multimap C$, define

\[ \sigma \parallel \tau := \{ u \in \mathrm{int}(A, B, C) : u \upharpoonright A,B \in \sigma \land u \upharpoonright B,C \in \tau \} \]

and finally the composition of σ followed by τ to be

\[ \sigma ; \tau := \{ u \upharpoonright A,C : u \in \sigma \parallel \tau \}. \]

PROPOSITION 1. We obtain a category G whose objects are games and where the morphisms from A to B are strategies $\sigma : A \multimap B$ with composition as defined above and identities the copycat-strategies.

3.3. Restrictions on Strategies

In this section we will define certain restrictions on the sets of strategies that are needed for the game-semantical characterization of programming disciplines mentioned earlier.

DEFINITION 6. By determinacy of strategies we know that for $s \cdot a \cdot b,\ t \cdot a \in L_A$ (where $s \cdot a \cdot b$ has even length) with $\ulcorner s \cdot a \urcorner = \ulcorner t \cdot a \urcorner$, there is a unique extension $\mathrm{match}(s \cdot a \cdot b, t \cdot a)$ of $t \cdot a$ by b (with a justification pointer for b) such that $\ulcorner s \cdot a \cdot b \urcorner = \ulcorner \mathrm{match}(s \cdot a \cdot b, t \cdot a) \urcorner$. A strategy σ on A is called innocent iff in each such situation it satisfies

\[ s \cdot a \cdot b \in \sigma \ \land\ t \in \sigma \ \land\ t \cdot a \in P_A \ \land\ \ulcorner t \cdot a \urcorner = \ulcorner s \cdot a \urcorner \ \Rightarrow\ \mathrm{match}(s \cdot a \cdot b, t \cdot a) \in \sigma, \]

i.e., a move by S depends only on the S-view.

As an example of a non-innocent strategy consider the following function (strictly speaking, the following examples are strategies in the category C to be derived from G in the next subsection):

F := λf : N → (N → N). new x := 0 in f (if x = 0 then (x := 1; 0) else 1) (if x = 0 then (x := 1; 0) else 1)

Then we have

\[
F f = \begin{cases}
f\,0\,1, & \text{if } f \text{ asks for its first argument first,} \\
f\,1\,0, & \text{if } f \text{ asks for its second argument first.}
\end{cases}
\]

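The strategy for this function has the following two runs:

[Figure: two runs on the game (N ⇒ N ⇒ N) ⇒ N. In both runs E and S first exchange questions q; S then answers 0 to the first argument question asked by E and 1 to the second, and finally copies E’s answer (n in the first run, m in the second) to the initial question.]

This violates innocence: Since

\[ \ulcorner q_1 \cdot q_2 \cdot q_3 \urcorner = q_1 \cdot q_2 \cdot q_3 = \ulcorner q_1 \cdot q_2 \cdot q_4 \cdot 0 \cdot q_3 \urcorner, \]

an innocent S must do the same in both runs, whereas here S answers $q_3$ differently.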
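The dependence of F on the order in which f inspects its arguments can be reproduced in any language with local state. The following Python sketch is an illustration only: call-by-name arguments are modelled as zero-argument functions (thunks), f is taken uncurried, and the two sample functions `left_first` and `right_first` are hypothetical.

```python
from typing import Callable

Thunk = Callable[[], int]


def F(f: Callable[[Thunk, Thunk], int]) -> int:
    x = 0  # corresponds to "new x := 0"

    def arg() -> int:
        # if x = 0 then (x := 1; 0) else 1
        nonlocal x
        if x == 0:
            x = 1
            return 0
        return 1

    return f(arg, arg)  # the same stateful expression is passed twice


def left_first(a: Thunk, b: Thunk) -> int:
    u = a()              # first argument is evaluated first
    v = b()
    return 10 * u + v    # encodes the pair (u, v) as one number


def right_first(a: Thunk, b: Thunk) -> int:
    v = b()              # second argument is evaluated first
    u = a()
    return 10 * u + v
```

With these definitions `F(left_first)` behaves like f 0 1 and `F(right_first)` like f 1 0: whichever argument is asked for first receives 0, because the answer depends on the hidden variable x and not only on what is visible in the S-view.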

DEFINITION 7. A strategy σ is well-bracketed iff for each $s \cdot a \cdot b \in \sigma$ with b an answer, the justification pointers on $s \cdot a \cdot b$ have the form

\[ \cdots\, q \cdot q_1 \cdots a_1 \cdots q_n \cdots a_n \cdot b \]

where b is justified by q and each $a_i$ is justified by $q_i$ (with $a_n = a$ and where the $a_i$ are answers), i.e., S can answer only the most recent unanswered question in S’s view.

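A counter-example for the well-bracketing condition is provided by the control operator catch : (N → (N → N)) → N which is defined by

\[
\mathrm{catch}(f) = \begin{cases}
0, & \text{if } f \text{ calls its first argument first,} \\
1, & \text{if } f \text{ calls its second argument first,} \\
n + 2, & \text{if } f \text{ returns } n \text{ immediately.}
\end{cases}
\]

The following is a possible run for the corresponding strategy, where clearly the bracketing condition is violated:

[Figure: a run on the game (N ⇒ N ⇒ N) ⇒ N in which E asks the initial question q, S asks q, E asks q for an argument, and S immediately answers the initial question with 0.]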
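The early escape performed by catch can be imitated with exceptions. The sketch below is an illustration under assumptions, not the semantic definition: arguments are modelled as thunks, f is taken uncurried, and the exception class `_Escape` is a hypothetical helper.

```python
from typing import Callable

Thunk = Callable[[], int]


class _Escape(Exception):
    """Raised when f asks for one of its arguments (hypothetical helper)."""

    def __init__(self, index: int) -> None:
        super().__init__(index)
        self.index = index


def catch(f: Callable[[Thunk, Thunk], int]) -> int:
    def probe(index: int) -> Thunk:
        def thunk() -> int:
            raise _Escape(index)   # abandon the evaluation of f at once
        return thunk

    try:
        n = f(probe(0), probe(1))
    except _Escape as escape:
        return escape.index        # 0 or 1: which argument was asked first
    return n + 2                   # f returned n without using an argument
```

For example, `catch(lambda a, b: a() + b())` yields 0, since Python evaluates the left operand first, and `catch(lambda a, b: 5)` yields 7. The exception leaves the pending call of f unanswered, which mirrors the violated bracketing discipline.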

PROPOSITION 2. We obtain categories $G_i$, $G_b$ resp. $G_{ib}$ that are subcategories of G with the same objects and morphisms the innocent, well-bracketed resp. innocent and well-bracketed strategies.

3.4. Cartesian Closedness

Models of the lambda-calculus (and so in particular of PCF) are often given in the framework of Cartesian closed categories (ccc’s). Note that all four defined categories are autonomous, i.e., symmetric monoidal closed (and this structure is respected by the subcategory inclusions), via the following tensor product (and the unit $I = (\emptyset, \emptyset, \emptyset, \{\varepsilon\})$):

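DEFINITION 8. Given games A and B, the game $A \otimes B$ is defined as follows:

\[
\begin{aligned}
M_{A \otimes B} &= M_A + M_B \\
\lambda_{A \otimes B} &= [\lambda_A, \lambda_B] \\
\iota \vdash_{A \otimes B} m &\iff \iota \vdash_A m \ \lor\ \iota \vdash_B m \\
m \vdash_{A \otimes B} n &\iff m \vdash_A n \ \lor\ m \vdash_B n \\
P_{A \otimes B} &= \{ s \in L_{A \otimes B} : s \upharpoonright A \in P_A \land s \upharpoonright B \in P_B \}
\end{aligned}
\]

(where, as before, $s \upharpoonright A$ is the subsequence of s consisting of moves from $M_A$).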

We will make use of the autonomous structure in order to obtain cartesian closed categories out of the categories defined above using the Girard translation of intuitionistic logic into linear logic. First we will define the categorical product.

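DEFINITION 9. Given games A and B, the game $A \times B$ is defined as follows:

\[
\begin{aligned}
M_{A \times B} &= M_A + M_B \\
\lambda_{A \times B} &= [\lambda_A, \lambda_B] \\
\iota \vdash_{A \times B} m &\iff \iota \vdash_A m \ \lor\ \iota \vdash_B m \\
m \vdash_{A \times B} n &\iff m \vdash_A n \ \lor\ m \vdash_B n \\
P_{A \times B} &= \{ s \in L_{A \times B} : s \upharpoonright A \in P_A \land s \upharpoonright B = \varepsilon \} \\
&\quad \cup\ \{ s \in L_{A \times B} : s \upharpoonright B \in P_B \land s \upharpoonright A = \varepsilon \}.
\end{aligned}
\]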

The projections are the obvious copycat strategies.

It is straightforward to generalize the definition from the binary to the set-indexed case and to show that this actually gives a categorical product. To define the morphisms in the ccc’s to be constructed we need the exponential of a game:

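DEFINITION 10. Given a game A, the game !A is defined as follows:

\[
\begin{aligned}
M_{!A} &= M_A \\
\lambda_{!A} &= \lambda_A \\
\vdash_{!A} &= \ \vdash_A \\
P_{!A} &= \{ s \in L_{!A} \mid \text{for each initial move } m,\ s \upharpoonright m \in P_A \}.
\end{aligned}
\]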

Intuitively, !A stands for arbitrarily many copies of A. The use of this operator is necessitated by the fact that the λ-calculus, as opposed to linear logic, is not resource-sensitive. To define composition of morphisms σ : A → B in C (which will be strategies $!A \multimap B$ in G) with τ : B → C we will then need for each strategy $\sigma : {!A} \multimap B$ a “lifting” $\sigma^\dagger : {!A} \multimap {!B}$. This, however, can only be defined for a restricted class of games:

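DEFINITION 11. A game A is well-opened iff for all $s \cdot m \in P_A$ with m initial, $s = \varepsilon$. For $\sigma : {!A} \multimap B$ with well-opened games A, B define $\sigma^\dagger : {!A} \multimap {!B}$ by

\[ \sigma^\dagger = \{ s \in L_{!A \multimap !B} \mid \text{for all initial } m,\ s \upharpoonright m \in \sigma \}. \]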

One can show that for well-opened games this construction does not only preserve the property of being a strategy, but also that of being innocent and well-bracketed. Now we can construct a ccc from each of the categories defined above using the Girard translation:

DEFINITION 12. The category C has as objects well-opened games and as morphisms σ : A → B strategies for $!A \multimap B$. The composition σ ; τ : A → C of morphisms σ : A → B and τ : B → C is defined to be $\sigma^\dagger ; \tau$. The subcategories $C_i$, $C_b$ and $C_{ib}$ are defined by imposing restrictions analogously to the definitions above.

One can show that each of these four categories is cartesian closed and that this additional structure is respected by the inclusions. As usual in ccc’s let us write $(A \Rightarrow B) := ({!A} \multimap B)$, $\Lambda(f) : A \to (B \Rightarrow C)$ for the morphism obtained by currying $f : A \times B \to C$, and $\mathrm{ev} : (A \Rightarrow B) \times A \to B$ for the morphism obtained by uncurrying the identity on $A \Rightarrow B$. In fact there are conceptually very appealing factorization theorems that show that each strategy can be factored into an innocent (resp. well-bracketed) and a non-innocent (resp. non-well-bracketed) part.

4. FULLY ABSTRACT MODELS FOR PROGRAMMING LANGUAGES

In the following we will present the basic results about game-semantical models for programming languages. We start by defining the language in question.

4.1. The Language PCF

The programming language PCF is a call-by-name functional language with a base type of expressions denoting natural numbers and constants for arithmetic and recursion. Its syntax is that of an applied simply-typed λ-calculus (for a definition of λ-calculus cf. Matthes (2002)) with types given by the following grammar:

A ::= exp | A1 → A2.

Terms are defined as follows:

M ::= x | λx : A. M | M1 M2 | n | succ M | pred M | cond M1 M2 M3 | Y_A M

(where x is a variable and n a natural number). Typing judgements are made using the following rules:

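Variables:

\[ x_1 : A_1, \ldots, x_n : A_n \vdash x_i : A_i \qquad (i \in \{1, \ldots, n\}) \]

Functions:

\[ \frac{\Gamma, x : A \vdash M : B}{\Gamma \vdash \lambda x : A.\, M : A \to B}, \qquad \frac{\Gamma \vdash M : A \to B \quad \Gamma \vdash N : A}{\Gamma \vdash M N : B} \]

Arithmetic:

\[ \frac{}{\Gamma \vdash n : \mathrm{exp}}, \qquad \frac{\Gamma \vdash M : \mathrm{exp}}{\Gamma \vdash \mathrm{succ}\, M : \mathrm{exp}}, \qquad \frac{\Gamma \vdash M : \mathrm{exp}}{\Gamma \vdash \mathrm{pred}\, M : \mathrm{exp}} \]

Conditional and recursion:

\[ \frac{\Gamma \vdash M : \mathrm{exp} \quad \Gamma \vdash N_1 : \mathrm{exp} \quad \Gamma \vdash N_2 : \mathrm{exp}}{\Gamma \vdash \mathrm{cond}\, M\, N_1\, N_2 : \mathrm{exp}}, \qquad \frac{\Gamma \vdash M : A \to A}{\Gamma \vdash Y_A M : A} \]

The “big-step” operational semantics of PCF is given by a relation M ⇓ V (“M evaluates to V”) where M is a closed term (a term with no free variables, i.e., so that $\vdash M : A$ can be derived) and V is a canonical form defined by the following grammar: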

V ::= n | λx.M

This determines a partial function from closed terms of type exp to natural numbers in the following way (where M[N/x] is the capture-free substitution of the term N for the variable x in the term M):

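Canonical forms:

\[ \frac{}{V \Downarrow V} \]

Functions:

\[ \frac{M \Downarrow \lambda x. M' \quad M'[N/x] \Downarrow V}{M N \Downarrow V} \]

Arithmetic:

\[ \frac{M \Downarrow n}{\mathrm{succ}\, M \Downarrow n + 1}, \qquad \frac{M \Downarrow n + 1}{\mathrm{pred}\, M \Downarrow n}, \qquad \frac{M \Downarrow 0}{\mathrm{pred}\, M \Downarrow 0} \]

Conditional:

\[ \frac{M \Downarrow 0 \quad N_1 \Downarrow V}{\mathrm{cond}\, M\, N_1\, N_2 \Downarrow V}, \qquad \frac{M \Downarrow n + 1 \quad N_2 \Downarrow V}{\mathrm{cond}\, M\, N_1\, N_2 \Downarrow V} \]

Recursion:

\[ \frac{M (Y M) \Downarrow V}{Y M \Downarrow V}. \]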
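The evaluation rules translate almost line by line into a recursive interpreter. The following Python sketch is meant as an illustration only: terms are encoded as tagged tuples (an assumption of this sketch), type annotations on λ and Y are omitted, and since only closed terms are substituted the substitution needs no renaming. Divergent programs such as Y(λx.x) do not return, in line with the function being partial.

```python
def subst(term, x, n):
    """Replace the free occurrences of variable x in term by the closed term n."""
    tag = term[0]
    if tag == "var":
        return n if term[1] == x else term
    if tag == "num":
        return term
    if tag == "lam":
        _, y, body = term
        return term if y == x else ("lam", y, subst(body, x, n))
    # app, succ, pred, cond, Y: substitute in every subterm
    return (tag,) + tuple(subst(t, x, n) for t in term[1:])


def evaluate(term):
    """Big-step evaluation of a closed term to a canonical form."""
    tag = term[0]
    if tag in ("num", "lam"):                      # V evaluates to V
        return term
    if tag == "app":                               # call-by-name application
        _, x, body = evaluate(term[1])
        return evaluate(subst(body, x, term[2]))
    if tag == "succ":
        return ("num", evaluate(term[1])[1] + 1)
    if tag == "pred":                              # pred 0 evaluates to 0
        return ("num", max(evaluate(term[1])[1] - 1, 0))
    if tag == "cond":
        test = evaluate(term[1])[1]
        return evaluate(term[2] if test == 0 else term[3])
    if tag == "Y":                                 # Y M unfolds to M (Y M)
        return evaluate(("app", term[1], term))
    raise ValueError(f"not a closed term: {term!r}")


# add = Y (λf. λm. λn. cond m n (succ (f (pred m) n)))
add = ("Y", ("lam", "f", ("lam", "m", ("lam", "n",
       ("cond", ("var", "m"), ("var", "n"),
        ("succ", ("app", ("app", ("var", "f"), ("pred", ("var", "m"))),
                  ("var", "n"))))))))
two_plus_three = evaluate(("app", ("app", add, ("num", 2)), ("num", 3)))
```

The sample term `add` defines addition by recursion on its first argument; the arguments are passed unevaluated, as the call-by-name rule for application prescribes.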

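4.2. Game-semantical Characterization of Programming Disciplines

We will first give the usual interpretation of the simply-typed λ-calculus in a Cartesian closed category. For $\Gamma = x_1 : A_1, \ldots, x_n : A_n$ let us write $[\![\Gamma]\!] := [\![A_1]\!] \times \ldots \times [\![A_n]\!]$. Each type A is modelled by an object $[\![A]\!]$: Starting with the definition of $[\![\mathrm{exp}]\!]$ (see below), higher types are defined by $[\![A \to B]\!] = [\![A]\!] \Rightarrow [\![B]\!]$. A term $\Gamma \vdash M : A$ is modelled as a morphism $[\![\Gamma \vdash M : A]\!] : [\![\Gamma]\!] \to [\![A]\!]$:

Variables are interpreted by projections:

\[ [\![\Gamma \vdash x_i : A_i]\!] = \pi_i : [\![\Gamma]\!] \to [\![A_i]\!]. \]

Abstraction is modelled by currying:

\[ [\![\Gamma \vdash \lambda x : A.\, M : A \to B]\!] = \Lambda([\![\Gamma, x : A \vdash M : B]\!]) : [\![\Gamma]\!] \to [\![A]\!] \Rightarrow [\![B]\!]. \]

Application is interpreted via the evaluation map $\mathrm{ev} : (A \Rightarrow B) \times A \to B$:

\[ [\![\Gamma \vdash M N : B]\!] = \langle [\![\Gamma \vdash M : A \to B]\!], [\![\Gamma \vdash N : A]\!] \rangle ; \mathrm{ev}. \]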

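Thus to obtain a model for PCF in any of the four ccc’s defined above we are left to interpret the type exp and the term constants n, succ M, pred M, cond M N1 N2 and Y_A M: $[\![\mathrm{exp}]\!]$ is the flat game N of natural numbers:

\[
\begin{aligned}
M_N &= \{q\} \cup \{ n \mid n \in \omega \} \\
\lambda_N(q) &= (E, Q) \\
\lambda_N(n) &= (S, A) \quad \text{(for each } n\text{)} \\
\iota &\vdash_N q \\
q &\vdash_N n \quad \text{(for each } n\text{)} \\
P_N &= \{\varepsilon, q\} \cup \{ q \cdot n \mid n \in \omega \}
\end{aligned}
\]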
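The flat game is small enough to be written down concretely and checked against the three conditions on arenas from Definition 1. The Python sketch below makes several assumptions for illustration: the set of numbers is truncated at a finite bound, q is read as an E-question and each number as an S-answer, ι is represented by `None`, and the names `flat_arena` and `is_arena` are hypothetical.

```python
INIT = None  # stands for the extra element ι that enables initial moves


def flat_arena(bound):
    """Finite fragment of the flat arena: the question q and answers 0..bound-1."""
    answers = [str(n) for n in range(bound)]
    moves = {"q"} | set(answers)
    label = {"q": ("E", "Q")}
    label.update({a: ("S", "A") for a in answers})
    enables = {(INIT, "q")} | {("q", a) for a in answers}
    return moves, label, enables


def is_arena(moves, label, enables):
    """Check the three conditions on the enabling relation of an arena."""
    for m, n in enables:
        if m is INIT:
            # initial moves are E-questions enabled by nothing but ι
            if label[n] != ("E", "Q"):
                return False
            if any(k is not INIT and t == n for k, t in enables):
                return False
        else:
            # answers can only be enabled by questions
            if label[n][1] == "A" and label[m][1] != "Q":
                return False
            # enabling alternates between E-moves and S-moves
            if label[m][0] == label[n][0]:
                return False
    return True


moves, label, enables = flat_arena(5)
flat_ok = is_arena(moves, label, enables)
```

Changing the label of q to an S-question, for instance, makes the check fail, since initial moves must be E-questions.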

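The strategies for N are $\bot = \{\varepsilon\}$ and $\bar{n} = \{\varepsilon, q \cdot n\}$ for each n. The constant succ is interpreted as

\[ [\![\Gamma \vdash \mathrm{succ}\, M : \mathrm{exp}]\!] = ([\![\Gamma \vdash M]\!] ; s) : [\![\Gamma]\!] \to [\![\mathrm{exp}]\!] \]

using the morphism $s : [\![\mathrm{exp}]\!] \to [\![\mathrm{exp}]\!]$ represented by the following strategy:

[Figure: a play on $!N \multimap N$ in which E asks q on the right, S asks q on the left, E answers n on the left and S answers n + 1 on the right.]

The operation pred is defined similarly. The conditional is then defined as

\[ [\![\Gamma \vdash \mathrm{cond}\, M\, N_1\, N_2]\!] = \langle [\![\Gamma \vdash M]\!], [\![\Gamma \vdash N_1]\!], [\![\Gamma \vdash N_2]\!] \rangle ; c \]

using the morphism $c : N \times N \times N \to N$ represented (via the canonical isomorphism $!(N \times N \times N) \cong {!N} \otimes {!N} \otimes {!N}$) by the strategy whose two typical plays are depicted below:

[Figure: two plays on ${!N} \otimes {!N} \otimes {!N} \multimap N$. In the first, E asks q, S asks q in the first component and receives 0, then asks q in the second component, receives n and answers n. In the second, the first component answers some m ≠ 0, and S asks q in the third component, receives n and answers n.]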

Finally recursion is interpreted in the usual way making use of the fact that the ccc’s defined above are cpo-enriched.

One can then show that the categories $C_{ib}$, $C_b$ and $C_i$ (or more precisely, their quotients by an intrinsic preorder on the hom-sets) via the above interpretations give fully abstract models for the languages PCF, (a simplified version of) Idealized Algol and (a minor variant of) SPCF, respectively. Here Idealized Algol is viewed as an extension of PCF with the constructs of a basic imperative language and block-allocated variables. More precisely, we add the two base types com (for commands which alter the state and which can be composed sequentially) and var (for variables which store natural numbers, that are allocated using an operator new x in M and that can be written to and read from). SPCF is an extension of PCF by control operators. More precisely, this variant of it is obtained by adding to PCF a family of control operators $\mathrm{catch}_k$. Intuitively, $\mathrm{catch}_k\ x_1, \ldots, x_k$ in M terminates immediately when the term M tries to evaluate the variable $x_i$ and returns i − 1. If M delivers n without using any of the $x_i$, $\mathrm{catch}_k\ x_1, \ldots, x_k$ in M returns n + k.

Thus one obtains the following semantic characterization of programming disciplines (Abramsky and McCusker 1997, 1999b; Laird 1997; Abramsky et al. 1998) (the last case functional + store + control has not been published yet):

Constraints    Language
D+I+B          purely functional
D+I            functional + control
D+B            functional + store
D              functional + store + control

Here D stands for the subcategory of C with the same objects and the morphisms restricted by the determinacy condition (resp. I for the innocence and B for the well-bracketing condition).

5. FURTHER WORK

Further work has considerably extended the above results to include recursive types (McCusker 1998) and call-by-value (Abramsky and McCusker 1999b; Honda and Yoshida 1997). Thus at least in principle the main features of languages like Scheme or Core ML (except for the ability to test references for equality) have been taken care of. There has also been research towards nondeterminism (Harmer and McCusker 1999).

Very recently a new concurrent form of game semantics has been developed, resolving problems posed by the sequentiality of the traditional ones and giving a full completeness result for multiplicative-additive linear logic (Abramsky and Melliès 1999; Abramsky 1999b). Also, game semantics has been employed to develop a notion of “Process Realizability” (Abramsky 1999b). Applications of game semantics to reasoning about security issues can be found in Malacaria and Hankin (1999). Some of the work currently in progress addresses subtyping, and in another line of research, game-semantical ideas are being employed in specification and refinement in a way that takes account of program dynamics and the system/environment distinction (Abramsky 1999a; Jürjens 1999a). Further work will address semantics for object-oriented languages (Java) and logical principles for structuring protocols.

6. CONCLUSION

Since this paper was intended to be an elementary introduction to game semantics and just to convey the basic intuitions, many details had to be left out. For these the reader is referred to Abramsky (1997a) and Abramsky and McCusker (1999a).

ACKNOWLEDGEMENT

This work was supported by the Studienstiftung des deutschen Volkes. The author is very grateful to his supervisor, Prof. Samson Abramsky, for teaching him the subject of this introduction. Material from Abramsky and McCusker (1999a) and Abramsky (1997b; 1999c) was used extensively for this paper.

Furthermore the author would like to thank Benedikt Löwe and Florian Rudolph for organizing the Research Colloquium “Foundations of the Formal Sciences” (where the talk on which this paper is based was delivered), and R. Matthes, S. Merz and the other participants for interesting discussions. Further thanks go to Andreas Seidl for comments on the draft and the anonymous referee for insightful suggestions.

This work is dedicated to the author’s mother on the occasion of her birthday.

REFERENCES

Abramsky, S.: 1999, ‘Retracing Some Paths in Process Algebra’, in CONCUR ’96: Concurrency Theory (Pisa), Springer Verlag, Berlin, pp. 1–17.
Abramsky, S.: 1997a, ‘Semantics of Interaction: An Introduction to Game Semantics’, in A. Pitts and P. Dybjer (eds.), Semantics and Logics of Computation (Cambridge, 1995), Cambridge, pp. 1–31.
Abramsky, S.: 1997b, ‘Games in the Semantics of Programming Languages’, in P. Dekker, M. Stokhof and Y. Venema (eds.), Proceedings of the 11th Amsterdam Colloquium, ILLC, Department of Philosophy, University of Amsterdam 1997, pp. 1–6.
Abramsky, S.: 1999a, A note on Reactive Refinement.
Abramsky, S.: 1999b, ‘Process Realizability/Concurrent Games & Full Completeness of Linear Logic’, Lecture Notes for the Lectures at the Marktoberdorf Summer School.
Abramsky, S.: 1999c, ‘Game Semantics and Full Abstraction for Sequential Programming Languages’, Course at LFCS, University of Edinburgh.
Abramsky, S., Honda, K., and McCusker, G.: 1998, ‘A Fully Abstract Game Semantics for General References’, in Proceedings of the Thirteenth International Symposium on Logic in Computer Science, Computer Society Press of the IEEE, pp. 334–344.
Abramsky, S. and Jagadeesan, R.: 1992, ‘Games and Full Completeness for Multiplicative Linear Logic (extended abstract)’, in R. Shyamsunder (ed.), Foundations of Software Technology and Theoretical Computer Science (New Delhi, 1992), Berlin, pp. 291–301.
Abramsky, S. and Jagadeesan, R.: 1994, ‘Games and Full Completeness for Multiplicative Linear Logic’, Journal of Symbolic Logic 59, 543–574.
Abramsky, S., Jagadeesan, R., and Malacaria, P.: 1994, ‘Full Abstraction for PCF (extended abstract)’, in M. Hagiya and J. C. Mitchell (eds.), Theoretical Aspects of Computer Software (Sendai, 1994), Berlin, pp. 1–15.
Abramsky, S., Jagadeesan, R., and Malacaria, P.: 2000, ‘Full Abstraction for PCF’, Information and Computation 163, 409–470.
Abramsky, S. and McCusker, G.: 1997, ‘Linearity, Sharing and State: A Fully Abstract Game Semantics for IDEALIZED ALGOL with Active Expressions’, in P. O’Hearn and R. Tennent (eds.), ALGOL-like Languages, Volume 2, Boston, pp. 297–329.
Abramsky, S. and McCusker, G.: 1999a, ‘Game Semantics’, in H. Schwichtenberg and U. Berger (eds.), Logic and Computation: Proceedings of the 1997 Marktoberdorf Summer School, Berlin, pp. 1–55.
Abramsky, S. and McCusker, G.: 1999b, ‘Full Abstraction for Idealized Algol with Passive Expressions’, Theoretical Computer Science 227, 3–42.
Abramsky, S. and Melliès, P. A.: 1999, ‘Concurrent Games and Full Completeness’, in Proceedings of the Fourteenth International Symposium on Logic in Computer Science, Computer Society Press of the IEEE, pp. 431–442.
Berry, G. and Curien, P. L.: 1982, ‘Sequential Algorithms on Concrete Data Structures’, Theoretical Computer Science 20, 265–321.
Blass, A.: 1992, ‘A Game Semantics for Linear Logic’, Annals of Pure and Applied Logic 56, 183–220.
Felscher, W.: 1986, ‘Dialogues as a Foundation for Intuitionistic Logic’, in D. Gabbay and F. Guenthner (eds.), Handbook of Philosophical Logic, vol. III, D. Reidel Publishing Company, pp. 341–372.
Girard, J.-Y.: 1987, ‘Linear Logic’, Theoretical Computer Science 50, 1–101.
Girard, J.-Y.: 1989, ‘Towards a Geometry of Interaction’, in John W. Gray and Andre Scedrov (eds.), Categories in Computer Science and Logic, Proceedings of the AMS-IMS-SIAM joint summer research conference held June 14–20, 1987, University of Colorado, Boulder, with support from the National Science Foundation, Providence [Contemporary Mathematics 92], pp. 69–108.
Harmer, R. and McCusker, G.: 1999, ‘A Fully Abstract Game Semantics for Finite Nondeterminism’, in Proceedings of the Fourteenth International Symposium on Logic in Computer Science, Computer Society Press of the IEEE.
Hintikka, J. and Sandu, G.: 1997, ‘Game-theoretical Semantics’, in J. van Benthem (ed.), Handbook of Logic and Language, Elsevier Science.
Hoare, C. A. R.: 1985, Communicating Sequential Processes, Prentice-Hall International.
Honda, K. and Yoshida, N.: 1997, ‘Game Theoretic Analysis of Call-by-value Computation’, in P. Degano, R. Gorrieri and A. Marchetti-Spaccamela (eds.), Proceedings, 25th International Colloquium on Automata, Languages and Programming: ICALP ’97, Berlin [Lecture Notes in Computer Science 1256], pp. 225–236.
Hyland, J. M. E. and Ong, C. H. L.: 1993, ‘Fair Games and Full Completeness for Multiplicative Linear Logic without the Mix-Rule’, Unpublished Manuscript.
Hyland, J. M. E. and Ong, C. H. L.: 2000, ‘On Full Abstraction for PCF: I, II and III’, Information and Computation 163, 285–408.
Joyal, A.: 1977, ‘Remarques sur la Théorie des Jeux a Deux Personnes’, Gazette des Sciences Mathématiques du Quebec 1(4).
Jürjens, J.: 1999a, ‘Towards Reactive Refinement’, contributed talk at the Marktoberdorf Summer School.
Jürjens, J.: 1999b, ‘A Category of Processes, Specifications and Refinement’, talk at the workshop “Categorical Models of Concurrency”, Dresden, October 1999.
Lafont, Y. and Streicher, T.: 1991, ‘Game Semantics for Linear Logic’, in Proceedings of the Sixth International Symposium on Logic in Computer Science, Computer Society Press of the IEEE, pp. 43–50.
Laird, J.: 1997, ‘Full Abstraction for Functional Languages with Control’, in Proceedings of the Fourteenth International Symposium on Logic in Computer Science, Computer Society Press of the IEEE, pp. 58–67.
Lamarche, F.: 1994, ‘Sequentiality, Games and Linear Logic (Announcement)’, in Workshop on Categorical Logic in Computer Science.
Lorenzen, P.: 1960, ‘Logik und Agon’, in Atti del Congresso Internazionale di Filosofia, Sansoni, Firenze, pp. 187–194.
Malacaria, P. and Hankin, C.: 1999, ‘Non-deterministic Games and Program Analysis: An Application to Security’, in Proceedings of the Fourteenth International Symposium on Logic in Computer Science, Computer Society Press of the IEEE, pp. 443–452.
McCusker, G.: 1998, ‘Games and Full Abstraction for a Functional Metalanguage with Recursive Types’, Berlin [Distinguished Dissertations in Computer Science].
Matthes, R.: 2002, ‘Tarski’s Fixed-point Theorem and Higher-order Term Rewrite Systems’, this volume.
Merz, S.: 2002, ‘Model Checking and Beyond: On the Analysis of Reactive Systems’, this volume.
Mey, D.: 1994, ‘Finite Games for a Predicate Logic without Contractions’, Theoretical Computer Science 123, 341–349.
Nickau, H.: 1996, ‘Hereditarily Sequential Functionals: A Game-Theoretic Approach to Sequentiality’, Dissertation, Universität Gesamthochschule Siegen.
Plotkin, G. D.: 1977, ‘LCF Considered as a Programming Language’, Theoretical Computer Science 5, 223–255.

LFCS
Division of Informatics, University of Edinburgh, U.K.
E-mail: [email protected], http://www.jurjens.de/jan

ANTJE CHRISTENSEN

THE INCAN QUIPUS

ABSTRACT. Quipus, knotted structures of woollen or cotton cords, were used as a bureaucratic tool in the Inca state. In the absence of a writing system, numerals and possibly other pieces of information were encoded on the quipus by tying knots into elaborately structured coloured cords. Though interpretation of the quipu contents is far from complete, some information on Inca mathematics can be deduced from the analysis of ancient specimens, especially when combined with the results of anthropological and linguistic research in contemporary Andean societies. In this paper, the quipus are introduced, their structure is explained, and some results on mathematical concepts of the Incas are presented based on a comparison of mathematical and anthropological literature on the subject.

1. PURPOSE AND SCOPE OF THE PAPER

Understanding the quipus, the knot records of the Inca empire, is one of the major goals in Andean studies. It is pursued by ethnographers and anthropologists as well as historians and philosophers of mathematics. From the time of the Spanish conquest of the Andes to the present day, observers and scholars have contributed to the present body of knowledge on the subject. Yet a wide range of questions are still unanswered. The purpose of this paper is to introduce the subject with a special view to mathematics. It does not attempt to give an overview of the literature or the present state of research on the subject. It is mainly based on two sources. One is the present principal volume on quipus, Code of the Quipu by Marcia and Robert Ascher (1981). The second source is a recent work on the understanding of numbers and arithmetic in contemporary Andean societies, with a view to their pre-Hispanic precursors, The Social Life of Numbers by Gary Urton (1997). It contains an extensive list of literature on the subject of the quipus.

2. HISTORICAL BACKGROUND

The Inca culture existed from about 1400 to its extinction by Spanish conquerors about 1560 AD. It extended over modern Peru and parts of Ecuador, Bolivia, Chile, and Argentina. The Inca population is estimated to have numbered between three and five million. The people were organised in a state under a king, the Sapa Inca, who reigned at the capital Cuzco. Though the state was expanded constantly during the first century of its existence, Inca culture was relatively homogeneous. There was a common language, Quechua, and a common religion. The culture fostered large scale engineering projects, like an extended road system and an effective system of irrigation, as well as an extended bureaucracy with a country-wide system of taxation and storing of agricultural produce. Yet, as far as known to date, the Incas had no writing in the common sense of the word, that is, a transcription of spoken words. Accounts were kept by tying knots into cords, using an elaborate system of number representation and arrangement. These knot cords were called “quipu”, the Quechua word for “knot” (see, e.g., Ascher and Ascher 1981).

Synthese 133: 159–172, 2002. © 2002 Kluwer Academic Publishers. Printed in the Netherlands.

Since there are no original Inca written sources, and the culture was destroyed within just one generation, our knowledge of Inca culture is limited. The main sources at hand are written accounts from colonial times, most of which were written by Spanish conquerors, anthropological research on contemporary Quechua cultures, and last but not least analysis of objects from the Inca civilisation, like the quipus.

Spanish written accounts have to be considered carefully: Often the authors had only a limited understanding of the culture they described. Most of them felt superior to the Incas, so their evaluations were often not very careful. In any case, they have to be understood within a framework of the Spanish Middle Ages. Valuable sources are authors who were part of both cultures, like Garcilasso de la Vega and Felipe Guaman Poma de Ayala. Both were sons of Spanish fathers and Inca mothers. Guaman Poma has provided us with several hundred drawings of everyday life in the Inca state (Ascher and Ascher 1981, 64).

Contemporary Quechua cultures may provide valuable clues to their Inca precursor. Yet the question whether a specific trait of a culture was present also in pre-Hispanic times, or was introduced during colonialisation or even at a later stage, has to be carefully addressed. It is of great advantage that neither the language nor the concepts of mathematics seem to have undergone significant changes since colonial times, because native and Spanish number systems have functioned in separate domains (Urton 1997, 11, 209).

As for physical objects, those remaining today are mostly either sturdy, like building foundations and walls, or preserved by the conquerors for the value of the material, like golden figurines. The quipus’ odds of being preserved were low. The cloth they were composed of rotted easily, and the conquerors typically had no interest in preserving them. There are about 600 quipus in museum collections today. The largest collection with almost 300 specimens is at the Museum für Völkerkunde in Berlin.

The content of at least some of these quipus belongs to the abstract realm of mathematics, which our own culture to a certain extent shares with the Incas. Though any cultural values and semantic background attached to the numerals on the quipus by the Incas are lost when transcribing them to Hindu-Arabic numerals, transcription and mathematical analysis of the quipus allow us to reconstruct part of the thoughts that lay behind their construction.

3. CONTENTS OF THE QUIPUS

Part of the Inca state’s administration was the regular collection of statistical data on people and resources. These data made it possible to organise the state’s economy as a whole. The quipus were the bookkeepers’ books (Ascher and Ascher 1981, 62; Urton 1997, 179).

Astronomy is known to have played an important role in Inca religion. The Incas claimed descent from the sun and the moon. According to the chroniclers, they were familiar with approximations of a number of astronomical figures, like the length of the solar year and the lunar month (e.g., Urton 1997, 139). It is probable that quipus were used to write down these figures.

It is affirmed by Spanish chroniclers and supposed by modern researchers (Urton 1998) that quipus also could serve as records for history, jurisdiction, diplomacy, and similar purposes. It is not known to date whether the knots on such quipus were purely mnemonic, like a knot in a handkerchief, or if there existed a code which allowed those who were literate in the code to read any quipu, no matter if they were familiar with its content beforehand or not. Future advancements in research on these quipus may even overthrow the presently agreed-upon opinion that the Incas had no writing.

Numerical quipus are by far the best documented kind of quipu, and from the viewpoint of mathematical history the most interesting ones. Thus the focus will here be on numerical quipus.

4. MATERIAL AND DESIGN

Quipus were made from coloured cotton or wool yarns which were spun together to form an eye at one end. The other end was prevented from unravelling by an overhand knot (Ascher and Ascher 1981, 13). The material was dyed before it was spun. Colour combinations were achieved by spinning strands of different colours together to achieve a barber pole effect, or by splicing strands of different colours, so that the top and bottom end of the finished cord had different colours (Ascher and Ascher 1981, 21).

The backbone of a quipu is the main cord, which is usually thicker than the other cords. Pendant cords are attached to it by inserting each cord through its own eye, see Figure 1. Pendants are 20 to 50 cm long. Ancient specimens are found with as few as three or as many as two thousand pendant cords. Some cords, called the top cords, fall in the direction opposite to that of the pendant cords. They can be attached in the same way as the pendant cords or passed through the loops formed by several pendant cords, thus uniting these to a group. Both pendant and top cords can have subsidiaries which are attached to the pendant or top cord instead of the main cord. The subsidiaries can have subsidiaries themselves. Up to six levels of subsidiaries were found. The pendant, top, and subsidiary cords carried knots (Ascher and Ascher 1981, 16–17).

5. NUMERALS

The number system was described by Garcilasso de la Vega, a native contemporary of the Spanish conquest, and was confirmed through the analysis of preserved specimens by the American anthropologist L. Leland Locke (Locke 1912). The Incas used a base ten positional system. Hence they had independently developed the same number system that we use today. Urton has argued (Urton 1997, 215) that the decimal structure of Quechua numbers is derived from two models which are ubiquitous in pre-Hispanic and colonial as well as contemporary Quechua culture: the duality of odd and even on the one hand and groups of five on the other hand, typically modelled as a mother and her age-graded offspring. Inca administration was based on groups of five and ten (Zuidema 1964). This base of administrative and social units can still be found in contemporary Quechua societies (Urton 1997, 79).

The digits of the numerals encoded on quipus were represented by clusters of one to nine single overhand knots, and the positions were separated by spaces. The clusters closer to the main cord represented the higher positions, while the unit position was close to the end of the cord. The digits in the unit position were represented by so-called long knots, multiple overhand knots whose number of turns indicated the digit (Locke 1912). Figure 1 shows the knot types and Figure 2 a schematic example of quipu cords. As a long knot with only one turn is the same as the simple overhand knot used for the digit 1 in the other positions, a digit 1 in the unit position was represented by a figure eight knot instead. Using different knots for the unit position made it possible to place several numbers on the same cord without ambiguity as to where a new number begins, a phenomenon actually observed on ancient specimens (Ascher and Ascher 1981, 31). A zero digit, as in 101, was represented by an empty space on the cord. Thus alignment of the cords was necessary for correct deciphering (Locke 1912).

As becomes clear from the above description, the quipu notation of numerals does not exhibit a secondary base of five, as could be expected from the explanation of the base ten as two groups of five. Nor does the Quechua language exhibit a base of five: all numbers from one to ten have distinct names, none of which is formed on the basis of others (Urton 1997, 42). A trace of a base of five may appear in one of the numerous conventional number names that differ from the primary ones, namely phishqa phishqa (“five five”) for ten (Urton 1997, 78). The same construction exists, though, for numbers other than five as well (Urton 1997, 153).

Figure 1. Knots and attachment methods.
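The knot notation described in this section can be sketched in code. The following is only a minimal illustration of the encoding rules as stated above, not a reconstruction of any historical procedure; the function name `encode_number` and the labels "single", "long", "figure-eight" and "space" are assumptions introduced here.

```python
def encode_number(n):
    """Return the knot clusters for a positive integer, main-cord end first.

    Each entry is (knot_type, count). Labels are hypothetical names chosen
    for this sketch:
      "single"       - cluster of `count` single overhand knots (non-unit positions)
      "long"         - one long knot with `count` turns (unit position, digits 2..9)
      "figure-eight" - figure eight knot (digit 1 in the unit position)
      "space"        - empty position on the cord (digit 0)
    """
    if n <= 0:
        raise ValueError("this sketch only handles positive integers")
    digits = [int(d) for d in str(n)]          # highest position first
    clusters = []
    for position, digit in enumerate(digits):
        is_unit = position == len(digits) - 1  # last digit sits near the cord end
        if digit == 0:
            clusters.append(("space", 0))
        elif not is_unit:
            clusters.append(("single", digit))
        elif digit == 1:
            clusters.append(("figure-eight", 1))
        else:
            clusters.append(("long", digit))
    return clusters


print(encode_number(101))
print(encode_number(47))
```

Because only the unit position uses long or figure eight knots, a reader scanning a cord from the main cord outwards can tell where one number ends, which is the property the text attributes to the notation.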

Figure 2. Schematic example.

Figure 3. Quipu table.

6. TABLES AND TREES

The spatial layout of the quipu made it possible to arrange numbers in tables and tree structures. Tables were formed by grouping pendant cords (Ascher and Ascher 1981, 83). One can transcribe a quipu with pendant cord groups into a table by letting the group indicate the column and the position within the group the row, see Figure 3. Position within the group was often reinforced by colour, i.e., pendants with the same position had the same colour (Ascher and Ascher 1981, 89). Summation over columns and rows, i.e., within each group and each position, was known as well (Ascher and Ascher 1981, 87). Top cords which were drawn through the loops of a whole group were used for the sum within this group. An extra group contained the sums within positions, that is, the first pendant cord in the summation group carried the sum of the cords in the first position in the other groups, and so on, see Figure 4. The top cord of the summation group carried the grand total.

While grouping allowed cross-categorisation, subsidiaries allowed hierarchical categorisation within the layout of a tree structure (Ascher and Ascher 1981, 109). It is worth noting that in Western mathematics, tree structures were introduced as late as 1857 in an article by Cayley.

Apart from colours and spacing, variations in spin and ply direction and in knot directionality (see Figure 5) may have provided a structure for the numerals on the quipus. These variations have been recognised only recently (Urton 1994). Their significance, or whether they are in fact significant, has not been established so far.

Figure 4. Summation pattern.

Figure 5. Ply and knot directionality.
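The summation pattern described in this section (top cords for sums within a group, a summation group for sums within positions) corresponds to the row and column totals of a table. A minimal sketch follows; the function name `summation_layout` and the sample values are hypothetical and not taken from any actual quipu.

```python
def summation_layout(groups):
    """Compute the sums a quipu with summation cords would carry.

    `groups` is a list of pendant groups; each group is a list of integers,
    one per position within the group (all groups have the same length).
    """
    top_cords = [sum(group) for group in groups]                    # sum within each group
    summation_group = [sum(position) for position in zip(*groups)]  # sum within each position
    grand_total = sum(summation_group)                              # top cord of the summation group
    return top_cords, summation_group, grand_total


# Hypothetical values for three groups of three pendants each.
top, sums, total = summation_layout([[3, 10, 2], [5, 0, 7], [1, 4, 6]])
print(top, sums, total)
```

The grand total can be reached either by adding the top cords or by adding the cords of the summation group, which gives a built-in consistency check on such a table.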

7. THE NUMBER ZERO

Garcilasso de la Vega does not describe a special sign for zero, nor is there a well-attested word for zero in Quechua (Urton 1997, 49). While Urton concludes from his linguistic analysis that the value zero does not appear to have an independent existence in Quechua (Urton 1997, 50), Ascher and Ascher argue that the Incas indeed had the concept of nothingness in their number system (Ascher and Ascher 1981, 89–90). Zero was represented by a cord without knots. The argument is based on the observation that the colour code makes it possible to omit meaningless cords. In a Western-style table, a meaningless cell would typically be filled in with a nonnumeric symbol, e.g., a dash or “N/A”. On a quipu with, say, three groups consisting of a blue, a red, and a green pendant cord each, where the red cord in the third group is meaningless, this cord can be omitted without introducing ambiguity. A red cord in the third group without any knots on it can then be interpreted as a relevant cell, filled in with the value 0. In contrast to Hindu-Arabic numerals, zero is thus represented by the absence of signs, that is, nothingness is represented by nothingness.

Cords without knots as well as repeated colour patterns with single cords omitted occur frequently on ancient quipus (see the examples in Ascher and Ascher 1981). Thus it has to be assumed that these phenomena were built into the quipus on purpose, and the explanation above is the most convenient to date. In addition, another ancient American culture, the Maya, is known to have had the concept of zero. It is possible that the Incas adopted it from them.

8. ARITHMETIC

The main source on the modern interpretation of numerals on ancient quipus is the work of Ascher and Ascher, a team consisting of an anthropologist and a mathematician. They have studied a large number of ancient quipus in detail and concluded that Inca arithmetic must at least have included (Ascher and Ascher 1981, 133–155)
• addition
• division into equal parts
• division into unequal parts
• multiplication of integers by integers and fractions.

This list implies that the Incas dealt with fractional values in the form of division into parts and common ratios, though only integers can be directly represented by the quipu knots. Keeping in mind that the Incas were able goldsmiths, it is to be expected that Inca arithmetic encompassed the concept of fractions: exact ratios are needed in the formulation of alloys.

Urton has studied contemporary Quechua arithmetic and the operations of addition, subtraction, multiplication, and division from a linguistic and conceptual point of view (Urton 1997, 138–212). He characterises Quechua arithmetic as an art of rectification, thus having a specific goal, namely to establish or reestablish a state of balance (Urton 1997, 145). In this respect, even numbers are considered more complete than odd numbers (Urton 1997, 57).

In the course of his fieldwork on the art of weaving in Bolivia, Urton has established that addition, subtraction, and division are employed when weavers lay out the intricate designs that characterise Andean textiles. Furthermore, the weaving of these designs is learnt by young weavers as numerical formulas (Urton 1997, 125).

In the following, the occurrence of the operations mentioned above is demonstrated by examples from Ascher and Ascher (1981). The quipus are identified by the labels given to them by Ascher and Ascher. The labels consist of a letter combination signifying the first author to describe the quipu, and a consecutive number in accordance with the order of publication.

8.1. Addition

As mentioned in the section on “Tables and Trees”, top cords and summation groups carried the sums of numbers on other cords. Summation appears on about 25% of the quipus examined by Ascher and Ascher (1981, 89). Thus addition was beyond doubt a part of Inca arithmetic. Addition, as well as multiplication, also turns up linguistically in the formation of complex (compound) numbers. For example, 21 is in Quechua iskay chunka ujniyuq, literally “two (times) ten (plus) one”. This principle is strictly adhered to for all complex numbers (Urton 1997, 46).

8.2. Division

Though conceptually more complex than multiplication, division is considered first because the quipus quoted for illustration are the less complex ones. There are three different core terms for division in Quechua (Urton 1997, 165):
• palqa, “to split or branch”, for example a road or river,
• rak’iy, “to separate” one entity into smaller, simpler ones,
• t’aqay, “to divide and redistribute” a set of separate entities.

When comparing two quipus whose numerals appear to form division tables, it is thus quite possible that we describe with a single concept of division the results of operations that represented different concepts for the makers of the quipus.

Division into equal parts: AS161 (Ascher and Ascher 1981, 135). The quipu consists of two groups with two and six pendant cords respectively, see Figure 6. The sum of the values on the cords is 200, divided evenly into the two groups and, within each group, distributed evenly to the cords, as far as this is possible using only integers.

Figure 6. AS161 – division into equal parts.
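The division pattern described for AS161 can be illustrated with a small sketch of splitting a total as evenly as integers allow. The actual cord values are given in Figure 6 and are not reproduced here; the function names and the choice to place the larger parts first are assumptions made only for this illustration.

```python
def divide_evenly(total, parts):
    """Split `total` into `parts` integers that differ by at most one."""
    base, remainder = divmod(total, parts)
    # `remainder` of the parts receive one extra unit (placed first by assumption).
    return [base + 1] * remainder + [base] * (parts - remainder)


def two_level_division(total=200, group_sizes=(2, 6)):
    """Divide `total` evenly among the groups, then evenly among each group's cords."""
    per_group = divide_evenly(total, len(group_sizes))
    return [divide_evenly(share, size) for share, size in zip(per_group, group_sizes)]


print(two_level_division())
```

With a total of 200 and groups of two and six cords, each group receives 100; the six-cord group cannot be split exactly, so its cords differ by one unit.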

Division into unequal parts: AS120 (Ascher and Ascher 1981, 144). The quipu consists of three groups of eight pendants each and a summation group, see Figure 7. In each group the third pendant has a subsidiary. The value on the $j$-th cord in the $i$-th group is denoted by $p_{ij}$, and that on the subsidiary in the $i$-th group by $p_{iS}$. The value on the $j$-th cord in the summation group is denoted by $p_j$. With this notation, $p_{1j} + p_{2j} + p_{3j} = p_j$ for $j = 1, \ldots, 8, S$. The numbers on the quipu range widely, namely from 102 to 43,372. For all of them, the following relationships hold:

$$\begin{aligned}
p_{1j} &= 0.340 \cdot p_j \\
p_{2j} &= 0.425 \cdot p_j \qquad \text{for } j = 1, \ldots, 8, S \\
p_{3j} &= 0.235 \cdot p_j
\end{aligned}$$

That is, the values in the first group are 340/1000 = 17/50, those in the second group 425/1000 = 17/40, and those in the third group 235/1000 = 47/200 of the values in the summation group. Hence the quipu can be interpreted as a division table: given the values in the summation group, these were divided into three unequal parts, which were then encoded in the first three groups. As the exact result of the calculation is not always an integer, the numbers had to be rounded in order to be encoded on the quipu. This introduces an error, which can be stated as a percentage of the number on the quipu. For the first group, this error is at most 0.6% with the exception of the third cord, where the error is 1.4%. That is, $0.33796 \le p_{1j}/p_j \le 0.34204$ for $j \ne 3$ and $0.33524 \le p_{13}/p_3 \le 0.34476$. In the second group, the error is at most 0.7% and in the third group at most 0.9%, both with the exception of $j = 1$. The actual numbers are thus close to what is expected if the interpretation as a division table is correct.

Figure 7. AS120 – division into unequal parts.
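The effect of rounding on such a division table can be sketched as follows. The three ratios are those stated in the text; the totals used below are hypothetical, the function names are introduced here, and the error is measured relative to the exact ratio, which is one plausible reading of how the percentages above were obtained.

```python
RATIOS = (0.340, 0.425, 0.235)  # shares of the three groups, as stated in the text


def divide_unequally(total):
    """Round each share of `total` to the nearest integer, since knots encode integers only."""
    return [round(ratio * total) for ratio in RATIOS]


def relative_errors(total):
    """Deviation of each rounded share from its exact ratio, relative to that ratio."""
    parts = divide_unequally(total)
    return [abs(part / total - ratio) / ratio for part, ratio in zip(parts, RATIOS)]


# Hypothetical totals: rounding matters much more for small values.
for total in (250, 1000, 40000):
    print(total, divide_unequally(total), [f"{e:.2%}" for e in relative_errors(total)])
```

Small totals give visibly larger relative errors than large ones, which is consistent with the observation that a few cords fall outside the general error bound.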

8.3. Multiplication

Urton has argued that multiplication in Quechua has an ontological foundation independent of repeated addition (Urton 1997, 160). His argument is based on the concept of reproductivity, which is central to the understanding of numbers in Quechua.

Multiplication of integers by integers and fractions: AS55 and AS56 (Ascher and Ascher 1981, 149). These are two small quipus with seven and three pendants respectively which were found together. Though the pendants are not grouped by spacing or colour, an implicit grouping appears when examining the values. Denote the values on the pendants of the first quipu by $p, q_1, q_2, q_3, r_1, r_2, r_3$ in the order of their appearance, and those on the pendants of the second quipu by $s_1, s_2, s_3$. With this notation, several attractive relations can be found:

$$q_1 q_2 q_3 \cdot r_1 r_2 r_3 = s_1^2 s_2^2 s_3^2$$

$$q_1 r_2 s_3 = r_1 s_2 q_3 = s_1 q_2 r_3$$

$$q_j \cdot r_{j+1} = s_j \cdot s_{j+1} \quad \text{for } j = 1, 2, 3$$

In the last equation, addition is modulo 3, i.e., 3 + 1 = 1. There is an even more general pattern implicit in the numbers:

$$\begin{array}{lll}
q_1 = B^3 C x & r_1 = B^6 C^2 x & s_1 = B^4 C x \\
q_2 = x & r_2 = B C x & s_2 = C x \\
q_3 = y & r_3 = B^2 C^2 y & s_3 = B^2 C y
\end{array}$$

where $B = 7/8$ and $C = 34/33$. Hence the numbers are all multiples of the values on the third and fourth cord of the larger quipu by fractions. The first cord on the larger quipu is related to the next three cords by a fraction as well, namely

$$p = (q_1 + q_2 + q_3) \cdot \frac{33}{34} = \frac{q_1 + q_2 + q_3}{C}.$$

The error induced by the restriction to integers is remarkably small, namely in all cases smaller than 0.4% and in eleven out of the thirteen cases even smaller than 0.2%. The values on the two quipus range from 734 to 2,427. It has to be noted, though, that Ascher and Ascher modified one digit on the larger quipu by 1, assuming that an error occurred either in knotting by the quipu keeper or in their own transcription.
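That the general pattern implies the three relations can be checked with exact rational arithmetic. The sketch below assumes the pattern reads q1 = B³Cx, r1 = B⁶C²x, s1 = B⁴Cx, q2 = x, r2 = BCx, s2 = Cx, q3 = y, r3 = B²C²y, s3 = B²Cy with B = 7/8 and C = 34/33; the values chosen for x and y are arbitrary and are not the values on AS55 and AS56.

```python
from fractions import Fraction

B = Fraction(7, 8)
C = Fraction(34, 33)


def pattern(x, y):
    """Return (q, r, s) as lists of three exact fractions following the general pattern."""
    q = [B**3 * C * x, x, y]
    r = [B**6 * C**2 * x, B * C * x, B**2 * C**2 * y]
    s = [B**4 * C * x, C * x, B**2 * C * y]
    return q, r, s


def relations_hold(x, y):
    """Check the three multiplicative relations for exact (unrounded) values."""
    q, r, s = pattern(Fraction(x), Fraction(y))

    def product(values):
        return values[0] * values[1] * values[2]

    first = product(q) * product(r) == product(s) ** 2
    second = q[0] * r[1] * s[2] == r[0] * s[1] * q[2] == s[0] * q[1] * r[2]
    # Indices are 0-based here, so "j + 1 modulo 3" becomes (j + 1) % 3.
    third = all(q[j] * r[(j + 1) % 3] == s[j] * s[(j + 1) % 3] for j in range(3))
    return first and second and third


print(relations_hold(1000, 2000))
```

The relations hold exactly for the unrounded pattern; on the quipus themselves they can only hold approximately, because each value had to be rounded to an integer.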

Geometric interpretation. The quipus AS143 and AS149 (Ascher and Ascher 1981, 145) exhibit the same underlying structure as the previously considered AS120. The former has four and the latter five pendant groups. Denote the three ratios $p_{ij}/p_j$, $i = 1, 2, 3$, from AS120 by $a, b, c$ in order of increasing value. Thus they appear on the quipu in the order $b, c, a$. Accordingly, denote the ratios from AS143 by $b_1, b_2, c, a$ and those from AS149 by $b, c_1, c_2, c_3, a$ in the order of appearance on the quipus. For AS143, let $b = b_1 + b_2$. For AS149, let $c = c_1 + c_2 + c_3$. The following table lists the ratios from the three quipus.

        AS120    AS143    AS149
b1      -        0.110    -
b2      -        0.228    -
b       0.340    0.338    0.222
c1      -        -        0.105
c2      -        -        0.534
c3      -        -        0.017
c       0.425    0.437    0.656
a       0.235    0.225    0.122

The ratios from all three quipus approximately solve the following equation:

$$\frac{c - a}{b - a} = \frac{c}{a}$$

This is one of the classical proportions studied by the Pythagoreans. Inca mathematicians may thus have shared with the mathematicians of ancient Greece a sense of mathematical beauty. The figures, standardised by $c$, appear as areas in Figure 8, where $X = 1 - a/c$. The error is at most 0.7% for all figures. Adding the $b_i$ and $c_i$ was not arbitrary, since they also appear in the figure, see Figure 9. Here $c_1/c$ is the area of the unit square minus the areas of the rectangle with sides 1 and $X$ and of the circle with diameter $1 - X$. This number is accurate up to 1.7%, while the error for the other numbers is at most 1%.
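As a worked step, and assuming the proportion reads $(c - a)/(b - a) = c/a$, the standardised figures $a/c$ and $b/c$ can both be expressed through $X = 1 - a/c$. The derivation below is elementary algebra added here for illustration; since Figure 8 is not reproduced, reading $1 - X$ and $1 - X^2$ as a unit square with a rectangle or a square removed is only one possible interpretation of the areas meant.

```latex
\begin{align*}
\frac{c-a}{b-a} = \frac{c}{a}
  &\iff a\,(c-a) = c\,(b-a) \\
  &\iff b = a + a\cdot\frac{c-a}{c} = a\,(1+X),
  \qquad X = 1-\frac{a}{c} = \frac{c-a}{c}, \\[4pt]
\frac{a}{c} = 1-X,
  &\qquad \frac{b}{c} = \frac{a}{c}\,(1+X) = (1-X)(1+X) = 1-X^{2}.
\end{align*}
```

For AS120, for instance, $a/c = 0.235/0.425 \approx 0.553$ gives $X \approx 0.447$ and $1 - X^2 \approx 0.800$, which agrees with $b/c = 0.340/0.425 = 0.800$.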

Figure 8. AS120, AS143 and AS149 – geometric interpretation 1/2.

Figure 9. AS120, AS143 and AS149 - geometric interpretation 2/2.

Ascher and Ascher assert that this figure is “quite similar to a geometric form thought to be important and persistent in the cosmology of western South America” (Ascher and Ascher 1981, 146). If the makers of these quipus really thought of this figure, they must have had a geometric visualisation for arithmetic. In addition, they had an idea of the area of a circle with a given diameter.

8.4. Values Attached to Numerals and Arithmetic

As already became clear in the section on division, reconstructing the numbers and their numerical relationships uncovers only part of the picture. Values and symbolism attached to both the numerals and the operations whose results were encoded cannot be deciphered in this way, thus leaving the significance of the quipus largely uncertain.

9. CONCLUSION

In the absence of a writing system, the Incas kept track of numbers like balances and statistical data with the help of quipus, cord structures on which numbers were encoded as knots. A base ten positional system was used. The structure of the quipus made it possible to organise numbers in tables or hierarchical tree structures. The study of ancient specimens, in conjunction with the analysis of concepts of numbers and arithmetic in contemporary Quechua societies, provides clues about Inca mathematics. Arithmetic included at least addition, division into equal parts, division into unequal parts and multiplication of integers by integers and fractions. The structure of the quipus suggests that the concept of zero was included in the Inca number system. In addition, there are hints that the Incas may have had a geometrical visualisation of arithmetic.

REFERENCES

Ascher, M. and Ascher, R.: 1981, Code of the Quipu, A Study in Media, Mathematics and Culture, Ann Arbor.
Julien, C. J.: 1988, ‘How Inca Decimal Administration Worked’, Ethnohistory 35, 357–379.
Locke, L. L.: 1912, ‘The Ancient Quipu, a Peruvian Knot Record’, American Anthropologist 14, 325–332.
Urton, G.: 1994, ‘A New Twist in an Old Yarn: Variation in Knot Directionality in the Inka Khipus’, Baessler-Archiv Neue Folge 42, 271–305.
Urton, G.: 1997, The Social Life of Numbers: A Quechua Ontology of Numbers and Philosophy of Arithmetic, Austin TX.
Urton, G.: 1998, ‘From Knots to Narratives: Reconstructing the Art of Historical Record Keeping in the Andes from Spanish Transcriptions of Inka Khipus’, Ethnohistory 45, 409–438.
Zuidema, R. T.: 1964, The Ceque System of Cuzco: The Social Organization of the Capital of the Inca, Leiden.

Novo Nordisk A/S
Novo Allé
2880 Bagsvaerd
Denmark
E-mail: [email protected]

STEPHAN MERZ

MODEL CHECKING TECHNIQUES FOR THE ANALYSIS OF REACTIVE SYSTEMS

ABSTRACT. Model checking is a widely used technique that aids in the design and debugging of reactive systems. This paper gives an overview of the theory and algorithms used for model checking, with a bias towards automata-theoretic approaches and linear-time temporal logic. We also describe elementary abstraction techniques useful for large systems that cannot be directly handled by model checking.

1. THE TOPIC OF THIS SURVEY

Reactive systems such as process controllers, operating systems or communication networks maintain an ongoing interaction with their environment. Usually, such systems comprise several components that operate concurrently. This inherent concurrency makes reactive systems notoriously error-prone, due to race conditions or deadlocks that are difficult to detect and reproduce by conventional testing. On the other hand, reactive systems find applications in fields where errors are expensive or may even endanger human lives. Therefore, industry is willing to apply (and pay for) formal methods to model and analyse hardware and software components for reactive systems. In particular, model checking technology (Clarke et al. 1986; Queille and Sifakis 1981), which can automatically analyse finite-state models of reactive systems, has found wide acceptance as a debugging tool.

Traditionally, from finite automata to Turing machines, computing devices are modelled as mapping inputs to outputs. Such models are not well-suited for reactive systems, which continuously receive input from their environment. Instead, reactive systems are modelled as transition systems that communicate via messages or via shared variables. This survey paper explains the principles of the algorithmic analysis of finite-state models of reactive systems, which rely on results in logic and automata theory as well as on efficient data structures and algorithms. Model checking can be applied to models of rather large and industrially relevant systems (Clarke et al. 1993). Still, even a state space of $10^{120}$ states, which has sometimes been claimed to be manageable using symbolic model checking, is small when compared to present-day processors with perhaps a few dozen or even a hundred registers and a megabyte of cache memory.

In most practical applications, one has to construct models that are small enough to be handled automatically, but sufficiently detailed for the properties of interest to be established. Abstraction techniques promise to support this state-space reduction by computer-assisted tools, although some user interaction is necessary to identify the abstraction function or the predicates of interest. Another important technique is the decomposition of systems into largely independent subsystems whose properties can be established by model checking, and which are combined to yield properties of the entire system (McMillan 1997). We do not cover compositional reasoning in this survey.

The plan of the paper is as follows: Section 2 introduces transition systems, temporal logic, and ω-automata, and explores the relations between these concepts. It provides the theoretical background for the presentation of PTL model checking in Section 3. The analysis of a cryptographic protocol gives an indication of how the technique can be applied in practice. In Section 4 we consider abstraction techniques, which enable large, and even infinite-state, systems to be reduced to tractable models. Our objective throughout is to give a coherent exposition of some central material. The selection is biased by personal prejudice and experience, and it is complemented by references to alternative approaches.

Synthese 133: 173–201, 2002. © 2002 Kluwer Academic Publishers. Printed in the Netherlands.

2. TEMPORAL LOGIC AND AUTOMATA

Reactive systems can be formally represented as transition systems, and their properties are conveniently expressed as temporal logic formulas. This section defines these concepts and presents some fundamental theory that is applied in the analysis of reactive systems.

2.1. Transition Systems

DEFINITION 1. A transition system $T = (S, I, A, \delta, \alpha)$ is given by a set $S$ of states, a non-empty subset $I \subseteq S$ of which are initial, a set $A$ of actions, a total transition relation $\delta \subseteq S \times S$ (that is, for every $s \in S$ there is some $t \in S$ related by $\delta$), and an action labelling $\alpha : \delta \to 2^A$ that associates a non-empty set of actions with every pair of states $(s, t) \in \delta$. A run of $T$ is an infinite sequence $\rho = r_0 r_1 \ldots$ of states $r_i \in S$ such that $r_0 \in I$ and $(r_i, r_{i+1}) \in \delta$ holds for all $i \in \mathbb{N}$.

Intuitively, a transition system defines the possible evolutions of a system, starting from an initial state. The literature abounds with variations of the definition given above. For example, we have assumed that the transition relation is total to simplify some definitions below; this can be achieved by adding a “stuttering” transition that does not change the state and is the only transition enabled in deadlocked states. Frequently, fairness conditions are associated with actions to ensure that actions are eventually executed provided they are enabled often enough. In this paper we mainly consider the linear-time temporal logic PTL, where fairness assumptions can be expressed as formulas; hence there is no need to complicate the definition of a transition system.

In practice, reactive systems are not described as monolithic transition systems, but are composed of processes that interact with each other and with the environment. The entire transition system can be obtained as the product of simpler transition systems associated with the individual processes. On the other hand, knowledge about such additional structure can be useful for optimizations such as the partial-order reduction methods described in Section 3.2.
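The definition of a transition system, together with the stuttering device that makes the transition relation total, can be sketched as a small data structure. The class and method names, the action name "stutter" and the three-state example are assumptions made for this illustration; only finite prefixes of runs can be checked in code, since runs themselves are infinite.

```python
from dataclasses import dataclass, field


@dataclass
class TransitionSystem:
    states: set
    initial: set
    actions: set
    delta: set                                   # set of (s, t) pairs
    alpha: dict = field(default_factory=dict)    # (s, t) -> non-empty set of actions

    def make_total(self, stutter="stutter"):
        """Add a stuttering self-loop to every state without an outgoing transition."""
        for s in self.states:
            if not any(source == s for source, _ in self.delta):
                self.delta.add((s, s))
                self.alpha[(s, s)] = {stutter}
                self.actions.add(stutter)

    def is_run_prefix(self, sequence):
        """Check that a finite sequence of states could begin a run."""
        return (
            len(sequence) > 0
            and sequence[0] in self.initial
            and all((a, b) in self.delta for a, b in zip(sequence, sequence[1:]))
        )


# Hypothetical three-state system in which "dead" is a deadlocked state.
ts = TransitionSystem(
    states={"idle", "busy", "dead"},
    initial={"idle"},
    actions={"start", "fail"},
    delta={("idle", "busy"), ("busy", "dead")},
    alpha={("idle", "busy"): {"start"}, ("busy", "dead"): {"fail"}},
)
ts.make_total()
print(ts.is_run_prefix(["idle", "busy", "dead", "dead"]))
```

After `make_total`, every state has a successor, so every finite prefix that respects the transition relation can be extended to an infinite run.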

2.2. Temporal Logic

A property of a reactive system can be defined semantically as a set of behaviors (infinite sequences of states), which represent those executions of the system that satisfy the property. It is convenient to express properties in a suitable logic. In this paper we will mainly consider linear-time temporal logic (Pnueli 1977; Manna and Pnueli 1992; Kröger 1987).

Assume given a denumerable set $V$ of atomic propositions, which represent properties of individual states. For example, $\mathit{req}_1$ and $\mathit{owns}_1$ could appear as atomic propositions in the specification of a resource manager to indicate that at the present state there is a pending request for the resource from process 1, and that process 1 has been granted the resource.

DEFINITION 2. Formulas of the propositional linear-time temporal logic PTL are inductively defined as follows:
• Every atomic proposition $v \in V$ is a formula.
• Boolean combinations of formulas are formulas.
• If $\varphi$ and $\psi$ are formulas then $\bigcirc\varphi$ (read: next $\varphi$) and $\varphi \,\mathbf{until}\, \psi$ are formulas.

PTL formulas are interpreted over $\omega$-sequences of states, which interpret the atomic propositions in $V$. Formally, a state is a function $s : V \to \{\mathrm{tt}, \mathrm{ff}\}$ (where tt and ff are the Booleans true and false), and a behavior is a sequence $\sigma = s_0 s_1 \ldots$ of states. We let $\sigma_i = s_i$ denote the $i$-th state of behavior $\sigma$ (counting from 0), and $\sigma|_i = s_i s_{i+1} \ldots$ the suffix of behavior $\sigma$ with the first $i$ states omitted.

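DEFINITION 3. The relation $\sigma \models \varphi$ (read: $\varphi$ holds of $\sigma$) is inductively defined as follows:

• $\sigma \models v$ (for $v \in V$) iff $\sigma_0(v) = \mathrm{tt}$.
• The semantics of boolean combinations is defined in the usual way.
• $\sigma \models \bigcirc\varphi$ iff $\sigma|_1 \models \varphi$.
• $\sigma \models \varphi \,\mathbf{until}\, \psi$ iff for some $i \ge 0$, $\sigma|_i \models \psi$ and $\sigma|_j \models \varphi$ holds for all $0 \le j < i$.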

Useful PTL formulas can be defined as abbreviations: $\Diamond\varphi$ (eventually $\varphi$) is defined as $\mathbf{true} \,\mathbf{until}\, \varphi$ and asserts that $\varphi$ holds for some suffix. The dual formula $\Box\varphi$ (always $\varphi$), defined as $\neg\Diamond\neg\varphi$, stipulates that $\varphi$ holds for every suffix. Finally, $\varphi \,\mathbf{unless}\, \psi$ is defined as $(\varphi \,\mathbf{until}\, \psi) \lor \Box\varphi$; it requires that $\varphi$ holds for as long as $\psi$ does not hold, but does not require $\psi$ to hold eventually.

PTL formulas can be used to assert properties about the accessibility of states and their relative ordering. As examples, consider the following properties of a resource manager for two processes:

$\Box\neg(\mathit{owns}_1 \land \mathit{owns}_2)$
It is never the case that both processes own the resource; in other words, the resource is exclusive. In general, formulas of the form $\Box P$, where $P$ is non-temporal, express invariants of the system.

$\Box(\mathit{req}_1 \Rightarrow \Diamond\mathit{owns}_1)$
Every request by process 1 will eventually be honored. Note that in the presence of the invariance property above, this property is only implementable if process 2 cooperates by eventually releasing the resource whenever process 1 requests it. In general, formulas of the form $\Box(P \Rightarrow \Diamond Q)$ for non-temporal formulas $P$ and $Q$ express response properties.

$\Box\Diamond(\mathit{req}_1 \land \neg(\mathit{owns}_1 \lor \mathit{owns}_2)) \Rightarrow \Box\Diamond\mathit{owns}_1$
If there are infinitely many states such that process 1 has requested the resource and the resource is free, then infinitely often process 1 owns the resource. This formula expresses (strong) fairness for granting the resource to process 1.

$\Box(\mathit{req}_1 \land \mathit{req}_2 \Rightarrow \neg\mathit{owns}_2 \,\mathbf{unless}\, (\mathit{owns}_2 \,\mathbf{unless}\, (\neg\mathit{owns}_2 \,\mathbf{unless}\, \mathit{owns}_1)))$
Whenever both processes compete for the resource, process 2 will be granted the resource at most once before it is granted to process 1. This property, known as “1-bounded overtaking”, is an example of a precedence property.

PTL formulas can be interpreted over runs of transition systems once the interpretation of atomic formulas at system states has been fixed. We say that formula $\varphi$ holds of transition system $T$ (or that $T$ is a model of $\varphi$) if $\sigma \models \varphi$ holds for all runs $\sigma$ of $T$.
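The semantics of PTL can be made executable if one restricts attention to ultimately periodic behaviors, i.e., a finite prefix followed by a finite loop repeated forever. This restriction, the tuple encoding of formulas and all function names below are assumptions of this sketch; the definition in the text applies to arbitrary behaviors. States are represented as the sets of atomic propositions that are true in them, and "until" is computed as a least fixed point over the finitely many positions.

```python
TRUE = ("true",)

def ap(name): return ("ap", name)
def neg(f): return ("not", f)
def disj(f, g): return ("or", f, g)
def conj(f, g): return neg(disj(neg(f), neg(g)))
def implies(f, g): return disj(neg(f), g)
def nxt(f): return ("next", f)
def until(f, g): return ("until", f, g)
def eventually(f): return until(TRUE, f)          # true until f
def always(f): return neg(eventually(neg(f)))     # dual of eventually


def holds_at(formula, prefix, loop):
    """Positions of the behavior prefix + loop^omega whose suffix satisfies `formula`."""
    if not loop:
        raise ValueError("the loop must contain at least one state")
    states = prefix + loop
    n = len(states)
    succ = lambda i: i + 1 if i + 1 < n else len(prefix)  # last position loops back
    op = formula[0]
    if op == "true":
        return set(range(n))
    if op == "ap":
        return {i for i in range(n) if formula[1] in states[i]}
    if op == "not":
        return set(range(n)) - holds_at(formula[1], prefix, loop)
    if op == "or":
        return holds_at(formula[1], prefix, loop) | holds_at(formula[2], prefix, loop)
    if op == "next":
        inner = holds_at(formula[1], prefix, loop)
        return {i for i in range(n) if succ(i) in inner}
    if op == "until":
        left = holds_at(formula[1], prefix, loop)
        sat = set(holds_at(formula[2], prefix, loop))
        changed = True
        while changed:                      # least fixed point of the recursion for until
            changed = False
            for i in left - sat:
                if succ(i) in sat:
                    sat.add(i)
                    changed = True
        return sat
    raise ValueError(f"unknown operator: {op}")


def satisfies(formula, prefix, loop):
    return 0 in holds_at(formula, prefix, loop)


req1, owns1, owns2 = ap("req1"), ap("owns1"), ap("owns2")
response = always(implies(req1, eventually(owns1)))
mutex = always(neg(conj(owns1, owns2)))
print(satisfies(response, [{"req1"}], [{"owns1"}, set()]))
```

The example behavior requests the resource once and then alternates forever between owning and not owning it, so the response property holds; a behavior that requests forever without ever owning the resource violates it.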

2.3. ω-Automata

On the other hand, one can construct an automaton (over ω-words) that characterizes any given PTL formula. We present some basic elements of the theory of ω-automata that was initiated by the works of Büchi (1962) and Muller (1963). Much more material on this subject can be found in the excellent survey papers by Thomas (1990, 1997).

DEFINITION 4. A Büchi automaton $B = (Q, I, \delta, F)$ over an alphabet $\Sigma$ is given by a finite set $Q$ of states, a non-empty set $I \subseteq Q$ of initial states, a transition relation $\delta \subseteq Q \times \Sigma \times Q$, and a set $F \subseteq Q$ of accepting states. A run of $B$ over an $\omega$-word $w = w_0 w_1 \ldots \in \Sigma^\omega$ is an infinite sequence $\rho = r_0 r_1 \ldots$ of states $r_i \in Q$ such that $r_0 \in I$ and $(r_i, w_i, r_{i+1}) \in \delta$ holds for all $i \in \mathbb{N}$. The run $\rho$ is accepting iff there exists some $q \in F$ such that $r_i = q$ holds for infinitely many $i \in \mathbb{N}$.

The language $L(B)$ defined by $B$ is the set of $\omega$-words for which there exists some accepting run of $B$. An $\omega$-language $L \subseteq \Sigma^\omega$ is called $\omega$-regular iff $L = L(B)$ for some Büchi automaton $B$.

Thus, Büchi automata are a straightforward generalization of finite automata over finite words (Hopcroft and Ullman 1979): their structure is identical, but the acceptance condition changes. For our purposes it is important that the emptiness problem for Büchi automata is decidable.

PROPOSITION 5. For a Büchi automaton $B = (Q, I, \delta, F)$ with $n$ states, it is decidable in time $O(n)$ whether $L(B) = \emptyset$.

Proof. Obviously, we have that $L(B) \ne \emptyset$ iff there exist $q_0 \in I$, $q \in F$, $x \in \Sigma^*$, and $y \in \Sigma^+$ such that $q_0 \stackrel{x}{\Rightarrow} q$ and $q \stackrel{y}{\Rightarrow} q$ (where $q \stackrel{w}{\Rightarrow} q'$ means that there is a path in $B$ from state $q$ to $q'$ labelled with $w$). The existence of such paths can be decided efficiently using Tarjan’s algorithm, which enumerates the strongly connected subcomponents of $B$ that are reachable from states in $I$, and checking whether they contain some accepting state. Q.E.D.
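The criterion used in the proof (some accepting state is reachable from an initial state and lies on a cycle) can be checked directly. The sketch below deliberately swaps in a plain reachability search for Tarjan's algorithm, so it is quadratic rather than linear; it illustrates the criterion, not the efficient procedure. The function name is introduced here, and the example automaton is only an assumed shape for the automaton of Figure 1, which is not reproduced.

```python
def is_nonempty(initial, delta, accepting):
    """Decide whether a Büchi automaton accepts some omega-word.

    `delta` is a set of (source, letter, target) triples. The language is
    non-empty iff some accepting state is reachable from an initial state
    and can reach itself again in one or more steps.
    """
    successors = {}
    for source, _, target in delta:
        successors.setdefault(source, set()).add(target)

    def reachable(starts):
        """States reachable from `starts` in one or more steps."""
        seen, stack = set(), list(starts)
        while stack:
            state = stack.pop()
            for target in successors.get(state, ()):
                if target not in seen:
                    seen.add(target)
                    stack.append(target)
        return seen

    from_initial = set(initial) | reachable(initial)
    return any(q in reachable([q]) for q in set(accepting) & from_initial)


# Assumed transitions for an automaton accepting words with finitely many a's.
delta = {("q0", "a", "q0"), ("q0", "b", "q0"), ("q0", "b", "q1"), ("q1", "b", "q1")}
print(is_nonempty({"q0"}, delta, {"q1"}))
```

An accepting state that is reachable but lies on no cycle does not help: a run can visit it at most once, which is why the check requires the state to reach itself.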

Figure 1. A Büchi automaton.

Many results about finite automata operating on finite words carry over to Büchi automata, the notable exception being that deterministic Büchi automata are strictly weaker than nondeterministic ones. For example, there is no deterministic Büchi automaton equivalent to the automaton shown in Figure 1 (with initial state $q_0$ and accepting state $q_1$), which accepts precisely those words in $\{a, b\}^\omega$ that contain only finitely many $a$’s (Thomas 1990). Therefore, the standard method of proving that the complement of a regular language is again regular, which first constructs a deterministic automaton that is easily complemented, does not apply to Büchi automata. Nevertheless, the class of ω-regular languages is closed under complementation (see (Thomas 1997, 2000) for expositions of different proof strategies). The following result is due to Safra (1988), who gave an explicit construction of essentially optimal complexity.

PROPOSITION 6. For a Büchi automaton $B$ over $\Sigma$ with $n$ states there is a Büchi automaton $\overline{B}$ with $2^{O(n \log n)}$ states such that $L(\overline{B}) = \Sigma^\omega \setminus L(B)$.

Generalized Büchi automata (Vardi and Wolper 1994) define the acceptance condition via a (finite) set $\mathcal{F} = \{F_1, \ldots, F_n\}$ of sets of states, and require an accepting run to contain some state from each $F_i$ infinitely often. Simple automata-theoretic manipulations yield an equivalent Büchi automaton for a given generalized Büchi automaton, and the results obtained for Büchi automata carry over essentially unchanged to generalized Büchi automata. In particular, the language is non-empty iff some strongly connected subcomponent reachable from some initial state contains a state from every $F_i$.

Other types of ω-automata have been defined, which are sometimes exponentially more succinct than Büchi automata. These include Muller, Rabin, and Streett automata, which have more elaborate acceptance conditions (such as requiring that every state in some set $R \subseteq Q$ be eventually followed by some state from $G \subseteq Q$). Deterministic Rabin and Streett automata are at the heart of Safra’s complementation proof for Büchi automata.
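The text does not spell out how a generalized Büchi automaton is turned into an ordinary one. A common choice is the counter construction sketched below, which is an assumption about the manipulation meant: each state is paired with an index that records which acceptance set is awaited next, and the index advances whenever the current state belongs to the awaited set. The function name and the example are introduced here for illustration.

```python
def degeneralize(states, initial, delta, acceptance_sets):
    """Build a Büchi automaton equivalent to a generalized Büchi automaton.

    `delta` is a set of (source, letter, target) triples and `acceptance_sets`
    a non-empty list of sets of states. New states are pairs (state, index).
    """
    k = len(acceptance_sets)
    if k == 0:
        raise ValueError("this sketch expects at least one acceptance set")
    new_states = {(q, i) for q in states for i in range(k)}
    new_initial = {(q, 0) for q in initial}
    new_delta = set()
    for source, letter, target in delta:
        for i in range(k):
            # Advance the counter when leaving a state of the awaited set.
            j = (i + 1) % k if source in acceptance_sets[i] else i
            new_delta.add(((source, i), letter, (target, j)))
    # The counter returns to 0 infinitely often iff every set is visited infinitely often.
    new_accepting = {(q, 0) for q in acceptance_sets[0]}
    return new_states, new_initial, new_delta, new_accepting


# Hypothetical two-state automaton that must visit both p and q infinitely often.
result = degeneralize(
    states={"p", "q"},
    initial={"p"},
    delta={("p", "a", "q"), ("q", "b", "p")},
    acceptance_sets=[{"p"}, {"q"}],
)
print(sorted(result[2]))
```

The resulting automaton has $k$ times as many states as the original one, which is why results for Büchi automata carry over with only a polynomial overhead.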

2.4. From PTL to Büchi Automata

Behaviors can be viewed as ω-words over the alphabet $2^V$ (which is finite if $V$ is chosen as a finite set, such as the set of atomic propositions that occur in a given PTL formula). From this perspective, PTL formulas and Büchi automata both describe ω-languages, and it is natural to compare the expressiveness of the two formalisms. For example, the Büchi automaton of Figure 1 represents the same requirement as the PTL formula $\Diamond\Box b$.

We now show that for every PTL formula $\varphi$ built from a (finite) set $V$ of atomic propositions there is a generalized Büchi automaton $A_\varphi$ over $2^V$ that accepts precisely those behaviors of which $\varphi$ holds. Define the Fischer-Ladner closure $C(\varphi)$ of $\varphi$ as the set of subformulas of $\varphi$ and their negations, identifying $\neg\neg\psi$ and $\psi$. The states of $A_\varphi$ are sets of formulas in $C(\varphi)$ with the intuition that in every accepting run of $A_\varphi$ that passes through some state $q$, the formulas $\psi \in q$ will be satisfied in the suffix of the behavior. Formally, the states of $A_\varphi$ are maximal subsets of $C(\varphi)$ that satisfy the following healthiness conditions:
• $\psi \in q$ iff $\neg\psi \notin q$, for all $\psi \in C(\varphi)$.
• $\psi_1 \lor \psi_2 \in q$ iff $\psi_1 \in q$ or $\psi_2 \in q$, whenever $\psi_1 \lor \psi_2 \in C(\varphi)$.
• Conditions for other boolean connectives are similar.
• If $\psi_1 \,\mathbf{until}\, \psi_2 \in q$ then $\psi_2 \in q$ or $\psi_1 \in q$.
• If $\neg(\psi_1 \,\mathbf{until}\, \psi_2) \in q$ then $\psi_2 \notin q$.

The initial states of Aϕ are precisely those states that contain ϕ. The transition relation δ of Aϕ is defined such that (q, a, q′) ∈ δ iff

1. a = q ∩ V is the set of atomic formulas that occur in q (and must therefore hold immediately in any behavior that satisfies the formulas in q).
2. q′ contains ψ (resp., ¬ψ) whenever q contains ○ψ (resp., ¬○ψ).
3. If ψ1 until ψ2 ∈ q and ψ2 ∉ q then ψ1 until ψ2 ∈ q′.
4. If ¬(ψ1 until ψ2) ∈ q and ψ1 ∈ q then ¬(ψ1 until ψ2) ∈ q′.

The healthiness and next-state conditions for the until operator are motivated by the “recursion law”

    ψ1 until ψ2 ⇔ ψ2 ∨ (ψ1 ∧ ○(ψ1 until ψ2))

It remains to define the acceptance condition of Aϕ: it ensures that ψ2 will eventually be satisfied whenever some state promises satisfaction of ψ1 until ψ2. (The healthiness conditions and next-state relation alone only ensure that ψ1 will be satisfied as long as ψ2 is not.) Let ψ1^1 until ψ2^1, ..., ψ1^k until ψ2^k be all subformulas of this form that occur in C(ϕ). The acceptance condition F is defined as {F1, ..., Fk} where Fi is the set of all states of Aϕ that either do not contain ψ1^i until ψ2^i or contain the formula ψ2^i.

This construction implies the existence of a characteristic Büchi automaton for any given PTL formula. The following proposition is due to Lichtenstein et al. (1985) and Vardi and Wolper (1994).

PROPOSITION 7. For every PTL formula ϕ of length n there exists a Büchi automaton with 2^O(n) states that accepts precisely the behaviors of which ϕ holds.

Combining Propositions 5 and 6, it follows that the satisfiability problem for PTL is solvable in exponential time. Although this result is asymptotically optimal (more precisely, the satisfiability problem for PTL is PSPACE-complete (Sistla 1985)), the construction outlined above inevitably yields a Büchi automaton of exponential size. Constructions that often avoid this exponential blow-up have been investigated (Gerth et al. 1995; Daniele et al. 1999) and are being used in practical implementations.

As an aside, Büchi automata are in fact more expressive than PTL formulas. To attain the same level of expressiveness (which, by Büchi’s theorem, is that of monadic second order logic of linear orders interpreted over the natural numbers), PTL can be augmented by so-called “automaton operators” (Wolper 1983), by fixed-point definitions (Stirling 1992) or by quantification over propositional variables. However, correctness properties for practical applications can usually be formulated in PTL, and few tools (Klarlund et al. 1998) support the more expressive variants.
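For small formulas, the state space of the automaton Aϕ constructed in this section can be enumerated directly. The following Python sketch is an illustration under stated assumptions, not the paper's implementation: formulas are encoded as nested tuples such as ("until", f, g), only the connectives "not", "or", "next" and "until" are handled, and the helper names (`neg`, `closure`, `healthy`, `states`, `acceptance_sets`) are hypothetical. The brute-force enumeration over all truth assignments makes the exponential size of the construction visible.

```python
from itertools import product


def neg(f):
    """Negate a formula, identifying double negation with the formula itself."""
    return f[1] if f[0] == "not" else ("not", f)


def subformulas(f):
    yield f
    for g in f[1:]:
        if isinstance(g, tuple):  # skip the name of an atomic proposition
            yield from subformulas(g)


def closure(phi):
    """Fischer-Ladner closure: subformulas of phi and their negations."""
    subs = set(subformulas(phi))
    return subs | {neg(f) for f in subs}


def healthy(f, q):
    """Healthiness conditions for one non-negated closure formula f."""
    if f[0] == "or":
        return (f in q) == (f[1] in q or f[2] in q)
    if f[0] == "until":
        if f in q:
            return f[2] in q or f[1] in q
        return f[2] not in q  # here the negated until formula is in q
    return True


def states(phi):
    """Maximal healthy subsets of the closure of phi."""
    positive = sorted((f for f in closure(phi) if f[0] != "not"), key=repr)
    result = []
    for bits in product([True, False], repeat=len(positive)):
        # Exactly one of f and its negation is chosen for every formula f.
        q = frozenset(f if b else neg(f) for f, b in zip(positive, bits))
        if all(healthy(f, q) for f in positive):
            result.append(q)
    return result


def acceptance_sets(phi):
    """One set per until formula: states without it or with its right argument."""
    qs = states(phi)
    untils = [f for f in closure(phi) if f[0] == "until"]
    return [{q for q in qs if f not in q or f[2] in q} for f in untils]


phi = ("until", ("ap", "a"), ("ap", "b"))
print(len(states(phi)), "states,", sum(phi in q for q in states(phi)), "initial")
```

The transition relation is omitted here; it would be obtained by filtering pairs of states against the four next-state conditions listed above.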

2.5. Variations on the Theme

Branching-time temporal logics (Ben-Ari et al. 1983; Emerson and Clarke 1980; Emerson 1990) are interpreted over an entire transition system (or, equivalently, over the tree of paths obtained by unwinding the transition system) rather than over its runs. They combine temporal connectives such as ○ and until with quantifiers over the set of paths that start at the current state in the underlying transition system. In particular, one can express possibility properties such as “every request by process 1 may eventually be honored”. One popular branching-time temporal logic is the Computation Tree Logic CTL, whose formulas are inductively defined as follows:

• Every atomic proposition v ∈ V is a formula.
• Boolean combinations of formulas are formulas.
• If ϕ and ψ are formulas then ∃○ϕ, ∃□ϕ, and ∃(ϕ until ψ) are formulas.

CTL formulas are interpreted at a state s of an infinite tree T, as follows:

• T, s |= v (for v ∈ V) iff s(v) = tt.
• The semantics of boolean combinations is defined in the usual way.
• T, s |= ∃○ϕ iff T, t |= ϕ for some son t of s in T.
• T, s |= ∃□ϕ iff for some path s = s0, s1, ... in T, T, si |= ϕ holds for all i ≥ 0.
• T, s |= ∃(ϕ until ψ) iff there exist a path s = s0, s1, ... in T and some i ≥ 0 such that T, si |= ψ and T, sj |= ϕ holds for all 0 ≤ j < i.

Derived operators include ∃◇ϕ ≡ ∃(true until ϕ), ∀○ϕ ≡ ¬∃○¬ϕ, and ∀□ϕ ≡ ¬∃◇¬ϕ. For example, the formula ∀□¬(owns1 ∧ owns2) expresses the mutual exclusion property of our resource manager, whereas ∀□(req1 ⇒ ∃◇owns1) asserts that whenever process 1 requests the resource it is at least possible that the request will eventually be honored, although there may also be executions that do not honor the request.

It turns out that the expressiveness of CTL and PTL is incomparable (Lamport 1983): whereas PTL is clearly incapable of expressing possibility properties, fairness properties cannot be expressed in CTL because path quantifiers and temporal operators strictly alternate. The logic CTL∗ lifts this restriction and (strictly) subsumes both PTL and CTL. In turn, CTL∗ is contained in Kozen’s version of the µ-calculus (Kozen 1983).

The choice of linear versus branching time logics was at the center of a heated debate some 20 years ago. Today, the choice is mainly guided by pragmatic considerations, including which types of properties are relevant for the given problem, and what tools are available.

Alternating-time temporal logics (Alur et al. 1997) refine the quantifiers of branching time logics by considering the different processes (or “agents”) that constitute a reactive system. It is then possible to assert, for example, that the manager can ensure mutual exclusion, or that it is possible for the manager and process 2 to cooperate to ensure eventual access to the resource for process 1.
Automata-theoretic characterizations of branching-time logic are based on tree automata (Thomas 1990, 1997), which again define a notion of regular tree languages. More recently, alternating automata have attracted considerable interest (Muller et al. 1988; Kupferman and Vardi 1997; Thomas 2000); they combine the “existential”, nondeterministic branching of Büchi automata with its dual, “universal” branching: given a state and an input symbol, the automaton chooses a set of states that will be active simultaneously. Therefore, runs of alternating automata over ω-words are infinite trees or dags (directed acyclic graphs). Alternating automata allow a rather uniform automata-theoretic presentation of decision procedures for linear-time, branching-time, and alternating-time logics (Vardi 1995).

3. MODEL CHECKING PTL SPECIFICATIONS

Building on the foundations of Section 2, we now present the basic model checking algorithm for PTL specifications over finite-state systems, discuss a class of optimization techniques, and as a concrete application present an analysis of a cryptographic protocol.

3.1. Basic PTL Model Checking Algorithm

The model checking problem is to decide, for a given transition system T and PTL formula ϕ, whether ϕ holds of T or not. Equivalently, we may ask whether there exists a run of T that does not satisfy ϕ. In the automata-theoretic framework presented in Section 2, we consider the product automaton of T and A¬ϕ, and decide whether it has an accepting run. For the construction of the product, T is considered as a Büchi automaton with trivial acceptance condition (remember that we had assumed T not to contain fairness conditions).

Formally, assume given a finite transition system T = (S, I, A, δT, α) and a PTL formula ϕ, represented by the corresponding Büchi automaton A¬ϕ = (Q, J, δA, F), and assume that the states of the transition system define the interpretation of the atomic formulas V of ϕ. The model checking algorithm operates on pairs (s, q) of states where s ∈ S is a state of T and q ∈ Q is a state of A¬ϕ. Call a pair (s0, q0) initial whenever s0 ∈ I and q0 ∈ J are initial states of T and A¬ϕ, and call (s′, q′) a successor of (s, q) if both (s, s′) ∈ δT and (q, s(V), q′) ∈ δA hold (where s(V) denotes the interpretation of the atomic formulas in state s), that is, if both components are possible successor states in T and A¬ϕ, respectively. Finally, call (s, q) accepting if q ∈ F.

The model checking algorithm schematically presented in Figure 2, essentially taken from Courcoubetis et al. (1992), explores all reachable states and simultaneously checks for acceptance cycles. The algorithm uses a stack (resulting in a depth-first search) that contains those pairs whose successors still need to be generated and a set of pairs that have already been encountered.
The main body of the algorithm is given by the procedure dfs, which can operate in one of two modes, as indicated by the boolean flag search_cycle. In the basic state exploration mode, search_cycle is false, and reachable pairs are generated until some accepting pair is encountered. At this point, the search switches to cycle search mode and tries to find a path back to the accepting pair.

    dfs(boolean search_cycle) {
      p = top(stack);
      foreach (q in successors(p)) {
        if (search_cycle and (q == seed))
          report acceptance cycle and exit;
        if ((q, search_cycle) not in seen) {
          enter (q, search_cycle) into seen;
          push q onto stack;
          dfs(search_cycle);
          if (not search_cycle and (q is accepting)) {
            // dfs has popped q: push it again so that the cycle search starts at q
            seed = q; push q onto stack; dfs(true);
          }
        }
      }
      pop(stack);
    }

    // initialization
    seen = emptyset(); stack = emptystack();
    foreach initial pair p {
      push p onto stack;
      enter (p, false) into seen;
      dfs(false);
      if (p is accepting) {
        seed = p; push p onto stack; dfs(true);
      }
    }

Figure 2. Basic PTL model checking algorithm.

As shown in Courcoubetis et al. (1992), the algorithm is guaranteed to find some acceptance cycle if one exists, although in general not all cycles will be generated (even if the search were continued instead of exiting). This so-called on-the-fly algorithm interleaves the construction of the state space and the search for cycles and stops as soon as some cycle has been found. In particular, it is not necessary to construct (and store) the entire product automaton. When a cycle is found, the contents of the stack represent a behavior that is both a run of T and satisfies ¬ϕ, and thus provides a counterexample of why the property does not hold of the transition system.

The algorithm of Figure 2 has complexity linear in the product of the sizes of T and of A¬ϕ (by Proposition 7, the latter can be exponential in the length of ϕ). In practice, the exponential factor in the size of the formula is not really problematic, because formulas used as correctness assertions tend to be short. The limiting factor is the size of the model. Typically, the model is not supplied explicitly, but is described in programming language-like notations that may contain state variables, communication channels, parallel processes etc. (see also Section 3.3). The models generated from such descriptions can be of size exponential in the size of the description, and therefore the algorithm is in fact exponential in both of its inputs. Current technology limits the applicability of explicit-state model checking to models with several million reachable states. The following section describes techniques that try to overcome this limit, while Section 4 describes how bigger, and even infinite-state systems, can be reduced to systems of manageable size.
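The nested depth-first search described in this section can also be written as a short executable Python sketch. It is a variant for illustration, not the code of any particular model checker: the function name `find_acceptance_cycle`, the representation of pairs as arbitrary hashable nodes (assumed to differ from None), and the callbacks `successors` and `accepting` are assumptions. Recursion is used for brevity, so the sketch only suits small graphs.

```python
def find_acceptance_cycle(initial, successors, accepting):
    """Return (prefix, cycle) for some reachable acceptance cycle, or None."""
    seen = set()   # pairs (node, search_cycle), as in the seen set of Figure 2
    stack = []     # current depth-first search path
    found = []     # receives the seed once a cycle has been closed

    def dfs(p, seed):
        search_cycle = seed is not None
        for q in successors(p):
            if search_cycle and q == seed:
                found.append(seed)       # closed a cycle through the seed
                return True
            if (q, search_cycle) not in seen:
                seen.add((q, search_cycle))
                stack.append(q)
                if dfs(q, seed):
                    return True
                # q is fully explored: start the nested search in post-order
                if not search_cycle and accepting(q) and dfs(q, q):
                    return True
                stack.pop()
        return False

    for p in initial:
        if (p, False) in seen:
            continue
        seen.add((p, False))
        stack.append(p)
        if dfs(p, None) or (accepting(p) and dfs(p, p)):
            i = stack.index(found[0])    # position of the seed on the path
            return stack[:i], stack[i:]  # finite prefix, then the repeated part
        stack.pop()
    return None


# Hypothetical product graph: node 2 is accepting and lies on the cycle 1 -> 2 -> 1.
graph = {0: [1], 1: [2], 2: [1, 3], 3: []}
print(find_acceptance_cycle([0], lambda n: graph[n], lambda n: n == 2))
```

When a cycle is reported, the search path is returned split at the seed, which corresponds to reading the counterexample off the stack as described above.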

3.2. Partial-order reduction techniques

Models of reactive systems often contain symmetry that can be exploited to optimize the model checking procedure. One class of optimizations that is particularly successful in the case of asynchronous systems relies on the fact that systems are composed of individual processes that operate largely independently, except for occasional synchronization. The product of the transition systems corresponding to each process, which represents the entire system, contains all interleavings of the actions of the individual processes, but for most properties the relative order of independent actions is irrelevant.

Formally, for every action a ∈ A of a given transition system T let pre(a) denote the set of states s for which there is some successor state t (i.e., (s, t) ∈ δ) such that a ∈ α(s, t), and for every state s ∈ pre(a) denote by post(a, s) the set of successor states that can be reached by action a. Call two actions a, b ∈ A independent if all of the following conditions hold:

• For all s ∈ pre(a) and t ∈ post(a, s), t ∈ pre(b) iff s ∈ pre(b).
• For all s ∈ pre(b) and t ∈ post(b, s), t ∈ pre(a) iff s ∈ pre(a).
• For all s ∈ pre(a) ∩ pre(b), ⋃_{t ∈ post(a,s)} post(b, t) = ⋃_{t ∈ post(b,s)} post(a, t).

In other words, independent actions neither enable nor disable each other, and their executions commute in that the same states are reachable when the actions are executed in either order. Define trace equivalence on finite sequences of actions as the smallest equivalence relation that relates any two sequences that differ by the exchange of adjacent independent actions. The definition of independence ensures that from any given state, trace equivalent action sequences lead to the same set of result states; hence for the analysis of reachability it suffices to consider one representative of equivalent traces.
Partial-order reduction algorithms (Valmari 1990; Godefroid and Wolper 1994; Holzmann and Peled 1994; Penczek 1999) differ in how this idea is extended to full PTL model checking and how it is implemented as a practical algorithm with low overhead. First, the semantic definition of independence given above is approximated by a sufficient syntactic criterion appropriate for the particular modelling language. For example, in a language based on shared variables, two actions of different processes are certainly independent if they do not update the same variable. For message passing systems, send and receive operations concerning the same channel are independent at those states where the channel is neither empty nor full. Second, simple criteria ensure that at least one, and ideally exactly one, representative from every class of trace equivalent action sequences is considered. Finally, for model checking the formula ϕ being analysed must be taken into account: call an action visible if it changes any variable that occurs in ϕ. Holzmann and Peled (1994) define an action to be safe if it is not visible and if it is independent (w.r.t. the syntactic approximation of independence) of all actions of different processes. In the depth-first search algorithm shown in Figure 2 it is then enough to consider the successor states for some process all of whose actions are safe, in effect delaying the actions of the other processes. However, the delayed actions must be considered when some state is reached that has already been encountered before. This rather simple heuristic often leads to substantial savings and, in contrast to more elaborate algorithms, carries almost no overhead because the set of safe actions can be determined statically.
In general, the effectiveness of partial-order reductions depends on the structure of the system: while they are useless for tightly synchronized systems, exponential reductions in the numbers of states and transitions explored during model checking may be obtained for loosely coupled, asynchronous systems.
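The semantic definition of independence given earlier in this section can be checked directly on a small, explicitly given transition system. The Python sketch below is an illustration only: the function name `independent`, the encoding of transitions as (source, action, target) triples, and the two-bit example system are assumptions. Practical tools use the syntactic approximations discussed above rather than this exhaustive check.

```python
def independent(transitions, a, b):
    """Check the three independence conditions for actions a and b."""

    def pre(x):
        return {s for (s, y, _) in transitions if y == x}

    def post(x, s):
        return {t for (u, y, t) in transitions if u == s and y == x}

    pre_a, pre_b = pre(a), pre(b)
    # Executing a neither enables nor disables b.
    for s in pre_a:
        for t in post(a, s):
            if (t in pre_b) != (s in pre_b):
                return False
    # Executing b neither enables nor disables a.
    for s in pre_b:
        for t in post(b, s):
            if (t in pre_a) != (s in pre_a):
                return False
    # Where both are enabled, the two execution orders reach the same states.
    for s in pre_a & pre_b:
        a_then_b = set().union(*(post(b, t) for t in post(a, s)))
        b_then_a = set().union(*(post(a, t) for t in post(b, s)))
        if a_then_b != b_then_a:
            return False
    return True


# Hypothetical system with states (x, y): "a" sets x, "b" sets y, "c" resets x.
transitions = {
    ((0, 0), "a", (1, 0)), ((0, 1), "a", (1, 1)),
    ((0, 0), "b", (0, 1)), ((1, 0), "b", (1, 1)),
    ((1, 0), "c", (0, 0)), ((1, 1), "c", (0, 1)),
}
print(independent(transitions, "a", "b"))
print(independent(transitions, "a", "c"))
```

In the example, "a" and "b" touch different variables and commute, whereas "a" enables "c", so the first condition fails for that pair.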

3.3. Analysis of the Needham-Schroeder Public-key Protocol

As a concrete application of PTL model checking we describe the use of the model checker SPIN (Holzmann 1991) to analyze a public-key cryptographic protocol suggested by Needham and Schroeder. SPIN implements the techniques presented in the previous sections.

3.3.1. Description of the protocol.

The objective of the protocol, symbolically represented in Figure 3, is to allow two agents A(lice) and B(ob) to agree on a shared secret, represented by a pair (N_A, N_B) of random numbers. The agents are assumed to have previously generated a corresponding pair of public and private keys and to know each other’s public key. Integrity of these keys ensures that a message M encrypted with, say, A’s public key (denoted as ⟨M⟩_A) can only be decrypted by agent A.

Figure 3. Needham-Schroeder public-key protocol.

The protocol requires A and B to exchange three messages:

1. Alice initiates the protocol and sends the message ⟨A, N_A⟩_B to Bob. The first part of the message tells Bob the identity of the agent who wants to establish a common secret. The second part contains the number chosen by his partner.
2. Bob generates a random number N_B and responds with the message ⟨N_A, N_B⟩_A. Since that message contains Alice’s nonce N_A from the first message, which only Bob could decrypt, Alice concludes that the message must originate with Bob and accepts the pair (N_A, N_B) as the shared secret.
3. Finally, Alice responds with the message ⟨N_B⟩_B. Following the same line of reasoning as for message 2, upon receipt of that message Bob is convinced that it originates with Alice and also accepts (N_A, N_B) as the secret shared between the two agents.

The protocol is intended to be used over an unsafe medium. More precisely, an attacker may (besides acting like any honest agent) intercept messages intended for other agents and perhaps replay them later. However, because we are only interested in attacks on the protocol rather than on the encryption algorithm, we assume that even an attacker can only decrypt messages encrypted with his own public key. The protocol contains a flaw, and the reader is invited to find it before continuing. The error was discovered, using model checking technology, some 17 years after the protocol was first published (Lowe 1996).

3.3.2. A SPIN Model

In order to analyse the protocol via model checking we must construct a finite-state model of the protocol. This step is the most difficult one in the application of model checking, and it requires concentrating on those aspects of the system that are relevant for the properties of interest. It is important to clearly identify any simplifying assumptions and restrictions made for the model: the subsequent analysis concerns the model, not the real system, and therefore the model should be validated, for example by performing simulations, by code review, and similar software engineering practice, to ensure that the results of model checking are relevant for the system. For our example, we make the following simplifying assumptions:

1. There are only three agents present in the network, namely A, B, and I(ntruder).
2. The honest agents A and B are only capable of participating in one run of the protocol each where A acts as the initiator (sending messages 1 and 3) and B as the responder (sending message 2). They block upon receipt of an unexpected message.
3. Agent I is capable of temporarily storing exactly one intercepted message.

These assumptions imply that complex protocol errors caused by the interference of several runs of the protocol by the same agent cannot be detected. In general, model checking should be regarded as a debugging rather than a verification technique. (The formal verification of cryptographic protocols has, for example, been studied by Paulson (1999).)

With these assumptions, it is straightforward to write a model for the honest agents A and B in SPIN’s modelling language PROMELA (“protocol meta-language”), which has been designed for the analysis of communication protocols and contains primitives for the sending and receipt of messages over channels. Figure 4 contains an almost verbatim excerpt from the actual PROMELA code for agent A.
The mtype declaration defines a number of symbolic constants, including the names, keys, and nonces of the three agents. A typedef declaration is used to represent blocks of encrypted data as triples that consist of the key and two data fields. (Of course, no real encryption takes place in the model!) Finally, a message in transit is represented as a tuple consisting of the message number, the intended receiver, and a block of encrypted data. All communication takes place over a globally known channel on which every process can listen and send messages. The size of the model is reduced by declaring the channel to be unbuffered, which enforces synchronous communication between the agents, although this assumption is unimportant for this protocol.

Agent A first chooses a partner (either B or I) for the protocol run, and looks up the corresponding public key. She then sends the first message as prescribed by the protocol and waits for somebody to send her a message of the second type. If that message’s encrypted part is unreadable or does not contain her own nonce she blocks (modelled as an infinite loop), otherwise she extracts the partner’s nonce and responds with message 3. The assignment to the global variable partnerA indicates successful termination of A’s run of the protocol and is used in the correctness assertion. The code for agent B is similar and omitted.

    mtype = {msg1, msg2, msg3, agentA, agentB, agentI,
             nonceA, nonceB, nonceI, keyA, keyB, keyI};
    typedef Crypt { mtype key, info1, info2; }
    chan network = [0] of {mtype, /* msg# */
                           mtype, /* receiver */
                           Crypt};

    active proctype A() {
      mtype partner, pkey, pnonce;
      Crypt data;

      if /* choose a partner for this run */
      :: partner = agentB; pkey = keyB;
      :: partner = agentI; pkey = keyI;
      fi;
      network!msg1(partner, {pkey, agentA, nonceA});
      network?msg2(agentA, data);
      do
      :: (data.key == keyA) && (data.info1 == nonceA) -> break;
      od;
      pnonce = data.info2;
      network!msg3(partner, {pkey, pnonce, 0});
      partnerA = partner;
    }

Figure 4. PROMELA code for agent A.

The code for agent I, part of which is shown in Figure 5, does not prescribe a fixed protocol, since we are trying to use SPIN to find the attack. Rather, it describes the actions that agent I can perform at any given moment. The overall structure is that of an infinite loop that offers a nondeterministic choice between either receiving a message from the network or sending one of the three possible messages. The first alternative models reception of a message from the network (irrespective of the actually intended receiver, modelling interception of messages). Agent I may then store the message in the local variable intercepted, even if it cannot decrypt the message. On the other hand, if the message is encrypted with its own public key, it may analyze the data part and possibly extract the nonces used by agents A and B.

    active proctype I() {
      bool knows_nonceA, knows_nonceB;
      mtype msg, type_intercepted;
      Crypt data, intercepted;
      do
      :: network?msg(_, data) ->
           if /* Perhaps store the message */
           :: intercepted = data; type_intercepted = msg;
           :: skip;
           fi;
           if /* Try to decrypt the message */
           :: (data.key == keyI) ->
                if
                :: (data.info1 == nonceA) || (data.info2 == nonceA)
                     -> knows_nonceA = true;
                :: else -> skip;
                fi;
                if
                :: (data.info1 == nonceB) || (data.info2 == nonceB)
                     -> knows_nonceB = true;
                :: else -> skip;
                fi;
           :: else -> skip;
           fi;
      :: if /* Replay or send msg1 to B */
         :: (type_intercepted == msg1) ->
              network!msg1(agentB, intercepted);
         :: data.key = keyB;
            if
            :: data.info1 = agentA;
            :: data.info1 = agentI;
            fi;
            if
            :: knows_nonceA -> data.info2 = nonceA;
            :: knows_nonceB -> data.info2 = nonceB;
            :: data.info2 = nonceI;
            fi;
            network!msg1(agentB, data);
         fi;
      :: ... /* Replay or send msg2 or msg3 */
      od;
    }

Figure 5. PROMELA code for agent I.

Figure 6. Message sequence chart visualizing the attack.

The second alternative represents the emission of a message of the first type to agent B (there is obviously no point in sending such a message to either A or I). There are two subcases: first, agent I may have previously intercepted such a message and may replay it from its store. Second, it may construct the message from the data at its disposal, possibly using nonces learnt from messages received earlier. It may pretend that the message was sent by either A or I (it would not make sense to masquerade as B). The remaining alternatives model the emission of the other message types and are similar.

3.3.3. Analysis of the model

We want to verify that whenever both agents A and B have successfully performed a run of the protocol then A believes that she shares the secret with B if and only if B believes that he shares the secret with A. This property is expressed by the PTL formula

    □(partnerA ≠ 0 ∧ partnerB ≠ 0 ⇒ (partnerA = agentB ⇔ partnerB = agentA))

(All PROMELA variables are initialized to 0.) Given the model of the three agents and this formula, SPIN runs for a few seconds and reports a behavior that violates the property and constitutes an attack on the protocol. SPIN visualizes the communications between the three agents in the form of the message sequence chart shown in Figure 6: Alice initiates a run of the protocol with Intruder who in turn (masquerading as A) starts a run with Bob, using the nonce received from A. Agent B replies with his nonce, encrypted with A’s key. Now, agent I cannot decrypt this message, but forwards it to A. Unsuspecting, since she finds her nonce from the first message, A declares success and returns the second nonce to her partner I. This time, I is able to decrypt the message, extract the nonce, and send it to B, who also declares success. The result is a configuration where A believes that she shares the secret with I, but B believes that he shares it with A. To avoid this attack, the second message in the protocol should explicitly identify the sender. A variant where the second message is replaced by ⟨B, N_A, N_B⟩_A satisfies the desired property, as SPIN confirms, but of course the analysis of the simplified model does not establish the integrity of the modified protocol.
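The attack trace reported above can be replayed step by step in a few lines of Python. This is only a scripted replay of that single trace under a symbolic encryption model, not a state-space search as performed by SPIN. The helpers `enc` and `dec`, the function `replay_attack`, and the string names for agents and nonces are assumptions introduced for illustration; the flag `fixed` switches to the variant in which message 2 names its sender.

```python
def enc(owner, *fields):
    """Symbolic encryption: only `owner` can read `fields`."""
    return (owner, fields)


def dec(agent, message):
    """Return the fields if `agent` owns the key, else None."""
    owner, fields = message
    return fields if owner == agent else None


def replay_attack(fixed=False):
    """Replay the trace of the attack; returns (partnerA, partnerB, intruder nonces)."""
    partner_a = partner_b = None
    known = {"NI"}                         # nonces known to the intruder

    m1 = enc("I", "A", "NA")               # message 1: A starts a run with I
    _, nonce = dec("I", m1)
    known.add(nonce)                       # I learns A's nonce

    m1_fake = enc("B", "A", nonce)         # I masquerades as A towards B
    claimed, n_a = dec("B", m1_fake)

    # Message 2: B answers the claimed initiator (naming himself in the fixed variant).
    m2 = enc(claimed, "B", n_a, "NB") if fixed else enc(claimed, n_a, "NB")
    assert dec("I", m2) is None            # I cannot read it but forwards it to A

    fields = dec("A", m2)
    if fixed:
        origin, n_a_echo, n_b = fields
        if origin != "I":                  # A expected an answer from her partner I
            return partner_a, partner_b, known
    else:
        n_a_echo, n_b = fields
    if n_a_echo != "NA":
        return partner_a, partner_b, known
    partner_a = "I"                        # A declares success

    m3 = enc("I", n_b)                     # message 3: A returns the second nonce to I
    (leaked,) = dec("I", m3)
    known.add(leaked)                      # I learns B's nonce

    (n_b_echo,) = dec("B", enc("B", leaked))   # I completes B's run
    if n_b_echo == "NB":
        partner_b = claimed                # B declares success, believing in A
    return partner_a, partner_b, known


print(replay_attack())
print(replay_attack(fixed=True))
```

In the unmodified protocol the replay ends with A's partner set to I and B's partner set to A, which violates the property; in the modified variant A blocks on the forwarded message and neither run completes along this trace.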

3.4. CTL model checking

Model checking algorithms for CTL have traditionally used a completely different approach based on fixed-point computations (Clarke et al. 1986; 1996): Define the satisfaction set [[ϕ]] of a CTL formula ϕ in a transition system T as the set of states s such that T, s |= ϕ. The model checking problem can then be restated as deciding whether I ⊆ [[ϕ]] holds. Satisfaction sets can be recursively computed as follows:

    [[v]]             = {s : s(v) = tt}   (for v ∈ V)
    [[¬ϕ]]            = S \ [[ϕ]]
    [[ϕ ∨ ψ]]         = [[ϕ]] ∪ [[ψ]]
    [[∃○ϕ]]           = δ⁻¹([[ϕ]]) = {s : t ∈ [[ϕ]] for some t s.t. (s, t) ∈ δ}
    [[∃□ϕ]]           = ν λX. [[ϕ]] ∩ δ⁻¹(X)
    [[∃(ϕ until ψ)]]  = µ λX. [[ψ]] ∪ ([[ϕ]] ∩ δ⁻¹(X))

where νf (resp., µf) denotes the greatest (resp., smallest) fixed point of a function f that maps sets of states to sets of states. These fixed points can be computed effectively because S is finite. The definitions follow the recursive characterizations of the CTL operators:

    ∃□ϕ ⇔ ϕ ∧ ∃○∃□ϕ
    ∃(ϕ until ψ) ⇔ ψ ∨ (ϕ ∧ ∃○∃(ϕ until ψ))

which are easily verified from the CTL semantics. The algorithms to compute [[ϕ]] given in Clarke et al. (1986) have complexity linear in the product of the sizes of T and ϕ. Therefore, the exponential blow-up incurred for PTL by the translation to Büchi automata can be avoided for CTL. Again, note that the dominant factor is the size of the transition system, and that the CTL model checking problem is also exponential (PSPACE-complete) when the description of the model is considered as the input given to the model checker. The algorithm can be refined to take into account fairness conditions, which are not expressible as CTL formulas.

A very efficient way to implement the computation of satisfaction sets is in the form of so-called symbolic model checking algorithms based on compact data structures such as binary decision diagrams (Bryant 1986) to represent sets of states without explicitly enumerating system states.
Symbolic CTL model checking was first implemented in the SMV system (McMillan 1993). It performs very well when the state space of the system is sufficiently regular, and when the fixed-point computations converge quickly, and important practical systems with rather large state space have been analysed in this way (Clarke et al. 1993). Some common data structures such as queues are difficult to represent as BDDs, and special-purpose data structures have been proposed. Symbolic model checking can also be used for PTL specifications (Clarke et al. 1994); the idea is to introduce a new state variable vψ for every subformula ψ of the original specification, and to express the transition and acceptance conditions of the Büchi automaton in terms of these new variables.

It is difficult to compare the performance of symbolic model checkers such as SMV and of explicit-state model checkers with partial-order reductions such as SPIN because they are optimized for different types of systems. In practice, SMV tends to perform better for hardware or synchronous systems with relatively short data paths, while SPIN often does better for software or asynchronous systems.

More recently, algorithms for CTL model checking based on a class of alternating tree automata have been developed and implemented (Bernholtz et al. 1994; Leucker 1999) whose complexity matches that of the fixed-point algorithm described above.
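The fixed-point characterization of satisfaction sets given in this section translates almost literally into code. The Python sketch below works on explicit sets of states rather than on BDDs, so it illustrates the recursion but not the symbolic representation. The tuple encoding of formulas, the operator tags "EX", "EG" and "EU", the function names `pre` and `sat`, and the three-state example are assumptions made for illustration; the transition relation is assumed to be total.

```python
def pre(delta, target):
    """Inverse image of delta: states with at least one successor in `target`."""
    return {s for (s, t) in delta if t in target}


def sat(f, states, delta, label):
    """Satisfaction set of the CTL formula f, computed by structural recursion."""
    op = f[0]
    if op == "true":
        return set(states)
    if op == "ap":
        return {s for s in states if f[1] in label[s]}
    if op == "not":
        return set(states) - sat(f[1], states, delta, label)
    if op == "or":
        return sat(f[1], states, delta, label) | sat(f[2], states, delta, label)
    if op == "EX":
        return pre(delta, sat(f[1], states, delta, label))
    if op == "EG":  # greatest fixed point: iterate downwards from all states
        body = sat(f[1], states, delta, label)
        x = set(states)
        while True:
            nxt = body & pre(delta, x)
            if nxt == x:
                return x
            x = nxt
    if op == "EU":  # least fixed point: iterate upwards from the empty set
        left = sat(f[1], states, delta, label)
        right = sat(f[2], states, delta, label)
        x = set()
        while True:
            nxt = right | (left & pre(delta, x))
            if nxt == x:
                return x
            x = nxt
    raise ValueError("unknown operator: %r" % (op,))


# Hypothetical system: states 0 and 1 satisfy p and form a cycle, state 2 satisfies q.
states = {0, 1, 2}
delta = {(0, 1), (1, 0), (1, 2), (2, 2)}
label = {0: {"p"}, 1: {"p"}, 2: {"q"}}
p, q = ("ap", "p"), ("ap", "q")
print(sat(("EG", p), states, delta, label))
print(sat(("EU", p, q), states, delta, label))
```

Derived operators can be expressed through the primitives, for instance the universal invariance operator as ("not", ("EU", ("true",), ("not", f))).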

4. ABSTRACTION TECHNIQUES

The model checking techniques that we have described in this survey operate on finite-state models. There has also been active research on model checking for restricted classes of infinite-state systems (Moller 1996; Esparza 1996; Esparza et al. 1999). Of course, this requires that the state space be sufficiently regular for interesting properties to be decidable, so these techniques cannot yield general-purpose methods.

Although physical systems are ultimately finite-state systems, they may well be too big to be amenable to model checking. Besides, it is often useful to model systems as having infinitely many states when one is not interested in the specific limitations imposed by concrete hardware. Examples include synchronization protocols for an arbitrary number of processes or communication protocols over networks with arbitrary buffering capacity. In this section, we discuss abstraction techniques that allow properties to be verified for large, and even infinite-state, models by verifying a small, finite-state abstraction. The abstract model is constructed semi-automatically, relying on abstract interpretation and/or on interactive theorem provers. However, the runs of the abstract system are analysed automatically using model checking.

4.1. Boolean Abstractions

Abstract interpretation (Cousot and Cousot 1977) provides a generic framework for the construction of abstract models. It requires the state spaces of the concrete and abstract models to be related by a Galois connection, a pair (α, γ) of functions where the abstraction function α associates an element of a lattice of abstract states to each concrete state, whereas the concretization function γ maps every abstract state to a set of concrete states such that α and γ are consistent with each other. Lattice-theoretic operations on the abstract states correspond to set-theoretic operations on sets of concrete states.

In our application, we are interested in verifying properties expressed in temporal logic, and it is convenient to choose the Boolean algebra generated by a finite set of basic assertions V as the abstract domain. The choice of suitable basic assertions in general requires creativity on the side of the verifier. Obviously, the state predicates that appear in the formula to be verified should be included in the set of basic assertions. State predicates that occur as preconditions of actions are also natural candidates for inclusion into V. In more complicated situations, it is necessary to invent auxiliary predicates that may be suggested by an informal correctness argument.

Given a (possibly infinite-state) transition system T = (S, I, A, δ, α) and a finite set V of basic assertions that can be evaluated over S, we say that T̄ = (S̄, Ī, A, δ̄, ᾱ) is a Boolean abstraction of T with respect to V if the following conditions are satisfied:

• S̄ = 2^V; we interpret a state p ∈ S̄ as the conjunction of those basic assertions v ∈ V that are contained in p and the negations of those in V \ p.
• p ∈ Ī if there exists some concrete state s ∈ I such that p holds of s.
• (p, q) ∈ δ̄ if there exist concrete states (s, t) ∈ δ such that p holds of s and q holds of t.
• a ∈ ᾱ(p, q) if there exist concrete states s, t ∈ S such that a ∈ α(s, t), p holds of s, and q holds of t.

Figure 7. Dining mathematicians protocol.

PROPOSITION 8. Assume that T is a transition system and that $\overline{T}$ is a Boolean abstraction of T with respect to some set V of basic assertions. For every run $\rho = r_0 r_1 \ldots$ of T there is a run $\overline{\rho} = \overline{r}_0 \overline{r}_1 \ldots$ of $\overline{T}$ such that $\overline{r}_i$ holds of $r_i$, for all $i \in \mathbb{N}$.

Proposition 8 asserts that every run of the concrete system T has a corresponding run in the abstract system $\overline{T}$. In particular, any run that violates some PTL formula ϕ built from the set V of basic assertions has an abstract counterpart in $\overline{T}$, which also violates ϕ. Conversely, any property that can be verified for $\overline{T}$ also holds for T. In fact, it is not necessary to choose the full powerset of V as the set of abstract states: it is enough that $\overline{S}$ contains abstract states that represent every reachable concrete state.

4.2. Example: The Dining Mathematicians

The “dining mathematicians” problem due to Dams et al. (1994) serves to illustrate the use of Boolean abstractions. Figure 7 illustrates the protocol: two processes P1 and P2 alternate between “thinking” and “eating” states. To ensure exclusive use of the dining room, they synchronize via a shared variable n, initialized to some positive natural number. Process P1 may eat only if n is even, whereas P2 must wait for n to become odd. When leaving the “eating” state, P1 and P2 update n such that eventually the other process will be able to eat. We wish to verify the following properties:

(Pos) $\Box(n > 0)$
(Excl) $\Box\neg(\mathit{eat}_1 \land \mathit{eat}_2)$
(Live1) $\Box\Diamond\,\mathit{eat}_1$
(Live2) $\Box\Diamond\,\mathit{eat}_2$

The first two formulas assert invariance properties stating that n is always positive and that the two processes are never simultaneously at the “eating” state. The remaining two formulas assert the liveness properties that each process will eat infinitely often (“starvation freedom”).

Figure 8. An abstraction of the dining mathematicians protocol.

The protocol and the properties under investigation suggest choosing V = {eat1, eat2, n>0, even(n)} as the set of basic assertions. A Boolean abstraction of the original protocol (more precisely, the reachable part of such an abstraction) is shown in Figure 8; the action labelling has been omitted. The states marked q0 and q1 are the initial states of the abstraction, since the initial condition does not specify whether initially n is even or odd. The transitions between the abstract states correspond to all possible transitions of the concrete system. For example, from the concrete states represented by the abstract state q2 the only possible action is the eat1 action of process P1. However, the information that n is a positive even number is not enough to conclude whether n will be even or odd after division by 2. In this simple example, the abstraction can be computed automatically, relying on elementary arithmetical facts such as

$n > 0 \land \neg\,\mathit{even}(n) \Rightarrow \mathit{even}(3 * n + 1)$

Evaluating the formulas listed above over the abstract model, we immediately see that (Pos) and (Excl) hold since all reachable states contain the formula n>0, and none contains both eat1 and eat2. Similarly, formula (Live1) is satisfied because state q2, which satisfies eat1, is reached from any other state after finitely many steps. On the other hand, formula (Live2) cannot be verified over the given abstraction: the run that alternates between states q0 and q2 never visits a state that satisfies eat2. In fact, (Live2) cannot be verified over any finite-state abstraction without using additional information: for all $k \in \mathbb{N}$, when $n = 2^k$, process P1 may eat k times in succession before P2 gets a chance to eat. Hence, any finite-state abstraction of the protocol must contain a cycle that does not contain a state satisfying eat2 and therefore invalidates property (Live2).
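The construction of the abstraction and the two invariance checks can be tried out with a small program. The following Python sketch is not taken from the paper. It assumes that the concrete protocol reads as the text suggests (P1 starts eating when n is even and halves n on leaving; P2 starts eating when n is odd and sets n to 3 ∗ n + 1 on leaving), and it only approximates the Boolean abstraction by exploring a bounded sample of concrete states. The names successors, abstract and boolean_abstraction, and the bounds max_n and max_depth, are illustrative.

```python
from collections import deque


def successors(state):
    """Concrete transitions (assumed reading of the protocol of Figure 7)."""
    p1, p2, n = state  # p1, p2 are "think" or "eat"; n is the shared variable
    result = []
    if p1 == "think" and n % 2 == 0:
        result.append(("eat", p2, n))            # P1 starts eating when n is even
    if p1 == "eat":
        result.append(("think", p2, n // 2))     # P1 leaves and halves n
    if p2 == "think" and n % 2 == 1:
        result.append((p1, "eat", n))            # P2 starts eating when n is odd
    if p2 == "eat":
        result.append((p1, "think", 3 * n + 1))  # P2 leaves and makes n even
    return result


def abstract(state):
    """Truth values of the basic assertions V = {eat1, eat2, n>0, even(n)}."""
    p1, p2, n = state
    return (p1 == "eat", p2 == "eat", n > 0, n % 2 == 0)


def boolean_abstraction(max_n=16, max_depth=12):
    """Abstract initial states, states and edges seen in a bounded exploration."""
    initial = [("think", "think", n) for n in range(1, max_n + 1)]
    abs_init = {abstract(s) for s in initial}
    abs_states, abs_delta = set(abs_init), set()
    seen = set(initial)
    queue = deque((s, 0) for s in initial)
    while queue:
        s, depth = queue.popleft()
        if depth == max_depth:
            continue  # bounded exploration: do not expand any further
        for t in successors(s):
            abs_states.add(abstract(t))
            abs_delta.add((abstract(s), abstract(t)))
            if t not in seen:
                seen.add(t)
                queue.append((t, depth + 1))
    return abs_init, abs_states, abs_delta


abs_init, abs_states, abs_delta = boolean_abstraction()
# (Pos): every abstract state contains n>0; (Excl): none contains eat1 and eat2.
pos_holds = all(n_pos for (_, _, n_pos, _) in abs_states)
excl_holds = not any(e1 and e2 for (e1, e2, _, _) in abs_states)
print(len(abs_states), len(abs_delta), pos_holds, excl_holds)
```

Because the exploration is bounded, the computed edge set is in general only a subset of the abstract transition relation; a sound abstraction requires the kind of arithmetical reasoning mentioned above. Under the stated assumptions, the sample already contains the edges from q0 to q2 and back that form the cycle discussed in the text.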

4.3. Strengthening Boolean Abstractions

The failure to verify some property over the abstract model does not imply that the property is false for the concrete model: the identification of distinct concrete states makes the abstract model contain executions that have no concrete counterpart. In particular, “false loops” such as that between states q0 and q2 in our example invalidate liveness properties. Still, the counterexample reported for the abstract model points to a way of strengthening the abstract model without refining the abstraction: in hand proofs, well-founded orderings are used to verify that the system eventually exits from the cycle. Fortunately, this idea can also be applied to break cycles in the abstract model.

Assume given a valuation function ν : S → D from the states of the concrete transition system into some well-founded ordering $(D, \preceq)$. An edge labelling $\overline{\nu} : \overline{\delta} \to \{\prec, \preceq, -\}$ is an ordering annotation for a Boolean abstraction $\overline{T} = (\overline{S}, \overline{I}, A, \overline{\delta}, \overline{\alpha})$ and ν if

• $\overline{\nu}(p, q) = {\prec}$ implies that $\nu(t) \prec \nu(s)$ for all concrete states (s, t) ∈ δ such that p holds of s and q holds of t, and
• $\overline{\nu}(p, q) = {\preceq}$ implies that $\nu(t) \preceq \nu(s)$ for all concrete states (s, t) ∈ δ such that p holds of s and q holds of t.

The idea is that for every ≺-labelled edge that is taken in some run of the abstract system, the valuation function ν strictly decreases for every corresponding run in the concrete system, and similarly for ⪯-labelled edges. The principle of well-founded orderings implies that only finitely many transitions that decrease ν are possible between any two transitions that may increase ν. Now consider the PTL formulas

$\mathit{Decr} \equiv \bigvee_{\overline{\nu}(p,q) = \prec} p \land \bigcirc q \qquad \mathit{NonDecr} \equiv \bigvee_{\overline{\nu}(p,q) = -} p \land \bigcirc q$

Then the principle of well-founded orderings justifies strengthening the abstract system by the formula $\Box\Diamond\mathit{Decr} \Rightarrow \Box\Diamond\mathit{NonDecr}$. That is, any property verified for $\overline{T}$ under this additional hypothesis holds of the concrete system.
For the dining mathematicians protocol, we define ν(s) = s(n) as the value of n at state s, and choose the standard ordering ≤ on the natural numbers as our well-founded ordering. The annotation $\overline{\nu}$ shown in Figure 9 is an ordering annotation for the abstraction of Figure 8; this is justified by elementary arithmetical facts such as

$n > 0 \land \mathit{even}(n) \Rightarrow n \;\mathrm{div}\; 2 < n$

The annotation $\overline{\nu}$ is used to break the cycle between states q0 and q2, along which the valuation decreases infinitely often, but never increases. In fact, property (Live2) can be verified by model checking under the hypothesis $\Box\Diamond\mathit{Decr} \Rightarrow \Box\Diamond\mathit{NonDecr}$.

Figure 9. An ordering annotation for the abstraction of Figure 8.
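The effect of the strengthening can be illustrated on the abstract graph itself. The sketch below is a hedged illustration rather than the model checking procedure used in the paper. It hard-codes an assumed reading of Figures 8 and 9 (q0 and q1: both processes thinking with n even or odd, q2: P1 eating, q3: P2 eating; edge labels derived from how each step changes n) and looks for a cycle that avoids eat2 and is not excluded by the hypothesis. The names EDGES, reachable, has_cycle and live2_counterexample are hypothetical.

```python
# Abstract states satisfying eat2 (assumed reading of Figure 8).
EAT2 = {"q3"}
# Edge labels (assumed reading of Figure 9): "<" strictly decreases n,
# "<=" does not increase n, "-" may increase n.
EDGES = {
    ("q0", "q2"): "<=",  # P1 starts eating, n unchanged
    ("q2", "q0"): "<",   # P1 leaves, n is halved and stays even
    ("q2", "q1"): "<",   # P1 leaves, n is halved and becomes odd
    ("q1", "q3"): "<=",  # P2 starts eating, n unchanged
    ("q3", "q0"): "-",   # P2 leaves, n becomes 3 * n + 1
}


def reachable(start, edges):
    """States reachable from start via at least one edge in `edges`."""
    found, stack = set(), [start]
    while stack:
        u = stack.pop()
        for (a, b) in edges:
            if a == u and b not in found:
                found.add(b)
                stack.append(b)
    return found


def has_cycle(edges):
    """True if some state can reach itself using only `edges`."""
    return any(a in reachable(a, edges) for (a, _) in edges)


def live2_counterexample(use_hypothesis):
    """Is there a cycle avoiding eat2 that the hypothesis does not exclude?"""
    avoid = {e: lab for e, lab in EDGES.items()
             if e[0] not in EAT2 and e[1] not in EAT2}
    if not use_hypothesis:
        return has_cycle(set(avoid))
    # A cycle without any "<" edge satisfies the hypothesis vacuously.
    no_decrease = {e for e, lab in avoid.items() if lab != "<"}
    # A cycle through a "-" edge satisfies the hypothesis as well.
    through_increase = any(
        lab == "-" and a in (reachable(b, set(avoid)) | {b})
        for (a, b), lab in avoid.items())
    return has_cycle(no_decrease) or through_increase


print(live2_counterexample(use_hypothesis=False))
print(live2_counterexample(use_hypothesis=True))
```

Without the hypothesis the false loop between q0 and q2 is found; with it, the only cycle that avoids eat2 takes a strictly decreasing edge infinitely often without ever taking an edge that may increase n, so it is ruled out.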

In general, several well-founded orderings may be used in combination, and invariants may be necessary to establish the soundness of an ordering annotation. Other strengthening techniques allow fairness conditions associated with the concrete system to be recovered in the abstract system (Merz 1997). We have successfully employed these abstraction techniques to verify, among other examples, a reader-writer algorithm developed at Siemens Corporate Research, and a self-stabilizing protocol due to Dijkstra (Merz 1998). The latter example required rather intricate proofs (performed using Isabelle (Paulson 1994)) to establish the soundness of the abstraction. Still, the separation between interactive proofs in ordinary first-order logic that justify the soundness of abstractions and the use of temporal logic model checking to verify the temporal properties of the system resulted in substantial savings over an alternative direct proof (Qadeer and Shankar 1998) performed in PVS.

Abstractions for branching-time temporal logics, where one also wishes to verify possibility properties asserting the existence of certain branches, require the use of so-called ∀∃ abstractions of transitions (Dams et al. 1994), besides the ∃∃ abstractions considered above. These can also be useful to establish counterexamples for the concrete system when analysing PTL properties. In deductive model checking (Sipma et al. 1996) the abstract system is constructed in a process of stepwise refinement, starting from a trivial abstraction and the Büchi automaton for the negation of the property.

ACKNOWLEDGEMENT

I gratefully acknowledge the helpful suggestions of an anonymous referee.

NOTES

1. Agreement on a shared secret is a fundamental cryptographic problem: such a secret may for example be used to generate a session key for the encryption of further messages sent between the agents. Numbers such as NA and NB are usually called “nonces” in the literature on cryptographic protocols, to indicate that they should be used only once by any honest agent participating in the protocol.

REFERENCES

Alur, R., Henzinger, T. A., and Kupferman, O.: 1997, ‘Alternating-time Temporal Logic’, in 38th IEEE Symposium on Foundations of Computer Science, pp. 100–109.
Ben-Ari, M., Halpern, J., and Pnueli, A.: 1983, ‘The Temporal Logic of Branching Time’, Acta Informatica 20, 207–226.
Bernholtz, O., Vardi, M., and Wolper, P.: 1994, ‘An Automata-Theoretic Approach to Branching-Time Model Checking’, in D. L. Dill (ed.), 6th International Conference on Computer Aided Verification (CAV’94), Stanford, Berlin [Lecture Notes in Computer Science 818], pp. 142–155.
Bryant, R. E.: 1986, ‘Graph-based Algorithms for Boolean Function Manipulation’, IEEE Transactions on Computers C-35, 677–691.
Büchi, J. R.: 1962, ‘On a Decision Method in Restricted Second-order Arithmetics’, in International Congress on Logic, Method and Philosophy of Science, Stanford, pp. 1–12.
Clarke, E. M., Emerson, E. A., and Sistla, A. P.: 1986, ‘Automatic Verification of Finite-state Concurrent Systems Using Temporal Logic Specifications’, ACM Transactions on Programming Languages and Systems 8, pp. 244–263.
Clarke, E. M., Grumberg, O., and Hamaguchi, K.: 1994, ‘Another Look at LTL Model Checking’, in D. L. Dill (ed.), 6th International Conference on Computer Aided Verification (CAV’94), Stanford, Berlin [Lecture Notes in Computer Science 818], pp. 415–427.
Clarke, E. M., Grumberg, O., Hiraishi, H., Jha, S., Long, D. E., McMillan, K. L., and Ness, L. A.: 1993, ‘Verification of the Futurebus+ Cache Coherence Protocol’, in D. Agnew, L. Claesen and R. Camposano (eds.), IFIP Conference on Computer Hardware Description Languages and their Applications, Ottawa 1993, Amsterdam, pp. 5–20.
Clarke, E. M., Grumberg, O., and Long, D. E.: 1996, ‘Model Checking’, in Manfred Broy (ed.), Deductive Program Design, Berlin [NATO ASI Series F-152], pp. 305–350.
Courcoubetis, C., Vardi, M., Wolper, P., and Yannakakis, M.: 1992, ‘Memory-efficient Algorithms for the Verification of Temporal Properties’, Formal Methods in System Design 1, 275–288.
Cousot, P. and Cousot, R.: 1977, ‘Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints’, in 4th ACM Symposium on Principles of Programming Languages, Los Angeles, pp. 238–252.

Dams, D., Grumberg, O., and Gerth, R.: 1994, ‘Abstract Interpretation of Reactive Systems: Abstractions Preserving ∀CTL*, ∃CTL* and CTL*’, in Ernst-Rüdiger Olderog (ed.), Programming Concepts, Methods, and Calculi (PROCOMET ’94), Amsterdam [IFIP Transactions], pp. 561–581.
Daniele, M., Giunchiglia, F., and Vardi, M.: 1999, ‘Improved Automata Generation for Linear Temporal Logic’, in N. Halbwachs and D. Peled (eds.), Computer Aided Verification (CAV’99), Trento, Italy, Berlin 1999 [Lecture Notes in Computer Science 1633], pp. 249–260.
Emerson, E. A. and Clarke, E. C.: 1980, ‘Characterizing Correctness Properties of Parallel Programs Using Fixpoints’, in 7th International Colloquium on Automata, Languages and Programming, Berlin [Lecture Notes in Computer Science 85], pp. 169–181.
Emerson, E. A.: 1990, ‘Temporal and Modal Logic’, in Jan van Leeuwen (ed.), Formal Models and Semantics, Handbook of Theoretical Computer Science, Volume B, Amsterdam, pp. 997–1071.
Esparza, J.: 1996, ‘More Infinite Results’, in 1st International Workshop on the Verification of Infinite State Systems, Pisa, Italy, Electronic Notes in Theoretical Computer Science 5.
Esparza, J., Finkel, A., and Mayr, R.: 1999, ‘On the Verification of Broadcast Protocols’, in G. Longo (ed.), 14th IEEE Symposium on Logic in Computer Science, Trento, Italy, Washington, 1999, pp. 352–359.
Gerth, R., Peled, D., Vardi, M., and Wolper, P.: 1995, ‘Simple on-the-fly Automatic Verification of Linear Temporal Logic’, in Protocol Specification, Testing, and Verification, Warsaw, Poland, London, 1995, pp. 3–18.
Godefroid, P. and Wolper, P.: 1994, ‘A Partial Approach to Model Checking’, Information and Computation 110, 305–326.
Holzmann, G.: 1991, Design and Validation of Computer Protocols, Englewood Cliffs, NJ.
Holzmann, G. and Peled, D.: 1994, ‘An Improvement in Formal Verification’, in IFIP WG 6.1 Conference on Formal Description Techniques, Bern, Switzerland, London 1994, pp. 197–214.
Hopcroft, J. E. and Ullman, J. D.: 1979, Introduction to Automata Theory, Languages, and Computation, Reading, MA.
Klarlund, N., Klarlund, M., and Klarlund F.: 1997, ‘The Logic-Automaton Connection in Practice’, in M. Nielsen and W. Thomas (eds.), Computer Science Logic, CSL ’97, Berlin [Lecture Notes in Computer Science 1414], pp. 311–326.
Kozen, D.: 1983, ‘Results on the Propositional mu-calculus’, Theoretical Computer Science 27, 333–354.
Kröger, F.: 1987, Temporal Logic of Programs, Berlin [EATCS Monographs on Theoretical Computer Science 8].
Kupferman, O. and Vardi, M. Y.: 1997, ‘Weak Alternating Automata Are Not so Weak’, in 5th Israeli Symposium on Theory of Computing and Systems, Ramat Gan, pp. 147–158.
Lamport, L.: 1983, ‘What Good is Temporal Logic?’, in R. E. A. Mason (ed.), Information Processing 83: Proceedings of the IFIP 9th World Congress, Amsterdam, pp. 657–668.
Leucker, M.: 1999, ‘Model Checking Games for the Alternation Free mu-Calculus and Alternating Automata’, in A. Voronkov (ed.), 6th International Conference on Logic for Programming and Automated Reasoning (LPAR’99), Berlin [Lecture Notes in Computer Science 1705], pp. 77–91.
Lichtenstein, O., Pnueli, A., and Zuck, L.: 1985, ‘The Glory of the Past’, in R. Parikh (ed.), Logics of Programs, Berlin [Lecture Notes in Computer Science 193], pp. 196–218.

Lowe, G.: 1996, ‘Breaking and Fixing the Needham–Schroeder Public Key Protocol Using FDR’, in T. Margaria and B. Steffen (eds.), Tools and Algorithms for the Construction and Analysis of Systems (TACAS’96), Berlin [Lecture Notes in Computer Science 1055], pp. 147–166.
Manna, Z. and Pnueli, A.: 1992, The Temporal Logic of Reactive and Concurrent Systems – Specification, New York.
McMillan, K. L.: 1993, Symbolic Model Checking, Dordrecht.
McMillan, K. L.: 1997, ‘A Compositional Rule for Hardware Design Refinement’, in O. Grumberg (ed.), 9th International Conference on Computer Aided Verification (CAV’97), Berlin [Lecture Notes in Computer Science 1254], pp. 24–35.
Merz, S.: 1997, ‘Rules for Abstraction’, in R. K. Shyamasundar and K. Ueda (eds.), Advances in Computing Science – ASIAN’97, Kathmandu, Nepal 1997, Berlin [Lecture Notes in Computer Science 1345], pp. 32–45.
Merz, S.: 1998, ‘On the Verification of a Self-stabilizing Algorithm’, typed notes 1998, available at: http://www.pst.informatik.uni-muenchen.de/~merz/papers/dijkstra.ps.gz
Moller, F.: 1996, ‘Infinite Results’, in U. Montanari and V. Sassone (eds.), 7th International Conference on Concurrency Theory (CONCUR’96), Pisa, Italy, Berlin 1996 [Lecture Notes in Computer Science 1119], pp. 195–216.
Muller, D. E.: 1963, ‘Infinite Sequences and Finite Machines’, in 4th Annual Symposium on Switching Circuit Theory and Logical Design, New York, pp. 3–16.
Muller, D. E., Saoudi, A., and Schupp, P. E.: 1988, ‘Weak Alternating Automata Give a Simple Explanation of Why Most Temporal and Dynamic Logics are Decidable in Exponential Time’, in 3rd IEEE Symposium on Logic in Computer Science, pp. 422–427.
Paulson, L. C.: 1994, Isabelle: A Generic Theorem Prover, Berlin [Lecture Notes in Computer Science 828].
Paulson, L. C.: 1999, ‘Proving Security Protocols Correct’, in G. Longo (ed.), 14th IEEE Symposium on Logic in Computer Science, Trento, Italy, Washington 1999, pp. 370–383.
Penczek, W., Gerth, R., and Kuiper, R.: 1999, ‘Partial Order Reductions Preserving Simulations’, to appear.
Pnueli, A.: 1977, ‘The Temporal Logic of Programs’, in Proceedings of the 18th Annual Symposium on the Foundations of Computer Science, pp. 46–57.
Qadeer, S. and Shankar, N.: 1998, ‘Verifying a Self-stabilizing Mutual Exclusion Algorithm’, in D. Gries and W.-P. de Roever (eds.), Programming Concepts and Methods, Shelter Island, NY, pp. 424–443.
Queille, J. P. and Sifakis, J.: 1981, ‘Specification and Verification of Concurrent Systems in Cesar’, in 5th International Symposium on Programming, Berlin [Lecture Notes in Computer Science 137], pp. 337–351.
Safra, S.: 1988, ‘On the Complexity of ω-automata’, in 29th IEEE Symposium on Foundations of Computer Science, pp. 319–327.
Sipma, H. B., Uribe, T. E., and Manna, Z.: 1996, ‘Deductive Model Checking’, in R. Alur and T. Henzinger (eds.), 8th International Conference on Computer-Aided Verification, Berlin [Lecture Notes in Computer Science 1102], pp. 208–219.
Sistla, A. P. and Clarke, E. M.: 1985, ‘The Complexity of Propositional Linear Temporal Logic’, Journal of the ACM 32, 733–749.
Stirling, C.: 1992, ‘Modal and Temporal Logics’, in S. Abramsky, D. Gabbay and T. Maibaum (eds.), Handbook of Logic in Computer Science, Oxford, pp. 477–563.
Thomas, W.: 1990, ‘Automata on Infinite Objects’, in Jan van Leeuwen (ed.), Handbook of Theoretical Computer Science, Volume B: Formal Models and Semantics, Amsterdam, pp. 133–194.

Thomas, W.: 1997, ‘Languages, Automata, and Logic’, in G. Rozenberg and A. Salomaa (eds.), Handbook of Formal Language Theory, Volume III, Berlin, pp. 389–455.
Thomas, W.: 2000, ‘Complementation of Büchi Automata Revisited’, in J. Karhumäki et al. (eds.), Jewels are Forever, Contributions on Theoretical Computer Science in Honor of Arto Salomaa, Berlin, pp. 109–122.
Valmari, A.: 1990, ‘A Stubborn Attack on State Explosion’, in 2nd International Workshop on Computer Aided Verification, Berlin [Lecture Notes in Computer Science 531], pp. 156–165.
Vardi, M. Y.: 1995, ‘Alternating Automata and Program Verification’, in Jan van Leeuwen (ed.), Computer Science Today, Berlin [Lecture Notes in Computer Science 1000], pp. 471–485.
Vardi, M. Y. and Wolper, P.: 1994, ‘Reasoning about Infinite Computations’, Information and Computation 115, 1–37.
Wolper, P.: 1983, ‘Temporal Logic Can be More Expressive’, Information and Control 56, 72–93.

Institut für Informatik, Ludwig-Maximilians-Universität München, Oettingenstr. 67, 80538 München, Germany
E-mail: [email protected]

CHRISTOPH BENZMÜLLER

COMPARING APPROACHES TO RESOLUTION BASED HIGHER-ORDER THEOREM PROVING

ABSTRACT. We investigate several approaches to resolution based automated theorem proving in classical higher-order logic (based on Church’s simply typed λ-calculus) and discuss their requirements with respect to Henkin completeness and full extensionality. In particular we focus on Andrews’ higher-order resolution (Andrews 1971), Huet’s constrained resolution (Huet 1972), higher-order E-resolution, and extensional higher-order resolution (Benzmüller and Kohlhase 1997). With the help of examples we illustrate the parallels and differences of the extensionality treatment of these approaches and demonstrate that extensional higher-order resolution is the sole approach that can completely avoid additional extensionality axioms.

1. INTRODUCTION

It is a well known consequence of Gödel’s first incompleteness theorem that there cannot be complete calculi for higher-order logic with respect to standard semantics. However, Henkin (1950) showed that there are indeed complete calculi if one gives up the intuitive requirement of full function domains in standard semantics and considers Henkin’s general models instead. For higher-order calculi, Henkin completeness therefore constitutes the most interesting notion of completeness.

A very challenging task for a calculus aiming at Henkin completeness is to provide a suitable extensionality treatment. Unfortunately the importance of full extensionality in higher-order theorem proving, i.e., the suitable combination of functional and Boolean extensionality, has widely been overlooked so far. This might be due to the fact that (weak) functional extensionality is already built into the pure simply typed λ-calculus and that Boolean extensionality, or the subtle interplay between Boolean and functional extensionality, simply does not occur in this context. However, the situation changes drastically as soon as one is interested in a higher-order logic based on the simply typed λ-calculus, as now Boolean extensionality is of importance too.

We therefore investigate the extensionality treatment of several resolution based approaches to Henkin complete higher-order theorem proving:

Synthese 133: 203–235, 2002. © 2002 Kluwer Academic Publishers. Printed in the Netherlands.

Andrews’ higher-order resolution (Andrews 1971), Huet’s constrained resolution (Huet 1972), higher-order E-resolution, and extensional higher-order resolution (Benzmüller and Kohlhase 1998a). In order to ease the comparison we present them in a uniform way. Even though we focus on the resolution method in this paper, the main results on the feasibility of extensionality reasoning in higher-order theorem proving nevertheless apply to other theorem proving approaches as well.

For Andrews’ and Huet’s approaches it is well known that generally infinitely many extensionality axioms are required in the search space in order to reach Henkin completeness. With the help of rather simple examples we will point out the shortcomings of this kind of extensionality treatment; namely a fair amount of non-goal directed search, which contrasts with the general idea of resolution based theorem proving.

Whereas the use of higher-order E-unification (cf. Snyder 1990; Nipkow and Qian 1991; Wolfram 1993; Qian and Wang 1996) instead of simple syntactical higher-order unification partially improves the situation, this idea nevertheless fails to provide a general solution and still requires additional extensionality axioms to ensure Henkin completeness.

The first calculus that generally takes into account that higher-order theory unification with respect to theories including full extensionality is as hard as Henkin complete higher-order theorem proving itself is the extensional higher-order resolution approach (Benzmüller and Kohlhase 1998a). This calculus very closely integrates higher-order unification and resolution by allowing for mutual recursive calls (instead of hierarchical calls solely from resolution to unification as in first-order logic). With its close integration of unification and resolution this approach ensures Henkin completeness without requiring additional extensionality axioms. With the help of our examples we show that this aspect is not only of theoretical but also of practical importance, as proof problems requiring non-trivial extensionality reasoning can be solved in the extensional higher-order resolution approach in a more goal directed way.

As a theoretical result the paper presents Henkin completeness proofs for the resolution approaches of Andrews and Huet, which have been examined in the literature so far only with respect to Andrews’ rather weak semantical notion of V-complexes.

The paper is organised as follows: Syntax and semantics of higher-order logic and a proof theoretic tool for analysing Henkin completeness are sketched in Section 2. Various resolution based calculi are then introduced in Section 3 and their extensionality treatment is investigated with the help of examples in Section 4. Related work is addressed in Section 5, and Section 6 concludes the paper.

2. SYNTAX AND SEMANTICS OF HIGHER-ORDER LOGIC

2.1. Classical Type Theory

We consider a higher-order logic based on Church’s simply typed λ-calculus (Church 1940) and choose BT := {ι, o} as base types, where ι denotes the set of individuals and o the set of truth values. Functional types are inductively defined over BT. A signature Σ contains for each type an infinite set of variables and constants, and in particular it provides the logical constants $\neg_{o \to o}$, $\lor_{o \to o \to o}$, and $\Pi_{(\alpha \to o) \to o}$ for every type α. As all other logical operators can be defined (e.g., $A \land B := \neg(\neg A \lor \neg B)$, $\forall X_\alpha\, P\,X := \Pi_{(\alpha \to o) \to o}(\lambda X_\alpha\, P\,X)$, and $\exists X_\alpha\, P\,X := \neg\forall X_\alpha\, \neg(P\,X)$), the given logical constants are sufficient to define a classical higher-order logic.

The set of all Σ-terms (closed Σ-terms) of type α is denoted by $\mathit{wff}_\alpha$ ($\mathit{cwff}_\alpha$). Variables are printed as upper-case letters (e.g., $X_\alpha$), constants as lower-case letters (e.g., $c_\alpha$), and arbitrary terms appear as bold capital letters (e.g., $\mathbf{T}_\alpha$). If the type of a symbol is uniquely determined by the given context we omit it. We abbreviate function applications by $h_{\alpha_1 \to \cdots \to \alpha_n \to \beta}\, \overline{U^n_{\alpha_n}}$, which stands for $(\cdots(h_{\alpha_1 \to \cdots \to \alpha_n \to \beta}\, U^1_{\alpha_1}) \cdots U^n_{\alpha_n})$. For α-, β-, η-, βη-conversion and the definition of β-normal, βη-normal, long βη-normal, and head-normal form we refer to Barendregt (1984), as well as for the definition of free variables, closed formulas (also called sentences), and substitutions. Substitutions are represented as $[T_1/X_1, \ldots, T_n/X_n]$ where the $X_i$ specify the variables to be replaced by the terms $T_i$. The application of a substitution σ to a term (resp. literal or clause) C is printed Cσ.

Higher-order unification and sets of partial bindings $\mathcal{GB}^h_\gamma$ are well explained in Snyder and Gallier (1989). A calculus R provides a set of rules {rn| 0

2.2. Clauses, Literals, and Unification Constraints

The approaches studied in this paper are presented using a uniform notation for clauses, literals, and unification constraints (the notation is due to Kohlhase (1994)). Literals, e.g., $[A]^\mu$, consist of a literal atom A and a polarity $\mu \in \{T, F\}$. For all rules presented in this paper we assume that the polarity specifiers $\mu, \nu \in \{T, F\}$ refer to complementary polarities, i.e., $\mu \neq \nu$. In particular we distinguish between proper literals and pre-literals. The (normalised) atom of a pre-literal has a logical constant at head position, whereas this is not the case for proper literals. For instance, $[A \lor B]^T$ is a pre-literal and $[p_{o \to o}\,(A \lor B)]^T$ is a proper literal. Furthermore a literal is called flexible if its atom contains a variable at head position.

A unification problem between two terms $T_1$ and $T_2$ (between n terms $T_1, \ldots, T_n$) generated during the refutation process is called a unification constraint and is represented as $[T_1 =^? T_2]$ (resp. $[=^?(T_1, \ldots, T_n)]$). A unification constraint is called a flex-flex pair if both unification terms have flexible heads, i.e., variables at head position.

Clauses consist of disjunctions of literals or unification constraints. The unification constraints specify conditions under which the other literals are valid. For instance the clause $[p_{\alpha \to \beta \to o}\, T^1_\alpha\, T^2_\beta]^T \lor [T^1_\alpha =^? S^1_\alpha] \lor [T^2_\beta =^? S^2_\beta]$ can be informally read as: if $T^1$ is unifiable with $S^1$ and $T^2$ with $S^2$ then $(p\, T^1\, T^2)$ holds. We implicitly treat the disjunction operator ∨ in clauses as commutative and associative, i.e., we abstract from the particular order of the literals. Additionally we presuppose commutativity of $=^?$ and implicitly identify any two α-equal constraints or literals. Furthermore we assume that any two clauses have disjoint sets of free variables, i.e., for each freshly generated clause we choose new free variables. If a clause contains at least one pre-literal we call it a pre-clause, otherwise a proper clause. A clause is called empty, denoted by □, if it consists only of (possibly no) flex-flex pairs.

An important aspect of clause normalisation is Skolemisation. In this paper we employ Miller’s sound adaptation of traditional first-order Skolemisation (Miller 1983), which associates with each Skolem function the minimum number of arguments the Skolem function has to be applied to. Higher-order Skolemisation becomes sound if any Skolem function $f^n$ only occurs in a Skolem term, i.e., a formula $S \equiv f^n\, \overline{A^n}$, where none of the $A^i$ contains a bound variable. Thus the Skolem terms only serve as descriptions of the existential witnesses and never appear as functions proper. Without this additional restriction the calculi do not really become unsound, but one can prove an instance of the axiom of choice. Andrews (1973) investigates the following instance: $\exists E_{(\iota \to o) \to \iota}\, \forall P_{\iota \to o}\, ((\exists X_\iota\, P\,X) \Rightarrow P\,(E\,P))$, which we want to treat as an optional axiom for the resolution calculi presented in this paper; for further details we refer to Miller (1983).
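The distinctions drawn in this section (pre-literal versus proper literal, flexible heads, flex-flex pairs, pre-clauses and empty clauses) can be made concrete with a few small data types. The following Python sketch is only an illustration under simplifying assumptions: terms are reduced to a head symbol with arguments, types are omitted, variables are recognised by an upper-case first letter as in the notation above, and the class and function names (Term, Literal, Constraint, is_pre_clause, is_empty) are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Stand-ins for the logical constants (negation, disjunction, Pi).
LOGICAL_CONSTANTS = {"not", "or", "forall"}


@dataclass(frozen=True)
class Term:
    head: str                       # head symbol of the (normalised) term
    args: Tuple["Term", ...] = ()   # argument terms, types omitted

    @property
    def flexible(self) -> bool:
        # Convention of the text: variables upper-case, constants lower-case.
        return self.head[0].isupper()


@dataclass(frozen=True)
class Literal:
    atom: Term
    polarity: bool                  # True stands for T, False for F

    @property
    def is_pre_literal(self) -> bool:
        return self.atom.head in LOGICAL_CONSTANTS


@dataclass(frozen=True)
class Constraint:
    left: Term
    right: Term

    @property
    def is_flex_flex(self) -> bool:
        return self.left.flexible and self.right.flexible


# A clause is a disjunction, modelled here as a tuple of its members.
Clause = Tuple[Union[Literal, Constraint], ...]


def is_pre_clause(clause: Clause) -> bool:
    """A pre-clause contains at least one pre-literal."""
    return any(isinstance(x, Literal) and x.is_pre_literal for x in clause)


def is_empty(clause: Clause) -> bool:
    """An empty clause consists only of (possibly no) flex-flex pairs."""
    return all(isinstance(x, Constraint) and x.is_flex_flex for x in clause)


A, B = Term("A"), Term("B")
pre = Literal(Term("or", (A, B)), True)                    # [A or B]^T
proper = Literal(Term("p", (Term("or", (A, B)),)), True)   # [p (A or B)]^T
flex_flex = Constraint(Term("X", (Term("a"),)), Term("Y"))
print(is_pre_clause((pre,)), is_pre_clause((proper,)), is_empty((flex_flex,)))
```

Modelling a clause as a tuple does not capture the implicit commutativity and associativity of ∨ mentioned above; a multiset representation would be closer to the text.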

2.3. Standard and Henkin Semantics

A standard model for HOL provides a fixed set $D_\iota$ of individuals, and a set $D_o := \{\top, \bot\}$ of truth values. The domains for functional types are defined inductively: $D_{\alpha \to \beta}$ is the set of all functions $f : D_\alpha \to D_\beta$. Henkin models only require that $D_{\alpha \to \beta}$ has enough members that any well-formed formula can be evaluated. Thus, the generalisation to Henkin models restricts the set of valid formulas sufficiently, such that complete calculi are possible. The following figure illustrates the sketched connection between standard and Henkin semantics.

In Henkin and standard semantics Leibniz equality (which is defined as $\doteq^\alpha := \lambda X_\alpha\, \lambda Y_\alpha\, \forall P_{\alpha \to o}\, P\,X \Rightarrow P\,Y$) denotes the intuitive identity relation and the (type parameterised) functional extensionality principles

$\forall M_{\alpha \to \beta}\, \forall N_{\alpha \to \beta}\, (\forall X\, M\,X \doteq^\beta N\,X) \Rightarrow (M \doteq^{\alpha \to \beta} N)$

as well as the Boolean extensionality principle

$\forall P_o\, \forall Q_o\, (P \Leftrightarrow Q) \Rightarrow (P \doteq^o Q)$

are valid (cf. Benzmüller 1999a; Benzmüller and Kohlhase 1997). Satisfiability and validity ($M \models F$ or $M \models \Phi$) of a formula F or set of formulas Φ in a model M are defined as usual.

We want to point out that the above statements on equality and extensionality do not apply to general models as originally introduced by Henkin (1950). Andrews (1972) showed that the sets $D_{\alpha \to o}$ may be so sparse in Henkin’s original notion of general models that Leibniz equality may denote a relation which does not fulfil the functional extensionality principle. Due to lack of space we cannot present this general model here but refer to Andrews (1972) for further details. The solution suggested by Andrews is to presuppose the presence of the intuitive identity relations in all domains $D_{\alpha \to \alpha \to o}$, which ensures the existence of unit sets $\{a\} \in D_{\alpha \to o}$ for all elements $a \in D_\alpha$. The existence of these unit sets in turn ensures that Leibniz equality indeed denotes the intended (fully extensional) identity relation. In this paper, “Henkin semantics” means the corrected version of Henkin’s original notion as given in Andrews (1972).
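The role of the unit sets can be seen in a finite toy setting. The following Python sketch is not Andrews’ construction and does not build a Henkin model: it merely evaluates the defining formula of Leibniz equality over a three-element domain, once with all predicates available (as in a standard model) and once with a deliberately sparse predicate domain that contains no unit sets. The names D, full, sparse and leibniz_equal are hypothetical.

```python
from itertools import product

D = (0, 1, 2)  # a small domain of individuals

# Full predicate domain: every function from D to the truth values.
full = [dict(zip(D, bits)) for bits in product((False, True), repeat=len(D))]

# A sparse predicate domain without unit sets: only the constant predicates.
sparse = [{d: False for d in D}, {d: True for d in D}]


def leibniz_equal(x, y, predicates):
    """Evaluate 'for all P: P x implies P y' over the given predicate domain."""
    return all((not p[x]) or p[y] for p in predicates)


# With all predicates present, Leibniz equality is the identity relation.
identity_recovered = all(
    leibniz_equal(x, y, full) == (x == y) for x in D for y in D)

# Without unit sets, distinct elements can no longer be told apart.
print(identity_recovered, leibniz_equal(0, 1, sparse))
```

In the full domain the unit set {x} is among the predicates and separates x from every other element; in the sparse domain no predicate does, so the defining formula holds for every pair.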

2.4. Proving Completeness

The abstract consistency proof principle (also called unifying principle) is a strong tool supporting the analysis of the connection between syntax and semantics for higher-order calculi. This proof principle was originally introduced by Smullyan (1963) for first-order logic and was adapted to higher-order logic by Andrews (1971). However, Andrews’ adaptation allows completeness proofs only for the rather weak semantical notion of V-complexes (in which the axioms of extensionality may fail, cf. Benzmüller 1991; Benzmüller and Kohlhase 1997). The following proof principle adapts Andrews’ abstract consistency proof principle to Henkin semantics.

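DEFINITION 1 (Acc for Henkin Models). Let Σ be a signature and $\Gamma_\Sigma$ a class of sets of Σ-sentences. If the following conditions hold for all $A, B \in \mathit{cwff}_o$, $F, G \in \mathit{cwff}_{\alpha \to \beta}$, and $\Phi \in \Gamma_\Sigma$, then we call $\Gamma_\Sigma$ an abstract consistency class for Henkin models, abbreviated by Acc. (We want to point out that we assume an implicit treatment of α-convertibility here, whereas Andrews treats α-convertibility explicitly in his notion of η-wffs; cf. Andrews (1971, 3.1.2, 2.7.5).)

saturated: $\Phi \cup \{A\} \in \Gamma_\Sigma$ or $\Phi \cup \{\neg A\} \in \Gamma_\Sigma$.

$\nabla_c$: If A is atomic, then $A \notin \Phi$ or $\neg A \notin \Phi$.

$\nabla_\neg$: If $\neg\neg A \in \Phi$, then $\Phi \cup \{A\} \in \Gamma_\Sigma$.

$\nabla_\beta$: If $A \in \Phi$ and B is the β-normal form of A, then $\Phi \cup \{B\} \in \Gamma_\Sigma$.

$\nabla_\eta$: If $A \in \Phi$ and B is the η-long form of A, then $\Phi \cup \{B\} \in \Gamma_\Sigma$.

$\nabla_\lor$: If $A \lor B \in \Phi$, then $\Phi \cup \{A\} \in \Gamma_\Sigma$ or $\Phi \cup \{B\} \in \Gamma_\Sigma$.

$\nabla_\land$: If $\neg(A \lor B) \in \Phi$, then $\Phi \cup \{\neg A, \neg B\} \in \Gamma_\Sigma$.

$\nabla_\forall$: If $\Pi^\alpha F \in \Phi$, then $\Phi \cup \{F\,W\} \in \Gamma_\Sigma$ for each $W \in \mathit{cwff}_\alpha$.

$\nabla_\exists$: If $\neg\Pi^\alpha F \in \Phi$, then $\Phi \cup \{\neg(F\,w)\} \in \Gamma_\Sigma$ for any new constant $w \in \Sigma_\alpha$.

$\nabla_b$: If $\neg(A \doteq^o B) \in \Phi$, then $\Phi \cup \{A, \neg B\} \in \Gamma_\Sigma$ or $\Phi \cup \{\neg A, B\} \in \Gamma_\Sigma$.

$\nabla_q$: If $\neg(F \doteq^{\alpha \to \beta} G) \in \Phi$, then $\Phi \cup \{\neg(F\,w \doteq^\beta G\,w)\} \in \Gamma_\Sigma$ for any new constant $w \in \Sigma_\alpha$.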

This definition extends Andrews’ notion of abstract consistency classes for V-complexes by the new requirements saturated, ∇η, ∇b, and ∇q. Saturatedness turns the partial V-complexes into total structures, and the latter two conditions ensure that Leibniz equality indeed denotes a fully extensional relation (which may not be the case in V-complexes, where Leibniz equality does not necessarily denote the intended identity relation; cf. Benzmüller 1999a; Benzmüller and Kohlhase 1997). The following model existence theorem is due to Andrews (1971).

THEOREM 2 (Model Existence for V-complexes (Andrews 1971)). Let Φ be a set of closed Σ-formulas, Γ_Σ be an abstract consistency class for V-complexes (i.e., Γ_Σ fulfils ∇c, ∇¬, ∇β, ∇∨, ∇∧, ∇∀, ∇∃), and let Φ ∈ Γ_Σ. Then there exists a V-complex M such that M ⊨ Φ.

The following related theorem addressing Henkin semantics (and additional ones addressing several notions in between Henkin semantics and V-complexes) is presented in Benzmüller (1999a) and Benzmüller and Kohlhase (1997).

THEOREM 3 (Henkin Model Existence (Benzmüller and Kohlhase 1998)). Let Φ be a set of closed Σ-formulas, Γ_Σ be an abstract consistency class for Henkin models, and let Φ ∈ Γ_Σ. Then there exists a Henkin model M such that M ⊨ Φ.

The complicated task of proving Henkin completeness for a given (resolution) calculus R can now be reduced to showing that the set of all sets Φ containing R-consistent closed formulas is an abstract consistency class for Henkin models, i.e., to verifying the (syntactically checkable) conditions given in Definition 1.

3. HIGHER-ORDER RESOLUTION

In this section we introduce several higher-order resolution calculi. Additional approaches not mentioned here are briefly sketched and related to the presented ones in Section 5. The sketched approaches will be compared with respect to their extensionality treatment in Section 4.

3.1. Andrews’ Higher-Order Resolution R

We transform Andrews’ higher-order resolution calculus (Andrews 1971) into our uniform notation. In the remainder of this paper we refer to this calculus as R. Extending Andrews (1971) we show that R is Henkin complete if one adds infinitely many extensionality axioms to the search space.

λ-Conversion. Calculus R provides two explicit rules addressing α-conversion and β-reduction (cf. Andrews 1971, 5.1.1) but does not provide a rule for η-conversion. Consequently η-equality of two terms (e.g., f_{ι→ι} ≐ λX_ι f X) cannot be proven in this approach without employing the functional extensionality axiom of appropriate type; cf. Section 4.1. In our presentation we omit explicit rules for α- and β-convertibility and instead treat them implicitly, i.e., we assume that the presented rules operate on input and generate output in β-normal form, and we automatically identify terms which differ only with respect to the names of bound variables.
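The implicit treatment of α- and β-convertibility, and the fact that η-equality is not covered by it, can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration and not part of Andrews’ calculus: terms are untyped and use de Bruijn indices (so α-equivalent terms are literally identical), and the class and function names are invented. Since the sketch ignores types, `beta_normalise` is only guaranteed to terminate on terms that would be well-typed in the simply typed setting of this paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:            # bound variable as a de Bruijn index
    idx: int

@dataclass(frozen=True)
class Const:          # constant symbol such as f
    name: str

@dataclass(frozen=True)
class Lam:            # λ-abstraction; the binder has no name
    body: object

@dataclass(frozen=True)
class App:            # application (fun arg)
    fun: object
    arg: object

def shift(t, d, cutoff=0):
    """Add d to every index in t that is >= cutoff."""
    if isinstance(t, Var):
        return Var(t.idx + d) if t.idx >= cutoff else t
    if isinstance(t, Lam):
        return Lam(shift(t.body, d, cutoff + 1))
    if isinstance(t, App):
        return App(shift(t.fun, d, cutoff), shift(t.arg, d, cutoff))
    return t

def subst(t, j, s):
    """Replace index j in t by the term s."""
    if isinstance(t, Var):
        return s if t.idx == j else t
    if isinstance(t, Lam):
        return Lam(subst(t.body, j + 1, shift(s, 1)))
    if isinstance(t, App):
        return App(subst(t.fun, j, s), subst(t.arg, j, s))
    return t

def beta_normalise(t):
    """Contract β-redexes until none is left."""
    if isinstance(t, Lam):
        return Lam(beta_normalise(t.body))
    if isinstance(t, App):
        fun, arg = beta_normalise(t.fun), beta_normalise(t.arg)
        if isinstance(fun, Lam):
            return beta_normalise(shift(subst(fun.body, 0, shift(arg, 1)), -1))
        return App(fun, arg)
    return t

def occurs(t, j):
    """Does index j occur in t?"""
    if isinstance(t, Var):
        return t.idx == j
    if isinstance(t, Lam):
        return occurs(t.body, j + 1)
    if isinstance(t, App):
        return occurs(t.fun, j) or occurs(t.arg, j)
    return False

def eta_contract(t):
    """Rewrite λX. (M X) to M wherever X does not occur in M."""
    if isinstance(t, Lam):
        body = eta_contract(t.body)
        if isinstance(body, App) and body.arg == Var(0) and not occurs(body.fun, 0):
            return shift(body.fun, -1)
        return Lam(body)
    if isinstance(t, App):
        return App(eta_contract(t.fun), eta_contract(t.arg))
    return t

f = Const("f")
eta_f = Lam(App(f, Var(0)))              # λX. f X
print(beta_normalise(eta_f) == f)        # β-normalisation alone does not identify them
print(eta_contract(eta_f) == f)          # an explicit η-step does
```

In this sketch the two terms f and λX. f X stay distinct under β-normalisation, which is the situation in R; a calculus working on η-normalised terms would identify them before any proof search starts.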

Clause Normalisation. R introduces only four rules belonging to clause normalisation: negation elimination, conjunction elimination, existential elimination, and universal elimination (cf. Andrews 1971, 5.1.4.–5.1.7.). As our presentation of clauses, in contrast to Andrews (1971), explicitly mentions the polarities of literals and brackets the literal atoms, we have to provide additional structural rules, e.g., the rule ∨^T.

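• Negation elimination:

$$\frac{C \lor [\neg A]^T}{C \lor [A]^F}\ \neg^T \qquad \frac{C \lor [\neg A]^F}{C \lor [A]^T}\ \neg^F$$

• Conjunction¹/disjunction elimination:

$$\frac{C \lor [A \lor B]^T}{C \lor [A]^T \lor [B]^T}\ \lor^T \qquad \frac{C \lor [A \lor B]^F}{C \lor [A]^F}\ \lor^F_l \qquad \frac{C \lor [A \lor B]^F}{C \lor [B]^F}\ \lor^F_r$$

• Existential²/universal elimination:

$$\frac{C \lor [\Pi^\alpha A]^T}{C \lor [A\, X_\alpha]^T}\ \Pi^T \qquad \frac{C \lor [\Pi^\alpha A]^F}{C \lor [A\, s_\alpha]^F}\ \Pi^F$$

X_α is a new free variable and s_α is a new Skolem term.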

Additionally Andrews presents rules addressing commutativity and associativity of the ∨-operator connecting the literals of a clause (cf. Andrews 1971, 5.1.2.). We have already mentioned the implicit treatment of these aspects in Section 2.2.

In the remainder of this paper Cnf(A) denotes the set of clauses obtained from formula A by clause normalisation. It is easy to verify that clauses produced with Andrews’ original normalisation rules can also be obtained with the rules presented here (and vice versa).

Resolution and Factorisation. Instead of a resolution and a factorisation rule (which work in connection with unification) Andrews presents a simplification and a cut rule. The cut rule is only applicable to clauses with two complementary literals which have identical atoms. Similarly Sim is defined only for clauses with two identical literals. In order to generate identical literal atoms during the refutation process these two rules have to be combined with the substitution rule Sub presented below.

• Simplification:

$$\frac{[A]^\mu \lor [A]^\mu \lor C}{[A]^\mu \lor C}\ \mathit{Sim}$$

• Cut:

$$\frac{[A]^\mu \lor C \qquad [A]^\nu \lor D}{C \lor D}\ \mathit{Cut}$$
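As a minimal sketch of how Sim and Cut act on clauses, the following Python fragment treats a clause as a tuple of (atom, polarity) pairs. The representation and the function names `sim` and `cut` are assumptions made for illustration. Atoms are opaque strings compared by plain equality, which mirrors the requirement that the atoms be identical: no unification takes place, so in R any instantiation would have to be done beforehand with rule Sub.

```python
# A literal is (atom, polarity); a clause is a tuple of literals.
# Atoms are opaque strings, so "identical atoms" means string equality.

def sim(clause):
    """Apply Sim exhaustively: keep one copy of each repeated literal."""
    kept = []
    for literal in clause:
        if literal not in kept:
            kept.append(literal)
    return tuple(kept)

def cut(left, right):
    """All clauses C ∨ D obtainable by Cut from [A]^μ ∨ C and [A]^ν ∨ D, μ ≠ ν."""
    results = []
    for i, (atom_l, pol_l) in enumerate(left):
        for j, (atom_r, pol_r) in enumerate(right):
            if atom_l == atom_r and pol_l != pol_r:
                results.append(left[:i] + left[i + 1:] + right[:j] + right[j + 1:])
    return results

c1 = (("p", True), ("q", True))        # [p]^T ∨ [q]^T
c2 = (("p", False), ("q", True))       # [p]^F ∨ [q]^T
resolvent = cut(c1, c2)[0]             # [q]^T ∨ [q]^T
unit = sim(resolvent)                  # [q]^T
empty = cut(unit, (("q", False),))     # contains the empty clause ()
print(resolvent, unit, empty)
```

The intermediate clause with the duplicated literal shows why Cut alone does not suffice: without Sim the final cut would leave a copy of [q]^T behind instead of producing the empty clause.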

Unification and Primitive Substitution. As higher-order unification was still an open problem in 1971, calculus R employs the British Museum Method instead, i.e., it provides a substitution rule that allows free variables to be blindly instantiated by arbitrary terms. As the instantiated terms may contain logical constants, instantiation of variables in proper clauses may lead to pre-clauses, which must be normalised again with the clause normalisation rules.

• Substitution of arbitrary terms:

$$\frac{C}{C[T_\alpha / X_\alpha]}\ \mathit{Sub}$$

X_α is a free variable occurring in C.

Extensionality Treatment. Calculus R does not provide rules addressing the functional and/or Boolean extensionality principles. Instead R assumes that the following extensionality axioms are (in form of respective clauses) explicitly added to the search space. Since the functional extensionality principle is parameterised over arbitrary functional types, infinitely many functional extensionality axioms are required³.

$$\mathrm{EXT}^{\alpha\to\beta}_{\doteq}:\quad \forall F_{\alpha\to\beta}\ \forall G_{\alpha\to\beta}\ (\forall X_\alpha\ F\,X \doteq G\,X) \Rightarrow F \doteq G$$

$$\mathrm{EXT}^{o}_{\doteq}:\quad \forall A_o\ \forall B_o\ (A \Leftrightarrow B) \Rightarrow A \doteq B$$

These are the crucial directions of the extensionality principles; the backward directions are not needed. The extensionality clauses derived from the extensionality axioms have the following form (note the many free variables, especially at literal head position, that are introduced into the search space; they heavily increase the amount of blind search in any attempt to automate the calculus):

$$E^{\alpha\to\beta}_1:\ [p\,(F\,s)]^T \lor [Q\,F]^F \lor [Q\,G]^T \qquad E^{o}_1:\ [A]^F \lor [B]^F \lor [P\,A]^F \lor [P\,B]^T$$

$$E^{\alpha\to\beta}_2:\ [p\,(G\,s)]^F \lor [Q\,F]^F \lor [Q\,G]^T \qquad E^{o}_2:\ [A]^T \lor [B]^T \lor [P\,A]^F \lor [P\,B]^T$$

p_{β→o} and s_α are Skolem terms, and P_{o→o} and Q_{(α→β)→o} are new free variables.

Proof Search. Initially the proof problem is negated and normalised. The main proof search then starts with the normalised clauses and applies the cut and simplification rules in close connection with the substitution rule. An intermediate application of the clause normalisation rules may be needed to normalise temporarily generated pre-clauses. The extensionality treatment in R simply consists of adding, at the beginning of the refutation process, the above clauses obtained from the extensionality axioms. When abstracting from the initial and intermediate normalisations the proof search can be illustrated as follows:

Completeness Results. Andrews (1971) gives a completeness proof for calculus R with respect to the semantical notion of V-complexes. As the extensionality principles are not valid in these rather weak semantical structures, the extensionality axioms are not needed in this completeness proof.

THEOREM 4 (V-completeness of R). The calculus R is complete with respect to the notion of V-complexes.

Proof. We sketch the proof idea: 4(i) First show that the set of non-refutable sentences in R is an abstract consistency class for V-complexes. 4(ii) Then prove completeness of R with respect to V-complexes in an indirect argument: assuming non-completeness of R leads to a contradiction by 4(i) and Theorem 2. □

We now extend this result and prove Henkin completeness of calculus R.

THEOREM 5 (Henkin completeness of R). The calculus R is complete with respect to Henkin semantics provided that the infinitely many extensionality axioms are given.

Proof. 5(i) The crucial aspect is to prove that the set of non-refutable sentences in R enriched by the extensionality axioms is an abstract consistency class for Henkin models. 5(ii) An indirect argument analogous to 4(ii) employing 5(i) and Theorem 3 ensures completeness.

In order to show 5(i) we have to verify the additional abstract consistency properties saturated, ∇η, ∇b, and ∇q as specified in Definition 1.

saturated  We show that Φ ∪ {A} ⊬_R □ or Φ ∪ {¬A} ⊬_R □. Assume Φ ⊬_R □ but Φ ∪ {A} ⊢_R □ and Φ ∪ {¬A} ⊢_R □. By Lemma 6 (cf. below) we get Φ ∪ {A ∨ ¬A} ⊢_R □, and hence, since A ∨ ¬A is a tautology, it must be the case that Φ ⊢_R □, which contradicts our assumption.

∇η  Assuming A ∈ Φ and Φ ∪ {B} ⊢_R □, we get Φ ⊢_R □ by Lemma 7 (cf. below). This ensures the assertion by contraposition.

∇b  We first apply rule Sub and instantiate the variables A and B in the Boolean extensionality axioms E^o_1 and E^o_2 with the terms A and B. Now assume that ¬(A ≐^o B) ∈ Φ and Φ ∪ {A, ¬B} ⊢_R □ and Φ ∪ {¬A, B} ⊢_R □. Employing the instantiated Boolean extensionality axioms it is easy to see that Φ ⊢_R □, which ensures the assertion by contraposition.

∇q  Can be shown analogously to ∇b when appropriately instantiating the functional extensionality axioms E^{α→β}_1, E^{α→β}_2.

LEMMA 6. Let Φ be a set of sentences and A, B be sentences. If Φ ∪ {A} ⊢_R □ and Φ ∪ {B} ⊢_R □, then Φ ∪ {A ∨ B} ⊢_R □.

Proof. We first verify that Cnf(Φ ∪ {A ∨ B}) = Cnf(Φ) ∪ (Cnf(A) ⊔ Cnf(B)), where Cnf(A) ⊔ Cnf(B) := {C ∨ D | C ∈ Cnf(A), D ∈ Cnf(B)}. Then we use that Φ ∪ (Γ_1 ⊔ Γ_2) ⊢_R □, provided that Φ ∪ Γ_1 ⊢_R □ and Φ ∪ Γ_2 ⊢_R □. □

LEMMA 7. Let Φ be a set of sentences and let A, B be sentences in β-normal form, such that A can be transformed into B by (i) a one step η-expansion or (ii) a multiple step η-expansion. Then Φ ∪ {B} ⊢_R □ implies Φ ∪ {A} ⊢_R □.

Proof. Case (ii) can be proven by induction on the number of η-expansion steps employing (i) in the base case. To prove case (i) note that A and B differ (apart from α-equality) only with respect to a single subterm T_{α→β}. More precisely, A[(λX T X)/T] is equal to B. Normalising sentences A (resp. B) may result in several clauses A_1, ..., A_n (resp. B_1, ..., B_n) with duplicated occurrences of subterm T (resp. λX T X). We appropriately instantiate the functional extensionality axioms E^{α→β}_1, E^{α→β}_2 and derive the (Leibniz equation) clauses C_1 : [Q f]^T ∨ [Q (λX f X)]^F and C_2 : [Q′ f]^F ∨ [Q′ (λX f X)]^T (the latter can be obtained from the former by substituting λX ¬Q′ X for Q). Obviously, we can derive for each 1 ≤ i ≤ n the clause B_i from its counterpart A_i with the help of C_1 and C_2 (formally we apply an induction on the occurrences of term T in A_i). □

3.2. Huet’s Higher-Order Constrained Resolution CR

In this section we transform Huet’s constrained resolution approach (Huet 1972, 1973a) to our uniform notation. The calculus here is the unsorted fragment of the variant of Huet’s approach as presented in Kohlhase (1994). In the remainder of this paper we refer to this calculus as CR. We extend Huet (1972, 1973a) and show that CR is Henkin complete if we add infinitely many extensionality axioms to the search space.

λ-Conversion. Like R, calculus CR assumes that terms, literals, and clauses are implicitly reduced to β-normal form. Furthermore we assume that α-equality is treated implicitly, i.e., we identify all terms that differ only with respect to the names of bound variables.

Clause Normalisation. Huet (1972) does not present clause normalisation rules but assumes that they are given. Here we employ the rules ¬^T, ¬^F, ∨^T, ∨^F_l, ∨^F_r, Π^T, and Π^F as already defined for calculus R in Section 3.1.

Resolution and Factorisation. As first-order unification is decidable and unitary it can be employed as a strong filter in first-order resolution (Robinson 1965). Unfortunately higher-order unification is not decidable (cf. Lucchesi 1972; Huet 1973b; Goldfarb 1981) and thus it cannot be applied in the sense of a terminating side computation in higher-order theorem proving. Huet therefore suggests in Huet (1972, 1973a) to delay the unification process and to explicitly encode unification problems occurring during the refutation search as unification constraints. In his original approach Huet presented a hyper-resolution rule which simultaneously resolves on the resolution literals A_1, ..., A_n (1 ≤ n) and B_1, ..., B_m (1 ≤ m) of two given clauses and adds the unification constraint [=? (A_1, ..., A_n, B_1, ..., B_m)] to the resolvent.

$$\frac{[A_1]^\mu \lor \ldots \lor [A_n]^\mu \lor C \qquad [B_1]^\nu \lor \ldots \lor [B_m]^\nu \lor D}{C \lor D \lor [=^? (A_1, \ldots, A_n, B_1, \ldots, B_m)]}\ \mathit{Hres}$$

In order to ease the comparison with the two other approaches discussed in this paper we instead employ a resolution rule Res and a factorisation rule Fac. Like Hres, both rules encode the unification problem to be solved as a unification constraint.

• Constrained resolution:

$$\frac{[A]^\mu \lor C \qquad [B]^\nu \lor D}{C \lor D \lor [A =^? B]^F}\ \mathit{Res}$$

• Constrained factorisation:

$$\frac{[A]^\mu \lor [B]^\mu \lor C}{[A]^\mu \lor C \lor [A =^? B]^F}\ \mathit{Fac}$$

One can easily prove by induction on n + m that each proof step applying rule Hres can be replaced by a corresponding derivation employing Res and Fac. For a formal proof note that the unification constraint [=? (A_1, ..., A_n, B_1, ..., B_m)] is equivalent to [A_1 =? A_2] ∨ [A_2 =? A_3] ∨ ... ∨ [A_{n−1} =? A_n] ∨ [A_n =? B_1] ∨ [B_1 =? B_2] ∨ [B_2 =? B_3] ∨ ... ∨ [B_{m−1} =? B_m].
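The decomposition of the simultaneous constraint into a chain of pairwise constraints can be sketched in a few lines of Python. The function name `pairwise_chain` and the use of strings for atoms are assumptions made for illustration; the sketch only enumerates the pairs and does not solve any of the constraints.

```python
def pairwise_chain(a_atoms, b_atoms):
    """Split [=? (A_1..A_n, B_1..B_m)] into constraints between neighbouring atoms."""
    atoms = list(a_atoms) + list(b_atoms)
    # Each atom is paired with its successor, giving n + m - 1 constraints.
    return list(zip(atoms, atoms[1:]))

chain = pairwise_chain(["A1", "A2"], ["B1", "B2", "B3"])
for left, right in chain:
    print(f"[{left} =? {right}]")
```

Because unifiability is transitive along the chain, a substitution solves all neighbouring pairs exactly when it makes all n + m atoms equal, which is what the single Hres constraint demands.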

Unification and Splitting. Huet (1975) introduces higher-order unification and higher-order pre-unification and shows that higher-order pre-unification is sufficient to verify the soundness of a refutation in which the occurring unification problems have been delayed until the end. The higher-order pre-unification rules presented here are discussed in detail in Benzmüller (1999a). They furthermore closely reflect the rules as presented in Snyder and Gallier (1989).

• Elimination of trivial pairs:

$$\frac{C \lor [A =^? A]}{C}\ \mathit{Triv}$$

• Decomposition:

$$\frac{C \lor [A_{\alpha\to\beta}\, C_\alpha =^? B_{\alpha\to\beta}\, D_\alpha]}{C \lor [A =^? B] \lor [C =^? D]}\ \mathit{Dec}$$

• Elimination of λ-binders (weak functional extensionality):

$$\frac{C \lor [M_{\alpha\to\beta} =^? N_{\alpha\to\beta}]}{C \lor [M\, s_\alpha =^? N\, s_\alpha]}\ \mathit{Func}$$

s_α is a new Skolem term.

• Imitation of rigid heads:

$$\frac{C \lor [F_\gamma\, \overline{U^n} =^? h\, \overline{V^m}] \qquad G \in \mathcal{GB}^h_\gamma}{C \lor [F =^? G] \lor [F\, \overline{U^n} =^? h\, \overline{V^m}]}\ \mathit{FlexRigid}$$

GB^h_γ is the set of partial bindings of type γ for head h as defined in Snyder and Gallier (1989).

Huet points to the usefulness of eager unification to filter out clauses with non-unifiable unification constraints or to back-propagate the solutions of easily solvable constraints (e.g., in case of first-order unification problems occurring during the proof search). Many of the higher-order unification problems occurring in practice are decidable and have only finitely many solutions. Hence, even though higher-order unification is generally not decidable, it is sensible in practice to apply the unification algorithm with a particular resource bound⁴, such that only those unification problems which may have further solutions beyond this bound need to be delayed. In our presentation of calculus CR we explicitly address the aspect of eager unification and substitution by rule Subst. This rule back-propagates eagerly computed unifiers to the literal part of a clause.

• Eager unification and substitution:

$$\frac{C \lor [X =^? A] \qquad X \notin \mathrm{free}(A)}{C[A/X]}\ \mathit{Subst}$$

Rule Subst is applicable provided that [X =? A] is solved with respect to the other unification constraints in C, i.e., that there is no conflict with other unification constraints.

The literal heads of our clauses may consist of set variables and it may be necessary to instantiate them with terms introducing new logical constants at head position in order to find a refutation. Unfortunately not all appropriate instantiations can be computed with the calculus rules presented so far. To address this problem Huet’s approach provides the following splitting rules:

1. Instantiate set variables:

$$\frac{[P\,A]^T \lor C}{[Q]^T \lor [R]^T \lor C \lor [P\,A =^? (Q_o \lor R_o)]}\ S^T_\lor$$

$$\frac{[P\,A]^F \lor C}{[Q]^F \lor C \lor [P\,A =^? (Q_o \lor R_o)] \qquad [R]^F \lor C \lor [P\,A =^? (Q_o \lor R_o)]}\ S^F_\lor$$

$$\frac{[P\,A]^\mu \lor C}{[Q]^\nu \lor C \lor [P\,A =^? \neg Q_o]}\ S^{TF}_\neg$$

$$\frac{[P\,A]^T \lor C}{[M_{\alpha\to o}\, Z]^T \lor C \lor [P\,A =^? \Pi^\alpha M]}\ S^T_\Pi \qquad \frac{[P\,A]^F \lor C}{[M_{\alpha\to o}\, s]^F \lor C \lor [P\,A =^? \Pi^\alpha M]}\ S^F_\Pi$$

S^T_Π and S^F_Π are infinitely branching as they are parameterised over type α. Q_o, R_o, M_{α→o}, Z_α are new variables and s_α is a new Skolem constant.

A theorem which cannot be proven in CR if the splitting rules are not available is ∃A_o. A. After negation this statement normalises to clause C_1 : [A]^F, such that none but the splitting rules are applicable. With the help of rule S^{TF}_¬ and eager unification, however, we can derive C_2 : [A′]^T which is then successfully resolvable against C_1.

Extensionality Treatment. On the one hand η-convertibility is built into higher-order unification, such that calculus CR already supports functional extensionality reasoning to a certain extent. On the other hand CR nevertheless fails to address full extensionality as it does not realise the required subtle interplay between the functional and Boolean extensionality principles. For example, without employing additional Boolean and functional extensionality axioms CR cannot prove the rather simple examples presented in Sections 4.2, 4.3, and 4.4.

Proof Search. Initially the proof problem is negated and normalised. The main proof search then operates on the generated clauses by applying the resolution, factorisation, and splitting rules. Despite the possibility of eager unification, CR generally delays the higher-order unification process in order to overcome the undecidability problem. When deriving an empty clause, CR then tests whether the accumulated unification constraints justifying this particular refutation are solvable. Like R, the extensionality treatment of CR requires the addition of infinitely many extensionality axioms to the search space. The following figure graphically illustrates the main ideas of the proof search in CR.

Completeness Results. Huet (1972, 1973a) analyses completeness of CR only with respect to Andrews’ V-complexes, i.e., Huet verifies that the set of non-refutable sentences in CR is an abstract consistency class for V-complexes.

THEOREM 8 (V-completeness of CR). The calculus CR is complete with respect to the notion of V-complexes.

We now extend this result and prove Henkin completeness of calculus CR.

THEOREM 9 (Henkin completeness of CR). The calculus CR is complete with respect to Henkin semantics provided that the infinitely many extensionality axioms are given.

Proof. Analogously to the proof of Theorem 5 we can reduce the problem to verifying that the set of non-refutable sentences in CR enriched by the extensionality axioms is an abstract consistency class for Henkin models. The assertion then follows in an indirect argument employing Theorem 3. In addition to the abstract consistency properties already examined in Huet (1972, 1973a) for Theorem 8 we have to verify saturatedness, ∇η, ∇b, and ∇q as specified in Definition 1. The proofs of all four statements are analogous to the corresponding parts in the proof of Theorem 5. For saturatedness and ∇η we use analogues of Lemmas 6 and 7.

LEMMA 10. Let Φ be a set of sentences and A, B be sentences. If Φ ∪ {A} ⊢_CR □ and Φ ∪ {B} ⊢_CR □, then Φ ∪ {A ∨ B} ⊢_CR □.

Proof. Analogous to the proof of Lemma 6. □

LEMMA 11. Let Φ be a set of sentences and let A, B be sentences in β-normal form, such that A can be transformed into B by (i) a one step η-expansion or (ii) a multiple step η-expansion. Then Φ ∪ {B} ⊢_CR □ implies Φ ∪ {A} ⊢_CR □.

Proof. The proof is analogous to Lemma 7. The main difference is with regard to the derivability of the clauses B_i from their counterparts A_i with the help of C_1 and C_2 obtained from the (suitably instantiated) functional extensionality axioms. It might be the case that the terms T occur inside flexible literals of the clauses A_i. Resolving these flexible literals against C_1 and C_2 then results in flex-flex pairs that cannot be solved eagerly but have to be delayed. E.g., let A_j (1 ≤ j ≤ n) be of form [R (p T)]^ν ∨ D. Instead of B_j := [R (p (λX T X))]^ν ∨ D we can derive only B′_j := [Q (λX T X)]^ν ∨ D ∨ [Q T =? R (p (λX T X))]. Hence, we have to show (in a technically rather complicated inductive proof on the length of the derivation) that each refutation employing B_j can be replaced by a corresponding one employing B′_j. □

3.3. Higher-Order E-Resolution CRE

Some more recent approaches to higher-order theorem proving employ equational higher-order unification instead of syntactical higher-order unification in order to ease and shorten proofs on the resolution layer by relocating particular computation or reasoning tasks to the unification process. For instance, equational higher-order unification has been investigated within the contexts of higher-order rewriting and narrowing (cf. Nipkow and Prehofer 1998; Prehofer 1998), and within the context of restricted higher-order E-resolution (Wolfram 1993).

In this section we will sketch a higher-order E-resolution approach based on calculus CR. In contrast to the other investigated calculi, the aim here is not to provide a detailed description of the particular rules and the functioning of the calculus, but to provide a sufficient basis for investigating to what extent equational higher-order unification can improve the extensionality reasoning in a higher-order theorem prover.

Generally, unification of two (or several) terms S and T aims at computing sets of unifiers, i.e., substitutions σ such that Sσ equals Tσ (Sσ = Tσ). Equational unification extends syntactical unification in the sense that it tries to equalise Sσ and Tσ modulo a fixed equational theory E (written as Sσ =_E Tσ) instead of equalising them syntactically. A survey of unification theory is given in Baader and Siekmann (1994) and Siekmann (1989).

Within our higher-order context we assume that an equational theory E is defined by a fixed set of equations between closed λ-terms. For instance, equations expressing commutativity and associativity of the ∧-operator are (λX_o λY_o X ∧ Y) = (λX_o λY_o Y ∧ X) and (λX_o λY_o λZ_o (X ∧ Y) ∧ Z) = (λX_o λY_o λZ_o X ∧ (Y ∧ Z)). Within this particular theory E (to be more precise, modulo the congruence relation defined by these equations) the following two terms are unifiable by [a/X]: p_{o→o} ((b_o ∧ X_o) ∧ (X_o ∧ b_o)) and p_{o→o} (a_o ∧ (a_o ∧ (b_o ∧ b_o))).

We want to point out that Huet’s unification approach as presented for calculus CR is of course not a purely syntactical one as it already takes αβη-equality into account. We nevertheless call Huet’s approach syntactical higher-order unification in this paper in order to distinguish it from equational higher-order unification in the sense of this subsection, where the theory E may contain additional higher-order equations.

Several, often restricted, approaches to higher-order E-unification have been discussed in the literature. Wolfram (1993) presents a general higher-order E-unification approach which employs higher-order rewriting techniques. An approach restricted to first-order theories is given in Snyder (1990), and another restricted one, where as much computation as possible is pushed to a first-order E-unification procedure, is discussed in Qian and Wang (1996) and Nipkow and Qian (1991). Dougherty and Johann (1992) present a restricted combinatory logic approach. We now sketch our higher-order E-resolution approach CRE.
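The claim that [a/X] unifies the two terms of the commutativity and associativity example modulo E can be checked mechanically. The following Python sketch is only a verifier for a given substitution and not an E-unification procedure; the tuple encoding of terms and the names `substitute`, `flatten` and `ac_normal` are assumptions made for illustration. Equality modulo associativity and commutativity of ∧ is decided by flattening nested conjunctions and sorting the conjuncts.

```python
# Terms: strings are constants or variables; ("and", l, r) is l ∧ r;
# ("app", f, t) is the application of f to t.

def substitute(term, sigma):
    """Apply the substitution sigma (a dict from variable names to terms)."""
    if isinstance(term, str):
        return sigma.get(term, term)
    return (term[0],) + tuple(substitute(sub, sigma) for sub in term[1:])

def flatten(term):
    """Collect the conjuncts of a nested conjunction, left to right."""
    if isinstance(term, tuple) and term[0] == "and":
        return flatten(term[1]) + flatten(term[2])
    return [term]

def ac_normal(term):
    """Canonical form modulo associativity and commutativity of ∧."""
    if isinstance(term, str):
        return term
    if term[0] == "and":
        conjuncts = sorted((ac_normal(sub) for sub in flatten(term)), key=repr)
        return ("and*", tuple(conjuncts))
    return (term[0],) + tuple(ac_normal(sub) for sub in term[1:])

s = ("app", "p", ("and", ("and", "b", "X"), ("and", "X", "b")))
t = ("app", "p", ("and", "a", ("and", "a", ("and", "b", "b"))))
sigma = {"X": "a"}

print(substitute(s, sigma) == substitute(t, sigma))                        # syntactic equality
print(ac_normal(substitute(s, sigma)) == ac_normal(substitute(t, sigma)))  # equality modulo E
```

The instantiated terms differ syntactically but have the same canonical form, so [a/X] is an E-unifier although it is not a syntactical unifier.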

Clause Normalisation, Resolution and Factorisation, and Splitting. We assume that calculus CRE coincides with calculus CR in all but the unification part. Thus CRE provides the clause normalisation, resolution and factorisation, and splitting rules as introduced in Section 3.2.

Equational Unification. Instead of presenting a concrete set of rules for higher-order E-unification we refer to the respective approaches given in Snyder (1990), Nipkow and Qian (1991), Wolfram (1993), and Qian and Wang (1996). For our investigation of CRE it will be of minor importance which particular approach we choose and how general this approach is. Whereas higher-order E-unification can indeed partially improve the extensionality treatment in CRE , we will present simple theorems in Section 4 which cannot be proven in CRE (or in any of the related approaches mentioned above) without additional extensionality axioms. These counterexamples do not depend on the concrete choice of an equational theory E.

3.4. Extensional Higher-Order Resolution ER

We now present the extensional higher-order resolution approach as introduced in Benzmüller and Kohlhase (1998a) and Benzmüller (1999a). In the remainder of this paper we refer to this calculus as ER. ER is Henkin complete without requiring additional extensionality axioms.

λ-Conversion. In contrast to R and CR, calculus ER assumes that all terms, literals, and clauses are implicitly reduced to long βη-normal form.

Clause Normalisation, Resolution and Factorisation, and Unification and Splitting. ER employs the normalisation rules ¬^T, ¬^F, ∨^T, ∨^F_l, ∨^F_r, Π^T, Π^F, the resolution and factorisation rules Res, Fac, and the unification rules Triv, Dec, Func, FlexRigid, Subst as already defined for calculus CR in Section 3.2. Additionally ER employs the infinitely branching unification rule FlexFlex, which guesses instances in case of flex-flex pairs (cf. Conjecture 13 in Section 3.4).

• Guess:

$$\frac{C \lor [F_{\overline{\gamma^n}\to\alpha}\, \overline{U^n} =^? H_{\overline{\delta^m}\to\alpha}\, \overline{V^m}]^F \qquad G \in \mathcal{GB}^h_{\overline{\gamma^n}\to\alpha}}{C \lor [F\, \overline{U^n} =^? H\, \overline{V^m}]^F \lor [F =^? G]^F}\ \mathit{FlexFlex}$$

GB^h_{γ^n→α} is the set of partial bindings of type γ^n → α for a constant h in the given signature.

The splitting rules presented for CR in Section 3.2 are replaced in ER by the more elegant primitive substitution rule as first introduced by Andrews (1989).

• Primitive substitution:

$$\frac{[Q_\gamma\, \overline{U^k}]^\alpha \lor C \qquad P \in \mathcal{GB}_\gamma^{\{\neg,\lor\}\cup\{\Pi^\beta \mid \beta\in\mathcal{T}\}}}{[Q_\gamma\, \overline{U^k}]^\alpha \lor C \lor [Q =^? P]^F}\ \mathit{Prim}$$

GB_γ^{{¬,∨}∪{Π^β | β∈T}} is the set of partial bindings of type γ for a logical constant in the signature.

Extensionality Treatment. Instead of adding infinitely many extensionality axioms to the search space, ER provides two new extensionality rules which closely connect refutation search and eager unification. The idea is to allow for recursive calls from higher-order unification to the overall refutation process. This turns the rather weak syntactical higher-order unification approach considered so far into a most general approach for dynamic higher-order theory unification.

• Unification and equivalence:

$$\frac{C \lor [M_o =^? N_o]^F}{C \lor [M_o \Leftrightarrow N_o]^F}\ \mathit{Equiv}$$

• Unification and Leibniz equality:

$$\frac{C \lor [M_\alpha =^? N_\alpha]^F}{C \lor [\forall P_{\alpha\to o}\ P\,M \Rightarrow P\,N]^F}\ \mathit{Leib}$$

Proof Search. Initially the proof problem is negated and normalised. The main proof search then closely interleaves the refutation process on the resolution layer and unification, i.e., the main proof search rules Res, Fac, and Prim and the unification rules are integrated at a common conceptual level. The calls from unification to the overall refutation process with rules Leib and Equiv introduce new clauses into the search space which can be resolved against already given ones. This close interplay between unification and refutation search compensates for the infinitely many extensionality axioms required in R and CR by a more goal-directed approach to full extensionality reasoning. The following picture graphically illustrates the main ideas of the proof search in ER.

Completeness Results. Henkin completeness of the presented approach with rule FlexFlex is analysed in detail in Benzmüller (1999a) and Benzmüller and Kohlhase (1998a). Here we only mention the main result:

THEOREM 12 (Henkin completeness of ER). The calculus ER is complete with respect to Henkin semantics.

Benzmüller (1999a) presents but does not prove the following interesting claims, which are of major practical importance as they would lead to an enormous reduction of the search spaces in ER.

CONJECTURE 13 (FlexFlex-rule is not needed). Rule FlexFlex can be avoided in ER without affecting Henkin completeness.

CONJECTURE 14 (Base type restriction of rule Leib). Rule Leib can be restricted to base types α in ER without affecting Henkin completeness.

4. EXAMPLES

In this section we compare the extensionality treatment provided by the calculi R, CR, CRE, and ER with the help of simple examples. Despite their simplicity the latter two of these examples are nevertheless challenging with respect to their automation in a higher-order theorem prover.

4.1. η-Equality

EXAMPLE 15. f_{ι→ι} ≐ λX_ι f X

Solution in R. In order to prove Example 15, which normalises after negation and expansion of Leibniz equality to C_1 : [q f]^F and C_2 : [q (λX f X)]^T, where q_{(ι→ι)→o} is a new Skolem term, we first have to appropriately instantiate the two functional extensionality clauses E^{α→β}_1 and E^{α→β}_2 with the help of rule Sub:

$$E^{\iota\to\iota}_1:\ [p\,(f\,s)]^T \lor [Q\,f]^F \lor [Q\,(\lambda X\, f\,X)]^T$$

$$E^{\iota\to\iota}_2:\ [p\,(f\,s)]^F \lor [Q\,f]^F \lor [Q\,(\lambda X\, f\,X)]^T$$

Employing cut and simplification we can derive

$$C_3:\ [Q\,f]^F \lor [Q\,(\lambda X\, f\,X)]^T$$

which corresponds to the Leibniz equation between f and (λX f X). With rule Sub we then substitute the term λM_{ι→ι} ¬(q M) for the predicate variable Q, re-normalise the generated pre-clause, and obtain

$$C_4:\ [q\,f]^T \lor [q\,(\lambda X\, f\,X)]^F$$

By applying the cut rule to C_4, C_1, and C_2 we then derive □.

Solution in CR, CRE, and ER. We first sketch the proof of Example 15 in CR. Initially we resolve on C_1 : [q f]^F and C_2 : [q (λX f X)]^T and thereby obtain the unification constraint C_3 : □ ∨ [f =? (λX f X)]^F. The η-equality of the two unification terms is shown with the help of the unification rule Func, which derives the trivial unification constraint C_4 : □ ∨ [f s =? f s]^F (where s_ι is a new Skolem term). This unification constraint can subsequently be eliminated with rule Triv. This example illustrates that higher-order unification already addresses weak functional extensionality (η-equality).

An analogous refutation can clearly be employed in calculus CRE as weak functional extensionality is built into higher-order E-unification as well.

Example 15 is trivially solvable in ER due to the fact that we implicitly assume all terms to be in long βη-normal form, i.e., the clauses to be refuted are C_1 : [q (λX f X)]^F and C_2 : [q (λX f X)]^T. Clearly, when considering long βη-normal forms instead of β-normal forms the problem is trivially solvable in calculi R, CR, and CRE as well.

4.2. Set Descriptions

In higher-order logic sets can be elegantly encoded by characteristic functions. An interesting problem then is to investigate whether two encodings describe the same set. The following trivial example demonstrates the importance of the extensionality principles for this purpose.

EXAMPLE 16. The set of all red balls equals the set of all balls that are red: {X | red X ∧ ball X} = {X | ball X ∧ red X}. This problem can be encoded as (λX_ι red X ∧ ball X) ≐ (λX_ι ball X ∧ red X).

Negation, expansion of Leibniz equality, and clause normalisation lead to the following clauses (where p_{(ι→o)→o} is a new Skolem constant):

$$C_1:\ [p\,(\lambda X\ \mathit{red}\,X \land \mathit{ball}\,X)]^F \qquad C_2:\ [p\,(\lambda X\ \mathit{ball}\,X \land \mathit{red}\,X)]^T$$
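To see which principles Example 16 really needs, it can be restated in a proof assistant. The following Lean 4 sketch is an illustration only: Lean’s type theory is not the simply typed logic of this paper, its built-in equality takes the place of Leibniz equality, and the binder names are chosen freely. It uses only `funext`, `propext`, and `And.comm` from the Lean core library.

```lean
-- Example 16: two descriptions of the same set of red balls.
example {ι : Type} (red ball : ι → Prop) :
    (fun X => red X ∧ ball X) = (fun X => ball X ∧ red X) :=
  funext fun _ =>        -- functional extensionality: compare the bodies pointwise
    propext And.comm     -- Boolean extensionality: equivalent propositions are equal
```

The proof term applies functional extensionality first and Boolean extensionality underneath it, which is the interplay of the two principles that the calculi compared below have to reproduce.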

Solution in R. As no rule is applicable to C_1 and C_2, Example 16 is not refutable in R without employing extensionality axioms. The only way to derive a contradiction is to employ suitable instances of the extensionality clauses in a rather complicated derivation:

1. With rule Sub instantiate the Boolean extensionality axioms E^o_1 and E^o_2 with the terms (red Y ∧ ball Y) and (ball Y ∧ red Y) for variables A and B. By normalising and employing simplification exhaustively to the resulting pre-clauses we obtain among others:

$$C_3:\ [\mathit{red}\,Y]^F \lor [\mathit{ball}\,Y]^F \lor [P\,F^1]^F \lor [P\,G^1]^T$$

$$C_4:\ [\mathit{red}\,Y]^T \lor [P\,F^1]^F \lor [P\,G^1]^T$$

$$C_5:\ [\mathit{ball}\,Y]^T \lor [P\,F^1]^F \lor [P\,G^1]^T$$

where F^1 stands for the term (red Y ∧ ball Y) and G^1 for (ball Y ∧ red Y). From C_3–C_5 we derive C_6 : [P F^1]^F ∨ [P G^1]^T by cut and simplification, where C_6 corresponds to the clause normal form of ∀Y ((λX red X ∧ ball X) Y) ≐ ((λX ball X ∧ red X) Y).

2. With rule Sub we now instantiate the functional extensionality axioms E^{ι→o}_1 and E^{ι→o}_2 with terms F^2 := (λX red X ∧ ball X) for variable F and G^2 := (λX ball X ∧ red X) for variable G.

$$C_7:\ [q\,(\mathit{red}\,s \land \mathit{ball}\,s)]^T \lor [Q\,F^2]^F \lor [Q\,G^2]^T$$

$$C_8:\ [q\,(\mathit{ball}\,s \land \mathit{red}\,s)]^F \lor [Q\,F^2]^F \lor [Q\,G^2]^T$$

3. Applying substitution [(λZ q Z)/P, s/Y] with rule Sub to clause C_6 leads to:

$$C_9:\ [q\,(\mathit{red}\,s \land \mathit{ball}\,s)]^F \lor [q\,(\mathit{ball}\,s \land \mathit{red}\,s)]^T$$

Applying cut and simplification we combine the results of the above steps and derive from C_7, C_8, and C_9

$$C_{10}:\ [Q\,(\lambda X\ \mathit{red}\,X \land \mathit{ball}\,X)]^F \lor [Q\,(\lambda X\ \mathit{ball}\,X \land \mathit{red}\,X)]^T$$

which represents the Leibniz equation between (λX red X ∧ ball X) and (λX ball X ∧ red X). With the help of C_1 and C_2 we can now derive □ after appropriately instantiating C_10 with [p/Q]. Note that in Steps 1 and 2 we had to guess the right instantiations of the extensionality axioms and to apply non-goal-directed forward reasoning.

Solution in CR. The only rule that is applicable to C1 and C2 in calculus CR is the resolution rule Res, leading to the following unification constraint:

C3: □ ∨ [p (λX red X ∧ ball X) =? p (λX ball X ∧ red X)]

As this unification constraint is obviously not solvable by syntactical higher-order unification, we cannot find a refutation on this derivation path. As in calculus R, the only way to find a refutation is to guess appropriate instances of the extensionality axioms and to derive from them clause C10 representing the Leibniz equation between (λX red X ∧ ball X) and (λX ball X ∧ red X). A concrete derivation can be carried out analogously to the above derivation in R. The only difference is that we employ resolution and factorisation instead of cut and simplification. In contrast to R we thereby gain additional guidance with respect to finding some of the required instantiations when combining the resolution/factorisation steps with eager unification attempts. But note that this only holds for the instantiation of non-formulas, e.g., as given in Step 3. The key step in the proof, namely the instantiation of the extensionality axioms in Step 1 with appropriate formulas as arguments, is not supported by unification. Instead the splitting rules have to be employed in order to guess the right instances. The problem with the splitting rules (or analogously the primitive substitution rule) is that each application introduces new clauses with flexible literals into the search space (in the case of S^T and S^F even infinitely many), such that the splitting rules become recursively applicable to the new clauses as well.

Consequently, the extensionality treatment in CR is, analogously to the one in R, rather hard to guide in practice. Overwhelming the search space with extensionality clauses and applying forward reasoning to them furthermore fundamentally contradicts the intended character of resolution-based theorem proving.

Solution in CRE. Analogous to the unsuccessful initial attempt in CR we first resolve between C1 and C2 and obtain

C3: □ ∨ [p (λX red X ∧ ball X) =? p (λX ball X ∧ red X)]

Whereas syntactical unification as employed in CR clashes on this unification constraint, calculus CRE can solve this E-unification problem provided that the employed E-unification algorithm covers commutativity of the ∧-operator (i.e., E |= (λX_o λY_o X ∧ Y) = (λX_o λY_o Y ∧ X)). Hence, depending on the particular unification theory E, calculus CRE can provide more goal-directed solutions to particular examples and avoid applications of the extensionality axioms. However, the examples below will demonstrate that E-unification does not provide a general solution.
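The effect of building commutativity of ∧ into the equality test can be illustrated with a small sketch. It is not the E-unification algorithm of CRE: it only decides equality of closed terms modulo commutativity by normalising both sides, and the tuple encoding of terms as well as the names `normalise`, `lhs`, and `rhs` are assumptions made for this illustration.

```python
# Terms are nested tuples: (head, arg1, ..., argn); atoms are strings.
# Assumption for this sketch: "and" is the only commutative symbol, and
# bound variables already carry identical names on both sides.

def normalise(term):
    """Return a canonical form of `term` modulo commutativity of 'and'."""
    if isinstance(term, tuple):
        head, *args = term
        args = [normalise(a) for a in args]
        if head == "and":
            # Sorting the arguments picks one representative per C-class.
            args.sort(key=repr)
        return (head, *args)
    return term

# p (λX red X ∧ ball X)  and  p (λX ball X ∧ red X)
lhs = ("p", ("lam", "X", ("and", ("red", "X"), ("ball", "X"))))
rhs = ("p", ("lam", "X", ("and", ("ball", "X"), ("red", "X"))))

syntactically_equal = lhs == rhs                  # purely syntactic comparison
equal_modulo_c = normalise(lhs) == normalise(rhs)  # comparison modulo commutativity
print(syntactically_equal, equal_modulo_c)
```

The syntactic comparison plays the role of the clash in CR, while the comparison after normalisation corresponds to the case in which the theory E covers commutativity of ∧.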

Solution in ER. Calculus ER provides another goal-directed solution avoiding the extensionality axioms. Instead of employing equational unification, calculus ER analyses the unifiability of the unification constraint C3 with the help of a recursive call from within its unification algorithm to its own overall refutation process. Clearly, this idea can be seen as a very general form of equational unification, namely equational unification modulo the theory defined by the given clause context and full higher-order logic. As above, we initially resolve between C1 and C2 and obtain clause C3. Then we transform C3 with the unification rules Dec and Func into

C4: □ ∨ [red s ∧ ball s =? ball s ∧ red s]

and apply a recursive call to the overall refutation process with the Boolean extensionality rule Equiv. After normalisation and elimination of identical literals we thereby obtain the following trivially refutable set of propositional clauses:

C5: [red s]^F ∨ [ball s]^F
C6: [red s]^T
C7: [ball s]^T
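That the final clause set is refutable can be replayed with a minimal propositional resolution procedure. The following is only a sketch under our own encoding (literals as pairs of an atom name and a polarity, mirroring the superscripts T and F); the function names `resolvents` and `refutable` are hypothetical and are not part of calculus ER or of LEO.

```python
from itertools import combinations

# A literal is (atom, polarity); a clause is a frozenset of literals.

def resolvents(c1, c2):
    """Yield all propositional resolvents of the clauses c1 and c2."""
    for atom, pol in c1:
        if (atom, not pol) in c2:
            yield frozenset((c1 - {(atom, pol)}) | (c2 - {(atom, not pol)}))

def refutable(clauses):
    """Saturate under resolution; True iff the empty clause is derived."""
    known = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for r in resolvents(c1, c2):
                if not r:          # empty clause found
                    return True
                new.add(r)
        if new <= known:           # saturation reached without refutation
            return False
        known |= new

c5 = frozenset({("red s", False), ("ball s", False)})
c6 = frozenset({("red s", True)})
c7 = frozenset({("ball s", True)})
print(refutable([c5, c6, c7]))
```

Because only finitely many clauses exist over a finite set of atoms, the saturation loop terminates; dropping C7 leaves a satisfiable set, for which the procedure reports that no refutation exists.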

4.3. Reasoning with Classical Logic

The following theorem states that all unary logical operators O_o→o which map the propositions a and b to ⊤ consequently also map a ∧ b to ⊤.

EXAMPLE 17. ∀Oo→o (O ao) ∧ (O bo) ⇒ (O (ao ∧ bo)).

Negation and normalisation lead to the following clauses (o_o→o is a Skolem constant for O):

C1: [o a]^T
C2: [o b]^T
C3: [o (a ∧ b)]^F

Solution in R. Obviously there is no rule applicable to C1–C3. As in Section 4.2 we are forced to appropriately instantiate the extensionality axioms. In particular we employ the following two instantiations of the Boolean extensionality principle EXT_o^≐:

(a ⇔ (a ∧ b)) ⇔ (a ≐^o (a ∧ b)) and (b ⇔ (a ∧ b)) ⇔ (b ≐^o (a ∧ b))

That means we guess the substitutions [a/A, (a ∧ b)/B] and [b/A, (a ∧ b)/B] and then instantiate the Boolean extensionality clauses E1^o and E2^o with rule Sub. From the instantiated clauses we can now derive

C4: [P a]^F ∨ [P (a ∧ b)]^T ∨ [Q b]^F ∨ [Q (a ∧ b)]^T

which represents that (a ≐ (a ∧ b)) ∨ (b ≐ (a ∧ b)). By instantiating P and Q with o and simplification we obtain:

C5: [o a]^F ∨ [o b]^F ∨ [o (a ∧ b)]^T

Resolving against C1, C2, and C3 leads to □.

Solution in CR and CRE. There are only two possible proof steps at the very beginning: resolve between C1 and C3 and between C2 and C3. Thereby we get

C4: □ ∨ [o a =? o (a ∧ b)]
C5: □ ∨ [o b =? o (a ∧ b)]

Neither unification constraint is solvable by syntactical higher-order unification or by higher-order E-unification. Successful refutations in CR and CRE therefore require the application of appropriately instantiated extensionality clauses, as demonstrated within the refutation in calculus R above. Note that higher-order (E-)unification does not even provide any support for choosing the right instantiations of the extensionality axioms. Hence both calculi, CR as well as CRE, cannot be Henkin complete without additional extensionality axioms.

Solution in ER. ER allows for a straightforward refutation of the clauses C1–C3. As in CR and CRE, the only possible steps at the beginning are to resolve between C1 and C3 and between C2 and C3. Thereby we get

C4: □ ∨ [o a =? o (a ∧ b)]
C5: □ ∨ [o b =? o (a ∧ b)]

Decomposing the unification constraints in both clauses leads to

C6: □ ∨ [a =? (a ∧ b)]
C7: □ ∨ [b =? (a ∧ b)]

When regarded in isolation, these unification constraints are obviously neither syntactically nor semantically solvable. When considering them simultaneously, however, it is easy to see that at least one of the two unification constraints must be solvable. Such non-constructive reasoning on the simultaneous solvability/non-solvability of unification constraints is handled in ER by recursive calls from unification to the overall proof search. In this sense ER intuitively first assumes that the unification constraints are simultaneously not solvable and then tries to refute this assumption. More concretely, the recursive calls with rule Equiv applied to C6 and C7 introduce, after normalisation and factorisation, the following clauses into the search space (note the importance of the fact that the generated clauses are analysed in a common context):

C5: [a]^F ∨ [b]^F
C6: [a]^T ∨ [b]^T
C7: [a]^T
C8: [b]^T

Clauses C5–C8 can be refuted immediately, which contradicts the assumption of the simultaneous semantical non-unifiability of the unification constraints in C6 and C7. Hence, either C6 or C7 must already be the empty clause, which justifies the proof.
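The non-constructive argument can be checked semantically by enumerating truth values. The sketch below is only a sanity check in the two-element Boolean domain, not a rendering of the ER rules; the helper names `example_17_holds`, `solvable_alone`, and `one_constraint_always_solvable` are our own.

```python
from itertools import product

BOOLS = (False, True)

# All four unary Boolean operators, each given by its value table
# (value at False, value at True).
OPERATORS = [lambda x, t=t: t[int(x)] for t in product(BOOLS, repeat=2)]

def example_17_holds():
    """Check (O a) ∧ (O b) ⇒ O (a ∧ b) for every operator O and all a, b."""
    return all(
        (not (O(a) and O(b))) or O(a and b)
        for O in OPERATORS
        for a, b in product(BOOLS, repeat=2)
    )

def solvable_alone(pick):
    """Is the single constraint pick(a, b) ⇔ (a ∧ b) valid for all a, b?"""
    return all(pick(a, b) == (a and b) for a, b in product(BOOLS, repeat=2))

def one_constraint_always_solvable():
    """For all a, b: a ⇔ (a ∧ b) holds or b ⇔ (a ∧ b) holds."""
    return all(
        (a == (a and b)) or (b == (a and b))
        for a, b in product(BOOLS, repeat=2)
    )

print(example_17_holds())
print(solvable_alone(lambda a, b: a), solvable_alone(lambda a, b: b))
print(one_constraint_always_solvable())
```

Neither constraint is valid on its own, yet under every assignment at least one of them holds, which is exactly the situation that the recursive calls with rule Equiv exploit.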

4.4. Mappings from Booleans to Booleans

We already mentioned in Section 2.3 that in Henkin semantics the domain D_o of all Booleans contains exactly the truth values ⊥ and ⊤. Consequently the domain of all mappings from Booleans to Booleans⁵ contains exactly the denotations of the following four functions: λX_o X_o, λX_o ¬X_o, λX_o ⊥, and λX_o ⊤. This theorem can be formulated as follows (where f_o→o is a constant):

(f = λX_o X_o) ∨ (f = λX_o ¬X_o) ∨ (f = λX_o ⊥) ∨ (f = λX_o ⊤)
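Semantically, the theorem says that every function on the two truth values coincides extensionally with one of the four listed terms. A brute-force enumeration makes this concrete; the dictionary `NAMED` and the helper `table` are names chosen for this sketch only.

```python
from itertools import product

BOOLS = (False, True)

# The four function terms from the theorem, as Python functions.
NAMED = {
    "identity": lambda x: x,        # λX X
    "negation": lambda x: not x,    # λX ¬X
    "const_false": lambda x: False, # λX ⊥
    "const_true": lambda x: True,   # λX ⊤
}

def table(f):
    """Extensional description of f: its values on False and on True."""
    return tuple(f(x) for x in BOOLS)

# Every possible value table of a function from Booleans to Booleans.
all_tables = set(product(BOOLS, repeat=2))

named_tables = {name: table(f) for name, f in NAMED.items()}
print(sorted(named_tables.items()))
print(set(named_tables.values()) == all_tables)
```

The four terms have pairwise distinct value tables and together exhaust all possible tables, so no function f can differ from all four of them.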

By unfolding the definition of Leibniz equality, negating the theorem, and applying clause normalisation we obtain the following clauses (where p^1, ..., p^4 are Skolem constants):

D1: [p^1 f]^T    D2: [p^1 (λX_o X_o)]^F
D3: [p^2 f]^T    D4: [p^2 (λX_o ¬X_o)]^F
D5: [p^3 f]^T    D6: [p^3 (λX_o ⊥)]^F
D7: [p^4 f]^T    D8: [p^4 (λX_o ⊤)]^F

Solution in R, CR , and CRE . As the reader may easily check, none of the applicable resolution steps leads to a unification constraint that is solvable by higher-order unification or higher-order E-unification (independent from theory E). In order to find a refutation appropriate instances of the extensionality principles are needed, just as illustrated in the previous example. Because of lack of space we do not present the quite lengthy refutation here.

Solution in ER. In ER we can find the following goal-directed refutation of the clauses D1, ..., D8. We first resolve between the related clauses D1 and D2, D3 and D4, D5 and D6, and D7 and D8, and immediately decompose the head symbols in the unification pairs. Thereby we obtain the following four clauses, each consisting of exactly one unification constraint.

C1: [p = x]^F
C2: [p = ¬x]^F
C3: [p = ⊥]^F
C4: [p = ⊤]^F

Whereas none of these unification constraints is solvable taken alone (not even by E-unification), it is possible in calculus ER to refute the assumption that these unification constraints are simultaneously not solvable. As in the previous example, the idea of the following derivation is to show that one of these unification constraints must always be solvable, even though one cannot specify which one. The proof presented here has been automatically generated by the prototypical higher-order theorem prover LEO (Benzmüller and Kohlhase 1998b), which implements calculus ER, within 25 seconds on a Pentium II with 400 MHz. Each line presented below introduces a new clause (the line numbering thereby corresponds to the clause numbering) by applying the specified calculus rules to previously derived clauses. For instance, line 32 describes that clause C32 is derived from clauses C17 and C16 by resolution with rule Res and immediate elimination of trivial unification constraints with rule Triv. In the proof below s^1, ..., s^4 are new Skolem constants of Boolean type introduced by the functional extensionality rule Func at the very beginning of the refutation.

   5: Func(C4)                  C5:    [(p s^3) = ⊤]^F
   6: Func(C3)                  C6:    [(p s^2) = ⊥]^F
   7: Func(C2)                  C7:    [(p s^4) = (¬ s^4)]^F
   8: Func(C1)                  C8:    [(p s^1) = s^1]^F
  10: Equiv+Cnf(C5)             C10:   [(p s^3)]^F
  13: Equiv+Cnf(C6)             C13:   [(p s^2)]^T
  16: Equiv+Cnf(C7)             C16:   [s^4]^T ∨ [(p s^4)]^F
  17: Equiv+Cnf(C7)             C17:   [(p s^4)]^T ∨ [s^4]^F
  20: Equiv+Cnf(C8)             C20:   [(p s^1)]^F ∨ [s^1]^F
  21: Equiv+Cnf(C8)             C21:   [s^1]^T ∨ [(p s^1)]^T
  32: Res+Triv(C17; C16)        C32:   [(p s^4)]^T ∨ [(p s^4)]^F
  36: Res(C20; C17)             C36:   [s^4]^F ∨ [s^1]^F ∨ [(p s^1) = (p s^4)]^F
  42: Dec(C36)                  C42:   [s^1]^F ∨ [s^4]^F ∨ [s^1 = s^4]^F
  56: Equiv+Cnf(C42)            C56:   [s^1]^F ∨ [s^4]^F
  76: Res(C32; C21)             C76:   [s^1]^T ∨ [(p s^4)]^T ∨ [(p s^4) = (p s^1)]^F
  85: Dec(C76)                  C85:   [(p s^4)]^T ∨ [s^1]^T ∨ [s^4 = s^1]^F
 134: Equiv+Cnf(C85)            C134:  [(p s^4)]^T ∨ [s^1]^T ∨ [s^4]^T
 141: Res+Triv(C56; C16)        C141:  [(p s^4)]^F ∨ [s^1]^F
 144: Res+Triv(C56; C21)        C144:  [(p s^1)]^T ∨ [s^4]^F
 163: Res+Triv(C141; C21)       C163:  [(p s^1)]^T ∨ [(p s^4)]^F
 211: Res(C163; C13)            C211:  [(p s^1)]^T ∨ [(p s^4) = (p s^2)]^F
 237: Dec(C211)                 C237:  [(p s^1)]^T ∨ [s^4 = s^2]^F
 250: Res+Triv(C134; C16)       C250:  [s^4]^T ∨ [s^1]^T
 255: Res+Triv(C134; C17)       C255:  [s^1]^T ∨ [(p s^4)]^T
 387: Res+Triv(C255; C20)       C387:  [(p s^4)]^T ∨ [(p s^1)]^F
 458: Res(C387; C10)            C458:  [(p s^1)]^F ∨ [(p s^4) = (p s^3)]^F
 459: Res(C387; C13)            C459:  [(p s^4)]^T ∨ [(p s^1) = (p s^2)]^F
 492: Dec(C458)                 C492:  [(p s^1)]^F ∨ [s^4 = s^3]^F
 493: Dec(C459)                 C493:  [(p s^4)]^T ∨ [s^1 = s^2]^F
 519: Equiv+Cnf(C493)           C519:  [(p s^4)]^T ∨ [s^1]^F ∨ [s^2]^F
 523: Equiv+Cnf(C492)           C523:  [(p s^1)]^F ∨ [s^4]^F ∨ [s^3]^F
 558: Res+Triv(C519; C141)      C558:  [s^2]^F ∨ [s^1]^F
 592: Res+Triv(C558; C21)       C592:  [(p s^1)]^T ∨ [s^2]^F
 610: Res+Triv(C558; C250)      C610:  [s^4]^T ∨ [s^2]^F
 664: Res(C592; C10)            C664:  [s^2]^F ∨ [(p s^1) = (p s^3)]^F
 706: Dec(C664)                 C706:  [s^2]^F ∨ [s^1 = s^3]^F
 783: Res+Triv(C523; C144)      C783:  [s^3]^F ∨ [s^4]^F
 820: Res+Triv(C783; C610)      C820:  [s^2]^F ∨ [s^3]^F
 824: Res+Triv(C783; C16)       C824:  [(p s^4)]^F ∨ [s^3]^F
 912: Res(C824; C13)            C912:  [s^3]^F ∨ [(p s^4) = (p s^2)]^F
 952: Dec(C912)                 C952:  [s^3]^F ∨ [s^4 = s^2]^F
1078: Equiv+Cnf(C952)           C1078: [s^2]^T ∨ [s^4]^T ∨ [s^3]^F
1144: Res+Triv(C1078; C783)     C1144: [s^2]^T ∨ [s^3]^F
1218: Res+Triv(C1144; C820)     C1218: [s^3]^F
1302: Equiv+Cnf(C706)           C1302: [s^3]^T ∨ [s^1]^T ∨ [s^2]^F
1363: Res+Triv(C1302; C558)     C1363: [s^3]^T ∨ [s^2]^F
1377: Res+Triv(C1363; C1218)    C1377: [s^2]^F
1454: Equiv+Cnf(C237)           C1454: [(p s^1)]^T ∨ [s^2]^T ∨ [s^4]^T
1502: Res+Triv(C1454; C144)     C1502: [s^2]^T ∨ [(p s^1)]^T
1521: Res+Triv(C1502; C1377)    C1521: [(p s^1)]^T
1560: Res(C1521; C10)           C1560: [(p s^1) = (p s^3)]^F
1565: Res+Triv(C1521; C20)      C1565: [s^1]^F
1576: Dec(C1560)                C1576: [s^1 = s^3]^F
1643: Equiv+Cnf(C1576)          C1643: [s^3]^T ∨ [s^1]^T
1646: Res+Triv(C1643; C1218)    C1646: [s^1]^T
1655: Res+Triv(C1646; C1565)    C1655: □

4.5. Additional Examples and Case Studies

Benzmüller (1999a) discusses several additional examples that require full extensionality reasoning, such as the following example on sets:

℘(∅) = {∅}

It furthermore reports on case studies with the higher-order theorem prover LEO (Benzmüller and Kohlhase 1998b) that demonstrate the feasibility of calculus ER in practice.

5. RELATED WORK

Related to calculus CR is the higher-order resolution approach of Jensen and Pietrzykowski (1972, 1976), which also employs a higher-order unification algorithm in order to guide the proof search. The undecidability problem of higher-order unification is thereby tackled by dovetailing the generation of resolvents. Like CR, this approach requires the extensionality axioms in the search space to ensure Henkin completeness.

Kohlhase (1994) presents a sorted variant of Huet’s constrained resolution approach. Kohlhase (1995) discusses a higher-order tableaux calculus that is quite closely related to calculus ER, as it already introduces additional calculus rules in order to improve its extensionality treatment. As is illustrated in detail in Benzmüller (1999a), the presented extensionality rules are unfortunately not sufficient to completely avoid additional extensionality axioms. The first sufficient set of extensionality rules in this sense is presented in Benzmüller (1997), which introduces a variant of calculus ER as presented here.

The theorem proving modulo approach described in Dowek et al. (1998) is a way to remove computational arguments from proofs by reasoning modulo a congruence on propositions that is handled via rewrite rules and equations. In their paper the authors present a higher-order logic as a theory modulo.

Equality is usually treated as a defined notion in approaches and systems for automated higher-order theorem proving. This is probably the main reason why the problem of mechanising primitive equality in higher-order logic while preserving Henkin completeness has rarely been addressed in the literature so far. Approaches to integrating primitive equality in a Henkin complete higher-order theorem proving approach are discussed in Snyder and Lynch (1991) and Benzmüller (1999a, b).

Of course, the field of higher-order term rewriting and narrowing (Prehofer 1998; Nipkow and Prehofer 1998; Nipkow 1995) is very active. But calculi developed in this context typically only address functional extensionality and do not focus on the subtle interplay between functional and Boolean extensionality that is required in a Henkin complete theorem proving approach.

The most powerful automated higher-order theorem prover currently available is (to the best knowledge of the author) the TPS system (Andrews 1996), which employs the mating method (Andrews 1976) as its inference mechanism. TPS employs a clever extensionality pre-processing mechanism which transforms embedded equations in input formulas into more appropriate ones in order to avoid later applications of the extensionality axioms. However, this does not provide a general solution, and many theorems requiring non-trivial extensionality reasoning, such as Examples 3.4 and 4.4, cannot be proven this way.

6. CONCLUSION

In this paper we investigated four approaches to resolution-based higher-order theorem proving: Andrews’ higher-order resolution approach R, Huet’s constrained resolution approach CR, higher-order E-resolution CRE, and extensional higher-order resolution ER. Thereby we focused on the extensionality treatment of these approaches and pointed to the crucial role of full extensionality for ensuring Henkin completeness. The investigated examples demonstrate that simply adding (infinitely many) extensionality axioms to the search space, as suggested for R and CR, increases the amount of blind search and is thus rather infeasible in practice.

Whereas higher-order E-unification and E-resolution indeed improve the situation in particular contexts, they still do not provide a general solution. Calculus ER is the sole studied approach that can completely avoid the extensionality axioms. Its extensionality treatment is based on goal-directed extensionality rules which closely connect the overall refutation search with unification by allowing for mutual recursive calls. This suitably extends the higher-order E-unification and E-resolution idea, as it turns the unification mechanism into a most general, dynamic theory unification mechanism. Unification may now itself employ a Henkin complete higher-order theorem prover as a subordinated reasoning system, and the considered theory (which is defined by the sum of all clauses in the actual search space) dynamically changes. Due to the close connection of unification and refutation search it is even possible in ER to realise a kind of non-constructive reasoning on E-unifiability, as was demonstrated in this paper.

ACKNOWLEDGEMENTS

I want to thank Volker Sorge and the anonymous referee of this paper for their useful comments and contributions. I am also grateful to Michael Kohlhase for many fruitful discussions on extensional higher-order theorem proving.

NOTES

1. Conjunction elimination is provided by the rules ∨^F_l and ∨^F_r. We note that conjunction is defined with the help of disjunction and negation; cf. Section 2.1.
2. Existential elimination is realised by the rule Π^F. For this note that existential quantification is defined with the help of universal quantification (and universal quantification with the help of Π); cf. Section 2.1.
3. It is still an open problem whether it is possible to restrict the required instances of the functional extensionality axioms in dependence of a given proof problem.
4. One may choose a bound on the allowed number of nested branchings in the search tree with rule FlexRigid.
5. Since D_o contains two elements, D_o→o contains in each Henkin model at most four elements. And because of the requirement that the function domains in Henkin models must be rich enough such that every term has a denotation, it follows that D_o→o contains exactly the pairwise distinct denotations of the four presented function terms.

REFERENCES

Andrews, P. B.: 1971, ‘Resolution in Type Theory’, Journal of Symbolic Logic 36, 414–432.
Andrews, P. B.: 1972, ‘General Models and Extensionality’, Journal of Symbolic Logic 37, 395–397.
Andrews, P. B.: 1973, Letter to Roger Hindley dated January 22.
Andrews, P. B.: 1976, ‘Refutations by Matings’, IEEE Transactions on Computers C-25, 801–807.
Andrews, P. B.: 1989, ‘On Connections and Higher Order Logic’, Journal of Automated Reasoning 5, 257–291.
Andrews, P. B., Bishop, M., Issar, S., Nesmith, D., Pfenning, F., and Xi, H.: 1996, ‘TPS: A Theorem Proving System for Classical Type Theory’, Journal of Automated Reasoning 16, 321–353.
Barendregt, H. P.: 1984, The Lambda Calculus – Its Syntax and Semantics, Studies in Logic and the Foundations of Mathematics 103, Amsterdam.
Benzmüller, C.: 1997, A Calculus and a System Architecture for Extensional Higher-order Resolution, Research Report 97-198, Department of Mathematical Sciences, Carnegie Mellon University.
Benzmüller, C.: 1999a, Equality and Extensionality in Automated Higher-Order Theorem Proving, Ph.D. thesis, Technische Fakultät, Universität des Saarlandes.
Benzmüller, C.: 1999b, in H. Ganzinger (ed.), Proceedings of the 16th Conference on Automated Deduction, Lecture Notes in Artificial Intelligence 1632, pp. 399–413, Springer.
Benzmüller, C. and Kohlhase, M.: 1997, ‘Model Existence for Higher-Order Logic’, SEKI-Report SR-97-09, Fachbereich Informatik, Universität des Saarlandes.
Benzmüller, C. and Kohlhase, M.: 1998a, ‘Extensional Higher-order Resolution’, in Kirchner and Kirchner (eds.), Proceedings of the 15th Conference on Automated Deduction, Lecture Notes in Artificial Intelligence 1421, pp. 56–72, Springer.
Benzmüller, C. and Kohlhase, M.: 1998b, ‘LEO – A Higher-order Theorem Prover’, in Kirchner and Kirchner (eds.), Proceedings of the 15th Conference on Automated Deduction, Lecture Notes in Artificial Intelligence 1421, pp. 139–144, Springer.
Baader, F. and Siekmann, J.: 1994, ‘Unification Theory’, in D. M. Gabbay, C. J. Hogger, J. A. Robinson (eds.), Handbook of Logic in Artificial Intelligence and Logic Programming, Volume 2: Deduction Methodologies, Oxford, Chapter 2.2.
Church, A.: 1940, ‘A Formulation of the Simple Theory of Types’, Journal of Symbolic Logic 5, 56–68.
Dowek, G., Hardin, T., and Kirchner, C.: 1998, Theorem Proving Modulo, Rapport de Recherche 3400, Institut National de Recherche en Informatique et en Automatique.
Dougherty, D. and Johann, P.: 1992, ‘A Combinatory Logic Approach to Higher-order E-unification’, in D. Kapur (ed.), Proceedings of the 11th Conference on Automated Deduction, Lecture Notes in Artificial Intelligence 607, pp. 79–93, Springer.
Goldfarb, W. D.: 1981, ‘The Undecidability of the Second-order Unification Problem’, Theoretical Computer Science 13, 225–230.
Henkin, L.: 1950, ‘Completeness in the Theory of Types’, Journal of Symbolic Logic 15, 81–91.
Huet, G. P.: 1972, Constrained Resolution: A Complete Method for Higher Order Logic, Ph.D. thesis, Case Western Reserve University.
Huet, G. P.: 1973, ‘A Mechanization of Type Theory’, in D. E. Walker and L. Norton (eds.), Proceedings of the 3rd International Joint Conference on Artificial Intelligence, pp. 139–146.
Huet, G. P.: 1973, ‘The Undecidability of Unification in Third Order Logic’, Information and Control 22, 257–267.
Huet, G. P.: 1975, ‘A Unification Algorithm for Typed λ-calculus’, Theoretical Computer Science 1, 27–57.
Jensen, D. C. and Pietrzykowski, T.: 1972, ‘A Complete Mechanization of ω-order Type Theory’, in Proceedings of the ACM Annual Conference, volume 1, 89–92.
Jensen, D. C. and Pietrzykowski, T.: 1976, ‘Mechanizing ω-order Type Theory through Unification’, Theoretical Computer Science 3, 123–171.
Kohlhase, M.: 1994, A Mechanization of Sorted Higher-Order Logic Based on the Resolution Principle, Ph.D. thesis, Fachbereich Informatik, Universität des Saarlandes.
Kohlhase, M.: 1995, ‘Higher-Order Tableaux’, in P. Baumgartner, R. Hähnle, and J. Posegga (eds.), Theorem Proving with Analytic Tableaux and Related Methods, Lecture Notes in Artificial Intelligence 918, pp. 294–309, Springer.
Lucchesi, C. L.: 1972, The Undecidability of the Unification Problem for Third Order Languages, Report CSRR 2059, University of Waterloo, Waterloo, Canada.
Miller, D.: 1983, Proofs in Higher-Order Logic, Ph.D. thesis, Carnegie Mellon University.
Nipkow, T.: 1995, ‘Higher-order Rewrite Systems’, in J. Hsiang (ed.), Rewriting Techniques and Applications, 6th International Conference, Lecture Notes in Computer Science 914, Springer.
Nipkow, T. and Prehofer, C.: 1998, ‘Higher-order Rewriting and Equational Reasoning’, in W. Bibel and P. Schmitt (eds.), Automated Deduction – A Basis for Applications, Dordrecht, Applied Logic Series, pp. 399–430.
Nipkow, T. and Qian, Z.: 1991, ‘Modular Higher-order E-unification’, in R. V. Book (ed.), Proceedings of the 4th International Conference on Rewriting Techniques and Applications, Lecture Notes in Artificial Intelligence 488, pp. 200–214, Springer.
Prehofer, C.: 1998, Solving Higher-Order Equations: From Logic to Programming, Progress in Theoretical Computer Science, Birkhäuser.
Qian, Z. and Wang, K.: 1996, ‘Modular Higher-order Equational Preunification’, Journal of Symbolic Computation 22, 401–424.
Robinson, J. A.: 1965, ‘A Machine-oriented Logic Based on the Resolution Principle’, Journal of the Association for Computing Machinery 12, 23–41.
Siekmann, J. H.: 1989, ‘Unification Theory’, Journal of Symbolic Computation 7, 207–274.
Smullyan, R. M.: 1963, ‘A Unifying Principle for Quantification Theory’, Proceedings of the National Academy of Sciences, USA 49, pp. 828–832.
Snyder, W.: 1990, ‘Higher Order E-unification’, in M. Stickel (ed.), Proceedings of the 10th Conference on Automated Deduction, Lecture Notes in Artificial Intelligence 449, pp. 573–578, Springer.
Snyder, W. and Gallier, J.: 1989, ‘Higher-order Unification Revisited: Complete Sets of Transformations’, Journal of Symbolic Computation 8, 101–140.
Snyder, W. and Lynch, C.: 1991, ‘Goal-directed Strategies for Paramodulation’, in R. V. Book (ed.), Proceedings of the 4th International Conference on Rewriting Techniques and Applications, Lecture Notes in Artificial Intelligence 488, pp. 200–214, Springer.
Wolfram, D. A.: 1993, The Clausal Theory of Types, Cambridge, Cambridge Tracts in Theoretical Computer Science 21.

Fachbereich Informatik, Universität des Saarlandes
D-66041 Saarbrücken
Germany
E-mail: [email protected]

REINHARD KAHLE

MATHEMATICAL PROOF THEORY IN THE LIGHT OF ORDINAL ANALYSIS

ABSTRACT. We give an overview of recent results in ordinal analysis. To this end, we discuss the different frameworks used in mathematical proof theory, namely subsystems of analysis including reverse mathematics, Kripke–Platek set theory, explicit mathematics, theories of inductive definitions, constructive set theory, and Martin-Löf’s type theory.

1. INTRODUCTION

In this paper, we present an overview of results in ordinal analysis, a major branch of mathematical proof theory. Our aim is to give an impression of the richness and variety of this area of logic. In particular we focus on the different frameworks within mathematics for which proof theory has established informative results. Because of the lack of space, we do not give any proofs, but refer to the relevant literature. Also we necessarily omit the broad area of applications which results from proof-theoretic investigations.

For a general pointer to the literature, we first have to mention Hilbert and Bernays’ fundamental work (Hilbert and Bernays 1934, 1939). Modern proof theory is described in the monographs of Schütte (1960, 1977) and Takeuti (1987).¹ As an introduction, Pohlers’ lecture notes (1989) are very suitable, while Girard’s monograph (1987b) serves as a good reference for a more advanced reader. Besides the excellent Handbook of Mathematical Logic (Barwise 1977) (not only with respect to proof theory), the Handbook of Proof Theory, edited by Buss (1998), also contains important contributions. Focusing on ordinal analysis as one of the most important techniques in mathematical proof theory, Pohlers has written a series of papers which can be strongly recommended for the beginner (Pohlers 1982, 1986, 1991, 1992, 1996, 1998). In addition, Rathjen’s paper The Realm of Ordinal Analysis (Rathjen 1999) gives an overview of the state of the art. Both this article and Pohlers’ lecture notes (1989) also contain extensive lists of the relevant literature.

The structure of the paper is as follows. In the next section, we describe the background of ordinal analysis by outlining the general aims of proof theory and presenting a brief history of mathematical proof theory. In Section 3, we describe the ideas of ordinal analysis. The fourth section forms the core of this paper, describing different frameworks of mathematical theories and presenting an overview of their main results. In the final section we give a short conclusion and mention additional topics and applications.

Synthese 133: 237–255, 2002. © 2002 Kluwer Academic Publishers. Printed in the Netherlands.

2. THE BACKGROUND

Proof theory in general analyzes proofs or formal systems. This very general programme can be differentiated into investigations of purely formal properties of proofs and the analysis of particular theories, especially of theories which are of mathematical interest.

Here, we understand mathematical proof theory in a narrow sense as the part of logic which investigates the mathematical strength of a formal theory. In contrast, the comparison of different forms of calculi, like Hilbert-style calculi, sequent calculi, and natural deduction, is a typical example of the more general aims of proof theory. Of course, these investigations are of fundamental interest. They are interesting also with respect to mathematical proof theory, in particular if we look at the cut rule:

    Γ ⇒ A     A ⇒ Δ
    ─────────────────
         Γ ⇒ Δ

One of the main aims of proof theory is to prove cut elimination, i.e., to prove that a derivable formula of a formal system is also derivable without use of the cut rule. There are many important consequences following from cut elimination. As examples, we mention separation and interpolation theorems as well as the subformula property. This property says that each formula of the premises is a subformula of a formula of the conclusion. It is obvious that in the absence of the subformula property proof search is quite complicated. In the case of the cut rule, one has to guess the so-called cut formula A which, in principle, could be any well-formed formula. In particular in the case of intuitionistic logic, cut elimination can be used to show the disjunction and existential property. Furthermore, it can be used for decision procedures. For a detailed discussion of these applications we refer the reader to Troelstra and Schwichtenberg’s textbook (Troelstra and Schwichtenberg 2000), which also serves as a standard reference for the basic techniques in proof theory.
For ordinal analysis, cut elimination is of special interest because the (appropriate) assignment of ordinals to proofs and their growth during the cut elimination procedure will yield the (proof-theoretic) upper bound of an axiom system.

As another fruitful branch investigating purely formal properties of proofs we mention substructural logics and linear logic, which deal with restricted structural rules (Schroeder-Heister and Došen 1993; Girard 1987a). With respect to applications in computer science, the very close relation between structural proof theory and type theory is important. Additional references are Girard et al. 1989, Wainer and Wallen 1992 and Hindley 1996.

Let us come back to mathematical proof theory by giving a very brief history. At the turn of the 20th century, Cantor’s discovery of set theory led to a new debate about the foundations of mathematics. At the same time, Peano and Dedekind gave first formal foundations of various areas of mathematics (other than geometry, which had been the formal discipline since Euclid). But most prominent (nowadays!) is Frege’s attempt to give a purely logical basis for mathematics as a whole. Despite the fact that Russell’s paradox breaks down Frege’s formalism, Whitehead and Russell presented a new approach in their monumental work Principia mathematica. At the same time, Hilbert and his school began to investigate the foundations of mathematics. Puzzled by the paradoxes of Cantor’s set theory, Hilbert demanded an axiomatic reconstruction to avoid antinomies. In fact, he invented proof theory as a discipline in mathematics when he started the so-called Hilbert’s programme: to show the consistency of a mathematical theory by purely finitary methods. Hilbert himself did not give a precise definition of finitary methods, but today primitive recursive arithmetic PRA is commonly accepted as the formal theory reflecting these finitary methods.
In 1931, Gödel showed with his famous incompleteness results that, under some natural conditions, a mathematical theory cannot prove its own consistency. From this point of view, Hilbert’s programme failed. Nevertheless, Gentzen succeeded in giving a consistency proof of Peano arithmetic PA by means of transfinite induction up to the ordinal ε0.² By Gödel’s result, this additional concept must be beyond PA. But it is still a constructively justifiable concept. For this reason, Gentzen’s result is indeed considered as a consistency proof of PA which reduces the consistency to a more “elementary” concept. But, more generally, the concepts and methods Gentzen invented are the tools for modern proof theory in mathematics. Starting in the 1950s, Schütte advanced proof theory by use of the ω-rule (cf. below). In 1967, Takeuti succeeded in giving an ordinal analysis of an impredicative subsystem of analysis, namely Π^1_1 comprehension. In the 70’s, the methods of proof theory were improved in many respects and, with the work of Jäger and Pohlers, set theory entered the stage when several systems of Kripke–Platek set theory were analyzed. Today, Arai and Rathjen are achieving a proof-theoretic treatment of Π^1_2 comprehension (Rathjen 1995).

For more detailed information about the development of ordinal analysis in proof theory we refer to Feferman’s preface in Buchholz et al. (1981), the survey articles of Pohlers cited above and Rathjen (1999). In addition, for the early history the source book edited by van Heijenoort (1967) and the collected works of Gentzen (Szabo 1969) provide reprints of the seminal papers. For a philosophical discussion of ordinal analysis, in particular as the core of so-called reductive proof theory, we refer to Feferman’s papers (1988) and (2000).
There, a precise definition of proof-theoretic reduction is given and compared with the related notions from model theory (in the first paper) and the notions of translation and relative interpretations (in the latter one). Roughly speaking, a proof-theoretic reduction is given by a partial recursive function from the proofs of a theory T to the proofs of a theory S which preserves proofs of a certain class of formulae Φ. In particular, Φ has to contain absurdity, so that the reduction ensures relative consistency. Finally, the proof of these properties has to be carried out by “elementary” means. Usually, PRA is chosen for this aim.³

We finish this background overview by stressing that there are also other important and interesting branches of proof theory studying the mathematical power of formal theories. Most important are functional interpretations and realizability. Functional interpretation goes back to a seminal paper of Gödel (1958). Further references are Avigad and Feferman’s article in the Handbook of Proof Theory (Avigad and Feferman 1998) and Burr’s contribution to this volume (Burr 2001). For the aims and methods of realizability, we refer to Troelstra (1998). This is particularly of interest for the computational content of mathematical proofs.

3. ORDINAL ANALYSIS

Gödel’s theorems show that Hilbert’s programme cannot be realized in its original conception. When Gentzen proved the consistency of PA by means of transfinite induction, he used a concept beyond purely finitary methods. But in this concept, the “transfiniteness” of a system is put into a mathematically clear notion which is, at least to some extent, constructively justified. Moreover, this approach allows one to give a uniform measure for mathematical theories, the so-called proof-theoretic ordinal of a system.

Transfinite induction up to an ordinal α is defined as

TI(α, X) :⇔ (∀x.(∀y.y ≺α x → y ∈ X) → x ∈ X) → ∀x.x ∈ X

where ≺α is a proper initial segment of a primitive-recursive well-ordering ≺ of order type α, and X is a set variable. This definition depends on the choice of ≺, and it has turned out that, to avoid pathological cases, we need a notion of natural ordinal representation system. This problem is discussed in detail by Rathjen (1998, 1999). Formally, we define the proof-theoretic ordinal of a formal system T as

|T| := sup{α | T ⊢ TI(α, X)}.
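As a sanity check of these two definitions, it may help to look at the simplest instance. The sketch below assumes, for illustration only, that the initial segment of order type ω is the usual order < on the natural numbers; under that assumption transfinite induction up to ω is just complete (course-of-values) induction. The second display spells out what the supremum amounts to for T = PA, using Gentzen's classical result.

```latex
% Assumption: \prec_\omega is the usual order < on the natural numbers.
\mathrm{TI}(\omega, X) \;:\Leftrightarrow\;
  \bigl(\forall x.\,(\forall y.\, y < x \to y \in X) \to x \in X\bigr)
  \to \forall x.\, x \in X

% Reading of |T| for T = PA (Gentzen):
\mathrm{PA} \vdash \mathrm{TI}(\alpha, X) \ \text{for all } \alpha < \varepsilon_0,
\qquad
\mathrm{PA} \nvdash \mathrm{TI}(\varepsilon_0, X),
\qquad \text{hence} \quad |\mathrm{PA}| = \varepsilon_0 .
```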

In fact, there are several alternatives to defining a meaningful notion of proof-theoretic ordinal, but it has turned out that in almost all interesting cases, these definitions coincide. For a discussion of the different notions of proof-theoretic ordinal we refer to Pohlers (1996).

Peano arithmetic was analyzed by Gentzen in terms of transfinite induction up to ε0 (Gentzen 1936). Based on his result and methods, the analysis of mathematical theories by means of transfinite induction, called ordinal analysis, was extended to much stronger systems. For the lower bound, one has to carry out well-ordering proofs, while for the upper bounds there are essentially two (not completely) different methods, following Takeuti and Schütte. Here, we will follow the so-called Schütte style, which performs cut elimination in semi-formal systems. For Takeuti’s method, which always works with finitary derivations, we refer to Takeuti’s textbook (1987). A comparison of both methods can be found in Buchholz (1997, 2001).

For a meaningful proof-theoretic ordinal analysis, we first of all need an ordinal notation system. It turns out that this task is quite difficult. An introduction to notation systems would go far beyond the scope of this paper. The reader can find a profound discussion, for instance, in Pohlers (1996, 1998) and Rathjen (1998, 1999).⁴ For the study of impredicative systems, the concept of collapsing is needed. Roughly speaking, collapsing functions are used in ordinal notation systems to pin down uncountable ordinals to countable ones. At least at this stage, Set Theory and Generalized Recursion Theory provide fundamental tools for proof theory. In particular, large cardinals and their recursive analogs are used (Rathjen 1993, 1998). But the use of large cardinals is purely heuristic, with two (different) purposes. One aim is the justification of the well-foundedness of the notation systems.
On the other hand, from a technical point of view, the closure conditions of cardinals correspond to closure properties of the analyzed systems. But the large cardinals are not “really” needed. One could even work with their recursive analogs, which are set-theoretically unproblematic, but at the price of much more technical work. Very recently, Setzer gave a new approach to ordinal notation systems for impredicative systems (Setzer 1999). These so-called ordinal systems do not need collapsing but, at the moment, it is not completely clear how far this concept will lead.

Usually, well-ordering proofs are rather technical, and carrying out such a proof requires going up to the very limits of the axiom system. Examples of well-ordering proofs can be found, for instance, in Buchholz (1975), Buchholz et al. (1981), Buchholz and Schütte (1988), Jäger (1983), Rathjen (1988), Setzer (1998).

For the upper bound of a theory, we search for a cut-free derivation system (in the impredicative case we have to eliminate the reflection rules also). With respect to the original question of consistency, the question whether a contradictory formula, like 0 = 1, is derivable becomes manageable in this way. Since cut elimination does not hold for mathematical theories, we need to change to semi-formal systems. Here it may be possible to prove cut elimination, but one has to pay the price of infinitary derivations. For instance, for Peano Arithmetic PA, we can define the semi-formal system PA_ω containing the ω-rule:

Γ, A(n̄) for all n
──────────────────
Γ, ∀x.A(x)

“For all n” is a metastatement meaning that we have an infinitary list of premises A(0̄), A(1̄), A(2̄), ... (where n̄ is the (formal) numeral representing the natural number n). Now we can translate every PA-proof into a PA_ω-proof. Then we prove cut elimination for PA_ω, i.e., we can convert every PA_ω-proof into a cut-free PA_ω-proof. Moreover, it turns out that the lengths of cut-free PA_ω-proofs are bounded by ε0. In fact, this additional information is of particular interest to the proof-theoretic analysis of a formal system, much more than the consistency, which becomes nothing but a byproduct.
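The ordinals below ε0 that bound these cut-free proofs admit a very concrete notation system, the Cantor normal form. The following is only a minimal sketch under stated assumptions: the tuple encoding and the names `compare`, `add` and `tower` are hypothetical choices made here for illustration and do not come from the paper or from any library. An ordinal ω^a1 + ... + ω^an with a1 ≥ ... ≥ an is stored as the tuple of its exponents, each exponent being again such a tuple.

```python
# Ordinals below epsilon_0 in Cantor normal form (CNF):
#   omega^a1 + ... + omega^an   with   a1 >= ... >= an.
# Hypothetical encoding (not from the paper): the tuple (a1, ..., an)
# of exponents, each exponent again such a tuple; zero is ().

ZERO = ()
ONE = (ZERO,)    # omega^0
OMEGA = (ONE,)   # omega^1


def compare(a, b):
    """Return -1, 0 or 1 according to a < b, a == b, a > b."""
    for x, y in zip(a, b):
        c = compare(x, y)          # compare exponents term by term
        if c != 0:
            return c
    # One notation is a prefix of the other: the shorter one is smaller.
    return (len(a) > len(b)) - (len(a) < len(b))


def add(a, b):
    """Ordinal sum a + b; summands of a below the leading term of b vanish."""
    if not b:
        return a
    kept = tuple(x for x in a if compare(x, b[0]) >= 0)
    return kept + b


def tower(n):
    """omega_0 = 1, omega_(n+1) = omega^(omega_n); epsilon_0 is their supremum."""
    t = ONE
    for _ in range(n):
        t = (t,)
    return t
```

For instance, `add(ONE, OMEGA)` returns `OMEGA` (since 1 + ω = ω), while `add(OMEGA, ONE)` returns the notation for ω + 1. The towers `tower(0)`, `tower(1)`, ... form a strictly increasing sequence of notations, and ε0 itself is the first ordinal that has no notation in this system.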

4. THE LANDSCAPE OF PROOF THEORY

In the following, we present an overview of results in mathematical proof theory by listing theories of different frameworks together with their respective proof-theoretic ordinals. The landscape can be split into three major blocks. The first part contains predicative theories. The idea of predicativity goes back to Poincaré, and means that we can only use sets which are already defined to define new ones. Theories where the definition of a set refers to a domain which already contains the new set are called impredicative. In the 60’s, Schütte and Feferman proved that Γ0 is the limit of predicativity.⁵ A paradigmatic impredicative system is the theory of non-iterated inductive definitions ID1 (cf. below), which has the strength of the so-called Bachmann–Howard ordinal.⁶ It turns out that, in the gap between Γ0 and the Bachmann–Howard ordinal, there are a lot of interesting theories which are not essentially impredicative. From studying these systems, the notion of metapredicativity arose (Strahm 1999). This term is used for theories that have a proof-theoretic strength beyond Γ0, but for which a proof-theoretic analysis does not need methods of impredicative proof theory, in particular collapsing. From this point of view, it is quite possible that even typical impredicative theories can have a metapredicative treatment.

In the following, we sketch the systems of six different frameworks which have been studied in proof theory: subsystems of analysis with reverse mathematics as a part, Kripke–Platek set theory, explicit mathematics, theories of inductive definitions, constructive set theory and Martin-Löf’s type theory. The first four are based on classical logic, while it is crucial that we choose intuitionistic logic for the last two. For the (meta)predicative systems, the proof-theoretic ordinal is given in terms of the binary and ternary Veblen function, or the Γ function.
For the impredicative theories we use the notation system of Pohlers (1998). Ω stands for ω1^CK, the first nonrecursive ordinal (which is the recursive counterpart of ω1, the first uncountable ordinal). Moreover, the constants I0, M0 and K0 which are used in the notation system have their origins in the first weakly inaccessible, the first weakly Mahlo and the first weakly compact cardinal.⁷ The appearance of these ordinals should give just an impression of the scales in the landscape of proof theory.

4.1. Subsystems of Analysis

The most important framework is formed by the subsystems of analysis.

With respect to mathematical practice, full analysis Z2 is one of the most important theories. The crucial axiom is comprehension for arbitrary second order formulae, i.e., those which can contain second order quantifiers without any restriction:

∃X.∀y.y ∈ X ↔ φ(X,y).

Although an ordinal analysis of Z2 is still out of reach, by restricting the comprehension axiom we get treatable theories. First, we can restrict comprehension to arithmetical formulae, i.e., formulae without bound set variables. The class of arithmetical formulae is called Π^0_∞. By defining a so-called jump hierarchy along a primitive recursive well-ordering ≺ of order type α, we can iterate the comprehension for arithmetical formulae along such a hierarchy and get the theory (Π^0_∞ − CA)_α.⁸ The union of all (Π^0_∞ − CA)_β for β < α is called (Π^0_∞ − CA)_<α.

A standard measure for the complexity of a formula is the length of its quantifier prefix. The Greek capital Π is used for prefixes starting with a universal quantifier, Σ for existential quantifiers, and Δ for formulae which are equivalent to a Π and a Σ formula of the same complexity. While the superscript 0 refers to number quantifiers, the superscript 1 is used for set quantifiers. The subscript indicates the number of quantifier alternations.

Let φ be an arithmetical formula. Then Π^1_1 is the class of all formulae of the form ∀X.φ, and Σ^1_1 the class of formulae of the form ∃X.φ. In general, Π^1_k consists of the formulae of the form ∀X1.∃X2.∀X3. ... QXk.φ where Q is the appropriate quantifier ∀ or ∃. Analogously, Σ^1_k stands for the class of formulae of the form ∃X1.∀X2.∃X3. ... QXk.φ. If we weaken this definition by saying that a formula belongs to the class Π^1_k or Σ^1_k if it is provably equivalent to a formula which has the intended syntactical form, we can define Δ^1_k as the intersection of Π^1_k and Σ^1_k.

Here, we present the following theories: (Δ^1_1 − CA) + (BR), (Π^1_1 − CA)^−, (Π^1_1 − CA)_0, (Π^1_1 − CA), (Π^1_1 − CA) + (BI), (Δ^1_2 − CA) and (Δ^1_2 − CA) + (BI). The superscript − means that we do not allow free set parameters in the Π^1_1 formulae in the comprehension axiom. The subscript 0 means that induction is restricted to sets, while we otherwise have induction for arbitrary formulae. (BI) stands for the axiom of Bar induction, expressing that we have transfinite induction for arbitrary formulae along definable well-orderings: WF(≺) → TI(≺, φ).⁹ (BR) denotes the corresponding rule.

Comprehensive presentations of the proof-theoretic results and methods for subsystems of analysis can be found in Pohlers’ Handbook article (1998) and in the monograph of Buchholz and Schütte (1988).
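The classification of formulae by their set-quantifier prefix described in this subsection is purely syntactic, so it can be mimicked by a few lines of code. The sketch below is hypothetical throughout: the function name `classify_prefix` and the string encoding (`"A"` for a universal and `"E"` for an existential set quantifier in front of an arithmetical matrix) are assumptions made here for illustration. It also assumes that adjacent quantifiers of the same kind count as one block, which is a common convention but is not stated in the text above.

```python
def classify_prefix(prefix):
    """Name the class of a formula from its prefix of set quantifiers.

    `prefix` is a string over "A" (for all X) and "E" (exists X) that
    stands in front of an arithmetical matrix (hypothetical encoding).
    """
    if any(q not in "AE" for q in prefix):
        raise ValueError("prefix must consist of 'A' and 'E' only")
    if not prefix:
        return "arithmetical"      # no set quantifiers at all
    # Assumption: neighbouring quantifiers of the same kind form one block,
    # so the index k is the number of alternating blocks.
    blocks = 1 + sum(1 for p, q in zip(prefix, prefix[1:]) if p != q)
    kind = "Pi" if prefix[0] == "A" else "Sigma"
    return f"{kind}^1_{blocks}"
```

With this encoding, `classify_prefix("A")` names the class of the formulae ∀X.φ and `classify_prefix("EA")` that of the formulae ∃X1.∀X2.φ. The semantic notion of a Δ class (provable equivalence to both forms) is deliberately not captured by such a syntactic check.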

Reverse mathematics

A special branch of this area is reverse mathematics. In this programme started by Friedman and Simpson, we look at certain mathematical theorems taken from working mathematics. It turns out that quite a lot of theorems are proof-theoretically equivalent to just five distinguishable subsystems of analysis. The results of this programme are summarized in Simpson’s monograph (Simpson 1999). The five systems considered in reverse mathematics are RCA0, WKL0, ACA0, ATR0 and (Π^1_1 − CA)_0. All these theories are second order theories and again, the index 0 indicates that induction is restricted to sets. For the precise definition of the theories, we refer to Simpson (1999). Here we give only the meaning of the acronyms:

• RCA stands for recursive comprehension axiom. Its set existence axioms yield sets A which are recursive in given sets B1, ..., Bn.
• WKL stands for weak König’s lemma. WKL0 extends RCA0 by a formalization of weak König’s lemma, expressing that every infinite binary tree has an infinite path.
• ACA stands for arithmetical comprehension axiom. ACA0 can be regarded as the second order analog of Peano Arithmetic. In fact, ACA0 is a conservative extension of PA.
• ATR stands for arithmetical transfinite recursion. Informally, it expresses that the jump operation can be iterated along any countable well-ordering.

4.2. Kripke–Platek Set Theory

Within set theory, it turns out that Kripke–Platek set theory is an extremely fruitful framework for proof theory. It was investigated by Barwise (1975), and its use for proof theory was elaborated by Jäger (1986). As a theory for the constructible hierarchy L, Kripke–Platek set theory is obtained from Zermelo–Fraenkel set theory by leaving out the power set axiom and restricting Separation and Collection to Δ0 formulae. We get the theory KPu by assuming natural numbers as urelements, Pairing, Transitive Hull, Δ0-Separation and Δ0-Collection. As induction principles, we have complete induction for the natural numbers IndN as well as induction for the element relation Ind∈ (foundation). In KPu, the letter u stands for the presence of urelements. Sets satisfying the axioms of KPu are called admissible.¹⁰

Now, we can strengthen KPu by incorporating (axiomatically) the notion of admissibility. Doing this, for KPl we replace Δ0-collection by the so-called limit axiom expressing that every set is contained in an admissible set. So l means the limit axiom. KPi results from the presence of Δ0-collection and the limit axiom. It formalizes a (recursively) inaccessible universe, therefore i. KPM, studied by Rathjen, formalizes a recursively Mahlo universe of sets. And, as the strongest theory, Rathjen has also investigated KPu extended by Π3 reflection.

Other parameters which can be modified are the induction principles. If we leave out foundation and restrict induction on the natural numbers to sets, we write the superscript 0. For theories where both foundation and IndN are restricted to sets, we use the superscript r (restricted). If we restrict only foundation to sets but allow full induction on the natural numbers, we write the superscript w (weak).

In the following, we consider the theories KPu^0, KPl^0, KPi^0, KPm^0, KPu, KPl^r, KPi^r, KPl^w, KPi^w, KPl, KPi, KPM and KPu + (Π3 − Ref); cf. Jäger (1986), Jäger and Strahm (2001), Pohlers (1996, 1998), Rathjen (1991, 1994).

4.3. Explicit Mathematics

In the seventies, Feferman introduced systems of explicit mathematics to formalize Bishop-style mathematics (Feferman 1975, 1979). It turns out that these theories, especially the theory T0, are proof-theoretically very useful. Explicit mathematics and, as its first order part, applicative theories, have become a substantial field within the landscape of proof theory.

In the applicative part, explicit mathematics is based on partial combinatory logic, providing an existence predicate and a predicate for the natural numbers as relation symbols. So the term language, in particular allowing self-application, is extremely powerful, while the provable properties are completely tailored by the axioms about certain constants. Moreover, for the second order part, a special naming relation enables one to represent types (which play the role of sets in explicit mathematics) by terms. While the types are still extensional, their names show intensional behavior. The basic theory of explicit mathematics is called BON (Basic theory of Operations and Numbers), consisting of partial combinatory algebra, pairing and projection, natural numbers and definition by cases. For explicit mathematics, we add type existence axioms for elementary comprehension and join, expressing disjoint unions, and get the theory EETJ. In analogy to admissibles in Kripke–Platek set theory, in explicit mathematics universes can be added as types reflecting certain closure properties. As a corresponding principle to foundation, explicit mathematics can be strengthened by the principle of inductive generation. Finally, we can also add axioms for Mahloness in explicit mathematics.

In the following, we consider the theories BONF, EETJF, EMUT, EMUF, EMAT, T0 and T0(M). EMU is EETJ strengthened by universes, and in EMA the Mahlo axioms are added. T0 is EETJ plus inductive generation and, finally, T0(M) is its strengthening by the Mahlo axioms.
The index F stands for full induction, while T means that induction is restricted to types. For the first order theories an overview is given by Jäger et al. (1999b). For explicit mathematics, we refer to the following publications: Beeson (1985), Marzetta (1994), Jäger and Strahm (2001), Strahm (1999), Feferman (1979), Jäger (1983), Jäger et al. (2001), Jäger and Studer (to appear).

For the sake of completeness, we mention that there exists an alternative approach to introducing types or sets in the applicative framework by means of a (partial) truth predicate. This can be seen as an axiomatization of Aczel’s Frege structures, and it has some advantages with respect to the syntactical expressive power (Beeson 1985; Cantini 1996; Kahle 1997).
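The applicative basis of this section, partial combinatory logic with self-application, can be made tangible with the two standard combinators k and s. The following is a minimal sketch and not the formal theory BON: it models application by Python function calls, and the names `K`, `S`, `I` and `delta` are chosen here for illustration. Only the standard equations k x y = x and s x y z = (x z)(y z) are used.

```python
# Curried combinators: the application "f x" is written f(x).
K = lambda x: lambda y: x                       # k x y = x
S = lambda x: lambda y: lambda z: x(z)(y(z))    # s x y z = (x z)(y z)

# The identity operation is definable: s k k x = (k x)(k x) = x.
I = S(K)(K)

# Self-application is unproblematic at the level of terms:
# delta x = (i x)(i x) = x x.
delta = S(I)(I)

# Application is only partial: delta(delta) has no value. In this model
# the call would recurse forever (Python raises RecursionError), so it is
# deliberately not evaluated here.
```

For example, `I(7)` evaluates to `7` and `delta(I)` is again the identity operation. The term `delta(delta)` illustrates why the logic has to be partial and why an existence predicate is provided.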

4.4. Theories of Inductive Definition

Because inductive definition is one of the most important concepts in mathematics, it is natural to investigate theories which formalize this principle directly. Starting from the fundamental theory ID1 of non-iterated inductive definitions, iteration yields a natural family of formal systems which are useful for proof-theoretic investigation. By dropping the leastness condition for inductive definitions, we get fixed point theories which are of interest in the area of proof-theoretically weak systems.

Formally, ID1 extends Peano Arithmetic by means of new relation symbols Pφ for all positive operator forms φ(R, x), i.e., formulae having only positive occurrences of the relation symbol R. Pφ is the least fixed point of such an operator form, axiomatized as follows:

(ID1.1) ∀x.φ(Pφ, x) → Pφ(x),

(ID1.2) (∀x.φ(ψ, x) → ψ(x)) → ∀x.Pφ(x) → ψ(x).

For the corresponding fixed point theory, the leastness condition (ID1.2) is dropped, but the first axiom is stated as an equivalence:

(ÎD1) ∀x.φ(Pφ, x) ↔ Pφ(x).
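The difference between the least fixed point axiomatized by (ID1.1) and (ID1.2) and an arbitrary fixed point can be seen on a small finite example. The sketch below is an illustration under stated assumptions, not part of the theories themselves: the universe {0, ..., 5}, the particular monotone operator `phi` and the helper names are all hypothetical. The least fixed point is computed by iterating the operator from the empty set, and the leastness condition is then checked by brute force over all subsets.

```python
from itertools import combinations

N = 6
UNIVERSE = frozenset(range(N))


def phi(R):
    """A monotone operator: 0 is in; x + 2 is in if x is; 1 is in if 1 is."""
    R = frozenset(R)
    return frozenset({0}) | {x + 2 for x in R if x + 2 < N} | (R & {1})


def least_fixed_point(op):
    """Iterate op from the empty set until nothing new is added."""
    stage = frozenset()
    while True:
        nxt = op(stage)
        if nxt == stage:
            return stage
        stage = nxt


def all_subsets(s):
    """Enumerate every subset of the finite set s."""
    items = sorted(s)
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)


P = least_fixed_point(phi)

# Closure, in the spirit of (ID1.1): phi(P) is contained in P.
closed = phi(P) <= P

# Leastness, in the spirit of (ID1.2): P is contained in every closed set.
least = all(P <= S for S in all_subsets(UNIVERSE) if phi(S) <= S)

# The whole universe is a fixed point too, but not the least one.
other_fixed_point = phi(UNIVERSE) == UNIVERSE
```

Here `P` consists of the even numbers below 6, while the whole universe is a second, larger fixed point of the same operator. A theory that only postulates the equivalence cannot distinguish between the two, which is one way to see why dropping the leastness condition weakens the theory.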

Both theories can be iterated by defining inductive definitions and fixed points for operator forms which contain constants of a lower level. This can even be done for the transfinite case. So we get, for an ordinal α, the theories IDα and ÎDα. The unions of the theories IDβ and ÎDβ for all β < α are called ID<α and ÎD<α respectively. The main reference for the proof-theoretic investigations of inductive definition is Buchholz et al. (1981). The corresponding finitely iterated fixed point theories were studied by Feferman (1982), in relation to a problem posed for Martin-Löf’s type theory. Transfinitely iterated fixed point theories were one of the first examples of metapredicative theories (Jäger et al. 1999a).

4.5. Constructive Set Theory

Recently, it has turned out that constructive set theory CZF is a fruitful framework for proof-theoretic investigations (Aczel 1978, 1986; Griffor and Rathjen 1994; Rathjen 1998). Introduced by Aczel, it is based on intuitionistic logic, and essentially replaces the power set axiom of ZF by strong collection and subset collection. In CZF, we have extensionality, pairing and union as in ZF. Additionally, we have infinity, set induction, restricted separation, i.e., separation for formulae containing restricted set quantifiers only, strong collection and subset collection. It can be strengthened by the regular extension axiom (REA). Since Burr’s contribution to this volume (Burr 2001) is devoted to this theory, we refer for definitions to his article.

4.6. Martin-Löf’s Type Theory

As a last example, we should mention Martin-Löf’s type theory. It was invented for philosophical reasons, but it turns out that we can study mathematical concepts from a proof-theoretic point of view in this framework. It has been investigated mainly by Rathjen and Setzer (Griffor and Rathjen 1994; Setzer 1993, 1998). Unfortunately, the syntactical overhead of Martin-Löf’s type theory does not allow a condensed presentation and we refer the reader to Martin-Löf (1984), Troelstra and van Dalen (1988) and the references above. We will consider the systems ML1, ML<ω, ML1V, ML1W and ML1WV. In the terminology of Martin-Löf’s type theory, the index 1 refers to one universe, while <ω indicates the presence of finitely many universes. V denotes Aczel’s set of iterated sets (Aczel 1978) and W stands for the W-type, which formalizes induction along well-ordered types. The theory ML1W was proof-theoretically analyzed (independently) by Rathjen (Griffor and Rathjen 1994) and Setzer (1993). Its proof-theoretic strength (Ψ(Ω_{I0+ω})) is slightly too strong to fit in our snapshot, but the restrictions ML1W and ML1WV considered by Griffor and Rathjen are of appropriate strength. Setzer has analyzed a theory for a Mahlo universe in Martin-Löf’s type theory, MLMW, which, again, is slightly stronger (Ψ(Ω_{M0+ω})) than the corresponding theories in the other frameworks (Setzer 2000).

5. CONCLUDING REMARKS

We conclude our overview of the landscape of proof theory by stressing two conceptual parallels which appear in the different frameworks. The first one is foundation for Kripke–Platek, which corresponds to inductive generation in explicit mathematics, the leastness condition for fixed points in theories of inductive definitions and the so-called W-type in Martin-Löf’s type theory. From a heuristic point of view, a strong similarity in all these frameworks is suggested. In fact, in a lot of cases, adding this concept leads from a (meta)predicative to an impredicative theory completely analogously. On the other hand, subtle differences in one framework can lead to a better understanding of both the concept and the theory. And it happens that such a discovery leads to a refinement of the studied theory.

TABLE I. A snapshot of the landscape (columns: proof-theoretic ordinal; Subsystems of Analysis; Reverse Math.; KP; Explicit Math. / Appl. theories; theories of inductive definitions; Constructive set theory; Martin-Löf’s type theory).

The second example is the concept of universes, coming from Martin-Löf’s type theory. It was taken up by Feferman when he defined finitely iterated fixed point theories and showed that fixed points correspond to universes.
In the same way, one can study universes in explicit mathematics and, finally, it has turned out that they are conceptual analogs to admissibles in Kripke–Platek set theory.

As mentioned above, we did not discuss the important parts of functional interpretation and realizability in this paper. But we will give, very roughly, some pointers to applications of mathematical proof theory.

The fundamental relation between proof theory and subrecursive hierarchies is described in this volume in the work of Weiermann (2001).

With respect to the significance of proof theory for mathematics, combinatorial independence results are a major topic. In 1977, Paris and Harrington found the first example, a finite Ramsey theorem, of an incompleteness in Peano Arithmetic which is strictly mathematical (Paris and Harrington 1977). Starting from this, combinatorial statements turned out to be of crucial interest for independence results. The work of Harvey Friedman (Harrington et al. 1985) serves as a reference.

Finally, looking at computer science, proof theory turns out to be an essential tool. A standard example is the λ-calculus and the operations on it. In particular, normalization is nothing more than cut elimination. For termination proofs of rewrite systems, ordinal assignment is a standard tool (Dershowitz and Okada 1988; Cichon 1992). In this context, we stress the work of Weiermann, who used particularly the methods of impredicative proof theory for this aim (Weiermann 1998). For classical complexity theory, bounded arithmetic, introduced by Buss, provides a proof-theoretic account (Buss 1986). Also in this area, methods originating in ordinal analysis can be used, as is done in the work of Beckmann (1996). Even in the other articles of the volume, one will find proof-theoretic ingredients, in the work of Matthes (2002) (strong normalization), Jürjens (2002) (dialogue logic) and Benzmüller (2002) (higher order resolution).
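The idea of proving termination by ordinal assignment, mentioned above for rewrite systems, can be illustrated on a very small scale. The sketch below is a toy example chosen here and not taken from the cited literature: it uses the Ackermann function and assigns to a call with arguments (m, n) the ordinal ω·m + n, which corresponds to comparing the pairs lexicographically. The extra parameter `parent` is a hypothetical device that lets the code check, at run time, that every recursive call receives a strictly smaller pair.

```python
def ackermann(m, n, parent=None):
    """Ackermann function with a run-time check of the termination measure.

    The pair (m, n) stands for the ordinal omega*m + n below omega^2;
    Python compares tuples lexicographically, which is exactly this order.
    """
    if parent is not None:
        # Every recursive call must have a strictly smaller measure.
        assert (m, n) < parent
    if m == 0:
        return n + 1
    if n == 0:
        return ackermann(m - 1, 1, (m, n))
    return ackermann(m - 1, ackermann(m, n - 1, (m, n)), (m, n))
```

Since there is no infinite descending sequence of ordinals below ω², the check explains why the recursion always stops, even though the computed values grow very quickly. Keep the arguments small when trying it out, for example `ackermann(2, 3)`.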

NOTES

1 The second edition of Takeuti’s book extends the first edition (1975) by including additional contributions from other leading proof theorists.
2 ε0 is the limit of ω-powers, formally: ε0 := (least α. α = ω^α).
3 In Feferman (1988), Feferman proposed to use the theory S to prove these properties. Unfortunately, an example due to Niebergall shows that this definition is not transitive.
4 On his internet homepage, Rathjen provides a version of Rathjen (1998) containing additional appendices with detailed proofs.
5 Γ0 is the limit of the (binary) Veblen function φαβ, cf. Pohlers (1989).
6 For a description of the Bachmann–Howard ordinal, we refer to Pohlers (1989).
7 For a set-theoretic discussion of large cardinals, cf. Kanamori (1994).
8 For a precise definition of the jump hierarchy, cf. Feferman (1977).
9 WF(≺) is defined as ∀X.(∃x.x ∈ X) → ∃x.x ∈ X ∧ ∀y.y ≺ x → y ∉ X.
10 Since the presence of urelements is not required for admissibles, our admissibles are admissible sets above N, to be more precise.

REFERENCES

Aczel, P.: 1978, ‘The Type Theoretic Interpretation of Constructive Set Theory’, in A. MacIntyre, L. Pacholski and J. Paris (eds), Logic Colloquium ’77, Amsterdam [Studies in Logic and the Foundations of Mathematics 96], pp. 56–66.
Aczel, P.: 1986, ‘The Type Theoretic Interpretation of Constructive Set Theory: Inductive Definitions’, in R. Barcan Marcus, G. Dorn and P. Weingartner (eds), Logic, Methodology, and Philosophy of Science, Vol. VII, Amsterdam [Studies in Logic and the Foundations of Mathematics 114], pp. 17–49.
Avigad, J. and S. Feferman: 1998, ‘Gödel’s Functional (‘Dialectica’) Interpretation’, in S. Buss (ed.), Handbook of Proof Theory, Amsterdam, pp. 337–406.
Barwise, J.: 1975, Admissible Sets and Structures, Berlin [Perspectives in Mathematical Logic].
Barwise, J. (ed.): 1977, Handbook of Mathematical Logic, Amsterdam [Studies in Logic and the Foundations of Mathematics 90].
Beckmann, A.: 1996, Separating Fragments of Bounded Predicative Arithmetic, Dissertation, Institut für mathematische Logik und Grundlagenforschung, Westfälische Wilhelms-Universität Münster.
Beeson, M.: 1985, Foundations of Constructive Mathematics, Berlin [Ergebnisse der Mathematik und ihrer Grenzgebiete (3. Folge) 6].
Benzmüller, W.: 2002, ‘Comparing Approaches to Resolution Based Higher-Order Theorem Proving’, this volume.
Buchholz, W.: 1975, ‘Normalfunktionen und konstruktive Systeme von Ordinalzahlen’, in J. Diller and G. Müller (eds), ISILC Proof Theory Symposium. Dedicated to Kurt Schütte on the Occasion of his 65th Birthday, Proceedings of the International Summer Institute and Logic Colloquium, Kiel 1974, Berlin [Lecture Notes in Mathematics 500], pp. 4–25.
Buchholz, W.: 1997, ‘Explaining Gentzen’s Consistency Proof within Infinitary Proof Theory’, in G. Gottlob, A. Leitsch and D. Mundici (eds), Computational Logic and Proof Theory, 5th Kurt Gödel Colloquium, KGC ’97, Vienna, Austria, 25–29 August 1997, Proceedings, Berlin [Lecture Notes in Computer Science 1289], pp. 4–17.
Buchholz, W.: 2001, ‘Explaining the Gentzen–Takeuti Reduction Steps: A Second-Order System’, Archive for Mathematical Logic 40, 244–272.

Buchholz, W., S. Feferman, W. Pohlers and W. Sieg: 1981, Iterated Inductive Definitions and Subsystems of Analysis: Recent Proof-Theoretical Studies, Berlin [Lecture Notes in Mathematics 897].
Buchholz, W. and K. Schütte: 1988, Proof Theory of Impredicative Subsystems of Analysis, Napoli [Studies in Proof Theory (Monographs) 2].
Burr, W.: 2001, ‘Concepts and Aims of Functional Interpretation: Towards a Functional Interpretation of Constructive Set Theory’, this volume.
Buss, S.: 1986, Bounded Arithmetic, Napoli [Studies in Proof Theory (Lecture Notes) 3].
Buss, S. (ed.): 1998, Handbook of Proof Theory, Amsterdam [Studies in Logic and the Foundations of Mathematics 137].
Cantini, A.: 1996, Logical Frameworks for Truth and Abstraction, An Axiomatic Study, Amsterdam [Studies in Logic and the Foundations of Mathematics 135].
Cichon, E. A.: 1992, ‘Termination Orderings and Complexity Characterisations’, in P. Aczel, H. Simmons and S. Wainer (eds), Proof Theory, A Selection of Papers from the Leeds Proof Theory Programme, an International Summer School and Conference on Proof Theory, held at the Leeds University, UK, 24 July–2 August 1990, Cambridge, pp. 171–193.
Dershowitz, N. and M. Okada: 1988, ‘Proof-Theoretic Techniques for Term-Rewriting Theory’, in IEEE Computer Society (ed.), Proceedings of the Third Annual Symposium on Logic in Computer Science (LICS ’88), Edinburgh, Scotland, UK, 5–8 July, 1988, Edinburgh, pp. 104–111.
Feferman, S.: 1975, ‘A Language and Axioms for Explicit Mathematics’, in J. Crossley (ed.), Algebra and Logic, Papers from the 1974 Summer Research Institute of the Australian Mathematical Society, Monash University, Australia, Berlin [Lecture Notes in Mathematics 450], pp. 87–139.
Feferman, S.: 1977, ‘Theories of Finite Type Related to Mathematical Practice’, in J. Barwise (ed.), Handbook of Mathematical Logic, Amsterdam [Studies in Logic and the Foundations of Mathematics 90], pp. 913–971.
Feferman, S.: 1978, ‘Constructive Theories of Function and Classes’, in M. Boffa, D. van Dalen and K. McAloon (eds), Logic Colloquium ’78, Proceedings of the Colloquium held in Mons, August 1978, Amsterdam [Studies in Logic and the Foundations of Mathematics 97], pp. 159–224. Feferman, S.: 1982, ‘Iterated Inductive Fixed-Point Theories: Application to Hancock’s Conjecture’, in G. Metakides (ed.), Patras Logic Symposion, Proceedings of the Logic Symposion held at Patras, Greece, 18–22 August, 1980, Amsterdam [Studies in Logic and the Foundations of Mathematics 109], pp. 171–196. Feferman, S.: 1988, ‘Hilbert’s Program Relativized: Proof-Theoretical and Foundational Reductions, Journal of Symbolic Logic 53, 364–384. Feferman, S.: 2000, ‘Does Reductive Proof Theory have a Viable Rationale?’, Erkenntnis 53, 63–96. Gentzen, G.: 1936, ‘Die Widerspruchsfreiheit der reinen Zahlentheorie’, Mathematische Annalen 112, 493–565. Girard, J.-Y.: 1987a, ‘Linear Logic’, Theoretical Computer Science 50, 1–102. Girard, J.-Y.: 1987b, Proof Theory and Logical Complexity, Vol. I, Napoli [Studies in Proof Theory (Monographs) 1]. Girard, J.-Y., Y. Lafont, and P. Taylor: 1989, Proofs and Types, Cambridge [Cambridge Tracts in Theoretical Computer Science 7]. Gödel, K.: 1958, ‘Über eine bisher noch nicht benützte Erweiterung des finiten Stand- punktes’, Dialectica 12, 280–287. MATHEMATICAL PROOF THEORY IN THE LIGHT OF ORDINAL ANALYSIS 253

Griffor, E. and M. Rathjen: 1994, ‘The Strength of Some Martin-Löf Type Theories’, Archive for Mathematical Logic 33, 347–385. Hilbert, D. and P. Bernays: 1934, Grundlagen der Mathematik I, 2nd edn, 1968, Berlin. [Die Grundlehren der mathematischen Wissenschaften in Einzeldarstellungen 40]. Hilbert, D. and P. Bernays: 1939, Grundlagen der Mathematik II, 2nd edn, 1970, Berlin. [Die Grundlehren der mathematischen Wissenschaften in Einzeldarstellungen 50]. Hindley, J.: 1996, Basic Simple Type Theory, Cambridge [Cambridge Tracts in Theoretical Computer Science 42]. Harrington, L., M. Morley, A. Scedrov and S. Simpson (eds): 1985, Harvey Friedman’s Research on the Foundations of Mathematics, Amsterdam [Studies in Logic and the Foundations of Mathematics 117]. Jäger, G.: 1983, ‘A Well-Ordering Proof for Feferman’s Theory T0’, Archiv für mathema- tische Logik und Grundlagenforschung 23, 65–77. Jäger, G.: 1986, Theories for Admissible Sets, A Unifying Approach to Proof Theory, Napoli [Studies in Proof Theory (Lecture Notes) 2]. Jäger, G., R. Kahle, A. Setzer and Th. Strahm: 1999a, ‘The Proof-Theoretic Analysis of Transfinitely Iterated Fixed Point Theories, Journal of Symbolic Logic 64, 53–67. Jäger, G., R. Kahle, and Th. Strahm: 1999b, ‘On Applicative Theories’, in A. Cantini, E. Casari and P. Minari (eds.), Logic and Foundation of Mathematics, Selected Contributed Papers of the Tenth International Congress of Logic, Methodology and Philosophy of Science, Florence, August 1995, Dordrecht [Synthese Library 280], pp. 83–92. Jäger, G., R. Kahle and Th. Studer: 2001, ‘Universes in Explicit Mathematics’, Annals of Pure and Applied Logic 109, 141–162. Jäger, G. and Th. Strahm: 2001, ‘Upper Bounds for Metapredicative Mahlo in Explicit Mathematics and Admissible Set Theory’, Journal of Symbolic Logic 66, 935–958. Jäger, G. and Th. Studer: to appear, ‘Extending the System T0 of Explicit Mathematics: The Limit and Mahlo Axioms’, Annals of Pure and Applied Logic. 
Jürjens, J.: 2002, ‘Games in the Semantics of Programming Languages – An Elementary Introduction’, this volume. Kahle, R.: 1997, Applikative Theorien und Frege-Strukturen, Ph.D. thesis, Institut für Informatik und angewandte Mathematik, Universität Bern. Kanamori, A.: 1994, The Higher Infinite, Large Cardinals in Set Theory from Their Beginnings, Berlin [Perspectives in Mathematical Logic]. Marzetta, M.: Predicative Theories of Types and Names, Ph.D. thesis, Institut für Inform- atik und angewandte Mathematik, Universtiät Bern. Matthes, R.: 2002, ‘Tarski’s Fixed-Point Theorem and Higher-Order Rewrite Systems’, this volume. Martin-Löf, P.: 1984, Intuitionistic Type Theory, Napoli [Studies in Proof Theory (Lecture Notes) 1]. Paris, J. and L. Harrington: 1977, ‘A Mathematical Incompleteness in Peano Arithmetic’, in J. Barwise 1977 (ed.), Handbook of Mathematical Logic, Amsterdam, pp. 1133–1142. Pohlers, W.: 1982, ‘Admissibility in Proof Theory’, in L. J. Cohen, J. Łos,´ H. Pfeiffer, and K. P. Podewski (eds), Logic, Methodology and Philosophy of Science,Vol.VI, Amsterdam [Studies in Logic and the Foundations of Mathematics 104], pp. 123–139. Pohlers, W.: 1986, ‘Beweistheorie’, in S. D. Chatterji, I. Fenyoe, U. Kulisch, D. Laugwitz and R. Liedl (eds), Jahrbuch Überblicke Mathematik 1986, Mathematical Survey,Vol. 19, Mannheim, pp. 37–62. Pohlers, W.: 1989, Proof Theory, Berlin [Lecture Notes in Mathematics 1407]. 254 REINHARD KAHLE

Pohlers, W.: 1991, ‘Proof Theory and Ordinal Analysis’, Archive for Mathematical Logic 30, 311–376. Pohlers, W.: 1992, ‘A Short Course in Ordinal Analysis’, in P. Aczel, H. Simmons and S. Wainer (eds), Proof Theory, A Selection of Papers from the Leeds Proof Theory Pro- gramme, An International Summer School and Conference on Proof Theory, held at the Leeds University, UK, 24 July–2 August 1990, Cambridge, pp. 27–78 Pohlers, W.: 1996, ‘Pure Proof Theory, Aims, Methods and Results’, Bulletin of Symbolic Logic 2, 159–188. Pohlers, W.: 1998, ‘Subsystems of Set Theory and Second Order Number Theory’, in S. Buss (ed.), Handbook of Proof Theory, Amsterdam, pp. 209–335. Rathjen, M.: 1988, Untersuchungen zu Teilsystemen der Zahlentheorie zweiter Stufe und 1 1 + der Mengenlehre mit einer zwischen 2-CA und 2-CA BI liegenden Beweisstärke, Ph.D. thesis, Westfälische Wilhelms-Universität, Münster. Rathjen, M.: 1991, ‘Proof-Theoretic Analysis of KPM’, Archive for Mathematical Logic 30, 377–403. Rathjen, M.: 1993, ‘How to Develop Proof-Theoretic Ordinal Functions on the Basis of Admissible Ordinals’, Mathematical Logic Quarterly 39, 47–54. Rathjen, M.: 1994, ‘Proof Theory of Reflection’, Annals of Pure and Applied Logic 68, 181–224. 1 − Rathjen, M.: 1995, ‘Recent Advances in Ordinal Analysis: 2 CA and Related Systems’, Bulletin of Symbolic Logic 1, 468–485. Rathjen, M.: 1995, ‘The Higher Infinite in Proof Theory’, in J. Makowsky and E. Ravve (eds), Logic Colloquium ’95, Proceedings of the Annual European Summer Meeting of the Association of Symbolic Logic, Haifa, Israel, 9–18 August 1995, Berlin [Lecture Notes in Logic 11], pp. 275–304. Rathjen, M.: 1999, ‘The Realm of Ordinal Analysis’, in S. Cooper and J. Truss (eds), Sets and Proofs, Invited Papers from Logic Colloquium ’97 – European Meeting of the Asso- ciation of Symbolic Logic, Leeds, July 1997, Cambridge [London Mathematical Society Lecture Notes Series 258], pp. 219–279. 
Schütte, K.: 1960, Beweistheorie, Berlin [Grundlehren der mathematischen Wis- senschaften 103]. Schütte, K.: 1977, Proof Theory, Translation from the German by J. N. Crossley, Berlin [Grundlehren der mathematischen Wissenschaften 225]. Setzer, A.: 1993, Proof Theoretical Strength of Martin-Löf Type Theory with W-Type and One Universe, Ph.D. thesis, Ludwig-Maximilians-Universität München. Setzer, A.: 1998, ‘Well-Ordering Proofs for Martin-Löf Type Theory’, Annals of Pure and Applied Logic 92, 113–159. Setzer, A.: 1999, ‘Ordinal Systems’, in S. B. Cooper and J. K. Truss (eds), ‘Sets and Proofs, Invited Papers from Logic Colloquium ’97 – European Meeting of the Association of Symbolic Logic, Leeds, July 1997, Cambridge [London Mathematical Society Lecture Notes Series 258], pp. 301–338. Setzer, A.: 2000, ‘Extending Martin-Löf Type Theory by One Mahlo-Universe’, Archive for Mathematical Logic 39, 155–181. Schroeder-Heister, P. and K. Došen (eds): 1993, Substructural Logics, Seminar for Natural- Language Processing Systems of the University of Tübingen, Germany, 7–8 October 1990, Oxford [Studies in Logic and Computation 2]. Simpson, S.: 1999, Subsystems of Second Order Arithmetic, Berlin [Perspectives in Mathematical Logic]. MATHEMATICAL PROOF THEORY IN THE LIGHT OF ORDINAL ANALYSIS 255

Strahm, Th.: 1999, ‘First Steps into Metapredicativity in Explicit Mathematics’, in S. Cooper and J. Truss (eds), Sets and Proofs, Invited Papers from Logic Colloquium ’97 – European Meeting of the Association of Symbolic Logic, Leeds, July 1997, Cambridge [London Mathematical Society Lecture Notes Series 258], pp. 383–402. Szabo, M. (ed.): 1969, The Collected Papers of Gerhard Gentzen,Amsterdam. Takeuti, G.: 21987, Proof Theory, Amsterdam [Studies in Logic and the Foundations of Mathematics 81]. Troelstra, A. S.: 1998, ‘Realizability’, in Samuel Buss (ed.), pp. 407–473. Troelstra, A. and H. Schwichtenberg: 22000, Basic Proof Theory, Cambridge [Cambridge Tracts in Theoretical Computer Science 43]. Troelstra, A. and D. van Dalen: 1988, Constructivism in Mathematics, An Introduction, Vol. II, Amsterdam [Studies in Logic and the Foundations of Mathematics 123]. van Heijenoort, J. (ed.): 1967, From Frege to Gödel, A Source Book in Mathematical Logic, 1879–1931, Cambridge MA. Wainer, S. and L. Wallen: 1992, ‘Basic Proof Theory’, in P. Aczel, H. Simmons and S. Wainer (eds), Proof Theory, A Selection of Papers from the Leeds Proof Theory Pro- gramme, an International Summer School and Conference on Proof Theory,heldatthe Leeds University, UK, 24 July–2 August 1990, Cambridge. Weiermann, A.: 1998, ‘How is it that Infinitary Methods can be Applied to Finitary Mathematics? Gödel’s T : A Case Study’, Journal of Symbolic Logic 63, 1348–1370. Weiermann, A. 2001, ‘Slow Versus Fast Growing’, this volume.

WSI, Universität Tübingen Sand 13, D-72076 Tübingen Germany E-mail: [email protected]

WOLFGANG BURR

CONCEPTS AND AIMS OF FUNCTIONAL INTERPRETATIONS: TOWARDS A FUNCTIONAL INTERPRETATION OF CONSTRUCTIVE SET THEORY

ABSTRACT. The aim of this article is to give an introduction to the functional interpretations of set theory given by the author in Burr (2000a). The first part starts with some general remarks on Gödel’s functional interpretation, with a focus on aspects related to problems that arise in the context of set theory. The second part gives an insight into the techniques needed to perform a functional interpretation of systems of set theory. However, the first part of this article is not intended to be a complete survey of functional interpretations; for that we recommend, for example, Avigad and Feferman (1998), Troelstra (1990) and Troelstra (1973).

1. CONCEPTS AND AIMS OF FUNCTIONAL INTERPRETATIONS

When we discuss the origins of Gödel’s Dialectica interpretation, we start with Hilbert’s program. As Hilbert never formulated it exactly, we may roughly summarize it as follows:
– Formalize mathematics in a formal system.
– Show the consistency of this system with finitist methods.
Hilbert formulated this program in the twenties and by finitist methods he meant the following (see Troelstra 1990):

In Hilbert’s sense, this may be described as mathematics of a purely combinatorial nature, dealing with configurations of finite, discrete, concretely representable objects that can be surveyed (grasped) in all their parts. Elementary school arithmetic may be regarded as typically finitary in Hilbert’s sense: it deals with natural numbers and certain specific operations on them, such as addition and multiplication, which have purely combinatorial character. On the other hand, the general concept of a function from N to N is not finitary.

Synthese 133: 257–274, 2002. © 2002 Kluwer Academic Publishers. Printed in the Netherlands.

Primitive recursive arithmetic, for example, is a suitable system for carrying out Hilbert’s program. Gödel’s second incompleteness theorem (1931) showed that this program was unattainable: No consistent, recursively axiomatized system that contains primitive recursive arithmetic proves its own consistency, let alone the consistency of Peano arithmetic or even stronger theories. Hence any consistency proof of classical mathematics or even arithmetic exceeds the framework of finitary mathematics. Gödel’s famous paper from 1958, ‘Über eine bisher noch nicht benützte Erweiterung des finiten Standpunktes’ (Gödel 1958), starts with the following comment on this fact:

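P. Bernays hat wiederholt darauf hingewiesen, dass angesichts der Tatsache der Unbeweisbarkeit der Widerspruchsfreiheit eines Systems mit geringeren Beweismitteln als denen des Systems selbst eine Überschreitung des Rahmens der im Hilbertschen Sinn finiten Mathematik nötig ist, um die Widerspruchsfreiheit der klassischen Mathematik, ja sogar um die der klassischen Zahlentheorie zu beweisen. Da die finite Mathematik als die der anschaulichen Evidenz definiert ist, so bedeutet das [...], dass man für den Widerspruchsfreiheitsbeweis der Zahlentheorie gewisse abstrakte Begriffe braucht.1

[In English: P. Bernays has repeatedly pointed out that, in view of the fact that the consistency of a system cannot be proved with means of proof weaker than those of the system itself, it is necessary to go beyond the framework of finitary mathematics in Hilbert’s sense in order to prove the consistency of classical mathematics, or even that of classical number theory. Since finitary mathematics is defined as the mathematics of intuitive evidence, this means [...] that certain abstract notions are needed for the consistency proof of number theory.]

Gödel concludes: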

Jedenfalls lehrt uns die Bernayssche Bemerkung, zwei Bestandteile in der finiten Einstellung unterscheiden, nämlich erstens das konstruktive Element, welches darin besteht, dass von mathematischen Objekten nur insoweit die Rede sein darf, als man sie aufweisen oder durch Konstruktion tatsächlich herstellen kann; zweitens das spezifisch finitistische Element, welches darüber hinaus fordert, dass die Objekte, über welche man Aussagen macht, mit welchen die Konstruktionen ausgeführt werden und welche man durch sie erhält, “anschaulich” sind, das heisst letzten Endes raum-zeitliche Anordnungen von Elementen, deren Beschaffenheit abgesehen von Gleichheit und Verschiedenheit irrelevant ist.2

[In English: In any case, Bernays’s remark teaches us to distinguish two components of the finitary attitude: first, the constructive element, which consists in the requirement that mathematical objects may be spoken of only insofar as they can be exhibited or actually produced by construction; second, the specifically finitistic element, which requires in addition that the objects about which one makes statements, with which the constructions are carried out and which one obtains by them, be “intuitive”, that is, in the last analysis, spatiotemporal arrangements of elements whose nature is irrelevant apart from equality and difference.]

In view of his second incompleteness theorem, Gödel continues to argue that in order to realize at least a relativised Hilbert program one has to drop the second requirement: Certain abstract notions have to be accepted. A prominent example is Gentzen’s consistency proof of Peano arithmetic (PA). Here ordinals up to ε0 are adjoined to finitary mathematics. In his paper from 1958 Gödel takes another route and shows that the notion of computable or primitive recursive functional of finite type is sufficient for that task as well. (Functionals of finite type were first introduced by Hilbert in 1925, see Hilbert (1926).) For that reason Gödel introduces a quantifier free term calculus T of primitive recursive functionals of finite type, which generalize the primitive recursive functions. In particular, T allows the definition of a functional F from given functionals G and H by primitive recursion:

F(0) = G; F(x + 1) = H(x, F(x)).
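To illustrate what recursion at higher type buys us, the following Python sketch defines the Ackermann function using nothing but a recursor in which F(x + 1) is computed from x and F(x), where the values of F are themselves functions. The names rec and ackermann are ours and purely illustrative; they are not part of Gödel’s calculus T, and Python closures merely stand in for functionals of type 1.

```python
def rec(g, h, x):
    """Recursor: F(0) = g, F(i + 1) = h(i, F(i)); returns F(x).

    g may be a number or a function, so the same recursor serves
    at type 0 and at higher type.
    """
    value = g
    for i in range(x):
        value = h(i, value)
    return value


def ackermann(m, n):
    """Ackermann function A(m)(n), obtained by recursion on m at function type."""
    successor = lambda k: k + 1  # A(0) is the successor function

    def step(_i, a_i):
        # Given A(i), build A(i + 1) by an inner recursion at type 0:
        #   A(i + 1)(0)     = A(i)(1)
        #   A(i + 1)(k + 1) = A(i)(A(i + 1)(k))
        return lambda k: rec(a_i(1), lambda _j, y: a_i(y), k)

    return rec(successor, step, m)(n)
```

The outer call to rec iterates over m and returns a function; only the inner call returns a number. With G and H restricted to numbers, no such definition is possible.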

In contrast to the class of primitive recursive functions, G and H may be of finite type; this allows us to define much more complex functions, e.g., the Ackermann function and, in fact, all ≺ε0-descent recursive functions. T further contains the constant 0 and a successor functional suc, and it is combinatorially complete, i.e., allows explicit definition. Furthermore, it is equipped with an induction rule. The main result of Gödel’s paper is an interpretation of HA in T: Gödel assigns to any formula ϕ in the language of HA a formula ϕ^D of the form ∃v∀w ϕ_D(v, w), where ϕ_D is a formula of T and v, w are tuples of variables of finite type; this translation is called the Dialectica translation. Gödel proves: Whenever HA ⊢ ϕ with translation ϕ^D ≡ ∃v∀w ϕ_D(v, w), then there are terms t in T such that T ⊢ ϕ_D(t, w), or, in other words: ϕ is D-interpretable in T.

In the translation D the case of implication is of particular interest. In his paper from 1958, Gödel gives the definition of D without any explanation. However, in earlier sources we find a detailed description of how to perform and justify this transition. Suppose we are given two expressions ∃x∀yM(x,y) and ∃u∀vN(u,v) with M, N quantifier free. The task is to find an expression which is again in ∃∀-shape and has ‘the same meaning’ as the implication

(1) (∃x)(∀y)M(x,y) → (∃u)(∀v)N(u,v).

These are Gödel’s explanations (quoted from the talk ‘In what sense is intuitionistic logic constructive’, held by Gödel at Yale University in 1941 (Feferman 1994, 196–197); we adapt Gödel’s notation and write (∀x) instead of (x)):3

This implication (i.e., (1)) means: If there exists an object x satisfying a certain condition then there exists also an object u satisfying a certain other condition. But in a constructive logic that can only mean: Given such an x you can construct such a u, i.e.,

(∃f)(∀x)[(∀y)M(x,y) → (∀v)N(f (x), v)].

But this expression still has not the form we want, since implication still is applied to expressions containing quantifiers. But now we have only universal quantifiers here, and what can in a constructive logic be the meaning of the assertion: If ∀xF(x) then ∀yG(y)? The simplest interpretation which suggests itself is this: Given a counterexample for G you can construct a counterexample for F , i.e., the expression in square brackets will mean:

(∃g)(∀v)[∼N(f(x),v) → ∼M(x,g(v))];

but here the symbol of implication is applied only to expressions without quantifiers. Therefore, we can replace it by the ordinary implication of two-valued logic. Furthermore, we can apply transposition to this implication, interchanging the order of the two terms, and obtain in this manner for the whole expression:

(∃f)(∀x)(∃g)(∀v)[M(x,g(v)) ⊃ N(f(x),v)].

But now, that for every x there exists such a function g means that g is really a function of two variables x and v, i.e., we finally obtain

(∃f, g)(∀x,v)[M(x,g(x,v)) ⊃ N(f(x),v)],

and now we are through, since this expression has the form we want [...].
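The final ∃∀-form can be checked mechanically on a toy instance. The following Python sketch is an illustration only: the matrices M and N, the candidate functionals f and g, and the finite test range are hypothetical choices of ours, and a brute-force check over a finite range is of course no proof. It shows how f turns a witness x of the premise into a witness of the conclusion, while g turns a potential counterexample v for the conclusion into one for the premise.

```python
from itertools import product

# Hypothetical decidable matrices (our choice, not from the text):
#   premise    "exists x, for all y: M(x, y)"  with M(x, y) :<=> y <= x
#   conclusion "exists u, for all v: N(u, v)"  with N(u, v) :<=> 2*v <= u
def M(x, y):
    return y <= x


def N(u, v):
    return 2 * v <= u


def f(x):
    """Witness for the conclusion, computed from a witness x of the premise."""
    return 2 * x


def g(x, v):
    """Counterexample candidate for M(x, .), computed from one for N(f(x), .)."""
    return v


def interprets_implication(M, N, f, g, domain):
    """Check M(x, g(x, v)) -> N(f(x), v) for all x, v in the finite domain."""
    return all((not M(x, g(x, v))) or N(f(x), v)
               for x, v in product(domain, repeat=2))
```

Replacing g by a functional that ignores v, for instance g(x, v) = 0, makes the check fail: the premise matrix then always holds, while N(f(x), v) is false for v > x.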

At this point we should note that this justification essentially needs the decidability of quantifier free formulae. The translations of the remaining logical operations are defined and justified in a much simpler way and we do not want to describe them here. Let us briefly summarize what we gain from this interpretation (cf. Troelstra’s introduction to the 1941 lecture, Feferman 1994, 187):
– First of all, this reduces Heyting arithmetic to the quantifier free term calculus T. Hence HA is consistent relative to T.
However, there are various further applications of this interpretation:
– Together with the negative translation, this interpretation also reduces classical Peano arithmetic to the same system T.
– The provably recursive functions of HA are characterized in terms of T: whenever HA ⊢ ∀x∃yϕ(x,y) with ϕ a primitive recursive predicate, then there is a term t in T such that T ⊢ ϕ(x,tx) (the same holds for PA).
– If HA ⊢ ∃xϕ(x) for an arbitrary formula ϕ with ϕ^D ≡ ∃v∀w ϕ_D, then there are terms t, v in T with T ⊢ ϕ_D(t,v,w) (in general, this does not hold for PA).
– A version of Markov’s rule can be justified for HA: Whenever HA ⊢ ¬¬∃xϕ(x) with ϕ primitive recursive, then T ⊢ ϕ(t) for a term t and hence by a computability proof within HA we get HA ⊢ ∃xϕ(x).
– There is a number-theoretic formula A(x) such that ¬∀x(A(x) ∨ ¬A(x)) can be consistently added to intuitionistic arithmetic HA (see Troelstra 1990, 187).

The proof of the interpretation theorem proceeds along the derivations of HA and we want to give some hints of the proof. Induction is interpreted by using the schema of primitive recursion in higher types. The interpretation of the cut rule uses the composition of functionals. The interpretation of modus ponens corresponds to application, and for ϕ → ϕ ∨ ψ and ϕ ∧ ψ → ϕ one uses canonical constants for all types. All these operations, namely composition, primitive recursion and application, and the canonical constants are available in the system T. The interpretation of the remaining propositional axioms and of the quantifier rules and axioms is straightforward, except for the following innocent-looking axiom (in our calculus this axiom is a special case of A7):

ϕ → ϕ ∧ ϕ.

This is a contraction axiom and we want to take a closer look at its interpretation. For simplicity we assume for the moment that ϕ ≡ ∀wϕ0(w) where ϕ0 is quantifier free. In this case we have ϕ^D ≡ ϕ and the translation of the contraction axiom reads

(ϕ → ϕ ∧ ϕ)^D ≡ ∃W∀yz[ϕ0(Wyz) → ϕ0(y) ∧ ϕ0(z)].

For the interpretation we need a term t in T such that

(2) ϕ0(tyz) → ϕ0(y) ∧ ϕ0(z).

That means t has to decide, for all given y, z, which one of these two is the essential witness for (2); in particular we may define tyz := y if ¬ϕ0(y) and tyz := z otherwise. (If ϕ0(y) fails, the premise ϕ0(tyz) of (2) is false; if ϕ0(y) holds, (2) reduces to ϕ0(z) → ϕ0(z).) The crucial point is that this decision has to be made in terms of T. Provided we have a characteristic term t_ϕ0 for ϕ0 with T ⊢ t_ϕ0 = 1 ↔ ϕ0, this is possible and we can interpret (2). The example ϕ ≡ ∀wϕ0 already covers the problems of the general case. Hence for arbitrary ϕ ∈ L_A with translation ∃v∀w ϕ_D the matrix ϕ_D has to fulfill these two properties in order to make the argument work:

(i) We need a characteristic term in T for ϕ_D.
(ii) ϕ_D has to be decidable, i.e., T ⊢ ϕ_D ∨ ¬ϕ_D.

As long as we are dealing with HA, both requirements are fulfilled, because for ϕ ∈ L_A the matrix ϕ_D of the Dialectica translation is a Boolean combination of equations of type o. These formulae are decidable and it is easy to define characteristic terms for them in T. However, there are formulae in T for which these requirements fail, namely equations of higher type. Consider for example ϕ :≡ x^1 = y^1 for variables of type 1. We have neither a characteristic term for ϕ nor T ⊢ ϕ ∨ ¬ϕ. We should mention that the decidability of a formula does not entail the existence of a characteristic term for this formula; here the example is T provided with classical logic: although now all formulae are decidable, we still do not have characteristic terms for equations of higher type. And vice versa, in general the existence of a characteristic term does not imply the decidability of a formula: here the example is the system of constructive set functionals introduced in Burr (2000). For Δ0-formulae this system is provided with characteristic terms, although even prime formulae are undecidable. It is remarkable that Gödel does not mention these aspects at all in his paper from 1958. Troelstra comments (Troelstra 1990, 228):

It is doubtful whether in 1958 Gödel had already realized the need for characteristic functions in connection with the axiom A → A ∧ A. In a letter of 1970 J. Diller explicitly drew Gödel’s attention to the role of characteristic functions. Gödel reacted in a letter to Bernays (14 July 1970) as follows: “I do not understand what it means to say that in my proof of the formula p ⊃ p ∧ p a passage (which is not possible) to the characteristic term of a formula is required. What is required is the decidability of intensional equations between functions.” Gödel probably regarded the existence of (computable) equality functionals as a concomitant of the decidability of equality; from the passage just quoted one cannot tell whether Gödel realized that in general (that is, at higher types) the decidability of equality does not entail, axiomatically, the presence of equality functionals at higher types.

In 1972, Gödel wrote a second version of his paper from 1958, this time in English (Feferman 1990, 271–280).4 There he treats the problems concerning contraction in footnote i2. In an earlier version of that footnote, later crossed out, he introduces equality functionals with axioms G(f, g) = 0 ↔ f = g for all types. Later he realized that the interpretation of HA needs only characteristic terms for formulae of type o and the final version of this footnote is (Feferman 1990, 277):

For the proofs that the Axioms 1 and 4 of H and the deduction Rule 6 of H5 hold in the interpretation [...], the following principle of disjunctive definition is needed: A function f may be defined by stipulating

A ⊃ f(x) = t1, ¬A ⊃ f(x) = t2,

where t1, t2 are terms and A is a truth-value function of equations between number terms, both containing only previously defined functions and no variables except those of the sequence x.

We may summarize this discussion: the Dialectica interpretation is applicable to HA only because two conditions hold: the matrices of the translated formulae are decidable, and we have characteristic terms for these formulae. Neither of these conditions follows in general from the other. As soon as it comes to extensions of HA, these requirements may fail. One interesting extension is the natural span of HA and T: Heyting arithmetic in all finite types. This theory is the framework to discuss the strength of the Dialectica translation, i.e., to characterize the schema ϕ ↔ ϕ^D. There are various versions of arithmetic in all finite types: HA^ω, HA^ω_0, I-HA^ω, E-HA^ω and WE-HA^ω, and they all differ in how equality in higher types is treated. (For these systems we refer to Troelstra (1990, 230–231), Avigad and Feferman (1998) and Troelstra (1973). One has to be careful because sometimes these sources differ in notation.) Concerning Dialectica-interpretability we should mention that HA^ω_0 and WE-HA^ω are Dialectica-interpretable in T. I-HA^ω is equipped with equality functionals E_σ for each type and is D-interpretable in T + E_σ. (This system is quite different from T: the type 2 functionals are no longer continuous and E_1 is not provably recursive in I-HA^ω.) On the other hand, HA^ω (in Troelstra (1973) this is called N-HA^ω, where N stands for neutral) and E-HA^ω are not Dialectica-interpretable, but for quite different reasons.

Howard showed that no functional of T D-interprets the axiom (E2) of extensionality for type 2 (cf. Howard 1973). In E-HA^ω equality in higher types is extensional and hence (E2) is derivable. This, of course, implies that E-HA^ω is not D-interpretable. HA^ω is an intensional version where we have neither the decidability of equations of higher types nor additional equality functionals. The following example by Howard shows that HA^ω is not D-interpretable in T: Let 0^1 be a canonical zero-functional of type 1 and ϕ be the formula

(3) ∀u^1 ¬¬∃x(x = 0 ↔ u^1 = 0^1).

This formula is derivable in HA^ω but not D-interpretable by any continuous functional. Since all functionals of T are continuous we see that HA^ω is not D-interpretable in T (for the precise argument see, for example, Diller and Nahm 1974). The derivation of (3) essentially makes use of the contraction axiom and, in contrast to the non-interpretability of (E2), this example has to do with the lack of characteristic terms. This example demonstrates that the argument sketched above fails when we try to interpret the contraction axiom of HA^ω: In T we do not have characteristic terms for the prime formulae of HA^ω.

In 1974, Diller and Nahm gave a refined translation ∧ to overcome this contraction problem and make the interpretation independent of characteristic terms and the decidability of quantifier free formulae. The main idea of the Diller-Nahm translation is the following: The Dialectica translation requires a decision between two witnesses, which is in general impossible. The Diller-Nahm translation, in contrast, allows us to collect both witnesses and hence makes the decision superfluous. We now want to describe the idea of this translation in more detail (see also Diller and Vogel 1975). Again the translation sends any ϕ (now ϕ a formula of HA^ω) to a formula ϕ^∧ ≡ ∃v∀w ϕ_∧. Almost all cases of the translation are just as in the Dialectica translation (for the precise definition see Diller and Nahm 1974). Only the interesting case, namely implication, is different. To motivate this translation, let us start with an implication

(4) ∃v∀wϕ → ∃y∀zψ,

where ϕ and ψ are quantifier free. We are looking for an ∃∀-formula with “almost” the same meaning as (4). Since (4) is intuitionistically equivalent to

(5) ∀v(∀wϕ → ∃y∀zψ)

we may treat v as free parameters. This situation is visualized (as a natural deduction) by the following picture:

    ∀wϕ[v]
      ⋮
    ∃y∀zψ

where we may have possibly several assumptions ∀wϕ[v]. Heyting arithmetic (also in all finite types) satisfies existential definability (ED), that is: from a derivation of a closed formula ∃vϕ one can construct a witness v. This property still holds if the derivation depends on a negative (i.e., ∃- and ∨-free) assumption, which is the case in our situation. This consideration motivates the passage from (5) to ∃y(∀wϕ[v] → ∀zψ). Again the step to ∃y∀z(∀wϕ[v] → ψ) is valid in intuitionistic logic and after computing y0 we may again consider z as free parameters. Hence we have reached the following situation (and until now the procedure is exactly the same as for the Dialectica translation):

(6) ∀wϕ[v] → ψ[y0,z], which is:

    ∀wϕ[v]
      ⋮
    ψ[y0,z]

It is constructively justified that one may search through that derivation to find the particular witnesses w1,...,wn that are needed to reach the quantifier free conclusion ψ. It is clear that in general we need finitely many such witnesses, as is demonstrated by the example ψ :≡ ϕ(w1) ∧ ϕ(w2) with undecidable ϕ(w) (this is exactly a contraction). When searching upwards, the derivation may branch and hence each branch may contribute a number of witnesses. The induction rule may as well increase this number and may make it depend on parameters. However, in the end we need only finitely many witnesses and the idea is to collect them all and form the conjunction of all these witnesses. Formally, we use a restricted universal quantifier and choose X (the number of witnesses wi we need) and W (a functional with Wi := wi for i < X):

∃WX[(∀x<X ϕ[v,Wx]) → ψ[y0,z]].

Putting all together, we have reached

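∀v∃y∀z∃WX[(∀x<X ϕ[v,Wx]) → ψ[y,z]],

and, making the dependence of y on v and of W, X on v, z explicit by functionals, we arrive at

∃WXY∀vz[(∀x<Xvz ϕ[v,Wvzx]) → ψ[Yv,z]].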

We see that not only do we have a different translation, but the system T has changed as well: we now have an additional restricted universal quantifier. The new system is called T_∧ and is T plus axioms for the restricted universal quantifier. Note that we do not increase the stock of functionals, but T_∧ is no longer a quantifier free term calculus. Diller and Nahm’s main result is: HA^ω is ∧-interpretable in T_∧. With the new translation it is easy to interpret the contraction ϕ → ϕ ∧ ϕ (again we assume ϕ^∧ ≡ ϕ ≡ ∀wϕ0):

(ϕ → ϕ ∧ ϕ)^∧ ≡ ∃WX[∀yz((∀x<Xyz ϕ0(Wyzx)) → ϕ0(y) ∧ ϕ0(z))].

Now one just has to put Xyz := 2, Wyz0 := y and Wyz1 := z. Diller and Vogel (1975) give an ∧-interpretation of HA^ω + BI_σ in T enriched by bar-recursion functionals for type σ. Avigad (1998) also uses a Diller-Nahm-style translation to interpret ÎD_{<ω}, again a system with undecidable formulae, in Martin-Löf type theory.

We already mentioned that together with the negative translation, Gödel’s Dialectica interpretation is also applicable to classical PA. A more direct approach was proposed by Shoenfield (see Shoenfield 1967). He gives a translation ϕ ↦ ϕ^S ≡ ∀x∃y ϕ_S(x, y). ϕ is said to be S-interpretable in T if there are terms t in T with T ⊢ ϕ_S(x, t), where in contrast to the Dialectica interpretation the terms t may contain the free variables x. Shoenfield’s interpretation theorem is: PA is S-interpretable in T (to be more precise: Shoenfield gives an interpretation in the full type structure, but it is easy to see that this interpretation can be performed in T as well).

The two variants ∧ and S of the Dialectica translation were introduced to interpret two extensions of HA, namely HA^ω and PA. Both extensions have specific features that made a refined translation necessary: In the first case the lack of characteristic terms, in the second the law of excluded middle. Although the two problems are quite different, we can say that they are both of a logical nature. In both cases we do not need new functionals for the interpretation (note: in terms of proof-theoretic strength, PA, HA and HA^ω are equivalent), but a more careful translation that respects the peculiarities of the respective extensions. This turns out to be a general pattern. The translation is responsible for handling logic, while the functionals are needed to interpret the mathematics (here we cheat a bit: of course combinators and elementary arithmetic are also needed to handle logic; we mean the mathematically strong functionals R_σ).
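The different treatment of contraction in the two translations discussed in this section can be made concrete in a few lines of Python. This is a sketch under simplifying assumptions of ours: ϕ0 is modelled by an ordinary Boolean function on a finite range of integers, and all function names are hypothetical. The Dialectica witness must call a characteristic function for ϕ0 in order to choose one candidate, whereas the Diller-Nahm witnesses are built without looking at ϕ0 at all.

```python
from itertools import product


def dialectica_witness(chi, y, z):
    """Single witness t y z for phi0(t y z) -> phi0(y) and phi0(z).

    Needs a characteristic function chi for phi0: it returns y if
    phi0(y) fails (making the premise false), and z otherwise.
    """
    return y if not chi(y) else z


def diller_nahm_witnesses(y, z):
    """Collect both candidates: X = 2, W(0) = y, W(1) = z. No decision is made."""
    bound = 2
    collect = lambda i: y if i == 0 else z
    return bound, collect


def check_contraction(phi0, domain):
    """Brute-force check of both interpretations of phi -> phi and phi."""
    for y, z in product(domain, repeat=2):
        conclusion = phi0(y) and phi0(z)
        # Dialectica: phi0(t y z) -> phi0(y) and phi0(z)
        if phi0(dialectica_witness(phi0, y, z)) and not conclusion:
            return False
        # Diller-Nahm: (for all x < X: phi0(W x)) -> phi0(y) and phi0(z)
        bound, collect = diller_nahm_witnesses(y, z)
        if all(phi0(collect(i)) for i in range(bound)) and not conclusion:
            return False
    return True
```

Only check_contraction evaluates ϕ0 in the Diller-Nahm case; the witnesses themselves are given uniformly in y and z, which is why no characteristic term is needed there.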

2. FUNCTIONAL INTERPRETATIONS OF SET THEORY

After the successful application of functional interpretations to first-order arithmetic, the question arises whether this method is also applicable to other theories, namely to set theory. The starting point of this work was the paper by Michael Rathjen ‘A proof-theoretic characterization of the primitive recursive set functions’ (Rathjen 1992). Rathjen was able to show that the Σ₁-definable set functions of Kripke-Platek set theory with infinity (KPω), but with foundation restricted to Σ₁-formulae, are exactly the primitive recursive set functions⁶ enriched by the constant function x ↦ ω. This result is closely related to a theorem by Parsons, namely that the provably recursive functions of IΣ₁ are exactly the primitive recursive functions. The Shoenfield interpretation of PA with primitive recursive functionals can be considered as a generalization of Parsons’ theorem. It was a natural question to ask whether it is also possible to raise Rathjen’s result to KPω in a similar way.

Andreas Weiermann suggested generalizing the primitive recursive set functions to functionals of finite type, and he proposed to try a Shoenfield-style interpretation of KPω. This led to a system of set functionals with constants 0, ω; combinators K, S; case-distinction functionals C_σ for each type; a functional Suc with axiom Suc ab = a ∪ {b}; a union functional U with Ufa = ⋃{fx | x∈a}; and transfinite recursors R with RGa = G(RG ↾ a)a. Recall the general pattern from the end of Section 1. It turned out (point two of the general pattern) that these functionals are sufficient to deal with the mathematics of KPω. Point one of this pattern now demands a suitable translation to handle the logic of KPω. Here the first attempt was to copy the Shoenfield translation, but leaving Δ₀-formulae unchanged (this is necessary in order to interpret (Δ₀-Separation)).
This approach leads to a problem we call Δ₀ ⊆ Π₁, i.e., a restricted quantifier might be read as an unrestricted one as well. This is the crucial example:

(7) ∀w(w∈a →⊥) → (∀y∈a)⊥.

The S-translation of this formula is (modulo classical logic)

∃w[w∈a ∨ (∀y∈a)⊥].

Thus an interpreting term for (7) corresponds to a choice functional. In a joint paper by Volker Hartung and the author (Burr and Hartung 1998) such a choice functional is added to the system of functionals in order to prove an interpretation theorem. However, this solution is not satisfactory, since the choice functional is not definable within KPω. What was done there is the following: an attempt was made to repair a logical problem (Δ₀ ⊆ Π₁ is of a logical nature) by means of a new functional, which is not in the spirit of our general pattern. Instead, we need a new translation to solve this problem. The first satisfying result that was established in this context was a Diller-Nahm-style interpretation of KPω in the full type structure over a model of KPω (Burr 2000b). Later it turned out that this translation is the Shoenfield variant of the translation × for the interpretation of constructive set theory. Therefore we will now focus on the interpretation of constructive set theory.

The theory CZF, which was introduced by Aczel, is formulated in the first-order language of set theory L∈ with ∈ as the only non-logical symbol. The logic of CZF is intuitionistic predicate logic with equality. CZF− contains the following non-logical axioms: (Extensionality), (Pair), (Union), (Infinity), (Δ₀-Separation), a strong version of the collection schema and (Foundation). CZF contains in addition a subset collection schema.

An important feature of Aczel’s constructive set theory is a kind of predicativity: we do not have power sets and full separation, and the proof-theoretic strength of CZF is that of KPω. The ∈-relation is not decidable in CZF, and from a constructive point of view, this is an essential property that should be preserved by the mathematical axioms: Using ω and (Δ₀-Separation), we have arithmetical comprehension in CZF, i.e., a := {x∈ω | ϕ(x)} is a set for all arithmetical formulae ϕ.
Hence, decidability of ∈ would imply the law of excluded middle for arithmetical formulae, and CZF would be far from what one would call constructively justified. It is straightforward to see that ∅∈x ∨ ¬∅∈x implies the excluded middle for all restricted formulae (restricted excluded middle (REM)). Since (Subset-Collection) implies the exponentiation axiom and this, together with (REM), implies the power set axiom (and therefore would increase the strength of CZF enormously), we see once more that non-decidability of ∈ is an important feature. Hence, care should be taken in formulating the axioms: The axiom of choice (AC) has to be excluded, since together with (Δ₀-Separation) it implies (REM) (Diaconescu, cf. Beeson (1985, 163) and Goodman and Myhill 1978). These considerations show that no functional that decides ∈ can be considered constructive, in particular not a choice functional, and that such functionals are not appropriate for a functional interpretation of constructive set theory. For more information about CZF see for example (Aczel 1978, 1982; Griffor and Rathjen 1994; Troelstra and van Dalen 1988).

Now we focus on the functional interpretation of CZF−, since the interpretation of all of CZF is similar and only needs an additional functional. According to our general pattern, two steps are necessary towards a functional interpretation of constructive set theory. We look for a suitable translation to interpret the logic of CZF− and for a system of functionals to treat the mathematics of CZF−.

Towards the right translation we observe that constructive set theory contains undecidable prime formulae. In order to cope with the resulting contraction problem, we have to choose a Diller-Nahm-style translation. Furthermore, Δ₀-formulae should be preserved by the translation since we have to interpret (Δ₀-Separation). Consequently, a first attempt is to define a translation that leaves Δ₀-formulae unchanged and otherwise copies the Diller-Nahm translation.
Doing so, we are able to interpret contraction and solve the Δ₀ ⊆ Π₁-problem: The Diller-Nahm translation of ∀w(w∈a → ⊥) → (∀y∈a)⊥ reads

∃WX[(∀x∈X)(Wx∈a → ⊥) → (∀y∈a)⊥]

and this is trivially interpretable by the definition X := a and W := λx.x. The contraction and the Δ₀ ⊆ Π₁-problem are of a similar nature and they are both solved by the Diller-Nahm translation: we accumulate information instead of facing an impossible decision. In the first case we collect two witnesses y and z, in the second a-many witnesses w for all w∈a. The Diller-Nahm translation is designed exactly for that purpose.
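The choice of witnesses X := a and W := λx.x can be checked mechanically on small finite sets. The following is only a minimal sketch: the formula is evaluated classically by Python, so it illustrates the shape of the witnesses rather than any constructive content, and the helper names (`interpreted_formula`, `diller_nahm_witnesses`, `check`) are hypothetical and not taken from the cited papers.

```python
from itertools import combinations


def interpreted_formula(a, X, W):
    """Evaluate (forall x in X)(W(x) in a -> False) -> (forall y in a) False."""
    antecedent = all(W(x) not in a for x in X)
    consequent = len(a) == 0  # (forall y in a) False holds exactly when a is empty
    return (not antecedent) or consequent


def diller_nahm_witnesses(a):
    """The witnesses named in the text: X := a and W := identity."""
    return a, (lambda x: x)


def all_subsets(universe):
    """Enumerate every subset of a finite universe as a frozenset."""
    items = list(universe)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def check(universe):
    """Check the interpreted formula for every subset a of the universe."""
    return all(
        interpreted_formula(a, *diller_nahm_witnesses(a))
        for a in all_subsets(universe)
    )
```

With an empty collection X the antecedent holds vacuously, so the implication fails for every non-empty a; collecting all of a as X is what makes the implication hold without deciding which element of a to pick.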

However, within intuitionistic logic we also have to deal with the restricted existential quantifier, and here we face a dual phenomenon, which we call the Δ₀ ⊆ Σ₁-problem:

(∃y∈a)(y = y) → ∃w(w∈a ∧ w = w).

The translation of this formula is

(8) ∃w[(∃y∈a)(y = y) → w∈a].

Again any interpreting term for (8) involves a choice functional, which is not constructive.

A closer look at (8) shows that we are not able to give a single interpreting term, while it is very easy to give a set of interpreting terms, namely the set a = {x | x∈a} itself. We would again like to accumulate information. This time, however, the Diller-Nahm translation does not help us at all. It only allows us to collect information in the antecedent of an implication. This observation leads to the main idea behind the translation ×, which turns out to be suitable for an interpretation of constructive set theory: Its existential quantifiers no longer range over single objects, but over inhabited sets in the sense of the Diller-Nahm translation. For every ϕ ∈ L∈ the translation ϕ^× reads

∃vQ∀w[(∃q∈Q)(q = q) ∧ (∀q∈Q) ϕ̄(vq, w)].

That means the existential quantifiers now range over inhabited sets

{vq | q∈Q}, where each element vq (q∈Q) satisfies the matrix ϕ̄.

The second step towards a functional interpretation of CZF− is to find a suitable system of constructive set functionals to interpret the mathematics of CZF−. This job will be done by T∈, the set-theoretic analogue of Diller and Nahm’s system T∧. The stock of basic functionals of T∈ is 0 and ω for the corresponding sets, combinators K and S, a successor functional Suc, I and N for intersection and relative complements, the union functional U and recursors R.

A distinguished and important subclass of the formulae of T∈ is Δ₀. It contains the atomic formulae s∈t, s = t and ⊥, where s and t are terms of type o, and is closed under →, ∧, ∨, (∀x∈a) and (∃x∈a). Hence this class contains all Δ₀-formulae of L∈. It is an important feature of Δ₀ that it does not contain equations of higher type. In order to axiomatize the basic functionals, T∈ furthermore contains equations of arbitrary type.

Completely analogously to T∧, we now obtain the class of formulae of T∈ by closing under ∧, → and (∀x∈a). Hence, the restricted existential quantifier and disjunctions occur only within Δ₀-formulae, while ∧, → and (∀x∈a) may as well occur in front of equations of arbitrary type. This, together with the fact that every Δ₀-formula is provably equivalent in T∈ to an equation of type o, underlines the correspondence between T∈ and T∧.

Now everything is arranged to give the main result: CZF− is ×-interpretable in T∈. We want to give some hints about the proof of the interpretation theorem: Most of the logical rules and axioms are (owing to a distribution lemma) interpreted similarly to the Diller-Nahm interpretation. The Δ₀ ⊆ Σ₁-problem is solved by the new translation.

Concerning the set-theoretical axioms of CZF−, we see that (Extensionality) is preserved by the translation and is also an axiom of T∈. (Pair), (Union) and (Δ₀-Separation) are interpreted using Suc, I, N, U, and infinity, of course, needs the constant ω. (Foundation) is interpreted by transfinite recursion (corresponding to arithmetic, where induction is interpreted by primitive recursion). The interpretation of (Strong-Collection) uses the union functional U. It should be noted that we can interpret collection for arbitrary formulae. In contrast, in the classical framework, we can treat collection only for Σ₁-formulae.

There are two immediate corollaries of the interpretation theorem: CZF− is consistent relative to T∈, and its provably total Σ₁-definable set functions are all representable in T∈ by terms of T∈.

When it comes to all of CZF it is necessary to introduce a so-called fullness-functional⁷ which leads to the system T∈⁺. Now the interpretation theorem easily extends to CZF. Vice versa, it can be shown that all extensions of T∈ that interpret CZF already contain this fullness-functional. Hence T∈⁺ is the minimal extension of T∈ that interprets all of CZF.
Once the ×-translation has been defined, it is natural to ask for a characterization of this translation, in order to get more information about the principles involved in it.⁸ For this purpose constructive set theories in all finite types are introduced, i.e., the language of T∈ is extended to full first-order logic by adding quantifiers for all finite types, and several extensions of T∈ formulated in this language are defined. One of these extensions is CZF^{ω−}: it is the system ‘spanned’ by T∈ and CZF− in a natural way. This theory is an adequate framework to discuss the strength of the translation ×. Similarly to the characterization of the Dialectica and the Diller-Nahm translation, we ask which principles are needed to prove the schema ϕ ↔ ϕ^×. In arithmetic and for the Dialectica translation, these principles are the axiom of choice

(AC) ∀x^σ ∃y^τ ϕ(x, y) → ∃Y^{σ→τ} ∀x ϕ(x, Yx)

and, furthermore, Markov’s principle (M) and a principle of independence of premises (IP). For the Diller-Nahm translation one needs (AC) and modified principles (M∧) and (IP∧). In both cases we also have a converse result, namely that the schemes ϕ ↔ ϕ^D (resp. ϕ ↔ ϕ^∧) already prove the principles mentioned above.

As already mentioned, it is well known that the (set-theoretical) axiom of choice added to CZF− implies the law of excluded middle for restricted formulae (REM). Therefore, in intuitionistic set theory choice principles should be treated with great caution. However, certain choice principles are needed in order to prove the equivalence of ∀uϕ and ϕ → ψ to their ×-translations. A closer look shows that the principles involved are weaker than those stated above. Here by ‘weaker’ we mean the following: Suppose that ∀x∃yϕ holds. (AC) as above gives a choice functional Y with ∀xϕ(x, Yx). In the context of set theory, we are in general not able to give such a functional, but only functionals S and Y such that for any argument x the set Sx is inhabited and ϕ(x, Ysx) holds for all s∈Sx. This leads to the scheme (WAC) (weak axiom of choice)

(WAC) ∀x∃yϕ(x,y) →∃YS∀x[(∃s∈Sx)(s = s) ∧ (∀s∈Sx)ϕ(x, Y sx)].
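The shape of the functionals S and Y in (WAC) can be illustrated on finite data. The sketch below is an assumption-laden toy: over finite sets a genuine choice function is of course available, so the code only shows how an inhabited set of witnesses replaces a single chosen witness. The names `weak_choice` and `satisfies_wac` and the parameters `domain`, `codomain` and `phi` are hypothetical.

```python
def weak_choice(codomain, phi):
    """Return functionals S and Y in the shape required by (WAC).

    S(x) collects every witness y with phi(x, y); no single witness is
    selected. Y(s, x) simply returns the collected witness s.
    """
    candidates = list(codomain)

    def S(x):
        return frozenset(y for y in candidates if phi(x, y))

    def Y(s, x):
        return s

    return S, Y


def satisfies_wac(domain, S, Y, phi):
    """Check: for all x, S(x) is inhabited and phi(x, Y(s, x)) for all s in S(x)."""
    return all(
        len(S(x)) > 0 and all(phi(x, Y(s, x)) for s in S(x))
        for x in domain
    )
```

For example, with `phi = lambda x, y: y % x == 0` on the domain `range(1, 5)` and codomain `range(10)`, every S(x) contains all multiples of x below 10, and no particular multiple has to be singled out.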

Similar considerations lead to a variant of (IP∧) which is appropriate in the context of set theory

(IP∈) (∀wϕ → ∃uψ) → ∃US[∀wϕ → (∃s∈S)(s = s) ∧ (∀s∈S)ψ(Us)] for ϕ ∈ T∈.

The variant of Markov’s principle introduced by Diller and Nahm is sufficient also in our case:

(M∈) (∀wϕ → ψ) →∃WX[(∀x∈X)ϕ(Wx) → ψ] ϕ,ψ ∈ T∈.

The theory CZF^{ω−}_∗ is introduced by adding these three principles to CZF^{ω−}, and the characterization theorem states precisely the equivalence of CZF^{ω−}_∗ and CZF^{ω−} + {ϕ ↔ ϕ^× | ϕ ∈ L∈^ω}.

The interpretation theorem easily extends to CZF^{ω−}_∗, which yields the following version of existential definability. Whenever CZF^{ω−}_∗ proves an existential theorem ∃xϕ(x), we can find an inhabited set of witnesses, all of them satisfying ϕ: There are terms X, Q such that CZF^{ω−}_∗ ⊢ (∃q∈Q)(q = q) ∧ (∀q∈Q) ϕ(Xq). Even more, if we have in addition the uniqueness CZF^{ω−}_∗ ⊢ ∃!xϕ(x), we can use this information to obtain a single term t such that CZF^{ω−}_∗ ⊢ ϕ(t).

Let us briefly return to Kripke-Platek set theory. We already mentioned that KPω contains the Δ₀ ⊆ Π₁-problem, which in the presence of classical logic coincides with the Δ₀ ⊆ Σ₁-problem. If we compare the Shoenfield versions of ∧ and ×, we observe the same phenomenon: the differences collapse, and we find a single translation ∨ that is the common Shoenfield version of both × and ∧. If we consider the Shoenfield interpretation of PA, it turns out that the interpreting system is still Gödel’s T, i.e., a system with intuitionistic logic. This holds since the part of T that is relevant for the interpretation of HA is decidable. With T∈ this is different, and hence we need to introduce the classical counterpart T∈^c. A first result now states: KPω is ∨-interpretable in T∈^c. Again this can easily be extended to Kripke-Platek set theory in all finite types, and the interpretation theorem also holds for the extended systems. This yields relative consistency, conservativity and a characterization of the provably total Σ-definable set functions. In Burr and Hartung (1998) we also give a computability proof for the functionals of T∈^c in KPω, and hence we even have the stronger result: The provably total Σ-definable set functions of KPω^ω_∗ are exactly the type 1 functionals of T∈^c.

Let us conclude this discussion with some final remarks. The functional interpretation of Aczel’s constructive set theories in the systems T∈ and T∈⁺ provides a lot of information about the analyzed systems. We get consistency proofs for CZF− and CZF relative to “quantifier-free” systems of constructive set functionals and moreover obtain some interesting insights into the nature of constructive set theory. The interpretation extracts the existential content of the investigated theories in such a way that all existential consequences are made explicit in terms of constructive set functionals.
However, the translation tells us that this explicitness is less sharp than in the context of arithmetic, since in general one obtains only an inhabited set of witnesses rather than precisely one. This underlines an important feature of extensional set theories without the axiom of choice: they all come with an inherent portion of vagueness.

NOTES

∗ The present paper is based on a part of the author’s doctoral dissertation (under the supervision of Professor Justus Diller at the University of Münster).

1 English translation by Stefan Bauer-Mengelberg and Jean van Heijenoort in (Feferman 1990, 241): P. Bernays has pointed out on several occasions that, since the consistency of a system cannot be proved using means of proof weaker than those of the system itself, it is necessary to go beyond the framework of what is, in Hilbert’s sense, finitary mathematics if one wants to prove consistency of classical mathematics, or even that of classical number theory. Consequently, since finitary mathematics is defined as the mathematics in which evidence rests on what is intuitive, certain abstract notions are required for the proof of the consistency of number theory [...].

2 (Feferman 1990, 245): In any case Bernays’ remark teaches us to distinguish two components in the finitary attitude; namely, first, the constructive element, which consists in our being allowed to speak of mathematical objects only in so far as we can exhibit them or actually produce them by means of a construction; second, the specifically finitistic element, which makes the further demand that the objects about which we make statements, with which the constructions are carried out and which we obtain by means of these constructions, are ‘intuitive’, that is, are in the last analysis spatiotemporal arrangements of elements whose characteristics other than their identity or nonidentity are irrelevant.

3 It is remarkable that in his 1941 talk we already find all the ideas of his 1958 paper. In many aspects this talk gives more detailed explanations than his 1958 paper.

4 The paper was planned to be published in the Swiss journal Dialectica. This never happened and it appeared in (Feferman 1990) for the first time.

5 Axiom 1 is ϕ → ϕ ∧ ϕ, axiom 4 is ϕ ∨ ϕ → ϕ and rule 6 is ϕ → ψ, χ → ψ ⊢ ϕ ∨ χ → ψ.

6 This notion is due to Jensen and Karp (Jensen and Karp 1971).

7 Note that (Subset-Collection) is equivalent to a fullness axiom.

8 However, note that the proof of the interpretation theorem does not use these principles.

REFERENCES

Aczel, P.: 1978, ‘The Type Theoretic Interpretation of Constructive Set Theory’, in A. McIntyre, L. Pacholski and J. Paris (eds.), Logic Colloquium ’77 [Studies in Logic and the Foundations of Mathematics], Amsterdam.
Aczel, P.: 1982, ‘The Type Theoretic Interpretation of Constructive Set Theory: Choice Principles’, in A. S. Troelstra and D. van Dalen (eds.), The L.E.J. Brouwer Centenary Symposium [Studies in Logic and the Foundations of Mathematics 110], Amsterdam.
Avigad, J.: 1998, ‘Predicative Functionals and an Interpretation of ÎD_<ω’, Annals of Pure and Applied Logic 92, pp. 1–34.
Avigad, J. and Feferman, S.: 1998, ‘Gödel’s Functional (“Dialectica”) Interpretation’, in S. Buss (ed.), Handbook of Proof Theory [Studies in Logic and the Foundations of Mathematics 137], pp. 337–406.
Beeson, M.: 1985, Foundations of Constructive Mathematics, Berlin.
Burr, W. and Hartung, V.: 1998, ‘A Characterization of the Σ₁-definable Functions of KPω + (uniform AC)’, Archive for Mathematical Logic 37, 199–214.
Burr, W.: 1998, Functionals in Set Theory and Arithmetic, (Doctoral Thesis), Münster.
Burr, W.: 2000a, ‘Functional Interpretation of Aczel’s Constructive Set Theory CZF’, Annals of Pure and Applied Logic 104, 31–73.
Burr, W.: 2000b, ‘A Diller-Nahm Style Functional Interpretation of KPω’, Archive for Mathematical Logic 39, pp. 599–604.
Buss, S. (ed.): 1998, Handbook of Proof Theory, Studies in Logic and the Foundations of Mathematics 137, Amsterdam.
Diller, J.: 1979, ‘Functional Interpretation of Heyting’s Arithmetic in all Finite Types’, Nieuw Archief voor Wiskunde, Derde Serie 27, 70–97.
Diller, J. and Müller, G. H. (eds.): 1974, ‘⊨ ISLIC Proof Theory Symposium, Dedicated to Kurt Schütte on the Occasion of his 65th Birthday’, Proceedings of the International Summer Institute and Logic Colloquium, Lecture Notes in Mathematics 500, Kiel.
Diller, J. and Nahm, W.: 1974, ‘Eine Variante zur Dialectica-Interpretation der Heyting-Arithmetik endlicher Typen’, Archiv für mathematische Logik und Grundlagenforschung 16, 49–66.
Diller, J. and Vogel, H.: 1975, ‘Intensionale Funktionalinterpretation der Analysis’, in J. Diller and G. H. Müller (eds.), ⊨ ISLIC Proof Theory Symposium, Dedicated to Kurt Schütte on the Occasion of his 65th Birthday, pp. 56–72.
Feferman, S.: 1968, ‘Ordinals Associated with Theories for one Inductively Defined Set’, unpublished.
Feferman, S. et al. (eds.): 1990, Kurt Gödel, Collected Works, Volume 2, Oxford.
Feferman, S. et al. (eds.): 1994, Kurt Gödel, Collected Works, Volume 3, Oxford.
Gödel, K.: 1941, ‘In What Sense is Intuitionistic Logic Constructive?’, in S. Feferman et al. (eds.), Kurt Gödel, Collected Works, Volume 3, pp. 189–200.
Gödel, K.: 1958, ‘Über eine bisher noch nicht benützte Erweiterung des finiten Standpunktes’, Dialectica 12, 280–287.
Goodman, N. D. and Myhill, J.: 1978, ‘Choice Implies Excluded Middle’, Zeitschrift für mathematische Logik und Grundlagen der Mathematik 24, 461.
Griffor, E. and Rathjen, M.: 1994, ‘The Strength of Some Martin-Löf Type Theories’, Archive for Mathematical Logic 3, 347–385.
van Heijenoort, J.: 1967, From Frege to Gödel: A Sourcebook in Mathematical Logic, 1879–1931, Cambridge MA.
Hilbert, D.: 1926, ‘Über das Unendliche’, Mathematische Annalen 95, pp. 161–190 (English translation in J. van Heijenoort (ed.), From Frege to Gödel: A Sourcebook in Mathematical Logic, 1879–1931, pp. 367–392).
Howard, W. A.: 1973, ‘Hereditarily Majorizable Functionals of Finite Type’, in A. S. Troelstra (ed.), Metamathematical Investigation of Intuitionistic Arithmetic and Analysis [Lecture Notes in Mathematics 344], Berlin, pp. 454–461.
Jensen, R. B. and Karp, C.: 1971, ‘Primitive Recursive Set Functions’, in D. S. Scott (ed.), Axiomatic Set Theory, Proceedings of the Symposium in Pure Mathematics of the American Mathematical Society, held at the University of California, Los Angeles, California, July 10–August 5, 1967, Providence 1971, Proceedings of Symposia in Pure Mathematics 13, pp. 143–176.
Myhill, J.: 1975, ‘Constructive Set Theory’, Journal of Symbolic Logic 40, 347–382.
Rathjen, M.: 1992, ‘A Proof-Theoretic Characterization of the Primitive Recursive Set Functions’, Journal of Symbolic Logic 57, 954–969.
Shoenfield, J. R.: 1967, Mathematical Logic, Reading MA.
Troelstra, A. S. and van Dalen, D.: 1988, Constructivism in Mathematics – Volumes I and II [Studies in Logic and the Foundations of Mathematics 123], Amsterdam.
Troelstra, A. S. (ed.): 1973, Metamathematical Investigation of Intuitionistic Arithmetic and Analysis, Berlin, Lecture Notes in Mathematics 344.
Troelstra, A. S.: 1990, ‘Introductory Note to 1958 and 1972’, in Kurt Gödel, Collected Works, Volume 2, pp. 217–241.

Langenfelder Str. 57
D-22769 Hamburg
Germany
E-mail: [email protected]

PETER KOEPKE

THE CATEGORY OF INNER MODELS

1. INTRODUCTION

Set Theory is the mathematics of infinity. The results and arguments of set theory are characterized by enormous differences in size, as well as interactions between entities of different sizes. In this article we shall study a category of class-sized objects and its impact on small sets of real numbers.

The dialectical tension between the finite and the infinite is present in the very foundations of set theory and mathematics. Properties of the infinite are formulated as finite mathematical expressions; reasoning about the infinite is conducted by finite mathematical proofs. Gödel’s incompleteness theorems prove mathematically that one cannot completely describe the infinite by finitary means: there are (number-theoretic) statements involving quantifications over infinite domains which are not decided by the standard axioms of ZFC set theory; moreover, the undecidability phenomenon cannot be avoided by extensions of the axiomatic system (Gödel 1931).

Research in axiomatic set theory has discovered the independence of many principles of infinitary combinatorics. Cantor’s continuum hypothesis 2^ℵ₀ = ℵ₁, which proposes an answer to the first nontrivial question about infinitary cardinal exponentiation, was shown to be independent by Gödel (1938) and Cohen (1963). There is a wealth of set-theoretical axiom systems which extend the usual axioms and which are (presumably) consistent and realizable in first-order structures. This is well documented in the standard textbooks (Jech 1978; Kanamori 1994), which we recommend as a general reference for this article.

Most research in axiomatic set theory consists in constructing and examining transitive models of the axioms of ZFC. Since the consistency results of axiomatic set theory are relative consistencies, new models of set theory have to be constructed from given ones, usually as extensions or submodels where the smaller model is a class-sized transitive submodel of the bigger one.
Class-sized transitive models of ZFC are called inner models of set theory. Gödel’s model L of constructible sets is the paradigm of an inner model (Gödel 1938) (see also Devlin (1984)).

Synthese 133: 275–303, 2002. © 2002 Kluwer Academic Publishers. Printed in the Netherlands.

Set theory has made good use of situations in which there are many inner models available. To assume the existence of many inner models seems natural if one accepts the profound inaccessibility of the infinite. Nevertheless there is the desire to classify the spectrum of “possible worlds” mathematically. It was noted many years ago and has become part of the set-theoretical folklore that some natural operations and properties of inner models are related to large cardinal notions. Motivated by techniques from large cardinal theory we shall explore some aspects of the family of inner models from a category-theoretical perspective.

In Section 3 we shall consider the category of inner models with elementary embeddings as morphisms. In the presence of large cardinals this category is nontrivial, i.e., there are morphisms which are not the identity. The first ordinal moved by a nontrivial morphism is a large cardinal in an inner model. We investigate situations in which this category exhibits some structural richness (Sections 2 and 4). Sets of real numbers may allow a representation by a commutative subsystem (diagram) of the category of inner models (Section 4). The existence of such normal forms (embedding normal forms with witnesses) for a set A ⊆ R implies the determinacy of the infinitary game with winning set A and other regularity properties (Section 5). In Section 6, we consider an operation by which an embedding of an inner model can act on another model to yield an “induced” embedding of that model. This operation is useful for coding information into diagrams of inner models.
In Section 7 we give an indication of how embedding normal forms for projective sets of reals can be built, which can be used to prove the famous theorem of Martin and Steel on projective determinacy (Martin and Steel 1989). The preconditions for such constructions are given by measurable cardinals and Woodin cardinals. We conclude this article with an Appendix containing more details on the construction of the embedding normal forms.

The principal message of this article is that notions of transcendental size (in the present situation: parametrized families of proper classes) are related to familiar mathematical structures and that there are natural methods of transformation between the realms of inner models and of descriptive set theory. It could be interesting and fruitful to relate technical results on the structure of the family of inner models to philosophical questions about “possible worlds”. Can the spectrum of possible worlds be employed to gain information about the “actual” world, as is suggested by the determinacy results of Section 5? A programmatic answer which corresponds well to the present situation in the foundations of mathematics may be read from the concluding words of Felix Hausdorff’s work Das Chaos in kosmischer Auslese (Hausdorff 1998, p. 209), which he published under his pseudonym Paul Mongré:

Werden wir also den kosmocentrischen Aberglauben los wie früher den geocentrischen und anthropocentrischen; erkennen wir, dass in das Chaos eine unzählbare Menge kosmischer Welten eingesponnen ist, deren jede ihren Inhabern als einzige und ausschließlich reale Welt erscheint und sie verleiten möchte, ihre qualitativen Merkmale und Besonderheiten dem transcendenten Weltkern beizulegen. Aber dieser Weltkern entzieht sich jeder noch so losen Fessel und wahrt sich die Freiheit, auf unendlich vielfache Weise zur kosmischen Erscheinung eingeschränkt zu werden; er gestattet das Nebeneinander aller dieser Erscheinungen, die als specielle Möglichkeiten, als begrifflich irgendwie abgegrenzte Theilmengen in seiner Universalität enthalten sind – ja er ist nichts anderes als eben dieses Nebeneinander und darum transcendent für die einzelne Erscheinung, die in sich selbst ihr eigenes abgeschlossenes Immanenzgebiet hat.¹

2. INNER MODELS OF ZERMELO–FRAENKEL SET THEORY

A model of set theory is a rich structure in which the usual mathematical arguments can be formulated. Such a structure can be thought of as a relational algebraic structure equipped with an elementhood relation ∈ and operations of set formation which satisfy certain laws. These laws are expressed by the axioms of Zermelo–Fraenkel set theory including the axiom of choice (ZFC). We shall concentrate our attention on inner models, i.e., transitive class-sized models of set theory which can be regarded as standard models of the system ZFC.

The detailed development and analysis of the theory ZFC is quite involved. The following is a brief sketch of how the basic mathematical notions can be formalized set-theoretically. It mainly serves to fix some notation.

Set theory studies the informal notion of a set as described by Cantor. The class term {x | ϕ} denotes the collection of all objects x such that ϕ holds, i.e., z ∈ {x | ϕ} :↔ ϕ(z/x). Those z are called the elements of {x | ϕ}. Basic operations on sets and classes can be defined with the help of class terms:
• ∅ = {x | x ≠ x} is the empty set;
• {x, y} = {z | z = x ∨ z = y} is the unordered pair of x and y;
• (x, y) = {{x, x}, {x, y}} is the ordered pair of x and y. The theory of relations and functions can be built upon the notion of ordered pair as usual.
• ⋃x = {z | ∃y ∈ x z ∈ y} is the union of (the elements in) x;
• P(x) = {y | y ⊆ x} is the powerset of x.
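For hereditarily finite sets, the operations just listed can be tried out directly. The following is a minimal sketch that models sets as Python `frozenset` values; this encoding and the function names (`pair`, `ordered_pair`, `union`, `powerset`) are assumptions made for illustration and say nothing about infinite sets or proper classes.

```python
from itertools import combinations

EMPTY = frozenset()  # the empty set


def pair(x, y):
    """Unordered pair {x, y}; pair(x, x) is the singleton {x}."""
    return frozenset({x, y})


def ordered_pair(x, y):
    """Ordered pair (x, y) = {{x, x}, {x, y}} as defined in the text."""
    return pair(pair(x, x), pair(x, y))


def union(x):
    """Union of the elements of x: {z | z in y for some y in x}."""
    return frozenset(z for y in x for z in y)


def powerset(x):
    """Powerset of a finite set x: the set of all subsets of x."""
    items = list(x)
    return frozenset(
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    )
```

The defining property of the ordered pair, namely that (x, y) = (x′, y′) holds only if x = x′ and y = y′, can be spot-checked by comparing `ordered_pair(a, b)` with `ordered_pair(b, a)` for distinct a and b.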

The principal tools to study the infinite are induction and recursion along the (transfinite) ordinal numbers, which were formalized by von Neumann as follows (von Neumann 1923).
• A class A is transitive if it is an initial segment of the ∈-relation: ∀x ∈ A ∀y ∈ x y ∈ A;
• A is well-ordered by the ∈-relation if (a) (A, ∈) is linearly ordered, i.e., (A, ∈) is transitive, irreflexive and connected, and (b) (A, ∈) is well-founded, i.e., ∀x ⊆ A (x ≠ ∅ → ∃u ∈ x ∀v ∈ x v ∉ u);
• an ordinal is a set α which is transitive and well-ordered by the ∈-relation.

The class Ord of all ordinals is itself transitive and well-ordered by the ∈-relation. Each ordinal α has an immediate successor α + 1 = α ∪ {α}. Natural numbers are those ordinals which can be reached from 0 = ∅ by the +1-operation: n is a natural number if n = ∅ ∨ ((∃m(n = m + 1)) ∧ ∀m ∈ n(m = ∅ ∨ ∃ℓ(m = ℓ + 1))). The collection of all natural numbers will be denoted by ω.

A central question in set theory is which class terms {x | ϕ} can be considered to be mathematical objects in the full sense, i.e., sets. Russell’s antinomy shows that not all classes can be permitted to be sets. The axiomatization of set theory by Zermelo and Fraenkel postulates that many classes defined above are admissible as sets (Zermelo 1930). The Zermelo–Fraenkel system is sufficiently rich to carry out the development of all mathematical notions. It is based on the intuitive notion of a set and has not led to any contradictions so far.

We give a short list of the axioms using the notation of class terms. Axioms (2.1) to (2.6) express that the ∈-relation is extensional (a set is exactly determined by its elements) and well-founded and that set-theoretical universes are closed relative to the basic operations introduced above.

Set Theoretical Axioms

(2.1) Extensionality: ∀x∀y (∀z (z ∈ x ↔ z ∈ y) → x = y).

(2.2) Foundation: ∀x (x ≠ ∅ → ∃y ∈ x ∀z ∈ x (z ∉ y)).

(2.3) Pairing: ∀x∀y∃z (z = {x, y}).

(2.4) Union: ∀x∃z (z = ⋃x).

(2.5) Powerset: ∀x∃z (z = P(x)).

(2.6) Infinity: ∃z (z = ω).

The remaining two schemata of axioms express the closure of set theoretical universes under certain definitions by first-order formulae. The separation schema could be deduced from the more powerful replacement schema but we include separation as it is closest to the original comprehension principle of naive set theory.

(2.7) Separation: for each first-order formula ϕ(u, w) postulate: ∀w∀x∃z z = {u ∈ x | ϕ(u, w)}.

(2.8) Replacement: for each first-order formula ϕ(u, v, w) postulate: ∀w ((∀u, v, v′ (ϕ(u, v, w) ∧ ϕ(u, v′, w) → v = v′)) → ∀x∃z z = {v | ∃u ∈ x ϕ(u, v, w)}).

So the image of a set under a definable function is again a set. The only other principle widely assumed in mathematical practice is the axiom of choice. In set theory this is usually employed in the equivalent form of Zermelo’s well-ordering principle (Zermelo 1904):
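As a toy illustration of the two schemata, the following sketch treats a Python predicate as the formula ϕ and a Python function as the definable function. It assumes that Python integers stand in for sets and that the parameter w is passed explicitly; none of these names come from the text.

```python
def separation(x, phi, w):
    """{u ∈ x | ϕ(u, w)}: the subset of x cut out by the predicate phi."""
    return frozenset(u for u in x if phi(u, w))

def replacement(x, f, w):
    """{f(u, w) | u ∈ x}: the image of x under a function with parameter w."""
    return frozenset(f(u, w) for u in x)
```

Note that the image produced by `replacement` may have fewer elements than x, since the function need not be injective.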

(2.9) Choice: ∀x∃α∃f (α is an ordinal ∧ f : x ↔ α).

The axiomatic system consisting of (2.1) to (2.8) is abbreviated as ZF, and the full system (2.1) to (2.9) as ZFC (Zermelo–Fraenkel set theory with choice). A model of set theory is a pair (M, E), where E is a binary relation on the domain M which satisfies the axioms ZFC. Models of the form (M, ∈), where ∈ denotes the ∈-relation restricted to M, are of particular interest and are often obtained by the

MOSTOWSKI COLLAPSING LEMMA 2.1. Let (M, E) be a strongly well-founded relation, i.e., (M, E) satisfies the extensionality axiom (2.1), is well-founded, and {x | x E m} is a set for all m ∈ M. Then (M, E) is isomorphic to a unique structure (N, ∈) where N is a transitive class.
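For a finite relation the collapsing map of the lemma can be computed directly by the recursion π(m) = {π(x) | x E m}. The sketch below is an illustration under the assumption that M is a finite set of hashable labels and E a set of pairs (x, m) meaning x E m; the function names are hypothetical.

```python
def is_extensional(M, E):
    """Distinct elements of M have distinct sets of E-predecessors."""
    preds = {m: frozenset(x for x in M if (x, m) in E) for m in M}
    return len(set(preds.values())) == len(M)

def is_well_founded(M, E):
    """Finite case: repeatedly removing E-minimal elements must exhaust M."""
    remaining = set(M)
    while remaining:
        minimal = {m for m in remaining
                   if not any((x, m) in E for x in remaining)}
        if not minimal:
            return False
        remaining -= minimal
    return True

def mostowski_collapse(M, E):
    """Return the map m -> π(m) with π(m) = {π(x) | x E m} as nested frozensets."""
    if not (is_extensional(M, E) and is_well_founded(M, E)):
        raise ValueError("relation is not extensional and well-founded")
    cache = {}

    def collapse(m):
        if m not in cache:
            cache[m] = frozenset(collapse(x) for x in M if (x, m) in E)
        return cache[m]

    return {m: collapse(m) for m in M}
```

On the three-element linear order a E b E c (with a E c) the collapse yields the von Neumann ordinals 0, 1 and 2, so the image is the transitive set 3.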

By Gödel’s incompleteness theorems we should not be able to construct models of set theory from ordinary mathematical objects. We can only expect to construct such models out of given models. This motivates the

DEFINITION 2.2. A class M is called an inner model of set theory if (M, ∈) is a definable transitive model of the system ZFC which contains all the ordinals.

Gödel has defined the inner model L of constructible sets (Gödel 1938, see Devlin 1938). This model is the ⊆-smallest inner model since its definition is absolute for any other inner model of set theory: If M is an inner model, then the constructible universe L^M constructed inside the model M is the same as the original constructible universe: L^M = L. This implies ⊆-minimality: L = L^M ⊆ M.

There is always the trivial inner model V = {x | x = x}, which is the universe of all sets. Gödel’s axiom of constructibility asserts that every set in V belongs to the constructible sets: V = L. In this case, the family of inner models is trivial and consists only of the model V itself. To assume that the family of inner models is non-trivial corresponds to the idea that the set theoretical universe should be rich and allow many possibilities like the existence of non-constructible sets.

3. ELEMENTARY EMBEDDINGS AND THE CATEGORY OF INNER MODELS

A universal theme in modern mathematics is the study of structure preserving maps (homomorphisms, embeddings, isomorphisms, etc.) between structures of the same type. The appropriate framework for this is the language of categories.

In the context of models of set theory, a natural requirement for structure preserving maps is that they preserve the operations of set formation as described in the Zermelo–Fraenkel axioms. So we consider definable elementary embeddings π : M → N between inner models where for every first-order ∈-formula ϕ(u) and all x ∈ M:

(M, ∈) |= ϕ[x] if and only if (N, ∈) |= ϕ[π(x) ].

The map π can be extended to subclasses A of M: let A be definable in M from parameters p by a formula ϕ. Then one can define π(A) = ⋃_{x∈V} π(A ∩ x); the class π(A) is definable in N from parameters π(p) by the formula ϕ.

Intuitively, the collection of all inner models with elementary maps between them can be seen as a category. Because of the subtle difficulties concerning the definability of such a category in ZFC we restrict the complexity of the models and embeddings.

DEFINITION 3.1. Fix a sufficiently large natural number n < ω. The Category of Inner Models consists of the following: objects are all inner models which are Σ_n-definable from parameters; morphisms are all elementary maps between the objects which are Σ_n-definable from parameters.

Figure 1. An elementary embedding.

Note that the family of classes which are Σ_n-definable from parameters can be described by a single Σ_{n+1}-formula. Inner models can be characterized as classes which are transitive, closed with respect to the finite set of Gödel functions and almost universal (see Jech 1978, Chapter 2). An embedding π : M → N is Σ_1-elementary iff it is fully elementary (see Kanamori 1994, p. 45). This indicates that the category of inner models can be uniformly represented within the system ZFC by a concrete but complicated formula which we shall not state explicitly.

The constant n is assumed to be big enough for the intended applications. The case n = 1 will cover most interesting situations and in particular models which are naturally obtainable from V by iteration trees. So we shall assume that n = 1, and for the remainder of this article “inner model” and “elementary embedding of inner models” are to be understood as objects and morphisms of the above category.²

The category of inner models has been applied before. Most notable is the work on connections with left-distributive algebras which was initiated by Laver (1992, 1997) and Dougherty (1997).

The formula defining the category of inner models can be evaluated within every model M of set theory. The category of inner models within M will in general be different from the category of inner models within V. We say that an elementary embedding π : M → N is internal (in M) if π is a morphism of the category of inner models as defined in M. We shall encounter a subtle interplay between internal and non-internal embeddings in the construction of iteration trees as described in the Appendix.

Let us introduce relations for comparing inner models. We define the usual von Neumann hierarchy (von Neumann 1925):

V_0 = ∅, V_{α+1} = P(V_α), V_λ = ⋃_{α<λ} V_α for limit ordinals λ.

The hierarchy exhausts the set theoretical universe: V = ⋃_{α∈Ord} V_α. It possesses certain absoluteness properties: if M is an inner model then for all α ∈ Ord: (V_α)^M = V_α ∩ M. Here (V_α)^M denotes the term V_α as defined in M; relativized notations like this will also be used in connection with other class terms.
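The first finite levels of the hierarchy can be computed explicitly. The following sketch assumes the `frozenset` representation of hereditarily finite sets and only covers successor steps, since no limit stage is reached for finite n; the names are illustrative.

```python
from itertools import combinations

def powerset(x):
    """P(x) as a frozenset of frozensets."""
    elems = list(x)
    return frozenset(
        frozenset(c) for r in range(len(elems) + 1) for c in combinations(elems, r)
    )

def V(n):
    """The finite level V_n, obtained from V_0 = ∅ by n powerset steps."""
    level = frozenset()
    for _ in range(n):
        level = powerset(level)
    return level
```

The levels grow as iterated powers of two, so this direct computation is only practical for very small n.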

Figure 2. The von Neumann-Hierarchy.

For each ordinal α and inner models M and N we define the equivalence relation of agreement below α: M ∼_α N ↔ V_α ∩ M = V_α ∩ N. In many set theoretical arguments, parameters have to be considered and we extend the relation of agreement to include finite sequences of parameters: Let (M, ∈, p) be an inner model with a finite parameter sequence p ∈ M and let α ∈ Ord, α ≥ ω. Consider an appropriate language for the structure (M, ∈, p, (z | z ∈ V_α ∩ M)) in which the sets z ∈ V_α ∩ M are taken as constants. We may assume that the language is absolutely coded as a subset of V_α and that for β ≥ α the theories cohere nicely:

Th(M, ∈, p,(z | z ∈ Vα ∩ M)) = Th(M, ∈, p,(z | z ∈ Vβ ∩ M)) ∩ Vα.

For each infinite ordinal α and pointed inner models (M, p) and (N, q) of the same type define:

(M, p) ∼α (N, q)

:↔ Th(M, ∈, p,(z | z ∈ Vα ∩ M)) = Th(N, ∈, q,(z | z ∈ Vα ∩ N)).

Of course, (M, p) ∼_α (N, q) implies that M ∼_α N. By Tarski’s theorem on the undefinability of truth (Tarski 1935) the definition of ∼_α for pointed inner models cannot be carried out within ZFC. Without further details we can resolve this by restricting the “Th-operator” to statements of limited quantifier-complexity.

If models M and N agree below some ordinal α then methods from M can be applied to N as we shall see in Section 6.
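The parameter-free agreement relation M ∼_α N can be illustrated on a toy scale. The sketch below assumes finite transitive sets of nested `frozenset` values in place of inner models (which are proper classes) and a finite level n in place of α; it uses the fact that x ∈ V_n exactly when the rank of x is below n. All names are hypothetical.

```python
def rank(x):
    """Set-theoretic rank: the least n with x ⊆ V_n (finite sets only)."""
    return max((rank(y) + 1 for y in x), default=0)

def below(M, n):
    """V_n ∩ M: the elements of M of rank less than n."""
    return frozenset(x for x in M if rank(x) < n)

def agree_below(M, N, n):
    """Toy version of M ∼_n N, i.e. V_n ∩ M = V_n ∩ N."""
    return below(M, n) == below(N, n)
```

For example, the transitive sets {0, 1, 2} and {0, 1, {1}} agree below 2 but not below 3.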

4. LIMIT CONSTRUCTIONS IN THE CATEGORY OF INNER MODELS

If π : M → N is an elementary map between inner models, there are in general constants available in N which are not in the image of M. In this sense, N is an extension of M by constants. We shall construct iterated extensions of this kind. Sometimes an infinite sequence of such extensions can be embedded into another inner model, which is equivalent to the well-foundedness of the direct limit. This criterion will be used to decide whether a real number belongs to a given set A of reals.

Let us assume that there is a non-trivial elementary map π : V → M in our category, i.e., π ≠ id. This assumption is equivalent to the existence of a measurable cardinal (see Definition 7.1): let α be the critical point of π, i.e., α is the minimal ordinal such that π(α) > α. Then U = {x ⊆ α | α ∈ π(x)} is a non-trivial α-complete normal ultrafilter on P(α), hence α is a measurable cardinal as defined by St. Ulam (1930). Conversely, if there is a non-trivial α-complete normal ultrafilter U on P(α) then the Scott-ultrapower of V by U is a non-trivial elementary map π as above (Scott 1961).

There are many motivations to assume the existence of a measurable cardinal. In the context of algebraic categories, “big” structures usually can be embedded into proper substructures, and it seems reasonable to assume that the universe of sets is big in a similar way. We shall see that we can derive more maps and commutative diagrams from π, and this will lead to combinatorial consequences.

We assume that π is internal in V, so we can use π to transport the definitions of π and of M up to M: M |= π(π) : M → π(M). Since the notion of an elementary map of inner models is sufficiently absolute, we obtain an elementary map π(π) : M → π(M) into the inner model π(M). This process can be iterated transfinitely.

4.1. A Well-founded Direct Limit

Define commutative systems (M_i)_{i<θ}, (π_ij)_{i≤j<θ} by recursion on the length θ. Set M_0 = V, π_00 = id, M_1 = M, π_01 = π and π_11 = id_{M_1}. If θ is a limit ordinal, we simply take the union of the uniquely defined systems of smaller lengths. For the successor step, assume that the system (M_i)_{i<θ}, (π_ij)_{i≤j<θ} is defined and we have to construct (M_i)_{i≤θ}, (π_ij)_{i≤j≤θ}.

Case 1 θ = θ̄ + 1 is a successor ordinal. Then we continue by mapping π up to M_θ̄: Set M_θ = π_{0θ̄}(M_1), π_{θ̄θ} = π_{0θ̄}(π_01), π_{iθ} = π_{θ̄θ} ◦ π_{iθ̄} for i < θ̄, and π_θθ = id_{M_θ}.

Case 2 θ is a limit ordinal. In this case, the system (Mi )i≤θ ,(πij )i≤j≤θ is the direct limit of the system (Mi )i<θ ,(πij )i≤j<θ. The limit can be formed by a universal construction. We claim that it exists in the category of inner models. By the Mostowski Collapsing Lemma 2.1 it suffices to see that the direct limit is strongly well-founded.

So assume for a contradiction that the direct limit is ill-founded. Note that for i_0 < θ the final segment (M_i)_{i_0≤i<θ}, (π_ij)_{i_0≤i≤j<θ} of the directed system is definable in M_{i_0} in a uniform way as a maximal iteration whose direct limit is ill-founded. Choose ξ_0 ∈ Ord minimal such that ξ_0 is the first member of an ill-founded chain, i.e., there are 0 = j_0 < j_1 < ... < θ and ordinals ξ_1, ξ_2, ... such that for all n < ω: π_{j_n j_{n+1}}(ξ_n) > ξ_{n+1}. Then the system (M_i)_{j_1≤i<θ}, (π_ij)_{j_1≤i≤j<θ} has an ill-founded chain starting with ξ_1. This chain is an element of V and an argument using the absoluteness of well-foundedness shows that a possibly different ill-founded chain starting with ξ_1 is an element of M_{j_1}. The elementarity of π_{0j_1} yields that M_{j_1} thinks π_{0j_1}(ξ_0) is the minimal first member of an ill-founded chain. But

π0j1 (ξ0)>ξ1, contradiction.

Figure 3. The well-foundedness argument for a linear iteration.

4.2. An Ill-founded Direct Limit

The embedding π : V → M can be continued differently by repetition: define a commutative system (M′_i)_{i<ω} with maps (π′_ij)_{i≤j<ω} by recursion: Set M′_0 = V, π′_00 = id, M′_1 = M, and π′_01 = π. If M′_n is defined, then set π′_{n,n+1} = π ↾ M′_n and let M′_{n+1} be the image of M′_n under π′_{n,n+1}. The other maps of the system are determined by commutativity.

If α is the critical point of π then for each n < ω: π′_{n,n+1}(α) > α. This implies that the direct limit of the system (M′_i)_{i<ω}, (π′_ij)_{i≤j<ω} is ill-founded.

Figure 4. An ill-founded iteration.

4.3. Embedding Normal Forms

Combining the two iteration techniques described above one can build treelike systems of inner models so that some branches through the tree have well-founded limits whereas others have ill-founded limits. The systems will be indexed by the tree ω^{<ω} of finite sequences of natural numbers. Since branches through ω^{<ω} can be identified with real numbers, i.e., elements of R = ω^ω, we can associate with every real number a well-founded or ill-founded limit through the tree of models.

DEFINITION 4.1. A commuting system (M_s, π_st)_{s⊆t∈ω^{<ω}} of inner models M_s and elementary maps π_st : M_s → M_t is called an embedding normal form (ENF) for a set A ⊆ R of reals if for every p ∈ R:

(4.1) We have p ∈ A if and only if the direct limit M_p = lim_{m≤n<ω}(M_{p↾m}, π_{p↾m,p↾n}) is well-founded, and hence a transitive inner model.

This connection between inner models and reals appears attractive but it is not strong enough for the intended applications. One can prove under the assumption of a measurable cardinal that every set of reals has an embedding normal form (see Koepke (1998)). We strengthen the notion of ENF by requiring that the ill-foundedness of branches is already witnessed locally.

DEFINITION 4.2. A commuting system (M_s, π_st)_{s⊆t∈ω^{<ω}} together with a system (w_s)_{s∈ω^{<ω}}, w_s : R → Ord is called an embedding normal form with witnesses (ENFW) for a set A ⊆ R of reals if

(4.2) (M_s, π_st)_{s⊆t∈ω^{<ω}} is an embedding normal form for A,

(4.3) for every s ∈ ω^{<ω}: w_s ∈ M_s and R ⊆ M_s,

(4.4) for every s ⊆ t ∈ ω^{<ω}, s ≠ t, and p ∈ R \ A, p ⊇ t we have π_st(w_s)(p) > w_t(p).

Witnesses exist if the models are sufficiently closed; a class X is κ-closed if X^κ ⊆ X.

Figure 5. An ENFW tree.

THEOREM 4.3. If (M_s, π_st)_{s⊆t∈ω^{<ω}} is an ENF for a set A ⊆ R in which every model M_s is 2^{ℵ_0}-closed then there is a system (w_s)_{s∈ω^{<ω}} of witnesses for (M_s, π_st)_{s⊆t∈ω^{<ω}}.
Proof. For every p ∈ R \ A one can find a sequence (γ^p_s | s ⊆ p) of ordinals such that for s ⊆ t ⊆ p with s ≠ t: π_st(γ^p_s) > γ^p_t. Then define for s ∈ ω^{<ω} functions w_s : R → Ord by: w_s(p) = γ^p_s, if s ⊆ p ∈ R \ A, and w_s(p) = 0, else. If s ⊆ t ⊆ p, s ≠ t, and p ∈ R \ A then (π_st(w_s))(p) = π_st(w_s(p)) = π_st(γ^p_s) > γ^p_t = w_t(p). Q.E.D.

5. DETERMINACY FROM EMBEDDING NORMAL FORMS

Descriptive set theory studies sets arising from ordinary mathematical practice: from a logical perspective these are pointsets, i.e., subsets of the real numbers and of similar spaces, which are simply definable. In order of increasing definition complexity one considers the following pointclasses: open and closed subsets of Euclidean spaces; Borel sets; analytic sets, which are the continuous images of Borel sets; co-analytic sets, which are the complements of analytic sets; projective sets, which are obtainable from Borel sets by taking continuous images and complements finitely often. The principal aim of these studies is to extend results about the regularity properties of simple sets to more complex pointclasses.

A key notion in modern descriptive set theory is that of determinacy. The theory of infinite games considers games whose positions are finite sequences partially ordered by inclusion. Two players called player I and player II alternately try to lengthen a position by one move. Thereby, they determine a maximal path through the tree of positions. Player I’s aim is to get this path into a previously fixed winning set while player II tries to prevent this. The winning set is determined if there is a winning strategy for one of the players. By a classical result of Gale and Stewart (1953), topologically simple winning sets are determined.

We shall show that winning sets representable by an ENFW are also determined. This is done by introducing an auxiliary game G∗ which is an extension of the original game G by “side moves”. One can view the original game as the auxiliary game with “hidden” side moves. The game G∗ is determined due to its simple topological nature. The ENFW is then used to construct a winning strategy in G from a winning strategy in G∗. In the crucial case of the construction one moves to different models of the ENFW and employs the witnesses of the ENFW as optimal side moves for player I in G∗. If player II can win against the optimal moves, player II can also win the original game G where these moves are “hidden”.

DEFINITION 5.1. A tree is a nonempty set of finite sequences, T ⊆ V^{<ω}, closed under the formation of initial segments. For t ∈ V^{<ω} let |t| denote the length of t. T is partially ordered by ⊆. A path through T is a sequence p of length ≤ ω such that ∀n < ω (p↾n ∈ T); p is maximal if there is no path through T properly extending p. A maximal path through T is also called a play on T. A play p = (a0, a1, a2, a3, ...) is sometimes represented in the form

I a0 a2 ...

II a1 a3 ...

to indicate that player I makes the move a0, then player II answers a1, player I makes the move a2, etc. Let [T] denote the set of plays of T. A game G(T, A) on T is given by a set A ⊆ [T] of winning plays for player I. We say that player I wins the play p in the game G(T, A) if p ∈ A; player II wins if p ∈ [T] \ A.
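The notions of Definition 5.1 can be made concrete for finite trees, where every play is a maximal finite position. The sketch below assumes positions are Python tuples and leaves infinite plays out of scope; the function names are illustrative.

```python
def is_tree(T):
    """Nonempty set of tuples closed under initial segments."""
    return len(T) > 0 and all(t[:k] in T for t in T for k in range(len(t)))

def plays(T):
    """Maximal positions of a finite tree, i.e. its plays."""
    return {
        t for t in T
        if not any(len(u) == len(t) + 1 and u[:len(t)] == t for u in T)
    }

def winner(p, A):
    """Player I wins the play p exactly if p belongs to the winning set A."""
    return "I" if p in A else "II"
```

In a play tuple the entries at even indices are the moves of player I and those at odd indices are the moves of player II.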

The obvious question is whether one of the players possesses a winning strategy in this game. A strategy on T is a function σ : T → V such that

∀t ∈ T (t is not maximal in (T, ⊆) → t.σ(t) ∈ T).

A strategy σ : T → V is a winning strategy for player I in the game G(T, A) if ∀p ∈ [T] ((∀2n < |p| (p(2n) = σ(p↾2n))) → p ∈ A). Similarly, σ is a winning strategy for player II if ∀p ∈ [T] ((∀2n + 1 < |p| (p(2n + 1) = σ(p↾(2n + 1)))) → p ∈ [T] \ A). Player I and player II cannot both have winning strategies in G(T, A). G(T, A) is determined if one of the players has a winning strategy in G(T, A).

We are mainly interested in games on the real numbers. Here, T is the tree ω^{<ω} of finite sequences of natural numbers. We identify [T] with the set R = ω^ω of reals. A set A ⊆ R is called determined if G(A) = G(ω^{<ω}, A) is determined. Analytic (projective) determinacy is the statement that every analytic (projective) set of reals is determined. The determinacy of a pointclass has profound implications for its descriptive set theory (see Moschovakis (1980)).

Consider a set A ⊆ R which has an ENFW. We modify the game G(A) = G(ω^{<ω}, A) to an auxiliary game G∗(A) by adding side moves for player I and a system of rules such that if player I satisfies all the rules then player I has also produced a winning play for the original game G(A). Let T∗ consist of all finite sequences of the form

((a0, f0), a1, (a2, f2), a3, ..., (a_{2n}, f_{2n})), or

((a0, f0), a1, (a2, f2), a3, ..., (a_{2n}, f_{2n}), a_{2n+1})

such that the following three conditions hold:

(5.1) a_j ∈ ω, for j < 2n + 2,

(5.2) f_{2j} : R → θ, for j ≤ n, for some fixed sufficiently large ordinal θ (we shall give an adequate lower bound for θ in (5.5)), and:

(5.3) ∀x ∈ R \ A (x ⊇ (a0, ..., a_{2i+2}) → f_{2i}(x) > f_{2i+2}(x)), for all i < n.

I a0,f0 a2,f2 ...

II a1 a3 ...

Since there is no infinite descent in the ordinals, the functions f0, f2, ..., f_{2n} serve to push away the sequence (a0, a1, ...) from R \ A and into A. Player I wins the game G∗(A) if player I is able to satisfy the rules (5.1) to (5.3) in an infinite play. So we define the winning set for player I by:

A∗ ={p ∈[T ∗]|p is infinite}, G∗(A) = G(T ∗,A∗).

LEMMA 5.2. G∗(A) is determined.
Proof. Call a position a winning position for player II if player II can force a finite play starting from that position. Now assume that player II has no winning strategy in G∗(A). Then the initial position ∅ is not a winning position for player II. Whenever t ∈ T∗ is of even length 2n and is not a winning position for player II then there must be an extension t.σ(t) of t which is not a winning position for player II. This function σ is basically a strategy for player I and if player I follows σ in a play p in G∗(A), then p is infinite. Hence player I has a winning strategy in G∗(A).
Note that the above is basically the Gale–Stewart argument for the determinacy of the closed game G∗(A) where A∗ is closed in the natural topology on [T∗]. Q.E.D.
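The argument of the lemma can be imitated on a finite truncation of a game tree. The sketch below is a toy model under explicit assumptions: the tree is a finite set of tuples, a position of length `depth` stands in for an infinite play (a win for player I), and a maximal position of smaller length is a finite play (a win for player II). The names `make_solver` and `T_EXAMPLE` are hypothetical.

```python
from functools import lru_cache

def make_solver(T, depth):
    """Return (wins_II, strategy_I) for the truncated game on the finite tree T."""

    def children(t):
        return [u for u in T if len(u) == len(t) + 1 and u[:len(t)] == t]

    @lru_cache(maxsize=None)
    def wins_II(t):
        """True if player II can force a finite play from position t."""
        if len(t) >= depth:
            return False          # stand-in for an infinite play: player I wins
        kids = children(t)
        if not kids:
            return True           # maximal finite play: player II wins
        if len(t) % 2 == 0:       # player I to move
            return all(wins_II(u) for u in kids)
        return any(wins_II(u) for u in kids)   # player II to move

    def strategy_I(t):
        """A move leading to a position that is not winning for player II, if any."""
        for u in children(t):
            if not wins_II(u):
                return u[-1]
        return None

    return wins_II, strategy_I

# Example tree: after the opening move 0, player II can answer 1 and leave
# player I without a legal move; the opening move 1 avoids this.
T_EXAMPLE = {
    (), (0,), (1,), (0, 0), (0, 1), (1, 0), (1, 1),
    (0, 0, 0), (1, 0, 0), (1, 1, 0),
    (0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 0),
}
```

In this example the initial position is not winning for player II, and the strategy extracted for player I opens with the move 1, in line with the proof: player I always moves to a position that is not winning for player II.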

Assume that player I has a winning strategy σ∗ for the game G∗(A). Player I is able to turn σ∗ into a winning strategy for G(A) by “hiding” the side moves f0, f2, .... “Internally” he reacts to the moves a1, a3, ... of player II by playing a0, f0, a2, f2, ... as given by σ∗. Officially he only plays the numbers a0, a2, ... without the side moves f0, f2, .... Then the play p = (a0, a1, a2, a3, ...) is a win for player I; if not then p ∈ R \ A and rule (5.3) implies f0(p) > f2(p) > f4(p) > ..., contradicting the well-foundedness of the ordinals. Therefore hiding the functions produced by σ∗ yields a winning strategy for player I in G(A).

If player I does not have a winning strategy in G∗(A) then by the determinacy of the auxiliary game, player II must have a winning strategy in G∗(A), call it σ∗. We shall turn this into a winning strategy for player II in the game G(A). The problem is that σ∗ expects to see side moves f0, f2, f4, ... to calculate its response. To apply σ∗, player II has to guess or simulate these moves, and obviously he has to simulate them in an optimal way. These simulations will be provided by the witnesses (w_s)_{s∈ω^{<ω}} of an ENFW (M_s, π_st)_{s⊆t∈ω^{<ω}} for A.

By condition (4.4) the witnesses are descending along the ENF and this gives arbitrarily long sequences of functions satisfying the rule (5.3) in the definition of G∗(A). Player II will use the witnesses to simulate optimal moves of player I. We set:

(5.4) σ(a0) = π_{∅,a0}(σ∗)(a0, w_{a0}),
σ(a0 a1 a2) = π_{∅,a0a1a2}(σ∗)(a0, π_{a0,a0a1a2}(w_{a0}), a1, a2, w_{a0a1a2}),
σ(s) = π_{∅,s}(σ∗)(s, π_{s↾1,s}(w_{s↾1}), π_{s↾3,s}(w_{s↾3}), ..., w_s), for |s| odd.

Note that the sequence (π_{s↾1,s}(w_{s↾1}), π_{s↾3,s}(w_{s↾3}), ..., w_s) is a sequence of descending functions which lives in M_s. To view them as legal side moves the constant θ in (5.2) must have been chosen sufficiently large, e.g.,

(5.5) θ > supremum of the range of w_s, for every s ∈ ω^{<ω}.

It is then possible to apply the mapped strategy π_{∅,s}(σ∗) inside M_s.

CLAIM 5.3. σ is a winning strategy for player II in G(A). Proof. Let p = (a0,a1,a2,...) ∈ R be a play in G(A) where player II follows σ . Assume for a contradiction that p ∈ A. Then the direct limit

(M_p, π_{p↾m,p})_{m<ω} = lim_{m≤n<ω}(M_{p↾m}, π_{p↾m,p↾n}) is transitive by (4.1). p = (a0, a1, a2, ...) satisfies the equations (5.4). Applying the maps π_{p↾m,p} to the equations yields:

(5.6) a1 = π_{∅,p}(σ∗)(a0, π_{a0,p}(w_{a0})),
a3 = π_{∅,p}(σ∗)(a0, π_{a0,p}(w_{a0}), a1, a2, π_{a0a1a2,p}(w_{a0a1a2})),
a_{2n+1} = π_{∅,p}(σ∗)(p↾(2n+1), π_{p↾1,p}(w_{p↾1}), ..., π_{p↾(2n+1),p}(w_{p↾(2n+1)})), for n < ω.

The sequence of functions on the right-hand side satisfies the rule (5.3): if x ∈ R \ A and p↾(2n+3) ⊆ x then

π_{p↾(2n+1),p}(w_{p↾(2n+1)})(x) = π_{p↾(2n+3),p}(π_{p↾(2n+1),p↾(2n+3)}(w_{p↾(2n+1)})(x))

> π_{p↾(2n+3),p}(w_{p↾(2n+3)}(x))

= π_{p↾(2n+3),p}(w_{p↾(2n+3)})(x).

Therefore,

(5.7) I a0, π_{p↾1,p}(w_{p↾1}) a2, π_{p↾3,p}(w_{p↾3}) ...

II a1 a3 ...

is a play in π_{∅,p}(G∗(A)) in which player II follows the strategy π_{∅,p}(σ∗) and in which the rule (5.3) is observed.

An absoluteness argument shows that a similar play must actually exist inside the model M_p: Consider, in M_p, the set P of all finite sequences of moves in π_{∅,p}(G∗(A)) in which player II follows the strategy π_{∅,p}(σ∗). (P, ⊇) is a partial order under reverse inclusion. (P, ⊇) is ill-founded in V as witnessed by the play (5.7). By the absoluteness of well-foundedness between V and the transitive model M_p, (P, ⊇) is ill-founded in M_p. Hence, in M_p, there is an infinite play in π_{∅,p}(G∗(A)) in which player II follows the strategy π_{∅,p}(σ∗). Since π_{∅,p} : V → M_p is elementary, there is, in V, an infinite play in G∗(A) in which player II follows the strategy σ∗. But then σ∗ is not a winning strategy for player II since player II’s aim is to make plays in G∗(A) finite. Contradiction. Q.E.D.

6. INDUCED EMBEDDINGS

We have seen in the preceding section that a set of reals having an embedding normal form with witnesses possesses strong regularity properties like determinacy. This motivates the construction of such normal forms for as many sets of reals as possible. The existence of embedding normal forms can be viewed as a certain richness of the category of inner models with elementary embeddings.

An essential technique for building iteration trees and normal forms in the proof of the Martin–Steel theorem is given by the following construction in which a given elementary embedding π : M → N induces an elementary embedding π∗ : M∗ → N∗ of another inner model M∗ which is in sufficient agreement with M.

Fix a non-trivial elementary embedding π : M → N of inner models with α being the smallest ordinal moved by π. Let M∗ be an inner model such that M ∼_{α+1} M∗. We want to define π∗ : M∗ → N∗ using π. This can be done by applying the extender derived from π to the model M∗. In the present exposition, however, we shall carry out the construction as a category theoretic limit without explicit mention of extenders. We represent M∗ as a direct limit of a system whose components are elements of M∗ ∩ V_{α+1}. The system can then be lifted by applying π to each of its components.

Take I = {i ∈ M∗ | i : α → M∗ is injective} as a class of indices for a directed system. When we want to refer to the map i as opposed to the index i we write σ_i instead of i. I is partially ordered by the relation i ≤ j if and only if range(σ_i) ⊆ range(σ_j). For i ∈ I let E_i be the unique binary relation on α such that σ_i : (α, E_i) → (range(σ_i), ∈) is an isomorphism. For i ≤ j define σ_ij = σ_j^{-1} ◦ σ_i; the map σ_ij : (α, E_i) → (α, E_j) is a structural embedding, and (M∗, ∈), (σ_i)_{i∈I} is the transitive direct limit of the system S = (α, E_i, σ_ij)_{i≤j∈I}.

Note that for i ≤ j ∈ I the components E_i and σ_ij are elements of V_{α+1} ∩ M∗ = V_{α+1} ∩ M. We can thus apply π to the components: define α∗ = π(α), E∗_i = π(E_i) and σ∗_ij = π(σ_ij). By the elementarity of π, the lifted system S∗ = (α∗, E∗_i, σ∗_ij)_{i≤j∈I} is a directed commutative system; let (N∗, E∗), (σ∗_i)_{i∈I} be a direct limit of the system.

The systems S and S∗ are connected by the identity map, i.e., for i ≤ j ∈ I: (id↾α) ◦ σ_ij = σ∗_ij ◦ (id↾α). Hence the direct limits can be connected uniquely by a map π∗ : M∗ → N∗ such that for each i ∈ I and ξ ∈ α: π∗(σ_i(ξ)) = σ∗_i(ξ). Since all morphisms considered are injective and structure preserving, we have π∗ : (M∗, ∈) → (N∗, E∗) injectively. We shall show that π∗ is indeed an elementary map, and we shall later discuss conditions under which the image (N∗, E∗) is well-founded and can be taken as a transitive ∈-model of the form (N∗, E∗) = (N∗, ∈).

Figure 6. Inducing π∗ from π.

The limit of a directed system is isomorphic to the limit of any cofinal subsystem. We can thin out the systems S and S∗ to cofinal subsystems which are Σ_n-elementary for a given n: Let i ∈ I. By the Levy reflection principle choose θ ∈ Ord such that range(σ_i) ⊆ V_θ ∩ M∗ and V_θ ∩ M∗ ≺_{Σ_n} M∗. By the downward Löwenheim–Skolem theorem applied in M∗ there is X ∈ M∗ such that X ≺_{Σ_n} V_θ ∩ M∗, range(σ_i) ⊆ X, and M∗ |= card(X) = α. Choose j ∈ I such that range(σ_j) = X. This shows that the class of indices j for which σ_j : (α, E_j) → (M∗, ∈) is Σ_n-elementary is cofinal in I. The identical map π↾α = id↾α : (α, E_j) → (α∗, E∗_j) is Σ_ω-elementary by the elementarity of π. Therefore the connecting map π∗ : (M∗, ∈) → (N∗, E∗) is at least Σ_n-elementary. Since n ∈ ω is arbitrary, π∗ is fully elementary.

Let us assume until further notice that the structure (N∗, E∗) is strongly well-founded. By the Mostowski Collapsing Lemma 2.1, the structure is isomorphic to a unique transitive ∈-structure. We can thus assume without loss of generality that (N∗, E∗) is of the form (N∗, ∈) where N∗ is transitive. Then the map π∗ is called the embedding of M∗ induced by π. Since N∗ satisfies the axioms of ZFC, N∗ is an inner model.

We shall study the relations between the original map π and the map π∗ induced by π. Consider i ∈ I such that range(σ_i) is transitive. Then the map σ_i : (α, E_i) → (range(σ_i), ∈) is the Mostowski isomorphism of (α, E_i). By the elementarity of π, π(σ_i) : (α∗, E∗_i) → (range(π(σ_i)), ∈) is the Mostowski isomorphism of (α∗, E∗_i). On the other hand, σ∗_i : (α∗, E∗_i) → (N∗, ∈) is the natural embedding of (α∗, E∗_i) into the direct limit. We claim that range(σ∗_i) is transitive: Let x ∈ y ∈ range(σ∗_i). Choose ξ < α∗ such that y = σ∗_i(ξ). Choose j ≥ i and ζ < α∗ such that x = σ∗_j(ζ). x = σ∗_j(ζ) ∈ y = σ∗_i(ξ) implies that ζ E∗_j σ∗_ij(ξ). It suffices to show the following

6.1. There is ν < α∗ such that ζ = σ∗_ij(ν); then x = σ∗_j(ζ) = σ∗_j(σ∗_ij(ν)) = σ∗_i(ν) ∈ range(σ∗_i).
Proof. By assumption, range(σ_i) is transitive. This implies that range(σ_i) is an ∈-initial segment of range(σ_j). Then range(σ_ij) is an E_j-initial segment of α. This can be expressed as: ∀γ < α ∀δ < α ∃η < α (δ E_j σ_ij(γ) → δ = σ_ij(η)). We apply the elementary map π to this fact: ∀γ < α∗ ∀δ < α∗ ∃η < α∗ (δ E∗_j σ∗_ij(γ) → δ = σ∗_ij(η)). Then the claim follows with γ = ξ, δ = ζ, η = ν. Q.E.D.

Since range(σ∗_i) is transitive, σ∗_i is the Mostowski collapse of (α∗, E∗_i). Since the Mostowski collapse is uniquely determined, we have:

6.2. σ∗_i = π(σ_i).

6.3. π ↾ (H_{≤α})^{M∗} = π∗ ↾ (H_{≤α})^{M∗}.

Proof. Let x ∈ (H_{≤α})^{M∗}. Choose i ∈ I such that range(σ_i) is transitive and σ_i(0) = x. Then by 6.2, we have that π(x) = π(σ_i(0)) = π(σ_i)(0) = σ∗_i(0) = π∗(σ_i(0)) = π∗(x). Q.E.D.

6.4. N ∼_{α∗} N∗.
Proof. The ordinal α is strongly inaccessible in M. Thus V_α ∩ M ∈ (H_{≤α})^{M∗}. V_{α∗} ∩ N = π(V_α ∩ M) = π∗(V_α ∩ M∗) = V_{α∗} ∩ N∗. Q.E.D.

The agreement of inner models below some level is crucial for many constructions in the category of inner models. The large cardinal notions considered in the next section mainly involve postulates on agreement. We now discuss conditions under which the image structure (N∗, E∗) is well-founded. The relation E∗ is set-like:

6.5. If z ∈ N∗, then {x ∈ N∗ | x E∗ z} ∈ V.
Proof. Let x E∗ z ∈ N∗. Choose i ∈ I and ζ < α∗ such that z = σ∗_i(ζ). We may assume that 0 ∈ range(σ_i). Choose η ∈ Ord such that range(σ_i) ⊆ V_η. Choose k ∈ I, k ≥ i and ξ < α∗ such that x = σ∗_k(ξ). Define j ∈ I by j(γ) = k(γ) if k(γ) ∈ V_η, and j(γ) = 0 else. Then i ≤ j ≤ k. By construction, the following formula holds:

∀γ<α∀δ<α(σk(γ ) ∈ σi(δ) → σk(γ ) = σj (γ )).

Applying the inverse of σk to this formula yields:

∀γ<α∀δ<α(γEk σik(δ) → γ = σjk(γ )).

We now apply π to get:

∀γ < α∗ ∀δ < α∗ (γ E∗_k σ∗_ik(δ) → γ = σ∗_jk(γ)).

And finally, we apply σ∗_k, and have: ∀γ < α∗ ∀δ < α∗ (σ∗_k(γ) E∗ σ∗_i(δ) → σ∗_k(γ) = σ∗_j(γ)). Since x = σ∗_k(ξ) E∗ σ∗_i(ζ) we have x = σ∗_k(ξ) = σ∗_j(ξ). Hence {x ∈ N∗ | x E∗ z} ⊆ {σ∗_j(ξ) | j ∈ I, range(σ_j) ⊆ V_η, ξ < α∗}, which is a set. Q.E.D.

6.6. If M∗ is countably closed, (M∗)^ω ⊆ M∗, then (N∗, E∗) is strongly well-founded.

Proof. By the previous claim we only have to check well-foundedness. Assume E∗ is ill-founded. Then there are indices i_n ∈ I and ordinals ξ_n < α∗ such that for n < ω: σ∗_{i_{n+1}}(ξ_{n+1}) E∗ σ∗_{i_n}(ξ_n). We may assume that i_0 ≤ i_1 ≤ .... Then for n < ω: ξ_{n+1} E∗_{i_{n+1}} σ∗_{i_n i_{n+1}}(ξ_n). This means that the direct limit of the system ℰ∗ = (α∗, E∗_{i_m}, σ∗_{i_m i_n})_{m≤n<ω} is ill-founded.
Define the system ℰ = (α, E_{i_m}, σ_{i_m i_n})_{m≤n<ω}. This system can be embedded into (M∗, ∈) and hence is well-founded. Since M∗ is countably closed, ℰ ∈ M∗. Since M ∼_{α+1} M∗, we have ℰ ∈ M. ℰ∗ = π(ℰ), and by the elementarity of π, ℰ∗ is well-founded in N and hence in V. Contradiction. Q.E.D.

The final claim shows that a given degree of closure is preserved by the formation of induced embeddings.

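6.7. Assume that in our situation $M^*$ is $\eta$-closed and $(\alpha^*)^{\eta} \subseteq N$ where $\eta < \alpha$. Then $N^*$ is $\eta$-closed.

Proof. Consider an $\eta$-sequence $(z_\delta)_{\delta < \eta} = (\sigma^*_{i_\delta}(\xi_\delta))_{\delta < \eta}$ from $N^*$ with indices $i_\delta$ and ordinals $\xi_\delta < \alpha^*$. By the $\eta$-closure of $M^*$ we can choose an index $j \in I$ such that $\forall \delta < \eta\ i_\delta$ [...]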

7. LARGE CARDINALS AND THE CONSTRUCTION OF ENFS

The initial assumptions for the construction of systems of elementary embeddings of inner models are large cardinal axioms. It is now customary to formulate large cardinal notions like measurable cardinals and strong cardinals in terms of elementary embeddings of inner models.³ This has proved to be a strong unifying principle in large cardinal theory (see the survey article by Kanamori and Magidor (1978)). The strength of a large cardinal assumption is expressed by the degree of agreement between the sources and the targets of the postulated embeddings. We have seen in the previous chapter that such agreement allows the construction of new embeddings out of given ones.

All elementary embeddings to be considered will be taken from the family of internal maps as defined in Section 4. So the definition of this family is part of the definitions of the subsequent large cardinal notions. There are, however, combinatorial equivalences which do not depend on particular families of inner models and embeddings.

Since embeddings induced by π can move models different from the model where π is defined, one is able to build complicated branching systems of models out of large cardinal assumptions. The constructions require careful control of the agreement among models. In the proof of the Martin–Steel theorem, ENFWs for $\Pi^1_n$-sets are constructed by recursion on n. Woodin cardinals provide the exact large cardinal strength for the successor case of the recursion.

We list some relevant large cardinal notions in order of strength. Using 6.7 we require sufficient closure of the image models so that the induced embeddings possess well-founded image models (6.6) and witnesses (Theorem 4.3). One can prove that the definitions are equivalent to the same formulations without closure requirements. A measurable cardinal, the weakest notion considered here, is the obvious assumption for making the structure of the category of inner models non-trivial.

DEFINITION 7.1. A cardinal α is measurable if there is an elementary embedding π : V → M with critical point α into an α-closed inner model M.

A measurable cardinal is strongly inaccessible; hence the image model M in this definition is certainly $2^{\aleph_0}$-closed.
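The closure remark can be unpacked in two steps. The following is only a brief sketch; writing $M^{\lambda} \subseteq M$ for "M is λ-closed" follows the notation of 6.6 and is an assumption about the intended reading of "α-closed" in Definition 7.1.

```latex
% Sketch: why an alpha-closed image model is 2^{aleph_0}-closed
% when alpha is measurable (hence strongly inaccessible).
\begin{align*}
&\alpha \text{ strongly inaccessible}
   \;\Longrightarrow\; 2^{\aleph_0} < \alpha, \\
&M^{\alpha} \subseteq M \ \text{ and } \ \lambda \le \alpha
   \;\Longrightarrow\; M^{\lambda} \subseteq M .
\end{align*}
```

For the second implication, a λ-sequence of elements of M can be padded with the value ∅ to an α-sequence, which lies in M by α-closure; its restriction to λ is then computed inside M. Taking $\lambda = 2^{\aleph_0}$ gives the stated closure.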

DEFINITION 7.2. A cardinal α is strong (Gaifman 1974) if for every x ∈ V there is an elementary embedding π : V → M with critical point α into an α-closed inner model M such that x ∈ M.

DEFINITION 7.3. Let α < δ and p be a finite sequence of parameters. Then α is strong in p up to δ if for all η < δ there is an elementary embedding π : V → M with critical point α into an α-closed inner model M such that $(V, p) \sim_{\eta} (M, \pi(p))$.

DEFINITION 7.4. A cardinal δ is a Woodin cardinal (Shelah and Woodin 1990) if for all finite sequences p of parameters there is α < δ which is strong in p up to δ.
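For comparison, Definitions 7.1 to 7.4 can be written side by side in symbols. This is a hedged restatement rather than new content: $\operatorname{crit}(\pi)$ for the critical point and $M^{\alpha} \subseteq M$ for α-closure are notational assumptions, π is understood to range over elementary embeddings from the family of internal maps, and M over inner models.

```latex
% Side-by-side restatement of Definitions 7.1-7.4 (notation assumed, see lead-in).
\begin{align*}
\alpha \text{ measurable}
  &\iff \exists\, \pi : V \to M \
     \bigl(\operatorname{crit}(\pi) = \alpha \wedge M^{\alpha} \subseteq M\bigr), \\
\alpha \text{ strong}
  &\iff \forall x \ \exists\, \pi : V \to M \
     \bigl(\operatorname{crit}(\pi) = \alpha \wedge M^{\alpha} \subseteq M
           \wedge x \in M\bigr), \\
\alpha \text{ strong in } p \text{ up to } \delta
  &\iff \forall \eta < \delta \ \exists\, \pi : V \to M \
     \bigl(\operatorname{crit}(\pi) = \alpha \wedge M^{\alpha} \subseteq M
           \wedge (V, p) \sim_{\eta} (M, \pi(p))\bigr), \\
\delta \text{ Woodin}
  &\iff \forall p \ \exists\, \alpha < \delta \
     \bigl(\alpha \text{ strong in } p \text{ up to } \delta\bigr).
\end{align*}
```

Read this way, the notions differ only in the last conjunct and in the outer quantifiers: measurability asks for one embedding, strength for embeddings capturing every set x, and the Woodin property for cardinals below δ whose embeddings also preserve a given parameter p up to each level η < δ.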

The growth of strength from measurable to Woodin cardinals is formidable: below each strong cardinal there are cofinally many measurable cardinals; if δ is a Woodin cardinal then $V_\delta \models$ “there are cofinally many strong cardinals”. Woodin cardinals imply the existence of many elementary embeddings with favourable preservation properties. We shall indicate in the following how this can be used in the construction of ENFWs for arbitrary projective sets.

The projective sets are obtained recursively from open sets in some product space $\mathbb{R}^n$ by finitely many complementations and projections. Provided there is a measurable cardinal one can construct ENFWs for closed sets by the iteration methods of chapter 3. The recursion step requires showing: if a set $A \subseteq \mathbb{R}^{n+1}$ has a sufficiently closed ENF then the complement $\mathbb{R}^{n+1} \setminus A$ and the projection $\{a \in \mathbb{R}^n \mid \exists b \in \mathbb{R}\ (a, b) \in A\}$ both have sufficiently closed ENFs. The main idea of the proof is already present in the simpler case of complements. For simplicity we shall consider 1-dimensional sets, i.e., n = 0.

So assume that the set $A \subseteq \mathbb{R}$ has an ENFW $\mathcal{N} = (N_s, \pi_{st})_{s \subseteq t \in \omega^{<\omega}}$ together with witnesses $(w_s)_{s \in \omega^{<\omega}}$. The aim is to construct an ENF $\mathcal{M} = (M_s, \sigma_{st})_{s \subseteq t \in \omega^{<\omega}}$ for $\mathbb{R} \setminus A$. Simultaneously one defines an auxiliary system $\mathcal{M}^* = (M_s^*, \sigma_{st}^*)_{s \subseteq t \in \omega^{<\omega}}$ which reflects many properties of the given system $\mathcal{N}$ and in particular the well-foundedness of branches.
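The recursion step described above can be displayed as a single implication with two conclusions. This is only a schematic restatement of the text; "sufficiently closed" is left informal exactly as in the prose.

```latex
% Schematic form of the recursion step for projective sets.
\[
A \subseteq \mathbb{R}^{n+1} \text{ has a sufficiently closed ENF}
\;\Longrightarrow\;
\begin{cases}
\mathbb{R}^{n+1} \setminus A
   \text{ has a sufficiently closed ENF,} \\[2pt]
\{a \in \mathbb{R}^{n} \mid \exists b \in \mathbb{R}\ (a, b) \in A\}
   \text{ has a sufficiently closed ENF.}
\end{cases}
\]
```

Since every projective set arises from the base case by finitely many applications of these two operations, iterating the implication reaches each projective set after finitely many steps.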

Figure 7. Constructing an ENF $\mathcal{M}$ from $\mathcal{N}$. The two highlighted branches constitute an alternating chain.

We state the crucial properties of this diagram:

FACT 7.5. For all $p \in \mathbb{R}$ we have: $N_p$ is well-founded if and only if $M^*_p$ is well-founded, where $N_p$ resp. $M^*_p$ denote the direct limits of the branches corresponding to the real p in $\mathcal{N}$ resp. $\mathcal{M}^*$.

FACT 7.6. The systems $\mathcal{M}$ and $\mathcal{M}^*$ are complementary in the sense that for all $p \in \mathbb{R}$ we have: $M_p$ is well-founded if and only if $M^*_p$ is ill-founded.

Granting these two properties we have an embedding normal form representation of the complement of A as required.

FACT 7.7. $p \in \mathbb{R} \setminus A$ if and only if $N_p$ is ill-founded if and only if $M^*_p$ is ill-founded if and only if $M_p$ is well-founded.
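The chain of equivalences in Fact 7.7 can be annotated with the justification for each step. The first line assumes the convention, defined outside this section, that an ENF $\mathcal{N}$ for A represents A by well-founded branches, i.e. $p \in A$ if and only if $N_p$ is well-founded.

```latex
% Annotated derivation of Fact 7.7 from Facts 7.5 and 7.6.
\begin{align*}
p \in \mathbb{R} \setminus A
  &\iff N_p \text{ is ill-founded}
     && \text{($\mathcal{N}$ is an ENF for $A$; assumed convention)} \\
  &\iff M^{*}_p \text{ is ill-founded}
     && \text{(Fact 7.5, contrapositive form)} \\
  &\iff M_p \text{ is well-founded}
     && \text{(Fact 7.6)}
\end{align*}
```

The last line is the statement that $\mathcal{M}$ represents $\mathbb{R} \setminus A$ by well-founded branches, which is what an ENF for the complement has to do.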

In conclusion we see that under appropriate initial assumptions the category of inner models and elementary embeddings exhibits rich structural properties which can be used to analyse situations in descriptive set theory.

8. APPENDIX: MORE TECHNICAL DETAILS

For a reader who is keen on understanding the whole proof of the Martin–Steel theorem we shall now sketch the crucial part of the construction in terms of the category of inner models. For a complete argument along these lines but formulated in the different language of extenders we refer to Koepke (1998).

Fact 7.6 is obtained by constructing the branches $(M_{p \restriction n})_{n<\omega}$ and $(M^*_{p \restriction n})_{n<\omega}$ through $\mathcal{M}$ and $\mathcal{M}^*$ as complementary branches of an alternating chain. In our particular situation this amounts to the following three requirements:

(8.1) $M_\emptyset = M^*_\emptyset = V$.

(8.2) For every $n < \omega$, the embedding $\sigma_{p \restriction n,\, p \restriction n+1} : M_{p \restriction n} \to M_{p \restriction n+1}$ is induced by a map which is internal in $M^*_{p \restriction n}$.

(8.3) For every $n < \omega$, the embedding $\sigma^*_{p \restriction n,\, p \restriction n+1} : M^*_{p \restriction n} \to M^*_{p \restriction n+1}$ is induced by a map which is internal in $M_{p \restriction n+1}$.

Schematically the two branches can be represented as in Figure 8 where the broken arrows indicate that an internal map induces an elementary embedding of another model. An alternating chain can also be seen as a linear construction

$$M_\emptyset,\ M_{p \restriction 1},\ M^*_{p \restriction 1},\ M_{p \restriction 2},\ M^*_{p \restriction 2},\ \ldots$$

Figure 8. An alternating chain.

At each stage of the construction an internal map is chosen and used to induce an elementary map of an earlier model in the sequence; the resulting image is put as the next model in the sequence. This is a generalization of the iteration process studied in Section 4. The generalization retains some traces of the construction of a well-founded direct limit, and by a crucial lemma of Martin and Steel (1989), the alternating chain has at least one well-founded branch.

If we can find functions $w^*_{p \restriction 0}, w^*_{p \restriction 1}, \ldots$ in $M^*_{p \restriction 0}, M^*_{p \restriction 1}, \ldots$ which behave like the witnesses $w_{p \restriction 0}, w_{p \restriction 1}, \ldots$ of the ENF for A we have:

If $N_p$ is ill-founded as witnessed by $w_{p \restriction 0}, w_{p \restriction 1}, \ldots$, then $M^*_p$ is ill-founded as witnessed by $w^*_{p \restriction 0}, w^*_{p \restriction 1}, \ldots$ and then $M_p$ is well-founded by the above-mentioned lemma of Martin and Steel.

The case when Np is well-founded is treated by a different argument.

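The construction of the alternating chain requires a high degree of agreement between structures on both branches since one uses the method of induced embeddings. Let us state some properties needed in the recursive construction of the alternating chain.

The construction takes place under the assumption of a sufficiently big Woodin cardinal δ. In the course of the construction δ yields a sequence $\kappa^*_0 < \kappa_0 < \kappa^*_1 < \kappa_1 < \ldots < \delta$ of large cardinals such that:

(8.4) $N_{p \restriction n} \models$ “$M_{p \restriction n} \models \kappa^*_n$ is strong in $\pi_{p \restriction 0,\, p \restriction n}(w_{p \restriction 0}), \ldots, \pi_{p \restriction n,\, p \restriction n}(w_{p \restriction n})$ up to δ”,

(8.5) $N_{p \restriction n} \models$ “$M_{p \restriction n} \sim_{\kappa^*_n + 1} M^*_{p \restriction n}$”, and

(8.6) $N_{p \restriction n} \models$ “$(M_{p \restriction n}, \pi_{p \restriction 0,\, p \restriction n}(w_{p \restriction 0}), \ldots, \pi_{p \restriction n,\, p \restriction n}(w_{p \restriction n})) \sim_{\kappa^*_n} (M^*_{p \restriction n}, \sigma^*_{p \restriction 0,\, p \restriction n}(w^*_{p \restriction 0}), \ldots, \sigma^*_{p \restriction n,\, p \restriction n}(w^*_{p \restriction n}))$”.

Note that the facts about the alternating chain are stated in $N_{p \restriction n}$ where the witnessing parameters $\pi_{p \restriction 0,\, p \restriction n}(w_{p \restriction 0}), \ldots, \pi_{p \restriction n,\, p \restriction n}(w_{p \restriction n})$ are living.

To continue we have to incorporate the witness $w_{p \restriction n+1}$ into the construction. So we lift properties (8.4) to (8.6) to $N_{p \restriction n+1}$ by the elementary embedding $\pi_{p \restriction n,\, p \restriction n+1}$, and get:

(8.7) $N_{p \restriction n+1} \models$ “$M_{p \restriction n} \models \kappa^*_n$ is strong in $\pi_{p \restriction 0,\, p \restriction n+1}(w_{p \restriction 0}), \ldots, \pi_{p \restriction n,\, p \restriction n+1}(w_{p \restriction n})$ up to δ”,

(8.8) $N_{p \restriction n+1} \models$ “$M_{p \restriction n} \sim_{\kappa^*_n + 1} M^*_{p \restriction n}$”, and

(8.9) $N_{p \restriction n+1} \models$ “$(M_{p \restriction n}, \pi_{p \restriction 0,\, p \restriction n+1}(w_{p \restriction 0}), \ldots, \pi_{p \restriction n,\, p \restriction n+1}(w_{p \restriction n})) \sim_{\kappa^*_n} (M^*_{p \restriction n}, \sigma^*_{p \restriction 0,\, p \restriction n}(w^*_{p \restriction 0}), \ldots, \sigma^*_{p \restriction n,\, p \restriction n}(w^*_{p \restriction n}))$”.

We consider the initial part of the alternating chain to be given by a term which can be interpreted in $N_{p \restriction n}$ and in $N_{p \restriction n+1}$.

Within $N_{p \restriction n+1}$ there is a map internal in $M_{p \restriction n}$ with critical point $\kappa^*_n$ which can be applied to $M^*_{p \restriction n}$ to yield $\sigma^*_{p \restriction n,\, p \restriction n+1} : M^*_{p \restriction n} \to M^*_{p \restriction n+1}$. The internal map can be chosen strong enough so that there are $\kappa_n > \kappa^*_n$ and $w^*_{p \restriction n+1}$ with the following properties:

(8.10) $N_{p \restriction n+1} \models$ “$M^*_{p \restriction n+1} \models \kappa_n$ is strong in $\sigma^*_{p \restriction 0,\, p \restriction n+1}(w^*_{p \restriction 0}), \ldots, \sigma^*_{p \restriction n,\, p \restriction n+1}(w^*_{p \restriction n}), w^*_{p \restriction n+1}$ up to δ”,

(8.11) $N_{p \restriction n+1} \models$ “$M^*_{p \restriction n+1} \sim_{\kappa_n + 1} M_{p \restriction n}$”, and

(8.12) $N_{p \restriction n+1} \models$ “$(M^*_{p \restriction n+1}, \sigma^*_{p \restriction 0,\, p \restriction n+1}(w^*_{p \restriction 0}), \ldots, \sigma^*_{p \restriction n,\, p \restriction n+1}(w^*_{p \restriction n}), w^*_{p \restriction n+1}) \sim_{\kappa_n} (M_{p \restriction n}, \pi_{p \restriction 0,\, p \restriction n+1}(w_{p \restriction 0}), \ldots, \pi_{p \restriction n,\, p \restriction n+1}(w_{p \restriction n}), w_{p \restriction n+1})$”.

As above one can find an internal map in $M^*_{p \restriction n+1}$ which induces a map $\sigma_{p \restriction n,\, p \restriction n+1} : M_{p \restriction n} \to M_{p \restriction n+1}$ and a $\kappa^*_{n+1} > \kappa_n$ such that:

(8.13) $N_{p \restriction n+1} \models$ “$M_{p \restriction n+1} \models \kappa^*_{n+1}$ is strong in $\pi_{p \restriction 0,\, p \restriction n+1}(w_{p \restriction 0}), \ldots, \pi_{p \restriction n,\, p \restriction n+1}(w_{p \restriction n}), w_{p \restriction n+1}$ up to δ”,

(8.14) $N_{p \restriction n+1} \models$ “$M_{p \restriction n+1} \sim_{\kappa^*_{n+1} + 1} M^*_{p \restriction n+1}$”, and

(8.15) $N_{p \restriction n+1} \models$ “$(M_{p \restriction n+1}, \pi_{p \restriction 0,\, p \restriction n+1}(w_{p \restriction 0}), \ldots, \pi_{p \restriction n,\, p \restriction n+1}(w_{p \restriction n}), w_{p \restriction n+1}) \sim_{\kappa^*_{n+1}} (M^*_{p \restriction n+1}, \sigma^*_{p \restriction 0,\, p \restriction n+1}(w^*_{p \restriction 0}), \ldots, \sigma^*_{p \restriction n,\, p \restriction n+1}(w^*_{p \restriction n}), w^*_{p \restriction n+1})$”.

This corresponds to the initial situation (8.4) to (8.6) and shows that the construction can be continued. There are many subtle points to be arranged to make this construction work, of which we mention two: the construction has to be local, i.e., the definition of $M_{p \restriction n}$ and of $M^*_{p \restriction n}$ should only depend on $p \restriction n$ and not on all of p; all structures should be sufficiently closed so that the questions of well-foundedness of induced images and of existence of witnesses are resolved.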
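One round of the recursion can be summarized as a chain of four blocks of properties. The display below is only a reading aid under the assumption that the first induced embedding extends the starred branch and the second the unstarred one; the arrow labels are informal descriptions, not notation from the text.

```latex
% One round of the alternating-chain construction, by blocks of properties.
\[
\underbrace{(8.4)\text{--}(8.6)}_{\text{stated in } N_{p \restriction n}}
\;\xrightarrow{\ \text{lift by } \pi_{p \restriction n,\, p \restriction n+1}\ }\;
\underbrace{(8.7)\text{--}(8.9)}_{\text{stated in } N_{p \restriction n+1}}
\;\xrightarrow{\ \text{extend } \mathcal{M}^{*}\ }\;
(8.10)\text{--}(8.12)
\;\xrightarrow{\ \text{extend } \mathcal{M}\ }\;
\underbrace{(8.13)\text{--}(8.15)}_{\text{(8.4)--(8.6) with } n+1}
\]
```

The point of the last block is that it has exactly the shape of the first with n replaced by n + 1, which is what allows the recursion to proceed.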

NOTES

1 So let us cast aside the cosmocentric superstition just like we cast aside the geocentric and anthropocentric superstition before; let us realize that there are myriads of cosmic worlds spun into the chaos – each of them appearing to its inhabitants as the sole and only real world and misleading them to assign its qualitative and particular characteristics to the transcendental core of the world. But this core escapes every bond however loose and keeps its freedom to be restricted to a cosmic appearance in infinitely varied ways. It allows the coexistence of all these appearances which are contained in its universality as particular possibilities and as conceptually defined subsets. Indeed it is nothing else than just this coexistence and thus transcendent for a particular appearance which has a closed realm of immanence in itself. (Translation P.K.)

2 Alternatively one could also work in a class theoretic system and study classes which are inner models.

3 The notion of a “flipping property” allows one to characterize not only embedding cardinals but also combinatorial large cardinal notions. See Abramson et al. (1977), Barbanel (1989), Di Prisco and Marek (1985) and Di Prisco and Zwicker (1980).

REFERENCES

Abramson, F. G., L. A. Harrington, E. M. Kleinberg, and W. S. Zwicker: 1977, ‘Flipping Properties: A Unifying Thread in the Theory of Large Cardinals’, Annals of Mathematical Logic 12, 25–58.
Barbanel, J. B.: 1989, ‘Flipping Properties and Huge Cardinals’, Fundamenta Mathematicae 132, 171–188.
Cohen, P.: 1963, ‘The Independence of the Continuum Hypothesis’, Proceedings of the National Academy of Sciences of the United States of America 50, 1143–1148.
Devlin, K. J.: 1984, Constructibility, Berlin [Perspectives in Mathematical Logic].
Di Prisco, C. A. and W. Marek: 1985, ‘Some Aspects of the Theory of Large Cardinals’, in L. P. de Alcantara (ed.), Mathematical Logic and Formal Systems, A Collection of Papers in Honor of Newton C. A. da Costa, New York, pp. 87–139. [Lecture Notes in Pure and Applied Mathematics 94]
Di Prisco, C. A. and W. S. Zwicker: 1980, ‘Flipping Properties and Supercompact Cardinals’, Fundamenta Mathematicae 109, 31–36.
Dougherty, R. and T. Jech: 1997, ‘Finite Left-distributive Algebras and Embedding Algebras’, Advances in Mathematics 130, 201–291.
Gaifman, H.: 1974, ‘Elementary Embeddings of Set Theory and Certain Subtheories’, in T. Jech (ed.), Axiomatic Set Theory, Proceedings of the Symposium in Pure Mathematics of the American Mathematical Society, held at the University of California, Los Angeles, Calif., July 10–August 5, 1967, Providence [Proceedings of Symposia in Pure Mathematics, XIII, Part II], 33–101.
Gale, D. and F. M. Stewart: 1953, ‘Infinite Games with Perfect Information’, in H. W. Kuhn and A. W. Tucker (eds.), Contributions to the Theory of Games II, Princeton [Annals of Mathematical Studies 28], 245–266.
Gödel, K.: 1931, ‘Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I’, Monatshefte für Mathematik und Physik 38, 173–198.
Gödel, K.: 1938, ‘The Consistency of the Axiom of Choice and of the Generalized Continuum-hypothesis’, Proceedings of the National Academy of Sciences of the United States of America 24, 556–557.
Hausdorff, F. (Paul Mongré): 1898, Das Chaos in kosmischer Auslese – Ein erkenntniskritischer Versuch, Leipzig.
Jech, T.: 1978, Set Theory, San Diego.
Kanamori, A.: 1994, The Higher Infinite, Large Cardinals in Set Theory from Their Beginnings, Berlin [Perspectives in Mathematical Logic].
Kanamori, A. and M. Magidor: 1978, ‘The Evolution of Large Cardinal Axioms in Set Theory’, in G. H. Müller and D. S. Scott (eds.), Higher Set Theory, Proceedings, Oberwolfach, Germany, April 13–23, 1977, Berlin [Lecture Notes in Mathematics 669], 99–275.
Koepke, P.: 1996, ‘Embedding Normal Forms and $\Pi^1_1$-determinacy’, in W. Hodges, M. Hyland, C. Steinhorn and J. K. Truss (eds), Logic: From Foundations to Applications, European Logic Colloquium, Keele, UK, July 20–29, 1993, Oxford, 215–224.
Koepke, P.: 1998, ‘Extenders, Embedding Normal Forms, and the Martin–Steel-theorem’, Journal of Symbolic Logic 63, 1137–1176.
Laver, R.: 1992, ‘The Left Distributive Law and the Freeness of an Algebra of Elementary Embeddings’, Advances in Mathematics 91, 209–231.
Laver, R.: 1997, ‘Implications between Strong Large Cardinal Axioms’, Annals of Pure and Applied Logic 90, 79–90.
Martin, D. A.: 1970, ‘Measurable Cardinals and Analytic Games’, Fundamenta Mathematicae 66, 287–291.
Martin, D. A. and J. R. Steel: 1989, ‘A Proof of Projective Determinacy’, Journal of the American Mathematical Society 2, 71–125.
Moschovakis, Y. N.: 1980, Descriptive Set Theory, Amsterdam [Studies in Logic and the Foundations of Mathematics 100].
Scott, D.: 1961, ‘Measurable Cardinals and Constructible Sets’, Bulletin de l’Académie Polonaise des Sciences, Série des Sciences Mathématiques, Astronomiques et Physiques 9, 521–524.
Shelah, S. and W. H. Woodin: 1990, ‘Large Cardinals Imply that Every Reasonably Definable Set of Reals is Lebesgue Measurable’, Israel Journal of Mathematics 70, 381–394.
Tarski, A.: 1935, ‘Der Wahrheitsbegriff in den formalisierten Sprachen’, Studia Philosophica 1, 261–405.
Ulam, S.: 1930, ‘Zur Maßtheorie in der allgemeinen Mengenlehre’, Fundamenta Mathematicae 16, 140–150.
von Neumann, J.: 1923, ‘Zur Einführung der transfiniten Zahlen’, Acta Universitatis Szegediensis, Sectio Scientiarum Mathematicarum 1, 199–208.
von Neumann, J.: 1925, ‘Eine Axiomatisierung der Mengenlehre’, Journal für die reine und angewandte Mathematik 154, 219–240.
Zermelo, E.: 1904, ‘Beweis, dass jede Menge wohlgeordnet werden kann (Aus einem an Herrn Hilbert gerichteten Briefe)’, Mathematische Annalen 59, 139–141.
Zermelo, E.: 1930, ‘Über Grenzzahlen und Mengenbereiche: Neue Untersuchungen über die Grundlagen der Mengenlehre’, Fundamenta Mathematicae 16, 29–47.

Mathematisches Institut,
Rheinische Friedrich-Wilhelms-Universität Bonn,
Beringstraße 1, D–53115 Bonn
Germany

Current Address:
Centre de Recerca Matemàtica,
Institut d’Estudis Catalans,
Apartat 50, E-08193 Bellaterra
Spain

E-mail: [email protected]