Formally Verified Mathematics

contributed articles

doi:10.1145/2591012 “The development of mathematics With the help of computational proof toward greater precision has led, as is well known, to the formalization of assistants, formal verification could become large tracts of it, so one can prove any the new standard for rigor in mathematics. theorem using nothing but a few mechanical rules. The most comprehen- By Jeremy Avigad and John Harrison sive formal systems that have been set up hitherto are the system of Principia Mathematica on the one hand and the Zermelo-Fraenkel axiom system of set theory (further developed by J. von Formally Neumann) on the other. These two systems are so comprehensive that in them all methods of proof used today in mathematics are formalized, that Verified is, reduced to a few axioms and rules of inference. One might therefore conjecture that these axioms and rules of inference are sufficient to decide any Mathematics mathematical question that can at all be formally expressed in these systems. It will be shown below that this is not the case…”4 Gödel was right to claim the mathematics of his day could generally be formalized in axiomatic set theory and type theory, and these have held up, to today, as remarkably robust foun- From the point of view of the foundations of dations for mathematics. Indeed, set- mathematics, one of the most significant advances theoretic language is now ubiquitous, th and most mathematicians take the in mathematical logic around the turn of the 20 language and axioms of set theory to century was the realization that ordinary mathematical underwrite their arguments, in the sense that any ambiguities in a claim arguments can be represented in formal axiomatic or proof could, in principle, be elimi- systems in such a way their correctness can be verified nated by spelling out the details in mechanically, at least in principle. Gottlob Frege set-theoretic terms. In the mid-1930s, a group of mathematicians writing presented such a formal system in the first volume under the pen name Nicolas Bourba- of his Grundgesetze der Arithmetik, published in 1893, ki adopted set theory as the nominal though in 1903 Bertrand Russell showed the system foundation for a series of influential treatises aiming to provide a self-con- to be inconsistent. Subsequent foundational systems include the ramified type theory of Russell and Alfred key insights North Whitehead’s Principia Mathematica, published Among the sciences, mathematics is distinguished by its precise language in three volumes from 1910 to 1913; Ernst Zermelo’s and clear rules of argumentation. axiomatic set theory of 1908, later extended by T his fact makes it possible to model mathematical proofs as Abraham Fraenkel; and Alonzo Church’s simple formal axiomatic derivations. type theory of 1940. When Kurt Gödel presented C omputational proof assistants make his celebrated incompleteness theorems in 1931, it possible to check the correctness of these derivations, thereby increasing he began with the following assessment: the reliability of mathematical claims.

66 communications of the acm | april 2014 | vol. 57 | no. 4 tained, rigorous presentation of the of a time when any disputants could velopment of the deductive method, core branches of mathematics: translate their disagreement into the as exemplified by Euclid’s Elements “…the correctness of a mathemati- characteristica and say to each other of Geometry. Starting from axioms cal text is verified by comparing it, “calculemus” (“let us calculate”). In the and postulates, a mathematical proof more or less explicitly, with the rules of 20th century, however, even Bourbaki proceeds by a chain of incontrovert- a formalized language.”3 conceded complete formalization was ible logical steps to its conclusion. From this standpoint, the contem- an unattainable ideal: Through the ages, the method of the porary informal practice of writing a “…the tiniest proof at the beginning Elements was held to represent the mathematical proof is an approxima- of the Theory of Sets would already re- paradigm of rigorous argumentation to that ideal, whereby the task of quire several hundreds of signs for its tion, to mathematicians, scientists, a referee is to exercise professional complete formalization… formalized and philosophers alike. Its appeal is judgment as to whether the proof mathematics cannot in practice be eloquently conveyed in John Aubrey’s could be expressed, in principle, in a written down in full… We shall there- short biography1 of the philosopher way that conforms to the rules. The fore very quickly abandon formalized Thomas Hobbes (1588–1697), who mathematician Saunders Mac Lane mathematics.”3 made his first serious contact with put it as follows: Due to developments in computer mathematics at the age of 40: “A Mathematical proof is rigorous science over the past few decades, it is “Being in a Gentleman’s Library, when it is (or could be) written out in now possible to achieve complete for- Euclid’s Elements lay open, and ’twas the first-order predicate language L(∈) malization in practice. Working with the 47 El. libri 1 [Pythagoras’s Theo- as a sequence of inferences from the “computational proof assistants,” us- rem]. He read the proposition. By axioms ZFC, each inference made ac- ers are able to verify substantial mathe- G—, sayd he (he would now and then cording to one of the stated rules… matical theorems, constructing formal sweare an emphaticall Oath by way When a proof is in doubt, its repair is axiomatic derivations of remarkable of emphasis) this is impossible! So he usually just a partial approximation to complexity. Our goal in this article is to reads the Demonstration of it, which the fully formal version.”15 describe the current technology and its referred him back to such a Proposi- This point of view is to some ex- motivations, survey the state of the art, tion; which proposition he read. That tent anticipated by the 17th century highlight some recent advances, and referred him back to another, which philosopher Gottfried Leibniz, who discuss prospects for the future. he also read. Et sic deinceps [and so called for development of a universal on] that at last he was demonstratively language (characteristica universalis) Mathematical Rigor convinced of that trueth. This made a p er s u . c om in which anything can be expressed The notion of proof lies at the heart him in love with Geometry.”1 and a calculus of reasoning (calculus of mathematics. Although early re- The encounter turned Hobbes into p y of wall s ratiocinator) for deciding the truth of cords of measurement and numeric an enthusiastic amateur geometer assertions expressed in the character- computation predate the ancient “wont to draw lines on his thigh and istica. Leibniz’s ambitions were not Greeks, mathematics proper is com- on the sheets, abed.” He became no-

mage c ourte I mage limited to mathematics; he dreamed monly seen as having begun with de- torious in later years for bombarding

april 2014 | vol. 57 | no. 4 | communications of the acm 67 contributed articles

top mathematicians with his error- “Dirichlet alone, not I, nor Cauchy, referees, even those with the best of ridden ruler-and-compass geometric nor Gauss knows what a completely intentions, are fallible, and mistakes constructions, some of which are now rigorous mathematical proof is. Rather are inevitably made in the peer-review known to be impossible in principle. we learn it first from him. When Gauss process. A book written by Lecat in But what, exactly, constitutes a says that he has proved something, it 1935 included 130 pages of errors proof? Who is to say whether a proof is is very clear; when Cauchy says it, one made by major mathematicians up to correct or not? Maintaining that a proof can wager as much pro as con; when 1900, and even mathematicians of the need convince only the person reading Dirichlet says it, it is certain…” (quoted stature of J.E. Littlewood have pub- it gives the notion a subjective charac- by Schubring21). lished faulty proofs: ter. In practice, proofs tend to become Mathematics has, at critical junc- “Professor Offord and I recently generally accepted when they persuade tures, developed in more speculative committed ourselves to an odd mistake not just one or two people but a broad ways. But these episodes are invariably (Annals of Mathematics Annals of Math- or particularly influential group of followed by corresponding periods of ematics (2) 49, 923, 1.5). In formulating mathematicians. Yet as Hobbes him- retrenchment, analyzing foundations a proof a plus sign got omitted, becom- self noted, this does not entirely avoid and increasingly adopting a strict de- ing in effect a multiplication sign. The the subjective and fallible character of ductive style, either to resolve apparent resulting false formula got accepted as the judgment: problems or just to make the material a basis for the ensuing fallacious argu- “But no one mans Reason, nor the easier to teach convincingly.6 ment. (In defence, the final result was Reason of any one number of men, Reflecting on the history of math- known to be true.)”14 makes the certaintie; no more than ematics, we can to some extent dis- Every working mathematician must an account is therefore well cast up, entangle two related concerns. The routinely deal with inferential gaps, because a great many men have unani- first is whether the methods used in a misstatements, missing hypotheses, mously approved it.”12 given proof are valid, or appropriate to unstated background assumptions, In the history of mathematics, there mathematics. This has to do with com- imprecise definitions, misapplied re- is no shortage of controversy over the ing to consensus as to the appropriate sults, and the like. A 2013 article in the validity of mathematical arguments. rules of argumentation. For example, Notices of the American Mathematical Berkeley’s extended critique of the is it legitimate to refer to negative Society (Grcar7) on errors in the math- methods of the calculus in The Analyst numbers, complex numbers, and in- ematical literature laments the fact (1734) is one example. Another is the finitesimals, and, if so, what properties that corrections are not published as “vibrating string controversy” among do they have? Is it legitimate to use the often as they should be. Some errors Leonhard Euler, Jean d’Alembert, and axiom of choice or to apply the law of are not easily repaired. The first pur- Daniel Bernoulli, hinging on whether the excluded middle to statements in- ported proof of the four-color theorem an “arbitrary” continuous function on volving infinite structures? The second in 1879 stood for a decade before a flaw a real interval could be represented by concern has to do with whether, given was pointed out. Referees reviewing a trigonometric series. Carl Friedrich a background understanding of what Andrew Wiles’s first proof of Fermat’s Gauss is usually credited with provid- is allowed, a particular proof meets Last Theorem found a mistake, and it ing the first correct proof of the funda- those standards; that is, whether or not took Wiles and a former student, Rich- mental theorem of algebra, asserting the proof is correct. The foundational ard Taylor, close to a year to find a way that every nonconstant polynomial debates of the early 20th century, and to circumvent it. Daniel Gorenstein an- over the complex numbers has a root, the set-theoretic language and formal- nounced, in 1983, that the classifica- in his doctoral dissertation of 1799; ism that emerged, were designed to tion of finite simple groups had been but the history of that theorem is espe- address the question as to what meth- completed, unaware there was a gap cially knotty, since it was not initially ods are legitimate, by establishing an in the treatment of the class of “qua- clear what methods could legitimately in-principle consensus as to the infer- sithin” groups. The gap was not filled be used to establish the existence of ences that are allowed. Formal verifica- until 2001, and doing so required a the roots in question. Similarly, when tion is not designed to help with that; 1,221-page proof by Michael Aschbach- Gauss presented his proof of the law of just as human referees can strive only er and Stephen Smith. quadratic reciprocity in his Disquitio- to establish correctness with respect Even when an argument turns out nes Arithmeticae (1801), he began with to an implicit conception of what is ac- to be correct, judging it to be so can the observation that Legendre’s al- ceptable, formal correctness can be as- take a long time. Grigori Perelman leged proof a few years prior contained sessed only modulo an underlying axi- posted three papers on arXiv in 2002 a serious gap. omatic framework. But, as will become and 2003 presenting a proof of Thur- Mathematicians have always been clear in the next section, establishing ston’s geometrization conjecture. reflectively conscious of their meth- correctness is a nontrivial concern, This result, in turn, implies the Poin- ods, and, as proofs grew more complex and is where formal methods come caré conjecture, one of the Clay Math- in the 19th century, mathematicians into play. ematics Institute’s famed Millen- became more explicit in emphasizing nium Prize challenges. The proof was the role of rigor. This is evident in, for Correctness Concerns scrutinized by dozens of researchers, example, Carl Jacobi’s praise of Johann Mathematical proofs are complex ob- but it was not until 2006 that three Peter Gustav Lejune Dirichlet: jects, and becoming more so. Human independent groups determined that

68 communications of the acm | april 2014 | vol. 57 | no. 4 contributed articles any gaps in Perelman’s original proof verdict is disappointingly lacking in were minor, and could be filled using clarity and finality. In fact, as a result the techniques he had developed. of this experience, the journal changed The increased complexity is exac- its editorial policy on computer-assist- erbated by the fact that some proofs ed proof so it will no longer even try rely on extensive calculation. Kenneth Every working to check the correctness of computer Appel’s and Wolfgang Hakken’s 1976 mathematician code. Dissatisfied with this state of af- proof of the four-color theorem relied fairs, Hales turned to formal verifica- on an exhaustive computer enumera- routinely has tion, as we will see. tion of combinatorial configurations. to deal with In November 2005, the Notices of Subsequent proofs, though more effi- the American Mathematical Society pub- cient, have this same character. Proofs inferential gaps, lished an article by Brian Davies called that depend on explicit checking of “Whither Mathematics?” that raised cases are nothing new in themselves; misstatements, questions about the mounting com- for example, proofs of Bertrand’s con- missing hypotheses, plexity of mathematical proof and the jecture (for n ≥ 1 there is a prime n ≤ p role of computers in mathematics. In ≤ 2n) often begin with a comment like unstated August 2008, the Notices published an “Let us assume n ≥ 4,000, since one background opinion piece by Melvyn Nathanson can verify it explicitly for other cases.” that also raised concerns about the sta- But this feature took on dramatic pro- assumptions, tus of mathematical proof: portions with Thomas Hales’s 1998 imprecise “…many great and important theo- proof of the Kepler conjecture, stat- rems don’t actually have proofs. They ing that no packing of spheres in 3D definitions, have sketches of proofs, outlines of ar- space has higher density than the misapplied results, guments, hints and intuitions that were natural face-centered cubic packing obvious to the author (at least, at the commonly used to stack oranges, and the like. time of writing) and that, hopefully, are cannonballs, and such. Hales, work- understood and believed by some part ing with Samuel Ferguson, arrived at a of the mathematical community.”18 proof in 1998 consisting of 300 pages He concluded: of mathematics and calculations per- “How do we recognize mathemati- formed by approximately 40,000 lines cal truth? If a theorem has a short of computer code. As part of the peer- complete proof, we can check it. But review process, a panel of 12 refer- if the proof is deep, difficult, and al- ees appointed by the Annals of Math- ready fills 100 journal pages, if no one ematics studied the proof for four has the time and energy to fill in the full years, finally returning with the details, if a ‘complete’ proof would be verdict that they were “99% certain” 100,000 pages long, then we rely on the of the correctness, but in the words of judgments of the bosses in the field. In the editor Robert MacPherson: mathematics, a theorem is true, or it’s “The news from the referees is bad, not a theorem. But even in mathemat- from my perspective. They have not ics, truth can be political.”18 been able to certify the correctness of Nathanson’s essay did not explicitly the proof, and will not be able to certify mention contemporary work in formal it in the future, because they have run verification. But a few months later, in out of energy to devote to the problem. December 2008, the Notices devoted This is not what I had hoped for… an entire issue to formal proof, focus- “Fejes Tóth thinks that this situa- ing on methods intended to alleviate tion will occur more and more often Nathanson’s concerns. in mathematics. He says it is similar to the situation in experimental sci- Automating Mathematics ence—other scientists acting as refer- As noted, for suitable formal proof ees can’t certify the correctness of an systems, there is a purely mechanical experiment, they can only subject the process for checking whether an al- paper to consistency checks. He thinks leged proof is in fact a correct proof of that the mathematical community will a certain proposition. So, given a prop- have to get used to this state of affairs.” osition p, we could in principle run a The level of confidence was such search program that examines in some that the proof was indeed published suitable sequence (in order of, say, in the Annals, and no significant error length and then alphabetical order) has been found in it. Nevertheless, the every potential proof of p and termi-

april 2014 | vol. 57 | no. 4 | communications of the acm 69 contributed articles

nates with success if it finds one. That have been particularly effective in prac- is, in the terminology of computability tice (such as Gröbner basis algorithms theory, the set of provable formulas is and Wu’s method, both applicable to recursively enumerable. theorem proving in geometry). But this naive program has the fea- Although automation is an exciting ture that if p is not provable, it will run Although and ambitious goal, there is little realis- fruitlessly forever, so it is not a yes/no automation tic hope of automated provers routinely decision procedure of the kind Leibniz proving assertions with real mathemati- imagined. A fundamental limitative re- is an exciting cal depth. Attention has thus focused sult due to Church and Turing shows and ambitious on methods of verification making this cannot be avoided, because the set use of substantial interaction between of provable formulas is not recursive goal, there is mathematician and computer. (computable). This inherent limita- tion motivates the more modest goal of little realistic Interactive Theorem Proving having the computer merely check the hope of having The idea behind interactive theorem correctness of a proof provided (at least proving is to allow users to work with in outline form) by a person. Another automated provers a computational “proof assistant” to reaction to limitative results like this, routinely prove convey just enough information and and related ones due to Gödel, Tarski, guidance for the system to be able to Post, and others, is to seek special cas- assertions with confirm the existence of a formal axi- es (such as restricting the logical form real mathematical omatic proof. Many systems in use to- of the problem) where a full decision day actually construct a formal proof procedure is possible. depth. object, a complex piece of data that can All three possibilities have been well be verified by independent checkers. represented in the development of for- One of the earliest proof systems mal proof and automated reasoning: was de Bruijn’s Automath, which ap- ˲˲ Complete but potentially nonter- peared in the late 1960s. The compu- minating proof search; tational assistance it rendered was ˲˲ Decision procedures for special minimal, since the project’s empha- classes of problems; and sis was on developing a compact, ef- ˲˲ Checking of proof hints or sketch- ficient notation for describing math- es given by a person. ematical proof. An early milestone In the category of general proof was Jutting’s 1977 Ph.D. thesis in search, there were pioneering experi- which he presented a complete for- ments in the late 1950s by Gilmore, malization of Landau’s book on a Davis, Putnam, Prawtiz, Wang, and foundational construction of the real others, followed by the more system- numbers as “Dedekind cuts,” deduc- atic development of practically effec- ing that the reals so constructed are a tive proof procedures, including “tab- complete ordered field. leaux” (Beth, Hintikka), “resolution” Proof checkers soon came to in- (Robinson, Maslov), and “model elim- corporate additional computer assis- ination” (Loveland), as well as more tance. Andrzej Trybulec’s Mizar sys- specialized techniques for “equation- tem, introduced in 1973 and still in al” reasoning, including “Knuth-Ben- use today, uses automated methods dix completion.” Perhaps the most to check formal proofs written in a famous application of this kind of language designed to approximate in- search is the proof of the Robbins con- formal mathematical vernacular. The jecture, discovered by McCune17 using Boyer-Moore NQTHM theorem prover the automated theorem prover EQP in (an ancestor of ACL2), also actively 1996 that settled a problem that had used today, was likewise introduced been open since the 1930s. in the early 1970s as a fully automatic In the category of decision proce- theorem prover; in 1974 the project’s dures, perhaps the first real “automat- efforts shifted to developing methods ed theorem prover” was Davis’s proce- of allowing users to prove facts in- dure for the special case of Presburger crementally, then provide the facts as arithmetic, a generalization of integer “hints” to the automated prover in sub- programming. Implementations of sequent proofs. many other decision procedures have Influential systems introduced in followed, motivating further theoreti- the 1980s include Robert Constable’s cal developments. Some procedures Nuprl, Mike Gordon’s HOL, and the

70 communications of the acm | april 2014 | vol. 57 | no. 4 contributed articles

Coq system, based on a logic developed setting p = 2 witnesses the conclusion; such features have been incorporated by Thierry Coquand and Gerard Huet, the system confirms this automatically into LCF-style provers. For example, and, in the 1990s, Lawrence Paulson’s (using “auto”). Otherwise, the proof Figure 2 is a proof of the same state- Isabelle and the Prototype Verification splits into two cases, depending on ment, again in Isabelle, but written in a System, or PVS, developed by John whether or not n is prime. If n is prime, more declarative style. Rushby, Natarjan Shankar, and Sam the result is immediate. The case where A number of important theorems Owre.a By 1994, William Thurston n is not prime is handled by appealing have been verified in the systems de- could write the following in an article to a previously proved fact. scribed here, including the prime in the Bulletin of the American Math- A user can step through such a “pro- number theorem, the four-color theo- ematical Society: cedural” proof script within the proof rem, the Jordan curve theorem, the “There are people working hard assistant itself to see how the goal state Brouwer fixed-point theorem, Gödel’s on the project of actually formalizing changes in successive steps. But the first incompleteness theorem, Dirich- parts of mathematics by computer, script is difficult to read in isolation, let’s theorem on primes in an arithme- with actually formally correct formal since the reader must simulate, or tic progression, the Cartan fixed-point deductions. I think this is a very big but guess, the results of applying each tac- theorems, and many more.b In the next very worthwhile project, and I am con- tic. Ordinary mathematical proofs tend section we describe even more impres- fident we will learn a lot from it.”23 to emphasize, in contrast, the interme- sive milestones, though even more im- Many of these systems are based on diate statements and goals, often leav- an architecture developed by Robin ing the justification implicit. TheMizar b For a list of “100 great theorems,” including those Milner with his 1972 proof LCF proof proof language was designed to model formalized in various systems, see http://www. checker, which implemented Dana such a “declarative” proof system, and cs.ru.nl/˜freek/100/ Scott’s Logic of Computable Functions. An LCF-style prover is based on a small, Figure 1. An LCF tactic-style proof in Isabelle. trusted core of code used to construct theorems by applying basic rules of the lemma prime_factor_nat: “n ~= (1::nat) ==> EX p. prime p & p dvd n” axiomatic system. Such a system can apply (induct n rule: nat_less_induct) apply (case_tac “n = 0”) then include more elaborate pieces of apply (rule_tac x = 2 in exI) code built on top of the trusted core, apply auto to provide more complex proof pro- apply (case_tac “prime n”) cedures that internally decompose to apply auto apply (subgoal_tac “n > 1”) (perhaps many) invocations of basic apply (frule (1) not_prime_eq_prod_nat) rules. Correctness is guaranteed by apply (auto intro: dvd_mult dvd_mult2) the fact that, ultimately, only the basic done rules can change the proof state; ev- erything the system does is mediated by the trusted core. (This restriction is often enforced by using a functional Figure 2. A declarative proof in Isabelle. programming language like ML and OCaml and implementing the basic inference rules as the only constructors lemma prime_factor_nat: “n ~= (1::nat) ==> EX p. prime p & p dvd n” proof (induct n rule: nat_less_induct) of an abstract data type.) fix n :: nat Many provers support a mode of assume "n ~= 1" and working where the theorem to be proved ih: "ALL m < n. m ~= 1 --> (EX p. prime p & p dvd m)" then show "EX p. prime p & p dvd n" is presented as a “goal” that is trans- proof - formed by applying “tactics” in a back- { assume "n = 0" ward fashion; for example, Figure 1 is a hence "prime (2 :: nat) & 2 dvd n" proof of the fact that every natural num- by auto hence ?thesis by blast } ber other than 1 has a prime divisor in moreover the Isabelle proof assistant. { assume "prime n" The “lemma” command establishes hence ?thesis by auto } moreover the goal to be proved, and the first in- { assume "n ~= 0" and "~prime n" struction invokes a form of complete with `n ~= 1` have "n > 1" by auto induction. The next two statements with `~prime n` and not_prime_eq_prod_nat split the proof into two cases, depend- obtain m k where "n = m * k" and "1 < m" and "m < n" by blast with ih obtain p where "prime p" and "p dvd m" by blast ing on whether or not n = 0. When n = 0, with `n = m * k` have ?thesis by auto } ultimately show ?thesis by auto qed a For a more comprehensive list of provers, see qed http://www.cs.ru.nl/˜freek/digimath/index.html; for an overview of 17 proof assistants in use today, see Wiedijk.26

april 2014 | vol. 57 | no. 4 | communications of the acm 71 contributed articles

portant are the bodies of mathemati- tinct from those in the verification of tional interpretation. The resulting cal theory that have been formalized, ordinary mathematics, and the details proof has approximately 150,000 lines often on the way to proving such “big would take us too far afield. So, here, of “code,” or formal proof scripts, in- name” theorems. Substantial libraries in this article, we deliberately set aside cluding 4,000 definitions and 13,000 have been developed for elementary hardware and software verification, lemmas and theorems. As a basis for number theory, real and complex anal- referring the reader to Donald MacK- the formalization, Gonthier and his ysis, measure theory and measure- enzie’s book Mechanizing Proof: Com- collaborators had to develop substan- theoretic probability, linear algebra, puting, Risk and Trust16 for a thoughtful tial libraries of facts about finite group finite group theory, and Galois theory. exploration of the topic. theory, linear algebra, Galois theory, Formalizations are now routinely de- and representation theory. From scribed in journals, including the Jour- Contemporary Efforts there, they worked from a presenta- nal of Automated Reasoning, Journal of Interactive theorem proving reached tion of the Feit-Thompson theorem in Formalised Reasoning, and Journal of a major landmark on September 20, two texts, one by Helmut Bender and Formalized Mathematics. The annual 2012, when Georges Gonthier an- George Glauberman describing the Interactive Theorem Proving confer- nounced he and a group of research- “local analysis,” the other by Thomas ence includes reports on formaliza- ers under his direction had completed Peterfalvi describing a “character-the- tion efforts and advances in interactive a verification of the Feit-Thompson oretic” component. As one might ex- theorem proving technology. theorem. The project relied on the pect, they had to cope with numerous Also worth noting is that interactive Coq interactive proof assistant and a errors and gaps, some not easy to fix, proof systems are also commonly used proof language, SSReflect, Gonthier though none fatal. to verify hardware and software sys- designed. The Feit-Thompson theo- Hales’s Flyspeck project is another ambitious formalization effort. In re- sponse to the outcome of the referee process at the Annals, Hales decided to formally verify a proof of the Kepler conjecture. (The name “Flyspeck” is a contraction of “Formal Proof of the Ke- pler Conjecture.”) He made it clear he viewed the project as a prototype: “In truth, my motivations for the project are far more complex than a simple hope of removing residual doubt from the minds of few referees. Indeed, I see formal methods as fundamental to the long-term growth of mathematics.”9 The proof involves three essential uses of computation: enumerating a class of combinatorial structures called “tame hypermaps”; using linear- tems. In principle, this is no different rem, sometimes called the odd-order programming methods to establish from verifying mathematical claims; theorem, says every finite group of odd bounds on a large number of systems for the purposes of formal verification, order is solvable; equivalently, that the of linear constraints; and using inter- hardware and software systems must finite simple groups of odd order are val methods to verify approximately be described in mathematical terms, exactly the cyclic groups of prime or- 1,000 nonlinear inequalities that arise and the statement that such a system der. This theorem was an important in the proof. All this is in addition to meets a certain specification is a theo- first step in the classification of finite the textual “paper” proof, which in rem to be proved. Most of the systems simple groups mentioned earlier. The and of itself is quite long and involves described here are thus designed to original proof by Walter Feit and John Euclidean measure theory, geometric serve both goals. The connections run Thompson, published in 1963, filled properties of polyhedral, and various deeper; hardware and software specifi- 255 journal pages. While a proof that combinatorial structures. Partly as a cations often make sense only against long would not raise eyebrows today, it result of the formalization effort, both background mathematical theory was unheard of at the time. the “paper” and “machine” parts of of, say, the integers or real numbers; Gonthier launched the project in proof have been streamlined and re- and, conversely, methods of verify- 2006 with support from the Micro- organized.8 The combination of non- t.me ing software apply to the verification soft Research - Inria Joint Centre in trivial paper proofs and substantial y of p i c ho s

of code that is supposed to carry out Orsay, France. Because Coq is based time-consuming computational com- s specifically mathematical computa- on a constructive logic, Gonthier had ponents make it a particularly difficult tions. However, hardware and software to reorganize the proof in such a way formalization challenge, yet after a

verification raises concerns largely dis- every theorem has a direct computa- substantial effort by a large, geograph- c ourte I mage

72 communications of the acm | april 2014 | vol. 57 | no. 4 contributed articles ically distributed team, the project is do next. Eventually I became convinced However, few researchers working nearing completion. that the most interesting and important in formal verification would claim that Another significant formal verifica- directions in current mathematics are checking every last detail of a math- tion effort is the Univalent Foundations the ones related to the transition into a ematical proof is the most interesting project introduced by Fields Medalist new era which will be characterized by or important part of mathematics. For- Vladimir Voevodsky.24 Around 2005, the widespread use of automated tools mal verification is not supposed to re- Voevodsky, and independently Steve for proof construction and verification.” place human understanding or the de- Awodey and Michael Warren, realized During the 2012–2013 academic velopment of powerful mathematical that constructive dependent type theo- year, the Institute for Advanced Study theories and concepts. Nor are formal ry, the axiomatic basis for Coq, has an in Princeton, NJ, held a program on proof scripts meant to replace ordinary unexpected homotopy-theoretic inter- Univalent Foundations, drawing an in- mathematical exposition. Rather, they pretation. Algebraic topologists rou- terdisciplinary gathering of mathema- are intended to supplement the math- tinely study abstract spaces and paths ticians, logicians, and computer scien- ematics we do with precise formula- between elements of those spaces; tists from all over the world. tions of our definitions and theorems continuous deformations, or “higher- and assurances that our theorems are order” paths, between the paths; and Rigor and Understanding correct. One need only recognize, as we deformations between the paths; and We have distinguished between two say here, that verifying correctness is so on. What Voevodsky, Awodey, and types of concern that can attend a math- an important part of mathematics. The Warren realized is that one can view ematical proof: whether the methods it mathematical community today independent type theory as a calculus uses are appropriate to mathematics vests a good deal of time and resources of topological spaces and continuous and whether the proof itself represents in the refereeing process in order to maps between them, wherein the as- a correct use of those methods. How- gain such assurances, and surely any sertion x = y is interpreted as the exis- ever, there is yet a third concern that is computational tools that can help in tence of a path between x and y. often raised—whether a proof delivers that regard should be valued. More specifically, Voevodskyan appropriate understanding of the showed there is a model of construc- mathematics in question. It is in this The Quest for Certainty tive type theory in the category of “sim- respect that formal methods are often In discussions of formally verified plicial sets,” a mathematical structure taken to task; a formal, symbolic proof mathematics, the following question well known to algebraic topologists. is for the most part humanly incompre- often arises: Proof assistants are com- Moreover, this interpretation validates hensible and so does nothing to aug- plex pieces of software, and software a surprising fact Voevodsky called the ment our understanding. Thurston’s invariably has bugs, so why should we “univalence axiom,” asserting, rough- article23 mentioned earlier added the trust such a program when it certifies a ly, that any two types that are isomor- following caveat: proof to be correct? phic are identical. The axiom jibes “…we should recognize that the hu- Proof assistants are typically de- with informal mathematical practice, manly understandable and humanly signed with an eye toward minimizing wherein two structures that are isomor- checkable proofs that we actually do such concerns, relying on a small, trust- phic are viewed as being essentially the are what is most important to us, and ed core to construct and verify a proof. same. However, it is notably false in the that they are quite different from for- This design approach focuses concern universe of sets; for example, there is a mal proofs.” on the trusted core, which consists of, bijection between the sets {1, 2} and The article was in fact a reply to an for example, approximately 400 lines in {3, 4}, but the first set contains the ele- article by Jaffe and Quinn13 in the Bulle- Harrison’s HOL light system. Users can ment 1 while the second does not. Vo- tin of the American Mathematical Society obtain a higher level of confidence by evodsky has suggested that dependent that proposed a distinction between asking the system to output a descrip- type theory with the univalence axiom “theoretical” mathematics, which has tion of the axiomatic proof that can be can provide a new foundation for a speculative, exploratory character, checked by independent verifiers; even mathematics that validates structural and fully rigorous mathematics. The if each particular verifier is buggy, the intuitions. There is also hope among Jaffe-Quinn article drew passionate, odds that a faulty inference in a formal computer scientists that univalence heated responses from some of the proof can make it past multiple verifi- can provide a new foundation for com- most notable figures in mathematics, ers shrinks dramatically. It is even pos- putation where code can be designed some rising to defend the role of rigor sible to use formal methods to verify to work uniformly on “isomorphic” in mathematics, some choosing to the trusted core itself. There have been data structures (such as lists and ar- emphasize the importance of a broad experiments in “self-verifications” of rays) implemented in different ways. conceptual understanding. Given that simplified versions of Coq by Bruno In an excerpt from a grant proposal both are clearly important to math- Barras and HOL light by Harrison,10 as posted on his Web page, Voevodsky de- ematics, the Jaffe-Quinn debate may well as Jared Davis’s work on Milawa, a scribed his motivations for the project come across as much ado about noth- kind of bootstrapping sequence of in- as follows: ing, but the episode makes clear that creasingly powerful approximations to “While working on the completion many mathematicians are wary that ex- the ACL2 prover. of the proof of the Block-Kato conjec- cessive concern for rigor can displace To researchers in formal verifica- ture I have thought a lot about what to mathematical understanding. tion, however, these concerns seem

april 2014 | vol. 57 | no. 4 | communications of the acm 73 contributed articles

misplaced. When it comes to informal ful to the function described in axiom- proof, mistakes arise from gaps in the atic terms and the compiler respects reasoning, appeal to faulty intuitions, the appropriate semantics. However, imprecise definitions, misapplied compared to using unverified code, background facts, and fiddly special this method provides a high degree of cases or side conditions the author Many confidence as well. failed to check. When verifying a the- mathematicians Similar considerations bear on the orem interactively, users cannot get use of automated methods and search away with any of this; the proof checker are wary procedures to support the formaliza- keeps the formalizer honest, requiring that excessive tion process; for example, one can rede- every step to be spelled out in complete sign conventional automated reason- detail. The very process of rendering concern for rigor ing procedures so they generate formal a proof suitable for machine verifica- can displace proofs as they proceed or invoke an tion requires strong discipline; even if “off-the-shelf” reasoning tool and try there are lingering doubts about the mathematical to reconstruct a formal proof from the trustworthiness of the proof checker, output. With respect to both reasoning formal verification delivers a very high understanding. and computation, an observation that degree of confidence—much higher often proves useful is that many proof than any human referee can offer with- procedures can naturally be decom- out machine assistance. posed into two steps: a “search” for Mathematical results obtained some kind of certificate and a “check- through extensive computation pose ing” phase where this certificate is veri- additional challenges. There are at fied.2 When implementing these steps least three strategies that can be used in a foundational theorem prover, the verify such results. First, a user can re- “finding” (often the difficult part) can write the code that carries out the cal- be done in any way at all, even through culations so it simultaneously uses the an external tool (such as a computer al- trusted core to chain together the axi- gebra system11), provided the checking oms and rules that justify the results of part is done in terms of the logical ker- the computation. This approach pro- nel; for example, linear programming vides, perhaps, the highest form of ver- methods can provide easily checked ification, since it produces formal axi- certificates that witness the fact that omatic proofs of each result obtained a linear bound is optimal. Similarly, by calculation. This is the method be- semi-definite programming packages ing used to verify the linear and non- can be used to obtain certificates that linear inequalities in the Flyspeck proj- can be used to verify nonlinear inequal- ect.22 The second strategy is to describe ities.20 Along these lines, the Flyspeck the algorithm in mathematical terms, project uses optimized, unverified code prove the algorithm correct, then rely to find informative certificates witness- on the trusted core to carry out the ing linear and nonlinear bounds, then steps of the computation. This was the uses the certificates to construct fully method used by Gonthier in the veri- formal justifications.22 Such practices fication5 of the four-color theorem in raise interesting theoretical questions Coq; because it is based on a construc- about which symbolic procedures can tive logic, Coq’s “trusted computing in principle provide efficiently check- base” is able to normalize terms and able certificates, as well as the prag- thereby carry out a computation. The matic question of how detailed the cer- third strategy is to describe the algo- tificates should be to allow convenient rithm within the language of the proof verification without adversely affecting checker, then extract the code and run the process of finding them. it independently. This method was used by Tobias Nipkow, Gertrud Bau- Prospects er, and Paula Schultz19 to carry out the We have not touched on many impor- enumeration of finite tame hypermaps tant uses of computers in mathematics in the Flyspeck project; ML code was ex- (such as in the discovery of new theo- tracted from formal definitions auto- rems, exploration of mathematical phe- matically and compiled. This approach nomena, and search for relevant infor- to verifying the results of computation mation in databases of mathematical invokes additional layers of trust, that, facts). Correctness is only one impor- for example, the extracted code is faith- tant part of mathematics, as we have

74 communications of the acm | april 2014 | vol. 57 | no. 4 contributed articles emphasized, and the process of verifica- izer’s familiarity with that library. to making programs more reliable. In Proceedings of the 20th International Colloquium on Automata, tion should interact continuously with In the most ideal circumstances, an Languages and Programming (Lund, Sweden, July other uses of formal methods. But even expert can handle approximately a 5–9), A. Lingas, R. Karlsson, and S. Carlsson, Eds. Springer, Berlin, 1993, 1–14. with this restriction, the issues we have half page to a page of a substantial 3. bourbaki, N. Elements of Mathematics: Theory of considered touch on important aspects mathematical text in a long, uninter- Sets. Addison-Wesley, Reading, MA, 1968; translated from the French Théorie des ensembles in the series of artificial intelligence, knowledge rupted day of formalizing. But most Eléments de mathématique, revised version, Hermann, representation, symbolic computation, circumstances are far less than ideal; Paris, 1970. 4. gödel, K. Über formal unentscheidbare Sätze der hardware and software verification, and an inauspicious choice of definitions Principia Mathematica und verwandter Systeme I. Monatshefte für Mathematik und Physik 38, 1 (1931), programming-language design. Math- can lead to hours of fruitless struggle 173–198; reprinted with English translation in Kurt ematical verification raises its own chal- with a proof assistant, and a formal- Gödel: Collected Works, Volume 1, S. Feferman et al., Eds. Oxford University Press, Oxford, U.K., 1986, lenges, but mathematics is a quintes- izer often finds elementary gaps in 144–195. sentially important type of knowledge, the supporting libraries that require 5. gonthier, G. Formal proof—The four-color theorem. Notices of the American Mathematical Society 55, 11 and understanding how to manage it is extra time and effort to fill. (Dec. 2008), 1382–1393. central to understanding computation- We should not expect interactive the- 6. grabiner, J.V. Is mathematical truth time-dependent? In New Directions in the Philosophy of Mathematics: al systems and what they can do. orem proving to be attractive to math- An Anthology, T. Tymoczko, Ed., Birkhäuser, Boston, The developments we have dis- ematicians until the time it takes to 1986, 201–213. 7. grcar, J.F. Errors and corrections in the mathematical cussed make it clear that it is pragmati- verify a mathematical result formally is literature. Notices of the American Mathematical cally possible to obtain fully verified roughly commensurate with the time it Society 60, 4 (Apr. 2013), 418–432. 8. hales, T. Dense Sphere Packings: A Blueprint for Formal axiomatic proofs of substantial math- takes to write it up for publication or the Proofs. Cambridge University Press, Cambridge, U.K., 2012. ematical theorems. Despite recent ad- time it takes a referee to check it care- 9. hales, T. The Kepler Conjecture (unpublished manuscript), 1998; http://arxiv.org/pdf/math/9811078.pdf vances, however, the technology is not fully by hand. Within the next few years, 10. harrison, J. Towards self-verification of HOL Light. quite ready for prime time. There is a the technology is likely to be most use- In Proceedings of the Third International Joint Conference in Automated Reasoning (Seattle, Aug. steep learning curve to the use of formal ful for verifying proofs involving long 17–20), U. Furbach and N. Shankar, Eds. Springer, Berlin, 2006, 177–191. methods, and verifying even straight- and delicate calculations, whether ini- 11. harrison, J. and Théry, L. A sceptic’s approach to forward and intuitively clear inferences tially carried out by hand or with com- combining HOL and Maple. Journal of Automated Reasoning 21, 3 (1998), 279–294. can be time consuming and difficult. It puter assistance. But the technology is 12. hobbes, T. Leviathan. Andrew Crooke, London, 1651. is also difficult to quantify the effort improving, and the work of researchers 13. Jaffe, A. and Quinn, F. ‘Theoretical mathematics’: Toward a cultural synthesis of mathematics and involved. For example, 15 researchers like Hales and Voevodsky makes it clear theoretical physics. Bulletin of the American contributed to the formalization of the that at least some mathematicians are Mathematics Society (New Series) 29, 1 (1993), 1–13. 14. littlewood, J.E. Littlewood’s Miscellany. Cambridge Feit-Thompson theorem over a six-year interested in using the new methods to University Press, Cambridge, U.K., 1986. period, and Hales suspects the Flyspeck further mathematical knowledge. 15. mac Lane, S. Mathematics: Form and Function. Springer, New York, 1986. project has already exceeded his initial In the long run, formal verification 16. macKenzie, D. Mechanizing Proof: Computing, Risk and estimate of 20 person-years to comple- efforts need better libraries of back- Trust. MIT Press, Cambridge, MA, 2001. 17. mcCune, W. Solution of the Robbins problem. Journal tion. However, it is ultimately difficult ground mathematics, better means of of Automated Reasoning 19, 3 (1997), 263–276. to distinguish time spent verifying a sharing knowledge between the vari- 18. nathanson, M. Desperately seeking mathematical truth. Notices of the American Mathematical Society theorem from time spent developing ous proof systems, better automated 55, 7 (Aug. 2008), 773. the interactive proof system and its li- support, better means of incorporat- 19. nipkow, T., Bauer, G., and Schultz, P. Flyspeck I: Tame graphs. In Proceedings of the Third International braries, time spent learning to use the ing and verifying computation, bet- Joint Conference in Automated Reasoning (Seattle, Aug. 17–20). Springer, Berlin, 2006, 21–35. system, and time spent working on the ter means of storing and searching 20. Parrilo, P.A. Semidefinite programming relaxations for mathematics proper. for background facts, and better in- semialgebraic problems. Mathematical Programming 96, 2 (2003), 293–320. One way to quantify the difficulty of terfaces, allowing users to describe 21. Schubring, G. Zur Modernisierung des Studiums a formalization is to compare its length mathematical objects and their in- der Mathematik in Berlin, 1820–1840. In Amphora: Festschrift für Hans Wussing Zu Seinem 65. Geburtstag, to the length of the original proof, tended uses, and otherwise convey S.S.Demidov et al., Eds. Birkhäuser, Basel, 1992, 649–675. a ratio known as the “de Bruijn fac- their mathematical expertise. But ver- 22. Solovyev, A. Formal Computation and Methods. Ph.D. thesis, University of Pittsburgh, Pittsburgh, PA, 2012. tor.” Freek Wiedijk carried out a short ification need not be an all-or-nothing 23. thurston, W.P. On proof and progress in mathematics. study25 and, in the three examples proposition, and it may not be all that Bulletin of the American Mathematical Society (New Series) 30, 2 (1994), 161–177. considered, the ratio hovers around a long before mathematicians routinely 24. univalent Foundations Program, Institute for factor of four, whether comparing the find it useful to apply an interactive Advanced Study. Homotopy Type Theory: Univalent Foundations of Mathematics. Princeton, NJ, 2013; number of symbols in plain-text pre- proof system to verify a key lemma or https://github.com/HoTT/book sentations or applying a compression certify a computation that is central 25. wiedijk. F. The de Bruijn Factor (unpublished manuscript), 2000; http://www.cs.ru.nl/˜freek/factor/ algorithm first to obtain a better mea- to a proof. We expect within a couple 26. wiedijk, F. The Seventeen Provers of the World. sure of the true information content. of decades seeing important pieces of Springer, Berlin, 2006. A better measure of difficulty is the mathematics verified will be common- Jeremy Avigad ([email protected]) is a professor in amount of time it takes to formalize place and by the middle of the century the Department of Philosophy and the Department of a page of mathematics, though that may even become the new standard Mathematical Sciences at Carnegie Mellon University, can vary dramatically depending on for rigorous mathematics. Pittsburgh, PA. the skill and expertise of the person John Harrison ([email protected]) is a principal engineer in Intel Corporation, Hillsboro, OR. carrying out the formalization, den- References 1. aubrey, J. Brief Lives. A. Clark, Ed. Clarendon Press, sity of the material, quality and depth Oxford, U.K., 1898. of the supporting library, and formal- 2. blum, M. Program result checking: A new approach © 2014 ACM 0001-0782/14/04 $15.00

april 2014 | vol. 57 | no. 4 | communications of the acm 75