Machine Learning for Automated Reasoning
Doctoral thesis to obtain the degree of doctor from Radboud University Nijmegen, on the authority of the Rector Magnificus, Prof. mr. S.C.J.J. Kortmann, according to the decision of the Council of Deans, to be defended in public on Monday, 14 April 2014, at 10:30 precisely, by Daniel A. Kühlwein, born on 7 November 1982 in Balingen, Germany.

Supervisors: Prof. dr. Tom Heskes, Prof. dr. Herman Geuvers
Co-supervisor: Dr. Josef Urban
Manuscript committee: Prof. dr. M.C.J.D. van Eekelen (Open University, the Netherlands), Prof. dr. L.C. Paulson (University of Cambridge, UK), Dr. S. Schulz (TU Munich, Germany)

This research was supported by the NWO project Learning2Reason (612.001.010).

Copyright © 2013 Daniel Kühlwein
ISBN 978-94-6259-132-5
Printed by Ipskamp Drukkers, Nijmegen

Contents

1 Introduction
  1.1 Formal Mathematics
    1.1.1 Interactive Theorem Proving
    1.1.2 Automated Theorem Proving
    1.1.3 Industrial Applications
    1.1.4 Learning to Reason
  1.2 Machine Learning in a Nutshell
  1.3 Outline of this Thesis
2 Premise Selection in ITPs as a Machine Learning Problem
  2.1 Premise Selection as a Machine-Learning Problem
    2.1.1 The Training Data
    2.1.2 What to Learn
    2.1.3 Features
  2.2 Naive Bayes and Kernel-Based Learning
    2.2.1 Formal Setting
    2.2.2 A Naive Bayes Classifier
    2.2.3 Kernel-based Learning
    2.2.4 Multi-Output Ranking
  2.3 Challenges
    2.3.1 Features
    2.3.2 Dependencies
    2.3.3 Online Learning and Speed
3 Overview of Premise Selection Techniques
  3.1 Premise Selection Algorithms
    3.1.1 Premise Selection Setting
    3.1.2 Learning-based Ranking Algorithms
    3.1.3 Other Algorithms Used in the Evaluation
    3.1.4 Techniques Not Included in the Evaluation
  3.2 Machine Learning Evaluation Metrics
  3.3 Evaluation
    3.3.1 Evaluation Data
    3.3.2 Machine Learning Evaluation
    3.3.3 ATP Evaluation
  3.4 Combining Premise Rankers
  3.5 Conclusion
4 Learning from Multiple Proofs
  4.1 Learning from Different Proofs
  4.2 The Machine Learning Framework and the Data
  4.3 Using Multiple Proofs
    4.3.1 Substitutions and Unions
    4.3.2 Premise Averaging
    4.3.3 Premise Expansion
  4.4 Results
    4.4.1 Experimental Setup
    4.4.2 Substitutions and Unions
    4.4.3 Premise Averaging
    4.4.4 Premise Expansions
    4.4.5 Other ATPs
    4.4.6 Comparison With the Best Results Obtained so far
    4.4.7 Machine Learning Evaluation
  4.5 Conclusion
5 Automated and Human Proofs in General Mathematics
  5.1 Introduction: Automated Theorem Proving in Mathematics
  5.2 Finding proofs in the MML with AI/ATP support
    5.2.1 Mining the dependencies from all MML proofs
    5.2.2 Learning Premise Selection from Proof Dependencies
    5.2.3 Using ATPs to Prove the Conjectures from the Selected Premises
  5.3 Proof Metrics
  5.4 Evaluation
    5.4.1 Comparing weights
  5.5 Conclusion
6 MaSh - Machine Learning for Sledgehammer
  6.1 Introduction
  6.2 Sledgehammer and MePo
  6.3 The Machine Learning Engine
    6.3.1 Basic Concepts
    6.3.2 Input and Output
    6.3.3 The Learning Algorithm
  6.4 Integration in Sledgehammer
    6.4.1 The Low-Level Learner Interface
    6.4.2 Learning from and for Isabelle
    6.4.3 Relevance Filters: MaSh and MeSh
    6.4.4 Automatic and Manual Control
    6.4.5 Nonmonotonic Theory Changes
  6.5 Evaluations
    6.5.1 Evaluation on Large Formalizations
    6.5.2 Judgment Day
  6.6 Related Work and Contributions
  6.7 Conclusion
7 MaLeS - Machine Learning of Strategies
  7.1 Introduction: ATP Strategies
    7.1.1 The Strategy Selection Problem
    7.1.2 Overview
  7.2 Finding Good Search Strategies with MaLeS
  7.3 Strategy Scheduling with MaLeS
    7.3.1 Notation
    7.3.2 Features
    7.3.3 Runtime Prediction Functions
    7.3.4 Crossvalidation
    7.3.5 Creating Schedules from Prediction Functions
  7.4 Evaluation
    7.4.1 E-MaLeS
    7.4.2 Satallax-MaLeS
    7.4.3 LEO-MaLeS
    7.4.4 Further Remarks
    7.4.5 CASC
  7.5 Using MaLeS
    7.5.1 E-MaLeS, LEO-MaLeS and Satallax-MaLeS
    7.5.2 Tuning E, LEO-II or Satallax for a New Set of Problems
    7.5.3 Using a New Prover
  7.6 Future Work
  7.7 Conclusion
Contributions
Bibliography
Scientific Curriculum Vitae
Summary
Samenvatting
Acknowledgments

Chapter 1
Introduction

Heuristically, a proof is a rhetorical device for convincing someone else that a mathematical statement is true or valid.
    Steven G. Krantz [52]

I am entirely convinced that formal verification of mathematics will eventually become commonplace.
    Jeremy Avigad [6]

1.1 Formal Mathematics

The foundations of modern mathematics were laid at the end of the 19th century and the beginning of the 20th century. Seminal works such as Frege’s Begriffsschrift [30] established the notion of mathematical proofs as formal derivations in a logical calculus. In Principia Mathematica [118], Whitehead and Russell set out to show by example that all of mathematics can be derived from a small set of axioms using an appropriate logical calculus. Even though Gödel later showed that no effectively generated consistent axiom system can capture all mathematical truth [32], Principia Mathematica showed that most of normal mathematics can indeed be catered for by a formal system. Proofs could now be rigidly defined, and verifying the validity of a proof was a simple matter of checking whether the rules of the calculus were correctly applied. But formal proofs were extremely tedious to write (and read), and so they found no audience among practicing mathematicians.

1.1.1 Interactive Theorem Proving

With the advent of computers, formal mathematics became a more realistic proposal.
Interactive theorem provers (ITPs), or proof assistants, are computer programs that support the creation of formal proofs. Proofs are written in the input language of the ITP, which can be thought of as being at the intersection between a programming language, a logic, and a mathematical typesetting system. In an ITP proof, each statement the user makes gives rise to a proof obligation. The ITP ensures that every proof obligation is met with a correct proof. ACL2 [47], Coq [11], HOL4 [90], HOL Light [39], Isabelle [68], Mizar [35], and PVS [71] are perhaps the most widely used ITPs. Figures 1.1 and 1.2 show a simple informal proof and the corresponding Isabelle proof. ITPs typically provide built-in and programmable automation procedures for reasoning, called tactics. In Figure 1.2, the by command specifies which tactic should be applied to discharge the current proof obligation.

Footnote: This chapter is based on: “A Survey of Axiom Selection as a Machine Learning Problem”, submitted to “Infinity, computability, and metamathematics. Festschrift celebrating the 60th birthdays of Peter Koepke and Philip Welch”.

Theorem. There are infinitely many primes: for every number n there exists a prime p > n.
Proof [after Euclid]. Given n. Consider k = n! + 1, where n! = 1 · 2 · 3 · ... · n. Let p be a prime that divides k. For this number p we have p > n: otherwise p ≤ n; but then p divides n!, so p cannot divide k = n! + 1, contradicting the choice of p. QED
Figure 1.1: An informal proof that there are infinitely many prime numbers [117]

Developing proofs in ITPs usually requires a lot more work than sketching a proof with pen and paper. Nevertheless, the benefit of gaining quasi-certainty about the correctness of the proof led a number of mathematicians to adopt these systems.
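The argument of Figure 1.1 can also be checked numerically for small n. The following Python sketch is an illustration only and is not part of the thesis: the function names smallest_prime_factor and prime_above are hypothetical, and trial division is chosen for simplicity, not efficiency. It computes k = n! + 1, takes a prime divisor p of k, and confirms that p > n. Such a check is no substitute for the proof; it merely illustrates the construction.

```python
from math import factorial


def smallest_prime_factor(k: int) -> int:
    """Return the smallest prime factor of k (k >= 2) by trial division."""
    d = 2
    while d * d <= k:
        if k % d == 0:
            return d
        d += 1
    return k  # no divisor up to sqrt(k), so k itself is prime


def prime_above(n: int) -> int:
    """Return a prime p > n, following the construction in Figure 1.1."""
    k = factorial(n) + 1            # k = n! + 1
    p = smallest_prime_factor(k)    # some prime p dividing k
    # Every number from 2 to n divides n!, so none of them divides k.
    assert p > n
    return p


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, prime_above(n))
```

Note that n! + 1 need not itself be prime: for n = 4 we get k = 25, and the prime found is 5.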
One of the largest mechanization projects is probably the ongoing formalization of the proof of Kepler’s conjecture by Thomas Hales and his colleagues in HOL Light [37]. Other major undertakings are the formal proofs of the Four-Color Theorem [33] and of the Odd-Order Theorem [34] in Coq, both developed under Georges Gonthier’s leadership. In terms of mathematical breadth, the Mizar Mathematical Library [61] is perhaps the main achievement of the ITP community so far: With nearly 52000 theorems, it covers a large portion of the mathematics taught at the undergraduate level.

1.1.2 Automated Theorem Proving

In contrast to interactive theorem provers, automated theorem provers (ATPs) work without human interaction. They take a problem as input, consisting of a set of axioms and a conjecture, and attempt to deduce the conjecture from the axioms. The TPTP (Thousands of Problems for Theorem Provers) library [91] has established itself as a central infrastructure for exchanging ATP problems. Its main developer also organizes an annual competition, the CADE ATP.