
Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence (IJCAI 2015)

Efficient Semantic Features for Automated Reasoning over Large Theories

Cezary Kaliszyk∗ (University of Innsbruck), Josef Urban† (Radboud University), Jiří Vyskočil‡ (Czech Technical University in Prague)

∗ Supported by the Austrian Science Fund (FWF): P26201.
† Supported by NWO grant nr. 612.001.208.
‡ Supported by the Czech Grant Agency, GACR P103/12/1994.

Abstract

Large formal mathematical knowledge bases encode considerable parts of advanced mathematics and exact science, allowing deep semantic computer assistance and verification of complicated theories down to the atomic logical rules. An essential part of automated reasoning over such large theories are methods that learn to select relevant knowledge from the thousands of proofs in the corpora. Such methods in turn rely on efficiently computable features characterizing the highly structured and inter-related mathematical statements. In this work we (i) propose novel semantic features characterizing the statements in such large semantic knowledge bases, (ii) propose and carry out their efficient implementation using deductive-AI data-structures such as substitution trees and discrimination nets, and (iii) show that they significantly improve the strength of existing knowledge-selection and automated-reasoning methods over the large formal knowledge bases. In particular, on a standard large-theory benchmark we improve the average predicted rank of a mathematical statement needed for a proof by 22% in comparison with the state of the art. This allows us to prove 8% more theorems in comparison with the state of the art.

1 Introduction: Reasoning in Large Theories

In the conclusion of his seminal paper on AI [Turing, 1950], Turing suggests two alternative ways how to eventually build learning (AI) machines: (i) focusing on an abstract activity like chess, and (ii) focusing on learning through physical senses. Both these paths have been followed with many successes, but so far without producing AI competitive in the most advanced application of human intelligence: scientific thinking. The approach we follow is to try to learn that skill from large bodies of computer-understandable scientific reasoning.

In the last decade, large corpora of complex mathematical (and scientific) knowledge and reasoning have been encoded in a fully computer-understandable form. In such encodings, the mathematical knowledge consisting of definitions, theorems, proofs and theories is explained in complete detail, allowing the computers to fully understand the semantics of such complicated objects and to verify the correctness of the long reasoning chains with respect to the formal inference rules of the chosen logical framework (set theory, type theory, etc.). Recent highlights of this development include the formal encoding and verification of two graduate textbooks leading to the proof of the Odd Order theorem ("every finite group of odd order is solvable") [Gonthier et al., 2013], the formal verification of the 300-page book leading to the proof of the Kepler conjecture [Hales, 2012], and the verification of the seL4 operating-system microkernel [Klein et al., 2010].

This means that larger and larger parts of mathematics and mathematical thinking can now be analyzed, explored, assisted, and further developed by computers in ways that are impossible in domains where complete semantics is missing. The computers can not only use inductive AI methods (such as learning) to extract ideas from the large corpora, but they can also combine them with deductive AI tools such as automated theorem provers (ATPs) to attempt formal proofs of new ideas, thus further growing the body of verified scientific knowledge (which can then again be learned from, and so on ad infinitum). In fact, this has started to happen recently. Large formal corpora built in the expressive logics of interactive theorem provers (ITPs) such as Isabelle [Nipkow and Klein, 2014], Mizar [Grabowski et al., 2010] and HOL Light [Harrison, 1996] have been translated to first-order logic, and ATPs such as Vampire [Kovács and Voronkov, 2013], E [Schulz, 2002] and Z3 [de Moura and Bjørner, 2008] are used to prove more and more complicated lemmas in the large theories.

Since existing ATP calculi perform poorly when given thousands to millions of facts, a crucial component that makes such ATP assistance practical are heuristic and learning AI methods that select a small number of the most relevant facts for proving a given lemma [Kühlwein et al., 2012]. This means that we want to characterize all statements in the knowledge bases by mathematically relevant features that will to a large extent allow us to pre-compute the most promising combinations of formulas for proving a given conjecture. In some sense, we are thus trying to make a high-level approximation of the proof-search problem, and to restrict the fragile local decisions taken by the underlying ATP search by such high-level knowledge about what makes sense globally and what is likely a blind alley.

Such guiding methods, once developed, can obviously also be tried directly inside the ATP calculi; see for example the hints method [Veroff, 1996], similar work done for E [Schulz, 2000], and the MaLeCoP system [Urban et al., 2011]. The question here is how to design efficient features that will make such global approximative methods as good as possible. This is the subject of this paper.

Contributions

1. Semantic Features for Characterizing Mathematical Statements. We propose matching, abstraction and unification features and their combinations as a suitable means for characterizing statements in large mathematical corpora written in expressive logical frameworks (Section 4).

2. Fast Semantic Feature-Extraction Mechanisms. The crucial idea making the use of semantic features feasible is that such features often correspond to the nodes of fast deductive-AI data-structures such as substitution trees and discrimination nets. We implement and optimize such feature-extraction mechanisms and demonstrate that they scale very well even on the largest formal corpora, achieving extraction times below 100 seconds for over a hundred thousand formulas (Sections 5 and 6).

3. Improved Premise-Selection Performance. We evaluate the performance of the semantic features when selecting suitable premises for proofs and compare them to the old features using standard machine-learning metrics such as Recall, Precision, and AUC. The newly proposed features improve the average predicted rank of a mathematical statement needed for a proof by 22% in comparison with the best old features.

4. Improved Theorem-Proving Performance. We compare the overall performance of the whole feature-characterization/learning/theorem-proving stack for the new and old features and their combinations. The improved machine-learning performance translates to 8% more theorems proved automatically over the standard MPTP2078 large-theory benchmark, getting close to the ATP performance obtained by using human-selected facts (Section 7).

[Figure 1 (diagram): formal proof mining (corpus analysis) feeds the corpus of ~100,000 proved lemmas and their proofs, via their features, into machine learning (training phase); a user query (conjecture) is characterized by the same features, premise selection returns the relevant lemmas (cutoff segment), and ATPs attempt the proof.]

Figure 1: Theorem proving over large formal corpora. The corpus contains many lemmas that can be used to prove a new conjecture (user query). The corpus comes with many formal proofs that can be mined, i.e., used to learn which lemmas (premises) are most relevant for proving particular conjectures (queries). The strength of the learning methods depends on designing mathematically relevant features that faithfully characterize the statements. When a new conjecture is attempted, the learners trained on the corpus rank the available lemmas according to their estimated relevance for the conjecture and pass a small number (the cutoff segment) of the best-ranked lemmas to ATP systems, which then attempt a proof.

2 The Large-Theory Setting: Premise Selection and Learning from Proofs

The object of our interest is a large mathematical corpus, understood as a set Γ of formally stated theorems, each with zero or more proofs. (We treat axioms and definitions as theorems with an empty proof; in general, a theorem can have several alternative proofs.) Examples of such corpora are the Mizar Mathematical Library (MML, http://mizar.org/), the Isabelle Archive of Formal Proofs (AFP, http://afp.sourceforge.net/), and the Flyspeck (Formal Proof of the Kepler Conjecture) development done in HOL Light (https://code.google.com/p/flyspeck/). Such large corpora contain tens to hundreds of thousands of proved statements. To be able to run many ATP experiments, smaller benchmarks have been defined as meaningful subsets of the large corpora; a large number of ATP experiments is expensive, as the ATPs are usually run with time limits between 1 second and 300 seconds. Here we rely on the MPTP2078 benchmark [Alama et al., 2014] used in the 2012 CASC@Turing ATP competition [Sutcliffe, 2013] of the Alan Turing Centenary Conference (http://www.turing100.manchester.ac.uk/).

In this setting, the task that we are interested in is to automatically prove a new conjecture given all the available theorems. Because the performance of existing ATP methods degrades considerably [Urban et al., 2010; Hoder and Voronkov, 2011] when given large numbers of redundant axioms, our research problem is to estimate the facts that are most likely to be useful in the final proof. Following [Alama et al., 2014] we define:

Definition 1 (Premise selection problem). Given an ATP A, a corpus Γ and a conjecture c, predict those facts from Γ that are likely to be useful when A searches for a proof of c.

The currently strongest premise-selection methods use machine learning on the proofs of theorems in the corpus, such as naive Bayes, distance-weighted k-nearest neighbor (k-NN), kernel methods, and basic ensemble methods [Kühlwein et al., 2012; Kühlwein et al., 2013; Kaliszyk and Urban, 2013b; 2013a; 2014]. It is possible that a particular fact is useful during a proof search without being used in the final formal proof object. Such cases are however rare and hard to detect efficiently, so the relation of being useful during the proof search is usually approximated by being used in the final proof. Additionally, the learning setting is usually simplified by choosing at most one ("best") proof for each theorem c [Kuehlwein and Urban, 2013], and representing the proof of c as the set of theorems P(c) used in that proof. Query and update speed are important: in the ITP setting there are a number of alternative (usually less automated) theorem-proving techniques that can be used if full automation is weak or slow, and likewise the learning systems should quickly digest and adapt to new theorems and proofs. The overall large-theory setting is shown in Figure 1.
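To make this setting concrete, the following is a minimal sketch of distance-weighted k-NN premise ranking over such feature sets. It illustrates the scheme just described, not the evaluation code used in this paper; the corpus representation (feature sets paired with premise sets P(c)) is an assumption of the sketch.

```python
from collections import defaultdict

def knn_rank_premises(conj_features, proved, k=40):
    """Rank candidate premises for a new conjecture.

    conj_features: set of features F(c) of the conjecture.
    proved: list of (features, premises) pairs, one per previously
            proved theorem, where premises is the set P(c) of facts
            used in its chosen proof (see Section 2).
    """
    # Nearest neighbours by feature overlap; real systems additionally
    # weight the features, e.g. by TF-IDF (see Section 3).
    neighbours = sorted(proved,
                        key=lambda fp: len(conj_features & fp[0]),
                        reverse=True)[:k]
    # Each neighbour votes for the premises of its own proof with a
    # weight proportional to its similarity to the conjecture.
    scores = defaultdict(float)
    for features, premises in neighbours:
        weight = len(conj_features & features)
        for premise in premises:
            scores[premise] += weight
    # Best-ranked premises first; a cutoff segment of this list is
    # what gets passed to the ATP (Figure 1).
    return sorted(scores, key=scores.get, reverse=True)
```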

Assuming a method F for extracting mathematically relevant features characterizing the theorems, this setting leads to the following multi-label learning task: each proved theorem c ∈ Γ produces a training example consisting of F(c) and P(c), i.e., given the features F(c) we want the trained predictor to recommend the labels P(c). The learning methods mentioned above are typically used as rankers: given the features of a new conjecture c, the highest-ranked (using heuristically determined thresholds) labels for F(c) are given to an ATP, which then attempts a proof. To maximize the ATP performance, usually a portfolio of the most complementary learning methods, feature characterizations and ranking thresholds is used, and the (typically exponentially behaving) ATP is run in a strategy-scheduling mode [Tammet, 1997], i.e., with several shorter time limits over such complementary predictions rather than with the whole time limit spent on the most promising method.

3 Previously Introduced Features

The most commonly used features for characterizing mathematical statements in large theories are just their symbols [Hoder and Voronkov, 2011; Meng and Paulson, 2009]. In addition, large-theory ATP systems like HOL(y)Hammer [Kaliszyk and Urban, 2014], Sledgehammer [Kühlwein et al., 2013] and MaLARea [Urban et al., 2008] have so far used features that represent:
• types, i.e., type constants, type constructors, and type classes [Kaliszyk and Urban, 2014];
• term walks of length 2 [Kühlwein et al., 2013];
• subterms [Urban et al., 2008];
• validity in a pool of finite models [Urban et al., 2008];
• meta-information such as the theory name and presence in various databases [Kühlwein et al., 2013].

The normalizations for term and type variables that have been tried so far include:
• replacing variables by their (variable-normalized) types [Kaliszyk and Urban, 2014];
• using de Bruijn indices [Urban et al., 2008];
• renaming all variables to one common variable [Urban et al., 2008];
• keeping the original variable names (useful when the same variable names are used for similar purposes).

Except for validity in finite models, all these features can be extracted in linear time and are thus easy to use. Since their distribution may be quite uneven, normalizing them by methods like TF-IDF is relatively important before handing them over to fast-but-simple learners such as distance-weighted k-NN (used here for evaluation). (Readers might wonder about latent semantics [Deerwester et al., 1990]; so far it has not raised the performance significantly.) Since MPTP2078 is untyped and already comes with variables renamed to de Bruijn indices, we do not use the type enhancements and original variable names here. Validity in a large pool of finite models requires finding a diverse set of models in which the formulas are then evaluated. Even though such features may approximate the semantics really well, they are in general much more expensive to compute than the rest, so we also avoid them here.

4 New Semantic Features for Reasoning

A formal proof is usually defined as a sequence (resp. DAG) of formulas where each formula is either an axiom or is derived from earlier formulas (resp. DAG parents) by an application of an inference rule. The inference rules differ among the various ITP and ATP systems, but a crucial and very frequent idiom of (not just) mathematical reasoning is the rule of Universal Instantiation, which allows instantiating a general statement in a concrete context.

Scenario 1. As a first simple example, the commutativity of addition (CA): X + Y = Y + X may be applied to prove (L1): 1 + 2 = 2 + 1, by matching 1 with X and 2 with Y. Assume that somebody already proved 1 + 2 = 2 + 1 by referring to X + Y = Y + X, and a new conjecture to prove is (L2): 3 + 7 = 7 + 3. When using only the features introduced previously (see Section 3), the feature overlap between these two instances will be the same as with (L3): 5 + 8 = 4 + 9 (which might have been proved very differently). In other words, none of the features captures the fact that L1 and L2 are closer to each other in the instantiation lattice than to L3, and thus have a higher chance of sharing a common proof pattern.

A complete way to remedy this would be to use all (variable-normalized) generalizations of a term as its features. However, the lattice of all generalizations of a term is typically exponentially large with respect to the size of the term (see Figure 2). Fortunately, in this simple scenario, we can replace such a full enumeration of the generalizations of a given term t with a much smaller set: the set GenΓ(t) of all terms in our corpus Γ that generalize t. Indeed, since CA was used to prove L1, it must be in the corpus Γ. Therefore CA itself is in GenΓ(L1) and also in GenΓ(L2), introducing a new common matching feature for L1 and L2, but not for L3. Extraction of all such matching features for large corpora can be efficiently implemented using the ATP indexing datastructure called a discrimination tree (DTΓ), which we briefly discuss in Section 5.

Scenario 2. Now imagine a more complicated (but still quite common) case: L1 is in the corpus, but has a more complicated proof, and CA is not in the corpus (yet). For example, 1 and 2 are defined as von Neumann ordinals, where addition is the standard ordinal addition, which is not commutative in general. Since CA is not in Γ, it is not in its discrimination tree either, and L1 and L2 will not get the common matching feature corresponding to CA. But it is still quite likely that the proof of L1 is relevant also for proving L2: if we are lucky, the proof is sufficiently general and actually derives the commutativity of ordinal addition for finite ordinals behind the scenes. This means that quite often the common matching feature is still interesting, even if not explicitly present in Γ.

One might again first try brute force and attempt to generate all common generalizations of all pairs of terms in Γ; again, see Figure 2 for an example where such a lattice is exponentially large. It is however no longer possible to rely just on the terms that are already in Γ as in the previous case, and some method for generating common matching features is needed. We have proposed and implemented two kinds of heuristic solutions. The first inserts for each term t a small number of more general terms into DTΓ, typically by generalizing the term only linearly-many times (see Section 5 for details); some of these generalization methods actually correspond to the internal nodes of DTΓ. The second collects such (differently optimized) generalization nodes explicitly, using another kind of efficient ATP indexing datastructure called a substitution tree (STΓ), described in more detail in Section 6 (see Figure 4). This means that just by building STΓ we naturally obtain for each term t in Γ a set of (some) generalizing features of t: the set STΓ^Anc(t) of the ancestors of t in STΓ. Even the substitution tree, however, cannot (reasonably) guarantee that every two terms in Γ have their least general generalization (lgg) in the tree; such a requirement would again result in exponential size for examples such as Figure 2.

Scenario 3. Finally, what substitution trees can guarantee (and what they are usually used for in ATPs) is efficient retrieval of all unifying terms. The easiest semantic motivation for collecting such features comes from the resolution rule, which is the basis of many ATP procedures. Given formulas p(X, a) and p(a, X) =⇒ False, the resolution rule derives False by unifying p(X, a) and p(a, X). Note that the two unifying literals do not match in either direction; however, they will always have a nontrivial lgg.
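Scenarios 2 and 3 both revolve around computing common generalizations of two terms. The following is a minimal sketch of first-order anti-unification (lgg computation), assuming terms represented as nested tuples; it illustrates the operation itself, not the implementation used in the paper.

```python
def lgg(s, t, subst=None):
    """Least general generalization (anti-unification) of two first-order
    terms, the 'nontrivial lgg' of Scenario 3.  A term is a nested tuple
    (symbol, arg1, ..., argn); constants and variables are plain strings.
    A sketch under these representation assumptions, not the paper's code.
    """
    if subst is None:
        subst = {}          # maps disagreement pairs to shared variables
    # Same function symbol and arity: descend into the arguments.
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        return (s[0],) + tuple(lgg(a, b, subst) for a, b in zip(s[1:], t[1:]))
    if s == t:
        return s
    # Disagreement: one shared variable per (s, t) pair, so that
    # lgg(p(X,a), p(a,X)) = p(V0, V1) and not p(V0, V0).
    if (s, t) not in subst:
        subst[(s, t)] = 'V%d' % len(subst)
    return subst[(s, t)]

# The lgg of 1 + 2 = 2 + 1 and 3 + 7 = 7 + 3 is X + Y = Y + X up to renaming:
l1 = ('=', ('+', '1', '2'), ('+', '2', '1'))
l2 = ('=', ('+', '3', '7'), ('+', '7', '3'))
print(lgg(l1, l2))   # ('=', ('+', 'V0', 'V1'), ('+', 'V1', 'V0'))
```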


[Figure 2 (diagram): the generalization lattice above the four ground terms f(1,0,0,0), f(0,1,0,0), f(0,0,1,0), f(0,0,0,1), with the layer of two-variable terms f(X,Y,0,0), f(X,0,Y,0), f(X,0,0,Y), f(0,X,Y,0), f(0,X,0,Y), f(0,0,X,Y), the layer of three-variable terms f(X',Y',Z',0), f(X',Y',0,Z'), f(X',0,Y',Z'), f(0,X',Y',Z'), the top term f(V'',X'',Y'',Z''), and the root R.]
Figure 2: The lattice of least general generalizations of four specific ground terms. To calculate the number $F(n)$ of such least general generalizations for some $n$, we count all nodes in every layer (from the bottom layer to the top): $F(n) = \binom{n}{2} + \binom{n}{3} + \cdots + \binom{n}{n-1} + \binom{n}{n} = \sum_{k=2}^{n} \binom{n}{k}$, from which the Binomial theorem gives $F(n) = 2^n - n - 1$. So many generalizations cannot be effectively extracted and used as features for more than about a hundred such terms.

Figure 3: A discrimination tree containing the terms f(g(a, ∗), c), f(g(∗, b), ∗), f(g(a, b), a), f(g(∗, c), b), and f(∗, ∗).
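The count in the Figure 2 caption can be checked numerically; a small worked example using Python's standard math.comb:

```python
from math import comb

# Layers of the generalization lattice in Figure 2: a node replaces
# k >= 2 of the n argument positions by distinct variables, giving
# C(n, k) nodes in layer k.
for n in range(2, 11):
    assert sum(comb(n, k) for k in range(2, n + 1)) == 2**n - n - 1
print("F(n) = 2^n - n - 1 verified for n = 2..10")
```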

5 Discrimination Trees for Matching and Generalization Features

Selection of the candidate clause-heads that unify with, match, or subsume a given goal is a central operation in automated deduction. In order to perform such selection efficiently, all major theorem provers use term-indexing techniques [Robinson and Voronkov, 2001]. Discrimination trees, as first implemented by [Greenbaum, 1986], index terms in a trie keyed on the single path-strings of the indexed terms (Figure 3, taken from [Robinson and Voronkov, 2001]). A discrimination tree can be constructed efficiently by inserting each term in traversal preorder. Since discrimination trees are based on path indexing, retrieval of generalizations is straightforward, but retrieval of unifiable terms is more involved.

To address Scenario 1, we create a discrimination net with all the available terms in the corpus inserted in the net. An efficient implementation is needed, since the largest mathematical corpora contain millions of terms. The result of a lookup then corresponds to first-order matching: all terms in the corpus that are more general than the queried one are returned. In order to extend this to generalizations that are shared between terms but not yet in the net (Scenario 2), we perform heuristic generalizations, each consisting of selecting a subterm of the term and replacing it by a variable. We consider several strategies for selecting the subterms to generalize:
• repeated right-most inner-most;
• repeated left-most inner-most;
• all positions;
• combinations of the above (quadratic in the size of the term), including the combination of the previous one with itself.

To avoid redundant generation of generalizations of terms that are already in the net, the generation of subterms and their right-most inner-most generalizations is done together. We first iterate over the term top-down, and for each subterm level we try to insert its generalizations iteratively. If a particular generalization is already present in the net, we do not compute any further generalizations, but instead proceed to the next subterm. This optimization brings the sum of the feature-extraction times for all formulas below 100s for the largest formal mathematical corpus available (MML1147), containing millions of terms; see Section 7 (Table 2).

6 Substitution Trees for Unification and Generalization Features

Substitution trees [Graf, 1995] are a term-indexing technique based on unifiability checking rather than simple equality tests. To this end, substitution trees keep substitutions in their nodes, which allows them to be smaller than other indexing techniques. Since the traversal order is not fixed, substitution trees need to compute the common generalizations of terms during insertion.

Our implementation uses the so-called linear substitution trees in the same way as described in [Graf, 1996; Robinson and Voronkov, 2001]. Retrieval from a substitution tree may require more backtracking steps than in other indexing techniques, but the retrieval of unifiable terms is straightforward: it amounts to following all the paths that contain substitutions compatible with the given term. This is simpler than, for example, implementing unification on discrimination trees [Hoder and Voronkov, 2009].

To get interesting generalizations as features of a term t that is already inserted in the tree, we extract the path from the root to its leaf. Each node on this path represents a generalization of the term t (Figure 4).
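To make the Section 5 lookup concrete, here is a minimal discrimination-net sketch over flattened first-order terms. It assumes terms represented as nested tuples with '*' for variables and ground query terms, and it illustrates the idea only, not the optimized implementation evaluated in this paper.

```python
class DiscNet:
    """Minimal discrimination net: a trie over the preorder symbol
    string of each term.  A term is a nested tuple (symbol, args...);
    '*' stands for a variable and matches one whole subterm."""

    def __init__(self):
        self.children = {}
        self.terms = []                          # terms ending here

    @staticmethod
    def flatten(t):
        if t == '*':
            return ['*']
        if isinstance(t, tuple):
            out = [(t[0], len(t) - 1)]           # symbol with its arity
            for arg in t[1:]:
                out += DiscNet.flatten(arg)
            return out
        return [(t, 0)]                          # constant

    def insert(self, term):
        node = self
        for sym in self.flatten(term):
            node = node.children.setdefault(sym, DiscNet())
        node.terms.append(term)

    def generalizations(self, term):
        """All stored terms matching (more general than) a ground `term`:
        exactly the Gen_Gamma(t) matching features of Scenario 1."""
        found = []

        def walk(node, key):
            if not key:
                found.extend(node.terms)
                return
            if key[0] in node.children:          # follow the exact symbol
                walk(node.children[key[0]], key[1:])
            if '*' in node.children:             # or let a stored variable
                skip, rest = 1, key              # consume one whole subterm
                while skip:
                    skip += rest[0][1] - 1
                    rest = rest[1:]
                walk(node.children['*'], rest)

        walk(self, self.flatten(term))
        return found

net = DiscNet()
for t in [('f', ('g', 'a', '*'), 'c'), ('f', '*', '*'), ('f', ('g', '*', 'b'), '*')]:
    net.insert(t)
# All three stored terms generalize f(g(a,b),c):
print(net.generalizations(('f', ('g', 'a', 'b'), 'c')))
```

Every term returned by generalizations(t) contributes one matching feature of t; the ABS features of Section 6 are obtained analogously, as the ancestors of t's leaf in the substitution tree.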

ROOT = subset(B,C)
|- [B=D, C=D]
|- [B=a, C=E]
|    |- [E=a]
|    |- [E=c]
|    |- [E=b]
|- [B=F, C=G]

Figure 4: A substitution tree of the terms subset(A,B), subset(a,b), subset(a,c), subset(C,C), subset(a,a). For the term subset(a,c) there is a path containing the following generalized terms (from the root to the leaf): ROOT, subset(B,C), subset(a,E), subset(a,c).

7 Experimental Analysis

Benchmark Data and Evaluation Scenario

The MPTP2078 benchmark consists of 2078 related large-theory problems (conjectures) extracted from the Mizar library. These problems contain 4494 unique formulas used as conjectures and axioms. The formulas and problems have a natural linear ordering derived from their (chronological) order of appearance in the Mizar library. As usual in such large-theory evaluations [Kaliszyk and Urban, 2014], we emulate the scenario of using AI/ATP assistance in interactive theorem proving: for each conjecture C we assume that all formulas stated earlier in the development can be used to prove C. This scenario results in large ATP problems that have 1877 axioms on average. Such problems are typically difficult to solve without techniques for pre-selection of the most relevant axioms (cf. Table 4).

For each conjecture C we also use its ITP (human-written) proof to extract only the premises needed for the ITP proof of C, i.e., P(C). (In general, we could also use ATP proofs of the previous theorems alongside their ITP proofs for the learning, but we do not complicate the evaluation setting here.) This proof information is used in three ways: (i) to train the machine learners on all previous proofs for each conjecture C, (ii) to compare the premises predicted by such trained machine learners with the actual proof premises P(C), and (iii) to construct the small ATP problem for C, which (unlike the large version) contains as axioms only the (few) premises P(C); in this way, the ATP is very significantly advised by the human author of the ITP proof.

Speed and Machine-Learning Performance

The features that we evaluate are summarized in Table 1. We limit the evaluation to two fast learning methods, distance-weighted k-NN and naive Bayes; the latter performs significantly worse.

Name | Description
SYM | Constant and function symbols
TRM0 | Subterms, all variables unified
TRMα | Subterms, de Bruijn normalized
MAT∅ | Matching terms, no generalizations
MATr | Repeated gener. of rightmost innermost constant
MATl | Repeated gener. of leftmost innermost constant
MAT1 | Gener. of each application argument
MAT2 | Gener. of each application argument pair
MAT∪ | Union of all above generalizations
PAT | Walks in the term graph
ABS | Substitution tree nodes
UNI | All unifying terms

Table 1: Summary of all the features used.

Table 2 shows the numbers of features obtained on MPTP2078, together with the feature-extraction and learning speeds on those features. To see how the feature-extraction process scales, we have also tested the methods on the whole MML library (version 1147) containing 146500 formulas. Practically all the extraction methods scale well to this corpus (we did not run UNI, due to its size), typically taking less than 100 seconds to process the whole corpus. This means that computing the features of a new conjecture which is being proved over the (already memory-loaded) corpus takes only milliseconds. Such times are negligible in comparison with standard ATP times (seconds).

Table 3 shows the standard machine-learning (k-NN) evaluation, comparing the actual (needed) ITP proof premises of each conjecture C (i.e., P(C)) with the predictions of the machine learners trained on all ITP proofs preceding C. The average rank of a needed premise on MPTP2078 decreases from 53.03 with the best old feature method TRM0 (formulas characterized by all their subterms with all variables renamed to just one) to 43.43 with the best new method MAT∅ (using the set of all matching terms as features, without any generalizations). This is a large improvement of the standard machine-learning prediction: thanks to better features, the ranking of needed premises improves by 22.1%. The best combination of features (SYM|TRM0|MAT∅|ABS) improves this further to 41.97, which is a big difference for the ATP search, evaluated next.

Method | 100Cover | Prec | Recall | AUC | Rank
MAT∅ | 0.918 | 17.711 | 234.12 | 0.9561 | 43.43
MAT1 | 0.918 | 17.711 | 234.69 | 0.9557 | 43.83
MATl | 0.918 | 17.702 | 235.04 | 0.9555 | 43.91
MATr | 0.917 | 17.7 | 234.31 | 0.9557 | 43.84
MAT2 | 0.917 | 17.708 | 235.37 | 0.9554 | 44.06
ABS | 0.917 | 17.686 | 237.89 | 0.9542 | 44.11
PAT | 0.916 | 17.64 | 235.2 | 0.9557 | 44.13
MAT∪ | 0.916 | 17.672 | 236.31 | 0.9551 | 44.4
TRM0 | 0.903 | 17.425 | 281.46 | 0.9447 | 53.03
UNI | 0.891 | 16.822 | 257.1 | 0.9465 | 51.83
SYM | 0.884 | 17.137 | 326.67 | 0.9325 | 63.21
TRMα | 0.861 | 16.801 | 378.9 | 0.9156 | 75.52
SYM|TRM0|MAT∅|ABS | 0.922 | 17.737 | 227.7 | 0.9587 | 41.97

Table 3: Machine-learning evaluation: 100Cover = average ratio of the proof premises present in the first 100 advised premises; Prec = precision; Recall = average number of premises needed to recall the whole training set; AUC = area under the ROC curve; Rank = average rank of a proof premise. The methods are sorted by their 100Cover.
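The Rank and 100Cover columns of Table 3 correspond to the following computation (a sketch of the metrics only, assuming per-conjecture rankings and the needed-premise sets P(C); this is not the evaluation harness itself):

```python
def rank_and_cover(predictions, cutoff=100):
    """predictions: iterable of (ranked, needed) pairs per conjecture,
    where `ranked` lists premises best-first and `needed` is the set
    P(C) extracted from the ITP proof."""
    ranks, covers = [], []
    for ranked, needed in predictions:
        position = {p: i + 1 for i, p in enumerate(ranked)}
        # Rank: position of each needed premise in the advice.
        ranks.extend(position[p] for p in needed if p in position)
        # 100Cover: fraction of needed premises in the top `cutoff`.
        covers.append(len(needed & set(ranked[:cutoff])) / len(needed))
    return sum(ranks) / len(ranks), sum(covers) / len(covers)
```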

Method | Extraction (s) MPTP2078 | Extraction (s) MML1147 | Features total | Features unique | k-NN (s) | Naive Bayes (s)
SYM | 0.25 | 10.52 | 30996 | 2603 | 0.96 | 11.80
TRMα | 0.11 | 12.04 | 42685 | 10633 | 0.96 | 24.55
TRM0 | 0.13 | 13.31 | 35446 | 6621 | 1.01 | 16.70
MAT∅ | 0.71 | 38.45 | 57565 | 7334 | 1.49 | 24.06
MATr | 1.09 | 71.21 | 78594 | 20455 | 1.51 | 39.01
MATl | 1.22 | 113.19 | 75868 | 17592 | 1.50 | 37.47
MAT1 | 1.16 | 98.32 | 82052 | 23635 | 1.55 | 41.13
MAT2 | 5.32 | 4035.34 | 158936 | 80053 | 1.65 | 96.41
MAT∪ | 6.31 | 4062.83 | 180825 | 95178 | 1.71 | 112.66
PAT | 0.34 | 64.65 | 118838 | 16226 | 2.19 | 52.56
ABS | 11 | 10800 | 56691 | 6360 | 1.67 | 23.40
UNI | 25 | N/A | 1543161 | 6462 | 21.33 | 516.24

Table 2: Feature numbers and learning/prediction speed on MPTP2078, and extraction speed on MPTP2078 and MML1147.

Evaluation of the Theorem-Proving Performance

First we measure the performance of unaided ATPs running for 60 seconds on the large and small versions of the problems; see Table 4. While Vampire outperforms E, particularly on the large problems, we choose E for the complete evaluation which includes premise selection, because the machine-learning advice will not interact with the SInE selection heuristics [Hoder and Voronkov, 2011] used by Vampire very frequently on the large problems.

ATP | E 1.8 (large) | E 1.8 (small) | Vampire 2.6 (large) | Vampire 2.6 (small) | Z3 (small)
Proved | 573 | 1210 | 907 | 1319 | 1065

Table 4: Unaided ATPs on the large and small problems.

The complete evaluation then proceeds as follows. For each of the best new and old premise-selection methods we create six versions of the ATP problems, by taking the top-ranking segments of 8, 16, 32, 64, 128, and 256 predicted premises, and we run E on each such problem for 10 seconds. Each problem thus takes again at most 60 seconds altogether, allowing us to compare the results also with the 60-second unaided ATP runs on the large and small problems. Table 5 then compares the ATP performance (measured as the union of the six slices), also adding the best-performing combination of the old and new features. The best new method MAT∅ solves 80 problems (8%) more than the best old method TRM0, and the best-performing combination solves 103 problems (10%) more. This is getting close to the performance of E on the human-advised premises (1210 in Table 4). The behaviour of the ATP with respect to the number of premises used is depicted in Figure 5, showing even higher improvements.

Method | Proved (%) | Theorems
MAT∅ | 54.379 | 1130
MATr | 54.331 | 1129
MATl | 54.283 | 1128
PAT | 54.235 | 1127
MAT∪ | 53.994 | 1122
MAT1 | 53.994 | 1122
MAT2 | 53.898 | 1120
ABS | 53.802 | 1118
TRM0 | 50.529 | 1050
UNI | 50.241 | 1044
SYM | 48.027 | 998
TRMα | 43.888 | 912
SYM|TRM0|MAT∅|ABS | 55.486 | 1153

Table 5: The E 1.8 prover on the large problems filtered by learning-based premise selection using different features. The methods are sorted by the number of theorems proved.

Figure 5: ATP (E 1.8) performance (in % of all large problems) for selected premise numbers and features. [Plot: one curve per method (MAT∅, PAT, ABS, TRM0, SYM, UNI) over premise counts 8, 16, 32, 64, 128, 256; y-axis 0-40%.]
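As an illustration of the slicing protocol above, here is a small driver sketch. The write_tptp_problem helper is hypothetical, and the exact eprover invocation and output check are assumptions of the sketch, not the paper's actual harness:

```python
import subprocess

CUTOFFS = (8, 16, 32, 64, 128, 256)

def try_prove(conjecture, ranked_premises, timeout=10):
    """Try the six cutoff segments of the premise ranking in turn.
    write_tptp_problem (hypothetical) writes a TPTP file containing the
    conjecture and the given axioms; the SZS output check assumes E's
    standard status line for first-order problems."""
    for n in CUTOFFS:
        problem = write_tptp_problem(conjecture, ranked_premises[:n])
        run = subprocess.run(
            ["eprover", "--auto", "--cpu-limit=%d" % timeout, problem],
            capture_output=True, text=True)
        if "SZS status Theorem" in run.stdout:
            return n      # proved within this cutoff segment
    return None           # all six 10-second slices failed
```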

8 Conclusion

We have shown that even when dealing with the largest formal corpora, one can have efficient features that encode important semantic relations and thus make the selection of relevant knowledge much more precise. The newly introduced semantic features significantly improve the selection of relevant knowledge from large formal mathematical corpora. This improvement is 22% in terms of the average predicted rank, while the combination of the new and old features increases the intelligence of the advising to a level that is nearly equal to that of the human formalizers when comparing the final ATP performance. In particular, the best new method MAT∅ proves 8% more theorems automatically than the best old method. The cost of manually producing these proofs is very high: the Flyspeck project took about 25 person-years and the Mizar library about 100–150 person-years, so the reported improvements just for these two corpora translate to 2 and 8–12 person-years, respectively. There is a large number of directions for future work, including employing such semantic features for internal guidance in systems like MaLeCoP and in resolution/superposition ATPs, re-using their efficient term-indexing datastructures.

References
[Alama et al., 2014] Jesse Alama, Tom Heskes, Daniel Kühlwein, Evgeni Tsivtsivadze, and Josef Urban. Premise selection for mathematics by corpus analysis and kernel methods. J. Autom. Reasoning, 52(2):191–213, 2014.

[de Moura and Bjørner, 2008] Leonardo Mendonça de Moura and Nikolaj Bjørner. Z3: An efficient SMT solver. In C. R. Ramakrishnan and Jakob Rehof, editors, TACAS, volume 4963 of LNCS, pages 337–340. Springer, 2008.

[Deerwester et al., 1990] Scott C. Deerwester, Susan T. Dumais, Thomas K. Landauer, George W. Furnas, and Richard A. Harshman. Indexing by latent semantic analysis. JASIS, 41(6):391–407, 1990.

[Gonthier et al., 2013] Georges Gonthier, Andrea Asperti, Jeremy Avigad, Yves Bertot, Cyril Cohen, François Garillot, Stéphane Le Roux, Assia Mahboubi, Russell O'Connor, Sidi Ould Biha, Ioana Pasca, Laurence Rideau, Alexey Solovyev, Enrico Tassi, and Laurent Théry. A machine-checked proof of the Odd Order Theorem. In Sandrine Blazy, Christine Paulin-Mohring, and David Pichardie, editors, ITP, volume 7998 of LNCS, pages 163–179. Springer, 2013.

[Grabowski et al., 2010] Adam Grabowski, Artur Korniłowicz, and Adam Naumowicz. Mizar in a nutshell. J. Formalized Reasoning, 3(2):153–245, 2010.

[Graf, 1995] Peter Graf. Substitution tree indexing. In RTA, volume 914 of LNCS, pages 117–131, 1995.

[Graf, 1996] Peter Graf. Term Indexing, volume 1053 of LNCS. Springer, 1996.

[Greenbaum, 1986] Steven Greenbaum. Input transformations and resolution implementation techniques for theorem-proving in first-order logic. PhD thesis, University of Illinois at Urbana-Champaign, 1986.

[Hales, 2012] Thomas Hales. Dense Sphere Packings: A Blueprint for Formal Proofs, volume 400 of London Mathematical Society Lecture Note Series. Cambridge University Press, 2012.

[Harrison, 1996] John Harrison. HOL Light: A tutorial introduction. In Mandayam K. Srivas and Albert John Camilleri, editors, FMCAD, volume 1166 of LNCS, pages 265–269. Springer, 1996.

[Hoder and Voronkov, 2009] Krystof Hoder and Andrei Voronkov. Comparing unification algorithms in first-order theorem proving. In Bärbel Mertsching, Marcus Hund, and Muhammad Zaheer Aziz, editors, KI 2009: Advances in Artificial Intelligence, volume 5803 of LNCS, pages 435–443. Springer, 2009.

[Hoder and Voronkov, 2011] Krystof Hoder and Andrei Voronkov. Sine qua non for large theory reasoning. In Nikolaj Bjørner and Viorica Sofronie-Stokkermans, editors, CADE, volume 6803 of LNCS, pages 299–314. Springer, 2011.

[Kaliszyk and Urban, 2013a] Cezary Kaliszyk and Josef Urban. Lemma mining over HOL Light. In Kenneth L. McMillan, Aart Middeldorp, and Andrei Voronkov, editors, LPAR, volume 8312 of LNCS, pages 503–517. Springer, 2013.

[Kaliszyk and Urban, 2013b] Cezary Kaliszyk and Josef Urban. Stronger automation for Flyspeck by feature weighting and strategy evolution. In Jasmin Christian Blanchette and Josef Urban, editors, PxTP 2013, volume 14 of EPiC Series, pages 87–95. EasyChair, 2013.

[Kaliszyk and Urban, 2014] Cezary Kaliszyk and Josef Urban. Learning-assisted automated reasoning with Flyspeck. J. Autom. Reasoning, 53(2):173–213, 2014.

[Klein et al., 2010] Gerwin Klein, June Andronick, Kevin Elphinstone, Gernot Heiser, David Cock, Philip Derrin, Dhammika Elkaduwe, Kai Engelhardt, Rafal Kolanski, Michael Norrish, Thomas Sewell, Harvey Tuch, and Simon Winwood. seL4: Formal verification of an operating-system kernel. Commun. ACM, 53(6):107–115, 2010.

[Kovács and Voronkov, 2013] Laura Kovács and Andrei Voronkov. First-order theorem proving and Vampire. In Natasha Sharygina and Helmut Veith, editors, CAV, volume 8044 of LNCS, pages 1–35. Springer, 2013.

[Kuehlwein and Urban, 2013] Daniel Kuehlwein and Josef Urban. Learning from multiple proofs: First experiments. In Pascal Fontaine, Renate A. Schmidt, and Stephan Schulz, editors, PAAR-2012, volume 21 of EPiC Series, pages 82–94. EasyChair, 2013.

[Kühlwein et al., 2012] Daniel Kühlwein, Twan van Laarhoven, Evgeni Tsivtsivadze, Josef Urban, and Tom Heskes. Overview and evaluation of premise selection techniques for large theory mathematics. In IJCAR, pages 378–392, 2012.

[Kühlwein et al., 2013] Daniel Kühlwein, Jasmin Christian Blanchette, Cezary Kaliszyk, and Josef Urban. MaSh: Machine learning for Sledgehammer. In Sandrine Blazy, Christine Paulin-Mohring, and David Pichardie, editors, ITP, volume 7998 of LNCS, pages 35–50. Springer, 2013.

[Meng and Paulson, 2009] Jia Meng and Lawrence C. Paulson. Lightweight relevance filtering for machine-generated resolution problems. J. Applied Logic, 7(1):41–57, 2009.

[Nipkow and Klein, 2014] Tobias Nipkow and Gerwin Klein. Concrete Semantics - With Isabelle/HOL. Springer, 2014.

[Robinson and Voronkov, 2001] John Alan Robinson and Andrei Voronkov, editors. Handbook of Automated Reasoning (in 2 volumes). Elsevier and MIT Press, 2001.

[Schulz, 2000] Stephan Schulz. Learning Search Control Knowledge for Equational Deduction, volume 230 of DISKI. Infix Akademische Verlagsgesellschaft, 2000.

[Schulz, 2002] Stephan Schulz. E - A Brainiac Theorem Prover. AI Commun., 15(2-3):111–126, 2002.

[Sutcliffe, 2013] Geoff Sutcliffe. The 6th IJCAR automated theorem proving system competition - CASC-J6. AI Commun., 26(2):211–223, 2013.

[Tammet, 1997] Tanel Tammet. Gandalf. J. Autom. Reasoning, 18:199–204, 1997.

[Turing, 1950] A. M. Turing. Computing machinery and intelligence. Mind, 59(236):433–460, 1950.

[Urban et al., 2008] Josef Urban, Geoff Sutcliffe, Petr Pudlák, and Jiří Vyskočil. MaLARea SG1 - Machine Learner for Automated Reasoning with Semantic Guidance. In IJCAR, pages 441–456, 2008.

[Urban et al., 2010] Josef Urban, Krystof Hoder, and Andrei Voronkov. Evaluation of automated theorem proving on the Mizar Mathematical Library. In ICMS, pages 155–166, 2010.

[Urban et al., 2011] Josef Urban, Jiří Vyskočil, and Petr Štěpánek. MaLeCoP: Machine learning connection prover. In Kai Brünnler and George Metcalfe, editors, TABLEAUX, volume 6793 of LNCS, pages 263–277. Springer, 2011.

[Veroff, 1996] Robert Veroff. Using hints to increase the effectiveness of an automated reasoning program: Case studies. J. Autom. Reasoning, 16(3):223–239, 1996.
