
UCAM-CL-TR-792 Technical Report ISSN 1476-2986

Number 792

Computer Laboratory

Machine learning and automated theorem proving

James P. Bridge

November 2010

15 JJ Thomson Avenue, Cambridge CB3 0FD, United Kingdom
phone +44 1223 763500
http://www.cl.cam.ac.uk/

© 2010 James P. Bridge

This technical report is based on a dissertation submitted October 2010 by the author for the degree of Doctor of Philosophy to the University of Cambridge, Corpus Christi College.

Technical reports published by the University of Cambridge Computer Laboratory are freely available via the Internet:

http://www.cl.cam.ac.uk/techreports/

Machine learning and automated theorem proving

James P. Bridge

Summary

Computer programs to find formal proofs of theorems have a history going back nearly half a century. Originally designed as tools for mathematicians, modern applications of automated theorem provers and proof assistants are much more diverse. In particular they are used to verify software and hardware designs, preventing costly, or life threatening, errors from being introduced into systems ranging from microchips to controllers for medical equipment or space rockets. Despite this, the high level of human expertise required in their use means that theorem proving tools are not widely used by non-specialists, in contrast to computer algebra packages which also deal with the manipulation of symbolic mathematics.

The work described in this dissertation addresses one aspect of this problem, that of heuristic selection in automated theorem provers. In theory such theorem provers should be automatic and therefore easy to use; in practice the heuristics used in the proof search are not universally optimal for all problems, so human expertise is required to determine heuristic choice and to set parameter values.

Modern machine learning has been applied to the automation of heuristic selection in a first order logic theorem prover. One objective was to find whether there are any features of a proof problem that are both easy to measure and provide useful information for determining heuristic choice. Another was to determine and demonstrate a practical approach to making theorem provers truly automatic.

In the experimental work, heuristic selection based on features of the conjecture to be proved and the associated axioms is shown to do better than any single heuristic. Additionally, a comparison has been made between static features, measured prior to the proof search process, and dynamic features that measure changes arising in the early stages of proof search.
Further work was done on determining which features are important, demonstrating that good results are obtained using only a few features.

Acknowledgments

I would like to thank my two supervisors, Professor Lawrence Paulson and Dr Sean Holden, for their guidance, feedback and encouragement during my research. Thanks must also go to the author of the theorem prover “E”, Dr Stephan Schulz, who has with great patience responded promptly to many e-mails.

Contents

1 Motivation 11
1.1 The thesis 11
1.2 Applications of automated theorem provers 12
1.3 Choice of theorem prover 13
1.3.1 Automation versus expressive power 14
1.3.2 SAT solvers 14
1.3.3 First order logic theorem provers 15
1.3.4 Proof assistants 15
1.3.5 Prover used 15
1.4 Importance of heuristic selection 16
1.5 Motivation for using machine learning 16
1.6 Dissertation summary 17

2 Background 19
2.1 Logic 19
2.1.1 Logic levels or languages 19
2.1.2 Proof methods 25
2.1.3 Decidability and semi-decidability 29
2.1.4 Expressive power 30
2.2 ATPs versus proof assistants 31
2.3 Resolution based theorem proving 31
2.3.1 Resolution and related calculi 31
2.3.2 Practical implementations 36
2.4 Machine learning 37
2.4.1 General concepts 37
2.4.2 Machine learning approaches 39
2.4.3 Decision trees 40

2.4.4 Linearly separable classes 41
2.4.5 Perceptrons 44
2.4.6 Margin 45
2.4.7 Transforming the feature space 45
2.4.8 Kernel functions arising from transformed space 46
2.4.9 The support vector machine 47
2.4.10 Nonseparable data and soft margin classifiers 49
2.4.11 Alternatives to SVMs 50
2.4.12 Feature selection 51
2.5 Applying machine learning to theorem proving 53
2.5.1 TEAMWORK and the E-theorem prover 53
2.5.2 Neural networks and folding architecture networks 55
2.5.3 Learning with symbols and large libraries 55
2.5.4 Proof planning (Omega project) 56
2.6 Summary 56

3 Methodology 59
3.1 Generic description of experimental method 59
3.2 Data — conjectures to be proved 60
3.3 Measuring features 60
3.4 Dynamic and static features 61
3.5 Theorem prover used 62
3.6 Selecting the heuristics for the working set 63
3.6.1 Clause selection within heuristics 63
3.7 Fitting a support vector machine - SVMLight 63
3.8 Kernel functions 64
3.8.1 Linear basis function kernel 64
3.8.2 Polynomial kernel 65
3.8.3 Sigmoid tanh kernel 65
3.8.4 Radial basis function kernel 65
3.9 Custom software 65
3.10 Overview of experimental work 66
3.11 Computer hardware used 66
3.12 Summary 66


4 Initial experiment 69
4.1 Classification problem 70
4.2 Data used 70
4.3 Heuristic used 71
4.4 Running the theorem prover 71
4.5 Training and test data sets 71
4.6 Features measured 72
4.7 Using SVMLight and kernel selection 72
4.7.1 Linear kernel 72
4.7.2 Radial basis function kernel 73
4.7.3 Sigmoid tanh kernel 74
4.7.4 Polynomial kernel 74
4.7.5 Further investigation of the radial basis function 74
4.8 Filtering features 76
4.9 Results for reduced feature set 77
4.10 Summary 77

5 Heuristic selection experiment 79
5.1 Selecting a working set of heuristics 79
5.2 Data used 80
5.3 Feature sets 81
5.4 Initial separate classifiers 82
5.5 Automatic heuristic selection 82
5.6 Performance measures for classifiers 83
5.7 Unextended feature set experiments 84
5.7.1 First classifications 84
5.7.2 Identical learning and test sets 88
5.7.3 First results of heuristic selection 89
5.8 Experiments with extended feature sets 91
5.8.1 Classifications with extended feature set 91
5.8.2 Heuristic selection with extended feature set 92
5.9 Further analysis 94
5.10 Conclusions 95


6 Feature selection 97
6.1 Selectively removing features 97
6.1.1 Ordering features for removal 98
6.1.2 Including optimisation and other improvements 99
6.2 Testing small and large feature subsets 100
6.2.1 Improving the data sets 101
6.2.2 Enumerating the subsets 101
6.2.3 Coding for parallel execution 101
6.2.4 Looking at large subsets 102
6.2.5 Analysis of three feature subset results 102
6.2.6 Partial extension to four feature subsets 103
6.2.7 Results for fixed heuristics 104
6.2.8 Varying gamma for the best subsets 105
6.2.9 The best two features (7 and 52) 106
6.3 Small subset results without H0 106
6.3.1 Results without H0 106
6.3.2 Feature scores without H0 107
6.3.3 The best three features without H0 (10, 14 and 15) 107
6.4 Random heuristic selection 107
6.5 Summary and conclusions 108

7 Conclusions 139
7.1 Related work 139
7.2 What was learned about machine learning 140
7.2.1 Kernel function 140
7.2.2 Static and dynamic features 140
7.2.3 Comparative versus absolute margins 141
7.3 Conclusions 141
7.3.1 Machine learning is applicable to theorem provers 141
7.3.2 Very few features are needed 141
7.3.3 H0 filtering saves time at the expense of theorems proved 141
7.3.4 Matching features to heuristics 142
7.4 Further work 142


A Details of features 153
A.1 Initial feature set 153
A.2 Extended feature set 154
A.2.1 Static feature set 154
A.2.2 Dynamic feature set 155

B Details of heuristics 157
B.1 Heuristic used in initial experiment 157
B.2 Heuristics used in working set 158
B.2.1 Heuristic 1 159
B.2.2 Heuristic 2 159
B.2.3 Heuristic 3 160
B.2.4 Heuristic 4 161
B.2.5 Heuristic 5 162

C Results of varying parameter C 165
C.1 Results for subset with features 7 and 52 165
C.2 Results for subset with features 10, 14 and 15 166
C.3 Results for the full feature set 166
C.4 Extending the range of C variation up to 10,000 166
C.5 Conclusions 166

Index 179


Chapter 1

Motivation

1.1 The thesis

This dissertation concerns computer software designed to find a mathematical proof of an expression stated in a formal language. The expression is a conjecture and is associated with other expressions, the axioms, which are already known to be, or assumed to be, true. The formal language is a logic, of which there are several types, and the software is a theorem prover. These terms are covered in detail in chapter 2.

The thesis of this dissertation is that the choice of the best proof search heuristic to use in an automated first order logic theorem prover may be related to measurable features of the conjecture and associated axioms and that this relationship may be accurately approximated by a function obtained using machine learning.

The thesis is worth investigating for several reasons. First, automated theorem provers have many applications but their use is restricted by the need for human expertise, and a key part of the expertise required is in the selection of the appropriate heuristic for a particular problem. Second, though machine learning has previously been applied to theorem provers, earlier work has concentrated on the learning of new heuristics, with limited success. Selection between known good heuristics represents a novel application of machine learning¹. Third, analysis of the machine learning results, in terms of which measured features are of importance, provides insight into the structure of the proof problem which is interesting in its own right.

The thesis involves logic and machine learning which are both extensive fields and detailed background is given in the next chapter. The present chapter covers motivation in terms of applications of automated theorem provers, reasons for the selection of the particular type of theorem prover selected, the importance of heuristic selection to the theorem proving process and finally some justification for applying machine learning to the problem of heuristic selection.

¹Automatic heuristic selection is provided in the theorem prover E but this is based on prior experimentation rather than any recognised method of machine learning and is discussed in more detail later in this dissertation.

1.2 Applications of automated theorem provers

A detailed description of theorem provers and different logics is given in the background chapter that follows. For the purpose of this chapter a theorem prover is a computer program that is given a mathematical or logical statement (a conjecture) and seeks to find a proof that the statement is always true (a theorem) or is not. The logical language is assumed to be first order logic and the proof search is taken to be automatic. Theorem provers working in higher order logic and requiring a high degree of user intervention will be referred to as proof assistants.

As with many software tools, automated theorem provers were originally designed for a single purpose (computer mathematics) but now have a wide range of potential applications, which provide motivation for the work of making the theorem prover more accessible.

One application of fully automated theorem provers is to work in conjunction with more flexible but less automated proof assistants. Interactive proof assistants such as Isabelle [62] or HOL [29] are very flexible in terms of the descriptive power of the logics that may be used with them, but they require a lot of expert input from the user. Automated theorem provers are much easier to use but are restricted in their descriptive power. By combining the two, using the automated theorem prover for those portions of the proof that may be expressed in first order logic (for instance), the overall proof process is made both quicker and easier. See the work of Meng, Paulson and Quigley [51, 33, 52].

A second application of theorem provers is that for which they were first developed; that is, as a tool for mathematicians. This application is listed here more for completeness than for its current importance, as despite decades of development and the power of modern computers, useful theorems in most fields of mathematics are too difficult for automated theorem provers.
Even where theorem provers can find proofs, the process may be more difficult than a straightforward pen and paper approach. For example Wiedijk [94] has collected proofs from different authors using different theorem provers to prove the irrationality of √2, as well as including an informal proof in standard mathematical notation. For many of the provers it is difficult for non-specialists to follow the proofs of what is a simple theorem. But there are exceptions. The most famous one was the solution of the Robbins problem by McCune in 1997 [50], which had eluded human mathematicians since the 1920s. Larry Wos [96, 97] has also proved many results, mainly in the field of algebra, using the automated theorem prover Otter [49].

A related application is in adding intelligence to databases of mathematical theorems (and their associated proofs). Dahn et al. [17] used automated theorem provers to intelligently search for equivalent theorems within a database of mathematical theorems. The automated theorem provers are used to determine if a theorem that is mathematically equivalent to one entered by the user already exists within the database. Simple string matching or other standard techniques are not good enough for such an application, as the user may phrase the theorem quite differently to how it is stored, or the searched-for theorem may be a logical consequence rather than a direct replica of existing theorems.

A key modern application of theorem provers and formal methods is in the verification of hardware and software designs. Hardware verification is important: the commercial cost of an error in the design of a modern microprocessor, for example, is potentially so large that verification of designs is essential. Automated theorem provers are just one in a range of tools that are used.
The applicability of first order logic provers (of the type used in the work described in this dissertation) is extended when they are used in conjunction with higher order logic interactive proof assistants. See the work of Claessen and others for example [14] [38].

Software verification is of similar importance to hardware verification. Mistakes can be very costly; examples are the destruction of the Ariane 5 rocket (caused by a simple integer overflow problem that could have been detected by a formal verification procedure) and the error in the floating point unit of the Pentium II processor. Baier and Katoen give these and several other examples [6]. For further examples of applying automatic theorem provers to software verification see the work of Schumann [80], Denney, Fischer and Schumann [18] and Bouillaguet, Kuncak, Wies, Zee and Rinard [11].

Automated theorem provers have been applied to a wide range of other problems, some far removed from the original purpose of testing logical conjectures in mathematics. One example is network security. The importance of the internet, and the need for government and commercial organisations such as banks to exchange data globally in a secure manner, means that this is an important field. Secure data exchange requires a good encryption scheme, but the security of encryption schemes is not just dependent on the particular encoding method used; the whole procedure must be designed to avoid any potential back door or other exploit that may be used by malicious agents seeking to determine secret information being transmitted. Automated theorem provers may be used to prove the safety of security protocols or conversely to find flaws within them. See for example the work of Cohen on the TAPS system [15].

Another example in a completely different field: automated theorem provers have been used to find geometric proofs (see for example the work of Li and Wu [43]).
This is a branch of mathematical proof, but it has also been applied to machine vision to check the veracity of spatial models derived from two dimensional images in the work of Bondyfalat, Mourrain and Papadopoulo [10].

Automated theorem provers also find application in the field of artificial intelligence. Artificial intelligence originally took a world view based on logic, in particular first order logic, and made extensive use of it. More recently a statistical approach has been taken which more directly reflects the uncertainty involved with the real world. Despite this, logic and automated theorem proving can play a useful role in such aspects as common sense reasoning and the event calculus (used in planning actions) first put forward by McCarthy [48] and much more recently expounded by Mueller [58].

Finally, an example of a very different application of first order logic and theorem proving is in the field of sentient computing. See work published by Katsiri and Mycroft [37].

1.3 Choice of theorem prover

The range of applications for automated theorem provers and the fact that they are not currently easy to use provides motivation for making improvements in the degree of automation and reducing the level of expertise needed. But there is a wide range of theorem provers already written and several possible approaches to making them more useable. Justification is needed for the choice of theorem prover used in the described work. As with the previous section, the reader is referred to the next chapter on background for a detailed explanation of different logics and types of theorem prover.

1.3.1 Automation versus expressive power

Theorem provers vary as to the amount of human guidance that is required in the proof search and as to the sophistication of the logical language that may be used to express the conjecture that is to be tested. The degree of automation possible and the sophistication of the logical language must be traded off against each other. A high degree of automation is only possible if the language is constrained. Proofs for flexible higher order languages generally require human guidance, and the associated theorem prover is referred to as a proof assistant. This is not down to a lack of human ingenuity or programming skill but is a mathematical property of the logics involved. There is a fundamental trade-off in automated theorem proving between the expressive power of a logic and the degree to which proving conjectures within that logic may be automated. The existence of this trade-off means that there is a spectrum of tools available to do the job of theorem proving, which unfortunately adds to the complexity of the problem for non-specialists. The lines between different classes of tools are also not distinct, as efforts have been made by the writers of provers to extend the expressive power without losing the desired level of automation.

1.3.2 SAT solvers

The simplest logic generally used is propositional logic and the associated prover is a boolean satisfiability solver or SAT solver. If a problem may be expressed in boolean terms then there exist many SAT solvers (for example zChaff [57]) which can automatically search for solutions and, in theory at least, give a deterministic answer as well as a model of boolean values where one exists. Any problem that may be programmed on a non-deterministic Turing machine may be expressed as a SAT problem. SAT solvers are very useful but the expressive power of propositional logic is limited, and boolean expressions also may become very large. Additionally the SAT problem was the first to be shown to be NP-complete in complexity (Cook's theorem [16]), so solving large problems may be exponentially hard. (In practice modern SAT solvers are able to solve some very large problems where an underlying structure exists whilst some other comparatively small problems cannot be solved.)

Software and hardware verification through the approach of model checking also works with propositional logic. Expressions are derived from considering a finite number of possible states arising in a state machine description of the problem. The expressions are manipulated in the form of binary decision diagrams or BDDs. See Baier and Katoen [6] for an introduction. SAT solvers and model checking tools are already well automated, the constraint on use being the need to express the problem in an appropriate form rather than in running the prover itself.
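The kind of automatic search a SAT solver performs can be illustrated with a minimal DPLL-style procedure. The sketch below is illustrative only and is not the algorithm of any particular solver named here; real solvers such as zChaff add clause learning, watched literals and sophisticated branching heuristics. The clause representation (sets of signed integers) is an arbitrary choice made for brevity.

```python
# Minimal DPLL-style SAT solver sketch. Clauses are sets of integer
# literals: 3 means variable 3 is true, -3 means variable 3 is false.

def dpll(clauses, assignment=frozenset()):
    clauses = [set(c) for c in clauses]
    # Unit propagation: repeatedly assign literals forced by unit clauses.
    while True:
        unit = next((next(iter(c)) for c in clauses if len(c) == 1), None)
        if unit is None:
            break
        assignment = assignment | {unit}
        new = []
        for c in clauses:
            if unit in c:
                continue            # clause satisfied, drop it
            c = c - {-unit}         # remove the falsified literal
            if not c:
                return None         # empty clause derived: conflict
            new.append(c)
        clauses = new
    if not clauses:
        return assignment           # every clause satisfied: a model
    lit = next(iter(clauses[0]))    # branch on an arbitrary literal
    for choice in (lit, -lit):
        result = dpll(clauses + [{choice}], assignment)
        if result is not None:
            return result
    return None                     # both branches failed: unsatisfiable

# (p or q) and (not p or q) and (not q or r) is satisfiable.
model = dpll([{1, 2}, {-1, 2}, {-2, 3}])
```

Each branch fixes one variable, so the remaining clause set mentions strictly fewer variables and the recursion terminates; in the worst case the search is exponential, consistent with NP-completeness.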

1.3.3 First order logic theorem provers

First order logic adds predicates and quantifiers to propositional logic, which greatly increases the expressive power. The method of resolution introduced by Robinson (see next chapter on background) allows for the automation of proof search in first order logic. Though the basic inference step of resolution is simple, the heuristics needed to make the search process work in practice are much more complicated than those for SAT solvers.

Many conjectures arising from practical problems involve equality. Adding equality axioms to first order logic does not work well in practice, so equational theorem provers have been developed which include equality as part of the logic itself (i.e. first order logic with equality). Such theorem provers may also be automated but the search methods used are complex and different heuristics work better on different problems. It is this class of prover that the work described in this dissertation is concerned with. The motivation is that such provers are powerful, potentially automatic but currently require a degree of human expertise to run well which may be reduced or eliminated by the successful development of automatic procedures generated through machine learning.
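The basic resolution inference step mentioned above can be shown on ground (variable-free) clauses. This Python sketch is illustrative: full first order resolution also requires unification of terms, which is omitted here, and representing literals as strings with "~" for negation is an arbitrary encoding choice.

```python
# Binary resolution on ground clauses represented as frozensets of
# string literals, with "~" marking negation.

def negate(lit):
    """Return the complementary literal."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def resolve(c1, c2):
    """Return all resolvents obtainable from two clauses."""
    resolvents = []
    for lit in c1:
        if negate(lit) in c2:
            # Combine the clauses, dropping the complementary pair.
            resolvents.append((c1 - {lit}) | (c2 - {negate(lit)}))
    return resolvents

# From (p or q) and (~q or r), resolving on q yields (p or r).
out = resolve(frozenset({"p", "q"}), frozenset({"~q", "r"}))
```

A resolution prover repeatedly applies this step to a growing clause set, succeeding when the empty clause (a contradiction) is derived.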

1.3.4 Proof assistants

First order logic with equality is powerful, but even so there are some quite simple problems that cannot be expressed in it (for example reachability, as discussed by Huth and Ryan [32]). Mathematical induction is also too high level to be expressed in first order logic but it is extremely useful for many aspects of software and hardware verification. There are therefore many circumstances where proofs in higher order logic must be found. The proof search may be aided by computer but is difficult to automate. This is the domain of proof assistants such as Isabelle [62]. Proof assistants are very powerful but require a high level of user expertise, and this requirement for expertise is unlikely to be easily removed. Additionally, by working on a first order theorem prover and combining such work with existing work on combining theorem provers, some improvement can be transferred to proof assistants, as in the work of Meng, Paulson and Quigley [51, 33, 52].

1.3.5 Prover used

The work described in this dissertation is concerned with automated theorem provers working in first order logic, that is theorem provers that should not require human intervention (though in practice they currently work much better with expert human input). The selected theorem prover was E [78], which supports first order logic with equality. E is relatively efficient and is potentially fully automatic while still being flexible enough to express quite powerful conjectures. The choice of theorem prover was also governed by the availability of already written and available code (as it would be a major task, and not sensible, to attempt to write a new prover). The use of a pre-existing prover also allowed a judgement to be made in terms of performance (in public competition with other theorem provers).

1.4 Importance of heuristic selection

As described in more detail in the next chapter on background, the core of a first order logic automated theorem prover is a proof search process based on using simple inference steps to combine existing logical clauses to create new ones until an empty clause is found. The search space of clauses keeps growing as the search proceeds, and whether the search is successful or not depends on a number of choices made during the process; which choices are made is determined by the heuristic being followed. Schulz discusses the main choice points in his thesis [77]. A key point is that the best heuristic to follow depends on the conjecture whose proof is sought. There is not a single heuristic that will be best for all problems.

The motivation for looking at heuristic selection is that for the type of theorem prover selected, the main barrier to effective use by non-experts is the need to select a good heuristic. Though the existing prover has a flag that, if set, causes the program to select the heuristic, this auto mode is based on a fairly coarse classification of problems in conjunction with using the best heuristic for problems from a library that fall into the same class. To manually select a good heuristic requires a level of expertise that comes only from much use of the prover on many problems.

The thesis of the current work is that modern machine learning techniques should be able to find a more sophisticated functional relationship between the conjecture to be proved (with its related axioms) and the best method to use for the proof search. Previous work on applying machine learning to theorem proving has concentrated on modifying a heuristic in light of previous proofs, with only partial success. The work described in this dissertation concentrates on using established heuristics which are known to give good results and applying machine learning to match the best heuristic to a given proof problem.

1.5 Motivation for using machine learning

A study of the problem of automatic heuristic selection within a first order logic theorem prover leads to the conclusion that it has characteristics that are a good match for a machine learning approach.

Firstly, the best heuristic to use in an automated theorem prover depends on the problem (the conjecture to be proved and the related axioms), which indicates that there exists a relationship between problem and heuristic choice. The relationship though is not obvious even to human experts who have worked a long time in the field, so attempting to find it analytically is unlikely to be successful. Machine learning is designed to model relationships which are too complex for analysis, and has proved successful in such cases as handwriting recognition where the connection is similarly difficult to define.

Secondly, though it would be useful to know why some heuristics work better with some types of problems (and such knowledge would help with heuristic development), there is still a lot to be gained from developing a black box routine that takes as input some straightforward measures (features) of the problem and produces as an output the index of the best heuristic to be used. Such a scenario is a standard classification problem, for which the field of machine learning was developed. By running all possible heuristics (out of a limited set) on a large number of sample problems, the best heuristic in each case can be determined, which provides samples for supervised learning. (Supervised learning is the process of using previous examples with known outcomes to learn the rules by which new examples should be classified.)

Even though the machine learning approach tends to lead to a black box function between features and the choice of heuristic, by analysing which features are important some clues can be provided to help in future heuristic development. Extensive feature selection work was done and is described in chapter 6 of this dissertation.
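The supervised classification setup just described can be sketched in a few lines. This toy example substitutes a nearest-centroid classifier for the support vector machine actually used in the dissertation, and the feature vectors and heuristic labels are invented for illustration; it shows only the shape of the problem, namely mapping problem features to the label of the best heuristic.

```python
# Toy illustration of heuristic selection as supervised classification.
# Training data pairs a feature vector (measures of a proof problem)
# with the label of the heuristic that performed best on that problem.
import math

def train_centroids(samples):
    """samples: list of (feature_vector, best_heuristic_label)."""
    sums, counts = {}, {}
    for x, label in samples:
        s = sums.setdefault(label, [0.0] * len(x))
        for i, v in enumerate(x):
            s[i] += v
        counts[label] = counts.get(label, 0) + 1
    # One mean feature vector (centroid) per heuristic label.
    return {lab: [v / counts[lab] for v in s] for lab, s in sums.items()}

def select_heuristic(centroids, x):
    """Classify a new problem: pick the label of the nearest centroid."""
    return min(centroids, key=lambda lab: math.dist(centroids[lab], x))

# Invented training data: (problem features, best heuristic label).
data = [([0.1, 0.9], "H1"), ([0.2, 0.8], "H1"),
        ([0.9, 0.2], "H2"), ([0.8, 0.1], "H2")]
centroids = train_centroids(data)
choice = select_heuristic(centroids, [0.85, 0.15])   # a new problem
```

An SVM replaces the nearest-centroid rule with a maximum-margin decision boundary, possibly in a kernel-transformed feature space, but the input/output contract is the same.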

1.6 Dissertation summary

The present chapter has provided motivation for improving the accessibility of automated theorem provers by describing some of the applications that these tools may be put to. In addition, arguments have been put forward for selecting machine learning as a method and applying it to heuristic selection.

As the research straddles the disparate areas of logic and machine learning, chapter 2 gives background on both. In addition to the main experiment on heuristic selection, a preliminary experiment was undertaken on classifying problems between provable ones and those that cannot be solved within a given time constraint. Chapter 3 covers methodology for both experiments. Chapter 4 gives details of the preliminary experiment. Chapter 5 covers the main heuristic selection experiment. Chapter 6 gives the results of a series of feature selection experiments and finally the work is summarised and conclusions drawn in chapter 7.

Chapter 2

Background

The work covered in this dissertation straddles two disparate fields of computer science: automated theorem proving and machine learning. To enable experts in either field, or in neither, to read it with the minimum of external reference, this background chapter briefly covers the basics of both subjects. The main purpose is to place the choices made (of logic, machine learning techniques and so on) into an overall context.

2.1 Logic

Though the study of logic dates back to the ancient Greeks, the modern subject flowered towards the end of the nineteenth century and the first few decades of the twentieth century with major work being done by Peano, Frege, Russell, Whitehead, Church, Gödel and Turing amongst many others. As this section is concerned with background, references are generally to more recent books rather than original papers. Many of the original papers (in English translation) are found in van Heijenoort [89].

2.1.1 Logic levels or languages

Natural languages such as English are very expressive, able to convey subtle nuances of human thought and emotion. They have the expressive power to convey information, but they are also ambiguous and imprecise. Often the understanding of the reader or listener depends not only on what is written, but also on additional knowledge and experience that is assumed. In Shakespeare [81], Macbeth’s expression of the bleakness of life following the death of his wife:

“to-morrow, and to-morrow, and to-morrow, creeps in this petty pace from day to day” is easily understood by most English speakers but would be very difficult for a computer programmed with an English dictionary and a set of grammar rules to comprehend.

Formal logical languages were developed by philosophers and mathematicians as a means of expressing arguments and mathematical theorems in an unambiguous way. Assumptions or premises are all explicitly stated. The steps of the argument or proof must

follow defined rules to reach a conclusion. Though different proofs may be produced by the choice of different rules or a different ordering of the premises, a consistent set of premises must never produce contradictory conclusions. In general, logical languages sacrifice expressive power for ease of proof finding. If the language is restricted sufficiently, proofs may be found in a deterministic manner but the conjectures that may be written in the language are limited.

Well formed formulae and inference rules

Each logical language has rules defining what constitutes an acceptable or well formed formula in that language. For a conjecture to be proved as a theorem it must first be expressed as a well formed formula and this is not always possible; expressibility is discussed in a separate section. The proving of a theorem also requires inference rules. Rules of inference are relationships between sets of well formed formulae. A proof may begin with a set of premises or axioms which are converted via a series of applications of inference rules to a final set which contains the theorem. In practical proof systems it is often simpler to demonstrate that the negation of a theorem is inconsistent with the initial axioms; this is covered in the section on proof methods. The following sections describe the more important logics in terms of the rules for well formed formulae. Inference rules are discussed in the section on proof methods.

Propositional logic

The most restricted or lowest level logical language is that of propositional logic. The basic unit of the language is the proposition, which is a statement that is either true or false. Propositions may be joined by conjunctions (logical “AND”) or disjunctions (logical “OR”); additionally, the negation of a proposition is permissible and given an appropriate symbol. Other logical statements such as implication may be expressed in terms of disjunctions, conjunctions and negation. As logic is not concerned with the propositions themselves, only in their truth or falsehood and what this implies for the truth or falsehood of logical sentences, they are normally labelled as single letters or numbered variables. Thus a proposition may be a statement such as the classic “all men are mortal” or it may represent a bit value in a digital circuit; the logic is unaffected. Well formed formulae in propositional logic may be defined inductively. Firstly, propositions may be considered as variables over the domain represented by the set {1,0}, where 1 represents true and 0 represents false.

A proposition is a well formed formula.

If φ is a well formed formula then so is

¬φ ( NOT φ ).

If φ and ψ are well formed formulae then so are the conjunction

φ ∧ ψ ( φ AND ψ ), the disjunction φ ∨ ψ ( φ OR ψ ), the implication φ → ψ and the equivalence φ ↔ ψ. The equivalence φ ↔ ψ can be expressed as a conjunction of two implications:

(φ → ψ) ∧ (ψ → φ). The implication φ → ψ can be expressed as the disjunction

¬φ ∨ ψ.

A well formed formula may be converted to one of several normal forms. The most common such form is conjunctive normal form. Conjunctive normal form consists of a conjunction of clauses, each of which is a disjunction of literals, where a literal is a proposition or the negation of a proposition. A conjecture expressed as a well formed formula may be valid, satisfiable or inconsistent. A valid formula is true for all values of the constituent propositions. A simple example of a valid formula is (A ∨ ¬A). A satisfiable formula is true for some assignment of values to the constituent propositions (or variables). A formula is inconsistent if it is false for all values, so if a formula is valid its negation will be inconsistent. A simple example of an inconsistent formula is (A ∧ ¬A).
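Since propositional logic has only finitely many truth assignments, these three classifications can be checked mechanically by enumerating the truth table. A minimal Python sketch (the function and variable names are illustrative, not taken from any particular prover):

```python
import itertools

def truth_table_status(formula, variables):
    """Classify a propositional formula as 'valid', 'satisfiable' or
    'inconsistent' by evaluating it over all 2**n truth assignments."""
    results = [formula(dict(zip(variables, row)))
               for row in itertools.product([True, False], repeat=len(variables))]
    if all(results):
        return "valid"
    if any(results):
        return "satisfiable"
    return "inconsistent"

# (A OR NOT A) is valid; (A AND NOT A) is inconsistent.
print(truth_table_status(lambda v: v["A"] or not v["A"], ["A"]))   # valid
print(truth_table_status(lambda v: v["A"] and not v["A"], ["A"]))  # inconsistent
```

The enumeration visits 2^n rows for n propositions: finite, and hence decidable, but exponential in the number of variables.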

First order logic

First order logic extends propositional logic to predicate logic. Whereas a proposition is either intrinsically true or false, a predicate is a truth valued function of terms which may be defined over any non-empty set or domain. The terms may be constant values (elements of the domain), variables or functions over the domain. Predicates may also be viewed as relations between elements of the domain. A predicate of arity n defines a relationship over the product set D^n. Additionally, variables (but not functions or predicates) may be quantified over, the two quantifiers being the universal quantifier ∀ (for all elements of the domain) and the existential quantifier ∃ (for at least one element of the domain). The requirement that the domain be non-empty avoids logical inconsistencies such as ∀xP(x) being true whilst ∃xP(x) is false (where x is a variable and P a predicate). Conventionally, notation within first order logic assigns capital letters to predicates (which are truth valued) and lower case letters to constants, variables and functions. Constants are generally assigned letters early in the alphabet such as a, b, or c whilst variables are assigned letters towards the end of the alphabet such as x, y or z. Functions are given the letter f or following letters such as g or h. These conventions are not rigid, and function names in particular may be assigned in mathematical notation such as sin or cos or may be symbolically expressed using the standard arithmetic operators. Similarly, constants will often be expressed symbolically, such as particular integer values for the domain of natural numbers. For machine-based systems it is convenient to have all functions in prefix form, but for readability the use of infix functions in some systems is allowed.

In a similar fashion to propositional logic, well formed terms and formulae can be defined inductively. The definition of a term is as follows (it is implicitly understood that constants are fixed elements of the domain, variables range over elements of the domain and functions of arity n are maps D^n → D where D is the domain).

Constants or variables are terms.

If f is a function of arity n and t1, . . . , tn are n terms then

f(t1, . . . , tn) is also a term. Well formed formulae are defined as follows: if P is a predicate of arity n (where n may be zero) and t1, . . . , tn are n terms then

P (t1, . . . , tn) is a well formed formula. If φ is a well formed formula then so is

¬ φ.

If φ and ψ are well formed formulae then so are the conjunction

φ ∧ ψ, the disjunction φ ∨ ψ, the implication φ → ψ and the equivalence φ ↔ ψ. If φ is a well formed formula and x is a variable then

∀x φ is a well formed formula and so is ∃x φ.
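The inductive definitions above map naturally onto a recursive datatype. The following Python sketch (all names are illustrative, not from any particular system) represents terms and a subset of the well formed formulae, so that only structures built by these rules can be constructed:

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Terms, following the inductive definition: constants and variables are
# leaves, and a function of arity n applied to n terms is again a term.
@dataclass(frozen=True)
class Var:
    name: str

@dataclass(frozen=True)
class Const:
    name: str

@dataclass(frozen=True)
class Fn:
    name: str
    args: Tuple["Term", ...]

Term = Union[Var, Const, Fn]

# Formulae: an atomic predicate applied to terms, plus a subset of the
# connectives and quantifiers defined above.
@dataclass(frozen=True)
class Pred:
    name: str
    args: Tuple[Term, ...]

@dataclass(frozen=True)
class Not:
    sub: "Formula"

@dataclass(frozen=True)
class And:
    left: "Formula"
    right: "Formula"

@dataclass(frozen=True)
class ForAll:
    var: Var
    body: "Formula"

Formula = Union[Pred, Not, And, ForAll]

# Example: the well formed formula  forall x . P(f(x), c)
wff = ForAll(Var("x"), Pred("P", (Fn("f", (Var("x"),)), Const("c"))))
```

Because the constructors mirror the inference-free formation rules, ill-formed strings of symbols simply cannot be built.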

First order logic with equality

The nature of equality has been debated for centuries; an often referenced philosophical discussion relevant to logic is that of Max Black [8], but much of such philosophical debate lies outside computer science. Philosophers differentiate between equality and identity. Stating that all men are equal is not the same as saying that all men are identical. This difference is also important in mathematics, for instance to count members of a set or to express the idea that exactly three elements have a particular property. Equality that corresponds with element identity is sometimes referred to as numeric equality, after Aristotle, who differentiated between proportional equality and numeric equality, and the term will be used here as a convenience. Nieuwenhuis and Rubio in the Handbook of Automated Reasoning [61] state the following congruence axioms for dealing with equality by resolution:

→ x ' x (reflexivity)

x ' y → y ' x (symmetry)

x ' y ∧ y ' z → x ' z (transitivity)

x1 ' y1 ∧ · · · ∧ xn ' yn → f(x1, . . . , xn) ' f(y1, . . . , yn) (monotonicity-I)

x1 ' y1 ∧ · · · ∧ xn ' yn ∧ P (x1, . . . , xn) → P (y1, . . . , yn) (monotonicity-II)

The first three, reflexivity, transitivity and symmetry, are straightforward to express in first order logic, so an equality predicate can be defined with these qualities. Such a predicate would not be sufficient for numeric equality; for instance, in the domain of natural numbers the predicate defined by equality modulo a prime would satisfy reflexivity, transitivity and symmetry but there would be an infinite number of elements in each equivalence class. For numeric equality the monotonicity axioms are also required, but these are not single axioms: they are axiom schemes. “One monotonicity-I axiom is required for each non-constant n-ary function symbol f, and, similarly one monotonicity-II axiom for each predicate symbol P” (Nieuwenhuis and Rubio [61]). From a practical point of view, in automated theorem provers equality needs to be treated as a special case to prevent an explosion in the number of intermediate clauses generated in the proving process. Additionally, having a specific equality predicate allows the efficient rewriting of elements within a clause as part of logical calculi or inference rules (Bachmair and Ganzinger [4]). Well formed formulae in first order logic with equality are as for first order logic with the addition of a specific equality predicate of arity two. A restricted form of first order logic with equality is equational logic. In equational logic the only predicate is equality. Other predicates can be expressed as equality with the use of the special symbol ⊤. The predicate

P (t1, . . . , tn) becomes the equational literal

fP(t1, . . . , tn) ≈ ⊤, though the function fP is restricted to being a head function (i.e. not a parameter within any other function) and similarly for ⊤. Such a contrivance seems messy but does allow a consistent calculus to be used in an automated theorem prover such as E, written by Schulz [78]. First order logic with equality is the basis of many automated theorem provers including that used in the work described in this dissertation.

Higher order logics

Higher order logic allows quantification over functions and predicates as well as elements of the domain. A variable can itself be a function (not just the result of a function) or a predicate and can be quantified over to express properties that hold for all functions or predicates. Set theory can be expressed in higher order logic in a direct and natural way, in contrast to the convoluted approach required to express any sort of set theory in first order logic. Zermelo-Fraenkel set theory can be expressed in first order logic but it is then complex to express even simple concepts such as the ordered pair. Thus higher order logic extends the expressive power of the language but it is at a cost in terms of decidability and the degree to which the proof process may be automated. Standard second order logic is covered in Manzano [46].

Many-sorted logic and types

In first order logic, terms are of a single type or sort; that is, they range over elements of a single set, the domain. Many-sorted logic (Walther [93], Manzano [46]) assigns different types or sorts to the elements of the domain. Many-sorted first order logic does not extend first order logic, as it can be translated to first order logic (Enderton [23], Manzano [46]), but it provides an additional set of constraints which restrict the search space for an automated theorem prover, allowing a longer proof to be found within practical computer resource constraints (Walther [92]). Manzano [46] argues that many-sorted logic is a universal language suitable for expressing other logics, though this thesis is not universally accepted, as Venema makes clear in his review [90]. Type theory introduces types to logic but, unlike sorts in many-sorted logic, types have a hierarchy and some types may be contained within other types. The notion of types avoids certain paradoxes such as Russell’s paradox in set theory. Types are an intrinsic part of the functional programming language ML (Meta Language), which is used at the meta logic level of such proof assistants as Isabelle [62] and HOL [29]. In such systems, theorems are themselves a type and can only be returned by functions that are valid proofs from axioms or existing lemmas and theorems. Meng and Paulson [52] discuss the expression of types within first order logic.

Modal, temporal and other logics

In first order logic predicates are either true or false, the truth value may depend on terms but there is no element of time or state that may change. For modelling computer systems and software the concept of state is important, so it is useful to consider logical statements that are true in some states but not in others. Similarly, when considering intelligent agents (in the context of artificial intelligence) the truth of a predicate may depend on the knowledge of a particular agent. Modal logics extend first order logic with the introduction of two symbols:

◇ and □.

The interpretation of these symbols depends on which of the many modal logics is being considered and the underlying scenario to which it is being applied. The most straightforward interpretation is probably temporal, where ◇ means “will be true at some point in the future” and □ means “is true now and for ever more”. Temporal logics are a type of modal logic which are particularly associated with software or hardware systems modelled by state machines. They are concerned with such questions as whether particular predicates hold for all future states. There are two main temporal logics: linear time logic (LTL) and computation tree logic (CTL); see Huth and Ryan [32] for details. These logics are associated with the verification technique of model checking (Baier and Katoen [6]). There are many other non-classical logics, including ones where truth is not bimodal but can take on intermediary values, such as multi-valued logic and fuzzy logic. A survey of these is given in Priest [68] from a purely logical or philosophical standpoint, whilst modal and temporal logic is covered from a more practical computer science approach in Huth and Ryan [32].

2.1.2 Proof methods

Standard proof methods in logic are generally at a much lower level than those used in mathematical proofs as published in mathematical text books, in a manner analogous to the difference between machine code and high level computer programming languages. This is particularly true of automated theorem provers working in first order logic and less so of proof assistants working in higher order logic with a large library of already proved lemmas and human guidance in the proof process. An interesting comparison is given in Wiedijk [94] where the same theorem, the irrationality of √2, is proved with seventeen different theorem provers as well as by hand (though the machine proofs are human guided and not fully automatic).

Syntax, semantics, interpretations, valuations and models

The definitions of well formed formulae given in the previous sections are simply rules relating to the syntax of symbols. Similarly, inference rules provide a grammar for syntactical manipulation of formulae. For a logic to be useful, meaning or semantics must be attached to the symbols. Restricting the discussion to first order logic, an interpretation is a mapping of function symbols to specific functions and predicate symbols to specific relations (over a specific domain). A valuation is an assignment of values (specific members of the domain set) to each variable. A combination of an interpretation and a valuation is a model.

Validity, consistency and inconsistency

A formula is valid if it is true under all interpretations. A set of formulae is consistent if there is a model which makes them all true. A set of formulae is inconsistent if there is no model for which they are all true.

In general, proving a formula ψ from a set of axioms

φ1 ∧ ... ∧ φn is equivalent to demonstrating that

φ1 ∧ ... ∧ φn → ψ is valid, but this is equivalent to there being no interpretation in which its negation

¬(φ1 ∧ ... ∧ φn → ψ) holds, that is

φ1 ∧ ... ∧ φn ∧ ¬ψ has no model or is inconsistent. In practical terms it is often more straightforward to demonstrate the inconsistency of the negation of the formula with the axioms than it is to show the validity of the un-negated formula.
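This equivalence between proving the implication valid and refuting its negation can be illustrated concretely in the propositional case. A brute-force Python sketch (the axiom and conjecture are arbitrary illustrative examples, not drawn from the text):

```python
import itertools

def valid(formula, variables):
    """True if the formula holds under every truth assignment."""
    return all(formula(dict(zip(variables, row)))
               for row in itertools.product([True, False], repeat=len(variables)))

def unsatisfiable(formula, variables):
    """True if the formula holds under no truth assignment (has no model)."""
    return not any(formula(dict(zip(variables, row)))
                   for row in itertools.product([True, False], repeat=len(variables)))

vs = ["A", "B"]
axiom = lambda v: v["A"]                                   # phi_1
conjecture = lambda v: v["A"] or v["B"]                    # psi
implication = lambda v: (not axiom(v)) or conjecture(v)    # phi_1 -> psi
refutation = lambda v: axiom(v) and not conjecture(v)      # phi_1 AND NOT psi

# The implication is valid exactly when its negation has no model.
assert valid(implication, vs) == unsatisfiable(refutation, vs)
```

Here both sides of the assertion are true: A → (A ∨ B) is valid, and A ∧ ¬(A ∨ B) has no model.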

Soundness and completeness

A logical system is sound if for any proof that is syntactically valid the semantics of the premises and the conclusion agree in all interpretations. Informally, it is a statement that following the rules of inference will lead to a correct proof in all circumstances. Completeness is complementary to soundness: it is the property that any semantically valid formula can be proved syntactically within the system. Propositional logic is complete, as the truth table for any well formed formula can be constructed from the truth tables of its constituent parts, and any such truth table will be finite (though exponentially large in the number of variables). First order logic is also complete but the proof is more involved; see for instance Kaye [39]. Note that completeness is not equivalent to decidability. For a system to be decidable there must be an algorithm that will prove or disprove any conjecture within a finite number of steps. (Gödel proved that any system that can encompass arithmetic is incomplete (Smith [84]).)

Proof process

The starting point of a proof is a set of axioms which are assumed to be true and a conjecture which, if proved, will become a theorem. There are then two general approaches. One is to find a chain of logical inferences connecting some or all the axioms to the conjecture. The other is to negate the conjecture, add it to the axioms and then show there is a chain of inferences that leads to a contradiction. If the negation of the conjecture is inconsistent with the axioms then the original conjecture is valid (true for all models or values of variables within it). The former approach is used in natural deduction, sequent calculus and related methods. The latter approach is used in resolution based theorem provers and other similar automated reasoning systems.

Natural deduction and the sequent calculus

Natural deduction and the sequent calculus are proof systems for first order logic (and propositional logic). Elements of natural deduction are similar to the type of philosophical logical reasoning introduced by the ancient Greeks, but its origin as a complete system is much more recent. Pelletier, in his history [65], traces the origin to a lecture by Łukasiewicz in 1926 which inspired Jaśkowski to publish a system in 1934 [35], whilst at the same time, and working independently, Gentzen published a system in his two part 1934/1935 paper [27]. The inference rules in natural deduction involve either the elimination or the introduction of a logical connective or quantifier. There may also be connected side rules involving the substitution of variables or the renaming of variables and so on. For example, if A is known to be true (i.e. is a premise) then

A ∨ B is also true. This is an example of ∨ introduction. In contrast to the very simple rule for ∨ introduction, the rule for ∨ elimination is more complex. Starting with

A ∨ B, two additional subproofs are needed, each of which has the same conclusion, say C, but the starting premise of one is A and the starting premise of the other is B. Thus if both A and B lead to C then A ∨ B may be replaced by C and the ∨ is eliminated. In some rules, an assumption is made then a series of inferences from the assumption are followed to a conclusion. For example if A is assumed to be true and a valid inference process then leads to B being true then an implication may be introduced as

A → B

(→ introduction). If A is assumed and the inference steps lead to a contradiction then

¬A is true (¬ introduction). Finding a proof in natural deduction is often done in reverse. Starting with the conjecture as a conclusion, a rule is chosen and the premises that would lead to the assumed conclusion are deduced. The process is then repeated, with the premises now being the conclusions of further inference rules. Any premise that is an axiom is known to be true so does not need further work. A successful proof search will lead to a set of premises which are all axioms, and then the formal proof may be read off in the reverse order to which it was found. Natural deduction is covered in Huth and Ryan [32].

The sequent calculus was invented by German logician Gerhard Karl Erich Gentzen in 1934 as a means of studying natural deduction [27], hence such systems are also called Gentzen systems. Sequent calculus is very similar to natural deduction except that it is expressed as formal rules between logical statements of the form

φ1, . . . , φm ⇒ ψ1, . . . , ψn where the left hand clauses

φ1, . . . , φm are a conjunction (all have to be true) and the right hand clauses

ψ1, . . . , ψn are a disjunction (at least one is true). A base sequent, which plays a corresponding role to that of an axiom in natural deduction, is one where one (at least) of the

φ1, . . . , φm is the same as one of the

ψ1, . . . , ψn. Such a base sequent is trivially true. Inference rules in sequent calculus correspond with those of natural deduction, but introduction and elimination are instead expressed as right or left (of the ⇒) rules. The advantage of the sequent calculus is that rather than assumptions being made and later discharged at different steps (as is done in natural deduction), the environment is contained within the sequents themselves. The clauses that do not take part in a particular rule are collected into sets denoted by either Γ or ∆. There is potentially a large number of rules. This works well for doing proofs by hand, but there is a lot of redundancy. For an automated approach the redundancy may be removed by restricting the rules to a minimal set. See Gallier [26].

Tableaux methods

The construction of a proof in a sequent calculus results in a tree structure: the root of the tree is the conclusion and the leaves are the axioms (in a proof of validity). An alternative approach is to start with the negation of the conjecture at the root and demonstrate that all branches lead to a contradiction to show inconsistency. This is the tableaux method. Inconsistency occurs when a branch contains both φ and ¬φ; such a branch is closed. Tableaux methods can form the basis of an automated theorem proving system, generally using the reduced set of sequents.

Resolution, unification and factoring

Natural deduction, the sequent calculus and tableaux methods are well suited to proofs by hand, where the relatively large number of possible inference steps gives flexibility. But for automatic proof systems, flexibility is undesirable as it increases the size of the search space. The resolution method of Robinson [75] has a single inference step and is well suited to computer proof systems (see Bachmair and Ganzinger [5]). Resolution is used in proving inconsistency (i.e. when proving a conjecture is a theorem by demonstrating that the negation of the conjecture is inconsistent with the axioms). The axioms and negated conjecture are expressed in conjunctive normal form as a set of clauses. (A clause is an element of a larger logical expression. In conjunctive normal form the overall logical expression is expressed as a conjunction of clauses, so all clauses must be true for the expression to be true. Each clause consists of a disjunction of literals, so that the clause will be true if one or more literals is true. Since all clauses must be true for the overall expression to be true, they can be treated as a set of facts and combined to generate new clauses that follow, in the same way as “all men are mortal” and “Socrates is a man” can be combined to state that “Socrates is mortal”.) Clauses are combined by resolution to produce new clauses, which are logical implications of the original clauses. If the new clause is empty then a contradiction is proved and the original clause set must be inconsistent. In propositional logic two clauses

{φ1, . . . , φn, A} and {ψ1, . . . , ψm, ¬A} can be resolved to give {φ1, . . . , φn, ψ1, . . . , ψm}.
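The propositional resolution rule can be sketched directly in Python. The representation here (a clause as a frozenset of literals, a literal as a (name, sign) pair) is chosen for illustration and is not taken from any particular prover:

```python
def resolve(c1, c2):
    """Return all propositional resolvents of two clauses. A clause is a
    frozenset of literals; a literal is a pair like ('A', True) for A or
    ('A', False) for NOT A."""
    resolvents = []
    for (name, sign) in c1:
        if (name, not sign) in c2:
            # Remove the complementary pair and merge the remainders.
            resolvents.append((c1 - {(name, sign)}) | (c2 - {(name, not sign)}))
    return resolvents

# Resolving {A} with {NOT A, B} gives {B}; resolving {A} with {NOT A}
# gives the empty clause, i.e. the contradiction ending a refutation.
print(resolve(frozenset({("A", True)}), frozenset({("A", False), ("B", True)})))
print(resolve(frozenset({("A", True)}), frozenset({("A", False)})))  # [frozenset()]
```

Deriving the empty frozenset corresponds to reaching the empty clause and hence proving the original clause set inconsistent.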

The same approach is used in first order logic but is complicated by the need to operate on truth valued functions of terms (predicates) rather than simple variables. Terms may need to be unified first, which is the process of making two terms equal by a suitable substitution of variables by other terms. (Such a procedure is justified by the fact that there is a single domain so all variables and terms range over elements of the same set. Additionally, variables are universally quantified so if a predicate containing a term containing a variable is true it will remain true if any element of the domain is substituted for the variable.) Any such substitution may reduce the generality of the original term (for example in substituting f(y) for x there is no guarantee that f(y) spans the whole domain even though y does and furthermore f(y) cannot be unified with g(z) whilst x can). At each step at which unification is carried out, the loss of generality is kept to a minimum by using a substitution that results in the most general case possible, which is called the most general unifier or mgu for short. Note that for completeness, i.e. to ensure that the empty clause may be found, resolution should also include factoring. Factoring is the process of making a pair of literals within a clause equal by a suitable substitution for variables (unification) and consequently reducing them to a single literal.
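Unification and the most general unifier can be sketched as follows. This is a standard textbook algorithm with an occurs check, not the implementation of any specific prover; terms are represented as variable strings or tuples (symbol, arg, ...), with constants as zero-argument tuples:

```python
def walk(term, subst):
    """Follow variable bindings in the substitution."""
    while isinstance(term, str) and term in subst:
        term = subst[term]
    return term

def occurs(v, term, subst):
    """Occurs check: does variable v appear inside term?"""
    term = walk(term, subst)
    if term == v:
        return True
    return not isinstance(term, str) and any(occurs(v, a, subst) for a in term[1:])

def unify(s, t, subst=None):
    """Return a most general unifier of s and t as a dict, or None."""
    if subst is None:
        subst = {}
    s, t = walk(s, subst), walk(t, subst)
    if s == t:
        return subst
    if isinstance(s, str):                       # s is an unbound variable
        return None if occurs(s, t, subst) else dict(subst, **{s: t})
    if isinstance(t, str):
        return unify(t, s, subst)
    if s[0] != t[0] or len(s) != len(t):         # clashing function symbols
        return None
    for a, b in zip(s[1:], t[1:]):
        subst = unify(a, b, subst)
        if subst is None:
            return None
    return subst

# f(x, a) unifies with f(b, y) under the mgu {x -> b, y -> a};
# f(x) and g(z) clash, just as in the example in the text.
print(unify(("f", "x", ("a",)), ("f", ("b",), "y")))
print(unify(("f", "x"), ("g", "z")))  # None
```

The occurs check rejects attempts such as unifying x with f(x), which would otherwise produce an infinite term.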

2.1.3 Decidability and semi-decidability

Propositional logic is decidable as it is possible to systematically construct a truth table from the constituent parts of a well formed formula. There will be a finite number of rows in the truth table (2^n rows if there are n variables) and the truth value for each row can be calculated in a finite number of steps.

First order logic is not decidable. This can be demonstrated by relating it to the halting problem, which Turing proved undecidable (see, for example, Huth and Ryan [32]). Applying restrictions to first order logic, as is done in the Prolog system, can make it decidable. Additionally, standard first order logic is semi-decidable. That is, if a conjecture is a theorem then the proof can be systematically found in a finite number of steps, but if it is not a theorem then the process may not halt. The decision problem is also solvable for various restricted systems. For example, Presburger [66] gives a decision procedure for integer arithmetic restricted to just addition (i.e. no multiplication), Tarski [86] gives a decision procedure for elementary algebra and geometry and Shostak [82] gives a procedure for arithmetic with function symbols. The procedure of Satisfiability Modulo Theories (SMT) combines such decision procedures into a single satisfiability solver. An SMT solver is fundamentally a SAT solver where the Boolean expression is not confined to propositions but instead contains predicates which are then tested within a separate theory. The part of the solver that tests the predicates must be closely integrated with the SAT solver, including allowing for backtracking. See for example Tinelli [87].
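The SAT-plus-theory interplay can be caricatured in a few lines: enumerate propositional models of the Boolean skeleton, then ask a (here brute-forced) arithmetic theory whether the corresponding atoms are jointly realisable. Everything in this sketch (the atom names, the single integer variable n, the finite domain) is an illustrative assumption; real SMT solvers integrate the theory solver incrementally with backtracking rather than enumerating models:

```python
import itertools

# Theory atoms over one integer variable n (illustrative):
#   "a" stands for n > 5,  "b" stands for n < 3.
ATOMS = {"a": lambda n: n > 5, "b": lambda n: n < 3}

def smt_satisfiable(skeleton, domain=range(-100, 101)):
    """Toy SMT loop: for each propositional model of the Boolean skeleton,
    brute-force the theory for an n realising exactly those atom values."""
    for bits in itertools.product([True, False], repeat=len(ATOMS)):
        assignment = dict(zip(ATOMS, bits))
        if not skeleton(assignment):
            continue  # rejected by the SAT part
        if any(all(ATOMS[k](n) == assignment[k] for k in ATOMS) for n in domain):
            return True  # the theory accepts this propositional model
    return False

# a AND b is propositionally satisfiable but theory-unsatisfiable
# (no n with n > 5 and n < 3); a OR b is satisfiable (e.g. n = 6).
print(smt_satisfiable(lambda v: v["a"] and v["b"]))  # False
print(smt_satisfiable(lambda v: v["a"] or v["b"]))   # True
```

The first query shows the essential point: a formula can be satisfiable at the propositional level yet unsatisfiable once the theory gives the atoms meaning.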

2.1.4 Expressive power

The choice of a logic system for computer based theorem proving necessarily involves a compromise. There is a trade-off between the extent that the proof process may be automated and the sophistication of the formulae that may be expressed in the language. SAT solvers, which operate in propositional logic, are widely used but are very restricted in what may be expressed. SAT solvers are usually used to find a satisfying model (truth assignment of the variables) rather than to prove validity (or inconsistency).

The language of first order logic with equality (L≈) is much more powerful than propo- sitional logic but still has limitations. Though L≈ can express Zermelo-Fraenkel set theory (ZF or with the axiom of choice ZFC) and

“all of the results of contemporary mathematics can be expressed and proved within ZFC, with at most a handful of esoteric exceptions”

(Wolf [95]), from a practical perspective it is difficult to work at such a low level; higher order logic provides a more natural expression of set theory. Similarly, L≈ can express the axioms of Peano arithmetic except for that of induction, which in its most straightforward form requires quantification over sets (second order logic). One way around this is to use a separate axiom for each formula, but this leads to an infinite number of axioms (an axiom scheme) (Wolf [95]). Similarly, L≈ can be used to say a structure is a group but cannot express the concept of a simple group (Kaye [39]).

Despite its faults, for automated theorem proving L≈ is powerful enough to be useful whilst being sound, complete and semi-decidable. More powerful logics generally require human intervention to guide proofs, which requires specialist expertise.

Some of the limitations of the expressive power of L≈ can be overcome in practical instances where objects are finite. For example, L≈ cannot express reachability in a graph (or equivalently transitive closure) but it is possible to simulate reachability within L≈ for finite structures, as shown by Lev-Ami et al. [42]. Another approach is to combine L≈ theorem provers with separate proof systems in a similar manner to the SMT approach for SAT solvers; this has been done with the theorem prover SPASS, which was combined with an arithmetic system to give SPASS+T (Prevosto and Waldmann [67]).

2.2 ATPs versus proof assistants

Historically, early work on mechanical theorem proving concentrated on automated methods in first order logic. This was the case from the early 1960s through to the late 1980s. In the last couple of decades more emphasis has been placed on proof assistants operating in higher order logic, as such logics make software verification and other tasks easier to define in formal terms, though difficult and time-consuming to then carry out. There has also been recent work on combining the two, allowing some sections of the proof to be found automatically by first order logic theorem provers within the context of a higher order proof assistant, e.g. Meng and Paulson [52]. Computer based theorem proving forms a spectrum from fully automatic SAT solvers through to proof assistants where the human user is an expert and drives the proof process. At the SAT solver end of the spectrum there are many existing tools which require no specific user expertise to run. At the other end of the spectrum, where higher order logic is used, the process cannot be fully automated. Though first order logic is undecidable, for conjectures where a proof is possible the process should be amenable to automation; in practice, some human expertise is still needed to set a large number of parameters that determine heuristic choices and allow proofs to be found in many practical cases. The knowledge needed to set appropriate parameters for such automated theorem provers is specific to those who have worked with them (or developed them), so even experts working with proof assistants who wish to combine the two techniques (see for example Meng and Paulson [52]) may not be able to set optimal parameter values. The motivation of the work described in this dissertation is the removal of the need for this specific expertise, replacing it by machine intelligence through machine learning.

2.3 Resolution based theorem proving

This section covers automated theorem proving of the type used in the main work of the thesis. Some detail is required to provide the necessary background to some of the features (measures) used to characterise the conjectures in the machine learning process.

2.3.1 Resolution and related calculi

Robinson [75] introduced a simple calculus for mechanical theorem proving based on showing inconsistency of the negation of a conjecture and associated axioms using resolution. The problem of theorem proving is reduced to that of searching for the empty clause via a simple clause generation inference process. Other approaches to automated theorem proving were developed at a similar time, for example Loveland’s [45] model elimination method, but most modern first order logic automated theorem provers are based on refinements of the resolution method.

The starting point for a resolution based proof is a conjecture expressed as a set of quantifier-free clauses in conjunctive normal form (CNF). Quantifier-free means that there are no existential quantifiers and all variables are assumed to be universally quantified so that P (x, y) for example is taken to be ∀x∀yP (x, y).

Additionally, as the universal quantifier is distributive over the clauses in CNF, each variable may be considered local to the clause that it is in, which allows the renaming of variables to avoid clashes when combining clauses. For example, for two clauses φ(x) and ψ(x), ∀x (φ(x) ∧ ψ(x)) is equivalent to ∀x φ(x) ∧ ∀y ψ(y), where ψ(y) is ψ(x) with all occurrences of x replaced by y.

The requirement for the conjecture to be in quantifier-free CNF is not a restriction. Any well formed formula in first order logic may be converted to CNF. Though a naive conversion to CNF may lead to an exponential increase in the size of the formula, there are efficient algorithms that perform the conversion (by the suitable introduction of new predicate symbols). Additionally, Skolem [83] (see Heijenoort [89] for an English translation) showed that existential quantifiers may be replaced by functions (named Skolem functions) whilst maintaining consistency (or, more importantly, inconsistency). That is, any set of clauses in which this Skolemisation process has been used to eliminate quantifiers will be consistent if and only if the original set of clauses is consistent. See Nonnengart and Weidenbach [63] for techniques for converting general first order logic into a suitable Skolemised CNF.

The procedure for proving that a conjecture is a theorem is to replace the conjecture with its negation, combine this with the axioms and place the result in CNF to form a set of clauses. The clauses are then combined in pairs using resolution to deduce new clauses. If the empty clause is reached then inconsistency has been proved and the original conjecture is a theorem. For ground clauses (clauses without free variables) the process of generating new clauses will saturate, so that after a point no new clauses are generated. If this happens without the empty clause being reached then the clauses are consistent (there is a model which satisfies them).

With non-ground clauses (i.e. clauses containing variables) resolution is combined with unification. Unification is the process of substituting terms for variables so as to make two terms equal. This process is allowable because all terms, including variables, represent elements of a single domain, and since variables are implicitly universally quantified they will range over all values. For non-ground clauses the process of resolution is not guaranteed to saturate, as this would violate Church's theorem (see Robinson [75]), but in the inconsistent case the empty clause will eventually be found (provided factoring is also carried out), so the process is semi-decidable.
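The ground-clause procedure just described can be sketched in a few lines of Python. The representation is illustrative only (clauses as frozensets of signed literal strings) and is not drawn from any particular prover; it omits unification and all redundancy elimination, so it applies only to the ground case.

```python
from itertools import combinations

def resolve(c1, c2):
    """Return all resolvents of two ground clauses.
    Literals are strings; negation is a '~' prefix."""
    resolvents = []
    for lit in c1:
        comp = lit[1:] if lit.startswith('~') else '~' + lit
        if comp in c2:
            resolvents.append(frozenset((c1 - {lit}) | (c2 - {comp})))
    return resolvents

def refute(clauses):
    """Saturate a ground clause set; True if the empty clause is derived."""
    known = set(clauses)
    while True:
        new = set()
        for c1, c2 in combinations(known, 2):
            for r in resolve(c1, c2):
                if not r:            # empty clause: inconsistency shown
                    return True
                if r not in known:
                    new.add(r)
        if not new:                  # saturation without the empty clause
            return False
        known |= new

# Prove q from p and p -> q, i.e. refute {p, ~p v q, ~q}:
print(refute([frozenset({'p'}), frozenset({'~p', 'q'}), frozenset({'~q'})]))
```

On the refutable example above the empty clause is found after two rounds of resolution; on a consistent set the loop instead reaches saturation and reports failure, mirroring the two outcomes described in the text.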

Reducing redundancy

Given that the process of resolution generates a new clause each time it is applied, it is clear that the size of the search space can get very large very quickly. Robinson had already highlighted this problem even prior to resolution ([74]) and in his resolution paper suggested two processes to reduce the number of clauses. One of them, subsumption, is still used in modern theorem provers. (Subsumption is the process by which a clause may be deleted if there is another, smaller, clause such that any model of the smaller clause is also a model of the subsumed clause.)

Since Robinson introduced resolution as a basis for automatic theorem proving, he and others have worked on methods of reducing the number of redundant clauses that are generated. The simplicity of resolution, which is an advantage in a computer based theorem prover, leaves too many inference options open. The efficiency of the process can be increased by imposing restrictions that reduce the search space, as long as the process remains refutationally complete, so that inference steps leading to a proof are still available. The number of generated clauses may also be reduced by introducing new inference rules which effectively combine several resolution steps without generating the intermediate clauses. Such inference rules can take advantage of additional information; in particular many are formulated to use equality, which otherwise may lead to the generation of multiple redundant clauses if expressed in standard first order logic.

Note that the various methods of reducing redundancy do not invariably lead to improvements. Some of the choices that lead to different heuristics are those of whether or not to use particular methods. Understanding where to use particular options requires a lot of experience on the part of human experts, and it is the aim of the work described in this dissertation to make these choices automatically, based on measures of the conjecture and axioms under consideration.

Hyper-resolution

Robinson [72] introduced hyper-resolution, which is a multi-step resolution process where intermediate clauses are discarded. The clauses to be resolved are divided into two types: clauses with only positive literals are referred to as electrons, and a selected clause containing one or more negative literals is referred to as the nucleus. The nucleus is resolved with a series of electrons until the final resultant clause is itself an electron (contains no negative literals), and this is the output of the hyper-resolution step. Hyper-resolution is complete and will reduce the number of generated clauses, as only one clause is generated for several resolution steps, but the proof found may require more steps overall, negating some of the advantage. The theorem prover Otter and its successor Prover9 use hyper-resolution.

Set of support

Typically theorems exist within a context. In addition to the immediate premises of a conjecture, there will be existing axioms which are needed to reach the conclusion. In the proof search all the axioms must be included with the (negated) conjecture, but in an undirected resolution search many clauses may be generated by combining axioms with each other. The set of support (SOS) strategy, Wos and Robinson [98], is designed to restrict inference steps by excluding any between axioms that do not involve the premises of the conjecture or clauses derived from them by earlier inference steps. Each inference step must involve at least one clause from the SOS; clauses arising from such inference steps can then be added to the SOS.

Equational reasoning

Resolution is not efficient for automatic reasoning involving equality. Such equational reasoning is best dealt with using special inference rules (Bachmair and Ganzinger [4]) which are described in the following sections. With the E theorem prover (used in the work that forms the core of this dissertation), all logical expressions are confined to clauses containing only equational literals so that pure equational reasoning may be used. This simplifies the prover as there is no requirement to mix different types of inference rules.

Demodulation

Demodulation, Wos and Robinson [99], uses the rewriting of terms to find whether new clauses are equivalent to existing clauses, and then discards them if they are. Terms are rewritten using instances of equal terms from a set of equal terms, provided that there are no strictly more general instances of the equal term than the one used. (This generality requirement is to guarantee finiteness of the set of possible rewrites, see Wos and Robinson [99].) The rewriting step is repeated until no further steps are possible. Essentially each clause is simplified and then only kept if it has not already been found. It should be noted that demodulation is not a canonical reduction procedure (Wos and Robinson [99]). Care must be taken in its use, as completeness may be lost. The combination of demodulation with set of support can greatly reduce the number of kept clauses and hence the number of redundant resolutions carried out.

Paramodulation

Paramodulation (Wos and Robinson [73], Nieuwenhuis and Rubio [61]) like demodulation makes use of equalities. In demodulation the equalities are separate clauses and thus must be true if the whole clause set is to be true. In paramodulation the equality used is a literal within a larger clause.

If A is a clause containing the term tp at position p, and the equality ti ≈ tj is a literal in another clause, which can be expressed as

B ∨ ti ≈ tj

and lastly ti may be unified with tp using substitution σ (the most general unifier), then the two clauses B ∨ ti ≈ tj and A may be used to infer the single new clause

(B ∨ A[tj]p)σ

where A[tj]p indicates that the term tp has been replaced by tj at position p in A.

As with resolution, paramodulation, if unconstrained, leads to the generation of a large number of redundant clauses. Term ordering and literal selection (described in the following sections) are used to restrict the number of paramodulation inferences that are carried out.

Term rewriting and superposition

Demodulation and paramodulation may use equalities in either direction; for example the literal a ≈ b may be used within another literal either to replace a by b or to replace b by a. Term rewriting uses equalities, known as rewrite rules, asymmetrically. The left hand side may be replaced by the right hand side but not vice versa. A set of rewrite rules defines a relation between terms where one term is related to another if it may be obtained by a combination of substitutions and rewrite rules.

To be useful, the set of rewrite rules should be selected such that the induced relation cannot contain an infinite sequence of rewrite steps (i.e. the series of rewrite links between intermediate terms should not contain any loops). Such a relation is well-founded and the rewrite system is terminating. Terminating systems guarantee the existence of normal forms, and the induced relation is known as a reduction relation. A normal form is one that cannot be changed further by rewrite steps. An additional property that is important is confluence. A confluent system will have unique normal forms, so that in whichever order rewrite steps are taken, the same normal form is reached.

A graphical way of viewing term rewriting is to consider possible literals as nodes on a graph. Replacing terms by other terms converts literals to new literals and can be represented by edges on the graph. With rewrite rules the edges are directional. Avoiding infinite rewrite steps is equivalent to the graph being a directed acyclic graph (DAG). Normal forms are nodes with no outgoing edges. Confluence is the property that starting from a single source node and following all possible edges will always lead to the same terminating node, the normal form (so the terminating node is path independent).

Superposition is paramodulation restricted to inferences that only involve left hand sides of possible rewrite steps, see Nieuwenhuis and Rubio [61] and Bachmair and Ganzinger [4].
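Rewriting to normal form can be given a minimal sketch, under the assumption that the rule set terminates. Real rewrite systems match rules containing variables against subterms; here the rules are ground and the term representation (nested tuples) is invented purely for illustration.

```python
def rewrite_once(term, rules):
    """Apply the first matching rule, at the root or inside a subterm.
    Terms are tuples ('f', arg1, ...) or atom strings; rules map lhs -> rhs.
    Ground rewriting only -- no variables or matching."""
    if term in rules:
        return rules[term], True
    if isinstance(term, tuple):
        head, *args = term
        for i, a in enumerate(args):
            new_a, changed = rewrite_once(a, rules)
            if changed:
                return (head, *args[:i], new_a, *args[i + 1:]), True
    return term, False

def normal_form(term, rules):
    """Rewrite until no rule applies (assumes the system terminates)."""
    changed = True
    while changed:
        term, changed = rewrite_once(term, rules)
    return term

# Rules oriented left-to-right: a -> b, f(b) -> c
rules = {'a': 'b', ('f', 'b'): 'c'}
print(normal_form(('f', 'a'), rules))   # f(a) -> f(b) -> c
```

If the rule set were not terminating (e.g. it contained both a -> b and b -> a) the loop above would never exit, which is precisely why well-foundedness of the induced relation matters.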

Term ordering and literal selection

Subsumption allows redundant clauses to be deleted, but it is not sufficient to make the resolution or superposition calculus efficient enough to be practical. To prevent an explosion in the number of clauses, the resolution process may be restricted by the introduction of term ordering. Term ordering may be done by applying artificial weights to variables and functions and combining them to give weights for terms (as in Knuth-Bendix ordering), or an ordering is applied to functions and variables and then extended lexicographically to a term ordering (as in lexicographic path ordering). Term ordering is then used to impose restrictions on substitutions in paramodulation steps so that the resultant clause contains simpler terms. This may be considered as a means of only generating clauses by resolution that are simpler (by some measure) than their parent clauses.

Ordering systems were first introduced for term-rewriting systems (Dershowitz [22]). The two main orderings used in theorem provers are lexicographic path ordering (LPO) (originally put forward in unpublished work by Kamin and Lévy, see Baader and Nipkow [3]) and the Knuth-Bendix ordering (KBO) [41]. See Baader and Nipkow [3] for a review of ordering as applied to term rewriting systems. Such orderings are particularly important for the superposition calculus, where equality is involved.

The imposition of a suitable ordering constraint leads to a saturating system. Within a saturating calculus a point will be reached when further inferences between clauses within a retained set of clauses will only generate clauses that are already within the set or are otherwise redundant. If saturation is reached and the empty clause is not within the set, then it will never be generated and it can be demonstrated that there is a model (i.e. the clauses are not inconsistent), see Bachmair and Ganzinger [4].
Important properties of a term rewriting system are confluence and termination, which are related to unfailing completion. Confluence is the property that whichever series of rewrite steps is taken, the same, unique normal form is reached. Termination is the property that rewrite steps eventually reach a normal form, which cannot be further rewritten. Knuth-Bendix ordering does not guarantee unfailing completion (i.e. that all terms can be reduced to normal forms, leading to a saturated system) but developments of it do, see Nieuwenhuis and Rubio [61] and Bachmair and Ganzinger [4].

In addition to term ordering, it is also possible to further restrict clause generating inference steps by imposing a selection scheme on literals. Only selected literals can take part in an inference step; if a potential inference involves a literal which is not selected then the inference step is not performed. The E user manual (supplied with the E theorem prover) lists a number of selection strategies which may be applied.
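The weight component of a Knuth-Bendix-style ordering can be illustrated as follows. The weight table is invented for the example, and a full KBO also compares head symbols by a precedence and imposes variable-occurrence conditions, both of which are omitted here; this is only the "sum of symbol weights" part of the comparison.

```python
def term_weight(term, weights, var_weight=1):
    """Sum of symbol weights over a term.  Terms are nested tuples or
    atom strings; variables (upper-case initial letter, e.g. 'X') all
    receive the same weight var_weight."""
    if isinstance(term, tuple):
        head, *args = term
        return weights[head] + sum(term_weight(a, weights, var_weight)
                                   for a in args)
    if term[0].isupper():          # variable
        return var_weight
    return weights[term]

# Illustrative weights; heavier terms are 'more complex'
weights = {'f': 2, 'g': 1, 'a': 1}

# f(a) outweighs g(X), so a rule f(a) -> g(X) is oriented towards the
# lighter right hand side
print(term_weight(('f', 'a'), weights))   # 3
print(term_weight(('g', 'X'), weights))   # 2
```

In a prover the weight comparison is used to orient equations so that rewriting always decreases the weight, which is what makes the induced rewrite relation well-founded.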

Splitting

In seeking a refutation of a set of clauses S in union with a clause φ ∨ ψ, one option is to find two separate refutations, one of S in union with φ and the other of S in union with ψ. This process is often used in SAT solvers for propositional logic, but the backtracking involved is expensive for first order logic theorem provers. It is available in the SPASS prover. The backtracking may be avoided by introducing new propositional variables, one for each split, that indicate the branch, retaining a single set of clauses. This form of splitting is used in the Vampire theorem prover (Riazanov and Voronkov [70]) and also in the E theorem prover.

2.3.2 Practical implementations

Given clause and the DISCOUNT loop

Modern theorem provers for first order logic with equality work with a saturating calculus and use the given clause algorithm (Voronkov [91]). Whilst inference rules generate new clauses, at each step redundant clauses are removed. Saturation is reached when all possible inference steps will only generate existing clauses or redundant clauses. If the empty clause is found, inconsistency is proved (which implies that the original conjecture, prior to negation, is a theorem). If saturation is reached without the empty clause being found then a model exists for the negated conjecture, so the original conjecture is not true in all circumstances and is therefore not a theorem.

Inferences combine two or more clauses to produce a new clause. The proof search could operate on the set of all clauses, using some heuristic to search for possible inferences, but such a scheme would make it difficult to determine whether the saturation point had been reached. It would also be difficult to determine which combination of clauses to look at in an arbitrary fashion. The given clause algorithm addresses both these issues.

In the given clause algorithm the clauses are divided into two sets. One set, the processed set, consists of clauses which are saturated with respect to the other clauses in the set. That is, all inferences between clauses in the set have been done. (Note that the processed set is not saturated in the sense of such inferences generating only clauses in the same set, as clauses generated from many of the inferences are returned to the other, unprocessed set.) At each step of the given clause algorithm a single clause is selected from the set of unprocessed clauses and then all possible inferences involving that clause and clauses in the processed set are explored. Generated clauses are tested for redundancy (and also to see if they make any existing clauses redundant) before being added to the unprocessed clause set.
The given clause is then added to the processed clause set. There are variants of the given clause algorithm such as including or excluding clauses from the unprocessed set when checking for redundancy. The E theorem prover used in the work described in this dissertation uses the method originating in the DISCOUNT system (Denzinger et al. [21]).
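The loop just described can be sketched as below. The helpers generate_inferences, is_redundant and pick are hypothetical placeholders standing in for a prover's inference engine, redundancy tests and clause-selection heuristic; they are not the names used in E or any other prover.

```python
def given_clause_loop(initial_clauses, generate_inferences, is_redundant, pick):
    """Skeleton of the given-clause algorithm: returns True if the
    empty clause is derived, False if saturation is reached."""
    processed = set()
    unprocessed = set(initial_clauses)
    while unprocessed:
        given = pick(unprocessed)          # heuristic clause selection
        unprocessed.remove(given)
        if not given:                      # empty clause: proof found
            return True
        if is_redundant(given, processed):
            continue
        processed.add(given)
        # all inferences between the given clause and the processed set
        # (including the given clause itself, now a member of that set)
        for new in generate_inferences(given, processed):
            if not is_redundant(new, unprocessed | processed):
                unprocessed.add(new)
    return False                           # saturation: a model exists
```

The invariant is the one stated in the text: all inferences among processed clauses have been performed, and every newly generated clause waits in the unprocessed set until it is itself selected as the given clause.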

2.4 Machine learning

2.4.1 General concepts

Machine learning as a term arises from the field of artificial intelligence, but it has close parallels with model fitting in statistics. In straightforward terms it is the process of fitting a computer model to a complex function on the basis of measured data rather than, for example, from physical arguments¹. The structure of the computer model is normally a summation of a set of basis functions, and fitting the model comes down to setting values for function parameters and for the weights applied to the individual basis functions. Though techniques such as neural networks are presented as being very general, there is always an underlying assumption of functional form, so the process is one of estimating parameter values of known functions rather than determining arbitrary new functions. (Some approaches, such as Gaussian processes, are non-parametric. The role of parameters is taken by elements of a covariance matrix.)

Machine learning is divided into two main types, supervised learning and unsupervised learning (there are other variations such as reinforcement learning, but these are not relevant to the work covered in this dissertation). Supervised learning is where a set of known (previously measured) samples are used to determine estimates of function parameters. The sample data consists of input values (known as features), which will be the arguments of the function to be learned, and outcomes, which are the function values. Unsupervised learning involves seeking patterns in data. The work in this dissertation uses supervised learning exclusively.

Machine learning can be applied to two types of problems, which correspond to fitting models to discrete functions or to continuous functions. Classification problems place results into classes on the basis of measured features. The continuous case, where the output can take any real value, is referred to as regression. The work described in this dissertation involves choosing heuristics and so is a classification problem: each potential heuristic defines a class of problems for which that heuristic is the best choice.

In the machine learning process, an important concept is generalisation. Generalisation is the term used to describe how accurately a learned function predicts outcomes for data that is not part of the learning set. If the learning process uses too many parameters and is too flexible then it may be able to reproduce the learning set with a high level of accuracy but be highly inaccurate when applied to samples that are not part of the learning set. This is a case of over-fitting.

Within machine learning and statistics, there are two philosophical standpoints that can be taken. The frequentist viewpoint, in simple terms, assumes that the probability of events is best estimated by taking it equal to the frequency of occurrences of the event in earlier experiments.

¹Mitchell [55] puts it that the "field of machine learning is concerned with the question of how to construct computer programs that automatically improve with experience".
A form of distribution (e.g. Gaussian) may be assumed for the data and the parameters of the distribution are estimated on the basis of measured data, usually by taking a maximum likelihood approach (the probability of the measured data is maximised as a function of the parameters of the distribution). This results in a single value for each parameter. In contrast, the Bayesian approach assumes a probability distribution for the parameter values. The assumed prior distribution for the parameter values is combined with the probability of the measured data viewed as a function of the parameter values (the likelihood function) using Bayes' theorem, to give a posterior distribution for the parameter values given the measured data. In the full Bayesian approach, predictions for new data are made in terms of probabilities which are obtained by integrating (or marginalising) over the whole posterior distribution of the parameter values.

The starting point for both approaches is generally the likelihood function. The likelihood function is not a probability distribution, but is closely associated with one. As a simple example, consider a single random variable x that arises from a normal probability distribution with some mean µ and standard deviation σ. If several sample values are measured, giving a set of x values, then for any given value of µ and of σ the probability of measuring the set of x values can be calculated. (To be strictly accurate, for a continuous distribution the measured values need to be classified into ranges rather than point values.) The calculated probability values, viewed as a function of µ and σ, yield the likelihood function. Though it is a probability function, it is not a distribution, as it is not normalised with respect to the parameters µ and σ. To continue the example, in a frequentist approach particular values of µ and σ would be determined by finding the maximum value with respect to each (hence the maximum likelihood method).
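For the Gaussian example just given, the maximum likelihood estimates have a familiar closed form: the sample mean and the (biased, divide-by-n) sample standard deviation. A minimal sketch, with illustrative data:

```python
import math

def gaussian_mle(xs):
    """Maximum likelihood estimates of mu and sigma for a normal
    distribution: the sample mean and the biased standard deviation."""
    n = len(xs)
    mu = sum(xs) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in xs) / n)
    return mu, sigma

mu, sigma = gaussian_mle([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(mu, sigma)   # 5.0 2.0
```

Note that the ML estimate of σ divides by n, not n − 1; this is the single-point answer the frequentist route produces, with no distribution over µ and σ themselves.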
In the Bayesian approach the parameters µ and σ would themselves be considered random variables with their own probability distributions, so there are further parameters associated with the probability distributions of the original parameters. Bayes' theorem allows the likelihood function to be combined with the assumed prior distribution of the parameters to give a posterior distribution — the probability distribution for the parameters given the known values of the measured data.

In mathematical terms, for the case of a classifier, the Bayesian approach gives the probability of a new sample point x being placed in class C1, given a training set of samples {... xi ...} with their corresponding classes {... Ci ...}, which can be expressed as follows. In the following the training set is referred to as s, and the possible parameter values of the model (the hypotheses) are referred to as h. First, from the integral form of the probability sum rule, the total probability of class C1 being selected is the probability of both C1 and the parameter values being h, integrated over all possible values of h:

p(C1 | x, s) = ∫ p(C1, h | s, x) dh

Further, the probability of C1 and h is the probability of h multiplied by the probability of C1 given h:

p(C1 | x, s) = ∫ p(C1 | h, x) p(h | s) dh

Bayes' theorem can then be used to re-express p(h | s):

p(C1 | x, s) = ∫ p(C1 | h, x) (p(s | h) p(h) / p(s)) dh

Here p(s | h) is the likelihood function, p(h) is the prior distribution, and p(s) may be viewed as a normalising term which could be obtained from

p(s) = ∫ p(s | h) p(h) dh

It should also be noted that the above has left out, for reasons of clarity, the parameters of the prior distribution p(h), which are referred to as hyperparameters. For the Bayesian approach to make sense, the posterior distribution should be narrower than the assumed prior distribution, so that the measured data narrows down the variance in the parameters. Careful selection of the form of the prior distribution, in the light of the form of the likelihood function, can give rise to a posterior distribution that is of the same form. These conjugate forms are particularly useful where the process is iteratively applied as more data is obtained.

The frequentist approach can lead to over-fitting. The Bayesian approach imposes a prior distribution, so that sample data leads to shifts in the assumed distribution rather than an exact fit, which reduces the likelihood of over-fitting; but the assumed distribution itself may not be a good model for reality. Note that the two approaches (maximum likelihood and Bayesian updating of a prior distribution to a posterior distribution) will converge in the limit of an infinite number of data points. The subject is well covered by Bishop [7].
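The conjugate case mentioned above can be made concrete in the simplest setting: a Gaussian likelihood with known variance σ² and a Gaussian prior on µ, for which the posterior over µ is again Gaussian, with the standard precision-weighted update. The numerical values below are purely illustrative.

```python
def posterior_mu(prior_mean, prior_var, sigma2, xs):
    """Bayesian update for the mean of a Gaussian with known variance
    sigma2, under a Gaussian prior N(prior_mean, prior_var).
    Returns the mean and variance of the (Gaussian) posterior."""
    n = len(xs)
    xbar = sum(xs) / n
    # precisions (inverse variances) add; data contributes n / sigma2
    post_var = 1.0 / (1.0 / prior_var + n / sigma2)
    post_mean = post_var * (prior_mean / prior_var + n * xbar / sigma2)
    return post_mean, post_var

# Broad prior N(0, 100) and unit-variance data: the posterior
# concentrates near the sample mean, and is narrower than the prior
mean, var = posterior_mu(0.0, 100.0, 1.0, [4.8, 5.1, 5.3, 4.9, 5.0])
print(mean, var)
```

Because the posterior is again Gaussian, it can serve as the prior for the next batch of data, which is exactly the iterative use of conjugate forms described in the text.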

2.4.2 Machine learning approaches

Though modern machine learning is a development of artificial intelligence, most of the methods are very similar to approaches developed within the more ancient field of statistical analysis. For example, Bayesian learning applied to a linear combination of basis functions, based on an assumed Gaussian prior distribution, leads to the same equations as arise in least squares error fitting. The Bayesian framework gives a sound basis to the process of selecting the sum of the squares of the differences as an error measure; the original developers may have chosen it on the basis of mathematical convenience, but the method is no different in its implementation.

Additionally, some methods which appear separate are on deeper analysis seen to be related. For example, as shown by Rasmussen and Williams [69], the relevance vector machine can be viewed as a special case of a Gaussian process, and there is a close correspondence between the maximum a posteriori (MAP) probability solution of a Gaussian process classifier and the support vector machine (these and other terms in this paragraph are explained in the sections that follow). More complex methods in some cases can be viewed as extensions of simpler approaches; for example, the nodes of a neural network are essentially simple perceptrons. Some general terms apply to a whole family of methods, in particular kernel machines.

The parts of the terminology that arise from artificial intelligence often reflect an historic background in early attempts to mimic the human brain, for example "neural network" and "perceptron". Such terms imply a level of complexity or sophistication which is higher than the simple models on which they are based. Similarly "machine" and "agent" are often applied to computer programs that are designed for a single application rather than the multi-functional capability that a layman might associate with the terms.
The more important methods are decision trees, perceptrons, neural networks, support vector machines, relevance vector machines and Gaussian processes. Most of these also fall under the general area of kernel methods or kernel machines.

2.4.3 Decision trees

Decision trees arise from applying serial classifications, each of which refines the final outcome. That is, each decision subdivides the members of a set of samples to be classified into smaller subsets, and the subsets associated with each leaf of the decision tree consist of a single class. The advantage of the decision tree approach is that each decision point depends on one or only a few features, and there is potentially useful information available that is lost when the overall classification is treated as a black box with features as input and a simple classification as output. The disadvantage is that the structure may not accurately model the behaviour of the system being modelled. Decision trees are covered in chapter 9 of Alpaydin [2].

In the work on heuristic selection described in the main body of this dissertation, a decision tree approach could be taken to combine the individual classifiers for each heuristic. To do so an ordering on the heuristics would need to be imposed, the obvious one being the numeric order. The decision tree could start with a classifier that splits the samples into two classes, one for which the conjectures are deemed too difficult to prove and the other for which a heuristic will find a proof in reasonable time (this classifier is denoted as the heuristic 0 classifier in the current work). The second branching point of the decision tree would then split the class of conjectures that can be proved into two further classes, the first for which heuristic 1 is the best heuristic to use and the second for which some other heuristic is best. The next branching point would use another classifier to split the latter class into two more classes, the first for which heuristic 2 is the best heuristic and the second for which another heuristic should be used. This process is then repeated, down to a final split between heuristics 4 and 5 (there being 5 heuristics in all)².
Abe [1] discusses such a decision tree approach along with other options. The decision tree approach was considered but not adopted in the work described in this dissertation. The two main arguments against it were, firstly, that each classifier after the first would be trained on training sets which had been determined by an earlier, imperfect, classifier, and the size of such training sets would get progressively smaller. Secondly, the ordering of the heuristics would make some classifiers more important than others, and each classifier would have to stand on its own; no advantage could be taken of comparative measures between classifiers (an earlier heuristic classifier might hijack a sample on the basis of a weakly positive result when a later heuristic classifier would provide a strongly positive result).
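The rejected decision-list scheme can be sketched as follows. The classifiers here are toy stand-ins keyed on a single invented 'size' feature, not the classifiers used in this work; classifiers[0] plays the role of the heuristic 0 unprovability filter.

```python
def select_heuristic(features, classifiers):
    """Decision-list heuristic selection as described in the text.
    classifiers[0] decides 'too hard to prove'; classifiers[k] (k >= 1)
    decides 'heuristic k is best'.  Returns 0 for unprovable, otherwise
    the chosen heuristic number (the last heuristic is the default)."""
    if classifiers[0](features):       # deemed too difficult to prove
        return 0
    for k in range(1, len(classifiers)):
        if classifiers[k](features):
            return k
    return len(classifiers)            # final split falls to heuristic 5

# Toy stand-in classifiers keyed on one hypothetical 'size' feature
classifiers = [
    lambda f: f['size'] > 1000,        # heuristic 0: give up
    lambda f: f['size'] < 10,          # heuristic 1 best on tiny problems
    lambda f: f['size'] < 100,         # heuristic 2 best on small problems
    lambda f: f['size'] < 500,         # heuristic 3
    lambda f: f['size'] < 800,         # heuristic 4; otherwise heuristic 5
]
print(select_heuristic({'size': 50}, classifiers))   # 2
```

The structure makes the objection in the text visible: a weakly positive result at classifiers[1] would capture a sample before classifiers[2] is ever consulted, however strongly the later classifier would have fired.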

2.4.4 Linearly separable classes

A starting point for the perceptron algorithm and what are known as hard margin support vector machines is the simple case of classification into two classes which can be determined by splitting feature space into two sections using a hyperplane. For any sample the class can be determined by measuring features and determining which side of the hyperplane the sample is placed. If it is possible to position a hyperplane such that no sample ever appears on the wrong side of it (and is thus misclassified), then the data is termed linearly separable. It is easiest to envisage this in two dimensions where the hyperplane is simply a straight line. The two classes could have any pair of labels but it is convenient mathematically to label one class with +1 and the other with -1. Figure 2.1 shows an example set of points in a two-dimensional feature space. The vector equation for a hyperplane may be expressed

n · x = constant

where n is a vector normal to the hyperplane and x is a general vector from the origin to a point in the hyperplane. In the context of machine learning and classification, the normal vector is considered as a vector of weights applied to each feature value within the sample vector x, and is labelled w rather than n; the constant is referred to as a bias and labelled b, with the sign selected to give the hyperplane the equation

w · x + b = 0

It is then very simple to determine whether a given sample point xi is below the hyperplane,

w · xi + b < 0

or above the hyperplane,

w · xi + b > 0

hence the expression w · x + b is known as the discriminant.

²The decision "tree" described in this example is actually a decision list, as there is only a single branch leading onto further decisions at each decision point, but the general comments still apply.
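The discriminant test itself is a one-line computation. The weights and bias below are arbitrary example values rather than learned ones.

```python
def classify(w, b, x):
    """Sign of the discriminant w.x + b: +1 above the hyperplane,
    -1 below, 0 exactly on the hyperplane."""
    d = sum(wi * xi for wi, xi in zip(w, x)) + b
    return (d > 0) - (d < 0)

# Example hyperplane x0 + x1 = 1 in a 2D feature space
w, b = (1.0, 1.0), -1.0
print(classify(w, b, (2.0, 2.0)))   # 1  (positive side)
print(classify(w, b, (0.0, 0.0)))   # -1 (negative side)
```

Learning the classifier amounts to choosing w and b from the training samples; the classification step is then just this sign computation.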

[Figure: positive and negative sample points in the (x0, x1) plane, with a separating hyperplane dividing the positive class region from the negative class region.]

Figure 2.1: A classification problem in 2D feature space
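As a concrete illustration of the discriminant, the sign test can be written in a few lines. This is an illustrative sketch only: the hyperplane and the sample points below are invented and do not come from the dissertation's experiments.

```python
def discriminant(w, x, b):
    """Evaluate w . x + b for a sample point x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(w, x, b):
    """+1 if the sample lies on the positive side of the hyperplane, -1 otherwise."""
    return 1 if discriminant(w, x, b) > 0 else -1

# An illustrative hyperplane x0 + x1 - 1 = 0 in a 2D feature space.
w, b = [1.0, 1.0], -1.0
print(classify(w, [2.0, 2.0], b))   # prints 1  (above the hyperplane)
print(classify(w, [0.0, 0.0], b))   # prints -1 (below the hyperplane)
```

The classifier is thus fully specified by the pair (w, b); learning amounts to choosing that pair from the training data.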

The hyperplane defined by (w, b) is the classifier which is to be determined by machine learning. Note that it is the direction of w that is important, not the magnitude. The magnitude can be scaled by adjusting the bias b. For a given training set of samples, if they are linearly separable, the positive samples must be in a different region of feature space to that occupied by the negative samples. The training samples are used to estimate the location of the two regions which are to be separated by the hyperplane. To aid intuition, make the assumption that the sample points are randomly but fairly uniformly distributed throughout the appropriate regions of feature space. One approach to determining a suitable w is then to estimate central points within the positive and negative class regions and set w to be the vector connecting the two points (see figure 2.2).

The two points, xc+ and xc−, are analogous to centres of mass of particles, with the mass associated with a sample weight. Since the symbol wi is already associated with the i-th coefficient of the normal vector w, the weights associated with each sample point xi are given the symbol αi. So the expressions for xc+ and xc− are

xc+ = Σi=1..n+ αi+ xi+

xc− = Σi=1..n− αi− xi−

where the + and − superscripts are class labels. Simple vector algebra gives

w = xc+ − xc−

[Figure: the two class centres xc+ and xc− in the (x0, x1) plane, with w drawn as the vector from xc− to xc+, normal to a separating hyperplane.]

Figure 2.2: Defining the hyperplane normal in terms of weighted sample vectors

The yi values or class labels associated with each sample point xi, as already noted, are +1 or −1, and by incorporating these into the above equations the need to label points is removed, giving a simple final expression for w:

w = Σi=1..n yiαixi

where the summation is over all samples, with a suitable renumbering/relabelling process applied to the sample points and weights. The learning problem of finding w has been re-expressed as an associated problem of finding the values of the weights αi. This latter form is known as the dual form and is very important for more sophisticated classifiers such as support vector machines. The standard approach of finding w or its coefficients wi directly is referred to as the primal form. It should also be noted that whichever approach is taken to finding w, the bias b must also be determined as part of the same learning process.

The above discussion is not rigorous and it is fairly easy to think of examples where the sample points in the two classes are clustered in small subregions, so that a naive joining of centres would result in an erroneous direction for w. The following shows that the final equation

w = Σi=1..n yiαixi

will hold for some set of αi values, provided that there are as many samples as there are feature dimensions and these are not linearly dependent. Take a set of as many samples as there are feature dimensions and add the vector w to the resultant set of vectors. The new set of vectors must be linearly dependent. Therefore one of the vectors, in particular w, must be expressible as a sum of the other vectors, hence

w = Σi=1..n yiαixi

must hold for some set of finite values αi. If the samples are linearly dependent and do not span the space then it may be that the ideal w cannot be expressed in this way. For linearly separable samples there will always be some separating hyperplane whose normal can be expressed as

w′ = Σi=1..n yiαixi

but the hyperplane may not separate new samples.
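The dual-form expression can be checked numerically: with uniform weights within each class (here two samples per class, so each αi is simply 0.5) the sum Σi yiαixi reproduces the vector joining the two class centres. The data below is a toy example invented for illustration, not taken from the dissertation.

```python
def dual_w(samples, labels, alphas):
    """w = sum_i y_i * alpha_i * x_i (the dual form of the normal vector)."""
    dim = len(samples[0])
    w = [0.0] * dim
    for x, y, a in zip(samples, labels, alphas):
        for j in range(dim):
            w[j] += y * a * x[j]
    return w

# Two positive and two negative samples in 2D.
X = [(3.0, 3.0), (4.0, 2.0), (0.0, 0.0), (1.0, -1.0)]
y = [+1, +1, -1, -1]
# Uniform weights 1/n+ and 1/n- reproduce w = xc+ - xc-.
alpha = [0.5, 0.5, 0.5, 0.5]
w = dual_w(X, y, alpha)
print(w)   # prints [3.0, 3.0], the vector from xc- = (0.5, -0.5) to xc+ = (3.5, 2.5)
```

Learning in the dual form then consists of adjusting the αi away from these uniform values.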

2.4.5 Perceptrons

In 1958 Frank Rosenblatt [76] proposed an iterative algorithm for learning linear classifications. He was interested in modelling how the human brain may learn via visual examples, hence the iterative approach by which w and b are updated with each new training sample. The algorithm follows on simply from the above analysis, though his original paper is long, with detailed arguments justifying the approach taken. The resultant classifier is named a perceptron as it arises from learning from perceptions. In the perceptron algorithm most weight is given to points nearer the optimal position of the hyperplane. Such points will be the first to be misclassified if the hyperplane is shifted from its optimal position. The algorithm, as far as determining w is concerned, is simply to increase the weight associated with any sample point that is misclassified. In the full algorithm the weight is increased by a learning factor of less than one, but the principle is unaffected if this is set to one in the following analysis. As a further simplification only a positive sample will be considered; the negative case follows very simply in an analogous manner. After k misclassifications and associated adjustments to w and to b, the estimates for their values are wk and bk. The (k+1)-th misclassified point is xi, a positive point misclassified as negative, so that

wk · xi + bk < 0

Increasing the weight of xi by 1 gives the new estimate of w as

wk+1 = wk + xi

Even without making any change to b it can be seen that the discriminant for xi is more positive, from

wk+1 · xi + bk = (wk + xi) · xi + bk = wk · xi + bk + xi · xi

Though the new wk+1 is an improvement for the point xi it is not necessarily so for another positive point xj, say. Here the change in the discriminant is xi · xj, which may be negative. But the overall change can be ensured to be positive if a sufficiently large positive change is made in b. This is done by setting

bk+1 = bk + R²   where   R = max1≤i≤n ||xi||

Each misclassified point will thus shift the hyperplane in the right direction. It can also be seen that the weights will increase with each point that is repeatedly misclassified (it is assumed that sample points are repeatedly available or repeated copies of them are in the training set), so the change represented by a unit increase will decrease in relative magnitude as the algorithm proceeds. It is therefore reasonable to surmise that the process will iterate towards a stable solution where the training set is linearly separable. This is indeed the case; see Cristianini and Shawe-Taylor [59] for a proof originally produced by Novikoff [64].
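The update rule above translates directly into code. The following is a minimal sketch of the perceptron with the learning factor fixed at one and the bias updated by ±R², as in the simplified analysis; the toy training set is invented for illustration.

```python
def train_perceptron(samples, labels, max_epochs=100):
    """Rosenblatt-style perceptron: shift the hyperplane towards each
    misclassified point (learning rate fixed at 1)."""
    dim = len(samples[0])
    w = [0.0] * dim
    b = 0.0
    # R^2 = (max_i ||x_i||)^2, used to scale the bias update.
    R2 = max(sum(xi * xi for xi in x) for x in samples)
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            if y * (sum(wi * xi for wi, xi in zip(w, x)) + b) <= 0:
                # Misclassified: bump the weight of this point by one unit.
                for j in range(dim):
                    w[j] += y * x[j]
                b += y * R2
                mistakes += 1
        if mistakes == 0:        # a full pass with no errors: converged
            return w, b
    return w, b

X = [(2.0, 2.0), (3.0, 1.0), (-1.0, -1.0), (-2.0, 0.0)]
y = [+1, +1, -1, -1]
w, b = train_perceptron(X, y)
print(w, b)
```

For linearly separable data Novikoff's theorem guarantees that the loop terminates; for non-separable data the epoch limit simply cuts it off.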

2.4.6 Margin

The product of the discriminant and the classification value yi (which is plus or minus 1) gives a positive number for correctly classified points that is a measure of the distance from the point to the dividing hyperplane. This is known as the margin. (For the margin to be a direct measure the weight vector w should be normalised to 1, i.e. ||w|| = 1.) The margin is generally given the symbol γ.

γi = yi(w · xi + b)

The margin for a single sample, as defined above, is referred to as a functional margin (Cristianini and Shawe-Taylor [59]). In the work described in this dissertation, where a single sample is being referred to, the functional margin is referred to simply as the margin. In particular, this is the measure used to compare classification results for the different heuristic classifiers. In the training phase, in which the position of the hyperplane is being determined, it is the minimum value of the margin over all training samples that is of importance (i.e. the amount of no-man's-land on either side of the border). This is the functional margin of the hyperplane with respect to the training set. The maximum value of this (minimum) margin over all possible hyperplanes is the functional margin of the training set. In the remainder of this document all types of margin will be referred to simply by the term margin unless the context is not sufficiently specific to remove ambiguities. The margin is of particular importance in classifiers using the support vector machine, which is discussed in a following section.
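The functional margin of a hyperplane with respect to a training set, the minimum of yi(w · xi + b) with w normalised to unit length, can be computed directly. A small sketch with invented data:

```python
import math

def functional_margin(w, b, samples, labels):
    """Minimum of y_i (w . x_i + b) over the training set, with w normalised
    so the margin is a genuine distance."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    wn = [wi / norm for wi in w]
    bn = b / norm
    return min(y * (sum(wi * xi for wi, xi in zip(wn, x)) + bn)
               for x, y in zip(samples, labels))

X = [(2.0, 0.0), (-2.0, 0.0)]
y = [+1, -1]
# Each point lies a distance 2 from the hyperplane x0 = 0.
print(functional_margin([1.0, 0.0], 0.0, X, y))   # prints 2.0
```

A negative value would indicate that at least one training point is misclassified by the hyperplane.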

2.4.7 Transforming the feature space

The simple perceptron algorithm relies on the data being linearly separable for the process to terminate (if no hyperplane is able to separate the two classes within the training set then there will always be points that are on the wrong side).

x1  x2  φ1(x) = x1  φ2(x) = x2  φ3(x) = x1·x2  w·φ(x)+b  x1 EOR x2
 0   0      0           0            0            −1          0
 0   1      0           1            0            +1          1
 1   0      1           0            0            +1          1
 1   1      1           1            1            −1          0

Table 2.1: Discriminant for the example EOR function.

Where data is not intrinsically linearly separable it is possible that it may be made so by transforming the feature space, i.e. by creating new basis functions each of which is a function of one or more of the originally measured features. This process may change the dimension of the problem.

That is, a point in feature space defined by the vector xi is transformed to a new point in transformed space defined by the vector Φ(xi), which has vector components (..., φj(xi), ...), each of which is a function of the original feature vector xi. Note that the dimension of the transformed space may be different from that of the feature space, either greater or smaller. A simple example of a data set that is not linearly separable, but can be made so by transformation, is one where the classification is given by the exclusive OR function of the two binary feature values of the samples. In this simple case, the two binary features can be transformed to a three-dimensional space with an additional feature given by their product. It is then simple to separate the classes in the transformed space using a hyperplane. Table 2.1 shows how a discriminant with w1 = 2, w2 = 2, w3 = −4 and b = −1 provides a correct classification. (Note that w has not been normalised to 1 in this example, to keep the numbers as integers.)
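The weights from Table 2.1 can be checked mechanically: applying the transformation φ(x) = (x1, x2, x1·x2) and the discriminant with w = (2, 2, −4) and b = −1 reproduces the exclusive OR classification for all four inputs.

```python
def phi(x1, x2):
    """Map the two binary features into 3D: (x1, x2, x1*x2)."""
    return (x1, x2, x1 * x2)

def discriminant(x1, x2, w=(2.0, 2.0, -4.0), b=-1.0):
    """The discriminant of Table 2.1, evaluated in the transformed space."""
    return sum(wi * fi for wi, fi in zip(w, phi(x1, x2))) + b

# The sign of the discriminant reproduces XOR for all four inputs.
for x1 in (0, 1):
    for x2 in (0, 1):
        assert (discriminant(x1, x2) > 0) == bool(x1 != x2)
```

No separating hyperplane exists for XOR in the original two-dimensional space, so the extra product feature is doing the essential work here.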

2.4.8 Kernel functions arising from transformed space

Having transformed the feature space, the same approach and expressions for hyperplanes can be used but with the vector x replaced by the transformed vector Φ(x). So the margin for a sample point x with associated class y is

γ = y(w · Φ(x) + b)

where

w = Σi=1..n yiαiΦ(xi)

Combining the two gives

γ = y(Σi=1..n yiαiΦ(xi) · Φ(x) + b)

The margin γ now represents a distance in transformed space rather than the original feature space. A key point to note in the above expressions is that the transformed vector Φ(x), for any point x in feature space, only appears as part of a scalar product with another transformed vector Φ(x′), i.e.

Φ(x) · Φ(x′)

Regarding the scalar product of the transformed vectors as a function of the original feature vectors gives a kernel function

K(x, x′) = Φ(x) · Φ(x′)

In terms of this kernel function the margin becomes

γ = y(Σi=1..n yiαiK(xi, x) + b)

The fact that the feature space transformation is only an intermediate step gives a great deal of flexibility. For example, the dimension of the transformed space could be infinite as long as the scalar product is well defined. It also means that it is possible to work directly in terms of a kernel function without ever defining the corresponding feature transformation. To do this requires that the kernel function conforms to necessary conditions arising from it being the scalar product of some feature space transformation3. The necessary and sufficient conditions4 were first set out by Mercer [54]. Cristianini and Shawe-Taylor [59] give a detailed discussion of Mercer's theorem and include a proof for the simplified case of a finite feature space. (In a finite feature space a symmetric matrix of all possible K(xi, xj) values can be defined and shown to be positive semi-definite, with non-negative eigenvalues, if, and only if, K(xi, xj) is a valid kernel function.) In addition to defining necessary conditions for a function to be a kernel function, it is possible to determine functional transformations under which the Mercer conditions are invariant. That is, applying such transformations to valid kernel functions will always result in new functions that are also valid kernel functions, without the need to check the Mercer conditions afresh. Thus new kernel functions can be developed from existing functions. For instance if K1(x, y) is a kernel function then so is

K2(x, y) = K1(φ(x), φ(y))

Cristianini and Shawe-Taylor [59] give a list of such constructions in their book. By means of such transformations, new kernel functions may be derived without having to demonstrate compliance with Mercer's theorem in every case.
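A standard textbook example (not specific to this dissertation) makes the correspondence concrete: the homogeneous quadratic kernel K(x, x′) = (x · x′)² on 2D inputs equals the scalar product under the explicit feature map Φ(x) = (x1², √2·x1x2, x2²), so the kernel can be evaluated without ever constructing Φ.

```python
import math

def K(x, y):
    """Homogeneous quadratic kernel: the square of the ordinary scalar product."""
    return sum(xi * yi for xi, yi in zip(x, y)) ** 2

def phi(x):
    """An explicit feature map whose scalar product reproduces K (2D input)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)

x, y = (1.0, 2.0), (3.0, 0.5)
lhs = K(x, y)
rhs = sum(a * b for a, b in zip(phi(x), phi(y)))
# The kernel value equals the scalar product in the transformed space.
assert abs(lhs - rhs) < 1e-9
```

The same evaluate-the-kernel-directly trick is what makes infinite-dimensional transformed spaces (e.g. for Gaussian kernels) usable in practice.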

2.4.9 The support vector machine

To acquire the maximum accuracy in the learning process, given noisy data, as large a training set as possible should be used. One trade-off is that the time taken during the learning process is increased. In many circumstances this doesn't matter, as the critical time is that taken to classify new points. (In the context of the present work, being able to determine the heuristic to be used for a new problem needs to be done quickly, as this is part of the solution time, but if the learning process takes several days of computer time it doesn't matter as it is only done once.) But for the simple perceptron approach there is a second drawback to a large training set. The time taken to do the classifying is also increased, as all elements of the training set are involved. Furthermore, many elements of the training set may be a long way from the hyperplane and, it can be argued5, provide no useful information. In Rosenblatt's algorithm these elements may be associated with zero coefficients and thus ignored, but this is not an intrinsic part of the algorithm and is also dependent on the order in which the points are examined. The support vector machine (SVM) takes advantage of sparsity in a more systematic way. The set of vectors is restricted to those nearest the hyperplane, which define the location of the hyperplane. These are called the support vectors. It is possible to fit an SVM to training data that is not linearly separable, but initially the linearly separable case will be considered. Where the data is linearly separable the functional margin of the training set is well defined and the resultant SVM is referred to as a hard margin support vector machine. The hard margin SVM is a maximal margin classifier, that is, the hyperplane is selected to give the maximum possible (minimum) margin over all training samples. One way of viewing the margin γ is as adding a thickness to the dividing hyperplane. The standard equation for the hyperplane defines its centre, and its thickness is given by 2γ (a margin of γ either side of the central hyperplane).

3 To be more precise, for a function to be a kernel function there must exist some feature space transformation that may be applied to any pair of arguments of the kernel function to yield two transformed vectors whose scalar product is equal to the value of the kernel function.

4 According to Cristianini and Shawe-Taylor [59], Mercer's theorem gives necessary and sufficient conditions for a continuous symmetric function K(x, z) to admit a representation

K(x, z) = Σi=1..∞ λiφi(x)φi(z)

with non-negative λi.
For any given direction of the hyperplane (as defined by the normal vector w) the thickness can be increased until it hits sample points on both sides (i.e. the sample points in the two classes closest to the hyperplane). Note that the constraining sample points are on both sides, because if the thickness were constrained on one side only then it could be increased further by moving the centre, i.e. the hyperplane. Maximising the margin comes down to minimising the square of the norm of w, i.e. ||w||² = w · w, where w and b are scaled such that

γmin ||w|| = 1

and applying the constraints that no sample point has margin less than γmin. Expressed in this way the problem is a well-behaved quadratic programming problem with a unique minimum (i.e. there is a unique minimum value of ||w|| corresponding to a unique maximum value of γmin). Note that there may be many values of w corresponding to the unique value of ||w||. See Abe [1].

5 The training samples define a region of feature space, or transformed feature space, and it is the boundary of this area which is important for determining the classification of new samples. If all the training points are accurate then the points nearest the boundary are the important ones. On the other hand, if the points are noisy the argument could be made that it is the centre of the spatial region that can be most accurately determined from the training samples, and a classification decision should be based on which class centre a new sample is nearest. The perceptron algorithm and SVMs are predicated on determining spatial class boundaries, but with some allowance, in the latter case, for noisy data in the guise of slack variables, which are described in the main text.

The constraining sample points for which the margin γ = γmin are known as support vectors. The other sample points within the training set can be discarded and w defined as a weighted sum of the support vectors alone. (In the process of fitting the hyperplane using Lagrange multipliers to impose the margin condition, only the support vectors have non-zero Lagrange multipliers or weights.)
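In one dimension the maximal margin "hyperplane" can be found in closed form, which makes the geometry concrete: the optimal threshold sits midway between the closest pair of opposing samples, and those two extreme samples are the support vectors. This is an illustrative sketch only, not the quadratic programming formulation used in practice.

```python
def hard_margin_1d(pos, neg):
    """Maximal margin separator for 1D data with all positives above all
    negatives. Returns (threshold, margin); the two extreme points
    (min of pos, max of neg) are the support vectors."""
    lo, hi = min(pos), max(neg)
    assert hi < lo, "data must be separable with positives above negatives"
    threshold = (lo + hi) / 2.0   # centre of the widest empty band
    margin = (lo - hi) / 2.0      # half the gap: distance to each support vector
    return threshold, margin

t, m = hard_margin_1d(pos=[4.0, 5.5, 7.0], neg=[-1.0, 0.5, 1.0])
print(t, m)   # prints 2.5 1.5
```

Note that discarding every sample other than the two support vectors would leave the answer unchanged, which is exactly the sparsity the SVM exploits in higher dimensions.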

2.4.10 Nonseparable data and soft margin classifiers

Even with transformation of the feature space some data sets may not be linearly separable. Additionally, measured data is likely to be noisy and some sample points may be erroneous. A modification of the previously described hard margin support vector machine provides a robust solution in such circumstances. The so-called soft margin approach allows a predetermined number of the sample points to lie within the minimum margin or even on the wrong side of the hyperplane, i.e. to be misclassified. The points which are permitted to be less than the minimum margin away from the hyperplane are associated with slack variables which measure the degree to which the margin has been breached. Formally, the previous condition (where scaling has been applied to make the minimum margin 1)

yi(w · xi + b) ≥ 1

is relaxed to

yi(w · xi + b) ≥ 1 − ξi

(see Abe [1]); the non-negative values ξi are the slack variables. Ideally the number of points with non-zero values for the slack variables should be minimised. This is a combinatorial problem and as such is not conducive to efficient numerical solution. To avoid the combinatorial problem, all points are assigned slack variables but the norm of the slack variable vector is minimised. That is, the previous minimisation of ||w||² is extended to the minimisation of

||w||² + C Σi=1..n ξi^p

The parameter p is 1 for a 1-norm and 2 for a 2-norm; the parameter C represents a trade-off between training error and margin. In the SVM implementation used in the work described in this dissertation, SVMLight [36], the parameter C is split into two to allow different weights for positive and negative samples (to allow for unbalanced sets). The target function to be minimised is

(1/2)||w||² + C+ Σi:yi=+1 ξi + C− Σj:yj=−1 ξj

see Morik et al. [56]. The SVMLight parameter j is used to set the ratio of C+ to C−, and typically should be equal to the ratio of the number of negative samples to the number of positive samples in the training set, i.e.

j = C+/C− = (number of negative training examples)/(number of positive training examples)

The default value of j is 1. The magnitude of the parameter6 C may be set by the user but defaults to the average value of (xi · xi)^−1 over the training set.

As with the hard margin case, the method of Lagrange multipliers is used to integrate the constraints into the optimisation problem. The requirement for the slack variables to be positive is not necessary as negative values, corresponding to points with margins greater than the minimum, will have zero values for the corresponding Lagrange multi- pliers.
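Both quantities can be computed directly from the training set. The sketch below shows the arithmetic; the function names are illustrative, and in SVMLight itself the corresponding values are supplied via the -j and -c command line options.

```python
def svmlight_j(labels):
    """Suggested j = C+/C- = (#negative examples) / (#positive examples)."""
    n_pos = sum(1 for y in labels if y == +1)
    n_neg = sum(1 for y in labels if y == -1)
    return n_neg / n_pos

def default_c(samples):
    """Default C: the average of (x_i . x_i)^-1 over the training set."""
    return sum(1.0 / sum(xi * xi for xi in x) for x in samples) / len(samples)

labels = [+1, +1, -1, -1, -1, -1]
samples = [(1.0, 0.0), (0.0, 2.0), (2.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 2.0)]
print(svmlight_j(labels))   # prints 2.0: twice as many negatives as positives
print(default_c(samples))
```

An unbalanced set with j left at 1 would let the classifier drive down the training error simply by favouring the majority class, which is why the ratio matters here.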

2.4.11 Alternatives to SVMs

SVMs were selected as the machine learning method for the work described in this dissertation for a number of reasons. SVMs are a proven technique involving a well defined optimisation problem without local minima. SVMs are well supported with existing software, the package selected for this work being SVMLight [36]. SVMs are efficient as they utilise a sparse approach: the training set is effectively reduced to the support vectors once the learning process is complete. For completeness some other methods of machine learning are described in the following subsections.

Neural networks

The nodes of a neural network have a structure very similar to that of a perceptron. In fact neural networks are referred to as multi-layer perceptrons, but this is not a strictly accurate description, as the perceptron uses a step function (i.e. a classifier) whilst the nodes within a neural network use a smooth differentiable function; see Bishop [7]. The inputs of each neural network node are weighted and summed before a nonlinear threshold function is applied to give an output. The output in turn may become one of the inputs of a further node. In the early days of AI research the view was put forward that this was a simple model of the way neurons may work in the brain. The more modern view is a statistical one. Neural networks provide a compact model once they have been optimised, but the optimisation process may not be well behaved, as the error function is not a convex function. In contrast, the support vector machine involves a convex optimisation problem to determine the model parameters (Bishop [7]). For more on neural networks see for example Ripley [71] or Bishop [7].

The relevance vector machine

The relevance vector machine is similar to the support vector machine, but has the advantage that the importance of each vector is determined as an intrinsic part of the process and doesn't need to be separately determined, as in the support vector machine. The relevance vector machine produces a probability distribution as an output rather than a direct classification, but it is straightforward to set a threshold for classification. The drawback of the relevance vector machine is that the model parameter fitting problem is not a simple convex optimisation. See Bishop [7] for details.

6 The SVMLight user notes simply refer to C rather than C− or C+ but in the current work the default value was used and only j varied. See Appendix C for the effects of varying C from the default.

Gaussian processes

Gaussian processes (Rasmussen and Williams [69]) provide a more fundamental approach to the machine learning process and can be shown to be related to neural networks, support vector machines and other methods. Gaussian processes work with a distribution over random functions directly, rather than defining basis functions with parameters and working with distributions over the parameters of the functions. The key to making the process of dealing with functions rather than point variables tractable is to consider the functions at a finite number of sample points. The values of the functions within the distribution at these sample points are related to each other through a Gaussian covariance matrix. Gaussian processes are fully defined by second order statistics, i.e. the covariance and the mean. The mean is often taken to be zero, so that the process is defined by the covariance matrix (this is the case when there is no prior information suggesting a value for the mean). Expressing the covariance as an expectation of the dot product of the function vectors leads naturally to a kernel function; see Bishop [7]. As with kernel machines (such as SVMs), the kernel function can either be derived from basis functions or may be selected directly. See Rasmussen and Williams [69] for details.

2.4.12 Feature selection

Kernel methods involve the scalar product of transformed feature space vectors. The dimension of the transformed vectors may be different from the dimension of the feature space (the number of features). Additionally, the transformed feature space vectors are not independently calculated; the kernel function calculates the value of the scalar product of two transformed feature space vectors directly from the original untransformed feature space vectors. It is thus possible to do machine learning using many features, to gather as much information as possible, without this implying the need for an infeasible number of measured samples. Despite this there are two reasons for reducing the set of features to only those that make a useful contribution. First, more features slow both the learning and generalisation processes, and second, it is useful and interesting to discover which features are pertinent to determining the complexity of the proof problems and the best heuristic to use (in the context of the work described in this dissertation). The process of determining which features to retain out of a larger initial set is referred to as feature selection. The process is complicated by interaction between features; that is, features may be required in combination, so individual features cannot be treated in isolation. To be absolutely confident that the final reduced set of features is the best possible it would be necessary to look at all subsets of the original set (all members of the powerset). For all but the smallest of feature sets this brute force approach is not feasible. (For the work described in this dissertation the number of features looked at was 53, which gives a powerset of size 2^53, or approximately 10^16.) It is therefore necessary to compromise and explore only a part of the potential search space.
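The infeasibility of the exhaustive search is clear from the count itself. A quick check of the arithmetic, with an assumed (and optimistic) evaluation rate of one million subsets per second:

```python
n_features = 53
subsets = 2 ** n_features
print(subsets)                        # prints 9007199254740992, roughly 9.0e15

# Even at a million full learn-and-test cycles per second (wildly optimistic,
# since each cycle involves training a classifier), the search takes centuries.
seconds = subsets / 1e6
print(seconds / (3600 * 24 * 365))    # a few hundred years
```
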

Standard approaches

Three generic approaches to feature selection are commonly used. One method is to apply some criterion for filtering the features in a direct fashion (the filter approach). A second is to use some internal feedback from a machine learning procedure such as SVMs to determine which features to use (the embedded approach). A third approach is to run a loop in which a set of features is selected, a model fitted (using machine learning) and the model tested on a different data set to give a performance measure of some sort; this is the wrapper approach. In 2003 the Journal of Machine Learning Research ran a special issue on feature selection, including an introductory survey by Guyon and Elisseeff [31]. With these different approaches, the trade-off is between speed and accuracy of results. It should be noted that these methods have been developed in the context of a possibly very large number of features. In applications such as the text processing of internet documents, the number of variables or features involved may range from hundreds to tens of thousands. In the work that forms the basis of this dissertation, the number of features is 53, which allows the use of methods that would be impossibly slow in the case of thousands of features. Of the three approaches, filtering is potentially the fastest. An example of a filtering method is feature ranking according to some criterion such as the mutual information (based on probabilities) between individual features and the output variable (e.g. the class number). Guyon and Elisseeff [31] give a good overview and reference papers giving details. Feature ranking assumes an independence between features which may not be the case (pairs of features may act in concert whilst individually scoring low on the ranking criterion). Both embedded and wrapper methods allow the consideration of subsets of features.
The embedded approach involves modifying the learning method so that feature selection is part of the model optimisation in the learning process. This may be efficient but is intrinsically more complicated than the wrapper approach, which leaves the core learning procedure unchanged. The embedded approach may also have the drawback of needing to simplify the learning method to make the problem tractable. For example, one method is to use an SVM to fit a linear model and then to remove features on the basis of the fitted weights. This has the advantage of being simple, but the use of a linear model may be inaccurate and lead to poor results. See Brank et al. [12].

Feature selection method used

The literature on feature selection cited in the previous section is generally aimed at solving the problem of reducing a very large number of features to a manageable number. The number of features might be in the tens of thousands. In such circumstances some filtering may be required to reduce the feature set to a level that is small enough for machine learning to be performed. This is very different from the work described in this dissertation, where the total feature set contains only 53 features. Filtering involves making a judgement on features before they are applied to the machine learning process, and given that machine learning is applied to problems that are too complex to be analysed directly, any such filtering will be imperfect and is best avoided if it does not need to be done. In the context of the work described in this dissertation, initial experimental work indicated that the linear kernel model does not work well (i.e. some feature space transformation should be used). This precludes the more straightforward of the embedded approaches described in the previous section. Additionally, the embedded approach partially negates the advantage of using an established software package for fitting the SVMs, in that code modification is required. (Some code modification was already required in the theorem prover to measure the features, but this was unavoidable.) The wrapper approach allows selection to be done on the basis of a complete machine learning cycle, without having to make use of intermediate results whose significance is difficult to determine. The disadvantage of the wrapper approach is that the process is relatively slow, but for a small number of features it is feasible. In the present work the number of features is small, and for heuristic selection the machine learning process leads to separate classifiers which then need to be combined, so it was determined that the wrapper approach was the best method to use.
Within the general wrapper approach different options for feature selection are available. The ideal would be to test every possible feature subset, but this is not feasible. Instead a range of options was used: removing just a single feature from the set and replacing it; removing features successively based on an appropriate criterion; and exploring all possible small subsets of features. All these are described in detail in the appropriate sections of this dissertation, in particular chapter 6.
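One of the wrapper options mentioned above, successive (greedy backward) removal of features, can be sketched generically. Here `fit` and `score` are placeholders standing in for the actual SVM training and hold-out evaluation; the toy scoring function at the bottom is invented purely to exercise the loop.

```python
def backward_eliminate(features, fit, score, min_features=1):
    """Greedy wrapper feature selection: repeatedly drop the feature whose
    removal gives the best held-out score, stopping once every removal hurts."""
    current = list(features)
    best_score = score(fit(current), current)
    while len(current) > min_features:
        # Try removing each remaining feature in turn
        # (one full learn/test cycle per candidate).
        trials = []
        for f in current:
            subset = [g for g in current if g != f]
            trials.append((score(fit(subset), subset), subset))
        trial_score, trial_subset = max(trials)
        if trial_score < best_score:
            break                      # every removal degrades the score: stop
        best_score, current = trial_score, trial_subset
    return current, best_score

# Toy stand-ins: the "model" is just the feature subset, and the score rewards
# keeping features 'a' and 'b' while slightly penalising subset size.
fit = lambda subset: subset
score = lambda model, subset: sum(f in ('a', 'b') for f in subset) - 0.01 * len(subset)
print(backward_eliminate(['a', 'b', 'c', 'd'], fit, score))   # keeps 'a' and 'b'
```

With n features the loop performs at most O(n²) learn/test cycles rather than 2^n, which is what makes the wrapper approach feasible for a feature set of this size.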

2.5 Applying machine learning to theorem proving

The proof finding process involves a very large search space and there may be large differences in the efficacy of different heuristics used to find the proof. Unfortunately the best heuristic to use is problem dependent and the relationship between the problem and the best heuristic is not obvious even to human experts. This makes heuristic selection a good candidate for machine learning techniques. The approach taken in the work described in this dissertation is that of using machine learning to relate features to the selection of the best amongst a fixed choice of heuristics. Another approach, and one that has been tried by various researchers, is to modify the heuristic itself through machine learning. In this section, a brief summary will be given of such previous work. A good survey paper which covers work up until 1999 is that of Denzinger, Fuchs, Goller and Schulz [20].

2.5.1 TEAMWORK and the E-theorem prover

The TEAMWORK project [19] took place in the latter half of the 1990s. The approach taken was a combination of parallel processing on a network of computers and machine learning. The computers within the network ran software programs referred to as agents (which is common parlance in the AI community). The agents were of four types referred 54 2.5. APPLYING MACHINE LEARNING TO THEOREM PROVING to as “experts,” “specialists,” “referees” and a “supervisor”. The “experts” equated to copies of a theorem prover each of which needed to run on a separate computing node. The “specialists” equated to library modules performing tasks such as determining the similarity between the current problem and a library of previously solved problems. The “referees” provided a measure of the efficacy of each theorem proving approach to allow machine learning. The “supervisor” provided overall control of the process. The software was run for short fixed periods, at the end of which progress was assessed and changes made as needed. The machine learning used case based reasoning, a nonparametric approach that makes use of stored previous solutions or cases. The problem (negated conjecture plus axioms) was compared with previous problems using a similarity function. The similarity function was based on signatures, which depended on the number of terms and the arity of functions within the terms. The solution of the new problem was based on the successful solution of the nearest stored example proof in an approach called “flexible re-enactment”. The aim of this approach was to learn from easier problems in a domain and use the results as a stepping stone to solving the more difficult problems. A similar idea was behind the Octopus theorem prover of Newborn and Wang [60]. In Octopus (which is a development of an earlier theorem prover named Theo) the learning process used is part of the solution process (i.e. 
it is done afresh for each theorem being proved rather than making use of data from previously proved theorems). The approach taken is to strengthen the conjecture clauses (which corresponds to weakening part of the original conjecture) to produce a related but different set that is easier to prove. The proof for the modified conjecture is then used as a starting point for the more difficult original version. The process continues until, finally, the original theorem is proved. Modification of the clause takes the form of replacing a constant by a variable, or a function by a variable, or of deleting a literal. The modified clause subsumes the original clause (Newborn and Wang [60]). An advantage of this approach is that by modifying different base clauses, several proof attempts on different modified conjectures can be performed in parallel, provided separate processors are available to do so. Some success was reported for TEAMWORK but the method was restricted by a lack of training examples and was limited in scope. According to Schulz in his thesis [77] the method relied on a homogeneous collection of workstations and was very sensitive to small differences in performance of the machines. Newborn and Wang reported that Octopus had solved 42 previously unproved theorems [60], but the innovations of Octopus are concerned with problem modification and parallel processing rather than the application of machine learning. Matthias Fuchs reported on similar work on instance based learning [24]. A small set of features was used and a nearest neighbour approach was used as a similarity function. Stephan Schulz built on some of the work in the TEAMWORK programme and built learning into the E theorem prover (note that this learning aspect of E is different from the use of E as a straightforward theorem prover following predetermined heuristics, as was done in the work described in this dissertation).
The role of the similarity function is taken by a technique called “term-space mapping” (Schulz and Brandt [79]). The purpose of the machine learning is to help at decision points within the theorem proving process, in particular the selection of the next clause (the given clause). Schulz does consider a number of decision points in his thesis [77] but concludes that the selection of the given clause is the most critical.

2.5.2 Neural networks and folding architecture networks

Standard neural networks are used to learn a black-box function from a vector of real variables to a vector of real function values. The nature of the neural network constrains the input data to be in a form representable by a set of real numbers. Conjectures, axioms and other logical formulae are tree structures with terms often containing functions with arguments that are themselves terms. Folding architecture networks are a development of neural networks that work with tree structures. Mareco and Paccanaro [47] applied machine learning using neural networks to improve automated theorem provers, but their work was mainly confined to term matching for rewrite systems and applied to simple problems in group theory. Though the work involved dealing with issues such as representing logical expressions in a suitable form for neural networks, the process of term matching is efficiently carried out in modern provers using indexing techniques. Goller [28] has applied folding architecture networks to the learning of heuristic evaluation functions for the theorem prover SETHEO. In this context, a heuristic evaluation function is a measure of goodness of an inference step within a theorem proof; that is, once a proof is found the inference steps that were part of the proof are given positive values whilst those that are not part of the proof are given negative values. Goller’s work was restricted to word problems within group theory, which Goller stated “are generally regarded as trivial”, so though promising the results were not conclusive. Blanchard et al. [9] extended Goller’s work by applying it to the theorem prover Otter. Otter with the folding architecture addition was better than standard Otter, but Blanchard et al.
also found that the addition of a simple hash table to memorise previous patterns gave the best results, which implied that the folding architecture learning was giving improvement by memorising previous patterns rather than being able to generalise to new patterns. Blanchard et al.’s results are in accord with results obtained by Meng and Paulson [53], who filtered clauses on the basis of those that had previously been used in a proof (Meng and Paulson’s work is discussed more fully in the next section.)

2.5.3 Learning with symbols and large axiom libraries

The set of axioms that should be combined with the negation of a conjecture to find a proof may be part of a much larger set, most of which is not needed in the proof. In many circumstances it would be useful to automatically select useful axioms from a large database. This is a similar problem to that of clause selection during the proof search process in the given clause algorithm. Rather than the unprocessed clause set containing a large number of clauses generated from inferences applied to an initially small axiom set, the number of clauses is large from the start because it contains many axioms that are not relevant to the particular conjecture being tested. Meng and Paulson [53] obtained useful improvements in the proof search by a simple filtering approach based on whether or not a clause (axiom) had been used in a previous proof. Meng and Paulson produced a set of relevant clauses by taking the union of the sets of clauses used in each of the set of proofs that they investigated. The simplest filtering technique is to remove all clauses that do not appear in the set, and this was found to give an improvement in the number of proofs found, but such an approach risks removing a clause that is needed for a particular proof. Meng and Paulson go on to discuss more sophisticated techniques based on symbol scores and measures to determine how close clauses are to members of the relevant clause set. Urban’s MaLARea system [88] applies machine learning to the pruning of irrelevant axioms from a large database to enable more proofs to be found. The axioms need to follow a consistent use of function names between problems. An initial run is performed on a set of conjectures, and proofs successfully found are used as the learning set for the machine learning phase. The result of the machine learning is a function that prunes axioms from the database, and then further proofs are sought with the reduced axiom set.

2.5.4 Proof planning (Omega project)

The various projects described in the previous sections, and the work covered by this dissertation, are concerned with applying machine learning to automated theorem proving at a low level - heuristic control, heuristic selection, clause weakening, clause relevance and so on. There has also been work at a higher level. An example of this is the Omega project, in which learning is applied to proof planning; that is, in choosing between different proof methods. See Jamnik, Kerber and Benzmuller [34].

2.6 Summary

To give proper context, the background presented in this chapter has covered a wider gamut of topics involved in theorem provers and machine learning than are directly involved in the work described in this dissertation. Given the background, it is useful to summarise the choices taken in terms of the logic system, the theorem prover used and the type of machine learning selected. The logic system (or language) used is first order logic with equality specifically treated as part of the language, rather than being added in terms of various axioms. First order logic with equality provides a system which is much more powerful than basic propositional logic whilst still being constrained enough to allow theorems to be proved in an automatic fashion without human intervention during the proof search. The theorem prover used for the main body of work is an equational theorem prover (E, written by Schulz [78]), for which many heuristics have been tested and for which source code is openly available. The prover has done well in competitions and has proved useful as a tool by researchers outside the group within which it originated. (It was important for the work reported in this dissertation to use a prover that was of more than academic interest.) Previous work on applying machine learning to theorem proving has concentrated on learning new heuristics, or modifying a previous heuristic based on one or a few previous examples of successful proofs (e.g. Schulz [77]). In some cases the previous examples are artificially generated from the problem itself by simplifying it (as in the Octopus prover [60]). Though such learning is built into the E theorem prover, it is not widely used. There are a number of drawbacks to the general approach of learning a new heuristic.

Training samples very similar to the conjecture to be proved are needed, and these may not be available. It is problematic to define which stored examples are close - a distance function is needed, and to some extent determining a good distance function is as difficult as determining a good heuristic (apparently similar conjectures may require quite different proofs). The learning process needs to be applied to each new problem. The learned heuristic is likely to be less efficient than the best hand-honed heuristic based on many sample problems. For the work described in this dissertation a different approach was taken. Heuristics were predetermined and fixed, and machine learning was used to determine which to apply to a given conjecture proof problem. Such an approach allows machine learning to be done in advance. The resultant heuristic is a tried and tested heuristic. Such an approach is novel, though the overall function of heuristic selection from conjecture characteristics is built into the E theorem prover auto mode. The E auto mode does not use sophisticated machine learning, though it does measure features of the conjecture and axioms. A few features are used to classify all problems into a few classes, and for each class a given heuristic is used based on trials with previous examples. The E auto mode thus prejudges which features are important and is restricted to binary or ternary features so as to limit the number of classes. The work described in this dissertation uses machine learning and feature selection to learn which features are important and a functional relationship between feature values and the best heuristic to use. The features are not restricted to being binary or ternary valued. Additionally, dynamic features as well as static features were used. That is, some features (the dynamic ones) are measured a short way into the proof process.
The choice of using machine learning rather than a more analytic approach was determined by the fact that, though the choice of best heuristic is dependent on the conjecture to be proved, there is no obvious way of connecting the two even for human experts. Additionally, the availability of a large library of conjectures allowed the generation of many learning samples to use in training of a machine learning process. The machine learning method, SVMs, was selected as it is an accepted state-of-the-art technique. Neural networks were not used as it was considered that SVMs provide a more systematic and efficient learning method. Decision trees were considered, but the use of feature selection combined with other methods provides information that is as useful and less constricting in the context of the particular area of study. As input to the machine learning, generic features were measured which made no assumptions as to the semantics of symbols used within conjectures. This complemented work done by other researchers who were concerned with symbols across many potential axioms (e.g. the work of Urban [88]). The use of generic features reduced their number, which allowed more comprehensive feature selection methods to be used. Additionally it removed any arbitrary bias arising from inconsistencies in naming conventions for functions or variables. The overall purpose of the work was to demonstrate that machine learning can successfully be applied to the selection of heuristics without the need of human expertise. This was successfully achieved. The work was restricted to a small number of heuristics to demonstrate proof of concept, but the same approach could be used with many more heuristics as a basis for an extension of the theorem prover as a practical tool. Chapter 3

Methodology

This chapter covers the methodology of the experimental work undertaken. The results and analysis are covered in separate chapters for each experiment. Though there are differences between the experiments, there is also a large degree of commonality and it makes sense to collect reference information regarding heuristics and features in a single place. To summarise the experimental work: an initial experiment, designed as a proof of concept, was carried out at an early stage of the project. Following promising results, a more extensive second experiment was performed. Beyond the second experiment, further analysis and experimental work was carried out to determine which measured features are significant and which are superfluous.

3.1 Generic description of experimental method

There is the potential for obfuscation in describing the experimental work, as it involves the application of software tools to other software tools, such that the output of a lower level tool is not a final outcome, but a single data point or sometimes only a contribution to a single piece of data. The experimental work involved analysing the results of applying a machine learning tool to a theorem prover, which is a tool for testing the validity of a logical conjecture. The experiments may be viewed on different levels. At the lowest level a conjecture is read into the automated theorem prover, which is run with a particular heuristic selected. The three possible outcomes are: the conjecture is shown to be a theorem (it is proved), the conjecture is disproved, or a pre-set time limit is exceeded and the process is stopped. At this level the input is a description of the conjecture (typically negated) as a text file in a defined format, and the output is a yes, a no, or a “could not be proved either way” answer. Additionally, the CPU time used in the process is stored for future use (as a measure of the efficacy of the heuristic used). At the next level up - that of machine learning - the conjecture must be converted into a tuple or vector of numbers, i.e. features, which are considered as a single sample. Associated with each such sample must be a single output value, 1 or -1, which indicates in which of two classes it is placed. If the two classes are “solution found” and “solution not found” then the classification requires a single run of the theorem prover using a single heuristic. For more useful classifications such as “heuristic 3 is the best heuristic”

and “heuristic 3 is not the best heuristic,” the theorem prover must be run on the same conjecture once for each heuristic being looked at and the run times compared. The theorem prover runs are time-consuming but the result is a definite, correct classification for the sample. The resultant data is used to train a classifier that is designed to calculate a classification directly from the feature vector without needing to run the theorem prover. The process is very much faster but the resultant classification will (in general) not be correct for every new sample. In fact it will be known to be incorrect for some of the pre-classified training samples, as to attempt to make it correct for all such samples leads to over-fitting and worse generalisation to new samples. At a higher level, the classifiers that are produced as the result of the machine learning process can be combined to predict which out of the set of heuristics is the best for each conjecture. The straightforward case is when exactly one classifier says that its heuristic is the best (each heuristic being associated with a separate classifier). But, given uncertainty in the process, in many cases more than one classifier will place the sample in the positive class, or possibly none of the classifiers will do so. In these cases it is necessary to compare the degrees of certainty, or margin, for each classifier and select the one with the most positive (or least negative) margin. (See chapter 2 for a more precise definition of margin.) Additionally it is useful, both for gaining insight and for streamlining the classifiers, to determine which of the measured features are pertinent and which are effectively irrelevant.
As discussed in the background chapter there are different approaches to such feature selection, but the most straightforward one is the wrapper method, which requires the machine learning process for the classifiers to be treated as a subroutine or function and the process to be run multiple times using different sets of features as input.
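The margin-based combination rule described above can be sketched in a few lines. This is an illustrative reconstruction rather than the code used in the experiments; in practice the margin values would come from the per-heuristic SVM classifiers (one SVMLight model per heuristic), and the heuristic labels here are arbitrary.

```python
def select_heuristic(margins):
    """Pick a heuristic given per-heuristic classifier margins.

    margins: dict mapping heuristic label -> real-valued margin, where a
    positive margin means "this heuristic is predicted to be best".
    Whether several classifiers claim the positive class or none do, the
    heuristic with the most positive (least negative) margin wins.
    """
    return max(margins, key=margins.get)

# Hypothetical margins for the five heuristics: classifiers 2 and 3 both
# claim the positive class, so the larger margin decides.
margins = {1: -0.3, 2: 0.8, 3: 0.5, 4: -1.2, 5: -0.1}
print(select_heuristic(margins))  # prints 2
```

The same rule also covers the case where every margin is negative: the least negative classifier is taken as the best available guess.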

3.2 Data — conjectures to be proved

The first experiment, which was a feasibility study, used a single heuristic, so in this case a wide variety of conjectures was not needed. The later experiment involved a comparison between different heuristics. Different heuristics generally work better on different types of conjectures, so for a fair comparison a wide variety of conjectures needs to be included. The TPTP library [85] (TPTP stands for Thousands of Problems for Theorem Provers) is a central repository of conjectures from many different problem areas, collected with the main purpose of aiding the development of theorem provers, and it was used as the source of conjectures in this work. Within the TPTP library the problems are collected together into different subject areas. Some of these are clearly delineated, such as conjectures in the area of group theory. Others are more arbitrary, as many conjectures arise from work that does not easily fit into the existing classifications. An example of the latter might be conjectures arising in the process of proving security protocols.

3.3 Measuring features

The aim of machine learning is to produce a function in software that reproduces the behaviour of an unknown function. The unknown function is partially known, in the sense that examples of outputs for some input states are known, but the underlying mechanism is too complicated to be understood, so that it is unknown how to predict outputs for new input states. Furthermore, even at the known data points, where output values are known, there is uncertainty as to what the input parameters are that determine the measured behaviour. That is, the input is in a known state but there is uncertainty in how to characterise that state in terms of a set of real numbers. A key task in machine learning is thus to characterise the input state as a set of real numbers. Each real valued parameter or measurement of state is known as a “feature”. In the particular case of the work described in this dissertation, the input consists of a set of clauses arising from a negated conjecture together with a collection of axioms. The clauses may contain functions as well as variables and constants. The function arguments may also be functions or variables or constants and so on. The logical meaning of the clauses is unaltered by a renaming (in a consistent manner) of any or all of these. For this reason, no meaning was attached to the names used, which is equivalent to considering each problem in isolation. To do so was a significant choice; other researchers (such as Urban [88]) have worked on considering individual proof problems within the context of a large number of potential axioms and have made use of historic proofs to assign importance to symbol names. One aim of the work described in this dissertation is the potential improvement of the theorem prover as a tool; in such a context the problems must be considered in generic terms to avoid tying the prover to a particular problem type or area.
Given a collection of clauses as a starting state, potential features can be characterised in three areas. First, there is the size of clauses, in terms of length (e.g. number of literals), depth (the degree to which terms are nested within terms) and, more artificially, in terms of weight (which is a measure associated with the theorem proving method rather than intrinsic to the logical structure of the clause). The second area is that of clause type, for example the proportion of clauses that are Horn clauses (containing no more than one positive literal). Thirdly, measures can be made of connections between clauses, such as a score based on shared term structures. The collection of clauses existing during the proof search is known as the proof state, the collection being divided into separate sets as described in chapter 2. The starting proof state consists only of the initial conjecture and its axioms, but if the proof search is run for a short time there will be a much larger collection of clauses on which to measure features. Additionally, the presence of different clause sets allows the inclusion of features which compare properties between sets. This is discussed more fully in the next section.
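As a concrete illustration, the following sketch computes one feature from each of the three areas (size, clause type, and a derived proportion) on a toy clause representation. The representation, function names and feature names are invented for illustration; the real measurements were made inside the modified E prover, and the actual feature list is given in appendix A.

```python
# Illustrative static features on a toy clause representation.
# A term is a (symbol, [args]) pair; variables and constants have no args.
# A clause is a list of literals; a literal is (sign, term).

def term_depth(term):
    # Depth 1 for a variable or constant; nesting adds one per level.
    sym, args = term
    return 1 + max((term_depth(a) for a in args), default=0)

def clause_features(clauses):
    lengths = [len(c) for c in clauses]                           # size: literals per clause
    depths = [max(term_depth(t) for _, t in c) for c in clauses]  # size: nesting depth
    horn = sum(1 for c in clauses
               if sum(1 for sign, _ in c if sign) <= 1)           # type: Horn clauses
    return {
        "avg_len": sum(lengths) / len(clauses),
        "max_depth": max(depths),
        "horn_fraction": horn / len(clauses),
    }

# Two clauses:  p(f(X), a) | ~q(a)   and   ~p(a, a)
clauses = [
    [(True, ("p", [("f", [("X", [])]), ("a", [])])), (False, ("q", [("a", [])]))],
    [(False, ("p", [("a", []), ("a", [])]))],
]
print(clause_features(clauses))
```

Both toy clauses are Horn (at most one positive literal each), so the Horn fraction is 1.0; the deepest term, p(f(X), a), has depth 3.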

3.4 Dynamic and static features

A novel aspect of the current work is the use of features measured on a snapshot of the proof state in addition to (or instead of) features measured on the conjecture prior to the proof search beginning. In the present work these are referred to as dynamic features and static features. (Similar use of dynamic features has been reported by Xu et al. in a learning approach to selecting algorithms in SAT solvers [100], though the author was not aware of this work when beginning the experiments described here, as it was in the field of SAT solvers rather than first order logic theorem provers and the paper had not yet been published. Beyond noting that Xu’s work involves measuring some features after running the solver for a short time, more detailed comparison is not appropriate as SAT solvers and first order logic theorem provers are very different in nature.) The input to the theorem prover is a negated conjecture which, together with a set of axioms, forms a set of clauses. Static features are measured on these clauses. Examples of such a feature would be the average clause length in terms of the number of literals or the proportion that are Horn clauses. If the theorem prover is permitted to run for a short time (or for a fixed number of selected clauses, 100 in the case of the present work) then the proof state consists of several sets of clauses. There is a large set of unprocessed clauses which initially consisted of the original negated conjecture and axioms but has been increased by the addition of generated clauses. There is a smaller set of processed clauses which is internally saturated (all possible useful inferences involving clauses within the set have already been drawn). Additionally there is a temporary set of clauses that has been generated but may be deleted if simplifying inferences are found.
Dynamic features can measure how these clause sets have changed from the initial clause set and also how measures for the processed and unprocessed sets compare. Details of the features used are given in appendix A. An aspect of measuring dynamic features is the need to run the prover for a period of time first, albeit a short period. This raises the question of which heuristic to follow in this initial pre-measurement phase. For the initial experiment a single heuristic was involved, so this was the one applied. For the main experiment the situation was less straightforward, as there were five heuristics under consideration. In this case the first heuristic was selected. Though the need to select a particular heuristic is not ideal - it may induce a bias in the feature values towards the heuristic used - out of the five heuristics the first is the best choice, as it is the best heuristic in more cases than any other. Another possible alternative would have been to select a heuristic different from all five used in the experiment. This was not done as it introduces extra complication without solving the problem of a possible bias, since any such new heuristic cannot be guaranteed to be equally different from all five test heuristics.
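The flavour of such dynamic features can be sketched as follows: after the fixed number of given-clause selections, properties of the processed and unprocessed sets are compared with each other and with the initial clause set. The feature names and exact definitions here are illustrative inventions, not the features actually used (which are listed in appendix A); each clause is reduced to its length in literals for simplicity.

```python
def dynamic_features(initial, processed, unprocessed):
    """Each argument is a list of clause lengths (number of literals)
    for one clause set in the proof-state snapshot."""
    avg = lambda xs: sum(xs) / len(xs) if xs else 0.0
    return {
        # growth of the proof state relative to the starting clauses
        "growth_ratio": (len(processed) + len(unprocessed)) / len(initial),
        # how the processed and unprocessed sets compare with each other
        "proc_unproc_len_ratio": avg(processed) / max(avg(unprocessed), 1e-9),
        # drift of average clause length away from the initial state
        "len_drift": avg(processed + unprocessed) - avg(initial),
    }

# Hypothetical snapshot: 2 initial clauses, 2 processed, 4 unprocessed.
print(dynamic_features([2, 2], [1, 3], [2, 2, 2, 2]))
```

The point of such ratios is that they are dimensionless comparisons between sets, so they remain comparable across conjectures of very different sizes.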

3.5 Theorem prover used

The theorem prover used was a modified version of the E theorem prover [78]. No modifi- cations were made to the workings of the proof search engine or to the built-in heuristics. The changes made were to measure and write out features and to write classification results to an output file (together with feature values). The features included dynamic aspects of the process and so were measured after allowing the proof search to proceed for a fixed number (100) of clause selections (in the given clause process). The E theorem prover has a useful array of functions built-in that could be used for many of the features to be measured. For the second phase of the work, further modifications were added to allow the simple selection of a fixed heuristic choice using a single command line flag and also to write out timing information as to the amount of time taken to find the proof. To make these

1Some experimentation was done with clause numbers up to 500 tried. 100 clauses was found to be a good compromise between allowing the prover to run and alter state and not setting the value so high as to become a significant fraction of the proof search process for most conjectures. CHAPTER 3. METHODOLOGY 63 minor modifications involved analysing the code in some detail, as heuristics are not single entities. Additionally, the heuristic selection is done via references in tables, involving a level of indirection which required care to follow at the programming level.

3.6 Selecting the heuristics for the working set

Stephan Schulz, the author of the theorem prover E, has done much work on testing various heuristics on the problems contained in the TPTP library. The automated mode of E classifies problems according to the values of a few binary/ternary features, and the best heuristic for each such class was determined by Schulz experimentally. By taking the data from these experiments (which are available as part of the source code for E) it was possible to order the heuristics according to the number of the TPTP problems falling into the classes for which that heuristic was best. The working set of five heuristics for the work described in this dissertation was thus determined as the five most successful heuristics. For each of the heuristic descriptions, given in detail in Appendix B, the number of cases for which that heuristic is reported (by Stephan Schulz) as being best is given. It can be seen that Heuristic 1 is best in most cases with the other four heuristics being similar to each other. It would be preferable to have a set of heuristics to test that were all similar in applicability, but this was not possible. The five heuristics in the working set used for the work described in this dissertation are simply labelled from 1 to 5. The labels used by E for the heuristics and more details of the options associated with each heuristic are given in Appendix B for reference.

3.6.1 Clause selection within heuristics

As described in the background chapter, the E theorem prover uses the given clause algorithm. A key part of the algorithm is the selection of the given clause from the set of unprocessed clauses. For the process of selecting the clause E uses a round robin of priority queues with different weighting schemes for each. It is primarily in the clause selection that the five heuristics differ. The individual weighting functions are described in the E manual provided with the software.
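The round-robin scheme can be sketched as below. The two weighting functions here (a crude symbol-count proxy and a first-in-first-out age) are stand-ins invented for illustration; E's actual weighting schemes are described in the E manual, and its real implementation is considerably more involved.

```python
import heapq
import itertools

# Sketch of round-robin given-clause selection over priority queues.
# Every queue holds all unprocessed clauses, each ordered by its own
# weighting function; queues take turns supplying the next given clause,
# skipping clauses another queue has already selected.

def make_selector(clauses, weight_fns):
    queues = []
    for w in weight_fns:
        q = [(w(c), i, c) for i, c in enumerate(clauses)]
        heapq.heapify(q)
        queues.append(q)
    picked = set()
    for q in itertools.cycle(queues):       # round robin over the queues
        while q:
            _, i, c = heapq.heappop(q)
            if i not in picked:             # skip already-selected clauses
                picked.add(i)
                yield c
                break
        else:
            return                          # a queue ran dry: all clauses selected

clauses = ["p(X)", "q(f(f(a)))", "r(a,b)"]
size_weight = len                           # crude proxy for symbol counting
age_weight = lambda c: clauses.index(c)     # FIFO: earlier clauses first
for given in make_selector(clauses, [size_weight, age_weight]):
    print(given)                            # p(X), then q(f(f(a))), then r(a,b)
```

Alternating a size-based queue with a FIFO queue is a classic way to balance preferring small clauses against guaranteeing that every clause is eventually selected (fairness), which is needed for completeness of the saturation procedure.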

3.7 Fitting a support vector machine - SVMLight

As described in the background chapter, a support vector machine (SVM) is essentially a mathematical function, coded in software, that generates a real number (the margin) from a tuple of numbers. (The feature values may be integers or binary but are normally real valued.) The core of the SVM is a kernel function, which takes a pair of tuples as input and produces a real value as output. The kernel function is applied multiple times, each time taking the same input tuple of feature values and pairing it with a stored tuple of feature values taken from a different learning sample. The results of the different applications are combined in a weighted sum. The particular set of learning samples whose tuples are used in the summation are referred to as the support vectors.
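The weighted sum just described can be written out directly. In the usual SVM notation the margin is the sum over support vectors of alpha_i * y_i * K(x, sv_i), plus a bias b; the sketch below uses a plain dot-product kernel for simplicity, and the numeric values are invented for illustration.

```python
# Minimal sketch of the SVM margin computation described above.

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def svm_margin(x, support_vectors, labels, alphas, b, kernel=dot):
    """Margin = sum_i alpha_i * y_i * K(x, sv_i) + b.

    The sign of the margin gives the predicted class (+1 or -1);
    its magnitude gives the degree of certainty."""
    return sum(a * y * kernel(x, sv)
               for a, y, sv in zip(alphas, labels, support_vectors)) + b

# Two support vectors with labels +1 and -1, equal weights, small bias.
svs = [(1.0, 0.0), (0.0, 1.0)]
print(svm_margin((2.0, 1.0), svs, [1, -1], [0.5, 0.5], 0.1))  # prints 0.6
```

Only the support vectors contribute: the alphas of all other training samples are zero, which is what keeps the fitted classifier compact.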

Before an SVM may be fitted, a kernel function must be selected. For the work described in this dissertation, a set of standard kernels was compared and an SVM fitted for each. The best kernel function was then determined by looking at the classification accuracy of the SVMs from each. (This was done in the preliminary experiment and then the same kernel function was used throughout.) The process of selecting the best kernel function is complicated by the need to optimise the associated parameters. Each kernel function is more accurately described as a family of functions, with the particular function determined by values assigned to parameters. The parameters must be entered by the user on a command line, so the optimisation process is essentially manual. (Some work was done on automating the process in the later stages of the work, but even with automation time constraints mean that the process is necessarily crude.) The process of training a support vector machine involves finding a maximum of a quadratic objective function subject to linear constraints, as outlined in the background chapter. Cristianini and Shawe-Taylor outline some implementation techniques in chapter 7 of their book [59]. The objective function involves a kernel function, and associated weights, rather than the original feature vectors. Rather than write new software to do the task, use was made of the program SVMLight [36]. The SVMLight software consists of two programs. The first program, svm_learn, fits the model parameters on the basis of a file of learning sample data and other user input such as a selected kernel function and the associated parameter values. The second program, svm_classify, uses the model to classify new samples, generating a margin value as output. SVMLight has four standard kernel functions as options: linear, polynomial, sigmoid tanh and radial. As part of the experimental work the different kernel functions were compared in the initial experiment.

3.8 Kernel functions

There are four “standard” kernel functions that are widely used and that are provided with the program SVMLight that was used for the work described in this dissertation. Details of these kernel functions are given in the following sections. As part of the initial experiment the kernel functions were compared and one selected for use in the second, heuristic selection, experiment.

3.8.1 Linear basis function kernel

With the linear basis function kernel the support vector machine essentially reduces to a linear perceptron. The feature space is not transformed (except for linear scaling). This is the simplest kernel, and if it works then analysis of the results in terms of the effect of the various feature values is also straightforward, but in general the model is too simple and it is unlikely that the learning sets will be linearly separable. The general expression is just a simple scalar product of the two feature vectors without any further parameters, K(x, x') = x · x', where x and x' are feature vectors.

3.8.2 Polynomial kernel

Provided Mercer’s conditions are fulfilled by the transformations used, new kernels can be constructed from other kernel functions. The polynomial kernel is constructed from a simple linear kernel (scalar product). The general expression is as follows,

K(x, x′) = (s x·x′ + c)^d,

where x and x′ are feature vectors and s, c and d are user-entered parameters.

3.8.3 Sigmoid tanh kernel

In the sigmoid tanh kernel function the tanh of a scaled and shifted scalar product is taken. The sigmoid tanh kernel is a representation of the multi-layer perceptron with a single hidden layer (Gunn [30]). Burges notes that the hyperbolic tanh function satisfies Mercer’s conditions only for some parameter values [13]. The general expression is,

K(x, x′) = tanh(s x·x′ + c),

where x and x′ are feature vectors and s and c are parameters.

3.8.4 Radial basis function kernel

Radial basis functions are Gaussians centred on focus points (one for each input point). Historically they were first used as a means of exact interpolation, where the focus points are the input points, ensuring that the interpolation passes exactly through the data points (see Bishop [7]). The general expression for the radial basis function kernel is,

K(x, x′) = exp(−γ‖x − x′‖²),

where x and x′ are feature vectors and γ is a parameter. One advantage of the radial basis function kernel is that it has only a single parameter. Additionally, this kernel gave the best results in the tests carried out (see chapter 4 on the initial experiment).
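The four kernel functions above can be written down directly from their definitions. The following sketch is illustrative only (it is not SVMLight's implementation) and may help make the roles of the parameters s, c, d and γ concrete.

```python
import math

# The four standard kernel functions, written from their definitions.
# x and xp are feature vectors represented as lists of floats.

def dot(x, xp):
    return sum(a * b for a, b in zip(x, xp))

def linear_kernel(x, xp):
    # K(x, x') = x . x'  (no parameters)
    return dot(x, xp)

def polynomial_kernel(x, xp, s=1.0, c=1.0, d=2):
    # K(x, x') = (s x . x' + c)^d
    return (s * dot(x, xp) + c) ** d

def sigmoid_tanh_kernel(x, xp, s=1.0, c=1.0):
    # K(x, x') = tanh(s x . x' + c)
    return math.tanh(s * dot(x, xp) + c)

def rbf_kernel(x, xp, gamma=1.0):
    # K(x, x') = exp(-gamma * ||x - x'||^2)
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xp))
    return math.exp(-gamma * sq_dist)
```

Note that the radial basis function kernel of any vector with itself is 1, and decays towards 0 as the vectors move apart, at a rate set by γ.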

3.9 Custom software

In addition to the two major software packages used, the E theorem prover and SVMLight, software was written to perform specific experiments, to collate data and to put it into the required form. Software was needed to convert timings together with feature values into classifications, to split data into learning and test sets, to combine the output of the SVMLight software with known timings to determine how selection affects overall solution times, to run the feature selection experiments, and so on. These tasks are described more fully in the chapters on the individual experiments.

3.10 Overview of experimental work

A phased approach to the work was taken so as to determine if useful results were obtained on a small scale before committing to a large scale experiment. The work was done in three phases. First a limited experiment was carried out as a proof of concept. The first experiment was limited to the application of the theorem prover only to problems within the SET domain of the TPTP library. Machine learning was only applied to a single classification into two classes, those conjectures which were proved and those which were not (within a time limit). The initial experiment was also used to determine the best kernel function to use when fitting a support vector machine using SVMLight. The central part of the experimental work involved applying different heuristics to theorem proving on all problems within the TPTP library. Machine learning was applied to classification problems for each heuristic in turn and the results used to automatically select the best heuristic. The effectiveness of the automatic selection process was then assessed. In a third phase the results of the main experiment were analysed to determine which measured features used in the machine learning made a significant contribution to the learning and classification processes. The purpose of this was twofold. Firstly, it is of interest to know what aspects of a problem determine how difficult it is to prove and the best heuristic to use. Secondly, from a practical point of view it is important to streamline models by reducing the number of features that need to be measured and used. It was also found that a reduced feature set gave better results than the full feature set; details are given in chapter 6.

3.11 Computer hardware used

For consistency all the experiments were run on the same hardware so direct comparisons could be made in terms of CPU time taken for different heuristics. The computer used was a Linux workstation based around an Intel Core2 6600 CPU at 2.40 GHz. The processor has two cores. The total memory was just under 2 GB (1.9 GB). The hardware is not powerful by modern standards but it is relative rather than absolute performance that matters. The later feature selection experiments were performed on a much more powerful dual Xeon workstation, but all timings relating to proof search are those obtained on the original hardware. (Having run all heuristics on all sample conjectures there was no need to collect further timing data.)

3.12 Summary

This chapter has described the features measured, the details of the heuristics used as a working set in the heuristic selection experiment, the kernel functions used with the support vector machine approach to machine learning, and given an overall description of the experiments carried out. More detailed descriptions of the experiments, including the results obtained, are given in separate chapters on each.

Chapter 4

Initial experiment

It is incontrovertible that some conjectures are easier to prove, or disprove, than others. Determining which proof problems will be difficult for an automated theorem prover, without attempting the full proof in each case, is not straightforward (or foolproof). Also, different search heuristics work better on different problems, so that the best heuristic to use depends on the conjecture and axiom set. There is no universally best heuristic. (Such a universally optimum heuristic may be found in future, but to date no such heuristic has been published.) But, in a similar manner to the difficulty in determining which proofs will be difficult, it is not straightforward to determine what the best heuristic will be other than by looking at a history of similar problems in the manner of a human expert. Both aspects of the proof search problem can be used to classify conjectures: either into classes of difficulty, or into classes of problems best suited to a particular heuristic, where each class corresponds to a different best heuristic out of a set. The process of classifying any problem according to either of these schemes can be done in a simple, but slow, manner by applying the automated theorem prover to the problem. The aim of the work described in this dissertation is to find a more direct route, via a function or relation, from easily measured features of the problem to the correct classification. A premise of the thesis expounded in this dissertation is that there is a functional relationship between easily measurable features of a proof problem and the classification of the problem as defined above, and that such a relationship may be approximated by a machine-learned function. Though such a premise is reasonable, it is not necessarily correct.
It may be that factors that determine how quickly a proof is found depend on very complex interactions between clauses, in a way that is too subtle to be captured by relatively crude measures such as the features described in appendix A of this dissertation. Before investing substantial time in research, an initial experiment was undertaken as a proof of concept. The purpose of the experiment was to determine if there was a basis for assuming that machine learning could successfully relate measurable aspects of a problem to the classification of the problem. Of the two classifications, the simplest is the binary classification according to whether or not the conjecture can be proved as a theorem, by an automatic theorem prover within a reasonable time span. In addition to indicating the potential usefulness of the machine learning approach, such information is useful in its own right. There are circumstances where the user has a large number of conjectures to prove and requires only to prove a proportion of them within the available

time. Being able to discard the conjectures that will not be proved, without the time consuming process of proof searching, will increase the overall number of conjectures that are proved.

4.1 Classification problem

Thus, for the initial experiment, machine learning was applied to finding a simple classification function. The classification problem that the function solves is to place conjectures into one or other of two classes. Membership of the positive class implies that the member is a theorem that is proved by the theorem prover within a preset CPU time limit. Membership of the negative class implies that the proof search did not terminate within the given time limit. In theory a conjecture may be disproved - that is, proved to be invalid and not a theorem - by the theorem prover reaching a saturated state. Given the large, and increasing, size of the unprocessed clause set, in practice saturation is not reached except in pathological cases where the process never gets started due to insufficient or erroneous axioms. At the time of the experiment such cases were not specifically checked for, however. Since the experiment was conducted, the log files have been checked and it was found that, out of the more than twelve hundred conjectures used, four led to the theorem prover saturating, and in each case the result was obtained before any clauses were generated by inferences. It is not possible to determine if any of these four conjectures were included in the test set. Approximating the size of the test set as one tenth of the total number of conjectures, the probability of more than one of the pathological conjectures being in the test set is small. Even if all four were selected for the test set, a probability of approximately 10^-4, they would not have affected the conclusions drawn and can be considered as noise.
As a general observation, conjectures for which proofs can be found by the theorem prover are proved within a reasonable time; that is, as the CPU time limit is increased from zero, the number of theorems proved at first rises significantly but after a point plateaus, so that further increases in allowed time lead to very few new theorems being proved. For the initial experiment described in this chapter the aim was to set a CPU time limit that was well into this plateau without being so long as to make the experiments unduly time consuming. A value of 300 CPU seconds was set.

4.2 Data used

For the initial experiment, the purpose was to determine whether machine learning worked at all in the context of automatic theorem proving. To get a clear answer it was important to reduce the number of variables not directly related to the experiment, and to this end the sample data was restricted to conjectures from a single area of mathematics. The area of set theory was selected, as it fulfils the requirement of being homogeneous whilst being a separate classification within the TPTP library. The number of problems available was also a significant factor in its choice. (The set theory area of the TPTP library is one of the larger classifications in terms of the number of problems that have been submitted and accepted to the library.) The total number of problems was approximately 1200.

4.3 Heuristic used

For machine learning to be effective, learning samples are needed from both classes within a classification problem. The initial experiment concerned the classification of conjectures into those proved to be theorems within a given time and those that were not. Therefore the main constraint on the heuristic used was that it be able to solve a sizeable fraction of the test problems but not all of them. The requirement that the heuristic be not so effective as to solve all the sample problems was not, in practice, a constraint as none of the known heuristics could prove all the problems. Additionally it was important to use a heuristic that was realistic in the sense of being a good heuristic that might be selected by a user working with conjectures of the type used. As the initial experiment was confined to conjectures from a single area of the TPTP library - set theory - a single heuristic could be used. Advice as to the best heuristic to use was sought from Stephan Schulz, the author of E. Details of the heuristic are given in appendix B.

4.4 Running the theorem prover

The theorem prover, E, was modified to automatically write out values for the chosen set of features and additionally to write out the correct classification based on whether or not a proof was found within the set time limit. Values were written to a data file which could form the basis of an input file for the machine learning program SVMLight.

The modified version of E was run on all 1200 conjectures with the CPU time limit set to 300 seconds.

4.5 Training and test data sets

Software was written to randomly split the data into test and learning sets with the test set much smaller than the learning set; approximately 90% of the samples went to the learning set and 10% to the test set. The random number procedure was weighted to approximately maintain the same ratio of proved to unproved cases in the test and learning sets. The procedure was carried out ten times to provide ten possible splits to work with. The size of the test set and the ratio of proved theorems to unproved conjectures were not strictly enforced and there was a fairly large variation in the size of the test sets. (The reason for this variation was later found to be a programming oversight, but though this affected the exact makeup of the learning and test populations it did not materially affect the validity of the results. Given also that this was a preliminary experiment, it was not re-run with corrected software.) The size of the test sets varied from 101 samples to 178 samples. In testing the different kernel functions, the bulk of the work was done with the first split, for which the test set contained 178 samples. Comparisons between the best kernel function from the first split and the linear kernel function were then repeated with six other splits.

4.6 Features measured

For the initial experiment a set of sixteen dynamic features was used; details are given in appendix A. The term dynamic refers to the measurement of features during the proof search, rather than using features of the conjecture and axioms prior to the start of the theorem proving process. Measuring such dynamic features requires that the theorem prover be run for a pre-determined period and then interrupted. One option would be to set the period on the basis of time, but such an approach is dependent on variable factors such as the specification of the computer used. It was decided, instead, to take advantage of the nature of the proof search process itself. E, in common with several other theorem provers, uses the given clause algorithm as described in chapter 2 of this dissertation. In the given clause algorithm there is a main control loop, each iteration of which begins with the selection of a clause from the set of unprocessed clauses. The period for which the theorem prover was run was set in terms of the number of clauses selected. A figure of 100 clause selections was used, chosen on the basis of some experimentation to provide a good compromise: the proof search proceeds far enough for information on the dynamics to emerge, while not running so long that the time taken is a significant overhead, or that a significant number of conjectures are proved before the features can be measured.
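The interruption point can be pictured as a hook in the given-clause loop. The sketch below is schematic only: `select`, `infer` and `measure` are hypothetical stand-ins for E's clause selection, inference and feature-measurement code, which are not reproduced here.

```python
# Schematic given-clause loop with a measurement hook after a fixed
# number of clause selections (100 in the experiments). The real
# measurement is done inside the modified E prover.

def run_with_measurement(unprocessed, select, infer, measure,
                         selection_limit=100):
    """Run a given-clause loop, recording dynamic features once
    selection_limit clauses have been selected."""
    processed = []
    features = None
    selections = 0
    while unprocessed:
        if selections == selection_limit:
            # Dynamic features are measured here, then search continues.
            features = measure(processed, unprocessed)
        given = select(unprocessed)        # remove a clause from unprocessed
        processed.append(given)
        unprocessed.extend(infer(given, processed))  # generated clauses
        selections += 1
    return features
```

Counting clause selections rather than CPU time makes the interruption point independent of the speed of the machine the prover runs on, which is the motivation given above.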

4.7 Using SVMLight and kernel selection

The machine learning part of the experiment consisted of fitting support vector machines using SVMLight software [36]. Support vector machines use a kernel function to transform measured sample feature vectors as described in chapters 2 and 3. As part of the initial experiment the four standard kernel functions, linear, polynomial, sigmoid tanh and radial were compared to determine the best. The learning and test procedure was repeated for each of the standard kernel functions as well as for variations of parameter values within each. The bulk of the experiments were done with the first split. The test set contained 178 samples, 101 of which were in the negative class, i.e. unsolvable, and 77 of which were solvable.
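SVMLight reads training data as one sample per line: a target of +1 or -1, followed by index:value pairs with feature indices starting at 1. A minimal writer for this format (the feature values shown in the example are illustrative):

```python
# Write one sample in SVMLight's training-file format:
#   <target> <index>:<value> <index>:<value> ...

def svmlight_line(label, features):
    """label: positive for the proved class, negative otherwise;
    features: list of measured feature values."""
    pairs = " ".join(f"{i}:{v:g}" for i, v in enumerate(features, start=1))
    return f"{'+1' if label > 0 else '-1'} {pairs}"

# For example, a proved conjecture with two feature values:
line = svmlight_line(+1, [0.5, 2.0])   # "+1 1:0.5 2:2"
```

A file of such lines, one per conjecture, is the input to svm_learn; svm_classify takes the same format (with the target ignored or used for scoring) and the fitted model.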

4.7.1 Linear kernel

The linear kernel function is the simplest; with it the support vector machine acts in a similar manner to the simple perceptron algorithm. Such a modelling approach will work in cases where the data is close to being linearly separable without transformation, but will not perform well where it is not. The initial run was done with all parameters set to SVMLight's default values, which included the use of a linear kernel and an equal weighting between positive and negative samples (the command-line parameter “j”, which determines the relative weighting of positive to negative samples in the learning process, was set to 1.0). Note that the linear kernel function itself does not have any user-supplied parameters. The result of the initial run was a learned model or classifier which simply, but rather uselessly, classed all samples as negative. This classifier was right in 101 of the cases

γ       correct  incorrect  false positives  false negatives
2.0     122      56         –                –
0.2     111      67         32               35
10.0    123      55         25               30
100.0   124      54         20               34

Table 4.1: Effect of varying γ whilst keeping parameter j set to 2

simply because of the bias within the samples. It is clearly not a useful result, other than to provide a baseline against which to compare more useful classifiers. Setting the parameter j to 1000.0 instead of the default of 1.0 gave a classifier that did the opposite, i.e. classed all samples as positive, which was even worse. It should be noted that trivial classifiers of this type, which place all samples into one class, produce no useful information, and the number of correct classifications depends entirely on the test set: a test set containing only samples that belong in the other class would lead to zero correct classifications. Setting the weighting parameter j to 2.0, which accords with the approximate ratio of negative to positive samples within the population, gave a classifier which produced both false positives and false negatives as well as correctly classified results. This was a more intelligent classifier but its success rate only matched that of the always-negative case, i.e. 101 correct classifications and 77 incorrect, giving a successful classification rate of 56.74%.
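The quoted success rate is simply the class balance of the first test set:

```python
# Accuracy of the trivial always-negative classifier on the first test set.
negative, positive = 101, 77           # class sizes from the text
total = negative + positive            # 178 test samples
baseline_percent = 100 * negative / total
print(round(baseline_percent, 2))      # 56.74
```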

4.7.2 Radial basis function kernel

The general expression for the radial basis function kernel is K(x, x′) = exp(−γ‖x − x′‖²), where x and x′ are feature vectors. The parameter γ determines the extent of influence of the support vectors. Where γ is set to a large value, only nearby support vectors have an influence, whereas a small value of γ will bring more support vectors into play. The parameter γ acts as a scaling factor, so its effects depend on the intrinsic scale of the feature vectors. It is difficult a priori to determine values for γ, so an empirical approach was taken. Table 4.1 shows the effect of varying γ whilst keeping the parameter j set to 2.0 (on the basis of the best value from the linear kernel function experiments - in later experiments on feature selection the more sophisticated approach of using the exact ratio from the training set was used). Note that the splitting of the incorrect results into sub-classes of false positives and false negatives was done after the main experiment and was applied only to the systematic variation of γ from 0.2 to 100.0, hence the missing entries in the table. It can be seen that there is a good balance between false negatives and false positives, implying that the bias in the sample population has been well compensated by setting the parameter j to 2.0. This is in accord with the experience gained with the linear kernel function. Also, the

s     c     correct  incorrect
1.0   1.0   77       101
1.0   −1.0  77       101
10.0  0.1   77       101
0.1   10.0  101      77

Table 4.2: Varying s and c in the sigmoid tanh kernel

number of correct classifications is not very dependent on the value of γ provided that it is of the order of unity or greater. The most significant result is that the number of correct classifications is notably greater than the base level of 101 obtained with the trivial classifier where every case is classified as negative.

4.7.3 Sigmoid tanh kernel

The sigmoid tanh kernel function requires two user-entered parameters, s and c:

K(x, x′) = tanh(s x·x′ + c).

As with the radial basis function experiments, the weighting parameter j was set to 2.0. Various values of s and c were tried, none of which gave results better than the base case of 101 correct and 77 incorrect. In fact the resultant classifiers gave results equivalent to the trivial case of placing everything in the positive class (77 correct and 101 incorrect), or everything in the negative class (101 correct and 77 incorrect). Though it is possible that other values of the parameters, perhaps between the last two cases in table 4.2, may have produced better results, overall the results were unpromising compared with the radial basis function case.

4.7.4 Polynomial kernel

The polynomial kernel function,

K(x, x′) = (s x·x′ + c)^d,

has three parameters that must be set by the user: s, c and d. Table 4.3 gives the results obtained with varying values of the three parameters. The polynomial case does better than the linear case, and in general beats the base case of 101 correct classifications, but it is worse than the radial basis function kernel.

4.7.5 Further investigation of the radial basis function

Having investigated all the standard kernel functions, and having determined that the radial basis function kernel gives the best results, some further investigations were carried out.

d  s     c     correct  incorrect
2  1.0   1.0   108      70
2  10.0  0.1   109      69
2  0.1   10.0  104      74
3  1.0   1.0   111      67
5  1.0   1.0   107      71

Table 4.3: Varying d, s and c in the polynomial kernel

γ       j    correct  incorrect
10.0    3.0  121      57
10.0    2.0  123      55
10.0    1.0  123      55
1000.0  2.0  118      60

Table 4.4: Varying γ and j in the radial basis function kernel

First the effect of changing the parameter j was investigated and then the effect of setting γ to a very large value was also tried (see table 4.4). (Note that the case for j = 2.0 and γ = 10.0 is simply reproduced from the results given earlier.) Changing the value assigned to j has little effect. The value of 2.0 makes the most sense, as it approximately balances the bias within the sample population where the ratio is 101 to 77. (In fact a value of 1.5 would be more exact but an integer value is simpler. The lack of change between 1.0 and 2.0 shows that an integer approximation is adequate.) The very large value of γ, 1000.0, was tried to see what the effect would be of setting a value that would mean only very local feature vectors came into play. The results are worse than those obtained with more reasonable values of γ in the range 10.0 to 100.0. All the experiments reported so far in this section were done on one pair of test and learning sets. The radial basis function kernel was then tested on six other splits of the samples (see table 4.5). In each case, the value of γ was set to 5.0 and j to 2.0 (based on what had given reasonable results on the first split). Additionally the linear kernel was also tested in each case, to provide a base case for comparison. The radial basis function kernel results are good in all the splits tried.

Correct (RBF)  Incorrect (RBF)  Correct (Linear)  Incorrect (Linear)
89             31               67                53
105            26               83                48
81             20               70                31
96             33               73                56
89             37               75                51
117            38               36                119

Table 4.5: Results for the radial basis function kernel on other splits.

4.8 Filtering features

Feature number removed  Correct classifications  Incorrect classifications
1                       123                      55
2                       121                      57
3                       123                      55
4                       118                      60
5                       123                      55
6                       123                      55
7                       123                      55
8                       123                      55
9                       123                      55
10                      123                      55
11                      123                      55
12                      123                      55
13                      123                      55
14                      123                      55
15                      123                      55
16                      123                      55

Table 4.6: The effect of removing individual features.


Though the size of the feature set in this experiment was small, consisting of sixteen features, the number of possible subsets is still very large. The cardinality of the power set is 2^16 − 1 = 65,535, which is too many for exhaustive searching1. Instead a very simple approach was taken of removing each feature in turn to see if doing so reduced the efficacy of the learning and resultant classification. (Note that each feature is replaced before the next is removed. The alternative, removing the features in sequence without replacement, was done as part of the feature selection in the main experiment reported in chapter 6.) If removing a feature does not reduce the effectiveness of the learning, it is reasonable to deduce that the feature does not make a contribution. This approach will catch features that work in concert, as removing any member of a subset of features that combine will affect the efficacy of the learning process. But if any pair of features are equivalent, so there is redundancy, then removing either will have no effect; this does not indicate that the feature is of no importance. With this last caveat, the approach is practical and sufficient for an initial experiment. The results of rerunning the radial basis function kernel with a γ of 10.0 and the value of j set to 2.0 are shown in table 4.6. It can be seen from table 4.6 that only two of the features affect the results when they are absent. (Or, more accurately, improve the results by being added back to the feature set.) Feature 4 is a measure of the growth in the unprocessed clause set, whilst feature 2

1This was true for the hardware available at this stage of the project. Later on, when a dual Xeon workstation was available, checking 65,535 subsets would represent four to six days of CPU time and therefore it would be possible.

γ      Correct  Incorrect  False positives  False negatives
1.0    116      62         28               34
10.0   119      59         27               32
100.0  122      56         24               32
500.0  121      57         20               37

Table 4.7: Results for the reduced feature set.

is the “sharing factor”, a measure of how many sub-terms are held in common between clauses. See appendix A.
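The remove-and-replace scheme of the previous section can be sketched as a simple loop over features. Here `evaluate` is a hypothetical stand-in for the full fit-and-test cycle performed with SVMLight; feature indices are zero-based in the code, whereas the text numbers features from 1.

```python
# One-at-a-time feature filtering: remove each feature in turn (replacing
# it before the next is removed) and keep those whose removal reduces the
# number of correct classifications.

def filter_features(n_features, evaluate):
    """evaluate(feature_subset) -> number of correct classifications."""
    full = list(range(n_features))
    baseline = evaluate(full)
    useful = []
    for f in full:
        reduced = [g for g in full if g != f]
        if evaluate(reduced) < baseline:
            useful.append(f)   # accuracy dropped: feature f contributes
    return useful
```

As noted in the text, this scheme cannot detect an important feature that is duplicated by another, since removing either member of the redundant pair leaves the baseline unchanged.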

4.9 Results for reduced feature set

The results of removing single features indicated that only features 2 and 4 were important. As already stated, the method of removing a single feature would not detect any important features that were also redundant or duplicated by a second feature. To test how sufficient features 2 and 4 were on their own, the learning process was repeated on the first split with just those two features. The value of the parameter j was set to 2.0 as before, and various values of γ were used, starting with the previous value of 10.0. The results are given in table 4.7. It can be seen that the results, though good, are not quite equal to the number of correct classifications obtained when all features are used. This implies either that there is at least one redundant pair of features which are equivalent to each other, and so were missed in the process of removing and replacing one feature at a time, or that other features each make a small contribution, insufficient to make a notable difference when one is removed but having some effect in total.

4.10 Summary

The purpose of the initial experiment was to determine the presence, or otherwise, of clear indications that machine learning works in the given context of theorem proving. This it did. There were some aspects that could have been improved upon, though there would be no point in repeating the experimental work as the outcome would not be affected. One improvement would be the removal of pathological conjectures for which the theorem prover reaches a saturated state without producing useful clauses. Another would be the careful selection of test and learning sets that, though random, contained equal numbers of positive and negative cases. A third would be an additional feature removal experiment in which removed features are not replaced. The initial experiment produced results which indicated that machine learning was taking place. For example, for the first test set the trivial (and useless) classifier that places all samples in the most popular class would be right in 101 cases and wrong in 77 cases. The classifier based on machine learning was right in 123 cases and wrong in 55. The machine learning based classifier also gave much more balanced results, with the erroneous cases balanced between false positives and false negatives. These results were promising enough to continue the work on to a more complex experiment involving heuristic selection. The initial experiment also determined that, out of the four standard options, the radial basis function kernel gave the best results. Based on this outcome, the radial basis function was selected for the heuristic selection experiment. In addition to giving the best results, the radial basis function has the advantage of using only one parameter, giving a reduced search space for optimisation. A simple feature filtering scheme determined that two of the features were dominant in determining results.
Though dominant, the two features on their own gave slightly worse results than machine learning with all sixteen features.

Chapter 5

Heuristic selection experiment

The results of the initial experiment were promising enough to proceed further. In the initial experiment machine learning was applied to a simple classification problem in one area of conjectures, that of set theory. The classification itself was simply between easy and difficult proofs. Knowing whether a proof will be found quickly or not is useful in some circumstances, but it is more useful to have a better means of finding the proof. A key decision affecting the efficacy of the proof search by an automated theorem prover is the choice of heuristic to use. The best heuristic depends on the proof problem, and it takes a degree of human expertise to select a good heuristic; even experts may not make the best choice. The usefulness of automated theorem proving depends upon the process being usable by scientists, engineers or mathematicians who, though expert in their own fields, are not specialists in the inner workings of theorem provers. Previous work on machine learning, such as that built into the E theorem prover itself, has concentrated on learning new heuristics. For the work described in this dissertation the approach was to use a fixed set of heuristics and apply machine learning to selecting from the set. The selected heuristic is then applied without modification. The reasons for taking this approach were twofold. Firstly it constrains the problem to working with known heuristics which are, to some extent, tried and tested. Secondly, the published results for work done in the field of modifying heuristics indicate limited success.

5.1 Selecting a working set of heuristics

For the purposes of the experiment, given that for each heuristic to be considered a large number of proofs would be attempted, the number of heuristics in the working set was limited to five. From an experimental point of view, and to aid the process of machine learning, it would be ideal to have five heuristics each of which was clearly best for a well-defined subset of the set of conjectures to be used for learning and testing. Additionally it would be ideal to have each such subset of nearly equal size. In reality heuristics are complex (see appendix B), and there is considerable overlap in the sets of conjectures for which each heuristic does a good job. There are also many proof problems for which none of the heuristics can produce a result within a limited time. Additionally, some heuristics are good over a large number of problems and others work

79 80 5.2. DATA USED best on only a small number of problems. The choice of heuristics was thus unlikely to be ideal, but needed to be good enough to produce useful results. The heuristics also needed to be useful in their own right rather than artificially produced just for the experiment.

To select the five heuristics, use was made of experimental work already done by Stephan Schulz, the results of which are embodied in the published source code for the E theorem prover [78]. In addition to the machine learning aspects of E as described in Schulz’s thesis [77], E has an auto mode for heuristic selection. For auto mode, E uses a few binary or ternary features to divide conjectures into classes. The classification process was applied to the TPTP [85] library of problems and a large number (over one hundred) of heuristics were run on each class, with the best heuristic noted. As much of the generation of the heuristics and the testing was done automatically, information on the results is contained within the header files of E, and from this it is possible to place the heuristics in order, based on the number of the TPTP problems for which the auto mode would select that heuristic. (Note, this is not the same as the number of problems for which that heuristic is the best heuristic. The heuristic finds the most proofs within the class; it is quite possible that there are conjectures within the class that are not proved by the heuristic but would be proved by another heuristic. Even for the conjectures within the class that are proved, another heuristic may find the proof more quickly.) The heuristics were thus ordered and the top five heuristics were selected as a working set.

The working set of five heuristics thus contains the five heuristics most likely to be selected by the auto mode of E if applied to the TPTP library as it stood when Schulz performed his assessment work. Given that E has performed well in competition and that Schulz developed the heuristics as the result of many years of experimentation and research, these heuristics are representative of the state of the art and thus are practically useful heuristics to consider. The method of choice also should go some way to ensuring that different heuristics are best for different parts of the TPTP library, but it should be noted that this is only approximately true, given that the classes used in the preparation of the E auto mode are defined by features without direct reference to heuristics.

5.2 Data used

As for the initial experiment, the obvious choice as a source of problems (conjectures with associated axioms) is the TPTP library, and for this experiment all the problems in the library were used. The TPTP library acts as a repository for problems from workers in diverse fields and so provides a wide spectrum of problems. One drawback of using the TPTP library is that it is used in the development of theorem provers, such as E, and is also the basis of problems used in competitions to compare theorem provers, so there is a potential issue with theorem provers being too closely tailored to the TPTP library and thus not as good for general problems that users may apply them to. To counter this, the keepers of the TPTP library actively encourage submissions from different problem areas and the library is not static. From the point of view of the work described in this dissertation, the question of whether or not machine learning may be applied to the problem of heuristic selection is unaffected, and if it works for the TPTP library the same methodology can be applied to future sets of problems with a likelihood of success.

5.3 Feature sets

The initial experiment had used a small set of features, all of which were dynamic, that is, measured a short way into the proof search process.

The heuristic selection experiment was run in two stages. Initially it was run with the same feature set as used in the preliminary experiment. In the second stage the number of features was extended and an additional set of static features was added. Static features are features of the conjecture and the associated axioms that can be measured prior to the start of the proof search process. The features are described in detail in appendix A. Note that although the experiment was rerun with a new extended feature set, there was no need to repeat the time-consuming running of the theorem prover on all conjectures, as the time taken for each heuristic had already been recorded in the first part of the experiment. Measuring new features is a fast process as they are either measured directly on the conjecture and axioms (static features), or only a short way into the proof search process (in the case of dynamic features).

The new, extended, feature set consisted of fourteen static features and thirty nine dynamic features. The initial experiment had found that only a few of the sixteen dynamic features made a significant contribution to the learning process, so some justification for extending the feature set is needed. First, with modern machine learning techniques, feature space is transformed to a new space which may have more or fewer dimensions; this counteracts the curse of dimensionality1. (There is a cost in having too many features, both in optimisation time and in the time required to perform a classification from the resultant model, but the experimental outcomes of the present work are not significantly affected2.) Second, the initial experiment had not examined static features at all and it was important to determine if any such features are of significance, and how such significance compares with the importance of dynamic features. Third, as part of the experiment, subsets of the features were examined so redundant features could be removed.

In summary, machine learning outcomes were compared between results obtained with all features used (the combined case), with just the static features and with just the dynamic features. In addition, experiments were done on examining subsets of features from the combined set to determine which were of significance in the machine learning process. It was discovered as a result of the feature selection work that only a very few features are needed and it was possible to consider all possible small subsets of the feature set used (see chapter 6). Thus starting with more features had no negative effect on the final outcome and had the positive effect of providing a larger pool of features to select from.

1 The curse of dimensionality is a colourful way of describing how, as dimensions increase, the number of sample points needed to characterise the space grows exponentially; see Bishop [7].
2 A difference was found between combining the feature sets and using each separately, and this is reported on in the experimental results. Additionally, small feature subsets gave better results than all 53 features combined; this is reported in the chapter on feature selection.

5.4 Initial separate classifiers

For the first stage, the overall experiment was divided into separate classification experiments for each heuristic, that is, for each heuristic a separate classifier was produced. For each classification there were two possible outcomes for each proof attempt. To be placed in the positive class the conjecture must be proved and the time taken to do so must be less than the time taken to prove the same conjecture by any of the other heuristics in the set of five. Though each heuristic classification can be regarded as a separate experiment, the classification requires that all heuristics are run on each conjecture, so the experiments are interdependent. A time limit of 100 CPU seconds was set for each proof attempt. 100 CPU seconds is sufficient for the majority of proofs that will be found at all, but is short enough to make the length of the overall experiment feasible. (The initial experiment had used 300 CPU seconds, but that experiment was restricted to just one area of the TPTP library and a single heuristic.) For some conjectures, none of the five heuristics could find a proof within the time allowed. To allow for these cases, a sixth classifier was produced for which the positive class is those conjectures for which no heuristic found a proof. For ease of notation, this case was referred to as heuristic zero. The size of each positive class varied, but the structure of the experiment - with five competing heuristics - was such that the positive class size in each case was much smaller than the negative class. The positive class corresponds to a single heuristic (the best one) while the negative class corresponds to the four remaining heuristics. Ignoring the fact that some conjectures cannot be proved by any heuristic, and assuming that all five heuristics are best in roughly equal numbers of cases, leads to the positive class in each case being only a quarter the size of the corresponding negative class.
This disparity in class size cannot be avoided but does not prevent useful results from being obtained.3 The disparity is addressed within SVMLight by means of the parameter j, which allows separate weights to be applied to positive and negative slack variables during the optimisation; see Morik et al [56] and also chapter 2 of this dissertation.
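The labelling rule for the six positive classes can be sketched as follows (illustrative code, not the scripts used in the experiments; times are assumed to be recorded as CPU seconds, with None for a failed attempt):

```python
# Sketch (not the dissertation's code): constructing the six positive classes
# from recorded proof times. times_row[h] is the CPU time heuristic h+1 took
# on a conjecture, or None if no proof was found within the 100 s limit.
TIMEOUT = 100.0

def label(times_row):
    """Return the index (1-5) of the fastest successful heuristic,
    or 0 ("heuristic zero") if none found a proof in time."""
    solved = [(t, h + 1) for h, t in enumerate(times_row)
              if t is not None and t < TIMEOUT]
    if not solved:
        return 0
    return min(solved)[1]

# Hypothetical example: heuristic 3 is fastest, so the conjecture is in the
# positive class for H3 and the negative class for the other classifiers.
assert label([12.0, None, 3.5, 99.0, None]) == 3
# No heuristic succeeds: the conjecture is classed as heuristic zero.
assert label([None, None, None, None, None]) == 0
```

Each conjecture is positive for exactly one of the six classifiers, which is what makes the positive classes so much smaller than the negative ones.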

5.5 Automatic heuristic selection

If each predictive classifier obtained by applying machine learning to the samples in the learning set was perfect, then for any conjecture from the test set only one classifier would place it in the positive class and all the other classifiers would place it in the negative class. Selecting the heuristic to be used in this case would be the simple matter of using the heuristic for which the conjecture was in the positive class. Such perfect results are highly unlikely to be obtained in practice, and the more likely outcome is that more than one classifier places the conjecture in the positive class. Assuming that the requirement is to select only one heuristic as the best choice, a means is needed of differentiating between two or more positive results. Fortunately the output value from a support vector machine classifier is not simply a class number (or plus or minus one), but a real number giving the margin. The margin is a measure of how far from the dividing line between the classes the particular sample has been placed (see chapter 2). By using the margin from each classifier, the heuristics may be placed in order and the best selected. This also allows a heuristic to be selected even if each classifier places the sample in the negative class (in the latter case the least negative result is selected).

3 A decision tree approach, where heuristics are at first grouped into alternative sets which are successively reduced until a single heuristic remains, might allow for more balanced classifications. But grouping heuristics on the basis of class size is artificial and may well be inconsistent with groups determined from classifiers based on particular features as is normally done in a decision tree approach.
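The selection rule amounts to taking the classifier reporting the largest margin; a minimal sketch (the function name and data layout are illustrative, not taken from the dissertation's code):

```python
# Sketch of margin-based heuristic selection: each per-heuristic SVM returns
# a signed margin for a sample; the heuristic whose classifier reports the
# largest margin wins, even if every margin is negative.
def select_heuristic(margins):
    """margins: dict mapping heuristic id -> SVM margin for one conjecture."""
    return max(margins, key=margins.get)

# Two classifiers claim the positive class; the larger margin wins.
assert select_heuristic({1: 0.4, 2: 1.1, 3: -0.2, 4: -1.5, 5: -0.7}) == 2
# All classifiers say "negative": the least negative result is selected.
assert select_heuristic({1: -0.9, 2: -0.3, 3: -1.2, 4: -2.0, 5: -0.6}) == 2
```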

5.6 Performance measures for classifiers

For heuristic selection it is the joint performance of the set of SVM classifiers that is important, but each SVM is produced independently of the others so individual performance measures are useful in determining the best parameter values to set. The primary measure of success of a classifier is the number of test samples that it correctly classifies. Where the test set is unbalanced between positive and negative samples the number of correct classifications alone may give a misleading picture; the classifier may be biased. A fuller picture may be obtained by considering the number of false negatives (positive samples misclassified as negative) and the number of false positives (negative samples misclassified as positive). A good classifier should have a reasonable balance between false positives and false negatives. There are a number of more formal measures which combine the number of true positives (TP), the number of true negatives (TN), the number of false positives (FP) and the number of false negatives (FN). The fields of statistics and information retrieval use the same measures but give them different names. The proportion of positive samples that are correctly classified is called the sensitivity in statistics and the recall in information retrieval:

    sensitivity = TP / (TP + FN)

A similar expression for the proportion of negative samples that are correctly classified is known as the specificity:

    specificity = TN / (TN + FP)

The precision is the proportion of samples classified as positive which are true positives:

    precision = TP / (TP + FP)

Sensitivity, specificity and precision each measure a single aspect of the classifier's performance, so it is useful to have a single combined measure. The F-measure combines the sensitivity and the precision into a single function, see Zhenqiu Liu et al [44].

    Fγ = 1 / ( γ(1/sensitivity) + (1 − γ)(1/precision) )

or, in terms of TP, FN and FP:

    Fγ = TP / ( TP + γ·FN + (1 − γ)·FP )

where 0 ≤ γ ≤ 1. In the general case γ is neutrally weighted at 1/2, so the F-measure is often expressed as

    F = 2 × (precision × recall) / (precision + recall)
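These measures are straightforward to compute from the four counts; a short sketch (here gamma is the F-measure weight described above, not the kernel parameter):

```python
# Performance measures expressed as functions of the confusion-matrix counts.
def sensitivity(tp, fn):          # recall: proportion of positives found
    return tp / (tp + fn)

def specificity(tn, fp):          # proportion of negatives found
    return tn / (tn + fp)

def precision(tp, fp):            # proportion of positive calls that are right
    return tp / (tp + fp)

def f_measure(tp, fn, fp, gamma=0.5):
    """F_gamma = TP / (TP + gamma*FN + (1-gamma)*FP); gamma = 1/2 gives the
    usual F = 2*precision*recall / (precision + recall)."""
    return tp / (tp + gamma * fn + (1 - gamma) * fp)

# Check on the H1 counts from table 5.1 (j=1.0): TP=62, FP=11, TN=2693, FN=584.
tp, fp, tn, fn = 62, 11, 2693, 584
assert abs(sensitivity(tp, fn) - 62 / 646) < 1e-12
assert abs(f_measure(tp, fn, fp) - 2 * 62 / (2 * 62 + 584 + 11)) < 1e-12
```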

Note that the γ in the Fγ measure bears no relation to the parameter γ of the radial basis function kernel, and it is the latter that is referred to in all results given in this dissertation. In the results that follow, judgement was based on the percentage of correct classifications and the balance between false positives and false negatives. The combined F-measure was not used: at the point where such a measure would be useful in optimising the value of the radial basis function kernel parameter γ in detail, it was possible by feature reduction to use the overall efficacy of all classifiers combined instead. This is described in chapter 6 on feature selection.

5.7 Unextended feature set experiments

Though the target outcome of the experiments is a means of heuristic selection, interim results from each separate heuristic classifier may be tested to give an indication of whether or not learning is taking place. That is, having produced the SVM classifier from a learning set of conjectures, the classifier may then be applied to a test set of conjectures.

5.7.1 First classifications

The first run of the experiment was done with the same feature set as used for the preliminary experiment. To allow for the imbalance between positive and negative classifications the weighting parameter j in SVMLight was varied. SVMLight has separate weights for the positive and negative slack variables during the optimisation (see chapter 2) and the parameter j sets the ratio between the two weights, see Morik et al [56]. The radial basis function model was used in all cases as it had proved to be the best in the initial experiment. The value of the radial basis function parameter γ was set to 10.0 in most cases, though it was also increased to 100.0 in some additional trials as a check. (The value of 10.0 had proved best in the initial experiment.) Additionally each series of trials was repeated with the SVMLight parameter i switched on. This parameter is a flag which is off by default and, if switched on, causes the SVMLight software to retrain the classifier after removing inconsistent cases (that is, samples which are misclassified, as allowed by the use of slack variables; see chapter 2). This was an additional experiment which had not been tried in the preliminary experimental work.
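The role of j can be illustrated with a small calculation (a sketch, not code from the experiments; the function name is invented, and SVMLight itself is a C program):

```python
# Sketch: choosing a weighting like SVMLight's j to balance unequal classes.
# j multiplies the penalty on positive-class slack variables, so setting
# j = negatives/positives gives both classes equal total penalty weight.
def cost_ratio(n_positive, n_negative):
    return n_negative / n_positive

# With five competing heuristics each positive class is roughly one fifth
# of the negatives. Using the H1 test-set sizes (646 positive, 2704 negative)
# gives a value close to the j = 5.0 adopted later in the chapter.
assert round(cost_ratio(646, 2704), 2) == 4.19
```

A later refinement (see chapter 6) sets j to exactly this ratio for each training set rather than using a single rounded value.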

j      γ      Correct       True Pos  False Pos  True Neg  False Neg
1.0    10.0   2755 (82%)    62        11         2693      584
10.0   10.0   2443 (73%)    314       575        2129      332
100.0  10.0   2106 (63%)    338       936        1768      308
10.0   100.0  2531 (76%)    224       397        2307      422

Table 5.1: Classification results for H1 with parameter i turned off.

j      γ      Correct       True Pos  False Pos  True Neg  False Neg
1.0    10.0   2704 (81%)    0         0          2704      646
10.0   10.0   774 (23%)     643       2573       131       3
5.0    10.0   2492 (74%)    302       514        2190      344
5.0    100.0  2549 (76%)    223       378        2326      423

Table 5.2: Classification results for H1 with parameter i turned on.

On the initial run, the test samples were split into two sets - training and test - but this was done independently for each separate heuristic classification experiment. Thus the division of the conjectures between the two sets is not identical in each case. This did not matter for the individual classifications, but needed to be corrected when all the heuristic classifiers were combined to select the best heuristic by comparing margins (to ensure that there was no bias). Results are given for the runs prior to the correction of the training and test sets, as well as after correction: in the first instance different parameter values were experimented with, and the results were used to fix the parameter values for the repeated runs with the fixed training and test sets.
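The correction amounts to fixing one split of the sample indices and reusing it for every classifier; a minimal sketch (the seed, total sample count and 50/50 ratio are assumptions for illustration only):

```python
# Sketch: a single deterministic train/test split shared by all six
# classifiers, so that margins from different classifiers are comparable.
import random

def fixed_split(n_samples, seed=0, test_fraction=0.5):
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)      # same seed -> same shuffle every time
    cut = int(n_samples * (1 - test_fraction))
    return idx[:cut], idx[cut:]

# Every classifier sees identical sets, unlike the first independent runs.
train_a, test_a = fixed_split(6701)
train_b, test_b = fixed_split(6701)
assert train_a == train_b and test_a == test_b
assert len(test_a) == 3351
```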

Classification on H1

The number of test samples was 3,350. Out of these 646 were in the positive class and 2,704 were in the negative class. The results in table 5.1 were obtained with the default setting of off for the parameter i. The results in table 5.2 were obtained with the parameter i turned on so that the SVMLight software re-optimised after removing inconsistent cases. With the parameter i set, the results are more sensitive to the value of the weighting parameter j. In judging the results, the number of correct classifications is not the sole criterion. Given the inevitably unbalanced nature of the set, an assessment needs also to be based on the requirement to obtain a reasonable number of positive cases without an excessive number of false positives.

Classification on H2

The number of test samples was 3,350. Out of these 283 were in the positive class and 3,067 were in the negative class. The results in table 5.3 were obtained with the default setting of off for the parameter i. The results in table 5.4 were obtained with the parameter i turned on so that the SVMLight software re-optimised after removing inconsistent cases. Similar comments apply as for the classification on H1.

j      γ      Correct       True Pos  False Pos  True Neg  False Neg
1.0    10.0   3068 (92%)    1         0          3067      282
10.0   10.0   2903 (87%)    91        255        2812      192
100.0  10.0   2481 (74%)    114       700        2367      169
10.0   100.0  2966 (89%)    71        172        2895      212

Table 5.3: Classification results for H2 with parameter i turned off.

j      γ      Correct       True Pos  False Pos  True Neg  False Neg
1.0    10.0   3067 (92%)    0         0          3067      283
10.0   10.0   2740 (82%)    97        424        2643      186
5.0    10.0   3021 (90%)    54        100        2967      229
5.0    100.0  3031 (90%)    64        100        2967      219

Table 5.4: Classification results for H2 with parameter i turned on.

Classification on H3

The number of test samples was 3,351. Out of these 453 were in the positive class and 2,898 were in the negative class. The results in table 5.5 were obtained with the default setting of off for the parameter i. The results in table 5.6 were obtained with the parameter i turned on so that the SVMLight software re-optimised after removing inconsistent cases. Similar comments apply as for the classification on H1.

Classification on H4

The number of test samples was 3,351. Out of these 337 were in the positive class and 3,014 were in the negative class. The results in table 5.7 were obtained with the default setting of off for the parameter i. The results in table 5.8 were obtained with the parameter i turned on so that the SVMLight software re-optimised after removing inconsistent cases. Similar comments apply as for the classification on H1.

Classification on H5

The number of test samples was 3,351. Out of these 349 were in the positive class and 3,002 were in the negative class.

j      γ      Correct       True Pos  False Pos  True Neg  False Neg
1.0    10.0   2900 (87%)    9         7          2891      444
10.0   10.0   2629 (78%)    138       407        2491      315
100.0  10.0   2390 (71%)    150       658        2240      303
10.0   100.0  2773 (83%)    87        212        2686      366

Table 5.5: Classification results for H3 with parameter i turned off.

j      γ      Correct       True Pos  False Pos  True Neg  False Neg
1.0    10.0   2898 (86%)    0         0          2898      453
10.0   10.0   2524 (75%)    147       521        2377      306
5.0    10.0   2798 (84%)    109       209        2689      344
5.0    100.0  2831 (84%)    68        135        2763      385

Table 5.6: Classification results for H3 with parameter i turned on.

j      γ      Correct       True Pos  False Pos  True Neg  False Neg
1.0    10.0   3025 (90%)    18        7          3007      319
10.0   10.0   2823 (84%)    168       359        2655      169
100.0  10.0   2372 (71%)    201       843        2171      136
10.0   100.0  2906 (87%)    118       226        2788      219

Table 5.7: Classification results for H4 with parameter i turned off.

The results in table 5.9 were obtained with the default setting of off for the parameter i. The results in table 5.10 were obtained with the parameter i turned on so that the SVMLight software re-optimised after removing inconsistent cases. Similar comments apply as for the classification on H1.

H0

H0 is the case where none of the five heuristics can find a proof within the time limit (of 100 CPU seconds). The number of test samples was 3,351. Out of these 1,282 were in the positive class and 2,069 were in the negative class. The results in table 5.11 were obtained with the default setting of off for the parameter i. The results in table 5.12 were obtained with the parameter i turned on so that the SVMLight software re-optimised after removing inconsistent cases. The balance for this case is different to that for classifications on H1 to H5. This is to be expected, as the classification of hard or easy problems is very different from the classification as to whether a heuristic is best out of five choices.

Parameters j and γ in more detail

In the above experiments fairly crude steps in j and γ were used. A more detailed and sophisticated approach was taken when the number of features was reduced in the feature

j      γ      Correct       True Pos  False Pos  True Neg  False Neg
1.0    10.0   3014 (90%)    0         0          3014      337
10.0   10.0   2585 (77%)    192       621        2393      145
5.0    10.0   2959 (88%)    140       195        2819      197
5.0    100.0  2981 (89%)    99        132        2882      238

Table 5.8: Classification results for H4 with parameter i turned on.

j      γ      Correct       True Pos  False Pos  True Neg  False Neg
1.0    10.0   3006 (90%)    16        12         2990      333
10.0   10.0   2726 (81%)    184       460        2542      165
100.0  10.0   2469 (74%)    196       729        2273      153
10.0   100.0  2880 (86%)    129       251        2751      220

Table 5.9: Classification results for H5 with parameter i turned off.

j      γ      Correct       True Pos  False Pos  True Neg  False Neg
1.0    10.0   3002 (90%)    0         0          3002      349
10.0   10.0   2703 (81%)    189       488        2514      160
5.0    10.0   2760 (82%)    181       423        2579      168
5.0    100.0  2908 (87%)    122       216        2786      227

Table 5.10: Classification results for H5 with parameter i turned on.

selection experiments described in chapter 6. In the feature selection experiments the value of j was set to exactly balance the positive and negative class numbers in the training set in each case. Additionally, for the optimum feature subset the value of γ was varied in small steps over a wide range, see chapter 6 for results.

5.7.2 Identical learning and test sets

The initial classifications, reported in the previous sections, were performed as separate experiments. The splitting of samples into learning and test sets was done independently in each case, and though the sets were very similar they weren’t forced to contain exactly the same samples in each case. The next stage of the experiment was to force the same conditions on each heuristic classification, including using identical sets for learning in each case and similarly identical sets for testing. Reviewing the results of the first classification experiments, the decision was taken to set parameter values as follows. The parameter option i was set so that a refit was done with anomalous cases removed. The value of parameter j was set to 5.0, which gave good results and is also roughly in accord with there being five competing heuristics: crudely approximating the unsolved problems as equal in number to those best suited to any one heuristic, each positive class should be around one fifth the size of the negative class.4

4 This approximation was improved upon in the feature selection experiments described in the next chapter, where the value of j was set to be equal to the exact ratio of positive to negative cases in the training set.

j      γ      Correct       True Pos  False Pos  True Neg  False Neg
1.0    10.0   2464 (74%)    512       117        1952      770
10.0   10.0   2331 (70%)    907       645        1424      375
100.0  10.0   2302 (69%)    910       677        1392      372
10.0   100.0  1807 (53%)    1221      1483       586       61

Table 5.11: Classification results for H0 with parameter i turned off.

j      γ      Correct       True Pos  False Pos  True Neg  False Neg
1.0    10.0   2293 (68%)    282       58         2011      1000
10.0   10.0   1355 (40%)    1282      1996       73        0
5.0    10.0   1282 (38%)    1282      2069       0         0
5.0    100.0  1352 (40%)    1282      1999       70        0

Table 5.12: Classification results for H0 with parameter i turned on.

Heuristic  Correct          True Pos  False Pos  True Neg  False Neg
1          2695 (80.42%)    175       169        2520      487
2          3005 (89.67%)    48        115        2957      231
3          2888 (86.18%)    53        75         2835      388
4          2989 (89.20%)    128       139        2861      223
5          2918 (87.08%)    61        161        2857      272
0          1358 (40.53%)    1285      1993       73        0

Table 5.13: Summary of results for identical learning/test sets

The value of γ for the radial basis function was set to 10.0. The number of samples in the test set was 3,351 and the results are given in table 5.13. Looked at as individual experiments to produce classifiers, these results don’t indicate an obvious success for the machine learning. But this is not how the data should be viewed. The purpose of the experiments was to produce a set of classifiers which together provide a means of selecting the best heuristic. The key is the relative size of the margin in each case. For example, a false positive for one heuristic will not matter if the margin associated with it is less than the margin in the classifier where the same conjecture has been correctly classified as positive. In this hypothetical example the correct heuristic will be assigned even though one of the other heuristics is laying false claim to it. The key test is whether or not the overall selection process does better than assigning all conjectures to any individual heuristic. This comparison is made in the next section, which reports the results of the heuristic selection experiment.

5.7.3 First results of heuristic selection

To determine the efficacy of the machine learned heuristic selection process, two measures were used. One was the number of theorems successfully proved. The other was the total time taken. The two are related in that a CPU time limit of 100 seconds was set for unsuccessful attempts, so the total time taken was determined, to some extent, by the amount of time allowed for fruitless proof searches. The total time could be reduced by reducing this limit, but some proofs previously found may then be recorded as failures due to too early a cut-off point. Similarly, extending the time allowed may allow more proofs to be found but the overall time taken would be extended. As the experiment involved the

Method Used         Total Time in Seconds  Number Proved  Number of Failures
Selected Heuristic  155,861                1,751          1,600
Fixed Heuristic 1   164,344                1,725          1,626
Fixed Heuristic 2   182,951                1,525          1,826
Fixed Heuristic 3   173,376                1,584          1,767
Fixed Heuristic 4   171,674                1,623          1,728
Fixed Heuristic 5   242,484                1,489          1,862

Table 5.14: First heuristic selection results.

Heuristic  Number of Times Selected
1          1,128
2          212
3          1,296
4          536
5          179

Table 5.15: Number of times each heuristic was selected in the learned heuristic selection.

comparison between heuristic selection and each heuristic by itself, the ordering shouldn’t be affected but the apparent time differences would vary if different parameters were set. In the heuristic selection case, the total time taken is calculated on the basis of the measured time for the relevant selected heuristic for each sample. This is very slightly optimistic in that it does not allow for any overhead in the selection process. The selection process was not programmed into the theorem prover, so the time taken by it could not be exactly determined, but the process is very rapid and is negligible compared with the time taken in proof search. It is certainly less than differences which would arise, for instance, from taking a different CPU cut-off point as discussed in the previous paragraph. The results for heuristic selection versus individual heuristics are summarised in table 5.14. The total number of test samples (conjectures for which proofs were sought) is 3,351. It can be seen that the heuristic selection scheme does better than any of the heuristics individually. Thus, the machine learned algorithm for heuristic selection is making appropriate decisions. If the process of selection was random the results would likely be worse than the best individual heuristics. (See chapter 6 on feature selection for simulated results for random heuristic selection, which are clearly worse than those obtained with the best fixed heuristic. The results for random feature selection give an indication of statistical significance - see figures 6.52 to 6.55; it can be seen that the probability of obtaining results as good as those obtained from the heuristic selection purely by chance is negligibly small.)
To demonstrate that the selection process is significant, rather than perhaps trivially always selecting the same heuristic, table 5.15 shows how many times each heuristic was selected. At this stage of the experiment, machine learning had been demonstrated to improve on any particular heuristic. Such single heuristics provide a useful base case, but it is also important to know if there is an upper limit. It would be pointless seeking to improve the

Heuristic  Number of Times Selected
1          1,053
2          220
3          317
4          213
5          222

Table 5.16: Number of times each heuristic selected with perfect heuristic choice.

heuristic choice method if the five heuristics in the working set are insufficient to provide headroom for further improvement. To test this upper limit, results were calculated for the case of a perfect heuristic choice for each conjecture, that is, the best heuristic is always selected. Such a perfect heuristic choice would prove 2,025 of the conjectures whilst still failing on 1,326 problems. The total time taken would be 137,664 seconds. Thus there is still some room for improving the heuristic selection process. The number of times each heuristic would be selected in a perfect scenario is given in table 5.16. Comparing table 5.16 with table 5.15, the largest difference is in the number of times heuristics 3 and 4 are selected. The heuristic selection is putting too much emphasis on heuristics 3 and 4, which would indicate that the associated SVM classifiers are giving margin values that are too high relative to the classifiers for the other heuristics.
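The perfect-choice upper bound can be sketched as follows (hypothetical data; the real calculation ran over the recorded times for all 3,351 test conjectures):

```python
# Sketch of the "perfect choice" calculation: for each conjecture take the
# fastest successful heuristic, counting the full 100 s limit for conjectures
# that no heuristic solves. times[i][h] is a CPU time or None on failure.
TIMEOUT = 100.0

def oracle(times):
    proved, total = 0, 0.0
    for row in times:
        solved = [t for t in row if t is not None and t < TIMEOUT]
        if solved:
            proved += 1
            total += min(solved)   # perfect choice: fastest heuristic
        else:
            total += TIMEOUT       # unsolvable within the limit
    return proved, total

# Three hypothetical conjectures and three heuristics.
times = [[12.0, None, 3.5], [None, None, None], [50.0, 40.0, None]]
assert oracle(times) == (2, 143.5)
```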

5.8 Experiments with extended feature sets

Following positive results from machine learning with the same feature set as used in the initial experiment, the next step taken was to extend the feature set. Details of the extended features are given in Appendix A. The extended feature set can be split into two subsets. The static set of features can be determined by measuring the conjecture and axioms prior to any proof search. The dynamic set of features is measured on the proof state a short way into the proof search process. By applying the machine learning to both feature sets separately, as well as the combined set, a comparison between the two may be made.

5.8.1 Classifications with extended feature set

Parameters for SVMLight were set to values determined in the earlier experiment with the smaller feature set. The number of samples in the test set was 3,345. Table 5.17 summarises the results for each heuristic classification. The combined case includes both the dynamic and static features. Given that the combined case is a superset of either the dynamic or static cases it should do at least as well as either, but in several instances does slightly worse. This shows that having too many features can have a detrimental effect. This will be addressed in chapter 6, which gives the results of the feature selection experiments, for which optimal results were obtained with small subsets of features.

Case          Correct           Correct Pos   False Pos   Correct Neg   False Neg
H1 Static     2,743 (82.00%)         78           33         2,665         569
H1 Dynamic    2,712 (81.08%)         32           18         2,680         615
H1 Combined   2,702 (80.78%)         18           14         2,684         629
H2 Static     3,064 (91.60%)          9            6         3,055         275
H2 Dynamic    3,063 (91.57%)          3            1         3,060         281
H2 Combined   3,063 (91.57%)          3            1         3,060         281
H3 Static     2,878 (86.04%)         15            9         2,863         458
H3 Dynamic    2,873 (85.89%)          2            1         2,871         471
H3 Combined   2,874 (85.92%)          2            0         2,872         471
H4 Static     3,006 (89.87%)         12           20         2,994         319
H4 Dynamic    3,019 (90.25%)         37           32         2,982         294
H4 Combined   3,012 (90.04%)         14           16         2,998         317
H5 Static     2,975 (88.94%)          6            4         2,969         366
H5 Dynamic    2,976 (88.97%)          3            0         2,973         369
H5 Combined   2,976 (88.97%)          3            0         2,973         369
H0 Static     2,488 (74.38%)        531          150         1,957         707
H0 Dynamic    2,458 (73.48%)        444           93         2,014         794
H0 Combined   2,396 (71.63%)        372           83         2,024         866

Table 5.17: Summary of extended feature set results.

The dynamic and static features produce very similar results. The static feature set does better on heuristics 1, 2 and 3 and also in the case of the conjectures for which no heuristic could find a proof, heuristic 0. The dynamic feature set does better on heuristics 4 and 5. As with the classification results of the experiments with the smaller feature set, it is difficult to draw conclusions from the individual classification experiments themselves. The important results are those of heuristic selection using the combination of all the individual heuristic SVM classifiers. These are reported in the next section.

5.8.2 Heuristic selection with extended feature set

Correcting for a CPU limit bug in E

In the initial results of these experiments, the total timings for one of the fixed-heuristic cases were excessive, given a CPU limit of 100 seconds. Further investigation, and some consultation with colleagues who also use E, showed that very occasionally E will fail to halt at the CPU limit. Code was written in C to analyse the data files and to do two things: first, to check whether in any case the extra time erroneously applied led to a proof being found; secondly, to correct the timings down to 100 seconds where this limit was exceeded. Fortunately, from the point of view of this experiment, it was found that in no case did the excessive time lead to a proof being found where none had been found after 100

Method Used         Time in Seconds   Number Proved   Failures
Static Case             157,445           1,755         1,590
Dynamic Case            158,033           1,764         1,581
Combined Case           159,602           1,751         1,594
Fixed Heuristic 1       162,852           1,739         1,606
Fixed Heuristic 2       181,452           1,541         1,804
Fixed Heuristic 3       168,737           1,626         1,719
Fixed Heuristic 4       170,238           1,616         1,729
Fixed Heuristic 5       174,317           1,542         1,803

Table 5.18: Final results for heuristic selection using extended feature sets.

CPU seconds. This is not surprising, as the 100 second CPU limit was chosen to be long enough to allow the prover to find proofs where it could; if none is found after 100 CPU seconds then it is likely that no proof will be found within any reasonable time limit. The results given in the following sections give corrected timings.

Results

The final results for heuristic selection, using the extended feature sets, are given in table 5.18. All three heuristic selection schemes do better than any of the heuristics on their own. The results for the dynamic case are best in terms of the number of proofs found, but the total time taken is slightly more than for the static case. The combined case does slightly worse than either the static or dynamic feature sets separately.

Results of including the H0 case

Heuristic H0 is used as a shorthand for the case where none of the five heuristics is able to find a proof within the 100 second CPU limit. Results for H0 classification have been given in the appropriate sections, but these results were not initially used in the combined process of heuristic selection. Combining the H0 classifier with the others in the heuristic selection process will not increase the number of proofs found, but it may lead to a worthwhile improvement in the total time taken, as time is not wasted on fruitless proof searches. Given that the H0 classifier will not be perfect, including it in the heuristic selection process carries a cost in terms of a reduction in the number of proofs found. This cost may be worth paying if the overall reduction in time is large. In the first attempt at including H0, it was treated in the same manner as the other heuristic classifiers. That is, if the H0 margin was the most positive, or least negative, then it was selected. This gave very bad results. In the static case the number proved dropped from 1,755 to 755. In the dynamic case the number proved dropped from 1,764 to 508. In the combined case the number proved dropped from 1,751 to just 413. Though the time taken was also greatly reduced, this was clearly too high a price to pay.

Method      Time (sec) (no H0)   Time (sec) (H0)   Proved (no H0)   Proved (H0)
Static           157,445             101,980            1,755           1,627
Dynamic          158,033             111,797            1,764           1,691
Combined         159,602             120,986            1,751           1,682

Table 5.19: Results including H0 case.

CPU Time    Static               Dynamic              Combined
Limit       Total     Theorems   Total     Theorems   Total     Theorems
            Time      Proved     Time      Proved     Time      Proved
100         157,455   1,755      158,033   1,764      159,602   1,751
75          119,915   1,736      120,334   1,739      121,515   1,727
50          81,536    1,694      81,734    1,699      82,605    1,690
25          42,137    1,646      42,182    1,653      42,665    1,638

Table 5.20: The effect of reducing the CPU time limit from 100 seconds to 25 seconds.

To solve this problem, H0 was treated as a special case: only if the margin for the H0 classifier was positive would it be selected. This gave much better results, which are given in table 5.19. The results in table 5.19 show that applying the H0 filter reduces the time taken by around a third whilst reducing the number of proofs found by only 5 to 8 percent. It should be noted that the total time taken could also be reduced by setting a lower CPU limit. This would cause all proofs that took longer than the new, lower limit to be lost. Using the H0 filter, though it does reduce the number of proofs found, should not produce such a systematic error. Table 5.20 shows the effects of reducing the CPU time limit from 100 seconds to 25 seconds. It can be seen that for the TPTP data simply reducing the CPU limit actually gives better results than using H0 filtering. This result must be treated with caution. The TPTP library has grown over a number of years and initially there was an emphasis on smaller problems, as early theorem provers were only capable of solving smaller problems. This means that the library is likely to be skewed towards easy problems, and if such is the case, simply reducing the CPU limit works well. For a population of problems where the solvable cases were all difficult and took close to the CPU limit, reducing the CPU limit would drastically reduce the number of proofs found, while H0 filtering would be more robust.

5.9 Further analysis

Further analysis was carried out to obtain a more detailed picture of the results. The purpose of this analysis was to look at whether or not some problems, that is conjectures with associated axioms, are intrinsically difficult to classify while others may be intrinsically easier to classify. Each problem used in the testing of the classifiers was taken originally from a TPTP file, and the name of the file was retained as a tag on each data line written out. Software was written that, for each classification by the individual heuristic SVM classifiers, wrote out two files, one containing the names of samples that were correctly classified and the other containing the names of problems incorrectly classified. These files essentially listed the elements of the set of correctly classified problems and the set of incorrectly classified problems. Note that the problems from each set may be in either the positive or negative classes. Further software was then written to find the common names between two or more files, i.e. the intersection of the sets. This filtering process was then applied to various combinations of the classifications. First it was applied to all classifications in the order H0 to H5, starting with the static feature set results, then the dynamic feature set results and finally the combined feature set results. The number of common problems found at each stage (that is, the number of problems correctly classified by all classifiers up to that point) is given in table 5.21. The choice of ordering is arbitrary but some general conclusions may be drawn. First, only 389 out of 3,345 problems are correctly classified by all the classifiers. Secondly, there is a significant drop when moving between feature sets as well as when moving between heuristics.
Thirdly, after an initial drop, the combined feature set case shows no further drop between heuristics; this is perhaps a reflection of the combined feature set containing no extra features beyond those of the static and dynamic feature sets. Performing the same experiment with problems that were misclassified leads to a rapid fall off to zero. 857 problems were misclassified by the H0 classifier (in the static case); of these, 46 were also misclassified by the H1 classifier, but of those none were misclassified by the H2 classifier. There are many possible orderings, and several more were investigated, but they are not reported here as no significant new information was derived. The main conclusion to be drawn from the results in table 5.21 is that the efficacy of the machine learning and the resulting SVM classifiers is not particularly dependent on which problems are being studied. The core of problems that all classifiers are able to classify correctly is small compared to the total number of problems looked at.

5.10 Conclusions

In all the results, the machine-learned heuristic selection process did better than any fixed heuristic. This demonstrates that machine learning is taking place and that there is useful information in the features being measured, which is a positive outcome of the experimental work. The H0 classifier, which essentially classifies problems as provable or too difficult, gave good results. This is in accord with the promising results obtained in the preliminary experiment. The H0 classifier may be used as a filter to reduce the total time spent on seeking proofs for a large set of problems, though doing so leads to a small reduction in the total number of proofs found. The dynamic and static feature sets gave very similar results. This implies that allowing the proof search to start and run for a short time does not lead to significantly more information being available. This must be a tentative conclusion. It may be that more

Classifier   Problem Files Correctly Classified
             By This and All Above Classifiers

Static Case
H0           2,488
H1           1,932
H2           1,700
H3           1,280
H4             987
H5             651

Dynamic Case
H0             514
H1             443
H2             435
H3             420
H4             419
H5             414

Combined Case
H0             389
H1             389
H2             389
H3             389
H4             389
H5             389

Table 5.21: Analysis of the number of files correctly classified by more than one heuristic classifier.

information could be obtained by a number of restarts with different heuristics being tried and compared, or with other features being measured. The following chapter on feature selection goes into much more detail, and the optimal feature subsets found contain a mixture of dynamic and static features. The combined feature set often gave worse results than either the dynamic or static feature sets separately. The chapter that follows gives details of extensive experiments to determine which features were significant and which were superfluous, or even detrimental. Optimal results are obtained, in fact, with only a few features being used, but it is important to have a large pool of both static and dynamic features from which to pick such an optimal subset. By comparison with the results obtained by always choosing the best heuristic, it was demonstrated that there is potential for further improvement in the machine-learned selection process without needing to extend the heuristic set from the five used. In other words, the use of a limited heuristic set of five did not limit the machine learning process.

Chapter 6

Feature selection

Results from the initial experiment and the follow-up heuristic selection experiments demonstrated two aspects of the feature sets used. First, there is redundancy amongst the features. Some, if not most, do not contribute to the machine learning process. Second, though modern machine learning techniques such as support vector machines are very tolerant of extra features, having too many features leads to worse results. This was supported by the finding that a combination of both dynamic and static features generally led to a less effective heuristic selection scheme than either feature set used on its own. This chapter describes a series of feature selection experiments. The first experiment was the automatic removal of features one at a time, in a manner similar to the manual approach used in the preliminary experiment. The results of the first experiment indicated that, as with the preliminary experiment, only a small number of features are needed for learning to be effective. By restricting learning sets to just a few features (three or fewer), it became feasible to look at all possible such subsets of the full feature set, and this was done as a follow-up experiment. Though it was not possible, even with a powerful workstation, to check all four feature subsets of the original fifty-three feature set, it was possible to check all four feature subsets of a reduced, thirty-five feature subset. The extension to four features did not improve on the learning results for the three feature subsets, so planned further extensions in feature numbers were not carried out, as it was deemed unlikely that they would lead to any further improvement. Finally, for the top three subsets the value of γ was varied over a wide range, allowing both an optimal value to be found and a clear indication of the upper and lower limits outside of which results drop significantly. Note that there are other techniques of feature selection, as described in chapter 2.
It was not necessary to employ additional methods in the present case because the number of features needed was shown to be small enough for all subsets to be tried, an approach which is guaranteed to find the best one.

6.1 Selectively removing features

As with the initial experiment, the first approach to determining the importance of features was to test the effect of removing the features individually. Whereas in the initial

experiment there were few enough features to do this manually, for the extended heuristic selection experiment there were 53 features, so software was written to automate the process. In addition, the experiment was both extended and made more sophisticated.

6.1.1 Ordering features for removal

In the initial experiment the feature ordering was done on the basis of removing each feature in numerical order and then replacing it before removing the next. For the heuristic selection classifiers the process was extended. As before, each feature was removed, a model was fitted to a learning data set and the classifier tested on a test data set, before the feature was restored. This was done for all the features, resulting in a score for each, where the score was the number of correct classifications made. Having obtained a score for all features in the above manner, the feature with the highest score was then permanently removed before the process was repeated on all features in the reduced feature set. The highest score is used as this corresponds to the removal of the least useful feature, whose absence has the least detrimental effect. If over-fitting is happening it may be that removal of a feature actually leads to a greater number of test conjectures being correctly classified. It should be noted that, given a large number of features which are not significant, many features will give identical scores, so an arbitrary decision needs to be taken as to which to remove first. The experiment was set up in two ways: in one, the first of the equally scoring features was removed first; in the second, the last of the equally scoring features was removed first. Though the order in which features of equal score are removed does not matter as far as the effect on the classifier is concerned, it does matter for the purpose of assigning a rank to each feature. By combining the results of the priority-first and priority-last experiments, any bias due to the numerical value (or label) of the feature should be removed. There were two useful outcomes to these experiments. First, by plotting score versus the number of features removed, a clear visual indication was given for each classifier as to how many features are actually needed for effective machine learning.
Second, each feature was ranked for each classifier and the rankings from all classifiers were then combined to give an overall score. The ranking score gives a clear indication of which features are important. Note that in the plots shown, the axis starts at 1500, rather than zero. Also, the plots show the number of features removed on the x-axis, so to assess the effect of having more features remaining they should be viewed from right to left. Figure 6.1 shows the results of progressive feature removal for the H0 classifier (i.e. the classification into the class of conjectures that may be proved by at least one heuristic and the class of conjectures that cannot be proved to be theorems by any of the five heuristics within the allowed time limit). It can be seen that removing the first few features actually improves the number of correct classifications obtained. After this there is little change until the last few features are removed, indicating that only these are significant. The last bar is for all features removed. This last case leads to a trivial classifier where everything is classified as negative: there are no false positives, only false negatives. It is included not as a useful case but to show the proportion of negative to positive samples and to give a baseline.

Figure 6.2 shows the results for the H1 classifier (i.e. the classification into the class of conjectures for which heuristic H1 is the best heuristic to use in the proof search and the class of conjectures for which this is not the case). Note that in this case the results for one feature remaining are the same as for zero features, i.e. a trivial classifier placing all samples in the negative class. The bars indicate that four features are of significance, with three of them being of more significance than the fourth. Figure 6.3 shows the results for the H2 classifier. This classifier, with the weighting parameter j fixed, is not well balanced: there are almost no false positives. The implication is that the classifier is placing almost every case in the negative class. Figure 6.4 shows the results for the H3 classifier. The number of false positives is again small. The number of correct classifications changes very little with the number of features, but the balance between false positives and false negatives varies much more: the number of false negatives drops while the number of false positives rises until there are only three features left. As the last few features are removed the number of false negatives rises and the number of false positives drops. Figure 6.5 shows the results for the H4 classifier. With one feature remaining, i.e. at the right hand side of the plot, the classifier is essentially trivial, producing the same results as zero features. Results improve for up to four features and then are flat. Figure 6.6 shows the results for the H5 classifier. In this case, beyond the first three features there is no change, i.e. there is no change until fifty features have been removed. As with the H4 case, having just one feature remaining leads to a trivial classifier giving the same results as the zero feature case. Figure 6.7 shows a plot of a score based on rank position for every feature, summed over all the heuristic classifier experiments.
The scores for the priority-first and priority-last results are both included in each case so as to remove bias. The results from this first run of the experiment do not show any particular features as being obviously outstanding. Examination of figures 6.1 to 6.6 shows that the performance in terms of correct classifications does not change on the removal of many of the intermediate features, implying that the feature removed is of equal importance to the one removed previously, yet the ranking score is different. This difference is arbitrary and clouds true differences in importance. The experiment was repeated, where in addition to a number of other improvements such as optimisation, the ranking score was only increased for any particular heuristic if the feature had done better than the previous one. In other words, in the ranking of features, features of equal worth were given equal rank. This repeated experiment is described, and results given, in the next section.

6.1.2 Including optimisation and other improvements

The first runs of the feature ranking experiment used a fixed command line which set parameters j to 2.0 and γ to 10.0. (Additionally, the parameter i was not set, to simplify the process.) As noted in the previous section, with the fixed command line several of the classifiers were unbalanced. That is, they classified too many samples as negative and far too few as positive. A second run of experiments was undertaken to correct this. Two improvements were implemented. First, j was not fixed; instead it was set on the basis of the number of negative and positive samples in the learning set. Secondly, an extra optimising loop was added to vary the value of γ. The optimisation was limited, as the inner loop involved the fitting of the support vector machine model as well as testing it on a large test file, both of which are relatively slow processes involving reading and writing files from disk. The optimisation was thus rather crude, involving picking the best of a few samples over a wide range of γ values. Figures 6.8 to 6.13 show the γ values obtained. The values were limited to the range 0.5 to 256, with the value doubled at each intermediate step. In many cases γ ends up at either extreme and only shows meaningful values for H0 and H1, where the number of features is down to three or four. A more useful and detailed optimisation of γ was performed for the reduced feature set and is described later in this chapter. Additionally, and importantly, the analysis of features on the basis of their ranking position for each heuristic gave equal scores to all features of equal worth. The results of the more sophisticated experiments are given in figures 6.14 to 6.26. For this run the feature scores for each heuristic separately are shown in figures 6.20 to 6.25, in addition to the combined score shown in figure 6.26. It can be seen that different features are prominent for different heuristic classifiers.
Comparing, for example, figure 6.16 to figure 6.3, it can be seen that the new results are better balanced in terms of false positives and false negatives, though there is still some asymmetry. Comparing figure 6.26 to figure 6.7, it can be seen that there is now a much clearer differential between features. Features 16 and 40 stand out; these correspond to dynamic feature numbers 2 and 26. Feature 2 is the sharing factor whilst feature 26 is the ratio of the number of generated literal clauses to the total number of processed clauses (see Appendix A). Before drawing too firm a conclusion from these results it is necessary to consider the results of the experiments described in the rest of this chapter. The scoring scheme used, based on rank order for individual classifications summed together, is perhaps too far removed from a direct measure of the ultimate outcome, which is the effectiveness or otherwise of the heuristic selection process. The subset experiments which follow used a direct measure of theorems proved and considered all cases of feature combinations up to three features. The key features from the subset experiments turned out to be different. (The other difference with the subset experiments was that the pathological examples, for which conjectures were apparently proved not to be theorems due to early saturation of the clause set, were removed from the data sets. There was only a very small number of these cases, too few to directly affect the results.)

6.2 Testing small and large feature subsets

The results of successive feature removal and the associated feature ordering indicated that the number of significant features is small. This opens up the possibility of exploring all feature subsets up to a size already determined as sufficient. For the total number of 53 features the number of subsets is given by 2^53 − 1 (the power set), which is approximately 10^16 and too many to investigate exhaustively. If the size of the subset is restricted to n then the number of subsets is

    (53 choose n) = 53! / ((53 − n)! n!)

For n = 4 this comes to 292,825, which is more manageable (though still a challenge, given that for each subset a support vector machine model must be fitted, preferably optimised, and then tested with a test data set). For n = 3 the number of subsets reduces to 23,426, which was found to be a practical limit. (A recently acquired dual Xeon workstation which could run sixteen threads in parallel enabled the experiment to be performed in a couple of days.)

6.2.1 Improving the data sets

As already noted, the learning and test sets used in the earlier experiments included a few pathological conjectures which were disproved by saturation of the clause set. (That is, a stage was reached where new inference steps would not produce any new nontrivial clauses, so there was no possibility of finding the empty clause; see chapter 2.) These cases are erroneous results, and though there were not enough of them to invalidate the experimental results, advantage was taken of a rerun of the experiments to remove them from the data set. Code was written to check the data set, remove the pathological cases where the theorem was disproved (generally in almost zero time), and then split the remaining data into a training set and a test set. It should be emphasised that this process did not involve any rerunning of the theorem prover or the corresponding timings. The data was unchanged; all that was altered was the removal of some invalid elements and a new division of the remainder into sets.

6.2.2 Enumerating the subsets

It is straightforward to calculate the number of, say, three feature subsets given fifty-three starting features. Some care, however, is required in coding to ensure that all unique subsets are examined and no duplicates (with differently ordered elements) are included. This was done by imposing an arbitrary ordering on the elements of each subset based on the feature number. Subsets were only included if the first element was less than the second element which in turn was less than the third. In coding terms this amounted to three nested loops with the inner loops beginning from the current value of their parent loop counter plus one.

6.2.3 Coding for parallel execution

Even restricting subsets to having no more than three features leads to a requirement to process approximately 25,000 cases, including the smaller one- and two-feature subsets as well. Even on a 2.93 GHz Xeon workstation, running one case took approximately 30 seconds. To run all cases in a serial fashion would have taken over 200 hours of CPU time. It was therefore worth coding the problem for parallel execution. Fortunately the nature of the problem lent itself very well to parallel execution. Each subset could be assessed without reference to any of the others. The disk input and output proved to be a relatively minor part of the overall task and so did not lead to a bottleneck. Dividing the work up among sixteen threads was done by assigning a sequence number to each subset and setting a start and stop index for each thread.

Each thread was run in a separate directory, writing to separate files which could be merged at the end. Each thread ran a separate copy of the same executable code, which was written to accept a thread number and the total number of threads as command line parameters. The code was also written to be easy to restart should the long run be interrupted: the disk output file was opened and then closed for each output so as to leave files in a stable state, and the use of subset sequence numbers allowed easy checking of completion.

6.2.4 Looking at large subsets

Just as the total number of subsets to be examined can be restricted by looking only at subsets with a few elements, the same effect can be obtained by looking only at large subsets, that is, subsets from which only a few elements of the total set are missing. In parallel with the small subsets, large subset results were obtained, but these are not reported here as they are essentially a mirror image of the small subset results with increased noise. The reason for this is that, given only a few features are significant, removing these features has a negative effect similar to the positive effect of selecting them as the elements of a small subset, but the effect is masked to some extent by the cumulative small contributions of the large number of less significant elements present. For example, if the small subset containing feature numbers 7 and 53 gives the best performance, then correspondingly the large subset consisting of all features but 7 and 53 should have the worst performance; each provides the same information. But the worst large subset is less well differentiated from the next worst than the best small subset is from the next best.

6.2.5 Analysis of three feature subset results

To be tractable given the large number of cases to be considered, the parameter γ was fixed at 10 rather than optimised. A later experiment varied γ over a large range for the optimal small subset of features (see figure 6.35), and this confirmed that the value of 10 is within a plateau of good, but not quite optimal, values. As with earlier experiments, the kernel of the support vector machine was limited to the radial basis function. For each possible one-, two- and three-feature subset, all five heuristic classifications were carried out, together with the heuristic zero classification, followed by using the resultant models for automatic heuristic selection. The results were sorted lexicographically, first according to the total number of theorems proved and second according to the total time taken. Figure 6.27 shows a plot of the total number of theorems proved versus set rank (that is, the position of the set after ordering). The plot shows the results for all subsets, and it can be seen that there is a sharp initial drop after the first few subsets and then a much more gentle decrease before another sharp drop for the lowest ranked subsets. Figure 6.28 shows a more detailed plot of the highest ranked 500 subsets. It is of interest to score individual features by their occurrence in higher ranked subsets. This is partly needed in order to select a reduced set of features to consider when investigating four feature subsets and partly to determine which features are of importance.

The scoring scheme is partly dependent on the choice of the number of subsets considered significant. The first stage of obtaining feature scores was to sort the subset results lexicographically using a spreadsheet. Software was then written in C to read the sorted file and apply a cumulative score to each feature appearing in the top ranked subsets. The number of subsets considered significant was left as a parameter to be entered. If n subsets are considered significant then all members of the top subset would have n added to their individual scores, n − 1 would be added to the scores of all elements of the second highest placed subset and so on. For example, if the first 100 subsets are considered significant and the top subset contains features 23, 45 and 48 then feature 23 would have 100 added to its score and similarly for features 45 and 48. If the next ranked subset contained features 36 and 45 then feature 36 would have 99 added to its score and feature 45 would have a further 99 added to its score making it 199. The features could then be ranked according to their resultant scores. Figure 6.29 shows a plot of score versus feature number for the case of 100 significant subsets. Comparing this with figure 6.26 shows that prominent feature numbers are now quite different. The most significant features are 7, 17 and 52 and, though it is not directly shown in the plot, the best scoring subset consisted of just features 7 and 52 combined. Figure 6.30 shows the equivalent plot where the number of significant subsets has been extended to 3000 (note that in this case the plot is against feature rank rather than feature number). Similarly figure 6.31 shows the effect of extending the number of significant subsets to 5000. The effect of increasing the number of significant subsets is to reduce the differential between good and bad features. 
The reason is that there are a large number of middling subsets for which many features produce similar results; increasing the number of significant subsets gives more weight to these middling ones.

6.2.6 Partial extension to four feature subsets

To make it feasible to examine four feature subsets the total set of features needed to be reduced. A calculation showed that the number of features had to be cut from 53 to around 33 or 34. The problem is determining which features to keep and which to exclude. The choice was made using the feature scoring method described in the previous section. The number of significant subsets was made large (five thousand), and the resultant scores plotted against feature rank as shown in figure 6.31. Note that feature rank is not the same as feature number. It can be seen that there is a step change in score after 27 features and again between 33 and 35 features, so by taking the best 34 features all useful features should be covered. Using the reduced feature set, all four feature subsets were tested and the results sorted in the same manner as the smaller subsets. It was found that no four feature subset gave better results than the best three feature subsets. From this result it was concluded that three features are sufficient. This conclusion is pragmatic. It may be the case that there is some combination of more than four features that leads to optimal results. But it can be assumed that this is unlikely, as the three

Heuristic                    Number of         Total Time Taken
Number                       Theorems Proved   in Seconds

1                            1,514             162,029
2                            1,352             177,530
3                            1,424             168,593
4                            1,421             169,598
5                            1,339             176,959

Learned Heuristic Choice     1,602             149,323
(Best Subset, γ = 48.05,
includes H0 filtering)

Table 6.1: Results for fixed heuristics.

feature subsets do better than the combination of all features and the large subsets of almost all features, so any optimal subset of more than three features would also need to be negatively affected by the addition of further features. In other words, a plot of greatest number of theorems proved versus number of features in the subset would need to have more than one peak, which seems more improbable than a single maximum around three features. Philosophical questions aside, it was deemed not worth continuing with the planned extension of subset size combined with the total feature reduction required to keep the process tractable. (To extend the subset size to five features requires a reduction in total feature number to around 22 to 23; a very small increase in subset size requires a large decrease in the total number of features considered.)

6.2.7 Results for fixed heuristics

As with earlier experiments, the benchmark for determining the effectiveness of the machine-learned automatic heuristic selection process was the number of theorems proved and the total time taken by each of the heuristics individually. The test data set was different for this experiment, so the results for the fixed heuristics were recalculated; they are shown in table 6.1. Table 6.1 also shows the results for the best subset after optimising the value of the parameter γ, which is discussed in the next section. Table 6.2 compares the results for the full set of 53 features and the best subset of just 2 features under the same conditions, with γ fixed at 10. In both cases H0 filtering is on, that is the H0 classifier is used to reject some conjectures without spending time seeking a proof (as discussed in the previous chapter). In addition the results for the full feature set without H0 filtering are shown (note the increased number of theorems proved at the expense of a longer total time taken). The presence of H0 filtering makes the direct comparison less clear. The small subset proves more theorems but takes more time, which indicates that the full feature set gives too much weight to H0 (too many conjectures are rejected prematurely). But even with no H0 filtering the full feature set proves fewer theorems than the small feature set with H0 filtering, so the small, two-feature subset produces clearly better results

Results with γ fixed at 10            Number of         Total Time Taken
                                      Theorems Proved   in Seconds

Full Feature Set (H0 Filtering)       1,479             117,070
Full Feature Set (No H0 Filtering)    1,541             159,284
Best Subset (H0 Filtering)            1,589             152,867

Table 6.2: Results for 53 features and the best subset of just 2 features.

than using the full set of 53 features. Note also that without H0 filtering the full feature set does better than any of the fixed heuristics, which is consistent with the earlier results. In summary, learning with just two features leads to an improvement over any individual heuristic in both number of theorems proved and total time taken, especially when γ is increased to 48. Additionally, the two feature results are more balanced than those of the full feature set, avoiding too much emphasis on H0 classifications and, most importantly, leading to a greater number of theorems proved.

6.2.8 Varying gamma for the best subsets

For radial basis function kernels there is a single parameter, γ, to be entered. Burges [13] discusses the effect of various values of γ, which at either extreme can lead to under-fitting or over-fitting. In some of the earlier experiments γ had been optimised, but computer resources had been limited and the steps in γ used were large and grew geometrically (i.e. γ was varied by doubling or halving rather than adding or subtracting a fixed step size). Additionally this γ optimisation had been done as part of the feature selection, with the evaluation function for the optimisation expressed in terms of single classification results. The optimisation had to be restricted as it was carried out on every feature value sample, i.e. in the inner loop of the feature selection process. By only looking at the top few subsets, and with the availability of a much more powerful computer, it was possible to vary γ over a wide range in very small steps and to see its effect on the whole heuristic selection process rather than just one heuristic classification result. In the first stage of the experiment γ was varied from 0.01 to 50 for the top three feature subsets. The results are shown in figures 6.32 to 6.34. Looking at the plots, two things were apparent. First, the results for the three subsets were very similar; second, stopping at 50 was premature (the curves look as though they are about to drop but have not yet done so). There is also an anomalous result at a γ value of approximately 22. This was not investigated in detail, as to do so would require a detailed analysis of the code for SVMLight, how it interacts with the data, and how starting points affect the built-in optimisation algorithm used. The experiment was repeated, taking the value of γ up to 100, but only for the top feature subset. The results are shown in figure 6.35. A number of conclusions may be drawn.
First, the fixed value of γ of 10, whilst not optimal, is within a plateau of good results. Second, values of γ below about 8 or 9 show a drop-off due to under-fitting (the radius of each centre is too large, leading to a lack of differentiation; Burges [13]). Third, γ values above about 60 lead to over-fitting (models are tied too closely to individual samples within the training set, leading to poor generalisation to the test set). The optimal value of γ is just below 50 and gives results where over 1,600 theorems are proved.

6.2.9 The best two features (7 and 52)

The optimal subset contains just two features, 7 and 52. The first is a static feature and the second is a dynamic feature. There was no bias in selecting these features during the experiments, as the relationship between feature number and what the feature measured was not considered; that is, though the description of each feature was recorded, the records were not consulted during the experimental work. It turns out that features 7 and 52 (which is dynamic feature number 38) are the same measure, the proportion of clauses containing purely negative literals: 7 is the proportion amongst the original axioms and 52 is that amongst the unprocessed clauses after running the proof search engine for a fixed number of clause selections. Figures 6.36 to 6.47 show plots of each positive case for all heuristics, both learning and test sets, in terms of the two features 7 and 52. It might be assumed that because these two features represent the same measure they would be simply related to each other, or even equal. But, as shown in figures 6.36 to 6.47, there is a fair degree of scatter when one is plotted against the other for all the samples in the data set.

6.3 Small subset results without H0

For completeness, the small subset experiment was repeated without H0 filtering being applied. That is, the heuristic selection process always selects a heuristic and attempts a proof. This leads to an increase in theorems proved at the expense of an increase in time taken.

6.3.1 Results without H0

Figure 6.48 shows the number of theorems proved for the different subsets plotted in rank order. Comparing this with figure 6.27, there is a small increase in the maximum number of theorems proved at the peak, which is shown in more detail in figure 6.28. Beyond the best few subsets, the curve is much flatter than that with H0 filtering included. This is to be expected, as even when the optimal heuristic is not selected a proof will, in many cases, still be found within the allowed time.

6.3.2 Feature scores without H0

A notable difference between the subset results with H0 filtering and those without lies in the subsets that rank first and, more generally, in the feature scores. The feature scores based on the 100 top subsets are shown in figure 6.50, which should be compared with figure 6.29. From the detail shown in figure 6.49 there is a drop in the number of theorems proved after the top four subsets. Figure 6.51 shows the feature scores based just on these top four subsets. It can be seen that feature numbers 10, 14 and 15 are now most significant; these are also the members of the top subset. It should also be noted that features 7 and 52 still make an appearance.

6.3.3 The best three features without H0 (10, 14 and 15)

The top subset for the case without H0 filtering contains features 10, 14 and 15. Features 10 and 14 are static features and feature 15 is the first dynamic feature (see appendix A). Feature 10 is the average clause length and feature 14 is the average clause weight. Feature 15 is the proportion of generated clauses that have been kept at the point in the proof search at which the dynamic features are measured.

6.4 Random heuristic selection

In this dissertation so far the results from machine learning have been compared with those obtained from fixed heuristics. If heuristic selection can do better than the best results from fixed heuristics then the selection must be intelligent rather than random. The underlying assumption is that random selection of heuristics will not do better than the best fixed heuristic. Strictly speaking, a random selection of heuristics may, by chance, pick an ideal set of heuristics and produce the best results possible, but the probability of this happening is vanishingly small. To demonstrate that random heuristic selection produces worse results, simulations were run to determine the statistical distributions. In these simulations heuristics were selected at random, with the random weighting set on the basis of the proportion of the learning set for which that heuristic was best. (In the case of H0, the weighting was based on the proportion of conjectures in the learning set that were unproven.) Simulations were run for two cases, one with H0 filtering and one without. The standard C library function rand() was used and 1,000,000 trials performed to get smooth distributions.¹ Figures 6.52 and 6.53 show the distributions for the case with H0 filtering. Figures 6.54 and 6.55 show the case without H0 filtering. Normal distributions with the same means and standard deviations are plotted alongside the simulated results. The distributions are close to normal but slightly skewed. Table 6.3 summarises the results in terms of means and standard deviations.

¹Knuth [40] devotes a whole chapter of his Art of Computer Programming to random numbers. This experiment did not attempt such sophistication and it may well be that 1,000,000 samples exceeds the length of the pseudo-random sequence associated with the library rand() routine, but this should not materially affect the simulation results.

Random Heuristic   Mean Number   Standard    Mean Time Taken   Standard
Selection          Proved        Deviation   (seconds)         Deviation

H0 Filtering       827           20          98,145            2,116
No H0 Filtering    1,427         12          169,337           1,126

Table 6.3: Random selection results.

The simulated results for random heuristic selection are significantly worse than the results for the best fixed heuristic, which in turn are worse than the results for the learned heuristic selection process. The difference is many times the standard deviation, so the improvement due to the selection process is not down to random chance.

6.5 Summary and conclusions

Though a large number of features were examined, it turns out that only two or three are required for effective learning in the context of heuristic selection. In fact the best subset, for the case where H0 filtering is applied, contains just two features, which correspond to static and dynamic versions of the same measure: the proportion of clauses that contain only negative literals. That effective learning occurs with just two features (static and dynamic aspects of a single measure) could only be discovered by carrying out the learning process. This is in contrast to the approach used in the existing auto mode of E, where a range of binary features were used to split trial conjectures from the TPTP library into classes prior to heuristics being tested on each class. This is discussed more fully in the next (conclusions) chapter of this dissertation. These results were obtained when considering the whole heuristic selection process rather than individual classification experiments for each heuristic separately, and the optimal features differ from those obtained in the cruder earlier experiments which were restricted to separate classification results. The optimal features also differ where H0 filtering is not applied, though the two features from the best subset with H0 filtering also appear among the best subsets without H0 filtering. This variation in which features appear most important indicates that the selection of features should be tailored to the problem. The results were obtained with the parameter γ fixed at 10, but studying the effect of varying γ shows that 10 is a reasonable value, though the very best results are obtained with γ set to just below 50 (there is a plateau of reasonable γ values stretching from around 9 or 10, large enough to avoid under-fitting, up to around 60, where over-fitting effects begin).
The fact that the two different sets of experiments produced a different overall ranking of features was a surprising result. Given that the overall purpose of the machine learning is to select heuristics, the second set of experiments, in which the score was based on the end result of theorems proved and time taken, provides the better data. But with the second experiment there was a difference in feature ranking between the case with H0 filtering and that without. This raises questions which, given extended resources of both research and computer time, would be interesting to answer. In particular, should the feature set used be tailored to each heuristic separately? To some extent this was looked at in the first set of experiments, but there the results were combined and the measure used was not ideal (the proportion of correct classifications; the margin value itself is also of importance, as it is this that decides the choice between heuristics). To answer the question definitively, the measure of goodness would need to be the total number of theorems proved (and time taken), whilst each machine-learned classifier would be allowed to be based on a different three feature subset. Unfortunately a brute force approach to this would require looking at not just 25,000 cases but (25,000)^5 cases! It would make sense for different features to matter for different heuristics. Given that each heuristic works best on different conjectures, the purpose of the features is to define the right sort of conjectures for the heuristic, and such measures may well be conjecture type dependent.

Score False Negatives False Positives

3300

3100

2900

2700

2500

2300

2100

1900

1700

1500 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 Number of Features Removed Figure 6.1: Effect of removing features from H0 classifier

Score False Negatives False Positives

3300

3100

2900

2700

2500

2300

2100

1900

1700

1500 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 Number of Features Removed

Figure 6.2: Effect of removing features from H1 classifier CHAPTER 6. FEATURE SELECTION 111

Score False Negatives False Positives

3300

3100

2900

2700

2500

2300

2100

1900

1700

1500 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 Number of Features Removed Figure 6.3: Effect of removing features from H2 classifier

Score False Negatives False Positives

3300

3100

2900

2700

2500

2300

2100

1900

1700

1500 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 Number of Features Removed

Figure 6.4: Effect of removing features from H3 classifier 112 6.5. SUMMARY AND CONCLUSIONS

Score False Negatives False Positives

3300

3100

2900

2700

2500

2300

2100

1900

1700

1500 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 Number of Features Removed

Figure 6.5: Effect of removing features from H4 classifier

Score False Negatives False Positives

3300

3100

2900

2700

2500

2300

2100

1900

1700

1500 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 Number of Features Removed

Figure 6.6: Effect of removing features from H5 classifier CHAPTER 6. FEATURE SELECTION 113

Sum of Rank Scores for All Heuristics 600

500

400

300

200

100

0 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 Feature Number

Figure 6.7: Scores for each feature summed over all heuristic classifiers 114 6.5. SUMMARY AND CONCLUSIONS

Gamma values for H0

18

16

14

12 e u l 10 a v

a m

m 8 a G

6

4

2

0 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 Number of Features Removed

Figure 6.8: Gamma values for H0 on second run

Gamma values for H1

300

250

200 e u l a v

a 150 m m a G

100

50

0 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 Number of Features Removed

Figure 6.9: Gamma values for H1 on second run CHAPTER 6. FEATURE SELECTION 115

Gamma values for H2

300

250

200 e u l a v

a 150 m m a G

100

50

0 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 Number of Features Removed

Figure 6.10: Gamma values for H2 on second run

Gamma values for H3

300

250

200 e u l a v

a 150 m m a G

100

50

0 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 Number of Features Removed

Figure 6.11: Gamma values for H3 on second run 116 6.5. SUMMARY AND CONCLUSIONS

Gamma values for H4

300

250

200 e u l a v

a 150 m m a G

100

50

0 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 Number of Features Removed

Figure 6.12: Gamma values for H4 on second run

Gamma values for H5

300

250

200 e u l a v

a 150 m m a G

100

50

0 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 Number of Features Removed

Figure 6.13: Gamma values for H5 on second run CHAPTER 6. FEATURE SELECTION 117

Score False Negatives False Positives

3300

3100

2900

2700

2500

2300

2100

1900

1700

1500 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 Number of Features Removed

Figure 6.14: Effect of removing features from H0 classifier on second run

Score False Negatives False Positives

3300

3100

2900

2700

2500

2300

2100

1900

1700

1500 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 Number of Features Removed Figure 6.15: Effect of removing features from H1 classifier on second run 118 6.5. SUMMARY AND CONCLUSIONS

Score False Negatives False Positives

3300

3100

2900

2700

2500

2300

2100

1900

1700

1500 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 Number of Features Removed Figure 6.16: Effect of removing features from H2 classifier on second run

Score False Negatives False Positives

3300

3100

2900

2700

2500

2300

2100

1900

1700

1500 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 Number of Features Removed Figure 6.17: Effect of removing features from H3 classifier on second run CHAPTER 6. FEATURE SELECTION 119

Score False Negatives False Positives

3300

3100

2900

2700

2500

2300

2100

1900

1700

1500 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 Number of Features Removed Figure 6.18: Effect of removing features from H4 classifier on second run

Score False Negatives False Positives

3300

3100

2900

2700

2500

2300

2100

1900

1700

1500 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 Number of Features Removed Figure 6.19: Effect of removing features from H5 classifier on second run 120 6.5. SUMMARY AND CONCLUSIONS

H0 Feature Scores 2nd Run

1400

1200

1000 e r 800 o c S

e r u t a

e 600 F

400

200

0 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 Feature Number

Figure 6.20: Feature scores for H0 on second run

H1 Feature Scores 2nd Run

600

500

400 e r o c S

e 300 r u t a e F

200

100

0 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 Feature Number

Figure 6.21: Feature scores for H1 on second run CHAPTER 6. FEATURE SELECTION 121

H2 Feature Scores 2nd Run

400

350

300

250 e r o c S

e 200 r u t a e F 150

100

50

0 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 Feature Number

Figure 6.22: Feature scores for H2 on second run

H3 Feature Scores 2nd Run

1600

1400

1200

1000 e r o c S

e 800 r u t a e F 600

400

200

0 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 Feature Number

Figure 6.23: Feature scores for H3 on second run 122 6.5. SUMMARY AND CONCLUSIONS

H4 Feature Scores 2nd Run

2500

2000

1500 e r o c S

e r u t a e

F 1000

500

0 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 Feature Number

Figure 6.24: Feature scores for H4 on second run

H5 Feature Scores 2nd Run

1400

1200

1000 e r 800 o c S

e r u t a

e 600 F

400

200

0 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 Feature Number

Figure 6.25: Feature scores for H5 on second run CHAPTER 6. FEATURE SELECTION 123

Sum of Rank Scores for All Heuristics 5000

4500

4000

3500

3000

2500

2000

1500

1000

500

0 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 Feature Number

Figure 6.26: Scores for each feature summed over all heuristic classifiers for second run 124 6.5. SUMMARY AND CONCLUSIONS

Number of Theorems Proved for Small Subsets (3 or Fewer Features)

1800

1600

1400

1200 d e v o r 1000 P s m e r 800 o e h T 600

400

200

0 0 5000 10000 15000 20000 25000 Subset Rank Figure 6.27: Theorems proved vs subset rank for small subsets of up to three features

Theorems Proved for Small Subsets (3 or fewer Features) (Detail)

1590

1580

1570 d e v 1560 o r P s m e r o 1550 e h T

1540

1530

1520 0 50 100 150 200 250 300 350 400 450 500 Subset Rank

Figure 6.28: Theorems proved vs subset rank for small subsets of up to three features CHAPTER 6. FEATURE SELECTION 125

Feature Scores from top 100 Subsets

3000

2500

2000 e r o c S

e 1500 r u t a e F

1000

500

0 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 Feature Number

Figure 6.29: Scores for each feature based on 100 best subsets

Feature Scores for top 3000 Small Subsets

600000

500000

400000 e r o c S

e 300000 r u t a e F

200000

100000

0 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 Feature Rank

Figure 6.30: Scores vs feature rank (not number) based on 3000 best subsets 126 6.5. SUMMARY AND CONCLUSIONS

Feature Scores 5000 subsets

1400000

1200000

1000000

800000 e r o c S 600000

400000

200000

0 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 Position of Feature

Figure 6.31: Scores vs feature rank (not number) based on 5000 best subsets CHAPTER 6. FEATURE SELECTION 127

Theorems Proved for Set 1

1620

1600

1580 d e v 1560 o r P s m e r o 1540 e h T

1520

1500

1480 0 5 10 15 20 25 30 35 40 45 50 Gamma Value

Figure 6.32: Theorems proved versus gamma value for best feature subset

Number of Theorems Proved (Set 2)

1620

1600

1580 d e v o r P s 1560 m e r o e h T f

o 1540 r e b m u N 1520

1500

1480 0 5 10 15 20 25 30 35 40 45 50 Gamma Value

Figure 6.33: Theorems proved versus gamma value for second best feature subset 128 6.5. SUMMARY AND CONCLUSIONS

Number of Theorems Proved (Set 3)

1620

1600

1580 d e v o r P s 1560 m e r o e h T f

o 1540 r e b m u N 1520

1500

1480 0 5 10 15 20 25 30 35 40 45 50 Gamma Value

Figure 6.34: Theorems proved versus gamma value for third best feature subset

Number of Theorems Proved (Set 1)

1620

1600

1580 d e v o r 1560 P s m e r o e 1540 h T f o r e

b 1520 m u N

1500

1480

1460 0 10 20 30 40 50 60 70 80 90 100 Gamma Value

Figure 6.35: Theorems proved versus gamma value extended to 100 for best feature subset CHAPTER 6. FEATURE SELECTION 129

Feature 52 vs Feature 7 for H0 positive samples

1

0.9 ) c i m

a 0.8 n y D (

U 0.7 n i s e s 0.6 u a l c

e H0 Learning Set v

- 0.5

f H0 Test Set o n o i t 0.4 r o p o r p 0.3

2 5 e r u

t 0.2 a e F 0.1

0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Feature 7 proportion of -ve axioms (Static)

Figure 6.36: Feature 52 (Dynamic Feature 38) versus Feature 7 for H0

Feature 52 vs Feature 7 for H0 positive samples

0.2

0.18 ) c i m

a 0.16 n y D (

U 0.14 n i s e s 0.12 u a l c

e H0 Learning Set v

- 0.1

f H0 Test Set o n o i t 0.08 r o p o r p 0.06

2 5 e r u

t 0.04 a e F 0.02

0 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2 Feature 7 proportion of -ve axioms (Static)

Figure 6.37: Detail of Feature 52 (Dynamic Feature 38) versus Feature 7 for H0 130 6.5. SUMMARY AND CONCLUSIONS

Feature 52 vs Feature 7 for H1 positive samples

1

0.9 ) c i m

a 0.8 n y D (

U 0.7 n i s e s 0.6 u a l c

e H1 Learning Set v

- 0.5

f H1 Test Set o n o i t 0.4 r o p o r p 0.3

2 5 e r u

t 0.2 a e F 0.1

0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Feature 7 proportion of -ve axioms (Static)

Figure 6.38: Feature 52 (Dynamic Feature 38) versus Feature 7 for H1

Feature 52 vs Feature 7 for H1 positive samples

0.3 ) c i 0.25 m a n y D (

U n i 0.2 s e s u a l c

e H1 Learning Set v

- 0.15

f H1 Test Set o n o i t r o p

o 0.1 r p

2 5 e r u t a

e 0.05 F

0 0 0.05 0.1 0.15 0.2 0.25 0.3 Feature 7 proportion of -ve axioms (Static)

Figure 6.39: Detail of Feature 52 (Dynamic Feature 38) versus Feature 7 for H1 CHAPTER 6. FEATURE SELECTION 131

Feature 52 vs Feature 7 for H2 positive samples

1

0.9 ) c i m

a 0.8 n y D (

U 0.7 n i s e s 0.6 u a l c

e H2 Learning Set v

- 0.5

f H2 Test Set o n o i t 0.4 r o p o r p 0.3

2 5 e r u

t 0.2 a e F 0.1

0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Feature 7 proportion of -ve axioms (Static)

Figure 6.40: Feature 52 (Dynamic Feature 38) versus Feature 7 for H2

Feature 52 vs Feature 7 for H2 positive samples

0.2

0.18 ) c i m

a 0.16 n y D (

U 0.14 n i s e s 0.12 u a l c

e H2 Learning Set v

- 0.1

f H2 Test Set o n o i t 0.08 r o p o r p 0.06

2 5 e r u

t 0.04 a e F 0.02

0 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2 Feature 7 proportion of -ve axioms (Static)

Figure 6.41: Detail of Feature 52 (Dynamic Feature 38) versus Feature 7 for H2 132 6.5. SUMMARY AND CONCLUSIONS

Feature 52 vs Feature 7 for H3 positive samples

1

0.9 ) c i m

a 0.8 n y D (

U 0.7 n i s e s 0.6 u a l c

e H3 Learning Set v

- 0.5

f H3 Test Set o n o i t 0.4 r o p o r p 0.3

2 5 e r u

t 0.2 a e F 0.1

0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Feature 7 proportion of -ve axioms (Static)

Figure 6.42: Feature 52 (Dynamic Feature 38) versus Feature 7 for H3

Feature 52 vs Feature 7 for H3 positive samples

0.2

0.18 ) c i m

a 0.16 n y D (

U 0.14 n i s e s 0.12 u a l c

e H3 Learning Set v

- 0.1

f H3 Test Set o n o i t 0.08 r o p o r p 0.06

2 5 e r u

t 0.04 a e F 0.02

0 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2 Feature 7 proportion of -ve axioms (Static)

Figure 6.43: Detail of Feature 52 (Dynamic Feature 38) versus Feature 7 for H3 CHAPTER 6. FEATURE SELECTION 133

Feature 52 vs Feature 7 for H4 positive samples

1

0.9 ) c i m

a 0.8 n y D (

U 0.7 n i s e s 0.6 u a l c

e H4 Learning Set v

- 0.5

f H4 Test Set o n o i t 0.4 r o p o r p 0.3

2 5 e r u

t 0.2 a e F 0.1

0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Feature 7 proportion of -ve axioms (Static)

Figure 6.44: Feature 52 (Dynamic Feature 38) versus Feature 7 for H4

Feature 52 vs Feature 7 for H4 positive samples

0.3 ) c i 0.25 m a n y D (

U n i 0.2 s e s u a l c

e H4 Learning Set v

- 0.15

f H4 Test Set o n o i t r o p

o 0.1 r p

2 5 e r u t a

e 0.05 F

0 0 0.05 0.1 0.15 0.2 0.25 0.3 Feature 7 proportion of -ve axioms (Static)

Figure 6.45: Detail of Feature 52 (Dynamic Feature 38) versus Feature 7 for H4 134 6.5. SUMMARY AND CONCLUSIONS

Feature 52 vs Feature 7 for H5 positive samples

1

0.9 ) c i m

a 0.8 n y D (

U 0.7 n i s e s 0.6 u a l c

e H5 Learning Set v

- 0.5

f H5 Test Set o n o i t 0.4 r o p o r p 0.3

2 5 e r u

t 0.2 a e F 0.1

0 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Feature 7 proportion of -ve axioms (Static)

Figure 6.46: Feature 52 (Dynamic Feature 38) versus Feature 7 for H5

Feature 52 vs Feature 7 for H5 positive samples

0.2

0.18 ) c i m

a 0.16 n y D (

U 0.14 n i s e s 0.12 u a l c

e H5 Learning Set v

- 0.1

f H5 Test Set o n o i t 0.08 r o p o r p 0.06

2 5 e r u

t 0.04 a e F 0.02

0 0 0.02 0.04 0.06 0.08 0.1 0.12 0.14 0.16 0.18 0.2 Feature 7 proportion of -ve axioms (Static)

Figure 6.47: Detail of Feature 52 (Dynamic Feature 38) versus Feature 7 for H5 CHAPTER 6. FEATURE SELECTION 135

Theorems Proved for Small Subsets - No H0 Filtering

1800

1600

1400

1200 d e v o r 1000 P s m e r 800 o e h T 600

400

200

0 1 2001 4001 6001 8001 10001 12001 14001 16001 18001 20001 22001 24001 Subset Rank

Figure 6.48: Theorems proved vs subset rank for small subsets without H0 filtering

Theorems Proved for Small Subsets - No H0 Filtering - Detail

1615

1610

1605

1600 d e v o r P

s 1595 m e r o e h

T 1590

1585

1580

1575 1 5 9 13 17 21 25 29 33 37 41 45 49 53 57 61 65 69 73 77 81 85 89 93 97 101 105 Subset Rank

Figure 6.49: Theorems proved vs subset rank for small subsets without H0 filtering (De- tail) 136 6.5. SUMMARY AND CONCLUSIONS

[Plot "Feature Scores from Top 100 Subsets Without H0"; x-axis: Feature Number, 1 to 53; y-axis: Feature Score, 0 to 1,400.]

Figure 6.50: Scores for each feature based on 100 best subsets with no H0 filtering

[Plot "Feature Scores from top 4 Subsets"; x-axis: Feature Number, 1 to 53; y-axis: Feature Score, 0 to 9.]

Figure 6.51: Scores for each feature based on top 4 subsets only with no H0 filtering

[Plot "Random Heuristic Selection with H0, Theorems Proved"; x-axis: Theorems Proved, 750 to 910; y-axis: Probability Distribution; series: Simulation Results, Normal Distribution.]

Figure 6.52: Probability distribution for number of theorems proved with H0 filtering

[Plot "Random Heuristic Selection with H0, Total Time Taken"; x-axis: Total Time Taken (seconds), 90,000 to 108,000; y-axis: Probability Distribution; series: Simulation with H0, Normal Distribution.]

Figure 6.53: Probability distribution for total time taken with H0 filtering

[Plot "Random Heuristic Selection Without H0, Theorems Proved"; x-axis: Theorems Proved, 1,360 to 1,500; y-axis: Probability Distribution; series: Simulation without H0, Normal Distribution.]

Figure 6.54: Probability distribution for number of theorems proved with no H0 filtering

[Plot "Random Heuristic Selection Without H0, Total Time Taken"; x-axis: Total Time Taken (seconds), 162,000 to 176,000; y-axis: Probability Distribution; series: Simulation without H0, Normal Distribution.]

Figure 6.55: Probability distribution for total time taken with no H0 filtering

Chapter 7

Conclusions

Theorem provers based on first order logic with equality should be capable of automatic operation, but to be effective different heuristics need to be used for different problems. The choice of the best heuristic to use often requires the intervention of a human expert. The work described in this dissertation has shown that machine learning based on simple features measured on conjectures and associated axioms is effective in determining a good choice of heuristic to use in the proof search. This was demonstrated by the learned heuristic selection routine doing better than any single heuristic, and doing much better than random heuristic selection.

The work was generic: no significance was attached to function and predicate symbols. Even so, there is still a large number of features that may be measured, and fifty-three were investigated. It was found that effective learning required a combination of only a very few features. In fact, for the case which included H0 filtering, using just two features gave optimal results.

The remainder of this chapter covers related work, summarises what was learned about applying machine learning to a theorem prover, reaches some conclusions, and suggests where the work could be taken further.

7.1 Related work

Other work in the application of machine learning to theorem proving has concentrated on learning heuristics rather than learning how to choose between established heuristics. There are a number of drawbacks to learning heuristics and success has been limited; see chapter 2 of this dissertation for more details.

Additionally, the E theorem prover (Schulz [78]) has an auto mode for heuristic selection based on features of the conjecture to be proved. The method used is not based on machine learning in the normal meaning of the term. A set of binary or ternary feature values is used to divide conjectures into classes. Schulz applied this classification to the TPTP library [85] and then tested over one hundred heuristics to find which was best for each class as a whole. For new conjectures the auto mode uses the features to determine the class and then uses the previously stored "best" heuristic for that class. The choice of

features to use was based on Schulz's experience, with the constraint that the features can only take one of two or three discrete values.

The work described in this dissertation is very different to that behind the E auto mode. Rather than assume which features are important, machine learning and systematic feature selection methods were used to determine which were optimal. There was no constraint on the type of feature to use, and the support vector machine classifiers are able to model a much more sophisticated relationship between feature values and heuristic choice. A direct comparison in terms of performance is not useful as the E auto mode is tailored to the TPTP library and uses over a hundred heuristics, while the work described in this dissertation was limited to five heuristics.
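The table-driven selection described above can be sketched as follows. This is a minimal illustration of the general idea, not E's actual implementation: the feature names, thresholds, class tuples and heuristic labels below are all invented for the example.

```python
# Sketch of table-driven heuristic selection in the style of E's auto mode.
# Feature values are discretised to two or three levels; the tuple of levels
# names a class, and each class has a pre-computed "best" heuristic.
# All feature names, thresholds and heuristic labels here are illustrative.

def classify(conjecture):
    """Map a conjecture's raw measurements to a tuple of discrete levels."""
    size = "few" if conjecture["num_axioms"] <= 20 else "many"
    equality = "eq" if conjecture["equality_literals"] > 0 else "noeq"
    horn = "horn" if conjecture["all_horn"] else "nonhorn"
    return (size, equality, horn)

# Best heuristic per class, as determined offline on a benchmark library.
BEST_HEURISTIC = {
    ("few", "eq", "horn"): "H1",
    ("few", "noeq", "horn"): "H3",
    ("many", "eq", "nonhorn"): "H4",
}
DEFAULT = "H1"  # fallback for classes never seen during tuning

def select_heuristic(conjecture):
    return BEST_HEURISTIC.get(classify(conjecture), DEFAULT)

problem = {"num_axioms": 12, "equality_literals": 3, "all_horn": True}
print(select_heuristic(problem))  # class ("few", "eq", "horn") -> "H1"
```

The contrast with the learned approach is that here both the discretisation and the class-to-heuristic table are fixed by hand from offline experiments, whereas the SVM classifiers work directly on continuous feature values.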

7.2 What was learned about machine learning

The field of machine learning is extensive, and no attempt was made in the work described in this dissertation to be comprehensive in the investigation of the efficacy of different approaches when applied to theorem provers. In particular the decision was made to use established support vector machine (SVM) software (Joachims [36]) and not to attempt to program other learning schemes. SVMs are widely used, and possible improvements from using other methods are unlikely to materially alter the results concerning whether machine learning works for heuristic selection in a theorem prover. The work was also limited to a single theorem prover, but within the area of first order logic with equality most modern theorem provers use very similar algorithms. A range of kernel models was looked at, a wide range of features was investigated, including dynamic as well as static measures, and extensive feature selection experiments were done.

7.2.1 Kernel function

One outcome of the first, preliminary experiment was the finding that the best results were obtained with the radial basis function kernel. As discussed earlier, this is consistent with other work in the field [25, 24], as the radial basis function kernel relates to a nearest neighbour approach.
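The connection to nearest neighbour methods can be seen directly from the form of the decision function. The sketch below evaluates an RBF-kernel SVM decision value by hand; the support vectors, multipliers and the value of γ are invented for illustration, not taken from the experiments.

```python
import math

def rbf(x, y, gamma):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def decision(x, support_vectors, alphas, labels, bias, gamma):
    """SVM decision value: sum_i alpha_i * y_i * K(x_i, x) + b."""
    return sum(a * y * rbf(sv, x, gamma)
               for sv, a, y in zip(support_vectors, alphas, labels)) + bias

# Two support vectors of opposite class; illustrative numbers only.
svs = [(0.1, 0.2), (0.8, 0.9)]
alphas = [1.0, 1.0]
labels = [+1, -1]

# With a large gamma the kernel decays fast with distance, so only the
# nearest support vector contributes appreciably and the classifier
# behaves much like a nearest neighbour rule.
query = (0.15, 0.25)          # close to the positive support vector
print(decision(query, svs, alphas, labels, 0.0, gamma=50.0) > 0)  # True
```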

7.2.2 Static and dynamic features

A novel aspect of the work reported in this dissertation is the use of dynamic as well as static features. Dynamic features are measured after a fixed number of steps of the proof search process.1 Though the initial experiments did not show any great advantage arising from the use of dynamic features, the feature selection experiments produced the interesting, and unexpected, result that the two key features that gave the best learning outcome with H0 filtering were static and dynamic measures of the same feature.

1Subsequent to undertaking the work, Xu et al. [100] published work which used dynamic features in the field of SAT solvers. SAT solvers are very different from first order logic theorem provers so a close comparison is not possible.

7.2.3 Comparative versus absolute margins

As with many combinatorial problems, investigating all the possibilities of different features and heuristics involves very large numbers of trials, so it is necessary to impose restrictions to make the problem tractable. One finding of the feature selection experiments is that care must be taken as to how the problem is restricted. The first approach of treating each heuristic classifier separately and scoring features according to the number of correct classifications led to different results from those obtained when the whole heuristic selection process was considered.

The implication of this is that it is not just the sign, or even the absolute magnitude, of the margin obtained with any single heuristic classifier that is of ultimate importance; rather it is the relative margin magnitudes from all the heuristic classifiers. It is the relative magnitudes that determine which heuristic is chosen, and thus they are of more significance. For example, a classifier for one heuristic might correctly produce a positive margin to indicate that that heuristic is the best, but this correct classification will be undone at the heuristic selection stage if another heuristic's classifier incorrectly produces a positive margin and that margin is more positive than that of the correct heuristic classifier.

The importance of the relative margin magnitudes from heuristic classifier to heuristic classifier means that optimising the parameter values of individual classifiers to maximise the number of correct classifications may be counter-productive. Better individual heuristic classifier performance does not necessarily imply better overall heuristic selection performance.
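The selection step described above reduces to an argmax over the per-heuristic margins, and the failure mode is easy to see in a small sketch. The margin values below are invented to reproduce the scenario in the text.

```python
# Sketch of heuristic selection from per-heuristic classifier margins.
# What matters is not the sign of any one margin but which classifier
# produces the largest margin: the heuristic with the maximal (most
# positive) margin is selected. Margin values here are invented.

def select_by_margin(margins):
    """margins: dict mapping heuristic name -> classifier decision value."""
    return max(margins, key=margins.get)

# A correct positive margin for H1 is overridden by a larger (incorrect)
# positive margin for H3 -- the failure mode described in the text.
margins = {"H1": 0.4, "H2": -0.7, "H3": 0.9, "H4": -0.2, "H5": -1.1}
print(select_by_margin(margins))  # "H3" wins despite H1 also being positive
```

Note that if H1 were in fact the best heuristic here, its classifier classified correctly (positive margin) and yet the selection is wrong, which is why per-classifier accuracy is not the right objective.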

7.3 Conclusions

7.3.1 Machine learning is applicable to theorem provers

Though the improvement was not by a large margin, the machine learned heuristic selection did better than any of the heuristics by themselves. This result was demonstrated to be learned intelligence rather than random chance by the fact that random selection of heuristics led to results that were considerably worse. Table 7.1 summarises the key results.
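The random-selection baseline used for this comparison can be sketched as a small Monte Carlo simulation. The table of which heuristic proves which problem is invented here; in the experiments the entries came from recorded prover runs, and the resulting distributions are the kind shown in figures 6.52 to 6.55.

```python
import random

random.seed(0)

# Sketch of the random-selection baseline: for each problem, pick one of
# the five heuristics at random and count how many problems that choice
# proves. The solved_by table below is invented for illustration.
solved_by = {
    "p1": {"H1", "H3"},
    "p2": {"H2"},
    "p3": {"H1", "H2", "H4", "H5"},
    "p4": set(),             # no heuristic proves this one
    "p5": {"H5"},
}
heuristics = ["H1", "H2", "H3", "H4", "H5"]

def random_run():
    """One simulated run: a random heuristic per problem; count proofs."""
    return sum(random.choice(heuristics) in provers
               for provers in solved_by.values())

trials = [random_run() for _ in range(10000)]
mean = sum(trials) / len(trials)

best_single = max(sum(h in s for s in solved_by.values()) for h in heuristics)
oracle = sum(1 for s in solved_by.values() if s)  # perfect per-problem choice

print(f"random mean ~{mean:.2f}, best single {best_single}, oracle {oracle}")
```

The learned selector's result sits between `best_single` and `oracle` in the dissertation's experiments, while random selection sits well below the best single heuristic, as the simulation illustrates.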

7.3.2 Very few features are needed

Though different experiments led to different rankings of features, all the feature selection and ranking experiments indicated that only a few features are needed. Where features were removed one at a time the results did not deteriorate until the last few features were removed. The examination of small subsets gave results that were an improvement over those obtained using the full set of fifty three features in the learning process.

7.3.3 H0 filtering saves time at the expense of theorems proved

Significant savings in total time may be obtained by applying a pre-filter which rejects some conjectures as being too difficult to prove without spending time on seeking a proof.

Though the classifier that does this is quite effective, it is not perfect, so whether or not such H0 filtering is performed depends on the overall task and the user's requirements. The trade-off is between total time and total number of theorems proved. Overall time may also be reduced by lowering the CPU time limit allowed, but this also reduces the number of theorems proved and depends on the problems in the test set; it is not an intelligent filtering method.
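The trade-off can be made concrete with a toy accounting of what H0 filtering does per problem. The per-problem times and filter decisions below are invented; the point is only the shape of the calculation.

```python
# Sketch of the H0 filtering trade-off. H0 is a pre-filter that predicts
# whether a conjecture is provable at all; rejected conjectures are not
# attempted, saving their (usually full) CPU time allowance at the cost
# of losing any provable conjectures the filter wrongly rejects.
# The per-problem numbers below are invented for illustration.

problems = [
    # (h0_accepts, provable, time_if_attempted_seconds)
    (True,  True,  12.0),
    (True,  False, 100.0),   # attempted, times out at the CPU limit
    (False, False, 100.0),   # correctly filtered out: time saved
    (False, True,  30.0),    # wrongly filtered out: a proof is lost
    (True,  True,  5.0),
]

def totals(use_h0):
    """Return (theorems proved, total time) with or without H0 filtering."""
    proved = time = 0.0
    for accepted, provable, t in problems:
        if use_h0 and not accepted:
            continue                    # rejected without any search
        time += t
        proved += provable
    return proved, time

print(totals(use_h0=False))  # (3.0, 247.0): more proofs, more time
print(totals(use_h0=True))   # (2.0, 117.0): fewer proofs, less time
```

A crude CPU-limit reduction would shrink the time column uniformly; H0 filtering instead targets the problems predicted to be hopeless, which is why it is the more intelligent of the two options.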

7.3.4 Matching features to heuristics

The feature selection experiments involving a single set of features for all heuristics led to the finding that just a small subset of two or three features gave optimal results. Additionally, the optimal feature sets differ between the case where H0 was included and where it wasn’t. This implies some relation between the features best used for learning and the set of heuristics. Additionally, the feature selection experiments involving individual heuristic classifiers showed that the feature numbers with the highest scores differed between classifiers (see figures 6.20 to 6.25). It is reasonable to hypothesise that each heuristic has a separate subset of features for which it is well matched.

7.4 Further work

Following up on the hypothesis that each heuristic has a separate subset of features for which it is well matched, it would be interesting to investigate the possibility of using separate feature sets for each heuristic classifier. To investigate this properly requires coverage of a very large combined search space. Individual heuristic classifiers cannot be treated in isolation because the best choice of features depends on the performance measure used. The feature subset investigation, based on overall heuristic selection performance, gave different results from the individual heuristic classifiers in terms of which features are of most significance. Additionally, care would need to be taken to consider properly the relative scales of different features when different sets are used with different classifiers whose output margins are all to be compared.

An alternative hypothesis is that different features are useful in determining the comparative benefits of pairs of heuristics. For instance the best features to use when comparing heuristics H1 and H3 may differ from those to use when comparing H1 and H2 or H4 and H5. Investigating this on all pairings of the five heuristics would be possible, similar to running the subset experiment ten times, but still a major piece of work.
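The pairwise alternative sketched above would replace the five one-versus-rest classifiers with ten pairwise comparators and a voting rule. The comparator outputs below are invented booleans standing in for trained classifiers, each of which could use its own feature subset.

```python
from itertools import combinations

# Sketch of the pairwise alternative: one comparator per pair of the five
# heuristics, each potentially trained on its own feature subset, with the
# final choice made by round-robin voting. Comparator outputs are invented.

heuristics = ["H1", "H2", "H3", "H4", "H5"]

# prefer[(a, b)] is True if the (a, b) comparator prefers a over b.
prefer = {
    ("H1", "H2"): True, ("H1", "H3"): False, ("H1", "H4"): True,
    ("H1", "H5"): True, ("H2", "H3"): False, ("H2", "H4"): False,
    ("H2", "H5"): True, ("H3", "H4"): True, ("H3", "H5"): True,
    ("H4", "H5"): True,
}

votes = {h: 0 for h in heuristics}
for a, b in combinations(heuristics, 2):
    winner = a if prefer[(a, b)] else b
    votes[winner] += 1

print(max(votes, key=votes.get))  # "H3": wins all four of its pairings
```

With five heuristics there are C(5,2) = 10 comparators, which is why the text likens the investigation to running the subset experiment ten times.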

Heuristic                                    Number of          Total Time Taken
                                             Theorems Proved    in Seconds
1                                            1,514              162,029
2                                            1,352              177,530
3                                            1,424              168,593
4                                            1,421              169,598
5                                            1,339              176,959
Random Heuristic Selection
  with H0 Filtering                          827 (mean)         98,145 (mean)
Random Heuristic Selection
  without H0 Filtering                       1,427 (mean)       169,337 (mean)
Learned Heuristic Choice
  (Best Subset, γ = 48.05)
  (Includes H0 filtering)                    1,602              149,323
Learned Heuristic Choice
  (Best Subset, γ = 10)
  (No H0 Filtering)                          1,609              150,700

Table 7.1: Summary of Key Results

Bibliography

[1] S. Abe. Support vector machines for pattern classification. Springer, 2005.

[2] E. Alpaydin. Introduction to machine learning. MIT Press, 2004.

[3] F. Baader and T. Nipkow. Term rewriting and all that. Cambridge University Press, 1998.

[4] L. Bachmair and H. Ganzinger. Equational reasoning in saturation-based theorem proving. In W. Bibel and P.H. Schmitt, editors, Automated Deduction — A Basis for Applications, volume I, chapter 11, pages 353–397. Kluwer, 1998.

[5] L. Bachmair and H. Ganzinger. Resolution theorem proving. In A. Robinson and A. Voronkov, editors, Handbook of Automated Reasoning, volume I, chapter 2, pages 19–100. North Holland, 2001.

[6] C. Baier and J. Katoen. Principles of model checking. MIT Press, 2008.

[7] C.M. Bishop. Pattern recognition and machine learning. Springer, 2006.

[8] M. Black. The identity of indiscernibles. MIND, LXI(242):153–164, April 1952.

[9] M. Blanchard, J. Horton, and D. MacIsaac. Folding architecture networks improve the performance of Otter. Short paper at the 13th International Conference on Logic for Programming, Artificial Intelligence and Reasoning (LPAR 2006), 14 November 2006.

[10] D. Bondyfalat, B. Mourrain, and T. Papadopoulo. An application of automatic theorem proving in computer vision. In X.-S. Gao, D. Wang and L. Yang (Eds.), ADG'98, LNAI 1669, pages 207–232. Springer-Verlag, 1999.

[11] C. Bouillaguet, V. Kuncak, T. Wies, K. Zee, and M. Rinard. Using first-order theorem provers in a data structure verification system. In Proc. 8th Intl. Conf. on Verification, Model Checking and Abstract Interpretation, 2007.

[12] J. Brank, M. Grobelnik, N. Milic-Frayling, and D. Mladenic. Feature selection using linear support vector machines. Technical Report MSR-TR-2002-63, 2002.

[13] C.J.C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2:121–167, 1998.

[14] K. Claessen, R. Hahnle, and J. Martensson. Verification of hardware systems with first-order logic. PaPS2002, 2002.


[15] E. Cohen. Taps: A first-order verifier for cryptographic protocols. In Computer Security Foundations Workshop, volume CSFW-13 Proceedings 13 IEEE 3-5 July, pages 144–158, 2000.

[16] S.A. Cook. The complexity of theorem-proving procedures. In Third ACM Symposium on Theory of Computing, pages 151–158. ACM, 1971.

[17] I. Dahn, A. Haida, T. Honigmann, and C. Wernhard. Using mathematica and au- tomated theorem provers to access a mathematical library. In CADE-15 Workshop Integration of Deduction Systems, 1998.

[18] E. Denney, B. Fischer, and J. Schumann. An empirical evaluation of automated theorem provers in software certification. International Journal on Artificial Intelligence Tools, 15(1):81–107, 2006.

[19] J. Denzinger and M. Fuchs. High performance ATP systems by combining several AI methods. In Proc. of the 15th IJCAI, Nagoya. Morgan Kaufmann, 1997.

[20] J. Denzinger, M. Fuchs, C. Goller, and S. Schulz. Learning from previous proof experience. Technical Report AR99-4, Institut für Informatik, Technische Universität München, 1999. (Also to be published as a SEKI report.)

[21] J. Denzinger, M. Kronenburg, and S. Schulz. Discount - a distributed and learning equational prover. J. Autom. Reason., 18(2):189–198, 1997.

[22] N. Dershowitz. Orderings for term-rewriting systems. In 20th Annual Symposium on Foundations of Computer Science, pages 123–131, Oct. 1979.

[23] H.B. Enderton. A mathematical introduction to logic. Academic Press Inc, 1972.

[24] M. Fuchs. Automatic selection of search-guiding heuristics for theorem proving. Technical Report TR-ARP-2-1998, Research School of Information Sciences and Engineering and Centre for Information Science Research Australian National Uni- versity, 1998.

[25] M. Fuchs and M. Fuchs. Applying case-based reasoning to automated deduction. In ICCBR '97: Proceedings of the Second International Conference on Case-Based Reasoning Research and Development, pages 23–32, London, UK, 1997. Springer-Verlag.

[26] J.H. Gallier. Logic for computer science: foundations of automatic theorem proving. Harper & Row Publishers, Inc., New York, NY, USA, 1985. Revised edition (2003) available for download from http://www.cis.upenn.edu/~jean/gbooks/logic.html.

[27] G. Gentzen. Untersuchungen über das logische Schließen. Mathematische Zeitschrift, 39:405–431.

[28] C. Goller. Learning search-control heuristics for automated deduction systems with folding architecture networks. ESANN’1999 proceedings - European Symposium on Artificial Neural Networks Bruges (Belgium) 21-23 April 1999, pages 45–50, 1999.

[29] M.J.C. Gordon and T.F. Melham. Introduction to HOL: A theorem proving environment for higher order logic. Cambridge University Press, 1993.

[30] S.R. Gunn. Support vector machines for classification and regression. Technical report, University of Southampton, May 1998.

[31] I. Guyon and A. Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research, 2003.

[32] M. Huth and M. Ryan. Logic in computer science, Second Edition. Cambridge University Press, 2004.

[33] J. Meng, C. Quigley, and L.C. Paulson. Automation for interactive proof: First prototype. Information and Computation, 204(10):1575–1596, 2006.

[34] M. Jamnik, M. Kerber, and C. Benzmuller. Learning method outlines in proof planning. CSRP-01-08, 2001.

[35] S. Jaśkowski. On the rules of suppositions in formal logic. Studia Logica, 1, 1934. Reprinted in S. McCall (1967), Polish Logic 1920–1939, Oxford: Oxford University Press, pp. 232–258.

[36] T. Joachims. Making large-scale SVM learning practical. In B. Schölkopf, C. Burges, and A. Smola, editors, Advances in Kernel Methods - Support Vector Learning. MIT Press, 1999.

[37] E. Katsiri and A. Mycroft. Knowledge representation and scalable abstract rea- soning for sentient computing using first-order logic. In Proc. 1st Workshop on Challenges and Novel Applications for Automated Reasoning in conjunction with CADE-19, pages 73–82, 2002.

[38] M. Kaufmann and J.S. Moore. Some key research problems in automated theorem proving for hardware and software verification. RACSAM, Rev. R.Acad. Cien. Serie. A. Mat., 98(1):181–195, 2004.

[39] R. Kaye. The mathematics of logic, a guide to completeness theorems and their applications. Cambridge University Press, 2007.

[40] D.E. Knuth. The art of computer programming, Volume 2, seminumerical algo- rithms. 1998.

[41] D.E. Knuth and P.B. Bendix. Simple word problems in universal algebra. In J. Leech, editor, Computational problems in abstract algebra, pages 263–297. Perg- amon Press, Elmsford, N.Y., 1970.

[42] T. Lev-ami, N. Immerman, T. Reps, M. Sagiv, and S. Srivastava. Simulating reach- ability using first-order logic with applications to verification of linked data struc- tures. In In CADE-20, pages 99–115, 2005.

[43] H. Li and Y. Wu. Automated theorem proving in incidence geometry - a bracket algebra based elimination method. In J.Richter-Gebert and D.Wang (Eds.) ADG 2000 LNAI 2061, pages 199–227. Springer-Verlag, 2001.

[44] Z. Liu, M. Tan, and F. Jiang. Regularized f-measure maximization for feature selection and classification. Journal of Biomedicine and Biotechnology, 2009:8, 2009.

[45] D.W. Loveland. Mechanical theorem-proving by model elimination. J. ACM, 15(2):236–251, 1968.

[46] M. Manzano. Extensions of first order logic. Number 19 in Cambridge Tracts in Theoretical Computer Science. Cambridge University Press, 1996.

[47] C. Mareco and A. Paccanaro. Using neural networks to improve the performance of automated theorem provers. Computational Methods and Neural Networks, M.P. Bekakos, M. Sambandham, Eds., pages 379–404, 1998.

[48] J. McCarthy. Programs with common sense. In Mechanisation of Thought Processes: Proceedings of a symposium held at the National Physical Laboratory on 24th, 25th, 26th and 27th November 1958, volume 1, pages 75–91. London HMSO, 1959.

[49] W.W. McCune. OTTER 3.0 reference manual and guide. Technical Report ANL- 94/6. Argonne National Laboratory, January 1994.

[50] W.W. McCune. Solution of the robbins problem. Journal of Automated Reasoning, 19(3):263–276, 1997.

[51] J. Meng. Integration of interactive and automatic provers. In Manuel Carro and Jesus Correas (editors) Second CologNet Workshop on Implementation Technology for Computational Logic Systems, September 2003. FME2003.

[52] J. Meng and L.C. Paulson. Experiments on supporting interactive proof using reso- lution. In Second International Joint Conference on Automated Reasoning, IJCAR 2004, Cork, Ireland, 4-8 July 2004.

[53] J. Meng and L.C. Paulson. Lightweight relevance filtering for machine-generated resolution. In Geoff Sutcliffe, Renate Schmidt and Stephan Schulz (editors), ESCoR: Empirically Successful Computerized Reasoning (CEUR Workshop Proceedings), volume 192, pages 53–69, 2006.

[54] J. Mercer. Functions of positive and negative type and their connection with the theory of integral equations. Philos. Trans. Roy. Soc. London, A209:415–446, 1909.

[55] T.M. Mitchell. Machine learning. McGraw-Hill, 1997.

[56] K. Morik, P. Brockhausen, and T. Joachims. Combining statistical learning with a knowledge-based approach - a case study in intensive care monitoring. In Proc. 16th Int’l Conf. on Machine Learning (ICML-99), 1999.

[57] M.W. Moskewicz, C.F. Madigan, Y. Zhao, L. Zhang, and S. Malik. Chaff: engineering an efficient SAT solver. Pages 530–535, 2001.

[58] E.T. Mueller. Commonsense reasoning. Elsevier, 2006.

[59] N. Cristianini and J. Shawe-Taylor. Support vector machines and other kernel-based learning methods. Cambridge University Press, 2000.

[60] M. Newborn and Z. Wang. Octopus: combining learning and parallel search. J. Autom. Reason., 33(2):171–218, 2004.

[61] R. Nieuwenhuis and A. Rubio. Paramodulation-based theorem proving. In Handbook of Automated Reasoning, pages 371–443, 2001.

[62] T. Nipkow, L.C. Paulson, and M. Wenzel. Isabelle/HOL: A proof assistant for higher-order logic. Number Tutorial 2283 in LNCS. Springer, 2002.

[63] A. Nonnengart and C. Weidenbach. Computing small clause normal forms. In Handbook of Automated Reasoning, volume I, pages 335–367. Elsevier Science and MIT Press, 2001.

[64] A.B. Novikoff. On convergence proofs on perceptrons. Symposium on the Mathe- matical Theory of Automata, 12:615–622, 1962.

[65] F.J. Pelletier. A history of natural deduction and elementary logic textbooks. In Logical Consequence: Rival Approaches, Vol. 1, pages 105–138. Oxford: Hermes Science Pubs, 2000.

[66] M. Presburger. Über die Vollständigkeit eines gewissen Systems der Arithmetik ganzer Zahlen, in welchem die Addition als einzige Operation hervortritt. Sprawozdanie z I Kongresu Matematyków Krajów Słowiańskich, pages 92–101 and 395, 1930.

[67] V. Prevosto and U. Waldmann. SPASS+T. In Geoff Sutcliffe, Renate Schmidt and Stephan Schulz, editors, ESCoR: Empirically Successful Computerized Reasoning, August 2006.

[68] G. Priest. An introduction to non-classical logic, Second Edition. Cambridge Uni- versity Press, 2008.

[69] C.E. Rasmussen and C.K.I. Williams. Gaussian processes for machine learning. MIT Press, 2006.

[70] A. Riazanov and A. Voronkov. Splitting without backtracking. Proc. of the 17th International Joint Conference on Artificial Intelligence (IJCAI-2001), 1:611–617, 2001. B. Nebel, ed.

[71] B.D. Ripley. Pattern recognition and neural networks. CUP, 1996.

[72] G.A. Robinson. Automatic deduction with hyper-resolution. International journal of computer mathematics, 1:227–234, 1965.

[73] G.A. Robinson and L. Wos. Paramodulation and theorem proving in first order theories with equality. Presented at 4th Annual Machine Intelligence Workshop, Edinburgh, Scotland, Aug 1968.

[74] J.A. Robinson. Theorem-proving on the computer. J. ACM, 10(2):163–174, 1963.

[75] J.A. Robinson. A machine-oriented logic based on the resolution principle. Journal of the ACM, 12(1):23–41, 1965.

[76] F. Rosenblatt. The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review, 65(6):386–408, 1958.

[77] S. Schulz. Learning search control knowledge for equational deduction. Number 230 in DISKI. Akademische Verlagsgesellschaft Aka GmbH Berlin, 2000.

[78] S. Schulz. E – A brainiac theorem prover. Journal of AI Communications, 15(2/3):111–126, 2002.

[79] S. Schulz and F. Brandt. Using term space maps to capture search control knowledge in equational theorem proving. In A.N. Kumar and I. Russel, editors, Proc. of the 12th FLAIRS, Orlando, pages 244–248. AAAI Press, 1999.

[80] J. Schumann. Automated theorem proving in high-quality software design. Intel- lectics and Computational Logic (NASA report), 2000.

[81] W. Shakespeare. Macbeth Act 5 Scene 5.

[82] R.E. Shostak. A practical decision procedure for arithmetic with function symbols. J. ACM, 26(2):351–360, 1979.

[83] T. Skolem. Logisch-kombinatorische Untersuchungen über die Erfüllbarkeit oder Beweisbarkeit mathematischer Sätze, nebst einem Theoreme über dichte Mengen. Skrifter utgit av Videnskapsselskapet i Kristiania, 4:4–36. See [89] for English translation.

[84] P. Smith. An introduction to Gödel's theorems. Cambridge University Press, 2007.

[85] G. Sutcliffe and C.B. Suttner. The TPTP problem library: CNF release v1.2.1. Journal of Automated Reasoning, 21(2):177–203, 1998.

[86] A. Tarski. A decision method for elementary algebra and geometry. University of California Press, 1951.

[87] C. Tinelli. A DPLL-based calculus for ground satisfiability modulo theories. In Giovambattista Ianni and Sergio Flesca, editors, Proceedings of the 8th European Conference on Logics in Artificial Intelligence (Cosenza, Italy), volume 2424 of Lecture Notes in Artificial Intelligence. Springer, 2002.

[88] J. Urban. Malarea: a metasystem for automated reasoning in large theories. In Pro- ceedings of the CADE-21 Workshop on Empirically Successful Automated Reasoning in Large Theories, pages 45–58, 2007.

[89] J. van Heijenoort. From Frege to Gödel: a source book in mathematical logic, 1879–1931. Harvard University Press, 1967.

[90] Y. Venema. Review of extensions of first order logic by maria manzano. The Journal of Symbolic Logic, 63(3):1194–1196, Sep 1998.

[91] A. Voronkov. Algorithms, datastructures, and other issues in efficient automated deduction. In IJCAR ’01: Proceedings of the First International Joint Conference on Automated Reasoning, pages 13–28, London, UK, 2001. Springer-Verlag.

[92] C. Walther. A mechanical solution of Schubert's steamroller by many-sorted resolution. Artif. Intell., 26(2):217–224, 1985.

[93] C. Walther. A many-sorted calculus based on resolution and paramodulation. Mor- gan Kaufmann Publishers Inc., San Francisco, CA, USA, 1987.

[94] F. Wiedijk, editor. The seventeen provers of the world. Number 3600 in LNAI. Springer, 2005.

[95] R. Wolf. A tour through mathematical logic. Number 30 in The Carus Mathematical Monographs. The Mathematical Association of America, 2005.

[96] L. Wos. Otter and the moufang identity problem. Journal of Automated Reasoning, 17:215–257, 1996.

[97] L. Wos. The flowering of automated reasoning. D. Hutter and W. Stephan (eds) Mechanizing Mathematical Reasoning, LNAI 2605:204–227, 2005.

[98] L. Wos, G.A. Robinson, and D.F. Carson. Efficiency and completeness of the set of support strategy in theorem proving. J. ACM, 12(4):536–541, 1965.

[99] L. Wos, G.A. Robinson, D.F. Carson, and L.Shalla. The concept of demodulation in theorem proving. J. ACM, 14(4):698–709, 1967.

[100] L. Xu, F. Hutter, H.H. Hoos, and K. Leyton-Brown. SATzilla: portfolio-based algorithm selection for SAT. Journal of Artificial Intelligence Research, 32:568–606, 2008.

Appendix A

Details of features

This appendix is not intended for detailed reading, but to provide a central reference for all the features used in the experiments.

A.1 Initial feature set

For the initial experiment a set of sixteen dynamic features was used, defined as follows. The measurements were made after the solver had run for a number (100 in the first instance) of clause selections. In the following descriptions, the set U is that of the unprocessed clauses and the set P is that of the processed clauses. Clause length is a measure of the number of literals in the clause; clause depth is a measure of the degree of nesting of terms. Clause weight is based on the weighting scheme used in term ordering.

1. Proportion of the total number of generated clauses that are kept (i.e. are not discarded as being trivial).

2. The "Sharing Factor", a measure of the number of terms which are shared between different clauses. (Stephan Schulz suggested that this measure seemed to correlate with the success or failure of the proof of some theorems.) The sharing factor is provided as a function within the E source code.

3. Proportion of the total clauses that are in P (i.e. have been processed).

4. The ratio of the size of multi-set U to its original size (the original size being the number of axioms in the original theorem).

5. The ratio of the longest clause in P to the longest clause in the original axioms.

6. The ratio of the average clause length in P to the average axiom clause length.

7. The ratio of the length of the longest clause in U to the longest axiom clause length.

8. The ratio of the average clause length in U to the average axiom clause length.

9. The ratio of the maximum clause depth in P to the maximum axiom clause depth.

10. The ratio of the average clause depth in P to the average axiom clause depth.

11. The ratio of the maximum clause depth in U to the maximum axiom clause depth.

12. The ratio of the average clause depth in U to the average axiom clause depth.


13. The ratio of the maximum clause standard weight in P to the maximum axiom clause standard weight.

14. The ratio of the average clause standard weight in P to the average axiom clause standard weight.

15. The ratio of the maximum clause standard weight in U to the maximum axiom clause standard weight.

16. The ratio of the average clause standard weight in U to the average axiom clause standard weight.

Apart from the sharing factor measure (feature 2), all the measures are ratios, which keeps the scales to a reasonable size.
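Most of these features share the same ratio form: a measure taken on P or U after the fixed number of clause selections, divided by the same measure on the original axioms. The sketch below computes four of them (numbered 5 to 8 as in the list above) from invented clause lengths; the real measurements are of course made on E's internal clause sets.

```python
# Sketch of the ratio form shared by most of the initial dynamic features:
# a measure on the processed (P) or unprocessed (U) clause set after 100
# clause selections, divided by the same measure on the original axioms.
# The clause lengths below are invented for illustration.

def stats(lengths):
    """Return (maximum, average) of a list of clause lengths."""
    return max(lengths), sum(lengths) / len(lengths)

axiom_lengths = [1, 2, 2, 3]
p_lengths = [2, 3, 4]        # lengths of processed clauses after 100 steps
u_lengths = [1, 2, 2, 5, 6]  # lengths of unprocessed clauses

ax_max, ax_avg = stats(axiom_lengths)
p_max, p_avg = stats(p_lengths)
u_max, u_avg = stats(u_lengths)

features = {
    5: p_max / ax_max,   # feature 5: longest clause in P vs longest axiom
    6: p_avg / ax_avg,   # feature 6: average length in P vs axioms
    7: u_max / ax_max,   # feature 7: longest clause in U vs longest axiom
    8: u_avg / ax_avg,   # feature 8: average length in U vs axioms
}
print(features)
```

The depth and standard-weight features (9 to 16) follow exactly the same pattern with depth or weight substituted for length, which is what keeps the feature scales comparable across problems of very different sizes.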

A.2 Extended feature set

For the heuristic selection experiment the feature set was extended to 53 features: 14 static features and 39 dynamic features.

A.2.1 Static feature set

There are fourteen static features, as follows (all measured on the clauses of the negated conjecture and associated axioms prior to the proof search):

1. Fraction of clauses that are unit clauses (i.e. clauses containing a single literal).

2. Fraction of clauses that are Horn clauses (i.e. clauses containing no more than one positive literal).

3. Fraction of clauses that are ground clauses (i.e. clauses with no variables).

4. Fraction of clauses that are demodulators (see background chapter for a description of demodulation).

5. Fraction of clauses that are rewrite rules (see background chapter).

6. Fraction of clauses that are purely positive.

7. Fraction of clauses that are purely negative.

8. Fraction of clauses that are mixed positive and negative.

9. Maximum clause length.

10. Average clause length.

11. Maximum clause depth.

12. Average clause depth.

13. Maximum clause weight.

14. Average clause weight.
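Several of these static features can be sketched directly from a simplified clause representation. Here a clause is a list of literals and a literal is a (sign, depth) pair; this representation and the three example clauses are invented simplifications, since the real measurements are made on E's internal clause structures.

```python
# Sketch of a few static features on a toy clause representation.
# A clause is a list of literals; a literal is (sign, depth), where sign
# is +1 or -1 and depth is the nesting depth of the literal's terms.
# Representation and values are simplified for illustration.

clauses = [
    [(+1, 1)],                        # unit, purely positive
    [(+1, 2), (-1, 1)],               # Horn, mixed sign
    [(-1, 3), (-1, 1)],               # Horn, purely negative
]

n = len(clauses)
frac_unit = sum(len(c) == 1 for c in clauses) / n                     # feature 1
frac_horn = sum(sum(s > 0 for s, _ in c) <= 1 for c in clauses) / n   # feature 2
frac_neg = sum(all(s < 0 for s, _ in c) for c in clauses) / n         # feature 7
max_len = max(len(c) for c in clauses)                                # feature 9
avg_depth = (sum(d for c in clauses for _, d in c)
             / sum(len(c) for c in clauses))                          # feature 12

print(frac_unit, frac_horn, frac_neg, max_len, avg_depth)
```

Note that unlike the dynamic features, these are absolute fractions and sizes rather than ratios against the axioms, since the axioms are the very set being measured.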

A.2.2 Dynamic feature set

There are thirty-nine dynamic features. One reason for there being many more dynamic features than static ones is that during the proof process there are two sets of clauses in the proof state, processed clauses (P) and unprocessed clauses (U), whilst the static features are measured on the single initial clause set (the axioms). Note that the dynamic features are measured at the point of the proof search at which one hundred selected clauses have been processed. The dynamic features are as follows:

1. Proportion of Generated Clauses that are kept (clauses that are subsumed or are trivial are discarded).

2. Sharing Factor, a measure of the number of shared terms. (The E theorem prover provides a function for calculating the sharing factor, and Stephan Schulz, the author of E, has indicated in private correspondence that he had noted that the sharing factor seems to correlate with how quickly some proofs are found.) Note that E does not store separate copies of shared terms; this increases efficiency as terms need only be rewritten once.

3. Ratio of the Number of Clauses in P to the Number in (P + U), i.e. the size of the saturated clause set relative to the total number of clauses in the current proof state.

4. Size of U divided by the Original Size of U (i.e. the number of axioms). This should be a measure of how rapidly the number of generated clauses has grown, given that the measure is taken after a fixed number of clauses has been selected as the given clause.

5. Ratio of Longest Clause Length in P to Longest Axiom Clause Length.

6. Ratio of Average Clause Length in P to Average Axiom Clause Length.

7. Ratio of Longest Clause Length in U to Longest Axiom Clause Length.

8. Ratio of Average Clause Length in U to Average Axiom Clause Length.

9. Ratio of Maximum Clause Depth in P to Maximum Axiom Clause Depth.

10. Ratio of Average Clause Depth in P to Average Axiom Clause Depth.

11. Ratio of Maximum Clause Depth in U to Maximum Axiom Clause Depth.

12. Ratio of Average Clause Depth in U to Average Axiom Clause Depth.

13. Ratio of Maximum Clause Standard Weight in P to Maximum Axiom Clause Standard Weight.

14. Ratio of Average Clause Standard Weight in P to Average Axiom Clause Standard Weight.

15. Ratio of Maximum Clause Standard Weight in U to Maximum Axiom Clause Standard Weight.

16. Ratio of Average Clause Standard Weight in U to Average Axiom Clause Standard Weight.

17. Ratio of the number of trivial clauses to the total number of processed clauses. (Trivial clauses are those that are trivially true, because they either contain a literal and its negation, or they contain a literal of the form t = t.)

18. Ratio of the number of forward subsumed clauses to the total number of processed clauses.

19. Ratio of the number of non-trivial clauses to the total number of processed clauses; this should carry effectively the same information as feature 17 above.

20. Ratio of the number of other redundant clauses to the total number of processed clauses.

21. Ratio of the number of non-redundant deleted clauses to the total number of processed clauses.

22. Ratio of the number of backward subsumed clauses to the total number of processed clauses.

23. Ratio of the number of backward rewritten clauses to the total number of processed clauses.

24. Ratio of the number of backward rewritten literal clauses to the total number of processed clauses.

25. Ratio of the number of generated clauses to the total number of processed clauses.

26. Ratio of the number of generated literal clauses to the total number of processed clauses.

27. Ratio of the number of generated non-trivial clauses to the total number of processed clauses.

Note that in the following, context_sr_count, factor_count and resolv_count are counters maintained by E which were embodied into features as described.

28. Ratio of context_sr_count to the total number of processed clauses (clauses generated from a contextual or top-level simplify-reflect, also known as contextual literal cutting or subsumption resolution, inference step; see the E user guide for details).

29. Ratio of the number of paramodulations to the total number of processed clauses.

30. Ratio of factor_count (the number of factors found) to the total number of processed clauses.

31. Ratio of resolv_count (the resolvent count) to the total number of processed clauses.

32. Fraction of total clauses in U that are Unit.

33. Fraction of total clauses in U that are Horn.

34. Fraction of total clauses in U that are Ground Clauses.

35. Fraction of total clauses in U that are Demodulators.

36. Fraction of total clauses in U that are Re-write Rules.

37. Fraction of total clauses in U that contain only positive literals.

38. Fraction of total clauses in U that contain only negative literals.

39. Fraction of total clauses in U that contain both positive and negative literals.

Note that in the above there is some redundancy between features, but the process of machine learning should automatically ignore irrelevant or redundant input.

Appendix B

Details of heuristics

This appendix gives details of the heuristic used in the initial experiment and of the heuristics used in the working set for the heuristic selection experiments. There are a large number of options associated with each heuristic and the following does not provide an explanation of them all; they are reproduced here to allow the same heuristics to be set up again if necessary. The E user manual (provided with the software) gives descriptions of such options as the different weighting functions used for clause selection. It should be noted, though, that the manual, at the time of writing, lags behind the software in the sense of not describing the exact weighting functions implemented in the heuristics.

B.1 Heuristic used in initial experiment

The heuristic used, on the suggestion of Stephan Schulz, was as follows (the strings beginning "--" or "-" are the parameters, and brief notes are placed beneath each one). The size of this heuristic, in terms of the number of different parameters, is an indication of the need for automation in heuristic selection. Note that this heuristic was set using the parameters as shown rather than using one of the working set of heuristics described in the chapter on methodology; the working set of heuristics had not been programmed into the modified version of E at the stage at which the initial experiment was done.

--definitional-cnf=24

--split-aggressive --split-clauses=4

True case splitting involves considering both alternatives of a disjunction of literals separately, and requires a great deal of work if back-tracking is required in the search for a proof. To get around this, a more efficient method involving the generation of new clauses is used; see Riazanov and Voronkov [70].

--simul-paramod

Use simultaneous paramodulation.


--forward-context-sr

--destructive-er-aggressive --destructive-er

These two parameters are concerned with equality resolution. In equality resolution a clause containing an inequality literal is replaced by the same clause with the literal removed and a substitution made throughout the clause using the most general unifier of the left and right sides of the inequality. The term "destructive" means that the original clause is not kept.

--prefer-initial-clauses

Give priority to the axioms in clause selection.

-winvfreqrank -c1

Sort symbols by inverse frequency.

-Ginvfreq -F1

Sort symbols by inverse frequency.

-WSelectMaxLComplexAvoidPosPred

Literal selection strategy.

-H'(20*ConjectureRelativeSymbolWeight(ConstPrio,0.1,100,100,100,100,1.5,1.5,1.5), 1*Refinedweight(PreferNonGoals,2,1,2,3,0.8), 1*FIFOWeight(ConstPrio))'

The main, round-robin scheme for organising the priority queues from which the next selected clause is taken. In this case there are three queues, the first of which is used twenty times more often than the second or the third. The first two queues are clause weighting schemes, whilst the third queue is based on how long the clause has existed (so every clause has some chance of being selected even if it has a low weight in both the other queues).
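The round-robin scheme described above can be sketched as follows. The class and the evaluation functions are hypothetical stand-ins, not E's implementation, but the selection discipline matches the description: n picks are taken from each queue in turn, and a clause already selected via one queue is skipped in the others.

```python
import heapq
import itertools

class WeightedQueues:
    """Sketch of round-robin clause selection from weighted priority
    queues: queue i is consulted n_i times per cycle."""

    def __init__(self, weighted_evals):
        # weighted_evals: list of (picks_per_cycle, evaluation_fn) pairs
        self.queues = [(n, fn, []) for n, fn in weighted_evals]
        self.counter = itertools.count()  # unique tie-breaker for heapq

    def insert(self, clause):
        # Every clause is entered into every queue under that queue's
        # evaluation function (lower value = selected earlier).
        for _, fn, heap in self.queues:
            heapq.heappush(heap, (fn(clause), next(self.counter), clause))

    def select(self):
        """Yield clauses in selection order; clauses already taken from
        another queue are skipped rather than selected twice."""
        seen = set()
        while any(heap for _, _, heap in self.queues):
            for n, _, heap in self.queues:
                picks = 0
                while picks < n and heap:
                    clause = heapq.heappop(heap)[2]
                    if clause in seen:
                        continue
                    seen.add(clause)
                    picks += 1
                    yield clause

# Two queues: a length-based weighting used twice per cycle, and a FIFO
# queue (constant evaluation; insertion order breaks ties) used once.
wq = WeightedQueues([(2, len), (1, lambda c: 0)])
for c in ["ppp", "p", "pp", "q"]:
    wq.insert(c)
print(list(wq.select()))  # → ['p', 'q', 'ppp', 'pp']
```

The FIFO queue plays the role described in the text: even a clause ranked poorly by every weighting scheme is eventually selected.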

B.2 Heuristics used in working set

The heuristics were selected on the basis of the number of cases from the TPTP library for which the auto mode in E would select that heuristic. The auto mode in E classes problems according to a limited number of binary and ternary features. Stephan Schulz, the author of E, did extensive work testing different heuristics and finding the best for each class. The information regarding class size and best heuristic is contained in the source code of E, and it was this information that was used to select a working set of heuristics for the work described in this dissertation.

B.2.1 Heuristic 1

Heuristic 1 was the global best heuristic (being best in 2,442 cases in testing over conjectures from the TPTP library). For heuristic 1 the labeling in E is "G_E_021_K31_F1_PI_AE_S4_CS_SP_S2S", which breaks down as follows:

"G_E_021" : simple label of no long-term significance,
"_K31" : type of term ordering (KBO with the 31st method investigated by Stephan Schulz),
"_F1" : limited forward rewriting of new clauses,
"_PI" : Prefer Initial clauses (i.e. process the original problem clauses before any derived ones),
"_AE" : Aggressive Equality resolution (X ≠ Y ∨ R is simplified to R(X ← Y) even for newly generated clauses),
"_S4" : Split strategy 4,
"_SP" : Simultaneous Paramodulation.

Heuristic parameters are set (in che_X_auto.c) as:

prefer_initial_clauses = true;
forward_context_sr = true;
selection_strategy = SelectMaxLComplexAvoidPosPred;
split_clauses = 4;
split_fresh_defs = false;
er_varlit_destructive = true;
er_aggressive = true;
forward_demod = 1;
pm_type = ParamodAlwaysSim;
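The correspondence between such internal settings and E's command-line flags can be illustrated with a small sketch. The mapping rule shown (underscores to hyphens, "--" prefix) is an assumption for illustration, chosen because it reproduces the flag spellings listed in section B.1 for the parameters used here.

```python
# Illustrative mapping from internal heuristic parameters to E-style
# command-line flags. The parameters and flag spellings are taken from
# section B.1; the mechanical mapping rule itself is an assumption.
params = {
    "prefer_initial_clauses": True,   # --prefer-initial-clauses
    "forward_context_sr": True,       # --forward-context-sr
    "split_clauses": 4,               # --split-clauses=4
}

flags = []
for name, value in params.items():
    flag = "--" + name.replace("_", "-")
    flags.append(flag if value is True else f"{flag}={value}")

print(" ".join(flags))
# → --prefer-initial-clauses --forward-context-sr --split-clauses=4
```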

Ordering parameters are set as:

to_weight_gen = WInvFrequencyRank;
to_prec_gen = PByInvFrequency;
to_const_weight = 1;

Clause selection queues are set with the following weightings:

(4*ConjectureGeneralSymbolWeight(SimulateSOS,100,100,100,50,50,10,10,1.5,1.5,1),
3*ConjectureGeneralSymbolWeight(PreferNonGoals,200,100,200,50,50,1,100,1.5,1.5,1),
1*Clauseweight(PreferProcessed,1,1,1),
1*FIFOWeight(PreferProcessed))

B.2.2 Heuristic 2

Heuristic 2 was the best heuristic in 437 cases.

For heuristic 2 the labeling in E is "H_081_B31_F1_PI_AE_S4_CS_SP_S0Y", which breaks down as follows:

"H_081" : a simple label,
"_B31" : corresponds to a lexicographic path ordering (LPO) option for term ordering,
"_F1" : limited forward rewriting of new clauses,
"_PI" : Prefer Initial clauses (i.e. process the original problem clauses before any derived ones),
"_AE" : Aggressive Equality resolution (X ≠ Y ∨ R is simplified to R(X ← Y) even for newly generated clauses),
"_S4" : Split strategy 4,
"_SP" : Simultaneous Paramodulation.

Heuristic parameters are set (in che_X_auto.c) as:

prefer_initial_clauses = true;
forward_context_sr = true;
selection_strategy = SelectMaxLComplexAvoidPosPred;
split_clauses = 4;
split_aggressive = true;
er_varlit_destructive = true;
er_aggressive = true;
forward_demod = 1;
pm_type = ParamodAlwaysSim;

Ordering parameters are set as:

ordertype = LPO4;
to_prec_gen = PByInvFreqConstMin;

Clause selection queues are set with the following weightings:

(8*Refinedweight(PreferGoals,1,2,2,2,2),
8*Refinedweight(PreferNonGoals,2,1,2,2,0.5),
1*Clauseweight(PreferUnitGroundGoals,1,1,1),
1*FIFOWeight(ConstPrio))

B.2.3 Heuristic 3

Heuristic 3 was the best heuristic in 377 cases. For heuristic 3 the labeling in E is "H_047_K18_F1_PI_AE_R4_CS_SP_S2S", which breaks down as follows:

"_K18" : type of term ordering (KBO with the 18th method investigated by Stephan Schulz),
"_F1" : limited forward rewriting of new clauses,
"_PI" : Prefer Initial clauses (i.e. process the original problem clauses before any derived ones),
"_AE" : Aggressive Equality resolution (X ≠ Y ∨ R is simplified to R(X ← Y) even for newly generated clauses),
"_R4" : Split strategy 4, but with re-use of old split definitions,
"_SP" : Simultaneous Paramodulation.

Heuristic parameters are set (in che_X_auto.c) as:

prefer_initial_clauses = true;
forward_context_sr = true;
selection_strategy = SelectNewComplexAHP;
split_clauses = 4;
split_aggressive = true;
split_fresh_defs = false;
er_varlit_destructive = true;
er_aggressive = true;
forward_demod = 1;
pm_type = ParamodAlwaysSim;

Ordering parameters are set as:

to_weight_gen = WInvFrequencyRank;
to_prec_gen = PByInvFrequency;
to_const_weight = 1;

Clause selection queues are set with the following weightings:

(10*PNRefinedweight(PreferGoals,1,1,1,2,2,2,0.5),
10*PNRefinedweight(PreferNonGoals,2,1,1,1,2,2,2),
5*OrientLMaxWeight(ConstPrio,2,1,2,1,1),
1*FIFOWeight(ConstPrio))

B.2.4 Heuristic 4

Heuristic 4 was the best heuristic in 329 cases. For heuristic 4 the labeling in E is "G_E_008_K18_F1_PI_AE_CS_SP_S0Y", which breaks down as follows:

"_K18" : type of term ordering (KBO with the 18th method investigated by Stephan Schulz),
"_F1" : limited forward rewriting of new clauses,
"_PI" : Prefer Initial clauses (i.e. process the original problem clauses before any derived ones),
"_AE" : Aggressive Equality resolution (X ≠ Y ∨ R is simplified to R(X ← Y) even for newly generated clauses),
"_SP" : Simultaneous Paramodulation.

Heuristic parameters are set (in che_X_auto.c) as:

prefer_initial_clauses = true;
forward_context_sr = true;
selection_strategy = SelectMaxLComplexAvoidPosPred;
er_varlit_destructive = true;
er_aggressive = true;
forward_demod = 1;
pm_type = ParamodAlwaysSim;

Ordering parameters are set as:

to_weight_gen = WInvFrequencyRank;
to_prec_gen = PByInvFrequency;
to_const_weight = 1;

Clause selection queues are set with the following weightings:

(10*ConjectureRelativeSymbolWeight(ConstPrio,0.1,100,100,100,100,1.5,1.5,1.5),
1*FIFOWeight(ConstPrio))

B.2.5 Heuristic 5

Heuristic 5 was the best heuristic in 321 cases. For heuristic 5 the labeling in E is "G_E_008_K18_F1_PI_AE_R4_CS_SP_S2S", which breaks down as follows:

"_K18" : type of term ordering (KBO with the 18th method investigated by Stephan Schulz),
"_F1" : limited forward rewriting of new clauses,
"_PI" : Prefer Initial clauses (i.e. process the original problem clauses before any derived ones),
"_AE" : Aggressive Equality resolution (X ≠ Y ∨ R is simplified to R(X ← Y) even for newly generated clauses),
"_R4" : Split strategy 4, but with re-use of old split definitions,
"_SP" : Simultaneous Paramodulation.

Heuristic parameters are set (in che_X_auto.c) as:

prefer_initial_clauses = true;
forward_context_sr = true;
selection_strategy = SelectNewComplexAHP;
split_clauses = 4;
split_aggressive = true;
split_fresh_defs = false;
er_varlit_destructive = true;
er_aggressive = true;
forward_demod = 1;
pm_type = ParamodAlwaysSim;

Ordering parameters are set as:

to_weight_gen = WInvFrequencyRank;
to_prec_gen = PByInvFrequency;
to_const_weight = 1;

Clause selection queues are set with the following weightings:

(10*ConjectureRelativeSymbolWeight(ConstPrio,0.1,100,100,100,100,1.5,1.5,1.5),
1*FIFOWeight(ConstPrio))

Appendix C

Results of varying parameter C

As covered in the background chapter, support vector machines designed for soft margin classification allow a trade-off between the size of the margin and the training error. The parameter governing this trade-off is C. In the experimental work described in the main body of this dissertation the parameter C was left at its default value, which is equal to the inverse of the average value of x_i · x_i over the training examples (the default used by SVMLight). For the sake of completeness this appendix presents the results of varying the value of C for comparison with the default results. To keep the number of combinations manageable the value of C was kept the same for all six heuristic classifiers. The value of the parameter γ was fixed at 48, which corresponds to the optimal value found when varying γ. (By changing C from its default setting the conditions under which γ was optimised are altered, but the assumption made was that the optimal value of γ is not strongly affected by the value of C.)
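A sketch of this default, assuming the SVMLight convention of taking the inverse of the mean squared norm of the training vectors:

```python
# Sketch of the SVMLight-style default for the soft-margin parameter:
# C = 1 / mean_i(x_i . x_i), i.e. the inverse of the average squared
# norm of the training vectors.
def default_C(xs):
    sq_norms = [sum(v * v for v in x) for x in xs]
    return len(xs) / sum(sq_norms)

xs = [[1.0, 0.0], [0.0, 2.0], [2.0, 1.0]]  # squared norms: 1, 4, 5
print(default_C(xs))  # → 3 / 10 = 0.3
```

This makes the default scale-invariant in the sense that rescaling all feature vectors rescales C correspondingly.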

C.1 Results for subset with features 7 and 52

In the subset experiments described in the main body of the dissertation, the best subset where H0 filtering is included contained only the features 7 and 52. Figure C.1 shows the effect of varying the parameter C on the total number of theorems proved with features restricted to this optimal subset and with H0 filtering. Figure C.2 shows the corresponding plot of total time taken. Comparing the two figures shows that the drop in theorems proved for values of C above 4 corresponds to a drop in time taken, which implies that H0 filtering becomes too drastic for these higher values of C. To confirm that the drops in theorems and time were indeed due to the effects of H0 filtering, the experiment was repeated without H0 filtering; the results are shown in figures C.3 and C.4. It can be seen that without H0 filtering the number of theorems proved does not drop at higher C values, and there is a corresponding increase in time taken. For this subset the optimal value of C is around 2, but the results are not better than those obtained with C set to its default (the default setting of C with H0 filtering and a γ value of 48 led to 1,602 theorems being proved).


The results in this case are worse than those for the default setting of C with γ at 10, which implies that the optimal value of γ is not 48 for this feature subset.

C.2 Results for subset with features 10, 14 and 15

In the subset experiments described in the main body of the dissertation the best subset with no H0 filtering contained features 10, 14 and 15. Figure C.5 shows the effect of varying parameter C on the number of theorems proved with the subset containing features 10, 14 and 15 and with H0 filtering included. Figure C.6 shows the corresponding graph of total time taken. Figure C.7 shows the effect of varying parameter C on the number of theorems proved with the subset containing features 10, 14 and 15 and without any H0 filtering. Figure C.8 shows the corresponding graph of total time taken.

C.3 Results for the full feature set

Figure C.9 shows the effect of varying parameter C on the number of theorems proved for the full feature set with H0 filtering included. Figure C.10 shows the corresponding graph of total time taken. Figure C.11 shows the effect of varying parameter C on the number of theorems proved for the full feature set without any H0 filtering. Figure C.12 shows the corresponding graph of total time taken. Again, there is no improvement obtained by setting C rather than leaving it at its default setting.

C.4 Extending the range of C variation up to 10,000

The variation in the parameter C described so far has been limited to values up to 10 in linear steps. The experiments were repeated using logarithmic variation up to 10,000. Figures C.13 to C.24 show the extended curves. The plots are of Log10(C), so that 4 corresponds to a value of C of 10,000. It can be seen that for values of C above about 500 there is a lot of noise, with wide fluctuations from point to point. It can also be seen that extending the range of values of C does not reveal any new optimal points beyond those seen in the initial curves, where the value of C ranged up to 10.
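The two sweeps can be sketched as follows. The exact step sizes are assumptions for illustration; the text fixes only the ranges (linear up to 10, logarithmic with log10(C) running from -2 to 4).

```python
# Sketch of the two grids of C values described above. Step sizes are
# illustrative assumptions; only the ranges come from the text.
linear_Cs = [0.5 * k for k in range(1, 21)]      # 0.5, 1.0, ..., 10.0
log10_grid = [-2 + 0.25 * k for k in range(25)]  # -2.0, -1.75, ..., 4.0
log_Cs = [10 ** e for e in log10_grid]           # 0.01 up to 10,000

print(linear_Cs[-1], log_Cs[0], log_Cs[-1])
```

Each C value in such a grid would be passed to the six heuristic classifiers in turn (with γ held at 48) and the total theorems proved and time taken recorded.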

C.5 Conclusions

For completeness it is useful to have looked at the effect of varying parameter C, but the results obtained are not better than those obtained when C is left at its default setting. The approach taken in the main body of this dissertation, of using the default setting for parameter C, is therefore justified.

[Figure C.1 omitted: Effect of varying parameter C for subset {7,52}, γ = 48, H0 filtering used; y-axis: number of theorems proved.]

[Figure C.2 omitted: Effect of varying parameter C for subset {7,52}, γ = 48, H0 filtering used; y-axis: total time taken in seconds.]

[Figure C.3 omitted: Effect of varying parameter C for subset {7,52}, γ = 48, no H0 filtering used; y-axis: number of theorems proved.]

[Figure C.4 omitted: Effect of varying parameter C for subset {7,52}, γ = 48, no H0 filtering used; y-axis: total time taken in seconds.]

[Figure C.5 omitted: Effect of varying parameter C for subset {10,14,15}, γ = 48, H0 filtering used; y-axis: number of theorems proved.]

[Figure C.6 omitted: Effect of varying parameter C for subset {10,14,15}, γ = 48, H0 filtering used; y-axis: total time taken in seconds.]

[Figure C.7 omitted: Effect of varying parameter C for subset {10,14,15}, γ = 48, no H0 filtering used; y-axis: number of theorems proved.]

[Figure C.8 omitted: Effect of varying parameter C for subset {10,14,15}, γ = 48, no H0 filtering used; y-axis: total time taken in seconds.]

[Figure C.9 omitted: Effect of varying parameter C for the full feature set, γ = 48, H0 filtering used; y-axis: number of theorems proved.]

[Figure C.10 omitted: Effect of varying parameter C for the full feature set, γ = 48, H0 filtering used; y-axis: total time taken in seconds.]

[Figure C.11 omitted: Effect of varying parameter C for the full feature set, γ = 48, no H0 filtering used; y-axis: number of theorems proved.]

[Figure C.12 omitted: Effect of varying parameter C for the full feature set, γ = 48, no H0 filtering used; y-axis: total time taken in seconds.]

[Figure C.13 omitted: Effect of varying Log10(C) for subset {7,52}, γ = 48, H0 filtering used; y-axis: number of theorems proved.]

[Figure C.14 omitted: Effect of varying Log10(C) for subset {7,52}, γ = 48, H0 filtering used; y-axis: total time taken in seconds.]

[Figure C.15 omitted: Effect of varying Log10(C) for subset {7,52}, γ = 48, no H0 filtering used; y-axis: number of theorems proved.]

[Figure C.16 omitted: Effect of varying Log10(C) for subset {7,52}, γ = 48, no H0 filtering used; y-axis: total time taken in seconds.]

[Figure C.17 omitted: Effect of varying Log10(C) for subset {10,14,15}, γ = 48, H0 filtering used; y-axis: number of theorems proved.]

[Figure C.18 omitted: Effect of varying Log10(C) for subset {10,14,15}, γ = 48, H0 filtering used; y-axis: total time taken in seconds.]

[Figure C.19 omitted: Effect of varying Log10(C) for subset {10,14,15}, γ = 48, no H0 filtering used; y-axis: number of theorems proved.]

[Figure C.20 omitted: Effect of varying Log10(C) for subset {10,14,15}, γ = 48, no H0 filtering used; y-axis: total time taken in seconds.]

[Figure C.21 omitted: Effect of varying Log10(C) for the full feature set, γ = 48, H0 filtering used; y-axis: number of theorems proved.]

[Figure C.22 omitted: Effect of varying Log10(C) for the full feature set, γ = 48, H0 filtering used; y-axis: total time taken in seconds.]

[Figure C.23 omitted: Effect of varying Log10(C) for the full feature set, γ = 48, no H0 filtering used; y-axis: number of theorems proved.]

[Figure C.24 omitted: Effect of varying Log10(C) for the full feature set, γ = 48, no H0 filtering used; y-axis: total time taken in seconds.]