Designing a Theorem Prover
Total Page:16
File Type:pdf, Size:1020Kb
Designing a Theorem Prover Lawrence C Paulson Computer Laboratory University of Cambridge May 1990 CONTENTS 2 Contents 1 Folderol: a simple theorem prover 3 1.1 Representation of rules :::::::::::::::::::::::::: 4 1.2 Propositional logic :::::::::::::::::::::::::::: 5 1.3 Quanti¯ers and uni¯cation :::::::::::::::::::::::: 7 1.4 Parameters in quanti¯er rules :::::::::::::::::::::: 7 2 Basic data structures and operations 10 2.1 Terms ::::::::::::::::::::::::::::::::::: 10 2.2 Formulae :::::::::::::::::::::::::::::::::: 11 2.3 Abstraction and substitution ::::::::::::::::::::::: 11 2.4 Parsing and printing ::::::::::::::::::::::::::: 13 3 Uni¯cation 14 3.1 Examples ::::::::::::::::::::::::::::::::: 15 3.2 Parameter dependencies in uni¯cation :::::::::::::::::: 16 3.3 The environment ::::::::::::::::::::::::::::: 17 3.4 The ML code for uni¯cation ::::::::::::::::::::::: 17 3.5 Extensions and omissions ::::::::::::::::::::::::: 18 3.6 Instantiation by the environment :::::::::::::::::::: 19 4 Inference in Folderol 20 4.1 Solving a goal ::::::::::::::::::::::::::::::: 21 4.2 Selecting a rule :::::::::::::::::::::::::::::: 22 4.3 Constructing the subgoals :::::::::::::::::::::::: 23 4.4 Goal selection ::::::::::::::::::::::::::::::: 25 5 Folderol in action 27 5.1 Propositional examples :::::::::::::::::::::::::: 27 5.2 Quanti¯er examples :::::::::::::::::::::::::::: 30 5.3 Beyond Folderol: advanced automatic methods ::::::::::::: 37 6 Interactive theorem proving 39 6.1 The Boyer/Moore theorem prover :::::::::::::::::::: 40 6.2 The Automath languages ::::::::::::::::::::::::: 41 6.3 LCF, a programmable theorem prover :::::::::::::::::: 42 6.4 Validation-based tactics ::::::::::::::::::::::::: 43 6.5 State-based tactics :::::::::::::::::::::::::::: 45 6.6 Technical aspects of Isabelle ::::::::::::::::::::::: 46 7 Conclusion 47 8 Program listing 48 1 FOLDEROL: A SIMPLE THEOREM PROVER 3 1 Folderol: a simple theorem prover Because many di®erent forms of logic are applicable to Computer Science, a common question is | How do I write a theorem prover? This question can be answered with general advice. For instance, ¯rst do enough paper proofs to show that automation of your logic is both necessary and feasible. The question can also be answered with a survey of existing provers, as will be done in this chapter. But automatic theorem-proving involves a combination of theoretical and coding skills that is best illustrated by a case study. So this chapter presents a toy theorem prover, called Folderol, highlighting the key design issues. Code fragments are discussed in the text; a full program listing appears at the end of the chapter. Folderol works in classical ¯rst-order logic. It follows an automatic strategy but is interactive. You can enter a conjecture, let Folderol work with it for a while, and ¯nally modify or abandon the conjecture if it cannot be proved. There is no practical automatic proof procedure, even for most complete theories. Even if there were, interaction with the machine would still be necessary for exploring mathematics. A theorem prover should o®er some evidence that its proofs are valid. The Boyer and Moore [1979] theorem-prover prints an English summary of its reasoning, while Folderol prints a trace of the rules. Most LCF-style systems o®er no evidence of correctness other than an obstinate insistence on playing by the rules [Paulson 1987]. If correctness is a matter of life and death, then a prover can be designed to output its proof for checking by a separate program. Absolute correctness can never be obtained, even with a computer-checked proof. There are fundamental reasons for this. Any program may contain errors, including the one that checks the proofs; we have no absolutely reliable means by which to verify the veri¯er. In addition, a proof of correctness of a program relies on assumptions about the real world; it assumes that the computer works correctly, and usually assumes that the program is used properly. Hardware correctness proofs depend upon similar assumptions [Cohn 1989b]. Even in pure mathematics, the premises of a theorem are subject to revision [Lakatos 1976]. Although Folderol performs well on certain examples, its proof strategy su®ers inherent limitations. The strategy would be even worse if we adopted modal or intuitionistic logic, and cannot easily be remedied. My hope is that, once you have studied this chapter, you will gain the con¯dence to implement more sophisticated strategies. Folderol exploits many of the concepts of functional programming, although it is not pure. It is written in Standard ML, which is increasingly popular for theorem proving. Several recent books teach this language. Remark on quanti¯ers. The usual notation for quanti¯ers, seen elsewhere in this volume, gives the quanti¯ed variable the smallest possible scope. To save on parentheses, quanti¯ers in this chapter employ a dot convention, extending the scope of quanti¯cation as far as possible. So 8x:P ! Q abbreviates 8x:(P ! Q), and 9y:P ^9z:Q _ R abbreviates 9y:(P ^ (9z:(Q _ R))) 1 FOLDEROL: A SIMPLE THEOREM PROVER 4 1.1 Representation of rules Given that Folderol will handle classical ¯rst-order logic, what formal system is best suited for automation? Gentzen's sequent calculus LK [Takeuti 1987] supports backwards proof in a natural way. A naijve process of working upwards from the desired conclusion yields many proofs. Most rules act on one formula, and they come in pairs: one operating on the left and one on the right. We must inspect the rules carefully before choosing data structures. The quanti¯er rules, especially, constrain the representation of terms. The classical sequent calculus deals with sequents of the form ¡ j¡ ¢. Here ¡ is the left part and ¢ is the right part, and ¡ j¡ ¢ means `if every formula in ¡ is true then some formula in ¢ is true.' The sequent A j¡ A (called a basic sequent) is trivially true. You may prefer to think in terms of refutation. Then ¡ j¡ ¢ is false if every formula in ¡ is true and every formula in ¢ is false. If these conditions are contradictory then the sequent is true. Structural rules, namely exchange, thinning, and contraction, make sequents behave like a consequence relation. The thinning rules, in backwards proof, delete a formula from the goal. Since a formula should not be deleted unless it is known to be redundant, let us postpone all deletions. A goal is trivially true if it has the form ¡ j¡ ¢ where there exists a common formula A 2 ¡ \ ¢. Accepting these as basic sequents makes thinning rules unnecessary; the other formulae are simply thrown away. The exchange rules swap two adjacent formulae, for traditionally ¡ and ¢ are lists, not sets. Lists are also convenient in programming. We can do without ex- change rules by ignoring the order of formulae in a sequent. In the rule ¡ j¡ ¢;A ¡ j¡ ¢;B ^:right ¡ j¡ ¢;A^ B there is no reason why A ^ B needs to be last. It can appear anywhere in the right part of the conclusion. For backwards proof: if the goal contains A ^ B anywhere in the right part then we can produce two subgoals, replacing A ^ B by A and B respectively. We can write the rule more concisely and clearly as follows, showing only formulae that change: j¡ A j¡ B ^:right j¡ A ^ B The contraction rules, in backwards proof, duplicate a formula. Duplicating the A ^ B, then applying the conjunction rule above, makes subgoals j¡ A; A ^ B and j¡ B;A ^ B. These are equivalent to j¡ A and j¡ B, so the A ^ B is redundant. A case-by-case inspection of the rules reveals that the only formulae worth du- plicating are 8x:A on the left and 9x:A on the right. Let us add contraction to the rules 8:left and 9:right. The rule 8:left takes a goal containing 8x:A on the left and makes one subgoal by adding the formula A[t=x] for some term t. The subgoal retains a copy of 8x:A so that the rule can be applied again for some other term. This example requires n repeated uses of the quanti¯ed formula on terms a, f(a), 1 FOLDEROL: A SIMPLE THEOREM PROVER 5 :::: 8 ! j¡ ! ¢¢¢ ¢¢¢ x:P(x) P (f(x)) P (a) P (|f(f{z( f}(a) ))) n times The rule 9:right similarly retains the quanti¯ed formula in its subgoal. Folderol uses these quanti¯ed formulae in rotation. It never throws them away. Thus even 8x:P (x) j¡ Q makes Folderol run forever. If not for the re-use of quanti¯ed formulae, the search space would be ¯nite. A theorem prover should only instantiate a quanti¯er when strictly necessary | but there is no e®ective test. First-order logic is undecidable. 1.2 Propositional logic Propositional logic concerns the connectives ^, _, !, $, and :. We take negation as primitive rather than de¯ning :A as A !?. Although A $ B means (A ! B) ^ (B ! A), performing this expansion could cause exponential blowup! The rules for $ permit natural reasoning. Figure 1 shows the rules for each connective. In backwards proof each rule breaks down a non-atomic formula in the conclusion. If more than one rule is applicable, as in A ^ B j¡ B ^ A, which should be chosen? Some choices give shorter proofs than others. This proof of A ^ B j¡ B ^ A uses ^:left before ^:right (working upwards): A; B j¡ BA;Bj¡ A ^:right A; B j¡ B ^ A ^:left A ^ B j¡ B ^ A If ^:right is used ¯rst then ^:left must be used twice. In larger examples the di®erence can be exponential. A; B j¡ B A; B j¡ A ^:left ^:left A ^ B j¡ B A ^ B j¡ A ^:right A ^ B j¡ B ^ A Rules can be chosen to minimize the proliferation of subgoals. If a goal is a basic sequent then trying other rules seems pointless.1 Let us say that the cost of a rule is the number of premises it has.