ABSTRACT

CLASSIFYING PROOF STRATEGIES IN ABELLA

In the realm of logic, certain domains have kept their heads above water, resisting the thrust of automation. As such, interactive provers exist for these higher-order or more complex logics, demanding significant expenditures of human expertise and time. Recent decades have witnessed many attempts to bring automation to interactive theorem provers, and the last ten years have seen an explosion of machine learning research on the topic. This thesis defines a representation, based on the concept of strategies, for proofs that are completed interactively in the theorem prover Abella. The core idea is that certain strategies for applying the inductive hypothesis and following the structure of definitions within the specification can be used to help automate proofs, and instances of these strategies can be extracted from existing proofs. The latter is the focus of this work, which provides the implementation details along with the programs for parsing proof data, interacting with the Abella system to reprove proof scripts, and transforming the extracted data into the abstract representation corresponding to strategies. This representation starts with targets, which annotate each proof step with information tied to the strategy. Then a proof tree is constructed, capturing the dependencies between proof steps; finally, the combination of targets and proof tree is transformed into a proof frame, the abstraction that corresponds directly with a strategy.

The potential of this data abstraction to facilitate partial and full automation along with proof visualization and summary is explained and put forth as future work.

Joseph Reeves August 2020

CLASSIFYING PROOF STRATEGIES IN ABELLA

by

Joseph Reeves

A thesis

submitted in partial

fulfillment of the requirements for the degree of

Master of Science in Computer Science

in the College of Science and Mathematics

California State University, Fresno August 2020

© 2020 Joseph Reeves

APPROVED

For the Department of Computer Science:

We, the undersigned, certify that the thesis of the following student meets the required standards of scholarship, format, and style of the university and the student's graduate degree program for the awarding of the master's degree.

Joseph Reeves Thesis Author

Todd Wilson (Chair) Computer Science

Ming Li Computer Science

David Ruby Computer Science

For the University Graduate Committee:

Dean, Division of Graduate Studies

AUTHORIZATION FOR REPRODUCTION OF MASTER’S THESIS

I grant permission for the reproduction of this thesis in part or in its entirety without further authorization from me, on the condition that the person or agency requesting reproduction absorbs the cost and provides proper acknowledgment of authorship.

X Permission to reproduce this thesis in part or in its entirety must be obtained from me.

Signature of thesis author: Joseph Reeves

ACKNOWLEDGMENTS

First and foremost, I would like to thank Todd Wilson for his role as advisor to this thesis. Since I took his CSci 217 course two years ago, each semester has been filled with office visits that spiraled into long conversations, and these conversations are what cultivated my interest in type theory, logic, and computation. His intuitions about the field provided key guidance in the development of this thesis and really helped get the work off the ground.

Additionally, I would like to thank Ming Li for opening the door to my academic career, inviting me to a research team and advising our group through several publications and conference presentations. In this group, Carlos Moreno was an outstanding research mentor, showing me the ropes and often rewriting my buggy code.

My interests in the fields of logic, computation, and artificial intelligence were sparked through courses I took with David Ruby, as well as with the math department professors Oscar and Maria Nogin. Their lectures and independent studies directed me towards areas that I found exciting.

Lastly, I would like to thank my family and friends for supporting me through this process, as well as the students I instructed, many of whom I can now call friends, who kept academic life enjoyable.

TABLE OF CONTENTS

Page

LIST OF FIGURES ...... viii

INTRODUCTION...... 1

Theorem Proving...... 1

The Future of Theorem Proving...... 4

Thesis Contributions ...... 7

Thesis Overview ...... 9

RELATED WORK ...... 10

Proof Systems...... 11

Automation with Learning ...... 14

Automation with Expert Knowledge and Strategies...... 21

Proof Capture ...... 25

Conclusion ...... 27

AN INTRODUCTION TO ABELLA ...... 29

The Abella Logical Framework ...... 29

Proving Add Exists ...... 31

Why Abella? ...... 38

Additional Example Proofs ...... 38

PROOF STRATEGIES...... 39

Inductive Proof Schemes ...... 40

Connecting a Strategy with a Definition ...... 43

Strategy Description ...... 48

Additional Strategies...... 49

EXTRACTING PROOF DATA ...... 51


The Two-Phase Approach ...... 51

The Static Phase ...... 52

The Dynamic Phase ...... 53

ABSTRACTING A PROOF ...... 58

Targets ...... 58

Nodes ...... 71

Proof Trees ...... 72

Proof Frames ...... 81

Querying the Data ...... 87

SOLVING PROOFS USING STRATEGIES ...... 89

Using a Strategy ...... 89

Filling in the Frame...... 91

CONCLUSION ...... 96

REFERENCES ...... 98

APPENDICES ...... 102

APPENDIX A: EXAMPLE SPECIFICATIONS AND ...... 103

LIST OF FIGURES

Page

Figure 1. Proof diagram for the theorem add exists, with an Abella-like syntax...... 3

Figure 2. Example sig file...... 32

Figure 3. Example mod file...... 33

Figure 4. Add exists proof screen 1...... 33

Figure 5. Add exists proof screen 2...... 34

Figure 6. Add exists proof screen 3...... 34

Figure 7. Add exists proof screen 4...... 35

Figure 8. Add exists proof screen 5...... 35

Figure 9. Add exists proof screen 6...... 36

Figure 10. Add exists proof screen 7...... 36

Figure 11. Add exists proof diagram...... 37

Figure 12. Inductive scheme 1...... 41

Figure 13. Inductive scheme 2...... 41

Figure 14. Inductive scheme 3...... 42

Figure 15. Inductive scheme 4...... 42

Figure 16. Inductive scheme 5...... 43

Figure 17. Direct embedding example...... 45

Figure 18. Direct embedding with multiple definitions example...... 45

Figure 19. Multiple direct embeddings example...... 46

Figure 20. Indirect embedding example...... 47

Figure 21. No embedding progress example...... 48

Figure 22. No embedding commutativity of add example...... 48

Figure 23. Algorithm for extracting theorem and proof step information...... 56


Figure 24. Add exists witness and target for step 6...... 60

Figure 25. Target definition for leaf with unfold witness...... 60

Figure 26. Add uniqueness witness and target for step 8...... 61

Figure 27. Target definition for leaf with Eq witness...... 61

Figure 28. Add nat witness and target for step 4...... 61

Figure 29. Target definition for leaf with no witness...... 62

Figure 30. Mult exists target for step 7...... 62

Figure 31. Target definition for steps that correspond to RHS of definitions...... 63

Figure 32. Add uniqueness target for step 7...... 63

Figure 33. Target definition for steps indirectly corresponding with definitions...... 63

Figure 34. Add uniqueness target for step 6...... 65

Figure 35. Target definition for steps inverting terms...... 65

Figure 36. Add uniqueness target for step 4...... 65

Figure 37. Division existence target for step 8...... 66

Figure 38. Target definition for steps causing a branch on a separate term...... 67

Figure 39. Multiplication exists target for step 6...... 68

Figure 40. Progress target for steps 9-13...... 69

Figure 41. Target definition for intermediary proof steps a.k.a. links...... 69

Figure 42. Division exists target for step 7...... 70

Figure 43. Node definition...... 71

Figure 44. Interleavings of Progress IH applications...... 73

Figure 45. Branch structure for add exists step 3...... 75

Figure 46. Branch definition for the Proof Tree...... 75

Figure 47. Sequence structure for mult exists steps 5,6,7...... 76

Figure 48. Sequence definition for the Proof Tree...... 76


Figure 49. ISet structure for progress steps 10-14...... 77

Figure 50. ISet definition for Proof Tree...... 77

Figure 51. Proof Tree definition...... 77

Figure 52. Topological ordering for branch of Progress proof (on the left), and an arbitrary proof...... 79

Figure 53. Topological ordering with Iset and Sequence structure...... 79

Figure 54. Topological sort for branch of mult exists proof...... 80

Figure 55. Multiplication exists with nodes removed (left), progress with nodes removed (right)...... 82

Figure 56. Proof Tree lifting of multiplication exists steps 5-7...... 83

Figure 57. Proof Tree lifting of progress steps 9-13...... 83

Figure 58. Gap in progress from steps 9-13...... 84

Figure 59. Gap Definition...... 84

Figure 60. Proof Frame definition...... 85

Figure 61. Basic system design for using strategies and proof frames...... 90

Figure 62. Proof Frame comparison for progress on value cases of plus (left) and length (right)...... 92

Figure 63. Proof frame refinement through analogy for progress on value cases of plus (left) and length (right)...... 93

Figure 64. Proof frame information for inductive branch of multiplication exists...... 94

Figure 65. Back chaining visualization for inductive branch of multiplication exists...... 95

INTRODUCTION

Theorem Proving

An Overview

Mechanized theorem proving has exploded in use over the past several decades due to factors such as the refinement of automation techniques, the introduction of new, more expressive logics, and the deployment of powerful deductive systems. The application space of theorem proving ranges broadly across domains in computer science.

One example is hardware verification, commonly used in circuit design, which formulates specifications in propositional logic and leverages state-of-the-art SAT solvers to provide quick solutions. In fact, one of the key successes in theorem proving has been the progression of automation techniques for propositional logic and fragments of first-order logic, which incorporate implementations of resolution and backtracking with complex heuristics, among other techniques. However, automation has not been as fully developed for the more expressive logics.

More expressive logics like first-order and higher-order logic are needed to reason about more complex systems (those beyond the scope of propositional logic). For example, software verification can be performed in a higher-order logic, providing guarantees about programs based on their specification, e.g., CompCert [1], a certified compiler extracted from the Coq proof system. Another area of growing interest is defining theories of modern mathematics within existing proof systems. Recent work in homotopy type theory has produced surprising results, and the results are witnessed in proof systems. Even still, software engineers are not flocking to proof systems to verify the correctness of their programs; likewise, many mathematicians are not going through the trouble of formalizing theories and completing proofs in a proof system. Several barriers stand in the way of new, and even experienced, users; one of which is the lack of automation for these more expressive logics. Unlike fully automated systems, these theorem proving problems require interaction from the user due to their inherent difficulty. That said, there have been several attempts at providing partial or full automation within these systems. Before introducing these possibilities of automation and how this thesis approaches the problem, it is important to understand the environment of interactive theorem proving.

Interactive Theorem Proving

Interactive theorem proving describes the process wherein a user creates definitions for terms in the specification, then inputs a conjecture (a formula to be proved) and works interactively with the proof system to produce a proof of the conjecture. This process can occur at different levels of abstraction, but in general the proof system displays a proof state to the user, consisting of a set of local hypotheses and goals to be proved, and the user inputs a tactic to further the proof. A tactic is a command that transforms the proof state, updating the set of hypotheses or the set of subgoals, and returns a new proof state to the user. The process of tactic input continues until each subgoal is proved, and the resulting tactic sequence represents the proof of the theorem. Different proof systems employ different logical frameworks, each with its own tactic language. In addition, these frameworks have varying meta-properties and syntax for the representation of objects. These components can affect the reasoning within a proof system, but the example proof in Figure 1 can give a taste of a generic interactive proof.
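To make the tactic loop concrete, the following minimal Python sketch models a proof state and tactics as state transformers. The formulas-as-strings encoding and the tactic names `split` and `assumption` are illustrative inventions for this sketch, not Abella's actual tactic language.

```python
# Hypothetical miniature of the interactive loop: formulas are plain
# strings, a proof state is (hypotheses, goals), and a tactic is any
# function from proof state to proof state.
from dataclasses import dataclass, replace
from typing import Callable, Dict, List

@dataclass(frozen=True)
class ProofState:
    hyps: Dict[str, str]      # named local hypotheses
    goals: List[str]          # subgoals still to be proved

Tactic = Callable[[ProofState], ProofState]

def split(state: ProofState) -> ProofState:
    """If the first goal is a conjunction, replace it with two subgoals."""
    goal, rest = state.goals[0], state.goals[1:]
    left, right = goal.split(" /\\ ", 1)
    return replace(state, goals=[left, right] + rest)

def assumption(state: ProofState) -> ProofState:
    """Close the first goal if it matches a hypothesis verbatim."""
    goal, rest = state.goals[0], state.goals[1:]
    if goal in state.hyps.values():
        return replace(state, goals=rest)
    raise ValueError("no matching hypothesis")

# The interactive session: apply tactics until no goals remain.
state = ProofState(hyps={"H1": "p", "H2": "q"}, goals=["p /\\ q"])
for tactic in [split, assumption, assumption]:
    state = tactic(state)
print(state.goals)  # -> []
```

The finished tactic sequence `[split, assumption, assumption]` plays the role of the proof script: replaying it from the initial state reproduces every intermediate proof state.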

The example proof diagram captures the general aspects of an interactive proof environment and process, using an Abella-like syntax for terms. In short, the arrow, forall, and exists are the familiar connectives of first-order logic, and the judgements within the curly braces can be viewed as derivations. Each gray shaded box represents a window that the user will see, and these windows are subdivided by a blue line with the

Figure 1. Proof diagram for the theorem add exists, with an Abella-like syntax.

hypotheses (labeled by names and in black) above the line, and a goal and list of labelled subgoals (in red) below the line. The user sees the windows in sequence, denoted by the bolded numbers next to each window. At each point, the user inputs a tactic (underlined with blue text), which modifies the proof state in some way. These tactics, and the specification for this example, will be examined in detail in Section 3.

The Future of Theorem Proving

Theorem Proving for the Masses

To place the specific goals of this thesis in context, it is important to establish the overarching aspirations of theorem proving as it relates to the fields of computer science, mathematics, and beyond. As mentioned earlier, theorem proving can be used to verify software correctness; even more, theorem proving can generate correct programs with the proofs-as-programs paradigm. As software begins to touch every part of our lives, the demand for provably correct programs will be paramount. Theorem provers will be an irreplaceable tool to this end. Building systems with high-level abstractions and near-full automation will produce an environment for novice programmers to transition into using theorem proving tools.

A similar vision holds for mathematics; while correctness may not be as critical, the turn to mechanization can produce a major increase in productivity. Modern mathematics can be complicated to formalize, and hundred-page proofs can be untenable in an interactive system. Giving users higher levels of abstraction to work from, with automation to fill in the gaps, could make some of this work more tractable. Further, when large theories become mechanized, knowledge creation can be realized with the large corpus of existing proofs. This not only means generating new theorems and their corresponding proofs, but supporting summarization and visualization that will help users work with large theories. This can save much time over the current paradigm of reading long proofs in prose, without a guarantee of correctness and without a good way to summarize or learn from the proof.

Automation and Machine Learning

Automation is a key concern when considering how best to deploy theorem provers to the masses. Much work has been done on this topic, from developing expert heuristics for controlling search to complex algorithms like Rippling [2]. However, as proof corpora grow in size, focus has been placed on proof reuse, which comes in two flavors: analogy, where a single proof object is transformed into a proof object for a similar theorem, and strategy, where a general strategy derived from several proofs within a family is used to generate proofs of similar theorems. Even with proof reuse, expert knowledge is needed in many cases; for example, deciding which proofs should be used to create a strategy, and how other proofs might be recognized as members of the same proof family. Recent trends have tried to leapfrog the need for expert knowledge by using state-of-the-art machine learning (ML) techniques to learn from proof objects. A common approach is to learn on a set of features in the proof state to determine which tactic should be chosen. The progress of such approaches has been relatively slow when considering the victories of ML in other domains. In addition, ML models are notoriously difficult to understand, and in turn the output proof objects, even if they are valid, may be difficult to understand because of an inefficient model. There are a few considerations to investigate before leaping head-first into the ML paradigm.

Proof Strategies

This thesis approaches the problem of automation through the lens of proof strategies. In short, a strategy is a procedure for guiding the proof process, describing how an inductive hypothesis (IH) is to be applied, as well as how the structure of definitions in the specification can correspond with proof steps. In this sense, strategies will not provide tactic sequences for full automation, but will instead give a frame that can be seen roughly as a guide; further refinement of that frame through search or proof reuse can reach the final tactic sequence.

The focus of this thesis is on defining a strategy and extracting instances of strategies from completed proofs. This involves interacting with a proof system to capture the relevant information about a proof, then implementing a procedure that processes and abstracts the data to the level of a strategy. A key component of this process is the representation of proof data, which can be detailed at varying levels of abstraction and can contain simple or complex constructs. While this work will not reach the point of automation via strategies, it provides the necessary groundwork for such a system. Going backwards, starting from already completed proofs and extracting strategy instances from them, is a useful approach because it shows how the strategies are embedded in existing proofs, giving key feedback in the development of strategies before trying to prove theorems from scratch. Further, this corpus can be used as a data set for ML algorithms to classify theorems with specific strategies, where strategy selection will be a key component of the imagined automated prover.

In addition, this idea can be co-opted for use in ML-generated proofs, namely for better understanding. When performing an interactive proof, users may leave comments, much like a program is commented, to give insight into why tactics are being used and to catalog higher-level strategies. In addition, users will follow some internal heuristic for which choice to make when there are many that will lead to a completed proof. This can be seen in cases where tactic applications are independent of one another and can be interchanged in the sequence. So, in an ML-generated proof, not only are comments lost, but the proof structure may be mangled as well. Thus, the first goal of this work is to reclaim a proof representation that is clear and understandable from a raw tactic sequence. This is achieved by reproving the theorem with the tactic sequence to collect information about the proof state, then abstracting the proof towards an instance of a strategy in a way that makes it digestible.

Thesis Contributions

This thesis seeks to provide the groundwork for an automated proof system using a strategy-based approach. This involves defining the idea of a strategy and extracting instances of strategies from existing proofs, storing the resulting data structures in databases. Completing this task involves the development of the theory for representing the various data structures, as well as the implementation of programs that realize the theory. Thus, the artifacts produced in this process can be split into two types, theory and programs, and are enumerated below:

1. Parsers were developed for the Sig, Mod, and Thm files in Abella. They store the data in Python dictionaries due to their interoperability with JSON formatting. These parsers are useful for producing data that can be queried and analyzed, but their main purpose is to provide the underlying data that will be used by the following programs.

2. A parser was developed for the Abella sequents output to the terminal screen by the Abella application. This includes parsing of instantiations and witnesses (options that the user may turn on or off). The information is stored in a dictionary for easy access, so users can quickly work with the data relevant to their situation. All information output to the terminal is collected, so an automated system using the program is guaranteed to see what the human sees.

3. A system was developed for re-running proof scripts and collecting information about the theorem and proof steps, including theorem dependencies, proof step dependencies, output sequents for each proof step, the tree structure of the proof, and more. This system was implemented in Python, interacting with the Abella application through subprocesses, and the information was stored in the JSON format. This program provides a database with all the information relevant to a proof process, opening the door for ML.

4. The theory behind strategies based on IH applications and definition embeddings is described and supported with various examples. This construction is designed specifically for easy extraction from proof scripts and easy implementation for automating proofs, serving as the basis for the two-level solving approach.

5. Targets are defined to correspond with the components of a strategy, and the implementation details for extracting targets from proof steps are cataloged along with the definitions of the various target types. Targets are annotations on proof steps that connect the proof steps with their abstract intent.

6. The proof tree data structure and its constructors are defined. This data structure captures the hierarchical dependencies between the steps, and groupings of steps, within a proof. The method for extracting a proof tree from a proof is explained.

7. The proof frame data structure is defined, along with the way it can be extracted from proof steps annotated by targets and a corresponding proof tree. The proof frame serves as the basis for strategy-based solving and can be used for both analogy-based and ML-based solving techniques. Beyond the solving aspect, proof frames provide a well-suited abstraction for visualization and proof corpus analytics that can be altered to fit a user’s purpose.

8. The system design for a two-level proof solver is presented, discussing the way in which strategies can be used along with other solving engines to help automate the proof process. The design is also tied back to the database of proof frames, examining ways that the data can be used to improve the system’s capabilities.
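As an illustration of the kind of dictionary the sequent parser (contribution 2) produces, the following sketch parses a simplified rendering of an Abella proof-state screen into JSON-ready data. The layout assumed here (a `Variables:` line, `name : formula` hypotheses, and a row of `=` before the goal) is an approximation of Abella's actual output, and the function name is invented for this sketch.

```python
# A simplified sketch of a sequent parser: given the text a prover
# prints for a proof state, recover a dictionary with the bound
# variables, the named hypotheses, and the current goal.
import json

def parse_sequent(text: str) -> dict:
    seq = {"variables": [], "hypotheses": {}, "goal": ""}
    above_line = True
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if set(line) == {"="}:          # separator between hypotheses and goal
            above_line = False
        elif above_line and line.startswith("Variables:"):
            seq["variables"] = line.split(":", 1)[1].split()
        elif above_line and " : " in line:
            name, formula = line.split(" : ", 1)
            seq["hypotheses"][name.strip()] = formula.strip()
        elif not above_line:
            seq["goal"] = (seq["goal"] + " " + line).strip()
    return seq

screen = """
Variables: M N
H1 : {nat M}
H2 : {nat N}
============================
 exists K, {add M N K}
"""
print(json.dumps(parse_sequent(screen), indent=2))
```

Because the result is a plain dictionary, it serializes directly to JSON, which is what makes the dictionary representation convenient for building the proof database described above.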

Thesis Overview

First, a survey of related work and its relevance to this thesis is presented. This section first examines proof systems and their key features, along with the research done in automating theorem provers. To realize proof abstraction and automation, it is necessary to focus on a specific system, with its own unique logic and representation. The chosen system is Abella, a niche proof system built for the domain of programming languages. As it may be unfamiliar to some readers, Section 3 introduces the logic of Abella with a complete example proof, then references other examples that can be found in the appendix. Next, in Section 4, the theory behind proof strategies and their components is laid out, accompanied by examples. The first step towards abstracting proof information is collecting that information, and Section 5 details the programs implemented for this purpose. Section 6 gives the structure of an abstracted proof, starting with targets and nodes, then exploring proof trees and proof frames, the structures that tie the strategies and proofs together. These descriptions are accompanied by implementation details and examples. Section 7 explores the question of automation, examining a few possible approaches for a prover built to use strategies. Section 8 concludes the thesis with remarks on the work completed and the potential for future work.

RELATED WORK

Much work has been done on automated theorem proving in higher-order logic over the past few decades, attempting to bridge the gap between interactive theorem proving and automation. Before getting into the specific achievements of related research, it is important to study the various proof systems being used throughout the research community and industry. Several systems, including Coq, Lean, and Isabelle, are compared with a focus on the unique aspects of each, both in the logical framework and the implemented system. The proof system Abella used in this thesis is introduced in Section 3.

Next is a study of the automation methods employed over the last few decades.

The attempts to solve various aspects of the automation problem can roughly be grouped by the following subproblems: knowledge organization and selection of lemmas/definitions (what knowledge is needed), complicated tactic discovery algorithms (what actions are possible), feature extraction/proof structuring (what can be learned), data set accumulation (what can we learn from), learning techniques to solve proofs, and leveraging automation from other heuristic-based approaches.

Specifically, this review starts with the automated, human-out-of-the-loop approaches. It begins with the ML approaches, covering work on using ML for tactic choice and on reinforcement learning (RL). Next are the topics of sequence discovery and tactic evolution, EFSMs, control algorithms, and hammers. These sections give background for how automation techniques can be incorporated as an extension of the system presented in this thesis.

Then, the strategy-like approaches are covered. These are approaches that require expert knowledge to construct heuristic-driven strategies for solving proofs. Another important aspect of this approach is classifying theorems for expertly developed heuristic solvers. The work in this thesis starts from the idea of strategies like these but looks to subvert the need for expert knowledge by building a database that can be mined for learning.

The last section looks at a thesis on proof representation, which was a significant motivation for the direction of this work. The various sections of related work are then tied together into a unified picture, deriving the motivation and merit behind the direction of this thesis.

Proof Systems

The proof systems Coq, Lean, and Isabelle (HOL) are surveyed below. For each proof system, there are some key elements to pay attention to: the underlying logic of the proof system, the style of proofs within the system (forward or backward), the domains commonly explored within the system, the corpus of proofs existing for the system, and the output of the system. These core aspects of logical frameworks were enumerated in [3], which serves as a good starting point in understanding how to differentiate the systems. Additional features like popularity, ease of use, levels of automation, and documentation can also hold relevance when deciding which proof system to use or study.

Coq

Coq [4] is a proof system in higher-order logic that allows both automatic verification of proof developments and interactive theorem proving. The logical foundations for Coq come from the Calculus of Inductive Constructions (CIC) [5]. In short, CIC is a pure type system built on top of the Calculus of Constructions. The system uses a cumulative hierarchy of types, with restrictions on which types can be predicative or impredicative to preserve consistency. Proofs are observed as objects under the Curry-Howard isomorphism, giving rise to a system with a single judgement, the type judgement. This judgements-as-types paradigm forces the creation of explicit proof objects, which can be extracted as programs in functional languages like OCaml and Haskell. Further, the underlying verification in Coq is performed by its type-checking system, which can be understood as compilation of the program and proofs of correctness in relation to a set of specifications. As such, developments for verifying properties of programs and extracting said programs, such as the certified compiler CompCert [1], have been undertaken in Coq. Specifically, the specifications for proof developments in Coq are written in the language Gallina, and computation is defined using a functional programming approach, though relations can also be written in a Prolog style. For interactive development, Coq provides tactics, programs that transform proof states, with varying levels of complexity and automation. One language that allows the creation of complex tactics is Ltac, extended by Ltac2, with tactic programs written like ML programs. The unfortunate result of such tools, along with built-in tactics like auto that can merge many simpler tactics, is that proof developments employing them require unfolding the proof into simpler tactic applications for the structure of the proof to be understood. It may be interesting to understand and learn from proofs employing automation in a coarse-grained way, but this thesis relies on proofs with simple tactics, where steps are not hidden from the user. As such, a corpus like CoqGym, with proof developments containing the tactic auto, would not be well suited for this work.

Nuprl

Nuprl [6] is a proof system similar to Coq, focused on capturing proofs as computable functions and providing an interactive environment that allows proof development and the evaluation of derived functions. In further comparison, Nuprl also allows automation through programs in the ML programming language that compute proof objects. The logic is based on constructive type theory with cumulative universes, with the addition of quotient and set types to the usual base types. In addition, due to the focus on computation, terms are viewed as either canonical (a value) or non-canonical, in which case they can be evaluated until they become a value. The process of proof is through refinement, a.k.a. back chaining, of the goal. This process necessarily includes proofs of well-formedness, as well-formedness is undecidable for terms in the theory. Nuprl provides libraries for computational mathematics and various systems.

Lean

Lean [7] is a relatively new proof system developed by Microsoft in collaboration with Carnegie Mellon University. Lean provides diversity in the underlying dependent type theory, allowing formalizations like CIC or Martin-Löf type theory depending on whether the Prop type is impredicative. In addition, Lean allows declarative or imperative proof building, essentially allowing proof objects to be inferred implicitly by the system or created explicitly. Lean provides interoperability with automation tools like SMT solvers and is pushing towards higher levels of automation within a proof development. Further, libraries in category theory and homotopy type theory are under development. There are high hopes for Lean within the theorem proving community, but its current stage of development makes it difficult to use for research, as the current system may not be compatible with updates in the new rollout of Lean 3.

Isabelle

Isabelle [8] is a generic proof system that has been instantiated for different logics, the most common being HOL [9]. Isabelle does not explicitly construct proof objects with deductive steps, but instead uses a declarative approach. This is achieved by writing axioms in a form similar to Horn clauses, with the extension of types. This, however, forces unification in the backtracking search to be higher-order unification, which in general is undecidable. This is handled using a lazy list of unifiers in cases where the list can be instantiated. Isabelle uses resolution as the method of proving theorems. The process of using resolution on rules is bidirectional, so there are no strictly forward or backward proofs. However, tactics specifically work backwards, taking a set of rules and returning a list of proof states, where each state represents a different unifier.

The declarative model does not work for the explicit proof representation that this thesis seeks to achieve, where proof steps are captured directly and without search procedures.

Automation with Learning

Machine Learning This section surveys the recent work leveraging ML to produce tactic sequences. The work in [10] both synthesizes a data set and presents a method for tactic creation using deep learning. CoqGym is a data set that contains 70,835 human-written proofs drawn from 123 projects with varying backgrounds, including software verification and type theory. In addition, synthetic proofs are derived from subgoals within the human-written proofs (these are not used for training as they might bias the model). ASTactic is the method given to generate tactics as programs.

There are a few restrictions, namely that only atomic tactics are generated, and a goal is encoded with up to 10 premises along with the entire local context (when there may be thousands of premises in the global environment). Encoding is done with a TreeLSTM to preserve the structure of the goal statement, and the output tactics are also modelled as ASTs, arising from a limited CFG to reduce the output space. More work needs to be done on argument selection, as for some tactics like induction, quantified variables are selected at random. The model outperforms the built-in auto tactic in Coq by 10%, but underperforms the hammer by 5%–12%. When combined with the hammer, the model outperforms the hammer alone by 5%. Proof storage is also explored in [11], where a multivariate analysis library and a proof of the Kepler conjecture are taken from the theorem prover HOL Light and stored with an eye on simplification. A method is used to remove parentheses and application operators from applied functions, leveraging the idea of De Bruijn indices. Regression and CNN testing is done on the data set to evaluate the new abstractions. The work in [12] looks at the question of knowledge selection. A set of matching, abstraction, and unification features are extracted from a large corpus and used to improve premise selection in proof search. The contributions of this work are to present new features and to provide an architecture for extracting features, using discrimination trees to match features and substitution trees to unify features. In this way, features can be generalized to improve premise selection for similar proofs. Testing was done on proofs in the Mizar library with two models, k-NN and naïve Bayes, both showing large improvements over benchmark methods.
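The De Bruijn idea mentioned above can be illustrated with a small sketch (not the cited implementation; the term format and function names are assumptions for the example): replacing each bound variable name with its distance to the binder makes alpha-equivalent terms identical, removing naming noise from stored proof terms.

```python
# Toy De Bruijn conversion: a term is ("lam", name, body), ("app", f, a),
# or ("var", name); env tracks binders, innermost first.

def to_de_bruijn(term, env=()):
    """Replace bound variable names with binder distances."""
    kind = term[0]
    if kind == "var":
        return ("var", env.index(term[1]))
    if kind == "lam":
        return ("lam", to_de_bruijn(term[2], (term[1],) + env))
    return ("app", to_de_bruijn(term[1], env), to_de_bruijn(term[2], env))

# \x. \y. x y  and  \a. \b. a b  receive identical encodings:
t1 = ("lam", "x", ("lam", "y", ("app", ("var", "x"), ("var", "y"))))
t2 = ("lam", "a", ("lam", "b", ("app", ("var", "a"), ("var", "b"))))
```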

TacticToe [13] is a system that reasons with HOL4 tactics. The system relies on the current goal to predict a single tactic. This has the drawback of losing information from previous proof steps, but greatly simplifies the model. To select tactics, a k-nearest neighbor classifier is trained. In addition to the normal features, the top-level structure of a statement with placeholders for atoms is included. The idea is that top-level structure is useful in determining when logical tactics are necessary to break apart a goal statement.
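The nearest-neighbor selection above can be sketched as follows (illustrative only; the real system uses richer features and a TF-IDF-based distance, and all names here are assumptions):

```python
# Toy k-NN tactic selection: recorded (goal, tactic) pairs vote for the
# tactic to apply to a new goal, ranked by feature-bag overlap.

from collections import Counter

def features(goal):
    """Bag-of-symbols features for a goal statement string."""
    return Counter(goal.replace("(", " ").replace(")", " ").split())

def similarity(f1, f2):
    """Size of the multiset intersection of two feature bags."""
    return sum((f1 & f2).values())

def predict_tactic(goal, examples, k=1):
    """Vote among the k recorded pairs nearest to `goal`."""
    ranked = sorted(examples,
                    key=lambda ex: similarity(features(goal), features(ex[0])),
                    reverse=True)
    votes = Counter(tac for _, tac in ranked[:k])
    return votes.most_common(1)[0][0] if votes else None

examples = [("A /\\ B ==> B /\\ A", "conj_tac"),
            ("!n. n + 0 = n", "induct_tac")]
```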

The proof process was performed with A* search and a co-distance value defined by TF-IDF from previous work for the k-NN classifier. A set of 500 tactics is ranked based on the co-distance (distance to a solution), and nodes are expanded with the best-fit tactic first. Following each step, previously expanded nodes may be expanded again in parallel if they have more tactics with higher co-distances. A "small hammer" strategy was incorporated, where each new subgoal was passed to a hammer for a short period to find a proof. TacticToe outperformed the next best hammer strategy on the HOL4 standard library during testing. GamePad [14] is another system that supports tactic-level choice.

GamePad was developed for the Coq theorem prover and tested on the Feit-Thompson theory library. Theorems are viewed as games, with a starting state that contains the conjecture and environment, and tactics are moves towards a final state. GamePad works inside of Python, with an API for manipulating abstract representations of proofs and terms in Coq. All output proofs are run within Coq to ensure validity. Two models are trained: one for predicting the remaining number of proof steps before completion (which may be useful in interactive scenarios), and the other for predicting the next proof step. Testing was done with SVM, GRU, and TreeLSTM models, with the tree-based NN outperforming the first two, supporting the idea that tree-based representations and embeddings are useful. In [15] formulas are represented as graphs, allowing embeddings for a node to consider both children and parents, unlike TreeRNNs that only consider children. In addition, graphs can share nodes and leaves, removing repeated information in the tree, and all variables were renamed to a single name x. A GNN was tested on the HOList benchmark for two cases: one where nodes only had edges to children (as in a tree), and one where nodes had edges to parents as well. The latter performed better, as subexpressions in nodes rely on their context (parents).

There are two important takeaways from the work presented above: one, ML approaches are using more complicated structures for representing features within a proof to enhance learning; and two, ML approaches tend to work better on smaller proofs. The latter prompted the motivation for developing a two-level solver that provides a proof outline via a strategy, leaving gaps that ML tools can be used to solve.

Reducing the proof solving problem to smaller subproofs would thus align well with the current state of research. The former was an inspiration to reimagine the features of a proof at an abstract level, providing structure without specific details like tactic arguments that can clutter the data. This was achieved with an approach similar to the work of Proof Capture described below, which allows proofs to be viewed at a higher level of abstraction.

Reinforcement Learning This section explores the use of reinforcement learning (RL) to solve proofs. The first example [16] focuses on the domain of quantified boolean logic. It is noted that boolean formulas can contain thousands of variables, whereas text fragments passed into neural nets are typically much smaller. In addition, variables in this context are devoid of meaning. This work sets out to train a GNN through deep RL, which selects actions for a backtracking algorithm that solves the quantified boolean formulas. The architecture was trained and run within the CADET software, and four experiments were considered: predicting actions in a family of proofs; generalizing policies for shorter proofs to longer proofs; generalizing policies for proofs of one family to another; and computational tradeoff. The system positively answered the first two, but failed on the third, showing a lack of generalization.

This means the policy learned on one family of proofs was not transferrable to another family. For the fourth question, the comparison is between brute-force search and prediction with a NN, where brute-force search can perform 10 actions in the time a NN makes one prediction. Thus, the NN would have to make actions that were at least ten times better than the original heuristic approach. It succeeded in matching the benchmark VSIDS algorithm, and the authors note that their implementation is unoptimized and execution time can be further reduced. In [17], RL is taken to the domain of HOL for the proof system HOL Light with the system DeepHOL-zero. A couple of features of proofs make RL difficult, namely the sparsity of rewards and the infinite action space. However, starting the RL process with seed data from imitation learning has significant drawbacks: the exploration process is stymied, training data is required for each proof assistant, new fields have few existing proofs, and exploration cannot go beyond the seed data to become generalizable. To solve the problem of premise selection, the RL agent starts with a useful similarity metric, ranking premise similarity to the goal. The idea is that similar premises will be used in tactic application under a certain goal. Lastly, the agent is trained on failure to avoid unsuccessful branches. After testing different exploration approaches, DeepHOL-zero was found to outperform the imitation learning architecture DeepHOL [18], but it did not match the imitation-plus-RL architecture presented by Bansal.

Taking a different approach, the work in [19] uses Monte Carlo simulations to solve proofs in the leanCoP system for FOL. The tableau architecture is used, where a proof proceeds by searching for refutations. The prover is incomplete as there are no constraints on which clause to extend (infinite loops can occur in even trivial examples), which is normally solved via exhaustive strategies. Instead, Monte Carlo search is used, where nodes in a successful proof search are given value 1, with discounting applied with respect to the distance away from the start node. Tests were done on FOL theorems from the Mizar Mathematical Library using XGBoost for learning instead of a deep NN, and 40% more proofs were solved than the baseline. The work of this section shows that there is potential for RL in the domain of proof theory, but work needs to be done to improve exploration and generalization and to reduce the search space.
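The discounted value assignment described above can be sketched in a few lines (an assumption of this sketch: a successful search is recorded as the list of nodes from the start node to the closed branch, with a discount factor such as 0.9):

```python
# Assign value gamma**d to each node at distance d from the start of a
# successful proof-search branch; these values can serve as training
# targets for guiding future clause selection.

def branch_values(branch, gamma=0.9):
    """Map each node on a successful branch to its discounted value."""
    return {node: gamma ** d for d, node in enumerate(branch)}

values = branch_values(["start", "extend_c3", "extend_c7", "closed"])
```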

EFSMs, Controlled Search, Hammers In this section, we will consider a hodgepodge of systems including EFSMs, strategy finders, and hammers, all of which are geared towards solving proofs automatically. The system SEPIA (Search of Proofs Using Inferred Automata), presented in [20], uses the model inference technique MINT, presented in previous work, to infer EFSMs, and uses paths through an EFSM to provide a solution to a proof. An extended FSM has the additional property of guards on its transitions, so certain transitions can only be taken if the guards evaluate to true, based on the system memory variables. Sequences of tactics for Coq proofs were viewed as trees, where each goal state was an unnamed node and each tactic was a transition (guards were placed on transitions based on the MINT strategy). Equivalent states were merged to reduce redundancy in the EFSM. In addition, the merging process created new tactic paths, potentially identifying new proofs. Breadth-first search was used to move through the EFSM, with a maximum of 10,000 tactics applied before failure. The system outperformed Coq's automated tactics, and in some cases provided shorter proofs than the given data set. The contribution of this work was to test the EFSM inference algorithm by automating proofs using BFS and comparing the performance against benchmark provers.
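The search described above can be sketched as BFS over a guarded transition system (the machine encoding, guard representation, and all names below are assumptions; SEPIA's machines are inferred from real Coq traces):

```python
# BFS over a toy EFSM: transitions carry a guard (a predicate over the
# tactic path so far) and a tactic; search stops at an accepting state
# or after a tactic-application budget is exhausted.

from collections import deque

def bfs_tactic_path(efsm, start, accept, max_tactics=10_000):
    """Return the first tactic sequence reaching `accept`, or None.
    efsm: {state: [(guard, tactic, next_state), ...]}."""
    queue = deque([(start, [])])
    applied = 0
    while queue and applied < max_tactics:
        state, path = queue.popleft()
        if state == accept:
            return path
        for guard, tactic, nxt in efsm.get(state, []):
            if guard(path):
                applied += 1
                queue.append((nxt, path + [tactic]))
    return None

# Toy machine: the guard requires intros before apply.
machine = {
    "s0": [(lambda p: True, "intros", "s1")],
    "s1": [(lambda p: "intros" in p, "apply H", "qed")],
}
```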

Certain algorithms have a set of parameters that can be modified to alter the algorithm execution, and the choice of parameter values is called a strategy1. One example would be choosing which node to expand next in a search algorithm. ParamILS [21] was a system built to optimize an algorithm's performance by selecting suitable strategies for a given class of problems. The optimization happens through iterated local search: given an initial configuration, select a parameter to modify, evaluate performance, and with each loop perturb the strategy to move out of local minima. Simple time improvements are made, such as giving up on the iterated local search when the parameters perform worse than a known best strategy. This was applied to SPEAR, an algorithm that implements tree search procedures to solve SAT problems, and to CPLEX for solving mixed integer programming problems. A key point in this work is that an objective function is necessary to determine when one strategy is better than another. This idea is expanded in BliStr [22], where strategies are developed for the E automated theorem prover. In short, 280 predefined strategies are modified and collapsed into a minimal set based on performance on 200 randomly selected problems. This process is achieved by running a loop where BliStr selects N strategies with at least eight best solutions (on the 200 problems), modifies some strategies with ParamILS, then updates the best solutions. The idea here is that good strategies come from good strategies. In other words, to evolve new strategies, good strategies must be seeded into the system. A similar approach is taken in MaLeS [23], where strategies are used to prove theorems from TPTP on the theorem provers E, LEO-II, and Satallax. This work attacks three problems: one, find an initial set of strategies that can be a seed for local search; two, define a set of features that are computationally inexpensive to retrieve but diverse and expressive; three, schedule strategies. The first is achieved by taking a list of problems and a list of strategies, and for each strategy doing a local search to find faster strategies (here faster refers to the execution time for solving a problem in the problem set). This is like ParamILS except it focuses on a specific problem and not the entire corpus. Then, machine learning is used to predict which strategy will work best on a given problem, with features coming directly from Schulz's work with E and the TPTP library. Lastly, the scheduler first selects the overall best strategy (to get rid of easy-to-solve problems), then selects a strategy based on the learned strategy prediction model. These works constitute alternative ways to imagine strategy creation using local search. One major drawback is that a predefined set of strategies is required for each. The assumption that good strategies come from good strategies may be true, but this does not bode well for solving unseen problems.

1 This definition of strategy refers mainly to control parameters of an algorithm and is not the same as the strategies defined in this thesis.
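The iterated local search loop at the heart of these systems can be sketched as follows (a stand-in objective replaces the real solver-runtime evaluation; the parameter space and all names are assumptions of this sketch):

```python
# Iterated local search: hill-climb by one-parameter changes, then
# perturb the best configuration to escape local minima and re-descend.

import random

def local_search(config, space, cost):
    """Greedy descent over single-parameter changes."""
    improved = True
    while improved:
        improved = False
        for param, values in space.items():
            for v in values:
                cand = dict(config, **{param: v})
                if cost(cand) < cost(config):
                    config, improved = cand, True
    return config

def iterated_local_search(space, cost, rounds=20, seed=0):
    rng = random.Random(seed)
    config = {p: vs[0] for p, vs in space.items()}
    best = local_search(config, space, cost)
    for _ in range(rounds):
        perturbed = dict(best)
        p = rng.choice(list(space))
        perturbed[p] = rng.choice(space[p])
        cand = local_search(perturbed, space, cost)
        if cost(cand) < cost(best):
            best = cand
    return best

space = {"restart": [10, 100, 1000], "branching": ["vsids", "random"]}
cost = lambda c: abs(c["restart"] - 100) + (c["branching"] != "vsids")
best = iterated_local_search(space, cost)
```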
A common tool used to prove a certain subset of goals in HOL is a hammer. The idea of a hammer is to translate a goal into FOL, solve the goal with resolution or some other automated FOL procedure, then translate the proof steps back into HOL. Of course, hammering can only be done on a subset of HOL theorems, as HOL provides additional tactics and definitions inhibiting translation. The work in [24] looks to convert subgoals of a HOL proof into FOL and prove them with a hammering technique. The hammer in this case is a set of three procedures: resolution, model elimination, and the Delta preprocessor. These procedures are run in parallel, and they share unit clauses so that each procedure can make progress on different pieces of the output proof. An LCF-style kernel maintains the FOL translations and performs quick conversions back to HOL when necessary. It is noted that classic automated theorem provers care about soft deadlines, e.g., the overall time to complete a proof or a series of proofs, whereas this system looks at hard deadlines, e.g., each subgoal must be proved within 5 seconds of the hammer being invoked. This is because the hammer is invoked on subgoals during the process of a user-guided proof, and a user cannot wait extended periods of time between each proof step. This sort of time constraint will be important in the work I am considering, as it anticipates the interaction between users and the tool. The system outperformed any individual hammering procedure when run on a set of subgoals.

Techniques such as hammering and controlled search could be useful for automating simple cases within a proof. As mentioned previously, the strategy-based proof solver presented in this thesis purposefully leaves holes in the output proof structure which are to be filled in by separate processes. Search or hammering could automatically produce the sequences of tactics that fill these holes.

Automation with Expert Knowledge and Strategies

Rippling Rippling [2] is one of the most well-developed algorithms for solving inductive proofs using reduction and rewriting. The core idea behind rippling is that in an inductive proof the inductive conclusion can be processed to the point that the inductive hypothesis is embedded inside the conclusion, and from there the proof follows directly. This process of discovering the embedded inductive hypothesis within the inductive conclusion is captured in the idea of rippling-out. In short, there are portions of the inductive conclusion called wave-fronts which contain pieces of the inductive hypothesis called wave-holes. The rewriting process is done with wave-rules that enforce a certain heuristic, ensuring each rewrite moves towards connecting the pieces of the inductive hypothesis that are scattered between various wave-fronts and outside wave-fronts. There have been several extensions to this general algorithm, looking at heuristics for applying wave-rules, ways of generating new wave-rules, learning from failure, and adapting it to specific domains through expert knowledge.
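A standard illustration of rippling is the associativity of addition (this is the textbook example, not a result of this thesis; wave-fronts, usually drawn as boxes in the literature, are simply the s(.) constructors introduced below):

```latex
% Inductive hypothesis (IH):   (x + y) + z = x + (y + z)
% Inductive conclusion:        (s(x) + y) + z = s(x) + (y + z)
% Wave-rule from the recursive definition of +:   s(X) + Y \Rightarrow s(X + Y)
\begin{align*}
  (s(x) + y) + z &\Rightarrow s(x + y) + z \Rightarrow s((x + y) + z)\\
  s(x) + (y + z) &\Rightarrow s(x + (y + z))
\end{align*}
% Both sides now contain the IH inside the wave-front s(.), so rewriting
% with the IH closes the goal.
```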

Being able to discover complex algorithms such as rippling from proof data would be extremely exciting, not only for the effect on solving proofs, but for discovering the implicit structure of inductive proofs at a higher level of reasoning. While the strategies presented in this thesis are far simpler than the components and heuristics involved in rippling, it may be possible with the right abstractions of proof data to learn something of the sort. This thesis works towards producing a representation that could yield these higher-level reasoning techniques.

RE-based Strategies The work in [25] looks to learn methods for solving proofs that are classified into proof families. The idea is to collect a small set of proofs that act as representatives for a proof family, then create a method that generalizes the tactic sequences used across the representative proofs in a minimal way. Tactic sequences are generalized through methods which share an inductive definition similar to regular expressions, with constructions for sequences of methods, options between methods, Kleene star on methods (repetition zero or more times), fixed repetition, and a list corresponding to branching. An algorithm is presented that takes the tactic sequences from a set of representative proofs and unifies them into a method using the constructions described above, where the method is minimal on a defined size metric and complete in the sense that it covers all tactic sequences of the representative examples.
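The method language above can be sketched together with a matcher that checks whether a tactic sequence is covered by a method (the constructor names are assumptions of this sketch; the cited work also defines the size metric and the minimal-generalization algorithm, which are omitted here):

```python
# Methods as nested tuples, mirroring the regular-expression-like
# constructions: ("tac", name) single tactic; ("seq", m1, m2) sequence;
# ("or", m1, m2) option; ("star", m) zero or more; ("rep", n, m) exactly n.

def matches(method, tactics):
    """True if the tactic-name list is fully consumed by the method."""
    def consume(m, rest):
        """Yield every suffix left after matching m against rest."""
        kind = m[0]
        if kind == "tac":
            if rest and rest[0] == m[1]:
                yield rest[1:]
        elif kind == "seq":
            for mid in consume(m[1], rest):
                yield from consume(m[2], mid)
        elif kind == "or":
            yield from consume(m[1], rest)
            yield from consume(m[2], rest)
        elif kind == "star":
            yield rest
            for mid in consume(m[1], rest):
                if mid != rest:  # guard against empty-match loops
                    yield from consume(m, mid)
        elif kind == "rep":
            if m[1] == 0:
                yield rest
            else:
                for mid in consume(m[2], rest):
                    yield from consume(("rep", m[1] - 1, m[2]), mid)
    return any(r == [] for r in consume(method, list(tactics)))

# induction; any number of simp steps; finish with either auto or blast.
method = ("seq", ("tac", "induct"),
                 ("seq", ("star", ("tac", "simp")),
                         ("or", ("tac", "auto"), ("tac", "blast"))))
```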

This work was extended by Duncan in [26]. The general idea is to find a series of commonly occurring tactic sequences that is diverse but concise and use evolutionary algorithms to develop new tactic sequences from the base set. For the implementation, Duncan used the Isabelle ITP, and a grammar for tactics derived from Isabelle (tactics that needed arguments were not within the scope of this work). Several pattern finding algorithms from bioinformatics were considered, including Sparse Markov Transducers and the Teiresias algorithm, but Duncan chose a variable-length Markov model approach to resolve some issues the previous algorithms presented (as they were biased towards biological data sets). A threshold was used to determine when a sequence should be considered significant, as too many sequences would cause a lack of generality, and too few sequences would cause a lack of diversity. Using a grammar for tactics, Koza's genetic programming was used on the initial set of sequences, with a spread of 50% crossover, 25% mutation, 25% reproduction, and a fitness score based on the number of theorems a sequence could solve in the corpus. This was later modified with an approach that favored crossover (swapping branches from independent sequences). The contributions include developing a way to find commonly occurring sequences of tactics, and NewT, a system for evolving those tactic sequences to create better sequences (meaning they can prove more theorems than the original sequences).

Defining methods with a simple set of constructions and generalizing similar proofs using those methods is an idea this thesis borrows from the works above.

However, the methods are built up from tactic commands. This causes issues with argument discovery and generalizability across specifications. This thesis takes the idea of methods and abstracts it to the concept of a strategy, which is built up not from tactics but from less specific components, allowing strategies to generalize across specifications, unlike the methods. The problem of selecting representative proofs based on expert knowledge is something Duncan sought to address with pattern discovery and evolution. This thesis explores the possibilities of both discovered and expert-driven strategies, where both options have merits and drawbacks.

Classifying Formulas for Heuristic Solutions For this section, a couple of works focused exclusively on the classification of proof statements will be considered. In the dissertation [27], Bridge was one of the first to apply ML techniques to the domain of theorem provers. Only theorems in first-order logic were considered, namely in the theorem prover E. Also, the Robinson method of resolution was used exclusively to solve proofs, with the only variation being the heuristic used in the resolution procedure. Heuristic here means a way of guiding search in resolution. Thus, the task was collapsed into classifying conjectures into one of five sets, corresponding to five different heuristics, where the heuristic chosen should be the one best suited to solve the proof. For initial testing of the feature set, only a subset of the TPTP library was considered. SVMlight was used with several different kernel functions, and it was discovered, through a process of random removal and filtering of features, that only two of the 16 preselected features were useful in classification. The data set was broadened to the entire TPTP library, and three separate feature sets, mixed between static and dynamic (found during the proof process) features, were used. The feature sets gave similar results, making it seem that classifying the proof before running it for a short period of time is more cost-effective. This method consistently proved more theorems than a single heuristic could on its own. Bridge's main contribution was to provide a system for testing the usefulness of certain features for a classification task, and to experiment with first-order logic, showing that ML techniques could in fact be used to improve automated proof.

Next is the work in [28], where researchers from Google employ a NN architecture to classify entailment tasks as true or false. The work gives a way to generate a data set for testing, then employs several NN models, some of which capture the tree-like structure of a proof, and some of which are linear. It was found that the NN architectures that captured tree structures were better suited for the classification task. This work was extended by creating the possible worlds model, which captures a form of model checking, and thus is specific to the domain of entailment. This model outperformed all general NN architectures. The important contribution of this work was to test and show the gain in using tree-based NN architectures.

Using ML to classify theorem formulas to match a specific strategy is a possible approach to implementing the strategies presented in this thesis. The alternative would be to use heuristic selection in controlling how a strategy is applied. The work by Bridge shows that only a few features impact how effective a classifier is in selecting a proper heuristic solver to prove a theorem. This idea can be carried over to strategies, where only a select few features are used to determine how a strategy should be applied to a theorem. However, as strategies become more complex, or, in the case of Bridge's work, if there were many more specific heuristic solvers, the set of important features distinguishing theorem formulas will likely grow.

Proof Capture The work by Velykis [29] shares many commonalities with the work in this thesis, and indeed served as a prime motivation for its direction. Velykis presents a system that captures the process of a proof through a data structure called the ProofProcess. This structure serves as an extension of Hi-proof [30], incorporating additional user-defined features such as intent, relevant features, and pre/postconditions, all in a hierarchical structure. This hierarchical structure allows the user to explicitly describe the strategy being used when solving a proof, including important information like the way certain steps were taken or how the steps should be organized. While this presents a large burden to the user, which is especially consequential when proof developments are already time-consuming, Velykis describes ways in which this can be addressed. One such solution is providing suggestions to users about which features to select, or automatically selecting the features with ML algorithms. The data was not directly used to generate strategies, but the process for generating such strategies is laid out, and work on strategy generation is referenced.

With expert user insight captured directly during the process of a proof, it should be possible to greatly enhance the effectiveness of ML based approaches at proof solving.

The input data has already been pre-processed to weed out all non-important information, and the proof has been organized with additional structure not apparent from a proof script. This is an almost ideal data set. Two problems come to mind: giving users the ability to define new features as they please could lead to patterns being lost under different naming, and users may not always communicate with the system why they are performing certain steps (they may not even know why themselves). These aside, developing such a data set is highly desirable. This thesis builds on the same underlying belief, but with a different approach. The hierarchical structure will be developed automatically, without user input, and as such the feature set of the structure will be significantly more limited. The tradeoff will allow the data to be more directly connected to proof solving strategies, as those very strategies are informing the data abstraction in the first place. As an extension of this work, there may be middle ground to be found between automating the development of a rich proof structure and having expert users explicitly create the structure, which would enhance the complexity of possible strategies to be learned.

Conclusion There are opposing forces within the two directions of work in automation: on the one hand, machine learning techniques are being thrown at large proof corpora with sophisticated feature processing in an attempt to black-box the solving of proofs; on the other hand, expertly crafted algorithms or strategies for proof families are developed to capture reasoning for certain types of proofs. It would be desirable to leave aside expert knowledge, but also keep the effectiveness and understandability of strategies for reasoning about proofs. This thesis works towards a middle ground, finding an abstraction of proof data that is amenable to learning strategies and developing proofs without much expert knowledge. It diverges from the Proof Capture work of [29] in two significant ways: one, the proof representations are extracted from proof data automatically, not filled in by experts; two, the proof representations are necessarily strategy focused, so they may be adapted directly to an automated prover setting. While this limits the expressibility of the system, it allows for a quick turn-around for implementing automated solvers. Further, the idea of a strategy presented in this work allows for a hierarchical structuring of a proof, forming a frame that can guide the proof process. From here, the gaps in the proof can be filled using the proof reuse techniques developed with powerful machine-learning systems, along with heuristic-based search algorithms. This gives a two-level structure to the proof solving problem: first using high-level reasoning in a strategy to build an outer structure, then using additional automation techniques to fill in the specific tactic sequences. This provides an abstract representation of a proof at a higher level, useful for understanding general reasoning techniques and for visualization and summarization; it also significantly reduces the search space at the tactic level by reducing the proof to a series of gaps.
Given that the ML solvers worked well with shorter proofs, focusing them on small gaps within a proof should be effective.

AN INTRODUCTION TO ABELLA

This section is meant as a preliminary introduction to Abella. First, the logical framework of Abella is described, giving the technical details insofar as they are relevant in distinguishing Abella from other proof systems. One caveat is the absence of discussion on nominal constants and hypothetical reasoning, as these constructs only complicate matters and the thesis can be justified through simpler examples. Second, an example proof is given, starting with the signature and module files, and concluding with a sequence of interactive windows that a user would see had she been following along on her own system. Lastly, the motivation for selecting Abella, beyond what was said about other popular theorem provers, is put forward.

The Abella Logical Framework A detailed walk-through of Abella can be found in [31], but this section serves to provide enough information to distinguish Abella from other theorem provers and denote the significant choices made in its development. Starting with the reasoning logic, Abella uses the logic G, in which types are user-defined simple types or arrow types, and terms are lambda terms with variables, constants, abstractions, and applications. Binding of variables is managed internally using the lambda-tree syntax, which allows Abella to internalize reasoning about binding in the object language. The type Prop is the type of formulas in the reasoning logic, and predicates are constants with target type Prop.

Formulas are further composed with the expected logical connectives and quantifiers: conjunction, disjunction, universal quantification, existential quantification, and additionally nabla quantification. Equality is included as a connective and checks for syntactic equality of terms, achieved through a decidable unification algorithm that finds the most general unifier. The arrow types do allow for higher-order types (those with an arrow type to the left of an arrow), along with quantification over those types; however, there is no quantification over the proposition type, so it may be said that the logic is first-order. Operations are defined using a relational fixed-point definition instead of a functional approach. A limit to this approach is that fixed-point definitions may not exist if a predicate being defined appears to the left of an implication; this is the so-called stratification condition, which in turn restricts hypothetical reasoning without the introduction of a context. This does not inhibit the system, because Abella implements a two-level logic approach. Aside from the reasoning logic, Abella also allows relations to be written in the specification logic of hereditary Harrop formulas. These are an extension of the Horn clauses one would see in Prolog-like systems, where hypothetical judgements can be introduced within the relations, even though they break the stratification condition of the reasoning logic. But the power of using a two-level approach goes beyond this. Meta-theoretic properties about the specification, including monotonicity, instantiation, and cut (see Page 72 of [31]), are given for free. By filtering specifications through the hereditary Harrop formulas, these properties, already proved in the Abella framework, can be used without additional work. One takeaway for proofs involving meta-theoretic properties of programming languages is the gift of substitution lemmas that need not be proved explicitly.

In terms of using the software, Abella provides direct compilation of theorem files, or a user can work with the interactive proof assistant. The user creates a sig and a mod file, described below, that define the type signature of constants and their relations respectively. Then the user interactively declares formulas to be proved and uses the set of tactic commands to complete the proof. Abella provides the capability to use induction as well as co-induction, with the limitation that these be used on predicates from a relation and not terms of a type. The reason for this is that new constants may be created at any time, breaking the validity of induction on that type (if it were possible); whereas relations cannot be changed, and therefore induction on a predicate cannot be affected by the introduction of new constants.

Proving Add Exists

Signature

The first step in developing an Abella module is creating a signature file (.sig) that contains the user-declared simple types and constants. The types follow the keyword kind, separated by commas. Once types have been declared, they can be used to construct the types of constants. Constants follow the keyword type and have a name followed by a type, which can be a simple type or a type constructed with arrows, where the right-most type is called the target of the constant. Again, Abella allows higher-order types, so an arrow type may appear to the left of an arrow. Lastly, constants with target o can be used as propositions in the reasoning logic. These constants will be used in the module file to inductively define judgements.

In Figure 2 the signature file starts by declaring the name of the specification, example, which will reappear in the module and theorem files. The only simple type introduced is nat, which will be used as the type for natural numbers. The constant z is zero, and the successor function s is defined as expected, taking a natural number as input and returning a natural number – namely, the input number incremented by one. nat is also a constant with target o that will be used to define the judgement for natural numbers.

Such a judgement will be necessary for induction on the structure of natural numbers.

Lastly, add is a constant with three natural numbers as input and target type o, identifying it as a proposition. Hence, add will not represent the recursive function from two natural numbers to an output, but will instead define the relationship between three natural numbers that holds true when the first two add to the third.

sig example.

kind nat.
type z nat.
type s nat -> nat.
type nat nat -> o.
type add nat -> nat -> nat -> o.

Figure 2. Example sig file.

Module

After the signature has been written, the module file can be populated with relations for the constants with target o, i.e., the specification level formulas. These relations are defined using hereditary Harrop formulas, an extension of Horn clauses. The syntax will be familiar to a Prolog user, with head and antecedents in the case of a rule and only the head in the case of a fact. Additionally, upper case letters denote universally quantified variables.

The module file in Figure 3 first declares the specification example. The judgement nat is inductively defined such that z is a natural number and the successor of any natural number is also a natural number. The judgement add is defined with recursion on the first number. In the base case, when the first input is zero, the second and third numbers are unified. The recursive case, when the first number is the successor of some number, is defined in relation to a recursive call to add. Since this is written in a subset of the lambda Prolog language, Abella provides the means to run queries over the specification, as one could in any Prolog system.

mod example.

nat z.
nat (s N) :- nat N.

add z N2 N2.
add (s N1) N2 (s N3) :- add N1 N2 N3.

Figure 3. Example mod file.
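To make the relational reading of add concrete, the two clauses above can be mimicked with a small Python check. This is purely illustrative (the encoding of naturals as nested tuples and the function names are invented here); Abella's specification logic works by unification and back-chaining, not by the structural recursion shown.

```python
# Natural numbers encoded as nested tuples: z is Z, s N is ('s', N).
Z = ()

def s(n):
    return ('s', n)

def add_holds(n1, n2, n3):
    # Clause 1: add z N2 N2.
    if n1 == Z:
        return n2 == n3
    # Clause 2: add (s N1) N2 (s N3) :- add N1 N2 N3.
    if n1[0] == 's' and n3 != Z and n3[0] == 's':
        return add_holds(n1[1], n2, n3[1])
    return False

# 2 + 1 = 3 holds; 2 + 1 = 2 does not.
print(add_holds(s(s(Z)), s(Z), s(s(s(Z)))))  # True
print(add_holds(s(s(Z)), s(Z), s(s(Z))))     # False
```

Note how the relation is checked rather than computed: all three arguments are inputs, matching the propositional reading of add described above.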

Theorem

A theorem may be written in full and compiled, completed interactively through the Abella terminal application, or completed interactively with the assistance of a third-party mediator like Proof General for Emacs. This section will look at the interactive proof setting, giving each window of output as it would come to the user, with explanations for each step, starting with Figure 4.

Abella < Specification "example".
Reading specification "example".

Abella < Theorem add_exists : forall N1 N2, {nat N1} -> {nat N2} -> exists N3, {add N1 N2 N3}.

============================
forall N1 N2, {nat N1} -> {nat N2} -> (exists N3, {add N1 N2 N3})

Figure 4. Add exists proof screen 1.

For these windows, user input will follow the < prompt and end with a stop (period). First, the specification example is read in (that includes the sig and mod file).

Then the formula to be proved is input after the keyword Theorem and a name for the theorem, in this case add_exists. The theorem simply states that for any two natural numbers there exists a third that is the sum of the first two. Quantifiers are represented by the quantifier name followed by a list of upper-case variables, connectives are as one would expect, and formulas from the specification logic are surrounded by curly brackets. The ASCII line separates the hypotheses (above the line) from the goals (below the line) for each proof state. Currently, the conjecture is the goal to be proved.

The induction tactic in Figure 5 selects an inductively defined proposition to induct on, in this case {nat N1}, and places the goal as a new hypothesis, namely the inductive hypothesis labeled IH. The * and @ are used to denote the depth of the inductive argument, where the IH can only be used when the inductive argument has reduced in size.

add_exists < induction on 1.

IH : forall N1 N2, {nat N1}* -> {nat N2} -> (exists N3, {add N1 N2 N3})
============================
forall N1 N2, {nat N1}@ -> {nat N2} -> (exists N3, {add N1 N2 N3})

Figure 5. Add exists proof screen 2.

The tactic intros introduces the hypotheses to the left of the right-most implication, in this case the two natural number judgments. The new goal is to prove the existence of N3 given the assumptions are true, seen in Figure 6.

add_exists < intros.

Variables: N1 N2
IH : forall N1 N2, {nat N1}* -> {nat N2} -> (exists N3, {add N1 N2 N3})
H1 : {nat N1}@
H2 : {nat N2}
============================
exists N3, {add N1 N2 N3}

Figure 6. Add exists proof screen 3.

The tactic case H1 matches the judgement against the corresponding heads in the module file, inverting the rules, in this case producing two branches: one where N1 = z and one where N1 = s N for some N. The first case is presented to the user, and the second is stored as a subgoal displayed after the current goal, shown in Figure 7.

add_exists < case H1.
Subgoal 1:

Variables: N2 N
IH : forall N1 N2, {nat N1}* -> {nat N2} -> (exists N3, {add N1 N2 N3})
H2 : {nat N2}
H4 : {nat z}
============================
exists N3, {add z N2 N3}

Figure 7. Add exists proof screen 4.

The goal is trivially proved based on the first case of add so a simple search will unify N2 with N3 and complete that subgoal, moving the proof state to the next subgoal. Note that search is doing minimal work when compared to automated tactics in other languages that will attempt to apply lemmas and more.

add_exists < search.
Subgoal 2:

Variables: N2 N
IH : forall N1 N2, {nat N1}* -> {nat N2} -> (exists N3, {add N1 N2 N3})
H2 : {nat N2}
H3 : {nat N}*
============================
exists N3, {add (s N) N2 N3}

Figure 8. Add exists proof screen 5.

The IH is used with the apply tactic, using H3 and H2 as arguments and returning the implied result as H4. Note that the goal has s N whereas H3 has N. This is because the depth of N was reduced due to the use of the case tactic in Figure 7, which in turn allowed the application of the IH in Figure 9.

add_exists < apply IH to H3 H2.
Subgoal 2:

Variables: N2 N N3
IH : forall N1 N2, {nat N1}* -> {nat N2} -> (exists N3, {add N1 N2 N3})
H2 : {nat N2}
H3 : {nat N}*
H4 : {add N N2 N3}
============================
exists N3, {add (s N) N2 N3}

Figure 9. Add exists proof screen 6.

Finally, a search will complete the proof, as the leap from H4 to the goal is mediated by the inductive rule of add unifying N3 from the goal with s N3 from H4. Note exists is a binder that will allow for this instantiation as variables can be renamed.

add_exists < search.

Proof completed.

Figure 10. Add exists proof screen 7.
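For reference, the tactics entered across the screens above assemble into the following complete proof script (the indentation, grouping subgoals under the case, is a presentation choice):

```
Theorem add_exists : forall N1 N2,
  {nat N1} -> {nat N2} -> exists N3, {add N1 N2 N3}.
induction on 1. intros. case H1.
  search.
  apply IH to H3 H2. search.
```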

This is quite the process for a relatively simple proof. In addition, it may seem that the user is bombarded with useless information (hypotheses that will no longer be used) and repeated proof states that distract from the proof process. The representation can be made simpler with a tree-like structure, as below, and colorings as described in Section 1.2 of the introduction. But there are still many abstractions that can be made, as will be discovered in Section 6.

Figure 11. Add exists proof diagram.

Why Abella?

The motivation for selecting Abella can be broken down into three parts: simplicity of the logic, relation-based specifications, and familiarity. The logic of Abella is easy to interact with because of the two-level approach, with the specification placed in one file and theorems proved in another. The reasoning logic may be weaker than that of many other systems with higher-order logic, but that is a strength of Abella relative to this thesis. By focusing on a restricted logic, it is easier to prototype and produce promising results that may later be generalized to more expressive systems and logics.

Second, the relation-based specifications are well-suited for back-chaining, a technique used during the search procedure in Prolog-like languages. This form of computation can be leveraged when trying to discover the outer structure of a proof. Lastly, Abella is familiar to the author and advisor. Having proved properties about various type systems in Abella, it is much easier to interact with in terms of parsing data and understanding the proof structure. In addition, the proofs already completed, along with some proofs given in the Abella distribution, are suitable for displaying the work in this thesis; in the first case, the abstract understanding of the proofs is well-grasped, and in the second case the new proofs from the corpus can be tested without bias or artificial selection.

Additional Example Proofs

Throughout the remainder of this thesis a handful of example proofs will be used for visualizing instances of the concepts being described. The sig, mod, and thm files can be found in Appendix A.

PROOF STRATEGIES

This section introduces the idea of a proof strategy. Unlike the strategies developed in some of the related works, which are tactic-focused and produce specific tactic sequences for automating a proof, the proposed conception of a strategy sits at a higher level than tactics. The basic idea is that a strategy describes the way in which an inductive hypothesis is to be applied and provides information relating the generated proof branches to the underlying definitions in the specification. This higher-level proof structure will be represented as a proof frame (see Section 7), which can then be further refined to the point of proof steps and respective tactic applications through various methods (see Section 8).

There are several benefits to defining a strategy in this way. First, the strategies are more generalizable than tactic-based strategies because they are not constrained by definitions or lemmas from specific specifications or domains. This is because extracting tactic sequences from a proof frame happens after the construction of the proof frame via a strategy, and it is during this later stage that one might introduce domain-specific lemmas.

This will also allow strategies to be discovered and learned across specifications. Second, this separates the learning process into a hierarchical approach: first learning strategies, then learning to extract a tactic sequence from a proof frame. It can greatly reduce the search space by providing a general structure for the proof, thus reducing the burden on programs that produce specific tactics. This also allows learning at an abstract level, which can give rise to patterns not apparent between proofs examined at a fine-grained level. Third, this provides a way of classifying proofs that may aid in proof summary and proof understanding. As machine-learning approaches grow more capable, the proofs they produce will need to be understood quickly. A technique that seeks to capture a proof at a higher level can help summarize proofs which humans did not create.

It can also be used as a method for displaying proofs hierarchically, allowing users to view and maneuver through proofs in an efficient interface.

The section continues by introducing some basic inductive proof schemes and how they extend to specification definitions. The question of incorporating additional strategies is considered, and finally the correspondence between strategies and base cases is hinted at. Note this section does not provide a means for selecting a strategy given a goal formula, which will be discussed in Section 8.

Inductive Proof Schemes

At the highest level, a strategy describes the way in which the IH is applied within a proof. Strategies then extend to providing information based on the specification's definitions. First, looking at how an IH can be applied irrespective of a specific specification, we consider inductive proof schemes in increasing complexity. Here, a proof scheme refers to the general actions that are taken in an inductive proof. Note these schemes will only consider the inductive branch of an inductive proof and not the base case.

For the following schemes, diagrams will have hypotheses labeled by H, with an @ symbol and a * symbol for the inductive argument, where * means the argument has been reduced in size and is a viable argument for the IH. For other hypotheses, and for the goals denoted by G, the depth will be annotated with a -1 corresponding to one level higher than the initial inductive argument. Arrows denote the application of the IH as it is labeled, with the input on the left-hand-side and the output on the right-hand-side. Lastly, different elements are labeled by bold numbers for reference in their descriptions.

The first scheme, shown in Figure 12, is the most basic form of inductive proof, where an inductive argument H@ is unfolded to H* (this is done in Abella with the case tactic) in step 1. Step 2 is the application of the IH to H*, giving a hypothesis of the form G(-1) to denote that it is at a depth above the goal G. Then in step 3 a derivation exists from G(-1) to G (this is found in Abella with the search tactic). Of course, something this simple would be easy enough to discover and automate, but more complex cases exist.

Figure 12. Inductive scheme 1.

The second scheme shown in Figure 13 represents a proof where additional arguments to the IH, beyond the inductive argument, must change depth before applying the IH. Step 2 is the point at which another hypothesis will be inverted so that it matches the depth of the inductive argument H*. This can occur in proofs such as uniqueness proofs, where the two input judgements should be at the same depth to ensure that they associate with the same definition. This can generally be thought of as the scheme where processing is done before the application of the IH.

Figure 13. Inductive scheme 2.

The third scheme shown in Figure 14 keeps the same IH application as the first, but step 3 involves building additional hypotheses before the goal is proved in step 4.

This can generally be thought of as adding work after the application of the IH. This can occur in proofs where the goal corresponds to a definition with multiple terms on the right-hand-side, when the IH will only provide the inductive term. Hence, the other terms must also be built before the goal can be proved. In definitions like multiplication, where the addition term depends on the multiplication term on the right-hand-side, the IH must be applied first, as step 3 depends on the output of step 2.

Figure 14. Inductive scheme 3.

The fourth scheme, shown in Figure 15, shows step 2 with multiple applications of the IH on multiple inductive arguments. When a definition has multiple inductive terms on the right-hand-side, and the judgement of that definition is the inductive argument, each of the terms is a viable candidate for IH application. This idea is used in the proof of progress for the plus case, where both arguments to plus are candidates for the IH, and both IH applications will need to be used in order to complete the proof. Step 3 is not given structure because it may be that multiple branches are created with each IH application, and each of those requires its own form of proof. There may in fact be extra work that occurs before the IH application, as in scheme 2, or after, as in scheme 3.

Figure 15. Inductive scheme 4.

The fifth scheme, shown in Figure 16, is a combination of the previous schemes. Here the brackets correspond to options, and the ellipses refer to a list of arbitrary size. As such, steps 2 and 4 are the unfolding and creating (respectively) of additional hypotheses used in the proof. This diagram could be further extended to account for multiple applications of the IH as seen in scheme 4 to better capture all the previous schemes, but that would unnecessarily complicate the diagram. This is meant to show that any combination of the given actions may be performed in an inductive proof.

Figure 16. Inductive scheme 5.

The schemes described above should not be taken as replicas of exact proof structures, but as generalizations that capture certain cases not explicitly drawn in the diagrams. Take for example scheme 1; it may be that the IH takes in multiple arguments, but that the other arguments do not need to change before application and thus do not need to be represented in the diagram. This could be the case in a proof reasoning about the add judgement, where the IH takes in two natural numbers as input, but the second number does not change from the right-hand-side to left-hand-side in the recursive definition for addition. In other words, the diagrams seek to capture a set of specific active steps in the proof process that correspond with an inductive proof scheme at a high level, but actual instances of these schemes would be more complex if viewed at the level of proof steps.
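The five schemes can be seen as points in a small design space: what happens before the IH, how many times the IH is applied, and what happens after. As a hypothetical illustration only (none of these names come from Abella or the thesis implementation), that space could be encoded as data:

```python
from dataclasses import dataclass, field

# Invented encoding of an inductive proof scheme; field and step names
# are illustrative, not part of any real system.
@dataclass
class Scheme:
    pre_steps: list = field(default_factory=list)   # inversions before the IH (scheme 2)
    ih_applications: int = 1                        # number of IH applications (scheme 4)
    post_steps: list = field(default_factory=list)  # hypotheses built after the IH (scheme 3)

scheme1 = Scheme()                                  # unfold, apply IH, search
scheme2 = Scheme(pre_steps=["invert H2"])           # processing before the IH
scheme3 = Scheme(post_steps=["build add term"])     # work after the IH
scheme4 = Scheme(ih_applications=2)                 # multiple IH applications
# Scheme 5 is any combination of non-default fields.
```

Scheme 5's role as a generalization of the others corresponds to allowing every field to vary at once.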

Connecting a Strategy with a Definition

While the proof schemes described above show ways an inductive proof can use an IH, a strategy will provide more information following the application of the IH based on the definitions in the specification. This information flows directly from the IH, and will provide the foundation for a proof frame, offering guidance as to how a proof should be completed. It is important to note that the abstract process of connecting definitions to the IH is dependent on the goal formula and not the underlying specifications, so the same strategy can be used within different specifications, producing specification-specific proof frames. There are a few types of embeddings from IH applications into definitions that will be considered, including direct single-definition embeddings, direct multi-definition embeddings, multi-direct single-definition embeddings, indirect definition embeddings, and no embedding. Each will be accompanied by an example diagram corresponding to a specific proof. The same notation as before is used for inductive arguments, and the IH application is represented by a red arrow. If the arrow points to a term in a definition (blue text), then there is an embedding into the definition at that term. Additional notations are introduced per example.

The first embedding considered is a direct arrow between the output of the IH and a definition's inductive right-hand-side term. This embedding is considered direct when the IH outputs the term in question, which is the case in many proofs where the goal uses an existential quantifier for its right-most term, which follows the last implication arrow. Figure 17 shows the inductive case of the existence-for-multiplication proof, where the IH takes the inductive argument N1* and produces the term mult N1 N2 N3. This term is now a new hypothesis; hence it is directly embedded into the definition. In the case of multiplication, the add term is needed on the right-hand-side to get the goal (left-hand-side). It can easily be seen that there is a dependency between the recursive call to multiplication and addition based on the variables, and this dependency is captured in the embedding.

Figure 17. Direct embedding example.

The next embedding is direct with multiple definitions. This will occur when the IH produces an output that corresponds to the inductive term in multiple definitions. In this case, further work must be done to cause a branch and land in a specific definition. In Figure 18, the existence proof for division gives an example of an IH application that corresponds with two definitions. This happens because division has two definitions in which the inductive argument (the first argument N1) gets smaller. The two recursive division terms are highlighted to show that the IH has produced an output that cannot yet distinguish between the two definitions. To further the proof, a case on the fifth argument will allow the definitions to be separated by a branch. This procedure is not given directly by the strategy, but it can be found by examining the differences between the set of definitions the strategy returns.

Figure 18. Direct embedding with multiple definitions example.

As was seen in scheme 4 in the previous section, some proofs require multiple applications of the IH. In this scenario, there may be multiple terms to embed (one for each application of the IH) into the right-hand-side of a definition. While the applications are performed sequentially, the resulting branches can be viewed as if the applications were simultaneous, provided there are no dependencies, with the resulting embeddings arising from the applications. Figure 19 shows the proof of progress for the plus case, where the two inductive terms of Plus are split into values or steps by application of the IH. They can then be traced to their respective definition embeddings.

As can be seen, it is not necessary that an embedding exists; for example, in the case where E1 is a step, it does not matter whether E2 is a value or a step, because the definition's right-hand-side is complete without the use of E2. Then there is the second case, where both E1 and E2 will be embedded into the definition as their respective terms. The remaining case, where both E1 and E2 are values, is described below.

Figure 19. Multiple direct embeddings example.

Indirect embeddings are a way to capture the flow of a proof based on underlying definitions even when the definitions are not being used directly in the proof search to construct the goal. This can be seen most clearly in uniqueness proofs, where variables are strung through terms on the right-hand-side of definitions, and those terms suggest the flow of the proof. Figure 20 looks at the uniqueness of multiplication, where the application of the IH to the two mult terms causes a unification of N3 and N3', not actually producing a new mult judgement. However, it is useful to note the correspondence between the IH and the mult term in the definition of multiplication, because it can then be seen that the add terms will have to be unified next in order to unify N4. So, the embedding provides guidance on how and in what order like terms should be unified, not which terms should be constructed.

Figure 20. Indirect embedding example.

The last case to consider is when no embedding exists. Take for instance the case of plus in the progress proof, where the IH has been applied twice and both E1 and E2 are values. The corresponding definition has the term add on the right-hand-side. To construct the add term, since E1 and E2 are values, it can be shown that they are number expressions, which are in turn natural numbers. The natural number judgement can then be used with the add-exists lemma to finish the branch. The strategy may deduce which definition is relevant (the only definition without a direct embedding), but the definition offers no more information other than the need for an add term. Search is then required, perhaps looking at the left-hand-side of the rule and noticing the correspondence of E1 and E2 with the two number arguments to plus as a guide. This sort of additional step is not yet captured by a strategy, but it can be implemented on top of a strategy.

Figure 21. No embedding progress example.

Another case where an embedding does not exist is in the proof of the commutativity of addition. Figure 22 shows that the goal does not have a right-hand-side that corresponds with the definition of add; add is defined by induction on the first argument, but the difference between the IH output and the goal is in the second argument. The fact that the inductive arguments do not align may be a hint that an inductive lemma is necessary to complete the proof, transforming the hypothesis directly into the goal without use of a definition. A more sophisticated understanding of the cases with no embedding could enhance the power of a strategy, but for the current work both this proof and the former will have gaps where some secondary algorithm is used to perform the reasoning.

Figure 22. No embedding commutativity of add example.

Strategy Description

A strategy will consist of how the IH is to be applied and how the result is embedded into a definition, where the embedding provides additional information such as how to use the definition's structure to complete the proof. Both components will need to take the goal formula and the structure of the definitions into account; however, neither is specification-specific, meaning the strategies can be ported across specifications.

The components of a strategy will be detailed more in Section 7, where the schemes listed above will be analyzed for each proof step, and the resulting structure that embodies an instance of a strategy will be the proof frame.

Additional Strategies

Depending on the level of specificity used to define a strategy, there will be proofs with strategies that do not fit the current repertoire. Take for example a proof by strong induction, which can be seen in the proof of existence for the GCD definition in the Abella examples. The strategy for proving something by strong induction is different from the strategies previously described because the application of the IH produces a function. This function is later used on smaller inductive arguments (hence the strong induction), but the ordering of size is maintained through additional judgements. Another complex strategy is one where inductive lemmas are used in both the base and inductive cases, appearing in proofs such as the commutativity of addition. This provides an extra hurdle, since it is difficult to connect the strategy with the definitions. Such strategies would require an extension of strategy features. These sorts of extensions would be more amenable to a description of strategy which already contains the necessary features and structural components as building blocks, allowing any strategy imaginable to be formed as an instance of the description. Such work is necessary in discovering and learning new strategies but is beyond the scope of this thesis, which builds a system around a set number of strategies to achieve early assurance of the idea's potential.

Non-Inductive Branches

Using a strategy to provide information about non-inductive branches (base cases) will be the work of the implementation. There is clearly information that can be transferred from the proof frame for the inductive case to the base case. Take for example the uniqueness proofs, where a strategy would say to invert hypotheses to the same depth before applying the IH. This same idea of inversion will be used in the base case to ensure the terms match with the same definitions. Also, in the commutativity of addition proof, following the IH is an application of an inductive lemma. The base case will also require an inductive lemma to be proved. These correspondences can possibly be extracted from the strategy.

EXTRACTING PROOF DATA

This thesis seeks to connect a formula with the specification definitions to form a sort of proof guide. As such, it is necessary to parse the signature and module files so the text can be transformed into operable data structures that are easily manipulated by the algorithms proposed later. Beyond that, to enable proof reuse, all data from the proof process for completed proofs must be collected. This section describes the method and implementation for collecting such data, starting with the specific tools and languages used, then further breaking down the process into a static and a dynamic phase.

The Two-Phase Approach

In extracting information from the proof environment, there are two phases: the static and the dynamic. In the static phase, the pre-compiled files are parsed and their information stored as JSON files. These files include the signature, module, and theorem files, containing type information, relations for the specification-logic constants, and definitions and theorems (statement and proof sequence), respectively. This information can be collected off-line, or at compile time when a user declares the specification files to Abella. Separately, information about the run-time environment of the proof is captured dynamically. The process involves stating a theorem and applying one tactic at a time, at each step collecting the relevant sequent information (additional information, such as the number of steps, can be collected during this process). Of course, Abella provides the ability to compile theorem files and output proof scripts that could be parsed in one pass. However, this project is concerned with the interactive situation, not only when a user is interacting with a theorem prover, but also when a strategy may be interacting with a proof state. It would make more sense for an automated proof system to adapt after each tactic application, as opposed to producing full sequences of tactics ready for compilation. For these reasons, theorems are re-proved, and the relevant information is captured on-line (during the interactive proof process).

While the work here is done specifically to further the proposal of this thesis, these scripts can easily be used by anyone working with Abella. The information can give insights into a proof corpus, including meta-information; but more importantly, this provides a base API in Python for users to extract and visualize data that interests them. Once the data has been parsed and stored as a JSON object, access is far easier than hacking the base Abella build. This can be the first step towards a much richer proof environment, with real-time queries, visualization, and analysis.

The Static Phase As mentioned in the previous section, the signature file contains declarations of simple types, and the declarations of constants. The simple types are parsed and stored in a list of types, and the constants are parsed and stored as a triple: constant name, constant typing, and the label judgement (when the target is o), constant (when the type is a singleton), or function (an arrow type that does not have o as its target). A dictionary indexed by the three categories stores the lists of constants, with types stored as nested lists of simple types. These lists are nested to encompass higher-order types. The module file is parsed in the static phase as well, collecting the Prolog-style definitions of the formulas. Each definition has a left-hand-side (LHS) and optionally a right-hand-side (RHS). The definitions are indexed by their LHS judgement and a counter if there are multiple definitions for the same judgement. The counter counts from zero for each rule of the same judgement, resetting at zero for new judgments. Arguments to the judgements are either variables or functions, which are stored as a list. The terms on the RHS are similarly parsed, except in the case of a lambda abstraction which demands a richer data structure keeping track of the abstracted variables. 53 53 Implementation Details The parsing scripts are implemented in Python, and the collected information are exported as JSON objects. There are a couple reasons for the choice in language and storage format. First, Python provides a subprocess library that allows the creation of multiple processes with standard input and output capabilities. Instead of scripting at the level of the shell to open an Abella process, input tactics, and read proof states, this can be done within a Python program. 
This is beneficial because it allows the code for parsing and running Abella to be kept within the Python environment, which in turn allows machine learning and other complex libraries to be used directly during the proof process. Second, the interoperability between dictionaries in Python and JSON objects leads to clear code and easy access. There is a direct translation between Python types and JSON, and a simple API that allows exporting Python structures as JSON objects and importing JSON objects as Python structures.
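As a small illustration, the round trip between Python structures and JSON text is a pair of calls. The dictionary shown here is a made-up fragment, not the exact schema used by the scripts:

```python
import json

# Hypothetical fragment of parsed signature data; the real schema may differ.
parsed = {
    "types": ["nat"],
    "constants": {"judgement": [["add", ["nat", "nat", "nat", "o"]]]},
}

text = json.dumps(parsed, indent=2)   # export Python structures as a JSON object
restored = json.loads(text)           # import the JSON object back into Python

assert restored == parsed             # dicts and lists survive the round trip
```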

It should be noted that only a subset of the Abella syntax is parsed with these Python scripts. The grammar is constructed from examples and descriptions on the Abella website, and the Abella changelog records new developments that have not yet been incorporated into this accepted grammar.
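The classification of constants described in the static phase can be sketched as follows. The function and dictionary names are illustrative, not the thesis implementation, and a type is assumed here to be a flat list of simple types with the target last (the nesting used for higher-order types is omitted):

```python
# Sketch of the static-phase constant classification (illustrative names).
# A type like nat -> nat -> o is represented as ["nat", "nat", "o"];
# a singleton list such as ["nat"] is a base constant.

def classify_constant(ty):
    """Return 'judgement', 'constant', or 'function' for a parsed type."""
    if len(ty) == 1:
        return "constant"       # singleton type, e.g. ["nat"]
    if ty[-1] == "o":
        return "judgement"      # arrow type whose target is o
    return "function"           # arrow type not targeting o

def index_constants(decls):
    """Group (name, type) declarations into the three-category dictionary."""
    table = {"judgement": [], "constant": [], "function": []}
    for name, ty in decls:
        table[classify_constant(ty)].append((name, ty))
    return table
```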

The Dynamic Phase

The dynamic phase involves reproving theorems and extracting valuable information from the sequent output written by the Abella process to the console window. Specifically, an Abella subprocess is opened in a Python application, and the designated specification is loaded (corresponding to a signature and module file). Then the theorem file is parsed. The definitions of the theorem file are looped through and written to the Abella process. The definitions can be loaded at the beginning if they are written in the same order as they appeared in the theorem file, guarding against dependencies between definitions. From this point, theorems are looped through, and for each theorem the corresponding tactic list is looped through. After each tactic is applied, the sequent being output is parsed.
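A minimal sketch of this driver loop is below. A tiny Python echo process stands in for the Abella binary, since the point is only the subprocess plumbing; in the real setting the command passed to Popen would invoke Abella itself.

```python
import subprocess
import sys

# Stand-in for the Abella process: echoes each input line back with a prefix.
ECHO = ("import sys\n"
        "for line in sys.stdin:\n"
        "    print('>> ' + line.strip())\n"
        "    sys.stdout.flush()\n")

proc = subprocess.Popen([sys.executable, "-u", "-c", ECHO],
                        stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                        text=True)

def post_line(tactic):
    """Send one tactic line to the subprocess and read back its response."""
    proc.stdin.write(tactic + "\n")
    proc.stdin.flush()
    return proc.stdout.readline().strip()

reply = post_line("induction on 1.")   # in reality, parse the sequent here
proc.stdin.close()
proc.wait()
```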

The information being kept on a proof-level basis is:

- Theorem name – the name as it appears in the thm file
- Theorem formula – the formula stored as a meta-term (recursively defined data structure)
- Dependencies – list of lemmas used that are defined outside the theorem
- Size of proof – number of proof steps
- Tree structure of proof – dictionary indexed by goal names, where each value contains the list of proof step IDs that occur in order for the given goal. Each ID is paired with a set of goal names in the case the step leads to branches, or an empty set in the case the step completes the goal.
- List of proof steps – list of proof step data structures defined below

The information being kept on a proof-step basis is:

- Step ID – ID counting from zero, numbering the proof steps in order
- Tactic – raw tactic string
- Out Hypotheses – a list of hypotheses that changed in the subsequent proof step after the tactic was applied
- Goals – a list of subgoals and the current goal
- Witness – optional; in the case of a search tactic, the witness is parsed and its type and relevant information stored
- Instantiations – optional; list of instantiations created in the subsequent proof step after the tactic was applied
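Under these definitions, a single proof-step record in the exported JSON might look roughly like the following; the field names and values here are illustrative, not the exact schema:

```python
# Hypothetical proof-step record; the actual field names in the
# exported JSON may differ from this sketch.
step = {
    "id": 6,                      # Step ID, counting from zero
    "tactic": "search.",          # raw tactic string
    "out_hypotheses": [],         # hypotheses changed in the next step
    "goals": ["add_exists"],      # subgoals and current goal
    "witness": {                  # present when a search tactic was run
        "kind": "unfold",
        "judgement": "add",
        "index": 1,
        "hypotheses": ["H4"],
    },
    "instantiations": [],         # unifications created by this step
}
```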

The last two pieces of information, witnesses and instantiations, are both provided by the Abella system when optional settings are turned on. It is assumed that these settings will always be set, because the witnesses will be especially important for determining how a subgoal was proved. Witnesses provide the information behind how a search tactic solved a proof; information that would be difficult to recover in its absence.

Instantiations are the list of unifications that have occurred up to the current proof step.

These appear as a list of equalities of the form N1 = N2 and are easy to keep track of. Collecting the remainder of the information for the theorem and proof steps is done with the algorithm shown in Figure 23. The algorithm starts by initializing a handful of variables before iterating through the tactics for a theorem in step 2. It is necessary to keep two proof states at a time to know what the effect of a tactic application was, because information like the out hypotheses is stored in the proof step before the application. Further, global information is maintained, like the current goal, list of subgoals, and instantiations, which will also be used when determining how a tactic affected the proof state. Within the loop, step 2.1 grabs the lemma applications that form dependencies between this proof and other proofs, though additional work is done to make sure the lemma was not defined within the proof and was not added twice to the list. The proof states are the dictionaries prev_seq and out_seq, corresponding to the before and after states of a tactic application. Step 2.2 sets the tactic and step fields of prev_seq, and step 2.4 sets the witness of prev_seq in the case that a search tactic was executed on line 2.3. The output sequent is parsed and stored in the variable out_seq, and the new goal name is placed in new_g on line 2.5. The branching that occurs next splits into three cases: one, the goal is the same, meaning the proof is in the same branch; two, the goal has changed but the new goal exists in the previously known subgoals, meaning the last proof step completed its branch; three, the last step used a branching tactic and new subgoals exist. In a further breakdown, the first case updates the proof tree showing that the previous step stayed within the same goal, and the difference in hypotheses between the two steps is stored in the previous step's out hypotheses.
The second case updates the tree with an empty list because the step was the end of the branch, completing the subgoal. There are no hypotheses to be changed since the branch is completed. The last case updates the subgoals and lists the new subgoals in the proof tree as a branching set. Again, the difference in hypotheses is checked. Finally, new instantiations are set and prev_seq is appended to the step list, then the process continues until all tactics are executed.

1. Initialize values: subgoals = curr_hypotheses = [], prev_sequent = {}, proof_step = 0, curr_g = 'root', proof_tree['root'] = [], subgoals = ['root'], instantiations = []
2. Iterate for tactic in theorem['tactics']:
   1. If tactic is lemma, theorem['dependencies'].append(lemma name)
   2. prev_sequent['tactic'] = tactic, prev_sequent['step'] = step
   3. post_line(tactic)  # Run tactic in Abella application
   4. If tactic is search, prev_sequent['witness'] = parse witness
   5. out_seq = parse sequent, new_g = out_seq['goal']
   6. If not completed
      1. If new_g is curr_g
         1. proof_tree[curr_g].append((step, [curr_g]))
         2. prev_seq[out_hyps] = hyp_diff(prev_seq, out_seq)
      2. If new_g in subgoals
         1. proof_tree[curr_g].append((step, []))
      3. Else
         1. subgoals += new subgoals
         2. proof_tree[curr_g].append((step, [new subgoals]))
         3. prev_seq[out_hyps] = hyp_diff(prev_seq, out_seq)
      4. prev_seq['instantiations'] = new instantiations
      5. theorem['proof steps'].append(prev_seq)
      6. prev_seq = out_seq
   7. Else, proof complete (break)

Figure 23. Algorithm for extracting theorem and proof step information.

Implementation Details

The proof information in the dynamic phase will come directly from the output of the Abella software to the console window. By parsing such information, one can recover the same data that a user would have access to during an interactive proof process. In this sense, no information is lost. However, it may be advantageous to retrieve information from within the Abella process as it runs. This is made difficult by the relatively small amount of documentation for the Abella source code. In addition, it would be more difficult to maintain internal code hijacking the Abella system than independent code simply parsing the output. For these reasons, Abella processes will be opened from within a Python process as a subprocess and interacted with in the same way a human user would.
The output will be read and parsed within the Python process, and the information will be stored in a JSON format.
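Two helper steps from Figure 23 can be sketched concretely: hyp_diff computes the out hypotheses as those that are new or changed between two sequents, and a small parser collects instantiation lines of the form N1 = N2. Both are sketches under assumed data shapes, not the thesis code.

```python
def hyp_diff(prev_seq, out_seq):
    """Hypotheses that are new or changed in out_seq relative to prev_seq.

    Sequents are assumed to carry a 'hypotheses' dict mapping names
    (H1, H2, ...) to formula strings.
    """
    prev = prev_seq.get("hypotheses", {})
    out = out_seq.get("hypotheses", {})
    return sorted(h for h, formula in out.items() if prev.get(h) != formula)

def parse_instantiations(lines):
    """Split equality lines like 'N1 = N2' into (lhs, rhs) pairs."""
    pairs = []
    for line in lines:
        if "=" in line:
            lhs, rhs = (part.strip() for part in line.split("=", 1))
            pairs.append((lhs, rhs))
    return pairs
```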

As with the static parsing, the grammar captured by the implemented parser is not complete in terms of Abella syntax. While it works for the examples used throughout this thesis, there exist some forms of syntactic sugar and special cases that are not covered.

ABSTRACTING A PROOF

After running a proof script and collecting proof state information, the output object is a tree-like structure with proof steps corresponding to proof states (hypotheses and goals), and edges corresponding to tactic commands. This structure is so fine-grained that it can hide important similarities between proofs that a human perceives at a higher level of abstraction. In addition, the structure leaves out important information about the proof process, like how a subgoal is being proved, as these steps are normally wrapped up in the single tactic search, or how steps relate to the structure of definitions. Adding information to the proof representation will help exploit the meaning of the proof.

Further, moving the proof representation to a higher level of abstraction will not only make the proofs more understandable to the human eye, but it can enhance the ability to learn clear patterns between proofs that were previously well-hidden.

This section begins by presenting the key components that will go into the representation, and details how these components are extracted from the proof. This initial process moves the steps of a proof to nodes, data structures containing additional information. Then the dependencies between proofs are captured with a hierarchical structure called a proof tree. Finally, the proof tree is abstracted to the level of a strategy, with a focus on the IH application and the embedding of proof steps into definitions; and this new structure is called a proof frame. Additional information that might be kept in a proof frame is discussed, and potential uses of the data via queries are put forward.

Targets

A target is a piece of information added to each proof step which captures in some sense the goal of that step, possibly tying the step to a part of a definition or identifying it as a component in a strategy. Targets will help give the proof structure and make the reason for steps within a proof explicit, at least in the context of strategies. Different types of targets will be defined, starting with those for the leaf steps of a proof where a subgoal is proved. Then targets will be examined for steps within a branch, followed by steps before a branch, in each case relating a target back to definitions or IH applications.

As the target types are introduced, they will be accompanied by the methods for procedurally finding the targets, matching the current implementation. This procedure works from the leaves upwards, and the descriptions below will follow this order.

Leaf Step Targets

The targets at the leaves of a proof are necessary when determining what the preceding steps' targets will be, as they contain the information about how a branch was proved. Unfortunately, in many cases this information is hidden behind a search tactic; one of the many reasons to introduce targets in the first place. Luckily, Abella provides the witness of the search if the feature is turned on, and these witnesses are assumed to exist in this work.

The first witness type to examine is the unfold witness, which has information about the definition being unfolded, labeled by the judgement name and position in the mod file, and the hypotheses corresponding to the RHS terms of the definition. This information will be used to determine which preceding proof steps produced the hypotheses used in the search, and they can be linked with the named definition. An example seen in Figure 24 is the case for add exists, where the witness shows that this is the second definition for add (the recursive definition), and H4 is the hypothesis used as the RHS term of that definition. In this case, the target will be the three components of the witness: judgement name, index, and hypotheses used in the search. Of course, witnesses of this sort can grow in complexity, with left or right for proving disjunctions, similar information for conjunctions, as well as potential nesting. Since the desired information is the definition being used, the witness will contain the top-level definitions and their corresponding hypotheses.

Figure 24. Add exists witness and target for step 6.

Target := (judgement, ID, hypotheses)

Figure 25. Target definition for leaf with unfold witness.

In the case of equality proofs, the witness does not provide information about a corresponding definition, but it is still important to capture information at the leaf to form targets in the preceding steps of the branch. Therefore, the target in this case will include the hypotheses that contain the unified variable (which is known from the goal). Moving backwards, knowing these hypotheses will allow targets to be formed based on dependencies between proof steps, i.e., which proof steps lead to those hypotheses having unified variables; and this process will continue in a backwards direction. One example of this is the leaf node for the inductive branch of the uniqueness of addition. Here the term s N5 was unified with s N4, and the variable appears in both hypothesis H3 and H4 as the last variable in the add judgement. This is reflected in the target.

Figure 26. Add uniqueness witness and target for step 8.

Target := (Eq, hypotheses)

Figure 27. Target definition for leaf with Eq witness.

Leaves which correspond with definitions that do not have a RHS will be handled in the same way as the above two cases. Take the example of the base case in the add nat proof shown in Figure 28. The witness does not contain an unfold, so the target will simply be the definition for the corresponding base case and the hypothesis used. This definition will have to be matched by the program, since the witness does not give its label.

Figure 28. Add nat witness and target for step 4.

Last are the leaves which do not share any correspondence to a definition. For example, the case where no rule head matches the form of the predicate of a hypothesis, so running a case on it will complete the subgoal. This can also be seen in examples where a lemma producing a contradiction is used to finish the subgoal. These types of steps will have generic targets labeled by impossible or contradiction. The proof steps preceding these leaves will be links, and will not contain definition-based targets, as there is no definition to correspond to.

Target := impossible | contradiction

Figure 29. Target definition for leaf with no witness.
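Putting the leaf cases together, a classifier over a parsed witness might look like the following sketch. The witness shapes are assumptions layered on the records described earlier, and choosing between impossible and contradiction would in practice require inspecting the tactic, which this sketch does not do:

```python
def leaf_target(witness):
    """Map a parsed witness to a leaf target, following Figures 25, 27 and 29."""
    if witness is None:
        # No witness at all: a generic target. Distinguishing 'impossible'
        # from 'contradiction' would need the tactic, omitted here.
        return ("impossible",)
    if witness["kind"] == "unfold":
        # (judgement, ID, hypotheses), as in Figure 25
        return (witness["judgement"], witness["index"],
                tuple(witness["hypotheses"]))
    if witness["kind"] == "eq":
        # (Eq, hypotheses), as in Figure 27
        return ("Eq", tuple(witness["hypotheses"]))
    raise ValueError("unhandled witness kind: %r" % witness["kind"])
```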

Definition-Based Targets

The next set of targets to be explored are those for proof steps that occur within a subgoal and correspond to the RHS terms of definitions. Following the ordering above, the first to be considered are those subgoals ending with an unfold, then the Eq case.

For those proof steps in direct correspondence with a term on a definition’s RHS, the target will refer to that definition and the term. The definition will be referenced by the judgment name and index, and the term will be saved the way it appears in the hypothesis. Take the example in Figure 30 for the existence of multiplication. The target is the add term in the second definition for multiplication.

Figure 30. Mult exists target for step 7.

Target := (judgement, ID, term)

Figure 31. Target definition for steps that correspond to RHS of definitions.

Since the target of the leaf node in this case contains the hypotheses H4 and H6, the target of step 7 in Figure 30 can be found by knowing that this proof step creates H6.

This information is captured in the Out Hypotheses of a proof step. However, it may be the case that multiple proof steps change the same hypothesis, so only the proof step occurring latest in the proof will receive a target corresponding to the term. Indirect correspondences to the definitions, occurring in the equality proofs, will also be linked to RHS definition terms but in a slightly altered way. Here, uniqueness theorems will generally take two terms as arguments that have the same structure. The target in these cases will be the same as above, except the variable to be unified will also be included. This can be seen in step 7 of the add uniqueness proof shown in Figure 32.

Figure 32. Add uniqueness target for step 7.

Target := (judgement, ID, term, Eq)

Figure 33. Target definition for steps indirectly corresponding with definitions.

As noted earlier, it is more difficult to find the indirect correspondence to the RHS terms in the equality proofs because the witness does not give away the relevant information. This can be achieved without the additional information by keeping track of the most recent hypotheses used for unification, like H3 and H4 in Figure 26. Then, you can determine which proof step forced the unification of variables in those hypotheses by looking either at subsequent instantiations or the out hypotheses. The in hypotheses used in this proof step are then matched against terms in definitions, until the appropriate definition and RHS term are located. Then, these in hypotheses become the most recent hypotheses used for unification, and preceding proof steps will follow the same procedure to determine an indirect connection to the definitions.
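This backwards procedure can be sketched as a single pass over the proof steps in reverse: whenever a step's out hypotheses intersect the current set of most recent hypotheses, that step is recorded and its in hypotheses become the new set to explain. Field names here are assumptions matching the earlier record sketches, not the thesis code.

```python
def trace_back(steps, leaf_hyps):
    """Chase hypothesis dependencies backwards from a leaf (sketch).

    steps is the ordered list of proof-step records; leaf_hyps is the
    set of hypotheses used for unification at the leaf, e.g. {'H3', 'H4'}.
    Returns the IDs of the steps visited, latest first.
    """
    chain = []
    recent = set(leaf_hyps)
    for step in reversed(steps):
        if recent & set(step.get("out_hypotheses", [])):
            chain.append(step["id"])
            # the step's inputs become the next hypotheses to explain
            recent = set(step.get("in_hypotheses", []))
    return chain
```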

IH-Related Targets

The next set of targets to be considered are those that relate to the strategy of IH application, referring to the way IH arguments are processed before application and how branching is used after application. These two components will then be generalized for the proof steps that perform similar actions but do not relate back to the IH.

There are proof steps that transform IH arguments before the application, with the most common case being inversion. Inversion will allow the arguments for the IH to be at the same depth by matching the inductive definition cases and providing RHS inductive terms of the argument at hand. Such a case can be seen in the uniqueness of addition proof below, in step 6 where H2 is inverted with the case tactic to create H4 to be used by the IH in the following step.

The target includes the definitions of the inverted rule that will produce RHS terms to be used by the IH, as there may be more than one inductive definition, along with those terms labelled by their hypothesis name. This target can be generalized to proof steps that use inversion to match the term being inverted with a specific definition. This may provide new hypotheses in the case of an inductive definition or unify values when the definition has no RHS. In the base case of the uniqueness of addition, inversion matches H2 with the base case definition of addition, unifying N3 and N3'. For these general cases, the target will resemble the same structure, but the third element of the tuple may be empty if there are no arguments produced for future applications.

Figure 34. Add uniqueness target for step 6.

Target := (definitions, instantiations, IH arg hyps)

Figure 35. Target definition for steps inverting terms.

Figure 36. Add uniqueness target for step 4.

The second component to consider is actions taken after the IH application to branch into different subgoals, when the IH produces an output that embeds into multiple definitions. This is obviously the case when an IH produces a disjunct, and it is split.

However, this gets more complicated in proofs like division exists, where the IH produces a division term that needs to be split by a case on the fifth argument as zero or a successor. Figure 37 shows the target consists of the definitions being split, and the instantiation that causes the split, similar to the previous target but without the IH args. It is important to see the difference between the two: this case creates a distinction between two definitions by altering the structure of a term via inversion of a separate term, whereas the previous case is an inversion on a term that alters the structure of that term itself.

Figure 37. Division exists target for step 8.

Target := (definitions, instantiations)

Figure 38. Target definition for steps causing a branch on a separate term.

For both types of target, instantiations can be found directly from the proof state by checking which instantiations are added after a proof step. Further, the relevant definitions that the branch will induce will be found in the targets for the leaf steps of the branches created, which is why the procedure works in a backwards direction. The IH args for the first target are found by checking the out hypotheses of the inversion and the in hypotheses of the IH application.

Link Targets

Links are a general form of target that serve as the intermediaries between interspersed definition-based targets, structurally linking together the terms on the RHS of a definition. They also serve proof steps that do not directly correspond with a definition but connect components of a strategy, where the link simply ties the step to those that depend on its output.

Links can occur for proof steps that fall between definition-based targets, where the proof step serves to connect the two. Take step 6 in the existence of multiplication proof in Figure 39. The step uses a lemma to extract the nat judgement on N3 from the mult judgement in hypothesis H4. This will allow step 7 to construct the add judgment which serves as the second term on the RHS of the multiplication definition. In other words, step 6 serves to link the mult and add terms in the definition.


Figure 39. Multiplication exists target for step 6.

Immediate dependencies are already contained in a node via in(out) hypotheses, so the target for a link step will catalog the link that occurs from the current proof step to the definition-based targets. So, each preceding proof step that is a member of the link will append its index to the head of the list, keeping track of the chains of intermediary proof steps. It may also be the case that a proof step serves as a link between multiple definition-based targets, and in this case, it will have multiple lists of steps in its target.

The same procedure is applied for subgoals in equality proofs, as well as subgoals with no definition-based targets; however, here the link is formed across the entire branch. To see an example with a chain of links, take the case where both E1 and E2 are values in the proof of progress for plus. Figure 40 shows the targets of the proof steps, where two chains are formed from the initial values down to the leaf node. This case is interesting because there are two separate chains being formed, [11,13] and [12,13]. The reason they are separate is because they are independent sequences of tactic commands.

Dependencies will be discussed in more detail in the section on proof trees. But it is important to keep in mind that links are only formed between dependent steps, so step 11 depends on the instantiation generated in step 9; likewise, step 12 depends on the instantiation generated in step 10. Also, the target from step 13 is bubbled up along the two chains, since they will both lead to arguments used in step 13.

Figure 40. Progress target for steps 9-13.

Target := ([ step IDs ], target (end of link))

Figure 41. Target definition for intermediary proof steps a.k.a. links.

Links can also exist between other target types, for example in the division exists proof where a link exists allowing for a case that splits the definitions, seen in Figure 42. The link is simply the step that depends on the output, along with the target of that step, where chains of links are constructed as described earlier. The target of step 8 is taken from Figure 37.

Figure 42. Division exists target for step 7.

Additional Targets

Links serve as the catch-all for proof steps with targets not defined in earlier sections. It will be clear when the proof frame is constructed that targets provide a means to extract the strategies used in a proof. As such, if more complex proof strategies are to be considered, the set of targets will need to expand, and links will need to capture more fine-grained information. However, it will be possible to retrieve some of this information after the fact, as the proof structure will retain information like the tactic used at each step, which can then be interpreted separately.

Lastly, each inductive proof begins with the same three tactics, and they will not receive any target for that reason.

Nodes

To achieve a simpler representation of a proof step, the node data structure is introduced. It abandons extraneous information like redundant goal formulas or unused hypotheses, while keeping the ID corresponding to the proof step, the target, the hypotheses used and affected, and the tactic.

Node := ID
        Target
        In Hypotheses
        Out Hypotheses
        Tactic

Figure 43. Node definition.
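Figure 43's node can be realized directly, for instance as a Python dataclass. This is an illustrative sketch; the stored target is whatever tuple the previous section assigned to the step:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: int                                              # proof step ID
    target: tuple = ()                                   # target from the previous section
    in_hypotheses: list = field(default_factory=list)    # hypotheses used
    out_hypotheses: list = field(default_factory=list)   # hypotheses affected
    tactic: str = ""                                     # raw tactic string
```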

Each component of the node will have a purpose when constructing the proof tree and proof frame in the following sections. The in hypotheses and out hypotheses will be used to discover the dependencies between proof steps and help extract the proof tree, the target will be used to track which steps are instances of a strategy and help extract the proof frame, and the tactic will allow additional information to be added to the proof frame after the fact, like what type of lemma is being used and whether it uses induction.

Given the targets of proof steps, finding the in hypotheses is simply a matter of looking at the arguments to the tactic command, and the out hypotheses are obtained during the reproving process described in Section 5. Nodes are given IDs in the order they are presented in the original proof script.

Proof Trees

Within a proof there exist dependencies, forcing some steps to be performed before others, along with sets of independent steps that can be reordered without affecting the proof other than possible tactic argument renaming. In addition, the proof structure contains branching steps that split the current goal into multiple subgoals. The purpose of a proof tree is to provide a structure that captures the branching and dependencies within a proof. This will be important for extracting components of a strategy from the proof and determining how those components depend on each other. These dependencies are often not clear from the strict ordering of proof steps in a proof script, so this additional structure is required.

There are different levels of dependencies that can be extracted. For example, in the simplest case two proof steps could be switched with no renaming and the rest of the proof would proceed as normal. A more complicated example could be reordering multiple branching steps, leading to the same set of branches in the end but causing the need for renaming all tactic arguments, as the hypotheses have been introduced in a new order. The following section describes some of the challenges and a possible solution for capturing those complex dependencies using a normal form. The proof tree defined in the following section captures only dependencies within branches, not addressing the possible arbitrary ordering of nested branches themselves. The definition of the components of the proof tree is followed by a description of how the proof tree is constructed procedurally. The proof tree will provide a full representation of the initial proof, containing the nodes for proof steps, as well as dependency information in the proof tree structure. This will then be further refined in the following section to reach the idea of a proof frame.

Heuristic Rules for a Normal Form

Creating a structure that captures the dependencies within a proof can be difficult due to the way in which certain proof steps can be reorganized without affecting the reasoning of the proof. The ordering of these proof steps within the script is dependent upon user preference, with no built-in or defined guide for proper ordering. One such example comes in the proof of progress for the plus case, where two possible interleavings of applying the IH and performing a case on the output disjuncts are shown below, each of which leads to the same set of branches.

Figure 44. Interleavings of Progress IH applications.

It might be beneficial to create a proof representation that flattens these two applications into one, as seen in Figure 19. This representation would be useful in finding a common form that a proof script of any interleaving could be raised to, and the representation is arguably closer to the reasoning of the proof, where the actions are viewed as simultaneous. However, such a tactic combining the two steps is not supported in Abella, and so the applications must be done in sequence. For this reason, the strategies will also produce the applications in a sequential way, such that the output can directly correspond to a proof script in Abella and be run.

To solve the problem of proof scripts that use different orderings, a normal form can be introduced that maintains a sequential order but ensures proof scripts will not diverge. One can achieve a normal form by introducing heuristic rules that describe the way in which these arbitrary choices should be made. For example, “apply the IH to inductive hypotheses in order” would ensure that the IH is applied to E1 before E2, and

“split disjuncts after IH applications” would lead to the left interleaving in Figure 44 and no other. Then a more general proof representation would not be needed, as proof scripts could be passed through this heuristic and rewritten with a script in the normal form. Of course, this normal form would be enforced by the proof solving program that implements the strategies so that a correspondence exists between the database and the solver for proof reuse matters.

One of the problems with this approach is rewriting proofs with the heuristic in place. The heuristic can be simple or complex, depending on how fine-grained the user wants to be in organizing choices of proof step sequencing. We are currently exploring appropriate heuristics for type-theory based proofs, but these may be different for other domains. Once the heuristic is decided upon, the difficulty is in rewriting raw tactics that have been reordered, because many tactics take hypotheses as inputs. Abella provides ways of renaming hypotheses and setting new names for created hypotheses, but the transformation is still non-trivial as it requires reasoning about the effects of tactics on the proof state. Such reasoning would require in some sense a reimplementation of the Abella logic to ensure the new proof script is correct, or it could be done on-line within an instance of the Abella interactive prover. Either option requires additional software development. For this thesis, proof scripts are taken as is. This approach is sound because the small set of proofs used for testing can be examined by hand and rewritten to follow the same heuristics. As such, a first step to expanding this work to a larger corpus will be either implementing the heuristic translation into a normal form or defining a new proof representation that forgoes the need for a normal form.

Defining a Proof Tree

To abstract the proof structure to a higher level of representation, it is necessary to form hierarchical categories for portions of the tree, beyond having nodes only connected by edges. This tree structure, named a Proof Tree, is constructed from three inductively defined types: sequences, independent sets, and branches, along with nodes at the bottom level to represent each proof step. This hierarchical structure is similar to the proof representation developed in [29].

First is the branch, where a proof step with tactic case will cause the proof to branch based on different subgoals. For the construction of a branch, the node that causes the branching of the proof is stored, then the subproofs that follow are each their own proof tree with an additional annotation for the instantiation that occurred. For add exists step 3, the branch will be modelled as a node with several children trees annotated by their specific case, as seen in Figure 45. Branches will be shaded light orange.

Figure 45. Branch structure for add exists step 3.

Branch := Node, [(Proof Tree, Term Instantiation)]

Figure 46. Branch definition for the Proof Tree.

Next is the sequence, used to capture the dependencies between proof steps. The sequence is a list of proof trees, where a dependence is formed between the elements of the list in order. The example in Figure 47 has three proof steps all dependent on each other, even if only two of the proof steps have targets directed towards RHS predicates of the respective rule. Thus, these three proof steps will be placed in a sequence as nodes.

The proof steps are represented as nodes within the sequence because there is no additional structure to be added to the three steps. Sequences will be shaded light blue.

Figure 47. Sequence structure for mult exists steps 5,6,7.

Sequence := [ Proof Tree ]

Figure 48. Sequence definition for the Proof Tree.

The last constructor is the independent set, or ISet. The purpose of this constructor is to capture portions of a proof where independent threads are working towards one goal. Take the example for a proof of progress for the plus constructor, in the case where both expressions are values (see Figure 49). In this proof, the arguments used for the add exists theorem are both constructed independently, i.e., they can be rearranged in any order as long as the ordering within a sequence is respected. Note that this example has an independent set where the two elements in the set are sequences, as shown by the coloring. ISets will be shaded light green.

Figure 49. ISet structure for progress steps 10-14.

ISet := [ Proof Tree ]

Figure 50. ISet definition for Proof Tree.

The complete definition of the Proof Tree is shown in Figure 51, tying the three inductively defined constructors and the node together into one definition.

Proof Tree := Sequence | ISet | Branch | Node

Figure 51. Proof Tree definition.
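To make the constructors concrete, the inductive definition in Figure 51 could be rendered as algebraic data types. The following Python sketch is illustrative only; the field names and the simplification of targets and instantiations to strings are assumptions, not the thesis implementation.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

@dataclass
class Node:
    # One proof step; the target annotation is simplified to a string.
    step_id: int
    tactic: str
    target: str

@dataclass
class Sequence:
    # Ordered children: each element depends on the previous one.
    children: List["ProofTree"]

@dataclass
class ISet:
    # Unordered children: independent threads working toward one goal.
    children: List["ProofTree"]

@dataclass
class Branch:
    # The branching step plus one subtree per subgoal, each annotated
    # with its term instantiation.
    node: Node
    cases: List[Tuple["ProofTree", str]]

ProofTree = Union[Sequence, ISet, Branch, Node]
```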

Constructing a Proof Tree

Constructing the proof tree involves capturing the branching constructs as well as the dependencies between sequences of proof steps. Branches are easy to construct, as they follow directly from the original tree structure, and the branch annotations can be output by Abella with the command Set instantiations on. This command will show how variables from the branching term were instantiated. Untangling sequences and independent sets is trickier. The reason is that independent sets can exist inside a sequence at an arbitrary point, as in Figure 52, where the two inputs for add exists are worked on independently then brought back together to be used as arguments for the same theorem. This issue is resolved by forming a topological sort based on the dependencies of proof nodes within a branch. The targets cannot be used directly, because some dependencies are not clear, especially when a target is a predicate from the RHS but other predicates on the RHS depend on it. This is not captured by the notion of target, but instead by the notion of sequence. Thus, the sort must be found by examining the change in local environment between each node, and whether the points of change are used in subsequent nodes. Once the topological sorting is achieved, it is straightforward to distinguish sequences from independent sets, where the final ordering follows the node IDs given by the ordering of tactic commands in the original proof script.

The branch of the proof of progress where both expressions are values (proof steps 9-13) has the topological sort shown in Figure 52, where vertices are the proof step IDs. Both paths are used to supply arguments to the lemma in step 13, but neither path affects the other, as can be seen by their separation. To the right of this example is an arbitrary topological sort that may appear in a proof branch.

The algorithm for constructing independent sets versus sequences works by moving in reverse order up the graph. If a node has multiple edges incident to it, a sequence is created including the current node, along with an independent set for those edges. Independent sets are closed if the paths from the set stem from a single node, for example N0 in Figure 53, and such a node is included in the same sequence as the independent set.
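Under the stated assumptions (each independent region re-joins at a single node, and the final proof step is the sink of the dependency graph), the reverse walk that folds a dependency graph into sequences and independent sets might be sketched as follows. The tuple encoding and function names are hypothetical, not the thesis implementation.

```python
from collections import defaultdict

def build_tree(nodes, edges):
    """Fold a dependency DAG over proof steps into nested sequence /
    independent-set structure, walking backwards from the final step.
    Edges point from earlier steps to the steps that depend on them;
    assumes the last node in `nodes` is the sink of the graph."""
    preds = defaultdict(list)
    for a, b in edges:
        preds[b].append(a)

    def walk(n):
        ps = preds[n]
        if not ps:                       # a source node
            return [n]
        if len(ps) == 1:                 # a simple chain extends the sequence
            return walk(ps[0]) + [n]
        # Multiple incident edges: the incoming paths form an
        # independent set, joined to the current node in a sequence.
        branches = [walk(p) for p in ps]
        # Close the set: a shared prefix (e.g. the single node N0 that
        # the paths stem from) moves out into the enclosing sequence.
        prefix = []
        while all(branches) and all(b[0] == branches[0][0] for b in branches):
            prefix.append(branches[0][0])
            branches = [b[1:] for b in branches]
        return prefix + [("iset", branches), n]

    return ("seq", walk(nodes[-1]))
```

On the progress example (steps 9-13), this yields a sequence containing step 9, an independent set of the two paths, and step 13, matching the structure described above.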


Figure 52. Topological ordering for branch of Progress proof (left), and an arbitrary proof (right).

Figure 53. Topological ordering with ISet and Sequence structure.

There are special cases that do not fit nicely into the scheme described by the algorithm above. Take for example the topological sort for the inductive case branch of mult exists (see Figure 54). The predicates on the RHS of the inductive rule for mult have dependencies expressed by the arrows from N5 to N6 to N7. However, the predicate in N5 will eventually be used in the search tactic for N8 because it is needed for the rule to be used. Earlier it was claimed that these steps should be in a sequence because they must happen in order; it would not be possible to have them in an independent set. Thus, for cases like this where an independent set is infeasible, the edge from N5 to N8 will be ignored, and the four steps will be captured as a sequence. However, this will be accounted for when the proof is lifted, as N5 will hold a special place in the sequence.

Figure 54. Topological sort for branch of mult exists proof.

It bears mentioning that a topological sort may not preserve the ordering found in the original proof script. For example, in the progress proof, the independent set can be performed in any order as long as the order within each sequence is preserved. This adds additional use to the proof tree: it can serve to disambiguate ordering in the proof script that is otherwise flexible. Further, one could use this structure to create a normal form, establishing an ordering based on the structure that is no longer arbitrary. Such a process could further advance attempts at machine learning, which are hindered by arbitrary human choice in proof scripts.

Once the sorting has been completed and the components of the tree are resolved, the construction of the entire tree is complete.

Proof Frames

A proof frame is the final step of abstraction, from the full proof with information about each proof state towards the instance of a strategy. As such, the proof frame contains only the information necessary to understand the strategy applied within the proof; namely, the proof nodes that contain targets relating to the strategy's application of the IH and the resulting embedding into a definition. This leaves out some chunks of the proof, and those holes will be filled in with a new structure called a gap, which represents a portion of the proof where the strategy had no further information to give. After these components are described in more detail below, the proof frame is compared to a proof tree, and possible additions to the proof frame are proposed. Lastly, the possible uses of the proof frame are viewed through the lens of database queries.

Extracting Strategies via Targets

The targets attached to each node were defined in such a way that makes extracting the strategy instance straightforward. The process is achieved by iterating through the nodes and keeping only the nodes with targets that are not links. What is left are the nodes that correspond with the strategy, either in applying the IH or in embedding proof steps into RHS terms of definitions. These are exactly the pieces of a proof that a strategy attempts to uncover. As was mentioned earlier, the introduction of new strategies and targets will reduce the number of links and make this extracted structure richer. However, there is a benefit to keeping the extracted structure simple, namely that it will be easier to implement the strategy in the reverse direction, starting from a theorem formula and attempting to generate a proof. In some cases, the extracted nodes will be almost the full proof, as for the existence of multiplication; but in other cases, many nodes will be removed, as can be seen in progress for the values branch, with the other branches left out of Figure 55.

Figure 55. Multiplication exists with nodes removed (left), progress with nodes removed (right).
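The extraction itself is little more than a filter over the annotated nodes. A minimal sketch, assuming proof steps are dicts with an illustrative "target" key (not the thesis's actual data layout):

```python
def extract_strategy_nodes(steps):
    """Keep only proof steps whose target ties them to the strategy,
    i.e. an IH application or an embedding into a definition's RHS;
    steps annotated with a mere link target are dropped."""
    return [s for s in steps if s["target"] != "link"]
```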

Dependencies are not abandoned during this extraction process. The original proof tree, which captures branching, independent sets, and sequences, is kept, but must be lifted to match the remaining nodes. This means that proof tree structure relating to nodes that are not part of the extracted node set will not be kept in the proof tree. In some cases, this causes no change, as can be seen in the existence of multiplication proof where step 6 has been removed but the sequence stays the same (see Figure 56).

However, the proof of progress sees a lifting of the proof tree structure for steps 9-13, leaving only the branch and an outer sequence structure (see Figure 57). 83 83

Figure 56. Proof Tree lifting of multiplication exists steps 5-7.

Figure 57. Proof Tree lifting of progress steps 9-13.
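Lifting could be sketched as a recursive rebuild that drops removed nodes and collapses structures that become empty or trivial. The tuple encoding of proof trees below is an assumption for illustration, not the thesis's representation:

```python
def lift(tree, keep):
    """Rebuild a proof tree over only the node ids in `keep`.
    Trees are encoded as ("node", id), ("seq", [..]), ("iset", [..]),
    or ("branch", node_tree, [(subtree, instantiation), ..])."""
    kind = tree[0]
    if kind == "node":
        return tree if tree[1] in keep else None
    if kind == "branch":
        node = lift(tree[1], keep)
        cases = [(lift(t, keep), inst) for t, inst in tree[2]]
        cases = [(t, i) for t, i in cases if t is not None]
        if node is None and not cases:
            return None
        return ("branch", node, cases)
    # "seq" or "iset": lift the children and drop the empties.
    kids = [lift(c, keep) for c in tree[1]]
    kids = [k for k in kids if k is not None]
    if not kids:
        return None
    if len(kids) == 1:          # a singleton structure collapses to its child
        return kids[0]
    return (kind, kids)
```

On the multiplication-exists example, removing step 6 from the sequence of steps 5-7 leaves the sequence intact, as the text describes.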

Gaps

Extracting the strategy information as in the previous section will leave some branches untouched, where the IH application is not embedded into a definition. A placeholder, referred to as a gap, will fill the place of these proof steps. Each gap records the preconditions, the instantiations that make up the branch, and the postcondition, how the subgoal is to be solved. Take the proof of progress, plus case, where both E1 and E2 are values. The proof steps 9-12 do not embed into a definition with the current understanding of a strategy, and so would not be extracted in the previous section. Instead, they will be replaced by a gap with preconditions E1 = val E1 and E2 = val E2, and postcondition target add N1 N2 N3 for the third rule of the step judgement.

Figure 58. Gap in progress from steps 9-13.

Gap := (Pre Conditions, Post Condition)

Pre Conditions := branch instantiations

Post Condition := Target

Figure 59. Gap Definition.
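The gap definition in Figure 59 could be rendered directly as a record type; the string representations of instantiations and targets below are simplifications for illustration:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Gap:
    """Placeholder for proof steps the strategy says nothing about:
    preconditions are the branch instantiations in force when the gap
    begins; the postcondition is the target of the branch's leaf node."""
    preconditions: List[str]
    postcondition: str

# Hypothetical instance for the progress / plus value case (Figure 58):
gap = Gap(preconditions=["E1 = val E1", "E2 = val E2"],
          postcondition="add N1 N2 N3")
```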

The preconditions of a gap can be found by examining the lifted proof tree from the previous section, where branch constructors will still exist and contain the relevant instantiations for a particular branch. The postcondition is simply the target of the leaf node, where information like which hypotheses are used in an unfold can be ignored, since those hypotheses did not exist at the outset of the gap. Gaps lose a lot of information from the underlying proof tree. Keeping some of this information as annotations to the gap will be discussed in a following section.

However, the core of the problem is that strategies do not extend to these gaps. Creating more complex strategies that can fill in some of the structure of gaps could be useful when the same gaps reoccur frequently and can be learned and incorporated into the strategy. Two other solutions that do not compromise the simplicity and generality of strategies are using a secondary search algorithm to solve gaps or using proof by analogy to find and reuse existing proofs with similar gaps. These will be discussed in the section on Using Proof Strategies.

Defining a Proof Frame

A proof frame is defined in a similar way to a proof tree, with one exception: the existence of gaps, each of which can fill the spot of multiple nodes from the proof tree. Thus, adding the definition of a gap from Figure 59 as a constructor after the Node in the proof tree definition gives Figure 60, where the three inductively defined constructors are also modified to include a gap. Gaps were defined in the previous section in such a way that they only occur in subgoals with no nested branching, and begin with the instantiations of the branch as preconditions and the target of the leaf node of the branch as a postcondition. In such a formulation, it would be appropriate to include gaps only in the inductive definition of the Branch, not as a constructor for a proof frame. However, it is imagined that gaps will be a useful concept for signifying holes throughout a proof, not only at the branches. Extending the concept of a gap in this way would necessitate its appearance as a constructor for a proof frame, since it may appear within any of the other constructors. So, it is included in the proof frame definition in Figure 60 in order to signify the concept's generalizability to the rest of the proof, even if this work limits it to branches.

Proof Frame := Sequence | ISet | Branch | Node | Gap

Figure 60. Proof Frame definition.

Hints from Strategy and Definitions, Adding Information to the Frame

The proof frame represents the minimal expression of a strategy instance in a proof (though this could be reduced even further by removing node information other than the target). This serves as a starting point for visualization and summarization of proofs, as well as for learning proof strategies across a large corpus. The reason the proof frame is kept in close correspondence with a strategy is that the application of a strategy to a theorem formula should produce at minimum a proof frame. Then proof reuse can be implemented through analogy by matching proofs with similar frames, or patterns can be learned across multiple frames that allow the solver to refine a given frame. Further, the proof frame can be used as a way of classifying proof structure without including features specific to the proof that can distract from generalizable patterns across corpuses.

That said, there may be instances where adding additional information to the proof frame is beneficial. Take for example cases where a strategy gives valuable information in the inductive branch that can be used in the base case branch of a proof.

This can be seen in proofs like uniqueness, where the second argument is inverted before applying the IH, and the same is done in the base case. Displaying that sort of information in the proof frame will make it possible to extend strategies to non-inductive branches through learning. Additionally, information like the type of lemmas used at given nodes could enhance learning. For example, existence proofs typically use existence lemmas and uniqueness proofs use uniqueness lemmas on inductively defined terms. Adding this information to the frame can allow strategies to further refine their frame, giving not only steps that correspond with RHS terms of definitions, but also the type of lemma that should be used at that step. Adding information of this sort could compromise the generalizability of strategies, but that is not an issue when working within a domain with similar types of proofs that exhibit a finer-grained concept of strategy.

An additional form of information that may aid visualization and summarization along with learning is including the complexity or difficulty of certain parts of the proof. For example, the link in the existence of multiplication proof covers one proof step, and the lemma would not be hard to find with search if it is already known that the next step uses the existence of add lemma. The gap in Figure 58 for a case of progress has four steps that are lifted, including the ISet contained in the proof tree. It would be more difficult to recover these steps with search, as the chain from val E1, val E2 to the add exists lemma is longer. So, annotating gaps with the number of steps, types of tactics used, and complexity of the underlying proof tree can give the user an idea of which sections of the proof frame will be more difficult to fill in than others.

Querying the Data

First, a word on implementation. Each of the programs (extracting nodes from proof steps, extracting a proof tree from nodes, and extracting a proof frame from a proof tree) is implemented independently. In fact, the pipeline starts back in Section 5 with the parsing of sig, mod, and thm files, which are then re-proved to extract proof state information. For such a system, the way in which data is stored in a database is up to the user. One option is to store only the proof step information; when a proof frame is needed, the proof step information can be retrieved and passed through the pipeline. Alternatively, the proof frames can be stored alongside the proof step information, requiring more storage space that is possibly redundant, but reducing the time needed to access proof frames.
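The storage trade-off just described might look like the following sketch, where `to_tree` and `to_frame` stand in for the thesis's pipeline programs and a plain dict stands in for the database:

```python
import json

def store_proof(db, name, steps):
    """Persist only the raw proof-step records; trees and frames are
    re-derived on demand, trading compute time for storage space."""
    db[name] = json.dumps(steps)

def load_frame(db, name, to_tree, to_frame):
    """Re-run the later pipeline stages (steps -> tree -> frame) on
    the stored records; the two callables stand in for the thesis's
    extraction programs."""
    steps = json.loads(db[name])
    return to_frame(to_tree(steps))
```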

The obvious and intended use of the processed data presented in this section is machine learning. At the strategy level, proof frames can be used in classification engines, matching theorem formulas with their respective strategies, as well as matching proof states in a node with specific components of a strategy. At the lower level, proof frames can be overlaid onto the proof tree, and one can learn how a strategy predicts the specific tactic sequences used within a proof.

There are additional uses of the data that can benefit users. First, finding a similarity metric between proof frames or proof trees would allow users to query the database for existing proofs that match the theorem they seek to prove. This would allow a kind of proof by analogy, where the user or an automated system can take the similar existing proof and transform it so that it can be used to solve the current proof. Or, seeing the existing proof may give the user guidance on how to structure the proof they are working on. Another type of query in this vein is finding commonly occurring tactic sequences. If the proof development is partially completed, it could be useful to find the most common tactics that occur after the last tactic input, giving additional guidance to the user. This idea can be extended to automated systems that leverage these recurring patterns to refine the proof frames output by strategies at the tactic level.
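The common-tactic-sequence query could be sketched as a simple bigram count over a corpus of proof scripts, here represented as lists of tactic names (an illustrative layout, not the thesis's):

```python
from collections import Counter

def common_followers(corpus, last_tactic):
    """Count which tactic most often follows `last_tactic` across a
    corpus of proof scripts, suggesting likely next steps during
    proof development."""
    counts = Counter()
    for tactics in corpus:
        for a, b in zip(tactics, tactics[1:]):
            if a == last_tactic:
                counts[b] += 1
    return counts.most_common()
```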

Of course, statistical information about corpuses can be gleaned, looking at proof size, strategy types, recurring tactic sequences, gap sizes, and more. Beyond the statistical understanding, specific proofs could be visualized as a proof frame, proof tree, or full proof with steps. These levels of abstraction allow a user to digest a concise summary of a proof based on the strategy instance, then pull back the layers of abstraction if they wish to see the finer details like tactic applications and hypothesis naming.

While the data structures and resulting database were created with a specific reason in mind, there will always exist interesting unforeseen uses for the data. Providing the data and a set of tools for exploring the data will allow users to discover these uses as it suits their own needs.

SOLVING PROOFS USING STRATEGIES

Given a conjecture formula and the definitions within a specification, it is possible in some cases to apply a strategy and derive a proof frame. A proof frame is an outline for how the proof should proceed, based on how the strategy controls the application of the IH and embeds the result into the RHS of corresponding definitions.

This frame is naturally linked to the inductive definitions in the specifications and will follow the computation within those definitions to provide a path for the proof to follow.

This section gives a description of how a system might use the concept of strategies and proof frames to automate the theorem proving process. The proof frame may have gaps, and the exact tactic choices are unclear. Two possibilities for filling in the proof frame, proof by analogy and search, are discussed as well.

Using a Strategy

Proof strategies can be applied in a couple of ways within a proof solver. First, several strategies could be iterated over and applied to a theorem, returning the set with the most complete proof frames and abandoning those strategies that fail. Second, a classification method could be used to determine which strategy should be used on a theorem. The second option seems better suited, as there are some obvious features of a proof that can connect it to a strategy. For example, uniqueness proofs, or other proofs where the IH has arguments with shared variables, should use a strategy that inverts the IH arguments before applying the IH. Also, definitions with multiple inductive RHS terms should use a strategy that applies the IH to each of these terms, as is the case for the proof of progress. Simple features of a goal or the definitions can guide the choice of strategy, and an enumeration of these features would correspond to a heuristic approach to selection. This approach is well-suited for domains where the features in proofs using different strategies are well-separated and easily noticeable. Such features can be encoded with expert knowledge. Separately, one of the purposes of generating a database of proof frames is to extract a classifier for strategies from the data set.
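A heuristic selector over hand-encoded features, as described above, might be sketched like this; the feature keys and strategy names are invented for illustration:

```python
def select_strategy(features):
    """Pick a strategy from simple hand-encoded features of the goal
    formula and the definitions it mentions."""
    if features.get("ih_has_shared_variables"):
        # e.g. uniqueness proofs: invert IH arguments before applying.
        return "invert-then-apply-IH"
    if features.get("inductive_rhs_terms", 0) > 1:
        # e.g. progress: apply the IH to each inductive RHS term.
        return "apply-IH-per-RHS-term"
    return "default-IH"
```

A learned classifier extracted from the proof-frame database would replace this hand-written cascade while keeping the same interface.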

Figure 61. Basic system design for using strategies and proof frames.

After a strategy is selected, it can be applied to the theorem formula and a proof frame can be constructed. With the proof frame in hand, there are several possibilities to move forward. Figure 61 shows how the proof frame can be input into one of three programs: a visualization processor, a proof reuse engine, or an automated solver like a hammer. In the first case, the proof frame is formatted into a guide that the user can visualize and use as an aid as they complete the proof by hand. In the second case, the database is searched, looking for existing frames that have a similar structure. When a similar frame is found, it can be used via proof-by-analogy methods, and the proof frame can be refined with the additional information. This concept is detailed in the following section on analogy. The last option is sending the frame to an automated solver that fills in the gaps of the frame with raw tactics, producing a proof script. One way to fill in the frame is by back chaining over the gaps, a method discussed in the section on search below.

Filling in the Frame

As is shown in Figure 61, beyond visualization, an important part of the strategy-based system is constructing a proof frame that can be refined by additional systems. In a way, the proof frame provides a guide not only for users, but also for automated systems, making their job easier. In the first example, a proof frame can be used to find other similar proofs in the database by comparing components of the frame. By using the frame for comparisons, the matching should be more effective than matching raw theorem formulas and hoping to hit a proof that can be useful. Once a match is found, analogy techniques can be used to morph the found proof's steps into something that works within the current proof frame. A second option is sending the proof frame to an automated solver to produce a proof script. Again, the proof frame has already significantly limited the search space, so the automated solver will have a head start in finding an appropriate proof script to solve the problem.

Analogy

Proof by analogy is a well-studied method of proof reuse where a similarity metric is used to search for a matching proof, and that matching proof is modified so that it can be used to solve the current problem. Various similarity metrics and rewrite strategies have been considered, some using machine learning or heuristics. The promise of a proof frame is to reduce the difficulty of the search for matching proofs, because the similarity metric considers not only formula structure but the components of a proof frame as well. In fact, it is possible to match only certain components, like gaps within a proof frame, retrieving valuable information from subproofs within otherwise distinct proofs.

Figure 62. Proof Frame comparison for progress on value cases of plus (left) and length (right).

Take the proof of progress as an example. Assume you are trying to prove progress for plus for the first time, and you have retrieved the proof frame from the theorem formula. Further, you had previously proved the same theorem but with length as an expression instead. Figure 62 shows the side-by-side proof frames for the value case with plus on the left and length2 on the right. It can be noticed immediately that both proof frames have a gap with preconditions as value instantiations and postconditions as step rules. This may signify that the proof for the length case could help guide the proof for the plus case. The next step would be looking under the hood of the length proof, examining the proof steps that form the gap. Figure 63 shows these steps on the left, where the arguments are left blank. Transforming those steps so they match the proof frame for the plus case, replacing cfl_string with cfl_num and length_exists with add_exists, will add new information to the existing proof frame. This structure does not yet capture the fact that both E1 and E2 in the plus case will need to produce arguments for add_exists, but it does provide more information than was previously given. This could be further refined with additional analogy-based methods to reach an output proof script.

Figure 63. Proof frame refinement through analogy for progress on value cases of plus (left) and length (right).
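The identifier substitution at the heart of this analogy step could be sketched as a straightforward rewrite over tactic strings; the tactic text and mapping below are illustrative, not output of the thesis system:

```python
def transfer_by_analogy(steps, mapping):
    """Rewrite a retrieved proof's tactic strings by substituting the
    analogous identifiers, mirroring the length -> plus example."""
    rewritten = []
    for step in steps:
        for old, new in mapping.items():
            step = step.replace(old, new)
        rewritten.append(step)
    return rewritten
```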

Search

Another use of a proof frame is in reducing the search space for existing automated solvers. The frame should provide a guide that reduces the suggested actions for a given step, and further breaks the proof down into components with gaps, such that those components can be solved independently, reducing the depth that a search algorithm must travel.

2 Length does not appear in Appendix A, as it is only used once as an example. Its implementation is irrelevant to this use.

One such example can be seen with a basic back chaining algorithm applied to the inductive branch of the proof frame for multiplication existence. The proof frame information can be seen in Figure 64, where the strategy works up to the application of the IH, giving the target for step 5 and its embedding into the second definition of multiplication along with the mult term on the RHS of the definition. This will also yield the proof state of step 6 if the strategy has been applied in an interactive setting, which is possible given the system presented in Section 5. The proof frame includes the other terms on the RHS of the inductive definition, which in this case is the add term. Lastly, the goal is extracted from the proof state of step 6.

Figure 64. Proof frame information for inductive branch of multiplication exists.

With the information provided by the frame in Figure 64, a search moving backwards from the goal can fill in the tactic commands necessary to complete the branch. In brief, back chaining works by starting from a goal and deciding what is needed to solve that goal. Then those needed elements are taken one by one, deciding what is needed to solve each of them. The process continues until facts are reached, which can then be chained forward to the goal. Specifically, this can be achieved by matching the needed terms with the LHS of a definition, then needing the RHS terms of the definition. Or, the needed terms could be matched with the target of an implication, then needing the antecedents.

Figure 65 gives a visualization of the back chaining that can occur, starting from the add term which is needed to prove the goal, and which can be found in the proof frame. The add term is matched to the target of the add exists lemma, with the instantiations of the quantified variables to the left of the lemma. Then what is needed is the two nat judgements, one of which exists as a hypothesis. The second nat judgment can be matched with the target of another lemma, again with corresponding instantiations. The needed mult term for this lemma exists as a hypothesis, so the search is complete, and the proof can be extracted by reversing the path of the search.

Figure 65. Back chaining visualization for inductive branch of multiplication exists.
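A naive version of this back chaining, with lemmas and definition clauses flattened into (premises, conclusion) rules, ignoring term instantiation, and assuming an acyclic rule set, might be sketched as:

```python
def back_chain(goal, rules, facts):
    """A goal is provable if it is a known fact, or if some rule
    concludes it and all of that rule's premises are provable.
    Returns the rules used, in the forward order in which the proof
    would apply them, or None if the goal cannot be reached."""
    if goal in facts:
        return []
    for premises, conclusion in rules:
        if conclusion == goal:
            plan = []
            for p in premises:
                sub = back_chain(p, rules, facts)
                if sub is None:
                    break           # this rule fails; try the next one
                plan.extend(sub)
            else:
                return plan + [(premises, conclusion)]
    return None
```

On a simplified rendering of the multiplication-exists branch, the search matches the add goal against the add exists lemma, discharges one nat premise as a hypothesis, derives the other from the mult hypothesis, and returns the rule applications in forward order.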

CONCLUSION

The work presented above provides a step towards the implementation of a fully automated theorem prover that can learn strategies from a corpus of proof frames and apply them to new theorems with the aid of underlying search procedures. Indeed, such a system is still the goal of this thesis, and the groundwork is all but complete for early prototyping. However, the merit of this thesis is not solely bound to the prospects of future work but can be seen in the artifacts already generated.

In brief, the thesis describes a way of viewing inductive proofs based on the idea of a strategy and provides a concrete representation for this idea via proof frames, along with the procedure for extracting the data from proof scripts. This is accompanied by the programs that parse and extract proof data from within the Abella system, along with programs for transforming the data into proof trees and proof frames, all combined with APIs for JSON database storage. So, the contribution of this thesis is not only providing the theory behind a strategy and its representation, but also demonstrating that this representation corresponds to, and can be abstracted from, existing proofs, by providing the details for implementing the data extraction in the proof system Abella.

There are many ways this work can be extended and serve as the basis for future research projects. This work can be directly used for collecting information through queries on the database, as well as for developing more concise, understandable visualizations of proofs. As noted earlier, it also provides the groundwork for a theorem prover that employs powerful machine learning techniques over the database to solve proofs via strategies. There are also some pieces of the representation that are left open for future exploration, including the use of a normal form or an abstraction of proof trees that captures dependencies between branches. Also, expanding the set of targets and strategy components to account for proofs with more complex strategies like strong induction, and even creating a procedure for strategy discovery, are possible directions extending this work.

In the end, the intuition carrying this work is that breaking a proof into levels of abstraction, starting from a strategy that looks at a goal formula and definitions for guidance, then filling in the gaps with search, will be an effective form of automation.

Even if black-box machine learning advances to the point where such reasoning is no longer necessary, something about understanding the structure behind the process of proving theorems is enticing, and should be held onto for as long as possible.

REFERENCES


[1] A. Appel et al., "Position paper: the science of deep specification", Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 375, no. 2104, 2017. Available: https://royalsocietypublishing.org/doi/pdf/10.1098/rsta.2016.0331. [Accessed 5 May 2020].

[2] A. Bundy. “The automation of proof by mathematical induction,” in A. Robinson and A. Voronkov, editors. Handbook of Automated Reasoning, vol. I. Elsevier Science, 2001, ch. 13, pp. 845-911.

[3] F. Pfenning. "Logical frameworks," in A. Robinson and A. Voronkov, editors. Handbook of Automated Reasoning, vol. I. Elsevier Science, 2001, ch. 17, pp. 1065-1145.

[4] The Coq Proof Assistant Reference Manual: Version 8.1. 2019. [Online]. Available: https://coq.inria.fr/distrib/current/refman/. [Accessed 2019].

[5] C. Paulin-Mohring, "Introduction to the Calculus of Inductive Constructions," in All about Proofs, Proofs for All, B. Woltzenlogel Paleo and D. Delahaye, Eds., Studies in Logic, vol. 55. College Publications, 2015.

[6] R. Constable et al., Implementing mathematics. Prentice-Hall, 1986. [Online]. Available: http://www.nuprl.org/book/. [Accessed 5 May 2020].

[7] L. de Moura, S. Kong, J. Avigad, F. van Doorn, and J. von Raumer, "The Lean Theorem Prover," in Automated Deduction - CADE-25, 25th International Conference on Automated Deduction. Springer, 2015, pp. 378-388.

[8] L. Paulson. Old Introduction to Isabelle. 2010. [Online]. Available: https://www.researchgate.net/publication/228742561_Old_Introduction_to_Isabelle #pf8. [Accessed 2019].

[9] T. Nipkow, L. C. Paulson, and M. Wenzel, Isabelle/HOL: a proof assistant for higher-order logic. Springer Science & Business Media, 2002.

[10] K. Yang and J. Deng, "Learning to prove theorems via interacting with proof assistants," arXiv preprint arXiv:1905.09381, 2019. Accepted to 36th International Conference on Machine Learning.

[11] C. Kaliszyk, F. Chollet, and C. Szegedy, "Holstep: A machine learning dataset for higher-order logic theorem proving," arXiv preprint arXiv:1703.00426, 2017. Appeared as poster in 5th International Conference on Learning Representations. 100 100 [12] C. Kaliszyk, J. Urban, and J. Vyskocil, "Efficient semantic features for automated reasoning over large theories," in Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.

[13] T. Gauthier, C. Kaliszyk, and J. Urban. TacticToe: Learning to reason with HOL4 tactics. In T. Eiter and D. Sands, editors, 21st International Conference on Logic for Programming, Artificial Intelligence and Reasoning, LPAR-21, volume 46 of EPiC Series in Computing, pages125–143. EasyChair, 2017.

[14] D. Huang, P. Dhariwal, D. Song, and I. Sutskever, "Gamepad: A learning environment for theorem proving," arXiv preprint arXiv:1806.00608, 2018. Accepted in 7th International Conference on Learning Representations.

[15] A. Paliwal, S. Loos, M. Rabe, K. Bansal, and C. Szegedy, "Graph representations for higher-order logic and theorem proving," arXiv preprint arXiv:1905.10006, 2019.

[16] G. Lederman, M. N. Rabe, E. A. Lee, and S. A. Seshia, "Learning Heuristics for Automated Reasoning through Reinforcement Learning," 2018. Under review in 7th International Conference on Learning Representations. [Online] Available: OpenReview, https://openreview.net/pdf?id=HkeyZhC9F7. [Accessed 5 May 2020].

[17] K. Bansal, S. M. Loos, M. N. Rabe, and C. Szegedy, "Learning to reason in large theories without imitation," arXiv preprint arXiv:1905.10501, 2019. Under review.

[18] K. Bansal, S. M. Loos, M. N. Rabe, C. Szegedy, and S. Wilcox, "Holist: An environment for machine learning of higher-order theorem proving (extended version)," arXiv preprint arXiv:1904.03241, 2019. Short version in International Conference on Machine Learning, 2019, pp. 454-463.

[19] C. Kaliszyk, J. Urban, H. Michalewski, and M. Olšák, "Reinforcement learning of theorem proving," in Advances in Neural Information Processing Systems, 2018, pp. 8822-8833.

[20] T. Gransden, N. Walkinshaw, and R. Raman, "SEPIA: search for proofs using inferred automata," in International Conference on Automated Deduction, 2015: Springer, pp. 246-255.

[21] F. Hutter, H. H. Hoos, K. Leyton-Brown, and T. Stützle, "ParamILS: an automatic algorithm configuration framework," Journal of Artificial Intelligence Research, vol. 36, pp. 267-306, 2009.

[22] J. Urban, "Blistr: The blind strategymaker," arXiv preprint arXiv:1301.2683, 2013. 101 101 [23] D. Kühlwein and J. Urban, "MaLeS: A framework for automatic tuning of automated theorem provers," Journal of Automated Reasoning, vol. 55, no. 2, pp. 91-116, 2015.

[24] J. Hurd, "First-order proof tactics in higher-order logic theorem provers," Design and Application of Strategies/Tactics in Higher Order Logics, number NASA/CP- 2003-212448 in NASA Technical Reports, pp. 56-68, 2003.

[25] M. Jamnik, M. Kerber, M. Pollet, and C. Benzmüller, "Automatic learning of proof methods in proof planning," Logic Journal of the IGPL, vol. 11, no. 6, pp. 647-673, 2003.

[26] H. Duncan, "The use of data-mining for the automatic formation of tactics," Ph. D. thesis, University of Edinburgh, 2007.

[27] J. P. Bridge, “Machine learning and automated theorem proving,” Ph. D. dissertation, No. UCAM-CL-TR-792. University of Cambridge, Computer Laboratory, 2010.

[28] R. Evans, D. Saxton, D. Amos, P. Kohli, and E. Grefenstette, "Can neural networks understand logical entailment?," arXiv preprint arXiv:1802.08535, 2018. Accepted in 6th International Conference on Learning Representations.

[29] A. Velykis, “Capturing Proof Process,” Ph. D. dissertation, Newcastle University, 2015.

[30] E. Denney, J. Power, and K. Tourlas, "Hiproofs: A hierarchical notion of proof tree," Electronic Notes in Theoretical Computer Science, vol. 155, pp. 341-359, 2006.

[31] D. Baelde et al., "Abella: A system for reasoning about relational specifications," Journal of Formalized Reasoning, vol. 7, no. 2, pp. 1-89, 2014.

APPENDICES

APPENDIX A: EXAMPLE SPECIFICATIONS AND THEOREMS

This appendix contains the specifications used in the examples throughout the thesis, including the sig, mod, and thm files as they would appear in a regular Abella development. The thm files contain proof scripts only for the proofs used in examples; theorems that serve as lemmas appear without their corresponding proof scripts. An application of the skip tactic would suffice in place of completing those proofs.
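For example, the first lemma below, add_nat, could be admitted without its proof by applying skip:

Theorem add_nat : forall N1 N2 N3, {add N1 N2 N3} -> {nat N3}.
skip.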

sig thesis.

kind nat type.
type z nat.
type s nat -> nat.

kind typ type.
type numt typ.

kind exp type.
type num nat -> exp.
type plus exp -> exp -> exp.

type of exp -> typ -> o.
type step exp -> exp -> o.
type val exp -> o.
type nat nat -> o.
type add nat -> nat -> nat -> o.
type mult nat -> nat -> nat -> o.
type divModGap nat -> nat -> nat -> nat -> nat -> o.

module thesis.

nat z.
nat (s N) :- nat N.

add z N2 N2 :- nat N2.
add (s N1) N2 (s N3) :- add N1 N2 N3.

mult z N2 z :- nat N2.
mult (s N1) N2 N4 :- mult N1 N2 N3, add N3 N2 N4.

divModGap z (s N2) z z N2 :- nat N2.
divModGap (s N1) (s N2) (s N3) z N2 :- divModGap N1 (s N2) N3 N2 z.
divModGap (s N1) (s N2) N3 (s N4) N5 :- divModGap N1 (s N2) N3 N4 (s N5).

of (num N) numt :- nat N.
of (plus E1 E2) numt :- of E1 numt, of E2 numt.

val (num N).

step (plus E1 E2) (plus E1' E2) :- step E1 E1'.
step (plus E1 E2) (plus E1 E2') :- val E1, step E2 E2'.
step (plus (num N1) (num N2)) (num N3) :- add N1 N2 N3.

Specification "thesis".

Theorem add_nat : forall N1 N2 N3, {add N1 N2 N3} -> {nat N3}.

Theorem add_exists : forall N1 N2,
  {nat N1} -> {nat N2} -> exists N3, {add N1 N2 N3}.
induction on 1. intros. case H1.
  search.
  apply IH to H3 H2. search.

Theorem add_unique : forall N1 N2 N3 N3',
  {add N1 N2 N3} -> {add N1 N2 N3'} -> N3 = N3'.
induction on 1. intros. case H1.
  case H2. search.
  case H2. apply IH to H3 H4. search.

Theorem add_z : forall N, {nat N} -> {add N z N}.

Theorem add_s : forall N1 N2 N3, {add N1 N2 N3} -> {add N1 (s N2) (s N3)}.

Theorem mult_nat : forall N1 N2 N3, {mult N1 N2 N3} -> {nat N3}.

Theorem mult_exists : forall N1 N2,
  {nat N1} -> {nat N2} -> exists N3, {mult N1 N2 N3}.
induction on 1. intros. case H1.
  search.
  apply IH to H3 H2. apply mult_nat to H4. apply add_exists to H5 H2. search.

Theorem mult_unique : forall N1 N2 N3 N3',
  {mult N1 N2 N3} -> {mult N1 N2 N3'} -> N3 = N3'.
induction on 1. intros. case H1.
  case H2. search.
  case H2. apply IH to H3 H5. apply add_unique to H4 H6. search.

Theorem dmg_nat : forall N1 N2 N3 N4 N5, {divModGap N1 N2 N3 N4 N5} -> {nat N3} /\ {nat N4} /\ {nat N5}.

Theorem dmg_exists : forall N1 N2,
  {nat N1} -> {nat N2} ->
  exists N3 N4 N5, {divModGap N1 (s N2) N3 N4 N5} /\ {add N5 N4 N2}.
induction on 1. intros. case H1.
  apply add_z to H2. search.
  apply IH to H3 H2. apply dmg_nat to H4. case H8.
  case H5. apply add_z to H2. search.
  case H5. apply add_s to H10. search.

Theorem cfl_num : forall E, {of E numt} -> {val E} -> exists N, E = num N.

Theorem progress : forall E T,
  {of E T} -> {val E} \/ exists E', {step E E'}.
induction on 1. intros. case H1.
  search.
  apply IH to H2. case H4.
  apply IH to H3. case H6.
  apply cfl_num to H2 H5. apply cfl_num to H3 H7.
  case H2. case H3. apply add_exists to H8 H9. search.
  search.
  search.
