<<

1 A Survey on Theorem Provers in

M. Saqib Nawaz, Moin Malik, Yi Li, Meng Sun and M. Ikram Ullah Lali

Abstract—Mechanical reasoning is a key area of that lies at the crossroads of mathematical and artificial intelligence. The main aim to develop mechanical reasoning systems (also known as theorem provers) was to enable mathematicians to prove theorems by computer programs. However, these tools evolved with time and now play vital role in the modeling and reasoning about complex and large-scale systems, especially safety-critical systems. Technically, mathematical formalisms and based-approaches are employed to perform and to generate proofs in theorem provers. In literature, there is a shortage of comprehensive documents that can provide proper guidance about the preferences of theorem provers with respect to their designs, performances, logical frameworks, strengths, differences and their application areas. In this work, more than 40 theorem provers are studied in detail and compared to present a comprehensive analysis and evaluation of these tools. Theorem provers are investigated based on various parameters, which includes: implementation architecture, logic and calculus used, support, level of automation, , , differences and application areas.

Index Terms—Mathematical , Reasoning, Interactive theorem provers, Automated theorem provers, automation, Survey. !

1 INTRODUCTION

Recent developments and evolution in Information and Com- events. However, there are some inherent limitations of these munication Technology (ICT) have made systems more techniques. A program can be only tested against the functional and more complex. The criteria that how much we can rely on requirements of the system, which may be not refined and may these systems is based on their correctness. Bugs or any loopholes contain ambiguities that leads to inadequate testing. Exhaustive in the system lead to severe risks that endanger human safety or testing of systems is not possible. Moreover, the time and budget financial loss. In recent times, bugs ratio has increased due to the constraints may affect the testing process. Simulations are also complex designs of the modern systems under market pressure based on assumptions and does not always cover all the aspects and user requirements. The efforts and cost required to correct of the system [77]. Furthermore, both testing and simulation can bugs increases as the gap widens between their introduction and not be used efficiently for analyzing the continuous or hybrid detection. Table 1, taken from [272], shows the relative costs systems. According to [82]: “Program testing is an effective way to fix bugs that are introduced in the requirements phase. A to find errors, but it does not guarantee the absence of errors”. bug introduced in a particular stage of a system development is On the other hand, formal methods formally verify the system relatively cheap to fix if also detected in that stage. It becomes correctness. Formal methods are “mathematical-based techniques more hard and expensive to fix a bug that is introduced in one that are used in the modeling, analysis and verification of both stage and detected in the other stage. For safety-critical systems, the and hardware systems” [275]. Formal methods allow the impact of bugs can be so large as to make a fix effectively the early introduction of models in the development life-cycle, mandatory. against which the system can be proved by using appropriate TABLE 1: Cost to fix a bug introduced in requirements phase . As mathematics is involved in the modeling and Bug Found at Stage Relative Cost to Fix analysis, 100% accuracy is guaranteed [123]. But why we need arXiv:1912.03028v1 [cs.SE] 6 Dec 2019 Requirements 1 (definition) Architecture 3 formal methods in place of other well-known, widely acceptable Design 5-10 and easy to use techniques such as testing and simulation? To System Test 10 Post-Release 10-100 answer this question, we first provide a few examples where testing and simulation failed. Testing and verification techniques are used in the system Air France Flight 447 crashed in June 2009, which resulted test phase to empirically check their correctness. In testing, a into hundred of casualties. During investigation, it was found that system is tested against the software/hardware requirements [201]. the probe sensors were unable to measure the accurate speed Similarly, simulation provides virtual environments for any real of the , which provides the automatic disengagement of autopilot. Similarly, in August 2005, Malaysian Airbus 124 landed • M. Saqib Nawaz is with School of and Technology, Harbin Institute of Technology, Shenzhen, China. E- unexpectedly 18 minutes after taking off due to a fault in its air mail:[email protected] data inertial reference unit. There are two accelerators that control • Moin Malik is with Department of Computer Science & IT, University of the airspeed of the flight, but one of them failed, which resulted Sargodha, Pakistan. E-mail:[email protected] into a sudden rapid climbing and passed almost 4000 feet higher • Yi Li and Meng Sun are with School of Mathematical Sciences, Peking University, Beijing, China. E-mail:{liyi math, sunm}@pku.edu.cn than expected without any warning. After investigation, it came • M. Ikram Ullah Lali is with Department of Computer Science, to know that on the failure of the first accelerator, the second Faculty of Computing and IT, University of Gujrat, Pakistan. E- one used the falsy data of former due to input anomaly in the mail:[email protected] software. The probe, which laid hidden for a decade, was not 2 found in testing because the designer had never assumed that work is to demonstrate how different they are. Moreover, the goal such an event might occur [58]. In June 2009, Metro Train in is to provide a proper guidance to new researchers in formal Washington crashed and as a result, the operator of the train and verification. In order to substantiate our work, a questionnaire 80 other people got injured severely. The cause of this incident has been designed for the evaluation of theorem provers. The was the design anomaly in the safety signal system. The safety questionnaire is then filled by the developers and active researchers system sent a green signal to the upcoming station, while the of theorem provers. Mechanical reasoning systems are investigated track was not empty. Similarly, other examples are failure of the for the following parameters: London Ambulance Service’s computer added dispatch system • used in the system, [217], Therac 25 [184], Anaesthetic equipment and the respiration • Implementation language of the system, monitoring device [189] which resulted into the casualties and • System type, financial losses. All listed accidents could have been avoided if • Platform-support in the system, the design of the systems were analyzed mathematically. • System category, whether it belongs to ATP or ITP, Formal methods techniques in contrast to testing permit the • value of the system (binary/fuzzy), exhaustive investigation and reveals those bugs which are missed • Calculus (deductive/inductive) of the system, by testing methods. Actual requirements of the system in such • (ZF/fuzzy) theoretic support in the system, techniques are translated into formal specifications which are • Programming paradigm of the system, mathematically verified and elaborate the actual behavior of the • (UI) of the system, system in real scenarios. Two most popular formal verification • Scalability of the system, methods are and theorem proving. In • Distributed/multi-threaded, model checking, a finite model of the system is developed first, • Integrated development environment (IDE) support, whose state space is then explored by the model checker to • Library support in the the system, examine whether a desired property is satisfied in the model or • Whether the system satisfy the de Bruijn criterion, and not [28]. Model checking is automatic, fast, effective and it can • Whether the system satisfy the Poincare´ of au- be used to check the partial specification of the system. However, tomation. model checkers still face the state-space explosion problem [62]. Model state space grows infinitely large with increase in the total Primarily, this survey is a collection of tables and figures number of variables and components used and the number of that illustrates various aspects of theorem provers. We received distinct values assigned to these variables [204]. replies from experts/developers of 16 theorem provers. Another Theorem proving on the other hand, can be used to handle 27 theorem provers characteristics are investigated through online infinite systems. In theorem proving, systems are defined and and research articles. We also report the top scientists specified by users in an appropriate mathematical logic. Im- who have proved maximum number of theorems in provers and portant/critical properties of the system are verified by theorem top provers in which most number of mathematical theorems provers. Theorem prover checks that whether a (goal) are proved till now. ATP that performed best at CADE ATP can be derived from a logical set of statements (/). system competition (CASC) are also discussed. CASC is a yearly It can model and verify any system that can be defined with competition for the first-order logic fully ATPs. Moreover, afore- the help of mathematical logics. It is akin to Computer mentioned parameters are used to compare the provers and to System (CAS) because both are used for symbolic computation. present their main characteristics and differences. Finally, their However, theorem provers have some advantages over CAS such applications are discussed along-with the existing recent work and as: flexibility in logic expressiveness, clear expression and more the potential application/problems, where they can be used. rigor. Theorem provers can be further categorized into two main The rest of the survey is structured as follows: An overview types: Automated Theorem Provers (ATPs) and Interactive Theo- on the historical background of theorem provers in the light of rem Provers (ITPs). ATPs deal with the development of automated logical frameworks is provided in Section 2. Related work is also computer programs to prove the goals [254]. In contrast, ITPs discussed. Section 3 elaborates the research methodology that is involve human interaction with computer in the process of proof based on the systematic literature review in . searching and development. That is why ITPs are also known as Section 4 presents the results of the questionnaire, where answers proof-assistants. Due to practical limitations in pure automation, of the experts and developers are presented. In Section 5, top interactive proving is the suitable way for the formalization of scientist that proved most of the theorems and top provers in which “most non trivial theorems in mathematics or computer system maximum number of theorems are proved till now are listed. correctness” [121]. Theorem provers have been used successfully ATP that performed best at CASC competition are also listed. in various domains such as biomedical [232], game [159], Finally, theorem provers are compared for aforementioned criteria. [147], economy [160], computer science [97], The strengths, in-depth analysis of the differences among the artificial intelligence [268] and self-adaptive systems [271]. Note existing theorem provers and their application areas are discussed that the terms theorem provers and mechanical reasoning systems in Section 6, along-with the future research directions. The survey are used interchangeably in this paper. concludes with some remarks in Section 7. We believe that a comprehensive review on mechanical rea- soning systems is strongly needed. People having little about them generally think that all the systems based on math- 2 BACKGROUND ematics have similar nature. However, it is not the case. Each In this section, historical background of the logical frameworks system has different functionality and it is not an easy task to that are used in mechanical reasoning systems is presented. Fur- select which system should be used for the formalization efforts. thermore, related work on the surveys and previous comparisons Theorem provers are diverse in nature and the main aim of this of theorem provers is discussed. 3 2.1 Mathematical Logic Strong about propositional logic is that it is decidable Treating mathematics as a formal theory where all the mathemat- with the help of truth tables. Newell developed the first theorem proving program Logic ical statements are proved with a set of few basic and Theorist in 1956 [245]. This program proved propositional logic rules is a long-standing goal. However, formal proofs of theorems using axioms/inference rules. It not only worked for theorems require a lot of steps, effort and time. Many ancient numeric expressions but also for symbolic formulas and proof Greek logicians, mathematicians and philosophers successfully searching is guided through . It proved 38 theorems expressed reasoning through [124]. Syllogism deals out of 52 and proofs of the theorems were more elegant. An- with formalization of on logical to other contribution in mechanical reasoning system was Davis arrive at a conclusion based on two or more . Leibniz and Putnam’s -based procedure reference [78]. It was worked on the ways to reduce human reasoning calculations a decision method for checking whether a formula in Conjunc- in symbolic logic and embodied these deductive reasoning into tive Normal Form (CNF) is satisfiable or not. Such problems mathematics, making it easier to be implemented in computer nowadays are called SAT (satisfiability) problems. They used programs. Mathematical reasoning provide objectivity and cre- “ground ” for proving mathematical formulas in the ativity that is hard to be found in any other fields. In mechanical form of predicate logic. Ground resolution used two propositional reasoning, two paths were presented. One was to analyze human clauses and generate another propositional clause. For testing proof creation process and implement it using computational Boolean formulas, Davis and Putnam’s procedure implemented resources. The other was to utilize the work of logicians and a series of ground resolution for proving satisfiability of these transform the logical reasoning into a standard form on which formulas [95]. Satisfiability of Boolean formulas was also tested are based [97]. In the mid of 1950s, the relationship with Davis-Putnam-Longemann-Loveland (DPLL) method [79]. of the computer to mathematics has emerged in the form of This was a searching based on mechanism. automated reasoning, especially the automation of mathematical DPLL was used in checking the satisfiability of propositional logic proofs [188]. More discussion on formal deduction, LCF (Logic formulas that are in CNF. This is an improved version of Davis of Computable Functions) and modern role in the and Putnam’s procedure. This method used backtracking search development of theorem provers can be found at [31], [191]. instead of ground resolution. DPLL as compared to Davis and Emergence of automated reasoning initiated the earliest work Putnam’s method was much faster. It has a semantics searching on computer-assisted proofs, when the first general purpose com- process which helps in truth assignment. NP complete problem puters became available [121]. The field of computer supported [93] was the new notion, which was introduced several years later theorem proving gets attention in the second half of 20th century. after DPLL. These problems were developed and proved by Cook In 1970s, theorem provers were investigated for verification of [65]. This motivated the development of SAT for solving hard computers systems. Extensive research in this area was done in late problems, which has been improved over the last few years both 1980s when these tools were used in the verification of hardware on algorithmic and implementation level. systems. In mid 1990s, a bug in ’s Pentium II processor caused by floating point division increased the interest in formal methods 2.1.2 First-Order Logic and formal hardware verification tools were used by industry in Propositional logic has less power as it is based on propositions their system design in late 1990s [176]. In 1994, Boyer proposed and have no ability to predict any complex behavior. Furthermore, the QED manifesto [47] for a “computer-based of all it does not support variables, functions, relations and quantifica- mathematical knowledge” (formalized mathematics). During the tions to compute complex problems. For example, “Socrates is years, the QED manifesto is adopted by many theorem provers. a man” can be represented by propositional logic, but “all men In recent years, the mechanical mathematical proofs for system are mortal” can not be represented in it because of quantification verification has gained popularity [265]. Future of formal methods involvement. First-Order Logic (FOL) is the extension of propo- looks promising and practical. Big companies such as Google, sitional logic that allows quantifiers. Predicate logic is the general Facebook, IBM, Intel and Microsoft are now using and conducting category to which FOL belongs. A predicate is a two valued or research in formal methods. Boolean that maps a given domain of objects onto a Theorem provers are fundamentally based on mathematical Boolean set, and this function is used to show specific quality or logics. Among the most popular theorem provers, we take three property between variables. For example, we assumed that Q(n) widely used ones: Propositional Logic, First-Order Logic (FOL) is a predicate where n is an even integer. Domain of discourse and Higher-Order Logic (HOL). Each logic is further discussed for this predicate is set of all integers. Therefore Q(n) depends next. on the value of n. Logical operators and quantifiers (universal (∀), existential (∃)) are used in predicate logic to express 2.1.1 Propositional Logic problems. It is less challenging to build automated theorem prover Propositional logic is used to represent atomic proposi- based on FOL. However, satisfiability checking of FOL is semi- tions or declarative sentences with the help of mathematical decidable. Skolem and Herbrand first designed a semi-decidable Boolean operators such as and, or, not, implication procedure in 1920 [141]. Their method was based on unguided and equivalence. It is also known as axiomatization of retrieving for proof searching process and of ground Boolean logic. Logical operators such as conjunction (∧), terms. This method was useless in practical terms such as proving disjunction (∨), not (¬), implication (=⇒) and non-trivial theorems. However, it played an important role for bi-implication (⇐⇒) are used to bind propositions to implementing theorem provers based on FOL. make sentences. Truth values (True or ) are assigned to Prawitz was the first one who developed a general mechanical these propositions for evaluation of a sentence. Axiomatization reasoning system for FOL [39]. FOL provers such as Otter of is also performed through propositional logic. [197] and Setheo [183] are mostly based on resolution and 4 tableaux methods and they are used in solving puzzles, algebraic surveyed Coq, PVS, Mizar, ProofPower-Hol, HOL4, Isabelle/HOL problems, software retrieval and verification of protocols. Some and HOL Light for formally analyzing the real time systems. theorem provers even support recursive functions, dependent and They also investigated the extended standard libraries that play (co)inductive types, e.g. ACL2 [156], Metamath [198] and E main role in proof automations: -CoRN/MathClasses for Coq, [240]. Such systems are used in mathematics ( ACL2(r) for ACL2 and the NASA PVS library. The application and ), verification, hardware verification and of theorem provers in economics is discussed in [160], with the commercial applications. However, exponential time algorithms focus on two domains: social choice theory and auction theory. are required for automatic proving in practice. Therefore, proofs In literature, other comparisons between provers can be found. in FOL are generally achieved by changing the FOL formula into However, in most of those works, only two systems are compared a or Boolean satisfiability problem. In such way, BDD, generally. Such works include the comparisons between HOL and DPLL based SAT solvers can be used to automatically check PVS [112], HOL and Isabelle [9], NuPRL and Nqthm [35], Coq the formulas. Satisfiability Modulo (SMT) deals with and HOL [139], HOL and ALF [8] and Isabelle/HOL and Coq the satisfiability of formulas against some logical theory [34]. In [277]. Some works have also been done on how to adapt the proofs recent years, SMT solvers further extended the capabilities of SAT to different systems [108], [202], [213]. In [99], Reentrant Readers solvers. Writers problem is first modeled in UPPAAL model checker and found a possible deadlock scenario. They further converted the 2.1.3 Higher-Order Logic UPPAAL model and analyzed the model in PVS and checked the FOL is more expressive as compared to propositional logic but less PVS model for arbitrary number of processes. Moreover, SPIN than Higher-Order Logic (HOL). FOL only quantifies variables. model checker is used in [100] for modeling and analysis of Predicates, propositions and functions are not quantified by FOL. Reentrant Reader Writers problem. Promela model is converted For example: quantifier quantifying a to PVS specification and the correctness of the model was then verified. ∃s(R(y) −→ s) (1) Another example is quantifier quantifying a predicate. 3 RESEARCH METHODOLOGY ∀S(Q(s) ∧ ¬R(s)) (2) Core part of a Systematic Literature Review (SLR) is the require- ment of research questions. Right and meaningful questions are HOL extends FOL by supporting many types of quantifi- demanded to ask during the review process and we also need to cation. HOL permits predicates to accept which are pinpoint the scope of research accomplishments. Following the also predicates, and allows quantification over predicates and guidelines of [162], we have structured the research questions functions which is not the case for FOL. Based on HOL, one with the support of PIOC (Population, Intervention, Outcome, can construct a proof environment which is logically sound, do and Context) standard for applying the SLR process in software meta reasoning, interactive as well as automated and practically engineering field. PIOC for this work is presented in Table 2. implementable. HOL is mostly used for ITPs. Various methods TABLE 2: PIOC for this work used in HOL-based theorem provers are decision procedures, Population Mechanical reasoning systems inductions, tableaux, , interactions and many heuristics. Intervention Theorem proving approach They are used in formalizing mathematics and verification of Outcome Comprehensive document, evaluation and comparison programming languages, distributed system, compiler, software Context Developers and experts from industry and researchers and hardware systems. Some well known HOL based theorem from academia provers are: HOL [212], Coq [38], PVS [237] and Agda [211]. Efforts are made to collect evidence on recent scenarios of HOL is undecidable, so proving HOL properties is not fully- research in the development of mechanical reasoning tools. For automatic and thus human assistance is required. It is more this purpose, we have designed research questions which are expressive and has the ability to prove complex problems and presented in Table 3. We forwarded our questionnaire to a number theorems. But it is more challenging to build an ATP or ITP based of theorem prover developers, experts and forums. Names and on HOL. Generally, the proof process in ITPs is as follows. User answers of the domain experts that responded are presented in first states the property or feature (in the form of a theorem) that is Section 4. called a goal. User then applies proof commands to solve it. Proof commands may decompose the goal into sub-goals. The proof process is completed once all the sub-goals are solved [185]. 3.1 Keywords Retrieval for Search Strings Keywords (question elements) are find out from relevant research 2.2 Related Work articles that we studied. These keywords are used for retriev- Our survey on mechanical reasoning systems is certainly not ing more information related to the development of mechanical the first one. Some work is done in the past on the survey reasoning tools from electronic databases. These keywords are and detailed discussion and comparison on theorem provers [23], presented in Table 4. Alternative words for the keywords are find [113], [117], [118], [121], [191], [275]. A detailed description of out by using synonyms and thesaurus. These alternative keywords proof-assistants and their theoretical foundations is given in [31] are also used for the searching process. Keywords are linked along-with the comparison for nine theorem provers. However, together with Boolean OR and search strings are constructed by their comparison results take only one and a half pages. 17 linking the four OR lists with Boolean AND. theorem provers are compared in [274] for three parameters: (i) Online databases, journals and conferences related to mechan- library size of each prover, (ii) strength of the logic used in the ical reasoning tools are used for comparison, analysis and evalua- prover, and (iii) level of automation in each prover. Similarly, [46] tion. Seven electronic sources of relevance in software engineering 5

TABLE 3: Research questions from our questionnaire 4 PROVERSAND THEIR CHARACTERISTICS Research questions about system general information What are the names of people who contributed into the system? In this section, answers of the developers and experts that When system first time (date, year) appeared in the market? What is the latest version of the system? responded to our questionnaire are presented. The order in which When was the system updated last time? we present the answers of our respondents is the order in which What is the address of web page for accessing it online? we received their replies. In this way, we wish to express our What are the unique features of the system? What are the success stories of the system? gratitude to them. Research questions about system category information What is the type of the theorem prover (ATP/ITP)? Matt Kaufman (ACL2): Matt Kaufman is a senior research What is the type of the system w.r.t. reasoning (mathematical)? scientist working at Department of Computer Science, University What is the type of the system w.r.t. logic (FOL or HOL)? What is the type of the system w.r.t. truth values (binary/fuzzy)? of Texas, Austin. Matt provided information about “ACL2” [156]. What is the type of the system w.r.t. calculus (deductive/inductive)? Main authors are M. Kaufman and J. Moore. However, several Either set theoretic support (ZF/fuzzy) is available in the system? others have also made significant contributions. The first public Research questions on system programming framework release of ACL2 was 1.9 in 1994. The latest version is 8.2 and What is the paradigm (functional/imperative/other) of the system? What is the programming language (C/C++/Java/Other) of the system? updated last time in May 2019. Basically, it is a monolithic What is the user interface (GUI/CLI) of the system? system, but the applicative style of programming often makes What is the scalability (distributed/multi-threaded) of the system? it straightforward to use pieces of the system. It is generally Either library support is available in the system? classified as an ITP. However, it takes automation seriously; in that sense it shares characteristics with ATP. Its logic is FOL with TABLE 4: Keywords from mechanical reasoning surveys induction and is written mostly in the ACL2 language, which is Armstrong Formal tools, software and hardware correctness, provably an applicative language extending a non-trivial of Common et al. [19] correct design, theorem provers, Satisfiability Modulo Lisp. Its UI is typically Emacs based. ACL2 is a cross platform Theory, abstract models, Event-B, digital systems, formal methods, model correctness. tool: it runs on , MacOS and Windows. It also runs on the Mackenzie Mathematical proofs, interactive theorem proving, auto- top of 6 different implementations. Input format of [188] matic theorem proving, mathematical logic, mathematical ACL2 is s-expressions, though output can be pro-grammatically reasoning, formal logic, proof automation, classical and produced. Web address is cs.utexas.edu/users/moore/acl2. ACL2 constructive logic, machine intelligence. Azurat & Theorem prover, proof checker, formal verification, pro- has been scaled to large applications, recently at Centaur and Prasetya gramming logic, formal representation, HOL, PVS, auto- Oracle. Its users seem pretty happy with readability, but others [25] matic proof generation, Coq. might be put off by the s-expression format. Inter-operability Harrison Logic and program meaning, mathematical logic, sym- between different provers is limited, though there has been [117] bolic manipulation, algebraic , formal lan- guages, software engineering. work [108], [109] that connects ACL2 and HOL4, e.g. ACL2 Harrison , interactive theorem provers, proof goals, is first-order, but is still quite expressive because of its support et al. semi automated mathematics, Automath, Coq, NuPRL, for recursive definitions. Several capabilities allow it to do some [121] Agda, Logic of computable functions, HOL, PVS, proof things that might be considered higher-order in nature: macros, a language, proof automation. Boldo et , formalization, proof libraries, interac- proof technique called functional instantiation and oracle-apply. al. [46] tive theorem provers, PVS, Coq, HOL4, Isabelle/HOL, ACL2(p) [227] supports parallelism for execution, proof and ProofPower-HOL, HOL Light, proof automation. other infrastructure supports parallelism at the level of collections Wiedijk Proof assistants, proof kernel, logical framework, decid- [274] able types, dependable types, de Bruijn criterion, Isabelle, of files. Run-time assertions are supported and lots of debugging Theorema, HOL, Coq, Metamath, PVS, Nuprl, Otter, Alfa, tools are available for program execution and proof. ACL2 can Mizar, ACL2. often emulate other logics by formalizing their proof theories. It Maric´ Decision procedures, proof search, theorem provers, soft- is extensible or programmable by the user via rule classes and [191] ware correctness, interactive theorem provers survey, for- mal deduction, proof checking, logical frameworks, SAT directly via meta rules and clause-processors. ACL2 may be the solvers, SMT solvers, Poincare´ principle. only ITP that presents a single logic for both its programming Hales Computer proofs, proof assistant, small proof kernel, logi- language (provide efficient execution) and its theorem prover [113] cal framework, proof tactics, first-order automated reason- (including definitions and theorems to prove). There is a large ing, , theorem provers. Avigad & Axiomatic set theory, mathematical proof, calculus of library of “Community Books” developed over many years by Harrison reasoning, formalized mathematics, Formal verification, users, in daily use. There are users in academia, government and [23] interactive theorem proving, Poincare´ , formal industry. proof systems. Barendregt proof checking, mathematical logic, type theory and type & checking, type systems, predicate logic, higher-order logic Stephen Schulz (E): Stephen Schulz is the next person who Geuvers proof development, proof-assistants, Coq, Agda, NUPRL, responded to our questionnaire. Stephen designed and developed [31] HOL, Isabelle, Mizar, PVS, ACL2. “E” theorem prover [240]. The first public release of E was 0.2 in 1998. The latest version is 2.4 and updated last time in October 2019. E was originally developed at TU Munich, but now it is is identified in [48]. However, in last few years, many new and maintained and extended at DHBW Stuttgart, Germany. License famous libraries are developed especially in computer science type is open source/free software under GNU GPL Version field. Therefore, it may also be necessary to consider other sources. 2. It is an ATP for full FOL with , where first-order The search strings were used on 10 digital libraries: (i) DBLP, (ii) problems are reduced to clause normal form and uses a saturation IEEE Explore, (iii) ACM , (iv) Springer Link, (v) procedure based on the equational superposition calculus. Main Science Direct, (vi) CiteSeerX , (vii) Scopus, (viii) Inspec, (ix) EI user community of the system is mathematician. Web address is Compendex, and (x) Web of Science. www.eprover.org. E won several CADE ATP competition and 6 has a good ranking. The type of E with respect to reasoning is deductive and supports ZF set theory. Programming paradigm is mathematical, type with respect to the logic is classical FOL imperative and programming language is similar to . Web and with respect to the is binary. Calculus used address is clearsy.com/en/our-tools/atelier-b. UI of Atelier B is in E is deductive and set theoretic support is available but on graphical-based and it operates on various operating systems. It logical level via axiomatization for ZF. Programming paradigm also provides support for the Linux based clusters. Its architecture is imperative, it is purely developed in C and it supports CLI is monolithic, where inheritance and library support is not (command line interface). E is officially distributed in source available. Atelier B (CASE tool) provides C and Ada code files and supports Linux, Mac OS, FreeBSD, Solaris, Windows generation. It has no small proof kernel but has proof-objects and w/Cygwin. It is not a multi-threaded system, has a mixed for more then 130 axioms. Mathematical rules (transformation, architecture (modular + monolithic) and has its own library. Proof rewriting, hypotheses generation) are added by users, but it is can be generated in TPTP-3, PCL2 and Graphviz format. Some not programmable by users. is inspired from Haskell, input codes are generated automatically from test data. System ML, Java, C, C++ and Prolog languages. Infix/postfix/mixfix has no dedicated proof kernel, but has explicit proof-object. operators’ support are available and also for Unicode, Binary Any standard text editor can be used for files input. E has been and ASCII coding schemes. Native support for B language is the combined with other systems (Waldmeister, LEO-II, Vampire, Z3, unique feature of the system. Rich tactic language is available for etc.) at Isabelle Sledgehammer tool [223] to increase the level of writing proof instead by hand and it does not support inductive proof automation. .

Makarius Wenzel (Isabelle): Makarius Wenzel provided David Crocker (Escher Verifier): David Crocker is serving at information about “Isabelle” [222], originally published by L. MISRA C++ working group. He developed “Escher” verifier Paulson (Cambridge, UK). Many people have contributed to [53]. Its latest version is 6.10.02 and was updated last time in Isabelle in the last 30 years. It was released in 1986 and its pure 2015. It is an industrial product and license type of the system logical framework first came up in 1989. Latest version of the is commercial. Web address is eschertech.com/products/ecv.php. system was released in June 2019. Isabelle grew out of university One of the unique feature of Escher Verifier is that sometimes research projects, but it is of industrial quality, or even beyond it suggests missing /assertions/invariants, etc. in that, because it is not subjected to constraints imposed by market the model or software being verified when a proof is not found. economy. The full distribution uses add-on tools with various Main user community is the defense industry. It falls in ATP standard open-source licenses: LGPL, GPL, etc. Web address is category. Reasoning type of the system is mathematical, logical isabelle.in.tum.de. Isabelle unique features is a huge integrated type is a combination of FOL, SOL and some HOL. Type with environment for interactive and automated theorem proving. It is respect to the truth value is mostly binary but triadic where like a word-processor for formal logic, with specifications and necessary. Calculus of the system is deductive and programming proofs. Main user community is the people interested in formal paradigm is mostly functional, but imperative in speed-critical logic and formalized mathematics and people doing proofs about parts. Programming language is C++. There is no direct interface software and hardware. Isabelle is in a multiplicity of ITP for the theorem prover. However, GUI is available for the and ATP systems. Type of the system with respect to reasoning verification tools that uses it. It runs on Windows and Linux is mathematical, type with respect to logic is mostly HOL, but operating systems. System scalability is limited to a single users can also do something else if they really want to. Its type thread and architecture is monolithic and standalone. It does not with respect to truth value is mostly /Boolean. have small proof kernel and editor support. It is not extensible Programming paradigm is purely functional (ML and Scala) and or programmable by its user. Moreover, it does not support UI is a full-scale IDE. It supports multiple operating systems constructive logic. such as: Linux, Windows, Mac and is available both for 32-bit and 64-bit architectures. For scalability, it provides support for Norman Megill (Metamath): Norman Megill is the next classic shared-memory workstations with many cores. Isabelle respondent of the questionnaire. He is the originator of the is highly modular, to the extent that it is hard to tell where it “Metamath” [198]. There are 34 other contributors who helped starts and ends and what is actually its true structure. It provides to extend the system. It was introduced first time in 1993. code generation facility for SML, OCaml, Haskell and Scala. Latest version is 0.131 and was updated last time in June 2016. Isabelle has a small proof kernel, according to the classic “LCF It is an independent development by Norman. Web page is approach”, but with many add-ons and reforms over the decades. us.metamath.org. License type of the system is GPL. User FOL It is based on λ-calculus with simple types and . scheme is the unique feature of the system. It is an ITP and Moreover, it supports inductive recursion and has very powerful also used as a proof checker. Reasoning type of the system is derived for inductive sets, predicates, primitive and mathematical, logical type is FOL. HOL is also possible but not general recursive functions. developed yet. Truth value of the system is binary and deductive calculus is used. There are 12 independent verifiers available Thierry Lecomte (Atelier B): Thierry Lecomte works as in C, Java, C#, Lua, Mathematica, Julia, Rust, Python, Haskell, director at ClearSy organization. Under his supervision, “Atelier C++, and JavaScript. Metamath supports both CLI, GUI and B” was developed. Atelier B implements the B method [4] and runs on almost all operating systems. Architecture of the system offers a theorem prover. It was released first time in 1994. Latest varies according to the environment. Metamath also displays version is 4.5.1 and updated last time in May 2018. Atelier comprehensive error message for the debug output and runtime B is an interactive rule based theorem prover plus interactive assertion. Library of the system contains over 20000 theorems and dedicated tableau method. Logic of the system is classical that covers results in logic, algebra, set and , topology FOL and truth value is traditional Boolean. Calculus type is analysis, Hilbert spaces and quantum logic. It is a standalone 7 system and no code generation facility is available. It is extensible are the unique features. Mathematicians, computer scientists and programmable by the user. Metamath has a small proof and students are the main user community. Mizar is an ITP kernel. Human readability feature according to the syntax is and based on syllogism or mathematical statements. FOL with unique as compared to others. Unification process is used for schemes (statements with free second-order variables) is the pattern matching. Argument handling is implicitly available and system type with respect to the logic and is based on binary it is lightweight. Metamath supports inductive recursion and does truth value. It is based on deductive calculus and ZF set theoretic not allow to write non-terminating programs. The system is easy support is available. Declarative is the programming paradigm to learn, but require with library and reasoning for and object Pascal programming language is used for developing advanced proofs. the system. Type of the interface is CLI and runs on almost all operating systems. It is scalable, but not suitable for distributed Frank Pfenning (Twelf); Frank Pfenning works as professor or multi-threaded environment. It supports IDE, has its own at Computer Science Department, Carnegie Mellon University, library and proof kernel. However, it is not extensible and no USA and is the creator of “Twelf” [226]. C. Schurman¨ also tactic language support is available for proof writing. Mizar contributed to the system. It was released publicly in January Mathematical Library (MML) contains approximately 10,000 1999. Latest version is 1.7.1 and updated last time in January formal definitions and 52,000 lemmas and theorems. 2015. Website address is twelf.org. Twelf is an ITP and simplified BSD is the type of system license. Unique features of the system Michael Norrish (HOL): Michael Norrish has been working as are meta-theorem proving for programming languages and logics. principal research engineer at Australian National University. He It is mainly developed for academia community. Type of the talked about “HOL” theorem prover [212]. It was publicly released reasoning system with respect to logic is type theory and its type in January 1985. Latest version is Kananaskis-13 and was updated with respect to the truth value is intuitionistic. Twelf is built on last time in August 2019. Web address is hol-theorem-prover.org. deductive calculus and is developed in standard ML. UI of the HOL was developed at Cambridge University. Four tools system is CLI and runs on almost all operating systems. It may now comes in HOL family: HOL4 [247], HOL Light [120], be scalable but it is not distributed or multi-threaded system. ProofPower [20] and HOLZero [7]. Other tools that come in HOL Twelf supports IDE and has its own libraries. System is extensible family are developed jointly by Cambridge University, Data61, and programmable by users. It supports no tactic language and CSIRO and Chalmers University of Technology. License type of proofs are written by hand. Twelf has been used to formalize HOL is BSD. It is an ITP based on syllogism or mathematical many different logics and programming languages (examples are statements. HOL is the logical framework of the system and included with the distribution). is based on binary truth value. System is based on deductive calculus and does not supports set theory directly, but it has a Ulf Norell (Agda): Ulf Norell works as a principal set-theoretic model. Programming paradigm is functional and research engineer at University of Gothenburg, Sweden. He developed in SML programming language. UI is command line. It developed “Agda” system [211]. Latest version is 2.6.0.1, supports and runs on all famous operating systems. It is scalable, which was updated last time in May 2019. Web address is but not suitable for distributed or multi-threaded environment. It wiki.portal.chalmers.se/agda/pmwiki.php. License type of the does not support IDE, but has its own proof kernel and library. system is BSD-like. Dependent types are the unique feature and System is extensible and support tactic language for proof writing. academia is the main user community of the system. Popular research language is the main success story of the system. It is Jonathan (RedPRL): Jonathan Sterling is a graduate an ITP and based on . System type with research assistant at School of Computer Science, Carnegie respect to the logic is intuitionistic HOL, type with respect to Mellon University and creator of “RedPRL” [250]. Web address the truth value is binary and is built on inductive calculus. It is redprl.org. MIT is the license type of the system. Unique support constructive type theory. Haskell programming language features of RedPRL are higher dimensional types, support for is used for developing the system and UI of the system is graphic strict equality and tactic scripts, refinement of dependent proofs based. Agda supports and runs on all popular operating systems. and functional . Main user community is homotopy It is scalable but not used for distributed or multi-threaded type theory. It is an ITP and based on syllogism or mathematical environment. It supports IDE, has its own proof kernel and statements. Type theory is the logical framework and is based library. It is extensible and programmable by users and has a on intuitionistic truth values. System is based on deductive tactic language for proof writing. An important aspect of Agda is calculus. Programming paradigm is functional and it is developed its dependence on Unicode. Its standard library is under constant in standard ML. Visual studio code extension is the UI of the development and includes many useful definitions and theorems system. It runs on almost all major operating systems. RedPRL about basic mathematical designs. is not suitable for distributed or multi-thread environment, but has its own IDE. It has no library, but has its own proof kernel. Adam Naumowicz (Mizar): Adam Naumowicz provides RedPRL is extensible by the user and syntax of the system is his services to computer science institute at University of inspired by Nurpl programming language [13]. Tactic language Bialystok, Poland. He gave information on “Mizar” [110], which support is also available for proof writing. was publicly announced in November 1973. Latest version is 8.1.09 and updated last time in June 2019. Web address Oleg Okhotnikov ( & Int Proof Checker): Yuri is mizar.org. Andrzej Trybulec is the founder and Mizar is Vtorushin and Oleg Okhotnikov implemented the “Class and developed at University of Bialystok. The system is free for any Int proof checker”. It was publicly announced first time in noncommercial purposes. User friendly input language based on October 2007. Latest version of the system is Class 2.0 and natural language and a large library of formalized mathematics Int 2.0 and was updated last time in November 2017. Web 8 address is class-int.narod.ru/. Automated proof search for natural system. Programming paradigm of the system is functional and reasoning and support for iterative equalities are the unique developed in OCaml programming language. System supports features of the system. System is mainly developed for students both graphical as well as CLI and run on almost all operating and teachers. Vtorushin and Okhotnikov uses Class and Int systems. System is scalable, but not designed for distributed or programs on seminars with students in courses “Mathematical multi-threaded environment. System has IDE, library support is Logics and Algorithm Theory”, “Artificial intelligence”, etc. available and proof kernel is also owned by the system. Syntax of It is an ATP based on syllogism or mathematical statements. the system is inspired by ML language and is extensible by the It is based on FOL, supports binary truth value and built on users. deductive calculus. Axiomatic method set theoretic support is available. Programming paradigm is declarative and developed Clark Barrett (CVC4): Clark Barret is the last respondent in C++. Moreover, it supports Windows only of the questionnaire and provided information about CVC4 [32], and has a CLI. It is scalable and also supports distributed and developed at Stanford University and University of Iowa. CVC4 multi-threaded environment. System supports IDE and has its is an ATP for SMT problems and was released first time in own proof kernel. System is not extensible by the user and syntax December 2014. Latest version is 1.7 and last time updated in is inspired by Mizar and SAD. Tactic language is also available April 2019. Web address is http://cvc4.cs.stanford.edu and license for proof writing. type is BSD 3-clause. CVC4 is based on DPLL(T) calculus [33] and its type with respect to reasoning is mathematical and is Hans de Nivelle (GEO): Hans de Nivelle from School of Science based on standard many-sorted FOL, with limited support for and Technology, Nazarbayev University, Kazakhstan developed HOL. It also supports finite sets. Main user communities of the “Geo” [61] prover. It was released first time in August 2015 the system are people that are interested in . and latest version is Geo2016C. It is an ATP for FOL that is based Programming paradigm is logical and is developed in C++. on and supports partial classical logic (PCL) with CVC4 supports CLI, offers API’s for C, C++, Java, Python 3-valued logic as a truth value. Calculus of the system is based and runs on Mac and Windows. Support for finite sets is also on geometric resolution and is developed in C++. It supports available. Moreover, system is modular, does not provide any CLI and only runs on Mac. The system takes geometric formulas support for and offers limited support and FOL formulas as input, where FOL formulas are changed to for multi-threading. CVC4 offers solvers for , geometric formulas. During proof search, it looks for a geometric sets and relations, where models assign every formulas either formulas model through backtracking. Main success story is its true or false. Debug output and run time assertions support is existence in the current scenario. License type is GNU GPL, available, whereas code generation support is not available. Input Version 3. System is scalable, but not designed for distributed or language to CVC4 is SMT-Lib, which is inspired by LISP. It also multi-threaded environment and it has no IDE. Library support is support CVC input language, which is more human-readable than not available but proof kernel is owned by the system. Syntax of SMT-LIB. Moreover, limited support is available for inductive the system is inspired by TPTP-language and it is not extensible reasoning. CVC4 is used as the main engine in Altran SPARK by the user with no tactic support. Web page of the system is toolset and at GE and Google. CVC4 comes first in various http://www.ii.uni.wroc.pl/∼nivelle/software/geo III/CASC.html. divisions of Satisfiability Modulo Theories (SMT-COMP), CASC and SyGuS (Syntax-Guided Synthesis) competitions. Hugo Herbelin (Coq): Hugo Heberlin is working as researcher The summary of the main characteristics of theorem provers at INRIA, France. He talked about the “Coq” system [38]. It for which we received answers from experts/developers is listed in was first released in May 1989. Latest version is 8.10.1 and last Table 5. The answers for PVS is provided by authors of this paper time updated in October 2019. Web address of the system is as they have done some work in PVS in the past. coq.inria.fr. It is developed by INRIA and academic partners. We also developed a layout for the survey questionnaire, LPGL 2.1 is the license type. Unique features of the system are which is listed in Table 6. Mnemonics codes are used to rep- expressive logic and programming language, program extraction, resent headlines. These abbreviations are: CLang = Computa- elaborated certification language, proof techniques and a tactic tional Language, 1st Rel = First Release, Ind/Uni/Inde = Indus- language that allows users to define proof methods. Teaching, try/University/Independent, Prog.P = Programming Paradigm, LV formalization of mathematics and certified programming are main = Latest Version, LT = License Type, UI = User Interface, OS = user community of the system. The specification language of Coq Operating System, Lib = Library, CG = Code Generation, Ed = is called Gallina (based on Calculus of Inductive Constructions), Editor, Ext = Extendable, I/O = Input/Output, TType = Tool Type, which allows its users to write their specification by developing CLogic = Computational Logic, TV = Truth Value, ST = Set theories. Coq follows the Curry-Howard isomorphism [248] Theory, App.Areas = Application Areas and Eval = Evaluation. and uses Calculus of Inductive Constructions language [67] We filled the layout for 27 more theorem provers. We collect the to formalize programs, properties and proofs. Curry-Howard data from various resources such as electronic databases, research isomorphism provides a direct relation between programing and articles and dissertations. Complete detail for each system is listed proving and says that proofs in a given subset of mathematics in Appendix A. are exactly programs from a particular language. It means that one can use a programming paradigm to encode propositions and their proofs. Coq is an ITP and supports various decision or semi-decision procedures produced proof-terms checked valid by a kernel. Logical framework of the system is based on HOL, λ-calculus and is built on both inductive as well deductive calculus. Different ways to represent sets are available in the 9 TABLE 5: Main characteristics of 16 theorem provers TABLE 6: Systematic literature review design Theorem provers Name Contributor CVC4 TP4SMT ATP DPLL FOL Binary Deductive Yes Logical Modular C++ CLI Mac+Win No Yes No Yes No No 1st Rel

General Ind/Uni/Ind CLang Prog.P PVS TP ITP Syllogism HOL Binary Deductive Yes Func+OO Modular C Lisp GUI Mac+Linux Yes Yes Yes Yes Yes Yes LV LT UI OS Coq TP ITP DP HOTT Binary Ded+Indu Yes Func Modular OCaml CLI+GUI Cross Yes No Yes Yes No Yes Lib CG

Implementation Ed

Geo TP ATP GT PCL 3-value Deductive No Decl Monolithic C++ CLI Mac Yes No No No No No Ext TType CLogic TV ST Class & Int TP ATP Syllogism FOL Binary Deductive Yes Decl Modular C++ CLI Windows Yes Yes Yes Yes No Yes Calculus Logico-Math ProofKernel App. Areas

RedPRL TP ITP Syllogism TT Intuition Deductive No Func Modular SML GUI Cross No No Yes Yes Yes Yes Eval

Others Unique Features

HOL TP ITP Syllogism HOL Binary Deductive No Func Modular SML CLI Cross Yes No No Yes Yes Yes 5 COMPARISON In this section, we first listed those scientists that contributed most

Mizar TP ITP Syllogism FOL Binary Deductive Yes Decl Modular Pascal CLI Cross Yes No Yes Yes No No in the formalization of mathematical theorems. We also present those theorem provers in which most of the theorems are proved. Furthermore, the top systems (from 1996 till 2019) in CADE

Agda TP ITP FP HOL Binary Inductive Yes Func Modular Haskell GUI Cross Yes No Yes Yes Yes Yes ATP system competition are described. Finally, we showed the comparison of more than 40 provers for parameters mentioned in Section 1. Twelf TP ITP LF TT Intuition Deductive Yes LP Monolithic SML CLI Cross Yes No Yes Yes Yes No 5.1 Top Scientists and Theorem Provers Efficiency and power of theorem provers are generally evaluated Metamath TP ITP Syllogism FOL+HOL Binary Deductive Yes Func Mod+Mono MM CLI+GUI Cross Yes Yes Yes Yes Yes Yes on the number of theorems they proved from top hundred theorems list (available at: http://www.cs.ru.nl/∼freek/100/). People who contributed most in verifying these theorems are presented in Escher TP ATP Syllogism FOL+HOL Bin+Tri Deductive No Func+Imp Monolithic C++ CLI+GUI Win+Linux No No No No No No Figure 1. John Harrison currently working at Intel proved 84 theorems and he used HOL (particularly HOL Light) and Isabelle. Rob Arthan (second in the list) also used family of HOL theorem Atelier B TP ITP Syllogism FOL Binary Deductive Yes Impe Monolithic Prolog GUI Cross Yes No Yes No No Yes provers for theorem proofs. Theorem provers which proved most of the theorems from top hundred theorems list are presented in the order: Isabelle TP ATP+ITP Syllogism HOL Binary Ded+Indu Yes Func Modular ML+Scala GUI Cross Yes Yes Yes Yes Yes Yes HOL Light (86) → Isabelle (81) → Metamath (71) → Coq (69) → Mizar (69) → ProofPower (43) → Nqhtm/ACL2 (18) → PVS

E TP ATP Syllogism FOL Binary Deductive Yes Impe Mod+Mono C CLI Cross Yes No Yes Yes No No (16) → NuPRL/MetaPRL (8) HOL Light’s performance is outstanding and it is at the top by formalizing and proving 86 theorems. Isabelle, another powerful ACL2 TP ITP Syllogism FOL Binary Inductive No Func Modular ACL2 CLI Cross Yes Yes Yes Yes Yes Yes tool is at the second number in the list. Metamath, Coq, Mizar and ProofPower are also the computationally strong tools and play a vital role in the formalization of top hundred theorems. Two systems in HOL family (HOL Light and ProofPower) are included in the list.

5.2 Best FOL Theorem Provers at CASC Each year, FOL based ATPs performances are checked in the Characteristics System Type Theorem Prover Category System Based on Logic Used System’s Truth Value Calculus Set Theoretic Support Programming Paradigm System Architecture Programming Language User Interface Platform Support Scalability Multi-threaded IDE Library Support Programmability Tactic Language Support CADE ATP System Competition (CASC). This competition was started in 1996. CASC consists of various divisions and these divisions are categorized based on the type of their problems and the characteristic of systems. There are two major divisions. First 10

2019 Satallax Vampire Vampire Vampire Vampire E LEO-III Amnie Chaieb 7 2018 iProver SLH MaLARea 7 Benjamin Porter 2017 Vampire Vampire 10 C-CoRN 2016 Vampire Beagle iProver 12 Frederique Guilhot 2015 Niptick VampireZ3 CVC4 Vampire Vampire UEQ Vampire Grzegrorz Bancerek 6 2014 CVC4 iProver iProver Waldmeister 6 Jacques D. Fleuriot 2013 Satallax SPASS + T iProver MaLARea 84 John Harrison 2012 Isabelle Princess Paradox Vampire 10 Karol Pak 2011 Satallax SPASS + T E Waldmeister Laurent Thery 9 2010 LEO-I Vampire Lukas Bulwahn 9 2009 TPS Paradox Paradox Vampire Manual Eberl 17 2008 Meta Prover Meta Prover iProver SInE Marco Riccardi 13 2007 Paradox Paradox Darwin Mario Carneiro 35 2006 Darwin NASA Library 14 2005 Paradox DCTP Norman Megill 13 2004 Gandalf Paul Jackson 5 2003 Ricky Butler 8 2002 Vampire Vampire DCTP Rob Arthan 43 2001 E-SETHEO E-SETHEO Gandalf E-SETHEO Ruben Gambao 8 SEM 2000 Vampire E GandalfSAT E-SETHEO 1999 SPASS Vampire OtterMACE 1998 SPASS Gandalf SPASS Waldmeister Fig. 1: Theorems proved by scientists in ITPs 1997 SPASS Gandalf SPASS

1996 E-SETHEO Otter THF THN TFA TFN FOF FNT CNF SAT EPR SLH LTB one is the competition division which ranks the reasoning system, Fig. 2: Top ATPs at CASC second one is the demonstration division which enables the system to demonstrate its potential without ranking. These divisions are further divided into subdivision on the basis of problem categories. generator and as automated geometric theorem prover share 6%. Competition division is an open platform for automated reasoning Systems that work as decision procedures and ATP for SMT systems that meet the requirements of this division. System se- problems take 8%. lected for the competition division tries to attempt all the problems of this division. Subdivisions of this division are: THF, THN, TFA, TFN, FOF, FNT, CNF, SAT, EPR, SLH (changed to UEQ in 2015 5.3.2 Mechanical Reasoning System Type and back to SLH in 2018) and LTB. More details on subdivisions Figure 3b represents the theorem provers grounding either these can be found in [251]. These divisions are presented on horizontal are based on syllogism, mathematical statements or various logic axis in Figure 2, while vertical axis represents the competition theories. 54.5% of the systems are based on syllogism or mathe- year. Some tools are specified to only one division, while some are matical statements. While other types take 9.1% each. tested on different problem divisions which show excellent results. Figure 2 has sketched the overall results for each division. Arrows in the figure show the continuous winners in a particular division. 5.3.3 Logical Framework For example, Vampire [235] in the FOF division is performing best Figure 3c shows the logical framework of theorem provers. Most from 2002 to 2019 and Satallax [49] is coming first in the THF systems (59%) are FOL based systems. 16% of the systems are division from last seven years. Vampire system topped the TFA, based on HOL. Systems that are based on graph theory, dynamic FOF, FNT and EPR divisions respectively. Similarly, iProver [170] , FOL/HOL, pure classical logic, equational logic, type dominates the EPR division from 2008 to 2014 and 2016 to 2018. theory and higher order type theory share collectively 25%. These provers perform best at CASC due to the following reasons: 1) Sound theoretical foundations, 5.3.4 Truth Value and System Calculus 2) Thorough tuning and testing, Figure 3d shows 86% of theorem provers are based on Boolean 3) Huge implementation efforts, and logic or binary logic, while 7% of the systems are based on 4) Understanding of how to optimize for the competition. . Systems that result in triadic (3-value) share 2% and both binary and triadic value systems take 5%. 5.3 Provers Comparative Analysis Results Figure 3e shows 52% of the systems are based on deductive calculus while 10% are based on inductive calculus. Theorem This subsection presents the comparative analysis of theorem provers which are based on both inductive and deductive calculus provers for various parameters, that includes: theorem prover cate- take 5%. Systems based on euclidean and differential , gory, mechanical reasoning system type, logical framework, truth first order predicate calculus, fixed point co-induction, λ-calculus, value, calculus of the system, set theoretic support, programming calculus, tableau calculus, instantiations calculus, hyper paradigm and programming language of the system, UI, system tableau calculus and typed λ-calculus are respectively 3%, 3%, scalability, distributed/multi-threaded, IDE and library support. 3%, 5%, 5%, 8%, 2%, 2% and 2%. Results for each parameter is presented next.

5.3.1 Theorem Prover Category 5.3.5 Set Theoretic Support Figure 3a shows the category of theorem provers such as ATPs, Figure 3f shows that only 30% of theorem provers provide set ITPs, geometric theorem provers, decision procedures and theory theoretic support, while 56% do not support set theory. Systems generators, etc. Major portion is shared by ATPs (44%) and ITPs that support Horn theory, swinging type theory and ZF set theory (31%). Systems working as both ATP and ITP take 11%. Whereas, take 2% each. Systems that supports Quine’s and B-Method set systems that work as either theory generator, as ATP and also take 3% each. 11 5.3.6 Programming Paradigm and Programming Language TABLE 7: de Bruijn Criterion Figure 3g presents the programming paradigm of the systems System De Bruijn Criterion Proof-object ACL2 - - such as functional, imperative and declarative, etc. Programming E - + paradigms of 23% theorem provers are functional. While systems Isabelle + x having functional, imperative and object oriented paradigms take Atelier B - - 16%. Systems that belong to paradigm take Escher - - Metamath + + 9%. Provers that belong to both procedural and object oriented Twelf - - paradigm are 7% and 12% systems belong to the declarative Agda + + paradigm. 2% of the systems belong to functional, concurrent, Mizar - - and object oriented programming paradigm. Systems that be- HOL + x long to functional and imperative paradigm take 5%. Theorem RedPRL/Nuprl + x Class & Int + + provers which come under the functional and procedural paradigm Geo + + take 5%. Systems that only belong to concurrent programming Coq + + paradigm take 2%. Systems belonging to both functional and PVS - - modular paradigm take 2%. Figure 3h presents the overview of programming languages which are used to develop theorem provers: Ocaml (20%), C/C++ of tactics/strategies that are required to make the proof-assistant (17%), Common Lisp (10%), Java (10%), Prolog (10%), SML to verify the of the statement. The proof-script generates (10%), Haskell (7%), Mathematica (5%), Pascal (3%), Metamath and stores a term that is a proof that can be checked by a simple (2%), Perl (2%), Scala (2%), ML and Scala (2%). proof kernel. The reliability of the whole system depends on the of proof-objects and the proof kernel. Even if someone 5.3.7 User Interface and Operating System have doubts about the validity of certain statements or if some Theorem provers are mostly available with CLI (54%) only, while parts of the systems contain bugs, the proof-object for a given 30% percent of the systems are available with GUI only and 16% statement and the proof kernel can be used to locally verify the of the systems provide both CLI and GUI. statement within the corresponding logical system. On the other hand, 51% theorem provers run on cross platform HOL, Isabelle and Nuprl come in the class of ITPs that have as shown in Figure 3i. 15 % of the systems support Linux, a proof kernel but no proof-objects. In such systems, the proof- Mac and Windows operating systems. 3% of the systems run script are considered as no-standard proof-object (shown with x on all Unixoids-based operating systems. Systems that run only in Table 7). They translate the proof-script into a proof-object that on Windows take 3%, Unix (5%), Linux (5%), and Mac (2%). requires some system specific preprocessing. The trustworthiness Systems that support Unix as well as Linux are 2%. Systems of the translation is verified with the proof kernel. For ITPs with that run on Linux, Unix, Windows and Mac are 5%. Systems no proof kernel (e.g., PVS, Mizar and ACL2), there is no way supporting Linux, Solaris and Mac take 2%. Systems that only (yet) that provides a proof-object with high reliability. One has to support Linux and Solaris take 2%. trust these systems for the correctness of statement accepted by the assistants. The advantage of these kind of systems generally 5.3.8 Distributed/Multi-threaded, IDE and Library Support is their larger automated deduction facilities and user-friendliness Only 17% (8 out of 43) of theorem provers support distributed or [31]. multi-threaded environment, while 83% of the systems does not support such environments. Further, our results showed that all 5.5 The Poincare´ Principle and Automation of the systems have the ability of scalability according to future needs. On the other hand, 65% (30 out of 43) of the systems For theorem provers, one of the important aspects is the automa- support IDE. Furthermore, 56% (26 out of 43) of the systems tion of trivial tasks [274]. It means that a user is not required have their own libraries while 44% of the systems does not have to explain all the details of the calculations to a theorem prover. their own libraries. A theorem prover satisfies the Poincare´ principle (formulated by [29]) if it has the ability to automatically prove the correctness of calculations. For example 3 + 4 = 7 holds by computation and 5.4 The de Bruijn Criterion it should not be justified with long chains of logical inferences. According to the de Bruijn criterion “the correctness of the math- Table 8 lists whether a prover satisfies Poincare´ principle or not. ematics in the system should be guaranteed by a small checker” Another important feature of a theorem prover is whether it [30]. This means that a system has a ‘proof kernel’ (also called enables its user to write programs that can solve proof problems proof checker) that is used to filter all the mathematics. Table 7 algorithmically. HOL, PVS, Coq and Isabelle offers such kind shows whether 15 theorem provers for which we received answers of user automation. Strong built-in automation (proof tactics, from experts/developers have small proof kernels or not + stands decision and search procedures, and induction automation etc.) is for yes and - for no). Proof kernel for other 27 theorem provers are another important aspect of a theorem prover. ACL2 and PVS has shown in Appendix A, and majority of them have no proof kernel. the powerful built-in automation. Then comes HOL, Isabelle and Whereas, HOL Light has extremely small proof kernel containng then Mizar and Coq. The following order indicates the reliability only several hundred lines of OCaml. of ITPs with most reliable on the left side and least reliable on the In some ITPs (e.g., Coq and Agda), the proof kernel also right side. checks the correctness of proof-objects that are generated by other tools included in the whole system. For an ITP with proof-objects, Agda, Coq, Metamath, Nuprl, HOL, Isabelle, Twelf, Mizar, PVS, the proof-script for a statement (theorem or lemma) contain a list ACL2. 12

(a) Theorem provers category (b) Mechanical reasoning system type

(d) Truth value of the systems (c) Logical framework of systems

(f) Set theoretic support (e) Calculus of the systems

(g) System programming paradigm (h) System programming language

(i) Operating system

Fig. 3: Theorem provers comparisons 13 TABLE 8: The Poincare´ principle Atelier B offers a framework that automatically prove and System Poincare´ principle User automation review user added mathematical proof rules. Proof obligations in ACL2 + + Atelier B contain traceability information that helps in locating E - + Isabelle + + modeling errors and model editor allows the navigation of models Atelier B + + and operations [177]. B-method allows one to develop many mod- Escher - - els of the same system with the refinement technique. However, Metamath - - one is required to explain the B model while proving theorems Twelf - - Agda - - and the proof obligation generator may generates small proof Mizar - - obligations that need to be discharged. Atelier B is used in the HOL + + development of safety-critical systems [178] and communication RedPRL/Nuprl + - protocol [145]. Moreover, B-method is also used successfully in Class & Int + - the development of safety-critical parts in two railway projects [5], Geo - + Coq + + [27] and byte code verifier of the Java card [57]. PVS + + Main strength of Metamath is that it uses the minimum possible framework that is required to express mathematics and their proofs. Unlike most ITPs, no assumption is made by the Agda is placed at first place as it only uses predicative logic. Metamath’s verification engine about the underlying logic and the The middle places are occupied by Nuprl, HOL and Isabelle main verification algorithm is very simple (essentially nothing because of their non-standard proof-objects. Finally, the least more than substitution of variables expression, enhanced with reliable are those that do not work wit proof-objects. On the other checking for conflicts in distinct variables). Weaker logics such hand, the order for internal automation in ITPs is opposite with as quantum or intuitionistic can be handled in Metamath with ACL2 and PVS on the top. different sets of axioms. Proofs in Metamath are generally very long, but the proofs are completely transparent and Metamath 6 STRENGTHS,ANALYSIS,APPLICATIONS AND database contains over 30,000 human readable formal proofs. FUTURE DIRECTIONS In the proof development process, users prove a theorem/lemma interactively within the program, which is then written to the In this section, we provide the details on the strengths, in-depth source by the program. A definition is provided in [54] for models analysis of the differences and applications of theorem provers. of Metamath style formal systems, which are demonstrated on Moreover, some future research directions based on recent work examples. From mathematical foundations, is also discussed. Some theorem provers such as Geo, Class & Int Hilbert space and quantum logic is developed in Metamath, which proof checker are omitted in this section and some other famous are used in the verification of some new results in these fields. provers such as Nuprl, Vampire, Prover9/iProver and MaLARea Metamath is used in the formalization of Dirichlet’s theorem and are included. Selberg’s proof of the [55]. An algorithm is presented in [56] that converts HOL Light proofs into Metamath. 6.1 ITPs Twelf strengths are representing deductive systems with side ACL2 main strengths are: state-of-the-art prover heuristics, robust conditions and judgments with contexts. It offers an environ- engineering and extensive hyperlinked documentation. Among all ment for experimenting with encodings and to verify their meta- ITPs, ACl2 uses FOL instead of higher-order logics. Only two theoretic properties. It also provides a module system for the or- ITPs, Isabelle and ACl2 offer parallel proof checking facility. ganization of large developments. Twelf implements the λProlog ACl2 also supports program extraction (also called code gener- [89] and its logic is very close to the Edinburgh Logical Frame- ation) by translating the specification in ACL2 language to Com- work (LF) [226]. Twelf specifications can be executed through mon Lisp. Theories can also be developed and executed in ACL2 a search procedure, which means that it can also be used as a as it is built around a real programming language. Users can not logical programming language. Twelf is used in proving the safety construct inductive types, but powerful built-in induction scheme of standard ML programming language [179], in typed assembly in ACl2 allows users to define their own recursive functions. language system [69], in foundational Proof Carrying-Code sys- The is based on the waterfall design of Nqthm tem [18], in cut-elimination proofs for classical and intuitionistic prover [219]. ACL2 has been used extensively to verify hardware logic [225], for specifying and validating logic morphisms [242] and software designs at AMD, Centaur, Oracle, Intel, IBM and and construction of a safety policy for the IA-32 architecture [70]. to prove separation kernel properties at Rockwell Collins [137]. Main features of Agda is the interactive construction of pro- Moreover, ACL2 is also used successfully in processor modeling grams and proofs with meta-variables and place-holders. Unlike [91], digital systems [157], programming languages [200], asyn- other ITPs that work with proof-scripts, Agda acts as a structure chronous circuits [59] and concurrency [199]. ACl2(ml) [125] uses editor, providing support for term construction. Users can edit machine learning to facilitate users in the proof process. In past, the proof-object by focusing on a hole and executing one of the some work has been done in integrating SAT solvers into ACL2 operations (tactics) that is applied to that hole. It is the only ITP [224], [255]. However, it has generated new issues because of that offers a functional programming language with dependent their support for a wide range of domains including real numbers types. Moreover, strictly positive inductive and inductive-recursive and uninterpreted functions. The x86isa library [104] in ACL2 data types are supported in Agda. Emacs interface for Agda offers a formal model for reasoning about x86 machine-code allows the incremental development of programs [15]. Agda is programs. Adding several features to current x86isa library such used to formally verify railway interlocking systems [155], web as exceptions, interrupts handling and extending I/O capabilities client application development [143], fully certified merge sort will enable us to reason about real system codes. [66], hardware description and verification [92], formalizing Type- 14 Logical Grammars [168], Curry programs verification [17] and algorithm is presented in [101] to automate the selection process formalization of Valiant’s Algorithm [37]. In [94], automated the- of tactics and tactic-sequences in HOL4. orem prover Waldmeister [129] is integrated in Agda to facilitate Nuprl [64], which inspired the development of RedPRL, is a the proof automation. Similarly, another tool is proposed in [186] proof development system based on Computational Type Theory. for automated theorem proving in Agda. Nuprl has evolved significantly in years and now can handle Mizar received popularity because of its huge repository of those logics where inference rules can be declared in a sequent formalized mathematical knowledge, which has been used in style. It follows the LCF approach and the type theory include developing AI/ATP methods to solve in mathematics less-common subtype, very-dependent function types and type [260]. Over the years, the syntax of Mizar is improved, simpli- constructors of quotient type. Type checking in Nuprl is unde- fied and the Mizar language now contains a rich set of logical cidable as subtypes can be defined with arbitrary types. Whereas quantifiers and connectives. The evolution of Mizar in first 30 in PVS, an algorithm for type-checking automates all simpler type years is presented in [194]. In Mizar, proofs are written in a checking tasks. Moreover, the computation language is untyped declarative way and proofs are developed according to the rules of and judgments are also not decidable because the Poincare´ prin- the Jaskowski` style of natural deduction [142]. This characteristics ciple is assumed not only for intensional equality but also for influenced other systems to build similar kind of proof layers extensional equality. Users can interact with Nuprl only through on top of several other systems, such as the Mizar mode for structural editors where proofs can be edited and viewed as proof HOL [115], the Isar language for Isabelle [270], Mizar-Light for trees. Nuprl is used in mathematics formalization and hundred HOL-Light [273] and the declarative proof language (DPL) for of theorems are proved in the system [11]. Nuprl is also used Coq [68]. Mizar does not support the Poincare´ principle, yet it in protocol verification [41], hardware and software specification has some automated deduction and a set of tactics that are very and verification [1], [181], reasoning about functional programs user-friendly. The main unique feature is that the proof-script [135], design of reliable networks [175], and the development of is close to an ordinary proof in mathematics. Apart from large demonstrably correct distributed systems [40]. Some work on the MML, Mizar is used in the development of rigorous mathematics, integration of Nuprl with other systems is done in the past. In in hardware/software verification and in mathematical education [12], PVS is integrated in Nuprl to enables users to access PVS [110]. Some recent work in Mizar includes the formalization of from the Nuprl environment. A new semantics is provided in [134] Pell’s equation [6], Nominative Algorithmic Algebra [169] and to embed the logic of the HOL prover inside Nuprl. Similarly, formalization of bounded functions for cryptology [214]. A suite Nuprl’s meta-theory is formalized in Coq [16], which is later used of AI/ATP system is developed on Mizar library in [151] that in the Nuprl proof for the validity of Brouwer’s Bar Induction contains approximately 58000 theorems. 14 strongest methods that principle [228]. executed in parallel proved 40.6% of the theorems. Moreover, an Coq also follows the LCF approach and probably the most independent certification mechanism is developed in Mizar that is developed ITP after HOL Light. The logic used in Coq is very based on Isabelle framework [148]. In [149], Mizar environment expressive that can define rich mathematical objects. Moreover, is also emulated inside the Isabelle. Coq has the ability to explicitly manipulate proof-objects from HOL was build upon the Cambridge LCF approach [221] that proof-scripts, which makes the integrity of the syetm dependent on is now referred to as HOL/88 with the purpose of hardware veri- the correctness of the type-checking algorithm. As Coq is based on fication. It has influenced the development of other famous ITPs constructive foundations, it has two basic meta-types (also called such as Isabelle/HOL, HOL Zero, ProofPower and HOL Light. In sorts): Prop (as a type of logical propositions) and Set (as a type HOL, the computations that involves recursion can become quite of other types (eg., Booleans, naturals, , etc) [277]. This lengthy and complex when they are converted to proof-objects. allows Coq to distinguish between terms that represent proofs Thus, the proof-objects are not stored, only the proof-scripts. and terms that represent programs. A program extractor can also This is the reason why non-standard proof-objects are used in be used to synthesize and extract verified programs (in OCaml, HOL. Users in HOL generally work inside the implementation Haskell or Scheme) from their formal specifications [182]. Coq language. As HOL is fully programmable, various other means of uses two languages for proofs: Gallina (a pure functional program- interacting with HOL have also been developed. HOL Light is the ming language) for writing specification and LTac (a procedural most widely used provers of this family and it is probably the only language) for the proof process manipulation. Main success stories prover that represents the LCF approach in its purest form. Its logic of the system are formalization of fully certified C-compiler [44], is based on simple type theory with polymorphic type variable. [180], disjoint-set data structure [69], formalization of two way The terms in HOL Light are of simply typed λ−calculus, with finite automata [83], multiplier circuit formalization [220] and just two primitive types: bool (Booleans) and ind (individuals). coordination language [185]. Main mathematical formalizations HOL has been used extensively for formal verification projects done in Coq include the formalization of Feit-Thompson theorem in industry. HOL provers are used widely in the formalization of [107], four-color theorem [106], three gap theorem [195], real mathematical theorems [23], hardware design [2], [116], commu- analysis [45] and theory of algebra [102]. A new approach in nication protocols verification [122], programs [119] and control Coq is presented in [256] that directly generates provably-safe C system analysis [232]. Similarly, HOL(y) Hammer offers machine code. Recently, Coq is used in the formal verification of dynamical learning based selection and automated reasoning both systems [63], password quality checkers [90], protocol for HOL Light and HOL4 [151]. Recent work in HOL includes [218], QWIRE quantum circuit language [230], complex data the formalization of quaternions [96], linear analog circuits [257], structure invariants [146], component connectors [133], [279] and process algebra CCS [258] and metric spaces [190]. A library for the control function for the inverted pendulum [236]. Moreover, a combinational circuits is developed in [244]. Moreover, formalized plug-in (called SMTCoq) is developed in [86] to integrate external Lambek calculus has been ported from Coq to HOL4 in [259] with SMT solvers in Coq. An IDE for integration of Coq projects into some new theorems and improvements. A technique based on A* is also developed in [88]. 15 Compared to other ITPs, PVS is based on classical simple meta-logic and the formulas are terms of type prop. The meta-logic type theory and is without proof-objects, which allows all kinds supports implication, the universal quantification and equality, and of rewriting for numeric as well as symbolic equalities. It of- the inference rule is provided in natural-deduction style [191]. For fers theory interpretation, dependent types, predicate sub-typing, proofs, Isabelle combines HOL for writing specification and Isar powerful decision procedures, Latex support for specifications and as the language to describe procedures for proofs manipulation. proofs, and is user-friendly due to highly expressive specification Isabelle/HOL [210] is the most widely used system nowadays. language and powerful built-in automated deduction. It is also Isabelle offers a rich infrastructure for high-level proof schemes. integrated with other outside systems such as a BDD-based model During theory development, both structured and unstructured checker and also serves as a back-end verification tool for com- proofs can be mixed freely. It is important to state that both HOL puter algebra and code verification systems [191]. During proof and Isabelle use non-standard proof-objects in the form of tactics construction, PVS builds a graphical proof tree in which remaining for equational reasoning. This makes formalization relatively easy proof obligations are at the leaves of tree. Each node in the tree in both systems but it has the disadvantage that the proof-objects represents a sequence and each branch is considered as a (sub) can not be used to see the proof details. In principal, both systems proof goal that is followed from its off-spring branch with the can be modified for proof-objects creation and storing. help of a proof step. PVS prover is based on HP used Isabelle in the design and analysis of the HP 9000 line where each proof goal is a sequent consisting of a sequence of of servers’ Runway bus [52]. The L4.verified project at NICTA formulas called antecedents and a sequence of formulas called used Isabelle to prove the functional correctness of seL4 micro- consequents. The of PVS is not algorithmically kernel [163]. Moreover, Isabelle is successfully used in security decidable and theorem proving may be required to establish the protocols’ correctness [50], formalization of Java programming type- of a PVS specification. Theorems that need to be language [267], Java code soundness and com- proved are called type-correctness conditions (TCCs). PVS is used pleteness [164], property verification of programming language in hardware and software verification [161], [215], concurrency semantics [165]. A list of research projects that uses Isabelle can problems verification [127], file systems verification [128], control be found at: isabelle.in.tum.de/community/projects. Recent works systems [266], cryptographic protocol [24], microprocessor veri- in Isabelle include the verification of Ethereum smart contract fication [249], real time systems [243], formalization of integral bytecode [14], imperative programs asymptotic time complexity calculus [51] and medical devices [193]. Some recent work in PVS verification [278] and formalization of Akra-Bazzi method [84], includes the specification of multi-window user interface [246], deep learning theoretical foundations [36], Green’s theorem [3] formalization of component connectors [206], [208], analysis of and Markov chains and Markov decision processes with discrete distributed cognition systems [192], genetic algorithms operators time and discrete state-spaces [132]. [205] and cloud services [207]. PVS along with its libraries is From last 15 years, E theorem prover is constantly participat- translated to the OMDoc/MMT framework in [167]. The proposed ing at CASC in more than one category (winnrer in SLH divison translation provides a universal representation format for theorem in CASC-27 (2019)). The semantics of E is purely decelrative and prover libraries. Similarly, PVS is allowed in [103] to export proof internal unique features are: shared terms with cached rewriting, certificates that can be verified externally. Moreover, denotational folding feature vector indexing and fingerprint indexing. Unique semantics for Answer Set Programming (ASP) is encoded in [10] features that are visible to the users are advanced and highly and fundamental properties of ASP are proved with PVS theorem flexible search heuristics. E main strengths are the generation of prover. Some of the differences between top ten famous proof- proof-objects, the automatic problem analysis and the support for assistants are listed in Table 9. the TPTP standrad for answers [241]. E is successfully used in the TABLE 9: Comparison of proof-assistants reasoning of large [229], software verification [231] and certification [81]. One of the main limitations in ATP is the ITPs rel T dep. T dec. T state. R rpif LLib lack of mechanism that allows proofs to guide the proof search ACL2 - - - - + + + Isabelle ++ + - + + + + for a new conjecture. In this regard, E is extended in [140] with Metamath + - - - + + + various new clause selection strategies. These strategies are based Twelf + + + + - - - on similarity of a clause with the conjecture. The use of watchlists Agda +++ + + + - - + (also known as hint list) in large E theories is explored in [105], Mizar + + + + + + + to improve the proof process automation. HOL ++ + - + + - + Nuprl ++ + + - - - - Escher Verifier (the successor of Perfect Developer [71]) is Coq + + + + + - + based on the Verified Design-by-Contract Paradigm [72] inspired PVS - + + - + - + from and weakest . It performs static rel: reliability, T: typed, dep. T: , dec. T: decidable type, analysis and formal verification of C programs by checking the state. R: statement about R, rpif: readable proof input files, LLib: large out-of-bounds array indexing, arithmetic overflow, null-pointer library de-referencing and other undefined behavior in the program. It extends the C language with additional keywords and constructs that are required in programs specifications expression and to give 6.2 ATPs strength to the C type system [53]. Term rewriting and FOL based Isabelle is built around a relatively small core that implements theorem prover is implemented for the verification purpose. The many theories as classical FOL, constructive type theory, intu- unique feature of the tool is that it provides hints for the cases itionistic natural deduction and ZFC. Its meta-logic is based on when the provers is unable to verify the code automatically [114]. the fragment of intuitionistic simple type theory that includes Escher verifier is used in the verification of C programs [74], basic types as functional types. Whereas the terms are of typed [73] and formal analysis of web applications [75]. λ-calculus. Only the type prop (proposition) is defined by the Vampire [235] is an ATP for FOL, based on equational super- 16 position calculus and is one of the best ATPs at CASC. Unique TABLE 10: Comparison of famous ATPs features of Vampire include the generation of interpolants and Type ILang Lib SS Web service implementation of a limited resource strategy (LRS). Moreover, E SB-FOP C - + SonTPTP elimination is also implemented in Vampire that is used to Vampire SB-FOP C++ + - SonTPTP automatically find first-order program properties. It has a special Prover9/Otter RB-FOP C + - SonTPTP mode for working with very large knowledge bases and can answer SPASS SB-FOP Java/C - + SonTPTP Satallax TB-HOP OCaml - + SonTPTP queries to them according to TPTP standars. On a multi-core iProver IB-FOP OCaml - + SonTPTP system, Vampire can perform several proof attempts in parallel LEO-II RB-HOP OCaml + + SonTPTP [173]. The strength of ATPs such as Vampire, E and SPASS in MaLARea MS-ATP Perl + - - proving theorems from MML is presented in [262]. Some work on SS: standalone system, FOP: first-order prover SB-FOP: super- adding arithmetic to Vampire is done in [171]. Vampire is used in position-based FOP, RB-FOP: resolution-based FOP, TB-FOP: tableau-based FOP, TB-HOP: tableau-based higher-order prover, [111] to automate the soundness proofs of type systems. Vampire IB-FOP: instantaition-based FOP, RB-HOP: resolution-based HOP, is also used for program analysis and in proving properties of loops SonTPTP: systems on TPTP. MS-ATP: metasystem for ATP with arrays [172]. Cheap SAT solvers such as Avatar [233] plays an important role in the success of Vampire. In general, Vampire is well-suited in the domain of type soundness proofs. However, Similarly, the SAT/SMT solvers can be used in ITPs by first the use of Vampire relies heavily on the size of the chosen axiom translating and passing the goals to the fragment supported by set and on the concrete form of individual axioms. a SAT/SMT solver. The SAT/SMT solver then solve the translated Prover9 [196], the successor of the Otter prover, is a resolution goal without human intervention and guidance. Table 11 lists some based automated prover for equational logic and FOL. Main of the hammers and SAT/SMT solvers that are developed and strength of Prover9 is that it is paired with Mace4. Users gives integrated with ITPs. Waldmeister [129] is integrated with Agda formulas and Prover9 attempts to find a proof. If proof is not find in [94], but no SAT/SMT solvers is yet integrated with Agda. then Mace4 looks for a counter-example. Similalry, Prooftrans can Moreover, PVS employs the Yices SMT solver as an oracle [237], be used to transform Prover9 proofs into more detailed proofs, but not integrated with any ATPs yet. Some new theorem provers simplify the justifications, re-number and expand proof steps, that aims to fill the gap between interactive and automated theorem produce them in XML format, generate hints to guide subsequent proving such as [22] offers APIs to access features of SMT searches and produces proofs for input to proof checkers such solvers (CVC4, Z3) and ATPs. as IVY. Prover9 is used in the analysis of cryptographic-key TABLE 11: ATPs and SAT/SMT solvers for ITPs Assignment schemes [239], verification of Alloy specification language [187] and access control policies [238]. Moreover some ITP Hammers SAT/SMT solvers tasks in on words [131] and geometric procedure Isabelle/HOL Sledgehammer [223] Yices in Isabelle/HOL [87] [216] is also formalized in Prover9, along with proofs of theorems HOL Light/HOL4 HOLyHammer [152] SMT solvers for HOL4 [269] Mizar MizAR [153] MiniSAT for Mizar [203] in Tarskian geometry [264]. Similarly, iProver [170] is based on Coq Hammer for Coq [76] SMTCoq [86] an instantiation framework for FOL called Inst-Gen [98]. First ACL2 ATPs for ACL2 [144] Smtlink for ACL2 [224] order reasoning is combined with ground reasoning in iProver with the help of SAT solver called MiniSat [85]. Main strengths of iProver are: reasoning with large theories, a predicate elimination 6.3 Some Future Directions procedure as a preprocessing technique, EPR-based k-induction Active research activity is going on in both ITPs and ATPs. with counterexample, model representation with first order defini- Despite the great progress in last three decades, general purpose tions in term algebra and proof extraction for resolution as well FOL based theorem provers are still unable to directly determine as instantiation. More details on iProver and other ATPs can be the satisfiability of a first-order formula. SMT problem deals found in Appendix A. with whether a formula written in first-order is satisfiable in Table 10 compares the famous ATPs that perform best at some logical theory. One of the famous theorem prover for SMT CASC for some features. SonTPTP (Systems on TPTP) is an problem is CVC4 [32]. SMT solvers may not terminate on some online interface for ATPs. It can be used by the users to run the problems due to undecidability of FOL. In such cases, users would ATP on TPTP (thousand problems for theorem provers) library or like to know the reason why the solver failed. Developing tools their own problems in the TPTP language [252]. It is important to for SMT solvers which allows developers and users in helping the point out that MaLARea [261] is not an ATP. It is a simple meta- system to finish some proofs is an interesting research area. system that combines several ATPs (E, SPASS, Vampire, etc) with Currently, most of the SMT solvers display “unknown” when a machine learning based component (SNoW system). MaLARea they are unable in proving the unsatisfiability of quantified formu- interleaves the ATPs by first running them (in cycles) on problems, las [234]. Main research direction in SMT solvers is to enable followed by machine learning from successful proofs. The learned them to return counter models in case they fail to prove the information is then used to limit the set of axioms provided to unsatisfiability of quantified formulas that ranges from integers ATPs in the next cycle. In CASC-J9 (2018), MaLARea comes and inductive datatypes to free sorts. Popular SMT solvers (CVC4, first in the LTB division. Yices, Z3 etc) generally work in a sequential manner. One ATPs can be integrated (through hammers [43]) with ITPs for another research area is to parallelize SMT solving to better the proof automation in interactive proof development process. utilize the capability of hardware in multi-core systems. PZ3 Hammers use ATPs to automatically find the proofs for user [60] (the parallel solver for Z3) is one example for this kind defined proof goals. They combine the learning from previous of parallelization. Another important area is the development of proofs with translation of the goals to the logics of ATPs and tools that can integrate SMT solvers in ITPs to increase the level reconstruction of the successfully found proofs for the goals. of automation by offering safe tactics/strategies for solving proof 17 goals automatically with the help of external solvers. SMT solvers challenge is a sound and reliable translation among different for ITPs listed in the preceding Table 11 are some example of this. languages and logics. Similarly, other main concern is the inter- One of the main challenge in ATPs is reasoning with large on- pretation of ATP outputs back into ITP environment. tologies, which are becoming more dominant as large knowledge ITPs does have a large corpora of computer-understandable bases. Some techniques used for reasoning with large theories reasoning knowledge [42], [121] in the form of libraries. These are based on methods for different axiom selection [130], [253]. corpora can play an important role in artificial intelligence based Machine learning is also used for axiom selection where previous methods, such as concept matching, structure formation and theory knowledge is used to prove a conjecture [261], [263]. Framework exploration. The ongoing fast progress in machine learning and for abstract refinement, where the axioms selection and reasoning made it possible to use these learning techniques phases are interleaved, can also be used in reasoning of large on such corpora in guiding the proof search process, in proof ontologies, as shown recently in [126]. automation and in developing proof tactics/strategies, as indicated Theorem provers have the limitations that it is not fast enough, in recent works [101], [105], [147], [154], [209]. Such machine the logic is inconvenient as a scripting language and majority of learning systems can also be combined with ATPs on the large theorem provers do not support graphics and tools. ITPs libraries to develop an artificial intelligence based feedback Similarly, ITPs requires heavy interactions between a user and the loops. Another interesting area is to use evolutionary algorithms proof-assistant, which consumes a lot of time. IDE’s in theorem [26] in ITPs to find and evolve proofs. Till now, effective search provers , especially in ITPs, can substantially facilitate the creation mechanisms for formal proofs are lacking and we believe that of large proofs. However, very few of them are equipped with evolutionary algorithms are more suitable for this task due to their full-fledged IDE’s. Some future work in this direction includes: suitability to solve search and optimization problems. Moreover, (i) making the provers fast and efficient, and (ii) development investigating evolutionary/ algorithms (as the program of IDE’s and integration of IDE’s in different provers. This will (proof) generator) and ITPs (as the proof verifier) to automatically make these tools more acceptable in industrial sector. Similarly, in find formal proofs for new conjectures is also worth pursuing. ITPs, users make use of tactics that reduces a goal to simpler and Some initial work can be found in [136], [276], where a GA is used smaller sub-goals. Another interesting area is the development of with the Coq to automatically find formal proofs for theorems. strategies/tactics by using tactic languages, such as HITAC [21] A single tool based on machine learning is developed in and Ltac [80] which will allow users to elaborate proof strategies [151] that can be used for every ITP on one condition: both the and combine tactics. language and its corresponding library are available in a universal ITPs also lack the inter-operability among proof-assistants and format, so that they can be easily put into the common selection other related tools, which means that tool support cannot be easily algorithm. However, the universal format is generally infeasible shared between ITPs. Translation of ITPs to a universal format is and expensive for many applications. The reason is that it is needed to overcome the duplication efforts in the development of very hard to built a universal format that can offer a good trade- systems, their libraries and supporting periphery tools. Similarly, off between universality and simplicity. Even if such a universal some work [99], [100] is done on integrating the model checking format is available, the implementation of library export into the with theorem provers. However, integrating model checking with universal format is laborious. theorem proving is more difficult as it involves the mapping of One of the such universal format for formal knowledge is models and mathematics involved in the analysis of the systems. the OMDoc/MMT framework [166], which has been used to The vision of QED manifesto is to develop a universal, translate Mizar [138], HOL Light [150] and PVS [167] libraries computer-based database for all mathematical knowledge that into OMDoc/MMT framework . Their work has made the libraries is formalized in logic and is supported by proofs that can be becomes accessible to a wide range of OMDoc-based tools. It validated mechanically. As shown in previous sections, theorem would be interesting to translate other famous ITPs logics and provers are diverse in nature with radically different foundations. libraries to OMDoc/MMT framework for formal mathematics and On one hand, using various provers offers a diverse experience, knowledge management. Also, translation of ITPs to one universal which helps to better understand the strengths and weaknesses format will make the machine learning based premise selection of provers. However, on the other hand, the effort and overhead to provers much easier. Moreover, with flexible alignments [149] needed to learn even one prover effectively makes researchers between the libraries, the developers of different provers can be to stick to using just one system. This results in duplication of guided in the approximate translation of contents across libraries similar work. A way of sharing the work and knowledge among and in reuse notations, such as to show one prover content in a provers would not be just appealing but it would also make provers form that looks familiar to other prover users. more powerful and practical. One feasible approach is to import theorems from one prover to another. Theorems between different systems are transported by trans- 7 CONCLUSION lating the libraries between systems, as done in [158], [174], Mechanical reasoning systems are actively developed since the [213]. The main challenge in sharing theorems is to ensure a birth of modern logic, and now these state-of-the-art tools are meaningful semantic match between the provers, meaning that used in proving complicated mathematical theorems and verifying logic, definitions, types and treatment of functions, etc. in provers large computer systems. A comprehensive survey on mechanical are compatible with each other. Isabelle’s sledgehammer [223] reasoning systems (both ITPs and ATPs) is presented in this work. describes the way for integrating different automation tools. How- Main characteristics, strengths, differences and application areas ever, sledgehammer has number of limitations, such as unsound of the systems are investigated. Some future research directions translation, primitive relevance filter and low performance on based on recent work are also discussed. In summary, we find higher-order problems. Integrating ITPs with ATPs still requires that formalization with theorem provers have not become ma- a lot of research into approaches of interfacing. One of the main ture enough to adopt the working style of vast mathematical 18 community. It still needs: (i) better libraries support for proofs [15] P. D. Ana Bove and U. Norell. A brief overview of Agda-A functional automation, (ii) better ways of knowledge sharing between the language with dependent types. In Proceedings of 22nd International Conference on Theorem Proving in Higher Order Logic, pages 73–78, proof systems, (iii) computation incorporation and verification, 2009. (iv) better means to store and search the background , (v) [16] A. Anand and V. Rahli. Towards a formally verified proof assistant. improved interfaces and integrations of ATPs, ITPs and SAT/SMT In Proceedings of International Conference on Interactive Theorem solvers, and (vi) better support for the machine learning and deep Proving, pages 27–44, 2014. mining techniques for proof guidance and automation. Currently, [17] S. Antoy, M. Hanus, and S. Libby. Proving non-deterministic com- putations in Agda. In Proceedings of WLP’15/’16/WFLP’16, pages a wider researcher community is working on these problems and it 180–195, 2016. is sanguinely estimated that mechanically formalized and verified [18] A. W. Appel. Foundational proof-carrying code. In Proceedings of mathematics will be a commonplace till the mid of this century. 16th Annual Symposium on Logic in Computer Science, pages 247–256, In this survey, judgments which we made about theorem 2001. [19] R. C. Armstrong, R. J. Punnoose, M. H. Wong, and J. R. Mayo. provers may be subjective. The authors hope that this survey Survey of existing tools for formal verification. Technical report, Sandia will provide a quick and easy guide to the interested users and National Laboratories, USA, 2014. people without good mathematics knowledge into the work of [20] R. Arthan. ProofPower–SLRP user guide. Technical report, Lemma 1 theorem provers. In advance, we allege for misrepresentation Limited, 2005. [21] D. Aspinall, E. Denney, and C. Luth.¨ A tactic language for Hiproofs. of these systems from their perspective developers or owners. In Proceedings of International Conference on Intelligent Computer We feel happy to be informed via e-mail about any lapses at Mathematics, pages 339–354, 2008. [email protected]. [22] J. Avigad, L. de Moura, and S. Kong. Theorem proving in Lean, release 3.4.0, 2019. [23] J. Avigad and J. Harrison. Formally verified mathematics. Communica- ACKNOWLEDGMENTS tions of the ACM, 57(4):66–75, 2014. [24] M. Ayala-Rincon´ and Y. S. Rego. Formalization in PVS of balancing The work has been supported by the National Natural Science properties necessary for proving security of the Dolev-Yao cascade Foundation of China under grant no. 61772038, 61532019 and protocol model. Journal of Formalized Reasoning, 6(1):31–61, 2013. 61272160, and the Guandong Science and Technology Depart- [25] A. Azurat and W. Prasetya. A survey on embedding programming ment (Grant no. 2018B010107004). logics in a theorem prover. Technical report, UU-CS-2002-007, Utrecht University, Netharlands, 2002. [26] T. Back. Evolutionary Algorithms in Theory and Practice. Oxford University Press, 1996. EFERENCES R [27] F. Badeau and A. Amelot. Using B as a high level programming [1] M. Aagaard and M. Leeser. Verifying a logic synthesis tool in Nuprl: language in an industrial project: Roissy VAL. In Proceedings of the A case study in software verification. In Proceedings of International International Conference of B and Z Users, pages 334–354, 2005. Conference on Computer Aided Verification, pages 69–81, 1992. [28] C. Baier, J. P. Katoen, and K. G. Larsen. Principles of model checking. [2] A. T. Abdel-Hamid, S. Tahar, and J. Harrison. Enabling hardware MIT press, 2008. verification through design changes. In Proceedings of the International [29] H. Barendregt. The impact of the in logic and computer Conference on Formal Engineering Methods, pages 459–470, 2002. science. Bulletin of Symbolic Logic, 3(2):181–215, 1997. [3] M. Abdulaziz and L. C. Paulson. An Isabelle/HOL formalisation [30] H. Barendregt and E. Barendsen. Autarkic computations in formal of Green’s theorem. In Proceedings of International Conference on proofs. Journal of Automated Reasoning, 28(3):321–336, 2002. Interactive Theorem Proving, pages 3–19, 2016. [31] H. Barendregt and H. Geuvers. Proof-assistants using dependent type [4] J. R. Abrial. The B-book: Assigning programs to meanings. Cambridge systems. In Handbook of Automated Reasoning (in 2 volumes), pages University Press, 2005. 1149–1238. 2001. [5] J. R. Abrial. Formal methods: Theory becoming practice. Journal of [32] C. Barrett, C. L. Conway, M. Deters, L. Hadarean, D. Jovanovic, Universal Computer Science, 13(5):619–628, 2007. T. King, A. Reynolds, and C. Tinelli. CVC4. In Proceedings of 23rd [6] M. Acewicz and K. Pak. Formalization of Pell’s equations in the Mizar International Conference on Computer Aided Verification, pages 171– system. In Proceedings of the Federated Conference on Computer 177, 2011. Science and Information Systems, pages 223–226, 2017. [33] C. Barrett, R. Nieuwenhuis, A. Oliveras, and C. Tinelli. Splitting on [7] M. Adams. Introducing HOL Zero. In Proceedings of 3rd International demand in SAT modulo theories. In Proceedings of 13th International Congress on Mathematical Software, pages 142–143. Springer, 2010. Conference on Logic for Programming, Artificial Intelligence, and [8] S. Agerholm, I. Beylin, and P. Dybjer. A comparison of HOL and Reasoning, pages 512–526, 2006. ALF formalizations of a categorical coherence theorem. In Proceedings [34] C. W. Barrett, R. Sebastiani, S. A. Seshia, and C. Tinelli. Satisfiability of 9th International Conference on Theorem Proving in Higher Order modulo theories. In Handbook of Satisfiability, pages 825–885. 2009. Logics, pages 17–32, 1996. [35] D. Basin and M. Kaufmann. The Boyer-Moore prover and Nuprl: An [9] S. Agerholm and M. Gordon. Experiments with ZF set theory in experimental comparison, logical frameworks, 1991. HOL and Isabelle. In Proceedings of 8th International Conference [36] A. Bentkamp, J. C. Blanchette, and D. Klakow. A formal proof of the on Higher Order Logic Theorem Proving and its Applications, pages expressiveness of deep learning. In Proceedings of 8th International 32–45. Springer, 1995. Conference on International Conference on Interactive Theorem Prov- [10] F. Aguado, P. Ascariz, P. Cabalar, G. Perez, and C. Vidal. Verification ing, pages 46–64, 2017. for ASP : A case study using the PVS theorem prover. Logic Journal of the IGPL, 25(2):195–213, 2015. [37] J. P. Bernardy and P. Jansson. Certified context-free parsing: A [11] S. Allen, R. Constable, R. Eaton, C. Kreitz, and L. Lorigo. The formalisation of Valiant’s algorithm in Agda. Logical Methods in Nuprl open logical environment. In Proceedings of 17th International Computer Science, 12, 2016. Conference on Automated Deduction, pages 170–176, 2000. [38] Y. Bertot and P. Casteran.´ Interactive theorem proving and pro- [12] S. F. Allen, M. Bickford, R. Constable, R. Eaton, and C. Kreitz. A gram development-Coq’Art: The calculus of inductive constructions. Nuprl–PVS connection: Integrating libraries of formal mathematics. Springer, 2013. Technical Report TR2003-1889, Cornell University, USA, 2003. [39] W. Bibel. Research perspectives for logic and deduction. In Proceedings [13] S. F. Allen, M. Bickford, R. L. Constable, R. Eaton, C. Kreitz, L. Lorigo, of Reasoning, Action and Interaction in AI Theories and Systems, pages and E. Moran. Innovations in computational type theory using Nuprl. 25–43, 2006. Journal of Applied Logic, 4(4):428–469, 2006. [40] M. Bickford and D. Guaspari. A programming logic for distributed [14] S. Amani, M. Begel,´ M. Bortin, and M. Staples. Towards verifying systems. Technical report, ATC-NY, 2005. Ethereum smart contract bytecode in Isabelle/HOL. In Proceedings of [41] M. Bickford, C. Kreitz, R. van Renesse, and X. Liu. Proving hybrid the 7th ACM SIGPLAN International Conference on Certified Programs protocols correct. In Proceedings of 14th International Conference on and Proofs, pages 66–77, 2018. Theorem Proving in Higher Order Logics, pages 105–120, 2001. 19

[42] J. C. Blanchette, M. P. L. Haslbeck, D. Matichuk, and T. Nipkow. [69] K. Crary. Toward a foundational typed assembly language. In Pro- Mining the archive of formal proofs. In Proceedings of International ceedings of International Symposium on the Principles of Programming Conference on Intelligent Computer Mathematics, pages 3–17, 2015. Languages, pages 198–212, 2003. [43] J. C. Blanchette, C. Kaliszyk, L. C. Paulson, and J. Urban. Hammering [70] K. Crary and S. Sarkar. Foundational certified code in the Twelf towards QED. Journal of Formalized Reasoning, 9(1):101–148, 2016. metalogical framework. ACM Transactions on Computational Logic, [44] S. Blazy, Z. Dargaye, and X. Leroy. Formal verification of a C compiler 9(3):1–26, 2008. front-end. In Proceedings of 2006 International Symposium on Formal [71] D. Crocker. Perfect developer: A tool for object-oriented formal Methods, pages 460–475, 2006. specification and refinement. In Tools Exhibition Notes at Formal [45] S. Boldo, C. Lelay, and G. Melquiond. Coquelicot: A user-friendly Methods Europe, 2003. library of real analysis for Coq. Technical report, INRIA, France, 2013. [72] D. Crocker. Safe object-oriented software: The verified design-by- [46] S. Boldo, C. Lelay, and G. Melquiond. Formalization of real analysis: contract paradigm. In Proceedings of 12th Safety-Critical Systems A survey of proof assistants and libraries. Mathematical Structures in Symposium, pages 19–41, 2004. Computer Science, 26(7):1196–1233, 2016. [73] D. Crocker. Verifying compilers for financial applications. In Grand [47] R. Boyer. The QED manifesto. In Proceedings of 12th International Challenge 6 workshop of Formal Methods, 2005. Conference on Automated Deduction, pages 238–251, 1994. [74] D. Crocker and J. Carlton. Verification of C programs using automated reasoning. In Proceedings of 5th International Conference on Software [48] P. Brereton, B. A. Kitchenham, D. Budgen, M. Turner, and M. Khalil. Engineering and Formal Methods, pages 1–8, 2007. Lessons from applying the systematic literature review process within [75] D. Crocker and J. H. Warren. Generating commercial web applications the software engineering domain. Journal of Systems and , from precise requirements and formal specifications. In 1st Interna- 80(4):571–583, 2007. tional Workshop on Automated Specification and Verification of Web Proceedings [49] C. Brown. Satallax: An automatic higher-order prover. In Sites, pages 1–6, 2005. of 6th International Joint Conference on Automated Reasoning, pages [76] L. Czajka and C. Kaliszyk. Hammer for Coq: Automation for dependent 111–117. Springer, 2012. type theory. Journal of Automated Reasoning, 61(1-4):423–453, 2018. [50] D. F. Butin. Inductive analysis of security protocols in Isabelle/HOL [77] E. Davis and G. Marcus. The scope and limits of simulation in with applications to electronic voting. PhD thesis, 2012. automated reasoning. Artificial Intelligence, 233:60–72, 2016. [51] R. W. Butler. Formalization of the integral calculus in the PVS theorem [78] M. Davis. The prehistory and early history of automated deduction. In prover. Journal of Formalized Reasoning, 2(1):1–26, 2009. Automation of Reasoning 1: Classical Papers on Computational Logic [52] J. Camilleri. A hybrid approach to verifying liveness in a symmetric 1957-1966, pages 1–28. 1983. multiprocessor. In 10th International Conference on Theorem Proving [79] M. Davis, G. Logemann, and D. Loveland. A machine program for in Higher-Order Logics, pages 49–67, 1997. theorem-proving. Communications of the ACM, 5(7):394–397, 1962. [53] J. Carlton and D. Crocker. Escher verification studio perfect developer [80] D. Delahaye. A tactic language for the system Coq. In Proceedings of and Escher C verifier. Industrial Use of Formal Methods: Formal 7th International Conference on Logic for Programming and Automated Verification, pages 155–193, 2013. Reasoning, pages 85–95, 2000. [54] M. Carneiro. Models for Metamath. Technical report, The Ohio State [81] E. Denney, B. Fischer, and J. Schumann. An empirical evaluation University, USA, 2015. of automated theorem provers in software certification. International [55] M. Carneiro. Formalization of the prime number theorem and Dirich- Journal on Artificial Intelligence Tools, 15(1):81–108, 2006. let’s theorem. In Joint Proceedings of the FM4M, MathUI, and ThEdu [82] E. W. Dijkstra. Cooperating sequential processes. In The Origin of Workshops, pages 10–13, 2016. Concurrent Programming, pages 65–138. 1968. [56] M. M. Carneiro. Conversion of HOL Light proofs into Metamath. [83] C. Doczkal and G. Smolka. Two-way automata in Coq. In Proceedings Journal of Formalized Reasoning, 9:187–200, 2016. of International Conference on Interactive Theorem Proving, pages [57] L. Casset. Development of an embedded verifier for Java card byte 151–166, 2016. code using formal methods. In Proceedings of , [84] M. Eberl. Proving divide and conquer complexities in Isabelle/HOL. pages 20–309, 2002. Journal of Automated Reasoning, 58(4):483–508, 2017. [58] R. N. Charette. Automated to death. IEEE Spectrum, 15, 2009. [85]N.E en´ and N. Sorensson.¨ An extensible SAT-solver. In Proceedings of [59] C. K. Chau, W. A. Hunt, M. Roncken, and I. E. Sutherland. A 6th International Conference on Theory and Applications of Satisfiabil- framework for asynchronous circuit modeling and verification in ACL2. ity Testing, pages 502–518, 2003. In Proceedings of 13th International Haifa Conference on Hardware [86] B. Ekici, A. Mebsout, C. Tinelli, C. Keller, G. Katz, A. Reynolds, and and Software: Verification and Testing, pages 3–18, 2017. C. W. Barrett. Smtcoq: A plug-in for integrating SMT solvers into Coq. [60] X. Cheng, M. Zhou, X. Song, M. Gu, and J. Sun. Parallelizing SMT In Proceedings of 29th International Conference on Computer Aided solving: Lazy decomposition and conciliation. Artificial Intelligence, Verification, pages 126–133, 2017. 257:127–157, 2018. [87] L. Erkk and J. Matthews. Using Yices as an automated solver in Isabelle/HOL. In Proceedings of Automated Formal Methods, pages [61] S. J. Chou. Geo prover- A geometry theorem prover developed at 3–13, 2008. UT. In Proceedings of the 8th International Conference on Automated [88] A. Faithfull, J. Bengtson, E. Tassi, and C. Tankink. Coqoon - An IDE Deduction, pages 679–680. Springer, 1986. for interactive proof development in Coq. International Journal on [62] E. M. Clarke, W. Klieber, M. Nova´cek,ˇ and P. Zuliani. Model checking Software Tools for Technology Transfer, 20(2):125–137, 2018. and the state explosion problem. In Tools for Practical Software [89] A. P. Felty, E. L. Gunter, J. Hannan, D. Miller, G. Nadathur, and A. Sce- Verification, pages 1–30. 2012. drov. Lambda-Prolog: An extended logic programming language. In [63] C. Cohen and D. Rouhling. A formal proof in Coq of LaSalles’s Proceedings of 9th International Conference on Automated Deduction, invariance principle. In Proceedings of International Conference on pages 754–755, 1988. Interactive Theorem Proving, pages 148–163, 2017. [90] J. F. Ferreira, S. A. Johnson, A. Mendes, and P. J. Brooke. Certified [64] R. L. Constable, S. F. Allen, H. M. Bromley, W. R. Cleaveland, J. F. password quality - A case study using Coq and Linux pluggable Cremer, R. W. Harper, D. J. Howe, T. B. Knoblock, N. P. Mendler, authentication modules. In Proceedings of International Conference P. Panangaden, J. T. Sasaki, and S. F. Smith. Implementing Mathematics on Integrated Formal Methods, pages 407–421, 2017. with the Nuprl proof development system. Prentice Hall, 1986. [91] A. Flatau, M. Kaufmann, D. Reed, D. Russinoff, E. Smith, and R. Sum- [65] S. A. Cook. The complexity of theorem-proving procedures. In ners. Formal verification of microprocessors at AMD. In Proceedings Proceedings of the 3rd Annual Symposium on Theory of Computing, of Designing Correct Circuits, 2002. pages 151–158. ACM, 1971. [92] J. P. P. Flor, W. Swierstra, and Y. Sijsling. Pi-Ware: Hardware [66] E. Copello, A. Tasistro, and B. Bianchi. Case of (quite) painless description and verification in Agda. In 21st International Conference dependently typed programming: Fully certified merge sort in Agda. on Types for Proofs and Programs, pages 1–27, 2015. In Proceedings of Brazilian Symposium on Programming Languages, [93] L. Fortnow. The status of the P versus NP problem. Communications pages 62–76, 2014. of the ACM, 52(9):78–86, 2009. [67] T. Coquand and G. Huet. The calculus of constructions. Information [94] S. Foster and G. Struth. Integrating an automated theorem prover into and Computation, 76(2–3):95–120, 1988. Agda. In Proceedings of NASA Formal Methods Symposium, pages [68] P. Corbineau. A declarative language for the Coq proof assistant. 116–130, 2011. In Proceedings of International Workshop on Types for Proofs and [95] J. Franco and J. Martin. A history of satisfiability. volume 185 of Programs, pages 69–84, 2007. Frontiers in Artificial Intelligence and Applications. IOS Press, 2009. 20

[96] A. Gabrielli and M. Maggesi. Formalizing basic quaternionic analysis. [123] O. Hasan and S. Tahar. Formal verification methods. In Encyclopedia of In Proceedings of International Conference on Interactive Theorem and Technology, Third Edition, pages 7162–7170. Proving, pages 225–240, 2017. IGI Global, 2015. [97] J. H. Gallier. Logic for Computer Science: Foundations of automatic [124] J. V. Heijenoort. Historical development of modern logic. Logica theorem proving. Courier Dover Publications, 2015. Universalis, pages 1–11, 2012. [98] H. Ganzinger and K. Korovin. New directions in instantiation-based [125] J. Heras and E. Komendantskaya. ACL2(ml): Machine-learning for theorem proving. In Proceedings of 18th Symposium on Logic in ACL2. In Proceedings of 12th International Workshop on ACL2 Computer Science, pages 55–64, 2003. Theorem Prover and its Applications, pages 461–75, 2014. [99] B. V. Gastel, L. Lensink, S. Smetsers, and M. van Eekelen. Reentrant [126] J. C. L. Hernandez and K. Korovin. Towards an abstraction-refinement readers-writers: A case study combining model checking with theorem framework for reasoning with large theories. In Proceedings of IWIL proving. In Proceedings of International Worksop on Formal Methods Workshop and LPAR Short Presentations, pages 119–123, 2017. for Industrial Critical Systems, pages 85–102, 2008. [127] W. Hesselink and M. IJbema. Starvation-free with [100] B. V. Gastel, L. Lensink, S. Smetsers, and M. van Eekelen. Deadlock semaphores. Formal Aspects of Computing, 25:947–969, 2013. and starvation free reentrant readers–writers: A case study combining [128] W. Hesselink and M. I. Lali. Formalizing a hierarchical file system. model checking with theorem proving. Science of - Formal Aspects of Computing, 24:27–44, 2012. ming, 76(2):82–99, 2011. [129] T. Hillenbrand, A. Buch, R. Vogt, and B. Lochner. Waldmeister: High [101] T. Gauthier, C. Kaliszyk, and J. Urban. Tactictoe: Learning to reason performance equational deduction. Journal of Automated Reasoning, with HOL4 tactics. In Proceedings of 21st International Conference 18:265–270, 1997. on Logic for Programming, Artificial Intelligence and Reasoning, pages [130] K. Hoder and A. Voronkov. Sine qua non for large theory reasoning. In 125–143, 2017. Proceedings of 23rd International Conference on Automated Deduction, [102] H. Geuvers, R. Pollack, F. Wiedijk, and J. Zwanenburg. A construc- pages 299–314, 2011. tive algebraic hierarchy in Coq. Journal of Symbolic Computation, [131] S. Holub and R. Veroff. Formalizing a fragment of combinatorics on 34(4):271–286, 2002. words. In Proceedings of 13th International Conference on Computabil- [103] F. Gilbert. Proof certificates in PVS. In Proceedings of International ity in Europe, pages 24–31, 2017. Conference on Interactive Theorem Proving, pages 262–268, 2017. [132]J.H olzl.¨ Markov chains and markov decision processes in Is- [104] S. Goel. The x86isa books: Features, usage, and future plans. In abelle/HOL. Journal of Automated Reasoning, 59(3):345–387, 2017. Proceedings of 14th International Workshop on the ACL2 Theorem [133] W. Hong, M. S. Nawaz, X. Zhang, Y. Li, and M. Sun. Using Coq for Prover and its Applications, pages 1–17, 2017. formal modeling and verification of timed connectors. In Proceedings [105] Z. Goertzel, J. Jakubuv, S. Schulz, and J. Urban. Proofwatch: Watchlist of Software Engineering and Formal Methods: SEFM 2017 Collocated guidance for large theories in E. In Proceedings of 9th International Workshops, Revised Selected Papers, pages 558–573. Conference on Interactive Theorem Proving, pages 270–288, 2018. [134] D. J. Howe. Semantic foundations for embedding HOL in Nuprl. In [106] G. Gonthier. Formal proof of the Four-Color theorem. Notices of the Proceedings of International Conference on Algebraic Methodology and American Mathematical Society, 55:182–1393, 2008. Software Technology, pages 85–101, 1996. [107] G. Gonthier, A. Asperti, J. Avigad, Y. Bertot, C. Cohen, F. Garillot, [135] D. J. Howe. Reasoning about functional programs in Nuprl. In S. L. Roux, A. Mahboubi, R. OConnor, S. O. Biha, I. Pasca, L. Rideau, Proceedings of Functional Programming, Concurrency, Simulation and A. Solovyev, E. Tassi, and L. Thery.` A machine-checked proof of the Automated Reasoning, pages 145–164, 2005. odd order theorem. In Proceedings of 4th International Conference on [136] S. Huang and Y. Chen. Proving theorems by using evolutionary search Interactive Theorem Proving, pages 163–179, 2013. with human involvement. In Proceedings of Congress on Evolutionary [108] M. J. C. Gordon, W. A. Hunt, M. Kaufmann, and J. Reynolds. An Computation, pages 1495–1502, 2017. embedding of the ACL2 logic in HOL. In Proceedings of the 6th Inter- [137] W. A. Hunt, M. Kaufmann, J. S. Moore, and A. Slobodova. Industrial national Workshop on the ACL2 Theorem Prover and its Applications, hardware and software verification with ACL2. Philosophical Transac- pages 40–46. ACM, 2006. tions A, 375:20150399, 2017. [109] M. J. C. Gordon, J. Reynolds, W. A. Hunt, and M. Kaufmann. An [138] M. Iancu, M. Kohlhase, F. Rabe, and J. Urban. The Mizar mathematical integration of HOL and ACL2. In Proceedings of the 6th International library in OMDoc: Translation and applications. Journal of Automated Conference on Formal Methods in Computer-Aided Design, pages 153– Reasoning, 50(2):191–202, 2013. 160, 2006. [139] L. Jakubiec, S. Coupet-Grimal, and P. Curzon. A comparison of the Coq Short Presentations [110] A. Grabowski, A. Kornilowicz, and A. Naumowicz. Mizar in a nutshell. and HOL proof systems for specifying hardware. at 10th International Conference on Theorem Proving in Higher Order Journal of Formalized Reasoning, 3(2):153–245, 2010. Logics, 63:78, 1997. [111] S. Grewe, S. Erdweg, and M. Mezini. Using vampire in soundness [140] J. Jakubuv and J. Urban. Extending E prover with similarity based proofs of type systems. Technical report, 2016. clause selection strategies. In Proceedings of 9th International Confer- [112] D. Griffioen and M. Huisman. A comparison of PVS and Isabelle/HOL. ence on Intelligent Computer Mathematics, pages 151–156, 2016. In Proccedings of 11th International Conference on Theorem Proving [141] P. Janicic. Automated reasoning: Some successes and new challenges. in Higher Order Logics, pages 123–142, 1998. In Proceedings of the 22nd Central European Conference on Informa- [113] T. C. Hales. Mathematics in the age of the turing machine. Turing’s tion and Intelligent Systems, 2011. legacy. Technical report, University of Pittsburgh, 2013. [142] S. Jaskowski. On the rules of supposition in formal logic. Studia Logica, [114] M. R. Harbach. Methods and tools for the formal verification of 1, 1934. software. Master’s thesis, Technical University Wien, 2011. [143] A. Jeffrey. Dependently typed web client applications. In Proceed- [115] J. Harrison. A Mizar mode for HOL. In Proceedings of 9th International ings of International Symposium on Practical Aspects of Declarative Conference on Theorem Proving in Higher Order Logics, pages 203– Languages, pages 228–243, 2013. 220, 1996. [144] S. J. C. Joosten, C. Kaliszyk, and J. Urban. Initial experiments [116] J. Harrison. Floating-point verification. Journal of Universal Computer with TPTP-style automated theorem provers on ACL2 problems. In Science, 13(5):629–638, 2007. Proceedings of 12th International Workshop on the ACL2 Theorem [117] J. Harrison. A short survey of automated reasoning. In Proceedings of Prover and its Applications, pages 77–85, 2014. International Conference on Algebraic Biology, pages 334–349, 2007. [145] J. Julliand, B. Legeard, T. Machicoane, B. Parreaux, and B. Tatibouet.` [118] J. Harrison. Formal and practice. Notices of the American Specification of an card protocol application using the Mathematical Society, 55(11):1395–1406, 2008. B method and linear . In Proceedings of International [119] J. Harrison. Handbook of Practical Logic and Automated Reasoning. Conference of B Users, pages 273–292, 1998. Cambridge University Press, 2009. [146] J. Kaiser, B. Pientka, and G. Smolka. Relating system F and Lambda2: [120] J. Harrison. Hol Light: An overview. In Proceedings of 22st Interna- A case study in Coq, Abella and Beluga. In Proceedings of 2nd tional Conference on Theorem Proving in Higher Order Logics, pages International Conference on Formal Structures for Computation and 60–66. Springer, 2009. Deduction, pages 21:1–21:19, 2017. [121] J. Harrison, J. Urban, and F. Wiedijk. History of interactive theorem [147] C. Kaliszyk, F. Chollet, and C. Szegedy. HolStep : A machine learning proving. In Computational Logic, volume 9, pages 135–214, 2014. dataset for higher order logic theorem proving. In Proceedings of 5th [122] O. Hasan and S. Tahar. Performance analysis and functional verification International Conference on Learning Representations, 2017. of the stop-and-wait protocol in HOL. Journal of Automated Reasoning, [148] C. Kaliszyk and K. Pak. Progress in the independent certification 42(1):1–33, 2009. of Mizar mathematical library in Isabelle. In Proceedings of 12th 21

Federated Conference on Computer Science and Information Systems, [173] L. Kovacs´ and A. Voronkov. First-order theorem proving and Vampire. pages 227–236, 2017. In Proceedings of 25th International Conference on Computer Aided [149] C. Kaliszyk, K. Pak, and J. Urban. Towards a Mizar environment for Verification, pages 1–35, 2013. Isabelle: Foundations and language. In Proceedings of the Conference [174] A. Krauss and A. Schropp. A mechanized translation from higher- on Certified Programs and Proofs, pages 58–65, 2016. order logic to set theory. In Proceedings of International Conference on [150] C. Kaliszyk and F. Rabe. Towards knowledge management for HOL Interactive Theorem Proving, pages 323–338, 2010. Light. In Proceedings of Internaional Conference on Intelligent Com- [175] C. Kreitz. Building reliable, high-performance networks with the Nuprl puter Mathematics, pages 357–372, 2014. proof development system. Journal of Functional Programming, 14:21– [151] C. Kaliszyk and J. Urban. Hol(y)hammer: Online ATP service for HOL 68, 2004. Light. Mathematics in Computer Science, 9(1):5–22, 2015. [176] T. Kropf. Introduction to formal hardware verification. Springer, 2013. [152] C. Kaliszyk and J. Urban. Hol(y)hammer: Online ATP service for HOL [177] T. Lecomte, D. Deharbe, E. Prun, and E. Mottin. Applying a formal Light. Mathematics in Computer Science, 9(1):5–22, 2015. method in industry: A 25-year trajectory. In Proceedings of 20th [153] C. Kaliszyk and J. Urban. MizAR 40 for Mizar 40. Journal of Brazilian Symposium on Formal Methods, pages 70–87, 2017. Automated Reasoning, 55(3):245–256, 2015. [178] T. Lecomte, T. Servat, and G. Pouzancre. Formal methods in safety- [154] C. Kaliszyk, J. Urban, H. Michalewski, and M. Olsak.´ Reinforcement critical railway systems. In 10th Brasilian Symposium on Formal learning of theorem proving. In Proceedings of Annual Conference on Methods, 2007. Neural Information Processing Systems, pages 8836–8847, 2018. [179] D. Lee, K. Crary, and R. Harper. Towards a mechanized metatheory of [155] K. Kanso. Agda as a platform for the development of verified railway standard ML. In Proceedings of 34th Symposium on the Principles of interlocking systems. PhD thesis, 2012. Programming Languages, pages 173–184, 2007. [156] M. Kaufmann and J. S. Moore. An industrial strength theorem prover [180] X. Leroy. A formally verified compiler back-end. Journal of Automated for a logic based on Common Lisp. IEEE Transactions on Software Reasoning, 43:363–446, 2009. Engineering, 23(4):203–213, 1997. [181] M. Lesser. Using Nuprl for the verification and synthesis of hardware. [157] M. Kaufmann and J. S. Moore. ACL2 and its applications to digital Philosophical Transactions of the Royal Society of London A: Mathe- system verification. In Proceedings of the International Conference on matical, Physical and Engineering Sciences, 339(1652):49–68, 1992. Design and Verification of Microprocessor Systems for High-Assurance [182] P. Letouzey. Extraction in Coq: An overview. In Proceedings of 4th Applications, 2010. Conference on Computability in Logic and Theory of Algorithms, pages [158] C. Keller and B. Werner. Importing HOL Light into Coq. In Proceedings 359–369, 2008. of International Conference on Interactive Theorem Proving, pages [183] R. Letz, J. Schuman, S. Bayerl, and W. Bibel. SETHEO: A 307–322, 2010. high-performance theorem prover. Journal of Automated Reasoning, [159] M. Kerber, C. Lange, and C. Rowat. Formal representation and 8(2):183–212, 1992. proof for cooperative games. In Symposium on Mathematical Practice [184] N. G. Levenson. System safety and computers. Addison Wesley, 1995. and Cognition II. Society for the Study of Artificial Intelligence and [185] Y. Li and M. Sun. Modeling and verification of component connectors Simulation of Behaviour, pages 15–18, 2012. in Coq. Science in Computer Progamming, 113:285–301, 2015. [160] M. Kerber, C. Lange, and C. Rowat. An introduction to mechanized [186] F. Lindblad and M. Benke. A tool for automated theorem proving in reasoning. Journal of Mathematical Economics, 66:26–39, 2016. Agda. In Proceedings of International Workshop on Types for Proofs [161] T. Kim, D. Stringer-Calvert, and S. Cha. Formal verification of func- and Programs, pages 154–169, 2004. tional properties of an SCR-style specification [187] N. Macedo and A. Cunha. Automatic unbounded verification of Alloy using PVS. In Proceedings of 8th International Conference on Tools specifications with Prover9. CoRR, abs/1209.5773, 2012. and Algorithms for the Construction and Analysis of Systems, pages [188] D. Mackenzie. The automation of proof: A historical and sociological 205–220, 2002. exploration. IEEE Annals of the , 17(3):7–29, [162] B. Kitchenham, H. Al-Khilidar, M. A. Babar, M. Berry, K. Cox, 1995. J. Keung, F. Kurniawati, M. Staples, H. Zhang, and L. Zhu. Evalu- [189] J. Mackie and I. Sommerville. Failures of healthcare systems. In ating guidelines for reporting empirical software engineering studies. Proceedings of the 1st IRC Workshop, pages 1–8, 2000. Emperical Software Engineering, 13(1):97–121, 2008. [190] M. Maggesi. A formalization of metric spaces in HOL light. Journal of [163] G. Klein, K. Elphinstone, G. Heiser, J. Andronick, D. Cock, P. Derrin, Automated Reasoning, 60(2):237–254, 2018. D. Elkaduwe, K. Engelhardt, R. Kolanski, M. Norrish, T. Sewell, [191] F. Maric.´ A survey of interactive theorem proving. Zb. Rad, 18:173– H. Tuch, and S. Winwood. seL4: Formal verification of an OS kernel. In 223, 2015. Proceedins of 22nd Symposium on Operating System Principles, pages [192] P. Masci, P. Curzon, D. Furniss, and A. Blandford. Using PVS to 200–220, 2009. support the analysis of distributed cognition systems. Innovations in [164] G. Klein and T. Nipkow. Verified lightweight bytecode verification. Systems and Software Engineering, 11:113–130, 2015. Concurrency and Computataion: Practice and Experience, 13:1133– [193] P. Masci, Y. Zhang, P. Jones, P. Curzon, and H. Thimbleby. Formal ver- 1151, 2001. ification of medical device user interfaces using PVS. In Proceedings of [165] G. Klein and T. Nipkow. Applications of interactive proof to data flow the International Conference on Fundamental Approaches to Software analysis and security. In Software Systems Safety, pages 77–134, 2014. Engineering, pages 200–214, 2014. [166] M. Kohlhase. OMDoc - An Open Markup Format for Mathematical [194] R. Matuszewski and P. Rudnicki. Mizar: The first 30 years. Mechanized Documents [version 1.2], volume 4180 of LNCS. Springer, 2006. Mathematics and Its Applications, 4:3–24, 2005. [167] M. Kohlhase, D. Mller, S. Owre, and F. Rabe. Making PVS accessible to [195] M. Mayero. The three gap theorem (steinhauss conjecture). Technical generic services by interpretation in a universal format. In Proceedings report, INRIA, France, 2006. of 8th International Conference on Interactive Theorem Proving, pages [196] W. McCune. Prover9 and Mace4. Available at: 319–335, 2017. cs.unm.edu/˜mccune/prover9, 2005-2010. [168] W. Kokke. Formalising type-logical grammars in Agda. In Proceedings [197] W. W. McCune. Otter 3.0 reference manual and guide. Technical report, of 1st Workshop on Type Theory and Lexical Semantics, 2015. ANL-94/6, Argonne National Laboratory, 1994. [169] A. Kornilowicz, A. Kryvolap, M. Nikitchenko, and I. Ivanov. Formal- [198] N. Megill. Metamath: A Computer Language for . ization of the nominative algorithmic algebra in Mizar. In Proceedins of Lulu Press, USA, 2007. 38th Iternational Conference on Information Systems Architecture and [199] J. S. Moore. A mechanical analysis of program verification strategies. Technology, pages 176–186, 2017. Formal Methods in System Design, 14:213–228, 1999. [170] K. Korovin. iProver - an instantiation-based theorem prover for first- [200] J. S. Moore. Proving theorems about Java-Like byte code. In Correct order logic. In Proceedings of 4th International Joint Conference on System Design, 2000. Automated Reasoning, pages 292–298, 2008. [201] G. J. Myers, T. Badgett, and C. Snadler. The art of , [171] K. Korovin and A. Voronkov. Solving systems of linear inequalities by Third Edition. John Wiley & Sons Publishers, 2011. bound propagation. In Proceedings of 23rd International Conference [202] P. Naumov, M.-O. Stehr, and J. Meseguer. The HOL/NuPRL proof on Automated Deduction, pages 269–383, 2011. translator. Proceedings of 14th International Conference on Theorem [172] L. Kovacs and A. Voronkov. Finding loop invariants for programs over Proving in Higher Order Logic, 2152:329–345, 2001. arrays using a theorem prover. In Proceedings of the International Con- [203] A. Naumowicz. SAT-enhanced Mizar proof checking. In Proceedings ference on Fundamental Approaches to Software Engineering, pages of 7th International Conference on Intelligent Computer Mathematics, 470–485, 2009. pages 449–452, 2014. 22

[204] M. S. Nawaz, M. I. Lali, and S. Meng. Formal modeling, analysis [230] R. Rand, J. Paykin, and S. Zdancewic. QWIRE practice: Formal verifi- and verification of Black White Bakery algorithm. In 9th International cation of quantum circuits in Coq. In Proceedings of 14th International Conference on Intelligent Human-Machine Systems and Cybernatics, Conference on Quantum Physics and Logic, pages 119–132, 2017. pages 407–410. IEEE, 2017. [231] S. Ranise and D. Deharbe.` Applying light-weight theorem proving [205] M. S. Nawaz and M. Sun. A formal design model for genetic algorithms to debugging and verifying pointer programs. In Proceedings of 4th operators and its encoding in PVS. In Proceedings of 2nd International International Workshop on First-Order Theorem Proving, pages 109– Conference on Big Data and Internet of Things, pages 2186–190, 2018. 119, 2009. [206] M. S. Nawaz and M. Sun. Reo2PVS: Formal specification and verifica- [232] A. Rashid, O. Hasan, U. Siddique, and S. Tahar. Formal reasoning about tion of component connectors. In Proceedings of 30th International systems biology using theorem proving. PLoS ONE, 12(7):e0180179, Conference on Software Engineering and Knowledge Engineering, 2017. pages 391–396, 2018. [233] G. Reger, M. Suda, and A. Voronkov. Playing with AVATAR. In [207] M. S. Nawaz and M. Sun. Using PVS for modeling and verifying cloud Proceedings of the 25th Internal Conference on Automated Deduction, services and their composition. In Proceedings of 6th International pages 399–415, 2015. Conference on Advanced Cloud and Big Data, pages 41–46, 2018. [234] A. Reynolds, C. Tinelli, and C. Barrett. Constraint solving for finite [208] M. S. Nawaz and M. Sun. Using PVS for modeling and verification model finding in SMT solvers. Theory and Practice of Logic Program- of probabilistic connectors. In Proceedings of International Conference ming, 17(4):516–558, 2017. on Fundamentals of Software Engineering, pages 61–76, 2019. [235] A. Riazanov and A. Voronkov. Vampire 1.1 (system description). [209] M. S. Nawaz, M. Sun, and P. Fournier-Viger. Proof guidance in In Proceedings of 1st International Joint Conference on Automated PVS with sequential pattern mining. In Proceedings of International Reasoning, pages 376–380, 2001. Conference on Fundamentals of Software Engineering, pages 45–60, [236] D. Rouhling. A formal proof in Coq of a control function for the 2019. inverted pendulum. In Proceedings of the 7th International Conference [210] T. Nipkow, L. C. Paulson, and M. Wenzel. Isabelle/HOL - A proof on Certified Programs and Proofs, pages 28–41, 2018. assistant for higher-order logic, volume 2283 of LNCS. Springer, 2002. [237] J. M. Rushby. Tutorial: Automated formal methods with PVS, SAL, [211] U. Norell. Dependently typed programming in Agda. In Proceedings and Yices. In 4th International Conference on Software Engineering of the 4th International Workshop on Types in Language Design and and Formal Methods, page 262, 2006. Implementation, pages 1–2. ACM, 2009. [238] K. E. Sabri. Automated verification of role-based access control policies [212] M. Norrish. Formalising C in HOL. PhD thesis, Computer Laboratory, constraints using Prover9. CoRR, abs/1503.07645, 2015. University of Cambridge, 1998. [239] K. E. Sabri and R. Khedri.´ A generic algebraic model for the analysis [213] S. Obua and S. Skalberg. Importing HOL into Isabelle/HOL. In Pro- of cryptographic-key assignment schemes. In Proceedings of 5th ceedings of 3rd Inernational Joint Conference on Automated Reasoning, International Symposium on Foundations and Practice of Security, pages 298–302, 2006. pages 62–77, 2012. [214] H. Okazaki and Y. Futa. Formalization of polynomially bounded and [240] S. Schulz. E-A brainiac theorem prover. Ai Communications, 15(2, negligible functions using the computer-aided proof-checking system 3):111–126, 2002. Mizar. In Proceedings of 9th Conference on Intelligent Computer [241] S. Schulz, S. Cruanes, and P. Vukmirovic. Faster, higher, stronger: E Mathematics, pages 117–131, 2016. 2.3. In Proceedings of 27th International Conference on Automated [215] S. Owre, J. M. Rushby, N. Shankar, and M. K. Srivas. A tutorial on us- Deduction, pages 495–507, 2019. ing PVS for hardware verification. In Proceedings of 2nd International Conference on Theorem Provers in Circuit Design - Theory, Practice [242] C. Schurmann and M. Stehr. An executable formalization of the and Experience, pages 258–279, 1994. HOL/Nuprl connection in the meta-logical framework Twelf. In Proceedings of International Conference on Logic for Programming [216] R. Padmanabhan and R. Veroff. A geometric procedure with Prover9. In Artificial Intelligence and Reasoning, pages 150–166, 2006. Automated Reasoning and Mathematics - Essays in Memory of William W. McCune, pages 139–150, 2013. [243] N. Shankar. Verification of real-time systems using PVS. In Proceedings of the International Conference on Computer Aided Verification, pages [217] D. Page. Report of the inquiry into the london ambulance service, 1993. 280–291, 1993. [218] H. M. Palombo, H. Zheng, and J. Ligatti. POSTER: Towards precise and automated verification of security protocols in Coq. In Proceedings [244] S. Shiraz and O. Hasan. A library for combinational circuit verification IEEE Transaction on CAD of Integrated of the 2017 Conference on Computer and Communications Security, using the HOL theorem prover. Circuits and Systems pages 2567–2569, 2017. , 37(2):512–516, 2018. [219] P. Papapanagiotou and J. D. Fleuriot. The Boyer-Moore [245] J. Siekmann and G. Wrightson. Automation of Reasoning: 2: Classical revisited. CoRR, abs/1808.03810, 2018. Papers on Computational Logic 1967–1970. Springer, 2012. [220] C. Paulin-Mohring. Circuits as streams in Coq: Verification of a [246] K. Singh and B. Auernheimer. Formal specification of Multi-Window sequential multiplier. In Proceedings of the International Workshop on user interface in PVS. In Proceedings of International Conference on Types for Proofs and Programs, slected papers, pages 216–230, 2005. Human-Computer Interaction, pages 144–149, 2016. [221] L. C. Paulson. Logic and Computation: Interactive Proof with Cam- [247] K. Slind and M. Norrish. A brief overview of HOL4. In Proceedings bridge LCF. Cambridge University Press, 1987. of 21st International Conference on Theorem Proving in Higher Order [222] L. C. Paulson. Isabelle: A generic theorem prover. Springer, 1994. Logics, pages 28–32, 2008. [223] L. C. Paulson and J. Blanchette. Three years of experience with [248] M. H. Sorensen and P. Urzyczyn. Lectures on the Curry-Howard Sledgehammer, a practical link between automated and interactive Isomorphism. Elsevier, 2006. theorem provers. In Invited talk at 8th International Workshop on the [249] M. K. Srivas and S. P. Miller. Applying formal verification to the Implementation of Logics, 2010. AAMP5 microprocessor: A case study in the industrial use of formal [224] Y. Peng and M. R. Greenstreet. Extending ACL2 with SMT solvers. methods. Formal Methods in System Design, 8(2):153–188, 1996. In Proceedings of 13th International Workshop on the ACL2 Theorem [250] J. Sterling, D. Gratzer, V. Rahli, D. Morrison, E. Akentyev, and Prover and Its Applications, pages 61–77, 2015. A. Tosun. RedPRL–The people’s refinement logic, available at: [225] F. Pfenning. Structural cut elimination. In Proceedings of 10th Annual http://www.redprl.org/, 2016. Symposium on Logic in Computer Science, pages 156–166, 1995. [251] G. Sutcliffe. The CADE ATP system competition - CASC. AI [226] F. Pfenning and C. Schurmann.¨ System description: Twelf-a meta- Magazine, 37(2):99–101, 2016. logical framework for deductive systems. In Proceedings of Interna- [252] G. Sutcliffe. The TPTP problem library and associated infrastructure tional Conference on Automated Deduction, pages 202–206, 1999. - From CNF to TH0, TPTP v6.4.0. Journal of Automated Reasoning, [227] D. L. Rager. Adding parallelism capabilities to ACL2. In Proceedings 59(4):483–502, 2017. of the 6th International Workshop on the ACL2 Theorem Prover and its [253] G. Sutcliffe and Y. Puzis. SRASS - A semantic relevance axiom applications, pages 90–94. ACM, 2006. selection system. In Proceedings of 21st International Conference on [228] V. Rahli and M. Bickford. Coq as a metatheory for Nuprl with bar Automated Deduction, pages 295–310, 2007. induction. Technical report, Cornell University, 2015. [254] G. Sutcliffe and C. Suttner. Evaluating general purpose automated [229] D. Ramachandran, P. Reagan, and K. Goolsbery. First-orderized re- theorem proving systems. Artificial Intelligence, 131(1):39–54, 2001. searchcyc : Expressivity and efficiency in a common-sense . In [255] S. Swords and J. Davis. Bit-blasting ACL2 theorems. In Proceedings Proceedings of AAAI Workshop on Contexts and Ontologies: Theory, of 10th International Workshop on the ACL2 Theorem Prover and its Practice and Applications, 2005. Applications, pages 84–102, 2011. 23

[256] A. Tanaka, R. Affeldt, and J. Garrigue. Safe low-level code genera- tion in Coq using monomorphization and monadification. Journal of Informaion Processing, 26:54–72, 2018. [257] S. H. Taqdees and O. Hasan. Formally verifying transfer functions of linear analog circuits. IEEE Design & Test, 34(5):30–37, 2017. [258] C. Tian. A formalization of the process algebra CCS in HOL4. CoRR, abs/1705.07313, 2017. [259] C. Tian. Formalized Lambek calculus in higher order logic (HOL4). CoRR, abs/1705.07318, 2017. [260] J. Urban. Translating Mizar for first order theorem provers. In Pro-

ceedings of 2nd International Conference on Mathematical Knowledge ) Management, pages 203–215, 2003. [261] J. Urban. MaLARea: A metasystem for automated reasoning in large theories. In Proceedings of 21th Workshop on Empirically Successful Automated Reasoning in Large Theories, 2007. [262] J. Urban, K. Hoder, and A. Voronkov. Evaluation of automated theorem proving on the Mizar mathematical library. In Proceedings of International Congress on Mathematical Software, pages 155–166, 2010. [263] J. Urban, G. Sutcliffe, P. Pudlak,´ and J. Vyskocil. MaLARea SG1- Algebraic specification & proof system machine learner for automated reasoning with semantic guidance. In Proceedings of 4th International Conference on Automated Reasoning, Zenon R. Bonichon, D. Delahaye &2007 D. Doligez Independent OCaml Functional, Imperative, OO 0.8.2 BSD and MIT CLI Cross Standard library No No Yes No ATP ( FOL (with polymorphic & equality) Binary B-Method set theory Deduction modulo (Tableau method) No Used in focalented environment, algebra Object specification ori- TPTP Category SetSEU (110 (227 out of out 900) Produce low of level proof 462) directly for Coq pages 441–456, 2008. [264] J. Urban and R. Veroff. Experiments with state-of-the-art automated provers on problems in tarskian geometry. In Procedings of 11th International Workshop on the Implementation of Logics, pages 122– 126, 2015. [265] J. Urban and J. Vyskocil. Theorem proving in large formal mathematics as an emerging AI field. In Automated Reasoning and Mathematics: Essays in Memory of William McCune, pages 240–257, 2013. [266] J. Vitt and J. Hooman. Assertional specification and verification using PVS of the steam boiler control system. In Proceedings of the International Conference Formal Methods for Industrial Applications, pages 453–472, 1995. [267] D. von Oheimb. Hoare logic for Java in Isabelle/HOL. Concurrency Analytica E. Clarke & X. Zhao 1990 Carnegie Mellon University Mathematica Procedural, Functional, OO Analytica 2 Free BSD GUI Cross Standard library No Yes Yes Tex files ATP FOL Binary No Deductive Yes Symbolic computation system Policy analysis Translated to OMDoc framework and Computation: Practive and Experience, 13(13):1173–1214, 2001. [268] H. Wang. Computer theorem proving and artificial intelligence. Com- putational Logic, pages 63–75, 1990. [269] T. Weber. SMT solvers: New oracles for the HOL theorem prover. Inter- national Journal on Software Tools for Technology Transfer, 13(5):419– 429, 2011. [270] W. F. Wenzel, M. A comparison of Mizar and Isar. Journal of Automated Reasoning, 29(3–4):389–411, 2002. [271] D. Weyns, M. U. Iftikhar, D. G. de la Iglesia, and T. Ahmad. A survey of formal methods in self-adaptive systems. In Proceedings of 5th International Conference on Computer Science & Software Engineering, pages 67–79, 2012. [272] N. White, S. Matthews, and R. Chapman. Formal verification: Will ICS Jean-Christophe Fillitre 2002 SRI International, USA Ocaml Functional, Imperative, OO Yices Open source CLI Linux, Solaris, MAC No No Yes Yes No Decision procedure FOL with equality & quantifier free Binary No Deductive No Embedded in application to provide deductive services NASA, part of PVS API for proof searchsimulation and symbolic the seedling ever flower? Philosophical Transactions A, 375:20150402, 2017. [273] F. Wiedijk. Mizar Light for HOL Light. In Proceedings of International Conference on Theorem Proving in Higher Order Logic, pages 378–393, 2001. [274] F. Wiedijk. The seventeen provers of the world: Foreword by Dana S. Scott, volume 3600. Springer, 2006. [275] J. Woodcock, P. G. Larsen, J. Bicarregui, and J. S. Fitzgerald. Formal methods: Practice and experience. ACM Computing Surveys, 41(4):1– 36, 2009. [276] L. A. Yang, J. P. Liu, C. H. Chen, and Y. ping Chen. Automatically proving mathematical theorems with evolutionary algorithms and proof

assistants. In Proceedings of Congress on Evolutionary Computation, HR Simon Colton 2002 University of Edinburgh Java Functional, Concurrent HR 2.0 Open source CLI Linux, Windows API No Yes Yes No Theorem generator FOL Binary No Inductive No Produce large number offor theorems testing ATP Systems Zariski Specification Machine Learning pages 4421–4428, 2016. [277] A. Yushkovskiy. Comparison of two theorem provers: Isabelle/HOL and Coq. In Proceedings of the Seminar in Computer Science (CS-E4000), pages 1–17, 2017. [278] B. Zhan and M. P. L. Haslbeck. Verifying asymptotic time complexity of imperative programs in Isabelle. In Proceedings of 9th International

Joint Conference on Automated Reasoning, pages 532–548, 2018. Name Contributor 1st Rel Ind/Uni/Ind CLang Prog.P LV LT UI OS Lib CG Ed Ext I/O TType CLogic TV ST Calculus ProofKernel App. Areas Eval Unique Features

Implementation Others Logico-Math [279] X. Zhang, W. Hong, Y. Li, and M. Sun. Reasoning about connectors General using Coq and Z3. Science of , 170:27–44, 2019.

APPENDIX System details of 27 theorem provers. 24 LEO-II C. Benzmuller, F. Theiss, N.2012 Sultana Freie Universitybridge Berlin University andOCaml Cam- Functional, Imperative, OO 1.7 2015 BSD CLI Unix, Windows Yes No Yes Yes TPTP THF language ATP+ITP HOL Binary No Resolution by unification andity equal- (RUE) No Cooperation with first-order ATP CASC-J5 winner in THF division Cooperative Proof Search iProver Konstantin Korovin 2008 University of Manchester OCaml Functional, Imperative, OO V0.99 GNU CLI Linux No No Yes Yes No ATP FOL Binary No Instantiation calculus No Hardware verificationmodeling andCASC-26 EPR division finite winner Modular combination of proposition and instantiational reasoning ¨ orn Pelzer leanCoP Jens Otten 2003 University of Oslo Prolog Logic programming 2.1 GNU general CLI Windows, Unix, Linux, Mac No No Yes Yes LeanCoP or TPTP syntax ATP First-order intuitionistic Binary No Connection/tableau calculus No Formalization Third in FOF division atProgram CASC-22 can bespecific easily task modified or application for compact due code to its KRHyper/E-KRHyper/Hyper Bj 2007 Koblenz University OCaml Functional, Imperative, OO 1.4 GNU GPL CLI Windows and Unix No No Yes Yes TPTP supported protein format ATP and model generator FOL with equality Binary No Hyper tableau calculus No Embedded in knowledge representa- tion systems Solved 74% of the subestUsed of for TPTP problems -calculus λ E-Darwin Baumgartner Otten 2005 Koblenz University OCaml Functional, imperative, OO 1.5 GNU General CLI Unix, Windows No No Yes Yes Input TPTP or TME format ATP FOL clausal with equality Binary No Model evaluation No Encrypt and solve problems EPR winner at CASC-20 andBack-jumping J3 andtracking dynamic back- Watson M. Randall Holmes 2006 Boise State University Standard ML Functional and Imperative 0.8.2 BSD and MIT GNU General CLI Linux No No No Yes No ITP HOL Binary Quine’s set theory Typed No Software verification, modeling, check- education & mathematics TPTP category Type free HOL support -calculus λ JAPE Richard Bornat 1996 Queen Marry, University of London Java Object oriented (OO) v7-d15 GNU GPL GUI Linux, Mac No No Yes Yes No Proof assistant FOL Binary Yes Deductive & Sequent calculus No Used as a proofment assistant JAPE and theories imple- Teaching purpose tool Forward reasoning and logicing encod- Yarrow Jan Zwanenburg 1997 Eindhoven University Haskell Functional, Modular V1.20 Free BSD GUI & CLI Unix, Linux Fudget for graphical interface No Yes Yes Polymorphic ITP Constructive Binary No Typed Yes Representation environment for log- ics & programming languages. Formalized type theory Experiment context fortype testing system pure

Name Contributor 1st Rel Ind/Uni/Ind CLang Prog.P LV LT UI OS Lib CG Ed Ext I/O TType CLogic TV ST Calculus ProofKernel App. Areas Eval Unique Features

Name Contributor 1st Rel Ind/Uni/Ind CLang Prog.P LV LT UI OS Lib CG Ed Ext I/O TType CLogic TV ST Calculus ProofKernel App. Areas Eval Unique Features

General Implementation Logico-Math Others

Implementation Logico-Math Others General 25 Satallax Chad E. Brown 2010 Saarland University OCaml Functional, imperative, OO 3.0 BSD CLI Linux No No Yes Yes No ATP HOL Binary Church’s simple type theory Tableau calculus No Formalization CASC-26 THF division winner Semantic embedding and cut simula- tion DORIS Johan Bos 1998 University of Edinburgh Prolog Logic programming DORIS 2001 Free BSD CLI Cross Platform No No Yes Yes No TP+ Semantic analyzer FOL Binary No Lambda calculus No Computational semantics, cover var- ious linguistic phenomenas Study behavior of ROB’s algorithm Translate English text intorepresentation discourse structure ¨ org Denzinger Princess Philipp Rummer 2008 Uppsala university Scala OO, functional, concurrent V2.1 LGPL CLI Linux No No Yes Yes Native language SMT Theorem prover FOL Binary Modular linear integer arithmetic Free variable tableau,Constrained se- quent calculus No Software verificationchecking andSolved problems model from QF-LIAgory cate- of SMT Library Solved quantified modulo linear inte- ger arithmetic DISCOUNT J 1997 University of Calgary C Procedural 2.0 Free BSD CLI Linux, Solaris Yes No Yes No E(Universal Implication) Distributed equational TP FOL Binary No Pure unit equality Yes Machine learning Entrance competition Discount/GL Based onknowledge based teamwork distribution method for Muscadet Dominique Pastre 2003 University Paris Descartes SWI-Prolog Logic programming 4.5 BSD CLI Unix, Linux No No Yes Yes No Knowledge base TP second-order logic Binary No Natural deduction No Topological linearautomata spaces, cellular Winner CASC-JC in IJCAR 2001 Efficient for the problems containing too many axioms and formulas Coral Alan Bundy,Monika Maidl Graham2006 Steel and University of Edinburgh Built on SPASS theorem prover Functional 2008 Free BSD GUI Cross Yes No Yes Yes No Inductive theorem prover FOL Binary No Tableau calculus No Cryptographicanalysis, security findsecurity protocols protocol attacksDiscover new attacks on on theand Taghidri Jackson faulty improved protocol Find counterexampleconjecture to inductive SPASS Christophe Weidenbach 1999 Max Planck InstituteScience for Computer Java Functional 3.9 Free BSD CLI+GUI Windows, MAC, Linux Yes No Yes Yes No ATP FOL Binary No Superposition calculus No Analysis of security protocols,sion colli- avoidance protocols Discover new attacks on theGinzboorg Asokan- protocol ATP with equality MaLARea Josef Urban 2007 Charles University Perl Functional, Imperative, OO 0.5 GPL2 CLI Linux No No Yes Yes No ATP HOL Binary No Deductive No Learning and reasoningproving in system large formal for libraries CASC-24 LTB division winner Utilize ATP as coretechniques. system with AI

Name Contributor 1st Rel Ind/Uni/Ind CLang Prog.P LV LT UI OS Lib CG Ed Ext I/O TType CLogic TV ST Calculus ProofKernel App. Areas Eval Unique Features Name Contributor 1st Rel Ind/Uni/Inde CLang Prog.P LV LT UI OS Lib CG Ed Ext I/O TType CLogic TV ST Calculus ProofKernel App. Areas Eval Unique Features

General Implementation Logico-Math Others General Implementation Logico-Math Others 26 Geometry Expert Xiao-Shan Gao 1998 Key Laboratory of China Java Functional MMP/Geometer GNU general public liscence GUI Cross No No Yes Yes No Automatic Geometric TP Dynamic logic models Binary No Euclidean and differential geometry No Teaching geometry,physics in China algebra at school Wu’s method implemented and Automated geometric diagramstruction con- Graffiti Epstein 1986 University of Houston C++ Procedural, OO, statically type,checking type GRAFFITI Free BSD GUI Uinx Yes No Yes Yes No Decision procedure Graph theory Binary Yes Deductive No Used in chemistry(graph and theory) mathematics for making conjecture Barbara Project Pedagogic tool Expander Peter Padawitz 2007 Dortmund university O’Haskell ( extention of Haskell) Concurrent programming Expander Free BSD GUI Cross Yes No Yes Yes No ATP FOL Binary Swinging types Narrow & Fixed point co-induction No Testing algebraic andtional func- logic program Super concentrators Interactive termtransformation, rewriting, severaltion of graph representa- formal expression -calculus Z/EVES Mark Saaltinkz 1997 University of Kent Common Lisp Meta reflective, OO, functional,cedural pro- 2.4.1 Free BSD GUI/CLI Windows, Unix, Linux, Mac Yes Yes Yes Yes Latex ATP+ITP FOL Binary ZF set theory, Axiomatic setλ theory No General theorem prover Specification Theorem prover,checker, syntax domain checker and type Goedel Johan Belinfante 2005 Georgia Institute of Technology Mathematica Functional, procedural 2014 Free BSD GUI Cross No No Yes Yes No ATP FOL Binary ZF set theory Natural deduction No Derive new theoremreasoning for automated 393 example of QAIF inReduce 91 number seconds of steps DTP Don Geddis 1995 Stanford University Common Lisp Meta reflective, OO, functional 3.0 Free BSD CLI Cross Yes Epilog No Yes Yes KIF (Knowledgemat) Interchange For- Modal elimination theorem prover FOL Binary Horn theories First-order predicate calculus No Black box inference engineous for machine vari- learning program Domain independent control of infer- ence Name Contributor 1st Rel Ind/Uni/Ind CLang Prog.P LV LT UI OS Lib CG Ed Ext I/O TType CLogic TV ST Calculus ProofKernel App. Areas Eval Unique Features

Getfol Fausto Giunchiglia 1994 University of Trento Common Lisp Meta reflective, OO, functional 2.001 Free BSD GUI Unix Yes Yes Yes Yes Proof-script ITP FOL Binary No Natural deduction No In various dataembedding structure real world Entrance in CADE-17 Meta-theory implementation

General Implementation Logico-Math Others

Name Contributor 1st Rel Ind/Uni/Ind CLang Prog.P LV LT UI OS Lib CG Ed Ext I/O TType CLogic TV ST Calculus ProofKernel App. Areas Eval Unique Features

Implementation Logico-Math Others General