Proof-Assistant-Based Verification of Programs

Valentin Robert
University of California, San Diego
2015/5/4

Abstract

Proof assistants are now widely used in programming languages research to reason formally about objects of increasing complexity, like advanced type systems, large programming languages, compilers, or mathematical constructs. This research examination report focuses on the use of proof assistants in reasoning about programs, and surveys the methods developed to tame the complexity of representing languages and their semantics, as well as the difficulties encountered in reasoning about programs in imperative and concurrent settings. We also examine concrete tools developed to perform this reasoning, and consider their trade-offs in expressivity and proof automation.

1. Introduction

Over the past decades, proof assistants have changed the social process of exchanging formal results about programming languages, and have opened the way, along with solver-based techniques, to tackling challenges much more realistic than what was usually done using pen and paper. Interactive theorem provers can be used to formalize existing bodies of mathematics, to develop new mathematical theories in a formal setting, to reason about type systems, or to implement verified programs. It is now common to see formal developments attached to publications in the field of programming languages, where the proofs used to be detailed informally, often in a combination of mathematical notations and natural language.

In this report, we focus on the use of proof assistants in the verification of programs, and highlight some of the main directions of research in this area and, for each of those, some of the most successful or promising techniques.

Section 2 focuses on the use of proof assistants to implement and verify functional programs. In section 3, we describe the meta-theoretic tools developed by the community to reason about programming languages. In section 4, we give an overview of several verified compilers for different flavors of languages. Section 5 focuses on separation logic, a logic designed to reason about imperative programs. In section 6, we describe the most recent developments in verifying low-level code. Finally, in section 7, we address a cross-cutting concern in all of these directions, the challenge of automating the proof effort.

2. Functional programs

Modern proof assistants are often built on top of a higher-order dependent type theory. By leveraging the Curry-Howard isomorphism, a mathematical correspondence between programs and proofs, types and propositions, they allow their user to build higher-order functional programs and to prove theorems using the exact same mechanisms, essentially removing the gap between the programming and the logical fragments.

In a system with dependent types, the return type of a function can refer to the value of its argument, and thus express properties about this value. This allows the user of these tools to write functions and verify strong specifications, by following one of two approaches.

The intrinsic approach uses dependent types as a programming tool which lets one write certified computations. One can use inductive type families to describe correct-by-construction data types with strong invariants, or use dependent records to package simpler data types with proof terms that characterize such invariants. A classic example of the former is the vector inductive family, indexed by its size:

    Inductive Vec (T : Type) : nat -> Type :=
    | Nil : Vec T O
    | Cons : forall {n}, T -> Vec T n -> Vec T (S n).

By construction, an element of the type Vec T n must contain exactly n elements. An example of the latter could be a type for a particular tree structure (balanced, red-black, etc.) represented as a dependent record containing a raw tree and a proof of the expected property:

    Record BST (T : Type) : Type := {
      t : BinaryTree T;
      _ : IsOrdered t
    }.

Consumers of this data type have access to both the underlying tree t, and a proof that t is a search tree. Using this proof, they can derive more facts about the ordering of the values in t if necessary, without requiring dynamic tests. Producers of this data type, on the other hand, have additional work, since they must not only build t, but also construct the proof term for the IsOrdered property. This construction can be done either in the form of programming, or, in some systems, it can be dispatched on the side, either fully automatically with solvers, or manually with tactics. In practice, this approach can become extremely verbose, because dependently-typed elimination can require complex user annotations for the return types and complex propagation patterns for the facts that are known to be true (like the convoy pattern). Systems like Agda mitigate this hurdle by using stronger axioms and performing some automated propagation of equalities when performing elimination on inductive type families.

On the other hand, the extrinsic approach separates the programming and reasoning phases. Extrinsic code is written in the higher-order polymorphic part of the calculus, and properties of the intended behavior are captured a posteriori, in theorems that make up the specification of the code. The benefit here is that the code is much easier to read and write, and clearly separated from the proofs. The inconvenience is that the code does not benefit from the static guarantees that the intrinsic style captures, and therefore often needs to deal with cases that cannot happen in actual executions.

Most proof assistants require intrinsic termination annotations even for extrinsic code, as non-terminating computations can make the logical fragment inconsistent. Casinghino et al. [7] propose a more flexible phase distinction that lets one write non-terminating functional programs in the programming fragment, and allows transportation to and from the logical fragment whenever it is safe to do so. To achieve this, they index their typing judgement by the fragment in which it holds. When one obtains a derivation in the logical fragment, they may use the term either as a proof or as a program (following the Curry-Howard isomorphism). On the other hand, a derivation in the programming fragment only allows running the program, not using it as a proof. As a result, they may allow more constructs in the programming fragment than the safe ones available in the logical one, but they let the languages overlap so that safe programs are mobile across fragments.

A problem somewhat related to that of termination analysis is that of dealing with infinite-size data and computation over it. A functional programmer may wish to describe infinite constructions, either by virtue of their circularity or by making explicit a generating function that could be expanded as much as needed. For instance, one may wish to describe circular graphs, or to define a possibly-infinite sequence of elements. These are meant either to be consumed only to a finite depth, using mechanisms to delay the evaluation, or to be processed ad infinitum by a recursive process. For the same consistency reasons, these patterns are not expressible safely within an inductive dependently-typed theory. Coinduction can be used to implement functions working over infinite data, as long as there is a guarantee that the function will produce an output within a finite number of execution steps. Furthermore, the system ensures this by an over-conservative, mostly syntactic check. This is good enough for most purposes, but some seemingly-reasonable programs cannot be written (for instance, a stream filter, since it could potentially never output anything).

[…]siderations are internalized (for instance, Twelf), but most don't. An opposite approach is that of de Bruijn indices, which encode bound variables as the distance to their abstraction location. This representation is immediately canonical, and avoids the need for substitutions altogether, but it cannot represent terms with free variables without further quotienting, and manipulation of such terms becomes cluttered with shifting binders.

An approach introduced by Coquand [13] and later reused by McKinna et al. [28][29] consists in syntactically separating bound and free occurrences of variables. This makes substitutions for free variables simpler since names cannot be captured. This idea is further refined by Leroy [27] and Charguéraud [8] with the locally nameless approach. In this setting, bound variables are nameless, that is, they are represented by their de Bruijn index, but free variables are named. This has two advantages: α-equivalence is syntactic equality, and free variables can have meaningful names.

In Engineering Formal Meta-theory [1], the authors address another hurdle of formal reasoning with names in the locally nameless approach. The issue is that the classic typing rule for abstraction:

$$\frac{x \notin \mathrm{FV}(t) \qquad E, x : S \vdash t^x : T}{E \vdash \mathrm{Lam}\ t : S \to T}\ \text{Exists-fresh}$$

gives rise to an arguably poor induction principle ($t^x$ stands for the body of the abstraction where de Bruijn variables referring to the opened Lam are replaced with FreeVar x). Upon introduction, this rule is easy to use, as we need only come up with a given x that is sufficiently fresh with respect to the body of the abstraction. The issue is that upon elimination, we again obtain a fixed variable name x, which is only guaranteed to be fresh with respect to the body of the abstraction. Often, we would rather pick an x which is fresh with respect to both the body of the abstraction and some other set of variable names, say S. This should be achievable because we can arbitrarily rename variables according to the Barendregt convention (that is, in an α-equivalent way), but this presentation forces the user to perform this substitution and prove its correctness. They propose this alternative rule:
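As a concrete illustration of the locally nameless representation itself (the data structure only, not the typing rules discussed above), the following is a minimal Coq sketch. The constructor names FreeVar and Lam follow the text; BoundVar, App, open_rec and open are names assumed here for illustration, and modeling variable names as strings is likewise an assumption rather than the representation used in the cited developments.

```coq
Require Import Coq.Strings.String.
Open Scope string_scope.

(* Locally nameless terms: bound variables are de Bruijn indices,
   free variables carry a name. *)
Inductive term : Type :=
| BoundVar : nat -> term
| FreeVar : string -> term
| App : term -> term -> term
| Lam : term -> term.

(* Replace the bound variable at depth k with the free variable x.
   The depth is incremented each time we go under a binder. *)
Fixpoint open_rec (k : nat) (x : string) (t : term) : term :=
  match t with
  | BoundVar i => if Nat.eqb i k then FreeVar x else BoundVar i
  | FreeVar y => FreeVar y
  | App t1 t2 => App (open_rec k x t1) (open_rec k x t2)
  | Lam body => Lam (open_rec (S k) x body)
  end.

(* Opening the body of an abstraction targets index 0. *)
Definition open (x : string) (t : term) : term := open_rec 0 x t.

(* Both occurrences below refer to the opened binder: index 0 at the
   top level, and index 1 under the inner Lam. *)
Example open_body :
  open "x" (App (BoundVar 0) (Lam (BoundVar 1)))
  = App (FreeVar "x") (Lam (FreeVar "x")).
Proof. reflexivity. Qed.
```

The example is checked by computation alone. Since bound variables carry no names, two α-equivalent terms are the same value of type term, which is the first advantage mentioned above.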
