
The Bedrock Structured Programming System: Combining Generative and Hoare Logic in an Extensible Program Verifier

Adam Chlipala MIT CSAIL [email protected]

Abstract

We report on the design and implementation of an extensible programming language and its intrinsic support for formal verification. Our language is targeted at low-level programming of infrastructure like operating systems and runtime systems. It is based on a cross-platform core combining characteristics of assembly languages and compiler intermediate languages. From this foundation, we take literally the saying that C is a "macro assembly language": we introduce an expressive notion of certified low-level macros, sufficient to build up the usual features of C and beyond as macros with no special support in the core. Furthermore, our macros have integrated support for strongest postcondition calculation and verification condition generation, so that we can provide a high-productivity formal verification environment within Coq for programs composed from any combination of macros. Our macro interface is expressive enough to support features that low-level programs usually only access through external tools with no formal guarantees, such as declarative parsing or SQL-inspired querying. The abstraction level of these macros only imposes a compile-time cost, via the execution of functional Coq programs that compute programs in our intermediate language; but the run-time cost is not substantially greater than for more conventional C code. We describe our experiences constructing a full C-like language stack using macros, with some experiments on the verifiability and performance of individual programs running on that stack.

Categories and Subject Descriptors  F.3.1 [Logics and meanings of programs]: Mechanical verification; D.3.4 [Programming Languages]

Keywords  generative metaprogramming; interactive proof assistants; low-level programming languages; functional programming

ICFP '13, September 25-27, 2013, Boston, MA, USA. ACM 978-1-4503-2326-0/13/09. http://dx.doi.org/10.1145/2500365.2500592

1. Introduction

A fundamental tension in programming language design is between performance and abstraction. Closer to the high-performance end of the spectrum, we have languages like C, which are often used to implement low-level infrastructure that many applications depend on. Examples include operating systems and runtime systems, which today are almost exclusively written in C and its close relatives.

Need we always choose between performance and abstraction? One approach to retaining both is metaprogramming, from simple cases of textual substitution as in the C preprocessor, to sophisticated code generators like Yacc. Here we employ abstractions that in a sense are compile-time only, running programs that generate programs in C or other low-level languages. As a result, we incur no run-time cost to abstraction.

Unfortunately, code generation is hard to get right. The C preprocessor's textual substitution mechanism allows for all sorts of scoping errors in macro definitions. There exist widely used alternatives, like the macro systems of Lisp/Scheme, OCaml (Camlp4), and Haskell (Template Haskell), which can enforce "hygiene" requirements on lexical scoping in macro definitions.

The functional programming community knows well the advantages of code generation with languages like Haskell and ML as metalanguages for embedded domain-specific languages (DSLs). That is, one represents the programs of some object language, like C or CUDA, as data within the metalanguage, making it possible to compute programs of the object language at compile time using all of the abstraction features of the metalanguage.

There is a tower of static assurance levels for this kind of generative metaprogramming. Tools like Yacc receive little static assurance from their implementation languages, as they output source code as strings that the implementation language does not analyze. Conventional macro systems, from primitive token substitution in C to hygienic Scheme macros, can provide stronger lexical guarantees. Next in the tower are languages that allow for static typing of macros, guaranteeing that any of their applications produce both scope-safe and type-safe object-language code. Languages in this category include MetaML [30] and MacroML [13], for homogeneous metaprogramming where the meta and object languages are the same; and MetaHaskell [22], for heterogeneous metaprogramming where meta and object languages are different (e.g., Haskell and CUDA). Our contribution in this paper is to continue higher up this tower of static guarantees: we support separate static checking of macros to guarantee that they always produce functionally correct object-language code.

The programming language and associated verification tools will need to provide a proof rule for each macro. A naive proof rule just expands macro applications using their definitions, providing no abstraction benefit. We could ask instead that macro definitions include proof rules, and that the verification system requires programmers to prove that their macro definitions respect the associated proof rules. In effect, we have a macro system supporting modularity in the style of well-designed functions, classes, or libraries, with clear separation between implementation and interface.

In this paper, we report on our experiences building such a macro system for a low-level language, intended as a replacement for C in the systems programming domain. This macro system is part of the Bedrock library for the Coq proof assistant, which provides an integrated environment for program implementation, specification, and verification. A prior publication [8] introduced Bedrock's support for proof automation, implementing example programs using an earlier version of our macro system but not describing it in detail. Since that paper was written, the macro system has evolved, and we have used it to implement more interesting examples. We now present its design and implementation in detail, along with the higher-level language design principle that motivates it.

Our metalanguage is Gallina, the functional programming language in Coq's logic. Our object language is a tiny compiler intermediate language. We build up the usual features of C via unprivileged macros, and then we go further and implement features like declarative parsing and declarative querying of data structures as macros rather than ad-hoc external tools like Yacc and MySQL.

To support productive formal verification, we tag each macro with a proof rule. To be more precise, each macro contains a predicate transformer, which evolves a logical assertion to reflect the effects of a program statement; and a verification condition generator, which outputs proof obligations associated with a use of the macro. These are some of the standard tools of highly automated program verification, and requiring them of all macros lets us build a generic verification environment that supports automated proofs about programs built with macros drawn from diverse sources. We call the central abstraction certified low-level macros.

Our implementations and proofs have much in common with compiler verification. For instance, CompCert [21] is a C compiler implemented in Coq with a proof that it preserves program semantics. Each of our macro implementations looks like one case of the CompCert compiler, dealing with a specific source language syntax construct, paired with the corresponding correctness proof cases. A specific program may draw upon several macros implemented by different programmers, who never considered how their macros might interact. The proof approach from CompCert does not extend straightforwardly to this setting, since each CompCert subproof refers to one or more simulation relations between the programs of different fixed languages. It is not apparent how such an approach could be extended to verify compilation rules for extensible languages. For instance, in that conventional setting, function pointers may be specified informally as "points to the code our compiler would generate from the corresponding source-level function body," an explanation that does not adapt naturally to the case of an extensible compiler. We would rather not verify handling of function pointers relative to a particular set of available macros.

The key selling point of compile-time metaprogramming is performance. Programmers will come up with different code constructs that mimic those available in higher-level programming languages, but implemented in a way that facilitates case-specific optimization and in general promotes performance. We are able to attach specifications to such macros in a way that makes reasoning about use of a macro more like reasoning about a function call than reasoning directly about low-level code after macro expansion.

At a high level, then, our contribution can be viewed alternatively as an extensible programming language or an extensible Hoare-logic program verifier. We present the first low-level language with a notion of macros that cleanly separates interface and implementation, to such an extent as to enable mostly automated correctness verification without macro expansion. We prove correctness theorems within a framework for modular verification, where we can verify libraries separately and then link together their correctness theorems into whole-program correctness theorems, without revisiting details of code implementation or proof. This modularity holds even in the presence of a mutable heap that may hold callable code pointers, thanks to the XCAP [26] program logic that sits at the base of our framework.

The remaining sections of this paper will walk through the steps of building our stack of macros. We begin with a tiny assembly-like language and proceed to introduce our notion of certified low-level macros for it, build macros for basic C-like constructs, add support for local variables and function calls, and culminate in the high-level macros for declarative parsing and querying. Next we evaluate the effectiveness of our platform in terms of program verifiability and run-time performance, and we finish with a comparison to related work.

Source code to both the Bedrock library and our example programs is available in the latest Bedrock source distribution at:

  http://plv.csail.mit.edu/bedrock/

2. The Bedrock IL

The lowest layer of Bedrock is a simple intermediate language inspired by common assembly languages. This Bedrock IL is easy to compile to those "real" languages, using only localized replacement of IL concepts with assembly constructs. A single verification result applies to all target architectures. So far, we have built a compiler to AMD64 assembly (currently unverified), and we expect it will be easy to implement sound translations to other popular architectures like ARM and 32-bit x86.

  Constants     c ::= [width-32 bitvectors]
  Code labels   l ::= ...
  Registers     r ::= Sp | Rp | Rv
  Addresses     a ::= r | c | r + c
  Lvalues       L ::= r | [a]8 | [a]32
  Rvalues       R ::= L | c | l
  Binops        o ::= + | - | *
  Tests         t ::= = | != | < | <=
  Instructions  i ::= L <- R | L <- R o R
  Jumps         j ::= R | if (R t R) then l else l
  Specs         S ::= ...
  Blocks        B ::= l : {S} i*; j
  Modules       M ::= B*

  Figure 1. Syntax of the Bedrock IL

Figure 1 gives the full syntax of the Bedrock IL. There are finite sets of registers, addressing modes (for both 8-bit and 32-bit memory accesses), and forms of lvalues and rvalues (borrowing C terminology). Straightline instructions do assignment or arithmetic, and jump instructions do unconditional branching or branching based on the result of an arithmetic comparison. A standalone code module is a set of basic blocks, each with a code label and a specification.
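Because the Bedrock IL is the object language of our generative metaprogramming, its syntax is itself ordinary Coq data that macros can compute with. As a rough illustration only (a hedged sketch, not the actual Bedrock definitions; the types W and label and all constructor names below are invented for this sketch), the grammar of Figure 1 could be transcribed into Gallina along these lines:

  Parameter W : Type.      (* width-32 bitvectors *)
  Parameter label : Type.  (* code labels *)

  Inductive reg := Sp | Rp | Rv.

  Inductive addr :=
  | AReg (r : reg)
  | AConst (c : W)
  | ARegOffset (r : reg) (c : W).

  Inductive lvalue :=
  | LReg (r : reg)
  | LMem8 (a : addr)
  | LMem32 (a : addr).

  Inductive rvalue :=
  | RLvalue (l : lvalue)
  | RConst (c : W)
  | RLabel (l : label).

  Inductive binop := Oplus | Ominus | Otimes.

  Inductive test := Teq | Tne | Tlt | Tle.

  Inductive instr :=
  | Assign (dst : lvalue) (src : rvalue)
  | Arith (dst : lvalue) (x : rvalue) (o : binop) (y : rvalue).

  Inductive jump :=
  | Direct (dst : rvalue)
  | CondJump (x : rvalue) (t : test) (y : rvalue) (ifso ifnot : label).

A basic block would then pair a specification with a list of instructions and a final jump, and a module would be a list of labeled blocks, mirroring the Blocks and Modules productions of Figure 1.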

Bedrock is based on the XCAP program logic [26] of Ni and Shao, and specifications use the XCAP assertion language, which supports a limited form of higher-order logic tailored to reasoning about first-class code pointers. A module is correct when every one of its blocks satisfies two conditions:

- Progress: When control enters the block in a machine state satisfying the spec, control continues safely at least until the end of the block.
- Preservation: When control enters the block in a machine state satisfying the spec and exits the block by jumping to some other block, the spec of the new block is satisfied.

In this way, we can construct an inductive proof of infinite safe execution, giving us memory safety. Further, we may also treat each block spec as an assertion, so that we verify absence of assertion failures, giving us functional correctness. XCAP supports separate verification and linking of modules, via mechanisms that we will not go into here, but we do want to emphasize that XCAP supports modular verification, where libraries can be verified separately in a way that permits composition of their correctness theorems to conclude whole-program theorems. It is not trivial to design a framework that permits such modular proof rules in the presence of a mutable higher-order heap, where code from one module might find in the heap a pointer to code from another module written and verified independently. Building on XCAP enables us to realize this level of modularity without doing much explicit extra work.

What does it mean for a Bedrock IL program to be safe? There are two sorts of bad instruction executions that violate safety. First, a jump to a nonexistent code label is illegal. Second, a Bedrock IL program is restricted to read or write only memory addresses allowed by a configurable memory policy.

These informal notions are codified in a conventional Coq operational semantics. We support sound compilation to a variety of assembly languages by considering machine states to contain a read-only component describing the byte ordering of words in memory, with enough expressiveness to describe little-endian, big-endian, and other orders. As a result, we can work with a trivial "array of bytes" memory model, rather than the more complex memory model used in, e.g., the informal ANSI C semantics and the closely related formal semantics of CompCert [21]. The simplistic memory model we adopt makes it difficult to reason about many standard program transformations and optimizations, but we still achieve reasonable performance with an optimizer-free implementation (see Section 6).

There is a great benefit to starting from a language with such a simple semantics. Our final platform is foundational, in the sense that we produce formal proofs that can be checked and audited with relatively small trusted code bases. We need to trust the Coq proof checker and its dependencies, which can be made small compared to program verifiers in general. Beyond that, we must make sure that the statement of the theorem we prove matches our informal notion of program safety or correctness. The theorems that come out of our system are stated only in terms of the Bedrock IL operational semantics, which is quite small and easy to audit. Most importantly in the context of this paper, the final theorem statements are independent of any details of our macro system.

A final foundational theorem will be of the form "if execution begins at code address l1 in a state satisfying the spec of l1, then if execution ever reaches code address l2, the spec of l2 will be satisfied." In the sections that follow, we will introduce convenient notations not just for programs, but also for specifications; for instance, see the notation for specifying functions with local variables in Section 4. If such a notation appears in a final correctness theorem, then its definition must be considered part of the trusted base. However, we are able to prove final theorems that reduce our notations to simpler concepts. For instance, while we may verify the main() function of a program against a precondition-postcondition-style specification inspired by separation logic [28], the final theorem statement need only say that main() requires "all memory addresses between 0 and N are allowed by the memory access policy." We prove that the latter implies the former, and so a final audit of a verified program need only ensure that the latter is accurate.

3. Certified Low-Level Macros

Bedrock's structured programming system uses macros to present a C-like view of the Bedrock IL. Figure 2 shows an example, a destructive linked-list append function, beginning with its formal specification, whose explanation we put off until later.

  Definition appendS : spec := SPEC("x", "y") reserving 2
    Al ls1, Al ls2,
    PRE[V] sll ls1 (V "x") * sll ls2 (V "y")
    POST[R] sll (ls1 ++ ls2) R.

  bfunction "append"("x", "y", "r", "tmp")[appendS]
    If ("x" = 0) {
      Return "y"
    } else {
      "r" <- "x";;
      "tmp" <-* "x" + 4;;
      [Al p1, Al x, Al ls1, Al ls2,
       PRE[V] [| V "x" <> $0 |] * [| V "tmp" = p1 |] * V "x" =*> x
         * (V "x" ^+ $4) =*> p1 * sll ls1 p1 * sll ls2 (V "y")
       POST[R] [| R = V "r" |] * sll (x :: ls1 ++ ls2) (V "x")]
      While ("tmp" <> 0) {
        "x" <- "tmp";;
        "tmp" <-* "x" + 4
      };;
      "x" + 4 *<- "y";;
      Return "r"
    }
  end

  Figure 2. Bedrock implementation of destructive list append

The body of the implementation uses standard control flow constructs like if and while to group together sequences of operations on local variables, which are named with string literals to appease Coq's lexer. (In this paper, we never use a string literal for its more conventional purpose in C-like code, so it is safe to assume that literals indicate object-language identifiers.) A token <-* indicates a memory read operation, while *<- indicates a write. A loop invariant appears within square brackets. We will have a bit more to say later about the language of specifications and invariants, though it is not our focus in this paper, as our architecture applies to other specification languages, too.

For now, the take-away message from Figure 2 is that it denotes a Coq program for computing an assembly program, in the style of heterogeneous generative metaprogramming. None of the control flow constructs are built into the language, but rather they use macros that any Bedrock programmer can define without modifying the library. We also want to prove that the computed assembly program is correct. Programming, compilation, and proving are all done completely within the normal Coq environment.

We will introduce our central concept of macro in two stages. First we introduce the aspects needed to support compilation of programs, and then we add the features to support verification.

3.1 The Code Generation Part of a Macro

In our setting, a macro stands for a statement in the sense of C-like languages. To make sense of this concept, we must impose a notion of structured programs on top of the freeform control flow of the Bedrock IL.

  Figure 3. The interface of a low-level macro (block diagram)

Figure 3 summarizes such an interface. The fundamental task of a macro is to mutate a program by appending new basic blocks to it. Successive blocks are assigned successive numeric labels. To run a macro, it must be passed one input: the exit label of the block its statement should jump to when finished. This label will have been allocated previously by other macros. A macro runs by not only creating new basic blocks, but also returning one more value: an entry label within the freshly generated code blocks. A macro's statement is run by jumping to the entry label.
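Before turning to the examples, it may help to see the shape of this code-generation interface as Coq types. The following is a minimal sketch under simplifying assumptions (labels are numbers of type N, block is left opaque, and the record and field names are invented here rather than taken from Bedrock):

  Require Import NArith List.

  Parameter block : Type.  (* a basic block: straightline instructions plus a jump *)

  Record generated := {
    genEntry  : N;          (* entry label: jump here to run the statement *)
    genBlocks : list block  (* new blocks; the nth one receives label Base + n *)
  }.

  (* The code-generation part of a macro: given Base, the label of the first
     fresh block, and Exit, the label to jump to when the statement finishes,
     produce the new blocks along with an entry point among them. *)
  Definition codePart := N -> N -> generated.

The certified interface of Section 3.2 enriches this picture, so that returned blocks also carry specifications and machine-checked proofs about them.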

Consider the very basic example of a macro for converting a straightline IL instruction into a statement. Figure 4 sketches a simple implementation. Macros are implemented as functional programs in Coq's logic, so it is natural to think of them as statement combinators. The straightline code combinator takes an instruction i as its parameter. When run, the macro allocates a single block, consisting of i followed by a direct jump to the exit label. The output entry label points to the new block.

  Figure 4. Sketch of macro for single straightline instructions (block diagram)

A more involved example is the if..then..else macro, as sketched in Figure 5. While Bedrock IL includes a standard conditional jump instruction, we would rather program in terms of a higher-level statement combinator that hides details of code block label allocation. The if..then..else combinator takes three arguments: a test expression, suitable as an argument to Bedrock IL's low-level conditional jump; a then statement, to run when the test passes; and an else statement, to run when the test fails. The two statement parameters will generally be constructed with further macro applications, and the if..then..else combinator should work correctly regardless of the details of the two statements.

  Figure 5. Sketch of macro for if..then..else (block diagram)

An instance of the if..then..else combinator works as follows:

- The macro is passed an exit label as input. The same label is passed along unchanged as the exit label inputs to the then and else statements. That is, regardless of which way the test expression evaluates, the statement we jump to should exit the if..then..else upon completion.
- Each of the then and else macros generates some code blocks to represent itself, outputting an entry label for each. The if..then..else macro adds one more code block, a conditional jump using the test expression, jumping to the entry label of either the then or else case, depending on how the test evaluates.
- Finally, the overall macro outputs the address of the conditional test as its own entry label.

The working of the macro is broadly similar to standard compilation strategies. To support an expressive macro system, it is crucial to choose a macro interface compatible with a wide range of compilation rules. Many other such architectures are possible. For instance, Benton et al. [2] use an intermediate language with explicitly scoped local labels. The verification support of our approach ought to adapt straightforwardly to many such compilation schemes, since in a sense verification considers only a macro's interface and not its implementation.

It is worth mentioning one mundane aspect of macro implementation, which is registering concrete parsing notations for macros. We rely on Coq's extensible parser for this purpose. For instance, here is the notation we introduce for calls to the if..then..else combinator If_:

  Notation "'If' c { b1 } 'else' { b2 }" := (If_ c b1 b2)
    (no associativity, at level 95, c at level 0) : SP_scope.

All of the standard control-flow constructs of C-like languages are straightforward to encode in our macro system, but, before we consider more examples, we back up to extend the macro interface for verification support.

3.2 Verification Support

  Figure 6. The interface of a certified low-level macro (block diagram)

Figure 6 gives the expanded interface of certified low-level macros, where we add the ingredients needed to build a formal verification environment that supports automated proofs about programs that use macros drawn from diverse sources. At a high level, the addition is this: We previously defined a macro as a compilation rule. Verification support involves also including a predicate transformer (closely related to the idea of a strongest postcondition calculator) and a verification condition generator.

The predicate transformer of a macro maps a precondition to a postcondition. Both sorts of conditions are logical predicates over Bedrock IL machine states; that is, they can be represented as functions typed like state -> Prop, returning logical propositions given machine states as input. The precondition describes the machine state on entry to a statement, and the postcondition describes the machine state when control exits the statement. The postcondition is computed as a function, or transformation, of the precondition, leading to the name predicate transformer. At a high level, the predicate transformer of a macro answers the question what does this macro do?

A macro also includes a verification condition generator, which given a precondition outputs a logical formula (a verification condition) that the programmer must prove. If the verification condition is false, the statement may not work as expected. At a high level, the verification condition generator answers the question which uses of this macro are safe and functionally correct?

More formally, we define the verification part of a macro with:

  Predicates                          P = state -> Prop
  Predicate transformers              T = P -> P
  Verification condition generators   C = P -> Prop
  Verification parts of macros        V = T x C

At a high level, the intent of these definitions is as follows: Consider a macro definition M(x1, ..., xn) = E, expanding an M invocation into some more primitive statement E. Let the associated predicate transformer be T in T and the verification condition generator be C in C. Client code sees the following Hoare-logic proof rule for a macro invocation with a particular precondition P in P:

  {P /\ C(P)} M(e1, ..., en) {T(P)}

The formal version of this notation is complicated by the need to deal with sets of IL basic blocks, rather than the usual structured statements. To give an intuition for those details, we return to extended versions of our previous example macros. First, consider the straightline code macro with instruction parameter i. When called with precondition P, the macro produces:

- Verification condition: forall s. P(s) => exists s'. s -[i]-> s'
- Postcondition: fun s => exists s'. P(s') /\ s' -[i]-> s

Preconditions and postconditions are predicates, or functions from machine states to logical propositions, so we often notate them as anonymous lambda functions. The judgment s -[i]-> s' asserts that running instruction i transforms state s into state s'. Instruction executions that would violate Bedrock IL safety do not belong to this relation.

Thus, the verification condition expresses the idea that the precondition implies safety of executing the instruction i. The postcondition is actually the strongest postcondition for this particular macro, denoting exactly the set of states that could be reached by running i in a state satisfying P. In general, macros need to expose sound postconditions, which include all possible states upon statement exit, but macros need not always return strongest postconditions. Indeed, the chance to return weaker postconditions is essential for supporting separation of interface and implementation in macros, where some details of macro implementation are hidden in specifications that are easier to reason about than strongest postconditions.
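As a concrete rendering of those two formulas, the straightline macro's logical components can be written as ordinary Gallina functions. This is a simplified sketch: state, instr, and step are stand-ins introduced only here, with step i s s' meaning that instruction i safely takes state s to state s'.

  Parameter state : Type.
  Parameter instr : Type.
  Parameter step : instr -> state -> state -> Prop.

  Definition assert := state -> Prop.

  (* Verification condition: the precondition implies safety of executing i. *)
  Definition straightlineVC (i : instr) (P : assert) : Prop :=
    forall s, P s -> exists s', step i s s'.

  (* Strongest postcondition: exactly the states reachable by running i
     from a state satisfying P. *)
  Definition straightlinePost (i : instr) (P : assert) : assert :=
    fun s => exists s', P s' /\ step i s' s.

A weaker postcondition would also be sound here; these definitions simply happen to be the tightest ones, matching the bullet items above.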
Consider now the if..then..else macro, where the parameters are a conditional test e, a then statement with predicate transformer T1 and verification condition generator C1, and an else statement with predicate transformer T0 and verification condition generator C0. If the overall precondition is P, the macro produces:

- Verification condition: C1(fun s => P(s) /\ s -[e]-> 1) /\ C0(fun s => P(s) /\ s -[e]-> 0) /\ (forall s. P(s) => exists b. s -[e]-> b)
- Postcondition: fun s' => T1(fun s => P(s) /\ s -[e]-> 1)(s') \/ T0(fun s => P(s) /\ s -[e]-> 0)(s')

We write s -[e]-> b to indicate that conditional test e evaluates to Boolean value b in {0, 1} in state s. The overall verification condition is the conjunction of the two sub-statements' verification conditions, plus the requirement that the precondition imply safety of evaluating the conditional test. The overall postcondition is a disjunction of the sub-statements' postconditions. In each case, we pass off extended preconditions that combine the original precondition P with information on the outcome of the test e.

We have decided that a macro constitutes a parsing rule, a compilation rule, a predicate transformer, and a verification condition generator. The last crucial ingredient is a proof that all of the above agree with each other in an appropriately sound way. That is, we package macros with dependently typed records that contain not just functions but also machine-checked proofs about those functions. Such a thing is easy to do in Coq, as shown by our definition of macros below. (These definitions actually occur in a Coq section that introduces parameters describing the code's assumptions about its own module and others, but we omit such details here to simplify the presentation.)

Note that this code incorporates a few optimizations, including representing verification conditions as lists of propositions instead of single propositions, to allow conjunction of verification conditions without creating arbitrarily bushy conjunction trees; and staging of macros in a manner similar to currying, so that programs may be verified without the need to generate code for them. We write Prop for the type of logical propositions, or standalone facts; and assert for the type of predicates over machine states, or functions from states to Prop.

  (* Delayed code generation part of macro output *)
  Record codeGen (Precondition : assert) (Base Exit : N)
    (Postcondition : assert) (VerifCond : list Prop) := {
    Entry : N;
    Blocks : list (assert * block);

    PreconditionOk : exists bl, nth_error Blocks (nat_of_N Entry)
      = Some (Precondition, bl);

    BlocksOk : vcs VerifCond -> Exit < Base
      -> List.Forall (fun p => blockOk
           (imps Blocks Base Exit Postcondition) (fst p) (snd p))
         Blocks
  }.

  (* Immediate verification part of macro output *)
  Record codeOut (Precondition : assert) := {
    Postcondition : assert;
    VerifCond : list Prop;
    Generate : forall Base Exit : N,
      codeGen Precondition Base Exit Postcondition VerifCond
  }.

  (* Overall type of a macro, a dependent function type *)
  Definition cmd := forall cin : assert, codeOut cin.

We will explain these definitions in reverse. A macro, of type cmd, has a dependent function type, where the return type of the function is allowed to mention the value of the actual parameter. In this case, the formal parameter cin is the input precondition, and we return a value of a type codeOut cin specialized to the precondition. Where our earlier less formal definitions treated predicate transformers and verification condition generators as independent functions over preconditions, here we instead make the whole macro a function on preconditions.

A codeOut record contains the computed postcondition and verification condition, as well as Generate, a function that we call if we wish to do code generation, too.

A codeGen record contains the output of code generation. While our earlier sketches refer to generation of new code blocks as an imperative process, we implement generation in Coq via returning lists of new blocks only. Furthermore, a Generate function is passed not only the Exit label for a statement, but also Base, the address that will be given to the first new code block returned by this macro. In general, the nth block in the output will have label Base + n. It is important to fix the labeling convention to allow the macro to create internal jumps in its code blocks.

Specifically, the codeGen record includes the Entry label of the statement and the list of its new Blocks. Additionally, proofs of two theorems must be packaged in the same dependent record. The PreconditionOk theorem asserts that the requested precondition really has been assigned as the spec of the statement's entry block. The BlocksOk theorem asserts that every block in Blocks is correct according to the rules of XCAP.

This latter statement uses a predicate blockOk that can be thought of as a judgment G |- B, asserting the correctness of basic block B under the assumptions in G, a finite map from code labels to assumed preconditions for their associated blocks. Here we build G with a function imps, which constructs a finite map containing (a) the Exit label with spec Postcondition, (b) the macro's Blocks themselves with their associated specs, and (c) any label-spec pairs for other modules that are listed explicitly as imports. We also note that BlocksOk takes as premises (a) the truth of the verification condition and (b) an assertion that the Exit label is less than the Base label, so that the macro knows it will not accidentally generate a block with the same address as the exit block it should jump to.

The new components of certified macros are very similar to conventional logical manipulations in program verification tools and verified compilers. The power of the idea comes from formalizing the requirements on statements with a precise, open interface. We have proved a Coq theorem that any statement satisfying the cmd interface is compiled into a valid XCAP module, when its verification condition holds and it is started in a state satisfying the claimed precondition. This notion of module validity follows XCAP's modular verification approach, so that it is possible to link such theorems together to produce theorems about larger composite modules, without revisiting module internals.

We close out this section with more examples of combinators for conventional C-like control constructs, before turning in the following sections to local variables, calling conventions, and higher-level notations not commonly supported in C-like languages.

3.3 More Examples

Certified low-level macros are compatible with classic Hoare-logic proof rules that require the programmer to provide invariants. For instance, consider a macro for a while loop, with conditional test e, loop invariant I, and a body statement that has predicate transformer T and verification condition generator C. The macro's logical outputs then mirror standard proof rules, where P again stands for the precondition passed into the macro.

- Verification condition: C(fun s => I(s) /\ s -[e]-> 1) /\ (forall s. P(s) => I(s)) /\ (forall s. I(s) => exists b. s -[e]-> b) /\ (forall s. T(fun s' => I(s') /\ s' -[e]-> 1)(s) => I(s))
- Postcondition: fun s => I(s) /\ s -[e]-> 0

The respective verification conditions are the conditions of the loop body, an implication from loop precondition to loop invariant, an implication from loop invariant to safety of the conditional test, and an implication from loop body postcondition to loop invariant. The overall loop postcondition is the conjunction of the loop invariant with the failure of the conditional test.

It is also possible to write macros for function calls, where the crucial challenge is to reason about the passing of a return pointer as first-class data. The macro must allocate a new block whose address will be passed as return pointer. We follow the convention that a function receives its return pointer in Bedrock IL register Rp. The compilation part of a function-call macro may simply use its own exit label as the return pointer.

Since there is no succinct logical counterpart to -[i]-> for expressing the effect of a whole function call, we ask the programmer to provide an invariant I, characterizing the machine state after the function call returns. To state the logical outputs of the function-call macro, we use the XCAP notation {P} p to indicate that machine word p is the address of a basic block whose spec is implied by condition P. As is usual at the assembly level, our programs are effectively in continuation-passing style, so it is natural to reason about first-class return pointers with Hoare doubles (preconditions only) rather than the more common Hoare triples (preconditions and postconditions).

As usual, below let P stand for the precondition passed into the function-call macro. Additionally, let Q be the spec of the function being called, which is in precondition-only form just as for return pointers.

- Verification condition: forall s, p, s'. P(s) /\ s -[Rp <- p]-> s' /\ {I} p => Q(s')
- Postcondition: I

The verification condition is stated in terms of the assignment of an arbitrary code pointer p to the return pointer register Rp, subject to the constraint that invariant I is a valid spec for p's basic block. The invariant I itself is used as the call's postcondition.

A complete calling convention also deals with function arguments and local variables. Further, the reader may be worried about the readability of complex specs in continuation-passing style with Hoare doubles. The next section addresses both concerns, describing how we use more macros and other notational conventions to present a more normal perspective on C-like functions.

4. Local Variables and Calling Conventions

Figure 2 contained a specification in a conventional precondition-and-postcondition style, hiding the complexities of continuation-passing-style reasoning. As another example, here is Coq code for a spec we might write for an in-place linked list reversal function. The predicate sll stands for "singly linked list," and its parameters are a functional list and a pointer to the root of a matching linked list in memory.

  SPEC("l") reserving 5
    Al L,
    PRE[V] sll L (V "l")
    POST[R] sll (rev L) R

In order, the four lines of such a spec may be thought of as:

1. A header giving the function's formal arguments, plus how many free additional stack slots it requires (reserving clause)
2. Universal quantifiers for specification variables (similar to ghost variables) that may be mentioned in both precondition and postcondition
3. A precondition, introducing a local name V for a function assigning values to actual arguments
4. A postcondition, introducing a local name R for the function return value

A more compact notation, closer to usual Hoare logic conventions but with explicit use of functions to handle binding of V and R, might be:

  forall L. {fun V => sll(L, V(l))} f(l, 5) {fun V, R => sll(rev(L), R)}

A mechanical translation produces a continuation-passing style version of this spec, where we only need to assign a precondition. As is usual with this sort of translation, universal quantifiers scoped over both precondition and postcondition turn into existential quantifiers appearing within the precondition; and calling-convention registers appear explicitly, where Rp holds the return pointer and Rv the function return value.

  {exists L, V. sll(L, V(l)) /\ {sll(rev(L), Rv)} Rp} f(l, 5)

In this specification, the call stack is still treated implicitly. We must eventually be explicit about the call stack as a data structure to bridge the semantic gap to the Bedrock IL. Here is the final version of the specification, using a predicate locals to represent a call stack frame. Register Sp holds the stack pointer.

  {exists L, V. locals({l}, V, 5, Sp) * sll(L, V(l))
    /\ {exists V'. locals({l}, V', 5, Sp) * sll(rev(L), Rv)} Rp} f

Here locals is just another abstract predicate [27] like sll. A fact locals(D, V, n, v) says that a stack frame with domain D and variable values V is located at address v, and it has at least n more free stack slots remaining. The overall list-reverse spec above indicates that the stack frame must be present with some variable values V at call time, and the separating conjunction * expresses that the call stack and the input linked list must occupy disjoint memory. A nested Hoare double ascribes a precondition to the return pointer Rp. This "postcondition" is the same as the function "precondition" except that (1) the variable values are allowed to change to some arbitrary V' and (2) the linked list has been reversed.

In general, a traditional function spec looks like forall xs. {P} f(ys, n) {Q}, where P is a function over V, and Q is a function over V and R. The desugaring of this spec into a low-level XCAP precondition is:

  {exists xs, V. locals(ys, V, n, Sp) * P(V)
    /\ {exists V'. locals(ys, V', n, Sp) * Q(V, Rv)} Rp} f

To do code generation and computation of verification conditions and postconditions, a macro now needs to know the local variable set ys and the reserved stack slot count n. Therefore, we add these two values as additional parameters to each macro. We do not update any of the logical components of macros; local variables are instead handled in a more traditional macro expansion style, without any new hiding of implementation from interface. For instance, a statement that reads a local variable will be compiled to refer directly to the concrete offset of that variable from the stack pointer, with verification conditions that reveal the compilation details directly. We have extended Bedrock's separation logic proof automation to reason properly about the interaction of such stack pointer indirection and the locals predicate.

5. Higher-Level Notations

Certified low-level macros do not merely encode strongest postcondition calculators, to declare the effects of statements. Instead, macro authors have the freedom to expose weaker postconditions, just as the author of a conventional function may choose a weaker postcondition to enforce data abstraction or another form of information hiding. Conventional Hoare logics have a rule of consequence facilitating this sort of hiding in a straightforward way, where one may conclude {P}c{Q} from {P'}c{Q'}, given P => P' and Q' => Q.

We have implemented a simple wrapper combinator for macros, embodying the rule of consequence. A programmer uses pre-existing macros to implement a chunk of code, taking advantage of the associated proof automation to simplify verification of that chunk. Then the chunk may be wrapped with a new predicate transformer and verification condition generator, if appropriate conditions are proved. In particular, say that the original statement has transformer T' and generator C', and the wrapper is introducing new transformer T and generator C. The conditions that make the wrapping sound are:

- forall P. C(P) => C'(P)
- forall P, s. C(P) /\ T'(P)(s) => T(P)(s)

These conditions are a direct reinterpretation of the rule of consequence, and they are feasible to prove for a variety of higher-level notations that construct the underlying statements programmatically, via recursive functions over macro parameters. In the rest of this section, we give a motivating example for two higher-level macros and then describe their implementations.

5.1 A Motivating Example

Figure 7 shows an example program that uses higher-level macros. The program is a simple idealization of a network server processing queries over a database. We deal not with network IO, but rather just with decoding and processing requests and encoding the results in memory. We say that the database is an array of machine integers, as is the packet of encoded query requests. The output of the server is stored in a linked list.

The lefthand column of the figure gives the program implementation. Our program declares a module m, which imports function malloc from module malloc, with specification mallocS. Next, we define function main, beginning with its list of local variables, including formal arguments.

The main body of the function is a loop over all requests contained in the request array cmd. We take advantage of two high-level macros to make our implementation more declarative.

First, we use a parsing syntax similar to ML-style pattern matching, via the Match macro. Its arguments name an array, its length (Size clause), and a local variable storing our current index within the array (Position clause). A Match has a sequence of cases consisting of patterns and bodies. A pattern is a sequence of items that are either constants, to indicate that a specific machine word value must be found in the corresponding array position; or a variable name (string literal), to indicate that any machine word may appear in the position. All variables are assigned the appropriate values from the array before executing the body of a matching pattern.

The other high-level macro is for declarative querying of arrays, chosen as a simple example illustrating many of the same challenges applicable to more expressive SQL-style querying. The For macro takes as arguments a loop variable to store an array index, another variable to store the value found at that index (Holding clause), the array to query (in clause), the length of that array (Size clause), and a filter condition choosing which array cells to visit in the loop (Where clause). The key property of For is that it performs compile-time analysis of the Where condition to optimize query execution. If the Where condition implies a known value for the array index, the "loop" only visits that cell. If the condition implies a lower bound on array indices, the loop skips the earlier indices. The proof rule of the For macro hides the details of which optimizations are used, allowing verification to proceed only in terms of high-level logical concepts.

  (* Program implementation *)
  Definition m := bimport [[ "malloc"!"malloc" @ [mallocS] ]]
    bmodule "m" {{
      bfunction "main"("cmd", "cmdLen", "data", "dataLen",
                       "output", "position", "posn", "lower", "upper",
                       "index", "value", "res", "node")[mainS]
        "output" <- 0;;
        "position" <- 0;;
        [(* ... invariant omitted ... *)]
        While ("position" < "cmdLen") {
          Match "cmd" Size "cmdLen" Position "position" {
            (* ValueIsGe *)
            Case (0 ++ "posn" ++ "lower")
              "res" <- 0;;
              [(* ... invariant omitted ... *)]
              For "index" Holding "value" in "data" Size "dataLen"
                Where ((Index = "posn") && (Value >= "lower")) {
                "res" <- 1
              };;
              "node" <- Call "malloc"!"malloc"(0)
              [(* ... invariant omitted ... *)];;
              "node" *<- "res";;
              "node" + 4 *<- "output";;
              "output" <- "node"
            end;;

            (* MaxInRange *)
            Case (1 ++ "lower" ++ "upper")
              "res" <- 0;;
              [(* ... invariant omitted ... *)]
              For "index" Holding "value" in "data" Size "dataLen"
                Where (("lower" <= Value) && (Value <= "upper")
                  && (Value >= "res")) {
                "res" <- "value"
              };;
              "node" <- Call "malloc"!"malloc"(0)
              [(* ... invariant omitted ... *)];;
              "node" *<- "res";;
              "node" + 4 *<- "output";;
              "output" <- "node"
            end;;

            (* CollectBelow *)
            Case (2 ++ "lower" ++ "upper")
              [(* ... invariant omitted ... *)]
              For "index" Holding "value" in "data" Size "dataLen"
                Where ((Index >= "lower") && (Value <= "upper")) {
                "node" <- Call "malloc"!"malloc"(0)
                [(* ... invariant omitted ... *)];;
                "node" *<- "value";;
                "node" + 4 *<- "output";;
                "output" <- "node"
              }
            end
          } Default {
            Fail (* Impossible: the match was exhaustive. *)
          }
        };;
        Return "output"
      end
    }}.

  (* Requests that the server may receive.
   * (W is the type of machine words.) *)
  Inductive req :=
  | ValueIsGe (index valueLowerBound : W)
    (* Test if the value stored at this array index is
     * at least as large as this value. *)
  | MaxInRange (lowerBound upperBound : W)
    (* Find the maximum value within the given
     * range of values. *)
  | CollectBelow (indexLowerBound valueUpperBound : W)
    (* Collect a list of all values satisfying
     * the given bounds on index and value. *).

  (* Representation of requests as lists of words *)
  Definition encode (r : req) : list W :=
    match r with
    | ValueIsGe a b => $0 :: a :: b :: nil
    | MaxInRange a b => $1 :: a :: b :: nil
    | CollectBelow a b => $2 :: a :: b :: nil
    end.

  (* Representation of sequences of requests *)
  Fixpoint encodeAll (rs : list req) : list W :=
    match rs with
    | nil => nil
    | r :: rs' => encode r ++ encodeAll rs'
    end.

  (* Omitted here: some helper functions *)

  (* Compute the proper response to a request,
   * as transformation on a list of output values. *)
  Definition response (data acc : list W) (r : req) : list W :=
    match r with
    | ValueIsGe index lower => valueIsGe data index lower :: acc
    | MaxInRange lower upper => maxInRange lower upper data :: acc
    | CollectBelow lower upper =>
        collectBelow upper (skipn (wordToNat lower) data) acc
    end.

  (* Proper response to a request sequence *)
  Definition responseAll (data : list W) (rs : list req)
    (acc : list W) : list W :=
    fold_left (response data) rs acc.

  (* Specification of the server main() function *)
  Definition mainS := SPEC("cmd", "cmdLen", "data", "dataLen")
    reserving 15
    Al r, Al d,
    PRE[V] array (encodeAll r) (V "cmd") * array d (V "data")
      * mallocHeap
      * [| V "cmdLen" = length (encodeAll r) |]
      * [| V "dataLen" = length d |]
      * [| goodSize (length (encodeAll r) + 3) |]
    POST[R] array (encodeAll r) (V "cmd") * array d (V "data")
      * mallocHeap * sll (responseAll d r nil) R.

  (* Omitted: lemmas and tactics about lists *)

  (* Correctness theorem *)
  Theorem ok : moduleOk m.
  Proof.
    vcgen; abstract (parse0; for0; post; evaluate hints;
      repeat (parse1 finish; use_match); multi_ex; sep hints; finish).
  Qed.

  Figure 7. Implementation, specification, and verification of our main case study program

The righthand column of Figure 7 excerpts the specification and correctness proof for the program. The specification follows the Bedrock style of using pure functional programs as specifications for low-level programs. In particular, we define functional datatypes to represent requests to the server, and we write functional programs to map requests to their network encodings and their appropriate responses. The three different kinds of queries are represented with capitalized datatype constructors (e.g., ValueIsGe) and specified with pure mathematical functions in lowercase (e.g., valueIsGe).

The specification mainS applies to our main function, giving its list of formal arguments, the arrays cmd and data plus their lengths. Two universal quantifiers ("Al") bind names to the purely functional versions of cmd and data, respectively.

The precondition (PRE) clause binds a local function V that may be used to access the actual parameter values (e.g., V "cmd" as the value of cmd). The first three *-separated formulas here say, respectively, that there are memory regions with an array pointed to by parameter cmd and storing an encoding of the queries, an array pointed to by parameter data and storing an encoding of the database, and finally the internal data structures of the malloc library. Further subformulas use the lifting operator [| |], for lifting normal Coq propositions into separation logic formulas. We require that cmdLen and dataLen store the lengths of the proper arrays, and we require that adding 3 to the query array length does not overflow a 32-bit word (which the goodSize predicate formalizes).

A postcondition (POST) binds the local variable R to stand for the return value. Here we assert that the same two arrays remain in memory unchanged, the malloc library state is intact, and the return value is the root of a linked list.

Finally, the program is proved to meet its spec with a short sequence of Coq tactics. Some of them, like vcgen and sep, come from the Bedrock library. Others, like parse0 and for0, come from the libraries providing the Match and For macros. In this way, macro authors can also provide reusable procedures for discharging the sorts of verification conditions that their macros generate, while leaving the programmer full flexibility to apply other proof techniques instead.

Some details are omitted in the figure. The main implementation includes 7 invariants, predicates over intermediate program states that guide verification condition generation. Together these invariants take up 72 lines of code. We also omit 350 lines of lemma statements and proof script, setting up the key arguments behind the program's correctness. Crucially, these lemmas are only about properties of lists, our functional models of arrays and imperative linked lists; we need do no manual proving work specific to program syntax, program states, or memories.

This program uses at least 7 different macros, at varying levels of abstraction from statement sequencing to declarative querying, but the verification is independent of macro implementation details. The level of proof automation used here also means that small changes to the program often require little manual adaptation of proofs. Nonetheless, in the end we get a foundational theorem, stated in terms of a simple assembly-level operational semantics and with a proof checked by the normal Coq proof checker. Furthermore, the example program runs only about 25% more slowly than a conventional C program for the same task, as compared to an OCaml implementation we built at a similar level of abstraction, which runs in about 400% of the time of the C program. We return to performance details in Section 6.

5.2 Parsing Arrays of Machine Words

Consider now the general interface of the parsing macro from Figure 7. Its simplest form is as follows:

  Match array Size size Position pos {
    Case pattern bodyStmt end
  } Default { defaultStmt }

Here the challenge is giving the macro a strong enough interface without revealing its internal workings. We intentionally set out to hide the details of array access, but programmers should still be able to prove reasonable correctness theorems about programs that use this macro. One concrete challenge is which precondition we choose as the input to bodyStmt, a sub-statement that is run when pattern-matching succeeds. We do not want to mention the exact sequence of instructions executed between the start of the Match and that point. If we did, the programmer would need to reason about memory safety and semantics of a sequence of array accesses, on every invocation of the macro.

The alternate approach that we take here is to construct a bodyStmt precondition in terms of a simpler instruction sequence, different from the one actually executed, but with the same effect. The instruction sequence is written in terms of the functional model of the array, a Coq list.

For instance, the pattern 0 ++ "posn" ++ "lower" from Figure 7 matches an array span starting with constant 0 and followed by any two other values, such that those two values are bound to names posn and lower before running bodyStmt. Let L be a mathematical list asserted to describe the contents of the array. The semantics of the example match may now be expressed with, first, an assertion that L = a, b, c, ... for fresh variables a, b, and c; and second, this instruction sequence:

  assume(a = 0); posn <- b; lower <- c

We express exactly the intent of the pattern, without exposing any details of array access. Instruction sequences like the above are handled trivially by Bedrock's proof automation, without imposing memory safety proof obligations on the programmer. It is easy to code a recursive Coq function to translate a pattern into such a sequence of instructions.

The postcondition of the Match macro is just the disjunction of the postconditions for bodyStmt and defaultStmt, when they are passed preconditions constructed using the encoding technique above. The verification conditions include some administrative facts about variables: all program variables mentioned are declared as local variables, and the variables array and pos are not also used within patterns. The verification conditions of bodyStmt and defaultStmt are also inherited. Finally, one new verification condition asserts that the overall precondition for the Match implies that there exists an array at the address specified by array, where pos is a valid index into that array, and adding the length of the pattern to pos does not overflow a 32-bit word.

5.3 Declarative Querying of Arrays

Figure 7 demonstrated the For macro, which loops over exactly those cells of an array that satisfy a filter condition. The macro implementation analyzes the syntax of the condition to perform optimizations. Such a process is a simplified version of what goes on in an SQL query planner. The interface of the For macro should hide the details of the optimizations that are performed.

Our complete implementation uses a form of loop invariant that is similar to the one introduced by Tuerk [31], where an invariant includes both a precondition and a postcondition to simplify application of the separation logic frame rule. However, we will present here a simplified version with only a conventional loop invariant. The general form of a For invocation is then:

  [After prefix PRE[V] invariant]
  For index Holding value in array Size len Where condition {
    bodyStmt
  }

The condition may refer both to local variables and to the special variables Index and Value for the current array index and cell value, respectively. Our current implementation performs two optimizations: When condition is a conjunction where one conjunct is Index = i, then the "loop" need only visit cell i. When condition is a conjunction where one conjunct is Index >= i, then the loop may begin at i rather than 0. The optimizer correctly handles cases where i is a local variable.
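To make the query-planning idea concrete, the following sketch shows the flavor of syntactic analysis involved, over an invented, simplified condition syntax; none of these type or function names are Bedrock's, and the real For macro works over Bedrock IL test expressions rather than this toy datatype.

  Require Import String.

  Inductive operand :=
  | OIndex                 (* the special variable Index *)
  | OValue                 (* the special variable Value *)
  | OVar (x : string)      (* a local variable *)
  | OConst (n : nat).

  Inductive cond :=
  | CEq (a b : operand)
  | CGe (a b : operand)
  | CLe (a b : operand)
  | CAnd (c1 c2 : cond).

  (* If some conjunct forces Index = e, the "loop" need only visit that cell. *)
  Fixpoint knownIndex (c : cond) : option operand :=
    match c with
    | CEq OIndex e => Some e
    | CEq e OIndex => Some e
    | CAnd c1 c2 =>
        match knownIndex c1 with
        | Some e => Some e
        | None => knownIndex c2
        end
    | _ => None
    end.

  (* If some conjunct forces Index >= e, the loop may start at e rather than 0. *)
  Fixpoint lowerBound (c : cond) : option operand :=
    match c with
    | CGe OIndex e => Some e
    | CAnd c1 c2 =>
        match lowerBound c1 with
        | Some e => Some e
        | None => lowerBound c2
        end
    | _ => None
    end.

The macro's proof rule never mentions which of these analyses fired; clients see only the list-level loop invariant conditions described next.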

[After prefix PRE[V] invariant] 0.4 For index Holding value in array Size len Where condition 0.2 { bodyStmt } 0 The condition may refer both to local variables and to the special C OCaml (1O) OCaml (HO) Bedrock variables Index and Value for the current array index and cell value, respectively. Our current implementation performs two opti- Figure 8. Running time (in seconds) of the four different imple- mizations: When condition is a conjunction where one conjunct is mentations of the server example, running the same randomly gen- Index = i, then the “loop” need only visit cell i. When condition erated workload of 200 queries on an array of 100,000 elements is a conjunction where one conjunct is Index ≥ i, then the loop may begin at i rather than 0. The optimizer correctly handles cases where i is a local variable. some rough evidence that our implementation succeeds in this re- As in the previous subsection, here we have the challenge of spect, we ran an experiment with four different implementations of assigning For a specification that does not reveal too much about the same “server” example1: its inner workings. Ideally, programmers who use For will only be 1. A conventional C program (50 lines) not taking advantage of faced with proof obligations about mathematical lists, not arrays or metaprogramming. This is our baseline of high performance. the details of compiling complex Boolean tests. This time, we use the conventional idea of a loop invariant, but modified to expose a 2. A first-order (later abbreviated 1O) OCaml program (36 lines), list-level view of the action. with no variant types (beyond linked lists) or first-class func- The schematic For invocation above includes a loop invariant, tions. This is our baseline of the cost of switching to a popular where an After clause introduces a local name prefix for the list high-level language. of array cell values that have already been visited. In the course of 3. A higher-order (later abbreviated HO) OCaml program (106 a naive complete loop through the array, prefix progresses from the lines, including reusable library code), taking advantage of id- empty list to the full value list. However, optimizations may lead to ioms like parser combinators and embedded interpreters for some loop iterations being skipped, though such details should be ASTs. This version illustrates the cost of employing nice ab- hidden in the interface of For. stractions that can be type checked in isolation, which is not the We are now ready to sketch what that interface is. The postcon- case for abstractions embodied as Camlp4 macros. dition is simply that there exists an array in memory pointed to by array, such that the loop invariant holds for the full array contents. 4. The Bedrock implementation from Figure 7 (50 lines, compiled Verification conditions include both administrative requirements on to 561 lines of assembly with the Bedrock malloc library variable names and the inherited verification condition of bodyStmt. linked in), which is the only one of the implementations that The more interesting conditions are: we verified as correct. 1. The loop invariant is independent of the values of index and Figure 8 shows the results of running all four compiled pro- value, which will often be changed without calling the loop grams on a particular random workload chosen to be large enough body to fast forward past indices that the optimizer determines for performance differences to show. These experiments were run cannot match the filter condition. 
6. Evaluation

Together, the core macro system definitions and all of the macros we have built on top of them take up about 4000 lines of Coq code within the Bedrock library.

We have built a few tactics to lower the cost of macro implementation, but our primary interest has been in the effort required to implement and verify individual programs, like the example of Figure 7. An important part of evaluation is the run-time performance of programs, since we motivated certified low-level macros with the possibility to combine the high performance of low-level languages with the abstractions of high-level languages. To gather some rough evidence that our implementation succeeds in this respect, we ran an experiment with four different implementations of the same "server" example¹:

1. A conventional C program (50 lines) not taking advantage of metaprogramming. This is our baseline of high performance.

2. A first-order (later abbreviated 1O) OCaml program (36 lines), with no variant types (beyond linked lists) or first-class functions. This is our baseline of the cost of switching to a popular high-level language.

3. A higher-order (later abbreviated HO) OCaml program (106 lines, including reusable library code), taking advantage of idioms like parser combinators and embedded interpreters for ASTs. This version illustrates the cost of employing nice abstractions that can be type checked in isolation, which is not the case for abstractions embodied as Camlp4 macros.

4. The Bedrock implementation from Figure 7 (50 lines, compiled to 561 lines of assembly with the Bedrock malloc library linked in), which is the only one of the implementations that we verified as correct.

¹ In directory examples/comparison of the Bedrock distribution

Figure 8. Running time (in seconds) of the four different implementations of the server example, running the same randomly generated workload of 200 queries on an array of 100,000 elements

Figure 8 shows the results of running all four compiled programs on a particular random workload chosen to be large enough for performance differences to show. These experiments were run on a 2.4 GHz Intel Xeon processor with 12 GB of RAM, using GCC 4.4.5 and OCaml 3.12.1. They show the first-order OCaml program running in roughly 300% of the time of the C program, with the higher-order OCaml program running in roughly 500% of the C program's time. In contrast, the Bedrock program runs in about 125% of the time of the C program. We replicated the experiment on a 4.2 GHz AMD FX processor with 16 GB of RAM, where the performance difference between C and Bedrock was within the granularity of measurement, and the first- and higher-order OCaml programs ran respectively in about 300% and 350% of the time for the C and Bedrock programs. It is worth pointing out that Bedrock is the only language implementation used here that does not yet have an optimizer, whereas we used the default optimization options of each other compiler, gcc and the ocamlopt native-code OCaml compiler.

We experimented with verifying programs for several months before generating any executable assembly code. The first program we actually executed was iterative factorial, and the second was the server program in Figure 7. Both worked correctly on the first try, providing at least some small testament to the debugging power of our verification environment.

We have verified a few other substantial case studies, as detailed in Figure 9, breaking down their lines of code into relevant categories. The case studies include some classic functions over imperative linked lists, the malloc library, implementations of a single finite set abstract interface with both unsorted lists and binary search trees, an implementation of the bag abstract data type using a queue, and an implementation of an abstract data type of memoized functions, where a closure stored in memory contains both a function pointer and an associated 1-element memo table. All case studies are built up from certified low-level macros, where our running example Server uses the most advanced macros.

File         Program  Invar.  Tactics  Other  Overh.
Server          50       79     167     239    9.7
LinkedList      42       26      27      31    2.0
Malloc          43       16     112      94    5.2
ListSet         50       31      23      46    2.0
TreeSet        108       40      25      45    1.0
Queue           53       22      80      93    3.7
Memoize         26       13      56      50    4.6

Figure 9. Case study verifications, with data on annotation burden, in lines of code

In Figure 9, the program column counts lines of executable code; invar. refers to function specs, loop invariants, and post-function-call invariants; tactics refers to literal proofs of theorems as well as reusable tactic functions and hints; and other collects the remaining lines, which are mostly helper function definitions and lemma statements. We omit lines of code that are duplicated between Coq modules and their interfaces.

The final column of Figure 9 gives the verification overhead, or ratio of verification code to executable code. The more basic examples range from overheads of 1.0 to 5.2, which compare reasonably well to common ratios in Coq developments, such as the overhead of about 7 in the original CompCert paper [21]. The overhead is slightly below 10 for our server example, which involves a fair amount of program-specific correctness reasoning.
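Concretely, the overhead column is the sum of the invar., tactics, and other columns divided by the program column: for Server, (79 + 167 + 239) / 50 = 9.7, and for LinkedList, (26 + 27 + 31) / 42 = 2.0.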
We should also point out that the code size statistics from the original Bedrock paper [8] can also be taken as evidence for the effectiveness of the macro system presented in this paper, which is a moderate evolution of the macro system used previously and never presented in detail before.

The main weakness of our present implementation is the running time of automated proof search and proof checking. On the machines used in the experiments, the server example runs for about an hour finding and checking proofs. The other case studies take less time, but still have running times measured in minutes, at best. We do not view this as a huge obstacle from a scientific perspective, as much opportunity remains to take advantage of parallelism. The Server example generates about 100 independent verification conditions, each of which could be discharged on a different processor core, dropping the total verification time for the example to a few minutes. We need only wait for Coq to support forking processes to handle disjoint subgoals of a single proof.
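As a rough sanity check on that estimate: about an hour of total proof effort spread over roughly 100 conditions averages well under a minute per condition, so with one condition per core the wall-clock time would be limited by the handful of slowest conditions.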
7. Related Work

The original Bedrock paper [8] used a simpler macro system but did not report on its details. Compared to that macro system, our new one from this paper is distinguished by including a calling convention supporting named local variables, as well as the implementation and verification of high-level macros like our parsing and querying examples.

Benton et al. [2] report on a system for formal verification of assembly programs in Coq using higher-order separation logic. Based on an intermediate language with scoped local labels, they define a few macros of moderate complexity, like while loop and function call (the latter handling only storing the return pointer to a register, not stack frame management). Proofs are largely manual with interspersed use of automation tactics, leading to about 10 lines of proof per line of assembly code, as opposed to an about even ratio between the two in our most complicated macro-based example. Their work explores an interesting alternate separation logic featuring a novel higher-order frame rule. We expect that an approach like ours in this paper could be adapted to the formalism of Benton et al. and many others, as our respective projects deal mostly with orthogonal concerns.

Work by Myreen et al. [24, 25] has demonstrated another mechanized separation logic suitable for reasoning about realistic assembly languages. The FLINT project has produced XCAP [26], the assembly-level program logic we adopt in this work, as well as several other logics targeting interesting aspects of assembly program behavior [5, 10-12, 23]. Other related work with mechanized separation logic has gone on at the level of C programs, as in Appel et al.'s Verified Software Toolchain [1]. Mechanized proofs in these logics have been long and manual, and previously no macro system had been demonstrated for them at the level of this paper's most sophisticated examples.

The L4.verified kernel verification project [20] employs a tool [15] for verified translation of C code into functional programs in higher-order logic, applying optimizations that choose simpler types for functions that use fewer varieties of side effect. In a sense, their tool reverses the action of the Bedrock macro system, providing similar foundational guarantees for the fixed C language. In contrast, our work makes it possible for programmers to add their own new language constructs, including higher-level macros like our declarative array querying that are unlikely to be reconstructible automatically from C code.

Macros have been most studied in the context of Lisp and Scheme. Notable in that area is work by Herman and Wand [17], which shows how to assign macros interfaces sufficient to guarantee hygiene, or lack of undesirable variable capture, without the need to expand macro definitions to check program validity. Our work also assigns interfaces to macros, but for low-level rather than high-level code, facilitating formal verification of functional correctness rather than just proper use of binding constructs.

In the functional programming world, the use of embedded domain-specific languages is already quite popular for supporting safe code generation to low-level languages via high-level functional programs. For instance, the Harpy Haskell library [14] supports generation of x86 assembly code. Recent tools like MetaHaskell [22] allow generation of programs in low-level languages like C, CUDA, and assembly, where Haskell-level type checking guarantees that code generators will never output type-incorrect programs. Our work in this paper can be viewed as starting from the same foundation of type-safe metaprogramming and expanding it to support verification of functional correctness.

Extensible C variants like xtc [16] and xoc [9] allow the programmer to specify new statement forms and static checking disciplines. These systems can provide much nicer syntax than what we have available within Coq, and there is room for such features to migrate to proof assistants. The key disadvantage of existing extensible compilers is that they provide no foundational guarantees; custom static analysis disciplines do not have clear semantics, so these languages are not compatible with foundational verification of functional correctness.

Safe "systems languages" include low-level languages like Cyclone [19] and various higher-level languages with features targeted at systems programming. Some are known for their use in particular operating systems projects, such as Modula-3 in SPIN [4] and Sing# in Singularity [18]. None of these languages are extensible, and none give foundational safety or functional correctness guarantees.

Tools like Smallfoot [3] and Space Invader [6, 32] apply separation logic to check memory safety of low-level programs automatically. Other systems, including TVLA [29] and XISA [7], can verify memory safety or even functional correctness using other formalisms for data structure invariants. These tools apply to fixed programming languages with no modular macro systems.
8. Conclusion

We have presented certified low-level macros, a technique supporting metaprogramming for low-level software, where programs may be verified formally without looking inside the definitions of macros. Rather, macros export formal interfaces that include predicate transformers and verification condition generators. We have demonstrated how to build up a C-like language stack, starting from a simple intermediate language and culminating in high-level notation for tasks like parsing and declarative querying that are usually supported via ad-hoc external tools. Concrete programs have mostly automated Coq proofs, generating foundational results with statements independent of our macro system; and the performance of our compiled programs is competitive with C programs.

One direction for future research is optimization of the programs that result from macro expansion. As these programs are in a form more like conventional assembly language than conventional compiler intermediate languages, it is not obvious how to do sound optimization. We plan to investigate how we might take advantage of the verified preconditions associated with all basic blocks, to recover semantic information and perhaps allow optimizations that are too difficult to implement with traditional dataflow analysis.

Acknowledgments

The author would like to thank Gregory Malecha and Thomas Braibant, for work on Bedrock proof automation that is used in the case studies from this paper; as well as Thomas Braibant, Ryan Kavanagh, Antonis Stampoulis, and Peng Wang, for their feedback on drafts of this paper.

This material is based on research sponsored by DARPA under agreement number FA8750-12-2-0293. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of DARPA or the U.S. Government.

References

[1] A. W. Appel. Verified software toolchain. In Proc. ESOP, volume 6602 of LNCS, pages 1-17. Springer-Verlag, 2011.
[2] N. Benton, J. B. Jensen, and A. Kennedy. High-level separation logic for low-level code. In Proc. POPL, pages 301-314. ACM, 2013.
[3] J. Berdine, C. Calcagno, and P. W. O'Hearn. Smallfoot: Modular automatic assertion checking with separation logic. In Proc. FMCO, volume 4111 of LNCS, pages 115-137. Springer-Verlag, 2005.
[4] B. N. Bershad, S. Savage, P. Pardyak, E. G. Sirer, M. E. Fiuczynski, D. Becker, C. Chambers, and S. Eggers. Extensibility, safety and performance in the SPIN operating system. In Proc. SOSP, pages 267-283. ACM, 1995.
[5] H. Cai, Z. Shao, and A. Vaynberg. Certified self-modifying code. In Proc. PLDI, pages 66-77. ACM, 2007.
[6] C. Calcagno, D. Distefano, P. O'Hearn, and H. Yang. Compositional shape analysis by means of bi-abduction. In Proc. POPL, pages 289-300. ACM, 2009.
[7] B.-Y. E. Chang and X. Rival. Relational inductive shape analysis. In Proc. POPL, pages 247-260. ACM, 2008.
[8] A. Chlipala. Mostly-automated verification of low-level programs in computational separation logic. In Proc. PLDI, pages 234-245. ACM, 2011.
[9] R. Cox, T. Bergan, A. T. Clements, F. Kaashoek, and E. Kohler. Xoc, an extension-oriented compiler for systems programming. In Proc. ASPLOS, pages 244-254. ACM, 2008.
[10] X. Feng and Z. Shao. Modular verification of concurrent assembly code with dynamic thread creation and termination. In Proc. ICFP, pages 254-267. ACM, 2005.
[11] X. Feng, Z. Shao, A. Vaynberg, S. Xiang, and Z. Ni. Modular verification of assembly code with stack-based control abstractions. In Proc. PLDI, pages 401-414. ACM, 2006.
[12] X. Feng, Z. Shao, Y. Dong, and Y. Guo. Certifying low-level programs with hardware interrupts and preemptive threads. In Proc. PLDI, pages 170-182. ACM, 2008.
[13] S. E. Ganz, A. Sabry, and W. Taha. Macros as multi-stage computations: type-safe, generative, binding macros in MacroML. In Proc. ICFP, pages 74-85. ACM, 2001.
[14] M. Grabmüller and D. Kleeblatt. Harpy: Run-time code generation in Haskell. In Proc. Haskell Workshop, page 94. ACM, 2007.
[15] D. Greenaway, J. Andronick, and G. Klein. Bridging the gap: Automatic verified abstraction of C. In Proc. ITP, volume 7406 of LNCS, pages 99-115. Springer-Verlag, 2012.
[16] R. Grimm. Better extensibility through modular syntax. In Proc. PLDI, pages 38-51. ACM, 2006.
[17] D. Herman and M. Wand. A theory of hygienic macros. In Proc. ESOP, volume 4960 of LNCS, pages 48-62. Springer-Verlag, 2008.
[18] G. C. Hunt and J. R. Larus. Singularity: Rethinking the software stack. ACM SIGOPS Operating Systems Review, 41(2):37-49, 2007.
[19] T. Jim, J. G. Morrisett, D. Grossman, M. W. Hicks, J. Cheney, and Y. Wang. Cyclone: A safe dialect of C. In Proc. USENIX ATC, pages 275-288. USENIX Association, 2002.
[20] G. Klein, K. Elphinstone, G. Heiser, J. Andronick, D. Cock, P. Derrin, D. Elkaduwe, K. Engelhardt, R. Kolanski, M. Norrish, T. Sewell, H. Tuch, and S. Winwood. seL4: Formal verification of an OS kernel. In Proc. SOSP, pages 207-220. ACM, 2009.
[21] X. Leroy. Formal certification of a compiler back-end or: programming a compiler with a proof assistant. In Proc. POPL, pages 42-54. ACM, 2006.
[22] G. Mainland. Explicitly heterogeneous metaprogramming with MetaHaskell. In Proc. ICFP, pages 311-322. ACM, 2012.
[23] A. McCreight, Z. Shao, C. Lin, and L. Li. A general framework for certifying garbage collectors and their mutators. In Proc. PLDI, pages 468-479. ACM, 2007.
[24] M. O. Myreen. Verified just-in-time compiler on x86. In Proc. POPL, pages 107-118. ACM, 2010.
[25] M. O. Myreen and M. J. C. Gordon. Hoare logic for realistically modelled machine code. In Proc. TACAS, volume 4424 of LNCS, pages 568-582. Springer-Verlag, 2007.
[26] Z. Ni and Z. Shao. Certified assembly programming with embedded code pointers. In Proc. POPL, pages 320-333. ACM, 2006.
[27] M. Parkinson and G. Bierman. Separation logic and abstraction. In Proc. POPL, pages 247-258. ACM, 2005.
[28] J. C. Reynolds. Separation logic: A logic for shared mutable data structures. In Proc. LICS, pages 55-74. IEEE Computer Society, 2002.
[29] M. Sagiv, T. Reps, and R. Wilhelm. Parametric shape analysis via 3-valued logic. TOPLAS, 24(3):217-298, May 2002.
[30] W. Taha and T. Sheard. Multi-stage programming with explicit annotations. In Proc. PEPM, pages 203-217. ACM, 1997.
[31] T. Tuerk. Local reasoning about while-loops. In Proc. VSTTE Theory Workshop, 2010.
[32] H. Yang, O. Lee, J. Berdine, C. Calcagno, B. Cook, D. Distefano, and P. O'Hearn. Scalable shape analysis for systems code. In Proc. CAV, volume 5123 of LNCS, pages 385-398. Springer-Verlag, 2008.