Biorthogonality, Step-Indexing and Compiler Correctness

Biorthogonality, Step-Indexing and Compiler Correctness Nick Benton Chung-Kil Hur Microsoft Research University of Cambridge [email protected] [email protected] Abstract to a deeper semantic one (‘does the observable behaviour of the We define logical relations between the denotational semantics of code satisfy this desirable property?’). In previous work (Benton a simply typed functional language with recursion and the opera- 2006; Benton and Zarfaty 2007; Benton and Tabareau 2009), we tional behaviour of low-level programs in a variant SECD machine. have looked at establishing type-safety in the latter, more semantic, The relations, which are defined using biorthogonality and step- sense. Our key notion is that a high-level type translates to a low- indexing, capture what it means for a piece of low-level code to level specification that should be satisfied by any code compiled implement a mathematical, domain-theoretic function and are used from a source language phrase of that type. These specifications are to prove correctness of a simple compiler. The results have been inherently relational, in the usual style of PER semantics, capturing formalized in the Coq proof assistant. the meaning of a type A as a predicate on low-level heaps, values or code fragments together with a notion of A-equality thereon. These Categories and Subject Descriptors F.3.1 [Logics and Mean- relations express what it means for a source-level abstractions (e.g. ings of Programs]: Specifying and Verifying and Reasoning about functions of type A ! B) to be respected by low-level code (e.g. Programs—Mechanical verification, Specification techniques; F.3.2 ‘taking A-equal arguments to B-equal results’). A crucial property [Logics and Meanings of Programs]: Semantics of Program- of our low-level specifications is that they are defined in terms of ming Languages—Denotational semantics, Operational seman- the behaviour of low-level programs; making no reference to any tics; F.3.3 [Logics and Meanings of Programs]: Studies of Pro- intensional details of the code produced by a particular compiler or gram Constructs—Type structure,Functional constructs; D.3.4 the grammar of the source language. Of course, the specifications [Programming Languages]: Processors—Compilers; D.2.4 [Soft- do involve low-level details of data representations and calling con- ware Engineering]: Software / Program Verification—Correctness ventions – these are part of the interface to compiled code – but up proofs, Formal methods to that, code from any source that behaves sufficiently like code generated by the compiler should meet the specification, and this General Terms Languages, theory, verification should be independently verifiable. Keywords Compiler verification, denotational semantics, biorthog- Ideally, one might wish to establish the sense in which a compi- onality, step-indexing, proof assistants lation scheme is fully abstract, meaning that the compiled versions of two source phrases of some type are in the low-level relation interpreting that type iff the original source phrases are contextu- 1. Introduction ally equivalent. If low-level specifications are used for checking Proofs of compiler correctness have been studied for over forty linking between the results of compiling open programs and code years (McCarthy and Painter 1967; Dave 2003) and have recently from elsewhere1 and full abstraction does not hold, then source been the subject of renewed attention, firstly because of increased level abstractions become ‘leaky’: reasoning about equivalence or interest in security and certification in a networked world and sec- encapsulation at the source level does not generally translate to ondly because of advances in verification technology, both theoret- the target, which can lead to unsound program transformations in ical (e.g. separation logic, step-indexed logical relations) and prac- optimizing compilers and to security vulnerabilities (Abadi 1998; tical (e.g. developments in model checking and improvements in Kennedy 2006). Ahmed and Blume(2008) also argue that fully interactive proof assistants). abstract translation should be the goal, and prove full abstraction There are many notions of correctness or safety that one might for (source to source) typed closure conversion for a polymorphic wish to establish of a compiler. For applying language-based tech- lambda calculus with recursive and existential types. Later on we niques in operating systems design, as in proof-carrying code, one will say something about why we believe that ‘full’ full abstraction is primarily interesting in broad properties such as type-safety, may not, in practice, be quite the right goal, but we certainly do memory-safety or resource-boundedness. Although these terms are want ‘sufficiently abstract’ compilation, i.e. the preservation of the widely used, they are subject to a range of interpretations. For ex- reasoning principles that we actually use in an optimizing compiler ample, type-safety sometimes refers to a simple syntactic notion or in proving security properties. (‘is the generated code typable using these rules?’) and sometimes The low-level relations of our previous work are not, however, really even sufficiently abstract, having roughly comparable power to a denotational semantics in continuation-passing style (CPS). This is a very strong and useful constraint on the behaviour of Permission to make digital or hard copies of all or part of this work for personal or machine-code programs, but does not suffice to prove all the equa- classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute 1 This is obviously important for foreign function interfaces and multilan- to lists, requires prior specific permission and/or a fee. guage interoperability, but can be an issue even for separate compilation ICFP’09, August 31–September 2, 2009, Edinburgh, Scotland, UK. using the same compiler. It also covers the simpler and ubiquitous case of Copyright c 2009 ACM 978-1-60558-332-7/09/08. $5.00 handcrafted implementations of standard library routines. tions we might like between low-level programs: even something yield complete systems. The plugging −◦− : P ×C ! S might be as simple as the commutativity of addition does not hold for arbi- effected by appending bits of program, substituting terms, applying trary integer computations (just as it doesn’t in a lambda calculus continuations to arguments or composing processes in parallel. In ? with control), even though it does in our pure source language. To any such situation, there is contravariant map (·) : P(P) ! P(C) understand how to refine our low-level relations further, it is natural given by to look at logical relations between low-level code and elements of P ? = fc 2 C j 8p 2 P; p ◦ c 2 Og the domains arising in a standard, direct-style, denotational model ? of our language, which is what we’ll do here. Given such a typed and a homonymous one in the other direction, (·) : P(C) ! P(P) relation between high-level semantics and low-level programs, a ? low-level notion of typed equivalence can be generated by consid- C = fp 2 P j 8c 2 C; p ◦ c 2 Og ering pairs of low-level programs that are related to some common yielding a contravariant Galois connection, so that, amongst many high-level denotational value. other things, (·)?? is a closure operator (inflationary and idempo- The relations we define will establish a full functional correct- tent) on (P), and that any set of the form C? is (·)??-closed.2 ness theorem for a simple compiler, not merely a semantic type P The binary version of this construction starts with a binary rela- safety theorem. Just as for type-safety, there are several approaches tion (e.g. an equivalence relation) on S and proceeds in the obvious to formulating such correctness theorems in the literature. A com- way. mon one, used for example by Leroy(2006), is to define an oper- For compiler correctness, we want the interpretations of source- ational semantics for both high- and low-level languages and then level types or, indeed, source-level values, to be compositional and establish a simulation (or bisimulation) result between source pro- extensional: properties of low-level program fragments that we can grams and their compiled versions, allowing one to conclude that check independently and that make statements about the observable if a closed high-level program terminates with a particular observ- behavior of the complete configurations that arise when we plug able result, then its compiled version terminates with the same re- the fragments into ‘appropriate’ contexts. This set of ‘appropriate’ sult, and often the converse too (Hardin et al. 1998; Leroy and contexts can be thought of as a set of tests: a low-level fragment Grall 2009). The limitation of these simulation-based theorems is is in the interpretation of a source type, or correctly represents that they are not as compositional (modular) or extensional (be- a source value, just when it passes all these tests. Thus these havioural) as we would like, in that they only talk about the be- ?? haviour of compiled code in contexts that come from the same com- low-level interpretations will naturally be (·) -closed sets. For a piler, and usually specify a fairly close correspondence between the simply typed source language, we can define these sets by induction on types, either positively, starting with an over-intensional set (non-observable) intermediate states of the source and target. We ?? would rather have maximally permissive specifications that capture and then taking its (·) -closure, or negatively, by first giving the full range of pieces of low-level code that, up to observations, an inductive definition of a set of contexts and then taking its behave like, or realize, a particular piece of high-level program.

Biorthogonality, Step-Indexing and Compiler Correctness

Forms of Semantic Speci Cation

A Rational Deconstruction of Landin's SECD Machine

From Interpreter to Compiler and Virtual Machine: a Functional Derivation Basic Research in Computer Science

An Introduction to Landin's “A Generalization of Jumps and Labels”

Foundations of Functional Programming

Simplification of Abstract Machine for Functional Language and Its Theoretical Investigation

Abstract Λ-Calculus Machines

Lambda and Pi Calculi, CAM and SECD Machines

Programming Languages and Lambda Calculi (Utah CS7520 Version)

Implementing Functional Languages with Abstract Machines

Closure (Computer Programming) - Wikipedia, the Free Encyclopedia

Deriving the Full-Reducing Krivine Machine from the Small-Step Operational Semantics of Normal Order *