Modular Refinement Using Cogent's Principled Foreign Function Interface

Overcoming Restraint: Modular Refinement using Cogent’s Principled Foreign Function Interface Louis Cheung ! UNSW Sydney, Australia Liam O’Connor ! Ï University of Edinburgh, United Kingdom Christine Rizkallah ! Ï UNSW Sydney, Australia Abstract Cogent is a restricted functional language designed to reduce the cost of developing verified systems code. However, Cogent does not support recursion nor iteration, and its type system imposes restrictions that are sometimes too strong for low-level system programming. To overcome these restrictions, Cogent provides a foreign function interface (FFI) between Cogent and C which allows for implementing those parts of the system which cannot be expressed in Cogent, such as data structures and iterators over these data structures, to be implemented in C and called from Cogent. The Cogent framework automatically guarantees correctness of the overall Cogent-C system when provided proofs that the C components are functionally correct and satisfy Cogent’s FFI constraints. We previously implemented file systems in Cogent and verified key file system operations. However, the C components and the FFI constraints that define the Cogent-C interoperability were axiomatized. In this paper, we verify the correctness and FFI constraints of the C implementation of word arrays used in the file systems. We demonstrate how these proofs modularly compose with existing Cogent theorems and result in a functional correctness theorem of the overall Cogent-C system. This demonstrates that Cogent’s FFI constraints ensure correct and safe inter-language interoperability. Keywords and phrases compilers, verification, type-systems, language interoperability, data-structures Acknowledgements We thank Abdallah Saffidine, Amos Robinson, Jashank Jeremy, and Vincent Jackson for their feedback on the paper. Special thanks to Jashank for the heavy editing. We highly appreciate Vincent’s proof engineering work, over the past couple of years, on maintaining the compiler proof chain and the BilbyFs proofs on ever-changing versions Cogent and Isabelle/HOL. 1 Introduction Cogent [14] is a restricted pure functional language and a certifying compiler [17, 14] designed to ease creating high-assurance systems [3]. It has a foreign function interface (FFI) that enables implementing parts of a system in C. Cogent’s main restrictions are the purposeful lack of recursion and iteration, which ensures totality, and its uniqueness type arXiv:2102.09920v2 [cs.PL] 18 Mar 2021 system, which enforces a uniqueness invariant that guarantees memory safety. Even in the restricted target domains of Cogent, real programs contain some amount of iteration, primarily over specific data structures. This is achieved through Cogent’s integrated FFI: engineers provide data structures and functions to operate on these data structures, including iterators, in a special dialect of C, and imports them into Cogent, including in formal reasoning. The special C code, called anti-quoted C, is C code that can refer to Cogent data types and functions, translated into standard C along with the Cogent program by the Cogent compiler. As long as the C components respect Cogent’s foreign function interface — i.e., are correct and respect the uniqueness invariant — the Cogent framework guarantees soundness for overall Cogent-C system. 2 Modular Refinement using Cogent’s Principled Foreign Function Interface We previously implemented two real-world Linux file systems in Cogent— ext2 and BilbyFs — and verified key operations of BilbyFs3 [ ]. This demonstrated Cogent’s suitability for implementing systems code, and significantly reducing the cost of verification. The implementations of ext2 and BilbyFs rely on an external C library of data structures. This library includes fixed-length word arrays along with iterators for implementing loops, and Cogent stubs for accessing a range of the Linux kernel’s internal APIs. This library was carefully designed to ensure compatibility with Cogent’s FFI constraints. We had previously only verified the Cogent parts of these file system operations, and axiomatized statements of the underlying C correctness and FFI constraints defining Cogent-C interoperability. To fully verify a system written in Cogent and C, one needs to provide manually-written abstractions of the C parts, and manually prove refinement through Cogent’s FFI. The effort required for this manual verification remains substantial, but the reusability ofthese libraries allows for the amortisation this verification cost across different systems. In this paper, we verify the correctness and uniqueness invariants of the word array implementation used in the BilbyFs file system. This demonstrates that is it possible for theC components of a real-world Cogent-C system to satisfy Cogent’s FFI conditions. We demonstrate that the compiler-generated refinement proofs guaranteeing semantic preser- vation and memory safety compose with manual C proofs at each intermediate level up to Cogent’s generated shallow embedding. Our specification of word arrays as Isabelle/HOL lists and verification of array operations against abstract Isabelle/HOL functions on lists, including an array iterator specified as a map-accumulate function over a fragment of a list, are reusable beyond the context of Cogent. Moreover, our proofs are reusable in the verification of any future Cogent-C system that uses word arrays. Our results show that Cogent’s two-language approach is not only suitable for developing real-world systems but also for verification. We show how to modularly compose compiler generated theorems with external theorems. This demonstrates how language interoperability can help guarantee correctness of an overall system. More generally, this work provides our community with a case-study demonstrating how a principled foreign function interface between a high-level functional language with a safe type-system and a low-level imperative language with an unsafe type system can be structured in order to modularly achieve refinement guarantees of the overall system. In particular, our work supports, and is well-described by, Ahmed’s claim that: “compositional compiler correctness is, in essence, a language interoperability problem: for viable solutions in the long term, high-level languages must be equipped with principled foreign-function interfaces that specify safe interoperability between high-level and low-level components, and between more precisely and less precisely typed code.” [1] Our code and proofs are available online [4]. 2 Cogent The Cogent programming language [14] is intended for the implementation of operating system components such as file systems. It is a purely functional language, but itcanbe compiled into efficient C code suitable for systems programming. The Cogent compiler produces three artefacts: C code, a shallow embedding of the Cogent code in Isabelle/HOL [13], and a formal refinement theorem relating the two [14, 17] (arrow (1) in Figure 1). The refinement theorem and proof rely on several intermediate embeddings also generated by the Cogent compiler, some related through language level L. Cheung, L. O’Connor, and C. Rizkallah 3 Overview Verification Correctness Framework legend Correctness ⊑ refines manual Cogent Cogent generated Shallow Embedding Shallow Embedding manual proofs (6) generated proofs (3) Memory Safety ⊑ ⊑ interface refines refines Polymorphic Cogent Polymorphic Cogent (2) (5) (value semantics) (value semantics ) Isabelle/HOL ξv Systems Data-structures ⊑ ⊑ refines refines Monomorphic Cogent Monomorphic Cogent (1) (4) (value semantics) (value semantics ξv) ⊑ ⊑ C code Systems Data-structures refines refines + safe + safe Monomorphic Cogent Monomorphic Cogent Cogent compiler (update semantics) (update semantics ξu ) ⊑ ⊑ Cogent Template C refines refines Program Data-structures C System C Data-structure (imported to Isabelle) (imported to Isabelle) Figure 1 An overview of the Cogent framework (left) and the verification phases (right). proofs, and others through translation validation phases (Section 2.3). The compiler certificate guarantees that correctness theorems proven on top of the shallow embedding also hold for the generated C, which eases verification, and serves as the basis for further functional correctness proofs (Figure 1, arrow (3)). A key part of the compiler certificate describes Cogent’s uniqueness type system, which enforces that each mutable heap object has exactly one active pointer in scope at any point in time. This uniqueness invariant allows modelling imperative computations as pure functions: the allocations and repeated copying commonly found in functional programming can be replaced with destructive updates, and the need for garbage collection is eliminated, resulting in predictable and efficient code. Well-typed Cogent programs have two interpretations: a pure-functional value semantics, which has no notion of a heap and treats all objects as immutable values, and an imperative update semantics, describing the destructive mutation of heap objects. These two semantic interpretations correspond (Section 2.3), meaning that any reasoning about the functional semantics also applies to the imperative semantics. This correspondence further guarantees that well-typed Cogent programs are memory safe (Figure 1, arrow (2)). As mentioned, the underlying C data structures of our two file systems were carefully designed to ensure compatibility with Cogent’s FFI constraints (Section 2.2). But the verification of the file systems previously

Modular Refinement Using Cogent's Principled Foreign Function Interface

Type-Driven Development of Concurrent Communicating Systems

Input and Output in Fuctional Languages

LINEAR/NON-LINEAR TYPES for EMBEDDED DOMAIN-SPECIFIC LANGUAGES Jennifer Paykin

Type-Driven Development of Concurrent Communicating Systems

Wandering Through Linear Types, Capabilities, and Regions

Uniqueness Typing for Resource Management in Message-Passing Concurrency

Making Uniqueness Typing Less Unique Thesis Submitted for the Degree of Doctor in Philosophy

The Yampa Arcade∗

Exploiting Functional Invariants to Optimise Parallelism: a Dataﬂow Approach

Linear Haskell Practical Linearity in a Higher-Order Polymorphic Language

Tackling the Awkward Squad: Monadic Input/Output, Concurrency, Exceptions, and Foreign-Language Calls in Haskell

SAC — a Functional Array Language for Efficient Multi-Threaded Execution