Scaling symbolic evaluation for automated verification of systems code with Serval

Luke Nelson (University of Washington), James Bornholt (University of Washington), Ronghui Gu (Columbia University), Andrew Baumann (Microsoft Research), Emina Torlak (University of Washington), Xi Wang (University of Washington)

Abstract

This paper presents Serval, a framework for developing automated verifiers for systems software. Serval provides an extensible infrastructure for creating verifiers by lifting interpreters under symbolic evaluation, and a systematic approach to identifying and repairing verification performance bottlenecks using symbolic profiling and optimizations.

Using Serval, we build automated verifiers for the RISC-V, x86-32, LLVM, and BPF instruction sets. We report our experience of retrofitting CertiKOS and Komodo, two systems previously verified using Coq and Dafny, respectively, for automated verification using Serval, and discuss trade-offs of different verification methodologies. In addition, we apply Serval to the Keystone security monitor and the BPF compilers in the Linux kernel, and uncover 18 new bugs through verification, all confirmed and fixed by developers.

ACM Reference Format:
Luke Nelson, James Bornholt, Ronghui Gu, Andrew Baumann, Emina Torlak, and Xi Wang. 2019. Scaling symbolic evaluation for automated verification of systems code with Serval. In ACM SIGOPS 27th Symposium on Operating Systems Principles (SOSP '19), October 27–30, 2019, Huntsville, ON, Canada. ACM, New York, NY, USA, 18 pages. https://doi.org/10.1145/3341301.3359641

1 Introduction

Formal verification provides a general approach to proving critical properties of systems software [48]. To verify the correctness of a system, developers write a specification of its intended behavior, and construct a machine-checkable proof to show that the implementation satisfies the specification. This process is effective at eliminating entire classes of bugs, ranging from memory-safety vulnerabilities to violations of functional correctness and information-flow policies [49].

But the benefits of formal verification come at a considerable cost. Writing proofs requires a time investment that is usually measured in person-years, and the size of proofs can be several times or even more than an order of magnitude larger than that of implementation code [49: §7.2].

The push-button verification approach [65, 74, 75] frees developers from such proof burden through co-design of systems and verifiers to achieve a high degree of automation, at the cost of generality. This approach asks developers to design interfaces to be finite so that the semantics of each interface operation (such as a system call) is expressible as a set of traces of bounded length (i.e., the operation can be implemented without using unbounded loops). Given the problem of verifying a finite implementation against its specification, a domain-specific automated verifier reduces this problem to a satisfiability query using symbolic evaluation [32] and discharges the query with a solver such as Z3 [31].

While promising, this co-design approach raises three open questions: How can we write automated verifiers that are easy to audit, optimize, and retarget to new systems? How can we identify and repair verification performance bottlenecks due to path explosion? How can we retrofit systems that were not designed for automated verification? This paper aims to answer these questions.

First, we present Serval, an extensible framework for writing automated verifiers. In prior work on push-button verification [65, 74, 75], a verifier implements symbolic evaluation for specific systems, which both requires substantial expertise and makes the resulting verifiers difficult to reuse. In contrast, Serval developers write an interpreter for an instruction set using Rosette [80, 81], an extension of the Racket language [35] for symbolic reasoning. Serval leverages Rosette to "lift" an interpreter into a verifier; we use the term "lift" to refer to the process of transforming a regular program to work on symbolic values [76].
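The notion of "lifting" can be made concrete with a toy example. The following Python sketch is ours, not Serval's Rosette machinery; Sym and Ite are hypothetical stand-ins for Rosette's symbolic values. It shows the same function interpreting a concrete input directly, while producing a merged symbolic result when the input is unknown:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Sym:
    """A symbolic value, known only by name (hypothetical)."""
    name: str

@dataclass(frozen=True)
class Ite:
    """A symbolic conditional, ite(cond, then, other) (hypothetical)."""
    cond: str
    then: object
    other: object

def sign(x):
    # On a concrete int, this is ordinary interpretation.
    if not isinstance(x, Sym):
        return 1 if x > 0 else (-1 if x < 0 else 0)
    # "Lifted" execution: instead of branching on an unknown value,
    # return a merged symbolic result that covers both outcomes.
    return Ite(f"{x.name} < 0", -1, Ite(f"{x.name} > 0", 1, 0))

assert sign(42) == 1 and sign(-7) == -1
assert sign(Sym("x")) == Ite("x < 0", -1, Ite("x > 0", 1, 0))
```

A real lifted interpreter runs unchanged on symbolic inputs because the host language (Rosette) intercepts operations on symbolic values; the isinstance dispatch here is only a stand-in for that mechanism.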
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the owner/author(s).
SOSP '19, October 27–30, 2019, Huntsville, ON, Canada. © 2019 Copyright held by the owner/author(s). ACM ISBN 978-1-4503-6873-5/19/10. https://doi.org/10.1145/3341301.3359641

Using Serval, we build automated verifiers for RISC-V [85], x86-32, LLVM [54], and Berkeley Packet Filter (BPF) [37]. These verifiers are simple and easy to understand, and inherit vital optimizations from Rosette for free, such as constraint caching [18], partial evaluation [45], and state merging [52]. They are also reusable and interoperable; for instance, we apply the RISC-V verifier to verify the functional correctness of security monitors, and (together with the BPF verifier) to find bugs in the Linux kernel's BPF just-in-time (JIT) compilers.

Second, the complexity of systems software makes automated verification computationally intractable. In practice, this complexity manifests as path explosion during symbolic evaluation, which both slows down the generation of verification queries and leads to queries that are too difficult for the solver to discharge. A key to scalability is thus for verifiers to minimize path explosion, and to produce constraints that are amenable to solving with effective decision procedures and heuristics. Both of these tasks are challenging, as evidenced by the large body of research addressing each [5].

To address this challenge, Serval adopts recent advances in symbolic profiling [13], a systematic approach to identifying performance bottlenecks in symbolic evaluation. By manually analyzing profiler output, we develop a catalog of common performance bottlenecks for automated verifiers, which arise from handling indirect branches, memory accesses, trap dispatching, etc. To repair such bottlenecks, Serval introduces a set of symbolic optimizations, which enable verifiers to exploit domain knowledge to improve the performance of symbolic evaluation and produce solver-friendly constraints for a class of systems. As we will show, these symbolic optimizations are essential to scale automated verification.

Third, automated verifiers restrict the types of properties and systems that can be verified in exchange for proof automation. To understand what changes are needed to retrofit an existing system to automated verification, and the limitations of this methodology, we conduct case studies using two state-of-the-art verified systems: CertiKOS [30], a security monitor that provides strict isolation on x86 and is verified using the Coq interactive theorem prover [79]; and Komodo [36], a security monitor that provides software enclaves on ARM and is verified using the Dafny auto-active theorem prover [56]. Our case studies show that the interfaces of such low-level, lightweight security monitors are already finite, making them a viable target for Serval, even though they were not designed for automated verification.

We port both systems to a unified platform on RISC-V. The ported systems run on a 64-bit U54 core [73] on a HiFive Unleashed development board. Like the original efforts, we prove the monitors' functional correctness through refinement [53] and higher-level properties such as noninterference [39]; unlike them, we do so using automated verification of the binary images. We make changes to the original interfaces, implementations, and verification strategies to improve security and achieve proof automation, and summarize the guidelines for making these changes. As with the original systems, the ported monitors do not provide formal guarantees about concurrency or side channels (see §3).

In addition, Serval generalizes to use cases beyond standard refinement-based verification. As case studies, we apply Serval to two existing systems: the Keystone security monitor [55] and the BPF JIT compilers targeting RISC-V and x86-32 in the Linux kernel. We find a total of 18 new bugs, all confirmed and fixed by developers.

Through this paper, we address the key concerns facing developers looking to apply automated verification: the effort required to write verifiers, the difficulty of diagnosing and fixing performance bottlenecks in these verifiers, and the applicability of this approach to existing systems. Serval enables us, with a reasonable effort, to develop multiple verifiers, apply the verifiers to a range of systems, and find previously unknown bugs. As an increasing number of systems are designed with formal verification in mind [29, 55, 71, 88], we hope that our experience and discussion of three representative verification methodologies (CertiKOS, Komodo, and Serval) will help better understand their trade-offs and facilitate a wider adoption of verification in systems software.

In summary, the main contributions of this paper are as follows: (1) the Serval framework for writing reusable and scalable automated verifiers by lifting interpreters and applying symbolic optimizations; (2) automated verifiers for RISC-V, x86-32, LLVM, and BPF built using Serval; and (3) our experience retrofitting two prior security monitors to automated verification and finding bugs in existing systems.

2 Related work

Interactive verification. There is a long and rich history of using interactive theorem provers to verify the correctness of systems software [48]. These provers provide expressive logics for developers to manually construct a correctness proof of an implementation with respect to a specification. A pioneering effort in this area is the KIT kernel [8], which demonstrates the feasibility of verification at the machine-code level. It consists of roughly 300 lines of instructions and is verified using the Boyer-Moore theorem prover [14].

The seL4 project [49, 50] achieved the first functional correctness proof of a general-purpose OS kernel. seL4 is written in about 10,000 lines of C and assembly, and verified using the Isabelle/HOL theorem prover [66]. This effort took 11 person-years, with a proof-to-implementation ratio of 20:1. The proof consists of two refinement steps: from an abstract specification to an executable specification, and further to the C implementation. Assembly code (e.g., register save/restore and context switch) and boot code are assumed to be correct and left unverified. Extensions include propagating functional correctness from C to machine code through translation validation [72], co-generating code and proof from a high-level language [2, 67], and a proof of noninterference [64].

CertiKOS presents a layered approach for verifying the correctness of an OS kernel with a mix of C and assembly code [40]. CertiKOS adapts CompCert [58], a verified C compiler, both to reason about assembly code directly and to prove properties about C code and propagate guarantees to the assembly level. CertiKOS thus verifies the entire kernel, including assembly and boot code, all using the Coq theorem prover [79]. An extension of CertiKOS achieves the first functional correctness proof of a concurrent kernel [41], which is beyond the scope of this paper. We use the publicly available uniprocessor version of CertiKOS, described by Costanzo et al. [30], as a case study for retrofitting systems to automated verification; unless otherwise noted, "CertiKOS" refers to this version. It includes a proof of noninterference.

Despite the manual proof burden, the expressiveness of interactive theorem provers enables developers to verify properties of complex data structures, such as tree sequences in the FSCQ file system [21, 22]. It is also effective in establishing whole-system correctness across layers. Two notable examples are Verisoft [1], which provides formal guarantees for a microkernel from source code down to hardware at the gate level [9]; and Bedrock [24], which verifies web applications for robots down to the assembly level.

Auto-active verification. In contrast to interactive theorem provers, auto-active theorem provers [57] ask developers to write proof annotations [33] on implementation code, such as preconditions, postconditions, and loop invariants. The prover translates the annotated code into a verification condition and invokes a constraint solver to check its validity. The use of solvers reduces the proof burden for developers, whose task is instead to write effective annotations to guide the proof search. This approach powers a variety of verification efforts for systems software, such as security invariants in ExpressOS [62] and type safety in the Verve OS [87]. Ironclad [43] marks a milestone of verifying a full stack from application code to the underlying OS kernel. The verification takes two steps: first, developers write high-level specifications, implementations, and annotations, all using the Dafny theorem prover [56]; next, an untrusted compiler translates the implementation to a verifiable form of x86 assembly, which the Boogie verifier [6] checks against a low-level specification converted from the high-level Dafny one. The final code runs on Verve. This effort took 3 person-years and achieved a proof-to-implementation ratio of 4.8:1.

Komodo [36] is a verified security monitor for software enclaves. It proves functional correctness and noninterference properties that preclude the OS from affecting or being influenced by enclave behavior. Like Ironclad, the specification and some proofs are written in Dafny. Unlike Ironclad, the implementation is written in a structured form of ARM assembly using Vale [12], which enables Komodo to run on bare hardware. We use Komodo as a case study in this paper.

Push-button verification. The push-button approach considers automated verification as a first-class design goal for systems, redirecting developers' efforts from proofs to interface design, specification, and implementation. To achieve this goal, developers design finite interfaces that can be implemented without using unbounded loops. An automated verifier then performs symbolic evaluation over implementation code to generate constraints and invokes a solver for verification. In contrast to auto-active verification, this approach favors proof automation by limiting the properties and systems that can be verified. Examples of such systems include the Yggdrasil file systems [74], Hyperkernel [65], and extensions to information-flow control using Nickel [75]. These automated verifiers are implemented using Python, with heuristics to avoid path explosion during symbolic evaluation. Inspired by these efforts, Serval builds on the Rosette language [80, 81], which powers a range of symbolic reasoning tools [13: §5]. Rosette provides a formal foundation for Serval, enabling a systematic approach for scaling both the development effort and performance of automated verifiers.

Bug finding. Both symbolic execution [27, 47] and bounded model checking [10] can be used to find bugs in systems code, using tools like KLEE [17], S2E [23], and SAGE [38]; see Baldoni et al. [5] for a survey. These tools are effective at finding bugs, but usually cannot prove their absence. It is possible to use bug-finding tools to exhaust all execution paths and thus perform verification in some settings. Examples include verifying memory safety of a file parser using SAGE [25] and verifying software dataplanes using S2E [34]. Like these tools, Serval uses symbolic evaluation to encode the semantics of implementation code. Unlike them, Serval provides refinement-based verification and more expressive properties (e.g., functional correctness and noninterference). To improve the performance of symbolic evaluation, research has explored better heuristics for state merging [52] and transformations of implementation code [16, 82]. Serval uses symbolic profiling [13] to identify performance bottlenecks in verifiers and provides symbolic optimizations that exploit domain knowledge to repair these bottlenecks.

3 The Serval methodology

Our goal is to automate the verification of systems code. This section illustrates how Serval helps achieve this goal by providing an approach for building automated verifiers. We start with an overview of the verification workflow (§3.1), followed by an example of how to write, profile, and optimize a verifier for a toy instruction set (§3.2), and how to prove properties (§3.3). Next, we describe Serval's support for verifying systems code (§3.4). We end this section with a discussion of limitations (§3.5).

3.1 Overview

Figure 1 shows the Serval verification stack. With Serval, developers implement a system using standard languages and tools, such as C and gcc. They write a specification to describe the intended behavior of the system in the Rosette language, which provides a decidable fragment of first-order logic: booleans, bitvectors, uninterpreted functions, and quantifiers over finite domains. Serval provides a library to simplify the task of writing specifications, including state-machine refinement and noninterference properties.
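Restricting specifications to this decidable fragment is what keeps verification automatic: a quantifier over a finite domain can, in principle, always be discharged by enumeration. As a loose Python analogy (ours, and hypothetical; neither Serval nor an SMT solver actually enumerates this way):

```python
# A property stated over fixed-width words and a bounded quantifier
# can be checked exhaustively on a toy scale.
WIDTH = 4  # pretend machine words are 4 bits wide

def spec_sign(x):
    return 1 if x > 0 else (-1 if x < 0 else 0)

def impl_sign(x):
    # a branch-free candidate implementation (bool arithmetic)
    return (x > 0) - (x < 0)

# "forall x in a finite domain": check all 4-bit signed values
domain = range(-(1 << (WIDTH - 1)), 1 << (WIDTH - 1))
assert all(impl_sign(x) == spec_sign(x) for x in domain)
```

An SMT solver reasons about the same finite domains symbolically rather than by enumeration, which is what makes the approach scale beyond toy word widths.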

[Diagram, top to bottom: a system (implementation and specification) feeds into the RISC-V, x86-32, LLVM, and BPF verifiers; the verifiers build on the Serval framework (specification library, symbolic optimizations, support for systems code); Serval builds on Rosette (symbolic evaluation, symbolic profiling, symbolic reflection); Rosette builds on an SMT solver (constraint solving, counterexample generation).]

Figure 1. The Serval verification stack. Curved boxes denote verification input and rounded-corner boxes denote verifiers.

    instruction     description         semantics
    ret             end execution       pc <- 0; halt
    bnez rs, imm    branch if nonzero   pc <- if (rs != 0) then imm else pc + 1
    sgtz rd, rs     set if positive     pc <- pc + 1; rd <- if (rs > 0) then 1 else 0
    sltz rd, rs     set if negative     pc <- pc + 1; rd <- if (rs < 0) then 1 else 0
    li rd, imm      load immediate      pc <- pc + 1; rd <- imm

Figure 2. An overview of the ToyRISC instruction set.

    0: sltz a1, a0   ; a1 <- if (a0 < 0) then 1 else 0
    1: bnez a1, 4    ; branch to 4 if a1 is nonzero
    2: sgtz a0, a0   ; a0 <- if (a0 > 0) then 1 else 0
    3: ret           ; return
    4: li a0, -1     ; a0 <- -1
    5: ret           ; return

Figure 3. A ToyRISC program for computing the sign of a0.

An automated verifier, also written in Rosette, reduces the semantics of the implementation code (e.g., in the form of machine instructions) to symbolic values through symbolic evaluation. Like previous push-button approaches, this step requires loops to be bounded [65], since otherwise symbolic evaluation diverges. When symbolic evaluation is slow or hangs, system developers invoke the Rosette symbolic profiler [13]. The profiler's output identifies the parts of the verifier that cause performance bottlenecks under symbolic evaluation, but it still requires expertise to diagnose the root cause and come up with a fix. Serval provides a set of reusable symbolic optimizations that are sufficient to enable the verifiers from Figure 1 to scale to all the systems studied in this paper. Symbolic optimizations examine the structure of symbolic values (via symbolic reflection [81: §2.3]) and use domain knowledge to rewrite them to be more amenable to verification.

Serval employs Rosette to produce SMT constraints from symbolic values (which encode the meaning of specifications or implementations) and invokes a solver to check the satisfiability of these constraints for verification. If verification fails (e.g., due to an insufficient specification or an incorrect implementation), the solver generates a counterexample, which is visualized by Rosette for debugging.

We emphasize two benefits of the Serval approach. First, it keeps the code and specification cleanly separated, which allows system developers to use standard systems languages (C and assembly) and toolchains (gcc and binutils) for implementation; in contrast, other approaches require developers to use specific compilers and languages that are tailored for verification (§6). Second, Serval makes the proof steps fully automatic, requiring no code-level annotations such as loop invariants. This is made possible by requiring systems to have finite interfaces, which enables Serval to generate verification conditions using all-paths symbolic evaluation.

3.2 Growing an automated verifier

Serval comes with automated verifiers for RISC-V, x86-32, LLVM, and BPF. Writing a new verifier in Serval boils down to writing an interpreter and applying symbolic optimizations. As an example, we use a toy instruction set called ToyRISC (simplified from RISC-V), which consists of five instructions (Figure 2). A ToyRISC machine has a program counter pc and two integer registers, a0 and a1. For simplicity, it has no system registers or memory operations. Figure 3 shows a ToyRISC program that computes the sign of the value in register a0 and stores the result back in a0, using a1 as a scratch register.

Writing an interpreter. Figure 4 lists the full ToyRISC interpreter. It is written in the Rosette language, whose syntax is based on S-expressions. For example, (+ 1 pc) returns the value of one plus that of pc. A function is defined using (define (name args) body ...), and the return value is the result of the last expression in the body. For example, the fetch function returns the result of (vector-ref program pc). Here vector-ref and vector-set! are built-in functions for reading and writing the value at a given index of a vector (i.e., a fixed-length array), respectively.

At a high level, this interpreter defines the CPU state as a structure with pc and a vector of registers regs. The core function interpret takes a CPU state and a program, and runs in a fetch-decode-execute loop until it encounters a ret instruction. We assume the program is a vector of instructions that take the form of 4-tuples (opcode, rd, rs, imm); for instance, "li a0, -1" is stored as (li, 0, #f, -1), where #f denotes a "don't-care" value. The fetch function retrieves the current instruction at pc, and the execute function updates the CPU state according to the semantics of that instruction.

Functions provided by Serval are prefixed with "serval:" for clarity. The ToyRISC interpreter uses two such functions: bug-on specifies conditions under which the behavior is undefined (e.g., pc is out of bounds); and split-pc concretizes a symbolic pc to improve verification performance, which will be detailed later in this section.

Lifting to a verifier. The interpreter behaves as a regular CPU emulator when it runs with a concrete state. For instance, running it with the code in Figure 3 and pc = 0, a0 = 42, a1 = 0 results in pc = 0, a0 = 1, a1 = 0.
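To make this concrete-emulation step reproducible, here is a Python re-implementation of the ToyRISC semantics from Figure 2 (an illustration of ours, not the Rosette code of Figure 4); running it on the sign program yields the state reported above:

```python
# A concrete ToyRISC emulator: cpu is {"pc": int, "regs": [a0, a1]},
# and a program is a list of (opcode, rd, rs, imm) tuples.
def interpret(cpu, program):
    while True:
        op, rd, rs, imm = program[cpu["pc"]]
        if op == "ret":                      # end execution: pc <- 0; halt
            cpu["pc"] = 0
            return
        if op == "bnez":                     # branch if nonzero
            cpu["pc"] = imm if cpu["regs"][rs] != 0 else cpu["pc"] + 1
        elif op == "sgtz":                   # set if positive
            cpu["pc"] += 1
            cpu["regs"][rd] = 1 if cpu["regs"][rs] > 0 else 0
        elif op == "sltz":                   # set if negative
            cpu["pc"] += 1
            cpu["regs"][rd] = 1 if cpu["regs"][rs] < 0 else 0
        elif op == "li":                     # load immediate
            cpu["pc"] += 1
            cpu["regs"][rd] = imm

# the sign program of Figure 3; None plays the role of Rosette's #f
sign_prog = [("sltz", 1, 0, None), ("bnez", None, 1, 4),
             ("sgtz", 0, 0, None), ("ret", None, None, None),
             ("li", 0, None, -1), ("ret", None, None, None)]

c = {"pc": 0, "regs": [42, 0]}
interpret(c, sign_prog)
assert (c["pc"], c["regs"]) == (0, [1, 0])  # matches the state quoted above
```

Running it with a0 = -7 instead ends with a0 = -1 and a1 = 1, via the branch to instruction 4.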

    #lang rosette

    ; import serval core functions with prefix "serval:"
    (require (prefix-in serval: serval/lib/core))

    ; cpu state: program counter and integer registers
    (struct cpu (pc regs) #:mutable)

    ; interpret a program from a given cpu state
    (define (interpret c program)
      (serval:split-pc [cpu pc] c
        ; fetch an instruction to execute
        (define insn (fetch c program))
        ; decode an instruction into (opcode, rd, rs, imm)
        (match insn
          [(list opcode rd rs imm)
           ; execute the instruction
           (execute c opcode rd rs imm)
           ; recursively interpret a program until "ret"
           (when (not (equal? opcode 'ret))
             (interpret c program))])))

    ; fetch an instruction based on the current pc
    (define (fetch c program)
      (define pc (cpu-pc c))
      ; the behavior is undefined if pc is out-of-bounds
      (serval:bug-on (< pc 0))
      (serval:bug-on (>= pc (vector-length program)))
      ; return the instruction at program[pc]
      (vector-ref program pc))

    ; shortcut for getting the value of register rs
    (define (cpu-reg c rs)
      (vector-ref (cpu-regs c) rs))

    ; shortcut for setting register rd to value v
    (define (set-cpu-reg! c rd v)
      (vector-set! (cpu-regs c) rd v))

    ; execute one instruction
    (define (execute c opcode rd rs imm)
      (define pc (cpu-pc c))
      (case opcode
        [(ret)  ; return
         (set-cpu-pc! c 0)]
        [(bnez) ; branch to imm if rs is nonzero
         (if (! (= (cpu-reg c rs) 0))
             (set-cpu-pc! c imm)
             (set-cpu-pc! c (+ 1 pc)))]
        [(sgtz) ; set rd to 1 if rs > 0, 0 otherwise
         (set-cpu-pc! c (+ 1 pc))
         (if (> (cpu-reg c rs) 0)
             (set-cpu-reg! c rd 1)
             (set-cpu-reg! c rd 0))]
        [(sltz) ; set rd to 1 if rs < 0, 0 otherwise
         (set-cpu-pc! c (+ 1 pc))
         (if (< (cpu-reg c rs) 0)
             (set-cpu-reg! c rd 1)
             (set-cpu-reg! c rd 0))]
        [(li)   ; load imm into rd
         (set-cpu-pc! c (+ 1 pc))
         (set-cpu-reg! c rd imm)]))

Figure 4. A ToyRISC interpreter using Serval (in Rosette).

[Diagram: a tree of states s0 through s15. From s0 (pc -> 0, a0 -> X, a1 -> Y), executing sltz splits on X < 0 and merges into s3 with a1 -> ite(X < 0, 1, 0); executing bnez splits into s4 (pc -> 4) and s5 (pc -> 2) and merges into s6 with a symbolic pc -> ite(X < 0, 4, 2); split-pc re-splits s6 into s7 (pc -> 4) and s10 (pc -> 2), which execute "li a0, -1" and sgtz respectively; both paths reach ret and merge into s15 with pc -> 0, a0 -> ite(X < 0, -1, ite(X > 0, 1, 0)), and a1 -> ite(X < 0, 1, 0).]

Figure 5. Symbolic evaluation of the sign program (Figure 3) using the ToyRISC interpreter (Figure 4).

What is more interesting is that, given a symbolic state, Rosette runs the interpreter with a ToyRISC program under symbolic evaluation; this encodes all possible behaviors of the program, lifting the interpreter to become a verifier. Consider the following code snippet:

    (define-symbolic X Y integer?)  ; X and Y are symbolic
    (define c (cpu 0 (vector X Y))) ; symbolic cpu state
    (define program ...)            ; the sign program
    (interpret c program)           ; symbolic evaluation

The snippet uses the built-in define-symbolic form to create two symbolic integers X and Y, which represent arbitrary values of type integer. The two symbolic integers are assigned to registers a0 and a1, respectively, as part of a symbolic state. Figure 5 shows the process and result of running the interpreter with the symbolic state. Here "ite" denotes a symbolic conditional expression; for example, the value of ite(X < 0, 1, 0) is 1 if X < 0 and 0 otherwise.

We give a brief overview of the symbolic evaluation process. Like other symbolic reasoning tools [19], Rosette relies on two basic strategies: symbolic execution [27, 47] and bounded model checking [10]. The former explores each path separately, which creates more opportunities for concrete evaluation but can lead to an exponential number of paths; the latter merges the program state at each control-flow join, which creates compact encodings (polynomial in program size) but can lead to constraints that are difficult to solve [52]. Rosette employs a hybrid strategy [81], which works well in most cases. For instance, after executing sltz in Figure 5, Rosette merges the states for the X < 0 and not(X < 0) cases, resulting in a single state s3; without merging, it would have to explore twice as many paths.

However, no single evaluation strategy is optimal for all programs. This is a key challenge in scaling symbolic tools [13]. For example, pc in state s6 becomes symbolic due to state merging of both cases of bnez.
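The merging step itself can be sketched in a few lines. The following toy Python merge is ours (Rosette's actual strategy is more sophisticated); it shows how joining the two bnez outcomes leaves shared values intact but turns the program counter into an ite term:

```python
# Merge two states under a branch condition: values that agree are kept
# as-is; values that differ become symbolic ite terms.
def merge(cond, s_then, s_else):
    return {k: s_then[k] if s_then[k] == s_else[k]
               else ("ite", cond, s_then[k], s_else[k])
            for k in s_then}

# the two outcomes of bnez in Figure 5 (s4: taken, s5: fall-through)
s4 = {"pc": 4, "a0": "X", "a1": ("ite", "X<0", 1, 0)}
s5 = {"pc": 2, "a0": "X", "a1": ("ite", "X<0", 1, 0)}

s6 = merge("X<0", s4, s5)
assert s6["pc"] == ("ite", "X<0", 4, 2)  # pc is now symbolic
assert s6["a0"] == "X"                   # shared values stay untouched
```

The symbolic pc produced here is exactly the value that split-pc later re-splits into concrete cases.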
A symbolic pc slows down the verifier—the fetch function will explore many infeasible paths—and can even prevent symbolic evaluation from terminating, if the condition at line 20 in Figure 4 becomes symbolic and leads to unbounded recursion. To avoid this issue, the verifier uses the split-pc symbolic optimization to force a split on each possible (concrete) pc value.

Diagnosing performance bottlenecks. Suppose that the code in Figure 4 did not invoke split-pc, causing verification to be slow or even hang. How can we find the performance bottleneck? This is challenging since common profiling metrics such as time or memory consumption cannot identify the root causes of performance problems in symbolic code.

Symbolic profiling [13] addresses this challenge with a performance model for symbolic evaluation. To find the bottleneck in the ToyRISC verifier, we run it with the Rosette symbolic profiler, which produces an interactive web page. The page shows statistics of symbolic evaluation for each function (e.g., the number of symbolic values, path splits, and state merges), and ranks function calls based on a score computed from these statistics to suggest likely bottlenecks.

We find the ranks particularly useful. For example, when profiling the ToyRISC verifier without split-pc, the top two functions suggested by the profiler are execute within interpret and vector-ref within fetch. The first location is not surprising, as execute implements the core functionality, but vector-ref is a red flag. Combined with the statistics showing a large number of state merges in vector-ref, one can conclude that this function explodes under symbolic evaluation due to a symbolic pc, producing a merged symbolic instruction that represents all possible concrete instructions in the program. This, in turn, causes the verifier to execute every possible concrete instruction at every step.

Symbolic profiling offers a systematic approach for identifying performance bottlenecks during symbolic evaluation. However, symbolic profiling cannot identify performance issues in the solver, which is beyond its scope. One such example is the use of nonlinear arithmetic, which is inherently expensive to solve [46]; one may adopt practice from prior verification efforts to sidestep such issues [36, 43, 65].

Applying symbolic optimizations. Having identified verification performance bottlenecks, where and how should we fix them? Optimizations in the solver are not effective for fixing bottlenecks during symbolic evaluation. More importantly, fixing bottlenecks usually requires domain knowledge not present in Rosette or the solver, such as the set of feasible values for a symbolic pc.

Serval provides symbolic optimizations for a verifier to fine-tune symbolic evaluation using domain knowledge. Doing so can both improve the performance of symbolic evaluation and reduce the complexity of symbolic values generated by a verifier; the latter consequently leads to simpler SMT constraints and faster solving.

As for the ToyRISC verifier, state merging on the pc slows down symbolic evaluation, while state merging on other registers is useful for compact encodings. Therefore, the verifier applies split-pc to the program counter, leaving registers a0 and a1 unchanged. After this change, vector-ref disappears from the profiler's output. We use this process to identify other common bottlenecks and develop symbolic optimizations (§4).

3.3 Verifying properties

With the ToyRISC verifier, we show examples of properties that can be verified using Serval.

Absence of undefined behavior. As shown in Figure 4, a verifier uses bug-on to insert checks based on undefined behavior specified by the instruction set. Serval collects each bug-on condition and proves that it must be false under the current path condition [84: §3.2.1]. Serval's LLVM verifier also reuses checks inserted by Clang's UndefinedBehaviorSanitizer [78] to detect undefined behavior in C code.

State-machine refinement. Serval provides a standard definition of state-machine refinement for proving functional correctness of an implementation against a specification [53]. It asks system developers for four specification inputs: (1) a definition of specification state, (2) a functional specification that describes the intended behavior, (3) an abstraction function AF that maps an implementation state (e.g., cpu in Figure 4) to a specification state, and (4) a representation invariant RI over an implementation state that must hold before and after executing a program.

Consider implementation state c and the corresponding specification state s such that AF(c) = s. Serval reduces the resulting states of running the implementation from state c and running the functional specification from state s to symbolic values, denoted as fimpl(c) and fspec(s), respectively. It checks that the implementation preserves the representation invariant: RI(c) ⇒ RI(fimpl(c)). Refinement is formulated so that the implementation and the specification move in lock-step: (RI(c) ∧ AF(c) = s) ⇒ AF(fimpl(c)) = fspec(s).

For example, to prove the functional correctness of the sign program in Figure 3, one may write a (detailed) specification in Serval as follows:

  (struct state (a0 a1)) ; specification state
  ; functional specification for the sign code
  (define (spec-sign s)
    (define a0 (state-a0 s))
    (define sign (cond
                   [(positive? a0) 1]
                   [(negative? a0) -1]
                   [else 0]))
    (define scratch (if (negative? a0) 1 0))
    (state sign scratch))
  ; abstraction function: impl. cpu state to spec. state
  (define (AF c)
    (state (cpu-reg c 0) (cpu-reg c 1)))
  ; representation invariant for impl. cpu state
  (define (RI c)
    (= (cpu-pc c) 0))

Scaling symbolic evaluation for automated verification of systems code with Serval. SOSP ’19, October 27–30, 2019, Huntsville, ON, Canada

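In the same spirit, the step-consistency theorem for this specification (§3.3) can be sanity-checked outside the solver. The Python sketch below enumerates small states instead of issuing a universally quantified query; spec_sign mirrors the Rosette specification above, and the helper names are ours, not Serval's.

```python
# Toy analogue of the step-consistency check for the sign
# specification: the unwinding relation ~ observes only a0, so the
# property says the result is independent of the initial a1.
# (Illustrative only; Serval states this over symbolic states.)

def spec_sign(s):
    a0, _ = s
    sign = 1 if a0 > 0 else (-1 if a0 < 0 else 0)
    scratch = 1 if a0 < 0 else 0
    return (sign, scratch)

def unwind(s1, s2):
    # s1 ~ s2: equal a0, arbitrary a1 ("filter out a1").
    return s1[0] == s2[0]

def check_step_consistency(values=range(-3, 4)):
    states = [(a0, a1) for a0 in values for a1 in values]
    for s1 in states:
        for s2 in states:
            if unwind(s1, s2):
                # s1 ~ s2 ==> spec-sign(s1) ~ spec-sign(s2)
                assert unwind(spec_sign(s1), spec_sign(s2))
    return True
```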
This example shows one possible way to write a functional specification. One may make the specification more abstract, for example, by simply havocing a1 as a “don’t care” value, or by further abstracting away the notion of registers.

Safety properties. As a sanity check on functional specifications, developers should prove key safety properties of those specifications [69]. Safety properties are predicates on specification states. This paper considers two kinds of safety properties: one-safety properties that are predicates on a single specification state, and two-safety properties that are predicates on two specification states [77]. Serval provides definitions of common one- and two-safety properties, such as reference-count consistency [65: §3.3] and noninterference properties [39], respectively.

Take the functional specification of the sign program as an example. Suppose one wants to verify that its result depends only on register a0, independent of the initial value in a1. One may use a standard noninterference property, step consistency [70], which asks for an unwinding relation ∼ over two specification states s1 and s2:

  (define (~ s1 s2)
    (equal? (state-a0 s1) (state-a0 s2))) ; filter out a1

Step consistency is formulated as: s1 ∼ s2 ⇒ spec-sign(s1) ∼ spec-sign(s2). One may write it using Serval as follows:

  (theorem step-consistency
    (forall ([s1 struct:state] [s2 struct:state])
      (=> (~ s1 s2)
          (~ (spec-sign s1) (spec-sign s2)))))

Serval’s specification library provides the forall construct for writing universally quantified formulas over user-defined structures (e.g., state).

3.4 Verifying systems code

Having illustrated how to verify a toy program with Serval, we now describe how to extend verification to a real system.

[Figure 6 graphic: reset, then boot code, then alternating app code and trap handlers]
Figure 6. System execution: the lower half and the higher half denote execution in privileged and unprivileged modes, respectively; and arrows denote privilege-level transitions.

Execution model. Figure 6 shows a typical execution scenario of a system such as an OS kernel or a security monitor. Upon reset, the machine starts in privileged mode, runs boot code (either in ROM or loaded by the bootloader), and ends with an exit to unprivileged mode. From this point, it alternates between running application code in unprivileged mode and running trap handlers in privileged mode (e.g., in response to system calls). Like many previously verified systems [30, 36, 49, 65, 75], we assume that the system runs on a single core, with interrupts disabled in privileged mode; therefore, each trap handler runs in its entirety.

For automated verification, a Serval verifier reduces the system implementation, which consists of trap handlers and boot code, to symbolic values through symbolic evaluation. For trap handlers, the verifier starts from an architecturally defined state upon entering privileged mode. For instance, it sets the program counter to the value of the trap-vector register and general-purpose registers to hold arbitrary, symbolic values, as it makes no assumptions about application code. The verifier then performs symbolic evaluation over the implementation until executing a trap-return instruction. Similarly, for boot code, the verifier starts symbolic evaluation from the reset state as defined by the architecture or bootloader (e.g., the program counter holding a predefined value), and ends upon executing a trap-return instruction.

A verifier may implement a decoder to disassemble instructions from a binary image of the implementation. Serval’s RISC-V verifier takes a validation approach [83: §5.3]: it invokes objdump, a standard tool in binutils, to decode instructions; it also implements an encoder, which is generally simpler and easier to audit than a decoder, and validates that the encoded bytes of each decoded instruction match the original bytes in the binary image. Doing so avoids the need to trust objdump, the assembler, or the linker.

Memory model. Serval provides a unified memory model shared by verifiers. The memory model specifies the behavior of common memory access operations, such as load and store. It supports low-level memory operations (e.g., byte-addressing) and is amenable to automated verification.

The Serval memory model borrows ideas from KLEE [17] and CompCert [59]. Like these models, Serval represents memory as a set of disjoint blocks, enabling separate reasoning about updates to different blocks. Unlike them, Serval allows system developers to choose an efficient representation for each block, by recursively constructing it using three types of smaller blocks: structured blocks (a collection of blocks of possibly different types), uniform blocks (a sequence of blocks of the same type), and cells (an uninterpreted function over bitvectors); these block types are analogous to structs, arrays, and integer types in C, respectively. For instance, if the implementation accesses a memory region mostly as an array of 64-bit bitvectors or struct types, choosing a block of the same representation reduces the number of generated constraints in common cases, compared to a naïve model that represents memory as a flat array of bytes.

To free system developers from manually choosing the best memory representations, Serval automates the selection. Using objdump, it scans the implementation’s binary image and extracts top-level memory blocks from symbol tables based on their addresses and sizes; for each top-level block, it produces a memory representation using structured blocks, uniform blocks, and cells, based on types from debugging information. The Serval library performs validity checks on extracted representations (e.g., disjointness and alignment of memory blocks) to avoid the need to trust objdump.

3.5 Assumptions and limitations

Serval assumes the correctness of its specification library, the specifications written by system developers, the underlying verification toolchain (Rosette and the solver), and hardware. That said, we discovered two bugs in the U54 RISC-V CPU and implemented workarounds (§6.4).

A verifier written using Serval is a specification of the corresponding instruction set and is trusted to be correct. Since such verifiers are also executable interpreters, we write unit tests and reuse existing CPU validation tests [3] to improve our confidence in their correctness. A verifier does not need to trust the development toolchain (gcc and binutils) if it supports verification on binary images, for example, by implementing either a decoder or validation through an encoder (e.g., Serval’s RISC-V verifier).

Serval does not support reasoning about concurrent code, as it evaluates each code block in its entirety. We assume the security monitors studied in this paper run on a single core with interrupts disabled while executing monitor code; the original systems made similar assumptions.

The U54 CPU we use is in-order and so not susceptible to recent microarchitectural attacks [15, 51, 61]. But Serval does not support proving the absence of information leakage through such side channels.

Serval explicitly favors proof automation at the cost of generality. It requires systems to be finite (e.g., free of unbounded loops) for symbolic evaluation to terminate; all the systems studied in this paper satisfy this restriction. In addition, it cannot specify properties outside the decidable fragment of first-order logic supported by the specification library (§3.1). The Coq and Dafny theorem provers employ richer logics and can prove properties beyond the expressiveness of Serval, such as noninterference properties using unbounded traces; in §6 we describe such examples and alternative specifications that are expressible in Serval.

4 Scaling automated verifiers

Developing an automated verifier using Serval consists of writing an interpreter to be lifted as a baseline verifier and applying symbolic optimizations to this verifier to fine-tune its symbolic evaluation. Knowing where and what symbolic optimizations to apply is key to verification scalability—none of the refinement proofs of the security monitors terminate without symbolic optimizations (§6.4).

In this section, we summarize common verification performance bottlenecks from our experience with using Serval and how to repair them using symbolic optimizations. To strike a balance between being able to generalize to a class of systems and being effective in exploiting domain knowledge, Serval provides a set of symbolic optimizations as a reusable library that can be tailored for specific systems.

We highlight that symbolic optimizations occur during symbolic evaluation, in order to both speed up symbolic evaluation and reduce the complexity of generated symbolic values. Other components in the verification stack perform more generic optimizations: for example, both Rosette and solvers simplify SMT constraints. Symbolic optimizations are able to incorporate domain knowledge by exploiting the structure of symbolic values, as detailed next.

Symbolic program counters. When the program counter becomes symbolic, evaluation of subsequent instruction fetch and execution wastes time exploring infeasible paths, resulting in complex constraints or divergence. Figure 5 shows that state merging can create a symbolic pc in the form of conditional ite values. The bottleneck is repaired by applying split-pc provided by Serval to the pc. This symbolic optimization recursively breaks an ite value, evaluates each branch separately using a concrete value, and merges the results as more coarse-grained ite values; doing so effectively clones the program state for each concrete value, maximizing opportunities for partial evaluation. A computed address (e.g., a function pointer) can also cause the pc to become an ite value and can be repaired similarly.

It is possible for a symbolic program counter to be wholly unconstrained, for instance, if a system blindly jumps to an address from untrusted sources without checking (i.e., an opaque symbolic value); in this case split-pc does not apply and verification diverges. An unconstrained program counter usually indicates a security bug in the system.

Applying split-pc before every instruction fetch is sufficient for verifying all the systems studied in this paper. However, choosing an optimal evaluation strategy is challenging in general and may require exploiting domain-specific heuristics [52], for example, by selectively applying split-pc to certain program fragments.

Symbolic memory addresses. When a memory address is symbolic (e.g., due to integer-to-pointer conversion or a computed memory offset), it is difficult for a verifier to decide which memory block the address refers to; in such cases, the verifier is forced to consider all possible memory blocks, which can be expensive [17]; some prior verifiers simply disallow integer-to-pointer conversions [65: §3.2].

As an example, suppose the system maintains an array called procs; each element is a struct proc of C0 bytes, and a field f resides at offset C1 of struct proc. Given a symbolic pid value and a pointer to procs[pid].f (i.e., field f of the pid-th struct proc), a verifier computes an in-struct offset to decide what field this pointer refers to, with the following symbolic value: (C0 × pid + C1) mod C0. Subsequent memory accesses with this symbolic value cause the verifier to produce constraints whose size is quadratic in the number of fields, as it has to conservatively consider all possible fields in the struct. Note that Rosette does not rewrite this symbolic value to C1, because the rewrite is unsound due to a possible multiplication overflow.

The root cause of this issue is a missed concretization [13]: while procs[pid].f in the source code clearly refers to field f, such information is lost in low-level memory address computation. To reconstruct the information, Serval implements the following symbolic optimization: it matches any symbolic in-struct offset in the form of (C0 × pid + C1) mod C0, optimistically rewrites it to C1, and emits a side condition to check that the two expressions are equivalent under the current path condition; if the side condition does not hold, verification fails. Doing so allows the verifier to produce constraints with constant size for memory accesses. Serval implements such optimizations for common forms of symbolic addresses in the memory model (§3.4); all the verifiers inherit these optimizations for free.

Symbolic system registers. Low-level systems manipulate a number of system registers, such as a trap-vector register that holds the entry address of trap handlers and memory protection registers that specify memory access privileges. Verifying such systems requires symbolic evaluation of trap handlers starting from a symbolic state (§3.4), where system registers hold symbolic values. As the specification of system registers is usually complex (e.g., they may be configured in several ways), symbolic evaluation using symbolic values in system registers can explore many infeasible paths and produce complex constraints, leading to poor performance.

However, many system registers are initialized to some value in boot code and never modified afterwards [28], such as a fixed address in the trap-vector register. To speed up verification by exploiting this domain knowledge, Serval reuses the representation invariant, which is written by system developers as part of the refinement proof, to rewrite symbolic system registers (or part of them) to take concrete values.

Monolithic dispatching. Trap dispatching is a critical path in OS kernels and security monitors for handling exceptions and system calls. Upon trap entry, dispatching code examines a register that holds the cause (e.g., the system-call number) and invokes a corresponding trap handler. Symbolic evaluation over trap dispatching produces a large, monolithic constraint that encodes all possible trap handlers, since the cause register holds an opaque symbolic value for verification. Proving a property that must hold across trap dispatching is difficult since the solver lacks information to decompose the problem into more manageable parts.

Serval provides split-cases to decompose verification properly using domain knowledge. Given a symbolic value x, this optimization asks system developers for a list of concrete values C0, C1, ..., Cn−1, such as system-call numbers, and rewrites x to the following equivalent form using ite values: ite(x = C0, C0, ite(x = C1, C1, ... ite(x = Cn−1, Cn−1, x) ... )). Similarly to split-pc, it then proves a property using constraints produced by symbolic evaluation of each branch. Note x appears in the last branch of the generated ite, which makes this rewrite sound. But it also means that the optimization leads to effective partial evaluation only when applied to dispatching code that itself consists of a set of paths conditioned on the concrete values C0, C1, ..., Cn−1. Applied to such code, this optimization avoids a monolithic constraint and enables reasoning about each trap handler separately.

component (in Rosette)   lines of code
Serval framework         1,244
RISC-V verifier          1,036
x86-32 verifier            856
LLVM verifier              789
BPF verifier               472
total                    4,397
Figure 7. Line counts of the Serval framework and verifiers.

5 Implementation

Figure 7 shows the code size for the Serval framework and the four verifiers built using Serval, all written in Rosette. The RISC-V verifier implements the RV64I base integer instruction set and two extensions, “M” for integer multiplication and division and “Zicsr” for control and status register (CSR) instructions. The x86-32 verifier models general-purpose registers only and implements a subset of instructions used by the Linux kernel’s BPF JIT for x86-32. The LLVM verifier implements the same subset of LLVM as the one in Hyperkernel [65: §3.2]. The BPF verifier implements the extended BPF instruction set [37], with limited support for in-kernel helper functions.

As a comparison, prior push-button LLVM verifiers [65, 75] consist of about 3,000 lines of Python code and lack corresponding optimizations. This shows that verifiers built using Serval are simple to write, understand, and optimize.

6 Retrofitting for automated verification

As case studies on retrofitting systems for automated verification, we port CertiKOS (x86) and Komodo (ARM) to a unified RISC-V platform and make changes to their interfaces, implementations, and verification strategies accordingly; we use superscript “s” to denote our ports, CertiKOSs and Komodos. Like the original efforts, we prove functional correctness and noninterference properties, using Serval’s RISC-V verifier on the CertiKOSs and Komodos binary images. This section reports our changes and discusses the trade-offs of the three verification methodologies.

6.1 Protection mechanisms on RISC-V

As shown in Figure 8, both CertiKOSs and Komodos run in machine mode (M-mode), the most privileged mode on

RISC-V, eliminating the need to further trust lower-level code. Processes or enclaves run in supervisor mode (S-mode), rather than in user mode (U-mode) as in the original systems; doing so enables them to safely handle exceptions [7, 65].

[Figure 8 graphic: (a) CertiKOSs; (b) Komodos]
Figure 8. Two security monitors ported to RISC-V (shaded boxes), which aim to protect legitimate components (white boxes) from adversaries (crosshatched boxes).

[Figure 9 graphic: processes 1–3 performing op and yield actions]
Figure 9. Actions by processes in CertiKOS; opi denotes either get_quota, spawn, or a memory access by process i.

The monitors utilize physical memory protection (PMP) and trap virtual memory (TVM) on RISC-V [85]. PMP allows M-mode to create a limited number of physical memory regions (up to 8 on a U54 core) and specify access privileges (read, write, and execute) for each region; the CPU performs PMP checks in S- or U-mode. TVM allows M-mode to trap accesses in S-mode to the satp register, which holds the address of the page-table root. We will describe how CertiKOSs and Komodos use the two mechanisms later in this section.

For verification, we apply Serval’s RISC-V verifier to monitor code running in M-mode, with a specification of PMP and a three-level page walk to model memory accesses in S- or U-mode; we do not explicitly reason about (untrusted) code running in S- or U-mode.

We do not consider direct-memory-access (DMA) attacks in this paper, where adversarial devices can bypass memory protection to access physical memory. RISC-V currently lacks an IOMMU for preventing DMA attacks. We expect protection mechanisms similar to previous work [71] to be sufficient once an IOMMU is available.

6.2 CertiKOS

CertiKOS as described by Costanzo et al. [30] provides strict isolation among multiple processes on x86. It imposes a memory quota on each process, which may consume memory or spawn child processes within its quota. It statically divides the process identifier (PID) space such that each process identified by pid owns and allocates child PIDs only from the range [N × pid + 1, N × pid + N], where N is configured as the maximum number of children a process can spawn. There is no inter-process communication or resource reclamation. A cooperative scheduler cycles through processes in a round-robin fashion. The monitor interface consists of the following calls:
• get_quota: returns current process’s memory quota;
• spawn(elf_id, quota): creates a child process using an ELF file identified by elf_id and a specified memory quota, and returns the PID of the child process; and
• yield: switches to the next process.

Security. CertiKOS formalizes its security using noninterference, specifically, step consistency (§3.3). The intuition is that a process should behave as if it were the only process in the system. To categorize the behavior of processes, we say a small-step action to mean either a monitor call or a memory access by a process; and a big-step action to mean either a small-step action that does not change the current process, or a sequence consisting of a yield from the current process, a number of small-step actions from other processes, and a yield back to the original process. Take Figure 9 for example: both “op1” and “yield1→2; op2*; yield2→3; op3*; yield3→1” are big-step actions (star indicates zero or more actions).

Using step consistency, CertiKOS proves that the execution of any big-step action by a process should depend only on the portion of the system state observable by that process (e.g., the registers and memory it has access to). More formally, we say that two system states s1 and s2 are indistinguishable to a process p, denoted as s1 ∼p s2, to mean that the portions of the states observable to p are the same. Let step(s, a) denote the resulting state of executing action a from state s. Step consistency is formulated so that a process p invoking any big-step action a from two indistinguishable states s1 and s2 must result in two indistinguishable states: s1 ∼p s2 ⇒ step(s1, a) ∼p step(s2, a).

This noninterference specification describes the behavior of each process from the point at which it is spawned. It rules out information flows between any two existing processes. However, it does not restrict information flows from a process to its newly created child during spawn.

The CertiKOS methodology. CertiKOS takes a modular approach to decompose the system into 32 layers: the top layer is an abstract, functional specification of the monitor and the bottom layer is an x86 machine model. Each layer defines an interface of operations; except for the bottom layer, a layer also includes an implementation, which is a module of the system written in a mixture of C and x86 assembly using the operations from lower layers.

CertiKOS’ developers design the layers and split the system implementation into the layers, and write interface specifications and proofs in Coq. Given an implementation in C, they use the clightgen tool from the CompCert compiler [58] to translate it into a Coq representation of an abstract syntax tree (AST) in the Clight intermediate language [11]. Each layer proves that the implementation refines the interface.
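The static PID division can be sketched as follows; this Python fragment is our illustration of the ownership rule (with an assumed N = 4), not CertiKOS code.

```python
# Sketch of CertiKOS's static PID-space division: a process pid owns
# child PIDs in [N*pid + 1, N*pid + N], where N is the configured
# maximum number of children. (Illustrative; not CertiKOS code.)

N = 4  # assumed maximum number of children per process

def owned_children(pid):
    # The contiguous PID range statically owned by pid.
    return range(N * pid + 1, N * pid + N + 1)

def owns(pid, child):
    return N * pid + 1 <= child <= N * pid + N

def parent_of(child):
    # Ownership is static and public: the parent is recoverable
    # from the child PID alone.
    return (child - 1) // N
```

For example, process 0 owns PIDs 1–4 and process 1 owns PIDs 5–8; the ranges of distinct processes never overlap, which is what makes PID ownership statically determined and public.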

CertiKOS obtains a proof of functional correctness by composing the refinement proofs across the layers.

CertiKOS proves noninterference over the functional specification at the top layer and augments each layer with a proof that refinement preserves noninterference; doing so propagates the guarantee to the bottom layer. Proving noninterference boils down to proving step consistency for any big-step action. CertiKOS decomposes the proof into proving three separate properties, each about a single, small-step action [30: §5]. The three properties are the following, using process1 in Figure 9 as an example:
• If process1 performs a small-step action (op1 or yield1→2) from two indistinguishable states, the resulting states must be indistinguishable.
• If any process other than process1 performs a small-step action (op2, yield2→3, or op3), the states before and after the action must be indistinguishable to process1.
• If process1 is yielded to (yield3→1) from two indistinguishable states, the resulting states must be indistinguishable.
By induction, the three properties together imply step consistency for any big-step action of process1.

CertiKOS uses CompCert to compile the Clight AST to an x86 assembly AST; this process is verified to be correct by CompCert. It then pretty-prints assembly code and invokes the standard assembler and linker to produce the final binary; the pretty printer, assembler, and linker are trusted to be correct. To ensure consistency between the verified code and the proof, CertiKOS’ developers delete the original C code and keep the Clight AST in Coq only.

CertiKOS extends CompCert for low-level reasoning; readers may refer to Gu et al. [40] for details. Below we describe two examples. One, CompCert models memory as an infinite number of blocks [59]. This model does not match systems where the memory is limited. CertiKOS alleviates this by using a single memory block of fixed size to represent the entire memory state; this also simplifies reasoning about virtual address translation. Two, CompCert assumes a stack of infinite size, while CertiKOS uses a 4KiB page as the stack for executing monitor calls. To rule out stack overflows, CertiKOS checks the stack usage at the Clight level and augments CompCert to prove that stack usage is preserved during compilation [20].

Retrofitting. The monitor interface of CertiKOS is finite and amenable to automated verification. We make two interface changes to close potential covert channels.

First, CertiKOS’ specification of spawn models loading an ELF file as a no-op and does not deduct the memory quota consumed by ELF loading; as a result, the quota in the specification does not match that in the implementation. This is not caught by the refinement proof because CertiKOS trusts ELF loading to be correct and skips the verification of its implementation. One option to fix the mismatch is to explicitly model memory consumed by ELF loading, but this complicates the specification due to the complexity of ELF. CertiKOSs chooses to remove ELF loading from the monitor so that spawn creates a process with minimum state, similarly to Hyperkernel [65: §4.2]; ELF loading is delegated to an untrusted library in S-mode instead. This also removes the need to trust the unverified ELF loading implementation—any bug in the S-mode library is confined within that process.

Second, spawn in CertiKOS allocates consecutive PIDs: it designates N × pid + nr_children + 1 as the child PID, where pid identifies the calling process and nr_children is the number of children it has spawned so far. This scheme mandates that a process disclose its number of children to the child; that is, it allows the child to learn information about an uncooperative parent process through a covert channel. This is permitted by CertiKOS’ noninterference specification, which does not restrict parent-to-child flows in spawn. CertiKOSs eliminates this channel by augmenting spawn with an extra parameter that allows the calling process to choose a child PID; spawn fails if the calling process does not own the requested PID. Note that this change does not allow the calling process to learn any secret about other processes, as the ownership of PIDs is statically determined and public.

For implementation, CertiKOSs provides the same monitor calls as CertiKOS, except for the above two changes to spawn; a library in S-mode provides compatibility, allowing us to simply recompile existing applications to run on CertiKOSs. Compared to the original system, there are two main differences in the implementation of CertiKOSs. First, CertiKOS uses paging for memory isolation. To make it easier to implement ELF loading in S-mode, CertiKOSs configures the PMP unit to restrict each process to a contiguous region of physical memory and delegates page-table management to S-mode; the size of a process’s region equals its quota. The use of PMP instead of paging does not impact the flexibility of the system, as CertiKOS does not support memory reclamation. Second, CertiKOSs uses a single array of struct proc to represent the process state, whereas CertiKOS splits it into multiple structures across layers.

For verification, Serval proves the functional correctness of CertiKOSs following the steps outlined in §3.4. As for noninterference, we cannot express CertiKOS’ specification in Serval, because it uses a big-step action, which can contain an unbounded number of small-step actions (§3.5). Instead, we prove two alternative noninterference specifications that are expressible in Serval. First, as mentioned earlier, CertiKOS develops three properties that together imply noninterference; each property reasons about a small-step action. We reuse and prove these properties as the noninterference specification for CertiKOSs. In addition, we prove the noninterference specification developed in Nickel [75]; it is also formulated as a set of properties, each using a small-step action. This specification supports a more general form of noninterference called intransitive noninterference [64, 70], which enabled us to catch the PID covert channel in spawn.
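To make the interface change concrete, here is a hypothetical Python model of the retrofitted spawn with an explicit child-PID argument; the state layout and helper names are ours, and the real CertiKOSs monitor call differs in detail.

```python
# Sketch of the retrofitted spawn: the caller picks the child PID, and
# the monitor rejects PIDs the caller does not statically own, closing
# the nr_children covert channel. (Illustrative pseudocode; names and
# state layout are ours, not CertiKOSs's.)

N = 4  # assumed maximum number of children per process

def owns(pid, child):
    # Static, public PID-space division: pid owns [N*pid+1, N*pid+N].
    return N * pid + 1 <= child <= N * pid + N

def spawn(state, caller, elf_id, quota, child_pid):
    # Fail if the caller does not own the requested PID, if the PID is
    # already in use, or if the caller lacks sufficient quota.
    if not owns(caller, child_pid):
        return -1  # error reveals nothing about other processes
    proc = state["procs"][caller]
    if child_pid in state["procs"] or proc["quota"] < quota:
        return -1
    proc["quota"] -= quota  # deduct the child's quota from the caller
    state["procs"][child_pid] = {"quota": quota, "elf": elf_id}
    return child_pid
```

Because ownership depends only on the caller's own PID, the success or failure of spawn never encodes how many children another process has created.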

6.3 Komodo
Komodo [36] is a software reference monitor that implements an SGX-like enclave mechanism on an ARM TrustZone platform in verified assembly code. The Komodo monitor runs in TrustZone's secure world, and is thus isolated from an untrusted OS. It manages secure pages, which can be used only by enclaves; and insecure pages, which can be shared by the OS and enclaves. Komodo provides two sets of monitor calls: one used by the OS to construct, control the execution of, and communicate with enclaves; and the other used by an enclave to perform remote attestation, dynamically manage enclave memory, and communicate with the OS.

Below we briefly describe the lifetime of an enclave in Komodo. For construction, the OS first creates an empty enclave and page-table root using InitAddrspace. It adds to the enclave a number of idle threads using InitThread, 2nd-level page tables using InitL2PTable, and secure and insecure data pages using MapSecure and MapInsecure, respectively. Upon completion, the OS switches to enclave execution by invoking either Enter to run an idle thread or Resume to continue a suspended thread. Control returns to the OS upon an enclave thread either invoking Exit or being suspended (e.g., due to an interrupt). For reclamation, the OS invokes Stop to prevent further execution in an enclave and Remove to free its pages.

Figure 10. Actions by enclaves and the OS in Komodo; op_i and op_OS denote either monitor calls or memory accesses by enclave i and by the OS, respectively.

Security. Like CertiKOS (§6.2), Komodo specifies noninterference as step consistency with big-step actions. A big-step action in Komodo both starts and ends with the OS. For example, both "op_OS" and "Enter_{OS→1}; op_1*; Exit_{1→OS}" in Figure 10 are big-step actions. Unlike CertiKOS, Komodo considers two types of observers: (1) an enclave and (2) an adversary consisting of the OS with a colluding enclave. The intuition is that an adversary can neither influence nor infer secrets about any big-step action. Formally, noninterference is formulated so that any big-step action a from two indistinguishable states s1 and s2 to an observer L must result in two indistinguishable states to L: s1 ∼L s2 ⇒ step(s1, a) ∼L step(s2, a).

A subtlety is that Komodo permits legitimate information flows (e.g., intentional declassification [60]) between an enclave and the OS, which violates step consistency. For example, an enclave can return an exit value, which declassifies part of its data. Komodo addresses this issue by relaxing its noninterference specification with additional axioms.

The Komodo methodology. Komodo's developers first built an unverified prototype of the monitor in C, before using a combination of the Dafny [56] and Vale [12] languages to specify, implement, and verify the monitor. Specifications are written in Dafny, including a formal model of a subset of the ARMv7 instruction set, the functional specification of the security monitor, and the specification of noninterference. The monitor's implementation is written in Vale, and consists of structured assembly code together with proof annotations, such as pre- and post-conditions and invariants.

To simplify proofs about control flow, the ARM specification does not explicitly model the program counter; there are no jump or branch instructions. Instead, it models structured control flow: if-statements, while-loops, and procedure calls that correspond to language features in Vale.

The Vale tool is not trusted. It emits a Dafny representation of the program, along with a purported proof (generated from the annotations) of its correctness, which are checked by the Dafny theorem prover. A small (trusted) Dafny program pretty-prints the assembly code for the verified monitor by emitting the appropriate labels and jump instructions, and inlining subprocedure bodies at call sites.

The Komodo implementation uses Vale similarly to a macro assembler with proof annotations. Each procedure consists of a small number of instructions (typically 10 or fewer, to maintain acceptable verification performance), with explicit Hoare-style pre- and post-conditions. Procedures take explicit arguments referring to concrete operands (machine registers or compile-time constants) and abstract values ("ghost variables") that are used in proof annotations but do not appear in the final program.

Register allocation is managed by hand. Low-level procedures with many call-sites (e.g., the procedure that updates a page-table entry) take their parameters in arbitrary registers with explicit preconditions that require the registers to be disjoint. Higher-level procedures (e.g., portions of the system call handlers) use explicit hard-coded register allocations. This makes code reuse challenging, but improves verification times, since the verifier need not consider the alternatives. In practice, it is manageable for a RISC architecture with a large register set, but cumbersome.

Retrofitting. Komodos is a RISC-V port of the unverified C prototype of Komodo. We choose the C prototype over the Vale implementation because it is easier to reuse and understand. The C prototype lacks attestation and SGX2-like dynamic memory management [63], and so does Komodos.

The Komodo interface is finite and amenable to automated verification. We make two changes to account for architectural differences between ARM and RISC-V. First, because RISC-V has three-level paging (unlike ARM, which has two levels), we add a new InitL3PTable call for allocating a 3rd-level page table.
Second, we change the page mapping calls, MapSecure and MapInsecure, to take the physical page number of a 3rd-level page table and a page-table entry index, instead of a virtual address as in Komodo. This change avoids increasing the complexity of page walks in the monitor due to three-level paging, and is similar to seL4 [49], Hyperkernel [65], and other page-table management calls in Komodo.

For implementation, Komodos combines PMP, paging, and TVM to achieve memory isolation, as follows. First, it configures the PMP unit to prevent the OS from accessing secure pages. Second, the interface of Komodos controls the construction of enclaves' page tables, similarly to that of Komodo; this prevents an enclave from accessing other enclaves' secure pages or its own page-table pages. Third, Komodos executes enclaves in S-mode; it uses TVM to prevent enclaves from modifying the page-table root (§6.1).

Komodos reuses Komodo's architecture-independent code and data structures. We additionally modify Komodos by replacing pointers with indices in struct fields. This is not necessary for verification, but simplifies the task of specifying representation invariants for refinement (§3.3). For instance, it uses a page index rather than a pointer to the page; this avoids specifying that the pointer is page-aligned.

Verifying the functional correctness of Komodos is similar to verifying that of CertiKOSs. For noninterference, we cannot express Komodo's specification in Serval due to the use of big-step actions. Since Komodo does not provide properties using small-step actions as in CertiKOS, we prove Nickel's specification instead [75]. However, it is difficult to directly compare the two noninterference specifications, especially when they are written in different logics and tools. We construct litmus tests to informally understand their guarantees. For example, both specifications preclude the OS from learning anything about the contents of memory belonging to a finalized enclave. However, the noninterference specification of Komodo permits, for instance, the specification of a monitor call that overwrites enclave memory with zeros, while that of Komodos precludes it. Note that such bugs are prevented in Komodo's functional specification. There may also exist bugs precluded by the noninterference specification of Komodo but not by that of Komodos.

6.4 Results
Figure 11 summarizes the sizes of CertiKOSs and Komodos, including both the implementations (in C and assembly) and the specifications (in Rosette); and verification times using the RISC-V verifier on an Intel Core i7-7700K CPU at 4.5 GHz, broken down by theorem and gcc's optimization level for compiling the implementations.

                                   CertiKOSs    Komodos
  lines of code:
    implementation                     1,988      2,310
    abs. function + rep. invariant       438        439
    functional specification             124        445
    safety properties                    297        578
  verification time (in seconds):
    refinement proof (-O0)                92        275
    refinement proof (-O1)               138        309
    refinement proof (-O2)               133        289
    safety proof                          33        477

Figure 11. Sizes and verification times of the monitors.

Porting and verifying the two systems using Serval took roughly four person-weeks each. The time is reduced by the fact that we benefit from being able to reuse the original systems' designs, implementations, and specifications. With automated verification, we focused our efforts on developing specifications and symbolic optimizations, as follows.

It is difficult to write a specification for an entire system at once. We therefore take an incremental approach, using LLVM as an intermediate step. First, we compile the core subset of a monitor (trap handlers written in C) to LLVM, ignoring assembly and boot code. We write a specification for this subset and prove refinement using the LLVM verifier; this is similar to prior push-button verification [65, 75]. Next, we reuse and augment the specification from the previous step, and prove refinement for the binary image produced by gcc and binutils, using the RISC-V verifier. This covers all the instructions, including assembly and boot code, and does not depend on the LLVM verifier. Last, we write and prove safety properties over the (augmented) specification. In our experience, the use of LLVM adds little verification cost and makes the specification task more manageable; it is also easier to debug using the LLVM representation, which is more structured than RISC-V instructions.

An SMT solver generates a counterexample when verification fails, which is helpful for debugging specifications and implementations. But the solver can be overwhelmed, especially when a specification uses quantifiers. To speed up counterexample generation, we adopt the practice from Hyperkernel of temporarily decreasing system parameters (e.g., the maximum number of pages) for debugging [65: §6.2].

Symbolic optimizations are essential for the verification of the two systems. Disabling symbolic optimizations in the RISC-V verifier causes the refinement proof to time out (after two hours) for either system under any optimization level, as symbolic evaluation fails to terminate. The verification time of the safety proofs is not affected, as the proofs are over the specifications and do not use the RISC-V verifier.

We first developed all the symbolic optimizations in the RISC-V verifier during the verification of CertiKOSs, using symbolic profiling as described in §3.2; these symbolic optimizations were sufficient to verify Komodos. However, verifying a Komodos binary compiled with -O1 or -O2 took five times as much time compared to verifying one compiled with -O0; it is known that compiler optimizations can increase the verification time on binaries [72].
To improve this, we continued to develop symbolic optimizations for Komodos. Specifically, one new optimization sufficed to reduce the verification time of Komodos for -O1 or -O2 to be close to that for -O0 (it did not impact the verification time of other systems). Finding the root cause of the bottleneck and developing the symbolic optimization took one author less than one day. This shows that symbolic optimizations can generalize to a class of systems, and that they can make automated verification less sensitive to gcc's optimizations.

As mentioned in §3.5, while developing Serval, we wrote new interpreter tests and reused existing ones, such as the riscv-tests for RISC-V processors. We applied these tests to verification tools, and found two bugs in the QEMU emulator, and one bug in the RISC-V specification developed by the Sail project [4], all confirmed and fixed by developers. We also found two (confirmed) bugs in the U54 core: the PMP checking was too strict, improperly composing with superpages; and performance-counter control was ignored, allowing any privilege level to read performance counters, which creates covert channels. To work around these bugs, we modified the implementation to not use superpages, and to save and restore all performance counters during context switching.

6.5 Discussion
Specification and verification. As detailed in this section, CertiKOS uses Coq and Komodo uses Dafny. Both theorem provers provide richer logics than Serval and can express properties that Serval cannot, as well as reason about code with unbounded loops. This expressiveness comes at a cost: Coq proofs impose a high manual burden (e.g., the CertiKOS studied in this paper consists of roughly 200,000 lines of specification and proof), and Dafny proofs involve verification performance problems that can be difficult to debug [36: §9] and repair (e.g., requiring use of triggers [42: §6]). Building on Rosette, Serval chooses to limit system specifications to a decidable fragment of first-order logic (§3.1) and implementations to bounded code. This enables a high degree of proof automation and a systematic approach to diagnosing verification performance issues through symbolic profiling [13].

Regardless of methodology, central to verifying systems software is choosing a specification with desired properties. Our case studies involve three noninterference specifications. What kinds of bugs can each specification prevent? While we give a few examples in §6.2 and §6.3, we have no simple answer. We would like to explore further how to contrast such specifications and which to choose for future projects.

Implementation. CertiKOS requires developers to decompose a system implementation, written in a mixture of C and assembly, into multiple layers for verification. For instance, instead of using a single struct proc to represent the process state, it splits the state into various fine-grained structures, each with a small number of fields. Designing such layers requires expertise. Komodo requires developers to implement a system in structured assembly using Vale, which restricts the type of assembly that can be used (e.g., no support for unstructured control flow or function calls). This also means that it is difficult to write an implementation in C and reuse the assembly code produced by gcc. Serval separates the process of implementing a system from that of verification, making it easier to develop and maintain the implementation. Developers write an implementation in standard languages such as C and assembly. But to be verifiable with Serval, the implementation must be free of unbounded loops.

Both CertiKOS and Komodo require the use of verification-specific toolchains for development. For instance, CertiKOS depends on the CompCert C compiler, and Komodo uses Vale to produce the final assembly. Serval's verifiers can work on binary images, which allows developers to use standard toolchains such as gcc and binutils.

7 Finding bugs via verification
Besides proving refinement and noninterference properties, we also apply Serval to write and prove partial specifications [44] for systems. These specifications do not capture full functional correctness, but provide effective means for rapidly exploring potential interface designs and exposing subtle bugs in complex implementations.

Keystone. We applied Serval to analyze the interface design of Keystone [55], an open-source security monitor that implements software enclaves on RISC-V. Keystone uses a dedicated PMP region for each enclave to provide memory protection, rather than using paging as in Komodo (§6.3). Since Keystone was in active development and did not have a formal specification, we wrote a functional specification based on our understanding of its design. As a sanity check, we wrote and proved safety properties over the specification. We manually compared our specification with Keystone's implementation, and found the following two differences. First, Keystone allowed an enclave to create more enclaves within itself, whereas our specification precludes this behavior. Allowing an enclave to create enclaves violates the safety property that an enclave's state should not be influenced by other enclaves, which we proved over our specification using Serval. Second, Keystone required the OS to create a page table for each enclave and performed checks that the page table was well-formed; our specification does not have this check, as PMP alone is sufficient to guarantee isolation for enclaves. Based on the analysis, we made two suggestions to Keystone's developers: disallowing the creation of enclaves inside enclaves and removing the check on page tables from the monitor; both have been incorporated into Keystone.

We also ran the Serval LLVM verifier on the Keystone implementation and found two undefined-behavior bugs, oversized shifting and buffer overflow, both on the paths of three monitor calls. We reported these bugs, which have since been fixed by Keystone's developers.
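The Keystone analysis can be illustrated with a small executable sketch. The model below is our own (the names and the bounded search are hypothetical, standing in for our actual specification and Serval's symbolic evaluation): a monitor state records who created each enclave, and exploring the reachable states shows that permitting enclaves to create enclaves violates the safety property that an enclave's state should not be influenced by other enclaves.

```python
OS = "os"

def create(state, caller, allow_nested):
    # state: tuple of creators, indexed by enclave id; one CREATE call
    if caller == OS or allow_nested:
        return state + (caller,)
    return state  # monitor rejects the call

def reachable(allow_nested, depth=3):
    # explore every state reachable within `depth` monitor calls
    states = {()}
    for _ in range(depth):
        new = set()
        for s in states:
            callers = [OS] + list(range(len(s)))  # the OS or any enclave
            for c in callers:
                new.add(create(s, c, allow_nested))
        states |= new
    return states

def safe(states):
    # safety property: every enclave was created by the OS, so no
    # enclave's existence is influenced by another enclave
    return all(creator == OS for s in states for creator in s)

print(safe(reachable(allow_nested=False)))  # True
print(safe(reachable(allow_nested=True)))   # False
```

The bounded search suffices here because the interface is finite; Serval instead proves the property for all states of the specification via symbolic evaluation and an SMT solver.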

BPF. The Linux kernel allows user space to extend the kernel's functionality by downloading a program into the kernel, using the extended BPF, or BPF for short [37]. To improve performance, the kernel provides JIT compilers to translate a BPF program to machine instructions for native execution. For simplicity, a JIT compiler translates one BPF instruction at a time. Any bugs in BPF JIT compilers can compromise the security of the entire system [83].

Using Serval, we wrote a checker for BPF JIT compilers, by combining the RISC-V, x86-32, and BPF verifiers. The checker verifies a simple property: starting from a BPF state and an equivalent machine state (e.g., RISC-V), the result of executing a single BPF instruction on the BPF state should be equivalent to the machine state resulting from executing the machine instructions produced by the JIT for that BPF instruction. The checker takes a JIT compiler written in Rosette, invokes the BPF verifier and a verifier for a target instruction set (e.g., RISC-V) to verify this property, and reports violations as bugs. As the JIT compilers in the Linux kernel are written in C, we manually translated them into Rosette. Currently, the translation covers the code for compiling BPF arithmetic and logic instructions; this process is syntactic and we expect to automate it in the future.

Using the checker, we found a total of 15 bugs in the Linux JIT implementations: 9 for RISC-V and 6 for x86-32. These bugs are caused by emitting incorrect instructions for handling zero extensions or bit shifts. The Linux kernel has accumulated an extensive BPF test suite over the years, but it failed to catch the corner cases found by Serval; this shows the effectiveness of verification for finding bugs. We submitted patches that fix the bugs and include additional tests to cover the corresponding corner cases, based on counterexamples produced by verification. These patches have been accepted into the Linux kernel.

8 Reflections
The motivation for developing Serval stems from an earlier attempt to extend push-button verifiers to security monitors. After spending one year experimenting with this approach, we decided to switch to using Rosette, for the following reasons. First, the prior verifiers support LLVM only and cannot verify assembly code (e.g., register save/restore and context switch), which is critical to the correctness of security monitors. Extending verification to support machine instructions is thus necessary to reason about such low-level systems. In addition, the verifiers encode the LLVM semantics by directly generating SMT constraints rather than lifting an easy-to-understand interpreter to a verifier via symbolic evaluation; the former approach makes it difficult to reuse, optimize, and add support for new instruction sets. On the other hand, Rosette provides Serval with symbolic evaluation, partial evaluation, the ability to lift interpreters, and a symbolic profiler. Rosette's symbolic reflection mechanism, originally designed for lifting Racket libraries [81: §2.3], is a good match for implementing symbolic optimizations.

Our experience with using Serval provides opportunities for improving verification tools. While effective at identifying performance bottlenecks during symbolic evaluation, symbolic profiling requires manual efforts to analyze profiler output and develop symbolic optimizations (§6.4), and does not profile the SMT solver; automating these steps would reduce the verification burden for system developers. Another promising direction is to explore how to combine the strengths of different tools to verify a broader range of properties and systems [26, 68, 86].

9 Conclusion
Serval is a framework that enables scalable verification for systems code via symbolic evaluation. It accomplishes this by lifting interpreters written by developers into automated verifiers, and by introducing a systematic approach to identify and overcome bottlenecks through symbolic profiling and optimizations. We demonstrate the effectiveness of this approach by retrofitting previously verified systems to use Serval for automated verification, and by using Serval to find previously unknown bugs in unverified systems. We compare and discuss the trade-offs of various methodologies for verifying systems software, and hope that these discussions will be helpful for others making decisions on verifying their systems. All of Serval's source is publicly available at: https://unsat.cs.washington.edu/projects/serval/.

Acknowledgments
We thank Jon Howell, Frans Kaashoek, the anonymous reviewers, and our shepherd, Emmett Witchel, for their feedback. We also thank Andrew Waterman for answering our questions about RISC-V, David Kohlbrenner, Dayeol Lee, and Shweta Shinde for discussions on Keystone, and Daniel Borkmann, Palmer Dabbelt, Song Liu, Alexei Starovoitov, Björn Töpel, and Jiong Wang for reviewing our patches to the Linux kernel. This work was supported by NSF awards CCF-1651225, CCF-1836724, and CNS-1844807, and by VMware.

References
[1] Eyad Alkassar, Wolfgang J. Paul, Artem Starostin, and Alexandra Tsyban. 2010. Pervasive Verification of an OS Microkernel: Inline Assembly, Memory Consumption, Concurrent Devices. In Proceedings of the 3rd Working Conference on Verified Software: Theories, Tools, and Experiments (VSTTE). Edinburgh, United Kingdom, 71–85.
[2] Sidney Amani, Alex Hixon, Zilin Chen, Christine Rizkallah, Peter Chubb, Liam O'Connor, Joel Beeren, Yutaka Nagashima, Japheth Lim, Thomas Sewell, Joseph Tuong, Gabriele Keller, Toby Murray, Gerwin Klein, and Gernot Heiser. 2016. Cogent: Verifying High-Assurance File System Implementations. In Proceedings of the 21st International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). Atlanta, GA, 175–188.
[3] Nadav Amit, Dan Tsafrir, Assaf Schuster, Ahmad Ayoub, and Eran Shlomo. 2015. Virtual CPU Validation. In Proceedings of the 25th ACM

Symposium on Operating Systems Principles (SOSP). Monterey, CA, 311–327.
[4] Alasdair Armstrong, Thomas Bauereiss, Brian Campbell, Alastair Reid, Kathryn E. Gray, Robert M. Norton, Prashanth Mundkur, Mark Wassell, Jon French, Christopher Pulte, Shaked Flur, Ian Stark, Neel Krishnaswami, and Peter Sewell. 2019. ISA Semantics for ARMv8-A, RISC-V, and CHERI-MIPS. In Proceedings of the 46th ACM Symposium on Principles of Programming Languages (POPL). Cascais, Portugal, Article 71, 31 pages.
[5] Roberto Baldoni, Emilio Coppa, Daniele Cono D'elia, Camil Demetrescu, and Irene Finocchi. 2018. A Survey of Symbolic Execution Techniques. ACM Computing Surveys 51, 3, Article 50 (July 2018), 39 pages.
[6] Mike Barnett, Bor-Yuh Evan Chang, Robert DeLine, Bart Jacobs, and K. Rustan M. Leino. 2005. Boogie: A Modular Reusable Verifier for Object-Oriented Programs. In Proceedings of the 4th International Symposium on Formal Methods for Components and Objects. Amsterdam, The Netherlands, 364–387.
[7] Adam Belay, Andrea Bittau, Ali Mashtizadeh, David Terei, David Mazières, and Christos Kozyrakis. 2012. Dune: Safe User-level Access to Privileged CPU Features. In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI). Hollywood, CA, 335–348.
[8] William R. Bevier. 1989. Kit: A Study in Operating System Verification. IEEE Transactions on Software Engineering 15, 11 (Nov. 1989), 1382–1396.
[9] Sven Beyer, Christian Jacobi, Daniel Kröning, Dirk Leinenbach, and Wolfgang J. Paul. 2006. Putting it all together – Formal verification of the VAMP. International Journal on Software Tools for Technology Transfer 8, 4–5 (Aug. 2006), 411–430.
[10] Armin Biere, Alessandro Cimatti, Edmund M. Clarke, and Yunshan Zhu. 1999. Symbolic Model Checking without BDDs. In Proceedings of the 5th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS). Amsterdam, The Netherlands, 193–207.
[11] Sandrine Blazy and Xavier Leroy. 2009. Mechanized semantics for the Clight subset of the C language. Journal of Automated Reasoning 43, 3 (Oct. 2009), 263–288.
[12] Barry Bond, Chris Hawblitzel, Manos Kapritsos, K. Rustan M. Leino, Jacob R. Lorch, Bryan Parno, Ashay Rane, Srinath Setty, and Laure Thompson. 2017. Vale: Verifying High-Performance Cryptographic Assembly Code. In Proceedings of the 26th USENIX Security Symposium. Vancouver, Canada, 917–934.
[13] James Bornholt and Emina Torlak. 2018. Finding Code That Explodes Under Symbolic Evaluation. In Proceedings of the 2018 Annual ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA). Boston, MA, Article 149, 26 pages.
[14] Robert S. Boyer, Matt Kaufmann, and J Strother Moore. 1995. The Boyer-Moore Theorem Prover and Its Interactive Enhancement. Computers and Mathematics with Applications 29, 2 (Jan. 1995), 27–62.
[15] Jo Van Bulck, Marina Minkin, Ofir Weisse, Daniel Genkin, Baris Kasikci, Frank Piessens, Mark Silberstein, Thomas F. Wenisch, Yuval Yarom, and Raoul Strackx. 2018. Foreshadow: Extracting the Keys to the Intel SGX Kingdom with Transient Out-of-Order Execution. In Proceedings of the 27th USENIX Security Symposium. Baltimore, MD, 991–1008.
[16] Cristian Cadar. 2015. Targeted Program Transformations for Symbolic Execution. In Proceedings of the 10th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE). Bergamo, Italy, 906–909.
[17] Cristian Cadar, Daniel Dunbar, and Dawson Engler. 2008. KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs. In Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI). San Diego, CA, 209–224.
[18] Cristian Cadar, Vijay Ganesh, Peter M. Pawlowski, David L. Dill, and Dawson R. Engler. 2006. EXE: Automatically Generating Inputs of Death. In Proceedings of the 13th ACM Conference on Computer and Communications Security (CCS). Alexandria, VA, 322–335.
[19] Cristian Cadar and Koushik Sen. 2013. Symbolic Execution for Software Testing: Three Decades Later. Commun. ACM 56, 2 (Feb. 2013), 82–90.
[20] Quentin Carbonneaux, Jan Hoffmann, Tahina Ramananandro, and Zhong Shao. 2014. End-to-End Verification of Stack-Space Bounds for C Programs. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). Edinburgh, United Kingdom, 270–281.
[21] Haogang Chen, Tej Chajed, Alex Konradi, Stephanie Wang, Atalay İleri, Adam Chlipala, M. Frans Kaashoek, and Nickolai Zeldovich. 2017. Verifying a high-performance crash-safe file system using a tree specification. In Proceedings of the 26th ACM Symposium on Operating Systems Principles (SOSP). Shanghai, China, 270–286.
[22] Haogang Chen, Daniel Ziegler, Tej Chajed, Adam Chlipala, M. Frans Kaashoek, and Nickolai Zeldovich. 2015. Using Crash Hoare Logic for Certifying the FSCQ File System. In Proceedings of the 25th ACM Symposium on Operating Systems Principles (SOSP). Monterey, CA, 18–37.
[23] Vitaly Chipounov, Volodymyr Kuznetsov, and George Candea. 2011. S2E: A Platform for In-vivo Multi-path Analysis of Software Systems. In Proceedings of the 16th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). Newport Beach, CA, 265–278.
[24] Adam Chlipala. 2015. From Network Interface to Multithreaded Web Applications: A Case Study in Modular Program Verification. In Proceedings of the 42nd ACM Symposium on Principles of Programming Languages (POPL). Mumbai, India, 609–622.
[25] Maria Christakis and Patrice Godefroid. 2015. Proving Memory Safety of the ANI Windows Image Parser using Compositional Exhaustive Testing. In Proceedings of the 16th International Conference on Verification, Model Checking, and Abstract Interpretation (VMCAI). Mumbai, India, 373–392.
[26] Andrey Chudnov, Nathan Collins, Byron Cook, Joey Dodds, Brian Huffman, Colm MacCárthaigh, Stephen Magill, Eric Mertens, Eric Mullen, Serdar Tasiran, Aaron Tomb, and Eddy Westbrook. 2018. Continuous formal verification of Amazon s2n. In Proceedings of the 30th International Conference on Computer Aided Verification (CAV). Oxford, United Kingdom, 430–446.
[27] Lori A. Clarke. 1976. A System to Generate Test Data and Symbolically Execute Programs. IEEE Transactions on Software Engineering 2, 3 (Sept. 1976), 215–222.
[28] Jonathan Corbet. 2015. Post-init read-only memory. https://lwn.net/Articles/666550/.
[29] Victor Costan, Ilia Lebedev, and Srinivas Devadas. 2016. Sanctum: Minimal Hardware Extensions for Strong Software Isolation. In Proceedings of the 25th USENIX Security Symposium. Austin, TX, 857–874.
[30] David Costanzo, Zhong Shao, and Ronghui Gu. 2016. End-to-End Verification of Information-Flow Security for C and Assembly Programs. In Proceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). Santa Barbara, CA, 648–664.
[31] Leonardo de Moura and Nikolaj Bjørner. 2008. Z3: An Efficient SMT Solver. In Proceedings of the 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS). Budapest, Hungary, 337–340.
[32] Leonardo de Moura and Nikolaj Bjørner. 2010. Bugs, Moles and Skeletons: Symbolic Reasoning for Software Development. In Proceedings of the 5th International Joint Conference on Automated Reasoning. Edinburgh, United Kingdom, 400–411.


[33] David L. Detlefs, K. Rustan M. Leino, Greg Nelson, and James B. Saxe. 1998. Extended Static Checking. Research Report SRC-RR-159. Compaq Systems Research Center.
[34] Mihai Dobrescu and Katerina Argyraki. 2014. Software Dataplane Verification. In Proceedings of the 11th USENIX Symposium on Networked Systems Design and Implementation (NSDI). Seattle, WA, 101–114.
[35] Matthias Felleisen, Robert Bruce Findler, Matthew Flatt, Shriram Krishnamurthi, Eli Barzilay, Jay McCarthy, and Sam Tobin-Hochstadt. 2018. A Programmable Programming Language. Commun. ACM 61, 3 (March 2018), 62–71.
[36] Andrew Ferraiuolo, Andrew Baumann, Chris Hawblitzel, and Bryan Parno. 2017. Komodo: Using verification to disentangle secure-enclave hardware from software. In Proceedings of the 26th ACM Symposium on Operating Systems Principles (SOSP). Shanghai, China, 287–305.
[37] Matt Fleming. 2017. A thorough introduction to eBPF. https://lwn.net/Articles/740157/.
[38] Patrice Godefroid, Michael Y. Levin, and David Molnar. 2012. SAGE: Whitebox Fuzzing for Security Testing. Commun. ACM 55, 3 (March 2012), 40–44.
[39] J. A. Goguen and J. Meseguer. 1982. Security Policies and Security Models. In Proceedings of the 3rd IEEE Symposium on Security and Privacy. Oakland, CA, 11–20.
[40] Ronghui Gu, Jeremie Koenig, Tahina Ramananandro, Zhong Shao, Xiongnan Wu, Shu-Chun Weng, Haozhong Zhang, and Yu Guo. 2015. Deep Specifications and Certified Abstraction Layers. In Proceedings of the 42nd ACM Symposium on Principles of Programming Languages (POPL). Mumbai, India, 595–608.
[41] Ronghui Gu, Zhong Shao, Hao Chen, Xiongnan (Newman) Wu, Jieung Kim, Vilhelm Sjöberg, and David Costanzo. 2016. CertiKOS: An Extensible Architecture for Building Certified Concurrent OS Kernels. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI). Savannah, GA, 653–669.
[42] Chris Hawblitzel, Jon Howell, Manos Kapritsos, Jacob R. Lorch, Bryan Parno, Michael L. Roberts, Srinath Setty, and Brian Zill. 2015. IronFleet: Proving Practical Distributed Systems Correct. In Proceedings of the 25th ACM Symposium on Operating Systems Principles (SOSP). Monterey, CA, 1–17.
[43] Chris Hawblitzel, Jon Howell, Jacob R. Lorch, Arjun Narayan, Bryan Parno, Danfeng Zhang, and Brian Zill. 2014. Ironclad Apps: End-to-End Security via Automated Full-System Verification. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI). Broomfield, CO, 165–181.
[44] Daniel Jackson and Jeannette Wing. 1996. Lightweight Formal Methods. IEEE Computer 29, 4 (April 1996), 20–22.
[45] Neil D. Jones, Carsten K. Gomard, and Peter Sestoft. 1993. Partial Evaluation and Automatic Program Generation. Prentice Hall International.
[46] Dejan Jovanović and Leonardo de Moura. 2012. Solving Non-linear Arithmetic. In Proceedings of the 6th International Joint Conference on Automated Reasoning. Manchester, United Kingdom, 339–354.
[47] James C. King. 1976. Symbolic Execution and Program Testing. Commun. ACM 19, 7 (July 1976), 385–394.
[48] Gerwin Klein. 2009. Operating system verification—An overview. Sādhanā 34, 1 (Feb. 2009), 27–69.
[49] Gerwin Klein, June Andronick, Kevin Elphinstone, Toby Murray, Thomas Sewell, Rafal Kolanski, and Gernot Heiser. 2014. Comprehensive formal verification of an OS microkernel. ACM Transactions on Computer Systems 32, 1 (Feb. 2014), 2:1–70.
[50] Gerwin Klein, Kevin Elphinstone, Gernot Heiser, June Andronick, David Cock, Philip Derrin, Dhammika Elkaduwe, Kai Engelhardt, Michael Norrish, Rafal Kolanski, Thomas Sewell, Harvey Tuch, and Simon Winwood. 2009. seL4: Formal Verification of an OS Kernel. In Proceedings of the 22nd ACM Symposium on Operating Systems Principles (SOSP). Big Sky, MT, 207–220.
[51] Paul Kocher, Jann Horn, Anders Fogh, Daniel Genkin, Daniel Gruss, Werner Haas, Mike Hamburg, Moritz Lipp, Stefan Mangard, Thomas Prescher, Michael Schwarz, and Yuval Yarom. 2019. Spectre Attacks: Exploiting Speculative Execution. In Proceedings of the 40th IEEE Symposium on Security and Privacy. San Francisco, CA, 19–37.
[52] Volodymyr Kuznetsov, Johannes Kinder, Stefan Bucur, and George Candea. 2012. Efficient State Merging in Symbolic Execution. In Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). Beijing, China, 193–204.
[53] Leslie Lamport. 2008. Computation and State Machines.
[54] Chris Lattner and Vikram Adve. 2004. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In Proceedings of the 2004 International Symposium on Code Generation and Optimization (CGO). Palo Alto, CA, 75–86.
[55] Dayeol Lee, David Kohlbrenner, Shweta Shinde, Dawn Song, and Krste Asanović. 2019. Keystone: A Framework for Architecting TEEs. https://arxiv.org/abs/1907.10119.
[56] K. Rustan M. Leino. 2010. Dafny: An Automatic Program Verifier for Functional Correctness. In Proceedings of the 16th International Conference on Logic for Programming, Artificial Intelligence and Reasoning (LPAR). Dakar, Senegal, 348–370.
[57] K. Rustan M. Leino and Michał Moskal. 2010. Usable Auto-Active Verification. In Workshop on Usable Verification. Redmond, WA, 4.
[58] Xavier Leroy. 2009. Formal verification of a realistic compiler. Commun. ACM 52, 7 (July 2009), 107–115.
[59] Xavier Leroy, Andrew Appel, Sandrine Blazy, and Gordon Stewart. 2012. The CompCert Memory Model, Version 2. Research Report RR-7987. INRIA.
[60] Peng Li and Steve Zdancewic. 2005. Downgrading Policies and Relaxed Noninterference. In Proceedings of the 32nd ACM Symposium on Principles of Programming Languages (POPL). Long Beach, CA, 158–170.
[61] Moritz Lipp, Michael Schwarz, Daniel Gruss, Thomas Prescher, Werner Haas, Anders Fogh, Jann Horn, Stefan Mangard, Paul Kocher, Daniel Genkin, Yuval Yarom, and Mike Hamburg. 2018. Meltdown: Reading Kernel Memory from User Space. In Proceedings of the 27th USENIX Security Symposium. Baltimore, MD, 973–990.
[62] Haohui Mai, Edgar Pek, Hui Xue, Samuel T. King, and P. Madhusudan. 2013. Verifying Security Invariants in ExpressOS. In Proceedings of the 18th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). Houston, TX, 293–304.
[63] Frank McKeen, Ilya Alexandrovich, Ittai Anati, Dror Caspi, Simon Johnson, Rebekah Leslie-Hurd, and Carlos Rozas. 2016. Intel Software Guard Extensions (Intel SGX) Support for Dynamic Memory Management Inside an Enclave. In Proceedings of the 5th Workshop on Hardware and Architectural Support for Security and Privacy. Seoul, South Korea, 9.
[64] Toby Murray, Daniel Matichuk, Matthew Brassil, Peter Gammie, Timothy Bourke, Sean Seefried, Corey Lewis, Xin Gao, and Gerwin Klein. 2013. seL4: from General Purpose to a Proof of Information Flow Enforcement. In Proceedings of the 34th IEEE Symposium on Security and Privacy. San Francisco, CA, 415–429.
[65] Luke Nelson, Helgi Sigurbjarnarson, Kaiyuan Zhang, Dylan Johnson, James Bornholt, Emina Torlak, and Xi Wang. 2017. Hyperkernel: Push-Button Verification of an OS Kernel. In Proceedings of the 26th ACM Symposium on Operating Systems Principles (SOSP). Shanghai, China, 252–269.
[66] Tobias Nipkow, Lawrence C. Paulson, and Markus Wenzel. 2016. Isabelle/HOL: A Proof Assistant for Higher-Order Logic. Springer-Verlag.
[67] Liam O'Connor, Zilin Chen, Christine Rizkallah, Sidney Amani, Japheth Lim, Toby Murray, Yutaka Nagashima, Thomas Sewell, and Gerwin Klein. 2016. Refinement Through Restraint: Bringing Down the Cost of Verification. In Proceedings of the 21st ACM SIGPLAN International Conference on Functional Programming (ICFP). Nara, Japan, 89–102.
SOSP '19, October 27–30, 2019, Huntsville, ON, Canada. Luke Nelson et al.

[68] Stuart Pernsteiner, Calvin Loncaric, Emina Torlak, Zachary Tatlock, Xi Wang, Michael D. Ernst, and Jonathan Jacky. 2016. Investigating Safety of a Radiotherapy Machine Using System Models with Pluggable Checkers. In Proceedings of the 28th International Conference on Computer Aided Verification (CAV). Toronto, Canada, 23–41.
[69] Alastair Reid. 2017. Who Guards the Guards? Formal Validation of the ARM v8-M Architecture Specification. In Proceedings of the 2017 Annual ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA). Vancouver, Canada, Article 88, 24 pages.
[70] John Rushby. 1992. Noninterference, Transitivity, and Channel-Control Security Policies. Technical Report CSL-92-02. SRI International.
[71] Arvind Seshadri, Mark Luk, Ning Qu, and Adrian Perrig. 2007. SecVisor: A Tiny Hypervisor to Provide Lifetime Kernel Code Integrity for Commodity OSes. In Proceedings of the 21st ACM Symposium on Operating Systems Principles (SOSP). Stevenson, WA, 335–350.
[72] Thomas Sewell, Magnus Myreen, and Gerwin Klein. 2013. Translation Validation for a Verified OS Kernel. In Proceedings of the 34th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). Seattle, WA, 471–482.
[73] SiFive. 2019. SiFive U54 Core Complex Manual, v19.05. SiFive, Inc. https://www.sifive.com/cores/u54
[74] Helgi Sigurbjarnarson, James Bornholt, Emina Torlak, and Xi Wang. 2016. Push-Button Verification of File Systems via Crash Refinement. In Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI). Savannah, GA, 1–16.
[75] Helgi Sigurbjarnarson, Luke Nelson, Bruno Castro-Karney, James Bornholt, Emina Torlak, and Xi Wang. 2018. Nickel: A Framework for Design and Verification of Information Flow Control Systems. In Proceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI). Carlsbad, CA, 287–306.
[76] Venkatesh Srinivasan and Thomas Reps. 2015. Partial Evaluation of Machine Code. In Proceedings of the 2015 Annual ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA). Pittsburgh, PA, 860–879.
[77] Tachio Terauchi and Alex Aiken. 2005. Secure Information Flow As a Safety Problem. In Proceedings of the 12th International Static Analysis Symposium (SAS). London, United Kingdom, 352–367.
[78] The Clang Team. 2019. UndefinedBehaviorSanitizer. https://clang.llvm.org/docs/UndefinedBehaviorSanitizer.html
[79] The Coq Development Team. 2019. The Coq Proof Assistant, version 8.9.0. https://doi.org/10.5281/zenodo.2554024
[80] Emina Torlak and Rastislav Bodik. 2013. Growing Solver-Aided Languages with Rosette. In Onward! Boston, MA, 135–152.
[81] Emina Torlak and Rastislav Bodik. 2014. A Lightweight Symbolic Virtual Machine for Solver-Aided Host Languages. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). Edinburgh, United Kingdom, 530–541.
[82] Jonas Wagner, Volodymyr Kuznetsov, and George Candea. 2013. -OVERIFY: Optimizing Programs for Fast Verification. In Proceedings of the 14th Workshop on Hot Topics in Operating Systems (HotOS). Santa Ana Pueblo, NM, 6.
[83] Xi Wang, David Lazar, Nickolai Zeldovich, Adam Chlipala, and Zachary Tatlock. 2014. Jitk: A Trustworthy In-Kernel Interpreter Infrastructure. In Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI). Broomfield, CO, 33–47.
[84] Xi Wang, Nickolai Zeldovich, M. Frans Kaashoek, and Armando Solar-Lezama. 2013. Towards Optimization-Safe Systems: Analyzing the Impact of Undefined Behavior. In Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP). Farmington, PA, 260–275.
[85] Andrew Waterman and Krste Asanović (Eds.). 2019. The RISC-V Instruction Set Manual, Volume II: Privileged Architecture. RISC-V Foundation.
[86] Konstantin Weitz, Steven Lyubomirsky, Stefan Heule, Emina Torlak, Michael D. Ernst, and Zachary Tatlock. 2017. SpaceSearch: A Library for Building and Verifying Solver-Aided Tools. In Proceedings of the 22nd ACM SIGPLAN International Conference on Functional Programming (ICFP). Oxford, United Kingdom, Article 25, 28 pages.
[87] Jean Yang and Chris Hawblitzel. 2010. Safe to the Last Instruction: Automated Verification of a Type-Safe Operating System. In Proceedings of the 31st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). Toronto, Canada, 99–110.
[88] Fengzhe Zhang, Jin Chen, Haibo Chen, and Binyu Zang. 2011. CloudVisor: Retrofitting Protection of Virtual Machines in Multi-tenant Cloud with Nested Virtualization. In Proceedings of the 23rd ACM Symposium on Operating Systems Principles (SOSP). Cascais, Portugal, 203–216.
