A Multipurpose Formal RISC-V Specification Without Creating New Tools

A Multipurpose Formal RISC-V Specification Without Creating New Tools Thomas Bourgeat, Ian Clester, Andres Erbsen, Samuel Gruetter, Andrew Wright, Adam Chlipala MIT CSAIL Abstract experimentation in basic research, progressively extensible RISC-V is a relatively new, open instruction set architecture to full-fledged realistic systems. with a mature ecosystem and an official formal machine- readable specification. It is therefore a promising playground 1.2 From a golden standard to formal-methods for formal-methods research. infrastructure However, we observe that different formal-methods re- There is the potential for a mutually beneficial relationship search projects are interested in different aspects of RISC-V between the community of formal methods and the RISC-V and want to simplify, abstract, approximate, or ignore the community. other aspects. Often, they also require different encoding On the one hand, the RISC-V Foundation already adopted a styles, resulting in each project starting a new formalization golden standard in the form of an ISA specification in the Sail from-scratch. We set out to identify the commonalities be- language [6]. There is now an agreed-on, machine-processed tween projects and to represent the RISC-V specification as description of authorized behaviors for a significant portion a program with holes that can be instantiated differently by of the spec, beyond the usual natural-language description. different projects. On the other hand, the formal-methods community can Our formalization of the RISC-V specification is written leverage RISC-V as a great laboratory to prototype ideas, in Haskell and leverages existing tools rather than requiring where researchers will be able to get mature compilation new domain-specific tools, contrary to other approaches. To toolchains, realistic processors, and systems to turn minimal our knowledge, it is the first RISC-V specification able to experiments into more real-world projects. serve as the interface between a processor-correctness proof However, different research projects use different lan- and a compiler-correctness proof, while supporting several guages and tools, and even projects using the same language other projects with diverging requirements as well. tend to embrace different specification styles depending on their needs. Therefore, even when several formal-methods 1 Introduction projects use the same RISC-V ISA and maybe even the same language and tools, it seems that they still all need different 1.1 RISC-V as a formal-methods-research enabler formalizations. It would be better if the formalization of the Until recently, formal-methods projects requiring specifi- ISA specification were shared between projects, because a cations of processors were in an uncomfortable situation: specification is only useful and meaningful if many eyes The ISAs used in real processors were very complex, did not proofread and validated it against existing systems in many have openly accessible specifications, or were protected by ways. This large effort should be shared between projects. In patents. other words, we aim for reusable multipurpose formalization. Therefore, each formal-methods project was either invent- While the current golden standard, the Sail model, has ing its own ISA (e.g. [11, 24]) or formalizing from-scratch, been translated automatically into the languages of proof as- arXiv:2104.00762v1 [cs.LO] 1 Apr 2021 at the desired abstraction level, a small subset of an existing sistants, those translated models are produced via a very ver- ISA (e.g. [22]). bose intermediate representation of Sail (several megabytes RISC-V is a game changer: it is particularly compelling for of files) quite impractical for formal-methods projects involv- formal-methods research, because it is a simple clean-slate ing manipulation of those models by-hand. design easy to reason about and yet is part of a mature open Considering the social context and incentive structure, ecosystem. RISC-V is a family of instruction sets broken into this outcome is not surprising. The idea of having one single different native bitwidths (32, 64, 128) and extensions (e.g., formalization for use in multiple projects is fundamentally multiplication and division instructions, atomics, ...) that at-odds with the interests and requirements of each project may be mixed and matched. The the RISC-V software ecosys- using the spec, and it goes beyond the language used. Typi- tem contains fully upstreamed Clang, GCC, and Linux; and cally, each research project is interested in a different aspect more and more commercial and open-source processor de- of the specification and wants to approximate soundly, or signs are becoming available [1–3, 9, 10, 13, 34, 35]. All in all, even simplify incorrectly, the other parts of the specifica- today one can run a mostly complete Linux distribution on a tion. The requirement to simplify and opt out of features RISC-V machine. It makes for a very compelling platform for unrelated to the subject of study is even more important 1 for prototypes and early-stage research projects, where no • SMT-based explorations: for memory-model exploration, one is willing to deal with unrelated complexities. Yet, it an interactive generation of partial executions, itera- is exactly these early-stage projects with limited engineer- tively checked to be compliant with the memory model ing resources that would benefit most from being able to axioms reuse an existing specification, rather than having to start • Testing: batch execution of regression tests incrementally formalizing the RISC-V ISA one more time, • Simulation: execution of off-the-shelf software like inevitably making design mistakes in the process. Linux Our specific formal-methods case studies include the fol- 1.3 Abstracting over use cases lowing that mimic exercises with past multipurpose specs. The main obstacle in writing a multipurpose ISA spec is to • Cross model-checking of our spec with another spec consolidate the conflicting expectations of the different users • Functional-correctness proof in Coq of individual machine- of the spec. code programs Again, it is natural to begin by worrying about “Tower of • Checking of short litmus-test programs against RISC-V’s Babel” problems, where different users of machine specifica- standard weak memory model tions apply quite-different tools and programming languages. Moreover, we note the existence of the following case While we acknowledge that the different programming lan- studies [14] based on our RISC-V Coq specification, not de- guages used can be a first-order practical concern, this paper scribed in detail in the present paper, but relevant because considers that question of syntax as orthogonal to another to our knowledge they are firsts for use of a multipurpose one, subject of this paper, and at least as interesting: To what ISA spec. extent do the different uses of ISA specifications fundamentally • Functional-correctness proof of a pipelined processor require different presentations? • Functional-correctness proof of a software compiler The standard approach [6] considers the ISA specification • The bridging of the two theorems into end-to-end cor- mostly as an abstract syntax tree (almost like a very struc- rectness of a software and hardware system (including tured JSON file), with custom translators and pretty-printers proof of an application in the compiler’s source lan- crafted as-needed by different users. This syntactical tradi- guage) tion has been shown to be very productive to share tools The remaining sections review important elements of the between different ISAs (RISC-V, ARM, x86, ...). However, it RISC-V ISA, explain our specification style, and discuss how misses an opportunity to study what is semantically invari- to cover different use cases (differentiated by how they in- ant in all the different usages of an ISA semantics. stantiate holes in our spec). Our semantics is available on We instead consider that an ISA specification can be de- GitHub1 under a permissive open-source license. scribed as a program with holes. For us, different instantia- tions of those holes with different subprograms will cover 2 Overview the different use cases of the specification. The idea is that a user of the specification can describe The nonprofit RISC-V Foundation developed an instruction her notion of machine state, what it means to set one of the set architecture specification in English [31, 32]. The goal 32 registers, what it means to load something from memory, of our project is to translate the English specification into etc., and we will give her a program that completes what a broadly applicable, machine- and human-readable formal it means to execute any RISC-V instruction on one of her specification. states, to translate a virtual address in that state, etc. There is a particular emphasis on avoiding overspecifi- cation: Platform-specific details that are left unspecified by the English specification should also be unspecified inthe Contributions. This paper shows that it is possible to formal specification, while at the same time enabling users write, in a general-purpose programming language, a multi- of the formal specification to pin down as many of these purpose RISC-V formal specification: a specification used by platform-specific details as appropriate for their use cases. a variety of research projects in formal methods. We demon- Orthogonally, we want to support the many different use strate and explain how to achieve compatibility

Load more