Towards Verified File Systems

Towards verified file systems Thesis submitted for the degree of Doctor of Philosophy at the University of Leicester by Andrea Giugliano Department of Informatics supervised by Dr. Tom Ridge December 6, 2018 ii iii Abstract The formal methods community aims to provide a stack of verified software to users. Verified software is proven to be reliable. The rigour of mathematical logic makes it possible to prove that software meets the designer expectations. File system software enables organized data storage, and in most software systems this functionality is critical. This work provides the basis on which to build a formally verified file system. Firstly, a formal and mechanized specification of POSIX (and Linux, Mac OS X, FreeBSD) is defined and used as an oracle to test if modern implementations behave correctly; then it is shown how to extend this specification with timestamps and the challenges this extension entails; finally the definition of an immutable B-tree and the mathematical verification of its operations are mechanically formalized. operations are mechanically formalized. These achievements bring the development of a verified file system within reach. iv Acknowledgements There are a number of people without whom this thesis might not have been written, and to whom I am greatly indebted. My supervisor Dr. Tom Ridge guided me into the world of research, by supporting me in many moments of disorientation and teaching me the concept of elegant simplicity by example. My colleagues Thomas Türk, David Sheets, Anil Madhavapeddy and Peter Sewell who supported me during the research and that shared the effort needed to publish part of this work. The people at Microsoft Research Limited who believed in the potential of this work and who decided to fund this Ph.D. position. My family that enriched my studies and allowed me to study abroad, and so to achieve an MSc, realize that not everyone eats pasta, and open a path to some interesting research. My beautiful girlfriend who enlightened (and is currently enlightening) the last part of this journey. My warmest thanks to the Informatics department, formerly the Computer Science department, of the University of Leicester, which gave me the opportunity to meet fantastic people who generously shared their knowledge and time with me. My dearest thanks to the Leicester community who made my stay in this city so profound: poets such as Bobba Cass taught me that poetry needs to be in your life before it is in your pen; writers at the Phoenix Writers Club taught me how to write for an erudite and enthusiastic public; orchestra directors such as Paul Jenkins who taught me how to sing and perform opera; runners at the Parkour society who helped me discover how obstacles can become exciting opportunities; children who educated me in how to teach African drumming; street artists who taught me that music is from the heart rather than from the brain; homeless people who taught me to not forget. Thanks also to those who left a sign in history and made me bloom as a person: philosophers, poets, writers and singers such as Alan Watts, Kahlil Gibran, Oscar Wilde and Angelo Lo Forese, respectively. ii Contents 1 Introduction1 Overview of thesis.............................5 2 SibylFS: a formal file system specification7 2.1 Overview................................8 2.2 Introduction..............................8 2.3 Technical challenges.......................... 13 2.4 Model................................. 19 2.5 Test suite and harness........................ 25 2.6 Evaluation and test results...................... 30 2.7 Related work............................. 36 3 SibylFS extended with timestamps 41 3.1 Overview................................ 42 3.2 Prelude................................ 42 3.3 Introduction.............................. 49 3.4 Technical challenges.......................... 67 3.5 Model................................. 69 3.6 Oracle................................. 75 3.7 Results................................. 80 3.8 Related work............................. 86 4 B-trees, formally 89 4.1 Overview................................ 90 4.2 Basics................................. 90 4.3 B-tree definition............................ 98 4.4 Overview of approach to correctness................. 99 iii iv CONTENTS 4.5 Framestacks: a concrete representation of context......... 100 4.6 Find.................................. 102 4.7 Insert................................. 106 4.8 Refinement to block device...................... 111 4.9 Related work............................. 114 5 Conclusion and further work 119 Appendix A: SibylFS main excerpts 121 Appendix B: SibylFS extended with timestamps main excerpts 131 Periodic update events........................... 131 Examples of manual test traces...................... 133 Appendix C: B-tree main excerpts 147 5.1 B-tree wellformedness......................... 147 5.2 B-tree find............................... 154 5.3 B-tree insert.............................. 157 Appendix D: trace that shows ext4 periodicity 163 Bibliography 169 List of Tables 3.1 Number of states obtainable from single transitions........ 46 3.2 Statistics about checking a periodic trace consisting of a sequence of mkdir calls............................. 85 3.3 Statistics about growth of states................... 86 v vi LIST OF TABLES List of Figures 1.1 SibylFS as a specification or as an implementation.........5 2.1 File system testing and trace checking...............8 2.2 Modular structure of the model................... 17 2.3 The model, non-comment lines of specification........... 20 3.1 Example of syscall execution..................... 46 3.2 Example of syscalls updating and observing file time....... 47 3.3 Example of syscalls modeled with logical time........... 48 3.4 Non-deterministic state space explosion............... 58 3.5 Non-deterministic state space explosion for two transitions.... 68 3.6 Algorithm to use SibylFS with timestamp as an oracle...... 74 3.7 Creating observed timestamps from system call output...... 76 3.8 Mitigating the time non-determinism of Figure 3.5......... 79 3.9 Checking a trace involving chmod and timestamps......... 81 3.10 Checking an invalid trace involving chmod and timestamps.... 82 3.11 Checking a concurrent trace involving chmod and timestamps.. 83 3.12 Visualization of the state explosion for a trace that just repeats mkdir commands........................... 85 3.13 Non-determinism explosion for first transition of mkdir trace... 86 4.1 Example of search tree with integer keys.............. 91 4.2 Example of descending a search tree by using a framestack.... 95 4.3 Equivalence of tree and map’s find................. 96 4.4 Equivalence of tree and map’s find without arguments....... 97 4.5 Equivalence of tree and map’s find with steps........... 97 4.6 Correctness refinements........................ 99 vii viii LIST OF FIGURES 4.7 Context refinement from a graph view to an abstract and concrete algebraic view............................. 100 4.8 Example of algebraic context and framestack equivalence..... 102 4.9 Example of insert with splitting................... 107 4.10 Example of insert with splitting and merging of root....... 108 4.11 Algebraic view of a B-tree...................... 112 4.12 Block view of a B-tree........................ 112 4.13 A B-tree according to Bayer and McCreight with branching factor 4 114 4.14 A B*-tree according to Knuth with branching factor 4....... 115 4.15 A B+ tree according to Knuth with branching factor 4. The empty boxes in the leaf nodes hold the data entries............ 116 4.16 Copy-on-write example on a CoW B-tree.............. 116 1 Introduction Software engineering is complex. Complexity causes errors, and the experience of software malfunctions is common. Although one can deal with an unresponsive phone by trying to restart it, there are situations in which software errors are not affordable: some unfortunate examples of this important issue happened in the medical field [45], the space field [18, 59, 66], the military field [72, 73], and in security [19]. For instance, in 1999 a NASA mission worth $327.6 million failed because a team of engineers used a different measurement unit to calculate velocity than the one used by the other components of the system: the Mars Climate Orbiter disappeared in space before accomplishing all the mission’s targets [67]. In such cases the aim should be to eliminate the risk of malfunction. Although standard techniques provide sufficiently reliable software [32], in mathematics one can certify correctness using proof, and one can apply proof to software also. This process is called software formal verification. The computer science community has built numerous tools to make formal verification more practical. Part of the community considers formal methods essential to obtain software correctness [33, 77]. Yet, formal verification is not suitable in all occasions: indeed, it is a complex process and the automation provided by tools available today makes it only just feasible for relatively small examples [2]. One can divide tools for formal verification in two broad categories [5]: 1 2 CHAPTER 1. INTRODUCTION model checkers check the design with respect to the specified properties encoded in a modeling language. A model checker will attempt to do so automatically with limited human intervention and return one of three results: 1. Properties are satisfied by the design. 2. Properties are not satisfied, for which a counterexample will be given. 3. Indeterminate. The state

Load more