Complete Spatial Safety for C and C++ Using CHERI Capabilities
Total Page:16
File Type:pdf, Size:1020Kb
UCAM-CL-TR-949 Technical Report ISSN 1476-2986 Number 949 Computer Laboratory Complete spatial safety for C and C++ using CHERI capabilities Alexander Richardson June 2020 15 JJ Thomson Avenue Cambridge CB3 0FD United Kingdom phone +44 1223 763500 https://www.cl.cam.ac.uk/ c 2020 Alexander Richardson This technical report is based on a dissertation submitted October 2019 by the author for the degree of Doctor of Philosophy to the University of Cambridge, Emmanuel College. This work was supported by the Defense Advanced Research Projects Agency (DARPA) and the Air Force Research Laboratory (AFRL), under contracts FA8750-10-C-0237 (“CTSRD”) and HR0011-18-C-0016 (“ECATS”) as part of the DARPA CRASH and SSITH research programs. The views, opinions, and/or findings contained in this report are those of the author and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government. This work was supported in part by HP Enterprise, Arm Limited and Google, Inc. Approved for Public Release, Distribution Unlimited. Technical reports published by the University of Cambridge Computer Laboratory are freely available via the Internet: https://www.cl.cam.ac.uk/techreports/ ISSN 1476-2986 Abstract Lack of memory safety in commonly used systems-level languages such as C and C++ results in a constant stream of new exploitable software vulnerabilities and exploit techniques. Many exploit mitigations have been proposed and deployed over the years, yet none address the root issue: lack of memory safety. Most C and C++ implementations assume a memory model based on a linear array of bytes rather than an object-centric view. Whilst more efficient on contemporary CPU architectures, linear addresses cannot encode the target object, thus permitting memory errors such as spatial safety violations (ignoring the bounds of an object). One promising mechanism to provide memory safety is CHERI (Capability Hardware Enhanced RISC Instructions), which extends existing processor architectures with capabilities that provide hardware-enforced checks for all accesses and can be used to prevent spatial memory violations. This dissertation prototypes and evaluates a pure-capability programming model (using CHERI capabilities for all pointers) to provide complete spatial memory protection for traditionally unsafe languages. As the first step towards memory safety, all language-visible pointers can beimple- mented as capabilities. I analyse the programmer-visible impact of this change and refine the pure-capability programming model to provide strong source-level compatibility with existing code. Second, to provide robust spatial safety, language-invisible pointers (mostly arising from program linkage) such as those used for functions calls and global variable accesses must also be protected. In doing so, I highlight trade-offs between performance and privilege minimization for implicit and programmer-visible pointers. Finally, I present CheriSH, a novel and highly compatible technique that protects against buffer overflows between fields of the same object, hereby ensuring that the CHERI spatial memory protection is complete. I find that the byte-granular spatial safety provided by CHERI pure-capability codeis not only stronger than most other approaches, but also incurs almost negligible perform- ance overheads in common cases (0.1% geometric mean) and a worst-case overhead of only 23.3% compared to the insecure MIPS baseline. Moreover, I show that the pure-capability programming model provides near-complete source-level compatibility with existing pro- grams. I evaluate this based on porting large widely used open-source applications such as PostgreSQL and WebKit with only minimal changes: fewer than 0.1% of source lines. I conclude that pure-capability CHERI C/C++ is an eminently viable programming environment offering strong memory protection, good source-level compatibility andlow performance overheads. Acknowledgements I would like to take this opportunity to thank the following people for their invaluable help and contribution to this project without whom this research would have not been possible: First of all, I would like to thank my supervisor, Robert Watson, for giving me the opportunity to work on the immensely interesting CHERI project. I cannot thank him enough for the guidance and help he has provided over the past years since I started my MPhil in 2014. I am very grateful for his encouragement and useful critiques of this research work as well as his willingness to answer all the questions I had. I would like to thank David Chisnall for teaching me about the internal workings of the CHERI compiler at the start of my PhD. I would also like to extend my thanks to Khilan Gudka for all his work on the tools that I relied on for my PhD (and for my MPhil before), in particular for making WebKit and libc++ run on CHERI. I would like to thank Nathaniel Filardo for his extremely helpful and detailed feedback on hundreds of pages of my dissertation draft as well as providing assistance with his LaTeX and TikZ skills. Thank you also for the many useful discussions around new CHERI ideas, some of which have made it into this dissertation. I would also like to thank the wider CHERI project members, including Simon Moore, Brooks Davis, Lawrence Esswood, John Baldwin, Alfredo Mazzinghi, Edward Napierała, Jessica Clarke, Alexandre Joannou and Jonathan Woodruff. The CHERI project is a truly collaborative effort that would not be possible without the specific skill-set brought by each individual member. I have learnt so much from every one of you. I would furthermore like to thank my examiners, Alan Mycroft and Howard Shrobe for their invaluable feedback and suggested corrections. I could not have undertaken this research without the funding provided by Hewlett-Packard Enterprise and DARPA. I would like to thank Hewlett-Packard Labs, and particularly Dejan Milojicic, for the internship in Palo Alto that sparked many further research ideas. I would like to thank my family for their constant support and for giving me the opportunity to study and do research at Cambridge. Thank you also for proofreading my entire dissertation (multiple times!). Finally, I would like to thank Claire for her patience and amazing support over the past four years. Thank you also for many hours proofreading of my thesis. I could not have done this without you! Contents 1 Introduction 11 1.1 Contributions . 13 1.2 Publications . 14 1.2.1 Publications directly contributing to this dissertation . 14 1.2.2 Other publications . 15 1.3 Work performed in collaboration with others . 16 1.4 Outline . 17 2 Background 19 2.1 Memory-corruption attacks . 19 2.1.1 Code-injection attacks . 19 2.1.2 Code-reuse attacks . 20 2.1.3 Data-only attacks . 21 2.1.4 Information leakage . 22 2.1.5 Multi-purpose mitigation techniques . 22 2.2 CHERI . 23 2.3 Linkage . 25 2.3.1 Relocation processing . 25 2.3.2 GOT and PLT . 26 3 Pure-capability CHERI C/C++ 29 3.1 The pure-capability programming model . 30 3.2 Pointer representation . 32 3.2.1 Pointer size . 32 3.2.2 Bounded pointers . 33 3.2.3 Alignment of capabilities . 34 3.2.4 Preserving tags when copying memory . 35 3.2.5 Pointer comparison . 37 3.2.6 Pointer permissions . 38 3.3 CHERI capability registers . 39 3.3.1 Function prototypes and calling conventions . 39 3.3.1.1 Unprototyped (K&R) functions . 39 3.3.1.2 Variadic argument handling . 39 3.3.2 Recommended approach . 40 3.4 CHERI capability precision . 41 3.4.1 Choosing the precision for CHERI-128 . 41 3.4.2 Out-of-bounds pointers . 41 3.4.3 Compatibility concerns due to precision . 43 3.4.4 ISA extensions related to precision . 43 3.4.5 Compiler changes due to reduced precision . 44 3.5 Conversions between capabilities and integers . 45 3.5.1 Conversions in hybrid CHERI C . 45 3.5.2 Integer-to-pointer casts in pure-capability C . 45 3.6 Offset and address interpretation of capabilities . 46 3.6.1 Bitwise operations on capabilities . 47 3.6.1.1 Changing and checking pointer alignment . 47 3.6.1.2 Storing additional data in pointers . 48 3.6.1.3 Computing hash values . 49 3.6.2 Implicit conversions between uintptr_t and integers . 49 3.6.3 Other compatibility concerns . 50 3.6.4 Introducing an address interpretation of capabilities . 51 3.7 Adding C++ support . 51 3.8 Optimizations for pure-capability code . 52 3.8.1 ISA changes . 52 3.8.2 Optimizing bounds on stack variables . 54 3.9 Unsupported C/C++ programming idioms . 55 3.9.1 XOR-linked lists . 56 3.9.2 Offsets relative to the current structure . 56 3.9.3 Updating pointers after realloc() .................. 56 3.9.4 Byte-wise copying of data . 57 3.9.5 Using high pointer bits . 57 3.9.6 Yet to be discovered issues . 57 3.10 Future changes to pure-capability C/C++ .................. 57 3.10.1 True offset semantics . 58 3.10.2 Explicit provenance source for uintptr_t .............. 58 3.10.3 Strict compilation mode (CHERI sanitizer) . 58 3.11 Summary . 58 4 Pure-capability CHERI linkage 61 4.1 Introduction . 62 4.2 CHERI pure-capability linkage models . 63 4.2.1 MIPS n64 baseline . 65 4.2.2 Original CHERI static linkage model . 66 4.2.3 Function-descriptor prototype . 67 4.2.4 PC-relative ABI . 68 4.2.5 PLT ABI . 70 4.2.6 Overview . 72 4.3 PLT stubs and lazy binding . 72 4.3.1 Dynamic allocation of PLT stubs . 72 4.3.2 Run-time configurable policy . 73 4.3.3 Lazy binding .