Compiler Techniques for Binary Analysis and Hardening

Politecnico di Milano Dipartimento di Elettronica, Informazione e Bioingegneria Doctoral Programme In Computer Science and Engineering Compiler Techniques for Binary Analysis and Hardening Doctoral Dissertation of: Alessandro Di Federico Supervisor: Prof. Giovanni Agosta Tutor: Prof. Andrea Bonarini The Chair of the Doctoral Program: Prof. Andrea Bonarini XXX Cycle Contents List of Figures v List of Algorithms vii List of Listings ix List of Tables xi Abstract xiii Introduction xv Binary Analysis . xv Binary Hardening . xvii I rev.ng: a unified binary analysis framework 1 1 Background 3 1.1 The Compilation Process . .3 1.2 ELF . .4 1.2.1 Object Files . .5 1.2.2 Executable Programs . .6 1.2.3 Dynamic Loading . .8 1.3 LLVM . 11 1.3.1 The LLVM IR . 12 1.4 QEMU . 15 1.5 Monotone Frameworks . 17 2 A rev.ng Overview 23 2.1 Requirements and design criteria . 23 2.2 revamb: a Static Binary Translator . 24 2.2.1 Representation of the CPU State . 26 2.2.2 Basic Block Identification . 26 2.2.3 Organization of the Translated Code . 27 2.2.4 Handling of Operating System Interactions . 28 2.2.5 Identification of Function Calls . 29 i ii CONTENTS 2.3 Advanced Features . 30 2.3.1 Debugging . 31 2.3.2 Dynamic Libraries Support . 32 2.3.3 Instrumentation . 34 2.3.4 An Alternative Front-end: CGEN . 36 2.3.5 Extending Platform Support . 37 2.4 Performance . 37 3 Basic Block Identification 43 3.1 Problem Statement . 43 3.1.1 Identifying Code and Basic Blocks . 43 3.1.2 Challenges in Jump Target Recovery . 45 3.2 Harvesting Data and Code . 46 3.2.1 Global Data Harvesting . 46 3.2.2 Simple Expression Tracker . 46 3.3 The OSR Analysis . 47 3.3.1 OSR Tracking . 50 3.3.2 BV Tracking . 51 3.3.3 Load and Store Handling . 53 3.3.4 Integration with SET . 54 3.4 Experimental Results . 54 3.4.1 Functional Testing . 56 3.4.2 Basic Block Size . 56 3.5 Conclusions . 59 4 CFG and Function Boundaries Identification 61 4.1 Challenges . 62 4.1.1 Challenges in CFG Recovery . 62 4.1.2 Challenges in the Recovery of Function Boundaries . 63 4.2 Design . 64 4.2.1 Handling of Reaching Definitions . 64 4.2.2 Function Boundaries Recovery . 67 4.3 Experimental Results . 70 4.3.1 Accuracy of the Recovered Function Boundaries . 71 4.3.2 Case Study: the Buggy Memset . 76 4.4 Conclusions . 78 5 Function Prototype Identification 79 5.1 Problem Statement . 79 5.1.1 Calling Conventions Overview . 80 5.1.2 Code Outlining . 81 5.2 Design . 82 5.2.1 Assumptions . 83 5.2.2 The Stack Analysis . 84 5.2.3 The Arguments Analyses . 87 5.2.4 The Final Output . 90 CONTENTS iii 6 Conclusions 97 6.1 Limitations and Future Directions . 97 6.2 Related Works . 97 II Compiler-aided Binary Hardening 101 7 HexVASAN: a Variadic Function Sanitizer 103 7.1 Introduction . 103 7.2 Background . 105 7.2.1 Variadic Functions . 105 7.2.2 Variadic Functions ABI . 106 7.2.3 Variadic Attack Surface . 107 7.2.4 Format String Exploits . 108 7.3 Threat Model . 108 7.4 Design . 109 7.4.1 Analysis and Instrumentation . 109 7.4.2 Runtime Support . 109 7.4.3 Challenges and Discussion . 111 7.5 Implementation . 114 7.6 Evaluation . 116 7.6.1 Case Study: CFI Effectiveness . 117 7.6.2 Exploit Detection . 119 7.6.3 Variadic Functions Statistics . 120 7.6.4 SPEC CPU2006 . 121 7.7 Related Works . 122 7.8 Conclusions . 123 8 leakless: Bypassing Link-time Hardenings 127 8.1 Introduction . 127 8.2 Related Works . 129 8.3 The Dynamic Loader . 131 8.3.1 The ELF Object . 131 8.3.2 Dynamic Symbols and Relocations . 131 8.3.3 Lazy Symbol Resolution . 133 8.3.4 Symbol Versioning . 133 8.3.5 The .dynamic Section and RELRO . 134 8.4 The Attack . 135 8.4.1 The Base Case . 137 8.4.2 Bypassing Partial RELRO . 137 8.4.3 Corrupting Dynamic Loader Data . 138 8.4.4 The Full RELRO Situation [149] . 139 8.5 Implementation . 141 8.5.1 Required Gadgets . 142 8.6 Evaluation . 142 8.6.1 Dynamic Loaders . 142 8.6.2 Operating System Survey . 144 iv CONTENTS 8.6.3 Case Study: Wireshark . 145 8.6.4 Case Study: Pidgin . 146 8.6.5 ROP Chain Size Comparison . 146 8.7 Discussion . 147 8.7.1 leakless Applications . 148 8.7.2 Limitations . 149 8.7.3 Countermeasures . 149 8.8 Conclusion . 151 Conclusions 153 Bibliography 153 A Symbol Versioning Challenges 169 A.1 Constraints due to Symbol Versioning . 169 A.2 The Huge Page Issue . 170 B Dal Vangelo secondo LLVM 173 List of Figures 1.1 Overview of the Compilation Process . .5 1.2 Architectures Supported by QEMU . 15 1.3 Graphs Representing Two Simple Programs . 18 2.1 Overview of the Static Binary Translation Process . 25 2.2 gdb Stepping Through LLVM IR . 31 2.3 Dispatcher Hindering Optimization . 39 3.1 Example of the SET Algorithm . 48 3.2 FSM Representing the BV Signedness State Transitions . 52 3.3 LLVM IR Generated by Two ARM Instructions . 55 4.1 LLVM IR Example of the Need for Path-sensitive Merging . 65 4.2 ARM Instructions Sharing the Same Predicate . 66 4.3 ARM Example of Skipping Jumps . 69 4.4 uClibc ARM Implementation of memset ................ 77 5.1 Graphs of the Argument Analyses . 88 7.1 Overview of the HexVASAN Compilation Pipeline . 110 7.2 Run-time Overhead in the SPECint CPU2006 . 122 8.1 Data Structures Involved in Symbol Resolution . 132 8.2 Illustration of some of the Presented Attacks . ..

Load more