Defending In-Process Memory Abuse with Mitigation and Testing

A Dissertation Presented by

Yaohui Chen

to

The Khoury College of Computer Sciences

in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

in

Computer Science

Northeastern University Boston, Massachusetts

October 2019
Version Dated: October 21, 2019

To my parents, who gave me life like a river flows,
&
To Boyu, my best friend, who accompanies me through the rapids and undertows.

Contents

List of Figures v

List of Tables viii

Acknowledgments x

Abstract of the Dissertation xii

1 Introduction 1
1.1 Problem Statement ...... 1
1.2 Thesis Statement ...... 3
1.3 Contributions ...... 3
1.3.1 A Hybrid Approach for Practical Fine-grained Software Randomization ...... 3
1.3.2 Leave No Program Behind: Execute-only Memory Protection For COTS Binaries ...... 4
1.3.3 Keep My Secrets: In-process Private Memory ...... 4
1.3.4 Focus on bugs: Bug-driven Hybrid Fuzzing ...... 5
1.3.5 Learning On Experience: Smart Seed Scheduling for Hybrid Fuzzing ...... 6
1.4 Roadmap ...... 7

2 Related Works 8
2.1 Perpetual War On Memory Corruption Attacks ...... 8
2.2 In-Process Memory Isolation ...... 10
2.3 Automatic Software Tests Generation ...... 12

I Runtime Protections Against In-Process Abuse 15

3 Code Reuse Exploit Mitigations 16
3.1 Compiler-assisted Code Randomization ...... 16
3.1.1 Background ...... 16
3.1.2 Overall Approach ...... 19
3.1.3 Compiler-level Metadata ...... 21
3.1.4 Link-time Metadata Consolidation ...... 25

3.1.5 Code Randomization ...... 28
3.1.6 Experimental Evaluation ...... 28
3.2 Enabling Execute-Only Memory for COTS Binaries On AArch64 ...... 33
3.2.1 Overview ...... 33
3.2.2 Background ...... 34
3.2.3 Design ...... 38
3.2.4 Evaluation ...... 47
3.3 Limitations ...... 51

4 In-process Memory Isolation 52
4.1 Overview ...... 53
4.2 Design ...... 55
4.3 Implementation ...... 64
4.4 Evaluation ...... 66
4.5 Limitations and Discussion ...... 71

II Offline Software Testing To Find Memory Corruption Bugs 72

5 Bug-driven Hybrid Testing 74
5.1 Background and Motivation ...... 74
5.1.1 Inefficiency of Existing Coverage-guided Hybrid Testing ...... 74
5.1.2 Motivation ...... 75
5.2 Design ...... 77
5.2.1 Core Techniques ...... 77
5.2.2 System Design ...... 80
5.3 Implementation ...... 85
5.4 Evaluation ...... 87
5.4.1 Evaluation with LAVA-M ...... 88
5.4.2 Evaluation with Real-world Programs ...... 90
5.4.3 Vulnerability Triage ...... 93

6 Learning-based Hybrid Fuzzing 98
6.1 Introduction ...... 98
6.2 Background ...... 100
6.2.1 Hybrid Fuzzing ...... 100
6.2.2 Supervised Machine Learning ...... 102
6.3 System Design ...... 103
6.3.1 System Overview ...... 103
6.3.2 System Requirements ...... 103
6.3.3 Feature Engineering ...... 105
6.3.4 Seed Label Inference ...... 107
6.3.5 Model Construction and Prediction ...... 108
6.3.6 Updating Model ...... 109
6.4 Evaluation and Analysis ...... 110

6.4.1 Evaluation setup ...... 110
6.4.2 Learning Effectiveness ...... 111
6.4.3 Insights and Analyses ...... 112
6.4.4 Model Reusability ...... 113
6.4.5 Model Transferability ...... 114
6.4.6 Discovered Bugs ...... 115
6.5 Discussions ...... 117
6.5.1 Applicability of different machine learning models ...... 117
6.5.2 Applicability of MEUZZ on grey-box fuzzing ...... 118

7 Conclusion 123

Bibliography 126

List of Figures

3.1 Example of the fixup and relocation information that is involved during the compilation and linking process. ...... 18
3.2 Overview of the proposed approach. A modified compiler collects metadata for each object file (1), which is further updated and consolidated at link time into a single extra section in the final executable (2). At the client side, a binary rewriter leverages the embedded metadata to rapidly generate randomized variants of the executable (3). ...... 21
3.3 An example of the ELF layout generated by Clang (left), with the code of a particular function expanded (center and right). The leftmost and rightmost columns in the code listing ("BBL" and "Fragment") illustrate the relationships between basic blocks and LLVM's various kinds of fragments: data (DF), relaxable (RF), and alignment (AF). Data fragments are emitted by default, and may span consecutive basic blocks (e.g., BBL #1 and #2). The relaxable fragment #1 is required for the branch instruction, as it may be expanded during the relaxation phase. The padding bytes at the bottom correspond to a separate fragment, although they do not belong to any basic block. ...... 22
3.4 Example of jump table code generated for non-PIC and PIC binaries. ...... 25
3.5 Overview of the linking process. Per-object metadata is consolidated into a single section. ...... 27
3.6 Performance overhead of fine-grained (function vs. basic block reordering) randomization for the SPEC CPU2006 benchmark tests. ...... 29
3.7 NORAX System Overview: the offline tools (left) analyze the input binary, locate all the executable data and their references (when available), and then statically patch the metadata to the raw ELF; the runtime components (right) create separated mappings for the executable data sections and update the recorded references as well as those generated at runtime. ...... 39
3.8 The layout of an ELF transformed by NORAX. The shaded parts at the end are the generated NORAX-related metadata. ...... 44
3.9 Bionic Linker's binary loading flow. NLoader operates in different binary preparation stages, including module loading, relocation and symbol resolution. ...... 44
3.10 Unixbench performance overhead for unixbench binaries, including runtime, peak resident memory and file size overhead (left: user tests, right: system tests). ...... 50

4.1 Shreds, threads, and a process ...... 52
4.2 Developers create shreds in their programs via the intuitive APIs and build the programs using S-compiler, which automatically verifies and instruments the executables (left); during runtime (right), S-driver handles shred entrances and exits on each CPU/thread while efficiently granting or revoking each CPU's access to the s-pools. ...... 54
4.3 The DACR setup for a quad-core system, where k = 4. The first 3 domains (Dom0–Dom2) are reserved by the OS. Each core has a designated domain (Dom3–Dom6) that it may access when executing a shred. No CPU can access Dom7. ...... 61
4.4 A shred's transition of states ...... 61
4.5 The time and space overhead incurred by S-compiler during the offline compilation and instrumentation phase ...... 67
4.6 The time needed for a context switch when: (1) a shred-active thread is switched off, (2) a regular thread is switched off but no process or address space change occurs, and (3) a regular thread is switched off and a thread from a different process is scheduled on. ...... 67
4.7 Invocation time of shred APIs and reference system calls (the right-most two bars are on log scale). It shows that shred entry is faster than thread creation, and s-pool allocation is slightly slower than basic memory mapping. ...... 69
4.8 Five SPEC2000 benchmark programs tested when: (1) no shred is used, (2) shreds are used but without the lazy domain adjustment turned on in S-driver, and (3) shreds are used with the lazy domain adjustment. ...... 69

5.1 A demonstrative example of hybrid testing. Figure 5.1a presents the code under test. Figures 5.1b and 5.1c are the paths followed by two seeds from the fuzzer. Their execution follows the red line and visits the grey boxes. Note that the white boxes connected by dotted lines are non-covered code. ...... 75
5.2 A demonstrative example of the limitation of existing hybrid testing in finding defects. This defect comes from objdump-2.29 [33]. ...... 76
5.3 An example showing how to estimate the bug-detecting potential of a seed. In this example, the seed follows the path b1->b2->b3->b4. Basic blocks b5 and b7 are unexplored and they can reach L1 and L2 UBSan labels, respectively. They have been attempted by constraint solving for S1 and S2 times. The final score for this seed is (e^(-0.05*S1) * L1 + e^(-0.05*S2) * L2) / 2. ...... 78
5.4 Solving the integer overflow in Figure 5.2. This shows the case on a 32-bit system, but it applies to 64-bit as well. ...... 79
5.5 System architecture of SAVIOR. ...... 80
5.6 A demonstrative example of reachability analysis. The target BB can "reach" 3 UBSan labels. ...... 82
5.7 Fork server mode in KLEE. In this mode, KLEE only performs initialization once and reuses the same executor for all the received seeds. ...... 84
5.8 Evaluation results with LAVA-M. The left column shows the number of bugs reached by different fuzzers and the right column shows the number of bugs triggered by the fuzzers. ...... 96

5.9 Evaluation results with real-world programs over 24 hours. p1 and p2 are the p-values for the Mann-Whitney U-test of SAVIOR vs. DRILLER and SAVIOR vs. QSYM, respectively. ...... 97

6.1 General hybrid fuzzing workflow. ...... 101
6.2 System overview of MEUZZ. The coordinator is extended with an ML engine, which consists of 4 modules: feature extraction, label inference, prediction and training. During fuzzing, utility prediction and model training are carried out consecutively. After extracting features for inputs in the fuzzer's queue, the ML engine can predict their utilities based on the current model. Then, with the seed labels inferred from previously selected seeds, the model is trained iteratively with the new data. ...... 104
6.3 Examples that show how bug-triggering and coverage features are computed. ...... 106
6.4 Branch coverage fuzzing with valid seeds (higher is better). p1, p2 and p3 are p-values in the Mann-Whitney U test comparing QSYM with MEUZZ-OL, MEUZZ-RF and MEUZZ-EN. ...... 119
6.5 The box plots show the importance of the features on nine programs. The importance is extracted by training an offline random forest model, and the features are ranked by the median of their importance. Queue Size and New Cov are the most and the least important ones, respectively. ...... 120
6.6 Branch coverage fuzzing with naive seeds (higher is better). p1, p2 and p3 are p-values in the Mann-Whitney U test comparing QSYM with MEUZZ-OL, MEUZZ-RF and MEUZZ-EN, respectively. ...... 121
6.7 This heat map shows the coverage improvement of MEUZZ-OL with model initialization over vanilla MEUZZ-OL. The Y-axis is the tested programs; the X-axis is the models used for initialization. Each cell shows the relative coverage comparison (%). The diagonal values show the coverage improvement on each program after initializing MEUZZ with the model learned from the same program (reusability). Model transferability is shown in 7 out of the 8 programs. ...... 122
6.8 Off-by-one heap read overflow in tiff2ps. ...... 122

List of Tables

3.1 Collected randomization-assisting metadata ...... 24
3.2 Experimental evaluation dataset and results (* indicates programs written in C++) ...... 32
3.3 Access permissions for stage 1 EL0 and EL1 ...... 35
3.4 ELF sections that comprise the code segment of the example program; the highlighted ones are located in the same page. ...... 37
3.5 Android Marshmallow system binaries that have embedded data on Nexus 5X. ...... 38
3.6 Sections in the executable code page that are handled by NORAX ...... 38
3.7 ELF section reference types ...... 38
3.8 Rewritten program functionality tests. ...... 48
3.9 System compatibility evaluation: the converted zygote, qseecomd, installd, rild, logd, surfaceflinger, libc++, and libstagefright were selected randomly to participate in the test, to see whether they can run transparently with other unmodified system components. ...... 48
3.10 Binary transformation correctness test. ...... 49
3.11 Embedded data identification correctness. An empirical experiment shows our analysis works well on AArch64 COTS ELFs, with a zero false negative rate and a very low false positive rate in terms of finding embedded data. The last column shows the negligible number of leftover gadgets in the duplicated embedded data set. ...... 49

4.1 The 5 open-source programs used in evaluation ...... 66
4.2 End-to-end overhead observed while the tested programs perform a complete task: the left side of the table shows the execution time and the right side shows the memory footprint. ...... 70

5.1 Families of potential bugs that SAVIOR enables UBSan to label. Here, x, y are n-bit integers; array is an array, the size of which is specified as size(array); op_s and op_u refer to binary operators +, -, *, /, % over signed and unsigned integers, respectively. ...... 82
5.2 Fuzzer-specific settings in the evaluation with LAVA-M. ...... 85
5.3 LAVA-M bugs triggered by different fuzzers (before bug-guided verification). "X%" indicates that X% of the listed LAVA bugs are triggered. ...... 90
5.4 LAVA-M bugs triggered by different fuzzers (after bug-guided verification). "X%" indicates that X% of the listed LAVA bugs are triggered. ...... 90

5.5 Real-world benchmark programs and evaluation settings. In the Seeds column, AFL indicates we reuse the test cases provided with AFL, and built-in indicates that we reuse the test cases shipped with the program. ...... 91
5.6 Number of unique UBSan labels reached by different fuzzers in 24 hours. On average SAVIOR reaches 19.68% and 15.18% more labels than DRILLER and QSYM. ...... 93
5.7 New UBSan violations triggered with bug-guided verification in the evaluation with real-world programs. "+X/Y%" means "X" new violations are triggered, increasing the total number by "Y%". ...... 94
5.8 Triage of UBSan violations triggered by SAVIOR in 24 hours. ...... 95

6.1 Evaluation settings ...... 110
6.2 Execution time spent on different learning stages ...... 112
6.3 Bugs discovered by MEUZZ. UB, ME, DoS, and ML refer to Undefined Behavior, Memory Error, Denial of Service, and Memory Leak, respectively. ...... 116

Acknowledgments

I would like to extend my greatest gratitude to my Ph.D. advisor, Prof. Long Lu. Not only is he as great a research advisor as I could ever ask for; in life, he is also like a big brother to me. As a research advisor, he was always supportive, encouraging me to do research that I am passionate about. Along the way, he also guided me in applying critical thinking to distill and crystallize fuzzy ideas. As a big brother, he listened to my distress and grief about life in a foreign country. We also shared a lot of joys together; I will never forget the rejoicing when our first S&P paper got accepted after a year of hard work. These invaluable and unforgettable experiences helped me grow into an independent researcher and conquer obstacles in life.

I would also like to thank my thesis committee members, Prof. Engin Kirda, Prof. Wil Robertson and Dr. Weidong Cui. Their constructive feedback and helpful suggestions helped me shape this thesis into its better form.

If my Ph.D. student life were a painting, the internship experiences would be among its most colorful strokes. I was fortunate to work with my mentors Dr. Weidong Cui, Dr. Xinyang Ge and Dr. Ben Niu at Microsoft Research; Dr. László Szekeres, Dr. Stefan Bucur and Dr. Franjo Ivancic at Google; Dr. Hayawardh Vijayakumar and Dr. Mike Grace at Samsung Research; and Dr. Peng Li and Dr. Tao Wei at Baidu X-Lab. During my internships, they provided the best working environments one could ask for. They also showed me the importance of great teamwork, and how to cultivate research ideas and land them through solid engineering. I carried everything I learned from them into my research after the internships.

I am also grateful to have met many friends during my internships. I interacted and collaborated with them, directly and indirectly, in work, in research, and in life. They made the whole journey much more fun and unforgettable: Prof. Jun Xu, Dr. Nan Zhang, Prof. Wenbo Shen, Prof. Dave Jing Tian, Dr. Yuru Shao, Dr. Yueh-Hsun Lin, Dr. Yuping Li, Dr. Ruowen Wang, Dr. Xun Chen, Rohan Padhye, Dr. Rundong Zhou, Dr. Qian Feng, Dr. Shengjian Guo, Dr. Haining Chen, Yulong Zhang, Dr. Mingshen Sun, Dr. Yu Ding, Dr. Yizheng Chen, Dr. Yiming Gong, Dr. An Liu, Dr. Yueqiang Cheng, Zhaofeng Chen, Hangchen Yu, Willy Vasquez, Meng Xu and Dr. Markus Kusano. These collaborative experiences had a very positive influence not only on my research but also on my communication and social networking.

Of course, my Ph.D. journey would not be complete without my friends from Stony Brook University and Northeastern University: Zhichuang, Bo, Suwen, Farhan, Mingwei, Rui, Hyungjoon, Nahid, Meng, Shachee, Tapti, Andrea Possemato, Andrea Mambretti, Fangfan, Ahmad, Sajjad, Ahmin, Conor, Shuwen, Jingjing, Matthew, Eyza, Desheng, Mansour, Reza, Ruimin, Ryan, Alejandro, Tomasso, Omin and Jeremiah. We had a lot of fun times together, and I am grateful for their company.

This thesis is built upon foundational knowledge of computing systems and security, most of which I acquired during my study at Stony Brook University. I want to thank Prof. Donald Porter, Prof. Nima Hornamand, Prof. Michalis Polychronakis, Prof. Nick Nikiforakis and Prof. R. Sekar for passing on their knowledge to me in and out of the classroom. This systems and security knowledge greatly benefited my Ph.D. career later on.

Lastly, I want to take this opportunity to specially thank Kelwin, Fish, DeAdCaT, zTrix, MaskRay and Flanker (the old Blue Lotus members) for being my inspiration to pursue the path of computer system security.

Without all these aforementioned people, my life pursuing the Ph.D. would not have been the same, and I hold the utmost gratitude for them showing up in this fantastic journey.

Abstract of the Dissertation

Defending In-Process Memory Abuse with Mitigation and Testing

by

Yaohui Chen

Doctor of Philosophy in Computer Science

Northeastern University, October 2019
Version Dated: October 21, 2019

Dr. Long Lu, Advisor

Modern software often includes large code bases from different origins with different trust levels. This creates a large attack surface and raises the security concern that sensitive information of one component is directly accessible by other (malicious or manipulated) components in memory. In this thesis, I refer to this problem as in-process memory abuse. Despite the prevalence of in-process abuses, defense mechanisms are not well studied, due to the complex root causes and attack surfaces of such attacks. First of all, a large amount of existing software is written in type-unsafe languages such as C and C++. Such languages are notorious for being error-prone. These programming errors have incurred countless high-severity security bugs that lead to in-process memory attacks. Secondly, contemporary defenses such as data execution prevention (DEP) and address space layout randomization (ASLR) have little effect on preventing in-process memory attacks. Last but not least, developers are often helpless when trying to protect their sensitive data, due to the lack of support for creating boundaries within the same process context. As a result, as long as one of the many components is successfully exploited, the whole program's sensitive data and code are subject to abuse. A common belief is that in-process abuse cannot be defended against without high overhead or loss of backward compatibility. To reduce memory corruption bugs, options such as formally verifying all software or rewriting the whole software stack in type-safe languages are impractical, due to the poor scalability of formal verification methods and the immense engineering cost required to rebuild all existing software infrastructure. To prevent exploitation of memory corruption bugs, one may suggest adopting full memory safety by bounds-checking all pointers and tracking the liveness of every allocated memory object. However, this comes with intolerable overheads.

Lastly, existing work proposes rewriting established operating system design paradigms to create sub-process isolation; this creates incompatibility and reduces the practicality of the solution. Challenging these common beliefs, this thesis presents a series of practical defenses against in-process memory abuse. It includes runtime protections [80, 82, 130] and offline bug detection [53, 77, 78]. Collectively, these new techniques improve the state-of-the-art defense against in-process memory abuse without sacrificing practicality and compatibility. First, I present CCR [130], a compiler-binary rewriter toolchain that enables fine-grained software randomization. CCR solves the incompatibility of existing fine-grained randomization approaches by aligning its defense implementation with established software deployment and bug report paradigms. However, fine-grained randomization alone is still vulnerable to just-in-time info-leak aided code reuse attacks. To tighten this loose end, I introduce NORAX [82], a binary rewriting framework to retrofit execute-only memory (XOM) protection into source-unavailable programs. Then, I design shreds [80], fine-grained execution units with private memory, as an extra line of defense against in-process abuse. Shreds enable sub-process isolation without relying on nested paging, virtualization or even modified hardware. They incur negligible overheads and are highly compatible with the existing operating system design paradigm of process/thread-based execution units. Lastly, for offline software test generation, I present SAVIOR [78] and MEUZZ [77], two advanced hybrid fuzzing frameworks: SAVIOR uses a bug-driven oracle to quickly find more bugs, while MEUZZ learns from past fuzzing statistics to tune its seed scheduling strategies.
By designing and conducting large-scale experiments with these proposed defenses on real-world software, I demonstrate that in-process memory abuse can be reasonably well defended against and prevented. The insights and knowledge gained during the development of this thesis have raised the community's awareness of in-process abuse and advanced the state-of-the-art defense against such attacks. Each of the included works has yielded at least one practical defense or automated software testing system. Many of them have also been adopted by industry, blocking malicious in-process abuse attempts and uncovering highly severe security bugs in critical software infrastructure on a daily basis, which highlights the broad impact of this thesis.

Chapter 1

Introduction

1.1 Problem Statement

Many attacks on software aim at accessing sensitive content in victim programs' memory, including secret data (e.g., crypto keys and user passwords) and critical code (e.g., private APIs and privileged functions). To achieve this goal, such attacks normally start with remote exploitation or injected malicious libraries. For instance, the HeartBleed attack on OpenSSL-equipped software reads private keys by exploiting a memory disclosure vulnerability [98]; the malicious libraries found in mobile apps covertly invoke private framework APIs to steal user data [95]. We generally refer to this class of attacks as in-process abuse. Obviously, such attacks would not succeed if we were able to (i) defend the victim program against exploitation attempts; or (ii) isolate the sensitive data and code from hostile code running in the same process. Despite decades of research, existing defense techniques still cannot meet the demand for practical and effective mitigations against in-process abuse, mainly for the following reasons.

The Pervasiveness of Memory Corruption Bugs: Memory-unsafe languages such as C and C++ allow developers to directly access memory with raw pointers. This great flexibility also imposes the burden on developers to make sure that no memory access violates spatial (e.g., out-of-bounds access) or temporal (e.g., use-after-free) memory safety. This process, unfortunately, is very error-prone. As a result, software inevitably contains defects [32, 181]. A large number of these defects are security vulnerabilities that can be exploited for malicious purposes [149]. This type of vulnerable code has become a fundamental threat to software security.


The Arms Race of Memory Corruption Attacks: Once a memory corruption bug is found, an attacker may use it to gain illegal memory access to hijack the control flow, alter the program's execution logic, or read memory that can later facilitate exploitation. To execute arbitrary code, attacks used to inject shellcode along with the deployed payload into the victim program's address space. Modern commodity operating systems employ code integrity protection techniques, such as data execution prevention (DEP), to prevent traditional code injection attacks. Consequently, recent attacks [61, 182] increasingly leverage code-reuse techniques to gain control of vulnerable programs. In code reuse attacks, a target application's control flow is manipulated in a way that snippets of existing code (called gadgets) are chained to carry out malicious activities. Knowledge of the process memory layout is a key prerequisite for code-reuse attacks to succeed. Attackers need to know the exact binary instruction locations in memory to assemble the chain of gadgets. Commodity operating systems widely adopt address space layout randomization (ASLR), which loads code binaries at random memory locations unpredictable to attackers. Without knowing the locations of needed code or gadgets, attackers cannot build code-reuse chains. However, memory disclosure attacks can use information leaks in programs to determine code locations, thus defeating ASLR. Such attacks either read the program code (direct de-randomization) or read code pointers (indirect de-randomization). Although deployed ASLR techniques randomize the load address of a large chunk of data or code, leaking a single code pointer or a small sequence of code allows attackers to identify the corresponding chunk, infer its base address, and calculate the addresses of gadgets contained in the chunk.
After learning the location of existing code, attackers can then launch return-oriented programming (ROP) style code reuse attacks to execute arbitrary code in the victim process.

The Insufficient Support for In-process Memory Isolation: Developers are virtually helpless when it comes to preventing in-process abuse in their programs, due to a lack of support from the underlying operating system (OS): the memory isolation mechanisms provided by modern OSes operate merely at the process level and cannot be used to establish security boundaries inside a process. As a result, protecting sensitive memory content against malicious code inside the same process remains an open issue, which has been increasingly exploited by attackers. To address this open issue, some recent work proposed thread-level memory isolation [62]. Taking distinct approaches, these works allow developers to limit the sharing of a thread's memory space with other threads in the same process. However, this line of work faces three major limitations. First, thread-level memory isolation is still too coarse to stop in-process abuse, because exploitable or malicious code often runs in the same thread as the legitimate code that needs to access sensitive memory content. Second, adopting these solutions requires significant effort from developers. Separating application components into different threads (i.e., scheduling units) demands major design changes, as opposed to regional code patches, to deal with the added concurrency. Third, threads with private memory tend to incur much higher overhead than normal threads, due to the additional page table switches, TLB flushes, or nested page table management upon context switches.

1.2 Thesis Statement

In-process memory abuse has become a dominating problem in software security, yet little research has studied defense mechanisms against it. A common belief is that in-process abuse cannot be defended against without high overhead or loss of backward compatibility. This thesis challenges that assumption. I propose a holistic defense including runtime mitigations and offline detection techniques. Collectively, these tools improve the state-of-the-art defense against in-process memory abuse without sacrificing practicality and compatibility.

1.3 Contributions

Next, I give an overview of the solutions I propose to address the problems discussed in § 1.1. Our goal is to create a comprehensive defense against the in-process abuse attacks that plague the current software ecosystem. To this end, this thesis makes the following contributions.

1.3.1 A Hybrid Approach for Practical Fine-grained Software Randomization

Despite decades of research on software diversification, only address space layout randomization has seen widespread adoption. Code randomization, an effective defense against return-oriented programming exploits, has failed to gain wide adoption in practice mainly due to i) the lack of a transparent and streamlined deployment process that does not disrupt existing software distribution norms, and ii) the inherent incompatibility of program variants with error reporting, whitelisting, patching, and other operations that rely on code uniformity. To this end, we present compiler-assisted code randomization (CCR), a hybrid approach that relies on compiler–rewriter cooperation to enable fast and robust fine-grained code randomization on end-user systems, while maintaining compatibility with existing software distribution models. The main concept behind CCR is to augment binaries with a minimal set of transformation-assisting metadata, which i) facilitate rapid fine-grained code transformation at installation or load time, and ii) form the basis for reversing any applied code transformation when needed, to maintain compatibility with existing mechanisms that rely on referencing the original code. We have implemented a prototype of this approach by extending the LLVM compiler toolchain, and developing a simple binary rewriter that leverages the embedded metadata to generate randomized variants using basic block reordering. The results of our experimental evaluation demonstrate the feasibility and practicality of CCR, as on average it incurs a modest file size increase of 11.46% and a negligible runtime overhead of 0.28%, while it is compatible with link-time optimization and control flow integrity.

1.3.2 Leave No Program Behind: Execute-only Memory Protection For COTS Binaries
Code reuse attacks exploiting memory disclosure vulnerabilities can bypass all deployed mitigations. One promising defense against this class of attacks is to enable execute-only memory (XOM) protection on top of fine-grained address space layout randomization (ASLR). However, recent works implementing XOM, despite their efficacy, only protect programs that have been (re)built with new compiler support, leaving commercial-off-the-shelf (COTS) binaries and source-unavailable programs unprotected. We present the design and implementation of NORAX, a practical system that retrofits XOM into stripped COTS binaries on AArch64 platforms. Unlike previous techniques, NORAX requires neither source code nor debugging symbols. NORAX statically transforms existing binaries so that during runtime their code sections can be loaded into XOM memory pages with embedded data relocated and data references properly updated. NORAX allows transformed binaries to leverage the new hardware-based XOM support, a feature widely available on AArch64 platforms (e.g., recent mobile devices) yet virtually unused due to the incompatibility of existing binaries. Furthermore, NORAX is designed to co-exist with other COTS binary hardening techniques, such as in-place randomization (IPR). We apply NORAX to commonly used Android system binaries running on SAMSUNG Galaxy S6 and LG Nexus 5X devices. The results show that NORAX on average slows down the execution of transformed binaries by 1.18% and increases their memory footprint by 2.21%, suggesting NORAX is practical for real-world adoption.

1.3.3 Keep My Secrets: In-process Private Memory

Once attackers manage to execute code in a victim program’s address space (i.e., after bypassing the code reuse mitigations), or find a memory disclosure vulnerability, all sensitive data and code inside that address space are subject to theft or manipulation. Unfortunately, this broad type of attack is hard to prevent, even if software developers wish to cooperate, mostly because conventional memory protection only works at the process level and previously proposed in-process memory isolation methods are not practical for wide adoption. We propose shreds, a set of OS-backed programming primitives that address developers’ currently unmet needs for fine-grained, convenient, and efficient protection of sensitive memory content against in-process adversaries. A shred can be viewed as a flexibly defined segment of a thread execution (hence the name). Each shred is associated with a protected memory pool, which is accessible only to code running in the shred. Unlike previous works, shreds offer in-process private memory without relying on separate page tables, nested paging, or even modified hardware. Plus, shreds provide the essential data flow and control flow guarantees for running sensitive code. We have built the compiler toolchain and the OS module that together enable shreds on Linux. We demonstrated the usage of shreds and evaluated their performance using 7 non-trivial open-source programs, including OpenSSH and Lighttpd. The results show that shreds are fairly easy to use and incur low runtime overhead (4.67%).

1.3.4 Focus on bugs: Bug-driven Hybrid Fuzzing

A popular trend in the fuzzing research community is to augment grey-box fuzz testing with symbolic execution, generally referred to as hybrid testing. It leverages fuzz testing to cover easy-to-reach code regions and uses concolic execution to explore code blocks guarded by complex branch conditions. As a result, hybrid testing is able to reach deeper into program state space than fuzz testing or concolic execution alone. Recently, hybrid testing has seen significant advancement. However, its code-coverage-centric design is inefficient for vulnerability detection. First, it blindly selects seeds for concolic execution and aims to explore new code continuously. However, as statistics show, a large portion of the explored code is often bug-free. Therefore, giving equal attention to every part of the code during hybrid testing is a non-optimal strategy; it slows down the detection of real vulnerabilities by over 43%. Second, classic hybrid testing quickly moves on after reaching a chunk of code, rather than examining the hidden defects inside. It may frequently miss subtle vulnerabilities even though it has already explored the vulnerable code paths. I introduce SAVIOR, a new hybrid testing framework pioneering a bug-driven principle. Unlike existing hybrid testing tools, SAVIOR prioritizes the concolic execution of the seeds that are likely to uncover more vulnerabilities. Moreover, SAVIOR verifies all vulnerable program locations along the executing program path. By modeling faulty situations using SMT constraints, SAVIOR reasons about the feasibility of vulnerabilities and generates concrete test cases as proofs. Our evaluation shows that the bug-driven approach outperforms mainstream automated testing techniques, including state-of-the-art hybrid testing systems driven by code coverage. On average, SAVIOR detects vulnerabilities 43.4% faster than DRILLER and 44.3% faster than QSYM, leading to the discovery of 88 and 76 more unique bugs, respectively. According to the evaluation on 11 well-fuzzed benchmark programs, within the first 24 hours, SAVIOR triggers 485 UBSAN violations, among which 243 are real bugs.
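The idea of modeling a faulty situation as a solvable constraint can be illustrated without a real SMT solver. The Python sketch below is an illustration, not SAVIOR's implementation: it treats an unsigned 8-bit addition as potentially overflowing and exhaustively searches for a concrete witness, standing in for the SMT query a bug-driven engine would issue.

```python
def overflows_u8(a: int, b: int) -> bool:
    """Faulty-situation predicate: does a + b wrap in 8-bit arithmetic?"""
    return a + b > 0xFF

# A bug-driven engine asks: is the faulty condition feasible on this path?
# SAVIOR encodes such predicates as SMT constraints; exhaustive search over
# the two 8-bit operands stands in for the solver in this sketch.
witness = next(
    ((a, b) for a in range(256) for b in range(256) if overflows_u8(a, b)),
    None,
)
# `witness` is a concrete test case proving the overflow is reachable.
```

If no operand pair satisfied the predicate, the search would return None, i.e., the faulty location would be proven infeasible on this path.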

1.3.5 Learning On Experience: Smart Seed Scheduling for Hybrid Fuzzing

Seed scheduling is a prominent factor in determining the yields of hybrid fuzzing. Existing hybrid fuzzers schedule seeds based on fixed heuristics that predict input utilities with best effort. However, such heuristics are not generalizable, as there is no one-size-fits-all rule that applies to different kinds of situations; they may work well on one program but be detrimental when fuzzing others. To overcome this problem, we design a Machine learning-Enhanced hybrid fUZZing system (MEUZZ), which employs supervised machine learning to devise generalizable seed scheduling. MEUZZ determines which new seeds are likely to produce better fuzzing yields based on the knowledge learned from past seed scheduling results. MEUZZ integrates machine learning techniques without interrupting the fuzzing workflow: it draws a series of lightweight but informative features from reachability and dynamic analyses, and extracting these features incurs very little overhead (in microseconds). Moreover, MEUZZ automatically infers the data labels by constantly evaluating the fuzzing performance of each selected seed. As a result, MEUZZ achieves substantial efficacy as well as generalizability. The experimental results show that MEUZZ significantly outperforms state-of-the-art grey-box and hybrid fuzzers, achieving as much as 27.1% more code coverage than QSYM. More importantly, the models are extensively reusable and transferable: the reused models boost coverage performance by 7.1% on average, and the transplanted models improve 67.9% of the 56 cross-program fuzzing configurations. Also, MEUZZ can uncover 50 deeply hidden bugs (19 of which were confirmed and fixed by the maintainers) when fuzzing 8 well-tested programs with the same configurations used in previous work.
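The learned scheduling can be pictured as a utility model over per-seed features. The sketch below is purely illustrative: the feature names and weights are invented for this example, not MEUZZ's actual features or fitted model, which come from its reachability and dynamic analyses and are trained on past scheduling results.

```python
# Illustrative only: feature names and weights are invented, not MEUZZ's.
def seed_utility(features, weights):
    """Linear utility model: score how promising a seed is for concolic execution."""
    return sum(w * f for w, f in zip(weights, features))

def schedule(seeds, weights):
    """Pick the seed predicted to produce the best fuzzing yield."""
    return max(seeds, key=lambda s: seed_utility(s["features"], weights))

# Hypothetical weights fit from past scheduling results (the "labels" a
# learning-based scheduler infers by observing each selected seed's yield).
weights = (0.6, 0.3, -0.05, -0.1)  # (reachable bugs, path depth, size, age)
seeds = [
    {"name": "seed_a", "features": (2, 10, 40, 1)},
    {"name": "seed_b", "features": (9, 4, 12, 3)},
]
best = schedule(seeds, weights)
```

The point of learning the weights, rather than fixing them by hand, is precisely the generalizability argument above: the same feature extraction can yield different models on different programs.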


1.4 Roadmap

The remainder of this dissertation is organized as follows. In chapter 2, I discuss works related to the online and offline defenses introduced in this thesis. The rest of the thesis is then divided into two parts. In part one, I first show how runtime mitigations can break the in-process abuse exploit chain in chapter 3. Following the mitigations, in chapter 4 I introduce a fine-grained memory isolation technique as a last resort for developers to protect their secret data or code from untrusted components running in the same process. In part two, I explore automated test generation to identify software bugs that facilitate in-process memory abuse. Specifically, I present fuzzing works in two directions. In ?? I show how an existing code base can be used to automate fuzz driver generation, to improve the adoption rate of fuzz testing. Then, in chapter 5 and chapter 6, I show how bug-driven and learning-based hybrid testing can be used to detect bugs hidden in deep program paths. Finally, I conclude by discussing the findings of this dissertation in chapter 7.

Chapter 2

Related Works

In this chapter, I discuss related works concerning defenses against in-process abuse. I start by describing the progression of memory corruption attacks and defenses. Then I discuss past efforts to provide an extra isolation layer beyond process-level isolation. Finally, I give an overview of software test generation techniques and describe state-of-the-art testing techniques that facilitate exposing deep memory corruption bugs.

2.1 Perpetual War On Memory Corruption Attacks

Over the years, there has been an ongoing race between code reuse attacks and corresponding defense countermeasures. Such code reuse attacks keep evolving into new forms with more complex attack steps (e.g., Blind-ROP [61], JIT-ROP [182]). To defend against them, two categories of countermeasures (e.g., ASLR + XOM, CFI) have been proposed from different perspectives. Here we briefly review these defenses, especially execute-only memory, which is the category this work belongs to.

Address Space Layout Randomization (ASLR): ASLR is a practical and popular defense deployed in modern operating systems to thwart code reuse attacks [191]. It randomizes memory addresses and makes the locations of ROP gadgets unpredictable. However, the de-facto ASLR only randomizes the base address of code pages. It becomes ineffective when facing recent memory-disclosure-based code reuse attacks [61, 182]. Such attacks explore the address space on-the-fly to find ROP gadgets via a memory disclosure vulnerability. Although fine-grained ASLR increases the entropy of randomization, such as compile-time code randomization [59] and load-time randomization [92, 119, 125, 199], the memory disclosure attack is not directly addressed, since code pages can still be read by attackers [182]. Runtime randomization [60, 81, 90] has thus been proposed to introduce more uncertainty into the program’s address space. Its effectiveness depends on who acts faster, the attacker or the re-randomization mechanism. Due to the need to track all code and data objects and correct their references, these solutions either require compiler assistance or rely on runtime translation, which limits their applicability and incurs non-trivial overhead.

eXecute-only Memory (XOM): To address memory disclosure attacks, researchers proposed execute-only but non-readable memory pages to hinder the possibility of locating reusable code (or ROP gadgets). However, one fundamental challenge in achieving this defense is that it is non-trivial to identify and separate legitimate data read operations in code pages. When source code is available, existing works like Readactor [88, 89] and LR2 [67] rely on compilers to separate data reads from code pages and then enforce XOM via either hardware-based virtualization or software-based address masking. On the other hand, for COTS binaries, which are more common in real-world scenarios, XnR [54] blocks direct memory disclosure by modifying the page fault handler in the operating system to check whether a memory read falls inside a code or data region of a process. However, it cannot handle embedded data mixed into code regions. HideM [109] utilizes the split-TLB feature in AMD processors to direct code and data accesses to different physical pages to prevent reading code. Unfortunately, recent processors no longer support split TLBs.

Control Flow Integrity (CFI): Enforcing CFI is another general defense against attacks that hijack control flow, including code reuse attacks. Proposed a decade ago by Abadi et al. [46], CFI has been refined by researchers over the years [141, 147, 152, 153, 192, 193], from its early coarse-grained form to its current mature appearance as fine-grained CFI. The fundamental difference is that coarse-grained CFI allows forward edges in the control flow graph (CFG) to point at any node in the graph and backward edges to return to any call-preceded destination, whilst fine-grained CFI has a more precise set of destinations for both forward and backward edges. bin-CFI [210] and CCFIR [209] enforce coarse-grained CFI policies on Linux and Windows COTS binaries, respectively. Unfortunately, enforcing fine-grained CFI requires a more precise CFG to be built as the ground truth, which is difficult to obtain in practice based on static analysis, even when source code is available. In addition, researchers found that it is still possible to launch code reuse attacks even when a fine-grained CFI solution is in place, due to the difficulty of extracting a perfect CFG in practice [72, 91, 101, 112].


2.2 In-Process Memory Isolation

Program module isolation: Previous works have studied the problem of isolating the executions of mutually distrusting modules, ranging from libraries in user-space programs to drivers in the OS. SFI [197] and its variants [73, 100] establish strict boundaries in memory space to isolate potentially faulty modules and therefore contain the impact resulting from crashes or malfunctions of such modules. SFI has also been extended to build sandboxes for untrusted plugins and libraries on both x86 [105, 206] and ARM [11, 213]. Extending module isolation into kernel space, some previous works [100, 185] contain faulty drivers as well as user-space modules. Unlike these works, which focus on fault isolation or sandboxing, our work aims to prevent in-process memory abuse launched by either vulnerable or malicious code. Our work allows developers to run sensitive code in flexibly-defined and lightweight execution units (i.e., shreds), where the code has exclusive access to private memory pools, in addition to the regular memory regions, and the execution is protected from other code running (concurrently) in the same address space. The aforementioned works require verification and instrumentation of all untrusted code modules, whereas our work only needs to analyze and harden trusted in-shred code. We repurpose the ARM memory domain feature to efficiently realize the design of shreds and the protection against in-process abuse. Furthermore, SFI and similar techniques assume that isolated modules are logically independent and do not interact closely, whereas shreds neither impose such restrictions nor incur additional overhead when accessing regular memory, invoking third-party library functions, or making system calls.

Process- and thread-level isolation: Arranging program components into different processes has long been advocated as a practical approach to achieving privilege and memory separation [69, 126, 161]. Many widely used programs, such as OpenSSH and Chrome, have adopted this approach. Separated components run in their own address spaces and are immune from memory abuse by other components. However, process separation faces three major limitations when used to defend against memory abuse. First, due to the coarse granularity of a process, memory abuse may still happen inside a component process as a result of a library call or a code injection, as shown in several real attacks on Chrome. Second, using process separation usually requires major software design changes due to the added concurrency and restrictions, which prevents wide adoption. Third, process separation can cause high overhead, particularly when separated components frequently interact. Some recent works [62, 160] proposed thread-level isolation. While incurring slightly lower overhead than process-level isolation, they still suffer from the fixed granularity and require major software changes to be adopted. In comparison, shreds are flexibly grained and easy to adopt. Shreds are also more efficient because, unlike the aforementioned works, our design does not rely on heavy paging-based memory access control.

Protected execution environments: A number of systems have been proposed for securely executing sensitive code or performing privileged tasks. Flicker [144] allows for trusted code execution in full isolation from the OS or even the BIOS, and provides remote attestation. TrustVisor [143] improves on performance and granularity with a special-purpose hypervisor. SeCage [136] runs sensitive code in a secure VM. SICE [52] protects sensitive workloads purely at the hardware level and supports concurrent execution on multicore platforms. SGX [145], a feature in recent Intel CPUs, allows user-space programs to create so-called enclaves where sensitive code can run securely but has little access to system resources or application context. In general, these systems are designed for self-contained code that can run independently in isolated or constrained environments. They are neither suitable nor practical for preventing memory abuse, which can target data or code that cannot be jailed in these isolated environments. In addition, these systems do not need to consider the case where the protected execution itself can be exploited, whereas our design does, and enforces security checks on in-shred executions.

Memory encryption and protection: Several memory protection mechanisms have been proposed before. Overshadow [76] uses virtualization to render encrypted views of application memory to the untrusted OS and, in turn, protects application data. Mondrian [132] is a hardware-level memory protection scheme that enables permission control at word granularity and allows memory sharing among multiple protection domains. Another scheme [186] provides memory encryption and integrity verification for secure processors. While offering strong protection, these schemes all require hardware modifications and have not been adopted in the real world. In fact, this work was partly motivated by the lack of a practical, software-based memory protection mechanism. Recently, protecting cryptographic keys in memory has become a popular research topic. Proposed solutions range from minimizing key exposure in memory [48, 118, 148] to avoiding key presence in RAM by confining key operations to CPUs [113, 150], GPUs [196], and hardware transactional memory [114]. Although effective at preventing key theft, a major common type of memory abuse, these works can hardly protect other types of sensitive data or code in memory.


2.3 Automatic Software Tests Generation

Software tests can expose unexpected programming errors such as memory corruptions. To uncover a software bug, one must first trigger the functionality that contains the bug. As a result, software testing techniques seek to maximize functionality coverage. This strategy can also be modeled as an optimization problem, where testing tools search the potentially infinite input space for inputs that trigger new program behavior. In general, there are two ways of searching for interesting inputs, namely random testing and systematic analysis. Fuzzing [2] is a representative method for random testing: the fuzzer randomly generates new inputs in the hope that they will trigger unexpected program errors. Systematic analysis, such as symbolic execution [71], instead collects path constraints and utilizes SMT solvers to generate inputs that satisfy those constraints. In this thesis, I focus on three categories of test generation techniques, namely fuzzing, concolic execution, and hybrid fuzzing.
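The random-testing side of this search can be captured in a few lines. The Python sketch below is a minimal illustration, not any tool from this thesis: it deterministically flips each bit of a seed input and reports the first mutant that triggers the (made-up) crash condition in `buggy_parse`.

```python
def buggy_parse(data: bytes) -> None:
    """Hypothetical target: 'crashes' when the first byte has its top bit set."""
    if data and data[0] & 0x80:
        raise RuntimeError("simulated memory corruption")

def bitflip_fuzz(seed: bytes, max_trials: int = 10_000):
    """Deterministic bit-flip mutation: a minimal stand-in for random testing."""
    trials = 0
    for byte_idx in range(len(seed)):
        for bit in range(8):
            if trials >= max_trials:
                return None
            mutated = bytearray(seed)
            mutated[byte_idx] ^= 1 << bit  # flip one bit of the seed
            trials += 1
            try:
                buggy_parse(bytes(mutated))
            except RuntimeError:
                return bytes(mutated)  # report the crashing input
    return None

crasher = bitflip_fuzz(b"\x00\x00\x00\x00")
```

A fuzzer like this excels at shallow conditions; the discussion of concolic execution below covers the complementary case of branches that mutation alone rarely satisfies.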

Advanced Grey-Box Fuzzing: Many recent works focus on improving the capability of code exploration in fuzzing. CollAFL [106] aims to reduce hash collisions in coverage feedback to decrease false negatives. PTrix [79] enables path-sensitive fuzzing based on efficient hardware tracing. TFUZZ [159] transforms tested programs to bypass complex conditions and improve code coverage, and later uses a validator to reproduce the inputs that work for the original program. To generate high-quality seeds, ProFuzzer [207] infers the structural information of the inputs. Along the line of seed generation, Angora [75] assumes a black-box function at each conditional statement and applies gradient descent to find satisfying input bytes. This method is later improved by NEUZZ [177] with a smooth surrogate function to approximate the behavior of the tested program.

Concolic Execution: Symbolic execution, a systematic approach introduced in the 1970s [121, 127] for program testing, has attracted new attention due to advances in satisfiability modulo theories [93, 94, 107]. However, classic symbolic execution suffers from high computation cost and path explosion. To tackle these issues, Sen proposed concolic execution [172], which combines the constraint solving of symbolic execution with the fast execution of concrete testing. Concolic execution increases the coverage of random testing [110, 111] while also scaling to large software. Hence, it has been adopted in various frameworks [70, 83, 173, 174]. Recently, concolic execution has also been widely applied in automated vulnerability detection and exploitation, in which the concolic component provides critical inputs by incorporating security-related predicates [51, 74]. However, concolic execution operates via emulation or heavy instrumentation, incurring tremendous execution overhead. Purely relying on concolic execution for code exploration is thus less practical for large software that involves large amounts of operations. In contrast, hybrid testing runs fuzzing for code exploration and invokes concolic execution only on hard-to-solve branches. This takes advantage of both the fuzzer’s efficiency and the concolic executor’s constraint-solving power.
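The division of labor can be made concrete with a toy example. The sketch below is not DRILLER or QSYM: it hand-rolls one "concolic" step by taking the branch predicate observed on a concrete run and searching a small domain for an input that flips it, with brute force standing in for an SMT solver.

```python
def target(x: int) -> str:
    # A branch that random mutation is unlikely to satisfy by chance.
    if x * 7 + 3 == 59384:
        return "deep path"
    return "common path"

# 1. Fuzzing: a concrete run on a cheap input takes only the common path.
concrete_result = target(0)

# 2. "Concolic" step: negate the recorded branch predicate and solve it.
#    A real system hands `x * 7 + 3 == 59384` to an SMT solver; brute-force
#    search over a small domain stands in for the solver here.
solution = next(x for x in range(100_000) if x * 7 + 3 == 59384)
deep_result = target(solution)
```

The fuzzer keeps the loop fast; the solver is invoked only for the branch the fuzzer could not get past, which is the whole premise of hybrid testing.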

Hybrid Testing: Majumdar et al. [139] introduced the idea of hybrid concolic testing a decade ago. This idea offsets the deficiencies of both random testing and concolic execution. Specifically, their approach interleaves random testing and concolic execution to deeply explore a wide program state space. Subsequent development reinforced hybrid testing by replacing random testing with guided fuzzing [154], which rapidly contributes more high-quality seeds to concolic execution. More recently, DRILLER [184] engineered a pioneering hybrid testing system. It more coherently combines fuzzing and concolic execution and can seamlessly test various software systems. Despite this advancement, DRILLER still achieves unsound vulnerability detection. DigFuzz [211] is a more recent work that tries to better coordinate the fuzzing and concolic execution components. Using a Monte Carlo algorithm, DigFuzz predicts the difficulty for a fuzzer to explore a path and prioritizes seeds with a higher difficulty score for exploration. Moreover, motivated by the growing demands of software testing, researchers have been reasoning about the performance of hybrid testing. As commonly understood, hybrid testing is largely restricted by the slow concolic execution. To this end, QSYM [208] implements a concolic executor that trims the heavy but unnecessary computations in symbolic interpretation and constraint solving, leading to a multi-fold acceleration.

Universal Fuzzing Adoption: As of today, two major hindrances stand in the way of the wide adoption of fuzzing. First, to fuzz a program or library, a fuzzer requires a fuzz driver through which it can pass inputs to exercise the library code of interest. Unfortunately, writing fuzz drivers remains a primarily manual exercise, a major hindrance to the widespread adoption of fuzzing. Second, state-of-the-art fuzzers suffer a 2-5x slowdown if the source code of the target program is unavailable, due to the high overhead introduced by dynamic emulation. As a result, when fuzzing binary-only software, grey-box fuzzers do not expose bugs as efficiently as the compiler-based instrumentation approach.

To address the first problem, Babić et al. [53] built the Fudge system for automated fuzz driver generation. Fudge automatically generates fuzz driver candidates for libraries based on existing client code. Instead of relying on manual effort to compose the fuzz target, Fudge operates on the key insight that fuzz drivers can be automatically learned from client code in the existing code base. Fudge extracts interesting function usage via static analysis and dynamic tracing, and uses a function synthesis module to generate compilable code that uses libFuzzer to fuzz the target function. To mitigate the slowdown introduced by dynamic binary instrumentation (DBI), Chen et al. [79] built the PTrix system to replace DBI with Intel Processor Trace (PT). PTrix fully unleashes the benefits of PT with three novel designs. First, PTrix introduces a scheme that highly parallelizes the processing of the PT trace and the target program execution. Second, it directly takes the decoded PT trace as feedback for fuzzing, avoiding the expensive reconstruction of code coverage information. Third, PTrix maintains a new type of feedback that is stronger than edge-based code coverage, which helps reach new code and defects that existing fuzzers may miss.

Part I

Runtime Protections Against In-Process Abuse

Chapter 3

Code Reuse Exploit Mitigations

3.1 Compiler-assisted Code Randomization

3.1.1 Background

To fulfill our goal of generic, transparent, and fast fine-grained code randomization at the client side, there is a range of possible solutions that one may consider. In this section, we discuss why existing solutions are not adequate, and provide some details about the compiler toolchain we used.

3.1.1.1 The Need for Additional Metadata

Static binary rewriting techniques [55, 199, 209] face significant challenges due to indirect control flow transfers, jump tables, callbacks, and other code constructs that result in incomplete or inaccurate control flow graph extraction [120, 163, 200]. More generally applicable techniques, such as in-place code randomization [131, 156], can be performed even with partial disassembly coverage, but can only apply narrow-scoped code transformations, thereby leaving parts of the code non-randomized (e.g., complete basic block reordering is not possible). On the other hand, approaches that rely on dynamic binary rewriting to alleviate the inaccuracies of static binary rewriting [92, 119, 179, 209] suffer from increased runtime overhead.

A relaxation that could be made is to ensure programs are compiled with debug symbols and relocation information, which can be leveraged at the client side to perform code randomization. Symbolic information facilitates runtime debugging by providing details about the layout of objects, types, addresses, and lines of source code. On the other hand, it does not include lower-level information about complex code constructs, such as jump tables and callback routines, nor does it contain metadata about (handwritten) assembly code [137]. To make matters worse, modern compilers attempt to generate cache-friendly code by inserting alignment and padding bytes between basic blocks, functions, objects, and even between jump tables and read-only data [194]. Various performance optimizations, such as profile-guided [35] and link-time [123] optimization, complicate code extraction even further—Bao et al. [56], Rui and Sekar [162], and others [50, 99, 117] have repeatedly demonstrated that accurately identifying functions (and their boundaries) in binary code is a challenging task.

In the same vein, Williams-King et al. [203] implemented Shuffler, a system that relies on symbolic and relocation information (provided by the compiler and linker) to disassemble code and identify all code pointers, with the goal of performing live code re-randomization. Despite the impressive engineering effort, its authors admit that they “encountered myriad special cases” related to inaccurate or missing metadata, special types of symbols and relocations, and jump table entries and invocations. Considering that these numerous special cases occurred just for a particular compiler (GCC), platform (x86-64 Linux), and set of (open-source) programs, it is reasonable to expect that similar issues will arise again when moving to different platforms and more complex applications.

Based on the above, we argue that relying on existing compiler-provided metadata is not a viable approach for building a generic code transformation solution. More importantly, the complexity involved in the transformation process performed by the aforementioned schemes (e.g., static code disassembly, control flow graph extraction, runtime analysis, heuristics) is far from what could be considered reasonable for a fast and robust client-side rewriter, as discussed in Section ??.
Consequently, we opt for augmenting binaries with just the necessary domain-specific metadata needed to facilitate safe and generic client-side code transformation (and hardening) without any further binary code analysis.

3.1.1.2 Fixups and Relocations

When performing code randomization, machine instructions with register or immediate operands do not require any modification after they are moved to a new (random) location. In contrast, if an operand contains a (relative or absolute) reference to a memory location, then it has to be adjusted according to the instruction’s new location, the target’s new location, or both. (Note that a similar process takes place during the late stages of compilation.)

Focusing on LLVM, whenever a value that is not yet concrete (e.g., a memory location or an external symbol) is encountered during the instruction encoding phase, it is represented by a placeholder value, and a corresponding fixup is emitted. Each fixup contains information on how the placeholder value should be rewritten by the assembler when the relevant information becomes available. During the relaxation phase [57, 135], the assembler modifies the placeholder values according to their fixups, as they become known to it. Once relaxation completes, any unresolved fixups become relocations, stored in the resulting object file.

[Figure 3.1 (figure omitted): Example of the fixup and relocation information that is involved during the compilation and linking process. The figure juxtaposes an object file (code at 0x5A78-0x5A9D, containing four compiler-emitted fixups, including the call someFunc operand at 0x5A7F and a jmp short) with the final executable (the same code relocated to 0x412D58-0x412D7D). The object file’s relocation table for the .text section holds a single entry: OFFSET 0x5a7f, TYPE R_X86_64_PC32, VALUE someFunc-0x4.]

Figure 3.1 shows a code snippet that contains several fixups and one relocation. The left part corresponds to an object file after compilation, whereas the right one depicts the final executable after linking. Initially, there are four fixups (underlined bytes) emitted by the compiler. As the relocation table shows, however, only a single relocation (which corresponds to fixup 1) exists for address 0x5a7f, because the other three fixups were resolved by the assembler. Henceforth, we explicitly refer to relocations in object files as link-time relocations—i.e., fixups that are left unresolved after the assembly process (to be handled by the linker). Similarly, we refer to relocations in executable files (or dynamic shared objects) as load-time relocations—i.e., relocations that are left unresolved after linking (to be handled by the dynamic linker/loader). Note that in this particular example, the final executable does not contain any load-time relocations, as relocation 1 was resolved during linking (0x4349 → 0x6308d).
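The mechanics of resolving such a call fixup can be sketched in a few lines. The helper below (an illustration with invented toy offsets, not LLVM's or CCR's code) patches the placeholder operand of an x86-64 `call` (opcode E8 with a 32-bit PC-relative operand) the way an assembler or rewriter would once the final addresses are known.

```python
import struct

def resolve_call_fixup(code: bytearray, insn_off: int, target_off: int) -> None:
    """Patch the rel32 operand of an E8 call once its target is known.

    The displacement is measured from the end of the 5-byte instruction,
    mirroring how the assembler rewrites placeholder values during relaxation.
    """
    assert code[insn_off] == 0xE8, "expected a call rel32 instruction"
    disp = target_off - (insn_off + 5)
    struct.pack_into("<i", code, insn_off + 1, disp)

# A call whose operand is still a zeroed placeholder, followed by padding;
# the (toy) callee starts 16 bytes into the buffer.
code = bytearray(b"\xE8\x00\x00\x00\x00" + b"\x90" * 11)
resolve_call_fixup(code, insn_off=0, target_off=16)
# The operand becomes 16 - 5 = 11, encoded little-endian.
```

Exactly the same arithmetic, applied with the final load addresses, is what turns the object-file operand into the one seen in the linked executable.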


In summary, load-time relocations are a subset of link-time relocations, which are a subset of all fixups. Unfortunately, even if link-time relocations are completely preserved by the linker, they are not sufficient for performing fine-grained code randomization. For instance, fixup 2 is resolved early by the assembler, but is essential for basic block reordering, as the respective single-byte jmp instruction may have to be replaced by a four-byte one—if the target basic block is moved more than 127 bytes forward or 126 bytes backwards from the jmp instruction itself. Evidently, comprehensive fixups are pivotal pieces of information for fine-grained code shuffling, and should be promoted to first-class metadata by modern toolchains in order to provide support for generic, transparent, and compatible code diversification.
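The widening decision can be stated precisely in code. The sketch below (illustrative, not CCR's rewriter) re-encodes a direct jmp after its basic block has moved: it keeps the two-byte short form (opcode EB plus a single-byte displacement) when the new target still fits, and otherwise emits the five-byte near form (opcode E9 plus a four-byte displacement).

```python
import struct

def encode_jmp(insn_off: int, target_off: int) -> bytes:
    """Re-encode a direct jmp after its basic block has been moved.

    rel8 is measured from the end of the 2-byte short form, so a target
    farther than +127/-128 bytes from that point no longer fits and the
    5-byte near form (E9 rel32) must be emitted instead.
    """
    disp8 = target_off - (insn_off + 2)
    if -128 <= disp8 <= 127:
        return bytes([0xEB, disp8 & 0xFF])      # jmp short rel8
    disp32 = target_off - (insn_off + 5)
    return b"\xE9" + struct.pack("<i", disp32)  # jmp near rel32

short_form = encode_jmp(0x100, 0x10F)  # nearby target: short form suffices
wide_form = encode_jmp(0x100, 0x300)   # block moved too far: widen to 5 bytes
```

Because widening an instruction shifts everything after it, the rewriter must iterate this computation until the displacements stabilize, which is precisely the kind of relaxation the assembler performs with full fixup information.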

3.1.2 Overall Approach

The design of CCR is driven by the following two main goals, which so far have been limiting factors for the actual deployment of code diversification in real-world environments:

Practicality: From a deployment perspective, a practical code diversification scheme should not disrupt existing features and software distribution models. Requiring software vendors to generate a diversified copy per user, or users to recompile applications from source code or transform them using complex binary analysis tools, has proven to be an unattractive model for the deployment of code diversification.

Compatibility: Code randomization is a highly disruptive operation that should be safely applicable even to complex programs and code constructs. At the same time, code randomization inherently clashes with well-established operations that rely on software uniformity. These include security and quality monitoring mechanisms commonly found in enterprise settings (e.g., code integrity checking and whitelisting), as well as crash reporting, diagnostics, and self-updating mechanisms.

Augmenting compiled binaries with metadata that enables their subsequent randomization at installation or load time is an approach fully compatible with existing software distribution norms. The vast majority of software is distributed in the form of compiled binaries, which are carefully generated, tested, signed, and released through official channels by software vendors. On each endpoint, at installation time, the distributed software typically undergoes some post-processing and customization, e.g., its components are decompressed and installed in appropriate locations according to the system’s configuration, and sometimes they are even further optimized according to the client’s architecture, as is the case with Android’s ahead-of-time compilation [188] or the

19 CHAPTER 3. CODE REUSE EXPLOIT MITIGATIONS

Linux kernel’s architecture-specific optimizations [86]. Under this model, code randomization can fittingly take place as an additional post-processing task during installation. As an alternative, randomization can take place at load time, as part of the modifications that the loader makes to code and data sections for processing relocations [158]. However, to avoid extensive user-perceived delays due to the longer rewriting time required for code randomization, a more viable approach would be to maintain a supply of pre-randomized variants (e.g., an OS service can generate them in the background), which can then be instantly picked by the loader.

Note that this distribution model is followed even for open-source software, as installing binary executables through package management systems (e.g., apt-get) offers unparalleled convenience compared to having to compile each new or updated version of a program from scratch. More importantly, under such a scheme, each endpoint can choose among different levels of diversification (hardening vs. performance), by taking into consideration the anticipated exposure to certain threats [108], and the security properties of the operating environment (e.g., private intranet vs. Internet-accessible setting).

The embedded metadata serves two main purposes. First, it allows the safe randomization of even complex software without relying on imprecise methods and incomplete symbolic or debug information. Second, it forms the basis for reversing any applied code transformation when needed, to maintain compatibility with existing mechanisms that rely on referencing the original code that was initially distributed.

Figure 3.2 presents a high-level view of the overall approach. The compilation process remains essentially the same, with just the addition of metadata collection and processing steps during the compilation of each object file and the linking of the final master executable.
The executable can then be provided to users and endpoints through existing distribution channels and mechanisms, without requiring any changes. As part of the installation process on each endpoint, a binary rewriter generates a randomized version of the executable by leveraging the embedded metadata. In contrast to existing code diversification techniques, this transformation does not involve any complex and potentially imprecise operations, such as code disassembly, symbolic information parsing, reconstruction of relocation information, introduction of pointer indirection, and so on. Instead, the rewriter performs simple transposition and replacement operations based on the provided metadata, treating all code sections as raw binary data. Our prototype implementation, discussed in detail in Section ??, currently supports fine-grained randomization at the granularity of functions and basic blocks, is oblivious to any applied compiler optimizations, and supports static executables, shared objects, PIC, partial/full


Figure 3.2: Overview of the proposed approach. A modified compiler collects metadata for each object file (1), which is further updated and consolidated at link time into a single extra section in the final executable (2). At the client side, a binary rewriter leverages the embedded metadata to rapidly generate randomized variants of the executable (3).

RELRO [129], exception handling, LTO, and even CFI.

3.1.3 Compiler-level Metadata

Our work is based on LLVM [42], which is widely used in both academia and industry, and we picked the ELF format and the x86-64 architecture as our initial target platform. Figure 3.3 illustrates an example of the ELF layout generated by Clang (LLVM’s native C/C++/Objective-C compiler).

3.1.3.1 Layout Information

Initially, the range of the transformable area is identified, as shown in the left side of Figure 3.3. This area begins at the offset of the first object in the .text section and comprises all user-defined objects that can be shuffled. We modified LLVM to append a new section named .rand in every compiled object file so that the linker can be aware of which objects have embedded metadata. In our current prototype, we assume that all user-defined code is consecutive. Although it is possible to have intermixed code and data in the same section, we have ignored this case for now, as by default LLVM does not mix code and data when emitting x86 code. This is the case for other modern compilers too—Andriesse et al. [49] could identify 100% of the instructions when disassembling GCC and Clang binaries (but CFG reconstruction still remains challenging).


Figure 3.3: An example of the ELF layout generated by Clang (left), with the code of a particular function expanded (center and right). The leftmost and rightmost columns in the code listing (“BBL” and “Fragment”) illustrate the relationships between basic blocks and LLVM’s various kinds of fragments: data (DF), relaxable (RF), and alignment (AF). Data fragments are emitted by default, and may span consecutive basic blocks (e.g., BBL #1 and #2). The relaxable fragment #1 is required for the branch instruction, as it may be expanded during the relaxation phase. The padding bytes at the bottom correspond to a separate fragment, although they do not belong to any basic block.

When loading a program, a sequence of startup routines assists in bootstrap operations, such as setting up environment variables and reaching the first user-defined function (e.g., main()). As shown in Figure 3.3, the linker appends several object files from libc into the executable for this purpose (crt1.o, crti.o, crtbegin.o). Additional object files include process termination operations (crtn.o, crtend.o). Currently, these automatically-inserted objects are excluded from transformation; this is an implementation issue that can be easily addressed by ensuring that a set of augmented versions of these objects is made available to the compiler. At program startup, the function _start() in crt1.o passes five parameters to __libc_start_main(), which in turn invokes the program’s main() function. One of the parameters corresponds to a pointer to main(), which we need to adjust after main() has been displaced. The metadata we have discussed so far is updated at link time, according to the final layout of all objects. The upper part of Table 3.1 summarizes the collected layout-related metadata.


3.1.3.2 Basic Block Information

The bulk of the collected metadata is related to the size and location of objects, functions, basic blocks (BBL), and fixups, as well as their relationships. For example, a fixup inherently belongs to a basic block, a basic block is a member of a function, and a function is included in an object. The LLVM backend goes through a very complex code generation process which involves all scheduled module and function passes for emitting globals, alignments, symbols, constant pools, jump tables, and so on. This process is performed according to an internal hierarchical structure of machine functions, machine basic blocks, and machine instructions. The machine code (MC) framework of the LLVM backend operates on these structures and converts machine instructions into the corresponding target-specific binary code. This involves the EmitInstruction() routine, which emits one chunk of code at a time, called a fragment. As a final step, the assembler (MCAssembler) assembles those fragments in a target-specific manner, decoupled from any logical hierarchical structure; that is, the unit of the assembly process is the fragment. We internally label each instruction with the corresponding parent basic block and function. The collection process continues until instruction relaxation has completed, to capture the emitted bytes that will be written into the final binary. These labels are not essential as part of the final metadata, however, and can be discarded. As shown in Table 3.1, we only keep information about the lower boundary of each basic block, which can be the end of an object (OBJ), the end of a function (FUN), or the beginning of the next basic block (BBL). Going back to the example of Figure 3.3, we identify the three types of fragments (data, relaxable, and alignment), shown at the right side of the figure.
The center of the figure shows the emitted bytes as generated by Clang, and their corresponding code as extracted by the IDA Pro disassembler, for the j-th function of the i-th object in the code section. The function consists of five basic blocks, eight fragments, and contains eleven fixups (underlined bytes). Note that relaxable fragments are generated only for branch instructions and contain just a single instruction. Alignment fragments correspond to padding bytes. In this example, there are two alignment fragments (#3 and #7): one between basic blocks #2 and #3, and one between function j and the following function. For metadata compactness, alignment fragments are recorded as part of the metadata for their preceding basic blocks. The rest of the instructions are emitted as part of data fragments.

Another consideration is fall-through basic blocks. A basic block terminated with a conditional branch implicitly falls through to its successor depending on the evaluation of the condition. In Figure 3.3, the last instruction of BBL #0 jumps to BBL #2 when the zero flag is set, or control falls through to BBL #1. Such fall-through basic blocks must be marked so that they can be treated appropriately during reordering.

Table 3.1: Collected randomization-assisting metadata

Metadata      Collected Information                   Collection time
Layout        Section offset to first object          Linking
              Section offset to main()                Linking
              Total code size for randomization       Linking
Basic Block   BBL size (in bytes)                     Linking
(BBL)         BBL boundary type (BBL, FUN, OBJ)       Compilation
              Fall-through or not                     Compilation
              Section name that BBL belongs to        Compilation
Fixup         Offset from section base                Linking
              Dereference size                        Compilation
              Absolute or relative                    Compilation
              Type (c2c, c2d, d2c, d2d)               Linking
              Section name that fixup belongs to      Compilation
Jump Table    Size of each jump table entry           Compilation
              Number of jump table entries            Compilation
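The per-block records of Table 3.1 could be represented along the following lines. This is an illustrative sketch; the field names and example values are ours, not CCR’s on-disk metadata format.

```python
# Sketch of the hierarchical metadata of Table 3.1 (illustrative names).
from dataclasses import dataclass, field
from typing import List

@dataclass
class Fixup:
    offset: int              # offset from section base
    deref_size: int          # dereference size: 1, 2, 4, or 8 bytes
    is_relative: bool        # relative vs. absolute
    kind: str                # "c2c", "c2d", "d2c", or "d2d"
    section: str             # section name (dropped after link-time dedup)

@dataclass
class BasicBlock:
    size: int                # in bytes, incl. any trailing alignment fragment
    boundary: str            # lower-boundary type: "BBL", "FUN", or "OBJ"
    falls_through: bool      # ends in a conditional branch w/ implicit successor
    section: str
    fixups: List[Fixup] = field(default_factory=list)

# BBL #0 of Figure 3.3 ends with a conditional branch (jz), so it falls
# through to BBL #1; its lower boundary is simply the next basic block.
bbl0 = BasicBlock(size=13, boundary="BBL", falls_through=True, section=".text")
bbl0.fixups.append(Fixup(offset=0xABD4, deref_size=4, is_relative=True,
                         kind="c2d", section=".text"))  # offset illustrative
assert bbl0.falls_through and bbl0.fixups[0].kind == "c2d"
```

Because only lower boundaries are recorded, function and object sizes can be derived by summing basic block sizes up to the next FUN or OBJ boundary.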

3.1.3.3 Fixup Information

Evaluating fixups and generating relocation entries are part of the last processing stage during layout finalization, right before emitting the actual code bytes. Note that this phase is orthogonal to the optimization level used, as it takes place after all LLVM optimizations and passes are done. Each fixup is represented by its offset from the section’s base address, the size of the target (1, 2, 4, or 8 bytes), and whether it represents a relative or absolute value. As shown in Table 3.1, we categorize fixups into four groups, similar to the scheme proposed by Wang et al. [198], depending on their location (source) and the location of their target (destination): code-to-code (c2c), code-to-data (c2d), data-to-code (d2c), and data-to-data (d2d). We define data as a universal region that includes all other sections except the .text section. This classification helps in increasing the speed of binary rewriting when patching fixups after randomization.
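As a toy illustration of how this classification short-circuits work at rewriting time (the dictionary layout and values are ours, not CCR’s metadata format):

```python
# Sketch: after randomization, only fixups whose source or destination
# lies in the code region need patching; d2d fixups can be skipped
# wholesale, which is where the four-way classification pays off.

fixups = [
    {"kind": "c2c", "offset": 0x40ABDE},  # e.g., a call into another function
    {"kind": "c2d", "offset": 0x40ABD4},  # code referencing .rodata
    {"kind": "d2c", "offset": 0x4A39A0},  # e.g., a jump table entry
    {"kind": "d2d", "offset": 0x4CA340},  # data-to-data: layout unchanged
]

needs_patch = [f for f in fixups if f["kind"] != "d2d"]
assert len(needs_patch) == 3
```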

3.1.3.4 Jump Table Information

Due to the complexity of some jump table code fragments, extra metadata needs to be kept for their correct handling during randomization. For non-PIC/PIE (position-independent code/executable) binaries, the compiler generates jump table entries that point to targets using their absolute


address. In such cases, it is trivial to update these destination addresses based on their corresponding fixups that already exist in the data section. In PIC executables, however, jump table entries correspond to relative offsets, which remain the same irrespective of the executable’s load address. Figure 3.4 shows the code generated for a jump table when compiled without and with the PIC/PIE option. In the non-PIC case, the jmp instruction directly jumps to the target location (1) by dereferencing the value of an 8-byte absolute address (2) according to the index register rdx, as the address of the jump table is known at link time (0x4A39A0). On the other hand, the PIC-enabled code needs to compute the target with a series of arithmetic instructions. It first loads the base address of the jump table into rax (3), then reads from the table the target’s relative offset and stores it in rcx, and finally computes the target’s absolute address (4) by adding the relative offset to the table’s base address.

Figure 3.4: Example of jump table code generated for non-PIC and PIC binaries.

To appropriately patch such jump table constructs, for which no additional information is emitted by the compiler, the only extra information we must keep is the number of entries in the table, and the size of each entry. This information is kept along with the rest of the fixup metadata, as shown in Table 3.1, because the relative offsets in the jump table entries should be updated after randomization according to the new locations of the corresponding targets.

3.1.4 Link-time Metadata Consolidation

The main task of the linker is to merge multiple object files into a single executable. The linking process consists of three main tasks: constructing the final layout, resolving symbols, and updating relocation information. First, the linker maps the sections of each object into their corresponding locations in the final sections of the executable. During this process, alignments are adjusted and the

size of extra padding for each section is decided. Then, the linker populates the symbol table with the final location of each symbol after the layout is finalized. Finally, it updates all relocations created by the assembler according to the final locations of those resolved symbols. These operations influence the final layout, and consequently affect the metadata that has already been collected at this point. It is thus crucial to update the metadata according to the final layout that is decided at link time.

Our CCR prototype is based on the GNU gold ELF linker that is part of binutils. Gold aims to achieve faster linking times compared to the GNU linker (ld), as it does not rely on the standard binary file descriptor (BFD) library. Additional advantages include lower memory requirements and parallel processing of multiple object files [190]. Figure 3.5 provides an overview of the linking process and the corresponding necessary updates to the collected metadata. Initially, the individual sections of each object are merged into a single one, according to the naming convention (1). For example, the two code sections .text.obj1 and .text.obj2 of the two object files are combined into a single .text section. Similarly, the metadata from each object is extracted and incorporated into a single section, and all addresses are

updated according to the final layout (2). As part of the section merging process, the linker introduces padding bytes between objects in the same section (3). At this point, the size of the basic block at the end of each object file has to be adjusted by increasing it according to the padding size. This is similar to the treatment of alignment bytes within an object file, which are considered part of the preceding basic block. Note that we do not need to update anything related to whole functions or objects, as our representation of the layout relies solely on basic blocks. Updating the size of the basic blocks that are adjacent to padding bytes is enough for deriving the final size of functions and objects. Once the layout is finalized and symbols are resolved, the linker updates the relocations recorded by the assembler (4). Any fixups that were already resolved at compilation time are not available in this phase, and thus the corresponding metadata remains unchanged, while the rest is updated

accordingly. Finally, the aggregation of metadata is completed (5) by updating the binary-level metadata discussed in Section 3.1.3, including the offset to the first object, the total code size for transformation, and the offset to the main function (if any).

A special case that must be considered is that a single object file may contain multiple .text, .rodata, .data or .data.rel.ro sections. For instance, C++ binaries often have several code and data sections according to a name mangling scheme, which enables the use of the same identifier in different namespaces. The compiler blindly constructs these sections without considering any possible redundancy, as it can only process the code of a single object file at a time. In


turn, when the linker observes redundant sections, it nondeterministically keeps one of them and discards the rest [124]. This deduplication process can cause discrepancies in the layout and fixup information kept as part of our metadata, and thus the corresponding information about all removed sections is discarded at this stage. This process is facilitated by the section name information that is kept for basic blocks and fixups during compilation. Note that section names are optional attributes required only at link time. Consequently, after deduplication has completed, any remaining section name information about basic blocks and fixups is discarded, further reducing the size of the final metadata.

Figure 3.5: Overview of the linking process. Per-object metadata is consolidated into a single section.
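The padding adjustment in step (3) can be sketched as follows; the data structures and helper are illustrative (CCR stores this information in its consolidated metadata section, not as Python lists).

```python
# Sketch: when the linker inserts alignment padding between merged
# objects, the last basic block of each object absorbs it, so function
# and object sizes can still be derived from basic block sizes alone.

def absorb_padding(objects, align=16):
    """objects: list of per-object BBL size lists; returns merged sizes + total."""
    merged, offset = [], 0
    for bbls in objects:
        bbls = list(bbls)
        end = offset + sum(bbls)
        pad = (-end) % align          # padding the linker would insert here
        bbls[-1] += pad               # credit it to the object's last BBL
        merged.extend(bbls)
        offset = end + pad
    return merged, offset

# Two objects with blocks of [7, 9] and [5] bytes, 16-byte alignment:
# the second object's single block grows from 5 to 16 bytes.
merged, total = absorb_padding([[7, 9], [5]], align=16)
assert merged == [7, 9, 16] and total == 32
```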


3.1.5 Code Randomization

To strike a balance between performance and randomization entropy, we have opted to maintain some of the constraints imposed by the code layout decided at link time, due to short fixup sizes and fall-through basic blocks. As mentioned earlier, these constraints can be relaxed by modifying the width of short branches and adding new branches when needed. However, our current choice has the simplicity and performance benefit of keeping the total size of the code the same, which helps in maintaining caching characteristics due to spatial locality. To this end, we prioritize basic block reordering at the intra-function level, and then proceed with function-level reordering. Distance constraints due to fixup size may occur in both function and basic block reordering. For instance, it is typical for functions to contain a short fixup that refers to a different function, as part of a jump instruction used for tail-call optimization. In the rewriting phase, basic block reordering proceeds without any constraints if: (a) the parent function of a basic block does not have any distance-limiting fixup, or (b) the size of the function allows reaching all targets of any contained short fixups. Note that the case of multiple functions sharing basic blocks, which is a common compiler optimization, is fully supported. From an implementation perspective, the simplest solution for fall-through basic blocks is to assume that both child blocks will be displaced away, in which case an additional jump instruction must be inserted for the previously fall-through block. From a performance perspective, however, a better solution is to avoid adding any extra instructions and keep either of the two child basic blocks adjacent to its parent; this can be safely done by inverting the condition of the branch when needed. In our current implementation we have opted for this second approach, but have left branch inversion as part of our future work.
As shown in Section 3.1.6.5, this decision does not impact the achieved randomization entropy. After the new layout is available, it is essential to ensure fixups are updated accordingly. We have classified fixups into four categories: c2c, c2d, d2c, and d2d. In the case of d2d fixups, no update is needed because we diversify only the code region, but we still include them as part of the metadata in case they are needed in the future. The dynamic linking process relies on c2d (relative) fixups to adjust pointers to shared libraries at runtime.
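A simplified model of this reordering policy, gluing each fall-through block to its parent rather than performing branch inversion (structures and names are ours, and chained fall-throughs are not modeled):

```python
# Sketch: shuffle basic blocks within each function first, then shuffle
# whole functions, while keeping every fall-through block adjacent to
# its parent so no extra jump instruction is needed.
import random

def reorder(functions, rng):
    """functions: list of lists of (name, falls_through) blocks."""
    shuffled_funcs = []
    for blocks in functions:
        # Merge each fall-through parent with its successor into one unit.
        units, i = [], 0
        while i < len(blocks):
            if blocks[i][1] and i + 1 < len(blocks):
                units.append([blocks[i], blocks[i + 1]]); i += 2
            else:
                units.append([blocks[i]]); i += 1
        rng.shuffle(units)                    # intra-function BBL reordering
        shuffled_funcs.append([b for u in units for b in u])
    rng.shuffle(shuffled_funcs)               # then function-level reordering
    return shuffled_funcs

rng = random.Random(1)
layout = reorder([[("A", True), ("B", False), ("C", False)]], rng)
names = [n for n, _ in layout[0]]
# "A" falls through to "B", so "B" must still directly follow "A".
assert names.index("B") == names.index("A") + 1
```

A full implementation would additionally reject permutations that put a short fixup’s target out of its reachable range, per constraints (a) and (b) above.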

3.1.6 Experimental Evaluation

We evaluated our CCR prototype in terms of runtime overhead, file size increase, randomization entropy, and other characteristics. Our experiments were performed on a system equipped with an


Figure 3.6: Performance overhead of fine-grained (function vs. basic block reordering) randomization for the SPEC CPU2006 benchmark tests.

Intel i7-7700 3.6GHz CPU, 32GB RAM, running the 64-bit version of Ubuntu 16.04.

3.1.6.1 Randomization Overhead

We started by compiling the entire SPEC CPU2006 benchmark suite (20 C and C++ programs) with our modified LLVM and gold linker, using the -O2 optimization level and without the PIC option. Next, we generated 20 different variants of each program: 10 using function reordering, and 10 more using both function and basic block reordering. Each run was performed 10 times for the original programs, and a single time for each of the 20 variants. Figure 3.6 shows a boxplot of the runtime overhead for function reordering and basic block reordering. The dark horizontal line in each box corresponds to the median overhead value, which mostly ranges between zero and one percent across all programs. The top and bottom of each box correspond to the upper and lower quartile, while the whiskers correspond to the highest and lowest value, excluding outliers, which are denoted by small circles (there were 14 such cases out of the total 400 variants, exhibiting up to 7% overhead). Overall, the average performance overhead is negligible at 0.28%, with a standard deviation of 1.37. The average overhead per benchmark is reported in Table 3.2, which also includes further information about the layout and fixups of each program.

Interesting cases are mcf and milc, the variants of which consistently exhibit a slight performance improvement, presumably due to better cache locality (we performed an extra round of experiments to verify it). In contrast, xalancbmk exhibited a distinguishably high average overhead of 4.9%. Upon further investigation, we observed a significant increase in the number of L1 instruction cache misses for its randomized instances. Given that xalancbmk is one of the most complex benchmarks, with a large number of functions and heavy use of indirect control transfers, it seems that the disruption of cache locality due to randomization has a much more pronounced effect. For such cases, it may be worth exploring profile-guided randomization approaches that will preserve the code locality characteristics of the application.

3.1.6.2 ELF File Size Increase

Augmenting binaries with additional metadata entails the risk of increasing their size to levels that may become problematic. As discussed earlier, this was an issue that we took into consideration when deciding what information to keep, and we optimized the final metadata to include only the minimum amount of information necessary for code diversification. As shown in Table 3.2, the file size increase ranges from 1.68% to 20.86%, with an average of 11.46% (13.3% for the SPEC benchmarks only). We consider this a rather modest increase, and do not expect it to have any substantial impact on existing software distribution workflows. The Layout columns (Objs, Funcs, BBLs) show the number of object files, functions, and basic blocks in each program. As expected, the metadata size is proportional to the size of the original code. Note that the generated randomized variants do not include any of the metadata, so their size is the same as that of the original binary.

3.1.6.3 Binary Rewriting Time

We measured the rewriting time of our CCR prototype by generating 100 variants of each program and reporting the average processing time. We repeated the experiment twice, using function and basic block reordering, respectively. As shown in Table 3.2 (Rewriting columns), the rewriting process is very quick for small binaries, and the processing time increases linearly with the size of the binary. The longest processing time was observed for xalancbmk, which is the largest and most complex (in terms of number of basic blocks and fixups) among the tested binaries. All but four programs were randomized in under 9s, and more than half of them in under 1s. The reported numbers include the process of updating the debug symbols present in the .symtab section. As this is not needed for production (stripped) binaries, the rewriting time in practice will

be shorter; indicatively, for xalancbmk, it is 30% faster when compiled without symbols. Note that our rewriter is just a proof of concept, and further optimizations are possible. Currently, the rewriting process involves parsing the raw metadata, building it into a tree representation, resolving any constraints in the randomized layout, and generating the final binary. We believe that the rewriting speed can be further optimized by improving the logic of our rewriter’s randomization engine. Moving from Python to C/C++ is also expected to increase speed even further.

3.1.6.4 Correctness

To ensure that our code transformations do not affect in any way the correctness of the resulting executables, in addition to the SPEC benchmarks, we compiled and tested augmented versions of ten real-world applications. For example, we parsed the entire LLVM source code tree with a randomized version of ctags using the -R (recursive) option. The MD5 hash of the resulting index file, which was 54MB in size, was identical to the one generated using the original executable. Another experiment involved the command-line audio encoding tool oggenc, a large and quite complex program (58,413 lines of code) written in C [142], to convert a 44MB WAV file to the OGG format, which we then verified was correctly processed. Furthermore, we successfully compiled popular server applications (web, FTP, and SSH daemons), confirming that their variants did not malfunction when using their default configurations.

3.1.6.5 Randomization Entropy

We briefly explore the randomization entropy that can be achieved using function and basic block reordering, when considering the current constraints of our implementation. Let F_{ij} be the j-th function in the i-th object, f_i the number of functions in that object, and b_{ij} the number of basic blocks in the function F_{ij}. Suppose there are p object files comprising a given binary executable. The total number of functions q and basic blocks r in the binary can be written as

    q = \sum_{i=0}^{p-1} f_i    and    r = \sum_{i=0}^{p-1} \sum_{j=0}^{f_i - 1} b_{ij}.

Then, the number of possible variants with function reordering is q!, and with basic block reordering it is r!. Due to the large number of variants, let the randomization entropy E be the base-10 logarithm of the number of variants. In our case, we perform basic block randomization at the intra-function level first, followed by function reordering. Therefore, the entropy can be computed as follows:

    E = \log_{10}\Big( \Big( \prod_{i=0}^{p-1} \prod_{j=0}^{f_i - 1} b_{ij}! \Big) \cdot \Big( \sum_{i=0}^{p-1} f_i \Big)! \Big)
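Since these factorials are astronomically large, the entropy is more conveniently computed with log-factorials. A small Python sketch with toy inputs, using lgamma(n + 1) = ln n! and flattening the per-object structure into one list of per-function block counts:

```python
# Sketch: computing E = log10((prod_ij b_ij!) * (sum_i f_i)!) without
# materializing huge factorials, via the log-gamma function.
from math import lgamma, log

def log10_fact(n: int) -> float:
    return lgamma(n + 1) / log(10)        # log10(n!)

def entropy(bbl_counts_per_func) -> float:
    """bbl_counts_per_func: b_ij values flattened over all objects."""
    e = sum(log10_fact(b) for b in bbl_counts_per_func)  # intra-function orderings
    e += log10_fact(len(bbl_counts_per_func))            # function orderings
    return e

# Two functions with 3 and 4 basic blocks: (3! * 4!) * 2! = 288 variants.
e = entropy([3, 4])
assert abs(e - log(288) / log(10)) < 1e-9
```

The constrained formula below is obtained the same way, by substituting b_{ij} - x_{ij} and f_i - y_i for the unconstrained counts.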


Table 3.2: Experimental evaluation dataset and results (* indicates programs written in C++)

                   -------- Layout --------  ----------- Fixups -----------  ----- Size (KB) -----  Rewriting (sec)  --- Overhead ---  Entropy (log10)
Program            Objs   Funcs      BBLs    .text  .rodata  .data  .init_ar.  Orig.  Augm.  Increase   Func    BBL      Func     BBL     Func     BBL

400.perlbench        50   1,660    46,732   70,653   7,872   1,765      0     1,198  1,447   20.86%    7.69   8.05    -0.07%   0.32%    4,530   5,011
401.bzip2             7      71     2,407    2,421      75       0      0        90    101   12.80%    0.19   0.21    -0.23%   0.16%      100     157
403.gcc             143   4,326   118,397  189,543  84,357     367      0     3,735  4,465   19.54%   52.30  53.89     0.82%   0.91%   13,657  16,483
429.mcf              11      24       375      410       0       0      0        22     25   12.02%    0.08   0.09    -1.27%  -0.98%       23      44
433.milc             68     235     2,613    5,980      50      36      0       148    170   14.94%    0.48   0.50    -1.53%  -1.50%      456     600
444.namd*            23      95     7,480    8,170      24       0      0       312    345   10.49%    0.50   0.56     0.06%   0.07%      148     187
445.gobmk            62   2,476    25,069   44,136   1,377  21,400      0     3,949  4,116    4.23%   21.28  20.43     0.05%   0.35%    7,272   8,271
447.dealII*       6,295   6,788   100,185  103,641   7,954       1     45     4,217  4,581    8.65%   38.08  39.18     0.60%   0.52%   23,064  25,601
450.soplex*         299     889    13,741   15,586   1,561       0     61       467    531   13.76%    1.90   1.99     0.60%   0.28%    2,234   2,983
453.povray*         110   1,537    28,378   47,694  10,398     617      1     1,223  1,406   14.92%    5.67   5.88    -0.08%   0.50%    4,130   4,939
456.hmmer            56     470    10,247   14,265     798     156      0       343    400   16.53%    1.14   1.19     0.00%  -0.11%    1,042   1,313
458.sjeng           119     132     4,469    8,978     431       0      0       155    186   19.93%    0.50   0.53    -0.55%  -0.38%      221     334
462.libquantum       16      95     1,023    1,373     319       0      0        55     62   13.57%    0.19   0.19     0.40%  -0.24%      148     207
464.h264ref          42     518    14,476   23,180     320     321      0       698    782   12.01%    1.97   2.06     0.17%   0.00%    1,180   1,468
470.lbm               2      17       133      227       0       0      0        22     24    8.15%    0.06   0.06     0.25%   0.25%       14      24
471.omnetpp*        366   1,963    22,118   34,212   3,411     240     75       843    952   12.95%    4.73   4.94     0.03%   0.25%    5,560   6,983
473.astar*           14      88     1,116    1,369       6       1      0        56     62   12.03%    0.17   0.17     0.78%   1.08%      134     169
482.sphinx3          44     318     5,557    9,046      26     207      0       213    249   16.54%    0.68   0.72     0.02%   0.23%      656     815
483.xalancbmk*    3,710  13,295   130,691  142,128  19,936     323      0     6,217  6,836    9.95%   88.09  89.94     4.92%   4.89%   48,863  61,045
999.specrand          2       3        11       32       0       0      0         8      9   11.07%    0.03   0.03    -0.32%  -0.15%      0.8     1.6

ctags                50     423     8,550   13,618   3,733     507      0       795    851    7.03%    1.17   1.21      -       -         915   1,095
gzip                 34     103     2,895    5,466     466      21      0       267    289    8.13%    0.40   0.41      -       -         164     194
lighttpd             50     351     5,817    9,169     818      98      0       866    903    4.23%    0.96   0.99      -       -         732     891
miniweb               7      67     1,322    1,681      65      74      0        56     64   14.54%    0.19   0.19      -       -          94     113
oggenc                1     428     7,035    7,746     183   3,869      0     2,120  2,156    1.68%    2.79   2.74      -       -         942   2,285
openssh             122   1,135    18,262   29,815   2,442      90      0     2,144  2,248    4.83%    4.04   4.17      -       -       3,398   3,856
putty                79   1,288    20,796   31,423   3,126     118      0     1,069  1,184   10.78%    3.71   3.82      -       -       2,927   3,610
vsftpd               39     516     3,793    7,148      74       0      0       138    163   18.48%    0.65   0.67      -       -       1,147   1,227
libcapstone          42     402    21,454   47,299  13,002       5      0     2,777  2,931    5.69%   10.64  11.31      -       -         863   1,040
dosbox*             630   3,127    66,522  124,814  14,906   2,585     18    11,729 12,145    3.54%   37.59  38.12      -       -       9,503  10,941

However, as discussed in Section 3.1.5, our current implementation has some constraints regarding the placement of functions and basic blocks. Let the number of such function constraints in the ith object be y_i. Likewise, fall-through blocks are currently displaced together with their previous block, and, as with functions, in some cases the size of a fixup also constrains the maximum distance to the referred basic block. Let the number of such basic block constraints in function F_ij be x_ij. Given the above, the entropy in our case can be calculated as:

E = \log_{10} \Bigg( \bigg( \prod_{i=0}^{p-1} \prod_{j=0}^{f_i - 1} (b_{ij} - x_{ij})! \bigg) \cdot \bigg( \sum_{i=0}^{p-1} (f_i - y_i) \bigg)! \Bigg)

where p is the number of objects, f_i the number of functions in the ith object, and b_{ij} the number of basic blocks in function F_{ij}. Using the above formula, we report the randomization entropy for function and basic block

level randomization in Table 3.2. We observe that even for small executables like lbm, the number of variants exceeds 300 trillion. Consequently, our current prototype achieves more than enough entropy, which can be further improved by relaxing the above constraints (e.g., by separating fall-through basic blocks from their parent blocks, and adding a relaxation-like phase in the rewriter to alleviate existing fixup size constraints).
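To make the formula concrete, the sketch below evaluates E for arbitrary constraint counts. This is a toy illustration, not the CCR tool itself; `log10_factorial` and `randomization_entropy` are names chosen here for clarity, and the lgamma trick simply avoids materializing huge factorials.

```python
import math

def log10_factorial(n: int) -> float:
    # log10(n!) via lgamma, so the (possibly astronomical) factorial
    # never has to be computed as an integer
    return math.lgamma(n + 1) / math.log(10)

def randomization_entropy(blocks, func_constraints):
    """Entropy E (in log10) of the randomization space.

    blocks[i][j]     -- pair (b_ij, x_ij): basic blocks in function j of
                        object i, and how many of them are constrained.
    func_constraints -- y_i per object: placement-constrained functions.
    E = log10( (prod_i prod_j (b_ij - x_ij)!) * (sum_i (f_i - y_i))! )
    """
    e = 0.0
    movable_funcs = 0
    for obj, y in zip(blocks, func_constraints):
        movable_funcs += len(obj) - y       # f_i - y_i functions can move
        for b, x in obj:
            e += log10_factorial(b - x)     # block permutations per function
    return e + log10_factorial(movable_funcs)
```

For one object with two unconstrained functions of 5 (one block constrained) and 4 basic blocks, the space is 4! * 4! * 2! = 1,152 variants, i.e. E ≈ 3.06.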

3.2 Enabling Execute-Only Memory for COTS Binaries On AArch64

3.2.1 Overview

In the previous section, I showed that CCR makes fine-grained diversification accessible for industrial use. However, numerous studies [138, 178] have shown that attackers are still able to exploit highly randomized programs [101, 166, 182]. Such attacks increasingly leverage code-reuse techniques [63, 166, 204] to gain control of vulnerable programs, since contemporary software widely employs code integrity protections, such as data execution prevention (DEP) [195], that defeat traditional code injection attacks. In a code reuse attack, a target application’s control flow is manipulated so that snippets of existing code (called gadgets) are chained together and run to carry out malicious activities. Knowledge of the process memory layout is a key prerequisite for code-reuse attacks to succeed: attackers need to know the exact locations of binary instructions in memory to assemble the chain of gadgets. Commodity operating systems widely adopt address space layout randomization (ASLR), which loads code binaries at random memory locations unpredictable to attackers. Without knowing the locations of the needed code or gadgets, attackers cannot build code-reuse chains. However, memory disclosure attacks can use information leaks in programs to de-randomize code locations, thus defeating ASLR. Such attacks either read the program code (direct de-randomization) or read code pointers (indirect de-randomization). Given that deployed ASLR techniques only randomize the load address of a large chunk of data or code, leaking a single code pointer or a small sequence of code allows attackers to identify the corresponding chunk, infer its base address, and calculate the addresses of gadgets contained in the chunk. More sophisticated fine-grained ASLR techniques [92, 119, 125, 157, 199] aim at shuffling code blocks within the same module to make it more difficult for attackers to guess the location of binary instructions. Nevertheless, Snow et al. [182] have shown that memory disclosure vulnerabilities can bypass even the most sophisticated ASLR techniques.


Therefore, a robust and effective defense against code-reuse attacks should combine fine-grained ASLR with memory disclosure prevention. Some recent works proposed to prevent memory disclosures using compile-time techniques [67, 88, 89]. Despite their effectiveness, these solutions cannot cover COTS binaries, which cannot be easily recompiled and redeployed yet constitute a significant portion of the real-world applications that need protection. In this section, I present NORAX¹, which protects COTS binaries from code memory disclosure attacks. The goal of NORAX is to allow COTS binaries to take advantage of execute-only memory (XOM), a new security feature that recent AArch64 CPUs provide and that is widely available on today’s mobile devices. While useful for preventing memory disclosure-based code reuse [61, 182], XOM remains barely used by user and system binaries due to its requirement for recompilation. NORAX removes this requirement by automatically patching COTS binaries and loading their code into XOM. As a result, when used together with ASLR, NORAX enables robust mitigation against code reuse attacks for COTS binaries. NORAX consists of four major components: NDisassembler, NPatcher, NLoader, and NMonitor. The first two perform offline binary analysis and transformation: they convert COTS binaries built for AArch64 without XOM support into ones whose code can be protected by XOM at runtime. The other two components provide support for loading and monitoring the patched, XOM-enabled binaries at runtime. The design of NORAX tackles a fundamentally difficult problem: identifying data embedded in code segments, which is common in ARM binaries, and relocating such data elsewhere so that at runtime code memory pages can be made execute-only while all embedded data remains readable. We apply NORAX to Android system binaries running on Samsung Galaxy S6 and LG Nexus 5X devices.
The results show that NORAX on average slows down the transformed binaries by 1.18% and increases their memory footprint by 2.21%, suggesting that NORAX is practical for real-world adoption.

3.2.2 Background

NORAX makes use of the modern MMU support in the AArch64 architecture to create execute-only memory, a hardware feature now widely available yet virtually unused due to compatibility issues. To bridge this gap, NORAX reconstructs COTS binaries running on commodity Android

¹NORAX stands for NO Read And eXecute.


smartphones to enforce the R⊕X policy. In the rest of this section, we explain the necessary technical background and the challenges we face when building the system.

Table 3.3: Access permissions for stage 1 EL0 and EL1

AP[2:1]   EL0 Permission                    EL1 Permission
00        Executable-only                   Read/Write
01        Read/Write, Config-Executable     Read/Write
10        Executable-only                   Read-only
11        Read, Executable                  Read-only

AArch64 eXecute-Only Memory (XOM) Support: AArch64 defines four Exception Levels, EL0 to EL3. EL0 has the lowest execution privilege and usually runs normal user applications; EL1 usually hosts privileged systems, such as the operating system kernel; EL2 is designed for the hypervisor, while EL3 is for the secure monitor. To enforce instruction access permissions across Exception Levels, AArch64 leverages the Unprivileged eXecute-Never (UXN) bit, the Privileged eXecute-Never (PXN) bit, and two AP (Access Permission) bits defined in each page table entry [19]. For a user space program’s code page, the UXN bit is set to “0”, which allows code execution at EL0, while PXN is set to “1”, which disables execution at EL1. With these UXN and PXN settings, the instruction access permissions defined by the AP bits are shown in Table 3.3. It is easy to see that by setting the AP bits in a page table entry to “10”, the kernel running at EL1 will enforce execute-only permission for the user space program running at EL0. In other words, the corresponding memory page will only permit instruction fetches from user space, while all read/write data accesses will be denied. The kernel, however, still has read permission for that page, which means it can help the user space program read the intended memory area if necessary, after performing security checks beforehand.
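The encodings of Table 3.3 can be captured in a small lookup. This is an illustrative sketch, not kernel code; the tuples simply transcribe the table rows under the UXN=0, PXN=1 setting described above, and the function name is chosen here for clarity.

```python
# Stage-1 access permissions from Table 3.3 (UXN=0, PXN=1), keyed by AP[2:1].
# Each entry is (EL0 permission, EL1 permission).
AP_PERMISSIONS = {
    0b00: ("Executable-only", "Read/Write"),
    0b01: ("Read/Write, Config-Executable", "Read/Write"),
    0b10: ("Executable-only", "Read-only"),
    0b11: ("Read, Executable", "Read-only"),
}

def xom_for_el0(ap: int) -> bool:
    """True if this AP encoding makes the page execute-only at EL0."""
    el0_perm, _ = AP_PERMISSIONS[ap]
    return el0_perm == "Executable-only"
```

NORAX relies on AP = 10: the page is execute-only for user space, while the kernel keeps read access and can mediate legitimate reads after security checks.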

Position-Independent Binaries in Android: Position-independent code (PIC) is code a compiler generates for a module without assuming any absolute addresses; no matter where the module is loaded, it will function correctly. The mechanism works by replacing all memory accesses that use hard-coded addresses with PC-relative addressing instructions. Position-independent executables (PIE) are executables built from PIC. In Android, ever since version 5 (codename Lollipop), all executables are required to be compiled as PIE in order to fully enjoy the benefits of ASLR. To enforce this, Google removed support for non-PIE loading from the


Bionic Linker [6]. Nowadays, smartphones equipped with AArch64 CPUs are most likely running Android versions after Lollipop, meaning that the majority of their binaries, including both executables and shared libraries, are compiled to be position independent.

Code-Data Separation: To convert a stripped binary to be XOM-compatible, one fundamental problem must be solved: code-data separation. Note that separating data from code in COTS binaries is, in general, undecidable, as it is equivalent to the famous Halting Problem [200]. However, we found that in the scope of ARM64 position-independent binaries, which are prevalent on modern Android and iOS [10] phones, a practical solution is possible. A feasible solution must address the two following challenges.

3.2.2.1 Locating Data In Code Pages

We generally refer to data residing in executable code regions as executable data. There are two types of executable data allowed in ELF binaries.

• Executable sections: The first kind of data are ELF sections consisting of pure read-only data that may reside in executable memory. As defined by the contemporary ELF standard, a typical ELF file has two views: the linking view and the loading view, used by the linker and the loader respectively. The linking view consists of ELF sections (such as .text and .rodata). During linking, the static linker bundles sections with compatible access permissions to form a segment; in this case, executable implies readable. The segments then comprise the loading view. When an ELF is loaded, the loader simply loads each segment as a whole into memory and grants the corresponding access permissions. A standard ELF has two loadable segments. One is readable and executable, normally referred to as the “code segment”; it contains all the sections with instructions (.plt and .text, etc.) as well as read-only data (.gnu.hash, .dynsym, etc.). The other segment is readable and writable, referred to as the “data segment”; it contains the program data as well as other read/writable sections. For our goal of non-readable code, we mainly focus on the code segment. In this segment, generally only .plt and .text contain instructions used for program execution, but as explained before, they are mixed with other sections that only need to be read-only; we therefore cannot simply map the memory pages execute-only, as these sections often reside within the same page. For instance, Table 3.4 shows the code segment layout of an example program; all except the last two sections in this code segment are placed within the same page. To make


things more complex, the segment layout varies for different ELFs.
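The segment-level permission grouping described above can be sketched by decoding ELF program-header flag bits. The PF_X/PF_W/PF_R constants are fixed by the ELF specification; `perm_string` is an illustrative helper, not part of any toolchain.

```python
# ELF program-header flag bits, as defined by the ELF specification
PF_X, PF_W, PF_R = 0x1, 0x2, 0x4

def perm_string(p_flags: int) -> str:
    """Render p_flags the way loaded segments are usually shown (e.g. 'r-x')."""
    return (("r" if p_flags & PF_R else "-") +
            ("w" if p_flags & PF_W else "-") +
            ("x" if p_flags & PF_X else "-"))

# The classic two-segment layout described above:
code_segment = perm_string(PF_R | PF_X)   # .plt, .text, plus read-only sections
data_segment = perm_string(PF_R | PF_W)   # program data and writable sections
```

Because the loader grants one permission set per segment, every read-only section bundled into the "r-x" code segment becomes readable and executable, which is exactly what NORAX must undo.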

• Embedded data: The second kind of data in code pages is data embedded in the .text section. For optimization purposes, such as exploiting spatial locality, compilers emit data close to the code that accesses it. Note that although a recent study [49] shows that on modern x86 Linux compilers no longer generate binaries with code interleaved with data, we found this is not the case for ARM. We examined the system binaries extracted from a Nexus 5X smartphone running the factory image MMB29P; Table 3.5 reveals that code-data interleaving still prevails in modern ARM64 Linux binaries, indicating that this is a real-world problem to be solved.

Table 3.4: ELF sections that comprise the code segment of the example program; the highlighted ones are located in the same page.

Section Name          Address           Type
.interp               0000000000000238  PROGBITS
.note.android.ident   0000000000000250  NOTE
.note.gnu.build-id    0000000000000268  NOTE
.gnu.hash             0000000000000288  GNU_HASH
.dynsym               00000000000002c8  DYNSYM
.dynstr               00000000000005b0  STRTAB
.gnu.version          00000000000006e2  VERSYM
.gnu.version_r        0000000000000720  VERNEED
.rela.dyn             0000000000000740  RELA
.rela.plt             0000000000000830  RELA
.plt                  00000000000009a0  PROGBITS
.text                 0000000000000ab0  PROGBITS
.rodata               0000000000000f08  PROGBITS
.eh_frame_hdr         00000000000010d0  PROGBITS
.eh_frame             0000000000001110  PROGBITS
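Using the section start addresses from Table 3.4, a few lines of arithmetic confirm why the page holding .text cannot simply be mapped execute-only. This is a sketch assuming a 4 KiB page size; the variable names are illustrative.

```python
PAGE = 4096  # assumed 4 KiB translation granule

# Section start addresses from Table 3.4 (the example program's code segment)
sections = {
    ".interp": 0x238, ".note.android.ident": 0x250, ".note.gnu.build-id": 0x268,
    ".gnu.hash": 0x288, ".dynsym": 0x2c8, ".dynstr": 0x5b0,
    ".gnu.version": 0x6e2, ".gnu.version_r": 0x720, ".rela.dyn": 0x740,
    ".rela.plt": 0x830, ".plt": 0x9a0, ".text": 0xab0, ".rodata": 0xf08,
    ".eh_frame_hdr": 0x10d0, ".eh_frame": 0x1110,
}

text_page = sections[".text"] // PAGE
same_page_as_text = [name for name, addr in sections.items()
                     if addr // PAGE == text_page and name != ".text"]
# Every section except .eh_frame_hdr and .eh_frame starts in the same page
# as .text, so making that page execute-only would also hide read-only data.
```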

3.2.2.2 Updating Data References

In addition to finding the locations of executable data, we also need to relocate them and update their references. It turns out that reference updating is also non-trivial. In our system, as shown in Table 3.6, the majority of the ELF sections inside the code segment are expected to be relocated to a


Module                  #. of Real Inline   #. of Norax Inline   #. of Gadgets found in
                        Data (Byte)         Data (Byte)          extracted inline Data
vold                    0x0                 0x0                  0
                        0x8                 0x10                 0
toolbox                 0x14                0x30                 0
dhcpcd                  0x28                0x58                 4
Logd                    0x0                 0x0                  0
installd                0x0                 0x0                  0
app_process64 (zygote)  0x0                 0x0                  0
qseecomd                N/A                 0x0                  0
surfaceflinger          0x0                 0x0                  0
rild                    0x0                 0x0                  0
libart.so               0x4534              0x4654               8
libstagefright.so       0x128               0x148                5
libcrypto.so            0x9a8               0xa3c                25
libmedia.so             0xf60               0x10b2               0
libc.so                 0x12e4              0x13b4               5
libc++.so               0xc                 0xc                  0
libsqlite.so            0x3a4               0x57c                13
libbinder.so            0x0                 0x0                  0
libm.so                 0x4f3b              0x51bc               48
libandroid.so           0x0                 0x0                  0
Total                   0xc577              0xce1a               108

Table 3.5: Android Marshmallow system binaries that have embedded data in Nexus 5X.

                 # of binaries   # of binaries w/ embedded data   Percentage
/system/bin      237             167                              70.46%
/system/lib64    255             101                              39.61%
/vendor/lib64    111             39                               35.14%
/vendor/bin      4               2                                50.00%

different memory location so that appropriate permissions can be enforced. The sections that are left out, such as .interp and .note.*, are either accessed only by the OS or not used for program execution, so we can leave them untouched. The sections listed in Table 3.6 have complex interconnections, both internal and external. As shown in Table 3.7, various types of references exist in a given ELF. Due to this complexity, reference collection is conducted across the whole NORAX system by different components at different stages, both offline and at load time.

Table 3.6: Sections in the executable code page that are handled by NORAX

(.gnu).hash  .dynsym  .dynstr  .gnu.version  .rela.dyn  .rela.plt  .text (embedded data)  .rodata  .eh_frame  .eh_frame_hdr

Table 3.7: ELF section reference types

Reference Type                 Example
Intra-section references       .text refers to .text (embedded data)
Inter-section references       .text refers to .rodata
External references            dynamic linker refers to .dynsym, .rela.*
Multiple external references   C++ runtime/debugger refer to .eh_frame

3.2.3 Design

NORAX Workflow: NORAX consists of four major components: NDisassembler, NPatcher, NLoader, and NMonitor, as shown in Figure 3.7. The first two components perform offline binary analysis and transformation, and the last two provide runtime support for loading and monitoring the patched, XOM-compatible executables and libraries. In addition to disassembling machine code, NDisassembler scans for all executable code that needs to be protected by XOM. A major challenge it solves is identifying the various types of data that ARM compilers often embed in the code section, including jump tables, literals, and padding. Unlike typical disassemblers, NDisassembler has to


Figure 3.7: NORAX System Overview: the offline tools (left) analyze the input binary, locate all the executable data and their references (when available), and then statically patch the metadata to the raw ELF; the runtime components (right) create separate mappings for the executable data sections and update the recorded references as well as those generated at runtime.

precisely differentiate embedded data from code. Taking input from NDisassembler, NPatcher transforms the binary so that its embedded data are moved out of the code sections and their references are collected for later adjustment. After the transformation, NPatcher inserts a unique magic number into the binary so that it can be recognized by NLoader at load time. NPatcher also stores NORAX metadata in the binary, which will be used by NLoader and NMonitor. When a patched binary is loaded, NLoader takes over the loading process to (i) load the NORAX metadata into memory, (ii) adjust the NPatcher-collected references as well as the dynamically created references to the linker-related sections (e.g., .hash, .rela.*), and (iii) map all memory pages that contain code as execute-only. During runtime, NMonitor, an OS extension, handles read accesses to XOM. While such accesses are rare and may indicate attacks, they can also be legitimate, because NPatcher may not be able to completely recognize dynamic references to the relocated embedded data (e.g., those generated at runtime). When a data reference is missed, the access triggers an XOM violation, which NMonitor verifies and, if legitimate, facilitates the access to the corresponding data.
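The magic-number recognition step can be sketched as follows. Note that the marker value and the trailing record layout here are hypothetical; this section does not specify NORAX's actual on-disk metadata format, and the function names are invented for illustration.

```python
import struct

# Hypothetical marker: a 4-byte metadata offset followed by a 4-byte tag,
# appended to the end of the patched binary. NORAX's real format may differ.
NORAX_MAGIC = b"NRAX"  # assumed tag, not taken from the NORAX implementation

def append_marker(image: bytes, meta_offset: int) -> bytes:
    """Append a (hypothetical) metadata-offset record plus the magic tag."""
    return image + struct.pack("<I", meta_offset) + NORAX_MAGIC

def is_norax_patched(image: bytes) -> bool:
    """NLoader-style check: does the image end with the assumed tag?"""
    return len(image) >= 4 and image[-4:] == NORAX_MAGIC
```

A loader that finds the tag would then read the preceding offset to locate the metadata; an unpatched binary falls through to the normal loading path.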

3.2.3.1 NDisassembler: Static Binary Analyzer

NDisassembler first converts an input binary from machine code to assembly code and then performs the analysis needed to convert the binary into an XOM-compatible form. It disassembles the binary in a linear sweep fashion, which yields larger code coverage than recursive disassembling [49]. However, the larger code coverage comes at the cost of potentially mis-detecting embedded data as code (e.g., when such data happen to decode as syntactically correct instructions). NDisassembler addresses this problem via an iterative data recognition technique. Along with this process, it also finds instructions that reference embedded data. The data recognition technique is inspired by the following observations:

• Although it is difficult to find all instructions referencing some embedded data at a later point in the running program, it is relatively easy to locate the code that computes these references in the first place.

• To generate position-independent binaries, compilers can only use PC-relative addressing when emitting instructions that need to reference data inside binaries.

• The AArch64 ISA provides only two classes of instructions for obtaining PC-relative values, namely the ldr (literal) instructions and the adr(p) instructions.

NDisassembler uses Algorithm 1 to construct an initial set of embedded data (IS) and a set of reference sites (RS). For embedded data whose size cannot be precisely bounded, NDisassembler collects their seed addresses (AS) for further processing. As shown in Lines 5–9 of Algorithm 1, since the load size of an ldr-literal instruction is known, the identified embedded data are added to IS. The handling of adr instructions is more involved, as shown in Lines 10–27. NDisassembler first performs forward slicing on xn — the register that holds the embedded data address. All instructions that have data dependencies on xn are sliced, and xn is considered escaped if any of its data-dependent registers is either (i) stored to memory or (ii) passed to another function before being killed. In either case, the slicing also stops. If not all memory dereferences based on xn can be identified due to reference escaping, the size of the embedded data cannot be determined; NDisassembler then only adds the initial value of xn to AS, as a seed address (Lines 24–26). Lines 10–23 of Algorithm 1 deal with the sliced instructions. If a memory load based on xn is found, RS is updated with the location of the original address-taking instruction. Moreover, NDisassembler analyzes the address range of each memory load. Oftentimes the address range is bounded, because embedded data are mostly integer/floating-point constants or jump tables. In the former case, the start address of the memory load is typically xn plus some constant offset, while the load size is explicit from the memory load instruction. In the latter, well-known techniques for determining jump table sizes [84] are utilized. In both cases, the identified embedded data are

added into IS. However, if there is a single memory load whose address range cannot be bounded, NDisassembler adds the seed address to AS.

Algorithm 1: Initial embedded data and references collection

INPUT:
  code[] – an array of disassembly output
OUTPUT:
  IS – initial set of embedded data
  AS – the set of seed addresses for embedded data
  RS – the set of reference sites to embedded data

 1: procedure INITIALSETCOLLECTION
 2:   IS = {}
 3:   AS = {}
 4:   RS = {}
 5:   for each (ldr-literal addr) ∈ code[] at curr do
 6:     size = MemLoadSize(ldr)
 7:     IS = IS ∪ {addr, addr+1, ..., addr+size-1}
 8:     RS = RS ∪ {curr}
 9:   end for
10:   for each (adr xn, addr) ∈ code[] at curr do
11:     escaped, depInsts = ForwardSlicing(xn)
12:     unbounded = False
13:     for each inst ∈ depInsts do
14:       if inst is MemoryLoad then
15:         RS = RS ∪ {curr}
16:         addr_expr = MemLoadAddrExpr(inst)
17:         if IsBounded(addr_expr) then
18:           IS = IS ∪ AddrRange(addr_expr)
19:         else
20:           unbounded = True
21:         end if
22:       end if
23:     end for
24:     if escaped or unbounded then
25:       AS = AS ∪ {addr}
26:     end if
27:   end for
28: end procedure

Algorithm 2: Embedded data set expansion

INPUT:
  AS – the set of seed addresses for embedded data
  IS – initial set of embedded data
OUTPUT:
  DS – conservative set of embedded data

 1: procedure SETEXPANSION
 2:   DS = IS
 3:   for addr in AS do
 4:     c1 = BackwardExpand(addr, DS)
 5:     c2 = ForwardExpand(addr, DS)
 6:     DS = DS ∪ c1 ∪ c2
 7:   end for
 8: end procedure

If Algorithm 1 is not able to determine the sizes of all embedded data, the initial set (IS) is not complete. In this case, the seed addresses in AS are expanded using Algorithm 2 to construct an over-approximated set of embedded data (DS). The core functions are BackwardExpand (Line 4) and ForwardExpand (Line 5). The backward expansion starts from a seed address and walks backward from that address until it encounters a valid control-flow transfer instruction, i.e., an instruction that is either a direct control-flow transfer to a 4-byte aligned address in the address space or an indirect control-flow transfer. All bytes walked through are marked as data and added to DS. The forward expansion, on the other hand, walks forward from the seed address. It proceeds aggressively for a conservative inclusion of all embedded data, stopping only when it has strong indication that it has identified a valid code instruction. These indicators are one of the following: (i) a valid control-flow transfer instruction is encountered, (ii) a direct control-flow transfer target (originating from other locations) is reached, and (iii) an instruction is confirmed as the start of a function [162]. In the last case, comprehensive control-flow and data-flow properties such as parameter passing and callee saves are checked before validating an instruction as the start of a function.

Finally, DS contains nearly all embedded data that exists in the binary. Although we could further leverage heuristics to include undecodable instructions as embedded data, it is not necessary, because our conservative algorithms already cover the vast majority (if not all) of them, and the rest are mostly padding bytes which are never referenced. Theoretically, failure to include certain referenced embedded data could still happen if a chunk of data can be coincidentally decoded as a sequence of instructions that satisfies many code properties, but in our evaluation of over 300 stripped Android system binaries, we never encountered such a case. RS contains a large subset of reference sites to the embedded data. Since statically identifying all indirect or dynamic data references may not always be possible, NDisassembler leaves such cases to be handled by NMonitor.
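As a rough illustration of the ldr-literal pass of Algorithm 1 (Lines 5–9), the sketch below scans toy textual disassembly. The instruction format, the fixed load sizes, and the `collect_initial_sets` name are simplifications made here; the real NDisassembler works on full AArch64 semantics and also performs the adr slicing of Lines 10–27, which is omitted.

```python
import re

# Matches toy "ldr w<n>, 0x<addr>" / "ldr x<n>, 0x<addr>" literal loads.
LDR_LITERAL = re.compile(r"ldr\s+([wx])\d+,\s*0x([0-9a-f]+)$")

def collect_initial_sets(code):
    """Simplified Lines 5-9 of Algorithm 1 over (address, text) pairs."""
    IS, RS = set(), set()            # embedded-data byte addresses, ref sites
    for addr, text in code:
        m = LDR_LITERAL.match(text)
        if m:
            size = 4 if m.group(1) == "w" else 8   # w: 4-byte load, x: 8-byte
            target = int(m.group(2), 16)
            IS.update(range(target, target + size))  # mark loaded bytes as data
            RS.add(addr)                             # remember the reference site
    return IS, RS

code = [
    (0x400, "ldr x0, 0x500"),   # literal load -> 8 data bytes at 0x500
    (0x404, "add x1, x0, #1"),  # ordinary instruction, ignored
    (0x408, "ldr w2, 0x520"),   # literal load -> 4 data bytes at 0x520
]
IS, RS = collect_initial_sets(code)
```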

3.2.3.2 NPatcher: XOM Binary Patcher

With the input from NDisassembler, NPatcher transforms the binary in two steps: it first relocates data out of the code sections, and then collects the data references that must be adjusted at load time.

Data Relocation: An intuitive design choice is to move the executable data out of the code segment. But doing so affects backward compatibility, as the layout of the ELF and the offsets of its sections would change significantly. Another approach is to duplicate the executable data, but this would increase binary sizes and memory footprint significantly. Instead, NPatcher uses two different strategies to relocate executable data without modifying code sections or duplicating all read-only data sections. For data located in the code segment but separated from the code text (i.e., read-only data), NPatcher does not duplicate them in binaries but only records their offsets as metadata, which will be used by NLoader to map such data into read-only memory pages. For data mixed with code (i.e., embedded data), NPatcher copies them into a newly created data section at the end of the binary. The rationale behind the two strategies is that read-only data usually accounts for a large portion of the binary size, so duplicating it in the binary is wasteful and unnecessary. On the other hand, embedded data is usually small, and duplicating it in binaries does not cost much space. More importantly, this duplication is necessary for security reasons: without it, code surrounding data would have to be made readable, which reduces the effectiveness of XOM.
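The two strategies can be summarized in a small decision routine. This is illustrative only; the region classification and the metadata layout are assumptions for the sketch, not NPatcher's real on-disk format.

```python
def plan_relocation(region):
    """Decide how an NPatcher-style rewriter would treat a region.

    Read-only data separated from code text is only recorded as
    metadata (the loader later maps it read-only in place), while
    embedded data mixed with code is duplicated into a new section
    appended to the binary.
    """
    if region["kind"] == "rodata":          # separate read-only data
        return {"action": "record_offset", "offset": region["offset"]}
    if region["kind"] == "embedded":        # data mixed with code
        return {"action": "duplicate", "bytes": region["bytes"]}
    raise ValueError("unknown region kind")

assert plan_relocation({"kind": "rodata", "offset": 0x1000})["action"] == "record_offset"
assert plan_relocation({"kind": "embedded", "bytes": b"\x00" * 8})["action"] == "duplicate"
```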

42 CHAPTER 3. CODE REUSE EXPLOIT MITIGATIONS

Data Reference Collections: NPatcher only collects the references from .text to .text (embedded data) and to .rodata, because these can be statically recognized and resolved. Other types of references either come from outside the module or are statically unavailable; these are handled by NLoader. For references to embedded data, NPatcher can directly include them based on NDisassembler's analysis results. But there is one caveat: the instructions used to reference embedded data (i.e., adr and ldr-literal) have a short addressing range. Therefore, when we map their target data to different memory pages, it is possible that these instructions can no longer reach the relocated data. To solve this issue without breaking backward compatibility, NPatcher generates stub code to facilitate access to out-of-range data. The short-range instructions are replaced with an unconditional branch instruction2, which points to the corresponding stub entry. The stub code contains only unconditional load and branch instructions pointing to fixed immediate offsets. This design ensures that the stub entries cannot be used as ROP gadgets. For references to .rodata, there is no addressing-range problem, because adrp is used instead of adr. However, a different issue arises: such references can come from multiple sources. We identify five sources in our empirical study covering all Android system executables and libraries. NPatcher can only prepare the locations of the first three offline, leaving the last two to be handled by NLoader after relocation and symbol resolution are finished.

• References from code (.text): these are usually caused by access to constant values and strings.

• References from the symbol table (.dynsym): when a symbol is located in .rodata, there will be an entry in the symbol table whose value field contains the address of the exposed symbol.

• References from the relocation table (.rela.dyn): for a relocatable symbol located in .rodata, the relocation table entry's r_addend field will point to the symbol's address.

• References from the global offset table (.got): when a variable in .rodata cannot be addressed due to the addressing limit (e.g., adrp can only address +/- 4GB), an entry in the global offset table is used to address that far-away variable.

• References from read-only global data (.data.rel.ro): most binaries in Android disable lazy binding. The .data.rel.ro section contains the addresses of global constant data that need to be relocatable. After the dynamic linker finishes relocating them, this table is marked read-only, as opposed to the traditional .data section.

2ADR can address +/- 1MB, while B(ranch) can access +/- 128MB, which is far enough for regular binaries.
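The range problem can be illustrated with a small sketch: an adr (±1MB reach) whose target has been moved out of range is replaced by a branch (±128MB reach) to a stub entry that performs the actual load. This is a symbolic model of the rewrite rule; the real NPatcher operates on AArch64 instruction encodings, and the addresses below are invented for illustration.

```python
ADR_RANGE = 1 << 20        # adr / ldr-literal reach: +/- 1MB
B_RANGE   = 128 << 20      # b (branch) reach: +/- 128MB

def rewrite_if_out_of_range(insn_addr, data_addr, stub_addr):
    """Return the (possibly rewritten) instruction as a symbolic tuple."""
    if abs(data_addr - insn_addr) <= ADR_RANGE:
        return ("adr", data_addr)          # still reachable, keep as-is
    # Out of range: redirect through a stub entry; the stub itself only
    # contains fixed-offset load/branch instructions and sits within
    # branch range of the rewritten site.
    assert abs(stub_addr - insn_addr) <= B_RANGE
    return ("b", stub_addr)

# Relocated data 16MB away can no longer be addressed by adr ...
assert rewrite_if_out_of_range(0x400000, 0x400000 + (16 << 20), 0x500000) == ("b", 0x500000)
# ... while nearby data is left untouched.
assert rewrite_if_out_of_range(0x400000, 0x410000, 0x500000) == ("adr", 0x410000)
```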


Figure 3.8: The layout of an ELF transformed by NORAX. The shaded parts at the end are the generated NORAX-related metadata.

Figure 3.9: Bionic linker's binary loading flow. NLoader operates in different binary-preparing stages, including module loading, relocation, and symbol resolution.

Finally, the metadata (duplicates and references), the data-accessing stub code, and the NORAX header are appended to the end of the original binary, as shown in Figure 3.8. Note that by appending the NORAX-related data to the end of the binary, we allow patched binaries to be backward-compatible. This is because the ELF standard ignores anything that comes after the section header table. As a result, binaries transformed by NPatcher can run on devices without NORAX support installed. They can also be parsed and disassembled by standard ELF utilities such as readelf and objdump. Moreover, NORAX-patched binaries are compatible with other binary-level security enhancement techniques.
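Backward compatibility here rests on a property of the ELF format: standard consumers stop at the section header table, so trailing bytes are ignored. A minimal sketch of the idea follows; the `NRAX` magic and the [magic][length][payload] layout are invented for illustration, as the text does not specify the actual NORAX header format.

```python
import struct

NORAX_MAGIC = b"NRAX"  # hypothetical magic, for illustration only

def append_norax_blob(elf_bytes, metadata):
    """Append a [magic][length][payload] blob after the ELF image.

    The original bytes are untouched, so the file still loads and
    parses normally with stock tooling; a NORAX-aware loader can seek
    to the end and recognize the magic.
    """
    blob = NORAX_MAGIC + struct.pack("<I", len(metadata)) + metadata
    return elf_bytes + blob

patched = append_norax_blob(b"\x7fELF" + b"\x00" * 60, b"dup-data")
assert patched.startswith(b"\x7fELF")      # original image intact
assert patched.endswith(b"dup-data")       # metadata appended at the end
```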

3.2.3.3 NLoader: Plugin for Stock Loader and Linker

Binaries rewritten by NPatcher remain recognizable by and compatible with the stock loader and linker. They can still function, albeit without the XOM protection. New data sections added by NORAX, however, are transparent to the toolchain. They require NLoader's support to complete the binary loading and reference updating process before their code can be mapped in XOM. Other than the ones prepared by NPatcher, there are several types of references to executable data which are related to the linker and only available at runtime. Built as a linker/loader plugin, NLoader adjusts these references in the following steps:


• Ld-1: It parses and loads the NORAX header into memory, including information about the embedded data in .text and the stub code accessing embedded data. Then, it creates duplicated mappings for .rodata and the linker-referencing sections3, which have been loaded by the stock linker/loader.

• Ld-2: It updates the .dynamic section to redirect the linker to use the read-only copy of those relocated data sections.

• Ld-3: It collects the .rodata references from .got and .data.rel.ro, which are only populated after the relocation is done. It then adjusts all the collected data references in one pass. Eventually, the memory access level of the loaded module is adjusted to enforce the R⊕X policy.

The overall workflow of NLoader is shown in Figure 3.9. It starts with the executable loading, which is done by the OS ELF loader (Step 1). Then, the OS loader transfers control to the dynamic linker, which in turn creates a book-keeping object for the just-loaded module. Meanwhile, Ld-1 is performed to complete the binary loading. Next, the binary's corresponding book-keeping object is populated with references to those ELF sections used by the linker to carry out relocation and symbol resolution in a later stage. Ld-2 is then invoked to update these populated references. At this point, the preparation for the executable is done. The linker then starts preparing all the libraries (Step 2). This process is similar to the preparation of the executable, so Ld-1 and Ld-2 are called accordingly. When all the modules have been loaded in the previous steps with their book-keeping objects populated, the linker walks through the book-keeping objects to perform relocation and symbol resolution (Step 3). In this step, Ld-3 is called for each of the relocated modules to update all the collected references, including the ones from .got and .data.rel.ro to .rodata. This is feasible because the .got entries which reference .rodata are populated upfront, the same as those in .data.rel.ro. During runtime, the program may dynamically load or unload new libraries (Step 4), as shown in Figure 3.9, which is also naturally handled by NLoader. To boost performance, once NLoader finishes updating the offline-updatable references, it caches the patched binary so that next time it can directly load the cached version without going through the whole reference adjustment process again.
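The three stages can be walked through symbolically. This is a toy model, not the Bionic linker's real book-keeping: modules are plain dicts, and the stage effects are reduced to flags.

```python
# A symbolic walk-through of the three NLoader stages (illustrative).
def ld1(module):
    # Parse the NORAX header; create read-only duplicate mappings.
    module["norax_header_loaded"] = True
    module["ro_duplicates"] = {".rodata": "ro-mapping"}

def ld2(module):
    # Redirect the linker to the read-only copies via .dynamic.
    module["dynamic_redirected"] = True

def ld3(module):
    # Runs only after relocation: fix .got / .data.rel.ro references,
    # then flip the module's code pages to execute-only (R xor X).
    assert module["relocated"]
    module["refs_adjusted"] = True
    module["code_perms"] = "x-only"

m = {"relocated": False}
ld1(m); ld2(m)          # during module loading (Steps 1-2)
m["relocated"] = True   # the linker performs relocation (Step 3)
ld3(m)                  # final reference fix-up and XOM enforcement
assert m["code_perms"] == "x-only"
```

The ordering constraint in `ld3` mirrors the text: .got and .data.rel.ro entries can only be adjusted once the linker has populated them.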

3The linker-referencing sections include .(gnu).hash, .dynsym, .dynstr, .gnu.version, .gnu.version_r, .rela.dyn, .rela.plt, etc.


3.2.3.4 NMonitor: Runtime Enforcement and Safety-net

After being processed by the first three NORAX components, a patched binary that follows the R⊕X policy is ready to run with the assistance of NMonitor. At runtime, the converted program may still carry some unadjusted references to executable data, which fall into the following two categories.

• Missed references to embedded data: NDisassembler may fail to discover some data references due to the limitations of static analysis, and these missed references trigger access violations at runtime. Although in our evaluation we rarely saw cases where an access violation was triggered by a missed embedded data reference, such a situation, if mishandled, will crash the program. Note that references from .text to .rodata do not have this problem, because whenever a computed address happens to point into the .rodata section, NDisassembler marks it as a valid reference regardless of whether a corresponding memory load instruction is detected.

• References to .eh_frame_hdr and .eh_frame: These sections provide auxiliary information such as the address ranges of functions and the stack contents when a C++ exception is triggered. The previous components are unable to update them because they are used neither by the converted module itself nor by the dynamic linker. Instead, we found that the C++ runtime and debuggers such as gdb reference and read into these two sections for exception handling or stack unwinding.

NMonitor dynamically handles both categories of unadjusted references. It responds to memory violations caused by any attempted read access to XOM, checking the context and the data being accessed. If the context matches the two cases discussed above and the address being accessed does belong to the relocated data, NMonitor permits and facilitates the access; otherwise, it terminates the program. Specifically, NMonitor whitelists these two kinds of data and ensures that legitimate accesses to them can go through while potential abuses by attackers cannot. For instance, NMonitor only allows the C++ runtime module to access the .eh_frame sections (updatable through sysctl). For the .text embedded data, NMonitor only allows code from the over-approximated hosting function to read them. Note that while this design helps our system cope with those corner cases, the security of our system is barely undermined, for two reasons: (i) the majority of the whitelisted data are indeed real data, which are not even decodable or are surrounded by non-decodable data; (ii) different data require code from different regions to access them, so attackers cannot simply exploit one memory leak bug to read across all these embedded data.
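The fault-handling decision can be sketched as a whitelist lookup. This is a simplified model, not NMonitor's kernel code: the whitelist maps a relocated-data range to the code range allowed to read it (e.g., .eh_frame to the C++ runtime, an embedded-data chunk to its over-approximated hosting function), and all addresses are hypothetical.

```python
def handle_read_fault(fault_addr, reader_pc, whitelist):
    """Decide whether a read fault on an XOM page is legitimate.

    `whitelist` maps (data_lo, data_hi) ranges to the (code_lo, code_hi)
    range permitted to read them.  Any access that does not match both
    the data range and the reader's code range terminates the program.
    """
    for (d_lo, d_hi), (c_lo, c_hi) in whitelist.items():
        if d_lo <= fault_addr < d_hi and c_lo <= reader_pc < c_hi:
            return "permit"
    return "terminate"

wl = {(0x1000, 0x1100): (0x4000, 0x5000)}   # hypothetical ranges
assert handle_read_fault(0x1010, 0x4100, wl) == "permit"
assert handle_read_fault(0x1010, 0x9000, wl) == "terminate"  # wrong reader
assert handle_read_fault(0x2000, 0x4100, wl) == "terminate"  # not whitelisted
```

Tying each data range to a specific code range is what prevents a single memory-leak bug from reading across all whitelisted data, as argued above.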

3.2.4 Evaluation

We evaluate three aspects of NORAX: (i) whether it breaks the functioning of patched binaries; (ii) how accurate its data analysis is; and (iii) how much overhead it incurs.

Functioning of Transformed Binaries: For this test, we selected 20 core system binaries to transform, including both programs and libraries (Table 3.10). These binaries support the basic functionality of an Android phone, such as making a phone call, installing apps, and playing videos. We obtained these binaries from a Nexus 5X phone running Android OS v6.0.1 (Marshmallow). These stock binaries are compiled with compiler optimization and without debugging metadata. We tested the functionality of the transformed binaries using our own test cases as well as the Android Compatibility Test Suite (CTS) [5]. We modified the system bootstrapping scripts (*.rc files) to direct Android to load the system binaries patched by NORAX. Table 3.8 shows the specific tests we designed for each system executable and library. For example, surfaceflinger is the UI composer, which depends on two libraries: libmedia.so and libstagefright.so. Zygote (app_process64) is the template process from which all app processes are forked; it uses all of the patched binaries. While running our functionality tests, we observed an attempt by the linker to read the ELF header, which is located in pages marked execute-only. While this attempt was allowed and facilitated by NMonitor, our system can be optimized to handle this case during the patching stage instead. We also ran the Android Compatibility Test Suite (CTS) on a system with our transformed binaries installed. The suite contains around 127,000 test packages and is a mandatory test performed by OEM vendors to assess the compatibility of their modified Android systems. The test results are shown in Table 3.9. NORAX did not introduce any additional failures beyond those generated by the vendor customization on the testing devices. The results from both tests show that the functioning of patched binaries is not interrupted or broken by NORAX.

Correctness of Data Analysis: To thoroughly test the correctness of our embedded data identification algorithm, we ran the data analysis module of NDisassembler against a large test set consisting of all 313 Android system binaries, whose sizes span from 5.6KB (libjnigraphics.so) to 16.5MB


(liblog.so), totaling 102MB. For these binaries, we compare the data identified by NDisassembler with the real embedded data. Our ground truth is obtained by compiling debugging sections (.debug_*) [18] into the binaries. We use an automatic script to collect bytes at file offsets that fall outside any function range and compare them with the analysis results from NDisassembler. For the bytes that are not used by any of the functions, we found that some of them are NOP instructions used purely for padding, whilst some are just “easter eggs”; for instance, in the function gcm_ghash_v8 of libcrypto.so, the developers left the string “GHASH for ARMv8, CRYPTOGAMS by ”. These kinds of data were not collected by NORAX. Since there are no references to them, making them non-readable will not break any function.
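The ground-truth comparison described above can be sketched as follows. This is a simplified stand-in for the actual script: function ranges are assumed to come from the .debug_* sections, and offsets are small integers for illustration.

```python
def bytes_outside_functions(file_size, func_ranges):
    """Collect offsets not covered by any function range.

    These offsets serve as the ground truth for non-code bytes, as
    derived from debugging sections compiled into the binary.
    """
    covered = set()
    for lo, hi in func_ranges:
        covered.update(range(lo, hi))
    return sorted(set(range(file_size)) - covered)

def compare(ground_truth, flagged):
    """Compare NDisassembler's flagged data against the ground truth."""
    gt, fl = set(ground_truth), set(flagged)
    return {"missed": sorted(gt - fl),       # false negatives
            "false_pos": sorted(fl - gt)}    # code flagged as data

truth = bytes_outside_functions(16, [(0, 4), (8, 16)])
result = compare(truth, flagged=[4, 5, 6, 7, 9])
assert result["missed"] == []                # zero false negatives
assert result["false_pos"] == [9]            # benign over-approximation
```

A zero `missed` set with a small `false_pos` set corresponds to the reported result: no embedded data missed, with rare over-approximations left for NMonitor to handle.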

For the tested binaries, NDisassembler correctly identified all the embedded data. Only for 28 out of the 313 binaries did NDisassembler report false positives (i.e., code mistakenly identified as embedded data), due to the over-approximate approach we use. These rare false positive cases are expected by our design and are handled by NMonitor during runtime. Table 3.11 shows a subset of the results4.

Table 3.8: Rewritten program functionality tests.

Module                   Description                             Experiment                        Success
vold                     Volume daemon                           mount SDCard; umount              Yes
toybox                   115 *nix utilities                      try all commands                  Yes
toolbox                  22 core *nix utilities                  try all commands                  Yes
dhcpcd                   DHCP daemon                             obtain dynamic IP address         Yes
logd                     Logging daemon                          collect system log for 1 hour     Yes
installd                 APK install daemon                      install 10 APKs                   Yes
app_process64 (zygote)   Parent process for all applications     open 20 apps; close               Yes
qseecomd                 Qualcomm's proprietary driver           boot up the phone                 Yes
surfaceflinger           Compositing frame buffers for display   Take 5 photos; play 30 min movie  Yes
rild                     Baseband service daemon                 Have 10 min phone call            Yes

Table 3.9: System compatibility evaluation. The converted zygote, qseecomd, installd, rild, logd, surfaceflinger, libc++, and libstagefright are selected randomly to participate in the test, to see whether they can run transparently with other unmodified system components.

Plan Name    Pass     Fail  Not Executed
normal CTS   126,457  552   0
NORAX CTS    126,457  552   0

Size Overhead: In our functionality test, the sizes of our selected binaries range from ≈14K to ≈7M, as shown in Table 3.10. After transformation, the binary sizes increased by an average of ≈3.91%. Note that libm.so is an outlier, as its file size increased much more than the others. After manual inspection, we found that this math library has a lot of constant values hardcoded in various

mathematical functions such as casinh() and cacos(). As an optimization, the compiler embeds this

4This subset was chosen to be consistent with the binaries used in the other tests in this section. The complete set of results for all 313 Android system binaries, which can be easily obtained, is not shown here due to the space limit.


Table 3.10: Binary transformation correctness test.

Module                   Size (Stock)   Size (NORAX)   File Size   # of Rewrite
                         (Byte)         (Byte)         Overhead    Errors
vold                     486,032        512,736        5.49%       0
toybox                   310,800        322,888        3.89%       0
toolbox                  148,184        154,632        4.35%       0
dhcpcd                   112,736        116,120        3.00%       0
logd                     83,904         86,256         2.80%       0
installd                 72,152         76,896         6.58%       0
app_process64 (zygote)   22,456         23,016         2.49%       0
qseecomd                 14,584         15,032         3.07%       0
surfaceflinger           14,208         14,448         1.69%       0
rild                     14,216         14,784         4.00%       0
libart.so                7,512,272      7,772,520      3.46%       0
libstagefright.so        1,883,288      1,946,328      3.35%       0
libcrypto.so             1,137,280      1,157,816      1.81%       0
libmedia.so              1,058,616      1,071,712      1.24%       0
libc.so                  1,032,392      1,051,312      1.83%       0
libc++.so                944,056        951,632        0.80%       0
libsqlite.so             791,176        805,784        1.85%       0
libbinder.so             325,416        327,072        0.51%       0
libm.so                  235,544        293,744        24.71%      0
libandroid.so            96,032         97,208         1.22%       0
AVG.                                                   3.91%       0

Table 3.11: Embedded data identification correctness. This empirical experiment shows our analysis works well on AArch64 COTS ELFs, with a zero false negative rate and a very low false positive rate in terms of finding embedded data. The last column shows the negligible number of leftover gadgets in the duplicated embedded data set.

Module                   # of Real     # of Inline Data   # of Gadgets found in
                         Inline Data   Flagged by Norax   extracted inline Data
vold                     0             0                  0
toybox                   8             8                  0
toolbox                  20            20                 0
dhcpcd                   40            40                 4
logd                     0             0                  0
installd                 0             0                  0
app_process64 (zygote)   0             0                  0
qseecomd                 N/A           0                  0
surfaceflinger           0             0                  0
rild                     0             0                  0
libart.so                17716         17716              8
libstagefright.so        296           296                5
libcrypto.so             2472          2512               25
libmedia.so              3936          3936               0
libc.so                  4836          4836               5
libc++.so                12            12                 0
libsqlite.so             932           1004               13
libbinder.so             0             0                  0
libm.so                  20283         20291              48
libandroid.so            0             0                  0
Total                    50551         50671              108

NORAX implementation components:

System Components              Norax Modifications        SLoC   Language
Linux Kernel                   NLoader, NMonitor          1947   C
Bionic Linker                  NLoader                    289    C++
Analysis & Rewriting Modules   NDisassembler, NPatcher    3580   Python & Bash Shell Script

large set of constant data into the code section to fully exploit spatial locality, which translates to more metadata generated by NORAX during the patching stage. The following breakdown shows how many binaries in each system directory contain embedded data:

Directory       # of binaries   # w/ embedded data   Percentage
/system/bin     237             167                  70.46%
/system/lib64   255             101                  39.61%
/vendor/lib64   111             39                   35.14%
/vendor/bin     4               2                    50.00%

Performance Overhead: We used Unixbench [151] to measure the performance of our system. The benchmark consists of two types of testing programs: (i)

user-level CPU-bound programs; (ii) system benchmark programs that evaluate I/O, process creation, system calls, etc. We ran the benchmark on both the stock and the patched binaries, repeating each round three times. We then derived the average runtime and space overhead, which are given in Figure 3.10. For the runtime overhead, the average slowdown introduced by NORAX is 1.18%. The overhead mainly comes from the system benchmark programs, among which Execl shows the maximum slowdown. Investigating its source code, we found that this benchmark program keeps invoking the exec system call from the same process to execute itself over and over again, causing NLoader to repeatedly prepare new book-keeping structures and destroy old ones (§ 3.2.3.3). This, in turn, leads to multiple locking and unlocking operations, hence the relatively higher overhead. Fortunately, we do not find this behavior common in normal programs. In addition, some simple optimizations are possible: (i) employing a more fine-grained locking mechanism; (ii) reusing the book-keeping structures when exec loads the same image.


Figure 3.10: Unixbench performance overhead, including runtime, peak resident memory, and file size overhead (left: user tests; right: system tests).

Security Impact: Since NORAX duplicates the embedded data found in code sections, that data is in theory still reusable by adversaries. We therefore conducted a gadget-searching experiment over the duplicated embedded data appended at the end of the converted binaries. Table 3.11 shows the number of available gadgets we found in those data. As the results show, available gadgets are actually very rare, even in binaries that contain a lot of embedded data such as libm.so; we believe this is because the majority of the duplicated bytes are by themselves not decodable. Also note that the shown numbers are upper bounds on the available gadgets, because in the executable code section, where the original embedded data reside, the bytes that form these gadgets may not be placed next to each other.


3.3 Limitations

Unforeseeable Code: Neither CCR nor NORAX supports self-modifying code. CCR relies on compile-time analysis to extract metadata that is assumed to be immutable after compilation, and NORAX relies on static binary analysis and rewriting that are assumed to be completed before runtime. As a result, the current implementations cannot handle dynamically generated code (JIT compilation) or self-modifying code. In addition, NORAX cannot patch customized ELF files consisting of unrecognizable sections that may contain code and data. For instance, the .ARM.exidx and .ARM.extab sections contained in the dex2oat program5 are not recognized by the current implementation of NORAX. Nevertheless, these limitations are shared by most works relying on compile-time analysis or offline binary rewriting.

Indirect Memory Disclosure: CCR and NORAX block exploitation by preventing attackers from directly reading the code loaded in memory to search for gadgets. However, code pointers residing in data areas such as the stack and heap are still vulnerable to indirect memory disclosure attacks, which can lead to whole-function reuse or call-preceded gadget reuse attacks [85, 170]. This limitation, however, is shared by all related solutions using binary rewriting [54, 109, 189, 202]. In addition, a recent study [167] shows that even the most advanced source-code-based techniques [88, 89] are subject to attacks of this kind. As we can anticipate, defenses against indirect memory disclosure warrant further research in this eternal war on memory corruption attacks.

5An optimization tool to convert applications’ byte code to native code.

Chapter 4

In-process Memory Isolation

As pointed out in § 3.3, advanced attacks may still manage to bypass the mitigations. Similarly, when a rogue library is injected into the victim program, the attacker is able to execute arbitrary code and has unrestricted access to other components' memory. When it comes to preventing in-process memory abuse, developers are virtually helpless due to a lack of support from the underlying operating system (OS): the memory isolation mechanisms provided by modern OSes operate merely at the process level and cannot be used to establish security boundaries inside a process. As a result, protecting sensitive memory content against malicious code inside the same process remains an open issue, which has been increasingly exploited by attackers.


Figure 4.1: Shreds, threads, and a process


4.1 Overview

In this section, we present a new execution unit for userspace code, namely shred, which represents an arbitrarily sized segment of a thread (hence the name) and is granted exclusive access to a protected memory pool, namely the shred-private pool (or s-pool). Figure 4.1 depicts shreds in relation to the conventional execution units. Upon its creation, a shred is associated with an s-pool, which can be shared among multiple shreds. Shreds address developers' currently unmet needs for fine-grained, convenient, and efficient protection of sensitive memory content against in-process adversaries. To prevent sensitive content in memory from in-process abuse, a developer includes into a shred the code that needs access to the sensitive content and stores the content in the shred's s-pool. For instance, an encryption function can run in a shred with the secret keys stored in the s-pool; a routine allowed to call a private API can run in a shred whose s-pool contains the API code. We design shreds under a realistically adversarial threat model. We assume attackers may have successfully compromised a victim program, via either remote exploitation or malicious local libraries. The attackers' goal is to access the sensitive content, including both data and code, in the victim program's virtual memory space. Further, we expect unknown vulnerabilities to exist inside shreds (e.g., control-flow hijacks and data leaks are possible). On the other hand, we assume a clean OS, which serves as the TCB for shreds. This assumption is reasonable because the attacks that shreds aim to prevent, in-process abuse, would become unnecessary had attackers already subverted the OS. In fact, we advocate that future OSes should support shreds, or more generally, enable private memory for execution units of smaller granularity than the scheduling units.
We realize the concept of shreds by designing and building: (i) a set of easy-to-use APIs for developers to use shreds and s-pools; (ii) a compilation toolchain, called S-compiler, automatically verifying, instrumenting, and building programs using shreds; (iii) a loadable kernel extension, called S-driver, enabling the support and protection of shreds on commodity OS. Figure 4.2 shows an overview of the entire system and the workflow. A developer creates a shred and associates it with a selected s-pool by calling the shred enter API and supplying the s-pool descriptor as the argument. Code inside a shred may access content in the associated s-pool as if it were a normal region in the virtual memory space. But the s-pool is inaccessible outside of the associated shred(s). S-pools are managed and protected by S-driver in a way oblivious to developers or applications. With the help of use-define chain analysis on labeled sensitive variables, shreds can also be created automatically at compile time. As shown in Figure 4.2, while compiling programs that use shreds, S-compiler automatically


Figure 4.2: Developers create shreds in their programs via the intuitive APIs and build the programs using S-compiler, which automatically verifies and instruments the executables (left); during runtime (right), S-driver handles shred entrances and exits on each CPU/thread while efficiently granting or revoking each CPU's access to the s-pools.

verifies the safe usage of shreds and instruments in-shred code with inline checks. The verification and instrumentation regulate sensitive data propagation and control flows inside shreds so that unknown vulnerabilities inside shreds cannot lead to secret leaks or shred hijacking. During runtime, S-driver serves as the manager for s-pools and the security monitor for executing shreds. It creates and resizes s-pools on demand. It enables a per-CPU locking mechanism on s-pools and ensures that only authorized shreds may access s-pools despite concurrent threads. S-driver leverages an under-exploited CPU feature, namely ARM memory domains [31], to efficiently realize s-pools and enforce shred-based access control. Unlike previously proposed thread-level memory isolation [62], our approach neither requires separate page tables nor causes additional page table switches or full TLB flushes. Our approach also avoids the need for a hypervisor or additional levels of address translation (e.g., nested paging). Although our reference design and implementation of s-pools are based on ARM CPUs, they are compatible with future x86 architectures, which will be equipped with a feature similar to memory domains [87, 116]. We implement S-compiler based on LLVM [133] and S-driver as a kernel module for Linux. We evaluate shreds' compatibility and ease of adoption by manually retrofitting shreds into 5 non-trivial open source software projects, including OpenSSL and Lighttpd. We show that developers can easily adopt shreds in their code without design-level changes or sacrificing functionality. Our evaluation shows that shreds incur an average end-to-end overhead of 4.67%. We also conduct a security analysis on shreds, confirming that the possible attacks allowed in our threat model are prevented. Overall, our results indicate that shreds can be easily adopted in real software for fine-grained protection of sensitive memory content while incurring very low overhead.


4.2 Design

4.2.0.1 Shred APIs and Usages

Application developers use shreds and s-pools via the following intuitive APIs:

err_t shred_enter(int pool_desc); err_t shred_exit(); void * spool_alloc(size_t size); void spool_free(void *ptr);

These APIs internally make requests to S-driver via ioctl for managing shreds and s-pools. To explain the API usage, we use the lightweight open-source web server Lighttpd as an example, where we employ shreds to protect the HTTP authentication password in Lighttpd's virtual memory. By wrapping the code that receives and checks the password in two shreds and storing the password in an s-pool, the modified Lighttpd prevents out-shred code, including third-party and injected code, from accessing the password in memory. Listings 4.1-4.3 show the code snippets that contain the modifications (lines marked with “+”). A successful call to shred_enter starts a shred execution on the current thread. It also causes a switch to a secure execution stack allocated in the s-pool, which prevents potential secret leaks via local variables after the shred exits. The thread is then given exclusive access to the associated s-pool, which is specified by the developer using the pool_desc parameter of shred_enter. Our design allows developers to associate an s-pool with multiple shreds by using the same descriptor at shred creation (e.g., an encryption shred and a decryption shred may need to share the same s-pool storing keys). The two shreds in Lighttpd, created on Line 9 in Listing 4.1 and Line 3 in Listing 4.3, share the same s-pool. However, as a security restriction, shreds in different compilation units cannot share s-pools. Therefore, even if shreds from different origins happen to use the same descriptor value, their s-pools are kept separate. The shred_exit API stops the calling shred, revokes the current thread's access to the s-pool, and restores the original execution stack. It is called immediately after a self-contained operation or computation on the s-pool finishes, as shown on Line 22 in Listing 4.1 and Line 8 in Listing 4.3. The shred_enter and shred_exit APIs must be used in pairs without nesting.
To facilitate verification, an enter-exit pair must be called inside the same function. In principle, a shred should contain a minimal body of code that corresponds to a single indivisible task requiring access to an s-pool. In the example, since Lighttpd separates the parsing and processing of HTTP requests, we naturally used two small shreds, rather than one big shred, to respectively read the password from the network and check whether the hash of the password matches the local hash.

To allocate memory from its associated s-pool, in-shred code calls spool_alloc, in the same way as it would use libc's malloc. Similar to regular heap-backed memory regions, buffers allocated in s-pools are persistent and do not change as code execution enters or exits shreds. They are erased and reclaimed by S-driver when in-shred code calls spool_free. In the Lighttpd example, an s-pool named AUTH_PASSWD_POOL is used for storing the password that the server receives via HTTP authentication requests. The password enters the s-pool immediately after being read from the network stream and stays there until it is erased at the end of its lifecycle.

 1 int http_request_parse(server *srv,
 2                        connection *con) {
 3     ...
 4     /* inside the request parsing loop */
 5     char *cur; /* current parsing offset */
 6 +   char auth_str[] = "Authorization";
 7 +   int auth_str_len = strlen(auth_str);
 8 +   if (strncmp(cur, auth_str, auth_str_len) == 0) {
 9 +     shred_enter(AUTH_PASSWD_POOL);
10 +     /* object holding passwd in spool */
11 +     data_string *ds = s_ds_init();
12 +     int pw_len = get_passwd_length(cur);
13 +     cur += auth_str_len + 1;
14 +     buffer_copy_string_len(ds->key, auth_str, auth_str_len);
15 +     buffer_copy_string_len(ds->value, cur, pw_len);
16 +     /* add ds to header pointer array */
17 +     array_insert_unique(parsed_headers, ds);
18 +     /* only related shreds can deref ds */
19 +     /* wipe out passwd from input stream */
20 +     memset(cur, 0, pw_len);
21 +     cur += pw_len;
22 +     shred_exit();
23 +   }
24     ...
25 }

Listing 4.1: lighttpd/src/request.c – The HTTP request parser specially handles the AUTH request inside a shred: it allocates a data_string object in the s-pool (Line 11), copies the input password from the network stream to the object (Lines 12-15), saves the object pointer to the array of parsed headers (Line 17), and finally erases the password from the input buffer before exiting the shred.


 1 ...
 2 /* inside HTTP auth module */
 3 + shred_enter(AUTH_PASSWD_POOL);
 4   /* ds points passwd obj in spool */
 5   http_authorization = ds->value->ptr;
 6   ... // hash passwd and compare with local copy
 7 + s_ds_free(ds);
 8 + shred_exit();
 9   ...

Listing 4.2: lighttpd/src/mod_auth.c – When the authentication module receives the parsed headers, it enters a shred associated to the same s-pool as the parser shred. It retrieves the password by dereferencing ds, as if the password resided in a regular memory region (Line 5).

 1 /* called inside a shred */
 2 data_string *s_ds_init(void) {
 3   data_string *ds;
 4 + ds = spool_alloc(sizeof(*ds));
 5 + ds->key = spool_alloc(sizeof(buffer));
 6 + ds->value = spool_alloc(sizeof(buffer));
 7   ...
 8   return ds;
 9 }
10
11 /* called inside a shred */
12 void s_ds_free(data_string *ds) {
13   ...
14 + spool_free(ds->key);
15 + spool_free(ds->value);
16 + spool_free(ds);
17   return;
18 }

Listing 4.3: lighttpd/src/data_string.c – We added s-pool support to the data_string type in Lighttpd, which allows the HTTP parser to save the AUTH password, among other things, in s-pools and erase it when needed.

4.2.0.2 Security Properties

Shreds’ security is guaranteed by three properties:

• P1 - Exclusive access to s-pool: An s-pool is solely accessible to its associated shreds. Other shreds or threads, even when running concurrently with the associated shreds, cannot access the s-pool.

• P2 - Non-leaky entry and exit: Data loaded into s-pools cannot have copies elsewhere in memory or be exported without sanitization.

• P3 - Untampered execution: Shred execution cannot be altered or diverted outside of the shred.

P1 provides the core protection of a shred's sensitive memory against other unrelated shreds or out-shred code running in the same address space. P2 avoids secret leaks when data are being loaded into or exported out of s-pools (e.g., ensuring that no secret is buffered in unprotected memory as a result of standard I/O). P3 prevents in-process malicious code from manipulating shreds' control flow. Such manipulation could, for instance, mount a ROP attack that forces a shred to execute out-shred code and expose its s-pool. Next, we explain how we design S-compiler and S-driver together to ensure these properties.

4.2.0.3 S-compiler: automatic toolchain for shred verification and instrumentation

Developers use S-compiler to build programs that use shreds. In addition to regular compilation, S-compiler performs a series of analyses and instrumentations to verify a program's use of shreds and to prepare the executable so that S-driver can enforce the security properties (P1-P3) at runtime. In addition, S-compiler checks that code included in a shred follows two rules. First, it cannot copy data from an s-pool to unprotected memory without applying a transformation (e.g., encryption). This rule prevents unexpected secret leaks from s-pools and is needed for achieving P2. Second, in-shred code can only use libraries built with S-compiler. This rule allows all code inside shreds to be checked and instrumented for P3. Unlike general-purpose program analysis, S-compiler's analysis is mostly scoped within the code involved in shred executions, and can therefore afford to favor accuracy over scalability. Prior to the analysis and transformation, S-compiler translates the input program into an intermediate representation (IR) in static single assignment (SSA) form.

Checking shred usage: To verify that all shreds in the program are properly closed, S-compiler first identifies all shred creation sites (i.e., calls to shred_enter), uses them as analysis entry points, and constructs a context-sensitive control flow graph for each shred. S-compiler then performs a code path exploration on each graph in search of any unclosed shred (or unpaired use of shred_enter and shred_exit), which developers are asked to fix. This check is sound because the analysis is intra-procedural (a pair of shred enter and exit APIs must be called inside the same function) and it conservatively models indirect jumps.
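The pairing rule can be illustrated with a toy intra-procedural checker over a function's sequence of shred API calls (an illustration only, not S-compiler's actual implementation; the type and function names are ours):

```c
#include <stdbool.h>

/* Toy illustration of the pairing rule S-compiler verifies: within one
 * function, shred_enter/shred_exit must form non-nested pairs, and every
 * shred must be closed before the function ends. */
typedef enum { ENTER, EXIT } shred_event;

static bool shreds_properly_paired(const shred_event *ev, int n)
{
    bool in_shred = false;
    for (int i = 0; i < n; i++) {
        if (ev[i] == ENTER) {
            if (in_shred) return false;   /* nested shred_enter: reject */
            in_shred = true;
        } else {
            if (!in_shred) return false;  /* unmatched shred_exit: reject */
            in_shred = false;
        }
    }
    return !in_shred;                     /* an unclosed shred is an error */
}
```

The real check explores code paths in a control flow graph rather than a linear event sequence, but the accept/reject criterion is the same.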


To prevent potential secret leaks, S-compiler performs an inter-procedural data-flow analysis in each shred. Potential leaks happen when sensitive data in the s-pool are propagated to unprotected memory. To detect them, the data-flow analysis checks for any unsanitized data propagation from an s-pool object to a regular heap destination. Thanks to the explicit memory allocations and aliasing in s-pools, the data-flow analysis needs neither manually defined sources and sinks nor heuristic points-to analysis. In addition, this analysis strikes a balance between security and usability: it captures the common forms of secret leaks (e.g., those resulting from bugs) while permitting intentional data exports (e.g., saving encrypted secrets). Buffered I/O, when used for loading or storing s-pool data, may implicitly leak the data to pre-allocated buffers outside of s-pools, which the data-flow analysis can hardly detect. Therefore, S-compiler replaces any buffered I/O (e.g., fopen) with direct I/O (e.g., open) in shreds.
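A minimal sketch of this propagation model (all helper names are hypothetical, not S-compiler's): a value loaded from an s-pool carries a SECRET tag, a value-conserving copy keeps the tag, a sanitizing transform such as encryption clears it, and only PUBLIC values may reach unprotected memory:

```c
#include <stdbool.h>

/* Toy model of the leak check's transfer functions. Tags stand in for
 * the abstract values the data-flow analysis tracks. */
typedef enum { PUBLIC, SECRET } taint;

static taint load_from_spool(void)      { return SECRET; }   /* source */
static taint copy(taint v)              { return v; }         /* value-conserving */
static taint encrypt(taint v)           { (void)v; return PUBLIC; } /* sanitizer */
static bool  may_store_outside(taint v) { return v == PUBLIC; }     /* sink rule */
```

A flow such as `may_store_outside(copy(load_from_spool()))` is rejected, while `may_store_outside(encrypt(load_from_spool()))` is permitted, mirroring the bug-vs-intentional-export distinction above.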

Hardening in-shred control flows: We adopt a customized form of control-flow integrity (CFI) to ensure that in-process malicious code cannot hijack any shred execution. To that end, S-compiler hardens in-shred code during compilation. Based on the control flow graphs constructed in the previous step, S-compiler identifies all dynamic control flow transfers inside each shred, including indirect jumps and calls as well as returns. It then instruments these control flow transfers so that they can only target basic block entrances within the containing shred. This slightly coarse-grained CFI does not incur overhead as high as fine-grained CFI, and at the same time is sufficiently secure for our use: it prevents shred execution from being diverted to out-shred code. Furthermore, since shreds are usually small in code size (i.e., they contain few ROP gadgets) and our CFI only allows basic-block-aligned control transfers, the chance of in-shred ROP is practically negligible. The control flow hardening only applies to in-shred code. If a function is called both inside and outside of a shred, S-compiler duplicates the function and instruments the duplicate for in-shred use while keeping the original unchanged for out-shred use. S-compiler creates new symbols for such duplicates and replaces the in-shred call targets with the new symbols. As a result, a function can be used inside shreds and instrumented without affecting out-shred invocations. Using function duplicates also allows S-compiler to arrange the code reachable in a shred in adjacent memory pages, which facilitates the enforcement of the control flow instrumentation and improves code cache locality.
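The check inserted before each dynamic transfer can be illustrated with a toy version (a sketch; real instrumentation operates on code addresses and an efficient encoding of block entries, and cfi_check is our own name):

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy in-shred CFI check: an indirect branch may only target a basic
 * block entry that lies inside the containing shred's code range. */
static bool cfi_check(uintptr_t target, const uintptr_t *bb_entries,
                      size_t n, uintptr_t shred_lo, uintptr_t shred_hi)
{
    if (target < shred_lo || target >= shred_hi)
        return false;                   /* out-shred target: deny */
    for (size_t i = 0; i < n; i++)
        if (bb_entries[i] == target)    /* must be a block entrance */
            return true;
    return false;                       /* mid-block target: deny */
}
```

Both rejection cases matter: the range check stops diversion to out-shred code, and the entry check stops unaligned, gadget-style jumps within the shred.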

Binding shreds and s-pools: Developers define a constant integer as the pool descriptor for each s-pool they need. To associate an s-pool with a shred, they use the constant descriptor as the pool_desc parameter when calling shred_enter. This simple way of creating the association is

intuitive and allows explicit sharing of an s-pool among multiple shreds. However, if not protected, it may be abused by in-process malicious code (e.g., creating a shred with an association to an arbitrary s-pool). S-compiler prevents such abuse by statically binding shreds to their s-pools. It first infers the pool-shred association by performing constant folding on the pool_desc used in each shred_enter invocation. It then records the associations in a special section (.shred) in the resulting executable, to which S-driver refers at runtime when deciding whether a shred (identified by its relative offset in memory) indeed has access to a requested s-pool. Thanks to the static binding, dynamically forged pool-shred associations are prevented, as is s-pool sharing across different compilation units.
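One possible, simplified shape of a .shred record and the runtime lookup (the struct fields and function name are our illustration, not the actual section format):

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record in the .shred section: the static binding between
 * a shred (identified by its relative code offset) and the s-pool it may
 * access. Pools are never shared across compilation units. */
struct shred_binding {
    unsigned long shred_offset; /* shred's relative offset in the image */
    int pool_desc;              /* descriptor passed to shred_enter     */
    int comp_unit;              /* compilation unit that owns the pool  */
};

/* Check S-driver could perform when a shred_enter request arrives. */
static bool binding_allows(const struct shred_binding *tab, size_t n,
                           unsigned long offset, int desc, int unit)
{
    for (size_t i = 0; i < n; i++)
        if (tab[i].shred_offset == offset &&
            tab[i].pool_desc == desc && tab[i].comp_unit == unit)
            return true;
    return false; /* forged or cross-unit association: deny */
}
```

Because the table is emitted at compile time and write-protected at runtime, malicious code cannot add a binding for an arbitrary s-pool.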

4.2.0.4 S-driver: OS-level manager for shreds and s-pools

S-driver is a dynamically loadable kernel extension. It can be easily installed on a system as a regular driver. S-driver provides the OS-level support and protection for shreds and s-pools.

ARM memory domains: S-driver leverages a widely available yet rarely used ARM CPU feature, namely the memory domain mechanism, to realize s-pools, i.e., to create specially protected memory regions inside a single virtual memory space. At the same time, our design is not specific to ARM and can realize s-pools using a mechanism similar to memory domains in future Intel CPUs [87, 116]. On ARM platforms, domains are a primary yet lesser-known memory access control mechanism, independent of the widely used paging-based access control. A memory domain represents a collection of virtual memory regions. By setting a 4-bit flag in a Page Directory Entry (PDE), the OS assigns the memory region described by the PDE to one of the 16 (2^4) domains supported by the CPU. Since each PDE has its own domain flag, the regions constituting a domain do not have to be adjacent. Upon each memory access, the hardware Memory Management Unit (MMU) determines the domain to which the requested memory address belongs and then decides if the access should be allowed, based on the current access level for that domain. The access level for each domain is recorded in the per-core Domain Access Control Register (DACR) [15], and therefore can be individually configured for each CPU core.
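The DACR encoding can be sketched with a small helper (based on the ARM architecture's 2-bits-per-domain layout; dacr_set is our own name, not a kernel API):

```c
#include <stdint.h>

/* Sketch of the ARM DACR encoding: a 32-bit register holding a 2-bit
 * access field for each of the 16 domains. Per the architecture:
 * 0b00 = No access, 0b01 = Client (page permissions still apply),
 * 0b11 = Manager (page permissions bypassed). */
#define DACR_NOACCESS 0u
#define DACR_CLIENT   1u
#define DACR_MANAGER  3u

static uint32_t dacr_set(uint32_t dacr, unsigned domain, uint32_t access)
{
    dacr &= ~(3u << (2 * domain));          /* clear the domain's 2-bit field */
    return dacr | (access << (2 * domain)); /* install the new access level  */
}
```

The per-CPU setup in Figure 4.3 corresponds to giving each core client access to exactly one shred domain and no access to the others.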

Creation and management of s-pools: Although memory domains are ideal building blocks for s-pools thanks to their efficient hardware-enforced access control, memory domains were not originally designed for this purpose and cannot directly enable s-pools, due to two limitations. First, only a total of 16 memory domains is available. If one domain were naively dedicated to each s-pool,


the limited domains will soon run out as the number of s-pools used in a program increases. Second, the access control on memory domains is very basic and does not concern the subject of an access (i.e., who initiates the access). However, access control for s-pools must recognize subjects at the granularity of shreds. S-driver overcomes both limitations of memory domains by multiplexing the limited domains and introducing shred identities into the access control logic.

S-driver uses the limited domains to support as many s-pools as an application may need. Rather than permanently assigning an s-pool to a domain, S-driver uses domains as temporary and rotating security identities for s-pools in an on-demand fashion. Specifically, it uses a total of k = min(N_dom - 1, N_cpu) domains, where N_dom is the number of available domains and N_cpu is the number of CPUs (or cores) on the system. The first k domains are reserved for the first k CPUs.

Figure 4.3: The DACR setup for a quad-core system, where k = 4. The first 3 domains (Dom0-Dom2) are reserved by Linux. Each core has a designated domain (Dom3-Dom6) that it may access when executing a shred. No CPU can access Dom7.

Figure 4.4: A shred's transition of states.

S-driver sets the per-CPU DACR such that Dom_i is only accessible to shreds running on CPU_i, for the first k CPUs, while Dom_{k+1} is inaccessible to any CPU in user mode. Figure 4.3 shows an example DACR setup. S-driver uses the k CPUs and the k+1 domains for executing shreds and protecting s-pools.

When a shred starts or resumes its execution on CPU_i, S-driver assigns its associated s-pool to Dom_i; the shred can therefore freely access its s-pool while other concurrent threads, if any, cannot. When the shred terminates or is preempted, S-driver assigns its s-pool to Dom_{k+1}, which prevents any access to the pool from that moment on. As a result, S-driver allows or denies access to s-pools on a per-CPU basis, depending on whether an associated shred occupies the CPU. Even if malicious code manages to run concurrently alongside the shred inside the same process on another CPU, it cannot access the shred's s-pool without triggering domain faults. Thus, P1 is achieved.

Switching s-pools to different domains upon shred entries and exits is reasonably efficient. These operations do not involve the heavy page table switches that process- or VM-based solutions require; they only need a shallow walk through the first-level page table and updates to the PDEs pointing to the s-pools in question. Besides, they do not trigger full TLB flushes, because our design uses the per-address TLB eviction interface (flush_tlb_page) and only invalidates the TLB entries related to the updated PDEs. To further reduce the overhead, we invented a technique called lazy domain adjustment: when a shred is leaving CPU_i, without adjusting any domain assignment, S-driver quickly changes the DACR to revoke the CPU's access to Dom_i and lets the CPU's execution continue. It does not assign the s-pool used by the previous shred to Dom_{k+1} until a domain fault happens (i.e., another shred comes to the CPU and accesses its s-pool). The lazy domain adjustment avoids unnecessary domain changes and halves the already small overhead in some test cases. Figure 4.4 shows how S-driver orchestrates the transitions of a shred's states in response to the API calls, context switches, and domain faults. Each state is defined by a combination of four properties:

• Shred = {In-shred | Out-shred}: whether the shred has started or exited.

• DACR = {Allow | Deny}: whether the DACR allows or denies the current CPU access to its domain.

• SPOOL = {Lock | Unlock}: whether the associated s-pool is locked.

• CPU = {On-CPU | Off-CPU}: whether the shred is running on a CPU.

The transition starts from the top-left circle, when the shred has not started and its s-pool is locked. After shred_enter is called, S-driver starts the shred, but it will not adjust the DACR or the s-pool access until a domain fault or a spool_alloc call occurs, due to the lazy domain adjustment in effect. When a context switch happens in the middle of the shred execution with unlocked DACR and s-pool, S-driver instantly sets the DACR to Deny but (safely) leaves the s-pool open. Later on, if a domain fault occurs, S-driver locks the previous s-pool, because the fault means that the code currently running on the CPU is in-shred and is trying to access its own s-pool. If a domain fault never occurs before the shred regains the CPU, S-driver does not need to change any domain or s-pool settings, in which case the lazy domain adjustment saves two relatively heavy s-pool locking and unlocking operations.
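The transitions described above can be modeled as a toy state machine (a simplified, single-shred, single-CPU view; the event and field names are ours, and real state also spans other CPUs' s-pools):

```c
#include <stdbool.h>

/* Toy model of the lazy-domain-adjustment state machine. */
typedef struct {
    bool in_shred;   /* Shred = In-shred / Out-shred */
    bool dacr_allow; /* DACR  = Allow / Deny         */
    bool unlocked;   /* SPOOL = Unlock / Lock        */
    bool on_cpu;     /* CPU   = On-CPU / Off-CPU     */
} shred_state;

typedef enum { EV_ENTER, EV_FAULT, EV_SWITCH_OUT, EV_SWITCH_IN, EV_EXIT } event;

static void step(shred_state *s, event e)
{
    switch (e) {
    case EV_ENTER:      /* start the shred; DACR/s-pool untouched (lazy) */
        s->in_shred = true; break;
    case EV_FAULT:      /* first in-shred access: open the s-pool */
        if (s->in_shred && s->on_cpu) { s->dacr_allow = true; s->unlocked = true; }
        break;
    case EV_SWITCH_OUT: /* deny DACR instantly, leave the s-pool open */
        s->on_cpu = false; s->dacr_allow = false; break;
    case EV_SWITCH_IN:  /* nothing changes until a fault, if any */
        s->on_cpu = true; break;
    case EV_EXIT:       /* close the shred and revoke all access */
        s->in_shred = false; s->dacr_allow = false; s->unlocked = false; break;
    }
}
```

Running the sequence enter, fault, switch-out, switch-in, fault, exit reproduces the path through Figure 4.4 in which a preemption only costs a DACR reset and a later fault-driven restore.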

Secure stacks for shreds: Although S-compiler forbids unsanitized data flows from s-pools to unprotected memory regions, it has to allow in-shred code to copy s-pool data to local variables, which would be located in the regular stack and potentially accessible to in-process malicious code. To prevent secret leaks via stacks, S-driver creates a secure stack for each shred, allocated from its associated s-pool. When code execution enters a shred, S-driver transparently switches the stack without the application’s knowledge: it copies the current stack frame to the secure stack and then overwrites the stack pointer. When the shred exits or encounters a signal to be handled outside of the shred, S-driver restores the regular stack. As a result, local variables used by shreds never exist in regular stacks, and therefore cannot leak secrets.

Runtime protection of shreds: In addition to enabling and securing shreds and s-pools, S-driver also protects the inline reference monitor (IRM) that S-compiler plants in shred code. S-driver write-protects the memory pages containing the instrumented code and the associated data in memory. It also pins s-pool pages in memory to prevent leaks via memory swap. Given that our threat model assumes the existence of in-process adversaries, S-driver also mediates the system calls that malicious code in user space may use to overwrite the page protection, dump physical memory via /dev/*mem, disturb shreds via ptrace, or load untrusted kernel modules. For each program using shreds, S-driver starts this mediation before loading the program code, preempting any pre-existing malicious code.

S-driver's system call mediation also mitigates attacks that steal secret data, not directly from s-pools, but from the I/O media where the secret data are loaded or stored. For instance, instead of targeting the private key loaded in an s-pool, an in-process attacker may read the key file on disk. S-driver monitors file-open operations inside shreds. The first time a file F is accessed by a shred S, S-driver marks F as a shred-private file and only allows shreds that share the same s-pool with S to access F. This restriction is persistent and survives program and system reboots. As a result, an attacker can read F only if she manages to compromise the program during its first run and access F before a shred does. Although not completely preventing such attacks, S-driver makes them very difficult to succeed in practice. For a complete remedy, we envision a new primitive for in-shred code to encrypt and decrypt secret data with a persistent key assigned to each s-pool and automatically managed by S-driver; our current prototype, however, does not support this primitive.

It is worth noting that, although the system call mediation can stop user-space malicious code that tries to break shreds via the system interfaces, it is a more intrusive and less configurable design choice than well-known access control and capability frameworks such as SELinux, AppArmor, and Capsicum [201]. We leave the integration with those systems as future work, because the system call mediation is easy to implement and sufficient for prototyping purposes.

4.3 Implementation

We built S-compiler based on LLVM [133] and its C front-end Clang [13]. We built S-driver with Linux as the reference OS. The implemented system was deployed and evaluated on a quad-core ARM Cortex-A7 computer (Raspberry Pi 2 Model B running Linux 4.1.15).

S-compiler: The modular, pass-based architecture of LLVM allows us to take advantage of the existing analyzers and easily extend the compilation pipeline. S-compiler adds two new passes to LLVM: the shred analysis pass and the security instrumentation pass. Both operate on LLVM bitcode as the IR. The analysis pass carries out the checks on the usage and security properties of shreds, as described in § 4.2. We did not use LLVM's built-in data flow analysis for those checks due to its overly heuristic points-to analysis and its unnecessarily conservative transfer functions. Instead, we implemented our specialized data flow analysis based on the basic round-robin iterative algorithm, with weak context sensitivity and a straightforward propagation model (i.e., only tracking value-conserving propagators). We also had to extend LLVM's compilation pipeline, because it by default only supports intra-module passes while S-compiler needs to perform inter-module analysis. We employed link-time optimization (LTO), via its linker plugin, to cross-link the IR of all compilation modules and feed the linked IR to our analyzers. The instrumentation pass uses the LLVM IRBuilder interfaces to insert into the analyzed IR the security checks that are necessary for enforcing the in-shred control flow regulations and preventing dynamic data leaks.

S-driver: We built S-driver as a Loadable Kernel Module (LKM) for Linux. S-driver creates a virtual device file (/dev/shreds) to handle the ioctl requests made internally by the shred APIs. It uses 13 out of the 16 memory domains to protect s-pools, because recent versions of the Linux kernel for ARM already occupy 3 domains (for isolating device, kernel, and user-space memory). S-driver uses the available domains to protect an unlimited number of s-pools and controls each CPU's access to the domains as described in § 4.2. Since Linux does not provide callback interfaces for drivers to react to scheduling events, in order to safely handle context switches and signal dispatches in shreds, S-driver dynamically patches the OS scheduler so that, during every context switch, the DACR of the current CPU is reset, which locks the open s-pool, if any. The overhead of this operation is negligible because resetting the DACR takes a single lightweight instruction. To capture illegal access to s-pools and lazily adjust domain assignments, S-driver registers itself as the only handler of domain faults and is triggered whenever a domain violation happens. Algorithm 3 shows how S-driver handles a domain fault. Implementing S-driver purely as an LKM allows shreds to be introduced on a host without installing a custom-built kernel image.

Algorithm 3: Domain Fault Handler
  input : the faulting virtual address fault_addr
  result: recover from the domain fault, or kill the faulting thread

  /* Identity check */
  s_pool <- FindSpool(fault_addr);
  s_owner <- GetOwner(s_pool);
  if fault_thread is NOT in shred then goto bad_area;
  if fault_thread is NOT s_owner then goto bad_area;
  /* Recover from domain fault */
  cpu_domain <- GetCPUDomain();
  s_pool_domain <- GetSpoolDomain(s_pool);
  if s_pool is unlocked then
      if cpu_domain == s_pool_domain then
          /* no need to change domain for s_pool */
          RestoreDACR();
      else
          AdjustSPool(cpu_domain);
  else
      UnlockSPool(cpu_domain);
  LockOtherActiveSPools(s_pool);

Table 4.1: Five open-source applications used in the evaluation

  Program    Executable     Size (bytes)   Category               Protected Data   Size (KLOC)
  curl       curl           227071         http client            password         177
  minizip    miniunz        80572          file compression tool  password         7
             minizip        97749
  openssh    ssh            2207588        remote login tool      credential       130
  openssl    libcrypto.so   3093920        crypto library         crypto key       526
  lighttpd   mod_auth.so    85135          web server             credential       56

4.4 Evaluation

Our evaluation sought to answer the following questions:

• How compatible and useful are shreds to real-world programs?

• How do shreds affect the application's and the system's performance?

• How do shreds help mitigate in-process memory abuse?

Choice of Applications: We selected five popular open-source applications to evaluate our prototype system. The applications, shown in Table 4.1, range from the small HTTP server lighttpd to the complex cryptography library OpenSSL. They were chosen because each has at least one piece of sensitive data that is subject to in-process abuse and therefore warrants shreds' protection. Moreover, they represent a good variety of software functionalities and codebase sizes.


Figure 4.5: The time and space overhead incurred by S-compiler during the offline compilation and instrumentation phase.

Figure 4.6: The time needed for a context switch when: (1) a shred-active thread is switched off, (2) a regular thread is switched off with no process or address space change, and (3) a regular thread is switched off and a thread from a different process is scheduled on.

Compilation Tests: To test the performance and compatibility of our offline analysis and compilation methods, we instrumented S-compiler to measure the overhead, and to log potential errors, while building the five applications with shreds. Figure 4.5 shows the time and space overhead introduced by S-compiler, relative to a vanilla LLVM Clang compiling the unchanged applications. On average, S-compiler slows the build process by 24.58% and increases executable sizes by 7.37%. The seemingly significant compilation delays are in fact on par with those of static analysis and program instrumentation tools of similar scale. They are generally tolerable because compilation takes place offline in the background and is usually not time-critical. The executable size increases result mainly from the in-shred instrumentation and are below 2% except for the outliers. We encountered no errors when building these applications with S-compiler, and the built applications ran without issues during the course of the tests.

Performance Tests: This group of tests examines the runtime performance of shreds and s-pools. We performed both micro-benchmarks and end-to-end tests, which respectively reveal the performance cost of shreds' critical operations and the overhead exhibited in the applications retrofitted with shreds.


In the micro-benchmarking tests, we developed unit test programs that force shreds to go through the critical operations and state changes, including shred entry, exit, and context switch. We measured the duration of these operations and state changes, and compared them with the durations of equivalent or related operations without shreds. Figure 4.6 shows the absolute time needed for a context switch that preempts a shred-active thread, a regular thread, and a regular process, respectively. Switching shred-active threads is marginally more expensive than switching regular threads (about 100 µs slower), and much faster than a process context switch. This is because when a shred is preempted, S-driver does not need to change page tables or the TLB; it only performs a single, very lightweight DACR reset.

We also compared the time needed to complete the shred API calls (which invoke ioctl internally) with several reference system calls, as shown in Figure 4.7. getpid, one of the fastest system calls, serves as the baseline. The shred_enter API is compared with the clone system call (without address space change) and is slightly faster, which means creating a shred takes less time than creating a thread. The s-pool allocation API is mildly slower than mmap due to the additional domain configuration, but the overhead is low enough to blend into typical program performance fluctuations.

Furthermore, we measured the performance improvement enabled by the lazy domain adjustment optimization. We applied shreds to five SPEC CINT2006 benchmark programs written in C (Figure 4.8), where a number of shreds were created to perform intensive access to s-pools. We note that this test is designed only for performance evaluation; these benchmark programs do not need shreds' protection.
The results show that in all but one case the optimization brings the overhead under 1%, whereas the non-optimized implementation of shreds incurs an average overhead of 2.5%. Together, these micro-benchmarks indicate that the shred primitives are lightweight and that the performance impact of shred state changes and s-pool operations on the application or the system is very mild.

In the end-to-end tests, we let each of the five open-source applications perform a self-contained task twice, with and without using shreds to protect its secret data (e.g., Lighttpd fully handling an HTTP auth login, and OpenSSL carrying out a complete RSA key verification). We instrumented the applications with timers. For each application, we manually drove it to perform the task, which fully exercises the added shreds. We measured both the time and space costs associated with using shreds in these tests. The absolute costs and the relative increases are shown in Table 4.2.


Figure 4.7: Invocation time of shred APIs and reference system calls (the right-most two bars are on log scale). It shows that shred entry is faster than thread creation, and s-pool allocation is slightly slower than basic memory mapping.

Figure 4.8: Five SPEC CINT2006 benchmark programs tested when: (1) no shred is used, (2) shreds are used but without the lazy domain adjustment turned on in S-driver, and (3) shreds are used with the lazy domain adjustment.


Table 4.2: End-to-end overhead observed while the tested programs perform a complete task: the left part of the table shows the execution time and the right part shows the memory footprint (max RSS).

              End-to-end time                     Memory footprint (max RSS)
              w/o shred   w/ shred    time        w/o shred   w/ shred    size
              (ms)        (ms)        increase    (KB)        (KB)        increase
  curl        154         163         5.80%       4520        5104        12.90%
  minizip     23770       25650       7.90%       3004        3064        1.90%
  openssh     158.1       163.3       3.20%       3908        4644        18.80%
  openssl     2502        2546        1.75%       3892        3908        0.40%
  lighttpd    501         525         4.70%       3364        3440        2.30%
  Avg.                                4.67%                               7.26%

On average, the per-task slowdown among the applications is 4.67% and the memory footprint increase is 7.26%. The results show that shreds are practical for real applications of various sizes and functionalities. The overhead is hardly noticeable to the end users of the applications.

Security Coverage Test: Finally, we tested the coverage of shred protection in the modified applications. These tests not only check whether the shred adoption is correct and complete in these applications, but also demonstrate the security benefits uniquely enabled by shreds. We conducted these tests using a simple memory scraper that scans each application's virtual memory in search of the known secrets. The tests simulate the most powerful in-process abuse, where an adversary has full visibility into the user-space virtual memory of the application and can perform a brute-force search for secrets. For each application, our memory scraper runs as an independent thread inside the application and verifies whether any instance of the secret data can be found in memory via a value-based exhaustive search. We ran this test in two rounds, one on a vanilla version of the application and the other on the shred-enabled version. In the first round, where shreds are not used, the memory scraper found at least one instance of the secret values in memory for all the applications, which means that these secrets are subject to in-process abuse. In the second round, where shreds are used, the memory scraper failed to detect any secret matches in the applications' memory, which means that the secrets are well contained inside the s-pools and protected from in-process abuse. The results show that the applications have correctly adopted shreds for processing the secret data in memory and stored such data only in s-pools. Moreover, the tests show that, without significant design changes, applying the shred primitives in these real applications creates needed protection for the otherwise vulnerable passwords, crypto keys, and user credentials.


4.5 Limitations and Discussion

Coarse-grained Control Flow Integrity: To make it harder for attackers to exploit in-shred code in case there is a vulnerability, the s-compiler mounts a coarse-grained CFI. The rationale is that in-shred code is typically much smaller than the whole program and thus fewer gadgets will be available. However, this assumption might not hold if a developer creates a shred that happens to contain enough gadgets. Moreover, numerous studies [72,101,103] have shown that coarse-grained and even type-based CFI is bypassable. This creates a trade-off between overhead and security.

Incompatible ABI of Stack Switch: To protect stack variables containing sensitive information, a shred scratches volatile registers and swaps the stack to a secure stack allocated in the corresponding s-pool when entering the shred. However, this can create confusion if the developer does not pay attention to variable usage inside and outside of the shred (e.g., modifying out-of-shred stack variables). In the following code snippet, the variable greeting is assigned inside the shred although it is allocated on the non-secure stack. After exiting the shred, its content stays uninitialized, which is counter-intuitive, although it does not hurt security.

bool CheckGreeting() {
    char* greeting;
    CHECK_OK(shred_enter(pool));
    greeting = "hello"; // modifying a local variable declared outside the shred
    // other operations...
    CHECK_OK(shred_exit());
    return strcmp("hello", greeting);
}

A cleaner way to handle this issue is to change the interface for using shreds, namely, to allow entering a shred only via a procedure call, passing a function pointer to the shred_enter API. However, this would limit the granularity of shreds to the function level.

Part II

Offline Software Testing To Find Memory Corruption Bugs

Fuzzing has emerged as one of the most effective testing techniques for discovering security vulnerabilities and reliability issues in software. The idea behind fuzzing is simple: the fuzzer executes programs with randomly generated inputs and monitors their behavior for invalid operations, such as memory corruption issues. Recent advancements in fuzzing technologies, such as coverage-guided fuzzing [134,140], have enabled fuzzing to reach even deeper program paths and uncover significantly more bugs. The success of fuzzing has led to significant adoption in industry, and to the emergence of services providing continuous fuzzing for open-source and commercial software. For example, Google has developed continuous fuzzing infrastructures to test the security of C/C++ libraries, both for its internal software and externally for open-source code. Google's ClusterFuzz project, through its OSS-Fuzz [34,47] instance, has alone reported tens of thousands of bugs to developers by fuzzing over 200 open-source projects. C/C++ code is a primary target for fuzzing due to unsafe language features, such as explicit memory management, that make it prone to bugs and vulnerabilities. To detect such bugs, the fuzzed programs are usually instrumented with checks (e.g., ASAN [175]) that can expose memory corruption issues and other undefined behavior in C/C++ code. Several classes of these bugs, such as buffer overflows, use-after-frees, integer overflows, and uses of uninitialized memory, are often exploitable security vulnerabilities. Another widely used test-generation technique is concolic execution [139,172]. There has been a trend to combine fuzzing and concolic execution to get the best of both worlds [122, 184, 208] (a.k.a. hybrid fuzzing). On one hand, fuzz testing quickly tests a program, but it hardly explores code regions guarded by complex conditions.
On the other hand, concolic execution excels at solving path conditions, but it frequently directs the execution into code branches containing a large number of execution paths (e.g., loops). Due to these shortcomings, using fuzz testing or concolic execution alone often ends with large amounts of untested code after exhausting the time budget. The goal of hybrid testing is to utilize fuzzing for path exploration and leverage concolic execution to solve hard-to-resolve conditions. A hybrid approach typically lets fuzz testing run as much as possible. When the fuzzer barely makes any progress, the hybrid controller switches to the concolic executor, which re-runs the seeds generated by fuzzing. During the run, the concolic executor checks each conditional branch to see whether its sibling branches remain untouched. If so, the concolic executor solves the constraints of the new branch and contributes a new seed for fuzzing. In general, this hybrid approach guides the fuzzer to new regions for deeper program-space exploration. In the second part of this thesis, I introduce two novel techniques that improve the state-of-the-art of hybrid testing to uncover deep security bugs.

Chapter 5

Bug-driven Hybrid Testing

5.1 Background and Motivation

This work is motivated by the limitations of hybrid testing in vulnerability detection. In this section, we first introduce the background of hybrid testing and then demonstrate the limitations by two examples.

5.1.1 Inefficiency of Existing Coverage-guided Hybrid Testing

Existing hybrid testing combines fuzz testing and concolic execution to achieve high code coverage. For ease of understanding, we use the example in Figure 5.1 to explain how it works. The explanation is based on Driller [184], since it has been the de facto implementation of hybrid testing. The example in Figure 5.1 is taken from tcpdump-4.9.2. Figure 5.1a shows the code — it first uses the link-layer type from the input to select a pcap handler and then uses the handler to dissect packets. Our objective is to test the entry function parse_pcap and reach the vulnerable function pcap_handler2. In the test, we assume hybrid testing starts with a seed that executes the path shown in Figure 5.1b. After that, the fuzzer mutates the seed to run a second path, shown in Figure 5.1c. It then, however, fails to synthesize inputs that match the packet type at line 20 and the link-layer type at line 10, due to the huge mutation space (2^32 possibilities). This situation prevents the fuzzer from testing the remaining code and makes hybrid testing switch to concolic execution. After executing the seed that covers the path in Figure 5.1b, the concolic executor backtracks to the branch statement at line 20. Solving the input packet_type to PACKET1 with an SMT solver, the

Listing: LLVM IR of the function bug() after SAVIOR's UBSan instrumentation; the !saviorBugNum metadata labels the potential integer overflow.

define void @bug(%struct.msg_ds* %msg) #0 {
entry:
  %msg.addr = alloca %struct.msg_ds*, align 8
  %sz = alloca i64, align 8
  %buf = alloca i8*, align 8
  store %struct.msg_ds* %msg, %struct.msg_ds** %msg.addr, align 8
  %0 = load %struct.msg_ds*, %struct.msg_ds** %msg.addr, align 8
  %msg_sz = getelementptr inbounds %struct.msg_ds, %struct.msg_ds* %0, i32 0, i32 4
  %1 = load i64, i64* %msg_sz, align 8
  %2 = call { i64, i1 } @llvm.uadd.with.overflow.i64(i64 24, i64 %1)
  %3 = extractvalue { i64, i1 } %2, 0
  %4 = extractvalue { i64, i1 } %2, 1
  %5 = xor i1 %4, true, !saviorBugNum !1        ; label of integer overflow
  br i1 %5, label %cont, label %handler.add_overflow, !prof !2, !saviorBugNum !1

handler.add_overflow:                           ; preds = %entry
  call void @__ubsan_handle_add_overflow(i8* bitcast ({ ... }* @1 to i8*), i64 24, i64 %1) #7, !saviorBugNum !1
  br label %cont, !saviorBugNum !1

cont:                                           ; preds = %handler.add_overflow, %entry
  store i64 %3, i64* %sz, align 8
  %6 = load i64, i64* %sz, align 8
  %call = call noalias i8* @malloc(i64 %6) #7
  store i8* %call, i8** %buf, align 8
  %7 = load i64, i64* %sz, align 8
  call void @llvm.memset.p0i8.i64(i8* %call, i8 0, i64 %7, i32 4, i1 false)
  ret void
}

CHAPTER 5. BUG-DRIVEN HYBRID TESTING

1  int parse_pcap(){
2      int link_type;
3      /* read link-layer type from input */
4      read(input_fd, &link_type, sizeof(int));
5      /* select a handler based on link_type */
6      if(link_type == LINKTYPE1){
7          pcap_handler1();
8          return 0;
9      }
10     if(link_type == LINKTYPE2){
11         pcap_handler2();
12         return 0;
13     }
14     ...
15     return -1;
16 }
17 int pcap_handler1(){
18     int packet_type;
19     read(input_fd, &packet_type, sizeof(int));
20     if(packet_type == PACKET1){
21         packet_handler1();
22         return 0;
23     }
24     ...
25     return -1;
26 }

executor generates a new seed to cover that branch. Then, the hybrid controller suspends the concolic execution and resumes the fuzzer. Guided by the new seed, the fuzzer tests packet_handler1 and switches back to concolic execution after that. This time, the concolic executor runs the seed following the path in Figure 5.1c. After solving the branch condition at line 10, it generates a seed for the flow from line 10 to line 11. Further fuzz testing can finally reach the vulnerable code in pcap_handler2. Note that the testing processes of different hybrid tools may vary from the above description. For instance, QSYM [208] keeps running concolic execution instead of invoking it in an interleaved manner. Despite those implementation differences, existing tools share a similar philosophy on scheduling the seeds to concolic execution. That is, they treat the seeds indiscriminately [184,208], presumably assuming that these seeds have equal potential in contributing to new coverage.

Figure 5.1: A demonstrative example of hybrid testing. (a) A simplified version of the packet-parsing code in tcpdump-4.9.2, in which pcap_handler2 contains vulnerabilities; (b) the path followed by a seed that matches LINKTYPE1 but mismatches PACKET1; (c) the path followed by a seed that matches neither LINKTYPE1 nor LINKTYPE2. The executions in (b) and (c) follow the red line and visit the grey boxes; the white boxes connected by dotted lines are non-covered code.

5.1.2 Motivation

Inefficiency in Covering Vulnerable Code: Although hybrid testing specializes in coverage-driven testing, it still needs substantial time to saturate hard-to-reach code compartments, which often overspends the time budget. To discover more vulnerabilities in a limited time frame, an intuitive

Listing: ahcp_print from tcpdump, where two blocking conditions guard ahcp1_body_print, which dominates many basic blocks.

/* len is propagated from a field in the input */
void ahcp_print( ... u_char *cp, const u_int len){
    uint8_t version;
    ...
    version = EXTRACT_U_1(cp);
    cp += 1;
    switch (version) {
    ...
    /* blocking condition 1 */
    case AHCP_VERSION_1: {
        /* blocking condition 2 */
        if (len < AHCP1_HEADER_FIX_LEN)
            goto invalid;

        /* dominate XXX basic blocks */
        ahcp1_body_print(ndo, cp, ep);
        ...
        break;
    }
    default:
        ND_PRINT(...);
        break;
    }
    return;
invalid:
    ND_PRINT(...);
    ...
    return;
    ...
}

1  static bfd_boolean load_specific_debug_section(enum dwarf_section_display_enum debug, asection *sec, void *file){
2
3      dwarf_section *section = &debug_displays[debug].section;
4
5      if (section->start != NULL){
6          if (streq (...))
7              return TRUE;
8          free (section->start);
9      }
10     ...
11     /* section->size is copied from input */
12     section->size = bfd_get_section_size (sec);
13
14     /* setting section->size to 0xffffffffffffffff on 64-bit systems or 0xffffffff on
        32-bit systems, malloc will return a zero-byte buffer, leading to out-of-bound access */
15     section->start = malloc(section->size + 1);
16     ...
17 }

Figure 5.2: A demonstrative example of the limitation in finding defects by existing hybrid testing. This defect comes from objdump-2.29 [33].

Listing: the msg_ds example program whose function bug() appears, UBSan-instrumented, in the LLVM IR listing earlier in this chapter.

typedef struct msg_ds {
    char str[2];
    int magic, ver, secret_num;
} msg_t;

int main(){
    msg_t msg;
    read(STDIN, &msg, sizeof(msg_t));

    if(msg.ver == 0xFFFFAB)
        log_msg(msg);
    if(msg.str[0] != 'h')
        return -1;
    if(msg.str[1] != 'i')
        return -1;
    if(msg.magic != 0x12FF6EF)
        return -1;

    bug();
    return 0;
}

way is to prioritize the testing of vulnerable code. However, the current hybrid testing method introduced in Section 5.1.1 does not meet this requirement.

Consider the example in Figure 5.1, where concolic execution chronologically runs the seeds to explore the paths shown in Figure 5.1b and Figure 5.1c. This sequence indeed postpones the testing of the vulnerable function pcap_handler2. The delay can be significant, because concolic execution runs slowly and the fuzz testing on packet_handler1 may last a long time. In our experiments¹, DRILLER spends minutes on reaching pcap_handler2 with the aforementioned schedule; however, if it performs concolic execution first on the path in Figure 5.1c, the time reduces to seconds. Not surprisingly, such delays frequently happen in practice. As we will show in Section 6.4, on average this delays DRILLER and QSYM in covering vulnerabilities by 43.4% and 44.3%, respectively, leading to reduced efficiency in vulnerability finding.

Deficiency in Vulnerability Detection: Hybrid testing often fails to identify a vulnerability even if it approaches the vulnerable location along the right path. Figure 5.2 demonstrates an integer overflow in objdump-2.29. At line 12, the program copies a value from sec to section->size. Next, this value is used as the size of a memory allocation request at line 15. By carefully handcrafting the input, an adversary can make section->size take the value 2^32 − 1 on 32-bit systems or 2^64 − 1 on 64-bit systems. This wraps section->size + 1 around to 0 and makes malloc return a zero-byte buffer. When the buffer is further used, a segfault or a memory leak would occur. In this example, hybrid testing can quickly generate a seed to hit line 15. However, it could

¹ SAVIOR is customized to do this test since DRILLER cannot run on tcpdump. More details can be found in Section 6.4.

barely trigger the integer overflow. As the program enforces no constraints on the input bytes that propagate to section->size, hybrid testing can only do random mutation to synthesize the extreme value(s). Taking into account the tremendous possibility space (2^32 or 2^64), the mutation is unlikely to succeed.

5.2 Design

5.2.1 Core Techniques

The design of SAVIOR is bug-driven, aiming to find bugs faster and more thoroughly. We propose two techniques to achieve the goal: bug-driven prioritization and bug-guided verification. Below we present an overview of our techniques.

Bug-driven prioritization: Recall that classic hybrid testing blindly schedules the seeds for concolic execution, without weighing their bug-detecting potential. This can greatly defer the discovery of vulnerabilities. To remedy this limitation, SAVIOR collects information from the target source code to prioritize seeds that have a higher potential to trigger vulnerabilities. This approach, however, needs to predict the number of vulnerabilities that running concolic execution on a seed could expose. The prediction essentially depends on two prerequisites: R1 – a method to assess the code regions that become reachable after the concolic execution on a seed, and R2 – a metric to quantify the number of vulnerabilities in a chunk of code. SAVIOR fulfills them as follows. To meet R1, SAVIOR approximates the newly explorable code regions based on a combination of static and dynamic analysis. During compilation, SAVIOR statically computes the set of reachable basic blocks from each branch. At run-time, SAVIOR identifies the unexplored branches on the execution path of a seed and calculates the basic blocks that are reachable from those branches. We deem that these blocks become explorable code regions once the concolic executor runs that seed. To meet R2, SAVIOR utilizes UBSan [43] to annotate three types of potential bugs (as shown in Table 5.1) in the program under testing. It then counts the UBSan labels in each code region as the quantitative metric for R2. As UBSan's conservative instrumentation may generate dummy labels, SAVIOR incorporates a static filter to safely remove useless labels. We discuss the details of this method in Section 5.2.2.1. The above two solutions together ensure a sound analysis for identifying potential bugs. First, our static reachability analysis, as described in Section 5.2.2.1, is built upon a sound algorithm. It

77 CHAPTER 5. BUG-DRIVEN HYBRID TESTING

over-approximates all the code regions that may be reached from a branch. Moreover, UBSan adopts a conservative design, which counts all the operations that may lead to the undefined behavior issues listed in Table 5.1 [43, 96]. Facilitated by these two aspects of soundness, we can avoid mistakenly underrating the bug-detecting potential of a seed.

Following the two solutions, SAVIOR computes the importance score for each seed as follows. Given a seed with n unexplored branches {e1, e2, ..., en}, SAVIOR counts the UBSan labels in the code that is reachable from these branches, respectively denoted {L1, L2, ..., Ln}. Also note that, in the course of testing, SAVIOR has made {S1, S2, ..., Sn} attempts to solve those branches. With these pieces of information, SAVIOR evaluates the importance score of this seed as the weighted average (1/n) * Σ_{i=1..n} e^(-0.05*S_i) * L_i, where L_i represents the potential of the i-th unexplored branch. We penalize L_i with e^(-0.05*S_i) to monotonically decrease its weight as the attempts to solve this branch grow. The rationale is that more failed attempts (usually from multiple paths) indicate a low success probability of resolving the branch. Hence, we decrease its potential so that SAVIOR can gradually de-prioritize hard-to-solve branches. Lastly, SAVIOR takes the average score over the candidate branches in order to maximize the bug-detection gain per unit of time. To better understand this scoring method, we show an example and explain the score calculation in Figure 5.3.

Figure 5.3: An example showing how to estimate the bug-detecting potential of a seed. In this example, the seed follows the path b1 -> b2 -> b3 -> b4. Basic blocks b5 and b7 are unexplored; they can reach L1 and L2 UBSan labels and have been attempted by constraint solving S1 and S2 times, respectively. The final score for this seed is (e^(-0.05*S1) * L1 + e^(-0.05*S2) * L2) / 2.
This scoring method ensures that SAVIOR always prioritizes seeds leading to more unverified bugs, while in the long run it does not get trapped by seeds with hard-to-solve branch conditions. First, it conservatively assesses a given seed using the results of the sound reachability and bug-labeling analyses: a seed that leads to more unexplored branches, from which more unverified bugs are reachable, earns a higher score. Second, it takes into account runtime information to continuously improve the precision of the assessment. This online refinement is important because statically SAVIOR can hardly know whether a branch condition is satisfiable or not. Utilizing the history of constraint-solving attempts, SAVIOR can decide whether a seemingly high-score branch is worth more resources in the future. As shown by our evaluation in Section 6.4, this scoring scheme significantly accelerates the detection of UBSan violations, which empirically supports the effectiveness of our design. Referring to our motivating example in Figure 5.1, the function packet_handler1 has few UBSan labels while pcap_handler2 contains hundreds of labels. Hence, the seed following Figure 5.1b has a lower score than the seed which runs the path in Figure 5.1c. This guides SAVIOR to prioritize the latter seed, which can significantly expedite the exploration of vulnerable code.

Figure 5.4: Solving the integer overflow in Figure 5.2. The concolic executor tracks that section->size depends on four input bytes (e.g., \xfb\xfb\xf4\xf1) and solves the overflow condition section->size + 1 > 0xffffffff, yielding the witness bytes \xff\xff\xff\xff. This shows the case on a 32-bit system, but it applies to 64-bit as well.

Bug-guided verification: This technique ensures sound vulnerability detection on the explored paths that reach vulnerable sites. Given a seed from fuzz testing, SAVIOR executes it and extracts the label of each vulnerability along the execution path. After that, SAVIOR verifies the predicate implanted in each label by checking its satisfiability under the current path condition — if the predicate is satisfiable, then its corresponding vulnerability is valid. This enables SAVIOR to generate a proof of either the vulnerability or its non-existence along a specific program path. Note that in concolic execution, many new states with new branch constraints are created; SAVIOR prioritizes constraint solving for states that require bug-guided verification. Going back to the example in Figure 5.2, classic hybrid testing misses the integer overflow at line 15. In contrast, SAVIOR is able to identify it with bug-guided verification. Aided by the Clang sanitizer [43], SAVIOR instruments the potential overflows in a solver-friendly way (i.e., the predicate triggering this overflow is section->size + 1 > 0xffffffff). As demonstrated in Figure 5.4, following a seed to reach the integer overflow location, SAVIOR tracks that the value


Figure 5.5: System architecture of SAVIOR. The Clang/LLVM tool-chain performs the analysis and instrumentation and produces the AFL binary, the SAVIOR binary, and the KLEE bitcode (with label information); the fuzzer (AFL) performs coverage tests; the coordinator performs bug-driven prioritization; and the concolic executor (KLEE) performs constraint solving and bug-guided verification, exchanging seeds and test cases with the fuzzer.

of section->size relies on a four-byte field in the input. By solving the vulnerability predicate, SAVIOR generates a witness value 0xffffffff and triggers the vulnerability.

5.2.2 System Design

Figure 5.5 depicts the overall architecture of SAVIOR. It consists of a compiling tool-chain built upon Clang and LLVM, a fuzzer derived from AFL, a concolic executor ported from KLEE, and a hybrid coordinator responsible for the orchestration. We explain these components in detail in the following sections.

5.2.2.1 The Compilation Tool-chain

SAVIOR's compilation tool-chain serves multiple purposes, including vulnerability labeling, control-flow reachability analysis, and building the targets of the different components.

Sound Vulnerability Labeling: In our design, we use Clang's Undefined Behavior Sanitizer (UBSan) [43] to label different families of potential bugs². Table 5.1 summarizes the families used in SAVIOR and the operations pertaining to them. We ignore the other bug types listed in UBSan (e.g., misaligned reference) since they are less likely to cause security issues. For each inserted label, we patch the Clang front-end to attach a !saviorBugNum metadata node, aiding the reachability analysis that we will shortly discuss. As explained in Section 5.2.1, UBSan over-approximates the potential vulnerabilities. This approximation ensures soundness since it never misses true bugs. UBSan also models the conditional triggers of the labeled bugs, as shown in Table 5.1; e.g., an out-of-bound (OOB) array access happens when the index x is not between zero and the array size minus one. At the time of bug-guided verification,

² Clang supports enabling checks on each individual bug family.


SAVIOR solves each triggering condition to produce a witness of the bug or to prove that, because the condition is unsatisfiable, the bug can never happen on the current path. SAVIOR uses UBSan by default, but other labeling methods may also apply if they meet the following two properties. First, they can comprehensively annotate the potential vulnerabilities. Second, they can synthesize the triggering condition of each labeled vulnerability. Note that such a condition must have a data dependency on the program input; otherwise, our concolic execution cannot correlate the input with the vulnerable conditions and hence has no guidance for bug-guided verification. For instance, AddressSanitizer [175] builds checks upon the status of its own redzones, which is not applicable to SAVIOR at the moment. UBSan's conservative approximation inevitably introduces false positives and might mislead SAVIOR's prioritization. In practice, we incorporate a static countermeasure to reduce fake labels. Specifically, we trim a label when all of the following requirements hold: 1) the label's parent (basic block) is its immediate dominator [183]; 2) the IR variables involved in the vulnerability conditions are not re-defined between the label and its parent; 3) the parent basic block has constraints that conflict with the vulnerability conditions, and these constraints are enforced by constant values. The first two points ensure that the constraints added by the parent persist upon reaching the label, and the third point indicates that the conflict always arises, regardless of the input and the execution path. Therefore, we can safely remove such a label.

1 char array[MAX]; // 0 < MAX < INT_MAX
2 for(int i = 0; i < MAX;){
3     array[i] = getchar(); // LABEL: OOB access
4     i++;                  // LABEL: integer-overflow
5 }

For instance, the code above has two labels that meet the three requirements. In this example, the variable i ranges from 0 to MAX, meaning that the array access at line 3 can never be out of bounds, nor can the self-increment at line 4 cause an integer overflow. SAVIOR hence removes the two labels. On average, we can conservatively remove 5.36% of the labels.

Reachability Analysis: This analysis counts the number of vulnerability labels that can be reached forward from each basic block in the program control flow graph (CFG). It proceeds in two phases. The first step constructs an inter-procedure CFG. The construction algorithm is close to the method implemented in SVF [187]. It individually builds intra-procedure CFGs for each function and then bridges the function-level CFGs by the caller-callee relation. To resolve indirect calls,


our algorithm iteratively performs Andersen's point-to analysis and expands the targets of the calls. This prevents SAVIOR from discarding aliasing information of indirect calls and, therefore, our prioritization does not miscount the number of vulnerability labels. By examining the CFGs, we also extract the edge relations between a basic block and its children for further use in the hybrid coordinator.

The second step is to calculate the UBSan labels that are reachable from each basic block in the constructed inter-procedure CFG. Specifically, we identify the regions of code that a basic block can reach and count the number of UBSan labels in those regions. In SAVIOR, we deem this number as the importance metric of that basic block and use it for bug-driven prioritization. For example, in Figure 5.6 the basic block BB can reach 8 other basic blocks, while 3 of them have UBSan labels. Thereby we output 3 as the number of reachable UBSan labels for BB. Note that each basic block has at most one label after Clang's compilation.

Figure 5.6: A demonstrative example of reachability analysis. The target BB can "reach" 3 UBSan labels.

Table 5.1: Families of potential bugs that SAVIOR enables UBSan to label. Here, x and y are n-bit integers; array is an array whose size is size(array); op_s and op_u refer to the binary operators +, -, *, /, % over signed and unsigned integers, respectively.

  UB Family                   Operation         Condition
  Out-of-bound array access   array[x]          x < 0 or x >= size(array)
  Oversized shift             x << y, x >> y    y < 0 or y >= n
  Signed integer overflow     x op_s y          x op_s y not in [-2^(n-1), 2^(n-1) - 1]
  Unsigned integer overflow   x op_u y          x op_u y > 2^n - 1

Target Building: After the labeling and the reachability analysis, SAVIOR's compiling tool-chain begins its building process. It compiles three targets from the source code — a fuzzing-binary for the fuzzer, a SAVIOR-binary for the coordinator, and an LLVM bitcode file for the concolic executor. In particular, the SAVIOR-binary is instrumented to print the unique IDs of the executed basic blocks. With this design, SAVIOR completely decouples the fuzzer, the concolic executor, and the coordinator, so it supports quick replacement of any component.


5.2.2.2 The Coordinator

The coordinator bridges the fuzzer and the concolic executor. It keeps polling seeds from the fuzzer’s queue and prioritizes those with higher importance for concolic execution. We explain the details as follows.

Bug-driven Prioritization: In a polling round, the coordinator processes the new seeds that have entered the fuzzer's queue since the last round. Each seed is fed to the SAVIOR-binary, and the coordinator updates two pieces of information based on the execution result. First, it updates the global coverage information. The coverage computation here follows AFL's original approach; that is, different hit counts of an edge count as different coverage (similar to AFL). Second, the coordinator records the sequence of basic blocks visited by each seed. Using the updated coverage information, the coordinator assigns a score to each seed following the scheme presented in Section 5.2.1. Here, we re-score all the seeds except those already tested by our concolic executor, since the coverage information is dynamically adjusted. Finally, the coordinator selects the top-ranked seeds and feeds them into the input queue of the concolic executor. If two seeds have the same score, the coordinator prefers the seed with the +cov property, which indicates that the seed brings new code coverage.

Post-processing of Concolic Execution: Going beyond seed scheduling for concolic execution, the coordinator also needs to triage the new seeds generated by the concolic executor for the fuzzer. First, it re-runs the new seeds and retains those that provide new coverage or reach uncovered bug labels. As a result, SAVIOR transfers the valuable test cases from the concolic executor to the fuzzer. Second, the coordinator updates the number of solving attempts upon uncovered branches: if a branch remains uncovered, its solving-attempt count is increased by 1. As such, a branch with a much higher solving-attempt value is de-prioritized.
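The two bookkeeping steps can be sketched together; `run_seed` below is a hypothetical stand-in for re-executing a seed on the SAVIOR-binary:

```python
def postprocess(new_seeds, run_seed, solving_attempts):
    """Triage concolic-execution outputs for the fuzzer (sketch).

    run_seed(seed) -> (gained_new_coverage, reached_uncovered_label,
                       uncovered_branches_hit)
    """
    kept = []
    for seed in new_seeds:
        new_cov, new_label, uncovered = run_seed(seed)
        if new_cov or new_label:
            kept.append(seed)  # forward valuable seeds to the fuzzer
        for branch in uncovered:
            # branches that stay uncovered accumulate attempts and
            # are de-prioritized in later rounds
            solving_attempts[branch] = solving_attempts.get(branch, 0) + 1
    return kept

attempts = {}
results = {"a": (True, False, ["br1"]), "b": (False, False, ["br1", "br2"])}
kept = postprocess(["a", "b"], lambda s: results[s], attempts)
print(kept, attempts)  # -> ['a'] {'br1': 2, 'br2': 1}
```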

5.2.2.3 The Concolic Executor

The concolic executor replays the seeds scheduled by the coordinator and chooses to solve branch conditions based on coverage information. In addition, it also performs bug-guided verification.

[Figure 5.7: Fork server mode in KLEE. In this mode, KLEE performs initialization only once and reuses the same executor for all received seeds.]

Independent Coverage Scheme: When encountering a branch instruction, the concolic executor needs to decide whether to solve that branch's condition. An intuitive design is to reuse the coverage information from the coordinator. However, our coverage scheme is ID based, and KLEE invokes a group of transformations on the target bitcode, which leads to numerous mismatches between the edge IDs in the SAVIOR-binary and the KLEE bitcode. To tackle this problem, we opt to use KLEE's internal coverage information, which better decouples the concolic executor from the other components.

Fork Server Mode: Before running a seed, KLEE needs to perform a series of initialization steps, including bitcode loading, library bitcode linking, and global data preparation, to place the program under test into the virtual machine. This initialization, however, typically takes a long time on large bitcode files. For instance, the initialization time for tcpdump is usually several times longer than the actual concolic execution time. To address this issue, we introduce an optimization named fork server mode for the KLEE concolic executor (as shown in Figure 5.7). Technical details are explained in Section 5.3.
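The idea is to amortize one expensive initialization over many seeds. The sketch below uses a Python stand-in for KLEE's C++ Executor, with illustrative method names, to show the loop structure only:

```python
class Executor:
    """Stand-in for KLEE's Executor; construction is the expensive part
    (bitcode loading, library linking, globals preparation)."""
    def __init__(self):
        self.initialized = True

    def run(self, seed):
        return f"explored:{seed}"

    def reset(self):
        # clear per-run state: memory manager, global data objects,
        # any leftover execution states
        pass

def fork_server_loop(seed_queue):
    executor = Executor()          # one-time setup, reused thereafter
    results = []
    for seed in seed_queue:
        results.append(executor.run(seed))
        executor.reset()           # undo stateful changes, keep executor
    return results

print(fork_server_loop(["s0", "s1", "s2"]))
# -> ['explored:s0', 'explored:s1', 'explored:s2']
```

Stock KLEE would instead construct and destruct an Executor per seed, repeating the lengthy environment setup each time.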

Bug-guided Verification: Our concolic executor also performs bug-guided verification. Once a non-covered vulnerability label is reached, we endeavor to solve the triggering constraint along the current path. If the solving succeeds, KLEE generates a seed as a proof of the vulnerability. In certain cases, the path constraints may conflict with the vulnerability-triggering conditions, even though the vulnerability can indeed be triggered along the same path (with fewer constraints). QSYM summarizes this issue as the over-constraint problem. We adopt QSYM's optimistic solving strategy only when solving the vulnerability conditions. However, the relaxed constraints may also produce false positives, so we do not count a vulnerability label as covered through relaxed-constraint solving.

Fuzzers    Source          Note
AFL        [4]             N/A
AFLGO      [3]             Use in-lined lava_get as target locations of guided fuzzing
TFUZZ      [39]            Use the docker environment prepared at [39] for evaluation
ANGORA     [7]             Patch Lava to support Angora, as suggested by the developers [38]
DRILLER    Self-developed  Follow the original Driller in scheduling concolic execution [17]
QSYM       [37]            N/A
SAVIOR     Self-developed  Use in-lined lava_get as labels of vulnerabilities

Table 5.2: Fuzzer-specific settings in the evaluation with LAVA-M.

Timeout on Concolic Execution: To prevent the concolic execution from hanging on localized code regions (e.g., deep loops and blocking I/O), the concolic executor needs a time threshold for running a seed. QSYM adjusts this time budget by watching AFL's status: if the number of hanging seeds increases, QSYM increases the timeout (up to 10 minutes). We instead set the timeout to be proportional to the number of uncovered branches that a seed can reach. The rationale is that such seeds need more time for constraint solving, and this setting benefits bug coverage.
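A minimal sketch of such a proportional timeout policy; the base value, per-branch increment, and cap below are illustrative constants, not SAVIOR's actual parameters:

```python
def concolic_timeout(uncovered_branches_reached,
                     base_seconds=30, per_branch=1.5, cap=600):
    """Timeout grows linearly with the number of uncovered branches a
    seed can reach, bounded by a hard cap. Constants are hypothetical."""
    return min(cap, base_seconds + per_branch * uncovered_branches_reached)

print(concolic_timeout(0))     # -> 30.0
print(concolic_timeout(100))   # -> 180.0
print(concolic_timeout(1000))  # -> 600 (capped)
```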

5.3 Implementation

We have implemented SAVIOR, which can be applied to software as sophisticated as Baidu's Apollo Autonomous Driving System [9, 102]. SAVIOR consists of four major components: a compiling tool-chain built on top of Clang and LLVM-4.0, a fuzzing component based on AFL-2.5b [4], a concolic executor built atop KLEE [71] (with LLVM-3.6), and a Python middleware that coordinates the fuzzing component and the concolic executor. In total, our implementation has about 3.3K lines of Python code and 4K lines of C/C++ code. SAVIOR runs on both 32-bit and 64-bit systems, and it supports both 32-bit and 64-bit targets. In the following, we discuss the important implementation details.

Concolic Executor: We develop our concolic executor based on KLEE (with LLVM-3.6). The original KLEE aims at full symbolic execution and does not support concolic execution, so we port a concolic executor from KLEE's symbolic executor. Specifically, the concolic executor attaches the concrete input as the assignment property in the initial state. It then symbolically interprets each instruction

as KLEE originally does. On reaching a conditional statement, it always follows the branch that matches the concrete input. For the other branch, if not yet covered, the concolic executor solves the branch conditions and generates a corresponding test case; the state following that branch is then immediately terminated. When generating the seed, our concolic executor copies the unconstrained bytes from the input instead of padding them with random values.

Another limitation of KLEE is that the initialization phase is notoriously time-consuming. To overcome this, we introduce a fork server mode. In a run, KLEE first sets up the environment with bitcode loading, library linking, and preparation of globals and constants, followed by the initialization of an Executor. By default, the Executor executes one seed and then destructs itself. In our implementation, after the execution of one seed, we clean up any stateful changes introduced during that execution (including destructing the memory manager, clearing the global data objects, and erasing all the remaining states), and then reuse the Executor to run a new seed from the input queue. In this mode, we avoid repeating the lengthy environment setup.

Recall that we invoke UBSan to label potentially vulnerable operations. At the IR level, UBSan replaces those operations with LLVM intrinsic functions, which KLEE cannot interpret. We replace those intrinsic functions with general LLVM IR so that KLEE can execute them without exceptions; the replacements follow those that KLEE already enforces [22].

By default, KLEE redirects un-modeled external functions (e.g., system calls) to the native code. This causes two issues. First, KLEE is unaware of their effects on the symbolic address space, which can break memory operations. For instance, the function strdup allocates a new buffer and copies data from the source to this buffer.
However, KLEE cannot capture this allocation due to the lack of modeling, so on future accesses to this buffer, KLEE throws an out-of-bound access error. There are many similar cases, such as getenv. We extend KLEE's environment model to include symbolic versions of those functions. Second, KLEE concretizes the data passed to external functions and adds constant constraints on such data for future execution. However, this may over-constrain the concretized variables. For instance, KLEE concretizes the data written to standard output or files; when the concretized data is later used in constraint solving, KLEE is then unable to find a satisfying solution. To address this issue, we prevent KLEE from adding constraints on concretization. This scheme, following the design of S2E [83] and QSYM [208], ensures that we never miss solutions for non-covered branches. Last but not least, stock KLEE provides limited support for software written in C++: many C++ programs rely on the standard C++ library (e.g., libstdc++ on Linux), but KLEE neither models this library nor supports the semantics of calls into it. Therefore, KLEE frequently

aborts the execution in the early stage of running a C++ program. We customize the GNU libstdc++ library to make it compilable and linkable with KLEE. Considering that many libstdc++ functions also access nonexistent devices (e.g., random devices), we also build models of those devices.

5.4 Evaluation

SAVIOR approaches bug-driven hybrid testing with two key techniques: bug-driven prioritization and bug-guided verification. In this section, we evaluate these techniques, centering on two questions:

• With bug-driven prioritization, can hybrid testing find vulnerabilities more quickly?
• With bug-guided verification, can hybrid testing find vulnerabilities more thoroughly?

To support our evaluation goals, we prepare two groups of widely used benchmarks. The first group is the LAVA-M data-set [97]. This data-set comes with artificial vulnerabilities, and the ground truth is provided. The second group includes a set of 8 real-world programs; details about these programs are summarized in Table 6.1. All these programs have been extensively tested in both industry [34] and academia [164, 184, 208], and they represent a higher level of diversity in functionality and complexity.

Using the two benchmarks, we compare SAVIOR with the most effective tools from related families. To be specific, we take AFL [4] as the baseline of coverage-based testing. As SAVIOR performs testing in a directed manner, we also include the state-of-the-art directed fuzzer, AFLGO [64]. To handle complex conditions, recent fuzzing research introduces a group of new techniques to improve code coverage; from this category, we cover TFUZZ [159] and ANGORA [75], because they are open-sourced and representative of the state of the art. Finally, we also consider the existing implementations of hybrid testing, DRILLER [184] and QSYM [208]. Note that the original DRILLER has problems running many of our benchmarks, due to the lack of system-call modeling or failures to generate test cases (even with the patch [16] to support input from files). This aligns with the observations in [208]. In the evaluation, we therefore re-implement DRILLER on top of SAVIOR. More specifically, it runs AFL as the fuzzing component and invokes the concolic executor once the pending_favs attribute in AFL drops to 0. This implementation strictly follows the original DRILLER [17].
Similar to the Angr-based concolic executor in DRILLER, our KLEE-based concolic executor focuses on generating new seeds to cover untouched branches.
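The trigger condition for this DRILLER re-implementation can be sketched as follows; pending_favs is the AFL statistic named above, while the dictionary-based interface is a simplification of reading AFL's fuzzer_stats:

```python
def driller_should_invoke_concolic(afl_stats):
    """Invoke the concolic executor once AFL's pending_favs drops to 0,
    i.e., when the fuzzer has exhausted its favored, not-yet-fuzzed
    seeds. afl_stats is a simplified view of AFL's fuzzer_stats."""
    return afl_stats.get("pending_favs", 0) == 0

print(driller_should_invoke_concolic({"pending_favs": 3}))  # -> False
print(driller_should_invoke_concolic({"pending_favs": 0}))  # -> True
```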


In addition, we keep the relaxed constraint solving and the fork-server mode. These two features increase the effectiveness and efficiency of DRILLER without introducing algorithmic changes. In the following, we explain the experimental setups and evaluation results for the two groups of benchmarks.

5.4.1 Evaluation with LAVA-M

5.4.1.1 Experimental Setup

In this evaluation, we run each of the fuzzers in Table 5.2 on the four LAVA-M programs, using the seeds shipped with the benchmark. For consistency, we conduct all the experiments on Amazon EC2 instances (Intel Xeon E5 Broadwell with 64 cores, 256 GB RAM, running Ubuntu 16.04 LTS), and we run all the experiments sequentially to avoid interference. In addition, we assign each fuzzer 3 free CPU cores to ensure fairness in terms of computation resources. Each test is run for 24 hours. To minimize the effect of randomness in fuzzing, we repeat each test 5 times and report the average results. In Table 5.2, we also summarize the settings specific to each fuzzer, including how we distribute the 3 CPU cores and the actions we take to accommodate those fuzzers. In LAVA-M, each artificial vulnerability is enclosed and checked in a call to lava_get (in-lined in our evaluation). We use these calls as the targets to guide AFLGO, and we mark them as vulnerability labels to enable bug-driven prioritization in SAVIOR. In addition, as the vulnerability condition is hard-coded in the lava_get function, we naturally have support for bug-guided verification. Finally, for ANGORA, we adopt the patches suggested by the developers [38].

5.4.1.2 Evaluation Results

In the left column of Figure 5.8, we show how many vulnerabilities are reached over time by different fuzzers. The results demonstrate that all the fuzzers can instantly cover the code containing LAVA vulnerabilities. However, as presented in the right column of Figure 5.8, TFUZZ, ANGORA, DRILLER, QSYM, and SAVIOR are able to trigger most (or all) of the vulnerabilities, while AFL and AFLGO trigger only a few. The reason is that the triggering conditions of LAVA vulnerabilities all take the form of 32-bit magic-number matching. Mutation-based fuzzers, including AFL and AFLGO, can hardly satisfy those conditions, while the other fuzzers all feature techniques to solve them.


Vulnerability Finding Efficiency: Although TFUZZ, ANGORA, DRILLER, QSYM, and SAVIOR all trigger large numbers of LAVA vulnerabilities, they differ in efficiency. TFUZZ quickly covers the listed vulnerabilities in base64 and uniq. This is attributable to the facts that (1) TFUZZ can reach all the vulnerabilities with several initial seeds and (2) TFUZZ can transform the program to immediately trigger the encountered vulnerabilities. Note that we do not show the results of TFUZZ on md5sum and who, because TFUZZ gets interrupted by a broken dependency.3 In all the cases, ANGORA triggers the vulnerabilities immediately after its start. The main reason is that the "black-box function" pertaining to all LAVA vulnerabilities is f(x) = x and the triggering conditions have the form f(x) == CONSTANT. ANGORA always starts evaluating such functions with x = CONSTANT; hence, it can instantly generate seeds that satisfy the vulnerability conditions. In the case of who, ANGORA does not find all the vulnerabilities because of its incomplete dynamic taint analysis. The three hybrid tools trigger every vulnerability that their concolic executors encounter. In the cases of base64, uniq, and md5sum, their concolic executors can reach all the vulnerabilities with the initial seeds, which explains why they all quickly trigger the listed vulnerabilities, regardless of their seed scheduling. In the case of who, even though the fuzzing component quickly generates seeds to cover the vulnerable code, the concolic executor takes much longer to run those seeds. For instance, while executing the inputs from AFL, QSYM needs over 72 hours of continuous concolic execution to reach all the LAVA bugs in who. Differing from DRILLER and QSYM, SAVIOR prioritizes seeds that have a higher potential of leading to LAVA bugs. As demonstrated by the results for who in Table 5.3, our technique of bug-driven prioritization indeed advances the exploration of code with more vulnerabilities. Note that DRILLER (with random seed scheduling) moves faster than QSYM, because QSYM prioritizes concolic execution on small seeds, while reaching the vulnerabilities in who requires larger seeds.

Vulnerability Finding Thoroughness: We further evaluate our bug-guided verification design. Specifically, we run the seeds generated by all the fuzzers with our concolic executor; in this experiment, we only perform constraint solving when a vulnerability condition is encountered. As shown in Table 5.4, bug-guided verification enables all the fuzzers not only to cover the listed LAVA bugs but also to disclose an extra group of LAVA bugs. Due to limited space, those additionally identified bugs are summarized in Table ?? in the Appendix. Such results strongly demonstrate the promising potential of bug-guided verification to benefit fuzzing tools in vulnerability finding.

3The broken component is the QEMU-based tracer in Angr [8]. This has been confirmed with the developers.

Fuzzers   base64      uniq        md5sum      who
AFL       0 (0%)      0 (0%)      0 (0%)      0 (0%)
AFLGO     2 (5%)      1 (4%)      0 (0%)      0 (0%)
TFUZZ     47 (100%)   29 (100%)   N/A         N/A
ANGORA    47 (100%)   28 (100%)   54 (95%)    1743 (79%)
DRILLER   48 (100%)   28 (100%)   58 (100%)   1827 (78%)
QSYM      47 (100%)   29 (100%)   58 (100%)   1244 (53%)
SAVIOR    48 (100%)   29 (100%)   59 (100%)   2213 (92%)
Listed    44          28          57          2136

Table 5.3: LAVA-M bugs triggered by different fuzzers (before bug-guided verification). "X%" indicates that X% of the listed LAVA bugs are triggered.

Fuzzers   base64      uniq        md5sum      who
AFL       48 (100%)   29 (100%)   59 (100%)   2357 (96.3%)
AFLGO     48 (100%)   29 (100%)   59 (100%)   2357 (96.3%)
TFUZZ     47 (100%)   29 (100%)   N/A         N/A
ANGORA    48 (100%)   29 (100%)   59 (100%)   2357 (96.3%)
DRILLER   48 (100%)   29 (100%)   59 (100%)   2357 (96.3%)
QSYM      48 (100%)   29 (100%)   59 (100%)   2357 (96.3%)
SAVIOR    48 (100%)   29 (100%)   59 (100%)   2357 (96.3%)
Listed    44          28          57          2136

Table 5.4: LAVA-M bugs triggered by different fuzzers (after bug-guided verification). "X%" indicates that X% of the listed LAVA bugs are triggered.

5.4.2 Evaluation with Real-world Programs

5.4.2.1 Experimental Setup

In this evaluation, we prepare 8 programs; details about these programs and the test settings are summarized in Table 6.1. All the programs have been extensively tested by both industry [34] and academic research [164, 184, 208]. Since different seed inputs and execution options can lead to varying fuzzing results [128, 165], we follow existing works in reusing the seeds shipped with AFL or the vendors, and in configuring the fuzzing options. As in our evaluation with LAVA-M, we conduct all the experiments on Amazon EC2 instances. To reduce randomness during testing,


Name      Version      Driver    Source  Seeds     Options
libpcap   4.9.2/1.9.0  tcpdump   [40]    built-in  -r @@
libtiff   4.0.10       tiff2ps   [27]    AFL       @@
libtiff   4.0.10       tiff2pdf  [27]    AFL       @@
binutils  2.31         objdump   [20]    AFL       -D @@
binutils  2.31         readelf   [20]    AFL       -A @@
libxml2   2.9.7        xmllint   [30]    AFL       @@
libjpeg   9c           djpeg     [25]    AFL
jasper    master       jasper    [21]    AFL       -f @@ -T pnm

Table 5.5: Real-world benchmark programs and evaluation settings. In the Seeds column, AFL indicates that we reuse the test cases provided with AFL, and built-in indicates that we reuse the test cases shipped with the program.

we run each test 5 times and report the average results. In addition, we leverage the Mann-Whitney U-test [146] to measure the significance of our improvements, following the suggestion by George et al. [128]. In this evaluation, we also prepare setups that are specific to each fuzzing tool. These setups mostly follow Table 5.2, except for the following. First, we use UBSan labels as the target locations for AFLGO and as the guidance for bug-driven prioritization in SAVIOR. Second, to prevent ANGORA from terminating the fuzzing process once it encounters un-instrumented library functions, we follow suggestions from the developers and add the list of un-instrumented functions to ANGORA's dfsan_abilist.txt configuration file. Third, we do not include TFUZZ, because it does not function correctly on our benchmark programs due to issues in the aforementioned third-party component. Furthermore, we instrument all the benchmark programs with UBSan for all fuzzers to ensure a fair comparison. This also means that bug-guided verification is enabled by default in DRILLER, QSYM, and SAVIOR.
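For reference, the Mann-Whitney U-test used above can be computed without external dependencies. This is a minimal two-sided version with the normal approximation and no tie correction, a stand-in for a full statistics package, applied here to hypothetical per-run measurements:

```python
import math

def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test via the normal approximation
    (no tie correction). Suitable for comparing small samples of
    repeated fuzzing-run measurements."""
    n1, n2 = len(a), len(b)
    # U counts, over all pairs, how often a beats b (ties count half)
    u = sum((x > y) + 0.5 * (x == y) for x in a for y in b)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return u, p

# e.g., bugs found across 5 repeated runs by two fuzzers (made-up data)
u, p = mann_whitney_u([59, 58, 59, 57, 59], [48, 50, 47, 49, 48])
print(u, p < 0.05)  # -> 25.0 True
```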

5.4.2.2 Evaluation Results

In Figure 5.9, we summarize the results of our second experiment. It shows two metrics over time: the number of triggered UBSan bugs and basic block coverage. In addition, we calculate the p-values of the Mann-Whitney U-test for SAVIOR vs. DRILLER and SAVIOR vs. QSYM. Note that we use the IDs of UBSan labels for de-duplication while counting the UBSan bugs, as each UBSan label is associated with a unique potential defect. In the following,

we delve into the details and explain how these results support our design hypotheses.

Vulnerability Finding Efficiency: As shown in Figure 5.9 (the left column for each program), SAVIOR triggers UBSan violations at a pace generally faster than all the other fuzzers. In particular, it outperforms DRILLER and QSYM in all the cases except djpeg. On average, SAVIOR discovers vulnerabilities 43.4% faster than DRILLER and 44.3% faster than QSYM. The low p-values (< 0.05)4 of the Mann-Whitney U-test support that these improvements are statistically significant. Since the three hybrid tools differ only in their seed scheduling, these results strongly demonstrate that the scheduling scheme in SAVIOR, bug-driven prioritization, accelerates vulnerability finding. In the case of djpeg, all six fuzzers trigger the same group of UBSan violations. This is because djpeg has a tiny code base, on which these fuzzers quickly saturate in code exploration; in addition, the conditions of those UBSan violations are simple enough that even mutation-based approaches can solve them.

Going beyond, we examine the number of labels reached by different fuzzers. In Table 5.6, we list the average results from our 24-hour tests. Not surprisingly, the hybrid tools cover higher volumes of UBSan labels than the ordinary fuzzers, likely because a hybrid tool can solve complex conditions, enabling coverage of the code and labels behind them. Among the hybrid tools, SAVIOR reaches 19.68% and 15.18% more labels than DRILLER and QSYM, respectively. Such results are consistent with the numbers of triggered UBSan violations, and they signify that our bug-driven prioritization guides SAVIOR to spend more resources on code with richer UBSan labels. In the case of djpeg, SAVIOR nearly ties with the other tools, for a similar reason as explained above. We further find that the efficiency boost of SAVIOR in vulnerability finding is not due to high code coverage.
As shown in Figure 5.9 (the right column for each program), we compare the code coverage of the six fuzzers. As demonstrated by the results, the efficiency of code coverage and that of UBSan violation discovery are not positively correlated. In particular, in the cases of tcpdump, libxml, tiff2pdf, objdump, and jasper, SAVIOR covers code at a similar or even slower pace than DRILLER and QSYM (the high p-values also support that SAVIOR is not quicker). However, SAVIOR triggers UBSan violations significantly more quickly in these cases. Such results validate the above hypothesis with high confidence.

Vulnerability Finding Thoroughness: In this experiment, we also measure the performance of

4The p-values of readelf and objdump are larger than 0.05 but they are at the level of quasi-significance. In the two programs, the variances are mainly due to randomness.


Prog.     AFL   AFLGO  ANGORA  DRILLER  QSYM  SAVIOR
tcpdump   2029  1235   1333    1906     2509  2582
tiff2ps   748   927    770     931      852   970
readelf   91    79     102     104      106   183
xmllint   588   580    456     567      568   597
djpeg     2746  2588   2546    2713     2707  2746
tiff2pdf  1488  1467   919     1448     1369  1478
jasper    649   660    679     691      731   752
objdump   780   715    844     835      906   1039
Avg.      1139  1031   956     1149     1218  1289

Table 5.6: Number of unique UBSan labels reached by different fuzzers in 24 hours. On average, SAVIOR reaches 19.68% and 15.18% more labels than DRILLER and QSYM, respectively.

bug-guided verification in enhancing the thoroughness of vulnerability finding. Specifically, we re-run the seeds from all the fuzzers with our concolic executor. In this test, we enable SAVIOR to do constraint solving only when encountering unsolved UBSan labels. In Table 5.7, we summarize the comparison results. For all 8 programs, bug-guided verification enables the different fuzzers to trigger new violations. The average increase ranges from 4.5% (SAVIOR) to 61.2% (ANGORA); in particular, it aids ANGORA in triggering 82 new UBSan bugs in total. In the case of djpeg, bug-guided verification does not help much, because djpeg has a relatively small code base and contains fewer vulnerability labels, leaving bug-guided verification less utilized. These results are further evidence that bug-guided verification can truly improve the thoroughness of vulnerability finding for fuzzing.

5.4.3 Vulnerability Triage

The UBSan violations triggered by SAVIOR can lead to various consequences, and some of them might be harmless. Therefore, we manually examine all the UBSan violations triggered by SAVIOR. These violations include those triggered in the 8 programs in Table 6.1 and also those from mjs, catdoc, and c++filt. We do not include the results of mjs, catdoc, and c++filt in the evaluation above, as all fuzzers trigger fewer than 10 UBSan violations on them, and a small difference would result in a big variance in comparison.

Triage Result: In total, we collect 481 UBSan violations, which we manually classify based on their consequences; the results are presented in Table 5.8. Specifically, 102 of them lead to OOB


Prog.     AFL       AFLGO      ANGORA      DRILLER    QSYM       SAVIOR
tcpdump   +10/11%   +22/41.5%  +29/76.3%   +9/9.9%    +4/4%      +8/7%
tiff2ps   +4/133%   +0/0%      +3/42.9%    +0/0%      +0/0%      +0/0%
readelf   +10/82%   +9/72.2%   +16/107%    +9/68.4%   +8/63.2%   +7/29.2%
libxml    +4/33.3%  +4/33.3%   +5/166.7%   +4/33.3%   +4/33.3%   +0/0%
tiff2pdf  +5/50%    +1/7.7%    +4/44.4%    +3/27.2%   +5/62.5%   +0/0%
djpeg     +0/0%     +7/5.2%    +7/5.2%     +0/0%      +0/0%      +0/0%
objdump   +7/10.9%  +7/11.7%   +11/17.2%   +7/11.7%   +6/9.5%    +0/0%
jasper    +0/0%     +0/0%      +7/30.4%    +7/26.9%   +7/26.9%   +0/0%
Avg.      +5/40.1%  +6/21.5%   +10/61.2%   +5/22.2%   +4.3/25%   +1.8/4.5%

Table 5.7: New UBSan violations triggered with bug-guided verification in the evaluation with real-world programs. "+X/Y%" means "X" new violations are triggered, increasing the total number by "Y%".

reads/writes and 141 of them result in logic errors. Those logic errors fall into different categories, such as incorrect computation, wrong outputs, and polluted conditional variables. Among the 243 OOB and logic errors, 16 have been confirmed by the developers, and our analysis so far reveals that at least 25 of them are exploitable for goals such as information leakage and control-flow manipulation. The remaining 238 cases are likely harmless according to our triage. They mainly fall into the following categories: (1) the variables triggering the UBSan violations are used as storage (e.g., an int used as char[4]) rather than as computation-related objects; (2) the affected variables expire immediately after the violations; (3) the program already anticipates the UBSan violations and handles them.

static bool process_notes_at(...) {
  // readelf.c:18303
  if (inote.namedata[inote.namesz - 1] != '\0')
    ...
}
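The defect in the readelf snippet above hinges on unsigned wraparound in the index arithmetic. A quick simulation, assuming a 64-bit size_t (the width is an assumption for illustration):

```python
WORD = 2**64  # assuming a 64-bit size_t

def unsigned_sub(a, b):
    """Simulate C unsigned subtraction: the result wraps modulo 2^64."""
    return (a - b) % WORD

# With namesz == 0, namesz - 1 wraps to the maximal unsigned value,
# so inote.namedata[namesz - 1] indexes far out of bounds.
print(unsigned_sub(0, 1))  # -> 18446744073709551615 (i.e., 2**64 - 1)
```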

Case Studies: From each of the three categories (OOB, logic errors, and those without harm), we pick a case and explain the details here. All the cases have been fixed. The first case is an OOB in readelf; the code is shown above. The variable inote.namesz is copied from the input. By making it equal to 0, (inote.namesz - 1) underflows to the maximal unsigned value, causing an OOB access to inote.namedata. The second case is a logic error in libtiff. Variable twobitdeltas[delta] is controlled by


Program   OOB  Logic Error  Exploitable*  Confirmed
tcpdump   6    102          6+            7
libjpeg   8    23           0+            N/A
objdump   41   4            4+            N/A
readelf   1    9            10+           3
libtiff   20   0            0+            N/A
jasper    21   2            2+            2
mjs       1    0            0+            1
catdoc    3    0            3+            1
c++filt   1    1            0             2
Total     102  141          25+           16

Table 5.8: Triage of UBSan violations triggered by SAVIOR in 24 hours.

the user. With a specially crafted input, one can cause an overflow in the result of lastpixel + twobitdeltas[delta], making SETPIXEL set the wrong pixel value in the decoded image.

static int ThunderDecode(...) {
  // tif_thunder.c:125
  if ((delta = ((n >> 4) & 3)) != DELTA2_SKIP)
    SETPIXEL(op, lastpixel + twobitdeltas[delta]);
  ...
}

The last case is harmless, as the program already accounts for the overflow. This case is located in libxml. As shown below, with a special input, the variable okey can overflow. However, the program takes okey modulo dict->size before using it, rendering the overflow harmless.

static int xmlDictGrow(...) {
  // dict.c:417
  okey = xmlDictComputeQKey(...);
  key = okey % dict->size;
  ...
}


[Figure 5.8: Evaluation results with LAVA-M (base64, uniq, md5sum, who). The left column shows the number of bugs reached by different fuzzers and the right column shows the number of bugs triggered by the fuzzers.]

[Figure 5.9: Evaluation results with real-world programs over 24 hours. For each program (tcpdump, tiff2ps, readelf, libxml, djpeg, tiff2pdf, jasper, objdump), one panel shows the number of UBSan violations triggered and the other shows the number of basic blocks reached. p1 and p2 are the p-values for the Mann-Whitney U-test of SAVIOR vs. DRILLER and SAVIOR vs. QSYM, respectively.]

Chapter 6

Learning-based Hybrid Fuzzing

6.1 Introduction

Hybrid testing has attracted much attention recently because of its dramatic improvements in bug discovery. For instance, the top winning teams in the DARPA Cyber Grand Challenge [14] benefited from hybrid testing [45]. Compared with plain fuzzing, hybrid testing introduces an extra component that uses concolic execution to revisit paths explored by the fuzzer and tries to solve hard-to-satisfy branch conditions that the fuzzer previously failed to solve. One key challenge in hybrid testing is how to intelligently select high-utility seeds for the concolic engine and the fuzzer so as to increase code coverage and, consequently, discover more bugs in a given timeframe. Essentially, the concolic execution engine cannot explore the majority of the seeds due to time constraints, and the system must make a trade-off between selecting inputs that have a known level of utility and inputs whose performance is completely unknown, yet might have better utility than the known ones. Existing work [24, 65, 78, 134, 184, 208, 212] uses heuristic-based guidance, such as selecting seeds with smaller sizes or seeds exercising less-explored paths, attempting to select the inputs with the highest utility. However, these heuristics, despite their simplicity, are not generalizable to a wide range of programs because the rules are fixed throughout fuzzing, so it is hard to justify whether they will remain effective across different benchmark programs. In contrast to heuristics, Machine Learning (ML) infers from data without the need to define explicit rules, and it automatically develops predictive models that generalize well. It then keeps adapting the model for every new sample it sees; the longer it is used, the more reliable

it gets. Hence, ML can be a more reliable alternative to heuristics for making sound seed-selection decisions in different circumstances. In this chapter, we introduce MEUZZ, an ML-enhanced hybrid fuzzing system. Unlike existing fuzzing systems that schedule seeds based on simple heuristics, MEUZZ compiles a set of static and dynamic features computed from the seeds and the corresponding program, and then uses these features to predict the utility of each seed. MEUZZ also has a built-in evaluation module that measures the quality of the predictions and feeds these data back for continuous learning, so that the prediction quality improves over time. To the best of our knowledge, MEUZZ is the first work [168] that applies ML to systematically prioritize seeds based on patterns automatically learned from the seeds' attributes. To integrate ML into hybrid fuzzing without blocking the fuzzing workflow, MEUZZ must carefully address two tasks: feature engineering and data labeling. While these are the essential steps to bootstrap learning, they can be time-consuming and thus too costly to integrate into the fuzzing loop. For instance, feature extraction can be very slow if the features require substantial analyses. Moreover, it is not straightforward to directly and precisely quantify seed utility, which is essential for labeling. To tackle these challenges, we first engineer a set of lightweight features based on reachability and dynamic analysis. Second, we propose a labeling approach that uses the input descendant tree to pinpoint the utility of a seed. Our evaluation shows that it takes only 5µs on average for MEUZZ to extract an individual feature, and that the descendant tree of a seed accurately reflects the seed's utility.
In addition to ML integration, MEUZZ investigates the feasibility of model reuse and transfer, since ML-based fuzzers inevitably face the question: "does the model transfer well to different fuzzing configurations or programs?" Collecting data and training a distinct model for every new program is usually time-consuming and uneconomical, and these constraints may limit the wide adoption of ML-based fuzzing. However, because the features extracted by MEUZZ are agnostic to programs and focus on nonexclusive characteristics, such as control-flow information and the likelihood of triggering bugs, the models generated by MEUZZ are highly adaptable. MEUZZ can collect many data points to refine its prediction model iteratively during fuzzing. At the end of a MEUZZ fuzzing session, the trained model can then be reused as part of the initialization when launching a new fuzzing job against the same or a different program. We compare MEUZZ with state-of-the-art fuzzing approaches [65, 75, 134] as well as hybrid testing systems [78, 208] on a set of commonly evaluated real-world benchmark programs. Our

results show that (i) MEUZZ achieves higher code coverage than all the tested fuzzers that use simple seed selection heuristics; in particular, MEUZZ boosts coverage by as much as 27.1% compared with QSYM, the state-of-the-art hybrid fuzzing system; and (ii) the MEUZZ prediction models have good reusability and transferability: the reused models boost coverage by 7.1% on average, and the transplanted models improve fuzzing performance in 38 out of 56 tests (67.9%), among which 10 out of the 38 (26.3%) cases see more than 10% improvement. This work makes the following contributions.

• Effective and generalizable approach. We design the MEUZZ system, which is the first system that applies machine learning to the seed selection stage of fuzzing. Our system shows that ML-based fuzzing is practically more effective and generalizable than heuristic-based fuzzing.

• Practical feature and label engineering. We address two major challenges, namely feature engineering and label inference, when integrating ML into the MEUZZ hybrid fuzzing framework. Considering that the approach should be effective, online friendly, and program agnostic, we engineer a series of features based on reachability analysis and dynamic instrumentation, and we propose a novel approach for automatic label inference based on the seed descendant tree.

• Reusable and transferable ML models. Our seed selection models demonstrate strong reusability and transferability, even though they are trained only on data collected during 24 hours of fuzzing. Thanks to these two properties, MEUZZ enables users to reuse a well-trained model across different programs or fuzzing configurations.

6.2 Background

6.2.1 Hybrid Fuzzing

Hybrid fuzzing [78, 184, 208] combines fuzzing and concolic execution to address the deficiencies of both approaches. Figure 6.1 shows the overview of a general hybrid fuzzing framework. The whole system consists of three major components: the fuzzer, concolic testing, and the coordinator. For the sake of brevity, we refer interested readers to [2, 24, 71, 111] for the technical details of fuzzing and concolic execution. Note that in the rest of the chapter we use the terms "seed" and "test case" interchangeably to refer to inputs fed to the fuzzer and the concolic engine.

[Figure 6.1 shows the coordinator (fuzzing monitor, seed selection, job launcher) mediating seeds and new test cases between the fuzzer (AFL) and the concolic testing engine (KLEE/QSYM/Angr), both of which execute the target program.]

Figure 6.1: General hybrid fuzzing workflow.

We dissect the coordinator component, as it is less discussed in the literature and is the focus of this work. The coordinator is a middleware that regulates the other two components. Its major tasks are (i) monitoring the fuzzer to decide when to launch the concolic execution engine; (ii) preparing the running environments for concolic testing; and (iii) selecting and filtering the inputs that flow between the fuzzer and the concolic executor. The seed selection module in the coordinator decides which seeds in the fuzzer's queue should be passed to concolic testing first (i.e., the seed utility prediction phase). Later, it filters the new test cases generated by the concolic executor before adding them back to the fuzzer's queue (i.e., the post-filtering phase).
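The coordinator's two phases can be sketched as a small simulation. The `Fuzzer` class, `predict_utility`, and `concolic_run` below are hypothetical stand-ins for illustration only, not MEUZZ's actual interfaces:

```python
import heapq

# Minimal stand-in for the fuzzer: a seed queue plus a set of covered branches.
class Fuzzer:
    def __init__(self, seeds):
        self.queue = list(seeds)
        self.coverage = set()

    def adds_new_coverage(self, case):
        # A test case is useful only if it covers at least one new branch.
        new = set(case["branches"]) - self.coverage
        self.coverage |= set(case["branches"])
        return bool(new)

def predict_utility(seed):
    # Placeholder utility: prefer seeds with many undiscovered neighbor branches.
    return seed.get("undiscovered_neighbors", 0)

def concolic_run(seeds):
    # Pretend each selected seed yields one test case solving a new branch.
    return [{"branches": [s["id"] * 100]} for s in seeds]

def coordinator_step(fuzzer, k=2):
    # Seed utility prediction phase: rank the queue and pick the top-k seeds.
    ranked = heapq.nlargest(k, fuzzer.queue, key=predict_utility)
    # Post-filtering phase: only import cases that add new coverage.
    imported = [c for c in concolic_run(ranked) if fuzzer.adds_new_coverage(c)]
    fuzzer.queue.extend(imported)
    return imported
```

Running `coordinator_step` repeatedly mimics the loop: the first round imports the concolic outputs of the two highest-utility seeds, while a second round on the same seeds imports nothing, since their branches are already covered.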

6.2.1.1 Seed utility prediction

Before launching concolic execution, the coordinator needs to rank all inputs in the fuzzer's queue by their utility. The utility of a seed should estimate its power to produce additional coverage if it is selected for fuzzing. As mentioned in Section 5.1.2, current methods use various heuristics to achieve this prioritization goal.

6.2.1.2 Post filtering

After concolic execution generates new inputs, the coordinator filters them, keeping the useful ones, before introducing them back to the fuzzer. Post-filtering plays an important role in guiding the exploration pace of the fuzzer. If a new test case that leads to code with large loops or little bug potential [78] is introduced to the fuzzer, it may reduce the "air time" of other seeds in the queue that could have higher utility. In our study, all three hybrid testing systems [78, 184, 208] use the synchronization mode of AFL [2]. In this mode, the fuzzer periodically scans the output of the concolic execution engine and adds the new test cases as seeds into its queue.

6.2.2 Supervised Machine Learning

Supervised machine learning enables learning from labeled data and applying the knowledge to unknown data. Algorithms in this category fall into two foremost subcategories, namely classification and regression. While classification is used for predicting categorical responses, regression predicts a numerical value for new data based on previously observed data. Regardless of the type of learning algorithm, the learning process itself can be either offline or online. In other words, learning can be done over the entire dataset (offline) or incrementally by considering one instance at a time (online). Since time matters during fuzzing, the right choice of machine learning algorithm, as well as its learning manner, affects both the efficiency and the effectiveness of fuzzing. We discuss online vs. offline learning in the following subsections.

6.2.2.1 Online learning

Some learning environments change from second to second, and a model needs to be learned (or updated) as soon as it sees a new sample. This is where online learning plays an important role: it considers only the new data to update the model, and its efficiency is a key advantage under time constraints. Learning algorithms that are compatible with standard optimization algorithms like stochastic gradient descent (SGD) can learn incrementally.
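As a toy illustration of such an incremental update (generic single-sample SGD on a linear model, not tied to any particular fuzzer):

```python
# One online SGD step for a linear model: each new sample (x, y) adjusts the
# weights once in the direction that reduces the squared error; no past data
# is stored.
def sgd_update(w, x, y, lr=0.01):
    pred = sum(wi * xi for wi, xi in zip(w, x))  # current prediction
    err = y - pred                               # prediction error
    return [wi + lr * err * xi for wi, xi in zip(w, x)]
```

Feeding a stream of samples through `sgd_update` gradually drives the weights toward the underlying relationship, one observation at a time.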

6.2.2.2 Offline learning

Opposite to online learning, which updates the model weights incrementally based on new observations, in offline learning the model needs to be trained on the whole dataset. Hence, the training process needs to be redone as new data appears. While offline learning might seem less efficient than online learning, because the entire dataset needs to be retrained, we will show later how offline learning can work efficiently alongside online learning if it is tuned properly. Random forest is a supervised technique with a high likelihood of producing good performance, sometimes even better than neural networks [104]. Random forest is mostly used offline, but it can also be implemented online [132, 169]. In general, however, online random forests demand more training data than their batch equivalents to attain comparable predictive performance. In addition to random forest, deep learning has shown success in different domains; however, deep models tend to perform better on unstructured data such as images, and they require a relatively large amount of data to perform well [44]. Moreover, such techniques need high computational power to be trained in a reasonable time.

6.3 System Design

6.3.1 System Overview

MEUZZ is the first machine learning-based hybrid fuzzer that learns from previously observed seeds and identifies which kinds of seeds have the potential to better explore the program under test. Figure 6.2 shows an overview of MEUZZ. MEUZZ starts fuzzing (1) a program with predefined or empty seeds. It then extracts features (2) from the program as well as the seeds (§6.3.3) to model coverage gains. These features are used to predict (3) the coverage potential of unknown seeds (§6.3.5). The concolic engine (4) then receives the highly influential seeds predicted in the prior step and produces mutated seeds. Next, MEUZZ guides the fuzzer to use these seeds and their generated mutants (by the evolutionary algorithms) to continually test the program. At the beginning, the prediction model is randomly initialized, so the prediction quality is uncertain. But as fuzzing continues, the model improves and provides more reliable predictions. MEUZZ updates the seed selection model in three steps. First, it grows the descendant trees (5) of the seeds generated by the concolic engine in step (4); it then derives a label (6) based on the descendant trees of the previously selected seeds (§6.3.4); finally, it updates or retrains the model (7) depending on the type of learning process (§6.3.5, §6.3.6).

6.3.2 System Requirements

MEUZZ aims to predict seed utility in a more accurate and generalizable fashion than existing heuristic-based approaches, while keeping the fuzzing efficiency intact. One of the steps that contributes the most to these goals is feature extraction. MEUZZ can potentially derive various semantic features because it has access to complex program structures, such as the Control Flow Graph (CFG) with sanitizer instrumentations. However, there are some challenges

[Figure 6.2 depicts the hybrid fuzzing loop (fuzzer and concolic engine) extended with an ML engine: feature extraction feeds hybrid features (e.g., number of reachable bugs, indirect calls, comparisons) into offline, online, or ensemble models that predict and select top-potential seeds, while seed descendant trees feed label inference and model training to update the seed selection model.]

Figure 6.2: System overview of MEUZZ. The coordinator is extended with an ML engine, which consists of four modules: feature extraction, label inference, prediction, and training. During fuzzing, utility prediction and model training are carried out consecutively. After extracting features for the inputs in the fuzzer's queue, the ML engine predicts their utilities based on the current model. Then, with the seed labels inferred from previously selected seeds, the model is trained iteratively on the new data.

that MEUZZ may encounter during feature extraction, because it requires adapting the ML engine to the online-style fuzzing workflow. To cope with such challenges, we impose the following requirements (R1–R3) to guide the feature engineering stage.

R1 - Utility Relevant: The ultimate goal of fuzzing is more code coverage as well as discovering more hidden bugs. The features should reflect characteristics that may improve these measures, for instance, how likely a seed is to trigger more potential bugs, or how much unexplored code a mutated seed will reach during its execution. Obviously, a seed is only meaningful in context, i.e., relative to the program it is executed upon. Accordingly, it is essential for feature extraction to consider the seed and the program as a bundle.

R2 - Seed-/Program-Agnostic: To achieve good generalizability, the features should be seed- and program-agnostic. As a counterexample, one could engineer a boolean feature that indicates whether a seed is genuine based on a magic number, so as to ignore invalid seeds when fuzzing a specific program. However, such a feature is target-dependent and needs to be customized for each program. In contrast, "meta properties" like the execution path triggered by the input are preferable, as they are universally usable characteristics.

R3 - Online Friendly: In order not to sacrifice efficiency compared to heuristic-based approaches, what matters is not only how efficiently the features can be extracted, but also how many features enter model construction. Lightweight features and a rich feature set ensure that the coordinator is not blocked from launching the concolic executor while still being able to construct meaningful models to predict seed utility. As a result, suitable features should strike a balance between the analysis goal (i.e., how informative the analysis result is) and the computation complexity (i.e., the time complexity of the analysis).

6.3.3 Feature Engineering

Following the requirements (R1–R3), we engineer the following list of features in MEUZZ and discuss them in four categories. The feature set is easily extensible, as long as the new features comply with the requirements (§6.3.2).

Bug-triggering: Inspired by existing research [78], we use the number of reachable sanitizer instrumentations as guidance for measuring how likely bugs can be triggered. As sanitizer instrumentations are based on sound analysis (i.e., no missed bugs), they provide a good over-approximation when quantifying the number of bugs that can be found. Hence, we extract these two features:

1. Count of reachable sanitizer instrumentations: For all branches along the path triggered by a given seed, the numbers of reachable sanitizer instrumentations are computed and then summed up. For instance, there are two branches in the left example of Figure 6.3; six potential bugs are reachable by following the branches, so the value of this feature is six.

2. Count of reached sanitizer instrumentations: For all branches along the path triggered by a given seed, we sum up the number of sanitizer instrumentations reached by the fuzzer. The major difference from the prior feature is that this one reflects the expectation of immediately solvable sanitizer bugs, while the former is an indirect reflection. For instance, the value of this feature in the right example of Figure 6.3 is two, because the bugs can be directly reached by negating the constraints of b1 and b2.

[Figure 6.3 shows two example seed execution paths annotated with branch conditions and sanitizer labels.]

Figure 6.3: Examples showing how the bug-triggering and coverage features are computed.

Coverage: Concolic execution is good at solving complex branch conditions. Hence, if the concolic executor encounters many unsolved branches when executing the given input, it can significantly improve code coverage. The most common situation where concolic execution helps is a conditional statement (i.e., if-then-else or switch-case). As the given input follows only one of the branches, we call the branches stemming from the same conditional statement neighbor branches. We extract the following feature to estimate potential new coverage.

1. Count of undiscovered neighbor branches: For all branches along the path triggered by the given seed, this feature sums the undiscovered neighbors of each branch, regardless of sanitizer instrumentation. For instance, the value of this feature in the right example of Figure 6.3 is two if the seed follows the path with the continue labels.

Constraint Solving: We also devise a set of features that impact the solving capability of the concolic execution engine. The incentive behind selecting these features is that the performance of the concolic executor significantly influences the entire hybrid fuzzing system.

1. Count of external calls: Existing concolic executors either rely on a simulated procedure or simply terminate path execution when encountering an external function. As a result, external function calls may negatively impact the concolic executor, for example by misleading the path or causing it to fail to generate correct seeds. This feature records the count of external function calls along the path executed by the given seed.

2. Count of comparison instructions: This feature records the count of cmp instructions along the path executed by the given seed. Comparison instructions pose the constraints on the execution path, which will later be solved by the SMT solver. However, constraint solving is very time-consuming, and it is often the reason for timeouts.

3. Count of indirect calls: This is the number of indirect call instructions along the path executed by the given seed. Indirect calls may cause state explosion: when the concolic executor encounters an indirect call with a symbolic pointer, it simply forks a state for each possible value the symbolic pointer can resolve to [171]. In large programs, there can be many possible values for a symbolic function pointer.

4. Length of path: This feature records the number of executed branches (not deduplicated) for the given input. It helps identify the existence of large loops, another common cause of state explosion and solver timeouts.

Empirical: This set of features is devised based on empirical observations by existing works. They might indirectly affect the fuzzing performance.

1. Input size: The size of the input is often employed by existing tools as a scheduling heuristic. On the one hand, smaller inputs often finish executing more quickly, leaving more time for the fuzzer or concolic executor to explore other inputs [36, 207]. On the other hand, larger inputs have a better chance of triggering more functionality [78]. Therefore, we consider the input size as one of the potential features for our approach.

2. First seed with new coverage: This is a boolean value indicating whether the given seed is the first to discover some new branches. It is based on the intuition that such seeds are more likely to trigger further new coverage. This feature is used in many popular fuzzers [24, 134].

3. Queue size: This feature records how many inputs are saved in the fuzzing queue at the time of the query. The longer the queue, the less likely it is to see more coverage. Since MEUZZ needs to predict the utility of each seed at runtime, namely how much new coverage fuzzing with the given input can discover, the prediction should take the current status of fuzzing into account.
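Putting the four categories together, a feature vector for one seed might be assembled as follows. The per-branch record fields (`reachable_san`, `cmps`, etc.) are hypothetical names for the quantities described above, not MEUZZ's actual data format:

```python
# Assemble one seed's feature vector from its execution path plus fuzzing
# status. `path` is a list of per-branch records (one per executed branch).
def extract_features(path, queue_size, is_first_new_cov, input_size):
    feats = {
        "reachable_sanitizer_labels": 0,  # bug-triggering (over-approximate)
        "reached_sanitizer_labels": 0,    # bug-triggering (directly solvable)
        "undiscovered_neighbors": 0,      # coverage potential
        "external_calls": 0,              # constraint-solving difficulty
        "cmp_instructions": 0,            # constraint-solving difficulty
        "indirect_calls": 0,              # constraint-solving difficulty
        "path_length": len(path),         # executed branches, not deduplicated
        "input_size": input_size,         # empirical
        "new_cov": int(is_first_new_cov), # empirical
        "queue_size": queue_size,         # empirical (current fuzzing status)
    }
    for br in path:  # sum the per-branch counts over the whole path
        feats["reachable_sanitizer_labels"] += br["reachable_san"]
        feats["reached_sanitizer_labels"] += br["reached_san"]
        feats["undiscovered_neighbors"] += br["undiscovered_neighbors"]
        feats["external_calls"] += br["external_calls"]
        feats["cmp_instructions"] += br["cmps"]
        feats["indirect_calls"] += br["indirect_calls"]
    return feats
```

On a two-branch path like the Figure 6.3 examples (six reachable labels, two reached labels, two undiscovered neighbors in total), the sketch yields exactly those sums.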

6.3.4 Seed Label Inference

Labeling is an indispensable stage of data preprocessing in supervised learning, and well-defined labels make prediction easier and more reliable. As we aim to predict the utility of a selected seed and there is no direct indication of whether the selected seed is definitely useful, we need to derive a label that reflects the degree of the seed's utility. To understand a seed's utility, we need to fuzz the program with that seed and check the outcome. Fuzzers that use genetic algorithms (GAs) for seed generation represent this outcome as a forest of input descendant trees, which depict the parent-child relationships of the seeds in the fuzzer's queue. Each node of a tree represents a seed, and each edge connects a seed to one of its mutants. In plain fuzzing, the root nodes are the original seeds provided by the user; in hybrid testing, the root nodes also include seeds generated by the concolic engine and imported by the fuzzer. A larger descendant tree indicates that the seed contributed more to the fuzzer's code coverage. Hence, to derive the label, we measure the size of a seed's input descendant tree (i.e., the number of nodes in its subtree) and use it as the label. In reality, it is not feasible to compute the complete descendant tree, since it could grow indefinitely if the user never terminates the fuzzing process. As a result, we limit the tree analysis to a time window to make label inference possible. Specifically, after the fuzzer imports a seed from the concolic executor, we wait a certain number of fuzzing epochs for the fuzzer to explore the imported seed and then compute the size of its descendant tree.
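The label inference above can be sketched as a generic subtree-size computation. The parent-child map below is an assumed representation of the fuzzer's descendant forest, not MEUZZ's actual bookkeeping:

```python
from collections import defaultdict

# A seed's label is the size of its descendant tree (number of nodes in its
# subtree) within the measurement window. `parent_of` maps each mutant to the
# seed it was derived from; `roots` are the imported (or initial) seeds.
def descendant_tree_sizes(parent_of, roots):
    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)

    def subtree_size(node):
        # Count the node itself plus all of its (transitive) mutants.
        return 1 + sum(subtree_size(c) for c in children[node])

    return {root: subtree_size(root) for root in roots}
```

For example, a seed with two direct mutants, one of which itself spawned a mutant, gets label 4, while a seed with a single mutant gets label 2.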

6.3.5 Model Construction and Prediction

The next step after preparing the data is to predict the seed prominence (i.e., the label). As the labels are continuous values (i.e., numbers of nodes), the model behind the prediction should be a regression model. Hence, we embed a regression model in MEUZZ such that when new seeds are generated by the fuzzer, the model predicts the utility of the seeds and transfers the most promising seeds to the concolic engine. By doing so, the concolic engine runs only on the particular seeds that are most likely capable of discovering unexplored paths, saving it a great deal of time and effort. As seeds are mutated continuously while fuzzing a program in real time, prediction and model updates need to be done within a limited time window. This limitation makes online learning approaches desirable candidates for model construction. In online learning, the model can be incrementally updated by considering only new data. It does not need to store all previous data and learn a model from scratch every time. Instead, the model can be updated iteratively based on the incoming input, the previous model, and the historical fuzzing yields. Such an update is very fast and requires little storage, which fits our use case very well. Thus we adopt online learning as one of the techniques for model construction. In addition to online learning approaches, we include offline learning techniques as well. Although offline learning is known to be inefficient for real-time streaming data, its main bottleneck on such data is updating the model, not making predictions. We discuss updating the models further in Section 6.3.6.

6.3.6 Updating Model

To ensure the model stays up to date with the prevailing seeds, ideally we need to dynamically update or retrain the model, depending on the learning type (i.e., online vs. offline). By doing so, we not only predict in real time but also learn in real time. For online learning, we use the Recursive Least Squares (RLS) algorithm [66, 176] to update our linear model. Suppose at time t the input data and the label are x_t and y_t respectively, where x_t is a vector of dimension d. The following formula shows how the weight of the model at time t (i.e., w_t) is updated based on the weight of the previous model (i.e., w_{t-1}):

w_t = w_{t-1} + C_t^{-1} x_t [ y_t - x_t^T w_{t-1} ]

where C_t^{-1} is the inverse of C_t, and C_t is defined as:

C_t = \sum_{i=1}^{t} x_i x_i^T + I

Note that to calculate C_t^{-1}, we do not need to store all previous data and compute the inverse. Based on the Woodbury formula, C_t^{-1} can also be updated recursively as follows:

C_t^{-1} = C_{t-1}^{-1} - \frac{C_{t-1}^{-1} x_t x_t^T C_{t-1}^{-1}}{1 + x_t^T C_{t-1}^{-1} x_t}

The complexity of such an update is O(d^2). To update the offline learning algorithms, the model needs to be retrained every time new data appears. Although retraining the model on the whole dataset every time a new seed arrives seems time-consuming, we show in our evaluation that it is still practical in our case (§6.4.3), because the seeds are not of very high dimension and the number of seeds that need to be retrained stays within an acceptable order of magnitude. When the model has seen only a few samples, it predicts naively or almost randomly at the beginning, like other existing approaches; however, the predictions become more reliable as more seeds are generated.
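The RLS update can be sketched in a few lines (a generic implementation of the formulas above, assuming numpy; this is an illustration, not MEUZZ's code):

```python
import numpy as np

class RecursiveLeastSquares:
    """Online linear regression via RLS: maintains C_t^{-1} incrementally
    through the Woodbury identity, so each update costs O(d^2) and no past
    data needs to be stored."""

    def __init__(self, d, init_scale=1e3):
        self.w = np.zeros(d)
        # C_inv approximates C_0^{-1}; a large initial scale means weak
        # regularization from the identity term.
        self.C_inv = np.eye(d) * init_scale

    def update(self, x, y):
        # Woodbury update: C_t^{-1} = C_{t-1}^{-1}
        #   - (C_{t-1}^{-1} x x^T C_{t-1}^{-1}) / (1 + x^T C_{t-1}^{-1} x)
        Cx = self.C_inv @ x
        self.C_inv -= np.outer(Cx, Cx) / (1.0 + x @ Cx)
        # Weight update: w_t = w_{t-1} + C_t^{-1} x (y - x^T w_{t-1})
        self.w += self.C_inv @ x * (y - x @ self.w)

    def predict(self, x):
        return x @ self.w
```

Feeding a stream of (feature vector, label) pairs through `update` converges to the least-squares weights without ever revisiting old samples.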


Table 6.1: Evaluation settings

Name      Version  Driver    Initial Seeds  Options
tcpdump   4.10.0   tcpdump   [41]           -r @@
binutils  2.32     objdump   [12]           -D @@
binutils  2.32     readelf   [12]           -A @@
libxml    2.9.9    xmllint   [29]           stdin
libtiff   4.0.10   tiff2pdf  [28]           @@
libtiff   4.0.10   tiff2ps   [28]           @@
jasper    2.0.16   jasper    [26]           -f @@ -T pnm
libjpeg   jpeg9c   djpeg     [26]           stdin

6.4 Evaluation and Analysis

We conduct a comprehensive set of experiments to answer the following research questions:

• RQ1: Can ML-based seed scheduling outperform heuristic-based approaches (§6.4.2 and §6.4.6)?

• RQ2: Which features are more important in predicting seed utility, and which learning mode is more effective (§6.4.3)?

• RQ3: Does the learned model adapt well to different fuzzing configurations (§6.4.4)?

• RQ4: Is it feasible to transfer the learned model from one program to others to improve fuzzing yields (§6.4.5)?

6.4.1 Evaluation setup

Following the general fuzzing evaluation guideline [128], we use 8 real-world benchmark programs that are widely used in existing works [65, 75, 78, 208, 212]. Table 6.1 shows the configurations used for fuzzing each program. All experiments are conducted on AWS c5.18xlarge servers running Ubuntu 16.04 with 72 cores and 281 GB RAM. Unless explicitly mentioned otherwise, all tests run for 24 hours each and are repeated at least 5 times; we report the average result with the Mann-Whitney U-test. To evaluate MEUZZ, we compare it with several state-of-the-art grey-box fuzzers (AFL [134], AFLFast [65], and Angora [75]) as well as heuristic-based hybrid testing systems such as QSYM [208] and SAVIOR [78].¹ For MEUZZ, we also consider three different configurations according to the learning process, namely MEUZZ-OL, MEUZZ-RF, and MEUZZ-EN, which refer to the online-learning linear model, the offline-learning random forest model, and the ensemble of the previous two models, respectively. Hence, in total, we evaluate 8 fuzzing configurations on 8 benchmark programs, for 24 hours and repeated at least 5 times, in each of the following evaluations (learning effectiveness, model reusability, and model transferability). For fair comparison, each fuzzer is strictly assigned three CPU cores. We launch one master and two slaves for the grey-box fuzzers, and one master, one slave, and one concolic execution engine for the hybrid fuzzers. Because SAVIOR requires instrumenting the tested program with UBSAN [43], we also apply this sanitizer to all other fuzzers.

¹We did not include Driller [184] and DigFuzz [211] mainly because they are powered by Angr [180], which does not support real-world programs well due to incomplete environment modeling.

6.4.2 Learning Effectiveness

The most straightforward metric for measuring the effectiveness of MEUZZ is code coverage, which is also a widely accepted evaluation metric. Figure 6.4 shows the branch coverage achieved by the different fuzzers over fuzzing time. Based on the coverage results, we have several interesting findings. Firstly, MEUZZ covers more code than the other fuzzers in most programs after 24 hours of fuzzing. Among the non-ML fuzzers, QSYM performs best in terms of code coverage, thanks to its efficient concolic execution engine tailored specifically for hybrid fuzzing. Compared with QSYM, the MEUZZ variants achieve various levels of coverage improvement. In tcpdump, objdump, readelf, and libxml, MEUZZ improves code coverage over QSYM by more than 10%, and by as much as 27.1% in the case of MEUZZ-RF on readelf. In tiff2pdf and tiff2ps, MEUZZ also shows moderate coverage improvements. However, in jasper and djpeg, there is not much difference between MEUZZ and QSYM; we speculate this is because all fuzzers saturate and hit a plateau after 16 and 6 hours, respectively. Secondly, MEUZZ covers less code in the beginning but gradually surpasses the other fuzzers as time progresses. For example, on objdump, MEUZZ-OL and MEUZZ-RF did not cross QSYM and SAVIOR until after 9.6 hours of fuzzing, but eventually achieved 14% higher code coverage. Similar situations can be observed in libxml, readelf, and tiff2ps. This observation complies with our design: MEUZZ starts seed scheduling with random parameters, so the performance of seed selection is unpredictable at the beginning; but as time passes, fuzzing data are increasingly collected and used to refine the prediction model, making the predictions more accurate. Lastly, we observe that the effectiveness of MEUZZ-OL and MEUZZ-RF is comparable: MEUZZ-OL outperforms MEUZZ-RF in 3 out of the 8 programs. On the other hand, MEUZZ-RF achieves a much higher fuzzing yield on programs like libxml and readelf. This is also the reason why MEUZZ-EN does not always yield better coverage. According to the No Free Lunch (NFL) theorem [205] for supervised machine learning, no individual algorithm performs best on every problem, and the underlying reason for good or bad performance is data correlation. If the data cannot be modeled linearly, techniques using ensembles of decision trees may properly capture the non-linear relations in the data; moreover, they are more robust to outliers. As discussed in §6.3, offline learning needs to retrain on the whole dataset every time new data appears, which might degrade efficiency. However, this evaluation suggests that the additional delay does not impact the fuzzing output, which we discuss in detail in the next section.

6.4.3 Insights and Analyses

Table 6.2: Execution time spent on different learning stages

Model Update (s)      Prediction (s)        Feature Extraction (s)
Online     Offline    Online     Offline
0.000636   0.326139   0.000016   0.003168   5e-6

Online vs. offline learning: As mentioned in the previous section, offline learning with the random forest model sometimes beats online learning with the linear model; however, the main concern with offline learning is time delay, especially during the model updating stage. To further analyze the delays caused by offline learning, we profile each learning stage during the 24 hours of fuzzing and report the average time spent on the different learning steps. As shown in Table 6.2, although offline learning spent 512x and 198x more time than online learning on updating the model and making predictions respectively, the absolute time lapse is negligible (i.e., milliseconds). Hence, offline learning is not a critical hindrance in the hybrid fuzzing loop, which endorses the offline learning effectiveness discussed in Section 6.4.2. Having said that, if fuzzing continues for a longer time and the number of seeds increases significantly, offline learning can become an obstacle. In conclusion, we recommend that users choose online learning when the target program has simpler program structures and the model needs to be learned from an immense number of seeds. On the other hand, if the fuzzing time budget is limited and the seeds are fewer, offline learning is the more desirable candidate.


Feature Analysis: Figure 6.5 shows the importance of the features. The importance score is computed as the mean decrease in impurity from the offline random forest models [68]. Apart from Queue Size and New Cov, most features contribute similarly to the model. Queue Size contributes the most, mainly because it directly correlates with the label: having found more coverage in the past indicates that it is harder to find new coverage in the future. New Cov contributes the least among the rest of the features. While it is difficult to entirely disregard the minor contribution of New Cov, this suggests that putting much effort into following the seeds that bring new coverage might jeopardize the chance to explore unknown seeds. This is the well-known Multi-Armed Bandit (MAB) problem [58]. This finding might shed some light on the scheduling algorithms implemented in popular fuzzers like AFL [134] that rely heavily on the New Cov feature. The figure also shows that some features, such as Path Length and New Cov, are less program-dependent, while others, such as Reachable Label, are more tied to a particular program. This is understandable: Path Length is extracted dynamically, so it is precise, and program loops, a common trait among all programs, are the main factor that affects it. Similarly, New Cov is set at runtime when a seed is the first to trigger new program behavior (e.g., coverage), so this feature applies uniformly to a wide variety of programs. On the other hand, the Reachable Label feature is extracted with a hybrid approach, namely compile-time analysis and runtime instrumentation; thus the sensitivity of the static analysis (i.e., flow/context/field sensitivity) affects its precision, and some programs can be analyzed more easily than others.
It is worth mentioning that, thanks to our light-weight feature engineering, the average time to extract each feature is only 5µs (as shown in Table 6.2), which indicates that the online-friendly requirement is satisfied in MEUZZ.
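As an illustration of how mean-decrease-impurity scores arise, the sketch below ranks features by the best variance reduction a single split achieves. This is a depth-1 simplification of what a random forest averages over many trees and nodes; the feature names and data are hypothetical, not MEUZZ's training set.

```python
import random

def variance(ys):
    """Impurity of a node for regression: variance of the labels."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((y - m) ** 2 for y in ys) / len(ys)

def impurity_decrease(xs, ys):
    """Best weighted variance reduction achievable by one split on this
    feature -- the per-node ingredient of mean-decrease-impurity scores."""
    n = len(ys)
    base = variance(ys)
    best = 0.0
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        dec = base - (len(left) / n) * variance(left) - (len(right) / n) * variance(right)
        best = max(best, dec)
    return best

random.seed(1)
# Hypothetical seed features [f0, f1, f2]: the label (utility) depends strongly
# on f0, weakly on f1, and not at all on f2.
rows = [[random.random() for _ in range(3)] for _ in range(200)]
ys = [5 * r[0] + 0.5 * r[1] for r in rows]
scores = [impurity_decrease([r[i] for r in rows], ys) for i in range(3)]
total = sum(scores)
importance = [s / total for s in scores]  # normalized so scores sum to 1
print(importance)
```

The dominant feature receives the bulk of the normalized importance, mirroring how Queue Size dominates in Figure 6.5; a real forest additionally averages these decreases over many randomized trees, which dampens the spurious gains a noise feature can collect.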

6.4.4 Model Reusability

Building machine learning models is a valuable but time-consuming task, so it is reasonable to build and reuse models where possible. By reusing a model, one can improve generalization, speed up training, and improve model accuracy. Reusability is also good evidence that our model correctly captures which inputs have higher utility when testing the target programs. Hence, we test the reusability of the models learned in the previous fuzzing experiments. We conduct an experiment in which we use a pre-trained model for fuzzing the same target program and compare the coverage difference. We make the following two changes to the experiment performed in §6.4.2: (i) the initial seeds are replaced by a naive input that consists of only 4 whitespaces; and (ii) all MEUZZ variants are initialized with the models they learned in the effectiveness test (with valid initial seeds). Figure 6.6 shows the coverage result with the Mann-Whitney U Test. There are several interesting observations. The most important one is that the MEUZZ variants perform well even at the beginning of fuzzing, compared with runs without model initialization. We believe this improvement is brought by the initial models. Additionally, "pure-AFL" fuzzers do not perform well with this naive initial seed; for instance, in tcpdump, AFL and AFLFast only generate 6 inputs in total after 24 hours of fuzzing (see Figure 6.6a). On the contrary, systems augmented with other input generation techniques, such as concolic execution and taint analysis, can generate more inputs and consequently explore significantly more code. Lastly, MEUZZ-RF outperforms its peers in djpeg, and its p-value indicates the improvement is significant (< 0.05), suggesting the non-linear model works better on djpeg.
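The significance test used throughout these comparisons is easy to reproduce. The sketch below implements the two-sided Mann-Whitney U test with the normal approximation and applies it to hypothetical branch-coverage numbers (the figures are illustrative, not our measured data).

```python
import math

def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test via the normal approximation
    (adequate for sample sizes around 20+; tied values receive midranks)."""
    combined = sorted((v, 0 if i < len(a) else 1)
                      for i, v in enumerate(list(a) + list(b)))
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):                  # assign midranks to tied values
        j = i
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1
        i = j + 1
    r1 = sum(r for r, (_, g) in zip(ranks, combined) if g == 0)
    n1, n2 = len(a), len(b)
    u1 = r1 - n1 * (n1 + 1) / 2
    mu, sigma = n1 * n2 / 2, math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mu) / sigma
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))  # two-sided
    return u1, p

# Hypothetical branch-coverage counts from 10 repeated runs of each fuzzer:
meuzz = [1260, 1280, 1290, 1310, 1330, 1350, 1370, 1400, 1420, 1440]
qsym = [1080, 1090, 1100, 1130, 1140, 1150, 1160, 1170, 1200, 1210]
u, p = mann_whitney_u(meuzz, qsym)
print(u, p)
```

Because the test is rank-based, it makes no normality assumption about coverage distributions, which is why it is the standard choice for comparing fuzzer runs.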

6.4.5 Model Transferability

To further evaluate the model reusability explained in the previous section, we conduct a cross-program experiment to determine whether a model trained on one program transfers well to fuzzing a new program. This is known as transfer learning in the ML field [155]. As far as we know, no prior research has shown this property in fuzzing [168]. In this experiment, we augment MEUZZ with a pre-trained model from one program and compare the fuzzer's results on different programs with a baseline. Our baseline is the coverage result from the learning effectiveness experiment (§6.4.2), in which we use valid seeds to bootstrap fuzzing without model initialization. We choose MEUZZ-OL to measure transferability since it is less affected by time delays. We then fuzz each program using MEUZZ-OL initialized with the 8 pre-learned models. Figure 6.7 visualizes the relative coverage improvements (i.e., percentages) produced by each fuzzing configuration. The Y-axis shows the tested program and the X-axis shows the programs on which the models were built. This result shows three interesting findings. Firstly, MEUZZ-OL observes 7.1% more coverage on average when it is tested on the same program it is initialized with. The improvement for each program is shown on the diagonal of Figure 6.7, from top left to bottom right. Note that these models are only learned in 24 hours

from previous experiments; we expect to see much more improvement with the help of continuous fuzzing services (e.g., [34]). This again confirms that the previously learned models are reusable. Secondly, MEUZZ-OL observes improvement in 38 out of 56 cross-testing cases, a 67.9% success rate when the model is transferred from one program to another. Among them, 10 cases see more than 10% coverage improvement. Such improvement also indicates that the program-agnostic requirement is satisfied in MEUZZ. Last but not least, we notice that different programs have different "sensitivity" towards the transferred models. For instance, almost all the transferred models can strengthen fuzzing of the readelf, tiff2pdf, tiff2ps and djpeg programs, among which readelf sees the highest improvement. Interestingly, readelf achieves even higher improvement when using the tcpdump model than when using its own. However, other programs only partially accept foreign models. For instance, the model of tcpdump improves almost all other programs, while no external model improves tcpdump's own fuzzing yield. Apart from the learning algorithms, the number of seeds and the different data distributions of the feature values are the two underlying reasons behind the ineffectiveness of some transferred models. When there is more data, the model generalizes better [115]. For instance, the tcpdump model is trained on a larger number of seeds than the others (see Figure ??), which explains the effectiveness of the model built from tcpdump. Moreover, we compared the feature importance of each program. The combination of Size, Indirect call and Path Length plays a more important role for tcpdump, suggesting that tcpdump makes heavier use of function handlers (for different types of network packets) and recursive loops (for parsing packet fields). Other models, in contrast, have different feature value distributions as well as fewer data points, which explains why they fail to improve fuzzing tcpdump.
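The quantities behind Figure 6.7 reduce to simple arithmetic. The sketch below computes the relative-improvement matrix and the cross-program success rate from hypothetical coverage numbers (the program names are real, the numbers are illustrative only).

```python
# Hypothetical branch coverage: baseline[p] is coverage with valid seeds and no
# model; initialized[p][m] is coverage when fuzzing p with a model trained on m.
baseline = {"readelf": 5000, "tcpdump": 9000, "djpeg": 3000}
initialized = {
    "readelf": {"readelf": 5400, "tcpdump": 5600, "djpeg": 5100},
    "tcpdump": {"readelf": 8800, "tcpdump": 9700, "djpeg": 8900},
    "djpeg":   {"readelf": 3100, "tcpdump": 3200, "djpeg": 3250},
}
# Each heat-map cell: relative coverage improvement over the baseline, in %.
improvement = {p: {m: round(100 * (c - baseline[p]) / baseline[p], 1)
                   for m, c in models.items()}
               for p, models in initialized.items()}
# Off-diagonal cells are transfer cases; count how many of them help.
transfers = [improvement[p][m] for p in improvement
             for m in improvement[p] if p != m]
success_rate = 100 * sum(1 for v in transfers if v > 0) / len(transfers)
print(improvement, success_rate)
```

In this toy data, readelf benefits more from the tcpdump model than from its own, while nothing improves tcpdump, echoing the asymmetry described above.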

6.4.6 Discovered Bugs

To demonstrate the effectiveness of our system in discovering new bugs, we performed several analyses. In addition to analyzing the crashes reported during fuzzing, we manually analyzed the undefined behaviors reported by MEUZZ. UBSan reports a large number of undefined behaviors; however, the majority of them are deemed benign after our triage process. We also compile the programs with AddressSanitizer [1] as well as LeakSanitizer [23] and run the instrumented programs on the inputs generated by MEUZZ during fuzzing. In total, we found 33 undefined behaviors, 6 memory errors such as heap overflows, 7 Denial of Service (DoS) bugs, and 4 memory leaks, among which


Table 6.3: The table shows the discovered bugs by MEUZZ. UB, ME, DoS, and ML refer to Undefined Behavior, Memory Error, Denial of Service, and Memory Leak, respectively.

Program     Potential UB    ME    DoS    ML    Confirmed
tcpdump          10          -     -     -         2
objdump           4          -     1     -         -
readelf           2          2     3     -         1
tiff2pdf          1          -     2     -         2
tiff2ps           1          4     1     -         4
jasper            4          -     -     4         4
djpeg            11          -     -     -         6
Total            33          6     7     4        19

19 have been confirmed or fixed so far by the developers and the rest are pending (see Table 6.3). In particular, we observed 2 and 4 unique heap overflow bugs in readelf and tiff2ps, respectively. One of the heap overflow vulnerabilities in tiff2ps was discovered only by MEUZZ among the hybrid fuzzers. Figure 6.8 shows the vulnerable code snippet. This bug has been confirmed and fixed by the developers. It is an out-of-bound read vulnerability that leads to information disclosure. The vulnerability lies in the PSDataColorContig function, where the cp buffer, 4 bytes in size, is allocated on the heap and the 5th element of the buffer is then accessed as cp[4], which leads to the out-of-bound read. To trigger this bug, the loop needs to execute without early breaks, and to control the buffer size, the input needs to satisfy many constraints in the TIFFScanlineSize function so that it returns the value 4. Looking at the feature importance for tiff2ps, Cmp and Path Length play more important roles in its model; we believe this is why MEUZZ is able to guide the fuzzer to explore and trigger this bug. Although we conducted a variety of experiments when triaging UB bugs, they are not sufficient to evaluate the consequences of the UB cases for the functionality of the programs. Therefore, we discussed them with those developers of these projects who responded to us. For instance, in the libjpeg library, it turns out some undefined behaviors are used intentionally. Here is an example where UBSan complains that a left shift by 8 places cannot be represented in type long:

get_buffer = (get_buffer << 8) | c;

According to the developer: "the get_buffer variable is deliberately meant to be running through the compressed data continuously, inserting new data from right, shifting processed data out to the left, and reading data for processing in the middle. The latter operation is performed by macros GET_BITS and PEEK_BITS which involve a bit mask operation to ignore the bits in the leftmost position. So these leftmost bits which are eventually shifted out have no effect on the processing of the data". This statement shows that, in some cases, programming techniques used by developers constitute undefined behavior from UBSan's perspective. Thus, it is challenging to label a UB case as a bug or as benign without sufficient knowledge of the structure and logic of the program.
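To see why the masked reads are unaffected by the flagged shift, one can emulate the bit buffer with unbounded Python integers and an explicit width mask. This is a simplification for illustration, not libjpeg's actual macros.

```python
BUFFER_BITS = 32
MASK = (1 << BUFFER_BITS) - 1

def refill(get_buffer, byte):
    """Emulates the (get_buffer << 8) | c refill with an explicit width mask;
    in C, the left shift of an already-full signed buffer is what UBSan flags."""
    return ((get_buffer << 8) | byte) & MASK

def peek_bits(get_buffer, bits_left, n):
    """PEEK_BITS-style read of n bits from the middle of the buffer; the mask
    discards the stale leftmost bits, so they can never leak into a result."""
    return (get_buffer >> (bits_left - n)) & ((1 << n) - 1)

buf = 0
for c in [0xAB, 0xCD, 0xEF, 0x12, 0x34, 0x56]:  # old bytes shift out on the left
    buf = refill(buf, c)
print(hex(buf))  # -> 0xef123456
```

After six refills only the most recent four bytes remain, and `peek_bits(buf, 32, 8)` yields `0xEF`: exactly the developer's point that the bits shifted past the top never influence decoding.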

6.5 Discussions

6.5.1 Applicability of different machine learning models

6.5.1.1 Reinforcement learning

The basic goal of reinforcement learning (RL) is to decide what action to take in each given state such that the final reward is maximized. Although RL might seem a suitable candidate for the seed selection task in fuzzing, there are some caveats in applying it. RL requires a predefined, low-dimensional action space (e.g., robot arm movements), while in fuzzing the actions of selecting seeds form a much higher-dimensional space (e.g., taken or not taken, take one or take two, etc.). It is hard to define the action for selecting which seed, or which group of seeds, to fuzz.

6.5.1.2 Rank-based learning

Rank-based learning learns how to rank a set of data based on predefined preferences. What distinguishes it from traditional learning is that instead of predicting an absolute ground-truth score for each data point, rank-based learning predicts the relationship between data points; the input to learning is a set of features and their ranks. Although this aligns best with the goal of the seed selection task in fuzzing, the approach is not efficient enough: its complexity is quadratic, because rank-based learning needs to learn from its data in a pairwise manner. However, it is possible to take the model training step out of the hybrid fuzzing loop to offset the impact of the training delay. We leave this exploration to future work.
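The quadratic blow-up comes from the pairwise transform itself, which a short sketch makes concrete (the features and ranks here are hypothetical):

```python
from itertools import combinations

def to_pairwise(features, ranks):
    """Convert n ranked seeds into O(n^2) pairwise training samples:
    (x_i - x_j, +1) if seed i should be scheduled before seed j, else -1.
    This pairwise expansion is the quadratic cost discussed above."""
    pairs = []
    for i, j in combinations(range(len(features)), 2):
        if ranks[i] == ranks[j]:
            continue  # no preference between equally ranked seeds
        diff = [a - b for a, b in zip(features[i], features[j])]
        label = 1 if ranks[i] < ranks[j] else -1  # rank 1 = highest priority
        pairs.append((diff, label))
    return pairs

# Hypothetical 2-feature seeds with a known preference order:
feats = [[0.9, 0.1], [0.5, 0.5], [0.1, 0.9]]
ranks = [1, 2, 3]
pairs = to_pairwise(feats, ranks)
print(len(pairs))  # n*(n-1)/2 = 3 pairs for 3 seeds
```

A linear model trained on these difference vectors (RankSVM-style) would recover the ordering, but a queue of thousands of seeds would expand to millions of pairs, which is why we keep such training out of the critical path.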


6.5.2 Applicability of MEUZZ on grey-box fuzzing

In this work, we propose using machine learning to learn, from a set of features, which seeds have better utility. Although this methodology is applicable to plain grey-box fuzzing as well, one must consider the delay caused by the realtime computation of features and labels. We envision that it is possible to pull the learning process, including data collection and model updating, out of the critical path, namely extracting features and updating the model in the background, so that the delay introduced by these extra computations will not block the fuzzer.
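One way to realize this is a background trainer thread fed through a queue, so the fuzzer only performs O(1) hand-offs and reads the latest model snapshot. The following is a minimal sketch with a toy linear model, not MEUZZ's implementation.

```python
import queue
import threading

class BackgroundLearner:
    """Keeps model updates off the fuzzer's critical path: the fuzzing loop
    only enqueues (features, label) pairs and reads the current snapshot."""

    def __init__(self):
        self.updates = queue.Queue()
        self.model = [0.0, 0.0, 0.0]          # linear weights snapshot
        self._lock = threading.Lock()
        self._worker = threading.Thread(target=self._train, daemon=True)
        self._worker.start()

    def submit(self, x, y):
        """Called by the fuzzer: O(1), never blocks on training."""
        self.updates.put((x, y))

    def predict(self, x):
        """Scores a seed with whatever model snapshot is current."""
        with self._lock:
            w = list(self.model)
        return sum(wi * xi for wi, xi in zip(w, x))

    def _train(self):
        """Runs in the background: one SGD step per queued sample."""
        while True:
            x, y = self.updates.get()
            with self._lock:
                err = sum(wi * xi for wi, xi in zip(self.model, x)) - y
                self.model = [wi - 0.05 * err * xi
                              for wi, xi in zip(self.model, x)]
            self.updates.task_done()

learner = BackgroundLearner()
for _ in range(200):
    learner.submit([1.0, 0.0, 0.0], 2.0)  # observed utility for this pattern
learner.updates.join()                    # demo only; the fuzzer never waits
print(round(learner.predict([1.0, 0.0, 0.0]), 2))  # -> 2.0
```

The `join()` here exists only to make the demo deterministic; in the envisioned design the fuzzer keeps scheduling with a slightly stale model while the worker drains the queue.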


Figure 6.4: Branch coverage fuzzing with valid seeds (higher is better). p1, p2 and p3 are p-values of the Mann-Whitney U Test comparing QSYM with MEUZZ-OL, MEUZZ-RF and MEUZZ-EN, respectively. Panels: (a) tcpdump (p1=0.071, p2=0.005, p3=0.082); (b) objdump (p1=0.044, p2=0.056, p3=8.2e-4); (c) libxml (p1=0.035, p2=0.059, p3=0.054); (d) tiff2pdf (p1=8.2e-4, p2=5.6e-4, p3=6.2e-5); (e) tiff2ps (p1=0.035, p2=0.091, p3=0.017); (f) jasper (p1=0.037, p2=0.192, p3=0.015); (g) readelf (p1=0.012, p2=0.093, p3=8.2e-4); (h) djpeg (p1=0.072, p2=0.021, p3=0.093).


Figure 6.5: The box plots show the importance of the features on nine programs. The importance is extracted by training an offline random forest model and they are ranked by the median of their importance. Queue Size and New Cov are the most and the least important ones, respectively.


Figure 6.6: Branch coverage fuzzing with naive seeds (higher is better). p1, p2 and p3 are p-values of the Mann-Whitney U Test comparing QSYM with MEUZZ-OL, MEUZZ-RF and MEUZZ-EN, respectively. Panels: (a) tcpdump (p1=0.047, p2=0.018, p3=0.026); (b) objdump (p1=0.051, p2=0.002, p3=0.005); (c) libxml (p1=0.072, p2=0.032, p3=0.026); (d) tiff2pdf (p1=0.02, p2=0.03754, p3=5.7e-3); (e) tiff2ps (p1=6.04e-4, p2=0.012, p3=5.6e-3); (f) jasper (p1=0.264, p2=0.0268, p3=1.3e-3); (g) readelf (p1=0.03, p2=0.072, p3=0.037); (h) djpeg (p1=6.04e-3, p2=0.012, p3=3.68e-3).


Figure 6.7: This heat map shows the coverage improvement of model-initialized MEUZZ-OL over vanilla MEUZZ-OL. The Y-axis is the tested programs; the X-axis is the models used for initialization. Each cell shows the relative coverage comparison (%). The diagonal values show the coverage improvement on each program after initializing MEUZZ with the model learned from the same program (reusability). Model transferability is shown in 7 out of the 8 programs.

for (; cc < tf_bytesperrow; cc += samplesperpixel) {
    adjust = 255 - cp[nc];
    switch (nc) {
    case 4: c = *cp++ + adjust; PUTHEX(c,fd);
    case 3: c = *cp++ + adjust; PUTHEX(c,fd);
    case 2: c = *cp++ + adjust; PUTHEX(c,fd);
    case 1: c = *cp++ + adjust; PUTHEX(c,fd);
    }
}

Figure 6.8: Off-by-one heap read overflow in tiff2ps.


Chapter 7

Conclusion

In-process memory abuse has become a major threat to system and software security, as secret data and code present in memory are often unprotected and subject to exfiltration or illegitimate access. One major vector of in-process abuse is the exploitation of memory corruption bugs, and the pervasiveness of such bugs in type-unsafe languages like C and C++ further complicates defense. Moreover, as exploitation techniques grow more and more advanced, contemporary software mitigation techniques such as stack canaries, DEP, ASLR and CFI are no longer sufficient to safely guard the victim program. Last but not least, the status quo is that once an attacker bypasses the deployed exploit mitigations, the game is over: there is not enough support from the language runtime or the operating system to protect sensitive information, even when developers are willing to cooperate. Aiming to solve these problems, this thesis proposes both runtime protection and offline detection techniques to jointly thwart in-process abuse.

Runtime mitigation and isolation: In the first part of the thesis, we present both advanced software mitigations and in-process memory isolation techniques. Contemporary software exploits rely on info-leak bugs to aid code reuse attacks, owing to the prevalent deployment of DEP and ASLR. To further raise the bar for exploitation, we introduced a compiler-assisted code randomization (CCR) technique that enables fine-grained software randomization. We designed this hybrid approach to be compatible with the current software deployment model and, more importantly, to ensure the correctness of the rewritten software on the client side. Our evaluation of CCR on the SPEC CPU2006 benchmarks shows its feasibility and practicality, as it incurs a negligible average runtime overhead (0.28%). However, CCR alone is not sufficient, as just-in-time code reuse attacks can disclose gadgets

on the fly and bypass even the finest-grained randomization (i.e., instruction-level randomization). I then present NORAX, a framework that retrofits execute-only memory (XOM) protection into source-unavailable ARM64 programs. By making code pages non-readable, these programs can benefit from fine-grained randomization techniques (e.g., CCR). NORAX makes use of a new hardware feature in ARMv8 chips to enable XOM and incurs very low runtime (1.18%) and memory (2.21%) overheads, based on our evaluation on UnixBench and a set of system programs and libraries. For the case when mitigation techniques are bypassed, we propose as a last line of defense a fine-grained in-process memory isolation technique, namely shreds: a set of OS-backed programming primitives that addresses developers' currently unmet needs for fine-grained, convenient, and efficient protection of sensitive memory content against in-process adversaries. A shred can be viewed as a flexibly defined segment of a thread execution (hence the name). Each shred is associated with a protected memory pool, which is accessible only to code running in the shred. Unlike previous work, shreds offer in-process private memory without relying on separate page tables, nested paging, or modified hardware. In addition, shreds provide the essential data flow and control flow guarantees for running sensitive code. Thanks to the hardware-based mechanism and light-weight in-shred mitigations, and based on our experiments on 5 real-world programs, shreds are fairly easy to use and incur low runtime overhead.

Offline software bug detection: When runtime protections are not feasible for compatibility or performance reasons, we need to employ offline bug detection techniques to find memory corruptions and fix them before they are exploited. In the second part of the thesis, I present automated tools that enable both broad and in-depth fuzz testing. Firstly, we introduce SAVIOR, a bug-driven hybrid testing approach. Unlike mainstream hybrid testing tools, which follow a coverage-driven design, SAVIOR moves towards being bug-driven. We accordingly propose in SAVIOR two novel techniques, named bug-driven prioritization and bug-guided verification, respectively. On the one hand, SAVIOR prioritizes the concolic execution of seeds with higher potential of leading to vulnerabilities. On the other hand, SAVIOR examines all vulnerable candidates along the running program path in concolic execution. By modeling the unsafe conditions as SMT constraints, it either solves for proofs of valid vulnerabilities or proves that the corresponding vulnerabilities do not exist. SAVIOR significantly outperforms the existing coverage-driven tools. On average, it detects vulnerabilities 43.4% faster than DRILLER and 44.3% faster than QSYM, resulting in the discovery of 88 and 76 more security violations in 24 hours. Secondly, we present MEUZZ, a learning-based hybrid fuzzing system whose seed scheduling

process is guided by supervised machine learning. Theoretically, MEUZZ is more generalizable than systems using fixed seed selection heuristics. For effective integration of machine learning workloads into the online hybrid fuzzing loop, MEUZZ follows the requirements of being utility-relevant, online-friendly and program-agnostic in its feature engineering and label inference. Our evaluation shows that MEUZZ outperforms state-of-the-art fuzzers in coverage. In addition, the learned models demonstrate good reusability and transferability, making it more practical to apply machine learning to hybrid fuzzing. More importantly, MEUZZ found a heap-overflow bug in libtiff, a well-tested library, under the same fuzzing configurations; this bug was not found by any other hybrid fuzzing system.

Bibliography

[1] Addresssanitizer. https://clang.llvm.org/docs/AddressSanitizer.html.

[2] Afl technical details. http://lcamtuf.coredump.cx/afl/technical_details.txt.

[3] Aflgo source code. https://github.com/aflgo/aflgo.

[4] American fuzzy lop. http://lcamtuf.coredump.cx/afl.

[5] Android compatibility test suite. https://source.android.com/compatibility/cts/index.html.

[6] Android executables mandatorily need to be pie. https://source.android.com/security/enhancements/enhancements50.html.

[7] Angora source code. https://github.com/AngoraFuzzer/Angora.

[8] angr/tracer: Utilities for generating dynamic traces. https://github.com/angr/tracer.

[9] Apollo: an open autonomous driving platform. https://github.com/ApolloAuto/apollo.

[10] Apple warns developers when binaries are not compiled as position-independent. https://developer.apple.com/library/content/qa/qa1788/_index.html.

[11] Arm 32-bit sandbox. https://developer.chrome.com/native-client/reference/sandbox_internals/arm-32-bit-sandbox.

[12] Binutils test cases. https://github.com/mirrorer/afl/tree/master/testcases/others/elf.


[13] clang: a c language family frontend for llvm. http://clang.llvm.org/.

[14] Darpa cyber grand challenge. http://archive.darpa.mil/cybergrandchallenge/.

[15] Domain access control register. http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0434b/CIHBCBFE.html.

[16] Driller reading file patch. https://github.com/shellphish/driller/issues/48.

[17] Driller stuck heuristic. https://github.com/shellphish/driller#the-stuck-heuristic.

[18] Dwarf standards. http://www.dwarfstd.org.

[19] EL0 execute-only memory configuration. https://armv8-ref.codingbelief.com/en/chapter_d4/d44_1_memory_access_control.html.

[20] Index of /gnu/binutils. https://ftp.gnu.org/gnu/binutils/.

[21] jasper source code. https://github.com/mdadams/jasper/archive/master.zip.

[22] Klee intrinsiccleaner. https://github.com/klee/klee/blob/master/lib/Module/IntrinsicCleaner.cpp.

[23] Leaksanitizer. https://clang.llvm.org/docs/LeakSanitizer.html.

[24] libfuzzer – a library for coverage-guided fuzz testing. https://llvm.org/docs/LibFuzzer.html.

[25] libjpeg source code. https://www.ijg.org/files/jpegsrc.v9c.tar.gz.

[26] Libjpeg test cases. https://github.com/mirrorer/afl/tree/master/testcases/images/jpeg.

[27] Libtiff source code. https://download.osgeo.org/libtiff/.

[28] Libtiff test cases. https://github.com/mirrorer/afl/tree/master/testcases/images/tiff.


[29] Libxml test cases. https://github.com/mirrorer/afl/tree/master/testcases/others/xml.

[30] libxml2 source code. http://xmlsoft.org/libxml2/libxml2-git-snapshot.tar.gz.

[31] Memory domains. http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0211k/Babjdffh.html.

[32] The myth of "bug free" software. https://www.betabreakers.com/the-myth-of-bug-free-software/.

[33] Objdump overflow patch. https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commitdiff;h=f2023ce7.

[34] Oss-fuzz - continuous fuzzing for open source software. https://github.com/google/oss-fuzz.

[35] Profile guided optimization. https://clang.llvm.org/docs/UsersManual.html#profile-guided-optimization.

[36] Qsym: A practical concolic execution engine tailored for hybrid fuzzing. https://github.com/sslab-gatech/qsym.

[37] Qsym source code. https://github.com/sslab-gatech/qsym.

[38] Run angora on lava dataset. https://github.com/AngoraFuzzer/Angora/blob/master/docs/lava.md.

[39] T-fuzz source code. https://github.com/HexHive/T-Fuzz.

[40] Tcpdump source code. http://www.tcpdump.org/release/.

[41] Tcpdump test cases. https://github.com/the-tcpdump-group/tcpdump/ tree/master/tests.

[42] The LLVM Compiler Infrastructure. http://llvm.org.

[43] Undefined behavior sanitizer - clang 9 documentation. http://clang.llvm.org/ docs/UndefinedBehaviorSanitizer.html#ubsan-checks.


[44] When does deep learning work better than svms or random forests? https://www.kdnuggets.com/2016/04/deep-learning-vs-svm-random-forest.html, 04 2016.

[45] Cyber grand shellphish. http://www.phrack.org/papers/cyber_grand_shellphish.html, 2017.

[46] M. Abadi, M. Budiu, U. Erlingsson, and J. Ligatti. Control-Flow Integrity. In Proceedings of the 12th ACM Conference on Computer and Communications Security, pages 340–353. ACM, 2005.

[47] A. Arya, O. Chang, M. Moroz, M. Barbella, and J. Metzman. Open sourcing ClusterFuzz. https://opensource.googleblog.com/2019/02/open-sourcing-clusterfuzz.html, 2019.

[48] Akamai Technologies. Secure storage of private (rsa) keys. https://lwn.net/Articles/594923/.

[49] D. Andriesse, X. Chen, V. van der Veen, A. Slowinska, and H. Bos. An in-depth analysis of disassembly on full-scale x86/x64 binaries. In Proceedings of the 25th USENIX Security Symposium, pages 583–600, 2016.

[50] D. Andriesse, A. Slowinska, and H. Bos. Compiler-agnostic function detection in binaries. In Proceedings of the 2nd IEEE European Symposium on Security & Privacy (EuroS&P), pages 177–189, April 2017.

[51] T. Avgerinos, S. K. Cha, B. L. T. Hao, and D. Brumley. AEG: automatic exploit generation. In Proceedings of the Network and Distributed System Security Symposium, NDSS 2011, San Diego, California, USA, 6th February - 9th February 2011, 2011.

[52] A. M. Azab, P. Ning, and X. Zhang. Sice: a hardware-level strongly isolated computing environment for x86 multi-core platforms. In Proceedings of the 18th ACM conference on Computer and communications security, pages 375–388. ACM, 2011.

[53] D. Babić, S. Bucur, Y. Chen, F. Ivančić, T. King, M. Kusano, C. Lemieux, L. Szekeres, and W. Wang. Fudge: fuzz driver generation at scale. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 975–985. ACM, 2019.


[54] M. Backes, T. Holz, B. Kollenda, P. Koppe, S. Nürnberger, and J. Pewny. You can run but you can’t read: Preventing disclosure exploits in executable code. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pages 1342–1353. ACM, 2014.

[55] M. Backes and S. Nürnberger. Oxymoron: Making fine-grained memory randomization practical by allowing code sharing. In Proceedings of the 23rd USENIX Security Symposium, 2014.

[56] T. Bao, J. Burket, M. Woo, R. Turner, and D. Brumley. BYTEWEIGHT: Learning to Recognize Functions in Binary Code. In Proceedings of the 23rd USENIX Security Symposium, pages 845–860, 2014.

[57] E. Bendersky. Assembler relaxation. http://eli.thegreenplace.net/2013/01/03/assembler-relaxation, 2013.

[58] D. A. Berry and B. Fristedt. Bandit problems: sequential allocation of experiments (monographs on statistics and applied probability). London: Chapman and Hall, 5:71–87, 1985.

[59] S. Bhatkar, D. C. DuVarney, and R. Sekar. Efficient techniques for comprehensive protection from memory error exploits. In Usenix Security, 2005.

[60] D. Bigelow, T. Hobson, R. Rudd, W. Streilein, and H. Okhravi. Timely rerandomization for mitigating memory disclosures. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 268–279. ACM, 2015.

[61] A. Bittau, A. Belay, A. Mashtizadeh, D. Mazières, and D. Boneh. Hacking blind. In 2014 IEEE Symposium on Security and Privacy, pages 227–242. IEEE, 2014.

[62] A. Bittau, P. Marchenko, M. Handley, and B. Karp. Wedge: Splitting applications into reduced-privilege compartments. In NSDI, volume 8, pages 309–322, 2008.

[63] T. Bletsch, X. Jiang, V. W. Freeh, and Z. Liang. Jump-oriented programming: a new class of code-reuse attack. In Proceedings of the 6th ACM Symposium on Information, Computer and Communications Security, pages 30–40. ACM, 2011.

[64] M. Böhme, V.-T. Pham, M.-D. Nguyen, and A. Roychoudhury. Directed greybox fuzzing. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pages 2329–2344. ACM, 2017.


[65] M. Böhme, V.-T. Pham, and A. Roychoudhury. Coverage-based greybox fuzzing as markov chain. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pages 1032–1043. ACM, 2016.

[66] L. Bottou. Online learning and stochastic approximations. On-line learning in neural networks, 17(9):142.

[67] K. Braden, S. Crane, L. Davi, M. Franz, P. Larsen, C. Liebchen, and A.-R. Sadeghi. Leakage- resilient layout randomization for mobile devices. In Proceedings of the 2016 Network and Distributed System Security (NDSS) Symposium, 2016.

[68] L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and regression trees. 1984.

[69] D. Brumley and D. Song. Privtrans: Automatically partitioning programs for privilege separation. In USENIX Security Symposium, pages 57–72, 2004.

[70] J. Burnim and K. Sen. Heuristics for scalable dynamic test generation. In 23rd IEEE/ACM International Conference on Automated Software Engineering, 15-19 September 2008, L’Aquila, Italy, pages 443–446, 2008.

[71] C. Cadar, D. Dunbar, and D. Engler. Klee: Unassisted and automatic generation of high- coverage tests for complex systems programs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, pages 209–224. USENIX Association, 2008.

[72] N. Carlini, A. Barresi, M. Payer, D. Wagner, and T. R. Gross. Control-flow bending: On the effectiveness of control-flow integrity. In 24th USENIX Security Symposium (USENIX Security 15), pages 161–176, 2015.

[73] M. Castro, M. Costa, J.-P. Martin, M. Peinado, P. Akritidis, A. Donnelly, P. Barham, and R. Black. Fast byte-granularity software fault isolation. In Proceedings of the ACM SIGOPS 22nd symposium on Operating systems principles, pages 45–58. ACM, 2009.

[74] S. K. Cha, T. Avgerinos, A. Rebert, and D. Brumley. Unleashing mayhem on binary code. In IEEE Symposium on Security and Privacy, SP 2012, 21-23 May 2012, San Francisco, California, USA, pages 380–394, 2012.

[75] P. Chen and H. Chen. Angora: Efficient fuzzing by principled search. In 2018 IEEE Sympo- sium on Security and Privacy (SP), pages 711–725. IEEE, 2018.


[76] X. Chen, T. Garfinkel, E. C. Lewis, P. Subrahmanyam, C. A. Waldspurger, D. Boneh, J. Dwoskin, and D. R. Ports. Overshadow: a virtualization-based approach to retrofitting protection in commodity operating systems. In ACM SIGOPS Operating Systems Review, volume 42, pages 2–13. ACM, 2008.

[77] Y. Chen, M. Ahmadi, R. Mirzazade Farkhani, B. Wang, L. Lu, et al. Meuzz: Smart seed scheduling for hybrid fuzzing. arXiv preprint arXiv:1906.07327, 2019.

[78] Y. Chen, P. Li, J. Xu, S. Guo, R. Zhou, Y. Zhang, L. Lu, et al. Savior: Towards bug-driven hybrid testing. arXiv preprint arXiv:1906.07327, 2019.

[79] Y. Chen, D. Mu, J. Xu, Z. Sun, W. Shen, X. Xing, L. Lu, and B. Mao. Ptrix: Efficient hardware-assisted fuzzing for COTS binary. In Proceedings of the 2019 ACM Asia Conference on Computer and Communications Security (AsiaCCS), 2019.

[80] Y. Chen, S. Reymondjohnson, Z. Sun, and L. Lu. Shreds: Fine-grained execution units with private memory. In 2016 IEEE Symposium on Security and Privacy (SP), pages 56–71. IEEE, 2016.

[81] Y. Chen, Z. Wang, D. Whalley, and L. Lu. Remix: On-demand live randomization. In Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy, pages 50–61. ACM, 2016.

[82] Y. Chen, D. Zhang, R. Wang, R. Qiao, A. M. Azab, L. Lu, H. Vijayakumar, and W. Shen. Norax: Enabling execute-only memory for COTS binaries on AArch64. In Security and Privacy (SP), 2017 IEEE Symposium on, pages 304–319. IEEE, 2017.

[83] V. Chipounov, V. Kuznetsov, and G. Candea. S2E: A platform for in-vivo multi-path analysis of software systems. In ACM SIGARCH Computer Architecture News, volume 39, pages 265–278. ACM, 2011.

[84] C. Cifuentes and M. Van Emmerik. Recovery of jump table case statements from binary code. In IEEE International Workshop on Program Comprehension, 1999.

[85] M. Conti, S. Crane, L. Davi, M. Franz, P. Larsen, M. Negro, C. Liebchen, M. Qunaibit, and A.-R. Sadeghi. Losing control: On the effectiveness of control-flow integrity under stack attacks. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 952–963. ACM, 2015.

[86] J. Corbet. SMP alternatives. https://lwn.net/Articles/164121/, 2005.

[87] J. Corbet. Memory protection keys. https://lwn.net/Articles/643797/, May 2015.

[88] S. Crane, C. Liebchen, A. Homescu, L. Davi, P. Larsen, A.-R. Sadeghi, S. Brunthaler, and M. Franz. Readactor: Practical code randomization resilient to memory disclosure. In 2015 IEEE Symposium on Security and Privacy, pages 763–780. IEEE, 2015.

[89] S. J. Crane, S. Volckaert, F. Schuster, C. Liebchen, P. Larsen, L. Davi, A.-R. Sadeghi, T. Holz, B. De Sutter, and M. Franz. It’s a trap: Table randomization and protection against function-reuse attacks. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 243–255. ACM, 2015.

[90] L. Davi, C. Liebchen, A.-R. Sadeghi, K. Z. Snow, and F. Monrose. Isomeron: Code randomization resilient to (just-in-time) return-oriented programming. In NDSS, 2015.

[91] L. Davi, A.-R. Sadeghi, D. Lehmann, and F. Monrose. Stitching the gadgets: On the ineffectiveness of coarse-grained control-flow integrity protection. In 23rd USENIX Security Symposium (USENIX Security 14), pages 401–416, 2014.

[92] L. V. Davi, A. Dmitrienko, S. Nürnberger, and A.-R. Sadeghi. Gadge me if you can: secure and efficient ad-hoc instruction-level randomization for x86 and arm. In Proceedings of the 8th ACM SIGSAC symposium on Information, computer and communications security, pages 299–310. ACM, 2013.

[93] L. M. de Moura and N. Bjørner. Satisfiability modulo theories: introduction and applications. Commun. ACM, 54(9):69–77, 2011.

[94] L. M. de Moura, B. Dutertre, and N. Shankar. A tutorial on satisfiability modulo theories. volume 4590 of Lecture Notes in Computer Science, pages 20–36, 2007.

[95] Z. Deng, B. Saltaformaggio, X. Zhang, and D. Xu. iRiS: Vetting private API abuse in iOS applications. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, CCS ’15, pages 44–56, New York, NY, USA, 2015. ACM.

[96] W. Dietz, P. Li, J. Regehr, and V. Adve. Understanding integer overflow in C/C++. In Proceedings of the 34th International Conference on Software Engineering, ICSE’12, pages 760–770, 2012.

[97] B. Dolan-Gavitt, P. Hulin, E. Kirda, T. Leek, A. Mambretti, W. Robertson, F. Ulrich, and R. Whelan. LAVA: Large-scale automated vulnerability addition. In Proceedings of the 2016 IEEE Symposium on Security and Privacy (SP), pages 110–121. IEEE, 2016.

[98] Z. Durumeric, J. Kasten, D. Adrian, J. A. Halderman, M. Bailey, F. Li, N. Weaver, J. Amann, J. Beekman, M. Payer, et al. The matter of heartbleed. In Proceedings of the 2014 Conference on Internet Measurement Conference, pages 475–488. ACM, 2014.

[99] K. ElWazeer. Deep Analysis of Binary Code to Recover Program Structure. PhD dissertation, University of Maryland, 2014.

[100] U. Erlingsson, M. Abadi, M. Vrable, M. Budiu, and G. C. Necula. XFI: Software guards for system address spaces. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation, pages 75–88. USENIX Association, 2006.

[101] I. Evans, F. Long, U. Otgonbaatar, H. Shrobe, M. Rinard, H. Okhravi, and S. Sidiroglou-Douskos. Control jujutsu: On the weaknesses of fine-grained control flow integrity. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 901–913. ACM, 2015.

[102] H. Fan, F. Zhu, C. Liu, L. Zhang, L. Zhuang, D. Li, W. Zhu, J. Hu, H. Li, and Q. Kong. Baidu apollo em motion planner. arXiv preprint arXiv:1807.08048, 2018.

[103] R. M. Farkhani, S. Jafari, S. Arshad, W. Robertson, E. Kirda, and H. Okhravi. On the effectiveness of type-based control flow integrity. In Proceedings of the 34th Annual Computer Security Applications Conference, pages 28–39. ACM, 2018.

[104] M. Fernández-Delgado, E. Cernadas, S. Barro, and D. Amorim. Do we need hundreds of classifiers to solve real world classification problems? J. Mach. Learn. Res., 15(1):3133–3181, Jan. 2014.

[105] B. Ford and R. Cox. Vx32: Lightweight user-level sandboxing on the x86. In USENIX Annual Technical Conference, pages 293–306. Boston, MA, 2008.

[106] S. Gan, C. Zhang, X. Qin, X. Tu, K. Li, Z. Pei, and Z. Chen. CollAFL: Path sensitive fuzzing. In 2018 IEEE Symposium on Security and Privacy (SP), pages 660–677. IEEE, 2018.

[107] V. Ganesh and D. L. Dill. A decision procedure for bit-vectors and arrays. volume 4590 of Lecture Notes in Computer Science, pages 519–531, 2007.

[108] D. Geneiatakis, G. Portokalidis, V. P. Kemerlis, and A. D. Keromytis. Adaptive Defenses for Commodity Software Through Virtual Application Partitioning. In Proceedings of the 19th ACM conference on Computer and communications security (CCS), pages 133–144, 2012.

[109] J. Gionta, W. Enck, and P. Ning. HideM: Protecting the contents of userspace memory in the face of disclosure vulnerabilities. In Proceedings of the 5th ACM Conference on Data and Application Security and Privacy, pages 325–336. ACM, 2015.

[110] P. Godefroid, N. Klarlund, and K. Sen. DART: Directed automated random testing. In Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, Chicago, IL, USA, June 12-15, 2005, pages 213–223, 2005.

[111] P. Godefroid, M. Y. Levin, D. A. Molnar, et al. Automated whitebox fuzz testing. In NDSS, volume 8, pages 151–166, 2008.

[112] E. Göktas, E. Athanasopoulos, H. Bos, and G. Portokalidis. Out of control: Overcoming control-flow integrity. In 2014 IEEE Symposium on Security and Privacy, pages 575–589. IEEE, 2014.

[113] L. Guan, J. Lin, B. Luo, and J. Jing. Copker: Computing with private keys without RAM. In 21st ISOC Network and Distributed System Security Symposium (NDSS), 2014.

[114] L. Guan, J. Lin, B. Luo, J. Jing, and J. Wang. Protecting private keys against memory disclosure attacks using hardware transactional memory. In Security and Privacy (SP), 2015 IEEE Symposium on, pages 3–19, May 2015.

[115] A. Halevy, P. Norvig, and F. Pereira. The unreasonable effectiveness of data. IEEE Intelligent Systems, 24(2):8–12, Mar. 2009.

[116] D. Hansen. [RFC] x86: Memory protection keys. https://lwn.net/Articles/643617/, May 2015.

[117] L. C. Harris and B. P. Miller. Practical analysis of stripped binary code. SIGARCH Comput. Archit. News, 33(5):63–68, Dec. 2005.

[118] K. Harrison and S. Xu. Protecting cryptographic keys from memory disclosure attacks. In Dependable Systems and Networks, 2007. DSN’07. 37th Annual IEEE/IFIP International Conference on, pages 137–143. IEEE, 2007.

[119] J. Hiser, A. Nguyen-Tuong, M. Co, M. Hall, and J. W. Davidson. ILR: Where’d my gadgets go? In 2012 IEEE Symposium on Security and Privacy, pages 571–585. IEEE, 2012.

[120] R. N. Horspool and N. Marovac. An approach to the problem of detranslation of computer programs. Computer Journal, 23(3):223–229, 1980.

[121] W. E. Howden. Symbolic testing and the DISSECT symbolic evaluation system. IEEE Trans. Software Eng., 3(4):266–278, 1977.

[122] K. Jayaraman, D. Harvison, V. Ganesh, and A. Kiezun. jFuzz: A concolic whitebox fuzzer for Java. In First NASA Formal Methods Symposium - NFM 2009, Moffett Field, California, USA, April 6-8, 2009, pages 121–125, 2009.

[123] T. Johnson. ThinLTO: Scalable and Incremental LTO. http://blog.llvm.org/2016/06/thinlto-scalable-and-incremental-lto.html, 2016.

[124] S. Kell, D. P. Mulligan, and P. Sewell. The missing link: Explaining ELF static linking, semantically. In Proceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pages 607–623, 2016.

[125] C. Kil, J. Jun, C. Bookholt, J. Xu, and P. Ning. Address space layout permutation (ASLP): Towards fine-grained randomization of commodity software. In ACSAC, volume 6, pages 339–348, 2006.

[126] D. Kilpatrick. Privman: A library for partitioning applications. In USENIX Annual Technical Conference, FREENIX Track, pages 273–284, 2003.

[127] J. C. King. Symbolic execution and program testing. Commun. ACM, 19(7):385–394, 1976.

[128] G. Klees, A. Ruef, B. Cooper, S. Wei, and M. Hicks. Evaluating fuzz testing. In Proceedings of the 2018 ACM SIGSAC Conference on Computer and Communications Security, pages 2123–2138. ACM, 2018.

[129] T. Klein. RELRO - a (not so well known) memory corruption mitigation technique. http://tk-blog.blogspot.com/2009/02/relro-not-so-well-known-memory.html, 2009.

[130] H. Koo, Y. Chen, L. Lu, V. P. Kemerlis, and M. Polychronakis. Compiler-assisted code randomization. In 2018 IEEE Symposium on Security and Privacy (SP), pages 461–477. IEEE, 2018.

[131] H. Koo and M. Polychronakis. Juggling the gadgets: Binary-level code randomization using instruction displacement. In Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pages 23–34. ACM, 2016.

[132] B. Lakshminarayanan, D. M. Roy, and Y. W. Teh. Mondrian forests: Efficient online random forests. In Advances in neural information processing systems, pages 3140–3148, 2014.

[133] C. Lattner and V. Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In Code Generation and Optimization, 2004. CGO 2004. International Symposium on, pages 75–86. IEEE, 2004.

[134] lcamtuf. american fuzzy lop. http://lcamtuf.coredump.cx/afl/, 2015.

[135] Y. Li. Target independent code generation. http://people.cs.pitt.edu/~yongli/notes/llvm3/LLVM3.html, 2012.

[136] Y. Liu, T. Zhou, K. Chen, H. Chen, and Y. Xia. Thwarting memory disclosure with efficient hypervisor-enforced intra-domain isolation. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, CCS ’15, 2015.

[137] M. Ludvig. CFI support for GNU assembler (GAS). http://www.logix.cz/michal/devel/gas-cfi/, 2003.

[138] A. Machiry, E. Gustafson, C. Spensky, C. Salls, N. Stephens, R. Wang, A. Bianchi, Y. R. Choe, C. Kruegel, and G. Vigna. Boomerang: Exploiting the semantic gap in trusted execution environments. In Proceedings of the 2017 Network and Distributed System Security Symposium (NDSS), 2017.

[139] R. Majumdar and K. Sen. Hybrid concolic testing. In Software Engineering, 2007. ICSE 2007. 29th International Conference on, pages 416–426. IEEE, 2007.

[140] V. J. Manes, H. Han, C. Han, S. K. Cha, M. Egele, E. J. Schwartz, and M. Woo. Fuzzing: Art, science, and engineering. arXiv preprint arXiv:1812.00140, 2018.

[141] A. J. Mashtizadeh, A. Bittau, D. Boneh, and D. Mazières. CCFI: Cryptographically enforced control flow integrity. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 941–951. ACM, 2015.

[142] S. McCamant. Large single compilation-unit C programs. http://people.csail.mit.edu/smcc/projects/single-file-programs/, 2006.

[143] J. M. McCune, Y. Li, N. Qu, Z. Zhou, A. Datta, V. Gligor, and A. Perrig. TrustVisor: Efficient TCB reduction and attestation. In Security and Privacy (SP), 2010 IEEE Symposium on, pages 143–158. IEEE, 2010.

[144] J. M. McCune, B. J. Parno, A. Perrig, M. K. Reiter, and H. Isozaki. Flicker: An execution infrastructure for TCB minimization. In ACM SIGOPS Operating Systems Review, volume 42, pages 315–328. ACM, 2008.

[145] F. McKeen, I. Alexandrovich, A. Berenzon, C. V. Rozas, H. Shafi, V. Shanbhogue, and U. R. Savagaonkar. Innovative instructions and software model for isolated execution. In Proceed- ings of the 2nd International Workshop on Hardware and Architectural Support for Security and Privacy, pages 1–1. ACM, 2013.

[146] P. E. McKnight and J. Najab. Mann-whitney u test. The Corsini encyclopedia of psychology, pages 1–1, 2010.

[147] V. Mohan, P. Larsen, S. Brunthaler, K. W. Hamlen, and M. Franz. Opaque control-flow integrity. In NDSS, 2015.

[148] MSDN. SecureString class. https://msdn.microsoft.com/en-us/library/system.security.securestring.aspx.

[149] D. Mu, A. Cuevas, L. Yang, H. Hu, X. Xing, B. Mao, and G. Wang. Understanding the reproducibility of crowd-reported security vulnerabilities. In Proceedings of the 27th USENIX Conference on Security Symposium, pages 919–936. USENIX Association, 2018.

[150] T. Müller, F. C. Freiling, and A. Dewald. TRESOR runs encryption securely outside RAM. In USENIX Security Symposium, pages 17–17, 2011.

[151] D. Niemi. UnixBench 4.1.0.

[152] B. Niu and G. Tan. RockJIT: Securing just-in-time compilation using modular control-flow integrity. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pages 1317–1328. ACM, 2014.

[153] B. Niu and G. Tan. Per-input control-flow integrity. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 914–926. ACM, 2015.

[154] B. S. Pak. Hybrid fuzz testing: Discovering software bugs via fuzzing and symbolic execution. Master’s thesis, School of Computer Science, Carnegie Mellon University, 2012.

[155] S. J. Pan and Q. Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, Oct 2010.

[156] V. Pappas, M. Polychronakis, and A. D. Keromytis. Smashing the gadgets: Hindering return- oriented programming using in-place code randomization. In Proceedings of the 33rd IEEE Symposium on Security & Privacy (S&P), pages 601–615, May 2012.

[157] V. Pappas, M. Polychronakis, and A. D. Keromytis. Smashing the gadgets: Hindering return- oriented programming using in-place code randomization. In 2012 IEEE Symposium on Security and Privacy, pages 601–615. IEEE, 2012.

[158] V. Pappas, M. Polychronakis, and A. D. Keromytis. Dynamic reconstruction of relocation information for stripped binaries. In Proceedings of the 17th International Symposium on Research in Attacks, Intrusions and Defenses (RAID), pages 68–87, September 2014.

[159] H. Peng, Y. Shoshitaishvili, and M. Payer. T-fuzz: fuzzing by program transformation. In 2018 IEEE Symposium on Security and Privacy (SP), pages 697–710. IEEE, 2018.

[160] T. S. Pillai, A. C. Arpaci-Dusseau, and R. H. Arpaci-Dusseau. Fractured processes: adaptive, fine-grained process abstractions. In Proceedings of the 2014 International Conference on Timely Results in Operating Systems, pages 4–4. USENIX Association, 2014.

[161] N. Provos, M. Friedl, and P. Honeyman. Preventing privilege escalation. In USENIX Security, volume 3, 2003.

[162] R. Qiao and R. Sekar. Function interface analysis: A principled approach for function recognition in COTS binaries. In The 47th IEEE/IFIP International Conference on Dependable Systems and Networks, 2017.

[163] G. Ramalingam. The Undecidability of Aliasing. ACM Trans. Program. Lang. Syst., 16(5):1467–1471, September 1994.

[164] S. Rawat, V. Jain, A. Kumar, L. Cojocar, C. Giuffrida, and H. Bos. Vuzzer: Application- aware evolutionary fuzzing. In Proceedings of the Network and Distributed System Security Symposium (NDSS), 2017.

[165] A. Rebert, S. K. Cha, T. Avgerinos, J. Foote, D. Warren, G. Grieco, and D. Brumley. Optimiz- ing seed selection for fuzzing. In Proceedings of the 23rd USENIX Conference on Security Symposium, pages 861–875. USENIX Association, 2014.

[166] R. Roemer, E. Buchanan, H. Shacham, and S. Savage. Return-oriented programming: Sys- tems, languages, and applications. ACM Transactions on Information and System Security (TISSEC), 15(1):2, 2012.

[167] R. Rudd, R. Skowyra, D. Bigelow, V. Dedhia, T. Hobson, C. Liebchen, S. Crane, P. Larsen, L. Davi, M. Franz, A.-R. Sadeghi, and H. Okhravi. Address-Oblivious Code Reuse: On the Effectiveness of Leakage Resilient Diversity. In Proceedings of the Network and Distributed System Security Symposium (NDSS’17), Feb 2017.

[168] G. J. Saavedra, K. N. Rodhouse, D. M. Dunlavy, and P. W. Kegelmeyer. A review of machine learning applications in fuzzing. arXiv preprint arXiv:1906.11133, 2019.

[169] A. Saffari, C. Leistner, J. Santner, M. Godec, and H. Bischof. On-line random forests. In 2009 IEEE 12th International Conference on Computer Vision Workshops (ICCV Workshops), pages 1393–1400. IEEE, 2009.

[170] F. Schuster, T. Tendyck, C. Liebchen, L. Davi, A.-R. Sadeghi, and T. Holz. Counterfeit object-oriented programming: On the difficulty of preventing code reuse attacks in C++ applications. In 2015 IEEE Symposium on Security and Privacy, pages 745–762. IEEE, 2015.

[171] E. J. Schwartz, T. Avgerinos, and D. Brumley. All you ever wanted to know about dynamic taint analysis and forward symbolic execution (but might have been afraid to ask). In 2010 IEEE symposium on Security and privacy, pages 317–331. IEEE, 2010.

[172] K. Sen. Concolic testing. In Proceedings of the twenty-second IEEE/ACM international conference on Automated software engineering, pages 571–572. ACM, 2007.

[173] K. Sen and G. Agha. CUTE and jCUTE: Concolic unit testing and explicit path model-checking tools. In Computer Aided Verification, 18th International Conference, Seattle, WA, USA, August 17-20, 2006, Proceedings, pages 419–423, 2006.

[174] K. Sen, D. Marinov, and G. Agha. CUTE: a concolic unit testing engine for C. In Proceedings of the 10th European Software Engineering Conference held jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, 2005, Lisbon, Portugal, September 5-9, 2005, pages 263–272, 2005.

[175] K. Serebryany, D. Bruening, A. Potapenko, and D. Vyukov. AddressSanitizer: A fast address sanity checker. In Proceedings of the 2012 USENIX Conference on Annual Technical Conference, pages 28–28. USENIX Association, 2012.

[176] S. Shalev-Shwartz and S. Ben-David. Understanding machine learning: From theory to algorithms. Cambridge university press, 2014.

[177] D. She, K. Pei, D. Epstein, J. Yang, B. Ray, and S. Jana. Neuzz: Efficient fuzzing with neural program smoothing. In 2019 IEEE Symposium on Security and Privacy (SP). IEEE, 2019.

[178] M.-W. Shih, S. Lee, T. Kim, and M. Peinado. T-SGX: Eradicating controlled-channel attacks against enclave programs. In Proceedings of the 2017 Annual Network and Distributed System Security Symposium (NDSS), San Diego, CA, 2017.

[179] E. Shioji, Y. Kawakoya, M. Iwamura, and T. Hariu. Code shredding: Byte-granular randomization of program layout for detecting code-reuse attacks. In Proceedings of the 28th Annual Computer Security Applications Conference (ACSAC), pages 309–318, 2012.

[180] Y. Shoshitaishvili, R. Wang, C. Salls, N. Stephens, M. Polino, A. Dutcher, J. Grosen, S. Feng, C. Hauser, C. Kruegel, and G. Vigna. SoK: (State of) The Art of War: Offensive Techniques in Binary Analysis. In Proceedings of the 2016 IEEE Symposium on Security and Privacy (SP), pages 138–157. IEEE, 2016.

[181] R. M. Smullyan and R. Smullyan. Gödel’s incompleteness theorems. Oxford University Press on Demand, 1992.

[182] K. Z. Snow, F. Monrose, L. Davi, A. Dmitrienko, C. Liebchen, and A.-R. Sadeghi. Just-in- time code reuse: On the effectiveness of fine-grained address space layout randomization. In Security and Privacy (SP), 2013 IEEE Symposium on, pages 574–588. IEEE, 2013.

[183] V. C. Sreedhar, G. R. Gao, and Y.-f. Lee. Incremental computation of dominator trees. In ACM SIGPLAN Notices, volume 30, pages 1–12. ACM, 1995.

[184] N. Stephens, J. Grosen, C. Salls, A. Dutcher, R. Wang, J. Corbetta, Y. Shoshitaishvili, C. Kruegel, and G. Vigna. Driller: Augmenting fuzzing through selective symbolic execution. In Proceedings of the Network and Distributed System Security Symposium (NDSS), 2016.

[185] R. Strackx and F. Piessens. Fides: Selectively hardening software application components against kernel-level or process-level malware. In Proceedings of the 2012 ACM conference on Computer and communications security, pages 2–13. ACM, 2012.

[186] G. E. Suh, D. Clarke, B. Gassend, M. van Dijk, and S. Devadas. Efficient memory integrity verification and encryption for secure processors. In Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture, page 339. IEEE Computer Society, 2003.

[187] Y. Sui and J. Xue. SVF: Interprocedural static value-flow analysis in LLVM. In Proceedings of the 25th International Conference on Compiler Construction, pages 265–266. ACM, 2016.

[188] M. Sun, T. Wei, and J. C. Lui. TaintART: A Practical Multi-level Information-Flow Tracking System for Android RunTime. In Proceedings of the 23rd ACM Conference on Computer and Communications Security (CCS), pages 331–342, 2016.

[189] A. Tang, S. Sethumadhavan, and S. Stolfo. Heisenbyte: Thwarting memory disclosure attacks using destructive code reads. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 256–267. ACM, 2015.

[190] I. L. Taylor. Introduction to gold. http://www.airs.com/blog/archives/38, 2007.

[191] PaX Team. PaX address space layout randomization (ASLR), 2003.

[192] PaX Team. grsecurity: RAP is here, 2016.

[193] C. Tice, T. Roeder, P. Collingbourne, S. Checkoway, Ú. Erlingsson, L. Lozano, and G. Pike. Enforcing forward-edge control-flow integrity in gcc & llvm. In 23rd USENIX Security Symposium (USENIX Security 14), pages 941–955, 2014.

[194] Using the GNU Compiler Collection (GCC). Common Function Attributes. https: //gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html, 2017.

[195] A. van de Ven and I. Molnar. Exec Shield, 2004. Retrieved March 1, 2017.

[196] G. Vasiliadis, E. Athanasopoulos, M. Polychronakis, and S. Ioannidis. Pixelvault: Using gpus for securing cryptographic operations. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pages 1131–1142. ACM, 2014.

[197] R. Wahbe, S. Lucco, T. E. Anderson, and S. L. Graham. Efficient software-based fault isola- tion. In ACM SIGOPS Operating Systems Review, volume 27, pages 203–216. ACM, 1994.

[198] R. Wang, Y. Shoshitaishvili, A. Bianchi, A. Machiry, J. Grosen, P. Grosen, C. Kruegel, and G. Vigna. Ramblr: Making Reassembly Great Again. In Proceedings of the Network and Distributed System Security Symposium (NDSS), 2017.

[199] R. Wartell, V. Mohan, K. W. Hamlen, and Z. Lin. Binary stirring: Self-randomizing instruc- tion addresses of legacy x86 binary code. In Proceedings of the 2012 ACM conference on Computer and communications security, pages 157–168. ACM, 2012.

[200] R. Wartell, Y. Zhou, K. W. Hamlen, M. Kantarcioglu, and B. Thuraisingham. Differentiating code from data in x86 binaries. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 522–536. Springer, 2011.

[201] R. N. Watson, J. Anderson, B. Laurie, and K. Kennaway. Capsicum: Practical capabilities for unix. In USENIX Security Symposium, pages 29–46, 2010.

[202] J. Werner, G. Baltas, R. Dallara, N. Otterness, K. Z. Snow, F. Monrose, and M. Polychronakis. No-execute-after-read: Preventing code disclosure in commodity software. In Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security, pages 35–46. ACM, 2016.

[203] D. Williams-King, G. Gobieski, K. Williams-King, J. P. Blake, X. Yuan, P. Colp, M. Zheng, V. P. Kemerlis, J. Yang, and W. Aiello. Shuffler: Fast and deployable continuous code re-randomization. In 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI 16). USENIX Association, 2016.

[204] R. Wojtczuk. The advanced return-into-lib (c) exploits: Pax case study. Phrack Magazine, Volume 0x0b, Issue 0x3a, Phile# 0x04 of 0x0e, 2001.

[205] D. H. Wolpert. The lack of a priori distinctions between learning algorithms. Neural Comput., 8(7):1341–1390, Oct. 1996.

[206] B. Yee, D. Sehr, G. Dardyk, J. B. Chen, R. Muth, T. Ormandy, S. Okasaka, N. Narula, and N. Fullagar. Native client: A sandbox for portable, untrusted x86 native code. In Security and Privacy, 2009 30th IEEE Symposium on, pages 79–93. IEEE, 2009.

[207] W. You, X. Wang, S. Ma, J. Huang, X. Zhang, X. Wang, and B. Liang. ProFuzzer: On-the-fly input type probing for better zero-day vulnerability discovery. In 2019 IEEE Symposium on Security and Privacy (SP). IEEE, 2019.

[208] I. Yun, S. Lee, M. Xu, Y. Jang, and T. Kim. QSYM: A practical concolic execution engine tailored for hybrid fuzzing. In Proceedings of the 27th USENIX Conference on Security Symposium, pages 745–761. USENIX Association, 2018.

[209] C. Zhang, T. Wei, Z. Chen, L. Duan, L. Szekeres, S. McCamant, D. Song, and W. Zou. Practical Control Flow Integrity and Randomization for Binary Executables. In Proceedings of the 2013 IEEE Symposium on Security and Privacy, SP ’13, pages 559–573, Washington, DC, USA, 2013. IEEE Computer Society.

[210] M. Zhang and R. Sekar. Control flow integrity for cots binaries. In Presented as part of the 22nd USENIX Security Symposium (USENIX Security 13), pages 337–352, 2013.

[211] L. Zhao, Y. Duan, H. Yin, and J. Xuan. Send hardest problems my way: Probabilistic path prioritization for hybrid fuzzing. In Network and Distributed Systems Security (NDSS) Symposium, February 2019.

[212] L. Zhao, Y. Duan, H. Yin, and J. Xuan. Send hardest problems my way: Probabilistic path prioritization for hybrid fuzzing. In NDSS, 2019.

[213] Y. Zhou, X. Wang, Y. Chen, and Z. Wang. ARMlock: Hardware-based fault isolation for ARM. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, pages 558–569. ACM, 2014.
