Defending In-Process Memory Abuse with Mitigation and Testing
A Dissertation Presented by
Yaohui Chen
to
The Khoury College of Computer Sciences
in partial fulfillment of the requirements for the degree of
Doctor of Philosophy
in
Computer Science
Northeastern University
Boston, Massachusetts

October 2019
Version Dated: October 21, 2019

To my parents, who gave me the life like a river flows,
&
To Boyu, my best friend, who accompanies me through the rapids and undertows.
Contents
List of Figures v
List of Tables viii
Acknowledgments x
Abstract of the Dissertation xii
1 Introduction 1
  1.1 Problem Statement 1
  1.2 Thesis Statement 3
  1.3 Contributions 3
    1.3.1 A Hybrid Approach for Practical Fine-grained Software Randomization 3
    1.3.2 Leave No Program Behind: Execute-only Memory Protection For COTS Binaries 4
    1.3.3 Keep My Secrets: In-process Private Memory 4
    1.3.4 Focus on bugs: Bug-driven Hybrid Fuzzing 5
    1.3.5 Learning On Experience: Smart Seed Scheduling for Hybrid Fuzzing 6
  1.4 Roadmap 7
2 Related Works 8
  2.1 Perpetual War On Memory Corruption Attacks 8
  2.2 In-Process Memory Isolation 10
  2.3 Automatic Software Tests Generation 12
I Runtime Protections Against In-Process Abuse 15
3 Code Reuse Exploit Mitigations 16
  3.1 Compiler-assisted Code Randomization 16
    3.1.1 Background 16
    3.1.2 Overall Approach 19
    3.1.3 Compiler-level Metadata 21
    3.1.4 Link-time Metadata Consolidation 25
    3.1.5 Code Randomization 28
    3.1.6 Experimental Evaluation 28
  3.2 Enabling Execute-Only Memory for COTS Binaries On AArch64 33
    3.2.1 Overview 33
    3.2.2 Background 34
    3.2.3 Design 38
    3.2.4 Evaluation 47
  3.3 Limitations 51
4 In-process Memory Isolation 52
  4.1 Overview 53
  4.2 Design 55
  4.3 Implementation 64
  4.4 Evaluation 66
  4.5 Limitations and Discussion 71
II Offline Software Testing To Find Memory Corruption Bugs 72
5 Bug-driven Hybrid Testing 74
  5.1 Background and Motivation 74
    5.1.1 In-efficiency of Existing Coverage-guided Hybrid Testing 74
    5.1.2 Motivation 75
  5.2 Design 77
    5.2.1 Core Techniques 77
    5.2.2 System Design 80
  5.3 Implementation 85
  5.4 Evaluation 87
    5.4.1 Evaluation with LAVA-M 88
    5.4.2 Evaluation with Real-world Programs 90
    5.4.3 Vulnerability Triage 93
6 Learning-based Hybrid Fuzzing 98
  6.1 Introduction 98
  6.2 Background 100
    6.2.1 Hybrid Fuzzing 100
    6.2.2 Supervised Machine Learning 102
  6.3 System Design 103
    6.3.1 System Overview 103
    6.3.2 System Requirements 103
    6.3.3 Feature Engineering 105
    6.3.4 Seed Label Inference 107
    6.3.5 Model Construction and Prediction 108
    6.3.6 Updating Model 109
  6.4 Evaluation and Analysis 110
    6.4.1 Evaluation setup 110
    6.4.2 Learning Effectiveness 111
    6.4.3 Insights and Analyses 112
    6.4.4 Model Reusability 113
    6.4.5 Model Transferability 114
    6.4.6 Discovered Bugs 115
  6.5 Discussions 117
    6.5.1 Applicability of different machine learning models 117
    6.5.2 Applicability of MEUZZ on grey-box fuzzing 118
7 Conclusion 123
Bibliography 126
List of Figures
3.1 Example of the fixup and relocation information that is involved during the compilation and linking process. 18
3.2 Overview of the proposed approach. A modified compiler collects metadata for each object file (1), which is further updated and consolidated at link time into a single extra section in the final executable (2). At the client side, a binary rewriter leverages the embedded metadata to rapidly generate randomized variants of the executable (3). 21
3.3 An example of the ELF layout generated by Clang (left), with the code of a particular function expanded (center and right). The leftmost and rightmost columns in the code listing ("BBL" and "Fragment") illustrate the relationships between basic blocks and LLVM's various kinds of fragments: data (DF), relaxable (RF), and alignment (AF). Data fragments are emitted by default, and may span consecutive basic blocks (e.g., BBL #1 and #2). The relaxable fragment #1 is required for the branch instruction, as it may be expanded during the relaxation phase. The padding bytes at the bottom correspond to a separate fragment, although they do not belong to any basic block. 22
3.4 Example of jump table code generated for non-PIC and PIC binaries. 25
3.5 Overview of the linking process. Per-object metadata is consolidated into a single section. 27
3.6 Performance overhead of fine-grained (function vs. basic block reordering) randomization for the SPEC CPU2006 benchmark tests. 29
3.7 NORAX System Overview: the offline tools (left) analyze the input binary, locate all the executable data and their references (when available), and then statically patch the metadata to the raw ELF; the runtime components (right) create separate mappings for the executable data sections and update the recorded references as well as those generated at runtime. 39
3.8 The layout of an ELF transformed by NORAX. The shaded parts at the end are the generated NORAX-related metadata. 44
3.9 Bionic Linker's binary loading flow; NLoader operates in different binary preparation stages, including module loading, relocation, and symbol resolution. 44
3.10 UnixBench performance overhead, including runtime, peak resident memory, and file size overhead (left: user tests, right: system tests). 50
4.1 Shreds, threads, and a process 52
4.2 Developers create shreds in their programs via the intuitive APIs and build the programs using S-compiler, which automatically verifies and instruments the executables (left); during runtime (right), S-driver handles shred entrances and exits on each CPU/thread while efficiently granting or revoking each CPU's access to the s-pools. 54
4.3 The DACR setup for a quad-core system, where k = 4. The first 3 domains (Dom0-Dom2) are reserved by Linux. Each core has a designated domain (Dom3-Dom6) that it may access when executing a shred. No CPU can access Dom7. 61
4.4 A shred's transition of states 61
4.5 The time and space overhead incurred by S-compiler during the offline compilation and instrumentation phase 67
4.6 The time needed for a context switch when: (1) a shred-active thread is switched off, (2) a regular thread is switched off with no process or address space change, and (3) a regular thread is switched off and a thread from a different process is scheduled on. 67
4.7 Invocation time of shred APIs and reference system calls (the right-most two bars are on log scale). It shows that shred entry is faster than thread creation, and s-pool allocation is slightly slower than basic memory mapping. 69
4.8 Five SPEC2000 benchmark programs tested when: (1) no shred is used, (2) shreds are used but without the lazy domain adjustment turned on in S-driver, and (3) shreds are used with the lazy domain adjustment. 69
5.1 A demonstrative example of hybrid testing. Figure 5.1a presents the code under test. Figures 5.1b and 5.1c are the paths followed by two seeds from the fuzzer. Their execution follows the red line and visits the grey boxes. Note that the white boxes connected by dotted lines are non-covered code. 75
5.2 A demonstrative example of a limitation in finding defects by existing hybrid testing. This defect comes from objdump-2.29 [33]. 76
5.3 An example showing how to estimate the bug-detecting potential of a seed. In this example, the seed follows the path b1->b2->b3->b4. Basic blocks b5 and b7 are unexplored and can reach the L1 and L2 UBSan labels, respectively. They have been attempted by constraint solving S1 and S2 times. The final score for this seed is (e^(-0.05*S1) * L1 + e^(-0.05*S2) * L2) / 2. 78
5.4 Solving the integer overflow in Figure 5.2. This shows the case on a 32-bit system, but it applies to 64-bit as well. 79
5.5 System architecture of SAVIOR. 80
5.6 A demonstrative example of reachability analysis. The target BB can "reach" 3 UBSan labels. 82
5.7 Fork server mode in KLEE. In this mode, KLEE only performs initialization once and reuses the same executor for all the received seeds. 84
5.8 Evaluation results with LAVA-M. The left column shows the number of bugs reached by different fuzzers and the right column shows the number of bugs triggered by the fuzzers. 96
5.9 Evaluation results with real-world programs over 24 hours. p1 and p2 are the p-values for the Mann-Whitney U-test of SAVIOR vs. DRILLER and SAVIOR vs. QSYM, respectively. 97
6.1 General hybrid fuzzing workflow. 101
6.2 System overview of MEUZZ. The coordinator is extended with an ML engine, which consists of 4 modules: feature extraction, label inference, prediction, and training. During fuzzing, utility prediction and model training are carried out consecutively. After extracting features for inputs in the fuzzer's queue, the ML engine can predict their utilities based on the current model. Then, with the seed labels inferred from previously selected seeds, the model is trained iteratively with the new data. 104
6.3 Examples that show how bug-triggering and coverage features are computed. 106
6.4 Branch coverage fuzzing with valid seeds (higher is better). p1, p2, and p3 are p-values in the Mann-Whitney U test comparing QSYM with MEUZZ-OL, MEUZZ-RF, and MEUZZ-EN. 119
6.5 The box plots show the importance of the features on nine programs. The importance is extracted by training an offline random forest model, and the features are ranked by the median of their importance. Queue Size and New Cov are the most and least important ones, respectively. 120
6.6 Branch coverage fuzzing with naive seeds (higher is better). p1, p2, and p3 are p-values in the Mann-Whitney U test comparing QSYM with MEUZZ-OL, MEUZZ-RF, and MEUZZ-EN, respectively. 121
6.7 This heat map shows coverage improvement with model initialization for MEUZZ-OL over vanilla MEUZZ-OL. The Y-axis lists the tested programs; the X-axis lists the models used for initialization. Each cell shows the relative coverage comparison (%). The diagonal values show the coverage improvement on each program after initializing MEUZZ with the model learned from the same program (reusability). Model transferability is shown in 7 out of the 8 programs. 122
6.8 Off-by-one heap read overflow in tiff2ps. 122
List of Tables
3.1 Collected randomization-assisting metadata 24
3.2 Experimental evaluation dataset and results (* indicates programs written in C++) 32
3.3 Access permissions for stage 1 EL0 and EL1 35
3.4 ELF sections that comprise the code segment of the example program; the highlighted ones are located in the same page. 37
3.5 Android Marshmallow system binaries that have embedded data on Nexus 5X. 38
3.6 Sections in the executable code page that are handled by NORAX 38
3.7 ELF section reference types 38
3.8 Rewritten program functionality tests. 48
3.9 System compatibility evaluation; the converted zygote, qseecomd, installd, rild, logd, surfaceflinger, libc++, and libstagefright were selected randomly to participate in the test to see whether they can run transparently with other unmodified system components. 48
3.10 Binary transformation correctness test. 49
3.11 Embedded data identification correctness; empirical experiments show our analysis works well on AArch64 COTS ELFs, with a zero false negative rate and a very low false positive rate in terms of finding embedded data. The last column shows the negligible number of leftover gadgets in the duplicated embedded data set. 49
4.1 The 5 open-source programs used in evaluation 66
4.2 End-to-end overhead observed while the tested programs perform a complete task: the left side of the table shows execution time and the right side shows memory footprint. 70
5.1 Families of potential bugs that SAVIOR enables UBSan to label. Here, x, y are n-bit integers; array is an array, the size of which is specified as size(array); op_s and op_u refer to the binary operators +, −, ×, ÷, % over signed and unsigned integers, respectively. 82
5.2 Fuzzer-specific settings in the evaluation with LAVA-M. 85
5.3 LAVA-M bugs triggered by different fuzzers (before bug-guided verification). "X%" indicates that X% of the listed LAVA bugs are triggered. 90
5.4 LAVA-M bugs triggered by different fuzzers (after bug-guided verification). "X%" indicates that X% of the listed LAVA bugs are triggered. 90
5.5 Real-world benchmark programs and evaluation settings. In the Seeds column, AFL indicates that we reuse the test cases provided with AFL, and built-in indicates that we reuse the test cases shipped with the program. 91
5.6 Number of unique UBSan labels reached by different fuzzers in 24 hours. On average SAVIOR reaches 19.68% and 15.18% more labels than DRILLER and QSYM. 93
5.7 New UBSan violations triggered with bug-guided verification in the evaluation with real-world programs. "+X/Y%" means "X" new violations are triggered, increasing the total number by "Y%". 94
5.8 Triage of UBSan violations triggered by SAVIOR in 24 hours. 95
6.1 Evaluation settings 110
6.2 Execution time spent on different learning stages 112
6.3 The bugs discovered by MEUZZ. UB, ME, DoS, and ML refer to Undefined Behavior, Memory Error, Denial of Service, and Memory Leak, respectively. 116
Acknowledgments
I would like to extend my greatest gratitude to my Ph.D. advisor, Prof. Long Lu. Not only is he as great a research advisor as I could ever ask for; in life, he is also like a big brother to me. As a research advisor, he has always been supportive, encouraging me to do research that I am passionate about while guiding me to apply critical thinking to distill and crystallize fuzzy ideas. As a big brother, he listens to my distress and grief about life in a foreign country. We have also shared a lot of joys together; I will never forget the rejoicing when our first S&P paper was accepted after a year of hard work. These invaluable and unforgettable experiences helped me grow into an independent researcher and conquer obstacles in life.
I would also like to thank my thesis committee members, Prof. Engin Kirda, Prof. Wil Robertson, and Dr. Weidong Cui. Their constructive feedback and helpful suggestions helped me shape this thesis into its better form.
If my Ph.D. student life were a painting, the internship experiences would be among its most colorful strokes. I am fortunate to have worked with my mentors Dr. Weidong Cui, Dr. Xinyang Ge, and Dr. Ben Niu at Microsoft Research; Dr. László Szekeres, Dr. Stefan Bucur, and Dr. Franjo Ivancic at Google; Dr. Hayawardh Vijayakumar and Dr. Mike Grace at Samsung Research; and Dr. Peng Li and Dr. Tao Wei at Baidu X-Lab. During my internships, they provided the best working environments one could ask for. They also showed me the importance of great teamwork, and how to cultivate research ideas and land them through solid engineering. I converted everything I learned from them into my research after the internships.
I am also grateful to have met many friends during my internships. I interacted and collaborated with them, directly and indirectly, in work, in research, and in life. They made the whole journey much more fun and unforgettable: Prof. Jun Xu, Dr. Nan Zhang, Prof. Wenbo Shen, Prof. Dave Jing Tian, Dr. Yuru Shao, Dr. Yueh-Hsun Lin, Dr. Yuping Li, Dr. Ruowen Wang, Dr. Xun Chen, Rohan Padhye, Dr. Rundong Zhou, Dr. Qian Feng, Dr. Shengjian Guo, Dr. Haining Chen, Yulong Zhang, Dr. Mingshen Sun, Dr. Yu Ding, Dr. Yizheng Chen, Dr. Yiming Gong, Dr. An Liu, Dr. Yueqiang Cheng, Zhaofeng Chen, Hangchen Yu, Willy Vasquez, Meng Xu, and Dr. Markus Kusano. These collaborative experiences had a very positive influence not only on my research but also on my communication and social networking.
Of course, my Ph.D. journey would not be complete without my friends from Stony Brook University and Northeastern University: Zhichuang, Bo, Suwen, Farhan, Mingwei, Rui, Hyungjoon, Nahid, Meng, Shachee, Tapti, Andrea Possemato, Andrea Mambretti, Fangfan, Ahmad, Sajjad, Ahmin, Conor, Shuwen, Jingjing, Matthew, Eyza, Desheng, Mansour, Reza, Ruimin, Ryan, Alejandro, Tomasso, Omin, and Jeremiah. We had a lot of fun times together, and I am grateful for their company.
This thesis is built upon foundational knowledge of computing systems and security, most of which I acquired during my study at Stony Brook University. I want to thank Prof. Donald Porter, Prof. Nima Hornamand, Prof. Michalis Polychronakis, Prof. Nick Nikiforakis, and Prof. R. Sekar for passing on their knowledge to me in and out of the classroom. This systems and security knowledge has greatly benefited my Ph.D. career. Lastly, I want to take this opportunity to specially thank Kelwin, Fish, DeAdCaT, zTrix, MaskRay, and Flanker (the old Blue Lotus members) for being my inspiration to pursue the path of computer systems security.
Without all these aforementioned people, my life pursuing the Ph.D. would not be the same, and I hold the utmost gratitude toward them for showing up in this fantastic journey.
Abstract of the Dissertation
Defending In-Process Memory Abuse with Mitigation and Testing
by Yaohui Chen

Doctor of Philosophy in Computer Science
Northeastern University, October 2019
Version Dated: October 21, 2019

Dr. Long Lu, Advisor
Modern software often includes large code bases from different origins with different trust levels. This creates a large attack surface and raises the security concern that sensitive information of one component is directly accessible by other (malicious or manipulated) components in memory. In this thesis, I refer to this problem as in-process memory abuse. Despite the prevalence of in-process abuse, defense mechanisms are not well studied, due to the complex root causes and attack surfaces of such attacks. First of all, a large amount of existing software is written in type-unsafe languages such as C and C++. Such languages are notorious for being error-prone, and these programming errors have produced countless high-severity security bugs that lead to in-process memory attacks. Secondly, contemporary defenses such as data execution prevention (DEP) and address space layout randomization (ASLR) have little effect on preventing in-process memory attacks. Last but not least, developers are often helpless when trying to protect their sensitive data, due to the lack of operating system support for creating boundaries within the same process context. As a result, as long as one of the many components is successfully exploited, the whole program's sensitive data and code are subject to abuse. A common belief is that in-process abuse cannot be defended against without high overhead or loss of backward compatibility. To reduce memory corruption bugs, options like formally verifying every piece of software or rewriting the whole software stack in type-safe languages are impractical, due to the poor scalability of formal verification methods and the immense engineering cost of rebuilding all existing software infrastructure. To prevent exploitation of memory corruption bugs, one may suggest adopting full memory safety by bounds-checking all pointers and tracking the liveness of every allocated memory object. However, this comes with intolerable overheads.
xii Lastly, existing work propose rewriting established operating system design paradigms to create sub-process isolation, this creates incompatibility and reduces practicability of the solution. Challenging these common beliefs, this thesis presents a series of practical defenses against in- process memory abuse. It includes runtime protections [80, 82, 130] and offline bug detections [53, 77, 78]. Collectively, these new techniques improved the state-of-the-art defense against in-process memory abuse without sacrificing practicability and compatibility. First, I present CCR [130], a compiler-binary rewriter toolchain to enable fine-grained soft- ware randomization. CCR solves the incompatibility of existing fine-grained randomization ap- proaches by aligning its defense implementation with established software deployment and bug report paradigms. However, fine-grained randomization alone is still vulnerable to just-in-time info- leak aided code reuse attacks. To tighten this loose end, I introduce NORAX [82], a binary rewriting framework to retrofit execute-only memory (XOM) protection into source-unavailable programs. Then, I design shreds [80]–fine-grained execution units with private memory–as an extra line of defense to in-process abuse. Shreds enable sub-process isolation without relying on nested paging, virtualization or even modified hardware. It incurs negligible overheads and is highly compatible with the existing operating system design paradigm of process/thread based execution units. Lastly, for offline software test generations, I present SAVIOR [78] and MEUZZ [77], they are advanced hybrid fuzzing frameworks enlightened with bug-driven oracle to quickly find more bugs and machine learning guidance to learn from past fuzzing statistics to tune the fuzzing scheduling strategies. 
By designing and conducting large-scale experiments with these proposed defenses on real-world software, I demonstrate that in-process memory abuse can be reasonably well defended against and prevented. The insights and knowledge gained during the development of this thesis have raised the community's awareness of in-process abuse and advanced the state-of-the-art defense against such attacks. Each of the included works has yielded at least one practical defense or automated software testing system. Many of them have also been adopted by industry, blocking malicious in-process abuse attempts and uncovering high-severity security bugs in critical software infrastructure on a daily basis, which highlights the broad impact of this thesis.
Chapter 1
Introduction
1.1 Problem Statement
Many attacks on software aim at accessing sensitive content in victim programs' memory, including secret data (e.g., crypto keys and user passwords) and critical code (e.g., private APIs and privileged functions). To achieve this goal, such attacks normally start with remote exploitation or injected malicious libraries. For instance, the HeartBleed attack on OpenSSL-equipped software reads private keys by exploiting a memory disclosure vulnerability [98], and the malicious libraries found in mobile apps covertly invoke private framework APIs to steal user data [95]. We generally refer to this class of attacks as in-process abuse. Obviously, such attacks would not succeed if we were able to (i) defend the victim program against exploitation attempts, or (ii) isolate the sensitive data and code from hostile code running in the same process. Despite decades of research, existing defense techniques still cannot meet the demand for practical and effective mitigations against in-process abuse, mainly for the following reasons.
The Pervasiveness of Memory Corruption Bugs: Memory-unsafe languages such as C and C++ allow developers to directly access memory with raw pointers. This great flexibility also imposes the burden on developers to make sure every memory access violates neither spatial (e.g., out-of-bounds access) nor temporal (e.g., use-after-free) memory safety. This process, unfortunately, is very error-prone. As a result, software inevitably contains defects [32, 181]. A large fraction of these defects are security vulnerabilities that can be exploited for malicious purposes [149]. Such vulnerable code has become a fundamental threat to software security.
The Arms Race of Memory Corruption Attacks: Once a memory corruption bug is found, an attacker may use it to gain illegal memory access to hijack the control flow, alter program execution logic, or read memory that can later facilitate exploitation. To execute arbitrary code, attacks used to inject shellcode along with the deployed payload into the victim program's address space. Modern commodity operating systems employ code integrity protection techniques, such as data execution prevention (DEP), to prevent traditional code injection attacks. Consequently, recent attacks [61, 182] increasingly leverage code-reuse techniques to gain control of vulnerable programs. In code reuse attacks, a target application's control flow is manipulated in a way that snippets of existing code (called gadgets) are chained to carry out malicious activities. Knowledge of the process memory layout is a key prerequisite for code-reuse attacks to succeed: attackers need to know the exact locations of binary instructions in memory to assemble the chain of gadgets. Commodity operating systems widely adopt address space layout randomization (ASLR), which loads code binaries at random memory locations unpredictable to attackers. Without knowing the locations of the needed code or gadgets, attackers cannot build code-reuse chains. However, memory disclosure attacks can use information leaks in programs to determine code locations, thus defeating ASLR. Such attacks either read the program code (direct de-randomization) or read code pointers (indirect de-randomization). Although deployed ASLR techniques randomize the load address of a large chunk of data or code, leaking a single code pointer or a small sequence of code allows attackers to identify the corresponding chunk, infer its base address, and calculate the addresses of gadgets contained in the chunk.
After learning the location of existing code, attackers can then launch return-oriented programming (ROP)-style code reuse attacks to execute arbitrary code in the victim process.
The Insufficient Support for In-process Memory Isolation: Developers are virtually helpless when it comes to preventing in-process abuse in their programs, due to a lack of support from the underlying operating system (OS): the memory isolation mechanisms provided by modern OSes operate merely at the process level and cannot be used to establish security boundaries inside a process. As a result, protecting sensitive memory content against malicious code inside the same process remains an open issue, one that has been increasingly exploited by attackers. To address this open issue, some recent work has proposed thread-level memory isolation [62]. Taking distinct approaches, these systems allow developers to limit the sharing of a thread's memory space with other threads in the same process. However, this line of work faces three major limitations. First, thread-level memory isolation is still too coarse-grained to stop in-process abuse because exploitable
or malicious code often runs in the same thread as the legitimate code that needs to access sensitive memory content. Second, adopting these solutions requires significant effort from developers. Separating application components into different threads (i.e., scheduling units) demands major design changes, as opposed to localized code patches, to deal with the added concurrency. Third, threads with private memory tend to incur much higher overhead than normal threads due to the additional page table switches, TLB flushes, or nested page table management upon context switches.
1.2 Thesis Statement
In-process memory abuse has become a dominating problem in software security, yet little research has studied defense mechanisms against it. A common belief is that in-process abuse cannot be defended against without high overhead or loss of backward compatibility. This thesis challenges these assumptions. I propose holistic defenses, including runtime mitigations and offline detection techniques. Collectively, these tools improve the state-of-the-art defense against in-process memory abuse without sacrificing practicality and compatibility.
1.3 Contributions
Next, I give an overview of the solutions I propose to address the problems discussed in § 1.1. Our goal is to create a comprehensive defense against the in-process abuse attacks that plague the current software ecosystem. To this end, this thesis makes the following contributions.
1.3.1 A Hybrid Approach for Practical Fine-grained Software Randomization
Despite decades of research on software diversification, only address space layout randomization has seen widespread adoption. Code randomization, an effective defense against return-oriented programming exploits, has failed to gain wide adoption in practice mainly due to i) the lack of a transparent and streamlined deployment process that does not disrupt existing software distribution norms, and ii) the inherent incompatibility of program variants with error reporting, whitelisting, patching, and other operations that rely on code uniformity. To this end, we present compiler-assisted code randomization (CCR), a hybrid approach that relies on compiler-rewriter cooperation to enable fast and robust fine-grained code randomization on end-user systems, while maintaining compatibility with existing software distribution models. The main concept behind CCR is to augment binaries with a minimal set of transformation-assisting metadata, which i) facilitate rapid
fine-grained code transformation at installation or load time, and ii) form the basis for reversing any applied code transformation when needed, to maintain compatibility with existing mechanisms that rely on referencing the original code. We have implemented a prototype of this approach by extending the LLVM compiler toolchain and developing a simple binary rewriter that leverages the embedded metadata to generate randomized variants using basic block reordering. The results of our experimental evaluation demonstrate the feasibility and practicality of CCR: on average, it incurs a modest file size increase of 11.46% and a negligible runtime overhead of 0.28%, while remaining compatible with link-time optimization and control flow integrity.
1.3.2 Leave No Program Behind: Execute-only Memory Protection For COTS Binaries
Code reuse attacks exploiting memory disclosure vulnerabilities can bypass all deployed mitigations. One promising defense against this class of attacks is to enable execute-only memory (XOM) protection on top of fine-grained address space layout randomization (ASLR). However, recent works implementing XOM, despite their efficacy, only protect programs that have been (re)built with new compiler support, leaving commercial-off-the-shelf (COTS) binaries and source-unavailable programs unprotected. We present the design and implementation of NORAX, a practical system that retrofits XOM into stripped COTS binaries on AArch64 platforms. Unlike previous techniques, NORAX requires neither source code nor debugging symbols. NORAX statically transforms existing binaries so that during runtime their code sections can be loaded into XOM memory pages, with embedded data relocated and data references properly updated. NORAX allows transformed binaries to leverage the new hardware-based XOM support, a feature widely available on AArch64 platforms (e.g., recent mobile devices) yet virtually unused due to the incompatibility of existing binaries. Furthermore, NORAX is designed to co-exist with other COTS binary hardening techniques, such as in-place randomization (IPR). We apply NORAX to the commonly used Android system binaries running on SAMSUNG Galaxy S6 and LG Nexus 5X devices. The results show that NORAX on average slows down the execution of transformed binaries by 1.18% and increases their memory footprint by 2.21%, suggesting NORAX is practical for real-world adoption.
1.3.3 Keep My Secrets: In-process Private Memory
Once attackers manage to execute code in a victim program's address space (i.e., after bypassing the code reuse mitigations), or find a memory disclosure vulnerability, all sensitive data and code inside that address space are subject to theft or manipulation. Unfortunately, this broad type of attack is hard to prevent, even if software developers wish to cooperate, mostly because conventional memory protection only works at the process level, and previously proposed in-process memory isolation methods are not practical for wide adoption.

We propose shreds, a set of OS-backed programming primitives that address developers' currently unmet needs for fine-grained, convenient, and efficient protection of sensitive memory content against in-process adversaries. A shred can be viewed as a flexibly defined segment of a thread execution (hence the name). Each shred is associated with a protected memory pool, which is accessible only to code running in the shred. Unlike previous works, shreds offer in-process private memory without relying on separate page tables, nested paging, or even modified hardware. Plus, shreds provide the essential data flow and control flow guarantees for running sensitive code. We have built the compiler toolchain and the OS module that together enable shreds on Linux. We demonstrated the usage of shreds and evaluated their performance using seven non-trivial open-source programs, including OpenSSH and Lighttpd. The results show that shreds are fairly easy to use and incur low runtime overhead (4.67%).
1.3.4 Focus on bugs: Bug-driven Hybrid Fuzzing
A popular trend in the fuzzing research community is to augment grey-box fuzz testing with symbolic execution, generally referred to as hybrid testing. It leverages fuzz testing to cover easy-to-reach code regions and uses concolic execution to explore code blocks guarded by complex branch conditions. As a result, hybrid testing is able to reach deeper into the program state space than fuzz testing or concolic execution alone. Recently, hybrid testing has seen significant advancement. However, its code coverage-centric design is inefficient for vulnerability detection. First, it blindly selects seeds for concolic execution and aims to explore new code continuously. However, as statistics show, a large portion of the explored code is often bug-free. Therefore, giving equal attention to every part of the code during hybrid testing is a non-optimal strategy; it slows down the detection of real vulnerabilities by over 43%. Second, classic hybrid testing quickly moves on after reaching a chunk of code, rather than examining the hidden defects inside. It may frequently miss subtle vulnerabilities even though it has already explored the vulnerable code paths.

I introduce SAVIOR, a new hybrid testing framework pioneering a bug-driven principle. Unlike existing hybrid testing tools, SAVIOR prioritizes the concolic execution of the seeds that are likely to uncover more vulnerabilities. Moreover, SAVIOR verifies all vulnerable program locations along the executing program path. By modeling faulty situations using SMT constraints, SAVIOR reasons about the feasibility of vulnerabilities and generates concrete test cases as proofs. Our evaluation shows that the bug-driven approach outperforms mainstream automated testing techniques, including state-of-the-art hybrid testing systems driven by code coverage. On average, SAVIOR detects vulnerabilities 43.4% faster than DRILLER and 44.3% faster than QSYM, leading to the discovery of 88 and 76 more unique bugs, respectively. According to the evaluation on 11 well-fuzzed benchmark programs, within the first 24 hours, SAVIOR triggers 485 UBSAN violations, among which 243 are real bugs.
1.3.5 Learning On Experience: Smart Seed Scheduling for Hybrid Fuzzing
Seed scheduling is a prominent factor in determining the yields of hybrid fuzzing. Existing hybrid fuzzers schedule seeds based on fixed heuristics that predict input utilities with best effort. However, such heuristics are not generalizable, as there is no one-size-fits-all rule that applies to different kinds of situations; they may work well on one program but be detrimental when fuzzing others. To overcome this problem, we design a Machine learning-Enhanced hybrid fUZZing system (MEUZZ), which employs supervised machine learning for devising generalizable seed scheduling. MEUZZ determines which new seeds are likely to produce better fuzzing yields based on the knowledge learned from past seed scheduling results. MEUZZ integrates machine learning techniques without interrupting the fuzzing workflow; it draws a series of lightweight but informative features from reachability and dynamic analyses. Extracting these features incurs very little overhead (in microseconds). Moreover, MEUZZ automatically infers the data labels by constantly evaluating the fuzzing performance of each selected seed. As a result, MEUZZ achieves substantial efficacy as well as generalizability. The experimental results show that MEUZZ significantly outperforms state-of-the-art grey-box and hybrid fuzzers, achieving as much as 27.1% more code coverage than QSYM. More importantly, the models are extensively reusable and transferable: the reused models boost coverage performance by 7.1% on average, and the transplanted models improve 67.9% of the 56 cross-program fuzzing configurations. Also, MEUZZ can uncover 50 deeply hidden bugs, with 19 confirmed and fixed by the maintainers, when fuzzing 8 well-tested programs with the same configurations used in previous work.
1.4 Roadmap
The remainder of this dissertation is organized as follows. In chapter 2, I discuss works related to the online and offline defenses introduced in this thesis. Then, the thesis is split into two parts. In part one, I first show how to use runtime mitigations to break the in-process abuse exploit chain in chapter 3. Following the mitigations, in chapter 4 I introduce a fine-grained memory isolation technique as the last resort provided to developers to protect their secret data or code from untrusted components running in the same process. In part two, I explore automated test generation to identify software bugs that facilitate in-process memory abuse. Specifically, I present fuzzing works in two directions. In ?? I show how existing code bases can be used to automate fuzz driver generation, to improve the adoption rate of fuzz testing. Then, in chapter 5 and chapter 6, I show how bug-driven and learning-based hybrid testing can be used to detect bugs hidden in deep program paths. Finally, I conclude by providing a discussion on the findings of this dissertation in chapter 7.
Chapter 2
Related Works
In this chapter, I discuss related works concerning the defense against in-process abuse. I start by describing the progression of memory corruption attacks and defenses. Then I discuss past efforts to provide an extra isolation layer beyond process-level isolation. Finally, I give an overview of software test generation techniques and describe state-of-the-art testing techniques that facilitate exposing deep memory corruption bugs.
2.1 Perpetual War On Memory Corruption Attacks
Over the years, there has been an ongoing race between code reuse attacks and corresponding defense countermeasures. Such code reuse attacks keep evolving into new forms with more complex attack steps (e.g., Blind-ROP [61], JIT-ROP [182]). To defend against them, two categories of countermeasures (e.g., ASLR + XOM, CFI) have been proposed from different perspectives. Here we briefly review these defenses, especially execute-only memory, the category this work falls into.
Address Space Layout Randomization (ASLR): ASLR is a practical and popular defense deployed in modern operating systems to thwart code reuse attacks [191]. It randomizes memory addresses and makes the locations of ROP gadgets unpredictable. However, the de-facto ASLR only randomizes the base address of code pages. It becomes ineffective when facing recent memory-disclosure-based code reuse attacks [61, 182]. Such attacks explore the address space on-the-fly to find ROP gadgets via a memory disclosure vulnerability. Although fine-grained ASLR, such as compile-time code randomization [59] and load-time randomization [92, 119, 125, 199], increases the entropy of randomization, the memory disclosure attack is not directly addressed, since code pages can still be read by attackers [182]. Runtime randomization [60, 81, 90] has thus been proposed to introduce more uncertainty into the program's address space. Its effectiveness depends on who acts faster, the attacker or the re-randomization mechanism. Due to the need to track all code and data objects and correct their references, these solutions either require compiler assistance or rely on runtime translation, which limits their applicability and incurs non-trivial overhead.

eXecute-only Memory (XOM): To address memory disclosure attacks, researchers proposed execute-only but non-readable memory pages to hinder the possibility of locating reusable code (or ROP gadgets). However, one fundamental challenge in achieving this defense is that it is non-trivial to identify and separate legitimate data read operations in code pages. When source code is available, existing works like Readactor [88, 89] and LR2 [67] rely on compilers to separate data reads from code pages and then enforce XOM via either hardware-based virtualization or software-based address masking. On the other hand, for COTS binaries, which are more common in real-world scenarios, XnR [54] blocks direct memory disclosure by modifying the page fault handler in the operating system to check whether a memory read falls inside a code or data region of a process. However, it cannot handle embedded data mixed into code regions. HideM [109] utilizes the split-TLB feature in AMD processors to direct code and data accesses to different physical pages to prevent reading code. Unfortunately, recent processors no longer support split TLBs.
Control Flow Integrity (CFI): Enforcing CFI is another general defense against attacks that hijack control flow, including code reuse attacks. Proposed a decade ago by Abadi et al. [46], CFI has been refined by researchers over the years [141, 147, 152, 153, 192, 193], from its early coarse-grained form to its current mature fine-grained form. The fundamental difference is that coarse-grained CFI allows forward edges in the control flow graph (CFG) to point at any node in the graph and backward edges to return to any call-preceded destination, whilst fine-grained CFI has a more precise set of destinations for both forward and backward edges. bin-CFI [210] and CCFIR [209] enforce coarse-grained CFI policies on Linux and Windows COTS binaries, respectively. Unfortunately, enforcing fine-grained CFI requires a more precise CFG to be built as the ground truth, which is difficult to obtain in practice based on static analysis, even when source code is available. In addition, researchers found that it is still possible to launch code reuse attacks even when a fine-grained CFI solution is in place, due to the difficulty of extracting a perfect CFG in practice [72, 91, 101, 112].
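The essence of a forward-edge CFI check can be sketched in a few lines. The following toy is not modeled on any particular system (all names are illustrative): legitimate indirect-call targets are registered in a whitelist, and transfers to anything else are rejected, which is conceptually what a coarse-grained forward-edge check does.

```python
# Conceptual sketch of a coarse-grained forward-edge CFI check (illustrative
# only): an indirect call may only transfer to registered entry points.
VALID_TARGETS = set()

def cfi_register(fn):
    """Mark fn as a legitimate indirect-call target."""
    VALID_TARGETS.add(fn)
    return fn

@cfi_register
def handler():
    return "handled"

def gadget():                   # never registered: stands in for a ROP gadget
    return "pwned"

def indirect_call(fn):
    if fn not in VALID_TARGETS:  # the inserted CFI check
        raise RuntimeError("CFI violation")
    return fn()

print(indirect_call(handler))   # handled
```

A fine-grained policy would additionally restrict each call site to its own, smaller target set derived from the CFG, rather than one global whitelist.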
2.2 In-Process Memory Isolation
Program module isolation: Previous works have studied the problem of isolating the executions of mutually distrusting modules, ranging from libraries in user-space programs to drivers in the OS. SFI [197] and its variants [73, 100] establish strict boundaries in the memory space to isolate potentially faulty modules and therefore contain the impact resulting from the crashes or malfunctions of such modules. SFI has also been extended to build sandboxes for untrusted plugins and libraries on both x86 [105, 206] and ARM [11, 213]. Extending module isolation into kernel space, some previous works [100, 185] contain faulty drivers as well as user-space modules. Unlike these works, which focus on fault isolation or sandboxing, our work aims to prevent in-process memory abuse launched by either vulnerable or malicious code. Our work allows developers to run sensitive code in flexibly-defined and lightweight execution units (i.e., shreds), where the code has exclusive access to private memory pools, in addition to the regular memory regions, and the execution is protected from other code running (concurrently) in the same address space. The aforementioned works require verification and instrumentation of all untrusted code modules, whereas our work only needs to analyze and harden trusted in-shred code. We repurpose ARM memory domains to efficiently realize the design of shreds and the protection against in-process abuse. Furthermore, SFI and similar techniques assume that isolated modules should be logically independent and not interact closely, whereas shreds neither impose such restrictions nor incur additional overhead when accessing regular memory, invoking third-party library functions, or making system calls.
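The address-masking idea behind SFI can be illustrated with a short sketch (the region base and size here are hypothetical): every store issued by an untrusted module first has its address forced into the module's own data region, so even a corrupted pointer cannot touch memory outside the sandbox.

```python
# SFI-style address masking, sketched (hypothetical base/size): before every
# store from an untrusted module, the address is masked so it can only land
# inside the module's own 1 MiB data region.
SANDBOX_BASE = 0x20000000
SANDBOX_MASK = 0x000FFFFF

memory = {}

def sandboxed_store(addr, value):
    confined = SANDBOX_BASE | (addr & SANDBOX_MASK)  # the inserted guard
    memory[confined] = value
    return confined

# In-bounds stores are unaffected...
assert sandboxed_store(SANDBOX_BASE + 0x10, 1) == SANDBOX_BASE + 0x10
# ...while a wild pointer is forced back into the sandbox instead of
# corrupting another module's memory.
assert sandboxed_store(0xDEADBEEF, 2) == 0x200DBEEF
```

Real SFI systems apply such masking (or bounds checks) via compile-time instrumentation and verify it at load time; the sketch only shows the invariant being enforced.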
Process- and thread-level isolation: Arranging program components into different processes has long been advocated as a practical approach to achieving privilege and memory separation [69, 126, 161]. Many widely used programs, such as OpenSSH and Chrome, have adopted this approach. Separated components run in their own address spaces and are immune from memory abuse by other components. However, process separation faces three major limitations when used to defend against memory abuse. First, due to the coarse granularity of a process, memory abuse may still happen inside a component process as a result of a library call or a code injection, as shown in several real attacks on Chrome. Second, using process separation usually requires major software design changes due to the added concurrency and restrictions, which prevents wide adoption. Third, process separation can cause high overhead, particularly when separated components frequently interact. Some recent works [62, 160] proposed thread-level isolation. While incurring slightly lower overhead than process-level isolation, they still suffer from the fixed granularity and require major software changes to be adopted. In comparison, shreds are flexibly grained and easy to adopt. Shreds are also more efficient because, unlike the aforementioned works, our design does not rely on heavy paging-based memory access control.
Protected execution environments: A number of systems have been proposed for securely executing sensitive code or performing privileged tasks. Flicker [144] allows for trusted code execution in full isolation from the OS or even the BIOS and provides remote attestation. TrustVisor [143] improves on performance and granularity with a special-purpose hypervisor. SeCage [136] runs sensitive code in a secure VM. SICE [52] protects sensitive workloads purely at the hardware level and supports concurrent execution on multicore platforms. SGX [145], a feature of recent Intel CPUs, allows user-space programs to create so-called enclaves where sensitive code can run securely but has little access to system resources or application context. In general, these systems are designed for self-contained code that can run independently in isolated or constrained environments. They are neither suitable nor practical for preventing memory abuse, which can target data or code that cannot be jailed in these isolated environments. In addition, these systems do not need to consider the case where the protected execution itself can be exploited, whereas our design does, and enforces security checks on in-shred executions.
Memory encryption and protection: Several memory protection mechanisms have been proposed before. Overshadow [76] uses virtualization to render encrypted views of application memory to the untrusted OS and, in turn, protects application data. Mondrian [132] is a hardware-level memory protection scheme that enables permission control at word granularity and allows memory sharing among multiple protection domains. Another scheme [186] provides memory encryption and integrity verification for secure processors. While offering strong protection, these schemes all require hardware modifications and have not been adopted in the real world. In fact, this work was partly motivated by the lack of a practical, software-based memory protection mechanism. Recently, protecting cryptographic keys in memory became a popular research topic. Proposed solutions range from minimizing key exposure in memory [48, 118, 148], to avoiding key presence in RAM by confining key operations to CPUs [113, 150], GPUs [196], and hardware transactional memory [114]. Although effective at preventing key theft, a major common type of memory abuse, these works can hardly protect other types of sensitive data or code in memory.
2.3 Automatic Software Tests Generation
Software tests can expose unexpected programming errors such as memory corruptions. To uncover a software bug, one must first trigger the functionality that contains the bug. As a result, software testing techniques seek to maximize functionality coverage. This strategy can also be modeled as an optimization problem, where testing tools search the potentially infinite input space for inputs that trigger new program behaviors. In general, there are two ways of searching for interesting inputs, namely random testing and systematic analysis. Fuzzing [2] is a representative method for random testing: a fuzzer randomly generates new inputs in the hope that they will trigger unexpected program errors. In contrast, systematic analyses such as symbolic execution [71] collect path constraints and utilize SMT solvers to generate inputs that satisfy those constraints. In this thesis, I focus on three categories of test generation techniques, namely fuzzing, concolic execution, and hybrid fuzzing.
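The contrast between the two search strategies can be made concrete with a toy example (illustrative only; the magic constant is arbitrary): random testing almost never satisfies a 32-bit equality, while a constraint-solving step derives a satisfying input directly from the branch predicate.

```python
import random

MAGIC = 0xDEADBEEF          # hypothetical hard-to-guess branch condition

def target(x):
    """Toy program under test: the 'deep' branch is guarded by a 32-bit
    equality, which a random input satisfies with probability 2**-32."""
    return "bug" if x == MAGIC else "ok"

# Random testing (fuzzing): cheap per execution, but blind to the predicate.
random.seed(0)
hits = sum(target(random.getrandbits(32)) == "bug" for _ in range(10_000))
print("random testing hits:", hits)        # almost certainly 0

# Systematic analysis (sketch): collect the path constraint x == MAGIC from a
# concrete run and hand it to a solver; for a plain equality the answer is
# immediate (a real system would invoke an SMT solver here).
solved_input = MAGIC
print("constraint solving:", target(solved_input))   # constraint solving: bug
```

Hybrid testing, discussed below, combines the two: the cheap random search covers the easy branches, and the solver is reserved for predicates like the one above.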
Advanced Grey-Box Fuzzing: Many recent works focus on improving the capability of code exploration in fuzzing. CollAFL [106] aims to reduce hash collisions in coverage feedback to decrease false negatives. PTrix [79] enables path-sensitive fuzzing based on efficient hardware tracing. TFUZZ [159] transforms tested programs to bypass complex conditions and improve code coverage, and later uses a validator to reproduce the inputs that work for the original program. To generate high-quality seeds, ProFuzzer [207] infers the structural information of the inputs. Along the line of seed generation, Angora [75] treats each conditional statement as a black-box function and applies gradient descent to find satisfying input bytes. This method is later improved by NEUZZ [177] with a smooth surrogate function to approximate the behavior of the tested program.
Concolic Execution: Symbolic execution, a systematic approach introduced in the 1970s [121, 127] for program testing, has attracted renewed attention due to advances in satisfiability modulo theories [93, 94, 107]. However, classic symbolic execution suffers from high computation cost and path explosion. To tackle these issues, Sen proposed concolic execution [172], which combines the constraint solving of symbolic execution with the fast execution of concrete testing. Concolic execution increases the coverage of random testing [110, 111] while also scaling to large software. Hence, it has been adopted in various frameworks [70, 83, 173, 174]. Recently, concolic execution has also been widely applied in automated vulnerability detection and exploitation, in which the concolic component provides critical inputs by incorporating security-related predicates [51, 74]. However, concolic execution operates on emulation or heavy instrumentation, incurring tremendous execution overhead. Purely relying on concolic execution for code exploration is less practical for large software that performs large numbers of operations. In contrast, hybrid testing runs fuzzing for code exploration and invokes concolic execution only on hard-to-solve branches. This takes advantage of both the fuzzer's efficiency and the concolic executor's constraint-solving power.
Hybrid Testing: Majumdar et al. [139] introduced the idea of hybrid concolic testing over a decade ago. This idea offsets the deficiencies of both random testing and concolic execution. Specifically, their approach interleaves random testing and concolic execution to deeply explore a wide program state space. Subsequent development reinforced hybrid testing by replacing random testing with guided fuzzing [154], which rapidly contributes more high-quality seeds to concolic execution. Recently, DRILLER [184] engineered the pioneering hybrid testing system. It more coherently combines fuzzing and concolic execution and can seamlessly test various software systems. Despite this advancement, DRILLER's vulnerability detection remains unsound. DigFuzz [211] is a more recent work that tries to better coordinate the fuzzing and concolic execution components. Using a Monte Carlo algorithm, DigFuzz predicts the difficulty for a fuzzer to explore a path and prioritizes exploring seeds with higher difficulty scores. Moreover, motivated by the growing demands of software testing, researchers have been reasoning about the performance of hybrid testing. As commonly understood, hybrid testing is largely restricted by the slow concolic execution. To this end, QSYM [208] implements a concolic executor that strips away the heavy but unnecessary computations in symbolic interpretation and constraint solving, leading to severalfold acceleration.
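The Monte Carlo intuition behind DigFuzz-style difficulty scoring can be sketched as follows (a simplification of the actual algorithm; the branch statistics are made up): a path's probability is estimated as the product of the observed probabilities of taking each branch along it, and seeds on low-probability paths are prioritized for concolic execution.

```python
import math

def path_difficulty(branch_counts):
    """DigFuzz-style estimate, sketched: a path's probability is the product
    of the observed probabilities of taking each of its branches; difficulty
    is the negative log-probability. branch_counts is a list of
    (times_taken, times_reached) pairs gathered from fuzzing executions."""
    logp = 0.0
    for taken, reached in branch_counts:
        logp += math.log(taken / reached)
    return -logp

# Hypothetical statistics: a path through two common branches vs. a path
# through two rarely-taken branches.
easy = path_difficulty([(900, 1000), (800, 1000)])
hard = path_difficulty([(3, 1000), (1, 1000)])
assert hard > easy   # schedule the harder path's seed for concolic execution
```

Working in log space avoids underflow when a path crosses many branches, which is why difficulty is the negative log-probability rather than the raw product.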
Universal Fuzzing Adoption: As of today, two major hindrances stand in the way of the wide adoption of fuzzing. First, to fuzz a program or library, a fuzzer requires a fuzz driver through which it can pass inputs to exercise the library code of interest. Unfortunately, writing fuzz drivers remains a primarily manual exercise, a major obstacle to the widespread adoption of fuzzing. Second, state-of-the-art fuzzers suffer from a 2-5x slowdown when the source code of the target program is unavailable, due to the high overhead introduced by dynamic emulation. As a result, when fuzzing binary-only software, grey-box fuzzers do not expose bugs as efficiently as the compiler-based instrumentation approach.

To this end, Babić et al. [53] built the Fudge system for automated fuzz driver generation. Fudge automatically generates fuzz driver candidates for libraries based on existing client code. Instead of relying on manual effort to compose the fuzz target, Fudge operates on the key insight that fuzz drivers can be automatically learned from client code in existing code bases. Fudge extracts interesting function usages via static analysis and dynamic tracing, and uses a function synthesis module to generate compilable code that drives libFuzzer against the target function. To mitigate the slowdown introduced by dynamic binary instrumentation (DBI), Chen et al. [79] built the PTrix system to replace DBI with Intel Processor Trace (PT). PTrix fully unleashes the benefits of PT with three novel designs. First, PTrix introduces a scheme to highly parallelize the processing of the PT trace and the target program's execution. Second, it directly takes the decoded PT trace as feedback for fuzzing, avoiding the expensive reconstruction of code coverage information. Third, PTrix maintains a new form of feedback that is stronger than edge-based code coverage, which helps reach new code and defects that existing fuzzers may not.
Part I
Runtime Protections Against In-Process Abuse
Chapter 3
Code Reuse Exploit Mitigations
3.1 Compiler-assisted Code Randomization
3.1.1 Background
To fulfill our goal of generic, transparent, and fast fine-grained code randomization at the client side, there is a range of possible solutions that one may consider. In this section, we discuss why existing solutions are not adequate, and provide some details about the compiler toolchain we used.
3.1.1.1 The Need for Additional Metadata
Static binary rewriting techniques [55, 199, 209] face significant challenges due to indirect control flow transfers, jump tables, callbacks, and other code constructs that result in incomplete or inaccurate control flow graph extraction [120, 163, 200]. More generally applicable techniques, such as in-place code randomization [131, 156], can be performed even with partial disassembly coverage, but can only apply narrow-scoped code transformations, thereby leaving parts of the code non-randomized (e.g., complete basic block reordering is not possible). On the other hand, approaches that rely on dynamic binary rewriting to alleviate the inaccuracies of static binary rewriting [92, 119, 179, 209] suffer from increased runtime overhead.

A relaxation that could be made is to ensure programs are compiled with debug symbols and relocation information, which can be leveraged at the client side to perform code randomization. Symbolic information facilitates runtime debugging by providing details about the layout of objects, types, addresses, and lines of source code. On the other hand, it does not include lower-level information about complex code constructs, such as jump tables and callback routines, nor does it contain metadata about (handwritten) assembly code [137]. To make matters worse, modern compilers attempt to generate cache-friendly code by inserting alignment and padding bytes between basic blocks, functions, objects, and even between jump tables and read-only data [194]. Various performance optimizations, such as profile-guided [35] and link-time [123] optimization, complicate code extraction even further—Bao et al. [56], Rui and Sekar [162], and others [50, 99, 117] have repeatedly demonstrated that accurately identifying functions (and their boundaries) in binary code is a challenging task.

In the same vein, Williams-King et al. [203] implemented Shuffler, a system that relies on symbolic and relocation information (provided by the compiler and linker) to disassemble code and identify all code pointers, with the goal of performing live code re-randomization. Despite the impressive engineering effort, its authors admit that they "encountered myriad special cases" related to inaccurate or missing metadata, special types of symbols and relocations, and jump table entries and invocations. Considering that these numerous special cases occurred just for a particular compiler (GCC), platform (x86-64 Linux), and set of (open-source) programs, it is reasonable to expect that similar issues will arise again when moving to different platforms and more complex applications.

Based on the above, we argue that relying on existing compiler-provided metadata is not a viable approach for building a generic code transformation solution. More importantly, the complexity involved in the transformation process performed by the aforementioned schemes (e.g., static code disassembly, control flow graph extraction, runtime analysis, heuristics) is far from what could be considered reasonable for a fast and robust client-side rewriter, as discussed in Section ??. Consequently, we opt for augmenting binaries with just the necessary domain-specific metadata needed to facilitate safe and generic client-side code transformation (and hardening) without any further binary code analysis.
3.1.1.2 Fixups and Relocations
When performing code randomization, machine instructions with register or immediate operands do not require any modification after they are moved to a new (random) location. In contrast, if an operand contains a (relative or absolute) reference to a memory location, then it has to be adjusted according to the instruction’s new location, the target’s new location, or both. (Note that a similar process takes place during the late stages of compilation.) Focusing on LLVM, whenever a value that is not yet concrete (e.g., a memory location or an
            Object File                                 Final Executable
ADDR    Byte Code        Instruction            Byte Code        ADDR
0x5A78  48 89 DF         mov rdi, rbx           48 89 DF         0x412D58
0x5A7B  4C 89 F6         mov rsi, r14           4C 89 F6         0x412D5B
0x5A7E  E8 49 43 00 00   call someFunc    (1)   E8 8D 30 06 00   0x412D5E
0x5A83  EB 0D            jmp short 0xD    (2)   EB 0D            0x412D63
0x5A85  49 39 1C 24      cmp [mh], ctrl         49 39 1C 24      0x412D65
0x5A89  74 13            jz short 0x13          74 13            0x412D69
0x5A8B  49 39 5C 24 08   cmp [mh+8], ctrl       49 39 5C 24 08   0x412D6B
0x5A90  74 51            jz short 0x51          74 51            0x412D70
0x5A92  48 83 C4 08      add rsp, 8             48 83 C4 08      0x412D72
0x5A96  5B               pop rbx                5B               0x412D76
0x5A97  41 5C            pop r12                41 5C            0x412D77
0x5A99  41 5E            pop r14                41 5E            0x412D79
0x5A9B  41 5F            pop r15                41 5F            0x412D7B
0x5A9D  C3               retn                   C3               0x412D7D
Relocation Table for Object File .text Section
OFFSET   TYPE            VALUE
...
0x5a7f   R_X86_64_PC32   someFunc-0x4    (1)
...

Figure 3.1: Example of the fixup and relocation information that is involved during the compilation and linking process.

external symbol) is encountered during the instruction encoding phase, it is represented by a placeholder value, and a corresponding fixup is emitted. Each fixup contains information on how the placeholder value should be rewritten by the assembler when the relevant information becomes available. During the relaxation phase [57, 135], the assembler modifies the placeholder values according to their fixups, as they become known to it. Once relaxation completes, any unresolved fixups become relocations, stored in the resulting object file.

Figure 3.1 shows a code snippet that contains several fixups and one relocation. The left part corresponds to an object file after compilation, whereas the right one depicts the final executable after linking. Initially, there are four fixups (underlined bytes) emitted by the compiler. As the relocation table shows, however, only a single relocation (which corresponds to fixup 1) exists for address 0x5a7f, because the other three fixups were resolved by the assembler. Henceforth, we explicitly refer to relocations in object files as link-time relocations—i.e., fixups that are left unresolved after the assembly process (to be handled by the linker). Similarly, we refer to relocations in executable files (or dynamic shared objects) as load-time relocations—i.e., relocations that are left unresolved after linking (to be handled by the dynamic linker/loader). Note that in this particular example, the
final executable does not contain any load-time relocations, as relocation 1 was resolved during linking (0x4349 → 0x6308D).
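The arithmetic of that resolution can be reproduced directly. An R_X86_64_PC32 relocation is computed as S + A - P (symbol address plus addend, minus the patch location); with the addend of -4 from the relocation table, the sketch below recovers the final bytes shown in Figure 3.1 (the address of someFunc is derived from the figure's values, not stated in it):

```python
import struct

def resolve_pc32(patch_addr, symbol_addr, addend=-4):
    """Resolve an R_X86_64_PC32 relocation: the four bytes at patch_addr
    receive S + A - P, i.e., symbol_addr + addend - patch_addr."""
    return struct.pack('<i', symbol_addr + addend - patch_addr)

# The call's displacement field sits one byte past the E8 opcode at 0x412D5E.
patch_addr = 0x412D5E + 1
# Derived: the displacement 0x6308D is relative to the next instruction,
# so someFunc must live at 0x412D63 + 0x6308D = 0x475DF0.
some_func = patch_addr + 4 + 0x6308D

assert resolve_pc32(patch_addr, some_func).hex() == '8d300600'  # E8 8D 30 06 00
```

The same formula is what a client-side rewriter must re-apply after moving either the call site or its target, which is exactly why the metadata described in this section records every such reference.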
In summary, load-time relocations are a subset of link-time relocations, which are a subset of all fixups. Unfortunately, even if link-time relocations are completely preserved by the linker, they are not sufficient for performing fine-grained code randomization. For instance, fixup 2 was resolved earlier by the assembler, but is essential for basic block reordering, as the respective jmp instruction with a single-byte displacement may have to be replaced by one with a four-byte displacement—if the target basic block is moved more than 127 bytes forward or 126 bytes backwards from the jmp instruction itself. Evidently, comprehensive fixups are pivotal pieces of information for fine-grained code shuffling, and should be promoted to first-class metadata by modern toolchains in order to provide support for generic, transparent, and compatible code diversification.
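To make this concrete, the following toy encoder (illustrative only, not CCR's actual rewriter) chooses between the two jmp encodings; with the addresses from Figure 3.1, the original 13-byte displacement fits the short form, while moving the target 0x300 bytes away forces the longer one:

```python
import struct

def encode_jmp(src, dst):
    """Encode an x86-64 jump from instruction address src to dst, choosing
    the short (2-byte, rel8) or near (5-byte, rel32) form. Displacements
    are relative to the end of the encoded instruction."""
    disp = dst - (src + 2)                           # rel8 form
    if -128 <= disp <= 127:
        return bytes([0xEB, disp & 0xFF])            # jmp short
    disp = dst - (src + 5)                           # rel32 form
    return bytes([0xE9]) + struct.pack('<i', disp)   # jmp near

# Before reordering: target 0x5A92 is 13 bytes ahead, as in Figure 3.1.
print(encode_jmp(0x5A83, 0x5A92).hex())              # eb0d
# After reordering moves the target 0x300 bytes away, rel8 no longer fits.
print(encode_jmp(0x5A83, 0x5A83 + 0x300).hex())      # e9fb020000
```

Without the fixup recording where the displacement bytes live, a rewriter cannot safely perform this relaxation in reverse, which is precisely the gap the embedded metadata fills.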
3.1.2 Overall Approach
The design of CCR is driven by the following two main goals, which so far have been limiting factors for the actual deployment of code diversification in real-world environments:

Practicality: From a deployment perspective, a practical code diversification scheme should not disrupt existing features and software distribution models. Requiring software vendors to generate a diversified copy per user, or users to recompile applications from source code or transform them using complex binary analysis tools, have proven to be unattractive models for the deployment of code diversification.

Compatibility: Code randomization is a highly disruptive operation that should be safely applicable even to complex programs and code constructs. At the same time, code randomization inherently clashes with well-established operations that rely on software uniformity. These include security and quality monitoring mechanisms commonly found in enterprise settings (e.g., code integrity checking and whitelisting), as well as crash reporting, diagnostics, and self-updating mechanisms.

Augmenting compiled binaries with metadata that enables their subsequent randomization at installation or load time is an approach fully compatible with existing software distribution norms. The vast majority of software is distributed in the form of compiled binaries, which are carefully generated, tested, signed, and released through official channels by software vendors. On each endpoint, at installation time, the distributed software typically undergoes some post-processing and customization, e.g., its components are decompressed and installed in appropriate locations according to the system's configuration, and sometimes they are even further optimized according to the client's architecture, as is the case with Android's ahead-of-time compilation [188] or the
19 CHAPTER 3. CODE REUSE EXPLOIT MITIGATIONS
Linux kernel's architecture-specific optimizations [86]. Under this model, code randomization can fittingly take place as an additional post-processing task during installation. As an alternative, randomization can take place at load time, as part of the modifications that the loader makes to code and data sections for processing relocations [158]. However, to avoid extensive user-perceived delays due to the longer rewriting time required for code randomization, a more viable approach would be to maintain a supply of pre-randomized variants (e.g., an OS service can generate them in the background), which can then instantly be picked by the loader.

Note that this distribution model is followed even for open-source software, as installing binary executables through package management systems (e.g., apt-get) offers unparalleled convenience compared to having to compile each new or updated version of a program from scratch. More importantly, under such a scheme, each endpoint can choose among different levels of diversification (hardening vs. performance), by taking into consideration the anticipated exposure to certain threats [108], and the security properties of the operating environment (e.g., private intranet vs. Internet-accessible setting).

The embedded metadata serves two main purposes. First, it allows the safe randomization of even complex software without relying on imprecise methods and incomplete symbolic or debug information. Second, it forms the basis for reversing any applied code transformation when needed, to maintain compatibility with existing mechanisms that rely on referencing the original code that was initially distributed.

Figure 3.2 presents a high-level view of the overall approach. The compilation process remains essentially the same, with just the addition of metadata collection and processing steps during the compilation of each object file and the linking of the final master executable.
The executable can then be provided to users and endpoints through existing distribution channels and mechanisms, without requiring any changes. As part of the installation process on each endpoint, a binary rewriter generates a randomized version of the executable by leveraging the embedded metadata. In contrast to existing code diversification techniques, this transformation does not involve any complex and potentially imprecise operations, such as code disassembly, symbolic information parsing, reconstruction of relocation information, introduction of pointer indirection, and so on. Instead, the rewriter performs simple transposition and replacement operations based on the provided metadata, treating all code sections as raw binary data. Our prototype implementation, discussed in detail in Section ??, currently supports fine-grained randomization at the granularity of functions and basic blocks, is oblivious to any applied compiler optimizations, and supports static executables, shared objects, PIC, partial/full RELRO [129], exception handling, LTO, and even CFI.

[Figure 3.2: Overview of the proposed approach. A modified compiler collects metadata for each object file (1), which is further updated and consolidated at link time into a single extra section in the final executable (2). At the client side, a binary rewriter leverages the embedded metadata to rapidly generate randomized variants of the executable (3).]
3.1.3 Compiler-level Metadata
Our work is based on LLVM [42], which is widely used in both academia and industry, and we picked the ELF format and the x86-64 architecture as our initial target platform. Figure 3.3 illustrates an example of the ELF layout generated by Clang (LLVM’s native C/C++/Objective-C compiler).
3.1.3.1 Layout Information
Initially, the range of the transformable area is identified, as shown in the left side of Figure 3.3. This area begins at the offset of the first object in the .text section and comprises all user-defined objects that can be shuffled. We modified LLVM to append a new section named .rand in every compiled object file so that the linker can be aware of which objects have embedded metadata. In our current prototype, we assume that all user-defined code is consecutive. Although it is possible to have intermixed code and data in the same section, we have ignored this case for now, as by default LLVM does not mix code and data when emitting x86 code. This is the case for other modern compilers too—Andriesse et al. [49] could identify 100% of the instructions when disassembling GCC and Clang binaries (but CFG reconstruction still remains challenging).
[Figure 3.3: An example of the ELF layout generated by Clang (left), with the code of a particular function expanded (center and right). The leftmost and rightmost columns in the code listing ("BBL" and "Fragment") illustrate the relationships between basic blocks and LLVM's various kinds of fragments: data (DF), relaxable (RF), and alignment (AF). Data fragments are emitted by default, and may span consecutive basic blocks (e.g., BBL #1 and #2). The relaxable fragment #1 is required for the branch instruction, as it may be expanded during the relaxation phase. The padding bytes at the bottom correspond to a separate fragment, although they do not belong to any basic block.]
When loading a program, a sequence of startup routines assists in bootstrap operations, such as setting up environment variables and reaching the first user-defined function (e.g., main()). As shown in Figure 3.3, the linker appends several object files from libc into the executable for this purpose (crt1.o, cri.o, crtbegin.o). Additional object files implement process termination operations (crtn.o, crtend.o). Currently, these automatically-inserted objects are excluded from transformation; this is an implementation issue that can be easily addressed by ensuring that a set of augmented versions of these objects is made available to the compiler.

At program startup, the function _start() in crt1.o passes five parameters to __libc_start_main(), which in turn invokes the program's main() function. One of the parameters is a pointer to main(), which we need to adjust after main() has been displaced. The metadata we have discussed so far are updated at link time, according to the final layout of all objects. The upper part of Table 3.1 summarizes the collected layout-related metadata.
3.1.3.2 Basic Block Information
The bulk of the collected metadata is related to the size and location of objects, functions, basic blocks (BBLs), and fixups, as well as their relationships. For example, a fixup inherently belongs to a basic block, a basic block is a member of a function, and a function is included in an object.

The LLVM backend goes through a very complex code generation process which involves all scheduled module and function passes for emitting globals, alignments, symbols, constant pools, jump tables, and so on. This process is performed according to an internal hierarchical structure of machine functions, machine basic blocks, and machine instructions. The machine code (MC) framework of the LLVM backend operates on these structures and converts machine instructions into the corresponding target-specific binary code. This involves the EmitInstruction() routine, which emits one chunk of code at a time, called a fragment. As a final step, the assembler (MCAssembler) assembles those fragments in a target-specific manner, decoupled from any logical hierarchical structure; that is, the unit of the assembly process is the fragment.

We internally label each instruction with its parent basic block and function. The collection process continues until instruction relaxation has completed, in order to capture the emitted bytes that will be written into the final binary. As part of the final metadata, however, these labels are not essential and can be discarded. As shown in Table 3.1, we only keep information about the lower boundary of each basic block, which can be the end of an object (OBJ), the end of a function (FUN), or the beginning of the next basic block (BBL).

Going back to the example of Figure 3.3, we identify three types of fragments: data, relaxable, and alignment, shown on the right side of the figure.
The center of the figure shows the emitted bytes as generated by Clang, and their corresponding code as extracted by the IDA Pro disassembler, for the j-th function of the i-th object in the code section. The function consists of five basic blocks and eight fragments, and contains eleven fixups (underlined bytes). Note that relaxable fragments are generated only for branch instructions and contain just a single instruction. Alignment fragments correspond to padding bytes. In this example, there are two alignment fragments (#3 and #7): one between basic blocks #2 and #3, and one between function j and the following function. For metadata compactness, alignment fragments are recorded as part of the metadata of their preceding basic blocks. The rest of the instructions are emitted as part of data fragments.

Another consideration is fall-through basic blocks. A basic block terminated with a conditional branch implicitly falls through to its successor, depending on the evaluation of the condition. In Figure 3.3, the last instruction of BBL #0 jumps to BBL #2 when the zero flag is set, or control falls through to BBL #1. Such fall-through basic blocks must be marked so that they can be treated appropriately during reordering.

Table 3.1: Collected randomization-assisting metadata

  Metadata           Collected Information                  Collection time
  -----------------  -------------------------------------  ---------------
  Layout             Section offset to first object         Linking
                     Section offset to main()               Linking
                     Total code size for randomization      Linking
  Basic Block (BBL)  BBL size (in bytes)                    Linking
                     BBL boundary type (BBL, FUN, OBJ)      Compilation
                     Fall-through or not                    Compilation
                     Section name that BBL belongs to       Compilation
  Fixup              Offset from section base               Linking
                     Dereference size                       Compilation
                     Absolute or relative                   Compilation
                     Type (c2c, c2d, d2c, d2d)              Linking
                     Section name that fixup belongs to     Compilation
  Jump Table         Size of each jump table entry          Compilation
                     Number of jump table entries           Compilation
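To make the relationships in Table 3.1 concrete, the collected metadata can be modeled as plain records. The field names below are ours, chosen for illustration; CCR's actual on-disk encoding differs.

```python
# Illustrative data model for the randomization-assisting metadata of
# Table 3.1. Each fixup belongs to a basic block, and the layout record
# aggregates all basic blocks of the randomizable area.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Fixup:
    offset: int          # offset from section base (finalized at link time)
    deref_size: int      # dereference size: 1, 2, 4, or 8 bytes
    is_relative: bool    # relative vs. absolute value
    kind: str            # 'c2c', 'c2d', 'd2c', or 'd2d'

@dataclass
class BasicBlock:
    size: int            # in bytes, including trailing alignment/padding
    boundary: str        # lower boundary type: 'BBL', 'FUN', or 'OBJ'
    falls_through: bool  # ends in a conditional branch with implicit successor
    fixups: List[Fixup] = field(default_factory=list)

@dataclass
class LayoutMetadata:
    first_obj_offset: int   # section offset to the first shuffleable object
    main_offset: int        # section offset to main(), patched after reordering
    rand_area_size: int     # total code size subject to randomization
    blocks: List[BasicBlock] = field(default_factory=list)
```

Because the layout is represented solely by basic blocks, function and object sizes can be derived by summing block sizes up to the next FUN or OBJ boundary.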
3.1.3.3 Fixup Information
Evaluating fixups and generating relocation entries are part of the last processing stage during layout finalization, right before emitting the actual code bytes. Note that this phase is orthogonal to the optimization level used, as it takes place after all LLVM optimizations and passes are done. Each fixup is represented by its offset from the section’s base address, the size of the target (1, 2, 4, or 8 bytes), and whether it represents a relative or absolute value. As shown in Table 3.1, we categorize fixups into four groups, similar to the scheme proposed by Wang et al. [198], depending on their location (source) and the location of their target (destination): code-to-code (c2c), code-to-data (c2d), data-to-code (d2c), and data-to-data (d2d). We define data as a universal region that includes all other sections except the .text section. This classification helps in increasing the speed of binary rewriting when patching fixups after randomization.
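The four-way classification can be sketched as a simple predicate over the final section layout (a hedged illustration with our own function name and parameters): a fixup is labeled by whether its own location and its resolved target fall inside the .text section ("code") or anywhere else ("data").

```python
# Sketch of the c2c/c2d/d2c/d2d fixup classification: 'c' means the address
# lies inside the .text section, 'd' means it lies in any other section.

def classify_fixup(src_addr: int, dst_addr: int,
                   text_start: int, text_end: int) -> str:
    """Return 'c2c', 'c2d', 'd2c', or 'd2d' for a fixup located at src_addr
    whose resolved target is dst_addr."""
    src = 'c' if text_start <= src_addr < text_end else 'd'
    dst = 'c' if text_start <= dst_addr < text_end else 'd'
    return f'{src}2{dst}'
```

During rewriting, only fixups whose source or destination lies in the code region need patching, which is why the label lets the rewriter skip d2d fixups outright.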
3.1.3.4 Jump Table Information
Due to the complexity of some jump table code fragments, extra metadata needs to be kept for their correct handling during randomization. For non-PIC/PIE (position independent code/executable) binaries, the compiler generates jump table entries that point to targets using their absolute address. In such cases, it is trivial to update these destination addresses based on their corresponding fixups that already exist in the data section. In PIC executables, however, jump table entries correspond to relative offsets, which remain the same irrespective of the executable's load address.

[Figure 3.4: Example of jump table code generated for non-PIC and PIC binaries.]

Figure 3.4 shows the code generated for a jump table when compiled without and with the PIC/PIE option. In the non-PIC case, the jmp instruction directly jumps to the target location (1) by dereferencing the value of an 8-byte absolute address (2) according to the index register rdx, as the address of the jump table is known at link time (0x4A39A0). On the other hand, the PIC-enabled code needs to compute the target with a series of arithmetic instructions. It first loads the base address of the jump table into rax (3), then reads from the table the target's relative offset and stores it in rcx, and finally computes the target's absolute address (4) by adding the table's base address to the relative offset.

To appropriately patch such jump table constructs, for which no additional information is emitted by the compiler, the only extra information we must keep is the number of entries in the table, and the size of each entry. This information is kept along with the rest of the fixup metadata, as shown in Table 3.1, because the relative offsets in the jump table entries should be updated after randomization according to the new locations of the corresponding targets.
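The patching step for PIC jump tables can be sketched as follows (our illustration, not CCR's rewriter): once the randomized layout fixes the new target addresses, each entry must again hold target minus table base, encoded at the entry size recorded in the jump table metadata.

```python
# Sketch: re-encode a PIC jump table after randomization. Each entry stores
# the signed relative offset (target - table_base) at the recorded entry size.
import struct

def repack_pic_jump_table(table_base, new_targets, entry_size=4):
    """Return the raw bytes of the patched jump table, little-endian."""
    fmt = {4: '<i', 8: '<q'}[entry_size]   # signed 4- or 8-byte entries
    out = b''
    for target in new_targets:
        out += struct.pack(fmt, target - table_base)
    return out
```

Note that offsets may be negative when a randomized target lands below the table base (as in the 0xFFF67BAB entries of Figure 3.4), which is why signed encoding is used.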
3.1.4 Link-time Metadata Consolidation
The main task of the linker is to merge multiple object files into a single executable. The linking process consists of three main tasks: constructing the final layout, resolving symbols, and updating relocation information. First, the linker maps the sections of each object into their corresponding locations in the final sections of the executable. During this process, alignments are adjusted and the size of extra padding for each section is decided. Then, the linker populates the symbol table with the final location of each symbol after the layout is finalized. Finally, it updates all relocations created by the assembler according to the final locations of those resolved symbols. These operations influence the final layout, and consequently affect the metadata that has already been collected at this point. It is thus crucial to update the metadata according to the final layout that is decided at link time.

Our CCR prototype is based on the GNU gold ELF linker that is part of binutils. Gold aims to achieve faster linking times compared to the GNU linker (ld), as it does not rely on the standard binary file descriptor (BFD) library. Additional advantages include lower memory requirements and parallel processing of multiple object files [190].

Figure 3.5 provides an overview of the linking process and the corresponding necessary updates to the collected metadata. Initially, the individual sections of each object are merged into a single one, according to the naming convention (1). For example, the two code sections .text.obj1 and .text.obj2 of two object files are combined into a single .text section. Similarly, the metadata from each object is extracted and incorporated into a single section, and all addresses are updated according to the final layout (2). As part of the section merging process, the linker introduces padding bytes between objects in the same section (3). At this point, the size of the basic block at the end of each object file has to be adjusted by increasing it according to the padding size. This is similar to the treatment of alignment bytes within an object file, which are considered part of the preceding basic block. Note that we do not need to update anything related to whole functions or objects, as our representation of the layout relies solely on basic blocks. Updating the size of the basic blocks that are adjacent to padding bytes is enough for deriving the final size of functions and objects.

Once the layout is finalized and symbols are resolved, the linker updates the relocations recorded by the assembler (4). Any fixups that were already resolved at compilation time are not available in this phase, and thus the corresponding metadata remains unchanged, while the rest is updated
accordingly. Finally, the aggregation of metadata is completed (5) by updating the binary-level metadata discussed in Section 3.1.3, including the offset to the first object, the total code size for transformation, and the offset to the main function (if any).

[Figure 3.5: Overview of the linking process. Per-object metadata is consolidated into a single section.]

A special case that must be considered is that a single object file may contain multiple .text, .rodata, .data, or .data.rel.ro sections. For instance, C++ binaries often have several code and data sections according to a name mangling scheme, which enables the use of the same identifier in different namespaces. The compiler blindly constructs these sections without considering any possible redundancy, as it can only process the code of a single object file at a time. In turn, when the linker observes redundant sections, it nondeterministically keeps one of them and discards the rest [124]. This deduplication process can cause discrepancies in the layout and fixup information kept as part of our metadata, and thus the corresponding information about all removed sections is discarded at this stage. This process is facilitated by the section name information that is kept for basic blocks and fixups during compilation. Note that section names are optional attributes required only at link time. Consequently, after deduplication has completed, any remaining section name information about basic blocks and fixups is discarded, further reducing the size of the final metadata.
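The two core metadata adjustments of the merging step, rebasing recorded offsets by each object's final placement and absorbing inter-object padding into the last basic block of each object, can be sketched as follows. The dictionary layout and key names here are ours, for illustration only.

```python
# Sketch of link-time metadata consolidation: rebase fixup offsets by each
# object's placement in the merged section, and grow each object's last basic
# block by the padding the linker inserted after it.

def consolidate(objects):
    """objects: list of dicts with 'placement' (final section offset),
    'padding_after' (bytes of linker padding), 'bbl_sizes', and
    'fixup_offsets' (object-relative). Returns the merged metadata."""
    merged = {'bbl_sizes': [], 'fixup_offsets': []}
    for obj in objects:
        sizes = list(obj['bbl_sizes'])
        sizes[-1] += obj['padding_after']     # absorb padding into last BBL
        merged['bbl_sizes'] += sizes
        merged['fixup_offsets'] += [obj['placement'] + off
                                    for off in obj['fixup_offsets']]
    return merged
```

Because padding is folded into basic block sizes, the final sizes of whole functions and objects can still be derived purely from the consolidated block list, as described above.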
3.1.5 Code Randomization
To strike a balance between performance and randomization entropy, we have opted to maintain some of the constraints imposed by the code layout decided at link time, due to short fixup sizes and fall-through basic blocks. As mentioned earlier, these constraints can be relaxed by modifying the width of short branches and adding new branches when needed. However, our current choice has the simplicity and performance benefit of keeping the total code size the same, which helps maintain caching characteristics due to spatial locality. To this end, we prioritize basic block reordering at the intra-function level, and then proceed with function-level reordering.

Distance constraints due to fixup size may occur in both function and basic block reordering. For instance, it is typical for functions to contain a short fixup that refers to a different function, as part of a jump instruction used for tail-call optimization. At the rewriting phase, basic block reordering proceeds without any constraints if: (a) the parent function of a basic block does not have any distance-limiting fixup, or (b) the size of the function allows reaching all targets of any contained short fixups. Note that the case of multiple functions sharing basic blocks, which is a common compiler optimization, is fully supported.

From an implementation perspective, the simplest solution for fall-through basic blocks is to assume that both child blocks will be displaced, in which case an additional jump instruction must be inserted for the previously fall-through block. From a performance perspective, however, a better solution is to avoid adding any extra instructions and keep either of the two child basic blocks adjacent to its parent; this can be safely done by inverting the condition of the branch when needed. In our current implementation we have opted for the first approach, and have left branch inversion as part of our future work.
As shown in Section 3.1.6.5, this decision does not impact the achieved randomization entropy. After the new layout is available, it is essential to ensure that fixups are updated accordingly. We have classified fixups into four categories: c2c, c2d, d2c, and d2d. In the case of d2d fixups, no update is needed because we diversify only the code region, but we still include them as part of the metadata in case they are needed in the future. The dynamic linking process relies on c2d (relative) fixups to adjust pointers to shared libraries at runtime.
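The reordering gate described by conditions (a) and (b) above can be sketched as follows. This is a simplified model with our own names and a coarse size bound: blocks are shuffled only when the function has no distance-limiting short fixup, or when the function is small enough that any in-function rel8 target remains reachable regardless of the permutation.

```python
# Sketch of the basic block reordering constraint: shuffle a function's
# blocks only if no short (rel8) fixup can be pushed out of range.
import random

def can_reorder_blocks(func_size: int, has_short_fixup: bool) -> bool:
    # A signed 8-bit displacement reaches +/-127 bytes; a function no larger
    # than that keeps every in-function rel8 target reachable after shuffling.
    return (not has_short_fixup) or func_size <= 127

def shuffle_blocks(blocks, func_size, has_short_fixup, rng):
    """blocks: list of basic block ids; returns a (possibly) permuted copy."""
    if not can_reorder_blocks(func_size, has_short_fixup):
        return list(blocks)        # fall back to the link-time order
    shuffled = list(blocks)
    rng.shuffle(shuffled)
    return shuffled
```

The actual rewriter checks each short fixup's reach individually rather than using a single whole-function bound, but the fallback structure is the same: constrained functions keep their link-time block order while function-level reordering still applies.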
3.1.6 Experimental Evaluation
We evaluated our CCR prototype in terms of runtime overhead, file size increase, randomization entropy, and other characteristics. Our experiments were performed on a system equipped with an
[Figure: runtime overhead (%) of function randomization and basic block randomization; the y-axis ranges from 0 to 6%.]