Florida State University Libraries

Electronic Theses, Treatises and Dissertations
The Graduate School

2018

Securing Systems by Vulnerability Mitigation and Adaptive Live Patching

Yue Chen

FLORIDA STATE UNIVERSITY

COLLEGE OF ARTS AND SCIENCES

SECURING SYSTEMS BY VULNERABILITY MITIGATION

AND ADAPTIVE LIVE PATCHING

By

YUE CHEN

A Dissertation submitted to the Department of Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy

2018

Copyright © 2018 Yue Chen. All Rights Reserved.

Yue Chen defended this dissertation on January 23, 2018. The members of the supervisory committee were:

Zhi Wang Professor Directing Dissertation

Ming Yu University Representative

Xiuwen Liu Committee Member

An-I Andy Wang Committee Member

The Graduate School has verified and approved the above-named committee members, and certifies that the dissertation has been approved in accordance with university requirements.

To my beloved ones.

ACKNOWLEDGMENTS

Pursuing a Ph.D. degree is a unique experience in my life. Here I would like to express my gratitude to a number of people; without them, I could not have enjoyed this wonderful journey.

Foremost, I feel incredibly fortunate to have been under Prof. Zhi Wang's guidance during my Ph.D. study at Florida State. His passion and dedication for research have deeply influenced me and opened my eyes to the research world. His encouragement, guidance and support have been an invaluable source of strength as I explore new horizons.

I have been very lucky to work with several excellent researchers. I want to express my sincere gratitude to my colleagues during my internship at Baidu X-Lab. It was a great pleasure to have fruitful discussions with Yulong Zhang and Tao Wei on challenging problems, and their great thoughts and insightful advice helped me learn a lot. It was also an enjoyable and instrumental experience to work with other colleagues: Zhaofeng Chen, Zhenyu Zhong, Yu Ding; as well as other interns in the lab: Pei Wang and Peng Wang.

I am grateful to all my collaborators for their insightful ideas and helpful discussions. In particular, special thanks go to Xiaoguang Wang, Ryan Baird, David Whalley and Yajin Zhou for the helpful discussions and suggestions drawn from their precious and profound understanding of the research topics.

In the computer science department of FSU, I regularly participated in Prof. An-I Andy Wang's research group meeting. It was a great opportunity to discuss and learn topics in computer systems. I want to express my gratitude to An-I Andy Wang and his students for their great suggestions about my research and the great discussion atmosphere. I would also like to thank the rest of my dissertation committee, Prof. Xiuwen Liu and Prof. Ming Yu, for their detailed advice, comments and suggestions.

Last but certainly not least, I owe a big debt of gratitude to my parents and family, who have supported every decision I have made, including the pursuit of this degree.

TABLE OF CONTENTS

List of Tables
List of Figures
Abstract

1 Introduction
1.1 Problem Overview
1.2 Our Approach
1.3 Summary of Contributions
1.4 Dissertation Organization

2 Related Work
2.1 Memory Vulnerabilities and Exploits
2.1.1 Buffer Overflow
2.1.2 Information Leakage
2.1.3 NULL Pointer Dereference
2.1.4 Arbitrary Format String
2.1.5 Use-After-Free
2.1.6 Data-only Attack
2.1.7 Return-oriented Programming
2.2 Threat Mitigation
2.2.1 Data Execution Prevention
2.2.2 Software Diversity
2.2.3 ROP Defenses
2.2.4 Control-flow Integrity
2.3 Root-Cause Analysis
2.3.1 Attack/Exploit Detection and Mitigation
2.3.2 Vulnerability/Bug Discovery
2.3.3 Record & Replay
2.4 Patch Generation
2.4.1 Kernel Live Patching
2.4.2 Semantic Matching
2.4.3 Automatic Patch/Filter Generation

3 On-demand Live Randomization
3.1 Introduction
3.2 Design
3.2.1 Overview
3.2.2 Basic Block Reordering
3.2.3 Basic Block Pointer Conversion
3.2.4 Live Randomization of Kernel Modules
3.2.5 Performance Optimization
3.2.6 Binary-only Program Support
3.3 Implementation
3.4 Evaluation
3.4.1 Security
3.4.2 Performance
3.5 Discussion
3.6 Summary

4 Pinpointing Vulnerabilities
4.1 Introduction
4.2 System Overview
4.3 System Design
4.3.1 System Overview
4.3.2 Attack Detection
4.3.3 Record and Replay
4.3.4 Pinpointing Vulnerabilities
4.3.5 Prototype Efforts
4.4 Evaluation
4.4.1 Effectiveness
4.4.2 Performance
4.5 Discussion
4.6 Summary

5 Adaptive Android Kernel Live Patching
5.1 Introduction
5.2 System Design
5.2.1 Measuring Android Fragmentation
5.2.2 Adaptive Multi-level Patching
5.2.3 Architecture and Workflow
5.2.4 KARMA Patches
5.2.5 Offline Patch Adaptation
5.2.6 Live Patching
5.2.7 Prototype of KARMA
5.3 Evaluation
5.3.1 Evaluation of Applicability
5.3.2 Evaluation of Adaptability
5.3.3 Evaluation of Performance
5.4 Discussion and Future Work
5.5 Summary

6 Conclusion

Appendix A KARMA Patch Writing for Recent Kernel Vulnerabilities

Bibliography
Biographical Sketch

LIST OF TABLES

3.1 Average NOP Space per Function

3.2 Statistics of Three Web Servers

4.1 Summary of the evaluation results on a number of DARPA CGC programs.

5.1 Devices vulnerable to two infamous root exploits as of Nov. 2016. The second column lists the dates when they were disclosed in the Android Security Advisory.

5.2 Images obtained from popular devices.

5.3 Statistics of the obtained Android kernels.

5.4 The extensions to Lua. The first five functions can only be used by the live patcher, not by patches.

5.5 Clustering 1,139 kernels for each function by syntax and semantics. The last-but-two column lists the time of semantic matching to compare Nexus 5 (Android 4.4.2, kernel 3.4.0) and Samsung Note Edge (Android 6.0.1, kernel 3.10.40). The experiment was conducted on an Intel E5-2650 CPU with 16GB of memory, and the results are the average over 10 repeats. The last two columns list the number of instructions and basic blocks for each function in Nexus 5.

A.1 A partial list of recent critical Android kernel vulnerabilities and KARMA's effectiveness in creating adaptable patches for them.

LIST OF FIGURES

1.1 Three key components of the dissertation: Remix (Chapter 3), Ravel (Chapter 4), and KARMA (Chapter 5)

3.1 An Example of Remix on x86-64

3.2 Jump Table Examples

3.3 SPEC CPU2006 Performance Overhead

3.4 SPEC CPU2006 Size Increase

3.5 Apache Web Server Performance Overhead

3.6 ReiserFS Performance Overhead

4.1 A simple program with a buffer overflow at line 4.

4.2 A vulnerable function used as the running example. There is a buffer overflow at line 7 caused by the integer error at line 6, and an information leak at line 9 caused by the same integer error.

4.3 Overall architecture of Ravel

4.4 Code snippet of CVE-2015-3864 in Android. There is an integer overflow at line 1, leading to the buffer overflow at line 3.

4.5 Code sketch of CVE-2013-2028 in NGINX. An integer signedness error at line 9 leads to a buffer overflow at line 10.

4.6 Code sketch of CVE-2014-0160 (a.k.a. Heartbleed). The attacker controls payload. memcpy may copy a large amount of extra data into buffer, and send it back through dtls1_write_bytes.

4.7 Code sketch of vulnerability-related functions in CNMP. syslog takes the user-controlled joke_str, and passes it as a format-string argument to vsnprintf.

4.8 Performance overhead of Ravel's online components relative to the original FreeBSD system.

5.1 Number of revision clusters for each shared function, sorted by the number of clusters.

5.2 Percentage of kernels in the largest cluster for each shared function.

5.3 Workflow of KARMA

5.4 A simplified patch in Lua for CVE-2014-3153

5.5 Source-code patch for CVE-2013-1763

5.6 Source-code patch for CVE-2013-6123

5.7 Source-code patch for CVE-2016-0802

5.8 Live patching through function hooking

5.9 sock_diag_rcv_msg of (a) Huawei Honor 6 Plus (PE-TL10) with Android 4.4 and kernel 3.10.30, compiled by GCC 4.7, and (b) Samsung Galaxy Note Edge (N915R4) with Android 5.0.1 and kernel 3.10.40, compiled by GCC 4.8. Basic blocks and control flows with different syntax are highlighted.

5.10 Three semantically different basic blocks of msm_cci_validate_queue in Oppo 3007 (left) and Samsung N910G (right). They have different callees and arguments, and thus different semantics.

5.11 Performance scores by CF-Bench.

5.12 Execution time of chmod with different patches.

ABSTRACT

The number and type of digital devices are increasing tremendously in today's world. However, as code size soars, hidden vulnerabilities become a major threat to user security and privacy. Vulnerability mitigation, detection, and patch generation are key protection mechanisms against attacks and exploits. In this dissertation, we first explore the limitations of existing solutions. For vulnerability mitigation, in particular, the currently deployed address space layout randomization (ASLR) has the drawbacks that the memory layout is randomized only once, at load time, and that each segment is moved as a whole. This design makes the program particularly vulnerable to information leaks. For vulnerability detection, many existing solutions can only detect the symptoms of attacks, instead of locating the underlying exploited vulnerabilities, since the manifestation of an attack does not always coincide with the exploited vulnerabilities. For patch generation targeting a large number of different devices, current schemes fail to meet the requirements of timeliness and adaptiveness.

To tackle the limitations of existing solutions, this dissertation introduces the design and implementation of three countermeasures. First, we present Remix, an effective and efficient on-demand live randomization system, which randomizes the basic blocks of each function at runtime to provide higher entropy and stronger protection against code reuse attacks. Second, we propose Ravel, an architectural approach to pinpointing vulnerabilities from attacks. It leverages a record & replay mechanism to reproduce attacks in the lab environment, and uses the program's memory access patterns to locate the targeted vulnerabilities, which can be of a variety of types. Lastly, we present KARMA, a multi-level live patching framework for Android kernels with minor performance overhead. Its patches are written in a high-level memory-safe language, and can be adapted to thousands of different Android kernels.

CHAPTER 1

INTRODUCTION

1.1 Problem Overview

Computing platforms have become omnipresent in our society over the last few decades, and their number has been growing at a tremendous speed, spanning desktops, laptops, mobile phones, embedded IoT devices and medical devices. They are appealing targets for various attackers. In particular, persistent and pervasive connectivity makes the attack surface large enough for remote exploits. Memory vulnerabilities and related attack vectors have become a major threat to today's system security. New attack and defense techniques keep emerging, and the arms race continues.

From a defender's point of view, several steps need to be taken for overall system protection. First, effective protection mechanisms are required to mitigate vulnerability exploits. These techniques include data execution prevention (DEP) [73], address space layout randomization (ASLR) [34,145] and control-flow integrity (CFI) [27], to name a few. ASLR randomizes the memory layout in order to make it difficult for attackers to guess memory addresses. This mechanism can significantly raise the bar for successful memory exploits on today's operating systems, especially for code reuse attacks. However, traditional ASLR only randomizes the bases of segments once at load time. This design has two limitations. First, it provides limited randomness, especially on 32-bit architectures. Second, it cannot defeat brute-force address-guessing attacks, in which the attacker can try many times until success. Due to these drawbacks, it is particularly vulnerable to information leaks: one leaked pointer can de-randomize the whole address space, because the offsets within each segment are fixed. An ideal improvement is a live randomization scheme that operates at runtime with a finer granularity.

In addition to runtime protection, vulnerability detection and locating are also critical tasks to figure out where the problem lies.
Previous research has made significant progress in detecting attacks. However, developers still need to locate and fix these vulnerabilities, a mostly manual and time-consuming process. They face the challenge that the manifestation of an attack does not always coincide with the exploited vulnerabilities. Furthermore, many attacks are hard to reproduce in the lab environment, leaving developers with limited information to locate them. To obtain detailed information for vulnerability locating, instrumentation is a typical solution. Instrumentation can be done through a number of interfaces, such as on the host, through a virtual machine manager, or on the network. For fine granularity, instruction and memory access instrumentation is a practical way to analyze how a vulnerability is triggered and exploited. However, it usually imposes heavy performance overhead on the execution environment, severely affecting the user experience. A practical solution should combine good performance with strong vulnerability locating capability.

After vulnerabilities have been discovered and located, the next critical steps are patch generation, distribution and application to protect systems from further exploits. For a single, independent system, a single patch is enough. However, an ecosystem like Android comprises many different device models with different system versions and configurations. Due to the severe fragmentation of the ecosystem, it takes too much manual labor to generate patches for thousands of different Android kernels. In other words, scalability and adaptiveness become major challenges. Additionally, official patch generation and distribution for the Android ecosystem involve several parties, such as researchers, hardware vendors, device vendors, carriers and end users. This long patching chain cannot meet the demand for fast and scalable patch generation and distribution. The result is that most Android devices are never patched or updated in a timely manner to protect their users from kernel exploits. Recent Android malware even has built-in kernel exploits to take advantage of this large window of vulnerability.
Even worse, some small vendors do not have the capability to generate patches at all. An effective solution to this problem must be adaptable to a large number of (often out-of-date) devices, quickly deployable, and secure from misuse. Also, a live solution that requires no system restart or power-off, rather than a static one, can meet the demand for timely and rapid security patch distribution.

To address these problems and challenges, we propose a systematic approach that consists of three key components. Remix (Chapter 3) acts as a protection mechanism running on the production system. During runtime, it periodically randomizes the basic blocks within each function to foil code reuse attacks. Ravel (Chapter 4) is a memory vulnerability locating approach that leverages a record & replay mechanism to reproduce attacks and exploitations for offline analysis. KARMA


Figure 1.1: Three key components of the dissertation: Remix (Chapter 3), Ravel (Chapter 4), and KARMA (Chapter 5)

(Chapter 5) is a live patching framework aimed at generating Android kernel security live patches, which can be adapted to a variety of devices on the market. In the next section, we will give an overview of these solutions and describe how they work to protect our systems.

1.2 Our Approach

To protect our systems against exploits, we design a series of protection mechanisms whose purposes are to mitigate, discover, locate, and patch system vulnerabilities. As shown in Figure 1.1, the solution is comprised of three key components: Remix, Ravel and KARMA.

Remix [55] is a practical and efficient live randomization system for vulnerability mitigation. Specifically, it randomizes basic blocks within functions on the fly, making memory addresses difficult for attackers to guess and thereby foiling code reuse attacks. By doing so, it avoids the complexity of migrating stale function pointers, and allows mixing randomized and non-randomized code to strike a balance between performance and security. Also, because randomization is local (within each function) rather than global, locality is preserved for better performance. Remix randomizes a running process in two steps: it first randomly reorders its basic blocks, and then comprehensively migrates live pointers to basic blocks. Our experiments show that Remix can significantly increase randomness to reduce the success rate of vulnerability exploits, with low performance overhead on both CPU- and I/O-intensive benchmarks and kernel modules, even at very short randomization intervals.

In addition to exploit mitigation, detecting and locating vulnerabilities is another critical task in protecting our systems. We propose Ravel [53], an architectural approach to pinpointing vulnerabilities from attacks. Ravel consists of an online attack detector and an offline vulnerability locator linked by a record & replay mechanism. Specifically, Ravel records the execution of a production system and simultaneously monitors it for attacks. If an attack is detected, the execution history is replayed to reveal the targeted vulnerabilities by analyzing the program's memory access patterns under attack. We have built a prototype of Ravel based on the open-source FreeBSD operating system, and have evaluated it with both real-world vulnerabilities and benchmark programs. The security and performance results demonstrate that Ravel can effectively pinpoint various types of memory vulnerabilities, including buffer overflows, integer errors, information leakage and use-after-free, with low performance overhead. Furthermore, new attack detection and vulnerability locating approaches can be added to the platform, making Ravel an extensible framework for detecting and locating new vulnerabilities.

After vulnerabilities are discovered and located, patching is the next step to protect the system from further exploits. As mentioned in Section 1.1, the Android ecosystem is highly fragmented and patching is rarely done in a timely manner. To address these problems, we have systematically studied 1,139 Android kernels and all the recent critical Android kernel vulnerabilities. We accordingly propose KARMA [57], an adaptive live patching system for Android kernels. KARMA features a multi-level adaptive patching model to protect kernel vulnerabilities from exploits. Specifically, patches in KARMA can be placed at multiple levels in the kernel to filter malicious inputs, and they can be automatically adapted to thousands of Android devices.
In addition, KARMA's patches are written in a high-level memory-safe language, making them secure and easy to audit, and their runtime behaviors are strictly confined to prevent misuse. The patch application is performed live, with no need for a restart or power-off. Since the patches are generated and adapted offline, end users can download them from the cloud and apply them to their Android devices. Our evaluation demonstrates that KARMA can protect most critical kernel vulnerabilities on a large number of Android devices (520 devices in our evaluation) with only minor performance overhead (< 1%).

1.3 Summary of Contributions

The contributions of this dissertation are threefold, summarized as follows.

• We propose an on-demand live code randomization approach, named Remix, to mitigate attacks that require knowledge of code addresses. Remix randomizes the basic blocks inside each function at runtime to make their addresses unpredictable. Due to its local scope, we do not need to migrate stale function pointers, avoiding the function-pointer identification ambiguity problem. We have implemented a prototype for both user-space processes and operating system kernel modules. The evaluation shows that Remix can effectively increase entropy while maintaining low performance overhead.

• We propose Ravel, a systematic approach that leverages a record & replay mechanism to pinpoint memory vulnerabilities from attacks. This design not only detects attacks, but also finds the root causes of these attacks and exploits, enabling it to locate a variety of memory vulnerabilities with low performance overhead.

• We propose KARMA, a collaborative way to generate live patches for thousands of different Android kernels in a timely manner. We have studied and measured 1,139 Android kernels and all the recent Android kernel vulnerabilities. Based on our observations and insights, we design a multi-level adaptive patching model that can be applied to the highly fragmented Android ecosystem. Patches are written in a high-level memory-safe programming language, and their behaviors are strictly confined to prevent misuse. The evaluation shows that KARMA can adaptively and effectively patch the majority of Android kernel vulnerabilities with negligible performance overhead.

1.4 Dissertation Organization

The rest of the dissertation is organized as follows. First, we introduce some background knowledge, present the related work, and compare it with our work in Chapter 2. Then we give the design, implementation and evaluation of Remix in Chapter 3, Ravel in Chapter 4, and KARMA in Chapter 5, respectively. Finally, we summarize this dissertation in Chapter 6.

CHAPTER 2

RELATED WORK

In this chapter, we first introduce background knowledge about current memory vulnerabilities and attack approaches, and then describe related work on threat mitigation, vulnerability detection and patch generation, comparing it with our approaches.

2.1 Memory Vulnerabilities and Exploits

In this section, we briefly introduce some memory vulnerabilities, including the basics and how they can be exploited to compromise computer systems.

2.1.1 Buffer Overflow

Out-of-bounds writes and reads are common memory bugs. Here we treat buffer overflows as out-of-bounds writes: write operations that fall outside the expected memory bounds. Such writes can overwrite sensitive memory content such as function return addresses or privilege indication bits. In this section we mainly focus on control-flow hijacking; data-only overwrites are described in Section 2.1.6. Consider an example: the return address of a function is typically stored on the stack. If it is overwritten with attacker-controlled data, the program counter value changes and the control flow is altered when the function returns. Another popular overwriting target is function pointers. Attackers can exploit them to hijack control flows, potentially resulting in whole-system takeover. Depending on the memory region involved, buffer overflows can be further classified as stack overflows, heap overflows, etc.

2.1.2 Information Leakage

Information leakage happens when a system designed to be closed reveals information to unauthorized parties. For the system owner, the leakage is usually unexpected. This vulnerability can be exploited in a variety of ways, and is not limited to memory exploits. One typical example is the Heartbleed bug (CVE-2014-0160) [65], which leaks memory contents from vulnerable versions of the OpenSSL software.

2.1.3 NULL Pointer Dereference

A NULL pointer dereference happens when a program dereferences a pointer that is expected to be valid, but is NULL. The typical consequence is a program crash, which can be used for denial-of-service (DoS) attacks. More advanced attacks also exist, such as mapping a page at the NULL address for exploitation.

2.1.4 Arbitrary Format String

Format string vulnerabilities happen when attacker-supplied input is used as a format string. The input is then interpreted as formatting directives, which can be abused to read the stack, execute code, or cause a segmentation fault in the running program. The typical example is the printf function family. Nowadays, the proportion of this vulnerability category has decreased, as simple validation and secure coding practices can avoid it.

2.1.5 Use-After-Free

Use-after-free vulnerabilities are rapidly growing in popularity, especially for exploiting web browsers. Double-free is a special type of use-after-free vulnerability. These bugs are caused by erroneously operating on a dangling pointer, whose target memory has been previously freed. In a typical scenario to exploit a use-after-free vulnerability, the attacker tries to allocate an object under his/her control immediately after the vulnerable memory is freed. The memory allocator likely assigns the just-freed memory to this object, granting the attacker full control over the to-be-reused memory. If the reused memory originally contains a data pointer, the attacker can exploit it to read or write arbitrary memory. Likewise, if it contains a code pointer, the attacker can leverage it for control-flow hijacking.

2.1.6 Data-only Attack

As control-flow hijacking becomes more challenging, more attention is being paid to data-only attacks (or non-control-data attacks) [51, 93, 94]. Rather than hijacking the control flow, they manipulate sensitive runtime data to indirectly control the program's execution, escalate privileges, leak information, etc. For example, an attacker can overwrite a privilege bit, resulting in privilege escalation.

Recently, data-oriented programming (DOP) [94] demonstrated that attackers can systematically construct expressive non-control-data exploits, and that a large percentage of the evaluated real-world programs contain gadgets to simulate arbitrary computations. Effective, low-overhead detection of data-only attacks remains a challenging problem.

2.1.7 Return-oriented Programming

Return-oriented programming (ROP) [43, 48, 133] is a memory exploit technique that lets an attacker execute code in the presence of security defense mechanisms such as data execution prevention (DEP) [69, 70, 73] and code signing. Previously, attackers injected their own data into the memory space and then executed it as code. With the deployment of DEP, however, the injected data cannot be executed, due to the W xor X protection mechanism. With ROP, instead of injecting foreign code, attackers reuse existing code to bypass DEP. They can reuse either whole functions or short code fragments called gadgets. On x86, because of its variable-length instructions, the control flow can even jump into the middle of an instruction and interpret that location as the beginning of another instruction, giving the attacker many additional gadget choices.

2.2 Threat Mitigation

To mitigate these threats, security researchers and experts have proposed several mitigation strategies. We describe them as follows.

2.2.1 Data Execution Prevention

With the deployment of data execution prevention (DEP) [69, 70, 73], direct code injection is foiled. The basic idea is to prevent data from being executed: memory regions holding injected data are marked non-executable, removing the danger of traditional code injection attacks.

2.2.2 Software Diversity

Software diversity diversifies program code or data to foil attacks that depend on knowledge of program attributes such as the memory layout [109]. For example, ROP chains the discovered gadgets together by arranging their addresses on the stack. Each gadget ends with a return instruction, which pops the next gadget address from the stack and executes it. As such, ROP needs to know the gadget addresses. If one makes this knowledge unavailable to the attacker, it is difficult to successfully perform this kind of exploit in practice. Software diversity leverages this information asymmetry to make the attack cost prohibitive. Diversification can be applied at several stages, such as compilation or runtime. For example, we can instruct compilers to generate different binaries (with enough entropy) for the same piece of source code, making attackers unable to know or predict the target's actual memory layout. In the rest of this section, we mainly focus on code randomization during a program's runtime.

Code randomization [49,52,117,143,150] aims at making gadget addresses unpredictable to foil code reuse attacks. Code randomization systems differ in the randomization granularity. Address space layout randomization (ASLR) is a popular coarse-grained scheme that has already been integrated into popular operating systems [145]. It places the program binary as a whole at a random base address. Consequently, ASLR has limited entropy on 32-bit architectures [134]. Because the program's internal layout is not changed, ASLR is especially vulnerable to information leak attacks: a single leaked code pointer can de-randomize the whole process. ASLP works at a finer granularity than ASLR [102]. It permutes functions and static data objects in addition to randomizing the section bases. In comparison, Remix works on basic blocks and also supports live randomization of running processes. Binary stirring is one of the fine-grained code randomization systems [151]. It also works at the basic block level. However, it stirs basic blocks globally, once, at load time. Remix instead reorders basic blocks within their respective functions. This localizes the changes required to compensate for the moved basic blocks.
This localization allows a relatively simple implementation of live randomization. Giuffrida et al. propose a live randomization system that relies on heavy compiler customization to output metadata for the pointer conversion [84]. In particular, it needs to migrate function pointers, which may involve unsolvable ambiguity and require manual effort. Remix confines the changes (mostly) to functions, and is thus easier to implement. At an even finer granularity, some systems randomize instructions through encoding or encryption to defeat code injection and code reuse attacks [38,101]. ILR randomizes the location of every instruction [90]. It uses a process virtual machine (Strata) to execute the scattered code. IPR rewrites instruction sequences with equivalent same-length instructions [126]. It can eliminate about 10% of useful gadgets and probabilistically break 80% of them. It supports a variety of concrete transformations, such as atomic instruction substitution, instruction reordering, and register reassignment. Data randomization has also been proposed to prevent data-based attacks [40,61].

Code randomization systems are often vulnerable to leaks of memory contents. For example, JIT-ROP repeatedly exploits a memory leak vulnerability to map the victim process's code in order to launch an on-demand ROP attack [140]. A few systems have been proposed to enhance fine-grained code randomization to withstand JIT-ROP attacks [36, 62, 83]. They all utilize execute-only memory, in which code can only be executed but not read. Remix only provides a probabilistic defense against JIT-ROP attacks (Section 3.4.1). Remix should integrate execute-only memory once it is available in commodity hardware. Such a combination would significantly raise the bar for successful code reuse attacks.

2.2.3 ROP Defenses

ROP exploits short snippets of existing code, called gadgets, for malicious purposes [43,133]. ROP gadgets end with a return instruction, which allows the attacker to chain a number of gadgets together using a crafted stack. ROP has been demonstrated to be Turing-complete given a reasonably sized code base. Variations of ROP that do not rely on return instructions have also been proposed [42,48]. Code randomization and control-flow integrity are two systematic defenses against ROP. Besides, there are a wide variety of other ROP defenses. For example, G-free eliminates usable gadgets at compile time by removing unaligned free-branch instructions [124]. Return-less also leverages a customized compiler to remove intended and unintended return instructions [112]. KBouncer [127] and ROPecker [58] detect ROP attacks by checking whether the path to a sensitive function contains too many indirect branches to "short" gadgets. Recent work shows that this approach might not be effective [46]. In particular, the threshold is very hard to determine accurately [86].
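The chaining mechanism can be illustrated with a toy interpreter (a sketch, not real machine code; the gadget addresses and behaviors below are invented for illustration). Each "gadget" performs one small operation and then "returns" by popping the next gadget address off the crafted stack, mirroring how real ROP chains gadgets that end in a ret instruction.

```python
# Toy illustration of ROP-style gadget chaining (hypothetical addresses).

def run_chain(stack, regs):
    """Pop gadget addresses off the crafted stack and execute them."""
    while stack:
        gadget = GADGETS[stack.pop()]  # a 'ret' transfers control to the next address
        gadget(stack, regs)
    return regs

# Hypothetical gadget behaviors keyed by invented addresses.
GADGETS = {
    0x400d84: lambda s, r: r.update(rax=s.pop()),              # pop rax; ret
    0x400a10: lambda s, r: r.update(rbx=s.pop()),              # pop rbx; ret
    0x400b20: lambda s, r: r.update(rax=r["rax"] + r["rbx"]),  # add rax, rbx; ret
}

# Crafted stack (top of stack at the end of the list): set rax=2, rbx=3, add.
crafted = [0x400b20, 3, 0x400a10, 2, 0x400d84]
regs = run_chain(crafted, {"rax": 0, "rbx": 0})
```

The crafted stack interleaves gadget addresses with data the gadgets consume, which is exactly why the defenses above target either the gadget addresses (randomization) or the control transfers (CFI).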

2.2.4 Control-flow Integrity

Control-flow integrity is another effective defense against code reuse attacks [27]. It inserts inline monitors to confine the runtime control flow to the program's (static) control-flow graph. CFI systems vary in the protection granularity. Fine-grained CFI provides strong protection against most control-flow hijacking attacks, but often has high performance overhead. It also requires a precise control-flow graph, which is still not readily available in commodity compilers.

Recent research efforts focus on reducing CFI performance overhead for commodity systems and applications [157,159]. They trade protection granularity for performance, leading to potential vulnerabilities [72,85]. Opaque CFI uses coarse-grained control-flow integrity to strengthen fine-grained code randomization against certain types of information leak attacks [116]. Instead of validating the exact target address, OCFI ensures that the target is within a certain randomized bound. RockJIT leverages modular CFI to protect the JIT compiler and the dynamically generated code [122]. It builds a fine-grained CFG from the source code of the JIT compiler, and keeps the control-flow policy updated with the newly generated code. Even though most CFI systems are implemented in software, hardware architectural support for CFI has been proposed that can substantially simplify and speed up CFI systems [77].
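The inline-monitor idea can be sketched in a few lines (a conceptual sketch, not any particular CFI implementation; the call-site and target addresses are invented). Before each indirect branch, an inserted check validates the runtime target against the set of targets permitted by the static CFG:

```python
# Sketch of a CFI inline monitor: each indirect branch site may only
# transfer control to its statically determined targets.

CFG = {                       # call site -> set of legal targets (invented)
    0x1000: {0x2000, 0x3000},
    0x1100: {0x3000},
}

def checked_branch(site, target):
    """Inline check executed just before an indirect branch at `site`."""
    if target not in CFG.get(site, set()):
        raise RuntimeError("CFI violation: illegal control transfer")
    return target
```

Fine-grained CFI corresponds to small, precise target sets per site; coarse-grained variants merge many sites into a few large sets, which is faster but admits more illegal transfers.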

2.3 Root-Cause Analysis

In this section, we discuss the related work regarding root-cause analysis, and compare Ravel against them.

2.3.1 Attack/Exploit Detection and Mitigation

There is a long line of research on attack detection, exploit mitigation and system fault isolation [36,45,54,144,148,149,162]. Control-flow integrity (CFI [27]) and data-flow integrity (DFI [47]) provide comprehensive protection against control-flow and data-flow attacks, respectively. They enforce the security policy that the runtime control flow/data flow must follow the program's control-flow/data-flow graph. CFI is an effective defense against most control-flow attacks. DFI has an even broader coverage because it can also detect data-only attacks. These two techniques have inspired a lot of related systems, including Ravel [72,85,141,157,159]. One of their focuses is to minimize the performance overhead so that they can be practically deployed [141,157]. For example, Kenali enforces DFI for the kernel's access control system [141]. It automatically infers the critical data that need protection and enforces DFI for that data. CFI, as a generic attack detection technique, can be integrated into Ravel. Ravel's architecture can be easily extended with all kinds of attack detection techniques. Ravel's data-flow analysis extends DFI with analyses to locate and refine underlying vulnerabilities. As we have demonstrated, a data-flow violation does not always coincide with the exploited vulnerabilities. Moreover, full DFI enforcement is still prohibitively expensive [47]. Ravel's use of R&R detaches the (high-overhead) vulnerability locator from the production system. WIT (Write Integrity Testing) is another effective defense against memory errors [30]. It enforces a policy in which each instruction can only write to the set of statically-determined, authorized objects. WIT reduces its overhead by checking only memory writes but not reads. Consequently, WIT cannot detect read-only vulnerabilities such as information leaks. Ravel checks both memory reads and writes for vulnerabilities. Another approach, called code-pointer integrity (CPI) [107], separates sensitive data, such as code pointers and pointers leading to code pointers, from regular data to protect them from unauthorized modification.
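The flavor of WIT's write check can be sketched as follows (a simplified illustration of the idea, not WIT's actual implementation; instruction names, colors, and addresses are invented). Each store instruction and each object it may legally write share a statically assigned "color"; a runtime check allows a store only when the colors match, and reads are deliberately left unchecked:

```python
# Sketch of a WIT-style write integrity check.

instr_color = {"store_A": 1, "store_B": 2}   # per-instruction colors (invented)
mem_color = {0x500: 1, 0x504: 2}             # per-object colors (invented)

def checked_write(instr, addr, memory, value):
    """Allow the store only if the instruction's color matches the object's."""
    if instr_color[instr] != mem_color.get(addr):
        raise RuntimeError("WIT violation: unauthorized write")
    memory[addr] = value
```

Because only writes are instrumented, a read-only information leak would pass through this check silently, which is the coverage gap noted above.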

2.3.2 Vulnerability/Bug Discovery

There exist a number of research works related to vulnerability identification, discovery and evaluation through dynamic and static analysis [39,76,137,142]. For example, fuzz testing is a popular, practical approach to discovering software vulnerabilities. It tries to crash a program by feeding it random inputs. However, fuzz testing often ends with poor code coverage. To address that, Driller uses concolic execution to guide the fuzzer when it stalls [142]. To evaluate vulnerability discovery tools, LAVA [76] proposes a dynamic taint-analysis based approach to generating large ground-truth vulnerability corpora on demand. Gist is a tool to diagnose program failures (i.e., crashes) [100]. Specifically, it traces the program execution with Intel's processor tracing technology and combines static program slicing and dynamic analysis to find the root causes of program failures. Ravel shares the same vision as Gist, but it takes a very different approach because of its different focus – Gist focuses on solving program failures, while Ravel focuses on locating vulnerabilities. Many vulnerabilities such as buffer overflows and information leaks can be exploited without causing program failures, rendering Gist ineffective in locating vulnerabilities.
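A minimal mutation-based fuzzing loop with coverage feedback can be sketched as below (a generic sketch, not Driller's implementation; the target function and its planted bug are invented). Inputs that reach new branches are kept in the corpus, which illustrates both why purely random fuzzing plateaus and how feedback helps reach deeper code:

```python
# Sketch of coverage-guided mutation fuzzing on an invented target.
import random

def target(data, seen):
    seen.add("entry")
    if len(data) > 2 and data[0] == 0x42:
        seen.add("magic")
        if data[1] == 0x17:
            seen.add("deep")
            raise RuntimeError("crash")  # the planted bug

def fuzz(rounds=2000, seed=7):
    random.seed(seed)
    corpus, coverage, crashes = [bytes(3)], set(), 0
    for _ in range(rounds):
        data = bytearray(random.choice(corpus))
        data[random.randrange(len(data))] = random.randrange(256)  # mutate one byte
        seen = set()
        try:
            target(bytes(data), seen)
        except RuntimeError:
            crashes += 1
        if not seen <= coverage:          # new branch reached: keep this input
            coverage |= seen
            corpus.append(bytes(data))
    return coverage, crashes
```

Reaching the "deep" branch requires two specific bytes; a concolic engine, as in Driller, would solve for such magic values directly instead of waiting for random mutation to find them.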

2.3.3 Record & Replay

Record & replay (R&R) [74,88,104,108,131] is an important technique to reproduce events for attack and vulnerability analysis. In particular, Arnold is an always-on R&R system. It enables an interesting concept called the eidetic system, in which the complete execution history of the system is kept and can be queried for information about past executions [74]. We adopt some techniques of Arnold to reduce Ravel's performance and storage overhead. Ravel's R&R is different from other R&R systems because of its instrumented replay. R&R has a variety of interesting usages, including software debugging and intrusion detection. For example, BackTracker can reconstruct a past intrusion by building an attack dependence graph backwards [104]. Ravel also relies on R&R to reproduce an attack, but it aims at locating vulnerabilities. BackTracker works on high-level objects such as processes and files, while Ravel works on low-level memory accesses with different analyses.

2.4 Patch Generation

In this section, we present the related work about adaptive live patch generation and application.

2.4.1 Kernel Live Patching

There exist a number of kernel live patching systems, such as kpatch [23], kGraft [22], Ksplice [33], and KUP [99]. They assume that the kernel source code is available (a reasonable assumption for their purposes) and create live patches from source code patches. Their patches are, however, in binary form. This design does not fit the threat model of KARMA. First, although the Android kernel is licensed under the GPL, many Android vendors, small and large alike [19], do not (promptly) release their kernel source code. Second, these systems lack a mechanism to automatically adapt a kernel patch to different Android devices. Compared to them, KARMA is adaptive so that it can scale to the Android ecosystem. Third, binary patches are prone to misuse because they are hard to understand and audit, and these systems have no strong confinement of patches' runtime behaviors. KARMA has been designed specifically to address all these challenges in a live kernel patching system for Android. Among these systems, kpatch [23] and kGraft [22] replace a whole vulnerable function with the patched version. They differ in how patches are applied: kpatch stops all the running processes and ensures that none of them are running inside the function to be patched (similar to KARMA). kGraft instead maintains two copies of each patched function at the same time and dynamically decides which copy to execute. Specifically, the kernel code active at the time of patching (e.g., system calls, kernel threads, and interrupt handlers) is dispatched to the original version until it reaches a completion point; all other code is dispatched to the patched version. Like kpatch, Ksplice [33] also stops the machine to apply patches. However, Ksplice can patch individual instructions instead of replacing whole functions. These systems share the same limitation that they cannot support patches that "change the semantics of persistent data structures [33]". To address that, KUP [99] employs process checkpoint-and-restart to implement kernel hot patching. Specifically, it checkpoints all the user processes, replaces the running kernel with the patched version, and then restores these user processes. Because it replaces the whole kernel, KUP can support all kinds of patches. However, restoring external resources (e.g., sockets) is often problematic for checkpoint-and-restart systems, including KUP.

2.4.2 Semantic Matching

Semantic matching is an important technique in adaptive patch generation. It compares the semantics or similarity of two functions [78,80,113,115]. BinHunt [80] first uses symbolic execution to compute the semantic similarity of basic blocks and then uses a graph isomorphism algorithm to further compare the similarity of CFGs (control-flow graphs). The follow-up work, iBinHunt [115], extends BinHunt with inter-procedural control-flow graph comparison. However, whole-program comparison could be very time-consuming. To solve that, iBinHunt runs the program with taint tracking and only compares basic blocks within the same data flows. This approach is not suitable for KARMA because none of the commercial Android devices support kernel dynamic taint tracking or whole-kernel instrumentation. CoP [113] also uses symbolic execution to compute the semantic similarity of basic blocks, and uses the longest common subsequence of linearly independent paths to measure the similarity of programs. KARMA uses symbolic execution to resolve syntactic differences in semantically-equivalent functions. In addition, it leverages the fact that most kernel functions remain semantically similar across different kernel versions to significantly speed up the comparison. DiscovRE [78] takes a different approach by using syntactic information (i.e., structural and numeric features) to compare function similarity. This can significantly improve the analysis efficiency. KARMA requires a more precise comparison than can be provided by syntax-based approaches.
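The syntax-based end of this spectrum can be sketched as follows (a toy illustration in the spirit of feature-based matching such as DiscovRE, not its actual feature set; the feature choices here are invented). Each function is summarized by cheap numeric features, and candidates are ranked by feature distance:

```python
# Sketch of syntax-based function matching via numeric features.

def features(func):
    """func: list of (mnemonic, operands) tuples for one binary function."""
    return (
        len(func),                                   # instruction count
        sum(m.startswith("call") for m, _ in func),  # call count
        sum(m.startswith("j") for m, _ in func),     # jump count
    )

def distance(f1, f2):
    """Smaller distance suggests more similar functions."""
    return sum(abs(a - b) for a, b in zip(features(f1), features(f2)))
```

Such features are fast to extract but easily confuse syntactically different yet semantically equivalent code, which is why KARMA falls back to symbolic execution for precision.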

2.4.3 Automatic Patch/Filter Generation

Another category of related work includes systems that aim at automatically generating patches or input filters. For example, Talos [95] is a vulnerability rapid response system. It inserts SWRRs (Security Workarounds for Rapid Response) into the kernel source code in order to temporarily protect kernel vulnerabilities from being exploited. Talos shares a similar goal with KARMA, and both of them rely on the kernel's error handling code to gracefully neutralize attacks. However, Talos' source-code based approach cannot be applied to the fragmented Android ecosystem. To address the fragmentation problem, KARMA can automatically adapt a patch to other devices and strictly confine the runtime behaviors of its patches. ClearView [129] learns invariants of a program during a dynamic training phase. When a program failure happens, it identifies the failure-related invariants and uses them to generate patches for the program. PAR [103] proposes a pattern-based automatic program repair framework. Its generated patches resemble the patterns learned from human-written patches. ASSURE [138] introduces rescue points that can recover software from unknown exploits while maintaining system integrity and availability. ShieldGen [63] is a system for automatically generating vulnerability signatures (i.e., data patches). Signature-based filtering can only block known attacks. To address that, ShieldGen leverages protocol specifications to generate more exploits from an initial sample. Bouncer [60] uses static analysis and dynamic symbolic execution to create comprehensive input filters to protect software from bad inputs. Compared to these systems, KARMA aims at protecting kernel vulnerabilities on a large number of Android devices and has a different design.

CHAPTER 3

ON-DEMAND LIVE RANDOMIZATION

3.1 Introduction

With the ubiquitous deployment of data execution prevention (DEP), which can foil direct code injection [69,70,73], code reuse attacks have become a popular attack method. Instead of injecting foreign code, they reuse existing code to bypass DEP. These attacks could reuse either whole functions (e.g., return-to-libc or return-to-plt) or short code fragments called gadgets (e.g., return-oriented programming [43,48,133] or jump-oriented programming [42]). In a typical scenario, the attacker first launches a code-reuse attack to disable DEP by calling functions like mprotect, and then injects the malicious code into the victim process for more complex tasks. Control flow integrity (CFI) is an effective defense against code reuse attacks [27]. CFI guarantees that the runtime control flow follows the static control flow graph (CFG). Consequently, the attacker cannot arbitrarily manipulate the control flow to reuse the existing code. However, CFI has not been widely adopted. Early CFI systems have high performance overhead because CFI requires instrumenting every instruction. Recent implementations improve the performance by sacrificing preciseness [157,159] and, in some cases, security [72,85]. Code randomization is another effective defense against code reuse attacks. Unlike CFI, code randomization scrambles the reusable code by randomizing the code location, the code layout, or the instruction encoding [37,71,90,102,126,151]. Many code reuse attacks rely on the exact locations or contents of the victim process. Code randomization causes these attacks to behave unpredictably. Most popular operating systems support a simpler form of code randomization called address space layout randomization (ASLR), in which (position-independent) executables are loaded at random base addresses [31,34,35]. ASLR offers limited randomness, especially on 32-bit architectures [134].
Moreover, ASLR is particularly vulnerable to information leak attacks – a single leaked code or data pointer can de-randomize the whole process, since every code section has a fixed offset to the base. To address this problem, fine-grained code randomization techniques have been proposed, for example, to rearrange functions [102], basic blocks [151], or instructions [90,126]. High entropy is the key to the security of code randomization. One effective boost to randomness is on-demand live randomization. Live randomization works on a live, running process. It can be applied many times at unpredictable intervals, making the process a moving target for the attacker. Live randomization can eliminate the predictability associated with compile-time or load-time randomization schemes. It can significantly improve the randomness for 32-bit architectures, which many computers and embedded devices still use. However, live randomization is challenging to implement correctly: when the code is changed, it is necessary to update all the code and data that depend on the changed code to guarantee correctness. For example, if a call instruction is moved to a different address, we have to update every branch instruction that targets this instruction (or its preceding instructions), and search the stack for the corresponding return address and update it to the new one. Runtime changes to function entry points are even harder to fix – it is non-trivial to locate all the affected function pointers in the whole address space, including the code, the data, the stacks, and the heap. In particular, a linear search for a function address has false positives and could take a prohibitively long time to complete. Function addresses could also be stored in the OS kernel. For example, a process can register a handler for each signal of interest. The kernel saves this data in the kernel memory, unreachable by the process. If the handler is moved, the kernel must be notified of the updated address. To achieve that, one has to intercept the system calls that register signal handlers and re-register the handlers when necessary. Therefore, live randomization is challenging to implement.
An existing live-randomization system customizes the compiler to generate enough meta-data to facilitate its job [84]. However, it has yet to overcome the aforementioned challenges. For example, there is unsolvable ambiguity (e.g., pointers in unions or pointers stored as integers) in pointer migration that requires developers' manual effort. In this chapter, we propose Remix, an efficient, practical live randomization system for both user processes and kernel modules. Remix randomly shuffles the process' basic blocks within their respective functions to change the runtime code layout (a basic block is a linear sequence of instructions that has only one entry point and one exit point [152]. An exit point is often a branch instruction, such as jmp or ret. It could also be a non-branch instruction that falls through to the next basic block.) That is, functions remain at their original, expected locations, while basic blocks are moved around but never cross the function boundaries. This design can significantly reduce the complexity of live randomization: first, there is no need to fix function pointers because function entry points are not moved. This avoids the complicated pointer migration that may involve unresolvable ambiguity [84] (function addresses are still randomized once at load time by ASLR). Basic block addresses may still appear in both the code and data sections (e.g., jump tables). But these appearances are mostly limited to local scopes and thus are relatively easy to fix. Second, it is straightforward to support partial randomization since each change is confined to a local scope. For example, Remix can be used to randomize selected kernel modules. Randomized and non-randomized kernel modules can co-exist in a single kernel in harmony. Third, compared to systems that globally rearrange basic blocks [151], Remix maintains better locality. Compilers make an effort to optimally lay out the code for better performance. Global rearrangement of basic blocks could potentially lead to poor locality and substantial performance loss. Remix instead shuffles basic blocks locally. It can also bundle closely-related basic blocks together (e.g., tight loops) to further reduce the performance overhead. Simplicity and efficiency are two major advantages of Remix. They make Remix an ideal technique to compose with other defenses. For example, Remix should be used with ASLR so that functions are randomized at least once (during program startup). Other examples of compatible techniques include defenses against JIT-ROP [140] or Blind-ROP [41] attacks [36,62,83] and function-level re-randomization [84]. Remix can significantly increase the unpredictability of those systems with on-demand, live randomization of basic blocks. We have implemented a prototype of Remix for Linux applications and FreeBSD kernel modules.
Our prototype uses a slightly modified LLVM compiler to reserve the space needed for basic block reordering (it can also support binary-only programs by leveraging existing NOP instructions used to align instructions, albeit with less gain in randomness). Our experiments with standard benchmarks and applications show that Remix can substantially improve the randomness with a minor overhead (e.g., 2.8% average performance overhead and 14.8% average increase in binary size for SPEC CPU2006). The rest of this chapter is organized as follows. We first present the design and implementation of Remix in Section 3.2 and Section 3.3, respectively. The evaluation results are given in Section 3.4, followed by a discussion of potential improvements to Remix in Section 3.5. Finally, we conclude the chapter in Section 3.6.

Before Remix:
  BB1:  0x400d30: pushq %rbp
        0x400d31: movq  %rsp, %rbp
        ......
        0x400d44: jle   0x400d70
        0x400d4a: nopl  8(%rax, %rax)
  BB2:  0x400d4f: leaq  0xd77e(%rip), %rdi
        ......
        0x400d58: callq 0x400ae0
        ......
        0x400d66: jmpq  0x400d7c
        0x400d6b: nopl  8(%rax, %rax)
  BB3:  0x400d70: movl  $0, -4(%rbp)
        0x400d77: nopl  8(%rax, %rax)
  BB4:  0x400d7c: movl  -4(%rbp), %eax
        ......
        0x400d83: popq  %rbp
        0x400d84: retq
        0x400d85: nopl  8(%rax, %rax)

After Remix:
        0x400d30: jmpq  0x400d5f
  BB2': 0x400d35: leaq  0xd798(%rip), %rdi
        ......
        0x400d3e: callq 0x400ae0
        ......
        0x400d4c: jmpq  0x400d56
        0x400d51: nopl  8(%rax, %rax)
  BB4': 0x400d56: movl  -4(%rbp), %eax
        ......
        0x400d5d: popq  %rbp
        0x400d5e: retq
  BB1': 0x400d5f: pushq %rbp
        0x400d60: movq  %rsp, %rbp
        ......
        0x400d73: jle   0x400d7e
        0x400d79: jmpq  0x400d35
  BB3': 0x400d7e: movl  $0, -4(%rbp)
        0x400d85: jmpq  0x400d56

Figure 3.1: An Example of Remix on x86-64

3.2 Design 3.2.1 Overview

Remix aims at increasing randomness for protected processes through live randomization of basic blocks while keeping function entry points unmoved. Figure 3.1 shows an example of applying Remix to a simple 64-bit x86 (x86-64) function. After Remix, the basic blocks have been reordered. Any gadgets discovered before Remix immediately become obsolete, and executing them will likely cause exceptions such as an illegal opcode or general protection fault. Even though it is conceptually straightforward, reordering basic blocks faces a number of challenges:

First, the function might not have enough space to accommodate the reordered basic blocks. For example, some basic blocks end with a short jump instruction that takes a single byte for the offset. Their targets could be moved by Remix beyond the reach of one byte. It is thus necessary to substitute the short jump with a long jump, which takes four bytes for the offset. In addition, some basic blocks do not end with a branch instruction. They instead fall through to the next basic block. The movl $0, -4(%rbp) instruction in Figure 3.1 (at address 0x400d70, before Remix) is such an example. The instruction at 0x400d7c starts a new basic block because the instruction at 0x400d66 jumps to it, making it an entry point. Remix has to add a new jump instruction to connect the fall-through basic blocks. To accommodate reordered basic blocks, we modify the compiler to emit a five-byte NOP instruction after each basic block. This provides enough space to insert a long jump (also five bytes) for each basic block. This errs on the safe side – there is always enough space to accommodate the reordered basic blocks. Remix can also support binary-only programs without recompilation by leveraging the existing NOP instructions in functions. Second, when a basic block or its succeeding blocks are moved to other positions, it is necessary to fix their exit points to maintain the correct control flow: if the exit instruction is a direct branch, we only need to update its offset to the new address of its successors (a basic block has two successors if it ends with a conditional branch). For example, the jle instruction (Figure 3.1) has two branches. When it is moved, Remix adds a direct jmp instruction (at 0x400d79 after Remix) because the original branch falls through to the movl instruction at 0x400d70. If the exit instruction is an indirect branch, Remix analyzes its structure and handles it accordingly. For example, indirect calls can be left alone because function entry points are not moved by Remix.
Indirect jumps are more complicated, with several possibilities (Section 3.2.3). They are in fact related to the third challenge: how to migrate basic block pointers. Third, there exist pointers to basic blocks in the process' code and data sections. For example, the stack consists of local variables and return addresses. A return address points to the instruction after the originating call instruction (the return site). If a call instruction is moved by Remix, we have to substitute the original return address on the stack with the new one. In addition, the compiler generates jump tables to speed up switch/case-like structures. A jump table contains basic block pointers to handle its cases. It has to be patched when basic blocks are moved. Jump tables have several possible structures. Remix must handle all those different cases. The kernel has its own set of basic block pointers that have to be converted to maintain the correct control flow. In the rest of this section, we present in detail how Remix solves these problems.

3.2.2 Basic Block Reordering

Remix shuffles basic blocks within their respective functions to increase runtime randomness. Algorithm 1 gives a high-level overview of this process. Specifically, Remix first parses the code into basic blocks, and generates a random ordering of these basic blocks to guide the process. Remix then lays out the basic blocks according to that ordering, and saves the mapping between their old and new positions in a table (m). This table is used to convert basic block pointers. Note that the first instruction of a function (i.e., the function entry point) is replaced by a direct jump to the first basic block. As shown in Figure 3.1, Remix does not terminate a basic block with a call instruction. We choose this design for two reasons: first, Remix keeps functions at their original locations. Call instructions thus do not require complicated handling. Second, by design, a call instruction falls through to the next instruction. An extra jump must be inserted after the call instruction if the fall-through instruction is moved by Remix. Many applications use a large number of call instructions. This would substantially increase the binary size and reduce the performance. Reordering basic blocks changes their positions. Some instructions need to be updated to maintain the original control flow. They consist of instructions that have a program-counter (PC) relative operand (e.g., the various branch instructions). Most of them have a constant displacement that can be adjusted to offset the position changes made by Remix. We need to consider two types of position changes – of the instruction itself and of the destination of the instruction. We use two functions, FixDispS and FixDispD, in Algorithm 1 to handle these two cases, respectively. The majority of the instructions to be patched are branch instructions, i.e., indirect/direct calls and jumps (lines 5-15 in Algorithm 1):

• Indirect Call: an indirect call invokes a function indirectly through a function pointer. Function pointers remain valid because Remix does not move function entry points. As such, indirect calls can be left unchanged.

• Direct Call: a direct call targets the function at a certain displacement to itself. Even though the function stays at its position in the memory, the call instruction could have been moved to a different place. Accordingly, direct calls should be fixed with the FixDispS function.

Algorithm 1 Basic Block Reordering
 1: for each function f do
 2:   s = GenerateRandomOrder(f);
 3:   m = LayoutBasicBlocks(s);
 4:   for each instruction i do
 5:     if i ∈ DirectCall then
 6:       FixDispS(i, m);
 7:     else if i ∈ DirectJump then
 8:       FixDispS(i, m);
 9:       addr = CalcPrevTarget(i);
10:       FixDispD(i, m, addr);
11:     else if i ∈ IndirectJump then
12:       if IsJumpTable(i) then
13:         AddToJumpTableList(jt, i);
14:       end if
15:     end if
16:     if i ∈ PC-RelativeInsn then
17:       FixDispS(i, m);
18:     end if
19:   end for
20:   ConvertBasicBlockPointers(m, jt);
21: end for
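The core of this algorithm can be sketched on a toy representation (a sketch under simplifying assumptions, not the actual Remix implementation: blocks are modeled as (name, size) pairs, branches as (source block, destination block) pairs, and all names are invented). The sketch shuffles the blocks, records the old-to-new address mapping m, and recomputes each direct branch's displacement from m, which combines the roles of FixDispS and FixDispD:

```python
# Toy sketch of basic block reordering with branch displacement fixup.
import random

def remix_function(blocks, branches, base=0, seed=1):
    # Original layout: assign each block its old start address.
    old_addr, cursor = {}, base
    for name, size in blocks:
        old_addr[name] = cursor
        cursor += size
    # GenerateRandomOrder + LayoutBasicBlocks: shuffle and lay out again,
    # recording the old->new mapping m used later for pointer conversion.
    order = list(blocks)
    random.seed(seed)
    random.shuffle(order)
    m, addr = {}, base
    for name, size in order:
        m[old_addr[name]] = addr
        addr += size
    # Fix each direct branch: its displacement is recomputed from the new
    # source and destination addresses (the roles of FixDispS/FixDispD).
    fixed = {(src, dst): m[old_addr[dst]] - m[old_addr[src]]
             for src, dst in branches}
    return m, fixed
```

The returned table m plays the same role as m in Algorithm 1: any remaining basic block pointers (return addresses, jump-table entries) are later remapped through it.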

• Direct Jump: a direct jump often targets another basic block. Both the source and the destination instructions might change positions. To fix a direct jump, Remix first adjusts the instruction's displacement to offset the source instruction movement with FixDispS. It then calculates the original target and adjusts the displacement to offset the target instruction movement. In addition, a conditional jump has two branches, one for the true condition and the other for the false condition. One of the branches is a fall-through to the next instruction. Remix handles this case by treating the fall-through as an implicit jump to the next basic block. The same approach is applied if a basic block falls through to the next one without a branch instruction (e.g., BB3 in Figure 3.1).

• Indirect Jump: indirect jumps are more complicated to handle than the other branch instructions. They can target both functions and basic blocks. The former does not need any changes, but the latter can involve several different cases that must be handled by Remix. We elaborate on these cases in Section 3.2.3.

• PC-relative Addressing Mode: in addition to branch instructions, we also need to patch instructions with the PC-relative addressing mode, which are often used by the compiler to generate position-independent code. A program must be compiled as a position-independent executable (PIE) to benefit from ASLR (on Linux). A PIE program can run at any location in the address space. To achieve that, it calculates the runtime addresses of its code and data relative to the current program counter. The newer x86-64 architecture natively supports the PC-relative addressing mode. For example, the instruction lea 0x200000(%rip), %rbp adds 0x200000 to the current program counter and saves the result to the rbp register. The older x86-32 architecture has no native support for this addressing mode. Instead, the compiler uses a simple built-in function to retrieve the return address from the stack, which has been pushed to the stack earlier by the caller. Accordingly, this function returns the address of the return site (i.e., PC+5). To ensure correctness, Remix needs to update these instructions and functions. Fortunately, the compiler uses this mode (almost) exclusively to calculate the runtime function and data addresses, both of which are not changed by Remix. Only the PC-relative instructions and functions (on the x86-32 architecture) may have been moved. This can be easily compensated for with FixDispS.

When updating instructions, the new displacement might grow larger than what can fit in the original instruction. For example, x86-64 has two formats of relative jumps – short jumps with a one-byte displacement and long jumps with a four-byte displacement (x86-32 also supports short jumps with a two-byte displacement). It is rather easy to overflow short jumps, especially in large functions. One feasible solution is to restrict the moving distances of short jumps to within the one-byte limit. However, this could quickly become over-complicated if several short jumps target one another. We might end up with several basic blocks unchanged or only moved by a short distance. Remix instead configures the compiler to always generate the equivalent long jumps with four-byte displacements. This is also the case for call instructions, which have either a two-byte or a four-byte displacement. Figure 3.1 gives an example of applying Remix to a short x86-64 function. After Remix, four basic blocks are moved to new positions. Branch and PC-relative instructions, including jle, callq, jmpq and leaq, are updated to maintain the control flow. Moreover, two jmpq instructions (0x400d79 and 0x400d85, after Remix) are inserted for the fall-through of basic blocks. Another jmpq instruction (0x400d30) is placed at the function entry point targeting the first basic block.
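The width check that motivates always emitting long jumps can be sketched as follows (a simplified sketch: the real fix must also account for the change in the instruction's own length, which shifts the displacement itself). The short jmp encoding is 2 bytes (opcode 0xEB plus a signed one-byte displacement); the long jmp is 5 bytes (opcode 0xE9 plus a signed four-byte displacement):

```python
# Sketch of choosing between short and long relative jump encodings.

def encode_jump(displacement):
    """Return a short jmp if the displacement fits in a signed byte,
    otherwise the equivalent long jmp with a four-byte displacement."""
    if -128 <= displacement <= 127:
        return bytes([0xEB, displacement & 0xFF])                    # short jmp
    return bytes([0xE9]) + displacement.to_bytes(4, "little", signed=True)
```

Because a basic block can land anywhere in the function after shuffling, its displacement can always overflow the short form; forcing the long form at compile time removes this case entirely, at a modest cost in code size.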

3.2.3 Basic Block Pointer Conversion

User-space programs built by compilers rarely need or have direct access to individual basic blocks. Accordingly, most programs contain no explicit pointers to basic blocks. However, the compiler might spontaneously create such pointers when compiling the source code. For example, a return address on the stack points to the instruction following the corresponding call instruction. In addition, the compiler often uses jump tables to speed up switch/case statements. After Remix reorders basic blocks, these pointers become invalid and thus have to be updated. In the rest of this section, we discuss these cases in detail.

Return Address Conversion. A call instruction automatically pushes its return address onto the stack so that the callee can continue the execution from there upon return. The return address points to the instruction following the call instruction, i.e., the return site. When Remix performs live randomization of a process, the stack already contains return addresses. If these addresses are not updated, the process will return to wrong locations, eventually causing exceptions such as illegal opcode or segmentation fault. To convert return addresses, we traverse the whole stack (starting at the top of the stack in register rsp) and search for and update every value that points to a valid return site. Under this condition, the chance of a stack variable being accidentally treated as a return address is very slim. In addition, return address conversion is straightforward and deterministic if the program maintains stack frame pointers. A stack frame is a contiguous block of memory on the stack that keeps data for an active function. If frame pointers are maintained, each frame contains a pointer to the previous frame, and the return address is stored at a known location in the frame. Therefore, we can traverse stack frames and update all, and only, return addresses. Note, however, that modern compilers like gcc do not by default generate code to maintain frame pointers in an optimized compilation.

Indirect Jump Related Conversion. Indirect jumps are used by the compiler and standard libraries for a number of purposes. They can target either functions or basic blocks. No change is needed for the former, but the latter requires us to update the associated basic block pointers. Function Pointers: the compiler uses indirect jumps (to functions) mostly to support shared libraries, C++ vtables, and tail/sibling calls. For example, the compiler generates the PLT and GOT tables for calls to external functions in a shared library [111]. The library is loaded at a random address unknown until the program runs. At runtime, the linker resolves the address of each

(A) jmpq *0x480000(,%rax,8)

(B) jmpq *0x8(%rax,%rcx,8)

(C) movslq (%r9,%rbp,4),%rcx
    add    %r9,%rcx
    jmpq   *%rcx

Figure 3.2: Jump Table Examples

called external function and saves it in a GOT entry. A PLT entry is an executable trampoline that represents the actual function. It is essentially an indirect jump to the function address saved in its associated GOT entry. The PLT table is placed in a special section, which Remix leaves unchanged. Tail/sibling call optimization is also interesting. The compiler normally allocates a new stack frame for each function call. However, there are cases where the callee can safely share the caller's stack frame. Such a call is dubbed a tail call or a sibling call, depending on the location of the call instruction. A typical example of the tail call is a tail-recursive function [155], but compilers like gcc support the broader definition of tail/sibling calls: they can identify these cases and reuse the callers' stack frames. If the callee is a function pointer, the compiler generates an indirect jump (instead of an indirect call) in order to reuse the stack frame. Remix does not need to change indirect jumps introduced by tail/sibling call optimization.

Saved Context: indirect jumps are also used by the standard C library to restore saved context. For example, the setjmp and sigsetjmp functions save their calling context to a jump buffer, while the longjmp and siglongjmp functions restore the context saved by setjmp and sigsetjmp, respectively. The restoring functions use an indirect jump to continue the execution at the saved instruction pointer. After reordering basic blocks, Remix needs to update all the jump buffers. The most efficient solution is to hook the functions that save the context and record the locations of the jump buffers. Note that the saved registers in a jump buffer are encoded by glibc in a special format (PTR_MANGLE). The alternative approach of searching the whole address space for jump buffers incurs unnecessary performance overhead, as these functions are seldom used.
Jump Tables: jump tables are often generated by the compiler to speed up switch/case statements. If some case values are contiguous, the compiler stores their handlers in a table and uses the switch variable as an index to quickly locate the corresponding handler. On x86-64, various

patterns of jump tables can be used [59], as shown in Figure 3.2. They all have a base address, an index register, and a scale; an entry in the jump table is addressed by (base + index ∗ scale). For example, the bases of cases A, B, and C are the constant 0x480000, register rax, and register r9, respectively, and the indexes are in rax, rcx, and rbp, respectively (in case C, rbp is used as a general-purpose register, not the stack frame base pointer). Interestingly, while cases A and B store the actual handler addresses in the table, since they directly jump to the selected entry, case C stores the offsets between the table base and the handlers. Each offset is only four bytes (a pointer is 8 bytes on the x86-64 architecture). To calculate the handler address, the code reads the offset into register rcx and adds it to the table base in register r9. The handlers of a switch/case statement are basic blocks of the enclosing function. Remix thus has to update the tables after reordering basic blocks. The first two cases are rather straightforward to handle: jump tables are typically placed in the .rodata section. We search this section for at least 3 consecutive addresses pointing into the code section. If these addresses are close enough to each other (e.g., no more than 1MB apart) and all point to a valid instruction, Remix updates them accordingly. Even though false positives are possible, we did not find them to be a problem during our experiments. This approach does not work on the third case, whose jump table consists of offsets, not instruction addresses. A simple solution is to export some metadata (e.g., the table base and length) from the compiler for Remix to patch the table at runtime. Remix then can locate each handler and adjust its offset by the displacement between the old handler address and the new one. Our prototype uses this approach.
Another viable solution is to use pattern matching to locate code similar to case C (the registers might differ) and use intra-procedural, backward program slicing [29, 160] to locate the table base and length. For example, the index (register rbp in case C) is often compared to the table's upper and lower limits to make sure that it is within the table's boundary. This gives us the valid range of the index and hence the table length. As for the table base, the compiler generates case C mostly for position-independent code (e.g., shared libraries). The table base is calculated at runtime using the PC-relative addressing mode, which has its own patterns (Section 3.2.2). As such, the table base can be computed from the program counter and an offset. This approach is more complicated, but it is the only choice if the source code is not available.

Exception tables can be similarly patched. Each exception table entry consists of a code range and a handler: if an exception happens in that range, it should be handled by the associated handler. However, Remix might move a faulting instruction out of the range and cause no handler, or a wrong handler, to be called. To address that, we can either revert the faulting instruction to its original location or avoid moving basic blocks into and out of the range. Our prototype has yet to implement this feature. Nevertheless, we completed our experiments (including the Apache server and a kernel file system) without any problem. Even though exception handling is exploited by malware and DRM software to obfuscate control flows, regular applications do not use it that way (i.e., they use it for exceptions, not regular control flows), since exception handling is relatively slow.

3.2.4 Live Randomization of Kernel Modules

Live randomization of the kernel code faces many of the same challenges as that of user applications. For example, the kernel can be compiled to use jump tables for tight switch/case statements. The kernel may also use exception tables, because it often needs to access user memory. To protect itself from untrusted applications, the kernel must verify every user address it accesses, an expensive operation that requires traversing the process' paging structures. Moreover, the vast majority of user addresses are benign and safe to access. To avoid unnecessary verification, the kernel accesses user memory without prior verification; instead, it registers a page fault handler that will be called by the kernel if the memory access fails. These cases can be handled similarly as in the user space. Nevertheless, there are some differences between the kernel and user applications. For example, the kernel often embeds manual assembly code, which may not follow the paradigms of the compiled code. That code has to be handled case by case (but only once). Return address conversion is more complicated than in the user-space case because the changed return addresses could exist in any of the active kernel stacks (if a process is running in the user space, its kernel stack is empty). All these stacks need to be updated at once. In addition, a hardware interrupt can interrupt any instruction in the kernel or the user space; the interrupted address is saved on the kernel stack. If the interrupted instruction is in the kernel and has been moved, we can directly update the saved interrupt context. However, if the interrupted instruction is in the user space, Remix cannot update the kernel interrupt context, which is protected from the user space. Consequently, in the user space, Remix should

not move an instruction that may be interrupted, i.e., an instruction that is currently executing. In our prototype, we stop the whole process (to guarantee consistency) and use a small agent to reorder basic blocks. The agent does not randomize itself. Even though it is possible to randomize the whole kernel, our prototype currently supports live randomization of kernel modules (e.g., the ReiserFS file system).

3.2.5 Performance Optimization

In this section, we present our strategies to improve the runtime performance of protected processes and to reduce the randomization latency.

Probabilistic Loop Bundling. Compilers make an effort to optimize the layout of the generated code for better performance. For example, gcc has an option to align loops to a power-of-two boundary (-falign-loops). If enabled, gcc inserts a number of NOPs before the loops to properly align them in the cache. If the loops are executed many times, the performance gain from the alignment outweighs the time wasted in executing NOPs. Remix, like other basic block randomization systems, disrupts this careful layout of the code. Because Remix randomly rearranges basic blocks, its final performance impact is somewhat unpredictable due to the complex interactions between the program and the cache hierarchy. For example, our early experiments found that Remix incurs low overhead for most SPEC CPU2006 benchmarks, but there were a couple of outliers with more than 15% overhead. To address that, we propose probabilistic loop bundling. Loops are critical to the overall performance: a process often spends most of its execution time in loops, so changing the layout of loops might have the largest impact on performance. Accordingly, Remix focuses its optimization on loops. It can probabilistically bundle the basic blocks of a loop. A bundled loop has the same internal layout of basic blocks as the original, non-randomized loop. Within the boundary of a function, we consider the destination of a backward jump as the beginning of a loop and the jump itself as its end (even though this loop detection is quite rough, it is sufficient for our purpose). We also control the size of a bundled loop by limiting the number of jump and return instructions it contains. This avoids bundling large loops; for some functions, their bodies consist of a single large loop. Before the first randomization, Remix detects loops in the original code and records the layout of their basic blocks.
During the live randomization, Remix flips a coin with a certain probability to decide whether or not to bundle a loop. If a loop is bundled, its basic blocks are restored to the original, compiler-generated layout. The whole bundle is then treated as a single basic block and takes part in the randomization. In other words, a bundled loop is still moved around, but its internal basic blocks remain relatively static. If possible, we make bundled loops 16-byte aligned. Our prototype bundles loops with a probability of 1/3, i.e., about 2/3 of the loops are randomized. Finally, we want to emphasize that each live randomization individually selects which loops to bundle. No loop will always be bundled.

Table 3.1: Average NOP Space per Function (bytes)

Software:   glibc  httpd  nginx  lighttpd  OpenSSL
NOP Space:  42.9   19.3   26.2   22.1      19.9

Meta-data Maintenance. Remix reorders basic blocks from time to time to make the code layout unpredictable. This is a time-consuming process, especially for large programs. In addition, Remix has to stop the whole process during randomization to ensure consistency; otherwise, a multi-threaded process might have unsuspecting threads executing partially randomized functions. To this end, Remix maintains some metadata to facilitate live randomization. For example, it builds an index for basic blocks and some important instructions (e.g., call instructions and jump tables). The metadata is built from the ground up in the first run and kept up to date afterwards. With the metadata, Remix can significantly reduce the randomization latency. To protect the metadata from being leaked, we allocate its memory at a random location. Even though the metadata is stored in the process' address space, it is isolated from the process itself because no pointers to the metadata exist in the process (our prototype stores the base address of the metadata outside the process; see Section 5.2.7). Information leak vulnerabilities in the process cannot disclose the metadata's location or content. To be more cautious, we could move the metadata to random locations at undetermined intervals.

3.2.6 Binary-only Program Support

If the source code is available, Remix uses a (slightly) customized compiler to reserve enough space for extra jumps necessary to connect reordered basic blocks (Section 4.3.1). However, the source code is not always available, especially for commercial or legacy programs. Remix has a compatibility mode to support binary-only programs by leveraging the existing NOP padding in

the code. As previously mentioned, compilers often insert NOP instructions to align functions and loops to a power-of-two boundary. As such, there are NOPs between and inside functions. Table 3.1 shows the average NOP space per function (in bytes) for several popular software packages. Remix can use this NOP space for its own purpose. We treat small and large functions differently: small functions naturally contain fewer NOP instructions, but short jumps (2 bytes each, one byte for the opcode and the other for the displacement) are often enough to chain their basic blocks; large functions have more NOP space available, but basic blocks might be moved far apart from each other. To chain two basic blocks, we use short jumps whenever possible and long jumps otherwise. If the space runs short, we bundle some basic blocks together to reduce the number of extra jumps needed (similar to the loop bundling). During each live randomization, Remix picks different sets of basic blocks to bundle together. This ensures that a different code layout is generated each time.

3.3 Implementation

We have implemented a prototype of Remix for Linux applications and FreeBSD kernel modules on the x86-64 architecture. The FreeBSD kernel is chosen because it has better support for the LLVM/Clang compiler. In this section, we describe our prototype in detail. We slightly modify the LLVM/Clang compiler to insert a 5-byte NOP instruction (nopl 8(%rax, %rax)) after each basic block. To achieve that, we add one line to the EmitBasicBlockEnd function in LLVM. These 5-byte NOPs also serve as delimiters for basic blocks because LLVM itself does not use this type of NOP (it does use other formats of NOPs, such as xchg %ax,%ax). This makes basic block identification straightforward for Remix. To ensure that LLVM only generates long jumps (Section 3.2.2), we pass -mc-relax-all to the LLVM backend. However, this unnecessarily relaxes other instructions, such as add and sub, to full displacements as well. With more invasive changes to LLVM (likely in the fixupNeedsRelaxation function), we could make LLVM relax only branch instructions. We use Capstone, a cross-platform, multi-architecture disassembly framework, to disassemble instructions in memory. Linux enforces W ⊕ X for user applications, in which a block of memory is either writable or executable, but not both simultaneously. As such, we use the mprotect system call to temporarily make the .text and .rodata sections writable. After live randomization, we set their permissions back.

A major implementation challenge is to guarantee the consistency of the process, especially for a multi-threaded process. All the threads should enter a consistent state before live randomization, and have their data updated before the execution is resumed. A viable solution is to use a kernel module and pause all the threads at the system call boundary. In our prototype, we use ptrace to stop the whole process (for single-threaded processes, a timer signal can also serve this purpose). Similarly, we need to put the kernel in a quiescent state and update all the affected kernel stacks consistently. Ptrace is an interface for process tracing and debugging; it allows one process to inspect and control the execution of another process. We start the target program under the control of a small utility program, which is responsible for initiating live randomization at random intervals (for brevity, we call it the initiator). When it is time for live randomization, the initiator sends a SIGSTOP signal to the target process and waits for it to stop. For each stopped thread, the initiator has full access to its execution context, including the registers and the program counter. Even though we could randomize the code with ptrace alone, the ptrace interface is too slow for this task: each access to the target process' memory must be conducted through an expensive system call. Instead, we pre-load a small agent into the target process and use ptrace to activate the agent. The agent performs the live randomization and returns control to the initiator when it finishes. The initiator subsequently restores the process' state and resumes its execution at the interrupted instructions. However, these instructions might have been moved to different positions. To fix that, the initiator requests the agent to translate the interrupted program counters to their new values.
To avoid interfering with the target process' heap and stacks, the agent uses the mmap system call to allocate new memory for its own heap and stack. The agent makes system calls directly instead of using the equivalent libc functions, because it might be libc that Remix is currently randomizing (if so, libc is in an inconsistent state). To prevent the agent from being exploited by code reuse attacks, the initiator relocates the agent from time to time. Moving the agent is much simpler than the live randomization of regular processes because the agent is small, position-independent, and self-contained (i.e., it does not rely on other libraries). In the FreeBSD kernel, live randomization is triggered by a timer. When the timer expires, we call the smp_rendezvous function to put all the CPUs in a consistent, quiescent state. The smp_rendezvous function sends inter-processor interrupts to signal all the CPU cores; they rendezvous and execute the same

set of functions. In our prototype, one core performs live randomization while the others wait for it to finish. That core reorders the basic blocks of the target kernel module and searches the kernel stacks and other data structures for the affected basic block pointers. After randomization, all the cores are resumed and continue the interrupted execution.

Table 3.2: Statistics of Three Web Servers

Software                       Apache  nginx  lighttpd
Average Basic Block #          15.3    18.8   14.4
Average NOP Space (bytes)      19.3    26.2   22.1

3.4 Evaluation

In this section, we first analyze the security guarantee of Remix and then measure its performance overhead with standard benchmarks.

3.4.1 Security

Remix randomly reorders basic blocks within their respective functions to increase entropy. It complements the existing ASLR support in commodity operating systems. ASLR randomly places the executable in the address space, but it only provides coarse-grained protection against code reuse attacks: the leak of a single code pointer, such as a function pointer or a return address, is often sufficient to de-randomize the whole executable. The attacker often leverages an information leak vulnerability to de-randomize the victim process before full-on attacks [153]. Remix can significantly improve ASLR's resilience to this type of information leak. It reorders the basic blocks of each function at random intervals; the actual code layout is unpredictable and keeps changing from time to time. Even if two systems run exactly the same programs, their runtime code layouts are different. Table 3.2 shows the average number of basic blocks per function for three popular web servers: Apache, nginx, and lighttpd. They all have about 16 basic blocks per function on average. Therefore, Remix adds about four bits of entropy to each instruction of these programs. This leads to a boost of about 20% to 25% in the entropy for 32-bit systems [134]. More importantly, Remix introduces time as a variable into the address space layout, making it a moving target. The compiler often spontaneously inserts NOP instructions into the generated programs to align functions or loops. Table 3.2 also shows the average NOP space per function (in bytes) for those programs. The NOP


Figure 3.3: SPEC CPU2006 Performance Overhead


Figure 3.4: SPEC CPU2006 Size Increase

space can be leveraged to further increase the entropy by randomly placing NOPs between basic blocks. For short functions with fewer than 4 basic blocks, we also insert some additional NOP space to improve the entropy. Recently, researchers have proposed a few novel attacks against fine-grained code randomization. For example, JIT-ROP (just-in-time ROP [140]) repeatedly exploits a memory leak vulnerability to recursively map out the victim process' address space and synthesizes code reuse attacks on demand. JIT-ROP is particularly detrimental to code randomization techniques that randomize the process only once, at compile or load time [151]. Remix's live randomization could potentially disrupt the JIT-ROP attack (if the code happens to be randomized by Remix during the attack), but it is


Figure 3.5: Apache Web Server Performance Overhead


Figure 3.6: ReiserFS Performance Overhead

not always effective. A sure defense against JIT-ROP is execute-only memory, in which the code can only be executed but not read. Fortunately, execute-only memory is being adopted by major CPU architectures [32, 62, 97] and can be emulated in software [36, 83]. Remix should incorporate execute-only memory as a defense-in-depth solution. For performance reasons, our prototype implants an agent and its metadata into the target process. However, it is unlikely that JIT-ROP could find these artifacts. Even though they exist in the process' address space, they are isolated from the process itself because the process has no pointers to them. In addition, we could move them to random locations from time to time. JIT-ROP carefully maps the victim process' address space to avoid accessing invalid memory; blindly probing the Remix memory will almost certainly trigger a general protection exception and be foiled. BROP [41] is another attack against fine-grained code randomization, which exploits a victim process many times, essentially brute-forcing it in order

to locate useful gadgets (it assumes the process is restarted upon crash). During this long process, Remix has likely randomized the process a few times, rendering the probed gadgets useless.

3.4.2 Performance

The performance impact of Remix mostly comes from two aspects. First, live randomization has to stop the whole process or the kernel to ensure consistency, which introduces some latency. Second, Remix rearranges the code layout; modern computer architectures rely heavily on the cache for performance, and changing the process' code layout can affect its cache profile and, by extension, its performance. We measure both aspects with standard benchmarks (SPEC CPU2006) and a number of popular applications. All the experiments are performed on a third-generation Intel Core i7 machine with 16 GB of memory. The operating system is 64-bit Ubuntu 14.04.2 LTS. LLVM version 3.6 is used as the base compiler. To measure the execution overhead, we randomize the SPEC CPU2006 benchmarks once (with a probability of 1/3 for loop bundling) and compare their performance to the baseline built with the unmodified LLVM compiler. All the experiments are repeated 20 times; the standard deviation is negligible. Figure 3.3 shows the performance overhead caused by Remix (C++ benchmarks with exceptions are not currently supported). The overhead for most benchmarks is less than 5%, with an average of 2.8%. To reserve space for reordering basic blocks, Remix inserts a 5-byte NOP instruction after every basic block. It also relaxes various instructions to use larger constants (e.g., jmp and call). This can substantially increase the program binary size. Figure 3.4 shows the size increase of SPEC CPU2006; the average increase is 14.8%. We use ApacheBench to measure the performance impact of live randomization intervals. We run the Apache server and ApacheBench on two directly connected machines. We use ApacheBench to send 5 × 10^6 requests to the server with a concurrency of 10. We set Remix to periodically randomize the server with an interval of 1, 5, 10, and 60 seconds, respectively.
As expected, the one-second interval incurs the highest overhead (2.9%). The overhead gradually decreases as the interval increases; at the ten-second interval, the overhead is only 0.44%. Remix supports not only user-space applications but also kernel modules. Our experiments are based on the FreeBSD kernel, as it has better support for the LLVM/Clang compiler. We use Remix to live-randomize the ReiserFS kernel driver [130]. IOzone, a user-space file system benchmark [123], is used to measure the performance of ReiserFS under different randomization intervals. We test

the stride read of a large file in the automatic mode with a record size from 4KB to 512MB. The performance overhead of Remix is negligible even with a randomization interval of 0.01 seconds (Figure 3.6). This is expected, as the performance bottleneck is in the disk I/O. We also test the read/re-read operations and get very similar results.

3.5 Discussion

In this section, we discuss some possible improvements to Remix. First, Remix reorders basic blocks within their respective functions. The entropy increase from Remix is thus limited by the number of basic blocks in the function: smaller functions have fewer basic blocks and thus benefit less from Remix. In our prototype, we insert extra NOP space into small functions to increase the entropy. Furthermore, we could incorporate fine-grained code randomization [90] specifically for these small functions. One of the key benefits of reordering basic blocks within functions is that function entry points remain at their intended locations. Consequently, there is no need to migrate stale function pointers, which in general is an unsolvable problem. However, this does not necessarily require that basic blocks remain within function boundaries. We could randomly place basic blocks in the whole address space and use a long jump at each function entry point to reach its first basic block. Such a system would combine binary stirring's higher entropy gain with Remix's simpler live randomization. However, the spatial locality of the randomized code would be even more fragmented than in our current design; it is necessary to carefully study the optimal basic block layout for better performance and security. Meanwhile, Remix does not lively move functions, to avoid the complex runtime fixing of stale function pointers. Functions are nevertheless randomized at least once, during the program startup, by ASLR. Remix is a highly composable technique: it can be naturally integrated with systems that lively randomize functions [84] or with other techniques. Second, some programs contain code that cannot be automatically randomized by Remix, such as inline assembly code, which sometimes does not follow the (relatively) clean paradigms of compiled code. For example, kernels often use inline assembly in trampolines for interrupt handlers.
A trampoline prepares the kernel stack to handle pending interrupts. The addresses of these trampolines are stored in the interrupt vector table. When an interrupt is triggered, the CPU indexes into this table and dispatches the corresponding handler. If these trampolines are reordered, we need to update the interrupt vector table. In addition, some programs have code that cannot be cleanly

disassembled (e.g., obfuscated code), and programs with just-in-time compilers can dynamically generate binary code. There does not seem to be a universal solution to these diverse problems; we instead have to handle them case by case. For example, we could incorporate the design of Remix into the JIT compiler so that dynamically generated code can also be randomized. Third, Remix performs live randomization of the target process at an undetermined interval. The choice of this interval is a trade-off between randomization latency and security. As mentioned earlier, Remix provides probabilistic protection against information-leak-based attacks such as JIT-ROP [140] (Section 3.4.1); that is, the protection is in effect if Remix happens to randomize the target process during the attack. As such, an interesting criterion for deciding the live randomization interval is how likely Remix is to disrupt these attacks. Our prototype uses an interval of ten seconds as a trade-off between randomization latency and security. Like other code randomization systems, Remix is, after all, a probabilistic defense. A more complete, defense-in-depth system should combine Remix with specific defenses against those attacks (e.g., execute-only memory to prevent JIT-ROP). Last, Remix inserts an extra NOP instruction after each basic block to reserve space for reordering basic blocks. A program built by Remix is still a valid one that can be executed standalone; it is just larger (14.8% average size increase for SPEC CPU2006) and probably runs slower. Our tests show that Remix-built programs run mostly as fast as, or only slightly slower than, the original programs. This result is expected, as modern processors have an efficient and intelligent instruction prefetching system. However, there are a few outliers that execute even faster than the baseline. This is probably caused by the complex interaction between the instruction alignment and the cache hierarchy.
Native Client (NaCl) shows similar results [156]. NaCl is a software fault isolation system that can safely execute native code in the web browser. In NaCl, the untrusted code is divided into equal-sized fragments, and no instruction can cross a fragment boundary; NOP instructions are used to pad the fragments if necessary.

3.6 Summary

We have presented the design and implementation of Remix, a live randomization system for user-space applications and kernel modules. Remix randomly reorders basic blocks within their respective functions at undetermined time intervals. It can substantially increase the entropy of

ASLR, one of our most important defenses against code reuse attacks. By randomizing the code layout, Remix can significantly enhance ASLR's defense against certain types of information-leak vulnerabilities. Remix is a flexible and composable defense technique due to its unique design and efficiency: it brings to the composed systems extra entropy that changes over time. Our experiments with both standard and application benchmarks show that Remix only incurs a small performance overhead.

CHAPTER 4

PINPOINTING VULNERABILITIES

4.1 Introduction

In the previous chapter, we have introduced Remix, an approach to mitigating attacks and exploits caused by unpatched vulnerabilities. To completely protect the system from further exploits, the next step we need to take is finding out where the vulnerabilities are. Unfortunately, today's software systems have become complex and error-prone for programmers, and vulnerability discovery is not an easy task. Memory is the battlefield of an eternal arms race between attacks and defenses [144]. Although several exploit mitigation mechanisms like mandatory access control (MAC) [89], address space layout randomization (ASLR) [145] and data-execution prevention (DEP, aka W ⊕ X) [69,70] have been employed in commodity operating systems, attackers are always able to penetrate them using novel exploitation approaches. Most often, attackers use a combination of several exploit techniques to bypass defense mechanisms such as ASLR. For example, an attacker may first exploit an information-leak vulnerability to de-randomize the victim process before launching a return-oriented programming (ROP [43,133]) attack to disable DEP, and then inject and run the shellcode. Hence, a timely response to new (zero-day) exploits is essential to defenses. While a number of systems have been designed to detect attacks, most of these systems are only able to detect the symptoms. In other words, the targeted vulnerabilities are usually not revealed by merely detecting an attack. For instance, system call (syscall) interposition helps to identify abnormalities in the syscalls issued by a protected program [81,82,87], building on the fact that an attacker needs to make syscalls in order to carry out any "meaningful" malicious activity. When the actual syscall sequence deviates, an intrusion alert is raised. However, neither the initial attack nor the targeted vulnerability is revealed by the detection of the anomaly in this case.
In a similar vein, the vulnerabilities may not be pointed out by merely detecting the control-flow hijacking. For example, in a taint-based attack detection system [50,120], untrusted inputs are marked as tainted and the taints are propagated throughout the entire system. Thus, whenever the program counter (PC) turns out to be tainted, an attack is

1 int main(int argc, char *argv[])
2 {
3     char buf[16];
4     strcpy(buf, argv[1]);
5     printf("%s\n", buf);
6     return 0;
7 }

Figure 4.1: A simple program with a buffer overflow at line 4.

identified. Similar attacks can also be detected by control-flow integrity (CFI [44]), which instruments the program with inline reference monitors that check the program's runtime control flow against its pre-computed control-flow graph (CFG). A deviation from the CFG signals that the control flow has been hijacked. However, neither taint- nor CFI-based systems can pinpoint the exploited vulnerabilities. This can be illustrated with the simple program in Figure 4.1. While the control flow is first hijacked at line 6, as detected by both the taint-based and the CFI-based systems, the actual vulnerability lies at line 4. Syscall interposition detects the attack even later, i.e., when an unexpected syscall is made. In a nutshell, many attack detection systems fall short of revealing the targeted vulnerabilities. A system that can not only detect attacks but also pinpoint the exploited vulnerabilities could greatly help us in the arms race against attackers. First, it can significantly reduce the window of vulnerability. Developers often spend non-trivial effort to reproduce and analyze reported attacks; this is usually a manual, time-consuming, and error-prone process, as many attacks are hard to reproduce in the development environment. Second, it can automatically locate zero-day vulnerabilities, as long as the attacks can be detected. Many existing systems can detect zero-day attacks (i.e., they do not rely on the details of known attacks), including the previously mentioned syscall interposition and taint-/CFI-based systems. Lastly, locating vulnerabilities is an important first step towards automatic software repair and self-healing.

4.2 System Overview

In this chapter, we propose Ravel [53], a system that can pinpoint the targeted vulnerabilities from detected attacks. Ravel stands for "Root cause Analysis of Vulnerabilities from Exploitation Logs." It consists of three components: an online attack detector, a record & replay (R&R) mechanism, and an offline vulnerability locator. R&R decouples the other two components so that the online attack detector can operate as efficiently as possible to minimize the performance overhead, while the offline vulnerability locator can employ multiple, time-consuming approaches to improve its accuracy, precision and coverage. As many attack detection techniques have been proposed, we leverage some existing light-weight detectors (program crashes and syscall interposition) for the online component. Note that the development of attack detectors is orthogonal to the Ravel framework; new techniques can be easily adopted by Ravel for better and faster attack detection. In this chapter, we focus on the design of the overall framework of Ravel and the vulnerability locator, the main contributions of Ravel. The intuition behind the vulnerability locator is that exploiting a memory vulnerability often changes data flows, and the source or the destination of such a change provides a good approximation to the actual location of the vulnerability [47]. For example, strcpy in Figure 4.1 can overflow into the return address on the stack if argv[1] is longer than buf. This introduces a new data flow with strcpy as the source and the return statement (line 6) as the destination. In this case, the source actually points to the vulnerability we need to locate. However, data-flow tampering in general provides only a rough location of the vulnerability. Ravel further refines it with vulnerability-specific analyses to pinpoint common memory bugs such as integer errors, use-after-free, race conditions, etc. We have implemented a prototype of Ravel for the FreeBSD operating system (Release 10.2). Our experiments with standard benchmarks and various vulnerabilities in popular applications show that Ravel can pinpoint a variety of memory vulnerabilities, and it only incurs a minor performance overhead (about 2% for SPEC CPU 2006, NGINX, Apache, etc.).
This demonstrates Ravel’s effectiveness and practicality for helping locate vulnerabilities.

4.3 System Design

4.3.1 System Overview

When encountering an attack, Ravel aims to automatically pinpoint the vulnerabilities it targets. There exist a number of challenges in locating vulnerabilities, in addition to detecting the corresponding attacks. We use the example code in Figure 4.2 to illustrate them; many design decisions in Ravel are made to address these challenges. The code in Figure 4.2 is inspired by a real vulnerability in the popular NGINX web server [64]. Function process_request abstracts how the

1  int process_request(int conn_fd, struct Header *packet_header, char *buffer_to_send)
2  {
3      char buffer[MAX_BUF_LEN];
4      size_t size;
5      ssize_t recved;

6      size = (size_t) min(packet_header->length, MAX_BUF_LEN);

7      recved = recv(conn_fd, buffer, size);
8      save_to_file(buffer, recved);
9      send(conn_fd, buffer_to_send, size);

10     return 0;
11 }

Figure 4.2: A vulnerable function used as the running example. There is a buffer overflow at line 7 caused by the integer signedness error at line 6, and an information leak at line 9 caused by the same integer error.

server processes a client's request. Specifically, the server reads the request from socket conn_fd (line 7) and logs it to a local file (line 8); it then sends back its response (buffer_to_send) to the client (line 9). The packet_header parameter points to the packet header previously received from the client. Its length field specifies how much data to receive from the client; consequently, this field is under the attacker's control. To avoid overflowing the receive buffer, line 6 limits length by the buffer size. Unfortunately, this line has an integer signedness error. More specifically, packet_header->length has a type of ssize_t (i.e., signed size_t, an alias of int). If the attacker makes it negative, it will pass min without any change and be converted to a large positive number saved in size. This leads to a stack-based buffer overflow at line 7, a potential denial-of-service at line 8 (by filling the disk space), and an information leak at line 9 (by sending lots of data to the attacker). Note that recv normally returns any data currently available in the socket up to the requested size; the attacker thus also controls how much data to receive and write at lines 7 and 8, respectively. We use this code as a running example in the rest of this section. Even though the source code is shown in these examples, Ravel works on program binaries and thus does not require the source code. Nevertheless, if the debugging information or source code is available, the located vulnerability can be mapped from the binary back to the source code.
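The signedness error described above can be reproduced in isolation. The sketch below models line 6 of Figure 4.2; the names clamp_length and min_ssize are illustrative, not taken from NGINX or Ravel:

```c
#include <stddef.h>
#include <sys/types.h>

#define MAX_BUF_LEN 4096  /* illustrative buffer limit */

static ssize_t min_ssize(ssize_t a, ssize_t b) { return a < b ? a : b; }

/* Models line 6 of Figure 4.2: the signed comparison lets a negative
   length through unchanged, and the cast then turns it into a huge
   positive size_t. */
size_t clamp_length(ssize_t attacker_length) {
    ssize_t clamped = min_ssize(attacker_length, MAX_BUF_LEN);
    return (size_t) clamped;
}
```

With a benign length of 100 the function returns 100, and 8192 is properly clamped to 4096; but a length of -1 becomes SIZE_MAX, far larger than the buffer.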

This example demonstrates many of the challenges in pinpointing vulnerabilities. First, the manifestation of an attack does not necessarily reveal the real vulnerability. The real vulnerability in Figure 4.2, an integer signedness error, lies at line 6. Without this flaw, the rest of the function cannot be exploited. Therefore, line 6 is the root cause of those attacks; it is this line that the developer should fix. However, most attack detection systems fail to reveal this root cause because they look for anomalies in the program's behaviors and can only detect an attack when or after it has happened. For example, control-flow integrity [44] can detect a violation at line 10, and syscall interposition [87] only detects the attack when the payload is executing. Data-flow integrity [47,141] can reach closer to the root cause but still cannot pinpoint it. In this chapter, we define a data flow as a def-use relation between instructions [47]. Specifically, an instruction "defines" a memory location if it writes to that location, and an instruction "uses" a memory location if it reads from that location. Two instructions form a def-use relation if they write to and read from the same memory location, respectively. Anomalies in the data flow can help us identify two derived vulnerabilities at lines 7 and 9, since they both introduce extra def-use relations. The root cause, nevertheless, is the integer error at line 6. To address this challenge, Ravel first uses a data-flow analysis to approximate the real vulnerability, and further refines the result by analyzing its details. Second, several vulnerabilities may co-exist, allowing multiple ways to exploit them. Figure 4.2 contains four vulnerabilities. The attacker may choose to take over the control flow via the buffer overflow, or dump the server's memory via the information leak (without triggering the buffer overflow, since the attacker can control how much data is received).
Ravel needs to handle individual vulnerabilities as well as their combinations. Third, techniques to locate vulnerabilities often require analyzing the program's detailed memory access patterns, a prohibitively time-consuming task without special hardware support. To be practical, Ravel has to address this important performance challenge. Finally, faithfully reproducing attacks is also challenging, but may be indispensable for finding and locating root causes. Figure 4.3 shows the overall architecture of Ravel. Ravel consists of three components: an online attack detector, a record & replay (R&R) mechanism, and an offline vulnerability locator. The target process is executed under the control of a record agent (we call it the recorder for brevity). The recorder logs the complete execution history of the process for later replay. Recent advances in R&R, such as eidetic systems [74], allow Ravel to continuously record the execution of a process with

[Figure 4.3 diagram: in Phase I (Normal Execution), the target process runs on top of the kernel-level record agent, which logs the execution history, while the user-level attack detector monitors the process; in Phase II (Vulnerability Detection), the replay agent replays the execution history, optionally from a checkpoint, for the vulnerability locator.]

Figure 4.3: Overall architecture of Ravel

low performance and storage overhead. While recording, the attack detector monitors the execution of that process for exploits and attacks. In order to capture real-world attacks, the recorder and the attack detector run on production systems. If an attack is detected, the execution log can be sent to the developer (likely on a different computer) for further analysis using the vulnerability locator. The vulnerability locator replays the recorded execution and performs a number of analyses to pinpoint the vulnerabilities. Specifically, it first uses a data-flow analysis to roughly locate the vulnerabilities and then performs vulnerability-specific analysis to refine the results. In this architecture, we use a combination of the general data-flow analysis and vulnerability-specific analyses to fulfill Ravel’s precision requirements (the 1st and 2nd challenges), and leverage R&R to address the performance and reproducibility requirements (the 3rd and 4th challenges). In the rest of this section, we give details of these components.

4.3.2 Attack Detection

In order to locate vulnerabilities, Ravel first needs to detect and record the attacks. The attack detector thus plays an important role in Ravel. In order to handle real-world (zero-day) vulnerabilities, the attack detector has to run on the production system. This imposes strict performance and effectiveness requirements on the attack detector. However, Ravel is structured as an extensible framework: it can employ many attack detection techniques, such as syscall interposition and CFI. This is made possible by the design of Ravel. Specifically, the vulnerability locator deduces the rough location of an exploited vulnerability by searching for anomalies in the data flow and further refines that with detailed analyses. Therefore, it is sufficient for the vulnerability locator to know that the execution log contains certain (unknown) attacks. This minimal requirement allows

Ravel to employ any attack detection technique as long as it is effective and has low performance overhead. We want to emphasize that the attack detector itself may provide very little help in locating vulnerabilities. For example, syscall interposition detects attacks only when the payload is executing, but the actual exploitation is hidden in the haystack of other executed instructions. In our prototype, we make use of two simple attack detection techniques: program crashes and syscall interposition [87]. They both have low performance overhead. With the wide-spread deployment of exploit mitigation techniques like W ∧ X and ASLR, they are also more effective than before. For example, W ∧ X prevents the injected malicious code from being executed. To bypass it, the attacker often uses return-oriented programming (ROP [133]) to change the process' memory permissions with an unplanned syscall, which can be readily detected by syscall interposition. Moreover, ASLR often makes an exploit less stable, leading to more frequent crashes under attack. Both techniques can be easily integrated into Ravel. In particular, our implementation of syscall interposition validates both syscall sequences and parameters. Syscall interposition has been well researched [79,87,91,105,106,110,114,118,125], so we omit the details here. The derivative vulnerabilities in Figure 4.2 (lines 7, 8, and 9) can be detected by checking syscall parameters, and potentially by crashes if ASLR is enabled. We plan to support more advanced detection techniques, such as CFI, in the future.
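As a concrete illustration of sequence validation, the sketch below keeps a learned table of which syscall may follow which and flags any other transition. The table, syscall numbers, and function names here are hypothetical, not Ravel's implementation:

```c
#define NSYS 16  /* illustrative number of tracked syscalls */

static int allowed[NSYS][NSYS];  /* learned legal transitions */
static int last_syscall = -1;

/* Training phase: syscall `to` may legally follow syscall `from`. */
void learn_transition(int from, int to) { allowed[from][to] = 1; }

/* Monitoring phase: returns 1 if the observed syscall fits the learned
   sequence model, 0 if it is an anomaly (a possible attack). */
int check_syscall(int nr) {
    int ok = (last_syscall < 0) || allowed[last_syscall][nr];
    last_syscall = nr;
    return ok;
}
```

A real interposition layer would also validate syscall parameters (e.g., the memory permissions requested), as noted above.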

4.3.3 Record and Replay

Record & replay (R&R [74,88,108,131]) plays a crucial role in Ravel: it bridges the performance gap between the attack detector and the (sluggish) vulnerability locator, making Ravel usable even in production systems, and it makes the attack reproducible many times over. Faithfully reproducing in-the-wild bugs and attacks is a big challenge in software development and maintenance. Ravel imposes two requirements on its R&R system. First, since the recorder runs on the production system, it should incur minimal performance and storage overhead. Additionally, it must be compatible with the attack detector without changing the program's normal execution. Second, the execution history is replayed by the vulnerability locator, which instruments the replayed execution with heavy-weight analyses. These analyses make additional system calls (e.g., to allocate memory for intermediate results) and consequently affect the memory layout of the replayed process. The replayer must isolate these side effects to keep the replay faithful.

To satisfy these requirements, Ravel adopts a process R&R system, which records and replays a single process or a group of related processes. Process R&R can be implemented either at the library level or the kernel level. Ravel chooses the latter because it is more secure in an adversarial environment: library-level recording can be bypassed if the program/payload makes direct system calls, and the recorded history cannot be trusted if the process is compromised, because the recorder lives in the (compromised) process' address space. Moreover, our R&R system assumes that the target program properly synchronizes its access to shared resources. Deterministically recording and replaying arbitrary programs that have race conditions could be very complicated and inefficient without hardware support [108]. If the program does have race conditions (i.e., bugs that should be fixed by developers), the replayed execution will deviate from the recorded history and can thus be detected. Lastly, Ravel records the complete execution history of the target process. To reduce the storage overhead, Ravel borrows various techniques from the eidetic system [74], a practical always-on whole-system R&R system. For example, Ravel uses LZMA to compress the execution log and avoids logging data that can be recovered from the environment, such as reads of a static file.

Record. It is important for Ravel to record all the non-deterministic inputs to the process for a faithful replay. These inputs include both the data and events from the external environment (e.g., network packets and signals) and the internal events (e.g., locks). Next, we describe in detail how these two types of inputs are handled by Ravel. Among different interface choices, such as the virtual machine interface or the library interface, the syscall interface is an ideal location to intercept and record external inputs such as syscall returns, the user-space memory modified by a syscall, and signals. Syscall returns need to be recorded because they may affect the program execution. For example, a program may use its process id returned by getpid as one of the ingredients to generate random numbers. Syscall returns can also affect the control flow (e.g., error handling). It is rather straightforward to record syscall returns: we just log them in the execution history. A syscall may also modify the user-space memory. For example, the stat syscall writes the file status into the user-provided memory. Most such syscalls explicitly define the structures of the exchanged data. If so, we simply save the modified memory after the syscall returns; in this way, we understand and retain the semantics of the modified data. However, some syscalls may write different amounts of data to the

user space when given different parameters. Even worse, ioctl can be dynamically extended by a loaded kernel module. It is hard to record all the user-space memory written by these system calls. To address that, Ravel hooks the FreeBSD kernel's copyout function to record (and replay) the data copied to the user space during those syscalls. Similar to Linux's copy_to_user function, FreeBSD exclusively uses copyout to write data to the user space. Signals can also introduce non-determinism to the recorded process. A signal can be either synchronous or asynchronous. A synchronous signal (e.g., SIGSEGV) is the result of exceptional program behavior. There is no need to record such signals because replaying the program will trigger the same exceptions. An asynchronous signal (e.g., an alarm) instead must be faithfully recorded and replayed. Since it is asynchronous, we can delay its delivery until a syscall return; this greatly simplifies the replay of asynchronous signals. Certain instructions can bypass the kernel and directly interact with the hardware. A typical example on the x86 architecture is the RDTSC instruction, which returns the CPU's current time-stamp counter. Some programs use the outputs of RDTSC for random number generation. Therefore, Ravel has to record the outputs of this instruction. However, RDTSC is by default an unprivileged instruction and can be executed by any user program. To address that, we change the CPU's configuration (the TSD flag in the CR4 register) to intercept the execution of RDTSC by user processes and record its outputs for the target process. The interception of RDTSC is turned off when the kernel switches to other processes; as such, there is no overhead for other processes. The internal non-determinism comes mostly from accessing shared memory. To avoid race conditions, the program should synchronize these accesses, say, by using locks.
Without race conditions, it is sufficient for Ravel to record the order in which processes (or threads) enter critical sections. Replaying the execution in the same order ensures that the shared memory is in the correct state for each critical section. To this end, we instrument the synchronization primitives in common libraries (e.g., the pthread library) to record and replay them in order. Examples of these primitives include pthread_mutex_lock, pthread_rwlock_wrlock, pthread_cond_broadcast, pthread_cond_signal, sem_wait, atomic_store, atomic_exchange, etc. On the other hand, if the program does have race conditions (e.g., two threads modify the same data without synchronization), the replay will deviate from the recorded execution history. Ravel tries to detect race conditions when that happens.
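The lock-order recording can be modeled as follows. This is a simplified sketch of the idea, not Ravel's instrumented pthread wrappers: the record phase appends the thread id of each critical-section entry to a log, and the replay phase admits a thread only when it is next in that log.

```c
#define MAX_EVENTS 1024  /* illustrative log capacity */

static int order_log[MAX_EVENTS];
static int rec_pos = 0, rep_pos = 0;

/* Record phase: thread `tid` just entered a critical section. */
void record_enter(int tid) { order_log[rec_pos++] = tid; }

/* Replay phase: returns 1 if thread `tid` is next in the recorded
   order and may enter; 0 means it must wait for its turn. */
int replay_may_enter(int tid) {
    if (rep_pos >= rec_pos || order_log[rep_pos] != tid)
        return 0;
    rep_pos++;
    return 1;
}
```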

Replay with Instrumentation. After the detection of an attack, Ravel starts to replay the recorded execution to locate vulnerabilities. Most syscalls, such as getpid and stat, need not be re-executed: Ravel just returns the recorded return values and updates the user memory if necessary. Similarly, network connections (sockets) are not recreated during the replay; Ravel directly returns the recorded data to the replayed process. Other syscalls, typically memory-related ones, have to be re-executed. For example, most programs use mmap to allocate memory, and the kernel may return a different block of memory during the replay. To address that, we pass the recorded address as the first parameter of mmap, which serves as a suggestion of the allocation address to the kernel. During our experiments, the kernel always accepted the suggestion and allocated the same memory. During the replay, Ravel instruments the program in order to detect anomalies in the process' memory access patterns. We use dynamic binary translation (BT) for this purpose, so the same program binary can be used for both recording and replaying. The BT engine can interfere with the replay. For example, the engine needs to allocate memory for its own use (e.g., to cache the translated code), which may conflict with the memory layout of the recorded execution. The engine also makes extra syscalls, for example, to write its log to the disk. Ravel tries to limit this interference to ensure the replayed execution is faithful to the recorded one. For example, it loads the BT engine in an unused memory area, and asks the kernel to allocate the code cache in another unused area. Since the engine is separated from the code cache 1, Ravel can tell whether a syscall is made by the engine or the program itself, and makes the replay decision accordingly.
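The mmap replay described above can be sketched as follows; replay_mmap is an illustrative name, and a real replayer would take the recorded address and length from the execution log:

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Re-executes an anonymous mmap during replay, passing the recorded
   address as the first parameter so the kernel is likely to allocate
   the same region.  Returns the mapped address, or MAP_FAILED. */
void *replay_mmap(void *recorded_addr, size_t len) {
    return mmap(recorded_addr, len, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}
```

The replayer then checks that the returned address equals the recorded one; if the kernel ever ignored the hint, the replay would be unfaithful and would need to be handled specially.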
Note that BT engines often make direct syscalls rather than calling libc functions, because many libc functions are non-reentrant or thread-unsafe and calling them could disrupt the translated process. Ravel records the complete execution history of the target process. Even though replaying is often much faster than recording [108], it may still take a long time to replay a long-running process, such as a web server. To address that, we can take periodic snapshots of the process and start replaying from the most recent snapshot if the recorded history is too long, continuing to search backwards until a vulnerability is located. This may introduce false negatives if there are multiple exploited vulnerabilities, because of the missing/partial def-use relations. Our prototype does not yet support this optimization.

1In dynamic BT, the translated code executes from the code cache, instead of the original code section.

4.3.4 Pinpointing Vulnerabilities

Ravel's vulnerability locator tries to spot the exploited vulnerabilities in a recorded execution that has been identified to contain attacks. It is based on the key observation that memory exploits often change the data flow. As such, it first uses a data-flow analysis to compute the rough locations of the vulnerabilities, and further refines them with specific analyses targeting common types of memory vulnerabilities. Ravel is designed as an extensible framework so that analyses for less common types of vulnerabilities can be added later.

Data-Flow Analysis. Data-flow analysis is used to calculate the probable locations of an exploited vulnerability. We define a program's data flow as the def-use relations between instructions [47]. Specifically, an instruction defines a memory address if it writes to that address, and an instruction uses a memory address if it reads from that address. If two instructions define and use the same address, respectively, they form a def-use relation. To detect data-flow anomalies, Ravel computes a data-flow graph (DFG) for the program beforehand using dynamic analysis. During the replay, it instruments the process to capture a detailed execution log, including all the runtime memory accesses. It then extracts from this log the data flow of the program under attack. If an actual def-use relation is not in the pre-computed DFG, we consider both instructions of this def-use relation as candidates for the vulnerability. However, it is unclear which instruction is the culprit. Ravel uses heuristics to determine whether the "def" or the "use" more likely marks the vulnerability. First, if multiple def-use relations are introduced by one of these instructions, that instruction more likely marks the vulnerability. For example, in a buffer overflow, the def instruction may overwrite a large block of memory used later by several instructions. Second, if the data is used by a syscall that sends data outside (e.g., send, sendmsg, write), the vulnerability is likely related to the use (i.e., an information leak). Third, if the accessed memory contains control data, the vulnerability likely lies in the def instruction. We can identify control data if it is read by instruction fetching (e.g., a return instruction fetches its return address from the stack), or if it is subsequently used by an indirect branch instruction. If none of the above heuristics applies, Ravel reports both instructions as viable candidates. Ravel's data-flow analysis is performed at the instruction level.
For a logged syscall, Ravel understands its semantics and can identify the memory regions it reads from and writes to. From this perspective, a syscall can be treated as a pseudo instruction with extended define and use

sets. Moreover, if an identified instruction lies inside a library, we trace back from that instruction until we reach its call site. A call to a library function is normally encoded as a call to a PLT (procedure linkage table 2) entry. This step is necessary, as otherwise many identified vulnerabilities would be erroneously attributed to library functions. For example, buffer overflows are often caused by incorrect use of library functions like strcpy and memcpy. Ravel's data-flow analysis can cover a variety of memory vulnerabilities, as most memory-corrupting exploits disturb the program's data flow. For example, it can locate both memory vulnerabilities in Figure 4.2. Specifically, line 7 contains a buffer overflow. If exploited, it defines the overflowed data on the stack, including the return address pushed by the caller of process_request. When the return address is fetched at line 10, an extra def-use relation is detected between these lines. Since this use is an instruction fetch, Ravel reports that the vulnerability lies at line 7. Line 9 contains an information leak. If exploited, it reads beyond buffer_to_send, creating multiple def-use relations with the same use. Ravel thus reports line 9 as the potential vulnerability. However, as demonstrated by Figure 4.2, the data-flow analysis is not the final answer; the real vulnerability could hide somewhere else. To this end, Ravel further refines the results with the following analyses.
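The core DFG check can be sketched as a set-membership test over def-use pairs. The data structures and names below are illustrative; Ravel instruments the binary to collect the actual program counters:

```c
#define MAX_EDGES 256  /* illustrative DFG capacity */

struct duedge { unsigned long def_pc, use_pc; };
static struct duedge dfg[MAX_EDGES];
static int n_edges = 0;

/* Pre-computation phase: record a legitimate def-use relation. */
void dfg_add(unsigned long def_pc, unsigned long use_pc) {
    dfg[n_edges].def_pc = def_pc;
    dfg[n_edges].use_pc = use_pc;
    n_edges++;
}

/* Replay phase: returns 1 if the observed def-use pair is absent from
   the pre-computed DFG, marking both instructions as candidates. */
int dfg_anomaly(unsigned long def_pc, unsigned long use_pc) {
    for (int i = 0; i < n_edges; i++)
        if (dfg[i].def_pc == def_pc && dfg[i].use_pc == use_pc)
            return 0;
    return 1;
}
```

In the overflow of Figure 4.2, the write at line 7 and the return-address fetch at line 10 would form exactly such an out-of-DFG pair.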

Integer Errors. Integer errors are usually not exploited alone but followed by other exploits. For efficiency, Ravel focuses on common integer errors associated with buffer access violations. Specifically, it checks for common integer errors if a reported vulnerability involves block memory operations, as these operations are often conditioned by a length parameter. Such vulnerabilities include calls to popular library functions that are frequently associated with buffer overflows (e.g., memmove, memcpy, strcpy, strncpy, strncat, strlcpy, and their many variants) and I/O functions or syscalls (e.g., recv, recvfrom, and read). However, some programs may use their own block data copy/move functions rather than the standard libc functions. These functions often employ the repeated string instructions of x86 (i.e., instructions like MOVS/STOS/CMPS/LODS/SCAS prefixed by REP/REPE/REPNE/REPNZ/REPZ). In this case, register RCX contains the data size, and registers DS:RSI and ES:RDI contain the source and destination addresses, respectively. To locate integer errors, we search backward from the reported vulnerability.
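Recognizing such custom copy loops takes only a few bytes of the instruction encoding. The sketch below handles the common REP MOVS forms (REP is 0xF3, MOVSB is 0xA4, MOVSW/D/Q is 0xA5, with an optional REX prefix in between); a full decoder would also cover the other string instructions and prefixes:

```c
/* Returns 1 if `ins` points at a REP MOVS instruction (x86-64),
   allowing an optional REX prefix between REP and the opcode. */
int is_rep_movs(const unsigned char *ins) {
    int i = 0;
    if (ins[i] != 0xF3)                 /* REP prefix */
        return 0;
    i++;
    if ((ins[i] & 0xF0) == 0x40)        /* optional REX.* prefix */
        i++;
    return ins[i] == 0xA4 || ins[i] == 0xA5;  /* MOVSB / MOVSW-Q */
}
```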

2 PLT/GOT are the structures that support dynamic linking [111].

Integer errors come in several types, such as assignment truncation, integer overflow/underflow, and signedness error. An assignment truncation happens when an integer is assigned from a longer type to a shorter type; the extra most significant bits are simply discarded. An integer overflow/underflow happens when the result of an integer arithmetic operation exceeds the valid range of the target register. A signedness error happens when an integer is converted between a signed type and an unsigned type. The code in Figure 4.2 contains a signedness error at line 6. Note that regular C/C++ programs may contain both benign and harmful integer errors [75]. However, false positives are not a big issue for Ravel since all its detected integer errors are related to a reported vulnerability and thus are more likely to be true positives. Detecting assignment truncations is relatively simple because x86's instruction encoding specifies the width of memory or register accesses. Meanwhile, integer overflows/underflows can be detected by checking the RFLAGS register, which contains various status bits for arithmetic instructions. However, signedness errors are more challenging to identify because many integer instructions do not encode the signs of their operands. For example, signed and unsigned integer additions/subtractions are essentially the same in the two's complement data format. To address that, we collect hints about the operands' signs from other instructions that access them. Many instructions do carry sign information. For example, the JG, JGE, JL, and JLE instructions select a branch based on a signed comparison, while the JA, JAE, JB, and JBE instructions are based on an unsigned comparison. Conditional move instructions, such as CMOVG and CMOVA, also carry sign information, and modern compilers tend to use conditional moves for better performance. Some arithmetic instructions carry sign information as well, such as SAR/SHR and IDIV/DIV.
A second source of sign information is the reported vulnerability itself. For example, functions like memcpy provide a clear definition of their parameter types. If there are conflicts in the collected sign hints, a signedness error is highly likely and will be reported. Ravel can detect the signedness error in Figure 4.2. Specifically, the data-flow analysis locates the vulnerability at line 7 (or line 9, depending on the attack), which specifies that its parameter size is unsigned. Searching backwards, Ravel finds that size was assigned from a signed number (min uses a signed comparison instruction). This conflict in size's signs allows Ravel to identify this integer error. Figure 4.4 shows an example of integer overflows. The code is related to CVE-2015-3864 [67], an integer overflow in Android's built-in Stagefright media library. If size + chunk_size

1 uint8_t *buffer = new uint8_t[size + chunk_size];
2 if (size > 0) {
3     memcpy(buffer, data, size);
4 }

Figure 4.4: Code snippet of CVE-2015-3864 in Android. There is an integer overflow at line 1, leading to the buffer overflow at line 3.

overflows, the allocated buffer may be smaller than the data length, leading to the buffer overflow at line 3. Ravel can detect this integer overflow by checking the RFLAGS register during the replay.

Use-After-Frees and Double-Frees. Memory allocation/free functions are used to keep track of memory lifetime. This kind of information can confirm buffer overflows (by comparing the buffer size against the data size) and identify use-after-free and double-free flaws. Use-after-free and double-free have become popular attack vectors in recent years. Even though the data-flow analysis can discover the anomalies in the data flow caused by them, it does not have enough information to correctly classify them. This insufficiency is addressed by the memory lifetime information. For example, a vulnerability can be categorized as a use-after-free if a block of freed memory is accessed again. Note that some programs use their own memory management functions rather than the standard libc or C++ functions. This issue can be addressed through source code annotation or with heuristics.

Race Conditions. Since a large portion of today's systems have deployed exploit mitigation techniques like DEP and ASLR, race conditions have become a more popular attack vector. As previously mentioned, Ravel's replayed execution may deviate from the recorded one if the program has race conditions. For example, the two may have different syscall sequences, or the replayed execution crashes while the recorded one does not. When that happens, Ravel runs an existing algorithm [28] to detect race conditions during the replay. Specifically, it checks whether two potentially racing operations (e.g., two threads writing to the same variable) have a happens-before relation, i.e., whether one operation is guaranteed to happen before the other. Such a relation can be established if these operations are protected by locks or ordered through inter-process communication (e.g., pipes). We plan to add the capability to root-cause race conditions by identifying common data racing patterns [100].

4.3.5 Prototype Efforts

We have implemented a prototype of Ravel based on FreeBSD release 10.2. The R&R system is implemented from scratch in the kernel, with a small user-space utility to control recording and replaying. We added about 3.9K SLOC (source lines of code) to the kernel, and the utility consists of about 300 SLOC. The vulnerability locator is based on the open-source Valgrind [119] instrumentation framework. We made some changes to the Valgrind framework itself so that it could be used during the replay (Section 4.3.3). In addition, Valgrind's system call wrappers for FreeBSD are incomplete; we wrote our own and contributed them back to the project. We added about 2.2K SLOC to Valgrind in total. The replayer captures the whole execution trace of the replayed program, specifically, the executed instructions and the addresses and sizes of their memory accesses. Based on the execution trace, we implemented Ravel's data-flow analysis, integer error detection, and use-after-free and double-free detection, and integrated Valgrind's existing race condition detection. Most of these analyses work at the byte granularity, except the integer error detection, which works according to the instructions' operand sizes. As mentioned before, Ravel works on program binaries directly. After locating a vulnerability, we map it back to the source code if the program contains debugging symbols.

4.4 Evaluation

In this section, we first evaluate the effectiveness of Ravel against common memory-based vulnerabilities, and then measure the performance overhead incurred by Ravel.

4.4.1 Effectiveness

To evaluate the effectiveness of Ravel, we first analyze how Ravel handles common memory vulnerabilities, such as buffer overflows, integer errors, information leaks, and format string vulnerabilities. Then we describe our experiments with a variety of vulnerabilities, including two high-impact real-world ones. By design, Ravel can detect any attack that changes the program's runtime data flow. However, the data-flow analysis itself often cannot provide the precise locations of the exploited vulnerabilities. To address that, Ravel further refines the results with vulnerability-specific analyses. There are

many types of common memory vulnerabilities. In the following, we discuss how Ravel handles some of them, starting with the most common one, buffer overflows. Buffer overflows: a buffer overflow, or buffer overrun, happens when a program writes more data into a buffer than it can hold, overwriting the adjacent data. A typical example is using functions like strcpy, which do not check the buffer size, to copy untrusted data. Buffer overflows can often lead to arbitrary code execution and denial of service. Ravel can locate buffer overflows if the overwritten data are used after the attack: when a piece of data is read, a new def-use relation is introduced between the vulnerability and the reader. In addition, Ravel can tell that the def is likely the vulnerability if there are multiple new uses with the same def. An example is shown at line 7 of Figure 4.2, in which a new def-use relation is introduced between lines 7 and 10. Integer errors: integer errors include a number of flaws related to integer operations, such as arithmetic, type casting, truncation, and extension. For example, an integer overflow happens when the result of an integer arithmetic operation exceeds the valid range of the destination type. Integer errors are often not exploited alone but together with other vulnerabilities, such as buffer overflows. To detect integer errors, Ravel first locates the symptomatic vulnerability and then searches for possible integer errors if that vulnerability takes integer parameters. In Figure 4.2, after locating the buffer overflow at line 7, Ravel continues to search for integer errors because recv's size parameter is an integer, and discovers the integer signedness error at line 6. Information leaks: an information leak happens when a program inadvertently leaks data to unauthorized parties, which may help them obtain sensitive information or launch further attacks.
For example, an attacker often exploits information leaks to de-randomize the victim process' address space before launching return-oriented programming attacks. Information leaks can also disclose confidential information to attackers. A recent high-profile example is the Heartbleed flaw in the OpenSSL library. Heartbleed can be exploited to leak the server's memory to the attacker, 64KB at a time. This eventually allows the attacker to steal the server's private key. Ravel can precisely locate information leaks: an information leak reads more data than it should. This creates additional def-use relations between the writers of that data and the vulnerability. Accordingly, Ravel can tell that the use is likely the vulnerability since there are multiple defs with the same use. Afterwards, Ravel tries to identify integer errors. In Figure 4.2, new def-use relations are introduced between line 9 and the writers of the data adjacent to buffer_to_send (not shown in the figure).

Use-after-frees: use-after-frees are another common type of memory vulnerability, in which a program erroneously references memory that has been previously freed. Depending on its nature, a use-after-free may allow an attacker to crash the program, corrupt data, or even execute injected code. In a typical scenario to exploit a use-after-free, the attacker tries to allocate an object under his/her control immediately after the vulnerable memory is freed. The memory allocator likely assigns the just-freed memory to this object, giving the attacker full control over the to-be-reused memory. If the reused memory originally contains a data pointer, the attacker could exploit it to read or write arbitrary data. Likewise, if it contains a code pointer, the attacker could exploit it to hijack the control flow. Ravel can locate a use-after-free if the attacker-controlled object is different from the vulnerable object (this is often the case; otherwise the attacker could simply misuse the object under his control). Consequently, they are accessed by different instructions, and new def-use relations are created from the writers of the attacker-controlled object to the readers of the vulnerable object. Ravel also keeps track of the data lifetime to facilitate the detection of use-after-frees and double-frees. Format string vulnerabilities: a format string vulnerability occurs when a function, such as printf, accepts an attacker-controlled format string. The format string decides how the function interprets its following parameters. By manipulating format directives, an attacker can read data from the stack, corrupt memory, and even execute arbitrary code. Ravel can pinpoint format string vulnerabilities. For example, a new def-use relation will be introduced between the format function and its caller if the vulnerability is exploited to read the return address.
Format string vulnerabilities are becoming less common nowadays because it is relatively easy for compilers to automatically detect them. For example, gcc allows a program to annotate its own format functions with the format function attribute. This feature is extensively used by the Linux kernel to protect its debugging and logging functions. We have discussed Ravel's effectiveness in locating common memory vulnerabilities. Next, we present experiments conducted with a number of real-world and synthetic vulnerabilities, as well as attacks, against popular applications. In the following, we give the details of these experiments.

CVE-2013-2028 of NGINX. The first experiment is conducted on NGINX, which is a very popular open-source web server. It powers some of the most popular web sites on the Internet,

1  typedef struct {
2      ...
3      off_t content_length_n;
4      ...
5  } ngx_http_headers_in_t;
6  ...
7  u_char buffer[NGX_HTTP_DISCARD_BUFFER_SIZE];
8  ...
9  size = (size_t) ngx_min(r->headers_in.content_length_n, NGX_HTTP_DISCARD_BUFFER_SIZE);
10 n = r->connection->recv(r->connection, buffer, size);

Figure 4.5: Code sketch of CVE-2013-2028 in NGINX. An integer signedness error at line 9 leads to a buffer overflow at line 10.

such as Netflix, Hulu, CloudFlare, and GitHub [121]. Vulnerabilities in NGINX consequently have a high impact. Figure 4.5 shows the CVE-2013-2028 vulnerability in NGINX (versions 1.3.9 to 1.4.0). This flaw lies in NGINX's faulty handling of chunked transfer encoding, a standard feature of HTTP 1.1. This feature allows data to be sent in a series of chunks. It replaces the regular Content-Length HTTP header with the "Transfer-Encoding: chunked" header. Because it allows data to be transferred piecemeal without knowing the total length, the chunked encoding is particularly useful for dynamically generated data. In this encoding, the length of each chunk is prefixed to the actual data in the chunk. A vulnerable NGINX server parses the chunk length and stores it in r->headers_in.content_length_n. The length is then compared to the buffer size in an effort to prevent buffer overflows. Unfortunately, there is a signedness error at line 9. Specifically, content_length_n is a signed integer. If the attacker makes it negative, ngx_min returns it without any change. The length is then cast to an unsigned integer (size). This vulnerability could be exploited at line 10 to overflow the buffer on the stack (buffer, defined at line 7). Interestingly, this vulnerability cannot be exploited as is on the FreeBSD system because the FreeBSD kernel prevents system calls like recv from accepting unreasonably large size parameters. This is essentially an ad-hoc syscall parameter validation. In spite of that, it is a rather effective defense against similar attacks without requiring any change to user programs. With this protection, the buffer overflow at line 10 is foiled by the kernel, i.e., the kernel has located this vulnerability. As such, we can omit the data-flow analysis and directly start the other analyses. Ravel reports the signedness error at line 9. If we remove the kernel's protection, this vulnerability becomes exploitable. Note that even though the size parameter to recv is really large, the attacker can control how much data is returned by recv, because recv returns the existing data cached in the socket without waiting for the full requested size. In this experiment, we launched a return-oriented programming (ROP [133]) attack against the server, similar to Blind ROP [41]. Exceptions caused by the attack allowed Ravel to detect the attack and identify the buffer overflow at line 10. Ravel further traced it back to the signedness error at line 9 based on the conflicts in the hinted signs of size: unsigned at line 10 (deduced from recv), and signed at line 9. In particular, ngx_min is compiled into a signed comparison followed by a conditional move instruction (cmovg).

Table 4.1: Summary of the evaluation results on a number of DARPA CGC programs.

Program Name                              Vulnerability Type              Pinpointed?
BitBlaster                                Null Pointer Dereference        Yes
                                          Heap Overflow                   Yes
CGC_Planet_Markup_Language_Parser         NULL Pointer Dereference        Yes
                                          Stack Overflow                  Yes
CGC_Board                                 Heap Overflow                   Yes
CGC_Symbol_Viewer_CSV                     Integer Overflow                Yes
CGC_Video_Format_Parser_and_Viewer        Heap Overflow                   Yes
                                          Heap Overflow                   Yes
                                          Integer Overflow                Yes
simple_integer_calculator                 NULL Pointer Dereference        Yes
                                          Out-of-bounds Read              Yes
                                          Null Pointer Dereference        Yes
Diary_Parser                              Out-of-bounds Read              Yes
                                          Stack Overflow                  Yes
                                          Heap Overflow                   Yes
                                          Integer Overflow                Yes
electronictrading                         Untrusted Pointer Dereference   Yes
                                          Use After Free                  Yes
Enslavednode_chat                         Heap Overflow                   Yes
                                          Arbitrary Format String         Yes
Kaprica_Script_Interpreter                NULL Pointer Dereference        Yes
                                          Double Free                     Yes
KTY_Pretty_Printer                        Stack Overflow                  Yes

1  /* ssl/d1_both.c */
2  int dtls1_process_heartbeat(SSL *s)
3  {
4      unsigned char *p = &s->s3->rrec.data[0], *pl;
5      unsigned short hbtype;
6      unsigned int payload;
7      unsigned int padding = 16;
8      ...
9      /* Read type and payload length first */
10     hbtype = *p++;
11     n2s(p, payload);
12     ...
13     pl = p;
14     if (hbtype == TLS1_HB_REQUEST) {
15         unsigned char *buffer, *bp;
16         ...
17         int r;
18         ...
19         buffer = OPENSSL_malloc(1 + 2 + payload + padding);
20         bp = buffer;
21         /* Enter response type, length and copy payload */
22         *bp++ = TLS1_HB_RESPONSE;
23         s2n(payload, bp);
24         memcpy(bp, pl, payload);
25         bp += payload;
26         /* Random padding */
27         RAND_pseudo_bytes(bp, padding);
28         r = dtls1_write_bytes(s, TLS1_RT_HEARTBEAT, buffer, 3 + payload + padding);
29         ...
30     }
31     ...
32 }

Figure 4.6: Code sketch of CVE-2014-0160 (aka Heartbleed). The attacker controls payload. memcpy may copy a large amount of extra data to buffer, which is then sent back through dtls1_write_bytes.

CVE-2014-0160 (Heartbleed). CVE-2014-0160, commonly known as Heartbleed, is an information leak in the popular OpenSSL library. OpenSSL is a ubiquitous open-source TLS/SSL and cryptography library. It is shipped in most Unix-like systems (e.g., Linux and the BSDs) and available for other systems. It is also embedded in lots of devices like routers. The impact of this flaw is serious and far-reaching. Heartbleed is a flaw in OpenSSL's handling of the heartbeat extension, which essentially is

an echo protocol (similar to ping in the Internet Protocol) to ensure the liveness of an SSL connection. Heartbleed allows an attacker to steal the victim's memory, up to 64KB at a time. It has been demonstrated that critical data, such as the private key for TLS/SSL, could be leaked through this flaw. Figure 4.6 shows the code sketch of this flaw. Specifically, the payload field of the heartbeat request packet specifies how many bytes of the request data should be sent back to the requester. This field is extracted from the request packet at line 11. A response buffer is allocated at line 19, and the response is assembled in it by lines 22 to 27. In particular, line 24 copies the data from the request packet to the buffer. Finally, the response is sent to the requester at line 28. This bug is caused by the missing check of whether the request packet actually contains payload bytes of data. Because the payload field is 16 bits, at most 65,535 bytes of data can be exfiltrated each time. In our experiment, we ran OpenSSL 1.0.1 with NGINX to serve HTTPS requests. We kept exploiting this bug with different combinations of requests in order to obtain the server's private key. Moreover, the server was configured to automatically restart if it crashed due to invalid memory reads caused by the attack. After catching an exception, Ravel replayed the attack and discovered extra def-use relations from other instructions to the memcpy at line 24. We concluded that the vulnerability lay at line 24 because those relations had a common use. Ravel further checked for integer errors; none was found.

Experiments with CGC Challenges. To evaluate Ravel's effectiveness on a more diverse set of vulnerabilities, we used the sample challenges from DARPA's Cyber Grand Challenge (CGC) [68], a competition to design better Cyber Reasoning Systems that can automatically identify software flaws and scan for affected hosts in the network. Challenges in CGC are designed to represent a variety of software vulnerabilities. Instead of contrived simple test cases, they approximate real software vulnerabilities with enough complexity, ideal for stressing vulnerability discovery and defense systems. Each challenge comes with a set of proof-of-vulnerability (POV) inputs that can trigger these vulnerabilities. The record and replay of CGC programs is simple since they only use a very small number of system calls. In our experiments, we used Ravel to pinpoint these vulnerabilities. The results are summarized in Table 4.1. They show that Ravel can help developers locate many different types of vulnerabilities. In the following, we give details for one of the challenges (CNMP). The CNMP (Chuck Norris Management Protocol) challenge models a message management system, in which users can add, list, count, and show messages (jokes, to be more specific). The

related vulnerable functions are listed in Figure 4.7. insert_joke inserts a message into the system's database. If the message's length exceeds a threshold (line 7), the system logs the message using syslog (line 10) and returns an error code. Inside the syslog function, its argument format is passed to vsnprintf (line 21), a string-formatting function. Since joke_str is controlled by the attacker, a format-string vulnerability can be triggered by passing a crafted string. In this experiment, we added a long message with format specifiers to the system database. The program crashed inside its vsnprintf implementation. Ravel replayed the program and reported that an over-read happened inside the function vsnprintf, where extra def-use relations were introduced. Since this function takes a format string as its input, it is easy to figure out the vulnerability by looking at the recorded function arguments (a format string). A developer can fix this vulnerability by using "%s" as the format string at line 10.

1  // add joke to joke_db.
2  int insert_joke(jokedb_struct *jokedb, const char *joke_str) {
3      // return error (-1) if jokedb is already full.
4      if (jokedb->count >= MAX_JOKES) {
5          return -1;
6      // return error (-2) if joke_str is too long.
7      } else if (strlen(joke_str) >= MAX_JOKE_STRING_LEN - 1) {
8          if (LOGLEVEL >= LOG_INFO) {
9              syslog(LOG_ERROR, "Joke was too long -->\n");
10             syslog(LOG_ERROR, joke_str);
11         }
12         return -2;
13     } else {
14         ...
15     }
16 }

17 int syslog(int priority, const char *format, ...) {
18     ...
19     // process format string
20     // and write it to log_entry buffer
21     log_entry_len += vsnprintf(log_entry_idx, MAX_SYSLOG_LEN - log_entry_len, format, args);
22     ...
23     return 0;
24 }

Figure 4.7: Code sketch of vulnerability-related functions in CNMP. syslog takes the user-controlled joke_str, and passes it as a format-string argument to vsnprintf.

[Figure: bar chart of relative performance overhead, ranging from 0.0% to 2.5%, for bzip2, gcc, mcf, gobmk, hmmer, sjeng, libquantum, h264ref, omnetpp, astar, xalancbmk, nginx, Apache, and random]

Figure 4.8: Performance overhead of Ravel’s online components relative to the original FreeBSD system.

4.4.2 Performance

Ravel is suitable for deployment in production systems to catch in-the-wild, zero-day vulnerabilities. This imposes a strict performance requirement on both its design and implementation. However, locating vulnerabilities requires time-consuming analyses of the program's memory access patterns. To address that, we use record & replay (R&R) to separate the system into an online component and an offline component. The former records the program's execution and detects ongoing attacks, while the latter replays the execution to locate vulnerabilities. As such, the performance of the online component is more critical than that of the offline component and is the focus of our performance evaluation. The online overhead can mainly be attributed to the attack detection and the recording. Our prototype employs two lightweight attack detection mechanisms; as such, most of the overhead comes from the recording. We measured the performance of Ravel with a set of standard benchmarks (SPEC CPU 2006 3) and several real-world applications.

3 SPEC CPU2006 dropped support for FreeBSD a long time ago. There are many compatibility issues that prevent some benchmarks from compiling, despite all the compilers we tried (including the official Clang/LLVM compiler and several versions of gcc).

All the experiments were conducted on an Intel Core i7 computer with 16 GB of memory. The OS was based on FreeBSD release 10.2 for x86-64. All the test applications were installed directly from FreeBSD's software repository. These applications include two popular web servers, NGINX and Apache. We used ApacheBench to send 5 × 10^6 requests to each server with a concurrency of 10. To diversify the tests, Apache was configured to work in the worker mode with multi-threading enabled, while NGINX was configured to use poll, instead of kqueue as in Apache, for connection processing. The third application we tested was dd. Specifically, we used it to read 10MB of data from the kernel's random number pool (/dev/random) and compressed the data with lzma. The command line was dd if=/dev/random bs=1k count=10000 | lzma > /dev/null. As mentioned before, random numbers are a source of non-determinism and are always recorded. The results are given in Figure 4.8. They show that the performance overhead of Ravel for most CPU-intensive benchmarks is negligible. For I/O-intensive ones like the web servers, the overhead is also rather small, at about 2%. This is expected, as the recording mostly happens when there is a syscall, a relatively infrequent event given the speed of today's computers. Our optimization of the storage consumption for R&R allows Ravel to avoid recording data that can be recovered from the environment, such as HTML files. However, Ravel has to record the data read from the random pool because they cannot be reproduced.

4.5 Discussion

We discuss potential approaches to improving the design and implementation of Ravel, as well as future work, in this section. First, Ravel focuses on pinpointing vulnerabilities from a recorded attack. The attack detector plays an important role in the overall effectiveness of Ravel. Ravel has been designed not to rely on the details of the detector: its only input is a recorded execution history that is known to contain some attacks. Therefore, it is possible to integrate a wide range of attack detection techniques. For example, recent advances in control-flow integrity [128, 146, 147] make it a practical choice as the attack detector. Effective and low-overhead attack detection for data-only and other emerging attacks is still an ongoing research topic. Second, Ravel uses R&R to detach the time-consuming vulnerability locator from the production system and make attacks reproducible for offline analyses. Ravel's R&R is an always-on kernel-based R&R system for processes. This imposes strict requirements on its performance and storage

62 overhead. In the future, we plan to integrate more techniques like those in eidetic systems [74] to optimize the storage for long-running processes. As expected, this will increase the performance overhead (eidetic systems have about 8% of overhead). Another challenge is how to reduce the replay time for long-running processes even though replaying is much faster than recording (because most syscalls are not re-executed). To address that, we may periodically checkpoint the process and start replaying at the nearest checkpoints. The challenge is then how to locate vulnerabilities from a potentially incomplete execution history. Third, by design, Ravel can only locate attacks that change the data flow. It is ineffective against attacks that do not do so. For example, SQL injections can be leveraged to execute malicious SQL queries without exploiting any flaws in the SQL server, and vulnerabilities in Java programs can be exploited without compromising the Java virtual machine. Other examples include attacks that misuse the benign/obsolete program features for malicious purposes, such as Shellshock [154]. Moreover, the design of Ravel may cause imprecision in the analyses. First, we use dynamic analysis to generate the program’s data-flow graph (DFG). The incompleteness in the DFG may lead to false positives. We plan to explore methods to increase the code coverage of the dynamic analysis to build better DFGs [142]. Second, Ravel relies on heuristics to refine the locations of vulnerabilities, such as the signs of integers. Heuristics is by nature imprecise and may introduce both false positives and false negatives. Third, Ravel keeps track of memory lifetime to aid the detection of double-frees and use-after-frees. This method is less effective for programs that use custom memory allocators. We could use (more) heuristics to automatically detect such memory allocators.

4.6 Summary

In this chapter, we have presented the design and evaluation of Ravel, a practical system to pinpoint exploited vulnerabilities. Specifically, it records the execution history of a production program and simultaneously monitors its execution for attacks. If an attack is detected, the execution history is replayed with instrumentation to locate the exploited vulnerabilities. With its data-flow and other analyses, Ravel can pinpoint many different types of memory vulnerabilities, such as buffer (heap) overflows, integer errors, and information leaks. Our evaluation shows that Ravel is effective and incurs low performance overhead, making it suitable for real-world deployment.

CHAPTER 5

ADAPTIVE ANDROID KERNEL LIVE PATCHING

5.1 Introduction

In the previous two chapters, we focused on vulnerability mitigation and detection. The next step is patching. For a single, isolated and independent system, manual patch generation is not considered a tough problem. However, for a large-scale ecosystem like Android, generating and applying patches for thousands of different devices in a timely manner becomes a challenging task. In this chapter, we focus on patching for the Android ecosystem, given its popularity and its large number of unpatched devices. Android is a popular mobile operating system based on the Linux kernel. The kernel, due to its high privilege, is critical to the security of the whole Android system [4]. For example, Android relies on the Linux kernel to enforce proper isolation between apps and to protect important system services (e.g., the location manager) from unauthorized access. Once the kernel is compromised, none of the apps in the system can be trusted. Many apps contain sensitive personal data, such as bank accounts, mobile payments, private messages, and social network data. Even TrustZone, widely used as the secure keystore and for digital rights management in Android, is under serious threat, since a compromised kernel enables the attacker to inject malicious payloads into TrustZone [56, 132, 135]. Therefore, Android kernel vulnerabilities pose a serious threat to user privacy and security. Tremendous efforts have been put into finding (and exploiting) Android kernel vulnerabilities by both white-hat and black-hat researchers, as evidenced by the significant increase of kernel vulnerabilities disclosed in the Android Security Bulletin [3] in recent years. In addition, many kernel vulnerabilities/exploits are publicly available but never reported to Google or the vendors, let alone patched (e.g., exploits in Android rooting apps [158]). The supply of Android kernel exploits will likely continue growing.
Unfortunately, officially patching an Android device is a long process involving multiple parties with disparate interests: Google or the vendor verifies a reported vulnerability and creates a patch for it. The patch is then thoroughly tested and released to carriers; carriers test the update again for compatibility with their networks and release it to their users as an over-the-air

Table 5.1: Devices vulnerable to two infamous root exploits as of Nov. 2016. The second column lists the dates when they were disclosed in the Android Security Advisory.

CVE ID          Release Date   Months   % Vulnerable Devices
CVE-2015-3636   Sep. 2015      14       30%
CVE-2015-1805   Mar. 2016      8        47%

(OTA) update. Many updates may queue up at the carriers waiting to be tested [92]; finally, the user may or may not install the update promptly. Arguably, device vendors and carriers have little incentive to keep user devices updated and secure; they would rather have users buy new devices. For example, phone vendors usually move on to new products and stop updating older devices within one year. Consequently, many Android phones become obsolete shortly after they get into the customers' hands. There are also many small vendors that lack the resources to keep their phones updated. This dire situation is faithfully reflected in the vulnerable phones in use. Table 5.1 lists the statistics of two infamous kernel vulnerabilities: CVE-2015-3636 ("PingPong Root") [16] and CVE-2015-1805 [15] (data collected from 30 million devices^1). Months after their public disclosure, a significant portion of devices remained vulnerable to them. Hence, it is unsurprising that Android malware with years-old root exploits can still compromise many victim devices worldwide [5, 17, 18, 21]. In light of these serious threats, there is an urgent need for third parties to promptly provide patches for these out-of-date devices, without involving vendors or carriers. Android's fragmented ecosystem poses a significant challenge to a third-party kernel patching system: there are thousands of Android vendors that have produced, and keep producing, tens of thousands of devices [1], and Google releases new versions of Android on a regular basis. This combination creates a mess of Android devices with all kinds of hardware and software configurations. For example, Android Lollipop (Android 5.0) was released in November 2014; as of September 2016, 46.3% of Android devices still ran an older version of Android with little hope of any future updates [2].
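The estimate behind Table 5.1 compares each device's kernel build date against the vulnerability's disclosure date, as described in the data-collection footnote. A minimal sketch of that check, with illustrative dates and a helper name of our own choosing:

```python
from datetime import date

def potentially_vulnerable(kernel_build_date: date, disclosure_date: date) -> bool:
    """A kernel built before the vulnerability was disclosed cannot contain
    the fix, so it is conservatively flagged as potentially vulnerable."""
    return kernel_build_date < disclosure_date

# Example: CVE-2015-1805 appeared in the Android advisory in Mar. 2016.
disclosure = date(2016, 3, 1)
build_dates = [date(2015, 6, 1), date(2016, 5, 20), date(2014, 1, 15)]
vulnerable = sum(potentially_vulnerable(d, disclosure) for d in build_dates)
print(vulnerable)  # 2 of the 3 sample devices predate the disclosure
```

In practice the build date is read from the kernel version string on the device; the dates above are invented for illustration.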
Even worse, many Android vendors, small and large alike [19], indefinitely "delay" releasing their kernel source code despite the fact that the kernel's license (GPL) demands it. As such, existing source-code based patching systems [22, 23, 25, 33] can only cover a limited number of devices; a binary-based approach works better for a third-party solution. However, kernel

^1 With user consent, we collected kernel versions and build dates from devices with the Baidu app installed and compared them to each vulnerability's disclosure date to decide whether a device is potentially vulnerable.

binaries in these devices can differ significantly in their details. For example, they may use different build systems, different versions of the compiler, and different optimization levels. An effective solution must accommodate thousands of similar yet subtly different kernels, a challenging goal. To achieve this goal, we first quantified the Android fragmentation by systematically studying and measuring 1,139 Android kernel binaries. We formulated three key observations that allowed us to effectively tackle this problem. We also analyzed all the recent critical Android kernel vulnerabilities. Armed with these insights, we propose KARMA [57], a multi-level adaptive patching model that can overcome the Android fragmentation issue. KARMA stands for Kernel Adaptive Repair for Many Androids^2. It protects kernel vulnerabilities by filtering malicious inputs to prevent them from reaching the vulnerable code. KARMA's patches are written in a high-level memory-safe language. To prevent patches from being misused, KARMA strictly confines their runtime behavior so that the kernel remains as stable and consistent as possible under attack. Adaptiveness is a key feature distinguishing KARMA from other live patching systems; it allows KARMA to scale to many Android devices. Specifically, given a reference patch and a target kernel, KARMA automatically identifies whether the target kernel contains the same vulnerability and, if so, customizes the reference patch for the target kernel. Therefore, KARMA's patches are easy to audit, secure, and adaptive. Like other kernel patching systems, KARMA requires privileged access to the devices it protects. It can either be pre-installed in the device's firmware or installed afterwards [7]. The implementation of KARMA supports all major Android platforms, and we are currently working with various Android vendors to pre-install KARMA in their future devices.
The main contributions of our work are four-fold:

• We analyzed the fragmentation issue that hinders existing kernel live patching solutions from being ubiquitously applied on Android devices, and brought to light the need for an adaptive Android kernel patching solution.

• We studied 1,139 Android kernels from popular devices and 76 critical Android kernel vulnerabilities from the last three years. Based on these insights, we propose KARMA, a multi-level adaptive patching model that can be applied to the fragmented Android ecosystem.

• We implemented KARMA with the framework and primitives enabling memory-safe adaptive live patching. The implementation supports all current Android kernel versions (from 2.6.x to 3.18.x) and different OEM vendors.

^2 KARMA is a part of the OASES (Open Adaptive Security Extensions, https://oases.io) project, an initiative founded by Baidu to enable fast and scalable live patching for mobile and IoT devices.

• We comprehensively evaluated KARMA against all the recently reported critical kernel vulnerabilities. Our evaluation shows that KARMA can adaptively and effectively handle the majority of these vulnerabilities with negligible overhead (< 1%).

The rest of the chapter is organized as follows. We first state the problem and present the design of KARMA in Section 5.2. We then evaluate the applicability, adaptability, and performance overhead of KARMA in Section 5.3. Section 5.4 discusses the potential improvements to KARMA, and we conclude this chapter in Section 5.5.

5.2 System Design

In this section, we first present our key observations on the Android fragmentation problem and then describe the design of KARMA in detail.

5.2.1 Measuring Android Fragmentation

Designing a live kernel patching system that can scale to many devices is a challenging task. However, three key observations, gained from systematically measuring the Android fragmentation, render this task feasible and manageable. These observations can serve as a foundation for future systems tackling this problem.

Table 5.2: Images obtained from popular devices.

Vendor         #Models   #Images
Samsung        192       419
Huawei         132       217
LG             120       239
Oppo           74        249
Google Nexus   2         15
Total          520       1139

Observation A: most kernel functions are stable across devices and Android releases. The Android (Linux) kernel is a large piece of mature software. As with other large software, evolution is more common and preferred than revolution: bugs are fixed and new features are added gradually, while complete rewrites of a core kernel component are few and far between. A patch for one kernel can thus probably be adapted to many other kernels. Adaptiveness is a key requirement for protecting the fragmented Android ecosystem.

Table 5.3: Statistics of the obtained Android kernels.

Category               Statistics
Countries              67
Carriers               37
Android Versions       4.2.x, 4.3.x, 4.4.x, 5.0.x, 5.1.x, 6.0.x, 7.0.x
Kernel Versions        2.6.x, 3.0.x, 3.4.x, 3.10.x, 3.18.x
Kernel Architectures   ARM (77%), AArch64 (23%)
Kernel Build Years     2012, 2013, 2014, 2015, 2016


Figure 5.1: Number of revision clusters for each shared function, sorted by the number of clusters.

To measure the stability of Android kernels, we collected 1,139 system images from four major vendors (Samsung, Huawei, LG, and Oppo; 1,124 images) and Google (15 images). These four vendors together command more than 40% of the Android smartphone market, and Google devices run the newest Android software. This data set is representative of the current Android market: the images come from 520 popular old and new devices, feature Android versions from 4.2 to 7.0, and cover kernels from 2.6.x to 3.18.x. The statistics of these images are shown in Tables 5.2 and 5.3. After collecting these images, we extracted symbols from their kernels. There are about 213K unique functions, and about 130K of them are shared by more than 10 kernels. We wrote a simple tool to roughly analyze how many different revisions each of these shared functions has. Specifically, we abstract the syntax of each function by its number of arguments, the conditional branches it contains, the functions it calls, and its non-stack memory writes. We then cluster each function across all the images based on these syntactic features. Each cluster can be roughly considered a revision of the function (i.e., each cluster potentially requires a different revision of the patch). The results are shown in Figs. 5.1 and 5.2. Specifically, Fig. 5.1 shows how many clusters each shared function has. About 40% of the shared functions have only one cluster, and about 80% of them have four clusters or fewer. Fig. 5.2 shows the percentage of the kernels in the largest cluster


Figure 5.2: Percentage of kernels in the largest cluster for each shared function.

for each shared function. For about 60% of shared functions, the largest cluster contains more than 80% of all the kernels that have this function. These data show that most kernel functions are indeed stable across different devices. Vulnerabilities in shared functions should be given a higher priority for patching because they affect more devices.

Observation B: many kernel vulnerabilities are triggered by malicious inputs; they can be protected by filtering these inputs. Kernel vulnerabilities, especially exploitable ones, are often triggered by malicious inputs through syscalls or external inputs [161] (e.g., network packets). For example, CVE-2016-0802, a buffer overflow in the Broadcom WiFi driver, can be triggered by a crafted packet whose size field is larger than the actual packet size. Such vulnerabilities can be protected by placing a filter on the inputs (i.e., function arguments and external data received from functions like copy_from_user) to screen malicious inputs. We surveyed all the critical kernel vulnerabilities reported in the Android Security Bulletin in 2015 and 2016 and found that 71 out of 76 (93.4%) of them could be patched using this method (Table A.1).

Observation C: many kernel functions return error codes that are handled by their callers; we can leverage the error handling code to gracefully discard malicious inputs. When a malicious input is blocked, we need to alter the kernel's execution so that the kernel remains as consistent and stable as possible. We observe that many kernel functions return error codes that are handled by their callers. In such functions, a patch can simply end the execution of the current function and return an error code when a malicious input is detected. The caller will then handle the error code accordingly [95]. The Linux kernel's coding style recommends that functions, especially exported ones, return an error code to indicate whether an operation has succeeded [24].
If the function does not normally return error codes, it should indicate errors by returning

out-of-range results. A notable exception is functions without return values. Most (exported) kernel functions follow the official coding style and return error codes; even kernel functions that return pointers often return out-of-range "error codes" using the ERR_PTR macro. Based on these observations, our approach is as follows: for each applicable vulnerability, we create a patch that can be placed on the vulnerable function to filter malicious inputs. The patch returns a selected error code when it detects an attack attempt. The error is handled by the existing error-handling code, keeping the kernel stable. The patch is then automatically adapted to other devices. Automatic adaptation of patches can significantly reduce manual effort and speed up patch deployment.
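Looking back at the measurement behind Observation A, the revision-clustering step can be sketched as follows. The feature tuples below are invented for illustration, and real feature extraction from kernel binaries is elided:

```python
from collections import defaultdict

def cluster_revisions(instances):
    """Group per-kernel instances of one shared function by their syntactic
    feature tuple (argument count, conditional branches, callees, non-stack
    writes). Each distinct tuple is one cluster, i.e., one potential revision
    that may need its own adapted patch."""
    clusters = defaultdict(list)
    for kernel_id, features in instances:
        clusters[features].append(kernel_id)
    return clusters

# Hypothetical instances of one shared function across four kernel images:
# (num_args, num_branches, frozenset_of_callees, num_nonstack_writes)
instances = [
    ("kernel_A", (4, 12, frozenset({"wake_up_q"}), 3)),
    ("kernel_B", (4, 12, frozenset({"wake_up_q"}), 3)),
    ("kernel_C", (4, 13, frozenset({"wake_up_q", "plist_del"}), 4)),
    ("kernel_D", (4, 12, frozenset({"wake_up_q"}), 3)),
]
clusters = cluster_revisions(instances)
print(len(clusters))                      # 2 revision clusters
largest = max(clusters.values(), key=len)
print(len(largest) / len(instances))      # 0.75 of kernels in the largest cluster
```

Figures 5.1 and 5.2 report exactly these two quantities (cluster count, and largest-cluster share) over the roughly 130K shared functions.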

5.2.2 Adaptive Multi-level Patching

KARMA features a secure and adaptive multi-level patching model. The security is enforced by the following two technical constraints. Rule I: a patch can only be placed at designated locations, and its patched function must be able to return error codes or return void (i.e., no return value). KARMA protects kernel vulnerabilities by preventing malicious inputs from reaching them. For security reasons, a patch can only be placed at the designated levels. Specifically, level 1 is the entry or return points of a vulnerable function; level 2 is before or after call instructions to a callee of the vulnerable function. Note that we do not patch the callee itself but rather hook the call instructions, in order to avoid affecting other callers of this callee. A typical example of a callee hooked by KARMA is copy_from_user, a function dedicated to copying untrusted user data into the kernel. copy_from_user is a perfect checkpoint for malicious inputs because the kernel calls it whenever it needs to read user data. Level 3 is similar to existing binary-based patches [22, 23, 33]. Level-3 patches are more flexible but potentially dangerous because they are (currently) unconstrained. If a vulnerability is difficult to patch at level 1 or level 2, we fall back to level 3. Level-3 patches have to be manually scrutinized to prevent them from being misused. Our experiment with 76 critical kernel vulnerabilities shows that level 1 can patch 49 (64%) of the vulnerabilities, level 2 can patch 22 (29%), and we have to fall back to level 3 in only 5 cases (7%). This multi-level design allows KARMA to patch most, if not all, Android kernel vulnerabilities. In the following, we focus on level-1 and level-2 patches, since level-3 patches (i.e., binary patching) have been studied by a number of previous research efforts [22, 23, 33].

A patch can indirectly affect the kernel's control flow by returning an error code when a malicious input is intercepted. This immediately terminates the execution of the vulnerable function and passes the error code to the caller. We require a patched function to return error codes on fault in order to leverage the kernel's existing error-handling code to gracefully fail on malicious inputs. Allowing a patch to return arbitrary values (i.e., other than error codes) may have unintended consequences. Fortunately, many kernel functions return error codes on fault, following the guidelines of the official coding style. Similarly, we allow functions that return void to be patched.

Rule II: a patch can read any valid kernel data structure, but it is prohibited from writing to the kernel. Even though KARMA's patches are audited before deployment, they may still contain weaknesses that can be exploited by attackers. To control their side effects, patches are only allowed to read necessary, valid kernel data (e.g., registers, stacks, the heap, code, etc.), but they are prohibited from writing to the kernel. Allowing a patch to change the kernel's memory, even one bit, is dangerous. For example, it could be exploited to clear the U-bit (the user/kernel bit) of a page table entry to grant user code the kernel privilege. Without the write permission, patches are also prevented from leaking kernel information to a local or remote adversary. This rule is enforced by providing a set of restricted APIs as the only interface for patches to access kernel data. By combining these two rules with a careful auditing process and the memory safety of the patches, we can strictly confine the runtime behavior of patches to prevent potential misuse.
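The two rules can be illustrated together with a small sketch. KARMA's real patches are Lua scripts running in an in-kernel engine; the class and function names below are ours, although read_int_32 mirrors one of the read-only APIs KARMA exposes, and the filter condition is the CVE-2014-3153 check:

```python
EINVAL = 22

class KernelView:
    """Read-only facade over (simulated) kernel memory. A patch receives
    only this view, so Rule II holds: it can read but never write."""
    def __init__(self, memory: bytes):
        self._memory = memory          # no setter or write API is exposed

    def read_int_32(self, addr: int) -> int:
        return int.from_bytes(self._memory[addr:addr + 4], "little")

def futex_requeue_patch(view: KernelView, uaddr1: int, uaddr2: int) -> int:
    """Rule I: on a malicious input the patch returns an error code, which
    the caller's existing error-handling path then consumes."""
    if uaddr1 == uaddr2:               # the CVE-2014-3153 condition
        return -EINVAL
    return 0                           # benign input: continue execution

view = KernelView(b"\x2a\x00\x00\x00")
print(view.read_int_32(0))                    # 42: reads are permitted
print(futex_requeue_patch(view, 0x10, 0x10))  # -22: malicious input rejected
print(futex_requeue_patch(view, 0x10, 0x20))  # 0: benign input passes
```

The point of the split is that even a buggy filter can at worst return a wrong error code; it has no channel through which to corrupt kernel state.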

5.2.3 Architecture and Workflow

KARMA works in two phases, as shown in Figure 5.3. The offline phase adapts a reference patch (Pr) to all the devices supported by KARMA. The reference patch often comes from an upstream source, such as Google or a chipset manufacturer. It targets a specific device and kernel (named the reference kernel, Kr) and is not directly applicable to other devices. To address that, KARMA employs an automated system to customize Pr for each target kernel (Kt). Specifically, KARMA first roughly identifies potentially vulnerable functions in kernel Kt, and applies symbolic execution to compare the semantics of each candidate function (Ft) against the reference function Fr. If the two functions are semantically equivalent, KARMA further adjusts the reference patch for kernel Kt, signs it, and deposits it in the cloud. To prevent malicious patches from being

[Figure 5.3 depicts the KARMA workflow: offline patch generation and verification (vulnerable function identification, semantic matching, and reference patch adaptation), signed patches for target kernels deposited in the cloud, and online live patching by the KARMA client (download, verify, and apply patch).]

Figure 5.3: Workflow of KARMA

installed by user devices, reference patches are carefully audited and all patches are signed; user devices only install signed patches. Matching semantics with symbolic execution abstracts away syntactic differences in function binaries (e.g., register allocation). Semantic matching decides whether a candidate function Ft is semantically equivalent, or very similar, to the reference function Fr, and whether Ft has been patched or not. In other words, it is responsible for locating a function in the target kernel that can be patched but has not been patched yet. Semantic matching also provides a scheme to customize the reference patch Pr for target kernels.

In the second phase, the KARMA client on the user device downloads and verifies the patches for its device and applies them to the running kernel. Specifically, the client verifies that each downloaded patch is authentic by checking its signature, and that it is applicable to this device by comparing the device model and the kernel version. If a patch passes the verification, it is cached in a secure store provided by Android. The client then applies the patch to the running kernel. An applied patch immediately protects the kernel from exploits, without rebooting the device or user interaction. In the unlikely event that a patch causes the device to malfunction, the user can reboot the device and skip the problematic patches by holding a hardware key. Currently, KARMA's patches are written in the Lua language. We chose Lua for its simplicity, memory safety, and ease of embedding and extension (in security, simplicity is a virtue). Lua provides sufficient expressive power for KARMA to fix most kernel vulnerabilities. Other kernel scripting languages,

function kpatcher(patchID, sp, cpsr, r0, r1, r2, r3, r4, r5, r6, r7,
                  r8, r9, r10, r11, r12, r14)
    if patchID == 0xca5269db50f4 then
        uaddr1 = r0
        uaddr2 = r2
        if uaddr1 == uaddr2 then
            return -22
        else
            return 0
        end
    end
end
kpatch.hook(0xca5269db50f4, "futex_requeue")

Figure 5.4: A simplified patch in Lua for CVE-2014-3153

such as BPF [8], can also satisfy our requirements. To execute these patches, we embed a restricted Lua engine in the kernel. The engine strictly enforces the security rules of KARMA (Section 5.2.2). In the rest of this section, we first illustrate KARMA’s patches and then present these two phases in detail.
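The client-side verification described above can be sketched as follows. KARMA signs patches and checks device applicability; for a self-contained illustration we substitute a keyed HMAC for the real (asymmetric) signature scheme, and the metadata fields are invented:

```python
import hashlib
import hmac

SIGNING_KEY = b"demo-signing-key"  # stand-in: production uses public-key signatures

def sign(patch: bytes) -> str:
    return hmac.new(SIGNING_KEY, patch, hashlib.sha256).hexdigest()

def verify_patch(patch: bytes, signature: str, meta: dict, device: dict) -> bool:
    """Accept a patch only if (1) its signature checks out and (2) it targets
    this device's model and kernel version."""
    authentic = hmac.compare_digest(sign(patch), signature)
    applicable = (meta["model"] == device["model"]
                  and meta["kernel"] == device["kernel"])
    return authentic and applicable

patch = b"kpatch.hook(0xca5269db50f4, 'futex_requeue')"
meta = {"model": "Nexus 5", "kernel": "3.4.0"}
device = {"model": "Nexus 5", "kernel": "3.4.0"}
print(verify_patch(patch, sign(patch), meta, device))      # True
print(verify_patch(patch, "bad-signature", meta, device))  # False
```

Only patches that pass both checks are cached in the secure store and handed to the in-kernel engine.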

5.2.4 KARMA Patches

Patches in KARMA are written in the Lua language. Lua is a simple, extensible, embedded language. It has only eight primitive types, such as nil, boolean, number, string, and table. Tables are the only built-in composite data structure; most user data structures are built on top of tables. Lua is a dynamically typed language, and all data accesses are checked at runtime. This reduces common memory-related flaws like buffer overflows. Lastly, Lua creates an isolated environment to execute patches. This prevents patches from directly accessing kernel memory; instead, kernel data can only be accessed through the restrictive APIs provided by KARMA. Figure 5.4 shows a simplified patch for CVE-2014-3153, exploited by the infamous Towelroot. CVE-2014-3153 is a flaw in function futex_requeue: it fails to check that two of its arguments are different, allowing a local user to gain root privilege via a crafted FUTEX_REQUEUE command [14]. To fix it, we simply check whether these two arguments (in registers r0 and r2, respectively) are different and return an error code (-22, i.e., -EINVAL) if they are the same. As shown in Fig. 5.4, each hooking point has a unique ID. The patch can check this ID to ensure that it is called from the correct hooking points. When invoked, the patch receives the current values of the registers as arguments. They

static int sock_diag_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
{
    ...
    switch (nlh->nlmsg_type) {
    ...
    case SOCK_DIAG_BY_FAMILY:
        return __sock_diag_rcv_msg(skb, nlh);
    ...
}

static int __sock_diag_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh)
{
    int err;
    struct sock_diag_req *req = NLMSG_DATA(nlh);
    struct sock_diag_handler *hndl;
    if (nlmsg_len(nlh) < sizeof(*req))
        return -EINVAL;
+   if (req->sdiag_family >= AF_MAX)
+       return -EINVAL;
    hndl = sock_diag_lock_handler(req->sdiag_family);
    ...
}

Figure 5.5: Source-code patch for CVE-2013-1763

allow the patch to access function arguments and other necessary data using the APIs provided by KARMA. The last line of the patch installs it at the futex_requeue function with a patch ID of 0xca5269db50f4. Next, we use a few examples to demonstrate how to convert a regular source-code patch into a reference patch for KARMA.

CVE-2013-1763: Figure 5.5 shows the original source-code patch for CVE-2013-1763. Each "+" sign marks a new line added by the patch. The added lines validate that the protocol family of the received message (req->sdiag_family) is less than AF_MAX, returning -EINVAL otherwise. This patch can easily be converted into a reference patch for KARMA. However, since __sock_diag_rcv_msg does not appear in the kernel's symbol table (because it is a static function), KARMA instead hooks the entry point of its parent function and screens the arguments there.

CVE-2013-6123: this is a vulnerability in function msm_ioctl_server, which reads an untrusted structure (u_isp_event) from user space with copy_from_user but fails to check that the queue_index field of the input is valid. This vulnerability is fixed by lines

 1 static long msm_ioctl_server(struct file *file, void *fh,
       bool valid_prio, int cmd, void *arg)
 2 {
 3     ...
 4     if (copy_from_user(&u_isp_event,
 5             (void __user *)ioctl_ptr->ioctl_ptr,
 6             sizeof(struct msm_isp_event_ctrl))) {
 7         ...
 8     }
 9     ...
10 +   if (u_isp_event.isp_data.ctrl.queue_idx < 0
11 +       || u_isp_event.isp_data.ctrl.queue_idx >=
12 +          MAX_NUM_ACTIVE_CAMERA) {
13 +       pr_err("%s: Invalid index %d\n",
14 +           __func__, u_isp_event.isp_data.ctrl.queue_idx);
15 +       rc = -EINVAL;
16 +       return rc;
17 +   }
18     ...
19 }

Figure 5.6: Source-code patch for CVE-2013-6123

10-17 in Fig. 5.6. To patch this vulnerability in KARMA, we cannot hook the entry point of msm_ioctl_server because the malicious input data is not yet available there. Instead, we hook the return point of copy_from_user and filter the received data. copy_from_user returns status codes, so it can be hooked by KARMA. If the patch detects a malicious input, it returns the error code -EINVAL, which terminates the execution gracefully.

CVE-2016-0802: this is a buffer overflow in the Broadcom WiFi driver, caused by a missing check that the packet data length does not exceed the packet length. This vulnerability presents an interesting challenge to KARMA: the source code is patched in several functions, and a new argument is added to functions dhd_wl_host_event and dngl_host_event. The error condition is finally checked in function dngl_host_event. Clearly, this type of fix (i.e., adding new arguments to functions) cannot be translated directly into KARMA because patches are not allowed to write kernel memory. To address that, we hook both the dhd_rx_frame and dngl_host_event functions. The first hook saves the packet length, and the second hook compares the packet length to the data length. If the data length is larger than the packet length, the patch returns the error code BCME_ERROR. This is an example of KARMA's multi-invocation patches (also called stateful

void dhd_rx_frame(...)
{
    ...
    dhd_wl_host_event(dhd, &ifidx,
        skb_mac_header(skb),
        skb->mac.raw,
+       len - 2,
        &event, &data);
    ...
}

static int dhd_wl_host_event(...)
{
    ...
-   if (dngl_host_event(dhd_pub, pktdata) == BCME_OK) {
+   if (dngl_host_event(dhd_pub, pktdata, pktlen) == BCME_OK) {
    ...
}

int dngl_host_event(...)
{
    ...
+   if (datalen > pktlen)
+       return (BCME_ERROR);
    ...
}

Figure 5.7: Source-code patch for CVE-2016-0802

patches). Both hooks bear the same patch ID, and the variables saved at the first hook are made accessible to the second hook by KARMA's Lua engine. An alternative fix is to hook only dhd_rx_frame and manually extract the data length from the packet. However, this fix is less favorable because the patch would have to parse the packet structure by itself, and it would be placed differently from where the source-code patch modifies the control flow, i.e., where the error handling is guaranteed to work.
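The stateful two-hook mechanism for CVE-2016-0802 can be mimicked with shared per-patch state. This is a Python rendering of the Lua mechanism; function names and values are illustrative:

```python
BCME_ERROR = -1
patch_state = {}   # shared between hooks bearing the same patch ID

def hook_dhd_rx_frame(patch_id, pktlen):
    """First hook: runs in dhd_rx_frame and records the packet length."""
    patch_state[patch_id] = {"pktlen": pktlen}
    return None    # no error: let the original function continue

def hook_dngl_host_event(patch_id, datalen):
    """Second hook: compares the claimed data length against the saved
    packet length and fails gracefully on an oversized claim."""
    if datalen > patch_state[patch_id]["pktlen"]:
        return BCME_ERROR
    return None

PID = 0xca5269db50f4   # example patch ID reused from Fig. 5.4
hook_dhd_rx_frame(PID, pktlen=128)
print(hook_dngl_host_event(PID, datalen=4096))  # -1: malicious packet rejected
print(hook_dngl_host_event(PID, datalen=64))    # None: benign packet passes
```

In KARMA the shared state lives inside the in-kernel Lua engine rather than in a module-level dictionary.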

5.2.5 Offline Patch Adaptation

KARMA's offline component adapts a reference patch for all supported devices. It first identifies the vulnerable function in a target kernel through syntactic and semantic matching; it then uses the information from semantic matching to customize the patch for the target kernel. In the following, we describe these two steps in detail.

Syntactic Matching. Given a target kernel Kt, we first identify candidate functions (Ft) in Kt that may contain the same vulnerability as the reference function Fr. This task is not as simple as searching the kernel symbol table; there are a number of challenges. First, function Ft might have different semantics than Fr even though their names are the same, in which case the patch cannot be applied to Kt. KARMA addresses this problem by further matching their semantics. Second, Ft may have a (slightly) different name than Fr even though their semantics are the same. For example, CVE-2015-3636 [66], exploited by PingPong root, exists in function ping_unhash in the Google Nexus 5 kernel but in ping_v4_unhash in some other kernels. Third, Ft could have been inlined in the target kernel and thus may not exist in the symbol table. To address these challenges, we assume that most (other) functions are not changed or renamed across different kernels. This assumption is backed by our first observation (Section 5.2.1).

To find matches of function Fr in the target kernel Kt, we first extract the symbol table from Kt's binary^3 and search it for the name of Fr. If an exact match is found, we consider this function to be the only candidate. Otherwise, we try to identify candidate functions by call relations. Specifically, we first extract the call graphs of the target and the reference kernels. We collect the callers and callees of function Fr in the reference kernel's call graph, and try to locate nodes in the target kernel's call graph that have similar call relations to these two sets of functions. We may find a unique matching node if the function has simply been renamed. If the function has been inlined, the target kernel's call graph contains direct edges from the caller set to the callee set (instead of being connected through Fr); accordingly, we use the containing function as the candidate. Multiple candidate functions may be identified using this approach. The semantics of these candidate functions is then compared to that of function Fr to ensure that the patch is applied to the correct functions.
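The candidate-identification strategy (exact name match first, then caller/callee overlap to catch renamed functions such as ping_unhash vs. ping_v4_unhash) can be sketched as follows; the toy call graphs are invented:

```python
def candidates(ref_fn, ref_callers, ref_callees, target_graph, target_symbols):
    """Locate functions in the target kernel that may correspond to the
    reference function: an exact name match wins outright; otherwise any
    function whose caller and callee sets overlap the reference's sets
    becomes a candidate for semantic matching."""
    if ref_fn in target_symbols:
        return [ref_fn]
    found = []
    for fn, (callers, callees) in target_graph.items():
        if ref_callers & callers and ref_callees & callees:
            found.append(fn)
    return found

# Toy target kernel in which the reference function was renamed.
target_graph = {
    "ping_v4_unhash": ({"ping_close"}, {"write_lock_bh", "sock_put"}),
    "tcp_close":      ({"inet_release"}, {"sk_stream_wait_close"}),
}
target_symbols = set(target_graph)
print(candidates("ping_unhash", {"ping_close"},
                 {"write_lock_bh"}, target_graph, target_symbols))
# ['ping_v4_unhash']
```

The inlining case follows the same pattern, except that the match is a direct caller-to-callee edge and the containing function becomes the candidate.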

Semantic Matching. In this step, KARMA uses semantic matching to decide whether a function should be patched and whether a given reference patch can be adapted to it. For two Android kernels, the same source code could be compiled into different binaries: they may vary in register allocation, instruction selection, and instruction layout. In addition, the positions of structure members may have shifted, and the stack may contain different temporary variables (e.g., because of differences in register spilling). Therefore, a simple syntactic comparison of functions is too restrictive and may reject functions that could otherwise be patched. To this end, we leverage

^3 The kernel binary often contains the symbol table so that kernel modules can be linked against the kernel. This table may or may not be exported through the /proc/kallsyms file at runtime.

symbolic execution to compare the semantics of the candidate function (Ft) and the reference function (Fr). Path explosion is a significant obstacle in symbolic execution. The situation is even more serious in the Linux kernel because many kernel functions are highly complicated. Even if the vulnerable function looks simple, it may call other complex functions, which can quickly overwhelm the symbolic execution engine. In KARMA, we assume that functions called by Ft and Fr have the same semantics if they share the same signature (i.e., function name and arguments). Therefore, we can use non-local memory writes (i.e., writes to the heap or global variables), function calls, and function returns as checkpoints for semantic comparison. Non-local memory writes, function calls, and returns make up the function's impact on the external environment, and we consider two functions to have the same semantics if their impacts on the environment are the same. We do not take stack writes into consideration because the same function may have different stack layouts in two kernels.

To compare their semantics, we symbolically execute the basic blocks of Fr and Ft and generate constraints for memory writes and function calls. For each memory write, we first check whether it is a local write (we consider it local if its address is calculated relative to the stack/base pointer). If it is a non-local write, we add two constraints: the memory addresses and the written content should be equal. For function calls, we first check that the called functions have the same name (and arguments, if the kernel source is available). If so, we add constraints that the arguments to these two functions should be equal. We handle function returns similarly by adding constraints on register r0 at the function exits. External inputs to the two functions, such as initial register values, non-local memory reads, and sub-function returns, are symbolized. KARMA supports two modes of operation. In the strict mode, we require that two matching constraints are exactly the same, except for constants. Constants are often used as offsets into structures or the code (e.g., to read constants embedded in the code); these offsets can differ even for the same source code because of different hardware/software settings (e.g., conditional compilation). We ignore these constants to accommodate such differences. In the relaxed mode, we use a constraint solver to find a solution that fulfills all the constraints at the same time; we consider two functions to be semantically equivalent if at least one such solution exists. Moreover, to avoid patching an already-patched function, we compare the path constraints for the variables accessed by the reference patch Pr in functions Fr and Ft. If they are more restrictive in Ft than in Fr (i.e.,

Table 5.4: The extension to Lua. The first five functions can only be used by the live patcher, not by patches.

API            | Functionality
hook           | Hook a function for live patching
subhook        | Hook the calls to sub-functions for live patching
alloc_mem      | Allocate memory for live patching
free_mem       | Free the allocated memory for live patching
get_callee     | Locate a callee that can be hooked
search_symbol  | Get the kernel symbol address
current_thread | Get the current thread context
read_buf       | Read raw bytes from memory with the given size
read_int_8     | Read 8 bits from memory as an integer
read_int_16    | Read 16 bits from memory as an integer
read_int_32    | Read 32 bits from memory as an integer
read_int_64    | Read 64 bits from memory as an integer

conditional checks are added in Ft), the function may have already been patched. Note that since KARMA's patches cannot modify the kernel memory, reapplying a patch is likely safe. If a semantic match is found, the symbolic formulas provide useful information for adapting patch Pr to the target kernel. For example, we can adjust Pr's registers and field offsets by comparing the formulas of the function arguments. We evaluate the effectiveness of semantic matching in Section 5.3.2.

5.2.6 Live Patching

To enable its protection, KARMA needs to run its client on the user device. The client consists of a regular app and a kernel module. The app contacts the KARMA servers to retrieve patches for the device, while the kernel module verifies the integrity of these patches and applies the ones that pass the verification.

Integration of the Lua Engine. Patches in KARMA are written in the Lua language. They are executed by a Lua engine embedded in the kernel. KARMA extends the Lua language with a number of APIs for accessing kernel data structures. Normally, extending Lua with unsafe C functions forgoes Lua's memory safety. KARMA provides two groups of APIs to Lua scripts: the first group is used exclusively for applying patches, and the other group is used by patches to read kernel data. Our auditing process automatically ensures that patches can only use the second group of APIs. As such, the memory safety of Lua is retained because all the APIs that a patch can access are read-only. Table 5.4 lists these APIs, which provide the following functionalities:

Figure 5.8: Live patching through function hooking.

1) Symbol searching: return the runtime address of a symbol.
2) Function hooking: hook a given function/sub-function in order to execute the patch before/after the function is called.
3) Typed read: given an address, validate whether the address is readable and return the (typed) data if so.
4) Thread-info fetching: return the current thread information, such as its thread ID, kernel stack, etc.

The first two functionalities belong to the first group, and the rest belong to the second group. Again, the live patcher can use both groups of APIs, but patches can only use the second one.
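As an illustration of the audit rule above, a minimal (and deliberately naive) sketch of the check that a patch script uses only the read-only API group might look as follows. The API names come from Table 5.4; the substring-based scan is our simplification of the real auditing process, which would parse the script properly:

```c
#include <assert.h>
#include <string.h>

/* Patcher-only APIs from Table 5.4: a patch script must not use them. */
static const char *patcher_only[] = {
    "hook", "subhook", "alloc_mem", "free_mem", "get_callee", 0
};

/* Return 1 if the script sticks to the read-only API group, 0 if it
 * mentions a privileged API. (Substring matching is simplistic: "hook"
 * would also flag a comment containing the word; a real auditor
 * tokenizes the Lua source.) */
static int patch_uses_read_only_apis(const char *script) {
    for (int i = 0; patcher_only[i]; i++)
        if (strstr(script, patcher_only[i]))
            return 0;   /* privileged API mentioned: reject the patch */
    return 1;
}
```

Because every API a passing patch can reach is read-only, the Lua engine's memory safety carries over to the patched kernel.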

Patch Application. To apply a patch, KARMA hooks the target function to interpose the patch in the regular execution flow, as shown in Fig. 5.8. Specifically, for each hooking point, we create a piece of trampoline code and overwrite the first few instructions at the hooking point with a jump to the trampoline. At runtime, the trampoline saves the current context by pushing all the registers to the stack and invokes the Lua engine to execute the associated patch. The saved context is passed to the patch as arguments so that the patch can access these registers. Before installing the hook, the live patcher calls the stop_machine function and checks whether there are any existing invocations of the target function on the kernel stacks. If so, it is unsafe to patch the function immediately because the existing invocations would return into the patched version, potentially causing inconsistent kernel states. When this happens, we return an error code to the client, which will retry later. As soon as the patch is applied, the vulnerable function is protected from attacks. If no malicious inputs are detected, the patch returns zero to the trampoline, which in turn restores the context, executes the overwritten instructions, and jumps back to the original function; if malicious inputs are detected, the patch returns an error code to the trampoline, which ends the execution of the hooked function by jumping to a return instruction.
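The hook-and-verdict flow just described can be modeled in a few lines of C. This is an illustrative simulation of the control flow in Fig. 5.8, not the actual binary trampoline (which rewrites the function prologue with a jump and saves registers); the function names and the example error value are ours:

```c
#include <assert.h>

typedef long (*patch_fn)(long arg);

/* Stand-in for the hooked kernel function's original body. */
static long original_body(long arg) { return arg * 2; }

/* A patch in the spirit of KARMA's input filters: reject bad input. */
static long patch_check(long arg) { return arg < 0 ? -22 /* -EINVAL */ : 0; }

/* The trampoline logic: run the patch first. A verdict of 0 means the
 * input is benign and the original body executes; a nonzero verdict is
 * returned to the caller as an error, skipping the vulnerable body. */
static long hooked_entry(patch_fn patch, long arg) {
    long verdict = patch(arg);   /* context save/restore elided */
    if (verdict != 0)
        return verdict;          /* malicious input: bail out early */
    return original_body(arg);   /* benign: run the original function */
}
```

The design keeps the vulnerable code itself untouched: the patch only decides whether the original body may run at all.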

Patch Dispatching. KARMA supports two methods to dispatch a patch, one for each of the two execution contexts: the interrupt context and the thread (or process) context. In the interrupt context, the Lua engine is directly invoked through the engine's C interface, similar to a regular function call. However, it is expensive to launch a new Lua engine each time a patch is executed. In the thread context, we instead schedule patches to a standalone Lua engine (through a workqueue) and wait for the results. The Lua engine executes in a self-contained kernel thread and processes incoming requests from the workqueue. Each request is identified by the thread ID and the patch ID. This dispatching method cannot be used in the interrupt context because blocking functions (e.g., to acquire a lock) cannot be called in that context. If a vulnerable function is called in both contexts, we dispatch the patch according to the active context (we have not found such cases in practice). Patch dispatching in the thread context is more complex; in the following, we give more details about it. The kernel is a concurrent execution environment, especially with multi-core CPUs, which most Android devices have. A patch can accordingly be executed simultaneously by multiple threads on different CPU cores. These invocations are grouped by their thread ID and patch ID. Specifically, for each distinct combination of thread ID and patch ID, a separate name space is created. Each Lua variable is saved to its associated name space. A name space is not destroyed until the associated thread ends. Therefore, variables of the previous invocations remain available to the subsequent invocations in the same name space.4 By keeping the states across invocations, KARMA can support multi-invocation patches, i.e., complex patches that need to combine the results of several executions to make a decision. A number of patches we tested require this capability.
In the thread context, we can also support multiple Lua engines to improve the throughput of patch execution. Specifically, we can spawn multiple kernel threads to run several instances of the Lua engine. A dispatch algorithm decides which Lua engine a request should be scheduled to. The algorithm must be deterministic so that requests in the same name space will always be scheduled to the same engine, allowing them to access states from previous invocations. When a thread ends, its associated states are cleared from all the Lua engines.
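One simple way to obtain the deterministic-dispatch property described above is a stable hash of the (thread ID, patch ID) pair taken modulo the engine count: the same pair always maps to the same engine, so later invocations see the name-space state of earlier ones. This is a sketch of one possible algorithm, not necessarily the one KARMA uses; the hash and the engine count of 4 are our illustrative choices:

```c
#include <assert.h>

enum { NUM_ENGINES = 4 };   /* arbitrary example pool size */

/* Deterministic engine selection: identical (thread_id, patch_id)
 * pairs always land on the same Lua engine, as required for requests
 * in the same name space. */
static unsigned pick_engine(unsigned thread_id, unsigned patch_id) {
    unsigned h = thread_id * 2654435761u ^ patch_id; /* multiplicative hash */
    return h % NUM_ENGINES;
}
```

Any deterministic, well-distributed function of the pair works; the hash only affects load balance, not correctness.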

4 If the vulnerable function is recursively called, some variable states might be lost. To retain the whole history, we can tag variables with the thread ID, patch ID, and the stack top. However, we have not found any such cases in practice.

Lua is a garbage-collected language. Patches thus do not need to explicitly manage memory allocation and release. The Lua engine uses a simple mark-and-sweep garbage collector [96]. Kernel patches usually do not need to allocate many memory blocks, so the default garbage collector works well for our purpose without slowing down the system.

5.2.7 Prototype of KARMA

We have implemented a prototype of KARMA. We wrote a number of offline tools for patch adaptation and signing. Our symbolic execution engine is based on the angr framework [6, 136]; we implemented the syntactic and semantic matching ourselves. Our Lua engine in the kernel is similar to the lunatik-ng project [26], which adapts Lua to the kernel environment. For example, the Linux kernel does not use floating-point arithmetic; we therefore changed Lua's internal number representation from floating point to integers. We also removed unnecessary Lua libraries such as file operations. Furthermore, we added support for name spaces in our Lua engine and extended the Lua language with the APIs listed in Table 5.4. We added roughly 11K lines of source code in total to the Android kernel. The added code was compiled as an 800KB kernel module. This kernel module can be pre-installed on Android devices through collaboration with vendors, or installed afterwards through rooting, the only other choice available. KARMA can support all the known Android kernel versions (from 2.6.x to 3.18.x) and different vendors.

5.3 Evaluation

The effectiveness of KARMA can be evaluated by its applicability, adaptability, and performance. Applicability quantifies how many existing kernel vulnerabilities can be patched by KARMA; adaptability quantifies how many devices KARMA can adapt a reference patch for; and performance measures the runtime overhead KARMA introduces. In the following, we describe these three aspects of the evaluation in detail.

5.3.1 Evaluation of Applicability

We tested KARMA with all the critical kernel vulnerabilities from the Android Security Bulletin and the ones used to root Android devices. There are 76 such vulnerabilities in total from the last three years. Remarkably, KARMA can fix 71 of them (93.4%) with level-1 and level-2 patches; i.e., we can create an adaptable KARMA patch for them. Table A.1 in Appendix A gives a more complete list of the results. In the following, we describe how KARMA can prevent some interesting kernel

vulnerabilities used in one-click rooting apps and recent malware incidents [5,17,18,21]. Appendix A contains a couple more examples.

CVE-2013-2595 (Framaroot): this vulnerability was a part of the infamous Framaroot app (the "Gandalf" payload). It exists in the camera driver for Qualcomm MSM devices [10]. The driver provides an uncontrolled mmap interface, allowing the attacker to map sensitive kernel memory into the user space. KARMA can patch this vulnerability by validating whether the memory to be mapped is within the user space.

CVE-2013-2596 (MotoChopper): an integer overflow in the fb_mmap function allows a local user to create a read-write mapping of the entire kernel memory and consequently gain kernel privileges. Specifically, the function has a faulty conditional check:

if ((vma->vm_end - vma->vm_start + off) > len)
    return -EINVAL;

Because off is a user-controlled variable, an attacker can pass in a very large number to overflow (vma->vm_end - vma->vm_start + off) (the result is interpreted as a negative number) and bypass the validation. The original patch adds more checks to prevent this situation [11]. To patch this vulnerability in KARMA, we hook the fb_mmap function and extract the needed variables from its argument vma. For example, we can calculate off as (vma->vm_pgoff << PAGE_SHIFT). The patch then checks whether (vma->vm_end - vma->vm_start + off) is negative and returns -EINVAL if so.

CVE-2013-6282 (VROOT): this was one of the most popular vulnerabilities used in the wild to root Android devices, publicly known as "VROOT". It exists in the get/put_user macros, both of which fail to check that user-provided addresses are in the valid range. The original patches add the necessary checks to these macros and return -EFAULT if invalid addresses are detected [12]. However, KARMA cannot patch these two macros because they are expanded by the compiler and thus do not exist in the kernel binary. Instead, KARMA patches their expanded functions (i.e., __get_user_1/2/4 and __put_user_1/2/4/8) with checks of whether user-provided addresses are less than current_thread_info()->addr_limit-1. Note that these patches can access the current thread_info structure by using the current_thread API provided by KARMA. These patches simply return -EFAULT if the address is out of range.
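As a concrete illustration of the CVE-2013-2596 logic discussed above, the following user-space C sketch contrasts the faulty check with the KARMA-style one. The signatures, the PAGE_SHIFT of 12, and the -22 (-EINVAL) error value are simplified stand-ins; the real patch runs as a Lua script over the hooked fb_mmap arguments:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT 12
#define EINVAL 22

/* The faulty validation quoted in the text: a huge user-controlled
 * `off` overflows the sum into a negative value, which is never
 * "> len", so the range check is bypassed. */
static int fb_mmap_check_buggy(uint64_t vm_start, uint64_t vm_end,
                               uint64_t off, int64_t len) {
    int64_t span = (int64_t)(vm_end - vm_start + off); /* may wrap negative */
    if (span > len)
        return -EINVAL;
    return 0;
}

/* KARMA-style hook: recompute `off` from vm_pgoff and additionally
 * reject a sum whose sign bit is set, i.e., an overflowed value. */
static int fb_mmap_check_patched(uint64_t vm_start, uint64_t vm_end,
                                 uint64_t vm_pgoff, int64_t len) {
    uint64_t off = vm_pgoff << PAGE_SHIFT;
    uint64_t span = vm_end - vm_start + off;
    if ((span >> 63) || span > (uint64_t)len)
        return -EINVAL;  /* negative (overflowed) or out of range */
    return 0;
}
```

With a benign offset both checks agree; with a huge offset the buggy check is bypassed while the patched one rejects the mapping.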

CVE-2014-3153 (Towelroot): this vulnerability is the second most-used one to root Android devices, known as "Towelroot". It lies in the futex_requeue function, which takes the addresses of two futexes as arguments. By design, the function should only re-queue waiters from a non-PI futex to a PI (priority inheritance [139]) futex. However, this condition is violated if the two addresses point to the same futex, which leads to an exploitable dangling-pointer condition. To fix this bug, Linux simply adds a check to ensure that the two futex addresses are different [13]. This vulnerability can be similarly fixed in KARMA by hooking the futex_requeue function, obtaining its arguments, and comparing them for equality. The patch returns -EINVAL if an attack is detected (Figure 5.4).

CVE-2015-3636 (PingPong Root): this is another popular vulnerability used to root Android devices, known as "PingPong Root". It originates in the interaction between the socket and hlist functions. Specifically, when hlist_nulls_del(&sk->sk_nulls_node) is called, it assigns LIST_POISON2 to sk->sk_nulls_node.pprev. LIST_POISON2 is defined as the constant 0x200200. If interpreted as an address, LIST_POISON2 can be mapped by a malicious app in the user space without any permissions. A second call to connect by the attacker will then result in a use-after-free on this attacker-controlled address, compromising the kernel. The Linux patch sets the pointer to NULL in the ping_unhash function [16]. However, this method cannot be applied by KARMA because its patches are prohibited from writing to the kernel memory. Instead, the patch checks if sk->sk_nulls_node.pprev equals LIST_POISON2. If so, it returns an error code without freeing the associated memory. This blocks the exploit but leaves the socket object on the list. The patch is not clean, but it works and does not impact the kernel's functionalities. Alternatively, KARMA can hook connect in the kernel to prevent reusing the freed socket.
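The CVE-2015-3636 filter described above boils down to a single comparison against the poison constant. A stand-alone sketch follows, with LIST_POISON2 defined as in the text; the function name is ours, and the real patch would read sk->sk_nulls_node.pprev through KARMA's read-only APIs rather than take it as a parameter:

```c
#include <assert.h>
#include <stdint.h>

/* Deleted-node sentinel, 0x200200 per the text: mappable by an
 * unprivileged app, hence the use-after-free danger. */
#define LIST_POISON2 ((uintptr_t)0x200200)

/* Before the socket is unhashed again, refuse to proceed if pprev
 * already holds the poison value, i.e., the node was already deleted.
 * Returning an error (without freeing) blocks the exploit. */
static int ping_unhash_guard(uintptr_t pprev) {
    if (pprev == LIST_POISON2)
        return -22;  /* stand-in error code: block the second unhash */
    return 0;
}
```

This mirrors the read-only patching constraint: the patch detects the dangerous state and aborts, instead of fixing the pointer as the official kernel patch does.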

5.3.2 Evaluation of Adaptability

KARMA is an adaptive kernel live patching system for Android. Its ability to automatically adapt a reference patch is the key to protecting a wide variety of devices and reducing the window of vulnerability. In this experiment, we evaluate KARMA's adaptability with 1,139 Android kernels collected from the Internet. Semantic matching is the key to KARMA's adaptability. It uses symbolic execution to abstract away syntactic differences in function binaries, such as register allocation, instruction selection, and data offsets. To evaluate its effectiveness, we cluster the collected 1,139 Android kernels5 by syntactic and semantic features for 13 popular vulnerabilities. Specifically, the opcode-based clustering classifies kernel functions by the types and frequencies of instruction opcodes; the syntax-based clustering classifies kernel functions by function calls and conditional branches; and the semantic-based clustering classifies kernel functions according to KARMA's semantic matching results. Table 5.5 lists the number of clusters and the percentage of kernels in the largest cluster for each clustering method. This table shows that the semantic-based method is the most precise one because it has the smallest number of clusters. Technically, each cluster may need a different adaptation of the reference patch. Therefore, fewer clusters mean a better chance for adaptation to succeed and less manual effort if automated adaptation fails.

5 Only kernels sharing symbols are considered in the clustering.

Table 5.5: Clustering 1,139 kernels for each function by syntax and semantics. The last-but-two column lists the time of semantic matching to compare Nexus 5 (Android 4.4.2, kernel 3.4.0) and Samsung Note Edge (Android 6.0.1, kernel 3.10.40). The experiment was conducted on an Intel E5-2650 CPU with 16GB of memory, and the results are the average over 10 repeats. The last two columns list the number of instructions and basic blocks for each function in Nexus 5.

Kernel Function | CVE ID | # Opcode Clusters | % Largest Opcode Cluster | # Syntax Clusters | % Largest Syntax Cluster | # Semantic Clusters | % Largest Semantic Cluster | Semantic Matching Time Cost | # Instructions | # Basic Blocks
sock_diag_rcv_msg | 2013-1763 | 35 | 25.0% | 7 | 73.5% | 3 | 75.5% | 10.5s | 72 | 16
perf_swevent_init | 2013-2094 | 9 | 55.9% | 5 | 55.9% | 2 | 96.3% | 24.6s | 81 | 22
fb_mmap | 2013-2596 | 26 | 20.2% | 7 | 44.4% | 5 | 66.9% | 12.2s | 102 | 15
__get_user_1 | 2013-6282 | 3 | 92.4% | 2 | 92.4% | 2 | 98.0% | 3.2s | 6 | 2
futex_requeue | 2014-3153 | 54 | 14.8% | 9 | 71.0% | 3 | 99.3% | 35.8s | 459 | 107
msm_isp_proc_cmd | 2014-4321 | 42 | 22.0% | 5 | 66.5% | 3 | 42.8% | 8.8s | 385 | 68
send_write_packing_test_read | 2014-9878 | 12 | 57.6% | 4 | 61.2% | 1 | 100% | 4.9s | 25 | 4
msm_cci_validate_queue | 2014-9890 | 6 | 59.5% | 4 | 84.9% | 2 | 72.4% | 6.7s | 77 | 8
ping_unhash | 2015-3636 | 36 | 12.5% | 5 | 75.7% | 3 | 50.5% | 4.6s | 54 | 8
q6lsm_snd_model_buf_alloc | 2015-8940 | 29 | 34.0% | 9 | 36.6% | 5 | 44.2% | 9.9s | 104 | 20
sys_perf_event_open | 2016-0819 | 22 | 36.3% | 6 | 46.9% | 6 | 84.2% | 34.6s | 569 | 118
kgsl_ioctl_gpumem_alloc | 2016-3842 | 16 | 35.4% | 3 | 88.8% | 4 | 46.0% | 4.7s | 79 | 11
is_ashmem_file | 2016-5340 | 6 | 89.6% | 2 | 93.9% | 2 | 98.1% | 0.8s | 23 | 3
Moreover, the largest clusters in the semantic matching often contain the majority of the vulnerable kernels. For example, a single reference patch for the largest cluster of perf_swevent_init can be applied to 96.3% of the vulnerable kernels. We randomly picked some functions to manually verify the outcome of semantic matching. For example, the source code of sock_diag_rcv_msg (the function related to CVE-2013-1763) is exactly the same in the Samsung Galaxy Note Edge (Android 5.0.1, Linux kernel 3.10.40) and the Huawei Honor 6 Plus (Android 4.4, Linux kernel 3.10.30).6 However, its binaries are very different between these

6 Both vendors have released the source code for their devices.

Figure 5.9: sock_diag_rcv_msg of (a) Huawei Honor 6 Plus (PE-TL10) with Android 4.4 and Linux kernel 3.10.30, compiled by GCC 4.7, and (b) Samsung Galaxy Note Edge (N915R4) with Android 5.0.1 and Linux kernel 3.10.40, compiled by GCC 4.8. Basic blocks and control flows with different syntax are highlighted.

two devices because of the different compilers and kernel configurations. Figures 5.9a and 5.9b show a part of the disassembly code of these two binaries, respectively, with the syntactic differences highlighted. There are changes in the order of instructions (BB8 on the left vs. BB8' on the right), register allocation (BB7 vs. BB7'), instruction selection (BB2 vs. BB2'), and control flow (the additional BB9' in the Samsung kernel). KARMA's semantic matching can abstract away these syntactic differences and put the two binaries of sock_diag_rcv_msg into the same cluster. That is, both can be patched by the same CVE-2013-1763 patch discussed in Section 5.2.4. Semantic matching can also separate kernel functions that are incorrectly classified together by syntax matching. For example, the control flow and most instructions of the function msm_cci_validate_queue (the function related to CVE-2014-9890) are identical in the kernels of the Oppo 3007 (Android 4.4.4,

Figure 5.10: Three semantically different basic blocks of msm_cci_validate_queue in Oppo 3007 (left) and Samsung N910G (right). They have different callees and arguments, and thus different semantics.

kernel 3.10.28) and the Samsung N910G (Android 6.0.1, kernel 3.10.40). A simple syntactic matching algorithm would consider them similar. These functions are shown in Fig. 5.10 (only basic blocks with different semantics are shown). However, KARMA's semantic matching algorithm considers basic blocks A and A', and C and C', to be different because their last instructions call different functions with different arguments. Consequently, KARMA needs to use two patches to fix this vulnerability on these devices. A further investigation shows that KARMA can actually use the same patch for CVE-2014-9890 to fix both kernels, because it only needs to validate the arguments, which are the same for both functions. Finally, KARMA's semantic matching is quite efficient. It simplifies symbolic execution by assuming that most functions remain unchanged. The last-but-two column of Table 5.5 lists the time used by semantic matching to compare each listed function in two kernels. The analysis time increases with the complexity of the function, but all comparisons finish in less than 36 seconds, with an average of 12.5 seconds. Without this heuristic, the analysis would take much longer and might never finish in some cases.

5.3.3 Evaluation of Performance

To evaluate the performance overhead of KARMA, we experimented with both a standard Android benchmark (CF-Bench [9]) and a syscall-based micro-benchmark. Both benchmarks were

Figure 5.11: Performance scores by CF-Bench.

run on a Google Nexus 5 with Android 4.4. Each reported result is the average over 20 measurements; the standard deviation of the results is negligible. Overall, we find that KARMA does not introduce noticeable time lag to regular operations of the test device. Considering the fact that most critical kernel vulnerabilities exist in less-hot code paths (e.g., device drivers' ioctl interfaces, as shown in Table A.1), we consider KARMA's performance sufficient for real-world deployment. The first benchmark measures the whole-system performance with CF-Bench. We tested the performance of the following four configurations: the original kernel without any patches, the kernel with the patch for Towelroot, the kernel with the patch for PingPong Root, and the kernel with both patches. The results are shown in Fig. 5.11. The measured performance is virtually the same for all four configurations. This benchmark shows that KARMA's kernel engine has minimal impact on the performance if patches are not frequently executed. To further quantify the overhead of KARMA, we measured the execution time of a syscall with several different patches executed by a single Lua engine. We inserted a hook point in the execution path of a selected syscall (i.e., the patch was always executed for this syscall) and measured the execution time of the syscall under the following conditions:

• The patch simply returns 0. This reflects the runtime cost of the trampoline for function hooking. It takes about 0.42µs to execute.

• The patch contains a set of if/elseif/else conditional statements. This simulates patches that validate input arguments. It takes about 0.98µs to execute.

Figure 5.12: Execution time of chmod with different patches.

• The patch consists of a single read of the kernel memory. This measures the overhead of the Lua APIs provided by KARMA. It takes about 0.82µs to execute.

• To simulate more complex patches, we created a patch with a mixture of assignments, memory reads, and conditional statements. It takes about 3.74µs to execute.

The results are shown in Figure 5.12. In each test, the syscall was invoked in a tight loop a thousand times, and each result is the average of 20 runs. To put this into context, we counted all the syscalls made by Google Chrome for Android during one minute of browsing. The most frequently made syscall was gettimeofday, called about 110,000 times. This translates to about 0.55 seconds (0.9%) of extra time even if we assume the patch takes 5µs for each invocation. In summary, KARMA only incurs negligible performance overhead and performs sufficiently well for real-world deployment.

5.4 Discussion and Future Work

In this section, we discuss potential improvements to KARMA and future work. First, KARMA aims at protecting the Android kernel from exploits because the kernel has a high privilege and its compromise has serious consequences for user security and privacy. An approach similar to KARMA can be applied to the Android framework and user-space apps. In addition, Android O formalizes the interface between the Android framework and the vendor implementation so that, eventually, the Android framework can be updated independently of the vendor implementation (a.k.a.

project Treble [20]). This will at least partially address the user-space update problem. However, project Treble does not address the kernel update problem: Android kernels are still fragmented and out-of-date, so a system like KARMA is still necessary. Second, KARMA's patches are written in the Lua programming language. It relies on the Lua engine to strictly confine patches' runtime behaviors. However, this approach increases the kernel's trusted computing base, despite the fact that the Lua engine is relatively mature and secure. Executing patches on the Lua engine also negatively impacts the performance, especially if the system is under heavy load (in reality, this is not a concern because most Android kernel vulnerabilities are on the kernel's cold paths, such as device drivers' ioctl functions, as shown in Table A.1). We are investigating alternative designs that can achieve similar security guarantees, such as BPF [8] and sandboxed binary patches. Third, KARMA leverages the existing error-handling code in the kernel to handle filtered malicious inputs, in order to keep the kernel as stable as possible. However, error-handling code has been shown to contain vulnerabilities [98], and this design may leak resources and even cause deadlocks (KARMA does not allow patches themselves to release resources because that requires writing to the kernel memory). We did not find this to be a constraint during our experiments with all the critical Android kernel vulnerabilities. KARMA's reference patch is often a direct translation of the official source-code patch, which should have properly released the resources. If an official patch cannot be translated to a level-1 or level-2 patch, we can fall back to a level-3 (binary) patch. Level-3 patches are more flexible but require careful auditing. Fourth, KARMA uses symbolic execution to semantically match two vulnerable functions.
The approach is sufficient for our purpose in practice because many kernel functions are rather stable across devices and Android releases. In theory, the approach is not sound; it is a trade-off between soundness and scalability. Many systems make a similar trade-off because symbolic execution itself is neither very scalable nor very precise (e.g., in how it handles loops). We are improving our method to better identify vulnerable functions and adapt patches. If KARMA's automated adaptation cannot find a proper function to patch, we can fall back to the binary patch for this particular vulnerability. Lastly, KARMA is a third-party kernel live patching system. Patches can be promptly delivered to user devices without the long wait caused by vendors and carriers. However, without the testing performed by vendors and carriers, its patches could cause stability issues on user devices. Our

implementation allows users to selectively disable a problematic patch. With KARMA's cloud service, we can automatically blacklist such patches for specific device models. We can also work with device vendors so that patches can be quickly tested before release.

5.5 Summary

We have presented the design, implementation, and evaluation of KARMA, an adaptive live patching system for Android kernel vulnerabilities. By filtering malicious user inputs, KARMA can protect most Android kernel vulnerabilities from exploits. Compared to existing kernel live patching systems, the unique features of KARMA are that it can automatically adapt a reference patch to many Android devices, and that it strictly confines the runtime behaviors of its patches. These two features allow KARMA to scale to the large, fragmented Android ecosystem. Our evaluation results demonstrate that KARMA can protect most critical Android kernel vulnerabilities on many devices with negligible performance overhead.

CHAPTER 6

CONCLUSION

In this dissertation, a suite of systems has been proposed for threat mitigation, vulnerability detection, and patch generation. Specifically, Remix dynamically randomizes basic blocks at runtime to significantly increase entropy; this design of code randomization raises the bar for successful exploits to a new level. Ravel leverages its powerful record & replay mechanism to reproduce attacks in a lab environment, and adopts advanced memory access analysis techniques to locate the underlying vulnerabilities and root causes, rather than detecting only the attacks. KARMA makes live patch generation timely, adaptive, and secure for thousands of Android kernels, making it a practical and effective solution for the highly fragmented Android ecosystem. We have implemented a prototype for each of these systems, and the evaluation results demonstrate their effectiveness and efficiency, showing that they are suitable for real-world deployment. Additionally, these solutions can be combined with other defense mechanisms to further enhance the overall security landscape. Finally, this systematic protection suite can inspire future research and development in both academia and industry.

APPENDIX A

KARMA PATCH WRITING FOR RECENT KERNEL VULNERABILITIES

Table A.1: A partial list of recent critical Android kernel vulnerabilities and KARMA's effectiveness to create adaptable patches for them.

CVE-2016-7117 (adaptable): Hook __sys_recvmmsg and its invocation of fput. On returning from fput, check if __sys_recvmmsg's err is not equal to 0 and not equal to -EAGAIN. If so, return err and skip the rest of the execution.
CVE-2016-5340 (adaptable): Hook is_ashmem_file and check the full path of the input file. Only return True if the full path is /dev/ashmem. Otherwise return False.
CVE-2016-4470 (adaptable): Hook key_reject_and_link and its invocation of __key_link_end. Check if link_ret is 0 before calling into __key_link_end. If so, simply return. key_reject_and_link is void typed, so any return value is fine.
CVE-2016-3951 (Level-3 only): It requires writing to kernel memory, violating KARMA's basic constraint.
CVE-2016-3841 (adaptable): Hook do_ipv6_setsockopt to avoid concurrent access to the socket options of the same socket fd.
CVE-2016-3775 (adaptable): Hook aio_setup_single_vector and check if the input kiocb->ki_nbytes exceeds MAX_RW_COUNT. If so, return -EFAULT.
CVE-2016-3768 (Level-3 only): It requires skipping some instructions and continuing execution afterwards, which is not a permitted operation in KARMA.
CVE-2016-3767 (adaptable): Hook mtk_p2p_wext_discovery_results and the other functions whose bodies are deleted by the official patch, and simply return 0.
CVE-2016-3134 (adaptable): Android does not enable CONFIG_USER_NS, so this should not be a direct threat to Android devices. But KARMA can still fix it by iterating newpos = pos + e->next_offset to check if there is an out-of-bound access.
CVE-2016-2503 (Level-3 only): It requires reordering the instructions (to change when to take the lock). This is not a permitted operation in KARMA.


CVE-2016-2474 (adaptable): Hook hdd_parse_ese_beacon_req and check the tempInt read from the argument pValue. If it exceeds SIR_ESE_MAX_MEAS_IE_REQS, return -EINVAL.

CVE-2016-2468 (adaptable): Hook _kgsl_sharedmem_page_alloc and validate the input size.

CVE-2016-2467 (adaptable): Hook msm_compr_ioctl and its invocation of __copy_from_user. Check if the params_length passed into __copy_from_user exceeds MAX_AC3_PARAM_SIZE. If so, return an error code without executing __copy_from_user.

CVE-2016-2466 (adaptable): Hook adm_get_params and check if adm_get_parameters[0] exceeds ADM_GET_PARAMETER_LENGTH-1 and params_length/sizeof(int). If so, return -EINVAL.

CVE-2016-2465 (adaptable): Hook the concerned functions in drivers/video/msm/mdss/mdss_debug.c patched by the official patch, and their invocations of __copy_to_user. Validate len and count, and return -EFAULT under exploit conditions.

CVE-2016-2067 (adaptable): Hook check_vma and return -EFAULT if (vma->vm_flags & memdesc->flags) != memdesc->flags.

CVE-2016-2062 (adaptable): Hook adreno_perfcounter_query_group and its invocation of kmalloc. On entry of kmalloc, check if t is larger than count.

CVE-2016-0844 (adaptable): Hook ipa_wwan_ioctl and its invocation of find_mux_channel_index. On entry of find_mux_channel_index, if the value of rmnet_index exceeds MAX_NUM_OF_MUX_CHANNEL, return -EFAULT directly.

CVE-2016-0843 (adaptable): Hook msm_l2_test_set_ev_constraint and check if shift_idx >= PMU_CODES_SIZE. Return -EINVAL if so.

CVE-2016-0820 (adaptable): Hook priv_get_struct and its invocation of __copy_from_user; check if prIwReqData->data.length > u4CopyDataMax and return -EFAULT if so.

CVE-2016-0806 (adaptable): Hook iw_softap_set_channel_range and check if the caller has the capability CAP_NET_ADMIN; return -EPERM if not.

CVE-2016-0805 (adaptable): Hook get_krait_evtinfo and check if reg exceeds krait_max_l1_reg; return -EINVAL if so.
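The CVE-2016-2067 entry above is a one-line bitmask check that is easy to get wrong because of operator precedence: without parentheses, != binds tighter than &. A hedged C sketch with simplified stand-in types (the struct names and flag values are illustrative, not the kernel's):

```c
#include <assert.h>

/* Illustrative flag bits and error value; the real ones come from
 * the kernel headers. */
#define VM_READ   0x1ul
#define VM_WRITE  0x2ul
#define ERR_FAULT (-14)   /* stands in for -EFAULT */

/* Minimal models of the two kernel structures involved. */
struct vma_model     { unsigned long vm_flags; };
struct memdesc_model { unsigned long flags; };

/* The CVE-2016-2067 check is a subset test: every permission bit the
 * memory descriptor requires must also be present in the vma. */
static int check_vma_hook(const struct vma_model *vma,
                          const struct memdesc_model *memdesc)
{
    if ((vma->vm_flags & memdesc->flags) != memdesc->flags)
        return ERR_FAULT;  /* a required permission bit is missing */
    return 0;
}
```

The subset test accepts a vma carrying extra flags; it only rejects one that is missing a required flag.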


CVE-2016-0801 (adaptable): Hook wl_validate_wps_ie and check if subelt_len exceeds the size of devname (100). Hook wl_notify_sched_scan_results and its invocation of memcpy, and check if the passed buffer length exceeds DOT11_MAX_SSID_LEN.

CVE-2016-0758 (adaptable): Hook asn1_find_indefinite_length and check if dp is larger than datalen. Return -1 if so.

CVE-2016-0728 (adaptable): Hook join_session_keyring and iterate the keyring. Return an error if keyring->usage reaches the overflow boundary (0xFFFFFFFF).

CVE-2015-8942 (adaptable): Hook msm_cpp_subdev_ioctl; if the argument cmd equals VIDIOC_MSM_CPP_IOMMU_DETACH, obtain cpp_dev->stream_cnt from the argument sd and check if it equals 0.

CVE-2015-8941 (adaptable): Hook msm_isp_axi_check_stream_state and iterate over the input stream_cfg_cmd->stream_handle to see if one exceeds MAX_NUM_STREAM. The other vulnerable functions can be fixed in the same way.

CVE-2015-8940 (adaptable): Hook q6lsm_snd_model_buf_alloc and check if the integer argument len is out of range.

CVE-2015-8939 (adaptable): Hook mdp4_argc_process_write_req and check if the input pgc_ptr->num_r/g/b_stages are out of range.

CVE-2015-8938 (adaptable): Hook msm_isp_send_hw_cmd and check if the ioctl input arguments satisfy the constraints updated by the official patch. The constraint list is long, so it is omitted here.

CVE-2015-8816 (not adaptable, Level-3): Fixing the problem requires locking and increasing the reference count of the usb_hub structure, so the patch needs to write to kernel memory.

CVE-2015-6640 (adaptable): Hook the system call prctl and check if the corresponding argument passed as the end to prctl_set_vma_anon_name is out of range.

CVE-2015-6638 (adaptable): Hook PVRSRVSyncPrimSetKM and check if the input psSyncBlk->ui32BlockSize is smaller than another input, ui32Index * sizeof(IMG_UINT32).

CVE-2015-6619 (adaptable): The official patch removes all .tmpfile handlers, so we can simply hook such handlers and always return -EINVAL.

CVE-2015-2686 (adaptable): Hook sys_sendto/sys_recvfrom and check if the input buff and len/size are out of range.

CVE-2015-0570 (adaptable): Hook __iw_softap_setwpsie and check if the ioctl arguments have improper length, same as the official patch. The check list is long, so it is omitted here.
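The CVE-2016-0728 entry guards a reference-count overflow rather than a buffer bound: once a 32-bit usage count wraps to zero, the keyring can be freed while still in use. A small C sketch of the idea, with an illustrative function name and counter type (the real counter is the kernel's atomic usage field):

```c
#include <assert.h>
#include <stdint.h>

/* Refuse to take another reference once the count reaches the
 * overflow boundary checked in the CVE-2016-0728 entry. */
#define USAGE_OVERFLOW_BOUNDARY 0xFFFFFFFFu

static int keyring_take_ref(uint32_t *usage)
{
    if (*usage >= USAGE_OVERFLOW_BOUNDARY)
        return -1;   /* taking the reference would wrap the count */
    *usage += 1;
    return 0;
}
```

Because the check runs before the increment, the counter is left untouched on the error path.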


CVE-2014-9902 (adaptable): Hook dot11fUnpackIeCountry and dot11fUnpackIeSuppChannels to validate the value of the input ielen.

CVE-2014-9891 (adaptable): Hook __qseecom_process_rpmb_svc_cmd and validate whether the input req_ptr fields passed in from user space are out of range.

CVE-2014-9890 (adaptable): Hook msm_cci_validate_queue and validate whether cmd_size extracted from the inputs is larger than 10.

CVE-2014-9887 (adaptable): Hook qseecom_send_modfd_cmd and its invocation of __copy_from_user. Validate req.cmd_req_len obtained from user space.

CVE-2014-9884 (adaptable): Hook qseecom_register_listener and the other handlers to validate pointers passed in from user space, same as the official patch.

CVE-2014-9883 (adaptable): Hook extract_dci_log and check for the integer-overflow condition of the input log_length.

CVE-2014-9882 (adaptable): Hook iris_vidioc_s_ctrl. If the input ctrl->id is V4L2_CID_PRIVATE_IRIS_RIVA_ACCS_LEN/_POKE, validate whether the copied data length exceeds MAX_RIVA_PEEK_RSP_SIZE.

CVE-2014-9881 (adaptable): Hook iris_vidioc_s_ext_ctrls and perform a range/overflow check on the input ctrl.

CVE-2014-9880 (adaptable): Hook vid_enc_ioctl and its invocation of __copy_from_user. Validate seq_header fetched from user space.

CVE-2014-9879 (adaptable): Hook mdp3_histogram_start and validate its input req; hook mdp3_pp_ioctl and validate mdp_pp obtained from user space.

CVE-2014-9878 (adaptable): Hook send_write_packing_test_read and validate its input buffer and count.

CVE-2014-9869 (adaptable): Hook the msm_isp_ functions as specified in the official patch, and validate whether stats_idx from the input exceeds MSM_ISP_STATS_MAX.

CVE-2014-9868 (adaptable): Hook msm_csiphy_release and validate the value of the input csi_lane_params->csi_lane_mask.

CVE-2014-9529 (not adaptable, Level-3): Fixing the issue requires changing the instruction order (delaying the reference put), which is not a safe operation permitted by KARMA.
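Several entries above, such as CVE-2014-9883, reject a length that would overflow a later size computation. A hedged C sketch of such a guard; the header size and the header + 2 * log_length formula are assumptions for illustration, not the driver's actual arithmetic:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed header size for the illustrative size formula below. */
#define DCI_HEADER_SIZE 4u

/* Returns 1 if DCI_HEADER_SIZE + 2 * log_length can be computed in
 * 32 bits without wrapping, 0 otherwise.  The comparison is
 * rearranged so that the check itself cannot overflow. */
static int dci_log_length_ok(uint32_t log_length)
{
    if (log_length > (UINT32_MAX - DCI_HEADER_SIZE) / 2)
        return 0;  /* overflow would occur: reject the input */
    return 1;      /* safe to compute the full size */
}
```

Dividing the remaining headroom by the multiplier, instead of multiplying the untrusted value, is the standard way to keep the guard itself overflow-free.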

96 BIBLIOGRAPHY

[1] Android Fragmentation: There Are Now 24,000 Devices from 1,300 Brands. http://www.zdnet.com/article/android-fragmentation-there-are-now-24000- devices-from-1300-brands/.

[2] Android Platform Versions. https://developer.android.com/about/dashboards/index. html.

[3] Android Security Bulletins. https://source.android.com/security/bulletin/.

[4] Android System and Kernel Security. https://source.android.com/security/overview/ kernel-security.html.

[5] Android Towelroot Exploit Used to Deliver Dogspectus Ransomware. https: //www.bluecoat.com/security-blog/2016-04-25/android-exploit-delivers- dogspectus-ransomware.

[6] angr. http://angr.io.

[7] Anonymous Citation. This citation is anonymized to avoid leaking the authors’ identities.

[8] (BPF). https://www.kernel.org/doc/Documentation/networking/ filter.txt.

[9] CF-Bench. https://play.google.com/store/apps/details? id=eu.chainfire.cfbench.

[10] CVE-2013-2595 Kernel Patch. https://www.codeaurora.org/patches/quic/la/.PATCH_ 24430_iwoLuwW321heHwW.tar.gz.

[11] CVE-2013-2596 Kernel Patch. http://git.kernel.org/cgit/linux/kernel/git/ torvalds/linux.git/commit/?id=fc9bbca8f650e5f738af8806317c0a041a48ae4a.

[12] CVE-2013-6282 Kernel Patch. http://git.kernel.org/cgit/linux/kernel/git/ torvalds/linux.git/commit/?id=8404663f81d212918ff85f493649a7991209fa04.

[13] CVE-2014-3153 Kernel Patch. http://git.kernel.org/cgit/linux/kernel/git/ torvalds/linux.git/commit/?id=e9c243a5a6de0be8e584c604d353412584b592f8.

[14] CVE-2014-3153 (Towelroot). https://cve.mitre.org/cgi-bin/cvename.cgi?name=cve- 2014-3153.

97 [15] CVE-2015-1805 Kernel Patch. http://git.kernel.org/cgit/linux/kernel/git/ torvalds/linux.git/commit/?id=637b58c2887e5e57850865839cc75f59184b23d1.

[16] CVE-2015-3636 Kernel Patch. http://git.kernel.org/cgit/linux/kernel/git/ torvalds/linux.git/commit/?id=a134f083e79fb4c3d0a925691e732c56911b4326.

[17] From HummingBad to Worse. https://blog.checkpoint.com/wp-content/uploads/ 2016/07/HummingBad-Research-report_FINAL-62916.pdf.

[18] Ghost Push: An Un-Installable Android Virus Infecting 600,000+ Users Per Day. http: //www.cmcm.com/blog/en/security/2015-09-18/799.html.

[19] GPLv2 and Its Infringement by Xiaomi. http://www.xda-developers.com/gplv2-and- its-infringement-by-xiaomi/.

[20] Here comes Treble: A modular base for Android. https://android-developers. googleblog.com/2017/05/here-comes-treble-modular-base-for.html.

[21] Kemoge: Another Mobile Malicious Adware Infecting Over 20 Countries. https://www. fireeye.com/blog/threat-research/2015/10/kemoge_another_mobi.html.

[22] kGraft: Live patching of the Linux kernel. http://events.linuxfoundation.org/sites/ events/files/slides/kGraft.pdf.

[23] kpatch: Dynamic Kernel Patching. https://github.com/dynup/kpatch.

[24] Linux Kernel Coding Style. https://www.kernel.org/doc/Documentation/CodingStyle.

[25] Linux Kernel Livepatch. https://www.kernel.org/doc/Documentation/livepatch/ livepatch.txt.

[26] lunatik. https://github.com/lunatik-ng/lunatik-ng.

[27] Martín Abadi, Mihai Budiu, Úlfar Erlingsson, and Jay Ligatti. Control-Flow Integrity: Prin- ciples, Implementations, and Applications. In Proceedings of the 12th ACM Conference on Computer and Communications Security, November 2005.

[28] Sarita V Adve, Mark D Hill, Barton P Miller, and Robert HB Netzer. Detecting Data Races on Weak Memory Systems. ACM SIGARCH Computer Architecture News, 19(3):234–243, 1991.

[29] Hiralal Agrawal and Joseph R Horgan. Dynamic Program Slicing. In ACM SIGPLAN Notices, volume 25, pages 246–256. ACM, 1990.

98 [30] Periklis Akritidis, Cristian Cadar, Costin Raiciu, Manuel Costa, and Miguel Castro. Pre- venting Memory Error Exploits with WIT. In Proceedings of the 29th IEEE Symposium on Security and Privacy, May 2008.

[31] Apple. OS X MountainLion Core Technologies Overviewe. http://movies.apple.com/ media/us/osx/2012/docs/OSX_MountainLion_Core_Technologies_Overview.pdf.

[32] ARM: the Architecture for the Digital World. http://www.arm.com/.

[33] Jeff Arnold and M. Frans Kaashoek. Ksplice: Automatic Rebootless Kernel Updates. In Proceedings of the 4th ACM European Conference on Computer Systems, pages 187–198, New York, NY, USA, 2009. ACM.

[34] Linux Kernel Address Space Layout Randomization. http://lwn.net/Articles/569635/.

[35] Linux Kernel Address Space Layout Randomization. http://blogs.msdn.com/b/michael_ howard/archive/2006/05/26/address-space-layout-randomization-in-windows- vista.aspx.

[36] Michael Backes, Thorsten Holz, Benjamin Kollenda, Philipp Koppe, Stefan Nürnberger, and Jannik Pewny. You can run but you can’t read: Preventing disclosure exploits in executable code. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, CCS ’14, 2014.

[37] Michael Backes and Stefan Nürnberger. Oxymoron: Making Fine-grained Memory Random- ization Practical by Allowing Code Sharing. Proc. 23rd Usenix Security Sym, pages 433–447, 2014.

[38] Elena Gabriela Barrantes, David H Ackley, Stephanie Forrest, and Darko Stefanović. Ran- domized Instruction Set Emulation. ACM Transactions on Information and System Security (TISSEC), 8(1):3–40, 2005.

[39] Sofia Bekrar, Chaouki Bekrar, Roland Groz, and Laurent Mounier. A Taint Based Approach for Smart Fuzzing. In Proceedings of the IEEE Fifth International Conference on Software Testing, Verification and Validation, pages 818–825. IEEE, 2012.

[40] Sandeep Bhatkar and R Sekar. Data Space Randomization. In Detection of Intrusions and Malware, and Vulnerability Assessment, pages 1–22. Springer, 2008.

[41] Andrea Bittau, Adam Belay, Ali Mashtizadeh, David Mazieres, and Dan Boneh. Hacking Blind. In Security and Privacy (SP), 2014 IEEE Symposium on, pages 227–242. IEEE, 2014.

[42] Tyler Bletsch, Xuxian Jiang, Vince W. Freeh, and Zhenkai Liang. Jump-oriented Program- ming: A New Class of Code-reuse Attack. In Proceedings of the 6th ACM Symposium on Information, Computer and Communications Security, ASIACCS ’11, 2011.

99 [43] Erik Buchanan, Ryan Roemer, Hovav Shacham, and Stefan Savage. When Good Instructions Go Bad: Generalizing Return-Oriented Programming to RISC. In Proceedings of the 15th ACM Conference on Computer and Communications Security, October 2008.

[44] Nathan Burow, Scott A Carr, Stefan Brunthaler, Mathias Payer, Joseph Nash, Per Larsen, and Michael Franz. Control-flow Integrity: Precision, Security, and Performance. ACM Computing Surveys, 2017.

[45] Nicholas Carlini, Antonio Barresi, Mathias Payer, David Wagner, and Thomas R Gross. Control-Flow Bending: On the Effectiveness of Control-Flow Integrity. In Proceedings of the 24th USENIX Security Symposium, volume 14, pages 28–38, 2015.

[46] Nicholas Carlini and David Wagner. Rop is Still Dangerous: Breaking Modern Defenses. In USENIX Security Symposium, 2014.

[47] Miguel Castro, Manuel Costa, and Tim Harris. Securing Software by Enforcing Data-flow Integrity. In Proceedings of the 7th Symposium on Operating Systems Design and Implemen- tation, OSDI ’06, 2006.

[48] Stephen Checkoway, Lucas Davi, Alexandra Dmitrienko, Ahmad-Reza Sadeghi, Hovav Shacham, and Marcel Winandy. Return-oriented Programming without Returns. In Pro- ceedings of the 17th ACM Conference on Computer and Communications Security, CCS ’10, 2010.

[49] Ping Chen, Jun Xu, Zhisheng Hu, Xinyu Xing, Minghui Zhu, Bing Mao, and Peng Liu. What You See is Not What You Get! Thwarting Just-in-Time ROP with Chameleon. In Proceedings of the 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, pages 451–462. IEEE, 2017.

[50] Shuo Chen, Jun Xu, Nithin Nakka, Zbigniew Kalbarczyk, and Ravishankar K Iyer. Defeating Memory Corruption Attacks via Pointer Taintedness Detection. In Proceedings of the 2005 International Conference on Dependable Systems and Networks, pages 378–387. IEEE, 2005.

[51] Shuo Chen, Jun Xu, Emre C. Sezer, Prachi Gauriar, and Ravishankar K. Iyer. Non-Control- Data Attacks Are Realistic Threats. In Proceedings of 2005 USENIX Security Symposium, August 2005.

[52] Xi Chen, Herbert Bos, and Cristiano Giuffrida. CodeArmor: Virtualizing the Code Space to Counter Disclosure Attacks. In Proceedings of the 2nd European Symposium on Security and Privacy, 2017.

[53] Yue Chen, Mustakimur Khandaker, and Zhi Wang. Pinpointing Vulnerabilities. In Proceedings of the 12th ACM Asia Conference on Computer and Communications Security, pages 334–345, Abu Dhabi, United Arab Emirates, 2017. ACM.

100 [54] Yue Chen, Mustakimur Khandaker, and Zhi Wang. Secure In-Cache Execution. In Proceedings of the 20th International Symposium on Research in Attacks, Intrusions and Defenses (RAID 2017), Atlanta, GA, September 2017.

[55] Yue Chen, Zhi Wang, David Whalley, and Long Lu. Remix: On-demand Live Randomization. In Proceedings of the Sixth ACM Conference on Data and Application Security and Privacy, pages 50–61, New Orelans, LA, Mar 2016. ACM.

[56] Yue Chen, Yulong Zhang, Zhi Wang, and Tao Wei. Downgrade Attack on TrustZone. arXiv preprint arXiv:1707.05082, 2017.

[57] Yue Chen, Yulong Zhang, Zhi Wang, Liangzhao Xia, Chenfu Bao, and Tao Wei. Adap- tive Android Kernel Live Patching. In Proceedings of the 26th USENIX Security Symposium (USENIX Security 17), Vancouver, BC, August 2017. USENIX Association.

[58] Yueqiang Cheng, Zongwei Zhou, Miao Yu, Xuhua Ding, and Robert H Deng. ROPecker: A Generic and Practical Approach for Defending against ROP Attacks. In Symposium on Network and Distributed System Security (NDSS), 2014.

[59] Cristina Cifuentes and Mike Van Emmerik. Recovery of Jump Table Case Statements from Binary Code. In Program Comprehension, 1999. Proceedings. Seventh International Workshop on, pages 192–199. IEEE, 1999.

[60] Manuel Costa, Miguel Castro, Lidong Zhou, Lintao Zhang, and Marcus Peinado. Bouncer: Se- curing Software by Blocking Bad Input. In Proceedings of the 21st ACM SIGOPS Symposium on Operating Systems Principles, volume 41, pages 117–130. ACM, 2007.

[61] Crispin Cowan, Steve Beattie, John Johansen, and Perry Wagle. Pointguard TM: protecting pointers from buffer overflow vulnerabilities. In Proceedings of the 12th conference on USENIX Security Symposium, volume 12, pages 91–104, 2003.

[62] Stephen Crane, Christopher Liebchen, Andrei Homescu, Lucas Davi, Per Larsen, Ahmad-Reza Sadeghi, Stefan Brunthaler, and Michael Franz. Readactor: Practical Code Randomization Resilient to Memory Disclosure. In 36th IEEE Symposium on Security and Privacy (Oakland), May 2015.

[63] Weidong Cui, Marcus Peinado, Helen J Wang, and Michael E Locasto. Shieldgen: Automatic Data Patch Generation for Unknown Vulnerabilities with Informed Probing. In Proceedings of the 28th IEEE Symposium on Security and Privacy, pages 252–266. IEEE, 2007.

[64] CVE-2013-2028. https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2013-2028.

[65] CVE-2014-0160. https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-0160.

[66] CVE-2015-3636. https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-3636.

101 [67] CVE-2015-3864. https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-3864.

[68] DARPA. Cyber Grand Challenge. https://cgc.darpa.mil.

[69] Memory Protection Technologies. http://technet.microsoft.com/en-us/library/ bb457155.aspx.

[70] x86 NX support. http://lwn.net/Articles/87814/.

[71] Lucas Davi, Christopher Liebchen, Ahmad-Reza Sadeghi, Kevin Z Snow, and Fabian Monrose. Isomeron: Code Randomization Resilient to (just-in-time) Return-oriented Programming. 2015.

[72] Lucas Davi, Ahmad-Reza Sadeghi, Daniel Lehmann, and Fabian Monrose. Stitching the gad- gets: On the ineffectiveness of coarse-grained control-flow integrity protection. In Proceedings of the 23Rd USENIX Conference on Security, SEC’14, 2014.

[73] Data Execution Prevention. http://en.wikipedia.org/wiki/Data_Execution_ Prevention.

[74] David Devecsery, Michael Chow, Xianzheng Dou, Jason Flinn, and Peter M. Chen. Eidetic systems. In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, OSDI’14, 2014.

[75] Will Dietz, Peng Li, John Regehr, and Vikram Adve. Understanding Integer Overflow in C/C++. ACM Transactions on Software Engineering and Methodology, 25(1):2, 2015.

[76] Brendan Dolan-Gavitt, Patrick Hulin, Engin Kirda, Tim Leek, Andrea Mambretti, Wil Robertson, Frederick Ulrich, and Ryan Whelan. LAVA: Large-scale Automated Vulnerability Addition. In Proceedings of the 37th IEEE Symposium on Security and Privacy, May 2016.

[77] Ulfar Erlingsson, Martin Abadi, and Mihai-Dan Budiu. Architectural Support for Software- based Protection, March 13 2012. US Patent 8,136,091.

[78] Sebastian Eschweiler, Khaled Yakdan, and Elmar Gerhards-Padilla. discovRE: Efficient Cross- Architecture Identification of Bugs in Binary Code. In Proceedings of the 23rd Network and Distributed System Security Symposium, 2016.

[79] Matt Fredrikson, Somesh Jha, Mihai Christodorescu, Reiner Sailer, and Xifeng Yan. Synthe- sizing Near-optimal Malware Specifications from Suspicious Behaviors. In Proceedings of the 31th IEEE Symposium on Security and Privacy, pages 45–60. IEEE, 2010.

[80] Debin Gao, Michael K Reiter, and Dawn Song. Binhunt: Automatically Finding Semantic Differences in Binary Programs. In International Conference on Information and Communi- cations Security, pages 238–255. Springer, 2008.

102 [81] Tal Garfinkel. Traps and Pitfalls: Practical Problems in System Call Interposition Based Security Tools. In Proceedings of the 20th Annual Network and Distributed Systems Security Symposium, February 2003.

[82] Tal Garfinkel, Ben Pfaff, and Mendel Rosenblum. Ostia: A Delegating Architecture for Secure System Call Interposition. In Proceedings of the 10th Network and Distributed System Security Symposium, 2003.

[83] Jason Gionta, William Enck, and Peng Ning. HideM: Protecting the Contents of Userspace Memory in the Face of Disclosure Vulnerabilities. In Proceedings of the 5th ACM conference on Data and application security and privacy. ACM, 2015.

[84] Cristiano Giuffrida, Anton Kuijsten, and Andrew S Tanenbaum. Enhanced Operating System Security Through Efficient and Fine-grained Address Space Randomization. In USENIX Security Symposium, pages 475–490, 2012.

[85] Enes Göktas, Elias Athanasopoulos, Herbert Bos, and Georgios Portokalidis. Out of control: Overcoming control-flow integrity. In Proceedings of the 2014 IEEE Symposium on Security and Privacy, SP ’14, 2014.

[86] Enes Göktaş, Elias Athanasopoulos, Michalis Polychronakis, Herbert Bos, and Georgios Por- tokalidis. Size Does Matter: Why Using Gadget-chain Length to Prevent Code-reuse Attacks is Hard. In Proceedings of the 23rd USENIX conference on Security Symposium, pages 417– 432. USENIX Association, 2014.

[87] Ian Goldberg, David Wagner, Randi Thomas, and Eric A. Brewer. A Secure Environment for Untrusted Helper Applications: Confining the Wily Hacker. In Proceedings of the 6th USENIX Security Symposium, 1996.

[88] Zhenyu Guo, Xi Wang, Jian Tang, Xuezheng Liu, Zhilei Xu, Ming Wu, M. Frans Kaashoek, and Zheng Zhang. R2: An application-level kernel for record and replay. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI’08, 2008.

[89] Joshua D. Guttman, Amy L. Herzog, John D. Ramsdell, and Clement W. Skorupka. Verifying information flow goals in security-enhanced linux. J. Comput. Secur., 13(1):115–134, 2005.

[90] Jason Hiser, Anh Nguyen-Tuong, Michele Co, Matthew Hall, and Jack W Davidson. ILR: Where’d My Gadgets Go? In Security and Privacy (SP), 2012 IEEE Symposium on, pages 571–585. IEEE, 2012.

[91] Steven A Hofmeyr, Stephanie Forrest, and Anil Somayaji. Intrusion Detection Using Sequences of System Calls. Journal of computer security, 6(3):151–180, 1998.

103 [92] How-To Geek. Why Do Carriers Delay Updates for Android But Not iPhone? http://www.howtogeek.com/163958/why-do-carriers-delay-updates-for-android- but-not-iphone.

[93] Hong Hu, Zheng Leong Chua, Sendroiu Adrian, Prateek Saxena, and Zhenkai Liang. Auto- matic Generation of Data-Oriented Exploits. In Proceedings of the 24th USENIX Security Symposium, pages 177–192, 2015.

[94] Hong Hu, Shweta Shinde, Sendroiu Adrian, Zheng Leong Chua, Prateek Saxena, and Zhenkai Liang. Data-Oriented Programming: On the Expressiveness of Non-Control Data Attacks. In Proceedings of the 37th IEEE Symposium on Security and Privacy, May 2016.

[95] Zhen Huang, Mariana D’Angelo, Dhaval Miyani, and David Lie. Talos: Neutralizing Vulner- abilities with Security Workarounds for Rapid Response. In Proceedings of the 37th IEEE Symposium on Security and Privacy, 2016.

[96] Roberto Ierusalimschy, Luiz Henrique de Figueiredo, and Waldemar Celes. The Evolution of Lua. In Proceedings of the Third ACM SIGPLAN Conference on History of Programming Languages, pages 2–1, New York, NY, USA, 2007. ACM.

[97] Intel. Intel 64 and IA-32 Architectures Software Developerś Manual, Feb 2014.

[98] Suman Jana, Yuan Kang, Samuel Roth, and Baishakhi Ray. Automatically Detecting Error Handling Bugs using Error Specifications. In 25th USENIX Security Symposium (USENIX Security ’16), Austin, August 2016.

[99] Sanidhya Kashyap, Changwoo Min, Byoungyoung Lee, Taesoo Kim, and Pavel Emelyanov. In- stant OS Updates via Userspace Checkpoint-and-Restart. In 2016 USENIX Annual Technical Conference, 2016.

[100] Baris Kasikci, Benjamin Schubert, Cristiano Pereira, Gilles Pokam, and George Candea. Fail- ure Sketching: A Technique for Automated Root Cause Diagnosis of In-production Failures. In Proceedings of the 25th Symposium on Operating Systems Principles, pages 344–360. ACM, 2015.

[101] Gaurav S Kc, Angelos D Keromytis, and Vassilis Prevelakis. Countering Code-injection At- tacks with Instruction-set Randomization. In Proceedings of the 10th ACM conference on Computer and communications security, pages 272–280. ACM, 2003.

[102] Chongkyung Kil, Jinsuk Jun, Christopher Bookholt, Jun Xu, and Peng Ning. Address space layout permutation (aslp): Towards fine-grained randomization of commodity software. In Proceedings of the 22Nd Annual Computer Security Applications Conference, ACSAC ’06, 2006.

104 [103] Dongsun Kim, Jaechang Nam, Jaewoo Song, and Sunghun Kim. Automatic Patch Generation Learned from Human-written Patches. In Proceedings of the 2013 International Conference on Software Engineering, pages 802–811. IEEE, 2013.

[104] Samuel T. King and Peter M. Chen. Backtracking Intrusions. In Proceedings of the 2003 Symposium on Operating Systems Principles, October 2003.

[105] Andrew P Kosoresow and Steven A Hofmeyr. Intrusion Detection via System Call Traces. IEEE software, 14(5):35, 1997.

[106] Christopher Kruegel, Darren Mutz, Fredrik Valeur, and Giovanni Vigna. On the Detection of Anomalous System Call Arguments. In Proceedings of the 8th European Symposium on Research in Computer Security, pages 326–343. Springer, 2003.

[107] Volodymyr Kuznetsov, László Szekeres, Mathias Payer, George Candea, R Sekar, and Dawn Song. Code Pointer Integrity. In USENIX Symposium on Operating Systems Design and Implementation (OSDI), 2014.

[108] Oren Laadan, Nicolas Viennot, and Jason Nieh. Transparent, lightweight application exe- cution replay on commodity multiprocessor operating systems. In Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems, SIGMETRICS ’10, 2010.

[109] Per Larsen, Andrei Homescu, Stefan Brunthaler, and Michael Franz. Sok: Automated Soft- ware Diversity. In Security and Privacy (SP), 2014 IEEE Symposium on, pages 276–291. IEEE, 2014.

[110] Wenke Lee, Salvatore J Stolfo, et al. Data Mining Approaches for Intrusion Detection. In Proceedings of the 7th USENIX Security Symposium, 1998.

[111] John R. Levine. Linkers and Loaders. Morgan Kaufmann, San Francisco, CA, 1999.

[112] Jinku Li, Zhi Wang, Xuxian Jiang, Mike Grace, and Sina Bahram. Defeating Return-Oriented Rootkits with “Return-less” Kernels. In Proceedings of the 5th ACM SIGOPS EuroSys Con- ference, April 2010.

[113] Lannan Luo, Jiang Ming, Dinghao Wu, Peng Liu, and Sencun Zhu. Semantics-based Obfuscation-resilient Binary Code Similarity Comparison with Applications to Software Pla- giarism Detection. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering, pages 389–400. ACM, 2014.

[114] Federico Maggi, Matteo Matteucci, and Stefano Zanero. Detecting Intrusions Through Sys- tem Call Sequence and Argument Analysis. IEEE Transactions on Dependable and Secure Computing, 7(4):381–395, 2010.

105 [115] Jiang Ming, Meng Pan, and Debin Gao. iBinHunt: Binary Hunting with Inter-procedural Control Flow. In Proceedings of International Conference on Information Security and Cryp- tology, pages 92–109. Springer, 2012.

[116] Vishwath Mohan, Per Larsen, Stefan Brunthaler, K Hamlen, and Michael Franz. Opaque Control-Flow Integrity. In Symposium on Network and Distributed System Security (NDSS), 2015.

[117] Micah Morton, Hyungjoon Koo, Forrest Li, Kevin Z Snow, Michalis Polychronakis, and Fabian Monrose. Defeating Zombie Gadgets by Re-randomizing Code Upon Disclosure. In Proceedings of the 9th International Symposium on Engineering Secure Software and Systems, pages 143– 160. Springer, 2017.

[118] Darren Mutz, Fredrik Valeur, Giovanni Vigna, and Christopher Kruegel. Anomalous System Call Detection. ACM Transactions on Information and System Security, 9(1):61–93, 2006.

[119] Nicholas Nethercote and Julian Seward. Valgrind: A Framework for Heavyweight Dynamic Binary Instrumentation. In ACM Sigplan notices, volume 42, pages 89–100. ACM, 2007.

[120] James Newsome and Dawn Song. Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software. In Proceedings of the 12th Network and Distributed System Security Symposium, Feburary 2005.

[121] NGINX. NGINX. https://www.nginx.com.

[122] Ben Niu and Gang Tan. RockJIT: Securing Just-in-time Compilation Using Modular Control- flow Integrity. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Com- munications Security, pages 1317–1328. ACM, 2014.

[123] William D Norcott and Don Capps. Iozone Filesystem Benchmark. URL: www.iozone.org, 2003.

[124] Kaan Onarlioglu, Leyla Bilge, Andrea Lanzi, Davide Balzarotti, and Engin Kirda. G-free: Defeating return-oriented programming through gadget-less binaries. In Proceedings of the 26th Annual Computer Security Applications Conference, ACSAC ’10, 2010.

[125] Sirinda Palahan, Domagoj Babić, Swarat Chaudhuri, and Daniel Kifer. Extraction of Statis- tically Significant Malware Behaviors. In Proceedings of the 29th Annual Computer Security Applications Conference, pages 69–78. ACM, 2013.

[126] Vasilis Pappas, Michalis Polychronakis, and Angelos D. Keromytis. Smashing the Gadgets: Hindering Return-Oriented Programming Using In-place Code Randomization. In Proceedings of the 2012 IEEE Symposium on Security and Privacy, SP ’12, 2012.

106 [127] Vasilis Pappas, Michalis Polychronakis, and Angelos D. Keromytis. Transparent rop exploit mitigation using indirect branch tracing. In Proceedings of the 22Nd USENIX Conference on Security, SEC’13, 2013.

[128] Mathias Payer, Antonio Barresi, and Thomas R Gross. Fine-grained Control-flow Integrity through Binary Hardening. In International Conference on Detection of Intrusions and Mal- ware, and Vulnerability Assessment, pages 144–164. Springer, 2015.

[129] Jeff H. Perkins, Sunghun Kim, Sam Larsen, Saman Amarasinghe, Jonathan Bachrach, Michael Carbin, Carlos Pacheco, Frank Sherwood, Stelios Sidiroglou, Greg Sullivan, Weng-Fai Wong, Yoav Zibin, Michael D. Ernst, and Martin Rinard. Automatically Patching Errors in De- ployed Software. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, October 2009.

[130] Hans Reiser. ReiserFS, 2004.

[131] Michiel Ronsse and Koen De Bosschere. Recplay: A fully integrated practical record/replay system. ACM Trans. Comput. Syst., 17(2), May 1999.

[132] Dan Rosenberg. QSEE TrustZone Kernel Integer Overflow Vulnerability. In Black Hat USA, 2014.

[133] Hovav Shacham. The Geometry of Innocent Flesh on the Bone: Return-Into-Libc without Function Calls (on the x86). In Proceedings of the 14th ACM Conference on Computer and Communications Security, October 2007.

[134] Hovav Shacham, Matthew Page, Ben Pfaff, Eu-Jin Goh, Nagendra Modadugu, and Dan Boneh. On the effectiveness of address-space randomization. In Proceedings of the 11th ACM Conference on Computer and Communications Security, CCS ’04, pages 298–307, 2004.

[135] Di Shen. Exploiting Trustzone on Android. In Black Hat USA, 2015.

[136] Yan Shoshitaishvili, Ruoyu Wang, Christopher Salls, Nick Stephens, Mario Polino, Andrew Dutcher, John Grosen, Siji Feng, Christophe Hauser, Christopher Kruegel, and Giovanni Vi- gna. SoK: (State of) The Art of War: Offensive Techniques in Binary Analysis. In Proceedings of the 37th IEEE Symposium on Security and Privacy, 2016.

[137] Yan Shoshitaishvili, Ruoyu Wang, Christopher Salls, Nick Stephens, Mario Polino, Andrew Dutcher, John Grosen, Siji Feng, Christophe Hauser, Christopher Kruegel, and Giovanni Vigna. (State of) The Art of War: Offensive Techniques in Binary Analysis . In Proceedings of the 37th IEEE Symposium on Security and Privacy, May 2016.

107 [138] Stelios Sidiroglou, Oren Laadan, Carlos Perez, Nicolas Viennot, Jason Nieh, and Angelos D. Keromytis. ASSURE: Automatic Software Self-healing Using REscue Points. In Proceedings of the 14th international conference on Architectural support for programming languages and operating systems, March 2009.

[139] Abraham Silberschatz, Peter B. Galvin, and Greg Gagne. Operating System Concepts. Wiley, 2012.

[140] Kevin Z Snow, Fabian Monrose, Lucas Davi, Alexandra Dmitrienko, Christopher Liebchen, and Ahmad-Reza Sadeghi. Just-in-time Code Reuse: On the Effectiveness of Fine-grained Address Space Layout Randomization. In Security and Privacy (SP), 2013 IEEE Symposium on, pages 574–588. IEEE, 2013.

[141] Chengyu Song, Byoungyoung Lee, Kangjie Lu, William Harris, Taesoo Kim, and Wenke Lee. Enforcing Kernel Security Invariants with Data Flow Integrity. In Proceedings of the 23rd Network and Distributed System Security Symposium, Feb 2016.

[142] Nick Stephens, John Grosen, Christopher Salls, Andrew Dutcher, Ruoyu Wang, Jacopo Corbetta, Yan Shoshitaishvili, Christopher Kruegel, and Giovanni Vigna. Driller: Augmenting Fuzzing Through Selective Symbolic Execution. In Proceedings of the 23rd Network and Distributed System Security Symposium, Feb 2016.

[143] Mingshen Sun, John CS Lui, and Yajin Zhou. Blender: Self-randomizing Address Space Layout for Android Apps. In Proceedings of the 19th International Symposium on Research in Attacks, Intrusions, and Defenses, pages 457–480. Springer, 2016.

[144] László Szekeres, Mathias Payer, Tao Wei, and Dawn Song. SoK: Eternal War in Memory. In Security and Privacy (SP), 2013 IEEE Symposium on, pages 48–62. IEEE, 2013.

[145] PaX Team. PaX Address Space Layout Randomization (ASLR), 2003.

[146] Victor van der Veen, Dennis Andriesse, Enes Göktaş, Ben Gras, Lionel Sambuc, Asia Slowinska, Herbert Bos, and Cristiano Giuffrida. Practical Context-sensitive CFI. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 927–940. ACM, 2015.

[147] Victor van der Veen, Enes Göktas, Moritz Contag, Andre Pawoloski, Xi Chen, Sanjay Rawat, Herbert Bos, Thorsten Holz, Elias Athanasopoulos, and Cristiano Giuffrida. A Tough Call: Mitigating Advanced Code-reuse Attacks at the Binary Level. In Proceedings of the 37th IEEE Symposium on Security and Privacy, pages 934–953. IEEE, 2016.

[148] Xiaoguang Wang, Yue Chen, Zhi Wang, Yong Qi, and Yajin Zhou. SecPod: a Framework for Virtualization-based Security Systems. In Proceedings of the 2015 USENIX Annual Technical Conference, pages 347–360, 2015.

[149] Xiaoguang Wang, Yong Qi, Zhi Wang, Yue Chen, and Yajin Zhou. Design and Implementation of SecPod, A Framework for Virtualization-based Security Systems. IEEE Transactions on Dependable and Secure Computing, 2017.

[150] Zhe Wang, Chenggang Wu, Jianjun Li, Yuanming Lai, Xiangyu Zhang, Wei-Chung Hsu, and Yueqiang Cheng. ReRanz: A Light-Weight Virtual Machine to Mitigate Memory Disclosure Attacks. In Proceedings of the 13th ACM SIGPLAN/SIGOPS International Conference on Virtual Execution Environments, pages 143–156. ACM, 2017.

[151] Richard Wartell, Vishwath Mohan, Kevin W. Hamlen, and Zhiqiang Lin. Binary stirring: Self-randomizing instruction addresses of legacy x86 binary code. In Proceedings of the 2012 ACM Conference on Computer and Communications Security, CCS ’12, 2012.

[152] Wikipedia. Basic Block. http://en.wikipedia.org/wiki/Basic_block.

[153] Wikipedia. Pwn2Own. http://en.wikipedia.org/wiki/Pwn2Own.

[154] Wikipedia. Shellshock (software bug). https://en.wikipedia.org/wiki/Shellshock_(software_bug).

[155] Wikipedia. Tail Call. http://en.wikipedia.org/wiki/Tail_call.

[156] Bennet Yee, David Sehr, Gregory Dardyk, J. Bradley Chen, Robert Muth, Tavis Ormandy, Shiki Okasaka, Neha Narula, and Nicholas Fullagar. Native Client: A Sandbox for Portable, Untrusted x86 Native Code. In Proceedings of the 30th IEEE Symposium on Security and Privacy, May 2009.

[157] Chao Zhang, Tao Wei, Zhaofeng Chen, Lei Duan, Laszlo Szekeres, Stephen McCamant, Dawn Song, and Wei Zou. Practical Control Flow Integrity and Randomization for Binary Executables. In Proceedings of the 2013 IEEE Symposium on Security and Privacy, SP ’13, 2013.

[158] Hang Zhang, Dongdong She, and Zhiyun Qian. Android Root and Its Providers: A Double-Edged Sword. In Proceedings of the 22nd ACM SIGSAC Conference on Computer and Communications Security, pages 1093–1104, New York, NY, USA, 2015. ACM.

[159] Mingwei Zhang and R. Sekar. Control Flow Integrity for COTS Binaries. In Proceedings of the 22nd USENIX Conference on Security, SEC ’13, 2013.

[160] Xiangyu Zhang, R. Gupta, and Youtao Zhang. Precise Dynamic Slicing Algorithms. In Proceedings of the 25th International Conference on Software Engineering, May 2003.

[161] Yulong Zhang, Yue Chen, Chenfu Bao, Liangzhao Xia, Longri Zhen, Yongqiang Lu, and Tao Wei. Adaptive Kernel Live Patching: An Open Collaborative Effort to Ameliorate Android N-day Root Exploits. In Proceedings of Black Hat USA 2016, Las Vegas, NV, 2016.

[162] Yajin Zhou, Xiaoguang Wang, Yue Chen, and Zhi Wang. ARMlock: Hardware-based Fault Isolation for ARM. In Proceedings of the 2014 ACM SIGSAC Conference on Computer and Communications Security, CCS ’14, 2014.

BIOGRAPHICAL SKETCH

Yue Chen enrolled in the Ph.D. program in Computer Science at Florida State University in 2013. Prior to that, he received his master’s degree in Computer Science from Northeastern University in 2013, and his bachelor’s degree in Information Security from Harbin Institute of Technology in 2011. His research interests include, but are not limited to, system and mobile threat mitigation, secure systems, vulnerability detection and analysis, and adaptive live patch generation. His work has led to a number of peer-reviewed papers published in conferences and journals, including USENIX Security, ACM CCS, USENIX ATC, RAID, ACM AsiaCCS, ACM CODASPY, and IEEE TDSC.
