On the Impact of Memory Corruption Vulnerabilities in Client Applications

Dissertation zur Erlangung des Grades eines Doktor-Ingenieurs der Fakultät für Elektrotechnik und Informationstechnik an der Ruhr-Universität Bochum

vorgelegt von

Robert Gawlik aus Tichau

2016

Tag der mündlichen Prüfung: 7. September 2016

Gutachter: Prof. Dr. Thorsten Holz, Ruhr-Universität Bochum

Zweitgutachter: Prof. Dr. Herbert Bos, Vrije Universiteit Amsterdam

Abstract

Client programs are omnipresent in our digital age. Web browsers in particular are used by an enormous number of users for various tasks, including information gathering, social media activities, and communication with each other. As the popularity of web browsers has grown, attackers have shifted their attention towards this kind of client-side software to compromise systems. Because of their huge code base and tremendous complexity, a vast number of exploitable vulnerabilities exist in these programs.

In this thesis, various impacts of memory corruption vulnerabilities in client-side software are investigated from an offensive and a defensive perspective. The exploitation of vulnerabilities in web browsers typically follows certain steps, carried out one after another. Usually, an adversary first needs to obtain information about the address space of the vulnerable program. This important step is called an information leak or memory disclosure. Once the attacker has gained enough knowledge about the address space of the program, she is able to hijack the control flow. This thesis considers information leaks and control-flow hijacking from both the attacker's and the defender's viewpoint.

We extend the technique of information leaks with a behavior of web browsers which was not known to this extent before (crash resistance). Here, the program is kept alive although it should terminate due to critical memory errors, such as an illegal read access. This allows us to evaluate, from an adversarial perspective, defenses which promise to keep address space information of the program secret from attackers.

Information leaks are also approached from a defensive perspective. To detect this step of an exploit, we introduce a concept for script engines such as JavaScript. To this end, two processes of the same program are executed in parallel and their execution is synchronized.
As we enforce a different address space layout in each of them, a memory disclosure manifests itself differently in both processes and can be unveiled.

This thesis also addresses control-flow hijacking. Code reuse is currently the most common method to perform arbitrary computations after gaining control of the execution. For an adversary it is important to assess the quantity of code she can reuse. Hence, we introduce a framework to help evaluate specific control-flow integrity (CFI) defenses. To this end, we analyze a given program in an architecture-independent way to maximize the amount of reusable code which conforms to CFI policies.

Attackers are able to hijack the control flow in browsers with vtable hijacking. This thesis approaches this widely used offensive technique from a defensive perspective. Virtual function tables (vtables) injected by attackers into the address space differ from real vtables. By using various heuristics and techniques, we are the first to show that a vtable-hijacking mitigation for binary-only code is possible.


Zusammenfassung

In unserem digitalen Zeitalter sind clientseitige Anwendungen allgegenwärtig. Insbesondere Webbrowser werden von sehr vielen Benutzern für etliche Tätigkeiten verwendet. Darunter fällt die Informationsgewinnung, die Aktivität in sozialen Medien oder die Kommunikation der Benutzer untereinander. Da die Popularität von Webbrowsern gewachsen ist, haben auch Angreifer ihre Aufmerksamkeit auf diese Clientanwendungen gerichtet, um in Computersysteme einzubrechen. Wegen ihrer großen Codebasis und enormen Komplexität existieren viele ausnutzbare Speicherfehler in diesen Programmen.

In dieser Dissertation werden verschiedene Auswirkungen von Speicherfehlern in Clientanwendungen aus offensiven und defensiven Gesichtspunkten untersucht. Der Ausnutzungsprozess von Schwachstellen (engl. Exploit) kann in verschiedene, aufeinanderfolgende Schritte unterteilt werden. Der Angreifer benötigt meist Informationen über den Adressraum des verwundbaren Programms. Dieser wichtige Schritt wird auch Informationsleck (engl. Information Leak) oder Speicherenthüllung (engl. Memory Disclosure) genannt. Sobald der Angreifer genügend Wissen über den Adressraum des Programms gesammelt hat, ist er in der Lage, den Kontrollfluss des Programms zu übernehmen. Dieser Schritt wird auch als Control-Flow Hijacking bezeichnet. In dieser Dissertation werden Information Leaks und Control-Flow Hijacking aus der Perspektive von Angreifern als auch Verteidigern betrachtet.

Wir kombinieren die Technik von Information Leaks mit einem Verhalten in Browsern, welches bisher in diesem Ausmaß unbekannt war (Absturzresistenz, engl. Crash Resistance). Dabei wird das Programm am Laufen gehalten, obwohl es aufgrund kritischer Speicherfehler, wie beispielsweise eines illegalen Lesezugriffs, terminieren sollte. Dieser Ansatz erlaubt es uns, aus der Angreiferperspektive Verteidigungsansätze zu beurteilen, die versprechen, den Adressraum vor Angreifern geheimzuhalten.
Aus der Sicht eines Verteidigers werden Information Leaks ebenfalls betrachtet. Um diesen Schritt eines Angriffs zu erkennen, wird ein Konzept für sog. Scripting-Umgebungen, wie JavaScript, vorgestellt. Dabei werden zwei Prozesse des gleichen Programms in ihrer Ausführung synchronisiert. Da wir in beiden einen unterschiedlichen Adressraum erzwingen, manifestiert sich ein Information Leak unterschiedlich in beiden Prozessen und kann detektiert werden.

Auch widmet sich diese Dissertation dem Schritt der Kontrollflussübernahme. Die Wiederverwendung von Code ist momentan die gängigste Methode, beliebige Berechnungen durchzuführen, sobald der Programmfluss kontrolliert wird. Aus der Sicht eines Angreifers ist es wichtig zu wissen, wie viel Code wiederverwendbar ist. Daher wird ein System vorgestellt, welches helfen soll, spezielle Defensivmaßnahmen – sog. Control-Flow-Integrity-Lösungen (CFI) – zu beurteilen. Dabei wird architekturunabhängig versucht, die Menge an wiederverwendbarem Code, der CFI-Regeln entspricht, zu maximieren.

Angreifer können in Browsern den Kontrollfluss über sog. Vtable Hijacking übernehmen. Diese Dissertation betrachtet diese weitverbreitete, offensive Technik aus der Perspektive eines Verteidigers. Spezielle Funktionstabellen (sog. Vtables), die von Angreifern im Adressraum abgelegt werden, unterscheiden sich von echten Vtables. Unter der Benutzung verschiedener Heuristiken und Techniken zeigen wir als Erste, dass eine Abschwächung von Vtable-Hijacking-Angriffen für binäre Programme möglich ist.


Acknowledgements

It is a great pleasure to thank everybody who made this thesis possible. First of all, I want to sincerely thank my advisor Prof. Dr. Thorsten Holz for giving me the opportunity of being part of the Systems Security group at the Ruhr University Bochum. He provided me with a pleasant and productive research environment and always supported me during the last four years. I had the chance to focus on research topics I was interested in and was able to collaborate with many friendly people sharing the same passion for security research.

I am very happy to have worked with many colleagues on many challenging and exciting projects. I am thankful to Sebastian Vogl, Thomas Kittel and Jonas Pfoh for a fruitful, effective collaboration and the time we spent together at various places. I had productive discussions with Benjamin Kollenda, Philipp Koppe and Jannik Pewny about thrilling ideas and research topics. Many of them culminated in highly interesting projects or publications. During the time at the lab, it was a pleasure to work with Behrad Garmany, with whom I shared an office for the last three years. We had many valuable conversations and a lot of fun, not only at our workplace. I am also thankful to all people at the chair for the time we spent at the university or elsewhere. Since the day I started, they gave me the feeling of belonging to the same glorious team. I was very fortunate to work with, publish with, or get inspired by Carsten Willems, Ralf Hund, Felix Schuster, Tilman Frosch, Johannes Hoffmann, Thomas Hupperich, Sebastian Uellenbeck, Teemu Rytilahti, Patrick Wollgast, Moritz Contag, Andre Pawlowski, Johannes Dahse, Christian Röpke, Apostolis Zarras and Marc Kührer.

I want to thank my parents and Corina Costea for their support during all this time. Without you I would not have been able to either start or finish this thesis.


Contents

1 Introduction
    1.1 The Arms Race between Attacks and Defenses
    1.2 Thesis Contributions
    1.3 Thesis Organization
    1.4 Publications

2 Enabling Crash-Resistance to Evaluate Memory Secrecy Protections
    2.1 Introduction
        2.1.1 Subverting Information Hiding
        2.1.2 Novel Memory Probing Method
    2.2 Technical Background
        2.2.1 Adversary Model
        2.2.2 Randomization Techniques
        2.2.3 Security by Information Hiding
    2.3 Unveiling Hidden Memory
        2.3.1 Fault-Tolerant Functionality
        2.3.2 Crash-Resistance
        2.3.3 Memory Oracles
        2.3.4 Web Workers as Probing Agents
        2.3.5 Finding Unreachable Memory Regions
        2.3.6 Subverting Hidden Code Layouts
    2.4 Conquering (Re-)Randomization
        2.4.1 Defeating Fine-Grained Re-Randomization
        2.4.2 Code Execution under Re-Randomization
    2.5 Implementation
        2.5.1 Exploiting IE without Knowledge of the Memory Layout
        2.5.2 Memory Probing in Mozilla Firefox
        2.5.3 Memory Scan Timings
    2.6 Related Work
    2.7 Discussion
        2.7.1 Novel Memory Scanning Technique
        2.7.2 Design Choices, Countermeasures, and Defenses
    2.8 Conclusion


3 Information Leak Detection in Script Engines
    3.1 Introduction
    3.2 Technical Background
        3.2.1 Enhancing Security with N-Variant Systems
        3.2.2 Windows ASLR Internals
        3.2.3 WOW64 Subsystem Overview
        3.2.4 Architecture
        3.2.5 Scripting Engines
        3.2.6 Adversarial Capabilities
    3.3 System Overview
        3.3.1 Main Concept
        3.3.2 Per Process Re-Randomization
        3.3.3 Dual Process Execution and Synchronization
    3.4 Implementation
        3.4.1 Duplication and Re-Randomization
        3.4.2 Synchronization
        3.4.3 Chakra Instrumentation
        3.4.4 AVM Instrumentation
    3.5 Evaluation
        3.5.1 Re-Randomization of Process Modules
        3.5.2 Detection Engine
    3.6 Related Work
        3.6.1 Randomization Techniques
        3.6.2 Multi-Execution Approaches
    3.7 Discussion
        3.7.1 Further Information Leaks
        3.7.2 Limitations of Prototype Implementation
    3.8 Conclusion

4 Towards Architecture-Independent and CFI-Compatible Code-Reuse Attacks
    4.1 Introduction
    4.2 Technical Background
        4.2.1 Code-Reuse Attacks
        4.2.2 Control-Flow Integrity (CFI)
        4.2.3 Heuristic Approaches
        4.2.4 Defeating the Countermeasures
    4.3 System Overview
        4.3.1 Gadget Discovery
        4.3.2 Gadget Analysis
        4.3.3 Semantic Search
    4.4 Implementation
        4.4.1 Gadget Discovery
        4.4.2 Gadget Analysis
        4.4.3 Gadget Search


    4.5 Evaluation
        4.5.1 Analysis Time of Gadget Discovery
        4.5.2 Gadget Type Distribution
        4.5.3 Exploiting ARM with One CFI-Resistant Gadget
        4.5.4 Comparison to Other Gadget Discovery Tools
    4.6 Related Work
    4.7 Discussion
    4.8 Conclusion

5 Vtable-Hijacking Protection for Binary-Only Software
    5.1 Introduction
        5.1.1 Preventing Vtable Hijacking
        5.1.2 Our Approach
    5.2 Technical Background
        5.2.1 C++ Inheritance and Polymorphism
        5.2.2 Virtual Function Calls
        5.2.3 Threat Model: Vtable Hijacking
        5.2.4 Intermediate Language Prerequisites
    5.3 System Overview
        5.3.1 Automated Extraction of Virtual Function Dispatches
        5.3.2 Automated Protection of Virtual Function Dispatches
    5.4 Implementation
        5.4.1 Amorphous Slicing
        5.4.2 Binary Transformations
    5.5 Evaluation
        5.5.1 vExtractor's Precision
        5.5.2 Runtime of Instrumented Programs
        5.5.3 Vtable Hijacking Detection
    5.6 Related Work
        5.6.1 Control Flow Integrity (CFI) Solutions
        5.6.2 Extensions against Vtable Hijacking
        5.6.3 Binary-Only Solutions against Vtable Hijacking
        5.6.4 Heap Monitoring
    5.7 Discussion
    5.8 Conclusion

6 Conclusion

Publications

List of Figures

List of Tables

List of Listings


Chapter 1 Introduction

Programming errors have been affecting the security of software for decades [198, 207]. Within the last few years, however, we have observed a shift in how attackers abuse these errors to compromise systems. Instead of targeting memory corruption vulnerabilities in server applications, adversaries nowadays often launch attacks against client software. Web browsers and their plugins are especially affected. There may be many reasons why browsers have received increasing attention from attackers, two of which are obvious:

• In our digital age, web browsers are prevalent. It can be assumed that almost every user who utilizes a computer also uses a web browser, and thus, the gain of a successful compromise for an attacker is maximized.

• Major parts of today’s web browsers are developed in unsafe languages such as C and C++ and consist of several million lines of code.

Both aspects seem to be a necessity, as web browsers serve as a gate to the information provided by the Internet and are used for various distinct functionalities. As such, web browsers have to be capable of solving numerous tasks quickly to stay popular among their user base. These tasks include media playback and design, 3D gaming, interpretation of several computer languages, and real-time communication. Thus, it is only natural that browsers are prone to programming defects which may result in exploitable bugs, because developers make mistakes like any other humans. While C and C++ are the programming languages of choice for performance-critical software, they are at the same time prone to the introduction of memory corruption bugs.

In general, memory corruption errors appear when memory contents of an application are accessed in an invalid and unexpected way. Usually, a segmentation fault or access violation is raised, which leads to an abrupt termination of the affected program. Broadly speaking, memory corruptions can be divided into spatial and temporal errors [198]. The address space of today's applications is accessed with byte granularity (e.g., on the x86 architecture). Therefore, instructions use addresses, i.e., pointers, to read, write, or execute contents of memory. Accessing contents in memory via pointers is also called dereferencing. If a pointer is dereferenced out of bounds of its underlying memory object, a spatial memory error happens. A pointer may also become dangling if its corresponding memory object is deleted. Dereferencing a dangling pointer leads to a temporal memory error. If the pointer or memory content is controllable by an adversary, then the error is most likely exploitable.

Adversaries can apparently easily detect memory corruption vulnerabilities in complex programs like browsers, as demonstrated by the steady stream of reported vulnerabilities [54–56]. These vulnerabilities are exploited by researchers and attackers for various reasons. During the yearly Pwn2Own competition, for example, researchers try to gain control over a remote machine for fame, fun, and profit. Most of the time, they succeed [35, 47, 109, 159, 210]. Governments seem to utilize attacks involving memory corruptions as an instrument as well [140, 176], serving the purpose of either deanonymizing suspects or eavesdropping on potentially illegal activities. On the other side, government agencies, companies, as well as private persons have fallen victim to attacks abusing memory corruptions [71, 79, 80]. While a multitude of defenses has arisen since the industry and academic world recognized the impact of memory corruption bugs, attackers constantly find ways to circumvent them to achieve their goal of breaking into systems.

The overall pattern to exploit memory corruption vulnerabilities in browsers nowadays can be divided into crucial steps which are conducted one after another:

I A memory corruption vulnerability is used to gain extended and illegitimate access to the affected program's address space with read and write permissions. Under benign circumstances, the address space of a web browser cannot be accessed and manipulated arbitrarily. Legitimate access to the program's address space is mediated by the browser's engines to process and interpret HTML and JavaScript code, for example. These accesses are considered safe in the absence of vulnerabilities.

II Knowledge about the address space is gained which is normally not available to the adversary. For the attacker, it is a necessity to know where specific components of the program reside in memory, to carry out further computations which help to gain access to the system the browser runs on. This procedure is also known as memory disclosure or information leak; hence, memory secrecy is undermined. The address space layout is normally not visible from languages processing untrusted input, such as HTML or JavaScript. During the course of this thesis, we use the terms memory disclosure and information leak interchangeably unless otherwise stated.

III Eventually, the adversary reroutes the flow of the program to execute code of her choice. This process – known as control-flow hijacking – is achieved by controlling the instruction pointer of the central processing unit and serves to escape the safe boundaries which normally tame the browser's operations.

As an overall result of the compromise, the attacker can access portions or all of a victim's sensitive data, or install software to maintain illegitimate access to the system without the knowledge of the affected user.

Step I is normally performed by an attacker to ease the process of steps II and III; it is not strictly required if an adversary can perform the subsequent steps without it. Each of steps II and III, however, is accompanied by the bypass of at least one defense against memory corruption bugs, and it is still an open research problem how the exploitation of vulnerabilities can be prevented with a reasonable performance overhead.

1.1 The Arms Race between Attacks and Defenses

In the absence of memory corruption vulnerabilities, step I and subsequent steps are not possible and a multitude of techniques to abuse bugs are ineffective. In fact, protections against bugs exist, but they are not applicable in a generic way or have a performance impact which prevents widespread adoption [143, 144]. Another option to prevent bugs is to develop software in languages which provide a higher degree of memory safety. There are projects going down that route, like Mozilla's Servo written in Rust [8], and the future will show if browser engines written in memory-safe languages will prevail. In the meantime, step I is a valuable asset for attackers to abuse bugs.

Adversaries started to perform step II due to the wide deployment of Address Space Layout Randomization (ASLR) [157, 174], which has been incorporated in Windows since 2007. It means that the memory layout of an application or the operating system's kernel is randomized either once during the boot process or every time a process is started. Since the attacker lacks information about the exact memory layout, it is harder for her to predict where the code she wants to execute is located. As a line of defense against memory disclosures which bypass ASLR, a variety of randomization strategies were proposed over the last years by the academic community to enforce memory secrecy [20, 112, 157].

To finally take over a program, step III is performed. Control-flow hijacking has a long history, and the most infamous bugs to hijack the program flow are buffer overflows [150]. When a reserved space in memory is too small for a user input buffer, the attacker can overwrite critical values, inject data, and execute this data as code. One technique to prevent code injection is Data Execution Prevention (DEP) [138]. DEP marks data as non-executable, and thus an attacker's injected data in a vulnerable application cannot later be interpreted as code. DEP is also known as the W ⊕ X (Writable xor eXecutable) security model [158].
It is nowadays directly supported by processors, and the Windows operating system has supported DEP since 2004. But adversaries reacted: different kinds of code-reuse attacks, such as return-to-libc [67], Return-Oriented Programming (ROP) [183], and many more variants [26, 44, 167], were developed. As the term code reuse already suggests, code injection is not necessary anymore. Instead, the program's legitimate code is reused in a non-intended way: code snippets, called gadgets, are chained together such that they perform the operations of an adversary's choice. Hence, the ultimate need to inject code vanished.

It was then again the defensive side's turn. Many detection techniques for code-reuse attacks were proposed [82, 154], but most of them were also broken shortly afterwards [40, 90, 178]. A promising defense to prevent step III is to enforce Control-Flow Integrity (CFI) [1]. The idea of CFI is to verify that each control-flow transfer leads to a valid target, based on a control-flow graph that is either (statically) pre-computed or dynamically generated. Several implementations of CFI with different design constraints, security goals, and performance overheads were published [1, 2, 36, 61, 137, 162, 202, 205, 228, 229].


Here, a distinction can be made between fine-grained and coarse-grained CFI solutions: while fine-grained CFI systems enforce strict policies on indirect control transfers regarding the target code which can be taken, coarse-grained implementations use more relaxed policies. As the reader might guess, the attackers evolved again and started to hijack the control flow in ways which stay under the radar of many CFI implementations. Several papers recently demonstrated bypasses of these CFI solutions [48, 64, 78, 89, 177]. Nevertheless, a coarse-grained CFI solution named Control Flow Guard (MS-CFG) found its way into Windows and has been deployed since 2015, as it still impedes a successful execution of step III.

Other widely deployed defenses are indirectly related to the outlined exploitation step III. If adversaries try to hijack the control flow with stack-based buffer overflows, these defenses have proven to be very effective and hence deserve to be mentioned:

• Stack canaries [49] are random values located on the stack that serve as a guard against memory corruption vulnerabilities. They are placed before return addresses to impede redirecting the control flow with an overwritten return address. If an attacker overwrites a return address by overflowing a stack buffer, the stack canary is overwritten, too. The canary is checked before control is transferred to the return address; if it was modified, the program is terminated. However, stack canaries are not bulletproof [23].

• Integrity checks of important control structures, e.g., SAFESEH and SEHOP on Windows, protect Structured Exception Handling (SEH) structures from corruption and hence enable the detection of ongoing attacks. Similar to return addresses, these structures reside on the stack; hence, attackers target these structures instead of return addresses with stack-based buffer overflows. If these protections are enabled for all parts of a running program, tampering with SEH structures becomes very difficult.

1.2 Thesis Contributions

Broadly speaking, this thesis takes a closer look at steps II and III from an offensive and defensive perspective. We investigate the possibilities for an attacker wanting to successfully perform step II and step III. Similarly, we want to shed light on the defensive side: what has to be undertaken to detect, mitigate, or prevent steps II and III? In our approach and investigation of these exploitation steps, we assume that source code is not necessarily available to solve an offensive or defensive task. Attackers, who implement offensive strategies, work mostly with binary-only code. Similarly, source code is not always available to defenders, or cannot easily be recompiled to incorporate a specific defense. Hence, this thesis' contributions also build on analysis schemes applicable to binary-only software.

Enabling Crash-Resistance to Evaluate Memory Secrecy Protections. At first, we explore step II, namely memory disclosures or information leaks, from an attacker's perspective. It is known that attack primitives can abuse the ability of specific software to automatically restart upon termination. For example, network services like FTP and HTTP servers are typically restarted in case a crash happens, in order to ensure availability. This can be used to defeat ASLR. It is a common belief that client applications, such as web browsers, are immune against exploit primitives utilizing crashes: due to their hard crash policy, such applications do not restart after memory corruption faults, making it impossible to touch memory more than once with wrong permissions. We show that certain client applications can actually survive crashes and are able to tolerate faults which are normally critical and force program termination. We utilize this behavior and introduce a crash-resistance primitive. We develop a novel memory scanning method with memory oracles as an extension of memory disclosures. Unlike previous work, we do not need control-flow hijacking and still achieve similar goals. We show the practicability of our methods for 32-bit Internet Explorer 11 on Windows 8.1, and for 64-bit Mozilla Firefox on Windows 8.1 and Linux (kernel 3.17.1). Furthermore, we demonstrate the advantages an attacker gains to overcome recent information-hiding and randomization schemes. As a result, we show that these defenses need improvements, since crash resistance weakens their security assumptions.

Information Leak Detection in Script Engines. In our following contribution, we take a closer look at memory disclosures, and thus step II, from a defensive perspective. We observe that bypasses of several defensive solutions such as CFI and other code-reuse protections require memory disclosures as a fundamental step. We analyze this problem and develop a system for fine-grained, automated detection of memory disclosure attacks against scripting engines. The basic design insight is as follows: scripting languages, such as JavaScript in web browsers, are strictly sandboxed and must not provide any insights into the memory layout in their contexts. In fact, any such information potentially represents an ongoing memory disclosure attack. Hence, to detect such information leaks, our system creates a clone of the scripting engine process with a re-randomized memory layout. The clone is instrumented to be synchronized with the original process. An inconsistency in the script contexts of both processes appears when a memory disclosure is conducted to leak information about the memory layout. Based on this detection approach, we developed a prototype for the JavaScript engine in Microsoft's browser Internet Explorer on Windows. An empirical evaluation shows that our tool can successfully detect such memory disclosure attacks.

Our following contributions are dedicated to step III, namely control-flow hijacking. We approach this step from an offensive as well as from a defensive perspective.

Towards Architecture-Independent and CFI-Compatible Code-Reuse Attacks. First, we take the role of an attacker and automate the process of finding appropriate code gadgets needed for code-reuse attacks under CFI policies. Manually extracting code gadgets which are allowed under CFI policies is a cumbersome task because, depending on the CFI flavor, the set of usable gadgets is drastically reduced. To ease the assessment of a CFI solution, we introduce a framework to discover code gadgets for code-reuse attacks that conform to coarse-grained CFI policies. For this purpose, binary code is extracted and transformed into a symbolic representation in an architecture-independent manner. Additionally, code gadgets are verified to provide the functionality needed by a security researcher. We show that our framework finds more CFI-compatible gadgets than other code gadget discovery tools. Furthermore, we demonstrate that code gadgets needed to bypass CFI solutions on the ARM architecture exist in a non-negligible number, which had not been shown before.

Vtable-Hijacking Protection for Binary-Only Software. Once again, we take the side of the defender. Specifically, we investigate a control-flow hijacking technique which is very popular when exploiting memory corruption vulnerabilities in browsers. From a technical point of view, an attacker uses a technique called vtable hijacking to exploit heap-based vulnerabilities such as use-after-free flaws. More specifically, she crafts bogus virtual tables and lets a freed C++ object point to them in order to gain control over the program at virtual function call sites. We present a novel approach towards mitigating and detecting such attacks against C++ binary code. We propose a static binary analysis technique to extract virtual function call site information in an automated way. Leveraging this information, we instrument the given binary executable and add runtime policy enforcements to thwart the illegal usage of these call sites. We implemented the proposed techniques in a prototype and successfully hardened three versions of Microsoft's Internet Explorer and one version of Mozilla Firefox. An evaluation with several former zero-day exploits demonstrates that our method prevents all of them. We conducted performance benchmarks on the micro and macro level and show that the overhead is reasonable and only slightly higher compared to similar approaches which target binary-only software or are compiler-based.

Figure 1.1: The four main chapters, their relation to the exploitation process (steps II and III), and the venues at which the chapters were published. A full list of publications the author contributed to is provided at the end of this thesis.


1.3 Thesis Organization

This thesis is divided into four main chapters. Figure 1.1 outlines the connection of the chapters – their contributions and publications – to the exploit procedure previously introduced. Chapter 2 and Chapter 3 are dedicated to memory disclosures, namely step II of a common exploit process, while Chapter 4 and Chapter 5 address control-flow hijacking (step III). We interpret the outcome and outline possible future research directions in Chapter 6.

Chapter 2 introduces new primitives to circumvent information hiding and randomization defenses in client programs. First, we give background on several recently proposed defenses which fall into that category. Then, we explain how crash resistance and memory oracles can be obtained in web browsers. We show that these primitives provide advantages when adversaries face hidden information. Similarly, we discuss advantages an attacker gains with crash resistance when encountering randomization-based defenses. Furthermore, we present implementations for Internet Explorer 11 on Windows and Mozilla Firefox on Windows and Linux to show the practical impact of our novel primitives. Finally, we discuss related work and illustrate implications arising due to crash resistance, before we conclude.

In Chapter 3, we present a dual execution system for script engines to detect memory disclosures or information leaks. To this end, we first give an overview of academic work which uses memory disclosures as a fundamental step to circumvent several proposed defense schemes. We then outline typical memory disclosure attacks in detail and provide a technical background on aspects needed for information leak detection. We then present our system design and implementation for Internet Explorer 10/11 on Windows 8.0/8.1 and evaluate against real-world memory corruption exploits. Finally, we discuss related work and the limitations of our approach.
Chapter 4 details a framework to automatically search for code gadgets which conform to specific CFI policies. Initially, we discuss the CFI concept, several coarse-grained CFI solutions and code-reuse defenses necessary for this chapter, and review recent research on bypasses of these defenses. We then infer characteristics of appropriate gadget types and show how our framework performs extraction, semantic classification, and gadget verification with symbolic execution. Afterwards, we present the results of our architecture-independent gadget discovery for several libraries which host code typically used during code-reuse attacks. Finally, we compare our framework to other code gadget discovery tools, before we present related work and conclude.

In Chapter 5, we explain a novel protection scheme against vtable hijacking in binary-only C++ software. At first, we explain the reasons why certain vulnerabilities allow vtable hijacking in C++ code. Then, we outline the principles which enable our detection approach and describe the static analysis and dynamic transformation methods our framework builds upon. The framework itself is then described in detail, including the interplay between its individual components. Furthermore, we evaluate the detection approach against former zero-day exploits and conduct a performance analysis. Finally, we review work closely related to ours and provide a more in-depth overview of CFI solutions before we conclude the chapter.


Chapter 6 summarizes our findings. We discuss and interpret the results of the former chapters and try to draw a conclusion regarding the future of memory corruption vulnerabilities. Additionally, we take a look at possible future offensive and defensive directions considering the exploitation of memory corruptions.

1.4 Publications

This thesis is based on several publications at various academic conferences. Additionally, publications are mentioned which emerged during the Ph.D. studies but were not used in this thesis.

Chapter 2 was published at the Network and Distributed System Security Symposium (NDSS) 2016 together with Benjamin Kollenda, Philipp Koppe, Behrad Garmany and Thorsten Holz [86]. It has proven to be promising work, as it partly served as a basis for follow-up research: it inspired the evaluation of entropy-based information hiding which was accepted at the 25th USENIX Security Symposium 2016 [91], a publication created jointly with Enes Göktaş, Benjamin Kollenda, Elias Athanasopoulos, Georgios Portokalidis, Cristiano Giuffrida and Herbert Bos. However, this follow-up work is not included in this thesis.

Chapter 3 is based on the publication at the 13th Conference on Detection of Intrusions and Malware and Vulnerability Assessment (DIMVA) 2016 [87]. This work was performed together with Philipp Koppe, Benjamin Kollenda, Andre Pawlowski, Behrad Garmany and Thorsten Holz. Additional information from our published technical report [88] is also included in Chapter 3.

Chapter 4 emerged from the publication with the title Automated Multi-Architectural Discovery of CFI-Resistant Code Gadgets [216]. It was accepted at the European Symposium on Research in Computer Security (ESORICS) 2016. It is based on the master thesis of Patrick Wollgast and was developed together with Patrick Wollgast, Behrad Garmany, Benjamin Kollenda and Thorsten Holz.

Chapter 5 was published at the 30th Annual Computer Security Applications Conference (ACSAC) 2014 [84]. Our technical report contains additional content which is also utilized in Chapter 5 [85]. This research was developed together with Thorsten Holz.

Several other research topics culminated in publications but are not part of this thesis.
Together with Sebastian Vogl, Behrad Garmany, Thomas Kittel, Jonas Pfoh, Claudia Eckert and Thorsten Holz, a new hooking technique was published at the 23rd USENIX Security Symposium 2014 [208]. Together with Jannik Pewny, Behrad Garmany, Christian Rossow and Thorsten Holz, a cross-architecture vulnerability discovery framework was presented at the 36th IEEE Symposium on Security and Privacy (2015) [161]. Last but not least, a detection framework for HTTP-based malware was published at the 12th Annual Conference on Privacy, Security and Trust (PST) 2014 together with Apostolis Zarras, Antonis Papadogiannakis and Thorsten Holz.

A comprehensive list of publications is provided at the end of this thesis.

Chapter 2 Enabling Crash-Resistance to Evaluate Memory Secrecy Protections

In contrast to server software, client applications have a crucial property: they typically terminate immediately on memory corruption faults or failed exploit attempts, and do not automatically restart. Hence, adversaries usually have only one attempt to conduct their attack successfully. Conversely, the ability of network services to restart initiated research on developing sophisticated attack primitives: if programs such as servers automatically respawn after termination due to a crash, memory layout information or hidden sections can be deduced which are otherwise not accessible to an adversary [23, 77, 184]. It is a well-known issue that the low entropy used to randomize a program's address space on 32-bit systems is susceptible to brute-force attacks [184]. However, discovering parts of the memory layout in a brute-force manner requires a program which tolerates crashes resulting from, e.g., reading unmapped memory. Thus, this way of exploitation is normally only viable against certain kinds of server software, where each request spawns a new but equally randomized process. With increasing entropy to randomize the address space on 64-bit architectures and a hard crash policy, which forbids restarts of the program, brute-force attacks no longer seem to be a viable option. In addition, efficient (re-)randomization schemes seem to be a promising direction towards memory secrecy, and several papers that propose such randomization schemes were recently published [12, 21, 51, 65, 123]. It remains an open question whether attacks against such schemes are viable, especially in the light of memory disclosures.

2.1 Introduction

It is known that restarting programs, especially network services such as FTP and HTTP servers, offer the opportunity to repeatedly abuse memory faults, precisely because they restart. A common belief is that such attacks are not viable against client programs, because these do not restart after a critical fault forces termination. In this chapter, we challenge this assumption and demonstrate the ability to handle faults in such a manner that memory corruptions no longer remain an all-or-nothing primitive against client software. More

specifically, we demonstrate that memory corruption vulnerabilities can serve as a base for side channels to weaken available security features.

2.1.1 Subverting Information Hiding

In the presence of ASLR and DEP (W ⊕ X), the successful exploitation of a memory corruption vulnerability poses a challenge to an adversary. We observe that a successful attack is typically based on a memory disclosure vulnerability. Such information leaks are often the first step utilized to gain some knowledge about the memory layout. Once the code locations are collected, they can be used to mount a code-reuse attack (which is in turn used to disable W ⊕ X). More importantly, we observe that hiding of information in a program’s address space is a crucial aspect: data structures with sensitive information have to remain hidden in order to prevent weakening available security features. For example, in the presence of ASLR, base addresses of shared modules as well as stack addresses of running threads, heap boundaries, and exception handlers have to remain hidden from an attacker. With fine-grained randomization schemes [19, 112, 213], the same problem applies: an adversary might be able to uncover the address space via novel attack methods and leverage this information to perform just-in-time attacks similar to the work by Snow et al. [189]. Other security features rely on information hiding as well, for example hidden regions to store metadata used to perform integrity checks. Consider for example Code-Pointer Integrity (CPI) [116], the state-of-the-art code pointer protection approach: on platforms where the implementation is based on information hiding (e.g., Intel’s x86-64 architecture), such pointers (and pointers to such pointers) are stored in a hidden memory region to impede tampering with them. Recently, and similarly to our work, a successful attack against the current CPI implementation was demonstrated that leaks this hidden memory region [77]. Furthermore, all CFI implementations that leverage a shadow stack need to prevent this stack from being leaked to an attacker [41, 59].

2.1.2 Novel Memory Probing Method

We show that fault-tolerant functionality is available in web browsers and—when combined with memory disclosures—delivers a novel way to explore unknown memory territories. It is a common belief that a hidden memory region without references to it is indiscoverable in practice without code-execution or code-reuse attacks. Thus, an important building block towards revealing reference-less memory is the ability to scan the address space without forcing the program into termination. This is viable in server software [23, 77, 184], but seems impossible in web browsers due to their hard crash policy (i.e., after three consecutive crashes, Internet Explorer stops restarting automatically). We demonstrate that a memory scanning ability in web browsers can be achieved and use this as a base to subvert memory secrecy and randomization approaches without control-flow hijacking, code injection or code-reuse attacks. Deducing hidden information with memory scans in turn enables an adversary to conduct code-reuse attacks. In our experiments, we were able to scan the address space with 18,357 probes per second in 64-bit Firefox on Linux, with 718 probes per second in 64-bit Firefox on Windows,

and 63 probes per second in 32-bit Internet Explorer. We leverage memory oracles as an extension of information leaks, which either return the content at a specified memory location or deliver an event in case of unmapped memory, to learn more about the structure of the address space. This enables us to circumvent standard ASLR implementations and to undermine recently proposed defense schemes. Additionally, we use our crash-resistance primitive together with function chaining to achieve Crash-Resistant Oriented Programming (CROP): arbitrary exported system calls or functions can be dispatched in a fault-tolerant manner. The contributions made in this chapter can be summarized as follows:

• We introduce the ability in web browsers to survive crashes and to run in fault- tolerant mode. We term this new class of primitive crash-resistance.

• We develop new methods allowing to scan memory inside client software based on crash-resistance and memory oracles. We thereby do not need control-flow hijacking, code injection, or code reuse. Furthermore, we demonstrate the practical feasibility of our methodology for Internet Explorer 32-bit on Windows and Mozilla Firefox 64-bit on Linux and Windows.

• We present the advantages an adversary gains with crash-resistance and mem- ory oracles to weaken recently proposed security features based on code hiding and (re-)randomization. More specifically, we show that memory secrecy enforced through memory layout randomization is ineffective even in a large address space (i.e., on x86-64 systems), uncovering sensitive information protected via information hiding.

• Finally, we develop a new code-reuse technique based on function chaining in com- bination with crash-resistance. We term this technique Crash-Resistant Oriented Programming (CROP).

2.2 Technical Background

In the following, we first introduce the adversarial capabilities and the defense model we use throughout this chapter. Furthermore, we describe targeted defense strategies and briefly discuss potential shortcomings.

2.2.1 Adversary Model

We assume that the adversary has an initial vulnerability such as a use-after-free or a restricted write (such as a byte increment/decrement or null byte write) to an attacker-chosen address. We assume as well that the initial vulnerability leads to the ability to read from and write to arbitrary addresses, using a scripting environment such as JavaScript. These assumptions are consistent with recent exploitation of memory corruptions in web browsers [51, 52, 65]. Furthermore, we assume that the target system incorporates the following defense mechanisms against the exploitation of memory corruption vulnerabilities:


• Non-executable memory: The target OS implements the W ⊕ X security model: writable pages are never executable, and only code pages gain the execute permission, in order to hamper code injection attacks.

• Memory diversification: The adversary has to tackle several levels of randomization techniques, starting with the widely deployed coarse-grained ASLR that randomizes modules, through fine-grained address space randomization at the instruction/basic block/function level [19, 98, 112, 155, 213], to re-randomizing code as proposed in Isomeron [65].

• Control-flow integrity: As CFI implementations such as Microsoft’s Control Flow Guard (MS-CFG) begin to be deployed in commodity operating systems, we assume that coarse-grained CFI [228, 229] is active on the target OS.

• Execute-no-read memory: We further restrict the attacker by enforcing non-readable code pages (R ⊕ X) as proposed by recent JIT-ROP defenses [11, 51]. Consequently, any read attempt on code results in an access fault.

• Hard crash policy: We assume that a program does not automatically restart after a crash and that a user will not open a potentially dangerous web page again after it crashed a web browser once.

2.2.2 Randomization Techniques

Several randomization techniques have been proposed in recent years, and we briefly review the different approaches. Furthermore, we discuss potential shortcomings of such methods.

2.2.2.1 Address Space Layout Randomization

All state-of-the-art operating systems deploy ASLR. This feature randomizes the base addresses of shared libraries and executables, all stacks, heaps, and other structures. Randomization is performed at load time of a program, and ideally no locations of specific memory regions are predictable. However, drawbacks exist: offsets to data structures and code within a shared library remain constant and are susceptible to static code-reuse attacks. If one library's base address is revealed with a single memory disclosure, the adversary knows the layout of the complete module [196].

2.2.2.2 Fine-Grained ASLR

To overcome the constant layout of shared modules and to prevent an attacker from conducting static code-reuse attacks, several schemes of fine-grained randomization were developed. They randomize the code layout [98], replace instructions with semantic equivalents [155], or permute the order of basic blocks [213]. These methods are applied at load time of a program. Unfortunately, these defenses can be bypassed if an adversary discloses code pages and assembles a code-reuse payload dynamically on the fly [189].


2.2.2.3 Re-Randomization

To hinder dynamic code-reuse attacks, re-randomization is applied to programs: if an adversary discovers code locations via memory disclosure vulnerabilities, she cannot use them, as re-randomization changes the code layout in the meantime. Isomeron [65] applies fine-grained randomization in the load phase of a program to ensure that not only basic blocks or modules are placed at different addresses, but also single code snippets. Furthermore, Isomeron applies re-randomization at the granularity of function calls during runtime: in a coin-flip manner, it decides whether the original or a diversified copy of a function is executed. This approach thwarts code-reuse attacks like ROP and JIT-ROP. However, we found that specific structures are very challenging to re-randomize, especially data structures to which dynamic access needs to be maintained during a program's runtime (see Section 2.4.1.1). Thus, we show that an adversary can still gather sufficient information and conduct code-reuse attacks.

2.2.3 Security by Information Hiding

As noted above, hiding information in a program's address space has increasingly moved into the spotlight. Note that all structures with sensitive information have to remain hidden to prevent an adversary from leaking them (and thus weakening available security features). In the context of ASLR, the following information is, for example, considered sensitive: base addresses of shared modules, stack addresses of running threads, heap boundaries, and exception handlers. Memory regions without references to them exist for similar reasons: they build on the assumption that memory disclosures cannot reveal them, as knowledge about their location is not available and they are reference-less. We explain several instances of information hiding in the following.

2.2.3.1 Sensitive Application Structures

Microsoft Windows maintains a Process Environment Block (PEB) for each running process. Similarly, a Thread Environment Block (TEB) is included in each process' address space for each thread. The legitimate method to gain access to either of them is the official Windows API call NtCurrentTeb(). Accessing a TEB or the PEB illegitimately is often done via the FS register on x86 architectures: the address of the currently active thread's TEB is found at FS:0 and the address of the PEB at [FS:0x30]. To the best of our knowledge, references to both structures do not exist anywhere in user-space memory. In the presence of ASLR, it is nearly impossible for an adversary to reveal these structures even with full read access to memory, unless prior knowledge of the memory layout is available to her. Trying to read unreadable memory results in access faults and termination of the program. For an adversary, the only remaining way to reveal them is to hijack the control flow and execute her code of choice. In Section 2.3.5, we show that this hidden information is accessible even if code-reuse attacks are not an option (e.g., due to control-flow integrity).


Valuable information for an attacker in a TEB includes the thread's stack boundaries and the chain of exception handlers. Especially critical is an undocumented code-trampoline field at offset 0xC0: every system call of a 32-bit process running on 64-bit Windows goes through this CPU mode-switch trampoline. If an attacker manages to overwrite that field, she gains control over every system call in that process. Among other information, the PEB contains the base addresses of all mapped modules and a function callback table for the Windows kernel. While disclosing a module's base address by reading another module's import address table (IAT) may be an option, reading the PEB directly yields all executable modules of a process at once. Note that there are no references to kernel callback tables in the user-mode address space, for obvious reasons.

2.2.3.2 Reference-Less Regions for Pointer Safety

Another notable example of this concept is the implementation of Code-Pointer Integrity (CPI) [116] on x86-64 and ARM systems. In the absence of hardware-enforced segmentation protection, the CPI implementation on these platforms relies on hiding the location of the safe region that contains the sensitive metadata for pointers. The safe region is used by CPI to enforce its policies. The most restrictive variant of the CPI policy tracks all sensitive pointers in a program. A pointer is considered sensitive if it is a code pointer or a pointer that may later be used to access a sensitive pointer. This recursive definition ensures that all control-flow information is protected. The security of CPI on the x86-64 and ARM architectures relies on hiding the precise location of metadata from an attacker. This concept has already been shown to be susceptible to attacks by Evans et al. [77]. In Section 2.3.5, we present an even more efficient mechanism to determine the location of hidden memory. It is used to launch a similar attack on CPI without crashes and in a shorter time.

2.2.3.3 Code and Code Pointer Hiding

In case fine-grained randomization is in place, an adversary can still conduct JIT-ROP attacks [189]. To prevent the attacker from discovering enough code to reuse, recent research has focused on mapping code as execute-only [11] or on hiding code pointers behind a layer of indirection [12]. In another recent work, Crane et al. developed a framework called Readactor which aims to be resilient against memory disclosures and to provide a high degree of protection against code-reuse attacks of all kinds [51]. Code pointers in code are not readable, as code is mapped as execute-only, and code pointers in data are replaced by execute-only trampolines to their appropriate functions. However, the authors note that hidden functions which are imported from other modules can be invoked by an adversary through the trampolines if she manages to disclose trampoline addresses. Based on Readactor, Readactor++ was developed, which additionally randomizes the entries in function tables such as virtual function tables and procedure linkage tables [52]. Export symbols, however, are and must remain discoverable (see Section 2.3.6 on dynamic loading for details). We show in Section 2.5 that this leaves enough space to conduct powerful code-reuse attacks when combined with crash-resistance. Additionally, we found that it is challenging to hide pointers in structures which are allowed to be accessed legitimately (see Section 2.3.6 for details).

2.3 Unveiling Hidden Memory

In the following, we demonstrate that a memory scanning ability can be achieved by abusing the fact that certain code constructs enable a crash-resistance. We introduce the technical building blocks and show how they can be used to subvert memory secrecy and randomization without control-flow hijacking, code-injection, or code-reuse attacks.

2.3.1 Fault-Tolerant Functionality

Querying characteristics of memory regions is a legitimate operation in a standard user-mode program. For example, Windows provides API functions for this purpose: IsBadReadPtr() and related functions allow a program to investigate whether a certain memory pointer is accessible with certain permissions without raising faults. Similarly, VirtualQuery() yields memory information for a range of pages. Furthermore, other functionality exists whose primary purpose is not to deliver information about memory permissions; however, exception handling and system calls can be (ab)used to deduce whether memory is accessible or not.

2.3.1.1 Exception and Signal Handling

Program code in Windows can be guarded via Structured Exception Handling (SEH) [163, 175] and Vectored Exception Handling (VEH) [164]. A programmer can install exception handlers and define filter functions which decide if the handler is executed. In case of C/C++ code and SEH, this is achieved with __try{... } __except(FILTER){... } and similar constructs, and in case of VEH with the Windows API. This way, a chain of exception handlers can be constructed: If an exception like an access violation is raised, the exception handlers’ filters are inspected successively until one handler is picked to process the exception. It can then decide to pass the exception to the next exception handler, terminate the program, or return a status, such that program execution is resumed. In case of SEH, program resumption can continue to execute the code which follows the __except(){} block, and in case of VEH, the program is resumed at an address specified within the VEH information. Signal handling is achieved in a similar way in Linux: callback functions can be specified which are called upon a signal raised by the program, such as a segmentation fault. Similar to Windows, the callback function can process the reason for the signal and decide to terminate the program or to resume normal execution. As we demonstrate in the following, legitimate exception handling can be utilized to achieve crash-resistant functionality within a higher-level language like JavaScript in a browser without hijacking the control-flow.


2.3.1.2 System Calls

System calls in Linux have the ability to return specific status codes based on the parameters they were called with. If a system call expects a pointer to memory and receives a pointer to an unreadable memory address, it will return a different status code than when called with a parameter which points to a readable memory address. For example, the access() system call in Linux is normally used to check different characteristics of a file whose name is passed as a string pointer. If the pointer points to an unreadable memory page, access() returns the error “Bad Address”, while it returns “No such file or directory” for a readable memory address (which does not have to constitute a valid filename). A similar behavior can be observed with system calls on Windows: NtReadVirtualMemory() returns a different status code when applied to a readable memory address than to an unreadable one. In both cases, neither Windows nor Linux raises any exception or access fault. This side channel is also used by egghunt shellcode, a specific type of injected code which searches the memory space for the actual malware code after the control flow was hijacked with a vulnerability [165]. In this chapter, we show that searching the memory space is possible even without control-flow hijacking.

2.3.2 Crash-Resistance

Memory access faults like memory access violations in Windows programs or segmentation faults in Linux programs are fatal and lead to the abnormal termination of the program. In both operating systems, exception handling is allowed to inspect the type, reason, and faulting code which caused the exception. If the faulting code is not handled by any exception handler, the OS terminates the program. Surprisingly, we discovered that faulting code which should crash a given program does not necessarily have to bring down the program. If we can force a program to stay alive although its code produces memory corruptions and access faults, we denote this as crash-resistance. Consider for example the Windows C code in Listing 2.1. On line 29, the timer callback function triggerFault() is installed. It is executed in a loop with DispatchMessage() (lines 30 to 33). triggerFault() generates read, write, and execution faults depending on the value of ptr, which is increased each time it runs. There are no custom SEH or VEH handlers installed, thus the OS should terminate the program on the first access fault. However, this is not the case: the function triggerFault() is stopped at each access fault, but is executed anew over and over. Hence, ptr is continuously increased and each access fault is triggered without forcing the program into termination. Consequently, the program is crash-resistant. This behavior was observed for both 32-bit and 64-bit programs and we found the following reasons for it: the timer callback triggerFault() is called by the function DispatchMessageWorker() from user32.dll. The callback is wrapped by an exception handler. If an exception in triggerFault() is raised, the corresponding filter function executes and decides whether the installed exception handler is going to handle the exception. The filter returns EXCEPTION_EXECUTE_HANDLER independently of the exception type. This instructs


1 #include <windows.h>
2 #include <stdio.h>
3
4 PCHAR ptr = 0;
5 typedef VOID (*function)();
6
7 VOID CALLBACK triggerFault(){
8     CHAR mem;
9     ptr++;
10     switch ((INT)ptr % 3){
11         case 0:
12             printf("Execute 0x%.8x\n", ptr);
13             ((function)(ptr))();
14             break;
15         case 1:
16             printf("Read at 0x%.8x\n", ptr);
17             mem = *ptr;
18             break;
19         case 2:
20             printf("Write to 0x%.8x\n", ptr);
21             *ptr = 0;
22             break;
23     }
24     printf("No fault");
25 }
26
27 INT main(){
28     MSG msg;
29     SetTimer(0, 0, 1000, (TIMERPROC)triggerFault);
30     while (1){
31         GetMessage(&msg, NULL, 0, 0);
32         DispatchMessage(&msg);
33     }
34 }

Listing 2.1: Crash-resistant program in Windows

the exception handler to handle any exception. DispatchMessageWorker() returns and the program continues running without executing line 24. After cooperation with Microsoft, this issue was confirmed to be security relevant (tracked as CVE-2015-6161, see Section 2.7.2 for a discussion). Similar design choices exist, and a more in-depth technical analysis can be found in an article by Permamedov [160].

2.3.2.1 Crash-Resistance in Microsoft Internet Explorer

It is important to note that we can exploit this feature inside Internet Explorer and prevent it from terminating abnormally on memory corruption errors. We developed two ways to achieve this behavior in Internet Explorer:

1. A web page can use the JavaScript method window.open() to open a new browser tab window. JavaScript code which is dispatched via setTimeout() or setInterval() inside that window can produce memory corruptions without forcing the browser to terminate.


2. Since the introduction of HTML5, web workers are available and are dispatched as real threads in script engines. JavaScript code executed with setTimeout() or setInterval() inside web workers can generate access faults without crashing Internet Explorer.

2.3.2.2 Crash-Resistance in Mozilla Firefox

While the crash-resistance in Windows may seem like an obscure feature, we were able to achieve crash-resistance in Mozilla Firefox as well. We utilized the Firefox JavaScript engine SpiderMonkey and its asm.js optimization module, called OdinMonkey, which is able to compile a subset of JavaScript code ahead of time into high-performance native code [139]. We observed that OdinMonkey uses exceptions instead of runtime checks in special cases. Most prominently, bounds checking is not performed explicitly; instead, page protections on memory are used to check bounds implicitly. Every asm.js function can access a pre-determined heap of a fixed size. On creation, the heap is initialized as an array with zeros; thus, any access in bounds will not lead to an exception. On 64-bit machines, the heap is guarded by a non-accessible memory region of slightly more than 4 GB. As asm.js only permits 32-bit indices, this guarantees that any offset from the beginning of the heap will either point into the valid array or into the guard region. Out-of-bounds memory accesses on that array are not treated as critical faults. Instead, a default value of NaN is returned to indicate that an element outside of the array was accessed. This is accomplished by an exception handler which prevents program termination: OdinMonkey sets a global signal handler that gets called for every unhandled exception in the process. The handler is defined in AsmJSSignalHandlers.cpp. Out-of-bounds memory accesses trigger checks to ensure that only the intended faults are caught. First, the exception code itself is inspected to determine whether it really is an access violation. The faulting instruction address is then checked against the location of the asm.js-compiled code to ensure that it has thrown the exception.
The last check determines if the accessed address lies within the heap and the guard pages of the asm.js code, but outside of the bounds indicated by the size of the array buffer. Only if these conditions are met does the handler signal that the exception has been successfully resolved. It sets the instruction pointer to the instruction following the faulting one and sets the default value to be returned. Execution can continue safely as if the access had succeeded. The asm.js-generated code can then perform calculations with the default value or return it into the fully featured JavaScript context. Setting the asm.js heap pointer with a vulnerability is sufficient to achieve crash-resistance in Firefox. Accesses to unmapped memory are then treated as standard out-of-bounds array accesses.
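The implicit bounds check described above can be illustrated with a little arithmetic. The following sketch classifies 32-bit indices; the heap size and the classification function are invented for illustration, since real OdinMonkey relies on page protections rather than an explicit check:

```javascript
// Why asm.js bounds checks can be implicit: any 32-bit index from the heap
// base lands either in the heap (real data) or in the >4 GiB guard region
// (access fault, caught by the signal handler).  HEAP_SIZE is hypothetical.
const HEAP_SIZE = 64 * 1024 * 1024;      // hypothetical 64 MiB asm.js heap

function accessClass(index) {
  if (index < 0 || index > 0xFFFFFFFF)   // asm.js only permits 32-bit indices
    throw new RangeError("not a 32-bit index");
  return index < HEAP_SIZE ? "heap" : "guard";
}

console.log(accessClass(1234));          // "heap"  -> returns real data
console.log(accessClass(0xFFFFFFFF));    // "guard" -> fault, handler yields NaN
```

Because the guard region alone covers more than 4 GiB, even the maximum 32-bit index cannot escape heap plus guard, so no index ever reaches unrelated memory.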

2.3.3 Memory Oracles

Armed with crash-resistance, we are able to develop a novel memory probing method for web browsers. We use the term memory oracle for a primitive that accomplishes the following functionality within JavaScript:

• If non-readable memory is accessed with read access, an access fault is generated and handled in a way that allows recognizing this event.

18 2.3 Unveiling Hidden Memory

• In case memory is successfully read, the oracle returns the bytes at that memory location.

In the following, we present the basic design of our memory oracles. Due to the differences between the two scripting engines within Internet Explorer and Mozilla Firefox, the technical implementations differ, but the general approach and the end result are the same for both browsers.

2.3.3.1 Memory Oracles for Internet Explorer

Assume an adversary controls the buffer pointer in a string object by means of a vulnerability. She can misuse that string object as a memory oracle, as shown in HTML Listing 2.2.

Listing 2.2: Memory oracle in Internet Explorer 11. oracle.html is used to open runOracle.html (listing not reproduced here).

In line 7 of oracle.html, a JavaScript string pointing to the four-byte sized wide-char buffer "AB" is allocated. Then, on line 8, it is modified by the attacker with a vulnerability to a memory address whose permissions are uncertain (this is only illustrated with a comment in Listing 2.2). Thus, strObj no longer points to the actual data ("AB"), but to an attacker-chosen address. Line 5 of runOracle.html dispatches the JavaScript function memoryOracle() in crash-resistant mode. If the modified pointer points to unreadable memory, memoryOracle() stops running, but Internet Explorer stays alive. Thus, the oracle can be queried again with another pointer value. If a readable address is found, two bytes are returned and further computations are carried out (line 5 in oracle.html). Note that memory oracles can be seen as an extension of memory disclosures, but are more powerful as they can discover reference-less memory.
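The query loop behind such an oracle can be simulated outside the browser. In the following Node.js sketch, a hypothetical tryRead() over an invented address-space map stands in for escape() on the corrupted string object, and try/catch stands in for the crash-resistant fault handling:

```javascript
// Simulation of the memory-oracle query loop: each probe either "faults"
// (the handled fault keeps the browser alive) or yields two bytes.
// The address-space layout below is made up for the example.
const mapped = new Map([[0x10000, "AB"], [0x7ffd0000, "CD"]]);

function tryRead(addr) {                 // stands in for escape() on strObj
  const page = addr & ~0xfff;
  if (!mapped.has(page)) throw new Error("access violation"); // would crash
  return mapped.get(page);               // two bytes at that address
}

function oracleProbe(addr) {             // crash-resistant wrapper
  try { return tryRead(addr); }          // readable: return content
  catch (e) { return null; }             // fault handled, query again
}

const readable = [];
for (let addr = 0; addr < 0x80000000; addr += 0x10000)
  if (oracleProbe(addr) !== null) readable.push(addr);

console.log(readable.map(a => a.toString(16))); // [ '10000', '7ffd0000' ]
```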

19 Chapter 2 Enabling Crash-Resistance to Evaluate Memory Secrecy Protections

2.3.3.2 Memory Oracles for Mozilla Firefox

As mentioned earlier, we use asm.js to implement our memory oracle for Mozilla Firefox. Due to the extensive checks performed by this browser, developing a memory oracle is more complex than in Internet Explorer. An object of type AsmJSModule tracks all information related to an asm.js-compiled module. This includes the location of the native code as well as the asm.js heap location. We not only need to perform our invalid access from within an asm.js function, but are also limited to the heap location plus the size of the guard region. However, with a vulnerability, the location of the AsmJSModule object can be disclosed, as it is reachable with memory disclosures (see Section 2.5.2). Then, the heap address stored in the object's metadata is overwritten with an attacker-chosen address. A read attempt via an array access yields either the default value or the content at that address; the former is only returned if the memory is not readable. Hence, this already constitutes our basic memory oracle. To query the oracle again, the heap address in the AsmJSModule object is set to another value and an array access is performed anew. As we will demonstrate in Section 2.5.2, the complete virtual address space can be probed continuously.

2.3.4 Web Workers as Probing Agents

Web workers are a feature of modern browsers, intended to run as separate threads in a script environment. We found that web workers can also be used as memory oracles since they can be made crash-resistant. We developed a way to utilize web workers to deduce whether memory is accessible or not. In Listing 2.3, an attacker controls the wide-char buffer pointer of object strObj through the first element of the array object bufPtr. Triggering the vulnerability and initializing bufPtr is omitted in Listing 2.3, but as we show in Section 2.5, such powerful control is realistic and can be achieved with a single memory corruption such as a null byte write or a use-after-free vulnerability. The web worker is started on line 9 in main.html. On line 6 of worker.js, the function probeMemory() is dispatched in crash-resistant mode with setInterval(). This causes probeMemory() to be started repeatedly, but it stops at line 14 due to read access faults. It only runs further if the read attempt on line 14 succeeds. This occurs eventually: the read attempt starts at address 0x00 and is increased by 0x1000 bytes on each run, so four bytes of the first readable memory page are finally returned. The content is transferred from the worker to the context of main.html on line 17 and can be processed further in handleMessageFromWorker() in main.html.

2.3.5 Finding Unreachable Memory Regions

With the ability to probe memory in browsers, we can discover hidden memory areas like the Thread Environment Block (TEB) or the safe region used by CPI to store pointer metadata. Note that no references to these structures exist in memory, and hence, they are not locatable by simple memory disclosure attacks. The intuition behind our attack is that we can probe for specific information and these probes enable us to deduce if we have



1  // file worker.js
2  self.addEventListener('message', initProbe, true)
3  function initProbe(){
4    strObj = "AB"
5    pageStep = 0x1000; pageCount = 0
6    idProbeMemory = setInterval(probeMemory, 0)
7  }
8  function probeMemory(){
9    addr = pageStep * pageCount
10   /* increase WCHAR ptr of strObj via bufPtr */
11   bufPtr[0] = addr
12   pageCount++
13   /* try to read at address bufPtr[0] */
14   mem = strObj.substring(0,2)
15   /* return here only if addr was readable */
16   clearInterval(idProbeMemory)
17   postMessage({ firstPage: addr, content: mem })
18 }

Listing 2.3: Using web workers to find the first readable memory page in Internet Explorer 11

found the correct region. We thereby use neither control-flow hijacking nor code-reuse nor code-injection techniques as part of our attack. We first explain how we can find the TEB. An excerpt of the TEB structure is shown in Figure 2.1; the offsets apply to 32-bit processes. The structure within a TEB from offset 0x00 to 0x18 is known as the Thread Information Block (TIB) and contains a pointer to ExceptionList. This pointer points into a thread's stack, because the OS places at least one exception structure on the stack. Thus, the pointer's value lies between the values StackBase and StackLimit at offsets 0x04 and 0x08, respectively. Additionally, the field at offset 0x18 contains the address of the TEB/TIB itself. Thus, we can apply a simple heuristic to scan over the memory space and discover a TEB (see Algorithm 1). Probing for a TEB in a 32-bit process (e.g., an Internet Explorer tab process) starts at the end of the last user-mode page, 0x7ffffffc (tebMaxEnd). TEBs can reside somewhere in the address space between 0x78000000 and 0x80000000 [175]. No other structures except for the PEB and shared data are in that memory region. The call to setInterval() in startProbe() sets getTEB() as a timed function to execute permanently anew. An address is queried with a memory oracle (oracleProbe()) which either returns


Figure 2.1: Thread Environment Block: fields in bold can be utilized to discover a TEB in memory probing attempts

when the address is readable, or produces an access fault. In the latter case, getTEB() executes again and the address to probe is decreased by the size of a memory page. As soon as an address is readable, its 12 least significant bits are set to zero (setToPageBegin()). The timed execution of getTEB() is cleared with clearInterval() and specific fields are read via memory disclosures. If the fields conform to a TEB structure, success is set; otherwise, setInterval() sets getTEB() again to be executed in intervals. On success, the adversary can read any TEB or PEB information to abuse it in further malicious computations. The same method can be applied to 64-bit processes as well to discover the TEB: the offsets have to be adjusted to the 64-bit pointer size and the address of the last possible user-mode page, where probing starts, has to be modified. The algorithm can be extended to probe fields of the PEB in case the TEB heuristic triggers. This avoids false positives, which may occur on 64-bit, as TEBs are mapped below shared libraries.
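The heuristic of Algorithm 1 can be sketched as runnable JavaScript over a mocked 32-bit address space. The memory primitives and the fake TEB placement below are inventions for illustration; the field offsets follow Figure 2.1:

```javascript
// Discover a TEB via a (mocked) memory oracle.  isReadable() stands in for
// oracleProbe(); the page map, stack range, and TEB address are invented.
const PAGE = 0x1000;
const pages = new Map();                       // pageAddr -> Map(offset -> dword)
function mapPage(addr) { if (!pages.has(addr)) pages.set(addr, new Map()); }
function writeDword(addr, val) {
  const p = addr & ~0xfff;
  mapPage(p);
  pages.get(p).set(addr - p, val >>> 0);
}
function isReadable(addr) { return pages.has(addr & ~0xfff); }
function readDword(addr) {
  const p = addr & ~0xfff;
  return pages.get(p).get(addr - p) ?? 0;
}

// fake TEB at 0x7ffdf000 with a stack spanning 0x00200000-0x00300000
const TEB = 0x7ffdf000;
writeDword(TEB + 0x00, 0x002ff000);  // ExceptionList -> points into the stack
writeDword(TEB + 0x04, 0x00300000);  // StackBase
writeDword(TEB + 0x08, 0x00200000);  // StackLimit
writeDword(TEB + 0x18, TEB);         // Self
mapPage(0x7fff0000);                 // some other readable page (non-TEB)

function findTEB() {
  for (let addr = 0x80000000 - 4; addr > 0; addr -= PAGE) {
    if (!isReadable(addr)) continue;           // fault handled, keep probing
    const teb = addr & ~0xfff;                 // setToPageBegin
    const excList    = readDword(teb);
    const stackBase  = readDword(teb + 4);
    const stackLimit = readDword(teb + 8);
    const self       = readDword(teb + 0x18);
    if (self === teb && excList < stackBase && excList > stackLimit)
      return teb;                              // heuristic matched
  }
  return null;
}

console.log(findTEB().toString(16)); // "7ffdf000"
```

Note that the non-TEB readable page is probed first but rejected by the heuristic, mirroring the "continue probing" branch of Algorithm 1.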

2.3.5.1 Discovering CPI Safe Region

The linear table-based and hashtable-based 64-bit implementations of CPI rely on hiding the location of the safe region from an attacker [117]. In the linear table-based implementation, the safe region is 2^42 bytes (4 TiB) in size, out of the 2^47 bytes (128 TiB) of available virtual user-space memory on modern x86-64 processors. Trivially, an attacker can guess an address inside the safe region with a probability of 3.125%, but has no way of knowing where exactly this address is located in relation to the start of the region. Thus, she cannot deduce where the metadata for a specific pointer resides. Without a memory oracle, this provides an acceptable level of security. However, an attacker capable of probing memory can quickly find the exact location of the safe region without the risk of crashing the process. The safe region consists mostly of zero bytes, page-wise. Thus, we can distinguish a non-mapped address from an address containing one or more zero bytes. We use an approach that merely scans for zero bytes. If it locates a mapped address, it samples more addresses in the same page. This determines whether it is part of the safe region or if a false positive


Algorithm 1: Discover a TEB via memory oracles
Data: Globals: addrToProbe, pageCount, pageStep, tebMaxEnd, idGetTEB, teb
Result: address of TEB in teb

Function startProbe
    pageStep ← 0x1000
    tebMaxEnd ← 0x80000000 − 4
    pageCount ← 0
    idGetTEB ← setInterval(getTEB, 0)
end

Function getTEB
    addrToProbe ← tebMaxEnd − pageStep × pageCount
    pageCount ← pageCount + 1
    oracleProbe(addrToProbe)
    /* at this point probing succeeded */
    clearInterval(idGetTEB)
    teb ← setToPageBegin(addrToProbe)
    /* read TEB-specific fields */
    ExcList ← readDword(teb)
    StackBase ← readDword(teb + 4)
    StackLimit ← readDword(teb + 8)
    tebSelf ← readDword(teb + 0x18)
    /* heuristic to identify a TEB */
    bool isTEB ← (teb == tebSelf)
    if isTEB ∧ (ExcList < StackBase) ∧ (ExcList > StackLimit) then
        success ← 1
    else
        /* we found other readable memory */
        /* continue probing for a TEB */
        idGetTEB ← setInterval(getTEB, 0)
    end
end

was hit. Due to the sparsely populated region, this yields correct results under nearly all circumstances. Evans et al. [77] also observed this behavior in their work. After we hit the safe region, we still have no knowledge about where exactly it begins. As we can safely cause access violations due to the crash-resistance, we employ a binary search downward from this address until we find the first page. The algorithm works due to the fact that an access to an address before the safe region will cause a fault, while an access anywhere after the start of the mapped area will not. Consequently, we can approximate the beginning of the region by halving the error margin with every step. The maximum number of probes with binary search is log2(n), with n being the number of elements to search in. There are 4 TiB / 4 KiB = 1,073,741,824 possible pages containing


the start of the safe region. This means we need a maximum of log2(1,073,741,824) = 30 tries after we have located an arbitrary address in the safe region to determine its start. Assuming the worst case, we need 32 (128 TiB / 4 TiB) probes to locate the safe region and afterwards 30 probes to locate the exact starting address. To decrease the likelihood of erroneously marking an address containing zero bytes that does not belong to the safe region, the algorithm can be modified to sample more addresses in the same page. With the ability to alter the metadata of any pointer we want, the protection of CPI can be circumvented, as we can simply set the value allowing the action we need to perform with the pointer. Note that the attack assumptions (i.e., a read and write primitive as well as an information leak) required for our memory oracle are within the threat model of CPI.
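The downward binary search can be sketched as follows. The region placement and sizes are invented and scaled down from 128 TiB so the numbers stay readable, and probe() stands in for a crash-resistant memory oracle; the log2(n) probe bound is what carries over:

```javascript
// Binary search for the first page of a hidden, mapped region.
// probe() returns true iff the page is mapped (a fault would be handled).
const PAGE = 0x1000;
const REGION_START = 0x1f4000;                 // hypothetical, page-aligned
const REGION_SIZE  = 0x100000;                 // "safe region", 1 MiB here

let probes = 0;
function probe(addr) {                         // access fault == not mapped
  probes++;
  return addr >= REGION_START && addr < REGION_START + REGION_SIZE;
}

// hit: an address already known to lie inside the region (from linear scan)
function findStart(hit) {
  let lo = 0;                                  // invariant: page lo unmapped
  let hi = Math.floor(hit / PAGE);             // invariant: page hi mapped
  while (hi - lo > 1) {
    const mid = Math.floor((lo + hi) / 2);
    if (probe(mid * PAGE)) hi = mid;           // mapped: start is at or below
    else lo = mid;                             // fault handled: go higher
  }
  return hi * PAGE;
}

const start = findStart(REGION_START + 0x8000);
console.log(start.toString(16), probes);       // "1f4000", about log2(pages)
```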

2.3.6 Subverting Hidden Code Layouts

Data structures related to exports are an essential aspect of dynamic loading. These data structures contain the addresses of functions which other modules are allowed to import, as explained in the following. We first cover this background before discussing the challenges it introduces for defenses.

2.3.6.1 Dynamic Loading

Windows as well as Linux provides legitimate methods to load shared libraries into a running process. This procedure is known as dynamic loading. Shared libraries in Windows contain an Export Address Table (EAT) with pointers to exportable functions. This structure is often accessed by legitimate code during a program's runtime and not only at load time. For example, the Windows API function GetProcAddress() needs only the module base and the function name to retrieve a function address. It reads the module's Portable Executable (PE) metadata until it discovers the appropriate function and returns its address. Hence, knowing a module's base address is sufficient to retrieve any of its exportable functions. Linux provides a similar API: dlopen() can be used to load a shared library into a running process and dlsym() returns the address of a needed symbol (e.g., a function).

The key observation is that export symbols and export addresses are available throughout the complete runtime of a process. This is necessary because a library loaded dynamically at runtime may import functions which are exported by system libraries like ntdll.dll (Windows) or libc.so (Linux). Therefore, exports in system libraries are inevitable. Dynamic loading is especially important in web browsers: Firefox implements a plugin architecture to load desired features on the fly. Similarly, Windows implements the Component Object Model (COM), which is indispensable for ActiveX plugins in Internet Explorer [43]. Note that disabling dynamic loading is not an option in practice, as it would break fundamental functionality and compatibility, and would require loading all libraries at startup of a process. To the best of our knowledge, there is no defense which protects export symbols against illegal access. However, Export Address Table Filtering Plus (EAF+) of


EMET [132] forbids reading export structures based on the origin of the read instruction. We show in Section 2.5.1 that this is only a small hurdle in practice.

2.3.6.2 Leveraging Crash-Resistance to Subvert Hidden Code Layouts

In the case of code pointer hiding, which is utilized by Readactor [51], function addresses are hidden behind execute-only trampolines which mediate execution to the appropriate functions. Thus, their start addresses cannot be read directly. However, with crash-resistance, an adversary can discover the TEB without control-flow hijacking. After she reads the information in a TEB, she can read the base addresses of all modules out of the PEB. Another option besides TEB discovery is to sweep through the address space in crash-resistant mode: as the PE file header starting at a module's base is characteristic, memory oracles can provide modules' base addresses. Furthermore, by utilizing memory disclosures, the attacker can resolve the trampoline addresses corresponding to exported functions. She can then chain together several trampoline addresses to perform whole-function code reuse, as we will demonstrate later on.

2.4 Conquering (Re-)Randomization

Randomization of the memory layout or the code itself has been proposed by various works (e.g., [98, 155, 213]) and much attention has been paid to their security and effectiveness. The latest outcome of this evolution are fine-grained re-randomization schemes such as Isomeron [65], which aims at preventing code-reuse primitives (see Section 2.2.2.3 for details). When utilizing crash-resistance, an adversary can abuse weak points in these defenses.

2.4.1 Defeating Fine-Grained Re-Randomization

In the case of Isomeron, re-randomization is applied to the layout of the code. Hence, at two different points in time, one of two different code versions can be used for a specific execution flow. However, to the best of our knowledge, data is not re-randomized at all. Thus, constant data is a foothold for an adversary to undermine the security guarantees of Isomeron, as we discuss in the following.

2.4.1.1 Constant Structures

As explained above, knowledge of a module's base address is sufficient to resolve any of its exported functions. While re-randomizing the code layout during runtime can be performed efficiently, re-randomizing the layout of data structures and the base addresses of modules has yet to be shown. Moreover, the PE metadata layout of a module, needed to discover export functions, must stay consistent such that legitimate code can traverse it. Dynamic loading crucially relies on this aspect. The same holds for the metadata of the ELF file format. The potential shortcoming is that an attacker can read that metadata with memory disclosures as well. Re-randomizing the metadata such that its field offsets change would require adjusting the legitimate code which accesses it. Additionally,

data structures in other modules which reference the metadata would need to be updated, too. Thus, we assume that re-randomization of data structures allocated in large numbers across the complete virtual address space is a challenging and yet unsolved task.

2.4.1.2 Pulling Sensitive Information

A TEB also contains a pointer to the process' PEB. One of its fields, namely PPEB_LDR_DATA LoaderData, contains the base addresses, names, and entry points of all loaded modules. We extract that information after we have found a TEB with memory oracles. Then we can traverse the PE metadata of each module and retrieve all exported functions, independent of the randomization applied. We therefore read the individual PE fields with memory disclosures and follow the specific offsets until we reach the EAT. Then, we can loop over the function names and extract the function addresses. As noted above, EMET applies filters to EATs, such that only legitimate code can traverse them. These are ineffective in practice and can be bypassed, as we show in Section 2.5.1.
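The resolution step can be sketched with radically simplified, mocked loader data. Real code walks the PE headers down to the export directory; here, module names, bases, and export RVAs are invented, and the module list stands in for the leaked LoaderData:

```javascript
// Simplified sketch of resolving exports from leaked loader data: the
// module list yields base addresses, and each export address is computed
// as base + RVA, as a GetProcAddress-like lookup would.  All values are
// invented for illustration.
const modules = [                              // stand-in for LoaderData
  { name: "ntdll.dll", base: 0x77000000,
    exports: { NtContinue: 0x1a0, NtAllocateVirtualMemory: 0x2b0 } },
  { name: "kernel32.dll", base: 0x75000000,
    exports: { WinExec: 0x4c0 } },
];

function resolveExport(moduleName, funcName) {
  const m = modules.find(x => x.name === moduleName);
  if (!m || !(funcName in m.exports)) return null;
  return m.base + m.exports[funcName];         // base + export RVA
}

console.log(resolveExport("ntdll.dll", "NtContinue").toString(16)); // 770001a0
```

The crucial point is that this lookup works regardless of where code was (re-)randomized to, because the bases and the export metadata themselves stay readable.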

2.4.2 Code Execution under Re-Randomization

Abusing a memory corruption vulnerability in an address space which is randomized in a fine-grained way on the instruction level, and which is additionally re-randomized before each function call, is very challenging. To further harden exploitation, indirect calls are only allowed to dispatch functions, and returns can only target instructions which are preceded by call instructions. This is consistent with coarse-grained CFI like Microsoft's Control Flow Guard, BinCFI [229], CCFIR [228], or the code-reuse protections in EMET [82, 154]. Thus, known code-reuse primitives such as ROP or Call-Oriented Programming (COP [40]) are not an option. Return-to-libc is inappropriate as well, since a shadow stack can detect such attacks. However, as we can retrieve all exported functions of all modules via crash-resistance and memory oracles, we opt to chain exported functions in a call-oriented manner. As re-randomization preserves the semantics of functions independent of the code (layout) mutations, they are reusable in a consistent way. The basic idea is to invoke exported functions which dispatch other exported functions at indirect call sites. Ultimately, an adversary can achieve the goal of executing her code of choice.

2.4.2.1 Discovering Functions for Code-Reuse

At the point of control-flow hijacking, when the adversary dispatches her first function of choice, we assume that she controls the memory of the first argument (see Section 2.5 for details). Thus, we want functions which contain indirect calls whose targets can be controlled with values derived from the first argument. To find candidate functions usable for function chaining, we apply static program analysis and symbolic execution. An executable module is disassembled, its Control Flow Graph (CFG) is derived, and all exported functions are discovered. We then mark all indirect calls in them. In the next step, we extract the shortest execution paths between the beginning of a function and its indirect calls. We utilize the symbolic execution functionality of miasm2 [125] on the gathered paths to detect if the first argument to the function influences the target of

the indirect call. If this is the case, we symbolically propagate the potential arguments the function receives to the potential parameters a function may take when dispatched at the indirect call site.

arg5   ← EBPin
arg4   ← EBXin
arg3   ← ESIin
arg2   ← ARG3 + 0x10
arg1   ← ARG1
EAXout ← ARG3 + 0x10
ECXout ← ARG3
EBXout ← ARG1
EIPout ← [ARG1 + 0x2C]

Figure 2.2: Propagation summary for RtlInsertElementGenericTableFullAvl(). REGin are registers which are not redefined until the indirect call.

Figure 2.2 illustrates the concept of argument propagation to an indirect call instruction inside RtlInsertElementGenericTableFullAvl in the NT Layer DLL. ARGn are arguments the function receives via the stack. At the indirect call site, memory at ARG1 + 0x2C is taken as the call target EIPout. Additionally, arguments are propagated to parameters for the callee (argn). For example, the first argument ARG1 becomes the first parameter for the callee, and ARG3 is increased by 0x10 to become the callee's second parameter arg2. Such propagation summaries for exported functions serve as a basis to build code-reuse function chains. The ultimate goal is to control the parameters of the last function, which eventually performs the operation wanted by the adversary.
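Applying the propagation summary of Figure 2.2 to concrete values can be sketched as follows; the addresses and the read primitive are invented for illustration:

```javascript
// Applying the Figure 2.2 summary: given attacker-controlled memory for the
// first argument, compute where control flow goes and what the callee gets.
// read32() mocks a memory disclosure; all concrete values are invented.
const mem = new Map();
const fakeObj = 0x0c0c0000;                  // hypothetical controlled address
mem.set(fakeObj + 0x2c, 0x77001234);         // call target placed at ARG1+0x2C
const read32 = a => mem.get(a) ?? 0;

function summaryRtlInsert(arg1, arg2, arg3) {
  return {
    callTarget: read32(arg1 + 0x2c),         // EIPout <- [ARG1 + 0x2C]
    calleeArg1: arg1,                        // arg1   <- ARG1
    calleeArg2: arg3 + 0x10,                 // arg2   <- ARG3 + 0x10
  };
}

const r = summaryRtlInsert(fakeObj, 0, 0x1000);
console.log(r.callTarget.toString(16), r.calleeArg2.toString(16)); // 77001234 1010
```

Chaining then means picking functions whose summaries connect: the callee parameters produced by one summary must match the inputs the next summary needs.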

2.4.2.2 Crash-Resistant Oriented Programming (CROP)

Besides function chaining, an adversary can also utilize the crash-resistance feature to sequentially execute exported system calls or exported functions. Each call is thereby triggered from JavaScript and ends with a fault. As faults are handled, a new call to another exported function or function chain can be prepared and issued, as we explain in the following. The exported function NtContinue() in ntdll.dll can be used to set a register context [221]. This context is taken as the first parameter by NtContinue() and registers are set such that program execution continues within that context. At the point of control-flow hijacking, which starts a function chain of choice, NtContinue() is eventually dispatched as the last function in our chain. It takes a propagated argument field as its only parameter PCONTEXT. In the PCONTEXT parameter, we let the stack pointer point to attacker-controlled memory and the instruction pointer to an exported function like NtAllocateVirtualMemory(). The return address for the function is set to NULL in the controlled memory. NtContinue() sets the register context and the function of choice (e.g., NtAllocateVirtualMemory()) executes successfully. Upon its return, an access fault is triggered as it returns to NULL. However, this fault is handled and the browser continues running. This way, exported functions or syscalls can be dispatched one after another in crash-resistant mode. Similar to our scanning technique shown in Section 2.3.5, this happens within JavaScript with setTimeout() or setInterval(). We term this technique Crash-Resistant Oriented Programming (CROP); in spirit, it is similar to sigreturn-oriented programming (SROP) [28], as we discuss in Section 2.6.

2.5 Implementation

To demonstrate the practical viability of the methods discussed in the previous sections, we developed proof-of-concept exploits for Internet Explorer (IE) 10 on Windows 8.0 64-bit, for IE 11 and Firefox 64-bit on Windows 8.1 64-bit, and for Firefox 64-bit on Ubuntu 14.10 (Linux 3.17.1). IE has a multi-process architecture whose tab processes run in 32-bit mode. We utilized CVE-2014-0322, a use-after-free bug in IE 10 which allows incrementing a byte at an attacker-controlled address. For IE 11, we utilize an introduced vulnerability which only allows writing a null byte to an attacker-specified address. The general procedure we utilize to ultimately execute code consists of the following six steps:

1. Trigger the vulnerability to create a read-write primitive usable from JavaScript.

2. Utilize the primitive as a memory disclosure feature to leak information accessible via memory disclosures.

3. Use the primitive as a memory oracle to find constant hidden memory such as the TEB or module base addresses.

4. Traverse the modules’ EATs and extract exported functions.

5. Prepare attacker-controlled objects and set up the function chain.

6. Invoke a JavaScript function to trigger execution of the first function in the chain at an indirect call site.

2.5.1 Exploiting IE without Knowledge of the Memory Layout

We use heap feng shui to align objects at predictable addresses [218]. The use-after-free vulnerability in IE 10 and the null byte write in IE 11 are used to modify a JavaScript number inside a JavaScript array. In IE, a generic array keeps array elements in different forms, depending on their type. Numbers lower than 0x80000000 are stored as element = (number << 1) | 1. In contrast, objects are stored as pointers, and their least significant bit is never set. We use the vulnerability to modify an element which represents a number. This way, we create a type confusion and let the number point to memory of our choice (see Figure 2.3). We control 0x400 bytes at that location and can read and write them with byte granularity. We craft a fake Js::LiteralString object, including the buffer

pointer to any address we want, a length field, and the type flag1. When the modified number element is accessed, IE interprets it as a string object. This way, we can use the JavaScript function escape() on that element to retrieve the data the string's pointer points to. This functionality is used to (i) probe addresses we set in our fake string object with crash-resistant memory oracles (see Section 2.3.3) and (ii) read memory content at addresses which are readable.
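The element encoding described above can be demonstrated directly. This is a sketch of the tagging scheme as stated in the text, not of actual IE internals:

```javascript
// Tagged array elements: numbers below 0x80000000 are stored as
// (n << 1) | 1, so the least significant bit distinguishes a tagged
// number (LSB set) from an object pointer (LSB always clear).
const tagNumber   = n => ((n << 1) | 1) >>> 0;
const isNumber    = e => (e & 1) === 1;
const untagNumber = e => e >>> 1;

const elem = tagNumber(42);
console.log(elem, isNumber(elem), untagNumber(elem)); // 85 true 42

// an aligned pointer keeps its LSB clear, so it is not taken for a number
const ptr = 0x0c0c0c08;                  // hypothetical object address
console.log(isNumber(ptr));              // false
```

Flipping exactly this distinction is what the type confusion exploits: a value with a clear LSB is dereferenced as an object pointer.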

Figure 2.3: After modifying a number, IE interprets it as pointer to an object (fakePtr). As it points to a JavaScript array, elements can be set and fake objects can be created. By varying the buffer pointer (bufPtr), a fake string object can be used for crash-resistant memory probing attempts.

2.5.1.1 Memory Probing

After setting the scene, we probe with page granularity for a TEB, starting from 0x7ffffffc, and extract the addresses of all exported functions. Optionally, we probe with a granularity of 64 KiB (module alignment) and check for the DOS header and PE header in case probing returns readable memory. Similar to the former, this circumvents re-randomization schemes which do not re-randomize metadata in a mapped module. As another hurdle, Export Address Table Filtering Plus (EAF+) of Microsoft EMET [132] needs to be bypassed, too, since it checks read attempts on the export metadata of mapped PE modules. If the read originates from illegitimate instructions, the program is terminated. This should prevent reading export or import metadata from JavaScript. Therefore, modules which are not allowed to access this metadata are blacklisted. However, we discovered that applying escape() to large string objects triggers memory copying instructions in whitelisted modules (e.g., msvcrt.dll). Thus, we can simply copy a complete PE module into a JavaScript string by using escape() on our fake Js::LiteralString object. Similarly, other JavaScript functionality can be used to create a copy of a PE module without triggering read instructions from blacklisted modules [83]. Finally, we can resolve exports within the copy of the module.

2.5.1.2 Code Execution and Function Chaining

With all exported function addresses available, we craft a fake vtable and insert the addresses of five exported functions into the fake object. We dispatch a JavaScript method of the fake string object, which triggers a lookup in the vtable and a dispatch of the first exported function in our chain. Thereby, the first function also receives our fake object as its first argument, such that we control two parameters of the last function in the chain

1 Most JavaScript objects are C++ objects and contain a vtable pointer. As we do not know the location of any module’s data yet, we do not set it. However, accessing the fake string object with e.g. escape() still works.


(LdrInitShimEngineDynamic). The chain propagates our first controlled argument P in a way that

LdrInitShimEngineDynamic([P+0x08]+0x20, [P]+0x18) is executed. If the first parameter points to any module, the second parameter can specify a string pointer to a DLL name. This DLL can reside on a remote server and is loaded into the address space of the program. Hence, the adversary reaches her goal of code execution. We opted to use LdrInitShimEngineDynamic because neither EMET [132] nor CCFIR [228] blacklists the function, while dynamic loading of remote DLLs via the standard Windows API is normally monitored. We can also use WinExec([P+0x08]+0x20, <*>) with a chain of four functions to achieve the execution of arbitrary programs.

2.5.2 Memory Probing in Mozilla Firefox

Next, we describe the steps we used to scan memory in 64-bit Mozilla Firefox on Windows 8.1 and Ubuntu 14.10 (Linux 3.17.1). All steps are also applicable to any other 64-bit application embedding the SpiderMonkey engine, for example Mozilla Thunderbird with asm.js enabled. We introduced a vulnerability into Firefox 38.0 to simulate a real-world bug. It allows leaking information into the JavaScript context and writing to memory addresses. With these primitives, we show the creation of crash-resistance and the feasibility of memory oracles to scan arbitrary memory. In contrast to Internet Explorer, we do not need to rely on setInterval() and web workers. Instead of creating a fake object, we change fields in Firefox's object metadata to obtain crash-resistance. However, web workers can be used to increase the performance, especially since calling into and out of asm.js is an expensive operation. The main bottleneck is the handling of the generated faults, because each exception is delivered with four context switches.

Figure 2.4: Location of the asm.js heap pointer (heapLoc) which needs to be modified in order to gain crash-resistance.

2.5.2.1 Manipulating the Function Object

We let Firefox create an asm.js function by utilizing the asm.js subset of JavaScript. This leads to an AsmJSModule memory object. We then overwrite the heap location to point to the memory region we want to scan. To achieve this goal, we use an information leak to first

deduce the location of the JSSValue object, which constitutes the function reference. Then, we utilize targeted reads to learn the location of the heap address in the AsmJSModule. We then set the heap location to the region we want to scan (see Figure 2.4). It is possible to use the function object several times: by setting the heap location to another address for each probe, we use the ahead-of-time compiled asm.js code repeatedly.

2.5.2.2 Probing the Address Space

A loop in JavaScript is utilized which calls into asm.js and uses the crash-resistant functionality of asm.js to probe target regions. We use an asm.js function which returns the 64-bit double float value at a given address. The target region may contain entries that are interpreted as NaN: due to the way floating point numbers are encoded, the result is NaN if a value with the highest 11 bits set to 1 is hit. This cannot be distinguished from a faulting read attempt, as that yields NaN as well, and requires additional byte-shifted probes around the address to verify that the page is indeed mapped. By retrieving a value which is not NaN, we are certain that the page is mapped. We are able to scan 4 GiB beginning from the heap start. Once this space is scanned, the heap address of the asm.js module needs to be adjusted and scanning can continue. Care has to be taken to only perform probes which attempt to read out of bounds of the asm.js heap size. The heap size is specified on creation of the asm.js object. Normally, when the heap pointer is not modified, in-bounds accesses do not throw an exception, while out-of-bounds accesses do. As we modified the heap location to point to an unmapped address, in-bounds heap accesses will throw exceptions, and these are not crash-resistant. Simply moving the initial heap location to a lower address and scanning with an offset from the targeted address avoids this problem. Thus, only out-of-bounds probes are utilized, as these do not terminate Firefox.

2.5.2.3 Determining Memory Contents

Once a mapped page is found, we need to determine what it contains. When sweeping the complete memory, we can hit shared modules, data structures like the TEB, or application heaps. Learning what memory contains can be done in regular JavaScript by utilizing the capability to call fully featured JavaScript functions from asm.js. We can therefore use the same heuristics used with Internet Explorer to, for example, safely deduce a TEB. Finally, the same techniques used in Internet Explorer can be utilized to gain code execution.

2.5.3 Memory Scan Timings

We used performance.now(), the high-resolution performance counter of JavaScript. We performed 268,369,911 probes with Firefox (Windows and Linux) and 32,768 probes with IE on unmapped memory to measure the probes per second a single browser thread can achieve. We observed different scanning rates between the two tested browsers. 32-bit Internet Explorer was only able to reach 63 probes/s, while Firefox was able to scan with 718 probes/s on Windows and 18,357 probes/s on Linux on average. However, this includes optimizations as explained below.


The difference in scanning speeds is caused by the different methods used for probing. While Internet Explorer needs to spawn a new JavaScript function with setInterval() for every probe, Firefox uses ahead-of-time compiled asm.js code, which causes significantly less overhead. We were able to move parts of the JavaScript scanning loop into the asm.js code, providing another speedup. This is due to fewer calls into the asm.js subset, which are expensive. We provide only maximum scanning times, as these are already small enough to be practical for an adversary. In 32-bit Internet Explorer it takes at most (0x80000000 − 0x78000000)/0x1000 = 32,768 crash-resistant probes to locate the TEB. Thus, the maximum scanning time is (32,768/63) s = 520.1 s (8.7 minutes). To locate the uppermost mapped DLL, at most 2^8 = 256 probes are necessary. This is due to the module base entropy of 8 bits, the 64KB alignment of modules, and the address scan range from 0x77000000 to 0x78000000 in which at least one module resides. This yields a maximum scanning time of (256/63) s = 4.1 s. In 64-bit Firefox for Windows, we scanned for PE metadata of mapped modules. To locate the DLL mapped on top of the address space, it takes at most 2^19 = 524,288 probes due to a module base entropy of 19 bits in 64-bit processes. Thus, the maximum scanning time is (524,288/718) s = 730.2 s (12.2 minutes). Scanning starts at the top user-mode address of 0x7FFFFFE0000 and is performed toward lower addresses in 64KB steps. In 64-bit Firefox for Linux, we focus on finding reference-less hidden memory. An instantiation of reference-less memory is the linear safe region of the 64-bit implementation of CPI. Locating the safe region of CPI can be done in very few steps. As outlined earlier, we first probe for a location in the region and then use binary search to locate the exact starting address. As this requires less than 1,000 probes, it is almost instant ((1,000/18,357) s = 0.05 s).
The difference in probes/s between the Windows and Linux version of Firefox is due to the fast signal handling in Linux in comparison to the exception handling in Windows. Speed increases further when spawning several workers that perform the scanning. This is due to multiple cores on modern CPUs, which run the worker threads in parallel.

2.6 Related Work

We review research related to our offensive approach of evaluating information-hiding and diversification schemes. Back in 2004, Shacham et al. [184] showed the ineffectiveness of ASLR on 32-bit systems due to its susceptibility to brute-force attacks. Their work suggested defense mechanisms like subsequent re-randomization. While their approach targeted servers on 32-bit systems, we show that similar capabilities are possible with web browsers on 32-bit and 64-bit platforms. Snow et al. introduced just-in-time code reuse (JIT-ROP) [189], which repeatedly utilizes an information leak to bypass fine-grained ASLR implementations. The authors suggest frequent re-randomization at runtime as a possible solution. Bittau et al. [23] proposed another interesting flavor of ROP attacks which they called Blind ROP (BROP). The authors show how stack buffer overflows can be utilized to bypass ASLR and conduct code-reuse attacks remotely. BROP uses server crashes as a

side channel which, in turn, reveals information about the memory layout. By locating and arranging specific gadgets remotely, they trigger a write over a socket that transfers the binary to the attacker to find more gadgets. Our work is different in that it focuses on browsers, which have a hard crash policy. Nevertheless, with crash resistance and memory oracles we are able to undermine memory secrecy. Seibert et al. introduced another approach on the Apache server by reading bytes and measuring the access time [181]. It turns out that specific bytes leave different timing patterns and thus, probed in sequence, reveal information about the memory layout. Our work differs in that we introduce fault-tolerant functionality in browsers, which has not been shown before. Its result, however, is similar, in that we can deduce memory which is not locatable by simple memory disclosures. Recent research demonstrates that coarse-grained CFI variants are prone to code-reuse attacks [64, 89, 178]. Schuster et al. introduce Counterfeit Object-Oriented Programming (COOP) [177], which ranks among these code-reuse attacks. The authors manage to bypass many CFI defense mechanisms by using chains of existing C++ virtual function calls. The drawback is that semantic-aware C++ defenses check virtual function table hierarchies and prevent COOP. In contrast, we present a different function-reuse technique in addition to the contributions of crash resistance and memory oracles. It uses exported function chains and C-like indirect calls instead of virtual function calls. Thus, C++ defenses are insufficient against it. Furthermore, we combined function chaining with fault-tolerance to gain a novel function-reuse technique named Crash-Resistant Oriented Programming (CROP). In 2014, Kuznetsov et al. introduced CPI [116]. As discussed before, CPI is prone to data pointer overwrites: Evans et al.
showed that such overwrites can be utilized to launch timing side-channel attacks that lead to information leakage about the safe region [77]. Similarly, we can deduce the reference-less safe region. However, we show that it is possible within Firefox, which normally does not allow faults, while Evans et al. utilize the web server Nginx, which respawns upon a crash. Function chaining is related to entrypoint-gadget (EP-gadget) linking, in which functions are chained together via indirect call instructions [89]. Nevertheless, we additionally perform an analysis to gain knowledge about parameter propagation of the chained functions. CROP is in spirit similar to sigreturn-oriented programming (SROP [28]), as we can set register contexts as well. While SROP is only possible on Linux, we can utilize crash resistance to perform arbitrary exported function chaining and system call dispatching on Windows in a fault-tolerant way.

2.7 Discussion

In the following, we discuss the implications and limitations of crash-resistance and memory oracles. Additionally, we elaborate on potential countermeasures and design choices to thwart fault-tolerant memory scanning.


2.7.1 Novel Memory Scanning Technique

The existence of reliable and fast memory oracles enables an attacker to bypass all defenses that rely in any way on metadata stored in userspace. A common approach was to keep such data in reference-less memory, so that an attacker would need to hijack the control flow, inject code, or perform code-reuse attacks before disabling that protection. This implies that the defense also protected itself. However, we show that hidden information in userspace can be found by an attacker without control-flow hijacking, code injection, or code-reuse attacks. While this primitive alone does not allow an attacker to exploit an application, it provides a valuable addition to her arsenal. It is an advantage when simple memory disclosures are not an option. Hence, it might allow circumventing previously effective defense mechanisms. With the knowledge obtained by crash-resistant address space scanning, an attacker can overwrite data considered to be unreachable by adversaries. If such data serves as metadata for protection mechanisms, overwriting it can enable the successful execution of other exploit stages. This might enable control-flow hijacking again or might endanger the security provided by shadow stacks [59]: modifying a reference-less shadow stack after it is discovered with memory oracles might allow traditional code-reuse attacks again. Another notable example is the reference-less safe region used by 64-bit CPI implementations. CPI is able to prevent control-flow hijacking exploits, but altering the safe region's metadata effectively disables it. This allows control-flow hijacking and thus the realization of traditional code-reuse attacks such as ROP. However, CPI also provides a Software Fault Isolation (SFI [212]) and a hash table-based implementation of the safe region [117]. While the SFI version is immune to memory oracles, it has an additional performance overhead of about 5%.
The hash table-based version is located in userspace and can have a size of 2^30.4 bytes, while the original linear safe region has a size of 2^42 bytes. According to Kuznetsov et al. [117], it requires around 51,000 probes to locate the hash table-based safe region. In 64-bit Firefox, we achieve a rate of 18,357 probes per second. Thus, locating the safe region would still be fast, taking only 2.78 seconds. As the 32-bit safe region is protected by segmentation, we cannot reach it with memory oracles. Recent work named Readactor++ [52] protects C++ virtual function call sites. We do not claim to have bypassed Readactor++. However, we weakened it in the sense that we can leak information about the memory layout with memory oracles. More specifically, we can extract trampoline addresses corresponding to exported functions. Note that register or data-flow randomization is insufficient as a defense against function chaining: function prototypes of exported functions are mostly documented to ease their usage by a programmer. Thus, the number of arguments and their types are known. If an exported function propagates fields of its first argument structure to parameters for a function at an indirect call site, the propagation is unaffected by register or data-flow randomization. As the propagated fields constitute parameters, they always need to be pushed onto the stack or put into the parameter registers specified in the ABI. Shuffling the parameters makes it necessary to adjust the parameter handling of each function which is allowed at the indirect call site. To our knowledge, this is not done by current defenses [51, 155].


The speed of memory scanning with memory oracles currently varies across browsers and platforms. This is due to a) the way they are implemented and b) the runtime overhead of the exception/signal handlers. Firefox 64-bit on Linux achieves the fastest scanning, as ahead-of-time compiled asm.js code is used, which intentionally uses exceptions for bounds checks. Additionally, signal handlers on Linux are faster than exception handlers on Windows. In contrast, the fault-tolerant feature in Internet Explorer is harnessed with code normally used to execute JavaScript timer functions. Thus, much boilerplate code is executed and slows down scanning, in addition to the SEH exception handling overhead. An increase in performance might be gained with typed arrays, as element accesses map to array element accesses on the assembly level. Currently we use a fake string object in Internet Explorer. With asm.js coming to Internet Explorer on Windows 10 [131], it might be possible to increase the speed further. We currently only make use of fault-tolerant functionality based on exception/signal handling for crash-resistance, memory oracles, and memory scanning. While we show their existence and powerful advantages, crash-resistance might also be achievable with system calls or functions intended to query memory information. We hope that future work will reveal more crash-resistant functionality for different purposes such as CROP (see Section 2.4.2.2). Automated approaches utilizing static analysis might simplify that process, such that legitimate crash-resistant code paths become controllable by an attacker without control-flow hijacking [208].

2.7.2 Design Choices, Countermeasures, and Defenses

Several choices can be made to prevent crash-resistance. Single instances of crash-resistance are fixable. For example, we do not see any legitimate uses of the crash-resistance in Internet Explorer. In fact, after cooperation with Microsoft it was determined that this issue is security relevant and affects Internet Explorer 7 to 11 and the Microsoft Edge browser (see CVE-2015-6161). It was fixed for Microsoft Edge during the Patch Tuesday cycle in December 2015, and hardening settings for Internet Explorer were made available [134]. In the past, a few vulnerabilities had the ability to survive crashes, and thus adversaries were able to trigger them several times in order to bypass ASLR or to increase the chances of successful exploitation [114, 193]. Hence, Microsoft's Security Development Lifecycle (SDL) suggests avoiding global exception handlers which can catch all violations [126]. Note that single instances of buffer overflow vulnerabilities can be fixed as well, while the class of buffer overflows cannot easily be eliminated completely. Crash-resistance is similar, and we argue that constructing a memory oracle is possible on every modern system which offers applications a way to handle faults.

2.7.2.1 Crash Policies

A general countermeasure is to limit the number of faults that can be caused. This means an attacker must find ways to reduce probing attempts and hit the right location in one of her first scans. However, this only provides a probabilistic solution, as there is a small chance for the first probe to succeed. In addition, a hard crash policy can interfere with use cases where legitimate exceptions can occur and are expected. As described

in Section 2.3.2.2, Firefox leverages exceptions for fast array accesses to avoid bounds checking. An attempted out-of-bounds read is caught with the help of the exception handler and returns a default value (undefined). Removing exception support would decrease the performance, because additional bounds checks would have to be performed for every array access.

2.7.2.2 Accurately Checking the Exception Information

The most effective countermeasure against crash-resistance is to accurately check the exception information of a triggered fault. The triggered exceptions we used in Internet Explorer for crash-resistance allow any fault to be used as a side channel. Exception handlers should catch only faults which are expected in guarded code. Therefore, the exception type should be inspected carefully, as well as the address of the instruction which caused the fault. This information is necessary to verify that only the intended faults are caught. Additionally, it is necessary to make sure that guarded code cannot throw other exceptions. While it might be difficult to always handle all faults accurately, unintended faults should always be forwarded unhandled. This way, the operating system can safely terminate the program and prevent crash-resistance. Note that the fault handler of Firefox performs rigorous checks on the data provided by the OS. This includes information about the address of the instruction causing the fault, the error code, and the exception type. Thus, we needed to modify metadata in the process in order to trick the checks before triggering a fault. Exception information is processed differently on Windows and Linux. Linux can differentiate whether a fault occurred due to unmapped memory or due to an access with wrong permissions. As such, exception handling in the asm.js functionality of Firefox can utilize this subtlety to prevent crash-resistance. This is a good example of fixing a single instance of crash-resistance: the Firefox developers added such a check to the asm.js exception handling in Firefox 39 [142]. Accesses to unmapped memory are not handled anymore; only accesses with wrong permissions are. As a guard region follows the asm.js heap, bounds checks can still be performed with exceptions.
As the fix was not flagged as security-relevant, the developers eliminated a security issue without being aware of it. However, crash-resistance within asm.js remains in the Windows version of Firefox.

2.7.2.3 Using Guard Pages to Prevent Probing

We realized in our tests with Firefox on Windows that accesses to the guard pages around the stacks were not crash-resistant. Guard pages around the stack normally prevent stack overflows. An access to a guard page delivered an error code different from the error code of a heap guard region access, and this was not handled by the asm.js handler. By placing guard pages around critical structures, scanning attempts performed by an attacker can be detected. The program can then be terminated immediately whenever an illegal access is detected. The difference in the exception code allows distinguishing potentially intended faults from exceptions caused by an attacker. However, the same fault in Internet Explorer still allows complete crash-resistant memory scanning, as any fault is handled. Thus, probing an unmapped page yields the same result as touching a guard page.


2.7.2.4 Defenses against Crash-Resistance

SoftBound [143] and CETS [144] are memory corruption defenses and memory safety solutions for C programs. The former provides spatial safety, while the latter prevents temporal bugs. As memory corruptions are eliminated, our current approach to crash-resistance is not possible in C programs. However, most parts of Firefox and Internet Explorer are written in C++, which SoftBound and CETS do not support.

2.8 Conclusion

In this chapter, we demonstrated that even client applications such as web browsers can be resistant to crashes. We showed that an adversary can safely query the address space, which is normally not legitimate and should lead to program termination. We thereby do not rely on control-flow hijacking, code injection, or code-reuse attacks. To this end, we introduced the concept of crash resistance and extended memory disclosures to develop memory oracles. This enables an adversary to use fault-tolerant functionality as a side channel to obtain information about the memory layout. Furthermore, we introduced the concept of Crash-Resistant Oriented Programming (CROP), which leverages crash resistance to execute function chains in a fault-tolerant manner. As a result, recently proposed information hiding and randomization defenses are weakened, and control-flow hijacking and code-reuse attacks can be enabled again. We assume that the existence of crash resistance in client software is only the beginning, and more instances lie dormant waiting to be awakened.

In the next chapter we will focus on memory disclosures from a defensive perspective.


Chapter 3 Information Leak Detection in Script Engines

We observe that information leaks are a crucial step in modern attacks against web browsers. As mentioned in Chapter 1, the adversary needs to find a way to read a memory pointer to learn some information about the virtual address space of the vulnerable program. Generally speaking, the attacker can then de-randomize the address space based on the leaked pointer. Eventually, she can undermine memory secrecy and perform control-flow hijacking in conjunction with the bypass of other defenses. In this chapter, we propose a technique for fine-grained, automated detection of memory disclosure attacks against script engines at runtime. Our approach is based on the insight that script engines in web browsers are commonly utilized by adversaries to abuse information leaks in practice, revealing the placement of modules and code sections in the virtual address space in order to bypass ASLR. Any script context is forbidden to contain memory information, i. e., the scripting context is separated from native memory and hence may not provide raw memory pointers. As such, a viable approach to detect information leaks is to create a clone of the to-be-protected process with a re-randomized address space layout, which is instrumented to be synchronized with the original process. An inconsistency in the script contexts of both processes can only occur when a memory disclosure vulnerability was exploited to gain information about the memory layout. In such a case, the two processes can be halted to prevent further execution of the malicious script. Based on this approach, we developed a framework with the name Detile (detection of information leaks).

3.1 Introduction

Modern in-the-wild exploits against browsers leverage information leaks as a fundamental primitive. More importantly, we observe that much academic research aims at bypassing several proposed defenses, and many of these bypasses target a web browser and use memory disclosures as a crucial step. In what follows, we emphasize the academic importance of information leaks without burdening the reader with unnecessary details.


Table 3.1: Proposed defenses and offensive approaches utilizing an information leak in browsers to weaken or bypass the specific defense. All mentioned attacks are mitigated by Detile.

Protection flavor                      | Defense                              | Weakened/bypassed by
---------------------------------------|--------------------------------------|----------------------------------------------------------
Address randomization                  | Fine-grained ASLR [98]               | Just-In-Time Code Reuse [189]
Code-reuse protection                  | RopGuard [82], KBouncer [156],       | Size Does Matter [90], Anti-ROP Evaluation [178],
                                       | ROPecker [46]                        | COOP [177]
Code-reuse protection                  | G-Free [149]                         | Browser JIT Defense Bypass [10], COOP [177]
Coarse-grained CFI                     | CCFIR [228], BinCFI [229]            | Stitching the Gadgets [64], Out of Control [89], COOP [177]
Fine-grained CFI                       | IFCC [202], VTV [202]                | Losing Control [48]
Information hiding                     | Oxymoron [12]                        | Vtable Disclosure [65], Crash Resistance [86], COOP [177]
Information hiding                     | CPI linear region [116]              | Crash Resistance [86]
Execution randomization                | Isomeron [65]                        | Crash Resistance [86]
Randomization/Information hiding       | Readactor [51]                       | Crash Resistance [86], COOP [177]
Randomization/Destructive code reads   | Heisenbyte [199], NEAR [215]         | Code Inference Attacks [190]

For example, Snow et al. introduced Just-In-Time Code Reuse attacks (JIT-ROP [189]) to bypass fine-grained ASLR implementations by repeatedly utilizing an information leak. Similarly, Snow et al. were able to circumvent approaches which incorporate destructive code reads, a mechanism to prevent execution of code after it has been read [190]. G-Free [149], a compiler-based approach against return-oriented programming attacks, was circumvented by Athanasakis et al. [10]. They force the browser to generate code at runtime and perform successive information leaks to disclose enough of the needed information. Göktaş et al. demonstrated several bypasses of proposed ROP defenses, and their exploit needs an information leak in order to perform subsequent steps of the attack [90]. Similarly, Schuster et al. revealed shortcomings in protections against ROP [178]. In those cases, memory disclosures were used as well. An information leak is also needed by Song et al., who showed that dynamic code generation is vulnerable to code injection attacks [191]. Furthermore, Counterfeit Object-Oriented Programming (COOP [177]) needs to disclose the location of vtables to mount a subsequent control-flow hijacking attack by

reusing them. In Chapter 2, we also utilized memory disclosures for memory oracles to weaken various defenses [86]. Even several coarse-grained and fine-grained CFI solutions fell victim to attacks proposed by Davi et al., Göktaş et al., and Conti et al. [48, 64, 89]. Unsurprisingly, their approaches needed memory disclosures. All of these offensive bypasses utilized an information leak as a crucial step and implemented the attack against a web browser. An overview of the defenses bypassed by specific attacks which are mitigated by Detile is shown in Table 3.1. In summary, we see a lot of achievements on the offensive side – especially targeting browsers – but research lags behind when it comes to detecting such information leaks. In spirit, our approach is similar to n-variant systems [33, 50] and similar multi-execution based approaches [38, 53, 68]. However, we are able to observe the actual information leak since we instrument the scripting context, while n-variant systems are only capable of observing when the control flow diverges in the different replicas. As such, we can detect modern code-reuse attacks such as JIT-ROP [189] or COOP [177]. We have implemented a prototype of our technique and extended Internet Explorer 10/11 (IE) on Windows 8.0/8.1 to create a synchronized clone of each tab and enforce the information leak checks. We chose this software mainly for two reasons. First, IE is an attractive target for attackers, as the large number of vulnerabilities indicates. Second, IE and Windows pose several interesting technical challenges, since it is a proprietary binary system that we need to instrument and it lacks fine-grained ASLR. Evaluation results show that our prototype is able to re-randomize single processes without significant computational impact. Additionally, running IE with our re-randomization and information leak detection engine imposes a performance hit of ∼17% on average.
Furthermore, empirical tests with real-world exploits also indicate that our approach can detect modern and previously unknown exploits which target browsers and utilize memory disclosures for exploitation. In summary, our main contributions in this chapter are:

• We present a system to tackle the problem of information leaks, which are frequently used in practice by attackers as an exploit primitive. More specifically, we propose a concept for fine-grained, automated detection of information leaks with per-process re-randomization, dual process execution, and process synchronization.

• We show that dual execution of highly complex, binary-only software such as Microsoft's Internet Explorer is possible without access to the source code, whereby the two executing instances operate deterministically with respect to each other.

• We implemented a prototype for IE 10/11 on Windows 8.0/8.1. We show that our tool can successfully detect several real-world exploits, while producing no alerts on highly complex, real-world websites.

3.2 Technical Background

Before diving into the details of our defense, we first review traditional n-variant schemes, provide several technical details of Windows related to the implementation of ASLR, the

interplay between 64- and 32-bit processes, and the architecture of IE. This information is based on empirical tests we performed and reverse engineering of certain parts of Windows and IE. We briefly introduce important scripting engines and also explain the attacker model used throughout the rest of this chapter.

3.2.1 Enhancing Security with N-Variant Systems

N-variant or multi-execution systems evolved from fault-tolerant environments into mitigation systems against security-critical vulnerabilities [33, 50, 99, 209]. Our concept of Detile incorporates similar ideas such as dual process execution and dual process synchronization. However, our approach is constructed specifically for scripting engines and is thus more fine-grained: while Detile operates and synchronizes processes at the level of the scripting interpreter, n-variant systems intercept only at the system call level. One drawback of these conventional systems is that they are vulnerable to Just-In-Time Code Reuse (JIT-ROP [189]) and Counterfeit Object-Oriented Programming (COOP [177]) attacks, while Detile is able to detect these (see Sections 3.3.1 and 3.6.2 for details).

3.2.2 Windows ASLR Internals

Address Space Layout Randomization (ASLR) is a well-known security mechanism that involves the randomization of stacks, heaps, and loaded images in the virtual address space. Its purpose is to leave an attacker with no knowledge about the virtual memory space in which code and data live. Our main concern about the Windows implementation of ASLR is related to the loaded images. In Windows, whenever an image is loaded into the virtual address space, a section object is created, which represents a section of memory. These objects are managed system-wide and can be shared among all processes. Once a DLL is loaded, its section object remains permanent as long as processes are referencing it. This concept has the benefit that relocation takes place only once: whenever a process needs to load a DLL, its section object is reused and the view of the section is mapped into the virtual address space of the process, making the memory section visible. This way, physical memory is shared among all processes that load a specific DLL whose section object is already present. In particular, as long as the virtual address is not occupied, each image is loaded at the same virtual address among all running user-mode processes. Figure 3.1 illustrates this concept. The randomization of a DLL is influenced by a random value (the so-called image bias) that is generated at boot time. This value is used as an index into an image bitmap, which represents specific address ranges. For 32-bit images, the top address of the range starts at 0x78000000. For 64-bit images that are based above the 4GB boundary, the top address of the range starts at 0x7FFFFFE0000. Each bit in the bitmap stands for a 64KB unit of allocation, starting from the top address toward lower addresses. When an image is being loaded, the bitmap is scanned from top to bottom starting at the random image bias until enough free bits are found to map the image. In Windows 8, there are three image bitmaps.
One is for 64-bit images above the 4GB address range, one for 64-bit images below 4GB, and the third bitmap is used for 32-bit images.
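The bitmap scan described above can be sketched as a small simulation. This is a simplified model for illustration only, not the actual kernel algorithm or data layout; the function names and the tiny bitmap are invented.

```python
# Simplified model of the Windows image-bitmap scan described above.
# Each bit represents one 64KB allocation unit below a top address; the
# scan starts at the random image bias and searches towards lower
# addresses (higher bit indices) for enough consecutive free units.

UNIT = 0x10000  # 64KB allocation granularity

def pick_image_base(bitmap, top_address, image_units, image_bias):
    """Find `image_units` consecutive free bits, starting at `image_bias`.

    bitmap: list of 0/1 flags; index 0 is the 64KB unit just below
    `top_address`, increasing indices mean lower addresses.
    Returns the chosen base address, or None if no slot fits.
    """
    i = image_bias
    while i + image_units <= len(bitmap):
        window = bitmap[i:i + image_units]
        if all(bit == 0 for bit in window):
            for j in range(i, i + image_units):  # mark the units as used
                bitmap[j] = 1
            # the image occupies `image_units` units ending i units
            # below the top address
            return top_address - (i + image_units) * UNIT
        i += 1
    return None

# Example with the 32-bit top address from the text (0x78000000):
bitmap = [0] * 64          # tiny bitmap for illustration
bitmap[3] = 1              # pretend one unit is already occupied
base = pick_image_base(bitmap, 0x78000000, image_units=2, image_bias=2)
```

With the occupied unit at index 3, the first free two-unit window starts at index 4, so the sketch returns a base two units further down than the bias alone would suggest.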


Figure 3.1: Shared physical memory: shaded regions are sections of memory occupied for images. Their views are mapped into the virtual address space of the processes that load the images.

64-bit DLL images that are based above the 4GB address boundary receive 19 bits of entropy. It is worth mentioning that prior to Windows 8, the ASLR entropy amounted to 8 bits and was the same for both 32-bit and 64-bit images. Executable images other than DLLs receive an entropy of 17 bits when they are based above 4GB; otherwise they receive 8 bits. Whenever an executable image is loaded, a random load offset is calculated which corresponds to the entropy the image receives. Thus, an executable image might get relocated to another base once the last reference to its image section is gone. However, Windows does not discard the image section object immediately, but rather keeps it in case the image is loaded again soon after. This leads to the empirical fact that executable images are loaded at the same base as before. While all these features make sense from a performance point of view, they create an inconvenient state for our implementation and detection metric. As we discuss in Section 3.4, we rebase each DLL and the main executable for each run.

3.2.3 WOW64 Subsystem Overview

64-bit operating systems are the systems of choice for today's users: 64-bit processors are widely used in practice, and hence Windows 7, 8, or 8.1 in its 64-bit version is usually running on typical desktop systems. However, most third-party applications are distributed in their 32-bit form. This is, for example, the case for Mozilla Firefox, and also for parts of Microsoft's Internet Explorer. As our framework should protect widely attacked targets, it needs to support 32-bit and 64-bit processes. Therefore, the Windows On Windows 64 (shortened as WOW64) emulation layer plays an important role, as it allows legacy 32-bit applications to run on modern 64-bit Windows systems.

Executing a user-mode 32-bit application instructs the kernel to create a WOW64 process. According to our observations, it creates the program's address space and maps the 64-bit and 32-bit NT Layer DLL (ntdll.dll) into the virtual memory of the program. The 64-bit ntdll.dll is mapped to an address greater than 4GB, and the 32-bit ntdll.dll to an address smaller than 4GB. Then, the application's 32-bit main executable is mapped into memory. These three images are the modules which are available in the user-mode address space, even when starting a 32-bit application in suspended mode. Resuming the application leads to the mapping of the emulation layer dynamic link libraries wow64.dll, wow64cpu.dll and wow64win.dll. They manage the creation of 32-bit processes and threads, enable CPU mode switches between 32-bit and 64-bit during system calls, and intercept and redirect 32-bit system calls to their 64-bit equivalents. For more details about the WOW64 layer, the reader is referred to the literature on Windows Internals [174].

Subsequent 32-bit DLLs are mapped into the address space via LdrLoadDll of the 32-bit ntdll.dll. The first of them is kernel32.dll. The loader ensures that it is mapped to the same address in each WOW64 process system-wide, using a unique address per reboot. It therefore compares its name to the hardcoded "KERNEL32.DLL" string in ntdll.dll upon loading. If the loader is not able to map it to its preferred base address, process initialization fails with a conflicting address error. As process-based re-randomization plays a crucial role in our framework, this issue is handled such that each process contains its kernel32.dll at a different base address (see Section 3.4.1). After mapping kernel32.dll, all other needed 32-bit DLLs are mapped into the address space by the loader via the library loading API. System libraries are thereby normally taken from the C:\Windows\SysWOW64 folder, which comprises the counterpart of C:\Windows\System32 for 32-bit applications.

3.2.4 Internet Explorer Architecture

While our approach is in general applicable to other software, we focus on protecting the scripting engines of a recent version of Microsoft Internet Explorer, since browsers are one of the most common targets. Additionally, IE is a high-value target, as demonstrated by the number of code execution vulnerabilities compared to other browsers [54, 55]. As we will frequently refer to browser internals, a basic understanding of its architecture is needed. Since version 8, IE is developed as a multi-process application [225]. That means a 64-bit main frame process governs several 32-bit WOW64 tab processes, which are isolated from each other. The frame process runs with a medium integrity level, and isolated tab processes run with low integrity levels. Hence, tab processes are restricted and forbidden from accessing resources of processes with higher integrity levels [133]. This architecture implies that websites opened in new tabs can lead to the start of new tab processes. These have to incorporate our protection in order to protect IE as a complete application against information leaks (see Section 3.4).

3.2.5 Scripting Engines

In the context of IE, two scripting engines are mainly relevant; we briefly introduce both.


3.2.5.1 Internet Explorer Chakra

With the release of Internet Explorer 9, a new JavaScript engine called Chakra was introduced. Since Internet Explorer 11, Chakra exports a documented API which enables developers to embed the engine into their own applications. However, IE still uses the undocumented internal COM interface. Nevertheless, some Chakra internals were learned from the official API. The engine supports just-in-time (JIT) compilation of JavaScript bytecode to speed up execution. Typed arrays, such as integer arrays, are stored as native arrays in heap memory along with metadata to accelerate element access. Script code is translated to JS bytecode on demand in a function-wise manner to minimize the memory footprint and avoid generating unused bytecode. The bytecode is interpreted within a loop, whereby undocumented opcodes govern the execution of native functions within a switch statement. Depending on the opcode, the desired JavaScript functionality is achieved with native code.
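The dispatch scheme described above—a loop fetching bytecodes and switching to native handlers—can be sketched in a few lines. This is a generic, minimal model; the opcode names and stack machine are invented and do not correspond to Chakra's actual (undocumented) opcodes.

```python
# Minimal sketch of a bytecode dispatch loop: bytecode is interpreted in
# a loop, and a switch (modeled here as a dict) selects the native
# handler for each opcode, which then operates on the script state.

def run(bytecode):
    """Interpret a list of (opcode, operand) pairs on a value stack."""
    stack = []
    handlers = {
        "PUSH": lambda st, op: st.append(op),
        "ADD":  lambda st, op: st.append(st.pop() + st.pop()),
        "MUL":  lambda st, op: st.append(st.pop() * st.pop()),
    }
    pc = 0
    while pc < len(bytecode):             # interpreter loop
        opcode, operand = bytecode[pc]    # fetch the next bytecode
        handlers[opcode](stack, operand)  # dispatch to its handler
        pc += 1                           # back to the switch
    return stack[-1]

result = run([("PUSH", 6), ("PUSH", 7), ("MUL", None)])  # → 42
```

An instrumentation framework in the spirit of the one described here would hook the dispatch point (the `handlers[opcode](...)` call) to observe every handler invocation and its operands.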

3.2.5.2 ActionScript Virtual Machine (AVM)

The Adobe Flash plugin for browsers and especially for IE is a widely attacked target. Scripts written in ActionScript are interpreted or JIT-compiled to native code by the AVM. There is much unofficial documentation about its internals [24, 121]. Most importantly, it is possible to intercept each ActionScript method with available tools [97]. Thus, no matter whether bytecode is interpreted by the opcode handlers or JIT code is executed, we are able to instrument the AVM.

3.2.6 Adversarial Capabilities

Memory disclosure attacks are an increasingly used technique for the exploitation of software vulnerabilities [182, 189, 196]. In the presence of full ASLR, DEP, CFI, or ROP defenses, the attacker has no anchor to a memory address to jump to, even when in control of the instruction pointer. This is the moment where information leaks come into play:

Figure 3.2: Two sketched methods to achieve information leaks: 1.) Overwriting a length field (0x1000) of a script context data structure gives the adversary the possibility to read beyond legitimate data (SC::data) and leak the address of a vtable (vfptr). 2.) When the pointer (&SC::data) to the data structure is modified directly, it can point into the data structure and can disclose memory beyond the legitimate data.

an attacker needs to read—in any way possible—a raw memory pointer in order to gain a foothold into the native virtual address space of the vulnerable program.

One common way to achieve an information disclosure of native memory is to use a vulnerability to overwrite a data length field without crashing the program. The next step is an out-of-bounds read on the underlying data to subsequently read memory information. The attacker may have used heap massaging in the script context to make the length field and the targeted information reside at predictable locations. Another possibility to disclose memory is to use a program's vulnerability to write a memory pointer into data that must not provide memory information, such as a string in the script context. Similarly, overwriting a terminating character of a script context data structure (e.g., the wide-char null of a JavaScript string) leads to a memory disclosure, as subsequent memory content (i.e., memory after the string data) is presented to the attacker when reading this data structure.
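The length-field corruption described above can be modeled in a few lines. This is a purely illustrative toy, not real exploit code: the object layout, the adjacency of a vtable pointer, and all values are invented.

```python
# Toy model of a length-field overwrite leading to an out-of-bounds read.
# A "script string" stores its character data next to other heap data; if
# the trusted length field is corrupted, reading the string discloses
# adjacent memory (here: a fake vtable pointer value).

legit_data = b"AAAABBBB"                             # script-visible data
vfptr = (0x718B0000).to_bytes(4, "little")           # adjacent pointer
flat_heap = legit_data + vfptr                       # heap adjacency

class ScriptString:
    def __init__(self, backing, length):
        self.backing = backing
        self.length = length          # trusted by the read primitive

    def read(self):
        return self.backing[:self.length]

s = ScriptString(flat_heap, length=8)
assert s.read() == b"AAAABBBB"        # benign: only legitimate data

s.length = 12                         # simulated overwrite (8 -> 12)
leak = s.read()                       # out-of-bounds read
disclosed = int.from_bytes(leak[8:12], "little")   # leaked pointer value
```

The essential point matches the text: the read primitive itself is unchanged; only the metadata it trusts was corrupted.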

New powerful scripting features have also found their way into the development arsenal of attackers [220, 221]: typed arrays [95] make it possible to read and write data in a very fine-grained manner within a legitimate scripting context. Manipulating either a length field or a pointer to an array buffer directly inside the metadata can lead to full read and write access to the process' memory. It is a method to perform the previously introduced step I of an exploitation procedure. Figure 3.2 sketches only two general schemes out of many possibilities to create information leaks. Note that leaks can also occur due to uninitialized variables or other errors and do not have to be created as shown in Figure 3.2. As soon as the attacker can read process memory, she can learn the base addresses of loaded modules in the address space of the program. Then, any code-reuse primitives can be conducted to exploit a vulnerability in order to bypass DEP, ASLR, CFI [64] and ROP defenses [40, 90]. Another possibility is to leak code directly to initiate an attack and bypass ASLR [189]. Other mitigations like Microsoft's Enhanced Mitigation Experience Toolkit (EMET) [132] cannot withstand the capabilities of sophisticated attackers.

For applications with scripting capabilities, untrusted contexts are sandboxed (e.g., JavaScript in web browsers) and must not provide memory information. Thus, attackers use different vulnerabilities to leak memory information into such a context [89, 182, 218]. We assume that the program we want to protect suffers from a memory corruption vulnerability that allows the adversary to corrupt memory objects. In fact, a study shows that any type of memory error can be transformed into an information leak [198]. Furthermore, we assume that the attacker uses a scripting environment to leverage the obtained memory disclosure information at runtime for her malicious computations. This is consistent with modern exploits in academic research [40, 64, 89, 90, 178] as well as in the wild [92, 211, 218, 220, 221]. Our goal is to protect script engines against such powerful, yet realistic adversaries.

Thus, information leaks are an inevitable threat even in the presence of state-of-the-art security features. Note that many use-after-free vulnerabilities can be transformed into information leaks [220]. Web browsers in particular are therefore at high risk, as these errors are prevalent in such complex software systems.


3.3 System Overview

In the following, we explain our approach to tackle the challenge of detecting information leaks in script engines and introduce the needed building blocks, namely per process re-randomization and dual process execution.

3.3.1 Main Concept

As described above, information leaks manifest themselves in the form of memory information inside a context which must not reveal such insights. In our case, this is any script context inside an application: high-level variables and content in a script must not contain memory pointers, which attackers could use to deduce image base addresses of loaded modules. Unfortunately, a legitimate number and a memory pointer in data bytes received via a scripting function are indistinguishable. This leads us to the following assumption: a memory disclosure attack yields a memory pointer, which may be surrounded by legitimate data. The same targeted memory disclosure, when applied to a differently randomized, but otherwise identical process, will yield the same legitimate data, but a different memory pointer. Due to the varying base addresses of modules and different heap and stack addresses, a memory pointer will have a different value in the second process than in the first process. Thus, a master process and a cloned twin process – with different address space layout randomization – can be executed synchronized side-by-side and perform identical operations, e.g., execute a specific JavaScript function. In benign cases, the same data getting into the script context is equal for both processes. When comparing the received data of one process to the same data received in the second process, the only difference can arise because of a leaked memory pointer pointing to equal memory, but having a different address.

In order to compare the data of the master and twin process, we have to instrument the interpreter loop of the script engine. Figure 3.3 shows the basic concept of a script engine's interpreter loop. We can instrument the call and return to precisely check all outgoing data and therefore detect an information leak.

Figure 3.3: The basic concept of a script engine's interpreter loop. The interpreter fetches the bytecode of the script and switches to the corresponding bytecode handler. When the operation of the bytecode handler is finished, the interpreter loop jumps back to the switch and processes the next bytecode.

Based on this principle, our prototype system launches the same script-engine process twice with diverse memory layouts (see also Figure 3.4). The script engines are coupled to run in sync, which enables checking for information leaks. In spirit, this is similar to n-variant systems [33, 50] and multi-execution based approaches [38, 53, 68]. However, our approach is more fine-grained since it checks and synchronizes the processed data on the bytecode level of the script context and is capable of detecting the actual information leak, instead of merely detecting an artifact of a successful compromise (i.e., a divergence in the control flow). A more detailed discussion of the granularity of our approach in comparison to other n-variant and multi-execution systems is given in Section 3.6.2. The involved technical challenges to precisely detect information leaks are explained throughout the rest of this chapter.

Figure 3.4: Overview of our main information leak detection concept: The master process is synchronized with a re-randomized, but otherwise identical twin process. If a memory disclosure attack is conducted in the master process, it appears as well in the twin process. Due to the different randomization, the disclosure attack manifests itself in different data flowing into the script context and can be detected (0x727841F0 vs. 0x86941F0).
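The core detection idea—identical legitimate data but diverging pointer values across two differently randomized processes—can be sketched as follows. All addresses, offsets, and the leak primitive are invented for illustration.

```python
# Sketch of the dual-process detection concept: the same disclosure
# primitive applied to a master and a differently randomized twin returns
# identical legitimate data but different leaked pointer values.

def disclose(module_base, offset):
    """Model of a leak primitive: returns (legitimate data, leaked pointer)."""
    return ("AAAA", module_base + offset)

MASTER_BASE = 0x727840F0   # invented base in the master process
TWIN_BASE   = 0x086940F0   # same module, re-randomized in the twin

def is_leak(master_value, twin_value):
    """Data flowing into the script context must be equal in both
    processes; any difference is flagged as an information leak."""
    return master_value != twin_value

# Benign value derived from script data only: identical in both processes.
benign_master, ptr_master = disclose(MASTER_BASE, 0x100)
benign_twin, ptr_twin = disclose(TWIN_BASE, 0x100)
assert not is_leak(benign_master, benign_twin)

# A leaked pointer differs between the two address spaces -> detected.
detected = is_leak(ptr_master, ptr_twin)
```

Note that, as in the text, a single process could never make this distinction: locally, a leaked pointer is just another plausible-looking number.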

3.3.2 Per Process Re-Randomization

As sketched in Section 3.2.2, all executable images loaded among simultaneously running processes have the same base address in these processes. While this is convenient from a memory sharing point of view, an attacker can abuse a memory disclosure for coordinated attacks between them [119]. Applying a different randomization for processes of choice has the nice side effect of excluding these from such attacks, but the main goal of our per process re-randomization is to randomize two running instances of the same program differently (see Figure 3.5).

Figure 3.5: Master and twin process with a different randomization: as the loader has to fix up address references for each twin process, sharable code turns into private data. As a consequence, each twin process has its own private copy of the DLL.

Therefore, a program of interest is started and we collect the base addresses of all images it loads and will load during its runtime period. We refer to this first process as the master process. A second process instance of the application, known as the twin process, is spawned. Upon its initialization, the base addresses gathered from the master are occupied in the virtual address space of the twin. This forces the image loader to map the images to addresses other than those in the master process, as they are already allocated. We can spare ourselves the time and trouble of re-randomizing the stack and heap per process, as modern operating systems (e.g., Windows 8 on 64-bit) support this natively. Although the steps described to re-randomize all loaded images in the twin process are specific to Windows, the general concept of our proposed information leak detection approach is operating-system independent. As long as the master and twin process have different memory layouts, our approach can be applied. Finally, we establish an inter-process communication (IPC) bridge between the master and twin process. This enables synchronized execution between them and comparison of data flows into their contexts that are forbidden to contain memory information.
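The occupation trick described above can be illustrated with a toy address-space model: reserving the master's image bases in the twin before its loader runs forces every image to a fresh base. The allocator model, candidate addresses, and DLL bases are invented for illustration.

```python
# Sketch of per process re-randomization by base occupation: a simplified
# loader maps an image at its preferred base if free, otherwise at the
# next free candidate address.

class AddressSpace:
    def __init__(self):
        self.reserved = set()

    def reserve(self, base):
        self.reserved.add(base)

    def map_image(self, preferred_base, candidates):
        """Use the preferred base if free, else the next free candidate."""
        for base in [preferred_base] + candidates:
            if base not in self.reserved:
                self.reserved.add(base)
                return base
        raise MemoryError("no free base address")

# Bases observed in the master process (invented values).
master_bases = {"kernel32.dll": 0x76AA0000, "user32.dll": 0x764D0000}

twin = AddressSpace()
for base in master_bases.values():     # occupy all master bases in the twin
    twin.reserve(base)

twin_bases = {
    name: twin.map_image(base, candidates=[0x750000, 0xE60000])
    for name, base in master_bases.items()
}
# Every image now lands at a different base than in the master.
assert all(twin_bases[n] != master_bases[n] for n in master_bases)
```

The real mechanism reserves a page at each collected base before the twin's loader initializes, which has the same effect as the `reserve` calls in this model.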

3.3.3 Dual Process Execution and Synchronization

After the re-randomization phase, both processes are ready to start execution at their identical entry points. After exchanging a handshake, both resume execution. In order to obtain comparable data for information leak checking, the execution of the script interpreters in both processes has to be synchronized precisely. This is accomplished by intercepting an interpreter's native methods. Additionally, we install hooks inside the bytecode interpreter loop at positions where opcodes are interpreted and corresponding native functions are called. Thus, we perceive any high-level script method call at its binary level. The master drives execution, and these hooks are the points where the master and twin process are synchronized via IPC. We check for information leaks by comparing binary data which returns as high-level data into the script context. All input data the master loads is stored in a cache and replayed to the twin process to ensure both operate on the same source (e.g., web pages a browser loads). Built-in script functions that potentially introduce entropy (e.g., Math.random, Date.now, and window.screenX in JavaScript) interfere with our deployed detection mechanism, since they generate values inside the script context that differ between the master and twin process. Additionally, they may induce a divergent script control flow. Both occurrences would be falsely detected as memory disclosures. Thus, we also synchronize the entropy of both processes by copying the generated value from the master to the twin process. This way the twin process continues working on the same data as the master process, creating a co-deterministic script execution.
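The entropy synchronization described above can be sketched as follows: entropy values generated in the master are forwarded over the IPC channel and replayed in the twin instead of being generated anew. The queue stands in for the IPC bridge, and all names and values are illustrative.

```python
# Sketch of entropy normalization between master and twin: values from
# entropy sources (Math.random, Date.now, ...) are generated only in the
# master and replayed in the twin, keeping both script contexts
# co-deterministic.

import random
from collections import deque

ENTROPY_METHODS = {"Math.random", "Date.now"}
ipc = deque()  # stands in for the master -> twin IPC channel

def master_call(method):
    # invented stand-ins for the real script built-ins
    value = random.random() if method == "Math.random" else 1468000000
    if method in ENTROPY_METHODS:
        ipc.append(value)        # forward the generated value to the twin
    return value

def twin_call(method):
    if method in ENTROPY_METHODS:
        return ipc.popleft()     # replay the master's value, don't generate
    raise NotImplementedError("non-entropy methods execute normally")

m = master_call("Math.random")
t = twin_call("Math.random")
assert m == t                    # identical entropy in both contexts
```

Without this replay, a single `Math.random` call would produce differing values in the two processes and be falsely flagged as a memory disclosure.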

3.4 Implementation

Based on the concepts of per process re-randomization and dual process execution, we implemented a tool called Detile for Windows 8.0 and 8.1 (64-bit). The current prototype is able to re-randomize on a per-process basis and instrument Internet Explorer 10 and 11 to run in dual process execution mode. In the following, we describe in detail the steps taken during the development of our framework to detect information leaks and also discuss the challenges that arose and how we solved them.

Figure 3.6: Detile running with Internet Explorer. A 64-bit duplicator library is injected into the main IE frame process to enable it to create and re-randomize a twin tab process for each master tab process by itself. The main IE frame also injects a 32-bit DLL into each tab process to allow synchronization, communication between master and twin, and information leak detection.


3.4.1 Duplication and Re-Randomization

In order to re-randomize processes and load images at different base addresses, we developed a duplicator which creates a program's master process. It enumerates the master's initially loaded images with the help of the Windows API (CreateToolhelp32Snapshot) before the master starts execution. Then, the twin process is created in suspended mode, and a page is allocated in the twin at all addresses of the previously gathered image bases. We then need to trick the Windows loader into mapping kernel32.dll at a different base in the twin. Therefore, the twin is attached to the DebugAPI and a breakpoint is automatically set on the function RtlEqualUnicodeString in the 32-bit loader in the NT Layer DLL. The twin is then resumed, and the WOW64 subsystem DLLs are at first initialized successfully at different base addresses. As soon as the breakpoint triggers and the function tries to compare the unicode name "KERNEL32.DLL" to the hardcoded "KERNEL32.DLL" string in the NT Layer DLL, the arguments to RtlEqualUnicodeString are modified: the first unicode name is changed to lowercase and the third parameter is set to perform a case-sensitive comparison. This way, the loader believes that a DLL other than kernel32.dll is going to be initialized and allows the mapping to a different base. The loading of the 32-bit kernel32.dll is performed immediately after the WOW64 subsystem is initialized, and it is also the first 32-bit DLL being mapped after the 32-bit NT Layer DLL. Thus, all subsequent libraries that are loaded and import functions from kernel32.dll have no problems resolving their dependencies using the remapped kernel32.dll. The loader maps them to different addresses, as their preferred base addresses are reserved. Although the DebugAPI is used, all steps run in a fully automated way. As a next step, the DebugAPI is detached and the main image is remapped to a different address. As it is already mapped even in suspended processes, this has to be done explicitly.
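The argument-rewriting trick at the breakpoint can be illustrated with a plain model of the string comparison: lowercasing the queried name while forcing a case-sensitive comparison makes the loader's hardcoded check fail, so the (modeled) loader no longer pins kernel32.dll to its system-wide base. This is a simplified Python model of the comparison semantics, not the real loader code.

```python
# Toy model of the RtlEqualUnicodeString trick: the loader's check for
# kernel32.dll normally matches case-insensitively; rewriting the
# arguments (lowercased name + case-sensitive flag) defeats the check.

def equal_unicode_string(a, b, case_insensitive):
    """Simplified model of the comparison semantics."""
    if case_insensitive:
        return a.upper() == b.upper()
    return a == b

HARDCODED = "KERNEL32.DLL"   # the string hardcoded in the NT Layer DLL

def loader_allows_rebase(queried_name, case_insensitive=True):
    # If the names match, the (modeled) loader pins kernel32.dll to its
    # system-wide base; if they do not, it may map the image anywhere.
    return not equal_unicode_string(queried_name, HARDCODED,
                                    case_insensitive)

# Normal load: the comparison matches, so the base is pinned.
assert not loader_allows_rebase("KERNEL32.DLL")

# With the rewritten arguments, the comparison fails -> rebase allowed.
tricked = loader_allows_rebase("kernel32.dll", case_insensitive=False)
```

The key observation is that only the comparison's inputs are changed at the breakpoint; the loader logic itself runs unmodified.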
Additionally, LdrLoadDll in the twin process is detoured to intercept new library loads and map incoming images to addresses different from those in the master. We were not able to re-randomize ntdll.dll, because it is mapped into the virtual address space very early in the process creation procedure. Attempts to remap ntdll.dll later on did not succeed due to callbacks invoked by the kernel. The implications of a non-re-randomized ntdll.dll are discussed in Section 3.7.2.

Note that this design also works with pure 64-bit processes. However, frequently attacked applications like the tab processes of Internet Explorer are 32-bit and run in the WOW64 subsystem. Hence, our framework has to protect them as well. The following explains how Detile achieves this support. While the logic explained above is sufficient to duplicate and re-randomize a single-process program, additional measures have to be taken in the case of applications with a multi-process architecture like Internet Explorer. Therefore, we developed a wrapper which starts the 64-bit main IE frame process and injects a 64-bit library we named the duplicator library (see Figure 3.6). This way, we modify the frame process such that each time a tab process is started by the frame process, a second tab process is spawned. The first becomes the master, the second the twin. This is achieved by detouring and modifying the process creation of the IE frame. Additionally, our re-randomization logic explained above is incorporated into the duplicator library to allow the main IE frame process itself to re-randomize its spawned twins at creation time. To protect each new tab run by the IE frame, we ensure that each tab runs in a new process and gets a twin. To enable communication, synchronization, and detection of information leaks, the duplicator also injects a 32-bit library into the master and the twin upon their creation by the main IE frame process.

3.4.1.1 Kernel Mode Approach

In addition to our user-mode approach, we also developed a kernel driver that follows the same logic. The driver rebases all DLLs and the main image, except for ntdll.dll. The main benefit of approaching the problem from kernel mode is flexibility. It enables us to intercept and filter each process and image load, and grants us access to internal data structures that are linked to each image. The driver also handles images that are dynamically loaded, no matter through which API call the request is triggered. This is important, as we noticed that not all DLL mappings go through the native LdrLoadDll call. Another motivation for a kernel approach is its generic functionality, in that we are not bound to apply logic tailored to a specific process, but can apply one logic to each process. However, we leave the generic functionality as future work.

3.4.2 Synchronization

We designed our prototype to be contained in a DLL which is loaded into both target instances. To reliably intercept all script execution, we hook LdrLoadDll to initialize our synchronization as early as possible once the engine has been loaded. After determining the role (master or twin), the processes exchange a short handshake and wait for events from the interpreter instrumentation. While most of our work is focused on the scripting engine, we also instrument parts of wininet.dll to provide basic proxy functionality. The twin receives an exact copy of the web data sent to the master to ensure the same code is executed.

3.4.2.1 Entropy Normalization

The synchronization of script execution relies heavily on the identification of functions and objects that introduce entropy into the script context. Values classified as entropy are overwritten in the twin with the value received from the master. This ensures that functions such as Math.random and Date.now return the exact same value, which is crucial for synchronous execution. While this is obvious for Date.now, it is not immediately clear for other methods. Therefore, entropy-inducing methods are detected and filtered incrementally during runtime. Hence, if a detection has triggered but the cause was not an information leak, the method is included in the list of entropy methods.

3.4.2.2 Rendezvous and Checking Points

Vital program points where master and twin are synchronized are bytecode handler functions. If a handler function returns data into the script context, it is first determined whether the handler function is an entropy-inducing function. However, the vast majority of function invocations and object accesses do not introduce entropy and are checked for equality between master and twin on the fly. If a difference is encountered that is not classified as entropy, we assume that an information leak occurred and take action, namely logging the incident and terminating both processes. Our empirical evaluation demonstrates that the synchronization is precise: even for complex websites, we can synchronize the master and twin process (see Section 3.5.2 for details).

3.4.3 Chakra Instrumentation

The Chakra JavaScript engine contains a JIT compiler. It runs in a dedicated thread, identifies frequently executed (so-called hot) functions, and compiles them to native code. Our current implementation works on script interpreters; hence we disabled the JIT compiler. This is currently a prototype limitation whose solution we discuss in Section 3.7.2. In order to synchronize execution and check for information leaks, we instrumented the main loop of the Chakra interpreter, which is located in the Js::InterpreterStackFrame::Process function. It is invoked recursively for each JavaScript call and iterates over the variable-length bytecodes of the JavaScript function. The main loop contains a switch statement, which selects the corresponding handler for the currently interpreted bytecode. The handler then operates on the JavaScript context depending on the operands and the current state. In the examined Chakra versions we observed up to 648 unique bytecodes. Prior to the invocation of a bytecode handler, our instrumentation transfers the control flow to a small, highly optimized assembly stub, which decides whether the current bytecode is vital for our framework to handle. We intercept all call and return bytecodes as well as necessary conversion bytecodes in order to extract metadata such as JavaScript function arguments, return values, and conversion values. Conversion bytecodes handle dynamic type casting, native-value-to-JavaScript-object and JavaScript-object-to-native-value conversions. Additionally, we intercept engine functions that handle implicit type casts at the native level, because they are invoked by other bytecode handlers as required and have no bytecode equivalents themselves. Furthermore, all interception sites support the manipulation of the outgoing native value or JavaScript object for the purpose of entropy elimination in the JavaScript context of the twin process.

3.4.4 AVM Instrumentation

Instrumentation of the AVM is based on prior work of F-Secure [97] and Microsoft [121]. We hook at the end of the native method verifyOnCall inside verifyEnterGPR to intercept ActionScript method calls and retrieve ActionScript method names. At these points, master and twin can be synchronized. Parameters flowing into an ActionScript method and return data flowing back into the ActionScript context can be dissected, too. They are also processed inside the method verifyEnterGPR. Based on their high level ActionScript types, the parameters and return data can be compared in the master and twin. This way, we can keep the master and twin in sync at method calls, check for information leaks and mediate entropy data from the master to the twin.


3.5 Evaluation

In the following, we present evaluation results for our prototype implementation in the form of performance and memory usage benchmarks. The benchmarks were conducted on a system running Windows 8.0/8.1 that was equipped with a 4th generation Intel i7-4710MQ quad-core CPU and 8GB of DDR3 RAM. Furthermore, we demonstrate how our prototype can successfully detect several kinds of real-world information leaks.

3.5.1 Re-Randomization of Process Modules

We evaluated our re-randomization engine with respect to its effectiveness, memory usage, and performance.

3.5.1.1 Effectiveness

We applied re-randomization to internal Windows applications and third-party applications to verify that modules in the twin are based at different addresses than in the master. We therefore compared the base addresses of all loaded images between the two processes and confirmed that all images in the twin process had a different base address than in the master, except for ntdll.dll. See the discussion in Section 3.7.2 for details on the difficulties of remapping the 64-bit and 32-bit NT Layer DLLs. As an example, Table 3.2 presents important Windows DLLs, re-randomized in different processes running simultaneously in a single user session on Windows 8.1.

3.5.1.2 Physical Memory Usage

To inspect the memory overhead of our re-randomization scheme, we measured the working-set characteristics of different master and re-randomized twin processes compared to native processes. Figure 3.7 shows the memory working sets of three applications. ReASLR thereby denotes the re-randomization within a single process. DE means that two processes are running, whereby the master's randomization is kept native while the twin is re-randomized. The applications besides IE are only included to measure the memory overhead and are not synchronized. We calculated the memory overhead of per process re-randomization (ReASLR) of a single process with the formula:

Overhead(ReASLR) = WS(Twin) / WS(Native) − 1

Thus, the overall memory overhead based on working sets is 0.46 times. When running a program or process in per process re-randomization and dual process execution (DE), we have to include both master and twin in the memory overhead calculation. Therefore, the overhead is calculated by

Overhead(ReASLR + DE) = (WS(Twin) + WS(Master)) / WS(Native) − 1

Its overall value is 1.45 times. Note that memory working sets can vary strongly during an application's runtime, and thus are difficult to quantify. The measurements shown

3.5 Evaluation

Table 3.2: Re-randomized processes in Internet Explorer 11: original processes have their modules mapped system-wide at the same base addresses, while our re-randomized processes map their modules to different bases per process. Bold entries represent the system's and the browser's most essential modules.

DLL Name               Systemwide      Re-Randomized Process 1  Re-Randomized Process 2
ntdll.dll (64-bit)     0x7FF9D5DD0000  0x7FF9D5DD0000           0x7FF9D5DD0000
ntdll.dll              0x77640000      0x77640000               0x77640000
wow64win.dll (64-bit)  0x775D0000      0x590000                 0xCD0000
wow64.dll (64-bit)     0x77580000      0x290000                 0xC80000
wow64cpu.dll (64-bit)  0x77570000      0xA0000                  0x8C0000
oleaut32.dll           0x773A0000      0x2770000                0x2310000
advapi32.dll           0x77270000      0x2A40000                0x2620000
msvcrt.dll             0x76E80000      0xB80000                 0x12E0000
kernel32.dll           0x76AA0000      0x750000                 0xE10000
ws2_32.dll             0x76A50000      0x56D0000                0x5DD0000
KernelBase.dll         0x76840000      0x890000                 0x1070000
gdi32.dll              0x76700000      0x1330000                0x1A50000
shlwapi.dll            0x76620000      0x2610000                0x21B0000
user32.dll             0x764D0000      0xE60000                 0x15C0000
crypt32.dll            0x76350000      0x8520000                0x8BA0000
ole32.dll              0x76240000      0x2660000                0x2200000
shell32.dll            0x75080000      0x3EF0000                0x4A40000
cryptbase.dll          0x75050000      0x270000                 0xC70000
apphelp.dll            0x74F50000      0xAE0000                 0x1240000
ieframe.dll            0x74160000      0x1AC0000                0x3EF0000
IEShims.dll            0x73F20000      0x29F0000                0x25D0000
wininet.dll            0x73C60000      0x53D0000                0x5C00000
userenv.dll            0x73C40000      0x5330000                0x2930000
urlmon.dll             0x73AF0000      0x55A0000                0x2990000
winhttp.dll            0x73A40000      0x5720000                0x5E20000
mswsock.dll            0x739F0000      0x5B70000                0x6280000
rsaenh.dll             0x73970000      0x5C30000                0x6370000
ieproxy.dll            0x738E0000      0x5F90000                0x66D0000
dnsapi.dll             0x73820000      0x6380000                0x8080000
mshtml.dll             0x72340000      0x648000                 0x6980000
d2d1.dll               0x71F70000      0x76A0000                0x7C40000
ieui.dll               0x71D10000      0x7F10000                0x8530000
jscript9.dll           0x718B0000      0x8110000                0x8770000
d3d11.dll              0x71630000      0x8FD0000                0x7990000
iexplore.exe           0x2F0000        0x3C0000                 0xA70000

in Figure 3.7 (and in Tables 3.3, 3.4, 3.5, and 3.6) were performed after the application had finished its startup and was waiting for user input (i.e., it was idle and all modules were loaded and initialized). Due to the additional twin for each master process, the overall additional memory is about one to two times per protected process.
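As a sanity check, the two overhead formulas can be applied to the Calculator working sets from Tables 3.3 and 3.4 (Windows 8.1), reproducing the per-application percentages reported in Figure 3.7:

```python
# Cross-check of the two overhead formulas with the Windows 8.1
# Calculator working sets (Tables 3.3 and 3.4, values in KB).
ws_native = 11_440   # calc.exe, native (Table 3.3)
ws_master = 11_828   # calc.exe, master (Table 3.4)
ws_twin   = 20_748   # calc.exe, twin   (Table 3.4)

overhead_reaslr = ws_twin / ws_native - 1                 # per-process re-randomization
overhead_de     = (ws_twin + ws_master) / ws_native - 1   # re-randomization + dual execution

print(f"ReASLR:      {overhead_reaslr:.0%}")   # ~81%, matching Figure 3.7
print(f"ReASLR + DE: {overhead_de:.0%}")       # ~185%
```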

Working set characteristics of per-process re-randomization and dual process execution. Table 3.3 and Table 3.5 show the native memory consumption of three applications, whose consumption is shown in Table 3.4 and Table 3.6 when running in dual process execution

Chapter 3 Information Leak Detection in Script Engines

[Figure 3.7 (bar chart): Memory overhead of re-randomization and dual execution measured via working set (WS) consumption in megabytes (M). Native processes on Windows 8.0 and 8.1 are contrasted with their counterparts running in re-randomized dual execution mode (master and twin). Reported overheads on Windows 8.1: Internet Explorer ReASLR 63%, ReASLR + DE 145%; Mozilla Firefox 33.0 ReASLR 9%, ReASLR + DE 110%; Calculator ReASLR 81%, ReASLR + DE 185%. On Windows 8.0: Internet Explorer ReASLR 75%, ReASLR + DE 174%; Firefox ReASLR 19%, ReASLR + DE 121%; Calculator ReASLR 31%, ReASLR + DE 135%.]

Table 3.3: Memory working sets of 32-bit native processes running on Windows 8.1 64-bit (Internet Explorer 11 main frame with one tab, Mozilla Firefox 33.0, and the Windows Calculator)

Process                      Working Set  WS Private  WS Shareable  WS Shared
iexplore.exe (64-bit frame)  20,980 K     3,008 K     17,972 K      14,532 K
iexplore.exe                 31,796 K     5,260 K     26,536 K      10,888 K
firefox.exe                  87,856 K     47,988 K    39,868 K      16,320 K
calc.exe                     11,440 K     1,332 K     10,108 K      8,968 K

mode. As Internet Explorer is a multi-process application (see Section 3.2.4), tabs can run in separate processes. Hence, in native mode there is one 64-bit main IE frame and one tab process, while in dual process execution mode one 64-bit IE frame manages an additional twin process per master tab process. The results confirm what is intuitively clear: as a re-randomized process has its modules loaded at different base addresses than other processes, the corresponding memory is not shared, and the process needs private copies of its modules. This results in higher working sets as well as lower shareable and lower shared memory in re-randomized processes (twins) compared to their master processes.
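The loss of sharing can be quantified directly from the calc.exe rows of Tables 3.3 and 3.4 (values in KB; variable names are illustrative):

```python
# Sketch: quantifying the loss of memory sharing caused by
# re-randomization (calc.exe working sets in KB, Tables 3.3 and 3.4).
native = {"private": 1_332, "shareable": 10_108, "shared": 8_968}
twin   = {"private": 14_784, "shareable": 5_964, "shared": 5_912}

extra_private = twin["private"] - native["private"]   # private module copies
lost_shared   = native["shared"] - twin["shared"]     # pages no longer shared

print(f"additional private memory: {extra_private} K")
print(f"shared memory lost:        {lost_shared} K")
```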


Table 3.4: Memory Working Sets of 32-bit processes running in dual execution mode on Windows 8.1 64-bit. All twin processes are differently randomized (Internet Explorer 11 main frame with two tabs, Mozilla Firefox 33.0 and the Windows Calculator)

Process                      Working Set  WS Private  WS Shareable  WS Shared
iexplore.exe (64-bit frame)  30,516 K     7,480 K     23,036 K      17,980 K
iexplore.exe (master)        26,204 K     5,008 K     21,196 K      18,592 K
iexplore.exe (twin)          51,816 K     43,476 K    8,340 K       8,332 K
iexplore.exe (master)        21,556 K     3,928 K     17,628 K      17,536 K
iexplore.exe (twin)          52,004 K     43,596 K    8,408 K       8,368 K
firefox.exe (master)         88,936 K     53,388 K    35,548 K      16,512 K
firefox.exe (twin)           95,516 K     80,316 K    15,200 K      14,208 K
calc.exe (master)            11,828 K     4,880 K     6,948 K       6,180 K
calc.exe (twin)              20,748 K     14,784 K    5,964 K       5,912 K

Table 3.5: Memory Working Sets of 32-bit native processes running on Windows 8.0 64-Bit (Internet Explorer 10, Firefox, and the Calculator)

Process                      Working Set  WS Private  WS Shareable  WS Shared
iexplore.exe (64-bit frame)  19,996 K     4,264 K     15,732 K      13,456 K
iexplore.exe                 31,432 K     7,580 K     23,852 K      9,840 K
firefox.exe                  83,988 K     52,400 K    31,588 K      10,684 K
calc.exe                     11,316 K     1,632 K     9,684 K       9,052 K

Table 3.6: Memory Working Sets of 32-bit processes running in dual execution mode on Windows 8.0 64-bit with all twin processes differently randomized (Internet Explorer 10, Firefox, and the Calculator)

Process                      Working Set  WS Private  WS Shareable  WS Shared
iexplore.exe (64-bit frame)  27,132 K     7,160 K     19,972 K      16,824 K
iexplore.exe (master)        30,860 K     7,736 K     23,124 K      19,184 K
iexplore.exe (twin)          55,112 K     39,196 K    15,916 K      14,788 K
iexplore.exe (master)        27,108 K     5,380 K     21,728 K      18,220 K
iexplore.exe (twin)          55,368 K     39,464 K    15,904 K      14,780 K
firefox.exe (master)         85,968 K     51,240 K    34,728 K      18,120 K
firefox.exe (twin)           99,740 K     82,488 K    17,252 K      15,876 K
calc.exe (master)            11,808 K     5,020 K     6,788 K       6,376 K
calc.exe (twin)              14,840 K     9,052 K     5,788 K       5,724 K

3.5.1.3 Re-Randomization and Startup Time Performance

When a program is started for the first time after a reboot, the kernel needs to create section objects for the image modules. Hence, the first start of a program always takes longer than subsequent starts of the same program. To measure the additional startup and module load times our protection introduces, we therefore first ran each program natively once, so that the kernel creates section objects for most natively used DLLs, and closed it afterwards.


Table 3.7: Startup times in seconds and startup slowdowns of native 32-bit applications compared to their counterparts running with per-process re-randomization and dual process execution on Windows 8.1 and Windows 8.0 (both 64-bit). On Windows 8.1, IE 11 was measured; on Windows 8.0, IE 10.

Application      Native (8.1)  ReASLR + DE (8.1)  Slowdown
IE tab creation  0.5194 s      1.3082 s           1.520x
Firefox          1.3823 s      1.5441 s           0.117x
Calculator       0.4391 s      0.6599 s           0.503x

Application      Native (8.0)  ReASLR + DE (8.0)  Slowdown
IE tab creation  0.9163 s      2.0710 s           1.260x
Firefox          0.9624 s      1.8064 s           0.877x
Calculator       0.3484 s      0.3610 s           0.037x

We then start the program natively without protection and measure the time until it is idle and all of its initial modules are loaded. In the same way, we measure the time from process creation until both the master and the twin process have their initial modules loaded. The startup comparison is shown in Table 3.7. As expected, the startup times of applications protected with our approach increase noticeably, since a twin process needs to be spawned for each master that should be protected.
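The slowdown column of Table 3.7 is simply the ratio of protected to native startup time, minus one. For example, for IE 11 tab creation on Windows 8.1:

```python
# Slowdown as reported in Table 3.7: protected startup time relative
# to the native startup time, minus 1.
def slowdown(native_s, protected_s):
    return protected_s / native_s - 1

print(f"IE tab creation (8.1): {slowdown(0.5194, 1.3082):.3f}x")  # ~1.519x
print(f"Firefox (8.1):         {slowdown(1.3823, 1.5441):.3f}x")  # ~0.117x
```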

3.5.2 Detection Engine

Next, we evaluate the impact of Detile on the user experience and its effectiveness in detecting information leaks. We performed tests on the script execution time for popular websites and used the prototype to detect four real-world vulnerabilities and an artificial vulnerability to evaluate our detection capabilities.

Table 3.8: Native script execution of Internet Explorer 11 on Windows 8.1 64-bit compared to the script execution of Internet Explorer 11 instrumented with Detile. Execution time is measured in milliseconds using the internal F12 developer tools provided by IE.

Web page       Native (ms)  Detile (ms)  Overhead (%)
google.com     425          482          13.4
facebook.com   774          961          24.1
youtube.com    1196         1519         27.0
yahoo.com      3674         4722         28.5
baidu.com      1108         1339         20.8
wikipedia.org  472          513          8.6
twitter.com    599          623          4.0
qq.com         2405         2724         13.2
taobao.com     645          824          27.7
.com           439          517          17.7
amazon.com     958          1210         26.3
live.com       254          275          8.2
google.co.in   483          517          7.0
sina.com.cn    3360         4269         27.0
hao123.com     373          379          1.6


3.5.2.1 Script Execution Time and Responsiveness

We used the 15 most visited websites worldwide [7] to test how the current prototype interferes with the normal usage of these pages. Besides the subjective impression while using the pages, we utilized the F12 developer tools of Internet Explorer 11 to measure the script execution time reported by the UI Responsiveness profiler (see Table 3.8). These tests were performed using Windows 8.1 64-bit and Internet Explorer 11. Note that, as a limitation, we had to disable the JIT engine, as discussed in Section 3.7.2. Since we want to measure the performance impact of our detection only, JIT is also disabled during the native runs. While we introduce a performance overhead of around 17.0% on average, the subjective user experience was not noticeably affected. This is due to IE's deferred parsing, which displays content to the user before all computations have finished.
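The reported 17.0% average overhead can be reproduced directly from the raw timings in Table 3.8:

```python
# Reproducing the ~17% average scripting overhead from the per-site
# timings in Table 3.8 (milliseconds, in table order).
native = [425, 774, 1196, 3674, 1108, 472, 599, 2405, 645, 439,
          958, 254, 483, 3360, 373]
detile = [482, 961, 1519, 4722, 1339, 513, 623, 2724, 824, 517,
          1210, 275, 517, 4269, 379]

overheads = [d / n * 100 - 100 for n, d in zip(native, detile)]
avg = sum(overheads) / len(overheads)
print(f"average overhead: {avg:.1f}%")   # ~17%, matching the reported average
```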

3.5.2.2 Information Leak Detection

We tested our approach on a pure memory disclosure vulnerability (CVE-2014-6355), which allows illegitimate reads due to a JPEG parsing flaw in Microsoft's Windows graphics component [222]. It can be used to defeat ASLR by reading leaked stack information back to the attacker via the toDataURL method of a canvas object. We successfully detected this leak at the point of the call to toDataURL in the master and twin process. In the same way, detection was successful for an exploit for a similar bug (CVE-2015-0061 [223]). To further verify our prototype, we evaluated it against an exploit for CVE-2011-1346, a vulnerability that was used in the Pwn2Own contest 2011 to bypass ASLR [224]. As this memory disclosure bug is specific to IE 8, we ported the vulnerability to IE 11. An uninitialized index attribute of a new HTML option element is used to leak information. Again, we successfully detected this exploitation attempt when the index attribute was accessed. Additionally, we tested our prototype on another real-world vulnerability (CVE-2014-0322) that was used in targeted attacks [79] and works for Internet Explorer 10 on Windows 8 64-bit. This vulnerability is an ideal showcase for step I to extend an attacker's capabilities within a process. It is a use-after-free error that can be utilized to increment a value at an arbitrary address, which is enough to gain read and write access to the process' virtual address space and create information leaks [118]. As attackers are also using these techniques [218, 221], it is important to have a good understanding of them to reliably develop mitigations. The exploit works in the following way: the heap is shaped into a specific layout with heap feng shui [218], such that general arrays are aligned to 0xXXXX0000 boundaries and the headers of typed arrays follow, aligned at 0xXXXXf000. The structure after heap feng shui is performed is illustrated in Figure 3.8.
The vulnerability is used to increase the most significant byte of the size of the array memory block. This is possible because a function operates on the injected fake object data with the instruction inc [eax + 0x10]. We can then perform out-of-bounds writes with JavaScript methods of the modified array. We first change the pointer to a typed array buffer in the typed array header that follows the general array data. We then also change

59 Chapter 3 Information Leak Detection in Script Engines

Figure 3.8: Generating an information leak with CVE-2014-0322: Modified fields are shaded gray. The vulnerability allows a bit increase in the size of the array memory block. This is sufficient to subsequently and illegitimately change the length of a typed array buffer and the pointer to a typed array buffer in the contiguous typed array header. This results in access to the complete process memory.

the length of the typed array buffer in the header to 0x7fffffff. Now we have access to the complete process memory via JavaScript methods of the changed typed array. As we know the location of a typed array vtable due to heap feng shui, we can access it and perform the illegal transition from native memory to the untrusted JavaScript context. Put differently, an information leak is created, which can now be used to reconstruct the complete memory layout. Detile triggered as soon as the third byte of the vtable was accessed (i.e., the third least significant byte is the first differing byte in both contexts). Hence, the information leak was detected successfully and without problems. To further test our implementation, we also constructed a toy example: our native code creates an information leak by overwriting the length field of an array, and the image base of jscript9.dll is written to memory after the array buffer. This ensures that an out-of-bounds read will result in both an information leak and a difference in the twin. We designed the example to be triggered by calling a specifically named JavaScript function and performing array accesses in it. The length field of any array accessed in this function is overwritten with the value 0x400, allowing memory reads beyond the real array data. This example can be triggered in all versions to which we ported Detile and only depends on the structure of the internal metadata, which needs to be

adjusted between versions of the scripting engine. In our tests, we reliably detected the out-of-bounds read of the image base and were able to stop the execution of the process.
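The detection principle behind this toy example can be modeled in a few lines. This is a deliberately simplified sketch: the "script", the function names, and the leak mechanics are illustrative; only the module bases are taken from Table 3.2.

```python
# Toy model of dual-process information-leak detection: master and twin
# execute the same "script", but their module bases differ due to
# re-randomization. An address-independent computation yields identical
# results in both contexts; a read that leaks the image base diverges.

def run_script(image_base, leak):
    array = [1, 2, 3, 4]
    if leak:
        return image_base          # models reading past the array into metadata
    return sum(array)              # ordinary, address-independent computation

def detile_check(master_result, twin_result):
    """Model of the synchronization check between the two contexts."""
    if master_result != twin_result:
        raise RuntimeError("information leak detected: contexts diverged")
    return master_result

MASTER_BASE, TWIN_BASE = 0x718B0000, 0x8110000   # jscript9.dll bases, Table 3.2

detile_check(run_script(MASTER_BASE, leak=False),
             run_script(TWIN_BASE, leak=False))   # benign script: passes
try:
    detile_check(run_script(MASTER_BASE, leak=True),
                 run_script(TWIN_BASE, leak=True))
except RuntimeError as e:
    print(e)   # the leaked base differs between master and twin
```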

3.5.2.3 False Positive Analysis

We analyzed the 100 top websites worldwide [7] to evaluate whether our prototype can precisely handle real-world, complex websites and their JavaScript contexts without triggering false alarms. None of the tested websites generated an alert, indicating that the prototype can accurately synchronize the master and twin process.

3.6 Related Work

Software vulnerabilities have received much attention in recent years, mainly due to their prevalence in applications and their huge impact in practice. Hence, several research results were presented that either abuse vulnerabilities offensively or develop different defense techniques to mitigate them. In the following, we briefly examine recent work that tries to extend or improve randomization, and discuss traditional multi-execution approaches closely related to ours.

3.6.1 Randomization Techniques

Related to our process-wise re-randomization is Microsoft's Enhanced Mitigation Experience Toolkit (EMET) [132]. While it enforces ASLR for non-ASLR modules, module base addresses are identical in all processes running on a system and change only after a reboot. This still allows coordinated attacks, in which an information leak gathered in one program is abused in a second program. Our re-randomization is superior in this respect, as each process receives a different module randomization, including kernel32.dll and the main executable. Several approaches have been proposed to improve address space layout randomization, randomize the data space, or randomize on the level of single instructions. For example, binary stirring [213] re-randomizes code pages at a high rate for a high performance cost. While it hinders attackers from using information leaks in code-reuse attacks, it does not impede their creation per se. In contrast, our re-randomization scheme reuses the native operating system loader and forms the basis for information leak detection with dual process execution. Oxymoron [12] allows fine-grained address space layout randomization in combination with code sharing and thereby imposes a low overhead. While it protects against code-reuse attacks, it does not detect information leaks. A sophisticated attacker is still able to read a complete memory page with sensitive information or manipulate important tokens to escalate privileges. Thus, it does not protect against data-only attacks in combination with information leaks. Our framework differs in that it does not need to rewrite a given binary, and it is specialized in determining whether memory information illegally flows into an untrusted context. Furthermore, we support the 64-bit Windows operating system, which is the platform of choice for adversaries attacking client applications.
Other solutions [112, 155] are prone to JIT-ROP code-reuse attacks [189], which are based on information leaks. Address space layout permutation is an approach to scramble all

data and functions of a binary [112]. To this end, a given ELF binary has to be rewritten, and the randomization can be applied on each run. ORP [155] rewrites instructions of a given binary and reorders basic blocks. As discussed above, it is prone to information leak attacks, which we detect. Instruction set randomization [14, 111] complicates code-reuse attacks, as it encrypts code pages and decrypts them on the fly. However, in the presence of information leaks combined with key guessing [189, 194, 214], it can be circumvented. Instruction location randomization (ILR) [98] randomizes the location of each instruction on each run, but no re-randomization occurs. Thus, the layout can be reconstructed with the help of an information leak. Readactor is a defensive system that aims to be resilient against just-in-time code-reuse attacks [51]. It hides code pointers behind execute-only trampolines, and code itself is made execute-only to prevent an attacker from building a code-reuse payload just-in-time. However, it has been shown to be vulnerable to an attack named COOP, which reuses virtual functions [177]. Unlike Readactor, Detile prevents COOP, as this attack needs an information leak as its first step. Crane et al. recently presented an enhanced version of Readactor, dubbed Readactor++ [52], that also protects against whole-function reuse attacks such as COOP. This is achieved through function pointer table randomization and the insertion of booby traps. Consequently, an adversary can no longer obtain meaningful code locations that can be leveraged for code-reuse attacks. However, Readactor++ also does not detect or prevent the exploitation of memory disclosures, which poses a potential attack vector.

3.6.2 Multi-Execution Approaches

Most closely related to our research are n-variant systems, which run variants of the same program with diverse memory layouts and instructions [50]. Bruschi et al. presented a similar work that runs program replicæ synchronized at system calls to detect attacks [33, 99]. They demonstrate the detection of memory exploits against the lightweight server thttpd on the Linux platform. Our concept for the detection of information leaks incorporates ideas like dual process execution, per-process re-randomization, and synchronization. Hence, our work is closely related to n-variant systems [33, 50]. However, our approach is adapted specifically to script engines, making it more fine-grained. More specifically, it operates and synchronizes on the bytecode level, whereas n-variant systems intercept system calls. Furthermore, n-variant systems aim to detect the exploitation of different classes of memory corruption vulnerabilities, depending on the utilized diversification method. In contrast, we focus on the detection of information leaks, which represent the first step in modern attacks. As such, our approach is capable of identifying the early phase of an attack instead of merely determining that the control flow has diverged. The major drawback of these systems is the detection approach: if a memory error is abused, one of the variants eventually crashes, which indicates an attack. As information leaks do not constitute a memory error, they do not raise any exception-based signal. Thus, they remain undetected in these systems. One significant implication is that, unlike Detile, n-variant systems do not protect against just-in-time code-reuse attacks such as JIT-ROP [189]. Similarly, this is the case with COOP attacks in browsers [177]. N-variant systems prevent conventional ROP attacks [167, 209] with multi-process execution and disjunct virtual address spaces: an attacker-supplied absolute address (e.g., obtained through a remote memory disclosure vulnerability) is guaranteed to be invalid in n − 1 replicas. Hence, any system call utilizing this address will trigger a detection. However, JIT-ROP attacks may perform several memory disclosures and malicious computations without executing a system call in between, and thus can evade traditional n-variant systems. COOP attacks may as well perform Turing-complete computations on disclosed memory without executing a system call and evade these systems. Additionally, we show that synchronized interpreter execution can be achieved in much more complex software systems with much higher synchronization granularity, even without having access to the source code. Private information leaks can be prevented with shadow executions [38]. In contrast, our prototype does not require a virtual machine per program to perform multi-execution, as it achieves synchronization on the binary level. Additionally, we aim to detect general and fine-grained raw memory information leaks usable to bypass security features. Other research showed that private information leaks in networks can be prevented by comparing the outbound traffic of original and shadow processes for divergence [53]. Our work differs strongly, as we focus on inherent binary flows and show that even in highly complex software, subtle differences can be used to detect illegal program behavior. Our dual execution approach differs from other multi-execution approaches in that both programs do not need to receive input tagged with a security level. This is due to the fact that raw memory leaks reveal themselves as differing contents inside low-privileged contexts. DieHard [17] and DieHarder [148] are memory allocators that mitigate heap vulnerabilities.
Furthermore, DieHard is able to operate in a so-called replicated mode to run a program several times in parallel and compare the output. In spirit, this is similar to our approach. However, the design of DieHard does not allow running programs that perform network operations or access the filesystem. In our work, we show that multi-execution is possible beyond programs in the scope of DieHard, e.g., in sophisticated and complex software like web browsers, without having access to their source code. Additionally, in single execution mode, memory disclosures like uninitialized reads fly under the radar of DieHard and DieHarder, as they do not necessarily constitute a memory error. Devriese and Piessens showed that noninterference can be achieved via secure multi-execution [68]. It targets a different context than our work and is implemented against source code to compare only JavaScript I/O. Similarly, it is possible to leverage static transformation of JavaScript and Python code of interest to obtain secure multi-execution [15]. Our work can serve as a starting point to achieve noninterference in binary programs which encompass security levels spread over boundaries that are difficult to connect (i.e., native memory and the JavaScript layer).

3.7 Discussion

In the following, we discuss potential shortcomings of our approach and the prototype, and also sketch how these shortcomings can be addressed in the future.


3.7.1 Further Information Leaks

Serna provided an in-depth overview of techniques that utilize information leaks for exploit development [182]. The techniques he discussed utilize JavaScript code. As our prototype leverages the JavaScript engine of the browser itself, each information leak that is based on these techniques is detected. This implies that memory disclosure attacks that leverage other (scripting) contexts (e.g., VBScript) can potentially bypass our implementation. However, in practice, exploits are typically triggered via JavaScript, and thus our prototype can detect such attacks. Furthermore, due to the generic nature of our approach, our current prototype can be extended by instrumenting other scripting engines. Another flavor of information leak is based on timing attacks. In 2012, it was shown that timing attacks on the hash table implementation in Firefox can be utilized to obtain addresses of string objects [153]. In 2013, another timing attack was presented that uses the garbage collector to leak addresses of internal data [25]. Hund et al. demonstrate that kernel-space ASLR can be defeated by timing attacks targeted at the cache and TLB [102]. Seibert et al. demonstrate how memory corruption vulnerabilities can aid in utilizing timing side channel techniques to gain knowledge about the executed code, even if the code is diversified [181]. Timing-based information leaks are difficult to detect, as it is hard to verify whether a computational operation serves the purpose of a timing side channel or is legitimate. Hence, the reliable detection of information leaks based on timing side channels has yet to be shown, but recent results are promising [69, 96]. While our detection does not trigger on timing-based information leaks themselves, it triggers on their usage, because we monitor the transition into native memory from JavaScript.
As such, we cannot observe the information leak itself, but we can detect its result as soon as it enters the scripting context or flows from the scripting context to native memory.

3.7.2 Limitations of Prototype Implementation

In the unlikely event that one of the functions we classified as an entropy source, such as Math.random or Date.now, contains a memory disclosure bug, our approach can lead to an under-approximation of detected information leaks. In this specific case, the master confuses the leaked pointer with data from the entropy source and transfers it to the twin process. This is an undesirable state, because Detile does not prevent the memory layout information from leaking into the script context. However, the obtained pointer is only valid in the master process. An attempt to leverage the pointer to mount a code-reuse attack crashes the twin. As a consequence, Detile halts the master process and prevents further damage. The current prototype disables the JIT engine, as we protect the interpreter only. However, dynamic binary instrumentation (DBI [31, 124]) frameworks allow synchronizing processes on the instruction or basic block level, and hence make it possible to hook emitted JIT code to dispatch our assembly stub in order to synchronize and check within the JIT code. Asynchronous JavaScript events are currently not synchronized. This is solvable with DBI frameworks as well: if an event triggers in the master process, we let the twin execute to the same point. Then Detile sets up and triggers the same event in the twin process.
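The entropy-source handling discussed here can be sketched as a simple replay channel: only the master evaluates the nondeterministic function, and the twin replays the forwarded value, so both contexts stay byte-identical for benign scripts. All class and method names below are illustrative; the real implementation intercepts the engine's native functions.

```python
# Sketch: keeping master and twin deterministic by letting only the
# master evaluate entropy sources (e.g. Math.random, Date.now) and
# forwarding the results to the twin over a channel.
import random

class MasterContext:
    def __init__(self):
        self.sent = []                  # models the pipe to the twin
    def math_random(self):
        value = random.random()         # real entropy, evaluated in master only
        self.sent.append(value)
        return value

class TwinContext:
    def __init__(self, channel):
        self.channel = channel
    def math_random(self):
        return self.channel.pop(0)      # replay the master's value

master = MasterContext()
twin = TwinContext(master.sent)

a = master.math_random()                # master executes first...
b = twin.math_random()                  # ...twin replays the same value
assert a == b                           # contexts stay synchronized
```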


One additional shortcoming of our prototype implementation is the identical mapping of ntdll.dll in all processes. As this DLL is already initialized at startup, remapping it is a cumbersome operation. We are not aware of any information leak attack gaining an address into ntdll.dll directly without performing prior memory disclosures. As JavaScript, HTML, and other contexts in browsers do not interact directly with native Windows structures, internal JavaScript objects normally do not contain direct memory references to it. There might, however, be script engines which directly interact with ntdll.dll. Still, the issue is probably solvable with a driver loaded at boot time. Another technical drawback is the application of re-randomization to every process on the OS, as the DLL modules of each process would turn into non-shareable memory and increase physical memory consumption. This can be avoided by protecting only critical processes that represent a valid target for attacks.

3.8 Conclusion

Over the last years, script engines were used to exploit vulnerable applications. Especially web browsers became an attractive target for a plethora of attacks. State-of-the-art vulnerability exploits, both in academic research [40, 64, 89, 90, 178] and in the wild [92, 211, 218, 220, 221], rely on memory disclosure attacks. In this work, we proposed a fine-grained, automated scheme to reliably detect such information leaks in script engines. It is based on the insight that information leaks result in a noticeable difference in the script contexts of two synchronized processes with different randomization. We implemented a prototype of this idea for the proprietary browser IE to demonstrate that our approach is viable even on closed-source systems. An empirical evaluation demonstrates that we can reliably detect real-world attack vectors and that the approach induces only a moderate performance overhead (around 17% on average). While most research focused on mitigating specific types of vulnerabilities, we address the root cause behind modern attacks, since most of them rely on information leaks as a fundamental step.


Chapter 4 Towards Architecture-Independent and CFI-Compatible Code-Reuse Attacks

The previous chapters dealt with information leaks (step II), as attackers need knowledge about the memory layout before they can further abuse memory corruptions to hijack the control flow of a program (step III). Due to DEP or W ⊕ X, data injected by attackers is no longer interpreted as code these days. Thus, since 2006 [115], adversaries have resorted to reusing code already available in the to-be-exploited program. Code-reuse attacks do not inject new code but chain together small chunks of existing code, called gadgets, to achieve arbitrary code execution. Several code-reuse techniques gained tremendous popularity among adversaries, as they are utilizable in a generic way, independent of the type of memory corruption. Hence, code reuse replaced code injection or is used as a preparative step to make injected data executable again. It is not surprising that research started to take action and developed techniques to impede code reuse. A specific defense is control-flow integrity (CFI). The security of coarse-grained CFI is questionable – especially for binary-only software – and a procedure to measure its security is the AIR metric (average indirect target reduction [229]). It allows measuring, in a limited way, how much code is still reusable by an attacker. The more code gadgets can be discovered and utilized at protected control transfers, the less secure a code-reuse or control-flow hijacking defense may be assumed to be. In this chapter, we attempt to maximize the discovery of CFI-compatible code gadgets in an architecture-independent way to enable code-reuse attacks.
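The AIR metric of Zhang and Sekar averages, over all n protected indirect control transfers, the fraction of the S possible target addresses that the CFI policy removes at each transfer. A hedged sketch with made-up numbers:

```python
# Sketch of the AIR metric (average indirect target reduction):
# AIR = (1/n) * sum_j (1 - |T_j| / S), where S is the number of possible
# indirect-branch targets without CFI and T_j the set of targets still
# allowed at branch j under the CFI policy. All numbers are made up.

def air(allowed_target_counts, total_targets):
    n = len(allowed_target_counts)
    return sum(1 - t / total_targets for t in allowed_target_counts) / n

S = 1_000_000                 # hypothetical targets reachable without CFI
targets = [50, 120, 10, 400]  # hypothetical allowed-target counts per branch

print(f"AIR = {air(targets, S):.4%}")
```

Note that an AIR value close to 100% does not by itself imply security: the few targets that remain allowed may still contain enough usable gadgets, which is exactly what this chapter exploits.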

4.1 Introduction

In response to the high success of code-reuse attacks, defensive research was driven to find protection methods. Some results of this research are kBouncer [156], ROPecker [46], EMET [74] including ROPGuard [75], BinCFI [229], and CCFIR [228]. These defenses incorporate two main ideas. The first is to enforce control-flow integrity (CFI) [1, 2]. With perfect CFI, the control flow can be hijacked neither by code injection nor by code reuse [89]. However, the overhead of perfect CFI is too high to be practical. Therefore, the proposed defense methods try to strike a balance between security and tolerable overhead.


The second idea is to detect code-reuse attacks by known characteristics of an attack, such as a certain number of gadgets chained together. All of those schemes defend against attacks on the x86/x86-64 architecture. For other architectures, research is lagging behind [61, 162]. Several generic attack vectors have been published by the offensive side to highlight the limitations of the proposed defense methods. Although single implementations can be bypassed with common code-reuse attacks by exploiting a vulnerability in the implementation [27, 57], generic circumventions rely on longer and more complex gadgets [40, 64, 89, 90, 178] or complete functions [177]. Since the gadgets lose their simplicity by becoming longer, it also becomes harder to find specific gadgets and chain them together. To the best of our knowledge, there is no gadget discovery framework available to search for CFI-resistant gadgets. To be able to assess a CFI solution, it is necessary to reveal code gadgets which could execute within the boundaries of the solution's CFI policies or detection heuristics. We provide a framework which is able to discover CFI-resistant code gadgets or complete functions across different architectures, an increasingly important property as CFI starts to evolve on non-x86 systems as well. Notably, no search for CFI-resistant code gadgets has been performed for ARM, while defenses for this architecture have already been developed [61, 162]. The information which our framework delivers helps security researchers to quickly prototype exploit examples to test CFI solutions. We opted to use an intermediate language (IL) for the analysis of extracted code to support different architectures without the effort of adjusting the algorithms to new architectures. Because of its high architecture coverage, VEX [204] is our choice for the IL. VEX is part of Valgrind, an instrumentation framework intended for dynamic use [203].
We harness VEX in a static-analysis manner [185, 187] and utilize the SMT solver Z3 [135] to translate code gadgets into a symbolic representation, enabling symbolic execution and path-constraint analysis. Our evaluation shows that our framework discovers 1.2 to 154.3 times more CFI-resistant gadgets across different architectures and operating systems than other gadget discovery tools. Additionally, we show that CFI-resistant gadgets are available in binary code for the ARM architecture as well, which should be taken into account by future CFI solutions. In summary, we make the following contributions:

• We develop a framework to discover CFI- and heuristic-check-resistant gadgets in an architecture-independent offline search.

• Our framework delivers semantic definitions of extracted code gadgets and classifies them based on these definitions for convenient search and utilization by a security researcher.

• To the best of our knowledge, we are the first to provide a code gadget discovery framework which reveals CFI-resistant gadgets across different architectures, and we show that CFI-compatible gadgets are also prevalent on the ARM architecture.


4.2 Technical Background

Before presenting our work, some background information needs to be reviewed in order to understand the remainder of this chapter. We begin by briefly describing code-reuse attacks, the CFI approaches we focus on in this chapter, as well as heuristic techniques proposed by recent research to defend against runtime attacks. It is important to understand the concept of CFI and the presented heuristic checks, as we focus on gadgets that are resistant against these approaches. Architecture independence is another issue that is tackled by our framework.

4.2.1 Code-Reuse Attacks

The introduction of Data Execution Prevention (DEP) [66] on modern operating systems provided a useful protection against the injection of new code. To bypass DEP, attackers often resort to reusing code already provided by the vulnerable executable itself or one of its libraries. Vulnerabilities suitable for code-reuse attacks are memory corruptions such as stack, heap or integer overflows, or dangling pointers. The technique most commonly applied to reuse existing code is return-oriented programming (ROP) [34, 183]. The concept behind ROP is to combine small sequences of code, called gadgets, that end with a return instruction. All combined gadgets of an exploit are often referred to as a gadget chain. To be able to combine these gadgets, either a sequence of return addresses has to be placed on the stack, where each address points to the next gadget, or the stack pointer has to be redirected to a buffer containing these addresses. The process of redirecting the stack pointer is called stack pivoting. For architectures with variable opcode length like x86/x86-64, the instructions used for the gadgets do not have to be aligned as intended by the compiler. Previous work has shown that enough gadgets for arbitrary computations can be located [34, 62, 72, 113] even

Figure 4.1: A simple gadget chain for the x86 architecture containing a ROP, JOP, and a COP gadget. The gadgets perform the calculation 0x10 + 0x20.

without those unintended instructions. This is an interesting observation that especially concerns architectures with fixed opcode length. Automated tools that search for gadgets and chain them together have also been developed by past research [103, 152, 179]. Over the years, research on code-reuse attacks has proposed different variations of ROP, such as jump-oriented programming (JOP) [26, 45, 62] and call-oriented programming (COP) [40]. JOP uses jumps instead of returns to direct the control flow to the next gadget, and COP accordingly uses calls. An exemplary chain of gadgets that involves ROP, JOP and COP is illustrated in Figure 4.1. Due to their complexity, code-reuse attacks are typically used to make injected code executable, thus defeating protections like DEP, and to redirect the control flow to the injected code [110, 159].
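To make the chaining mechanism concrete, the following is a minimal, hypothetical Python model of the chain in Figure 4.1, which computes 0x10 + 0x20. Gadget "addresses" index into a table; each gadget mutates a toy machine state and yields the next address, mimicking how a chain pops return addresses or jump targets off the attacker-controlled stack. All names and addresses are invented for illustration, not taken from the thesis.

```python
def pop_eax(state):          # ROP gadget: pop eax ; ret
    state["eax"] = state["stack"].pop()
    return state["stack"].pop()          # ret fetches the next address

def pop_ebx_jmp(state):      # JOP-style gadget: pop ebx ; jmp [next]
    state["ebx"] = state["stack"].pop()
    return state["stack"].pop()

def add_eax_ebx(state):      # COP-style gadget: add eax, ebx
    state["eax"] += state["ebx"]
    return None                          # end of the chain

GADGETS = {0x1000: pop_eax, 0x2000: pop_ebx_jmp, 0x3000: add_eax_ebx}

def run_chain(stack):
    state = {"eax": 0, "ebx": 0, "stack": stack}
    addr = state["stack"].pop()          # initial hijacked control transfer
    while addr is not None:
        addr = GADGETS[addr](state)
    return state["eax"]

# Attacker-controlled stack (top of stack = end of the list):
# pop_eax receives 0x10, pop_ebx_jmp receives 0x20, then add_eax_ebx runs.
payload = [0x3000, 0x20, 0x2000, 0x10, 0x1000]
print(hex(run_chain(payload)))   # -> 0x30
```

The model deliberately ignores real-world details such as stack pivoting and unaligned instruction decoding; it only shows how control and data interleave on the attacker-controlled stack.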

4.2.2 Control-Flow Integrity (CFI)

The concept of CFI was first introduced by Abadi et al. [1, 2]. Since then, a vast number of different CFI flavors have appeared with various security guarantees. We provide a more detailed overview in Section 5.6.1 of Chapter 5. A comparison of various CFI solutions is also provided by Burow et al. [36]. A program maintains the CFI property if the control flow remains in a predefined control-flow graph (CFG). This predefined CFG contains all intended execution paths of the program. If an attacker redirects the control flow via code injection or code reuse to an unintended execution path, the CFI property is violated and the attack is detected. In an ideal CFG, every indirect transfer corresponds to a list of valid unique identifiers (IDs) and every transfer target has an ID assigned to it [89]. These IDs are checked before indirect transfers occur to ensure that the target is valid. In Figure 4.2, an example of a CFG protected by ideal CFI is illustrated. The figure shows that every indirect call (IC) has one valid call target, unlike the return of FUNCTION 2(), which has two targets. Both invalid returns in FUNCTION 2() and FUNCTION 3() are ROP transfers. If CFI is applied to binary-only software, it becomes problematic to generate a complete CFG. For its construction, the program has to be disassembled and a pointer analysis has to be performed. Every error made during this process may lead to false positives at runtime of the protected program. Another issue with the classical CFI approach as proposed by Abadi et al. is its performance impact. Therefore, proposed CFI solutions—also called coarse-grained approaches—typically reduce the number of IDs by assigning the same ID to the same category of targets. However, this also decreases the security, as Figure 4.2 shows. If all returns share the same ID, the invalid return in FUNCTION 3() to ID C cannot be distinguished from the valid return to ID F.
Both transfers maintain the CFI property. Renowned instances of coarse-grained approaches are CCFIR [228] and BinCFI [229]. BinCFI uses two IDs to ensure the integrity of the CFG. The first ID defines rules for targets of return (RET) instructions and indirect jumps (IJ). Exception handlers, constant and computed code pointers are allowed for these transfers. The second ID combines rules for indirect control-transfers from the procedure linkage table (PLT) and indirect calls. For these transfers, exported symbols, constant and computed code pointers are allowed. Each ID has a corresponding routine inside the protected binary. Every indirect transfer is instrumented to jump to one of the two verification routines.
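As a rough illustration of such an ID-based check (a simplified model, not BinCFI's actual implementation), the following Python sketch assigns one of two class labels to each valid target and verifies the label before an indirect transfer. The addresses, labels and the verify routine are invented for illustration.

```python
# Two coarse-grained IDs in the spirit of BinCFI: one for targets of
# returns/indirect jumps, one for PLT entries and indirect calls.
ID_RET_IJ, ID_PLT_IC = 1, 2

# Per-address labels a (hypothetical) binary rewriter would have assigned.
labels = {
    0x401000: ID_PLT_IC,   # function entry point: valid indirect-call target
    0x401057: ID_RET_IJ,   # return address after a call site
}

def verify(target, allowed_id):
    """Verification routine invoked before every indirect transfer."""
    if labels.get(target) != allowed_id:
        raise RuntimeError("CFI violation at %#x" % target)
    return target

verify(0x401000, ID_PLT_IC)        # legitimate indirect call: passes
try:
    verify(0x401000, ID_RET_IJ)    # return into a function entry: blocked
except RuntimeError as e:
    print(e)
```

Note how the coarse granularity shows up directly: any function entry is acceptable to any indirect call, which is exactly the slack the gadget categories of Section 4.2.4.1 exploit.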


Figure 4.2: Exemplary CFG protected by ideal CFI. In an ideal CFI, all invalid control flow transfers are detected.

Similar to BinCFI, CCFIR is also a coarse-grained CFI approach applied to binaries without source code. Each indirect transfer is redirected through a Springboard. The springboard contains all valid control-flow targets and thereby prevents the flow from being redirected to invalid targets. An initial permutation of the springboard at program startup additionally raises the bar for attackers. CCFIR uses three IDs to maintain CFI. The first ID groups the targets of ICs and IJs. These transfers are allowed to target function entry points (EPs) only. A set of security-sensitive functions is excluded from this ID; they have to be called with a direct function call. The second ID represents return addresses. Only return addresses at intended call sites are valid targets for return instructions. The third ID groups return addresses within the previously mentioned security-sensitive functions. However, return instructions of security-sensitive functions are allowed to target return addresses of ID two and ID three, while normal return instructions are only allowed to target return addresses of ID three.

4.2.3 Heuristic Approaches

In 2013, Pappas et al. [156] introduced kBouncer, a heuristic-aided approach that leverages modern hardware features to prevent code-reuse attacks. To perform CFI checks, kBouncer utilizes the Last Branch Record (LBR). The LBR is a feature of contemporary Intel and AMD processors which can only be enabled and disabled in kernel mode. Therefore,

kBouncer consists of a user mode and a kernel mode component. As the name suggests, the LBR records the last taken branches or a subset of them. Each entry in the LBR contains the source and destination address of the taken branch. By fetching some bytes just before the destination address, kBouncer can examine and enforce that every return address is preceded by a call instruction. Otherwise, kBouncer reports a CFI violation. Besides the CFI enforcement, a heuristic check is performed by inspecting the last 8 indirect branches. If all entries match kBouncer's gadget definition, an attack is reported. An entry is considered a gadget if it contains up to 20 instructions and ends in an indirect control flow. The checks are invoked whenever one out of 52 critical WinAPI functions such as VirtualProtect or WinExec is called. The user mode component hooks these critical functions and triggers the checks in the kernel mode component. Another heuristic-aided approach is ROPecker by Cheng et al. [46], which also utilizes the LBR stack to look for gadgets in the past control flow. Additionally, the future control flow is examined. To check for gadgets in the future control flow, ROPecker combines online emulation of the flow, stack inspection, and an offline gadget search. Since gadgets are already searched for offline and stored in a database, ROPecker is also able to detect unaligned gadgets. To detect gadgets, ROPecker does not apply CFI enforcement, but merely relies on heuristics. A gadget in the context of ROPecker is a sequence of up to 6 instructions ending with an indirect control-flow transfer. Sequences containing direct branch instructions are excluded from the definition. ROPecker inspects the past control flow first by utilizing the LBR to record indirect branch instructions.
The first non-gadget encountered while walking the LBR backwards terminates the search for gadgets in the past control flow. Afterwards, the future control flow is inspected for gadgets. If the combined number of encountered gadgets from the past and future control flow is above a predefined threshold, an attack is reported. The research of Cheng et al. suggests that a threshold between 11 and 16 gadgets is a suitable number.
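The core of both heuristics can be sketched as a backward walk over an LBR-like record that counts consecutive gadget-looking entries against a threshold. The following hypothetical Python sketch uses kBouncer-style parameters from the text (up to 20 instructions per gadget, chain length 8); the data representation is invented.

```python
GADGET_MAX_INSNS = 20    # kBouncer: a gadget has at most 20 instructions
CHAIN_THRESHOLD = 8      # kBouncer inspects the last 8 indirect branches

def looks_like_attack(lbr_entries):
    """lbr_entries: instruction counts between recorded indirect
    branches, most recent entry last (an LBR-like history)."""
    chain = 0
    for insn_count in reversed(lbr_entries):
        if insn_count <= GADGET_MAX_INSNS:
            chain += 1               # short sequence: counts as a gadget
        else:
            break                    # first non-gadget terminates the walk
    return chain >= CHAIN_THRESHOLD

print(looks_like_attack([5, 3, 2, 4, 1, 6, 2, 3]))    # True: 8 gadget-like entries
print(looks_like_attack([5, 3, 2, 40, 1, 6, 2, 3]))   # False: a long sequence breaks the chain
```

The second call also previews the evasion discussed in Section 4.2.4.2: a single long gadget in the history resets the count and lets the chain pass as benign.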

4.2.4 Defeating the Countermeasures

All presented defenses against code-reuse attacks have been bypassed in recent years. While some attacks exploit vulnerabilities in a specific implementation to disable the checks [27, 57], we focus on generic bypasses to defeat the protections. We divide the defense policies into two categories: CFI policies posing limitations on indirect branch instructions, and heuristic policies looking for typical characteristics of code-reuse attack vectors.

4.2.4.1 CFI Policies

Attacks focusing on kBouncer and ROPecker just have to bypass the call site (CS) checks. However, attacks against BinCFI and CCFIR [64, 89] also have to take into account that ICs and IJs are limited to certain control-flow targets like function entry points (EPs). Göktaş et al. [89] categorize the gadgets by their prefix (CS or EP), their payload (IC, fixed function call (F), other instructions), and their suffix (RET, IC, IJ). This categorization

results in 18 (2 · 3 · 3) different gadget types. They even use gadgets containing conditional jumps. With these gadget categories, they are able to bypass CCFIR, which they consider stricter than BinCFI. Another interesting gadget type is the i-loop-gadget [178]. In their work, Schuster et al. use a loop containing an IC to chain gadgets and invoke security-sensitive functions.

4.2.4.2 Heuristic Policies

The heuristic policies explained in Section 4.2.3 check for chains of short instruction sequences. To evade these checks, long gadgets with minimal side effects were proposed [40, 90]. If the heuristic check encounters a long instruction sequence, the evaluation is terminated and the chain is classified as benign. Another elegant method is to invoke a function call to an unsuspicious function like lstrcmpiW [178]. If the unsuspicious function does not alter the global state of the program and takes enough indirect branches, the attack cannot be discovered by the heuristic checks.

4.3 System Overview

The process of discovering suitable code gadgets which fulfill certain CFI policies broadly consists of two phases: first, appropriate code has to be discovered and extracted. Second, it is translated into the symbolic representation and can then be classified according to semantic definitions.

4.3.1 Gadget Discovery

Before we can describe the process of the gadget discovery, we have to define the gadgets' properties first. These definitions are important as they determine the bounds and specify the content of the gadgets. After the definition of the gadgets is given, we introduce the algorithms to locate all points of interest for the gadget discovery and the algorithm to discover the gadgets themselves.

4.3.1.1 Gadget Categories

As explained in Section 4.2.4.1, our gadgets conform to the specifications defined by Göktaş et al. [89] and Schuster et al. [178], except for minor modifications. Their definitions provide sufficient properties to, for example, find complete functions for code reuse and other CFI-resistant gadgets. We used their definitions to restrict the gadget discovery, but definitions can be extended and added to our framework in a modular fashion to support additional gadget types. The bounds of our gadgets have to conform to legitimate control-flow targets. Thus, they have to start at an EP or at a CS and end with an IC, IJ, or RET. The content of a gadget is defined as either an IC, a fixed function call (F), or other arbitrary instructions. We opted to drop IC as a gadget content definition, because we can connect a gadget ending with an IC to the gadget starting at the corresponding CS. Fixed function calls are beneficial in two ways. Instead of reading the address of the function from the import address table (IAT) and preparing the call, one can simply use the gadget with the fixed function call. However, this only works if all parameters of the function can be set to the desired values. Furthermore, defenses preventing calls to security-sensitive functions [228] can be circumvented by using gadgets containing a legitimate call to the function. As we show in Section 4.5, many hardcoded function calls inside of gadgets exist.

Figure 4.3: Instructions of an example loop gadget. Just the gray basic blocks belong to a loop gadget by our definition.

Another useful gadget is the loop gadget. Loops can be used as a dispatch gadget [26, 178] to invoke other gadgets. Figure 4.3 shows a gadget proposed by Schuster et al. During the first iteration of the loop, RBX points to the beginning of a list with the addresses of the to-be-dispatched gadgets. RDI points to the end of this list during all iterations of the loop. If the end of the loop is reached, the gadget returns. The difference between the proposed gadget and the gadget defined for our search is that just the gray basic blocks in Figure 4.3 belong to our loop gadget definition. For simplicity, loop gadgets end with an IC and start either at the CS of their IC or at an EP. Hence, the basic block beginning with the label @skip and the last basic block comprise a separate, overlapping CS-RET gadget. This has the advantage that loop gadgets in big functions without a tailing gadget (CS-RET) are also found. Additionally, one can query whether another gadget starts at the end of the loop gadget. This way, when searching for tailless loop gadgets, we can query whether the overlapping code comprises a gadget with another suffix than RET. All supported gadget definitions are summarized in Table 4.1. These definitions allow us to extract code with conditional jumps such that each single code path represents a single gadget in a path-insensitive way.
As each of them is verified with symbolic execution later on, path-sensitive code gadgets arise and path-insensitive gadgets are dropped (see Section 4.3.2).
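The dispatcher role of the loop gadget can be sketched as follows: one register walks a list of gadget addresses while an indirect call invokes each entry, as in Figure 4.3. This is a hypothetical Python model; register names follow the figure, but the gadget bodies and table are invented.

```python
def dispatcher(state, gadget_table):
    # RBX points into the address list, RDI to its end (as in Figure 4.3).
    while state["rbx"] < state["rdi"]:
        target = state["list"][state["rbx"]]
        gadget_table[target](state)     # indirect call to the next gadget
        state["rbx"] += 1               # advance to the next list entry
    # loop finished: the tailing RET of the CS-RET gadget would fire here

def g_load(state):                      # invented gadget: load constant
    state["rax"] = 0x10

def g_add(state):                       # invented gadget: add constant
    state["rax"] += 0x20

table = {"load": g_load, "add": g_add}
st = {"rbx": 0, "rdi": 2, "list": ["load", "add"], "rax": 0}
dispatcher(st, table)
print(hex(st["rax"]))   # -> 0x30
```

The model shows why the loop gadget is so valuable to an attacker under coarse-grained CFI: every transfer it performs is a CS-to-IC pattern that the policies of Section 4.2.2 consider legitimate.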


Table 4.1: Gadget types currently supported by our framework, based on prefix, suffix and content.

Prefix  Content                 Suffix
EP      Arbitrary Instructions  IC
EP      Arbitrary Instructions  IJ
EP      Arbitrary Instructions  RET
EP      F                       IC
EP      F                       IJ
EP      F                       RET
CS      Arbitrary Instructions  IC
CS      Arbitrary Instructions  IJ
CS      Arbitrary Instructions  RET
CS      F                       IC
CS      F                       IJ
CS      F                       RET
CS      Loop                    IC

4.3.1.2 Discovering Points of Interest

To locate gadgets, our search algorithm follows the paths of the CFG. The starting points for the search algorithm are IC, IJ, and RET instructions. The algorithm to locate these points of interest works in two phases. In the first phase, the addresses of all calls to fixed functions in all modules of a program of interest are extracted and kept. The set of fixed functions comprises critical imported functions which handle memory management, process and thread creation, and file I/O. These are typically very valuable for an attacker. During the second phase, the algorithm iterates over every instruction belonging to a function. If an instruction is a RET, IC, IJ, or a call, the address of the instruction is added to the corresponding list of starting points.
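The two-phase scan can be sketched over a toy instruction stream as follows. This is a hypothetical Python illustration; the instruction tuples, the register-operand convention, and the fixed-function set are all invented, and a real implementation would work on disassembled binaries.

```python
FIXED_FUNCTIONS = {"VirtualProtect", "CreateThread"}   # invented critical imports

def find_points_of_interest(insns):
    """insns: list of (address, mnemonic, operand) tuples."""
    points = {"ret": [], "ic": [], "ij": [], "call": [], "fixed": []}
    for addr, mnem, op in insns:
        if mnem == "ret":
            points["ret"].append(addr)             # gadget endpoint: RET
        elif mnem == "call" and op.startswith("r"):
            points["ic"].append(addr)              # indirect call via register
        elif mnem == "call":
            points["call"].append(addr)            # direct call: defines a CS
            if op in FIXED_FUNCTIONS:
                points["fixed"].append(addr)       # phase 1: fixed function call
        elif mnem == "jmp" and op.startswith("r"):
            points["ij"].append(addr)              # indirect jump via register
    return points

code = [(0x10, "call", "VirtualProtect"),
        (0x13, "call", "rbx"),
        (0x15, "jmp", "rax"),
        (0x17, "ret", "")]
print(find_points_of_interest(code))
```

The collected RET/IC/IJ addresses later serve as the endpoints from which the backward traversal in Section 4.3.1.3 starts.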

4.3.1.3 Gadget Extraction with Depth-First Search

To construct the gadgets from Section 4.3.1.1, we have to traverse the CFG of every function in the binary. As we limit gadgets to single paths at first and can merge them into conditional gadgets later on in Section 4.3.2, we walk each path separately. We start our traversal from the discovered gadget endpoints, namely ICs, IJs, and RETs. We walk every possible path backwards until we discover a gadget starting point (EP or CS), or until we exceed an adjustable maximum instruction length of the gadgets. The algorithms we use are a modification of depth-first search (DFS). First, the basic block containing the gadget endpoint is located. Afterwards, we check if there are any calls or fixed function calls between the endpoint and the basic block's

beginning. If we encounter a call, a CS gadget is created and the path traversal stops. Before a gadget is added to the gadget list, we check whether a gadget with the same opcode sequence is already in that list, to optionally discard it or keep it for later analysis. If a fixed function call is encountered, we store the information about the fixed function call and split the current basic block. The resulting first block starts at the beginning of the original basic block and ends at the fixed function call. The resulting second block starts at the CS of the fixed function call and ends with the gadget endpoint. Thus, a CS-prefixed gadget is created. Path traversal continues, and if a call is hit, the traversal stops. We check if the current basic block contains the EP. In that case, we create an EP-prefixed gadget. To traverse all possible paths backwards, we keep path information and iterate over all directly preceding basic blocks. Then, for each block, we check if the basic block has been visited before. If that is the case, a loop gadget is only added if the traversed path starts at a CS and ends at an IC. In any case, the traversal returns if the basic block has already been visited. Afterwards, the checks for a call, fixed function call, and EP are repeated. Finally, the instruction length of the gadget is checked and updated.
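A heavily simplified sketch of this backward traversal is given below: starting from the basic block of a gadget endpoint, predecessor blocks are walked depth-first until a call site (CS) or the function entry point (EP) is found, or a maximum gadget length is exceeded. The CFG encoding and block attributes are invented; block splitting at fixed function calls and opcode deduplication are omitted for brevity.

```python
MAX_GADGET_LEN = 30   # adjustable maximum gadget length (in instructions)

def extract_gadgets(cfg, end_block, entry_block):
    """cfg: dict block -> {'preds': [...], 'insns': int, 'has_call': bool}.
    end_block contains the gadget endpoint (RET/IC/IJ)."""
    gadgets = []

    def walk(block, path, length):
        length += cfg[block]["insns"]
        if length > MAX_GADGET_LEN:
            return                          # gadget would become too long
        path = [block] + path               # prepend: we walk backwards
        if cfg[block]["has_call"]:
            gadgets.append(("CS", path))    # gadget starts at a call site
            return                          # traversal stops at a call
        if block == entry_block:
            gadgets.append(("EP", path))    # gadget starts at function entry
        for pred in cfg[block]["preds"]:
            if pred not in path:            # visited check: no revisits
                walk(pred, path, length)

    walk(end_block, [], 0)
    return gadgets

cfg = {
    "entry": {"preds": [],        "insns": 3, "has_call": False},
    "mid":   {"preds": ["entry"], "insns": 4, "has_call": True},
    "end":   {"preds": ["mid"],   "insns": 2, "has_call": False},  # ends in RET
}
print(extract_gadgets(cfg, "end", "entry"))   # -> [('CS', ['mid', 'end'])]
```

Because the block containing a call terminates the walk, the sketch reproduces the key property from the text: every extracted path is bounded by a CS or an EP, i.e., by a target a coarse-grained CFI policy would accept.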

4.3.2 Gadget Analysis

Two objectives are accomplished with the gadget analysis: first, we sort out gadgets with unsatisfiable path constraints, and second, gadgets are matched to semantic definitions and classified accordingly. This simplifies the task of a security researcher looking for wanted functionality. To make a simplified search possible, code gadgets are transformed into a symbolic representation, executed symbolically to determine their execution contexts, and clustered into semantic classes according to their execution effects.

4.3.2.1 Lifting Code Gadgets with Zex3 to Raw Symbolic Representations

Code gadgets are first translated to instructions of the VEX IL. These are mapped to Z3 expressions as evaluable strings and stored offline. Thereby, most architecture-dependent peculiarities, such as stack and flag usage, are abstracted away, and implicit execution effects are made explicit. The goal of this part of the framework, which we named Zex3, is to gather raw symbolic expressions which are closely related to the structure of VEX IL instructions. Thus, registers and memory accesses are still architecture dependent.

4.3.2.2 Unification of Raw Symbolics with Zolver3

Unification of architecture-dependent register and memory handling is done by a Z3 wrapper we developed, named Zolver3. The goal is to gather symbolic expressions for each gadget that are symbolically evaluable by one component only, namely Z3. Therefore, symbolic equations created by Zex3 are transformed into a generic format, such that register usage as well as memory reads and writes are adjusted. This produces a single base usable to separate symbolic representations into semantic bins and to verify the satisfiability of each code gadget. As mentioned in Section 4.3.1, each gadget is a single path. Thus, symbolic execution of overlapping gadgets can yield conditional gadgets as well.


Figure 4.4: An example for a fixed function call gadget with unsatisfiable path constraints.

4.3.2.3 Symbolic Analysis of Code Gadgets

During exploit development, it is necessary for a security researcher to rule out code gadgets which do not fulfill a desired functionality. We illustrate what we call unsatisfiability on a gadget with a fixed function call: at the time of compilation, it is unknown to the compiler whether a function call succeeds. Therefore, checks for the return value are normally inserted in the calling function. Depending on the return value, a different path in the control flow is taken. We might encounter such checks in gadgets containing a fixed function call. During exploitation, we expect the fixed function call to succeed; hence, a gadget depending on a failed fixed function call poses an unsatisfiable path constraint. An example of such an unsatisfiable path constraint is given in Figure 4.4. The gray basic blocks belong to the tested gadget. In case the fixed call to VirtualProtectEx succeeds, the return value in eax is non-zero. However, if the return value is non-zero, the jump to loc 4265BE is taken and the basic block belonging to the gadget is not reached. We use the Z3 wrapper Zolver3 to symbolically execute the gadget. If the return value is implemented for the tested fixed function call, our framework checks whether the tested gadget is satisfiable. In the case of an unsatisfiable gadget, the analysis process is aborted. With the current level of information, a researcher is only able to search through the discovered gadgets based on their boundaries. There is no knowledge about the gadget's effects on the state of the to-be-exploited process during runtime. This makes an efficient search for chaining gadgets cumbersome. Therefore, the second objective is to match every register output and every memory effect of the symbolic representation to a semantic definition. Zolver3 provides the state of every register and every memory effect based on the symbolic variables and input values of the registers and memory.
We do not have to trace every instruction of the gadget ourselves; instead, we can treat the gadget as a black box. We send symbolic input values in and get all modifications to the global state of the process by the gadget based on these symbolic input values. This means that all register and memory store output values are symbolic expressions of the input values. We can use these expressions to apply our semantic definitions to the gadgets. The process of applying the semantic definitions to the output equations is explained in the next section.
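The unsatisfiability test of Figure 4.4 can be sketched as follows. This hypothetical Python illustration replaces the Z3 query with a brute-force check over a tiny domain: the gadget's path requires eax == 0 (jump not taken), while a successful VirtualProtectEx call implies eax != 0, so no input satisfies both constraints and the gadget is dropped.

```python
def satisfiable(constraints, domain=range(0, 2)):
    """Toy stand-in for an SMT query: does any value in the domain
    satisfy all constraints simultaneously?"""
    return any(all(c(v) for c in constraints) for v in domain)

call_succeeds   = lambda eax: eax != 0   # modeled return value of the fixed call
path_constraint = lambda eax: eax == 0   # gadget path: jump to loc 4265BE not taken

print(satisfiable([call_succeeds, path_constraint]))   # False: gadget unsatisfiable
print(satisfiable([call_succeeds]))                    # True
```

In the real framework, Zolver3 poses the equivalent question to Z3 over the full symbolic state, which scales to constraints a brute-force check could never enumerate.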


4.3.2.4 Semantic Definitions

In the following, we present our semantic definitions. Combined with the search presented in Section 4.3.3, these definitions allow the researcher to search for gadgets with specific operations performed on a specific register or memory address. One or more definitions are assigned to each gadget, based on the operations the gadget performs. When a security researcher develops a code-reuse attack, the defined gadget types are the available instruction set. Therefore, the gadget definitions must cover all necessary instructions to perform arbitrary computations. The following gadget types are necessary to accomplish this:

• MovReg: A gadget to move the content of one register to another.

• LoadReg: A gadget to load a specific content into a register.

• Arithmetic: A gadget to perform arithmetic operations between registers.

• LoadMem: A gadget to load the content of a specified memory area into a register.

• StoreMem: A gadget to store the content of a register to a specified memory area.

We add the following four semantic definitions, because they represent operations which are commonly found in gadgets. Alternatives to extend the gadget definitions are discussed in Section 4.7.

• ArithmeticLoad: A gadget that loads the value from a specified memory address, performs an arithmetic operation on it, and stores the result to the destination register.

• ArithmeticStore: A gadget that extends a StoreMem gadget with an arithmetic operation.

• NOP - No Operation: A gadget that keeps certain registers untouched. This is very useful during a gadget search, because untouched registers can be marked as static.

• Undefined: If none of the previous semantic definitions match the equation of the register, the register is marked as undefined.

These gadget types are enough to create functionality containing jumps and conditional jumps. ROP uses the stack pointer to load the next instruction. Hence, an addition to or subtraction from the stack pointer changes the next instruction. This way, the developer can jump through her ROP chain. JOP and COP often use a dispatcher gadget, like the loop gadget, to invoke the gadgets of the chain. During the loop iteration, one register holds a pointer into the buffer containing subsequent gadgets. Instead of the stack pointer, as in ROP, the register holding the pointer to the buffer has to be modified for jumps. Conditional jumps, however, are more complicated. They have to be accomplished by chaining several arithmetic operations [64]. However, a study of exploits [159] reveals that jumping by manipulating the stack pointer is rarely used. Normally, the chains just make the shellcode executable and redirect the control flow to the beginning of the shellcode. Snow et al. [189] come to a similar conclusion regarding the gadget definitions in their research.


Applying the Definitions. At the end of the symbolic execution, we have an output equation for every register and memory write. These equations consist of Z3 expression trees, which represent the ASTs of Z3 expressions. Our definitions are stored as Z3 expression trees as well. Thus, we can match each symbolic operation a gadget performs against our definitions and tag the gadget with one or more of them. We apply our definitions to every register and obtain as many operations for every gadget as the architecture has registers. To apply the definitions, we loop over all equations belonging to classifiable registers and check whether the definitions match. Classifiable registers are the general-purpose registers of the architecture and the instruction pointer. These are the registers that are usually accessible. We try to match every memory write against the definitions recursively, because memory accesses can be nested and every new memory store adds a new layer consisting of Z3 store operations.
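The matching step can be sketched with expression trees modeled as plain nested tuples (Z3 ASTs in the real framework). The tree encoding, register names, and the example gadget below are invented for illustration; only a subset of the definitions is shown.

```python
def classify(reg, expr):
    """expr: nested tuples, e.g. ('add', ('reg', 'ebx'), ('const', 4))."""
    if expr == ("reg", reg):
        return "NOP"              # register left untouched
    if expr[0] == "reg":
        return "MovReg"           # value moved in from another register
    if expr[0] == "const":
        return "LoadReg"          # constant loaded into the register
    if expr[0] == "load":
        return "LoadMem"          # value loaded from memory
    if expr[0] == "add" and expr[1][0] == "load":
        return "ArithmeticLoad"   # memory load followed by arithmetic
    if expr[0] in ("add", "sub", "xor"):
        return "Arithmetic"
    return "Undefined"

# Output equations of an imaginary gadget after symbolic execution:
outputs = {
    "eax": ("reg", "ebx"),                               # mov eax, ebx
    "ecx": ("add", ("load", ("reg", "esp")), ("const", 4)),
    "edx": ("reg", "edx"),                               # untouched
}
print({r: classify(r, e) for r, e in outputs.items()})
# -> {'eax': 'MovReg', 'ecx': 'ArithmeticLoad', 'edx': 'NOP'}
```

As in the text, the NOP case is checked first: an equation that simply returns the register's own input value means the gadget has no side effect on that register.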

4.3.3 Semantic Search

In the previous steps, the gadgets have been discovered by their bounds and we have analyzed every effect the gadgets may have on the global state of a running process. As we want the search for the gadgets to be flexible, we perform the search on a register and memory write basis. One can specify the type of a single register, or the types, operations, and operands of many registers. Naturally, a search with just the type of a single register yields a lot of potential gadget candidates. In the following section, we define algorithms to order the gadget candidates and eliminate unsatisfiable gadgets.

4.3.3.1 Complexity Ordering

Upon a search, we have to present the simplest gadgets first to speed up the process of gadget chaining. To provide the gadgets ordered from simplest to most complex, we apply four criteria. The first criterion is that gadgets with the lowest instruction count are presented first; gadgets with a low instruction count are usually simple, as they typically do not perform many operations. The second criterion is to sort by the fewest memory writes: for every unnecessary memory write, it has to be ensured that the write address lies inside a writable memory area. The third criterion is to prefer gadgets with the fewest memory reads. The reason is the same as for memory writes; however, readable memory areas are typically encountered more often and are therefore easier to set up. Our last ordering criterion requires as many registers as possible to carry NOP definitions, as this limits unwanted side effects such as overwriting a register which was set up by a previous gadget. The algorithm implementing the described ordering is shown in Algorithm 2.
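The four chained orderings of Algorithm 2 can be expressed compactly with a stable sort and a composite key. The sketch below is a hypothetical Python rendering, not the framework's actual code; the Gadget fields are illustrative.

```python
# Hedged sketch of the complexity ordering: ascending instruction count,
# memory stores, and memory loads; descending NOP-register count (hence
# the minus sign). A single composite key reproduces the chained sorts.
from collections import namedtuple

Gadget = namedtuple("Gadget", "name instr_count mem_stores mem_loads nop_count")

def order_by_complexity(gadgets):
    return sorted(gadgets, key=lambda g: (g.instr_count, g.mem_stores,
                                          g.mem_loads, -g.nop_count))

pool = [
    Gadget("g1", 3, 1, 0, 2),   # longest gadget, one memory write
    Gadget("g2", 2, 0, 1, 5),
    Gadget("g3", 2, 0, 1, 7),   # same as g2, but more side-effect-free registers
]
print([g.name for g in order_by_complexity(pool)])  # ['g3', 'g2', 'g1']
```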

4.3.3.2 Gadget Verification

Our gadgets support paths containing conditional branches. The exact analysis of the conditions can be tricky. For example, suppose a gadget is needed to load the value 0x12345678

79 Chapter 4 Towards Architecture-Independent and CFI-Compatible Code-Reuse Attacks

Algorithm 2: Gadget Complexity Ordering Algorithm
Input: search_criteria {user-defined search criteria}
Output: gadgets {list of ordered gadgets}
begin
    gadgets ← GetGadgetsWith(search_criteria)
    gadgets ← OrderByInstrCountAsc(gadgets)
    gadgets ← SubOrderByMemStoresAsc(gadgets)
    gadgets ← SubOrderByMemLoadsAsc(gadgets)
    gadgets ← SubOrderByNOPCountDesc(gadgets)

from a specific memory address into a register. Algorithm 2 may return a gadget list with a LoadMem gadget ranked first that contains a conditional jump. The pitfall is that the jump is only taken if the LoadMem operation loads a NULL value. This renders the gadget useless for loading the value 0x12345678. Therefore, invalid gadgets similar to the one described above have to be sorted out. We use Algorithm 3 to check the constraints of the gadget list with Zolver3 until a satisfiable gadget is encountered.

Algorithm 3: Gadget Verification Algorithm
Input: reg_constraints {user-defined register constraints},
       gadgets {list of ordered gadgets}
Output: valid_gadget {first gadget fulfilling reg_constraints}
begin
    foreach gadget in gadgets do
        zolver3 ← new Zolver3()
        zolver3.AddGadget(gadget)
        zolver3.SetConstraints(reg_constraints)
        if zolver3.IsSat() then
            valid_gadget ← gadget
            break
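Algorithm 3's loop can be sketched in Python. Here, a plain predicate stands in for the Zolver3/Z3 satisfiability check, so the gadget dictionaries and their is_sat fields are purely illustrative.

```python
# Hedged sketch of the verification loop: walk the ordered candidates and
# return the first one whose path constraints admit the requested registers.

def first_satisfiable(gadgets, reg_constraints):
    """Return the first gadget satisfying reg_constraints, or None."""
    for gadget in gadgets:
        if gadget["is_sat"](reg_constraints):  # stands in for zolver3.IsSat()
            return gadget
    return None

# A LoadMem gadget whose conditional branch is only taken when the loaded
# value is NULL cannot produce 0x12345678 -- it is skipped.
candidates = [
    {"name": "loadmem_null_only", "is_sat": lambda c: c["value"] == 0},
    {"name": "loadmem_generic",   "is_sat": lambda c: True},
]
found = first_satisfiable(candidates, {"value": 0x12345678})
print(found["name"])  # loadmem_generic
```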

A search query is specified by a researcher in Python. Typically, the start/end types and the content definition of the gadget are specified, as well as the semantics and operations which the gadget has to fulfill.
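The thesis only states that queries are written in Python; a hypothetical query helper might look as follows, with all record fields and names assumed.

```python
# Hedged sketch of a search query against the gadget pool: filter by the
# gadget's start/end boundary types and by the operations it must perform.

def find_gadgets(db, start_type, end_type, operations):
    """Return gadgets matching the boundary types and required operations."""
    return [g for g in db
            if g["start"] == start_type and g["end"] == end_type
            and all(op in g["ops"] for op in operations)]

# Query: a CS-RET gadget (usable for CFI-compatible ROP) loading memory into eax.
db = [
    {"start": "CS", "end": "RET", "ops": [("LoadMem", "eax")]},
    {"start": "EP", "end": "IC",  "ops": [("MovReg", "ebx")]},
]
hits = find_gadgets(db, "CS", "RET", [("LoadMem", "eax")])
print(len(hits))  # 1
```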

4.4 Implementation

Our implementation consists of 6,537 lines of Python 2 code divided into several components that search, store, and analyze the gadgets. As VEX has architecture-dependent components that are incorporated into the IR, we also need modules that take care of these specific functionalities. This is important for the translation to semantically equivalent Z3 expressions that are passed to the solver.


4.4.1 Gadget Discovery

To discover the gadgets, we use an IDA Pro plugin. IDA Pro has several advantages due to its powerful API: it allows us to detect the target architecture of the binary; there is no need to parse the binary format ourselves; and it gives us a convenient instrument to work on the supplied CFG for aligned instructions. During the initialization of the plugin, we detect the architecture of the binary. Upon detection, the plugin is loaded and its functionalities are made available. The architecture is important during the gadget search to determine which critical functions and modules are to be used. When our plugin initiates the gadget search, the first step is to gather information such as indirect call, indirect jump, and return addresses. We perform this search independently of the architecture by utilizing IDA API functions such as is_call_insn. Unfortunately, there are no IDA API functions to check for indirect calls. To avoid architecture dependencies introduced by checking for specific mnemonics, we check whether IDA Pro detects a call instruction and then inspect specific operand type flags. Each gadget is stored directly to the SQLite database. The alternative would be to first cache the results and write them in bulk; however, this would consume large amounts of memory. As the Python interpreter embedded in IDA Pro is limited to a 32-bit process, analyzing a binary with hundreds of thousands of gadgets would quickly run out of memory.
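The storage strategy — one SQLite INSERT per discovered gadget instead of bulk caching in the 32-bit IDA process — can be sketched with Python's stdlib sqlite3 module; the schema and column names are illustrative.

```python
# Hedged sketch of per-gadget SQLite storage. Writing each gadget as it is
# discovered keeps the resident set small; nothing accumulates in Python.
import sqlite3

con = sqlite3.connect(":memory:")  # the framework would use an on-disk file
con.execute("CREATE TABLE gadgets (id INTEGER PRIMARY KEY, "
            "start_ea INTEGER, end_ea INTEGER, end_type TEXT)")

def store_gadget(start_ea, end_ea, end_type):
    # One INSERT per gadget; parameter substitution avoids string building.
    con.execute("INSERT INTO gadgets (start_ea, end_ea, end_type) "
                "VALUES (?, ?, ?)", (start_ea, end_ea, end_type))
    con.commit()

store_gadget(0x401000, 0x401005, "RET")
store_gadget(0x402010, 0x402018, "IC")
print(con.execute("SELECT COUNT(*) FROM gadgets").fetchone()[0])  # 2
```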

4.4.2 Gadget Analysis

For the gadget analysis, we run a script outside of IDA Pro in a 64-bit Python 2 environment. We have to use a 64-bit environment because pyvex is compiled as a 64-bit module to properly support binaries for 64-bit architectures. Due to the high memory consumption of analyzing the number of gadgets located in the first step, we use multiple child processes that each analyze only a part of the gadgets at a time. To analyze the gadgets, the parent process starts a child process in an infinite loop and passes a start and end value as arguments. The child process then loads the gadgets whose IDs lie between the start and end value. If no gadgets in this range are available, it is assumed that there are no gadgets with a higher ID than the start value. In this case, the child process returns 0 to the parent and the loop is aborted. Otherwise, the child process returns 1 and a new child process is started to analyze the next range. The memory usage of the analysis can be adjusted by increasing or decreasing the range assigned to each child. To sort out unsatisfiable path constraints, we check the satisfiability of each gadget. We set a timeout of 100 milliseconds for the satisfiability check by Z3 to keep the analysis efficient. If a gadget is unsatisfiable, either due to invalid path constraints or because the timeout is exceeded, the gadget is deleted from the database. Our results show that a sufficient number of gadgets remain with a timeout of 100 milliseconds. After the satisfiability check, we classify the gadget's registers along with its memory access operations and update the respective fields in the database.
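The parent/child batching protocol can be sketched as follows. For brevity, the sketch models the child as a function returning the same 0/1 codes instead of spawning a real process; all names are illustrative.

```python
# Hedged sketch of the batched analysis loop: the parent hands each "child"
# a [start, end) ID range; a child returns 1 if gadgets exist in the range
# and 0 once the range lies past the highest ID, which ends the loop.

def analyze_range(all_ids, start, end):
    batch = [i for i in all_ids if start <= i < end]
    if not batch:
        return 0   # no gadget with ID >= start: signal the parent to stop
    # ... symbolic execution and satisfiability checks would run here ...
    return 1

def parent_loop(all_ids, batch_size):
    start, batches = 0, 0
    while True:
        if analyze_range(all_ids, start, start + batch_size) == 0:
            break
        batches += 1
        start += batch_size   # a fresh child per range keeps memory bounded
    return batches

print(parent_loop(range(1000), 250))  # 4
```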


4.4.3 Gadget Search

We perform our gadget search solely on the SQLite database created in the previous step; optionally, multiple databases can be used. The search is performed in IDA Pro, which has the convenient advantage that the graph view can be utilized to highlight the instructions and the flow of the gadget. To conduct the search, we have implemented a function for every semantic definition. These functions set the options of the operation performed by the semantic definition, e.g., MovReg. They map the actual operation required by the gadget to the semantic definition that can be searched for. Options include specifying the source and destination registers for a MovReg gadget or the boundaries of the gadget. On top of the search for a single operation, we implemented a mode which allows specifying multiple operations to be performed by a single gadget. This allows one gadget, for example, to load two values from memory, perform an operation on them, and save the result back to a memory location. The list of gadgets returned by the database query is already sorted by Algorithm 2. This guarantees that simpler gadgets, in terms of side effects, are considered first. In case multiple database files have been specified, we abort the search after the first database that returns a list of potential gadgets. We proceed this way due to our implementation of Algorithm 2. Afterwards, the potential gadgets of the returned list are verified until a gadget is discovered that satisfies the specified conditions. If a gadget is found, we use auto inspect to either jump to the gadget and highlight it in IDA Pro's graph view or print information on the gadget.

4.5 Evaluation

In the following, we evaluate our prototype. More specifically, we analyze the distribution of the different gadget types across different architectures, demonstrate that we can discover enough gadgets for successful exploitation, and compare our framework to existing tools. We conduct all tests for our evaluation on a 64-bit Linux system running on an Intel Xeon E3 processor at 3.3 GHz. For CFG and disassembly creation we use IDA Pro, and VEX of Valgrind 3.9.0 is used for Zex3's translation process. Furthermore, we use pyvex's latest commit at the time of testing [186]. For our evaluation, we analyzed the x86/AMD64 versions of ieframe.dll and mshtml.dll of Microsoft's Internet Explorer (IE) 8.0.7601.17514. We selected these libraries as they are often used during exploitation of IE [159]. To evaluate our gadget finder on ARM, we analyzed Debian's (little-endian) libc-2.19.so, because we expect libc to always be loaded during exploitation of a Linux system on ARM. All gadgets residing in libc-2.19.so are in ARM mode. The gadget numbers presented in this section are the total numbers of gadgets, including gadgets with and without conditional branches.

4.5.1 Analysis Time of Gadget Discovery

Table 4.2 contains information on the number of gadgets available for search queries after discovery and analysis. During the analysis, we delete the gadgets containing instructions


Table 4.2: Number of available gadgets and the time it takes to discover and analyze them. All times are displayed in seconds (s).

                          ieframe.dll  mshtml.dll  ieframe.dll  mshtml.dll  libc-2.19.so
Architecture                  x86         x86        AMD64        AMD64         ARM
Discovered Gadgets           99355      160266      108010       181749        12401
Remaining After Analysis     91584      147695       95062       163827        10450
Discovery Phase (s)           51.9        91.6        66.7        122.0         11.0
Analysis Phase (s)         12873.3     28967.1     16242.7      51137.8       4068.0

that are unsupported by either VEX or Zex3. We also delete the gadgets with unsatisfiable path constraints. The row Remaining After Analysis shows how many gadgets remain after the analysis process. A beneficial factor for ARM is the architecture's frequent conditional instructions. However, these instructions result in complex Z3 equations and, therefore, take longer to translate and to symbolically execute than x86 and AMD64 instructions. Hence, the time benefit for ARM is higher if instructions do not have to be translated and analyzed.

4.5.2 Gadget Type Distribution

This section provides information on the distribution of different gadget start and end types, the number of discovered loops, and insights on direct calls to sensitive functions within gadgets. It is important to know about the availability of gadgets to determine whether successful exploitation is possible. Table 4.3 summarizes the gadget start and end type distribution. Note that the combination with the highest number of gadgets is CS-RET. With CS-RET gadgets, one can execute common ROP exploits without triggering CFI checks. Due to the high proportion of CS-RET gadgets, the highest chance of finding suitable gadgets for a gadget chain is

Table 4.3: Number of available gadgets categorized by gadget start and end type.

              ieframe.dll  mshtml.dll  ieframe.dll  mshtml.dll  libc-2.19.so
Architecture      x86         x86        AMD64        AMD64         ARM
EP-IC            4255        4245        4354         3947          261
EP-IJ              59         370         172         1009           79
EP-RET          11521       16723       10950        16517         2615
CS-IC           36300       55225       38679        68791         1226
CS-IJ              67          28          76         1365          240
CS-RET          39382       71104       40831        72198         6029


Table 4.4: The number of gadgets containing a loop for each analyzed library.

              ieframe.dll  mshtml.dll  ieframe.dll  mshtml.dll  libc-2.19.so
Architecture      x86         x86        AMD64        AMD64         ARM
Loops             348         443         335          464           55

searching for a ROP chain. Table 4.4 and Table 4.5 summarize the total count of gadgets containing loops and fixed function calls. Our loop counts, presented in Table 4.4, are based on our loop definition. This means that all listed loops end with an IC and start at the CS of the IC. The number of discovered loops can still be further increased by implementing loops for JOP or allowing relaxed loop definitions. It is worth noting that all functions typically used by attackers for malicious behavior are available, such as VirtualProtect to set memory to executable or writable, LoadLibrary to load a library into the address space, and CreateProcess to create a process. Gadgets containing fixed function calls are not restricted to some gadget start and end types, but are interspersed throughout all start and end type combinations. For the x86 and AMD64 DLLs mentioned in Table 4.5, we found 982 gadgets with hardcoded calls to functions which allocate memory, change memory permissions, load DLLs or perform file I/O operations.

Table 4.5: All fixed function calls remaining after the analysis and their count per analyzed library.

                                ieframe.dll  mshtml.dll  ieframe.dll  mshtml.dll
Architecture                        x86         x86        AMD64        AMD64
msvcrt::memcpy                      105         187         162          424
KERNEL32::VirtualProtect              1           0           1            0
KERNEL32::VirtualAlloc                0           0           1            0
KERNEL32::MapViewOfFile               7           1           4            0
KERNEL32::LoadLibraryW               22          10          28           10
KERNEL32::LoadLibraryExW              4           0           0            0
KERNEL32::LoadLibraryA                1           0           2            6
KERNEL32::CreateProcessW              5           0           1            0
KERNEL32::CreateFileW                 1           0           2            0
KERNEL32::CreateFileMappingW          1           1           0            0


4.5.3 Exploiting ARM with One CFI-Resistant Gadget

To evaluate our gadget finder on ARM, we exploit an artificial use-after-free vulnerability. While we explain use-after-free bugs in depth in the next chapter, it is important to note that an adversary may gain control over memory content subsequently used in the program's execution. The instruction initiating our chain is an IC in ARM mode, and the first argument, stored in R0, contains a pointer to our prepared buffer content. The protection in place is similar to CCFIR. This means that ICs and IJs can only transfer the control flow to EPs, and RETs are only allowed to return to legitimate CSs. We assume that an information leak is available, which is usually the case for real-world exploits. Our gadget pool is derived from Debian's libc-2.19.so. All discovered gadgets are in ARM mode. The goal of the exploit is to execute system("/bin/sh").

Figure 4.5: An ARM gadget which loads the address of “/bin/sh” from the supplied buffer in R0, loads the address of system from the buffer to R12, and ends with an IC of R12.

On ARM, the first argument to a function is not passed on the stack, but in register R0. Therefore, to execute system("/bin/sh"), we have to load the address of a string containing "/bin/sh" into R0. We do not have to write the string to memory ourselves, as it is already present in libc-2.19.so. We use the information leak to get the base address of libc-2.19.so, which is also required to get the address of system(). First, however, we have to find the gadgets that load the address of system() and the string "/bin/sh" from the buffer and call the system() function. These addresses are placed later on in our buffer. A pointer to the buffer is passed to our gadgets in R0. Due to the protection scheme in place, the gadget has to start at an EP; the end of the gadget is not yet defined. An automatically discovered gadget that exhibits the required actions is displayed in Figure 4.5. First, it loads the address of "/bin/sh" from our buffer to R0 via LDR R0, [R0,#0x1C]. Second, it loads the address of system() to R12 and calls R12 at the end. This way, the objective to execute system("/bin/sh") is achieved with a single gadget. The buffer that we use during the exploit is shown in Listing 4.1. At offset 0, the buffer must contain 0x00000001 to satisfy TST R3,#1. Only if this check passes, the address of system() is loaded and called.


# Must contain 0x00000001.
Buf+0x00 = 0x00000001
# Address of the first gadget. Position in buffer is dependent on freed object.
Buf+0xXX = 0x00071704
Buf+0x04 = 0x41414141
...
Buf+0x18 = 0x41414141
# .rodata:00122F58 aBinSh DCB "/bin/sh",0
Buf+0x1C = 0x00122F58
Buf+0x20 = 0x41414141
...
Buf+0xA0 = 0x41414141
# .text:0003B190 system
Buf+0xA4 = 0x0003B190

Listing 4.1: Buffer exploit data. Only the addresses at offsets 0x1C and 0xA4, the address for the initial control-flow transfer, and the 0x00000001 at offset 0 have to be set.
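Under the layout of Listing 4.1, the buffer can be assembled with struct.pack. This is an illustrative sketch: the libc base of 0 is a placeholder for the leaked address, and the object-dependent gadget-address offset (Buf+0xXX) is left out.

```python
# Hedged sketch: assembling the 32-bit little-endian exploit buffer.
import struct

libc_base = 0x0                        # placeholder for the leaked base
BINSH  = libc_base + 0x00122F58        # .rodata "/bin/sh"
SYSTEM = libc_base + 0x0003B190        # system()

buf = bytearray(b"\x41" * 0xA8)                 # 0x41414141 filler
buf[0x00:0x04] = struct.pack("<I", 0x00000001)  # satisfies TST R3,#1
buf[0x1C:0x20] = struct.pack("<I", BINSH)       # loaded into R0 by the gadget
buf[0xA4:0xA8] = struct.pack("<I", SYSTEM)      # loaded into R12, then called
# The gadget address at the object-dependent offset (Buf+0xXX) is omitted.

print(struct.unpack("<I", buf[0x1C:0x20])[0] == BINSH)  # True
```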

4.5.4 Comparison to Other Gadget Discovery Tools

To investigate how our framework performs compared to other tools, we used ROPgadget [172], xrop [217], and IDA sploiter [104] to search for unique gadgets in mshtml.dll, ieframe.dll, and libc-2.19.so. ROPgadget performs a semantic search based on the disassembly of Capstone [39], while xrop and IDA sploiter perform a standard instruction search. IDA sploiter uses IDA Pro; hence, we can compare our framework to a tool which uses the same disassembly as input. We searched for gadgets with a maximum length of 30 instructions with ROPgadget and IDA sploiter, and with a maximum length of five instructions with xrop, because its length cannot be adjusted. Then we dropped the unaligned gadgets these tools delivered, as well as non-CFI-resistant gadgets. Table 4.6 directly compares the number of gadgets found by our framework and the other tools. In summary, our framework discovered 1.2 to 154.3 times more gadgets.

4.6 Related Work

Code-reuse attacks have evolved from a simple return-into-libc [67] into a highly sophisticated attack vector. In times of DEP, Sebastian Krahmer was the first to propose a method called the borrowed code chunks technique [115]. By chaining code snippets that end with return instructions, Krahmer showed how to perform specific operations and, as a consequence, bypass DEP. His work was extended by Shacham in 2007 [183]. Shacham showed that Turing-completeness can be achieved by reusing instruction sequences that end in return opcodes, leading to the name Return-Oriented Programming. He called those sequences gadgets. Typically, large code bases provide enough gadgets to achieve Turing-completeness. While the first attacks targeted the x86 architecture, the concepts have been shown to be applicable to ARM [113] and SPARC [34] systems as well. ASLR [157] has been successful in stopping static ROP chains. However, its ineffectiveness has also been shown in the presence of information leaks. Even fine-grained randomization can be circumvented


Table 4.6: Number of unique EP and CS gadgets found by other tools in comparison to our framework. Improvement factor states the factor of more gadgets found by our tool.

Tool                        CFI-resistant gadgets   Improvement factor
IDA sploiter:
  libc (ARM)                        0               ARM not supported
  ieframe.dll (x86)             11721                      7.8
  mshtml.dll (x86)              14762                     10.0
  ieframe.dll (x86_64)          14192                      6.7
  mshtml.dll (x86_64)           19984                      8.2
ROPgadget:
  libc (ARM)                     8677                      1.2
  ieframe.dll (x86)             28747                      3.2
  mshtml.dll (x86)              30631                      4.8
  ieframe.dll (x86_64)          10479                      9.1
  mshtml.dll (x86_64)           14283                     11.5
Xrop:
  libc (ARM)                     1107                      9.4
  ieframe.dll (x86)               660                    138.8
  mshtml.dll (x86)                957                    154.3
  ieframe.dll (x86_64)           1531                     62.1
  mshtml.dll (x86_64)            2479                     66.1
Our framework:
  libc (ARM)                    10450                       -
  ieframe.dll (x86)             91584                       -
  mshtml.dll (x86)             147695                       -
  ieframe.dll (x86_64)          95062                       -
  mshtml.dll (x86_64)          163827                       -

by means of just-in-time ROP, as demonstrated by Snow et al. [189]. During the attack, they harvest gadgets based on the Galileo algorithm introduced by Shacham et al. [183]. The algorithm starts at return instructions and iterates backwards over a code section to retrieve gadgets that end with the return instruction. A table lookup matches their gadgets against semantic definitions. This differs from our approach, as we lift only CFI-permitted code paths to an intermediate representation (VEX) with high ISA coverage and symbolically evaluate the gadgets to achieve a semantic binning. Schwartz et al. developed a gadget search and compiler framework to automatically generate ROP chains. They apply program verification techniques to categorize gadgets into semantic definitions [179]. However, they do not take CFI policies into account. Besides the randomization approach, another defense has emerged: by enforcing certain constraints on the allowed control flow of a program, ROP chains can be detected. Abadi et al. [1] proposed control-flow integrity, which laid the foundation for practical implementations. However, a perfect CFI solution introduces an unacceptable overhead. As such, the currently used variants relax the requirements of the classical CFI approach

or employ heuristics, leading to attacks as shown by Göktaş et al. [89]. They showed that utilizing specific classes of gadgets leads to bypasses of modern CFI defenses. Schuster et al. [178] evaluated multiple modern ROP defenses and found ways to bypass them in practice with little change to the attack methodology. COOP also utilizes basic symbolic execution on virtual functions to find methods for an attacker to perform complete function reuse. To aid in both the development of ROP attacks and CFI defenses, toolkits to locate suitable gadgets have emerged. Frameworks such as the one introduced by Kornau [72] or ROPgadget [172] utilize an intermediate language to abstract the underlying architecture. However, these do not locate gadgets conforming to the constraints introduced by CFI solutions. Our framework fills this gap and enables researchers to test their CFI policies on multiple architectures with only one toolkit. Closely related to our work is research which tries to measure gadget quality by introducing several metrics [81]. However, these metrics are bound to one architecture, while our approach is architecture-independent.

4.7 Discussion

The core property of our framework is the ability to quickly test CFI policies on multiple architectures. With the possibility to locate gadgets conforming to the same constraints in multiple environments, we enable researchers to gain a fast overview of the security of policies. This is applicable not only to one architecture, but to all systems supported by our toolkit. As such, it speeds up evaluation, allowing more time to be invested into the design of the policies. The multi-platform approach also enables determining differences between architectures, each of which has an impact on the availability of certain gadget classes. One specific gadget class can commonly occur on one architecture, while it is nearly non-existent on another, consequently not posing a risk there. Allowing researchers to focus on the most relevant gadget classes for each architecture may lead to defenses that fit the environment better. While there are other toolkits that are able to locate gadgets on ARM, our framework differs in that it allows applying the same CFI policies to different architectures.

Limitations. At the current state, we do not include a compiler that is able to generate complete chains from the found gadgets. While we simplify the task by providing a query interface, the last step is still manual. The simplest approach would be to blindly combine chains of gadgets until one of them satisfies the constraints. However, a better solution is to combine gadgets based on a logic that translates an intermediate language written by a developer into a series of gadgets. This is no easy task, as avoiding CFI detection requires longer and more complex gadgets, which are not side-effect free. The compiler would need to account for both the intended effects and the compensation of any side effects of the gadgets. Due to the modular design, we can support additional gadget types and architectures. For instance, it is possible to extend the discovery phase to locate unintended instructions or whole virtual functions needed for a COOP attack [177]. Another option is extending the definitions by a limit on the targets of a gadget's IC. This allows assessing fine-grained


CFI defenses. Recently, fine-grained CFI protections were shown to be less secure than assumed [41, 48, 78]. Hence, it is desirable to extend our framework with the specifics of these attacks to be able to evaluate fine-grained CFI systems in a generic way as well. Currently we only focus on coarse-grained CFI solutions.

4.8 Conclusion

We present a framework that not only discovers code-reuse gadgets across multiple architectures, but also locates gadgets that can be used in the presence of deployed CFI defenses. While our framework can be used in an offensive way, we deem its value for defensive research to be higher. By quickly testing CFI constraints on multiple architectures, it is possible to focus on the most relevant attack vectors and improve both the defensive capabilities and the performance. In this process, we also showed that it is possible to locate CFI-compatible gadgets not only on x86, but also on ARM. CFI research is lagging behind on mobile platforms, and we hope that by providing an effective evaluation tool, further work on this topic can be simplified.


Chapter 5 Vtable-Hijacking Protection for Binary-Only Software

Most CFI solutions try to protect indirect control transfers. In such transfers, the target is not hardcoded but selected during runtime. For example, this is the case for return instructions or indirect calls. Notably, indirect control transfers arise in C++ applications, and major parts of web browsers are developed in this language. While spatial and temporal memory corruptions such as use-after-free bugs exist in C as well, adversaries have additional capabilities when hijacking the control flow in browsers. C++ provides an additional layer of indirection due to its additional language features. Some objects contain fields which are pointers to function tables (virtual function tables). These fields are used during runtime to dynamically select the appropriate target function at special indirect control transfers (virtual function call sites). The crux of the matter is that it is easier for adversaries to replace or manipulate the pointer field than to manipulate the control transfer target directly. In general, this technique is called vtable hijacking. In this chapter, we present a vtable-hijacking detection framework for binary-only software in order to prevent control-flow hijacking via bugs such as use-after-free errors.

5.1 Introduction

Particular kinds of programming mistakes that are prevalent today in C++ result in so-called use-after-free vulnerabilities. These temporal safety problems are often abused by adversaries [4]. In a use-after-free bug, a program path exists during which a pointer to an object that was previously freed is used again. This dangling pointer could cause the program to crash, unexpected values could be used, or an adversary could even execute arbitrary code. Several former zero-day exploits for Microsoft's Internet Explorer were based on such use-after-free vulnerabilities [128]. In fact, a recent study suggests that 69% of all vulnerabilities in browsers and 21% of all vulnerabilities in operating systems are related to such bugs [37].


To take advantage of such vulnerabilities in object-oriented code, attackers typically utilize a technique called vtable hijacking. Compared to traditional attacks like stack-based buffer overflows or format-string attacks, this technique targets heap-based pointers to virtual tables (short: vtables), a feature of object-oriented languages like C++. Polymorphic C++ classes have vtables that contain function pointers to the implementations of their virtual methods. If an object of a polymorphic class is freed, but a reference to it is kept, bogus vtables can be utilized to gain control of the program's control flow. To this end, a bogus vtable is crafted and a pointer to it is injected into the memory the dangling reference points to. A virtual function call site is then abused to use the bogus vtable, enabling an adversary to hijack the control flow.
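The mechanics can be modeled in a few lines of Python: an "object" whose first field references a function table, a dangling "pointer" (here, a slot id), and an attacker who reclaims the freed slot with a bogus vtable. This is a conceptual model, not actual C++ semantics.

```python
# Hedged model of vtable hijacking via a dangling reference: the dispatch
# goes through the object's vtable field, so whoever controls the reclaimed
# memory controls the call target.

def greet():      return "legit virtual method"
def shellcode():  return "attacker-controlled code"

heap = {}                                   # fake allocator: slot -> object

def new_obj(slot):
    heap[slot] = {"vtable": [greet]}
    return slot                             # "pointer" is the slot id

def virtual_call(slot, index):
    return heap[slot]["vtable"][index]()    # dispatch through the vtable

p = new_obj(0x100)
print(virtual_call(p, 0))                   # legit virtual method

del heap[0x100]                             # object freed, p now dangles
heap[0x100] = {"vtable": [shellcode]}       # attacker reclaims the memory
print(virtual_call(p, 0))                   # attacker-controlled code
```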

5.1.1 Preventing Vtable Hijacking

Since vtable hijacking attacks are a frequent problem in practice, several compilers started to include protection techniques during the compilation phase. Both GCC and LLVM started to implement defense solutions. More specifically, GCC introduced the -fvtable-verify option [202] that analyzes the class hierarchy during the compilation phase to determine all vtables. Using this information, all virtual function call sites are modified such that virtual method dispatches can be checked during runtime. Similarly, SafeDispatch [106] – as an LLVM extension – inserts checks during the compilation phase. VTGuard by Microsoft adds a guard entry at the end of the vtable such that (certain kinds of) vtable hijacks can be detected. Note that these approaches are only applicable to source code since they are implemented during the compilation and link phases. This prevents an adoption for COTS applications where only the binary code is available. However, especially such applications are vulnerable to vtable hijacks. After the research provided in this chapter was published [84, 85], similar and more fine-grained solutions for source and binary-only code have emerged. We provide a detailed overview in Section 5.6.

5.1.2 Our Approach

In this chapter, we present a lightweight approach to provide vtable integrity for COTS binaries implemented in C++. We perform our analysis on the binary level since we aim to protect executables for which we have neither source code, debugging symbols, nor runtime type information, such as web browsers or office applications for Windows. To this end, we introduce a generic method to identify virtual call sites in C++ binary code. More specifically, we lift the assembler code to an intermediate language (IL) and then perform backward slicing on the IL level such that we can spot different kinds of C++ virtual function dispatches. In a second step, we instrument each virtual call site and add integrity checks. To this end, we implemented a generic, static binary rewriting engine for PE files that enables us to implement an integrity policy P for each call site. For now, we have implemented different kinds of integrity policies that, for example, check whether a vtable pointer points to a writable memory page (which indicates that an integrity violation happened) or whether a randomly chosen vtable entry actually is a code pointer. We have implemented our approach in a tool called T-VIP (Towards Vtable Integrity Protection) that consists of a slicer called vExtractor and a binary rewriting engine

called PeBouncer. Experimental results demonstrate that the precision is reasonable and the performance overhead small. Furthermore, our tool was able to mitigate all tested zero-day attacks against Microsoft's Internet Explorer and Mozilla Firefox. Our main contributions are:

• We introduce an automated method to identify virtual function call sites in C++ binary applications based on an intermediate language and backward slicing. This enables us to determine the potential attack surface for use-after-free and related vulnerabilities in binary executables implemented in C++.

• We present a generic binary rewriting framework for PE executables with low overhead called PeBouncer that we utilize to implement integrity policies for virtual call sites.

• To the best of our knowledge, we are the first to present virtual table integrity protection for binary C++ code without the need for source code, debugging symbols, or runtime type information.

• We show that T-VIP protects against sophisticated, real-world use-after-free remote code execution exploits launched against web browsers, including zero-day exploits against Microsoft's Internet Explorer and Mozilla Firefox. A performance evaluation against GCC's virtual table verification feature [202] with micro- and macro-benchmarks demonstrates that our approach introduces a comparable performance overhead.

5.2 Technical Background

Before presenting our approach to enforce the integrity of virtual call sites, we first review the necessary technical background to understand the rest of the chapter. More specifically, we explain C++ inheritance and polymorphism and show their manifestation at the internal low-level assembly stage. Furthermore, we discuss how virtual function tables are typically implemented, how this enables use-after-free memory corruption vulnerabilities, and explain why we need an intermediate language to perform our analysis.

5.2.1 C++ Inheritance and Polymorphism

Inheritance is a general concept in Object Oriented Programming (OOP) languages. Data structures called classes can contain data attributes and functions named methods. Working with classes is mostly done through their instances, which are referred to as objects. Classes can serve as base classes when they are inherited from, creating derived classes, which inherit the base's attributes and methods in addition to their own. Classes can inherit from multiple base classes, and derived classes can themselves serve as base classes, such that a (potentially very complex) class hierarchy is created between them. A programmer can change the functions of base classes inside derived classes by overriding or re-implementing them. Such functions must be declared as virtual, and any class containing

virtual functions is a polymorphic class (see Figure 5.1, left). The binding of virtual functions to a class instance is performed dynamically at runtime if the compiler cannot determine the instance's type statically. The function thus acts as a message and the instance as the message's receiver: depending on the dynamically determined type of the instance, the appropriate function is selected, hence the message has different effects on the instance according to its runtime type. For more details, the reader is referred to the literature [16, 197].
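Although this chapter deals with compiled C++, the message/receiver behavior itself is language-independent. The following minimal Python sketch (all class and method names are invented for illustration) shows how the receiver's runtime type selects the function that actually runs:

```python
class Base:
    def fc(self):
        return "Base.fc"

class Derived(Base):
    # the derived class overrides the base's virtual function
    def fc(self):
        return "Derived.fc"

def dispatch(receiver):
    # late binding: which fc runs depends on the receiver's runtime
    # type, not on any statically declared type of the reference
    return receiver.fc()

assert dispatch(Base()) == "Base.fc"
assert dispatch(Derived()) == "Derived.fc"
```

In compiled C++ the same effect is achieved through the vtable indirection discussed in the next section, rather than by name lookup at runtime.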

Polymorphic classes (left):

    class A { virtual int Fn(){..}; };
    class B { virtual int Fc(){..}; };
    /* single inheritance */
    class C : public B { virtual int Fc(){..}; };
    /* multiple inheritance */
    class D : public A, public B { .. };

Call of an overloaded virtual function (middle):

    C* p = new C();
    p->Fc();
    ① mov R, [p]
    ② add R, offsetFc
    ③ mov R, [R]
    ④ mov this, p
    ⑤ call R

Call of an inherited base class's virtual function (right):

    D* p = new D();
    p->Fc();
    ① mov R, [p+offsetVtB]
    ② add R, offsetFc
    ③ mov R, [R]
    ④ lea this, [p+offsetVtB]
    ⑤ call R

Figure 5.1: Single and multiple inheritance with polymorphic classes (left); C++ and assembly code dispatching an overloaded virtual function (middle); and an inherited base class's virtual function dispatch (right). Registers are denoted with R.

When compiling C++ code that contains virtual function dispatches, most compilers generate instructions containing indirect calls. These constructs are attractive to attackers, who can gain control of the instruction pointer by controlling the call's target register. We elaborate on this danger in the following sections.

5.2.2 Virtual Function Calls

For each class that defines virtual functions, a virtual function table (abbr. vtable) is created during compilation. It contains the addresses of all virtual functions that a class provides. At runtime, when an instance is created, a pointer to a vtable is inserted into the instance's layout similar to a data attribute. A class instance's lifetime can involve the usage of several vtables, depending on the number of base classes with virtual functions it inherits from. Figure 5.1 shows the low-level instructions of two virtual function dispatches. At first, a vtable address is loaded into a register (①). Then an indexing offset is added to it to let the register point to the address of the virtual function (②); this is omitted if the virtual function is the vtable's first entry. Afterwards, the function is selected by dereferencing the vtable's entry (③) and dispatched with an indirect call (⑤). Additionally, a this pointer is created and passed as parameter to the virtual function (④), either via the register ecx [130] or as the first parameter; in the latter case, the location (e.g., the stack, or a register such as rdi) is specified by the corresponding calling convention. The this pointer constitutes the instance's address and is adjusted in case of multiple inheritance.

94 5.2 Technical Background

Table 5.1: Variations of step (4) based on compilers: variadic virtual functions retrieve the this pointer via the stack, either by a push instruction or by stack pointer addressing (FPO). For non-variadic functions, the ecx register is used.

Compiler        Passing this to virtual function via:
                non-variadic function    variadic function
GCC (MinGW)     ecx                      stack (FPO)
LLVM            stack (FPO)              stack (FPO)
MS Visual C++   ecx                      stack (push)

These semantic steps can then be generalized: let obj be the address of an instance and i the displacement offset to the vtable pointer at obj. The length of the vtable in bytes is indicated as |vtable|. Then, on a 32-bit system, j ∈ [0, |vtable|/4 − 1] denotes the index into the vtable at which the address of a virtual function vf resides. A memory dereference is stated with mem. Thus, we get:

∀ vf ∃ mem : mem(mem(obj + i) + j ∗ 4) = vf   (5.1)
∀ this ∃ obj : (obj + i) = this   (5.2)

(5.1) and (5.2) hold for virtual functions called indirectly, where (5.1) comprises steps ① - ③ and ⑤, and (5.2) describes step ④. Compilers usually translate calls into these five steps [70]. Highly optimized code, such as modern web browser libraries, can omit step ④, combine several steps into single instructions, and have multiple unrelated instructions in between. There are syntactical varieties in steps ① - ③ and ⑤ dependent on optimization levels. However, the manifestation of semantic step ④ in assembly strongly depends on the compiler used and is independent of the optimization level (see Table 5.1). These subtleties were observed in our 32-bit test binaries originating from C++ code with virtual, single, and multiple inheritance and polymorphic classes, as well as in COTS browser code. While the syntax may differ, virtual function dispatches reveal themselves in generalized semantics, at least in binary code stemming from GCC, LLVM, and MS Visual C++. As GCC and Visual C++ are standard compilers for browsers on MS Windows, our framework is able to extract virtual function dispatches from their generated code (see Section 5.4.1). We refer to the low-level assembly semantics of a C++ virtual function call as a virtual dispatch, which includes vtable loading, virtual function selection, and passing the this pointer as hidden or first parameter to the virtual function. The assembly instruction which performs the indirect call of the virtual function we refer to as a virtual call. Recent in-the-wild exploits, including two targeted zero-day attacks against Internet Explorer [128] and one against the Mozilla Firefox version included in the Tor Browser Bundle [140], achieved remote code execution by abusing virtual dispatches. Figure 5.2 illustrates the different manifestations of the five semantic steps in the three virtual dispatches utilized to exploit CVE-2013-3897, CVE-2013-3893, and CVE-2013-1690.
In general, such exploits abuse the five steps outlined in Figure 5.1, but the actual manifestation of the steps can be completely different due to compiler optimizations and other low-level characteristics.
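As a sanity check, equations (5.1) and (5.2) can be replayed in a few lines of Python over a toy flat memory. All addresses below are invented for illustration only:

```python
# Toy flat memory: addresses -> 32-bit words.
mem = {}

VT = 0x5000      # address where the vtable lives (read-only data)
VF = 0x401000    # address of the virtual function vf
OBJ = 0x9000     # address of the instance obj
I = 8            # displacement i of the vtable pointer inside obj
J = 3            # index j of vf inside the vtable

mem[OBJ + I] = VT        # the instance holds a pointer to its vtable
mem[VT + J * 4] = VF     # vtable entry j holds the address of vf

# Equation (5.1): mem(mem(obj + i) + j*4) = vf
assert mem[mem[OBJ + I] + J * 4] == VF

# Equation (5.2): the (possibly adjusted) this pointer is obj + i
this = OBJ + I
assert this == OBJ + I
```

The nested lookup in the first assertion corresponds exactly to steps ① - ③ and ⑤ of Figure 5.1; replacing the value stored at OBJ + I with an attacker-controlled address is what the next section's threat model exploits.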

95 Chapter 5 Vtable-Hijacking Protection for Binary-Only Software

Disassembly with Virtual Dispatches Semantics

0x612dc754: mov ecx, [eax]          ①
0x612dc756: push eax                ④
0x612dc757: call dword [ecx+0x4]    ②③⑤

CVE-2013-1690: xul.dll 17.0.6.4879 @ 0x611d0000

0x7167d53e: mov eax, [ebx]          ①
0x7167d540: and dword [ebp-0x18], 0x0
0x7167d544: lea ecx, [ebp-0x18]
0x7167d547: push ecx
0x7167d548: push dword 0x7167d58c
0x7167d54d: push ebx                ④
0x7167d54e: mov edi, 0x80004002
0x7167d553: call dword [eax]        ②③⑤

CVE-2013-3897: mshtml.dll 8.0.7601.17514 @ 0x714c0000

0x706c3857: mov ecx, [esi]          ④
0x706c3859: mov edx, [ecx]          ①
0x706c385b: mov eax, [edx+0xc4]     ②③
0x706c3861: call eax                ⑤

CVE-2013-3893: mshtml.dll 9.0.8112.16421 @ 0x702b0000

Figure 5.2: Disassembly and corresponding semantic steps of virtual dispatches in vulnerable modules, with base addresses denoted after the @ sign. All examples were used in real attacks to gain control of the instruction pointer at step (5) by first loading a fake vtable at step (1).

5.2.3 Threat Model: Vtable Hijacking

In the recent past, attackers have developed several exploitation techniques to turn use-after-free memory corruption vulnerabilities into reliable and arbitrary execution of code of their choice [92, 182, 189]. Such exploits render all current operating system security mechanisms ineffective and are among the most common attack vectors we currently observe in the wild. In the following, we explain the basic stages of such exploits based on Figure 5.3. Use-after-free memory corruption vulnerabilities are based on dangling pointers. During runtime, a C++ application requests a new instance of class C at time tn, which has implemented a virtual function fc. Internally, constructing an instance invokes the memory manager to allocate the needed memory. The instance's structure is built, including a vtable of C. Furthermore, a pointer p to the instance is created. At a subsequent execution time tn+1, the instance is removed but the pointer is kept. If the programmer did not reset the pointer, or if an alias was created, the reference to the freed memory still exists. Hence, p or the alias becomes dangling.


Figure 5.3: The C++ stages, internal low-level operations, and resulting instance’s memory layout of a use-after-free exploitation process utilizing vtable hijacking

At time tn+2, an adversary can deliver payload content of her choice to the memory where p is pointing to (e.g., via heap spraying [60, 192] and similar techniques), and inject a fake vtable. In practice, this vtable resides in writable memory, whereas a legitimate vtable always resides in non-writable memory. Surprisingly, just checking vtable addresses for non-writable memory during runtime, before their usage, already prevents many of the in-the-wild exploits, as our evaluation shows (see Section 5.5.3 for more details). The content of the injected vtable needs to be carefully crafted by the attacker to have an entry pointing to the adversary's first chosen chunk of code [58]. Later, at tn+3, the virtual function fc is dispatched, leading to the dereference of the instance's pointer p, the dereference of the fake vtable's pointer, and the call of the adversary's code. This initiates a code-execution attack by gaining control of the instruction pointer and redirecting the logical program flow (control-flow hijacking). As explained in the previous chapters, an adversary can then utilize code-reuse methods to bypass the Data Execution Prevention (DEP) protection, and may craft information leaks [182] beforehand to bypass address space layout randomization (ASLR). Section 4.2.1 and Section 3.2.6 explain these steps in more detail. For more information, the reader is also referred to the available literature on code-reuse techniques [26, 44, 115, 171, 183]. Usually, at the point of a control-flow hijack, the adversary's first chosen code chunk is a stack pivot gadget used to exchange the stack pointer with a controlled register. This further redirects the program flow to her injected payload, such as a ROP chain [58]. In real-world use-after-free web browser exploits, the program snippets executed at times tn, tn+1 and tn+3 often reside far away from each other and may have been generated from different source files.
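The timeline tn … tn+3 can be replayed with a deliberately simplified Python model of a reuse-based heap allocator and a flat memory. All addresses and the gadget value are invented, and a real exploit must additionally deal with ASLR, allocator internals, and heap spraying:

```python
mem = {}
free_list = []

def alloc(size):
    # naive allocator: reuse a freed chunk of the same size first; this
    # chunk reuse is exactly what makes use-after-free exploitable
    for addr, sz in free_list:
        if sz == size:
            free_list.remove((addr, sz))
            return addr
    addr = alloc.next
    alloc.next += size
    return addr
alloc.next = 0x10000

def free(addr, size):
    free_list.append((addr, size))

LEGIT_VT, LEGIT_FN = 0x5000, 0x401000   # hypothetical read-only vtable
FAKE_VT, PIVOT = 0x20000, 0xdeadbeef    # attacker-controlled, writable

mem[LEGIT_VT] = LEGIT_FN

# t_n: an instance of C is constructed, its vtable pointer installed
p = alloc(16)
mem[p] = LEGIT_VT

# t_n+1: the instance is freed, but p keeps dangling
free(p, 16)

# t_n+2: the attacker sprays an allocation of the same size; the allocator
# hands out the freed chunk again and the fake vtable pointer lands in it
q = alloc(16)
assert q == p
mem[q] = FAKE_VT
mem[FAKE_VT] = PIVOT

# t_n+3: a virtual dispatch through the dangling pointer p follows the
# fake vtable and "calls" the attacker's stack-pivot gadget
target = mem[mem[p] + 0]
assert target == PIVOT
```

Note that the P_nw policy described later would already catch this model at tn+3: the loaded vtable pointer (FAKE_VT) refers to writable attacker memory, not to a read-only section.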
Also, in most cases the pointer p is not the originally created one, but another, reused reference pointer. Many recently detected zero-day exploits utilize vtable hijacking to exploit vulnerabilities in web browsers, as shown in Table 5.2. The main idea behind our protection scheme is to mitigate an exploitation attempt at its entry point, i.e., after the loading of a fake vtable pointer but before it is used further to select a virtual function. Thus, the execution of the subsequent virtual call can be stopped, preventing the attacker from obtaining control of the instruction pointer, and


Table 5.2: Zero-day attacks using vtable hijacking in-the-wild.

CVE        Targeted Application      Module      Vulnerability
2013-1690  Fx 17.0.6 (TorBrowser)    xul.dll     use-after-free
2013-3893  Internet Explorer 9       mshtml.dll  use-after-free
2013-3897  Internet Explorer 8       mshtml.dll  use-after-free
2014-0322  Internet Explorer 10      mshtml.dll  use-after-free
2014-1776  Internet Explorer 8-11    mshtml.dll  use-after-free

impeding successive malicious computations. Thus, compared to the CFI schemes explained in Chapter 4, which check the virtual function target at a virtual call, our defense prevents a rogue virtual function from being selected as target in the first place.

5.2.4 Intermediate Language Prerequisites

As discussed in Section 5.2.2, dispatching a virtual function consists of several low-level instructions that can be interleaved with other code or distorted due to compiler optimization. Our goal is to identify such virtual dispatch sites in a given binary executable in an automated way. Since our target architecture is Intel x86, this is a complex task due to the large number of ways to express virtual dispatches in x86 assembly. Furthermore, side effects of the individual instructions complicate the analysis process. Thus, we opted to abstract away from the assembler level and perform our analysis based on an intermediate language (IL). In the following, we explain the needed background information and review the IL used for our implementation. We utilize a RISC-like assembly language as IL to transform 32-bit x86 disassembly into an intermediate representation. Currently, our IL of choice is REIL [73]. As is typical for RISC, there are only one dedicated memory load and one memory write instruction. Thus, one x86 assembly instruction is typically transformed into several IL instructions. One IL instruction consists of a mnemonic and up to three operands. The operands following the mnemonic represent the sources, while the operand preceding the mnemonic (written before the ← arrow) represents the destination holding the instruction's result value. Note that not all operands have to be used in one instruction. As registers, real x86 registers as well as an unlimited number of temporary registers (referred to as IL registers) can be used interchangeably. Real registers are generalized to Ri and temporary registers to rj. We refer to an undetermined register (i.e., a register that is either R or r) as q. Any immediate value is denoted with m, and operands which are either q or m are denoted with v. Relevant instructions for our analysis are:

• memory load instruction q1 ← load v1 which loads a memory value pointed to by v1 into q1

• addition q1 ← add v1, v2 which adds v1 to v2 and saves it to q1

• subtraction q1 ← sub v1, v2 which subtracts v2 from v1 and saves the result to q1


• register store q1 ← stor v1 which stores the value of v1 into q1

• memory store v2 ← stom v1 which stores the value v1 to the memory pointed to by v2

• call v1 sets the instruction pointer to v1.

Note that the usage of IL registers indicates that a complex x86 instruction was decomposed into several IL instructions. Decomposing an indirect addressing instruction with base and displacement will lead to several IL instructions with temporary registers. However, the semantics of an x86 indirect memory addressing can also be achieved with several x86 instructions. When decomposing these to the IL, almost the same IL instructions are generated as before, except that fewer temporary and more real registers are used. This means that we can infer certain syntax usage in the x86 disassembly from its IL representation. This becomes important in Section 5.4.1.
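The decomposition can be illustrated with a small Python sketch. The IL syntax below is simplified REIL-style pseudocode for this document, not the exact REIL encoding:

```python
def lift_indirect_call(base, disp):
    """Decompose the x86 instruction `call [base+disp]` into simplified
    REIL-like IL instructions (destination written before the arrow,
    as in the notation used in the text)."""
    return [
        f"t0 <- add {base}, {hex(disp)}",  # compute the effective address
        "t1 <- load t0",                   # fetch the call target from memory
        "call t1",                         # transfer control
    ]

il = lift_indirect_call("ecx", 0x4)
assert il == ["t0 <- add ecx, 0x4", "t1 <- load t0", "call t1"]
```

Note how the temporary registers t0 and t1 appear because one complex x86 instruction was split; the semantically equivalent two-instruction x86 sequence `lea reg, [ecx+0x4]; call [reg]` would lift to almost the same IL with a real register in place of t0.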

5.3 System Overview

Figure 5.4: Overview of T-VIP, consisting of vExtractor and PeBouncer

Developing a practical vtable hijacking mitigation and protection framework for binary C++ code involves several engineering challenges. In the following, we introduce our approach called T-VIP (towards Vtable Integrity Protection) to achieve this goal. T-VIP consists of vExtractor, the unit which identifies virtual dispatches in binary code, and PeBouncer which transforms the original executable in order to allow only legitimate virtual calls to be executed. We start by giving a brief description of each component and present a high-level overview of their necessary interactions (see also Figure 5.4 for an illustration).

5.3.1 Automated Extraction of Virtual Function Dispatches

The first component of T-VIP is vExtractor, a static instruction slicing and extraction framework. It takes as input an executable whose vtable usage instructions before virtual calls should be protected, and disassembles it in a first step. While disassembling x86 binaries correctly is challenging [180], current approaches are sufficient to generate disassembly

usable for program transformations [228]. A control flow graph (CFG) is generated, reflecting the control flow in the form of vertices and edges. Vertices represent basic blocks and edges the control transfers between them. The disassembly is then transformed into an intermediate language to boil down the complex instruction set into a RISC-like syntax, while preserving the semantics and the CFG of the original code. Next, all addresses of indirect call instructions are extracted and defined as slicing criteria [22, 188]. vExtractor then performs backward program slicing on the IL to determine whether an indirect call is a virtual call. It extracts all instructions which fulfill the low-level semantics of a virtual dispatch; thus, we retrieve virtual dispatch slices. This is achieved via state machines, whereby one state consists of a set of IL instructions (see Section 5.4.1 for details). Furthermore, each state points to at least one successor state. Slicing starts at indirect call sites, and the state whose instruction should be found next (in the backward IL instruction stream) is set as the target state. If a target state is successfully matched against an instruction in the stream, its successor is set as the new target state. Slicing continues as long as the last state of the state machine has not been reached and the current state is still satisfiable. It stops either if the last state is reached or if a state cannot be fulfilled. The latter disqualifies the indirect call site as a virtual call. In the former case, vExtractor classifies the call site as part of a virtual dispatch, extracts all instructions which are part of it, and associates its components, such as x86 registers, offsets and addresses, with instance, vtable and virtual function properties.
Most important is the instruction which loads the vtable pointer of the instance into a register, as verification checks will later be performed on these registers during runtime.

5.3.2 Automated Protection of Virtual Function Dispatches

The information produced by vExtractor and the original executable are the input to PeBouncer, the second component of T-VIP. We developed PeBouncer as a generic and static binary rewriting engine for executables conforming to the PE specification [129]. Thus, while we use it to generate a protected executable, it is also suitable to instrument instructions of interest, similar to Pin [124] or DynamoRIO [13, 31]. Furthermore, it can be used to insert arbitrary code in order to enhance an executable with defense techniques, similar to Vulcan [195] or SecondWrite [151]. A user who wishes to instrument an executable with PeBouncer has to specify the addresses of the instructions to instrument. Each of these instructions is replaced by a forward jump redirecting to an instrumentation stub inserted into a newly created section in the executable. The original instruction is preserved by copying it to the beginning of the stub. The stub ends with a backward jump targeting the address after the redirection to continue the original program flow. When replacing instructions, edge cases such as control transfers, basic block transitions, and relocations are considered, and measures are taken to rewrite such cases correctly (see Section 5.4.2). The instrumentation can then implement an integrity policy P to perform checks on the virtual dispatches. Instrumentation stubs to enforce the policy can be developed in assembly code and can invoke functions of a user-generated shared library, which we refer to as the service library. Hence, major instrumentation code can reside inside the service library in order to remain customizable while still being able to accomplish complex tasks.


Using PeBouncer, we can instrument instructions which load a vtable in virtual function dispatches and generate distinct binaries with different integrity policies. As noted above, a vtable always resides in memory pages which are non-writable, such as code or read-only data sections. Vtables contain virtual function pointers pointing to read-only and executable pages. This basic insight can be leveraged to implement a simple integrity policy that checks whether vtable pointers correctly point to non-writable memory pages. The following kinds of integrity policies are possible:

1. Pnw: Look up vtable pointers in a lookup table with bits set for non-writable memory pages of modules and unset otherwise. This offers performant validation. When determining the memory protection of a vtable address, the page it belongs to is queried instead of the vtable pointer itself.

2. Pnwa: Includes Pnw. Additionally, one entry inside the vtable, residing above the virtual function pointer about to be called, is randomly chosen. The entry is dereferenced, and the resulting address is queried for the non-writable flag. This is applied to all virtual dispatches calling a virtual function pointer that is not the first in the vtable.

3. Pobj: Leverage type reconstruction of object-oriented code [107] in order to reconstruct all objects and the corresponding class hierarchy. As an integrity check, we could verify whether the vtable used at a virtual dispatch site actually matches the reconstructed type.

Note that Pobj is hard to implement in practice on binary code (in contrast to a compiler-level implementation), as object recovery has yet to be shown practicable for huge COTS software like web browsers [107]. Hence, we did not implement this policy as part of our work. However, recent research going in that direction looks promising [206]. The automated extraction of virtual dispatches can also yield slices that are not virtual dispatches, for example, when binary code originated from nested C structs (see Section 5.5.1). As such, vExtractor might output instructions which seemingly load a vtable, but in fact represent other kinds of code constructs. This problem can be addressed with a profiling phase as follows: T-VIP first generates an executable instrumented with checks using policy Pnw, and runs it dynamically on tests in a trusted environment to visit (ideally) all instrumentation stubs. Assumed vtables appearing in writable memory are then the result of other constructs and are discarded in a second pass: to ensure vtable integrity protection at runtime, T-VIP applies PeBouncer a second time to the original executable using policies Pnw or Pnwa to produce the final protected binary. Note that additional policy checks (even complex ones such as Pobj) could be implemented in the future to ensure a more complete protection towards virtual table integrity, as PeBouncer is generic.
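A possible shape of the Pnw lookup table — one bit per page, set for non-writable pages — is sketched below in Python. The real check runs as x86 code in the instrumentation stub and service library; the module base, size, and all addresses here are invented for illustration:

```python
PAGE_SIZE = 0x1000

class NonWritableBitmap:
    """One bit per page of a module; a set bit means the page is non-writable."""
    def __init__(self, module_base, module_size):
        self.base = module_base
        self.bits = bytearray((module_size // PAGE_SIZE + 7) // 8)

    def mark_non_writable(self, addr):
        page = (addr - self.base) // PAGE_SIZE
        self.bits[page // 8] |= 1 << (page % 8)

    def check(self, vtable_ptr):
        # P_nw: the page the vtable pointer falls into must be non-writable;
        # note that the page is queried, not the exact pointer value
        page = (vtable_ptr - self.base) // PAGE_SIZE
        return bool(self.bits[page // 8] & (1 << (page % 8)))

bm = NonWritableBitmap(0x400000, 0x100000)
bm.mark_non_writable(0x401000)          # read-only page holding real vtables

assert bm.check(0x401234)       # legitimate vtable: non-writable page
assert not bm.check(0x450000)   # fake vtable on a writable (heap-like) page
```

Pnwa would additionally pick a random entry from the assumed vtable, dereference it, and run the same page check on the result, which makes it harder for an attacker to satisfy the policy with a partially crafted fake vtable.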

5.4 Implementation

We now describe in detail the inner workings of T-VIP, involving the stages of disassembling an executable, transforming it into the intermediate language, and performing program slicing to retrieve virtual dispatch slices. This is followed by the architecture of our generic

binary rewriting engine, and its usage relevant to instrumenting and protecting executables against vtable hijacking. Currently, vExtractor internally uses the IDA Pro disassembler to disassemble an executable and generate a control flow graph. Other disassembly frameworks (e.g., BAP [32], ROSE [168] or Dyninst [18]) could be supported in the future. Based on the disassembly, we search for call instructions with registers or indirect addressing with a base register and, optionally, index, scale and displacement as operand. Addresses of indirect calls are stored and the disassembly is transformed into the platform-independent intermediate language REIL [73]. This produces a second CFG layer: even strict x86 basic blocks having only one entry and exit can turn into a CFG when being transformed into an IL. Due to certain x86 instructions having implicit branches, the decomposed IL instructions form a CFG, such that an x86 basic block is itself represented as a CFG. We treat the outer x86 CFG layer and the inner IL CFG layer separately. As discussed in Section 5.2.2, virtual dispatches have a certain semantic which can express itself differently in x86 syntax. We exploit the advantages of an IL, which converts syntactically greatly varying but semantically similar instruction streams into similar constructs. This facilitates the harvesting and classification of semantically similar but syntactically different instruction streams like virtual dispatches via backward slicing.

5.4.1 Amorphous Slicing

We implemented intra-procedural backward slicing on the IL into vExtractor, based on state machines. As program slicing is a common technique, we refer the reader to the literature for information about program slicing [22, 94, 188]. In our state machines, one state consists of a set of IL instruction patterns. Figure 5.5 shows our state design and a state transition. When an IL instruction in the instruction stream to search (①) fulfills a pattern's conditions, such as a matching mnemonic and a matching destination register, the state is triggered (②). The source of the matched instruction is taken and inserted into the successor state's instruction patterns as destination (③). A transition to the successor state is performed (④). Slicing continues (⑤), and the patterns of the successor state have to be fulfilled in order to trigger it.

Figure 5.5: State design and state transition principle. Wildcard operands are denoted with /.*/
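The state/transition principle can be sketched as a miniature backward matcher in Python. This is a drastic simplification of vExtractor's machines (one pattern per state, regex-based matching on a textual IL invented for this sketch) and only checks for the two chained memory loads of a basic virtual dispatch:

```python
import re

def backward_slice(il_stream, call_reg):
    """Walk the IL stream backwards from an indirect call and check whether
    the call target was produced by two chained memory loads (vtable entry
    and vtable pointer), propagating sources into the next state."""
    tracked = call_reg          # destination the current state must match
    loads = 0
    slice_ = []
    for insn in reversed(il_stream):
        m = re.match(r"(\w+) <- (load|add) (\w+)", insn)
        if not m or m.group(1) != tracked:
            continue            # unrelated instruction: skip it
        slice_.append(insn)
        tracked = m.group(3)    # source becomes the next destination pattern
        if m.group(2) == "load":
            loads += 1
        if loads == 2:          # both dereferences found: a virtual dispatch
            return slice_
    return None                 # pattern not fulfilled: not a virtual call

stream = [
    "t0 <- add ecx, 0x8",
    "t1 <- load t0",      # load the vtable pointer from the instance
    "t2 <- add t1, 0x4",
    "t3 <- load t2",      # load the virtual function pointer from the vtable
]
assert backward_slice(stream, "t3") is not None
assert backward_slice(["t3 <- add esp, 0x4"], "t3") is None
```

Optional addition/subtraction states (the vtable index and the instance displacement) are handled here by simply passing through `add` instructions on the tracked register, mirroring the self-loops in the machines described next.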


While the semantics of virtual function dispatches are simple, they can manifest themselves in different and complex x86 syntax constructs (as previously shown in Figure 5.2). To extract virtual dispatches precisely, we developed several state machines to unveil the instructions being components of them, each visualized in Figure 5.6. The first state machine is used to search backwards, starting from each indirect call instruction in the IL instruction stream, for dereferences. On a successful pass, i.e., if the final state is reached, vExtractor has detected the vtable entry's dereferencing and the instance's dereferencing. We also refer to the instance as obj. Furthermore, if available in the IL instruction stream, the reference to the instance and the reference to that reference are detected, too. We name the reference which, when dereferenced, yields a reference to the instance ref2. The reference that yields the address of the instance when dereferenced is called ref1. The instance's reference (ref1), and also the reference's reference (ref2), become important in subsequent state machines.

Figure 5.6: State machines used to detect consecutive dereferences and thiscall in virtual dispatches. Transitions may be constrained (by, e.g., mnemonic: add, source: add v, destination: v ← add)

In the first state machine, the indirect call represents the start state. State one to match is a memory load: the call's destination register is followed to its definition. In case it was defined by a memory load, state two is set as the next state to match. The source of the instruction matched in state one becomes the destination register of state two. A mandatory memory load and an optional addition or subtraction instruction are searched for. In case an addition or subtraction is matched, the state to match next does not change. As soon as the memory load is matched, the source register of the matched instruction dictates the next state to match: in case it is an IL register, state three is set as the next state. If it is an

x86 register, state four is the next state to match. To trigger state eight from state three, the destination register of state three has to be found as a source register in an addition or subtraction instruction, respectively. State eight is the final state. Thus, starting from state zero (the start state), successively transitioning into states 1, 2, 3 and ending at state eight yields the following IL example slice:

State   IL instruction
3       r0 ← add q3, m1
2       q2 ← load r0
2       q1 ← add q2, m0
1       q0 ← load q1
0       call q0

When extracting the x86 disassembly corresponding to the IL slice, a dereferencing sequence arises, which exists in virtual dispatches used with multiple inheritance:

mov q2, [q3 + m1]
call [q2 + m0]

Here, q3 is the instance address (obj), q2 contains the vtable address, m1 is the displacement to the base class's vtable to take, and m0 is the offset to the virtual function to call. When vExtractor backtraces the IL instruction stream continuing from state two and transitions through the other states until the final state, it detects the following additional information:

Transitions    Information
2 → 4          register with obj
4 → 8          second register with obj
4 → 6          register with ref1
4 → 5 → 6      register and displacement of ref1
6 → 8          register with ref2
6 → 7 → 8      register and displacement of ref2

While the first state machine extracts semantic components related to instance and vtable loading, the second state machine is designed to detect a thiscall. In a thiscall, the (adjusted) instance's address is passed via ecx to the virtual function as a hidden parameter. Dependent on the information gained by slicing with state machine one, the second state machine is built dynamically and adjusted with the registers and offsets found by machine one. Thus, one of the three state machines shown in Figure 5.6 arises and will be chosen for the thiscall detection. Machine A is chosen if ref1 and ref2 were not found. It detects if obj is moved into ecx (transitions 0 → 1 → 7). With transitions 0 → 1 → 2 → 7, A detects if ecx is filled with an adjusted instance's address. That is the case in virtual dispatches used in multiple inheritance, where an adjusted instance's address is dereferenced to gain the vtable, and also moved to ecx to prepare the thiscall.


Machine B is chosen if there is no ref2; it detects if ref1 is dereferenced to obj and if a subsequent move into ecx follows (transitions 0 → 1 → 3 → 7). If ref1 was displaced by an offset, transitions 0 → 1 → 3 → 4 → 7 of B detect the thiscall. If B fails to detect a thiscall with ref1, but recognizes that obj was used in transition 0 → 1 → 3, the dispatch is also declared a valid thiscall. When ref2 was found, machine C is chosen. Similar to B, it detects if a thiscall is prepared using ref2. If it fails to reach the final state but recognizes that obj or ref1 was used for the thiscall, the dispatch is classified as valid. Additional state machines are used to detect if an instance is passed as the first parameter via the stack to the virtual function at dispatch time. They comprise our third set of state machines. These are built similarly to A, B and C, but instead of using states with ecx ← stor qi, states containing patterns with esp ← stom qi are utilized. Thus, if the first parameter on the stack is obj, or if it evolves from ref1 or ref2, the corresponding instructions are classified as components of a virtual function dispatch.
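Machine A's core check, i.e., finding the move of the object register into ecx in the backward stream, can be sketched as follows (heavily simplified, on a hypothetical textual IL; real machine A also handles the adjusted-address case):

```python
def detect_thiscall(il_stream, obj_reg):
    """Sketch of machine A: walk the IL stream backwards from the call and
    check whether the object register is moved into ecx, which prepares the
    hidden this parameter of a thiscall."""
    for insn in reversed(il_stream):
        if insn == f"ecx <- stor {obj_reg}":
            return True
    return False

stream = [
    "t0 <- load ecx",      # unrelated instruction in between
    "ecx <- stor esi",     # this = obj (semantic step 4 of the dispatch)
    "t1 <- load esi",
]
assert detect_thiscall(stream, "esi")
assert not detect_thiscall(stream, "ebx")
```

The third set of machines follows the same pattern with `esp <- stom qi` states, i.e., it looks for the object being pushed as the first stack parameter instead of being stored to ecx.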

Summary. vExtractor uses a state-machine-based approach to harvest virtual dispatch data of potential virtual dispatches. To this end, it walks the CFG of a binary of interest backwards, starting from indirect calls, and tries to match states to IL instructions. On a match, the object location and register, the vtable register, the virtual function offset, and the addresses of the corresponding instructions are saved. Additionally, the corresponding disassemblies of virtual dispatches are obtained as slices. This virtual dispatch data is then fed into PeBouncer to generate an instrumentation of the vtable loading instruction to enforce specific policies.

5.4.2 Binary Transformations

Static binary rewriting (also called static binary transformation) allows the modification of compiled executables directly on the binary level, without the need for source code information or recompilation. We implemented PeBouncer as a generic and automated instrumentation engine for PE executables on Windows. Transformations are applied statically to produce an instrumented binary that realizes an integrity policy. The relevant enforcement checks become active during runtime.

5.4.2.1 Insertion of Instrumentation Checks

We statically create a new code section where the integrity policy checking code will reside. Addresses of instructions to instrument are fed into PeBouncer and disassembled. As an instruction will be replaced with a redirecting, relative 32-bit jump to its instrumentation stub, we need enough space for the jump opcodes while keeping other instructions in the neighborhood functional. Therefore, if the size of the original instruction's opcodes is greater than that of the jump, we replace it and insert NOPs to fill the remaining space. If the size is smaller than the size of the jump, we disassemble downwards starting from the original instruction until there is enough space to insert our jump, and replace the instruction to instrument along with subsequent instructions. The replaced instruction(s) will be

copied to the beginning of the associated instrumentation stub in the new section. Note that some instructions cannot be overwritten without additional measures:

1. Targets of all relative control transfer instructions must be preserved.

2. Instructions with relocations must remain relocatable; our inserted jump must not be altered by an original relocation.

3. Basic block terminators and basic block leaders must be preserved. For example, the opcodes of a terminator and the opcodes of a leader located at the adjacent address cannot be replaced by the opcodes of a single instruction. When there are several entries into the adjacent block, the inserted instruction would be split and the opcodes of both halves would be wrongly interpreted as distinct instructions.

To tackle these and similar corner cases, we stop our downward search for space at such instructions and traverse the disassembly upwards instead, starting from the instruction to instrument. Similar to the downward search, we stop as soon as we have enough space to insert our redirecting jump, overwriting additional instructions if necessary. If the instruction to instrument is enclosed between two instructions of the edge cases mentioned above, we overwrite one of them with an illegal instruction and install a vectored exception handler, which then serves as a trampoline to the instrumentation stub.

For each instruction to instrument, the redirecting jump's target address is automatically calculated to point to the next available free location in the new section. The replaced instructions are copied there, and the instrumentation code is placed below them in the stub. At the end of the stub, a relative 32-bit jump is inserted; its target address is calculated to point back to the original code, to the address right after the redirecting jump. Thus, the new code section is filled successively, stub after stub. An inserted instrumentation stub is shown in Figure 5.7.
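The downward space search for the 5-byte redirecting jump can be sketched as follows. This is a minimal model over instruction sizes only; plan_patch and blockers are hypothetical names, with blockers standing for the corner cases listed above (jump targets, relocated instructions, block leaders/terminators).

```python
JMP_REL32_SIZE = 5  # one opcode byte plus a 32-bit displacement

def plan_patch(insn_sizes, blockers=()):
    """Collect instructions downward from the one to instrument (index 0)
    until at least 5 bytes are available. Returns (count_replaced,
    nop_padding), or None if a blocker stops the downward search before
    enough space is found -- the real engine then searches upward, or
    falls back to the exception-handler trampoline."""
    space = 0
    for i, size in enumerate(insn_sizes):
        if i > 0 and i in blockers:   # must not overwrite a blocker
            return None
        space += size
        if space >= JMP_REL32_SIZE:
            return (i + 1, space - JMP_REL32_SIZE)
    return None

# A 7-byte instruction is replaced alone, padded with 2 NOPs:
assert plan_patch([7]) == (1, 2)
# Two 3-byte instructions are overwritten, leaving 1 NOP of padding:
assert plan_patch([3, 3]) == (2, 1)
# A blocker right after a 2-byte instruction stops the downward search:
assert plan_patch([2, 3], blockers={1}) is None
```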

Figure 5.7: Instrumentation stub insertion: a virtual dispatch is transformed in order to perform instrumentation on the vtable register (EAX) before a virtual call is issued.

vExtractor provides virtual dispatch slices including the addresses of vtable load instructions. For a C++ executable, we utilize these addresses to instrument the dispatches and thereby protect them.

5.4.2.2 Generation of Instrumentation Stubs

For instructions of interest, instrumentation stubs can be supplied in position-independent assembly. Hence, relative addressing can be utilized. A stub starts with a prolog to save

the register context and ends with an epilogue to restore it. The Netwide Assembler (NASM [200]) is used as assembly back-end. We created an annotation feature that serves PeBouncer as a hint to modify the stub after it is assembled, but before it is inserted into our new code section. This allows one-time assembling of instrumentation code and repeated per-stub modification. It works as follows: Instrumentation code is provided as an assembly file and contains specific keywords inside angle brackets, which PeBouncer recognizes. The brackets including the annotation keywords are replaced with an x86 mnemonic and a hash of the keyword as operand. Depending on the keyword, corresponding mnemonics which allow at least a four-byte operand are used. This way, the assembly syntax stays error free, and the keyword's information is preserved. Also, the occurrence of each keyword is counted. After the instrumentation code is assembled, it contains the binary representation of the hashes. Before the binary instrumentation code is inserted as a stub, the hashes are searched and their occurrences are compared to the keywords' occurrences to prevent collisions. Then they are replaced with adjusted opcodes specific to a keyword and to an instrumentation stub.

The benefit is the following: we instrument instructions representing vtable loading, and each of these may use a different mnemonic and a different register to load the vtable. With our annotation feature, PeBouncer can assemble one instrumentation code for all instructions to instrument and modify it for each stub to include the specific register which holds the vtable. Thus, each instrumentation stub for every vtable load instruction will operate on its specific vtable register. While creating instrumentation code in assembly is already convenient, the complete API provided by an operating system can be used as well.
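A minimal sketch of the annotation mechanism follows. The angle-bracket keyword syntax is taken from the text; the hash function (CRC32), the placeholder mnemonic, and the helper names are illustrative assumptions, whereas the real engine patches NASM output.

```python
import re
import zlib

def annotate(asm_src):
    """Replace <KEYWORD> annotations with a placeholder mnemonic whose
    4-byte operand is a hash of the keyword, and count each keyword's
    occurrences so that collisions with real constants can be detected."""
    counts = {}
    def repl(m):
        kw = m.group(1)
        counts[kw] = counts.get(kw, 0) + 1
        h = zlib.crc32(kw.encode()) & 0xFFFFFFFF  # hypothetical hash choice
        return "mov eax, 0x%08x" % h              # keeps the syntax assemblable
    return re.sub(r"<(\w+)>", repl, asm_src), counts

def patch_stub(code, keyword, replacement):
    """After assembling: swap the binary hash of `keyword` for the
    stub-specific bytes (e.g., the vtable register of this dispatch)."""
    h = (zlib.crc32(keyword.encode()) & 0xFFFFFFFF).to_bytes(4, "little")
    assert code.count(h) == 1, "hash collision or missing keyword"
    return code.replace(h, replacement)
```

For instance, annotate("mov ebx, <VTBL_REG>") yields assemblable code with the hash as operand, and patch_stub later rewrites those four bytes per stub.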
This is important for the usage of Windows API functions (e.g., OutputDebugString()). To this end, a shared library (service library) is compiled with exported functions which wrap the API functions to use. An instrumentation stub is then developed in a certain structure: it starts with a prolog to save the register context, followed by instructions which save the instruction pointer (IP) to the stack. Thus, the first stack value will point to the absolute virtual address of the beginning of the stub. To call a service library's function from instrumentation code, an instruction is specified to load the saved IP into a register, followed by an indirect call with the register and an annotation keyword containing the library and function name. When PeBouncer encounters such a keyword, it replaces the keyword with its four-byte hash, such that an indirect call instruction with base register and displacement emerges. The structure of a possible instrumentation stub is shown in Figure 5.8. After the instrumentation code is assembled, the four-byte hash is replaced with a binary displacement value. This value is calculated in such a way that the register with the saved IP and the displacement (when summed up) point to a custom Import Address Table entry. This entry will contain the address of the service library's function to use. During runtime – similar to an Import Address Table (IAT) generation – an additional data section is filled with pointers to the service library's functions. This resolution is performed as soon as the instrumented executable is loaded into the address space of an application. Thus, the instrumentation code in the executable can reference the service library's resolved function addresses in our custom generated IAT. The service library itself can be loaded in many ways into an application's address space [105]. Either the application's entry point is patched to load the library in case the instrumented executable is known to be


loaded afterwards, or the service library is specified to be loaded into every process' address space [127] in case the instrumented executable is known to be loaded at the application's startup. In any case, the service library is loaded before the instrumented executable's entry point is executed. Note that the above concept allows the full support of both ASLR and DEP (NX).

Figure 5.8: Instrumentation stub structure: during runtime, the context is saved in the prolog and restored in the epilogue. Annotation keywords are replaced with respective instructions before assembling and modified per stub before insertion.
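The displacement calculation can be sketched with a toy memory model; all addresses below are hypothetical.

```python
def iat_displacement(stub_base, iat_entry_addr):
    """The stub saves its own start address (IP) in a register; an
    indirect call [reg + disp] must then land on the custom IAT entry
    holding the resolved service-library function address."""
    return iat_entry_addr - stub_base

# Toy runtime model: memory maps the custom IAT entry to the resolved
# address of a (hypothetical) service-library wrapper function.
memory = {0x10005000: 0x77001234}   # custom IAT entry -> wrapper address
reg = 0x10002000                    # saved IP = start of this stub
disp = iat_displacement(0x10002000, 0x10005000)
assert memory[reg + disp] == 0x77001234   # call [reg+disp] reaches the wrapper
```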

5.4.2.3 Virtual Dispatch Instrumentation

To mitigate vtable hijacking, virtual dispatches are instrumented with policies Pnw (12 assembly instructions) or Pnwa (23 assembly instructions). Each virtual dispatch consists of the low-level semantic steps described in Section 5.2.2. To protect against the use of fake vtables, we instrument vtable load instructions after step one of the virtual dispatch semantics to be able to check the register with the vtable address. We do this in the following way: we keep a read-only 64 KB lookup bitmap in our service library, representing all user-mode memory pages. This bitmap is made writable and set up when an instrumented module is loaded into the address space of its application; then its access permissions are set to read-only again. Each bit represents the write permissions of a page: a set bit means the page is non-writable, an unset bit means the page is writable. Thus, when loading the module, we find all non-writable module sections in the entire memory and set the appropriate bits of the corresponding pages in the bitmap. To keep it up to date, Windows loader functions are hooked to change bits when unprotected modules are loaded and unloaded. Now, instrumentation checks can query the page of a vtable address by a simple lookup instead of querying the vtable itself: during runtime, a vtable is loaded into a register. The control flow is then rerouted to

its instrumentation stub. There, the vtable address is transformed with simple operations into an index into the page bitmap. The bit for the page is queried, and if it is not set, a violation of Pnw has occurred. A set bit means that the page, and thus the vtable residing in it, is non-writable. However, an adversary could circumvent this check if she manages to find an address which is non-writable and contains a pointer to a gadget of choice to start her ROP chain. Thus, to mitigate this type of attack, after the page lookup of the vtable, there is an additional virtual method check (Pnwa): as step two of the virtual dispatch semantics provides the offset of the virtual function, a pseudo-random index up to that offset is generated with the help of rdtsc. The vtable is dereferenced at this index and the resulting value is looked up in the page bitmap. A violation can be detected, as all entries in a valid vtable up to the offset of the virtual function about to be called are method pointers pointing into non-writable code pages.
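The bitmap lookup for Pnw can be sketched as follows, assuming 4 KB pages and a 2 GB user-mode address space, which together yield the 64 KB bitmap mentioned above; the addresses in the example are hypothetical.

```python
PAGE_SHIFT = 12                     # 4 KB pages
USER_SPACE = 0x80000000             # 2 GB of user-mode address space
BITMAP = bytearray(USER_SPACE >> PAGE_SHIFT >> 3)   # 524,288 bits = 64 KB

def set_non_writable(addr):
    """Mark the page containing addr as non-writable (set its bit)."""
    page = addr >> PAGE_SHIFT
    BITMAP[page >> 3] |= 1 << (page & 7)

def check_pnw(vtable_addr):
    """Policy Pnw: the vtable must reside in a non-writable page."""
    page = vtable_addr >> PAGE_SHIFT
    return bool(BITMAP[page >> 3] & (1 << (page & 7)))

# Hypothetical layout: a vtable in a read-only module section passes,
# while a fake vtable sprayed on the (writable) heap violates Pnw.
set_non_writable(0x00401000)          # e.g., a .rdata page of a module
assert check_pnw(0x00401A30) is True  # same page: non-writable
assert check_pnw(0x00A00000) is False # heap page: bit unset -> writable
```

The Pnwa check would additionally dereference the vtable at a pseudo-random index and feed the resulting pointer through the same check_pnw lookup.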

5.5 Evaluation

We have implemented prototypes of both vExtractor and PeBouncer. In what follows, we evaluate both tools regarding their precision, performance overhead, and prevention of real-world exploits.

5.5.1 vExtractor's Precision

As a first step, we wanted to gain insights into the precision and recall of vExtractor's virtual dispatch detection. The analysis is performed against all identified indirect call instructions of a given program. We leverage a simple classification metric: any detected virtual dispatch that is not actually a virtual dispatch is a false positive (FP), and missed virtual dispatches are false negatives (FN). True positives (TP) and true negatives (TN) are the correctly found and correctly rejected virtual dispatches, respectively. Based on this, we can define precision, recall, and F-measure as follows:

precision = TP / (TP + FP)

recall = TP / (TP + FN)          (5.3)

F-Measure = 2 · (precision · recall) / (precision + recall)

We used version 4.9.0 of the MinGW-w64 GCC cross compiler, as it contains a partial implementation of GCC's virtual table verification feature [202]. Also, we ported the missing parts of GCC's vtable verification library (vtv) to Windows to be able to compile 32-bit PE files with MinGW-w64, resulting in functional vtable verification checks. This porting was necessary since we instrument proprietary programs such as Microsoft's Internet Explorer on Windows (see Sections 5.5.2 and 5.5.3) and thus needed a vtv version on Windows to compare against1. Our port was also incorporated into the mainline of GCC [201].

1Our MinGW-w64 extension is available at https://github.com/RUB-SysSec/WindowsVTV

109 Chapter 5 Vtable-Hijacking Protection for Binary-Only Software

Compilation of 32-bit PE files with the -fvtable-verify flag inserts verification calls at each virtual dispatch after the instruction which loads the vtable into a register. The remaining code stays identical for the same program when this flag is omitted. The version with verification is used to build a ground truth for indirect calls: an indirect call is preceded by a verification routine exactly if it is part of a virtual dispatch; otherwise, it is a non-virtual call. Indirect calls in the version compiled without verification are then partitioned into virtual dispatches and non-virtual dispatches based on the information gained from the first version. We applied vExtractor to the second version and classified the outcome of slices function-wise to retrieve a classification on the binary. We utilized the open source C++ cryptographic library Botan [29], which contains 90 cryptographic primitives. We chose Botan because of its extensive use of C++ features and compiled it with and without vtable verification. vExtractor traced a total of 6,779 indirect calls and identified 6,484 virtual dispatches (TP), with 62 false positives being non-virtual dispatches and 179 false negatives. This yields a precision of 0.99, a recall of 0.97, and an F-measure of 0.98. We analyzed the reasons for false positives and discovered that they are due to C code constructs; more specifically, C code can have semantics identical to virtual dispatches. Consider the following C code line with st and innerSt being pointers to structs and sFn being a function pointer:

st->innerSt->sFn(st, p1, p2)

The two dereferences and st as first parameter fulfill the virtual dispatch semantics when compiled down to binary code. However, these and similar constructs can be eliminated in a profiling phase, as we show in Section 5.5.2. We discuss the reasons for false negatives in Section 5.7.
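The reported figures can be reproduced from these counts by plugging them into formula (5.3):

```python
TP, FP, FN = 6484, 62, 179        # counts reported for Botan above

precision = TP / (TP + FP)
recall = TP / (TP + FN)
f_measure = 2 * precision * recall / (precision + recall)

assert round(precision, 2) == 0.99
assert round(recall, 2) == 0.97
assert round(f_measure, 2) == 0.98
```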

5.5.2 Runtime of Instrumented Programs

To assess the performance overhead, we compiled Botan with and without GCC's vtable verification and compared the runtime in micro- and macro-benchmarks to the plain build of Botan and to the plain build instrumented with T-VIP. Additionally, we compiled the SPEC CPU2006 benchmark with MS Visual C++, instrumented it with T-VIP, and compared the runtime to the native build. Finally, we hardened browser modules and measured their runtime overhead. Benchmarks were performed on an Intel Quad Core i7 at 2.6 GHz with 2 GB of RAM running Windows 7.

5.5.2.1 Comparison to GCC's Virtual Table Verification

We patched our port of vtv's source code to measure the CPU cycles needed for each execution of the verification routine (VLTVerifyVtablePointer) in order to perform micro-benchmarks. To this end, we inserted GCC's built-in rdtsc routine at the beginning and at the end of the verification routine and executed Botan's benchmark. The verification produced a median cycle count of 9,205. We binary-rewrote Botan using T-VIP to measure the cycle count of our vtable protecting check code. Thus, we added additional rdtsc calls at the start and end of our instrumentation checks consisting of policies Pnw and Pnwa, and


Table 5.3: Binary sizes, amount of instrumented virtual dispatches (#VD), median runtime over three runs, and overheads of C++ SPEC CPU2006 benchmarks.

CPU2006     Size    #VD     Native   Pe               Pnw              Pnwa
                            rt(s)    rt(s)   ov(%)    rt(s)   ov(%)    rt(s)   ov(%)
soplex      403K    746     232.25   231.05  -0.52    232.41  0.07     233.64  0.60
omnetpp     793K    1593    217.12   293.72  35.28    303.48  39.78    318.15  46.53
povray      1038K   154     164.27   164.22  -0.03    164.36  0.06     164.31  0.03
dealII      947K    272     360.97   361.75  0.22     363.01  0.57     363.14  0.60
xalancbmk   3673K   14061   182.97   294.29  60.84    331.98  81.44    372.26  103.45

took the vanilla build of Botan. We ran the benchmark and retrieved a median cycle count of 8,225 for Pnw and 12,335 for Pnwa. To perform macro-benchmarks, we built Botan with and without vtv with our newly ported GCC, not using rdtsc. We protected the vanilla build with T-VIP using policy Pnwa and ran the benchmarking capability of both, ten times each. Botan's benchmark consists of 90 highly demanding cryptographic algorithms. The version compiled with GCC vtv produced a median overhead of 1.0 %, with 46 algorithms producing a median overhead smaller than 2.0 %. The version protected with T-VIP produced a median overhead of 15.9 %, with 37 algorithms producing a median overhead smaller than 2.0 %. We investigated the seemingly high overhead: T-VIP installs a vectored exception handler for instrumented instructions which cannot be overwritten with a jump to an instrumentation stub (see Section 5.4.2.1). As an exception handler produces high overhead, algorithms executing it will run perceptibly slower.

5.5.2.2 Runtime Overhead Measurements

We compiled the seven C++ benchmarks of SPEC CPU2006 with MS Visual C++ 2010, applied vExtractor, and gained virtual dispatch slices for all except two (i.e., only five benchmarks actually contain virtual dispatches). We hardened them separately with policies Pnw, Pnwa, and an empty policy (Pe), using PeBouncer. Pe consists of a prolog and epilog only and serves to measure the net overhead introduced by PeBouncer. The results are depicted in Table 5.3.

Overheads are ≤ 0.6 % in soplex, povray, and dealII, while the high overheads for Pnw and Pnwa in omnetpp and xalancbmk are mostly due to the net overhead of our rewriting engine (Pe column in Table 5.3). Using our policies Pnw and Pnwa with another binary rewriter could lower the overhead. However, as we show with COTS browser modules, the overhead in macro-benchmarks is actually low in practice. We applied vExtractor to xul.dll of Mozilla Firefox 17.0.6 and to mshtml.dll of Internet Explorer in versions 8, 9, and 10. We chose these modules because they contain the major parts of the browsers' engines and former zero-day attacks were related to code in these modules (see Section 5.5.3 for details). Table 5.4 shows the amount of indirect calls and extracted virtual dispatch slices.


Table 5.4: Amount of indirect calls (#IC), extracted virtual dispatch slices, and filtered non-virtual calls. #Instr. indicates the number of slices fed into PeBouncer to harden listed modules.

App.        Module       #IC      #Slices   #Filtered   #Instr.
Fx 17.0.6   xul.dll      66,120   53,268    73          53,195
IE 8        mshtml.dll   23,682   19,721    3,117       16,604
IE 9        mshtml.dll   64,721   53,312    7,735       45,577
IE 10       mshtml.dll   56,149   44,383    5,515       38,868

We applied PeBouncer to each module to instrument all vtable load instructions, such that during runtime the addresses of vtables, their memory page permissions, and the addresses of the corresponding virtual call sites are obtained with OutputDebugString(). Fewer than 900 exception handlers had to be inserted per module due to non-overwritable instructions, but all were instrumented without problems. We then, at first, ran the two browser benchmarks SunSpider [9] and Kraken [141] to profile the browsers. Retrieved vtable addresses that do not belong to actual vtables reveal themselves as writable. This way, we can filter out all non-virtual dispatches, such as calls from nested C structs, and eliminate all false positives (see Table 5.4 for details). These were removed from the virtual dispatch slices and each module was rewritten again by PeBouncer. This time, we used instrumentation checks based on our policies Pnw and Pnwa, and policy Pe was used as well. All benchmarks were run again to measure the introduced performance overhead. The results can be seen in Figure 5.9 and yield an overall average performance overhead of approx. 2.1 % (Pe), 1.6 % (Pnw), and 2.2 % (Pnwa).

5.5.3 Vtable Hijacking Detection

Real-world exploits for zero-day vulnerabilities utilized vtable hijacking to first load a fake vtable and then dereference an entry to call a ROP gadget. In this way, attackers gained a foothold in victim systems via CVE-2013-3897, CVE-2013-3893, and CVE-2013-1690. The virtual dispatches were all found by vExtractor and successfully protected with

Figure 5.9: Runtime overhead for instrumented browsers on the browser benchmarks SunSpider and Kraken.


policies Pnw and Pnwa by PeBouncer. We then attempted to exploit the protected web browsers with corresponding exploits from Metasploit and exploits obtained from the wild. All attempts were already detected successfully with Pnw. Another critical vulnerability (CVE-2013-2556) in Windows 7 allowed remote code execution without any shellcode or ROP in conjunction with vtable hijacking. The culprit was the non-ASLR-protected SharedUserData memory region containing function pointers [219]. Attackers used the region's address as fake vtable and an entry with a pointer to LdrHotPatchRoutine to gain remote code execution via virtual dispatches. This is detected by policy Pnw, as it checks that vtables reside in non-writable module memory. Another zero-day use-after-free vulnerability (CVE-2014-0322) was used in targeted attacks. While the vulnerability only allowed a one-byte write, a vtable pointer of a Flash object was modified to gain control [79]. As the precision of vExtractor is high, T-VIP can protect against this vulnerability when the corresponding virtual dispatch is extracted and then instrumented by PeBouncer.

5.6 Related Work

Due to their prevalence and high practical impact, software vulnerabilities have received a lot of attention in the last decades. Many different techniques were proposed to either exploit or detect/mitigate/prevent them. In the following, we briefly review work that is closely related to our approach and discuss how our approach differs from other work in this area.

5.6.1 Control-Flow Integrity (CFI) Solutions

As explained in Section 4.2.2 of Chapter 4, control-flow integrity is a promising concept to mitigate memory corruption attacks that divert the control flow of a given program [1]. The basic idea is to instrument a given program to ensure that each control-flow transfer jumps to a valid program location. Several methods were proposed to implement CFI with low performance overhead [228, 229]. Efficient implementations incur a performance overhead of less than 5 %, but had to sacrifice some of the security guarantees given in the original CFI proposal [1]. Göktaş et al. demonstrated how these CFI implementations can be circumvented [89]. Their proof-of-concept attack gains control over an indirect transfer by overwriting a vtable pointer with an attacker-controlled heap address. This specific use case is detectable by our approach: we enforce policies at instructions which load vtable addresses before targets of indirect transfers are loaded. Bogus targets might seem legitimate to coarse-grained CFI protections, as they conform to their CFI policies. Our approach, however, detects a violation if any indirect target comes from a fake vtable. The main difference compared to existing work is that we specifically focus on the integrity of virtual dispatches, since vtable-hijacking attacks have recently become one of the most widely used attack vectors. Instead of protecting all indirect jumps and inducing a performance impact that prevents widespread adoption [198], we focus on a specific subset of indirect jumps that pose an attractive target for attackers. However, many more CFI flavors with various security guarantees and different performance overheads were proposed. Explaining all of them exhaustively goes beyond the

scope of this thesis; the interested reader is referred to the work of Burow et al., who provide an overview and compare many CFI solutions [36]. Some solutions are applicable to binary-only software [228, 229], others need source code in order to protect binaries [145, 147, 162]. Due to various attacks against just-in-time (JIT) code generation [10, 191], CFI also found its way into the protection of JIT code [146]. Context-sensitive CFI (CCFI) also incorporates path information collected during runtime of the to-be-protected program in order to disallow infeasible or invalid execution paths [205]. Similarly, per-input CFI (πCFI) uses runtime information to build the allowed CFG dynamically, but additionally uses a pre-computed CFG to prevent illegal edges in the dynamic CFG. While the above approaches apply protections mostly on the software layer, various proposals explore the idea of utilizing hardware for this task [63, 137]. Additionally, as mobile devices become more and more prevalent, CFI is also applied to mobile architectures, e.g., ARM [61, 162].

5.6.2 Compiler Extensions against Vtable Hijacking

Recently, several compiler extensions were proposed to protect vtables from hijacking attacks:

• GCC introduced the -fvtable-verify option [202] that analyzes the class hierarchy during the compilation phase to detect all vtables. Furthermore, checks are inserted at all virtual function call sites to verify the integrity of virtual method dispatches. This compiler-compatible approach was improved by ShrinkWrap [93], as the original design (VTV) missed certain intricacies with multiple inheritance.

• Closely related, SafeDispatch implements an LLVM extension that performs the same basic steps [106]. A class hierarchy analysis is used to determine all valid method implementations and additional checks are inserted to ensure that only valid dispatches are performed during runtime. The measured runtime overhead is about 2.1%.

• Another LLVM extension, and thus compiler-compatible defense to prevent vtable hijacking, was published by Bounov et al. in 2016 [30]. They propose a new vtable layout to decrease the performance overhead while keeping the security guarantees of similar approaches. Determining the class hierarchy and creating interleaved vtables allows an efficient range check at virtual calls.

• VTrust is also implemented as an LLVM extension and gathers source code information such as function names and function types during compilation [226]. This information is used during runtime at virtual calls as a sanity check. Virtual function types are enforced at virtual call sites to narrow down the set of allowed targets. As an additional layer of defense, vtable pointers are verified to prevent vtable injection and permit only real vtables.

• VTGuard by Microsoft [108] adds a guard entry at the end of a vtable such that (certain kinds of) vtable hijacks can be detected. This instrumentation is added during the compilation phase. If an information leak exists, an attacker could use


this to obtain information about the guard entry, enabling a bypass of the approach. Additionally, it seems that Microsoft’s Control-Flow Guard (MS-CFG) will replace VTGuard.

The main difference from our approach is the fact that we operate on the binary level such that we can also protect proprietary programs where no source code is available. Since the full class hierarchy can be determined during the compilation phase, the security guarantees provided by the first two approaches are stronger than ours: these approaches can implement Pobj and perform a full integrity check. However, empirical results demonstrate that our policy can already defeat in-the-wild zero-day exploits. Our performance overhead is slightly higher, but this is mainly due to the fact that we instrument binary programs.

5.6.3 Binary-Only Solutions Against Vtable Hijacking

Shortly after the work in this chapter was published [84, 85], similar research arose to harden C++ binaries against vtable hijacking. In what follows, we briefly explain various solutions. Similarly to our approach, VTint checks if vtables used at virtual calls have the read-only permission set. Additionally, it moves all vtables to a read-only section in a static analysis pass to ensure that only these vtables are utilized for a virtual call [227]. VfGuard performs virtual call site identification as well and enforces checks during runtime to verify that a vtable pointer always points to the beginning of a known vtable [166]. Unfortunately, it has been shown that many binary-only solutions are ineffective against a skilled attacker. The Turing-complete vtable-reuse attack, dubbed Counterfeit Object-Oriented Programming (COOP), does not inject fake vtables but reuses existing ones. Many binary-only defenses fell victim to this attack, including our approach [177]. This is due to the fact that we protect against injected vtables – whereby a fake vtable is created in read-write memory – but not against the reuse of real vtables. However, the arms race continues: van der Veen et al. published research on recovering call site information to limit call targets at indirect call sites based on the recovered type [206]. Hence, it is possible to mitigate COOP.

5.6.4 Heap Monitoring

By monitoring the heap of a given program during execution, dangling pointers that can lead to use-after-free or double-free vulnerabilities can be detected. For example, Undangle is a detection tool that leverages taint and pointer tracking to recognize dangling pointers during runtime [37]. Note that the tool is not designed to protect applications. Cling is a memory allocator that constrains memory allocation to allow address space reuse only among objects of the same type, thus preventing use-after-free vulnerabilities [5]. The authors report a performance overhead of less than 2 % for most benchmarks. Other proposals for memory allocators that provide additional security guarantees include DieHard [17] and DieHarder [148]. Besides their ability to run programs in multi-execution mode (see Section 3.6.2 of Chapter 3), both also prevent use-after-free vulnerabilities, but have rather high performance overheads (e.g., DieHarder imposes on average a 20 % performance penalty). Our performance overhead is comparable to Cling, but we do not require exchanging the memory allocator.


Reference counting is a memory management technique used, for example, in garbage collectors to track during runtime how many references to a given object exist [120, 173]. The basic insight is that if no pointer to an object exists anymore, the object can be safely freed. Unfortunately, reference counting induces a certain performance overhead and no security guarantees can be given: an attacker might be able to corrupt the reference counts, since this information needs to be stored on the heap.

5.7 Discussion

In the following, we discuss the limitations and shortcomings of our approach and the current implementation. It is crucial to identify virtual dispatches precisely in order to protect all virtual call sites. As the evaluation shows, vExtractor misses 2.6% of virtual dispatches. Recall formulae (1) and (2) from Section 5.2.2. Manual investigation revealed that in these cases, GCC in particular creates multiple aliases for obj. While vExtractor already performs an alias analysis to some extent, cases can slip away if an alias of obj is used in instructions represented by (2) but cannot be connected to obj in (1). Also, at the time of writing, trying to compile Firefox with the original GCC 4.9.0 with vtable verification enabled led to compiler crashes. Thus, we were not able to evaluate vExtractor's precision using Firefox as ground truth. Currently, binaries have to be profiled in order to remove virtual dispatch-like code constructs. On the binary level, it is impossible to differentiate certain C code constructs from virtual dispatches, and thus we need this (automated) profiling phase to filter out all non-virtual dispatches. As shown in our evaluation, T-VIP protects against real-world vtable hijacking attacks. However, policy Pnw could be circumvented by using a pointer residing in a non-writable module memory page and pointing to code of an attacker's choice. To mitigate this, we sacrifice performance by generating a random index into a vtable in the implementation of Pnwa. Hence, T-VIP guarantees that a different vtable entry is checked each time the same virtual dispatch executes. An attacker is thereby restricted to using non-writable function tables in order to reliably compromise a system. By itself, circumventing this is already very hard, but it would still be possible if a valid vtable of a wrong class type is used at a virtual dispatch site. This is a limitation we have in common with VTGuard according to [106].
However, implementing Pobj would prevent even such attacks. PeBouncer currently supports 32-bit PE files, since the majority of web browsers use 32-bit code and this is the primary target of use-after-free exploits. However, the concept of PeBouncer is applicable to 64-bit code and the ELF file format as well, with only minor modifications. Some corner cases during rewriting are currently handled by an exception handler and introduce additional overhead (see Section 5.4.2.1). This could be solved by leveraging the binary rewriting capabilities of ROSE [168] or Dyninst [18] to insert checks inline.


5.8 Conclusion

In this chapter, we introduced an approach to protect binary programs against vtable-hijacking attacks, which have become the de-facto attack vector against modern browsers. To this end, we introduced an automated method to extract virtual function dispatches from a given binary, which we implemented in a tool called vExtractor. Furthermore, we developed a generic, static binary rewriting engine for PE files called PeBouncer that can instrument a given binary with a policy that checks the integrity of virtual function dispatches. Empirical evaluations demonstrate that our approach can detect recent zero-day attacks, and the performance overhead is only slightly higher compared to compiler-based approaches.


Chapter 6 Conclusion

In this thesis, we investigated specific steps of the exploitation process of vulnerabilities, and the impact of security-critical memory corruptions from offensive and defensive perspectives. Information leaks and control-flow hijacking are typically important steps for an adversary to compromise client applications such as browsers and their plugins. We contributed to both steps with offensive and defensive research. We not only extended information leaks with a behavior that was not known in web browsers to that extent, but also introduced a new detection technique for information leaks in script engines, as they are a fundamental component. On the offensive side of control-flow hijacking, we developed a framework for architecture-independent gadget discovery to gather CFI-compatible code gadgets, and on the defensive side we showed that an attack technique popular among attackers can be mitigated in binary-only applications.

We believe that the behavior of crash resistance in applications may become an important primitive that current defenses overlook (Chapter 2). Ongoing and future research will show whether other programs are affected by this behavior and whether such primitives can be discovered in an automated way. This would increase the power of memory disclosures. As we have outlined in Table 3.1 of Chapter 3, memory disclosures (step II) in web browsers are a crucial exploit step to bypass specific defense schemes. Hence, preventing them is a hot research topic. While we approached this attack vector with dual execution of script engines, it became clear that the presented detection concept is feasible, but hard to engineer. This is mainly due to the complexity of web browsers and their components. However, the underlying concept is powerful, and we believe that additional implementation effort can improve our prototype to become a fully automated detection framework usable to detect in-the-wild, i.e., zero-day exploits which utilize memory disclosures.

To investigate step III, i.e., control-flow hijacking, we employed static program analysis to build a framework for the evaluation of CFI and code-reuse solutions in Chapter 4. Amongst others, static program analysis was also utilized to prevent vtable hijacking – a popular adversarial technique to hijack the control flow in browsers – in Chapter 5. It became clear that such analyses may take a long time to run. Program analysis of binary applications is increasingly entering the focus of research, and hence frameworks emerge which make it easier to implement various analyses. It is crucial to choose an appropriate framework which exactly fits the needs of the task to be solved. Therefore, the toolkits chosen to develop the approaches in Chapter 4 and Chapter 5 may be superseded. However, note that analysis frameworks for research purposes are mostly prototypes or are continuously improved. Hence, they are moving targets, which makes a choice cumbersome without time-consuming investigation into each of them.

Future Thoughts on Memory Corruption Vulnerabilities. Both fundamental steps, i.e., information leaks and control-flow hijacking, have received and continue to receive much attention from the research community. It still remains an open problem how applications – especially without access to the source code – can be protected with a reasonable performance overhead. However, the defensive side is making progress in that the difficulty for attackers to exploit client applications keeps rising. While adversaries have immense capabilities due to the complexity of client programs, i.e., they can use languages such as JavaScript to support the exploitation of bugs, from time to time academic defenses are incorporated into commodity operating systems. One recent example is Microsoft's Control-Flow Guard, which thwarts a multitude of in-the-wild exploits. Unfortunately, it is conceivable that the arms race between attacks and defenses will not come to an end. But it seems that adversaries will need more and more resources to successfully exploit memory corruptions as defenses are invented and successively added to operating systems. We did not investigate data-flow protections [42], the integrity of memory operations [6], or data-oriented attacks [100, 101]. Similarly, software-fault isolation (SFI [76]) and sandboxing [170] are not in the scope of this thesis. However, some COTS client applications contain measures to reduce the attack surface by splitting application resources across differently privileged processes [133, 169]. This way, the adversary needs more than the actual memory corruption vulnerability: although she is able to exploit an error and can perform arbitrary computations, there is no way to cross the boundary to the higher-privileged process without additional bugs to, e.g., maintain persistence on a system. Hence, an additional vulnerability in the higher-privileged process is necessary to gain additional capabilities on the target.
Only then is the attacker able to actually perform activities with the rights of the victim user. Indeed, many in-the-wild exploits exercise several bugs to gain complete access to a system [35, 47]. However, it can be assumed that even in the absence of control-flow hijacking, adversaries will find ways to exploit vulnerable programs with data-only attacks. In recent years, such attacks were already observed in the wild: there may be objects in vulnerable programs containing security flags, which can simply be manipulated by an attacker. Suddenly, capabilities of the application are unleashed which are normally only available to trusted local content [3, 122]. Put differently, the attacker can perform operations within a remote web page that are usually reserved for a local user. For example, certain ActiveX objects in Internet Explorer have a special security flag (the safemode flag). If this flag is set, the operations and methods of the object are assumed to be safe within remote untrusted content. However, if this flag is overwritten by the attacker, additional methods become available which are at least as dangerous as remote code execution via control-flow hijacking. For example, the attacker can use the ActiveX object's exposed API to start arbitrary programs using script code embedded in a remote web page [122].

Security flags in Mozilla Firefox are another example. If an attacker can overwrite such a flag, she has access to the high-privileged JavaScript API and can execute arbitrary processes [3]. Similar attacks follow the same pattern: a security flag field (e.g., in a script-engine object) is overwritten by using a memory corruption vulnerability, and afterwards privileged API methods are used to conduct the attack [136]. In the worst case, an adversary achieves remote code execution without manipulating control data; only non-control data is abused. We believe that data-oriented exploitation of memory corruption vulnerabilities will become more and more important in the future. While these attacks are mostly bound to the intrinsics of the to-be-exploited client application, they are easier to accomplish than current state-of-the-art methods that divert the control flow. Hence, we assume that the verification of data integrity will become inevitable to protect future users against memory corruption vulnerabilities.


Publications

During the development of this thesis, the author contributed to the following publications:

Peer-reviewed Publications

• Patrick Wollgast, Robert Gawlik, Behrad Garmany, Benjamin Kollenda, Thorsten Holz. Automated Multi-Architectural Discovery of CFI-Resistant Code Gadgets. In European Symposium on Research in Computer Security (ESORICS), 2016

• Enes Göktaş, Robert Gawlik, Benjamin Kollenda, Elias Athanasopoulos, Georgios Portokalidis, Cristiano Giuffrida, Herbert Bos. Undermining Entropy-based Information Hiding (And What to do About it). In USENIX Security Symposium, 2016

• Robert Gawlik, Philipp Koppe, Benjamin Kollenda, Andre Pawlowski, Behrad Garmany, Thorsten Holz. Detile: Fine-Grained Information Leak Detection in Script Engines. In Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA), 2016

• Robert Gawlik, Benjamin Kollenda, Philipp Koppe, Behrad Garmany, Thorsten Holz. Enabling Client-Side Crash-Resistance to Overcome Diversification and Information Hiding. In Annual Network & Distributed System Security Symposium (NDSS), 2016

• Jannik Pewny, Behrad Garmany, Robert Gawlik, Christian Rossow, Thorsten Holz. Cross-Architecture Bug Search in Binary Executables. In IEEE Symposium on Security and Privacy (Oakland), 2015

• Robert Gawlik, Thorsten Holz. Towards Automated Integrity Protection of C++ Virtual Function Tables in Binary Programs. In Annual Computer Security Applications Conference (ACSAC), 2014

• Sebastian Vogl, Robert Gawlik, Behrad Garmany, Thomas Kittel, Jonas Pfoh, Claudia Eckert, Thorsten Holz. Dynamic Hooks: Hiding Control Flow Changes within Non-Control Data. In USENIX Security Symposium, 2014


• Apostolis Zarras, Antonis Papadogiannakis, Robert Gawlik, Thorsten Holz. Automated Generation of Models for Fast and Precise Detection of HTTP-Based Malware. In Annual Conference on Privacy, Security and Trust (PST), 2014

Technical Reports

• Robert Gawlik, Philipp Koppe, Benjamin Kollenda, Andre Pawlowski, Behrad Garmany, Thorsten Holz. Technical Report: Detile: Fine-Grained Information Leak Detection in Script Engines. Ruhr-Universität Bochum, Horst Görtz Institut für IT-Sicherheit (HGI), 2016

• Robert Gawlik, Thorsten Holz. Technical Report: Towards Automated Integrity Protection of C++ Virtual Function Tables in Binary Programs. Ruhr-Universität Bochum, Horst Görtz Institut für IT-Sicherheit (HGI), 2014

List of Figures

1.1 Exploit procedure and contributions ...... 6

2.1 Thread Environment Block ...... 22
2.2 Propagation summary for RtlInsertElementGenericTableFullAvl() ...... 27
2.3 Attacker-controlled JsLiteralString object ...... 29
2.4 Location of the asm.js heap pointer ...... 30

3.1 Shared physical memory ...... 43
3.2 Two methods to generate information leaks ...... 45
3.3 Basic script-engine interpreter loop ...... 47
3.4 Overview of Detile ...... 48
3.5 Master and twin process with a different randomization ...... 49
3.6 Detile running with Internet Explorer ...... 50
3.7 Memory overhead of re-randomization and dual execution ...... 56
3.8 Creating an information leak with CVE-2014-0322 ...... 60

4.1 Code reuse example including ROP, JOP and COP ...... 69
4.2 Exemplary CFG protected by ideal CFI ...... 71
4.3 Example loop gadget ...... 74
4.4 Gadget with unsatisfiable path constraint ...... 77
4.5 CFI-compatible ARM gadget ...... 85

5.1 C++ inheritance concepts ...... 94
5.2 Disassembly and corresponding semantic steps of virtual dispatches ...... 96
5.3 Use-after-free exploitation steps ...... 97
5.4 Overview of T-VIP ...... 99
5.5 State machine design of backward slicing ...... 102
5.6 Backward slicing state machines ...... 103
5.7 Virtual dispatch instrumentation ...... 106
5.8 Instrumentation stub structure ...... 108
5.9 Runtime overhead for instrumented browsers on browser benchmarks ...... 112


List of Tables

3.1 Defenses and offensive approaches utilizing information leaks in browsers ...... 40
3.2 Re-randomized processes in Internet Explorer 11 ...... 55
3.3 Memory working sets of 32-bit native processes on Windows 8.1 ...... 56
3.4 Memory working sets of 32-bit processes running in dual execution mode on Windows 8.1 ...... 57
3.5 Memory working sets of 32-bit native processes on Windows 8.0 ...... 57
3.6 Memory working sets of 32-bit processes running in dual execution mode on Windows 8.0 ...... 57
3.7 Startup times and startup slowdowns ...... 58
3.8 Script execution time of Internet Explorer 11 ...... 58

4.1 Gadget definitions listed by type ...... 75
4.2 Gadget discovery and analysis time ...... 83
4.3 Gadget start and end type distribution ...... 83
4.4 Gadget loop count ...... 84
4.5 Fixed function call distribution ...... 84
4.6 Number of unique EP and CS gadgets discovered by other tools in comparison to our framework ...... 87

5.1 Compiler-dependent thiscall convention ...... 95
5.2 Zero-day attacks using vtable hijacking in-the-wild ...... 98
5.3 Spec CPU2006 benchmark on instrumented programs ...... 111
5.4 vExtractor results ...... 112


List of Listings

2.1 Crash-resistant program in Windows ...... 17
2.2 Memory oracle in Internet Explorer 11 ...... 19
2.3 Web-worker memory oracle ...... 21

4.1 Exploit buffer for use-after-free vulnerability on ARM ...... 86


Bibliography

[1] Martín Abadi, Mihai Budiu, Úlfar Erlingsson, and Jay Ligatti. Control-Flow Integrity. In ACM Conference on Computer and Communications Security (CCS), 2005.

[2] Martín Abadi, Mihai Budiu, Úlfar Erlingsson, and Jay Ligatti. Control-flow Integrity Principles, Implementations, and Applications. ACM Transactions on Information and System Security (TISSEC), 2009.

[3] Jüri Aedla. Out-of-bounds read/write through neutering ArrayBuffer objects. https://bugzilla.mozilla.org/show_bug.cgi?id=982974, 2014.

[4] Jonathan Afek and Adi Sharabani. Dangling Pointer: Smashing the Pointer for Fun and Profit. Black Hat USA, 2007.

[5] Periklis Akritidis. Cling: A Memory Allocator to Mitigate Dangling Pointers. In USENIX Security Symposium, 2010.

[6] Periklis Akritidis, Cristian Cadar, Costin Raiciu, Manuel Costa, and Miguel Castro. Preventing Memory Error Exploits with WIT. In IEEE Symposium on Security and Privacy, 2008.

[7] Alexa. The Top 500 Sites on the Web. http://www.alexa.com/topsites, 2014.

[8] Brian Anderson, Lars Bergstrom, David Herman, Josh Matthews, Keegan McAllister, Manish Goregaokar, Jack Moffitt, and Simon Sapin. Experience Report: Developing the Servo Web Browser Engine Using Rust. arXiv preprint arXiv:1505.07383, 2015.

[9] Apple. SunSpider 1.0.2. https://www.webkit.org/perf/sunspider/sunspider.html, 2014.

[10] Michalis Athanasakis, Elias Athanasopoulos, Michalis Polychronakis, Georgios Portokalidis, and Sotiris Ioannidis. The Devil is in the Constants: Bypassing Defenses in Browser JIT Engines. In Symposium on Network and Distributed System Security (NDSS), 2015.


[11] Michael Backes, Thorsten Holz, Benjamin Kollenda, Philipp Koppe, Stefan Nürnberger, and Jannik Pewny. You Can Run but You Can't Read: Preventing Disclosure Exploits in Executable Code. In ACM Conference on Computer and Communications Security (CCS), 2014.

[12] Michael Backes and Stefan Nürnberger. Oxymoron: Making Fine-Grained Memory Randomization Practical by Allowing Code Sharing. In USENIX Security Symposium, 2014.

[13] Vasanth Bala, Evelyn Duesterwald, and Sanjeev Banerjia. Dynamo: a Transparent Dynamic Optimization System. ACM SIGPLAN Notices, 2000.

[14] Elena Gabriela Barrantes, David H. Ackley, Trek S. Palmer, Darko Stefanovic, and Dino Dai Zovi. Randomized Instruction Set Emulation to Disrupt Binary Code Injection Attacks. In ACM Conference on Computer and Communications Security (CCS), 2003.

[15] Gilles Barthe, Juan Manuel Crespo, Dominique Devriese, Frank Piessens, and Exequiel Rivas. Secure Multi-Execution through Static Program Transformation. In Formal Techniques for Distributed Systems, 2012.

[16] Pete Becker et al. Working Draft, Standard for C++. Technical report, 2011.

[17] Emery D. Berger and Benjamin G. Zorn. DieHard: Probabilistic Memory Safety for Unsafe Languages. In ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), 2006.

[18] Andrew R. Bernat and Barton P. Miller. Anywhere, Any-Time Binary Instrumentation. In Proceedings of the 10th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools, 2011.

[19] Sandeep Bhatkar, Daniel C. DuVarney, and R. Sekar. Efficient Techniques for Comprehensive Protection from Memory Error Exploits. In USENIX Security Symposium, 2005.

[20] Sandeep Bhatkar, Daniel C. DuVarney, and Ron Sekar. Address Obfuscation: An Efficient Approach to Combat a Broad Range of Memory Error Exploits. In USENIX Security Symposium, 2003.

[21] David Bigelow, Thomas Hobson, Robert Rudd, William Streilein, and Hamed Okhravi. Timely Rerandomization for Mitigating Memory Disclosures. In ACM Conference on Computer and Communications Security (CCS), 2015.

[22] David Binkley and Mark Harman. A Survey of Empirical Results on Program Slicing. Advances in Computing, 2003.

[23] Andrea Bittau, Adam Belay, Ali Mashtizadeh, David Mazieres, and Dan Boneh. Hacking Blind. In IEEE Symposium on Security and Privacy, 2014.


[24] Dion Blazakis. Interpreter Exploitation: Pointer Inference and JIT Spraying. In BlackHat DC, 2010.

[25] Dion Blazakis. GC Woah. http://www.trapbit.com/talks/Summerc0n2013-GCWoah.pdf, 2013.

[26] Tyler Bletsch, Xuxian Jiang, Vince W. Freeh, and Zhenkai Liang. Jump-Oriented Programming: A New Class of Code-Reuse Attack. In ACM Symposium on Information, Computer and Communications Security (ASIACCS), 2011.

[27] Bypassing Microsoft EMET 5.1 - Yet Again. http://blog.sec-consult.com/2014/11/bypassing-microsoft-emet-51-yet-again.html.

[28] Erwin Bosman and Herbert Bos. Framing Signals – A Return to Portable Shellcode. In IEEE Symposium on Security and Privacy, 2014.

[29] Botan. Botan C++ Crypto Library. http://botan.randombit.net/, 2013.

[30] Dimitar Bounov, Rami Kici, and Sorin Lerner. Protecting C++ Dynamic Dispatch through Vtable Interleaving. In Symposium on Network and Distributed System Security (NDSS), 2016.

[31] Derek Bruening, Evelyn Duesterwald, and Saman Amarasinghe. Design and Implementation of a Dynamic Optimization Framework for Windows. In 4th ACM Workshop on Feedback-Directed and Dynamic Optimization (FDDO-4), 2001.

[32] David Brumley, Ivan Jager, Thanassis Avgerinos, and Edward J. Schwartz. BAP: A Binary Analysis Platform. In Conference on Computer Aided Verification, 2011.

[33] Danilo Bruschi, Lorenzo Cavallaro, and Andrea Lanzi. Diversified Process Replicæ for Defeating Memory Error Exploits. In Performance, Computing, and Communications Conference (IPCCC), 2007.

[34] Erik Buchanan, Ryan Roemer, Hovav Shacham, and Stefan Savage. When Good Instructions Go Bad: Generalizing Return-oriented Programming to RISC. In ACM Conference on Computer and Communications Security (CCS), 2008.

[35] Christopher Budd. Pwn2Own: Day 2 and Event Wrap-Up. http://blog.trendmicro.com/pwn2own-day-2-event-wrap/, 2016.

[36] Nathan Burow, Scott A. Carr, Stefan Brunthaler, Mathias Payer, Joseph Nash, Per Larsen, and Michael Franz. Control-Flow Integrity: Precision, Security, and Performance. arXiv preprint arXiv:1602.04056, 2016.

[37] Juan Caballero, Gustavo Grieco, Mark Marron, and Antonio Nappa. Undangle: Early Detection of Dangling Pointers in Use-After-Free and Double-Free Vulnerabilities. In International Symposium on Software Testing and Analysis (ISSTA), 2012.


[38] Roberto Capizzi, Antonio Longo, V.N. Venkatakrishnan, and A. Prasad Sistla. Preventing Information Leaks through Shadow Executions. In Annual Computer Security Applications Conference (ACSAC), 2008.

[39] Capstone - The Ultimate Disassembly Framework. http://www.capstone-engine.org/.

[40] Nicholas Carlini and David Wagner. ROP is Still Dangerous: Breaking Modern Defenses. In USENIX Security Symposium, 2014.

[41] Nicholas Carlini, Antonio Barresi, Mathias Payer, David Wagner, and Thomas R. Gross. Control-Flow Bending: On the Effectiveness of Control-Flow Integrity. In USENIX Security Symposium, 2015.

[42] Miguel Castro, Manuel Costa, and Tim Harris. Securing Software by Enforcing Data-Flow Integrity. In Symposium on Operating Systems Design and Implementation (OSDI), 2006.

[43] David Chappell. Understanding ActiveX and OLE: a guide for developers and managers, 1996.

[44] Stephen Checkoway, Lucas Davi, Alexandra Dmitrienko, Ahmad-Reza Sadeghi, Hovav Shacham, and Marcel Winandy. Return-Oriented Programming without Returns. In ACM Conference on Computer and Communications Security (CCS), 2010.

[45] Stephen Checkoway, Lucas Davi, Alexandra Dmitrienko, Ahmad-Reza Sadeghi, Hovav Shacham, and Marcel Winandy. Return-oriented Programming Without Returns. In ACM Conference on Computer and Communications Security (CCS), 2010.

[46] Yueqiang Cheng, Zongwei Zhou, Miao Yu, Xuhua Ding, and Robert H. Deng. ROPecker: A Generic and Practical Approach for Defending Against ROP Attacks. In Symposium on Network and Distributed System Security (NDSS), 2014.

[47] Dustin Childs. Pwn2Own 2015: Day Two Results. http://community.hpe.com/t5/Security-Research/Pwn2Own-2015-Day-Two-results/ba-p/6722884, March 2015.

[48] Mauro Conti, Stephen Crane, Lucas Davi, Michael Franz, Per Larsen, Marco Negro, Christopher Liebchen, Mohaned Qunaibit, and Ahmad-Reza Sadeghi. Losing Control: On the Effectiveness of Control-Flow Integrity under Stack Attacks. In ACM Conference on Computer and Communications Security (CCS), 2015.

[49] Crispin Cowan, Calton Pu, Dave Maier, Heather Hintony, Jonathan Walpole, Peat Bakke, Steve Beattie, Aaron Grier, Perry Wagle, and Qian Zhang. StackGuard: Automatic Adaptive Detection and Prevention of Buffer-overflow Attacks. In USENIX Security Symposium, 1998.


[50] Benjamin Cox, David Evans, Adrian Filipi, Jonathan Rowanhill, Wei Hu, Jack Davidson, John Knight, Anh Nguyen-Tuong, and Jason Hiser. N-variant Systems: A Secretless Framework for Security through Diversity. In USENIX Security Symposium, 2006.

[51] Stephen Crane, Christopher Liebchen, Andrei Homescu, Lucas Davi, Per Larsen, Ahmad-Reza Sadeghi, Stefan Brunthaler, and Michael Franz. Readactor: Practical Code Randomization Resilient to Memory Disclosure. In IEEE Symposium on Security and Privacy, 2015.

[52] Stephen Crane, Stijn Volckaert, Felix Schuster, Christopher Liebchen, Per Larsen, Lucas Davi, Ahmad-Reza Sadeghi, Thorsten Holz, Bjorn De Sutter, and Michael Franz. It’s a TRaP: Table Randomization and Protection against Function Reuse Attacks. In ACM Conference on Computer and Communications Security (CCS), 2015.

[53] Jason Croft and Matthew Caesar. Towards Practical Avoidance of Information Leakage in Enterprise Networks. In HotSec, 2011.

[54] CVE. Google Chrome Vulnerability Statistics. http://www.cvedetails.com/product/15031/Google-Chrome.html, 2014.

[55] CVE. Microsoft Internet Explorer Vulnerability Statistics. http://www.cvedetails.com/product/9900/Microsoft-Internet-Explorer.html, 2014.

[56] CVE. Mozilla Firefox Vulnerability Statistics. http://www.cvedetails.com/product/3264/Mozilla-Firefox.html, 2014.

[57] Disarming and Bypassing EMET 5.1. https://www.offensive-security.com/vulndev/disarming-and-bypassing-emet-5-1/.

[58] Dino Dai Zovi. Practical Return-Oriented Programming. SOURCE Boston, 2010.

[59] Thurston H.Y. Dang, Petros Maniatis, and David Wagner. The Performance Cost of Shadow Stacks and Stack Canaries. In ACM Symposium on Information, Computer and Communications Security (ASIACCS), 2015.

[60] Mark Daniel, Jake Honoroff, and Charlie Miller. Engineering Heap Overflow Exploits with JavaScript. In USENIX Workshop on Offensive Technologies (WOOT), 2008.

[61] Lucas Davi, Alexandra Dmitrienko, Manuel Egele, Thomas Fischer, Thorsten Holz, Ralf Hund, Stefan Nürnberger, and Ahmad-Reza Sadeghi. MoCFI: A Framework to Mitigate Control-Flow Attacks on Smartphones. In Symposium on Network and Distributed System Security (NDSS), 2012.

[62] Lucas Davi, Alexandra Dmitrienko, Ahmad-Reza Sadeghi, and Marcel Winandy. Return-Oriented Programming without Returns on ARM. Technical Report HGI-TR-2010-002, Ruhr-University Bochum, 2010.


[63] Lucas Davi, Matthias Hanreich, Debayan Paul, Ahmad-Reza Sadeghi, Patrick Koeberl, Dean Sullivan, Orlando Arias, and Yier Jin. HAFIX: Hardware-Assisted Flow Integrity Extension. In Proceedings of the 52nd Annual Design Automation Conference, 2015.

[64] Lucas Davi, Daniel Lehmann, Ahmad-Reza Sadeghi, and Fabian Monrose. Stitching the Gadgets: On the Ineffectiveness of Coarse-Grained Control-Flow Integrity Protection. In USENIX Security Symposium, 2014.

[65] Lucas Davi, Christopher Liebchen, Ahmad-Reza Sadeghi, Kevin Z. Snow, and Fabian Monrose. Isomeron: Code Randomization Resilient to (Just-In-Time) Return-Oriented Programming. In Symposium on Network and Distributed System Security (NDSS), 2015.

[66] Changes to Functionality in Microsoft Windows XP Service Pack 2. https://technet.microsoft.com/en-us/library/bb457151.aspx.

[67] Solar Designer. "Return-to-Libc" Attack. Bugtraq, August 1997.

[68] Dominique Devriese and Frank Piessens. Noninterference through Secure Multi-Execution. In IEEE Symposium on Security and Privacy, 2010.

[69] Goran Doychev, Dominik Feld, Boris Köpf, Laurent Mauborgne, and Jan Reineke. CacheAudit: A Tool for the Static Analysis of Cache Side Channels. In USENIX Security Symposium, 2013.

[70] Karel Driesen and Urs Hölzle. The Direct Cost of Virtual Function Calls in C++. ACM Sigplan Notices, 1996.

[71] David Drummond. A New Approach to China. https://googleblog.blogspot.de/2010/01/new-approach-to-china.html, 2010.

[72] Thomas Dullien, Tim Kornau, and Ralf-Philipp Weinmann. A Framework for Automated Architecture-Independent Gadget Search. In WOOT, 2010.

[73] Thomas Dullien and Sebastian Porst. REIL: A Platform-Independent Intermediate Representation of Disassembled Code for Static Code Analysis. CanSecWest Applied Security Conference, 2009.

[74] Enhanced Mitigation Experience Toolkit - EMET - TechNet Security. https://technet.microsoft.com/en-us/security/jj653751.

[75] Microsoft Security Toolkit Delivers New BlueHat Prize Defensive Technology – News Center. http://news.microsoft.com/2012/07/25/microsoft-security-toolkit-delivers-new-bluehat-prize-defensive-technology/.

[76] Úlfar Erlingsson, Martín Abadi, Michael Vrable, Mihai Budiu, and George C. Necula. XFI: Software Guards for System Address Spaces. In Symposium on Operating Systems Design and Implementation (OSDI), 2006.


[77] Isaac Evans, Sam Fingeret, Julián González, Ulziibayar Otgonbaatar, Tiffany Tang, Howard Shrobe, Stelios Sidiroglou-Douskos, Martin Rinard, and Hamed Okhravi. Missing the Point(er): On the Effectiveness of Code Pointer Integrity. In IEEE Symposium on Security and Privacy, 2015.

[78] Isaac Evans, Fan Long, Ulziibayar Otgonbaatar, Howard Shrobe, Martin Rinard, Hamed Okhravi, and Stelios Sidiroglou-Douskos. Control Jujutsu: On the Weaknesses of Fine-Grained Control Flow Integrity. In ACM Conference on Computer and Communications Security (CCS), 2015.

[79] FireEye. Operation SnowMan. https://www.fireeye.com/blog/threat-research/2014/02/operation-snowman-deputydog-actor-compromises-us-veterans-of-foreign-wars-.html, 2014.

[80] FireEye. Angler Exploit Kit Evading EMET. https://www.fireeye.com/blog/threat-research/2016/06/angler_exploit_kite.html, 2016.

[81] Andreas Follner, Alexandre Bartel, and Eric Bodden. Analyzing the Gadgets. In Engineering Secure Software and Systems, 2016.

[82] Ivan Fratric. Runtime Prevention of Return-Oriented Programming Attacks. https://github.com/ivanfratric/ropguard/raw/master/doc/ropguard.pdf.

[83] Robert Gawlik. Bypassing Different Defense Schemes via Crash Resistant Probing of Address Space. In CanSecWest Applied Security Conference, 2016.

[84] Robert Gawlik and Thorsten Holz. Towards Automated Integrity Protection of C++ Virtual Function Tables in Binary Programs. In Annual Computer Security Applications Conference (ACSAC), 2014.

[85] Robert Gawlik and Thorsten Holz. Towards Automated Integrity Protection of C++ Virtual Function Tables in Binary Programs. Technical Report TR-HGI-2014-004, Ruhr-University Bochum, 2014.

[86] Robert Gawlik, Benjamin Kollenda, Philipp Koppe, Behrad Garmany, and Thorsten Holz. Enabling Client-Side Crash-Resistance to Overcome Diversification and Information Hiding. In Symposium on Network and Distributed System Security (NDSS), 2016.

[87] Robert Gawlik, Philipp Koppe, Benjamin Kollenda, Andre Pawlowski, Behrad Garmany, and Thorsten Holz. Detile: Fine-Grained Information Leak Detection in Script Engines. In Detection of Intrusions and Malware, and Vulnerability Assessment (DIMVA), 2016.

[88] Robert Gawlik, Philipp Koppe, Benjamin Kollenda, Andre Pawlowski, Behrad Garmany, and Thorsten Holz. Detile: Fine-Grained Information Leak Detection in Script Engines. Technical Report TR-HGI-2016-004, Ruhr-University Bochum, 2016.

137 Bibliography

[89] Enes Göktaş, Elias Athanasopoulos, Herbert Bos, and Georgios Portokalidis. Out of Control: Overcoming Control-Flow Integrity. In IEEE Symposium on Security and Privacy, 2014.

[90] Enes Göktaş, Elias Athanasopoulos, Michalis Polychronakis, Herbert Bos, and Georgios Portokalidis. Size Does Matter: Why Using Gadget-Chain Length to Prevent Code-Reuse Attacks is Hard. In USENIX Security Symposium, 2014.

[91] Enes Göktaş, Robert Gawlik, Benjamin Kollenda, Elias Athanasopoulos, Georgios Portokalidis, Cristiano Giuffrida, and Herbert Bos. Undermining Entropy-based Information Hiding (And What to Do About It). In USENIX Security Symposium, 2016.

[92] Jordan Gruskovnjak. Advanced Exploitation of Mozilla Firefox Use-after-free (MFSA 2012-22). http://web.archive.org/web/20150121031623/http://www.vupen.com/blog/20120625.Advanced_Exploitation_of_Mozilla_Firefox_UaF_CVE-2012-0469.php, 2012.

[93] Istvan Haller, Enes Göktaş, Elias Athanasopoulos, Georgios Portokalidis, and Herbert Bos. ShrinkWrap: VTable Protection without Loose Ends. In Annual Computer Security Applications Conference (ACSAC), 2015.

[94] Mark Harman and Sebastian Danicic. Amorphous Program Slicing. In 5th International Workshop on Program Comprehension, 1997.

[95] David Herman and Kenneth Russell. Typed Array Specification. Khronos.org, 2011.

[96] Jonathan Heusser and Pasquale Malacaria. Quantifying Information Leaks in Software. In Annual Computer Security Applications Conference (ACSAC), 2010.

[97] Timo Hirvonen. Dynamic Flash Instrumentation For Fun and Profit. In Black Hat USA, 2014.

[98] Jason Hiser, Anh Nguyen-Tuong, Michele Co, Matthew Hall, and Jack W. Davidson. ILR: Where’d My Gadgets Go? In IEEE Symposium on Security and Privacy, 2012.

[99] Petr Hosek and Cristian Cadar. Varan the Unbelievable: An Efficient N-version Execution Framework. In Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2015.

[100] Hong Hu, Zheng Leong Chua, Sendroiu Adrian, Prateek Saxena, and Zhenkai Liang. Automatic Generation of Data-Oriented Exploits. In USENIX Security Symposium, 2015.

[101] Hong Hu, Shweta Shinde, Sendroiu Adrian, Zheng Leong Chua, Prateek Saxena, and Zhenkai Liang. Data-Oriented Programming: On the Expressiveness of Non-Control Data Attacks. In IEEE Symposium on Security and Privacy, 2016.

[102] Ralf Hund, Carsten Willems, and Thorsten Holz. Practical Timing Side Channel Attacks against Kernel Space ASLR. In IEEE Symposium on Security and Privacy, 2013.


[103] Ralf Hund, Thorsten Holz, and Felix C. Freiling. Return-Oriented Rootkits: Bypassing Kernel Code Integrity Protection Mechanisms. In USENIX Security Symposium, 2009.

[104] Ida Sploiter. https://thesprawl.org/projects/ida-sploiter/.

[105] Ivo Ivanov. API Hooking Revealed. The Code Project, 2002.

[106] Dongseok Jang, Zachary Tatlock, and Sorin Lerner. SafeDispatch: Securing C++ Virtual Calls from Memory Corruption Attacks. In Symposium on Network and Distributed System Security (NDSS), 2014.

[107] Wesley Jin, Cory Cohen, Jeffrey Gennari, Charles Hines, Sagar Chaki, Arie Gurfinkel, Jeffrey Havrilla, and Priya Narasimhan. Recovering C++ Objects From Binaries Using Inter-Procedural Data-Flow Analysis. In ACM SIGPLAN Program Protection and Reverse Engineering Workshop, 2014.

[108] Kenneth D. Johnson and Matthew R. Miller. Using Virtual Table Protections to Prevent the Exploitation of Object Corruption Vulnerabilities, 2010. US Patent App. 12/958,668.

[109] Nicolas Joly. Advanced exploitation of Internet Explorer 10/Windows 8 overflow (Pwn2Own 2013). VUPEN Vulnerability Research Team (VRT) Blog, 2013.

[110] Nicolas Joly. Criminals Are Getting Smarter: Analysis of the Adobe Acrobat / Reader 0-Day Exploit. http://web.archive.org/web/20141018060115/http://www.vupen.com/blog/20100909.Adobe_Acrobat_Reader_0_Day_Exploit_CVE-2010-2883_Technical_Analysis.php, September 2010.

[111] Gaurav S. Kc, Angelos D. Keromytis, and Vassilis Prevelakis. Countering Code-Injection Attacks With Instruction-Set Randomization. In ACM Conference on Computer and Communications Security (CCS), 2003.

[112] Chongkyung Kil, Jinsuk Jun, Christopher Bookholt, Jun Xu, and Peng Ning. Address Space Layout Permutation (ASLP): Towards Fine-Grained Randomization of Commodity Software. In Annual Computer Security Applications Conference (ACSAC), 2006.

[113] Tim Kornau. Return-Oriented Programming for the ARM Architecture. http://www.zynamics.com/downloads/kornau-tim--diplomarbeit--rop.pdf, 2009.

[114] Kostya Kortchinsky. Escaping VMware Workstation through COM1. https://www.exploit-db.com/docs/37276.pdf, 2015.

[115] Sebastian Krahmer. x86-64 Buffer Overflow Exploits and the Borrowed Code Chunks Exploitation Technique. http://users.suse.com/~krahmer/no-nx.pdf, 2005.

[116] Volodymyr Kuznetsov, László Szekeres, Mathias Payer, George Candea, R. Sekar, and Dawn Song. Code-Pointer Integrity. In Symposium on Operating Systems Design and Implementation (OSDI), 2014.


[117] Volodymyr Kuznetsov, László Szekeres, Mathias Payer, George Candea, and Dawn Song. Poster: Getting The Point(er): On the Feasibility of Attacks on Code-Pointer Integrity. In IEEE Symposium on Security and Privacy, 2015.

[118] Bromium Labs. Dissecting the Newest IE10 0-Day Exploit (CVE-2014-0322). http://labs.bromium.com/2014/02/25/dissecting-the-newest-ie10-0-day-exploit-cve-2014-0322/, 2014.

[119] Byoungyoung Lee, Long Lu, Tielei Wang, Taesoo Kim, and Wenke Lee. From Zygote to Morula: Fortifying Weakened ASLR on Android. In IEEE Symposium on Security and Privacy, 2014.

[120] Yossi Levanoni and Erez Petrank. An On-the-Fly Reference Counting Garbage Collector for Java. ACM SIGPLAN Notices, 36(11):367–380, 2001.

[121] Haifei Li. Inside AVM. In REcon, 2012.

[122] Zhenhua Liu. Advanced Exploit Techniques Attacking the IE Script Engine. https://blog.fortinet.com/2014/06/16/advanced-exploit-techniques-attacking-the-ie-script-engine, 2014.

[123] Kangjie Lu, Stefan Nürnberger, Michael Backes, and Wenke Lee. How to Make ASLR Win the Clone Wars: Runtime Re-Randomization. In Symposium on Network and Distributed System Security (NDSS), 2016.

[124] Chi-Keung Luk, Robert Cohn, Robert Muth, Harish Patil, Artur Klauser, Geoff Lowney, Steven Wallace, Vijay Janapa Reddi, and Kim Hazelwood. Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation. ACM SIGPLAN Notices, 2005.

[125] miasm2 Authors. Miasm2: Reverse Engineering Framework in Python. https: //github.com/cea-sec/miasm, 2015.

[126] Microsoft. The Microsoft SDL and the CWE/SANS Top 25. http://download.microsoft.com/download/C/A/9/CA988ED6-C490-44E9-A8C2-DE098A22080F/Microsoft%20SDL%20and%20the%20CWE-SANS%20Top%2025.doc, 2009.

[127] Microsoft. AppInit DLLs in Windows 7. http://goo.gl/BchJ4J, 2013.

[128] Microsoft. MS13-080 Addresses Two Vulnerabilities under Limited, Targeted Attacks. http://goo.gl/sCZNkL, 2013.

[129] Microsoft. PE and COFF Specification. http://goo.gl/EWzFcF, 2013.

[130] Microsoft. Thiscall Calling Convention. http://goo.gl/5o48Ub, 2013.

[131] Microsoft. Bringing asm.js to the Chakra JavaScript Engine in Windows 10. http://blogs.msdn.com/b/ie/archive/2015/02/18/bringing-asm-js-to-the-chakra-javascript-engine-in-windows-10.aspx, 2015.


[132] Microsoft. EMET 5.2 is Available. http://blogs.technet.com/b/srd/archive/2015/03/16/emet-5-2-is-available.aspx, 2015.

[133] Microsoft. What is the Windows Integrity Mechanism? http://msdn.microsoft.com/en-us/library/bb625957.aspx, 2014.

[134] Microsoft. Microsoft Security Bulletin Summary for December 2015. https://technet.microsoft.com/en-us/library/security/ms15-dec.aspx, 2015.

[135] Microsoft Research. Z3: Theorem Prover, 2014. http://z3.codeplex.com/.

[136] Daniel Moghimi. Subverting without EIP. http://www.moghimi.org/subverting-without-eip/, 2014.

[137] Vishwath Mohan, Per Larsen, Stefan Brunthaler, Kevin W. Hamlen, and Michael Franz. Opaque Control-Flow Integrity. In Symposium on Network and Distributed System Security (NDSS), 2015.

[138] Ingo Molnar. "Exec Shield", New Linux Security Feature. Announcement on the linux-kernel mailing list, May 2003.

[139] Mozilla. Asm.js Working Draft. http://asmjs.org/spec/latest/.

[140] Mozilla. Firefox 0-day Found on Tor .onion Service. https://bugzilla.mozilla.org/show_bug.cgi?id=901365, 2013.

[141] Mozilla. Kraken Benchmark Suite. http://krakenbenchmark.mozilla.org/, 2014.

[142] Mozilla. OdinMonkey: Signal-Handling OOB Cleanups. https://bugzilla.mozilla.org/show_bug.cgi?id=1135903, 2015.

[143] Santosh Nagarakatte, Jianzhou Zhao, Milo M.K. Martin, and Steve Zdancewic. SoftBound: Highly Compatible and Complete Spatial Memory Safety for C. In ACM SIGPLAN Notices, 2009.

[144] Santosh Nagarakatte, Jianzhou Zhao, Milo M.K. Martin, and Steve Zdancewic. CETS: Compiler Enforced Temporal Safety for C. In ACM SIGPLAN Notices, 2010.

[145] Ben Niu and Gang Tan. Modular Control-Flow Integrity. In ACM SIGPLAN Notices, 2014.

[146] Ben Niu and Gang Tan. RockJIT: Securing Just-In-Time Compilation Using Modular Control-Flow Integrity. In ACM Conference on Computer and Communications Security (CCS), 2014.

[147] Ben Niu and Gang Tan. Per-Input Control-Flow Integrity. In ACM Conference on Computer and Communications Security (CCS), 2015.

[148] Gene Novark and Emery D. Berger. DieHarder: Securing the Heap. In ACM Conference on Computer and Communications Security (CCS), 2010.


[149] Kaan Onarlioglu, Leyla Bilge, Andrea Lanzi, Davide Balzarotti, and Engin Kirda. G-Free: Defeating Return-Oriented Programming through Gadget-less Binaries. In Annual Computer Security Applications Conference (ACSAC), 2010.

[150] Aleph One. Smashing the Stack for Fun and Profit. Phrack magazine, 1996.

[151] Pádraig O'Sullivan, Kapil Anand, Aparna Kotha, Matthew Smithson, Rajeev Barua, and Angelos D. Keromytis. Retrofitting Security in COTS Software with Binary Rewriting. In Future Challenges in Security and Privacy for Academia and Industry, 2011.

[152] pakt. ROPC - A Turing Complete ROP Compiler. https://github.com/pakt/ropc.

[153] pakt. Leaking Information with Timing Attacks on Hashtables. https://gdtr.wordpress.com/2012/08/07/leaking-information-with-timing-attacks-on-hashtables-part-1/, 2012.

[154] Vasilis Pappas. kBouncer: Efficient and Transparent ROP Mitigation. http://www.cs.columbia.edu/~vpappas/papers/kbouncer.pdf.

[155] Vasilis Pappas, Michalis Polychronakis, and Angelos D. Keromytis. Smashing the Gadgets: Hindering Return-Oriented Programming Using In-Place Code Randomization. In IEEE Symposium on Security and Privacy, 2012.

[156] Vasilis Pappas, Michalis Polychronakis, and Angelos D. Keromytis. Transparent ROP Exploit Mitigation Using Indirect Branch Tracing. In USENIX Security Symposium, 2013.

[157] PaX Team. Address Space Layout Randomization. https://pax.grsecurity.net/docs/aslr.txt, 2001.

[158] PaX Team. Pageexec. https://pax.grsecurity.net/docs/pageexec.txt, 2001.

[159] Alexandre Pelletier. Advanced Exploitation of Internet Explorer Heap Overflow (Pwn2Own 2012 Exploit). http://web.archive.org/web/20141005134545/http://www.vupen.com/blog/20120710.Advanced_Exploitation_of_Internet_Explorer_HeapOv_CVE-2012-1876.php, July 2012.

[160] Andrey Permamedov. Why it’s Not Crashing? The Code Project, 2010.

[161] Jannik Pewny, Behrad Garmany, Robert Gawlik, Christian Rossow, and Thorsten Holz. Cross-Architecture Bug Search in Binary Executables. In IEEE Symposium on Security and Privacy, 2015.

[162] Jannik Pewny and Thorsten Holz. Control-Flow Restrictor: Compiler-Based CFI for iOS. In Annual Computer Security Applications Conference (ACSAC), 2013.

[163] Matt Pietrek. A Crash Course on the Depths of Win32 Structured Exception Handling. Microsoft Systems Journal (US Edition), 1997.


[164] Matt Pietrek. New Vectored Exception Handling in Windows XP. MSDN Magazine, 2001.

[165] Michalis Polychronakis, Kostas G. Anagnostakis, and Evangelos P. Markatos. Comprehensive Shellcode Detection using Runtime Heuristics. In Annual Computer Security Applications Conference (ACSAC), 2010.

[166] Aravind Prakash, Xunchao Hu, and Heng Yin. VfGuard: Strict Protection for Virtual Function Calls in COTS C++ Binaries. In Symposium on Network and Distributed System Security (NDSS), 2015.

[167] M. Prandini and M. Ramilli. Return-Oriented Programming. In IEEE Symposium on Security and Privacy, 2012.

[168] Dan Quinlan. ROSE: Compiler Support for Object-oriented Frameworks. Parallel Processing Letters, 2000.

[169] Charles Reis, Adam Barth, and Carlos Pizano. Browser Security: Lessons From Google Chrome. Queue, 2009.

[170] Charles Reis and Steven D. Gribble. Isolating Web Programs in Modern Browser Architectures. In Proceedings of the 4th ACM European Conference on Computer Systems, 2009.

[171] Giampaolo Fresi Roglia, Lorenzo Martignoni, Roberto Paleari, and Danilo Bruschi. Surgically Returning to Randomized Lib(c). In Annual Computer Security Applications Conference (ACSAC), 2009.

[172] ROPgadget - Gadgets Finder and Auto-Roper. http://shell-storm.org/project/ROPgadget/.

[173] David J. Roth and David S. Wise. One-bit Counts Between Unique and Sticky. SIGPLAN Not., 1998.

[174] Mark Russinovich, David Solomon, and Alex Ionescu. Windows Internals, Part 2. Microsoft Press, 2012.

[175] Mark Russinovich, David A. Solomon, and Alex Ionescu. Windows Internals, Part 1. Microsoft Press, 6th edition, 2012.

[176] Bruce Schneier. How the NSA Attacks Tor/Firefox Users with Quantum and Foxacid. Schneier on Security, October, 2013.

[177] Felix Schuster, Thomas Tendyck, Christopher Liebchen, Lucas Davi, Ahmad-Reza Sadeghi, and Thorsten Holz. Counterfeit Object-oriented Programming: On the Difficulty of Preventing Code-Reuse Attacks in C++ Applications. In IEEE Symposium on Security and Privacy, 2015.


[178] Felix Schuster, Thomas Tendyck, Jannik Pewny, Andreas Maaß, Martin Steegmanns, Moritz Contag, and Thorsten Holz. Evaluating the Effectiveness of Current Anti-ROP Defenses. In Symposium on Recent Advances in Intrusion Detection (RAID), 2014.

[179] Edward J. Schwartz, Thanassis Avgerinos, and David Brumley. Q: Exploit Hardening Made Easy. In USENIX Security Symposium, 2011.

[180] Benjamin Schwarz, Saumya Debray, and Gregory Andrews. Disassembly of Executable Code Revisited. In Ninth Working Conference on Reverse Engineering, 2002.

[181] Jeff Seibert, Hamed Okhravi, and Eric Söderström. Information Leaks without Memory Disclosures: Remote Side Channel Attacks on Diversified Code. In ACM Conference on Computer and Communications Security (CCS), 2014.

[182] Fermin J. Serna. The Info Leak Era on Software Exploitation. In Black Hat USA, 2012.

[183] Hovav Shacham. The Geometry of Innocent Flesh on the Bone: Return-into-Libc without Function Calls (On the x86). In ACM Conference on Computer and Communications Security (CCS), 2007.

[184] Hovav Shacham, Matthew Page, Ben Pfaff, Eu-Jin Goh, Nagendra Modadugu, and Dan Boneh. On the Effectiveness of Address-Space Randomization. In ACM Conference on Computer and Communications Security (CCS), 2004.

[185] Yan Shoshitaishvili. Pyvex - GitHub. https://github.com/zardus/pyvex.

[186] Yan Shoshitaishvili. Pyvex@d81bfe0 - GitHub. https://github.com/zardus/pyvex/commit/d81bfe0ee7583d599bdd6d6c8cc091a61a42e01e.

[187] Yan Shoshitaishvili, Ruoyu Wang, Christophe Hauser, Christopher Kruegel, and Giovanni Vigna. Firmalice - Automatic Detection of Authentication Bypass Vulnerabilities in Binary Firmware. In Symposium on Network and Distributed System Security (NDSS), 2015.

[188] Josep Silva. A Vocabulary of Program Slicing-Based Techniques. ACM Computing Surveys (CSUR), 2012.

[189] Kevin Z. Snow, Fabian Monrose, Lucas Davi, Alexandra Dmitrienko, Christopher Liebchen, and Ahmad-Reza Sadeghi. Just-In-Time Code Reuse: On the Effectiveness of Fine-Grained Address Space Layout Randomization. In IEEE Symposium on Security and Privacy, 2013.

[190] Kevin Z. Snow, Roman Rogowski, Jan Werner, Hyungjoon Koo, Fabian Monrose, and Michalis Polychronakis. Return to the Zombie Gadgets: Undermining Destructive Code Reads via Code Inference Attacks. In IEEE Symposium on Security and Privacy, 2016.


[191] Chengyu Song, Chao Zhang, Tielei Wang, Wenke Lee, and David Melski. Exploiting and Protecting Dynamic Code Generation. In Symposium on Network and Distributed System Security (NDSS), 2015.

[192] Alexander Sotirov. Heap Feng Shui in JavaScript. Black Hat Europe, 2007.

[193] Alexander Sotirov. Reverse Engineering and the ANI Vulnerability. http://www.phreedom.org/presentations/reverse-engineering-ani/reverse-engineering-ani.pdf, 2007.

[194] Ana Nora Sovarel, David Evans, and Nathanael Paul. Where's the FEEB? The Effectiveness of Instruction Set Randomization. In USENIX Security Symposium, 2005.

[195] Amitabh Srivastava, Andrew Edwards, and Hoi Vo. Vulcan: Binary Transformation in a Distributed Environment. Technical Report MSR-TR-2001-50, Microsoft Research, 2001.

[196] Raoul Strackx, Yves Younan, Pieter Philippaerts, Frank Piessens, Sven Lachmund, and Thomas Walter. Breaking the Memory Secrecy Assumption. In Proceedings of the Second European Workshop on System Security, 2009.

[197] Bjarne Stroustrup. C++. John Wiley and Sons Ltd., 2003.

[198] László Szekeres, Mathias Payer, Tao Wei, and Dawn Song. SoK: Eternal War in Memory. In IEEE Symposium on Security and Privacy, 2013.

[199] Adrian Tang, Simha Sethumadhavan, and Salvatore Stolfo. Heisenbyte: Thwarting Memory Disclosure Attacks Using Destructive Code Reads. In ACM Conference on Computer and Communications Security (CCS), 2015.

[200] Simon Tatham, Julian Hall, and H. Peter Anvin. Netwide Assembler, 2011.

[201] Caroline Tice. Committing VTV Cygwin Patch for Patrick Wollgast. https://github.com/gcc-mirror/gcc/commit/5be42fa921560bbdaa277e40df5346e650bf72a2.

[202] Caroline Tice, Tom Roeder, Peter Collingbourne, Stephen Checkoway, Úlfar Erlingsson, Luis Lozano, and Geoff Pike. Enforcing Forward-Edge Control-Flow Integrity in GCC & LLVM. In USENIX Security Symposium, 2014.

[203] Valgrind Home. http://valgrind.org/.

[204] Valgrind: Supported Platforms. http://valgrind.org/info/platforms.html.

[205] Victor van der Veen, Dennis Andriesse, Enes Göktaş, Ben Gras, Lionel Sambuc, Asia Slowinska, Herbert Bos, and Cristiano Giuffrida. Practical Context-Sensitive CFI. In ACM Conference on Computer and Communications Security (CCS), 2015.


[206] Victor van der Veen, Enes Göktaş, Moritz Contag, Andre Pawlowski, Xi Chen, Sanjay Rawat, Herbert Bos, Thorsten Holz, Elias Athanasopoulos, and Cristiano Giuffrida. A Tough Call: Mitigating Advanced Code-Reuse Attacks at The Binary Level. In IEEE Symposium on Security and Privacy, 2016.

[207] Victor van der Veen, Nitish dutt-Sharma, Lorenzo Cavallaro, and Herbert Bos. Memory Errors: The Past, the Present, and the Future. In Symposium on Recent Advances in Intrusion Detection (RAID), 2012.

[208] Sebastian Vogl, Robert Gawlik, Behrad Garmany, Thomas Kittel, Jonas Pfoh, Claudia Eckert, and Thorsten Holz. Dynamic Hooks: Hiding Control-Flow Changes Within Non-control Data. In USENIX Security Symposium, 2014.

[209] Stijn Volckaert, Bart Coppens, and Bjorn De Sutter. Cloning your Gadgets: Complete ROP Attack Immunity with Multi-Variant Execution. IEEE Transactions on Dependable and Secure Computing, 2015.

[210] Peter Vreugdenhil. Pwn2Own 2010 Windows 7 Internet Explorer 8 Exploit. http://vreugdenhilresearch.nl/Pwn2Own-2010-Windows7-InternetExplorer8.pdf, 2010.

[211] Peter Vreugdenhil. A Browser is Only as Strong as Its Weakest Byte - Part 2. http://blog.exodusintel.com/2013/12/09/a-browser-is-only-as-strong-as-its-weakest-byte-part-2/, 2012.

[212] Robert Wahbe, Steven Lucco, Thomas E. Anderson, and Susan L. Graham. Efficient Software-Based Fault Isolation. In ACM SIGOPS Operating Systems Review, 1994.

[213] Richard Wartell, Vishwath Mohan, Kevin W. Hamlen, and Zhiqiang Lin. Binary Stirring: Self-Randomizing Instruction Addresses of Legacy x86 Binary Code. In ACM Conference on Computer and Communications Security (CCS), 2012.

[214] Yoav Weiss and Elena Gabriela Barrantes. Known/Chosen Key Attacks against Software Instruction Set Randomization. In ACM Conference on Computer and Communications Security (CCS), 2006.

[215] Jan Werner, George Baltas, Rob Dallara, Nathan Otterness, Kevin Snow, Fabian Monrose, and Michalis Polychronakis. No-Execute-After-Read: Preventing Code Disclosure in Commodity Software. In ACM Symposium on Information, Computer and Communications Security (ASIACCS), 2016.

[216] Patrick Wollgast, Robert Gawlik, Behrad Garmany, Benjamin Kollenda, and Thorsten Holz. Automated Multi-Architectural Discovery of CFI-Resistant Code Gadgets. In European Symposium on Research in Computer Security (ESORICS), 2016.

[217] XROP - Tool to Generate ROP Gadgets for ARM, x86, MIPS and PPC. https://github.com/acama/xrop.

[218] Tao Yan. The Art of Leaks: The Return of Heap Feng Shui. In CanSecWest Applied Security Conference, 2014.


[219] Yu Yang. DEP/ASLR bypass without ROP/JIT. CanSecWest Applied Security Conference, 2013.

[220] Yang Yu. ROPs are for the 99%. In CanSecWest Applied Security Conference, 2014.

[221] Yang Yu. Write Once, Pwn Anywhere. In Black Hat USA, 2014.

[222] Michal Zalewski. Two More Browser Memory Disclosure Bugs. http://lcamtuf.blogspot.de/2014/10/two-more-browser-memory-disclosure-bugs.html, 2014.

[223] Michal Zalewski. Bi-level TIFFs and the Tale of the Unexpectedly Early Patch. http://lcamtuf.blogspot.de/2015/02/bi-level-tiffs-and-tale-of-unexpectedly.html, 2015.

[224] ZDI. CVE-2011-1346, (Pwn2Own) Microsoft Internet Explorer Uninitialized Variable Information Leak Vulnerability. http://www.zerodayinitiative.com/advisories/ZDI-11-198/.

[225] Andy Zeigler. IE8 and Loosely-Coupled IE (LCIE). http://blogs.msdn.com/b/ie/archive/2008/03/11/ie8-and-loosely-coupled-ie-lcie.aspx, 2008.

[226] Chao Zhang, Scott A. Carr, Tongxin Li, Yu Ding, Chengyu Song, Mathias Payer, and Dawn Song. VTrust: Regaining Trust on Virtual Calls. In Symposium on Network and Distributed System Security (NDSS), 2016.

[227] Chao Zhang, Chengyu Song, Kevin Zhijie Chen, Zhaofeng Chen, and Dawn Song. VTint: Protecting Virtual Function Tables’ Integrity. In Symposium on Network and Distributed System Security (NDSS), 2015.

[228] Chao Zhang, Tao Wei, Zhaofeng Chen, Lei Duan, László Szekeres, Stephen McCamant, Dawn Song, and Wei Zou. Practical Control-Flow Integrity and Randomization for Binary Executables. In IEEE Symposium on Security and Privacy, 2013.

[229] Mingwei Zhang and R. Sekar. Control-Flow Integrity for COTS Binaries. In USENIX Security Symposium, 2013.
