Accurate Recovery of Functions in COTS Binaries

A Dissertation presented

by

Rui Qiao

to

The Graduate School

in Partial Fulfillment of the

Requirements

for the Degree of

Doctor of Philosophy

in

Computer Science

Stony Brook University

May 2017

Copyright by
Rui Qiao
2017

Stony Brook University

The Graduate School

Rui Qiao

We, the dissertation committee for the above candidate for the

Doctor of Philosophy degree, hereby recommend

acceptance of this dissertation

Dr. R. Sekar - Dissertation Advisor
Professor, Computer Science Department

Dr. Michalis Polychronakis - Chairperson of Defense
Assistant Professor, Computer Science Department

Dr. Nick Nikiforakis - Committee Member
Assistant Professor, Computer Science Department

Dr. Aravind Prakash - External Committee Member
Assistant Professor, Computer Science Department
Binghamton University

This dissertation is accepted by the Graduate School

Charles Taber
Dean of the Graduate School

Abstract of the Dissertation

Accurate Recovery of Functions in COTS Binaries

by

Rui Qiao

Doctor of Philosophy

in

Computer Science

Stony Brook University

2017

Binary analysis and instrumentation play a central role in COTS software security. They can be used to detect and prevent vulnerabilities, mitigate exploits, enforce security policies, and so on. Many security instrumentations work at the granularity of functions. However, unlike high-level languages, functions in binaries are not clearly demarcated. To complicate matters further, functions in binaries may have multiple entry points and/or exit points. Some of these entries or exits may not be determined simply by instruction syntax or code patterns. Moreover, many functions are reachable only through indirect control transfers, while some may be altogether unreachable. In this dissertation, we present an approach that overcomes these challenges to accurately identify function boundaries, as well as calls and returns. Our approach is based on fine-grained static analysis, relying on precise models of instruction set semantics derived in part from our previous work. In the later part of the work, we expand our investigation to recover the next crucial piece of information that is lost in high-level language to binary translation: the types and numbers of function parameters. We propose an approach that uses fine-grained binary analysis to address this problem. We evaluate this technique by applying it to enforce fine-grained control-flow integrity policies. While our approach is widely applicable to all binaries, when

combined with recovered C++ semantics, it provides significantly improved protection.

Contents

1 Introduction
  1.1 Functions at Binary Level
  1.2 Challenges for Accurate Function Recovery
  1.3 Existing Function Recovery Techniques
  1.4 Contributions
  1.5 Dissertation Organization

2 Background and Related Work
  2.1 Binary Organization and Disassembly
  2.2 Discovering Code Pointers
    2.2.1 Conservative Code Pointer Analysis
    2.2.2 Jump Table Analysis
    2.2.3 Virtual Table Analysis
  2.3 Function Recovery
    2.3.1 Function Recognition
    2.3.2 Function Type Recovery
    2.3.3 Function Decompilation
  2.4 Code-Reuse Attacks
    2.4.1 Return-Oriented Programming
    2.4.2 Call-Oriented Programming
  2.5 Control-Flow Integrity
    2.5.1 Language-Agnostic Approaches
    2.5.2 CFI for C++ Virtual Calls
    2.5.3 Shadow Stack
    2.5.4 Comparison with Other Defenses

3 Static Analysis Approach
  3.1 Intermediate Representation
    3.1.1 Precise Modeling of Instruction Semantics
    3.1.2 IR Transformation
  3.2 Data-Flow Analysis
    3.2.1 Reaching Definition Analysis
    3.2.2 Liveness Analysis
    3.2.3 Static Single Assignment
  3.3 Abstract Stack Analysis

4 Accurate Recovery of Function Returns
  4.1 Motivation and Approach Overview
  4.2 Background and Threat Model
    4.2.1 CFI Platform
    4.2.2 Threat Model
  4.3 Inferring Intended Control Flow
    4.3.1 Calls
    4.3.2 Returns
    4.3.3 Static Analysis of Non-standard Returns
    4.3.4 Discussion
  4.4 Enforcing Intended Control Flow
    4.4.1 Instrumentation-based Enforcement
    4.4.2 RCAP-stack Protection
  4.5 Implementation
    4.5.1 Static Analysis
    4.5.2 Binary Rewriting based Enforcement
    4.5.3 Optimizing Returns
  4.6 Evaluation
    4.6.1 Compatibility
    4.6.2 Protection
    4.6.3 Performance Overhead

5 Accurate Recovery of Function Boundaries
  5.1 Overview of Approach
    5.1.1 Problem Definition
    5.1.2 Approach Overview
  5.2 Function Starts
    5.2.1 Directly Reachable Functions
    5.2.2 Indirectly Reachable Functions
    5.2.3 Unreachable Functions
  5.3 Identifying Function Boundaries
  5.4 Interface Property Checking
    5.4.1 Control Flow Properties
    5.4.2 Data Flow Properties
  5.5 Evaluation
    5.5.1 Data Set
    5.5.2 Metrics
    5.5.3 Implementation
    5.5.4 Summary of Results
    5.5.5 Detailed Evaluation
    5.5.6 Performance
  5.6 Case Studies
    5.6.1 Control-Flow Integrity
    5.6.2 Function-based Binary Instrumentation
  5.7 Extensions

6 Accurate Recovery of Function Types
  6.1 Indirect Callee Arguments
    6.1.1 Challenges
    6.1.2 Our Analysis
  6.2 Indirect Callsite Arguments
    6.2.1 Challenges
    6.2.2 Our Analysis
  6.3 CFI Policies
    6.3.1 (Normal) Indirect Calls
    6.3.2 Virtual Calls
  6.4 Evaluation
    6.4.1 Analysis Precision
    6.4.2 Policy Correctness
    6.4.3 CFI Strength
  6.5 Security Analysis
    6.5.1 Coarse-Grained CFI Bypass Attacks
    6.5.2 COOP Attacks

7 Conclusions and Future Work

List of Figures

2.1 Sections of an unstripped ELF binary. Sections marked in green are code sections, and the ones in red are data sections. Sections marked in (different levels of) grey are symbol, debug, and relocation sections, which can be distinguished using their section names. When the binary is stripped, some of the grey sections are removed.
2.2 Layout of an Itanium C++ VTable
2.3 Arguments passing for different calling conventions
2.4 Layout for polymorphic objects
3.1 IR code and its SSA form
3.2 The abstract domain for stack analysis
3.3 Abstract interpretation for stack analysis
4.1 A non-standard return from ld.so
4.2 Code snippets and their analysis results
4.3 Non-standard return (NSR) statistics
4.4 Non-standard returns in common modules
4.5 Low-level and real-world software testing
4.6 CPU overhead of shadow stack systems on SPEC 2006
5.1 Overview of our analysis. Direct function starts are identified from call instructions in the disassembly, and require no further confirmation. The remaining function start candidates (indirect, jump, and unreachable functions) need to pass our function interface checks that eliminate spurious functions. Function body traversal is used to determine function ends, and takes advantage of already identified functions. Function body traversal and boundary information feeds back into the determination of unreachable functions, as well as jump-reached (i.e., tail-called) functions.
5.2 Tail call detection
5.3 Incorrect return address is used for a spurious function
5.4 "Return address" is overwritten for a spurious function
5.5 Register usage summary for calling conventions on different platforms
5.6 Arguments passing for x86-32 calling conventions
5.7 The analysis states of example code
5.8 Function start identification results from different tools
5.9 Function boundary identification results from different tools
5.10 Function boundary identification results for SPEC 2006 and GLIBC
5.11 SPEC 2006 results (GCC compiled binaries)
5.12 SPEC 2006 results (LLVM compiled binaries)
5.13 Functions identified in each step
5.14 Effects of each checking mechanism
5.15 Experiment setup and performance comparison (hc = compute hours, h = hours, s = seconds)
5.16 A falsely identified (indirectly reachable) function [818c784, 818c8e4]
6.1 An example function with a struct argument
6.2 A variadic function
6.3 Challenges for identifying callsite arguments
6.4 Multiple virtual calls with the same this pointer
6.5 CFI policies for virtual calls
6.6 Statistics for function argument analysis accuracy
6.7 Indirect call targets reduction of our language-agnostic CFI policy
6.8 Distribution of virtual callsites based on number of identified virtual calls on the same object pointer
6.9 Virtual call targets reduction of our CFI policy
6.10 A gadget in the PoC exploit

Acknowledgements

First and foremost, I would like to express my sincere gratitude to my advisor, Professor R. Sekar, without whom this journey would not have been possible. I greatly appreciate his guidance, support and help throughout my PhD, especially during the times I faced difficulties and setbacks. His critical thinking and insights always inspired me; his enthusiasm and attitude toward research also deeply influenced me. I thank him for his efforts and patience on my research.

Second, I would like to thank my committee members. I am grateful for the advice Professor Michalis Polychronakis has given on different parts of my work. I also thank Professor Nick Nikiforakis, Professor Aravind Prakash and Professor Rob Johnson for serving on my thesis or prelim committees and providing constructive feedback.

I was fortunate to have two great internships during my PhD. They not only gave me opportunities to work on interesting problems and improve my technical skills, but also brought me two wonderful summers. I would like to thank my two mentors, Mark Seaborn at Google and Pieter Hooimeijer at Facebook, for their help.

I would like to thank my colleagues in the Secure Systems Lab and RiS3 Lab for their collaboration, discussion and company: Niranjan Hasabnis, Mingwei Zhang, Laszlo Szekeres, Wai-Kit Sze, Alireza Saberi, Tung Tran, Riccardo Pelizzi, Daiping Liu, Yarik Markov, Yaohui Chen and many more. I would like to thank my friends at Stony Brook, in California, and in many other places for their support.

Most importantly, special thanks go to my family. My parents have constantly supported me in pursuing my goals and encouraged me during tough times. The hard-working and persistent spirit they taught me has been a lifelong gift. My greatest thanks go to my wife; I am heartily grateful for her support during my PhD.

Lastly, I would like to acknowledge support from the following grants: AFOSR grant FA9550-09-1-0539, NSF grants CNS-0831298 and CNS-1319137, and ONR grants N000140710928 and N00014-15-1-2378.

1 Introduction

Program analysis is the process of analyzing programs for various properties, while program instrumentation is concerned with changing program behavior by inserting additional code. Program analysis and instrumentation play a central role in software security. They are used in various tasks such as vulnerability detection and prevention, exploit mitigation, security policy enforcement, and malware analysis.

Program analysis and instrumentation can be applied to either source code or native binaries. Source code based techniques can leverage abundant information that is available from high level languages, such as functions, variables and types. This information is unavailable for COTS binaries because it is either discarded during the compilation process or stripped off before software distribution. Despite the lack of high-level information, binary based techniques are attractive because they are more widely applicable. For example, they can be used on proprietary software, third-party libraries, or even malware that is only present in binary form. Moreover, working with binaries is advantageous because a later and more faithful form of the program is used. This is in contrast with source code, which is yet to be transformed by compilers and linkers that could alter various aspects of the program.

Binary analysis and instrumentation are often only possible, or most effective, if high level abstractions and program constructs are available. Therefore, a recurring task is to recover various kinds of high level information from COTS binaries. In this dissertation, we focus on a fundamental requirement for binary analysis and instrumentation: recovering functions. In the rest of this chapter, we discuss the motivation, involved tasks, challenges and existing techniques for function recovery, and also summarize our contributions.

1.1 Functions at Binary Level

Functions are among the most common constructs in programming languages. They are also the very basic building blocks for composing program logic. In source code, function declarations or definitions are available, and can be readily used by compilers or other source code transformation tools for analysis and instrumentation purposes. However, this is not the case at the binary level. Functions in binaries are just byte streams, and unlike in high-level languages, they are not clearly demarcated. In fact, there is no such requirement for binaries to be able to execute. At runtime, the bytes of functions are interpreted by the CPU as instructions, and control can be transferred into or out of a function without any knowledge of function boundaries.

Other high level information is missing as well. Function types, just like variable types, are discarded before binaries are generated. Note that function type is defined as the number, location, and types of a function's arguments and return values.

However, function information is required by many binary analysis and instrumentation applications: various tools are designed to operate on functions. Therefore, when function information is not available, an essential step of many tools is to recover it. While the common task is to recognize functions in binaries, i.e., identifying their boundaries as well as entry/exit points, other higher level information such as function types is also often needed.

Function recovery needs to be accurate. As an early step in many analysis applications, function recovery significantly affects analysis accuracy through the quality of its output. Moreover, demanding applications such as instrumentation can only be supported with highly accurate function information. Note that when supplied with inaccurate results, instrumentation techniques may generate programs that crash or raise false alarms, significantly limiting their adoption and usability.

1.2 Challenges for Accurate Function Recovery

Although function recovery is essential for many applications, producing accurate results for COTS binaries is difficult due to several challenges.

• Missing abstractions. Many abstractions are introduced in high level programming languages to cope with software complexity. However, after compilation, these abstractions are missing at the binary level. We already discussed that the function abstraction is not available and needs to be recovered. Other abstractions (such as variables and types) that could facilitate the task of function recovery are unavailable as well.

• Stripped binaries. Metadata for high level program constructs may be generated during compilation, including debug, symbol and relocation information. However, this information is often stripped from COTS binaries, due to concerns such as its potential use for reverse engineering.

• Indirect control flows. In contrast to direct control flows, whose targets are statically known, the targets of indirect control flows are determined at runtime. In general, statically resolving indirect control flow targets is an undecidable problem. Therefore, functions that are only indirectly reachable cannot be easily recognized. Note that the number of such functions can be significant, and many applications require that these functions be accurately recovered.

• Complex data flows. In low-level code, registers and memory locations are the basic units for storing data. Program memory is further divided into global, stack and heap regions. However, all these regions are physical, instead of abstract as in high-level languages. At the same time, functions extensively operate on the different regions (especially the stack) for computation and data transfer.

• Compiler optimizations. To squeeze out performance gains, compilers may generate code in unusual ways. For instance, contrary to the high level abstraction that a function has a single entry point, a function in a binary may have multiple entries. Moreover, instead of being entered via a call instruction, a function may be entered via a jump as a result of tail call optimization. Because of this, intra- and inter-procedural control flow transfers cannot be easily distinguished by the instructions used, or by a simple model of the target (a concrete example appears after this list).

• Hand-written assembly. Hand-written assembly is necessary for implementing some low-level functionality. For instance, the setcontext and getcontext functions of libc are such cases because they require direct manipulation of the stack pointer register. Furthermore, hand-written assembly can be useful for producing optimized code. One example is the memset and memcpy functions of libc. By implementing the functionality directly in carefully arranged assembly code, superfluous instructions that might otherwise be emitted by compilers are avoided. Moreover, additional CPU features such as non-temporal instructions [56] can be directly used for faster execution. However, "clever" uses of instructions in hand-written assembly often deviate from compiler generated code patterns, and can break assumptions made by binary analysis and instrumentation tools. Therefore, in order to be compatible with real-world and low-level applications, binary tools need mechanisms to detect and handle non-standard uses of instructions.
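To make the tail-call issue mentioned above concrete, consider the following hypothetical C++ fragment (our own example, not drawn from any evaluated program): at optimization levels that enable sibling-call optimization, a compiler such as GCC may compile the call to g inside f as a jump rather than a call, so g is entered without a new return address being pushed.

    // Hypothetical example: a call in tail position that an optimizing
    // compiler may turn into a jump (sibling/tail-call optimization).
    int g(int x) {
        return x * 2 + 1;
    }

    int f(int x) {
        // The call to g is the last action of f, so instead of
        //   call g ; ret
        // the compiler may emit
        //   jmp  g
        // making g reachable through a jump rather than a call.
        return g(x + 3);
    }

    int main() {
        return f(4) == 15 ? 0 : 1;
    }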

1.3 Existing Function Recovery Techniques

There are three major techniques for function recovery, namely static analysis, dynamic analysis, and machine learning.

Static analysis is the most widely used approach; it works by analyzing the binary module without executing the program. However, techniques for resolving indirect branches and memory references often lead to overapproximation. Moreover, the imprecise modeling of instruction semantics and the shallow analysis employed by many existing approaches result in inaccuracies in the recovered functions.

Dynamic analysis is the technique of analyzing programs by executing them. By collecting execution traces and runtime data, some functions can be recovered. Although the problem of indirect branch and memory reference resolution is side-stepped because data values are known for a particular run, the main drawback of this approach is its limited coverage. This results from the difficulty of determining the inputs (and possibly other environment factors) needed to exercise all parts of the program. The path explosion problem poses further challenges. Due to limited code and path coverage, the recovered constructs are often incomplete, which constrains their applications.

Machine learning is a technique that can be applied to many problem domains. Two phases are involved: in the training phase, a dataset is consumed by the training program to build a model; this model is then used in the testing phase to produce results. Currently, machine learning has only been adopted for function boundary identification, and not yet for other aspects of function recovery. Based on the proposed techniques and observations of the evaluation results, current machine learning techniques have not effectively used deep program construct information or global evidence.

1.4 Contributions

Taking the completeness, applicability and accuracy requirements into account, we consider static analysis a more favorable approach than the other two. We utilize deep static analysis for function recovery: with precisely modeled instruction semantics, our approach uncovers critical properties associated with functions, and identifies useful constructs. This deep analysis is essential for robust support of demanding applications such as instrumentation, even on large and complex software.

In this dissertation, we present deep analysis techniques for accurate function recovery, with an emphasis on applications to the security domain. Our approach is different from existing work in the following aspects:

1. Precise modeling of instruction semantics. Due to the complexity of modern instruction sets such as x86, it is nontrivial to precisely model instruction semantics. Many analyses only loosely capture the semantics of a subset of instructions, and are therefore unable to support large and complex software. Our approach utilizes a precise model of instruction semantics, which can not only be manually developed, but also automatically extracted from retargetable compilers.

2. Detailed reasoning about the stack. The program stack is an important memory region that is frequently accessed by functions. To recover function constructs or properties, the interaction of a function with the stack should be captured and analyzed. While some existing approaches focus solely on registers [94], our analysis performs detailed reasoning about the stack. Therefore, not only are stack variable values tracked, but entities such as return addresses and stack arguments can also be determined.

3. Accurate recovery of function constructs with deep analysis. Compared with coarse-grained analysis [12] as well as machine learning [15, 87], our approach achieves significantly more accurate results. This is because comprehensive analysis techniques are leveraged to capture both local and global evidence, as well as deeper semantics. By relying on program semantics instead of instruction syntax or code patterns, our analysis is also able to recover constructs that originate from hand-written assembly, which are easily missed by manual analysis or pattern matching.

4. Robust enforcement of recovered constructs. Many prior approaches [2, 62] recover function constructs for reverse engineering and program understanding purposes, so their results are not applicable to security enforcement. In contrast, the constructs recovered by our analysis either directly support demanding applications such as instrumentation (in addition to more permissive applications such as analysis and reverse engineering), or possess desired properties that make them more amenable to further refinement. Moreover, different from binary rewriting oriented analyses [10, 37], our approach emphasizes robust enforcement of security policies based on the recovered constructs. Therefore, a different set of trade-offs is made to support our use cases, and the analysis can be tailored for specific needs.

Specifically, we apply our analysis techniques to various problems, and make the following contributions:

• Accurate discovery of function returns. We present a technique for accurate function return discovery. A principled return-oriented programming (ROP) defense is developed as an application of our analysis. By systematically enforcing the inferred returns, our system can achieve stronger protection, better compatibility, and better performance as compared to previous research.

• Accurate recognition of functions. We present a systematic function recognition approach that is based on function start address enumeration and function interface checking. An in-depth evaluation shows that our system produces better results than state-of-the-art machine learning based approaches. Moreover, the recovered functions are more amenable to many analysis and instrumentation applications.

• Accurate function type analysis. We present techniques to accurately infer function types. A more fine-grained control-flow integrity (CFI) policy can be derived and enforced, based on the recovered function types. When combined with recovered C++ semantics, our analysis leads to stronger protections for virtual dispatches — a C++ construct commonly abused by attackers.

1.5 Dissertation Organization

The rest of this dissertation is organized as follows. Section 4 describes our technique for accurate function return discovery, as well as a principled ROP defense mechanism that is based on inferring and enforcing returns. Section 5 presents our approach for function boundary identification, which is a fundamental step for many binary analysis and instrumentation applications. Section 6 presents our techniques for accurate function type analysis and its application to fine-grained control-flow integrity enforcement. Lastly, Section 7 concludes.

2 Background and Related Work

In this chapter, we discuss the background and related work for this dissertation. After describing existing analysis techniques related to function recovery in the first three sections, we shift our focus to their applications to security. Specifically, we introduce the common forms of code-reuse attacks in Section 2.4, and then discuss how these attacks can be mitigated with a general technique, namely control-flow integrity (Section 2.5).

2.1 Binary Organization and Disassembly

Program binaries are organized into sections. Each section may contain code, data, metadata, or other auxiliary information. Figure 2.1 gives an example of the organization of a Linux Executable and Linkable Format (ELF) binary. In this figure, a code section (e.g., .init, .plt, .text, .fini) consists of a sequence of bytes which is interpreted by the CPU as instructions and gets executed at runtime. A data section can either be read-only (e.g., .rodata), or have read-write permissions (e.g., .data, .bss). There may be metadata about the code sections (and data sections), most notably the symbol, debug and relocation information. However, this metadata is normally stripped off before COTS binaries are distributed.

Disassembly is the process of translating the bytes of a binary's code sections into decoded instructions. It is usually the first step of any binary analysis. There are two major techniques for disassembly: linear sweep and recursive traversal [85]. Each of these techniques has limitations: linear sweep may erroneously treat embedded data as code, while recursive traversal suffers from completeness problems due to the difficulty of statically determining indirect control flow targets.

Recent advances have shown that robust disassembly can be achieved with linear disassembly [11] and error correction mechanisms [106]. More specifically, the disassembly algorithm works by first linearly disassembling the binary, and then checking for errors such as (1) invalid opcodes; and (2) direct control transfers outside the current module or to the middle of an instruction. These errors arise due to embedded data and are corrected by identifying data start and end locations so that disassembly can skip over them. Robust disassembly has been achieved by these techniques for a wide range of complicated and low-level binaries [106, 11], so we rely on the same techniques in this dissertation.

Section Headers:
  [Nr] Name             Type      Addr      Off     Size    ES Flg Lk Inf Al
  [ 0]                  NULL      00000000  000000  000000  00      0   0  0
  [ 1] .interp          PROGBITS  08048134  000134  000013  00  A   0   0  1
  ...
  [10] .rel.dyn         REL       08049428  001428  000028  08  A   6   0  4
  [11] .rel.plt         REL       08049450  001450  000360  08  A   6  13  4
  [12] .init            PROGBITS  080497b0  0017b0  000026  00  AX  0   0  4
  [13] .plt             PROGBITS  080497e0  0017e0  0006d0  04  AX  0   0 16
  [14] .text            PROGBITS  08049eb0  001eb0  0120dc  00  AX  0   0 16
  [15] .fini            PROGBITS  0805bf8c  013f8c  000017  00  AX  0   0  4
  [16] .rodata          PROGBITS  0805bfc0  013fc0  004297  00  A   0   0 32
  ...
  [25] .data            PROGBITS  08064720  01b720  00016c  00  WA  0   0 32
  [26] .bss             NOBITS    080648a0  01b88c  000c70  00  WA  0   0 32
  [27] .comment         PROGBITS  00000000  01b88c  000038  01  MS  0   0  1
  [28] .debug_aranges   PROGBITS  00000000  01b8c4  000550  00      0   0  1
  [29] .debug_info      PROGBITS  00000000  01be14  020883  00      0   0  1
  [30] .debug_abbrev    PROGBITS  00000000  03c697  00506f  00      0   0  1
  [31] .debug_line      PROGBITS  00000000  041706  006f66  00      0   0  1
  [32] .debug_str       PROGBITS  00000000  04866c  005559  01  MS  0   0  1
  [33] .debug_loc       PROGBITS  00000000  04dbc5  018a28  00      0   0  1
  [34] .debug_ranges    PROGBITS  00000000  0665ed  0051d8  00      0   0  1
  [35] .shstrtab        STRTAB    00000000  06b7c5  00015f  00      0   0  1
  [36] .symtab          SYMTAB    00000000  06bf14  002580  10     37 326  4
  [37] .strtab          STRTAB    00000000  06e494  0023d0  00      0   0  1
Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), T (TLS), E (exclude), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

Figure 2.1: Sections of an unstripped Linux ELF binary. Sections marked in green are code sections, and the ones in red are data sections. Sections marked in (different levels of) grey are symbol, debug, and relocation sections, which can be distinguished using their section names. When the binary is stripped, some of the grey sections are removed.

2.2 Discovering Code Pointers

There are different kinds of code pointers in a binary. Some are function pointers, while others are pointers to internal basic blocks of functions, used for intra-procedural control flow transfers. In this section, we first describe a general and conservative technique for identifying all code pointers, and then move to two specific code pointer structures, namely jump tables and virtual tables.

2.2.1 Conservative Code Pointer Analysis

Although it is undecidable whether a constant value in a binary represents a code pointer, conservative analysis techniques have been developed that identify a superset of possible code pointers. One recent approach [106] is to scan all constants in the binary, and select the subset that (a) fall within the range of code subsections within the binary, and (b) target a valid instruction boundary. Our function recognition technique (Section 5) starts from this conservative set, and prunes away almost all non-functions. As shown in our experiments, our analysis reduces the number of valid function pointers by a factor of 3.
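The following C++ sketch restates this scan under simplifying assumptions of ours (it is not the implementation of [106]): the module is given as a raw byte buffer, the code-section range and the set of valid instruction-start addresses are supplied by the caller, and pointers are 4 bytes wide (x86-32).

    #include <cstdint>
    #include <cstring>
    #include <set>
    #include <vector>

    // Collect every 4-byte constant in `module` that (a) falls within the
    // code range [code_start, code_end) and (b) targets a valid instruction
    // boundary. The result is a superset (over-approximation) of the real
    // code pointers, as in the conservative analysis described above.
    std::set<uint32_t> conservative_code_pointers(
            const std::vector<uint8_t>& module,
            uint32_t code_start, uint32_t code_end,
            const std::set<uint32_t>& insn_starts) {
        std::set<uint32_t> result;
        for (std::size_t off = 0; off + 4 <= module.size(); ++off) {
            uint32_t value;
            std::memcpy(&value, &module[off], 4);   // x86 is little-endian
            if (value >= code_start && value < code_end &&
                insn_starts.count(value))
                result.insert(value);
        }
        return result;
    }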

2.2.2 Jump Table Analysis

A jump table is an array of addresses that are possible targets of an indirect jump (which we refer to as a table jump). Jump tables are generally used to implement intra-procedural switch-case statements in high-level languages. Existing work has developed analysis techniques to identify jump tables as well as their targets [26, 67]. The basic idea is to perform backwards program slicing from each indirect jump instruction, and then compute an expression for the jump target. If the expression matches commonly used table jump patterns, the indirect jump is recognized as a table jump. The address of the jump table can be extracted from the same expression, and its bound is obtained from the constraints imposed on the index variable. Finally, jump targets can be collected from the identified jump table.
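As a hypothetical illustration, compilers commonly lower a dense C++ switch such as the one below to a bounds check on the selector followed by an indirect jump through a read-only table of basic-block addresses; this is exactly the table-jump pattern that the slicing-based analysis recognizes.

    // A dense switch that compilers commonly lower to a jump table,
    // roughly:  cmp $5, %eax ; ja default ; jmp *table(,%eax,4)
    int classify(int kind) {
        switch (kind) {
        case 0: return 10;
        case 1: return 21;
        case 2: return 32;
        case 3: return 43;
        case 4: return 54;
        case 5: return 65;
        default: return -1;
        }
    }

    int main() {
        return classify(2) == 32 ? 0 : 1;
    }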

2.2.3 Virtual Table Analysis

A virtual table (or VTable for short) is a data structure in C++ binaries used for virtual function calls. Since the main component of a virtual table is an array of virtual function pointers, the identification of virtual tables is helpful for code pointer analysis.

              VTable
   -0x8:  offset-to-top
   -0x4:  &RTTI
vtblptr ->
    0x0:  &virtual_function1
    0x4:  &virtual_function2
    0x8:  &virtual_function3
    0xc:  &virtual_function4

Figure 2.2: Layout of an Itanium C++ VTable

Although VTables can be structured in different ways, modern compilers follow the system ABI (e.g., Itanium or MSVC) when generating VTables. Figure 2.2 presents the layout of an example VTable according to the Itanium ABI [4]. In this figure, the first field, "offset-to-top", represents the offset of the vtblptr field in an object. It is a constant that is either 0 (for single inheritance) or a small number (in the case of multiple inheritance). The second field is a pointer to Run-Time Type Information (RTTI) [4]; it is 0 if RTTI is not available in the binary. The final part is an array of (virtual) function pointers. Note that vtblptr points to the start of this array, not to the "offset-to-top" field.

The basic approach for virtual table identification is a set of heuristics [77, 103, 74] relying on the system ABI. Since VTables are located in the read-only section of a binary, a constant found in the disassembled instructions is identified as a candidate VTable pointer if its value corresponds to an address in the read-only section. Further checking is performed on the data surrounding the potential vtblptr address. Specifically, the data value at offset -4 should either equal 0 or point into the read-only section, and the data value at offset -8 should either equal 0 or be a small number. The values at positive offsets are scanned in pointer-sized strides. Since they are supposed to be function pointers, the scanning stops when a value no longer points into the .text section; this determines the potential VTable end. This technique has been shown to recover VTables without false negatives and with only a few false positives [77, 74]. Moreover, the recovered VTables may be larger than the correct ones, but not smaller. As will be shown, this is desirable and does not cause false alarms for security enforcement. Therefore, we leverage the same technique for VTable analysis.
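The sketch below restates these checks in C++ over a simplified view of the binary (the helper names and accessors are ours, and 32-bit pointers are assumed): given a candidate vtblptr value, it validates the offset-to-top and RTTI slots before it, then counts how many consecutive slots after it point into the code section.

    #include <cstddef>
    #include <cstdint>
    #include <functional>

    // Simplified VTable candidate check (Itanium layout, 32-bit pointers).
    // read32(addr) returns the 4-byte little-endian value at a virtual
    // address; in_rodata / in_text test whether an address falls in the
    // .rodata / .text section. Returns the number of virtual-function
    // slots found (0 means the candidate is rejected).
    std::size_t check_vtable_candidate(
            uint32_t vtblptr,
            const std::function<uint32_t(uint32_t)>& read32,
            const std::function<bool(uint32_t)>& in_rodata,
            const std::function<bool(uint32_t)>& in_text) {
        if (!in_rodata(vtblptr)) return 0;      // VTables live in read-only data
        uint32_t rtti = read32(vtblptr - 4);    // &RTTI, or 0 if absent
        if (rtti != 0 && !in_rodata(rtti)) return 0;
        int32_t off_to_top = (int32_t)read32(vtblptr - 8);
        if (off_to_top > 0x1000 || off_to_top < -0x1000)
            return 0;                           // must be 0 or a small offset
        std::size_t slots = 0;                  // scan the function-pointer array
        for (uint32_t p = vtblptr; slots < 4096 && in_text(read32(p)); p += 4)
            ++slots;
        return slots;
    }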

2.3 Function Recovery

A function is essentially a body of code that operates on input data, possibly returns values, and/or exhibits side effects. Functions also have their own variables to keep track of internal state. Depending on the application, there are many aspects of functions to recover from a binary; however, the first task is usually to recognize them. In the following sections, we discuss related work on the recovery of important function constructs and properties, as well as original source code.

2.3.1 Function Recognition

Many tools recognize functions using call graph traversal and function prologue matching. Examples include CMU BAP [17], the angr binary analysis platform [88], and the Dyninst instrumentation tool [50]. However, function prologue pattern matching is not robust. Similarly, IDA [2] uses proprietary heuristics and a signature database for function recognition. Its problems include underperformance across different compilers and platforms, and the overhead of maintaining an up-to-date signature database.

Rosenblum et al. first proposed using machine learning for function start identification [81]. The precision and runtime performance have been greatly improved by recent work from Bao et al. [15] and Shin et al. [87], due to the adoption of different machine learning techniques such as weighted prefix trees and neural networks. However, machine learning relies on a good training set, and potentially on subtle parameter tuning, to produce accurate results. Compared to analysis based approaches, it may also require an expensive training phase.

Nucleus [12] is a recent effort at function recognition based on control flow analysis. The approach builds a global control-flow graph (CFG) based on accurate linear disassembly. After removing direct call edges, the nodes with no incoming edges are considered function starts. An interesting finding of the paper is that the dataset used by prior machine-learning techniques [15, 87] is biased because many equivalent functions are present in it, and a re-evaluation of those approaches on different datasets showed significantly decreased performance.

Different from existing work, we develop a novel static analysis for function recognition (Section 5). Our approach is based on comprehensive function interface checking of control flow and data flow properties for potential functions. We demonstrate that fine-grained static analysis [79, 78] can recognize functions with much greater accuracy, and has the potential to support demanding applications such as automated analysis and instrumentation.

2.3.2 Function Type Recovery

Function type is defined as the number, location, and types of a function's arguments and return values. Since function type captures important aspects of function interfaces, through which functions interact with each other, it is critical for many applications.

The problem of function type recovery has been studied in the context of binary rewriting [10, 37]. To ensure that a rewritten function receives all its original arguments, the number of arguments is over-approximated when static analysis is limited. However, this makes the technique less applicable to other applications.

TypeArmor [94] identifies the number of register arguments for both functions and indirect callsites, in order to enforce finer-grained control-flow integrity (CFI). However, the technique relies on specific restrictions of the x86-64 platform, and does not detect stack arguments.

Complementary to TypeArmor, our work recovers stack arguments for functions and callsites. Based on a conservative analysis strategy and policy design, our technique can also be used for fine-grained CFI enforcement. Note that stack argument information can be used on its own, or combined with register argument information for a stricter CFI policy.

2.3.3 Function Decompilation

Other than recovering certain constructs, a more ambitious goal for function recovery is full decompilation: converting functions back to source code. Two critical tasks are involved: (1) variable and type recovery; and (2) control flow structure recovery.

Thanks to carefully chosen abstract domains, prior work based on abstract interpretation [28] has made significant progress on variable recovery [14, 10, 37]. However, it still cannot detect all variables. For type recovery, although both dynamic and static analysis based approaches have been proposed, the recovered types may either contain errors [63], or be presented in the form of type intervals rather than precise, single types [62].

Control flow structure recovery [84, 98] is concerned with recovering high-level, structured control flow constructs such as loops, if-then-else branches, and switch-cases. A program is considered structured if it is free of gotos. Control flow structure recovery can assist program understanding and facilitate program analysis.

Because of the above challenges and limitations, existing decompilers are either designed for assisting human audits, due to possible errors in their outputs [49, 38], or have only been tested with small programs [84], with no demonstration of scalability. In this dissertation, our focus is robust recovery of function constructs for large and complex software, with applications to security analysis and instrumentation. Therefore, full decompilation is not needed and is out of the scope of this work.

2.4 Code-Reuse Attacks

Programs written in C/C++ are not memory safe. Vulnerabilities such as buffer overflows, heap overflows and use-after-free can be exploited by attackers to execute code of their choice. Traditionally, attackers injected a payload (called shellcode) into the address space of a victim process, and redirected control to this code. However, with the widespread deployment of Data Execution Prevention (DEP), injected code is no longer executable, so attackers have come to rely on code reuse attacks. The idea is to repeatedly redirect control flow to existing benign code in the address space. Since the target code snippets are carefully chosen by attackers, malicious computations can be performed.

Based on the abused control flows, code-reuse attacks can be roughly classified as return-oriented programming (ROP), call-oriented programming (COP), and jump-oriented programming (JOP). These techniques can be combined to be more evasive [47, 32, 48, 20, 19, 39]. Since JOP is less utilized due to the relatively small number of indirect jumps in a program, we focus our discussion on the first two forms.

2.4.1 Return-Oriented Programming

Return-oriented programming (ROP) [86] is the most prevalent form of code-reuse attack. It makes use of "gadgets," i.e., existing code snippets ending with return instructions. A gadget used in a ROP attack can either be an intended code sequence, or unaligned code, which refers to an unintended instruction sequence beginning from the middle of an (intended) instruction. This is possible on variable-length instruction-set architectures such as x86.

To carry out ROP, attackers first gain control of the stack. Since the return addresses used by gadgets originate from the stack, attackers prepare the stack in such a way that the selected gadgets are executed one after another. The stack is also used to supply necessary data to the gadgets.

2.4.2 Call-Oriented Programming

Another form of code-reuse attack is call-oriented programming (COP) [47, 39]. In this case, indirect calls, rather than returns, are abused by attackers. Generally speaking, COP is more difficult than ROP due to the need to repeatedly corrupt code pointers that may reside at different locations. Therefore, COP may sometimes only be used to bootstrap ROP [47].

Counterfeit object-oriented programming (COOP) [83, 29] is a special form of COP for C++ programs. By crafting counterfeit objects and abusing virtual function calls, attackers can repeatedly redirect control to selected virtual functions (called "vfgadgets"). COOP is practical because it takes advantage of code patterns commonly found in object-oriented languages such as C++.

2.5 Control-Flow Integrity

Control-flow integrity (CFI) is a low-level security policy that constrains program control flows [6]. The basic idea is to pre-compute a static control flow graph (CFG) of the program, and enforce that indirect control flow transfers comply with the CFG. CFI on its own can be used as a defense against code reuse attacks, or it can serve as a primitive to ensure the non-bypassability of other inline reference monitors [6, 105, 101].

CFI restricts indirect control flow transfers, both forward and backward. Forward control flow transfers include indirect calls and indirect jumps, while backward ones refer to returns. Since in a CFG instructions are represented as nodes and control flow transfers as edges, we also call these transfers forward and backward edges.

The strength of CFI depends on the precision of the CFG. A perfect CFG would only allow intended control flow transfers. However, since the CFG is inherently static and the analysis that computes it usually overapproximates the allowed indirect control flow targets in order to be conservative, the resulting policy is often overly permissive. Although function type information can be leveraged to further restrict forward edges, backward edges are usually difficult to constrain, as a return site may legitimately target many locations. Based on the relative precision of the CFG, CFI schemes can be roughly classified as coarse-grained or fine-grained.

CFI is a general technique that can be applied to any native code. However, CFI precision can be greatly improved if advanced language features can be recovered. In the next sections, we first describe language-agnostic CFI techniques that are widely applicable, and then discuss how C++ virtual calls can be more precisely protected with recovered C++ semantics.
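As a minimal sketch of what CFI enforcement amounts to at run time (our own simplified model with hypothetical names, not the instrumentation of any particular system): every indirect transfer is preceded by a check that the runtime target is an edge of the precomputed CFG.

    #include <cstdint>
    #include <cstdlib>
    #include <unordered_map>
    #include <unordered_set>

    // Allowed-target sets, indexed by callsite address, precomputed from the CFG.
    using TargetSets = std::unordered_map<uint64_t, std::unordered_set<uint64_t>>;

    // Check conceptually inserted before every indirect call or jump:
    // abort if the runtime target is not an edge of the static CFG.
    void cfi_check(const TargetSets& cfg, uint64_t callsite, uint64_t target) {
        auto it = cfg.find(callsite);
        if (it == cfg.end() || !it->second.count(target))
            std::abort();   // control-flow violation
    }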

Calling convention    Arch.     Arguments passing
System V ABI          x86-64    RDI, RSI, RDX, RCX, R8, R9, then stack
Microsoft x64         x86-64    RCX, RDX, R8, R9, then stack
cdecl                 x86-32    stack
stdcall               x86-32    stack

Figure 2.3: Arguments passing for different calling conventions

2.5.1 Language-Agnostic Approaches

To compute the CFG, programs need to be statically analyzed. They also need to be instrumented for runtime enforcement. These tasks can be performed either by a compiler on source code, or by a low-level tool on binaries. CFI approaches can therefore be classified as compiler-based or binary-based.

Compiler based approaches can leverage source code information such as function types [71, 92, 73, 66], therefore providing better precision. However, they are not applicable to COTS software or third party libraries. This limitation can be significant: even if a single module is not protected, it could become the weakest link and render all other defenses useless.

On the other hand, although binary-based protection can be more complete, enforcing fine-grained CFI on COTS binaries is challenging due to the lack of high-level information. As a result, the first binary level schemes sacrificed security for robustness, and used a relaxed policy in which a large target set is allowed for each indirect control flow transfer type [106, 104]. However, these systems are bypassable [47, 32]. To provide stronger protection against determined adversaries, later research efforts have combined CFI with techniques such as randomization [68] or safe loading [107], or also consider context information [93].

TypeArmor [94] is a technique inspired by source-level type-based CFI enforcement. By analyzing indirect callees and callsites, it derives an under-approximation of the number of arguments passed to each callee, and an over-approximation of the number of arguments prepared by each callsite. It enforces a policy that a callsite can only call a callee if the prepared argument count is not smaller than that of the callee. The policy for return values is handled in a similarly conservative fashion. A limitation of TypeArmor is that it focuses only on register arguments. Although it is effective for x86-64/Linux (because the first 6 arguments are passed through registers), the technique does not apply to x86-32, where most arguments are passed on the stack, and its precision significantly decreases


Figure 2.4: Layout for polymorphic objects

on x86-64/Windows (only the first 4 arguments are passed through registers and the rest are passed on the stack). A summary of argument passing for common calling conventions is presented in Figure 2.3. Our function type analysis complements this line of work by focusing on memory arguments. It not only directly works for x86-32, but can also improve CFI precision for x86-64 when combined with TypeArmor.
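To make the direction of the two approximations concrete, the sketch below (our own illustrative formulation, not TypeArmor's code) applies the argument-count policy described above to a hypothetical callsite that prepares three arguments.

    #include <cstdio>

    // Argument counts recovered by the binary analysis:
    //   prepared = over-approximated count set up at an indirect callsite
    //   consumed = under-approximated count read by a candidate callee
    // An edge is allowed only if the callsite prepares at least as many
    // arguments as the callee may consume, so analysis imprecision can only
    // make the policy more permissive, never break a legitimate call.
    static bool edge_allowed(unsigned prepared, unsigned consumed) {
        return prepared >= consumed;
    }

    int main() {
        const unsigned prepared_at_callsite = 3;        // e.g., three stores seen
        const unsigned callee_arg_counts[] = {0, 2, 3, 5};
        for (unsigned consumed : callee_arg_counts)
            std::printf("callee taking %u args: %s\n", consumed,
                        edge_allowed(prepared_at_callsite, consumed)
                            ? "allowed" : "blocked");
        // Callees taking 0, 2 or 3 arguments remain allowed; the 5-argument
        // callee is removed from the target set.
    }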

2.5.2 CFI for C++ Virtual Calls

As discussed in Section 2.4.2, C++ programs give attackers additional opportunities to develop advanced exploit techniques, such as those leveraging virtual function calls [83, 29]. On the other hand, C++ semantics available from source code can be used for stronger protection. Specifically, object type information as well as class hierarchies can be effectively leveraged for highly precise CFI enforcement [57, 92, 72, 16, 102]. For COTS binaries, however, these semantics are mostly unavailable.

To illustrate the related concepts, we use the example in Figure 2.4. The left half of the figure depicts three classes A, B, and C. C is a subclass of both A and B, so it is a case of multiple inheritance. The right half shows the object layout of these polymorphic classes. For each object instance, the first field is a VTable pointer, while the following fields are member variables. Each object is associated with the VTable of its class. Specifically, the VTable pointer of each object points to the virtual function array inside the corresponding VTable. Note that an object of class C has two VTable pointers, as it inherits from two classes. The first VTable pointer is still the first field of the object and points to VTable C, while the second VTable pointer is at a larger offset and points to Sub-VTable C.

In Figure 2.4, the RTTI pointers in VTables either point to RTTI data

structures, or are 0 if such information is not available. Although RTTI information [4] (which encodes class hierarchies) may be present in some stripped binaries, it is mostly used for reverse engineering [42, 99], and is difficult to apply to enforcement. Techniques for deriving class hierarchies without RTTI information have also been developed [43, 58]. However, the recovered hierarchies are still not precise enough for CFI enforcement without the help of dynamic profiling [74].

Due to the challenges of accurately recovering class hierarchies as well as object types, existing C++-aware CFI techniques rely on a subset of C++ semantics and constructs. Specifically, only virtual callsites and virtual tables (VTables) are identified and used for generating a CFI policy [103, 77, 46]. Although these policies are not as precise as those from source code based techniques, they still represent a significant improvement over approaches that do not consider C++ semantics. Since we discussed VTable identification in Section 2.2.3, we focus next on the identification of virtual callsites.

Identification of Virtual Callsites

Prior works have developed techniques to identify virtual calls in binaries [46, 77, 103]. The basic approach is to slice and transform code so that each indirect call target as well as its first argument are represented as expressions; if these expressions satisfy certain constraints, the corresponding indirect call is recognized as a virtual call.

Specifically, since the target of a virtual call should be derived from a VTable, the target must be a dereference of the form ∗(ptr1 + offset). Note that offset should be a multiple of the pointer size of the underlying architecture, and is possibly 0. Moreover, for ptr1 to be a VTable pointer, it has to be the first field of an object. In other words, ptr1 should derive its value from another dereference, and its expression should be of the form ∗(ptr2). If ptr2 is passed as the first argument of the indirect call, a virtual call is identified, and ptr2 is recognized as the this pointer.

Note that this technique also works for multiple inheritance and virtual inheritance, as the this pointer is adjusted by the compiler before invoking virtual calls. For example, in Figure 2.4, before any virtual function from Sub-VTable C is invoked, the this pointer is adjusted by adding 0x8 so that it points to vtblptrC2. Therefore, although ptr2 is an expression that can take different forms, a virtual call is recognized as long as the two dereferences described above and the argument passing are identified.

Evaluations of this technique have shown that although it may miss some virtual calls (false negatives), there are no false positives [77]. Since the missing

virtual callsites would not result in false alarms but only slightly decreased security, we utilize the same technique for virtual call identification.
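A hypothetical C++ virtual call, with a comment sketching its typical x86-32 lowering, makes the two dereferences and the first-argument check explicit:

    // Hypothetical source-level view of the pattern the analysis recognizes.
    struct Shape {
        virtual int area() const { return 0; }
        virtual int perimeter() const { return 1; }
    };

    int measure(const Shape* s) {
        // Approximate x86-32 lowering of the virtual call below (Itanium ABI):
        //   mov  s, %eax          ; ptr2: the object (this) pointer
        //   mov  (%eax), %edx     ; ptr1 = *(ptr2), the VTable pointer
        //   push %eax             ; this passed as the first argument
        //   call *0x4(%edx)       ; target = *(ptr1 + offset), offset = 0x4
        return s->perimeter();
    }

    int main() {
        Shape sh;
        return measure(&sh) == 1 ? 0 : 1;
    }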

CFI Policies

Since virtual calls must target virtual table entries (virtual functions), a basic policy is to enforce this invariant, so that no other targets are allowed. Earlier work had not developed VTable identification techniques, and therefore used a relaxed policy [46]: since VTables must be present in read-only sections, a virtual call is allowed if its target is found in those sections. VTint [103] relocates VTables into a separate read-only section, and instruments virtual callsites to check that the target VTable is read-only. However, VTint is not able to prevent VTable reuse attacks.

vfGuard [77] proposed a stronger CFI policy with accurate detection of virtual calls and VTables. Specifically, since each virtual call uses a constant offset to index into a VTable and retrieve the virtual function pointer (offset in the expression call ∗(∗(thisptr) + offset)), this constant can be statically determined and effectively used. For example, a virtual call with offset 16 can only target entries at offset 16 of all VTables. If a VTable is small and offset 16 is out of its bounds, then that VTable cannot be a candidate VTable for the virtual call. vfGuard further refines the target sets based on other properties such as calling conventions.

2.5.3 Shadow Stack

As discussed, backward edges cannot be well protected by CFI because the CFG computed by static analysis is not precise. Therefore, a shadow stack is typically incorporated with coarse-grained CFI [7].

Shadow stack schemes [44, 24] were first proposed as a defense against stack smashing attacks. However, a shadow stack alone is not effective against ROP attacks, as only legitimate returns were checked, and ROP attacks using unintended returns remain possible. CFI enforcement, which prevents the use of unintended instructions, provides one way to block this attack avenue. A second approach, used in DBT-based techniques (e.g., ROPdefender [33]), is to instrument all returns before their execution.

The practical deployment of shadow stacks has been limited by the prevalence of non-standard returns that violate shadow stack checks. While RAD [24] addressed the cases of longjmp and signals, ROPdefender [33] identified two other non-standard uses: C++ exceptions and lazy binding of calls to shared functions. It handled them by manually identifying instructions that save a return address on the stack, and pushing a copy on the shadow stack.
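A minimal sketch of the shadow stack idea (our own simplified model, not the instrumentation emitted by any of the systems above): the instrumentation at a call site pushes the return address onto a separate stack, and the instrumentation at a return compares the address about to be used against the saved copy.

    #include <cstdint>
    #include <cstdlib>
    #include <vector>

    // Simplified shadow-stack model. Real systems keep this structure in a
    // hidden or protected memory region and inline the checks at call and
    // return sites; here ordinary functions stand in for that instrumentation.
    static std::vector<uint64_t> shadow_stack;

    void on_call(uint64_t return_address) {          // instrumented call site
        shadow_stack.push_back(return_address);
    }

    void on_return(uint64_t return_address_used) {   // instrumented return
        if (shadow_stack.empty() ||
            shadow_stack.back() != return_address_used)
            std::abort();                            // corrupted return address
        shadow_stack.pop_back();
    }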

A drawback of ROPdefender was its significant runtime overhead. Zhang et al. [105] discuss how dynamic binary instrumentation techniques, while displaying good performance on SPEC benchmarks, tend to perform poorly on large, real-world applications. Being based on static instrumentation, Zhang et al. were able to achieve significantly better performance than ROPdefender.

Lockdown [75] is a recent effort combining a shadow stack and CFI in dynamic instrumentation, while focusing on reducing runtime overhead. However, it does not focus on improving compatibility.

Dang et al. surveyed existing shadow stack systems and designed a "parallel shadow stack" scheme [31] to eliminate the need for shadow stack pointer saves and restores. They avoided register clobbers in their instrumentation, applied peephole optimizations, and achieved very good performance. However, this comes with some trade-offs in security. In fact, StackDefiler [27] describes an attack that leaks the shadow stack address.

2.5.4 Comparison with Other Defenses

The most comprehensive defenses against memory corruption attacks are based on bounds checking [59, 97, 100, 9, 69]. Unfortunately, these techniques introduce considerable overheads, while also raising significant compatibility issues [91]. LBC [51] achieves lower overheads while greatly improving compatibility by trading off the ability to detect non-contiguous buffer overflows. Code pointer integrity [61] significantly reduces overheads by selectively protecting only those pointers whose corruption can lead to control-flow hijacks. However, these solutions all require source code.

3 Static Analysis Approach

3.1 Intermediate Representation

To analyze binaries, a suitable representation of binary code is required. Disassembly is a necessary step because it discovers code bytes and decodes them into instructions. However, modern instruction sets such as x86 are complex, and the disassembled instructions are not amenable to direct analysis.

To address this problem, instructions are usually transformed into an intermediate representation (IR). The IR is much simpler in that it consists of only a small number of operations, and all semantics are represented explicitly with IR instructions. Since the IR is usually in a flat three-address-code form, the sources and destinations of data movement are also explicit. Due to these properties, an IR can be directly consumed by analysis engines.

An IR can be a newly designed language; examples include the Valgrind IR [70], the BAP (Binary Analysis Platform) IR, and the REIL (Reverse Engineering Intermediate Language) IR. Alternatively, an existing IR, typically one used in compilers, can be adopted. For example, SecondWrite [10] chose the LLVM IR, while other works have used the GCC RTL (Register Transfer Language) IR [55, 54, 80]. The benefit of using a new IR is that it can be designed to satisfy specific requirements; however, it requires significantly more work. On the other hand, using an existing compiler IR is advantageous in that many existing software components, such as the optimization passes of a compiler, can be adapted or directly used. Moreover, it also facilitates support for multiple platforms, if the IR of a retargetable compiler (such as GCC or LLVM) is used.

In the next sections, we describe the two tasks involved in obtaining an IR form of a binary, namely instruction semantics modeling and IR transformation.

3.1.1 Precise Modeling of Instruction Semantics To ensure that the transformed intermediate representation faithfully represents the semantics of the original program, precise modeling of instructions is required. Earlier works manually specified instruction semantics, based on the instruction set architecture (ISA) manuals from the CPU vendors [70, 17, 35, 10, 60]. However, since modern instruction sets are complex, this task requires substantial human effort and is error-prone. Recent research has proposed techniques to alleviate this. The key observation is that compilers, which transform source code into binaries, already

have a model of instructions. Therefore, different approaches have been developed to automatically extract instruction semantics from compilers. LISC [55] is a learning-based system that trains a model on pairs of GCC RTL IR and assembly obtained from the compilation of a large number of packages. EISSEC [54] takes a different approach, symbolically executing the code generators of the GCC compiler. To validate the correctness of the recovered instruction semantics, testing approaches can be leveraged [52].

3.1.2 IR Transformation To transform a binary into IR form, the disassembled instructions are fed as inputs to a "lifter" program. The lifter transforms each assembly instruction into a number of IR statements. The original instruction opcode is represented with IR operations, while the operands are translated to their counterparts in the IR. Note that the lifter makes use of an instruction specification, either manually written or obtained using automatic approaches, so that the generated IR has the same semantics as the original program. Our lifter is built on top of LISC [55], which transforms assembly instructions into GCC RTL IR. However, since RTL is tree-structured, it is not suitable for direct analysis. We therefore further transform RTL into a flat, simple IR in three-address form. The core algorithm, flatten, is a recursive procedure that takes any RTL tree node as input. For each RTL operation encountered, it outputs a corresponding IR statement, introducing temporaries as needed. Therefore, a single RTL instruction is translated to multiple IR statements, which as a whole capture the instruction semantics. These IR statements consist of a handful of operations, such as arithmetic operations, memory dereferencing, and assignment, and can therefore be easily used in analysis.
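The core idea of flatten can be illustrated with a short Python sketch. The tuple-based tree encoding and the operation names are hypothetical stand-ins for RTL trees, and the real lifter handles many more operation kinds; this is a sketch of the recursion, not our implementation.

    def flatten(node, out, counter):
        """Recursively flatten a tree-structured expression into three-address
        statements appended to `out`. A leaf (register name or constant) is
        returned as-is; an interior (op, lhs, rhs) node emits one statement per
        operation and returns the temporary holding its result."""
        if not isinstance(node, tuple):
            return node                      # leaf: register or constant
        op, lhs, rhs = node
        a = flatten(lhs, out, counter)
        b = flatten(rhs, out, counter)
        counter[0] += 1
        t = "t%d" % counter[0]
        out.append((op, t, a, b))            # e.g. ("add", "t2", "eax", "t1")
        return t

    # Example: the address computation (eax + 4*ecx) + 8 becomes three statements:
    stmts = []
    flatten(("add", ("add", "eax", ("mul", 4, "ecx")), 8), stmts, [0])
    # stmts == [("mul","t1",4,"ecx"), ("add","t2","eax","t1"), ("add","t3","t2",8)]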

3.2 Data-Flow Analysis Data-flow analysis is a set of techniques that gather information about the flow of data along program execution paths [8]. Data-flow analysis is widely used in compilers as a basis for optimization. For binary programs, data-flow analysis is also the foundation for capturing important program properties. The task of a data-flow analysis is to associate each program point with a data-flow value — an abstraction of all possible program states that can be observed from that point. Usually, only data-flow values at basic block boundaries are considered. This is because with such information it is easy

to compute the data-flow values before and after any internal instruction of a basic block. Different data-flow analyses concern different data-flow values. Moreover, data-flow analyses can be classified as forward or backward, based on the direction of the data flows of concern. Suppose we use IN[B] and OUT[B] to represent the data-flow values at the entry and exit of a basic block B, respectively. For a forward data-flow analysis, the following data-flow equations are generated:

OUT[B] = trans_B(IN[B])

IN[B] = join_{P ∈ pred(B)} (OUT[P])

In these equations, trans_B is the transfer function for basic block B. It represents the transformation of data-flow values based on the semantics of the instructions in B. The join operation combines the exit data-flow values of all of B's predecessors. For a backward data-flow analysis, the equations are reversed:

IN[B] = trans_B(OUT[B])

OUT[B] = join_{S ∈ succ(B)} (IN[S])

Note that IN[B] derives its new value from OUT[B], and the join combines the entry data-flow values of all of B's successors. To obtain the output of a data-flow analysis, the data-flow equations need to be solved. This is usually done with an iterative algorithm. Specifically, the data-flow values are first initialized (typically to ∅), and then the IN and OUT states are repeatedly updated based on the equations. This process continues until a fixpoint is reached. The IN and OUT states at that point represent the final output of the analysis. In the following sections, we introduce two specific data-flow analyses that have been used in our function recovery tasks.
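As an illustration, the iterative solution of the forward equations can be sketched in Python as follows. The CFG representation, transfer function, and join operator are placeholders assumed for this example; the sketch shows only the standard fixpoint iteration.

    def solve_forward(blocks, preds, transfer, join, init):
        """Iterate the forward data-flow equations to a fixpoint.
        blocks:   iterable of basic-block identifiers
        preds:    dict mapping a block to the list of its predecessors
        transfer: function (block, in_value) -> out_value
        join:     function combining a list of data-flow values (e.g., set union)
        init:     initial data-flow value (e.g., the empty set)"""
        IN = {b: init for b in blocks}
        OUT = {b: transfer(b, IN[b]) for b in blocks}
        changed = True
        while changed:                       # repeat until a fixpoint is reached
            changed = False
            for b in blocks:
                IN[b] = join([OUT[p] for p in preds[b]]) if preds[b] else init
                new_out = transfer(b, IN[b])
                if new_out != OUT[b]:
                    OUT[b] = new_out
                    changed = True
        return IN, OUT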

3.2.1 Reaching Definition Analysis Reaching definition analysis is a forward data-flow analysis that determines, for each program point, which definitions may reach it. The data-flow equations for reaching definition analysis are as follows:

REACH_out[B] = GEN[B] ∪ (REACH_in[B] − KILL[B])

REACH_in[B] = join_{P ∈ pred(B)} (REACH_out[P])

In the above equations, REACH_in[B] and REACH_out[B] represent the reaching-definition data-flow values at the entry and exit of a basic block B. Note that the second equation is the same as the general join equation shown in the last section. Intuitively, it says that a definition that can reach the exit of any predecessor basic block also reaches the current block. The first equation is the transfer function for reaching definition analysis, and it introduces some new elements. Specifically, GEN[B] is the set of definitions inside B that are visible immediately after B, while KILL[B] is the union of all definitions killed in B. A definition is "killed" when its variable is overwritten by a new definition of the same variable. GEN[B] and KILL[B] are derived from GEN[I] and KILL[I], where I is any instruction (or equivalently, any IR statement) of B. Note that this requires instruction semantics, which can be obtained using the approaches described in the first section of this chapter.
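The composition of per-statement GEN/KILL sets into block-level GEN[B] and KILL[B] can be sketched as follows. This is a minimal Python illustration; the definition labels d1-d3 and the set-based encoding are assumptions of this example.

    def block_gen_kill(stmts):
        """Compose per-statement GEN/KILL into block-level GEN[B] and KILL[B].
        stmts is a list of (gen_set, kill_set) pairs in program order."""
        GEN, KILL = set(), set()
        for gen_i, kill_i in stmts:
            # a definition generated earlier is killed if a later statement
            # redefines the same variable
            GEN = (GEN - kill_i) | gen_i
            KILL = (KILL - gen_i) | kill_i
        return GEN, KILL

    # Example block: x = 1 (d1); y = x (d2); x = 2 (d3)  -- d3 kills d1
    stmts = [({"d1"}, {"d1", "d3"}), ({"d2"}, {"d2"}), ({"d3"}, {"d1", "d3"})]
    # block_gen_kill(stmts) == ({"d2", "d3"}, {"d1", "d2", "d3"})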

3.2.2 Liveness Analysis Liveness analysis is a backward data-flow analysis that computes, at each program point, the variables that may be read before the next write, i.e., the variables that are live. In simpler words, a variable is live if it may be needed later in the execution; a variable is said to be dead if it is not live. The data-flow equations for liveness analysis are as follows:

LIVE_in[B] = GEN[B] ∪ (LIVE_out[B] − KILL[B])

LIVE_out[B] = join_{S ∈ succ(B)} (LIVE_in[S])

The second equation is the same as the general equation for backward data-flow analysis. For the first equation, note that GEN[B] and KILL[B] are defined differently than in reaching definition analysis. Specifically, GEN[B] is the set of variables that are used in B before any assignment to them, while KILL[B] is the set of variables that are assigned a value in B.

3.2.3 Static Single Assignment To simplify data-flow analysis, the IR can first be transformed into a special form called static single assignment (SSA). In SSA form, each variable is defined exactly once, and every variable is defined before it is used. The same variable defined at different places is denoted with different subscripts. For example, consider the following piece of IR code:

x = 3
y = 4
x = y + 5

It can be transformed to the following SSA form:

x1 = 3
y1 = 4
x2 = y1 + 5

Note that at line 3, since x is redefined, a new version (x2) of the variable is created. The same statement uses variable y, which corresponds to its definition at line 2, hence the same version (y1) is used. An advantage of SSA, as can be seen from the above example, is that the def-use chain is explicit. This simplifies further analysis and optimization of the IR. Figure 3.1 presents some IR code that consists of several basic blocks (on the left), as well as its SSA form (on the right). When multiple definitions of a variable reach the same location, a φ (Phi) function is used to indicate that the definition may come from different sources. As shown in the figure, y at the last basic block is defined either in the second or the third basic block, therefore a special IR statement y3 = φ(y1, y2) is inserted. Algorithms have been developed for transforming IR code into (and out of) SSA form [30]. At the binary level, both registers and memory locations can be transformed [95].

IR code                SSA form
x = 3                  x1 = 3
x = x + 2              x2 = x1 + 2
x >? 7                 x2 >? 7

y = x * 4              y1 = x2 * 4
z = y                  z1 = y1

y = x + 1              y2 = x2 + 1

z = x + y              y3 = Φ(y1, y2)
w = x - y              z2 = x2 + y3
                       w1 = x2 - y3

Figure 3.1: IR code and its SSA form (each group of lines is a basic block)

3.3 Abstract Stack Analysis One common task in analyzing a function is to compute its effect on registers and memory. Since stack frames are used to store local variables and pass arguments, a function frequently interacts with them.

Figure 3.2: The abstract domain for stack analysis. (The lattice has ⊤ at the top; below it const; below that symbolic values of the form X1 + [l1, h1], ..., Xn + [ln, hn]; and constant ranges [l, h] at the bottom.)

To get accurate results, it is important that stack memory is precisely modeled. We develop an abstract interpretation [28] based abstract stack analysis (ASA) for this purpose. For each function, it computes an overestimation of the values of registers and stack locations, based on the semantics of the function's lifted IR. The abstract domain we use is similar to that of Reference [82], and it is depicted in Figure 3.2. As shown in the figure, each abstract value has the form Base + [l, h], where Base and h are optional. Base is a symbolic value that represents the value of a register or stack location at function entry. A unique symbolic value is associated with each distinct register or stack location. A missing Base is treated as 0, and a missing h is treated as equal to l. The values l and h are (possibly negative) integers. Note that a special base, BaseSP, is used to represent the initial value of the stack pointer esp. Moreover, stack memory locations are referenced with respect to this base value, typically as an offset from BaseSP. An abstract value X + [l, h] indicates that the concrete value will be in the range X + l to X + h (inclusive). The value [l, h] denotes a concrete value from l to h. A special abstract value const represents an unknown concrete value, with one limitation: when used as an address, it cannot reference local memory. Figure 3.3 describes the abstract interpretation in ASA. For each IR statement, the abstract store A is updated based on the statement's semantics.

IR Statement      Abstract Store (A)
R := c            Upd(A, [R ↦ [c, c]])
R := R′           Upd(A, [R ↦ A[R′]])
*(R) := R′        Upd(A, [x ↦ A[R′]]), if A[R] = x
                  Upd(...(Upd(A, [x1 ↦ A[x1] ∪ A[R′]])...), [xn ↦ A[xn] ∪ A[R′]]), if A[R] = x1, ..., xn, n > 1
R := R1 + R2      Upd(A, [R ↦ ∪_{r1 ∈ A[R1], r2 ∈ A[R2]} r1 ⊕ r2])
call(f)           Upd(...(Upd(A, [x1 ↦ applysum(A[x1], f, x1)])...), [xn ↦ applysum(A[xn], f, xn)]), where ModifiedNonLocal(f) = x1, ..., xn

Figure 3.3: Abstract interpretation for stack analysis

In this figure, R (possibly with a subscript) denotes a register. The abstract store associates up to k (where k is a small constant) abstract values with a register or memory location. The notation A[l] denotes the abstract value stored at location l in abstract store A, and Upd is used for updating the abstract store. Note that there are two cases for stack memory updates. If the location is a singleton, the abstract value is replaced with the new one. However, if multiple locations are possible, each possible location is updated with its original value combined with the new value. Moreover, unlike stack memory, there is only a single location representing global and heap memory, and it is initialized to const. This captures the assumption that stack variables are created after a function is invoked, and that no references to stack variables are present in global memory unless they have escaped. For the arithmetic operation '+', the corresponding abstract operation is denoted '⊕'. The result of a ⊕ b depends on the forms of a and b. If a is of the form X + [l1, h1] and b of the form [l2, h2], then the result is X + [l1 + l2, h1 + h2]. The result is less precise if both bases are present: to add X + [l1, h1] and Y + [l2, h2], if neither X nor Y is BaseSP, then the result is const, and it is ⊤ otherwise. Note that if an operation causes the number of abstract values for a location to exceed k, a generalization is used to reduce the count. To handle function invocations, a function summary is computed for the callee. Note that there is only one summary for each function, regardless of its calling context. Depending on the analysis application, a function summary may consist of the following information:

• Stack pointer change. The stack pointer may change due to the invocation of a function. In most cases the change in esp is 0, but it could also be some other constant or ⊤.

• Activation record range. The stack memory region accessed as a result of function invocation is denoted as BaseSP +[l, h]. BaseSP is the original stack pointer value upon the function entry, while l, h ∈ (−∞, ∞).

• Change of register and stack location values. Since ASA captures register and stack location values with respect to their initial values, these changes are explicit in the end state of a function.

Given these summaries, ASA uses a function applysum to update the abstract store to reflect the changes to registers and stack locations specified in the summary. If the summary indicates that a location l is either unmodified or changed to one of the values x1, x2, ..., xn, then applysum assigns l the new abstract value A[l] ∪ {x1, x2, ..., xn}. To analyze a function, ASA uses an iterative algorithm to traverse the CFG and perform abstract interpretation. For branches, each target basic block is handled. For merges, ASA takes the union of the abstract stores of all incoming edges as the new state. When an indirect call is encountered, one option is to assume that the ABI is followed; in that case a special applysum is used: callee-saved registers as well as esp are preserved, but scratch registers are clobbered.
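To make the abstract domain and the ⊕ operation described in this section concrete, the following Python sketch shows one possible encoding. The class and constant names, the use of None for a missing base, and the treatment of const operands are assumptions of this illustration rather than our implementation.

    class AbsVal:
        """Abstract value of the form Base + [l, h]; base is None for a pure
        constant range. TOP and CONST are handled as separate sentinel values."""
        def __init__(self, base, lo, hi):
            self.base, self.lo, self.hi = base, lo, hi

    TOP, CONST = "TOP", "CONST"
    BASE_SP = "BaseSP"          # symbolic initial value of the stack pointer

    def abs_add(a, b):
        """The ⊕ operation on two abstract values."""
        if a is TOP or b is TOP:
            return TOP
        if a is CONST or b is CONST:
            return CONST        # assumption of this sketch: const absorbs
        if a.base is not None and b.base is not None:
            # both bases present: const unless one of them is BaseSP
            return TOP if (a.base == BASE_SP or b.base == BASE_SP) else CONST
        base = a.base if a.base is not None else b.base
        return AbsVal(base, a.lo + b.lo, a.hi + b.hi)

    # e.g. (BaseSP + [-8, -8]) ⊕ [0, 4]  ==  BaseSP + [-8, -4]
    r = abs_add(AbsVal(BASE_SP, -8, -8), AbsVal(None, 0, 4))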

4 Accurate Recovery of Function Returns

Although function returns may seem trivial to identify, they are actually not. In this section, we present a technique for accurate return discovery. The most important motivation for this work is to apply it for ROP defense. We therefore illustrate how accurate inference and enforcement of returns can be useful.

4.1 Motivation and Approach Overview As discussed, ROP is the most powerful and versatile among code reuse attacks. Its power stems from the pervasiveness of returns in binary code. As a result, there are sufficient gadgets in a reasonably large binary to perform Turing-complete computation. Although variants such as COP and JOP have been proposed, ROP remains by far the most dominant code reuse attack, and the only kind used repeatedly in real-world attacks. For this reason, we focus on ROP attacks in this section, and develop a principled approach for defeating them. ROP relies on repeated subversion of returns in the victim program. Since CFI is a general technique for limiting all control-flow subversions, it can be used as a defense mechanism. CFI defeats most ROP attacks since they tend to violate the statically computed CFG. However, determined attackers can overcome CFI [47, 32]; specifically, coarse-grained CFI that is based on simple static analyses can be defeated. In fact, researchers have shown that a Turing-complete set of gadgets is available in sufficiently large applications even when coarse-grained CFI is enforced [32]. We note that although techniques for more precisely constraining returns have been known for well over a decade [24, 25], they have not seen wide deployment due to compatibility and performance concerns. Next we discuss these in more detail. The essential characteristic of ROP is the repeated use of return instructions. Thus, techniques for constraining returns can be very effective in defeating ROP attacks. The primary approach for confining returns is the shadow stack, which relies on a second stack that maintains a duplicate copy of every return address. Each call instruction is modified so that it stores a second copy of the return address on the shadow stack. Before each return, the return address at the top of the stack is compared with that atop the shadow stack. A mismatch is indicative of an attack, and program execution can be aborted before a successful control-flow hijack. However, previous shadow stack solutions suffer from one or more of the following drawbacks:

• Incompleteness. Many shadow stack schemes are based on compilers [24, 31]. They do not protect returns in hand-written assembly code from low-level libraries such as glibc and ld.so that are invariably present in every application. Also left unprotected are third-party libraries made available only in binary code form. Moreover, unintended returns (see Section 2.4.1) could be used in ROP, and these won't be checked against the shadow stack.

• Incompatibility. In most complex applications, returns don’t always match calls. If these exceptional cases are not correctly handled, they lead to false positives that deter practical deployment of shadow stack approaches.

• Lack of systematic protection from all ROP attacks. None of the previous approaches provide a systematic analysis of possible hijacks of returns, and how these attempts are thwarted. Indeed, most previous approaches incorporate exceptions to the shadow stack policy in order to achieve compatibility. A resourceful adversary can exploit these policy exceptions to carry out successful ROP attacks.

In this section, we develop a new defense against ROP that overcomes these drawbacks. We provide an overview of our approach below.

Approach Overview Our approach is based on the following simple policy:

Return instructions should transfer control to intended return targets.

With a static interpretation of "intention", many existing coarse-grained CFI schemes can be seen as enforcing this policy. However, as discussed before, a static interpretation affords far too many choices for return targets, allowing successful ROP attacks to be mounted. We therefore take a dynamic interpretation of intent. Specifically:

• The ability to return to a location is interpreted as a one-time use capability. These capabilities are inferred from and associated with specific parts of the program text, e.g., a call instruction, or, a move instruction that stores a function pointer on the stack, with the intent of using this pointer as the target of a return. A return capability is issued each time this program text is executed.

• These return capabilities must be used in a last-in-first-out (LIFO) order. As the term "capability" suggests, not every intended return needs to be taken. Unexercised returns arise naturally due to exception unwinding, exits, and so on. However, we require that those return capabilities that are exercised do follow a LIFO order.

The LIFO property of return capabilities means that they can be maintained on a stack, which we will refer to as the return capability stack (RCAP-stack). Our system therefore consists of two key components:

• Static analysis to handle non-standard returns: While the intended returns of call instructions are obvious, nontrivial applications include many non-standard returns that don't match any calls. Unlike previous approaches that relied on manual annotations to handle them, our automated static analysis technique identifies (a) non-standard returns, and (b) the intended targets of these returns, which can be used for enforcement.

• Enforcement of inferred backward edges: Our instrumentation based enforcement mediates all returns using the inferred backward edges. The need for "whitelisting" return instructions is avoided, and therefore our policy is strict.

4.2 Background and Threat Model 4.2.1 CFI Platform Although CFI itself cannot sufficiently confine returns, it can be used as an important primitive for secure static instrumentation, as it can limit control flows so that instrumentation cannot be bypassed [6]. Specifically, control flows cannot go to (a) middle of instructions, or (b) an instruction within (or immediately following) an inserted instrumentation snippet. For this reason, we build our defense, which is based on static binary instrumentation, on a platform that already implements CFI, specifically, the PSI platform [105].

4.2.2 Threat Model We assume a powerful remote attacker that can exploit memory vulnerabilities to read or write arbitrary memory locations, subject to OS-level permission settings on memory pages. We assume the attacker has no local program execution privilege or physical access to the victim system.

We assume DEP is enabled on the victim system and therefore ROP is a necessity for payload construction. We also assume that ASLR is deployed, but attackers can use memory corruption vulnerabilities to leak the information needed to bypass it without resorting to brute-force.

4.3 Inferring Intended Control Flow As discussed earlier, we focus exclusively on return instructions. We do not attempt to further improve the (coarse-grained) BinCFI policy [106] enforced on the remaining branch types by our implementation platform, namely PSI [105]. The first task in enforcing a stronger policy on returns is to precisely infer the program-intended control flow for each of them. We develop a static analysis for this purpose. Specifically, our analysis identifies instructions that push addresses that may later be used as the target of a return instruction. As a fallback option, static analysis may be augmented with manual annotations, but we have not had to do this so far. Based on the results of static analysis and/or annotations, instrumentation is added to update RCAP-stack to keep track of the return capabilities acquired by the program by virtue of executing these instructions.

4.3.1 Calls Most call instructions are used for function invocations and therefore express an intent to return to the next instruction. However, it is up to the callee to decide whether the return is actually exercised. For example, a call to exit() will never return. Moreover, call instructions themselves may be used for purposes other than calling functions. For instance, position-independent code (PIC) on x86 uses call instructions to obtain the current program counter, from which the base of the static data section is computed. Unintended calls do not lead to compatibility problems, since we do not require all return capabilities to be used. However, issuing unneeded capabilities can increase an attacker's options. To avoid this, we use a static analysis of the target code to determine whether the return address generated by a call is definitely discarded before being used by a return instruction. In the simplest case, a discard happens through the use of a pop instruction that pops off the return address at the top of the stack. More generally, the location containing the return address may be overwritten, or the stack pointer incremented to a value greater than this location. If one of these properties holds on every execution path starting at the target of the call, then we conclude that the return address will not be used as the target of a return. After identifying unintended calls, the remaining calls are instrumented to store the return address on RCAP-stack. For unintended calls, the RCAP-stack is left unchanged.

0x146b4   movl %eax,(%esp)
0x146b7   ...
0x146bb   ret $0xc

Figure 4.1: A non-standard return from ld.so
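The discard check described above can be sketched as follows. This simplified Python illustration tracks only the stack pointer's offset relative to the return-address slot along a single path, with a hypothetical instruction encoding; the real analysis operates on lifted IR.

    def return_address_discarded(path):
        """Return True if the return address pushed by the call is definitely
        discarded on this path. Each element is a simplified record such as
        ("push",), ("pop",), ("add_esp", n), ("write_stack", offset), or
        ("ret",); offset 0 denotes the slot holding the return address."""
        sp = 0                               # offset of esp relative to the slot
        for insn in path:
            kind = insn[0]
            if kind == "push":
                sp -= 4
            elif kind == "pop":
                if sp == 0:
                    return True              # the return address itself is popped off
                sp += 4
            elif kind == "add_esp":
                sp += insn[1]
                if sp > 0:
                    return True              # esp now points above the return-address slot
            elif kind == "write_stack" and insn[1] == 0:
                return True                  # the return-address slot is overwritten
            elif kind == "ret":
                return False                 # the address is consumed by a return
        return False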

4.3.2 Returns Returning to a code location is permitted only if the program possesses the capability to do so. We check RCAP-stack for this capability. Typically, this capability originates at the most recent call instruction, but there are instances where the return address is pushed by other means. We call such returns non-standard returns. One example of a non-standard return is shown in Figure 4.1. The return instruction (at 0x146bb) uses a return address generated by a mov instruction (at 0x146b4) rather than a call. This code snippet is taken from GNU's dynamic loader, and the non-standard return is used for dynamic function dispatch after resolving a symbol. Specifically, when a function from another module is called for the very first time, its execution traps into the dynamic loader for symbol resolution. After the loader has resolved the address of the function and cached the result in the original module's Global Offset Table (GOT), control should be directed to the called function. This is achieved by first moving the function address (stored in the eax register) to the top of the stack using a mov, and then issuing a return, as shown in Figure 4.1. (The argument 0xc to the ret instruction specifies the number of additional bytes that should be popped off the stack.) Note that while this non-standard return achieves the effect of an indirect jump, it does so without using any register (other than the stack pointer), and moreover, it deallocates the memory location used to store the target address. As discussed in the next section, there are a number of such non-standard returns, scattered across different modules. Moreover, unlike a call, whose intended return address is its successor, the intended target of a non-standard return is not immediately obvious. These factors motivate the static analysis described below.

4.3.3 Static Analysis of Non-standard Returns The distinction between a standard and a non-standard return lies in the return address being used. The return address used by a standard return is pushed by the call instruction in its caller, and is not modified in the callee. In contrast, the return address used by a non-standard return is written to the return address stack slot by a non-call instruction. Based on this observation, we develop a static analysis that consists of four main steps, as discussed below.

Candidate snippet extraction After a binary module is disassembled, we build its CFG. We then perform a backward scan on the CFG starting from each return instruction, going back by n instructions, with n = 30 in our implementation. These snippets are our candidates for analysis. Each such snippet may contain multiple execution paths to the return instruction. We analyze each path separately, as this enables more accurate analysis. In particular, this approach avoids the approximations that result from the least upper bound operations needed to handle path merges. However, this approach introduces two problems. First, loops can lead to an unbounded number of paths. We only consider paths corresponding to zero or one iteration of such loops. As a result, we may fail to discover some instances where an instruction inside a loop pushes a return address on the stack. In theory, this could lead to a compatibility problem, but in practice, it is very unlikely that such instructions occur within a loop body. The second difficulty is that it is theoretically possible for a single instruction I to participate in two distinct paths such that in the first path, I pushes a value on the stack that would be used by the return instruction at the end of the snippet, while it does not do so in the second path. Note that this (unlikely) scenario does not lead to an incompatibility: if the second execution path were to be taken at runtime, the return capability pushed by I would simply not be used.
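The snippet extraction can be sketched roughly as follows (Python, assuming a simple predecessor map over instructions; the bound of 30 instructions and the zero-or-one loop iteration rule follow the description above, while the data structures are illustrative).

    def backward_paths(ret_addr, preds, limit=30):
        """Enumerate instruction paths ending at a return instruction, walking
        the CFG backwards up to `limit` instructions and revisiting any
        instruction at most twice (i.e., zero or one extra loop iteration)."""
        results = []
        def walk(addr, path, visits):
            path = [addr] + path                  # build the path in forward order
            if len(path) >= limit or not preds.get(addr):
                results.append(path)
                return
            for p in preds[addr]:
                if visits.get(p, 0) < 2:          # bound loop unrolling
                    v = dict(visits)
                    v[p] = v.get(p, 0) + 1
                    walk(p, path, v)
        walk(ret_addr, [], {})
        return results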

Semantic analysis The second step is to analyze the semantics of each snippet by performing an abstract interpretation using ASA (Section 3.3). At the beginning of each snippet, each register is assigned a corresponding initial symbolic value. The program state is updated based on the semantics of each executed instruction. At the end of each instruction, the abstract value of each register (or memory location) is a simple expression over constants and initial register values. Since we are analyzing each execution path separately, these expressions rarely involve approximations. Our analysis includes a simple procedure for maintaining these expressions in

a canonical form, thereby enabling equivalent expressions to be recognized in most instances.

Non-standard return identification The next step is to identify non-standard returns. After semantic analysis, the value of the stack pointer register just before the return instruction is given by an expression. Since it points to the return address slot, if there is any memory write to that location, a non-standard return is identified.
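In terms of the canonicalized expressions, this check can be sketched as follows (Python; representing canonical expressions as comparable strings such as "ESP+4" is an assumption of this illustration).

    def find_nonstandard_return(writes, esp_before_ret):
        """writes: list of (address_expr, value_expr, insn_addr) for the memory
        writes along the path, with address expressions already canonicalized.
        esp_before_ret: canonical expression for esp just before the ret.
        Returns the last instruction that stored to the return-address slot
        (the RAstore), or None if the return is a standard one."""
        ra_store = None
        for addr_expr, value_expr, insn_addr in writes:
            if addr_expr == esp_before_ret:   # write targets the slot the ret will pop
                ra_store = insn_addr
        return ra_store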

Intended control-flow inference The last step of the analysis is to infer the intended control flow for the non-standard return. To that end, we first need to identify the non-call instruction that stores the value used by the return. We call such an instruction an RAstore. It can be identified from the contents of memory and registers computed by our static analysis after each instruction in the snippet. In the remainder of this section, we describe real-world non-standard return examples identified by our analysis.

Non-standard return examples Previous shadow stack solutions rely on manual identification and ad-hoc instrumentation to support non-standard returns [33, 105, 31]. However, manual approaches are not scalable, and/or can lead to false positives on large and complex software. Figure 4.2 illustrates some of the more prominent real-world non-standard returns identified by our static analysis. In this figure, upper case register names (e.g., EAX) denote initial symbolic values, while lower case ones (e.g., eax) denote the current contents of registers or memory. For easier illustration, each code snippet is simplified to include only the last basic block. We omit the effects on floating point registers and segment registers. Note that our analysis results do not change when the full code snippets are used and when effects on non-general-purpose registers are captured. The first example is the same one shown in Figure 4.1. Our analysis indicates that the return address comes from eax. The analysis discovers the highlighted instruction as the one that pushes the return address. The second example comes from the setcontext(3) function of glibc. The single argument of setcontext is a pointer to a ucontext_t structure, which is loaded into eax by the first instruction. Since the user context structure contains all saved register information, most of the snippet performs register restores. In particular, the program counter, stored at offset 0x4c of ucontext_t, is loaded into ecx at location 0x3fa81, and the push instruction

;; #1 /lib/ld-2.15.so
0x146b0  popl %edx
0x146b1  movl (%esp),%ecx
0x146b4  movl %eax,(%esp)
0x146b7  movl 0x4(%esp),%eax
0x146bb  ret $0xc
Semantics: eax = *(ESP+8); edx = *ESP; ecx = *(ESP+4); esp = ESP+4; *(ESP+4) = EAX;
ra = *esp = *(ESP+4) = EAX

;; #2 /lib/i386-linux-gnu/libc.so.6
0x3fa73  movl 0x4(%esp),%eax
0x3fa77  movl 0x60(%eax),%ecx
0x3fa7a  fldenvl (%ecx)
0x3fa7c  movl 0x18(%eax),%ecx
0x3fa7f  movl %ecx,%fs
0x3fa81  movl 0x4c(%eax),%ecx
0x3fa84  movl 0x30(%eax),%esp
0x3fa87  pushl %ecx
0x3fa88  movl 0x24(%eax),%edi
0x3fa8b  movl 0x28(%eax),%esi
0x3fa8e  movl 0x2c(%eax),%ebp
0x3fa91  movl 0x34(%eax),%ebx
0x3fa94  movl 0x38(%eax),%edx
0x3fa97  movl 0x3c(%eax),%ecx
0x3fa9a  movl 0x40(%eax),%eax
0x3fa9d  ret
Semantics: eax = *(*(ESP+4)+64); edx = *(*(ESP+4)+56); ecx = *(*(ESP+4)+60); ebx = *(*(ESP+4)+52);
esi = *(*(ESP+4)+40); edi = *(*(ESP+4)+36); ebp = *(*(ESP+4)+44); esp = *(*(ESP+4)+48)-4;
*(*(*(ESP+4)+48)-4) = *(*(ESP+4)+76);
ra = *esp = *(*(*(ESP+4)+48)-4) = *(*(ESP+4)+76)

;; #3 /lib/i386-linux-gnu/libgcc_s.so.1
0x154cb  movl %esi,%ecx
0x154cd  movl %edi,0x4(%ebp,%esi,1)
0x154d1  addl $0x10,%esp
0x154d4  leal 0x4(%ebp,%ecx,1),%ecx
0x154d8  movl -0x14(%ebp),%eax
0x154db  movl -0x10(%ebp),%edx
0x154de  movl -0xc(%ebp),%ebx
0x154e1  movl -0x8(%ebp),%esi
0x154e4  movl -0x4(%ebp),%edi
0x154e7  movl 0x0(%ebp),%ebp
0x154ea  movl %ecx,%esp
0x154ec  ret
Semantics: eax = *(EBP-20); edx = *(EBP-16); ecx = ESI+EBP+4; ebx = *(EBP-12); esi = *(EBP-8);
edi = *(EBP-4); ebp = *EBP; esp = ESI+EBP+4; *(ESI+EBP+4) = EDI;
ra = *esp = *(ESI+EBP+4) = EDI

;; #4 /usr/lib/libunwind-setjmp.so
0x674  pushl %eax
0x675  movl %edx,%eax
0x677  ret
Semantics: eax = EDX; esp = ESP-4; *(ESP-4) = EAX;
ra = *esp = *(ESP-4) = EAX

Figure 4.2: Code snippets and their analysis results

at 0x3fa87 pushes it as the return address onto the stack, which is then consumed by the return instruction at the end of the snippet. This non-standard return and the RAstore at 0x3fa87 are identified by our static analysis. A similar case in the swapcontext(3) function from the same module was also identified (not shown in the figure). The third example is a snippet from one of the stack unwinding functions in libgcc_s.so.1. The code first stores edi, the address of the landing pad (handler code) computed earlier, to the return address slot of the next frame (0x154cd). Therefore, the following return will redirect control to the landing pad. This example also demonstrates the power of the analysis: the store to 0x4(%ebp,%esi,1) at 0x154cd does not "look" like a return address overwrite, yet our static analysis is able to detect it. This is also an example of why simple pattern-matching-based identification of non-standard returns would not work well. Our last example is from the unwinding library libunwind. The snippet is simple, and similar to the first example, but is used for implementing longjmp. We note that although the non-standard return compatibility problem has been recognized by many in the literature [33, 31], only the first and third of these four examples have seen manual handling [33]. In contrast, our static analysis systematically identifies all of them, and serves as a basis for automatic instrumentation.

4.3.4 Discussion Because our static analysis is local, it can fail to identify non-standard returns whose RAstore instruction is far away from the return. If such a case were encountered, we could address it by strengthening the analysis, or by using manual annotations. As mentioned before, we have not had to do this so far in our implementation.

4.4 Enforcing Intended Control Flow In this section, we describe our approach for enforcing intended control flow using static binary instrumentation. We also describe the protection of the RCAP-stack to ensure that the same mechanisms used to corrupt the main stack cannot corrupt the RCAP-stack.

4.4.1 Instrumentation-based Enforcement Intended control flow enforcement is realized by instrumenting calls, RAstores, and returns. Both calls and RAstores are instrumented in the same manner: a copy of the address being stored on the main stack is also pushed on RCAP-stack. (As discussed earlier, we avoid instrumenting calls that are determined never to return.) Return instructions are instrumented to check the RCAP-stack for the corresponding capability. Note that due to normal program behaviors such as stack unwinding, the required return capability may not always be located at the top of RCAP-stack. Similar to previous shadow stack proposals, our design pops non-matching capabilities from the top of RCAP-stack until a capability that matches the target location of the return is encountered. If no such capability is found, a policy violation is reported and program execution is aborted.
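The return-time check can be sketched as follows (Python; RCAP-stack is modeled as a plain list here, whereas the real instrumentation operates on a protected memory region).

    def check_return(rcap_stack, return_target):
        """Pop capabilities until one matches the return target; abort on mismatch.
        rcap_stack is a list with the most recent capability at the end."""
        while rcap_stack:
            cap = rcap_stack.pop()
            if cap == return_target:
                return True                  # return is permitted
        raise RuntimeError("policy violation: no capability for %#x" % return_target)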

4.4.2 RCAP-stack Protection Since return capabilities are generated and consumed for control-flow authentication, their integrity needs to be ensured. In other words, the RCAP-stack, which stores return capabilities, must be protected. Otherwise, determined attackers could use vulnerabilities to corrupt both the program stack and the RCAP-stack for control-flow subversion. We use the same approach as described in CFCI [107], which has also been implemented on our platform, PSI. In short, the protection mechanisms are architecture-dependent: for x86-32, we rely on segmentation for efficient protection, and for x86-64, a randomization-based approach is used. The randomization approach ensures that the location of the RCAP-stack cannot be leaked.

4.5 Implementation 4.5.1 Static Analysis The first step of the static analysis is to extract candidate snippets. We utilized PSI [105] for this purpose. Specifically, PSI has a disassembly engine that is based on objdump, and adds a layer of error detection and correction over it. It also builds a CFG for the disassembled code. We traversed the CFG backwards from each return instruction to collect code snippets that were 30 instructions long.


For our static analysis, we need to accurately model the semantics of each instruction. Specifically, we utilized a tool by Hasabnis et al [53, 55] that lifts assembly to GCC's intermediate language, RTL. Since RTL is a tree-structured language, it is further transformed into a simple flat IR, as described in Section 3.1.2. Our static analysis is performed on the simple IR statements. Since we analyze single execution paths, the main step in the static analysis is to substitute each register or memory location by the expression representing its previously computed value. This expression is maintained in a canonical form by defining an ordering on variables, and by performing constant folding and other arithmetic simplifications.

4.5.2 Binary Rewriting based Enforcement

Our shadow stack instrumentation is based on PSI [105] and was implemented as a plugin. We chose PSI primarily for two reasons. First, a shadow stack needs to be built on top of CFI to be effective against ROP attacks, and PSI offers CFI as a primitive. Second, PSI is a platform for COTS binary instrumentation that works on both executables and shared libraries, and therefore aligns with our goal of instrumentation completeness.

Protecting the Dynamic Loader Since the dynamic loader ld.so is an implicit dependency of all dynamically linked executables, it is also instrumented to prevent its returns from being misused. We ensured that memory protection for RCAP-stack is set up before it is used by the instrumentation.

Signal Handling The static analysis discussed in Section 4.3.3 is able to identify non-standard returns that consume return addresses stored by program code. However, return addresses can sometimes originate from the operating system. This is the case for UNIX signals. Once the OS delivers a signal to a process, it invokes the registered signal handler by switching context so that user space execution starts at the first instruction of the signal handler. Prior to that, the OS puts the address of the sigreturn trampoline on the stack, to be used as the return address for the signal handler. Therefore, the signal handler will "return" to the sigreturn trampoline, whose purpose is to trap back into the kernel. The kernel can then resume user program execution with the saved context. Since the returns of signal handlers (which are just normal functions) are also instrumented, if the corresponding return capabilities are not pushed onto RCAP-stack, signal delivery would cause false positives.

Fortunately, PSI [105] already has a mechanism for signal handler mediation. The platform intercepts all signal handler registrations (made through the signal and sigaction system calls) and registers wrappers for the signal handlers. Once a wrapper function is invoked by the OS, it transfers control to the real signal handler after resolving its address. We use an updated version of the wrapper code so that it pushes the corresponding return capability onto RCAP-stack. (The wrapper code is not instrumented, and the CFI policy is configured to ensure that it cannot be invoked by the application.)

4.5.3 Optimizing Returns Our shadow stack is built on top of a binary instrumentation system that requires code pointer translation. In particular, code pointers point into the original code section, while the instrumented code resides in a different section. As a result, code pointer values need to be translated to the corresponding locations in the instrumented code. This step, called address translation, is a significant source of runtime overhead because it requires a hash table lookup. To improve performance, we applied an optimization that has also been used in previous work [76]: push both the original address and the translated address on the shadow stack for each call. At the time of a return, we first compare the return address on the main stack with the original address on the shadow stack, and if they match, return to the translated address on the shadow stack. For calls, the translated return address is simply the address of the instruction following the call instruction. However, RAstores push code pointers on the stack, so address translation cannot be avoided for them. Rather than eagerly performing address translation at the RAstore, we simply push a null value as the translated address. At a return instruction, if the translated address is null, we perform address translation at that point.
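The optimized check can be sketched as follows (Python; translate() stands in for the hash-table lookup into instrumented code and is a placeholder for this illustration).

    def check_return_fast(rcap_stack, return_target, translate):
        """RCAP-stack entries are (original, translated) pairs; `translated` may
        be None for capabilities pushed by RAstores, in which case translation
        is performed lazily at return time."""
        while rcap_stack:
            original, translated = rcap_stack.pop()
            if original == return_target:
                return translated if translated is not None else translate(original)
        raise RuntimeError("policy violation")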

4.6 Evaluation We evaluated the key aspects of our system using a wide range of software on Linux and FreeBSD operating systems. Below, we present our findings and results.

4.6.1 Compatibility In this section, we evaluate the compatibility improvement offered by our approach. We first present statistics on the identification of non-standard returns, together with an explanation for their prevalence. We then demonstrate the improved compatibility by testing our instrumentation on low-level and real-world software.

Directory    Linux NSR #   Linux NSR module #   FreeBSD NSR #   FreeBSD NSR module #
/lib               9              4                   7                 2
/usr/lib          41             23                   0                 0
/bin               6              1                   7                 2
/sbin              6              1                   4                 1
/usr/bin          26              7                   0                 0
/rescue          N/A            N/A                 182                91
/opt              28              7                 N/A               N/A
total            116             42                 213                98

Figure 4.3: Non-standard return (NSR) statistics

Module                               OS        NSR Count
/lib/ld-2.15.so                      Linux     2
/lib/i386-linux-gnu/libc.so.6        Linux     2
/lib/i386-linux-gnu/libgcc_s.so.1    Linux     4
/usr/bin/cpp-4.8                     Linux     4
/usr/bin/g++-4.8                     Linux     4
/usr/bin/gcc-4.8                     Linux     4
/lib/libc.so.7                       FreeBSD   3
/lib/libgcc_s.so.1                   FreeBSD   4
/usr/bin/clang                       FreeBSD   5

Figure 4.4: Non-standard returns in common modules

Non-standard Return Statistics We ran our static analysis tool on executables and shared libraries from an Ubuntu 12.04 32-bit Linux desktop distribution, and a FreeBSD 10.1 32-bit desktop distribution. We identified hundreds of non-standard return instances in different modules. Figure 4.3 shows the number of non-standard return instances and the modules containing them for different directories of Linux and FreeBSD. To better understand the impact of non-standard returns on shadow stack compatibility, we need to zoom in further and see whether they exist in widely used binary modules. Figure 4.4 shows the prevalence of non-standard returns in some widely used modules.

Non-standard Return Summary In this section, we summarize some of the most common reasons for the prevalence of non-standard returns, based on an examination of our static analysis results.

1. Programming language design and implementation In addition to implementing the function return abstraction, return instructions can also be used to implement other control-flow abstractions such as light-weight threads. In these situations, they are used to transfer control between contexts, and therefore do not match calls.

2. design and implementation Operating systems also provide programmers various abstractions to ease their job. These abstractions may use return to implement control flow behavior across OS boundary. UNIX signals, as discussed, are probably the most prominent example in this category.

3. Optimization “tricks” In the engineering of some software constructs, programmers tend to make “clever” uses of assembly instructions. This also happens to return instructions.

Testing Low-level and Real-world Software In order to further evaluate the compatibility of our approach, we tested it with some low-level libraries and real-world software. For each binary module tested, we first ran our static analysis to identify non-standard returns and RAstore instructions. The results are then fed into our instrumentation module to generate hardened binaries. The instrumented software is finally executed for testing. For multi- threaded programs used in this evaluation, we used Pin [65] for our testing. Figure 4.5 shows the low-level and real-world software we have tested, and how we tested them. The “Size” column specifies the total mapped code size (in MB) of all modules of the program. No incompatibilities were found on any of these programs, demonstrating that our approach works well even on low-level software. The total size of all software tested in this evaluation is almost 200MB.

4.6.2 Protection Security Analysis Our system instruments all software modules, including executables, shared libraries, and the dynamic loader. Moreover, it protects all backward edges, including both standard and non-standard returns.

Type         Software    Size   Description
Low-level    libunwind   1.9    Run a test program unwinding its own stack based on the libunwind API
Low-level    libtask     2.0    Run a tcp proxy that uses the user-level threads API provided by libtask
Real-world   scp         2.1    Copy 10 files to a server
Real-world   python      6.7    Run the pystone 1.1 benchmark
Real-world   latex       7.8    Compile 10 tex files to dvi
Real-world   vim         9.1    Edit text file, search, replace, save
Real-world   gedit       22     Edit text file, search, replace, save
Real-world   evince      26     View 10 pdf files
Real-world   mplayer     46     Play 10 mp3s
Real-world   wireshark   58     Capture packets for 10 min

Figure 4.5: Low-level and real-world software testing

Return capabilities greatly restrict the scope of possible attacks. A coarse-grained CFI permits any return to target any of the instructions following a call in the program. In contrast, our approach limits a return to one of the return addresses that are already on the RCAP-stack. Moreover, each time an attack makes use of a return address other than the top entry on RCAP-stack, the intervening entries are popped off, thus further reducing the choice of possible targets for the next return. Note that although JOP and COP gadgets can be used in advanced code-reuse attacks, the vast majority of such attacks still rely on ROP gadgets [47, 32, 20, 48], and therefore can be defeated by our system.

Stack Pivoting In ROP attacks, controlling the stack is the most important goal of the attacker. This is because, (a) fake return addresses need to be prepared on stack so that control flow can be repeatedly redirected in the manner chosen by the attacker, and (b) the stack supplies the data used in ROP computation. Attackers basically have two choices to control the stack. The first is to corrupt the stack, usually through a stack buffer overflow. The second is to pivot the stack, i.e., hijack the stack pointer to point to attacker controlled data. Among these two, stack pivoting is more versatile because vulnerabilities other than buffer overflow could be used. It is also more convenient because the entire stack could be controlled, without being limited by factors such as the location of the vulnerable buffer, or the maximum length of overflow. Our system readily defeats ROP based on both stack corruption and stack pivoting. As the effectiveness for stack corruption is clear, we focus on the

latter. Specifically, in a single RCAP-stack scheme, stack pivoting based ROP is blocked because the required return capabilities won't be present on RCAP-stack. When multiple RCAP-stacks are used, although stack pivoting could cause a new RCAP-stack to be created, this does not compromise security, as the new RCAP-stack starts out with zero return capabilities on it. Note that RCAP-stack protection is critical for defeating stack pivoting. This is because, in addition to pivoting the stack, the attacker could also craft and pivot an RCAP-stack by corrupting the RCAP-stack pointer. While previous solutions may be vulnerable to such attacks [33, 31], our system is resistant because the RCAP-stack pointer resides in protected memory as well.

TOCTTOU Threats For standard returns, our instrumentation pushes the return capability onto RCAP-stack at the time of a call, i.e., the instant the return capability is issued. This is different from schemes that push the return capability in the function prologue [24, 31], and hence provide a (narrow) window for TOCTTOU attacks. However, we note that our instrumentation does delay storing the return capability in the case of a non-standard return: it happens at the RAstore instruction, rather than at the instruction that generates the return address. This is due to the limited data-flow tracking of our analysis, and is not an issue when annotation is possible. Storing the return capability at a later time may give attackers some window, because they can modify the generated return capability before it is stored on both stacks. However, attackers' ability to exploit non-standard returns is greatly limited for two reasons. First, CFI is still enforced as our base policy. Even if return capabilities for non-standard returns can be altered by attackers, they must still satisfy CFI, and therefore a forged capability can only grant a transfer to an instruction following a call. Second, as shown in Section 4.6.1, there are a limited number of non-standard returns. Repeatedly corrupting return capabilities before they are stored, effectively chaining such limited gadgets and bypassing CFI, would be very difficult.

Experimental Evaluation of ROP Defense We evaluated the effectiveness of our approach using two real-world ROP attacks. Our first test was the ROPEME attack [64], which exploits a buffer overflow vulnerability in a test program. The attack is two-staged. In the first stage, the attack uses a limited set of gadgets in non-randomized executable code to leak the base address of libc. This enables the attacker to bypass ASLR as it relates to targeting gadgets in libc. In the second stage of the attack, ROPEME uses a payload

Figure 4.6: CPU overhead of shadow stack systems on SPEC 2006. (The histogram reports the overhead in percent of PSI-base, RCAP-stack, and Lockdown across the SPEC 2006 benchmarks, together with averages.)

consisting of a chain of libc gadgets. The stack is pivoted to this payload, and control is transferred to libc gadgets. Our defense blocked the attack at the first stage, because a backward edge control flow violation was identified when the vulnerable function returned with an overwritten return address. Our second test was to protect a vulnerable Linux hex editor: HT Editor 2.0.20. A specially crafted long input could overflow the stack and lead to ROP attack [3]. As with the first attack, we detected the very first control flow violation and successfully defeated this ROP attack as well.

4.6.3 Performance Overhead We measured the CPU overhead of our instrumentation on the SPEC 2006 benchmarks. We tested on an x86-32 Linux machine because it is the only environment currently supported by PSI [105]. For all the benchmarked programs, we transformed all involved executables and shared libraries. The results are presented in Figure 4.6, where an empty bar in the histogram indicates an unavailable performance number. We compare our performance with that of our base platform PSI [105] and Lockdown [75], a recent dynamic instrumentation based shadow stack implementation. From Figure 4.6, we can see that the performance overhead of our system is about 17% on average. Our optimization (Section 4.5.3) accelerates several control-flow intensive benchmarks, such as 429.mcf and 447.dealII by 9%, and 403.gcc, 458.sjeng, 471.omnetpp, and 453.povray by 5%. For the common set of programs we share with Lockdown, our overhead is 13% while theirs is about 24%. Parallel shadow stack [31] achieves lower overhead by employing a variety

of optimizations. They report overheads in the range of 3.7% to 4.6%. Their approach does not operate on binaries, but instead on the assembly code produced by a compiler. As a result, they avoid the overhead of address translation. In addition, they do not enforce CFI. Considering that these are the two major sources of overhead for the PSI platform we used, our added overhead of 4% makes our performance comparable to theirs.

5 Accurate Recovery of Function Boundaries

Functions are among the most common constructs in programming languages. While their definitions and declarations are explicit in source code, at the binary level, much information has been lost during the compilation process. Nevertheless, numerous binary analysis and transformation techniques require function information. For reverse engineering tasks such as decompiling [49, 38, 84], function boundary extraction provides the basis for recovering other high-level constructs such as function parameters or local variables. In addition, many binary analysis and instrumentation tools are designed to operate on functions. These include binary code search [40, 36, 34, 21], binary code reuse [96], security policy enforcement [25, 82, 22, 93, 94], type inference [62], in-depth binary analysis such as vulnerability detection [90], and more. In fact, a recent survey performed a literature study by collecting all binary-based papers published in the last three years at top security conferences, and found that 14 out of 30 works rely on function boundary information [11]. As a result, developers of most existing binary analysis platforms [2, 17, 50, 88] need to design and implement techniques to recognize functions. Function recognition is a challenging task for stripped COTS binaries since they lack debug, relocation, or symbol information. Although directly called functions can be readily identified from disassembly (e.g., 42c9c0 in call 42c9c0), a significant number of functions are only reachable indirectly. Since resolving indirect call targets (e.g., all possible values for eax in call eax) is an undecidable problem, the fraction of indirectly reachable functions identified through static analysis is usually limited. Although there exist conservative techniques to identify a superset of possible functions [106], they lack precision since they overestimate function starts by a significant factor. Compiler optimizations further exacerbate function recognition in COTS binaries. For instance, contrary to the high-level abstraction that a function has a single entry point, a function in a binary may have multiple entries. Moreover, instead of being entered via a call instruction, tail call optimizations result in the use of jumps to enter a function. Tail calls are relatively common in optimized binaries, but with previous techniques, it is difficult to reliably distinguish them from normal jumps. Due to the above difficulties, one cannot rely on the obvious approach of identifying functions by following direct calls. Many previous systems [2, 17, 50, 88] relied instead on pattern-matching function prologues (e.g., the instruction sequence push ebp; mov esp, ebp) and epilogues. Unfortunately, this approach is far from robust, since these patterns may differ across compilers. Moreover, optimizations may split and/or reorder these code sequences.

Other optimizations (e.g., reuse of ebp as a general-purpose register) may also remove such identifiable prologues/epilogues. As a result, the best existing tools are still unsatisfactory for function recognition [15]. To overcome the limitations of manually identified patterns, machine-learning based approaches have been proposed for function recognition [81, 15, 87]. The idea is to use a set of binaries to train a model for recognizing function starts and ends. Machine learning can build more complete models that work across multiple compilers, while reducing manual effort. As a result, ByteWeight [15] achieved an average F1-score of 92.7% on a benchmark consisting of x86 binaries. Shin et al [87] further improved the accuracy, achieving an F1-score of 94.4% on the same dataset. Unfortunately, error rates of over 5% are still too high for most applications. More importantly, the accuracy of these techniques can be skewed by the choice of the training data. In fact, an independent evaluation of this dataset [13] found many functions to be duplicated across the training and testing sets, thus artificially increasing the reported F1-scores. When evaluated with a different data set, ByteWeight's accuracy degraded to around 60% [13]. In light of these drawbacks of machine-learning based approaches, we propose a more conventional approach to function discovery, one that is founded on static analysis. However, unlike previous techniques that relied on simple control-flow analyses, and were confounded by the above-mentioned complications posed by stripped COTS binaries, our technique incorporates two key advances:

• We develop a fine-grained analysis that is based on the detailed semantics of every instruction, including their effects on the contents of registers and memory. As a result, our analysis can reason about the contents of the stack, as well as the flow of data between a function and its caller.

• We identify a rich set of data-flow properties that characterize function interfaces, such as the use of registers and the stack to pass parameters and/or return values. We present a static analysis to discover these flows, and verify whether a candidate function satisfies these properties.

As a result of these advances, we have achieved a 4-fold reduction in error rate as compared to the results reported by Shin et al. [87]. As compared to Nucleus [13], which relies on a static analysis of control flows, we achieve an even larger error-rate reduction of more than 7x.

Contributions We develop a novel, static-analysis-based approach for function recognition in COTS binaries. Specifically, we make the following contributions:

• Function identification by checking function interface properties. We show that function interface properties, as compared to function prologue patterns, can provide valuable evidence for function recognition. We identify a collection of such properties and present static analysis techniques to check them. Each of these techniques is shown to be independently effective in our evaluation.

• In-depth evaluation. Our evaluation consists of about 2400 binaries resulting from 312 distinct C, C++ and Fortran programs. These binaries have been compiled using 3 different compilers (GCC, LLVM and Intel) for two architectures (x86 and x86-64) at four distinct optimization levels. In contrast with previous work, our evaluation set includes low-level code with hand-written assembly, in particular, GNU libc.

• High accuracy. Our approach achieved an average F1-score of 99% across these data sets, much better than the 90% to 95% achieved by previous works [15, 87, 13]. This represents a reduction in error rate of more than 4x.

• Deeper insight. Our approach automatically categorizes recognized functions by their reachability, such as "tail-called" or "unreachable." As discussed in Section 5.6, such information can be the basis for further tuning and refinement of the analysis in order to support demanding applications, such as binary instrumentation, that cannot tolerate errors.

5.1 Overview of Approach

5.1.1 Problem Definition

We define a binary function as a sequence of bytes in the code section that has a single entry point reached from outside the function, and one or more exit points that transfer control from the function to code outside of it. These bytes need not be physically contiguous. Note that the entry point is typically reached using call instructions, but possibly also jump instructions. An exit point may be a return instruction, a jump to the entry point of another binary function, or a call to a function that never returns. Multiple-entry functions are supported: a function with n entry points is treated as if there were n independent single-entry functions, each analyzed independently by our method.
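To make this definition concrete, the sketch below shows one possible (hypothetical, not taken from our implementation) representation of a recovered binary function; the field and type names are illustrative only.

    from dataclasses import dataclass, field
    from enum import Enum

    class Reachability(Enum):
        DIRECT = "direct-call"         # target of a direct call instruction
        JUMP = "tail-called"           # reached only via direct jumps
        INDIRECT = "indirect"          # address-taken; reached via indirect calls/jumps
        UNREACHABLE = "unreachable"    # found only by gap exploration

    @dataclass
    class BinaryFunction:
        entry: int                                 # single entry point; a multi-entry
                                                   # function becomes several records
        exits: set = field(default_factory=set)    # returns, tail-call jumps, calls
                                                   # to non-returning functions
        body: set = field(default_factory=set)     # addresses of covered instructions;
                                                   # need not be physically contiguous
        reached_by: Reachability = Reachability.DIRECT

        def start(self) -> int:
            return min(self.body) if self.body else self.entry

        def end(self) -> int:
            return max(self.body) if self.body else self.entry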

[Figure 5.1 (diagram): the analysis pipeline enumerates Direct, Indirect, Jump, and Unreachable function start candidates, confirms them via function interface checking (stack discipline, control-flow properties, data-flow properties), and feeds them into function body traversal to determine function boundaries.]

Figure 5.1: Overview of our analysis. Direct function starts are identified from call instructions in the disassembly, and require no further confirmation. The remaining function start candidates (Indirect, Jump and Unreachable functions) need to pass our function interface checks, which eliminate spurious functions. Function body traversal is used to determine function ends, and takes advantage of already identified functions. Function body traversal and boundary information feeds back into the determination of unreachable functions, as well as jump-reached (i.e., tail-called) functions.

Our task is to recover the bytes belonging to each function. Similar to prior work [15, 87], correctness is determined by matching the start and end address of each function against symbol table information3. Note that start and end addresses are determined by the smallest and largest address of all bytes of the function, respectively.

Scope Our analysis focuses on stripped COTS binaries: no debug or symbol information is available. Our evaluation focuses on Executable and Linkable Format (ELF) binaries from the x86-32 and x86-64 Linux platforms, although our technique itself is applicable to other platforms and binary formats, such as Windows and the Portable Executable (PE) format. We make no assumptions about the source language, compiler used, compiler switches, or optimization levels. However, similar to prior work [15, 87], obfuscated binaries are out of scope.

5.1.2 Approach Overview

The key idea of our approach is to enumerate possible function starts, and then use a static analysis to confirm them. An overview of our approach is shown in Figure 5.1.

3 While our experiments are performed on stripped binaries, we rely on symbol tables in unstripped binaries for the ground truth.

Possible function start addresses are enumerated in different ways. Directly called (Direct) function starts are readily obtained from the disassembled code. For indirectly reachable (Indirect) functions, code addresses buried in all binary sections serve as function start candidates, while for unreachable (Unreachable) functions, the beginnings of unclaimed code regions are considered. As shown in Figure 5.1, any function that isn't directly reached needs to be confirmed4 using additional checks. Since functions interact with each other through interfaces, our approach identifies spurious functions by checking for properties associated with function interfaces, such as the stack discipline, control-flow properties, and data-flow properties.

To determine function ends, function body traversal is performed. It takes advantage of already recognized functions, and serves as a means of recognizing functions reached only by jumps (i.e., tail-called functions).

In a nutshell, our approach iteratively uncovers functions based on how they are reached. Direct functions are identified first, then Indirect functions are enumerated and checked, and finally Unreachable functions are handled. Note that Jump function enumeration and checking happens alongside the body traversal for all other functions. The whole procedure ends when all code regions have been covered. In the following sections, we describe our techniques for determining function starts (Section 5.2), function boundaries (Section 5.3), and interface checking (Section 5.4).
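The iterative procedure can be summarized by the following sketch; the helpers direct_call_targets, scan_constants, interface_check, traverse_body, and next_gap_start are hypothetical stand-ins for the analyses of Sections 5.2-5.4, and the sketch omits many details of our actual implementation.

    def recover_functions(binary):
        """Iteratively discover functions based on how they are reached (a sketch)."""
        functions = {}                           # entry address -> covered instructions

        # Direct starts: targets of direct calls need no further confirmation.
        worklist = set(direct_call_targets(binary))
        # Indirect candidates: constants scanned from all sections, kept only if
        # they pass the interface checks of Section 5.4.
        worklist |= {c for c in scan_constants(binary) if interface_check(binary, c)}

        while worklist:
            start = worklist.pop()
            body, jump_targets = traverse_body(binary, start, functions)
            functions[start] = body
            # Jump (tail-called) candidates found during traversal are also
            # subject to interface checking.
            worklist |= {t for t in jump_targets
                         if t not in functions and interface_check(binary, t)}
            if not worklist:
                # Unreachable candidates: first non-NOP instruction of the next
                # gap left uncovered by already identified functions.
                gap = next_gap_start(binary, functions)
                if gap is not None and interface_check(binary, gap):
                    worklist.add(gap)
        return functions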

5.2 Function Starts

5.2.1 Directly Reachable Functions

According to our definition in Section 5.1.1, functions are code sequences that are called (or, alternatively, reached using jumps). Therefore, with the disassembly obtained (Section 2.1), the targets of direct call instructions are definite function starts5 and are collected first.

Although we can obtain direct jump targets in the same manner, it is non-trivial to determine whether they are function starts (as in the case of a tail call) or, more likely, intra-procedural targets. We enumerate a jump target as a possible function start if it is physically non-contiguous with the current

4 In principle, we could require this confirmation for directly called functions as well, but did not do so for performance reasons.
5 We detect the obvious exceptions such as call next; next: pop reg, which appears in position-independent code to retrieve the current instruction pointer.

function body (Section 5.3). An analysis of the jump context is later performed to confirm or reject the function start.

5.2.2 Indirectly Reachable Functions

Compared to directly reached functions, some functions are only reachable indirectly. These include functions that are reached using either indirect calls or indirect jumps (i.e., indirect tail calls). To enumerate their starts, the constant scanning described in Section 2.2 is used. Since spurious function starts may also be included, the scanned constants need to be confirmed with interface checking.

5.2.3 Unreachable Functions

Other than directly and indirectly reachable functions, there are functions that are not reachable at all. For the reasons discussed at the beginning of this chapter, we also try to identify unreachable functions.

The basic idea is to analyze the "gap" areas, i.e., code regions that are not covered by already identified functions. This procedure is performed after the determination of directly and indirectly reachable functions. Because functions may have padding bytes after their ends, we consider the first non-NOP instruction6 in each gap as a potential function start. The corresponding function end is then determined using the techniques described in Section 5.3. If this potential function does not take up all the space of the current gap, the remaining region is considered a new gap, and the process continues until all gaps have been analyzed.

Although our gap exploration seems similar to prior work [2, 50, 17], the primary difference is that the identified functions have to pass interface property checking.
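A possible sketch of this gap exploration is shown below; covered is the set of addresses already claimed by identified functions, instructions maps an address to a decoded (length, is_nop) pair, and find_function_end stands in for the body traversal of Section 5.3. All names are illustrative.

    def enumerate_gap_candidates(code_start, code_end, covered, instructions,
                                 find_function_end):
        """Enumerate potential starts of unreachable functions in uncovered gaps
        (a sketch); every candidate must still pass interface checking."""
        candidates = []
        addr = code_start
        while addr < code_end:
            if addr in covered or addr not in instructions:
                addr += 1                      # inside a known function or undecodable
                continue
            length, is_nop = instructions[addr]
            if is_nop:
                addr += length                 # skip alignment padding after a function
                continue
            candidates.append(addr)            # potential function start in this gap
            # Any space left in the gap is treated as a new gap on the next iteration.
            addr = find_function_end(addr) + 1
        return candidates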

5.3 Identifying Function Boundaries

To identify function boundaries, we traverse a function body, starting at its entry point. All possible paths are followed until control flow exits the function. The largest address of any instruction discovered by this process is considered the end of the function. Note that exits may sometimes take place via jumps (tail calls), or via calls to non-returning functions. As described below, we discover and handle those cases as well.

6 We consider an instruction to be a "NOP" based on its semantics: i.e., the machine state (other than the program counter) is not changed. For example, besides nop itself, xchg ax, ax is also a NOP instruction.

Function body traversal works by following all intra-procedural branches until the function exits. Specifically, for conditional jumps, both branches are taken, while for table jumps, all recovered targets are followed.

Some special control-flow transfers need to be taken into account during function body traversal, most notably C++ exception handling. When an exception occurs, either in the current function or in one of its callees, control is first directed to the C++ runtime, which is responsible for locating the proper handler code (also called a "landing pad"); during this process, stack unwinding may be performed. If the current function is identified as having a landing pad designated for the raised exception, control flow is transferred to this landing pad.

Logically, a landing pad is associated with (the main body of) a function. Indeed, at the source code level, it corresponds to a catch block inside a function. Since a landing pad is essentially reached indirectly from the C++ runtime, this control flow transfer is not captured in the disassembly of the analyzed binary. We therefore parse the "call frame information" available in the .eh_frame section of ELF binaries, the same metadata used by the C++ runtime to guide exception handling, to recover such control flows and consequently follow them in our function body traversal. Note that exception handling information must be present even in stripped binaries.

Function body traversal stops at function exits. While most functions exit using return instructions, there are special cases that involve calls and jumps. These special cases are described below.

Function exits via calls to non-returning functions. Although most functions return to their caller, some don't. For example, the libc exit function terminates the program, and control never returns to the caller. For an invocation of such a non-returning function, function body traversal should not go past the call.

To determine non-returning functions, we perform a simple analysis. First, we collect a list of library functions that are documented to never return. We then analyze each potential function of the binary. If it calls a known non-returning function on each of its control-flow paths, it is also recognized as a non-returning function and added to the list, and so on.

Note that function body traversal only stops at a direct call to a non-returning function. For an indirect call, the traversal falls through after the call, as the target is unknown and the compiler has to be conservative.

Function exits using jump instructions. A tail call is another special type of function exit and is based on jumps. If not recognized, tail calls are treated as normal intra-procedural jumps, causing errors in the identified function ends.

[Figure 5.2 (diagram): a code layout with identified function starts at 0x0200, 0x0400, and 0x3000, return instructions at 0x1000 and 0x2200, and four direct jumps A, B, C, and D encountered while traversing the function starting at 0x0400.]

Figure 5.2: Tail call detection

To detect tail calls, we utilize already identified functions to check jump targets, specifically:

1. If the jump target is a known function start or a procedure linkage table (PLT) entry7, it is recognized as a tail call.

2. If the edge crosses known function boundaries, it is a tail call.

We illustrate with an example. In Figure 5.2, function body traversal begins from the identified function start 0x0400, and A, B, C, D are four direct jumps on the paths. Since jump A targets a known function and B crosses a function start, they are identified as tail calls. For jumps C and D, there does not seem to be an obvious way to determine whether they target a different function or are intra-procedural jumps.

We use a two-step approach to resolve them. First, our function body traversal speculatively follows all jumps and only terminates at definite function exits (e.g., returns). Second, all jumps whose targets are not physically contiguous with other traversal-covered instructions are identified as potential tail calls. For example, the colored area in Figure 5.2 represents all traversal-covered instructions; jump C is identified as a potential tail call, as its target is not preceded by any covered instruction, whereas D is not. These potential tail calls may be confirmed

7 PLT entries are code stubs in ELF binaries that support dynamic linking. The target of a PLT entry is a function in a different binary module, which is patched by the dynamic loader at runtime. Similar mechanisms exist for Windows PE binaries.

with further checks: they are not immediately considered definite function starts, as functions can be non-contiguous.

A tail call candidate is confirmed if both the potential function exit (the jump) and the potential entry (the target) pass function interface checking. While function entry interface checking is introduced in the following section, we focus here on function exit interface checking, which is based on the stack discipline. Specifically, the stack pointer at the jump should not be lower than its initial value on function entry; a lower value indicates that local storage has not been deallocated, and hence that the jump is intra-procedural.

Note that besides determining the current function's exits, identification of tail calls serves a second purpose: enumerating Jump function starts. These functions might otherwise go undetected if they are not directly or indirectly called.
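The exit-side (stack-discipline) part of this confirmation can be sketched as follows, assuming the abstract stack analysis reports the stack-pointer change at the jump relative to its value at function entry (None when no constant change can be established):

    def jump_passes_exit_check(sp_offset_at_jump):
        """Stack-discipline check for a potential tail call (a sketch).

        sp_offset_at_jump: stack-pointer change at the jump, relative to the
        stack pointer at function entry; None if statically unknown.  Assuming
        the stack grows downwards, a negative offset means local storage is
        still allocated, so the jump must be intra-procedural."""
        if sp_offset_at_jump is None:
            return True                    # inconclusive: keep the candidate
        return sp_offset_at_jump >= 0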

5.4 Interface Property Checking

For potential functions that are reached only via jumps or indirect calls, we develop a set of static analysis techniques to check whether the target is indeed a function. These checks can be divided into control-flow and data-flow properties, as described further below.

5.4.1 Control Flow Properties

According to the binary function definition in Section 5.1.1, control is transferred into and out of a function in well-defined forms. Therefore, if control flow reaches or leaves a "function" in a non-conformant manner, the function is identified as spurious.

Function entries. Our verification is based on a simple strategy: intra-procedural control-flow targets are not function starts. In particular, table jumps are intra-procedural control-flow transfers, as they result from switch-case statements. Table jump targets are therefore excluded from the function start candidates.

Note that our jump table analysis uses a conservative strategy for identifying jump tables and their bounds. In particular, it is designed to identify targets that are definite table jump targets, and hence it is safe to remove all of them from the function start candidates.

Function exits. Our second control-flow check concerns function exits. Specifically, we focus on exits that use returns. Our check determines whether the return instruction transfers control to the address that was stored at the top of the stack at the function entry point.

    805ce70:
      805ce70:  push %ebp          ; real func start
      805ce71:  push %edi
      805ce72:  push %esi
      805ce73:  push %ebx
      ......
      805d900:  pop %ebx           ; +4; spurious func start
      805d901:  pop %esi           ; +8
      805d902:  pop %edi           ; +12
      805d903:  pop %ebp           ; +16
      805d904:  ret

Figure 5.3: Incorrect return address is used for a spurious function

    8081130:
      ......
      8081136:  sub $0x533c,%esp
      ......
      8081a10:  mov %edi,0x4(%esp)     ; spurious func start
      8081a14:  mov $0x80e6718,(%esp)
      8081a1b:  call 807c8d0
      ......

Figure 5.4: "Return address" is overwritten for a spurious function

We note that although a static analysis may not give a conclusive answer in cases such as an unbounded stack-pointer change, this check is still very effective at detecting spurious functions in general. A more quantitative study is given in Section 5.5.5; here we illustrate with two examples.

The first example is presented in Figure 5.3. In this code snippet, address 0x805ce70 is a real function start and 0x805d900 is a spurious one. Both are enumerated as potential function starts. However, the "function" [0x805d900, 0x805d904] is detected as spurious by our analysis, since its return address (used by the return instruction at 0x805d904) is derived from a location 16 bytes higher than the proper stack slot.

Figure 5.4 shows our second example. In this code, address 0x8081a10 is enumerated as a potential function start. However, since its "return address" is overwritten at 0x8081a14, the "function" can never return to the intended return address. As a result, our analysis correctly identifies 0x8081a10 as a spurious function start. Note that in the context of the real function starting

at 0x8081130, the purpose of the instruction at 0x8081a14 is actually to prepare an argument for the upcoming call. However, since the stack allocation at 0x8081136 is not included in the spurious function, the anomalous use of the "return address" slot is spotted.

Internal Instructions. A function's internal instructions should not be targeted by control flows from outside. An exception arises in the case of multi-entry functions, but even then, these alternate entry points must be targeted by inter-procedural control transfers. In contrast, if an internal instruction of a function f is targeted by an intra-procedural transfer from another function g, that provides strong evidence that f is likely spurious. Recall that when we perform interface verification for a function beginning at location f, we start with a traversal of its body at f. The instructions uncovered by this traversal constitute the body of f, and any control transfers into this body using intra-procedural control-flow constructs (e.g., table jumps) from outside it indicate that f is spurious.
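Putting the two examples together, the exit check could be sketched as below, assuming the abstract stack analysis reports (i) the stack-pointer change at the ret relative to function entry and (ii) whether the original return-address slot still holds its initial contents; both parameter names are illustrative.

    def exit_check_passes(sp_offset_at_ret, ret_slot_untouched):
        """Control-flow exit check for a candidate function (a sketch).

        The ret instruction pops *(SP); it uses the proper return address only
        if the stack pointer is back at its entry value (unlike Figure 5.3,
        where it is 16 bytes higher) and the slot has not been overwritten
        (unlike Figure 5.4)."""
        if sp_offset_at_ret is None:
            return True                     # inconclusive: conservatively accept
        return sp_offset_at_ret == 0 and ret_slot_untouched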

5.4.2 Data Flow Properties

Similar to control flows, there are also data flows into and out of a particular function. These provide further opportunities to check whether a function is spurious.

Data flows between a function and the rest of the program usually go through registers, the stack, or global memory. Compared to (stack or global) memory, the use of registers for function interfaces is usually more constrained. In this section, we explore how register restrictions can help identify spurious functions.

Constraints on the use of registers for inter-procedural data flows are imposed by the system ABI and calling conventions [41]. They are summarized in Figure 5.5. In this figure, allowed argument registers are the registers that can be used for passing arguments to a function, and hence for inward data flow. Callee-save registers are registers whose values need to be saved before being used in a function and restored before the function returns, hence capturing both inward and outward data flows.

Note that on x86-32, the allowed argument registers are calling-convention specific; the details are presented in Figure 5.6. To compute a reference set, we take the union over all calling conventions. The resultant allowed argument registers are therefore eax, edx, and ecx. If data flows from caller to callee via any other register, such a flow violates all calling conventions, indicating that the entry point is likely spurious.

    Registers          x86-32 Windows      x86-32 UNIX-like    x86-64 Windows               x86-64 UNIX-like
    Allowed argument   (See Figure 5.6)    (See Figure 5.6)    rcx, rdx, r8, r9             rdi, rsi, rdx, rcx, r8, r9
    Callee-save        ebx, esi, edi, ebp  ebx, esi, edi, ebp  rbx, rsi, rdi, rbp, r12-r15  rbx, rbp, r12-r15

Figure 5.5: Register usage summary for calling conventions on different platforms

    x86-32 calling convention     Argument passing
    cdecl, stdcall, pascal        stack
    fastcall (Microsoft, GNU)     ecx, edx, then stack
    fastcall (Borland)            eax, edx, ecx, then stack
    thiscall (Microsoft)          ecx, then stack

Figure 5.6: Argument passing for x86-32 calling conventions

We perform a static analysis that operates on each potential function, captures the actual data flow through registers, and checks it against the reference (Figure 5.5). Any noncompliance suggests a spurious function.

Static Analysis for Argument Registers. To identify registers used for passing arguments to a "function", we perform a liveness analysis. We use the standard backwards data-flow analysis algorithm, computing GEN and KILL sets for registers and EFLAGS (Section 3.2.2).

If a register is live at a "function" start, it is potentially an argument register. This is because a live register indicates that it is used before it is defined in the "function" body; consequently, it must have been defined before the function was called, and information is passed through it. However, there is an exception: callee-save instructions at the beginning of a function "use" callee-save registers only to preserve them on the stack. Since this does not represent information passing, such saves should not be considered real uses. Our analysis adopts a simple strategy of not counting register saves to the stack as uses of the register: specifically, if a register is saved to an address below the value of the stack pointer at function entry, that save is not treated as a use of the register.

A special case of argument register checking is that EFLAGS are not used for passing information. Therefore, a live flag also suggests a spurious function. A concrete example of argument register checking is shown in Figure 5.4.
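A compact sketch of this liveness computation is given below. Basic blocks are assumed to carry precomputed GEN/KILL register sets (with stack saves of callee-save registers already excluded, as described above); ALLOWED_ARGS_X86_32 is the reference set derived from Figure 5.6. The block representation is illustrative.

    ALLOWED_ARGS_X86_32 = {"eax", "ecx", "edx"}    # union over x86-32 conventions

    def argument_register_check(blocks, entry):
        """Backward register liveness over a candidate function's CFG (a sketch).
        blocks: addr -> block with .gen/.kill register sets and .succs successor
        addresses.  Returns False if a register live at the entry is neither a
        permitted argument register nor the stack pointer, or if a flag is live."""
        live_in = {addr: set() for addr in blocks}
        changed = True
        while changed:                              # iterate to a fixed point
            changed = False
            for addr, blk in blocks.items():
                live_out = set()
                for succ in blk.succs:
                    live_out |= live_in[succ]
                new_in = blk.gen | (live_out - blk.kill)
                if new_in != live_in[addr]:
                    live_in[addr] = new_in
                    changed = True
        live_at_entry = live_in[entry]
        if "eflags" in live_at_entry:               # flags never pass arguments
            return False
        # The stack pointer is implicitly live in every function; it is not an argument.
        return live_at_entry - {"esp"} <= ALLOWED_ARGS_X86_32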

    805ce70:
      805ce70:  push %ebp          ; real func start
      805ce71:  push %edi
      805ce72:  push %esi
      805ce73:  push %ebx
      ......
      805d900:  pop %ebx           ; +4; spurious func start
      805d901:  pop %esi           ; +8
      805d902:  pop %edi           ; +12
      805d903:  pop %ebp           ; +16
      805d904:  ret

    Initial state   End state for "function"        End state for "function"
                    [0x805d900, 0x805d904]          [0x805ce70, 0x805d904]
    ebx = EBX       ebx = *(ESP)                    ebx = EBX
    esi = ESI       esi = *(ESP + 4)                esi = ESI
    edi = EDI       edi = *(ESP + 8)                edi = EDI
    ebp = EBP       ebp = *(ESP + 12)               ebp = EBP
    ......          ret addr = *(ESP + 16)          ret addr = *(ESP)

Figure 5.7: The analysis states of the example code

For the spurious function starting at 0x8081a10, edi is live and is detected as an argument register. However, since edi is not allowed for that purpose, the function is spotted as spurious.

Static Analysis for Callee-saves. To check value preservation for callee-save registers, we keep track of register and memory values by performing an abstract stack analysis (Section 3.3). The analysis produces, at the function end, the abstract value of each register and memory location, and how it has changed relative to the initial value.

Figure 5.7 shows the initial and end states from our analysis of an example snippet. In this figure, the capitalized REG is the symbolic value denoting the initial value of register reg upon function entry. The right two columns show the end states for the "functions" [0x805d900, 0x805d904] and [0x805ce70, 0x805d904], respectively. For "function" [0x805d900, 0x805d904], the registers ebx, esi, edi, and ebp do not preserve their values but instead obtain new values from the "return address" (location [ESP + 0]) and the "stack argument" region (locations [ESP + 4] to [ESP + 12]), so it is recognized as spurious. On the other hand, "function" [0x805ce70, 0x805d904] passes the callee-save register usage check. Note that in case register values end up as ⊤ (i.e., unknown), our analysis conservatively concludes no

violations.

Note that in the above two examples (Figure 5.3 and Figure 5.4), the spurious functions violate both control-flow and data-flow properties. This is not always the case: sometimes only a single interface checking technique is effective. However, by adopting a comprehensive mechanism, we significantly increase the confidence in correctness once a function passes checking.
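The callee-save part of the check can be sketched over the end state of the abstract stack analysis as follows; the symbolic value ("init", reg) stands for the initial value of reg at entry (EBX, ESI, ... in Figure 5.7), and TOP marks an unknown value. The state encoding is an assumption for illustration only.

    CALLEE_SAVE_X86_32 = ("ebx", "esi", "edi", "ebp")
    TOP = object()                                  # unknown abstract value

    def callee_save_check(end_state):
        """Verify callee-save register preservation at a candidate exit (a sketch).
        end_state: register -> abstract value at the candidate function's exit."""
        for reg in CALLEE_SAVE_X86_32:
            value = end_state.get(reg, TOP)
            if value is TOP:
                continue                            # inconclusive: assume no violation
            if value != ("init", reg):
                # e.g., ebx = *(ESP) at the exit of "function" [0x805d900, 0x805d904]
                return False
        return True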

5.5 Evaluation

5.5.1 Data Set

We used three data sets to evaluate our system.

Data Set 1. The first data set is the same as that used by the machine-learning based approaches, namely ByteWeight [15] and the work of Shin et al. [87]. Since our current implementation is limited to Linux ELF binaries, the comparison focuses on the subset of the results for this platform. Note that the vast majority (2064 of 2200) of the binaries in this data set are Linux binaries, so our results cover over 90% of the data used in these works. These 2064 binaries correspond to binutils, coreutils and findutils. They are compiled with the GNU (gcc) and Intel (icc) compilers, from no optimization to the highest optimization level, for both x86-32 and x86-64 architectures. These binaries include around 600,000 functions in total, with a total size of over 280MB.

Despite its size, this data set was found to be skewed by recent work [13]. Specifically, since many binaries are from the same package, they share a significant number of functions. This gives machine learning techniques an advantage because the functions used for learning and testing overlap significantly. Nevertheless, we use this data set in order to provide a direct comparison with the machine learning approaches.

Data Set 2. Our second data set is the set of SPEC 2006 programs. As compared to the first data set, which consists mostly of operating system utilities written in C, the SPEC programs are more diverse in terms of their applications as well as the programming languages used (C, C++, Fortran). Moreover, unlike the first data set, SPEC programs rarely share functions with each other. To compile these programs, we used the GCC compiler suite (gcc, g++, and gfortran) and LLVM (clang, clang++), and compiled with all optimization levels (O0-O3).

Data Set 3. Our third data set is the GNU C library, a suite of functionally related shared libraries (24 in total), including libc.so, libm.so, libpthread.so, etc. This is a more challenging data set for two reasons. First,

the code is low-level and contains many instances of hand-written assembly. Second, the binaries are in the form of position-independent code (PIC). We used GCC -O2 to compile for both x86-32 and x86-64 architectures.

5.5.2 Metrics

We use the same metrics as previous work [15, 87]: precision, recall, and F1-score. Their definitions are as follows. In these equations, TP denotes the number of true positives for identified functions, FP the number of false positives, and FN the number of false negatives.

    Recall = TP / (TP + FN)                                 (1)

Recall captures the fraction of functions in the binary that are correctly identified by an approach.

    Precision = TP / (TP + FP)                              (2)

Typically, these two metrics are combined using a harmonic mean into a quantity called the F1-score:

    F1 = 2 · Precision · Recall / (Precision + Recall)      (3)
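For completeness, these metrics correspond directly to the small helper below (a sketch; it simply restates equations (1)-(3), returning zero when a denominator vanishes).

    def prf1(tp, fp, fn):
        """Precision, recall, and F1 from counts of identified functions."""
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall == 0.0:
            return precision, recall, 0.0
        return precision, recall, 2 * precision * recall / (precision + recall)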

5.5.3 Implementation

Our main analysis framework is implemented in Python and consists of about 4100 lines of code. For the disassembler, we used objdump and reimplemented the error correction algorithm from BinCFI [106]. The main framework also includes all the major components described, including function start identification, function body traversal, and part of the interface checking. Our current analysis engine is based on angr [88].

We used angr mainly because it is a comprehensive binary analysis platform and supports both x86-32 and x86-64. We built our customized analysis on top of angr, without using any of its built-in function recovery algorithms. In fact, their accuracy is well below the best published results from machine learning systems (which we compare against in the following sections), probably because the primary goal of angr is offensive binary analysis [88], rather than robust recovery of program constructs.

                 ------- x86-32 -------    ------- x86-64 -------    overall
    Tool         P       R       F1        P       R       F1        F1
    ByteWeight   0.9841  0.9794  0.9817    0.9914  0.9847  0.9880    0.9849
    Neural       0.9956  0.9906  0.9931    0.9880  0.9780  0.9830    0.9881
    Ours         0.9978  0.9920  0.9948    0.9960  0.9948  0.9954    0.9951
    Error ratio  2.0000  1.1750  1.3269    2.1500  2.9423  2.6086    2.4286

Figure 5.8: Function start identification results from different tools

                 ------- x86-32 -------    ------- x86-64 -------    overall
    Tool         P       R       F1        P       R       F1        F1
    ByteWeight   0.9278  0.9229  0.9253    0.9322  0.9252  0.9287    0.9270
    Neural       0.9775  0.9534  0.9653    0.9485  0.8991  0.9232    0.9443
    Ours         0.9865  0.9809  0.9837    0.9912  0.9900  0.9906    0.9871
    Error ratio  1.6667  2.4398  2.1288    5.8523  7.4800  7.5851    4.3178

Figure 5.9: Function boundary identification results from different tools

5.5.4 Summary of Results

Figure 5.8 and Figure 5.9 summarize the function start and function boundary identification results for the first data set, respectively. Since the two most recent works [15, 87] have the best published results and outperformed previous tools (such as IDA and Dyninst [50]), we only compare our results with them. Because we tested with the same data set, we directly use the numbers they reported. In these figures (and the following ones), "P" denotes precision, while "R" denotes recall. Note that for each architecture, every number (for P/R/F1) is a mean over the corresponding 1032 binaries. To enable a direct comparison with the machine learning systems, we followed their practice of using an arithmetic mean8. The "error ratio" in the figures is defined as (1 − MAX(ByteWeight, Neural))/(1 − Ours) for each column. The overall error ratios for function starts and function boundaries are 2.43 and 4.32, respectively. Therefore, our system produced significantly better results.

The results for our second and third data sets are presented in Fig. 5.10. We compare with the most recent work in this field, Nucleus [13], which is also based on static analysis. We followed their way of summarizing results: using an average of geometric means over all optimization levels. Our F1 scores for this data set are consistently above 0.99, significantly higher than those of

8 When using a geometric or harmonic mean to summarize our results, the difference is less than 0.0006.

    Dataset &                   ---- x86-32 binaries ----    ---- x86-64 binaries ----
    compiler      Tool          P       R       F1           P       R       F1
    SPEC          Nucleus       0.97    0.89    0.92         0.97    0.90    0.93
    (GCC)         Ours          0.9988  0.9869  0.9927       0.9952  0.9861  0.9905
    SPEC          Nucleus       0.95    0.88    0.91         0.94    0.86    0.90
    (LLVM)        Ours          0.9978  0.9933  0.9955       0.9934  0.9902  0.9918
    GLIBC         Nucleus       -       -       -            -       -       -
    (GCC)         Ours          0.9846  0.9914  0.9879       0.9804  0.9840  0.9817

Figure 5.10: Function boundary identification results for SPEC 2006 and GLIBC.

Nucleus (around 0.92). We omit a breakdown by optimization level since our results are not sensitive to it: the F1 score differences are within 0.01.

As shown in the last two lines of the figure, ours is the first work to evaluate on GLIBC. Despite the challenges posed by PIC code, hand-written assembly and other low-level features, our techniques achieve an F1-score above 0.98.

5.5.5 Detailed Evaluation

In this section, we present a detailed evaluation on our second data set: the SPEC 2006 programs. For space reasons, we focus on the most widely used optimization level, -O2. As shown in Fig. 5.11 and Fig. 5.12, for a wide range of applications written in three different languages (C, C++ and Fortran) and compiled with two compilers (GCC and LLVM9), our overall F1-scores are no lower than 0.9817 for function boundaries. Note that the overall metrics are computed using the aggregated true positives, false positives and false negatives over the selected set of binaries. For many individual binaries, we achieve perfect (1.0000) precision and recall.

Distribution of Different Call Types. To understand how each step of the analysis contributes to the final set of identified functions, we list the corresponding results for the SPEC 2006 programs in Figure 5.13. To conserve space, only GCC-compiled (-O2) programs for the x86-32 architecture are shown; x86-64 binaries have similar results to their x86-32 counterparts.

As shown in the figure, the percentage of each function category is largely language and program dependent. For most C and Fortran programs, direct calls contribute the largest number of identified functions. (Note that this

9 LLVM does not have an official frontend for Fortran.

                                  ------- x86-32 -------    ------- x86-64 -------
    Program       Language Suite  P       R       F1        P       R       F1
    400.perlben.  C        int    0.9971  0.9983  0.9977    0.9743  0.9954  0.9847
    401.bzip2     C        int    1.0000  1.0000  1.0000    1.0000  1.0000  1.0000
    403.gcc       C        int    0.9959  0.9893  0.9926    0.9935  0.9946  0.9941
    429.mcf       C        int    1.0000  1.0000  1.0000    1.0000  1.0000  1.0000
    445.gobmk     C        int    0.9996  0.9996  0.9996    0.9980  0.9996  0.9988
    456.hmmer     C        int    1.0000  0.9980  0.9990    0.9941  1.0000  0.9970
    458.sjeng     C        int    1.0000  1.0000  1.0000    1.0000  1.0000  1.0000
    462.libquan.  C        int    1.0000  1.0000  1.0000    1.0000  1.0000  1.0000
    464.h264ref   C        int    1.0000  1.0000  1.0000    1.0000  1.0000  1.0000
    433.milc      C        fp     1.0000  1.0000  1.0000    1.0000  1.0000  1.0000
    470.lbm       C        fp     1.0000  1.0000  1.0000    1.0000  1.0000  1.0000
    482.sphinx3   C        fp     1.0000  1.0000  1.0000    0.9824  0.9911  0.9867
    471.omnet.    C++      int    0.9990  0.9946  0.9968    0.9975  0.9853  0.9914
    473.astar     C++      int    1.0000  1.0000  1.0000    1.0000  1.0000  1.0000
    483.xalan.    C++      int    0.9911  0.9923  0.9916    0.9883  0.9911  0.9897
    444.namd      C++      fp     1.0000  1.0000  1.0000    1.0000  0.9904  0.9951
    447.dealII    C++      fp     0.9601  0.9577  0.9592    0.9698  0.9494  0.9595
    450.soplex    C++      fp     1.0000  0.9764  0.9881    1.0000  0.9443  0.9713
    453.povray    C++      fp     0.9974  0.9707  0.9839    0.9913  0.9696  0.9802
    410.bwaves    F        fp     1.0000  1.0000  1.0000    1.0000  1.0000  1.0000
    416.gamess    F        fp     0.9931  0.9951  0.9941    0.9931  0.9979  0.9955
    434.zeusmp    F        fp     1.0000  0.9884  0.9941    0.9767  0.9882  0.9824
    435.gromacs   F        fp     0.9991  0.9981  0.9986    0.9982  0.9982  0.9982
    436.cactus.   F        fp     1.0000  1.0000  1.0000    1.0000  0.9977  0.9988
    437.leslie3d  F        fp     1.0000  1.0000  1.0000    0.9667  0.9355  0.9509
    454.calculix  F        fp     0.9940  0.9836  0.9887    0.9953  0.9514  0.9728
    459.Gems.     F        fp     0.9737  0.9487  0.9610    0.9733  0.9481  0.9605
    465.tonto     F        fp     0.9760  0.9598  0.9678    0.9634  0.9634  0.9634
    481.wrf       F        fp     0.9961  0.9958  0.9960    0.9972  0.9955  0.9963
    C overall     C        both   0.9977  0.9950  0.9963    0.9918  0.9966  0.9942
    C++ overall   C++      both   0.9839  0.9808  0.9823    0.9845  0.9757  0.9801
    F overall     F        fp     0.9902  0.9846  0.9874    0.9866  0.9826  0.9846
    Overall       all      both   0.9886  0.9849  0.9868    0.9852  0.9781  0.9817

Figure 5.11: SPEC 2006 results (GCC compiled binaries)

                                  ------- x86-32 -------    ------- x86-64 -------
    Program       Language Suite  P       R       F1        P       R       F1
    400.perlben.  C        int    0.9952  0.9940  0.9945    0.9898  0.9886  0.9892
    401.bzip2     C        int    1.0000  1.0000  1.0000    0.9867  0.9736  0.9801
    403.gcc       C        int    0.9977  0.9982  0.9979    0.9947  0.9942  0.9944
    429.mcf       C        int    1.0000  1.0000  1.0000    0.9143  0.9697  0.9411
    445.gobmk     C        int    0.9996  1.0000  0.9998    0.9996  0.9992  0.9994
    456.hmmer     C        int    1.0000  1.0000  1.0000    0.9958  0.9916  0.9937
    458.sjeng     C        int    1.0000  1.0000  1.0000    1.0000  1.0000  1.0000
    462.libquan.  C        int    1.0000  1.0000  1.0000    1.0000  1.0000  1.0000
    464.h264ref   C        int    1.0000  0.9981  0.9991    1.0000  0.9981  0.9990
    433.milc      C        fp     1.0000  1.0000  1.0000    1.0000  1.0000  1.0000
    470.lbm       C        fp     1.0000  1.0000  1.0000    1.0000  1.0000  1.0000
    482.sphinx3   C        fp     1.0000  1.0000  1.0000    0.9819  0.9939  0.9879
    471.omnet.    C++      int    0.9974  0.9604  0.9786    0.9963  0.9564  0.9759
    473.astar     C++      int    1.0000  1.0000  1.0000    1.0000  1.0000  1.0000
    483.xalan.    C++      int    0.9958  0.9888  0.9922    0.9920  0.9886  0.9903
    444.namd      C++      fp     1.0000  1.0000  1.0000    1.0000  1.0000  1.0000
    447.dealII    C++      fp     0.9942  0.9707  0.9823    0.9889  0.9731  0.9810
    450.soplex    C++      fp     1.0000  0.9847  0.9923    1.0000  0.9858  0.9929
    453.povray    C++      fp     1.0000  0.9566  0.9778    1.0000  0.9573  0.9782
    C overall     C        both   0.9982  0.9982  0.9982    0.9949  0.9946  0.9948
    C++ overall   C++      both   0.9959  0.9792  0.9875    0.9922  0.9796  0.9859
    Overall       all      both   0.9965  0.9847  0.9906    0.9930  0.9840  0.9885

Figure 5.12: SPEC 2006 results (LLVM compiled binaries)

    Binary        Total    Direct   Direct   Indirect  Unreachable
                  funcs.   call     jump
    400.perlben.  1742     48.22%   0.46%    40.18%    11.14%
    401.bzip2     81       61.73%   1.23%    9.88%     27.16%
    403.gcc       4653     68.56%   2.82%    20.89%    7.57%
    429.mcf       34       73.53%   0.00%    17.65%    8.82%
    445.gobmk     2543     26.27%   0.55%    70.31%    2.87%
    456.hmmer     504      54.96%   2.98%    5.56%     36.31%
    458.sjeng     146      72.60%   4.11%    8.90%     14.38%
    462.libquan.  109      68.81%   3.67%    6.42%     21.10%
    464.h264ref   535      79.63%   2.80%    7.29%     10.28%
    433.milc      246      79.67%   0.81%    3.25%     16.26%
    470.lbm       28       75.00%   0.00%    21.43%    3.57%
    482.sphinx    338      70.71%   1.18%    3.85%     24.26%
    471.omnet.    2036     27.36%   1.82%    56.93%    13.36%
    473.astar     98       75.51%   0.00%    8.16%     16.33%
    483.Xalan     13525    33.62%   2.60%    53.84%    9.18%
    444.namd      105      45.71%   0.00%    51.43%    2.86%
    447.dealII    7242     26.90%   1.20%    30.46%    37.21%
    450.soplex    935      43.42%   3.32%    38.93%    11.98%
    453.povray    1639     58.88%   1.59%    30.14%    6.47%
    410.bwaves    17       58.82%   0.00%    35.29%    5.88%
    416.gamess    2898     94.79%   1.04%    0.59%     3.49%
    434.zeusmp    86       66.28%   0.00%    6.98%     25.58%
    435.gromacs   1100     70.36%   1.09%    4.09%     24.36%
    436.cactus.   1311     44.47%   0.76%    14.80%    39.97%
    437.leslie3d  32       71.88%   0.00%    18.75%    9.38%
    454.calculix  1338     69.43%   2.24%    0.45%     26.83%
    459.Gems.     78       78.21%   2.56%    6.41%     10.26%
    465.tonto     3851     68.87%   4.05%    0.73%     24.51%
    481.wrf       2888     55.40%   3.84%    0.66%     39.89%
    Overall       50138    48.06%   2.16%    30.83%    17.70%

Figure 5.13: Functions identified in each step

includes direct calls made within functions that are themselves only indirectly reached.) Some C programs (such as 445.gobmk), however, contain a large number of indirect functions. For many C++ programs, because virtual functions10 are abundant, there are generally more indirectly reached functions. The fourth column presents the percentage of functions that are reached only by direct jumps (i.e., tail-called). Such functions are not rare in optimized binaries.

Note that for some benchmarks, the percentage of unreachable functions is quite high (17% on average and up to 40%). To verify these results, we selected a subset of these programs and used Pintools [65] to record the locations reached via calls or jumps. We found that none of these addresses corresponded to functions determined unreachable by our technique. After checking the source code, we found that the unreachable functions are mostly global (i.e., non-static) functions that are neither called directly nor have their addresses taken. Although they are not used, compilers don't generally remove them unless specific actions are taken during the build process to eliminate them. (Note that this is different from static functions, whose visibility is limited to the same compilation unit; it is more common for unused static functions to be removed by default.)

Effectiveness of Interface Checking Techniques. As discussed, function interface checking is critical in pruning spurious functions from the identified candidate set. In this section, we evaluate the effectiveness of each checking mechanism independently. The results are presented in Figure 5.14. Again, only GCC -O2 compiled binaries for x86-32 are shown.

As shown in the figure, each checking mechanism is independently effective in identifying a significant fraction of all spurious functions ("total pruned" in the figure). However, in general, no single mechanism is able to detect all spurious functions. It is through their combination that we can effectively reduce the number of spurious functions to a very low number. Note that for four of the binaries, no spurious functions are pruned; this is because all of the enumerated candidates happen to be real functions.

5.5.6 Performance

Our focus so far has been on accuracy rather than run time, and hence we have not made any systematic efforts to optimize performance. Nevertheless, it is useful to compare its performance against previous techniques.

As compared to machine-learning based approaches [15, 87], one of the advantages of our approach is that it does not require training, which is expensive.

10 Virtual functions can only be called indirectly, through a V-table.

    Binary        Total    --- Control flow checking ---    --- Data flow checking ---
                  pruned   entry     exit      internal     argument   callee-save
    400.perlben.  1630     91.60%    79.57%    39.26%       57.67%     53.31%
    401.bzip2     48       100.00%   33.33%    89.58%       85.42%     87.50%
    403.gcc       5845     94.06%    86.35%    38.25%       71.87%     46.14%
    429.mcf       0        0.00%     0.00%     0.00%        0.00%      0.00%
    445.gobmk     379      61.48%    89.18%    37.73%       46.70%     40.37%
    456.hmmer     245      89.80%    81.63%    40.41%       76.33%     60.82%
    458.sjeng     130      99.23%    44.62%    45.38%       72.31%     80.00%
    462.libquan.  1        0.00%     0.00%     0.00%        100.00%    0.00%
    464.h264ref   89       89.89%    73.03%    37.08%       84.27%     66.29%
    433.milc      43       88.37%    97.67%    6.98%        69.77%     65.12%
    470.lbm       0        0.00%     0.00%     0.00%        0.00%      0.00%
    482.sphinx    16       31.25%    31.25%    18.75%       68.75%     12.50%
    471.omnet.    130      72.31%    76.92%    22.31%       70.77%     43.08%
    473.astar     0        0.00%     0.00%     0.00%        0.00%      0.00%
    483.Xalan     1916     46.03%    65.55%    21.14%       80.64%     54.96%
    444.namd      2        0.00%     50.00%    50.00%       0.00%      100.00%
    447.dealII    1851     18.10%    65.80%    17.77%       64.94%     57.16%
    450.soplex    228      84.65%    75.44%    26.32%       62.72%     56.58%
    453.povray    1602     91.39%    38.76%    19.48%       75.28%     16.10%
    410.bwaves    0        0.00%     0.00%     0.00%        0.00%      0.00%
    416.gamess    3088     79.18%    56.99%    35.65%       73.74%     52.91%
    434.zeusmp    14       0.00%     50.00%    57.14%       71.43%     50.00%
    435.gromacs   360      84.44%    87.50%    34.44%       73.61%     35.56%
    436.cactus.   376      95.48%    80.59%    56.65%       84.31%     58.24%
    437.leslie3d  2        0.00%     100.00%   100.00%      50.00%     50.00%
    454.calculix  269      75.09%    87.36%    53.53%       40.15%     37.17%
    459.Gems.     72       80.56%    76.39%    50.00%       43.06%     62.50%
    465.tonto     1631     90.74%    88.90%    17.78%       41.02%     40.34%
    481.wrf       572      77.45%    93.71%    27.97%       29.55%     36.71%
    Overall       21360    77.44%    73.32%    31.75%       67.02%     47.17%

Figure 5.14: Effects of each checking mechanism

    Tool        ------------- Experiment setup -------------    -- x86-32 binaries --    -- x86-64 binaries --
                machine       CPU                    RAM         training   testing       training   testing
    ByteWeight  desktop       4-core 3.5GHz          16G         293 hc     457,997 s     293 hc     593,170 s
                              i7-3770K                                      (estimate)               (estimate)
    Neural      Amazon EC2    8-core 2.9GHz          15G         20 h       1,062 s       20 h       1,018 s
                c4.2xlarge    Intel Xeon
    Ours        laptop        4-core 1.7GHz          8G          0          47,880 s      0          36,300 s
                              i5-4210U

Figure 5.15: Experiment setup and performance comparison (hc = compute hours, h = hours, s = seconds)

The results of our analysis, together with those from ByteWeight [15] and the neural-network based system [87], are summarized in Figure 5.15. The numbers are based on our first data set, and on 10-fold cross validation for the machine learning systems.

The neural-network based system uses much less time for testing because it only identifies the bytes at which functions start and end, without recovering the function body. In comparison, ByteWeight and our system follow the CFG to identify function ends, and can therefore recognize the exact instructions belonging to a function and identify physically non-contiguous parts of the function.

Currently, it takes about 40 seconds on average to analyze a binary in our test suite. Although this is already satisfactory for many use cases, there are many opportunities for improvement. For example, a spurious function can be discarded immediately if its entry basic block already exhibits violating behavior, avoiding analysis of the whole function. This is in contrast to our current naive implementation, which performs the complete analysis and checks.

5.6 Case Studies

Since function recognition serves as an essential step for many techniques that work on binaries, to understand how well our approach can be used, we analyze several representative classes of applications. General evaluations have already shown that our approach has better accuracy than state-of-the-art machine learning techniques (Section 5.5); in this section, we focus on binary instrumentation tasks, which usually impose more stringent requirements.

5.6.1 Control-Flow Integrity

As discussed, CFI is an effective technique for mitigating code reuse attacks [6, 106, 104, 93, 94]. CFI can be coarse-grained or fine-grained, depending on the precision of the CFG computed by different static analysis techniques. It has been shown that coarse-grained CFI provides less security because it is more easily bypassed.

CFI is sensitive to the recall of indirectly reachable functions. Specifically, consider a CFI policy under which an indirect call must target the entry point of one of the indirectly reachable functions in the program. With a recall rate of less than 100%, some legitimate function entry points would not be identified, and hence CFI enforcement based on such an analysis can lead to runtime failures of legitimate programs.

To evaluate our system in the context of CFI enforcement, we tested with all SPEC 2006 programs compiled with GCC (-O2). We focus on indirectly reachable function starts in this case, because function ends are not required for the instrumentation. To get the ground truth for indirectly reachable function starts, we first preserve relocation information during the compilation and linking process. For each relocation entry, if the involved address matches a function start in the symbol table, we consider this address taken and the corresponding function indirectly reachable. Our results are then evaluated against this set.

The evaluation shows that for all programs in SPEC 2006, we achieve 100% recall for indirectly reachable function starts. Indeed, the perfect recall matches our expectation: by design, our analysis enumerates all indirectly reachable function start candidates, and then eliminates those spurious ones detected by our analysis.

We note that although a high precision rate is not critical for the soundness of CFI instrumentation, it impacts security. Therefore, a strategy that drastically sacrifices precision for perfect recall (e.g., the static analyses used by BinCFI) is unattractive. Instead, we achieve perfect recall while maintaining a high precision of more than 98% on average. This yields a 3-fold reduction in allowed targets as compared to BinCFI.

In comparison, machine-learning based systems [15, 87] identify function boundaries using a model of the surrounding code, and cannot easily be tuned towards 100% recall. Therefore, false negatives may occur for indirectly reachable functions (and others). For example, as Shin et al. have observed about their system [87]: "False negatives often occurred when instructions that would typically occur in the middle of functions occurred at the beginning of a function"; and when functions have multiple entries: "Many of the false negatives occurred at the second entry points to functions, given that the instructions before it are not the ones which usually end functions".

5.6.2 Function-based Binary Instrumentation

Many binary instrumentation techniques operate on individual functions as a unit [25, 82, 22]. Compared to CFI, which is recall-sensitive, these applications are usually more sensitive to precision. This is because an unrecognized function may simply be left untouched, while a misidentified function could break functionality if the technique assumes correct function boundary information. In this section, we analyze the applicability of our function identification system to these applications.

Since directly called functions are free of errors, and unreachable functions

    0818ba30:
      818ba30:  fld  0x86ed1d0       ; real function start
      818ba36:  fstp 0x8ca5af0
      ... (similar instructions)
      818c784:  fld  0x87202c0       ; spurious function start
      818c78a:  fstp 0x8ca5908
      818c790:  fld  0x87202c8
      818c796:  fstp 0x8ca5910
      ... (similar instructions)
      818c8d0:  mov  $0xf2,0x8ca5a4c
      818c8da:  mov  $0xf6,0x8ca5aec
      818c8e4:  ret

Figure 5.16: A falsely identified (indirectly reachable) function [818c784, 818c8e4]

are not relevant for correct functionality (as they are not executed at runtime), imprecision can originate from two sources: indirectly reachable functions and functions reached by direct jumps (tail-called functions). We analyze these two cases in turn.

In the first case, an address could be incorrectly identified as an (indirectly reachable) function start if our interface checking mechanism is insufficient. Although our comprehensive checking schemes are generally effective and remove the vast majority of spurious function starts, such misses do happen. Figure 5.16 shows one example. In this case, since all instructions access global memory and there are no stack or general-purpose register operations, our interface checking could not identify [818c784, 818c8e4] as a spurious function.

Despite these imprecisions, one distinguishing feature of our system is that the real function enclosing the spurious one is always identified. In Figure 5.16, for example, [818ba30, 818c8e4] is also recovered (recall that in our model, recovered functions can overlap or share code). With this property, different measures can be taken by different instrumentations to cope with the imprecision.

For RAD [25], no work is required at all, because the instrumentation is resistant to such imprecisions11. For other, more complicated instrumentations [82, 22], the overlapping functions could each have their own instrumented version

11 This is because, at the spurious function start, an extra, unneeded "return address" is pushed onto the shadow stack. While this slightly increases the attacker's options, it does not break program functionality, since at the function epilogue return addresses are popped repeatedly from the shadow stack until there is a match. Note that the true return address is present in the shadow stack, because the larger, real function is also recovered.

(which are disjoint), and an address translation scheme for indirect branches (commonly adopted by binary transformation systems [65, 106, 105, 89]) could be used. With this technique, an indirect call target is translated at runtime to point to its instrumented version before the control transfer. Since the falsely identified function is never called at runtime, the incorrect instrumentation will not be executed.

Our system may also falsely recognize intra-procedural jumps as direct tail calls (the second type of error). Essentially, this is equivalent to splitting the original function into two. However, we note that this does not introduce any correctness problems, as all executed instructions and exercised control flows have been captured.

The above analysis indicates that our function recognition is effective and leaves only limited possibility for error. The inaccuracies tend either to have no effect on the correctness of function-based instrumentation, or to be easily coped with. By comparison, since machine-learning based approaches rely on code or byte patterns, false positives for function starts and ends are much more random and difficult to deal with. Finally, as can be seen in both case studies, our system automatically classifies identified functions based on their reachability, which enables more flexible instrumentations.

5.7 Extensions

Special calling conventions. Currently, our data-flow checking technique is based on well-respected system ABIs and calling conventions, and it can be adapted to other architectures such as ARM [23]. We note that although non-standard calling conventions have not appeared in our tests, they could be used in some cases, e.g., for function calls within a single translation unit. To deal with this issue, a "self-checking" mechanism can be adopted.

Specifically, note that ABI violations can occur only in the context of direct calls and jumps. (Since a compiler cannot be sure about the target of an indirect control-flow transfer, it cannot assume that such a transfer is intra-module.) Since we don't apply interface checks to direct calls, ABI violations pose no problem in their context. That leaves direct jumps (i.e., direct tail calls) as the only problem case. We develop a self-checking mechanism for this case: we perform interface checks on a subset of directly called functions to determine whether the ABI is respected. If not, we identify a relaxed set of conventions that are respected in direct calls, and apply these relaxed checks to tail call verification. (Note that verification of indirectly called functions can continue to rely on the ABI.)
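One way this self-checking step could look is sketched below, reusing the liveness-based analysis of Section 5.4.2; live_argument_registers is a hypothetical helper that returns the registers live at a function's entry (excluding the stack pointer and callee-save spills).

    def derive_relaxed_argument_registers(direct_functions, abi_allowed):
        """Self-checking for non-standard calling conventions (a sketch).

        Directly called functions are known to be real, so any extra registers
        observed live at their entries widen the reference set used when
        verifying direct-jump (tail-call) targets."""
        relaxed = set(abi_allowed)
        for func in direct_functions:
            relaxed |= live_argument_registers(func)
        return relaxed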

6 Accurate Recovery of Function Types

After functions are recognized, another important task is to recover their types. A function type is given by the argument types and the return type. At the binary level, argument passing and value return are usually based on the stack or registers. However, the numbers and types of the arguments and return values are not present.

Function type recovery is critical for several applications. First, reverse engineering, especially decompilation, requires function type information. Moreover, the types of functions as well as of function pointers can be used to significantly refine the program call graph, which is useful for many static analyses. Finally, a fine-grained and effective CFI policy can be developed based on function type information. Note that in Section 4 we addressed the problem of backward-edge protection by incorporating a compatibility-aware shadow stack, but forward edges are not well protected.

In this chapter, we focus on the recovery of function types that can be used for deriving a fine-grained call graph. The call graph can be adopted in static analysis and, most importantly, in fine-grained forward-edge CFI enforcement. To that end, we develop novel static analysis techniques to recover function types for indirectly called functions, and also to identify the arguments used at each indirect callsite. Our analysis is designed to be conservative, so that a CFI policy can be enforced without resulting in false alarms. On the other hand, it is also fine-grained and provides stronger protection than existing CFI schemes. At a high level, our analysis works by accurately analyzing the number and types of arguments for functions and function pointers, and the enforcement policy is such that an indirect callsite can only reach callees with compatible signatures.

6.1 Indirect Callee Arguments

To analyze indirect callee arguments, the first step is to identify these functions. In Section 5.2.2, we described how indirectly reachable functions are recognized. We utilize the same technique here. Note, however, a slight difference: all scanned constants that pass interface checking are considered indirectly reachable functions, even though they could be directly reachable at the same time.

     1  struct A {
     2      int index;
     3      int addend;
     4      int array[4];
     5  };
     6
     7  int f(struct A s) {
     8      return s.array[s.index] + s.addend;
     9  }
    10
    11  f:
    12      movl 4(%esp), %eax
    13      movl 12(%esp,%eax,4), %eax
    14      addl 8(%esp), %eax
    15      ret

Figure 6.1: An example function with a struct argument

6.1.1 Challenges

For each function, its stack arguments are prepared by the caller and are placed right above the return address. Therefore, an effective approach to inferring stack arguments is to analyze how they are accessed. Specifically, since the stack arguments are located at positive offsets from the stack pointer value at function entry, we need to identify memory accesses to those locations. Obviously, more accurate argument information can lead to more precise CFI policies [92]. However, there are several challenges in obtaining accurate stack argument information through analysis at the binary level.

First of all, a single multi-word argument can be passed through multiple stack slots, in the same way as passing multiple word-sized arguments, and it may be challenging to differentiate between the two cases. Figure 6.1 presents such an example. The source code is shown in the upper part, while the corresponding assembly code is shown in the lower part. In this example, the only argument is of type struct A (non-scalar), and is passed by value: all its fields are pushed on the stack. Note that in the assembly, the fields index and addend are accessed on lines 12 and 14, respectively. However, it is impossible to determine whether they are fields of a struct or two different scalar arguments.

The second issue is aliasing: stack arguments can be accessed using indirect memory references. In Figure 6.1, the elements of A.array are accessed with an index variable whose value is not known at compile time. Line 13 shows the


6.1.2 Our Analysis

Recognizing the difficulties of exact function argument recovery, we propose an alternative: a "best-effort" argument recovery scheme. Ideally, this approach should be sound for CFI enforcement, yet not lose too much precision. Specifically, for each function, we propose to identify a definite argument region instead of the exact argument list. Our analysis ensures that any byte inside the argument region belongs to some argument (hence the word definite); however, there could be extra arguments that are not included in this region. This is essentially an underestimation of the memory region used for argument passing. However, as will be shown later, this result is useful and sound for enforcing finer-grained CFI for binaries.

To identify the definite argument region, we perform an abstract stack analysis as described in Section 3.3. Definite arguments are recognized if the following two properties are satisfied: 1) the abstract value of the memory access address is a singleton; and 2) the abstract value is of the form BaseSP + C, where BaseSP is a symbolic variable representing the stack pointer value on function entry, and C is a positive constant. The definite argument region is then determined from the obtained definite arguments: the region begins right above the return address and extends to the last byte of the definite argument with the largest offset.

Note that there may be bytes inside the definite argument region that do not belong to any definite argument, for two reasons. First, some arguments, although passed in, are not actually used by the callee; and second, some arguments may only be accessed through indirect memory references (e.g., A.array in Figure 6.1). In both cases, they are not recognized as definite arguments by our analysis. Even so, we still obtain the exact argument region unless the unused or indirectly referenced arguments are the last ones.
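To make this concrete, the following minimal Python sketch shows how the definite argument region of one function could be derived from the abstract values produced by the stack analysis. The SymExpr representation, the accesses list, and the word size are illustrative assumptions rather than the interface of our actual implementation.

    # Sketch: derive the definite argument region of one function from the
    # abstract values computed by the abstract stack analysis (Section 3.3).
    from collections import namedtuple

    # Abstract address: a base symbol (e.g., "BaseSP"), a constant offset,
    # and whether the abstract value is a singleton.
    SymExpr = namedtuple("SymExpr", ["base", "offset", "is_singleton"])

    WORD = 4  # stack slot size on 32-bit x86; the return address occupies [0, WORD)

    def definite_arg_top(accesses):
        """accesses: iterable of (SymExpr, size) pairs, one per memory access
        in the function. Returns the highest offset (relative to the stack
        pointer value on function entry) covered by a definite argument; the
        definite argument region is [WORD, result), so a result of WORD means
        no definite stack arguments were found."""
        top = WORD
        for addr, size in accesses:
            # A definite argument access has a singleton abstract value of the
            # form BaseSP + C, where C points above the return address.
            if addr.is_singleton and addr.base == "BaseSP" and addr.offset >= WORD:
                top = max(top, addr.offset + size)
            # Other accesses (non-singleton or not BaseSP-relative) are ignored;
            # ignoring them can only shrink the region, never enlarge it.
        return top

For the function in Figure 6.1, the accesses at lines 12 and 14 contribute offsets 4 and 8, so the definite argument region covers the 8 bytes holding index and addend; the indexed access at line 13 has a non-singleton abstract value and is ignored, which is exactly the underestimation discussed above.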

Function summaries A situation we need to handle is that the analyzed function may itself call other functions, which results in two complications: first, a callee may in some cases access the current function's arguments; and second, the stack pointer may change as a result of the callee invocation.

To address these issues, we utilize function summaries, as described in Section 3.3. Specifically, for direct function calls, we compute a summary for each callee and apply it at the callsite. The function summary includes not only the definite argument region the callee accesses (computed with the approach described above), but also the stack pointer (and other register) changes resulting from the callee invocation. To apply a function summary at a callsite, we check whether the callee's definite argument region has a part that lies above the caller's return address on the stack. If so, we merge this part into the caller's definite argument region. We also adjust the stack pointer (as well as other register) values based on the callee summary. The summaries are applied iteratively for all functions until a fixed point is reached.

For indirect calls, function summaries cannot be computed or applied, as the callee cannot be statically determined. In this case, we simply ignore the callee's effects. We next discuss why this does not affect the correctness of our analysis. We could fail to identify some stack arguments of the caller if they are accessed only by the (indirect) callee that we have ignored. However, this is relatively rare and only results in an underestimation of the argument region; as discussed, this is a precision issue and does not affect correctness. Another issue is stack pointer change. In some rare cases, an indirect callee invocation may not preserve the stack pointer. If we ignore this effect, our analysis may fail to identify the correct offsets of argument-accessing instructions after the indirect call. However, even in this extreme case, the analysis only loses precision, not correctness. The key observation is that if a function invocation does not preserve the stack pointer, its value can only become larger, which is the effect of callee cleanup (assuming the stack grows downwards). Therefore, failing to properly increase the stack pointer only makes the recognized stack region smaller, which satisfies our goal of deriving an underestimation of the stack arguments.
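The fixed-point propagation of summaries can be sketched as follows, in Python, with illustrative names and an assumed convention for expressing the callee's entry stack pointer in the caller's coordinates; this is a sketch of the idea, not the code of our implementation.

    # Sketch: apply callee summaries at direct callsites and iterate to a
    # fixed point. arg_top[f] is the highest definite-argument offset of f,
    # relative to f's entry stack pointer (WORD means "no stack arguments").
    WORD = 4

    def propagate_summaries(arg_top, direct_calls):
        """direct_calls: list of (caller, callee, callee_entry_sp), where
        callee_entry_sp is the callee's entry stack pointer expressed as an
        offset from the caller's entry stack pointer, as recovered by the
        abstract stack analysis (it already accounts for the word pushed by
        the call instruction)."""
        changed = True
        while changed:
            changed = False
            for caller, callee, callee_entry_sp in direct_calls:
                # Translate the callee's argument region into the caller's
                # coordinates; only the part above the caller's own return
                # address (offset >= WORD) counts as a caller argument.
                top_in_caller = callee_entry_sp + arg_top[callee]
                if top_in_caller > WORD and top_in_caller > arg_top[caller]:
                    arg_top[caller] = top_in_caller
                    changed = True
        return arg_top

Indirect calls are simply absent from direct_calls, which matches the treatment described above: their effects are ignored, at the cost of possible underestimation only.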

Variadic Functions A variadic function is a special kind of function that can take a varying number of arguments, and we need to make sure that our analysis works correctly for such functions. Figure 6.2 shows a typical variadic function. In this case, our analysis recognizes the first argument (fmt) because the accessed stack address is concrete. On the other hand, the variadic arguments (...) are not recognized because they are accessed using an aliased address12. This is actually the desired result: recognizing the normal arguments and ignoring the variadic arguments is conservative, as it provides an underestimation of the argument region.

12The abstract value for these arguments is BaseSP + [8, ∞). Again, BaseSP represents the stack pointer value on function entry.

 1  int compute(const char *fmt, ...) {
 2    va_list argp;
 3    const char *p;
 4    int tmp, result = 0;
 5
 6    va_start(argp, fmt);
 7    for (p = fmt; *p != '\0'; p++) {
 8      tmp = va_arg(argp, int);
 9      if (*p == 'a')
10        result += tmp;
11    }
12    va_end(argp);
13    return result;
14  }

Figure 6.2: A variadic function


6.2 Indirect Callsite Arguments

6.2.1 Challenges

In addition to callee argument analysis, we need to analyze the number of stack arguments passed at each indirect callsite. This may seem straightforward at first sight: simply record all stack slots pushed before the call. However, this simple approach can be inaccurate or even incorrect, because, due to compiler optimizations for example, the argument-pushing instructions are not necessarily present in the same basic block as the indirect call. Figure 6.3 presents such an example. The function pointer takes exactly two arguments; however, in the basic block starting at label .L6, only one argument is passed. The other argument is passed at line 17.

Another challenge is to differentiate stack arguments from local variables and register spills. In the same example, line 16 assigns 0 to the local variable x (corresponding to line 4 in the source code). However, it could be incorrectly recognized as passing an argument.


 1  extern void (*fp)(int, int*);
 2  int a, b;
 3  int test() {
 4    int x = 0;
 5    int *p = &x;
 6    if (a > 0)
 7      (*fp)(a, p);
 8    else
 9      (*fp)(a + 2, p);
10  }
11
12  test:
13    subl $12, %esp
14    movl a, %eax
15    leal 8(%esp), %edx
16    movl $0, 8(%esp)
17    movl %edx, 4(%esp)
18    testl %eax, %eax
19    jg .L6
20    addl $2, %eax
21  .L6:
22    movl %eax, (%esp)
23    call *fp
24    addl $12, %esp
25    ret

Figure 6.3: Challenges for identifying callsite arguments

6.2.2 Our Analysis

To overcome these problems, our insight is that exact argument information, although more precise, is not necessary for enforcing a finer-grained CFI; we can instead derive an overapproximation of the prepared arguments. Similar to the callee analysis, for a callsite we focus on the stack region that is used for passing arguments. In other words, our callsite analysis aims to infer an overapproximation of the argument region. Recall that our callee analysis produces an underapproximation of the argument region; therefore, our policy enforces that an indirect callsite can only target callees whose argument region is smaller than or equal to the callsite's, but not larger.

To identify the callsite argument region, we developed a reaching definition analysis for stack locations. Specifically, all stack locations (below the caller's return address) are considered, and a stack location is defined (i.e., considered a potential argument) if it is written to. Since the compiler knows the exact locations of callsite stack arguments, they are written using concrete offsets (relative to BaseSP, the stack pointer value on function entry). Therefore, our analysis ignores indirect memory writes (those whose address value set is not a singleton), which are not used for argument pushing. Note that this applies to callsites of variadic functions as well, as there is no difference from calls to normal functions (the number of arguments is known in both cases).

In our analysis, all stack locations are killed at an indirect call instruction (i.e., a location defined before the call is definitely not an argument of another indirect call that comes later). This is because the unknown callee might overwrite its own arguments during the invocation, so the compiler has to conservatively assume that all current stack arguments could be clobbered and cannot be reused for future indirect calls. Special care also needs to be taken for functions that use alloca. In assembly code, alloca subtracts an unknown delta from the stack pointer, so the stack pointer's new offset from BaseSP cannot be determined. Moreover, all previously defined locations should not be arguments of future indirect callsites, as their distances to the stack top are not statically known by the compiler. Therefore, our analysis kills all defined locations and assigns a new symbolic value NewSP to the stack pointer.

For each callsite, after the reaching definition analysis, the defined stack slots are the potential stack arguments. However, they are likely to contain spurious arguments that are actually local variables or register spills. We therefore develop further techniques to refine the initial results.
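The core of this reaching-definition analysis, restricted for brevity to a single path of instructions reaching the indirect call, could look as follows. The instruction records and field names are illustrative assumptions, and the real analysis of course runs over the CFG with the usual dataflow merging.

    # Sketch: collect the stack slots defined on one path to an indirect
    # call, with kills at earlier indirect calls and at alloca.
    # 'addr' is a SymExpr as in the earlier sketch (base, offset, is_singleton).
    def defined_slots_on_path(path):
        """path: list of instruction records ending at the indirect call,
        each a dict with a 'kind' field:
          {'kind': 'store', 'addr': SymExpr, 'size': int}
          {'kind': 'indirect_call'}
          {'kind': 'alloca'}   # stack pointer moves by an unknown amount
          {'kind': 'other'}
        Returns the set of defined slot offsets relative to BaseSP, the
        stack pointer value on function entry."""
        defined = set()
        for ins in path[:-1]:                 # everything before the call itself
            if ins['kind'] == 'store':
                addr = ins['addr']
                # Only writes with a singleton, BaseSP-relative address can be
                # argument pushes; indirect memory writes are ignored.
                if addr.is_singleton and addr.base == 'BaseSP':
                    defined.add(addr.offset)
            elif ins['kind'] in ('indirect_call', 'alloca'):
                # An earlier indirect call may clobber its own arguments, and
                # after alloca the distance to the stack top is unknown, so
                # previously defined slots cannot be callsite arguments.
                defined.clear()
        return defined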

First, we observe that stack arguments for a callee must all be pushed and laid out contiguously, to meet the callee's expectations. Therefore, an undefined slot indicates that the defined slots following it are all spurious arguments. For example, if the defined stack slot offsets13 are {4, 8, 16, 20}, we exclude offsets 16 and 20 as stack arguments, since offset 12 is not defined (i.e., it is definitely not prepared as a stack argument).

Our second technique for removing spurious arguments is to eliminate "callee-save" slots from the initial results. Since a function needs to preserve certain register values, these registers are first saved to callee-save slots on the stack and restored before the function returns. Obviously, these stack locations should not be clobbered and are not used to pass arguments. Although we could use a platform-agnostic yet complicated analysis [37], we opt to extract callee-save information from the .eh_frame section of binaries. Note that this information is widely available in stripped binaries for UNIX-like operating systems [45].

After spurious argument removal, the argument region for each callsite can be determined. As discussed, it can be used together with the callee argument region for fine-grained CFI enforcement.
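The two refinements can be sketched together as a small filter; offsets here are slot offsets relative to the callee's entry stack pointer (as in the example above), and the callee-save slots are assumed to have been extracted from .eh_frame beforehand. Names are illustrative.

    # Sketch: remove spurious callsite arguments. Offset 0 is the return
    # address, so candidate arguments start at WORD.
    WORD = 4

    def callsite_arg_region(defined_offsets, callee_save_slots):
        """defined_offsets: slot offsets from the reaching-definition analysis;
        callee_save_slots: slots used to save callee-saved registers.
        Returns the argument region as an exclusive upper offset, i.e., the
        region is [WORD, result)."""
        candidates = set(defined_offsets) - set(callee_save_slots)
        top = WORD
        # Arguments must be contiguous right above the return address, so we
        # stop at the first slot that was not defined.
        while top in candidates:
            top += WORD
        return top

With defined offsets {4, 8, 16, 20} and no callee-save slots, the function returns 12, excluding offsets 16 and 20 exactly as in the example above.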

6.3 CFI Policies

A CFI policy is concerned with restricting indirect control flow transfers to pre-defined sets of targets. As discussed, in this chapter we focus on forward edges, or more specifically, indirect calls.

Indirect calls can be used for callbacks or for dynamic function dispatch. In particular, in C++ programs all virtual function dispatches are based on indirect calls. Since virtual functions are abundant in C++ programs, it is common that the majority of indirect calls in these binaries are the result of virtual dispatches.

In the following sections, we present different fine-grained CFI policies based on function types. We first present a general policy that is widely applicable to all C/C++ programs, followed by a more precise policy for securing the virtual dispatches of C++ programs.

6.3.1 (Normal) Indirect Calls

For normal (non-virtual) indirect calls, our CFI policy is based on argument arity. Specifically, an indirect callsite is allowed to reach a function if the following constraints are satisfied:

• The target is the start of an indirectly reachable function.

13The slot size is 4 bytes. The offsets are relative to the stack pointer value on entry of the callee, and therefore the slot with offset 0 holds the return address.

• The callsite prepares no fewer arguments than the target function takes (i.e., its argument region is at least as large as the target's definite argument region).

Note that this policy is conservative. Since our analysis identifies more arguments for callsites but not fewer, and fewer arguments for callees but not more, the policy does not break any legitimate control flow transfers. On the other hand, it reduces the number of invalid targets allowed at each callsite. We quantify this in Section 6.4.
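In code form, the check performed for an indirect callsite amounts to the following; the sketch uses illustrative names, and the enforcement mechanism itself is discussed in Section 6.4.2.

    # Sketch: CFI check for a normal indirect call.
    def call_allowed(target, callsite_region, indirect_funcs, callee_region):
        """indirect_funcs: set of start addresses of indirectly reachable
        functions; callee_region: map from function start to its definite
        argument region; callsite_region: the (overapproximated) argument
        region prepared at this callsite."""
        if target not in indirect_funcs:
            return False                      # not a recognized function start
        # The callsite must prepare at least as large an argument region as
        # the target definitely consumes.
        return callsite_region >= callee_region[target]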

6.3.2 Virtual Calls

Section 2.5.2 presented existing techniques for restricting C++ virtual calls. In this section, we develop a more precise policy based on accurate analysis. With recovered function types, a natural choice is to combine them with the vfGuard policy (Section 2.5.2). Specifically, a target is allowed for a virtual call if the following two conditions are satisfied:

1. The target is an entry of a valid VTable, and its offset into the VTable is the same as that determined by the virtual callsite (the vfGuard policy).

2. The target function has a compatible signature with the virtual callsite (the same policy as for normal indirect calls).

Our CFI policy restricts this further. The key observation is that polymorphic objects usually have multiple virtual functions, and one such object tends to call different virtual functions over its lifetime. If different virtual calls are found to be invoked on the same object, the base CFI policy of each virtual call can be propagated to the others. Specifically, our policy enforces that virtual calls on the same object can only invoke virtual functions of a VTable for which every callsite-callee pair is compatible.

Figure 6.4 presents such an example. In this code snippet, the same this pointer is used for invoking different virtual functions. Specifically, in line 3 the this pointer is loaded into ebx from a memory location. The VTable pointer is retrieved from its first field and loaded into eax in line 4. The this pointer is pushed as the first argument in line 5, and a virtual call is invoked in line 6; its target comes from offset 4 of the VTable pointed to by eax. Lines 7-9 invoke another virtual function on the same this pointer, with offset 12 (0xc) into the VTable.

In this example, the two virtual calls are invoked on the same object pointer (ebx), and they are said to be in the same cluster. All recovered VTables are matched against this cluster, instead of against each virtual call individually as in vfGuard.

 1  push %ebx
 2  sub  $0x18,%esp
 3  mov  0x20(%esp),%ebx   ; 'this' ptr is loaded to %ebx
 4  mov  (%ebx),%eax       ; %eax points to the VTable
 5  mov  %ebx,(%esp)       ; 'this' ptr is pushed as argument for the 1st vcall
 6  call *0x4(%eax)        ; first virtual call
 7  mov  (%ebx),%eax       ; %eax points to the VTable
 8  mov  %ebx,(%esp)       ; 'this' ptr is pushed as argument for the 2nd vcall
 9  call *0xc(%eax)        ; second virtual call
10  ...

Figure 6.4: Multiple virtual calls with the same this pointer

A VTable is matched only if each virtual call of the cluster satisfies the base policy with it. If a VTable is matched, the virtual function address at the corresponding offset for each virtual call is collected into that call's allowed target set.

Figure 6.5 illustrates this procedure. The left part of the figure shows the same code as Figure 6.4, with non-call instructions removed for clarity. The right part presents the identified VTables. Note that only the virtual function arrays of the VTables are shown, and the corresponding offsets of the virtual function entries are marked. The numbers in parentheses indicate the numbers of arguments of the virtual callsites and virtual functions, as detected by our analysis. In this figure, the vfGuard policy allows the control flow transfers indicated by all the arrows; the second callsite is not allowed to target VTable B, as there is no corresponding entry at offset 0xc. Our policy is stronger in that all control flow transfers represented by dashed arrows are also disallowed. For example, edge (c), originating from the first virtual call, is prevented because the second virtual call indicates that VTable B cannot be a match for the object. Moreover, edges (d) and (e) are not allowed: the second virtual call has an incompatible signature with the corresponding virtual function in VTable C, indicating a mismatch between the object and VTable C.
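A sketch of the cluster-based matching follows; the VTable, cluster, and callee_region representations are illustrative assumptions, not the data structures of our implementation.

    # Sketch: match recovered VTables against a cluster of virtual calls
    # that share one object pointer, and collect allowed targets.
    def match_vtable(vtable, cluster, callee_region):
        """vtable: {vtable_offset: virtual function start address};
        cluster: list of (vtable_offset, callsite_region) pairs, one per
        virtual callsite in the cluster; callee_region: map from function
        start to its definite argument region. Returns True iff every
        callsite in the cluster satisfies the base policy for this VTable."""
        for offset, callsite_region in cluster:
            target = vtable.get(offset)
            if target is None:                           # no entry at this offset
                return False
            if callsite_region < callee_region[target]:  # incompatible signature
                return False
        return True

    def allowed_targets(vtables, cluster, callee_region):
        """For each callsite offset in the cluster, collect the virtual
        function entries of all matched VTables."""
        allowed = {offset: set() for offset, _ in cluster}
        for vt in vtables:
            if match_vtable(vt, cluster, callee_region):
                for offset, _ in cluster:
                    allowed[offset].add(vt[offset])
        return allowed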

Static Analysis To enable the above policy, we develop static analysis techniques to cluster virtual calls based on whether they operate on the same object. To capture the usage of object (i.e., this) pointers, precise data flow information is required. We therefore use an approach similar to vfGuard.

[Figure 6.5 shows the two virtual callsites of Figure 6.4 on the left and three recovered VTables (A, B, and C) on the right; the numbers in parentheses give the argument counts detected for each callsite and each VTable entry, and arrows (a)-(e) mark candidate control flow transfers.]

Figure 6.5: CFI policies for virtual calls

We choose to transform each function into static single assignment (SSA) form [30, 95], since def-use and use-def chains are then explicit. Our analysis is intra-procedural and is performed on functions that contain multiple indirect callsites. The function is first lifted into an intermediate representation (IR), and its control-flow graph (CFG) is built. After the code is transformed into SSA form, a phase of definition propagation is performed. This procedure is recursive and stops when registers and memory locations are defined by expressions containing only the initial values of registers, function arguments, or global variables. Since multiple paths may be involved, the target of a virtual call may be represented as a collection of expressions, each corresponding to a specific path. If the object pointer is the same for all paths of different virtual calls, these virtual calls are added into the same cluster. As an example, for the function shown in Figure 6.4, our analysis generates the following target expressions for the two callsites:

call *(*(*(esp0 + 4)) + 4)                                        (4)

call *(*(*(esp0 + 4)) + 12)                                       (5)

Since *(esp0 + 4) is used as the this pointer for both virtual callsites, they are added into the same cluster.
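Clustering itself then reduces to grouping callsites by their recovered object-pointer expression. A simplified sketch follows, with expressions assumed to be normalized strings after definition propagation; the names are illustrative.

    # Sketch: group virtual callsites whose 'this'-pointer expression is the
    # same along every path.
    from collections import defaultdict

    def cluster_virtual_calls(callsites):
        """callsites: list of (callsite_addr, path_exprs), where path_exprs is
        the set of propagated object-pointer expressions, one per CFG path
        reaching the callsite. A callsite joins a shared cluster only if all
        of its paths agree on a single expression (a conservative choice)."""
        clusters = defaultdict(list)
        for addr, path_exprs in callsites:
            if len(path_exprs) == 1:
                clusters[next(iter(path_exprs))].append(addr)
            else:
                clusters[('singleton', addr)].append(addr)  # keep it alone
        return list(clusters.values())

For Figure 6.4, both callsites yield the single expression *(esp0 + 4), so they end up in one cluster.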

                      Callees                      Callsites
file           perfect    total        %     perfect    total        %
astar               81       86   94.19%           1        1  100.00%
bzip2               48       53   90.57%          11       11  100.00%
gcc               3138     3489   89.94%          57       70   81.43%
gobmk              723     2332   31.00%          26       27   96.30%
h264ref            296      353   83.85%           5        8   62.50%
hmmer              434      451   96.23%           2        2  100.00%
libquantum          77       84   91.67%           0        0        -
mcf                 21       23   91.30%           0        0        -
omnetpp           1355     1704   79.52%         103      175   58.86%
perlbench          892     1146   77.84%          19       23   82.61%
sjeng               83       88   94.32%           0        0        -
geomean            199      244   81.48%          12       14   83.59%

Figure 6.6: Statistics for function argument analysis accuracy

6.4 Evaluation

6.4.1 Analysis Precision

To evaluate the precision of our function type analysis, we extracted ground truth from the DWARF debug information of the ELF binaries. Note that our analysis operates on the stripped versions of these binaries and does not make use of this debug information.

For the callee argument evaluation, we first extract the functions identified by the DW_TAG_subprogram tag, which includes the function start address as one of its properties. For each function, we extract its argument information from entities with the DW_TAG_formal_parameter tag, which include properties describing the location, type, and size of each argument. Based on this information, we derive the argument region as the ground truth and compare it with our analysis results.

The left part of Figure 6.6 shows the precision evaluation for the callee (function) argument analysis. The total column gives the total number of functions for each binary, while the perfect column indicates the number of functions for which our analysis produces a perfectly accurate result (the same argument region as in the ground truth). As presented in the figure, in 82% of cases (geomean) our analysis is perfectly accurate and does not underestimate arguments.
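For reference, the function-level part of this ground truth can be enumerated with pyelftools roughly as follows; decoding each parameter's DW_AT_location and type size, which is needed to compute the exact argument region, is omitted, and the script is only a sketch of the idea rather than our actual harness.

    # Sketch: enumerate functions and their formal parameters from DWARF.
    from elftools.elf.elffile import ELFFile

    def dwarf_functions(path):
        """Returns {function start address: number of formal parameters}."""
        funcs = {}
        with open(path, 'rb') as f:
            elf = ELFFile(f)
            if not elf.has_dwarf_info():
                return funcs
            for cu in elf.get_dwarf_info().iter_CUs():
                for die in cu.iter_DIEs():
                    if die.tag != 'DW_TAG_subprogram':
                        continue
                    low_pc = die.attributes.get('DW_AT_low_pc')
                    if low_pc is None:
                        continue              # skip declarations without code
                    params = [c for c in die.iter_children()
                              if c.tag == 'DW_TAG_formal_parameter']
                    funcs[low_pc.value] = len(params)
        return funcs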

To evaluate the callsite analysis, we rely on two approaches. First, similar to the callee analysis evaluation, we utilize debug information as ground truth. Although the DWARF standard has no official support for this yet [1], GCC fortunately has an extension that encodes the arguments prepared at callsites [5]. However, since the extension is intended for another purpose (accurate backtraces for debugging), one limitation is that the information in binaries is incomplete: some callsites contain no argument information, or contain information for only some arguments. Therefore, the only verification we can do is to check that the argument region produced by our analysis includes the region derived from the debug information, or in other words, that our analysis does not over-approximate arguments. The evaluation on the SPEC 2006 programs indicates zero errors for our callsite argument analysis, as compared to the (incomplete) debug information.

For a more comprehensive evaluation, we used a second method: we developed a Pintool [65] for dynamic analysis. Specifically, the tool records all indirect edges that are taken (i.e., the source and target instructions) at runtime. We also assume that if there is a control transfer from a callsite to a callee, they have matching signatures, i.e., the same number and types of arguments. Since the debug information gives us the ground truth for each function, we can use it for the corresponding callsite. The right part of Figure 6.6 shows the results of this evaluation: in 84% of cases (geomean), our callsite analysis produces perfect argument region information.

6.4.2 Policy Correctness

Although a stricter CFI policy is desirable for stronger protection, it is even more important that the deployed policy is correct, i.e., compatible with benign executions of the program; otherwise, the policy would not be adopted due to false alarms. To evaluate the correctness of our policy, we used the same Pintool as in the previous section. For the records generated by the dynamic analysis, we check whether each edge violates the CFI policies we have generated; the policies for both normal indirect calls and virtual calls are checked. We evaluated the SPEC 2006 benchmarks and found no such violations, which indicates that our policy is compatible with benign program executions.

Note that although our policies can be enforced using different platforms [105, 65, 50], the actual instrumentation is not the focus of this work. Indeed, a CFI policy is simply a mapping from a control-flow transfer instruction to a set of allowed targets, and existing systems can be readily used to realize our policy efficiently. This has been demonstrated repeatedly by many CFI solutions [106, 104, 75, 94].
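The compatibility check itself is a simple replay of the recorded edges against the generated policy; a small sketch with an assumed trace format:

    # Sketch: report dynamically observed edges that the policy would block
    # (each such edge would be a false alarm under enforcement).
    def policy_violations(edges, policy):
        """edges: iterable of (callsite_addr, target_addr) pairs recorded by
        the Pintool; policy: {callsite_addr: set of allowed targets}."""
        violations = []
        for src, dst in edges:
            allowed = policy.get(src)
            if allowed is not None and dst not in allowed:
                violations.append((src, dst))
        return violations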

File          BinCFI    IFunc   Reduction   Type check   Reduction   Overall reduction
astar             10       10       0.00%            4      60.00%              60.00%
bzip2             58       10      82.76%            2      80.00%              96.55%
gcc             7751     1222      84.23%          885      27.58%              88.58%
gobmk           2580     1798      30.31%         1542      14.24%              40.23%
h264ref          169       48      71.60%           34      29.17%              79.88%
hmmer            260       31      88.08%           23      25.81%              91.15%
libquantum         9        9       0.00%            7      22.22%              22.22%
mcf                8        8       0.00%            6      25.00%              25.00%
omnetpp         1369     1248       8.84%          760      39.10%              44.49%
perlbench       2595      731      71.83%          341      53.35%              86.86%
sjeng            152       16      89.47%           11      31.25%              92.76%
geomean          454      153      66.34%           83      45.83%              81.77%

Figure 6.7: Indirect call targets reduction of our language-agnostic CFI policy

6.4.3 CFI Strength

Several metrics have been proposed to evaluate the protection strength of a CFI scheme [106, 18]. However, their limitations have been recognized and discussed [19, 18]. While we acknowledge that a more systematic evaluation approach should be developed in future research, we utilize existing metrics to enable direct comparison. Generally, the more targets an indirect call is allowed to reach, the more freedom an attacker is granted, and the more likely an exploit can be launched. Therefore, we evaluate how our technique reduces the set of possible targets. Specifically, we first compare our language-agnostic policy with that of BinCFI [106], an existing coarse-grained CFI technique that works on COTS binaries14. Then our CFI policy for virtual calls is evaluated against that of vfGuard [77], a state-of-the-art precise virtual call protection system.

Figure 6.7 presents our evaluation results for the SPEC 2006 programs. Column BinCFI shows the number of targets allowed for each indirect call of the executable.

14We did not compare with another coarse-grained CFI scheme, CCFIR [104] (which has a similar CFI policy and hence similar protection strength), because it is specific to Windows and relies on relocation information that is not available for Linux binaries. In addition, we leave out a comparison with the other fine-grained CFI [94], because it only works on x86-64 and is orthogonal to our approach.

[Figure 6.8 plot: fraction of virtual callsites (y-axis, 0 to 0.7) versus the number of virtual calls identified on the same object pointer (x-axis, 1 to 27), for omnetpp, Xalan, TinyXML, HT, and Xpdf.]

Figure 6.8: Distribution of virtual callsites based on the number of identified virtual calls on the same object pointer

Column IFunc is the number of indirectly reachable functions we have identified. By enforcing that indirect calls can only reach these targets, we achieve a reduction of 66% (geomean). The next column shows the median15 number of targets that an indirect call can reach under our argument region checking policy. As the adjacent Reduction column shows, this further reduces the targets by 46% (geomean) and brings the overall reduction to 82% compared to BinCFI.

Since the CFI precision for C++ virtual calls depends on the number of virtual callsites identified as being invoked on the same object pointer, we first show this distribution for a set of programs in Figure 6.8. As presented in the figure, a significant fraction of virtual callsites share an object pointer with one or more others, and up to 27 different virtual calls are invoked on the same object pointer.

For the same set of programs, Figure 6.9 shows the reduction of virtual call targets for our C++-specific policy as compared with vfGuard. The vfGuard, +SameObj, and +Type+SameObj columns show the median number of allowed targets under each policy: "+SameObj" propagates the offset constraint (as in vfGuard) across virtual calls that share the same object pointer, and "+Type+SameObj" additionally applies the function type constraint. As presented, about 40% of the targets can be removed (geomean). The reduction stems from the combination of the same-object and type constraints; same-object propagation alone does not lead to much reduction.

15Each callsite can reach a different number of targets, hence a statistically representative number is needed.

File          vfGuard   +SameObj   +Type+SameObj   Reduction
omnetpp            20         20              13       35.0%
Xalan             309        220             155       49.8%
TinyXML             6          6               3       50.0%
HT Editor          34         32              24       29.4%
Xpdf               38         38              24       36.8%
geomean            34         32              20       41.2%

Figure 6.9: Virtual call targets reduction of our CFI policy

Note that the fraction of virtual calls found to be invoked on the same object as some other virtual call corresponds, for each program, to the sum of all y-axis values except x = 1 in Figure 6.8.
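The summary numbers in Figures 6.7 and 6.9 appear to follow the usual geometric-mean convention; for completeness, a small illustrative computation (not our evaluation script):

    # Sketch: geometric mean of per-binary target counts and the resulting
    # reduction relative to a baseline policy.
    import math

    def geomean(values):
        return math.exp(sum(math.log(v) for v in values) / len(values))

    def reduction(baseline_counts, our_counts):
        """Reduction reported as 1 - geomean(ours) / geomean(baseline)."""
        return 1.0 - geomean(our_counts) / geomean(baseline_counts)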

6.5 Security Analysis

In this section, we evaluate the effectiveness of our approach against advanced code reuse attacks.

6.5.1 Coarse-Grained CFI Bypass Attacks

It has been shown that coarse-grained CFI is bypassable [47, 32]. In this section, we analyze how our approach can block such attacks, using the proof-of-concept example from the original paper [47]. The original exploit has several phases. Because the vulnerability can be used to corrupt a function pointer, the first phase of the attack is to transfer control to a return instruction. The second phase pivots the stack and leverages an advanced ROP chain that only uses gadgets conformant with coarse-grained CFI; finally, a system call is invoked to change memory permissions, enabling a straightforward code injection attack. Next we focus on the first phase of the attack and show that the exploit can be defeated even without backward edge protection.

Note that the first phase of the attack only uses gadgets that begin at valid function entry points, in order to bypass coarse-grained CFI protection. However, one key step is to make the indirect callsite (incorrectly) target a function that takes fewer arguments. Figure 6.10 shows the gadget that performs this task. Our analysis detects that line 5 is actually a callee-save instruction, and hence the indirect call at line 6 prepares 0 arguments.

1  mov  %edi,%edi
2  push %esi
3  mov  %ecx,%esi
4  mov  (%esi),%eax
5  push %edi              ; actually a callee save, but accessed as an argument
                          ; by the function called on the next line
6  call *0xc4(%eax)

Figure 6.10: A gadget in the PoC exploit

Moreover, because the callee takes at least one argument (more than the callsite prepares), this control flow is blocked by our CFI policy, and the attack is defeated.

6.5.2 COOP Attacks

An end-to-end COOP attack can be divided into two stages. In the first stage, an attacker exploits a vulnerability of a program, uses the obtained memory write capability to prepare a set of counterfeit objects, and redirects control flow to the main loop [83]. In the second stage, the main loop dispatches vfgadgets based on the counterfeit objects. Although a COOP variant can replace the main loop with recursion [29], conceptually the same goal is achieved, so we uniformly use "main loop" to refer to the sequential dispatch of prepared vfgadgets.

Next, we discuss how COOP is mitigated in each of the two stages. We first focus on the effects of the VTable offset constraints and function type constraints on vfgadget dispatches inside the main loop, and then explain how our CFI policy prevents the main loop from being reached in the first place, with even higher precision.

Inside the main loop. The original COOP attack [83] introduced two variants for abusing vfgadgets (i.e., virtual function gadgets). The first variant can utilize a fake VTable inside a read-only section, while the second variant is restricted to valid VTables. Although the first variant can bypass coarse-grained VTable protection schemes [46, 103], it is stopped by a stricter policy such as vfGuard's [77]. The second variant is stealthier; however, since only valid VTables can be used, the available vfgadgets are more limited. As discussed in the original COOP paper, the vfGuard policy will be effective at least for small- to medium-sized C++ binaries, due to the limited number of available vfgadgets. Although sufficient vfgadgets may be collected for larger binaries, our policy mitigates this by imposing further constraints.

Specifically, the function types of the virtual callsite and the callee also have to be compatible.

To enable data flows between vfgadgets, COOP uses two techniques. The first is based on the fields of counterfeit objects: one vfgadget writes to an object field, while another reads from it. Since counterfeit objects can overlap, a single data field may belong to two or more objects at the same time; therefore, it is possible that a vfgadget of class A accesses an object field of class B, even if the two are completely unrelated. The second technique is based on implicit data flows through argument registers. For example, suppose vfgadget f is called before vfgadget g. If f modifies some scratch registers as a side effect of its invocation, and these registers happen to be the ones used for argument passing (such as rcx and rdx), then g can be called with attacker-intended arguments. TypeArmor [94] has shown that all existing COOP attacks can be stopped by its restrictions on invalid data flows through register arguments. For architectures that mainly rely on the stack for argument passing, our function type enforcement provides similar protection. Therefore, for COOP to be evasive, the second data flow technique cannot be used.

Under our policy, to abuse a virtual call, attackers are left with vfgadgets whose offset in a VTable conforms with the callsite and whose signature is compatible. These constraints make finding vfgadgets with the desired semantics, and passing information between them, much more difficult, if not impossible, and therefore effectively mitigate COOP.

Before the main loop. As discussed, the attacker first needs to hijack control flow to reach the main loop. In the context of a C++ program, a virtual call would be the common choice for abuse. Therefore, the VTable offset constraints and function type constraints play the same role in confining this virtual call. However, our CFI policy based on "same object pointer" analysis places further restrictions. In the example shown in Figure 6.5, suppose the attacker uses the first virtual call to reach the main loop gadget, which is pointed to by the entry at offset 4 of VTable C. If only the VTable offset constraints and function type constraints were enforced, the attacker would successfully redirect control to the main loop. However, our CFI policy detects that the two virtual calls are based on the same object pointer, and the constraints on the second virtual call are propagated to the first; therefore, that edge is prevented. As this example shows, our CFI policy protects the control transfer to the main loop with even higher precision.

7 Conclusions and Future Work

Function recovery for COTS binaries is a comprehensive and challenging problem. There are many tasks involved: from basic function boundary identification, to the recovery of high-level constructs and properties, up to full function decompilation. Accuracy of function recovery is key for enabling many applications, and it requires deep analysis of function properties and constructs.

In this dissertation, we started from accurately recovering function calls and returns. Although seemingly straightforward, call and return detection cannot rely only on instruction syntax, due to complications such as low-level code and hand-written assembly. We therefore developed a static analysis to accurately infer function call and return intentions. With this information, backward control flow edges can be protected with improved compatibility as well as precision.

We then extended our analysis to function recognition. A more comprehensive scheme was used to accurately reason about function entry and exit points. To that end, we developed effective analysis techniques to check control flow and data flow properties associated with a function interface. Our system not only significantly outperforms existing machine-learning and static analysis based approaches, but is also more amenable to demanding analysis and instrumentation applications.

Function type defines how functions interact with each other, and its recovery can be used for many purposes. Our analysis focused on deriving stack arguments for both indirect callees and callsites. We utilized this information to enforce more fine-grained forward-edge CFI policies for both normal indirect calls and C++ virtual calls.

Different from general reverse engineering, our function recovery is designed to support a wide range of further analyses and to facilitate binary hardening with instrumentation. Using backward and forward edge CFI as examples, we demonstrated that robust and precise protection for COTS binaries can be enabled with deep static analysis of functions.

In the future, we plan to incorporate other static analysis techniques for accurate function recovery. One direction is to leverage type inference for function argument analysis. The specific types of arguments, in addition to the count of arguments, will further improve CFI precision as well as enable other applications.

References

[1] DWARF debugging information format version 4. http://www.dwarfstd.org/doc/DWARF4.pdf.
[2] Hex rays. https://www.hex-rays.com/index.shtml.
[3] HT editor 2.0.20 - buffer overflow (ROP PoC). https://www.exploit-db.com/exploits/22683/.
[4] Itanium C++ ABI. https://mentorembedded.github.io/cxx-abi/abi.html.
[5] Representation of call sites in the debugging information. http://www.dwarfstd.org/ShowIssue.php?issue=100909.2&type=open.
[6] M. Abadi, M. Budiu, U. Erlingsson, and J. Ligatti. Control-flow integrity. In CCS, 2005.
[7] M. Abadi, M. Budiu, U. Erlingsson, and J. Ligatti. Control-flow integrity principles, implementations, and applications. ACM TISSEC, 2009.
[8] A. V. Aho, R. Sethi, and J. D. Ullman. Compilers: Principles, Techniques, & Tools, 2nd Edition. Addison Wesley, 2006.
[9] P. Akritidis, M. Costa, M. Castro, and S. Hand. Baggy bounds checking: An efficient and backwards-compatible defense against out-of-bounds errors. In USENIX Security, 2009.
[10] K. Anand, M. Smithson, K. Elwazeer, A. Kotha, J. Gruen, N. Giles, and R. Barua. A compiler-level intermediate representation based binary analysis and rewriting system. In ACM EuroSys, 2013.
[11] D. Andriesse, X. Chen, V. van der Veen, A. Slowinska, and H. Bos. An in-depth analysis of disassembly on full-scale x86/x64 binaries. In USENIX Security, 2016.
[12] D. Andriesse, A. Slowinska, and H. Bos. Compiler-agnostic function detection in binaries. In EuroS&P, 2017.
[13] D. Andriesse, A. Slowinska, and H. Bos. Compiler-agnostic function detection in binaries. In EuroS&P, 2017.
[14] G. Balakrishnan and T. Reps. WYSINWYX: What you see is not what you eXecute. ACM TOPLAS, 2010.

[15] T. Bao, J. Burket, M. Woo, R. Turner, and D. Brumley. Byteweight: Learning to recognize functions in binary code. In USENIX Security, 2014.

[16] D. Bounov and S. Lerner. Protecting c++ dynamic dispatch through vtable interleaving. In NDSS, 2016.

[17] D. Brumley, I. Jager, T. Avgerinos, and E. J. Schwartz. Bap: A binary analysis platform. In Computer aided verification, 2011.

[18] N. Burow, S. A. Carr, S. Brunthaler, M. Payer, J. Nash, P. Larsen, and M. Franz. Control-flow integrity: Precision, security, and performance. arXiv preprint arXiv:1602.04056, 2016.

[19] N. Carlini, A. Barresi, M. Payer, D. Wagner, and T. R. Gross. Control-flow bending: On the effectiveness of control-flow integrity. In USENIX Security, 2015.

[20] N. Carlini and D. Wagner. ROP is still dangerous: Breaking modern defenses. In USENIX Security, 2014.

[21] M. Chandramohan, Y. Xue, Z. Xu, Y. Liu, C. Y. Cho, and H. B. K. Tan. Bingo: Cross-architecture cross-os binary search. In FSE, 2016.

[22] X. Chen, A. Slowinska, D. Andriesse, H. Bos, and C. Giuffrida. Stackarmor: Comprehensive protection from stack-based memory error vulnerabilities for binaries. In NDSS, 2015.

[23] Y. Chen, D. Zhang, R. Wang, R. Qiao, A. Azab, L. Lu, H. Vijayakumar, and W. Shen. Norax: Enabling execute-only memory for COTS binaries on AArch64. In S&P, 2017.

[24] T. Chiueh and F. Hsu. RAD: A compile-time solution to buffer overflow attacks. In ICDCS, 2001.

[25] T. Chiueh and M. Prasad. A binary rewriting defense against stack based overflows. In USENIX ATC, 2003.

[26] C. Cifuentes and M. Van Emmerik. Recovery of jump table case statements from binary code. In IEEE International Workshop on Program Comprehension, 1999.

[27] M. Conti, S. Crane, L. Davi, M. Franz, P. Larsen, C. Liebchen, M. Negro, M. Qunaibit, and A. Sadeghi. Losing control: On the effectiveness of control-flow integrity under stack attacks. In CCS, 2015.

[28] P. Cousot and R. Cousot. Abstract interpretation: a unified lattice model for static analysis of programs by construction or approximation of fixpoints. In POPL, 1977.

[29] S. J. Crane, S. Volckaert, F. Schuster, C. Liebchen, P. Larsen, L. Davi, A.-R. Sadeghi, T. Holz, B. De Sutter, and M. Franz. It’s a trap: Table randomization and protection against function-reuse attacks. In CCS, 2015.

[30] R. Cytron, J. Ferrante, B. K. Rosen, M. N. Wegman, and F. K. Zadeck. Efficiently computing static single assignment form and the control dependence graph. 1991.

[31] T. H. Dang, P. Maniatis, and D. Wagner. The performance cost of shadow stacks and canaries. In ASIACCS, 2015.

[32] L. Davi, A. Sadeghi, D. Lehmann, and F. Monrose. Stitching the gadgets: On the ineffectiveness of coarse-grained control-flow integrity protection. In USENIX Security, 2014.

[33] L. Davi, A. Sadeghi, and M. Winandy. ROPdefender: A detection tool to defend against return-oriented programming attacks. In ASIACCS, 2011.

[34] Y. David, N. Partush, and E. Yahav. Statistical similarity of binaries. In PLDI, 2016.

[35] T. Dullien and S. Porst. Reil: A platform-independent intermediate representation of disassembled code for static code analysis. 2009.

[36] M. Egele, M. Woo, P. Chapman, and D. Brumley. Blanket execution: Dynamic similarity testing for program binaries and components. In USENIX Security, 2014.

[37] K. ElWazeer, K. Anand, A. Kotha, M. Smithson, and R. Barua. Scalable variable and data type detection in a binary rewriter. In PLDI, 2013.

[38] M. Emmerik and T. Waddington. Using a decompiler for real-world source recovery. In Working Conference on Reverse Engineering, 2004.

[39] I. Evans, F. Long, U. Otgonbaatar, H. Shrobe, M. Rinard, H. Okhravi, and S. Sidiroglou-Douskos. Control jujutsu: On the weaknesses of fine-grained control flow integrity. In CCS, 2015.

[40] H. Flake. Structural comparison of executable objects. In DIMVA, 2004.

[41] A. Fog. Calling conventions for different c++ compilers and operating systems, 2015.

[42] A. Fokin, E. Derevenetc, A. Chernov, and K. Troshina. Smartdec: Approaching c++ decompilation. In Reverse Engineering (WCRE), Working Conference on, 2011.

[43] A. Fokin, K. Troshina, and A. Chernov. Reconstruction of class hierarchies for decompilation of c++ programs. In Software Maintenance and Reengineering (CSMR), Conference on, 2010.

[44] M. Frantzen and M. Shuey. Stackghost: Hardware facilitated stack protection. In USENIX Security, 2001.

[45] Y. Fu, J. Rhee, Z. Lin, Z. Li, H. Zhang, and G. Jiang. Detecting stack layout corruptions with robust stack unwinding. In RAID, 2016.

[46] R. Gawlik and T. Holz. Towards automated integrity protection of C++ virtual function tables in binary programs. In ACSAC, 2014.

[47] E. Göktaş, E. Athanasopoulos, H. Bos, and G. Portokalidis. Out of Control: Overcoming control-flow integrity. In IEEE S&P, 2014.

[48] E. Göktaş, E. Athanasopoulos, M. Polychronakis, H. Bos, and G. Portokalidis. Size does matter: Why using gadget-chain length to prevent code-reuse attacks is hard. In USENIX Security, 2014.

[49] I. Guilfanov. Decompilers and beyond. Black Hat USA, 2008.

[50] L. C. Harris and B. P. Miller. Practical analysis of stripped binary code. ACM SIGARCH Computer Architecture News, 2005.

[51] N. Hasabnis, A. Misra, and R. Sekar. Light-weight bounds checking. In CGO, 2012.

[52] N. Hasabnis, R. Qiao, and R. Sekar. Checking correctness of code generator architecture specifications. In CGO, 2015.

[53] N. Hasabnis and R. Sekar. Automatic generation of assembly to IR translators using compilers. In AMAS-BT, 2015.

[54] N. Hasabnis and R. Sekar. Extracting instruction semantics via symbolic execution of code generators. In International Symposium on Foundations of Software Engineering, 2016.

[55] N. Hasabnis and R. Sekar. Lifting assembly to intermediate representation: A novel approach leveraging compilers. In ASPLOS, 2016.

[56] Intel. Intel 64 and IA-32 architectures software developers manual.

[57] D. Jang, Z. Tatlock, and S. Lerner. SafeDispatch: Securing C++ virtual calls from memory corruption attacks. In NDSS, 2014.

[58] W. Jin, C. Cohen, J. Gennari, C. Hines, S. Chaki, A. Gurfinkel, J. Havrilla, and P. Narasimhan. Recovering c++ objects from binaries using inter-procedural data-flow analysis. In Program Protection and Reverse Engineering Workshop, 2014.

[59] R. W. M. Jones and P. H. J. Kelly. Backwards-compatible bounds checking for arrays and pointers in C programs. In AADEBUG, 1997.

[60] J. Kinder and H. Veith. Jakstab: A static analysis platform for binaries. In Computer Aided Verification, 2008.

[61] V. Kuznetsov, L. Szekeres, M. Payer, G. Candea, R. Sekar, and D. Song. Code-pointer integrity. In OSDI, 2014.

[62] J. Lee, T. Avgerinos, and D. Brumley. TIE: Principled reverse engineering of types in binary programs. In NDSS, 2011.

[63] Z. Lin, X. Zhang, and D. Xu. Automatic reverse engineering of data structures from binary execution. In NDSS, 2010.

[64] longld. Payload already inside: Data reuse for ROP exploits. https://media.blackhat.com/bh-us-10/whitepapers/Le/BlackHat-USA-2010-Le-Paper-Payload-already-inside-data-reuse-for-ROP-exploits-wp.pdf.

[65] C. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood. Pin: Building customized program analysis tools with dynamic instrumentation. In PLDI, 2005.

[66] A. J. Mashtizadeh, A. Bittau, D. Mazieres, and D. Boneh. CCFI: Cryptographically enforced control flow integrity. In CCS, 2015.

[67] X. Meng and B. P. Miller. Binary code is not easy. In ISSTA, 2016.

[68] V. Mohan, P. Larsen, S. Brunthaler, K. Hamlen, and M. Franz. Opaque control-flow integrity. In NDSS, 2015.

[69] S. Nagarakatte, J. Zhao, M. M. Martin, and S. Zdancewic. Softbound: Highly compatible and complete spatial memory safety for c. In PLDI, 2009.

[70] N. Nethercote and J. Seward. Valgrind: a framework for heavyweight dynamic binary instrumentation. In PLDI, 2007.

[71] B. Niu and G. Tan. Modular control-flow integrity. In PLDI, 2014.

[72] B. Niu and G. Tan. RockJIT: Securing just-in-time compilation using modular control-flow integrity. In CCS, 2014.

[73] B. Niu and G. Tan. Per-input control-flow integrity. In CCS, 2015.

[74] A. Pawlowski, M. Contag, V. van der Veen, C. Ouwehand, T. Holz, H. Bos, E. Athanasopoulos, and C. Giuffrida. Marx: Uncovering class hierarchies in c++ programs. In NDSS, 2017.

[75] M. Payer, A. Barresi, and T. R. Gross. Fine-grained control-flow integrity through binary hardening. In DIMVA, 2015.

[76] M. Payer, T. Hartmann, and T. R. Gross. Safe loading - a foundation for secure execution of untrusted programs. In IEEE S&P, 2012.

[77] A. Prakash, X. Hu, and H. Yin. vfGuard: Strict protection for virtual function calls in COTS C++ binaries. In NDSS, 2015.

[78] R. Qiao and R. Sekar. Effective function recovery for COTS binaries using interface verification. Technical report, Secure Systems Lab, Stony Brook University, 2016.

[79] R. Qiao and R. Sekar. Function interface analysis: A principled approach for function recognition in COTS binaries. In DSN, 2017.

[80] R. Qiao, M. Zhang, and R. Sekar. A principled approach for ROP defense. In ACSAC, 2015.

[81] N. E. Rosenblum, X. Zhu, B. P. Miller, and K. Hunt. Learning to analyze binary computer code. In AAAI, 2008.

[82] P. Saxena, R. Sekar, and V. Puranik. Efficient fine-grained binary instrumentation with applications to taint-tracking. In CGO, 2008.

[83] F. Schuster, T. Tendyck, C. Liebchen, L. Davi, A. Sadeghi, and T. Holz. Counterfeit object-oriented programming: On the difficulty of preventing code reuse attacks in C++ applications. In IEEE S&P, 2015.

[84] E. J. Schwartz, J. Lee, M. Woo, and D. Brumley. Native x86 decompilation using semantics-preserving structural analysis and iterative control-flow structuring. In USENIX Security, 2013.

[85] B. Schwarz, S. Debray, and G. Andrews. Disassembly of executable code revisited. In Working Conference on Reverse Engineering, 2002.

[86] H. Shacham. The geometry of innocent flesh on the bone: Return-into-libc without function calls (on the x86). In CCS, 2007.

[87] E. C. R. Shin, D. Song, and R. Moazzezi. Recognizing functions in binaries with neural networks. In USENIX Security, 2015.

[88] Y. Shoshitaishvili, R. Wang, C. Salls, N. Stephens, M. Polino, A. Dutcher, J. Grosen, S. Feng, C. Hauser, C. Kruegel, and G. Vigna. (state of) the art of war: Offensive techniques in binary analysis. In IEEE S&P, 2016.

[89] M. Smithson, K. ElWazeer, K. Anand, A. Kotha, and R. Barua. Static binary rewriting without supplemental information: Overcoming the tradeoff between coverage and correctness. In WCRE, 2013.

[90] D. Song, D. Brumley, H. Yin, J. Caballero, I. Jager, M. G. Kang, Z. Liang, J. Newsome, P. Poosankam, and P. Saxena. BitBlaze: A new approach to computer security via binary analysis. In International Conference on Information Systems Security. Keynote paper., 2008.

[91] L. Szekeres, M. Payer, T. Wei, and R. Sekar. Eternal war in memory. S&P Magazine, 2014.

[92] C. Tice, T. Roeder, P. Collingbourne, S. Checkoway, U. Erlingsson, L. Lozano, and G. Pike. Enforcing forward-edge control-flow integrity in GCC & LLVM. In USENIX Security, 2014.

[93] V. van der Veen, D. Andriesse, E. Göktaş, B. Gras, L. Sambuc, A. Slowinska, H. Bos, and C. Giuffrida. Practical context-sensitive CFI. In CCS, 2015.
[94] V. van der Veen, E. Göktaş, M. Contag, A. Pawlowski, X. Chen, S. Rawat, H. Bos, T. Holz, E. Athanasopoulos, and C. Giuffrida. A tough call: Mitigating advanced code-reuse attacks at the binary level. In IEEE S&P, 2016.
[95] M. J. Van Emmerik. Static single assignment for decompilation. PhD thesis, The University of Queensland, 2007.
[96] S. Wang, P. Wang, and D. Wu. Reassembleable disassembling. In USENIX Security, 2015.
[97] W. Xu, D. C. DuVarney, and R. Sekar. An efficient and backwards-compatible transformation to ensure memory safety of C programs. In FSE, 2004.
[98] K. Yakdan, S. Eschweiler, E. Gerhards-Padilla, and M. Smith. No more gotos: Decompilation using pattern-independent control-flow structuring and semantic-preserving transformations. In NDSS, 2015.
[99] K. Yoo and R. Barua. Recovery of object oriented features from c++ binaries. In Asia-Pacific Software Engineering Conference (APSEC), 2014.
[100] Y. Younan, P. Philippaerts, L. Cavallaro, R. Sekar, F. Piessens, and W. Joosen. PAriCheck: an efficient pointer arithmetic checker for C programs. In ASIACCS, 2010.
[101] B. Zeng, G. Tan, and G. Morrisett. Combining control-flow integrity and static analysis for efficient and validated data sandboxing. In CCS, 2011.
[102] C. Zhang, S. A. Carr, T. Li, Y. Ding, C. Song, M. Payer, and D. Song. Vtrust: Regaining trust on virtual calls. In NDSS, 2016.
[103] C. Zhang, C. Song, K. Z. Chen, Z. Chen, and D. Song. VTint: Defending virtual function tables' integrity. In NDSS, 2015.
[104] C. Zhang, T. Wei, Z. Chen, L. Duan, L. Szekeres, S. McCamant, D. Song, and W. Zou. Practical control flow integrity and randomization for binary executables. In IEEE S&P, 2013.

[105] M. Zhang, R. Qiao, N. Hasabnis, and R. Sekar. A platform for secure static binary instrumentation. In VEE, 2014.

[106] M. Zhang and R. Sekar. Control flow integrity for COTS binaries. In USENIX Security, 2013.

[107] M. Zhang and R. Sekar. Control flow and code integrity for COTS binaries: An effective defense against real-world ROP attacks. In ACSAC, 2015.
