Static Binary Instrumentation with Applications to COTS Software Security

A Dissertation presented

by

Mingwei Zhang

to

The Graduate School

in Partial Fulfillment of the

Requirements

for the Degree of

Doctor of Philosophy

in

Computer Science

Stony Brook University

August 2015

Copyright by Mingwei Zhang 2015

Stony Brook University

The Graduate School

Mingwei Zhang

We, the dissertation committee for the above candidate for the

Doctor of Philosophy degree, hereby recommend

acceptance of this dissertation

R. Sekar - Dissertation Advisor, Professor, Department of Computer Science

Mike Ferdman - Chairperson of Defense, Assistant Professor, Department of Computer Science

Michalis Polychronakis - Committee Member, Assistant Professor, Department of Computer Science

Zhiqiang Lin - Committee Member, Assistant Professor, Department of Computer Science, University of Texas at Dallas

This dissertation is accepted by the Graduate School

Charles Taber Dean of the Graduate School

Abstract of the Dissertation

Static Binary Instrumentation with Applications to COTS Software Security

by

Mingwei Zhang

Doctor of Philosophy

in

Computer Science

Stony Brook University

2015

Binary instrumentation has assumed an important role in software security, as well as related areas such as debugging and monitoring. Binary instrumentation can be performed statically or dynamically. Static binary instrumentation (SBI) is attractive because of its simplicity and efficiency. However, none of the previous SBI systems support secure instrumentation of COTS binaries. This is because of several challenges, including: (a) static binary code disassembly errors, (b) difficulty of handling indirect control flow transfers, (c) ensuring completeness of instrumentation, i.e., instrumenting all of the code, including code contained in system libraries and compiler-generated stubs, and (d) maintaining compatibility with complex code, i.e., ensuring that the instrumentation does not break any existing code.

We have developed a new static binary instrumentation approach, and present a software platform called PSI that implements this approach. PSI integrates a coarse-grained control flow integrity (CFI) property as the basis of secure, non-bypassable instrumentation. PSI scales to large and complex stripped binaries, including low-level system libraries. It provides a powerful API that simplifies the development of custom instrumentations.

We describe our approach, present several interesting security instrumentations, and analyze the performance of PSI. Our experiments on several real-world applications demonstrate that PSI's runtime overheads are about an order of magnitude smaller than those of the most popular platforms available today, such as

DynamoRIO and Pin. (Both of these platforms rely on dynamic instrumentation.) PSI has been tested on over 300 MB of binaries.

In addition to our platform PSI, we describe two novel security applications developed using PSI. First, we present a comprehensive defense against injected code attacks that ensures code integrity at all times, even against very powerful adversaries. Second, we present a defense against code reuse attacks such as return-oriented programming (ROP) that is effective against adversaries possessing a wide range of capabilities.

Dedicated to my wife, my parents and friends...

Contents

Abstract

Contents

List of Figures

Acknowledgements

1 Introduction
1.1 Existing binary instrumentation techniques
1.2 Challenges of static binary instrumentation
1.3 Contributions
1.4 Dissertation organization

2 Core Static Binary Instrumentation Techniques
2.1 Binary disassembly
2.1.1 Existing disassembly techniques
2.1.2 PSI disassembly algorithm
2.1.3 Disassembler implementation
2.2 Binary analysis
2.2.1 Identifying code pointer constants
2.2.2 Identifying computed code pointers
2.2.3 Identifying other code addresses
2.3 Binary instrumentation
2.3.1 Background
2.3.2 Binary instrumentation in PSI
2.3.3 Address translation for indirect control transfer
2.3.4 Signal handling
2.3.5 Modular instrumentation
2.4 Optimizations
2.4.1 Improving branch prediction
2.4.2 Avoiding address translation
2.4.3 Removing transparency
2.5 Control flow integrity policy
2.5.1 Definition of control flow integrity
2.5.2 CFI policies and strength evaluation
2.5.3 Coarser grained CFIs
2.5.4 Non-bypassable instrumentation
2.5.5 Formal proof of non-bypassable instrumentation
2.6 Summary

3 Platform Implementation, API and Evaluation
3.1 System implementation
3.2 Instrumentation API
3.2.1 Insertion of assembly code snippets
3.2.2 Insertion of calls to instrumentation functions
3.2.3 Controlling address translation
3.2.4 Runtime event handling
3.2.5 Development of instrumentation tools
3.3 On-demand instrumentation
3.4 Illustrative instrumentation examples
3.4.1 Basic block counting
3.4.2 System call policy enforcement
3.4.3 Library load policy enforcement
3.4.4 Shadow stack
3.5 System Evaluation
3.5.1 Functionality
3.5.1.1 Testing transformed code
3.5.1.2 Correctness of disassembly
3.5.1.3 Testing code generated by alternative compilers
3.5.2 Security evaluation
3.5.2.1 CFI effectiveness evaluation
3.5.2.2 Control-flow hijack attacks
3.5.2.3 ROP attacks
3.5.3 Performance evaluation on baseline system
3.5.3.1 CPU intensive benchmarks
3.5.3.2 Microbenchmarks
3.5.3.3 Commonly used applications
3.5.4 Performance evaluation on sample instrumentations
3.5.4.1 Basic-block counting
3.5.4.2 Shadow stack
3.6 Summary

4 Application: Practical Protection From Code Reuse Attacks
4.1 Motivation
4.2 Threat model
4.3 System design overview
4.4 Code pointer remapping
4.4.1 Tightening baseline policy
4.4.2 Remapping code pointers
4.5 Code space isolation
4.5.1 Static analysis for identifying data
4.5.2 Separating instrumented code from original code
4.5.2.1 Protecting instrumented code
4.5.2.2 Protecting global address translation table
4.5.2.3 Protecting internal data structures
4.5.3 Randomization
4.6 Implementation
4.7 Evaluation
4.7.1 Code pointer remapping
4.7.2 Static analysis
4.7.3 Strength of randomization
4.7.4 Runtime performance
4.7.5 GUI program startup overhead
4.8 Discussion
4.8.1 Harvesting return addresses
4.8.2 Call-oriented programming
4.9 Summary

5 Application: Comprehensive Protection From Code Injection Attacks
5.1 Motivation
5.1.1 Property and Approach
5.1.2 Key features
5.2 Possible code injection vectors
5.2.1 Direct attacks
5.2.2 Loader subversion attacks
5.2.2.1 Control hijacking attacks
5.2.2.2 Data corruption attacks
5.2.2.3 Attacks based on data races
5.2.3 Case study: code injection using text relocation
5.2.3.1 Bypassing ASLR
5.2.3.2 Corrupting loader data structure
5.2.3.3 Invoke text relocation function
5.3 System design
5.3.1 State model
5.3.2 State model enforcement
5.3.2.1 Operation to open a library
5.3.2.2 Operations to map file into memory
5.3.2.3 Segment boundaries
5.3.3 Code integrity enforcement
5.3.4 Library loading policy
5.3.5 Compatibility with code patching
5.3.6 Protected memory
5.3.7 Support for dynamic code
5.4 Implementation
5.5 Evaluation
5.5.1 Micro-benchmark for program loading
5.5.2 CPU intensive benchmark
5.5.3 Commonly used applications
5.5.4 Running with dynamic code
5.5.5 Effectiveness evaluation
5.6 Summary

6 Related Work
6.1 Static binary disassembly
6.2 Static binary analysis
6.3 Binary instrumentation
6.4 Control flow integrity
6.5 Code randomization
6.6 Code disclosure attacks and countermeasures
6.7 Code injection prevention
6.8 Defenses against memory corruption

7 Conclusions and Future Work

Bibliography

List of Figures

2.1 Original (left) and Instrumented code (right) for ICF transfer
2.2 Optimized instrumentation of calls
2.3 Optimized instrumentation of returns
2.4 BinCFI Property Definition

3.1 Architecture Overview of PSI
3.2 An instrumentation tool for Basic Block Counting
3.3 Shadow Stack Defense
3.4 Functionality Tests
3.5 Disassembly Correctness
3.6 AIR metrics for SPEC CPU 2006
3.7 Security Evaluation using RIPE
3.8 Gadget elimination in different CFI implementations
3.9 SPEC CPU2006 Performance with Optimizations
3.10 SPEC CPU2006 Benchmark Performance
3.11 Space overhead of PSI on SPEC 2006
3.12 lmbench microbenchmark results
3.13 Real World Program Performance
3.14 Overhead of basic block counting application of PSI, DynamoRIO, and Pin on SPEC 2006
3.15 Overhead of shadow stack implementation using PSI and that of ROPdefender on SPEC 2006

4.1 Architecture of code space isolation. Memory accesses towards the address translation table are performed through our private TLS to avoid ...
4.2 Shadow Code randomization for the dynamic loader
4.3 Percentage of Randomized Pointers in SPEC Benchmark
4.4 Coverage of exception handling information on the original code
4.5 Embedded data regions identified by static analysis results
4.6 Runtime overhead of SECRET on the SPEC benchmarks
4.7 Summary of SPEC2006 Runtime Overhead
4.8 Completion time (sec) for real-world programs
4.9 SECRET's startup overhead on GUI programs

5.1 Dynamic Loader Internal Data Structure

5.2 State Model for Module Loading
5.3 Layout of Memory Map Table
5.4 Permitted protection settings for memory segments
5.5 Transformation for Text Relocation based Code Patching
5.6 Micro Benchmark Evaluation of SCInt
5.7 Performance of SCInt on SPEC2006
5.8 Startup Overhead on Real World Programs
5.9 Effectiveness Evaluation of SCInt
5.10 Library Loading Policy for Adobe Reader

Acknowledgements

First and foremost, I would like to thank my advisor, R. Sekar, for his guidance throughout my Ph.D. program. I really appreciate his effort in improving my research skills and abilities, especially critical thinking. He taught me, with patience, how to reason about a problem from scratch. Without this ability, I could not have achieved the work presented in this dissertation. In addition, I would like to thank my advisor for his patience and consistent support of my studies. Without that, I might not have established myself as a researcher so quickly in the last four years. Finally, I thank him for his guidance on paper writing and idea selection.

Second, I would like to sincerely thank my committee members, Professor Mike Ferdman, Professor Michalis Polychronakis, and Professor Zhiqiang Lin. Their constructive comments and insightful suggestions have greatly helped improve the quality of my dissertation. In addition, I really appreciate their suggestions on potential new ideas and research directions, which provide a new perspective from which to continue my work.

The Secure Systems Lab has been my academic home. I would like to thank all of the lab mates with whom I overlapped, Wai-kit Sze, Niranjan Hasabnis, Alireza Saberi, Tung Tran, Rui Qiao, Riccardo Pelizzi, Laszlo Szekeres, Daiping Liu and Yarik Markov, for their valuable collaborations and assistance with my research.

Finally, I would like to thank my friends, as well as the funding sources that supported my research work: National Science Foundation (NSF) grants CNS-0831298 and CNS-1319137, Air Force Office of Scientific Research (AFOSR) grant FA9550-09-1-0539, and Office of Naval Research (ONR) grant N000140710928.

Chapter 1

Introduction

Program instrumentation refers to the insertion of additional code into an application in order to measure performance, detect or diagnose errors, and collect trace information [24]. Recently, instrumentation has played a central role in security defenses such as exploit detection/prevention, security policy enforcement, application monitoring and malware analysis.

Security instrumentation may be performed either on source or binary code. Source code instrumentation can be more easily and extensively optimized by exploiting higher-level language information such as types. However, binary instrumentation is more widely applicable, since end users have ready access to binaries. Moreover, security instrumentation should be applied to all code, including all libraries, inline assembly code, and any code inserted by the compiler/linker. Here again, binary-based techniques are advantageous.

1.1 Existing binary instrumentation techniques

Binary instrumentation has been the technique of choice for security instrumentation of Commercial Off-The-Shelf (COTS) binaries. Previous works have used binary-based instrumentation for sandboxing [70, 84], taint-tracking [103, 116], defense from return-oriented programming (ROP) attacks [57, 63], and other techniques for hardening benign code [76, 111].

Binary instrumentation can either be static or dynamic. Static binary instrumentation (SBI) is performed off-line on binary files, whereas dynamic binary instrumentation (DBI) operates on code already loaded into main memory. DBI techniques disassemble and instrument each basic block just before its first execution. They instrument the original code block and put it into a memory pool called the code cache.

Static binary instrumentation has many advantages over dynamic instrumentation. Avoiding the use of a code cache is one of them. Unlike DBI, SBI instruments binaries ahead of time, and therefore it has lower overhead at program start. Since there is no code generation at runtime, SBI usually has a smaller memory footprint and is thus lighter-weight. In addition, SBI avoids the insecurity of a writable and executable code cache. All these benefits contribute to more efficient and more secure instrumentation.

1.2 Challenges of static binary instrumentation

While static binary instrumentation has the benefits of simplicity and being light-weight, existing SBI techniques for security instrumentation face several challenges:

• Disassembly errors. Binary instrumentation requires all of the machine code to be correctly identified. Otherwise, the instrumentation will be incorrect, and an instrumented program may crash or fail in other ways. Correctly identifying all machine code is very difficult. The fundamental reason is that binaries may have data embedded in the middle of code. Linearly scanning all of the binary will incorrectly disassemble such data as code. Disassembling the binary by following the control flow avoids this problem. However, this method may not identify some of the code that is reachable only through indirect control transfers. For these reasons, static binary disassembly remains a challenge. We present a static disassembly algorithm to solve the binary disassembly problem for benign COTS software. The design of this algorithm is based on two observations on x86/Linux: (a) it is feasible to disassemble most of the binary using a linear disassembler, and (b) disassembly errors can be identified using code pointer analysis. Observation (a) indicates that data embedded in code causes only local disassembly errors which do not propagate to the whole binary. Observation (b) describes a common case: if an instruction sequence is located after a chunk of data and is never targeted by direct control flows, it is often incorrectly disassembled due to the misidentification of the data-code boundary. In this case, the misidentified instructions can be reliably recovered if code pointers can be discovered.

• Difficulties in program analysis. Meaningful binary instrumentation usually requires program analysis to recover high-level program semantics. This is achieved by static or dynamic analysis of binaries. However, COTS software is shipped as binaries only, without additional information such as debugging metadata, types, relocations, or a (static) symbol table. The absence of this information poses challenges for understanding the program. For instance, identifying code and data, recovering code pointers, function stack frames, local variables, types, and even function boundaries all become challenges on COTS binaries. To help solve these issues, we use a conservative static analysis to discover code pointers. A straightforward approach to discovering code pointers is to identify all functions whose addresses are taken. This is a reasonable option for binaries with additional high-level information. Unfortunately, high-level information such as code (function) pointers and symbol addresses is not available in COTS binaries. To cope with that, we present a conservative static code pointer analysis in Section 2.2. In addition, it leverages some available low-level information to recover function boundaries.

• Instrumentation bypassability. Instrumentation procedures insert code snippets called instrumentation code¹ into original binaries. These snippets should be non-bypassable to ensure correctness of instrumentation. This is not an issue if only direct control transfers exist in binaries, as their targets can be checked at instrumentation time. However, indirect control targets are not known until runtime. Such control transfers may "skip" instrumentation code. Moreover, this bypass can lead to the execution of undiscovered and uninstrumented code, say, by jumping into the middle of a multi-byte instruction. SBI is therefore required to instrument indirect control flows and ensure that they never target unintended locations. To safely instrument indirect control flow transfers, we use runtime address translation, a concept we have borrowed from dynamic binary instrumentation techniques. The basic idea of our instrumentation is to reconstruct the original binary with instrumentation code inserted, generate a new code segment, and put it at the end of the original binary. Since instrumented code is appended after the original binary, it does not overwrite any existing data section. Moreover, the original code is left unchanged. This ensures that data references targeting original code remain valid. Finally, all direct control flow targets are changed to the corresponding locations in the new code.

¹Note that in this dissertation, the term instrumentation code refers to the code inserted by instrumentation, while instrumented code means code that has been instrumented, i.e., the combination of instrumentation code and original code.
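The address-translation idea can be sketched as a lookup from original code addresses to their locations in the new code segment. The sketch below is purely illustrative: a Python dictionary stands in for the translation table that the rewritten binary actually carries, and all addresses are made up.

```python
# Illustrative model of runtime address translation for indirect control
# transfers. A dictionary stands in for the in-binary translation table;
# the addresses below are invented for the example.
translation_table = {
    0x8048400: 0x8060000,  # original function entry -> instrumented copy
    0x8048420: 0x8060050,
}

def translate(target):
    # A successful lookup redirects the transfer into instrumented code.
    # A miss means the target was never identified as valid code, so the
    # transfer is refused: control can never "skip" into uninstrumented
    # bytes, which is what makes the instrumentation non-bypassable.
    new_target = translation_table.get(target)
    if new_target is None:
        raise RuntimeError(f"invalid indirect target {target:#x}")
    return new_target

print(hex(translate(0x8048400)))  # 0x8060000
```

An attacker-supplied target such as `0x41414141` is simply absent from the table, so the lookup fails instead of transferring control.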

• Instrumentation transparency. Instrumentation may change the original program state and generate an additional memory footprint. These changes can cause programs to crash or behave differently. To avoid this possibility, instrumentation code should not modify any registers, data, or code memory used by the original application, i.e., full transparency should be maintained. As an example, consider the case where code pointers such as saved return addresses on the program stack are used by an application. Since the instrumentation used by PSI causes instruction locations to change, a straightforward implementation would change these saved return addresses on the stack. Unfortunately, programs use this information in several ways: (a) position-independent code (PIC) computes the locations of static variables from the return address, (b) C++ exception handlers use return addresses to identify the function (or more specifically, the try-block within the function) to which an exception needs to be dispatched, and (c) a program may use the return address (and any other code pointer) to read constant data stored in the midst of code, or more generally, its own code. Changes to saved return addresses would cause these uses to break, thus leading to application failure. For this reason, the instrumentation presented in this dissertation is designed to provide full transparency. For instance, the instrumentation only generates original return addresses on the stack. In addition, PSI leaves the original code untouched and makes sure that code pointers in the heap, the stack, and global memory are left untouched as well.

• Dynamic loading. Instrumentation should be applied to all of the running modules of target programs. This is not a problem for statically linked binaries, because only one module will be loaded at runtime. However, the vast majority of deployed software is in the form of dynamically linked binaries. Handling dynamically linked binaries requires SBI to overcome two challenges: (a) identifying dependencies, and (b) properly connecting instrumentations across modules. Looking into the first issue, we observe that dynamically linked binaries may load a large number of dependent libraries at load time and even at runtime. Failing to instrument dependent libraries may cause the instrumentation to be incomplete, or worse, lead to inconsistencies that cause runtime errors. A straightforward approach is to identify all dependent libraries statically for a given executable binary. Unfortunately, there are two complications here: (a) some of the dependent libraries are not explicitly recorded in the executable files, and (b) dependent libraries may be specified by file name, but without the full directory path. To handle this problem, we leverage two approaches. The first transforms all libraries in known locations such as /lib and /usr/lib in advance. For applications that may use libraries whose locations are difficult to predict, on-demand instrumentation is used. The idea is to monitor library loading and instrument libraries on the fly. Once instrumented, a library does not have to be instrumented again on the next run.

• Easy-to-use API. Instrumenting an application using DBI tools is as simple as prefixing its invocation with the name of the DBI binary. DBI platforms such as Pin [90], DynamoRIO [51] and [102] provide a convenient API that greatly simplifies the development of new instrumentations (also called client tools). However, existing static tools do not provide an easy-to-use API comparable to that of mature DBI tools. This limitation impedes the development of static binary instrumentation. We integrate all of the required techniques and provide a generic binary instrumentation platform, PSI. Similar to DBI tools, PSI has both a low-level and a high-level API. The low-level API allows users to insert inline assembly code, while the high-level API allows users to intercept events like system calls and signals.

1.3 Contributions

In this dissertation, we present a generic binary instrumentation platform, PSI, that addresses the shortcomings of previous SBI techniques. The following is a list of the contributions of this dissertation.

• A disassembly algorithm for COTS binaries. We present a disassembly algorithm that can reliably discover all data in the middle of code. This algorithm is robust on large and complex binaries, without the need for symbols, relocations, or debugging information.

• A conservative static code pointer analysis algorithm. We present a static code pointer analysis algorithm that can discover code pointers in COTS binaries. Experiments demonstrate that the algorithm works well even on complex and low-level binaries.

• A practical control flow policy for complex binaries. We present a practical CFI policy that works for complex binaries containing inline assembly or pure assembly code. Our policy effectively constrains the targets of indirect control flow transfers.

• Sound instrumentation with security guarantees. PSI ensures that its instrumentation cannot be bypassed when applied to COTS binaries. We present a proof of the soundness of the instrumentation.

• A metric to evaluate the strength of control flow policies. Average indirect target reduction (AIR) is a metric presented in this dissertation to quantify the strength of any control flow policy. We demonstrate that the integrated CFI policy in PSI has almost the same strength as the original CFI implementation [28].

• An efficient and robust implementation. We show that PSI supports a robust suite of applications. It provides better performance on real-world programs than existing dynamic tools such as DynamoRIO [51] and Pin [90].

• Security applications against low-level attacks. We have developed two security applications based on the PSI platform. These applications can effectively defeat modern code reuse attacks and code injection attacks.

1.4 Dissertation organization

The rest of this dissertation is organized as follows: Chapter 2 focuses on the key techniques of static binary instrumentation in PSI. Chapter 3 presents the PSI platform with its API, applications, and a systematic evaluation. Chapters 4 and 5 describe two large applications of PSI. In particular, Chapter 4 focuses on defeating stealthy ROP attacks, while Chapter 5 proposes a comprehensive solution to defeat modern code injection attacks. Chapter 6 presents related work, and Chapter 7 presents conclusions and future research directions.

Chapter 2

Core Static Binary Instrumentation Techniques

This chapter presents the core techniques for realizing secure and scalable static binary instrumentation. The content of this chapter is divided into several parts, including binary disassembly, static analysis, instrumentation steps, and address translation, as well as the definition of a CFI policy and a metric to evaluate the strength of CFI.

2.1 Binary disassembly

Binary disassembly is the basis of static binary instrumentation. In the following, we first discuss the existing disassembly techniques as well as their limitations. Then, we present a robust binary disassembly algorithm that has been successfully used on large COTS applications such as Adobe Acrobat, and complex binaries such as glibc.

2.1.1 Existing disassembly techniques

Disassembly techniques used by previous research can be categorized into two types: linear disassembly and recursive disassembly. Linear disassembly starts by disassembling the first instruction in a given segment. Once an instruction at an address l is disassembled, and is determined to have a length of k bytes, disassembly proceeds to the instruction starting at address l + k. This process continues to the end of the segment.
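As a sketch, linear disassembly amounts to the following loop. The instruction encoding here is a toy one invented for illustration (three fixed-length opcodes), not real x86, where decoding is far more involved:

```python
# Toy instruction set for illustration: opcode -> instruction length.
LENGTHS = {0x90: 1, 0xC3: 1, 0xEB: 2}  # nop, ret, short jmp (1-byte operand)

def linear_disassemble(code, start=0):
    # After decoding an instruction of length k at address l, the next
    # instruction is assumed to begin at l + k, until the segment ends.
    addrs = []
    l = start
    while l < len(code):
        k = LENGTHS.get(code[l])
        if k is None:  # byte pattern with no valid decoding
            raise ValueError(f"invalid opcode {code[l]:#x} at address {l}")
        addrs.append(l)
        l += k
    return addrs

# nop; jmp (with operand byte 0x02); ret -- instructions at 0, 1 and 3.
print(linear_disassemble(bytes([0x90, 0xEB, 0x02, 0xC3])))  # [0, 1, 3]
```

If a data byte with no valid decoding were embedded in this stream, the sweep would either raise an error or, worse, silently consume real instruction bytes as operands, which is exactly the gap problem discussed next.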

Linear disassembly can be confused by "gaps" in code that consist of data or alignment-related padding. These gaps will be interpreted by linear disassembly as instructions and decoded, resulting in erroneous disassembly. With variable-length instruction sets such as that of x86, incorrect disassembly of one instruction can cause misidentification of the start of the next instruction; hence these errors can cascade past the end of gaps.

Recursive disassembly uses a different strategy, one that is similar to a depth-first construction of the program's control-flow graph (CFG). It starts with a set of code entry points specified in the binary. For an executable, there may be just one such entry point specified, but for shared libraries, the beginning of each exported function is specified as well. The technique starts by disassembling the instruction at an entry point. Subsequent instructions are disassembled in a manner similar to linear disassembly. The difference from linear disassembly occurs when control-flow transfer instructions are encountered. Specifically, (a) each target identified by a direct control-flow transfer instruction is added to the list of entry points, and (b) disassembly stops at unconditional control-flow transfers.

Unlike linear disassembly, recursive disassembly does not get confused by gaps in code, and hence does not produce incorrect disassembly.¹ However, it fails to disassemble code that is reachable only via indirect control flow transfers (ICF transfers).

The incompleteness of recursive disassembly can be mitigated with a list of all targets that are reachable only via ICF transfers. This list can be computed from relocation information. However, in stripped binaries, which typically do not contain relocation information², recursive disassembly can fail to disassemble significant parts of the code.

¹This does rely on some assumptions: (a) calls must return to the instruction following the call, (b) all conditional branches are followed by valid code, and (c) all targets of (conditional as well as unconditional) direct control-flow transfers represent legitimate code. These assumptions are seldom violated, except in the case of obfuscated code.

²Relocation information is used by the linker to link different object files into an executable, and is no longer needed once the executable has been generated.

2.1.2 PSI disassembly algorithm

The above discussion on using relocation information to complete recursive disassembly suggests the following strategy for disassembly:

• Develop a static analysis to compute ICF targets.

• Modify recursive disassembly to make use of these as possible entry points.

Unfortunately, the first step will typically result in a superset of ICF targets: some of these locations do not represent code addresses. Thus, blindly following ICF targets computed by static analysis can lead to incorrect disassembly. We therefore use a different strategy, one that combines linear and recursive disassembly techniques, and uses static analysis results as positive (but not definitive) evidence of the correctness of disassembly. PSI starts by eagerly disassembling the entire binary using linear disassembly, which is then checked for errors. The error checking step primarily relies on the steps used in recursive disassembly. Finally, an error correction step identifies and marks regions of disassembled code as representing gaps. The error detection step relies on the following checks:

• Invalid opcode: Some byte patterns do not correspond to any instruction, so attempts to decode them will result in errors. This is relatively rare, because x86 machine code is very dense. But when it occurs, it is a definitive indicator of a disassembly error.

• Direct control transfers outside the current module: Cross-module transfers need to use special structures called the procedure linkage table (PLT) and the global offset table (GOT), and moreover, they need to use ICF transfers. Thus, any direct control transfer to an address outside the current module indicates erroneous disassembly.

• Direct control transfer to the middle of an instruction: This can happen either because of incorrect disassembly of the target, or incorrect disassembly of the control-flow transfer instruction. Detection of additional errors near the source or target will increase confidence regarding which of the two has been incorrectly disassembled. In the absence of additional information, the PSI approach considers both possibilities.

Since errors in linear disassembly arise due to gaps, the error correction step relies on identifying and marking these gaps. An incorrectly disassembled instruction signifies the presence of a gap, and its beginning and end need to be found. To find the beginning of the gap, PSI simply walks backward from the erroneously disassembled instruction to the closest preceding unconditional control-flow transfer. If there are additional errors within a few bytes preceding the gap, the scan continues to the next preceding unconditional control-flow transfer. To find the end of the gap, PSI relies on static analysis results (Section 2.2). Specifically, the smallest ICF target larger than the address of the erroneously disassembled instruction is assumed to be the end of the gap. Once again, if there are disassembly errors in the next few bytes, PSI extends the gap to the next larger ICF target.
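Under the simplifying assumption that the addresses of unconditional control-flow transfers and the statically computed ICF targets have already been collected into two lists, the basic gap-bounding step might look like the sketch below. The function name and the reduction of "walking backward" to a max over known transfer addresses are ours, and the widening step for nearby errors is omitted:

```python
def find_gap(error_addr, uncond_transfers, icf_targets):
    # The gap begins at the closest unconditional control-flow transfer
    # preceding the erroneously disassembled instruction (the gap itself
    # starts just past that instruction)...
    start = max((a for a in uncond_transfers if a < error_addr), default=0)
    # ...and ends at the smallest statically identified ICF target beyond
    # the error (None if no such target is known).
    end = min((t for t in icf_targets if t > error_addr), default=None)
    return start, end

gap = find_gap(0x30, uncond_transfers=[0x10, 0x28, 0x50],
               icf_targets=[0x20, 0x40])
print(gap == (0x28, 0x40))  # True
```

In the example, the error at 0x30 is bounded by the preceding unconditional transfer at 0x28 and the next ICF target at 0x40; the earlier transfer at 0x10 and the earlier target at 0x20 are correctly ignored.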

After the error correction step, all identified disassembly errors are contained within gaps. At this point, the binary is disassembled again, this time avoiding the disassembly of the marked gaps. If no errors are detected this time, then the work is done. Otherwise, the whole process is repeated. While repetition of disassembly may seem unnecessarily inefficient, PSI uses it because of its simplicity, and because disassembly errors have been infrequent enough in the implementation that no repetitions are needed for the vast majority of the benchmarks.

2.1.3 Disassembler implementation

Binaries on Linux (and most other UNIX systems) use the ELF (Executable and Linkable Format) [137]. PSI supports binaries that represent executables and shared libraries. The ELF format divides a binary into several sections, each of which may contain code, read-only data, initialized data, and so on. While PSI utilizes the data in read-only data sections, it is mainly concerned with the code sections.

A typical executable contains the following code sections: .init, .plt, .text and .fini. However, shared libraries and atypical executables may have a different set of code sections. Instead of making assumptions about the names of code sections, this approach obtains the list of all executable segments from the ELF header, and proceeds to disassemble and instrument each of them.

The PSI implementation utilizes objdump to perform linear disassembly, with the disassembly error detection and correction components built on top of it. In the experiments, disassembly errors occurred primarily due to insertion of null padding generated by legacy code or linker scripts.

In addition, PSI discovered jump table data in the middle of code in libffi.so and libxul.so.

There were also several instances where conditional jumps targeted the middle of an instruction. Further analysis revealed that these errors occurred with instructions that had optional prefixes, such as the “lock” prefix. PSI eliminates this error by treating these prefixes as independent instructions, so that jumps can target the instruction with or without the prefix.

2.2 Binary analysis

Static binary analysis is an essential technique for binary instrumentation. Binary analysis can be used for various purposes, including disassembly of binary code and discovery of high-level information such as function boundaries, code pointers, local variables, and type information.

Static binary analysis of code pointers is a critical technique, since it provides (a) the important information that helps discover code reachable only through indirect control transfers and (b) an abstract control flow graph that helps analyzers better understand program control flow.

This section proposes a static analysis method for discovering possible indirect control flow (ICF) targets without additional information. The key observation is that almost all code pointers take the form of a constant whose value points to a valid instruction boundary. This approach classifies ICF targets into several categories, and devises distinct analyses to compute them:

• Code pointer constants (CK) consist of code addresses that are constants at compile-time.

• Computed code addresses (CC) include code addresses that are computed at runtime.

• Exception handling addresses (EH) include code addresses that are used by exception handlers.

• Exported symbol addresses (ES) include exported function addresses.

• Return addresses (RA) include the code addresses immediately following call instructions.

Static analysis results are filtered to retain only those addresses that represent valid instruction boundaries in disassembled code. The observation above on code pointers holds except for computed code addresses (CC), which are discussed further in Section 2.2.2.

2.2.1 Identifying code pointer constants

In general, there is no way to distinguish a code pointer from other types of constants in code. So, PSI takes a conservative approach. Any constant that “looks like a code pointer,” as determined by the following tests, is included in CK:

• it falls within the range of code addresses in the current module, and

• it points to an instruction boundary in disassembled code.

Note that a module has no compile-time knowledge of addresses in another module, and hence it suffices to check for constants that fall within the range of code addresses in the current module. For shared libraries, absolute addresses are unknown, so PSI checks if the constant represents a valid offset from the base of the code segment. It is also possible that the offset may be with respect to the GOT of the shared library, so the validity check takes that into account as well.

The entire code and data segments are scanned for possible code constants as determined by the procedure in the preceding paragraph. Since 32-bit values need not be aligned on 4-byte boundaries on x86, PSI uses a 4-byte sliding window over the code and data to identify all potential code pointer constants.
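The sliding-window scan can be sketched as follows. This is an illustrative sketch, not PSI's implementation: it decodes every (possibly unaligned) 4-byte window as a little-endian 32-bit value and keeps values that fall in the module's code range and land on a known instruction boundary.

```python
import struct

def scan_code_pointers(data, code_lo, code_hi, insn_starts):
    """Scan raw bytes for values that look like code pointers (CK)."""
    found = set()
    for off in range(len(data) - 3):                 # 4-byte sliding window
        (val,) = struct.unpack_from('<I', data, off)
        if code_lo <= val < code_hi and val in insn_starts:
            found.add(val)
    return found

# Toy segment containing the pointer 0x08049000 at an odd (unaligned) offset.
buf = b'\x90' + struct.pack('<I', 0x08049000) + b'\x00\x00'
starts = {0x08049000, 0x08049005}
print(sorted(hex(v) for v in scan_code_pointers(buf, 0x08048000, 0x0804a000, starts)))
# → ['0x8049000']
```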

2.2.2 Identifying computed code pointers

Whereas the CK analysis was very conservative, it is difficult to bring the same level of conservativeness to the analysis of computed code pointers. This is because, in general, arbitrary computations may be performed on a constant before it is used as an address, and it would be impossible to estimate the results of such operations with accuracy. However, these general cases are just a theoretical possibility. The vast majority of code is generated from high-level languages where arbitrary pointer arithmetic on code pointers isn’t meaningful.3 Even for hand-written assembly, considerations such as maintainability, reliability and portability lead programmers to avoid arbitrary arithmetic on code pointers. So, rather than supporting arbitrary code pointer computation, PSI supports computed code pointers in a limited set of contexts where they seem to arise in practice. Indeed, the only such context the author has observed is that of jump tables.4

The most common case of jump tables arises from compiling switch statements in C and C++ programs. If these were the only sources of CC, then a simple approach could be developed based on the typical conventions used by compilers for translating switch statements. However, jump tables in hand-written assembly take diverse forms. So, we begin by identifying properties that are likely to hold for most jump tables:

• Jump table targets are intra-procedural: the ICF transfer instruction and ICF target are in the same function. (Function boundaries are not required; they are estimated conservatively, as described below.)

• The target address is computed using simple arithmetic operations such as addition and multiplication.

• Other than one quantity that serves as an index, all other quantities involved in the computation are constants in the code or data segment.

• All of the computation takes place within a fixed-size window of instructions, currently set to 50 instructions in the PSI implementation.

Based on these characteristics, PSI uses a static analysis technique to compute possible CC targets. It uses a three-step process. The first step is the identification of function boundaries and the construction of a control-flow graph. In the absence of full symbol table information, it is difficult to identify all function boundaries, so PSI falls back to the following approach that uses information about exported function symbols. PSI treats the region between two successive exported function

3This is true even in languages that are notorious for pointer arithmetic, such as C. 4C++ exception handling also involves address arithmetic on return addresses, but PSI can rely on exception handler information that must be included in binaries, rather than on the CC analysis described in this section.

symbols as an approximation of a function. (Note that this approximation is conservative, as there may be non-exported functions in between.) Then a control-flow graph is constructed for each region.

In the second step, PSI identifies instructions that perform an indirect jump. It performs a backward walk from these instructions using the CFG. All backward paths are followed, and for each path, PSI traces the chain of data dependences to compute an expression for the indirect jump target. This expression has the form

∗(CE1 + Ind) + CE2, where CE1 and CE2 denote expressions consisting of only constants, Ind represents the index variable, and ∗ denotes memory dereferencing. In some cases, it is possible to extend the static analysis to identify the range of values that can be taken by Ind. However, PSI has not implemented such an analysis, especially because the index value may come from other functions. Instead, this work assumes that valid Ind values will start around 0.

In the third step, PSI enumerates possible values for the index variable, computes the jump target for each value, and checks if it falls within the current region.

Specifically, PSI checks if CE1 + Ind falls within the data or code segment of the current module, and if so, retrieves the value stored at this location. It is then added to CE2, and the result is checked to determine if it falls within the current region. If so, the target is added to the set CC. If either of these checks fails, the Ind value is deemed invalid.

The approach starts from an Ind value of 1, and explores values on either side until it reaches a value for which the computed target is invalid.
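The enumeration step can be sketched for the expression ∗(CE1 + Ind) + CE2. This is a simplified, hypothetical model: `mem` is a toy word-addressed memory (real jump tables are byte-addressed with 4-byte entries), and the starting index is illustrative.

```python
def enumerate_cc(ce1, ce2, mem, seg, region):
    """Probe Ind values outward until the computed target is invalid."""
    def target(ind):
        slot = ce1 + ind
        if slot not in range(*seg) or slot not in mem:
            return None                      # slot outside data/code segment
        t = mem[slot] + ce2
        return t if region[0] <= t < region[1] else None

    targets = set()
    for step in (1, -1):                     # explore both directions
        ind = 0 if step == 1 else -1
        while (t := target(ind)) is not None:
            targets.add(t)
            ind += step
    return targets

# Jump table of 3 entries at 0x2000; entries are offsets from base 0x1000.
mem = {0x2000: 0x10, 0x2001: 0x20, 0x2002: 0x30}
print(sorted(hex(t) for t in enumerate_cc(0x2000, 0x1000, mem,
                                          (0x2000, 0x3000), (0x1000, 0x1100))))
# → ['0x1010', '0x1020', '0x1030']
```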

Note that the backward walk through the CFG can cross function boundaries, e.g., traversing into the body of a called function. It may also go backwards through indirect jumps. To support this case, PSI extends the CFG to capture indirect jumps discovered by the analysis. The maximum extent of backward pass is bounded by the window size specified above.

The above procedure may potentially fail in extreme cases, e.g., if the CC computation is dispersed beyond the 50-instruction window used in the analysis, or if the computation does not have the form ∗(CE1 + Ind) + CE2. In such cases, PSI can conservatively add every instruction address within the region to CC. In the experiments, no such case was encountered.

2.2.3 Identifying other code addresses

Below, we describe the computation of the three remaining types of code pointers: exception handlers (EH), exported symbols (ES), and return addresses (RA).

In ELF binaries, exception handlers are also valid ICF targets. They are constructed by adding a base address to an offset. The base addresses and offsets are stored in the ELF sections .eh_frame and .gcc_except_table respectively. Both these sections are in DWARF [129] format. PSI uses an existing tool, katana [107, 118], to parse these DWARF sections and obtain both base addresses and offsets, and thus compute the set EH. (Note that the CC analysis mentioned above would not be able to discover these EH targets, because the DWARF format permits variable-length numeric encodings such as LEB128, and hence the simple technique of scanning for 32-bit constant values does not work.)

Exported symbol (ES) addresses are listed in the dynamic symbol table, which is referenced from the .dynamic section of an ELF file.

Return addresses (RA) are simply the set of locations that follow a call instruction in the binary. Thus, they can be computed following the disassembly step.
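The RA computation is simple enough to sketch directly over the disassembly output; the tuple shape here is illustrative.

```python
def return_addresses(insns):
    """insns: list of (addr, length, mnemonic) from disassembly.

    A return address is the address immediately after a call.
    """
    return {addr + length for (addr, length, m) in insns if m == 'call'}

insns = [(0x100, 5, 'call'), (0x105, 1, 'ret'),
         (0x106, 5, 'call'), (0x10b, 1, 'nop')]
print(sorted(hex(a) for a in return_addresses(insns)))  # → ['0x105', '0x10b']
```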

2.3 Binary instrumentation

After disassembly and code pointer analysis, the resulting code is ready to be instrumented. This section presents the instrumentation steps used by PSI.

2.3.1 Background

Dynamic Binary Instrumentation Binary instrumentation can be static or dynamic. Dynamic methods instrument binary code at runtime, while static ones do so before runtime. A DBI tool works as follows: it runs as an interpreter, translating the source binary code into new code for the target architecture, and performing instrumentation before the new code is generated. Code is generated at the granularity of basic blocks, which are allocated into a memory region called the code cache and used for program execution. DBI executes an application by executing basic blocks in the code cache instead of native application code.

Basic blocks are connected with direct branches to avoid frequent context switches between the DBI runtime and target application execution. To ensure that DBI never loses control of program execution, indirect branches must be translated to make sure their targets point into the code cache instead of code in the original program. This is achieved by replacing each indirect branch with a sequence of instructions that performs a hash lookup. The lookup checks against a hash table whose key is the original address pointing into application code, and whose value is the corresponding address pointing into the code cache. A successful lookup diverts control to the target code cache block and the program continues, while a lookup failure redirects control to the DBI runtime, where a new basic block is generated and its entry address added to the hash table before program execution proceeds.
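The DBI lookup-and-fallback scheme described above can be sketched in a few lines. This is a conceptual model, not any real DBI tool's API: the hash table maps original block addresses to code-cache addresses, and a miss invokes a stand-in "runtime" that translates the block and installs the mapping.

```python
code_cache = {}                      # original addr -> code-cache addr
next_cache_addr = 0x9000

def translate_block(orig):
    """Stand-in for the DBI runtime: translate a block into the cache."""
    global next_cache_addr
    addr = next_cache_addr
    next_cache_addr += 0x40          # pretend each translated block is 64B
    code_cache[orig] = addr
    return addr

def lookup(orig):
    """What the inlined hash-lookup stub does for each indirect branch."""
    cached = code_cache.get(orig)
    return cached if cached is not None else translate_block(orig)

a = lookup(0x1000)        # miss: block is translated and cached
b = lookup(0x1000)        # hit: same code-cache address, no runtime entry
print(hex(a), a == b)     # → 0x9000 True
```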

Static Binary Instrumentation Static binary instrumentation inserts instrumentation code statically into the target binary. This can be realized in several ways. One intuitive option is to insert instrumentation code in place, between original code snippets. This approach is used in link-time instrumentation efforts such as PLTO [121] and Abadi et al.'s CFI [28]. However, symbol and relocation information is needed to relocate the original code. Since COTS binaries do not have such information, this approach is less applicable. Another option is to patch original instructions with direct jumps and place the instrumentation code elsewhere. Previous work such as Detours [77], CCFIR [152], LEEL [145], and binaryRAD [115] adopts this approach. However, it may cause problems, because patching instructions may overwrite succeeding instructions that are also control transfer targets [50].

Similar to DBI, static binary instrumentation can be realized without patching the original code. One example is REINS [141], which targets sandboxing of untrusted COTS executables on Windows. Unlike static tools that patch original code, it uses out-of-line instrumentation: instrumented code is generated and appended after the original binary, and the original code is preserved at runtime. All indirect control targets are changed to valid locations in the instrumented code. Since the original code is maintained, any data references to the original code and data are preserved, ensuring correct execution. By doing so, REINS ensures that sandboxed code can never escape instrumentation (except to invoke certain trusted functions). However, the ability to instrument the dependent libraries used by an application is not clearly mentioned as a supported feature. Moreover, their evaluation does not consider as many large applications.

2.3.2 Binary instrumentation in PSI

PSI is a static instrumentation tool but uses out-of-line instrumentation similar to DBI tools. In addition, PSI supports all shared libraries and works with complex COTS binaries. The instrumentation steps include: 1) instrumentation, 2) reassembly, and 3) binary re-generation.

Instrumentation is performed on the assembly representation. This simplifies the implementation, since it does not need to be concerned with details such as encoding and decoding of instructions. Moreover, it can use labels instead of addresses. In particular, for each instruction location A in the disassembler output, PSI associates a symbolic label L_A as follows:

L_8040930:movl %ecx, %eax

These symbolic labels are used as targets of direct branch instructions, which means that the assembler will take care of fixing up the branch offsets. (These offsets will typically change, since PSI inserts additional code during instrumentation.)
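The labeling pass can be sketched as a simple rewriter over the disassembly. This is a minimal illustrative sketch, not PSI's implementation: every instruction gets a symbolic label L_&lt;addr&gt;, and direct branch targets are rewritten to labels so the assembler recomputes offsets after code insertion.

```python
def to_labeled_asm(insns):
    """insns: list of (addr, mnemonic, operand); operand is an int for
    direct branch targets, a string otherwise."""
    out = []
    for addr, mnem, operand in insns:
        if mnem in ('jmp', 'call', 'je') and isinstance(operand, int):
            operand = f'L_{operand:x}'          # let the assembler fix offsets
        op = f' {operand}' if operand is not None else ''
        out.append(f'L_{addr:x}:{mnem}{op}')
    return out

asm = to_labeled_asm([(0x8040930, 'movl', '%ecx, %eax'),
                      (0x8040932, 'jmp', 0x8040930)])
print(asm)  # → ['L_8040930:movl %ecx, %eax', 'L_8040932:jmp L_8040930']
```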

After rewriting, the instrumented assembly file is processed using the system assembler (the GNU assembler gas) to produce an object file. PSI then extracts the code from this object file and uses the objcopy tool to inject it into the original ELF file. Note that the original code sections are preserved. This ensures that any attempt by the instrumented program to read its own code will produce the same results as the original program.

In the final step, we prepare the ELF file produced by objcopy for execution. This step requires relocation actions on the newly added segment, and updating the ELF header to set its entry point to the segment containing instrumented code. All original code segments are made non-executable. For shared libraries, it is also necessary to update the dynamic symbol table section.

2.3.3 Address translation for indirect control transfer

As described above, instrumented code resides in a different code segment (and hence a different memory location) from the original code. This means that function pointer values, which will typically appear in the code as constants, will have

incorrect values. Unfortunately, it is not possible to fix them up automatically, since PSI cannot distinguish constants representing code addresses from other types of constants. It would obviously be unsound to modify a constant value that does not represent a code pointer.5

The typical way to deal with this uncertainty, employed in DBI [51], is to wait until a value is used as the target of an ICF transfer. At that point, this target value is translated into the corresponding location in the instrumented code. This translation is performed using a table that consists of pairs of the form

⟨original address, new address⟩

At runtime, addr_trans, a piece of trampoline code, performs address translation.

Original:               Instrumented:
060c0: call *%ecx       L_060c0: push $060c2
060c2: ......                    movl %eax, %gs:0x44
                                 movl %ecx, %eax
                                 jmp addr_trans
                        L_060c2: ......

Figure 2.1: Original (left) and Instrumented code (right) for ICF transfer

This code saves the register (%eax) used by the instrumentation, and moves the target address into it.6 Then the original indirect jump (or call) is replaced with a direct jump to the trampoline routine, addr_trans. Note the use of labels such as L_060c0 that associate locations in the instrumented code with the corresponding original address, namely, 060c0. As a result, the translation table can consist of entries of the form

⟨A, L_A⟩ for each valid ICF target A. The details of addr_trans are as follows. After saving the registers and flags needed for its operation, addr_trans performs an address range check to determine if the target is within the current module. If not, this represents a cross-module control transfer, which is discussed later in this section. After the range check, addr_trans performs address translation. The PSI implementation relies on open hashing [142] to perform an efficient lookup of the table described above.
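The addr_trans lookup can be sketched as follows. This is a hedged sketch under illustrative assumptions (bucket count, hash function, and the table-building helper are not PSI's): a fixed-size table with chaining ("open hashing") maps original ICF targets to their labels in instrumented code, after the module range check.

```python
NBUCKETS = 1024

def make_mtt(pairs):
    """Build a chained hash table from (orig_addr, new_addr) pairs."""
    table = [[] for _ in range(NBUCKETS)]
    for orig, new in pairs:
        table[orig % NBUCKETS].append((orig, new))
    return table

def addr_trans(table, target, module_lo, module_hi):
    if not (module_lo <= target < module_hi):
        return 'cross-module'              # handled by the inter-module path
    for orig, new in table[target % NBUCKETS]:
        if orig == target:
            return new                     # translated ICF target
    raise SystemExit('untranslatable ICF target')  # debugging aid

mtt = make_mtt([(0x60c2, 0x9060c2)])
print(hex(addr_trans(mtt, 0x60c2, 0x6000, 0x7000)))  # → 0x9060c2
```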

5Here again, relocation information could address this uncertainty, but in the case of PSI, it is unavailable. 6Note that %gs points to the base of thread-local storage, and %gs:0x44 is not used by existing system software.

Rather than storing just the target address L_A in the table, PSI stores code that transfers control to L_A. For instance, the hash table entry to translate the code address 0x060c2 looks as follows.

0x060c2 movl %gs:0x44, %eax; jmp L_060c2

If no translation is found for the target address, addr_trans will set an error code to help in debugging, and terminate the program.

Note that, for shared libraries, the translation table contains only the offsets but not absolute addresses. Consequently, the base address of the module needs to be subtracted from the runtime address given to the translation routine. PSI relies on the dynamic linker to patch the routine with the module’s base address when the module is loaded.

In order to preserve the functionality of the original code, it is necessary to ensure that the instrumentation does not modify any of the registers or memory used by the program. It is relatively easy to avoid changes to memory, or to registers other than the program counter (PC). Since instrumentation changes code locations (as described earlier), it is not possible to preserve the PC register. So, what PSI needs to do is add compensation for any operation that uses the PC for any purpose other than fetching the next instruction. Fortunately, on 32-bit x86, there are only two instructions that use the PC this way: call and return. A call X is translated into a push next; jmp X, where next denotes the address of the instruction following the call in the original program. Similarly, a return is translated into a pop followed by a direct jump. Note that after this transformation, none of the instructions in the original program involve movement of data between the PC and other registers or memory, thus ensuring that program behavior is unaffected by PSI instrumentation. On the x86-64 architecture, any PC-relative data addressing7 needs to be translated too. This can be done by either modifying the offset value or using a dedicated register. Section 4.5.2.1 conveys some of the details on how to port the instrumentation to x86-64.
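The call/return transformation can be sketched as a tiny rewriter. This is an illustrative sketch, not PSI's code: it assumes a 5-byte call encoding for computing `next`, uses %eax as the scratch register, and sends the popped return address through addr_trans, as in the earlier figure.

```python
def rewrite(addr, insn):
    """Rewrite a call/ret so no instruction moves data between PC
    and registers/memory."""
    mnem, _, operand = insn.partition(' ')
    if mnem == 'call':
        next_addr = addr + 5                 # assumed call length (32-bit x86)
        return [f'push ${next_addr:#x}', f'jmp {operand}']
    if mnem == 'ret':
        return ['pop %eax', 'jmp addr_trans']  # translate popped address
    return [insn]

print(rewrite(0x60b1, 'call 0x60c0'))
# → ['push $0x60b6', 'jmp 0x60c0']
```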

2.3.4 Signal handling

Signals are another mechanism to redirect program control flow. If a program registers its signal handlers, once again PSI has the problem that the program will specify the location of the handler in the original code, whereas the platform wants the signal to be delivered to the instrumented code. (This problem arises because signals are delivered by the kernel, which is unaware of the address translations used to correctly handle code pointers.)

7Instructions using the RIP register.

The PSI implementation intercepts the sigaction and signal system calls, and stores the addresses of the signal handlers specified by these calls in a table. The signal handler argument is then changed so that control will be transferred to a wrapper function, which contains code that jumps to the user-specified handler. Since this wrapper is instrumented as usual, the instrumented version of the user-specified handler will be invoked.

2.3.5 Modular instrumentation

Software is typically organized into dynamically linked binaries. This means that dependent library code addresses may or may not be known before runtime. This calls for independent instrumentation of each module, as well as handling of dynamic libraries whose base address may be randomized. This section focuses on the techniques required for modular instrumentation.

Shared Library Support for shared libraries is achieved as follows. PSI rewrites a single module (an executable or a shared library) at a time. There is exactly one version of a transformed shared library, regardless of the context (or the executable) in which it is used. Note that PSI transforms all shared libraries, including glibc and ld.so.

As described before, addr_trans already handles intra-module control transfers. Inter-module transfers rely on a two-stage process. In the first stage, a global translation table (GTT) is used to map an ICF target to the translation routine address in the target module. This table is constructed as follows. Since shared libraries must begin at page boundaries, any two modules must be at least 4KB apart, the page size on 32-bit Linux systems. Thus, it is enough to use the leading 20 bits of the ICF target in this lookup table. PSI uses a simple array implementation for the GTT, since there are only 2^20 = 1M entries in this table. This array is made read-only in order to protect it. The second stage performs a lookup in the destination module, using the address translation table for that

module. In this dissertation, we use the term module translation table (MTT) for the translation table that specifies translations for addresses within the module.
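The two-stage lookup can be sketched as follows. This is a simplified model of the scheme described above (in PSI the GTT maps to the target module's translation routine; here it maps straight to a dictionary standing in for the MTT, and registration is done by the stand-in for the modified loader).

```python
PAGE_SHIFT = 12                        # 4KB pages on 32-bit Linux

gtt = [None] * (1 << 20)               # global translation table, 1M entries

def register_module(base, mtt, n_pages):
    """Done by the modified ld.so when a module is mapped."""
    for page in range(n_pages):
        gtt[(base >> PAGE_SHIFT) + page] = mtt

def cross_module_trans(target):
    mtt = gtt[target >> PAGE_SHIFT]    # stage 1: leading 20 bits -> module
    return mtt[target]                 # stage 2: that module's MTT

register_module(0x40000000, {0x40000123: 0x40100123}, 16)
print(hex(cross_module_trans(0x40000123)))  # → 0x40100123
```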

Changes to the Loader Note that the GTT needs to be updated as and when modules are loaded. Naturally, the best place to do this is the dynamic linker. This is accomplished by modifying the source code of ld.so. The change uniformly handles the typical case of the loader mapping all of the shared libraries referenced by an executable (or by another shared library loaded by the loader), as well as the less common case of an application using the dlopen and dlclose primitives to load and unload libraries at runtime. The PSI changes amount to about 300 lines of the source code of ld.so.

The PSI loader modification also addresses two other idiosyncrasies of ld.so. First, note that this approach modifies the entry point of a binary. Thus, any program that uses the entry point for purposes other than jumping to it may no longer work. As it turns out, ld.so does make use of this information when it is invoked directly to load a program. The loader is changed so that it compensates for this change in the entry point, and hence works correctly in all cases. We defer the description of the second idiosyncrasy to Section 2.5.3.

2.4 Optimizations

Runtime performance is a critical metric for evaluating a binary instrumentation system. In PSI, runtime overhead mainly comes from I-cache pressure, D-cache pressure, and branch prediction errors. The I-cache pressure comes from the additional instructions executed for handling indirect branches, while the D-cache pressure is due to the additional data references to the address translation table. Branch prediction errors occur mostly because branch predictors for indirect branches become ineffective, since all indirect branches in a module share one piece of trampoline code in PSI.

We discuss the optimizations performed to address these performance bottlenecks. In particular, Section 2.4.1 demonstrates how to improve branch prediction despite the translation. Section 2.4.2 demonstrates how to reduce I-cache and D-cache pressure by avoiding address translation. Section 2.4.3 shows how to reduce I-cache pressure by eliminating some instrumentation code needed for transparency.

2.4.1 Improving branch prediction

Modern processors use very deep pipelines, so branch prediction misses can greatly decrease performance. Unfortunately, the translation of returns (into a combination of pop and jmp) in PSI leads to such misses. When a return instruction is used, the processor is able to predict the target by maintaining a stack (the Return Stack Buffer) that keeps track of calls. When the return is replaced by an indirect jump, especially one that is always made from a single trampoline routine, branch prediction for returns is not used.

To address this problem, PSI modifies the transformation of calls and returns as shown in Figures 2.2 and 2.3. The original call is transformed into another call into stub code that is part of the instrumentation. There is a unique stub for each call site. The code in the stub adjusts the return address on the stack so that it will have the same value as in the untransformed program. This requires addition of a constant that represents the offset between the call instructions in the original and transformed code. At the time of return, the return address on the stack is translated from its original value to the corresponding value in the transformed program, after which a normal return can be executed.

The key point about this transformation is that the processor sees a return in Figure 2.3 that returns from the call it executed (Figure 2.2, label L_060b1). Although the address on the program stack was adjusted (Figure 2.2, label S_060b1), this is reversed by the address translation in Figure 2.3. As a result, the processor's predicted return matches the actual return address on the stack.

The side effect of this optimization is the introduction of an extra code stub for each call instruction, which increases pressure on the I-cache and introduces an additional direct jump. Despite that, we will show in Section 3.5.3 that the benefits of correct branch prediction on return instructions outweigh the side effect.

Original:              Instrumented:
060b1: call 060c0      L_060b1: call S_060b1
......                 ......
                       S_060b1: add $offset, (%esp)
                                jmp L_060c0

Figure 2.2: Optimized instrumentation of calls

Original:              Instrumented:
060d1: ret             ....              # address translation
                       add $4, %esp
                       mov %edx, (%esp)
                       ret

Figure 2.3: Optimized instrumentation of returns

2.4.2 Avoiding address translation

In this dissertation, we explored three optimizations aimed at eliminating address translation (AT) overheads in the following cases:

AT.1 jump tables

AT.2 PIC translation

AT.3 return target speculation

For the first optimization, instead of computing an original code address and then translating it into a new address, PSI creates a new table that contains translated addresses. The content of the table is copied from the original table, and then each value is translated (at instrumentation time) into the corresponding new address. A catch here is that the size of the original table is unknown. Note, however, that a good guess can be made, based on the CC computation technique from Section 2.2.2. PSI first checks that the index variable is within this range, and if so, uses the new table. Otherwise, it uses the old table and translates the jump address at runtime.

PIC has several code patterns, including a call to get_pc_thunk and a call to the next instruction. The basic function of the pattern is to get the current PC and copy it into a general purpose register. In the translated code, however, get_pc_thunk introduces an address lookup for its return. This extra translation can be avoided by translating this version into a call to the next instruction. No returns are used in this case, thereby avoiding the address translation overhead. (It is worth noting that using a call/pop combination does not affect branch prediction for return instructions: the processor is able to correct for minor violations of call/return discipline.)

In the third case, if a particular ICF transfer tends to target the same location most of the time, PSI can speed it up by avoiding address translation for this location. Instead, a comparison is introduced to determine if the target is this location, and if so, a direct jump is taken. In the implementation, we choose to apply this only to return instructions. PSI uses profiling to determine if a return frequently targets the same location.

2.4.3 Removing transparency

Using static analysis results, PSI can safely avoid some of the overheads associated with full transparency. We refer to this removal of transparency as RT. The following are two optimizations PSI uses:

RT.1 no saving of eflags

RT.2 use non-transparent calls

To achieve RT.1, we analyze all potential indirect and direct control targets. If, starting from each such target, no instruction uses eflags before it is redefined, then RT.1 can be safely used. In fact, we discovered that eflags is live only at a few jump-table targets.

When RT.2 is enabled, all return addresses are within the new code. Note that RT.2 is always enabled for the PIC patterns, i.e., the call to get_pc_thunk and the call to the next instruction. This is because it is simple to analyze this pattern and determine that the non-transparent mode will not lead to any problems, as long as the offset added to obtain the data address is appropriately adjusted.

2.5 Control flow integrity policy

So far, we have discussed all the underlying techniques, including binary disassembly, static analysis, instrumentation, address translation, and optimizations. With this set of static binary techniques, PSI safely handles large and complex binaries and makes sure indirect control flows are correctly handled. In this section, we will focus on control flow integrity (CFI).

2.5.1 Definition of control flow integrity

Control flow integrity (CFI) is a very important low-level property that can be achieved using binary instrumentation. CFI carries two levels of meaning, as follows.

The first meaning of control flow integrity is that CFI is a program state in which all program control flows follow the semantics of the application. The CFI security policy dictates that software execution must follow a path of a control-flow graph (CFG) determined ahead of time [28]. The CFG can be generated from source code analysis or binary analysis. The second meaning is that CFI is a low-level instrumentation that instruments all indirect control flow transfers and ensures that they all target locations specified by the CFI policy.

At a high level, CFI policies serve two purposes: (a) securing target programs from exploits and attacks, and (b) ensuring that all low-level instrumentation built on top of them is non-bypassable. We discuss these separately in the following subsections.

2.5.2 CFI policies and strength evaluation

One of the most important goals of PSI is to provide security for target applications. However, address translation of indirect control flows alone is too permissive: it allows any indirect branch to target any code address inside a binary. Tightening the policy can reduce the attack space, but over-constraining control flow may cause programs to crash due to various corner cases in COTS binaries.

This calls for an effective control flow integrity (CFI) policy, which raises two questions: how do we design a CFI policy that copes with large and complex COTS binaries, and how do we quantify the strength of the CFI properties enforced?

In the following subsections, we present different types of CFI policy designs with discussion of their applicability to large COTS binaries. Then, we propose a metric to evaluate the strength of each design.

Reloc-CFI: CFI was first proposed by Abadi et al. [28]. Their CFI policy, as well as several others [47?], is generally based on the following model of how ICF transfers are used in source code:

1. Indirect call (IC): An indirect call can go to any function whose address is taken, including addresses that are implicitly taken and stored in tables, such as virtual function tables in C++.

2. Indirect jump (IJ): Since compiler optimizations8 can replace an indirect call (IC) with an indirect jump (IJ), the same policy is often applied to indirect jumps as well.

3. Return (RET): Returns should go back to any return address (RA), i.e., an instruction following a call.

It is theoretically possible to further constrain each of these sets, and moreover, use different sets for each ICF transfer. However, implementations typically do not use this option, as increased precision comes with certain drawbacks. For instance, the callers of functions in shared libraries (or dynamically linked libraries in the case of Windows) are not known before runtime, and hence it is difficult to constrain their returns more narrowly than described above. Moreover, some techniques rely on relocation information, which does not distinguish between targets reachable by IC from those reachable by indirect jumps, or between the targets reachable by any two ICs. Hence they do not refine over the above property. For this reason, we refer to the above CFI property as reloc-CFI.

The description of the implementation in Abadi et al. [28] indicates their use of relocation information, and confirms the above policy for ICs. No specifics are provided regarding IJs and returns. Note that Li, Wang et al. [89] use a single table for ICs and IJs, and another for returns, enforcing reloc-CFI in a kernel environment.

A critical limitation of reloc-CFI is that it requires relocation information, which is not available in COTS binaries.

Strict-CFI: Strict-CFI is derived from reloc-CFI, except that it uses the ICF targets computed by the ICF target analysis in Section 2.2 rather than relocation information. In addition, strict-CFI incorporates an extension needed to handle features such as exception handling and multi-threading. Specifically, these features are used by a handful of instructions in system libraries, and we simply relax the above policy for those instructions:

8Specifically, a tail call optimization that replaces a call occurring at the very end of a function with a jump.

• Instructions performing exception-related stack unwinding are permitted to go to any exception handler landing pad (EH).

• Instructions performing context switches are permitted to use any type of ICF transfer to transfer to a function address.

Strict-CFI solves the problem of missing relocation information in COTS binaries by using static analysis. However, its limitation is that it handles exceptional cases only for specific system libraries. Such cases may also occur in other COTS binaries, where strict-CFI would not handle them in general.

2.5.3 Coarser grained CFIs

Corner cases of control flow transfer in COTS binaries can be handled in general by using coarser-grained CFI policies. For instance, the most basic enforcement policy allows ICF transfers to target any instruction in the program; we refer to it as instr-CFI. Obviously, instr-CFI is weak against attacks. However, it has one important property: ICF transfers can never go to the middle of an instruction, and hence cannot bypass any checking code.

Native Client [146] and PittSFIeld (CISC SFI) use instruction bundling to constrain the targets of ICF transfers. We refer to this type of CFI as bundle-CFI.

BinCFI: Our Proposal. Complex binaries can contain exceptions to the simple model of ICF transfers outlined earlier. To define a suitable CFI property for such binaries, this dissertation introduces one more category of ICF transfer in addition to the RET, IC and IJ described earlier. This category, called PLT, includes all ICF transfers in the procedure linkage table, a section of code used in dynamic linking.9 We define BinCFI as shown in Figure 2.4.

It is easy to see that strict-CFI is stricter than BinCFI. The reasons for relaxing strict-CFI are as follows. In general, there is no easy way to distinguish returns used for purposes such as stack unwinding, longjmp, thread context switches, and function dispatch from the (more common) returns from functions. Therefore returns are permitted to go to any of the valid targets corresponding to each of these uses. Returns are sometimes broken up into a pop and a jump, so all possible targets of RET are also permissible targets of IJ. This explains the first column of the table.

9Specifically, for each function belonging to another module, a stub routine is created by the compiler in this section.

                                  Returns (RET),
                                  Indirect Jumps (IJ)   PLT targets   Indirect Calls (IC)
  Return addresses (RA)           Allow
  Exception handling
  addresses (EH)                  Allow
  Exported symbol
  addresses (ES)                                        Allow
  Code pointer
  constants (CK)                                        Allow         Allow
  Computed code
  addresses (CC)                                        Allow         Allow

Figure 2.4: BinCFI Property Definition

Since the purpose of PLT stubs is to dispatch cross-module calls, it would seem that their targets can only be symbols exported from other modules. However, recent versions of gcc support a new function type called gnu_indirect_function, which allows a function to have many different implementations, with the most suitable one selected at runtime based on factors such as the CPU type. Currently, many glibc low-level functions such as memcpy, strcmp and strlen use this feature. To support it, a library exports a chooser function that selects at runtime which of the many implementations will be used. These implementation functions may not be exported at all. To avoid breaking such programs, the policy for PLT should be relaxed to include code pointers in the target library. This is what is done in the second column of Figure 2.4.

Indirect calls should go to targets in one of the sets CC or CK. Since these two sets are usually much larger than ES, the policy merges IC and PLT to use the same table of valid targets.
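The policy of Figure 2.4 can be summarized as a mapping from branch type to permitted target categories. The following Python sketch encodes that table for illustration only; the category and branch-type names are taken from the figure, and this is not how PSI represents the policy internally.

```python
# Hedged sketch: the BinCFI policy of Figure 2.4 expressed as sets of
# permitted target categories per branch type. Categories follow the figure:
# RA (return addresses), EH (exception handler landing pads), ES (exported
# symbols), CK (code pointer constants), CC (computed code addresses).

BINCFI_POLICY = {
    "RET": {"RA", "EH"},         # returns and indirect jumps share a table
    "IJ":  {"RA", "EH"},
    "PLT": {"ES", "CK", "CC"},   # relaxed to support gnu_indirect_function
    "IC":  {"CK", "CC"},         # IC and PLT are merged in the implementation
}

def transfer_allowed(branch_type, target_category):
    return target_category in BINCFI_POLICY[branch_type]

assert transfer_allowed("RET", "RA")
assert not transfer_allowed("RET", "CK")   # returns may not reach arbitrary code pointers
assert transfer_allowed("IC", "CC")
```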

This relaxed form of the CFI policy covers almost all corner cases except one. Our experiments discovered that the use of return instructions for lazy symbol resolution in the dynamic loader violates this policy. Lazy symbol binding is performed by the _dl_runtime_resolve function (or _dl_runtime_profile if profiling is enabled) in ld.so. This function computes the target address corresponding to the symbol, pushes this address on the top of the stack, and returns. For this to work correctly, returns would have to be permitted to target exported symbols, further decreasing the accuracy of the CFI implementation. Instead, we modify the loader to use indirect jumps instead of returns, and restrict the targets of these jumps with the PLT policy shown in Figure 2.4.

AIR: A Metric for Measuring CFI Strength. Previous works on CFI have relied on analysis of higher-level code to effectively narrow down ICF targets. Since binary analysis is generally weaker than analyses of higher-level code, the CFI enforcement in PSI is likely to be less precise. It is natural to ask how much protection is lost as a result. To answer this question, we define a simple metric for representing the quality of protection offered by a CFI technique.

Definition 2.1 (Average Indirect target Reduction (AIR)). Let $i_1, \ldots, i_n$ be all the ICF transfers in a program and $S$ be the number of possible ICF targets in an unprotected program. Suppose that a CFI technique limits the possible targets of ICF transfer $i_j$ to the set $T_j$. In this dissertation, we define the AIR of this technique as the quantity

\[ \frac{1}{n} \sum_{j=1}^{n} \left( 1 - \frac{|T_j|}{S} \right), \]

where the notation $|T|$ denotes the size of the set $T$.

On x86, where branches can target any byte offset, S equals the size of the code in a binary. We measure the strength of our CFI enforcement in Section 3.5.2.1.
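The metric of Definition 2.1 is straightforward to compute once the target-set sizes are known. The sketch below shows the computation on a made-up example (the sizes are invented for illustration, not measured data):

```python
# Sketch of the AIR metric from Definition 2.1: given the size S of the
# unprotected target space and the allowed-target-set size |T_j| for each
# of the n ICF transfers, AIR = (1/n) * sum_j (1 - |T_j|/S).

def air(target_set_sizes, S):
    n = len(target_set_sizes)
    return sum(1 - t / S for t in target_set_sizes) / n

# Toy example: a 1 MB code region (S = 2**20 byte offsets) with three ICF
# transfers restricted to 100, 1000, and 10000 valid targets respectively.
score = air([100, 1000, 10000], S=2**20)
assert 0.99 < score < 1.0   # targets reduced by more than 99% on average
```

An AIR of 0 means the technique leaves every ICF transfer unconstrained, while values close to 1 mean the average indirect branch can reach only a tiny fraction of the original target space.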

2.5.4 Non-bypassable instrumentation

In the previous subsections, we presented the definition of CFI as well as a practical CFI policy for COTS binaries. In this subsection, we focus on the other purpose of CFI: instrumentation safety. CFI implies that the added instrumentation code is non-bypassable. This is enforced by the following policies:

• All direct and indirect control-flow transfers made from the original code must target instructions in the original code that were validly disassembled by the disassembler.

• If a snippet was specified for insertion before an instruction I, then all (direct or indirect) control-flow transfers targeting I will instead be made to target the first instruction of the added snippet.

• Only the added instrumentation code can transfer control to libraries containing instrumentation support functions. (Recall that the high-level instrumentation API relies on inserting calls to this library.)

All indirect branches, including jumps, calls and returns, are checked at runtime to ensure the above properties. Direct branches are checked at the time of generating the instrumented binary. A modified loader (Section 2.3.5) denies requests to load uninstrumented libraries. This ensures that all indirect branches are checked.
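The runtime check on indirect branches can be pictured as a table lookup: valid original targets map to their instrumented counterparts, and anything else is rejected. The following is an illustrative Python model only; PSI's actual mechanism is an assembly trampoline and hash table, and the addresses below are hypothetical.

```python
# Illustrative model (not PSI's actual assembly trampoline) of the runtime
# check on indirect branches: an address-translation table maps each valid
# original target to its instrumented counterpart, and any address absent
# from the table aborts execution.

class CFIViolation(Exception):
    pass

# Hypothetical table: original address -> address in the new code section.
TRANSLATION_TABLE = {
    0x8048400: 0x9048410,   # a valid function entry
    0x8048420: 0x9048440,   # a valid return address
}

def translate_indirect_target(original_addr):
    """Performed before every indirect jump, call, or return."""
    try:
        return TRANSLATION_TABLE[original_addr]
    except KeyError:
        # Target is data, unrecognized code, or mid-instruction: block it.
        raise CFIViolation(hex(original_addr))

assert translate_indirect_target(0x8048400) == 0x9048410
```

Because data segments, undiscovered code, and instruction middles never appear in the table, the lookup enforces all three of the policies listed above at once.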

The above properties are ensured by a coarse-granularity CFI policy. On top of it, PSI imposes additional restrictions intended to ensure the safety of added instrumentation. The following is a non-exhaustive list of attacks that attempt to escape instrumentation and are prevented by PSI:

• Branching to data segments. As described above, the list of valid targets can only include addresses within validly instrumented code. Thus, data segments cannot appear in this table of valid targets.

• Branching to code sections that were not recognized and instrumented. If the disassembler fails to recognize some code fragments, they won't be instrumented. However, since branch targets are restricted to valid instruction boundaries in disassembled code, any attempt to execute undiscovered code will be blocked.

• Branching to middle of instructions. Code reuse attacks are a prime example here. Since the targets are checked to be valid instruction boundaries, these attacks are stopped.

• Bypassing the instrumentation code. As noted above, if an instrumentation snippet was specified for insertion before an instruction, that instruction is no longer permitted to be a branch target.

• Corrupting the integrity of instrumentation logic by jumping into its middle, or by accessing functions intended to be used exclusively by instrumentation. As noted above, branches will be checked to preclude these targets.

Note that checks on indirect branches protect explicit control flows. Implicit control flows are protected by our signal handlers, mentioned in Section 2.3.4.

2.5.5 Formal proof of non-bypassable instrumentation

The previous section discussed the safety of instrumentation by listing the properties enforced and the potential attacks defeated. This section pursues a formal proof of instrumentation non-bypassability. At a high level, the properties enforced by the core techniques of the previous sections can be summarized as follows:

• "What You Disassemble Is What You eXecute (WYDIWYX):" An instrumented program will only execute code that was successfully disassembled and instrumented.

• Disassembly errors will not lead to undefined program behavior (although errors in added instrumentation can).

• Instrumentation cannot be bypassed, nor can the control flow within the added instrumentation be subverted.

We make these properties more precise and establish them in this section. We begin with the following observation about the instrumentation scheme described in the preceding section:

Observation 2.2. After the above instrumentation, the only ICF transfers left in the program are those in the address translation trampoline; the rest have been translated into direct control-flow transfers using labels of assembly statements in the instrumented program. Moreover, the only ICF targets are the entries in the hash table.

We need a few definitions to formalize the notion of sound policy enforcement.

Definition 2.3 (Code). Code is a sequence of tagged instructions. An instruction tagged valid contains a valid machine instruction, whereas an instruction tagged invalid may contain an arbitrary sequence of bytes. An instruction may optionally be associated with a unique label that can be used as a control-flow target.

An instruction tagged invalid can be used to represent any sequence of data bytes. For instructions produced by the disassembler, we use the location of each instruction to generate a corresponding unique label, as described earlier.

Definition 2.4 (Code Instrumentation). A code instrumentor I takes a piece of code C and produces another piece of code C′ = I(C). Each instruction i ∈ C is mapped to a code sequence s ∈ C′ consisting of three parts: pre(i), i′ and post(i). Here, pre(i) and post(i) represent the instrumentation code inserted before and after the original instruction; both may be empty, and i′ is the instruction corresponding to i. The inverse I⁻¹ is a one-to-one mapping from the beginning of each s to the corresponding original instruction i. Note that instruction addresses within s other than its beginning have no such mapping to any instruction in C.
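Definitions 2.3 and 2.4 can be made concrete with a small executable model. The sketch below is an illustration of the definitions, not PSI code: instructions are (label, text, valid) tuples, and the inverse mapping I⁻¹ (here I_inv) is keyed by the position where each instrumented sequence s begins.

```python
# Hedged model of Definitions 2.3/2.4: code is a list of tagged
# instructions; the instrumentor maps each original instruction i to the
# sequence pre(i) ++ [i'] ++ post(i), and I_inv maps the first position of
# each such sequence back to the original instruction it came from.

def instrument(code, pre, post):
    """code: list of (label, text, valid) tuples; pre/post: i -> snippet lists."""
    new_code, I_inv = [], {}
    for insn in code:
        seq = pre(insn) + [insn] + post(insn)
        I_inv[len(new_code)] = insn      # only the start of s maps back to i
        new_code.extend(seq)
    return new_code, I_inv

code = [("L0", "push ebp", True), ("L1", "ret", True)]
new_code, I_inv = instrument(
    code,
    pre=lambda i: [("", "incl counter", True)] if i[1] == "ret" else [],
    post=lambda i: [],
)
assert I_inv[0] == code[0]          # position 0 starts the sequence for L0
assert I_inv[1] == code[1]          # position 1 starts the sequence for L1
assert new_code[1] == ("", "incl counter", True)   # pre-snippet precedes ret
assert 2 not in I_inv               # mid-sequence positions have no preimage
```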

Note that an instrumentation should have other important properties: as far as program behavior is concerned, C′ should be semantically equivalent to C on inputs of interest, while on some invalid or undesired inputs, C′ may terminate before C. This divergence of behavior on undesired inputs corresponds to the enforcement of a security policy by the instrumentor. While these aspects are helpful in understanding instrumentation, we do not formalize them since they are not needed in the discussion below. Instead, based on the above, we define the following property:

Definition 2.5 (Instruction-boundary enforcement (instr-CFI)). An instrumentation I is said to ensure instr-CFI if, for every control-flow instruction i′ in I(C), the target d of i′ is constrained as follows:

1. d is the beginning of some instruction sequence s in I(C)

2. if I⁻¹(i′) is defined, then i′ must be a direct control-flow transfer, and I⁻¹(d) must also be defined.

The first condition is obvious: it captures the essence of instr-CFI, i.e., the instrumented code does not jump into the middle of an instrumentation code sequence. The second condition goes beyond this simple requirement, and is intended to ensure that newly added instrumentation cannot be compromised by the original code. Since the ability of original code to subvert newly added instrumentation is the only concern, the second condition applies only to instructions from the original code. It states that these instructions cannot be used to jump to any of the instructions newly added by the instrumentation. This ensures that the original code has no ability to interfere with the logic of the instrumentation. To illustrate this point, consider an instrumentation that sandboxes a register value by performing a bitwise-and operation with a constant. The second condition rules out the possibility that a control transfer in the original program can skip this sandboxing operation by jumping past it.

Lemma 2.6. The instrumentation described in Section 2.3.3 ensures instr-CFI.

Proof: The first condition holds because, as stated in Observation 2.2, the only ICF transfers are in the address translation trampoline, and all of them transfer to legitimate instruction boundaries in the hash table. Moreover, direct control-flow transfers cannot jump into the middle of instructions because they use labels, which are associated only with the beginnings of instructions.

With respect to the second condition, from Observation 2.2, the only ICF transfers are in trampoline code that is newly added by the instrumentation. Being newly added, I⁻¹ is undefined for these instructions, which establishes the first part of the condition. Moreover, since all direct control-flow transfers are transformed to use labels of the form L_A, the second part of the condition holds as well.

We are now ready to establish that disassembly errors will not cause execution of unrecognized or uninstrumented code.

Corollary 2.7. Disassembly errors do not compromise the policy enforced by instrumentations. In particular, disassembly errors:

• do not lead to execution of unrecognized or uninstrumented code

• do not enable control-flow transfers into the middle of instrumentation code.

The first of these two properties is referred to as "What You Disassemble Is What You eXecute (WYDIWYX)." It provides a sound basis for static analysis and instrumentation of binaries: the results computed remain sound even in the presence of disassembly errors. The second property is motivated by the fact that jumps into the middle of instrumentation can defeat or bypass inline checks, and hence violate the policy intended by the instrumentation.

Proof: Note that all direct control-flow transfers go to legitimate labels (and hence legitimate instructions) in the instrumented program. From Observation 2.2, it follows that all ICF transfers go to legitimate code entries in the hash table. The second property follows from the second condition of the instr-CFI definition.

We now turn our focus to functionality, i.e., the potential for disassembly errors to break the instrumented program. Disassembly errors fall into three categories:

• Failure to disassemble legitimate code: This error isn't possible with linear disassembly, since the entirety of all code segments is disassembled.10

• Disassembly of non-code, e.g., disassembling of data stored within a code segment.

• Incorrect disassembly of legitimate code: This can be further subdivided into:

– decoding error, i.e., incorrect decoding of a legitimate instruction.

– instruction boundary error, i.e., an attempt to decode from the middle of a legitimate instruction.

While decoding errors are possible, they represent simple implementation bugs that can usually be found and fixed easily. Henceforth, we only consider disassembly of non-code and instruction boundary errors, and make a few observations about instrumentation correctness despite these errors.

Observation 2.8. Disassembly of non-code, on its own, does not break the functionality of the instrumented application.

Proof: If non-code is disassembled and instrumented, the result is not meaningful code. However, this does not cause any problems, since the program, while exercising its normal functionality, will never jump into invalid code. The program may, however, access this non-code as data. Such accesses go to the original code, because the instrumentation ensures transparency: data address calculations, including those involving code pointers, produce the same values as in the original program. This is ensured by the transparent instrumentation of control flow transfer instructions (details in Section 2.3.3). Since the original code is left in place, instrumented code will read exactly the same data as the original program, and hence will function the same as before.

Observation 2.9. Disassembly errors never lead to undefined behavior: they can cause premature termination of the instrumented program, but this occurs before any incorrectly disassembled code is executed.

10Strictly speaking, the disassembly approach can mark regions as non-code, and this raises the possibility of failure to disassemble legitimate code. However, note that regions are marked as non-code only when disassembly errors are detected, which implies that the region contains non-code. Moreover, code pointer analysis and data region recognition algorithms designate the smallest possible region as non-code, thus avoiding the possibility of leaving out legitimate code from disassembly.

Proof: We have already shown that disassembly of non-code does not affect correctness, so only instruction boundary errors need to be considered. Note that execution starts at the entry point specified in the ELF header. Since linear disassembly starts from this point, there can be no instruction boundary error at the beginning of program execution. Starting from the entry point, we can argue inductively that instruction boundary errors cannot occur in the instructions executed next, until a control-flow transfer is taken. Consider the first control-flow transfer at which an instruction boundary error occurs. Since the program's behavior so far has not been affected by errors, the control-flow target represents a valid instruction. An instruction boundary error at this target means that it falls in the middle of a disassembled instruction. Because of the error checks performed during disassembly, this is not possible for a direct control-flow transfer. For an ICF transfer, since the address translation trampoline checks that the target corresponds to the beginning of a disassembled instruction, the transfer will be disallowed. Thus, program execution aborts before any incorrectly disassembled code is executed.

2.6 Summary

In this chapter, we presented the core techniques of static binary instrumentation used in PSI. In particular, we discussed static binary disassembly and code pointer analysis. Building on these techniques, we described instrumentation based on address translation, and then elaborated on instrumentation optimizations. We then discussed control flow integrity built on top of address translation, and showed how to build a compatible yet strong CFI policy for complex COTS binaries. Finally, we proved that our instrumentation, combined with CFI, is non-bypassable.

Chapter 3

Platform Implementation, API and Evaluation

The focus of this chapter is to discuss the implementation details of PSI, its in- strumentation API for arbitrary instrumentation and a systematic evaluation.

The ability to support generic instrumentation is based on two main observations: (a) PSI avoids inline instrumentation, so the instrumented code does not suffer from size limitations, i.e., the inserted code can be of any length; (b) instrumented code addresses are mapped to original code addresses using address translation, i.e., all indirect control transfers are handled correctly regardless of the instrumentation.

3.1 System implementation

PSI instruments executables as well as all of the shared libraries used by them. On each invocation, PSI takes a binary file (executable or library) as input, and outputs an instrumented version of this binary. This invocation may occur before execution, or on-the-fly during program execution.

Figure 3.1 shows the architecture of PSI. It consists of three main components: a binary analyzer, an instrumentor and a binary generator. The core techniques of the binary analyzer have been discussed in Section 2.1 and Section 2.2. It takes a binary as input, disassembles it, and then constructs a control-flow graph (CFG). This CFG is then used as the input for static analysis.

[Figure 3.1: Architecture Overview of PSI. Diagram omitted: instrumentation tools (basic block counting, system call policy, shadow (return) stack, software fault isolation (SFI), library loading, profiling, debugging, and more) sit on top of the binary instrumentation API; beneath it, the binary rewriter comprises the binary analyzer (disassembly, static analyzer), the instrumentation controller (generic instrumentation, checker, address translation), and the binary generator (ELF layout), all operating on the target ELF binary (original code, original data, metadata, new code and data).]

The instrumentor described in Section 2.3 takes the analyzed disassembly, performs the instrumentation, and outputs new assembly code. Note that the control flow integrity policy described in Section 2.5 is applied as mandatory instrumentation. Once the instrumentor is done, the binary generator re-assembles the code into an object file, fixes all relocations, and injects the new code back into the original binary.

The heart of PSI is the instrumentor, which provides an expressive API for static binary instrumentation. Two levels of API are supported. The low-level API operates at the instruction level, and supports operations for inserting snippets at desired points. The high-level API allows insertion of calls to instrumentation functions that may be written in a language such as C. This code is compiled into a shared library, and this platform ensures that these functions can be called in a secure manner from (and only from) the calls inserted using the high-level instrumentation API. This API is further described in Section 3.2.

The actual task of instrumentation is performed by instrumentation tools, which are programs that use the API provided by the platform to instrument applications

37 and the libraries used by them.

The API provided by PSI not only allows tools to control the instrumentation phase, but also other phases such as static analysis, address translation, etc.

3.2 Instrumentation API

Our PSI platform provides a simple API for custom instrumentation of binaries. Being a static rewriting tool, PSI performs all instrumentation operations in one shot: the entire CFG of a binary is presented to the code instrumentor, which traverses the CFG and adds all the desired instrumentation. This contrasts with DBI tools, where instructions and basic blocks are discovered one by one, just before their first execution, and the code instrumentor is invoked separately on each newly discovered basic block.

After disassembly, PSI constructs a control-flow graph (CFG) of the program. The nodes in the CFG are basic blocks, each of which consists of a sequence of instructions. All incoming control transfers into a basic block go to its first instruction, while all control transfers out of the block occur on its last statement. Note that every indirect control flow target computed by the static analysis is considered in defining these basic blocks. Since this analysis estimates a superset of possible indirect targets, the basic blocks computed by the instrumentation code can be smaller than those computed by a compiler.

The entire CFG can be accessed using the API function getCFG, while the list of all basic blocks and instructions can be obtained using the functions getBBs and getInsns. The API also provides operations to iterate through instructions and basic blocks in a CFG, and instructions in a basic block. Also supported are operations to examine instructions. These operations are based on the Intel’s xed2 instruction encoder/decoder library. Some of the most commonly used operations are isCall, isRet, isTest, isSysCall, isMemRead, isMemWrite, getTarget, and getSrc.
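The traversal style can be illustrated with a small executable model. The sketch below is a hypothetical Python rendering of the API names mentioned above (getBBs, getInsns, isCall, isRet); the actual PSI tools are written in Perl against the xed2-backed operations, so this is a mock-up for illustration only.

```python
# Hypothetical Python mock-up of the traversal API described in the text
# (not PSI code; the real tool interface is Perl). Names mirror the text:
# getBBs, getInsns, isCall, isRet.

class Insn:
    def __init__(self, mnemonic, target=None):
        self.mnemonic, self.target = mnemonic, target
    def isCall(self):
        return self.mnemonic == "call"
    def isRet(self):
        return self.mnemonic == "ret"

class BasicBlock:
    def __init__(self, insns):
        self.insns = insns
    def getInsns(self):
        return self.insns

class CFG:
    def __init__(self, bbs):
        self.bbs = bbs
    def getBBs(self):
        return self.bbs

def count_calls(cfg):
    """One-shot traversal: every block is available before rewriting."""
    return sum(i.isCall() for bb in cfg.getBBs() for i in bb.getInsns())

cfg = CFG([BasicBlock([Insn("push"), Insn("call", "%eax")]),
           BasicBlock([Insn("ret")])])
assert count_calls(cfg) == 1
```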

38 3.2.1 Insertion of assembly code snippets

Instrumentation can be performed at a low or high level. At the low level, an instrumentation snippet is inserted as follows:

ins_snippet(target, location, snippet)

Here, target is a reference to an instruction or a basic block, specified using a reference to the corresponding object or using a label. The parameter location is one of BEFORE or AFTER, and snippet is a string of assembly code to be inserted. For call instructions, one extra location, AFTER_CALL, is defined. To ensure transparency of return addresses on the stack, a call is translated into a push instruction that pushes the original return address, followed by a jump that transfers control to the target function. AFTER_CALL corresponds to the point between the push and the jump.

Instead of inserting additional instrumentation, some applications may require replacement of existing instructions. This is done using the following API function:

replace_ins(target, new_snippet)

Instruction emulation is one purpose for which this API function comes in handy: we replace the original instruction with a snippet that emulates it.

PSI provides a private thread-local storage (TLS) area that can be used by assembly snippets to store their data. This private TLS, which is independent of the one provided by glibc, is organized into two arrays, TS and GS, both initialized to zero. The size of these arrays is configurable, but defaults to one memory page. Snippet code can use the identifiers TS_n and GS_n to access the nth word of the arrays TS and GS, respectively.

3.2.2 Insertion of calls to instrumentation functions

The benefit of the snippet API is efficiency: the instrumentation writer can minimize the number of instructions executed. Its downside is that instrumentation has to be written in assembly, which is more complex. In contrast, the higher-level API simplifies instrumentation but is generally less efficient. It enables the insertion of calls to handler functions in a shared library

defined by the instrumentor. Several low-level details are handled automatically by the high-level API, including saving/restoring registers and flags, switching to a different stack, resolving the symbolic name of the user-defined handler function, and making the program state available through a high-level data structure called Context. These factors simplify the instrumentation task and allow handler code to be written in higher-level languages (currently, C/C++). This API is accessed using the following function:

ins_call(target, location, name, args)

Here, target and location have the same meaning as the snippet API. The name of the function to be invoked is specified using the string parameter name. This function should have the following prototype:

void handler(struct Context *c, ...)

It takes a first parameter that represents the runtime context of the instrumented program, including all of the CPU registers, stack, etc. The subsequent parameters are exactly those that were included in the args parameter to ins call.
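As an illustration of this mechanism, the following Python sketch models how ins_call-style registration could drive a handler with a Context argument. This is not PSI code: the real handlers are C/C++ in a shared library and the register/flag saving happens in generated assembly; here the "instrumented program" is simulated by a plain loop.

```python
# Hedged Python model of the high-level API: ins_call arranges for a
# handler to run at the chosen point with a Context argument. The
# instrumented program is simulated by run(), which invokes registered
# handlers before matching instructions.

class Context:
    def __init__(self, regs):
        self.regs = regs         # stands in for the saved CPU state

handlers = []                    # (predicate, location, fn, args)

def ins_call(predicate, location, fn, *args):
    handlers.append((predicate, location, fn, args))

def run(insns, regs):
    for insn in insns:
        for predicate, location, fn, args in handlers:
            if location == "BEFORE" and predicate(insn):
                fn(Context(regs), *args)
        # ... the original instruction would execute here ...

counts = {"ret": 0}

def count_rets(ctx, table):
    table["ret"] += 1            # registers/flags are already saved for us

ins_call(lambda i: i == "ret", "BEFORE", count_rets, counts)
run(["push", "call", "ret", "mov", "ret"], regs={"eax": 0})
assert counts["ret"] == 2
```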

3.2.3 Controlling address translation

As described earlier, address translation instrumentation, which regulates ICF transfers, is automatically added by PSI. PSI also provides API functions so that an instrumentation developer can exercise finer control over ICF transfers. These functions can be used by an instrumentation tool that implements a more sophisticated ICF target analysis to further restrict indirect branches. Even without performing additional static analysis, an instrumentation tool may enforce a more restrictive policy, e.g.,

• all returns should go to instructions following calls

• (some or all) indirect jumps should not target addresses outside the current module

These restrictions can be specified using the API function:

rm_indirect_target(src_addrs, target_addrs)

The argument src_addrs is a list of the labels of the ICF transfer instructions whose targets should be restricted. If it is empty, the operation is applied to all ICF transfer instructions in the module. The second argument is also a list of labels, which represents the list of valid targets.

Custom address translation instrumentation can also be used to relax a previously specified policy. This is done using the following API function:

add_indirect_target(src_addrs, target_addrs)

The platform will keep track of possible targets for each source address, and will generate a unique address translation trampoline for each set of source addresses that share the same set of possible targets.
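This bookkeeping can be sketched in a few lines of Python (an illustration, not PSI's implementation; the function names deliberately mirror the API above, and the per-source target sets are an assumed data structure):

```python
# Sketch: each ICF source carries a set of permitted targets; sources that
# end up with identical target sets can share one translation trampoline.

def rm_indirect_target(targets_of, src_addrs, target_addrs):
    """Restrict the listed sources (all sources if src_addrs is empty)."""
    removed = set(target_addrs)
    for src in (src_addrs or targets_of):
        targets_of[src] -= removed

def add_indirect_target(targets_of, src_addrs, target_addrs):
    """Relax a previously specified policy for the listed sources."""
    for src in (src_addrs or targets_of):
        targets_of[src] |= set(target_addrs)

def trampoline_groups(targets_of):
    """Group sources by target set: one trampoline per distinct set."""
    groups = {}
    for src, tgts in targets_of.items():
        groups.setdefault(frozenset(tgts), []).append(src)
    return list(groups.values())
```

For example, restricting two return instructions to exclude a target leaves them sharing one trampoline, while an unrestricted indirect jump keeps its own.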

3.2.4 Runtime event handling

Finally, the API supports registration of instrumentation functions that will be called when certain events occur at runtime, such as program/thread startup or exit, loading of libraries, and system calls:

• register_pre_syscall_handler()

• register_post_syscall_handler()

• register_library_load_handler()

• register_thread_start_handler()

• register_thread_terminate_handler()

• register_program_start_handler()

• register_program_terminate_handler()

These API calls take a function name as their argument.

3.2.5 Development of instrumentation tools

To develop an instrumentation tool, the user provides the tool code and, optionally, a client library. The tool code uses the API provided by PSI to realize an instrumentation tool. Tools that use the high-level API need a mechanism to provide the definitions of the function calls inserted using that API. This is the role of the client library.1

The instrumentation tool code is written in Perl and stored in a source file, say, bbcount.pl. To instrument a binary file xyz, the tool developer invokes the PSI instrumentor, passing the instrumentation tool as a parameter. Specifically, the following command is used for an instrumentation tool that relies on the low-level API:

psi_instrumentor -t bbcount.pl [-m map] -o instrumented_xyz -- xyz

If the high-level API is used, the client library, say bbcount.c, first needs to be compiled into a shared library bbcount.so. At runtime, calls made by the instrumentation to client library functions in bbcount.so need to be resolved. In principle, resolving a client function name is straightforward: use the standard C compiler to produce a shared library from the client library source, and include this library in the dependency list of the instrumented binary. However, such an approach would violate the security requirement, because client library functions would become callable by application code. Instead, PSI restricts these functions to be callable only from the added instrumentation. To enforce this restriction, we have developed a dedicated symbol resolution technique for resolving function names in the client library.

To resolve client functions, the basic idea is to create a global address table (GAT) that contains the memory locations where the client functions have been loaded. To simplify the look-up process, PSI translates function names in the client library to integer indices. This enables the GAT to be a simple array indexed by these integers. A mapping file specifying the name-to-index mapping is generated during the compilation of the client library:

psic -o bbcount.so bbcount.c

psic --extract-map -o bbcount.map bbcount.so

1Note that the tool code is used at the time of instrumenting a binary, whereas the client library functions are used during the execution of the instrumented applications. This is why tool code is separated from the client library.

This mapping file is used by the instrumentor to translate calls to client library functions made in the instrumentation code. The mapping file is also used by a modified loader, which uses it to populate the GAT. Specifically, for each function f defined in the client library, the loader finds its index i_f in the mapping file, and stores the location where f is loaded in GAT[i_f]. After being populated, the GAT is made read-only. In addition, none of the locations in the client library are ever added to the translation tables. These two steps ensure that code in the client libraries can never be invoked by the original code.
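The name-to-index scheme can be sketched as follows (a Python simulation with made-up function names and addresses; the real mapping file and GAT are produced by psic and the modified loader):

```python
# Sketch of the GAT scheme: the compiler assigns each client function a
# small integer index; the loader fills GAT[index] with the function's
# load address, then treats the table as read-only.

def build_map(client_functions):
    """bbcount.map analogue: function name -> integer index."""
    return {name: i for i, name in enumerate(sorted(client_functions))}

def populate_gat(name_to_index, load_addresses):
    """Loader analogue: for each f, store its load address in GAT[i_f]."""
    gat = [None] * len(name_to_index)
    for name, idx in name_to_index.items():
        gat[idx] = load_addresses[name]
    return tuple(gat)          # tuple: read-only after population

def call_site_index(name_to_index, name):
    """Instrumented code calls through GAT[index], never by name."""
    return name_to_index[name]
```

Because application code never sees the client functions' names or addresses, and the GAT itself is read-only and absent from the translation tables, only the added instrumentation can reach them.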

Resolution of variable names can be accomplished in a similar way, with the help of another GAT and a corresponding name-to-index mapping.

Finally, the user can launch the translated binary using the following command:

psi_loader -t bbcount.so -- instrumented_xyz

3.3 On-demand instrumentation

One of the drawbacks of a purely static instrumentation approach is that the user has to compute the list of all shared libraries that may be used when an instrumented program runs, and create instrumented versions of these libraries. This is a difficult task, since some libraries may be loaded long after program execution begins. Some of these libraries may reside at user-specified locations known only at runtime. In order to support seamless instrumentation of such libraries, the PSI platform provides an option to generate instrumented libraries on-the-fly. In particular, PSI leverages a modified loader that supports an option to specify a configuration file, which is consulted when an instrumented application requests to load an uninstrumented library. (If this option is not specified, any request to load an uninstrumented library is denied.) This configuration file must specify the names of the libraries containing the tool code and the client library code, as well as the mapping file. The loader then invokes PSI to create an instrumented version of the uninstrumented library, and loads this version.

Instrumented libraries are stored in a disk cache for subsequent uses. This cache can store multiple versions of the same library, each corresponding to a different tool.

In principle, PSI could be deployed on a system-wide basis, and use a shared cache across all users. However, PSI currently relies on a simpler scheme that uses a per-user cache. The cache is simply a directory owned by the user, say, ${home_psi}/bob/.

When the loader is asked to load a library, say, /usr/lib/abc.so, by a process instrumented with a tool bbcount and owned by bob, the loader concatenates the library name to the cache location, i.e., it looks for the file:

${home_psi}/bob/bbcount/usr/lib/abc.so

If found, this file is loaded. If not, the loader invokes PSI to instrument /usr/lib/abc.so with the tool bbcount, stores the result in the cache, and loads the instrumented version from the cache.
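The cache lookup amounts to a simple path computation; a Python sketch (the directory layout follows the text, while the function names and the instrument callback are illustrative):

```python
import os

def cached_path(home_psi, user, tool, library):
    """/usr/lib/abc.so -> ${home_psi}/user/tool/usr/lib/abc.so"""
    return os.path.join(home_psi, user, tool, library.lstrip("/"))

def load_library(home_psi, user, tool, library, instrument):
    """Return the cached instrumented library, creating it on a miss."""
    path = cached_path(home_psi, user, tool, library)
    if not os.path.exists(path):
        instrument(library, path)   # cache miss: invoke PSI, store result
    return path
```

Because the tool name is part of the path, the same library can be cached under several tools simultaneously, matching the multi-version cache described above.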

Note that libraries instrumented with the same tool but with different compilation options or client libraries may cause compatibility problems. To avoid these, the loader checks the library version, the compilation options, and the client library version. It also checks the timestamps on the tool code and the client library code; if they are newer than the cached version, a new instrumented version is generated and the copy in the cache is updated.

On-demand instrumentation can be applied to executables as well, and serves to support seamless instrumentation of applications that involve running multiple executables.

3.4 Illustrative instrumentation examples

This section uses several examples to illustrate the flexibility, versatility and ease of use of the API described in the preceding section.

3.4.1 Basic block counting

Basic block counting has been used to illustrate previous DBI tools such as Pin and DynamoRIO. Moreover, an optimized version of this tool is available for these

platforms, thus providing a good basis for performance comparison. For this reason, we illustrate the platform and API using this example. (See Figure 3.2.) The core of the instrumentation is to increment a memory location. However, since this operation affects CPU flags, it is necessary to save and restore them. This is performed in the snippet unopt.

It would be safe to avoid the flag save/restore, and use the optimized snippet opt, if the flags are not live at the snippet insertion point. To simplify the example, we avoid a general liveness analysis. Instead, the tool searches for two common instances of flag-setting instructions, namely test and cmp, and inserts the increment instruction just before them.

For brevity, we have omitted the code for printing the results. This can be achieved by registering a thread termination handler that accumulates the count from the thread-local TS_1 into a global location, say, GS_1. Finally, a program termination handler needs to be registered that prints the value of GS_1.

unopt = "mov %eax, TS_0; lahf; incl TS_1; sahf; mov TS_0, %eax"
opt   = "incl TS_1"

foreach bb in getBBs() {
    found = false
    foreach insn in bb {
        if isTest(insn) or isCmp(insn) {
            found = true
            ins_snippet(insn, BEFORE, opt)
            break
        }
    }
    if !found
        ins_snippet(bb, BEGIN, unopt)
}

Figure 3.2: An instrumentation tool for Basic Block Counting

3.4.2 System call policy enforcement

System call policy enforcement is a well-known technique for sandboxing. PSI provides a simple API to enforce system call policies. Performance overheads are minimal, comparable to library interposition, while the security provided is comparable to ptrace, a much heavier-weight mechanism used in tools such as strace.

System call policy enforcement is implemented by registering system call event handlers via register_pre_syscall_handler() and register_post_syscall_handler(). The platform is able to identify system calls that use the int 0x80 mechanism as well as the faster method that uses the sysenter instruction. The handler function can use its Context argument to determine the system call arguments, which are stored in registers. The handler can examine and/or modify these arguments.
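As an illustration of the kind of check a pre-syscall handler can express, the following Python sketch stands in for such a handler. Here a dictionary takes the place of the Context structure, read_string is a hypothetical helper for reading a string from the monitored process, and the 32-bit Linux convention is assumed (eax holds the system call number, ebx the first argument):

```python
SYS_OPEN = 5   # open(2) on 32-bit Linux

def pre_syscall_handler(ctxt, read_string):
    """Deny open() of files under /etc; allow everything else."""
    if ctxt["eax"] == SYS_OPEN:
        path = read_string(ctxt["ebx"])   # ebx holds the pathname pointer
        if path.startswith("/etc/"):
            return "deny"
    return "allow"
```

A real handler would inspect the registers in its Context argument in exactly this fashion, and could also rewrite an argument (e.g., redirect the pathname) instead of denying the call.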

3.4.3 Library load policy enforcement

Malicious library loading is a well-known strategy employed by security exploits to circumvent defenses against injected code, such as those that prevent execution of data.

A library loading policy is implemented using a tool that registers a handler via register_library_load_handler(). The handler can then examine the name of the library being loaded, and disallow it if need be. In our tool, rather than enforcing a policy, we simply log a message that can be processed subsequently to identify how many libraries are loaded by an application, and what fraction of them are loaded after execution begins. This tool has been applied to a collection of commonly used command-line and GUI applications; a significant fraction of the libraries (over 40% in this experiment) were found to be loaded after the commencement of application execution.

3.4.4 Shadow stack

Shadow stack [115] is a well-known technique for defending against return address corruption. The idea is to maintain a second copy of return addresses on a “shadow” stack, and check the two copies for consistency before each return. Successful exploits now require both copies of the return address to be compromised, which is harder than circumventing protection mechanisms such as stack canaries.

Binary-based return address defender [115] was the first to use binary instrumentation to implement shadow stacks. It inserts additional code at function prologues and epilogues to respectively push and check return addresses on the shadow stack. While this approach is useful against buffer overflow attacks on return addresses, it is not effective against ROP attacks, which mainly use unintended return instructions: there will be no shadow stack checks preceding such “instructions.” Note that the initial exploit can be triggered without compromising a return address, e.g., by corrupting a function pointer.

ROPdefender [63] addresses this weakness using DBI. Since DBI techniques ensure instrumentation of all code before execution, their approach instruments unintended returns as well, and hence prevents ROP attacks. We compare the performance of their implementation, which was based on Pin, with PSI. For this purpose, a shadow stack instrumentation tool was developed, as shown in Figure 3.3. Its implementation emphasizes ease of development and compatibility with legacy software; no significant effort was made to optimize it. Thus, the performance results reflect the performance strengths of PSI, rather than the efficiency of the instrumentation tool.

Note that, being a static instrumentation technique, PSI will not instrument unintended return instructions. However, the runtime checks on indirect targets will stop any attempt to jump to such instructions. Moreover, attacks aimed at evading shadow stack checks, such as those based on jumping into the middle of (or past the end of) checking code, will be defeated as well.

In Figure 3.3, the shadow stack could be initialized at the time a new thread is spawned. However, we opted for a simpler (but less efficient) approach, where the validity of the shadow stack is checked on each call instruction using chk_init_shadowstk. This snippet uses a support function to allocate a shadow stack if one is not already set up. Once the shadow stack is in place, push_shadowstk is used to push a copy of the return address onto the shadow stack.

Checking the integrity of returns is more complex, so we use a high-level function to perform this action. Note that uses of longjmp can cause a mismatch between the shadow and main stacks, because stack frames have been popped off the main stack. The solution to this problem, used in previous works [63, 115], is to successively pop entries off the shadow stack until the two match. If the bottom of the shadow stack is reached, that implies an attack, and the program is aborted.

/* shadow stack pointer is stored in TS_2 */
chk_init_shadowstk = "
    cmp $0x0, TS_2;
    jnz L001;
    call $alloc_stack;
L001: ";

push_shadowstk = "
    mov %eax, TS_0;
    mov %ebx, TS_1;
    subl $4, TS_2;
    mov TS_2, %eax;
    mov (%esp), %ebx;
    mov %ebx, (%eax);
    mov TS_0, %eax;
    mov TS_1, %ebx;"

check_return(Context *ctxt) {
    shadow_sp = ctxt->TS[2]
    ret = getmem(ctxt->ESP)
    while !empty(shadow_sp)
        if (pop(shadow_sp) == ret) {
            ctxt->TS[2] = shadow_sp
            return
        }
    abort()
}

foreach insn in getInsns()
    if isCall(insn) {
        ins_snippet(insn, BEFORE, chk_init_shadowstk)
        ins_snippet(insn, BEFORE, push_shadowstk)
    } else if isRet(insn)
        ins_call(insn, AFTER_CALL, check_return)

Figure 3.3: Shadow Stack Defense

As noted by the authors of ROPdefender, real-world programs introduce a few benign violations of shadow stack checks, and these need to be handled. We already described how violations due to longjmp are handled. Other violations occur due to the lazy binding used by the dynamic loader, C++ exceptions, UNIX signals, and System V thread context switches due to functions such as setcontext and getcontext. The core idea used in ROPdefender is to recognize which return instructions in the binary cause these exceptions (each of them occurs within a specific routine in the loader or libc), and to modify the instrumentation of those instructions. We use the same idea in our implementation, but omit the details to conserve space.
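The essence of the longjmp-tolerant return check performed by check_return can be captured in a few lines (a standalone Python sketch; the real check runs inside the instrumentation over the Context structure):

```python
def check_return(shadow_stack, actual_ret):
    """Pop shadow entries until one matches; an empty stack means attack."""
    while shadow_stack:
        if shadow_stack.pop() == actual_ret:
            return True        # match found; execution continues
    return False               # bottom reached: abort the program
```

A longjmp that skipped frames on the main stack simply pops the corresponding shadow entries until the return addresses agree, while a corrupted return address drains the shadow stack and fails the check.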

3.5 System Evaluation

In this section, we present a comprehensive evaluation of PSI for the purpose of answering the following questions:

• Does PSI work on COTS binaries?

• How does PSI compare with defenses against control flow hijacking attacks?

• How efficient is PSI compared with DBI systems?

The evaluation in this chapter targets the following aspects: (a) functionality, (b) security, (c) runtime performance and (d) comparative analysis with dynamic binary instrumentation tools.

Functionality evaluation focuses on two aspects: (a) consistency of program semantics between the instrumented and original applications, and (b) correctness of binary disassembly. To evaluate semantic consistency, we tested instrumented programs including the SPEC 2006 benchmarks and several real-world applications frequently used on Linux. No runtime error was found in the testing described in Section 3.5.1.1. Since testing explores only a fraction of program paths, we undertook a more complete evaluation of disassembly correctness, as described in Section 3.5.1.2.

In terms of security, we first compare the strength of PSI's CFI with previous work, using the AIR metric proposed in Section 2.5.3. In addition, we evaluate PSI's ability to defend against attacks in Section 3.5.2.1. Overall, the results in Section 3.5.2 illustrate that PSI is capable of defeating the vast majority of control flow hijacking attacks and mitigating code reuse attacks. In particular, 93% of ROP gadgets2 have been eliminated.

The performance of PSI is evaluated in Section 3.5.3, covering both runtime and space overhead. The results demonstrate that PSI's baseline performance is comparable to that of DBI tools such as DynamoRIO and Pin. For some instrumentations, such as the shadow stack, PSI achieves much better performance.

Finally, a comparative analysis is performed against DBI tools such as DynamoRIO and Pin. In particular, four aspects have been extensively evaluated: baseline performance, instrumented application performance, micro-benchmark performance, and performance on real-world program benchmarks. This part of the evaluation is discussed in Section 3.5.4.

2A gadget refers to a sequence of instructions ending in an indirect branch.

Application Name     Experiment
Wireshark v1.6.2     capture packets on LAN for 20 minutes
gedit v3.2.3         open multiple files; edit; print; save
lyx v2.0.0           open a large report; edit; convert to pdf/dvi/ps
acroread9            open 20 pdf files; scroll; print; zoom in/out
mplayer 4.6.1        play an mp3 file
firefox 5 (no JIT)   open web pages
perl                 execute a complex script, compare the output
vim                  open file, copy/paste, search, edit
gimp-2.6             load jpg picture, crop, blur, sharpen, etc.
lynx 2.8.8dev        open web pages
ssh 5.8p1            login to a remote server
evince 3.2.1         open a large pdf file

Figure 3.4: Functionality Tests

3.5.1 Functionality

3.5.1.1 Testing transformed code

We verified the runtime correctness of SPEC CPU2006 programs as illustrated in Figure 3.6. Note that this benchmark comes with scripts to verify outputs, thus simplifying functionality testing.

We also tested many real-world programs, including coreutils-8.16 and binutils-2.22, and medium to large programs such as ssh, scp, wireshark, gedit, mplayer, perl, gimp, firefox, acroread and lyx, as well as all the shared libraries used by them, including libc.so.6, libpthread.so.0, libQtGui.so.4 and libQtCore.so.4. The experimental results are shown in Figure 3.4.

Altogether, 786 shared libraries were transformed during the experiments. The total size of the code transformed was over 300 MB, of which the libraries accounted for about 240 MB. We tested each of these programs to check that they worked correctly. A subset of these tests is shown in Figure 3.4.

3.5.1.2 Correctness of disassembly

Since testing explores only a fraction of program paths, we undertook a more complete evaluation of disassembly correctness, i.e., verifying the consistency of our disassembly with the compiler-generated assembly. To obtain the assembly code generated by the compiler, we recompiled several large programs, including Firefox 5, GIMP-2.6 and glibc-2.13. Specifically, the GNU assembler options --listing-lhs-width=4 -alcdn were used to generate listing files containing both machine code and assembly. The content of these files was then compared with our disassembly.

Note that multiple object files are combined by the linker to produce an executable or a library. We intercepted the linker ld to associate instruction locations in the binary with those in each object file. This information was then used to generate a mapping between compiler-produced assembly snippets in each object file and the corresponding parts of the disassembly.

Module             Package      Size   # of Instructions   # of Errors
libxul.so          firefox-5.0  26M    4.3M                0
gimp-console-2.6   gimp-2.6.5   7.7M   385K                0
libc.so            glibc-2.13   8.1M   301K                0
libnss3.so         firefox-5.0  4.1M   235K                0
libmozsqlite3.so   firefox-5.0  1.8M   128K                0
libfreebl3.so      firefox-5.0  876K   66K                 0
libsoftokn3.so     firefox-5.0  756K   50K                 0
libnspr4.so        firefox-5.0  776K   41K                 0
libssl3.so         firefox-5.0  864K   40K                 0
libm.so            glibc-2.13   620K   35K                 0
libnssdbm3.so      firefox-5.0  570K   34K                 0
libsmime3.so       firefox-5.0  746K   30K                 0
ld.so              glibc-2.13   694K   28K                 0
gimpressionist     gimp-2.6.5   403K   21K                 0
script-fu          gimp-2.6.5   410K   21K                 0
libnssckbi.so      firefox-5.0  733K   19K                 0
libtestcrasher.so  firefox-5.0  676K   17K                 0
gfig               gimp-2.6.5   442K   17K                 0
libpthread.so      glibc-2.13   666K   15K                 0
libnsl.so          glibc-2.13   448K   15K                 0
map-object         gimp-2.6.5   257K   15K                 0
libresolv.so       glibc-2.13   275K   13K                 0
libnssutil3.so     firefox-5.0  311K   13K                 0
Total                           58M    5.84M               0

Figure 3.5: Disassembly Correctness

Figure 3.5 illustrates the results of the disassembly testing. About 58 MB of executable files, including code and data, corresponding to a total of about 6M instructions, were tested, with no error reported.

3.5.1.3 Testing code generated by alternative compilers

We also applied the instrumentation to two programs compiled using LLVM. In particular, we used Clang 2.9 to compile two programs from the OpenSSH project, ssh and scp. Experiments show that the LLVM-generated binaries function correctly when they are used to log in to a remote server and to copy a large file to/from the server.

We noticed that the padding style is the most visible difference between LLVM-generated and gcc-generated binaries: LLVM uses some undocumented x86 instructions to perform padding larger than 8 bytes, whereas gcc uses several shorter instructions. Nevertheless, our implementation was able to handle this difference without any changes.

3.5.2 Security evaluation

PSI integrates a coarse-grained CFI policy, BinCFI. The strength and effectiveness of this CFI design is important for instrumentation tools built on top of it. This section focuses on the security evaluation of the CFI property that PSI integrates.

3.5.2.1 CFI effectiveness evaluation

To evaluate the effectiveness of this CFI property, we use the Average Indirect target Reduction (AIR) metric defined in Section 2.5.3.
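Restated briefly, for n indirect branches where branch j may still reach |T_j| of the S possible targets in the original program, AIR averages the fraction of targets eliminated: AIR = (1/n) · Σ_j (1 − |T_j|/S). A small Python sketch of this computation (the helper name is ours):

```python
def air(target_set_sizes, total_targets):
    """AIR = (1/n) * sum(1 - |T_j|/S) over the n indirect branches."""
    n = len(target_set_sizes)
    return sum(1 - t / total_targets for t in target_set_sizes) / n
```

For example, if every indirect branch in a module is restricted to 1% of the original target space, the AIR is 0.99, matching the magnitude of the figures reported in Figure 3.6.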

Figure 3.6 compares the AIR metric of BinCFI with that of strict-CFI, reloc-CFI, bundle-CFI and instr-CFI, described in Section 5.3.3. To calculate the AIR metric of reloc-CFI, we recompiled the SPEC2006 programs using “-g” and the linker option “-Wl,-emit-relocs” to retain all relocations in the executables.

To calculate the AIR of bundle-CFI, SPEC2006 is recompiled using the gcc and g++ compilers provided by Native Client. Since bundle-CFI restricts ICF targets to 32-byte boundaries, 31/32 of the compiled binary code is eliminated as ICF targets. However, the AIR value is smaller because the base is the original program size; programs compiled using the Native Client tool-chain are larger, due to reasons such as the need to introduce padding to align indirect targets at 32-byte boundaries.

Name        Reloc-CFI  Strict-CFI  BinCFI   Bundle-CFI  Instr-CFI
perlbench   98.49%     98.44%      97.89%   95.41%      67.33%
bzip2       99.55%     99.49%      99.37%   95.65%      78.59%
gcc         98.73%     98.71%      98.34%   95.86%      80.63%
mcf         99.47%     99.37%      99.25%   95.91%      79.35%
gobmk       99.40%     99.40%      99.20%   97.75%      89.08%
hmmer       98.90%     98.87%      98.61%   95.85%      79.01%
sjeng       99.32%     99.30%      99.10%   96.22%      83.18%
libquantum  99.14%     99.07%      98.89%   95.96%      76.53%
h264ref     99.64%     99.60%      99.52%   96.25%      80.71%
omnetpp     98.26%     98.08%      97.68%   95.72%      82.03%
astar       99.18%     99.13%      98.95%   96.02%      78.00%
milc        98.89%     98.86%      98.65%   96.03%      79.74%
namd        99.65%     99.64%      99.59%   95.81%      76.37%
soplex      99.19%     99.10%      98.86%   95.50%      77.37%
povray      99.01%     98.99%      98.67%   95.87%      78.03%
lbm         99.60%     99.50%      99.46%   96.79%      80.92%
sphinx3     98.83%     98.80%      98.64%   96.06%      80.75%
average     99.13%     99.08%      98.86%   96.04%      79.27%

Figure 3.6: AIR metrics for SPEC CPU 2006.

The results in Figure 3.6 demonstrate that the strength of BinCFI compares well with that of reloc-CFI and strict-CFI. However, neither reloc-CFI nor strict-CFI supports large and complex COTS binaries.

3.5.2.2 Control-Flow hijack attacks

To evaluate the effectiveness of BinCFI against control-flow hijack attacks, we used the RIPE [143] exploit suite. RIPE is a benchmark consisting of 850 distinct exploits, including code injection, return-to-libc and ROP attacks. RIPE implements these attacks by building vulnerabilities into a small program. The exploit code is also built into this program, so some of the challenges of developing exploits, e.g., knowing the right jump addresses, are not present. As such, techniques such as ASLR have no impact on RIPE. So, the only change that can be experimented with is enabling or disabling data execution prevention (DEP)3.

3DEP is a security feature supported by modern CPUs to prevent execution of data as code.

            DEP disabled   DEP enabled
Original    520            140
CFI         90             90

Figure 3.7: Security Evaluation using RIPE

On the Ubuntu 11.10 platform, 520 attacks originally survived with DEP disabled. With DEP enabled, 140 attacks survived; all of these are return-to-libc attacks.

The second row in Figure 3.7 shows that BinCFI defeated 430 attacks, including 380 code injection attacks and 50 return-to-libc attacks, even when DEP is disabled. In both scenarios, whether DEP was enabled or disabled, 90 function pointer overwrite attacks survived under CFI.

Code injection attacks were defeated by CFI because global data, the stack and the heap are prohibited targets of ICF transfers. 50 out of the 140 return-to-libc attacks were defeated because they overflowed return addresses and tried to redirect control flow to libc functions, thus violating the BinCFI policy.

The success of the remaining attacks is somewhat an artifact of the RIPE design, which includes exploit code within the victim program. Since pointers to the exploit code are already taken in the program, these functions are identified as legitimate targets and permitted by our approach. If the same attacks were carried out against real programs, only a subset of them would succeed: those that overwrite function pointers with pointers to other local functions. In this subset of cases, previous CFI implementations (although not necessarily their formulations) would fail too, as they also permit any indirect call to reach all functions whose addresses are taken.

3.5.2.3 ROP attacks

The experimental evaluation was performed using ROPGadget-v3.3 [23], a ROP gadget generator/compiler, as the testing tool. It scans binaries to find gadgets useful for ROP attacks.

Name        Reloc-CFI  Strict-CFI  BinCFI   Instr-CFI
perlbench   96.62%     96.24%      93.23%   58.65%
bzip2       97.78%     95.56%      93.33%   44.44%
gcc         97.69%     97.69%      91.42%   66.67%
mcf         95.45%     90.91%      90.91%   36.36%
gobmk       98.84%     98.27%      97.69%   70.52%
hmmer       97.00%     96.00%      96.00%   58.00%
sjeng       92.75%     92.75%      91.30%   47.83%
libquantum  93.18%     90.91%      86.36%   40.91%
h264ref     98.26%     97.39%      96.52%   60.87%
omnetpp     97.12%     97.12%      93.42%   74.07%
astar       95.35%     93.02%      93.02%   46.51%
milc        95.77%     94.37%      90.14%   57.75%
namd        94.87%     92.31%      92.31%   53.85%
soplex      94.64%     93.75%      93.75%   54.46%
povray      96.75%     96.75%      95.45%   61.69%
lbm         94.12%     88.24%      88.24%   23.53%
sphinx3     95.00%     93.75%      92.50%   52.50%
average     95.95%     94.41%      92.68%   53.45%

Figure 3.8: Gadget elimination in different CFI implementations

Figure 3.8 shows that CFI enforcement is quite effective, resulting in the elimination of the vast majority (93%) of the gadgets discovered by this tool. Moreover, there is little diversity in the gadgets found — the tool was able to find only the following gadgets:

• mov constant, %eax; ret (32.26%)

• add offset, %esp; pop %ebx; ret (27.42%)

• add offset, %esp; ret (19.35%)

• mov (%esp), %ebx; ret (14.52%)

• xor %eax, %eax; ret (5.65%)

• pop %edx; pop %ecx; pop %ebx; ret (0.81%)

Among other missing features, note the complete lack of useful arithmetic operations in the identified gadgets. As a result, the tool is unable to build even a single exploit using these gadgets.

3.5.3 Performance evaluation on baseline system

We evaluate the baseline performance of PSI compared with two DBI tools: DynamoRIO and Pin. We evaluate the runtime performance of the SPEC 2006 benchmarks, including CPU, memory and space overhead. In addition, we compare micro-benchmark results using lmbench. Finally, we evaluate runtime performance using several benchmarks of commonly used programs.

Our testbed consists of an Intel core-i5 2410m CPU (Sandybridge) with 4 GB of memory, running Ubuntu 11.10 (32-bit), with glibc version 2.13.

3.5.3.1 CPU intensive benchmarks

We used the SPEC 2006 CPU benchmark to evaluate both the runtime overhead and the space overhead.

[Bar chart: runtime overhead on the SPEC CPU2006 benchmarks under different optimization configurations of e-CFI, with combinations of the VT, AT and BP optimizations.]

Figure 3.9: SPEC CPU2006 Performance with Optimizations

Runtime Overhead. In this experiment, we tested different optimizations and their impact. The notation for the optimizations in Figure 3.9 follows Section 2.4. The average overhead with all optimizations turned on is 4.29%. In full transparency mode (i.e., with VT turned off), the average overhead is 11.77%. The overhead goes up to 18.18% if AT.3 is turned off, and to 23.2% if all AT options (AT.1, AT.2 and AT.3) are turned off. Finally, the average overhead rises to 34.33% if transparent call and return (BP) are not used.

In non-transparent mode, C++ programs such as 471.omnetpp, 450.soplex and 453.povray do not work, due to C++ exception handling. Although the issue could be solved by patching the DWARF metadata in the ELF binaries, this has not yet been implemented. Among the C++ programs, the worst performance observed is for 453.povray, because this program contains a significant number of indirect branches that need translation at runtime. Overall, the best runtime performance achieved on the SPEC CPU2006 benchmarks is shown in Figure 3.10. The average overhead for C programs is 4.29%. Due to C++ exception handling, VT.2 (Section 2.4.3) cannot be applied to C++ programs. As a result, the overhead for C++ programs increases to an average of 8.54%.

Note that this final result may not hold on all CPU models, because some of the optimizations are CPU dependent. For instance, the effect of the branch prediction improvement (BP) is hardly visible on the Intel Haswell CPU series. In comparison, when testing on AMD Opteron 62xx and Intel core-i5 2400m (Sandybridge) CPUs, we found that the effect of the branch prediction optimization is clearly visible. This is mainly because of differences in branch prediction ability among CPU models: generally, a CPU with better branch prediction for indirect jumps will see a smaller effect from BP.

[Bar chart: runtime overhead of PSI on each SPEC CPU2006 benchmark, ranging roughly between -10% and 50%.]

Figure 3.10: SPEC CPU2006 Benchmark Performance

Space and Memory Overhead We also measured the space overhead of DynamoRIO and Pin. Their address space overheads are 272% and 72%, respectively, mainly because DBI tools need to reserve address space for code cache allocation. Their physical memory overheads are 7.5% and 34%, respectively, both higher than PSI's.

Evaluation of System Call Policy On the SPEC 2006 benchmark, the baseline system call policy enforcement introduced an average overhead of 1.6%. This was measured with a policy function registered for each system call, but with an empty function body.


Figure 3.11: Space overhead of PSI on SPEC 2006

3.5.3.2 Microbenchmarks

Although DynamoRIO performs well on the CPU-intensive SPEC 2006 suite, real-world programs often exhibit different characteristics. To compare PSI with DynamoRIO and Pin on system-call-intensive workloads, we used the lmbench [17] benchmark. Since lmbench (as well as the real-world evaluation in the next section) caused some DBI platforms to slow to the point where experiments took far too long to complete, null instrumentation is used in these experiments to minimize their runtime.

Figure 3.12 shows the lmbench performance numbers. Note that the histogram is drawn with a logarithmic scale on the Y axis. The average system call overhead for PSI is 16.9%, whereas for DynamoRIO it is 312%, and for Pin it is 3300%. On system calls related to communication, PSI achieves almost native performance, whereas DynamoRIO has 36.1% overhead and Pin has 378%. System calls related to signal handling and process spawning slow down PSI by 43.3% and 79.7% respectively, while the corresponding numbers are 222% and 948% for DynamoRIO, and 104x and 198x for Pin.

To summarize, the average overhead across the tests shown in Figure 3.12, while counting only one of the select operations (for 10 fds), is 33% for PSI (geometric mean: 30%), while for DynamoRIO it is 413% (geometric mean: 309%) and for Pin it is 7873% (geometric mean: 4083%).
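As a side note on how such summary numbers combine, the arithmetic and geometric means of a set of percentage overheads can differ substantially when one test is an outlier. The sketch below (with hypothetical numbers, not the actual lmbench measurements) shows one common convention, where the geometric mean is taken over slowdown ratios:

```python
import math

def arithmetic_mean_overhead(overheads_pct):
    """Plain average of percentage overheads."""
    return sum(overheads_pct) / len(overheads_pct)

def geometric_mean_overhead(overheads_pct):
    """Geometric mean of slowdown ratios (1 + o/100), reported as a percentage."""
    ratios = [1.0 + o / 100.0 for o in overheads_pct]
    gmean = math.exp(sum(math.log(r) for r in ratios) / len(ratios))
    return (gmean - 1.0) * 100.0

# Hypothetical per-test overheads (percent); one outlier dominates the average.
sample = [10.0, 20.0, 400.0]
print(round(arithmetic_mean_overhead(sample), 1))  # 143.3
print(round(geometric_mean_overhead(sample), 1))   # 87.6
```

This illustrates why the geometric means reported above are consistently lower than the corresponding averages: a few very slow tests inflate the arithmetic mean far more than the geometric one.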

Figure 3.12: lmbench microbenchmark result

DBI platforms are complex, and hence the reasons for their high overheads on lmbench (and the real-world applications discussed in the next section) are not all obvious. But there are several factors that contribute to the high overhead. First, and most obvious, runtime disassembly and instrumentation incurs nontrivial overheads unless this cost is amortized across many executions of instrumented code. Such amortization occurs on CPU-intensive benchmarks, but the real-world applications discussed in the next section tend to load much more code and execute it far fewer times, causing the overheads of runtime instrumentation to rise significantly. This is one of the main reasons for the high overhead of applications that make frequent calls to execve.

A second factor that contributes to the overhead is the increased memory footprint of dynamically instrumented programs. Our experiments show that DynamoRIO can frequently use a code cache that is over 100MB in size. Moreover, such increased use of data memory can significantly slow down fork due to factors such as the increased time for copying page tables.

A third factor relates to threads and the locking needed to ensure that accesses to data used by the DBI platform (including the code cache) are free of concurrency errors.

3.5.3.3 Commonly used applications

In this section, we perform a comparative evaluation of the performance of PSI against that of Pin and DynamoRIO, using a collection of commonly used applications. Once again, null instrumentation is used to minimize the runtime of the experiments.

Program     PSI (%)  DynamoRIO (%)  Pin (%)  Description
coreutils   97       1922           3509     Coreutils testsuite
gcc         63       1376           10250    Compile openssh.
apt-get     2        326            411      Run update command 5 times.
enscript    211      5292           15153    Convert text and source code files to ps and pdf.(a)
postmark    2        22             64       Run benchmark.
gpg         24       382            5994     Operate on pdfs with an avg size of 500KB.
tar         19       79             1107     Tar /usr/include.
find        21       34             38       Find a file in /.
scp         -1       18             31       Copy 10 mp3s, with an avg size of 5MB.
mplayer     32       67             211      Play 10 mp3s.
vim         56       92             615      Search and replace strings in 18MB text file.
latex       51       185            1806     Compile tex files with an avg size of 17KB to dvi.
readelf     62       71             197      Parse the DWARF sections of glibc.
python      33       85             96       Run pystone 1.1 benchmark.
Average     53       887            3421

(a) A bunch of text files are used such that their file sizes average to 8K. In the experiment, two types of source files are used for this purpose: C programs and Python programs. We averaged the C file sizes from the OpenSSL source package and used the resulting average of 18K for the test. For Python, we averaged the sizes of Python scripts found on a typical Ubuntu machine (8K) and used a file of that size for the test.

Figure 3.13: Real World Program Performance

We first measured the performance for two tasks that are commonly undertaken by typical Unix users: compilation of software, and running scripts that invoke other programs. Specifically, we compiled OpenSSH with GNU make and gcc tool chain, and used the built-in testsuite of coreutils.

When testing gcc compilation with PSI, we instrumented all the executables in the gcc toolchain, including gcc, g++, cc1, cc1plus, f951, lto, ar, ranlib, as, ld.bfd, and collect2. We also instrumented make and all external tools used in the makefile, such as echo, sed, cat, perl, and gawk. Note that all libraries used by these programs were transformed as well. The overhead incurred by PSI was 63%, while DynamoRIO and Pin incurred overheads of 1376% and 10250% respectively.

In the case of coreutils, for PSI, we transformed all coreutils binaries as well as other programs used in the coreutils test suite. However, due to difficulties in invoking DynamoRIO on each coreutils program inside the test script, the experiment used DynamoRIO to run make, so that it would subsequently instrument all programs invoked from there. DynamoRIO incurred a 19.2x slowdown, as compared to 97% for PSI. When testing Pin, unfortunately, the test suite had not finished after running for over an hour. We stopped the experiment at that point and used the elapsed time as a lower bound on Pin's runtime, which worked out to an overhead of 3509%.

In addition to the above tests, we measured the overhead for several commonly used programs. The results are shown in Figure 3.13. It is worth mentioning that apt-get update invoked several executables, including http, gpgv, dpkg, and touch. All of these executables, as well as the libraries they use, were transformed in the tests.

Note that for about half the applications, there is more than a 10x difference between PSI and DynamoRIO. The difference drops to 3x-5x for about a quarter of the applications, and for the remaining ones, the overhead difference is within a factor of two. Averaged across all of the applications, PSI's overhead is 53% (45% geometric mean), DynamoRIO's is 887% (322% geometric mean), and Pin's is 3421% (924% geometric mean).

In addition to the runtime overhead on the SPEC 2006 benchmark, we also measured the space overhead of the platform, shown in Figure 3.11. The virtual memory overhead of the platform is 19.79%, while the physical memory overhead is merely 1.68%. Beyond the physical and virtual memory overhead, we also measured that the platform increases the on-disk size of executables and shared libraries by around 139%. This is because PSI instruments a copy of the original code, leaving the original code in place. However, this original copy is seldom accessed, which explains why the resident memory overhead is very small, at around 2%.

Figure 3.14: Overhead of the basic block counting application of PSI, DynamoRIO (with -inline and eflags analysis), and Pin (with fast_analysis) on SPEC 2006.

3.5.4 Performance evaluation on sample instrumentations

This section presents a comparative evaluation of the performance of PSI and the DBI tools running with sample instrumentations. The experiments illustrate the runtime overhead of two instrumentations: basic block counting and shadow stack.

3.5.4.1 Basic-block counting

The instrumentation tool we developed incorporates a simple optimization to skip flag saving in some common cases, but lacks a systematic liveness analysis. In spite of lacking this optimization, the performance of PSI (average overhead of 69%) is only slightly worse than DynamoRIO's (53%), and better than Pin's (97%). The results are shown in Figure 3.14. For this experiment, we used the most optimized version of the basic-block counting applications distributed with the Pin and DynamoRIO platforms.

Although PSI is designed for offline instrumentation, we turned on the on-demand instrumentation feature, emptied the library cache, and reran the benchmark. In addition, we added the time for instrumenting the executables back into the totals, thereby measuring the total runtime including instrumentation time. This change causes the overhead to increase by another 3%.


Figure 3.15: Overhead of shadow stack implementation using PSI and that of ROPdefender on SPEC 2006

3.5.4.2 Shadow stack

Figure 3.15 depicts the overhead of the shadow stack implementation built on our platform in comparison to that of ROPdefender, which is based on Pin. While ROPdefender reports an overhead of 74%, PSI incurs just one-fourth of this overhead (18%).

Prasad et al. [115] report lower overheads, as low as a few percent. But as mentioned earlier, their technique does not defend against ROP attacks.

3.6 Summary

In this chapter, we have presented a general static binary instrumentation platform, PSI. We have elaborated on its internal design and its API, and highlighted on-demand instrumentation as an important feature. We have also demonstrated that PSI can be used to build a variety of instrumentations.

Finally, we have presented an extensive evaluation of our system. We evaluated the functionality of our static instrumentation system using static and dynamic checking, and presented our security evaluation. We further compared our system with modern dynamic binary instrumentation systems such as DynamoRIO and Pin, on both the baseline instrumentation and several instrumentation applications. The results demonstrate that PSI achieves almost the same performance on the SPEC benchmarks and much better performance on commonly used applications.

Chapter 4

Application: Practical Protection From Code Reuse Attacks

Previous chapters focused on the essential elements of the binary instrumentation platform PSI. Those techniques form the foundation that allows complex security instrumentation and extensions to be realized. In this chapter and the next, we present two significant applications built on the PSI platform. This chapter focuses on code reuse attacks.

In this chapter, we propose a practical defense against advanced code reuse attacks by introducing several hardening methods. For instance, we tighten the PSI policy by eliminating targets used by jump tables from the set of allowed return targets. In addition, we use code pointer remapping (detailed in Section 4.4) to defeat attackers that leverage prior knowledge of the code layout to construct gadgets. We also achieve the "executable-but-not-readable" security property without additional OS or hardware support, using a novel technique called code space isolation (detailed in Section 4.5). We implement these features in a system called SECRET. SECRET is built on top of PSI and offers strong protection against ROP attacks on COTS binaries.

4.1 Motivation

The deployment of non-executable page protections in recent operating systems prompted a shift to code reuse attacks [65, 85, 126]. After hijacking the control flow, execution can be diverted to code that already exists in the address space of the vulnerable process. Return-oriented programming (ROP) [126] has become the de facto code reuse technique, as the stitching of short instruction sequences (called "gadgets") allows for increased flexibility in achieving arbitrary code execution.

Recent research illustrates an intense arms race between code reuse attacks [46, 48, 55, 56, 62, 68, 73, 74, 120, 126, 130] and defenses [28, 36, 37, 61, 63, 71, 86, 97, 105, 110, 136]. Control flow integrity (CFI) is regarded as a principled approach against control flow hijacking attacks, including code reuse. However, recent research has shown that practical coarse-grained CFI cannot defeat them [55, 62, 72, 73], because ROP attackers can still use allowed targets to construct gadgets and successfully launch ROP attacks. Researchers have proposed ways to improve the strength of CFI policies [94, 105, 106]. However, stronger policies come at the cost of compatibility. For instance, MCFI [105] tightens CFI policies using type-based checking and suffers from false positives [136].

Although gadgets remain when CFI is applied, to find them, attackers usually need knowledge of the location and structure of existing code. Code randomization [44, 45, 64, 75, 83, 109, 140] is an orthogonal research direction that randomizes the load address of binaries and the structure of the code itself, so that even if the location of some piece of code is known, the rest of the code remains unknown. However, these approaches in principle suffer from memory disclosure bugs. Attackers can leverage such vulnerabilities to adjust gadget addresses in a ROP payload to achieve reliable execution [27, 43, 88, 125]. Going one step further, malicious script code can leverage a memory leak to dynamically scan the code segments of a process, pinpoint valid gadgets allowed by CFI, and synthesize them into a functional ROP payload. Such "just-in-time" ROP (JIT-ROP) attacks [130] can also be used to bypass fine-grained code diversification protections. Therefore, code randomization alone cannot strengthen a CFI policy. Moreover, recent research illustrates that instead of relying on memory disclosure, attackers can use the crash-restart property of some server programs to enumerate all possible targets [46].

In sum, the key step in an advanced code reuse attack is constructing a gadget chain. Attackers may choose various methods to construct such a chain: (a) use prior knowledge of the binary layout; (b) blindly enumerate and try existing gadgets; (c) dynamically read code pages; or (d) harvest code pointers.

In this dissertation, we propose an approach on top of BinCFI to prevent modern code reuse attacks guided by memory disclosure or blind probing. Our system, called SECRET, improves our baseline system in three aspects: (a) tightening the baseline CFI policy without sacrificing compatibility, (b) introducing code pointer diversity, and (c) removing attackers' code reading capability.

Our approach tightens the BinCFI policy by eliminating jump table targets from the valid target set of all return instructions. In addition, our approach uses code pointer remapping (CPR) as a practical defense against conditions that may allow inferring the functionality or probabilistically pinpointing the location of gadgets, e.g., through return addresses leaked from the stack. In other words, code pointer remapping aims to mitigate code pointer harvesting attacks. This is achieved by mapping the values of code pointers into another randomly selected address space that only instrumented code can translate. Code pointers such as return addresses are randomly mapped into this address space, making the inference of valid code locations from leaked code pointer values challenging. In addition, the projected address space is much larger than the original code segment, making the number of tries required by blind ROP several orders of magnitude higher.

In addition to code pointer remapping, our approach eliminates attackers' code reading capability. This is achieved by code space isolation (CSI), which prevents read accesses to both the original code and the instrumented code. Accesses to the original code are prevented by erasing the machine code in the original code segment, whose scope is identified using our static analysis. Read accesses to the instrumented code are prevented using segmentation on x86 systems; on x86-64 systems, we achieve probabilistic protection through randomized placement in the vast address space.

SECRET is built on top of PSI. To ease deployment, SECRET does not require modification of the original binary. Instead, instrumented code is loaded independently at runtime, and the original binary remains unchanged on disk until load time, when it is modified by our dynamic loader. The results of an experimental evaluation with the SPEC benchmarks and real-world applications demonstrate that this protection introduces less than 2% additional runtime overhead on top of PSI, bringing the overall average overhead to 14.5%.

67 4.2 Threat model

We consider a powerful remote attacker who has the ability to execute arbitrary script code that can exercise memory disclosure or corruption vulnerabilities in a victim application. We assume that the attacker has no physical access to or native code execution capability on the victim system. We also assume that the victim system has deployed ASLR and DEP, so that direct code injection does not work; hence the attacker's goal is to carry out a ROP attack.

We further assume that using the memory disclosure capability, the attacker can systematically de-randomize the locations of data and code that is accessible by following pointers stored on the stack, the heap, and global data in ELF images.

The ROP attacker can choose several ways to succeed. The simplest is to use memory disclosure to harvest code pointers, use the leaked code pointers to infer the binary layout, and finally construct a ROP attack at runtime. This attack is straightforward but may not be robust, since the binary code layout may vary across distributions. Alternatively, the attacker can launch a JIT-ROP attack, which constructs the ROP payload just in time. By following direct branches or other leaked pointers (e.g., from vtable pointers to virtual function pointers and finally to code pages), she could harvest sufficient code pages to construct a ROP payload and launch attacks.

In typical ROP scenarios, victim crashes (caused by invalid memory accesses) lead to attack failure. We consider a more powerful threat model in which the attacker can repeatedly cause crashes and regain control, similar to the blind ROP attack [46].

Finally, we assume that the attacker is aware that a coarse-grained CFI has been applied to all code modules in the target system.

4.3 System design overview

In this section, we present the design of SECRET against code reuse attacks. Our approach is designed to disrupt the process of constructing a gadget chain. Specifically, we have developed the following techniques:

• Code pointer remapping (CPR): CPR replaces code pointers, including return addresses, exception handlers, and exported functions, with encrypted values scattered across a very large address space. As a result, attackers cannot infer valid indirect branch locations from code pointers leaked from the stack or heap. In addition, encrypted code pointers live in a large per-module address space, so that blind ROP attacks require a significantly larger number of tries. The detailed design is in Section 4.4.

• Code space isolation (CSI): We eliminate attackers' code reading capability using code space isolation. In particular, we first use static analysis to identify the original code and wipe it out. In addition, we present techniques to isolate and protect our instrumented code from being revealed. Read access to the instrumented code is prevented on x86-32 using segmentation features; on x86-64, we exploit the available address space entropy to achieve probabilistic protection. The design of CSI is detailed in Section 4.5. In addition to protecting instrumented code, CSI ensures that instrumented code cannot be discovered by following pointers on the stack, in static memory, or on the heap.

4.4 Code pointer remapping

To launch code reuse attacks, ROP attackers need to find useful gadgets to construct a ROP chain. In the context of coarse-grained CFI, only valid indirect targets can be used for gadgets. This does not stop attackers: if an attacker has prior knowledge of the original binary, he can use gadgets located at targets permitted by BinCFI. For non-position-independent executables, attackers have no problem figuring out the locations of those gadgets using prior knowledge. To gather gadgets in position-independent code, all they need to do is bypass ASLR: once any code pointer is leaked, attackers may use their knowledge to infer all other code pointers that reside in the same module.

In this section, we address this important issue and present a defense against ROP attacks that bypass ASLR. Our approach, called code pointer remapping (CPR), aims to render attackers' prior knowledge of the original binary useless. This is achieved by the following countermeasure steps:

• Preventing usage of invalid pointers: PSI already enforces a coarse-grained CFI policy that prevents control transfers to most invalid targets. The vast majority of the allowed targets are the instructions following a call instruction. The remaining targets are code pointers discovered by a conservative static analysis. Control transfers to any other location (98%) are blocked.

• Tightening the BinCFI policy: We use static analysis to tighten BinCFI so that it becomes stronger without sacrificing compatibility. In the implementation, we can use multiple address translation tables for different indirect branches. For instance, indirect jumps that use only jump tables can check a separate table, so that code pointers in jump tables can be eliminated from the address translation table for return instructions.

• Remapping code pointers: Since instructions after call sites constitute the majority of allowed targets, they are the most attractive targets for attackers. We present a technique that replaces return addresses in each module with values in a new code address space (address range). Each code pointer is encrypted using a strong hash function such as SHA-2. The encrypted values of each module are then mapped into one address space, so that attackers cannot infer additional pointers from leaked ones.1 In addition, the new address space is randomly selected and reserved by ld.so, and its size is several orders of magnitude larger than the original code segment. Thus, exhaustively probing memory for call-site gadgets will not work.

Since the first step is already covered by the underlying system, we focus our discussion on the remaining two.

4.4.1 Tightening baseline policy

We discovered that part of the BinCFI policy is overly permissive. In particular, return instructions do not need to use jump table targets. However, simply eliminating these addresses from the address translation table would break some indirect jumps that use jump tables, since they share the same table. To overcome this, we change the transformation of those indirect jumps. In particular, we transform the code pointers used by jump tables and put them into a new table placed alongside the instrumented code, and we change the indirect jumps that look up the old jump tables to ensure that they check the corresponding new tables in the instrumented code. As a result, jump tables enjoy better performance, since no translation is needed, and jump table targets are eliminated from the address translation table, improving the strength of the CFI policy.

1By masking the encrypted value, we can control the range of encrypted pointer values for each module. In some scenarios, such as handling exceptions, this range has to include several smaller ranges, each of which corresponds to one region specified by the DWARF information, and all return addresses in the same range have to preserve their order. This can easily be achieved by tweaking the mask values.
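The effect of this split can be modeled in a few lines. The following sketch (hypothetical addresses, with Python dictionaries standing in for the binary's translation tables) shows why moving jump-table targets into their own table removes them from the set that a corrupted return address can reach:

```python
# Hypothetical addresses: original targets mapped to instrumented-code locations.
return_targets = {0x804811d: 0x90001d, 0x8048231: 0x900231}     # addresses after call sites
jumptable_targets = {0x8048300: 0x900300, 0x8048340: 0x900340}  # computed-jump targets

def translate_return(addr):
    """Translation used by an instrumented 'ret': jump-table targets are
    deliberately absent from this table, so returning into one is blocked."""
    if addr not in return_targets:
        raise RuntimeError("CFI violation: invalid return target")
    return return_targets[addr]

def translate_indirect_jump(addr):
    """Indirect jumps that use jump tables consult their own table."""
    if addr not in jumptable_targets:
        raise RuntimeError("CFI violation: invalid jump target")
    return jumptable_targets[addr]

print(hex(translate_return(0x804811d)))  # a valid return target translates fine
try:
    translate_return(0x8048300)          # a jump-table entry is no longer reachable via ret
except RuntimeError as e:
    print(e)
```

In the real system the "new table" for jump tables needs no runtime translation at all, since its entries can point directly into the instrumented code; the dictionary lookup above only models the policy separation.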

4.4.2 Remapping code pointers

To prevent attackers from using leaked code pointers to infer the code layout, SECRET remaps the original code pointers of ELF binaries, including return addresses on the stack as well as other types of pointers such as exported functions and discovered jump table entries. This provides the following benefits:

• Remapping code pointers prevents attackers from easily targeting the most powerful class of gadgets left on systems that enforce coarse-grained CFI. (On such systems, code pointers such as return addresses can target far more gadgets than other types of code pointers. Moreover, gadgets reachable via returns can be assembled into non-trivial ROP payloads.)

• Attackers may be able to launch a ROP attack if they can systematically enumerate the space of valid code pointer values and then probe them all. By remapping the RA as well as other code pointers, we force the attacker toward exhaustive enumeration: return addresses and other code pointers that are next to each other in the original code address space are now scattered across a much larger and randomized address range, forcing attackers to brute-force the entire address range rather than using an intelligent search that probes only a small fraction of the address space.

In the following, remapping a code pointer means using a strong hash function such as SHA-2 to encrypt the target code pointer and applying a mask to map the result into a certain address range.
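A minimal sketch of that step, assuming SHA-256 as the SHA-2 variant and a 32-bit mask for the 4 GB per-module space; the per-module key and helper names are illustrative, not taken from the actual implementation:

```python
import hashlib

MODULE_SPACE_BITS = 32            # 4 GB remapped address space per module
MASK = (1 << MODULE_SPACE_BITS) - 1

def remap_pointer(orig_addr, module_key):
    """Map an original code pointer to a pseudo-random offset in the
    module's remapped address space using SHA-256 and a mask."""
    data = module_key + orig_addr.to_bytes(8, "little")
    digest = hashlib.sha256(data).digest()
    return int.from_bytes(digest[:8], "little") & MASK

# Build the translation table that only instrumented code would consult at runtime.
key = b"per-module-secret"            # illustrative; fixed at instrumentation time
originals = [0x08048100, 0x08048104]  # e.g., return addresses after call sites
table = {remap_pointer(a, key): a for a in originals}

r = remap_pointer(0x08048100, key)
assert table[r] == 0x08048100         # only the table holder can translate back
print(hex(r))
```

Note how the two original addresses, 4 bytes apart, map to unrelated values: a leaked remapped pointer reveals nothing about its neighbors, which is the property CPR relies on.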

To set up a new code space for each module, we leverage the dynamic loader to reserve an address range. The base address of this range is decided at runtime, but at instrumentation time there is always a predetermined base, which is the next page address after the current code segment. Since code pointers are modified at instrumentation time, these pointers are mapped into this predetermined address range and then rebased at runtime through relocations. The relocations are contained in a file loaded along with the instrumented code. For simplicity, each module owns an address space of 4 Gigabytes.

Within the address range, we generate a random number for each code pointer to be modified. Since our address space is 4 Gigabytes by default, all we need is a set of 32-bit random numbers.2 Instrumented programs then use these random numbers instead of the original code pointers. The LTT is also modified to make sure those randomized addresses can be found in the key set at runtime. All of these randomized code pointers are translated at runtime into addresses within the instrumented code. Attackers cannot reverse engineer the address translation, since the locations of the LTT and the instrumented code are hidden within the large address space.

Remapping return addresses Changing return address (RA) values has two potential implications. First, existing code may rely on RA values, e.g., exception handling code. Our experiments show that the use of RAs is limited to the following cases on GNU/Linux:

• C++ exception handling, where the return address is used to identify whether the caller of the current function has a handler for the current exception,

• Location checking, where the return address is used to check where the caller comes from. This type of check happens in the dynamic loader.

• PIC code data access, with two cases: (a) jump tables, where the return address is popped off the stack and used to compute the base address of a jump table, and (b) static data accesses, where the return address is popped off the stack and an offset is added to find the base address for static data access.

For cases where return addresses are used for C++ exception handling, we update the DWARF metadata to ensure that the stack unwinding mechanism works correctly with randomized return addresses. In particular, we update the DWARF for each function by changing the function boundaries. The randomized function boundaries are equally distributed across the large random region for the whole module. Since C++ exception handling only checks a return address against its own function boundaries, the order of function ranges does not have to be preserved. To make sure each return address is mapped to its corresponding function, we tweak the mask value accordingly. This reduces the randomness of a return address from an arbitrary value within the whole module (4 Gigabytes) to a value within the range of its function (4 Gigabytes divided by the number of functions in the module). This can be addressed by enlarging the address range of the whole module.

2Note that this address space can be configured to increase randomness.
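The per-function trade-off above can be sketched numerically. This is a toy model: we use modulo arithmetic where the implementation tweaks mask values, and the function count is made up:

```python
MODULE_SPACE = 1 << 32  # 4 GB remapped space for the whole module

def function_slice(func_index, num_funcs):
    """Boundaries of the sub-range reserved for one function, assuming the
    module space is split evenly among functions."""
    size = MODULE_SPACE // num_funcs
    base = func_index * size
    return base, size

def remap_ra_within_function(pseudo_random, func_index, num_funcs):
    """Constrain a pseudo-random value to the function's slice, so that
    DWARF-based unwinding still attributes the RA to the right function."""
    base, size = function_slice(func_index, num_funcs)
    return base + (pseudo_random % size)

base, size = function_slice(3, 1024)
ra = remap_ra_within_function(0xDEADBEEF, 3, 1024)
assert base <= ra < base + size
print(hex(size))  # 0x400000: 4 MB of randomness per function when there are 1024 functions
```

With 1024 functions each slice still offers 22 bits of randomness per return address, and enlarging the module's range (the configurable space mentioned in the footnote) grows each slice proportionally.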

The second case we observe occurs in the dynamic loader, where some internal functions check the location of the caller. In particular, they require that callers come only from libc.so or libpthread.so.3 The code checks the caller against the loader's internal data structure link_map, which contains information about all modules. To cope with this, we change the link_map data structure: in particular, we set the base address to that of the randomized address space, so that remapped return addresses are correctly identified. In addition, to make sure all metadata can still be accessed, we adjust related fields in link_map, such as the offsets to the metadata segments of the module. This ensures that our modifications are transparent to accesses of the ELF metadata sections.

For the other two cases, we rely on a static analysis to detect that the RA is being used as a data pointer, and we avoid remapping the RA in those cases. Moreover, it is possible to recognize that those addresses will not be used for returns (as they are popped off the stack), and hence avoid including them in the list of valid targets for return instructions.

Another challenge in remapping RAs is handling special cases of instructions: instructions that use an RA that was not pushed by a call instruction. A typical example is explicitly storing a code address at the top of the stack and returning to it. The return address in such a case will not be remapped, which can cause incompatibility, since the instrumentation now requires remapped RAs. We point out that any such code address explicitly stored on the stack will be identified as a code pointer by the static analysis used in PSI. Moreover, the coarse-grained CFI policy in PSI permits return instructions to return to locations identified by the code pointer analysis. In terms of implementation, this policy is implemented using two address translation tables: one for translating return addresses, and another for translating other code pointers.

3This check is presumably intended as a security feature, but it offers little protection: attackers can bypass it by jumping directly into the middle of those functions, or by carefully crafting the payload to pass the checks.

(The LTT components of these two tables, called RA LTT and FP LTT, are illustrated in Figure 4.1.) The approach is to make the return address translation operate on remapped RAs, while letting the code pointer translation operate on non-remapped pointer values. This approach provides an elegant solution to the compatibility problem posed by special cases of return instructions.
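The two-table scheme can be modeled abstractly. The following Python sketch is illustrative only: the dict-based tables, the xor-based "remapping," and all names are stand-ins for the actual PSI data structures, not its implementation.

```python
# Abstract model of the dual translation tables: return addresses are
# looked up in their remapped form, while other code pointers are
# translated from their original (non-remapped) values.

SECRET_KEY = 0x5A5A5A5A  # hypothetical stand-in for a per-process secret

def remap(addr):
    """Hypothetical remapping applied to return addresses at call sites."""
    return addr ^ SECRET_KEY

ra_ltt = {}  # RA LTT: keyed by *remapped* return addresses
fp_ltt = {}  # FP LTT: keyed by *original* code pointer values

def on_call(return_addr, instrumented_addr):
    """A call pushes the remapped RA and registers its translation."""
    pushed = remap(return_addr)
    ra_ltt[pushed] = instrumented_addr
    return pushed                      # value actually pushed on the stack

def translate_return(pushed_value):
    """lookup_ret: operates directly on the remapped RA."""
    return ra_ltt[pushed_value]

def translate_fp(orig_pointer):
    """lookup_call: operates on non-remapped code pointer values."""
    return fp_ltt[orig_pointer]

# A code address explicitly stored on the stack by hand-written code is
# found by the code pointer analysis, so it is entered into FP LTT and a
# return through it still translates correctly (addresses illustrative).
fp_ltt[0x8048400] = 0x70001000
```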

Remapping exported functions Remapping exported functions is necessary, since these pointers are propagated by the dynamic loader into the GOT tables of dependent code modules. An attacker may use this information to infer the base addresses of other code modules. Remapping these code pointers is done by updating the dynamic symbol table in each ELF image at runtime.

Handling function pointers Note that function pointers remain unchanged in the ELF binary file. This is because our technique targets COTS binaries that do not contain additional information such as relocation information or a static symbol table. Without this information, it is challenging to determine whether a constant value is a code pointer or an integer, so we leave such constants unchanged. However, on-going work aims to identify vtables and transform the code pointers they contain; this analysis has also been attempted by previous work [114].

4.5 Code space isolation

Code pointer remapping prevents code pointer harvesting attacks, i.e., attacks that infer other pointers from leaked ones. However, bypassing code pointer remapping is easy if attackers can simply read the original code, as well as artifacts specific to PSI such as the instrumented code and the LTT, to figure out valid indirect branch targets. For instance, reading the original code allows attackers to find all useful gadgets, and reading the LTT reveals all indirect targets permitted by BinCFI. This would render code pointer remapping useless.

To make code pointer remapping robust against memory leakage, the first step is to eliminate the original code, since it is no longer needed by PSI. In addition, we need to ensure that the encrypted pointers remain secret, which requires hiding both the LTT and the encrypted return addresses at call sites. An easy way to achieve the latter is to use a table containing all encrypted return addresses, and to have the instrumented code for each call push an index into this table instead of the encrypted value (e.g., push $encrypted_ptr is rewritten to push array[index_i]). The former, however, requires randomizing the LTT's location so that it cannot be reached by direct memory access. Hiding the LTT in this way unfortunately puts further constraints on our address translation trampolines: since the address translation routine would need several indirections to obtain the base address of the LTT, the extra runtime overhead would be significant.
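The index-based hiding of encrypted return addresses can be sketched as follows; the table and function names are hypothetical, and a Python list stands in for the hidden in-memory array that the rewritten push instructions index into.

```python
# Sketch: the instrumented call site pushes a small index, and a secret
# table (kept in hidden memory) maps indices to encrypted RA values, so
# the encrypted value never appears on the attacker-readable stack.

encrypted_ra_table = []   # hidden: one entry per static call site

def register_call_site(encrypted_ra):
    """At instrumentation time, assign each call site an index."""
    encrypted_ra_table.append(encrypted_ra)
    return len(encrypted_ra_table) - 1   # the index the call site pushes

def resolve(index):
    """At return time, recover the encrypted RA from the hidden table."""
    return encrypted_ra_table[index]

i = register_call_site(0xDEADBEEF)   # value illustrative
assert resolve(i) == 0xDEADBEEF
```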


Figure 4.1: Architecture of code space isolation. Memory accesses to the address translation table are performed through our private TLS to avoid memory leaks.

To protect code pointer remapping from memory disclosure with a good trade-off between performance and effectiveness, we propose code space isolation (CSI). Figure 4.1 illustrates the overall approach for protecting the original and the instrumented code, which relies on the following techniques. First, we develop a static analysis pass to identify data embedded within the original code, and zero out the remaining code bytes. Note that this approach operates at a finer granularity than some previous approaches, such as XnR [36] and HideM [71], which offer protection only at the granularity of memory pages. With those approaches, just a few bytes of data in every 4K of code can cause all of the code to remain readable. In contrast, with our approach, all but a few data bytes are erased, so attackers can read only those few bytes.
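The byte-granularity erasure can be illustrated with a small sketch; the toy code segment and data ranges below are hypothetical, standing in for the output of the static analysis pass described in the text.

```python
# Sketch: erase all code bytes except identified embedded data, at byte
# granularity (rather than page granularity). `data_ranges` holds
# [start, end) offsets of embedded data within the code segment.

def wipe_code(code: bytearray, data_ranges):
    wiped = bytearray(len(code))            # start with an all-zero copy
    for start, end in data_ranges:
        wiped[start:end] = code[start:end]  # preserve embedded data only
    return wiped

# Toy segment: 3 code bytes, 4 embedded data bytes, 1 code byte.
seg = bytearray(b"\x55\x89\xe5\x01\x02\x03\x04\xc3")
out = wipe_code(seg, [(3, 7)])
# Only offsets 3..6 survive; everything else reads as zero.
```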

After erasing the code bytes, we develop techniques to separate the instrumented code as well as the LTT into a memory region that is unrelated to that of the original code.

4.5.1 Static analysis for identifying data

We have developed a static analysis to identify data embedded within code regions. The goal of this analysis is to be conservative: when in doubt, bytes should be marked as data and preserved, rather than being erased.

There are two types of embedded data: (a) data in the middle of a function and (b) data between functions. The first type is usually jump table data. We reuse the underlying PSI jump table discovery code to recover the CFG of a function, and mark any gaps inside the function as data. For the second type, we leverage the information in two sections of COTS ELF binaries: .eh_frame and .eh_frame_hdr. These sections are generated in the DWARF format used for C++ exception handling at runtime, and tell the C++ runtime how to unwind function frames. The information is organized per function: it includes function boundaries, the position of the saved frame pointer in a stack frame (or the stack height if no frame pointer is available), and the positions of callee-saved registers in the frame.

We find that even non-C++ binaries contain these two sections. This is because C++ code may call non-C++ code and vice versa. In order to properly handle exceptions, all function frames between the exception thrower and the catcher must be available, which is the main reason why non-C++ code also includes frame information by default. Although DWARF information may be missing in some cases, in our experiments we find that it is available in COTS Linux binaries. It is included even in low-level code such as glibc, including the hand-written assembly code contained therein.

Our disassembler is based on the original design in Section 2.1, but we add some tweaks that leverage the DWARF information to improve disassembly results. In particular, we first use the DWARF information to collect all available function boundaries. Then, we start disassembling the code from those known locations and follow direct control transfers to discover code that may live in the gap between two known functions.

Once this process is over, we consider the rest of the original code segment as embedded data.
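The gap computation at the heart of this process can be sketched as follows, assuming the function ranges recovered from DWARF (and extended by direct control flow discovery) are given as disjoint intervals; all names and addresses are illustrative.

```python
# Sketch: embedded-data candidates are the bytes of the code segment not
# covered by any recovered function range.

def find_data_gaps(seg_start, seg_end, func_ranges):
    """Return [start, end) gaps between sorted, disjoint code ranges."""
    gaps, pos = [], seg_start
    for start, end in sorted(func_ranges):
        if start > pos:
            gaps.append((pos, start))   # uncovered bytes -> data candidate
        pos = max(pos, end)
    if pos < seg_end:
        gaps.append((pos, seg_end))
    return gaps

# Two functions with a 16-byte gap (e.g., a jump table) between them:
print(find_data_gaps(0x1000, 0x1100, [(0x1000, 0x1040), (0x1050, 0x1100)]))
# -> [(4160, 4176)], i.e. the bytes in [0x1040, 0x1050)
```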

4.5.2 Separating instrumented code from original code

Since our static binary instrumentation approach updates binary files and generates a new code segment, it is natural to extend the existing binary by putting the instrumented code right after the original code segment. However, this placement hinders protection of the instrumented code regardless of the method used: efficient protection mechanisms may require protected regions to be located within specific ranges of addresses, or at random locations. We therefore redesigned the format of instrumented binaries to decouple the instrumented code, as well as the address translation tables, from the original code.

To protect instrumented code on the x86-32 architecture, we use segmentation to ensure that the instrumented code is not accessible by the program. The technique places the instrumented code, as well as its LTT, outside of a memory sandbox enforced by segmentation. The details of this technique have been discussed in Section 5.3.6, so we omit a repetitive discussion here.4

On architectures such as x86-64 where hardware segmentation is missing, software fault isolation (SFI) [138] can be used to protect the instrumented code, but the associated overheads can be significant. Moreover, instrumenting all memory accesses is an engineering challenge due to the complexity of the x86 instruction set. We therefore take the alternative of base address randomization to protect the instrumented code. There are two key benefits in using randomization for this purpose:

• High randomization entropy: the x86-64 architecture supports a 48-bit address space, of which 47 bits are usable in user mode. The larger address space increases the potential ASLR entropy that operating systems can achieve. For instance, from Windows 7 to Windows 8, the entropy of DLL images increased from 8 to 19 bits and the entropy of executables from 8 to 17 bits [79]. Linux has also improved the ASLR entropy for mmap(2) from 8 bits on 32-bit systems to 28 bits on 64-bit systems [93, 144].

4Note that since LTTs are outside of the sandbox, memory accesses to LTTs must bypass the memory sandbox; this is achieved using our private segment register.

• Protection from indirect memory disclosure: even if the code base is randomized, an attacker who can read data memory such as the heap or stack may harvest code pointers. These pointers still allow the attacker to reach code pages by dereferencing them. We eliminate this attack avenue: code pointers such as return addresses and exported functions are protected by CSI, and the rest of the code pointers, including function pointers, only target the wiped-out original code segment. Thus, the instrumented code is never targeted by any pointer in memory, except in our own data structures that are maintained as secrets.

4.5.2.1 Protecting instrumented code

In our implementation, the instrumented code of each module is located at a random distance from its original code. The random number is independently generated for each module. Note that code allocation is performed in the loader instead of libc.so; this dedicated allocation site lets us control the allocation function and ensure higher ASLR entropy than the OS provides, without affecting the allocation of data or other objects.

Although most of the instrumented code is position independent, changing its base location requires adjusting its data references to the original data section, and even references to the original code (in case of data embedded in the code). Data references are realized on x86 with either a position-independent code (PIC) sequence (get_pc_thunk) or an instruction that uses %rip with a constant offset. For data accesses through the PIC code pattern, we identify those patterns and make sure the return addresses involved remain unchanged (pointing to the original code). For data references using %rip in non-position-independent executables, a straightforward approach is to replace the %rip usage with absolute addresses, since the absolute address is known at instrumentation time. For %rip usage in shared libraries and position-independent executables (PIE), a natural solution is to adjust the offset value in the instructions that use %rip. Unfortunately, according to the Intel manual on the x86 ISA, the %rip offset is only 4 bytes long, so adjusting the offset value would constrain the instrumented code to within 4 gigabytes of the original data segment. This solution requires the least instrumentation but reduces the entropy of the instrumented code location. An alternative solution is to use a register in place of %rip. The content of this register is the base address of the module. With this approach, a data reference of %rip plus a constant A can be statically replaced by the register plus a constant B, where B equals %rip − base + A.5 Note that the dedicated register is reloaded each time execution enters a new module. Any instruction that already uses the dedicated register must be replaced, either by an instruction using another register or by an instruction sequence that saves and reloads the dedicated register. Since the PIC code pattern requires patching the instrumented code, our loader does this before loading it into memory.
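The displacement adjustment can be sketched as a one-line computation; the concrete addresses below are hypothetical and serve only to show that the rewritten register-based access reaches the same target as the original %rip-relative access.

```python
# Sketch: the original instruction computes rip + A. After replacing
# %rip with a dedicated register holding the module base, the new
# displacement is B = rip - base + A, a constant known at
# instrumentation time, so that base + B == rip + A.

def rewrite_disp(rip, base, disp_a):
    """New displacement B for the register-based rewrite."""
    return rip - base + disp_a

base = 0x400000          # module load base (illustrative)
rip  = 0x401234          # address of the next instruction
A    = 0x2000            # original rip-relative displacement
B    = rewrite_disp(rip, base, A)
assert base + B == rip + A   # the rewritten access hits the same target
```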

4.5.2.2 Protecting global address translation table

Note that in addition to protecting the instrumented code, the locations of the GTT and LTTs need to be randomized as well. Since LTTs are colocated with instrumented code, the entropy of their addresses is the same as that of the instrumented code. The GTT, on the other hand, is implemented as a one-level array in which each entry represents one page of the 47-bit address space. Since each entry is 16 bytes, the total size of the GTT is half a terabyte. Although the GTT is still a small chunk of the large x86-64 address space, its entropy drops to 9 bits, and it suffers from memory disclosure, as shown in recent research [68]. Using a different table structure such as a two-level lookup table or a hash table would increase the entropy, but at a clear performance cost. To avoid extra overhead while maintaining high entropy for our secrets, we take an alternative approach: we expose the GTT to both application code and attackers. The point of our design is that the GTT does not contain any secret. The GTT is a hash table keyed by a target page address; its output is a trampoline address for the target module. Although this trampoline helps complete the address translation for inter-module control flow, it would leak the location of the target module's instrumented code, because the trampoline code is appended after the instrumented code. Decoupling the trampoline from an instrumented code module would be impractical, since all indirect branches would have to check against it. To solve this issue, we use an index to represent the trampoline address in the GTT. This index is checked against an internal data structure called the trampoline table, which contains the trampoline addresses of all loaded code modules. The index is incremented each time a new module is loaded. Given that each module contains two trampolines, according to the BinCFI design, each of which

5Note that %rip − base is always a constant

requires 8 bytes, and assuming that an application will not load more than 500 libraries, two consecutive pages are sufficient for the trampoline table; the table size can be further configured. The base address of the trampoline table is kept in our private TLS, whose base is stored in the kernel. As a result, both the GTT and its lookup code can safely be allocated at a fixed location that is visible and accessible to application code and ROP attackers.
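The index-based GTT design can be modeled abstractly as follows. A Python dict stands in for the public GTT array and a list for the secret trampoline table; all names and addresses are illustrative, not the actual layout.

```python
# Sketch: the public GTT maps a code page to a small index, which by
# itself reveals nothing about code locations. The secret trampoline
# table (reachable only via the private TLS) maps that index to the
# actual trampoline address.

PAGE = 4096
gtt = {}                  # public: page -> index
trampoline_table = []     # secret: index -> trampoline address

def load_module(code_pages, trampoline_addr):
    """Register a module: one new index per loaded module."""
    index = len(trampoline_table)
    trampoline_table.append(trampoline_addr)
    for page in code_pages:
        gtt[page] = index
    return index

def translate(target):
    """Inter-module translation: public lookup, then secret dereference."""
    index = gtt[target // PAGE * PAGE]
    return trampoline_table[index]

load_module([0x8048000, 0x8049000], 0x7f00_dead_0000)
print(hex(translate(0x8048123)))   # -> 0x7f00dead0000
```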

4.5.2.3 Protecting internal data structures

The internal data structures that must remain secret are our private TLS and the trampoline table. Our private TLS is initialized in the loader at program launch and each time a new thread is created. It uses the TLS segment register that is not used by glibc, which is %gs on x86-64 Linux. The purpose of this TLS is to store pointers needed by the instrumented code during execution; its content therefore has to remain secret.

4.5.3 Randomization

As mentioned, the trampoline table contains critical information and thus must be protected. We ensure that this table has a randomized base address and is read-only during program execution, except in a small window when the program is initializing a library. To summarize, the secrets consist of the following data elements:

• Instrumented code and LTT: the instrumented code needs to remain secret to prevent disclosure of encrypted code pointers.

• Trampoline table: this per-process table contains all the address translation trampolines and the base addresses of the instrumented code.

• Private TLS: Private TLS contains pointers to trampoline table and thus needs to be protected.

We now describe our critical data randomization implementation. Note that Linux implements its own ASLR, which offers 28 bits of entropy [93, 144]. However, bugs [91, 92] and implementation weaknesses plague production systems. Even today, 64-bit Linux systems still have an ASLR weakness: a PIE executable is always allocated at the same distance from its loader and libc [93].

For this reason, we decided to develop our own randomization module to allocate the instrumented code, the trampoline table, and our dedicated TLS. For simplicity of implementation, we use the rdrand instruction, available on Intel CPUs since the Ivy Bridge generation, to generate random numbers. rdrand is compliant with security and cryptographic standards such as NIST SP 800-90A [40]. The instruction generates a 64-bit random number each time it is invoked. Since our code and data have to be page aligned, the number of random bits needed to cover the 47-bit user address space is 35, which is what we use.6 In this way, the instrumented code as well as our critical data structures can be placed anywhere in the user address space.
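The page-aligned random base generation can be sketched as follows, with os.urandom standing in for the rdrand instruction used in the actual implementation:

```python
# Sketch: 35 random bits select the page, and the low 12 bits stay
# zero, yielding a page-aligned base anywhere in the 47-bit user
# address space.

import os

PAGE_BITS, RAND_BITS = 12, 35

def random_base():
    bits = int.from_bytes(os.urandom(8), "little") & ((1 << RAND_BITS) - 1)
    return bits << PAGE_BITS           # page-aligned, always < 2**47

addr = random_base()
assert addr % 4096 == 0 and addr < 2**47
```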

In our implementation, we add our own logic in the dynamic loader just after the library loading code. Thus, whenever an original code module is loaded, our code immediately loads its corresponding instrumented code at a random base location. Note that we do not change the entropy of the original code; increasing it would not help, because of the leaked pointers.

4.6 Implementation

Since the base platform PSI works only on x86-32 Linux, we also implemented SECRET on that platform. It implements both segmentation-based and randomization-based protection for the instrumented code and other memory regions that critically rely on protection. Most of the main steps and issues in the implementation have been discussed in the previous section, so here we focus on some lower-level issues that are nevertheless critical for the overall security and correctness of SECRET.

Protecting the Dynamic Loader Shadow code loading is performed by a custom version of the dynamic loader (ld.so), which is a shared library responsible for loading dependent library modules. However, since the loader is responsible for loading the instrumented code of other modules, a challenge that must be addressed is how instrumented code randomization can be applied to the loader itself.

6If the target CPU does not support this instruction, we can use /dev/random to obtain random numbers.

Figure 4.2: Shadow code randomization for the dynamic loader, (a) before and (b) after loader randomization.

SECRET addresses this issue with the process illustrated in Figure 4.2. When the program starts, the loader runs first; at this point, its instrumented code and LTT are by default appended after its original data segment, since up to that point the loader is just a traditional PSI-transformed binary. Immediately after the loader is initialized, it begins to randomize itself. First, a pre-generated instrumented code image, together with its LTT data file, is loaded from disk and mapped at a randomly selected address picked by the OS.7 The loader then updates the GTT: all entries corresponding to the loader's original code are changed to point to the new instrumented code. After this step, the loader continues executing the original instrumented code until the next indirect control flow transfer is encountered. At that point, the loader "enters" the randomized instrumented code and runs with that code. The final step is to

7In cases where the entropy provided by the OS is insufficient, this design allows for additional randomization using a custom allocator.

unmap all the old instrumented code and LTT. This is safe since the program is still in its startup phase, and no other threads are executing the loader's code.

Protecting the vDSO The virtual dynamic shared object (vDSO) is a small shared library automatically mapped by the kernel into all user-space processes. As such, it does not have a file on disk. If left unprotected, however, several critical functions in vDSO could be abused for the construction of ROP code.

SECRET instruments all dependent libraries, including vDSO. This is achieved by dumping its memory image and generating the corresponding instrumented code offline. At runtime, SECRET loads the instrumented code for vDSO at a random location, following a process similar to the one for the dynamic loader, described earlier. This ensures that forward edge control flow transfers will never target the original code of vDSO, and will always be redirected to its instrumented code. The instrumented code of vDSO also benefits from the underlying PSI protection. Moreover, additional security policies could be added to prevent attacks that abuse sigreturn in vDSO.

Another complication is that vDSO contains a fast system call invocation function. Due to the kernel design, when a system call finishes, control flow always returns to the original vDSO code, even if the call site was in the instrumented code. If not handled properly, this could lead to the execution of an unprotected return instruction. SECRET deals with this issue by dynamically patching this function to ensure that it complies with the underlying CFI policy.

Signal Delivery Signal delivery causes the program context (a signal frame) to be dumped onto the user stack. Since the program is executing its instrumented code, any signal delivery would leak information that exposes the location of the instrumented code. SECRET avoids this issue by transparently intercepting sigaltstack and registering its own signal stack. All signal frames are initially placed on a randomized signal stack whose location is never leaked. Then, a transformed signal frame is copied to the user stack (or the user-registered signal stack). The transformation changes the PC address inside the signal frame to the original code address before copying it to the user stack.
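The signal frame transformation can be sketched abstractly; the dict-based frame and the reverse map from instrumented to original PCs below are illustrative stand-ins for the actual sigcontext layout and translation metadata.

```python
# Sketch: the PC saved by the kernel points into instrumented code;
# before the frame is copied to the user stack, the PC is translated
# back to the corresponding original-code address so that nothing about
# the instrumented code's location leaks.

# Reverse map built at instrumentation time (addresses illustrative):
instr_to_orig = {0x70000123: 0x08048123}

def transform_frame(frame):
    """Return a copy of the signal frame with PC mapped to original code."""
    safe = dict(frame)                    # never modify the hidden copy
    safe["pc"] = instr_to_orig[frame["pc"]]
    return safe

kernel_frame = {"pc": 0x70000123, "sp": 0xbffff000}
user_frame = transform_frame(kernel_frame)
assert user_frame["pc"] == 0x08048123     # only original-code PC leaks
```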

4.7 Evaluation

We perform an extensive evaluation of the system using the SPEC benchmarks and real-world applications, including GUI applications such as Open Office. All experiments were performed on 32-bit Ubuntu 12.04 with a 2-core Core i5 CPU and 4 GB RAM.

4.7.1 Code pointer remapping

Figure 4.3 shows the fraction of code pointers that are remapped when code pointer remapping is applied. Our analysis shows that the majority of code pointers are return addresses. To our surprise, almost half of the code pointers in libc.so are used for jump tables. For this library, we remapped almost all code pointers. Remapped pointers are hidden inside the instrumented code, which is not accessible to attackers.

4.7.2 Static analysis

We evaluate our static analysis technique for code identification. As described, the .eh_frame section provides information on how to unwind stack frames. The covered region consists of a list of debugging units, each of which is usually a function or code snippet. The Frame Description Entry (FDE) structure includes the address range of the code in each case. Figure 4.4 shows the exception handling information coverage for a set of SPEC binaries and Linux libraries. We summed the ranges of all entries; the second column shows how much of each code segment is covered by the DWARF information. On average, 97.17% of the code segment is covered, which means that function boundaries are available in almost all of these binaries.

With those boundaries as starting points, SECRET's static analysis pass can follow control flows within the already known regions and discover any missing code, as well as data between and in the middle of functions. After analyzing 491 ELF binaries, we found a few cases of data embedded in code. However, in most of these cases, the gap region indicated by the .eh_frame section was simply padding between function or section boundaries. There were only a few

name             total      return   jump     exception  exported   percentage
                 code ptr   ptr      table    handler    function   remapped
400.perlbench    17548      14101    1542     0          3          89%
401.bzip2        512        365      48       0          2          80%
403.gcc          58264      47847    6496     0          7          93%
429.mcf          146        94       0        0          2          66%
445.gobmk        12220      9245     266      0          3          78%
456.hmmer        4426       3624     223      0          2          87%
458.sjeng        1436       1124     140      0          3          88%
462.libquantum   585        448      1        0          2          77%
464.h264ref      3738       3059     89       0          3          84%
471.omnetpp      21116      16700    956      3708       23         94%
473.astar        535        411      3        0          2          78%
433.milc         1881       1561     38       0          2          85%
435.gromacs      8491       6936     321      0          19         86%
437.leslie3d     693        631      0        0          2          91%
444.namd         1343       1157     8        16         3          87%
447.dealII       48927      37679    2801     6632       10         89%
450.soplex       6668       5237     570      493        5          91%
453.povray       13887      10559    1702     103        26         89%
454.calculix     19318      17699    206      0          2          92%
470.lbm          126        79       0        0          2          64%
482.sphinx3      2930       2530     5        0          2          87%
libc.so.6        26719      12117    12432    0          2163       98%

Figure 4.3: Percentage of Randomized Pointers in SPEC Benchmark

Name                 .eh_frame Coverage
spec2006             97.54%
libc.so.6            97.87%
libm.so.6            96.16%
libgfortran.so.3     98.58%
libquadmath.so.0     99.63%
libstdc++.so.6       95.44%
libcrypto.so.1.0.0   87.23%
average              97.17%

Figure 4.4: Coverage of exception handling information on the original code.

cases in which data was embedded in the code as part of jump tables. Figure 4.5 provides details about these cases.

We found 40 locations, totaling 390 bytes, of embedded data in libc.so.6, all of them used as padding. Since the padding bytes are zero, they can cause disassembly errors if not handled properly. In libffi.so.6 on x86, we found a jump table in the middle of code inside the ffi_call_SYSV function. The same library on x86-64 has two jump tables identified by this analysis. Finally, libcrypto.so.1.0.0 contains 16 data regions in the middle of code in both its 32-bit and 64-bit versions; looking further into this binary, we retained 20377 bytes of data in the code segment. All these data regions are located after function returns.

Name                 Invalid Regions   Valid Regions   Reason
libc.so.6            40                0               alignment padding
libffi.so.6          0                 1               ffi_call_SYSV
libcrypto.so.1.0.0   0                 16              lookup table for crypto algorithms

Figure 4.5: Embedded data regions identified by static analysis

4.7.3 Strength of randomization

To evaluate the randomization protection of the instrumented code, we ported our code into a dynamic loader8 running on 64-bit Ubuntu 14.04. In this experiment, we tested Chrome 43.02 by forcing our modified loader to load the browser code as well as its dependent libraries. We used instrumented code of the same size as the code in the original binary,9 i.e., when a module is loaded, our loader immediately loads an instrumented code region whose size equals that of the module's text segment. All 24 Chrome processes were tested.

Our experiments show that the total size of the instrumented code used by Chrome is 514 megabytes. The allocated instrumented code pages are scattered across the whole user address space. The address range differs per process, but the span of address space occupied by instrumented code across processes is between 953 and 1021 terabytes. Since the instrumented code is not targeted by any code pointer, the probability of a memory leak is the code size divided by the address space used, which is between 5.3e-7 and 5.68e-7.

8This dynamic loader belongs to glibc 2.19.

9This might introduce a tiny inaccuracy, since the instrumented code is slightly larger than the original code. However, given that the address space is several orders of magnitude larger, this variance can be ignored.

86 40%

35% BinCFI SECRET.seg SECRET.rand

30%

25%

20%

15%

10%

5%

0%

-5%

-10%

Figure 4.6: Runtime overhead of SECRET on the SPEC benchmarks.

4.7.4 Runtime performance

We have evaluated SECRET's runtime overhead using the SPEC2006 benchmarks. Since code space isolation does not introduce extra overhead, this feature is enabled by default, except in the baseline PSI system. Our prototype has been tested in three modes: 1) PSI: baseline protection; 2) SECRET.seg: instrumented code protected by memory segmentation; 3) SECRET.rand: instrumented code protected by a randomized base address. In all cases, SECRET transforms the main executable and all 6 dependent libraries. The results for each of the SPECINT benchmarks are shown in Figure 4.6, while Figure 4.7 shows the average overhead for SPECINT and the total for all 21 SPEC CPU benchmarks.

In the SECRET.seg mode, the average runtime overhead for SPECINT is 14.41% (the total SPEC CPU overhead is 15.64%). In this mode, both the instrumented code and its LTT are located outside of the memory sandbox. The overhead mostly comes from memory accesses through a segment register when performing address translation. In the SECRET.rand mode, the average runtime overhead for SPECINT is 13.54% (the total SPEC CPU overhead is 14.48%). The protection added in this mode is the base address randomization of the instrumented code. Compared with SECRET.seg, there are two differences in this mode: (a) the address translation trampolines must perform two range checks (one for the original code space and one for the randomized code address space) instead of one, which

          PSI      SECRET.seg   SECRET.rand
SPECINT   12.84%   14.41%       13.54%
TOTAL     14.20%   15.64%       14.48%

Figure 4.7: Summary of SPEC2006 Runtime Overhead

Test suite   Base    SECRET.rand   Description
python       4.709   5.022         run bincfi script to transform /bin/ls
dd           99.46   99.6          copy a 1GB file
md5sum       2.44    2.45          checksum of a 1GB file
apt-get      1.54    3.6           update source list
scp          2.78    2.96          copy a 100MB file to server

Figure 4.8: Completion time (sec) for real-world programs.

will add a little overhead, and (b) SECRET.rand does not require intensive memory accesses through our TLS. Our experiments show that the average overhead of SECRET.rand is slightly lower than that of SECRET.seg. Compared with the baseline system PSI, the average runtime overhead added by SECRET.seg is less than 2%, and that added by SECRET.rand is less than 1%.

In addition to the SPEC benchmarks, we also evaluated SECRET with several real-world programs. As the SECRET.rand mode includes all features and represents the current architecture (x86-64), we use this mode to compare against the performance of the original programs. The results in Figure 4.8 show that SECRET is practical for real-world usage. This experiment includes script interpreters such as python and perl, disk I/O tools such as dd, and network-related tools such as apt-get and scp. In all experiments, the code of all main executables and libraries was transformed to instrumented code.

4.7.5 GUI program startup overhead

One important overhead not covered by the SPEC benchmarks is startup overhead. SECRET has noticeable startup overhead because our modified loader needs to do the following for each code module: (a) load the instrumented code as well as its LTT; (b) initialize the entries in the GTT so that code pointer remapping works properly; and (c) wipe out the original code immediately after load time.

Name          Base (sec)   PSI    SECRET.rand
evince        0.34         135%   168%
gcalctool     0.62         110%   161%
gedit         0.6          120%   165%
vim           0.6          60%    67%
lynx          0.02         100%   100%
LibreOffice   1.4          51%    200%

Figure 4.9: SECRET’s startup overhead on GUI programs.

To accurately evaluate this overhead, we intentionally use GUI programs, since they load much more library code than the benchmark programs. This helps improve the accuracy of the startup overhead evaluation. As in the previous experiment, we use the SECRET.rand mode to compare against the original program and the baseline (PSI).

Figure 4.9 illustrates the startup overhead of several well-known Linux applications, including three GTK and two text-user-interface programs. The results show that SECRET's overhead is higher on GTK programs than on text-user-interface programs, because more libraries are loaded at program startup.

4.8 Discussion

4.8.1 Harvesting return addresses

In many scenarios, such as JIT environments, attackers may have the ability to crash and restart the program, enabling repeated probing. In these cases, attackers may intentionally invoke native functions in order to harvest return addresses (return addresses are generated by the callers of the native functions, and by the functions themselves when they invoke subroutines). Note that code pointer remapping randomizes the return addresses visible to attackers who try to recover code pointers, and these encrypted code pointers are further protected by code space isolation. However, function pointers remain unchanged, i.e., they still point into the original code. When any of these functions is invoked, the memory footprint that the function leaves remains at the top of the stack. This footprint may include some of the return addresses generated when subroutines of the function were invoked.

Even though those return addresses are still encrypted, knowledgeable attackers may take advantage of them. Since the attacker has knowledge of the original binary, she knows which function she invokes, and she can fully understand the function's internal structure by examining the original code. She could then gather callsite gadgets by invoking those functions of which she has prior knowledge. Even though the return addresses are randomized, attackers could thus still make use of them.

We argue that to prevent this memory disclosure, SECRET could take a simple approach: eliminating all used return addresses. In particular, we could erase a return address left on the stack immediately after we start translating it. This method will not cause compatibility issues, because by the time a return address is consumed by a return instruction, its location is already at the top of the stack. Note that this location will not be reused by the program, since any value at the top of the stack may be overwritten at any time, e.g., due to signal delivery. By doing so, no function gadgets can be used to infer additional callsite gadgets.
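The core of this countermeasure can be sketched in a few lines of C. The sketch below operates on a simulated stack; the names (translate_and_wipe, lookup_target) and the placeholder translation are our own, and the real implementation would perform this wipe on the native stack inside SECRET's return-address translation routine:

```c
#include <stdint.h>

/* Hypothetical lookup: map an (encrypted) return address to its
 * instrumented-code target. Modeled here as a placeholder; the real
 * system would consult its translation tables. */
static uintptr_t lookup_target(uintptr_t encrypted_ra) {
    return encrypted_ra + 1;  /* stand-in for the real lookup */
}

/* Translate the return address at the top of the simulated stack,
 * wiping the stack slot as soon as translation begins so that the
 * encrypted value can no longer be harvested by an attacker. */
uintptr_t translate_and_wipe(uintptr_t *stack_top) {
    uintptr_t ra = *stack_top;
    *stack_top = 0;               /* erase the used return address */
    return lookup_target(ra);
}
```

The key property is that the erase happens before control transfers, so a later memory disclosure finds only a zeroed slot.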

4.8.2 Call-oriented programming

As described in Section 4.4, our system does not change function pointers. This is reasonable because, in static binary instrumentation, changing constants that look like function pointers is unsafe: those values could in fact be used as integers. Unfortunately, this allows attackers to use all original function pointers to launch a call-oriented programming (COP) attack, as recent work on COOP [120] does. We acknowledge that this is a limitation of our current prototype. However, we argue that this limitation could be mitigated once static analyses focused on C++ programs, such as vtable analysis, are applied. Besides, in order to successfully launch a COOP attack, the attacker has to fulfill many requirements. For instance, COOP requires a particular code piece: a loop containing a virtual function call. In addition, the pointer used to traverse the object list must be corrupted. Blindly looking for such a code piece without reading code pages is expected to be challenging.

4.9 Summary

Defending against advanced code reuse attacks that take advantage of memory disclosure vulnerabilities is becoming increasingly important. To that end, breaking the ability of attackers to read the executable memory segments of a process, or even to infer the location of potential gadgets, can be a significant roadblock for attackers.

In this chapter, we have achieved the above goal by implementing our system SECRET. SECRET combines two novel code transformation techniques, code space isolation and code pointer remapping. The former prevents read accesses to the executable memory of the instrumented code, a protected version of an application's original code, while the latter decouples the code pointers it requires from those of the original code. We have demonstrated that SECRET, the prototype implementation of the proposed concept, can offer practical and comprehensive protection for real-world COTS applications, by combining disclosure-resistant code and coarse-grained CFI with reasonable performance overhead.

Chapter 5

Application: Comprehensive Protection From Code Injection Attacks

In the previous chapter, we discussed our practical defense against code reuse attacks. However, in principle, it is unclear whether code reuse attacks can be fully defeated on COTS binaries, due to more advanced attack styles such as call-oriented programming (COP).

To remedy this, in this chapter we extend PSI with an additional security property, code integrity, to defend against native code injection: attacks that introduce malicious code into vulnerable applications. We argue that native code injection is the ultimate goal of these advanced code reuse attacks. Preventing native code injection will effectively mitigate them.

We will demonstrate that code integrity, when combined with control flow integrity, is an effective approach against modern code injection attacks. The following discussion illustrates how this security property is enforced and implemented.

5.1 Motivation

Despite decades of sustained effort, memory corruption attacks continue to be one of the most serious security threats faced today. Memory corruptions are sought

after by attackers as they provide ultimate control: the ability to execute low-level code of the attacker's choice. This factor makes them popular in targeted as well as indiscriminate attack campaigns.

Data execution prevention (DEP) owes its popularity to its ability to block the highly sought-after arbitrary code execution capability; it was widely deployed despite its well-known weakness against code reuse attacks such as return-to-libc [101] and Return-Oriented Programming (ROP) [126]. Subsequent to its deployment, attackers have become increasingly skilled at crafting ROP attacks that string together small snippets of existing code (called "gadgets") into something like a virtual instruction set. Although researchers have achieved Turing-completeness with ROP attacks, real-world ROP exploits face many serious limitations, including:

• Limited variety of gadgets. Due to defenses such as ASLR, frequent software updates, customized compiler optimizations, version changes and others, some gadgets are eliminated, while the locations of others are unknown and require significant effort to discover. This limits the range of attacks possible.

• Payload sizes are typically limited. The payload size available for an attacker's ROP chain is usually determined by the specific exploit context. Recent research [69] examined the payload size limits imposed by vulnerabilities in the Metasploit Framework [117], one of the best-known security testing tools, which provides a wide range of exploit samples and payloads. The average maximum payload size among the 946 vulnerabilities covered in the framework is only 1332 bytes. This poses constraints on the length of an ROP chain.

For this reason, real-world ROP attacks have been relatively short, and have targeted executing just enough code to disable DEP, i.e., to change permissions on memory pages, or to load the attackers' own code pages or libraries. Such a transition from code reuse to code injection (CRCI) has become a common feature of today's exploits.

Based on the above analysis, we believe that in the foreseeable future, attackers will continue to be constrained in their ability to craft pure ROP exploits. Even as attackers make advances in ROP attacks, platform vendors are bolstering defenses, e.g., by increasing ASLR entropy using 64-bit address

spaces, while processor and/or OS vendors consider including ROP-oriented defenses [78, 96]. On the other hand, launching a small ROP chain that disables DEP is considerably simpler than a pure ROP exploit. Moreover, once DEP is disabled, the injected payload can start execution, and it is no longer constrained by ASLR. In contrast, for a pure ROP attack, susceptibility to ASLR increases with each additional gadget used. For this reason, native code injection will remain attractive to attackers for the foreseeable future. Consequently, systematic defenses against code injection attacks can greatly degrade the capabilities of attackers. This is the goal of this chapter. Specifically, we propose a strong code integrity approach that ensures that:

• a process can only execute native code that it is explicitly authorized to execute, and

• even an adversary that hijacks all executing threads of a victim process cannot defeat this property.

This strong defense against injected code attacks is achieved without sacrificing compatibility with existing software, without the need to replace system programs such as the dynamic loader, and without significant performance penalty.

5.1.1 Property and Approach

The existing code integrity mechanism, realized by DEP and write protection on code, targets native code injection attacks, but is secure only under a weak attacker model. Attackers may introduce new code by injecting their own malicious modules, by changing page permissions to mark their payload executable, or even by hijacking the loader. These steps can easily be achieved by a code reuse attack, a technique that is by now quite mature, as shown by recent research [62, 74] as well as real-world exploits and vulnerabilities [18–22].

The fundamental reason, however, is not the power of code reuse attacks, but the lack of a security mechanism that ensures code integrity. Code signing is one technique that protects code at launch time. Unfortunately, even after passing the code signing check, loaded code may still be modified.

To solve this issue, we therefore propose a practical but strong form of code integrity. Specifically, this approach enforces the following:

• Any given program (binary) can load only a specified set of files containing executable code.

• Only instructions legitimately contained in these files can ever be executed.

• No unauthorized changes can be made to any of the instructions during runtime.

Together, these properties make native code injection impossible. Note that tech- niques such as write-protecting binary files (or code-signing) serve to preserve the integrity of code on the disk, but don’t address the protection of code integrity in main memory. This approach thus provides an important missing piece to ensure end-to-end code integrity.

We call this approach Strong Code Integrity (SCInt), highlighting the fact that it ensures the integrity of all code, together with the control flows within the code. This contrasts with existing CFI techniques, which do not consider most attacks on code integrity:

• CRCI attacks can modify memory protection to either overwrite existing code or make non-code 1 executable.

• Loader subversion attacks can load malicious code by exploiting vulnerabil- ities in the dynamic loader.

A more complete list of possible attacks thwarted by SCInt appears in Section 5.2.

While SCInt aims to ensure code integrity, control flow integrity, as an underlying feature, ensures that none of the policies enforced by SCInt can be bypassed. The code integrity property, in turn, protects control flow integrity by preventing CFI checks from being bypassed by newly introduced native code.

5.1.2 Key features

The implementation achieves code as well as control flow integrity without restricting applications, and without requiring the replacement of system software such as the system loader or the standard libraries. Specifically, the contributions are:

1i.e., data pages containing attacker-provided payload

• A description of many possible ways to inject native code despite modern protections such as DEP and ASLR. Attackers may introduce new code by injecting their own libraries, or change existing code by hijacking the dynamic loader. Many of these attacks have not been discussed in previous work, while others are known and have been reported in CVEs; defenses against them, however, have not been discussed.

• A secure library loading/unloading state model that ensures that every code segment is correctly identified and mapped for execution, and that these code segments are never writable. An important aspect of this design is the simplicity of this model, which increases confidence that it correctly enforces policies to ensure secure loading.

• Defense against native code injection attacks by a powerful adversary. This defense is secure against attackers that have defeated ASLR, and are able to compromise one, two, or all execution threads using code-reuse attacks.

• No need to trust the loader. Defense against code injection attacks is achieved without assuming that the loader itself is secure, or that it can’t be exploited. As with any large and complex piece of software, it is unrealistic to expect all vulnerabilities to be eliminated from the dynamic loader. The novelty of this design is its ability to protect against loader exploits and subversion attacks using a simple state model.

• Efficient policy implementation. We present techniques to speed up policy enforcement. Overall, this approach adds a small overhead over the base platform (BinCFI) for application start-up, and very low overheads at runtime.

5.2 Possible code injection vectors

The threat model considers remote attackers that are able to interact with network-facing running application programs. We assume that such attackers can exploit vulnerabilities in the applications to:

• use information leakage (or other) attacks to defeat ASLR,

• hijack the control flow of more than one thread in the victim process, and

• read/write/execute arbitrary memory of the victim process, subject to page protection settings, by using code reuse attacks to load new code, e.g., malicious or vulnerable libraries.

Below, we describe concrete or possible instances of attacks that may be carried out by such attackers.

5.2.1 Direct attacks

In this case, the exploit code (typically an ROP payload) directly invokes the necessary system calls, such as mmap to map new code into memory, mprotect to change execute permissions, and exec to launch executables. Since most operating systems do not block these system calls (legitimate programs use them to load libraries), many current exploits rely on this approach.
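The pair of operations such an exploit ultimately needs can be seen in the following self-contained C sketch, which stages a payload in a writable mapping and then marks it executable. This is a Linux-specific illustration of the primitive, not exploit code; policies such as SELinux's execmem may deny the mprotect call on hardened systems:

```c
#define _DEFAULT_SOURCE
#include <string.h>
#include <sys/mman.h>

/* The two operations a direct attack needs: map a writable staging
 * page, then make it executable. Returns 0 on success. */
int stage_and_make_executable(void) {
    size_t len = 4096;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;

    memset(p, 0x90, len);  /* stand-in for an injected payload */

    /* Flip the page to executable: the step that defeats DEP once a
     * code-reuse chain can issue this call. */
    if (mprotect(p, len, PROT_READ | PROT_EXEC) != 0) return -1;
    munmap(p, len);
    return 0;
}
```

Blocking exactly these transitions, without breaking the loader's legitimate use of them, is what the defenses described later must accomplish.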

5.2.2 Loader subversion attacks

Successful code injection attacks can be launched even if countermeasures against direct attacks are deployed: even if privileges relating to code loading are taken away from the application code, the loader code that is part of the process still needs to be able to exercise these privileges. Attackers can thus gain these privileges by subverting the loader.

5.2.2.1 Control hijacking attacks

These are the most direct form of loader subversion: simply invoking functions within the loader that map memory pages for execution, or that make executable memory writable.

Loading malicious libraries If an attacker has previously stored a malicious library on the victim system, then she can use a code reuse attack to load this library.

There is a common misconception that statically linked binaries are immune to such attacks. In reality, even statically linked code on Linux needs some dynamic

loading capabilities to perform start-up initialization such as TLS (thread-local storage) setup, stack cookies, and parsing of the vDSO2. Hence, they contain internal functions such as _dl_map_object, _dl_open and dl_open_worker that can be utilized by an attacker to load malicious libraries.

Turning on stack executability A common assumption of many security defenses is the presence of DEP, which is also the first obstacle for attackers. On Linux, however, this assumption can be invalidated by invoking the loader function _dl_make_stack_executable3.

5.2.2.2 Data corruption attacks

Library hijacking attacks Executable files specify the libraries they depend on, but not the search path used to locate these libraries. Recent vulnerability reports [7–9] indicate that by subverting the search path, attackers can load malicious libraries. The search path can also be controlled using environment variables such as LD_PRELOAD, LD_AUDIT and LD_LIBRARY_PATH, or search path features such as $ORIGIN and RPATH. Equally important, memory corruption attacks can modify search-path-related data structures, thus overriding the original path setting.

Leveraging these attack vectors, attackers can load unexpected libraries in place of the original libraries [2, 12]. This may lead to privilege escalation attacks [5, 6, 10, 11]. On Windows, FireEye reports [134] an increasing use of the WinSxS side-by-side assembly feature [25] to load malicious libraries, bypassing the normal search for libraries in typical directories containing DLLs.

Malformed ELF binaries Like any complex piece of software, the dynamic loader is bound to contain vulnerabilities [3]. Malformed binaries are one of the best ways to trigger them [4, 13]. It has been reported that malformed relocation data can be used to circumvent code-signing on certain Apple iOS versions [15]. Researchers have also documented more general attacks, showing how malformed relocation information can be utilized to perform arbitrary operations [128].

2vDSO is a dynamic library exported by the Linux kernel to support fast system calls.
3Stack executability is a "feature" available to support GCC nested functions, a legacy feature.

Corrupting loader data Some binaries require the loader to perform relocation. It is possible to exploit this capability to modify existing code. In particular, the author found an attack that first corrupts the relocation flag, then replaces the relocation table and symbol table with forged versions by overwriting loader data structures in memory. A subsequent code reuse attack, involving a call to the loader function that performs relocation, resulted in a successful modification of existing code.

Note that this attack applies not only to glibc, but also to other loader implementations, such as those packaged with bionic libc for Android (before 4.3), uClibc for embedded systems, and other POSIX-compliant libc implementations such as musl-libc.

5.2.2.3 Attacks based on data races

Several potential opportunities exist through which an attacker-controlled thread can modify loader data while it is being used and/or modified by the loader. We have identified three attractive avenues in this regard:

• File descriptor race: In order to load a library, the loader first performs an open on the file to obtain a file descriptor fd, and then uses fd in an mmap operation to map the code and data pages into program memory. An attacker's thread can race with the loader to change the file pointed to by fd. This can be accomplished using system calls such as dup2.

• Racing to corrupt data segments used during loading: Data segments in the library are loaded with write-permission enabled. An attacker’s thread can race with the loader to corrupt parts of this data that contain ELF segment information. When this corrupted data is used by the loader to load code segments, the loader may end up doing the attacker’s bidding.

• Racing during relocation: Binaries that rely on text relocation provide another opportunity. Specifically, at the time of relocation, the loader maps the executable pages for writing. An attacker's thread can then overwrite the code being patched by the loader.
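The first of these races hinges on the fact that dup2 silently rebinds an already validated descriptor. The following C sketch demonstrates the rebinding primitive within a single thread; the file names are arbitrary stand-ins, and the actual attack would issue the dup2 from a racing thread between the loader's open and mmap:

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Show that after dup2(other, fd), fd names a different file (inode)
 * than the one "validated" when it was opened. Returns 1 if the
 * descriptor was rebound to a different inode. */
int fd_rebinding_possible(void) {
    int fd    = open("/dev/null", O_RDONLY);  /* the checked file   */
    int other = open("/dev/zero", O_RDONLY);  /* the attacker's file */
    if (fd < 0 || other < 0) return -1;

    struct stat before, after;
    fstat(fd, &before);
    dup2(other, fd);          /* the racing thread's move */
    fstat(fd, &after);

    close(fd);
    close(other);
    return before.st_ino != after.st_ino;
}
```

Any defense that caches a file-name-to-descriptor association at open time must therefore also intercept close, dup2 and dup3, as SCInt does.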

5.2.3 Case study: code injection using text relocation

This section describes an interesting attack that allows code injection by leveraging the loader code and data using text relocation.4

Note that code sections are normally write-protected in ELF executables, thus preventing attacks that overwrite them. Text relocation is a convenient key to open this "door." Although text relocation is discouraged, since it makes it difficult to share code memory across processes, it is still used in some libraries, and the code supporting it is present in all versions of the glibc loader.

The attack is launched simply in the following steps:

• Bypass ASLR and figure out loader data structure

• Corrupt loader data structure

• Transfer control to loader function

5.2.3.1 Bypassing ASLR

The first step is to bypass ASLR and locate the loader's data structures and the text relocation function inside the loader. The dynamic loader maintains a data structure called link_map for each code module. This data structure stores metadata such as the address of the symbol table, the address of the relocation table, and whether the module has been processed, as shown in Figure 5.1.

There are several methods to find the memory location of link_map. The first is to check the Global Offset Table (GOT). In fact, the second element of the GOT (GOT[1]) contains the address of link_map in most binaries compiled with partial RELRO. For binaries compiled with full RELRO, or binaries launched with eager binding (with the parameter LD_BIND_NOW), GOT[1] is not initialized.

However, none of these countermeasures stops an attacker. Our experiments show that the program binary contains a special memory segment called .dynamic, which holds information about the binary at runtime. In particular, there is an entry

4Text relocation is used by the dynamic loader to patch code pointers in a code segment at runtime, to cope with ASLR.

called DT_DEBUG. This entry points to a data structure that contains a pointer to link_map.

Although the dynamic loader could easily disable DT_DEBUG by eliminating two lines of its source code, such a modification is unlikely, since it would make program debugging inconvenient.

Once the link_map address is known, it is easy to find the address of the text relocation function. The link_map entries form a list in which the first entry is for the executable and a subsequent entry is for the dynamic loader. From the loader's link_map entry, the base address of the loader is known. Attackers can then simply scan the loader binary to find the function's address.
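This lookup can be reproduced with a short C program. The sketch below locates link_map through DT_DEBUG in its own process, exactly as described above; it assumes a dynamically linked glibc/Linux environment and uses only the public declarations from <link.h>:

```c
#include <elf.h>
#include <link.h>
#include <stddef.h>

/* Walk our own .dynamic section to find DT_DEBUG. The loader fills
 * this entry with a pointer to struct r_debug, whose r_map field is
 * the head of the link_map list (glibc/Linux). */
extern ElfW(Dyn) _DYNAMIC[];

struct link_map *find_link_map(void) {
    for (ElfW(Dyn) *d = _DYNAMIC; d->d_tag != DT_NULL; d++) {
        if (d->d_tag == DT_DEBUG) {
            struct r_debug *dbg = (struct r_debug *)d->d_un.d_ptr;
            return dbg ? dbg->r_map : NULL;
        }
    }
    return NULL;
}
```

Note that this path works regardless of RELRO settings on the GOT, which is precisely why the text argues that disabling it is the loader's only (and unlikely) countermeasure.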

5.2.3.2 Corrupting loader data structure

Once the memory address of link_map is leaked, this attack leverages a memory corruption vulnerability to corrupt the data pointers for the relocation table and the symbol table so that they point to the attacker's payload. In addition, it corrupts a flag in link_map that controls text relocation patching. Specifically, this flag instructs the loader to unprotect write-protected code pages, and it is corrupted to the value DT_TEXTREL, as shown in Figure 5.1. The memory address specified by the crafted relocation entry will then be updated with the value specified by the crafted symbol table. In this way, the attack can write any value to an arbitrary code location.

The link_map data structure shown in Figure 5.1 contains the metadata relevant to this attack, such as pointers to the relocation table (l_info[DT_REL]), the string table (l_info[DT_STRTAB]) and the symbol table (l_info[DT_SYMTAB]). These pointers do not directly point to their data; they all point to locations inside the read-only dynamic section (when RELRO is applied). This does not stop the attacker, since she can corrupt the data pointers themselves to point to her payload. In addition, to launch a successful attack, the l_relocated flag must be flipped, since it indicates whether the object has already been relocated. Finally, the GOT must be taken over as well, because the function used also initializes the GOT. However, after relocation patching, the .got section in the ELF file becomes read-only, which would trigger a segmentation fault. Pointing the GOT at any location containing twelve bytes of writable memory (three GOT entries) works.

struct link_map {
    ElfW(Addr) l_addr;           /* Base address */
    ElfW(Dyn) *l_ld;             /* Dynamic section */

    ElfW(Dyn) *l_info[DT_NUM];
    /* The attack sets:
     *   l_info[DT_TEXTREL] = 1;
     *   l_info[DT_REL]     = attacker_rel;
     *   l_info[DT_SYMTAB]  = attacker_sym;
     *   l_info[DT_PLTGOT]  = attacker_got;
     */

    const ElfW(Phdr) *l_phdr;    /* Program header */
    unsigned int l_relocated:1;
};

Figure 5.1: Dynamic Loader Internal Data Structure

5.2.3.3 Invoke text relocation function

Once the link_map data structure is corrupted, the code injection attack can be launched by invoking an internal loader function, _dl_relocate_object.

_dl_relocate_object takes four parameters: the first is the address of the link_map; the second is not used; the third is the relocation mode, whose value can simply be 0x1 (representing RTLD_LAZY); and the last indicates whether to perform profiling, for which passing the integer 0 is fine.

To simplify the prototype, the author uses a simple program containing a buffer overflow that allows the attacker to change the contents of the stack, including return addresses. Using this exploit, the author launches the text relocation function in one shot. Further, the author modifies the stack frame pointer so that when the function returns, control transfers to the injected payload.

The runtime relocation attack used in the above case can only corrupt the module owning the corrupted link_map data structure. However, it can be generalized to corrupt any memory region at runtime if the attacker works a bit harder and corrupts one more pointer, l_phdr. l_phdr points to the ELF program header table, located in the write-protected ELF image. Corrupting this data pointer allows the attacker to fool the dynamic loader into unprotecting and corrupting arbitrary code or data.

5.3 System design

To defeat the kinds of attacks described in the previous section, we propose a security primitive for code loading which ensures that a process executes only the native code that it is explicitly authorized to execute. The design ensures the integrity of all code, from the binary file to its execution.

Note that the design of SCInt is independent of underlying CFI. However, for the purpose of explanation and demonstration, the implementation is based on PSI [1, 153].

SCInt secures all operations related to loading and manipulation of code. It is able to do this without requiring changes to the loader. Instead, it relies on a small reference monitor that intercepts key operations relating to code loading, and ensures their safety. This reference monitor is based on a state model described below.

5.3.1 State model

The state model captures the essential steps involved in loading a binary and setting up the various code sections contained in it. It does not rely on non-essential characteristics that may differ across loaders, and hence is compatible with different dynamic loaders, including eglibc, uClibc, musl-libc and bionic-libc. The key operations performed by most dynamic loaders on UNIX are:

• Step 1: Open a library file for read.

• Step 2: Read ELF metadata. (This metadata governs the rest of the loading process.)

• Step 3: Memory-map the whole ELF file as a read-only memory region.

• Step 4: Remap each segment of the ELF file with the correct offset and permission.

• Step 5: Close the library file.
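The five steps above can be captured by a tiny state machine. The sketch below is our own simplification (the state and function names are illustrative); the real reference monitor would attach these transition checks to the intercepted system calls:

```c
/* States of one serialized library load, mirroring Steps 1-5. */
typedef enum { IDLE, OPENED, PARSED, WHOLE_MAPPED, SEGS } LoadState;

/* Each operation is permitted only in the state where it is legal;
 * every function returns 1 (allow + advance) or 0 (deny). */
int do_open(LoadState *s)  { if (*s != IDLE)   return 0; *s = OPENED; return 1; }
int do_parse(LoadState *s) { if (*s != OPENED) return 0; *s = PARSED; return 1; }
int do_map(LoadState *s)   { if (*s != PARSED) return 0; *s = WHOLE_MAPPED; return 1; }
/* Segment remapping (Step 4) repeats once per segment. */
int do_remap(LoadState *s) {
    if (*s != WHOLE_MAPPED && *s != SEGS) return 0;
    *s = SEGS; return 1;
}
int do_close(LoadState *s) { if (*s != SEGS) return 0; *s = IDLE; return 1; }
```

An out-of-order request, e.g., an mmap before the ELF header has been read and validated, is simply denied, which is the essence of the enforcement described next.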

Calls to the system functions used by the loader to perform these operations are rewritten by SCInt so that they are forwarded to the state model, which checks these

operations against a policy and, if permitted by the policy, forwards them to the original system functions. All checks are performed using binary instrumentation. Note that the underlying CFI enforcement ensures that none of these checks can be bypassed.

Note also that the policies need to maintain some state, and this state needs to be protected from attacks by compromised execution threads within the vulnerable process. We describe in Section 5.3.6 the design of this protected memory.

Figure 5.2 illustrates the state model and summarizes the enforcement actions in each step of the model. This state model ensures the following properties:

• Only allowed libraries can be loaded into memory address space. These libraries may be specified using their full path names. Alternatively, the policy could permit loads from specified directories.

• Each segment in the module must be loaded in the correct location as spec- ified in the ELF metadata. (Thus, the policy enforcement must be capable of reading and interpreting the ELF header.)

• An executable segment is never mapped with write permissions. Moreover, any memory page that was ever writable will never be made executable.

• No two segments can overlap, nor can there be an overlap between a segment and any previously mapped (and still active) memory page.
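The third property, that a page which was ever writable must never become executable, can be enforced with a simple per-page history. The sketch below is illustrative (the fixed-size page table and function name are our own):

```c
#define NPAGES 1024

/* Per-page record of whether the page has ever been writable. */
static unsigned char ever_writable[NPAGES];

/* Permission check for a protection request on one page:
 * deny W+X outright, deny exec on any page with writable history,
 * and record writability for future exec requests. Returns 1 to
 * allow, 0 to deny. */
int check_prot(unsigned page, int want_write, int want_exec) {
    if (want_write && want_exec) return 0;        /* no W+X, ever   */
    if (want_exec && ever_writable[page]) return 0;
    if (want_write) ever_writable[page] = 1;
    return 1;
}
```

Because the history is monotonic, an attacker cannot launder a payload page by toggling it writable, then read-only, then executable.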

5.3.2 State model enforcement

SCInt maintains the current state of an ongoing load. A loading-related operation is allowed only if the state model is in a state where that operation is legal. For simplicity, the state model serializes file loading, i.e., one library cannot be loaded until the loading of a previous library has completed. The state model handles common errors that can occur during a file load, such as errors in opening files, or in obtaining enough memory and/or address space for mapping.

5.3.2.1 Operation to open a library

SCInt intercepts calls made by the loader to open files for the purpose of loading libraries. It first copies the file name into protected memory, and this copy is passed on to the system call to preclude TOCTTOU attacks. The actual check on file name validity is deferred until the file open operation returns with success. At this point, the file descriptor value is also copied into protected memory for use in subsequent stages of the state model.

[Figure 5.2 pairs each step of the ELF library loading procedure with the corresponding state-model actions: Open ELF file (copy name to safe memory, check model state, check library policy); Verify ELF header and parse ELF metadata (verify the header, record all segment offsets and permissions); mmap the whole ELF file (check the file descriptor, map the whole file read-only, record the base address); Remap each ELF segment (check the file descriptor, check against recorded segment info, check for overlaps, update bookkeeping); Close ELF file (check the file descriptor, update model state).]

Figure 5.2: State Model for Module Loading

Ideally, library policy checking should ensure that all loaded libraries come from a predefined set. However, in practice, it is not always easy to determine in advance the exact set of libraries needed by an application, as libraries may rely on runtime information when deciding which libraries to load. This is particularly true for many graphical programs and plugins. To simplify policies for such applications, SCInt can be configured to permit loading of any library from a set of specified directories, such as /lib and /usr/lib/*. It is also possible to tighten the policy for specific libraries, such as libc.so, so that they are loaded from a specific file or a specific directory.
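A directory-based policy of this kind reduces to a prefix check on an already canonicalized path (e.g., the result of realpath, so that symlinks and ".." components cannot bypass the check). The allowed set below is illustrative:

```c
#include <string.h>

/* Illustrative allowed directories; the real policy is configurable. */
static const char *allowed_dirs[] = { "/lib/", "/usr/lib/" };

/* Permit a load only when the canonicalized path lies under one of
 * the allowed directories. Returns 1 to allow, 0 to deny. */
int library_allowed(const char *canonical_path) {
    for (size_t i = 0; i < sizeof(allowed_dirs) / sizeof(*allowed_dirs); i++) {
        size_t n = strlen(allowed_dirs[i]);
        if (strncmp(canonical_path, allowed_dirs[i], n) == 0)
            return 1;
    }
    return 0;
}
```

Tightening the policy for a specific library, as the text describes for libc.so, would amount to an exact-path comparison ahead of this prefix check.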

The underlying platform enforces the policy that all loaded libraries must be transformed for CFI. It performs on-demand transformation of a binary when the loading of an untransformed binary is attempted.

5.3.2.2 Operations to map file into memory

Note that mmap operations that load libraries into memory are based on file descriptors rather than file names. A table in protected memory maintains these associations and is populated by the state model at the end of Step 1. Any attack that invalidates this association can compromise the library loading policy, and hence SCInt guards against such invalidation. Note that a valid file-descriptor-to-file association can be changed as a result of the operations close, dup2 or dup3. In fact, the dup operations have the effect of closing a file descriptor too, so we effectively need to handle just the closing of file descriptors. The state model deletes filename/file-descriptor pairs on such close operations. Any future use of that descriptor by the loader in mmap operations will be denied by the state model.

5.3.2.3 Segment boundaries

Segment boundary checking ensures that library code is loaded into memory as intended. Segment information is parsed in Step 2 of the state model. The read operation made by the loader to input this information is intercepted and modified so that its results will be stored into protected memory. SCInt then parses this ELF metadata to obtain information about segments and where they should be loaded. At this point, the header is copied back into the read buffer provided by the loader so that the loader may continue with subsequent steps in loading.

In Step 4, the information saved about segment offsets will be used to validate requests to map segments of the library into memory. In particular, SCInt ensures that each code segment is mapped at the offset specified in the ELF header, is never mapped with write permissions, and does not overlap any other segment. As a result, even if attackers corrupt the loader’s in-memory data structures holding ELF metadata, they will not be able to circumvent SCInt. In particular, they may fool the loader into requesting a mapping with incorrect permissions (e.g., a code segment with write permissions), but this request will be denied by the state model because it is inconsistent with the metadata read earlier by SCInt and stored in protected memory. Thus, such attacks will be detected and stopped by SCInt.
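The Step 4 check can be sketched as follows. The PF_X/PF_W flag values are the standard ELF segment flags; the struct seg_info layout and function names are illustrative assumptions, not SCInt's actual data structures.

```c
#include <assert.h>
#include <stdint.h>

#define PF_X 0x1   /* standard ELF segment flags */
#define PF_W 0x2

struct seg_info {            /* saved from a PT_LOAD entry in Step 2 */
    uint32_t offset;         /* file offset the segment must map from */
    uint32_t vaddr;          /* intended load address per ELF header */
    uint32_t size;
    uint32_t flags;          /* PF_X / PF_W */
};

static int overlaps(uint32_t a, uint32_t alen, uint32_t b, uint32_t blen) {
    return a < b + blen && b < a + alen;
}

/* Validate one mmap request for segment `idx` against the n segments
 * recorded in protected memory. Returns 1 = allow, 0 = deny. */
static int validate_map(const struct seg_info *segs, int n, int idx,
                        uint32_t req_offset, int req_write) {
    const struct seg_info *s = &segs[idx];
    if (!(s->flags & PF_X))
        return 1;                      /* only code segments checked here */
    if (req_offset != s->offset)
        return 0;                      /* wrong file offset */
    if (req_write)
        return 0;                      /* code must never be writable */
    for (int i = 0; i < n; i++)        /* no overlap with other segments */
        if (i != idx && overlaps(s->vaddr, s->size,
                                 segs[i].vaddr, segs[i].size))
            return 0;
    return 1;
}
```

A loader request that matches the trusted copy of the metadata passes; a request with a corrupted offset or added write permission is denied even if the loader's own in-memory copy was tampered with.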

Figure 5.3: Layout of Memory Map Table (segments shown: original code (OC), relro (RO), original data (OD), new code & data (CD), empty space (ES), and GTT data (G), alongside the Global Translation Table and libc.so)

5.3.3 Code integrity enforcement

While the state model ensures that code is safely loaded from disk to memory, additional steps need to be taken to ensure continued code integrity throughout the execution of a process. SCInt achieves this by maintaining information about memory segments, and enforcing policies on operations that attempt to change these mappings or their associated permissions.

In particular, SCInt puts the list of memory segments that belong to code modules into a memory map table. Each memory segment in this table is labeled with the following tags:

• Tags to indicate different segments, such as the original code,5 data, and transformed code. One tag is added for each segment. For convenience, address space that does not belong to any module is tagged MEM_EMPTY.

• A feature or state flag, included in some of the segments, that indicates properties such as whether the segment uses code relocation.

5Recall that BinCFI and many other binary transformation/translation systems such as Pin [90], DynamoRIO [51] and Reins [141] keep the original version of the code as is, while the instrumentation is performed on a copy. Specifically, CFI instrumentation is inserted by BinCFI into the transformed code, while the original code is preserved as is, and made readable but not executable.

Figure 5.3 demonstrates the case of a SCInt-transformed executable file image. It consists of several segments, including the original code and the original data. Transformed code and data are appended at the end of the file. Note that only transformed code is executable. SCInt enforces segment permissions consistent with the specification in the ELF metadata.

It is worth mentioning that “relro” is a special data region that is first made writable by the loader. It is then “patched” by the loader, after which its permissions are changed to read-only. This section usually contains important code pointers and data pointers to which the loader wants to provide an extra level of protection by making them read-only. SCInt tracks this information and protects the segment as described below.

To prevent attackers from corrupting running code or data, or introducing executable code, SCInt uses a policy to constrain system operations by checking against the memory map table in Figure 5.3. The following shows the policy on memory operations.

• Original Code (OC): No operations are allowed.

• Original Code with Text relocation (OCT): Original code with text relocation can be unprotected once, followed by a one-time mprotect with read and execute permissions. No further operations are allowed afterwards. Note that this means there is a window of time during which the original code may be modified by an attack, but this does not affect the security of the scheme because the underlying CFI (here, BinCFI) ensures that control flow never goes to the original code.

• Relro Data (RO): relro data can be memory protected so that it is read- only. No other operations are allowed.

• Original Data (OD): No operations are allowed to make the target region executable, unless a special variable is set up (details in Section 5.3.7). Everything else is allowed.

• New Code and Data (CD): No operations are allowed on transformed code or transformed data.

• Virtual DSO (VD): Virtual DSO is used to support fast system calls. No operations are allowed on this segment.

• Empty Space (ES): No operations are allowed to make the target region executable.

[Table for Figure 5.4: rows list the requested protections (None, R, RW, W, WX); columns list the segment tags OC, OCT, OD, CD, RO, VD, G, ES. Footnotes: (a) the permission can be used for only one operation on this segment; (b) the request is denied if not configured for unsafe JIT code.]

Figure 5.4: Permitted protection settings for memory segments

• Memory Map Data (G): No operations are allowed on memory map data.

The above list shows the policy on different memory regions. Figure 5.4 illustrates the policy for the mprotect system call in tabular form.
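The per-tag policy above can be expressed as a lookup function. This is a simplified sketch, not SCInt's code: the enum names follow the tag list, and the "allowed once" semantics of the OCT and relro cases is reduced to a first_use flag that the caller is assumed to clear after one granted operation.

```c
#include <assert.h>

enum mem_tag { MEM_OC, MEM_OCT, MEM_RO, MEM_OD,
               MEM_CD, MEM_VD, MEM_G, MEM_EMPTY };

#define P_R 0x1
#define P_W 0x2
#define P_X 0x4

/* Returns 1 if an mprotect request with permissions `prot` is allowed
 * on a segment tagged `tag`; `first_use` models the one-time window. */
static int mprotect_allowed(enum mem_tag tag, int prot, int first_use) {
    switch (tag) {
    case MEM_OC:                            /* original code: nothing */
    case MEM_CD:                            /* transformed code & data */
    case MEM_VD:                            /* virtual DSO */
    case MEM_G:                             /* memory map table itself */
        return 0;
    case MEM_OCT:                           /* one unprotect, then one RX */
        return first_use &&
               (prot == (P_R | P_W) || prot == (P_R | P_X));
    case MEM_RO:                            /* relro: one-time read-only */
        return first_use && prot == P_R;
    case MEM_OD:                            /* original data */
    case MEM_EMPTY:                         /* empty space */
        return !(prot & P_X);               /* never executable */
    }
    return 0;
}
```

The state model would consult this function on every intercepted mprotect, denying, for example, any attempt to add execute permission to original data or empty space.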

5.3.4 Library loading policy

Library loading policy ensures that each executable can load only a predetermined set of dependent libraries, decided ahead of time. Enforcing this policy may cause two kinds of usability issues:

• Identifying the set of all required libraries can be difficult, both due to the large number of libraries loaded by many applications, and because this list can vary across localities and configurations.

• Tasks such as debugging require a different set of libraries to be loaded.

To address potential dependencies, this work constrains applications to load libraries only from certain standard locations. Heuristics for obtaining those locations can either come from the initial set of explicit dependencies (obtained using the binary tool ldd), or from profiling. Further tightening to specific libraries may be useful for some high-value targets such as browsers and PDF readers, where a higher level of effort to list libraries would be justified.

To support tasks such as debugging, it is often necessary to use environment variables such as LD_LIBRARY_PATH and LD_PRELOAD to change the loading path or add additional dependencies. The policy is flexible enough to handle these needs, since different policies can be applied to the same application run by different users and/or in different running environments. This is achieved in practice by intercepting the code loading operations in ld.so. LD_LIBRARY_PATH and LD_PRELOAD are fully supported in SCInt. However, only valid values of LD_LIBRARY_PATH should be allowed by administrators who design the policy, since LD_LIBRARY_PATH itself could cause security issues (CVE-2012-0883).

Original Code:
    mov $const, %eax    /* REL: 0x8048346: R_386_32 */
    ......

Transformed Code:
    8048345:  call next
    next:     pop  %eax
              add  $offset, %eax
              movl (%eax), %eax

Figure 5.5: Transformation for Text Relocation based Code Patching

5.3.5 Compatibility with code patching

There are legitimate reasons to modify code after it is loaded, e.g., text relocation. Text relocation allows programs to run at any memory location by updating code pointers at runtime. This code patching does not violate the policy, since only the (non-executable) original code is patched. However, because the patches land in the original code, the transformed code would not execute correctly without special handling. To cope with this, this work uses a special code transformation for all instructions that use text relocation, as shown in Figure 5.5. In particular, a position-independent code pattern is used to fetch the address of the constant in the original code. When the text relocation is patched at runtime, the transformed code uses this pattern to read the “patched” value. By doing so, transformed code with text relocation executes correctly.

5.3.6 Protected memory

Our protected memory on x86-32 is realized using the hardware segmentation mechanism supported on this architecture. Several static instrumentation approaches have been developed for accomplishing this on the x86 architecture [95, 146, 148], but all of them have required some level of compiler support. In contrast, we present a compiler-independent solution that works on COTS binaries. Another key benefit of the solution is its support for thread-local storage (TLS). TLS support is either lacking in previous implementations [70], or requires compiler support [146].

At the time of loading an executable, SCInt reserves a region of memory that is to be protected by memory segmentation. Specifically, we set aside the top 128 MB of the lower 3GB of the address space for memory segmentation protection.6

Reserving the highest part of user-mode accessible address space for protected memory means that we can use a base address of 0 for segments such as ds, es and ss, while the limit will be 0xb8000000. Using a base address of zero provides maximum compatibility with existing code, as it avoids any need to adjust any pointers in memory, or when passing data to the kernel. Note that the code segment register is not limited in any way, thus permitting shadow code in the protected memory region to be executed.

By default, the OS maps the program stack at a high address, and often this may overlap with the region that needs to be set aside. To resolve this conflict, SCInt relocates the stack at process startup time, and then sets aside the high memory region for memory segmentation protection. Next, SCInt initializes the segments. Note that segment descriptors are maintained in two tables in kernel space, the LDT and the GDT. Since the GDT is a per-thread resource, we maintain the segment descriptors there. Specifically, the implementation uses index 7, which is currently unused, to set up protected thread-local storage that can be used by instrumentation, and index 8 to access unprotected memory. A system call policy is put in place to prevent any modifications to these entries. Protected thread-local storage (also called “dedicated TLS”) is the only segment permitted to access protected memory regions.
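The descriptor for the unprotected-memory segment can be sketched as follows: base 0 and limit 0xb8000000, i.e., everything below the reserved top 128MB of the 3GB user space. The struct here is a pared-down sketch mirroring Linux's struct user_desc (asm/ldt.h); actually installing it would use the set_thread_area system call, which is omitted here, and the field subset shown is an assumption made for illustration.

```c
#include <assert.h>

struct user_desc_sketch {            /* subset of Linux struct user_desc */
    unsigned int entry_number;       /* e.g., 7 (dedicated TLS) or 8 */
    unsigned int base_addr;
    unsigned int limit;              /* in pages when limit_in_pages = 1 */
    unsigned int seg_32bit:1;
    unsigned int limit_in_pages:1;
};

static struct user_desc_sketch make_unprot_desc(void) {
    struct user_desc_sketch d = {0};
    d.entry_number   = 8;                       /* index for unprotected mem */
    d.base_addr      = 0;                       /* base 0: no pointer fixups */
    d.limit          = 0xb8000000u / 4096 - 1;  /* page-granular limit */
    d.seg_32bit      = 1;
    d.limit_in_pages = 1;
    return d;
}
```

With base 0, pointers need no adjustment when passed to the kernel; the limit alone excludes the protected region from ds/es/ss accesses.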

We now describe the instrumentation performed to enforce memory protection. First, all loads of data segment registers such as ds, es and ss are overridden to use the descriptor set up above for unprotected memory. Unfortunately, it is not possible to use a similar option for TLS segment registers such as gs and fs. This is because the current implementation of TLS on Linux requires access to memory with negative offsets, which will cause a segmentation violation unless the segment covers the entire 4GB address space. (This is the reason for the above-mentioned problems with TLS support in existing memory segmentation implementations.) To deal with this, we use instrumentation to emulate the correct behavior of instructions that use these two segment registers. Of these two registers, fs is almost never used in 32-bit Linux, while gs experiences much more frequent use. Consequently, we use different strategies for emulation. In particular, we set aside fs so that it always points to the thread-local storage within the protected memory area. Any original uses of fs are always emulated. For gs, we emulate instructions that use offsets that cannot be statically verified to be within the unprotected region. For instance, an instruction mov %eax, %gs:(%ebx) will be replaced with the following snippet:

6 This limitation to the top 128MB is due to the fact that, by default, the loader is mapped just below this 128MB address range. This range can be expanded by changing the default in the OS, or by changing the loader, but we have left these as future work.

mov  %edx, %fs:SPILL_DX
mov  %fs:usr_gs_base, %edx
lea  (%ebx, %edx, 1), %edx
mov  %eax, (%edx)
mov  %fs:SPILL_DX, %edx

Note that instructions in the program that load gs and fs will have to be sandboxed. Only the instrumentation code (and not the original program code) can load the fs register (or other segment registers) to point to a memory range that includes the protected area.

In the x86-64 architecture, segmentation enforcement is not available. This lack of hardware support is an obstacle for SFI implementations. An alternative approach is to transform all memory write instructions, but this approach unfortunately incurs relatively high performance overhead. To avoid that, this work uses data randomization instead. The large address space of the x86-64 architecture provides high entropy for hiding the data. Moreover, SCInt ensures that accesses to protected memory can only be performed through the unused TLS register (%gs)7 with an offset. Since the segment base address is stored in the kernel, a memory leak will not reveal the location of protected memory to the attacker. This gives attackers no clue for discovering the data even if they are able to read memory.

5.3.7 Support for dynamic code

Runtime changes to code may take place either due to the loading/unloading of libraries, or due to the use of just-in-time (JIT) compilation. The design described so far already supports the first case of dynamic code. The second case, namely JIT code, poses some difficulties, and the following discussion explains how compatibility with JIT can be obtained.

7 In x86-64, the %gs register is not used by glibc.

For best performance, many existing JIT compilers generate executable binary code in writable memory. This enables code to be updated very quickly. However, such an approach is inherently insecure under the threat model that this work considers, as an attacker who can corrupt memory can simply overwrite this code with her own code. For this reason, use of such JIT compilers is not advisable in this threat environment. Nevertheless, if compatibility with such JIT environments is desired, it can be supported by SCInt. Naturally, code pages generated by such a JIT compiler cannot be protected from modification, but the design can continue to offer full protection for the rest of the code. This is achieved by marking JIT code regions in a table and transforming JIT code at runtime. All control flows targeting JIT code are redirected to the corresponding instrumented code, where all indirect control transfers in the JIT code are checked.

A more secure approach for JIT support is one that avoids the use of writable code pages. Recent research [131] has proposed a practical and efficient JIT code generation approach that eliminates writable code pages. This is achieved by sharing the memory that holds JIT code across two distinct address spaces (i.e., processes). Code generation happens in one of these address spaces, called the software dynamic translator (SDT), where the page remains writable but not executable. JIT code execution happens in the second address space, regarded as the untrusted process, where the page is just readable and executable.

SCInt is compatible with this secure JIT code generator design as well. Each time new code is generated, it is instrumented by SCInt (and the underlying platform BinCFI). This instrumentation happens within the SDT process. The design of SCInt remains unchanged as far as the untrusted process is concerned.

JIT code tends to change frequently, and the changes are typically small and localized. To obtain the full benefits of JIT code, instrumentation efforts also need to be localized, and performed in an incremental fashion. This requires some fine-tuning in the software dynamic translator to ensure SCInt works correctly. However, because the source code of the secure dynamic code generator [131] is not available, the prototype does not implement incremental instrumentation for dynamic code.

5.4 Implementation

The state model and library loading policy are implemented by instrumenting the dynamic loader (ld.so). This work has instrumented the code logic in ld.so that is used for code loading. In addition, this work has instrumented all the system calls located in the loader, libc.so and libpthread.so. All the checks added by instrumentation redirect control to a specially designed library loaded ahead of time. To ensure this library is loaded ahead of all other dependent libraries, this work uses the environment variable LD_PRELOAD. Note that this library is independent of all other modules, including even the loader and libc. This ensures that the state model, library loading policy and system call monitor work immediately once the library is loaded.

To protect the special library from control-flow and data attacks, this design marks the entries in the memory map table that correspond to the location of the special library as empty. Thus any subverted code pointer targeting the library will crash the program once used. To further protect this library from data attacks, the checks only use the protected memory described in Section 5.3.6. Finally, the special library uses its own stack in protected memory.

5.5 Evaluation

We implemented the application system SCInt on top of PSI. The test environment is Ubuntu 12.04 LTS 32bit, with Intel i5 CPU and 4GB memory.

5.5.1 Micro-benchmark for program loading

Runtime overheads arise chiefly from the following: (1) the larger size of the binary to be loaded, (2) checks performed in the context of the state model, and (3) checking of library policies. Almost all of these overheads occur at the time of loading a library, so this work focuses on the increase in runtime for performing library loads. To measure this overhead, the author wrote a small program that loads a number of automatically generated libraries. This work measured the runtime needed to load the original version and compared it with that for loading the transformed versions.8 Figure 5.6 demonstrates that the overhead grows linearly with the number of libraries loaded. On average, the overhead of SCInt on a library load is 150%.

Figure 5.6: Micro Benchmark Evaluation of SCInt (startup time in ms for the original and CFCI-transformed versions, loading 1, 10, 50 and 100 libraries)

While this overhead may seem considerable, it should be kept in mind that this is a microbenchmark, and the actual overheads on loading and running entire applications are much smaller.

5.5.2 CPU intensive benchmark

Since SCInt does not introduce significant additional operations at runtime, one would expect that it does not introduce significant overhead. Specifically, the only additional overhead of SCInt is due to policies on operations for memory mapping or protection, and file-related operations such as open, close and read. Since these operations are relatively infrequent in comparison with the number of instructions executed, and since the policy checks themselves are quite simple, it is expected that the overhead should be small.

To validate this assessment, the author evaluated SCInt with SPEC 2006 using the reference dataset. The overall overhead of SCInt observed is 14.37%, which is 0.17% higher than that obtained with just BinCFI. The source of the added overhead is system call interception. Figure 5.7 shows the details of the experiment.

8 Note that this work transformed all libraries when testing the transformed binary.

Figure 5.7: Performance of SCInt on SPEC2006 (overhead over base, %; BinCFI vs. SCInt)

Program     Base load  BinCFI overhead  Added SCInt overhead  # of loaded
            time (s)   (over base, %)   (over base, %)        libraries
vim         0.140      34               8                     96
evince      0.336      222              8                     103
lyx         0.484      89               10                    152
lynx        0.052      38               2                     15
wireshark   0.684      224              18                    114
nautilus    0.080      900              16                    178
acroread    0.972      280              7                     82

Figure 5.8: Startup Overhead on Real World Programs

5.5.3 Commonly used Linux applications

In addition to the above tests involving the SPEC benchmarks, this work also selected several real-world applications widely used on Linux. Since only code allocation and deallocation generate performance overhead, the evaluation focuses on program startup, when a large number of modules get loaded. The results are shown in Figure 5.8, where the base load time is in seconds.

From Figure 5.8, it is clear that BinCFI has observable startup overhead, 147% on average, while the overhead generated by SCInt on top of BinCFI is small, 8% on average. The loading phase in BinCFI is not optimized as their focus was on the execution phase performance rather than loading performance. The additional overhead of SCInt ranges from a low of 2% for programs that load just a few libraries, to a high of 18% for programs that load a large number of libraries. Note that many of these programs load more than 100 shared libraries each, and hence the average overhead reported is likely to be a conservative estimate.

116 5.5.4 Running with dynamic code

To demonstrate compatibility with dynamic code, the author used LibJIT [16], an actively maintained JIT engine similar to the LLVM backend. To make LibJIT compatible with SCInt, the author forced LibJIT to generate only non-writable code. To evaluate the performance overhead for LibJIT, the author reused an open source benchmark tool [42]. The benchmark tool is a simple program that computes the greatest common divisor (GCD). It generates the GCD function code dynamically at runtime. LibJIT allows dynamic functions to be invoked directly (LibJIT-fast). The evaluation shows that the SCInt overhead is 14.6%. This overhead includes 10% incurred for runtime code transformation.

Note that the prototype does not currently support JIT code update. This limitation is reasonable because (a) if JIT code update is allowed, the whole security property enforced by SCInt may be invalidated unless a safe dynamic code update design is applied,9 and (b) there will be no visible difference in performance if a safe JIT code generator is used, since a code update requires the same amount of time in the helper process and the only overhead is the IPC communication between the target application and the helper process. Because of this, the evaluation does not include overhead for JIT code update.

5.5.5 Effectiveness evaluation

Although CFI bypass techniques are emerging, as shown in recent research [62, 74], real-world exploits have not been designed with CFI in mind. Therefore, running those exploits would not show the benefits of SCInt. For this reason, the effectiveness evaluation uses a combination of manual analysis based on studying the relevant CVE reports, and proof-of-concept exploits created by the author. This evaluation is summarized in Figure 5.9.

According to the threat model in Section 5.2, the author classifies all attacks into direct attacks (direct), loader data corruption attacks (ldr.data), and loader code reuse attacks (ldr.cr). From Figure 5.9, it is clear that all types of attacks are defeated. In particular, the system call policy prevents all direct attacks that try to manipulate code permissions or launch malicious binaries. For loader subversion attacks such as search path corruption, the library loading policy properly defeats all

9 However, a safe JIT code generator is currently not available. The work proposed by recent research [131] is based on Strata. Its source code is unavailable.

Attack               Type      Detail                                                  Blocked?  Reason of Rejection
ROP-CVE-2014-1776    direct    alloc executable area and jump                          Y         syscall policy
ROP-CVE-2014-1761    direct    launch malicious executable                             Y         syscall policy
ROP-CVE-2014-0497    direct    change page permission, download exe and launch it      Y         syscall policy
ROP-CVE-2012-1875    direct    make heap executable                                    Y         syscall policy
CVE-2013-3906        direct    alloc executable data and jump                          Y         syscall policy
CVE-2013-0977        ldr.data  malformed binary with overlapping segments              Y         state model
CVE-2014-1273        ldr.data  malformed binary with text relocation                   Y         state model
CVE-2010-3847        ldr.data  library hijacking using $ORIGIN in LD_AUDIT             Y         library loading policy
CVE-2010-3856        ldr.data  library hijacking on LD_AUDIT                           Y         library loading policy
CVE-2011-1658        ldr.data  library hijacking using $ORIGIN in RPATH                Y         library loading policy
CVE-2011-0570        ldr.data  untrusted search path vulnerability in Adobe Reader     Y         library loading policy
ROP-CVE-2013-3906    ldr.cr    load malicious library                                  Y         library loading policy
PoC: attack-lder-1   ldr.cr    code injection via corrupted reloc table and sym table  Y         state model
PoC: attack-lder-2   ldr.cr    code injection via making stack executable              Y         syscall policy
PoC: attack-lder-3   ldr.cr    loading malicious library by calling _dl_open           Y         library loading policy

Figure 5.9: Effectiveness Evaluation of SCInt

Policy for Adobe Reader:

ALLOW  libc.so.6        /lib/i386-linux-gnu/
REJECT libc.so.6        *
ALLOW  libpthread.so.0  /lib/i386-linux-gnu/
REJECT libpthread.so.0  *
ALLOW  libselinux.so.1  /lib/i386-linux-gnu/
REJECT libselinux.so.1  *
ALLOW  *  /lib/i386-linux-gnu/ /usr/lib/* /opt/Adobe/Reader9/Reader/intellinux/lib /usr/lib/i386-linux-gnu/* /usr/lib
REJECT *  *

Figure 5.10: Library Loading Policy for Adobe Reader

attempts at loading unallowed modules from unintended paths. Moreover, more advanced attacks, such as code reuse attacks targeting the loader, are stopped by the code loading state model. To provide more insight in this regard, this work considers the following case studies.

Case study: library policy for Adobe Reader As described in Section 5.2, an attacker may attempt to load a malicious library by specifying it by name, or by corrupting the load path (“library hijacking”). SCInt blocks these attacks using policies to limit the load path, as well as the specific libraries that may be loaded by an application. To illustrate these policies, consider Figure 5.10 which shows the policy for Adobe Reader, a favorite target for library loading attacks [7–9].

In addition, the example also illustrates the flexibility provided by the policy language. It is possible to allow or deny loads of specific libraries, or permission can be granted based on the directory from which a library is loaded. Moreover, policies can be stricter for some libraries. Figure 5.10 uses a stricter policy for the low-level libraries libc.so, libpthread.so and libselinux.so, forcing them to be loaded from a specific file in a specific directory.
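The first-match semantics of such ALLOW/REJECT rules can be sketched as follows. This is an illustrative matcher, not SCInt's implementation: the rule struct, the single-directory-per-rule simplification, and the wildcard handling are assumptions made for clarity.

```c
#include <assert.h>
#include <string.h>

struct rule {
    int         allow;       /* 1 = ALLOW, 0 = REJECT */
    const char *lib;         /* library name, or "*" */
    const char *dir;         /* directory prefix pattern, or "*" */
};

/* "*" matches anything; a trailing "*" or "/" makes a prefix pattern
 * (e.g. "/usr/lib/*" or "/lib/i386-linux-gnu/"); else exact match. */
static int match(const char *pat, const char *s) {
    size_t n = strlen(pat);
    if (n == 0) return 0;
    if (n == 1 && pat[0] == '*') return 1;
    if (pat[n - 1] == '*') return strncmp(pat, s, n - 1) == 0;
    if (pat[n - 1] == '/') return strncmp(pat, s, n) == 0;
    return strcmp(pat, s) == 0;
}

/* Scan rules in order; the first rule matching both the library name
 * and the load path decides. Default is deny. */
static int policy_check(const struct rule *rules, int n,
                        const char *lib, const char *path) {
    for (int i = 0; i < n; i++)
        if (match(rules[i].lib, lib) && match(rules[i].dir, path))
            return rules[i].allow;
    return 0;
}
```

Under the Figure 5.10 rules, libc.so.6 loads only from /lib/i386-linux-gnu/, while an arbitrary library is accepted from the whitelisted directories and rejected everywhere else by the final catch-all rule.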

Case study: library policy for static binary It is a common misconception that static binaries have no ability to load libraries. Experiments prove otherwise. The author wrote a small, statically linked program that contains a traditional buffer overflow vulnerability, and crafted an exploit for this vulnerability. Instead of calling dlopen, a function that is unavailable in statically linked binaries, the exploit redirected control to _dl_open, an internal function statically linked in from the loader. This function was passed the name of a library in /tmp. The exploit resulted in a successful load of this library. Along with the library, dependent libraries such as libc.so and ld.so were also loaded! Finally, the initialization function of the malicious library was executed.

The default policy for library loading stopped this attack, as it prevents loads of libraries outside standard directories for libraries.

Case study: Text relocation attack In order to evaluate code-reuse and data corruption attacks on the loader, the author implemented a proof-of-concept exploit that leverages text relocations.

The exploit code is implemented as a piece of native code. This helps avoid the complications of crafting valid ROP attacks in the presence of CFI. Full details of the attack can be found in Appendix 5.2.3, but it suffices for present purposes to outline the general idea, which is to corrupt the loader’s data structures and launch code injection by reusing the code patching ability in the loader. The experiments show that this exploit is very robust and works on any dynamically linked ELF program.

The author then attempted the exploit in the presence of SCInt. The attack was detected and blocked by the policies shown in Figure 5.4.

Case study: Making stack executable Similar to the relocation attack, the author wrote a simple proof-of-concept exploit that makes the stack executable, and then jumps to the stack. In glibc 2.15, the loader function for making the stack executable makes two sanity checks. The first one checks that the caller’s address is within the loader, while the second one checks the consistency of __libc_stack_end. Bypassing the first check is easy: simply using a return address in the loader would suffice; when the function finishes, it will go back to a gadget inside the loader and can be arranged to jump back to the call site. Bypassing the second check requires the attacker to corrupt a global variable in ld.so, which is easy once ASLR is bypassed. In summary, the exploit code bypassed these two checks and made the stack executable.

When run in the presence of SCInt, this exploit is blocked by the policies shown in Figure 5.4. Specifically, the attempt made by the exploit to make an empty region (ES) executable was denied by this policy.

Defeating this attack is important despite the fact that BinCFI alone could already defeat it. This is because other underlying CFI implementations may rely on DEP. For instance, Abadi et al.’s CFI [28] requires the stack to be non-executable to prevent jump-to-stack attacks that carry a valid ID in the payload.

5.6 Summary

In this chapter, we presented an effective countermeasure against the threat of ROP attacks. The approach is based on the observation that the goal of real-world code-reuse attacks is to disable DEP and launch a native code injection attack. The defense combines coarse-grained control-flow integrity with a comprehensive defense against native code injection in order to defeat these attacks. This approach tracks the code loading process through every step, and ensures that code integrity is preserved at every step. Consequently, it can ensure a strong property that only authorized (native) code can ever be executed by any process protected by this system, SCInt. A key benefit of this approach is that its security relies on a relatively simple state model for loading, and a few simple system call policies. It is fully compatible with existing applications as well as libraries and loaders, and does not require any modifications at all. SCInt introduces almost no additional overheads at runtime over CFI, making it a promising candidate for deployment.

Chapter 6

Related Work

6.1 Static binary disassembly

Many efforts in exploit protection and hardening of binaries have been made using static binary instrumentation. However, supporting large and complex COTS binaries has proved elusive due to the twin challenges of accurate disassembly and safe instrumentation in the presence of embedded code pointers within binaries. To overcome the disassembly problem, most previous binary instrumentation efforts have relied on a cooperative compiler [146], or worked on assembly code [28, 95, 108], or on binary code containing symbol tables [87, 119], debugging information [29, 67], or relocation information [151]. CCFIR [152] uses the runtime rebasing information in Microsoft dynamic-link libraries (DLLs). Due to ASLR, this information is mandatory even in stripped DLLs.

SecondWrite [33] uses static binary instrumentation to harden binaries. It disassembles binaries speculatively, lifts them into the LLVM intermediate representation (IR), and uses the LLVM backend to regenerate binaries. The authors have shown that their platform handles moderately large applications [32]. Their focus is on powerful analysis and optimization techniques that provide excellent performance. However, reliable disassembly of shared libraries and low-level library code such as glibc remains an open problem.

Several earlier works such as BIRD [99] and Dyninst [53] used a combination of static analysis, compiler idioms, and heuristics to disassemble binaries. BIRD further improves results by incorporating a runtime disassembly component. However, the use of heuristics means that there can be disassembly errors, and these can lead to instrumentation subversion, e.g., by jumping to the middle of an instruction. Other research efforts based on IDA-Pro, such as in-place randomization (IPR) [109], share the same issue: IDA-Pro aims to maximize disassembly coverage for malware analysis and other analyses performed by human analysts, and zero disassembly errors is not its goal. Moreover, both BIRD and IPR target binaries on Windows, where disassembly can be more complex.

REINS [141] targets sandboxing of untrusted COTS executables on Windows. REINS uses IDA-Pro as its baseline disassembler and uses IDAPython to correct disassembly errors and harvest high-level information such as function boundaries for static analysis. Similar to PSI, REINS can also ensure that sandboxed code never escapes instrumentation (except to invoke certain trusted functions), but unlike PSI, their approach does not target instrumentation of all libraries used by the application. Moreover, their evaluation does not consider as many or as large applications as PSI's.

In contrast, our platform PSI provides a robust binary disassembly algorithm that can handle large and complex COTS binaries without disassembly errors.

6.2 Static binary analysis

Previous research demonstrates that static binary analysis is essential for securing COTS binaries [76, 141, 152, 153]. Bao et al. [39] demonstrate that higher-level information such as function boundaries can be discovered with high accuracy; they use binary code patterns as heuristics to identify function boundaries. Furthermore, several efforts [66, 81] have been made to recover variable and type information from COTS binary code.

The static analyses in PSI and the works mentioned above are conservative, as required for securing code. Binary analysis can also be performed speculatively for malware analysis and exploit generation. The Binary Analysis Platform (BAP) [52] targets COTS binaries, and its authors have developed interesting applications such as automatic exploit generation [35]. BitBlaze [133] also targets powerful static and dynamic analysis of binary code, including malware. Similarly, CodeSurfer [38] targets malware analysis. In particular, it uses a static analysis algorithm called value-set analysis (VSA) to recover intermediate representations that capture the high-level semantics of target binaries.

6.3 Binary instrumentation

Dynamic binary instrumentation Dynamic binary instrumentation (DBI) systems can be categorized into (a) whole-system DBI and (b) application-level DBI. Whole-system DBI tools handle every executed instruction and emulate those that require privileges, while application-level DBI handles only the user-level instructions of one target application process. Tools such as QEMU [41], Bochs [26], and others [54, 147, 150] fall into the whole-system DBI category. We do not discuss whole-system DBI further, since it is less relevant to our work.

Application-level DBI tools handle only user-level instructions. Valgrind [102] allows programs compiled for one architecture to run on another. It works by translating every instruction into its internal representation, and thus suffers from large runtime overhead. Most application-level DBI tools work on the same architecture as the target program, so instruction translation can be avoided. Tools such as DynamoRIO [51], Pin [90], Strata [122], and StarDBT [49] work at the user level by injecting themselves into the address space of the target program.

Since application-level DBI works in the same address space as the target application, one key requirement is that the tool's impact remain invisible to the target program. DBI tools achieve this by using their own stack and heap and by staying away from memory and system resources used by the target program.

Comparative analysis of SBI and DBI Although application-level DBI tools are relatively efficient, the cost of using a code cache is significant. First, the code cache requires a “cold start” on each program execution, since it is rebuilt at runtime every time. This poses a substantial overhead at program startup, limiting its usage in certain scenarios. Another practical limitation of DBI is the security compromise that comes with a writable and executable code cache: for performance reasons, the code cache is configured to be both writable and executable to cope with frequent updates. This gives attackers an opportunity to corrupt the code cache and bypass instrumentation. Song et al. [132] present a prototype that retrofits the existing DBI system Strata with a non-writable code cache. They use two processes: the code cache is shared across both, but is writable only in the helper process. Modern DBI tools have not yet adopted such techniques to protect the code cache.

Static binary instrumentation avoids a code cache altogether, because instrumentation is performed ahead of time. As a result, program startup performance is also better.

6.4 Control flow integrity

Control-flow integrity (CFI) was introduced by Abadi et al. [28]. The basic idea is to use a static analysis to compute a control-flow graph and enforce it at runtime. Enforcement is based on matching constants (called IDs) between the source and target of each ICF transfer. However, due to the difficulty of performing accurate points-to analysis, and because of the so-called destination-equivalence problem, their implementation resorts to coarse-grained enforcement, wherein any indirect call is permitted to target any function whose address is taken. Li and Wang et al. [89] and HyperSafe [139] implement compiler-based CFI that uses a similar policy for coarse-grained CFI. While they can also support finer-grained CFI, this requires runtime profiling to compute possible targets of indirect calls, and can hence be prone to false positives.

Control-flow locking (CFL) [47] improves significantly on the performance of the original CFI [28], while simultaneously tightening the policy, especially for returns. However, this tighter policy poses challenges in the presence of indirect tail calls. Another difference between their work and ours is that they operate on assembly code generated by the compiler, whereas our work targets binaries.

MoCFI [59] presents a design and implementation of CFI for mobile platforms. The mobile environment presents a unique set of challenges, including an instruction set without explicit returns, a closed platform (iOS), and so on. An important characteristic of their approach is that they aggressively prune the possible targets of each ICF transfer. While this can provide better protection, it leads to false positives in some cases (e.g., when large jump tables are involved). In contrast, we emphasize the handling of large binaries, including shared libraries, which are not handled by their approach. We discussed how this requirement dictates the use of coarser-grained CFI in PSI.

CCFIR [152], like the work presented in this dissertation, targets binaries. The main insight in their work is that most binaries on Windows support ASLR, which requires relocation information to be included in the binary. They leverage this information for accurate disassembly and static instrumentation. Moreover, since relocation information effectively identifies all code pointers, they can avoid runtime address translation, which enables them to achieve better performance. The flip side of this performance improvement is that the technique cannot be used on most UNIX systems, as UNIX binaries rarely contain the requisite relocations.

CFI has been used as the basis for untrusted code sandboxing. PittSFIeld [95] implements SFI on top of instruction bundling, a weaker CFI model. XFI [67] presents techniques based on CFI and SFI to confine untrusted code in shared-memory environments. Zeng et al. [149] improve the performance of SFI using CFI and static analysis. Native Client [146] is aimed at running native binaries securely in a browser context, and relies on instruction bundling. PittSFIeld, Native Client, and many other works [30, 34, 82, 84, 89, 113, 123, 139] that enforce CFI rely on compiler-provided information and even hardware support. In contrast, BinCFI operates on COTS binaries without support from the compiler, OS, or hardware. In addition, it can be applied to all binaries, including executables and their dependent libraries.

Recent research has shown that coarse-grained CFI approaches such as BinCFI and CCFIR permit enough gadgets to construct functional ROP payloads [55, 62, 72, 73]. Fine-grained CFI [60, 105, 106, 136] can mitigate some of these attacks, but it is unlikely to provide a fool-proof defense. Moreover, as discussed earlier, finer-grained CFI enforcement comes with compatibility problems.

Instead of improving granularity, Opaque CFI (O-CFI) [97] adopts randomization to hide the applied policy. In particular, it constrains the target range of indirect control transfer instructions to a destination set expressible as a bounds check. The bounds lookup table is protected using memory segmentation and information hiding. O-CFI has several limitations: (a) deriving a precise set of allowed targets is challenging even if source code is available for static analysis [61], (b) the randomized bounds are still located in the target code address space, so it is susceptible to blind ROP attacks, and (c) O-CFI still leaves code readable, so it is unclear how difficult it would be for an attacker to reverse engineer the policy.

In contrast, our instrumentation application, SECRET, does not suffer from any of these limitations. In particular, it uses code pointer remapping to overcome the first two limitations of O-CFI and code space isolation to overcome the third.

6.5 Code randomization

Code randomization can be applied by randomizing instruction semantics [? ] or machine code layout [45]. The former approaches, including ISR [? ], aim to defeat native code injection attacks; however, they cannot defeat code reuse attacks and suffer from high overhead. Code layout randomization offers an additional layer of protection over ASLR, which has known shortcomings [127]. It can be applied at varying levels of granularity, e.g., at the function [44, 45, 83], basic block [140], or instruction level [64, 75, 109], and deployed at compile time [76] or load time [140]. G-Free [108] is a compiler-level instruction transformation technique against ROP. The applied transformations remove gadgets within unintended code, but the adopted strategy for intended gadgets is weak.

None of the code randomization defenses defeat code reuse attacks that harvest code or code pointers. Thus, attacks such as JIT-ROP [130] bypass them in general.

6.6 Code disclosure attacks and countermeasures

Control-flow integrity and code randomization protections are challenged by advanced attacks that leverage memory disclosure vulnerabilities to dynamically construct ROP payloads [46, 130]. JIT-ROP [130] repeatedly uses a memory disclosure to read executable memory and chains the discovered gadgets to launch a ROP attack. Blind ROP [46] leverages a stack buffer overflow in forking servers to repeatedly overwrite the stack until the write function is located, which is then used to leak executable process memory to the client. Under certain circumstances, even if a memory disclosure bug is not available, gadget locations can be inferred through side channels [124]. The same technique can be used to infer other critical data, as was recently demonstrated in a bypass attack against CPI in certain configurations [68].

As a step towards protecting against memory disclosure vulnerabilities, recent research efforts attempt to make code executable but not readable, relying on page table manipulation [36] or split TLBs [71]. However, neither of these can defeat JIT-ROP attacks that harvest only code pointers. Recent efforts on code randomization have shown that JIT-ROP [130] attacks can be mitigated. Oxymoron [37] applies fine-grained code randomization that is compatible with code sharing. It offers some resistance to memory leakage attacks by replacing direct branches with indirect branches, which makes it harder for attackers to harvest code pages by following control flow.

The authors of Isomeron [61] propose a JIT-ROP attack that bypasses Oxymoron by harvesting code pointers in user memory such as the stack; these code pointers are then dereferenced to read code. In the same work [61], they further propose a JIT-ROP defense that combines execution-path and code randomization, the purpose of which is to reduce the probability of correctly predicting the runtime addresses of gadgets. Isomeron maintains two copies of a program's code in the same address space; at runtime, execution may switch from one version to the other or stay unchanged, with switches decided on call and ret instructions. The decision information is hidden within the large address space to prevent attackers from reading it. A ROP payload assembled during a JIT-ROP attack is thus likely to divert control to the wrong code version and encounter unpredictable gadgets. While Isomeron provides additional strong randomization, its feasibility relies on the pairing of call and ret instructions, and it therefore shares the compatibility issues of well-known techniques such as shadow stacks [63]. In comparison, SECRET does not suffer from this compatibility issue, since it does not assume that return addresses are used only for function returns. In addition, SECRET achieves the same level of strength as Isomeron, since it defeats JIT-ROP attacks that read code both directly and indirectly through pointers, while maintaining compatibility with real-world programs.

Readactor [58] is a very recent effort to defeat JIT-ROP attacks. Readactor uses extended page tables (EPT) to make all code pages unreadable, preventing JIT-ROP attacks that directly read code. In addition, it redirects all code pointers to “proxy” pages that contain trampoline code stubs. By doing so, JIT-ROP attacks that harvest code pointers are defeated, because the leaked code pointers all point to “proxy” pages that reveal no information to attackers. Since no leaked code pointers point into code pages, Readactor can use simple randomization to keep the code layout hidden from attackers.

For defeating JIT-ROP attacks that read code, SECRET and Readactor achieve the same effect: SECRET hides the code and ensures that no pointers point into it, while Readactor leverages EPT and compiler support. For hiding code pointers, Readactor randomizes all code pointers, since it operates on source code, while SECRET can safely change the majority of code pointers, including return addresses and exported functions.1 The strength of SECRET over Readactor is that it defeats Blind ROP attacks; however, it may suffer from call-oriented programming attacks.

6.7 Code injection prevention

MIP [104] and MCFI [105] incorporate some code integrity features beyond ensuring control-flow integrity. Their policies are implemented using checks on mmap and mprotect. However, the policy only aims to ensure that existing code is never writable and executable; moreover, it does not constrain the code loading/patching privileges of the dynamic loader. Attackers can divert control flow to dynamic loader code and leverage its privileges to bypass the policy.

Nanda et al. [100] propose an approach to detect foreign code on Windows. Similar to our approach, they enforce a code loading policy to prevent malicious library loading. In addition, they prevent malicious operations that change code permissions. However, their approach suffers from several limitations: (a) text relocation is not protected and is potentially subject to the attack mentioned in Section 5.2.3,2 and (b) their work is built on top of BIRD [99], a dynamic binary translator that uses JIT code, which itself is susceptible to low-level code corruption attacks.

Realizing the importance of preventing abuse of code loading privileges, Payer et al. [112] developed TRuE, a system that replaces the standard loader with a secure version. The need to replace the system loader poses challenges for real-world deployment, as OS vendors are reluctant to change core platform components. There is a strong interdependence between the loader and glibc, and as a result, a replacement of the loader also requires changes to the glibc package. Another difficulty with their approach is that the secure loader achieves security in part by restricting the functionality of the loader. Our approach avoids these drawbacks by permitting continued use of the standard loader. Security is achieved by a small and independent policy enforcement layer that operates outside the loader, and simply checks the security-relevant operations made by the loader.

1 SECRET does not change all code pointers. This is due to a general limitation of COTS binary analysis: distinguishing code pointers from integers in COTS software is undecidable in general.
2 Note that all Windows binaries on x86 require text relocations to support ASLR.

Writing secure loaders is a very difficult task, as demonstrated by the numerous vulnerabilities reported in production loaders [4–6, 10, 13–15]. Thus, trusting the complete loader code for code integrity leads to a large trusted computing base (TCB). In contrast, the SCInt implementation is a very small reference monitor on top of the existing system: its code size is no more than 300 LoC of C and x86 assembly, whereas a dynamic loader is about 28K LoC.

Some efforts have been made toward code injection defense at the OS level. iOS and OS X use the XNU kernel, which leverages code signing to secure code loading. In addition, the kernel enforces that code pages that have been signed and verified cannot be modified at runtime. This approach keeps unmodified systems safe from code injection. However, the continuing success of jailbreaks shows that code signing can be bypassed using kernel vulnerabilities. In such scenarios, SCInt is a complementary defense against code injection attacks.

The PaX team provides security patches for the Linux kernel, and their grsecurity project provides several kernel security enhancements. One of them is restricted mprotect [135], whose basic idea is similar to SCInt's goal of providing code integrity. Their enhanced kernel checks the parameters of mprotect and denies any request to make code pages writable, except once for ELF images that contain text relocations. The limitation of restricted mprotect is that it does not ensure code integrity when an ELF image has text relocations; by contrast, SCInt does.

In Windows, the binary loading primitive is implemented in the kernel. Therefore, Windows applications do not suffer from attacks at loading time, which makes things easier for SCInt. However, a weakness of Windows binary loading lies in the heavy use of text relocations due to dynamic linking and ASLR. Text relocations can be crafted by attackers to modify existing code, as mentioned in Section 5.2.3. SCInt can prevent those attacks, since none of the original code is executable.

Compared with OS-level code injection defenses, SCInt shines in several respects: (a) it can be securely implemented entirely at the user level, (b) its security relies on a small piece of code (the state model), and (c) it can handle complications such as text relocation.

6.8 Defenses against memory corruption

Several comprehensive bounds-checking techniques have been proposed for C/C++ programs [31, 80, 98]. They operate by maintaining additional metadata about pointers and checking this metadata to detect attacks. Unfortunately, they impose significant overheads and, moreover, lead to compatibility problems with large and complex applications. Code pointer integrity (CPI) [86] significantly reduces both the compatibility and performance problems by applying bounds checking selectively to those pointers whose corruption can lead to control-flow hijacks. However, since the underlying techniques are similar to SoftBound, compatibility problems cannot be totally eliminated.

Source-code based techniques such as CPI [86], Baggy [31], and MCFI [105] can only be applied to C/C++ source code. In particular, low-level code that relies on inline assembly will not be protected; similarly, code compiled from higher-level languages other than C/C++ is left out. Finally, any code that is available only in binary form cannot be protected. Unfortunately, if even a single module is not compiled with bounds checking, no security guarantees can be provided for any of the code; thus, the “weakest link” can potentially derail the entire application. It is in this context that purely binary-based approaches such as SCInt shine: SCInt's protection extends to all code, regardless of the language in which it is written or the compiler used to compile it.

Chapter 7

Conclusions and Future Work

Securing COTS software is a challenging task. In this dissertation, we showed that static binary instrumentation can be used effectively on complex COTS binaries to secure their execution, and we developed our system PSI for this purpose. Using this instrumentation system, we then explored defenses against advanced code reuse attacks and developed a complex application system called SECRET. For those ROP attacks that in principle still exist despite SECRET, we developed a systematic approach called Strong Code Integrity (SCInt) to further mitigate them and ensure that they cannot launch code injection attacks.

Our platform, PSI, uses a practical disassembly algorithm together with a conservative static analysis to recover the control flow graph of a target binary. It then uses instrumentation with address translation to support arbitrary instrumentation. PSI incorporates BinCFI, a practical CFI policy that works for large and complex COTS binaries. The instrumentation is modular and complete, since it can be applied to executables and all their dependent libraries. We performed an extensive evaluation on both the SPEC 2006 benchmarks and several commonly used Linux applications, and compared the performance of our system with that of state-of-the-art DBI systems such as DynamoRIO and Pin. The results show that on the SPEC 2006 benchmarks, PSI achieves performance comparable to the DBI tools, while on commonly used application benchmarks, PSI is 7 and 13 times faster, respectively.

We presented two significant security applications on top of PSI: SECRET and SCInt. SECRET aims to defeat advanced code reuse attacks such as ROP, and is supported by two techniques: code pointer remapping and code space isolation. Code pointer remapping is used to defeat code reuse attacks that leverage prior knowledge of binary contents, while code space isolation defeats code reuse attacks that harvest code pages or code pointers at runtime.

The purpose of SCInt is to ensure executable code integrity even when code reuse attacks are not fully defeated. We proposed a systematic approach against code injection attacks to ensure that (a) only permitted modules can be loaded and executed in memory and (b) only legitimate instructions in these modules can be executed. Unlike code signing, which only ensures code integrity at load time, SCInt ensures code integrity throughout execution. It effectively mitigates advanced code reuse attacks and naturally complements all existing code reuse defenses.

Together, PSI, SECRET, and SCInt present an integrated framework for securing COTS binary execution against advanced stealthy attacks such as return-oriented programming and code injection. They also suggest many interesting research problems. We conclude this dissertation with one open problem in this research direction:

Protecting COTS just-in-time compilers: A just-in-time (JIT) compiler is a special piece of software that generates code dynamically. Secure execution of dynamic code is an important missing piece and remains an open research question. Although several efforts have been made to protect dynamic code, none of them works on COTS JIT compilers, i.e., they all require source code modifications. This poses limitations on protecting software such as Internet Explorer and Adobe Flash Player.

Bibliography

[1] bincfi v1.0. http://www.seclab.cs.sunysb.edu/seclab/bincfi/.

[2] CVE-2000-0854: Earliest side-loading attack.

[3] CVE-2007-3508: Integer overflow in loader. http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2007-3508.

[4] CVE-2010-0830: Integer signedness error in loader. http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-0830.

[5] CVE-2010-3847: privilege escalation in loader with $origin for the ld audit environment variable. http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-3847.

[6] CVE-2010-3856: privilege escalation in loader with the ld audit environment. http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2010-3856.

[7] CVE-2011-0562: Untrusted search path vulnerability in adobe reader.

[8] CVE-2011-0570: Untrusted search path vulnerability in adobe reader. http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-0570.

[9] CVE-2011-0588: Untrusted search path vulnerability in adobe reader. http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-0588.

[10] CVE-2011-1658: privilege escalation in loader with $origin in rpath. http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-1658.

[11] CVE-2011-2398: privilege escalation in loader. http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2011-2398.

[12] CVE-2012-0158: Side loading attack via microsoft office. http://www.cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2012-0158.

[13] CVE-2013-0977: overlapping segments. http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2013-0977.

[14] CVE-2013-3906: arbitrary code execution via a crafted tiff image. http://www.fireeye.com/blog/technical/cyber-exploits/2013/11/the-dual-use-exploit-cve-2013-3906-used-in-both-targeted-attacks-and-crimeware-campaigns.html.

[15] CVE-2014-1273: text relocation. http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2014-1273.

[16] LibJIT. https://code.google.com/p/libjit-linear-scan-register-allocator/.

[17] Lmbench tool for performance analysis. http://lmbench.sourceforge.net/.

[18] ROP attack exploiting CVE-2013-3906 to load a malicious library. https://www.fireeye.com/blog/threat-research/2013/11/the-dual-use-exploit-cve-2013-3906-used-in-both-targeted-attacks-and-crimeware-campaigns.html.

[19] ROP attack exploiting CVE-2014-0497 using VirtualProtect. http://blogs.technet.com/b/mmpc/archive/2014/02/17/a-journey-to-cve-2014-0497-exploit.aspx.

[20] ROP attack exploiting CVE-2014-1761 to execute malicious PE executable. http://blogs.mcafee.com/mcafee-labs/close-look-rtf-zero-day-attack-cve-2014-1761-shows-sophistication-attackers.

[21] ROP attack exploiting CVE-2014-1776 using VirtualAlloc. http://blog.fortinet.com/post/a-technical-analysis-of-cve-2014-1776.

[22] ROP attack exploiting CVE-2014-8439 to download malicious DLL. http://blogs.technet.com/b/mmpc/archive/2014/12/02/an-interesting-case-of-the-cve-2014-8439-exploit.aspx.

[23] ROPgadget. http://shell-storm.org/project/ROPgadget.

[24] Source code instrumentation overview at IBM website. http://www-01.ibm.com/support/knowledgecenter/SSSHUF_8.0.0/com.ibm.rational.testrt.doc/topics/cinstruovw.html.

[25] WinSxS: Side-by-side assembly. http://en.wikipedia.org/wiki/Side-by-side_assembly.

[26] bochs: The open source ia-32 emulation project, 2001. http://bochs.sourceforge.net/.

[27] MWR Labs Pwn2Own 2013 Write-up - Webkit Exploit, 2013. http://labs.mwrinfosecurity.com/blog/2013/04/19/mwr-labs-pwn2own-2013-write-up---webkit-exploit/.

[28] M. Abadi, M. Budiu, U. Erlingsson, and J. Ligatti. Control-flow integrity. In CCS, 2005.

[29] M. Abadi, M. Budiu, U. Erlingsson, and J. Ligatti. Control-flow integrity principles, implementations, and applications. ACM TISSEC, 2009.

[30] P. Akritidis, C. Cadar, C. Raiciu, M. Costa, and M. Castro. Preventing memory error exploits with WIT. In Security and Privacy (Oakland), 2008.

[31] P. Akritidis, M. Costa, M. Castro, and S. Hand. Baggy bounds checking: An efficient and backwards-compatible defense against out-of-bounds errors. In USENIX Security, 2009.

[32] K. Anand, M. Smithson, K. Elwazeer, A. Kotha, J. Gruen, N. Giles, and R. Barua. A compiler-level intermediate representation based binary analysis and rewriting system. In EuroSys, 2013.

[33] K. Anand, M. Smithson, A. Kotha, K. Elwazeer, and R. Barua. Decompilation to compiler high IR in a binary rewriter. Technical report, University of Maryland, 2010. http://www.ece.umd.edu/barua/high-IR-technical-report10.pdf.

[34] J. Ansel, P. Marchenko, U. Erlingsson, E. Taylor, B. Chen, D. L. Schuff, D. Sehr, C. L. Biffle, and B. Yee. Language-independent sandboxing of just-in-time compilation and self-modifying code. In PLDI, 2011.

[35] T. Avgerinos, S. K. Cha, B. L. T. Hao, and D. Brumley. AEG: automatic exploit generation. In NDSS, 2011.

[36] M. Backes, T. Holz, B. Kollenda, P. Koppe, S. Nürnberger, and J. Pewny. You can run but you can’t read: Preventing disclosure exploits in executable code. In CCS, 2014.

[37] M. Backes and S. Nürnberger. Oxymoron: Making fine-grained memory randomization practical by allowing code sharing. In USENIX Security, 2014.

[38] G. Balakrishnan, R. Gruian, T. Reps, and T. Teitelbaum. CodeSurfer/x86—a platform for analyzing x86 executables. In LNCS, 2005.

[39] T. Bao, J. Burket, M. Woo, R. Turner, and D. Brumley. BYTEWEIGHT: Learning to recognize functions in binary code. In USENIX Security, 2014.

[40] E. Barker and J. Kelsey. Recommendation for random number generation using deterministic random bit generators, 2013.

[41] F. Bellard. QEMU, a fast and portable dynamic translator. In USENIX ATC, 2005.

[42] E. Bendersky. LibJIT Samples. https://github.com/eliben/libjit-samples, 2013.

[43] J. Bennett, Y. Lin, and T. Haq. The Number of the Beast, 2013. https://www.fireeye.com/blog/threat-research/2013/02/the-number-of-the-beast.html.

[44] S. Bhatkar, D. C. DuVarney, and R. Sekar. Address obfuscation: an efficient approach to combat a broad range of memory error exploits. In USENIX Security, 2003.

[45] S. Bhatkar, R. Sekar, and D. DuVarney. Efficient techniques for comprehen- sive protection from memory error exploits. In USENIX Security, 2005.

[46] A. Bittau, A. Belay, A. Mashtizadeh, D. Mazières, and D. Boneh. Hacking blind. In Security and Privacy, 2014.

[47] T. Bletsch, X. Jiang, and V. Freeh. Mitigating code-reuse attacks with control-flow locking. In ACSAC, 2011.

[48] T. Bletsch, X. Jiang, V. W. Freeh, and Z. Liang. Jump-oriented programming: a new class of code-reuse attack. In ASIACCS, 2011.

[49] E. Borin, C. Wang, Y. Wu, and G. Araujo. Software-based transparent and comprehensive control-flow error detection. In CGO, 2006.

[50] D. Bruening and Q. Zhao. Building dynamic instrumentation tools with DynamoRIO. Tutorial at CGO, 2011.

[51] D. L. Bruening. Efficient, transparent, and comprehensive runtime code manipulation. PhD thesis, MIT, 2004.

[52] D. Brumley, I. Jager, T. Avgerinos, and E. Schwartz. BAP: a binary analysis platform. In CAV, 2011.

[53] B. Buck and J. Hollingsworth. An API for runtime code patching. International Journal of High Performance Computing Applications, 2000.

[54] P. P. Bungale and C.-K. Luk. PinOS: A programmable framework for whole-system dynamic instrumentation. In VEE, 2007.

[55] N. Carlini and D. Wagner. ROP is still dangerous: Breaking modern defenses. In USENIX Security, 2014.

[56] S. Checkoway, L. Davi, A. Dmitrienko, A.-R. Sadeghi, H. Shacham, and M. Winandy. Return-oriented programming without returns. In CCS, 2010.

[57] P. Chen, H. Xiao, X. Shen, X. Yin, B. Mao, and L. Xie. DROP: Detecting return-oriented programming malicious code. In the 5th International Conference on Information Systems Security (ICISS), 2009.

[58] S. Crane, C. Liebchen, A. Homescu, L. Davi, P. Larsen, A.-R. Sadeghi, S. Brunthaler, and M. Franz. Readactor: Practical code randomization resilient to memory disclosure. In Security and Privacy, 2015.

[59] L. Davi, A. Dmitrienko, M. Egele, T. Fischer, T. Holz, R. Hund, S. Nürnberger, and A.-R. Sadeghi. MoCFI: A framework to mitigate control-flow attacks on smartphones. In NDSS, 2012.

[60] L. Davi, P. Koeberl, and A.-R. Sadeghi. Hardware-assisted fine-grained control-flow integrity: Towards efficient protection of embedded systems against software exploitation. In Proceedings of the 51st Annual Design Automation Conference (DAC), 2014.

[61] L. Davi, C. Liebchen, A.-R. Sadeghi, K. Z. Snow, and F. Monrose. Isomeron: Code randomization resilient to (just-in-time) return-oriented programming. In NDSS, 2015.

[62] L. Davi, A.-R. Sadeghi, D. Lehmann, and F. Monrose. Stitching the gadgets: On the ineffectiveness of coarse-grained control-flow integrity protection. In USENIX Security, 2014.

[63] L. Davi, A.-R. Sadeghi, and M. Winandy. ROPdefender: A detection tool to defend against return-oriented programming attacks. In ASIACCS, 2011.

[64] L. V. Davi, A. Dmitrienko, S. Nürnberger, and A.-R. Sadeghi. Gadge me if you can: Secure and efficient ad-hoc instruction-level randomization for x86 and ARM. In ASIACCS, 2013.

[65] S. Designer. Getting around non-executable stack (and fix). http://seclists.org/bugtraq/1997/Aug/63.

[66] K. ElWazeer, K. Anand, A. Kotha, M. Smithson, and R. Barua. Scalable variable and data type detection in a binary rewriter. In PLDI, 2013.

[67] Ú. Erlingsson, M. Abadi, M. Vrable, M. Budiu, and G. C. Necula. XFI: Software guards for system address spaces. In OSDI, 2006.

[68] I. Evans, S. Fingeret, J. Gonzalez, U. Otgonbaatar, T. Tang, H. Shrobe, S. Sidiroglou-Douskos, M. Rinard, and H. Okhravi. Missing the Point(er): On the Effectiveness of Code Pointer Integrity. In Security and Privacy, 2015.

[69] A. D. Federico, A. Cama, Y. Shoshitaishvili, C. Kruegel, and G. Vigna. How the ELF ruined Christmas. In USENIX Security, 2015.

[70] B. Ford and R. Cox. Vx32: lightweight user-level sandboxing on the x86. In USENIX ATC, 2008.

[71] J. Gionta, W. Enck, and P. Ning. Hidem: Protecting the contents of userspace memory in the face of disclosure vulnerabilities. In Proceedings of the 5th ACM Conference on Data and Application Security and Privacy (CODASPY), pages 325–336, 2015.

[72] E. Göktaş, E. Athanasopoulos, H. Bos, and G. Portokalidis. Out of control: Overcoming control-flow integrity. In Security and Privacy, 2014.

[73] E. Göktaş, E. Athanasopoulos, M. Polychronakis, H. Bos, and G. Portokalidis. Size does matter: Why using gadget-chain length to prevent code-reuse attacks is hard. In USENIX Security, 2014.

[74] E. Göktaş, E. Athanasopoulos, H. Bos, and G. Portokalidis. Out of control: Overcoming control-flow integrity. In Security and Privacy, 2014.

[75] J. Hiser, A. Nguyen-Tuong, M. Co, M. Hall, and J. Davidson. ILR: Where'd my gadgets go? In Security and Privacy, 2012.

[76] J. Hiser, A. Nguyen-Tuong, M. Co, M. Hall, and J. W. Davidson. ILR: Where’d my gadgets go? In Security and Privacy, 2012.

[77] G. Hunt and D. Brubacher. Detours: Binary interception of win32 functions. In Third USENIX Windows NT Symposium, 1999.

[78] Intel. Intel 64 and IA-32 Architectures Software Developer's Manual.

[79] K. Johnson and M. Miller. Exploit mitigation improvement in Windows 8. In Black Hat, 2014.

[80] R. Jones and P. Kelly. Backwards-compatible bounds checking for arrays and pointers in C programs. In Workshop on Automated Debugging, 1997.

[81] J. Lee, T. Avgerinos, and D. Brumley. TIE: Principled reverse engineering of types in binary programs. In NDSS, 2011.

[82] M. Kayaalp, M. Ozsoy, N. Abu-Ghazaleh, and D. Ponomarev. Branch regulation: Low-overhead protection from code reuse attacks. In ISCA, 2012.

[83] C. Kil, J. Jun, C. Bookholt, J. Xu, and P. Ning. Address space layout permu- tation (ASLP): Towards fine-grained randomization of commodity software. In ACSAC, 2006.

[84] V. Kiriansky, D. Bruening, and S. P. Amarasinghe. Secure execution via program shepherding. In USENIX Security, 2002.

[85] S. Krahmer. x86-64 buffer overflow exploits and the borrowed code chunks exploitation technique. http://www.suse.de/~krahmer/no-nx.pdf.

[86] V. Kuznetsov, L. Szekeres, M. Payer, G. Candea, R. Sekar, and D. Song. Code-pointer integrity. In OSDI, 2014.

[87] M. Laurenzano, M. Tikir, L. Carrington, and A. Snavely. PEBIL: efficient static binary instrumentation for Linux. In ISPASS, 2010.

[88] H. Li. Understanding and exploiting Flash ActionScript vulnerabilities. CanSecWest, 2011.

[89] J. Li, Z. Wang, T. Bletsch, D. Srinivasan, M. Grace, and X. Jiang. Comprehensive and efficient protection of kernel control data. IEEE Transactions on Information Forensics and Security, 2011.

[90] C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V. J. Reddi, and K. Hazelwood. Pin: Building customized program analysis tools with dynamic instrumentation. In PLDI, 2005.

[91] H. Marco and I. Ripoll. Linux ASLR integer overflow: Reducing stack entropy by four. http://hmarco.org/bugs/linux-ASLR-integer-overflow.html, 2015.

[92] H. Marco and I. Ripoll. Linux ASLR mmap weakness: Reducing entropy by half. http://hmarco.org/bugs/linux-ASLR-reducing-mmap-by-half.html, 2015.

[93] H. Marco-Gisbert and I. Ripoll. On the effectiveness of full ASLR on 64-bit Linux. In DeepSec, 2014.

[94] A. J. Mashtizadeh, A. Bittau, D. Mazières, and D. Boneh. Cryptographically enforced control flow integrity. CoRR, abs/1408.1451, 2014.

[95] S. McCamant and G. Morrisett. Evaluating SFI for a CISC architecture. In USENIX Security, 2006.

[96] Microsoft. Enhanced mitigation experience toolkit (EMET). http://technet.microsoft.com/en-us/security/jj653751.

[97] V. Mohan, P. Larsen, S. Brunthaler, K. W. Hamlen, and M. Franz. Opaque control-flow integrity. In NDSS, 2015.

[98] S. Nagarakatte, J. Zhao, M. M. Martin, and S. Zdancewic. SoftBound: Highly compatible and complete spatial memory safety for C. In PLDI, 2009.

[99] S. Nanda, W. Li, L.-C. Lam, and T.-c. Chiueh. BIRD: Binary interpretation using runtime disassembly. In CGO, 2006.

[100] S. Nanda, W. Li, L.-C. Lam, and T.-c. Chiueh. Foreign code detection on the windows/x86 platform. In ACSAC, 2006.

[101] Nergal. The advanced return-into-lib(c) exploits: PaX case study. Phrack Magazine, 2001.

[102] N. Nethercote and J. Seward. Valgrind: A framework for heavyweight dynamic binary instrumentation. In PLDI, 2007.

[103] J. Newsome and D. Song. Dynamic taint analysis for automatic detection, analysis, and signature generation of exploits on commodity software. In NDSS, 2005.

[104] B. Niu and G. Tan. Monitor integrity protection with space efficiency and separate compilation. In CCS, 2013.

[105] B. Niu and G. Tan. Modular control-flow integrity. In PLDI, 2014.

[106] B. Niu and G. Tan. RockJIT: Securing just-in-time compilation using modular control-flow integrity. In CCS, 2014.

[107] J. Oakley and S. Bratus. Exploiting the hard-working DWARF: Trojan and exploit techniques with no native executable code. In USENIX WOOT, 2011.

[108] K. Onarlioglu, L. Bilge, A. Lanzi, D. Balzarotti, and E. Kirda. G-Free: Defeating return-oriented programming through gadget-less binaries. In ACSAC, 2010.

[109] V. Pappas, M. Polychronakis, and A. Keromytis. Smashing the gadgets: Hindering return-oriented programming using in-place code randomization. In Security and Privacy, 2012.

[110] V. Pappas, M. Polychronakis, and A. D. Keromytis. Transparent ROP exploit mitigation using indirect branch tracing. In USENIX Security, 2013.

[111] M. Payer and T. R. Gross. Fine-grained user-space security through virtualization. In VEE, 2011.

[112] M. Payer, T. Hartmann, and T. R. Gross. Safe loading: A foundation for secure execution of untrusted programs. In Security and Privacy, 2012.

[113] A. Prakash, H. Yin, and Z. Liang. Enforcing system-wide control flow integrity for exploit detection and diagnosis. In ASIACCS, 2013.

[114] A. Prakash, X. Hu, and H. Yin. vfGuard: Strict protection for virtual function calls in COTS C++ binaries. In NDSS, 2015.

[115] M. Prasad and T.-c. Chiueh. A binary rewriting defense against stack based overflow attacks. In USENIX ATC, 2003.

[116] F. Qin, C. Wang, Z. Li, H.-s. Kim, Y. Zhou, and Y. Wu. LIFT: A low-overhead practical information flow tracking system for detecting security attacks. In MICRO, 2006.

[117] Rapid7. The Metasploit framework. http://www.metasploit.com/.

[118] S. Bratus, J. Oakley, A. Ramaswamy, S. Smith, and M. Locasto. Katana: Towards patching as a runtime part of the compiler-linker-loader toolchain. International Journal of Secure Software Engineering, 2010.

[119] P. Saxena, R. Sekar, and V. Puranik. Efficient fine-grained binary instrumentation with applications to taint-tracking. In CGO, 2008.

[120] F. Schuster, T. Tendyck, C. Liebchen, L. Davi, A.-R. Sadeghi, and T. Holz. Counterfeit object-oriented programming: On the difficulty of preventing code reuse attacks in C++ applications. In Security and Privacy, 2015.

[121] B. Schwarz, S. Debray, G. Andrews, and M. Legendre. PLTO: A link-time optimizer for the Intel IA-32 architecture. In Workshop on Binary Translation (WBT), 2001.

[122] K. Scott, N. Kumar, S. Velusamy, B. Childers, J. Davidson, and M. Soffa. Retargetable and reconfigurable software dynamic translation. In CGO, 2003.

[123] D. Sehr, R. Muth, C. Biffle, V. Khimenko, E. Pasko, K. Schimpf, B. Yee, and B. Chen. Adapting software fault isolation to contemporary CPU architectures. In USENIX Security, 2010.

[124] J. Seibert, H. Okhravi, and E. Söderström. Information leaks without memory disclosures: Remote side channel attacks on diversified code. In CCS, 2014.

[125] F. J. Serna. CVE-2012-0769, the case of the perfect info leak, Feb. 2012. http://zhodiac.hispahack.com/my-stuff/security/Flash_ASLR_bypass.pdf.

[126] H. Shacham. The geometry of innocent flesh on the bone: Return-into-libc without function calls (on the x86). In CCS, 2007.

[127] H. Shacham, M. Page, B. Pfaff, E.-J. Goh, N. Modadugu, and D. Boneh. On the effectiveness of address-space randomization. In CCS, 2004.

[128] R. Shapiro, S. Bratus, and S. W. Smith. "Weird machines" in ELF: A spotlight on the underappreciated metadata. In USENIX WOOT, 2013.

[129] U. I. P. L. SIG. DWARF debugging information format. http://www.dwarfstd.org/doc/dwarf-2.0.0.pdf, 1993.

[130] K. Z. Snow, F. Monrose, L. Davi, A. Dmitrienko, C. Liebchen, and A.-R. Sadeghi. Just-in-time code reuse: On the effectiveness of fine-grained address space layout randomization. In Security and Privacy, 2013.

[131] C. Song, C. Zhang, T. Wang, W. Lee, and D. Melski. Exploiting and protecting dynamic code generation. In NDSS, 2015.

[132] C. Song, C. Zhang, T. Wang, W. Lee, and D. Melski. Exploiting and protecting dynamic code generation. In NDSS, 2015.

[133] D. Song, D. Brumley, H. Yin, J. Caballero, I. Jager, M. G. Kang, Z. Liang, J. Newsome, P. Poosankam, and P. Saxena. BitBlaze: A new approach to computer security via binary analysis. In ICISS, 2008.

[134] A. Stewart. DLL side-loading: A thorn in the side of the anti-virus industry. http://www.fireeye.com/resources/pdfs/fireeye-dll-sideloading.pdf, 2014.

[135] PaX Team. Restricted mprotect, 2000. https://en.wikipedia.org/wiki/PaX#Restricted_mprotect.28.29.

[136] C. Tice, T. Roeder, P. Collingbourne, S. Checkoway, Ú. Erlingsson, L. Lozano, and G. Pike. Enforcing forward-edge control-flow integrity in GCC & LLVM. In USENIX Security, 2014.

[137] TIS. Executable and linking format (ELF) specification. http://www. .org/docs/elf.pdf, 1995.

[138] R. Wahbe, S. Lucco, T. E. Anderson, and S. L. Graham. Efficient software-based fault isolation. In SOSP, 1993.

[139] Z. Wang and X. Jiang. HyperSafe: A lightweight approach to provide lifetime hypervisor control-flow integrity. In Security and Privacy, 2010.

[140] R. Wartell, V. Mohan, K. Hamlen, and Z. Lin. Binary stirring: Self-randomizing instruction addresses of legacy x86 binary code. In CCS, 2012.

[141] R. Wartell, V. Mohan, K. W. Hamlen, and Z. Lin. Securing untrusted code via compiler-agnostic binary rewriting. In ACSAC, 2012.

[142] Wikipedia. Open addressing hashing. http://en.wikipedia.org/wiki/Open_addressing, 2012.

[143] J. Wilander, N. Nikiforakis, Y. Younan, M. Kamkar, and W. Joosen. RIPE: Runtime intrusion prevention evaluator. In ACSAC, 2011.

[144] xorl eax, eax. Linux kernel ASLR implementation, 2011. https://xorl.wordpress.com/2011/01/16/linux-kernel-aslr-implementation/.

[145] L. Xun. LEEL: Linux executable editing library. Master's thesis, National University of Singapore, 1999.

[146] B. Yee, D. Sehr, G. Dardyk, J. B. Chen, R. Muth, T. Ormandy, S. Okasaka, N. Narula, and N. Fullagar. Native client: A sandbox for portable, untrusted x86 native code. In Security and Privacy, 2009.

[147] H. Yin and D. Song. TEMU: Binary code analysis via whole-system layered annotative execution. Technical Report UCB/EECS-2010-3, EECS Department, 2010.

[148] B. Zeng, G. Tan, and G. Morrisett. Combining control-flow integrity and static analysis for efficient and validated data sandboxing. In CCS, 2011.

[149] B. Zeng, G. Tan, and G. Morrisett. Combining control-flow integrity and static analysis for efficient and validated data sandboxing. In the 18th ACM conference on Computer and communications security (CCS), 2011.

[150] J. Zeng, Y. Fu, and Z. Lin. PEMU: A Pin highly compatible out-of-VM dynamic binary instrumentation framework. In VEE, 2015.

[151] C. Zhang, T. Wei, Z. Chen, L. Duan, S. McCamant, and L. Szekeres. Pro- tecting function pointers in binary. In ASIACCS, 2013.

[152] C. Zhang, T. Wei, Z. Chen, L. Duan, L. Szekeres, S. McCamant, D. Song, and W. Zou. Practical control flow integrity & randomization for binary executables. In Security and Privacy, 2013.

[153] M. Zhang and R. Sekar. Control flow integrity for COTS binaries. In USENIX Security, 2013.
