Reassembleable Disassembling
Shuai Wang, Pei Wang, and Dinghao Wu
College of Information Sciences and Technology
The Pennsylvania State University
fszw175, pxw172, [email protected]

Abstract

Reverse engineering has many important applications in computer security, one of which is retrofitting software for safety and security hardening when source code is not available. By surveying available commercial and academic reverse engineering tools, we surprisingly found that no existing tool is able to disassemble executable binaries into assembly code that can be correctly assembled back in a fully automated manner, even for simple programs. In many cases, the resulting disassembled code is far from a state that an assembler accepts, and it is hard to fix even by manual effort. This has become a severe obstacle. People have tried to overcome it by patching or duplicating new code sections for retrofitting of executables, which is not only inefficient but also cumbersome and restrictive on what retrofitting techniques can be applied.

In this paper, we present UROBOROS, a tool that can disassemble executables to the extent that the generated code can be assembled back to working binaries without manual effort. By empirically studying 244 binaries, we summarize a set of rules that can make the disassembled code relocatable, which is the key to reassembleable disassembling. With UROBOROS, the disassembly-reassembly process can be repeated thousands of times. We have implemented a prototype of UROBOROS and tested it on the whole set of GNU Coreutils, SPEC2006, and a set of other real-world application and server programs. The experiment results show that our tool is effective with a very modest cost.

1 Introduction

In computer security, many techniques and applications depend on binary reverse engineering, i.e., analyzing and retrofitting software binaries with the source code unavailable. For example, software fault isolation (SFI) [33, 46, 2, 19, 18] rewrites untrusted programs at the instruction level to enforce certain security policies. To ensure program control-flow integrity (CFI, meaning that program execution is constrained to a predetermined control-flow graph) [1, 4, 43, 17, 29, 37] without source code, the original control-flow graph must be recovered from a binary executable and the binary must be retrofitted with the CFI enforcement facility embedded [50, 49]. Symbolic taint analysis [34] on binaries must recover assembly code and data faithfully. The defending techniques against return-oriented programming (ROP) attacks also rely on binary analysis and reconstruction to identify and eliminate ROP gadgets [44, 9, 47, 22, 39].

Despite the fact that many security hardening techniques are highly dependent on reverse engineering, flexible and easy-to-use binary manipulation itself remains an unsolved problem. Current binary decompilation, analysis, and reconstruction techniques still cannot fully fulfill many of the requirements from downstream. To the best of our knowledge, there is no reverse engineering tool that can disassemble an executable into assembly code which can be reassembled back in a fully automated manner, especially when the processed objects are commercial-off-the-shelf (COTS) binaries with most symbol and relocation information stripped.

We have investigated many existing tools from both the industry and academia, including IDA Pro [24], Phoenix [42], Dagger [12], MC-Semantics [32], SecondWrite [3], BitBlaze [45], and BAP [8]. Unfortunately, these tools focus more on recovering as much information as possible, such as data and control structures, mainly for analysis purposes, and less on producing assembly code that can be readily assembled back without manual effort. Hence, none of them provide the desired disassembly and reassembly functionality that we consider, even if the processed binary is small and simple.

Due to lack of support from reverse engineering tools, people build high-level security hardening applications based on partial binary retrofitting techniques, including binary rewriting tools such as Alto [35], Vulcan [16], Diablo [13], and binary reuse tools such as BCR [11] and TOP [48]. We consider binary rewriting a partial retrofitting technique because it can only instrument or patch binaries, and is thus not suitable for program-wide transformations and reconstructions. As for binary reuse tools, they work by dynamically recording execution traces and combining the traces back into an executable, meaning the new binary is only an incomplete part of the original binary due to the incomplete coverage of dynamic program analysis.

Partial retrofitting has notable drawbacks and limitations:

• Patch-based rewriting could introduce non-negligible runtime overhead. Since the patch usually lives in an area different from the original code of the binary, interactions between the patch and the original code usually require a large number of control-flow transfers.

• Patch-based rewriting usually relocates instructions at the patch point to somewhere else to make space for the inserted code. As a result, it requires the affected instructions to be relocatable by default.

• Instrumentation-based rewriting expands binary sizes significantly, sometimes generating nearly double-sized products.

• Binary reuse often requires a binary component to be small enough for dynamic analysis to cover; otherwise the correctness cannot be guaranteed.

Having investigated previous research on binary manipulation and reconstruction, we believe that it could be a remarkable improvement if we are able to automatically recover the assembly from binaries and make the assembly code ready for reassembly. When a binary can be reconstructed from assembly code, many high-level and program-wide transformations become feasible, leading to new opportunities for research based on binary retrofitting such as CFI, diversification, and ROP defense.

Our goal is quite different from previous reverse engineering research. Instead of trying to recover high-level data and control structures from program binaries, which helps binary code analysis, we aim at a more basic objective, i.e., producing assembly code that can be readily reassembled back without manual effort, which we call the reassembility of disassembling. Although the research community has made notable progress on binary reverse engineering, reassembility is still somewhat of a blank due to lack of attention. In this sense, our contribution is complementary to existing work.

With that said, we believe that the technical challenge is also a cause for the deficiency in binary reassembly support from existing tools. We have confirmed that the key to reassembility is making the assembly code relocatable. Relocation is a linker concept, which basically ensures that program elements defined in different source files can correctly refer to each other after being linked together. Being relocatable is also a premise for supporting program-wide assembly transformations. In COTS binaries, however, the information necessary for making disassembly results relocatable is mostly unavailable. There has been research trying to address the relocation issue [11, 48, 26], but existing work mostly relies on dynamic analysis, which is unlikely to cover the whole program.

In this paper, we present UROBOROS, a disassembler that does reassembleable disassembling. In UROBOROS, we develop a set of methods to precisely recover each part of a binary executable. In particular, we are the first to be capable of recovering not only code, but also data and meta-information from COTS binaries without manual effort. We have implemented a prototype of UROBOROS and tested it on 244 binaries, including the whole set of GNU Coreutils and the C programs in SPEC2006 (including both 32-bit and 64-bit versions). In our experiments, most programs reassembled from UROBOROS's output can pass functionality tests with negligible execution overhead, even after repeated disassembly and reassembly. Our preliminary study shows that UROBOROS can provide support for program-wide transformations on COTS binaries.

In summary, we make the following contributions:

• We initiate a new focus on reverse engineering. Complementary to historical work which mostly focuses on recovering high-level semantic information from binary executables or providing support for binary analysis, our work seeks to deliver reassembility, meaning we disassemble binaries in a way that the disassembly results can be directly assembled back into working executables, without manual edits.

• We identify that the key challenge is to make the disassembled program relocatable, and propose our key technique to recover references among immediate values in the disassembled code, namely “symbolization”.

• With reassembility, our research enables direct binary-based transformation without resorting to the previously used patching method, and can potentially become the foundation of binary-based software retrofitting.

• We implement a prototype of UROBOROS and evaluate its strength on binary reassembly. We applied our technique to 244 binaries, including the whole set of GNU Coreutils and SPEC2006 C binaries. The experiment results show that our tool does correct disassembly and introduces only modest cost.

We tried to use it to decompile a simple binary (compiled from a C program with only an empty main function).
The decompiler reported several errors and generated an LLVM IR file which cannot be compiled back