Translingual Obfuscation

Pei Wang, Shuai Wang, Jiang Ming, Yufei Jiang, and Dinghao Wu
College of Information Sciences and Technology
The Pennsylvania State University
fpxw172, szw175, jum310, yzj107, [email protected]

arXiv:1601.00763v4 [cs.CR] 12 Jan 2016

This is an extended version of a paper to appear in Proceedings of the 1st IEEE European Symposium on Security and Privacy (Euro S&P 2016) [73].

Abstract

Program obfuscation is an important software protection technique that prevents attackers from revealing the programming logic and design of the software. We introduce translingual obfuscation, a new software obfuscation scheme which makes programs obscure by “misusing” the unique features of certain programming languages. Translingual obfuscation translates part of a program from its original language to another language which has a different programming paradigm and execution model, thus increasing program complexity and impeding reverse engineering. In this paper, we investigate the feasibility and effectiveness of translingual obfuscation with Prolog, a logic programming language. We implement translingual obfuscation in a tool called BABEL, which can selectively translate C functions into Prolog predicates. By leveraging two important features of the Prolog language, i.e., unification and backtracking, BABEL obfuscates both the data layout and control flow of C programs, making them much more difficult to reverse engineer. Our experiments show that BABEL provides effective and stealthy software obfuscation, while the cost is only modest compared to one of the most popular commercial obfuscators on the market. With BABEL, we verified the feasibility of translingual obfuscation, which we consider to be a promising new direction for software obfuscation.

1. Introduction

Obfuscation is an important technique for software protection, especially for preventing reverse engineering from infringing software intellectual property. Generally speaking, obfuscation is a semantics-preserving program transformation that aims to make a program more difficult to understand and reverse engineer. The idea of using obfuscating transformations to prevent reverse engineering can be traced back to Collberg et al. [20], [21], [56]. Since then many obfuscation methods have been proposed [46], [55], [58], [65], [19], [79]. Malware authors also rely heavily on obfuscation to compress or encrypt executable binaries so that their products can avoid malicious content detection [69], [67].

Currently, the state-of-the-art obfuscation technique is to incorporate process-level virtualization. For example, obfuscators such as VMProtect [10] and Code Virtualizer [4] replace the original binary code with new bytecode, and a custom interpreter is attached to interpret and execute the bytecode. The result is that the original binary code no longer exists, leaving only the bytecode and the interpreter, which makes the program difficult to reverse engineer directly [39]. However, recent work has shown that the decode-and-dispatch execution pattern of virtualization-based obfuscation can be a severe vulnerability leading to effective deobfuscation [24], [66], implying that we need obfuscation techniques based on new schemes.

We propose a novel and practical obfuscation method called translingual obfuscation, which possesses strong security strength and good stealth, with only modest cost. The key idea is that instead of inventing brand new obfuscation techniques, we can exploit some existing programming languages for their unique design and implementation features to achieve obfuscation effects. In general, programming language features are rarely proposed or developed for obfuscation purposes; however, some of them indeed make reverse engineering much more challenging at the binary level and thus can be “misused” for software protection. In particular, some programming languages are designed with unique paradigms and have very complicated execution models. To make use of these language features, we can translate a program written in a certain language to another language which is more “confusing”, in the sense that it has features that lead to obfuscation effects.

In this paper, we obfuscate C programs by translating them into Prolog, presenting a feasible example of the translingual obfuscation scheme. C is a traditional imperative programming language while Prolog is a typical logic programming language. The Prolog language has some prominent features that provide strong obfuscation effects. Programs written in Prolog are executed in a search-and-backtrack computation model which is dramatically different from the execution model of C and much more complicated. Therefore, translating C code to Prolog leads to obfuscated data layouts and control flows. In particular, the complexity of Prolog’s execution model manifests mostly in the binary form of the programs, making Prolog very suitable for software protection.

Translating one language to another is usually very difficult, especially when the target and source languages have different programming paradigms. However, we made an important observation that for obfuscation purposes, language translation could be conducted in a special manner. Instead of developing a “clean” translation from C to Prolog, we propose an “obfuscating” translation scheme which retains part of the C memory model, in some sense mixing the two execution models together. We believe this improves the obfuscating effect in a way that no obfuscation methods have achieved before, to the best of our knowledge. Consequently, in translingual obfuscation, the obfuscation comes not only from the obfuscating features of the target language, but also from the translation itself. With this new translation scheme we manage to kill two birds with one stone, i.e., solving the technical problems in implementing translingual obfuscation and strengthening the obfuscation simultaneously.

There may be a concern that obfuscation techniques without solid theoretical foundations will not withstand reverse engineering attacks in the long run. However, research on fundamental obfuscation theories, despite promising progress made recently [47], [35], [63], [15], [13], is still not mature enough to spawn practical protection techniques. There is a widely accepted consensus that no software protection scheme is resilient to skilled attackers if they inspect the software with intensive effort [22]. A recently proved theorem [14] partially supporting this claim states that a “universally effective” obfuscator does not exist, i.e., for any obfuscation algorithm, there always exists a program that it cannot effectively obfuscate. Given the situation, it seems that developing an obfuscation scheme resilient to all reverse engineering threats (known or unknown) is too ambitious at this point. Hence, making reverse engineering more difficult (but not impossible) could be a more realistic goal to pursue.

We have implemented translingual obfuscation in a tool called BABEL. BABEL can selectively transform a C function into semantically equivalent Prolog code and compile code of both languages together into the executable form. Our experiment results show that translingual obfuscation is obscure and stealthy. The execution overhead of BABEL is modest compared to a commercial obfuscator. We also show that translingual obfuscation is resilient to one of the [...]

[...] scale of subroutines, i.e., from C functions to Prolog predicates, to obfuscate the original programs. Language translation is always a challenging problem, especially when the target language has a heterogeneous execution model.

• We evaluate BABEL with respect to all four evaluation criteria proposed by Collberg et al. [21]: potency, resilience, cost, and stealth, on a set of real-world C programs with quite a bit of complexity and diversity. Our experiments demonstrate that BABEL provides strong protection against reverse engineering with only modest cost.

The remainder of this paper is organized as follows. §2 defines our threat model. §3 provides a high-level view of the insights and features of our translingual obfuscation technique. §4 explains in detail why the Prolog programming language can be misused for obfuscation. We summarize the technical challenges in implementing translingual obfuscation in §5. §6 and §7 present our C-to-Prolog translation method and the implementation details of BABEL, respectively. We evaluate BABEL’s performance in §8. §9 discusses some important topics about translingual obfuscation, followed by a summary of related work in §10. §11 concludes the paper.

2. Threat Model

For attackers who try to reverse engineer a program protected by obfuscation, we assume that they have full access to the binary form of the program. They can examine the static form of the binaries with whatever method is available to them. They can also execute the victim binaries in a monitored environment with arbitrary input, and can thus read any data that has lived in memory.

Note that although we assume attackers have unlimited access to program binaries, they should not possess any knowledge about the source code in our threat model. Assuming attackers can only examine the obfuscated program at the binary level is important, because it means that any implementation detail of the language used in translingual obfuscation contributes to the effectiveness of obfuscation. As for the particular case of employing Prolog in translingual
