<<

arXiv:1710.01139v1 [cs.CR] 3 Oct 2017 h euiyo rga bucto sntwell-establishe that not seems is obfuscation It program information. of residual origina security the the the just predict leveraging attack to lexical techniques The strings original learning apps. the machine Android of employs obfuscated portion from significant information a recovers which hmwtotne osleayhr rbes aethe Take problems. Bichsel hard by any attack solve recent to most need att without directly and them mechanisms obfuscation particular assume rtra oevr hr r ayntbeatcso curre on attacks notable ( mechanisms many suc obfuscation are meet semantics there cannot the Moreover, generally criteria. recover approaches to obfuscation adversaries Existing for hardness demonstra and adequate protected be can semantics program essential a provide not do guarantee. th techniques security is obfuscation ignore cannot mainstream we implement truth current and a code However, literature tools. of obfuscation the dozens by source in are proposed There indispensable ideas an protection. obfuscation creative becomes software awarded now for It which technique styles”. “smelly 1984, with Obfuscated concept in codes International The the Contest understand. semantic-equivalent at to Code are introduced harder but originally which one was compu original versions the transforms new with It to [1]. programs protection property intellectual ekt eeo rcia rga bucto solutions. obfuscation performance program is the practical approaches hand, develop obfuscation to other rat weak model-oriented the obscurity for On increased requirement guarantee. obfuscat the security code-oriented evaluate than existing generally hand, security approaches one both concerning On no metrics do performance. we evaluation is reason adequate main n The have have usable. still and we secure currently made that approaches is fro finding approaches main obfuscation Our for usable literature. and comput secure overhead explore obfuscating to aim on much focuses ( too it models Besides, incurs usage. mechanism it practical that the shown obfuscation have property, Nevertheless, security they program compelling and a secure mechanism fulfill can encoding in Garg graded breakthrough 2013, a in with a Recently achieved guarantee. heur have security are little tools have obfuscation and mainstream in implemented tion) o otaeitleta rprypoeto.Hwvr g However, ( protection. methods property obfuscation intellectual software for Abstract nofsainapoc sscr fi urnesta the that guarantees it if secure is approach obfuscation An rga obfuscation Program nScr n sbePormOfsain A Obfuscation: Program Usable and Secure On e.g., — rga obfuscation Program icis ahrta elcds nti ae,we paper, this In codes. real than rather circuits) .I I. e.g., NTRODUCTION samjrtcnqefrsoftware for technique major a is e.g., eia bucto,cnrlobfusca- control obfuscation, lexical † 2–7) uhatcsgenerally attacks Such [2]–[7]). et fCmue cec,TeCieeUiest fHn Ko Hong of University Chinese The Science, Computer of Dept. sawdl mlydapproach employed widely a is tal. et u Xu Hui ‡ i.e., colo optrSine ua University Fudan Science, Computer of School 7 sa example, an as [7] indistinguishability. † aga Zhou Yangfan , Survey such o eneral the m ation tal. et istic and ack her ion too tes ter ed nt ‡ at C h d uKang Yu , t l . 1 vlainmtisaentaeut o euiypurposes. security current for that adequate is not reason are primary metrics The evaluation security. demonstrated have approaches b well-studied such no can that approaches st confirm We obfuscation we secure. Firstly, code-oriented literature. usable the whether from approaches obfuscation and secu gaps explore approaches. the to obfuscation essential out usable are and to figure which to applied between, in need be connections We directly. cannot codes mechanisms clos Machines, practical are encoding Turing codes and graded or circuits obfuscating related, Although circuits codes. on real as focus than rather they dashed such Besides, models, a to usable. still computation with be are research approaches to encoding inefficient obfuscation graded such such However, line. of approache explosion obfuscation ( an security deliver follow-up provable many to with inspiring aim been which has investigations It maps. multilinear with secur compelling a achieve could property: it that showed and circuits Garg perspective. bucto prahs fnt hti h ibemeans viable the is what one not, one? towards So if . approaches? as obfuscation “ such is primitives, question important security other as paper obfuscation surveyed our of years. distribution across The 1: Fig. rga bucto loih ( algorithm obfuscation program hssre ist xlr eueadual program usable and secure explore to aims survey This theoretical the from came breakthrough a 2013, in Recently ͳͲ ͳʹ ͳͶ ͳ͸ Ͳ ʹ Ͷ ͸ ͺ ’ƒ’‡”͓ ‡ ihe .Lyu R. Michael ,

indistinguishability ͳͻͻ͵ ‘†‡Ǧ”‹‡–‡†„ˆ—• ƒ–‹‘

” ͳͻͻͶ ͳͻͻͷ ͳͻͻ͸  ” tal. et

ͳͻͻ͹ ‡  ow aescr n sbeprogram usable and secure have we do

ͳͻͻͺ † • e.g., † ͳͻͻͻ ‘ ˆ 8 rpsdtefis candidate first the proposed [8] ʹͲͲͲ  h dai oecd circuits encode to is idea The . „

9 0) iue1demonstrates 1 Figure 10]). [9, ʹͲͲͳ ˆ — ʹͲͲʹ ng • i.e.,

ʹͲͲ͵ ƒ –

ʹͲͲͶ ‹ ‘ rddecdn)frall for encoding) graded ‘†‡ŽǦ”‹‡–‡†„ˆ—• ƒ–‹‘ ʹͲͲͷ  

ʹͲͲ͸ ƒ ’

ʹͲͲ͹ ‡ ”

ʹͲͲͺ • ʹͲͲͻ ʹͲͳͲ ʹͲͳͳ ʹͲͳʹ ʹͲͳ͵ ›‡ƒ” udy ely

ity ʹͲͳͶ re o e s ʹͲͳͷ ʹͲͳ͸ Ͷ Existing investigations generally adopte the metrics proposed by Collberg et al. [11], which are potency, resilience, stealthy ”‡˜‡–‹˜‡ ƒ›‘—– and cost. Note that such evaluation metrics emphasize on the „ˆ—• ƒ–‹‘ „ˆ—• ƒ–‹‘ increased obscurity (i.e., potency) rather than the semantics ‘†‡Ǧ”‹‡–‡† ”ƒ•ˆ‘”ƒ–‹‘ ‘–”‘Ž that remained clear in an obfuscated program. Therefore, the „ˆ—• ƒ–‹‘ „ˆ—• ƒ–‹‘ „ˆ—• ƒ–‹‘ metrics guarantee little security. ‘Ž›‘”’Š‹ ƒ–ƒ „ˆ—• ƒ–‹‘ „ˆ—• ƒ–‹‘ „ˆ—• ƒ–‹‘ Secondly, we study whether we can develop usable pro- ’’”‘ƒ Š gram obfuscation approaches from existing model-oriented Ǧ„ƒ•‡† ‘†‡ŽǦ”‹‡–‡† ”ƒ†‡† ”ƒ†‡† ‘†‹‰ obfuscation investigations. The result is negative. Current „ˆ—• ƒ–‹‘  ‘†‹‰ ‹” —‹–Ǧ„ƒ•‡† graded encoding mechanisms are too inefficient to be usable. Fig. 2: A taxonomy of program obfuscation approaches. ”ƒ†‡† ‘†‹‰ They only satisfy the performance requirement defined by Barak et al. [12], i.e., an obfuscated program should incur only polynomial overhead. The requirement might be too Horvath et al. [18] studied the history of cryptography weak because a qualified program can still grow very large. obfuscation, with a focus on graded encoding mechanisms. Moreover, existing model-oriented obfuscation approaches Barak [19] reviewed the importance of indistinguishability are only applicable to real programs which contain only obfuscation. simple mathematical operations; they do not apply to ordinary To our best knowledge, none of the existing surveys includes codes with complex syntactic structures. For example, model- a thorough comparative study of code-oriented obfuscation oriented obfuscation approaches do not consider some code and model-oriented obfuscation. Indeed, the two categories components, e.g., the lexical information and API calls. Such are closely related, because they frequently cite each other. components serve as essential clues for adversaries to analyze For example, the impossibility result for model-oriented a program and should be obfuscated. obfuscation in [12] has been widely cited by code-oriented ʹ To summarize, this paper serves as a first attempt to investigations (e.g., [20]). Our survey, therefore, severs as explore secure and usable obfuscation approaches with a a pilot study on synthesizing code-oriented obfuscation and comparative study of code-oriented obfuscation and model- model-oriented obfuscation. oriented obfuscation. Our result is that we have no secure Note that there are another two papers [21, 22] that and usable obfuscation approaches in current practice. To have noticed the gaps between code-oriented obfuscation and develop such approaches, we suggest the community to design model-oriented obfuscation, and they work towards secure and appropriate evaluation metrics at first. usable obfuscation. Preda and Giacobazzi et al. [21] proposed The rest of this paper is organized as follows: we first to model the security properties of obfuscation with abstract discuss the related work in Section II; then we introduce our interpretation, which can be further employed to deliver study approach in Section III and major results in Section IV; obfuscation solutions (e.g., [23, 24]). Kuzurin et al. [22] we survey the literature about code-oriented obfuscation in noticed that current security properties for model obfuscation Section V and model-oriented obfuscation in Section VI; are too strong, and they proposed several alternative properties then we discuss the possible paths towards secure and usable for practical program obfuscation scenarios. Note that such program obfuscation in Section VII; finally, we conclude this papers coincide with us on the importance of our studied study in Section VIII. problem. We will discuss more details in Section VII. II. RELATED WORK III. STUDY APPROACH As obfuscation has been studied for almost two decades, several surveys are available. However, they mainly focus on A. Survey Scope either code-oriented obfuscation or model-oriented obfusca- This work discusses program obfuscation, including both tion. The surveys of code-oriented obfuscation include [13]– code-oriented obfuscation and model-oriented obfuscation. A [17]. Balakrishnan and Schulze [13] surveyed several major program obfuscator is a compiler that to transforms a program obfuscation approaches for both benign codes and malicious P into another version O(P), which is functionally identical codes. Majumdar et al. [14] conducted a short survey that to P but much harder to understand. Note that the concept is summarizes the control-flow obfuscation techniques using consistent with the definitions of code-oriented obfuscation opaque predicates and dynamic dispatcher. Drape et al. [15] by Collberg et al. [25] and model-oriented obfuscation by surveyed several obfuscation techniques via layout tran- Barak et al. [12]. By this concept, we rule out some formation, control-flow tranformation, data tranformation, manual obfuscation approaches that can not be generalized language dependent transformations, etc. Roundy et al. [16] or automated in compilers (e.g., [26]). systematically studied obfuscation techniques for binaries, Moreover, we restrict our study to general-purpose ob- which have been frequently used by packers; Schrit- fuscation, which means the obfuscation has no preference twieser et al. [17] surveyed the resilience of obfusca- on program functionalities. However, if some obfuscation tion mechanisms to reverse engineering techniques. The approaches are not designed general programs but are valuable surveys of model-oriented obfuscation include [18, 19]. to obfuscate general programs (e.g., white-box [27,

2 28], malware camouflages [29]), we would also discuss them. cryptographers because the candidate obfuscation approaches Indeed, general-purpose obfuscation is a common obfuscation are based on cryptographic primitives. scenario and covers most obfuscation investigations. The problems of code-oriented obfuscation and model- Finally, we do not emphasize the differences among oriented obfuscation are very differet. Firstly, real codes programming languages, such as C, or Java. Such differences are more complex than general computation models. For are not critical issues towards secure and usable obfuscation. If example, it may include components that are not considered in one obfuscation approach is studied several times for different circuits, such as lexical information and function calls. Such programming languages, we think such investigations are components may serve as essential information for adversaries similar and only discuss a representative one. to interprete the software and should be obfuscated. Besides, it may contain challenging issues for obfuscators to handle, B. A Taxonomy of Obfuscation Approaches such as concurrent operations and pointers. Secondly, the 1) Taxonomy Hierarchy: For all the obfuscation approaches two categories demonstrate different attacker models. Code within the scope, we draw a taxonomy hierarchy as shown in obfuscation assumes the adversarial purpose is to understand a Figure 2. In the first level, we divide program obfuscation into released software program, and the attacking methods can be code-oriented obfuscation and model-oriented obfuscation. either automated deobfuscation tools or manual inspections. Each category can be identified with a groundbreaking paper. Such obfuscation approaches are generally evaluated with The groundbreaking paper of code-oriented obfuscation was the increased obscurity or program complexity, resilience published in 1997 when Collberg et al. [25] conducted a to automated attackers, stealthy to human attackers, and pilot study on the taxonomy of obfuscation transformation. costs. On the other hand, model-oriented obfuscation only They have discussed several transformation approaches and assumes automated attackers (e.g., probabilistic polynomial- evaluation metrics. Since then, obfuscation has been receiving time Turing Machine) and it assumes the adversarial purpose extensive attentions in both research and industrial fields. Fol- is to infer the functionality of a computation model (circuit lowup investigations mainly propose new ideas on obfuscation or Turing Machine). Such investigations (e.g., [12]) generally transformations in layout level, control level, or data level. made assumptions that a program can be represented as Besides, there are also preventive obfuscation (e.g., [30]–[33]) explicit pairs of . In this way, one can and polymorphic obfuscation (e.g., [34]–[36]). We defer our evaluate the security with mathematical representations, i.e., discussion about the details to Section V. the probability of guessing the pairs from an For model-oriented obfuscation, the groundbreaking paper obfuscated program. A negligible probability implies that the was published in 2001 when Barak et al. [12] initiated the obfuscated program leaks little information or it is secure. first theoretical study on the capability of program obfuscation. Finally, the two categories resort to different security basis They studied how much semantic information can be hidden when proposing secure obfuscation solutions. Code-oriented at most via obfuscation. To this end, they proposed a virtual obfuscation is interested in hard program analysis problems, black-box property and showed that not all programs can be such as pointer analysis. Model-oriented obfuscation generally obfuscated with the property. Hence, what security properties makes further assumptions that a model contains only basic can obfuscation guarantee and how to achieve the property mathematical operations. In this way, it can adopt cryp- are the two important problems of this area. Currently, graded tographic primitives based on some mathematical hardness encoding [8] is the only available mechanism that can be assumptions, such as multilinear maps. implemented for obfuscation. We defer our discussion about Note that the differences are summarized based on existing the details to Section VI. obfuscation investigations until this survey is published. Some Note that other investigations (e.g., [22]) may employ gaps may be mitigated in the future. practical obfuscation and theoretical obfuscation as the cat- egory names. However, such a categorization approach may C. Secure and Usable Obfuscation not be very discriminative because practical investigations To facilitate the following study, we clarify the concept on obfuscating codes may also include theoretical studies of secure and usable obfuscation. An obfuscation approach (e.g., [37, 38]), and vice versa. Therefore, our categorization is secure if it performs well in two aspects: obfuscation approach with code-oriented obfuscation and model-oriented effectiveness and resistance. Obfuscation effectiveness means obfuscation should be more appropriate. the confidentiality of program semantics. A secure obfuscation 2) The Differences between Code-Oriented Obfuscation approach should hide as much program semantics as possible, and Model-Oriented Obfuscation: Code-oriented obfuscation especially the essential semantics. Resistance means the demonstrates non-trivial gaps with model-oriented obfuscation hardness to recover the confidential semantics. A secure as summarized in Table I. They are studied by different obfuscation approach should be resistant to attacks. The two research communities. Code-oriented obfuscation interests aspects are consistent with current evaluation metrics in both software security experts or engineers, who deal with real soft- code-oriented obfuscation and model-oriented obfuscation ware protection issues. Model-oriented obfuscation interests fields. For code-oriented obfuscation, obfuscation effectiveness scientists who pursue theoretical study on circuits or Turing is similar as potency, while resistance includes both resilience Machines. Besides, model-oriented obfuscation also interests and stealth. For model-oriented obfuscation, effectiveness

3 TABLE I: The differences between code obfuscation and model obfuscation

Code-Oriented Obfuscation Model-Oriented Obfuscation Research Community Software security, software engineering, etc Theoretical computation, cryptography, etc Program to Obfuscate Codes Circuits or Turing Machines Obfuscation Semantics that computes outputs Adversarial Purpose Task-dependent Problem given inputs Protection Purpose To increase program obscurity To hide mathematical computations Potency or increased obscurity, Leaked information measured as the Security-related Metrics resilience against automated attackers, probability of guessing stealthy to human attackers Hard program analysis problems Cryptography algorithms Security Basis (e.g., pointer analysis) (e.g., multilinear maps) means the security property (e.g., indistinguishability) which handle non-mathematical code syntax nor how to handle other defines the maximum information that an obfuscated program programming concepts, such as control flows and data flows. leaks; while resistance means the complexity in attacking an obfuscation algorithm. V. CODE-ORIENTED OBFUSCATION An obfuscation is usable if it can be applied to obfuscate real In 1993, Cohen [34] published the first code obfusca- codes, and the incurred performance overhead is acceptable tion paper. Later in 1997, Collberg et al. [25] conducted for real application scenarios. Such overhead includes both a groundbreaking study on the taxonomy of obfuscation program size and execution time. The performance of an transformations. Then, many obfuscation approaches were obfuscation approach can be arguably acceptable if it incurs proposed in the literature. trivial overhead. Based on the purpose of protection, we divide code- oriented obfuscation approaches into three categories: preven- IV. OVERVIEW OF FINDINGS tive obfuscation, transformation obfuscation, and polymorphic In short, we have no secure and usable program obfuscation obfuscation. Preventive obfuscation aims to impede attackers approaches to date. The primary reason is that we do not from obtaining the real codes. Transformation obfuscation have adequate evaluation metrics concerning both security and degrades the readability of the real codes. Polymorphic usability. obfuscation aims to prevent attackers from locating targeted Firstly, none of the existing code-oriented obfuscation semantics or features in each obfuscated version. Transforma- investigations evaluate residual semantics in an obfuscated tion obfuscation serves as a major code obfuscation technique, program. They generally adopt the evaluation metrics proposed and most of the obfuscation approaches fall into this category. by Collberg et al. [11], which judges the increased obscurity We further divide them into layout transformation, control rather than the unprotected code semantics. None of them transformation and data transformation, each of which focuses employs evaluation metrics can meet our security requirement, on increasing the obscurity of a particular perspective. especially in obfuscation effectiveness. Therefore, designing appropriate metrics seems the priority for developing secure A. Preventive Obfuscation obfuscation approaches in the future. Preventive obfuscation raises the bar for adversaries to Secondly, current model-oriented obfuscation approaches obtain code snippets in readable formats. It is generally are too inefficient to be employed in practice. Such approaches designed for non-scripting programming languages, such as can satisfy the performance requirement defined by Barak et C/C++ and Java. For such software, a disassembly phase is al. [12], i.e., the obfuscated program incurs only polynomial required to translate machine codes (e.g., binaries) into human overhead, but they still cost too much. Such a performance readable formats. Preventive obfuscation, therefore, aims to requirement with polynomial overhead is too weak for an obstruct the disassembly phase by introducing errors to general approach to be practical. Besides, the security requirements dissemblers. (e.g., indistinguishability) for obfuscating circuits might be Linn and Debray [30] conducted the first preventive too rigid to develop obfuscation solutions, not to mention obfuscation study on ELF (executable and linkage format) a practical one. This may justify why graded encoding programs, or binaries. They propose several mechanisms is the only obfuscation approach to date that can meet to deter popular disassembling algorithms. To thwart lin- a security requirement. Moreover, current model-oriented ear sweep algorithms, the idea is to insert uncompleted obfuscation mechanisms are only applicable to codes with instructions as junk codes after unconditional jumps. The simple mathematical operations. Neither do we know how to mechanism takes effect if a disassembler cannot handle such

4 uncompleted instructions. To thwart recursive algorithms, they In general, layout obfuscation has promising resistance further replace regular procedure calls with branch functions because some transformations are one-way which cannot be and jump tables. In this way, the return addresses are only reversed. But the obfuscation effectiveness is only limited to determined during runtime, and they can hardly be known by layout level. Moreover, some layout information can hardly be static disassemblers. Similarly, Popov et al. [32] have proposed changed, such as the method identifiers from Java SDK. Such to convert unconditional jumps to traps which raise signals. residual information is essential for adversaries to recover the Then they employ a signal handling mechanism to achieve obfuscated information. For example, Bichsel et al. [7] tried to the original semantics. Darwish et al. [33] have verified that deobfuscated ProGuard-obfuscated apps, and they successfully such obfuscation approaches are effective against commercial recovered around 80% names. disassembly tools, e.g., IDA pro [39]. The idea is also applicable to the decompilation process of C. Control Obfuscation Java bytecodes. Chan and Yang [31] proposed several lexical Control obfuscation increases the obscurity of control flows. tricks to impede Java decompilation. The idea is to modify It can be achieved via introducing bogus control flows, bytecodes directly by employing reserved keywords to name employing dispatcher-based controls, and etc. variables and functions. This is possible because the validation 1) Bogus Control Flows: Bogus control flows refer to the check of identifiers is only performed by the frontend. In this control flows that are deliberately added to a program but will way, the modified program can still run correctly, but it would never be executed. It can increase the complexity of a program, cause troubles for decompilation tools. e.g., in McCabe complexity [44] or Harrison metrics [45]. To measure the effectiveness of preventive obfuscation, For example, McCabe complexity [44] is calculated as the Linn and Debray [30] proposed confusion factor, i.e., the number of edges on a control-flow graph minus the number of ratio of incorrectly disassembled instructions (or blocks, or nodes, and then plus two times of the connected components. functions) to all instructions [30]. Popov et al. [32] also To increase the McCabe complexity, we can either introduce adopted the metrics. Besides, they proposed another factor new edges or add both new edges and nodes to a connected that measures the ratio of correct edges on a control-flow component. graph. Such approaches are based on tricks. Although they To guarantee the unreachability of bogus control flows, may mislead existing disassembly or decompilation tools, they Collberg et al. [25] proposed the idea of opaque predicates. are vulnerable to advanced handmade attacks. They defined opaque predict as the predicate whose outcome is known during obfuscation time but is difficult to deduce by B. Layout Obfuscation static program analysis. In general, an opaque predicate can Layout obfuscation scrambles a program layout while be constantly true (P T ), constantly false (P F ), or context- keeping the syntax intact. For example, it may change the dependent (P ?). There are three methods to create opaque orders of instructions or scramble the identifiers of variables predicates: numerical schemes, programming schemes, and and classes. contextual schemes. Lexical obfuscation is a widely employed layout obfuscation approach which transforms the meaningful identifiers to Numerical Schemes meaningless ones. For most programming languages, adopt- Numerical schemes compose opaque predicates with math- ing meaningful and uniform naming rules (e.g., Hungarian ematical expressions. For example, 7x2 −1 6= y2 is constantly Notation [40]) is required as a good programming practice. true for all integers x and y. We can directly employ such Although such names are specified in source codes, some opaque predicates to introduce bogus control flows. Figure 3(a) would remain in the released software. For example, the names demonstrates an example, in which the opaque predicate of global variables and functions in C/C++ are kept in binaries, guarantees that the bogus control flow (i.e., the else branch) and all names of Java are reserved in bytecodes. Because such will not be executed. However, attackers would have higher meaningful names can facilitate adversarial program analysis, chances to detect them if we employ the same opaque we should scramble them. To make the obfuscated identifiers predicates frequently in an obfuscated program. Arboit [46], more confusing, Chan et al. [31] proposed to deliberately therefore, proposed to automatically generate a family of such employ the same names for objects of different types or within opaque predicates, such that an obfuscator can choose a unique different domains. Such approaches have been adopted by opaque predicates each time. ProGuard [41] as a default obfuscation scheme for Android Another mathematical approach with higher security is to programs. employ crypto functions, such as hash function H [47], and Besides, several investigations study obfuscation via shuf- homomorphic encryption [48]. For example, we can substitute fling program items. For example, Low [42] proposed to a predicate x == c with H(x) == chash to hide the solution seperate the related items of Java programs wherever possible, of x for this equation. Note that such an approach is generally because a program is harder to read if the related information employed by malware to evade dynamic program analysis. We is not physically close. Wroblewski [43] proposed to reorder may also employ crypto functions to encrypt equations which a sequence of instructions if it does not change the program cannot be satisfied. However, such opaque predicates incur semantics. much overhead.

5 To compose opaque constants resistant to static analysis, to obfuscate programs. For example, Drape [15] proposed to Moser et al. [49] suggested employing 3-SAT problems, which compose such opaque predicates which are invariant under a are NP-hard. This is possible because one can have efficient contextual constraint, e.g., the opaque predicate x mod 3 == algorithms to compose such hard problems [50]. For example, 1 is constantly true if x mod 3:1? x++ : x = x + 3. Tiella and Ceccato [51] demonstrated how to compose such Palsberg et al. [58] proposed dynamic opaque predicates, opaque predicates with k-clique problems. which include a sequence of correlated predicates. The To compose opaque constants resistant to dynamic analysis, evaluation result of each predicate may vary in each run. Wang et al. [52] propose to compose opaque predicates However, as long as the predicates are correlated, the program with a form of unsolved conjectures which loop for a behavior is deterministic. Figure 3(c) demonstrates an example number of times. Because loop is a challenging issue for of dynamic opaque predicates. No matter how we initialize ∗p dynamic analysis, the approach in nature should be resistant and ∗q, the program is equivalent to y = x +3, x = y +3. to dynamic analysis. Examples of such conjectures include The resistance of bogus control flows largely depends on the Collatz conjecture, 5x +1 conjecture, Matthews conjecture. security of opaque predicates. An ideal security property for Figure 3(b) demonstrates how to employ Collatz conjecture to opaque predicates is that they require worst-case exponential introduce bogus control flows. No matter how we initialize x, time to break but only polynomial time to construct. Notethat the program terminates with x =1, and originalCodes() some opaque predicates are designed with such security can always be executed. concerns but may be implemented with flaws. For example, the 3-SAT problems proposed by Ogiso et al. [38] are based Programming Schemes on trivial problem settings which can be easily simplified. If Because adversarial program analysis is a major threat to such opaque predicates are implemented properly, they would opaque predicates, we can employ challenging program anal- be promising to be secure. ysis problems to compose opaque predicates. Collberg et al. 2) Dispatcher-Based Controls: A dispatcher-based control suggest two classic problems, pointer analysis and concurrent determines the next blocks of codes to be executed during programs. runtime. Such controls are essential for control obfuscation, In general, pointer analysis refers to determining whether because they can hide the original control flows against static two pointers can or may point to the same address. Some program analysis. pointer analysis problems can be NP-hard for static analysis One major dispatcher-based obfuscation approach is control or even undecidable [54]. Another advantage is that pointer flattening, which transforms codes of depth into shallow ones operations are very efficient during execution. Therefore, one with more complexity. Wang et al. [53] firstly proposed can compose resillient and efficient opaque predicts with the approach. Figure 4 demonstrates an example from their well-designed pointer analysis problems, such as maintaining paper that transforms a while loop into another form with pointers to some objects with dynamic data structures [11]. switch-case. To realize such transformation, the first step Concurrent programs or parallel programs is another chal- is to transform the code the into an equivalent reprensentation lenging issue. In general, a parallel region of n statements with if-then-goto statements as shown in Figure 4(b); has n! different ways of execution. The execution is not only then they modify the goto statements with switch-case determined by the program, but also by the runtime status statements as shown in Figure 4(c). In this way, the original of a host computer. Collberg et al. [11] proposed to employ program semantics is realized implicitly by controlling the data concurrent programs to enhance the pointer-based approach flow of the switch variable. Because the execution order of by concurrently updating the pointers. Majumdar et al. [55] code blocks are determined by the variable dynamically, one proposed to employ distributed parallel programs to compose cannot know the control flows without executing the program. opaque predicates. Cappaert and Preneel [59] formalized control flattening as Besides, some approaches compose opaque predicates with employing a dispatcher node (e.g., switch) that controls the programming tricks, such as leveraging exception handling next code block to be executed; after executing a block, control mechanisms. For example, Dolz and Parra [56] proposed to use is transferred back to the dispatcher node. Besides, there the try-catch mechanism to compose opaque predicates are several enhancements for code flattening. For example, for .Net and Java. The exception events include division to enhance the resistance to static program analysis on the by zero, null pointer, index out of range, or even particular switch variable, Wang et al. [60] proposed to introduce hardware exceptions [57]. The original program semantics can pointer analysis problems. To further complicate the program, be achieved via tailored exception handling schemes. However, Chow et al. [61] proposed to add bogus code blocks. such opaque predicates have no security basis, and they are L´aszl´oand Kiss [62] proposed a control flattening mech- vulnerable to advanced handmade attacks. anism to handle specific C++ syntax, such as try-catch, while-do, continue. The mechanism is based on abstract Contextual Schemes syntax tree and employs a fixed pattern of layout. For each Contextual schemes can be employed to compose variant block of code to obfuscate, it constructs a while statement opaque predicates(i.e., {P ?}). The predicates should hold in the outer loop and a switch-case compound inside the some deterministic properties such that they can be employed loop. The switch-case compound implements the original

6 ‹– ƒǡ„Ǣ ‹– šǢ ȀȀˆ‘”ƒ›š൐Ͳ ǥ ™Š‹Ž‡ሺš൐ͳሻሼ ‹ˆሺ͹ƒȗƒȂ ͳǨൌ„ሻሼ ‹ˆሺšΨʹൌൌͳሻ ȀȀƒŽ™ƒ›•–”—‡ šൌšȗ͵൅ͳǢ ‘”‹‰‹ƒŽ‘†‡•ሺሻǢ ‡Ž•‡šൌšȀʹǢ ሽ‡Ž•‡ሼ ȀȀ‘’–‹‘ƒŽ ‹ˆሺšൌൌͳሻ ȀȀƒŽ™ƒ›•”‡ƒ Šƒ„Ž‡ „‘‰—•‘†‡•ሺሻǢ ‘”‹‰‹ƒŽ‘†‡•ሺሻǢ ሽ ሽ

‹–‹– ƒǡ„Ǣ ƒǡ„Ǣ ‹–‹– šǢ šǢȀȀˆ‘”ƒ›š൐ͲȀȀˆ‘”ƒ›š൐Ͳ ‹– ȗ’ൌƬšǢ ‹ˆሺሺȗ“ሻΨʹൌൌͲሻሼ ǥǥ ™Š‹Ž‡ሺš൐ͳሻሼ™Š‹Ž‡ሺš൐ͳሻሼ ‹– ȗ“ൌƬšǢ ›ൌ›൅͵Ǣ ‹ˆሺ͹ƒȗƒȂ‹ˆሺ͹ƒȗƒȂ ͳǨൌ„ሻሼ ‹ˆሺšΨʹൌൌͳሻ‹ˆሺšΨʹൌൌͳሻ ‹ˆሺሺȗ’ሻΨʹൌൌͲሻሼ šൌ›൅͵Ǣ ȀȀƒŽ™ƒ›•–”—‡ȀȀƒŽ™ƒ›•–”—‡ šൌšȗ͵൅ͳǢšൌšȗ͵൅ͳǢ ›ൌš൅ͳǢ ሽ ‡Ž•‡ሼ ‘”‹‰‹ƒŽ‘†‡•ሺሻǢ‘”‹‰‹ƒŽ‘†‡•ሺሻǢ ‡Ž•‡šൌšȀʹǢ‡Ž•‡šൌšȀʹǢ ሽ‡Ž•‡ሼ šൌ›൅͵Ǣ ሽ‡Ž•‡ሼሽ‡Ž•‡ሼȀȀ‘’–‹‘ƒŽȀȀ‘’–‹‘ƒŽ ‹ˆሺšൌൌͳሻ‹ˆሺšൌൌͳሻȀȀƒŽ™ƒ›•”‡ƒ Šƒ„Ž‡ȀȀƒŽ™ƒ›•”‡ƒ Šƒ„Ž‡ ›ൌš൅ͳǢ ሽ „‘‰—•‘†‡•ሺሻǢ„‘‰—•‘†‡•ሺሻǢ ‘”‹‰‹ƒŽ‘†‡•ሺሻǢ‘”‹‰‹ƒŽ‘†‡•ሺሻǢ ›ൌ›൅ʹǢ (a)ሽሽ Opaque constant. ሽሽ (b) Collatz conjecture. ሽ (c) Dynamic opaque predicate. Fig. 3: Control obfuscation with opaque predicates. ͳͶ

‹– •™ƒ” ൌͳǢ ‹–‹– ƒൌͳǢ ‹–‹– ƒൌͳǢ ƒൌͳǢ •™‹– Šሺ•™ƒ”ሻሼ ‹– „ൌʹǢ ‹– „ൌʹǢ ƒ•‡ͳǣ ƒ•‡͵ǣ ƒ•‡ͷǣ ‹– „ൌʹǢ ‹– „ൌʹǢ ƒൌͳǢ „ ൌ„൅ƒǢ ƒ൅൅Ǣ ™Š‹Ž‡ሺƒ൏ͳͲሻሼ ͳǣ‹ˆሺǨሺƒ൏ͳͲሻሻ ͳͶ ™Š‹Ž‡ሺƒ൏ͳͲሻሼ ͳǣ‹ˆሺǨሺƒ൏ͳͲሻሻ „ൌʹǢ ͳͶ ‹ˆሺǨሺ„൐ͳͲሻሻ •™ƒ” ൌʹǢ „ൌƒ൅„Ǣ ‰‘–‘ ͵Ǣ „ൌƒ൅„Ǣ ‰‘–‘ ͵Ǣ •™ƒ” ൌʹǢ •™ƒ” ൌͷǢ „”‡ƒǢ ‹ˆሺ„൐ͳͲሻሼ „ൌƒ൅„Ǣ ‹ˆሺ„൐ͳͲሻሼ „ൌƒ൅„Ǣ „”‡ƒǢ ‡Ž•‡ ƒ•‡͸ǣ „ǦǦǢ ‹ˆሺǨሺ„൐ͳͲሻሻ „ǦǦǢ ‹ˆሺǨሺ„൐ͳͲሻሻ ƒ•‡ʹǣ •™ƒ” ൌͶǢ ’”‹–ˆሺDzΨ†dzǡ„ሻǢ ሽ ‰‘–‘ ʹǢ ‹ˆሺǨሺƒ൏ͳͲሻሻ „”‡ƒǢ „”‡ƒǢ ሽƒ൅൅Ǣ „ǦǦǢ ‰‘–‘ ʹǢ •™ƒ” ൌ͸Ǣ ƒ•‡Ͷǣ ሽ ሽ ƒ൅൅Ǣ ʹǣƒ൅൅Ǣ„ǦǦǢ ‡Ž•‡ „ǦǦǢ ሽ’”‹–ˆሺDzΨ†dzǡ„ሻǢ ʹǣƒ൅൅Ǣ‰‘–‘ ͳǢ •™ƒ” ൌ͵Ǣ •™ƒ” ൌͷǢ ’”‹–ˆሺDzΨ†dzǡ„ሻǢ(a) Souce code. (b)͵ǣ’”‹–ˆሺDzΨ†dzǡ„ሻǢ Dismantling‰‘–‘ ͳǢ while. „”‡ƒǢ (c)„”‡ƒǢ Using switch. ͵ǣ’”‹–ˆሺDzΨ†dzǡ„ሻǢ Fig. 4: Control-flow flattening approach proposed by Wang et al. [53]. program semantics, and the switch variable is also employed 3) Misc: There are several other control obfuscation to terminate the outer loop. Cappaert and Preneel [59] found approachesͳͷ that do not belong to the discussed categories. that the mechanisms might be vulnerable to local analysis, i.e., Examples of such approaches are instructional control hiding ͳͷ the switch variable is directly assigned such that adversaries (e.g., [65, 66]) and API call hiding (e.g., [67, 68]). They can infer the next block to execute by only looking into a generally have special obfuscation purposes or based on current block. They proposed a strengthened approach with particular tricks. several tricks, such as employing reference assignment (e.g., Instructional Control Hiding swV ar = swV ar + 1) instead of direct assignment (e.g., swV ar = 3), replacing the assignment via if-else with Instructional control hiding converts explicit control instruc- a uniform assignment expression, and employing one-way tions to implicit ones. Balachandran and Emmanuel [65] found functions in calculating the successor of a basic block. that control instructions (e.g., jmp) are important informa- tion for reverse analysis. They proposed to substitute such Besides control flattening, there are several other dispather- instructions with a combination of mov and other instructions based obfuscation investigations (e.g., [20, 30, 63, 64]). which implements the same control semantics. In an extreme Linn and Debray [30] proposed to obfuscate binaries with case, Domas [66] think all high-level instructions should be branch functions that guide the execution based on the obfuscated. He proposed movobfuscation, which employs only stack information. Similarly, Zhang et al. [64] proposed one instruction (i.e., mov) to compile the program. The idea to employ branch functions to obfuscate object-oriented is feasible because mov is Turing complete [69]. programs, which defines a unified method invocation style with an object pool. To enhance the security of such API Call Hiding mechanisms, Ge et al. [63] proposed to hide the control Collberg et al. [25] proposed a problem that the function information in another standalone process and employ inter- invocation codes in Java programs are well understood but process communications. Schrittwieser and Katzenbeisser [20] hard to obfuscate. They suggest substituting common patterns proposed to employ diversified code blocks which implement of function invocation with less obvious ones, such as those the same semantics. discussed by Wills [70]. The problem is significant, but Dispatcher-based obfuscation is resistance against static surprisingly it has not been studied a lot by other investigations analysis because it hides the control-flow graph of a software except [67, 68]. Kovacheva [67] investigated the problem for program. However, it is vulnerable to dynamic program Android apps. He proposed to obfuscate the native calls (e.g., analysis or hybrid approaches. For example, Udupa et al. [2] to libc libraries) via a proxy, which is an obfuscated class proposed a hybrid approach to reveal the hidden control flows that wraps the native functions. Bohannon and Holmes [68] with both static analysis and dynamic analysis. investigated a similar problem for Windows powershell scripts.

7 To obfuscate an invocation command to Windows objects, they discussed the obfuscation transformations for arrays, such as proposed to create a nonsense string first and then leverage splitting one array into several subarrays, merging several ar- Windows string operators to transform the string to a valid rays into one array, folding an array to increase its dimension, command during runtime. or flattening an array to reduce the dimension. Ertaul and Venkatesh [76] suggested transforming the array indices with More Tricks composite functions. Zhu et al. [79, 80] proposed to employ Collberg et al. [25] proposed several other obfuscation homomorphic encryption to obfuscate array. Obfuscation tricks, such as aggregating irrelevant method into one method, classes is similar as obfuscation arrays. Collberg et al. [75] scattering a method into several methods. Such tricks are proposed to increase the depth of the class inheritance tree by also discussed in other investigations (e.g., [42, 71]) and splitting classes or inserting bogus classes. Sosonkin et al. [81] implemented in obfuscation tools (e.g., JHide [72]). Besides, also discussed class obfuscation techniques via coalescing and Wang et al. [73] proposed translingual obfuscation, which splitting. Such approaches can increase the complexity of a introduces obscurity by translating the programs written class, e.g., measured in CK metrics [82]. in C into ProLog before compilation. Because ProLog adopts a different program paradigm and execution model E. Polymorphic Obfuscation from C, the generated binaries should become harder to understand. Majumdar et al. [74] proposed slicing obfuscation, Polymorphism is a technique widely employed by malware which increases the resistance of obfuscated programs against camouflage, which creates different copies of malware to slicing-based deobfuscation attacks, such as by enlarging the evade anti-virus detection [83, 84]. It can also be employed size of a slice with bogus codes. to obfuscate programs. Note that previous obfuscation ap- Not that all existing control obfuscation approaches focus on proaches focus on introducing obscurities to one program, syntactic-level transformation, while the semantic-level protec- while polymorphic obfuscation generats multiple obfuscated tion has rarely been discussed. Although they may demonstrate versions of a program simultaneously. Ideally, it would different strengths of resistance to attacks, their obfuscation pose similar difficulties for adversaries to understand the effectiveness concerning semantic protection remains unclear. components of each particular version. It is a technique orthogonal to the classic obfuscation and mainly designed to D. Data Obfuscation impede large-scale and reproductive attacks to homogeneous software [85]. Data obfuscation transforms data objects into obscure Polymorphic obfuscation generally relies on some random- representations. We can transform the data of basic types via ization mechanisms to introduce variance during obfuscation. splitting, merging, procedurization, encoding, etc. Lin et al. [35] proposed to generate different data structure Data splitting distributes the information of one variable layout during each compilation. The data objects, such as into several new variables. For example, a boolean variable structures, classes, and stack variables declared in functions, can be split into two boolean variables, and performing logical can be reordered randomly in each version. Xin et al. [83] operations on them can get the original value. further improved the data structure polymorphism approach Data merging aggregates several variables into one variable. by automatically discovering the data objects that can be Collberg et al. [75] demonstrated an example that merges randomized and eliminating the semantic errors generated two 32-bit integers into one 64-bit integer. Ertaul and during reordering. Crane et al. [86] proposed to randomize Venkatesh [76] proposed another method that packs several the tables of pointers such that the introduced diversity can be variables into one space with discrete logarithms. resistant to some code reuse attacks. Besides, Xu et al. [36] Data procedurization substitutes static data with procedure suggested introducing security features in the polymorphic calls. Collberg et al. [75] proposed to substitute strings with a code regions. function which can produce all strings by specifying paticular parameter values. Drape [77] proposed to encode numerical VI. MODEL-ORIENTED OBFUSCATION data with two inverse functions f and g. To assign a value v to a variable i, we assign it to an injected variable j as Model-oriented obfuscation studies the theoretical obfusca- j = f(v). To use i, we invoke g(j) instead. tion problems on computation models, such as circuits and Data encoding encodes data with mathematical functions Turing Machines. In 1996, Goldreich and Ostrovsky [87] or ciphers. Ertaul and Venkatesh [76] proposed to encode firstly studied a theoretical software intellectual property strings with Affine ciphers (e.g., Caser cipher) and employ protection mechanism based on Oblivious RAM. Later, discrete logarithms to pack words. Fukushima et al. [78] Hada [88] firstly studied the theoretical obfuscation problem proposed to encode the clear numbers with exclusive or based on Turing Machines. In 2001, Barak et al. [12] proposed operations and then decrypt the computation result before a well-recognized modeling approach for obfuscating circuits output. Kovacheva [67] proposed to encrypt strings with the and Turing Machines, which lays a foundation of this field. RC4 cipher and then decrypt them during runtime. There are two important research topics in this field: 1) what The data obfuscation ideas can also be extended to abstract is the best security property that obfuscation can achieve? 2) data types, such as arrays and classes. Collberg et al. [75] how can we achieve the property?

8 TABLE II: A comparison of the security properties for model-oriented obfuscation. Notations: S is a polynomial-size simulator; A is a polynomial-size adversary; L is a polynomial-size learner; Su is a unbounded-size simulator; P1 and P2 are programs P[q(n)] that compute a same function and have similar cost; P and Q are programs that compute different functions; Su means querying the oracle access Su for n times; ε is a negligible number.

Security Property Requirement Security Strength P Virtual Black-Box Property (VBBP) |Pr[A(O(P)) = 1] − Pr[S = 1]|≤ ε Ideal Security

Indistinguishability Property (INDP) |Pr[A(O(P1)) = 1] − Pr[A(O(P2)) = 1]|≤ ε INDP < VBBP If |Pr[A(O(P)) = 1] − Pr[A(O(Q)) = 1]|≥ ε, VBBP > DIP > INDP Differing-Input Property (DIP) ′ ′ then |Pr[A(P ) = 1] − Pr[A(Q ) = 1]|≥ ε

Best-Possible Property (BPP) |Pr[L(O(P1)) = 1] − Pr[S(P2)]|≤ ε BPP = Efficient INDP P[q(n)] Virtual Grey-Box Property (VGBP) |Pr[A(O(P)) = 1] − Pr[Su = 1]|≤ ε INDP < VGBP < VBBP

A. Security Properties such an exception [91]. 2) Indistinguishability Property (INDP) and Best-Possible Barak et al. [12] defined an obfuscator O as a “compiler” Property (BPP): Although VBBP is not universally attainable, P P that inputs a program and outputs a new program O( ). we still need some attainable properties. As an alternative, O(P) should posses the same functionality as P, incur only Barak et al. [92] proposed a weaker notion: indistinguishability polynomial slowdown, and hold some properties of unintelligi- obfuscation. It requires that if two programs P1 and P2 bility or security. Note that following investigations generally are functionally equivalent, and they have similar program adopt “polynomial slowdown” to discriminate whether an size and execution time, then |P r[A(O(P )) = 1] − algorithm is efficient, but they may adopt different properties 1 P r[A(O(P2)) = 1]| is negligible. Because the notion does not of security. Table II lists several major security properties assume a polynomial-size simulator, it avoids the inefficiency discussed in the literature. Next we introduce each property issue caused by unobfuscatable programs. Barak et al. have and justify why the indistinguishability property is mostly shown that INDP is attainable for universal programs, such interested by existing obfuscation algorithms. as employing an obfuscator that converts all programs to their 1) Virtual Black-Box Property (VBBP): Ideally, an obfus- canonical forms or lookup tables. However, such an obfuscator cated program should leak no more information than accessing might be trivial if it is inefficient. the program in a black-box manner. The property is firstly Another important issue is how useful INDP is since it proposed by Barak et al. [12] as virtual black-box obfuscation. has no intuitive to hide information. INDP guarantees that Let A be a polynomial-size adversary (e.g., a probabilistic the obfuscated program leaks no more information than any polynomial-time Turing Machine). VBBP requires that for any other obfuscated program versions of similar cost. Therefore, such adversaries, there exists a polynomial-size simulator S, P if we can design an obfuscator with INDP, it guarantees the such that |P r[A(O(P)) = 1] − P r[S = 1]| is negligible. obfuscation would have the best effectiveness. To rule out The expression means any program semantics learned by the inefficient obfuscation, Goldwasser et al. [93] proposed best- adversary can be simulated with a polynomial-size simulator. possible obfuscation. BPP requires that for any polynomial- Barak et al. have shown a negative result that at least one size learner L, there exists a polynomial-size simulator S, family of efficient programs Pf (x) cannot be obfuscated with such that |P r[L(O(P1)) = 1] − P r[S(P2)]| is negligible. VBBP. Pf (x) can be constructed with any one-way functions, By assuming a polynomial-size simulator, BPP excludes whose semantics cannot be learned from oracle access. inefficient indistinguishability obfuscation. Note that BPP is Therefore, given only oracle access to the function f(x), also similar to VBBP but it is weaker than VBBP. The no efficient algorithm can compute f(x) better than random differece is that for BPP, the simulator S(P2) can access ′ guessing. However, given any efficient program Pf (x), there another version of the program, while for VBBP the simulator exists an efficient algorithm that can compute the function. SP works as a black-box. Lin et al. [94] proposed another In this way, an efficient program that computes a property similar notion exponentially-efficient indistinguishability prop- ′ of the function can be constructed as D(Pf (x)) → {0, 1}, erty(XINDP), that requires the obfuscated program should be but it cannot be efficiently constructed from oracle access. smaller than its truth table. Lin et al. showed that XINDP Goldwasser et al. [89] and Bitansky et al. [90] further showed implies efficient INDP under the assumption of learning with that some encryption programs cannot be obfuscated with errors (LWE) [95]. VBBP when auxiliary inputs are available. 3) More Properties: There are other alternatives dis- The negative result implies we cannot achieve genreal- cussed in the literature, such as virtual grey-box property purpose obfuscation with VBBP. However, it does not mean (VGBP) [96], differing-input property (DIP) [92] and its no program can be obfuscated with VBBP. Point function is variations. Their security levels are considered as between

9 ‹–ͺ̴–šൌƒ”‰˜ሾͳሿሾͲሿǢ ˆሺšൌൌ͹ሻሼ ’”‹–ˆሺDz ƒ’–—”‡–Š‡ˆŽƒ‰̳dzሻǢ ሽ

VBBP and INDP [18]. DIP is another notion proposed by Barak et al. [92]. For „‹– Ͳ „‹– ͳ „‹– ʹ „‹– ͵ „‹– Ͷ „‹– ͷ „‹– ͸ „‹– ͹ two programs P and Q of the same cost, it requires if ͳ ͳ ͳ Ͳ Ͳ Ͳ Ͳ Ͳ an adversary can distinguish their obfuscated versionsICMP (i.e., •Ͳ •ͳ •ʹ •͵ •Ͷ •ͷ •͸ •͹ •ͻ Ͳ Ͳ O(P) and O(Q)), she should be able to differ any versions Ͳ ͳ ͳ ͳ ͳ ͳ of P and Q with the same cost, i.e., to find an inputICMPx ′ ′ such that P (x) 6= Q (x). When P and Q compute the •ͺ same function, DIP implies INDP. DIP is also known as (a) A branching program. extractability obfuscation [97]. However, Boyle and Pass [98], and Garg et al. [99] showed that DIP is not attainable for all programs. To avoid the impossibility, Ishai et al. [100] … proposed public-coin DIP. Note that DIP is a stronger notion than INDP, and we can have many useful applications with a DIP obfuscator [101]. Ͳ VGBP [96] is similar to VBBP except that it empowers the Š‡ƒ†ƒ–”‹š ƒ–”‹ ‡•ˆ‘”„‹– –ƒ‹Žƒ–”‹š simulator to unbounded size. To be nontrival, it restricts the Š‡ƒ†ƒ–”‹š (b) A matrixƒ–”‹ ‡•ˆ‘”„‹– branching program.Ͳ –ƒ‹Žƒ–”‹š simulator to have only limited times of oracle access. Bitansky et al. [102] shown that VGBP also implies INDP. In brief, we cannot obfuscate universal programs with VBBP security, but we may obfuscate universal programs with INDP security. INDP is also the best-possible security property if the obfuscation is efficient. Therefore, if VBBP is attainable for some programs, efficient INDP would guarantee VBBP. (c) Generate randomized matrix branching program. B. Candidate Approach int8 In 2013, Garg et al. [8] proposed the first candidate Fig. 5: The procedures to convert a program (i.e., if x of algorithm to achieve INDP. Their approach is based on the equals to 7) to a randomized matrix branching program. idea of functional encryption [103]. Functional encryption allows users to compute a function from encrypted data with 1 some keys. In the scenario of program obfuscation, suppose NC circuits and can be extended to all circuits with fully a program P computes a function f(x), then the functional homomorphic encryption. encryption problem is to encrypt the program Enc(P), such Converting to MBP P that Enc( ) can still compute f(x) with a public key Ks, A matrix branching program that computes a function f is but Ks should not reveal P. Because cryptography algorithms given by a tuple generally have strong security basis, functional encryption becomes a dominant idea for program obfuscation with MBPf = (Input,Mhead, (Mi,0,Mi,1)i∈l,Mtail) provable security. Currently, the only known functional encryption approach Input selects a matrix Mi,0 or Mi,1 for each i according to the corresponding bit of input; is a row vector of size for program obfuscation is graded encoding. It encodes Mhead ; are matrix pairs of size that encode programs with multilinear maps, such that any tampering w (Mi,0,Mi,1)i∈l w × w attacks that do not respect its mathematical structure (encoded program semantics; Mtail is a column vector of size w. with private keys) would lead to meaningless program Given an input x, the MBP computes an output executions. To employ graded encoding, Garg et al. [104] MBPf (x) ∈{0, 1} as follows: proposed to convert programs to matrix branching programs (MBP) before encoding. However, such conversion incurs l much overhead. Later, Zimmerman [9], and Applebaum and MBPf (x)= Mhead × ( Mi,xinput(k) ) × Mtail i=1 Brakerski [105] proposed to encode circuits directly without Y converting to MBPs. Next we discuss the two mechanisms. Suppose the i-th matrix pair corresponds to the k-th bit of 1) MBP-based Graded Encoding: An essential requirement the input. If the k-th bit is 0, then Mi,0 is selected, or vice to employ graded encoding is that the encoded programs versa. The program output is the matrix multiplication result, can be evaluated. MBP is such a model that holds a good which is a 1 × 1 matrix, or a value. algebraic structure for evaluation even after being encrypted. How can we convert general programs to MBPs? The There are two phases for MBP-based graded encoding: the first Barrington’s Theorem states that we can convert any boolean phase converts programs to MBPs; the second phase encrypts formula (boolean circuit of fan-in-two, depth d) to a branching MBPs with graded encoding mechanisms. Garg et al. [104] program of width 5 and length ≤ 4d [106]. Garg et al. [104] showed that the MBP-based approach is feasible for shallow simply employ the result and assume the MBP is composed

10 of with 5 × 5 matrices. Ananth et al. [107] found the resulting integer matrix. Stating from an identity matrix, such RMi MBP following Barrington’s Theorem is not very efficient. can be obtained via iterative transformations leveraging the They propose a new approach that converts boolean formulas determinant invariant rule. of size s to matrix branching programs of width ≤ 2s +2 and Graded Encoding length ≤ s. Besides, there are several other efforts towards converting to more efficient MBPs, such as [108, 109]. The Garg et al. [8] noticed that although the randomized matrix conversion generally includes two steps: from a program Pf branching program provides some security, it still suffers three to a branching program BPf and from BPf to MBPf . kinds of attacks: partial evaluation, mixed input, and other Pf → BPf : A branching program is a finite state machine. attacks that do not respect the algebraic structure. Partial For boolean formulas Pf ∈{0, 1}, the finite state machine has evaluation means we can evaluate whether partial programs one start state, two stop states (true and false), and several generate the same result for different inputs. Mixed input intermediate states. Sauerhoff et al. [110] demonstrated a means we can tamper the program intentionally by selecting general approach to simulate any boolean formulas over AND Mi,0 and Mj,1 if i and j are related to the same bit of input. and OR gates with branching programs. It can be extended to Graded encoding is designed to defeat such attacks. It is based any formulas as they can be converted to the form with only on multilinear maps, which can be traced back to the historical AND and OR. Figure 5(a) demonstrates an example which multiparty key exchange problem proposed in 1976 [112, 113]. converts a boolean program i == 7 to a branching program. In general, a graded encoding scheme includes four compo- Suppose i is an integer of eight bits, the boolean formula is nents: setup that generates the public and private parameters b0 ∧ b1 ∧ b2 ∧ ¬b3 ∧ ¬b4 ∧ ¬b5 ∧ ¬b6 ∧ ¬b7. To model the for a system, encoding that defines how to encrypt a message branching program we need 10 states: 8 states (s0-s7) that with the private parameters, operations that declare the accept each bit of input, and 2 stop states (s8 for false, and supported calculations with encrypted messages, and a zero- s9 for true). testing function that evaluates if the plain text of an encrypted BPf → MBPf : To compose a MBPf that is functionally message should be 0. Currently, there are two graded encoding equivalent to BPf , we should compute each matrix of the schemes: GGH scheme [114] which encodes data over lattices, MBPf . In general, Mhead can be an all-zero row vector and CLT scheme [115] which encodes data over integers. Note except the first position is 1, and Mtail can be an all-zero that the graded encoding schemes for program obfuscation are column vector except the last position is 1. (Mi,0,Mi,1)i∈len slightly different from their original versions for multiparty can be constructed from the adjacency matrices of each state. key exchange. For simplicity, below we only discuss the For example, if the first bit of input is 0, the station transfers graded encoding schemes for obfuscation. from s0 to s8, then we start with an identity matrix and The GGH scheme is named after Garg, Gentry, and assign 1 to the element of the first row and the ninth column. Halevi [114], and it is the first plausible solution to compose Figure 5(b) demonstrates the matrices corresponding to the fist multilinear maps. GGH scheme is based on ideal lattices. It input bit of Figure 5(a). encodes an element e over a quotient ring R/I as e+I, where Following such converting approaches, the elements of I = hgi⊂ R is the principal ideal generated by a short vector resulting matrices are either 1 or 0. To protect the matrices, g. The four components of GGH are defined as follows. Kilian [111] proposed that we can randomize the elements of Setup: Suppose the miltilinear level is κ. The system an MBP while not changing its functionality. generates an ideal-generator g which is chosen as g and g−1 MBPf → RMBPf : To randomize the matrices, we first should be short, a large enough modulus q, denominators {zi} generate n+1 random integer matrices RMi and their inverse from the ring Rq. Then we publish the zero-testing parameter −1 κ RMi of size w × w. Then we multiply the original matrices as pzt = [h i=1 zi/g]q, where h is a small ring element. with such random matrices as follows. Encoding: The encoding of an element e in set S can be Q zi computed as : u := [(e + I)/zi]q. RMhead = Mhead × RM0 Operations: If two encodings are in the same set (e.g., u1 := −1 [c1/zi]q and u2 := [c2/zi]q), then one can add up them u1+u2. RM0,0 = RM0 × M0,0 × RM1 If the two encodings are from disjoint sets, one can multiply −1 the two encodings u · u . RM0,1 = RM0 × M0,1 × RM1 1 2 Zero-Testing Function: A zero testing function for a level-κ ... encoding u is defined as RM = RM −1 × M tail n tail 1 if ||[u · p ] || ≤ q3/4 IsZero(u)= zt q ∞ The randomization mechanism ensures that all random- (0 otherwise ization matrices RMi would be canceled when evaluating RMBPf (x). Note that to avoid errors incurred by floating- Note that u·pzt = h·c/g. If u is an encoding of 0, c should point numbers, we should guarantee all the elements of be a short vector in I and the product can be smaller than a matrices are integers as shown in Figure 5(c). This is feasible threshold, otherwise, c should be a short vector in some coset −1 because when the dominant of RMi is 1, RMi is also an of I and the product should be very large.

11 ˜ƒŽ—ƒ–‹‘ǣሺሻሺ ’ ͳǡ ’ ʹǡǥǡ ͳǡ ǥ ሻǡ The CLT scheme is another multilinear map construction approach proposed by Coron, Lepoint, and Tibouchi [115, 116]. It is based on integers. The four components of the scheme are defined as follows. Setup: The scheme generates κ secret large primes {pi}, small primes {gi}, random integers {hi}, random integers κ {zi}, a modulo q = i=1 pi, and a zero-testing parameter κ κ Q −1 pzt = hi × zi × g mod pi × pi′ mod q i=1 i=1 ′ X Y iY6=i Encoding: Suppose ri is a small random integer, the encoding of an element e in set S is u = ri·gi+e (mod p ). zi zi i Operation: If ui and uj are encodings in the same set, one can add them up. If they are from disjoint sets, they can be multiplied. Zero-Testing Function: A zero testing function for a level-κ encoding u is Fig. 6: The evaluation process of an encoded keyed circuit (key: k1...km) given an input is x1 =1, x2 =0...xn. ͳͻ 1 if ||u · p (mod q)|| ≤ q · 2−v IsZero(u)= zt ∞ (0 otherwise VII. ON SECURE AND USABLE OBFUSCATION In this function, v is a value related to the bit-size of the In this section, we carefully justify the prospects of the encoding parameters [115]. existing investigations towards secure and usable obfuscation. Note that both the GGH scheme and CLT scheme are noisy multilinear maps, because the encoding of a value varies at A. With Code-Oriented Obfuscation different times. The only deterministic function is the zero- testing function. However, when a program becomes complex, In general, existing code-oriented obfuscation approaches the noise may overwhelm the signal. Take the CLT scheme are usable but insecure. We may hope to achieve secure as an example, the size of pi should be as large as possible program obfuscation in the future if we have adequate eval- to overwhelm the noise. This requirement largely restrict the uation metrics for security. Our primary supporting evidence usability of graded encoding. is two folds. Firstly, current security metrics are inadequate. 2) Circuit-based Graded Encoding: Converting circuits Secondly, code obfuscation techniques are promising to be to MBPs incurs much overhead because the size of an resistant because deobfuscation also suffers limitations. MBP is generally exponential to the depth of a circuit. 1) Inadequate Security Metrics: To our best knowledge, all To avoid such overhead, Zimmerman [9] and Applebaum existing evaluation metrics emloyed by code-oriented obfus- and Brakerski [105] proposed to obfuscate circuit programs cation investigations are inadequate concerning the security directly. The approach focuses on keyed circuit families discussed in Section III-C. Most investigations and tools (C(·, k))k∈{0,1}m , and it can be extended to general circuits (e.g., Obfuscator-LLVM [118]) adopt the metrics defined because all circuits can be transformed to keyed circuits [117]. by Collberg et al. [11]. Besides, other evaluation metrics Circuit-based graded encoding assumes a circuit structure (e.g., [119]–[122]) proposed in the literature are also in- can be made public, and only the key needs to be protected. adequate. For example, Anckaert et al. [119] followed the Figure 6 demonstrates an example that evaluates an obfuscated idea of potency and developed more detailed measurements. circuit given an input x1 = 1, x2 = 0...xn. To generate such Ceccato et al. [120, 121] proposed to conduct controlled code an obfuscated circuit, we encode each input wire with a pair of comprehension experiments against human attackers, such that encodings corresponding to the input values of 0 and 1, i.e., we can measure the security with task completion rate and time. Such metrics are either heuristic or consider little about [xi,0, αi]Si,0 ,[xi,1, αi]Si,1 . To support the addition of values from different multilinear sets, we publish the encoding of protecting essential program semantics.

1 for each set, i.e., [1, 1]Si,0 ,[1, 1]Si,1 . Besides, we encode 2) Limitations of Deobfuscation: The most recognized each key bit as [ki,βi]SK and publish the encoding of 1, i.e., deobfuscation attacks (e.g., [4, 123]) are based on program

[1, 1]SK . Note that the approach introduces a checksum mech- analysis and pattern recognition, both of which suffer limita- ′ anism, i.e., xi mod N ≡ xi mod Neval ≡ αi mod Nchk, tions. Pattern recognition requires a predefined pattern repos- s.t. N = Neval · Nchk. Evaluating the obfuscated circuit itory, and it cannot automatically adapt to new obfuscation just follows the original circuit structure. As a result, we techniques. Program analysis suffers many challenges. For can compute an evaluation value C(x1, ..., xn, k1, ..., km) ∈ example, symbolic execution is one major program analysis

SNeval and a checksum C(α1, ..., αn,β1, ..., βm) ∈ SNchk . approach to detect opaque predicates [6, 124]–[126], but it

12 is vulnerable to many challenges, such as handling symbolic 2) Security of Graded Encoding: Graded encoding is arrays and concurrent programs [127]. very powerful, Sahai and Waters [139] showed that INDP Moreover, The Rice’s Theorem [128] implies that automated obfuscation can serve as a center for many cryptographic attackers would suffer theoretical limitations because whether applications. Moreover, several investigations (e.g., [138, a deobfuscated program is equivalent to the obfuscated version 140]–[144]) showed that an INDP obfuscator can be more is undecidable. Only when the tricks are known, some powerful than merely providing INDP under idealized models. deobfuscation problems can be in NP [37]. Therefore, program Besides, current graded encoding schemes can be consid- obfuscation approaches are promising to have good resistance. ered as secure but still need to be carefully explored. Both the GGH and CLT schemes are based on a new Graded Decisional B. With Model-Oriented Obfuscation Diffie-Hellman (GGDH) hardness assumption for multilinear maps. The community generally agrees that the security of Current model-oriented obfuscation approaches can be GGDH should be further explored, because it cannot be considered as secure but unusable. We may hope to achieve reduced to other well-established hardness assumptions, such some usable obfuscation applications if the performance as and NTRU for encryption over lattices [145]. Indeed, there metrics can be improved or the security requirement can are several investigations on cryptanalysis (e.g., [146]–[149]) be weakened. Our evidence is two folds. Firstly, there or proposing newly patched schemes (e.g., [142, 150]–[152]). are several obfuscation implementations and applications However, no severe security flaw has been founded so far that which demonstrate the usability issue. Secondly, existing would obsolesce the approach. investigations are optimistic about the security of model- oriented obfuscation. C. With new Evaluation Properties 1) Usability Issues: As the development of model-oriented There are two investigations (i.e., [21, 22]) which coincide obfuscation, several investigations begin focusing on applica- with our results. Kuzurin et al. [22] observed that there are tion issues, such as [10, 129]–[131]. Apon et al. [129] imple- considerable gaps between practical and theoretical obfus- mented a full obfuscation solution based on the CLT graded cation. On one hand, the security properties in theoretical encoding. It can obfuscate programs written in SAGE [132], or model-oriented obfuscation are too strong; on the other a Python library for algebraic operations. Due to the perfor- hand, we have no formal security evaluation approach for mance issue, they only demonstrated single-bit identity practical or code-oriented obfuscation. They proposed to gate, AND gate, XOR gate, and point functions. Even for an design specific security properties for particular application 8-bit point function, it takes hours to obfuscate the program scenarios, such as constant hiding and predict obfuscation. and several minutes to evaluate the program. Besides, the size Preda and Giacobazziet al. [21] found that existing obfuscation of the resulting program is several gigabytes. Lewi et al. [10] evaluation metrics are textual or syntactic, which ignore the implemented another obfuscator that can run on top of either semantics. They propose to employ a semantic-based approach libCLT [115] or GGHLite [133, 134], which are open-source to evaluate the potency of obfuscation. To this end, they libraries of multilinear maps. The input program should be employ abstract interpretations to model the syntactic trans- written in Cryptol [135], which is a formation of obfuscation with semantic-based approach [153]. for design cryptography algorithms. They also evaluated the A semantic-based obfuscation transformation is defined as performance when obfuscating point functions. With a better τ[P]. The obfuscation τ is potent if there is a property α hardware configuration, it can obfuscate 40-bit point functions such that α(S[P]) 6= α(S[τ[P]]). Such properties can be as in minutes and evaluate the program in seconds. However, simple as the sign ({+, −, 0}) of a variable or a complex the obfuscated program sizes are hundreds of megabytes watermarking [154]. Moreover, the authors have conducted or even several gigabytes. Their results show that CLT has several preliminary investigations (e.g., [23, 24, 155, 156]) on better performance over GGH for small-size point functions, employing the ideas to obfuscate simple programs. but the advantage declines when the program size grows. Note that all the investigations are still very preliminary. Halevi et al. [131] implemented a simplified version of There is still a large room for improvement in secure and the graph-induced multi-linear maps [136] which should usable obfuscation. outperform the CLT scheme when the number of branching program states grows. However, their evaluation results have VIII. CONCLUSIONS not shown fundamental changes of the performance. To conclude, this work explores secure and usable program To summerize, we can find two usability issues in such obfuscation in the literature. We have surveyed both existing investigations. Firstly, the costs are unacceptable even when code-oriented obfuscation approaches and model-oriented obfuscating straightforward mathematical expressions. The obfuscation approaches, which exhibit gaps and connections in other issue is that current obfuscation implementations only between. Our primary result is that we do not have secure and focus on elementary mathematical expressions, such as XOR, usable program obfuscation approaches, and the main reason is point functions, and conjecture normal forms [137, 138]. We we lack appropriate evaluation metrics considering both secu- do not know how to handle other advanced mathematical rity and usability. Firstly, we have no adequate security metrics operations, not to mention complex code syntax. to evaluate code-oriented obfuscation approaches. Secondly,

13 the performance requirement for model-oriented obfuscation [23] M. Dalla Preda, M. Madou, K. De Bosschere, and R. Giacobazzi, approaches is too weak, and the security requirements might “Opaque predicates detection by abstract interpretation,” in Algebraic Methodology and Software Technology. Springer, 2006. be too strong. Moreover, we do not know how to apply model- [24] M. Dalla Preda, “Code obfuscation and malware detection by abstract oriented approaches to obfuscating general codes. Our survey interpretation,” Ph.D. dissertation, 2007. and result would urge the communities to rethink the notion [25] C. Collberg, C. Thomborson, and D. Low, “A taxonomy of obfuscating transformations,” The University of Auckland, Tech. Rep., 1997. of security and usable program obfuscation and facilitate the [26] M. Madou, L. Van Put, and K. De Bosschere, “Loco: An interactive development of such obfuscation approaches in the future. code (de) obfuscation tool,” in Proc. of the 2006 ACM SIGPLAN Symposium on Partial Evaluation and Semantics-based Program Manipulation, 2006. REFERENCES [27] S. Chow, P. Eisen, H. Johnson, and P. C. Van Oorschot, “White-box International Workshop [1] J. Cappaert, “Code obfuscation techniques for software protection,” cryptography and an AES implementation,” in Ph.D. dissertation, 2012. on Selected Areas in Cryptography. Springer, 2002. [2] S. K. Udupa, S. K. Debray, and M. Madou, “Deobfuscation: reverse [28] ——, “A white-box DES implementation for DRM applications,” in engineering obfuscated code,” in Proc. of the 12th IEEE Working ACM Workshop on Digital Rights Management. Springer, 2002. Conference on Reverse Engineering, 2005. [29] J.-M. Borello and L. M´e, “Code obfuscation techniques for [3] S. Chandrasekharan and S. Debray, “Deobfuscation: improving reverse metamorphic viruses,” Journal in Computer Virology, 2008. engineering of obfuscated code,” 2005. [30] C. Linn and S. Debray, “Obfuscation of executable code to improve [4] Y. Guillot and A. Gazet, “Automatic binary deobfuscation,” Journal in resistance to static disassembly,” in Proc. of the 10th ACM Conference Computer Virology, 2010. on Computer and Communications Security, 2003. [5] K. Coogan, G. Lu, and S. Debray, “Deobfuscation of virtualization- [31] J.-T. Chan and W. Yang, “Advanced obfuscation techniques for java obfuscated software: a semantics-based approach,” in Proc. of the 18th bytecode,” Journal of Systems and Software, 2004. ACM Conference on Computer and Communications Security, 2011. [32] I. V. Popov, S. K. Debray, and G. R. Andrews, “Binary obfuscation [6] B. Yadegari, B. Johannesmeyer, B. Whitely, and S. Debray, “A generic using signals,” in Usenix Security, 2007. approach to automatic deobfuscation of executable code,” in Proc. of [33] S. M. Darwish, S. K. Guirguis, and M. S. Zalat, “Stealthy the 2015 IEEE Symposium on Security and Privacy, 2015. code obfuscation technique for software security,” in Proc. of the [7] B. Bichsel, V. Raychev, P. Tsankov, and M. Vechev, “Statistical International Conference on Computer Engineering and Systems. deobfuscation of android applications,” in Proc. of the ACM SIGSAC IEEE, 2010. Conference on Computer and Communications Security, 2016. [34] F. B. Cohen, “Operating system protection through program evolution,” [8] S. Garg, C. Gentry, S. Halevi, M. Raykova, A. Sahai, and B. Waters, Computers & Security, 1993. “Candidate indistinguishability obfuscation and functional encryption [35] Z. Lin, R. D. Riley, and D. Xu, “Polymorphing software by for all circuits,” in Proc. of the 54th IEEE Annual Symposium on randomizing data structure layout,” in Detection of Intrusions and Foundations of Computer Science, 2013. Malware, and Vulnerability Assessment. Springer, 2009. [9] J. Zimmerman, “How to obfuscate programs directly,” in Annual [36] H. Xu, Y. Zhou, and M. R. Lyu, “N-version obfuscation,” in Proc. of the International Conference on the Theory and Applications of 2nd ACM International Workshop on Cyber-Physical System Security, Cryptographic Techniques. Springer, 2015. 2016. [10] K. Lewi, A. J. Malozemoff, D. Apon, B. Carmer, A. Foltzer, D. Wagner, [37] A. Appel, “Deobfuscation is in NP,” Princeton University, 2002. D. W. Archer, D. Boneh, J. Katz, and M. Raykova, “5gen: A [38] T. Ogiso, Y. Sakabe, M. Soshi, and A. Miyaji, “Software obfuscation framework for prototyping applications using multilinear maps and on a theoretical basis and its implementation,” IEICE Trans. on matrix branching programs,” in Proc. of the 23rd ACM SIGSAC Fundamentals of Electronics, Communications and Computer Sciences, Conference on Computer and Communications Security, 2016. 2003. [11] C. Collberg, C. Thomborson, and D. Low, “Manufacturing cheap, [39] IDA, https://www.hex-rays.com/products/ida/, 2017. resilient, and stealthy opaque constructs,” in Proc. of the 25th [40] C. Simonyi, “Hungarian notation,” MSDN Library, 1999. ACM SIGPLAN-SIGACT Symposium on Principles of Programming [41] ProGuard, http://developer.android.com/tools/help/proguard.html, Languages, 1998. 2016. [12] B. Barak, O. Goldreich, R. Impagliazzo, S. Rudich, A. Sahai, [42] D. Low, “Protecting java code via code obfuscation,” Crossroads, 1998. S. Vadhan, and K. Yang, “On the (im) possibility of obfuscating [43] G. Wroblewski, “General method of program code obfuscation,” Ph.D. programs,” in Advances in CryptologyCRYPTO. Springer, 2001. dissertation, Wroclaw University of Technology, 2002. [13] A. Balakrishnan and C. Schulze, “Code obfuscation literature survey,” [44] T. J. McCabe, “A complexity measure,” IEEE Trans. on Software CS701 Construction of Compilers, 2005. Engineering, 1976. [14] A. Majumdar, C. Thomborson, and S. Drape, “A survey of control-flow [45] W. A. Harrison and K. I. Magel, “A complexity measure based on obfuscations,” in Information Systems Security. Springer, 2006. nesting level,” ACM Sigplan Notices, 1981. [15] S. Drape et al., “Intellectual property protection using obfuscation,” [46] G. Arboit, “A method for watermarking java programs via opaque Proc. of SAS, 2009. predicates,” in The Fifth International Conference on Electronic [16] K. A. Roundy and B. P. Miller, “Binary-code obfuscations in prevalent Commerce Research, 2002. packer tools,” ACM Computing Surveys, 2013. [47] M. I. Sharif, A. Lanzi, J. T. Giffin, and W. Lee, “Impeding malware [17] S. Schrittwieser, S. Katzenbeisser, J. Kinder, G. Merzdovnik, and analysis using conditional code obfuscation.” in NDSS, 2008. E. Weippl, “Protecting software throughobfuscation: can it keep pace [48] W. Zhu and C. Thomborson, “A provable scheme for homomorphic with progress in code analysis?” ACM Computing Surveys, 2016. obfuscation in software security,” in The IASTED International [18] M. Horv´ath and L. Butty´an, “The birth of cryptographic obfuscation-a Conference on Communication, Network and , survey,” 2016. 2005. [19] B. Barak, “Hopes, fears, and software obfuscation,” Communications [49] A. Moser, C. Kruegel, and E. Kirda, “Limits of static analysis for of the ACM, 2016. malware detection,” in Proc. of the 23rd Annual [20] S. Schrittwieser and S. Katzenbeisser, “Code obfuscation against static Applications Conference. IEEE, 2007. and dynamic reverse engineering,” in Information Hiding. Springer, [50] B. Selman, D. G. Mitchell, and H. J. Levesque, “Generating hard 2011. satisfiability problems,” Artificial Intelligence, 1996. [21] M. Dalla Preda and R. Giacobazzi, “Semantic-based code obfuscation [51] R. Tiella and M. Ceccato, “Automatic generation of opaque constants by abstract interpretation,” Automata, Languages and Programming, based on the k-clique problem for resilient data obfuscation,” in Proc. 2005. of the 24th International Conference on Software Analysis, Evolution [22] N. Kuzurin, A. Shokurov, N. Varnovsky, and V. Zakharov, “On the and Reengineering. IEEE, 2017. concept of software obfuscation in computer security,” in Information [52] Z. Wang, J. Ming, C. Jia, and D. Gao, “Linear obfuscation to combat Security. Springer, 2007. symbolic execution,” in ESORICS. Springer, 2011.

14 [53] C. Wang, J. Hill, J. Knight, and J. Davidson, “Software tamper [79] W. Zhu, C. Thomborson, and F.-Y. Wang, “Applications of resistance: Obstructing static analysis of programs,” University of homomorphic functions to software obfuscation,” in Intelligence and Virginia, Tech. Rep., 2000. Security Informatics. Springer, 2006. [54] W. Landi and B. G. Ryder, “Pointer-induced aliasing: a problem [80] W. F. Zhu, “Concepts and techniques in software watermarking and classification,” in Proc. of the 18th ACM SIGPLAN-SIGACT obfuscation,” Ph.D. dissertation, ResearchSpace, Auckland, 2007. Symposium on Principles of Programming Languages, 1991. [81] M. Sosonkin, G. Naumovich, and N. Memon, “Obfuscation of design [55] A. Majumdar and C. Thomborson, “Manufacturing opaque predicates intent in object-oriented applications,” in Proc. of the Third ACM in distributed systems for code obfuscation,” in Proc. of the 29th workshop on Digital Rights Management, 2003. Australasian Computer Science Conference. Australian Computer [82] S. R. Chidamber and C. F. Kemerer, “A metrics suite for object oriented Society, 2006. design,” IEEE Trans. on Software Engineering, 1994. [56] D. Dolz and G. Parra, “Using exception handling to build opaque [83] Z. Xin, H. Chen, H. Han, B. Mao, and L. Xie, “Misleading malware predicates in intermediate code obfuscation techniques,” Journal of similarities analysis by automatic data structure obfuscation,” in Computer Science & Technology, 2008. Information Security. Springer, 2010. [57] H. Chen, L. Yuan, X. Wu, B. Zang, B. Huang, and P.-c. Yew, “Control [84] I. You and K. Yim, “Malware obfuscation techniques: a brief survey,” in flow obfuscation with information flow tracking,” in Proc. of the 42nd Proc. of International Conference on Broadband, Wireless Computing, Annual IEEE/ACM International Symposium on Microarchitecture, Communication and Applications, 2010. 2009. [85] S. Forrest, A. Somayaji, and D. H. Ackley, “Building diverse computer [58] J. Palsberg, S. Krishnaswamy, M. Kwon, D. Ma, Q. Shao, and systems,” in Proc. of the sixth IEEE Workshop on Hot Topics in Y. Zhang, “Experience with software watermarking,” in Proc. of the Operating Systems, 1997. 16th IEEE Annual Computer Security Applications Conference, 2000. [86] S. J. Crane, S. Volckaert, F. Schuster, C. Liebchen, P. Larsen, L. Davi, A.-R. Sadeghi, T. Holz, B. De Sutter, and M. Franz, “It’s a TRaP: [59] J. Cappaert and B. Preneel, “A general model for hiding control table randomization and protection against function-reuse attacks,” flow,” in Proc. of the 10th Annual ACM Workshop on Digital Rights in Proc. of the 22nd ACM SIGSAC Conference on Computer and Management, 2010. Communications Security, 2015. [60] C. Wang, J. Davidson, J. Hill, and J. Knight, “Protection of software- [87] O. Goldreich and R. Ostrovsky, “Software protection and simulation based survivability mechanisms,” in Proc. of the IEEE International on oblivious RAMs,” Journal of the ACM, 1996. Conference on Dependable Systems and Networks, 2001. [88] S. Hada, “Zero-knowledge and code obfuscation,” in ASIACRYPT. [61] S. Chow, Y. Gu, H. Johnson, and V. A. Zakharov, “An approach to Springer, 2000. the obfuscation of control-flow of sequential computer programs,” in [89] S. Goldwasser and Y. T. Kalai, “On the impossibility of obfuscation Information Security. Springer, 2001. with auxiliary input,” in Proc. of the 46th Annual IEEE Symposium on [62] T. L´aszl´o and A. Kiss, “Obfuscating c++ programs via control Foundations of Computer Science, 2005. flow flattening,” Annales Universitatis Scientarum Budapestinensis de [90] N. Bitansky, R. Canetti, H. Cohn, S. Goldwasser, Y. T. Kalai, O. Paneth, Rolando E¨otv¨os Nominatae, Sectio Computatorica, 2009. and A. Rosen, “The impossibility of obfuscation with auxiliary input [63] J. Ge, S. Chaudhuri, and A. Tyagi, “Control flow based obfuscation,” or a universal simulator,” in International Cryptology Conference. in Proc. of the 5th ACM Workshop on Digital Rights Management, Springer, 2014. 2005. [91] R. Canetti, “Towards realizing random oracles: hash functions that hide [64] X. Zhang, F. He, and W. Zuo, Theory and practice of program all partial information,” in Advances in Cryptology. Springer, 1997. obfuscation. INTECH Open Access Publisher, 2010. [92] B. Barak, O. Goldreich, R. Impagliazzo, S. Rudich, A. Sahai, [65] V. Balachandran and S. Emmanuel, “Software code obfuscation by S. Vadhan, and K. Yang, “On the (im) possibility of obfuscating hiding control flow information in stack,” in Proc. of the IEEE programs,” in Journal of the ACM, 2012. International Workshop on Information Forensics and Security, 2011. [93] S. Goldwasser and G. N. Rothblum, “On best-possible obfuscation,” [66] C. Domas, “The movfuscator: Turning ’move’ into a soul-crushing RE in Theory of Cryptography. Springer, 2007. nightmare,” REcon, 2015. [94] H. Lin, R. Pass, K. Seth, and S. Telang, “Indistinguishability [67] A. Kovacheva, “Efficient code obfuscation for android,” in Interna- obfuscation with non-trivial efficiency,” in IACR International tional Conference on Advances in Information Technology. Springer, Workshop on Public Key Cryptography. Springer, 2016. 2013. [95] O. Regev, “On lattices, learning with errors, random linear codes, and [68] D. Bohannon and L. Holmes, “Revoke-obfuscation: powershell cryptography,” in STOC, 2005. obfuscation detection using science,” BlackHat, 2017. [96] N. Bitansky and R. Canetti, “On strong simulation and composable [69] S. Dolan, “mov is turing-complete,” 2013. point obfuscation,” in Annual Cryptology Conference. Springer, 2010. [70] L. M. Wills, “Automated program recognition: a feasibility demonstra- [97] E. Boyle, K.-M. Chung, and R. Pass, “On extractability obfuscation,” tion,” Artificial Intelligence, 1990. in Theory of Cryptography Conference. Springer, 2014. [71] M. Madou, B. Anckaert, B. De Bus, K. De Bosschere, J. Cappaert, [98] E. Boyle and R. Pass, “Limits of extractability assumptions with and B. Preneel, “On the effectiveness of transformations distributional auxiliary input,” in International Conference on the for binary obfuscation,” in Proc. of the International Conference on Theory and Application of Cryptology and Information Security. Software Engineering Research and Practice. CSREA Press, 2006. Springer, 2014. [99] S. Garg, C. Gentry, S. Halevi, and D. Wichs, “On the implausibility [72] L. Ertaul and S. Venkatesh, “Jhide-a tool kit for code obfuscation.” in of differing-inputs obfuscation and extractable witness encryption with IASTED Conf. on Software Engineering and Applications, 2004. auxiliary input,” in International Cryptology Conference. Springer, [73] P. Wang, S. Wang, J. Ming, Y. Jiang, and D. Wu, “Translingual 2014. obfuscation,” 2016. [100] Y. Ishai, O. Pandey, and A. Sahai, “Public-coin differing-inputs [74] A. Majumdar, S. J. Drape, and C. D. Thomborson, “Slicing obfuscation and its applications,” in Theory of Cryptography obfuscations: design, correctness, and evaluation,” in Proc. of the 2007 Conference. Springer, 2015. ACM workshop on Digital Rights Management, 2007. [101] P. Ananth, D. Boneh, S. Garg, A. Sahai, and M. Zhandry, “Differing- [75] C. Collberg, C. Thomborson, and D. Low, “Breaking abstractions inputs obfuscation and applications.” IACR Cryptology ePrint Archive, and unstructuring data structures,” in Proc. of the IEEE International 2013. Conference on Computer Languages, 1998. [102] N. Bitansky, R. Canetti, Y. T. Kalai, and O. Paneth, “On virtual [76] L. Ertaul and S. Venkatesh, “Novel obfuscation algorithms for software grey box obfuscation for general circuits,” in International Cryptology security,” in Proc. of the 2005 International Conference on Software Conference. Springer, 2014. Engineering Research and Practice. Citeseer, 2005. [103] D. Boneh, A. Sahai, and B. Waters, “Functional encryption: Definitions [77] S. Drape et al., Obfuscation of abstract data types. Citeseer, 2004. and challenges,” in Theory of Cryptography Conference. Springer, [78] K. Fukushima, S. Kiyomoto, T. Tanaka, and K. Sakurai, “Analysis 2011. of program obfuscation schemes with variable encoding technique,” [104] S. Garg, C. Gentry, S. Halevi, M. Raykova, A. Sahai, and B. Waters, IEICE Transactions on Fundamentals of Electronics, Communications “Candidate indistinguishability obfuscation and functional encryption and Computer Sciences, 2008. for all circuits (full version),” in Cryptology ePrint Archive, 2013.

15 [105] B. Applebaum and Z. Brakerski, “Obfuscating circuits via composite- [131] S. Halevi, T. Halevi, V. Shoup, and N. Stephens-Davidowitz, order graded encoding,” TCC, 2015. “Implementing BP-obfuscation using graph-induced encoding.” in [106] D. A. Barrington, “Bounded-width polynomial-size branching pro- Proc. of the 24th ACM SIGSAC Conference on Computer and grams recognize exactly those languages in NC1,” in Proc. of the 18th Communications Security, 2017. Annual ACM Symposium on Theory of Computing, 1986. [132] W. Stein and D. Joyner, “Sage: System for algebra and geometry [107] P. Ananth, D. Gupta, Y. Ishai, and A. Sahai, “Optimizing obfuscation: experimentation,” ACM SIGSAM Bulletin, 2005. avoiding Barrington’s theorem,” in Proc. of the 2014 ACM SIGSAC [133] A. Langlois, D. Stehl´e, and R. Steinfeld, “GGHLite: more efficient Conference on Computer and Communications Security, 2014. multilinear maps from ideal lattices,” in Annual International [108] A. Sahai and M. Zhandry, “Obfuscating low-rank matrix branching Conference on the Theory and Applications of Cryptographic programs.” IACR Cryptology ePrint Archive, 2014. Techniques. Springer, 2014. [109] D. Boneh, K. Lewi, M. Raykova, A. Sahai, M. Zhandry, and [134] M. R. Albrecht, C. Cocis, F. Laguillaumie, and A. Langlois, J. Zimmerman, “Semantically secure order-revealing encryption: multi- “Implementing candidate graded encoding schemes from ideal lattices,” input functional encryption without obfuscation.” EUROCRYPT, 2015. in Proc. of the International Conference on the Theory and Application [110] M. Sauerhoff, I. Wegener, and R. Werchner, “Formula size over the full of Cryptology and Information Security. Springer, 2014. binary basis?” in Proc. of the 16th Annual Symposium on Theoretical [135] J. R. Lewis and B. Martin, “Cryptol: high assurance, retargetable Aspects of Computer Science. Springer, 1999. crypto development and validation,” in Proc. of the 2003 Military [111] J. Kilian, “Founding crytpography on oblivious transfer,” in Proc. of Communications Conference. IEEE. the 20th Annual ACM Symposium on Theory of Computing, 1988. [136] C. Gentry, S. Gorbunov, and S. Halevi, “Graph-induced multilinear [112] W. Diffie and M. E. Hellman, “Multiuser cryptographic techniques,” in maps from lattices,” in Theory of Cryptography Conference. Springer, Proc. of the National Computer Conference and Exposition. ACM, 2015. 1976. [137] Z. Brakerski and G. N. Rothblum, “Obfuscating conjunctions,” in [113] D. Boneh and A. Silverberg, “Applications of multilinear forms to Advances in Cryptology. Springer, 2013. cryptography,” Contemporary Mathematics, 2003. [138] ——, “Black-box obfuscation for d-CNFs,” in Proc. of the 5th [114] S. Garg, C. Gentry, and S. Halevi, “Candidate multilinear maps from conference on Innovations in Theoretical Computer Science. ACM, ideal lattices,” in Proc. of the Annual International Conference on the 2014. Theory and Applications of Cryptographic Techniques. Springer, 2013. [139] A. Sahai and B. Waters, “How to use indistinguishability obfuscation: [115] J.-S. Coron, T. Lepoint, and M. Tibouchi, “Practical multilinear maps deniable encryption, and more,” in Proc. of the 46th Annual ACM over the integers,” in Advances in Cryptology. Springer, 2013. Symposium on Theory of Computing, 2014. [116] ——, “New multilinear maps over the integers,” in Annual Cryptology [140] R. Canetti and V. Vaikuntanathan, “Obfuscating branching programs Conference. Springer, 2015. using black-box pseudo-free groups.” IACR Cryptology ePrint Archive, [117] J. Zimmerman, “How to obfuscate programs directly,” IACR 2013. Cryptology ePrint Archive, 2014. [141] Z. Brakerski and G. N. Rothblum, “Virtual black-box obfuscation for [118] P. Junod, J. Rinaldini, J. Wehrli, and J. Michielin, “Obfuscator-LLVM: all circuits via generic graded encoding,” in Theory of Cryptography. software protection for the masses,” 2015. Springer, 2014. [119] B. Anckaert, M. Madou, B. De Sutter, B. De Bus, K. De Bosschere, and [142] B. Barak, S. Garg, Y. T. Kalai, O. Paneth, and A. Sahai, B. Preneel, “Program obfuscation: a quantitative approach,” in Proc. “Protecting obfuscation against algebraic attacks,” in Proc. of the of the ACM Workshop on Quality of Protection, 2007. Annual International Conference on the Theory and Applications of [120] M. Ceccato, M. Di Penta, J. Nagra, P. Falcarin, F. Ricca, M. Torchiano, Cryptographic Techniques. Springer, 2014. and P. Tonella, “Towards experimental evaluation of code obfuscation [143] N. Bitansky and V. Vaikuntanathan, “Indistinguishability obfuscation: techniques,” in Proc. of the 4th ACM Workshop on Quality of from approximate to exact,” in Theory of Cryptography Conference. Protection, 2008. Springer, 2016. [121] ——, “The effectiveness of source code obfuscation: An experimental [144] M. Mahmoody, A. Mohammed, and S. Nematihaji, “More on assessment,” in Proc. of the 17th IEEE International Conference on impossibility of virtual black-box obfuscation in idealized models.” Program Comprehension, 2009. IACR Cryptology ePrint Archive, 2015. [122] M. Ceccato, A. Capiluppi, P. Falcarin, and C. Boldyreff, “A large study [145] J. Hoffstein, J. Pipher, and J. H. Silverman, “Ntru: A ring-based on the effect of code obfuscation on the quality of java code,” Empirical public key cryptosystem,” in International Algorithmic Number Theory Software Engineering, 2015. Symposium. Springer, 1998. [123] J. Raber and E. Laspe, “Deobfuscator: an automated approach to the [146] Y. Hu and H. Jia, “Cryptanalysis of GGH map,” in Annual International identification and removal of code obfuscation,” in Proc. of the 14th Conference on the Theory and Applications of Cryptographic Working Conference on Reverse Engineering. IEEE, 2007. Techniques. Springer, 2016. [124] B. Yadegari and S. Debray, “Symbolic execution of obfuscated code,” [147] B. Minaud and P.-A. Fouque, “Cryptanalysis of the new multilinear in Proc. of the 22nd ACM SIGSAC Conference on Computer and map over the integers,” IACR Cryptology ePrint Archive, 2015. Communications Security, 2015. [148] J. H. Cheon, P.-A. Fouque, C. Lee, B. Minaud, and H. Ryu, [125] J. Ming, D. Xu, L. Wang, and D. Wu, “LOOP: logic-rriented opaque “Cryptanalysis of the new CLT multilinear map over the integers,” predicate detection in obfuscated binary code,” in Proc. of the 22nd in Annual International Conference on the Theory and Applications of ACM SIGSAC Conference on Computer and Communications Security, Cryptographic Techniques. Springer, 2016. 2015. [149] J.-S. Coron, M. S. Lee, T. Lepoint, and M. Tibouchi, “Cryptanalysis [126] S. Banescu, C. Collberg, and A. Pretschner, “Predicting the resilience of GGH15 multilinear maps,” in Annual Cryptology Conference. of obfuscated code against symbolic execution attacks via machine Springer, 2016. learning,” in USENIX Security Symposium, 2017. [150] D. Boneh, D. J. Wu, and J. Zimmerman, “Immunizing multilinear maps [127] X. Hui, Z. Yangfan, K. Yu, and R. L. Michael, “Concolic execution against zeroizing attacks.” IACR Cryptology ePrint Archive, 2014. on small-size binaries: challenges and empirical study,” in Proc. of the [151] S. Garg, P. Mukherjee, and A. Srinivasan, “Obfuscation without the 47th IEEE/IFIP International Conference on Dependable Systems & vulnerabilities of multilinear maps.” IACR Cryptology ePrint Archive, Networks, 2017. 2016. [128] J. E. Hopcroft, R. Motwani, and J. D. Ullman, “Automata theory, [152] S. Badrinarayanan, E. Miles, A. Sahai, and M. Zhandry, “Post-zeroizing languages, and computation,” International Edition, 2006. obfuscation: new mathematical tools, and the case of evasive circuits,” [129] D. Apon, Y. Huang, J. Katz, and A. J. Malozemoff, “Implementing in Annual International Conference on the Theory and Applications of cryptographic program obfuscation.” IACR Cryptology ePrint Archive, Cryptographic Techniques. Springer, 2016. 2014. [153] P. Cousot and R. Cousot, “Systematic design of program transformation [130] M. R. Brent Carmer, Alex J. Malozemoff, “5Gen-C: multi-input frameworks by abstract interpretation,” in ACM SIGPLAN Notices, functional encryption and program obfuscation for arithmetic circuits,” 2002. in Proc. of the 24th ACM SIGSAC Conference on Computer and [154] ——, “An abstract interpretation-based framework for software Communications Security, 2017. watermarking,” in ACM SIGPLAN Notices. ACM, 2004.

16 [155] M. Dalla Preda and R. Giacobazzi, “Control code obfuscation by perspectives in code obfuscation and watermarking,” in Proc. of the abstract interpretation,” in Proc. of the third IEEE International sixth IEEE International Conference on Software Engineering and Conference on Software Engineering and Formal Methods, 2005. Formal Methods, 2008. [156] R. Giacobazzi, “Hiding information in completeness holes: New

17