Frankencode: Creating Diverse Programs Using Code Clones

Hayley Borck, Mark Boddy, Ian J De Silva, Steven Harp, Ken Hoyme, Steven Johnston, August Schwerdfeger, and Mary Southern Adventium Labs, Minneapolis, MN 55401 Email: fi[email protected]

Abstract enable not just new attacks, but new kinds of attacks. Apparently-minor vulnerabilities can be chained to- In this paper, we present an approach to detecting gether in successful attacks, even on that is novel cyber attacks though a form of program diversifi- extensively tested and generally viewed as secure.1 cation, similar to the use of n-version programming for In this paper, we describe our research on the Func- fault tolerant systems. Building on extensive previous tionally Equivalent Variants using Information Syn- and ongoing work by others on the use of code clones chronization (FEVIS) system. Part of the Cyber Fault- in a wide variety of areas, our Functionally Equivalent tolerant Attack Recovery (CFAR) program, funded by Variants using Information Synchronization (FEVIS) the U.S. Government’s Defense Advanced Research system automatically generates program variants to be Projects Agency (DARPA), FEVIS builds on previ- run in parallel, seeking to detect attacks through diver- ous and ongoing work on “code clones,” substituting gence in behavior. Unlike approaches to diversification redundant code fragments as a means to generate that only change program memory layout and behavior, program variants automatically. These variants are in- FEVIS can detect attacks exploiting vulnerabilities in tended for use in a multi-variant execution environ- execution timing, string processing, and other logic ment, to be used for attack detection and resistance in errors. the presence of both known and unknown attacks. We We are in the early stages of research and devel- are in the early stages of research and development opment for this approach, but have made sufficient for this approach, but have made sufficient progress to progress to provide a proof of concept and some provide a proof of concept and some lessons learned. lessons learned. In this paper we describe FEVIS In the rest of this paper, we describe related work and and its application to diversifying an open-source previous research on which FEVIS builds (Section 2). webserver, with results on several different example We then present FEVIS itself (Section 3), illustrating classes of attack which FEVIS will detect. the approach through the use of examples drawn from the diversification of an open-source webserver (Sec- 1. Introduction tion 4). We conclude with a discussion of ongoing and future research. Software vulnerabilities and the cyber attacks en- abled by them are increasingly prevalent, and continue to increase in both cost and impact. Critical civilian 2. Related Work and military software systems are vulnerable, with new vulnerabilities being discovered all the time. Vulner- abilities enabling attacks can remain undetected for FEVIS applies concepts and techniques from the years, as with the Heartbleed bug in OpenSSL [1]. study of code redundancy as a means of diversifying Worse, both known and unknown vulnerabilities may software for use in multi-variant execution. In this This research was developed with funding from the Defense Ad- section, we summarize related work in these three vanced Research Projects Agency (DARPA) under contract FA8750- areas. 15--0110. The views and/or findings contained in this article are those of the author and should not be interpreted as representing the official views or policies of the Department of Defense or the US Government. DISTRIBUTION STATEMENT A: Approved for public 1. The multi-step attacks on Chromium detailed in “A Tale of Two release; distribution is unlimited. Pwnies” provide good examples [2]. 2.1. Multi-Variant Execution the logical structure (i.e., the semantics) of the pro- gram itself. Cross-site scripting, use after free, di- Multi-variant execution is an approach to detecting rectory traversal, SQL and OS command injection, and forestalling attacks, in which multiple variants of and unauthorized disclosure via URL string processing an application are run in parallel. These variants are vulnerabilities are among the many forms of attack specifically chosen to provide the same results for that may not be blocked or detected by diversification an authorized range of inputs, but to diverge under measures that seek to preserve program control flow attack. Multi-variant execution has been an active and semantics. Nor will many of them be caught by research topic for several years, with different research more sophisticated approaches such as automated code groups proposing alternative architectures for variant obfuscation or restructuring, if the semantics of the synchronization and checking, as well as methods modified program remain the same. for variant generation including Address Space Lay- These are the kinds of attacks that we seek to detect out Randomization (ASLR); Intelligence, Surveillance, using variants generated by FEVIS. and Reconnaissance (ISR); and buffer padding. For example, Salamat et al. [3] describe a Multi-Variant 2.3. Code Clones Execution Environment (MVEE), using variants in which the stack grows in opposite directions. Nguyen- Our approach takes as a starting point the well- Tuong et al. devised a multi-variant system using data known concept of code clones, in which redundant transformations derived from N-variant systems [4]. code sections can be substituted one for another. The Multi-variant execution is somewhat different from definition of redundancy that we follow is taken from the use of diversification as a “moving target de- Carzaniga et al. [6], who define code redundancy as fense” (MTD), in which the objective is to present the product of observational equivalence and execu- an attacker with a single binary, but one which has tion distance.3 Observational equivalence is defined in unpredictable differences from previous instances of terms of probing code, code which is input to code the same program they may have investigated. One fragment to determine outputs and an oracle. Both critical difference is that multi-variant execution is are appended to the code fragments being tested for secretless, in the sense that an attacker may know about equivalence. This definition allows us to define “func- how variants are generated, may even know which tional equivalence” in terms of authorized or expected specific variants are currently running, without that input, rather than across the entire space of possible information enabling an attack. In contrast, an attacker input. Carzaniga et al. evaluate several definitions for given time to investigate a single variant, may discover execution distance, settling on data projection, which what has been changed and be able to adjust their is measured in terms of the difference in memory attack accordingly.2 locations read and written, as well as what is written to them. Figure 1 provides two examples of redundant code fragments according to this definition. 2.2. Program Diversification Code clones have been employed for a wide variety of uses, including the automatic generation of test or- One common through much previous research acles, self-healing code, fault tolerance, and automatic on program diversification for multi-variant execution test generation. For example Carzaniga [8] creates test is the preservation of program semantics in a very oracles by finding redundancies in code and cross strong sense. ASLR relocates code and data in the checking the execution of a test with the execution of running program, preserving control flow. The same the same test on a variant of the code with redundant is true for techniques such as stack or heap padding, operations replaced. or ISR. Program transformation approaches that seek Clones may either be discovered or constructed, to defeat attacks upon a single binary, such as the both approaches having some currency in the research addition of stack canaries, might plausibly be viewed literature. The prevalence of redundant code fragments as providing diversification. In this case, the altered in large software systems such as the kernel and unaltered programs constitute a set of two. has been extensively documented, as discussed in [8], However, many of the most common exploits, en- abling the most devastating attacks, exploit holes in 3. This is distinct from the more general use of the term “clone” as used (for example) in [7], where clones may differ not at all, or only in layout and choice of identifiers. Clones in this larger sense 2. Larsen et al. [5] provide a good summary of diversification have been applied for purposes such as detecting plagiarism and methods applied as MTD. copyright infringement. of the methods, and the “structural” distance of methods within to have the same result and equivalent side-effects (or state the package hierarchy. The most interesting aspects of the work changes) as A. of Higo and Kusumoto is that, perhaps thanks to the simplicity Other studies on semantically equivalent code adopt a purely of their analysis, they were able to analyze a massive amount functional notion of equivalence, and therefore assume no of code. visible state changes [11]. Yet others consider state changes to be part of the input/output transformation of code fragments, III. CHARACTERIZING SOFTWARE REDUNDANCY but then accept only identical state changes [20]. Instead, we We now formalize our notion of redundancy. We first give would still consider two fragments to be equivalent even if an abstract and general definition that we then specialize to they produce different state changes, as long as the observable develop a practical measurement method. effects of those changes are identical. This notion is close to the We are interested in the redundancy of code. More specif- testing equivalence proposed by De Nicola and Hennessy [27] ically we define redundancy as a relation between two code and the weak bi-similarity by Hennessy and Milner [17]. We fragments within a larger system. A code fragment is any now formulate an initial definition of equivalence between code portion of code together with the necessary linkage between fragments similar to testing equivalence. that code and the rest of the system. A fragment can be seen 1) Basic Definitions: We model a system as a state machine, as the in-line expansion of a function with parameters passed and we denote with S the set of states, and with A the set of by reference, where the parameters are the linkage between all possible actions of the system. The execution of a code the fragment and its context. Figure 1 illustrates the notion fragment C starting from an initial state S0 amounts to a sequence of actions a1,a2,...,ak A that induces a sequence of redundant fragments with two examples. For each pair of a a 2 a of state transitions S 1 S 2 k S . In this model we fragments, we specify the linkage between the fragments and 0 ! 1 ! ···! k the rest of the system by listing the variables in the fragments only consider code fragments with sequential and terminating that refer to variables in the rest of the system. All other and therefore finite (but unbounded) executions, and without variables are local to the fragments. loss of generality we consider the input as being part of the Readings)in)Automated)Sonware)Tes&ng) initial state.ATD)Variants)and)Execu&on)Distance)

linkage: int x; int y; We then use O to denote the set of all possible outputs, that Var1)is, the set of all externally observable effects of an execution. int tmp = x; x ^= y; We use Out(S,a) O to denote the output corresponding to the x = y; y ^= x; 2 y = tmp; x ^= y; Var2)execution of action a starting from state S, and, generalizing, we denote with Out(S0,C) O⇤ the output of the sequence 2 linkage: AbstractMultimap map; String key, Object value; VarNof) actions a1,a2,...,ak corresponding to the execution of C from state S0. map.put(key, value); List list = new ArrayList(); 2) Observational Equivalence: We say that two code list.add(value); Variants)differ)in:) map.putAll(key, list); Figurefragments• Execu&on)&me.) 3. ATDCA variantsand CB shownare observationally running in parallel equivalent from an• Resource)requirements)(memory)and)layout))) initial state S0 if and only if, for every code fragment Fig. 1. Examples of redundant code fragments. C• System)calls)–)type,)number)and)&ming)within)the)segment.)(probing code), the output Out(S ,C ;C ) is the same as Figure 1. Redundant code fragments, from [6] 1)• PSegment)state)–)variant)sta&c)allocated)data) Partition the ATD into chunks (potential0 A variantP [Carzaniga)et,al.,,2015])define)redundancy)as)the)product)of) Out• Func&on)call)dependencies)–)par&cularly)for)variants)built)from)C)libraries.)(S ,C ;C ), where C ;C and C ;C are code fragments segments)0 B P A P B P The1. “Observa&onal)equivalence”)and) first example (first row) shows a code fragment that obtained by concatenating C and C with the probing code 11/10/15) Delivered)to)the)Government)in)Accordance)with)Contract)#FA8750I15ICI0110.)DISTRIBUTION)STATEMENT)D:)Distribu&on)authorized)A B 13) 2) Identify potentialto)DoD))and)their)DoD)contractors)only.)Further)dissemina&on)only)as)directed)by)AFRL/RI,)Rome,)NY)13441.) clones for those chunks: swaps2. twoExecu&on)distance,)measured)by)differences)in)data)read/write) integer variables using a temporary variable (left C , respectively. P a) Code chunks previously encountered and )side) and another fragment that swaps the same variables This definition requires that the two code fragments and stored in a “Diversity Cache” without the temporary by using the bitwise xor operator (right the follow-up probing code produce exactly the same output, Perfect)equivalence)with)zero)execu&on)distance)yields)zero)redundancy.) b) Redundancy within the ATD itself side). The second example refers to a multi-value map in which does not take into account the intended semantics of the ) c) Chunks from related implementations in 11/10/15) Delivered)to)the)Government)in)Accordance)with)Contract)#FA8750I15ICI0110.)DISTRIBUTION)STATEMENT)D:)Distribu&on)authorized) 11) which one canto)DoD))and)their)DoD)contractors)only.)Further)dissemina&on)only)as)directed)by)AFRL/RI,)Rome,)NY)13441.) add an individual mapping for a given key system whereby different output sequences may be equally valid other applications with similar functionality (put) or multiple mappings for the same key (putAll). In both and therefore should be considered equivalent. For example, d) Synthesized variants cases, the fragment on the left is different from, but equivalent consider a container that implements an unordered set of 3) Find or synthesize adaptors for chunks identified to, the fragment on the right. This is our intuitive definition numbers that in state S represents the set 10 . Consider in Step 2 0 { } of redundancy: two fragments are redundant when they are now a fragment C that adds element 20 to the set, and a 4) Generate ATD variantsA by swapping in clones functionally equivalent and at the same time their executions supposedly equivalent fragment C that also adds 20 to the for one or more variant segmentsB are different. We now formalize these two constituent notions. C 5)set Generatebut with a arguments different internal supporting state transformation: ATD variant A leaves thefunctional set in a state equivalence such that and an divergenceiteration would first go through A. An Abstract Notion of Redundancy 10 and then 20, while C causes the same iteration to first go The current implementationB encompasses automated We wantFigure to express 2. FEVIS the Architecture notion that oneOverview should be able to through 20 and then 10. C and C would not be considered “chunking” of the ATD andA the manualB identifica- replace a fragment A with a redundant fragment B, within a tionobservationally of potential variant equivalent segments according within the to the Diversity definition above, larger system, without changing the functionality of the system. since a probing code that iterates through the elements of the [9]. Automatically finding these redundant fragments Cache, along with the appropriate adaptors, and then This means that the execution of B would produce the same set would expose a difference. is an ongoing topic of research. The EQMiner tool automatic generation of a set of ATD variants. Tools resultsof Jiang as A andand Su would uses not random cause input any andnoticeable output differencetesting in for automatedTo account detection for the of semantics potential of code the clones system, are we consider theon future arbitrarily behavior sized ofcode the system. snippets In to other find words, functionally we want B undera more construction, general definition as is the automatedthat requires generation the output of of the two equivalent yet syntactically different code clones [9]. arguments regarding the properties of a given set of The semantic clone detection tool MeCC [10] com- ATD variants. pares abstract memory states in order to find candidate Figure 3 shows several variants for the same ATD. code clones. The shaded regions represent variant segments. The Goffi et al. [11] use genetic algorithms to generate small vertical arrows represent system calls that may functionally equivalent clones. This approach sidesteps be made within those segments, and are included to the difficulties of finding equivalent code fragments in show that rigorous synchronization of the variants at the existing code base, but at the cost of potentially system calls will generate numerous false positives. limiting the space of redundancies explored. We use the term adaptor to refer to code that may be wrapped around a given code fragment, in order to 3. FEVIS make it equivalent to an ATD segment. For example an adapter may be needed to swap the arguments The approach supported by FEVIS is to take an for code fragments with equivalent arguments in dif- Application to Defend (ATD), identify a set of ATD ferent orders. Figure 4 shows how an adaptor may segments for possible replacement by functionally be combined with a code chunk from the Diversity equivalent variant segments, resulting in a set of ATD Cache, in order to generate a variant segment that is variants. functionally equivalent to some ATD segment. While A high-level view of the FEVIS system architecture in the general case finding adaptors is equivalent to is shown in Figure 2. For a given ATD, the process program generation, in limited form it appears to show supported by this architecture is as follows: promise, especially for segments defined at function Original ATD fragment Variant segment or Code Clone of memory-based attacks, using test sets provided by a third party. Sets of two to three variants were created using the manually found clones. Attacks such as heap and stack based buffer overflow and out of bounds Adaptor Code chunk read were tested, which are each in the top ten CWE vulnerabilities list. We have additionally demonstrated Figure 4. Diagram of a clone: an adaptor wrapper the generation of FEVIS ATD variants that will diverge around a code chunk under attacks exploiting program semantics. For this experiment, as the ATD we used an older version % functions with clones of the webserver with a documented security 0.9 flaw (CVE-2005-0543). The vulnerability involves the 0.8 processing of submitted URLs: due to the improper 0.7 0.6 handling of escaped null characters (%00) an attacker 0.5 can obtain the source code for a CGI script, rather 0.4 than just the results of a query submitted to that script. 0.3 0.2 As these scripts frequently contain hard-coded URLs, 0.1 directory paths, and even login credentials, this is a 0 significant vulnerability. Potential clones for this vulnerable variant segment were found in four alternate webservers: thttpd, Hi- awatha, Apache, and . The potential clones Figure 5. Percent of the functions within each from thttpd and had the same vulnerability; file that comprises the thttpd webserver for which those from Cherokee and Apache did not, though clones were found for in a manual search. they fixed it in different ways; Cherokee mapping null characters to spaces, and Apache returning an error. However, while the analogous string-handling func- boundaries. tion in thttpd mishandles nulls in the same way as lighttpd, the thttpd webserver was not exploitable in the 4. Progress to Date same way, due to the details of subsequent processing. This, along with the differences in string processing At this early stage of the project (within the first in the Cherokee and Hiawatha webservers described year), we have implemented the basic functions for above, provides some preliminary evidence for our FEVIS. For motivation and experimentation, we have claim that it is not necessary for a given vulnerability investigated the open-source webserver thttpd, search- to appear in a variant segment, in order to have a set ing for code clones and evaluating the efficacy of ATD of ATD variants diverge when an exploit is attempted. variants for diversification. In a manual search for variant segments for thttpd, 5. Conclusions and Future Work we found that 51% (Figure 5) of the code base had potential clones in other open source webservers The question motivating this research is whether (Apache, Hiawatha, Cherokee, and lighttpd were exam- creating a diverse set of variants of a program from ined). Among the results this analysis turned up were functionally equivalent code clones will enable broader potential clones for the functions defang, strdecode, detection of cyber attacks. We are still within the first match one, tdate parse, atoll, and fdwatch, with year of this project, and so our results are necessarily the number of alternative choices ranging from 1 to 5. preliminary, but the initial indications are promising. In some source files within thttpd, as many as 80% In this section, we discuss ongoing and future work of the functions had clones elsewhere, or even within on FEVIS. the same file. In addition, we explored the use of The primary areas for improvement are in auto- alternative library implementations (for example, the matically finding or generating clones, improved auto- many available versions of the C standard library) as mated testing for equivalence among variant segments a further source of diversity. (including the generation of adaptors where needed), In tests conducted within the overarching DARPA and the automated construction of arguments regarding program, sets of ATD variants automatically generated the functional equivalence and divergence properties of by FEVIS were shown to diverge under multiple kinds sets of ATD variants. In addition, we seek to be able to work from able: http://cve.mitre.org/cgi-bin/cvename.cgi?name= binaries, as well as from source code. Currently we CVE-2014-0160 use source compiled to LLVM [12] within the Diver- [2] J. L. Obes and J. Schuh. (2012) A tale of two pwnies sity Cache. Working from binaries lifted to LLVM is (part 1). [Online]. Available: http://blog.chromium.org/ more difficult because lifted binaries have had much 2012/05/tale-of-two-pwnies-part-1.html of the structure and typing information removed in the compilation process. One possible workaround for [3] B. Salamat, T. Jackson, G. Wagner, C. Wimmer, and this problem is to work directly from disassembled M. Franz, “Runtime defense against code injection attacks using replicated execution,” Dependable and binaries, which somewhat paradoxically have more Secure Computing, IEEE Transactions on, vol. 8, no. 4, structure than the lifted representations, along with the pp. 588–601, 2011. use of tools to recover some of the structure lost in the compilation process. [4] A. Nguyen-Tuong, D. Evans, J. C. Knight, B. Cox, For automated search for potential clones, we are and J. W. Davidson, “Security through redundant data diversity,” in Dependable Systems and Networks With working two approaches. The first is to adapt the FTCS and DCC, 2008. DSN 2008. IEEE International approach implemented in EQMiner to work on LLVM. Conference on. IEEE, 2008, pp. 187–196. The second is to further exploit the very considerable redundancy present in standard libraries. Presently, we [5] P. Larsen, S. Brunthaler, and M. Franz, “Security can and do swap entire libc versions into ATD variants. through diversity: Are we there yet?” Security & Pri- vacy, IEEE, vol. 12, no. 2, pp. 28–35, 2014. Piecemeal substitution of individual library functions will provide much greater freedom to generate variants, [6] A. Carzaniga, A. Mattavelli, and M. Pezze,` “Measuring but comes with some additional complications, such as software redundancy,” in Proc. of Int. Conf. on Software identifying functions that must be swapped together, Engineering (ICSE), 2015. because they make common use of shared information [7] C. K. Roy and J. R. Cordy, “A survey on software clone or data structures. Automated testing for functional detection research,” Technical Report 541, Queens Uni- equivalence is also required. Previous work on clone versity at Kingston, Tech. Rep., 2007. discovery such as EQMiner uses randomly-generated test strings [9]. More directed testing can be accom- [8] A. Carzaniga, A. Goffi, A. Gorla, A. Mattavelli, and plished using a cached set of probes generated from M. Pezze,` “Cross-checking oracles from intrinsic soft- ware redundancy,” in Proceedings of the 36th Interna- “normal operation” of a code fragment (e.g., gathered tional Conference on Software Engineering. ACM, using Daikon [13]), augmented over time with counter- 2014, pp. 931–942. examples from more detailed subsequent testing. In collaboration with our partners on this project at [9] L. Jiang and Z. Su, “Automatic mining of functionally the University of Minnesota, we are also investigating equivalent code fragments via random testing,” in Pro- ceedings of the eighteenth international symposium on a hybrid approach in which adaptors are automati- Software testing and analysis. ACM, 2009, pp. 81–92. cally synthesized (within a significantly limited search space) in order to make a given code chunk equivalent [10] H. Kim, Y. Jung, S. Kim, and K. Yi, “MeCC: memory to some ATD segment. This work is promising, but too comparison-based clone detector,” in Software Engi- preliminary to present as yet. neering (ICSE), 2011 33rd International Conference on. IEEE, 2011, pp. 301–310. Finally, given the extremely large space of ATD variants that could be generated from substituting for [11] A. Goffi, A. Gorla, A. Mattavelli, M. Pezze,` and even a modest number of ATD segments, we must be P. Tonella, “Search-based synthesis of equivalent able to construct much smaller sets of ATD variants method sequences,” in Proceedings of the 22nd ACM that are likely to diverge under attack (necessary for at- SIGSOFT International Symposium on Foundations of Software Engineering. ACM, 2014, pp. 366–376. tack detection), while continuing to exhibit equivalent behavior under normal conditions (necessary for user [12] C. Lattner and V. Adve, “LLVM: A compilation frame- acceptance, and for avoiding false positives). Also in work for lifelong program analysis & transformation,” collaboration with the University of Minnesota, we are in Code Generation and Optimization, 2004. CGO exploring ways to the construction of such arguments. 2004. International Symposium on. IEEE, 2004, pp. 75–86.

References [13] M. D. Ernst, “Dynamically discovering likely program invariants,” Ph.D., University of Washington Depart- [1] “CVE-2014-0160.” Available from MITRE, CVE-ID ment of Computer Science and Engineering, Seattle, CVE-2014-0160., Dec. 3 2013. [Online]. Avail- Washington, Aug. 2000.