Frankencode: Creating Diverse Programs Using Code Clones
Total Page:16
File Type:pdf, Size:1020Kb
Frankencode: Creating Diverse Programs Using Code Clones Hayley Borck, Mark Boddy, Ian J De Silva, Steven Harp, Ken Hoyme, Steven Johnston, August Schwerdfeger, and Mary Southern Adventium Labs, Minneapolis, MN 55401 Email: fi[email protected] Abstract enable not just new attacks, but new kinds of attacks. Apparently-minor vulnerabilities can be chained to- In this paper, we present an approach to detecting gether in successful attacks, even on software that is novel cyber attacks though a form of program diversifi- extensively tested and generally viewed as secure.1 cation, similar to the use of n-version programming for In this paper, we describe our research on the Func- fault tolerant systems. Building on extensive previous tionally Equivalent Variants using Information Syn- and ongoing work by others on the use of code clones chronization (FEVIS) system. Part of the Cyber Fault- in a wide variety of areas, our Functionally Equivalent tolerant Attack Recovery (CFAR) program, funded by Variants using Information Synchronization (FEVIS) the U.S. Government’s Defense Advanced Research system automatically generates program variants to be Projects Agency (DARPA), FEVIS builds on previ- run in parallel, seeking to detect attacks through diver- ous and ongoing work on “code clones,” substituting gence in behavior. Unlike approaches to diversification redundant code fragments as a means to generate that only change program memory layout and behavior, program variants automatically. These variants are in- FEVIS can detect attacks exploiting vulnerabilities in tended for use in a multi-variant execution environ- execution timing, string processing, and other logic ment, to be used for attack detection and resistance in errors. the presence of both known and unknown attacks. We We are in the early stages of research and devel- are in the early stages of research and development opment for this approach, but have made sufficient for this approach, but have made sufficient progress to progress to provide a proof of concept and some provide a proof of concept and some lessons learned. lessons learned. In this paper we describe FEVIS In the rest of this paper, we describe related work and and its application to diversifying an open-source previous research on which FEVIS builds (Section 2). webserver, with results on several different example We then present FEVIS itself (Section 3), illustrating classes of attack which FEVIS will detect. the approach through the use of examples drawn from the diversification of an open-source webserver (Sec- 1. Introduction tion 4). We conclude with a discussion of ongoing and future research. Software vulnerabilities and the cyber attacks en- abled by them are increasingly prevalent, and continue to increase in both cost and impact. Critical civilian 2. Related Work and military software systems are vulnerable, with new vulnerabilities being discovered all the time. Vulner- abilities enabling attacks can remain undetected for FEVIS applies concepts and techniques from the years, as with the Heartbleed bug in OpenSSL [1]. study of code redundancy as a means of diversifying Worse, both known and unknown vulnerabilities may software for use in multi-variant execution. In this This research was developed with funding from the Defense Ad- section, we summarize related work in these three vanced Research Projects Agency (DARPA) under contract FA8750- areas. 15-C-0110. The views and/or findings contained in this article are those of the author and should not be interpreted as representing the official views or policies of the Department of Defense or the US Government. DISTRIBUTION STATEMENT A: Approved for public 1. The multi-step attacks on Chromium detailed in “A Tale of Two release; distribution is unlimited. Pwnies” provide good examples [2]. 2.1. Multi-Variant Execution the logical structure (i.e., the semantics) of the pro- gram itself. Cross-site scripting, use after free, di- Multi-variant execution is an approach to detecting rectory traversal, SQL and OS command injection, and forestalling attacks, in which multiple variants of and unauthorized disclosure via URL string processing an application are run in parallel. These variants are vulnerabilities are among the many forms of attack specifically chosen to provide the same results for that may not be blocked or detected by diversification an authorized range of inputs, but to diverge under measures that seek to preserve program control flow attack. Multi-variant execution has been an active and semantics. Nor will many of them be caught by research topic for several years, with different research more sophisticated approaches such as automated code groups proposing alternative architectures for variant obfuscation or restructuring, if the semantics of the synchronization and checking, as well as methods modified program remain the same. for variant generation including Address Space Lay- These are the kinds of attacks that we seek to detect out Randomization (ASLR); Intelligence, Surveillance, using variants generated by FEVIS. and Reconnaissance (ISR); and buffer padding. For example, Salamat et al. [3] describe a Multi-Variant 2.3. Code Clones Execution Environment (MVEE), using variants in which the stack grows in opposite directions. Nguyen- Our approach takes as a starting point the well- Tuong et al. devised a multi-variant system using data known concept of code clones, in which redundant transformations derived from N-variant systems [4]. code sections can be substituted one for another. The Multi-variant execution is somewhat different from definition of redundancy that we follow is taken from the use of diversification as a “moving target de- Carzaniga et al. [6], who define code redundancy as fense” (MTD), in which the objective is to present the product of observational equivalence and execu- an attacker with a single binary, but one which has tion distance.3 Observational equivalence is defined in unpredictable differences from previous instances of terms of probing code, code which is input to code the same program they may have investigated. One fragment to determine outputs and an oracle. Both critical difference is that multi-variant execution is are appended to the code fragments being tested for secretless, in the sense that an attacker may know about equivalence. This definition allows us to define “func- how variants are generated, may even know which tional equivalence” in terms of authorized or expected specific variants are currently running, without that input, rather than across the entire space of possible information enabling an attack. In contrast, an attacker input. Carzaniga et al. evaluate several definitions for given time to investigate a single variant, may discover execution distance, settling on data projection, which what has been changed and be able to adjust their is measured in terms of the difference in memory attack accordingly.2 locations read and written, as well as what is written to them. Figure 1 provides two examples of redundant code fragments according to this definition. 2.2. Program Diversification Code clones have been employed for a wide variety of uses, including the automatic generation of test or- One common thread through much previous research acles, self-healing code, fault tolerance, and automatic on program diversification for multi-variant execution test generation. For example Carzaniga [8] creates test is the preservation of program semantics in a very oracles by finding redundancies in code and cross strong sense. ASLR relocates code and data in the checking the execution of a test with the execution of running program, preserving control flow. The same the same test on a variant of the code with redundant is true for techniques such as stack or heap padding, operations replaced. or ISR. Program transformation approaches that seek Clones may either be discovered or constructed, to defeat attacks upon a single binary, such as the both approaches having some currency in the research addition of stack canaries, might plausibly be viewed literature. The prevalence of redundant code fragments as providing diversification. In this case, the altered in large software systems such as the Linux kernel and unaltered programs constitute a set of two. has been extensively documented, as discussed in [8], However, many of the most common exploits, en- abling the most devastating attacks, exploit holes in 3. This is distinct from the more general use of the term “clone” as used (for example) in [7], where clones may differ not at all, or only in layout and choice of identifiers. Clones in this larger sense 2. Larsen et al. [5] provide a good summary of diversification have been applied for purposes such as detecting plagiarism and methods applied as MTD. copyright infringement. of the methods, and the “structural” distance of methods within to have the same result and equivalent side-effects (or state the package hierarchy. The most interesting aspects of the work changes) as A. of Higo and Kusumoto is that, perhaps thanks to the simplicity Other studies on semantically equivalent code adopt a purely of their analysis, they were able to analyze a massive amount functional notion of equivalence, and therefore assume no of code. visible state changes [11]. Yet others consider state changes to be part of the input/output transformation of code fragments, III. CHARACTERIZING SOFTWARE REDUNDANCY but then accept only identical state changes [20]. Instead, we We now formalize our notion of redundancy. We first give would still consider two fragments to be equivalent even if an abstract and general definition that we then specialize to they produce different state changes, as long as the observable develop a practical measurement method. effects of those changes are identical. This notion is close to the We are interested in the redundancy of code. More specif- testing equivalence proposed by De Nicola and Hennessy [27] ically we define redundancy as a relation between two code and the weak bi-similarity by Hennessy and Milner [17]. We fragments within a larger system.