Chopped Symbolic Execution

<p>Chopped Symbolic Execution </p><p>David Trabish </p><p>Tel Aviv University <br>Israel </p><p>Andrea Mattavelli </p><p>Imperial College London <br>United Kingdom </p><p>Noam Rinetzky </p><p>Tel Aviv University <br>Israel </p><p>Cristian Cadar </p><p>Imperial College London <br>United Kingdom </p><ul style="display: flex;"><li style="flex:1"><a href="mailto:[email protected]" target="_blank">[email protected] </a></li><li style="flex:1"><a href="mailto:[email protected]" target="_blank">[email protected] </a></li><li style="flex:1"><a href="mailto:[email protected]" target="_blank">[email protected] </a></li><li style="flex:1"><a href="mailto:[email protected]" target="_blank">[email protected] </a></li></ul>
<p>ABSTRACT </p>
<p>Symbolic execution is a powerful program analysis technique that systematically explores multiple program paths. However, despite important technical advances, symbolic execution often struggles to reach deep parts of the code due to the well-known path explosion problem and constraint solving limitations. </p>
<p>In this paper, we propose chopped symbolic execution, a novel form of symbolic execution that allows users to specify uninteresting parts of the code to exclude during the analysis, thus targeting the exploration only at paths of importance. However, the excluded parts are not summarily ignored, as this may lead to both false positives and false negatives. Instead, they are executed lazily, when their effect may be observable by the code under analysis. Chopped symbolic execution leverages various on-demand static analyses at runtime to automatically exclude code fragments while resolving their side effects, thus avoiding expensive manual annotations and imprecision. </p>
<p>Our preliminary results show that the approach can improve the effectiveness of symbolic execution in several different scenarios, including failure reproduction and test suite augmentation. </p>
<p>CCS CONCEPTS </p>
<p>• Software and its engineering → Software testing and debugging; </p>
<p>KEYWORDS </p>
<p>Symbolic execution, Static analysis, Program slicing </p>
<p>ACM Reference Format: </p>
<p>David Trabish, Andrea Mattavelli, Noam Rinetzky, and Cristian Cadar. 2018. Chopped Symbolic Execution. In ICSE ’18: 40th International Conference on Software Engineering, May 27-June 3, 2018, Gothenburg, Sweden. ACM, New York, NY, USA, 11 pages. <a href="/goto?url=https://doi.org/10.1145/3180155.3180251" target="_blank">https://doi.org/10.1145/3180155.3180251 </a></p>
<p>Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]. </p>
<p>ICSE ’18, May 27-June 3, 2018, Gothenburg, Sweden </p>
<p>© 2018 Copyright held by the owner/author(s). Publication rights licensed to the Association for Computing Machinery. ACM ISBN 978-1-4503-5638-1/18/05...$15.00 <a href="/goto?url=https://doi.org/10.1145/3180155.3180251" target="_blank">https://doi.org/10.1145/3180155.3180251 </a></p>
<p>1 INTRODUCTION </p>
<p>Symbolic execution lies at the core of many modern techniques for software testing, automatic program repair, and reverse engineering [3, 11, 16, 24, 32, 35]. At a high level, symbolic execution systematically explores multiple paths in a program by running the code with symbolic values instead of concrete ones. Symbolic execution engines thus replace concrete program operations with ones that manipulate symbols, and add appropriate constraints on the symbolic values. In particular, whenever the symbolic executor reaches a branch condition that depends on the symbolic inputs, it determines the feasibility of both sides of the branch, and creates two new independent symbolic states which are added to a worklist to follow each feasible side separately. This process, referred to as forking, refines the conditions on the symbolic values by adding appropriate constraints on each path according to the conditions on the branch. Test cases are generated by finding concrete values for the symbolic inputs that satisfy the path conditions. To both determine the feasibility of path conditions and generate concrete solutions that satisfy them, symbolic execution engines employ satisfiability modulo theories (SMT) constraint solvers [19]. </p>
<p>The Challenge. Symbolic execution has proven to be effective at finding subtle bugs in a variety of software [3, 11, 12, 25, 39], and has started to see industrial take-up [13, 15, 25]. However, a key remaining challenge is scalability, particularly related to constraint solving cost and path explosion [14]. </p>
<p>When analyzing real-world programs, symbolic execution engines issue a huge number of queries to the constraint solver, and these queries are often large and complex. As a result, constraint solving dominates runtime for the majority of non-trivial programs [30, 33]. Recent research has tackled the challenge by proposing several constraint solving optimizations that can help reduce constraint solving cost [5, 12, 21, 27, 33–35, 41, 45]. </p>
<p>Path explosion represents the other big challenge facing symbolic execution, and is the main focus of this paper. Path explosion refers to the challenge of navigating the huge number of paths in real programs, which is usually at least exponential in the number of static branches in the code. The common mechanism employed by symbolic executors to deal with this problem is the use of search heuristics to prioritise path exploration. One particularly effective heuristic focuses on achieving high coverage by guiding the exploration towards the path closest to uncovered instructions [10–12, 43]. In practice, these heuristics only partially alleviate the path explosion problem, as the following example demonstrates. </p>
<p>Motivating Example. The extract_octet() function, shown in Figure 1, is a simplified version of a function from the libtasn1<sup style="top: -0.22em;">1</sup> library, which parses ASN.1 encoding rules from an input string. The ASN.1 protocol is used in many networking and cryptographic applications, such as those handling public key certificates and electronic mail. Versions of libtasn1 before 4.5 are affected by a heap-overflow security vulnerability that could be exploited via a crafted certificate.<sup style="top: -0.22em;">2</sup> Unfortunately, given a time budget of 24 hours, the analysis of the extract_octet() function using the state-of-the-art symbolic execution engine KLEE [11] fails to identify the vulnerability due to path explosion. </p>
<p><sup style="top: -0.22em;">1</sup>https://www.gnu.org/software/libtasn1 <br><sup style="top: -0.22em;">2</sup>https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2015-3622 </p>
<pre>
 1  <strong>int </strong>extract_octet(asn_t asn, <strong>char </strong>*str, <strong>int </strong>str_len) {
 2    <strong>int </strong>len3, counter, counter_end, result;
 3    <strong>int </strong>len2 = get_length(str, str_len, &amp;len3);
 4    counter = len3+1;
 5    counter_end = str_len;
 6    <strong>while </strong>(counter &lt; counter_end) {
 7      // call to get_length() leads to a heap overflow:
 8      len2 = get_length(str+counter, str_len, &amp;len3);
 9      <strong>if </strong>(len2 &gt;= 0) {
10        DECR_LEN(str_len, len2+len3);
11        append_value(asn, str+counter+len3, len2);
12      } <strong>else </strong>{
13        DECR_LEN(str_len, len3);
14        result = extract_octet(asn, str+counter+len3, str_len);
15        <strong>if </strong>(result != SUCCESS)
16          <strong>return </strong>result;
17        len2 = 0;
18      }
19      // str_len should have been decremented at the
20      // beginning of the while block
21      DECR_LEN(str_len, 1);
22      counter += len2+len3+1;
23    }
24    <strong>return </strong>SUCCESS;
25  }
</pre>
<p>Figure 1: A simplified excerpt from the <strong>extract_octet </strong>routine in <strong>libtasn1</strong>. The invocation of <strong>get_length() </strong>in line 8 leads to a heap overflow because <strong>str_len </strong>has not been decremented before the call. </p>
<p>At each loop iteration (lines 6–23), the function decodes the length of the current data element with get_length (line 8). Function get_length scans through the input string and decodes the ASN.1 fields. Then, the execution either recurses over the input string (line 14), or invokes append_value (line 11). Function append_value creates the actual node in the Abstract Syntax Tree (AST) by decoding the input string given the obtained length. This function scans once more over the input string, performs several checks over the selected element, and allocates memory for the node in the recursive data structure. </p>
<p>Path explosion in this function occurs due to several nested function calls. Symbolically executing function get_length alone with a symbolic string of n characters leads to 4 ∗ n different paths. Function append_value increases the number of paths even more, and also affects the efficiency of the symbolic execution engine due to a huge number of constraint solver invocations. As a result, the symbolic executor fails to identify the heap-overflow vulnerability at line 8. </p>
<p>Our Approach. Identifying the vulnerability from the entry point of the library is not trivial: to reach the faulty invocation of function get_length, the input triggering the vulnerability traverses 2,945 calls to 98 different functions, for a total of 386,727 instructions. Our key observation is that most of the functions required during the execution are not relevant for finding the vulnerability. The vulnerability occurs due to an incorrect update of the remaining bytes for parsing (line 21), which results in an out-of-bounds memory read when calling get_length. The bug thus occurs in code which deals with the parsing, which is independent from the functions that construct the corresponding ASN.1 representation, such as append_value. Therefore, we could have quickly reached the bug if we had skipped the irrelevant functions that build the AST. </p>
<p>In this paper, we propose a novel form of symbolic execution called chopped symbolic execution that provides the ability to specify parts of the code to exclude during the analysis, thus enabling symbolic execution to focus on significant paths only. The skipped code is not trivially excluded from symbolic execution, since this may lead to spurious results. Instead, chopped symbolic execution lazily executes the relevant parts of the excluded code when explicitly required. In this way, chopped symbolic execution does not sacrifice the soundness guarantees provided by standard symbolic execution, in that only feasible paths are explored (except for non-termination of the skipped functions, which may be considered a bug on its own), but effectively discards paths irrelevant to the task at hand. </p>
<p>We developed a prototype implementation of chopped symbolic execution and report the results of an initial experimental evaluation that demonstrates that this technique can indeed lead to efficient and effective exploration of the code under analysis. </p>
<p>Main Contributions. In summary, in this paper we make the following contributions: </p>
<p>(1) We introduce chopped symbolic execution, a novel form of symbolic execution that leverages a lightweight specification of uninteresting code parts to significantly improve the scalability of symbolic execution, without sacrificing soundness. </p>
<p>(2) We present Chopper, a prototype implementation of our technique within KLEE [11], and make it publicly available. </p>
<p>(3) We report on an experimental evaluation of Chopper in two different scenarios, failure reproduction and test suite augmentation, and show that chopped symbolic execution can, respectively, improve and complement standard symbolic execution. </p>
<p>This paper is organised as follows. Section 2 gives a high-level overview of chopped symbolic execution, and Section 3 presents our technique in detail. Section 4 briefly discusses our implementation inside the KLEE symbolic execution engine. Section 5 presents the experimental evaluation of our technique, and in particular it shows that chopped symbolic execution can overcome the limitations of state-of-the-art symbolic executors. Section 6 surveys the main approaches related to this work. Section 7 summarises the contributions of the paper and describes ongoing research work. </p>
<p>2 OVERVIEW </p>
<p>In this section, we give a high-level overview of chopped symbolic execution using the simple program in Figure 2. In particular, Figure 2a shows the entry point of the program (function main), while Figure 2c shows the uninteresting code which we would like to skip (function f). </p>
<pre>
 1  <strong>struct </strong>point { <strong>int </strong>x, y, z;};
 2
 3  <strong>int </strong>main() {
 4    <strong>struct </strong>point p = {0, 0, 0};
 5    <strong>int </strong>j, k; // symbolic
 6
 7    f(&amp;p, k); // skip
 8    <strong>if </strong>(j &gt; 0)
 9      <strong>if </strong>(p.y)
10        bug();
11    <strong>else </strong>
12      allgood();
13    <strong>return </strong>0;
14  }
</pre>
<p>(a) </p>
<p>(b) [Diagram of symbolic execution states, drawn as gray ovals and labelled Snapshot, Dependent, Recovery, Recovery’ and Recovery’’, with forks on j ≤ 0 / j &gt; 0 and k ≤ 0 / k &gt; 0, the store p.y = 1, and numbered steps 1 to 7.] </p>
<pre>
15  <strong>void </strong>f(<strong>struct </strong>point *p, <strong>int </strong>k) {
16    <strong>if </strong>(k % 2)
17      p-&gt;z++;
18
19    <strong>if </strong>(k &gt; 0)
20      p-&gt;x++;
21    <strong>else </strong>
22      p-&gt;y++;
23  }
</pre>
<p>(c) </p>
<p>Figure 2: Graphical illustration of chopped symbolic execution on a simple example. </p>
<p>We start the chopped execution by executing main symbolically. When a state reaches the call to f at line 7, we create a snapshot state by cloning the current state, and skip the function call. The snapshot state is shown graphically in Figure 2b, where each gray oval represents a symbolic execution state. </p>
<p>With a snapshot created, we then continue the execution on the current state, but from this point we must consider that some load instructions may depend on the side effects of the skipped function f, i.e. the memory locations that f may update. In our example, the side effects of f are the memory locations pointed to by p.z, p.x, and p.y, which are updated at lines 17, 20, and 22 respectively. (We compute the side effects of f using conservative static pointer analysis [4, 26, 37] before the symbolic exploration starts, see §3.) We define those instructions that read from the side effects of the skipped functions as dependent loads. </p>
<p>On some paths, symbolic execution does not encounter such dependent loads. For example, the path following the else side of the branch at line 8 accesses neither p.x nor p.y nor p.z, so no further action is needed on those paths, and the exploration may correctly terminate without ever going through the code of f. Indeed, in real programs there are often paths that do not depend on the skipped functions, and in such cases symbolic execution immediately benefits from our approach: irrelevant paths are safely skipped, thus reducing path explosion. </p>
<p>However, on other paths symbolic execution encounters dependent loads. This happens for our example on the path which explores the then side of the branch at line 8, when it loads the value of p.y at line 9. At this point, the current state needs to be suspended until the relevant paths in function f are explored, and becomes a dependent state. To recover a path, we create a new recovery state which inherits the snapshot state generated before skipping f at line 7, and start symbolically executing the function. </p>
<p>While symbolic execution is in the recovery state, if the execution forks, then the same fork is performed in the dependent state. Furthermore, as we run the recovery state, any stores to the memory location read by the dependent load are also performed in the dependent state. For example, if the symbolic execution of f traverses the else branch at lines 21–22, then the value of p.y (the memory location pointed to by p-&gt;y) is set to 1 in the dependent state too. If the recovery state returns successfully, the dependent state is resumed. If an error occurs while executing the recovery state (e.g., an invalid memory access or a division by zero error, which could have occurred if p-&gt;z were set in line 17 to 4/p-&gt;y), the corresponding dependent state is terminated. </p>
<p>When we execute a recovery state, not all paths might be compatible with the execution which the dependent state reached. For example, if line 8 were changed from if (j&gt;0) to if (k&gt;0), then the dependent state would have k &gt; 0 in its path condition, rendering the dependent state incompatible with the path in f where k ≤ 0. </p>
<p>One way to filter such incompatible paths would be to execute all possible paths through f during recovery, and later filter the ones that are incompatible with the dependent state. However, this would potentially lead to the exploration of a large number of infeasible paths. We thus designed a more efficient solution: each state maintains a list of guiding constraints, which are those constraints added since the call to the skipped function. In our example, the guiding constraints for the dependent state are j &gt; 0. Before we execute a recovery state, we add these guiding constraints from the dependent state to the path condition of the recovery state. By doing this, we guarantee that every path explored in the recovery state is consistent with respect to its dependent state. </p>
<p>During recovery, one could execute all possible paths through the skipped function f which are compatible with the dependent state, as we could in the example above. However, for real programs this could be unnecessarily expensive, as many paths do not influence the dependent load which started the recovery. To avoid this possible path explosion, and reduce the cost of constraint solving, we aim to only execute the paths that could influence the dependent load. We accomplish this by statically slicing [7, 40, 42, 44] the function with respect to the store instructions that write to the memory location read by the dependent load, that is, the side effects observable from the dependent load. Note that function f could call other functions, so the slicing is done for all these functions too. In our example, the slicing would likely be able to completely remove the if statement at lines 16–17, which would halve the number of explored paths, thus reducing path explosion. It would also likely remove the then side of the if statement at line 19, which in this case does not bring significant benefits, but it could, if that side of the branch were replaced by, say, an expensive loop. Slicing away these code parts is possible because they do not update p.y, on which the dependent load on line 9 relies.<sup style="top: -0.22em;">3</sup> </p>
<p><sup style="top: -0.22em;">3</sup>In practice, the success of the slicing algorithm in reducing the size of the code depends on the precision of the underlying pointer analysis. </p>
<p>Algorithm 1 Chopped symbolic execution (simplified). </p>
<pre>
1: worklist ← ∅
2: function cse(s<sub>0</sub>, skipFunctions)
3:   worklist ← worklist ∪ {s<sub>0</sub>}
4:   while worklist ≠ ∅ do
</pre>
<p>Figure 2b shows how chopped symbolic execution works on our example in a graphical way. To recapitulate, when the call to f is reached at line 7, a snapshot state is created by cloning the current state (step 1 in the figure). Then, on the execution state that reaches line 9, the current state becomes a dependent state and is suspended (step 2), and a recovery state is created by cloning </p>
