Multiplex Symbolic Execution: Exploring Multiple Paths by Solving Once
Total Page:16
File Type:pdf, Size:1020Kb
Multiplex Symbolic Execution: Exploring Multiple Paths by Solving Once Yufeng Zhang∗ Zhenbang Chen∗ Ziqi Shuai College of Computer Science and College of Computer, National College of Computer, National Electronic Engineering, Hunan University of Defense Technology University of Defense Technology University Changsha, China Changsha, China Changsha, China [email protected] [email protected] [email protected] Tianqi Zhang Kenli Li Ji Wang College of Computer, National College of Computer Science and College of Computer, State Key University of Defense Technology Electronic Engineering, National Laboratory of High Performance Changsha, China Supercomputing Center in Changsha, Computing, National University of [email protected] Hunan University Defense Technology Changsha, China Changsha, China [email protected] [email protected] ABSTRACT KEYWORDS Path explosion and constraint solving are two challenges to sym- symbolic execution, constraint solving, mathematical optimization bolic execution’s scalability. Symbolic execution explores the pro- ACM Reference Format: gram’s path space with a searching strategy and invokes the under- Yufeng Zhang, Zhenbang Chen, Ziqi Shuai, Tianqi Zhang, Kenli Li, and Ji lying constraint solver in a black-box manner to check the feasibility Wang. 2020. Multiplex Symbolic Execution: Exploring Multiple Paths by of a path. Inside the constraint solver, another searching procedure Solving Once. In 35th IEEE/ACM International Conference on Automated Soft- is employed to prove or disprove the feasibility. Hence, there exists ware Engineering (ASE ’20), September 21–25, 2020, Virtual Event, Australia. the problem of double searchings in symbolic execution. In this ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3324884.3416645 paper, we propose to unify the double searching procedures to im- prove the scalability of symbolic execution. We propose Multiplex 1 INTRODUCTION Symbolic Execution (MuSE) that utilizes the intermediate assign- Symbolic execution [18, 21, 38] is a precise program analysis tech- ments during the constraint solving procedure to generate new nique that has been successfully applied to many software engi- program inputs. MuSE maps the constraint solving procedure to neering activities, including automatic software testing [5, 41], bug the path exploration in symbolic execution and explores multi- finding [19], program repair [10], etc. One challenge of symbolic ple paths in one time of solving. We have implemented MuSE on execution is the scalability problem caused by path explosion and two symbolic execution tools (based on KLEE and JPF) and three constraint solving. commonly used constraint solving algorithms. The results of the ex- During symbolic execution, each variable in program P has a tensive experiments on real-world benchmarks indicate that MuSE symbolic or concrete value. For non-branch statements, symbolic has orders of magnitude speedup to achieve the same coverage. execution does symbolic or concrete calculations and updates the symbolic or concrete values of the variables. When executing a CCS CONCEPTS branch statement br, symbolic execution generates the path condi- • Software and its engineering → Software verification and tion (PC) for each branch of br. The path condition of a branch b Ón validation; (denoted by PC¹bº = i=1 Ci ) is a constraint in qualifier-free first- order logic that encodes the feasibility of the program path to b, ∗The first two authors contributed equally to this work and are co-first authors. Zhen- and Cn is the symbolic condition of b. Symbolic execution invokes bang Chen and Ji Wang are the corresponding authors. a constraint solver [11, 16, 40] to check the satisfiability [23] of each branch’s PC. If PC¹bº is satisfiable, then the program path to b is feasible, and symbolic execution will continue to execute the Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed statement along b; otherwise, it is infeasible, i.e., no input can drive for profit or commercial advantage and that copies bear this notice and the full citation the program to b, and symbolic execution abandons b. In this way, on the first page. Copyrights for components of this work owned by others than ACM symbolic execution systematically explores the path space of P. must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a On one hand, the path number explodes exponentially in the num- fee. Request permissions from [email protected]. ber of branch statements. On the other hand, constraint solving is ASE ’20, September 21–25, 2020, Virtual Event, Australia well-known to be hard [23]. Another complexity explosion occurs © 2020 Association for Computing Machinery. ACM ISBN 978-1-4503-6768-4/20/09...$15.00 inside of the constraint solver. As shown in Figure 1, there exist https://doi.org/10.1145/3324884.3416645 double explosions in symbolic execution. Such double explosions ASE ’20, September 21–25, 2020, Virtual Event, Australia Zhang Y.F. and Chen Z.B., et al. solution: C1/\C2/\C3 [13, 25], ii) abstraction refinement based quantifier-free array and partial solution: C1/\~C2 bit-vector (QF_ABV) constraint solving [16], and iii) optimization- path space of program P partial solution: C1/\C2/\~C3 based floating-point constraint solving [15, 40]. Besides, we have C1 solution space SPC implemented MuSE on two DSE engines based on KLEE [5] and Symbolic PathFinder (SPF) [33] for C and Java programs, respec- C2 tively. We have applied our prototypes to real-world C and Java programs. The evaluation results indicate the effectiveness and C3 efficiency of MuSE. b The main contributions of this paper are: • We propose MuSE to utilize the partial solutions during con- straint solving to generate multiple test inputs for exploring constraint solver symbolic execution engine multiple paths by solving once. • We have instantiated the idea of partial solution to three constraint solving methods and implemented MuSE on two Figure 1: Double explosions in symbolic execution proce- DSE engines for C and Java programs. dure. MuSE generates multiple test inputs from partial so- • We have carried out extensive experiments on real-world C lutions with only one time of constraint solving. Partial so- and Java programs. The experimental results indicate that lutions can be used to trigger off-the-path branches on the MuSE achieves one or two orders of magnitude speedup on current path. the three constraint solving methods for reaching the same code coverage. We organize the remainders of this paper as follows. Section obstruct the application of symbolic execution to larger real-world 2 motivates MuSE by a Simplex-based solving method. Section 3 programs. presents MuSE and its instantiations on three solving methods. Symbolic execution explores the path space with a search strat- Section 4 explains the implementation of MuSE and the experi- egy, such as depth-first search (DFS) and breadth-first search (BFS). ments on real-world benchmarks. Section 5 reviews the related The underlying constraint solver also employs an internal search- work. Section 6 concludes the paper. ing procedure in the solution space to decide the satisfiability of path conditions. Essentially, both of the path space and the solution 2 MOTIVATING EXAMPLE space represent P’s input space SI . The conditions of P’s branch In this section, we motivate the principle of MuSE. Figure 2 shows statements split SI into different parts. Each part can be represented Ón a Java function start that receives two parameters and has four by a path. For a path condition PC¹bº = i=1 Ci , the solution space S contains all the possible assignments to the input variables in paths. In each path, the program prints a different number. We call PC ∼ PC¹bº. When solving PC¹bº, the constraint solver searches S , and these four path as p1 p4, respectively. Now we use DSE to explore PC f g hence searches S . During this searching procedure, the solver may the path space. Suppose that the initial input is x = 1;y = 3 . I f g search the input space corresponding to other paths of P. However, Then the first path is p1 that covers the lines 2; 3; 4; 6; 7 . The path ≥ ^ − ≥ ^ − the solver only returns the final solution satisfying PC¹bº, if SAT, condition of p1 is ϕ1 = x + y 2 2y x 1 2x y < 0. If or returns UNSAT. Therefore, S is doubly searched in the stack of we use DFS searching strategy, the last branch is flipped. The new I ≥ ^ − ≥ ^ − ≥ symbolic execution by the path space exploration and the under- path condition ϕ2 = x + y 2 2y x 1 2x y 0 is feed lying constraint solver. It is desirable to unify the two searching into off-the-shell constraint solver. Suppose that the solution of f g procedures to improve the scalability. ϕ2 is x = 1;y = 1 , then the second path is p2 that covers the f g In this paper, we propose Multiplex Symbolic Execution (MuSE) lines 2; 3; 4; 5 . Similarly, p3 and p4 will be explored. In total, DSE ∼ towards eliminating the redundant searching in dynamic symbolic invokes the constraint solver three times for p2 p4. execution (DSE). The principle of our method is that we leverage Since all the constraints are linear arithmetic, we suppose that the constraint solver to search the path space directly via generating the solver uses the Simplex-based QF_LIA theory solving algorithm multiple test inputs in one time of solving. For a path condition [23, 25]. The algorithm first considers all the integer variables in Ón the constraints as real variables and uses Simplex-based linear PC¹bº = i=1 Ci , we call a point α in the solution space SPC a partial solution if α satisfies a subset of the constraints in PC¹bº.