On Benchmarking the Capability of Symbolic Execution Tools With

Home , Logic bomb

arXiv:1712.01674v2 [cs.SE] 25 May 2018 • • • aucitsbitdt EETascin nDpnal an Dependable on Transactions IEEE to Computing. submitted Manuscript rtenme fbg eetdi elwrdprograms real-world in detected bugs coverage code of exe- ( achieved number the symbolic the on evaluating rely or and for generally [2] tools methods KLEE cution Current as rapid open-source such [3]. several achieved available, Angr with has tools decade execution It last symbolic [1]. the analysis in development program and testing ecmr euti nisda tde o eedon depend not does it as the Moreover, unbiased tool. is execution our result symbolic way, a benchmark this of In capability concerning metric. the information employ meaningful evaluation more can give an can we approach as elaborate, To based challenge approach challenges. code each benchmark known the a these develop of on There- to achieve. factors propose can and we determinant tool fore, execution [5] the symbolic numbers a are that floating-point coverage They as [6]. such loops well, may tools handle execution not symbolic which problems challenging programs. is targeting which fine-grained to tools sensitive a execution less propose symbolic to execution for approach aims symbolic benchmark therefore, a paper, of This capabilities tool. detailed cannot the they and manifest analyzed, being programs of types different I 1 e.g. [email protected] .Za swt h hns nvriyo ogKn n h Uni Zhou The China Y. and of Kong Technology Hong and of Science University Chinese of The with is Zhao Z. D H and of [email protected] Institute University hxu, Research Chinese Email: Shenzhen The with Engineering, are & Lyu Science Computer R. M. and Xu, H. yblceeuini oua ehiu o software for technique popular a is execution Symbolic npriua,w bev htteeaesm common some are there that observe we particular, In 2,[].Nvrhls,sc erc a utaewith fluctuate may metrics such Nevertheless, [4]). [2], , yblceeuin eetat1 itntcalne from challenges challenges distinct 12 end extract this We To execution. memories. symbolic symbolic th and against numbers tools floating-point such as of exe performance symbolic the benchmark evaluates to approach approach novel a n introduces we paper and off-the-shelf, available tools execution symbolic Abstract pnsuc,wt namt etrfcltt h omnt t community the facilitate better to aim an with only source, takes open generally process rev benchmark can The efficiently. approach and our that show results expe Experimental Triton. real-world te conducted find have can We tool problems. execution corresponding symbolic a If dat problem. our challenging challenge, each For automatically. tools execution ne Terms Index nBnhakn h aaiiyo Symbolic of Capability the Benchmarking On NTRODUCTION ¬ swt colo optrSine ua nvriy Email University. Fudan Science, Computer of School with is Smoi xcto o eoe nidsesbetechniqu indispensable an becomes now execution —Symbolic and smoi execution. —symbolic ahepoinchallenges path-explosion xcto ol ihLgcBombs Logic with Tools Execution u u iu ho aga hu n ihe .Lyu, R. Michael and Zhou, Yangfan Zhao, Zirui Xu, Hui hn edvlpadtsto oi ob n rmwr ob to framework a and bombs logic of dataset a develop we Then, . n Kong. ong e rcia ecmr praht er hi capabilit their learn to approach benchmark practical a eed Secure d stcnan eea oi ob,ec fwihi ure b guarded is which of each bombs, logic several contains aset h ieaueadctgrz hmit w categories: two into them categorize and literature the versity a hi aaiiisadlmttosi adigparticu handling in limitations and capabilities their eal odc uuewr nbnhakn yblceeuinto execution symbolic benchmarking on work future conduct o tcsst rge oi ob,i niae htteto ca tool the that indicates it bombs, logic trigger to cases st p.of ept. oeso iue oeaut ol erlaeordtsto dataset our release We tool. a evaluate to minutes of dozens nw hlegsfcdb eea yblceeuintech execution symbolic general by faced challenges known e iet ihtrepplrsmoi xcto ol:KLEE, tools: execution symbolic popular three with riments efis uvyrltdppr n ytmtz h challeng the systematize and papers related survey first we , uintosi n-rie n fcetmne.I partic In manner. efficient and fine-grained a in tools cution : ✦ hlegsa osbe ectgrz xsigchallenges existing categories: distinct categorize two many We into as possible. embrace as essentia to is approach challenges step benchmark This our execution. for symbolic of challenges the analysis. for programs particular rica oebokta a nyb xctdwe certain when small executed be designing an only is by can bomb that problem logic block A code the bombs. artificial logic tackle with We embedded takes programs efficient, time. generally very long programs not real-world a with is failure. benchmark a itself and cause execution could are cannot symbolic challenges we they Moreover, any end, because and this testing complicated To for too challenges. programs the real-world of with employ tools each execution to symbolic respect of capability the benchmark i ap- categorized. well discussed our be challenges can external With literature existing as the functions. all crypto such consequently, and routines, proach, loops, complex calls, as include function issues, they path-explosion as cause on long can also time but programs programs the long large-sized small-sized only starving very Not paths. tool spending the execution exploring or symbolic resources a computational cause analyze, may to flows which control chal- possible many Path-explosion too introduce overflows. lenges arithmetic and sym- buff overflows, numbers, contextual floating-point jumps, memories, symbolic values, bolic symbolic executions, challeng parallel propagati covert These declarations, flows. variable incor- control symbolic generating particular include for in cases tool test pose execution rect may symbolic they a where to process, problems reasoning symbolic core the challenges explosion o otaetsigadpormaayi.Teeaesever are There analysis. program and testing software for e oti n,w rtcnutasseai uvyon survey systematic a conduct first we end, this To et edvlpa cuaeadefiin prahto approach efficient and accurate an develop we Next, yblcraoigcalne attack challenges Symbolic-reasoning . yblcraoigchallenges symbolic-reasoning elw IEEE Fellow, nhaksymbolic enchmark symbolic-reasoning a susaccurately issues lar e.Teeoe this Therefore, ies. adethe handle n specific a y nr and Angr, ols. lr our ular, ius such niques, sof es iHbas GitHub n and al path- ons, es er n 1 l 2

ǯͳͺ ǯͳ͹ ǯͳͷ ǯͳͷ ǯͳ͵ ǯͳͳ ǯͳͲ

Ǧ Fig. 1. The challenges of symbolic execution discussed in the literature. The detailed paper references are Schwartz’10 [7], Qu’11 [8], Cadar’13 [6], Cseppento’15 [9], Kannavara’15 [10], Quan’16 [5], Xu’17 [11], and Baldoni’18 [12]. conditions are met. We create such logic bombs that can 2 RELATED WORK only be triggered when a challenging problem is solved. The benefits are two folds. By keeping each logic bomb as small as possible, our evaluation result would be less likely This section compares our work with present papers that affected by other unexpected issues. Also, employing small either systematize the challenges of symbolic execution programs can shorten the required symbolic execution time or employ the challenges to evaluate symbolic execution and provides efficiency. tools. Note that although symbolic execution has received Following this method, we have designed a dataset of extensive attention for decades, only a few papers include logic bombs covering all the challenges and a framework a systematic discussion about the challenges of the tech- to run benchmark experiments automatically. For each chal- nique. Existing work in this area mainly focuses on em- lenge, our dataset contains several logic bombs with differ- ploying the technique to carry out specific software analysis ent problem settings or different levels of hardness, such as tasks (e.g., [13], [14], [15]), or proposing new approaches a one-leveled arrays or a two-leveled arrays for the symbolic to improve the technology concerning particular challenges memory challenge. Our framework employs the dataset of (e.g., [16], [17], [18]). logic bombs as evaluation metrics. It firstly parses the logic bombs and compiles them to object codes or binaries; then, The papers that focus on systematizing the challenges it directs a symbolic execution tool to perform symbolic of symbolic execution tools include [8], [9], [10], [19]. Kan- execution on the logic bombs in a batch mode; and finally, navara et al. [10] enumerated several challenges that may it verifies the generated test cases and produces reports. We hinder the adoption of symbolic execution in industrial release our dataset and framework tools on GitHub. fields. Qu and Robinson [8] conducted a case study on We have conducted real-world experiments to bench- the limitations of symbolic testing tools and examined their mark three prevalent symbolic execution tools, KLEE, Tri- prevalence in real-world programs. However, none of the ton, and Angr. Although these tools adopt different imple- two papers provides a method to evaluate symbolic execu- mentation techniques, our framework can adapt to them tion tools. Cseppento and Micskei [9] proposed several met- with only a little customization. The benchmark process for rics to evaluate source-code-based symbolic execution tools. each tool generally took dozens of minutes. Experimental But their metrics are based on specific program syntax of results show that our benchmark approach can reveal their object-oriented codes rather than the language-independent capabilities and limitations efficiently and accurately. In challenges. These metrics are not very general for symbolic summary, Angr has achieved the best performance with 21 execution. Banescu et al. [19] also design several small cases solved, which is roughly one third of the total logic programs for evaluation. But their purpose is to evaluate bombs; KLEE solved nine cases; and Triton only solved the resilience of code obfuscation transformations against three cases. We manually checked the reported solutions symbolic execution-based attacks. They do not investigate and confirmed that they are all correct and nontrivial. Be- the capability of symbolic execution but trust KLEE as a sides, the results also demonstrate some interesting findings state-of-the-art symbolic executor. Besides, there are several about these tools. For example, Angr only supports one- survey papers (e.g., [6], [7], [12]) which also include some leveled arrays but not two-leveled arrays; Triton does not discussion about the challenges. But our work provides a even support the atoi function. Most of our findings are more complete list of challenges as shown in Figure 1. unavailable from the tool websites or existing papers, which In our previous conference paper [11], we have con- further justifies the value of our benchmark approach. ducted an empirical study with some of the challenges. The rest of the paper is organized as follows. We first This paper notably extends our previous paper with a discuss the related work in Section 2 and introduce the formal benchmark methodology. It serves as a pilot study preliminary knowledge of symbolic execution in Section 3. on systematically benchmarking symbolic execution tools Then, we demonstrate our systematic study about the chal- in handling particular challenges. We consequently design a lenges of symbolic execution in Section 4, and we introduce novel benchmark framework based on logic bombs, which our benchmark methodology in Section 5. We present our can facilitate the automation of the benchmark process. experiments and results in Section 6. Finally, we conclude We further provide a benchmark toolset that can be easily this paper in Section 7. deployed by ordinary users. 3

3 PRELIMINARY

This section reviews the underlying techniques of symbolic execution, which is a preliminary for discussing the challenges and the benchmark approach we proposed in this work.

3.1 Theoretical Basis ሺሻ The core principle of symbolic execution is symbolic reasoning. Informally, given a sequence of instructions along ሺ ሻ a control path, a symbolic reasoning engine can extract a constraint model and generates a test case for the path by solving the model. Formally, we can use Hoare Logic [20] to model the symbolic reasoning problem. Hoare Logic is composed of basic Fig. 2. A conceptual framework for symbolic execution. triples {S1}I{S2}, where {S1} and {S2} are the assertions of variable states and I is an instruction. The Hoare triple says if a precondition {S1} is met, when executing I, it will 3.2 Symbolic Execution Framework terminate with the postcondition {S2}. Using Hoare Logic, We demonstrate the conceptual framework of a symbolic we can model the semantics of instructions along a control execution tool in Figure 2. It inputs a program and outputs path as: test cases for the program. The framework includes a core symbolic reasoning engine and a path selection engine. The symbolic reasoning engine analyzes the instructions S0 I0 S1, 1 I1... S −1, −1 I −1 S { } { ∆ } { n ∆n } n { n} along a path and generates test cases that can trigger the path. Based on the theoretical basis of symbolic reasoning, {S0} is the initial symbolic state of the program; {S1} is we can divide symbolic reasoning into four stages: symbolic the symbolic state before the first conditional branch with variable declaration, instruction tracing, semantic interpre- symbolic variables; ∆ is the corresponding constraint for i tation, and constraint modeling and solving. The details are executing the following instructions, and {S } satisfies ∆ . i i discussed as follows: A symbolic execution engine can compute an initial state ′ • {S0} (i.e., the concrete values for symbolic variables) which Symbolic variable declaration (Svar): In this stage, we can trigger the same control path. This can be achieved have to declare symbolic variables which will be by computing the weakest precondition (aka wp) backward employed in the following symbolic analysis process. using Hoare Logic: If some symbolic variables are missing from declaration, insufficient constraints can be generated for triggering a control path. • {Sn−2} = wp(In−2{Sn−1}), s.t. {Sn−1} sat ∆n−1 Instruction tracing (Sinst): This stage collects the instructions along control paths. If some instructions

{Sn−3} = wp(In−3{Sn−2}), s.t. {Sn−2} sat ∆n−2 are missing, or the syntax are not supported, errors would occur. • Semantic interpretation (S ): This stage translates ... sem the semantics of collected instructions with an intermediate language (IL). If some instructions are {S1} = wp(I1{S2}), s.t. {S2} sat ∆2 not correctly interpreted, or the data propagation are incorrectly modeled, errors would occur. {S0} = wp(I0{S1}), s.t. {S1} sat ∆1 • Constraint modeling and solving (Smodel): This stage generates constraint models from IL, and then solve Combining the constraints in each line, we can get a the constraint models. If a required satisfiability constraint model in conjunction normal form: ∆1 ∧ ∆2 ∧ modulo theory is unsupported, errors would occur. ... ∧ ∆n−1. The solution to the constraint model is a test case ′ The path selection engine determines which path should {S0} that can trigger the same control path. be analyzed in the next round of symbolic reasoning pro- Finally, when sampling {I }, not all instructions are use- i cess. Several favorite strategies include depth-first search, ful. We only keep the instructions whose parameter values width-first search, random search, etc [12]. depend on the symbolic variables. We can demonstrate the correctness by expending any irrelevant instruction Ii to X := E, which manipulates the value of a variable X 3.3 Implementation Variations with an expression E. Suppose E does not depend on any According to the different ways of instruction tracing, we symbolic value, then X would be a constant, and should not can classify symbolic execution tools into static symbolic be included in the weakest preconditions. In practice, it can execution and dynamic symbolic execution. Static symbolic be realized using taint analysis techniques [21]. execution loads a whole program first before extracting 4

TABLE 1 A list of the challenges faced by symbolic execution, and the symbolic reasoning stages they attack.

Stage of Error Challenge Idea Svar Sinst & Ssem Smodel Sym. Var. Declaration Contextual variables besides program arguments X X X Covert Propagations Propagating symbolic values in covert ways - X X Buffer Overflows Writing symbolic values without proper boundary check - X X Symbolic Parallel Executions Processing symbolic values with parallel codes - X X -reasoning Symbolic Memories Symbolic values as the offset of memory - X X Challenges Contextual Symbolic Values Retrieving contextual values with symbolic values - X X Symbolic Jumps Sym. values as the addresses of unconditional jump - - X Floating-point Numbers Symbolic values in float/double type - - X Arithmetic Overflows Integers outside the scope of an integer type - - X Loops Change symbolic values within loops - - - Path-explosion Crypto Functions Processing symbolic values with crypto functions - - - Challenges External Function Calls Processing sym. values with some external functions - - -

instructions along with a path on the program control-flow declared before a symbolic analysis process. For example, graph (CFG). Dynamic symbolic execution is also known in source-code-based symbolic execution tools (e.g., KLEE), as concolic (concrete and symbolic) execution. It collects users can manually declare symbolic variables in the source instructions which are actually executed. In each round, codes. Binary-code-based concolic execution tools (e.g., Tri- the concolic execution engine executes the program with ton) generally assume a fixed length of program arguments concrete values to generate instructions. from stdin as the symbolic variable. If some symbolic vari- We may also classify symbolic execution tools into ables are missing from the declaration, the generated test source-code-based symbolic execution and binary-code- cases would be insufficient for triggering particular control based symbolic execution. In general, we do not perform paths. Since the root cause occurs before symbolic execution, symbolic reasoning on source codes or binaries directly. A the challenge attacks Svar. prior step is to interpret the semantics of the program with Figure 3(a) is a sample with a symbolic variable declara- an intermediate language (IL). For source codes, we can tion problem. It returns a BOMB_ENDING only when being translate the code directly with a compiler frontend. For executed with a particular process id. To explore the path, binaries, we have to lift the assembly codes into IL, which a symbolic execution tool should treat pid as a symbolic is more difficult. Their main difference lies in the translation variable and then solve the constraint with respect to pid. process. Otherwise, it cannot find test cases that can trigger the path. To declare symbolic variables precisely, a user should 4 CHALLENGES OF SYMBOLIC EXECUTION know target programs well. However, the task is impossible when analyzing programs on a large scale, e.g., when Based on whether a challenge attacks the symbolic reason- performing malware analysis. In an ideal case, a symbolic ing process, we categorize the challenges of symbolic ex- execution tool may automatically detect such variables ecution into symbolic-reasoning challenges and path-explosion which can control program behaviors and reports the so- challenges. A symbolic-reasoning challenge attacks the sym- lutions accordingly. To our best knowledge, non-existent bolic reasoning process and leads to incorrect test cases tools have implemented the ideal feature. Instead, they are generated. A path-explosion challenge happens when there generally discussed together with other problems related are too many paths to analyze. It does not attack a single to the computing environment, such as libraries, kernels, symbolic reasoning process, but it may starve the compu- and drivers [28]. In reality, there are several challenges tational resources or requires a very long time for symbolic of this work refer to the computing environment, such as execution. contextual symbolic variables, covert propagations, parallel Table 1 demonstrates the challenges that we have in- executions, external function calls. We demonstrate that vestigated in this work. We collect such challenges via a these challenges are different. careful survey of existing papers. The survey scope covers several survey papers about symbolic execution techniques 4.1.2 Covert Propagations (e.g., [6], [7], [12]), several investigations that focus on sys- temizing the challenges of symbolic execution (e.g., [9], [10]), Some data propagation ways are covert because they cannot and other important papers related to symbolic execution be traced easily by data-flow analysis tools. For example, if (e.g., [15], [16], [22], [23], [24], [25], [26], [27]). the symbolic values are propagated via other media (e.g., files) outside of the process memory, the propagation would be untraceable. Such propagation methods are undecidable 4.1 Symbolic-reasoning Challenges and can be beyond the capability of pure program analysis. Next, we discuss nine challenges that may incur errors to Symbolic execution tools have to handle such cases with ad the symbolic reasoning process. hoc methods. There are also some propagations challenging only to certain implementations. For example, propagating 4.1.1 Symbolic Variable Declarations symbolic values via embedded assembly codes should be Test cases are the solutions of symbolic variables subject to a problem for source-code-based symbolic execution tools constrain models. Therefore, symbolic variables should be only. If a symbolic execution tool fails to detect some prop- ͳ ̴ሺሻሼ ͳ ȗሺ ȗ ሻሼ ͳͳ ̴ሺሻሼ ͳ ̴ሺ ȗሻሼ ʹ ൌሺሻሺሻǢ ʹ ȗൌ̶̶Ǣ ͳʹ ሾʹͷ͸ሿǢ ʹ ൌͲǢ ͵ ሺ̶ Ψ̳Ψ̶ǡሻǢ ͵ ȗ ൌሺ ǡ̶̶ሻǢ ͳ͵ ሺ ǡ̶ Ψ̶̳ǡሻǢ ͵ ሾͺሿǢ Ͷ ሺ ൌൌͶͲͻ͸ሻ Ͷ ሾͳͲʹͶሿǢ ͳͶ ȗൌሺ ሻǢ Ͷ ሺǡሻǢ ͷ ̴ Ǣ ͷ ሺǡ̵̳Ͳ̵ǡሺሻሻǢ ͳͷ ሺሺሻൌൌ͹ሻሼ ͷ ሺൌൌͳሻሼ ͸ ͸ ሺሺǡͳͲʹͶǦͳǡሻǨൌሻ ͳ͸ ̴ Ǣ ͸ ̴ Ǣ ͹ ̴ Ǣ ͹ ൌǢ ͳ͹ ሽ ͹ ሽ ͺ ሽ ͺ ሺሻǢ ͳͺ ̴ Ǣ ͺ ̴ Ǣ ͻ ͻ Ǣ ͳͻ ሽ ͻ ሽ ͳͲ ͳͲ ሽ ʹͲ ͳͲ

ͳ ̴ሺሻሼ ͳ ̴ሺሻሼͳ ȗሺ ȗ ሻሼ ͳͳͳ ȗሺ ̴ሺ ȗሻሼ ȗ ሻሼ ͳͳ ̴ሺ ȗሻሼͳ ̴ሺ ȗሻሼ ͳ ̴ሺ ȗሻሼ 5 ʹ ൌሺሻሺሻǢ ʹ ʹ ȗൌ̶̶Ǣ ൌሺሻሺሻǢ ͳʹʹ ȗൌ̶̶Ǣ ൌሾͲሿǦͶͺǢ ͳʹ ʹ ൌሾͲሿǦͶͺǢ ൌͲǢ ʹ ൌͲǢ ͵ ሺ̶ Ψ̳Ψ̶ǡሻǢ ͵ ͵ ȗ ሺ̶ ൌሺ ǡ̶̶ሻǢ Ψ̳Ψ̶ǡሻǢ ͳ͵͵ ሾʹͷ͸ሿǢ ȗ ൌሺ ǡ̶̶ሻǢ ͳ͵ ͵ ሾʹͷ͸ሿǢ ሾͺሿǢ ͵ ሾͺሿǢ ͳ ሺሻሼ ͳͲ ̴ሺሻሼ ͳ ̴ሺሻሼ Ͷ ሺ ൌൌͶͲͻ͸ሻ Ͷ ሺͶ ൌൌͶͲͻ͸ሻ ሾͳͲʹͶሿǢ ͳ ̴ሺሻሼͳͶͶ ሺ ǡ̶ Ψ̶̳ǡሻǢ ሾͳͲʹͶሿǢͳ ̴ሺሻሼ ͳ ȗሺ ͳͶ Ͷ ሺ ǡ̶ Ψ̶̳ǡሻǢ ሺǡሻǢͳ ȗ ሻሼ ȗሺ ȗ ሻሼͳͳ ̴ሺ ȗሻሼͶ ͳͳ ሺǡሻǢ ̴ሺ ȗሻሼͳ ̴ሺ ȗሻሼͳ ̴ሺ ȗሻሼ ʹ ̴ ሾʹሿǢ ͳͳ ൌሺሻǢ ʹ ሾሿൌሼͳǡʹǡ͵ǡͶǡͷሽǢ ͷ ̴ Ǣ ͷ ͷ ̴ Ǣሺǡ̵̳Ͳ̵ǡሺሻሻǢ ʹ ͳͷͷͳ ̴ሺሻሼ ൌሺሻሺሻǢ ȗൌሺ ሻǢሺǡ̵̳Ͳ̵ǡሺሻሻǢʹ ͳ ͳ ̴ሺሻሼ ̴ሺሻሼ ൌሺሻሺሻǢ ʹ ͳͷ ȗൌ̶̶Ǣͷͳ ȗൌሺ ሻǢ ȗሺ ሺൌൌͳሻሼʹ ͳ ȗൌ̶̶Ǣ ȗሺ ͳ ȗሺ ȗ ሻሼ ȗ ሻሼͳʹ ȗ ሻሼ ൌሾͲሿǦͶͺǢͳͳͷ ̴ሺሻሼͳʹሺൌൌͳሻሼͳͳ ൌሾͲሿǦͶͺǢͳͳ ̴ሺ ȗሻሼ ̴ሺሻሼʹ ൌͲǢͳ ʹ ̴ሺ ȗሻሼͳ ൌͲǢ ̴ሺ ȗሻሼͳ ̴ሺ ȗሻሼ ͵ ͳൌ̴ ሺƬሾͲሿǡǡ ǡሺȗሻƬሻǢ ͳʹ ሺ̶Ψ̶̳ǡሻǢ ͵ ሺሾሿൌൌͷሻሼ ͸ ͸ ͸ ሺሺǡͳͲʹͶǦͳǡሻǨൌሻ͵ ͳ͸͸ʹ ሺ̶ ሺሺሻൌൌ͹ሻሼሺሺǡͳͲʹͶǦͳǡሻǨൌሻ͵ ʹ ൌሺሻሺሻǢʹ Ψ̳Ψ̶ǡሻǢ ሺ̶ ൌሺሻሺሻǢ ൌሺሻሺሻǢ Ψ̳Ψ̶ǡሻǢ͵ ͳ͸ ȗ͸ʹሺሺሻൌൌ͹ሻሼ ൌሺ ǡ̶̶ሻǢ ȗൌ̶̶Ǣ͵̴ Ǣʹ ȗʹ ȗൌ̶̶Ǣ ൌሺ ǡ̶̶ሻǢ ȗൌ̶̶Ǣͳ͵ ሾʹͷ͸ሿǢͳʹ͸ ͳ͵ ሾʹͷ͸ሿǢ̴ Ǣͳʹ ሾʹͷ͸ሿǢͳʹ ൌሾͲሿǦͶͺǢ ሾʹͷ͸ሿǢ͵ ሾͺሿǢʹ ͵ ʹ ሾͺሿǢ ൌͲǢʹ ൌͲǢ ൌͲǢ Ͷ ʹൌ̴ ሺƬሾͳሿǡǡǡሺȗሻƬሻǢ ͳ͵ ሺ ൌൌͷͲሻ Ͷ ̴ Ǣ ͹ ̴ Ǣ ͹ ͹ ̴ ǢൌǢ Ͷ ሺͳ͹͹͵ ൌൌͶͲͻ͸ሻൌǢ̴ ǢͶ ͵ ሺ̶ ͵ሺ ൌൌͶͲͻ͸ሻ ሺ̶ ሺ̶ Ψ̳Ψ̶ǡሻǢ Ψ̳Ψ̶ǡሻǢ Ψ̳Ψ̶ǡሻǢͶ ͳ͹ ሾͳͲʹͶሿǢ͹͵ ̴ Ǣሽ ȗͶ ͵ ሾͳͲʹͶሿǢ ൌሺ ǡ̶̶ሻǢ͵ ȗ ȗ ൌሺ ǡ̶̶ሻǢ ൌሺ ǡ̶̶ሻǢͳͶ ሺ ǡ̶ Ψ̶̳ǡሻǢͳ͵͹ ͳͶሽሺ ǡ̶ Ψ̶̳ǡሻǢͳ͵ ሺ ǡ̶ Ψ̶̳ǡሻǢͳ͵ ሾʹͷ͸ሿǢሺ ǡ̶ Ψ̶̳ǡሻǢͶ ሺǡሻǢ͵ Ͷ ሾͺሿǢ͵ ሺǡሻǢ ሾͺሿǢ͵ ሾͺሿǢ ͷ ͳൌ̴ሺሾͲሿǡሻǢ ͳͶ ̴ Ǣ ͷ ሽ ͺ ሽ ͺ ሽ ͺ ሺሻǢ ͷ ͳͺ̴ ǢͺͶ ሺሽ ሺሻǢͷ Ͷ ൌൌͶͲͻ͸ሻͶ ̴ Ǣሺሺ ൌൌͶͲͻ͸ሻ ൌൌͶͲͻ͸ሻ ͷ ͳͺሺǡ̵̳Ͳ̵ǡሺሻሻǢͺͶሽ ̴ Ǣ ሾͳͲʹͶሿǢͷ Ͷ ሺǡ̵̳Ͳ̵ǡሺሻሻǢͶ ሾͳͲʹͶሿǢ ሾͳͲʹͶሿǢͳͷ ȗൌሺ ሻǢͳͶͺ ͳͷ̴ Ǣ ȗൌሺ ሻǢͳͶ ȗൌሺ ሻǢͳͶሺ ǡ̶ Ψ̶̳ǡሻǢ ȗൌሺ ሻǢͷ ሺൌൌͳሻሼͶ ͷ ሺǡሻǢͶሺൌൌͳሻሼ ሺǡሻǢͶ ሺǡሻǢ ͻ ͻ Ǣ ͸ ͳͻ ̴ Ǣ͸ ͷ ͷ ̴ Ǣ̴ Ǣ͸ ሺሺǡͳͲʹͶǦͳǡሻǨൌሻͻ ሽ ͸ ͷ ሺሺǡͳͲʹͶǦͳǡሻǨൌሻͷ ሺǡ̵̳Ͳ̵ǡሺሻሻǢሺǡ̵̳Ͳ̵ǡሺሻሻǢͳ͸ ሺሺሻൌൌ͹ሻሼͳ͸ ͳͷ ሺሺሻൌൌ͹ሻሼͳͷ ȗൌሺ ሻǢሺሺሻൌൌ͹ሻሼ͸ ̴ Ǣ͸ ͸ ͷ ̴ Ǣ ʹൌ̴ሺሾͳሿǡሻǢሺൌൌͳሻሼͷ ሺൌൌͳሻሼ ͳͷ ̴ Ǣ ͸ ͻ ͻͷ Ǣ̴ Ǣ ͳͻ ͷ̴ Ǣሺǡ̵̳Ͳ̵ǡሺሻሻǢ ͳͷͻ ሽ ሺሺሻൌൌ͹ሻሼ ͷ ͹ ሺൌൌͳሻሼൌǢ ͳ͸ ሽ ͹ ̴ Ǣ ͳͲ ͳͲ ሽ ͹ ʹͲ̴ Ǣሽ ͹ ͸ ͸ ̴ Ǣ ͹ ൌǢͳͲ ͹ ͸ ൌǢ͸ ሺሺǡͳͲʹͶǦͳǡሻǨൌሻሺሺǡͳͲʹͶǦͳǡሻǨൌሻͳ͹ ̴ Ǣͳ͹ ͳ͸ ̴ Ǣͳ͸ሺሺሻൌൌ͹ሻሼ̴ Ǣ͹ ሽ ͹ ͸ሽ ͸̴ Ǣ̴ Ǣ ͳͲ ͳͲ͸ ሽ ʹͲ ሽ ͸ ሺሺǡͳͲʹͶǦͳǡሻǨൌሻ ͳͲͳ͸ ̴ Ǣ ͸ ͺ ̴ ǢǢ ͳ͹ ͺ ሽ ͺ ሽ ͺ ͹ሽ͹ ̴ Ǣ̴ Ǣͺ ሺሻǢ ͺ ͹ ሺሻǢ͹ ൌǢൌǢ ͳͺ ሽ ͳͺ ͳ͹ ሽ ͳ͹ ̴ Ǣሽ ͺ ̴ Ǣͺ ͹̴ Ǣሽ ͹ ሽ ͹ ̴ Ǣ ͹ ൌǢ ͳ͹ ሽ ͹ ͻ ሽ ሽ ͳͺ ͻ ͻ ͺ ሽ ͻ ͺ ͺሽ ሽ ͻ Ǣͺ ሺሻǢͻͺ Ǣͺ ሺሻǢ ሺሻǢ ͳͻ ̴ Ǣͳͺ ͳͻ ̴ Ǣͳͺ ̴ Ǣͳͺሽ ̴ Ǣͻ ሽ ͺ ͻ ሽ̴ Ǣͺ ̴ Ǣͺ ̴ Ǣ ͳͲ ͳͲ ͳͲ ሽ ͳͲ ሽ ʹͲ ሽ ʹͲ ሽ ͳͲ ͳͲ ͻ ͻ ͻ ͻ Ǣͻ ͻ ǢǢ ͳͻ ሽ ͳͻ ͳͻ̴ Ǣሽ ͻ ሽ ͻ ሽ ͻ ሽ ͳͲ ͳͲ (a)ͳͲ Symbolic variable declarations.ͳͲ ሽ ͳͲ ͳͲሽ ሽ (b) Covert symbolicʹͲ propagations.ʹͲ ሽʹͲ ͳͲ (c)ͳͲ BufferͳͲ overﬂows. ͳ ̴ሺ ȗሻሼ ͳ ሺሻሼ ͳͲ ሺǨൌͳሻሼ ͳ ሺሻሼ ͳ ሺሻሼͳͲ ̴ሺ ȗሻሼ ͳͲ ̴ሺ ȗሻሼͳ ̴ሺ ȗሻሼ ͳ ̴ሺ ȗሻሼ ʹ ൌሾͲሿǦ ͶͺǢ ʹ ሺΨʹൌൌͲሻ ͳͳ ൌሺሻǢ ʹ ̴ ሾʹሿǢ ʹ ̴ ሾʹሿǢͳͳ ൌሾͲሿǦͶͺǢ ͳͳ ൌሾͲሿǦͶͺǢʹ ൌሾͲሿǦͶͺǢ ʹ ൌሾͲሿǦͶͺǢ ͵ ሺʹͷͶ͹Ͷͺ͵͸Ͷȗ ൏ͲƬƬ ൐Ͳሻሼ ͵ ȀʹǢ ͳʹ ൅൅Ǣ ͵ ͳൌ̴ ሺƬሾͲሿǡǡ ǡሺ͵ ͳൌ̴ ሺƬሾͲሿǡǡ ǡሺȗሻƬሻǢ ͳʹ ൌሺሻǢͳ ሺሻሼȗሻƬሻǢͳ ሺሻሼͳʹ ൌሺሻǢ͵ ሾሿൌሼͳǡʹǡ͵ǡͶǡͷሽǢͳͲ ̴ሺ ȗሻሼͳͲ ͵ ̴ሺ ȗሻሼሾሿൌሼͳǡʹǡ͵ǡͶǡͷሽǢ ͳ ̴ሺ ȗሻሼͳ ̴ሺ ȗሻሼͳ ̴ሺ ȗሻሼ ͳ ͓ሺሻሺ̶ ȗΨͲ̶ǣǣ̶̶ሺሻǣሻ ͳͲ ̴Ͳǣ ͳ ̴ሺሻሼ Ͷ ʹൌ̴ ሺƬሾͳሿǡǡǡሺȗሻƬሻǢ ͳ͵ ሺ ൌൌͷͲሻሼ Ͷ ሺሾΨͷሿൌൌͷሻሼͶ ̴ Ǣ Ͷ ͵ȗ൅ͳǢ ͳ͵ ሽ Ͷ ʹൌ̴ ሺƬሾͳሿǡǡǡሺʹ ̴ȗሻƬሻǢ ሾʹሿǢʹ ͳ ̴ሺሻሼͳ͵ ሺ ሾʹሿǢ ൌൌͷͲሻሼ ͳͳ ൌሾͲሿǦͶͺǢͳͳ ͳͲͶ ൌሾͲሿǦͶͺǢሺሾΨͷሿൌൌͷሻሼ ̴ሺ ȗሻሼ ʹ ൌሾͲሿǦͶͺǢʹ ͳ ൌሾͲሿǦͶͺǢ ̴ሺ ȗሻሼʹ ȗ ൌሺǡ̶̶ሻǢ ʹ ͳͳ ሺ ൐Ͳሻሼ ʹ ሺ̶ ൌΨ̶̳ǡሻǢ ͳ ሺሻሼͳ ሺሻሼ ͷ ሽ ͳͲ ̴ሺሻሼͳͲ ̴ሺሻሼ ͷ ͳሽ ̴ሺሻሼͳ ̴ሺሻሼͳͶ ሺ ൌൌʹͷሻ ͷ ͳൌ̴ሺሾͲሿǡሻǢ ͷ ͳൌ̴ሺሾͲሿǡሻǢͳͶ ̴ Ǣ͵ ͳൌ̴ ሺƬሾͲሿǡǡ ǡሺ͵ ʹ ͳൌ̴ ሺƬሾͲሿǡǡ ǡሺͳͶ̴̴ Ǣ ሾʹሿǢͷ ̴ ǢȗሻƬሻǢ ͳʹ ȗሻƬሻǢ ൌሺሻǢͳʹ ͳͳͷ ൌሺሻǢ̴ Ǣ ൌሾͲሿǦͶͺǢ ͵ ሾሿൌሼͳǡʹǡ͵ǡͶǡͷሽǢ͵ ʹ ሾሿൌሼͳǡʹǡ͵ǡͶǡͷሽǢ ൌሾͲሿǦͶͺǢ͵ ሺ Ǩൌሻሼ ͵ ̴ሺሻሼ ͳʹ ൅൅Ǣ ͵ ൌȀ͹ͲǤͲǢ ʹ ̴ʹ ሾʹሿǢ̴ ሾʹሿǢ ͸ ̴ Ǣͳͳ ൌሺሻǢͳͳ ൌሺሻǢ ͸ ʹ ̴ሺ ȗሻሼሾሿൌሼͳǡʹǡ͵ǡͶǡͷሽǢʹ ሾሿൌሼͳǡʹǡ͵ǡͶǡͷሽǢͳͷ ̴ Ǣ ͸ ʹൌ̴ሺሾͳሿǡሻǢ ͸ ʹൌ̴ሺሾͳሿǡሻǢͳͷ ሽ Ͷ ʹൌ̴ ሺƬሾͳሿǡǡǡሺͶ ͵ ʹൌ̴ ሺƬሾͳሿǡǡǡሺͳͷ ͳൌ̴ ሺƬሾͲሿǡǡ ǡሺሽ ͸ ሽ ȗሻƬሻǢ ͳ͵ ሺȗሻƬሻǢȗሻƬሻǢ ൌൌͷͲሻሼͳ͵ ͳʹ͸ ሺ ൌൌͷͲሻሼሽ ൌሺሻǢ Ͷ ሺሾΨͷሿൌൌͷሻሼͶ ͵ሺሾΨͷሿൌൌͷሻሼ ሾሿൌሼͳǡʹǡ͵ǡͶǡͷሽǢͶ ሺሻǢ Ͷ ൌƬƬ̴Ͳ൅Ǣ ͳ͵ ሺ ൌൌͲሻ Ͷ ൌͲǤͳǢ ͵ ͳൌ̴ ሺƬሾͲሿǡǡ ǡሺ͵ ͳൌ̴ ሺƬሾͲሿǡǡ ǡሺ͹ ሽ ȗሻƬሻǢ ͳʹ ȗሻƬሻǢ ሺ̶Ψ̶̳ǡሻǢͳʹ ሺ̶Ψ̶̳ǡሻǢ ͹ ͵ ሺሾሿൌൌͷሻሼ ൌሾͲሿǦͶͺ൅ͻͶǢ͵ ሺሾሿൌൌͷሻሼͳ͸ ͹ ൌǢ ͹ ൌǢ ͳ͸ ̴ Ǣͷ ͳൌ̴ሺሾͲሿǡሻǢͷ Ͷ ͳൌ̴ሺሾͲሿǡሻǢͳ͸ ʹൌ̴ ሺƬሾͳሿǡǡǡሺ̴ Ǣ͹ ͳͶ ̴ ǢȗሻƬሻǢͳͶ ͳ͵͹ ̴ Ǣሺ ൌൌͷͲሻሼ ͷ ̴ Ǣͷ Ͷ ̴ ǢሺሾΨͷሿൌൌͷሻሼͷ ̴ Ǣ ͷ ሺ ൐͵ͲƬƬ ൏ͶͲሻሼ ͳͶ ̴ Ǣ ͷ ሺǨൌͲǤͳƬǦ ൌൌͲሻሼ Ͷ ʹൌ̴ ሺƬሾͳሿǡǡǡሺͶ ʹൌ̴ ሺƬሾͳሿǡǡǡሺͺ ȗሻƬሻǢ ͳ͵ ȗሻƬሻǢሺ ൌൌͷͲሻͳ͵ ሺ ൌൌͷͲሻ ͺ Ͷ ൌሺሻǢ̴ ǢͶ ̴ Ǣͳ͹ ̴ Ǣ ͺ Ǣ ͺ Ǣ ͳ͹ ሽ ͸ ʹൌ̴ሺሾͳሿǡሻǢ͸ ͷ ʹൌ̴ሺሾͳሿǡሻǢͳ͹ ͳൌ̴ሺሾͲሿǡሻǢሽ ͺ ̴ Ǣͳͷ ሽ ͳͷ ͳͶͺ ሽ ̴ Ǣ̴ Ǣ ͸ ሽ ͸ ͷ ሽ ̴ Ǣ͸ ሽሼ ͸ ሺΨ͸ൌൌͳሻ ͳͷ ሽ ͸ ̴ Ǣ ͷ ͳൌ̴ሺሾͲሿǡሻǢͻ ͳͶ ̴ Ǣ ͻ ͷ ൌͳǢሽ ͳͺ ሽ ͻ ሽ ͻ ሽ ͳͺ ͹ ൌǢͷ ͳൌ̴ሺሾͲሿǡሻǢ͹ ͸ ൌǢͳͺ ʹൌ̴ሺሾͳሿǡሻǢͻ ሽ ͳ͸ ̴ ǢͳͶ ͳ͸ ͳͷ̴ Ǣͻ ̴ Ǣሽ ሽ ͹ ͷ ͹ ሽ͸ ሽ ͹ ̴ Ǣ ͹ ሺሻǢ ͳ͸ ̴ Ǣ ͹ ሽ ͺ Ǣ ͸ ʹൌ̴ሺሾͳሿǡሻǢ ͳ͹ ሽ ͳͷ ̴ Ǣͺ ̴ Ǣ͸ ͸ ʹൌ̴ሺሾͳሿǡሻǢͺ ͹ ǢൌǢ ͳͷ ͳ͹̴ Ǣͳ͸ሽ ̴ Ǣ ͸ ͺ ͹ ̴ Ǣ ͺ ሽ ͺ ሽ ͳ͹ ሽ ͺ ̴ Ǣ ͻ ሽ ͳͺ ͻ ሽ ͹ ൌǢͻ ͺ ሽ͹ ǢൌǢ (d) Parallel executions.ͳ͸ ሽͳͺ ͳ͹ ͳ͸ሽ ሽ (e) Symbolic͹ ͻ memories.ሽͺ̴ Ǣ͹ ̴ Ǣ(f)̴ Ǣͻ Contextualሽ symbolic val- ͻ ͳͺ ͻ ሽ ͺ Ǣͺ Ǣ ͳ͹ ͳ͹ ͳ ̴ሺ ȗሻሼ ͺ ሽ ͺ ሽ ͳ ሺሻሼ ͳͲ ሺǨൌͳሻሼ ͻ ሽ ͳͺ ͻ ሽ ues. Ǧ ͻ ሽ ͻ ሽ ͳͺ ͳͺ ʹ ൌሾͲሿǦ ͶͺǢ ͻ ͻ ʹ ሺΨʹൌൌͲሻ ͳͳ ൌሺሻǢ ͵ ሺʹͷͶ͹Ͷͺ͵͸Ͷȗ ൏ͲƬƬ ൐Ͳሻሼ ͵ ȀʹǢ ͳʹ ൅൅Ǣ Ͷ ̴ Ǣ Ͷ ͵ȗ൅ͳǢ ͳ͵ ሽ ͳ ̴ሺ ȗሻሼ ͳ ͳ͓ሺሻሺ̶ ̴ሺ ȗሻሼ ȗΨͲ̶ǣǣ̶̶ሺሻǣሻ ͳ ͲሺሻሼͲǢሽǥͳͲ ̴Ͳǣ ͸ሺሻሼ͸Ǣሽͳ ̴ሺ ȗሻሼͳ ̴ሺ ȗሻሼͻ ሺ ͳ̴ሺǡ ሻൌൌͲሻሼͳ ̴ሺ ȗሻሼͳ ̴ሺ ȗሻሼ ͳ ̴ሺ ȗሻሼ ͳ ሺሻሼ ͳͲ ሺǨൌͳሻሼ ͷ ሽ ͷ ሽ ͳͶ ሺ ൌൌʹͷሻ ʹ ȗ ൌሺǡ̶̶ሻǢ ʹ ʹ ȗ ൌሺǡ̶̶ሻǢ ͳ ̴ሺ ȗሻሼʹ ̴ሺ ȗሻሼͳͳ ሺͳ ൐Ͳሻሼ͓ሺሻሺ̶ʹ ൌሾͲሿǦʹ ȗΨͲ̶ǣǣ̶̶ሺሻǣሻ ൌሾͲሿǦͶͺǢ ͶͺǢ ͳͲ ̴ ǢͳͲ ʹ ̴Ͳǣʹ ൌሾͲሿǦ ൌሾͲሿǦͶͺǢ ͶͺǢ ͳ ʹ ̴ሺ ȗሻሼ ൌሾͲሿǦ ͶͺǢ ʹ ሺΨʹൌൌͲሻ ͳͳ ൌሺሻǢ ͳ ̴ሺ ȗሻሼ ͳ ͓ሺሻሺ̶ ȗΨͲ̶ǣǣ̶̶ሺሻǣሻ͸ ̴ ǢͳͲ ̴Ͳǣ ͳ ̴ሺ ȗሻሼ͸ ̴ሺ ȗሻሼ ͳͷ ̴ Ǣ ͵ ሺ Ǩൌሻሼ ͵ ͵ ̴ሺ ȗሻሼሺ Ǩൌሻሼ ʹ ȗ ൌሺǡ̶̶ሻǢʹ͵ ȗͳʹ ሺȗ ሾ͹ሿሻሺሻൌሼͲǡͳǡʹǡ͵ǡͶǡͷǡ͸ሽǢ ൌሺǡ̶̶ሻǢ൅൅Ǣʹ ͵ ʹ ሾͷሿǢ͵ ൌȀ͹ͲǤͲǢ ͳͳ ሽሼ ͳͳ ͵ ͵ሺሺʹͷͶ͹Ͷͺ͵͸ͶȗൌȀ͹ͲǤͲǢ ൐Ͳሻሼͳͳ ሺ ൏ͲƬƬ ൐Ͳሻሼ ൐Ͳሻሼʹ ͵ ൌሾͲሿǦͶͺǢൌሺʹ ȗ Ȁ͵ͲሻǢ ൌሾͲሿǦͶͺǢ͵ ȀʹǢ ͳʹ ൅൅Ǣ ͳ ̴ሺ ȗሻሼ ͳ ͓ሺሻሺ̶ ȗΨͲ̶ǣǣ̶̶ሺሻǣሻ ͳͲ ̴Ͳǣ ͳ ̴ሺ ȗሻሼ Ͷ ሺሻǢ Ͷ Ͷ ൌሾͲሿǦ ሺሻǢ ͶͺǢ ͵ ሺ Ǩൌሻሼ͵Ͷ ሺ Ǩൌሻሼͳ͵ ൌ ሾሺሾͲሿǦሺ͵ ൌൌͲሻ ̴ሺ ȗሻሼ ͶͺሻΨ͹ሿሺሻǢͶ ͵ ሾͲሿൌͲ͹͹͸ͺǢ ̴ሺ ȗሻሼͶ ൌͲǤͳǢ ͳʹ ͹ ሽ̴ Ǣͳʹ Ͷ Ͷ ൅൅Ǣ̴ ǢൌͲǤͳǢͳʹ ൅൅Ǣ ͵ ͶൌȀ͹ͲǤͲǢሺ൐ͲǤͷሻሼ͵ ൌȀ͹ͲǤͲǢ͹ Ͷ ൌሾͲሿǦͶͺǢ͵ȗ൅ͳǢ ͳ͸ ͳ͵ ሽ ͳ ̴ሺሻሼʹ ͳ ȗ ̴ሺሻሼ ൌሺǡ̶̶ሻǢͳ ͓ሺሻሺ̶ʹ ͳ ͓ሺሻሺ̶ ȗΨͲ̶ǣǣ̶̶ሺሻǣሻ ȗΨͲ̶ǣǣ̶̶ሺሻǣሻͻ ̴Ͳǣ ͻ ̴Ͳǣ ͳͳ ሺ ൐Ͳሻሼͳ ̴ሺሻሼͳ ̴ሺሻሼʹ ൌሾͲሿǦͶͺǢ ͷ ̴ Ǣ ͷ ͷ ሺΨ͸Ǩൌͳȁȁ൏ͳͲȁȁ൐ͶͲȁȁ̴ Ǣ Ͷ ൌൌͳͻሻሼ ሺሻǢͶͷ ሺൌൌͷሻሼ ሺሻǢͳͶ ̴ ǢͶ ൌሾͲሿǦͷ Ͷ ሾͳሿൌͲ ͺʹ͵Ǣ ൌሾͲሿǦ ͶͺǢͷ ሺǨൌͲǤͳƬǦ ͶͺǢ ൌൌͲሻሼͳ͵ ͺ ሽ ͳ͵ ͷ ͷ ሺሽ ሺǨൌͲǤͳƬǦͳ͵ ൌൌͲሻ ሺ ൌൌͲሻሼ ൌൌͲሻͶ ͷൌͲǤͳǢ̴ ǢͶ ൌͲǤͳǢͺ ൌሺሻǢͷ ሽ ͳ͹ ͳͶ ̴ Ǣሺ ൌൌʹͷሻ ʹ Ǣ͵ ʹ ሺǢ Ǩൌሻሼ ʹ ͵ ʹ ̴ሺ ȗሻሼ ͳͲ ሺͳͲ ൐Ͳሻሼሺͳʹ ൐Ͳሻሼ൅൅Ǣʹ ሺ̶ʹ ሺ̶ ൌΨ̶̳ǡሻǢ͵ ൌȀ͹ͲǤͲǢ ൌΨ̶̳ǡሻǢ ͸ ሽሼ ͸ ͸ ൌͳ͵Ǣሽሼ ͷ ̴ Ǣͷ͸ ̴ Ǣͳͷ̴ Ǣሽ ͷ ሺΨ͸Ǩൌͳȁȁ൏ͳͲȁȁ൐ͶͲȁȁ͸ ͷ ሾʹሿൌͲͷͺǢሺΨ͸Ǩൌͳȁȁ൏ͳͲȁȁ൐ͶͲȁȁ͸ ̴ ǢൌൌͳͻሻሼͳͶ ͻሽ ൌൌͳͻሻሼͳͶ ͸ ͸ ̴ Ǣ̴ Ǣ̴ ǢͳͶ ̴ Ǣͷ ͸ሺǨൌͲǤͳƬǦሽ ͷ ൌൌͲሻሼሺǨൌͲǤͳƬǦͻ ൌൌͲሻሼ͸ ̴ሺ ȗሻሼ ൌͳǢ ͳͺ ሽ ͳͷ ̴ Ǣ ͵ ൌሺሻሺሻǢͶ ͵ ሺሻǢ ൌሺሻሺሻǢ ͵ ̴ሺሻሼͶ ͵ ̴ሺሻሼ ൌሾͲሿǦ ͶͺǢ ͳͳ ൅൅Ǣͳͳ ൅൅Ǣͳ͵ ሺ͵ ൌൌͲሻൌȀ͹ͲǤͲǢ͵ ൌȀ͹ͲǤͲǢͶ ൌͲǤͳǢ ͹ ̴ Ǣ ͹ ͹ ሽ ̴ Ǣ ͸ ሽሼ ͸͹ ሽሼሽ ͳ͸ ̴ Ǣ͸ ൌͳ͵Ǣ͸ ሾ͵ሿൌͲͳ ͺͳͶ͹Ǣൌͳ͵Ǣ͹ ሽ ͳͷ ͹ሽ ሽ ͳͷ ሽ ͸ ̴ Ǣ͸ ̴ Ǣ Ͷ ሺͷ ൌൌሻሼͶ ̴ Ǣሺ ൌൌሻሼ Ͷ͹ ͷ Ͷ ሺΨ͸Ǩൌͳȁȁ൏ͳͲȁȁ൐ͶͲȁȁ ൌƬƬ̴Ͳ൅Ǣ ൌƬƬ̴Ͳ൅Ǣͳͷ ͳʹ ͹ൌൌͳͻሻሼሺሽ ͳʹ ൌൌͲሻሺͳͶ ൌൌͲሻ̴ ǢͶ ͹ൌͲǤͳǢ̴ ǢͶ ൌͲǤͳǢͷ ሺǨൌͲǤͳƬǦ͹ ൌൌͲሻሼ ൌሾͲሿǦͶͺ൅ͻͶǢ ͳ͸ ͺ ሽ ͺ ͺ ሽ ൌƬƬ̴Ͳ൅Ǣ ͹ ̴ Ǣ͹ͺ ̴ Ǣ̴ Ǣͳ͹ ሽ ͹ ሽ ͺ ͹ ሾͶሿൌͲͳͲ͸ͺ͵Ǣሽ ͺ ̴ Ǣͳ͸ ͳ͸ ͺ ͺ̴ Ǣ̴ Ǣͳ͸ ̴ Ǣ͹ ͺሽ ሽ ͹ ሽ ͺ ൌሺሻǢ ͳ͹ ̴ Ǣ ͺ ሽ ͷ ̴ Ǣ͸ ͷ ሽሼ̴ Ǣͺ ͷ ሺ͸ ൌƬƬ̴Ͳ൅Ǣͷ ൐͵ͲƬƬൌͳ͵Ǣሺ ൐͵ͲƬƬ ൏ͶͲሻሼ ൏ͶͲሻሼ ͳ͵ ͳ͹ ሽ ̴ Ǣͳ͵ ̴ Ǣͳͷ ሽ ͷͺ ሺǨൌͲǤͳƬǦ̴ Ǣͷ ሺǨൌͲǤͳƬǦ ൌൌͲሻሼ͸ ̴ Ǣ ൌൌͲሻሼ ͻ ሽ ͻ ͻ ሺሻǢሽ ͺͻ ሽሽ ͳͺ (g) Symbolic jumps. ͺ ͻ(h) ሽ Floating-point ൌƬƬ̴Ͳ൅Ǣ numbers.ͻ (i)ͻ Arithmeticሽ ͳ͹ ሽ overﬂows. ͻ (j) Externalͺ function̴ Ǣ calls. ͻ ൌͳǢ ͳͺ ሽ ͻ ሽ ͸ ሽ ͻ ͹ሽ͸ ̴ Ǣሽ ͻ ሺሻǢ͸ ͻ ሺΨ͸ൌൌͳሻ͹ ሺሻǢ͸ ሽ ሺΨ͸ൌൌͳሻ ͳͶ ͳͺ ሽ ͳͶ ሽͳͺ ͳ͸ ̴ Ǣ͸ͻ ሽ ̴ Ǣ͸ ͻ ሽ ͹̴ Ǣሽ ͹ ̴ Ǣͺ ͹ ሽ ̴ Ǣ͹ ͺሺሻǢ͹ ሺሻǢ Ǧ ൌƬƬ̴Ͳ൅Ǣ ͳͷ ̴ Ǣͳͷ Ǧ̴ Ǣͳ͹ ሽ ͹ ሽ ͹ ሽ ͺ ̴ Ǣ ͺ ሽ ͻ ͺሽ ሽ ͺ ሽ ͻ ͺ ሺሻǢሽ ͳ͸ ሽ ͳ͸ ሽ ͳͺ ͺ ̴ Ǣ Ǧ ͺ ̴ Ǣͻ Ǧሽ ͻ ͻ ͻ ͻ ͻ ሽ ͻ ሽ ͳ ̴ሺ ȗሻሼ ͳ ሺሻሼ ͳͲ ሺǨൌͳሻሼ ͳ ̴ሺ ȗሻሼ ͻ ሺ ͳ̴ሺǡ ሻൌൌͲሻሼ Ǧ ͳ ̴ሺ ȗሻሼ ʹ ൌሾͲሿǦ ͶͺǢ ʹ ሺΨʹൌൌͲሻ ͳͳ ൌሺሻǢ ʹ ൌሾͲሿǦ ͶͺǢ ͳͲ ̴ Ǣ Ǧ Ǧ ʹ ൌሾͲሿǦ ͶͺǢ ͵ ሺʹͷͶ͹Ͷͺ͵͸Ͷȗ ൏ͲƬƬ ൐Ͳሻሼ ͵ ȀʹǢ ͳʹ ൅൅Ǣ Ͷ ̴ Ǣ Ͷ ͵ȗ൅ͳǢ ͳ͵ ሽ ͵ ሾͷሿǢ ͳͳ ሽሼ ͵ ൌሺ ȗ Ȁ͵ͲሻǢ Ͷ ሾͲሿൌͲ͹͹͸ͺǢͳ ̴ሺ ȗሻሼͳʹ ̴ Ǣͻ ሺ ͳ̴ሺǡ ሻൌൌͲሻሼ Ͷ ሺ൐ͲǤͷሻሼͳ ̴ሺ ȗሻሼ ͷ ሽ ͷ ሽ ͳͶ ሺ ൌൌʹͷሻ ʹ ൌሾͲሿǦ ͶͺǢ ͳͲ ̴ Ǣ ʹ ൌሾͲሿǦ ͶͺǢ ͸ ̴ Ǣ ͸ ̴ሺ ȗሻሼ ͳͷ ̴ Ǣ ͷ ሾͳሿൌͲ ͺʹ͵Ǣ ͳ͵ ሽ ͷ ̴ Ǣ ͵ ሾͷሿǢ ͳͳ ሽሼ ͵ ൌሺ ȗ Ȁ͵ͲሻǢ ͹ ሽ ͹ ൌሾͲሿǦͶͺ൅ͻͶǢ ͳ͸ ͸ ሾʹሿൌͲͷͺǢ ͳͶ ሽ ͸ ሽ Ͷ ሾͲሿൌͲ͹͹͸ͺǢ ͳʹ ̴ Ǣ Ͷ ሺ൐ͲǤͷሻሼ ͺ ͺ ൌሺሻǢ ͳ͹ ̴ Ǣ ͹ ሾ͵ሿൌͲͳ ͺͳͶ͹Ǣ ͳͷ ͹ ̴ Ǣ ͷ ሾͳሿൌͲ ͺʹ͵Ǣ ͳ͵ ሽ ͷ ̴ Ǣ ͻ ͻ ൌͳǢ ͳͺ ሽ ሾͶሿൌͲͳͲ͸ͺ͵Ǣ (k) Loops. ͺ ͸ ሾʹሿൌͲͷͺǢ(l) Cryptoͳ͸ functions.ͳͶ ሽ ͺ ሽ ͸ ሽ ͹ ሾ͵ሿൌͲͳ ͺͳͶ͹Ǣ ͳͷ ͹ ̴ Ǣ Fig. 3. Logic bomb samples with challenging symbolic execution issues. In eachͺ sample, ሾͶሿൌͲͳͲ͸ͺ͵Ǣ we employ symvar toͳ͸ denote a symbolic variable, and ͺ ሽ BOMB_ENDING to denote a macro value indicating a particular program behavior. ͻ

ͳ ̴ሺ ȗሻሼ ͻ ሺ ͳ̴ሺǡ ሻൌൌͲሻሼ ͳ ̴ሺ ȗሻሼ ʹ ൌሾͲሿǦ ͶͺǢ ͳͲ ̴ Ǣ agation, the instructionsʹ ൌሾͲሿǦ related to ͶͺǢ the propagated values to be affected by buffer overflows because the stack layout ͵ ሾͷሿǢ ͳͳ ሽሼ ͵ ൌሺ ȗ Ȁ͵ͲሻǢ Ͷ ሾͲሿൌͲ͹͹͸ͺǢ ͳʹ ̴ Ǣwould be missed fromͶ ሺ൐ͲǤͷሻሼ the following analysis. Therefore, the of a program only exists in assembly codes, which may ͷ ሾͳሿൌͲ ͺʹ͵Ǣ ͳ͵ ሽ challenge attacks theͷ stages̴ Ǣ of Sinst and Ssem. vary for particular platforms. Therefore, such tools cannot ͸ ሾʹሿൌͲͷͺǢ ͳͶ ሽ ͸ ሽ ͹ ሾ͵ሿൌͲͳ ͺͳͶ͹Ǣ ͳͷ Figure 3(b) shows͹ a̴ Ǣ covert propagation sample. We de- model the stack information with source codes only. In ͺ ሾͶሿൌͲͳͲ͸ͺ͵Ǣ ͳ͸ fine an integer i andͺ ሽ initiate it with the value of a symbolic contrast, binary-code-based symbolic execution tools should variable (symvar). So i is also a symbolic variable. We then be more potent in handling buffer overflow issues because propagate the value of i to another variable (ret) through a there can simulate actual memory layouts. However, even shell command (echo), and let ret control the return value. if these tools can precisely track the propagation, they still To find a test case which can return the BOMB_ENDING, a suffer difficulties in automatically analyzing the unexpected symbolic execution tool should properly track or model the program behaviors caused by overflow. Otherwise, they propagation incurred by the shell command. would be powerful enough to generate exploits for bugs, which is a problem far from being solved [29]. 4.1.3 Buffer Overflows Buffer overflow is a typical software bug that can bring security issues. Due to insufficient boundary check, the Figure 3(c) demonstrates a buffer overflow example. input data may overwrite adjacent memories. Adversaries The program returns a BOMB_ENDING if the value of flag can employ such bugs to inject data and intentionally tam- equals one, which is unlikely because the value is zero and per the semantics of the original codes. Buffer overflows should remain unchanged without explicit modification. can happen in either stack or heap regions. If a symbolic However, the program has a buffer overflow bug. It has a execution tool cannot detect the overflow issues, it would buffer (buf) of eight bytes and employs no boundary check fail to track the propagation of symbolic values. Therefore, when copying symbolic values to the buffer with strcpy. buffer overflow involves a particular covert propagation We can change the value of flag to one leveraging the bug, issue. Source-code-based symbolic execution tools are prone e.g., when symvar is “ANYSTRIN\x01\x00\x00\x00”. 6

4.1.4 Parallel Executions like symbolic memories, symbolic values can also serve as Classic symbolic execution is effective for sequential pro- the parameters to retrieve values from the environment, grams. We can draw an explicit CFG for sequential pro- such as loading the contents of a file pointed by symbolic grams and let a symbolic execution engine traverse the values. By default, this contextual information is unavailable CFG. However, if a program processes symbolic variables in to the program or process, and the analysis is complicated. parallel, classic symbolic execution techniques would suffer Moreover, since the contextual information can be changed problems. Parallel programs can be undecidable because the any time without informing the program, the problem is execution order of parallel codes does not only depend on undecidable. A symbolic tool that does not support such the program but may also depend on the execution context. operations would cause errors during Sinst and Ssem. A parallel program may exhibit different behaviors even Figure 3(f) is an example of contextual symbolic values. with the same test case. This poses a problem for symbolic If symvar points to an existed file on the local disk, the execution to generate test cases for triggering corresponding program would return a BOMB_ENDING. control flows. If a symbolic execution tool directly ignores the parallel syntax or addresses the syntax improperly, 4.1.7 Symbolic Jumps errors would happen during Sinst and Ssem. In general, symbolic execution only extracts constraint mod- Figure 3(d) demonstrates an example with parallel els when encountering conditional jumps, such as var<0 in codes. The symbolic variable i is processed by another two source codes, or jle 0x400fda in assembly codes. How- additional threads in parallel, and the result is assigned to ever, we may also employ unconditional jumps to achieve j. Then the value of j determines the whether the program the same effects as conditional jumps. The idea is to jump should return a BOMB_ENDING. to an address controlled by symbolic values. If a symbolic To handle parallel codes, a symbolic execution tool has execution engine is not tailored to handle the unconditional to interpret the semantics and track parallel executions, e.g., jumps, it would fail to extract corresponding constraint by introducing extra symbolic variables [30]. However, such models and miss some available control flows. Therefore, an approach may not be scalable because the possibility of the challenge attacks the constraint modeling stage Smodel. parallel execution can be a large number. In practice, there Figure 3(g) is an example of symbolic jumps. The pro- are several heuristic approaches to improve the efficiency. gram contains an array of function pointers, and each func- For example, we may restrict the exploration time of concur- tion returns an integer value. The symbolic variable serves rent regions with a threshold [30]; we may conduct symbolic as an offset to determine which function should be called execution with arbitrary contexts and convert multi-thread during execution. If f5() is called, the program would programs into equivalent sequential ones [31]; or we can return a BOMB_ENDING. prune unimportant paths leveraging some program codes, such as assertion [32]. 4.1.8 Floating-point Numbers F 4.1.5 Symbolic Memories A floating-point number (f ∈ ) approximates a real number (r ∈ R) with a fixed number of digits in the form of Symbolic memory is a situation whereas symbolic variables f = sign∗baseexp. For example, the 32-bit float type compli- serve as the offsets or pointers to retrieve values from the ant to IEEE-754 has 1-bit for sign, 23-bit for base, and 8-bit memory, such as array indexes. To handle symbolic mem- exp ories, a symbolic execution engine should take advantage for . The representation is essential for computers, as the memory spaces are limited in comparison with the infinity of the memory layout for analysis. For example, we can R convert an array selection operation to a switch/case of . As a tradeoff, floating-point numbers only have limited precision, which makes some unsatisfiable constraints over clause in which the number of possible cases equals the R to be satisfied over F with a rounding mode. In order to length of the array. However, the number of possible com- support reasoning over F, a symbolic execution tool should binations would grow exponentially when there are several such operations along a control flow. In practice, a symbolic consider such approximations when extracting and solving execution tool may directly employ the feature of array constraint models. However, recent studies (e.g., [5], [37], operations implemented by some constraint solvers, such as [38], [39]) show that there is still no silver bullet for the problem. Floating-point numbers remains a challenge for STP [33] and Z3 [34]. It may also analyze the alignment of symbolic execution tools, and the challenge attacks S . some pointers in advance, such as CUTE [35]. However, the model power of pointer analysis is limited because the problem can Figure 3(h) demonstrates an example with floating-point 0.1 be NP-hard or even undecidable for static analysis [36]. If a operations. Because we cannot represent with float type a != 1 symbolic execution tool cannot model symbolic memories precisely, the first predicate is always true. If the second condition a == b can be satisfied, the program properly, errors would occur during Sinst and Ssem. BOMB_ENDING Figure 3(e) demonstrates a sample of symbolic memo- would return a . Therefore, one test case to BOMB_ENDING symvar ries. In this example, the symbolic variable i serves as an returning a is equals ‘7’. offset to retrieve an element from the array. The retrieved element then determines whether the program returns a 4.1.9 Arithmetic Overflows BOMB_ENDING. Arithmetic overflow happens when the result of an arithmetic operation is outside the range of an integer type. For 64 64 4.1.6 Contextual Symbolic Values example, the range of a 64-bit signed integer is [−2 , 2 − The challenge is similar to symbolic memories but more 1]. In this case, a constraint model (e.g., the result of a pos- complicated. Other than retrieving values from the memory itive integer plus another positive integer is negative) may 7

have no solutions over R; but it can have solutions when the counters for each loop [41]. Because loop can incur we consider arithmetic overflow. Handling such arithmetic numerous paths, we can hardly have a perfect solution for overflow issues is not as difficult as previous challenges. this problem. However, some preliminary symbolic execution tools may Figure 3(k) shows a sample with a loop. The loop func- fail to consider these cases and suffer errors when extracting tion is implemented with the Collaz conjecture [42]. No and solving the constraint models. matter what is the initial value of i, the loop will terminate Figure 3(i) shows a sample with an arithmetic overflow with j equals 1. problem. To meet the first condition 254748364 * i < 0, i should be a negative value. However, the second 4.2.3 Crypto Functions condition requires i to be a positive value. Therefore, it Crypto functions generally involve some computationally has no solutions in the domain of real numbers. But the complex problems to ensure security. For a hash func- conditions can be satisfied when 254748364 * i exceeds tion, the complexity guarantees that adversaries cannot the max value that the integer type can represent. efficiently compute the plaintext of a hash value. For a symmetric encryption function, it promises that one cannot 4.2 Path-explosion Challenges efficiently compute the key when given several pairs of Now we discuss three path-explosion challenges existed in plaintext and ciphertext. Therefore, such programs should small-size programs. also be resistant to symbolic execution attacks. From a program analysis view, the number of possible control paths 4.2.1 External Function Calls for the crypto functions can be substantial. For example, the body of the SHA1 algorithm [43] is a loop that iterates Shared libraries, such as libc and libm (i.e., a math 80 rounds, and each round contains several bit-level opera- library), provide some basic function implementations to tions. facilitate software development. An efficient way to employ Figure 3(l) demonstrates a code snippet which employs a the functions is via dynamic linkage, which does not pack SHA1 function [43]. If the hash result of the symbolic value the function body to the program but only links with the is equivalent to a predefined value, the program would functions dynamically when execution. Therefore, such ex- return a BOMB_ENDING. However, it is difficult since SHA1 ternal functions do not enlarge the size of a program, but cannot be reversely calculated. they can enlarge the code complexity in nature. In general, symbolic execution tools cannot handle such When an external function call is related to the prop- crypto programs. Malware may employ the technique to de- agation of symbolic values, the control flows within the ter symbolic execution-based program analysis [44]. When function body should be analyzed by default. There are two analyzing programs with crypto functions, a common way situations. A simple situation is that the external function is to avoid exploring the function internals (e.g., [45], [46]). does not affect the program behaviors after executing it, For example, TaintScope [45] first discriminates the symbolic such as simply printing symbolic values with printf. In variables corresponding to crypto functions from other vari- this case, we may ignore the path alternatives within the ables, and then it employs a fuzzy-based approach to search function. However, if the function execution affects the solutions for such symbolic variables rather than solving the follow-up program behaviors, we should not ignore them. problem via symbolic reasoning. Otherwise, the symbolic execution would be based on a So far, we have discussed 12 different challenges in total. wrong assumption that the new test case generated for an Note that we do not intend to propose a complete list of alternative path can always trigger the same control flow challenges for symbolic execution. Instead, we collect all within the external function. If a small program contains the challenging issues that have been mentioned in the several such function calls, the complexity of external func- literature and systematically analyze them. This analysis is tions may cause path explosion issues. In practice, there are essential for us to design the the dataset of logic bombs in different strategies that symbolic execution tools may adopt Section 5.2.2. with a trade-off between consistency and efficiency [28]. Figure 3(j) demonstrates a sample with an external function call. It computes the sine of a symbolic value 5 BENCHMARK METHODOLOGY via an external function call (i.e., sin), and the result is In this section, we introduce our methodology and a frame- used to determine whether the program should return a work to benchmark the capability of real-world symbolic BOMB_ENDING. execution tools. 4.2.2 Loops Loop statements, such as for and while, are widely em- 5.1 Objective and Challenges ployed in real-world programs. Even a very small program Before describing our approach, we first discuss our design with loops can include many or even an infinite number of goal and the challenges to overcome. paths. By default, a symbolic execution tool should explore This work aims to design an approach that can bench- all available paths of a program, which can beyond the mark the capabilities of symbolic execution tools. Our pur- capability of the tool if there are too many paths. In practice, pose is critical and valid in several aspects. As we have a symbolic execution tool may employ a search strategy discussed, some challenging issues are only engineering which favorites the unexplored branches on a program issues, such as arithmetic overflows. With enough engineer- CFG [18], [40], or introduces new symbolic variables as ing effort, a symbolic execution tool should be able to handle 8

Algorithm 1: Method to design evaluation samples. symbolic variable; the second step is to design a challenging // Create a function with a symbolic problem related to the symbolic variable and save the result variable to another variable symvar2; the third step is to design a Function LogicBomb(symvar) condition related to the new variable symvar2; the final // symvar2 is a value computed from a step is to design a bomb (e.g., return a specific value) which challenging problem related to symvar indicates the condition has been satisfied. Note that because 2 ← symvar Challenge(symvar); the value of symvar2 is propagated from symvar, symvar2 // If symvar2 satisfies a condition is also a symbolic variable and should be considered in the if Condition(symvar2) then // Trigger the bomb symbolic analysis process. Bomb(); The magic of the logic bomb idea enables us to make the end evaluation much precise and efficient. We can create several such small programs; each contains only a challenging issue and a logic bomb that tells the evaluation result. Because these issues. On the other hand, some challenges are hard the object programs for symbolic execution are small, we from a theoretical view, such as loops. But some heuristic can easily avoid unexpected issues that may also cause approaches can tackle certain easy cases. Symbolic execu- failures via a careful design. Also because the programs tion tools may adopt different heuristics and demonstrate are small, performing symbolic execution on them generally different capabilities in handling them. Therefore, it is worth requires a short time. For the programs that unavoidably benchmarking their performances in handling particular incur path explosion issues, we can restrict the symbolic challenging issues. Developers generally do not provide execution time either by controlling the problem complexity much information about the limitations of their tools to or by employing a timeout setting. users. A useful benchmark approach should be accurate and 5.2.2 Logic Bomb Dataset efficient. However, it is challenging to benchmark symbolic Following Algorithm 1, we have designed a dataset of logic execution tools accurately and efficiently. Firstly, a real pro- bombs to evaluate the capability of symbolic execution tools. gram contains many instructions or lines of codes. When a We have already shown several samples in Figure 3. Our full symbolic execution failure happens, locating the root cause dataset is available on GitHub1. The dataset contains over 60 requires much domain knowledge and effort. Since errors logic bombs for 64-bit Linux platform, which covers all the may propagate, it is challenging to conjecture whether a challenges discussed in Section 4. For each challenge, we im- symbolic execution tool fails in handling a particular issue. plemented several logic bombs. Either each bomb involves Secondly, the symbolic execution itself is inefficient. Bench- a unique challenging issue (e.g., covert propagation via file marking a symbolic execution tool generally implies per- write/read or via system calls), or introduces a problem forming several designated symbolic execution tasks, which with a different complexity setting (e.g., one-leveled arrays would be time-consuming. Note that existing symbolic exe- or two-leveled arrays). cution papers (e.g., [2], [3], [47], [48]) generally evaluate the performance of their tools by conducting symbolic execu- When designing the logic bombs, we carefully avoid x00 tion experiments with real programs. The process usually trivial test cases (e.g., \ ) that can trigger the bombs. takes several hours or even days. They demonstrate the Moreover, we try to employ straightforward implementa- effectiveness of their work using the achieved code coverage tions, and we hope to ensure that the results would not and number of bugs detected, and they do not analyze the be affected by other unexpected failures. For example, we atoi argv[1] root causes of uncovered codes. avoid using to convert to integers because some tools cannot support atoi. However, fully avoiding external function calls is impossible for some logic bombs. 5.2 Approach based on Logic Bombs For example, we should employ external function calls to To tackle the challenges of benchmark concerning accuracy create threads when designing parallel codes. Surely that if and efficiency, we propose an approach based on logic a symbolic execution tool cannot handle external functions, bombs. Below, we discuss our detailed design. the result might be affected. To tackle the interference of challenges, we draw a challenge propagation chart among 5.2.1 Evaluation with Logic Bombs the logic bombs as shown in Figure 4. There are two kinds A logic bomb is a code snippet that can only be executed of challenge propagation relationships: should in solid lines, when certain conditions are met. To evaluate whether a and may in dashed lines. A should relationship means a symbolic execution tool can handle a challenge, we can logic bomb contains a similar challenging issue in another design a logic bomb guarded by a particular issue with logic bomb; if a tool cannot solve the precedent logic bomb, the challenge. Then we can perform symbolic execution on it should not be able to solve the later one. For example, the the program which embeds the logic bomb. If a symbolic stackarray_sm_l1 is precedent to stackarray_sm_l2. execution tool can generate a test case that can trigger the A may relationship means a challenge type may be a prece- logic bomb, it indicates the tool can handle the challenging dent to other logic bombs, but it is not the determinant one. issue, or vice versa. For example, a parallel program generally involves external Algorithm 1 demonstrates a general framework to de- function calls. However, although a tool is unable to solve sign such logic bombs. It includes four steps: the first step is to create a function with a parameter symvar as the 1. https://github.com/hxuhack/logic bombs 9

̴ ̴ ̴ ̴̴ͳ ̴̴ͳ ̴̴ͳ ̴̴ͳ ̴̴ͳ ̴ͳ ̴̴ͳ ̴ʹ ̴ʹ ̴̴ʹ ̴̴ʹ ̴̴ʹ ̴̴ʹ ̴̴ʹ ̴̴ʹ ̴̴ʹ

̴ ̴ ̴ ʹ ̴̴ͳ ̴ͳ ̴ͳ ̴ͳ ̴̴ͳ ̴̴ͳ ̴̴ͳ ̴̴ͳ ̴̴ͳ ̴ ̴ͳ ̴ ̴ͳ ̴ ̴̴ʹ ʹ ̴̴ʹ ̴ʹ ̴̴ʹ ̴̴ʹ ʹ ̴ Ͳ̴ ̴ ̴ͳ ̴ ̴ͳ ̴ ̴ͳ ̴ ̴ͳ Ǧ ͳ ʹ ̴̴ͳ ̴̴ͳ ͷ൅ͳ ͹൅ͳ ̴̴ͳ ̴̴ͳ ̴̴ͳ ̴ ̴ʹ ̴ ̴ʹ ̴ ̴ʹ

͵ Ͷ ͷ ̴̴ʹ ̴̴ʹ ̴̴ʹ ̴ ̴̴ʹ ̴̴ʹ ̴ ̴͵ ̴ ̴͵

Fig. 4. The challenge propagation relationship among our dataset of logic bombs. A solid line means a logic bomb contains a similar problem deﬁned in another logic bomb; a dashed line means a challenge may affect other logic bombs.

source codes into binaries or other formats that a target symbolic execution tool supports. Symbolic execution is generally performed based on intermediate codes. When benchmarking source-code-based symbolic execution tools such as KLEE, we have to compile the source codes into the ʹǤ ͵Ǥ ͳǤ supported intermediate codes. When benchmarking binary- code-based symbolic execution tools, we can directly compile them into binaries, and the tool will lift binary codes Ȁ into intermediate codes automatically. In the second step, we direct a symbolic execution tool to Fig. 5. A framework to benchmark symbolic execution tools. analyze the compiled logic bombs in a batch mode. This step outputs a set of test cases for each program. Some dynamic symbolic execution tools (e.g., Triton) can directly tell which the external functions well, it might be able to solve some test case can trigger a logic bomb during runtime. However, logic bombs with parallel issues as sequential programs. other static symbolic execution tools may only output test cases by default, and we need to replay the generated test 5.3 Automated Benchmark Framework cases to examine the results further. Besides, some tools may Base on the evaluation idea with logic bombs, we design falsely report that a test case can trigger the logic bomb. a benchmark framework as shown in Figure 5. The frame- Therefore, we need a third step to verify the test cases. work inputs a dataset of carefully designed logic bombs and In the third step, we replay the test cases with the outputs the benchmark result for a particular symbolic exe- corresponding programs of logic bombs. If a logic bomb can cution tool. There are three critical steps in the framework: be triggered, it indicates that the challenging case is solved dataset preprocessing, batch symbolic execution, and case by the tool. Finally, we can generate a benchmark report verification. based on the case verification results. In the preprocessing step, we parse the logic bombs and compile them into object codes or binaries such that 6 EXPERIMENTAL STUDY a target symbolic execution tool can process them. The In this section, we conduct an experimental study to demon- parsing process pads each code snippet of a logic bomb strate the effectiveness of our benchmark approach. Below, with a main function and makes it a self-contained program. we discuss the experimental setting and results. By default, we employ argv[1] as the symbolic variables. If a target symbolic execution tool requires adding extra instructions to launch tasks, the parser should add such 6.1 Experimental Setting required instructions automatically. For example, we can We choose three popular symbolic execution tools for bench- add symbolic variable declaration codes when benchmark- mark: KLEE [2], Angr [3], and Triton [49]. Because our ing KLEE. The compilation process compiles the processed dataset of logic bombs are written in C/C++, we only 10 choose symbolic execution tools for C/C++ programs or the side effects, we adopt two timeout settings (60 seconds binaries. The three tools are all released as open source and 300 seconds) for each tool. In this way, we can observe and have a high community impact. Moreover, they adopt the influence of the timeout settings and decide whether different implementation techniques for symbolic execution. we should conduct more experiments with an increased By supporting variant tools, we show that our approach is timeout value. compatible with different symbolic execution implementations. 6.2 Benchmark Results KLEE [2] is a static symbolic execution tool implemented 6.2.1 Result Overview based on LLVM [50]. It requires program source codes to perform symbolic execution. By default, our benchmark Table 2 demonstrates our experimental results. We label the script employs a klee_make_symbolic function to de- results with three options: pass, fail, and timeout. While clare the symbolic variables of logic bombs in the source- ‘pass’ and ‘fail’ imply the symbolic execution has finished, code level. Then, it compiles the source codes into interme- ‘timeout’ implies our benchmark script has terminated the diate codes for symbolic execution. The symbolic execution symbolic execution process when a timeout threshold is process outputs a set of test cases, and our script finally triggered. examines the test cases by replaying them with the binaries. From the results, we can observe that Angr has achieved The whole process is automated with our benchmark script. the best performance with 21 cases solved when the timeout The version of KLEE we benchmark is 1.3.0. Note that is 300 seconds. Comparatively, it only solved 16 cases when because our experiment does not intend to find the best tool the timeout is 60 seconds. KLEE solved nine cases and for particular challenges, so we do not consider the patches the result remains the same with different timeout settings. provided by other parties before they are merged into the Triton performs much worse with three cases solved. To master branch. further verify the correctness of our benchmark results, Triton [49] is a dynamic symbolic execution tool based we compare our experimental results with the previously on binaries. It automatically accepts symbolic variables from declared challenge propagation relationships in Figure 4. the standard input. During symbolic execution, Triton firstly We find the results are all consistent. It justifies that our runs the programs with concrete values and leverages Intel dataset can distinguish the capability of different symbolic- PinTool [51] to trace related instructions, then it lifts the execution tools accurately and effectively. traced instructions into the SSA (single static assignment) The efficiency of our benchmark approach largely de- form and performs symbolic analysis. If there are alternative pends on the timeout setting. Note that Table 2 includes paths found in the trace, Triton generates new test cases some timeout results, and they account for most of our via symbolic reasoning and employs them as the concrete experimental time. Although we try to keep each logic values in the following rounds of concrete execution. This bomb as succinct as possible, our dataset still contains some symbolic execution process goes on until no alternative path complex problems or path explosion issues unavoidable. can be found. The version of Triton we adopted is which When the timeout value is 60 seconds, our benchmark released on Jul 6, 2017, on GitHub. process for each tool takes only dozens of minutes. When Angr [3] is also a tool for binaries but employs different extending the timeout value to 300 seconds, the benchmark implementations. Before performing any symbolic analysis, takes a bit longer time. However, the benefit is not very Angr firstly lifts the binary program into VEX IR [52]. obvious, and only Angr can solve 5 more cases. Can the Then it employs a symbolic analysis engine (i.e., SimuVEX) result get further improved with more time? We have tried to analyze the program based on the IR. Angr does not another group of experiments with 1800 seconds timeout provide ready-to-use symbolic execution script for users but and the results remain unchanged. Therefore, 300 seconds only some APIs. Therefore, we have to implement our own should be a marginal timeout setting for our benchmark symbolic execution script for Angr. Our script collects all experiment. Considering that symbolic execution is compu- the paths to the CFG leaf nodes and then solves the cor- tationally expensive, which may take several hours or even responding path constraints. Angr provides all the critical several days to test a program, our benchmark process is features via APIs, and we only assemble them. Finally, we very efficient. We may further improve the efficiency by check whether the generated test cases can trigger the logic employing a parallel mode, such as assigning each process bombs. In our experiment, we employ Angr with version several logic bombs. 7.7.9.21. Note that our benchmark scripts for these tools all follow 6.2.2 Case Study the framework proposed in Figure 5. During the experi- Now we discuss the detailed benchmark results for each ment, we employ our logic bomb dataset for evaluation. challenge. Firstly, there are several challenges that none of A tool can pass a test only if the generated solution can the tools can trigger even one logic bomb, including sym- correctly trigger a logic bomb. We finally report which logic bolic variable declarations, parallel executions, contextual bombs can be triggered by the tools. symbolic values, loops, and crypto functions. For symbolic We conduct our experiments on an Ubuntu 14.04 X86 64 variable declaration challenge, because all the tools cannot system with Intel i5 CPU and 8G RAM. Because some recognize the expected symbolic variables automatically, symbolic execution tasks may take very long time, our tool they fail in modeling the conditions to trigger the logic allows users to configure a timeout threshold which ensures bombs. The challenges of contextual symbolic values and benchmark efficiency. However, the timeout mechanism crypto functions involve tough problems, and it can be may incur some false results if it is too short. To mitigate expected that all the tools fail in handling them. However, it 11

TABLE 2 Experimental results on benchmarking three symbolic execution tools (KLEE, Triton, and Angr) in handling our logic bombs. Pass means the tool has successfully triggered the bomb; fail means the tool cannot ﬁnd test cases to trigger the bomb; timeout means the tool cannot ﬁnd test cases to trigger the bomb within a given period of time. For each tool, we adopt two timeout settings: 60 seconds and 300 seconds.

KLEE Triton Angr Challenge Case ID t = 60s t = 300s t = 60s t = 300s t = 60s t = 300s df2cf cp pass pass fail fail pass pass echo cp fail fail timeout timeout timeout timeout echofile cp fail fail fail fail timeout timeout file cp fail fail timeout timeout fail fail Covert Propagations socket cp fail fail fail fail fail fail stack cp fail fail pass pass pass pass file eh cp fail fail fail fail timeout pass div0 eh cp fail fail fail fail timeout pass file eh cp fail fail fail fail timeout fail stack bo l1 fail fail fail fail pass pass Buffer Overflows heap bo l1 fail fail fail fail fail fail stack bo l2 fail fail fail fail fail fail malloc sm l1 pass pass timeout fail pass pass realloc sm l1 pass pass fail fail pass pass stackarray sm l1 pass pass fail fail pass pass list sm l1 fail fail fail fail timeout pass Symbolic Memories vector sm l1 fail fail fail fail timeout pass stackarray sm l2 pass pass fail fail fail fail stackoutofbound sm l2 pass pass fail fail pass pass heapoutofbound sm l2 fail fail timeout fail pass pass funcpointer sj l1 pass pass fail fail fail fail Symbolic Jumps jmp sj l1 fail fail fail fail pass pass arrayjmp sj l2 fail fail fail fail fail fail vectorjmp sj l2 fail fail fail fail timeout pass float1 fp l1 fail fail fail fail pass pass float2 fp l1 fail fail fail fail pass pass Floating-point Numbers float3 fp l2 fail fail fail fail timeout timeout float4 fp l2 fail fail fail fail timeout timeout float5 fp l2 fail fail fail fail timeout timeout plus do pass pass pass pass pass pass Arithmetic Overflows multiply do pass pass fail fail pass pass printint ef l1 fail fail pass pass pass pass printfloat ef l1 fail fail fail fail fail fail atoi ef l2 fail fail fail fail pass pass atof ef l2 fail fail fail fail timeout timeout External Function Calls ln ef l2 fail fail fail fail timeout fail pow ef l2 fail fail fail fail pass pass rand ef l2 fail fail timeout timeout fail fail sin ef l2 fail fail fail fail timeout timeout Symbolic Variable Declarations 7 cases, all fail Contextual Symbolic Values 4 cases, all fail Parallel Executions 5 cases, all fail Loops 5 cases, all fail Crypto Functions 2 cases, all fail pass # 62 cases 9 9 3 3 16 21 is a bit surprising that none of the tools can handle parallel handling program in Figure 6. As shown in the box region executions and loops. of Figure 6(b), the mechanism relies on two function calls, which might be the problem that fails Triton. All the tools Covert Propagations: Angr have passed four test cases, failed other covert propagation cases that propagate values df2cf_cp stack_cp , , and two exception handling cases. via file read/write, echo, socket, etc. df2cf_cp propagates the symbolic values indirectly by substituting a data assignment operation with equivalent Buffer Overflows: Only Angr has solved one easy buffer control-flow operations. KLEE also solved the case, but Tri- overflow problem stack_bo_l1. The case has a sim- ton failed. stack_cp propagates symbolic values via direct ple stack overflow issue. Its solution requires modifying assembly instructions push and pop. Only KLEE failed the the value of the stack that might be illegal. However, test because it is a source-based analysis tool which does Angr cannot solve the heap overflow issue heap_bo_l1. not support assembly codes. Besides, Angr has also passed It also failed on another harder stack overflow issue two test cases that propagate symbolic values via the C++ stack_bo_l2, which requires composing sophisticated exception handling mechanism. Not only this exception payload, such as employing return-oriented programming handling mechanism can be covert for some source-code- methods [53]. We are surprised that Triton failed all the tests based analysis tools, such as KLEE, but also it can be covert because binary-code-based symbolic execution tools should for binary-code-based symbolic execution tools, such as be resilient to buffer overflows in nature. Triton. We further break down the details of an exception Symbolic Memories: The results show that Triton does not 12 ሺ ǡ ሻሼ ሺൌൌͲሻሼ ̶ Ǩ̶ǢͲͲͲͲͲͲͲͲͲͲͲͶͲͲͻ൏൅Ͷͳ൐ǣ ͲͶͲͲͳͲ൏̴ͺ൐ ሺ ǡሽ ሻሼ ͲͲͲͲͲͲͲͲͲͲͲͶͲͲ൏൅Ͷ͸൐ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲͻ൏൅Ͷͳ൐ǣ ΨͲǡǦͲͶͲሺΨሻ ͲͶͲͲͳͲ൏̴ͺ൐ ሺൌൌͲሻሼሺȀሻǢͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͵൏൅ͷͳ൐ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ൏൅Ͷ͸൐ǣ ͲͶͲͲͺ൏̴ͳͲ ̴ ൅ͷ͸൐ ΨͲǡǦͲͶͲሺΨሻ ̶ Ǩ̶Ǣ ͲͲͲͲͲͲͲͲͲͲͲͶͲͲͺ൏൅ͷ͸൐ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͵൏൅ͷͳ൐ǣ ̈́ͲͲǡǦͲͶሺΨሻ ͲͶͲͲͺ൏̴ͳͲ ̴ ൅ͷ͸൐ ሽ ͲͲͲͲͲͲͲͲͲͲͲͶͲͲ൏൅͸͵൐ǣ ͲͶͲͲ൏̴ͳͲ ̴ ൅ͳʹͷ൐ ሽ ͲͲͲͲͲͲͲͲͲͲͲͶͲͲͺ൏൅ͷ͸൐ǣ ̈́ͲͲǡǦͲͶሺΨሻ ͲͲͲͲͲͲͲͲͲͲͲͶͲͲ Ͷ൏൅͸ͺ൐ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ൏൅͸͵൐ǣ ΨǡΨ ͲͶͲͲ൏̴ͳͲ ̴ ൅ͳʹͷ൐ ሺȀሻǢ ͲͲͲͲͲͲͲͲͲͲͲͶͲͲ ͸൏൅͹Ͳ൐ǣ ΨǡǦͲʹͲሺΨሻ ̴ሺ ȗሻሼ ͲͲͲͲͲͲͲͲͲͲͲͶͲͲ Ͷ൏൅͸ͺ൐ǣ ΨǡΨ ሽ ͲͲͲͲͲͲͲͲͲͲͲͶͲͲ ൏൅͹Ͷ൐ǣ Ψ ǡǦͲʹͶሺΨሻ ൌሾͲሿǦ ͶͺǢ ͲͲͲͲͲͲͲͲͲͲͲͶͲͲ ൏൅͹͹൐ǣǥǥ ǦͲʹͶሺΨሻǡΨ ǥǥ ሼ ͲͲͲͲͲͲͲͲͲͲͲͶͲͲͲ൏൅ͺͲ൐ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲͳ൏൅ͻ͹൐ǣ ̈́ͲͳǡΨ ͲͶͲͲͺͲ൏̴̴ ̴̴ ̷൐ ̴ሺ ȗሻሼ ሺͳͲǡǦ͹ሻǢ ͲͲͲͲͲͲͲͲͲͲͲͶͲͲͷ൏൅ͺͷ൐ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸൏൅ͳͲʹ൐ǣ Ψ ǡΨ ΨǡǦͲ͵ͲሺΨሻ ൌሾͲሿǦ ͶͺǢ ̴ Ǣ ͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͹൏൅ͺ͹൐ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ൏൅ͳͲ͸൐ǣ ͲͶͲͲͲ͸൏̴ͳͲ ̴ ൅ͳ͵Ͷ൐ ̈́ͲͳǡǦͲͶሺΨሻ ͲͲͲͲͲͲͲͲͲͲͲͶͲͲ൏൅ͻ͵൐ǣ ǦͲʹͲሺΨሻǡΨ ሼ ሽ ሺ ȗሻሼ ͲͲͲͲͲͲͲͲͲͲͲͶͲͲͳ൏൅ͳͳ͵൐ǣ ̈́ͲͳǡǦͲ͵ͶሺΨሻ ͲͲͲͲͲͲͲͲͲͲͲͶͲͲͳ൏൅ͻ͹൐ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲͺ൏൅ͳʹͲ൐ǣ ͲͶͲͲͺͲ൏̴̴ ̴ ̴ ̷൐ ͲͶͲͲͺͻͲ൏̴̴ ̴̴ ̷൐ ሺͳͲǡǦ͹ሻǢ ̴ Ǣ ͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸൏൅ͳͲʹ൐ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ൏൅ͳʹͷ൐ǣ ΨǡǦͲ͵ͲሺΨሻ ǦͲͶሺΨሻǡΨ ̴ Ǣ ͲͲͲͲͲͲͲͲͲͲͲͶͲͲ൏൅ͳͲ͸൐ǣ ̈́ͲͳǡǦͲͶሺΨሻ ሽ ͲͲͲͲͲͲͲͲͲͲͲͶͲͲͲͲ൏൅ͳʹͺ൐ǣ ̈́ͲͶͲǡΨ ሽ ሺ ȗሻሼሽ ͲͲͲͲͲͲͲͲͲͲͲͶͲͲͳ൏൅ͳͳ͵൐ǣ ̈́ͲͳǡǦͲ͵ͶሺΨሻ ͲͲͲͲͲͲͲͲͲͲͲͶͲͲͺ൏൅ͳʹͲ൐ǣ ͲͲͲͲͲͲͲͲͲͲͲͶͲͲͲͶ൏൅ͳ͵ʹ൐ǣ ͲͶͲͲͺͻͲ൏̴̴ ̴ Ψ̴ ̷൐ ̴ Ǣ ͲͲͲͲͲͲͲͲͲͲͲͶͲͲ൏൅ͳʹͷ൐ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲͲͷ൏൅ͳ͵͵൐ǣ ǦͲͶሺΨሻǡΨ ሽ ͲͲͲͲͲͲͲͲͲͲͲͶͲͲͲͲ൏൅ͳʹͺ൐ǣ̈́ͲͶͲǡΨͲͲͲͲͲͲͲͲͲͲͲͶͲͲͲ͸൏൅ͳ͵Ͷ൐ǣ ǦͲʹͲሺΨሻǡΨ ሽ (a) Source codes. ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸Ͳ൏൅ʹͻ൐ǣሺሻȗ ̴൅ͻ͹ͲͲͲͲͲͲͲͲͲͲͲͶͲͲͲͶ൏൅ͳ͵ʹ൐ǣΨͲͲͲͲͲͲͲͲͲͲͲͶͲͲͲ൏൅ͳ͵ͺ൐ǣ ͲͶͲͲ͹ ͲǡΨ (b) Assembly ͲͶͲͲͺ Ͳ൏̴̴ codes. ̷൐ ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸ͳͷ൏൅͵͹൐ǣሺሻͲͲͲͲͲͲͲͲͲͲͲͶͲͲͲͷ൏൅ͳ͵͵൐ǣ ΨǡǦͲ͵ͲሺΨሻ ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸ͳͻ൏൅Ͷͳ൐ǣሺሻȀʹͲ̈́ǦͺͲͲͲͲͲͲͲͲͲͲͲͲͶͲͲͲ͸൏൅ͳ͵Ͷ൐ǣ ͲͶͲͲ͹ ͺǡΨ ǦͲʹͲሺΨሻǡΨ Fig. ̴ሺ ȗሻሼ 6. An exemplary program thatǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸ʹͳ൏൅Ͷͻ൐ǣ raises an exception whenͲ͹ʹͲǣͲͲͲͲͲͲͲͲ͸ͲͲͲͲͲͲͲͲ͹ͲͲͲͲͲͲͲͲͲͲͲͲͲͲͶͲͲͲ൏൅ͳ͵ͺ൐ǣ divided by ΨǡǦͲʹͺሺΨ zero. Theሻassembly ͲͶͲͲͺ Ͳ൏̴̴ͲͲͲͲͲͺͲͲͲͲͲͲͲͲͻ codes̷൐ demonstrates how the try/catch mechanism ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸ʹͷ൏൅ͷ͵൐ǣ ͲͶͲͲ͹ͲǡΨ works in low ൌሾͲሿǦ level. ͶͺǢ Ͳ͹ʹͲǣͲͲͲͲͲͲͲͲͲͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͵ͷͲͲͲͲͲͲͲͲͲ ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸ʹ ൏൅͸Ͳ൐ǣͲ͹ʹͲǣ ͲͲͲͲͲͲͲͲͳͲͲͲͲͲͲͲͲʹͲͲͲ Ψ ǡǦͲʹͲሺΨሻ ͲͲͲͲͲ͵ͲͲͲͲͲͲͲͲͶ ͳ̴ሾሿൌሼͳǡʹǡ͵ǡͶǡͷሽǢ ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸ʹ൏൅͸͵൐ǣͲ͹͵ͲͲǣͲͲͲͲͲͲͲͲͷͲͲͲͲͲ͹Ͳ ͲͶͲͲ͹ͲǡΨ ͹ʹͲͲͲͲͲͲͲͲͲͲͲͳ ʹ̴ሾሿൌሼ͸ǡ͹ǡͺǡͻǡͳͲሽǢ ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸͵͹൏൅͹ͳ൐ǣͲ͹͵ͳͲǣͲ͸ ΨǡǦͲͷͲሺΨ ͲͲͲͲͲ͹ሻ ͲͲͲͲͲͲͲͲͲͲͲͲͲͲͲͲͲͲ ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸͵൏൅͹ͷ൐ǣ ͲͶͲͲ͹ͺǡΨ ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸Ͷ͵൏൅ͺ͵൐ǣ ΨǡǦͲͶͺሺΨሻ ̴ሺ ȗሻሼ ൌΨͷǢ ሺʹ̴ሾͳ̴ሾሿሿൌൌͻሻሼǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸Ͷ͹൏൅ͺ͹൐ǣ ͲͶͲͲ͹ͲǡΨ ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸Ͳ൏൅ʹͻ൐ǣ ͲͶͲͲ͹ ͲǡΨ ሺʹ̴ሾͳ̴ሾሿሿൌൌͻሻሼ ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸Ͷ൏൅ͻͶ൐ǣǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸Ͳ൏൅ʹͻ൐ǣ Ψ ǡǦͲͶͲሺΨሻǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸ͳͷ൏൅͵͹൐ǣ ͲͶͲͲ͹ ͲǡΨ ΨǡǦͲ͵ͲሺΨሻ ൌሾͲሿǦ ͶͺǢ ǥǥ̴ Ǣ ̴ሺ ȗሻሼǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸ͳͷ൏൅͵͹൐ǣǥǥ ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸ͳͻ൏൅Ͷͳ൐ǣ ΨǡǦͲ͵ͲሺΨሻ ͲͶͲͲ͹ ͺǡΨ ̴ Ǣ ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸ͳͻ൏൅Ͷͳ൐ǣ ͲͶͲͲ͹ ͺǡΨ ͳ̴ሾሿൌሼͳǡʹǡ͵ǡͶǡͷሽǢ ሽ ǤǣͲͲͲͲͲͲͲͲͲͲͶͲͲ͹Ͳ ʹͲͲͲͲͲͲͲͳ ൌሾͲሿǦ ͶͺǢ ሽ ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸ʹͳ൏൅Ͷͻ൐ǣǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸ʹͳ൏൅Ͷͻ൐ǣ ΨǡǦͲʹͺሺΨሻ ΨǡǦͲʹͺሺΨሻ ʹ̴ሾሿൌሼ͸ǡ͹ǡͺǡͻǡͳͲሽǢ ǤǣͲͲͲͲͲͲͲͲͲͲͶͲͲ͹ͺ ͳ̴ሾሿൌሼͳǡʹǡ͵ǡͶǡͷሽǢ ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸ʹͷ൏൅ͷ͵൐ǣ ͶͲͲͲͲͲͲͲ͵ ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸ʹͷ൏൅ͷ͵൐ǣ ͲͶͲͲ͹ͲǡΨ ͲͶͲͲ͹ͲǡΨ ǤǣͲͲͲͲͲͲͲͲͲͲͶͲͲ͹Ͳ ͷ ̴ Ǣ ʹ̴ሾሿൌሼ͸ǡ͹ǡͺǡͻǡͳͲሽǢǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸ʹ ൏൅͸Ͳ൐ǣǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸ʹ ൏൅͸Ͳ൐ǣ Ψ ǡǦͲʹͲሺΨሻ Ψ ǡǦͲʹͲሺΨሻ ̴ Ǣ ǤǣͲͲͲͲͲͲͲͲͲͲͶͲͲ͹ͶʹͲǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸ʹ൏൅͸͵൐ǣ ͲͶͲͲ͹ͲǡΨ ൌΨͷǢ (a) Sourceሽ codes. ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸ʹ൏൅͸͵൐ǣ ͲͶͲͲ͹ͲǡΨ ሽ ǤǣͲͲͲͲͲͲͲͲͲͲͶͲͲ͹Ͳ ͹ͲͲͲͲͲͲͲ͸ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸͵͹൏൅͹ͳ൐ǣ ΨǡǦͲͷͲሺΨሻ ǤǣͲͲͲͲͲͲͲͲͲͲͶͲͲ͹ͺ ൌΨͷǢ ͻͲͲͲͲͲͲͲͺǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸͵൏൅͹ͷ൐ǣǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸͵͹൏൅͹ͳ൐ǣ ͲͶͲͲ͹ͺǡΨ ΨǡǦͲͷͲሺΨሻ ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸Ͷ͵൏൅ͺ͵൐ǣ ΨǡǦͲͶͺሺΨሻ ǤǣͲͲͲͲͲͲͲͲͲͲͶͲͲ͹ Ͳሺʹ̴ሾͳ̴ሾሿሿൌൌͻሻሼ Ͳ ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸͵൏൅͹ͷ൐ǣ ͲͶͲͲ͹ͺǡΨ ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸Ͷ͹൏൅ͺ͹൐ǣǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸Ͷ͵൏൅ͺ͵൐ǣ ͲͶͲͲ͹ͲǡΨ ΨǡǦͲͶͺሺΨሻ ̴ ǢǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸Ͷ൏൅ͻͶ൐ǣ Ψ ǡǦͲͶͲሺΨሻ ǥǥ ǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸Ͷ͹൏൅ͺ͹൐ǣǥǥ ͲͶͲͲ͹ͲǡΨ ሽ ǤǣͲͲͲͲͲͲͲͲͲͲͶͲͲ͹ͲǣͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͸Ͷ൏൅ͻͶ൐ǣ ʹͲͲͲͲͲͲͲͳ Ψ ǡǦͲͶͲሺΨሻ ሺሻȗ ̴൅ͻ͹ ǤǣͲͲͲͲͲͲͲͲͲͲͶͲͲ͹ͺǥǥ ͶͲͲͲͲͲͲͲ͵ ǥǥ ሺሻ ǤǣͲͲͲͲͲͲͲͲͲͲͶͲͲ͹Ͳ ͷ ̴ Ǣ ǤǣͲͲͲͲͲͲͲͲͲͲͶͲͲ͹Ͳ ʹͲͲͲͲͲͲͲͳ ሺሻȀʹͲ̈́ǦͺͲ ǤǣͲͲͲͲͲͲͲͲͲͲͶͲͲ͹ͶʹͲ ǤǣͲͲͲͲͲͲͲͲͲͲͶͲͲ͹ͲǤǣͲͲͲͲͲͲͲͲͲͲͶͲͲ͹ͺ ͹ͲͲͲͲͲͲͲ͸ ͶͲͲͲͲͲͲͲ͵ Ͳ͹ʹͲǣͲͲͲͲͲͲͲͲ͸ͲͲͲͲͲͲͲͲ͹ͲͲͲͲͲͲͲͲͺͲͲͲͲͲͲͲͲͻሽ ǤǣͲͲͲͲͲͲͲͲͲͲͶͲͲ͹ͺǤǣͲͲͲͲͲͲͲͲͲͲͶͲͲ͹Ͳ ͻͲͲͲͲͲͲͲͺ ͷ Ͳ͹ʹͲǣͲͲͲͲͲͲͲͲͲͲͲͲͲͲͲͲͲͲͲͲͶͲͲ͵ͷͲͲͲͲͲͲͲͲͲ ǤǣͲͲͲͲͲͲͲͲͲͲͶͲͲ͹ ͲǤǣͲͲͲͲͲͲͲͲͲͲͶͲͲ͹ͶʹͲ Ͳ Ͳ͹ʹͲǣ ͲͲͲͲͲͲͲͲͳͲͲͲͲͲͲͲͲʹͲͲͲͲͲͲͲͲ͵ͲͲͲͲͲͲͲͲͶ Ͳ͹͵ͲͲǣͲͲͲͲͲͲͲͲͷͲͲͲͲͲ͹Ͳ͹ʹͲͲͲͲͲͲͲͲͲͲͲͳ ǤǣͲͲͲͲͲͲͲͲͲͲͶͲͲ͹Ͳ ͹ͲͲͲͲͲͲͲ͸ Ͳ͹͵ͳͲǣͲ͸(b) Memory layout ͲͲͲͲͲ͹ after arrayͲͲͲͲͲͲͲͲͲͲͲͲͲͲͲͲͲͲ initialization. ǤǣͲͲͲͲͲͲͲͲͲͲͶͲͲ͹ͺ ͻͲͲͲͲͲͲͲͺ ǤǣͲͲͲͲͲͲͲͲͲͲͶͲͲ͹ Ͳ(c) Assembly Ͳ codes.

Fig. 7. An exemplary program that demonstrates how the stack works with arrays. There is no information about the size of each array left in assembly codes.

support symbolic memory, but KLEE and Angr provide assembly jmp, but it failed funcpointer_sj_l1. very good support. Angr has solved seven cases out of Floating-point Numbers: The results show KLEE and Tri- eight. It only failed in handling the case (Figure 7(a)) with ton do not support floating-point operations, and Angr can a two-leveled array stackarray_sm_l2. Also, it implies support some. During our test, Triton directly reported that that when there are multi-leveled pointers, Angr would fail. it cannot interpret such floating-point instructions. Angr has Figure 7(c) demonstrates the assembly codes that initialize solved two out of the five designated cases. The two passed the arrays, and Figure 7(b) demonstrates the stack layout af- cases are easier ones, which only require integer values as ter initialization. We can observe that the information about the solution. All the failed cases require decimal values as array size or boundary does not exist in assembly codes. the solution, and they employ the atof function to convert This justifies why binary-code-based symbolic execution argv[1] to decimals. Since Angr has also failed the test in tools do not suffer problems when a challenge requires an handling atof in atof_ef_l2, the failures are likely to out-of-boundary access, e.g., stackoutofbound_sm_l2. be caused by the atof function. In comparison, KLEE can solve the two-leveled array prob- Arithmetic Overflows: Arithmetic overflow is not a very lem because it is based on STP [33], which is designed for hard problem, and it only requires symbolic execution tools solving such problems related to arrays. However, KLEE to handle such cases carefully. In our test, KLEE and Angr does not support C++, so it failed the problems with vectors have solved all the cases. However, Triton failed in handling and lists. the integer overflow case in Figure 3(i). The result shows Symbolic Jumps: Since symbolic jump demonstrates no there is still much room for Triton to improve for this explicit conditional branches in the CFG, it should be a problem. hard problem for symbolic execution. However, KLEE and External Function Calls: In this group of logic bombs, Angr are not likely to be affected much by the trick. KLEE each case only contains one external function call. However, has tackled the problem with an array of function pointers the result is very disappointing. Triton only passed a very funcpointer_sj_l1. It failed the other test cases because simple case that print out (with printf) a symbolic value of they employ an assembly instruction jmp, which KLEE integer type. It does not even support printing out floating- does not support. Angr successfully handled two cases with point values. Angr has solved the printf cases and two 13 more complicated cases, atoi_ef_l2 and pow_ef_l2. It [7] E. J. Schwartz, T. Avgerinos, and D. Brumley, “All you ever wanted cannot support atof_ef_l2 and other cases. The results to know about dynamic taint analysis and forward symbolic execution (but might have been afraid to ask),” in Proc. of the 2010 show that we should be cautious when designing logic IEEE Symposium on Security and Privacy, 2010. bombs. Even when involving straightforward external func- [8] X. Qu and B. Robinson, “A case study of concolic testing tools and tion calls, the results could be affected. their limitations,” in Proc. of the IEEE International Symposium on Empirical Software Engineering and Measurement, 2011. [9] L. Cseppento and Z. Micskei, “Evaluating symbolic execution- 7 CONCLUSION based test tools,” in Proc. of the IEEE 8th International Conference on Software Testing, Verification and Validation, 2015. This work proposes an approach to benchmark the capa- [10] R. Kannavara, C. J. Havlicek, B. Chen, M. R. Tuttle, K. Cong, S. Ray, bility symbolic execution tools in handling particular chal- and F. Xie, “Challenges and opportunities with concolic testing,” in Aerospace and Electronics Conference (NAECON), 2015 National. lenges. To this end, we studied the taxonomy of challenges IEEE, 2015, pp. 374–378. faced by symbolic execution tools, including nine symbolic- [11] H. Xu, Y. Zhou, Y. Kang, and M. R. Lyu, “Concolic execution reasoning challenges and three path-explosion challenges. on small-size binaries: challenges and empirical study,” in Proc. Such a study is essential for us to design the benchmark of the 47th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, 2017. dataset. Then we proposed a promising benchmark ap- [12] R. Baldoni, E. Coppa, D. C. D’Elia, C. Demetrescu, and I. Finocchi, proach based on logic bombs. The idea is to design logic “A survey of symbolic execution techniques,” ACM Computing bombs that can only be triggered if a symbolic execution tool Survey, 2018. solves specific challenging issues. By making the programs [13] T. Avgerinos, S. K. Cha, B. L. T. Hao, and D. Brumley, “Aeg: Auto- of logic bombs as small as possible, we can speed up the matic exploit generation,” in Proc. of the 2011 ACM the Network and Distributed System Security Symposium, 2011. benchmark process; and by making them as straightforward [14] J. Ming, D. Xu, L. Wang, and D. Wu, “Loop: Logic-oriented opaque as possible, we can avoid unexpected reasons that may predicate detection in obfuscated binary code,” in Proc. of the 22nd affect the benchmark results. In this way, our benchmark ACM SIGSAC Conference on Computer and Communications Security, approach is both accurate and efficient. Following the idea, 2015. [15] B. Yadegari and S. Debray, “Symbolic execution of obfuscated we implemented a dataset of logic bombs and a prototype code,” in Proc. of the 22nd ACM SIGSAC Conference on Computer benchmark framework which automates the benchmark and Communications Security, 2015. process. Then, we conducted real-world experiments on [16] T. Xie, N. Tillmann, J. de Halleux, and W. Schulte, “Fitness-guided three symbolic execution tools. Experimental results show path exploration in dynamic symbolic execution,” in Proc. of the 39th IEEE/IFIP International Conference on Dependable Systems & that the benchmark process for each tool generally takes Networks, 2009. dozens of minutes. Angr achieved the best benchmark re- [17] L. Ciortea, C. Zamfir, S. Bucur, V. Chipounov, and G. Candea, sults with 21 cases solved, KLEE solved nine, and Triton “Cloud9: a software testing service,” ACM SIGOPS Operating only solved three. These results justify the value of a third- Systems Review, vol. 43, no. 4, pp. 5–10, 2010. party benchmark toolset for symbolic execution tools. Fi- [18] T. Avgerinos, A. Rebert, S. K. Cha, and D. Brumley, “Enhancing symbolic execution with veritesting,” in Proc. of the 36th Interna- nally, we released our dataset as open source on GitHub for tional Conference on Software Engineering. ACM, 2014. public usage. We hope it would serve as an essential tool for [19] S. Banescu, C. Collberg, V. Ganesh, Z. Newsham, and the community to benchmark symbolic execution tools and A. Pretschner, “Code obfuscation against symbolic execution at- could facilitate the development of more comprehensive tacks,” in Proceedings of the 32nd Annual Conference on Computer Security Applications (ACSAC). ACM, 2016. symbolic execution techniques. [20] C. A. R. Hoare, “An axiomatic basis for computer programming,” Communications of the ACM, 1969. [21] D. Song, D. Brumley, H. Yin, J. Caballero, I. Jager, M. G. Kang, ACKNOWLEDGMENTS Z. Liang, J. Newsome, P. Poosankam, and P. Saxena, “Bitblaze: A new approach to computer security via binary analysis,” in This work was supported by the National Natural Science Proc. of the International Conference on Information Systems Security. Foundation of China (Project Nos. 61672164,61332010), and Springer, 2008. 2015 Microsoft Research Asia Collaborative Research Pro- [22] P. Godefroid, N. Klarlund, and K. Sen, “Dart: directed automated gram (Project No. FY16-RES-THEME-005). random testing,” in ACM Sigplan Notices, 2005. [23] D. Brumley, I. Jager, T. Avgerinos, and E. J. Schwartz, “Bap: A binary analysis platform,” in Proc. of the International Conference on Computer Aided Verification. Springer, 2011. REFERENCES [24] N. Razavi, F. Ivanˇcić, V. Kahlon, and A. Gupta, “Concurrent test generation using concolic multi-trace analysis,” in Asian Sympo- [1] J. C. King, “Symbolic execution and program testing,” Communi- sium on Programming Languages and Systems. Springer, 2012. cations of the ACM, 1976. [25] S. K. Cha, T. Avgerinos, A. Rebert, and D. Brumley, “Unleashing [2] C. Cadar, D. Dunbar, and D. R. Engler, “Klee: Unassisted and mayhem on binary code,” in Proc. of the 2012 IEEE Symposium on automatic generation of high-coverage tests for complex systems Security and Privacy, 2012. programs,” in Proc. of the 8th USENIX Symposium on Operating Systems Design and Implementation, 2008. [26] D. Davidson, B. Moench, T. Ristenpart, and S. Jha, “Fie on [3] Y. Shoshitaishvili and et al., “Sok: (state of) the art of war: Offensive firmware: Finding vulnerabilities in embedded systems using techniques in binary analysis,” in Proc. of the IEEE Symposium on symbolic execution.” in USENIX Security Symposium, 2013. Security and Privacy, 2016. [27] S. Guo, M. Kusano, and C. Wang, “Conc-ise: Incremental symbolic [4] D. A. Ramos and D. R. Engler, “Under-constrained symbolic execution of concurrent software,” in Proc. of the 31st IEEE/ACM execution: Correctness checking for real code.” in USENIX Security International Conference on Automated Software Engineering (ASE), Symposium, 2015. 2016. [5] M. Quan, “Hotspot symbolic execution of floating-point pro- [28] V. Chipounov, V. Kuznetsov, and G. Candea, “S2e: A platform for grams,” in Proc. of the 24th ACM SIGSOFT International Symposium in-vivo multi-path analysis of software systems,” 2011. on Foundations of Software Engineering (FSE), 2016. [29] T. Avgerinos, S. K. Cha, A. Rebert, E. J. Schwartz, M. Woo, and [6] C. Cadar and K. Sen, “Symbolic execution for software testing: D. Brumley, “Automatic exploit generation,” Communications of the three decades later,” Communications of the ACM, 2013. ACM, 2014. 14

[30] A. Farzan, A. Holzer, N. Razavi, and H. Veith, “Con2colic testing,” eighteenth International Symposium on Software Testing and Analysis. in Proc. of the 9th Joint Meeting on Foundations of Software Engineering ACM, 2009. (FSE). ACM, 2013. [42] J. C. Lagarias, “The 3x + 1 problem and its generalizations,” The [31] T. Bergan, D. Grossman, and L. Ceze, “Symbolic execution of American Mathematical Monthly, 1985. multithreaded programs from arbitrary program contexts,” 2014. [43] D. Eastlake 3rd and P. Jones, “Us secure hash algorithm 1 (SHA1),” [32] S. Guo, M. Kusano, C. Wang, Z. Yang, and A. Gupta, “Assertion Tech. Rep., 2001. guided symbolic execution of multithreaded programs,” in Proc. [44] M. I. Sharif, A. Lanzi, J. T. Giffin, and W. Lee, “Impeding malware of the 2015 10th Joint Meeting on Foundations of Software Engineering. analysis using conditional code obfuscation.” in NDSS, 2008. ACM, 2015. [45] T. Wang, T. Wei, G. Gu, and W. Zou, “Taintscope: A checksum- [33] V. Ganesh and D. L. Dill, “A decision procedure for bit-vectors and aware directed fuzzing tool for automatic software vulnerability arrays,” in Proc. of the International Conference on Computer Aided detection,” in IEEE Symposium on Security and Privacy (SP), 2010. Verification. Springer, 2007. [46] R. Corin and F. A. Manzano, “Efficient symbolic execution for [34] L. De Moura and N. Bjørner, “Z3: An efficient smt solver,” in analysing cryptographic protocol implementations.” ESSoS, 2011. Proc. of the International Conference on Tools and Algorithms for the [47] V. Kuznetsov, J. Kinder, S. Bucur, and G. Candea, “Efficient state Construction and Analysis of Systems. Springer, 2008. merging in symbolic execution,” ACM Sigplan Notices, 2012. [35] K. Sen, D. Marinov, and G. Agha, “Cute: a concolic unit testing [48] N. Hasabnis and R. Sekar, “Extracting instruction semantics via engine for c,” in ACM SIGSOFT Software Engineering Notes, 2005. symbolic execution of code generators,” in Proc. of the 2016 24th [36] W. Landi and B. G. Ryder, “Pointer-induced aliasing: a problem ACM SIGSOFT International Symposium on Foundations of Software classification,” in Proc. of the 18th ACM SIGPLAN-SIGACT sympo- Engineering, 2016. sium on Principles of programming languages, 1991. [49] F. Saudel and J. Salwan, “Triton: a dynamic symbolic execution [37] A. Solovyev, C. Jacobsen, Z. Rakamarić, and G. Gopalakrishnan, framework,” in Symposium sur la sécuritédes technologies de linfor- “Rigorous estimation of floating-point round-off errors with sym- mation et des communications, SSTIC, France, Rennes, 2015. bolic taylor expansions,” in International Symposium on Formal [50] C. Lattner and V. Adve, “Llvm: A compilation framework for Methods. Springer, 2015. lifelong program analysis & transformation,” in Proc. of the Inter- [38] D. Liew, D. Schemmel, C. Cadar, A. F. Donaldson, R. Zähl, and national Symposium on Code Generation and Optimization: Feedback- K. Wehrle, “Floating-point symbolic execution: A case study in Directed and Runtime Optimization. IEEE Computer Society, 2004. n-version programming,” in Proceedings of the 32nd IEEE/ACM [51] C.-K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, International Conference on Automated Software Engineering (ASE), S. Wallace, V. J. Reddi, and K. Hazelwood, “Pin: building cus- 2017. tomized program analysis tools with dynamic instrumentation,” [39] D. S. Liew, “Symbolic execution of verification languages and in ACM Sigplan Notices, vol. 40, no. 6, 2005, pp. 190–200. floating-point code,” 2018. [52] N. Nethercote, “Dynamic binary analysis and instrumentation,” [40] J. Burnim and K. Sen, “Heuristics for scalable dynamic test gen- Ph.D. dissertation, PhD thesis, University of Cambridge, 2004. eration,” in Proc. of the 23rd IEEE/ACM International Conference on [53] R. Roemer, E. Buchanan, H. Shacham, and S. Savage, “Return- Automated Software Engineering, 2008. oriented programming: Systems, languages, and applications,” [41] P. Saxena, P. Poosankam, S. McCamant, and D. Song, “Loop- ACM Transactions on Information and System Security (TISSEC), extended symbolic execution on binary programs,” in Proc. of the 2012.