
Mathematical Execution: A Unified Approach for Testing Numerical Code

Zhoulai Fu    Zhendong Su
Department of Computer Science, University of California, Davis, USA
[email protected]    [email protected]

arXiv:1610.01133v1 [cs.PL] 4 Oct 2016

Abstract

This paper presents Mathematical Execution (ME), a new, unified approach for testing numerical code. The key idea is to (1) capture the desired testing objective via a representing function and (2) transform the automated testing problem to the minimization problem of the representing function. The minimization problem is to be solved via mathematical optimization. The main feature of ME is that it directs input space exploration by only executing the representing function, thus avoiding static or symbolic reasoning about the program semantics, which is particularly challenging for numerical code. To illustrate this feature, we develop an ME-based algorithm for coverage-based testing of numerical code. We also show the potential of applying and adapting ME to other related problems, including path reachability testing, boundary value analysis, and satisfiability checking.

To demonstrate ME's practical benefits, we have implemented CoverMe, a proof-of-concept realization for branch coverage based testing, and evaluated it on Sun's C math library (used in, for example, Android, Matlab, Java and JavaScript). We have compared CoverMe with random testing and Austin, a publicly available branch coverage based testing tool that supports numerical code (Austin combines symbolic execution and search-based heuristics). Our experimental results show that CoverMe achieves near-optimal and substantially higher coverage ratios than random testing on all tested programs, across all evaluated coverage metrics. Compared with Austin, CoverMe improves branch coverage from 43% to 91%, with significantly less time (6.9 vs. 6058.4 seconds on average).

1. Introduction

Testing has been a predominant approach for improving software quality. Manual testing is notoriously tedious [13]; automated testing has been an active research topic, drawing on a rich body of techniques, such as symbolic execution [12, 14, 17, 36], random testing [1, 11, 30] and search-based strategies [9, 38, 42, 49].

Automated testing is about producing program failures [51]. Let FOO be a program and dom(FOO) be its input domain. An automated testing problem is to systematically find x ∈ dom(FOO) such that FOO(x)⇓wrong, where FOO(x)⇓wrong denotes "FOO goes wrong if executed on input x." It is difficult to specify FOO(x)⇓wrong, which is known as the test oracle problem [50, 69]. This paper assumes that an algorithm for checking FOO(x)⇓wrong is given.

An important problem in testing is the testing of numerical code, i.e., programs with floating-point arithmetic, non-linear variable relations, or external function calls (such as logarithmic and trigonometric functions). These programs are pervasive in safety-critical systems, but ensuring their quality remains difficult. Numerical code presents two specific challenges for existing automated testing techniques: (1) random testing is easy to employ and fast, but ineffective in finding deep semantic issues and handling large input spaces; and (2) symbolic execution and its variants can perform systematic path exploration, but suffer from path explosion and are weak in dealing with complex program logic involving numerical constraints.

Our Approach. This paper introduces a new, unified approach for automatically testing numerical code. It proceeds as follows. We derive from the program under test FOO another program FOO_R, called the representing function, which represents how far an input x ∈ dom(FOO) is from reaching the set {x | FOO(x)⇓wrong}. We require that the representing function returns a non-negative value for all x, which diminishes when x gets close to the set and vanishes when x goes inside. Intuitively, this representing function is similar to a sort of distance. It allows us to approach the automated testing problem, i.e., the problem of finding an element in {x | FOO(x)⇓wrong}, as the problem of minimizing FOO_R. This approach can be justified with a strong guarantee:

    FOO(x)⇓wrong ⇔ x minimizes FOO_R,    (1)

assuming that there exists at least one x such that FOO(x)⇓wrong (details in Sect. 4). Therefore, the essence of our approach is to transform the automated testing problem to a minimization problem. Minimization problems are well studied in the field of Mathematical Optimization (MO) [53]. MO works by executing its objective function only (see Sect. 2). That is to say, our approach does not need to analyze the semantics of the tested programs. Instead, it directs input space exploration by only executing the representing function. We call this approach Mathematical Execution (abbreviated as ME).

Note that mathematical optimization by itself does not necessarily provide a panacea for automated testing, because many MO problems are themselves intractable. However, efficient algorithms have been successfully applied to difficult mathematical optimization problems. A classic example is the NP-hard traveling salesman problem, which has been nicely handled by simulated annealing [37], a stochastic MO technique. Another example is Monte Carlo Markov Chain [8], which has been effectively adapted to testing and verification [28, 29, 34, 63]. A major finding of this work is that using mathematical optimization for testing numerical code is a powerful approach. If we carefully design the representing function so that certain conditions are respected, we can come up with mathematical optimization problems that can be efficiently solved by off-the-shelf MO tools.

To demonstrate the feasibility of our approach, we have applied ME to coverage-based testing [54] of floating-point code, a fundamental problem in testing. The experimental results show that our implemented tool, CoverMe, is highly effective. Fig. 1 gives a small program from our benchmark suite Fdlibm [3]. The program operates on two double input parameters. It first takes |x|'s high word by bit twiddling, including a bitwise AND (&), a pointer reference (&) and a dereference (*) operator. The bit twiddling result is stored in the integer variable ix (Line 3), followed by four conditional statements that examine ix (Lines 4-15). The tool CoverMe yields:

    100% line coverage
    87.5% branch coverage

Figure 1: The benchmark program __kernel_cos taken from the Fdlibm [3] library (http://www.netlib.org/fdlibm/k_cos.c).

     1  #define __HI(x) *(1+(int*)&x)
     2  double __kernel_cos(double x, double y){
     3      ix = __HI(x)&0x7fffffff;          /* ix = |x|'s high word */
     4      if(ix<0x3e400000) {               /* if |x| < 2**(-27) */
     5          if(((int)x)==0) return ...;   /* generate inexact */
     6      }
     7      ...;
     8      if(ix < 0x3FD33333)               /* if |x| < 0.3 */
     9          return ...;
    10      else {
    11          if(ix > 0x3fe90000) {         /* if |x| > 0.78125 */
    12              ...;
    13          } else {
    14              ...;
    15          }
    16          return ...;
    17      }
    18  }

Figure 2: (a) Local optimization with the curve of λx. x ≤ 1 ? 0 : (x − 1)². The illustrated technique uses tangents of the curve to converge quickly to a minimum point. (b) Global optimization with the curve of λx. x ≤ 1 ? ((x + 1)² − 4)² : (x² − 4)². The MCMC method starts from p0, converges to local minimum p1, performs a Monte-Carlo move to p2 and converges to p3. Then it moves to p4 and finally converges to p5.

When investigating why CoverMe fails to achieve full branch coverage, we find that one out of the eight branches in the program cannot be reached. The condition if (((int) x) == 0) (Line 5) always holds because it is nested within the |x| < 2**(-27) branch (Line 4).[1] Therefore, the 87.5% branch coverage is, in fact, optimal. We have compared CoverMe with Austin [43], a publicly available, state-of-the-art coverage-based testing tool that can handle floating-point code. Austin achieves 37.5% branch coverage in 1885.1 seconds, whereas CoverMe achieves the optimal coverage in 15.4 seconds (see Sect. 5).

[1] Sun's developers decided to use this redundant check to trigger the inexact exception of floating-point as a side effect. From the program semantics perspective, no input of __kernel_cos can trigger the false branch of if (((int) x) == 0).

Contributions. Our contributions follow:
• We introduce Mathematical Execution, a new general approach for testing numerical code;
• We develop an effective coverage-based testing algorithm using the ME approach;
• We demonstrate that ME is a unified approach by showing how to apply ME to several important testing problems; and
• We implement the coverage-based testing tool CoverMe and show its effectiveness on real-world numerical library code.

Paper Outline. Sect. 2 gives the background on mathematical optimization. Sect. 3 illustrates ME by studying the case of branch coverage based testing. We define the problem, demonstrate the ME solution, and give the algorithmic details. Sect. 4 lays out the theoretical foundation for ME and demonstrates ME with several additional examples. Sect. 5 presents an implementation overview of CoverMe and describes our experimental results. Sect. 6 discusses the current limitations of ME. Finally, Sect. 7 surveys related work and Sect. 8 concludes. For completeness, Appendix A lists the benchmark programs in Fdlibm that CoverMe does not support and the reasons, and Appendix B gives implementation details.

Notation. The sets of real and integer numbers are denoted by R and Z respectively. For two real numbers a and b, the usage aEb means a * 10^b. In this presentation, we do not distinguish a mathematical expression, such as x² + |y|, and its implementation, such as x*x + abs(y). Similarly, we use a lambda expression to mean either a mathematical function or its implementation. For example, an implementation λx.x² may refer to the code double f (double x) {return x*x;}. We use the C-like syntax A ? v1 : v2 to mean an implementation that returns v1 if A holds, or v2 otherwise.

2. Background

We begin with some preliminaries on mathematical optimization following the exposition of [28]. A complete treatment is beyond the scope of this paper; see [8, 53, 71] for more details.

A Mathematical Optimization (MO) problem is usually formulated as

    minimize f(x)  subject to x ∈ S    (2)

where f is called the objective function, and S the search space. In general, mathematical optimization problems can be divided into two categories. One focuses on how functions are shaped at local regions and where a local minimum can be found near a given input. This local optimization is classic, usually involving standard techniques such as Newton's or the steepest descent methods. Local optimization not only provides the minimum value of a function within a neighborhood of the given input points, but also aids global optimization, which determines the function minimum over the entire search space.

Local Optimization. Let f be a function defined over a Euclidean space with distance function d. We call x* a local minimum point if there exists a neighborhood of x*, namely {x | d(x, x*) < δ} for some δ > 0, such that all x in the neighborhood satisfy f(x) ≥ f(x*). The value f(x*) is called a local minimum of f.

Local optimization problems can usually be solved efficiently if the objective function is smooth (such as continuous or differentiable to some degree) [56]. Fig. 2(a) shows a common local optimization method with the objective function λx. x ≤ 1 ? 0 : (x − 1)². It uses tangents of the curve to quickly converge to a minimum point. The smoothness of the curve makes it possible to deduce the function's behavior in the neighborhood of a particular point x by using information at x only.

Global Optimization and MCMC. If f(x*) ≤ f(x) for all x in the search space, we call x* a global minimum point (or minimum point for short), and f(x*) the global minimum (or minimum for short) of the function f. In this presentation, if we say "x* minimizes the function f", we mean x* is a global minimum point of f.

We use Monte Carlo Markov Chain (MCMC) sampling to solve global optimization problems. A fundamental fact regarding MCMC is that it follows the target distribution asymptotically. For simplicity, we give the results [8] for the discrete-valued case.

Lemma 2.1. Let x be a random variable, and A be an enumerable set of the possible values of x. Let f be a target probability distribution for x, i.e., the probability of x taking value a is f(a). Then, for an MCMC sampling sequence x1, ..., xn, ... and the probability P(xn = a) for each xn, we have P(xn = a) → f(a).

For example, consider the target distribution of coin tossing with 0.5 probability for having the head. An MCMC sampling is a sequence of random variables x1, ..., xn, ..., such that the probability of xn being "head", denoted by Pn, converges to 0.5.

MCMC provides multiple advantages in practice. Because such sampling can simulate an arbitrary distribution (Lem. 2.1), MCMC backends can sample for a target distribution in the form of λx.exp(−f(x)), where f is the function to minimize, which allows the sampling process to attain the minimum points more frequently than the other points. Also, MCMC has many sophisticated techniques that integrate with classic local search techniques, such as the Basinhopping algorithm [45]. Some variants of MCMC can even handle high dimensional problems [60] or non-smooth objective functions [25]. Fig. 2(b) illustrates a typical MCMC cycle. Steps p0 → p1, p2 → p3, and p4 → p5 are the local optimization; steps p1 → p2 and p3 → p4 aim to prevent the MCMC sampling from getting trapped in the local minima.
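The following is a minimal sketch (ours, not part of the paper's artifact) showing how an off-the-shelf MCMC-style global optimizer, SciPy's basinhopping, minimizes the Fig. 2(b) objective. Variable and function names are ours.

    # Minimize f(x) = ((x+1)^2 - 4)^2 if x <= 1 else (x^2 - 4)^2,
    # the global-optimization example of Fig. 2(b).
    import numpy as np
    from scipy.optimize import basinhopping

    def f(x):
        x = float(np.asarray(x).ravel()[0])
        return ((x + 1) ** 2 - 4) ** 2 if x <= 1 else (x ** 2 - 4) ** 2

    result = basinhopping(f, x0=0.0, niter=100)
    # result.x is a global minimum point (near -3.0 or 1.0, or the
    # local minimum 2.0 if the sampling gets unlucky); result.fun is f at that point.
    print(result.x, result.fun)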
[Figure 3: Branch coverage based testing via ME. The goal is to saturate (therefore cover) all branches of FOO, i.e., {0T, 0F, 1T, 1F}. The figure shows the pipeline: FOO (program under test, in any LLVM-supported language) is instrumented via an LLVM pass using pen (.cpp) into FOO_I (.bc), linked with a loader (.cpp) into the representing function FOO_R (libr.so), which is minimized by the MCMC minimization procedure (.py) via basinhopping(func, sp, n_iter, callback).]

3. Branch Coverage Based Testing

This section shows a detailed ME procedure for solving branch coverage based testing for numerical code.

3.1 Problem Statement

Definition 3.1. Let FOO be the program under test with N conditional statements, labeled by l0, ..., lN−1. Each li has a true branch iT and a false branch iF. The problem of branch coverage based testing aims to find a set of inputs X ⊆ dom(FOO) that covers all 2N branches of FOO. Here, we say a branch is "covered" by X if it is passed through by the path of executing FOO with an x ∈ X. We scope the problem with three assumptions:
(a) The inputs of FOO are floating-point numbers.
(b) Each Boolean condition in FOO is an arithmetic comparison of the form a op b, where op ∈ {==, ≤, <, ≠, ≥, >}, and a and b are floating-point variables or constants.
(c) Each branch of FOO is feasible, i.e., it is covered by dom(FOO).

Assumptions (a) and (b) are set for modeling numerical code. Assumption (c) is set to simplify our presentation. Our implementation partially relaxes these assumptions (details in Appendix B).

We introduce the concept of a saturated branch and use it to reformulate Def. 3.1.

Definition 3.2. Let X be a set of inputs generated during the process of testing. We say a branch is saturated by X if the branch itself and all its descendant branches, if any, are covered by X. Here, a branch b' is called a descendant branch of b if there exists a segment of control flow path from b to b'. Given X ⊆ dom(FOO), we write

    Saturate(X)    (3)

for the set of branches saturated by X.

To illustrate Def. 3.2, suppose that an input set X covers {0T, 0F, 1F} in a control-flow graph in which the branches 1T and 1F are descendants of 0T. Then Saturate(X) = {0F, 1F}: 1T is not saturated because it is not covered, and 0T is not saturated because its descendant branch 1T is not covered.

Observe that if all FOO's branches are covered by an input set X, they are also saturated by X, and vice versa. This observation allows us to reformulate the branch coverage based testing problem following the lemma below.

Lemma 3.3. Let FOO be a program under test with the assumptions Def. 3.1(a-c) satisfied. The goal of branch coverage based testing can be stated as finding a small set of inputs X ⊆ dom(FOO) that saturates all of FOO's branches.

Remark 3.4. In Lem. 3.3, we expect the generated input set X to be "small", because otherwise we could use X = dom(FOO), which saturates all branches under the assumption Def. 3.1(c).

3.2 An Example

We use a simple program FOO in Fig. 3 to illustrate our approach. The program has two conditional statements l0 and l1, and their true and false branches are denoted by 0T, 0F and 1T, 1F respectively. The objective is to find an input set that saturates all branches. Our approach proceeds in three steps.

Step 1. We inject a global variable r in FOO, and, immediately before each control point li, we inject an assignment (Fig. 3)

    r = pen,    (4)

where pen invokes a code segment with parameters specific to li. The idea of pen is to capture the distance of the program input from saturating a branch that has not yet been saturated. As illustrated in Fig. 3, this distance returns different values depending on whether the branches at li are saturated or not. We denote the instrumented program by FOO_I. The key in Step 1 is to design pen to meet certain conditions that allow us to approach the problem defined in Lem. 3.3 as a mathematical optimization problem. We specify the conditions in the next step.

Step 2. This step constructs the representing function that we have mentioned in Sect. 1. The representing function is the driver program FOO_R shown in Fig. 3. It initializes r to 1, invokes FOO_I and then returns r as the output of FOO_R. That is, FOO_R(x) for a given input x calculates the value of r at the end of executing FOO_I(x). Our approach requires two conditions on FOO_R:

C1. FOO_R(x) ≥ 0 for all x, and
C2. FOO_R(x) = 0 if and only if x saturates a new branch, in other words, a branch that has not been saturated by the generated input set X becomes saturated with X ∪ {x}, i.e., Saturate(X) ≠ Saturate(X ∪ {x}).

Imagine that we have designed pen so that FOO_R meets both C1 and C2. Ideally, we can then saturate all branches of FOO by repeatedly minimizing FOO_R as shown in the step below.

Step 3. In this step, we use MCMC to calculate the minimum points of FOO_R. Other mathematical optimization techniques, e.g., genetic programming [40], may also be applicable, which we leave for future investigation.

We start with an input set X = ∅ and Saturate(X) = ∅. We minimize FOO_R and obtain a minimum point x* which necessarily saturates a new branch by condition C2. Then we have X = {x*}, and we minimize FOO_R again, which gives another input x** such that {x*, x**} saturates a branch that is not saturated by {x*}. We continue this process until all branches are saturated. When the algorithm terminates, FOO_R(x) must be strictly positive for any input x, due to C1 and C2.

Tab. 1 illustrates a scenario of how our approach saturates all branches of FOO. Each "#n" below corresponds to one line in the table. We use pen0 and pen1 to denote pen injected at l0 and l1 respectively. (#1) Initially, Saturate = ∅. Any input saturates a new branch. Both pen0 and pen1 set r = 0, and FOO_R = λx.0 (Fig. 3). Suppose x* = 0.7 is found as the minimum point. (#2) The branch 1F is now saturated and 1T is not. Thus, pen1 sets r = (y − 4)². Minimizing FOO_R gives x* = −3.0, 1.0, or 2.0. We have illustrated this MCMC procedure in Fig. 2(b). Suppose x* = 1.0 is found. (#3) Both 1T and 1F, as well as 0T, are saturated by the generated inputs {0.7, 1.0}. Thus, pen1 returns the previous r. Then, FOO_R amounts to pen0, returning 0 if x > 1, or (x − 1)² + ε otherwise, where ε is a small positive constant. Suppose x* = 1.1 is found as the minimum point. (#4) All branches have been saturated. In this case, both pen0 and pen1 return the previous r. Then, FOO_R becomes λx.1, recalling that FOO_R initializes r as 1. The minimum point, e.g., x* = −5.2, necessarily satisfies FOO_R(x*) > 0, which terminates the algorithm.

Table 1: A scenario of how our approach saturates all branches of FOO by repeatedly minimizing FOO_R. Column "Saturate": branches that have been saturated. Column "FOO_R": the representing function and its plot (plots omitted here). Column "x*": the point where FOO_R attains the minimum. Column "X": generated test inputs.

    #   Saturate             FOO_R                                    x*      X
    1   ∅                    λx. 0                                    0.7     {0.7}
    2   {1F}                 λx. x ≤ 1 ? ((x+1)² − 4)² : (x² − 4)²    1.0     {0.7, 1.0}
    3   {0T, 1T, 1F}         λx. x > 1 ? 0 : (x − 1)² + ε             1.1     {0.7, 1.0, 1.1}
    4   {0T, 1T, 0F, 1F}     λx. 1                                    −5.2    {0.7, 1.0, 1.1, −5.2}

Remark 3.5. Given an input x, the value of FOO_R(x) may change during the minimization process. In fact, FOO_R is constructed with the injected pen, which returns different values at li depending on whether the branches iT and iF have been saturated. Thus, the minimization step in our algorithm differs from existing mathematical optimization techniques, where the objective function is fixed [71].
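To make the example concrete, here is a minimal Python mock (ours) of FOO_I and FOO_R after Steps 1 and 2. Fig. 3's listing is not legible in this extraction, so the body of FOO below (the computation of y) is an assumption chosen to be consistent with Tab. 1; the case analysis inside pen follows what Sect. 3.3 formalizes as Def. 3.7.

    saturated = set()     # branches saturated so far, e.g., {"0T", "1F"}
    r = None              # the injected global variable
    EPS = 1e-6

    def pen(sat_true, sat_false, dist_true, dist_false, prev):
        # 0 if neither branch is saturated; distance to the unsaturated
        # branch if exactly one is saturated; otherwise keep the previous r.
        if not sat_true and not sat_false:
            return 0.0
        if sat_true and not sat_false:
            return dist_false
        if sat_false and not sat_true:
            return dist_true
        return prev

    def FOO_I(x):
        global r
        # l0: if (x <= 1)
        r = pen("0T" in saturated, "0F" in saturated,
                0.0 if x <= 1 else (x - 1) ** 2,          # d(<=, x, 1)
                0.0 if x > 1 else (1 - x) ** 2 + EPS,     # d(>, x, 1)
                r)
        y = (x + 1) ** 2 if x <= 1 else x ** 2            # assumed body of FOO
        # l1: if (y == 4)
        r = pen("1T" in saturated, "1F" in saturated,
                (y - 4) ** 2,                             # d(==, y, 4)
                EPS if y == 4 else 0.0,                   # d(!=, y, 4)
                r)

    def FOO_R(x):
        global r
        r = 1.0          # r0 = 1 for coverage-based testing
        FOO_I(x)
        return r         # 0 iff x saturates a not-yet-saturated branch

Repeatedly minimizing FOO_R with the MCMC backend of Sect. 2 and updating `saturated` after each run reproduces the scenario of Tab. 1.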
3.3 Algorithm

We provide details corresponding to the three steps in Sect. 3.2. The algorithm is summarized in Algo. 1.

Algorithm for Step 1. The outcome of this step is the instrumented program FOO_I. As explained in Sect. 3.2, the essence is to inject the variable r and the assignment r = pen before each conditional statement (Algo. 1, Lines 1-4).

To define pen, we first introduce a set of helper functions that are sometimes known as branch distances. There are many different forms of branch distance in the literature [38, 49]. We define ours with respect to an arithmetic condition a op b.

Definition 3.6. Let a, b ∈ R, op ∈ {==, ≤, <, ≠, ≥, >}, and ε > 0. We define the branch distance dε(op, a, b) as follows:

    dε(==, a, b)  ≝  (a − b)²                         (5)
    dε(≤, a, b)   ≝  (a ≤ b) ? 0 : (a − b)²           (6)
    dε(<, a, b)   ≝  (a < b) ? 0 : (a − b)² + ε       (7)
    dε(≠, a, b)   ≝  (a ≠ b) ? 0 : ε                  (8)

and dε(≥, a, b) ≝ dε(≤, b, a), dε(>, a, b) ≝ dε(<, b, a). Usually, the parameter ε is a small constant, so we drop the explicit reference to ε when using the branch distance.

The intention of d(op, a, b) is to quantify how far a and b are from attaining a op b. For example, d(==, a, b) is strictly positive when a ≠ b, becomes smaller when a and b get closer, and vanishes when a == b. The following property holds:

    d(op, a, b) ≥ 0  and  d(op, a, b) = 0 ⇔ a op b.    (9)

As an analogue, we set pen to quantify how far an input is from saturating a new branch. We define pen following Algo. 1, Lines 14-23.

Definition 3.7. For branch coverage based testing, the function pen has four parameters, namely, the label of the conditional statement li, and op, a and b from the arithmetic condition a op b.
(a) If neither of the two branches at li is saturated, pen returns 0, because any input saturates a new branch (Lines 16-17).
(b) If one branch at li is saturated but the other is not, we set r to be the distance to the unsaturated branch (Lines 18-21).
(c) If both branches at li have already been saturated, pen returns the previous value of the global variable r (Lines 22-23).

For example, the two instances of pen at l0 and l1 are invoked as pen(l0, ≤, x, 1) and pen(l1, ==, y, 4) respectively in Fig. 3.

Algorithm 1: Branch coverage based testing via ME.

    Input:  FOO       Program under test
            n_start   Number of starting points
            LM        Local optimization used in MCMC
            n_iter    Number of iterations for MCMC
    Output: X         Generated input set

    /* Step 1 */
     1  Inject global variable r in FOO
     2  for conditional statement li in FOO do
     3      Let the Boolean condition at li be a op b, where op ∈ {==, ≤, <, ≠, ≥, >}
     4      Insert assignment r = pen(li, op, a, b) before li
    /* Step 2 */
     5  Let FOO_I be the newly instrumented program, and FOO_R be:
            double FOO_R(double x) {r = 1; FOO_I(x); return r;}
    /* Step 3 */
     6  Let Saturate = ∅
     7  Let X = ∅
     8  for k = 1 to n_start do
     9      Randomly take a starting point x
    10      Let x* = MCMC(FOO_R, x)
    11      if FOO_R(x*) = 0 then X = X ∪ {x*}
    12      Update Saturate
    13  return X

    14  Function pen(li, op, a, b)
    15      Let iT and iF be the true and the false branches at li
    16      if iT ∉ Saturate and iF ∉ Saturate then
    17          return 0
    18      else if iT ∉ Saturate and iF ∈ Saturate then
    19          return d(op, a, b)        /* d: branch distance */
    20      else if iT ∈ Saturate and iF ∉ Saturate then
    21          return d(op̄, a, b)        /* op̄: the opposite of op */
    22      else                          /* iT ∈ Saturate and iF ∈ Saturate */
    23          return r

    24  Function MCMC(f, x)
    25      xL = LM(f, x)                 /* local minimization */
    26      for k = 1 to n_iter do
    27          Let δ be a random perturbation generated from a predefined distribution
    28          Let x̃L = LM(f, xL + δ)
    29          if f(x̃L) < f(xL) then accept = true
    30          else
    31              Let m be a random number generated from the uniform distribution on [0, 1]
    32              Let accept be the Boolean m < exp(f(xL) − f(x̃L))
    33          if accept then xL = x̃L
    34      return xL
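Below is a direct Python transcription (ours) of the branch distance of Def. 3.6, together with the opposite-comparison table used by pen; `eps` stands for ε.

    def branch_distance(op, a, b, eps=1e-6):
        if op == "==":
            return (a - b) ** 2
        if op == "<=":
            return 0.0 if a <= b else (a - b) ** 2
        if op == "<":
            return 0.0 if a < b else (a - b) ** 2 + eps
        if op == "!=":
            return 0.0 if a != b else eps
        if op == ">=":
            return branch_distance("<=", b, a, eps)
        if op == ">":
            return branch_distance("<", b, a, eps)
        raise ValueError("unsupported comparison: " + op)

    # The opposite of each comparison, as needed by Algo. 1, Line 21.
    OPPOSITE = {"==": "!=", "!=": "==", "<": ">=", ">=": "<", "<=": ">", ">": "<="}

    # Property (9): branch_distance(op, a, b) >= 0, and it equals 0 exactly
    # when `a op b` holds.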
Algorithm for Step 2. This step constructs the representing function FOO_R (Algo. 1, Line 5). Its input domain is the same as that of FOO_I and FOO, and its output domain is double, so as to simulate a real-valued mathematical function that can then be processed by the mathematical optimization backend.

FOO_R initializes r to 1. This is essential for the correctness of the algorithm, because we expect FOO_R to return a non-negative value when all branches are saturated (Sect. 3.2, Step 2). FOO_R then calls FOO_I(x) and records the value of r at the end of executing FOO_I(x). This r is the returned value of FOO_R.

As mentioned in Sect. 3.2, it is important to ensure that FOO_R meets conditions C1 and C2. Condition C1 holds because FOO_R returns the value of the instrumented r, which is never assigned a negative quantity. The lemma below states that FOO_R also satisfies C2.

Lemma 3.8. Let FOO_R be the program constructed in Algo. 1, and S the branches that have been saturated. Then, for any input x ∈ dom(FOO), FOO_R(x) = 0 ⇔ x saturates a branch that does not belong to S.

Proof. We first prove the ⇒ direction. Take an arbitrary x such that FOO_R(x) = 0. Let τ = [l0, ..., ln] be the path in FOO passed through by executing FOO(x). We know from Lines 2-4 of the algorithm that each li is preceded by an invocation of pen in FOO_R. We write peni for the one injected before li and divide the peni (0 ≤ i ≤ n) into three groups. For the given input x, we let P1, P2 and P3 denote the groups of peni that are defined in Def. 3.7(a), (b) and (c), respectively. Then we can always find a prefix path [l0, ..., lm] of τ, with 0 ≤ m ≤ n, such that each peni for i ∈ [m + 1, n] belongs to P3 and each peni for i ∈ [0, m] belongs to either P1 or P2. Here, we can guarantee the existence of such an m because, otherwise, all peni belong to P3 and FOO_R becomes λx.1; the latter contradicts the assumption that FOO_R(x) = 0. Because each peni for i > m does nothing but performs r = r, we know that FOO_R(x) equals the exact value of r that penm assigns. Now consider two disjunctive cases on penm. If penm is in P1, we immediately conclude that x saturates a new branch. Otherwise, if penm is in P2, we obtain the same from Eq. (9). Thus, we have established the ⇒ direction of the lemma.

To prove the ⇐ direction, we use the same notation as above, let x be the input that saturates a new branch, and let [l0, ..., ln] be the exercised path. Assume that lm, where 0 ≤ m ≤ n, corresponds to the newly saturated branch. We know from the algorithm that (1) penm updates r to 0, and (2) each peni such that i > m maintains the value of r, because their descendant branches have been saturated. We have thus proven the ⇐ direction of the lemma.

Algorithm for Step 3. The main loop (Algo. 1, Lines 8-12) relies on an existing MCMC engine. It takes an objective function and a starting point and outputs x* that it regards as a minimum point. Each iteration of the loop launches MCMC from a randomly selected starting point (Line 9). From each starting point, MCMC computes the minimum point x* (Line 10). If FOO_R(x*) = 0, x* is added to the set of generated inputs X (Line 11). Lem. 3.8 ensures that x* saturates a new branch in the case of FOO_R(x*) = 0. Therefore, in theory, we only need to set n_start = 2N, where N denotes the number of conditional statements, so as to saturate all 2N branches. In practice, however, we set n_start > 2N because MCMC cannot guarantee that its output is a true global minimum point.

The MCMC procedure (Algo. 1, Lines 24-34) is also known as the Basinhopping algorithm [45]. It is an MCMC sampling over the space of the local minimum points [46]. The random starting point x is first updated to a local minimum point xL (Line 25). Each iteration (Lines 26-33) is composed of the two phases that are classic in the Metropolis-Hastings algorithm family of MCMC [16]. In the first phase (Lines 27-28), the algorithm proposes a sample x̃L from the current sample xL. The sample x̃L is obtained with a perturbation δ followed by a local minimization, i.e., x̃L = LM(f, xL + δ) (Line 28), where LM denotes a local minimization in Basinhopping and f is the objective function. The second phase (Lines 29-33) decides whether the proposed x̃L should be accepted as the next sampling point. If f(x̃L) < f(xL), the proposed x̃L will be sampled; otherwise, x̃L may still be sampled, but only with probability exp((f(xL) − f(x̃L))/T), in which T (called the annealing temperature [37]) is set to 1 in Algo. 1 for simplicity.
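The following is a minimal sketch (ours) of the MCMC procedure of Algo. 1, Lines 24-34, i.e., Basinhopping with a Metropolis acceptance test at temperature T = 1. SciPy's local minimizer is borrowed as a stand-in for LM; the step size and distribution of δ are our assumptions.

    import random
    import numpy as np
    from scipy.optimize import minimize

    def local_minimize(f, x):
        # LM: any local optimization; Powell needs no gradient.
        return minimize(f, x, method="Powell").x

    def mcmc(f, x, n_iter=100, step=1.0):
        x_l = local_minimize(f, x)
        for _ in range(n_iter):
            delta = np.random.uniform(-step, step, size=np.shape(x_l))
            x_prop = local_minimize(f, x_l + delta)       # proposal phase
            if f(x_prop) < f(x_l):
                accept = True
            else:
                m = random.random()                       # uniform on [0, 1]
                accept = m < np.exp(f(x_l) - f(x_prop))   # Metropolis test, T = 1
            if accept:
                x_l = x_prop
        return x_l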
4. Mathematical Execution

Predicate as function is a common concept in mathematics. As Gödel stated in his 1931 work [22, 31]:

    There shall correspond to each relation R (R ⊆ Zⁿ) a representing function φ(x1, ..., xn) = 0 if R(x1, ..., xn) and φ(x1, ..., xn) = 1 if ¬R(x1, ..., xn).

The traditional representing function, Boolean in nature, is a predicate/set indicator of two states, essentially true or false. For example, even(N), which decides whether an integer N is even, can be represented by the function λx.(x mod 2 == 0).

In this section, we present Mathematical Execution (ME) by extending the Boolean-valued representing function to a real-valued calculus to address a spectrum of automated testing problems for numerical code, which we unify in the category of the search problem.

4.1 Search Problem

Definition 4.1. The search problem with regard to a set X aims to
(a) find an x ∈ X if X ≠ ∅, and
(b) report "not found" if X = ∅.

Usually, we have a search space U, and X is specified implicitly as a subset of U. We denote the search problem by (X, U). In this paper, we deal with numerical code and thus assume that X is a subset of R^N. We also assume that X is decidable, so that we can check whether an x ∈ U is an element of X.

Example 4.2. A search problem can be any computational task that attempts to find an element from a set.
(a) As per the notation used in Sect. 1, an automated testing problem of program FOO is a search problem (X, U) where X = {x | FOO(x)⇓wrong} and U = dom(FOO).
(b) Another search problem is satisfiability checking, where X is the set of the models of a constraint, and U is the value domain to which the variables can be assigned.

4.2 Representing Function

Definition 4.3. A function R is said to be a representing function for the search problem (X, U) if with any x ∈ U there is an associated real value R(x), such that
(a) R(x) ≥ 0;
(b) every root of the representing function is a solution of the search problem, i.e., R(x) = 0 ⇒ x ∈ X; and
(c) the roots of R include all solutions to the search problem, i.e., x ∈ X ⇒ R(x) = 0.

Example 4.4. Let (X, U) be a search problem.
(a) A trivial representing function is λx.(x ∈ X) ? 0 : 1.
(b) A generic representing function is the point-set distance. Imagine that the search problem is embedded in a metric space [62] with a distance dist: X × X → R. As a standard practice, we can lift dist to distX defined as λx. inf{dist(x, x') | x' ∈ X}, where inf refers to the greatest lower bound, or infimum. Intuitively, distX measures the distance between a point x ∈ U and the set X. It can be shown that distX satisfies conditions Def. 4.3(a-c) and, therefore, is a representing function.
(c) The representing function used in branch coverage based testing is the FOO_R constructed in Sect. 3, where X is the set of inputs that saturate a new branch and U is the input domain. We have proved that FOO_R is a representing function in Lem. 3.8.

The theorem below allows us to approach a search problem by minimizing its representing function.

Theorem 4.5. Let R be the representing function for the search problem (X, U), and R* be the global minimum of R.
(a) Deciding the emptiness of X is equivalent to checking the sign of R*, i.e., X = ∅ ⇔ R* > 0.
(b) Assume X ≠ ∅. Then, for all x ∈ U, x minimizes R ⇔ x ∈ X.

Proof. Proof of (a): Suppose X ≠ ∅. Let x0 be an element of X. We have R* ≥ 0 by Def. 4.3(a). In addition, we have R* ≤ R(x0) since R* is the minimum. Then we have R* ≤ 0 because R(x0) = 0 due to Def. 4.3(c). Thus R* = 0. Conversely, R* = 0 implies that there exists an x* ∈ U such that R(x*) = 0. By Def. 4.3(b), x* ∈ X. Thus X ≠ ∅.

Proof of (b): Let 0_R denote the set of the roots of R, and M_R the set of the minimum points of R. It suffices to show that 0_R ⊆ M_R ⊆ X ⊆ 0_R under the condition X ≠ ∅. We have 0_R ⊆ M_R because a root of R is necessarily a minimum point. We have M_R ⊆ X because X ≠ ∅ implies R* = 0 by Thm. 4.5(a): taking an arbitrary x* ∈ M_R, R(x*) = R* = 0 holds, and therefore x* ∈ X by Def. 4.3(b). Finally, we have X ⊆ 0_R from Def. 4.3(c).

Remark 4.6. An instance of Thm. 4.5(a) is shown in Tab. 1, Line 4. There, FOO_R attains the minimum 1, which means that all branches have been saturated (namely X = ∅). An instance of Thm. 4.5(b) is shown in Eq. (1). Note that Thm. 4.5(b) does not hold if we drop the assumption X ≠ ∅. In fact, any R such that R(x) > 0 is a representing function for a search problem (∅, U), but its minimum point, if any, can never be an element of the empty set X.

4.3 The ME Procedure

The representing function paves the way toward a generic solution to the search problem. The key parameter in the ME procedure is mathematical optimization.

Definition 4.7. Let U be a set, and µ a mathematical optimization algorithm that attempts to calculate a minimum point for an objective function defined over U. We define the Mathematical Execution procedure as follows:

    Input:  A search problem (X, U)
    Output: An element of X, or "not found"
    M1. Construct the representing function R.
    M2. Minimize R. Let x* be the minimum point obtained by µ.
    M3. Return x* if x* ∈ X. Otherwise return "not found".
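A schematic rendering (ours) of Def. 4.7 follows; `mu` is the MO backend (e.g., basinhopping), and `member_of_X` is the decision procedure for X assumed in Sect. 4.1. All names are ours.

    def me_procedure(R, mu, member_of_X, x0):
        x_star = mu(R, x0)          # M2: minimize the representing function R
        if member_of_X(x_star):     # M3: by Def. 4.3, x* is in X iff R(x*) = 0
            return x_star
        return "not found"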

The following corollary states that the ME procedure solves the search problem, under the condition that the ME procedure is equipped with an ideal mathematical optimization backend. Its proof (omitted) follows Thm. 4.5.

Corollary 4.8. Let (X, U) be a search problem. Assume that µ yields a true global minimum point in M2 of the ME procedure. Then,
(a) the ME procedure returns an x* ∈ X if X ≠ ∅, and
(b) the ME procedure returns "not found" if X = ∅.

Remark 4.9. A known issue with testing is its incompleteness, i.e., "Testing shows the presence of bugs, but not its absence" [24]. In the context of Mathematical Execution, this well-known remark corresponds to the fact that, in practice, if the ME procedure returns "not found" for the search problem (X, U), X may still be non-empty. We call this phenomenon practical incompleteness.

Practical incompleteness occurs when the MO backend fails to yield an accurate minimum point. To clarify, we write x* for an exact global minimum point and x̄* for the one calculated by the MO backend µ. Then we consider four disjunctive cases: (a) R(x*) = 0 and R(x̄*) = 0; (b) R(x*) > 0 and R(x̄*) > 0; (c) R(x*) > 0 and R(x̄*) = 0; and (d) R(x*) = 0 and R(x̄*) > 0. The ME procedure remains correct for both (a) and (b). Case (c) cannot happen because R(x*) ≤ R(x) for all x, in particular R(x*) ≤ R(x̄*). Practical incompleteness occurs in (d), where the ME procedure returns "not found" but X ≠ ∅. Sect. 6 further discusses this incompleteness.

4.4 Additional Examples

This subsection aims to show that ME is a unified approach by applying it to several other important search problems besides coverage-based testing. In each example, we illustrate ME with a different representing function.

4.4.1 Path Reachability Testing

Given a path τ of program FOO, we call path reachability testing the search problem (X, U) with

    X = {x | x triggers the path τ},  U = dom(FOO).    (10)

The path reachability problem has been studied as an independent research topic [52] or, more commonly, as a subproblem in other testing problems [30, 38, 42].

Consider the program FOO in Fig. 4 (left), which is the same as that in Fig. 3. Suppose that we want to trigger the path τ = [0T, 1T] (we denote a path by a sequence of branches). Our approach to this example problem is similar to the three steps explained in Sect. 3.2, except that we design a different representing function here. We illustrate the ME approach in Fig. 4.

[Figure 4: Path reachability testing via ME. The goal of this example is to find a test input that triggers the path [0T, 1T] of the program FOO. The figure shows the instrumentation pipeline (FOO → pen → FOO_I → loader → FOO_R → libr.so → MCMC minimization via basinhopping), the plot of FOO_R, and the generated test inputs {−3, 1}; x triggers τ iff x minimizes FOO_R.]

Step 1. We inject a global variable r in FOO and the assignment

    r = r + d(op, a, b)    (11)

before the conditional statements, where d is the branch distance defined in Def. 3.6. The instrumented program is shown as FOO_I in Fig. 4. The assignment measures how far the input is from attaining the desired path.

Step 2. The value of r is then retrieved through a driver program FOO_R (Fig. 4), which initializes r to 0 (unlike in Sect. 3.2, where r is initialized to 1), calls FOO_I and then returns r.

Step 3. A global minimum point of FOO_R is calculated by an MO algorithm. As shown in the graph of FOO_R (Fig. 4), there are three local minimum points, {−3, 1, 2}. Two of them, {−3, 1}, attain the global minimum 0, and either solves the path reachability testing problem. Note that the representing function is discontinuous at x = 1, but the MCMC procedure can easily find it (similar to Fig. 2(b)).
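A sketch (ours) of this representing function for the path τ = [0T, 1T], reusing the stand-in FOO body assumed in the earlier sketch:

    def FOO_R_path(x):
        r = 0.0                                    # r0 = 0 for path reachability
        r += 0.0 if x <= 1 else (x - 1) ** 2       # d(<=, x, 1): stay on 0T
        y = (x + 1) ** 2 if x <= 1 else x ** 2     # assumed body of FOO
        r += (y - 4) ** 2                          # d(==, y, 4): stay on 1T
        return r

    # Global minimum 0 is attained at x = -3 and x = 1; x = 2 is only a
    # local minimum (value 1), matching the plot described for Fig. 4.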
Below we prove the correctness of the ME solution.

Corollary 4.10. An input x triggers τ iff x minimizes FOO_R.

Proof. First, it can be shown that the constructed FOO_R is a representing function for the path reachability problem; thus we can apply Thm. 4.5. By Thm. 4.5(a), X ≠ ∅ since FOO_R attains the global minimum 0. Then we conclude from Thm. 4.5(b).

4.4.2 Boundary Value Analysis

In testing, test inputs that explore "boundary conditions" usually have a higher payoff than those that do not [70]. The problem of generating such inputs is expressed abstractly as boundary value analysis [39, 54, 58], which can be seen as the search problem with

    X = {x | x triggers a boundary condition},  U = dom(FOO).    (12)

Consider again the program FOO in Fig. 4. The boundary value analysis is to find test inputs that trigger a boundary condition, (a) x = 1 at l0, or (b) y = 4 at l1. With manual reasoning, condition (a) can only be triggered by input 1. Condition (b) can be triggered if x = 2 or x = −2 before l1. For each case, we reason backward across the two branches at l0 and then merge the results. Eventually we obtain {−3, 1, 2} as the test inputs to generate.

Our ME solution follows. As before, we introduce a global variable r to estimate how far a program input is from triggering a boundary condition. We inject the assignment

    r = r * d(==, a, b)    (13)

before each condition a op b, where the function d is the branch distance defined in Def. 3.6. Fig. 5 illustrates FOO_I and FOO_R. Then we can solve the boundary value analysis problem via minimizing FOO_R. The correctness of this procedure follows:

Corollary 4.11. An input x triggers a boundary condition if and only if x minimizes FOO_R.

Proof. It can be shown that (1) r is always assigned a non-negative value; (2) if r = 0 at the end of the program execution, then one of the boundary conditions must hold; and (3) r = 0 at the end of the program execution if at least one of the boundary conditions holds. Thus, the constructed FOO_R satisfies the conditions for being a representing function (Def. 4.3). We can then prove the corollary following Thm. 4.5 (in the same way as we prove Cor. 4.10).

[Figure 5: Boundary value analysis via ME. The goal is to find a test input to trigger a boundary condition, namely, (a) x = 1 at l0 or (b) y = 4 at l1 of the program FOO in Fig. 4. The figure shows the corresponding FOO_I and FOO_R in the same pipeline as Fig. 3.]
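A sketch (ours) of the boundary-value representing function for the same stand-in FOO, multiplying the distances to the two boundary conditions with r initialized to 1:

    def FOO_R_boundary(x):
        r = 1.0
        r *= (x - 1) ** 2                          # d(==, x, 1) at l0
        y = (x + 1) ** 2 if x <= 1 else x ** 2     # assumed body of FOO
        r *= (y - 4) ** 2                          # d(==, y, 4) at l1
        return r

    # FOO_R_boundary vanishes exactly on {-3, 1, 2}, the inputs derived by
    # the manual reasoning above.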


4.4.3 Satisfiability Checking

Consider the satisfiability problem with the constraint

    π ≝ 2^x ≤ 5 ∧ x² ≥ 5 ∧ x ≥ 0    (14)

where x ∈ R. We write x* ⊨ π to mean that π becomes a tautology by substituting its free variable x with x*. Then this satisfiability problem can be expressed as the search problem ({x* | x* ⊨ π}, R). Its representing function can be defined as (Fig. 6):

    R = λx. ((2^x ≤ 5) ? 0 : (2^x − 5)²)
          + ((x² ≥ 5) ? 0 : (x² − 5)²)
          + ((x ≥ 0) ? 0 : x²).    (15)

Then we can solve the satisfiability checking problem of the constraint π in Eq. (14) by minimizing its representing function R in Eq. (15). The ME procedure can easily locate the minimum points, which lie between √5 (≈ 2.24) and log₂ 5 (≈ 2.32). Each of the minimum points is necessarily a model of π following Thm. 4.5. The correctness of this approach follows:

Corollary 4.12. x* ⊨ π ⇔ x* minimizes R.

[Figure 6: Satisfiability checking via ME. The goal of this example is to find a model of π defined in Eq. (14). The curve depicts its representing function defined in Eq. (15).]

Remark 4.13. The representing function in Eq. (15) is defined on R. It is meant to illustrate the concept and allows us to ignore issues of floating-point inaccuracy. A more realistic representing function for satisfiability checking has been shown in a recent work [28], where the representing function is constructed based on the binary forms of floating-point numbers to avoid floating-point arithmetic in the first place.
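The following is a small sketch (ours) of the representing function R of Eq. (15), minimized with SciPy's basinhopping; any computed minimum point with R = 0 is a model of π (Cor. 4.12).

    import numpy as np
    from scipy.optimize import basinhopping

    def R(x):
        x = float(np.asarray(x).ravel()[0])
        t1 = 0.0 if 2 ** x <= 5 else (2 ** x - 5) ** 2
        t2 = 0.0 if x ** 2 >= 5 else (x ** 2 - 5) ** 2
        t3 = 0.0 if x >= 0 else x ** 2
        return t1 + t2 + t3

    res = basinhopping(R, x0=0.0, niter=200)
    # Expect a root in [sqrt(5), log2(5)], i.e., roughly [2.24, 2.32].
    print(res.x, res.fun)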

5. Evaluation

5.1 The ME-powered System CoverMe

We have implemented CoverMe, a proof-of-concept realization for branch coverage based testing. CoverMe has a modular design and can be adapted to other automated testing problems, such as path reachability and boundary value analysis (Sect. 4.4). This section presents the architecture of CoverMe (Fig. 7), in which we identify three layers and their corresponding use cases. Implementation details are given in Appendix B.

Client. This layer provides the program under test FOO and specifies the inputs to generate X. The search problem involved will be (X, dom(FOO)). X is usually implicit, e.g., specified via a path to trigger as in path reachability testing (Sect. 4.4.1), and checking x ∈ X has to be feasible following Def. 4.1. FOO is a piece of code in the LLVM intermediate representation [44] or in any language that can be transformed to it (such as code in Ada, the C/C++ language family, or Julia). The current CoverMe has been tested on C, and we require dom(FOO) ⊆ R^N, as the system backend outputs floating-point test data by default (see the ME Kernel layer below).

Researcher. This layer sets two parameters. One is the initial value of the representing function, denoted by r0. It is usually either 0 or 1 from our experience. For example, r0 is set to 0 in path reachability testing (Sect. 4.4.1), and to 1 in both coverage-based testing (Sect. 3.2) and boundary value analysis (Sect. 4.4.2). The other parameter is the assignment to inject before each conditional statement in FOO. In practice, the Researcher specifies this code segment in the pen procedure and injects pen, as shown in Sect. 3.

ME Kernel. This layer takes as inputs the program FOO provided by the Client, and r0 and pen set by the Researcher, and operates the three steps described in Sect. 3.2. It (1) uses Clang [2] to inject r and pen into FOO to construct FOO_I, (2) initializes r to r0, invokes FOO_I and returns r in FOO_R, and (3) minimizes FOO_R with MO. As mentioned in Sect. 3.3, CoverMe uses Basinhopping, an off-the-shelf implementation from the SciPy optimization package [7], as the MO backend. Basinhopping is launched from different starting points, as shown in Algo. 1, Lines 8-12. These starting points are randomly generated from the Hypothesis library [5].

[Figure 7: Architecture of CoverMe. Each layer is associated with typical use cases: the Client provides FOO and specifies X; the Researcher designs pen and sets r0; the ME Kernel constructs FOO_I, constructs FOO_R, and minimizes FOO_R. FOO: program under test; X: test inputs to generate; pen: procedure that updates r; r0: initial value of the representing function; FOO_I: instrumented program; FOO_R: representing function. The pipeline goes from FOO (in any LLVM-supported language) through an LLVM pass to FOO_I (.bc), which is linked with pen (.cpp) and loader (.cpp) into FOO_R (libr.so) and minimized by the MCMC minimization procedure (.py) via basinhopping(func, sp, n_iter, callback).]

5.2 Experimental Setup

This section discusses the benchmarks, the tools for comparison, the settings and hardware, and two evaluation objectives.

Benchmarks. We use the C math library Fdlibm 5.3 [3] as our benchmark. These programs are developed by Sun (now Oracle). They are real-world programs, rich in floating-point operations, and have been used in Matlab, Java, JavaScript and Android. Fdlibm includes 80 programs. Each has one or multiple entry functions. In total, Fdlibm has 92 entry functions. Among them, we exclude (1) 36 functions that do not have branches, (2) 11 functions involving non-floating-point input parameters, and (3) 5 static C functions. Our benchmark suite includes all remaining 40 functions in Fdlibm. For completeness, we list the untested functions and the reasons why they are not selected in Appendix A.

Table 2: Gcov metrics and explanations.

    Metric       Gcov message                    Description                                                        Note
    Line%        Lines executed                  Covered source lines over total source lines                       a.k.a. line or statement coverage
    Condition%   Branches executed               Covered conditional statements over total conditional statements
    Branch%      Branches taken at least once    Covered branches over total branches                               a.k.a. branch coverage
    Call%        Calls executed                  Covered calls over total calls
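Below is a sketch (ours, not CoverMe's actual driver) of how the ME Kernel's minimization step might be configured, using the tool options described in Sect. 5.2 (Basinhopping with Powell as LM, n_iter = 5, n_start = 500). Here `foo_r` is only a placeholder for the compiled FOO_R, and uniform random sampling stands in for the Hypothesis-generated starting points.

    import numpy as np
    from scipy.optimize import basinhopping

    def foo_r(x):                       # placeholder for the real FOO_R in libr.so
        return float(np.sum(x * x))

    def me_kernel(foo_r, dim, n_start=500, n_iter=5, lo=-1e3, hi=1e3):
        X = []
        for _ in range(n_start):
            x0 = np.random.uniform(lo, hi, size=dim)       # random starting point
            res = basinhopping(foo_r, x0, niter=n_iter,
                               minimizer_kwargs={"method": "Powell"})
            if res.fun == 0.0:          # FOO_R(x*) = 0: a new branch is saturated
                X.append(res.x)
                # In CoverMe, Saturate would be updated here, which in turn
                # changes the behavior of FOO_R (Remark 3.5).
        return X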
Compared Tools. We have considered tools publicly available to us, including Austin [43], Pex [68], and Klee [15] and its two variants, Klee-Multisolver [57] and Klee-FP [18]. We tried Klee-Multisolver [57] with Z3 [23] as the SMT backend but found that the expression language of Klee [15] does not support floating-point constraints. Besides, some common operations in our benchmark programs, such as pointer reference and dereference, type casting, external function calls, etc., are not supported by Z3 or any other backend solvers compatible with Klee-Multisolver. Klee-FP supports symbolic reasoning on the equivalence between floating-point values but does not support coverage testing [18]. Pex, unfortunately, can only run on .NET programs on Windows, whereas Fdlibm is in C and our testing platform is based on Unix.

To the best of our knowledge, Austin [43] is the only publicly available tool supporting coverage testing for Fdlibm. Austin combines symbolic execution and search-based heuristics, and has been thoroughly tested for branch coverage based testing on floating-point programs [41]. For empirical comparison, we have also implemented a random sampling tool, which samples inputs from the function's input domains using a standard pseudo-random number generator. We refer to this tool as Rand.

Settings and Hardware. CoverMe supports the following command line options while generating test inputs from floating-point programs: (1) the number of Monte-Carlo iterations n_iter, (2) the local optimization algorithm LM, and (3) the number of starting points n_start. These options correspond to the three input parameters in Algo. 1. We set n_iter = 5, n_start = 500, and LM = "powell", which refers to Powell's local optimization algorithm [59]. For both Austin and CoverMe, we use the default settings for running the benchmarks. All experiments were performed on a laptop with a 2.6 GHz Intel Core i7 and 4GB RAM running an Ubuntu 14.04 virtual machine.

Evaluation Objectives. There are two specific evaluation objectives. (1) Coverage: we use the standard GNU coverage tool Gcov [4] to analyze the coverage. Gcov generates four metrics for source code coverage analysis, listed as "Lines", "Conditions", "Branches" and "Calls" in Col. 1 of Tab. 2; Col. 2-4 give the corresponding Gcov report message, the descriptions and the general metrics for coverage testing. (2) Efficiency: we measure the wall time reported by the standard Unix command "time". The timeout limit for the compared tools is 48 hours.

5.3 Quantitative Results

This subsection presents two sets of experimental results. The first validates our approach by comparing it against random testing. The second compares CoverMe with Austin [43], an open-source tool that combines symbolic execution and search-based strategies.

5.3.1 CoverMe versus Random Testing

We have compared CoverMe with Rand by running them on the benchmarks described in Sect. 5.2. In Tab. 3, we sort all benchmark programs (Col. 1) and entry functions (Col. 2) by their names and give the numbers of source lines, branches and invoked functions (Col. 3-5). All coverage results are given by Gcov. It can be seen that the programs are rich in branches. The largest number of branches is 114 (ieee754_pow). Notably, some tested functions have more branches than lines. For example, ieee754_atan2 has 44 branches but only 39 lines; ceil has 30 branches but only 29 lines.

The times used by CoverMe are given in Col. 6 of Tab. 3. Observe that they vary considerably across the entry functions, from 0.1 to 22.1 seconds, but all are under half a minute. We find that the numbers of lines (Col. 3) and branches (Col. 4) are not correlated with the running times (Col. 6): CoverMe takes 1.1 seconds to run the function expm1 (with 42 branches and 56 lines) and 10.1 seconds to run the function floor (with 30 branches and 30 lines). This shows the potential for real-world program testing, since CoverMe may not be very sensitive to the number of lines or branches. We set the timeout limit of Rand to 600 seconds, since Rand does not terminate by itself and 600 seconds is already larger than the times spent by CoverMe by orders of magnitude.

Col. 7-14 in Tab. 3 show the coverage results of Rand and CoverMe. We use Gcov to compute four metrics: lines, condition, branches and calls (as defined in Tab. 2). All coverage results reported by Gcov have been included in Tab. 3 for completeness. The columns "Line" and "Branch" refer to the commonly used line/statement coverage and branch coverage, respectively. The average values of the coverage are shown in the last row of the table. All values in Col. 8, 10, 12 and 14 are larger than or equal to the corresponding values in Col. 7, 9, 11 and 13. It means that CoverMe achieves higher coverage than Rand for every benchmark program and every metric. For the line and condition coverages, CoverMe achieves full coverage for almost all benchmarks, with 97.0% line coverage and 98.2% condition coverage on average, whereas Rand achieves 54.2% and 61.2% for the two metrics. For the most important branch coverage (since it is a primary target of this paper), CoverMe achieves 100% coverage for 11 out of 40 benchmarks with an average of 90.8% coverage, while Rand does not achieve any 100% coverage and attains only 38.0% coverage on average.

Program | Entry function | #Line | #Branch | #Call | Time (s) | Line % (Rand, CoverMe) | Condition % (Rand, CoverMe) | Branch % (Rand, CoverMe) | Call % (Rand, CoverMe)
e_acos.c | ieee754_acos(double) | 33 | 12 | 0 | 7.8 | 18.2, 100.0 | 33.3, 100.0 | 16.7, 100.0 | n/a, n/a
e_acosh.c | ieee754_acosh(double) | 15 | 10 | 2 | 2.3 | 46.7, 93.3 | 60.0, 100.0 | 40.0, 90.0 | 50.0, 100.0
e_asin.c | ieee754_asin(double) | 31 | 14 | 0 | 8.0 | 19.4, 100.0 | 28.6, 100.0 | 14.3, 92.9 | n/a, n/a
e_atan2.c | ieee754_atan2(double, double) | 39 | 44 | 0 | 17.4 | 59.0, 79.5 | 54.6, 84.1 | 34.1, 63.6 | n/a, n/a
e_atanh.c | ieee754_atanh(double) | 15 | 12 | 0 | 8.1 | 40.0, 100.0 | 16.7, 100.0 | 8.8, 91.7 | n/a, n/a
e_cosh.c | ieee754_cosh(double) | 20 | 16 | 3 | 8.2 | 50.0, 100.0 | 75.0, 100.0 | 37.5, 93.8 | 0.0, 100.0
e_exp.c | ieee754_exp(double) | 31 | 24 | 0 | 8.4 | 25.8, 96.8 | 33.3, 100.0 | 20.8, 96.7 | n/a, n/a
e_fmod.c | ieee754_fmod(double, double) | 70 | 60 | 0 | 22.1 | 54.3, 77.1 | 66.7, 80.0 | 48.3, 70.0 | n/a, n/a
e_hypot.c | ieee754_hypot(double, double) | 50 | 22 | 0 | 15.6 | 66.0, 100.0 | 63.6, 100.0 | 40.9, 90.9 | n/a, n/a
e_j0.c | ieee754_j0(double) | 29 | 18 | 2 | 9.0 | 55.2, 100.0 | 55.6, 100.0 | 33.3, 94.4 | 0.0, 100.0
e_j0.c | ieee754_y0(double) | 26 | 16 | 5 | 0.7 | 69.2, 100.0 | 87.5, 100.0 | 56.3, 100.0 | 0.0, 100.0
e_j1.c | ieee754_j1(double) | 26 | 16 | 2 | 10.2 | 65.4, 100.0 | 75.0, 100.0 | 50.0, 93.8 | 0.0, 100.0
e_j1.c | ieee754_y1(double) | 26 | 16 | 4 | 0.7 | 69.2, 100.0 | 87.5, 100.0 | 56.3, 100.0 | 0.0, 100.0
e_log.c | ieee754_log(double) | 39 | 22 | 0 | 3.4 | 87.7, 100.0 | 90.9, 100.0 | 59.1, 90.9 | n/a, n/a
e_log10.c | ieee754_log10(double) | 18 | 8 | 1 | 1.1 | 83.3, 100.0 | 100.0, 100.0 | 62.5, 87.5 | 100.0, 100.0
e_pow.c | ieee754_pow(double, double) | 139 | 114 | 0 | 18.8 | 15.8, 92.7 | 28.1, 92.7 | 15.8, 81.6 | n/a, n/a
e_rem_pio2.c | ieee754_rem_pio2(double, double*) | 64 | 30 | 1 | 1.1 | 29.7, 92.2 | 46.7, 100.0 | 33.3, 93.3 | 100.0, 100.0
e_remainder.c | ieee754_remainder(double, double) | 27 | 22 | 1 | 2.2 | 77.8, 100.0 | 72.7, 100.0 | 45.5, 100.0 | 100.0, 100.0
e_scalb.c | ieee754_scalb(double, double) | 9 | 14 | 0 | 8.5 | 66.7, 100.0 | 85.7, 100.0 | 50.0, 92.9 | n/a, n/a
e_sinh.c | ieee754_sinh(double) | 19 | 20 | 2 | 0.6 | 57.9, 100.0 | 60.0, 100.0 | 35.0, 95.0 | 0.0, 100.0
e_sqrt.c | ieee754_sqrt(double) | 68 | 46 | 0 | 15.6 | 85.3, 94.1 | 87.0, 92.7 | 69.6, 82.6 | n/a, n/a
k_cos.c | kernel_cos(double, double) | 15 | 8 | 0 | 15.4 | 73.3, 100.0 | 75.0, 100.0 | 37.5, 87.5 | n/a, n/a
s_asinh.c | asinh(double) | 14 | 12 | 2 | 8.4 | 57.1, 100.0 | 66.7, 100.0 | 41.7, 91.7 | 50.0, 100.0
s_atan.c | atan(double) | 28 | 26 | 0 | 8.5 | 25.0, 96.4 | 30.8, 100.0 | 19.2, 88.5 | n/a, n/a
s_cbrt.c | cbrt(double) | 24 | 6 | 0 | 0.4 | 87.5, 91.7 | 100.0, 100.0 | 50.0, 83.3 | n/a, n/a
s_ceil.c | ceil(double) | 29 | 30 | 0 | 8.8 | 27.6, 100.0 | 20.0, 100.0 | 10.0, 83.3 | n/a, n/a
s_cos.c | cos(double) | 12 | 8 | 6 | 0.4 | 100.0, 100.0 | 100.0, 100.0 | 75.0, 100.0 | 83.3, 100.0
s_erf.c | erf(double) | 38 | 20 | 2 | 9.0 | 21.1, 100.0 | 50.0, 100.0 | 30.0, 100.0 | 0.0, 100.0
s_erf.c | erfc(double) | 43 | 24 | 2 | 0.1 | 18.6, 100.0 | 41.7, 100.0 | 25.0, 100.0 | 0.0, 100.0
s_expm1.c | expm1(double) | 56 | 42 | 0 | 1.1 | 21.4, 100.0 | 33.3, 100.0 | 21.4, 97.6 | n/a, n/a
s_floor.c | floor(double) | 30 | 30 | 0 | 10.1 | 26.7, 100.0 | 20.0, 100.0 | 10.0, 83.3 | n/a, n/a
s_ilogb.c | ilogb(double) | 12 | 12 | 0 | 8.3 | 33.3, 91.7 | 33.3, 83.3 | 16.7, 75.0 | n/a, n/a
s_log1p.c | log1p(double) | 46 | 36 | 0 | 9.9 | 71.7, 100.0 | 61.1, 100.0 | 38.9, 88.9 | n/a, n/a
s_logb.c | logb(double) | 8 | 6 | 0 | 0.3 | 87.5, 87.5 | 100.0, 100.0 | 50.0, 83.3 | n/a, n/a
s_modf.c | modf(double, double*) | 32 | 10 | 0 | 3.5 | 31.2, 100.0 | 46.7, 100.0 | 33.3, 100.0 | n/a, n/a
s_nextafter.c | nextafter(double, double) | 36 | 44 | 0 | 17.5 | 72.2, 88.9 | 81.8, 95.5 | 59.1, 79.6 | n/a, n/a
s_rint.c | rint(double) | 34 | 20 | 0 | 3.0 | 26.5, 100.0 | 30.0, 100.0 | 15.0, 90.0 | n/a, n/a
s_sin.c | sin(double) | 12 | 8 | 6 | 0.3 | 100.0, 100.0 | 100.0, 100.0 | 75.0, 100.0 | 83.3, 100.0
s_tan.c | tan(double) | 6 | 4 | 3 | 0.3 | 100.0, 100.0 | 100.0, 100.0 | 50.0, 100.0 | 66.7, 100.0
s_tanh.c | tanh(double) | 16 | 12 | 0 | 0.7 | 43.8, 100.0 | 50.0, 100.0 | 33.3, 100.0 | n/a, n/a
MEAN | | | | | 6.9 | 54.2, 97.0 | 61.2, 98.2 | 38.0, 90.8 | 39.6, 100.0
5.3.2 CoverMe versus Austin

Tab.4 reports the testing results of Austin and CoverMe. It uses the same set of benchmarks as Tab.3 (Col. 1-2). We use the time (Col. 3-4) and the branch coverage metric (Col. 5-6) to evaluate efficiency and coverage. We choose branch coverage instead of the other three Gcov metrics because branch coverage is the major concern in this paper. Besides, Gcov needs access to the generated test inputs to report coverage, but there is currently no viable way to access the test inputs generated by Austin. Unlike for Rand, the branch coverage percentage of Austin (Col. 5) is therefore provided by Austin itself rather than by Gcov.

Austin shows large performance variances over the benchmarks, from 667.1 seconds (the program sin) to hours. As shown in the last row of Tab.4, Austin needs 6058.4 seconds on average for the testing. It should be mentioned that this average does not include the benchmarks on which Austin crashes or times out. Compared with Austin, CoverMe is faster (Tab.4, Col. 4), with 6.9 seconds on average (these results are also shown in Tab.3, Col. 6). CoverMe also achieves a much higher branch coverage (90.8%) than Austin (42.8%). Comparing across Tab.4 and Tab.3, Austin provides, on average, slightly better branch coverage (42.8%) than Rand (38.0%).

Col. 7-8 are the improvement metrics of CoverMe against Austin. The speedup (Col. 7) is calculated as the ratio of the time spent by Austin to the time spent by CoverMe. The coverage improvement (Col. 8) is calculated as the difference between the branch coverage of CoverMe and that of Austin. We observe that CoverMe provides a 3,868X speedup and a 48.9% coverage improvement on average.

Remark 5.1. Our evaluation shows that CoverMe has achieved high code coverage on most tested programs. One may wonder whether the generated inputs have triggered any latent bugs. Note that when no specifications are given, program crashes have been frequently used as an oracle for finding bugs in integer programs. Floating-point programs, on the other hand, can silently produce wrong results without crashing. Thus, program crashes cannot be used as a simple, readily available oracle as they are for integer programs. Our experiments, therefore, have focused on assessing the efficiency of ME in solving the problem defined in Def. 3.1 or Lem. 3.3, and do not evaluate its effectiveness in finding bugs, which is orthogonal and interesting future work.

Table 4: Comparison of Austin [43] and CoverMe. The benchmark programs are taken from the Fdlibm library [3].

Program | Entry function | Austin time (s) | CoverMe time (s) | Austin branch cov. (%) | CoverMe branch cov. (%) | Speedup | Coverage improvement (%)
e_acos.c | ieee754_acos(double) | 6058.8 | 7.8 | 16.7 | 100.0 | 776.4 | 83.3
e_acosh.c | ieee754_acosh(double) | 2016.4 | 2.3 | 40.0 | 90.0 | 887.5 | 50.0
e_asin.c | ieee754_asin(double) | 6935.6 | 8.0 | 14.3 | 92.9 | 867.0 | 78.6
e_atan2.c | ieee754_atan2(double, double) | 14456.0 | 17.4 | 34.1 | 63.6 | 831.2 | 29.6
e_atanh.c | ieee754_atanh(double) | 4033.8 | 8.1 | 8.3 | 91.7 | 495.4 | 83.3
e_cosh.c | ieee754_cosh(double) | 27334.5 | 8.2 | 37.5 | 93.8 | 3327.7 | 56.3
e_exp.c | ieee754_exp(double) | 2952.1 | 8.4 | 75.0 | 96.7 | 349.7 | 21.7
e_fmod.c | ieee754_fmod(double, double) | timeout | 22.1 | n/a | 70.0 | n/a | n/a
e_hypot.c | ieee754_hypot(double, double) | 5456.8 | 15.6 | 36.4 | 90.9 | 350.9 | 54.6
e_j0.c | ieee754_j0(double) | 6973.0 | 9.0 | 33.3 | 94.4 | 776.5 | 61.1
e_j0.c | ieee754_y0(double) | 5838.3 | 0.7 | 56.3 | 100.0 | 8243.5 | 43.8
e_j1.c | ieee754_j1(double) | 4131.6 | 10.2 | 50.0 | 93.8 | 403.9 | 43.8
e_j1.c | ieee754_y1(double) | 5701.7 | 0.7 | 56.3 | 100.0 | 8411.0 | 43.8
e_log.c | ieee754_log(double) | 5109.0 | 3.4 | 59.1 | 90.9 | 1481.9 | 31.8
e_log10.c | ieee754_log10(double) | 1175.5 | 1.1 | 62.5 | 87.5 | 1061.3 | 25.0
e_pow.c | ieee754_pow(double, double) | timeout | 18.8 | n/a | 81.6 | n/a | n/a
e_rem_pio2.c | ieee754_rem_pio2(double, double*) | timeout | 1.1 | n/a | 93.3 | n/a | n/a
e_remainder.c | ieee754_remainder(double, double) | 4629.0 | 2.2 | 45.5 | 100.0 | 2146.5 | 54.6
e_scalb.c | ieee754_scalb(double, double) | 1989.8 | 8.5 | 57.1 | 92.9 | 233.8 | 35.7
e_sinh.c | ieee754_sinh(double) | 5534.8 | 0.6 | 35.0 | 95.0 | 9695.9 | 60.0
e_sqrt.c | ieee754_sqrt(double) | crash | 15.6 | n/a | 82.6 | n/a | n/a
k_cos.c | kernel_cos(double, double) | 1885.1 | 15.4 | 37.5 | 87.5 | 122.6 | 50.0
s_asinh.c | asinh(double) | 2439.1 | 8.4 | 41.7 | 91.7 | 290.8 | 50.0
s_atan.c | atan(double) | 7584.7 | 8.5 | 26.9 | 88.5 | 890.6 | 61.6
s_cbrt.c | cbrt(double) | 3583.4 | 0.4 | 50.0 | 83.3 | 9109.4 | 33.3
s_ceil.c | ceil(double) | 7166.3 | 8.8 | 36.7 | 83.3 | 812.3 | 46.7
s_cos.c | cos(double) | 669.4 | 0.4 | 75.0 | 100.0 | 1601.6 | 25.0
s_erf.c | erf(double) | 28419.8 | 9.0 | 30.0 | 100.0 | 3166.8 | 70.0
s_erf.c | erfc(double) | 6611.8 | 0.1 | 25.0 | 100.0 | 62020.9 | 75.0
s_expm1.c | expm1(double) | timeout | 1.1 | n/a | 97.6 | n/a | n/a
s_floor.c | floor(double) | 7620.6 | 10.1 | 36.7 | 83.3 | 757.8 | 46.7
s_ilogb.c | ilogb(double) | 3654.7 | 8.3 | 16.7 | 75.0 | 438.7 | 58.3
s_log1p.c | log1p(double) | 11913.7 | 9.9 | 61.1 | 88.9 | 1205.7 | 27.8
s_logb.c | logb(double) | 1064.4 | 0.3 | 50.0 | 83.3 | 3131.8 | 33.3
s_modf.c | modf(double, double*) | 1795.1 | 3.5 | 50.0 | 100.0 | 507.0 | 50.0
s_nextafter.c | nextafter(double, double) | 7777.3 | 17.5 | 50.0 | 79.6 | 445.4 | 29.6
s_rint.c | rint(double) | 5355.8 | 3.0 | 35.0 | 90.0 | 1808.3 | 55.0
s_sin.c | sin(double) | 667.1 | 0.3 | 75.0 | 100.0 | 1951.4 | 25.0
s_tan.c | tan(double) | 704.2 | 0.3 | 50.0 | 100.0 | 2701.9 | 50.0
s_tanh.c | tanh(double) | 2805.5 | 0.7 | 33.3 | 100.0 | 4075.0 | 66.7
MEAN | | 6058.4 | 6.9 | 42.8 | 90.8 | 3868.0 | 48.9
6. Incompleteness

The development of ME toward a general solution to the search problem, as illustrated in this presentation, has been grounded in the concepts of the representing function and mathematical optimization. While our theory guarantees that the ME procedure solves the search problem correctly (Cor. 4.8), the practical incompleteness (Remark 4.9) remains a challenge in applying ME to real-world automated testing problems. Below we discuss three sources of practical incompleteness and how they can be mitigated.

Well-behaved Representing Functions. The well-behavedness of the representing function is essential for ME to work. Well-behavedness is a common term in the MO literature describing the fact that the MO backend under discussion can efficiently handle the objective function. A common requirement, for example, is that the objective function (or the representing function for ME) should be smooth to some degree. Consider again the path reachability problem in Sect. 4.4.1. If FOO_I is instrumented as in Fig.8 (left), the corresponding FOO_R also satisfies the conditions in Def. 4.3, but this FOO_R is discontinuous at {−3, 1, 2} (Fig.8, right): the representing function changes its value disruptively at its minimum points, and these minimum points cannot be calculated by MO tools. The example also shows that the conditions in Def. 4.3(a-c) can be insufficient for ruling out ill-behaved representing functions. However, it is unclear to us whether there exists a set of conditions that are both easy to verify and sufficient for excluding ill-behaved functions.

[Figure 8: An example of an ill-behaved representing function for the path reachability problem in Sect. 4.4.1. The left part is the instrumented FOO_I, reproduced below; the right part is the graph of the representing function FOO_R (plot omitted).]

    void FOO_I(double x){
        r = r + (x <= 1)? 0 : 1;
    l0: if (x <= 1) x++;
        double y = square(x);
        r = r + (y == 4)? 0 : 1;
    l1: if (y == 4) ...;
    }

Reliance on Program Execution. We have seen that the ME procedure generates test inputs of a program FOO by minimizing, and therefore running, another program FOO_R. This execution-reliant feature has both benefits and risks. It allows us to generate test inputs of complex programs without analyzing their semantics; it also means that ME can give different results with different compilers or machines. In our experiments, we realize that the feature becomes disadvantageous if inaccuracy occurs in the program execution. Consider the satisfiability testing problem with the constraint π = x ≥ 1E-20. Suppose we use the representing function R_π(x) = x ≥ 1E-20 ? 0 : (x − 1E-20)² and implement R_π as a double-precision floating-point program. We can verify that R_π is a representing function in the sense of Def. 4.3, but it evaluates to 0 not only when x ≥ 1E-20 but also when 0 ≤ x < 1E-20, because the smallest machine-representable double is on the order of 1E-324 [32]. The ME procedure may then return x∗ = 0 in step M2 and return "not found" in step M3, because x∗ = 0 is not a model of π.

The issue described above may be mitigated using arbitrary-precision arithmetic [10]. Another option, as demonstrated in the XSat solver [28], is to construct the representing function based on ULP, the unit in the last place value [32], which avoids floating-point inaccuracy by operating on the binary representation of floating-point numbers.

Efficient MO Backend. The inherent intractability of global optimization is another source of the incompleteness of ME. Even if the representing function is well-behaved and free from computational inaccuracy, the MO backend may still return sub-optimal or local minimum points due to weak implementations, high-dimensional or intrinsically difficult problems, etc. That said, MO is still in active development, and our ME approach has the flexibility to leverage its state of the art, such as the Basinhopping algorithm used in our experiments.
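To make the execution-inaccuracy risk discussed under "Reliance on Program Execution" concrete, the sketch below shows how a double-precision squared penalty can silently underflow to zero. The function R and the constant c here are illustrative only (they are not the exact R_π or threshold above); the point is that once the true penalty drops below the smallest subnormal double (about 4.9E-324), the computed value is exactly 0 and a minimizer can report a point that is not actually a model of the constraint.

    #include <stdio.h>

    /* Illustrative representing function for a constraint "x >= c",
       implemented in double precision. */
    static double R(double x, double c) {
        double d = x - c;
        return (x >= c) ? 0.0 : d * d;   /* squared penalty below c */
    }

    int main(void) {
        const double c = 1e-200;
        double x = 0.0;                  /* x < c, so the constraint is violated */
        /* (0 - 1e-200)^2 = 1e-400 is below the smallest subnormal double,
           so the multiplication underflows to exactly 0.0.                */
        printf("R(0) = %g\n", R(x, c)); /* prints 0 */
        return 0;
    }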

7. Related Work

Two Paradigms for Program Correctness. Constructing an axiomatic system [27, 35] is of central importance in ensuring program correctness. Let FOO be a program. If we write [|FOO|] for the set of FOO's possible execution paths (a.k.a. trace semantics [48]) and E for the unsafe paths, then the correctness of FOO is to ensure

    [|FOO|] ∩ E = ∅.    (16)

The problem is known to be undecidable, and approximate solutions have been extensively studied. One is abstract interpretation [20, 21], which systematically constructs [|FOO|]♯ ⊇ [|FOO|] and proves Eq. (16) by proving [|FOO|]♯ ∩ E = ∅.² Another category of approximation is automated testing. It attempts to disprove Eq. (16) by generating inputs x such that [|FOO|](x) ∈ E, where [|FOO|](x) denotes the path of executing FOO with input x.

² The relationship between the abstract and concrete semantics is more commonly formalized with a Galois connection [20]; we use [|FOO|]♯ ⊇ [|FOO|] as a simplified case.

Automated Testing. Many automated testing approaches adopt symbolic execution [12, 14, 17, 36]. They repeatedly select a target path τ and gather the conjunction of logic conditions along the path, denoted by Φ_τ (called the path condition [36]). They then use SMT solvers to calculate a model of Φ_τ. These approaches are both sound and complete in the sense of FOO(x) ⇓ wrong ⇔ x |= Φ_τ. Symbolic execution and its variants have seen much progress since the breakthrough of SAT/SMT [55, 65], but still have difficulties in handling numerical code. There are two well-known efficiency issues. First, path explosion makes it time-consuming to select a large number of τ and gather Φ_τ. Second, for numerical code, each gathered Φ_τ involves numerical constraints that can quickly go beyond the capabilities of modern SAT/SMT solvers.

A large number of search-based heuristics have been proposed to mitigate these issues of symbolic execution. They use fitness functions to capture path conditions and use numerical optimization to minimize/maximize the fitness functions. A well-developed search-based testing tool is Austin [43], which combines symbolic execution and search-based heuristics. Austin has been compared with Cute [64], a dynamic symbolic execution tool, and is shown to be significantly more efficient. These search-based solutions, however, as McMinn points out in his survey paper [49], "are not standalone algorithms in themselves, but rather strategies ready for adaption to specific problems."

Mathematical Optimization in Testing and Verification. In the seminal work of Miller et al. [52], optimization methods are already used in generating test data for numerical code. These methods were taken up in the 1990s by Korel [26, 38] and have found their way into many mature implementations [42, 66, 68]. Mathematical optimization has also been employed in program verification [19, 61]. Liberti et al. [33, 47] have proposed to calculate the invariant of an integer program as the mathematical optimization problem of mixed integer nonlinear programming (MINLP) [67]. Recently, a floating-point satisfiability solver, XSat [28], has been developed. It constructs a floating-point program R_π from a formula π in conjunctive normal form, and then decides the satisfiability of π by checking the sign of the minimum of R_π. This decision procedure is an application of Thm. 4.5(a), and it calculates the models of π using Thm. 4.5(b). Compared with XSat, our work lays out the theoretical foundation for a precise, systematic approach to testing numerical code; XSat is an instance of our proposed ME procedure (see Example 4.2(b) and Sect. 4.4.3).

8. Conclusion

This paper introduces Mathematical Execution (ME), a new, unified approach for testing numerical code. Our insight is to (1) use a representing function to precisely and uniformly capture the desired testing objective, and (2) use mathematical optimization to direct input and program space exploration. What ME, and most importantly the accompanying representing function, provides is an approach that can handle a variety of automated testing problems, including coverage-based testing, path reachability testing, boundary value analysis, and satisfiability checking. We have implemented a branch coverage testing tool as a proof-of-concept demonstration of ME's potential. Evaluated on a collection of Sun's math library (used in Java, JavaScript, Matlab, and Android), our tool CoverMe achieves substantially better coverage results (near-optimal coverage on all tested programs) when compared to random testing and Austin (a coverage-based testing tool that combines symbolic execution and search-based strategies).

References

[1] American fuzzy lop. http://lcamtuf.coredump.cx/afl/. Accessed: 25 June, 2016.
[2] Clang: A C language family frontend for LLVM. http://clang.llvm.org/. Accessed: 25 June 2016.
[3] Freely distributed math library, Fdlibm. http://www.netlib.org/fdlibm/. Accessed: 25 June, 2016.
[4] GNU compiler collection tool, Gcov. https://gcc.gnu.org/onlinedocs/gcc/Gcov.html/. Accessed: 25 June 2016.
[5] Python testing library, hypothesis. https://github.com/DRMacIver/hypothesis. Accessed: 25 June 2016.
[6] LLVM: Pass class reference. http://llvm.org/docs/doxygen/html/classllvm_1_1Pass.html. Accessed: 25 June 2016.
[7] Scipy optimization package. http://docs.scipy.org/doc/scipy-dev/reference/optimize.html. Accessed: 25 June 2016.
[8] C. Andrieu, N. de Freitas, A. Doucet, and M. I. Jordan. An introduction to MCMC for machine learning. Machine Learning, 50(1-2):5–43, 2003.
[9] A. Baars, M. Harman, Y. Hassoun, K. Lakhotia, P. McMinn, P. Tonella, and T. Vos. Symbolic search-based testing. In Proceedings of the 26th IEEE/ACM International Conference on Automated Software Engineering, ASE '11, pages 53–62, Washington, DC, USA, 2011.
[10] D. H. Bailey. High-precision floating-point arithmetic in scientific computation. Computing in Science and Engg., 7(3):54–61, May 2005.
[11] D. L. Bird and C. U. Munoz. Automatic generation of random self-checking test cases. IBM Syst. J., 22(3):229–245, Sept. 1983.
[12] R. S. Boyer, B. Elspas, and K. N. Levitt. A formal system for testing and debugging programs by symbolic execution. In Proceedings of the International Conference on Reliable Software, pages 234–245, New York, NY, USA, 1975.
[13] F. P. Brooks, Jr. The Mythical Man-month (Anniversary Ed.). Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA, 1995.
[14] C. Cadar and K. Sen. Symbolic execution for software testing: Three decades later. Commun. ACM, 56(2):82–90, 2013.
[15] C. Cadar, D. Dunbar, and D. Engler. Klee: Unassisted and automatic generation of high-coverage tests for complex systems programs. In Proceedings of the 8th USENIX Conference on Operating Systems Design and Implementation, OSDI'08, pages 209–224, Berkeley, CA, USA, 2008.
[16] S. Chib and E. Greenberg. Understanding the Metropolis-Hastings algorithm. The American Statistician, 49(4):327–335, Nov. 1995.
[17] L. A. Clarke. A system to generate test data and symbolically execute programs. IEEE Trans. Softw. Eng., 2(3):215–222, May 1976.
[18] P. Collingbourne, C. Cadar, and P. H. Kelly. Symbolic crosschecking of floating-point and SIMD code. In Proceedings of the Sixth Conference on Computer Systems, pages 315–328, 2011.
[19] P. Cousot. Proving program invariance and termination by parametric abstraction, Lagrangian relaxation and semidefinite programming. In International Workshop on Verification, Model Checking, and Abstract Interpretation, pages 1–24. Springer, 2005.
[20] P. Cousot and R. Cousot. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In Proceedings of the 4th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, POPL'77, pages 238–252, 1977.
[21] P. Cousot and R. Cousot. Systematic design of program analysis frameworks. In Proceedings of the 6th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages, POPL'79, pages 269–282, New York, NY, USA, 1979.
[22] M. Davis. The undecidable: Basic papers on undecidable propositions, unsolvable problems and computable functions. Dover Publications, Incorporated, 2004.
[23] L. De Moura and N. Bjørner. Z3: An efficient SMT solver. In Proceedings of the Theory and Practice of Software, 14th International Conference on Tools and Algorithms for the Construction and Analysis of Systems, TACAS'08/ETAPS'08, pages 337–340, Berlin, Heidelberg, 2008.
[24] E. W. Dijkstra. Notes on structured programming. Apr. 1970.
[25] J. Eckstein and D. P. Bertsekas. On the Douglas–Rachford splitting method and the proximal point algorithm for maximal monotone operators. Mathematical Programming, 55(1-3):293–318, 1992.
[26] R. Ferguson and B. Korel. The chaining approach for software test data generation. ACM Trans. Softw. Eng. Methodol., 5(1):63–86, Jan. 1996.
[27] R. W. Floyd. Assigning meanings to programs. Mathematical Aspects of Computer Science, 19(19-32):1, 1967.
[28] Z. Fu and Z. Su. XSat: A fast floating-point satisfiability solver. In Proceedings of the 28th International Conference on Computer Aided Verification, CAV'16, Toronto, Ontario, Canada, 2016.
[29] Z. Fu, Z. Bai, and Z. Su. Automated backward error analysis for numerical code. In Proceedings of the ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications, OOPSLA'15, pages 639–654, Pittsburgh, PA, USA, 2015.
[30] P. Godefroid, N. Klarlund, and K. Sen. DART: Directed automated random testing. In Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation, Chicago, IL, USA, pages 213–223, 2005.
[31] K. Gödel. Über formal unentscheidbare Sätze der Principia Mathematica und verwandter Systeme I. Monatshefte für Mathematik und Physik, 38(1):173–198, 1931.
[32] D. Goldberg. What every computer scientist should know about floating-point arithmetic. ACM Computing Surveys (CSUR), 23(1):5–48, 1991.
[33] E. Goubault, S. Le Roux, J. Leconte, L. Liberti, and F. Marinelli. Static analysis by abstract interpretation: A mathematical programming approach. Electronic Notes in Theoretical Computer Science, 267(1):73–87, 2010.
[34] S. Heule, M. Sridharan, and S. Chandra. Mimic: Computing models for opaque code. In Proceedings of the 10th Joint Meeting on Foundations of Software Engineering, pages 710–720, 2015.
[35] C. A. R. Hoare. An axiomatic basis for computer programming. Commun. ACM, 12(10):576–580, Oct. 1969.
[36] J. C. King. Symbolic execution and program testing. Commun. ACM, 19(7):385–394, July 1976.
[37] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 1983.
[38] B. Korel. Automated software test data generation. IEEE Trans. Softw. Eng., 16(8):870–879, Aug. 1990.
[39] N. Kosmatov, B. Legeard, F. Peureux, and M. Utting. Boundary coverage criteria for test generation from formal models. In Proceedings of the 15th International Symposium on Software Reliability Engineering, ISSRE '04, pages 139–150, Washington, DC, USA, 2004.
[40] J. R. Koza. Genetic programming: On the programming of computers by means of natural selection, volume 1. MIT Press, 1992.
[41] K. Lakhotia, P. McMinn, and M. Harman. An empirical investigation into branch coverage for C programs using CUTE and AUSTIN. Journal of Systems and Software, 83(12):2379–2391, 2010.
[42] K. Lakhotia, N. Tillmann, M. Harman, and J. De Halleux. FloPSy: Search-based floating point constraint solving for symbolic execution. In Proceedings of the 22nd IFIP WG 6.1 International Conference on Testing Software and Systems, ICTSS'10, pages 142–157, Berlin, Heidelberg, 2010.
[43] K. Lakhotia, M. Harman, and H. Gross. AUSTIN: An open source tool for search based software testing of C programs. Information and Software Technology, 55(1):112–125, 2013.
[44] C. Lattner and V. Adve. LLVM: A compilation framework for lifelong program analysis & transformation. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-directed and Runtime Optimization, CGO '04, pages 75–86, Washington, DC, USA, 2004.
[45] D. Leitner, C. Chakravarty, R. Hinde, and D. Wales. Global optimization by basin-hopping and the lowest energy structures of Lennard-Jones clusters containing up to 110 atoms. Phys. Rev. E, 56:363, 1997.
[46] Z. Li and H. A. Scheraga. Monte Carlo minimization approach to the multiple-minima problem in protein folding. Proceedings of the National Academy of Sciences of the United States of America, 84(19):6611–6615, 1987.
[47] L. Liberti, S. Le Roux, J. Leconte, and F. Marinelli. Mathematical programming based debugging. Electronic Notes in Discrete Mathematics, 36:1311–1318, 2010.
[48] L. Mauborgne and X. Rival. Trace partitioning in abstract interpretation based static analyzers. In Programming Languages and Systems, 14th European Symposium on Programming, ESOP 2005, pages 5–20, Edinburgh, UK, 2005.
[49] P. McMinn. Search-based software test data generation: A survey. Softw. Test. Verif. Reliab., 14(2):105–156, June 2004.
[50] A. M. Memon, I. Banerjee, and A. Nagarajan. What test oracle should I use for effective GUI testing? In 18th IEEE International Conference on Automated Software Engineering, ASE 2003, pages 164–173, Montreal, Canada, 2003.
[51] B. Meyer. Seven principles of software testing. IEEE Computer, 41(8):99–101, 2008.
[52] W. Miller and D. L. Spooner. Automatic generation of floating-point test data. IEEE Trans. Softw. Eng., 2(3):223–226, May 1976.
[53] M. Minoux. Mathematical programming: Theory and algorithms. Wiley, New York, 1986.
[54] G. J. Myers. The art of software testing. pages I–XV, 1–234, 2004.
[55] R. Nieuwenhuis, A. Oliveras, and C. Tinelli. Solving SAT and SAT modulo theories: From an abstract Davis–Putnam–Logemann–Loveland procedure to DPLL(T). J. ACM, 53(6):937–977, 2006.
[56] J. Nocedal and S. J. Wright. Numerical optimization. Springer Science & Business Media, 2006.
[57] H. Palikareva and C. Cadar. Multi-solver support in symbolic execution. In Proceedings of the 25th International Conference on Computer Aided Verification, CAV'13, pages 53–68, Berlin, Heidelberg, 2013.
[58] R. Pandita, T. Xie, N. Tillmann, and J. de Halleux. Guided test generation for coverage criteria. In 2010 IEEE International Conference on Software Maintenance, ICSM'10, pages 1–10, 2010.
[59] W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery. Numerical recipes 3rd edition: The art of scientific computing. Cambridge University Press, New York, NY, USA, 2007.
[60] H. Robbins and S. Monro. A stochastic approximation method. The Annals of Mathematical Statistics, pages 400–407, 1951.
[61] M. Roozbehani, A. Megretski, and E. Feron. Convex optimization proves software correctness. In Proceedings of the American Control Conference, pages 1395–1400, 2005.
[62] W. Rudin. Principles of mathematical analysis. McGraw-Hill, New York, 3rd edition, 1976.
[63] E. Schkufza, R. Sharma, and A. Aiken. Stochastic optimization of floating-point programs with tunable precision. In Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation, PLDI '14, pages 53–64, New York, NY, USA, 2014.
[64] K. Sen, D. Marinov, and G. Agha. CUTE: A concolic unit testing engine for C. In Proceedings of the 10th European Software Engineering Conference Held Jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering, ESEC/FSE-13, pages 263–272, New York, NY, USA, 2005.
[65] J. P. M. Silva and K. A. Sakallah. GRASP: A new search algorithm for satisfiability. In Proceedings of the 1996 IEEE/ACM International Conference on Computer-Aided Design, ICCAD '96, pages 220–227, Washington, DC, USA, 1996.
[66] M. Souza, M. Borges, M. d'Amorim, and C. S. Păsăreanu. CORAL: Solving complex constraints for symbolic pathfinder. In NASA Formal Methods Symposium, pages 359–374, 2011.
[67] M. Tawarmalani and N. V. Sahinidis. Global optimization of mixed-integer nonlinear programs: A theoretical and computational study. Mathematical Programming, 99(3):563–591, 2004.
[68] N. Tillmann and J. De Halleux. Pex: White box test generation for .NET. In Proceedings of the 2nd International Conference on Tests and Proofs, TAP'08, pages 134–153, Berlin, Heidelberg, 2008.
[69] E. J. Weyuker. On testing non-testable programs. Comput. J., 25(4):465–470, 1982.
[70] L. J. White and E. I. Cohen. A domain strategy for computer program testing. IEEE Trans. Softw. Eng., 6(3):247–257, May 1980.
[71] G. Zoutendijk. Mathematical programming methods. North-Holland, Amsterdam, 1976.

A. Untested Programs in Fdlibm

The programs from the freely distributed math library Fdlibm 5.3 [3] are used as our benchmarks. Tab.5 lists all untested programs and functions and explains why they are not selected. Three types of functions are excluded from our evaluation: (1) functions without any branch, (2) functions involving non-floating-point input parameters, and (3) static C functions.

B. Implementation Details

As a proof-of-concept demonstration, we have implemented Algo.1 in the tool CoverMe. This section presents the implementation and technical details omitted from the main body of the paper.

B.1 Frontend of CoverMe

The frontend implements Step 1 and Step 2 of Algo.1. CoverMe compiles the program under test FOO to LLVM IR with Clang [2]. Then it uses an LLVM pass [6] to inject assignments in FOO. The program under test can be in any LLVM-supported language, e.g., Ada, the C/C++ language family, or Julia; our current implementation accepts C code only.

Fig.9 illustrates FOO as a function of signature type_t FOO(type_t1 x1, type_t2 x2, ...). The return type (output) of the function, type_t, can be any type supported by C, whereas the types of the input parameters, type_t1, type_t2, ..., are restricted to double or double*. We have explained the signature of pen in Def. 3.7. Note that CoverMe does not inject pen itself into FOO, but instead injects assignments that invoke pen; we implement pen in a separate C++ file. The frontend also links FOO_I and FOO_R with a simple program loader to generate a shared object file libr.so, which is the outcome of the frontend: it stores the representing function in the form of a shared object file (.so file).

B.2 Backend of CoverMe

The backend implements Step 3 of Algo.1. It invokes the representing function via libr.so. The kernel of the backend is an external MCMC engine. It uses the off-the-shelf implementation known as Basinhopping from the Scipy optimization package [7]. Basinhopping takes a range of input parameters; Fig.9 shows the important ones for our implementation, basinhopping(f, sp, n_iter, call_back), where f refers to the representing function from libr.so, sp is a starting point as a Python Numpy array, n_iter is the iteration number used in Algo.1, and call_back is a client-defined procedure. Basinhopping invokes call_back at the end of each iteration (Algo.1, Lines 24-34). The call_back procedure allows CoverMe to terminate early if it saturates all branches; in this way, CoverMe does not need to wait until passing all n_start iterations (Algo.1, Lines 8-12).

Table 5: Untested programs and functions in the benchmark suite Fdlibm and the corresponding explanations.

Program | Entry function | Explanation
e_gamma_r.c | ieee754_gamma_r(double) | no branch
e_gamma.c | ieee754_gamma(double) | no branch
e_j0.c | pzero(double) | static C function
e_j0.c | qzero(double) | static C function
e_j1.c | pone(double) | static C function
e_j1.c | qone(double) | static C function
e_jn.c | ieee754_jn(int, double) | unsupported input type
e_jn.c | ieee754_yn(int, double) | unsupported input type
e_lgamma_r.c | sin_pi(double) | static C function
e_lgamma_r.c | ieee754_lgamma_r(double, int*) | unsupported input type
e_lgamma.c | ieee754_lgamma(double) | no branch
k_rem_pio2.c | kernel_rem_pio2(double*, double*, int, int, const int*) | unsupported input type
k_sin.c | kernel_sin(double, double, int) | unsupported input type
k_standard.c | kernel_standard(double, double, int) | unsupported input type
k_tan.c | kernel_tan(double, double, int) | unsupported input type
s_copysign.c | copysign(double) | no branch
s_fabs.c | fabs(double) | no branch
s_finite.c | finite(double) | no branch
s_frexp.c | frexp(double, int*) | unsupported input type
s_isnan.c | isnan(double) | no branch
s_ldexp.c | ldexp(double, int) | unsupported input type
s_lib_version.c | lib_version(double) | no branch
s_matherr.c | matherr(struct exception*) | unsupported input type
s_scalbn.c | scalbn(double, int) | unsupported input type
s_signgam.c | signgam(double) | no branch
s_significand.c | significand(double) | no branch
w_acos.c | acos(double) | no branch
w_acosh.c | acosh(double) | no branch
w_asin.c | asin(double) | no branch
w_atan2.c | atan2(double, double) | no branch
w_atanh.c | atanh(double) | no branch
w_cosh.c | cosh(double) | no branch
w_exp.c | exp(double) | no branch
w_fmod.c | fmod(double, double) | no branch
w_gamma_r.c | gamma_r(double, int*) | no branch
w_gamma.c | gamma(double, int*) | no branch
w_hypot.c | hypot(double, double) | no branch
w_j0.c | j0(double) | no branch
w_j0.c | y0(double) | no branch
w_j1.c | j1(double) | no branch
w_j1.c | y1(double) | no branch
w_jn.c | jn(double) | no branch
w_jn.c | yn(double) | no branch
w_lgamma_r.c | lgamma_r(double, int*) | no branch
w_lgamma.c | lgamma(double) | no branch
w_log.c | log(double) | no branch
w_log10.c | log10(double) | no branch
w_pow.c | pow(double, double) | no branch
w_remainder.c | remainder(double, double) | no branch
w_scalb.c | scalb(double, double) | no branch
w_sinh.c | sinh(double) | no branch
w_sqrt.c | sqrt(double) | no branch

[Figure 9: CoverMe implementation. Front-end (construct the representing function): the program under test FOO, of signature type_t FOO(double x1, double x2, ...), written in any LLVM-supported language, is turned by an LLVM pass into the instrumented program FOO_I (.bc) with signature type_t FOO_I(double x1, double x2, ...); pen (.cpp), with signature double pen(int i, int op, double lhs, double rhs), the loader (.cpp) with void LOADER(double* P), and FOO_R (.cpp) with void FOO_R(double* P) are linked into the shared library libr.so. Back-end (minimize the representing function): an MCMC minimization procedure (.py) calls basinhopping(func, sp, n_iter, callback) against libr.so and outputs the generated test inputs X, a set of FOO_R's global minimum points.]
B.3 Technical Details

Sect.3 assumes Def. 3.1(a-c) for the sake of simplification. This section discusses how CoverMe relaxes these assumptions when handling real-world floating-point code. We also show how CoverMe handles function calls at the end of this section.

Dealing with Pointers (Relaxing Def. 3.1(a)). We consider only pointers to floating-point numbers. They may occur (1) in an input parameter, (2) in a conditional statement, or (3) in the code body but not in a conditional statement. CoverMe inherently handles case (3) because it is execution-based and does not need to analyze pointers and their effects. CoverMe currently does not handle case (2) and simply ignores these conditional statements by not injecting pen before them. Below we explain how CoverMe deals with case (1). A branch coverage testing problem for a program whose inputs are pointers to doubles can be regarded as the same problem for a simplified program under test. For instance, finding test inputs to cover branches of the program void FOO(double* p) {if (*p <= 1) ... } can be reduced to testing the program void FOO_with_no_pointer(double x) {if (x <= 1) ... }. CoverMe transforms FOO to FOO_with_no_pointer whenever one of FOO's input parameters is a floating-point pointer.

Dealing with Comparison between Non-floating-point Expressions (Relaxing Def. 3.1(b)). We have encountered situations where a conditional statement invokes a comparison between non-floating-point numbers. CoverMe handles these situations by first promoting the non-floating-point numbers to floating-point numbers and then injecting pen as described in Algo.1. For example, before a conditional statement like if (xi op yi), where xi and yi are integers, CoverMe injects r = pen(i, op, (double) xi, (double) yi);. Note that such an approach does not allow us to handle data types that are incompatible with floating-point types, e.g., conditions like if (p != NULL), which CoverMe has to ignore.
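To illustrate the transformation just described, the sketch below shows an integer comparison before and after injection. The function BAR and the opcode constants are hypothetical, and the body of pen here is only a generic branch-distance-style penalty written for this example (the paper's actual pen is given by Def. 3.7 and is not reproduced here).

    /* Hypothetical penalty helper: returns 0 when "lhs op rhs" holds,
       and a positive distance otherwise. */
    enum { OP_EQ, OP_LE };

    double r;   /* penalty variable read by the representing function */

    static double pen(int i, int op, double lhs, double rhs) {
        (void) i;                       /* i identifies the conditional */
        double d = lhs - rhs;
        switch (op) {
        case OP_EQ: return d == 0.0 ? 0.0 : d * d;
        case OP_LE: return d <= 0.0 ? 0.0 : d * d;
        default:    return 0.0;
        }
    }

    /* Original:     if (xi <= yi) { ... }
       Instrumented: the operands are promoted to double and a pen
       call is injected immediately before the conditional.          */
    void BAR(int xi, int yi) {
        r = pen(0, OP_LE, (double) xi, (double) yi);   /* injected */
        if (xi <= yi) { /* ... */ }
    }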
Dealing with Infeasible Branches (Relaxing Def. 3.1(c)). Infeasible branches are branches that cannot be covered by any input. Attempts to cover infeasible branches are useless and time-consuming, and detecting them is a difficult problem in general. CoverMe uses a simple heuristic to detect and ignore infeasible branches: when CoverMe finds a minimum that is not zero, that is, FOO_R(x∗) > 0, it deems the unvisited branch of the last conditional to be infeasible and adds it to Saturate, the set of unvisited and deemed-to-be infeasible branches.

Imagine that we modify l1 of the program in Fig.3 to the conditional statement if (y == -1). Then the branch 1T becomes infeasible. We rewrite this modified program below and illustrate how we deal with infeasible branches:

    l0: if (x <= 1) { x++; }
        y = square(x);
    l1: if (y == -1) { ... }

where we omit the concrete implementation of square.
Let FOO_R denote the representing function constructed for this program. In the minimization process, whenever CoverMe obtains x∗ such that FOO_R(x∗) > 0, CoverMe selects a branch that it regards as infeasible. It selects the branch as follows: suppose x∗ exercises a path τ whose last conditional statement is denoted by lz and, without loss of generality, suppose zT is passed through by τ; then CoverMe regards zF as an infeasible branch. In the modified program above, if 1F has been saturated, the representing function evaluates to (y + 1)² or (y + 1)² + 1, where y equals the non-negative square(x). Thus, the minimum point x∗ must satisfy FOO_R(x∗) > 0, and its triggered path ends with branch 1F. CoverMe then regards 1T as an infeasible branch.

CoverMe then treats the infeasible branches as already saturated. That is, in line 12 of Algo.1, CoverMe updates Saturate with both saturated branches and infeasible branches (more precisely, branches that CoverMe regards as infeasible). The presented heuristic works well in practice (see Sect.5), but we do not claim that it always correctly detects infeasible branches.

Dealing with Function Calls. By default, CoverMe injects the assignments that invoke pen only in the entry function under test. If the entry function invokes other external functions, they will not be transformed. For example, in the program FOO of Fig.3, we do not transform square(x). In this way, CoverMe only attempts to saturate all branches of a single function at a time. However, CoverMe can also easily handle functions invoked by its entry function. As a simple example, consider:

    void FOO(double x) { GOO(x); }
    void GOO(double x) { if (sin(x) <= 0.99) ... }

If CoverMe aims to saturate FOO and GOO but not sin, and it sets FOO as the entry function, then it instruments both FOO and GOO. Only GOO has a conditional statement, and CoverMe injects an assignment on r in GOO.
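For concreteness, the sketch below shows what the instrumented FOO/GOO pair could look like after this step. The pen helper and the opcode constant follow the convention of the earlier sketch and are assumptions, not CoverMe's actual LLVM-level output; the point is simply that the injected assignment to r lands in GOO, the only function with a conditional, while sin is left untouched.

    #include <math.h>

    extern double r;                                         /* penalty variable */
    extern double pen(int i, int op, double lhs, double rhs);

    /* GOO is instrumented: an assignment to r is injected before its
       conditional (conditional index 0, operator "<=").               */
    void GOO(double x) {
        r = pen(0, /* OP_LE */ 1, sin(x), 0.99);             /* injected */
        if (sin(x) <= 0.99) { /* ... */ }
    }

    /* FOO has no conditional of its own, so nothing is injected here. */
    void FOO(double x) {
        GOO(x);
    }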