
arXiv:1610.07390v3 [cs.PL] 11 Aug 2020

A Practical Approach to Interval Refinement for Verification of C/C++ math.h/cmath Functions

Roberto Bagnara 1,2, Michele Chiari 2,4, Roberta Gori 3, and Abramo Bagnara 2

1 University of Parma, Italy
2 BUGSENG srl, http://bugseng.com, Italy
3 University of Pisa, Italy
4 Politecnico di Milano, Italy

Abstract. Verification of C/C++ programs has seen considerable progress in several areas, but not for programs that use these languages' mathematical libraries. The reason is that all libraries in widespread use come with no guarantees about the computed results. This would seem to prevent any attempt at formal verification of programs that use them: without a specification for the functions, no conclusion about the behavior of the program can be drawn statically. We propose an alternative to surrender. We introduce a pragmatic approach that leverages the fact that most math.h/cmath functions are almost piecewise monotonic: as we discovered through exhaustive testing, they may have glitches, often of very small size and in small numbers. We develop interval refinement techniques for such functions that enable verification via abstract interpretation, symbolic model checking, and test data generation based on symbolic execution and a modified dichotomic search. Our refinement algorithms are the first in the literature able to handle non-correctly rounded function implementations, enabling verification of the most common implementations. We experimentally evaluate our approach on real-world code, showing its ability to detect or rule out anomalous behaviors.

1 Introduction

The use of floating-point computations for the implementation of significant portions of critical systems is increasingly perceived as acceptable. This was facilitated by the widespread adoption of the IEEE 754 standard for binary floating-point arithmetic [IEE08]. Even in modern avionics, floating-point numbers are now, more often than not, used instead of fixed-point arithmetic. Thus, the development of techniques for verifying the correctness of such programs becomes imperative [BDL12,Mon08]. According to [Mon08], the main goals of floating-point program verification are the following:

1. Proving that the program will never trigger "undefined" or "undesirable" behaviors, such as an overflow on a conversion from a floating-point type to an integer type.
2. Pin-pointing the sources of roundoff errors in the program; proving an upper bound on the amount of roundoff error in some variable.
3. Proving that the program implements such or such numerical computation up to some specified error bound.

An infamous example of the issues addressed by goal (1) is the crash of the Ariane 5 rocket, which was caused by an overflow in a conversion from a 64-bit floating-point to a 16-bit integer in the software embedded into its control system [LMR+98]. Other examples of such "undesirable" behaviors are the generation of infinities or Not-A-Numbers (NaNs). An empirical study on real-world numerical software [FGCRG17] found that behaviors of this kind are the cause of 28 % of numerical bugs in the considered programs. The authors of [FGCRG17] advocate the development of tools for the detection of such bugs, and the automated generation of test cases triggering them. In this paper, we present a technique to achieve such goals, leaving (2) and (3) to the extensive literature concerned with evaluating the precision of computations, such as [GP06,SBB+18,DK17]. To illustrate the concrete problem raised by the use of floating-point computations in program verification settings, consider the code reproduced in Figure 1. It is a reduced version of a real-world example extracted from a critical embedded system. The purpose of function latlong_utm_of is to convert the latitude and longitude received from a drone to UTM coordinates, which are stored in the two global variables at lines 45-46. For the moment, let us just notice that this code consists of a large number of varied floating-point computations, many of them non-linear. Many calls to mathematical functions are made (highlighted in red): any kind of analysis of this code must be able to take them into account. Some of the questions to be answered for each one of the floating-point operations in this code are:

(i) Can infinities and NaNs be generated?
(ii) Can sin, cos, and tan be invoked on ill-conditioned arguments?
(iii) If anomalies of any one of these kinds are possible, which inputs to the given functions may cause them?

Concerning question (ii), we call the argument of a floating-point periodic trigonometric function ill-conditioned if its absolute value exceeds some application-dependent threshold. Ideally, this threshold should be just above π. To understand this often-overlooked programming error, consider that the distance between two consecutive floating-point numbers (i.e., their ULP)5 increases with their magnitude, while the period of trigonometric functions remains constant. Thus, if x is an IEEE 754 single-precision number and x ≥ 2^23, then the smallest single-precision range containing [x, x + 2π) contains no more than three

5 ULP stands for unit in the last place: if x is a finite floating-point number, ulp(x) is the distance between the two finite floating-point numbers nearest x [Mul05].
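To make the spacing argument concrete, here is a minimal self-contained C check (ours, not from the paper), run at magnitude 2^25, where the binary32 ULP is 4; it prints the ULP and counts the representable values in one period of length 2π:

#include <math.h>
#include <stdio.h>

int main(void) {
  float x = 0x1p25f;                                  /* 2^25 */
  printf("ulp = %g\n", nextafterf(x, INFINITY) - x);  /* prints 4 */
  int n = 0;
  /* count the binary32 values in [x, x + 2*pi) */
  for (float y = x; y < x + 6.2831855f; y = nextafterf(y, INFINITY))
    ++n;
  printf("floats in one period: %d\n", n);            /* prints 2 */
  return 0;
}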

1  #include <math.h>
2  #include <stdint.h>
3
4  #define RadOfDeg(x) ((x) * (M_PI/180.))
5  #define E 0.08181919106 /* Computation for the WGS84 geoid only */
6  #define LambdaOfUtmZone(utm_zone) RadOfDeg((utm_zone-1)*6-180+3)
7  #define CScal(k, z) { z.re *= k; z.im *= k; }
8  #define CAdd(z1, z2) { z2.re += z1.re; z2.im += z1.im; }
9  #define CSub(z1, z2) { z2.re -= z1.re; z2.im -= z1.im; }
10 #define CI(z) { float tmp = z.re; z.re = - z.im; z.im = tmp; }
11 #define CExp(z) { float e = exp(z.re); z.re = e*cos(z.im); \
12   z.im = e*sin(z.im); }
13 #define CSin(z) { CI(z); struct complex _z = {-z.re, -z.im}; \
14   float e = exp(z.re); float cos_z_im = cos(z.im); z.re = e*cos_z_im; \
15   float sin_z_im = sin(z.im); z.im = e*sin_z_im; _z.re = cos_z_im/e; \
16   _z.im = -sin_z_im/e; CSub(_z, z); CScal(-0.5, z); CI(z); }
17
18 static inline float isometric_latitude(float phi, float e) {
19   return log /* p1 */ (tan /* p2 */ (M_PI_4 + phi / 2.0))
20     - e / 2.0 * log((1.0 + e * sin /* p3 */ (phi)) / (1.0 - e * sin(phi)));
21 }
22
23 static inline float isometric_latitude0(float phi) {
24   return log /* p4 */ (tan(M_PI_4 + phi / 2.0));
25 }
26
27 void latlong_utm_of(float phi, float lambda, uint8_t utm_zone) {
28   float lambda_c = LambdaOfUtmZone(utm_zone);
29   float ll = isometric_latitude(phi, E);
30   float dl = lambda - lambda_c;
31   float phi_ = asin(sin /* p5 */ (dl) / cosh(ll));
32   float ll_ = isometric_latitude0(phi_);
33   float lambda_ = atan(sinh(ll) / /* p6 */ cos /* p7 */ (dl));
34   struct complex z_ = { lambda_, ll_ };
35   CScal(serie_coeff_proj_mercator[0], z_);
36   uint8_t k;
37   for(k = 1; k < 3; k++) {
38     struct complex z = { lambda_, ll_ };
39     CScal(2*k, z);
40     CSin(z);
41     CScal(serie_coeff_proj_mercator[k], z);
42     CAdd(z, z_);
43   }
44   CScal(N, z_);
45   latlong_utm_x = XS + z_.im;
46   latlong_utm_y = z_.re;
47 }

Fig. 1. Code excerpted from a real-world avionic library. The original source code is available at http://paparazzi.enac.fr, Paparazzi UAV (Unmanned Aerial Vehicle), v5.14.0 stable release, file sw/misc/satcom/tcp2ivy.c, last accessed on July 16th, 2020. The annotations p1, ..., p7 (rendered as comments in the listing) are referred to in Section 7.2.

floating-point numbers. Current implementations of floating-point trigonometric functions, such as those of CR-LIBM,6 libmcr,7 and GNU libc [LSM+16], contain precise range reduction algorithms that compute a very accurate result even for numbers much higher than 2^23. The point is that the function inputs at this magnitude are so distant from each other that the graph of the function becomes practically indistinguishable from that of a pseudo-random number generator, potentially becoming useless for the application. Substitute 2^23 with 2^52, and the same holds for IEEE 754 double-precision numbers. In order to answer questions (i)-(iii), we need a precise characterization of the semantics of all operations involved in the program. Most implementations of the C and C++ programming languages provide floating-point data types that conform to IEEE 754 as far as basic arithmetic operations and conversions are concerned. The C and C++ floating-point mathematical functions are part of the standard libraries of the respective languages (see, e.g., [Int11,Int14]). Access to such functions requires inclusion of the math.h header file in C, or the cmath header file in C++. The library implementing them is called libm. Very few C/C++ implementations comply with the recommendation of the IEEE 754 standard [IEE08, Section 9.2] that such functions be correctly rounded.8 One of them is CR-LIBM, a remarkable research prototype that, however, has still not found industrial adoption, possibly because of the worst-case performance of some functions (the average case being usually very good). Another freely available correctly-rounded library is libmcr, by Sun Microsystems, Inc. It provides some double-precision functions (exp, log, pow, sin, cos, tan, atan), but its development stopped in 2004. Even though we cannot exclude the existence of proprietary implementations of libm providing formalized precision guarantees, we were not able to find one. In the most popular implementations, such guarantees are usually not available. For example, the documentation of GNU libc [LSM+16] contains: "Therefore many of the functions in the math library have errors. The table lists the maximum error for each function which is exposed by one of the existing tests in the test suite. The table tries to cover as much as possible and list the actual maximum error (or at least a ballpark figure) but this is often not achieved due to the large search space." This provides nothing that can really be trusted in a safety-critical context. In the embedded world, we checked the documentation of all major toolchain providers in our possession: for four of them we found that the lack of guarantees is explicitly mentioned (e.g., [Ana15, page 2-338], [Gre13a, page 591], [Gre13b, page 665], [Fuj13, page 180], [Tex18, page 87]), whereas for all the others (including Arm, CodeWarrior/Freescale/NXP, CrossWorks, HighTec, IAR,

6 See https://gforge.inria.fr/scm/browser.php?group_id=5929&extra=crlibm, last accessed on July 16th, 2020.
7 See https://github.com/simonbyrne/libmcr/blob/master/README, last accessed on July 16th, 2020.
8 A function is said to be correctly rounded if its result is as if computed with infinite precision, and then rounded to the floating-point format in use.

Keil, Microchip, NEC, Renesas, TASKING, Wind River) the lack of guarantees is left implicit. We do not have a precise specification for the library functions that are assumed in the code of Figure 1. Its sources refer to GNU libc and to Newlib,9 but no specific versions are mentioned. We can probably assume a POSIX-compliant behavior with respect to special values. E.g., if log() is called with a negative number, a NaN is returned. If atanh() is called with ±1, then an infinity is returned [IEE13]. This information is not sufficient to provide a general answer to the verification questions (i)-(iii). Things change if we fix a specific implementation of the mathematical library. In this respect, we propose a practical approach that enables verification of C/C++ programs using math.h/cmath functions, even with minimal or no specification in addition to the special cases mandated by standards such as POSIX [IEE13]. Our main contribution is the extension of constraint satisfaction problem solving techniques based on interval consistency [BMV94,BO97] to programs using libm functions, by providing interval refinement algorithms for such constraints. We use such techniques to solve the constraint systems generated for each program path by symbolic execution [BGM06], and perform symbolic-execution-based model checking [GS18]. Symbolic-execution-based test data generation [BCGG13] may be performed as well. The application to abstract interpretation based on (multi-)interval domains is also straightforward [CC76]. We conducted an investigation by means of exhaustive testing on the most common implementations of libm (cf. Section 9.2). We observed that, for all the implementations we tested, the piecewise monotonicity property of the corresponding real functions is "almost" preserved. The results of this investigation are presented in Section 4. Consider, for instance, the tanhf() function, which is meant to approximate the hyperbolic tangent function over IEEE 754 single-precision floats. While y = tanh(x) is monotonically (strictly) increasing over (−∞, +∞), y = tanhf(x) can be monotonically non-decreasing over the full range of IEEE 754 single-precision floats, or it can be "almost monotonically non-decreasing." By this we mean that, going from −∞ to +∞, there may be an occasional drop in the graph of tanhf(), but this is quickly recovered from, that is, the function starts increasing again. We use the term glitches to name such occasional drops: we observed that glitches are often shallow (most often just one ULP), narrow (most often just 2 ULPs), and, on average, not very frequent. We leverage this fact to provide general interval refinement algorithms that enable software verification and testing. Such algorithms are based on an efficient dichotomic search of the intervals to be refined. While traditional dichotomic search can only be applied to monotonic domains, our version is modified to work despite the presence of glitches, by just exploiting some minimal data about them. As we explain in Section 8, our algorithms are the first in the literature that do not require the function implementations to be correctly rounded, thus enabling their use with the most common libm implementations.

9 See https://sourceware.org/newlib/, last accessed on July 16th, 2020.

If we have approximate but correct information about the maximal depth and width of glitches and, possibly, their number and their localization,10 then we can guarantee the refined intervals are conservative. This allows performing formal verification via abstract interpretation, symbolic model checking or automatic theorem proving. With quite precise (correct) information and small/few glitches (or if there are no glitches at all, which includes the case of correctly rounded implementations), the refinement results in tight intervals, and verification is computationally cheaper and with fewer "don't knows". With less precise but still correct information, verification is still possible, but slower and with more "don't knows". With incorrect information about glitches, we can still automatically generate test inputs with much greater coverage than random testing. For single-precision and half-precision IEEE 754 functions, precise data about glitches can be collected by analysing each function on every possible input. This is perfectly feasible since glitch data must be collected only once for each implementation of libm. For double-precision functions, when the mathematical library comes equipped with guarantees on the maximum errors, they can be used as correct approximations of the glitch parameters required by our algorithms. This is the case for the HA and LA accuracy modes of the Intel Math Kernel Library.11 Techniques for automatically proving the correctness of error bounds in math.h/cmath implementations have been recently developed [Har00a,Har00b,LSA18]. When provably correct bounds are available, we can verify programs by proving that bad things cannot happen. On the other hand, when the target mathematical library comes equipped with merely empirical information on the maximum errors, as is the case for GNU libc [LSM+16], such information can be used to obtain (possibly incorrect) bounds for glitches, which enable the automatic generation of test inputs. The approach presented in this paper has been fully implemented in a commercial tool (ECLAIR, developed and commercialized by BUGSENG) and is used for verification and testing of real C code. We used this tool to experimentally evaluate the effectiveness of our approach. As we detail in Section 7, ECLAIR was able to answer questions (i)-(iii) by detecting some dangerous bugs affecting the code of Figure 1, including the generation of a NaN when certain coordinates are received as inputs. Moreover, ECLAIR has been able to answer the same questions for a large and heterogeneous benchmark that we assembled with both real-world and self-developed code, without timing out in at least 96 % of the cases. In Section 8, we compare our techniques with the state of the art, finding that they outperform other approaches in the ability to detect anomalous behaviors, often with even shorter analysis times.

Plan of the paper. Section 2 recalls basic definitions and introduces the required notation; Section 3 explains the verification framework we use; Section 4

10 For details, see the requirements of the algorithms in Section 5.2.
11 See https://software.intel.com/sites/products/documentation/doclib/mkl/vm/vmdata.htm, last accessed on July 16th, 2020.

introduces the notions of monotonicity glitch and quasi-monotonicity; Section 5 describes direct and indirect propagation algorithms that are able to deal with (at least) 75 of the math.h/cmath functions; Section 6 explains how trigonometric functions, which are periodic, can be treated by partitioning a subset of their graph into a set of quasi-monotonic branches; Section 7 briefly describes the implementation in the context of the ECLAIR software verification platform and illustrates the experimental results; Section 8 compares our approach with the state of the art, and other related work; Section 9 discusses the problems that remain to be solved and sketches several ideas for future work; Section 10 concludes the main part of the paper. Appendix A presents additional data about glitches in single-precision functions for several implementations of libm; Appendices B and C contain more details on the algorithms of Section 5, and Appendix D on those of Section 6, along with formal proofs of correctness and complexity results.

2 Background: Floating-Point Numbers and Intervals

We denote by R+ and R− the sets of strictly positive and strictly negative real numbers, respectively.

Definition 1. (IEEE 754 binary floating-point numbers.) A set of IEEE 754 binary floating-point numbers [IEE08] is uniquely identified by: p ∈ N, the number of significant digits (precision); e_max ∈ N, the maximum exponent, the minimum exponent being e_min := 1 − e_max. The set of binary floating-point numbers F(p, e_max, e_min) includes:
– all signed zero and non-zero numbers of the form (−1)^s · 2^e · m, where
  • s is the sign bit;
  • the exponent e is any integer such that e_min ≤ e ≤ e_max;
  • the mantissa m, with 0 ≤ m < 2, is a number represented by a string of p binary digits with a "binary point" after the first digit:

m = (d_0 . d_1 d_2 ... d_{p−1})_2 = Σ_{i=0}^{p−1} d_i · 2^{−i};

– the infinities +∞ and −∞.

The smallest positive normal floating-point number is f_min^nor := 2^{e_min} and the largest is f_max := 2^{e_max}(2 − 2^{1−p}). The non-zero floating-point numbers whose absolute value is less than 2^{e_min} are called subnormals: they always have fewer than p significant digits. Every finite floating-point number is an integral multiple of the smallest subnormal magnitude f_min := 2^{e_min+1−p}. Note that the signed zeroes +0 and −0 are distinct floating-point numbers.

Each IEEE 754 binary floating-point format also includes the representation of symbolic data called NaNs, from “Not a Number.” There are quiet NaNs,

which are propagated by most operations without signaling exceptions, and signaling NaNs, which cause signaling invalid operation exceptions. The unintended and unanticipated generation of NaNs in a program (e.g., by calling the log function on a negative number) is a serious programming error that could lead to catastrophic consequences. In the rest of the article we will only be concerned with IEEE 754 binary floating-point numbers, excluding NaNs, and we will write simply F for F(p, e_max, e_min) when there is no risk of confusion.

Definition 2. (Floating-point symbolic order.) Let F be any IEEE 754 floating-point format. The relation ≺ ⊆ F × F is such that, for each x, y ∈ F, x ≺ y if and only if either: x = −∞ and y ≠ −∞, or x ≠ +∞ and y = +∞, or x = −0 and y ∈ {+0} ∪ R+, or x ∈ R− ∪ {−0} and y = +0, or x, y ∈ R and x < y.

Note that F is linearly ordered with respect to '≺'. For x ∈ F, we sometimes confuse the floating-point number with the extended real number it represents, the floats −0 and +0 both corresponding to the real number 0. Thus, when we write, e.g., x < y for x, y ∈ F, we refer to the order of the extended real numbers they represent.
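A bit-level C realization of '≺' consistent with Definition 2 might look as follows; this is our sketch, not the paper's code, and it assumes IEEE 754 binary32, two's complement integers, and NaN-free inputs:

#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Map a binary32 value to an integer key that increases with the symbolic
   order of Definition 2 (in particular, -0 gets a key strictly below +0).
   NaN inputs are excluded by assumption. */
static int32_t sym_key(float x) {
  int32_t i;
  memcpy(&i, &x, sizeof i);          /* reinterpret the bit pattern */
  /* non-negative patterns (+0 .. +inf) are shifted up by one so that
     -0 (pattern INT32_MIN) maps below +0; negative patterns are folded
     so that more negative floats get smaller keys */
  return i >= 0 ? i + 1 : INT32_MIN - i;
}

static bool sym_lt(float x, float y) { /* x ≺ y */
  return sym_key(x) < sym_key(y);
}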

Definition 3. (Floating-point intervals.) Let F be any IEEE 754 floating-point format. The set I_F of floating-point intervals with boundaries in F is

I_F := {∅} ∪ { [l, u] | l, u ∈ F, l ≼ u }.

[l, u] denotes the set { x ∈ F | l ≼ x ≼ u }. I_F is a bounded meet-semilattice with least element ∅, greatest element [−∞, +∞], and the meet operation, which is induced by set-intersection, will be simply denoted by ∩. (A bounded meet-semilattice is a partially ordered set that has a meet, or greatest lower bound, for any nonempty finite subset and a greatest element.) Half-open floating-point intervals are defined similarly, so that [l, u) denotes the set { x ∈ F | l ≼ x ≺ u }.

Floating-point intervals with boundaries in F allow us to capture the extended numbers in F: NaNs should be tracked separately. Given a floating-point interval [l, u] ∈ I_F, we denote by #[l, u] the cardinality of the set { x ∈ F | l ≼ x ≼ u }.
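Under the same bit-level view, the cardinality #[l, u] can be computed in constant time; again a sketch of ours, reusing the sym_key() helper introduced above (an assumption, not the paper's code):

#include <stdint.h>

/* #[l,u]: number of floats z with l ≼ z ≼ u, in O(1).
   Assumes l ≼ u and NaN-free inputs; sym_key() as sketched earlier. */
static uint64_t card(float l, float u) {
  return (uint64_t)((int64_t)sym_key(u) - (int64_t)sym_key(l)) + 1;
}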

Definition 4. (Floating-point successors and predecessors.) The function succ: F \ {+∞} → F is defined, for each x ∈ F \ {+∞}, by succ(x) := min{ y ∈ F | x ≺ y }. Similarly, function pred: F \ {−∞} → F is defined, for each y ∈ F \ {−∞}, by pred(y) := max{ x ∈ F | x ≺ y }. We will iteratively

apply these functions, so that, e.g., for each n ∈ N, we will refer to the partial function succ^n : F ⇀ F given, for each x ∈ F, by

succ^0(x) := x;
succ^{n+1}(x) := succ(succ^n(x)), if succ^n(x) ≠ +∞.

The definition of the iterated pred^n : F ⇀ F function is analogous. Note that the notation #[x, y) > n is equivalent to succ^n(x) ≺ y. When the result x̂ of a floating-point operation is not representable exactly in the current format, its floating-point approximation is chosen according to the rounding mode in use. The IEEE 754 Standard defines rounding modes 'near', 'up', 'down' and 'zero', which round x̂ with, respectively:

near: the number x ∈ F minimizing |x − x̂|; if two such values exist, the even one is chosen.
up: the minimum number x ∈ F such that x̂ ≤ x.
down: the maximum number x ∈ F such that x̂ ≥ x.
zero: the same as 'down' if x̂ > 0, the same as 'up' if x̂ ≤ 0.
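One possible C realization of succ/pred for binary32 uses the standard nextafterf (our sketch; note one mismatch we assume away: nextafterf(-0.0f, +INFINITY) yields the smallest positive subnormal, so the symbolic step from −0 to +0 of Definition 4 is collapsed):

#include <math.h>

static float succ_f(float x) { return nextafterf(x, INFINITY); }  /* x != +inf */
static float pred_f(float x) { return nextafterf(x, -INFINITY); } /* x != -inf */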

3 Background: Approaches to Program Verification

In this section, we recall the verification and correctness-ensuring techniques that our interval refinement algorithms enable for floating-point programs.

3.1 Symbolic Execution

Symbolic execution is a technique originally introduced for test-data generation [Kin76], but that has found numerous applications also in the field of program verification [CPDGP01,CR85,GS18]. It consists of the evaluation of each execution path in the program by using symbolic values for variables, treating all assignments and guards of conditional statements as constraints on such symbolic values. The constraint systems so obtained characterize the program variables' values for which the execution path is feasible. Thus, solving the constraint system for a path can either prove it unfeasible, if the system has no solution, or yield a set of assignments for program variables, including input variables, that causes the execution of that path. We perform symbolic execution of floating-point computations as described in [BGM06] and [BCGG13]. We briefly illustrate this approach by means of an example taken from line 31 of the listing in Figure 1:

31 float phi_ = asin(sin(dl) / cosh(ll));

Symbolic execution starts by translating the code into static single assignment form (SSA) [ALSU07]. In this intermediate code representation, complex expressions are decomposed into sequences of assignment instructions where at most one operator is applied, and new variable names are introduced so that each variable is assigned to only once. Thus, assignments can be considered as if they were equality constraints. The above expression is transformed into

1 float phi_; double z1, z2, z3, z4, z5, z6;
2 z1 = (double) dl;
3 z2 = sin(z1);
4 z3 = (double) ll;
5 z4 = cosh(z3);
6 z5 = z2 / z4;
7 z6 = asin(z5);
8 phi_ = (float) z6;

Then, we can directly regard the assignments as a system of constraints over floating-point numbers. When if statements are involved, the execution flow is split in two different paths, each of which results in a different constraint system. Loops are dealt with by unrolling them, and function calls by inlining. For more details, we refer the reader to [BGM06]. Once the constraint system for a path has been generated, it can be immediately solved to generate test input data that causes its execution, or to prove it is unfeasible. Additionally, it is possible to augment the constraint system with assertions, whose truth can be evaluated by solving the system. This is how we perform model checking by means of symbolic execution. We support any kind of assertion formulated as a Boolean combination of constraints on the ranges of program variables. To prove or disprove the truth of an assertion, we generate the constraint systems related to each execution path leading to it. Then, we augment such system with the negation of the assertion, and try to solve it. If no solution is found, we can conclude that the assertion is never violated. Otherwise, the solution to the constraint system produces a counter-example. For example, suppose we want to know whether a NaN can be generated by the invocation to asin in line 7. Thus, we want to prove the assertion z5 ≥ −1 ∧ z5 ≤ 1, stating that the argument of asin is always in its domain. We first generate the constraint system for the execution path leading to line 7. Then, we negate the assertion, obtaining two new constraint systems, one with the addition of z5 < −1, and the other one with z5 > 1. As we shall see in Section 3.3, both systems are unsatisfiable, proving that a NaN can never be generated in line 7.

3.2 Constraint Solving over Floating-Point Variables

To solve constraint systems, we employ an approach called interval-based consistency, which amounts to iteratively narrowing the floating-point intervals associated with each variable in a process called filtering [BMV94,BO97]. In the literature on constraint propagation, the unary constraints associated with variables, e.g., intervals, are also called labels [Dav87]. A projection proj(x_i, C, I_1, ..., I_n) is a function that, given a constraint C with n variables x_1, ..., x_n and the intervals I_1, ..., I_n associated with them, computes a possibly refined interval I'_i for one of the variables x_i, that is tighter than or equal to the original interval I_i associated with that variable. Ternary constraints of the form x = y ◦ z, where ◦ is one of +, -, *, /, and x, y, z are three program variables, result in the direct projection proj(x, x = y ◦ z, I_x, I_y, I_z), and the indirect projections

10 Algorithm 1 Constraint Propagation

Require: Constraint store S, variables x_1, ..., x_n, intervals I_1, ..., I_n, i_max ∈ N.
Ensure: Refined intervals I'_1 ⊆ I_1, ..., I'_n ⊆ I_n.
1:  I'_1 := I_1; ...; I'_n := I_n; Q := S; i := 0;
2:  while Q ≠ ∅ ∧ i < i_max do
3:    proj(x_i, C) := dequeue(Q);
4:    I''_i := proj(x_i, C, I_{i_1}, ..., I_{i_k});
5:    if I''_i = ∅ then break
6:    else if I''_i ≠ I'_i then
7:      I'_i := I''_i;
8:      ∀C' ∈ S, x_i ∈ C' : ∀x_j ∈ C' : enqueue(Q, proj(x_j, C'))
9:    end if
10: end while

proj(y, x = y ◦ z, I_x, I_y, I_z) and proj(z, x = y ◦ z, I_x, I_y, I_z). Given a constraint x = f(y), where f is any unary math.h/cmath library function, we get the direct projection proj(x, x = f(y), I_x, I_y) and the indirect projection proj(y, x = f(y), I_x, I_y). Binary constraints of the form x = (type) y, where type is either float or double, and binary relations found in Boolean expressions, such as ==, !=, <, <=, >, >=, result in similar projections. For example, considering z5 = z2 / z4, the projection proj(z5, z5 = z2/z4, I_z5, I_z2, I_z4) over z5 is called direct projection (it goes in the same direction as the three-address-code assignment it comes from), while the projections proj(z2, z5 = z2/z4, I_z5, I_z2, I_z4) and proj(z4, z5 = z2/z4, I_z5, I_z2, I_z4) over z2 and z4 are called indirect projections. In constraint propagation, both direct and indirect projections are repeatedly applied in order to refine the intervals associated with the variables. We define a propagator as the actual implementation of a projection, that possibly refines an interval. For example, a propagator for the projection proj(z5, z5 = z2/z4, I_z5, I_z2, I_z4), with I_z2 = [z2_l, z2_u] and I_z4 = [z4_l, z4_u], could be

I'_z5 = [ min(z2_l/z4_l, z2_l/z4_u, z2_u/z4_l, z2_u/z4_u),
          max(z2_l/z4_l, z2_l/z4_u, z2_u/z4_l, z2_u/z4_u) ] ∩ I_z5.
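A C sketch of such a propagator follows; this is ours, under simplifying assumptions we state explicitly (the divisor interval does not straddle zero and no NaNs arise), and it additionally rounds the bounds outward with fesetround so that the result stays conservative regardless of the rounding mode used at run time:

#include <fenv.h>
#pragma STDC FENV_ACCESS ON

typedef struct { double lo, hi; } itv;

static double min4(double a, double b, double c, double d) {
  double m = a; if (b < m) m = b; if (c < m) m = c; if (d < m) m = d; return m;
}
static double max4(double a, double b, double c, double d) {
  double m = a; if (b > m) m = b; if (c > m) m = c; if (d > m) m = d; return m;
}

/* Direct propagator for z5 = z2 / z4 (divisor interval must not contain 0). */
static itv div_direct(itv z5, itv z2, itv z4) {
  int save = fegetround();
  fesetround(FE_DOWNWARD);
  double lo = min4(z2.lo/z4.lo, z2.lo/z4.hi, z2.hi/z4.lo, z2.hi/z4.hi);
  fesetround(FE_UPWARD);
  double hi = max4(z2.lo/z4.lo, z2.lo/z4.hi, z2.hi/z4.lo, z2.hi/z4.hi);
  fesetround(save);
  itv r = { lo > z5.lo ? lo : z5.lo, hi < z5.hi ? hi : z5.hi };
  return r;  /* the caller treats r.lo > r.hi as the empty interval */
}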

Propagators for basic floating-point operations are already present in the literature [BGM06,BBB+19,BCGG16], and are out of the scope of this paper. Moreover, integer variables can also be treated with this approach, by employing appropriate propagators. The application of projections by executing the corresponding propagators is governed by heuristic algorithms that go beyond the scope of this paper. For our purposes, it suffices to show Algorithm 1. Initially, all the propagators are inserted into a data structure Q (in our case, a FIFO queue) of propagators that are ready to run (line 1). Heuristics are used to select and remove from the data structure one of the ready propagators (line 3), which is then run. If that results in the refinement of the interval of one variable, all the propagators that depend on that variable are inserted into the same data structure, unless they are already present (line 8).

This process continues until one of the intervals becomes empty (line 5), in which case propagation can be stopped as unsatisfiability of the given system of constraints has been proved, or the data structure becomes empty (Q = ∅ at line 2), i.e., propagation has reached quiescence as no projection is able to infer more information, or propagation is artificially stopped, e.g., because a timeout has expired (i = i_max at line 2). When the constraint system reaches quiescence, the labeling procedure comes into play: a variable is chosen and the corresponding interval, its label, is divided into two or more parts and each part is searched independently. The way this partition is made is determined by heuristics that go beyond the scope of this paper, although we will return to this topic in Section 9.5. For each one of such parts, a new "child" propagation process is started, in which the interval for the chosen variable is instantiated to such part. Once all children processes reach a new quiescent state, labeling is performed again and this procedure is repeated. This procedure stops when either one of the children propagation processes reaches quiescence with a singleton interval, in which case the singleton value can be used as a test-case or counterexample, or an interval becomes empty in all children processes, in which case the system is unsatisfiable.

3.3 Examples of Constraint Solving

Let us see how this can be used for program verification. As a first example, let us consider the question of whether the division z5 = z2 / z4 can give rise to a division by zero. Assume all the intervals are initially full, i.e., they contain all possible numerical floating-point values, and all propagators are ready to run. We modify the interval associated to z4 to [−0, +0] and start propagation. At some stage the indirect propagator for cosh will be called to possibly refine the interval for z3 starting from the interval of z4: a propagator correctly capturing a passable implementation of cosh will refine the label of z3 to the empty interval, thus proving that division by zero is indeed not possible. As we will see, all the implementations of cosh() we have examined are far from perfect, but none of them has a zero in its range. As another example, let us consider z4 = cosh(z3), and suppose the intervals associated to z3 and z4 are [1, +∞] and [−∞, +∞], respectively. The direct projection for cosh, described in Section 5.1, would compute, on a machine we will later call xps,12 the refining interval [18B07551D9F550_16 · 2^−52, +∞] for z4, where 18B07551D9F550_16 · 2^−52 ≈ 1.543. Now suppose we want to determine for which values of z3 the computation of z4 = cosh(z3) results in an overflow, thereby binding z4 to +∞. To answer this question we artificially refine the interval of z4 to the singleton [+∞, +∞] and let the indirect propagator for cosh, described in Section 5.2, do its job: this will result in the refining interval [1633CE8FB9F87E_16 · 2^−43, +∞] for z3, where 1633CE8FB9F87E_16 · 2^−43 ≈ 710.5.

12 On xps, float and double are 32-bit and 64-bit IEEE 754 floating point numbers, respectively; see Table 4 for more details.

Coming back to the listing in SSA form at the beginning of this section, suppose now we want to know whether a NaN can be generated by the invocation to asin in line 7, i.e., whether we can have z5 < −1 or z5 > 1. Let us concentrate on the latter constraint, which we impose together with the constraints saying that dl and ll are neither NaNs nor infinities. All the other variables can take any value. We indicate with dℓ and iℓ the direct and the indirect projections for the constraint at line ℓ, respectively. The related propagators are either those described in Section 5, or in [BBB+19,BCGG16]. Here is what happens on a selected constraint propagation process on machine xps, where the numbers have been rounded for increased readability:

z5 > 1 :  z5 ∈ [1.0000000000000002, 1.798 · 10^308]
i6 :      z4 ∈ [−1.798 · 10^308, 1.798 · 10^308]
d5 :      z4 ∈ [1, 1.798 · 10^308]
i5 :      z3 ∈ [−710.5, 710.5]
d3 :      z2 ∈ [−1, 1]
d6 :      z5 ∈ ∅ = [−1, 1] ∩ [1.0000000000000002, 1.798 · 10^308]

As the last constraint is unsatisfiable, the original constraint system is unsatisfiable. The same happens if z5 < −1 is imposed, thereby proving that NaNs cannot be generated on line 7. As a final example, in order to show the indirect projections for asin and sin at work, we consider a partial constraint propagation starting from state z1 ∈ [−16, 16], z5 ∈ [−1, 1], z2 ∈ [−1, 1], z6 ∈ [−1.571, 1.571], z4 ∈ [1, 1.798 · 10^308], phi_ ∈ [+0, 1.571].

A possible sequence of propagation steps is the following:

i8 :  z6 ∈ [+0, 1.571]
i7 :  z5 ∈ [−2^−1074, 1]
i6 :  z2 ∈ [−1.332 · 10^−15, 1]
i3 :  z1 ∈ [−16, 15.71]

The constraint system has reached quiescence and labeling starts: after 7 labeling steps, a test-case is generated that reaches the end of line 8 without generating NaNs or infinities. This test-case is very simple: dl = +0, ll = +0, z1 = +0, z2 = +0, z3 = +0, z4 = 1, z5 = +0, z6 = +0, phi_ = +0.
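Such a test-case can be replayed directly; the following self-contained C program (ours, for illustration, assuming an IEEE 754 libm comparable to the one analyzed) re-runs the SSA chain on the generated inputs and checks the absence of anomalies:

#include <assert.h>
#include <math.h>

int main(void) {
  float dl = +0.0f, ll = +0.0f;
  double z1 = dl;               /* +0 */
  double z2 = sin(z1);          /* +0 */
  double z3 = ll;               /* +0 */
  double z4 = cosh(z3);         /* 1 */
  double z5 = z2 / z4;          /* +0: inside asin's domain [-1, 1] */
  double z6 = asin(z5);         /* +0, no NaN */
  float phi_ = (float)z6;
  assert(!isnan(phi_) && !isinf(phi_));
  return 0;
}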

3.4 Integration into Abstract Interpreters

In this section, we assume familiarity with Abstract Interpretation [CC76,CC77], a static analysis technique that enables verification of program properties by soundly approximating their semantics. We consider the concrete domain C = ℘(F), where F is any IEEE 754 floating-point format and ℘ denotes the powerset operation, and the abstract domain A = I_F × B, where I_F is the set of intervals with boundaries on F, and B is a Boolean domain that captures the possibility that a value is NaN. In particular, the Boolean component of an abstract value is true if it may be NaN, and false if it cannot be NaN. The concretization function γ : A → C is defined as

γ([x_l, x_u], b) = { x ∈ F | x_l ≼ x ≼ x_u } ∪ { nan(p) | b = true, p is the NaN payload }.

We deal with math.h/cmath functions of the form f : F → F, so we need abstract functions of the form f# : A → A to perform abstract interpretation, with the correctness condition

∀a ∈ A : f♭(γ(a)) ⊆ γ(f#(a)),

where f♭ : ℘(F) → ℘(F) is the trivial extension of f to subsets of F. The correctness condition comprises two parts: the numeric-symbolic part on I_F, and the NaN part on B. The NaN part is simple: if f may return NaN on any element of γ(a), a ∈ A, then the Boolean part of f#(a) must be true. For all the functions we treat, the implication also holds in the other direction, since the POSIX standard specifically defines for which values any function may return a NaN. For the numeric-symbolic part, we must ensure that, if f#([x_l, x_u], b_x) = ([y_l, y_u], b_y), then ∀x ∈ [x_l, x_u] : f(x) = NaN ∨ f(x) ∈ [y_l, y_u]. Our contribution consists in determining f# on actual implementations of f, and studying the conditions under which we can guarantee soundness and precision of the approximation. The direct projections we describe in Section 5 can be immediately used, verbatim, as f# in forward analysis (from the initial states to the target states). The inverse projections can also be immediately used in backward analysis (from the target states, e.g., erroneous states, back to the initial states).13 For all floating-point arithmetic operations, projections that can be used as their abstract versions are already present in the literature [BGM06,BBB+19].
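A possible C shape for abstract values in A = I_F × B and for f# follows; direct_proj() and may_return_nan() are hypothetical names standing for, respectively, the direct projection of Section 5.1 and a POSIX-based domain check (this is a sketch under those assumptions, not ECLAIR's implementation):

#include <stdbool.h>

typedef struct { float lo, hi; } fitv;               /* an interval of I_F */
typedef struct { fitv itv; bool maybe_nan; } absval; /* an element of A    */

extern fitv direct_proj(float (*f)(float), fitv x);      /* hypothetical */
extern bool may_return_nan(float (*f)(float), fitv x);   /* hypothetical */

static absval abstract_apply(float (*f)(float), absval a) {
  absval r;
  r.itv = direct_proj(f, a.itv);
  /* NaN part: NaN in the input propagates (f(NaN) = NaN for libm
     functions), and f may introduce NaN on out-of-domain arguments. */
  r.maybe_nan = a.maybe_nan || may_return_nan(f, a.itv);
  return r;
}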

4 (Quasi-) Monotonicity and Glitches

A real-valued partial function f̂ : R ⇀ R is called monotonic if it is order preserving (i.e., f̂(x) ≤ f̂(y) whenever x ≤ y and both f̂(x) and f̂(y) are defined), in which case we call it isotonic, or if it is order reversing (i.e., f̂(x) ≥ f̂(y)

13 Forward and backward analysis are usually alternated in abstract-interpretation-based static analyses.

whenever x ≤ y and both f̂(x) and f̂(y) are defined), in which case we say f̂ is antitonic.

Definition 5. (Quasi-monotonicity.) Let I ⊆ F be a floating-point interval and f : F → F be a floating-point function meant to approximate a real-valued partial function f̂ : R ⇀ R. We say that f is quasi-monotonic/quasi-isotonic/quasi-antitonic on I if f̂ is always defined and monotonic/isotonic/antitonic on I.

Let f : F → F be a quasi-monotonic function. The best we can hope for is that f be monotonic over (F, ≼) for all rounding modes. While this is often the case, it is not always the case: monotonicity is occasionally violated at spots we call monotonicity glitches.

Definition 6. (Monotonicity glitches.) Let f : F → F be a quasi-isotonic function on I ⊆ F. An isotonicity glitch of f in I is an interval [l, u] ⊆ I such that:

u ≻ succ(l) ∧ ∀x ∈ (l, u) : f(l) ≻ f(x) ∧ f(l) ≼ f(u).

If f is quasi-antitonic, an antitonicity glitch of f in I is an isotonicity glitch of −f in I. Isotonicity and antitonicity glitches are collectively called monotonicity glitches or, simply, glitches. Let G = [l, u] be a monotonicity glitch of f in I. The width and the depth of G are given, respectively, by

width(G) := #[l, u] − 1,    depth(G) := #[m, f(u)] − 1,  where m = min_{x ∈ (l,u)} f(x).

Note that, for each glitch G, we have width(G) ≥ 2 and depth(G) ≥ 1. A glitch of f in I is called maximal if none of its proper supersets is a glitch of f in I. Non-maximal glitches are also called sub-glitches.

See Figure 2 for an exemplification of these concepts: G1 is a maximal glitch; G2, being contained in G1, is non-maximal; width(G1) = 5, depth(G1) = 4, width(G2) = depth(G2) = 2. We gathered the relevant statistics about glitches for 25 functions provided by libm on several implementations. We report in Table 1 the data for a machine we call xps, which features an x86_64 CPU and runs Ubuntu 19.10, with GCC 9.2.1 and libm provided by GNU libc 2.30. The data for other platforms are reported in Appendix A. Table 1 presents, for each function, its name and the minimum and maximum of the considered domain interval. For most of the functions, this interval is the natural one. The exceptions are the following: for lgammaf we start at 2, which is where monotonicity theoretically begins; for tgammaf we also start at 2 because the considered implementations are neither monotonic nor periodic for arguments less than 2;14 for the trigonometric functions, at the bottom of

14 The real Γ function is strictly increasing in the interval [µ, +∞) for 1 < µ < 2.



Fig. 2. An example of monotonicity glitches: a maximal glitch G1 containing a sub-glitch G2.

the tables, we restrict the domain to a region where there are at least 12 floats per period, for reasons that will be discussed in Section 6.1. For each function and each rounding mode (near, up, down, zero), Table 1 gives the number of glitches, ng, their maximum depth, dM, and their maximum width, wM. For the trigonometric functions we report the cumulative results concerning all the quasi-isotonic and quasi-antitonic branches in the given range. In the columns labeled ng, we report for these functions the maximum number of glitches in any such quasi-monotonic branch (we will refer to this quantity in Section 6.2 as ngM). All the data above were collected by exhaustively testing the functions in their domains, which is computationally feasible on single-precision floating-point numbers (cf. Section 9.2 for more details). The following observations can be made:

1. there are few glitches: many functions have no glitch at all, several functions have just a few glitches, a few functions have many glitches;
2. most glitches are very shallow;
3. with a notable exception, glitches are also very narrow.

It is important to observe that glitches are not simply bugs that will surely be fixed at the next release. For instance, the implementation of tgammaf in Ubuntu 19.10/x86_64 has more numerous and deeper glitches than the one in Ubuntu 14.04/x86_64. Moreover, the implementation of cosf() in Ubuntu 18.04/x86_64 contains one glitch in all rounding modes that was not present in previous versions. The point is that monotonicity is not one of the objectives of most implementations of math.h/cmath functions. For instance, both the manual [LSM+16] and the FAQ of GNU libc explicitly exclude monotonicity from the accuracy goals of the library, so that bug reports about violated monotonicity are closed as invalid.15

15 See, e.g., bug reports https://sourceware.org/bugzilla/show_bug.cgi?id=15898 and https://sourceware.org/bugzilla/show_bug.cgi?id=15899, last accessed on July 16th, 2020.
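As an indication of how such data can be gathered, the following C sketch (our code, not the paper's tooling) walks every binary32 value in a domain and accumulates glitch statistics for a quasi-isotonic, NaN-free f in the current rounding mode; at roughly 2^32 evaluations per function and per rounding mode, this matches the exhaustive-testing regime described above:

#include <math.h>
#include <stdio.h>

/* Scan f on [lo, hi] and report the number of (maximal) descent regions
   below the running maximum, plus their maximum depth in succ-steps
   (an approximation of depth as defined in Definition 6). */
static void scan_glitches(float (*f)(float), float lo, float hi) {
  float run_max = f(lo);
  unsigned glitches = 0, max_depth = 0;
  int in_glitch = 0;
  for (float x = lo; x < hi; x = nextafterf(x, INFINITY)) {
    float y = f(nextafterf(x, INFINITY));
    if (y < run_max) {                  /* inside a glitch */
      if (!in_glitch) { ++glitches; in_glitch = 1; }
      unsigned d = 0;                   /* ULP steps from y up to run_max;
                                           cheap in practice, depths are tiny */
      for (float t = y; t < run_max; t = nextafterf(t, INFINITY)) ++d;
      if (d > max_depth) max_depth = d;
    } else {
      run_max = y;
      in_glitch = 0;
    }
  }
  printf("glitches: %u, max depth: %u\n", glitches, max_depth);
}

For instance, scan_glitches(tanhf, -20.0f, 20.0f) scans tanhf on [−20, 20].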

Table 1. Glitch data for the xps machine (ng = number of glitches, dM = maximum glitch depth, wM = maximum glitch width)

function   domain [Dmin, DMax]   glitch data, one (ng, dM, wM) triple per rounding
                                 mode exhibiting glitches, in the order near, up,
                                 down, zero (a dash means no glitches in any mode)

acosf      [−1, 1]          —
acoshf     [1, +∞]          (1, 1, 2) (1, 1, 2)
asinf      [−1, 1]          —
asinhf     [−∞, +∞]         (2, 1, 2) (2, 1, 2)
atanf      [−∞, +∞]         (1, 1, 108)
atanhf     [−1, 1]          (2, 1, 2) (2, 1, 2)
cbrtf      [−∞, +∞]         (106, 1, 2) (106, 1, 2) (106, 1, 2) (106, 1, 2)
coshf      [−∞, +∞]         (454, 1, 2) (466, 1, 2) (442, 1, 2) (448, 1, 2)
erff       [−∞, +∞]         —
expf       [−∞, +∞]         (1, 1, 109)
exp10f     [−∞, +∞]         —
exp2f      [−∞, +∞]         (1, 1, 109)
expm1f     [−∞, +∞]         —
lgammaf    [2, +∞]          (168, 1, 2) (168, 1, 2) (169, 1, 2) (169, 1, 2)
logf       [0, +∞]          —
log10f     [0, +∞]          —
log1pf     [−1, +∞]         (1, 1, 2) (1, 1, 2)
log2f      [0, +∞]          —
sinhf      [−∞, +∞]         —
sqrtf      [0, +∞]          —
tanhf      [−∞, +∞]         (1, 1, 2) (2, 1, 3)
tgammaf    [2, +∞]          (105, 4, 3) (105, 4, 3) (105, 4, 3) (105, 4, 3)
cosf       [−2^23, 2^23]    —
sinf       [−2^23, 2^23]    —
tanf       [−2^23, 2^23]    —

We will now see how quasi-monotonicity can be exploited for the purposes of interval refinement and, in turn, software verification. Afterwards, we will deal with the special case of the trigonometric functions, as they pose the additional problem of periodic slope inversions.

5 Propagation Algorithms

Let F be any IEEE 754 floating-point format and let S ⊆ ℘(F) be a bounded meet-semilattice. A floating-point unary constraint over S is a formula of the form x ∈ S for S ∈ S. Let f : F → F be a function and consider a constraint of the form y = f(x) along with the unary constraints x ∈ S_x and y ∈ S_y with S_x, S_y ∈ S. Direct propagation amounts to computing a possibly refined set S'_y ∈ S, such that

S'_y ⊆ S_y ∧ ∀x ∈ S_x : f(x) ∈ S_y =⇒ f(x) ∈ S'_y.   (1)

Of course this is always possible by taking S'_y = S_y, but the objective of the game is to compute a "small" (possibly the smallest) S'_y satisfying (1), compatible with

the available information on f and computing resources. The smallest S'_y ∈ S that satisfies (1) is such that

∀S''_y ∈ S : S''_y ⊂ S'_y =⇒ ∃x ∈ S_x . f(x) ∈ S'_y \ S''_y.   (2)

Indirect propagation for the same constraints, y = f(x), x ∈ S_x and y ∈ S_y, is the computation of a possibly refined set for x, S'_x, such that

S'_x ⊆ S_x ∧ ∀x ∈ S_x : f(x) ∈ S_y =⇒ x ∈ S'_x.

Again, taking S'_x = S_x is always possible and sometimes unavoidable. The best we can hope for is to be able to determine the smallest such set, i.e., satisfying

∀S''_x ∈ S : S''_x ⊂ S'_x =⇒ ∃x ∈ S'_x \ S''_x . f(x) ∈ S_y.   (3)

Satisfying predicates (2) and (3) corresponds to enforcing and obtaining domain consistency [VSD98] on our constraint set. This goal is often difficult to reach, especially if the underlying variable domains are large. A less demanding approach is to seek interval consistency: we associate an interval [xl, xu] to variable x and an interval [yl, yu] to y, and we try to obtain new intervals whose bounds satisfy y = f(x). If f is isotonic, direct propagation can be reduced to finding a new interval [y'l, y'u] such that

y'l ≽ yl ∧ ∀x ∈ [xl, xu] : (f(x) ≽ yl =⇒ f(x) ≽ y'l),
y'u ≼ yu ∧ ∀x ∈ [xl, xu] : (f(x) ≼ yu =⇒ f(x) ≼ y'u).

Taking y'l = yl, y'u = yu trivially satisfies these predicates, but we aim to find an interval such that

∀y''l ∈ F : y''l ≻ y'l =⇒ ∃x ∈ [xl, xu] . yl ≼ f(x) ≺ y''l,
∀y''u ∈ F : y''u ≺ y'u =⇒ ∃x ∈ [xl, xu] . y''u ≺ f(x) ≼ yu.

Indirect propagation consists now in finding an interval [x'l, x'u] such that

x'l ≽ xl ∧ ∀x ∈ [xl, xu] : (f(x) ≽ yl =⇒ x ≽ x'l),
x'u ≼ xu ∧ ∀x ∈ [xl, xu] : (f(x) ≼ yu =⇒ x ≼ x'u).

An optimal result would satisfy

∀x''l ∈ F : x''l ≻ x'l =⇒ ∃x ∈ [xl, x''l) . f(x) ≽ yl,
∀x''u ∈ F : x''u ≺ x'u =⇒ ∃x ∈ (x''u, xu] . f(x) ≼ yu.

A possible compromise between domain and interval consistency is the use of multi-intervals. It achieves further granularity by splitting domains into multiple intervals. The predicates given above can be easily extended to “multi-interval consistency.”

Unfortunately, the functions we are concerned with are neither isotonic nor antitonic, because of glitches. Yet, we devised algorithms that, given the implementation of a quasi-monotonic library function f : F → F, an interval [xl, xu] for x and an interval [yl, yu] for y, compute refined bounds for both intervals, satisfying the correctness predicates defined above and, in some cases, even optimality predicates. These algorithms exploit simple data describing the glitches of a specific function to overcome the issues generated by its quasi-monotonicity. Such data consist in safe approximations ng, dM and wM of, respectively, the total number of glitches n_g^f, their maximal depth d_M^f and their maximal width w_M^f. Moreover, a safe approximation α of where the first glitch starts, α^f, and a safe approximation ω of where the last glitch ends inside the function's domain, ω^f, are needed. If the values of such data are conservative, then the refined intervals computed by our algorithms contain all solutions to the constraint y = f(x). We call this property correctness. If all projections involved in constraint solving are also correct, then no solution is mistakenly eliminated, and the process yields no false negatives. In general, the refined intervals may also contain values that are not solutions to the constraint, which could potentially lead to false positives in the verification process. This is avoided at the constraint-solving level by the labeling process, which splits variable domains until they become singletons. All projections are made so that they always discard singletons iff they do not contain a solution. For mathematical functions, this is done by the direct projection. Thus, if all projections have this property, the constraint-solving process never yields false positives. This has, however, the drawback that the labeling process may lead to the enumeration of all values in the variable domains, if the projections fail to further refine them. If such domains are large, this likely results in a time-out. Thus, as we shall see in Section 7, the verification process yields no false positives or negatives, but "don't knows" when it times out. Projections that fail to satisfy such requirements can, in fact, lead to false positives or negatives (cf. Section 8).

5.1 Direct Propagation

Given an interval [xl, xu] for x and a function f, finding a refined interval [y'l, y'u] for y satisfying constraint y = f(x) is trivial if f is monotonic: computing [y'l, y'u] ≡ [f(xl), f(xu)] suffices. However, the presence of glitches in quasi-monotonic functions raises two main issues:

– there may be glitches in [xl, xu] in which the value of the function is lower than f(xl);
– xu may be inside a glitch, and there may be values of x outside it such that f(x) ≻ f(xu).

If [xl, xu] and [α, ω] do not intersect, f can be treated as if it were monotonic. Otherwise, we exploit the information about the glitches of f to tackle these issues.

Lower bound y'l: if xl ∈ [α, ω], the worst-case scenario is that there is a glitch starting right after xl, where the graph of f goes lower than f(xl). Such a glitch cannot go lower than pred^dM(f(xl)): we take this value as y'l, the lower bound of the refined interval. If xl is not in the "glitch-area," then we consider the value of f(α): no glitch in [α, xu] can go lower than pred^dM(f(α)). In this case, we set y'l = min(f(xl), pred^dM(f(α))).
Upper bound y'u: if xu ∉ [α, ω], then it cannot be in a glitch, and y'u = f(xu). Otherwise, it may be in a glitch, which cannot be deeper than dM: the actual maximum value of the function, outside of the glitch, cannot be higher than succ^dM(f(xu)). We set y'u to this value.

If the actual range of f over its whole domain is known, y'l and y'u can be compared with it to make sure they do not fall outside.
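Put together, the two rules above can be rendered in C as follows; this is a minimal sketch of ours under simplifying assumptions (a NaN-free quasi-isotonic f; conservative glitch data dM, alpha, omega supplied by the caller; the -0/+0 symbolic-order subtleties are ignored), not ECLAIR's implementation:

#include <math.h>

static float pred_n(float x, unsigned n) {
  while (n--) x = nextafterf(x, -INFINITY);
  return x;
}
static float succ_n(float x, unsigned n) {
  while (n--) x = nextafterf(x, INFINITY);
  return x;
}

/* Direct projection for a quasi-isotonic f on [xl, xu], given a maximum
   glitch depth dM and a region [alpha, omega] containing all glitches. */
static void direct_proj_iso(float (*f)(float), float xl, float xu,
                            unsigned dM, float alpha, float omega,
                            float *yl, float *yu) {
  if (xu < alpha || xl > omega) {     /* no glitch intersects [xl, xu] */
    *yl = f(xl);
    *yu = f(xu);
    return;
  }
  if (xl >= alpha && xl <= omega) {
    *yl = pred_n(f(xl), dM);          /* a glitch may start right after xl */
  } else {
    float fa = pred_n(f(alpha), dM);  /* no glitch in [alpha, xu] goes lower */
    *yl = f(xl) < fa ? f(xl) : fa;
  }
  if (xu < alpha || xu > omega)
    *yu = f(xu);                      /* xu cannot be inside a glitch */
  else
    *yu = succ_n(f(xu), dM);          /* xu may sit inside a glitch */
}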

5.2 Indirect Propagation

Assume function f is quasi-isotonic. Indirect propagation, i.e., the process of inferring a new interval [x'l, x'u] ⊆ [xl, xu] for x starting from the interval [yl, yu] for y, is carried out by separately looking for a lower bound for the values of x satisfying yl = f(x), and an upper bound for the values of x satisfying yu = f(x). We use such bounds to correctly refine interval [xl, xu] into [x'l, x'u]. We designed two different algorithms, lower_bound and upper_bound, that carry out such tasks for the equation y = f(x), where y is a given single value. They extend the well-known dichotomic search method to quasi-isotonic functions. For brevity, we describe in detail algorithm lower_bound in the next section, leaving upper_bound, which is symmetric, to Appendix C. Function lower_bound (Algorithm 2) returns a value l satisfying one of the following predicates:

p0(y, xl, xu,l) ≡ ∀x ∈ [xl, xu]: y ≻ f(x),

p1(y, xl, xu,l) ≡ ∀x ∈ [xl,l]: y ≺ f(x),

p2(y, xl, xu,l) ≡ ∀x ∈ [xl,l]: y ≻ f(x),

p3(y, xl, xu,l) ≡ f(l) ≺ y ≺ f(succ(l)) ∧ ∀x ∈ [xl,l): y ≻ f(x),

p4(y, xl, xu,l) ≡ y = f(l) ∧ ∀x ∈ [xl,l): y ≻ f(x).

Such predicates express properties on whether the new bound for x satisfies equation y = f(x). When condition p0(y, xl, xu, l) holds, y = f(x) has no solution over [xl, xu]. Also, if p1(y, xl, xu, l) holds with l = xu, y = f(x) has no solution there. When p1(y, xl, xu, l) holds with l ≺ xu or p2(y, xl, xu, l) holds, y = f(x) may have a solution, and choosing x'l = succ(l) gives a correct refinement for xl. When p3(y, xl, xu, l) holds, we have identified the leftmost point in [xl, xu] where f crosses y without touching it, and we can set x'l = succ(l). Finally, when p4(y, xl, xu, l) holds, we have identified the leftmost solution x = l of y = f(x) and we can set x'l = l.

Function upper_bound returns a value u that satisfies one of the predicates below:

p5(y, xl, xu,u) ≡ ∀x ∈ [xl, xu]: y ≺ f(x),

p6(y, xl, xu,u) ≡ ∀x ∈ [u, xu]: y ≻ f(x),

p7(y, xl, xu,u) ≡ ∀x ∈ [u, xu]: y ≺ f(x),

p8(y, xl, xu,u) ≡ f(pred(u)) ≺ y ≺ f(u) ∧ ∀x ∈ (u, xu]: y ≺ f(x),

p9(y, xl, xu,u) ≡ y = f(u) ∧ ∀x ∈ (u, xu]: y ≺ f(x).

These predicates are the counterparts of p0-p4 for upper_bound. If p5(y, xl, xu, u) or p6(y, xl, xu, u) hold, the latter only with u = xl, then y = f(x) has no solution in [xl, xu]. If p6(y, xl, xu, u) holds with u ≺ xu, or p7(y, xl, xu, u) holds with any u ∈ [xl, xu], then y = f(x) has no solution in interval [u, xu], but it might have a solution somewhere in [xl, u). So, setting x'u = pred(u) is correct. If p8(y, xl, xu, u) holds, then we have identified the rightmost point in [xl, xu] where the graph of f crosses y without touching it. Setting x'u = pred(u) is correct. Finally, when p9(y, xl, xu, u) holds, we have found the rightmost solution x = u of equation y = f(x), and setting x'u = u is correct. When function f is quasi-isotonic, the results of invoking lower_bound on yl and upper_bound on yu are combined to refine the interval [xl, xu] into [x'l, x'u], as follows.

– If p0 or p5 hold, then there is no solution, because the entire graph of the function is either below or above the interval for y. Note that p0 implies p6, and p5 implies p1.
– If p1 holds, we set x'l = xl even if l ≻ xl. In fact, although the graph of the function is entirely above yl, there might be a value y'l such that yl ≺ y'l ≼ yu that satisfies y'l = f(x) for some x ∈ [xl, l], so [xl, l] cannot be excluded. However, if y is a singleton, or if p5 holds, then there are no solutions. The upper bound is treated similarly: unless y is a singleton or p0 holds, x'u must be set to xu if p6 holds.
– If p3 and p8 both hold and l ≻ u, then there is no solution for y = f(x).
– For all other predicates, x'l and x'u can be set as stated below the definitions of the predicates.

The same algorithms are used for the quasi-antitonic functions, because if f is quasi-antitonic, then −f is quasi-isotonic. They are called with f' = −f, f'^i = f^i ◦ (−id), and −yu instead of yl for lower_bound, and −yl in place of yu for upper_bound. When they terminate, p_i(−yu, xl, xu, l) and p_j(−yl, xl, xu, u), 0 ≤ i ≤ 4 and 5 ≤ j ≤ 9, hold for −f. Since −y ≺ −f(x) ⟺ y ≻ f(x) and −y ≻ −f(x) ⟺ y ≺ f(x), they do not hold on f directly, but since yl and yu are switched, the same case analysis can be done to obtain x'l and x'u depending on the values of i and j.
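To give the flavor of the modified dichotomic search that underlies lower_bound (described next), here is a deliberately simplified C sketch of ours. It is not the paper's Algorithm 2: it skips the init/gallop phases, uses a value-space midpoint instead of the cardinality-balanced split_point, and assumes finite bounds and a NaN-free quasi-isotonic f whose glitches are at most dM ULPs deep; ulps_below() is a hypothetical helper counting how far a value lies below y:

#include <math.h>

/* How many succ-steps v is below y, saturating at limit + 1. */
static unsigned ulps_below(float v, float y, unsigned limit) {
  unsigned n = 0;
  while (v < y && n <= limit) { v = nextafterf(v, INFINITY); ++n; }
  return n;
}

/* Glitch-tolerant bisection: maintains that the leftmost crossing of y,
   if any, lies in (lo, hi].  lo is advanced past mid only when f(mid) is
   more than dM ULPs below y, so that mid cannot be the floor of a glitch
   hiding a crossing to its left; otherwise the search conservatively
   keeps (lo, mid]. */
static float bisect_lb_sketch(float (*f)(float), float y,
                              float lo, float hi, unsigned dM) {
  while (nextafterf(lo, INFINITY) < hi) {
    float mid = lo / 2 + hi / 2;          /* value-space midpoint */
    if (mid <= lo) mid = nextafterf(lo, INFINITY);
    if (mid >= hi) mid = nextafterf(hi, -INFINITY);
    if (f(mid) >= y || ulps_below(f(mid), y, dM) <= dM)
      hi = mid;                           /* mid may hide the bound */
    else
      lo = mid;                           /* safe: mid is not in a glitch */
  }
  return lo;  /* the refined x'_l can be taken as succ(lo) */
}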

Computation of the Lower Bound for y = f(x) Given a value for y and equation y = f(x), Algorithm 2 computes a correct lower bound refining the

interval of x. Its preconditions are listed in the Require statement and demand a quasi-isotonic function f : F → F, a value y ∈ F, and an interval [xl, xu] for x to be refined. To be as precise as possible, the algorithm needs safe approximations of the glitch data listed previously in this section. It also uses an inverse function f^i : F → F, if available. To avoid complexity issues, parameter t ∈ N fixes the maximum length of the linear searches the algorithm performs in some cases, and s ∈ N is the maximum number of times function logsearch_lb in Algorithm 4 can return an interval too wide to ensure the logarithmic complexity of the dichotomic search. The algorithm ends guaranteeing the post-conditions in the Ensure statement, where predicates p_r(y, xl, xu, l) for r ∈ {0, ..., 4} are those described previously. The post-conditions are divided in two parts: the correctness part is preceded by (c) and the precision part by (p). The algorithm determines r and l by performing a number of calls to library function f bounded by a small constant k (e.g., k = 3) times the logarithm of #[xl, xu]. The code in lines 1-17 sorts out all edge cases and chooses two values lo and hi suitable to start the dichotomic search, carried out by function bisect_lb. First, it calls function init, which takes a value y, interval [xl, xu], and an inverse function f^i : F → F, if available, and returns a point inside [xl, xu]. init returns f^i(y) if xl ≼ f^i(y) ≼ xu, and the middle point between xl and xu otherwise. Next, function gallop_lb finds values lo and hi satisfying the precondition of algorithm bisect_lb, i.e., such that xl ≼ lo ≼ hi ≼ xu, xl ≺ lo =⇒ #[f(lo), y) > dM and xu ≻ hi =⇒ f(hi) ≽ y. gallop_lb starts with hi = i and increases it (e.g., by multiplying it by 2) until it finds a value such that hi ≺ xu and f(hi) ≽ y. If no such value can be found, it sets hi = xu. Similarly, it finds a value lo such that xl ≺ lo and #[f(lo), y) > dM, or it sets lo = xl. The case in which f(lo) ≻ y is then handled by lines 4-9. We need to determine whether [xl, xu] really does not contain any solution for y, i.e., whether, for each x ∈ [xl, xu], y ≻ f(x) holds. If glitches are too deep, a suboptimal value for l is returned, and the algorithm terminates. If f(lo) = y at line 10, the exact solution for the lower bound was found, and it is returned. In line 14, function findhi_lb (Algorithm 3) handles the case where hi = xu and f(xu) ≺ y. This may arise either if for no x ∈ [xl, xu] we have f(x) ≽ y, or if there is a value x' ∈ [xl, xu] such that f(x') ≽ y, but xu is in a glitch. In the latter case, f is not monotonic in [xl, xu], and f(xu) is not a safe upper bound to the value of f in it. findhi_lb discriminates quickly between these two cases and tries to find a value of hi suitable for bisect_lb. If it finds one, it returns r = 1.

– If xu might be in a glitch wider than t (lines 3-9), for the sake of efficiency we do not perform an exhaustive search. By inspecting f(α), we may still be able to set r = 0, to signify that ∀x ∈ [xl, xu] : f(x) ≺ y, or to set r = 1 and hi to α. If we do not have enough information to choose one of these options, we just set r = 2 and l to a valid (but suboptimal) lower bound.
– Otherwise, linsearch geq(f, y, [xl, xu], wM, t) performs a backward, float-by-float search of no more than min(t, wM) steps, starting from hi = xu,

Algorithm 2 Indirect propagation: lower bound(f, y, [xl, xu], ng, dM, wM, α, ω, f^i, s, t)
Require: f : F → F, y ∈ F, [xl, xu] ∈ IF, ng ≥ ng^f, dM ≥ dM^f, wM ≥ wM^f, α ≼ α^f, ω ≽ ω^f, ng > 0 =⇒ (xl ≼ α ≼ ω ≼ xu), f^i : F → F, s, t ∈ N.
Ensure: c: l ∈ F, r ∈ {0, 1, 2, 3, 4} =⇒ pr(y, xl, xu, l);
p: (f(xl) ≼ y ≼ f(xu) ∧ (ng = 0 ∨ wM < t ∨ (ng = 1 ∧ α = α^f))) =⇒ r ∈ {3, 4}.
1: i := init(y, [xl, xu], f^i); ⊲ xl ≼ i ≼ xu
2: (lo, hi) := gallop lb(f, y, [xl, xu], dM, i);
3: ⊲ (xl ≼ lo ≼ hi ≼ xu) ∧ (xl ≺ lo =⇒ #[f(lo), y) > dM) ∧ (xu ≻ hi =⇒ f(hi) ≽ y)

4: if f(lo) ≻ y then
5:   if ng = 0 ∨ #[y, f(α)) > dM then
6:     l := xu; r := 1; return
7:   else
8:     l := α; r := 1; return
9:   end if
10: else if f(lo) = y then
11:   l := lo; r := 4; return
12: end if;
13: if f(hi) ≺ y then
14:   (r, l, hi) := findhi lb(f, y, [xl, xu], ng, dM, wM, α, ω, t);
15:   if r ∈ {0, 2} then return
16:   end if
17: end if;
18: lo := bisect lb(f, y, [xl, xu], ng, dM, wM, α, ω, s, t, lo, hi);
19: while f(succ(lo)) ≺ y ∧ t > 0 do
20:   lo := succ(lo);
21:   t := t − 1
22: end while;
23: if f(succ(lo)) ≻ y then l := lo; r := 3
24: else if f(succ(lo)) = y then l := succ(lo); r := 4
25: else l := lo; r := 2
26: end if

Algorithm 3 Indirect propagation: findhi lb(f, y, [xl, xu], ng, dM, wM, α, ω, t)
Require: f : F → F, y ∈ F, [xl, xu] ∈ IF, ng ≥ ng^f, dM ≥ dM^f, wM ≥ wM^f, α ≼ α^f, ω ≽ ω^f, ng > 0 =⇒ (xl ≼ α ≼ ω ≼ xu), t ∈ N, f(xu) ≺ y.
Ensure: l ∈ F, r ∈ {0, 2} =⇒ pr(y, xl, xu, l), r = 1 =⇒ (hi ∈ [xl, xu] ∧ f(hi) ≽ y).
1: l := xl;
2: if ng = 0 ∨ xu ≻ ω ∨ #[f(xu), y) > dM then r := 0
3: else if ng = 1 ∧ (wM > t ∨ f(succ(α)) ≺ f(α)) then
4:   if y ≻ f(α) then
5:     if f(succ(α)) ≺ f(α) then r := 0
6:     else l := α; r := 2
7:     end if
8:   else hi := α; r := 1
9:   end if
10: else
11:   (b, hi) := linsearch geq(f, y, [xl, xu], wM, t);
12:   ⊲ (b = 1 ∧ hi ∈ [xl, xu] ∧ f(hi) ≽ y) ∨ (b = 0 ∧ ∀x ∈ [x̂, xu] : f(x) ≺ y)
13:   ⊲ where x̂ = max{xl, pred^v(xu)} and v = min{t, wM}
14:   if b = 1 then r := 1
15:   else if t ≥ wM then r := 0
16:   else l := xl; r := 2
17:   end if
18: end if

looking for the first value hi such that f(hi) ≽ y. The search stops in two cases, discriminated by the value of variable b:
b = 0: no value for hi was found within t search steps. If they were enough to cover the glitch, we set r = 0, since ∀x ∈ [xl, xu] : f(x) ≺ y. Otherwise, we set r = 2 and return xl.
b = 1: a value of hi appropriate for bisect lb was found.
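For concreteness, the following C++ fragment is one possible reading of linsearch geq (the paper does not list it; names and signature are our assumptions):

#include <algorithm>
#include <cmath>
#include <utility>

// Backward float-by-float search from xu for the first hi with f(hi) >= y,
// inspecting xu and at most v = min(t, wM) of its predecessors, never below xl.
template <typename F>
std::pair<bool, float>
linsearch_geq(F f, float y, float xl, float xu, unsigned wM, unsigned t) {
  float hi = xu;
  unsigned v = std::min(t, wM);
  for (unsigned i = 0; i <= v && hi >= xl; ++i) {
    if (f(hi) >= y)
      return {true, hi};                    // b = 1: hi suitable for bisect_lb
    hi = std::nextafterf(hi, -INFINITY);    // hi := pred(hi)
  }
  return {false, xu};                       // b = 0: every inspected f(x) < y
}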

Line 16 of lower bound is reached if findhi lb returns r = 1, so that xl ≼ hi ≼ xu and f(hi) ≽ y. Before the invocation of function bisect lb (Algorithm 4) at line 18, we have f(lo) ≺ y ≼ f(hi), so lo ≠ hi. bisect lb adapts the dichotomic method to refine interval [lo, hi] when f is a quasi-isotonic function. Each iteration of the while loop on line 1 uses function split point to pick the middle point, mid, of interval [lo, hi], so that the cardinalities of [lo, mid] and [mid, hi] differ at most by 1. Then, f(mid) is compared with y. If f(mid) ≽ y, hi is updated with the value of mid. The critical case is when f(mid) ≺ y. Function bisect lb further discriminates whether lo can be updated with the value of mid, or whether other refinements of [lo, hi] are possible.
– If mid is not in a glitch (if-guard on line 5), lo can be updated with the value of mid.
– Otherwise, if there is only one glitch, wider than t (wM > t) or starting exactly at α (i.e., f(succ(α)) ≺ f(α)), and mid may be in it, the algorithm compares f(α) and f(ω) with y in order to set lo to the greatest correct

Algorithm 4 Indirect propagation: bisect lb(f, y, [xl, xu], ng, dM, wM, α, ω, s, t, lo, hi)

Require: f : F → F, y ∈ F, [xl, xu] ∈ IF, ng ≥ ng^f, dM ≥ dM^f, wM ≥ wM^f, α ≼ α^f, ω ≽ ω^f, ng > 0 =⇒ (xl ≼ α ≼ ω ≼ xu), s, t ∈ N, xl ≼ lo ≺ hi ≼ xu, f(lo) ≺ y ≼ f(hi), ∀x ∈ [xl, lo] : f(x) ≺ y.
Ensure: c: xl ≼ lo ≺ hi ≼ xu, f(lo) ≺ y ≼ f(hi), ∀x ∈ [xl, lo] : f(x) ≺ y;
p: (ng = 0 ∨ wM ≤ t ∨ (ng = 1 ∧ α = α^f)) =⇒ f(succ(lo)) ≽ y.
1: while #[lo, hi) > 1 do
2:   mid := split point(lo, hi);
3:   ⊲ ∃m, m′ > 0 . |m − m′| ≤ 1 ∧ mid = pred^m(hi) = succ^{m′}(lo)
4:   if y ≼ f(mid) then hi := mid
5:   else if ng = 0 ∨ mid ≼ α ∨ mid ≽ ω ∨ #[f(mid), y) > dM then lo := mid
6:   else if ng = 1 ∧ (wM > t ∨ f(succ(α)) ≺ f(α)) then
7:     if f(ω) ≽ y then
8:       if f(α) ≽ y then hi := α
9:       else if f(succ(α)) ≺ f(α) then lo := mid
10:      else if lo ≺ α then lo := α
11:      else break
12:      end if
13:    else lo := ω
14:    end if
15:  else if wM ≤ t then
16:    b := findfmax(f, wM, lo, mid);
17:    ⊲ b ∈ [max{lo, pred^{wM}(mid)}, mid] ∧ ∀x ∈ [max{lo, pred^{wM}(mid)}, mid] : f(x) ≼ f(b)
18:    if f(b) ≽ y then hi := b
19:    else lo := mid
20:    end if
21:  else
22:    z := logsearch lb(f, dM, lo, mid, y, s);
23:    ⊲ z ∈ [lo, mid] ∧ (lo ≺ z =⇒ #[f(z), y) > dM)
24:    if lo ≺ z then lo := z
25:    else break
26:    end if
27:  end if
28: end while

value. If f(ω) ≺ y, then the function cannot reach y before ω, and we set lo to ω. Otherwise, we set hi to α if f(α) ≽ y, and continue searching for y in the lower part of the interval. If f(α) ≺ y and the glitch starts at α, then even if mid is in the glitch, there cannot be values of f reaching y before mid. Otherwise, we set lo to α.
– If mid may be in a glitch narrower than t, function findfmax finds the value b inside interval [max{lo, pred^{wM}(mid)}, mid] where f(b) is maximal. b is then used to refine [lo, hi].
– If mid may be in a glitch wider than t, bisect lb refrains from running the expensive float-by-float search performed by findfmax, and calls logsearch lb. This function finds, if it exists, a value z ∈ [lo, mid] such that #[f(z), y) > dM. If z is found, it is used to refine [lo, hi]. logsearch lb performs a logarithmic search to find a value z as above. Its argument s ∈ N is (for efficiency reasons) a limit to the number of times logsearch lb can return an excessively wide interval as a refinement of [xl, xu]. If s has not been reached yet, logsearch lb starts with z = mid and decreases it (e.g., by dividing it by 2) until #[f(z), y) > dM. Otherwise, it sets z to lo.

The post-condition of bisect lb ensures that xl ≼ lo ≺ hi ≼ xu, f(lo) ≺ y ≼ f(hi) and ∀x ∈ [xl, lo] : f(x) ≺ y hold when it terminates. At line 19 of lower bound, a while loop performs a float-by-float search (for at most t iterations) to approach the exact solution of y = f(x). Afterwards, the loop invariant ∀x ∈ [xl, lo] : f(x) ≺ y holds. An if block (lines 23-25) tests whether an optimal solution of y = f(x) was found. If the else statement is reached, then the while loop terminated because t reached 0. In this case, l is set to lo, a suboptimal solution of y = f(x).
The pseudocode of functions init, gallop lb, linsearch geq, findfmax and logsearch lb is not shown, because they are straightforward; a minimal sketch of init and gallop lb is given below.
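The following is our own C++ sketch of init and of the hi-side of gallop lb (the galloping step and the handling of corner cases are assumptions, not the paper's code):

#include <cmath>
#include <functional>

using Fn = std::function<float(float)>;

// Returns f_inv(y) if it falls inside [xl, xu]; the middle point otherwise.
float init(const Fn& f_inv, bool has_inverse, float y, float xl, float xu) {
  if (has_inverse) {
    float x = f_inv(y);
    if (xl <= x && x <= xu) return x;
  }
  return xl + (xu - xl) / 2;
}

// Galloping search for hi: start from i and grow the distance from xl until
// f(hi) >= y; if no such value is found, settle for hi = xu.
float gallop_lb_hi(const Fn& f, float y, float xl, float xu, float i) {
  float hi = i;
  while (hi < xu && f(hi) < y) {
    float step = (hi > xl) ? (hi - xl) : 1.0f;       // double the span covered
    float next = (xu - hi > step) ? hi + step : xu;  // clamp at xu
    hi = (next > hi) ? next : std::nextafterf(hi, xu);
  }
  return hi;  // either hi = xu, or hi < xu and f(hi) >= y
}

// The symmetric search for lo, which stops when #[f(lo), y) > dM, is analogous.

The next results state that Algorithms 2, 3 and 4 are correct.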

Lemma 51 Whenever function findhi lb of Algorithm 3 is called on actual parameters satisfying the Require condition, all values computed by findhi lb satisfy the Ensure condition.

Proof. (Sketch) The proof begins by assuming that the precondition of findhi lb(f, y, [xl, xu], ng, dM, wM, α, ω, t) is satisfied: in particular, f(xu) ≺ y. Then, a case analysis on the values of ng, wM, f(xu), f(α) and succ(α) determines whether a value for hi suitable for bisection can be computed with at most t iterations. In these cases, function findhi lb returns r = 1. Otherwise, if at least a new value l such that ∀x ∈ [xl, l] : y ≻ f(x) can be found, function findhi lb returns either r = 0 or r = 2. When r = 0, the value of l is set to xu.

Lemma 52 Whenever function bisect lb of Algorithm 4 is called on actual parameters satisfying the Require condition, the values computed by bisect lb satisfy the Ensure conditions.

Proof. (Sketch) We assume that the precondition of bisect lb(f, y, [xl, xu], ng, dM, wM, α, ω, s, t, lo, hi) is satisfied and consider the following while loop invariant:

Inv ≡ (xl ≼ lo ≺ hi ≼ xu) ∧ (f(lo) ≺ y ≼ f(hi)) ∧ (∀x ∈ [xl, lo] : f(x) ≺ y).
The schema of the proof consists in proving the following properties of the while loop of bisect lb.
Initialization: Inv holds prior to the first loop iteration. This is true because it is entailed by the Require statement.
Maintenance: assuming that Inv holds at the beginning of an arbitrary loop iteration, we prove, by case analysis, that Inv also holds at the end of that iteration.
Termination: we prove that #[lo, hi] decreases at each iteration. Since the guard of the while loop at line 1 tests the condition #[lo, hi) > 1, which is equivalent to #[lo, hi] > 2, the loop is guaranteed to always terminate.
Correctness: the correctness post-condition coincides with invariant Inv. To prove the precision post-condition we show that, at the exit of the loop, succ(lo) = hi holds. Therefore, by Inv, we have y ≼ f(hi), which implies f(succ(lo)) ≽ y.
As a consequence of the precision post-condition, the following result also shows that, when function f is isotonic or has glitches narrower than t, Algorithm 2 finds a precise solution, i.e., it returns either r = 3 or r = 4.
Theorem 53. Whenever function lower bound of Algorithm 2 is called on actual parameters satisfying the Require condition, the values computed by lower bound satisfy the Ensure conditions.

Proof. (Sketch) We assume that the precondition of lower bound(f, y, [xl, xu], ng, dM, wM, α, ω, f^i, s, t) holds. In order to prove the correctness post-condition we proceed as follows. First, we prove that, when function gallop lb is called at line 2, the actual parameters satisfy its Require conditions. Then, we know that function gallop lb(f, y, [xl, xu], dM, i) returns values for lo and hi satisfying the post-condition of line 3. Now, a case analysis on the comparison between the value of f(lo) and y on lines 4 and 10 directly proves the post-condition for r = 1 and r = 4. The next step is to prove that the precondition of findhi lb is satisfied. Then, by Lemma 51, the post-condition of findhi lb holds. Therefore, when findhi lb terminates, the post-conditions for r = 0 and r = 2 are proved. The last step is proving that, for r ≠ 0 and r ≠ 2, the preconditions of function bisect lb are met. By Lemma 52, after it returns, xl ≼ lo ≺ hi ≼ xu, f(lo) ≺ y ≼ f(hi) and ∀x ∈ [xl, lo] : f(x) ≺ y hold for the new values of lo and hi. At line 19 a while loop is entered. This loop performs a float-by-float search (for a maximum of t iterations) to approach the exact solution of y = f(x). We prove that the predicate ∀x ∈ [xl, lo] : f(x) ≺ y, which is also the loop invariant, holds at line 22. To this aim, we prove the following loop properties:

Initialization: the invariant ∀x ∈ [xl, lo] : f(x) ≺ y holds prior to the first loop iteration because it is entailed by the post-condition of function bisect lb.

Maintenance: we assume that ∀x ∈ [xl, lo] : f(x) ≺ y holds at the beginning of an arbitrary loop iteration. This assumption, together with the guard of the loop f(succ(lo)) ≺ y and the assignment lo′ := succ(lo) in the body of the loop, allows us to conclude that ∀x ∈ [xl, lo′] : f(x) ≺ y also holds at the end of the iteration.
Termination: the loop terminates because, by the post-condition of bisect lb, there exists a value hi such that lo ≺ hi and y ≼ f(hi). Moreover, the loop can end before reaching such a value, because the parameter t ∈ N is decremented inside the loop until it reaches 0.
Correctness: as a consequence, at the end of the loop, the property ∀x ∈ [xl, lo] : f(x) ≺ y holds, and either f(succ(lo)) ≽ y or t = 0.
Finally, the test on f(succ(lo)) ≻ y at line 23 allows us to prove the post-condition for r ∈ {2, 3, 4} by case analysis.

Therefore, algorithm lower bound ensures the optimality of the bound in the following cases:

– ng = 0: the function is monotonic, or
– wM < t: all glitches are narrower than the maximum linear-search length t, or
– ng = 1 and α = α^f: there is a single glitch whose starting point is known exactly.

Theorem 54. If f : F → F is an isotonic function, i.e., ng = 0, then, for each [xl, xu] ∈ IF, dM, wM, α, ω, f^i : F → F, s, t ∈ N, computing lower bound as per Algorithm 2 evaluates f at most 2 log2(#[xl, xu]) + 4 times. For instance, since a single-precision interval satisfies #[xl, xu] < 2^32, f is evaluated at most 68 times.

Moreover, if f has at least one glitch and wM ≤ t, the constant k is strictly related to wM.

Theorem 55. If f : F → F has short glitches, that is, ng > 0 but wM ≤ t, then computing lower bound as per Algorithm 2 evaluates f a number of times that is still proportional to log2(#[xl, xu]), with a proportionality constant k related to wM.

6 Trigonometric Functions

The propagators for trigonometric functions (i.e., the floating-point approximations of sin, cos and tan) require a more complex approach to quasi-monotonicity. The underlying (partial) functions in R ֌ R change their monotonicity periodically. In particular, the sin function changes its monotonicity in odd multiples of π/2 (of the form (2k + 1)π/2, k ∈ Z), while the cos function does so in even

multiples (of the form 2k · π/2). The tan function has asymptotes in odd multiples of π/2, and in the intervals between them it is isotonic. Because of this behaviour, Definition 6 fails to distinguish glitches caused by the implementation from “legitimate” monotonicity changes. However, if we separately consider a quasi-monotonic branch of the periodic function that is significantly wider than the widest glitch in terms of ULPs, we can locally apply Definition 6. If we limit our reasoning to each quasi-monotonic branch separately, all the statements we made for quasi-monotonic functions hold locally, and we can use the same methods we developed for them. To properly identify monotonicity glitches in trigonometric functions, we need:

– an interval in which the density of floating-point values in every quasi-monotonic branch of the functions’ graphs is sufficiently high;
– a way to split their domain into quasi-monotonic branches.

6.1 An Appropriate Domain for Trigonometric Functions Analysis

To choose a domain suitable for the search of monotonicity glitches, we must consider that, while the period of a trigonometric function is constant, the distance between two consecutive floating-point numbers increases with the exponent. Such distance is expressed by the value of the function ulp : R → R; we will use the definition of ulp given in [Mul05, Definition 5]. A floating-point interval in which the idea of a monotonicity glitch is well defined should have a sufficient cardinality to allow for glitches that do not cover the entire interval. Let x ∈ F be a positive normal floating-point number: then ulp(x) = x − pred(x) = 2^{e_pred(x) − p + 1}, where p is the precision of the format, and e_pred(x) the exponent of pred(x). For trigonometric functions, the size in R of the intervals in which the function has constant monotonicity is π. If we consider an interval [−ℓmax, ℓmax] in which for each x ∈ [−ℓmax, ℓmax] we have ulp(x) ≤ 0.5, then each monotonic branch contains at least 6 or 7 floats, which is acceptable for the propagators described in Section 5, if glitches have a width of 1 or 2 floats. In intervals with a higher ulp value, the notion of glitches would be hardly meaningful. So, we use a maximum exponent e_ℓmax = p − 1, leading to a domain [−ℓmax, ℓmax] with ℓmax = 2^{p−1}. In conclusion, ℓmax = 2^23 for the IEEE 754 single-precision format, and ℓmax = 2^52 for double-precision. As we noted in Section 1, domains like these are still excessively large for most real-world applications. For the experiments reported in Section 7.2, we used a bound ℓmax = 16, for example.
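The bound is easy to check numerically; the following C++ snippet (ours, assuming IEEE 754 binary32, hence p = 24) evaluates ulp, as defined above via pred, at ℓmax = 2^23 and just beyond it:

#include <cmath>
#include <cstdio>

float ulp_pred(float x) {                    // ulp(x) = x - pred(x)
  return x - std::nextafterf(x, -INFINITY);
}

int main() {
  const float lmax = 8388608.0f;             // 2^23 = 2^(p-1), with p = 24
  std::printf("ulp(lmax)       = %g\n", ulp_pred(lmax));        // prints 0.5
  std::printf("ulp(succ(lmax)) = %g\n",
              ulp_pred(std::nextafterf(lmax, INFINITY)));       // prints 1
}

Past ℓmax the spacing reaches a full unit, so a monotonic branch of width π would contain only 3 or 4 floats, too few for the notion of glitch to be meaningful.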

6.2 Outline of the Propagation Algorithms

In this section, we describe the projection algorithms we have devised for trigonometric functions.

Direct Propagation The periodicity of trigonometric functions poses a fundamental issue: in each monotonic branch the function can cover its whole range. Therefore, if the interval [xl, xu] used to refine y spans multiple branches, almost no refinement can be performed. However, since floating-point numbers become sparser as the exponent grows, some branches may not reach the ends of the range, because in such branches there are no points where the function takes those values. Our algorithm takes advantage of these facts.
First, it identifies the branches of the graph of function f to which xl and xu belong. Let c ∈ R: with [c]↑ we denote the upper floating-point approximation of c, i.e., [c]↑ = min{ x ∈ F | x ≥ c }. Similarly, with [c]↓ we denote the lower floating-point approximation of c, so that [c]↓ = max{ x ∈ F | x ≤ c }. Given x ∈ F, to identify the branch it belongs to, we compute k = ⌈x · 2/π⌉ with sufficient precision. This can be achieved with a range-reduction algorithm, as described in [BCGB16], or [Ng92,Mul16]. Then, if k is odd and f = cos, or if k is even and f = sin or f = tan, k is incremented. The value of k is then such that f changes its monotonicity or has a discontinuity in kπ/2 and (k − 2)π/2, and x ∈ [[(k − 2)π/2]↑, [kπ/2]↓]. A sketch of this branch identification is given below.
If xl and xu are both in the same monotonic branch, the refinement algorithm for regular functions described in Section 5.1 is called. Otherwise, the refinement function should be called separately for each branch, and then the minimum value of yl and the maximum value of yu should be returned. Since the number of branches to be separately inspected can be high, a threshold g is imposed on the maximum number of branches to be analyzed. If [xl, xu] spans more than g branches, the bounds of the function’s range are returned.
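The following C++ sketch illustrates the branch identification just described; it naively computes k = ⌈2x/π⌉ in double precision, which is adequate only for small |x| — a real implementation must obtain k through the cited range-reduction algorithms:

#include <cmath>
#include <cstdio>

enum class Trig { Sin, Cos, Tan };

// Returns k such that x lies in the monotonic branch [(k-2)*pi/2, k*pi/2] of f.
long branch_index(double x, Trig f) {
  long k = static_cast<long>(std::ceil(x * 2.0 / M_PI)); // naive: small |x| only
  bool k_odd = (k % 2 != 0);
  if ((k_odd && f == Trig::Cos) || (!k_odd && f != Trig::Cos))
    ++k;                         // align k with the monotonicity changes of f
  return k;
}

int main() {
  // sin changes monotonicity at pi/2 ~ 1.5708, so 1.0 and 2.0 fall
  // in different branches: this prints "1 3".
  std::printf("%ld %ld\n", branch_index(1.0, Trig::Sin),
                           branch_index(2.0, Trig::Sin));
}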

Indirect Propagation Algorithm 5, for indirect propagation, refines the interval for x by splitting its initial domain [xl, xu] into intervals in which the function graph is monotonic, and then applies lower bound and upper bound locally, to obtain a refined multi-interval. Interval [xl, xu] can be wide, and the number of intervals it has to be split into can be excessively large. So, Algorithm 5 limits their number to parameter g. The intervals closest to 0 are split with maximum granularity (each monotonic branch is considered separately), because floating-point numbers in this area are denser, and the probability of finding an optimal solution is higher. The remaining branches are gathered into two larger intervals, one to the left of the domain, starting with xl, and one to the right, ending with xu. These intervals are only refined in branches at the boundaries.
Function split interval returns this split of the intervals, which depends on function f. If f has g or more monotonic branches in [xl, xu], then split interval returns a list of pairs ([x^i_l1, x^i_u1], [x^i_l2, x^i_u2]), 1 ≤ i ≤ g. If f = cos, those with 2 ≤ i ≤ g − 1 are such that x^i_l1 = x^i_l2 = max(xl, [2ki · π/2]↑) and x^i_u1 = x^i_u2 = min(xu, [(2ki + 2) · π/2]↓), with [2ki · π/2]↑ ≥ xl and [(2ki + 2) · π/2]↓ ≤ xu, where the distinct values ki ∈ Z are those closest to 0 in magnitude, and ki+1 = ki + 1. For i = 1, we have [x^1_l1, x^1_u1] = [xl, [(2k1 + 2) · π/2]↓], where k1 ∈ Z is the highest value such that [(2k1 + 2) · π/2]↓ ≥ xl, and [x^1_l2, x^1_u2] = [[2k1 · π/2]↑, [(2k1 + 2) · π/2]↓].

Algorithm 5 Indirect propagation: bounds trig(f, f^i, [yl, yu], [xl, xu], ngM, dM, wM, α, ω, g, s, t)
Require: f : F → F, f^i : F → F, [xl, xu], [yl, yu] ∈ IF, ngM ≥ ngM^f, dM ≥ dM^f, wM ≥ wM^f, α ≼ α^f, ω ≽ ω^f, ngM > 0 =⇒ (xl ≼ α ≼ ω ≼ xu), g, s, t ∈ N.
Ensure: |Ir| ≤ g, ⋃_{0 ≤ i ≤ |Ir|} [x^i_l1, x^i_u2] = [xl, xu], ∀0 ≤ i ≤ |Ir| : l^i, u^i ∈ F
∧ (0 ≤ r^i_l ≤ 4 ∧ (p_{r^i_l}(yl, x^i_l1, x^i_u2, l^i) ∨ p_{r^i_l}(−yu, x^i_l1, x^i_u2, l^i)))
∧ (5 ≤ r^i_u ≤ 9 ∧ (p_{r^i_u}(yu, x^i_l1, x^i_u2, u^i) ∨ p_{r^i_u}(−yl, x^i_l1, x^i_u2, u^i))).
1: I := split interval(f, xl, xu, g); Ir := ∅;
2: for all ([x^i_l1, x^i_u1], [x^i_l2, x^i_u2]) ∈ I do
3:   if isotonic(f, [x^i_l1, x^i_u1]) then
4:     (r^i_l, l^i) := lower bound(f, yl, [x^i_l1, x^i_u1], ngM, dM, wM, α, ω, f^i, s, t)
5:   else
6:     (r^i_l, l^i) := lower bound(−f, −yu, [x^i_l1, x^i_u1], ngM, dM, wM, α, ω, f^i ◦ (−id), s, t)
7:   end if
8:   if isotonic(f, [x^i_l2, x^i_u2]) then
9:     (r^i_u, u^i) := upper bound(f, yu, [x^i_l2, x^i_u2], ngM, dM, wM, α, ω, f^i, s, t)
10:  else
11:    (r^i_u, u^i) := upper bound(−f, −yl, [x^i_l2, x^i_u2], ngM, dM, wM, α, ω, f^i ◦ (−id), s, t)
12:  end if
13:  Ir := Ir ∪ {([x^i_l1, x^i_u1], [x^i_l2, x^i_u2], [l^i, u^i], r^i_l, r^i_u)}
14: end for

The last pair, with i = g, is symmetric. If f has less than g monotonic branches, exactly those sub-intervals are returned. For functions sin and tan, which change monotonicity in odd multiples of π/2, 2ki is replaced with 2ki + 1. The multiplications by π/2 are done with enough digits of π to be correctly rounded.
Functions lower bound and upper bound are called at lines 3–12, distinguishing whether f is isotonic or antitonic in each interval. The arguments required by Algorithm 5 are essentially the same as those of lower bound, except for ngM, which is a safe approximation of the maximum number of glitches in each quasi-monotonic branch of the function.
lower bound and upper bound can only be called if [xl, xu] is a subset of [−ℓmax, ℓmax], because they cannot operate on quasi-monotonic branches containing too few floats (see Section 6.1). If Algorithm 5 is applied repeatedly to the same interval, however, a value g > 1 could cause complexity issues: if sub-intervals are discarded because no solutions are found in them, a repeated call of this refinement algorithm would split the domain again and again. To avoid this problem, the algorithm should be called on a further reduced domain. Note that, while this algorithm often succeeds in finding a refined interval if the function takes the desired values y in one of the analyzed sub-intervals, it can say nothing if such values are not found. In this case, we cannot exclude the possibility that the function reaches them somewhere outside [−ℓmax, ℓmax] (or a smaller domain, if chosen). This prevents us from being able to tell when the equation y = f(x) has no solution, which we could do for the regular functions. However, since floating-point numbers are denser near 0, the functions take most of the values of their image in its

31 vicinity. This allows our algorithm to find a solution very often (when present), making it useful for automatic test-data generation.

7 Implementation and Experiments

In this section we first describe the implementation of the algorithms introduced in this paper. Then, we show the results of its experimental evaluation. The main research question we aim to answer in this section regards the feasibility of our approach. This aim can be split into the following research questions, which concern different aspects of the problem:
– RQ1: In what way can our approach be useful to a programmer?
– RQ2: How effective is our approach in proving or disproving the possible occurrence of unwanted behaviors in floating-point computations?
– RQ3: What is the performance of our approach? How long does it take to perform the activities investigated by the previous questions?
We answer RQ1 in Section 7.2, by means of a case study, and RQ2 and RQ3 in Section 7.3.

7.1 Implementation

All the algorithms presented in this paper have been implemented and included in the ECLAIR software verification platform for C/C++ source code, Java source code and bytecode.16 For the analysis of integer and floating-point values, ECLAIR mainly uses multi-intervals, with a judicious use of the polyhedral approximations made available by the Parma Polyhedra Library (PPL) [BHZ08]. For reasoning on the floating-point arithmetic operations, ECLAIR uses:
– algorithms realizing optimal direct projections as well as correct and precise indirect projections: the result is similar to the projections defined in [Mic02], but the ECLAIR algorithms never require working with precision greater than that of the operation involved;
– algorithms that exploit properties of the binary floating-point representations in order to obtain enhanced precision [BCGG16];
– dynamic linear relaxation techniques [BMR12,DGD07] using the PPL to enhance constraint propagation with the relational information provided by convex polyhedra.

The algorithms defined in Section 5 have been implemented in C++ and extensively tested on a variety of implementations with different characteristics in terms of the presence and nature of monotonicity glitches. These algorithms are now used in three components of the platform instantiation for C/C++: the semantic analysis engine based on abstract interpretation [CC77], the automatic generator of test inputs, and the symbolic model checker, the latter two being both

16 http://bugseng.com/products/eclair, last accessed on July 16th, 2020.

based on constraint solving [GBR98,GBR00]. All components use multi-interval refinement, though in different ways: the test generator and the symbolic model checker are driven by labeling and backtracking search. As these have a negative interaction with the searches controlled by the s and t parameters of the algorithms, these components need to control their setting more carefully (they are best set to 0 when glitch data is precise), whereas the semantic analysis engine uses values in the range 5–20. Here, we report on experiments with the symbolic model checker and the automatic generator of test inputs. One interesting feature of the platform is that it optionally produces a transformed source program that contains the original program, suitably instrumented, and a driver that runs one of the generated tests (or model-checking counterexamples) at a time. The instrumented code checks that each one of the generated tests achieves its target, e.g., it reaches a certain program point, or it causes an integer overflow or the generation of a floating-point NaN or infinite value. The validation of test inputs is thus completely automatic.
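As an illustration of the kind of check the instrumented driver performs (our sketch, not ECLAIR's actual generated code), a numeric-to-NaN target at a program point may be validated as follows:

#include <cmath>
#include <cstdio>
#include <cstdlib>

// Wrapper for an instrumented floating-point operation: the generated test
// achieves its target if the inputs are numeric but the result is NaN.
double checked_div(double a, double b, const char* point) {
  double r = a / b;
  if (!std::isnan(a) && !std::isnan(b) && std::isnan(r)) {
    std::printf("numeric-to-NaN transition reached at %s\n", point);
    std::exit(EXIT_SUCCESS);                // the test input hit its target
  }
  return r;
}

int main() { checked_div(0.0, 0.0, "p6"); } // 0.0/0.0 yields NaN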

7.2 Case Study

To better illustrate the potential of the algorithms in Sections 5 and 6, let us consider again the introductory example of Figure 1. We show how our tool can support the workflow of a programmer in checking whether such code presents unwanted behaviors or not. Let us pretend we know nothing about the code (which is realistic, as there are no comments besides the one at line 5). So, we initially assume that the entry point is latlong_utm_of(); as there are no assertions, we also assume all inputs are possible. For an exploratory analysis, we use ECLAIR’s symbolic model checker in order to detect the possible presence of run-time anomalies: overflow, division by zero and other sources of undefined and implementation-defined behavior over the integers, inexact integer-to-floating conversions, finite-to-infinite and numeric-to-NaN transitions over floating-point numbers. A finite-to-infinite (resp., numeric-to-NaN) transition is a computation whereby the inputs to a floating-point operation or math.h/cmath function are finite (resp., numeric) and the output is infinite (resp., NaN). We also set an analysis parameter asking ECLAIR to flag all the invocations of trigonometric functions whose argument has an absolute value greater than, say, 16. Not surprisingly, we obtain three test inputs showing that this is indeed possible. They concern the following program points:
p2 : (−0x864880.p−18F, +0.0F, 1), where −0x864880.p−18F ≈ −33.570801,
p3 : (−0x8a3ae7.p−19F, +0.0F, 1), where −0x8a3ae7.p−19F ≈ −17.278761,
p5 : (+0x96d12f.p−21F, −0x98b6c1.p−19F, 1), where +0x96d12f.p−21F ≈ 4.713035 and −0x98b6c1.p−19F ≈ −19.089235.

Of course, the latter input causes the same phenomenon at program point p7 as well. Perhaps latlong_utm_of() callers only pass smaller values for phi and lambda. Even if that is not the case, then perhaps the only problem is a

slight precision issue. But ECLAIR produces two other test inputs, with the specification that they trigger numeric-to-NaN transitions:
p1 : (+0xc90fdb.p−23F, +0.0F, 1), where +0xc90fdb.p−23F ≈ 1.570796,
p4 : (−0xb63223.p−35F, +0xcfbb98.p−23F, 1), where −0xb63223.p−35F ≈ −3.48 · 10^−4 and +0xcfbb98.p−23F ≈ 1.622912.

As +0xc90fdb.p−23F ≈ 1.570796, converted to double precision, is slightly greater than M_PI_2, the round-to-nearest, double-precision approximation of π/2 defined in math.h, we make the hypothesis that phi has to be less than or equal to M_PI_2. Indeed, looking at the function’s callers (there is only one in the program), we come to the realization that phi and lambda are a latitude and a longitude in radians, respectively. This part of the analysis took 27.81 seconds. We attempt validation of this hypothesis by adding the assertions
assert(-M_PI_2 <= phi && phi <= M_PI_2);
assert(-M_PI <= lambda && lambda <= M_PI);
at the beginning of latlong_utm_of() and repeat the analysis. After 22.51 seconds, we obtain another ill-conditioned trigonometric function argument test input for program point p5:
p5 : (−0xc8f7db.p−23F, +0xc90fda.p−22F, 255), where −0xc8f7db.p−23F ≈ −1.570064 and +0xc90fda.p−22F ≈ 3.141593

(surely utm_zone = 255 is not among the expected inputs) and another numeric-to-NaN transition:
p4 : (−0xb63223.p−35F, +0xcfbb98.p−23F, 1), where −0xb63223.p−35F ≈ −3.48 · 10^−4 and +0xcfbb98.p−23F ≈ 1.622912.
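As an aside, the test inputs above are written as C hexadecimal floating constants, which can be decoded with a couple of lines of C++ (snippet ours; hexadecimal floating literals require C++17):

#include <cstdio>

int main() {
  std::printf("%f\n", -0x864880.p-18F);  // -33.570801 (input for p2)
  std::printf("%f\n", +0xc90fdb.p-23F);  //   1.570796, just above pi/2 (p1)
}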

In order to understand the intended inputs of latlong_utm_of(), we now take into account its calling context, summarized in Listing 1.1. The inputs to latlong_utm_of() depend on two 32-bit signed integers, gps_lat and gps_lon, that are received from a communication channel: no check is made on them after reading their values out of the input buffer. Taking into account the caller context, in 25.73 seconds ECLAIR generates three reports. If gps_lat = 0 and gps_lon = −1920000000 at line 20 in Listing 1.1, then the conversion in the assignment marked with ‘∗’ on the same line causes an unsigned wraparound (−1 mod 256 = 255, so that, yes, latlong_utm_of() can be called with utm_zone = 255). The same input also generates an ill-conditioned trigonometric function argument for program point p5 in Figure 1. Most importantly, if gps_lon = 900000059 and gps_lat = −1920000000, then we have a numeric-to-NaN transition at program point p1. This probably means that if the equipment at the other end of the communication channel is defective, or if there is a communication error, things can go horribly wrong. However, let us now suppose that there are no problems of this kind and that we have |gps_lat · 10^−7| ≤ 90

Listing 1.1. Calling context of latlong_utm_of()
1  int gps_lat, gps_lon; /* 1e7 deg */
2  unsigned char nav_utm_zone0;
3  /* [...] */
4  static gboolean
5  read_data(GIOChannel *chan, GIOCondition cond, gpointer data) {
6    int count;
7    char buf[BUFSIZE];
8
9    /* receive data packet containing formatted data */
10   count = recv(sock, buf, sizeof(buf), 0);
11   if (count > 0) {
12     if (count == 23) {
13       // FillBufWith32bit(com_trans.buf, 1, gps_lat);
14       gps_lat = buf2uint(&buf[0]);
15       // FillBufWith32bit(com_trans.buf, 5, gps_lon);
16       gps_lon = buf2uint(&buf[4]);
17       /* [...] */
18
19
20       nav_utm_zone0 = (gps_lon/10000000+180) / 6 + 1;  /* ∗ */
21       latlong_utm_of(RadOfDeg(gps_lat/1e7), RadOfDeg(gps_lon/1e7),
22                      nav_utm_zone0);

and |gps_lon · 10^−7| ≤ 180, as the code seems to assume. In 27.18 seconds, the analysis with ECLAIR shows this is not enough: the numeric-to-NaN transition at program point p1 is still possible with gps_lat = −899999991 and gps_lon = −1800000000 (this point is roughly 10 cm from the Geographic South Pole). In a couple more iterations we add the assertions

18’ assert( -899999990 <= gps_lat && gps_lat <=  899999990);
19’ assert(-1800000000 <= gps_lon && gps_lon <= 1800000000);

and the final ECLAIR run shows no report. This, per se, does not mean much. However, this experiment was performed on the xps machine, for which we have precise glitch data for the single-precision functions (which are not used in the code considered), and we have the maximum known errors provided by the GNU libc manual for the double-precision functions [LSM+16]. As explained in Section 9.2, this data provides imprecise and possibly incorrect information about glitches that our algorithms can exploit. In turn, all this means that:

– if the numbers in [LSM+16] really do provide upper bounds to the maximum errors of the used functions, and
– if the caller guarantees that the values of gps_lat and gps_lon satisfy the “stay away from the poles” assertions at lines 18′–19′ of the modified Listing 1.1,
– then, in the context of such a call, none of the 154 potential run-time anomalies in the 90 potentially problematic program points of Figure 1 can occur on xps.

More precisely, these anomalies consist of 4 integer overflows, 4 inexact conversions, 10 ill-conditioned trigonometric function arguments, 70 finite-to-infinite and 66 numeric-to-NaN transitions. Just to mention one potential problem: the division by zero, and the consequent finite-to-infinite transition, at program point p6 cannot happen on that implementation.

7.3 Feasibility Evaluation

To better assess the capabilities of our approach, we analyzed with ECLAIR a benchmark that we assembled by taking code from the GNU Scientific Library (GSL),17 from AxBench [YMELK17], a benchmark suite popular in approximate computing, and from a test suite that we created to evaluate the performance of our algorithms in the most disparate ways. We also included the code of the Paparazzi UAV avionics library from Figure 1. Since our purpose is to evaluate mainly the propagators for the supported math.h/cmath functions, we selected only code that contains at least two calls to such functions, and we excluded code containing unsupported functions, such as pow. We tried to assemble a rather varied benchmark suite, containing code from different application domains. Indeed, GSL is a library for scientific computation, while from AxBench we picked code from the finance (blackscholes.c) and robotics (inversek2j.c) application domains. While the other benchmarks are aimed at evaluating our approach on real-world code, our self-made test suite contains various computations aimed at generating constraint systems that are difficult to solve.
For each floating-point operation, ECLAIR tries to prove that the operation may not generate any finite-to-infinite or numeric-to-NaN transition. If it fails, it generates a counterexample, i.e., a program input that causes such a transition. We limited the maximum number of iterations of the constraint solving process (i.e., imax of Algorithm 1, Section 3.2) to 200. If this threshold is reached, ECLAIR times out. Each generated input is automatically validated by executing the original program. This showed that all inputs generated by ECLAIR actually trigger the intended behaviour in the original code, so we can claim that no false positives were generated.
The analyses were executed on machine xps, a high-end laptop with an x86_64 CPU (6 cores @ 2.20 GHz) and 16 GB of RAM, running Ubuntu 19.10. This could be a typical hardware setting for a software developer. ECLAIR does not yet support multi-threaded constraint solving, so only one CPU core is used at a time. We report the results in Table 2.

RQ2: Detection of anomalous behaviors RQ2 asks how effective our approach is in proving or disproving the possible occurrence of unwanted behaviors in floating-point computations. In particular, we want to assess the proportion of floating-point operations for which ECLAIR is able to generate an answer without timing out.

17 https://www.gnu.org/software/gsl/, last accessed on July 16th, 2020.

Table 2. Benchmark data for the xps machine. For each file we report the number of lines of code (# LOC), the number of finite-to-infinite and numeric-to-NaN transitions, the total time taken by the analysis (T) in seconds, and the time taken by the propagators for math.h/cmath functions (Tm) in milliseconds. For each kind of transition, ECLAIR identified all operations that may potentially trigger them. We report the total number of such operations that ECLAIR proved feasible by generating a test case (g), those that ECLAIR proved unfeasible (u), and those for which ECLAIR timed out (t).

                        finite to +∞    finite to −∞    numeric to NaN       Time
Benchmark        # LOC    g    u    t     g    u    t     g    u    t     T (s)   Tm (ms)
GSL
bessel.c           191    6   74    1     5   75    1     1   90    1     23.79    531.24
bessel_i.c         144    4   96    0     2   92    0     0   94    0      9.83      6.32
bessel_j.c         158    0  109    0     0  109    0     0  114    0     85.18    109.89
bessel_olver.c     185   49  305    7    11  349    1    14  363    0    201.99    267.13
exp.c              426   18  195   14     8  200    2     0  210    0    342.94   7566.29
gegenbauer.c       181   33   68    5    14   84    8     4  104    0    700.79     10.75
lambert.c          219   17   41    3     6   55    3     2   65    0     20.27     56.77
sincos_pi.c        163    4   35    0     2   37    0     2   39    0      0.83      0.32
cauchy.c            57    2    8    0     2    8    0     2    8    0      0.63      6.55
cauchyinv.c         73    5    9    0     7    7    0     4   10    0      1.46    422.39
exponential.c       56    1    0    1     1    0    0     1    0    0      2.36      0.35
exponentialinv.c    36    1    1    0     2    1    0     1    2    0      0.61      1.11
gauss.c            337   31   59    8     3   91    2     0   95    1    308.89     70.51
gaussinv.c         286   11   73   14     6   85    9     2   97    3    125.57      8.69
gumbel1.c           47    4    6    0     4    3    0     2    5    0      0.94      7.58
gumbel1inv.c        59    2    1    0     3    3    0     3    3    0      0.68      3.30
laplace.c           56    2   11    0     2    7    0     2    7    0      0.63      1.72
laplaceinv.c        73    4    6    0     8    6    0     4   10    0      0.95      4.44
logistic.c          56    2   14    0     2    8    0     2    8    0      0.63      2.64
logisticinv.c       59    3    3    0     3    5    0     2    6    0      0.78      1.69
lognormal.c         38   33   61    8     7   93    2     4   97    1    308.58     72.09
lognormalinv.c      65   11   79   14     6   89    9     2  101    3    126.49      8.74
paretoinv.c         59    3    2    0     2    2    0     1    3    0      0.69      3.00
rayleigh.c          36    2    2    1     3    1    0     2    2    0      0.69      0.67
rayleighinv.c       59    2    0    0     2    1    0     2    3    0      0.66      3.50
AxBench
blackscholes.c     292    6  111    8     8  107    7     4   81    2    556.67    699.89
inversek2j.c        26    5   27    0     1   31    0     3   33    0      1.10      3.38
paparazzi.c         93    2   81    1     2   83    2     5   79    1     27.22    759.97
Test suite        3370  234  421   26   125  483   17   105  582    2    135.30   4274.81
Total             6900  497 1898  111   247 2115   63   176 2311   14   2987.16  14905.72

Table 2 shows that a significant number of operations generating infinities or NaNs were found, but, in most cases, ECLAIR was able to prove that no such behavior may occur. The number of timeouts is generally limited, and it is higher in files containing long sequences of floating-point operations with data-flow dependencies, which lead to the generation of large constraint systems. Overall, ECLAIR was able to either prove or disprove, without timing out, the occurrence of 2395 finite to +∞ transitions out of 2506 (96 %), 2362 finite to −∞ transitions out of 2425 (97 %), and 2487 numeric-to-NaN transitions out of 2501 (99 %). Such a low timeout rate makes our approach useful in practice to analyze code bases such as those considered in the benchmark.

RQ3: Performance RQ3 asks about the performance of our approach, i.e., how long it takes to perform the activities investigated by the previous questions. The total time taken by the analysis ranges from less than a second for files with tens of lines of code, up to 12 minutes for files containing hundreds of lines of code. Thus, we can claim that this kind of analysis is most convenient for small computational kernels, but still practically feasible for medium-large ones. We can also observe that, in general, the time taken by the propagators for math.h/cmath functions, i.e., the algorithms of Sections 5 and 6, is negligible with respect to the total time taken by the analysis. This can be partially explained by the fact that, while such propagators have been implemented in C++, the rest of the program analysis and constraint propagation infrastructure has been implemented in (a small subset of) the Prolog programming language. A full implementation in C++, or in another language of comparable performance, could significantly improve the overall execution times.

8 Comparison with the State of the Art

In this section, we compare the techniques presented in this paper with the state of the art, which we summarize below.
Interval consistency techniques have been extended to floating-point computations in [BGM06], with the purpose of symbolic execution. More advanced techniques for refining arithmetic floating-point operations have been proposed in [BCGG16]. All such works only deal with basic arithmetic operations, and do not provide any technique to tackle math.h/cmath mathematical functions.
Interval refinement algorithms for mathematical functions, which take into account all IEEE 754 rounding modes, were introduced in [Mic02], but they require excessively stringent features of the functions’ implementations: they must be correctly rounded and strictly monotonic. As we report in Sections 4 and 9.2, most implementations are far from meeting any of such requirements.
The detection of floating-point anomalies such as the ones we consider in Section 7.2 has been previously tackled in [Min04] by means of abstract interpretation, using linear real-valued approximations of floating-point constraints. This work has been implemented in the commercial tool ASTRÉE.18 [Min04]

18 https://www.absint.com/astree/index.htm, last accessed on July 16th, 2020.

only deals with basic arithmetic operations, and not with math.h/cmath functions. We could find no evidence in the literature of the addition of such features to ASTRÉE afterwards.
The tool Ariadne [BVLS13] performs symbolic execution of floating-point computations by approximating them with real numbers, and solves the resulting constraint systems with an SMT solver. Mathematical functions are also supported, but the fact that they are approximated with reals makes this approach unsound, even if their implementations are correctly rounded and strictly monotonic, because rounding is not taken into account.
Canalyze-fp [WXY+16] also uses symbolic execution to detect floating-point exceptions, and uses the floating-point theory supported by the SMT solver Z3.19 math.h/cmath functions are approximated by just considering their (theoretical) ranges, e.g., a constraint such as y = exp(x) is approximated to y ≥ 0 ∧ y ≤ +∞. Clearly, this approach is trivially sound, but it fails to exploit the peculiarities of the functions to effectively refine variable domains.
[WLZ17] combines symbolic execution with value range analysis to speed up the floating-point exception detection process. While an SMT solver is used for symbolic execution, value range analysis is performed with interval arithmetic. The indirect propagators for math.h/cmath functions employed to refine variable ranges assume correct rounding of their implementations, and do not take into account the effects of rounding, as [Mic02] does. Thus, this approach is also unsound.
To the best of our knowledge, none of the tools above is available to the public. Anyway, as we detailed above, the most advanced treatment of math.h/cmath functions is the one of [Mic02], despite its being less recent than other approaches. To show what kind of issues arise when using such unsound techniques to attempt software verification, we implemented the projections of [Mic02] and integrated them into ECLAIR. Then, we evaluated the differences between [Mic02] and our approach experimentally. The research questions that we seek to answer are the following:
– RQ4: What kind of issues may be caused by an unsound treatment of math.h/cmath functions?
– RQ5: How often do such issues occur in practice?
– RQ6: What is the impact of the two different approaches on the performance of the analyses?
We answer RQ4 in Section 8.1, and RQ5 and RQ6 in Section 8.2.

8.1 RQ4: issues caused by an unsound treatment of math.h/cmath functions

In this section, we demonstrate the type and severity of the issues caused by an unsound treatment of math.h/cmath functions by means of another case study.

19 https://github.com/Z3Prover/z3, last accessed on July 16th, 2020.

The following C function is an implementation of the Gauss error function, taken from [SS14].

float custom_ferf(float x) {
  float sgn_x = signbit(x) ? -1.0F : 1.0F;
  return (2/sqrtf(M_PI)) * sgn_x * sqrtf(1 - expf(-(x*x)))
         * (sqrtf(M_PI)/2
            + (31.0/200.0)*expf(-(x*x))
            + (3481.0/8000.0)*expf(-2*(x*x)));
}

A natural question that arises by looking at this code is whether it may generate NaNs. Consider, e.g., the term sqrtf(1 - expf(-(x*x))). Since the square root function is undefined for negative arguments, its math.h/cmath implementation sqrtf returns a NaN in such a case. But can the result of 1 - expf(-(x*x)) be negative? A very elementary property of the real exponential function exp(x) is that exp(x) ≤ 1 for x ≤ 0, so, considering that -(x*x) ≤ 0 for any finite x, the answer should be no. The same argument holds for a correctly rounded implementation of the expf function, even with rounding mode ‘up’ (i.e., towards positive infinity). This is, indeed, the conclusion reached by applying constraint solving based on the propagators of [Mic02] to this code.
Unfortunately, the implementation of the expf function on the xps machine presents a large glitch surrounding 0 when executed with rounding mode ‘up’. At the left border of this glitch, the function returns a value greater than 1, even with a negative argument: e.g., it returns 1.000001 when evaluated on −2^−149. Our filtering algorithms take this imprecision into account, and ECLAIR correctly points out that an input value of x = −2^−149 causes the call to the sqrtf function to return a NaN, which is propagated in the subsequent computations and returned by the function above.
Our approach handles the quirks of math.h/cmath function implementations soundly and precisely, enabling the discovery of subtle bugs which are nearly impossible to find manually, and which result in false negatives with other state-of-the-art approaches.
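The glitch can be observed directly with a few lines of C++ (our snippet; the printed values depend on the libm in use — on implementations affected by the glitch, such as the one of the xps machine, the first value exceeds 1 and the second is negative):

#include <cfenv>
#include <cmath>
#include <cstdio>

int main() {
  std::fesetround(FE_UPWARD);   // rounding mode 'up'; compile without fast-math
  float x = -0x1p-149F;         // -2^-149
  std::printf("expf(-2^-149)    = %.9g\n", std::expf(x));
  std::printf("1 - expf(-(x*x)) = %g\n", 1.0F - std::expf(-(x * x)));
}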

8.2 Comparison on real-world code

We ran ECLAIR, equipped in turn with our interval refinement algorithms and with those of [Mic02], on the self-assembled benchmark described in Section 7.3. Again, the purpose of the analysis is to prove or disprove the possible occurrence of finite-to-infinite and numeric-to-NaN transitions, and it has been run with rounding mode ‘near’. We report the results of this evaluation in Table 3, and we comment on them with respect to the research questions below.

RQ5: Frequency of issues caused by unsoundness Table 3 shows that, in all benchmark groups, our propagators find more anomalies than those of [Mic02]. Relatively few anomalies are missed by [Mic02] in GSL and AxBench,

20 M_PI is a floating-point approximation of π in math.h/cmath.

Table 3. Comparison of benchmark data for the xps machine. For each file and benchmark group we report the number of potential anomalies discovered, the total time taken by the analysis (T), and the time taken by the propagators for math.h/cmath functions (Tm), both for our filters and for those of [Mic02]. We report the number of potential anomalies that have been proved possible (g), those that were proved unfeasible (u), and those for which there was a timeout (t).

                     Our propagators                        [Mic02]
Benchmark      g    u    t    T (s)    Tm (ms)     g    u    t    T (s)    Tm (ms)
GSL          420 4205  122  2266.86   9167.68    419 4205  123  2255.85   4841.98
AxBench       27  390   17   557.78    703.27     24  390   20   679.80   1279.15
paparazzi.c    9  243    4    27.22    759.97      6  244    6    30.58    663.01
Test suite   464 1486   45   135.30   4274.81    445 1503   47   138.35   3309.77
Total        920 6324  188  2987.16  14905.72    894 6342  196  3104.59  10093.91

while many more are missed in paparazzi.c and in our test suite. Most importantly, in the two latter cases 18 anomalies are mistakenly declared impossible by the analysis based on [Mic02]. Such anomalies may, instead, occur, because the test inputs generated by ECLAIR with our propagators actually trigger them. Overall, the analysis based on [Mic02] misses 26 anomalies, deeming 18 of them unfeasible and timing out on 8 of them.
Since the proportion of missed anomalies is relatively low, the propagators of [Mic02] may still be used for the purpose of test-case generation. However, even this small miss rate is unacceptable in the context of program verification, especially for safety-critical code.

RQ6: Performance comparison The time taken by the propagators for math.h/cmath functions only (Tm) is consistently lower for the approach of [Mic02]. Interestingly, the total time of the analyses (T) shows the opposite. For the GSL group, [Mic02] performs better, but only by a few seconds. For all other groups, the analyses based on our algorithms are faster than those based on [Mic02]. For the AxBench benchmarks, the difference reaches one minute. We manually inspected the way the constraint solving process converges for a few cases in which this difference in analysis time is most pronounced. Our explanation for this behavior is that the higher precision of our algorithms favorably influences the overall constraint solving process, which converges in fewer iterations. In particular, the fact that our indirect propagation algorithms are based on a dichotomic search allows them to prune more values from variable domains in each iteration, decreasing the total number of iterations needed. Of course, this insight is limited to the cases that we analyzed, but we believe it generalizes consistently.

8.3 Related Work

Automated test-case generation for floating-point computations has also been widely studied. The work of [MS76], which originated the field of search-based

testing, searches the input space of the program by numerically maximizing an objective function that represents a given test adequacy criterion. CoverMe [FS17] performs its input-space exploration by minimizing a function representing the code path to be tested, through constrained programming. Symbolic execution [Kin76] is also widely used for structure-based test data generation. KLEE [CDE08,LSC+17] is an LLVM-based symbolic execution engine that leverages several SMT solvers to generate test cases with high code coverage. CORAL [BdA+12] solves constraints generated by symbolic execution with several heuristic strategies combined with interval-based solving. Dynamic Symbolic Execution (DSE) [GKS05] combines symbolic execution with concrete execution of the program. Runtime values gathered from the concrete executions are used when constraint solving fails, e.g., when it times out or encounters unsupported expressions. CUTE [SMA05] uses DSE to generate test data. Other tools combine search-based approaches with DSE. FloPSy [LTHD10] combines DSE with search-based techniques, such as the Alternating Variable Method [Kor90], and evolution strategies. Austin [LHG10] uses symbolic execution combined with heuristic search-based strategies.
The main drawbacks of pure symbolic execution come from the limitations of the underlying constraint solver, which may time out when excessively complex non-linear constraint systems are involved. Search-based methods such as [MS76] and CoverMe generally perform better in this respect, and tools such as CORAL, FloPSy and Austin combine them in different ways with symbolic execution to overcome such issues. Moreover, constraint solvers often do not fully support floating-point arithmetic, and approximate it with real arithmetic, which is unsound [BGM06]. This is the case for CORAL. Only recently have SMT solvers acquired the ability to soundly solve floating-point constraints [BDG+13], which can be exploited by tools based on them, such as KLEE. However, math.h/cmath library functions are always treated as uninterpreted functions, which hinders the accuracy of constraint solving. Tools based on DSE, namely CUTE, FloPSy and Austin, use actual program executions to provide concrete values for such functions. The approach presented in this paper enables solving floating-point constraints soundly, and in a fully static way, i.e., without the need to concretely execute the program, even in the presence of math.h/cmath functions. Combining it with search-based techniques to reduce timeouts due to constraint complexity may be an interesting line of future work, although one only adequate for testing, and not for verification.
Much work has been done on the complementary goal of statically determining the accuracy of floating-point computations. Fluctuat [GMP02] and PRECiSA [MTDn17] estimate error bounds by means of abstract interpretation, FPTaylor [SBB+18] uses symbolic Taylor expansions, Real2Float [MCD17] uses semidefinite programming, Rosa [DK17] uses an SMT solver combined with a novel technique based on Lipschitz continuity, and Daisy [DIN+18] combines many of the earlier approaches. Gappa [DM10] uses interval arithmetic and forward error analysis to prove error bounds. All such tools are not concerned with the detection of floating-point exceptions or the proof of arbitrary assertions, but rather with estimating the error of floating-point computations with respect to their real-

valued counterparts. Among them, only Fluctuat and Daisy support the direct analysis of C code. The main similarity between the work presented in this paper and the above tools is the need for an estimation of program variable domains. However, none of these tools treats math.h/cmath function implementations as precisely as the technique presented in this paper, as they all use real-valued approximations thereof.

9 Discussion and Further Work

In this section we discuss some aspects of the applicability of our proposal, which immediately suggest directions for further work.

9.1 Access to the Target Library

For the purposes of true verification, our approach requires execution access to the mathematical library used by the target. When the host and the target computer coincide, i.e., when the target can run the verifier code, this is no problem. Alternatively, the host computer might provide an implementation that is fully equivalent to the one used on the target: this is the case, e.g., on targets where floating-point support is implemented in software. In other cases, an emulator must be used.
This can be seen as the major drawback of our approach. However, in our opinion, the question should be put in the following terms: in order to verify a piece of code properly using library functions against, say, the absence of run-time anomalies, the library functions have to be fully specified. If a specification of the form “all functions are POSIX-compliant [IEE13] and compute correctly-rounded results” is available, then we have no problem. Otherwise, there really is no other way than supplementing the partial specification available with the missing bits: providing execution access to the library during the analysis, along with correct bounds on the size of the glitches, might well be the least expensive option.

9.2 Obtaining Glitch Data

The other requirement of the approach concerns the availability of (possibly imprecise) information about glitches. Some ways to obtain such information are the topics of the next sections.

Brute Force For single-precision IEEE 754 (unary) functions, the collection of precise glitch data by brute force is perfectly feasible. The glitch data presented in this paper have been obtained by running a program that computes each function on each value of its domain, in ascending order. If the function is quasi-isotonic, each time a value lower than the previous one is found, the program marks the beginning of a glitch, and measures its width and depth by incrementing

appropriate counters until the end of the glitch is found (i.e., until the function yields a value greater than or equal to the one recorded at the beginning of the glitch). The program only keeps track of the maximum width and depth of the encountered glitches, of their number, of the input value at which the first glitch starts, and of the one at which the last glitch ends. This is all the data needed by the algorithms of Section 5; a condensed version of the scan is sketched below.
For the 25 functions studied in this paper, this procedure takes less than two hours on ordinary hardware. With the less powerful CPUs used on embedded systems, it might take ten or twenty times as much. This is not really a problem, as glitch data must be collected only once for each implementation of the math.h/cmath functions. And, especially in safety-critical sectors, the mathematical (and other) libraries will rarely, if ever, be changed once they have been selected. Of course, this method cannot be used for double-precision or extended-precision implementations of the functions.
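A condensed C++ version of the scan may read as follows (our sketch; depth tracking, NaN filtering for functions with partial domains, and the per-function driver are elided for brevity):

#include <cmath>
#include <cstdint>
#include <cstdio>

int main() {
  unsigned n_glitches = 0;                  // ng
  uint64_t max_width = 0, width = 0;        // wM, measured in floats
  float alpha = 0, omega = 0;               // first glitch start, last glitch end
  bool in_glitch = false;
  float glitch_top = 0, prev_y = -INFINITY;

  // Evaluate the function under test (here expf) on every float, ascending.
  for (float x = -INFINITY; x < INFINITY; x = std::nextafterf(x, INFINITY)) {
    float y = std::expf(x);
    if (!in_glitch && y < prev_y) {         // monotonicity violated: glitch begins
      in_glitch = true; glitch_top = prev_y; width = 1;
      if (n_glitches++ == 0) alpha = x;
    } else if (in_glitch) {
      if (y >= glitch_top) {                // back to the pre-glitch level: it ends
        in_glitch = false;
        if (width > max_width) max_width = width;
        omega = x;
      } else {
        ++width;
      }
    }
    prev_y = y;
  }
  std::printf("ng = %u, wM = %llu, alpha = %a, omega = %a\n", n_glitches,
              (unsigned long long)max_width, (double)alpha, (double)omega);
}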

Precision Guarantees When the mathematical library comes equipped with information on the maximum errors of each function (see, e.g., the HA and LA accuracy modes of the Intel Math Kernel Library), such information can be used to determine safe approximations of the required glitch parameters. Recent developments enable the automatic proof of error bounds of math.h/cmath implementations [Har00a,Har00b,LSA18]. We can thus expect that more and more implementations will provide provably correct error bounds that we can directly exploit for program verification.
In fact, given a function and an architecture, the maximum error is measured in ULPs and can be used as an upper bound for the maximal depth of the glitches, dM. Given an interval [xl, xu], the cardinality of the floating-point interval [xl, xu] is an upper bound to the maximum width of the glitches and, of course, xl and xu are safe approximations of where the glitches begin and end. Finally, setting ng > 1 allows us to call the indirect propagation algorithms to refine the interval [xl, xu] of the function domain with respect to a given interval [yl, yu] of the function range. Even with such rough information on the glitches, the algorithms would allow us to refine the interval [xl, xu] using the logarithmic searches (logsearch lb), returning [x′l, x′u] ⊆ [xl, xu] such that f(x′l) is smaller than yl by more than dM ULPs and f(x′u) is bigger than yu by more than dM ULPs. Therefore, the cardinality of the resulting refined interval is related to the growth speed of the considered function. For future work, we intend to investigate how information on the maximum error, coupled with knowledge of the function and of the interval to be refined, allows computing sound and tight bounds to the width of glitches on that interval. A coarse version of this derivation is sketched below.
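In C++, such a coarse derivation could be encoded as follows (the GlitchData layout and all names are ours, for illustration only; the rank-based cardinality assumes IEEE 754 binary32):

#include <cstdint>
#include <cstring>

struct GlitchData {
  unsigned ng;      // (bound on the) number of glitches
  uint64_t dM;      // bound on glitch depth, in ULPs
  uint64_t wM;      // bound on glitch width, in floats
  float alpha;      // at or before the first glitch
  float omega;      // at or after the last glitch
};

static uint32_t rank_of(float x) {  // order-preserving map from floats to integers
  uint32_t u; std::memcpy(&u, &x, sizeof u);
  return (u & 0x80000000u) ? ~u : (u ^ 0x80000000u);
}

static uint64_t cardinality(float xl, float xu) {  // #[xl, xu]
  return (uint64_t)rank_of(xu) - rank_of(xl) + 1;
}

// Safe but very coarse glitch parameters from a maximum-error bound (in ULPs).
GlitchData from_max_error(uint64_t max_err_ulps, float xl, float xu) {
  return { 2,                       // ng > 1: "glitches may exist"
           max_err_ulps,            // the maximum error bounds the depth
           cardinality(xl, xu),     // trivial width bound: the whole interval
           xl, xu };                // glitches may begin and end anywhere
}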

Analysis of the Implementation Transcendental functions are usually implemented with polynomial approximations. When speed is more important than precision, such computations are carried out in the same floating-point format as the function being approximated; otherwise, extended precision can be used to reduce the error. Whereas the total error accumulation can be bounded, the ordinary techniques used do not allow us to relate the rounding errors for different input values to one another. So, if the error bound is small enough to imply monotonicity, fine. Otherwise, as things stand today, we are left with the approach of the previous section. However, we conjecture that (some of) the implementation algorithms can be analyzed with other techniques in order to obtain more precise glitch data: this is another direction for future work.

9.3 Supported Functions in math.h/cmath

For functions that are quasi-monotonic, namely acos, acosh, asin, asinh, atan, atanh, cbrt, cosh, erf, exp, exp10, exp2, expm1, log, log10, log1p, log2, sinh, sqrt, tanh, our approach enables verification on their full domain, provided that the required data on glitches, described in Section 4, are correct and conservative for the math.h/cmath implementation in use. Such data can be gathered as described in Section 9.2. If glitches are sufficiently narrow and shallow, the algorithms of Section 5 are able to refine variable domains in a very precise manner, often optimal, which guarantees a fast convergence of the constraint solving process. Correctly rounded implementations of library functions also fall into this case. If glitches are too large, the results may be less precise, causing slower convergence of constraint solving, but they are still correct. In practice, this may result in more timeouts, i.e., don't knows, during verification, but it can never cause false positives or false negatives. Whether glitches are sufficiently narrow and shallow depends on the parameter t of the algorithms of Section 5, which controls the maximum length of linear searches. If their maximum width, wM, is lower than t, then our algorithms return maximally precise results. The choice of the value of t is thus a trade-off between the computational efficiency of the algorithms and the precision of their results. In our experiments, we found that a value of t = 20 is satisfactory, as it allows us to obtain precise results on most implementations, while maintaining acceptable performance. If glitch data is not precise, our approach cannot be used for verification, as it cannot reliably state that a certain assertion is always satisfied. It can, however, prove that an assertion does not hold, by finding a counter-example, and it can also be used for test-data generation (cf. Section 9.4).

For functions presenting natural monotonicity changes, namely tgamma, lgamma, sin, cos, and tan, verification applies only to a restricted part of the domain. Inside this domain, the same considerations about glitch data we made for the rest of the functions apply. For tgamma and lgamma, such part of the domain is fixed, and it is [2, +∞]. For trigonometric functions, this part of the domain can be chosen by the user, and such choice is mostly influenced by performance considerations. In fact, each quasi-monotonic branch of the graphs of such functions must be analyzed separately by the propagation algorithms, as described in Section 6. Thus, this verification domain must be chosen to contain a reasonable amount of quasi-monotonic branches. In our experiments, we found that a reasonable value for such domain is [−16, +16]. Once such domain has been chosen, verification can be carried out by proving that the inputs to trigonometric functions never fall out of this domain, by introducing appropriate assertions (a sketch follows below). As we noted in Section 1, this is generally not a significant limitation, as in most applications the use of trigonometric functions with arguments of excessive magnitude is discouraged, due to their ULP getting excessively large. In general, since floating-point numbers are most dense around 0, most of the values they return can be found in a limited domain, which makes test generation always feasible with such functions.

So far, we described how to deal with 75 (considering float, double, and long double versions) of the standard C/C++ mathematical functions: but there are many others. Several of them are fully specified, and their treatment poses no problem (e.g., round, trunc, floor, ceil, fma, fabs, next, . . . ). In future work we will focus on the remaining functions, i.e., functions with two inputs, such as atan2 and pow, and complex functions.
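A minimal sketch of such an assertion, with the [−16, +16] domain used in our experiments (the wrapper name is hypothetical):

  #include <assert.h>
  #include <math.h>

  /* The assertion is what a verifier is asked to prove at each call
     site, confining sinf to the domain chosen for verification. */
  float sinf_restricted(float x) {
    assert(-16.0f <= x && x <= 16.0f);
    return sinf(x);
  }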

9.4 Verification vs. Test-Data Generation

In our experimental evaluation we performed model checking, proving important properties of the code at hand. For such results to be achieved, the conditions analyzed in Section 9.3 must all apply. When such conditions do not hold, the correctness of constraint propagation is not guaranteed, leading to the following issues:

1. when the variable domains reach quiescence, they do not contain all existing solutions, but some are missing;
2. the final variable domains contain values that are not solutions.

Due to issue (1), it is not possible to rule out some program behavior when a domain becomes empty during the constraint solving process. However, even when issue (2) occurs, as long as at least one solution is contained in the domains, another important correctness-ensuring technique is possible: automated test-data generation.

The approach we use enables white-box program testing, in the form of symbolic execution-based test data generation [BCGG13]. First, a constraint system is built for each execution path selected by a code-coverage criterion [ZHM97]. Then, the constraint-solving engine is launched. When all variable domains reach quiescence, their contents may be used as test data that cause the execution of the path. An instrumented version of the code to test can be executed with such input values, in order to make sure they actually cause the requested execution path to be followed, ruling out issue (2); a sketch of this confirmation step follows below.

This procedure can also be employed in testing approaches that mix constraint solving with other techniques: concolic testing [SMA05], or white-box fuzzing [GKS05,GLM08,SGS+16]. Indeed, such techniques present an improvement with respect to pure, black-box random testing, due to their greater capability of finding test inputs that trigger specific parts of the code.
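A minimal sketch of the confirmation step: the candidate input produced by the constraint solver is executed on an instrumented version of the code, and the recorded path is compared with the intended one. The function under test, the instrumentation scheme, and all names are ours, for illustration only.

  #include <math.h>
  #include <stdbool.h>
  #include <stdio.h>
  #include <string.h>

  static char trace[17];
  static int  trace_len;
  static void hit(char branch) { if (trace_len < 16) trace[trace_len++] = branch; }

  /* Hypothetical code under test, instrumented at each branch. */
  static float under_test(float x) {
    if (expf(x) > 100.0f) { hit('a'); return logf(x); }
    else                  { hit('b'); return x * x; }
  }

  /* True iff the candidate drives execution through the intended path. */
  static bool confirms_path(float candidate, const char *expected) {
    trace_len = 0;
    (void) under_test(candidate);
    trace[trace_len] = '\0';
    return strcmp(trace, expected) == 0;
  }

  int main(void) {
    /* 5.0f plays the role of a solver-produced candidate for path "a". */
    puts(confirms_path(5.0f, "a") ? "path confirmed" : "spurious input");
    return 0;
  }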

9.5 Better Labeling Strategies for Constraint-Based Reasoning

The constraint-solving algorithms of Section 3.2 operate by interleaving constraint propagation, in which constraints are used to refine variable domains (intervals or multi-intervals in our case), and labeling, whereby a variable is chosen and its domain is partitioned into two or more subsets, each of which is explored separately. It is the second process that drives the first one: when constraint propagation goes to quiescence, i.e., when no further refinement of the domains can be achieved, labeling splits the domain of a chosen variable, triggering a new phase of constraint propagation. This goes on until a solution has been found or one of the domains becomes empty; the sketch below illustrates this interleaving. In this paper we only dealt with constraint propagation, but different labeling strategies have an enormous influence on performance. Unfortunately, there is no such thing as the good labeling strategy: it is a matter of heuristics, and strategies that work well for one problem may work badly for another. Test input generation and model checking give rise to constraint problems of a different nature: while the latter is very often over-constrained (i.e., there are few or no solutions at all, as the program exhibits very few or no run-time anomalies), this is not the case for the former (e.g., a function made of a single basic block can be covered by a single test input chosen more or less at random). Thus, different tasks can profit from the choice of different labeling strategies.

During the experimental evaluation, we strongly felt that the current labeling strategy employed by ECLAIR can be significantly enhanced by defining heuristics that take into account how variables are constrained by invocations to such functions. Work on these new heuristics is ongoing.
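The following self-contained toy (all names ours) illustrates the interleaving on a single variable and a single constraint, x * x = 49 over [0, 10]: a deliberately weak propagator narrows the domain to a small outward-rounded hull, and labeling then splits domains until a float that actually satisfies the constraint is isolated.

  #include <math.h>
  #include <stdbool.h>
  #include <stdio.h>

  typedef struct { float lo, hi; } interval;

  /* Toy propagator for x * x = 49 over x >= 0: intersect the domain with
     an outward-rounded hull of the solution set.  It plays the role of
     "propagation until quiescence" in this tiny setting. */
  static bool propagate(interval *x) {
    float r = sqrtf(49.0f);
    float lo = nextafterf(r, 0.0f), hi = nextafterf(r, INFINITY);
    if (x->lo < lo) x->lo = lo;
    if (x->hi > hi) x->hi = hi;
    return x->lo <= x->hi;             /* false: domain became empty */
  }

  /* Interleave propagation and labeling: propagate, then split the
     domain and explore each half separately. */
  static bool solve(interval x, float *sol) {
    if (!propagate(&x)) return false;
    if (x.lo == x.hi) {                /* singleton domain: check it */
      if (x.lo * x.lo == 49.0f) { *sol = x.lo; return true; }
      return false;
    }
    float mid = x.lo + (x.hi - x.lo) / 2;
    if (mid >= x.hi) mid = x.lo;       /* ensure both halves shrink */
    interval left  = { x.lo, mid };
    interval right = { nextafterf(mid, INFINITY), x.hi };
    return solve(left, sol) || solve(right, sol);
  }

  int main(void) {
    float sol;
    interval x0 = { 0.0f, 10.0f };
    if (solve(x0, &sol)) printf("solution: x = %g\n", sol);
    return 0;
  }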

10 Conclusion

There is a popular quotation in the software verification and validation community, whereby "Without a specification, a system cannot be right or wrong, it can only be surprising!"21 This captures quite well the current state of affairs for C/C++ software that uses the functions declared in the standard math.h/cmath header files. Despite the progress made on the development of correctly rounded functions,22 all implementations in widespread use, especially in the world of embedded systems, offer little or no guarantees about the computed results. As a consequence, the verification of programs using such functions is always painful and expensive and, for these reasons, more often than not it is only partially performed through testing. As the search space can be huge, testing can only cover a tiny fraction of all the possible value combinations: this cannot exclude the manifestation of unexpected results, certainly not with the level of confidence that is required for mission- and safety-critical applications.

The aim of this work is to improve upon the current situation now, i.e., without waiting for the wider adoption of correctly-rounded implementations. While such adoption is generally desirable and will certainly take place, at some stage and in some application domains, it is not clear whether correctly-rounded implementations can meet the efficiency criteria of all application domains, particularly in the field of embedded systems.

21 Paraphrased from [YBK85].
22 See, e.g., the very interesting MetaLibm project at http://www.metalibm.org/, last accessed on July 16th, 2020.

Studying different implementations of the standard C/C++ mathematical functions, we realized that what they have in common is a piecewise quasi-monotonicity property: monotonicity is either preserved or only perturbed by small and, on average, not too frequent "glitches." Based on this observation, we developed direct and indirect propagation algorithms for interval refinement. These algorithms can be integrated into abstract interpreters, model checkers and automatic test input generators based on constraint propagation.

The techniques proposed here are now used in the C/C++ semantic analysis components of the ECLAIR software verification platform and the initial experiences are quite positive. We can now properly verify the absence of run-time anomalies for code using the C/C++ standard functions that, before, was completely out of reach. Verification in the strong sense is only feasible modulo the possibility of bounding the size of glitches (this can always be done for the single-precision functions) and the ability to query the underlying implementation of the functions during the analysis. For the cases where the first condition cannot be guaranteed, we can still detect many definite program issues, even though we cannot draw conclusions from the fact that issues have not been found. When the second condition cannot be met, it may still be possible to use a reference implementation with significant commonalities with the target implementation (the case where libraries for different architectures are derived from the same code base is quite common), and we can nonetheless detect high-severity, possible program issues.

We cannot yet claim that the problem of the verification of C/C++ programs using the standard mathematical functions has been solved, as much remains to be done. However, we believe the present work is a definite step in the right direction, and one that has the potential of improving, starting from today, the current state of the art.

Acknowledgments We are grateful to Arnaud Gotlieb (Simula Research Laboratory, Norway, and INRIA – Rennes, France) and Claude Michel (INRIA – Sophia Antipolis, France) for the fruitful discussions we had on the subject of this paper. We are also grateful to Marcel Beemster (Solid Sands, The Netherlands) for the discussions we had on the subject of testing mathematical libraries for the C language. Alessandro Zaccagnini (University of Parma, Italy) helped us define rough inverses for the Gamma functions. Thanks also to Patricia M. Hill (BUGSENG srl, Italy) for her work on the ECLAIR platform that greatly facilitated the experimental evaluation of this work. We also thank Dino Mandrioli (Politecnico di Milano) for his careful reading of a preliminary version of this paper, and Michele Guerriero (Politecnico di Milano) for the discussions. Finally, we would like to express our gratitude to the editor and to the anonymous reviewers for helping us improve the readability of the paper.

References

ALSU07. A. V. Aho, M. S. Lam, R. Sethi, and J. D. Ullman, Compilers: Principles, techniques, and tools, 2nd ed., Pearson, London, UK, 2007.

Ana15. Analog Devices, C/C++ Compiler and library manual for Blackfin processors, August 2015, Revision 1.5.
BBB+19. R. Bagnara, A. Bagnara, F. Biselli, M. Chiari, and R. Gori, Correct approximation of IEEE 754 floating-point arithmetic for program verification, Report arXiv:1903.06119 [cs.PL], 2019, Available at http://arxiv.org/.
BCGB16. R. Bagnara, M. Chiari, R. Gori, and A. Bagnara, A practical approach to interval refinement for math.h/cmath functions, Report arXiv:1610.07390 [cs.PL], 2016, Available at http://arxiv.org/.
BCGG13. R. Bagnara, M. Carlier, R. Gori, and A. Gotlieb, Symbolic path-oriented test data generation for floating-point programs, Proceedings of the 6th IEEE International Conference on Software Testing, Verification and Validation (Luxembourg City, Luxembourg), IEEE Press, 2013, pp. 1–10.
BCGG16. ———, Exploiting binary floating-point representations for constraint propagation, INFORMS Journal on Computing 28 (2016), no. 1, 31–46.
BdA+12. M. Borges, M. d'Amorim, S. Anand, D. Bushnell, and C. S. Pasareanu, Symbolic execution with interval solving and meta-heuristic search, Proceedings of the 5th IEEE International Conference on Software Testing, Verification and Validation (Montreal, Canada), IEEE Computer Society, 2012, pp. 111–120.
BDG+13. M. Brain, V. D'Silva, A. Griggio, L. Haller, and D. Kroening, Interpolation-based verification of floating-point programs with abstract CDCL, Static Analysis: Proceedings of the 20th International Symposium (SAS 2013) (Seattle, WA, USA), Lecture Notes in Computer Science, vol. 7935, Springer-Verlag, Berlin, 2013, pp. 412–432.
BDL12. L. Burdy, J.-L. Dufour, and T. Lecomte, The B method takes up floating-point numbers, Proceedings of the 6th International Conference & Exhibition on Embedded Real Time Software and Systems (ERTS 2012) (Toulouse, France), SEE, 2012, Available at http://web1.see.asso.fr/erts2012/Site/0P2RUC89/5C-2.pdf.
BGM06. B. Botella, A. Gotlieb, and C. Michel, Symbolic execution of floating-point computations, Software Testing, Verification and Reliability 16 (2006), no. 2, 97–121.
BHZ08. R. Bagnara, P. M. Hill, and E. Zaffanella, The Parma Polyhedra Library: Toward a complete set of numerical abstractions for the analysis and verification of hardware and software systems, Science of Computer Programming 72 (2008), no. 1–2, 3–21.
BKD+05. N. Brisebarre, P. Kornerup, D. Defour, N. Revol, and J.-M. Muller, A new range-reduction algorithm, IEEE Transactions on Computers 54 (2005), 331–339.
BMR12. M. S. Belaid, C. Michel, and M. Rueher, Boosting local consistency algorithms over floating-point numbers, Proceedings of the 18th International Conference on Principles and Practice of Constraint Programming (Québec City, Canada) (M. Milano, ed.), Lecture Notes in Computer Science, vol. 7514, Springer-Verlag, Berlin, 2012, pp. 127–140.
BMV94. F. Benhamou, D. McAllester, and P. Van Hentenryck, CLP(intervals) revisited, Logic Programming: Proceedings of the 1994 International Symposium (Ithaca, NY, USA) (M. Bruynooghe, ed.), MIT Press Series in Logic Programming, The MIT Press, 1994, pp. 124–138.

BO97. F. Benhamou and W. J. Older, Applying interval arithmetic to real, integer, and boolean constraints, Journal of Logic Programming 32 (1997), no. 1, 1–24.
BVLS13. E. T. Barr, T. Vo, V. Le, and Z. Su, Automatic detection of floating-point exceptions, Proceedings of the 40th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages (POPL '13) (Rome, Italy) (R. Giacobazzi and R. Cousot, eds.), ACM Press, 2013, pp. 549–560.
CC76. P. Cousot and R. Cousot, Static determination of dynamic properties of programs, Proceedings of the Second International Symposium on Programming (Paris, France) (B. Robinet, ed.), Dunod, Paris, France, 1976, pp. 106–130.
CC77. ———, Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints, Proceedings of the Fourth Annual ACM Symposium on Principles of Programming Languages (Los Angeles, CA, USA), ACM Press, 1977, pp. 238–252.
CDE08. C. Cadar, D. Dunbar, and D. R. Engler, KLEE: Unassisted and automatic generation of high-coverage tests for complex systems programs, Proceedings of the 8th USENIX Symposium on Operating Systems Design and Implementation (OSDI 2008) (San Diego, California, USA) (R. Draves and R. van Renesse, eds.), USENIX Association, 2008, pp. 209–224.
CPDGP01. A. Coen-Porisini, G. Denaro, C. Ghezzi, and M. Pezzè, Using symbolic execution for verifying safety-critical systems, Proceedings of the 8th European Software Engineering Conference (Vienna, Austria), ACM Press, 2001, pp. 142–151.
CR85. L. A. Clarke and D. J. Richardson, Applications of symbolic evaluation, Journal of Systems and Software 5 (1985), no. 1, 15–35.
Dav87. E. Davis, Constraint propagation with interval labels, Artificial Intelligence 32 (1987), no. 3, 281–331.
Dek71. T. J. Dekker, A floating-point technique for extending the available precision, Numerische Mathematik 18 (1971), no. 3, 224–242.
DGD07. T. Denmat, A. Gotlieb, and M. Ducassé, Improving constraint-based testing with dynamic linear relaxations, Proceedings of the 18th IEEE International Symposium on Software Reliability (ISSRE 2007) (Trollhättan, Sweden), IEEE Computer Society, 2007, pp. 181–190.
DIN+18. E. Darulova, A. Izycheva, F. Nasir, F. Ritter, H. Becker, and R. Bastian, Daisy — framework for analysis and optimization of numerical programs, Proceedings of the 24th International Conference on Tools and Algorithms for the Construction and Analysis of Systems (TACAS 2018) (Thessaloniki, Greece) (D. Beyer and M. Huisman, eds.), Lecture Notes in Computer Science, vol. 10805, Springer, 2018, Tool paper, pp. 270–287.
DK17. E. Darulova and V. Kuncak, Towards a compiler for reals, ACM Transactions on Programming Languages and Systems 39 (2017), no. 2, 8:1–8:28.
DM10. M. Daumas and G. Melquiond, Certification of bounds on expressions involving rounded operators, ACM Transactions on Mathematical Software 37 (2010), no. 1, 2:1–2:20.
DMMM96. M. Daumas, C. Mazenc, X. Merrheim, and J.-M. Muller, Modular range reduction: A new algorithm for fast and accurate computation of the elementary functions, The Journal of Universal Computer Science: Annual Print and CD-ROM Archive Edition Volume 1 — 1995 (H. Maurer, C. Calude, and A. Salomaa, eds.), Springer Berlin Heidelberg, Berlin, Heidelberg, 1996, pp. 162–175.

FGCRG17. A. Di Franco, H. Guo, and C. Rubio-González, A comprehensive study of real-world numerical bug characteristics, Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE 2017) (Urbana-Champaign, IL, USA), IEEE Press, 2017, pp. 509–519.
FS17. Z. Fu and Z. Su, Achieving high coverage for floating-point code via unconstrained programming, Proceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 2017) (Barcelona, Spain) (A. Cohen and M. T. Vechev, eds.), ACM Press, 2017, pp. 306–319.
Fuj13. Fujitsu Semiconductor, FR family SOFTUNE C/C++ compiler manual for V6, eighth ed., June 2013.
GBR98. A. Gotlieb, B. Botella, and M. Rueher, Automatic test data generation using constraint solving techniques, Proceedings of the 1998 ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA '98) (Clearwater Beach, Florida, USA), ACM Press, 1998, pp. 53–62.
GBR00. A. Gotlieb, B. Botella, and M. Rueher, A CLP framework for computing structural test data, Proceedings of the First International Conference on Computational Logic (CL 2000) (London, UK) (J. Lloyd, V. Dahl, U. Furbach, M. Kerber, K.-K. Lau, C. Palamidessi, L. M. Pereira, Y. Sagiv, and P. J. Stuckey, eds.), Lecture Notes in Artificial Intelligence, vol. 1861, Springer, 2000, pp. 399–413.
GKS05. P. Godefroid, N. Klarlund, and K. Sen, DART: Directed automated random testing, Proceedings of the ACM SIGPLAN 2005 Conference on Programming Language Design and Implementation (PLDI 2005) (Chicago, IL, USA) (V. Sarkar and M. W. Hall, eds.), ACM, 2005, pp. 213–223.
GLM08. P. Godefroid, M. Y. Levin, and D. A. Molnar, Automated whitebox fuzz testing, Proceedings of the Network and Distributed System Security Symposium (NDSS 2008) (San Diego, California, USA), The Internet Society, 2008.
GMP02. E. Goubault, M. Martel, and S. Putot, Asserting the precision of floating-point computations: A simple abstract interpreter, Programming Languages and Systems — 11th European Symposium on Programming (ESOP 2002) (Grenoble, France) (D. Le Métayer, ed.), Lecture Notes in Computer Science, vol. 2305, Springer, 2002, pp. 209–212.
GP06. E. Goubault and S. Putot, Static analysis of numerical algorithms, Static Analysis: Proceedings of the 13th International Symposium (Seoul, Korea) (K. Yi, ed.), Lecture Notes in Computer Science, vol. 4134, Springer-Verlag, Berlin, 2006, pp. 18–34.
Gre13a. Green Hills, MULTI: Building applications for embedded Power architecture, March 2013, Version 2013.1.
Gre13b. Green Hills, MULTI: Building applications for embedded V850 and RH850, October 2013, Version 2013.5.
GS18. P. Godefroid and K. Sen, Combining model checking and testing, Handbook of Model Checking (E. M. Clarke, T. A. Henzinger, H. Veith, and R. Bloem, eds.), Springer, Cham, Switzerland, 2018, pp. 613–649.
Har00a. J. Harrison, Floating point verification in HOL Light: The exponential function, Formal Methods in System Design 16 (2000), no. 3, 271–305.
Har00b. ———, Formal verification of floating point trigonometric functions, Formal Methods in Computer-Aided Design (Berlin, Heidelberg) (W. A. Hunt and S. D. Johnson, eds.), Springer Berlin Heidelberg, 2000, pp. 254–270.

HLB01. Y. Hida, X. S. Li, and D. H. Bailey, Algorithms for quad-double precision floating point arithmetic, Proceedings of the 15th IEEE Symposium on Computer Arithmetic (ARITH 15) (Vail, CO, USA), IEEE Computer Society, 2001, pp. 155–162.
IEE08. IEEE Computer Society, IEEE standard for floating-point arithmetic, IEEE Std 754-2008 (revision of IEEE Std 754-1985) ed., August 2008.
IEE13. IEEE Computer Society and The Open Group, New York, NY, USA, Standard for Information Technology — Portable Operating System Interface (POSIX®), Base Specifications, Issue 7, IEEE Std 1003.1, 2013 (incorporates IEEE Std 1003.1-2008, and IEEE Std 1003.1-2008/Cor 1-2013) ed., April 2013.
Int11. International Organization for Standardization, ISO/IEC 9899:2011: Programming languages — C, International Organization for Standardization, Geneva, Switzerland, 2011.
Int14. ———, ISO/IEC 14882:2014(E): Programming languages — C++, International Organization for Standardization, Geneva, Switzerland, 2014.
Kin76. J. C. King, Symbolic execution and program testing, Communications of the ACM 19 (1976), no. 7, 385–394.
KM84. W. Kahan and S. McDonald, Minimizing q*m-n, Unpublished handout (available at https://people.eecs.berkeley.edu/~wkahan/testpi), 1984.
Kor90. B. Korel, Automated software test data generation, IEEE Transactions on Software Engineering 16 (1990), no. 8, 870–879.
LHG10. K. Lakhotia, M. Harman, and H. Gross, AUSTIN: A tool for search based software testing for the C language and its evaluation on deployed automotive systems, Proceedings of the 2nd International Symposium on Search Based Software Engineering (SSBSE '10) (Benevento, Italy), IEEE Computer Society, 2010, pp. 101–110.
LMR+98. P. Lacan, J. N. Monfort, L. V. Q. Ribal, A. Deutsch, and G. Gonthier, ARIANE 5 — the software reliability verification process, DASIA 98: Data Systems in Aerospace (Athens, Greece) (B. H. Kaldeich-Schürmann, ed.), ESA Special Publication, vol. 422, European Space Agency, 1998, p. 201.
LMT98. V. Lefèvre, J.-M. Muller, and A. Tisserand, The table maker's dilemma, Rapport de recherche 98-12, Laboratoire de l'Informatique du Parallélisme, École Normale Supérieure de Lyon, 1998.
LSA18. W. Lee, R. Sharma, and A. Aiken, On automatically proving the correctness of math.h implementations, Proceedings of the ACM on Programming Languages 2 (2018), no. POPL, 47:1–47:32.
LSC+17. D. Liew, D. Schemmel, C. Cadar, A. F. Donaldson, R. Zähl, and K. Wehrle, Floating-point symbolic execution: A case study in n-version programming, Proceedings of the 32nd IEEE/ACM International Conference on Automated Software Engineering (ASE 2017) (Urbana-Champaign, IL, USA), IEEE Computer Society, 2017, pp. 601–612.
LSM+16. S. Loosemore, R. M. Stallman, R. McGrath, A. Oram, and U. Drepper, The GNU C library reference manual, Free Software Foundation, Inc., Boston, MA, USA, version 2.23 ed., August 2016.
LTHD10. K. Lakhotia, N. Tillmann, M. Harman, and J. De Halleux, FloPSy: Search-based floating point constraint solving for symbolic execution, Proceedings of the 22nd IFIP WG 6.1 International Conference on Testing Software and Systems (Natal, Brazil), Lecture Notes in Computer Science, vol. 6435, Springer-Verlag, Berlin, Heidelberg, 2010, pp. 142–157.

MCD17. V. Magron, G. A. Constantinides, and A. F. Donaldson, Certified roundoff error bounds using semidefinite programming, ACM Transactions on Mathematical Software 43 (2017), no. 4, 34:1–34:31.
Mic02. C. Michel, Exact projection functions for floating point number constraints, Proceedings of the 7th International Symposium on Artificial Intelligence and Mathematics (Fort Lauderdale, FL, USA), Rutgers University, 2002.
Min04. A. Miné, Relational abstract domains for the detection of floating-point run-time errors, Programming Languages and Systems: Proceedings of the 13th European Symposium on Programming (Barcelona, Spain) (D. Schmidt, ed.), Lecture Notes in Computer Science, vol. 2986, Springer-Verlag, Berlin, 2004, pp. 3–17.
Mon08. D. Monniaux, The pitfalls of verifying floating-point computations, ACM Transactions on Programming Languages and Systems 30 (2008), no. 3, 12:1–12:41.
MPT16. J.-M. Muller, V. Popescu, and P. T. P. Tang, A new multiplication algorithm for extended precision using floating-point expansions, Proceedings of the 23rd IEEE Symposium on Computer Arithmetic (ARITH 23) (Santa Clara, CA, USA) (P. Montuschi, M. Schulte, J. Hormigo, S. Oberman, and N. Revol, eds.), IEEE Computer Society, 2016, pp. 39–46.
MS76. W. Miller and D. L. Spooner, Automatic generation of floating-point test data, IEEE Transactions on Software Engineering 2 (1976), no. 3, 223–226.
MTDn17. M. Moscato, L. Titolo, A. Dutle, and C. A. Muñoz, Automatic estimation of verified floating-point round-off errors via static analysis, Proceedings of the 36th International Conference on Computer Safety, Reliability, and Security (SAFECOMP 2017) (Trento, Italy) (S. Tonetta, E. Schoitsch, and F. Bitsch, eds.), Lecture Notes in Computer Science, vol. 10488, Springer, 2017, pp. 213–229.
Mul05. J.-M. Muller, On the definition of ulp(x), Rapport de recherche 5504, INRIA, 2005.
Mul16. ———, Elementary functions: Algorithms and implementation, 3rd ed., Birkhäuser, Basel, Switzerland, 2016.
Ng92. K. C. Ng, Argument reduction for huge arguments: Good to the last bit, Tech. report, SunPro – A Sun Microsystems, Inc. Business, July 1992.
PH83. M. H. Payne and R. N. Hanek, Radian reduction for trigonometric functions, SIGNUM Newsletter 18 (1983), no. 1, 19–24.
SBB+18. A. Solovyev, M. S. Baranowski, I. Briggs, C. Jacobsen, Z. Rakamarić, and G. Gopalakrishnan, Rigorous estimation of floating-point round-off errors with symbolic Taylor expansions, ACM Transactions on Programming Languages and Systems 41 (2018), no. 1, 2:1–2:39.
SGS+16. N. Stephens, J. Grosen, C. Salls, A. Dutcher, R. Wang, J. Corbetta, Y. Shoshitaishvili, C. Kruegel, and G. Vigna, Driller: Augmenting fuzzing through selective symbolic execution, Proceedings of the 23rd Annual Network and Distributed System Security Symposium (NDSS 2016) (San Diego, CA, USA), vol. 16, The Internet Society, 2016, pp. 1–16.
SMA05. K. Sen, D. Marinov, and G. Agha, CUTE: A concolic unit testing engine for C, Proceedings of the 10th European Software Engineering Conference Held Jointly with 13th ACM SIGSOFT International Symposium on Foundations of Software Engineering (ESEC/FSE-13) (Lisbon, Portugal), ACM Press, 2005, pp. 263–272.

SS14. H. Schöpf and P. Supancic, On Bürmann's theorem and its application to problems of linear and nonlinear heat transfer and diffusion, The Mathematica Journal 16 (2014).
Tex18. Texas Instruments, ARM optimizing C/C++ compiler v18.1.0.LTS user's guide, January 2018, Literature Number: SPNU151R.
VSD98. P. Van Hentenryck, V. Saraswat, and Y. Deville, Design, implementation, and evaluation of the constraint language cc(FD), The Journal of Logic Programming 37 (1998), no. 1–3, 139–164.
WLZ17. X. Wu, L. Li, and J. Zhang, Symbolic execution with value-range analysis for floating-point exception detection, Proceedings of the 24th Asia-Pacific Software Engineering Conference (APSEC 2017) (Nanjing, China) (J. Lv, H. J. Zhang, M. Hinchey, and X. Liu, eds.), IEEE Computer Society, 2017, pp. 1–10.
WXY+16. X. Wu, Z. Xu, D. Yan, T. Wu, J. Yan, and J. Zhang, The floating-point extension of symbolic execution engine for bug detection, Proceedings of the 23rd Asia-Pacific Software Engineering Conference (APSEC 2016) (Hamilton, New Zealand), IEEE Computer Society, 2016, pp. 265–272.
YBK85. W. D. Young, W. E. Boebert, and R. Y. Kain, Proving a computer system secure, The Scientific Honeyweller 6 (1985), no. 2, 18–27.
YMELK17. A. Yazdanbakhsh, D. Mahajan, H. Esmaeilzadeh, and P. Lotfi-Kamran, AxBench: A multiplatform benchmark suite for approximate computing, IEEE Design & Test 34 (2017), no. 2, 60–68.
ZHM97. H. Zhu, P. A. V. Hall, and J. H. R. May, Software unit test coverage and adequacy, ACM Computing Surveys 29 (1997), no. 4, 366–427.

A Glitch Data for Other Implementations of libm

In this section, we provide additional data about the glitches in the single-precision functions of other implementations of the math.h/cmath mathematical functions. Table 4 lists, for each implementation, its identification code, which is used in the other tables, the CPU architecture, the operating system, and, where known, the libm version. Note that what appears as a large glitch in Table 9 for function tanf rounded down on the macbook machine is actually due to a clear bug in the range reduction algorithm used there.

Table 4. Glitch data: main characteristics of the tested implementations

id       CPU            OS                 compiler    libm version
alpha    x86_64         Ubuntu 14.04       GCC 4.8.4   EGLIBC 2.19
gcc110   POWER7         Fedora 20          GCC 4.8.1   GNU libc 2.18
gcc111   POWER7         AIX 7              GCC 4.8.1
gcc112   POWER7         Fedora 21          GCC 4.9.2   GNU libc 2.20
gcc113   AArch64        Ubuntu 14.04       GCC 4.8.4   EGLIBC 2.19
igor     x86_64         Fedora 12          GCC 4.4.4   GNU libc 2.11.2
macbook  x86_64         Mac OS X 10.10.5   LLVM 6.1.0  Libm-3086.1
raspi    ARMv6 + VFPv2  Raspbian Jessie    GCC 4.9.2   GNU libc 2.19
xps      x86_64         Ubuntu 19.10       GCC 9.2.1   GNU libc 2.30
zoltan   x86_64         Ubuntu 16.04       GCC 5.4.0   GLIBC 2.23

Table 5. Glitch data for the alpha machine

function Dmin DM   (per rounding mode, near/up/down/zero: ng dM wM)
acosf    −1 1
acoshf   1 ∞    1 1 2   1 1 2
asinf    −1 1
asinhf   −∞ ∞   2 1 2   2 1 2
atanf    −∞ ∞   1 1 108
atanhf   −1 1   2 1 2   2 1 2
cbrtf    −∞ ∞   106 1 2   106 1 2   106 1 2   106 1 2
coshf    −∞ ∞   454 1 2   466 1 2   442 1 2   448 1 2
erff     −∞ ∞
expf     −∞ ∞
exp10f   −∞ ∞
exp2f    −∞ ∞   1 1 2   3 1 2   2 1 2   1 1 2
expm1f   −∞ ∞
lgammaf  2 ∞    163 1 2   164 1 2   166 1 2   161 1 2
logf     0 ∞
log10f   0 ∞
log1pf   −1 ∞   1 1 2   1 1 2
log2f    0 ∞
sinhf    −∞ ∞
sqrtf    0 ∞
tanhf    −∞ ∞   1 1 2   2 1 3
tgammaf  2 ∞    104 2 3   104 2 4   104 3 3   104 3 4
cosf     −2^23 2^23
sinf     −2^23 2^23
tanf     −2^23 2^23

Table 6. Glitch data for the gcc110/2/3 machines

function Dmin DM   (per rounding mode, near/up/down/zero: ng dM wM)
acosf    −1 1
acoshf   1 ∞    1 1 2   1 1 2
asinf    −1 1
asinhf   −∞ ∞   2 1 2   2 1 2
atanf    −∞ ∞   1 1 108
atanhf   −1 1   2 1 2   2 1 2
cbrtf    −∞ ∞   106 1 2   106 1 2   106 1 2   106 1 2
coshf    −∞ ∞   456 1 2   462 1 2   442 1 2   448 1 2
erff     −∞ ∞
expf     −∞ ∞
exp10f   −∞ ∞
exp2f    −∞ ∞   2 1 2
expm1f   −∞ ∞
lgammaf  2 ∞
logf     0 ∞
log10f   0 ∞
log1pf   −1 ∞   1 1 2   1 1 2
log2f    0 ∞
sinhf    −∞ ∞
sqrtf    0 ∞
tanhf    −∞ ∞   1 1 2   2 1 3
tgammaf  2 ∞    104 2 3   104 4 4   104 2 3   104 3 4
cosf     −2^23 2^23   104 1 3   104 1 3   104 1 3   104 1 3
sinf     −2^23 2^23
tanf     −2^23 2^23

Table 7. Glitch data for the gcc111 machine

function Dmin DM   (per rounding mode, near/up/down/zero: ng dM wM)
acoshf   1 ∞
asinhf   −∞ ∞
atanf    −∞ ∞
atanhf   −1 1
coshf    −∞ ∞
erff     −∞ ∞
lgammaf  2 ∞
sinhf    −∞ ∞
tanhf    −∞ ∞   2 1 107
tgammaf  2 ∞
cosf     −2^23 2^23
sinf     −2^23 2^23
tanf     −2^23 2^23

Table 8. Glitch data for the igor machine

function Dmin DM   (per rounding mode, near/up/down/zero: ng dM wM)
acosf    −1 1   1 1 1010   1 1 1010
acoshf   1 ∞    1 1 2   1 1 2
asinf    −1 1
asinhf   −∞ ∞   2 1 2   2 1 2
atanf    −∞ ∞
atanhf   −1 1   2 1 2   2 1 2
cbrtf    −∞ ∞   106 1 2   106 1 2   106 1 2   106 1 2
coshf    −∞ ∞   454 1 2   466 1 2   442 1 2   448 1 2
erff     −∞ ∞
expf     −∞ ∞
exp10f   −∞ ∞
exp2f    −∞ ∞   1 1 2   3 1 2   2 1 2   1 1 2
expm1f   −∞ ∞
lgammaf  2 ∞    163 1 2   164 1 2   166 1 2   161 1 2
logf     0 ∞
log10f   0 ∞
log1pf   −1 ∞   1 1 2   1 1 2
log2f    0 ∞
sinhf    −∞ ∞
sqrtf    0 ∞
tanhf    −∞ ∞   1 1 2   2 1 3
tgammaf  2 ∞    155 109 2   155 122 2   157 119 2   153 119 2
cosf     −2^23 2^23   104 1 3   104 1 3   104 1 3   104 1 3
sinf     −2^23 2^23
tanf     −2^23 2^23

Table 9. Glitch data for the macbook machine

function Dmin DM   (per rounding mode, near/up/down/zero: ng dM wM)
acosf    −1 1
acoshf   1 ∞
asinf    −1 1
asinhf   −∞ ∞
atanf    −∞ ∞   2 1 109   2 1 109   2 1 109
atanhf   −1 1
cbrtf    −∞ ∞
coshf    −∞ ∞
erff     −∞ ∞
expf     −∞ ∞
exp10f   −∞ ∞
exp2f    −∞ ∞
expm1f   −∞ ∞
lgammaf  2 ∞
logf     0 ∞
log10f   0 ∞
log1pf   −1 ∞
log2f    0 ∞
sinhf    −∞ ∞
sqrtf    0 ∞
tanhf    −∞ ∞   1 1 107
tgammaf  2 ∞
cosf     −2^23 2^23
sinf     −2^23 2^23
tanf     −2^23 2^23   1 1010 106

Table 10. Glitch data for the raspi machine

function Dmin DM   (per rounding mode, near/up/down/zero: ng dM wM)
acosf    −1 1
acoshf   1 ∞    1 1 2   1 1 2
asinf    −1 1
asinhf   −∞ ∞   2 1 2   2 1 2
atanf    −∞ ∞   1 1 108
atanhf   −1 1   2 1 2   2 1 2
cbrtf    −∞ ∞   106 1 2   106 1 2   106 1 2   106 1 2
coshf    −∞ ∞   454 1 2   466 1 2   442 1 2   448 1 2
erff     −∞ ∞
expf     −∞ ∞
exp10f   −∞ ∞
exp2f    −∞ ∞   1 1 2
expm1f   −∞ ∞
lgammaf  2 ∞    163 1 2   164 1 2   166 1 2   161 1 2
logf     0 ∞
log10f   0 ∞
log1pf   −1 ∞   1 1 2   1 1 2
log2f    0 ∞
sinhf    −∞ ∞
sqrtf    0 ∞
tanhf    −∞ ∞   1 1 2   2 1 3
tgammaf  2 ∞    104 2 3   104 2 4   104 3 3   104 3 4
cosf     −2^23 2^23   104 1 3   104 1 3   104 1 3   104 1 3
sinf     −2^23 2^23
tanf     −2^23 2^23

Table 11. Glitch data for the zoltan machine

function Dmin DM   (per rounding mode, near/up/down/zero: ng dM wM)
acosf    −1 1
acoshf   1 ∞    1 1 2   1 1 2
asinf    −1 1
asinhf   −∞ ∞   2 1 2   2 1 2
atanf    −∞ ∞   1 1 108
atanhf   −1 1   2 1 2   2 1 2
cbrtf    −∞ ∞   106 1 2   106 1 2   106 1 2   106 1 2
coshf    −∞ ∞   454 1 2   466 1 2   442 1 2   448 1 2
erff     −∞ ∞
expf     −∞ ∞
exp10f   −∞ ∞
exp2f    −∞ ∞   1 1 2   3 1 2   2 1 2   1 1 2
expm1f   −∞ ∞
lgammaf  2 ∞    163 1 2   164 1 2   166 1 2   161 1 2
logf     0 ∞
log10f   0 ∞
log1pf   −1 ∞   1 1 2   1 1 2
log2f    0 ∞
sinhf    −∞ ∞
sqrtf    0 ∞
tanhf    −∞ ∞   1 1 2   2 1 3
tgammaf  2 ∞    105 4 3   105 4 3   105 4 3   105 4 3
cosf     −2^23 2^23
sinf     −2^23 2^23
tanf     −2^23 2^23

B Computation of Lower Bounds: Proofs and Complexity

Lemma 51. Whenever function findhi_lb of Algorithm 3 is called on actual parameters satisfying the Require condition, all values computed by findhi_lb satisfy the Ensure condition.

Proof. The proof begins by assuming the precondition for findhi_lb(f, y, [xl, xu], ng, dM, wM, α, ω, t) is satisfied: in particular, f(xu) ≺ y. The condition on line 2 holds in the following cases:

ng = 0: as ng ≥ ng^f, we have ng^f = 0, that is, f is isotonic on [xl, xu]; thus f(xu) ≺ y implies f(x) ≺ y for each x ∈ [xl, xu].
xu ≻ ω: as ω ≥ ω^f, we know xu ≻ ω^f. This implies that xu cannot be inside a glitch, since, by definition of ω^f, f is isotonic on [ω^f, xu]. Therefore f(x) ≼ f(xu) ≺ y for each x ∈ [xl, xu].
#[f(xu), y) > dM: this means that f(xu) is so low that, even under the worst-case assumption that xu is the minimum of a maximal-depth glitch, the value of f just before the glitch would still be below y. Therefore, f(x) ≺ y for each x ∈ [xl, xu].

In all the circumstances listed above, the post-condition for the case r = 0 is proved. Line 3 contains an else statement: we know that ng > 0, xu ≼ ω and #[f(xu), y) ≤ dM. Further, let us assume that the condition on line 3 is true, hence ng = 1. The reasoning must be split as follows:

y ≻ f(α): we must check whether a glitch starts in α or not.
  f(succ(α)) ≺ f(α): in this case, f has exactly one glitch in [xl, xu], beginning in α^f = α. Hence, f(x) ≼ f(α^f) for each x ∈ [xl, α^f] and, since y ≻ f(α), we also have f(x) ≺ y for each x ∈ [xl, α^f]. Moreover, the precondition of the algorithm entails that f(xu) ≺ y, which, together with ng^f = 1, allows us to conclude that f(x) ≺ y also in interval (α^f, xu]. Summing up, [xl, xu] does not contain any solutions for y = f(x) and setting r = 0 in line 5 satisfies the post-condition.
  f(succ(α)) ≽ f(α): in this case, f either has no glitch or it has exactly one glitch strictly past α (that is, α^f ≻ α). As glitches can be too wide (i.e., wM > t), we refrain from searching a suitable value for hi to start bisection with. However, as y ≻ f(α), we have y ≻ f(x) for each x ∈ [xl, α]: predicate p2(y, xl, xu, α) is satisfied, and setting l = α and r = 2 in line 6 is correct.
y ≼ f(α): setting hi = α and r = 1, as done in line 8, is guaranteed to satisfy the post-condition. In fact, ng = 1 implies xl ≼ α ≼ ω ≼ xu, therefore hi ∈ [xl, xu] and f(hi) ≽ y.

On the contrary, if the condition on line 2 is false, control flow reaches the else statement on line 10 and we know that either ng > 1 or wM ≤ t. In this case function linsearch_geq(f, y, [xl, xu], wM, t) searches backwards the first value for hi such that f(hi) ≽ y; it does so float-by-float, starting from xu, and it stops after the minimum between t and wM steps, but without going beyond xl. b and hi, as set in line 11, satisfy the condition stated in lines 12–13. At this point, if b = 1, then a suitable value for hi was found and setting r = 1 satisfies the post-condition. On line 15, since we are in the else statement, we know that b = 0. Therefore, by the condition stated in lines 12–13, we have ∀x ∈ [x̂, xu] : f(x) ≺ y, with v = min{t, wM} and x̂ = max{xl, pred^v(xu)}. Two cases may occur:

t ≥ wM: in this case, wM ≥ wM^f implies t ≥ wM^f. Therefore, we are sure that the interval [x̂, xu] contains the value of x just before the last glitch began: we will call such point xm. By the definition of glitch, we have ∀x ∈ [xl, xm] : f(x) ≼ f(xm). Since xm ∈ [x̂, xu], surely f(xm) ≺ y holds, and so ∀x ∈ [xl, xm] : f(x) ≼ f(xm) ≺ y. This and ∀x ∈ [x̂, xu] : f(x) ≺ y let us conclude that ∀x ∈ [xl, xu] : f(x) ≺ y, and setting r = 0 in line 15 satisfies the post-condition.
t < wM: in this case, we cannot be sure that there is no solution in interval [xl, xu], but we could not find (with a complexity bounded by t) a suitable value for hi to start bisection. Hence, setting l = xl and r = 2 in line 16 satisfies, trivially, the post-condition. ⊓⊔
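To fix ideas, the backward float-by-float search performed by linsearch_geq can be rendered as follows; this is our reading of the behavior just described, with the plain C ordering standing in for ≽ on finite floats:

  #include <math.h>

  /* Step down from xu for at most min(t, wM) steps, never going beyond
     xl, looking for hi with f(hi) >= y.  Returns 1 and stores hi in
     *hi_out on success, 0 otherwise. */
  static int linsearch_geq(float (*f)(float), float y, float xl, float xu,
                           unsigned wM, unsigned t, float *hi_out) {
    unsigned steps = t < wM ? t : wM;
    float x = xu;
    for (unsigned i = 0; i <= steps && x >= xl; i++) {
      if (f(x) >= y) { *hi_out = x; return 1; }
      x = nextafterf(x, -INFINITY);
    }
    return 0;
  }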

Lemma 52. Whenever function bisect_lb of Algorithm 4 is called on actual parameters satisfying the Require condition, the values computed by bisect_lb satisfy the Ensure conditions.

Proof. Let us assume that the precondition for bisect_lb(f, y, [xl, xu], ng, dM, wM, α, ω, s, t) is satisfied initially. We will consider the following while loop invariant:

Inv ≡ (xl ≼ lo ≺ hi ≼ xu) ∧ (f(lo) ≺ y ≼ f(hi)) ∧ (∀x ∈ [xl, lo] : f(x) ≺ y).

We will show that, when the while loop of bisect_lb finishes, the post-condition in the Ensure statement holds. To this aim, we will prove that the while loop of bisect_lb satisfies the following properties.

Initialization: the invariant Inv holds prior to the first loop iteration because it is entailed by the precondition, that is, the Require statement.

Maintenance: assume that Inv holds at the beginning of an arbitrary loop iteration: we will prove that it holds at the end of that iteration, as well. If the guard on line 1, #[lo, hi) > 1, is false, then Inv trivially holds at the end of the loop. Therefore, assuming that #[lo, hi) > 1 holds, a stronger property can be proved: Inv holds and either the loop is exited with a break statement, or the new value hi′ or lo′, that is updated during the iteration, is contained in the open interval (lo, hi). After the invocation of function split_point at line 3, mid = pred^m(hi) = succ^m′(lo), with m, m′ > 0, holds. Hence, lo ≺ mid ≺ hi holds and this property is exploited in the following two main cases, tested beginning at line 4.

y ≼ f(mid): in this case hi′ = mid. By assuming that Inv holds before the iteration, we need to prove that xl ≼ lo ≺ hi′ ≺ hi ≼ xu and f(lo) ≺ y ≼ f(hi′). Note that this is a direct consequence of lo ≺ mid ≺ hi and y ≼ f(mid). Since the value of lo was not updated, the remaining part of invariant Inv holds as it did before this iteration. Moreover, in the rest of the proof we will show that ∀x ∈ [xl, lo] : f(x) ≺ y only when the value of lo is being updated.

y ≻ f(mid): here, further cases need to be distinguished.

  ng = 0 ∨ mid ≼ α ∨ mid ≽ ω ∨ #[f(mid), y) > dM: in this case lo′ = mid. From lo ≺ mid ≺ hi and y ≻ f(mid), we can derive that xl ≼ lo ≺ lo′ ≺ hi ≼ xu and f(lo′) ≺ y ≼ f(hi). We are left to prove that ∀x ∈ [xl, lo′] : f(x) ≺ y. If the function f is isotonic or, at least, we are sure that point mid is not inside a glitch, that is, mid ≼ α ≼ α^f or mid ≽ ω ≽ ω^f, then y ≻ f(mid) implies that ∀x ∈ [xl, lo′ = mid] : f(x) ≺ y and the invariant Inv is proved. For the remaining case #[f(mid), y) > dM, note that f(mid) is so low that the value of f just before the glitch would be smaller than y even under the worst-case assumption that mid is the minimum of a maximal-depth glitch. Therefore, we can conclude that ∀x ∈ [xl, lo′ = mid] : f(x) ≺ y.

  ng = 1 and either wM > t or f(succ(α)) ≺ f(α): since line 6 is the else-if guard of the if statement at line 5, we are sure that α ≺ mid ≺ ω. This implies that mid may be inside a glitch. For this reason, at line 7 the condition f(ω) ≽ y is tested and, if it holds, at line 8 condition f(α) ≽ y is tested. If it is satisfied, we set hi′ = α. Since mid ≺ hi ≼ xu and α ≺ mid, we can conclude that hi′ = α ≺ hi. Moreover, since f(lo) ≺ y, f(α) ≽ y and α ≼ α^f, we have lo ≺ α. Hence, we can conclude that xl ≼ lo ≺ hi′ = α ≺ hi ≼ xu ∧ f(lo) ≺ y ≼ f(hi′) holds.

  At line 9 we know that f(α) ≺ y ≼ f(ω). Then we test if f(succ(α)) ≺ f(α): if this is true, we can be sure that the unique glitch of function f starts exactly at point α, that is, α = α^f. Therefore, we have f(α^f) ≺ y ≼ f(ω). In this case lo′ = mid. From lo ≺ mid ≺ hi and y ≻ f(mid), we can derive that xl ≼ lo ≺ lo′ ≺ hi ≼ xu ∧ f(lo′) ≺ y ≼ f(hi). Moreover, f(α^f) ≺ y ≼ f(ω) and y ≻ f(mid) imply that ∀x ∈ [xl, lo′ = mid] : f(x) ≺ y: the last part of the invariant Inv is proved.

  At line 10, f(α) ≺ y ≼ f(ω) still holds. In this case, if lo ≺ α, we set lo′ = α. From lo ≺ α, α ≼ α^f, f(α) ≺ y and y ≼ f(hi) we can derive that xl ≼ lo ≺ lo′ = α ≺ hi ≼ xu ∧ f(lo′) ≺ y ≼ f(hi) and ∀x ∈ [xl, lo′ = α] : f(x) ≺ y, since f(α) ≺ y and α ≼ α^f. This proves the invariant Inv.

  At line 11 we know that lo = α and we exit from the loop. Note that, in this case, the invariant Inv still holds, trivially.

  When line 13 is reached we are sure that f(ω) ≺ y and we set lo′ = ω. Since f(ω) ≺ y, ω ≽ ω^f and y ≼ f(hi), we can conclude that ω ≺ hi ≼ xu. Moreover, observe that ω ≼ lo cannot hold: since the control flow reached this point, the else-if guard of line 5 is false, and mid ≼ ω. This would imply mid ≼ lo, which contradicts the post-condition of function split_point. Therefore, xl ≼ lo ≺ ω ≺ hi ≼ xu. Moreover, by f(ω) ≺ y and ω ≽ ω^f we have ∀x ∈ [xl, lo′ = ω] : f(x) ≺ y. Setting lo′ := ω satisfies the invariant.

  wM ≤ t and the previous conditions are false: at line 16 function findfmax(f, wM, lo, mid) is called, returning a value b that satisfies the post-condition of line 17. The if statement of line 18 distinguishes between two cases:

    f(b) ≽ y: hi′ = b is set. Note that by the post-condition of line 17 we have b ∈ [max{lo, pred^wM(mid)}, mid]. Since lo ≺ mid ≺ hi and f(b) ≽ y while f(lo) ≺ y, we have lo ≺ b ≺ hi. Therefore, xl ≼ lo ≺ hi′ = b ≺ hi ≼ xu.

    f(b) ≺ y: lo′ = mid is set. Since lo ≺ mid ≺ hi we have xl ≼ lo ≺ lo′ = mid ≺ hi ≼ xu. Moreover, f(lo′) ≺ y ≼ f(hi), since line 19 is in the else body of the if construct of line 4. Finally, we have to prove that ∀x ∈ [xl, lo′] : f(x) ≺ y. From the post-condition at line 17 we know that ∀x ∈ [max{lo, pred^wM(mid)}, mid] : f(x) ≼ f(b). We have now two cases:

      max{lo, pred^wM(mid)} = lo: in this case, the post-condition at line 17 implies that ∀x ∈ [lo, mid] : f(x) ≼ f(b) ≺ y.

      max{lo, pred^wM(mid)} = pred^wM(mid): since wM ≥ wM^f, even in the worst-case scenario, that is, mid is inside a glitch at the maximal distance from its beginning, interval [pred^wM(mid), mid] contains the last point xm before the glitch started. Therefore, we have ∀x ∈ [lo, xm] : f(x) ≼ f(xm). Together with ∀x ∈ [pred^wM(mid), mid] : f(x) ≼ f(b) ≺ y, this implies ∀x ∈ [lo, mid] : f(x) ≼ f(b) ≺ y, which proves that ∀x ∈ [xl, lo′] : f(x) ≺ y.

    In both cases, invariant Inv holds.

  Otherwise: function logsearch_lb(f, dM, lo, mid, y, s), which returns a value z satisfying the post-condition stated at line 23, is called at line 22. The if statement of line 24 distinguishes between two cases:

    lo ≺ z: we set lo′ = z. From the post-condition stated at line 23, we have z ∈ [lo, mid]. Together with lo ≺ z, this gives xl ≼ lo ≺ lo′ = z ≺ hi ≼ xu. Moreover, the post-condition stated at line 23 implies that #[f(z), y) > dM: therefore also f(lo′) ≺ y ≼ f(hi) holds. Finally, we have to prove that ∀x ∈ [xl, lo′ = z] : f(x) ≺ y. Since #[f(z), y) > dM, the value of f(z) is so low that, even under the worst-case assumption that z is the minimum of a maximal-depth glitch, the value of f just before the glitch would still be below y. This implies that ∀x ∈ [xl, lo′ = z] : f(x) ≺ y.

    lo = z: in this case we exit the loop without any change and, therefore, invariant Inv trivially holds.

Termination: we have just proved that invariant Inv holds and either we exit the loop with a break statement, or the new value hi′ or lo′ is contained in the interval (lo, hi). Therefore, #[lo, hi] decreases at each iteration. The guard of the while loop at line 1 tests the condition #[lo, hi) > 1, which is equivalent to #[lo, hi] > 2. This assures that the while loop always terminates.

Correctness: we will prove that, whenever the loop invariant and the loop exit-condition both hold, the post-condition stated in the Ensure statement holds. Since the correctness post-condition coincides with the invariant Inv, we only need to prove that the precision post-condition holds. Note that, under the hypothesis that ng = 0, wM < t, or ng = 1 ∧ α = α^f, the loop is exited because the guard #[lo, hi) > 1 becomes false, that is, with hi = succ(lo): invariant Inv then yields f(succ(lo)) = f(hi) ≽ y, which is the precision post-condition. ⊓⊔
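One way for split_point to meet the post-condition used in the proof above, namely mid = pred^m(hi) = succ^m′(lo) with m, m′ > 0, is to take the midpoint on the ordinal (float-counting) scale of binary32 rather than the numeric midpoint, so that both halves shrink regardless of the magnitudes involved. The encoding below is a standard trick, and the code is our sketch, not the paper's Algorithm 4:

  #include <stdint.h>
  #include <string.h>

  static uint32_t float_to_ord(float x) {
    uint32_t u; memcpy(&u, &x, 4);
    return (u & 0x80000000u) ? 0x80000000u - (u & 0x7fffffffu)
                             : u + 0x80000000u;
  }

  static float ord_to_float(uint32_t o) {
    uint32_t u = (o < 0x80000000u) ? 0x80000000u - o : o - 0x80000000u;
    if (o < 0x80000000u) u |= 0x80000000u;  /* restore the sign bit */
    float x; memcpy(&x, &u, 4);
    return x;
  }

  /* Requires #[lo, hi) > 1; returns a point strictly between lo and hi. */
  static float split_point(float lo, float hi) {
    uint32_t a = float_to_ord(lo), b = float_to_ord(hi);
    return ord_to_float(a + (b - a) / 2);
  }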

Theorem 53. Whenever function lower_bound of Algorithm 2 is called on actual parameters satisfying the Require condition, the values computed by lower_bound satisfy the Ensure conditions.

Proof. The precondition for lower_bound(y, [xl, xu], ng, dM, wM, α, ω, f^i, t) will be assumed to be satisfied, initially. First, we will prove the correctness post-condition. As described in Section 5.2, function init(y, [xl, xu], f^i) has been designed to return a point inside the interval [xl, xu], which, therefore, meets the condition at line 1. Moreover, recall that function gallop_lb(f, y, [xl, xu], dM, i) starts with lo = hi = i, where xl ≼ i ≼ xu, and it is made in such a way that it returns new values for lo and hi satisfying the condition at line 3. The goal now is to verify if such lo and hi can be used for bisection. We first focus on the value of lo: the value f(lo) is compared to y on line 4 and on line 10.

f(lo) ≻ y: in this case, according to the post-condition at line 3, we can be sure that lo = xl. Since f(xl) ≻ y, we further need to distinguish the case in which we are guaranteed that ∀x ∈ [xl, xu] : f(x) ≻ y from the case we are not. The first case surely occurs under the following conditions, which are part of the guard of the if statement on line 5:
  ng = 0: in this case, since ng ≥ ng^f, we can conclude that ng^f = 0 and, therefore, that function f is isotonic in interval [xl, xu]. Since f(lo) ≻ y we can be sure that ∀x ∈ [xl, xu] : f(x) ≻ y.
  #[y, f(α)) > dM: since dM ≥ dM^f (from the precondition), we can conclude that #[y, f(α)) > dM^f. Moreover, α ≼ α^f and the fact that f is quasi-isotonic allow us to conclude that f(α^f) ≽ f(α) and #[y, f(α^f)) > dM^f. By definition of α^f we know that function f is isotonic on interval [xl, α^f], and since f(xl) ≻ y, we can conclude that ∀x ∈ [xl, α^f], f(x) ≻ y. Moreover, #[y, f(α^f)) > dM^f assures us that the value f(α^f) is so high that, even under the worst-case assumption that the glitch starting at α^f has maximal depth, the value of f could not decrease enough to reach y. Therefore, we can be sure that ∀x ∈ [xl, xu] : f(x) ≻ y.

Hence, in all circumstances in which the condition on line 5 is true, the post-condition for values r = 1 and l = xu is proved. At line 7, we are in the else statement, and therefore we cannot conclude that ∀x ∈ [xl, xu] : f(x) ≻ y, but we cannot apply bisection either, since we are under the assumption f(lo) ≻ y. From α ≼ α^f and f(xl) ≻ y, we have ∀x ∈ [xl, α], f(x) ≻ y. Therefore, setting l = α ≺ xu satisfies the post-condition for the case r = 1.

f(lo) = y: in this case, according to the condition at line 3, we can be sure that xl = lo. Therefore, setting l = lo satisfies the post-condition for the case r = 4.

Since each branch of the if-else instructions considered until now contains a return statement, if the control flow reaches the end of line 12 we are sure that f(lo) ≺ y. At line 13, we focus on the value of hi. If f(hi) ≺ y, using the condition at line 3 we are sure that hi = xu and, hence, f(xu) ≺ y. Under this condition, the value of hi = xu cannot be used for bisection: function findhi_lb(f, y, [xl, xu], ng, dM, wM, α, ω, t) specified in Algorithm 3 is called. The precondition of function findhi_lb is satisfied by the one of function lower_bound and by the hypothesis f(hi) ≺ y. Afterwards, by Lemma 51, the post-condition of function findhi_lb holds. At line 15, we test if function findhi_lb returned r = 0 or r = 2: in these cases the post-condition of function findhi_lb assures that the one of function lower_bound for r = 0 and r = 2 is satisfied. At the end of line 16, we know that function findhi_lb returned r = 1 and, therefore, a new value such that xl ≼ hi ≼ xu and f(hi) ≽ y was assigned to hi. Summarizing, before the invocation of function bisect_lb at line 18 we are sure that f(lo) ≺ y ≼ f(hi) and, therefore, lo ≠ hi. In order to prove that the preconditions of function bisect_lb are met, we need to prove that the values of lo and hi at line 18 satisfy the following predicates: xl ≼ lo ≺ hi ≼ xu and ∀x ∈ [xl, lo] : f(x) ≺ y. Note, indeed, that we have already proved that at line 18 f(lo) ≺ y ≼ f(hi) holds and that the remaining requirements of function bisect_lb are implied by the precondition of function lower_bound. We will first prove that the value of lo at line 18 satisfies xl ≼ lo and that ∀x ∈ [xl, lo] : f(x) ≺ y. Observe that the value of lo is the one returned by function gallop_lb, since the code at lines 4–18 tests the value of lo without changing it. Therefore, by the post-condition of line 3, we have xl ≼ lo. As for the remaining clause, we have two cases:

xl ≺ lo: by the post-condition of line 3, we have #[f(lo), y) > dM. This means that f(lo) is so low that, even under the worst-case assumption that lo is the minimum of a maximal-depth glitch, the value of f just before the glitch would still be below y. This implies that ∀x ∈ [xl, lo] : f(x) ≺ y.
xl = lo: since at line 18 we are sure that f(xl) ≺ y, in this case ∀x ∈ [xl, lo] : f(x) ≺ y holds, trivially.

In order to prove that the value of hi at line 18 satisfies lo ≺ hi ≼ xu, we need to distinguish between the following cases:

y ≼ f(hi): the value of hi is the one computed by gallop_lb, since the guard of the if instruction at line 13 is false. Therefore, by the condition of line 3, we have lo ≼ hi ≼ xu. Since at line 18 we are sure that lo ≠ hi, we can conclude that lo ≺ hi ≼ xu.
y ≻ f(hi): in this case, the current value of hi is the one chosen by function findhi_lb, which must have returned r = 1, since we are at line 18. Therefore, by the post-condition of function findhi_lb, we know that the new value of hi satisfies hi ∈ [xl, xu] and f(hi) ≽ y. Since we have proved that ∀x ∈ [xl, lo] : f(x) ≺ y, we can conclude that lo ≺ hi ≼ xu.

Hence, since the preconditions of function bisect_lb are met, by Lemma 52 we are sure that, after it returns, xl ≼ lo ≺ hi ≼ xu, f(lo) ≺ y ≼ f(hi) and ∀x ∈ [xl, lo] : f(x) ≺ y hold, for the new values of lo and hi. At line 19 a while loop is entered. This loop performs a float-by-float search (for a maximum of t iterations) to approach the exact solution of y = f(x). We want to prove that the predicate ∀x ∈ [xl, lo] : f(x) ≺ y, which is also the invariant of the loop, holds at line 22. To this aim, we will prove the following loop properties:

Initialization: the invariant ∀x ∈ [xl, lo] : f(x) ≺ y holds prior to the first loop iteration because it is entailed by the post-condition of function bisect_lb.
Maintenance: assume ∀x ∈ [xl, lo] : f(x) ≺ y holds at the beginning of an arbitrary loop iteration. Then such assumption, together with the guard of the loop, f(succ(lo)) ≺ y, and the assignment lo′ := succ(lo) in the body of the loop, allows us to conclude that ∀x ∈ [xl, lo′] : f(x) ≺ y holds also at the end of that iteration.
Termination: the while loop terminates because we are assured by the post-condition of bisect_lb that there exists a value hi such that lo ≺ hi and y ≼ f(hi). Moreover, the loop can end before reaching such value, because the parameter t ∈ N is decremented inside the loop, until it reaches 0.
Correctness: as a consequence, at the end of the while loop, the property ∀x ∈ [xl, lo] : f(x) ≺ y holds and either f(succ(lo)) ≽ y or t = 0.

Then, at line 23 we test if f(succ(lo)) ≻ y. Since ∀x ∈ [xl, lo] : f(x) ≺ y, setting l to the value of lo satisfies the post-condition for the case r = 3. With the else-if instruction at line 24 we test if f(succ(lo)) = y. In this case, since ∀x ∈ [xl, lo] : f(x) ≺ y, setting l to the value of succ(lo) satisfies the post-condition for the case r = 4. Furthermore, if we are in the else instruction of line 25, it means that the while loop of line 19 terminated because t became equal to 0. In this case, since ∀x ∈ [xl, lo] : f(x) ≺ y, setting l to the value of lo satisfies the post-condition for the case r = 2.

We will now prove the precision post-condition under the hypothesis that f(xl) ≼ y ≼ f(xu) holds and either ng = 0, wM < t or ng = 1 ∧ α = α^f holds. Reasoning as before, at line 2 we are sure that function gallop_lb(f, y, [xl, xu], dM, i) returns values for lo and hi such that the post-condition of line 3 is satisfied. The latter, together with the assumption f(xl) ≼ y ≼ f(xu), allows us to conclude that, in the cases we are interested in, f(lo) ≼ y ≼ f(hi). The condition tested by the if statement at line 4 is surely false. Therefore, in this case, the condition at line 10 needs to be tested. We have the following cases:

f(lo) = y: function lower_bound returns with r = 4. Therefore, the precision post-condition is satisfied.
f(lo) ≺ y: since y ≼ f(hi), the test of line 13 is false and before line 18 we are sure that xl ≼ lo ≺ hi ≼ xu, since f(xl) ≺ y and y ≼ f(xu). Therefore, since the preconditions of function bisect_lb are met, by Lemma 52 we know that the post-condition holds. In more detail, since we have assumed that ng = 0 ∨ wM < t ∨ (ng = 1 ∧ α = α^f), from the precision post-condition of function bisect_lb we can derive that f(succ(lo)) ≽ y. In this case, the while loop of line 19 is never executed. The value of f(succ(lo)) ≽ y is then tested at lines 23 and 24. If f(succ(lo)) ≻ y then r = 3 is returned; if f(succ(lo)) = y, then r = 4 is returned. In both cases, the precision post-condition is satisfied.

This completes the proof. ⊓⊔

Theorem 54. If f : F → F is an isotonic function, i.e., ng = 0, then, for each [xl, xu] ∈ IF, dM, wM, α, ω, f^i : F → F, s, t ∈ N, computing lower bound as per Algorithm 2 evaluates f at most 2⌈log2(#[xl, xu])⌉ + 4 times.

Proof. At the beginning of Algorithm 2, function init(y, [xl, xu], f^i) is called and it returns a point inside interval [xl, xu], meeting the condition at line 1. It never calls function f. Then, function gallop lb(f, y, [xl, xu], dM, i) is called with lo = hi = i. First, we will assume that dM = 0. Since either #[f(i), y) > 0 (that is, y ≻ f(i)) or f(i) ≽ y holds, the worst cases are the following:

i = xl and ∀x ∈ [xl, xu) : y ≻ f(x): in this case hi = xu is found after ⌈log2 #[xl, xu]⌉ + 1 calls to function f;
i = xu and ∀x ∈ (xl, xu] : f(x) ≽ y: in this case lo = xl is found after ⌈log2 #[xl, xu]⌉ + 1 calls to f.

If, on the contrary, dM > 0,²³ then the worst-case scenario for function gallop lb is when it starts at some point i inside interval [xl, xu], but it terminates only when lo = xl and hi = xu. In this case, gallop lb invokes ⌈log2 #[xl, xu]⌉ + 1 times function f, as well.
After calling gallop lb, the algorithm discerns between the following cases.

f(lo) ≽ y: condition f(lo) ≻ y is tested at line 4 of Algorithm 2. Then, since ng = 0, the function returns at line 6. Condition f(lo) = y is tested at line 10 and the function returns at line 11.

²³ Note that dM and wM are just safe approximations of the values d^f_M and w^f_M, so that dM ≥ d^f_M and wM ≥ w^f_M. In particular, although if ng = n^f_g = 0 it would surely make sense to have dM = 0 and wM = 0, the execution of the algorithms with dM > 0 and wM > 0 cannot be excluded. Nevertheless, the algorithms are correct even in such an occurrence, and the overall computational complexity is not affected. However, in a practical implementation, it would be advisable to enforce ng = 0 =⇒ dM = 0 ∧ wM = 0, in order to prevent function gallop lb from doing useless work.

f(hi) ≺ y: function findhi lb(f, y, [xl, xu], ng, dM, wM, α, ω, t) is called but, since ng = 0, r = 0 is returned to the calling function lower bound, which terminates at line 15 without invoking function f.
f(lo) ≺ y ≼ f(hi): at line 18 we call function bisect lb(f, y, ng, dM, wM, α, ω, s, t, lo, hi). At each iteration, it computes the middle point mid between lo and hi, and f(mid). Since ng = 0, it updates either the value of hi or the value of lo with mid until succ(lo) = hi. Since mid satisfies the post-condition of function split point at line 3 of Algorithm 4, we can conclude that #[mid, hi] ≤ ⌈#[lo, hi]/2⌉ and #[lo, mid] ≤ ⌈#[lo, hi]/2⌉. Hence, in the worst-case scenario, function bisect lb calls f ⌈log2 #[xl, xu]⌉ times. In the case ng = 0, the precision post-condition of bisect lb holds and we are sure that f(succ(lo)) ≽ y. At line 19 of function lower bound, there is a while loop. Since succ(lo) = hi and f(succ(lo)) ≽ y, such loop is never executed. Then, either the function lower bound returns at line 23 because f(succ(lo)) ≻ y, or it returns at line 24 because f(succ(lo)) = y.
In any case, function lower bound called function f at most 2⌈log2 #[xl, xu]⌉ + 4 times. This concludes the proof. ⊓⊔
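The halving argument above relies on split point picking a midpoint in float-count distance rather than in real-value distance. The following is a minimal C sketch of one way to meet that post-condition, not necessarily the authors' implementation: it exploits the fact that the bit patterns of non-negative IEEE 754 floats, read as unsigned integers, are ordered like the floats themselves (negative endpoints would need the usual sign-magnitude adjustment).

#include <stdint.h>
#include <string.h>

/* Sketch of split_point for finite floats 0 <= lo < hi: returns a mid
   such that #[lo, mid] and #[mid, hi] differ by at most one float. */
static float split_point(float lo, float hi)
{
    uint32_t a, b, m;
    float mid;
    memcpy(&a, &lo, sizeof a);    /* bit pattern of lo */
    memcpy(&b, &hi, sizeof b);    /* bit pattern of hi */
    m = a + (b - a) / 2;          /* midpoint in float-count distance */
    memcpy(&mid, &m, sizeof mid);
    return mid;
}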

Theorem 55. If f : F → F has short glitches, that is, ng > 0 but wM < t, then the number of times the call to lower bound in Algorithm 2 evaluates f is bounded as detailed below.
Proof. After the call to gallop lb, the following cases are distinguished.
f(lo) ≻ y: function lower bound of Algorithm 2 returns either on line 6 or on line 8, depending on whether #[y, f(α)) > dM holds or not. If f(lo) ≼ y, condition f(lo) = y is tested in line 10 and if it holds, function lower bound returns on line 11. Therefore, in this case function lower bound terminates after calling function f at most 2 times.
f(lo) ≼ y and f(hi) ≺ y: function findhi lb(f, y, [xl, xu], ng, dM, wM, α, ω, t) is called after 2 calls to f (to test the values of f(lo) and f(hi)). We will now prove that the call to findhi lb will invoke function f at most wM + 2 times, in the worst case. Since ng > 0 and wM < t, the behavior of function findhi lb changes according to the following cases.
xu ≻ ω or #[f(xu), y) > dM: the call to findhi lb terminates at line 2, returning the value r = 0. This causes function lower bound to terminate with 2 invocations of function f.
ng = 1, f(succ(α)) ≺ f(α) and y ≻ f(α): the call to findhi lb terminates at line 5 returning the value r = 0. Again, lower bound terminates with 2 calls to function f.

ng = 1, f(succ(α)) ≺ f(α) and y ≼ f(α): the call to findhi lb terminates at line 8 returning the value r = 1 and invoking the function f just 2 times.
ng > 1: after 2 invocations of function f, findhi lb calls linsearch geq(f, y, [xl, xu], wM, t). The latter performs a float-by-float backwards search of the first value of hi such that f(hi) ≽ y. It stops after at most wM steps (recall wM < t, in this case). Hence, function linsearch geq invokes function f at most wM times. Therefore, we can conclude that findhi lb calls function f at most wM + 2 times.
From the point of view of complexity, the worst-case scenario is when findhi lb returns a value of hi which meets the preconditions of Algorithm bisect lb, allowing bisection to start (in this case findhi lb returns r = 1, see lines 8 and 14 of Algorithm 3). Hence, at line 18 of Algorithm 2, function bisect lb(f, y, ng, dM, wM, α, ω, s, t, lo, hi) is called. In each iteration, it computes the middle point mid between lo and hi and it evaluates f(mid). If y ≼ f(mid), function bisect lb updates the value of hi with mid on line 4. As we have already discussed in the proof of Theorem 54, mid satisfies the post-condition of function split point at line 3 of Algorithm 4. Therefore, #[lo, mid] ≤ ⌈#[lo, hi]/2⌉. Alternatively, when y ≻ f(mid), we have the following cases:

mid ≼ α or mid ≽ ω or #[f(mid), y) > dM: function bisect lb updates the value of lo with mid at line 5. Since mid satisfies the post-condition of function split point at line 3 of Algorithm 4, we can conclude that #[mid, hi] ≤ ⌈#[lo, hi]/2⌉.
α ≺ mid ≺ ω, ng = 1, f(succ(α)) ≺ f(α), f(ω) ≽ y and f(α) ≽ y: at line 8 the value of hi is updated with α. Since α ≺ mid ≺ ω, we have that #[lo, α] < ⌈#[lo, hi]/2⌉.
α ≺ mid ≺ ω, ng = 1, f(succ(α)) ≺ f(α), f(ω) ≽ y and f(α) ≺ y: at line 9 the value of lo is updated with mid. It is sufficient to remember that #[mid, hi] ≤ ⌈#[lo, hi]/2⌉.
α ≺ mid ≺ ω, ng = 1, f(succ(α)) ≺ f(α) and f(ω) ≺ y: at line 13 the value of lo changes to ω. Since, in this case, α ≺ mid ≺ ω, we have #[ω, hi] < ⌈#[lo, hi]/2⌉.
ng > 1: at line 16, function findfmax(f, wM, lo, mid) is called. It finds a value b inside interval [max{lo, pred^wM(mid)}, mid] such that the value f(b) is the highest possible. Hence, findfmax calls function f wM times. Then, two cases must be distinguished based on the value b returned by findfmax:
f(b) ≽ y: at line 18, the value of hi is updated with b. Since xl ≼ b ≼ mid, we have #[lo, b] ≤ ⌈#[lo, hi]/2⌉.
y ≻ f(b): at line 19, the value of lo is updated with mid and, as discussed above, #[mid, hi] ≤ ⌈#[lo, hi]/2⌉ holds.
Hence, in the worst-case scenario, function bisect lb calls function findfmax at every step. In that case, Algorithm bisect lb calls function f wM ⌈log2 #[xl, xu]⌉ + 1 times, in particular.

Since the precision post-condition of bisect lb holds (wM < t), the proof can be concluded with the same argument used at the end of the proof of Theorem 54. ⊓⊔

C Computation of Upper Bounds

The algorithm we conceived for the computation of the upper bounds is substantially similar to the one for the lower bounds in its structure and functioning. It employs the same arguments to obtain glitch data, and it terminates ensuring the post-condition predicates listed in Section 5.2.

Algorithm 6 consists of a first phase in which it tries to find a sub-interval inside the initial one, [xl, xu], that is suitable for the bisection process. If such an interval cannot be found, it tries to determine quickly whether the equation y = f(x) has a solution or not, compatibly with the available glitch information. Otherwise, the obtained interval is searched for an admissible upper bound by Algorithm 9, a dichotomic search that takes into account the possible presence of glitches. This algorithm is similar to bisect lb, except for the fact that it needs to ensure that the function is strictly greater than y in the whole interval between the found upper bound and xu.

Another significant difference between the two algorithms is the behavior in the case where the function evaluated at the mid-point mid is greater than y. The computation should continue in the first half of the original interval, discarding the second one and making sure that the graph of the function is entirely above y in the latter. This means asserting that there are no glitches after mid which are deep enough to let the function reach y. Data such as α, ω and the maximum glitch depth dM are almost always helpful in excluding this circumstance. Theoretically speaking, this is not always the case, e.g., if dM is very high, or y is very close to f(mid). The former case seldom occurs in practice, as noted in Section 4. The latter can occur in the last stages of the bisection process if the function increases very slowly. The experimental evaluation we performed, however, showed that this is not a substantial problem in practice. Anyway, should this circumstance occur, if the function has only one glitch and it is sufficiently narrow, it can be searched float-by-float. Otherwise, the whole right interval should be searched for glitches, which is clearly unfeasible, unless the intervals are very small. The analogous issue with bisect lb was making sure that mid was not inside a glitch, in order to exclude the left half of the interval. That situation could always be clarified if wM ≤ t; as a consequence, weaker precision guarantees are offered for the upper bound than for the lower bound. In particular, the algorithm ensures optimality of the bound if the function is monotonic, i.e., ng = 0. Otherwise, it finds an optimal upper bound if

– ng = 1: the function has one glitch only, and
– wM < t ∧ (α = α^f ∨ ω = ω^f ∨ (#[α, ω) > k ∧ k ≤ t)) is true.

Nevertheless, the algorithm for the upper bound has the same order of complexity as bisect lb. Below we give a more in-depth description of the algorithms, together with the proof of their correctness and more precise claims about complexity.

Algorithm 6 tries to refine the interval for x by finding a correct upper bound. The preconditions for this algorithm are listed in its Require statement, and they are the same as those of Algorithm 2. The algorithm terminates satisfying the post-condition presented in its Ensure statement. The predicates pi that appear in the statement are those defined at the beginning of Section 5.2.

The way algorithm upper bound operates is substantially similar to that of lower bound. First, function init tries to guess initial values for lo and hi by using the rough inverse f^i. Then, gallop ub tries to quickly find values for lo and hi suitable for the call to bisect ub. In particular, f(lo) ≼ y ≺ f(hi) and ∀x ∈ [hi, xu] : f(x) ≻ y must hold. If this is not the case, the if statements of the subsequent lines handle the situation.

If f(hi) ≺ y, we must discern the case when there is no solution from the case when xu is in a glitch, and there is actually a value of x such that f(x) reaches y somewhere outside of the glitch. Function findhi ub is called for this purpose. It operates mostly like findhi lb: if we can be sure that xu is not in a glitch, either because there are no glitches, or because it is outside of [α, ω], or if anyway no glitch could be deep enough for the function to reach y outside of it, we can claim that the function is lower than y on the whole interval. Otherwise, if there is only one glitch and we know its position, a case analysis can be done to conclude whether there is a solution or not. If none of these conditions apply, a float-by-float search is performed in order to try to reach the beginning of the glitch where xu is, if any. When findhi ub terminates, predicate p6 always holds: therefore this function cannot find a value of hi suitable for bisection, and it is only useful when the interval for y is a singleton.

When f(lo) ≻ y, if the function has no glitches, then we are sure its graph lies entirely above y, and equation y = f(x) has no solution. Otherwise, there might be glitches in [xl, xu] in which the function decreases until it reaches y. If there is only one glitch and it is not too wide, function check glitch searches it float-by-float to find out whether the function actually reaches y. If not, we can state that our constraint is unsatisfiable; the appropriate predicate index is returned otherwise.

If lo and hi satisfy its precondition, Algorithm bisect ub is eventually called on line 28. It has been designed to implement the dichotomic method on interval

Algorithm 6 Indirect propagation: upper bound(f, y, [xl, xu], ng, dM, wM, α, ω, f^i, s, t)
Require: f : F → F, y ∈ F, [xl, xu] ∈ IF, ng ≥ n^f_g, dM ≥ d^f_M, wM ≥ w^f_M, α ≼ α^f, ω ≽ ω^f, ng > 0 =⇒ (xl ≼ α ≼ ω ≼ xu), f^i : F → F, s, t ∈ N.
Ensure: (c) u ∈ F, r ∈ {5, 6, 7, 8, 9} =⇒ p_r(y, xl, xu, u)

(p) f(xl) ≼ y ≼ f(xu) ∧ (ng = 0 ∨ (ng = 1 ∧ wM < t ∧ (α = α^f ∨ ω = ω^f ∨ (#[α, ω) > k ∧ k ≤ t)))) =⇒ r ∈ {8, 9}
1: i := init(y, [xl, xu], f^i); ⊲ xl ≼ i ≼ xu
2: (lo, hi) := gallop ub(f, y, [xl, xu], dM, i);
3: ⊲ (xl ≼ lo ≼ hi ≼ xu) ∧ (xl ≺ lo =⇒ y ≽ f(lo)) ∧ (xu ≻ hi =⇒ #[y, f(hi)) > dM)

4: if f(hi) ≺ y then
5: u := findhi ub(f, y, [xl, xu], ng, dM, wM, α, ω, t);
6: r := 6; return
7: else if f(hi) = y then
8: u := hi; r := 9; return
9: end if;
10: if f(lo) ≻ y then
11: if ng = 0 ∨ #[y, f(α)) > dM then
12: r := 5; return
13: else
14: (b, z) := check glitch(f, y, [xl, xu], ng, dM, wM, α, ω, t, lo, lo, hi);
15: if b = 0 then
16: r := 5; return
17: else if b = 1 then
18: if f(z) = y then
19: u := z; r := 9; return
20: else
21: u := succ(z); r := 8; return
22: end if
23: else
24: u := min{hi, ω}; r := 7; return
25: end if
26: end if
27: end if;
28: hi := bisect ub(f, y, ng, dM, wM, α, ω, s, t, lo, hi);
29: while f(pred(hi)) ≻ y ∧ t > 0 do
30: hi := pred(hi);
31: t := t − 1
32: end while;
33: if f(pred(hi)) ≺ y then
34: u := hi; r := 8
35: else if f(pred(hi)) = y then
36: u := pred(hi); r := 9
37: else
38: u := hi; r := 7
39: end if

Algorithm 7 Indirect propagation: findhi ub(f, y, [xl, xu], ng, dM, wM, α, ω, t)
Require: f : F → F, y ∈ F, [xl, xu] ∈ IF, ng ≥ n^f_g, dM ≥ d^f_M, wM ≥ w^f_M, α ≼ α^f, ω ≽ ω^f, ng > 0 =⇒ (xl ≼ α ≼ ω ≼ xu), t ∈ N, f(xu) ≺ y.
Ensure: u ∈ F, p6(y, xl, xu, u).
1: if ng = 0 ∨ xu ≻ ω ∨ #[f(xu), y) > dM then
2: u := xl
3: else if ng = 1 ∧ (wM > t ∨ (f(succ(α)) ≺ f(α) ∧ y ≽ f(α))) then
4: if y ≺ f(α) ∨ f(succ(α)) ≽ f(α) then
5: u := xu
6: else if y = f(α) then
7: u := succ(α)
8: else
9: u := xl
10: end if
11: else
12: (b, hi, x̂) := linsearch geq(f, y, [xl, xu], wM, t);
13: ⊲ (b = 1 ∧ hi ∈ [xl, xu] ∧ f(hi) ≽ y ∧ ∀x ∈ [succ(hi), xu] : f(x) ≺ y)
14: ⊲ ∨ (b = 0 ∧ ∀x ∈ [x̂, xu] : f(x) ≺ y)
15: ⊲ where v = min{t, wM} and x̂ = max{xl, pred^v(xu)}
16: if b = 1 then
17: u := succ(hi)
18: else if t ≥ wM then
19: u := xl
20: else
21: u := x̂
22: end if
23: end if

Algorithm 8 Indirect propagation: check glitch(f, y, [xl, xu], ng, dM, wM, α, ω, t, lo, m, hi)

Require: xl ≼ lo ≼ m ≼ hi ≼ xu, f(m) ≻ y, f(hi) ≻ y, f : F → F, y ∈ F, [xl, xu] ∈ IF, ng ≥ n^f_g, dM ≥ d^f_M, wM ≥ w^f_M, α ≼ α^f, ω ≽ ω^f, ng > 0, xl ≼ α ≼ ω ≼ xu, α ≤ hi ∧ ω ≥ m, t ∈ N.
Ensure: b ∈ {0, 1, 2},
n^f_g = 1 ∧ wM < t ∧ (α = α^f ∨ ω = ω^f ∨ (#[α, ω) > k ∧ k ≤ t)) =⇒ b ∈ {0, 1},
b = 0 =⇒ ∀x ∈ [m, hi] : f(x) ≻ y,
b = 1 =⇒ z ∈ F ∧ lo ≼ z ≼ hi ∧ ∀x ∈ (z, hi] : f(x) ≻ y ∧ f(z) ≼ y.
1: if ng = 1 ∧ wM ≤ t ∧ (f(ω−) ≺ f(ω) ∨ f(α+) ≺ f(α) ∨ (#[α, ω) > k ∧ k ≤ t)) then
2: sl := max{α, lo};
3: if f(ω−) ≺ f(ω) ∨ (#[α, ω) > k ∧ k ≤ t) then
4: su := min{ω, hi}
5: else
6: su := min{succ^wM(α), hi}
7: end if;
8: (b, z) := linsearch leq(f, y, wM, sl, su)
9: ⊲ (b = 0 ∧ z = x̂ ∧ ∀x ∈ [z, su] : f(x) ≻ y)
10: ⊲ ∨ (b = 1 ∧ z ∈ [x̂, su] ∧ f(z) ≼ y ∧ ∀x ∈ (z, su] : f(x) ≻ y)
11: ⊲ where x̂ = max{sl, pred^wM(su)}
12: else
13: b := 2
14: end if

[lo, hi], taking into account the fact that f may not be isotone due to glitches. In this function, check glitch is used again in those cases where we do not know whether the function's graph is completely above y in the upper half of interval [lo, hi], due to the possible presence of glitches. If performing the linear search is unfeasible because of the excessive number or width of glitches, function logsearch ub tries to quickly find a value for hi such that the distance between f(x) and y is too large for a glitch to be deep enough to let the function reach y. Since this could make the partitioning uneven and undermine the logarithmic complexity of the process, the number of such calls to logsearch ub is limited by threshold s.
Finally, on line 29 a while loop goes backwards float-by-float for a maximum of t steps, until it reaches a value for hi such that f(pred(hi)) ≼ y, and returns it as the upper bound. If such a value is not found, the algorithm returns the current value of hi with predicate p7, which means an optimal value for the upper bound was not found.
The correctness of Algorithms 6, 7, 8 and 9 is formally proved in the rest of this section. The number of calls to function f performed at most by upper bound has the form k⌈log2 #[xl, xu]⌉ + c, where k and c are small constants that depend on wM, s and t. The exact values of these constants in some special glitch data configurations are given in Theorems 2 and 3.
Lemma 1. Function findhi ub specified in Algorithm 7 satisfies its contract.

Algorithm 9 Indirect propagation: bisect ub(f, y, [xl, xu], ng, dM, wM, α, ω, s, t, lo, hi)

Require: xl ≼ lo ≺ hi ≼ xu, f(lo) ≼ y ≺ f(hi), ∀x ∈ [hi, xu] : f(x) ≻ y, f : F → F, y ∈ F, [xl, xu] ∈ IF, ng ≥ n^f_g, dM ≥ d^f_M, wM ≥ w^f_M, α ≼ α^f, ω ≽ ω^f, ng > 0 =⇒ (xl ≼ α ≼ ω ≼ xu), s, t ∈ N.
Ensure: (c) xl ≼ lo ≺ hi ≼ xu, f(lo) ≼ y ≺ f(hi), ∀x ∈ [hi, xu] : f(x) ≻ y
(p) (n^f_g = 0 ∨ (n^f_g = 1 ∧ wM < t ∧ (α = α^f ∨ ω = ω^f ∨ (#[α, ω) > k ∧ k ≤ t)))) =⇒ f(pred(hi)) ≼ y
1: while #[lo, hi) > 1 do
2: mid := split point(lo, hi);
3: ⊲ ∃m, m′ > 0, |m − m′| ≤ 1, mid = pred^m(hi) = succ^m′(lo)
4: if f(mid) ≼ y then
5: lo := mid
6: else if ng = 0 ∨ hi ≼ α ∨ mid ≽ ω ∨ #[y, f(mid)) > dM then
7: hi := mid
8: else
9: (b, z) := check glitch(f, y, ng, dM, wM, α, ω, t, lo, mid, hi);
10: if b = 0 then
11: hi := mid
12: else if b = 1 then
13: hi := succ(z); break
14: else
15: z := logsearch ub(f, dM, mid, hi, y, s);
16: ⊲ z ∈ [mid, hi] ∧ ((z ≺ hi) =⇒ #[y, f(z)) > dM)
17: if z ≺ hi then
18: hi := z
19: else break
20: end if
21: end if
22: end if
23: end while

Proof. We assume that the precondition for findhi ub(f, y, [xl, xu], ng, dM, wM, α, ω, t) is satisfied. In particular, f(xu) ≺ y holds.
The guard of the if statement on line 1 is the same as the one of Algorithm 3. When it is satisfied, we know for sure that either xu is not in a glitch, or that xu might be inside a glitch but not deep enough for the function to reach the value y elsewhere in [xl, xu]. Together with f(xu) ≺ y, this allows us to state that the equation y = f(x) has no solution, and setting u = xl satisfies p6.
The purpose of the if block which starts at line 3 is to distinguish between a few cases when we know from ng = 1 and ng ≥ n^f_g that f has at most one glitch. If such block is entered, xu ≼ ω and #[f(xu), y) ≤ dM hold, which comes from the negation of the guard of the previous if statement. Therefore, xu may be inside a glitch. We compute the value of the function in α to discern between the cases listed below.
y ≺ f(α): This, together with the precondition of this algorithm, implies that f(xu) ≺ y ≺ f(α). Therefore, there must exist some x ∈ [α, xu] such that f(x) ≼ y. If wM > t, then we refrain from searching such x because the glitch may be too wide, and setting u = xu trivially satisfies p6. This is captured by the if guard of line 4. Otherwise, the control flow can reach line 12.
y ≽ f(α) ∧ f(succ(α)) ≽ f(α): The second term of this condition implies that α ≺ α^f. Therefore, we cannot exclude that the function further increases between α and α^f, reaching y. If wM > t there is nothing more we can do to refine the upper bound, and setting u = xu trivially satisfies p6. This case is also captured by the if statement of line 4. If wM ≤ t, line 12 can be executed.
y = f(α) ∧ f(succ(α)) ≺ f(α): The second term of this condition implies that α = α^f. Since y ≻ f(xu) and ng = 1, we know for sure that the whole interval (α, xu] is a glitch, and the function is strictly lower than y in that interval. Therefore, we just found the rightmost point x = α where f(x) = y, and predicate p6 is satisfied if u = succ(α) is set. When this condition is true, control flows into the if statement of line 6.
y ≻ f(α) ∧ f(succ(α)) ≺ f(α): The same reasoning we did for the previous case can be made: ∀x ∈ [α, xu] : y ≻ f(x). Together with the fact that y ≻ f(α) and that f is isotonic to the left of α, this allows us to state that the equation y = f(x) has no solutions in [xl, xu], and we can set u = xl, satisfying p6. This last case is caught by the else statement of line 8.

The if guard of line 3 captures all the conditions listed above if wM > t. Otherwise, if wM ≤ t, only the last two conditions are captured, because if the first two of them occur, then proceeding with the linear search described below can find a better value for u.
In line 12, function linsearch geq is invoked. Starting from xu, it searches backwards for a point where the function reaches or exceeds y. If it does not find such a value in at most t steps and before reaching xl, b = 0 is set. Otherwise, b = 1 is set and hi is set to such value.

– If b = 1, the post-condition of linsearch geq states that ∀x ∈ [succ(hi), xu] : y ≻ f(x), and hi is the rightmost point where y ≼ f(x). Therefore, setting u = succ(hi) satisfies p6 (line 17).
– If b = 0, then linsearch geq could not find a point where the function is greater than or equal to y, and the function is always lower than y in the interval [x̂, xu]. This could happen because of either of the two reasons described below, which can be distinguished by comparing t and wM.
t ≥ wM: linsearch geq analyzed the entire glitch, and the value of f in the point where the glitch starts is still lower than y. Since the function is quasi-isotonic, it can only decrease further proceeding towards xl. Therefore, we can state that the equation y = f(x) has no solution, and set u = xl, which satisfies p6 (line 19).
t < wM: This last case occurs when the else branch of line 20 is entered. The glitch was too wide to be analyzed completely, and we can only set u = x̂, satisfying p6 (line 21). ⊓⊔
Lemma 2. Function check glitch specified in Algorithm 8 satisfies its contract.

Proof. The precondition of function check glitch requires three values lo, m and hi such that xl ≼ lo ≼ m ≼ hi ≼ xu and both f(m) ≻ y and f(hi) ≻ y. In this situation, we would like to claim that the graph of f is always higher than y, and that there is no solution to equation y = f(x) in interval [m, hi]. However, there might be a glitch inside that interval that lets the graph of function f reach the value y. The purpose of this algorithm is to search the glitch float-by-float when possible, in order to either find the rightmost point where f evaluates to y, or to be able to claim that f never reaches y in said interval.
The algorithm proceeds with the linear search only if there is at most one glitch, and it is sufficiently tight. This condition is checked by the guard of the if statement on line 1. If it does not hold, we refrain from searching the glitch because it is computationally too expensive, and b is set to 2 in line 13, which means that nothing can be said about the solution of the equation y = f(x) in interval [m, hi].
Otherwise, function linsearch leq searches the glitch backwards starting from su for a maximum of wM floats, until it either finds a value z such that f(z) ≼ y, or it reaches sl. sl and su are chosen so that the whole glitch is checked by linsearch leq. Therefore, sl is set to the maximum between α (≼ α^f) and lo. As for su, it is chosen depending on the following cases:
f(ω−) ≺ f(ω): This implies that ω = ω^f, and the glitch finishes in ω. Setting su to the minimum between ω and hi lets linsearch leq search the entire glitch (or the part of it inside the interval of interest), because it is surely not too wide (wM ≤ t).
#[α, ω) > k ∧ k ≤ t: We may not know exactly where the glitch ends, but the distance between α and ω is small enough to search the whole interval delimited by them. Again, setting su := min{ω, hi} lets linsearch leq analyze the entire glitch.

f(α+) ≺ f(α): This implies α = α^f. Therefore, the glitch starts exactly in α and, since its width is lower than t, setting su := min{succ^wM(α), hi} allows linsearch leq to search the entire glitch.
On line 8, function linsearch leq is finally called. It can only return b = 0 or b = 1, which proves part of the post-condition of this algorithm, i.e.,

n^f_g = 1 ∧ wM < t ∧ (α = α^f ∨ ω = ω^f ∨ (#[α, ω) > k ∧ k ≤ t)) =⇒ b ∈ {0, 1}.

It also satisfies the two other claims of the Ensure statement, depending on the value of b.
b = 0: By the post-condition of linsearch leq, ∀x ∈ [z, su] : f(x) ≻ y holds, with z = x̂ = max{sl, pred^wM(su)}. As we previously explained, sl and su are chosen so that the only glitch is entirely contained in [sl, su], and it is within wM floats from su: the glitch is contained in interval [z, su], which was analyzed by linsearch leq. Therefore, function f is actually isotonic in [m, z) and in [su, hi]. This, together with f(m) ≻ y, f(α) ≼ f(ω), and the post-condition of linsearch leq, allows us to claim that ∀x ∈ [m, hi] : f(x) ≻ y.
b = 1: The post-condition of linsearch leq states that we have z ∈ [x̂, su], with x̂ = max{sl, pred^wM(su)} ≽ lo and su ≼ hi: this implies that lo ≼ z ≼ hi. Moreover, su was chosen so that f is isotonic in the interval [su, hi]. This, together with ∀x ∈ (z, su] : f(x) ≻ y (which is part of the post-condition of linsearch leq), proves that ∀x ∈ (z, hi] : f(x) ≻ y. Finally, the post-condition of linsearch leq states that f(z) ≼ y, which concludes the proof of this part of the post-condition of check glitch. ⊓⊔

Lemma 3. Function bisect ub specified in Algorithm 9 satisfies its contract.

Proof. We assume that the precondition of bisect ub(f, y, [xl, xu], ng, dM, wM, α, ω, s, t, lo, hi) is satisfied before the first iteration of the while loop that starts at line 1. We will now prove that the loop invariant

Inv ≡ (xl ≼ lo ≺ hi ≼ xu) ∧ f(lo) ≼ y ≺ f(hi) ∧ ∀x ∈ [hi, xu] : f(x) ≻ y

holds during and after the execution of the loop, satisfying the post-condition expressed in the Ensure statement.

Initialization: The invariant Inv is implied by the precondition of this algorithm. Therefore, it holds before the execution of the loop.
Maintenance: At the beginning of the loop body, we assume that both Inv and #[lo, hi) > 1, the guard of the loop, hold. We will now prove that either the loop terminates with a break statement, or it continues and one between lo and hi takes a new value lo′ or hi′. The said new value will be part of the interval (lo, hi). Note that this last statement implies xl ≼ lo ≼ lo′ ≼ hi′ ≼ hi ≼ xu, and therefore any new value for lo and hi that satisfies this condition also satisfies the first condition of the invariant. We will prove that the invariant holds at the end of the loop body.

After function split point is invoked at line 2, lo ≺ mid ≺ hi. Note that this implies that, every time lo′ or hi′ are set to mid, the new value is part of the interval (lo, hi). Then, the value of f(mid) is compared to y in order to decide whether mid can be a new value for lo or hi. The following cases may occur:
f(mid) ≼ y: lo′ is set to mid. Since lo ≺ mid ≺ hi and because Inv holds, also xl ≼ lo′ = mid ≺ hi ≼ xu holds. Again, Inv and f(mid) ≼ y imply that f(lo′) ≼ y ≺ f(hi). The third part of the invariant trivially holds, because hi remained the same.
f(mid) ≻ y: If ng = 0 ∨ hi ≼ α ∨ mid ≽ ω ∨ #[y, f(mid)) > dM, we are sure that ∀x ∈ [mid, hi] : f(x) ≻ y, because there is no glitch in [mid, hi] where the function could become lower than or equal to y. This can be stated because either we are sure there are no glitches at all in that interval (first three conditions of the if guard of line 6), or f(mid) is so much higher than y that no glitch can be sufficiently deep for the function to touch y (last condition). Therefore, on line 7 hi′ is set to mid, which satisfies the invariant. In fact, hi′ ∈ (lo, hi) holds because lo ≺ mid ≺ hi, and f(mid) ≻ y implies that f(lo) ≼ y ≺ f(hi′) holds. The last condition of the invariant is satisfied by this choice of hi′, as proved at the beginning of this paragraph.
Otherwise, the control flow reaches the else body starting at line 9. At this point, we are not sure whether there is a glitch between mid and hi where the graph of the function reaches y or not. If there is only one glitch, its maximum width is small enough, and we know exactly where it starts, then function check glitch searches it float-by-float for a point z such that f(z) ≼ y. Whether it succeeded or not can be determined from the value of b.
b = 0: In this case, check glitch could search the glitch, but it did not find any point in which the function was lower than or equal to y: setting hi′ = mid is correct. The first condition of the invariant is satisfied because of the post-condition of split point, and the second condition holds because lo remained the same and, since we are in one of the else statements of the if at line 4, f(mid) ≻ y. The third condition is also satisfied because the post-condition of check glitch ensures that ∀x ∈ [mid, hi] : f(x) ≻ y.
b = 1: Function check glitch succeeded in finding a value z such that f(z) ≼ y inside the glitch. hi′ is then set to succ(z). xl ≼ lo ≺ hi′ ≼ xu holds because the post-condition of check glitch states that mid ≼ z ≼ hi, and both mid and hi are enclosed in xl and xu. The said post-condition also states that ∀x ∈ (z, hi] : f(x) ≻ y: this maintains the last condition of the invariant, and implies that f(succ(z)) ≻ y, which satisfies f(lo) ≼ y ≺ f(hi′). This part of the post-condition of check glitch, together with ∀x ∈ [hi, xu] : f(x) ≻ y, also implies that z is the rightmost point such that f(z) ≼ y. The while loop can therefore be broken, as the upper bound found until now cannot be refined further.

b = 2: In this case the conditions that would allow check glitch to search the glitch did not hold. Hence, lo and hi were left untouched, and the algorithm proceeds with the else block at line 14. Here, function logsearch ub is called to find a suitable value for hi. Its post-condition allows us to distinguish between the two following cases:
z ≺ hi: logsearch ub found a value z such that #[y, f(z)) > dM: this assures us that there is no glitch in [z, hi] deep enough for the graph of the function to reach y. Therefore, setting hi′ = z satisfies the last two terms of the invariant. Also, hi′ ∈ (lo, hi) holds because z ∈ (mid, hi).
z = hi: In this case logsearch ub could not find a safe value for hi, and we refrain from trying to refine the upper bound further. Since the loop is broken without changing anything, the invariant still holds.
Termination: In the previous paragraphs we proved that either:
– the loop terminates with a break statement;
– the loop continues and one of lo′ or hi′ is set to a new value contained in the interval (lo, hi). In this case, the distance between lo and hi decreases at each iteration: the guard of the while loop, #[lo, hi) > 1, will eventually be negated, terminating the loop.
Correctness: We have already proved that the invariant Inv holds after each iteration of the loop, and whenever it terminates with a break statement. Inv coincides with the correctness part of the Ensure statement, which is therefore also proved.
If n^f_g = 0 ∨ (n^f_g = 1 ∧ wM < t ∧ (α = α^f ∨ ω = ω^f ∨ (#[α, ω) > k ∧ k ≤ t))) holds, the post-condition of check glitch assures us that either b = 0 or b = 1. Therefore, the only break statement that can be reached is the one at line 13. In this case, the same post-condition implies that f(z) ≼ y: setting hi′ = succ(z) satisfies the precision statement, because f(pred(hi′)) = f(z) ≼ y. In all the other cases, the control flow can only reach lines 5, 7 or 11, which let the loop continue. If the loop is not terminated by the break statement at line 13, the same reasoning presented at the end of the proof of Algorithm 4 can be made: whenever the loop terminates because its guard is negated, we have #[lo, hi] = 2. This, together with Inv, implies f(pred(hi)) = f(lo) ≼ y. ⊓⊔
Theorem 1. Function upper bound, specified in Algorithm 6, satisfies its contract.
Proof. Calls to functions init and gallop ub, on lines 1 and 2 respectively, have the purpose of finding values for lo and hi suitable for the bisection phase. That is, as specified in the Require statement of function bisect ub (Algorithm 9), f(lo) ≼ y ≺ f(hi) and ∀x ∈ [hi, xu] : f(x) ≻ y. Function gallop ub mirrors its lower bound counterpart: for more details about those functions, see the proof of Theorem 53. The if statements before the call to bisect ub on line 28 deal with the cases in which the said functions fail in their purpose.
First, the value of f(hi) is checked by the if statements on lines 4 and 7, leading to the following cases:

f(hi) ≺ y: In this case hi clearly does not satisfy the precondition of bisect ub and, because of the post-condition of gallop ub stated on line 3, hi = xu. Therefore, function findhi ub is called on line 5 to further discern whether the equation y = f(x) has no solution, or hi = xu is inside a glitch and there is, in fact, a solution. Since function findhi ub always sets u to a value satisfying p6 (see Lemma 1), r can be set to 6, and function upper bound can terminate.
f(hi) = y: Again, according to the post-condition of gallop ub, we have hi = xu. Since hi is the highest value of interval [xl, xu], setting r = 9 and u = hi is correct.
If the control flow is not caught by the if statements described above, then f(hi) ≻ y and, by the post-condition of gallop ub, either of the following holds:
xu ≻ hi: This implies that #[y, f(hi)) > dM. Therefore, no glitch in interval [hi, xu] can be deep enough for function f to reach y: ∀x ∈ [hi, xu] : f(x) ≻ y holds.
xu = hi: In this case, ∀x ∈ [hi, xu] ≡ [xu, xu] : f(x) ≻ y trivially holds.
Therefore, hi satisfies the preconditions of bisect ub.
Then, the value of f in lo is checked on line 10. If f(lo) ≻ y, lo is not suitable for bisection, and the body of the if statement on the said line tries to determine whether y = f(x) has no solutions, or a solution is contained in a glitch. Note that the post-condition of gallop ub entails that, in this case, lo = xl. The guard of the if statement on line 11 catches the two following cases:
ng = 0: In this case, function f is isotonic. Therefore, ∀x ∈ [lo, xu] : f(x) ≽ f(lo). Together with f(lo) ≻ y, this implies that ∀x ∈ [lo, xu] ≡ [xl, xu] : f(x) ≻ y. This lets us state that equation y = f(x) has no solution, and setting r = 5 on line 12 is correct.
#[y, f(α)) > dM: This condition signifies that f(α) is too high for the graph of f to reach y inside a glitch after α. This, together with the definition of α and ω, brings the following conclusions:
– ∀x ∈ [xl, α] : f(x) ≻ y (f(lo) ≻ y with lo = xl, and isotonicity of f);
– ∀x ∈ [α, ω] : f(x) ≻ y;
– ∀x ∈ [ω, xu] : f(x) ≻ y (f(ω) ≽ f(α) ≻ y, and isotonicity of f).
Therefore, equation y = f(x) has no solution in interval [xl, xu], and we can set r = 5.
If none of the above conditions apply, there might be glitches deep enough to allow the function to take the value y somewhere in interval [xl, xu]. Function check glitch is called with m = lo in line 14 to search such a glitch, if it is only one and it is not too wide. The correctness of the said function is discussed in Lemma 2. The three values of b that this function can return are distinguished in the following if statements.
b = 0: By the post-condition of check glitch, ∀x ∈ [lo, hi] : f(x) ≻ y. As we previously noted, ∀x ∈ [hi, xu] : f(x) ≻ y holds at this point of the algorithm. This, together with the fact that lo = xl, implies that ∀x ∈ [xl, xu] : f(x) ≻ y. Therefore, equation y = f(x) has no solution and we can set r = 5.

b = 1: The post-condition of check glitch assures us that a value z ∈ F is returned, and lo ≼ z ≼ hi ∧ ∀x ∈ (z, hi] : f(x) ≻ y ∧ f(z) ≼ y holds. We also know that ∀x ∈ [hi, xu] : f(x) ≻ y, as stated before. This means that ∀x ∈ (z, xu] : f(x) ≻ y and there is no solution after z. The algorithm must now distinguish whether z is a solution or not, and it does so starting from line 18.
f(z) = y: z is a solution to equation y = f(x), and it is also the rightmost one, for the reasons stated above. Therefore, setting u = z and r = 9 is correct, and no further action is required, so the algorithm can return.
f(z) ≺ y: We found the rightmost point where the graph of f crosses y without taking its value. Setting u = succ(z) satisfies p8, and the algorithm can terminate.
b = 2: In this last case, either ng > 1, or the glitch was too wide to be searched float-by-float. The only thing that can be done to refine the upper bound is setting u to the minimum between ω and hi. These values both satisfy p7. For hi, this is true because, as we noted before, at this point of the algorithm we have ∀x ∈ [hi, xu] : f(x) ≻ y. As for ω, because of the definition of α and ω, we are sure that f is isotonic outside interval [α, ω] which, together with the fact that f(α) ≼ f(ω) and f(lo) ≻ y with lo = xl, implies that ∀x ∈ [ω, xu] : f(x) ≻ y.
If control does not flow into the body of the if statement on line 10, then we have f(lo) ≼ y, which satisfies the precondition of bisect ub. Both lo and hi are suitable for the bisection phase, which is started on line 28 by calling bisect ub. The correctness of this function is discussed in Lemma 3: after its call, the post-condition stated in the Ensure statement of Algorithm 9 is satisfied.
On line 29, a while loop approaches the solution of y = f(x) by going backwards float-by-float. This loop mirrors the one starting on line 19 of Algorithm 2, and so does the proof of its correctness and termination. See Theorem 53 for more details, keeping in mind that the loop invariant is ∀x ∈ [hi, xu] : f(x) ≻ y in upper bound. Such invariant holds on line 33, where the value of f(pred(hi)) is checked, leading to the following cases:
f(pred(hi)) ≺ y: If u = hi is set, we have f(pred(u)) ≺ y ≺ f(u). This, together with the while-loop invariant, satisfies p8. u and r are set to the said values on line 34.
f(pred(hi)) = y: The rightmost solution of y = f(x) was found. u can be set to pred(hi) and r to 9, because p9 is satisfied.
f(pred(hi)) ≻ y: This condition holds in the body of the else statement of line 37. The loop invariant, ∀x ∈ [hi, xu] : f(x) ≻ y, assures that setting u = hi satisfies p7; u and r are set accordingly on line 38.
The correctness statement of this algorithm is therefore proved.
In order to prove the precision statement, we assume that f(xl) ≼ y ≼ f(xu) holds, together with either ng = 0 or ng = 1 ∧ wM < t ∧ (α = α^f ∨ ω = ω^f ∨ (#[α, ω) > k ∧ k ≤ t)). This, together with the post-condition of gallop ub, implies that after line 2 f(lo) ≼ y and f(hi) ≽ y hold, even if lo = xl or hi = xu.

Therefore, f(lo) is always suitable for bisect ub, and the if statement of line 10 cannot be entered. As for f(hi), the following two cases must be distinguished:
f(hi) = y: The body of the if statement on line 7 is entered, and r = 9 is returned, satisfying the precision post-condition.
f(hi) ≻ y: f(hi) satisfies the precondition of bisect ub, which is called on line 28. After this call, according to the precision post-condition of bisect ub, f(pred(hi)) ≼ y. The while loop on line 29 will therefore assign pred(hi) to hi, so that f(hi) ≼ y. Only the bodies of the if statements on lines 33 and 35 can now be entered: r can only be set to 8 or 9, respectively. This satisfies the precision post-condition of this algorithm. ⊓⊔

Theorem 2. In case f : F → F is isotonic, that is, ng = 0, then, for each [xl, xu] ∈ IF, dM, wM, α, ω, f^i : F → F, s, t ∈ N, the call to the function upper bound(f, y, [xl, xu], 0, dM, wM, α, ω, f^i, s, t) in Algorithm 6 can be executed calling the function f at most 2⌈log2 #[xl, xu]⌉ + 4 times.
Proof. The proof of this theorem is analogous to the proof of Theorem 54. It is therefore omitted.

Theorem 3. In case the function f : F → F has small glitches, that is, ng > 0 but wM < t, then, for each [xl, xu] ∈ IF, dM, wM, α, ω, f^i : F → F, s, t ∈ N, the call to function upper bound in Algorithm 6 can be executed calling the function f at most

(s + 2)⌈log2 #[xl, xu]⌉ − s + t + 7 times. ⊓⊔

D Implementation of the Trigonometric Algorithms

This section contains some theoretical and practical results that concern the implementation of the interval-refinement algorithms for trigonometric functions. These results were omitted from the main article for brevity and simplicity of exposition, but they are thoroughly described here as a reference for those willing to implement the described concepts.

D.1 Glitches in Trigonometric Functions

In this section, the concept of monotonicity glitches in trigonometric functions is analyzed in a more formal way, and new definitions of glitches that take into account the problematic behavior of these functions are given. In particular, we define the set of the quasi-monotonic intervals, which captures in a formal way the fact that trigonometric functions change their monotonicity periodically, and their graphs consist of a succession of monotonic branches. These concepts will be required in the proofs of the presented algorithms.
With the definitions of glitches given in Section 4, the normal changes in monotonicity of trigonometric functions would be detected as glitches. Suppose we analyzed, say, the sinf function: Definition 6 would recognize all the intervals Iiso,k ∈ IF of the form

Iiso,k = [ (−(3/2)π + 2kπ)↑ , ((1/2)π + 2kπ)↓ ]

as isotonicity glitches, and intervals Ianti,k of the form

Ianti,k = [ (−(1/2)π + 2kπ)↑ , ((3/2)π + 2kπ)↓ ]

as antitonicity glitches, with k ∈ Z. Actual glitches would be recognized as sub-glitches of these intervals, thus being difficult to discern.
On the other hand, Definition 6 is still useful if we separately consider those intervals in which the functions are quasi-isotonic or quasi-antitonic. The following definitions let us denote appropriate intervals in which the functions have such properties. Definition 7 describes the maximal sets of intervals in which the function has a constant monotonicity, while Definition 8 applies the definition of glitches to those intervals.

Definition 7. (Set of the quasi-isotonic (quasi-antitonic) intervals.) Let f : R → R be a periodic function, and let M^R_f (A^R_f) be the set of the maximal isotonic (resp., antitonic) intervals of f in R:

M^R_f := { I ∈ IR | f is isotonic on I ∧ ∀I′ ∈ IR : I ⊂ I′ =⇒ f is not isotonic on I′ }.

Let D ⊆ F such that, for each x ∈ D, f(x) ∈ R. The set of the quasi-isotonic intervals of f in D is a set M^D_f ⊆ IF (Definition 3) such that:

M^D_f := { F ∈ IF | F ⊆ D ∧ ∃I ∈ M^R_f . F = I ∩ D }.

When an interval I ∈ M^R_f has 0 as the lower or upper bound, the corresponding interval in M^D_f has +0 or −0 as the corresponding bound. The sets of the (maximal) quasi-antitonic intervals A^R_f and A^D_f are defined similarly.
Definition 8. (Isotonicity glitch in a periodic function.) Let f : F → F be the implementation of a periodic function, let I ⊆ F such that, for each x ∈ I, f(x) ∈ R, and let M^I_f be the set of the quasi-isotonic intervals of f in I. An isotonicity glitch of f in I is an interval G ⊆ I such that:

1. there exists an interval IM ∈ M^I_f such that G ⊆ IM;
2. G is an isotonicity glitch of f in IM according to Definition 6.

Antitonicity glitches in periodic functions are defined similarly, but substituting M^I_f with A^I_f. Glitch width and depth in periodic functions are consequently defined according to Definition 6.

D.2 Range Reduction for Trigonometric Functions

Now that we have theoretically characterized the behavior of trigonometric functions, we present a method for bringing these concepts into practice.
In order to split the domain of trigonometric functions into quasi-monotonic intervals, algorithms capable of discerning close floating-point approximations of multiples of π/2 are needed. Such algorithms will be very similar to the range reduction algorithms used to implement trigonometric library functions: see, e.g., [BKD+05,DMMM96,Ng92,PH83]. Let x ∈ F be the argument of a trigonometric function: argument reduction is aimed at finding values k ∈ Z and r ∈ R such that x = k(π/2) + r. In order to obtain these values, k = ⌊x(2/π)⌋ and an approximation of r = (π/2)(x(2/π) − k) are computed with an appropriately increased precision. Then, r is used to compute the value of the function in a reduced interval, and the last digits of k are used to identify the quadrant of x. A notable implementation of range reduction algorithms is the one in fdlibm;²⁴ it is also included in some other

²⁴ Developed in 1993 by Sun Microsystems, Inc. See http://www.netlib.org/fdlibm/, file e_rem_pio2.c, last accessed on April 19th, 2020.

libm implementations, such as GNU libc,²⁵ which uses it for some architectures. Other important implementations are the ones of the mentioned CR-LIBM and libmcr libraries.
As we shall point out, the algorithms needed for our analysis do not have the same precision requirements as actual range reduction algorithms: since we will not actually compute the functions, r is not needed in full precision. An algorithm capable of computing integer division by π/2 is needed, as well as a generator of floating-point approximations of multiples of π/2.
The computation of such operations poses the problem of deciding how many digits of the approximation of π/2 (or 2/π) are needed in order to obtain correctly rounded results. This is also known as the Table Maker's Dilemma [LMT98]. Lemma 4 allows us to answer this question, by computing the rounding error generated by the multiplication of a floating-point number k, which shall be considered exact, and a floating-point approximation [x]r of a real number x. The rounding mode r used to generate the approximation and to perform the multiplication can be any one of those defined in the IEEE 754 Standard. In order to prove Lemma 4, the following Definition 9 is needed.

Definition 9. Let x ∈ R be such that ∄n ∈ Z . |x| = 2^n and −fmax < |x| < fmax; let ex be the exponent of x, i.e., the unique integer such that 2^ex ≤ |x| < 2^(ex+1); and let ∇(x) = [x]r − x denote the rounding error affecting the floating-point approximation of x, so that |∇(x)| ≤ 2^(ex−p+1) in a format with precision p.

Lemma 4. Let k ∈ F be a number in any of the IEEE 754 binary floating-point formats, with an exponent ek. Let x ∈ R be such that ∄n ∈ Z . |x| = 2^n and fmin < |x| < fmax, with exponent ex. Then, if the multiplication is correctly rounded, |∇(k ⊗ [x]r)| ≤ 2^(ek+ex−p+4).

Proof. The purpose of this lemma is to find a sufficiently tight upper bound for the rounding error of the said multiplication, keeping into account the initial rounding error of the second floating-point operand [x]r. This can be written as the sum of the real number x plus its rounding error, i.e., [x]r = x + ∇(x). Since we are assuming the multiplication is correctly rounded, it can also be written as the sum of its exact value plus its absolute error:

k ⊗ [x]r = [k(x + ∇(x))]r
         = [kx + k∇(x)]r
         = kx + k∇(x) + ∇(kx + k∇(x))
         = kx + ∇(k ⊗ [x]r).

²⁵ See GNU libc 2.22, file sysdeps/ieee754/dbl-64/e_rem_pio2.c.

We already have an upper bound for the rounding error of [x]r, which is |∇(x)| ≤ 2^(ex−p+1). An upper bound for |k| is given by |k| < 2^(ek+1). Their product is thus bounded by the product of these two bounds.

An upper bound for |kx + k∇(x)| can be found in the following way:

|kx + k∇(x)| ≤ |kx| + |k∇(x)|
            ≤ 2^(ek+1) · 2^(ex+1) + 2^(ek+1) · 2^(ex−p+1)
            = 2^(ek+ex+2) + 2^(ek+ex−p+2)
            = 2^(ek+ex+2) (1 + 2^(−p))
            < 2^(ek+ex+3).

Thus, the exponent of kx + k∇(x) is at most ek + ex + 2, and

|∇(kx + k∇(x))| ≤ ulp(kx + k∇(x)) = 2^(ek+ex−p+3).

We can now further derive the upper bound for the magnitude of the absolute error of the multiplication:

|∇(k ⊗ [x]r)| = |k∇(x) + ∇(kx + k∇(x))|
             ≤ |k| |∇(x)| + |∇(kx + k∇(x))|
             ≤ 2^(ek+ex−p+2) + 2^(ek+ex−p+3)
             = 2^(ek+ex−p+2) (1 + 2)
             < 2^(ek+ex−p+4).

This proves that |∇(k ⊗ [x]r)| ≤ 2^(ek+ex−p+4). ⊓⊔
The algorithms for the propagation of constraints with trigonometric functions, which have been introduced in Section 6.2, divide the domains of trigonometric functions into quasi-monotonic intervals. They need functions to identify approximations of multiples of π/2 (by computing ⌈x(2/π)⌉, x ∈ F), and to generate such upper and lower approximations, (k(π/2))↑ and (k(π/2))↓ with k ∈ Z, in the floating-point format in which the studied trigonometric functions operate, which will be called the "target" format.

Integer division by π/2, rounded upwards. This procedure has the purpose of computing a number k such that (k − 1)(π/2) ≤ x ≤ k(π/2). It does so by multiplying x by 2/π, and then rounding the result to an integer towards positive infinity. The major caveat in this procedure is the precision at which the multiplication should be performed, and consequently the number of digits of 2/π that should be stored. For x × 2/π to be rounded to the correct integer, we need a precision p sufficient to preserve at least the sign and order of magnitude of the difference ∆x = x(2/π) − k̂, where k̂ is the integer nearest to x(2/π):

|∇(x ⊗ [2/π]r)| < 2^(e∆x),

where e∆x is the exponent of the difference ∆x. According to Lemma 4, this means that, for each x ∈ [−ℓmax, ℓmax], the following relation must hold:

2^(e2/π + ex − p + 4) < 2^(e∆x) ⇐⇒ p > ex − e∆x + e2/π + 4.

The minimum value for p can be obtained by finding the float x̂ ∈ [−ℓmax, ℓmax] that maximizes the value of ex̂ − e∆x̂. This task can be fulfilled by running the algorithm given in [KM84]. It turns out that, for the IEEE 754 single-precision floating-point format, such x̂ is 1.4AC55C₁₆ · 2²¹, with |∆x̂| = |x̂(2/π) − k̂| ≈ 1.A1CCDF₁₆ · 2⁻²⁷. Running the algorithm in [KM84] with a "threshold" of |∆x| ≤ 2⁻²⁵ suffices to show that for no other exponent ex̂′ there exists an x̂′ such that |∆x̂′| < |∆x̂|. It follows that p > 21 − 1 + 4 + 27 = 51. Proceeding in the same way, the minimum precision needed for the double-precision format is p > 109.
Since we established that for the single-precision format a precision of at least 52 bits is needed, IEEE 754 double-precision numbers may be used to perform the multiplication [x ⊗ (2/π)]r. The obtained result can be rounded to an integer towards positive infinity using functions provided by the C and C++ standard math libraries. The cases of x = −0.0 and x = +0.0 must be handled separately: the values returned by this function must be 0 and 1 respectively, because of Definition 7.
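The following C sketch illustrates this procedure for single-precision arguments. It is only an illustration under the assumptions just derived (a 53-bit double approximation of 2/π exceeds the 51 bits required for |x| ≤ ℓmax); the function name and the use of ceil are our own choices, not taken from the paper's implementation.

#include <math.h>

/* Double-precision approximation of 2/pi (53 bits > the 51 required). */
static const double TWO_OVER_PI = 0.636619772367581343;

/* Sketch: smallest k such that x <= k*(pi/2), for |x| <= l_max. */
static long pio2_div_up(float x)
{
    if (x == 0.0f)                 /* -0.0 and +0.0 are handled       */
        return signbit(x) ? 0 : 1; /* separately, as per Definition 7 */
    return (long) ceil((double) x * TWO_OVER_PI);
}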

Generation of lower π/2 multiple approximations. A similar reasoning can be done to compute (k(π/2))↓ for a given integer k, that is, the maximum floating-point number x ∈ F such that x < k(π/2). This can be achieved by multiplying k by the constant π/2 with a sufficient precision, and then rounding it to the target precision using the roundTowardNegative rounding direction. The precision used for this multiplication must satisfy

p > ek − e_{x−k(π/2)} + e_{π/2} + 4

according to Lemma 4. The value x̂ that maximizes ex − e∆x is the same as for the previous section. Since for each x ∈ [−ℓmax, ℓmax] we have |∆x| = |x(2/π) − k| ≤ |x − k(π/2)|, where k is the integer nearest to x(2/π), e_{x−k(π/2)} ≥ e∆x holds. Also, because of the criteria we used to define ℓmax (i.e., the ulp in the interval should be sufficiently large, and greater than 1), k is always representable in our target floating-point format. Moreover, ek ≤ ex, because |k| ≤ (k(π/2))↓ = |x|. Consequently, x̂ and k̂ also maximize ek − e_{x−k(π/2)}.

p>e + e π +4 − e =21+0+4+27=52, xˆ 2 ∆xˆ which imposes a required precision of at least 53 bits. This allows us to use the IEEE 754 double-precision format to perform this multiplication. In order to round the result to the single-precision format, the rounding direction can be set to roundTowardNegative. It is also possible to avoid switching rounding mode

by adjusting the obtained result according to the rounding direction in use. If k = 0, the function should return x = −0.0. This case can be handled separately.
To obtain the result in the double-precision format, p > 110 is required. The computation can be performed using a technique for the implementation of multiple-precision arithmetic, such as the one described in [Dek71] with an extended-precision format of at least 55 bits of precision, or the one given in [HLB01] with the double-precision format. For formats requiring higher precision, see [MPT16].
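For the single-precision target format, the procedure can be sketched in C as follows. This is a hedged illustration, not the paper's code: it relies on the precision analysis above (53 bits suffice), uses fesetround from <fenv.h> to select roundTowardNegative for both the multiplication and the conversion, and handles k = 0 separately as just described.

#include <fenv.h>

/* Double nearest to pi/2; by the analysis above, its 53 bits are
   enough for the downward-rounded product to land on (k*pi/2) low.
   (A conforming compiler may require #pragma STDC FENV_ACCESS ON.) */
static const double PI_OVER_2 = 1.5707963267948966;

/* Sketch: largest single-precision float below k*(pi/2). */
static float pio2_mul_down(long k)
{
    if (k == 0)
        return -0.0f;                   /* special case noted above */
    int saved = fegetround();
    fesetround(FE_DOWNWARD);            /* roundTowardNegative */
    double d = (double) k * PI_OVER_2;  /* product rounded downward */
    float  r = (float) d;               /* conversion rounded downward */
    fesetround(saved);
    return r;
}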

D.3 Algorithms for Trigonometric Functions

The constraint propagation algorithms we developed for trigonometric functions are described in their principles in Section 6.2. They make use of the concepts and range-reduction techniques presented above.
Since the direct propagation algorithm is fairly simple in its functioning, we will not describe it in further detail. In this section, we will instead present the indirect propagation algorithm in pseudocode, and provide an argument for its correctness.
Algorithm compute bounds trig, presented as Algorithm 10, has the main purpose of splitting the interval to be refined, [xl, xu], into the corresponding set of monotonic intervals, making use of the range-reduction procedures presented in Section D.2. It then invokes algorithms lower bound and upper bound through the helper functions branch lb and branch ub, one of which is shown as Algorithm 11.
The algorithms presented in this section require the same glitch data as the algorithms for regular functions: please refer to Section 5 for an explanation. Argument g ∈ N is a parameter that fixes the maximum number of monotonic sub-intervals that [xl, xu] should be split into. Argument p holds information about the behavior of function f, allowing the algorithm to correctly recognize whether the graph of the function is isotonic or antitonic in each sub-interval, and to correctly invoke the other propagation algorithms. It can take one of the values evenv, oddv and oddc, which carry the following meanings (a C sketch of the derived quasi isotonic predicate is given after this list):

evenv: the function changes its monotonicity in even multiples of π/2, i.e., in numbers of the type 2k(π/2), with k ∈ Z. If f = cos, this value of p is passed.
oddv: the function changes its tonicity in odd multiples of π/2, i.e., numbers of the form (2k + 1)(π/2), k ∈ Z. This is the behavior of the sine function.
oddc: the function has a discontinuity in odd multiples of π/2, but it remains isotonic. The tangent has this behavior.

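To make the case analysis concrete, here is a minimal C sketch of the quasi isotonic(k, p) predicate used by branch lb, derived from the m = i.k mod 4 case analysis in the proof of Lemma 5 below; the enum names and the normalization of negative remainders are our own illustrative choices.

typedef enum { EVENV, ODDV, ODDC } behavior_t;

/* Sketch of quasi_isotonic(k, p): is the branch delimited above by
   k*(pi/2) quasi-isotonic?  C's % on a negative k is adjusted so
   that m lies in {0, 1, 2, 3}. */
static int quasi_isotonic(long k, behavior_t p)
{
    int m = (int) (k % 4);
    if (m < 0)
        m += 4;
    switch (p) {
    case EVENV: return m == 0 || m == 3;  /* e.g., cosf */
    case ODDV:  return m == 1 || m == 2;  /* e.g., sinf */
    default:    return 1;                 /* ODDC: tanf never antitonic */
    }
}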
For a deeper understanding of how the algorithms work, the reader can refer to their correctness proofs, which follow.

Lemma 5. If function f is quasi-monotonic over [i.xl, i.xu], then the function branch lb, specified in Algorithm 11, satisfies its contract.

Algorithm 10 Indirect propagation: compute bounds trig(f, f^i, p, [yl, yu], [xl, xu], ngM, dM, wM, α, ω, g, s, t)
Require: f : F → F, f^i : F → F, p ∈ {evenv, oddv, oddc}, [xl, xu], [yl, yu] ∈ IF, ngM ≥ n^f_gM, dM ≥ d^f_M, wM ≥ w^f_M, α ≼ α^f, ω ≽ ω^f, ngM > 0 =⇒ (xl ≼ α ≼ ω ≼ xu), g, s, t ∈ N.

Ensure: |I| ≤ g, ⋃_{i∈I} [i.xl, i.xu] = [xl, xu], ∀i ∈ I : i.l, i.u ∈ F
∧ (i.rl ∈ {0, 1, 2, 3, 4} =⇒ p^f_{i.rl}(yl, i.xl, i.xu, i.l) ∨ p^{−f}_{i.rl}(−yu, i.xl, i.xu, i.l))
∧ (i.ru ∈ {5, 6, 7, 8, 9} =⇒ p^f_{i.ru}(yu, i.xl, i.xu, i.u) ∨ p^{−f}_{i.ru}(−yl, i.xl, i.xu, i.u)).
1: kl := geq tonicity change(p, ⌈xl(2/π)⌉);
⊲ kl = min{ k ∈ Z | k ≥ ⌈xl(2/π)⌉ ∧ tonicity change(k, p) }
2: ku := geq tonicity change(p, ⌈xu(2/π)⌉);
⊲ ku = min{ k ∈ Z | k ≥ ⌈xu(2/π)⌉ ∧ tonicity change(k, p) }
3: if even(p) then kc := 0
4: else kc := −1
5: end if;
6: if g = 1 then gl := 1; gr := 0; kc := ku
7: else if xu ≼ (kc(π/2))↓ then gl := g; gr := 0
8: else if xl ≽ (kc(π/2))↑ then gl := 0; gr := g
9: else gl := ⌊g/2⌋; gr := g − gl
10: end if;
11: if gl > 0 then
12: i.xl := xl; i.xu := min{xu, (kl(π/2))↓};
13: (i.l, i.rl) := branch lb(f, f^i, p, [yl, yu], [i.xl, i.xu], kl, ngM, dM, wM, α, ω, s, t);

14: klu := min{ku, kc};

15: i.k := max{kl,klu − 2(gl − 1)}; 16: if i.rl = 0 ∧ kl < i.k then 17: i.l := i.xu; i.rl := 2 18: end if; π 19: cx := max{xl, (i.k − 2) }; l  2 ↑ i.x x , i.k π 20: u := min u 2 ↓ ;    i 21: (i.u, i.ru) := branch ub(f, f , p, [yl,yu], [cxl , i.xu], i.k, ngM,dM,wM,α,ω,s,t); 22: if i.ru = 5 ∧ kl < i.k then 23: i.u := i.xl; i.ru := 7 24: end if; 25: I.add(i); 26: i.k := i.k + 2;

27: while i.k < klu do i.x i.x i.x i.k π ,x 28: l := succ( u); u := min 2 ↓ u ; i   29: (i.l, i.rl) := branch lb(f, f , p, [yl,yu], [i.xl, i.xu], i.k, ngM,dM,wM,α,ω,s,t); i 30: (i.u, i.ru) := branch ub(f, f , p, [yl,yu], [i.xl, i.xu], i.k, ngM,dM,wM,α,ω,s,t); 31: I.add(i); 32: i.k := i.k + 2 33: end while 34: end if;

92 35: if gr > 0 then 36: i.k := max{kl,kc + 2}; π 37: i.xu := max pred(xl), (i.k − 2) ;   2 ↓ 38: while i.k < ku ∧ gr > 1 do i.x i.x i.x x , i.k π 39: l := succ( u); u := min u 2 ↓ ; i    40: (i.l, i.rl) := branch lb(f, f , p, [yl,yu], [i.xl, i.xu], i.k, ngM,dM,wM,α,ω,s,t); i 41: (i.u, i.ru) := branch ub(f, f , p, [yl,yu], [i.xl, i.xu], i.k, ngM,dM,wM,α,ω,s,t); 42: I.add(i); 43: i.k := i.k + 2; 44: gr := gr − 1 45: end while; π 46: i.xl := succ(i.xu); i.xu := min xu, [i.k 2 ]↓ ; i  47: (i.l, i.rl) := branch lb(f, f , p, [yl,yu], [i.xl, i.xu], i.k, ngM,dM,wM,α,ω,s,t); 48: if i.rl = 0 ∧ i.k < ku then 49: i.l := i.xu; i.rl := 2 50: end if; c x , k π i.x x 51: xl := max l ( u − 2) 2 ↑ ; u := u;    i 52: (i.u, i.ru) := branch ub(f, f , p, [yl,yu], [cxl , i.xu],ku, ngM,dM,wM,α,ω,s,t); 53: if i.ru = 5 ∧ i.k < ku then

54: i.u := cxl ; i.ru := 7 55: end if; 56: I.add(i) 57: end if
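
Before moving on to branch lb, the following C sketch gives one concrete reading of lines 1 and 2 of Algorithm 10 and of the sub-interval record i. All names are ours, and a real implementation must evaluate ⌈x · 2/π⌉ with the extended-precision range reduction of Section D.2, which the plain double arithmetic below only approximates.

    #include <math.h>

    typedef enum { EVENV, ODDV, ODDC } tonicity_kind;   /* parameter p */

    static int tonicity_change(long k, tonicity_kind p) {  /* as above */
      return p == EVENV ? k % 2 == 0 : k % 2 != 0;
    }

    /* One element of the output list I of Algorithm 10. */
    typedef struct {
      double xl, xu;  /* initial bounds of the sub-interval            */
      double l, u;    /* refined lower and upper bounds                */
      int rl, ru;     /* result codes of lower_bound and upper_bound   */
      long k;         /* the sub-interval lies in [(k-2)*pi/2, k*pi/2] */
    } subinterval;

    /* Lines 1-2: smallest branch index k >= ceil(x * 2/pi) at which
       the tonicity of f changes; at most one increment is needed. */
    static long geq_tonicity_change(tonicity_kind p, double x) {
      long k = (long)ceil(x * (2.0 / 3.14159265358979323846));
      while (!tonicity_change(k, p))
        k++;
      return k;
    }

For instance, with f = sin (p = oddv) and [xl, xu] = [0, 10], we obtain ⌈0 · 2/π⌉ = 0, which is even, so kl = 1: the sine first changes monotonicity at π/2. Likewise, ⌈10 · 2/π⌉ = 7, which is odd, so ku = 7: the first tonicity change at or after 10 is at 7π/2 ≈ 11.0.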

Algorithm 11 Indirect propagation: branch lb(f, f^i, p, [yl, yu], [i.xl, i.xu], i.k, ngM, dM, wM, α, ω, s, t)

Require: f : F → F, f^i : F → F, p ∈ {evenv, oddv, oddc}, [i.xl, i.xu], [yl, yu] ∈ IF, i.k ∈ Z, ngM ≥ ng^f_M, dM ≥ d^f_M, wM ≥ w^f_M, α ≼ α^f, ω ≽ ω^f, ngM > 0 ⟹ (α ≼ ω), s, t ∈ N.

Ensure: i.l ∈ F, i.rl ∈ {0, 1, 2, 3, 4} ⟹ (p^f_{i.rl}(yl, i.xl, i.xu, i.l) ∨ p^{−f}_{i.rl}(−yu, i.xl, i.xu, i.l)).

 1: if ngM = 0 ∨ i.xl ≻ ω ∨ i.xu ≺ α then
 2:   i_ng := 0
 3: else
 4:   i_ng := ngM;
 5:   i_α := max{α, i.xl};
 6:   i_ω := min{ω, i.xu}
 7: end if;
 8: if quasi isotonic(i.k, p) then
 9:   (i.l, i.rl) := lower bound(f, yl, [i.xl, i.xu], i_ng, dM, wM, i_α, i_ω, f^i, s, t)
10: else
11:   (i.l, i.rl) := lower bound(−f, −yu, [i.xl, i.xu], i_ng,
12:                              dM, wM, i_α, i_ω, f^i ◦ (−id), s, t)
13: end if

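Line 8 of Algorithm 11 relies on the predicate quasi isotonic(i.k, p), whose case analysis is spelled out in the proof below. The following C sketch mirrors that analysis; the names are hypothetical, and we assume, as above, that the branch indexed by k spans [(k − 2) · π/2, k · π/2].

    /* Is f quasi-isotonic on the branch indexed by k?  Follows the
       m = k mod 4 case analysis of the proof of Lemma 5. */
    typedef enum { EVENV, ODDV, ODDC } tonicity_kind;

    static int quasi_isotonic(long k, tonicity_kind p) {
      long m = ((k % 4) + 4) % 4;    /* mathematical mod: k may be < 0 */
      switch (p) {
        case EVENV: return m == 0 || m == 3;  /* e.g., cos */
        case ODDV:  return m == 1 || m == 2;  /* e.g., sin */
        case ODDC:  return 1;                 /* e.g., tan: always isotonic */
      }
      return 0;
    }

Note that no separate dual of lower bound is needed for quasi-antitonic branches: reflecting both the function (−f) and the bound (−yu) across zero yields a quasi-isotonic problem with the same solution set, so lower bound is reused unchanged on lines 11 and 12.
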
Proof. The if statement on line 1 prepares the variables i_ng, i_α and i_ω to be used as arguments for lower bound. If there are no glitches in [i.xl, i.xu], i_ng is set to 0. Otherwise, the else branch ensures that i.xl ≼ i_α ≼ i_ω ≼ i.xu. Along with the preconditions of this algorithm, this satisfies all the preconditions of lower bound.

The only thing that remains to be done is to distinguish whether f is quasi-isotonic or quasi-antitonic in [i.xl, i.xu]. This is done by predicate quasi isotonic(i.k, p), which checks the value of m = i.k mod 4: if p = evenv, f is isotone when m ∈ {0, 3}; if p = oddv, f is isotone when m ∈ {1, 2}; if p = oddc, f is isotone on every branch. If f is quasi-isotonic, lower bound can be called normally. Otherwise, lower bound is called on line 12 with −f instead of f, −yu instead of yl, and the inverse function f^i composed with a change of sign of its argument. Note that, if f is quasi-antitonic, then −f is quasi-isotonic, which allows us to invoke lower bound normally. Also, −yu must be passed as the lower bound for y because, if yl ≼ yu, then −yu ≼ −yl.

Since i.l and i.rl are set by lower bound, they satisfy its post-condition. So p^f_{i.rl}(yl, i.xl, i.xu, i.l) holds if f is isotonic in [i.xl, i.xu], or p^{−f}_{i.rl}(−yu, i.xl, i.xu, i.l) if f is antitonic in that interval. Note that, by p^{−f}_{i.rl}, we mean predicate p^f_{i.rl} with all occurrences of f substituted by −f. Consequently, the Ensure clause of this algorithm also holds. ⊓⊔

The algorithm listing and the proof of branch ub are omitted, since they are very similar to those of branch lb.

Theorem 4. Function compute bounds trig, specified in Algorithm 10, satisfies its contract.

Proof. The algorithm starts by identifying which branches of the function graph contain xl and xu. This operation is performed on lines 1 and 2, where function geq tonicity change is called. The lowest integer that, multiplied by π/2, gives a number no smaller than xl at which the monotonicity of the function changes is assigned to kl. The same is done for xu, and the resulting value is assigned to ku.

On line 3, a value for kc is chosen. The approximations of kc · π/2 will be used to split [xl, xu] into the two intervals [xl, [kc · π/2]↓] and [[kc · π/2]↑, xu], which will be referred to as the "left" and "right" intervals, respectively. The branches of the function nearest to kc · π/2 will not be included in the two large intervals near the bounds of the domain, and will be processed individually. If f changes monotonicity at even multiples of π/2 (such as the cosine), kc = 0. Otherwise, kc is set to −1: in this way, the branch between −π/2 and π/2 will be the leftmost one in the right interval.

Next, values for gl and gr are chosen. They are, respectively, the numbers of sub-intervals into which the left and right intervals will be divided. The if statement on line 6 distinguishes among the four cases listed below.

g = 1: Only one interval must be returned, corresponding to [xl, xu], but refined in the outward branches of the function. In this case, gl is set to 1, and gr to 0. In this way, the body of the if statement on line 11 will take care of this interval, as it would normally do with the leftmost interval. Also, kc := ku is set, since the above-mentioned code uses kc to choose the upper bound of this interval.

g > 1 ∧ xu ≼ [kc · π/2]↓: [xl, xu] lies entirely in the left interval. Therefore, gl must be set to g and gr to 0.

g > 1 ∧ xl ≽ [kc · π/2]↑: [xl, xu] lies entirely in the right interval. Therefore, gl must be set to 0 and gr to g.

g > 1 ∧ xl ≺ [kc · π/2]↑ ∧ xu ≻ [kc · π/2]↓: [xl, xu] must be divided into left and right intervals. gl is set to ⌊g/2⌋ and gr to g − gl. This way, half of the sub-intervals will be in the left interval, and half in the right one. Note that, if the function has fewer than gl monotonic branches in the left interval, or fewer than gr in the right one, the interval will only be split into as many sub-intervals as there are branches.

In all four cases, gl + gr = g.

The purpose of the body of the if statement on line 11 is to divide the left interval [xl, min{xu, [kc · π/2]↓}] into the appropriate sub-intervals, and to refine them by calling branch lb and branch ub. First, the leftmost sub-interval is generated. Each sub-interval is identified by its initial bounds i.xl and i.xu, by the refined bounds i.l and i.u, by i.rl and i.ru (the predicates that hold after calling lower bound and upper bound), and by i.k, which identifies the branch (or one of the branches) of the function in which the sub-interval is contained: namely, the interval [[(i.k − 2) · π/2]↑, [i.k · π/2]↓]. If the number of monotonic branches in the left interval is lower than or equal to gl, this sub-interval will correspond to a single branch; otherwise, it will include multiple branches. On line 12, i.xl and i.xu are set to the bounds of the leftmost branch, [xl, min{xu, [kl · π/2]↓}]. Because of how kl was set, this interval is part of the leftmost monotonic branch of the function, and it covers the end of it if xu ≽ [kl · π/2]↓. Then, since f is monotonic in [i.xl, i.xu], branch lb can be called, and i.l and i.rl are set, satisfying p^f_{i.rl}(yl, i.xl, i.xu, i.l) if f is isotonic in that interval, or p^{−f}_{i.rl}(−yu, i.xl, i.xu, i.l) if it is antitonic, according to the post-condition of branch lb.

The left interval will end with the branch defined by kc, or with xu if xu ≺ [kc · π/2]↓. On line 14, klu is set accordingly, to min{ku, kc}. Then, on line 15, the rightmost branch of the left interval is chosen: it will end at [i.k · π/2]↓. Since the rightmost gl − 1 branches will be treated individually by the while loop on line 27, i.k is set to max{kl, klu − 2(gl − 1)}: gl − 1 is multiplied by 2 because each branch has a length of π (two times π/2), and subtracted from klu, which identifies the rightmost possible branch in the left interval.

Before passing to the refinement of the upper bound of the sub-interval, the if statement on line 16 checks whether i.rl, the predicate number returned by lower bound, is 0. If kl = i.k, this sub-interval contains a single branch, and no more work is needed, because we know there are no solutions to yl = f(x) in this sub-interval. If, instead, kl < i.k also holds, there are more branches after i.xu in this sub-interval. According to p0, we have ∀x ∈ [i.xl, i.xu] : yl ≻ f(x), but we know nothing about the other branches. Since i.xu will later be set to the upper bound of this sub-interval, i.l must be set to the current value of i.xu, and i.rl to 2. This way, we have ∀x ∈ [i.xl, i.l] : yl ≻ f(x), and either p^f_2 or p^{−f}_2 is satisfied even if the value of i.xu is later increased.

Then, the rightmost branch of the leftmost sub-interval is refined. On line 19, cxl is set to the maximum between xl (to remain inside the domain, in case this sub-interval contains only one branch) and [(i.k − 2) · π/2]↑. This is the lower bound of the rightmost branch of this sub-interval (note that i.k was previously set to the corresponding value). Similarly, i.xu is set to the minimum between xu and [i.k · π/2]↓ on line 20. Now, f is monotonic over [cxl, i.xu], and we are ready to refine the upper bound of this sub-interval by calling branch ub.

The if statement on line 22 checks whether i.ru = 5. Again, if kl = i.k, nothing needs to be done. If kl < i.k, we cannot return i.ru = 5, because we do not know whether or not there are solutions in the branches we did not analyze. Therefore, i.u is set to i.xl and i.ru to 7. Since the post-condition of branch ub ensures that p5 holds, we have ∀x ∈ [i.xl, i.xu] : yu ≺ f(x); then ∀x ∈ [i.u, i.xu] : yu ≺ f(x) also holds, which satisfies p^f_7 or p^{−f}_7.

At this point, both bounds of the leftmost sub-interval have been refined, and i, with the current values of its fields, can be added to I. Then we can pass to the next sub-interval, identified by the current i.k + 2; i.k is updated accordingly on line 26.

The while loop starting on line 27 refines all the other sub-intervals in the left interval, each of which corresponds to a single monotonic branch of the function. We will now prove that the loop invariant

Inv ≡ ⋃_{h∈I} [h.xl, h.xu] = [xl, i.xu]   (4)
  ∧ ∀h ∈ I : h.l, h.u ∈ F
  ∧ (h.rl ∈ {0, 1, 2, 3, 4} ⟹ p^f_{h.rl}(yl, h.xl, h.xu, h.l) ∨ p^{−f}_{h.rl}(−yu, h.xl, h.xu, h.l))   (5)
  ∧ (h.ru ∈ {5, 6, 7, 8, 9} ⟹ p^f_{h.ru}(yu, h.xl, h.xu, h.u) ∨ p^{−f}_{h.ru}(−yl, h.xl, h.xu, h.u))   (6)
  ∧ i.xu = min{[(i.k − 2) · π/2]↓, xu}   (7)

holds before and after each iteration of the loop, unless the loop guard is false before the loop is entered the first time. In this latter case, clause (7) does not hold, but the others do.

Before starting with the proof of the loop, we need to prove the following fact: right after lines 20 and 28,

  xu ≼ [i.k · π/2]↓ ⟹ i.k + 2 ≥ klu.   (8)

Indeed, because of how ku was set on line 2, we have [(ku − 2) · π/2]↑ ≼ xu. Under the hypothesis of (8), this implies [(ku − 2) · π/2]↑ ≼ [i.k · π/2]↓, and hence ku − 2 ≤ i.k. Line 14 ensures that klu ≤ ku: we have klu − 2 ≤ ku − 2 ≤ i.k, that is, i.k + 2 ≥ klu.
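
Incidentally, clauses (4) and (7) are directly executable, and can serve as a test oracle for an implementation of Algorithm 10. The following C sketch is hypothetical: it assumes the subinterval record introduced earlier, takes I[n − 1] to be the sub-interval i most recently worked on, and glosses over the fact that (i.k − 2) · π/2 should be rounded toward −∞ in extended precision.

    #include <assert.h>
    #include <math.h>

    typedef struct { double xl, xu, l, u; int rl, ru; long k; } subinterval;

    /* Check clauses (4) and (7) for the list I[0..n-1] built so far,
       where [xl, xu] is the interval being refined. */
    static void check_inv_4_and_7(const subinterval *I, int n,
                                  double xl, double xu) {
      assert(n > 0 && I[0].xl == xl);            /* tiling starts at xl  */
      for (int h = 1; h < n; h++)                /* no gaps, no overlaps */
        assert(I[h].xl == nextafter(I[h - 1].xu, INFINITY));
      const subinterval *i = &I[n - 1];
      double pi_2 = 1.5707963267948966;          /* double nearest pi/2  */
      assert(i->xu == fmin((i->k - 2) * pi_2, xu));   /* clause (7)      */
    }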

Initialization: Before the first iteration of the loop, I contains only one interval i. Since i.xl was set to xl on line 12 and left untouched, and i.xu was not changed after the interval was added to I, clause (4) of the invariant is satisfied. As for i.l, it satisfies predicate p_{i.rl} because, either:

i.rl ≠ 0: it was set by branch lb and left untouched. Even if i.xu has possibly been increased after branch lb was called, all predicates p_{i.rl} with i.rl ≠ 0 still hold, because they do not state anything so specific about the interval (i.l, i.xu] as to be invalidated by increasing i.xu.

i.rl = 0: the if statement on line 16 changed i.l and i.rl so that p_{i.rl} still holds, as was proved before.

A very similar reasoning applies to i.u and i.ru, distinguishing whether branch ub returned i.ru = 5 or not. Therefore, clauses (5) and (6) of the invariant are also proved. On line 20, i.xu was set to min{xu, [i.k · π/2]↓} and, since i.k was incremented by 2 on line 26, clause (7) is satisfied.

Maintenance: If the loop body was entered, the guard condition is true, so i.k < klu. This, together with implication (8) (applied with the value i.k had before being incremented, i.e., the current i.k − 2), assures us that xu ≻ [(i.k − 2) · π/2]↓. Because of clause (7) of the invariant, i.xu = [(i.k − 2) · π/2]↓, and succ(i.xu) = [(i.k − 2) · π/2]↑. i.xl is set to this value on line 28. On the same line, i.xu is set to the minimum between xu and [i.k · π/2]↓.

Therefore, [i.xl, i.xu] is contained in a single monotonic branch, and functions branch lb and branch ub can be called to refine the bounds of this sub-interval. The post-conditions of these functions satisfy clauses (5) and (6) of the invariant. Sub-interval i is then added to I. Since on line 28 i.xl was set to the successor of the previous value of i.xu, and because the invariant ensures that previously ⋃_{h∈I} [h.xl, h.xu] = [xl, i.xu], adding the new interval to I leaves clause (4) satisfied. Finally, after line 32, clause (7) holds again.

Termination: On line 15, i.k was set to a value such that klu − i.k ≤ 2(gl − 1).

Since i.k is incremented by 2 at the end of the loop body, it reaches klu after at most gl − 1 iterations: the guard then evaluates to false, and the loop terminates.

Since the loop iterates at most gl − 1 times, the number of sub-intervals it adds to I is at most gl − 1. Together with the fact that the leftmost sub-interval is always added to I before the loop, this implies |I| ≤ gl.

At this point, if ku ≤ kc, then klu = ku and the last sub-interval ends at xu: i.xu = xu, and the clause ⋃_{i∈I} [i.xl, i.xu] = [xl, xu] of the post-condition is proved. In this case, xu ≼ [kc · π/2]↓ and gr = 0 according to line 7: the if statement on line 35 is not entered, and the algorithm terminates. Since there are no more sub-intervals to be analyzed, |I| ≤ gl = g, and the loop invariant satisfies the rest of the post-condition.

Otherwise, if xu has not been reached yet, the body of the if statement on line 35 processes the remaining sub-intervals. First, it splits off the first gr − 1 branches of f and refines them with branch lb and branch ub. The loop on line 38 terminates when either the rightmost branch has been refined, or gr = 1. Then, a sub-interval made of the remaining branches is processed in a way similar to that of the leftmost interval. Since this part of the algorithm is similar to the if statement on line 11, the proof of its correctness is omitted.

After the body of this last if statement has been executed, at most gr sub-intervals have been added to I. Together with the fact that |I| ≤ gl held before, this implies that, at the end, |I| ≤ g. The fact that it processes all branches between [kc · π/2]↑ and xu by calling branch lb and branch ub, correcting their output for the rightmost interval if needed, proves the rest of the post-condition of this algorithm. ⊓⊔
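
To conclude, the following hypothetical C sketch illustrates how a propagator might consume the list I produced by compute bounds trig: the refined domain of x is the hull of the surviving refined sub-intervals. The predicate may_contain_solutions, which should interpret the result codes of Section 5, is assumed here with a placeholder convention that is ours, not part of the algorithms above.

    #include <stdbool.h>

    typedef struct { double xl, xu, l, u; int rl, ru; long k; } subinterval;

    /* Placeholder: may sub-interval i still contain solutions?  A real
       implementation would inspect the result codes i.rl and i.ru. */
    static bool may_contain_solutions(const subinterval *i) {
      return i->l <= i->u;   /* illustrative convention only */
    }

    /* Collapse I[0..n-1] into the hull of the surviving refined
       sub-intervals; returns false if the domain of x is refuted. */
    static bool hull_of_refinement(const subinterval *I, int n,
                                   double *new_xl, double *new_xu) {
      bool any = false;
      for (int h = 0; h < n; h++) {
        if (!may_contain_solutions(&I[h]))
          continue;                              /* branch refuted */
        if (!any || I[h].l < *new_xl) *new_xl = I[h].l;
        if (!any || I[h].u > *new_xu) *new_xu = I[h].u;
        any = true;
      }
      return any;
    }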
