Verifying Bit-Manipulations of Floating-Point

Wonyeol Lee    Rahul Sharma    Alex Aiken
Stanford University, USA
{wonyeol, sharmar, aiken}@cs.stanford.edu

Abstract

Reasoning about floating-point is difficult and becomes only more so if there is an interplay between floating-point and bit-level operations. Even though real-world floating-point libraries use implementations that have such mixed computations, no systematic technique to verify the correctness of the implementations of such computations is known. In this paper, we present the first general technique for verifying the correctness of mixed binaries, which combines abstraction, analytical optimization, and testing. The technique provides a method to compute an error bound of a given implementation with respect to its mathematical specification. We apply our technique to Intel's implementations of transcendental functions and prove formal error bounds for these widely used routines.

Categories and Subject Descriptors  D.2.4 [Software/Program Verification]: Correctness proofs

Keywords  Verification; Floating-point; Bit-manipulation; Bit-level operation; ULP error; Absolute error; Binary analysis; Transcendental function

1. Introduction

Highly optimized implementations of floating-point libraries rely on intermixing floating-point and bit-level code. Even though such code is part of widely used libraries, such as optimized implementations of C math libraries, automatic formal verification of these implementations has remained an open challenge [22]. Although it has been demonstrated that it is possible to construct machine-checkable proofs of correctness by hand for floating-point implementations of the level of sophistication we are interested in [11, 12], no existing automated verification technique is capable of analyzing these implementations. In this paper, we present a first step towards addressing this challenge.

Bit-precise floating-point reasoning is hard: floating-point is an approximation to real arithmetic, but floating-point numbers do not obey the algebraic axioms of real numbers due to rounding errors. The situation becomes even more difficult in the presence of bit-level operations, such as bit-manipulations of the floating-point representation. To illustrate a mixed computation, consider an implementation that computes the floating-point number 2^n from a small integer n. A naïve implementation would first compute the integer representing 2^n and then perform the computationally expensive operation of converting an integer to a floating-point number. Alternatively, the same result can be obtained by bit-shifting n + 1023 left by 52 (Figure 3). Existing static analyses for floating-point arithmetic would be stumped by the bit-shift operation and would fail to prove the functional correctness of this trick. Moreover, such tricks are routine in real codes [9, 16].

Before explaining our solution, it is important to understand why existing automated techniques (based on testing, model checking, and abstract interpretation) are inadequate. The simplest verification technique is exhaustive testing of all possible inputs. This approach is feasible for a function like expf that computes the exponential of a 32-bit single precision floating-point number. However, the number of double precision floating-point numbers is too large for brute force enumeration to be tractable.

A plausible verification strategy involves encoding correctness as the validity of an SMT formula [5]. However, the specifications of interest here are transcendentals, and these (e^x, sin(x), etc.) cannot be encoded precisely in existing SMT theories.
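To make the bit-shift trick mentioned above concrete, the following C sketch builds the double 2^n directly from its IEEE-754 bit pattern. It is only an illustration of the idiom (the helper name pow2_by_shift and the use of memcpy for type punning are ours, not taken from any implementation studied in this paper), and it assumes n stays in the normal exponent range [-1022, 1023].

    #include <stdint.h>
    #include <string.h>

    /* Build 2^n by placing the biased exponent n + 1023 into bits 62..52
       of an IEEE-754 double; the sign bit and the 52 fraction bits stay 0. */
    static double pow2_by_shift(int n) {
        uint64_t bits = ((uint64_t)(n + 1023)) << 52;
        double result;
        memcpy(&result, &bits, sizeof result);   /* reinterpret the bit pattern */
        return result;
    }

For example, pow2_by_shift(0) yields the bit pattern 0x3ff0000000000000, i.e., 1.0. A purely floating-point analysis sees the shift as an opaque integer operation, which is exactly the difficulty this paper addresses.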
Verifiers based on abstract interpretation, such as ASTRÉE and FLUCTUAT, use pattern matching to handle certain bit-trick routines in commercial floating-point avionics codes [9, 16]. Our goal is a general technique.

Our approach to the problem is to divide and conquer. For a given floating-point implementation, we consider non-overlapping intervals that are subsets of the possible range of inputs. We require each interval I to satisfy the following property: if we statically know that the inputs are restricted to I, the bit-level operations can be removed from the implementation by partial evaluation. Then, for each interval, we have a specialized implementation that is composed exclusively of floating-point operations and thus amenable to abstraction-based techniques. Our main contribution is to devise a procedure to construct such intervals (§4). There is one significant subtlety: the intervals do not always fully cover the space and we must deal with potential "gaps" between intervals. Commercial tools such as FLUCTUAT [7, 9] also subdivide the input range (with no gaps) to improve precision, and our technique can be seen as a systematic method to construct these subdivisions. We analyze the implementations specialized for each interval and report the maximum error between the implementation and the ideal mathematical specification.

We make the following contributions.

• We describe the first general technique for verification of mixed floating-point and bit-level code. We are unaware of any automatic or semi-automatic verification technique that can prove the functional correctness of the production grade benchmarks we consider. Prior to this work, formal verification of such benchmarks required manual construction of machine-checkable proofs [11, 12].

• We reduce the problem of computing bounds on numerical errors to an optimization problem and leverage state-of-the-art techniques for analytical optimization. While our method is not fully automatic, these techniques automate one of the most difficult aspects of the problem and make verification of complex implementations feasible.

• Our technique performs verification at the binary level, not on source code or a model of the program. Thus, the derived bounds apply to the actual code that executes directly on the hardware.

We evaluate our technique on three implementations of transcendental functions from Intel's libraries: a bounded periodic function (sin, §5.2), an unbounded discontinuous periodic function (tan, §5.3), and an unbounded continuous function (log, §5.4). We are able to successfully bound the difference between the result computed by these implementations and the exact mathematical result. For each of these functions, we also trade precision for performance and create significantly more efficient variants that produce approximately correct results. Using our technique, we are able to provide a bound on the difference between the approximate variants and the mathematical specifications. These results demonstrate the generality of our technique and address some of the drawbacks of manually constructed proofs: modifying the manual proofs to prove even slightly different theorems is difficult [11, 12]. To quote Harrison [11],

    [N]ontrivial proofs, as are carried out in the work described here, often require long and complicated sequence of rules. The construction of these proofs often requires considerable persistence. Moreover, the resulting proof scripts can be quite hard to read, and in some cases hard to modify to prove a slightly different theorem.

The rest of the paper is organized as follows. §2, through an example, discusses our verification technique. §3 reviews formal definitions of rounding errors and §4 presents our verification technique that combines abstraction, analytical optimization, and testing. §5 discusses evaluation and §6 surveys prior work. Finally, §7 gives a discussion of future work and §8 concludes.

2. Motivating Example

S3D [3] is a combustion chemistry simulation that is heavily used in research on developing more efficient and cleaner fuels for internal combustion engines. The performance of the exponential function is so important for this task that the developers ship a hand-coded x86 assembly implementation exp (Figure 1), which is inspired by the implementation of the exponential function present in CUDA libraries for GPUs. There is no source code as it has been implemented directly in assembly. There is also no documentation available regarding exp except that it is supposed to compute e^x for x ∈ [−2.6, 0.12]. As is characteristic of highly optimized floating-point implementations, exp contains bit-level operations (rounding to integer on line 3, converting double to integer on line 4, bit-vector addition on line 5, bit-shift on line 6, and bit-shuffle on line 7).

     1 vmovddup %xmm0, %xmm0
     2 vmulpd   L2E, %xmm0, %xmm2
     3 vroundpd $0, %xmm2, %xmm2
     4 vcvtpd2dqx %xmm2, %xmm3
     5 vpaddd   B, %xmm3, %xmm3
     6 vpslld   $20, %xmm3, %xmm3
     7 vpshufd  $114, %xmm3, %xmm3
     8 vmulpd   C1, %xmm2, %xmm1
     9 vmulpd   C2, %xmm2, %xmm2
    10 vaddpd   %xmm1, %xmm0, %xmm1
    11 vaddpd   %xmm2, %xmm1, %xmm1
    12 vmovapd  T1, %xmm0
    13 vmulpd   T12, %xmm1, %xmm2
    14 vaddpd   T11, %xmm2, %xmm2
    15 vmulpd   %xmm1, %xmm2, %xmm2
    16 vaddpd   T10, %xmm2, %xmm2
    17 vmulpd   %xmm1, %xmm2, %xmm2
    18 vaddpd   T9, %xmm2, %xmm2
    19 vmulpd   %xmm1, %xmm2, %xmm2
    20 vaddpd   T8, %xmm2, %xmm2
    21 vmulpd   %xmm1, %xmm2, %xmm2
    22 vaddpd   T7, %xmm2, %xmm2
    23 vmulpd   %xmm1, %xmm2, %xmm2
    24 vaddpd   T6, %xmm2, %xmm2
    25 vmulpd   %xmm1, %xmm2, %xmm2
    26 vaddpd   T5, %xmm2, %xmm2
    27 vmulpd   %xmm1, %xmm2, %xmm2
    28 vaddpd   T4, %xmm2, %xmm2
    29 vmulpd   %xmm1, %xmm2, %xmm2
    30 vaddpd   T3, %xmm2, %xmm2
    31 vmulpd   %xmm1, %xmm2, %xmm2
    32 vaddpd   T2, %xmm2, %xmm2
    33 vmulpd   %xmm1, %xmm2, %xmm2
    34 vaddpd   %xmm0, %xmm2, %xmm2
    35 vmulpd   %xmm1, %xmm2, %xmm1
    36 vaddpd   %xmm0, %xmm1, %xmm0
    37 vmulpd   %xmm3, %xmm0, %xmm0
    38 retq

Figure 1. The x86 assembly code of exp that ships with S3D [3]. Instructions have been reordered to aid understanding, without affecting the output.

The main algorithm used by exp first computes an integer N and a reduced value d:

    N = round(x · log2 e),   d = x − (log 2)·N,        (1)

where log(·) without a base denotes the natural logarithm. Then it uses the following identity to compute e^x:

    e^x = e^((log 2)·N) · e^d = 2^N · e^d

The implementation exp computes 2^N (a double in IEEE representation) from N (a 64-bit integer in 2's complement representation) using bit-level operations. It computes e^d using a Taylor series expansion of degree 12:

    e^d ≈ Σ_{i=0}^{12} d^i / i!

Next, we relate this algorithm to Figure 1. Our description below elides several details that are important for performance (such as the use of vector instructions), and focuses on functionality. The calling convention used by exp includes storing the first argument and the return value of a function in the register xmm0. We omit details about the x86 syntax and describe the implementation at a high level. The code is divided into several blocks by the horizontal lines in Figure 1 and the instructions within a block compute a value of interest.

• The first block (lines 1-3) computes N from the input x. The second instruction multiplies x by L2E (the double closest to log2 e) and the third instruction rounds the result.

• The second block (lines 4-7) computes 2^N from N, using bit-vector addition (line 5), bit-shift (line 6), and bit-shuffle (line 7). The bit-vector represented by B is 0x000003ff000003ff (in hex). Recall that 0x3ff in hex is 1023 in decimal, which is the bias used for the exponent in doubles (§3).

• The third group (lines 8-11) computes d from x and N. This computation uses log 2, which is a transcendental number (Equation 1). To maintain accuracy, log 2 is represented as the sum of two doubles: c1 = −0.69315··· (line 8) and c2 = −2.31904··· × 10^−17 (line 9), where log 2 ≈ −c1 − c2. This representation effectively provides more than a hundred bits to represent log 2 with the desired accuracy. Using c1 and c2, we can compute d ≈ (x + c1·N) + c2·N (lines 10-11).

• The fourth block (lines 12-36) computes e^d from d, using a Taylor series expansion of degree 12. The constant Ti represents 1/i!.

• The last group (lines 37-38) returns e^x = 2^N · e^d. Recall that 2^N is computed exactly by the second block and e^d is computed approximately by the fourth block.

     1 vmulpd   L2E, %xmm0, %xmm2
     2 vroundpd $0xfffffffffffffffe, %xmm2, %xmm2
     3 vcvttpd2dq %xmm2, %xmm3
     4 vpaddw   B, %xmm3, %xmm3
     5 vpsllq   $0x14, %xmm3, %xmm3
     6 vpshufd  $0x3, %xmm3, %xmm3
     7 vmulpd   C1, %xmm2, %xmm1
     8 vaddpd   %xmm1, %xmm0, %xmm1
     9 vmovapd  T1, %xmm0
    10 vlddqu   T8, %xmm2
    11 vmulpd   %xmm1, %xmm2, %xmm2
    12 vaddpd   T7, %xmm2, %xmm2
    13 vmulpd   %xmm1, %xmm2, %xmm2
    14 vaddpd   T6, %xmm2, %xmm2
    15 vmulsd   %xmm1, %xmm2, %xmm2
    16 vaddpd   T5, %xmm2, %xmm2
    17 vmulpd   %xmm1, %xmm2, %xmm2
    18 vaddpd   T4, %xmm2, %xmm2
    19 vmulpd   %xmm1, %xmm2, %xmm2
    20 vaddpd   T3, %xmm2, %xmm2
    21 vmulsd   %xmm1, %xmm2, %xmm2
    22 vaddpd   T2, %xmm2, %xmm2
    23 vmulsd   %xmm1, %xmm2, %xmm2
    24 vaddsd   %xmm0, %xmm2, %xmm2
    25 vmulpd   %xmm1, %xmm2, %xmm1
    26 vaddsd   %xmm0, %xmm1, %xmm0
    27 vmulpd   %xmm3, %xmm0, %xmm0
    28 retq

Figure 2. The x86 assembly code of exp_opt automatically generated by STOKE [22]. This code is less precise but has better performance compared to exp (Figure 1).

Given this non-trivial implementation, a valid question is: what does it achieve? Fundamentally, since there are only a finite number of floating-point numbers, no implementation can compute e^x exactly. All floating-point implementations provide only an approximate result which has finite precision. Therefore, one possible measure of the correctness of such implementations is given by precision loss: the deviation of the computed answer from the mathematically exact result. Our goal in this paper is to provide sound bounds on the maximum precision loss of such implementations.

The first challenge associated with floating-point implementations is that of rounding errors. As is standard, we model rounding error using non-determinism. For example, line 2 of Figure 1 multiplies the input x and a constant L2E, and we model its output as an element chosen non-deterministically from the set {(x × L2E)(1 + δ) : |δ| < ε}, where + and × denote real-number addition and multiplication, respectively. The quantity δ models the rounding error and the machine epsilon ε provides a bound on the rounding errors of floating-point multiplication (§3).

The next challenge, which has not been systematically addressed previously, is associated with bit-level operations. We use a divide-and-conquer approach to address the challenge. We denote the set of possible inputs by the interval X and create a set of intervals I = {I_k : k ∈ Z and I_k ⊆ X} such that the following property holds:

    ∀x ∈ I_k . {(x × L2E)(1 + δ) : |δ| < ε} ⊂ (k − 1/2, k + 1/2)        (2)

This decomposition is useful because it ensures that for each input x ∈ I_k the rounded output N of line 3 is k. We show how to obtain this decomposition in §4.

Given such a decomposition I, we run |I| separate analyses, where the k-th analysis restricts the inputs to I_k. In the k-th analysis, the result N of rounding (line 3) is always k, which is the only input to the bit-level operations of the second block (lines 4-7). Hence, it is sound to replace the bit-level operations by an instruction that moves the double 2^k to the register xmm3. This specialized code consists exclusively of floating-point operations.

Next, each analysis generates a separate symbolic representation that over-approximates the possible outputs of its specialized code. The symbolic representation A_δ(x) is a function of the input x and the rounding errors δ. For example, the symbolic representation of the multiplication of x and L2E (line 2) is given by (x × L2E)(1 + δ). In general, if an expression e1 has symbolic representation A′_δ(x) and e2 has symbolic representation A″_δ(x), then the symbolic representation of e1 ⊗_f e2 (where ⊗_f is floating-point addition or multiplication) is given by (A′_δ(x) ⊗ A″_δ(x))(1 + δ′), where ⊗ is real-number addition or multiplication and δ′ is a new variable representing the rounding error of the operation ⊗_f.

Finally, each analysis uses analytical optimization to maximize the difference between the symbolic representation and the ideal mathematical result in the interval of interest. Several representations of precision loss are possible. If we measure precision loss using absolute error (§3), then the k-th analysis solves the following optimization problem:

    max_{x ∈ I_k, δ} |A_δ(x) − e^x|

For exp, we use the set of inputs X = [−4, 4], which results in 13 intervals, i.e., |I| = 13, and the symbolic representations have a maximum of 29 δ variables for rounding errors. The maximum absolute error reported across all intervals is 5.6 × 10^−14.

There is one remaining issue. The division of the entire input range X into intervals I = {I_k} may result in some x ∈ X where x ∉ I_k for any k. For example, consider the input 1/(2·L2E). Due to rounding errors, it is not clear whether, for this input, the result N of line 3 would be 0 or 1. Therefore, this input is not included in any I_k ∈ I (Equation 2). For exp, there are 36 such floating-point numbers that are not included in any interval. For these inputs, we simply execute the program on each one and directly measure the precision loss. The maximum absolute error for these 36 inputs is 2.1 × 10^−14 (which is close to but less than 5.6 × 10^−14).

Another common representation for precision loss is ULP error (§3), which signifies the number of floating-point numbers between the computed and the mathematically exact result. We show how to compute ULP error (for each interval) in §4.4. For exp, the maximum ULP error over X is 14; that is, there are at most 14 floating-point numbers between the output of exp and e^x. We are unaware of any previous automated analysis that can prove this result.

It is possible to further improve the performance of exp by sacrificing more precision. Despite its heavy use of the exponential function, S3D loses precision elsewhere and does not require precise results from exp to maintain correctness. One possible strategy involves asking a developer to create a customized implementation that has better performance at the cost of less precise results (§5). Such implementations can also be automatically generated using stochastic optimizers such as STOKE [22]. At a high level, STOKE makes random changes to binaries until they get faster and remain "approximately correct" on some tests. The binary exp_opt (Figure 2) is automatically generated by making random changes to exp using STOKE. The differences between exp_opt and exp are that exp_opt computes d as x + c1·N (without using c2) and it uses a Taylor series expansion of degree 8 (instead of 12). The authors of [22] claim that exp_opt has an ULP error below 10^7 from exp and improves the performance of the diffusion task of S3D by 27%. However, the correctness guarantees provided by STOKE are statistical and there is no formal guarantee that exp_opt cannot produce results with an ULP error much greater than 10^7 for some unobserved input.

We use our analysis to bound the precision loss between exp and exp_opt. Our analysis reports that the maximum absolute error between exp and exp_opt is 1.2 × 10^−8 and the maximum ULP error is 1.9 × 10^6, which though large is well below the desired bound of 10^7 ULPs. Therefore, we have proven formally that the STOKE-generated code respects the ULP bound of 10^7 for all inputs x ∈ [−4, 4]. This verification task was left as a challenge in [22].
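For readers who prefer source over assembly, the following C sketch mirrors the structure of the algorithm in Figure 1: range reduction, exact computation of 2^N via bit-manipulation, and a degree-12 Taylor polynomial. The helper name exp_sketch and the specific constants are ours (they are representative values, not the constants stored in the shipped binary), and the sketch ignores vectorization and special cases.

    #include <stdint.h>
    #include <string.h>
    #include <math.h>

    /* A readable sketch of Figure 1's algorithm: e^x = 2^N * e^d. */
    static double exp_sketch(double x) {
        const double L2E = 1.4426950408889634;       /* ~ log2(e)             */
        const double C1  = -0.6931471805599453;      /* ~ -log(2), high part  */
        const double C2  = -2.3190468138462996e-17;  /* ~ -log(2), low part   */

        double N = nearbyint(x * L2E);               /* lines 1-3             */

        uint64_t bits = ((uint64_t)((int64_t)N + 1023)) << 52;  /* lines 4-7: 2^N */
        double twoN;
        memcpy(&twoN, &bits, sizeof twoN);

        double d = (x + C1 * N) + C2 * N;            /* lines 8-11            */

        double p = 1.0;                              /* lines 12-36: Horner   */
        for (int i = 12; i >= 1; i--)                /* evaluation of sum d^i/i! */
            p = 1.0 + (d * p) / i;

        return twoN * p;                             /* lines 37-38           */
    }

The bit-level block (building twoN) is exactly the kind of code that defeats purely floating-point analyses, which motivates the piecewise specialization described above.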

3. Preliminaries

Figure 3 describes the bit representation of 64-bit double precision floating-point numbers according to the IEEE 754 standard. The most significant bit is the sign bit (denoted by s), which determines whether the number is positive or negative. The next 11 bits represent the exponent (denoted by p) and the remaining 52 bits represent the significand or the fractional part (denoted by f). Figure 3 shows the values represented by different bit patterns. Unless otherwise stated, a floating-point number is such a double precision floating-point number. For simplicity we do not consider denormals, but it is straightforward to incorporate them in our verification techniques (see footnote 4 in §4.3).

    bit 63          bits 62-52             bits 51-0
    sign s (1 bit)  exponent p (11 bits)   fraction f (52 bits)

    Type      Exponent (p)   Fraction (f)    Value
    Zero      0              0               (−1)^s · 0
    Denormal  0              ≠ 0             (−1)^s · 2^−1022 · 0.f
    Normal    [1, 2046]      unconstrained   (−1)^s · 2^(p−1023) · 1.f
    Infinity  2047           0               (−1)^s · ∞
    NaN       2047           ≠ 0             (−1)^s · ⊥

    Figure 3. IEEE-754 double precision floating-point format.

The result of applying an arithmetic operation to floating-point number(s) need not be expressible as a floating-point number. In this case, the result r is rounded to the closest representable floating-point number, denoted by fl(r). The IEEE standard provides algorithms for addition, subtraction, multiplication, and division, and requires implementations following the standard to produce exactly the results specified by the algorithms. If an architecture obeys the IEEE standard then we have the following guarantee¹:

    x ⊗_f y = fl(x ⊗ y) ∈ {(x ⊗ y)(1 + δ) : |δ| < 2^−53}        (3)

That is, if we have two floating-point numbers x and y and apply a floating-point operation ⊗_f included in the IEEE standard, then the result is the closest floating-point number to the real number obtained by treating x and y as real numbers and applying the exact operation ⊗ over the reals. The floating-point result is guaranteed to belong to a very small interval around the actual result. We call the constant 2^−53 machine epsilon² and denote it by ε.

    ¹ Equation 3 is valid because we neglect underflows, overflows, and denormals. In general, if |x ⊗ y| < 2^1024 then it is always true that fl(x ⊗ y) ∈ {(x ⊗ y)(1 + δ) + δ′ : |δ| < 2^−53, |δ′| < 2^−1075} [18].
    ² The IEEE standard does not define machine epsilon and different definitions can be found in the literature. Regardless, Equation 3 is valid.

There are three standard methods for representing rounding errors. The absolute error simply measures the magnitude of the difference between the two results:

    abs_err(r1, r2) = |r1 − r2|

For example, in Equation 3 the maximum absolute error between the computed result and the actual result is ε|x ⊗ y|. Therefore, this error grows with the magnitude of (x ⊗ y). In contrast, the relative error is the following:

    rel_err(r1, r2) = |(r1 − r2) / r2|

For Equation 3, the maximum relative error is ε, which is independent of x and y. The relative error is traditionally expressed as a multiple of ε.

The final representation is "units in last place" error or ULP error. If d = 1.f1…f52 × 2^p is used to represent a real number r, then it is in error by

    ULP(d, r) = |1.f1…f52 − r/2^p| · 2^52

units in the last place. For Equation 3, the maximum ULP error is 1/2, which is attained when r lies exactly in between two consecutive floating-point numbers. The relative error expressed as a multiple of machine epsilon and the ULP error are guaranteed to be within a factor of two for floating-point numbers [8]. Intel claims³ that the maximum error of its implementations of transcendentals is within 1 ULP, which is typical for well-engineered implementations of transcendental functions. Therefore, if r represents the mathematically exact result and d is the floating-point number closest to r, then the belief is that the implementation produces a result which is d, or d− (the floating-point number just below d), or d+ (the floating-point number just above d).

    ³ https://software.intel.com/sites/default/files/article/326703/fp-control-2012-08.pdf

Remark. Note that almost half of all floating-point numbers lie in the interval [−1, 1] and the density of floating-point numbers increases almost exponentially as we approach 0. Hence, for floating-point numbers x and y that are near 0, even if the absolute error |x − y| is small, the ULP error ULP(x, y) can still be huge. For instance, if x is zero and |x − y| ≈ ε, then ULP(x, y) ≈ 4 × 10^18. Thus all analysis techniques based on propagating round-off errors, including ours, must necessarily yield large ULP errors for many floating-point operations that produce results very close to 0.
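The encoding of Figure 3 can be inspected directly. The following C sketch (our own helper, not part of libimf) extracts the sign, biased exponent, and fraction fields of a double and reconstructs the value of a normal number as (−1)^s · 2^(p−1023) · 1.f.

    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>
    #include <math.h>

    /* Decompose a double into the fields of Figure 3 (normal numbers only). */
    static void decompose(double x) {
        uint64_t bits;
        memcpy(&bits, &x, sizeof bits);
        uint64_t s = bits >> 63;                    /* sign bit        */
        uint64_t p = (bits >> 52) & 0x7ff;          /* 11-bit exponent */
        uint64_t f = bits & 0xfffffffffffffULL;     /* 52-bit fraction */
        double value = (s ? -1.0 : 1.0) *
                       ldexp(1.0 + (double)f * 0x1p-52, (int)p - 1023);
        printf("s=%llu p=%llu f=0x%013llx value=%.17g\n",
               (unsigned long long)s, (unsigned long long)p,
               (unsigned long long)f, value);
    }

For instance, decompose(1.0) prints s=0, p=1023 (the exponent bias), and an all-zero fraction, matching the "Normal" row of Figure 3.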

4. Formalism

We now describe our procedure for estimating the precision loss of an implementation with respect to its mathematical specification. The procedure has two main steps. First, we construct a symbolic abstraction that soundly over-approximates the implementation. Next, we use an analytical optimization procedure to obtain a precise bound on the deviations the symbolic abstraction can have from the mathematical specification. We start by defining the syntax of a core language that we use for the formal development.

4.1 Core Language

Recall that the implementations of interest are highly optimized x86 binaries. The x86 ISA has more than two thousand different instruction variants, so we desugar x86 programs into a core expression language that is the subject of our analysis.

Figure 4 shows the grammar representing the expressions in this language.

    expression                       e   ::= c | χ | L[e] | e ⊗_b e | e ⊗_s e | e ⊗_i e | e ⊗_f e | ⊗_c e
    bitwise operators                ⊗_b ∈ {AND, OR, ···}
    shift operators                  ⊗_s ∈ {<<, >>, ···}
    bit-vector arithmetic operators  ⊗_i ∈ {+_i, −_i, ×_i, ···}
    floating-point operators         ⊗_f ∈ {+_f, ×_f, /_f, ···}
    casting operators                ⊗_c ∈ {i2f, f2i, round, ···}

    Figure 4. The syntax of the core language.

An expression e in the core language can be a 64-bit constant c, a 64-bit input χ, the result of a table lookup L[·], or the application of a unary or a binary operator to subexpression(s). All expressions in this language evaluate to 64-bit values. The x86 implementations often use 128-bit SSE registers and SIMD instructions for efficiency. However, these operations can be expressed in our language by desugaring them to multiple operations applied to multiple 64-bit expressions. For brevity, we also restrict our presentation to settings with only a single input and a single lookup table. It is straightforward to generalize our results to implementations with multiple inputs and multiple tables (but see the discussion in §7 of scaling issues in handling multiple inputs).

The operators relevant for our benchmarks are shown in Figure 4. The bitwise operators include bitwise-and (AND) and bitwise-or (OR), which are used for masking bits. The left shift (<<) and the right shift (>>) operators are "logical" shifts (as opposed to "arithmetic" shifts) and introduce zeros. The bit-vector arithmetic operators are "signed" operators that interpret the argument bits as 64-bit integers in 2's complement notation. The floating-point operators interpret the argument bits as 64-bit IEEE 754 double precision floating-point numbers. The casting operator i2f consumes a bit-string, interprets it as an integer written in 2's complement notation (e.g., 42), and generates a bit-string that, when interpreted as a floating-point number, represents a value equal to the integer (e.g., 42.0). The round operator rounds to the nearest integer (e.g., 42.1 is rounded to 42.0), and the f2i operator first rounds and then converts the rounded floating-point number to a 64-bit integer.

For any expression e, the concrete semantics is denoted by E(e) : {0,1}^64 → {0,1}^64. That is, the value obtained by evaluating e with an input x is given by E(e)(x). The definition of E(·) is standard and we omit it.

4.2 Symbolic Abstractions

Our goal is to compute a symbolic representation that over-approximates the behaviors of an expression e. Constructing this abstraction is difficult due to the interplay between floating-point and bit-level (bitwise, shift, bit-vector arithmetic, and casting) operations. Therefore, we define this abstraction piecewise. We restrict the inputs to small enough intervals—thus ensuring that the bit-level operations are amenable to static analysis—and we construct multiple abstractions, one for each interval.

The description of our abstractions requires some operators relating real numbers and floating-point numbers. The function d2R : {0,1}^64 → R ∪ {±∞, NaN} is the natural map from 64-bit floating-point numbers to real numbers. It handles infinity and not-a-number by mapping them to {±∞, NaN}. This mapping is also extended to sets of floating-point numbers in the obvious way. If P(·) denotes the power set of a given set, then the inverse map R2d : P(R) → P({0,1}^64) maps a subset S of real numbers to the largest set of floating-point numbers such that

    ∀x ∈ R2d(S). d2R(x) ∈ S

For brevity, we use the abbreviation Ŝ ≜ R2d(S). We use X to denote the interval over real numbers over which we want to compute the precision loss. Therefore, the input χ ranges over the set X̂ = R2d(X).

We next define a symbolic abstraction. Let I = {[l1, r1], …, [ln, rn]} denote a set of intervals in R, where [li, ri] ⊆ X for all i. The symbolic representation A_δ : R → R ∪ {⊥} is a function of x ∈ R and δ = (δ1, …, δm), where each δi ∈ R represents a rounding error. The fact that (I, A_δ) is an abstraction of e is defined as follows.

Definition 1. (I, A_δ) is a symbolic abstraction of e if for all intervals I ∈ I, for all floating-point numbers x ∈ Î,

    d2R(E(e)(x)) ∈ A(d2R(x))

where A(y) ≜ [min_δ A_δ(y), max_δ A_δ(y)].

We discuss a procedure to construct a symbolic abstraction (I, A_δ) of an expression e next.

4.3 Construction of Symbolic Abstractions

The expressions consist of floating-point and bit-level parts. To reason about bit-level operations, we need to keep track of subexpressions that are constant or are determined by a small subset of the bits of the input χ. To this end, we define a book-keeping map B : {0,1}^64 → {0,1}^64 ∪ {⊥}. If B(x) ≠ ⊥ then for all i = 0, …, 63, the i-th bit of B(x), denoted by B(x)[i], is either 0, or 1, or a boolean function of the bits x[63], …, x[0]. For instance, consider an expression e = (x AND 0x8 0^15) OR 0x3fe 0^13, where b^i denotes the string of i b's; e computes the floating-point number sgn(x) × 0.5, where sgn(·) is the sign function. The book-keeping map for e is B(x) = x[63] 0 1^9 0^53, and it represents that the sign bit of e is that of x and the rest of the bits of e are the same as those of 0x3fe 0^13.

Just like symbolic abstractions, the book-keeping map is also defined piecewise over different intervals of I. We give the formal definitions of defined-ness next.

Definition 2.
    (I, A_δ) defined ⇔ ∀I ∈ I. ∀x ∈ I. A_δ(x) ≠ ⊥
    (I, B) defined ⇔ ∀I ∈ I. ∀x ∈ Î. B(x) ≠ ⊥

Next, we relate these abstractions with the concrete semantics.

Definition 3. (I, A_δ, B) is consistent with e if the following hold:
    (I, A_δ) defined ⇒ (I, A_δ) is a symbolic abstraction of e,
    (I, B) defined ⇒ ∀I ∈ I. ∀x ∈ Î. E(e)(x) = B(x)

An important case is when every bit of B is a constant over each interval, and we define it separately.

Definition 4.
    (I, B) constant ⇔ ∀I ∈ I. ∃n ∈ {0,1}^64. ∀x ∈ Î. B(x) = n

Figure 5 lists the rules for construction of symbolic abstractions. These rules have the following form: e ▷ (I, A_δ, B), read as e is provably consistent with (I, A_δ, B). These rules use the following notations. For a set S, Id_S : S → S denotes the identity function. For a function f : T → U and a subset S ⊂ T, f_S : S → U is the restriction of f to S.

The rules for atomic expressions are direct (CONST and INPUT). For a table lookup, the index should be a constant over each interval (LOOKUP). Here, the result of the lookup can be obtained via the concrete semantics. For example, consider a table lookup L[e] where e represents the bits of the exponent of χ. Also assume that I = {I_k : ∀x ∈ Î_k. exponent of x = k}. Then the lookup corresponding to the k-th interval provides the bits L[k], which are recorded in the book-keeping map and the symbolic representation. The rule for i2f is similar. Similarly, if all the arguments to a binary operator ⊗ are constant over each interval, then the resulting values can be obtained by evaluation (CONSTARG). For example, suppose that for all I_k ∈ I, the book-keeping map says that for all x ∈ Î_k, the expression e1 evaluates to k and e2 evaluates to k + 1. Then the book-keeping map for e1 +_i e2 maps x ∈ Î_k to 2k + 1. Note that for bit-vector arithmetic operations, only the rule CONSTARG applies.

The rule for bitwise operations, when B is defined but the bits are not constant (BITOP), requires a refinement of two sets of intervals. This operation is defined as follows:

    refine(I1, I2) ≜ {I1 ∩ I2 ≠ ∅ : I1 ∈ I1, I2 ∈ I2}

The refinement operation is necessary because the intervals over which B1 (or A1,δ) is defined piecewise can be different from those intervals for B2 (or A2,δ). For instance, for I1 = {[−3, 0), (0, 3]} and I2 = {[−3, −1), (−1, 1), (1, 3]}, refine(I1, I2) = {[−3, −1), (−1, 0), (0, 1), (1, 3]}. Note that the refined intervals do not necessarily cover the original intervals: there can be inputs x ∈ I1 where I1 ∈ I1 and x is absent from all intervals of refine(I1, I2). The shift operation also uses refine(·, ·) and requires the shift amount to be a constant for each interval (SHIFT). These are in contrast to the rule CONSTARG, which requires both of the arguments of a binary operator to be constant for each interval.

The symbolic representation is undefined for bit-level operations (BITOP and SHIFT). A floating-point operation results in an undefined book-keeping map, and the symbolic representation is updated by applying the corresponding operation over real numbers and introducing a new error term δ′ (FLOP).⁴

    ⁴ Modifying the rule FLOP from A′ = (A1 ⊗ A2)(1 + δ′) to A′ = (A1 ⊗ A2)(1 + δ′) + δ″ enables us to fully support denormals, where δ″ is a fresh variable with |δ″| < 2^−1075.

Remark. The rule FLOP describes an abstraction step: we are over-approximating the result obtained from a floating-point operator by introducing the δ variables that model rounding errors. Floating-point operations can be composed in non-trivial ways to generate exactly rounded results [8], whereas in the symbolic representations that we use, the rounding errors only increase with the number of operations. For example, there are no rounding errors if a floating-point number is multiplied by a power of two. However, the rule FLOP introduces a new δ variable for this operation.

The rounding operation requires an auxiliary function splitA(I, A_δ) (RND). This function splits the intervals in I further so that all floating-point numbers in each sub-interval round to the same integer:

    splitA(I, A_δ) ≜ ⋃_{I ∈ I} {I_k ≠ ∅ : k ∈ Z}

where each I_k ⊂ I is an interval such that A(I_k) ⊂ (k − 1/2, k + 1/2), and A(I) ≜ [min_{x∈I, δ} A_δ, max_{x∈I, δ} A_δ]. For instance, for I = {[−3, 3]} and A_δ(x) = (0.25 × x)(1 + δ1), splitA(I, A_δ) = {[−3, −2/(1−ε)], [−2/(1+ε), 2/(1+ε)], [2/(1−ε), 3]} = I′. The intervals created by splitA(I, A_δ) are not guaranteed to include all floating-point numbers that belong to the intervals of I (e.g., no interval of I′ includes 2).

The rule SPLIT uses an auxiliary function splitB(I, B) to split the intervals in I further so that B is constant on each sub-interval. For an interval I, we define

    M(I, B) ≜ max{ i : ∀x ∈ {0,1}^(64−i). ∃n ∈ {0,1}^64. ∀y ∈ {0,1}^i. (x, y) ∈ Î ⇒ B((x, y)) = n }

Intuitively, M(I, B) is the maximum bit position i such that for each choice of bits x[63], …, x[i] we have B(x[63], …, x[0]) = n for some constant n regardless of the choice of bits x[i−1], …, x[0]. Using M(I, B), we define splitB(I, B) as

    splitB(I, B) ≜ ⋃_{I ∈ I} { [d2R(l), d2R(r)] ∩ I ≠ ∅ : x ∈ {0,1}^(64−i) },
    where i = M(I, B), l = (x, 0^i), and r = (x, 1^i)

Here, l and r are 64-bit vectors. For instance, for B(x) = x[63] 0 1^9 0^53, I = [−3, 3], and I = {I}, we have M(I, B) = 63 and splitB(I, B) = {[−∞, −0] ∩ I, [+0, +∞] ∩ I} = {[−3, 0], [0, 3]}. Unlike refine(·, ·) and splitA(·, ·), the intervals created by splitB(I, B) include all the floating-point numbers that belong to the intervals of I. This rule is useful to create intervals over which expressions evaluate to constant values, which are required for the rules LOOKUP, I2F, F2ICONST, RNDCONST, CONSTARG, and SHIFT.

If we mask out the lower order bits of the significand using a bitwise-and operation (BAND), then its effect on the symbolic representation can be modeled by an error term using the following result:

Lemma 1. Let c ∈ {0,1}^64 and d = 1^(12+n) 0^(52−n) ∈ {0,1}^64 for some n ∈ N. Then d2R(c AND d) ∈ {d2R(c)(1 + δ′) : |δ′| < 2^−n}.

This result bounds the difference between the masked output and the input using error terms.

Our main result follows by induction on the derivation tree:

Theorem 1. If e ▷ (I, A_δ, B), then (I, A_δ, B) is consistent with e, and thus (I, A_δ) is a symbolic abstraction of e.

From the rules to construct symbolic abstractions, intervals in I cannot overlap with each other, because each interval I ∈ I has the property that for all values in I, the value of some subexpression is guaranteed to be a constant, and because distinct intervals correspond to distinct constants. In constructing symbolic abstractions, the only step that requires manual intervention is the computation of splitA(·, ·), which could be automated for our benchmarks in §5.

Next, we use this symbolic abstraction to bound the absolute error or the ULP error of an implementation e from its mathematical specification.

4.4 Computing Precision Loss

We describe a procedure to bound the precision loss of an implementation. Let e be an expression that implements a mathematical specification f : R → R. The aim is to compute a bound Θ such that on any input the outputs of e and f differ by at most Θ. More formally,

Definition 5. Θ_a ∈ R is a sound absolute error bound for e and f over the interval I if for all x ∈ Î,

    |d2R(E(e)(x)) − f(d2R(x))| ≤ Θ_a

We have a similar definition for ULPs.

Definition 6. Θ_u ∈ R is a sound ULP error bound for e and f over the interval I if for all x ∈ Î,

    ULP(E(e)(x), f(d2R(x))) ≤ Θ_u

We first present a procedure to compute Θ_a. Let (I, A_δ) be a symbolic abstraction of e. We compute

    Θ_1 = max { max_{x ∈ I, δ} |f(x) − A_δ(x)| : I ∈ I }        (4)

This computation is an analytical optimization problem that can be solved by computer algebra systems. The time and memory required to solve the optimization problem increase with the number of variables. Moreover, A_δ(x) has many variables, because a new δ variable is created for each floating-point operation (the rule FLOP of Figure 5). Hence off-the-shelf solvers fail to solve Equation 4 as is.

For tractability, we simplify Equation 4. In our evaluation, the symbolic representations are polynomials with small degrees and the inputs are restricted to small ranges (§5). In this setting, we can prove that the terms in A_δ(x) that involve a product of multiple δ variables are negligible compared to other values. Informally, if the coefficients and input ranges of a polynomial are bounded by a constant c, then the rounding error introduced by all δ terms with degree > 1 is bounded by C = (4cε)^n / (1 − ε). The proof uses standard numerical analysis techniques and is omitted.

With this simplification, the expression f(x) − A_δ(x) can be rearranged such that |f(x) − A_δ(x)| ≤ |M_0(x)| + Σ_i |M_i(x)||δ_i|, where M_0(x) is C plus all the terms of f(x) − A_δ(x) having no δ, and M_i(x) is the coefficient of δ_i in f(x) − A_δ(x). We use optimization techniques to maximize |M_i(x)| for each interval I ∈ I (which is tractable because M_i(x) is a function of a single variable) and report

    max_{x ∈ I} |M_0(x)| + Σ_i ( max_{x ∈ I} |M_i(x)| ) ( max_{δ_i} |δ_i| )        (5)

as a bound on the absolute error over the interval I. The total number of optimization tasks required to obtain Θ_1 (Equation 4) is proportional to the product of the number of δ variables and the number of intervals in I. In our evaluation, we observe that the number of optimization tasks is tractable and the time taken by each task is less than a second (§5).

Recall that X represents the interval of interest and we may have ⋃_{I ∈ I} I ≠ X due to the operations used in computing symbolic abstractions (i.e., refine(·, ·) and splitA(·, ·)). Therefore, it is possible that Θ_a ≠ Θ_1 if the input that results in the maximum absolute error belongs to the set H = X \ ⋃_{I ∈ I} I. So we compute

    Θ_2 = max { |d2R(E(e)(x)) − f(d2R(x))| : x ∈ Ĥ }        (6)

In our experiments, |Ĥ| is small enough that it is feasible to compute Θ_2 by brute force testing, i.e., by evaluating e and f on every floating-point number in Ĥ. Note that it is not feasible to perform brute force on X̂ as it contains an intractable number of floating-point numbers, and thus symbolic abstractions are necessary. A sound absolute error bound is then given by Θ_a = max{Θ_1, Θ_2}.

Next, we compute a sound ULP error bound Θ_u. It can be computed from either a bound on absolute error or a bound on relative error (§3). The latter is the maximum of two quantities: the maximum relative error observed during testing on inputs in Ĥ and the result of the following optimization problem:

    Θ_r = max { max_{x ∈ I, δ} |(f(x) − A_δ(x)) / f(x)| : I ∈ I }        (7)

In our evaluation, we compute bounds on the ULP error using both the relative and the absolute error and report the better bound.

Figure 5 collects the derivation rules e ▷ (I, A_δ, B) used in §4.3. Each rule is written as a list of premises followed by its conclusion.

Rules for atomic expressions:

    CONST:    c ▷ ({X}, d2R(c), c)

    INPUT:    χ ▷ ({X}, Id_R, Id_{0,1}^64)

Rules for operators with all constant arguments:

    LOOKUP:   e ▷ (I, _, B),  (I, B) constant,
              ∀I ∈ I. B′_Î = E(L[B_Î]) ∧ A′_δ,I = d2R(B′_Î)
              ⟹  L[e] ▷ (I, A′_δ, B′)

    I2F:      e ▷ (I, _, B),  (I, B) constant,
              ∀I ∈ I. A′_δ,I = d2R(B_Î)
              ⟹  i2f(e) ▷ (I, A′_δ, B)

    F2ICONST: e ▷ (I, _, B),  (I, B) constant,
              ∀I ∈ I. B′_Î = E(f2i(B_Î)) ∧ A′_δ,I = d2R(B′_Î)
              ⟹  f2i(e) ▷ (I, A′_δ, B′)

    RNDCONST: e ▷ (I, _, B),  (I, B) constant,
              ∀I ∈ I. B′_Î = E(round(B_Î)) ∧ A′_δ,I = d2R(B′_Î)
              ⟹  round(e) ▷ (I, A′_δ, B′)

    CONSTARG: e1 ▷ (I1, _, B1), (I1, B1) constant,  e2 ▷ (I2, _, B2), (I2, B2) constant,
              I′ = refine(I1, I2),  ∀I ∈ I′. B′_Î = E(B1_Î ⊗ B2_Î) ∧ A′_δ,I = d2R(B′_Î)
              ⟹  e1 ⊗ e2 ▷ (I′, A′_δ, B′)

Rules for operators with one or more non-constant argument(s):

    BITOP:    e1 ▷ (I1, _, B1), (I1, B1) defined,  e2 ▷ (I2, _, B2), (I2, B2) defined,
              I′ = refine(I1, I2),  ∀I ∈ I′. ∀x ∈ Î. B′_Î(x) = B1_Î(x) ⊗_b B2_Î(x)
              ⟹  e1 ⊗_b e2 ▷ (I′, ⊥, B′)

    SHIFT:    e1 ▷ (I1, _, B1), (I1, B1) defined,  e2 ▷ (I2, _, B2), (I2, B2) constant,
              I′ = refine(I1, I2),  ∀I ∈ I′. ∀x ∈ Î. B′_Î(x) = B1_Î(x) ⊗_s B2_Î
              ⟹  e1 ⊗_s e2 ▷ (I′, ⊥, B′)

    FLOP:     e1 ▷ (I1, A1,δ, _), (I1, A1,δ) defined,  e2 ▷ (I2, A2,δ, _), (I2, A2,δ) defined,
              I′ = refine(I1, I2),  ∀I ∈ I′. ∀x ∈ I. A′_δ,I(x) = (A1,δ,I(x) ⊗ A2,δ,I(x))(1 + δ′),
              where δ′ is a fresh variable with |δ′| < ε
              ⟹  e1 ⊗_f e2 ▷ (I′, A′_δ, ⊥)

    F2I:      round(e) ▷ (I, A_δ, B),  (I, B) defined
              ⟹  f2i(e) ▷ (I, A_δ, B)

    RND:      e ▷ (I, A_δ, _),  (I, A_δ) defined,
              I′ = splitA(I, A_δ),  ∀I_k ∈ I′. A′_δ,I_k = k ∧ B′_Î_k = k̂ (the 64-bit representation of k)
              ⟹  round(e) ▷ (I′, A′_δ, B′)

Rule to produce a book-keeping map that is constant on each interval:

    SPLIT:    e ▷ (I, _, B),  (I, B) defined,
              I′ = splitB(I, B),  ∀I ∈ I′. A′_δ,I = d2R(B_Î)
              ⟹  e ▷ (I′, A′_δ, B)

Rule for masking:

    BAND:     e1 ▷ (I1, A1,δ, _), (I1, A1,δ) defined,  e2 ▷ (I2, _, B2), (I2, B2) constant,
              ∃n ∈ N. ∀I ∈ I2. B2_Î = 1^(12+n) 0^(52−n),
              I′ = refine(I1, I2),  ∀I ∈ I′. ∀x ∈ I. A′_δ,I(x) = A1,δ,I(x)(1 + δ′),
              where δ′ is a fresh variable with |δ′| < 2^−n
              ⟹  e1 AND e2 ▷ (I′, A′_δ, ⊥)

Figure 5. The rules for constructing a symbolic abstraction.
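The BAND rule relies on Lemma 1: masking away the low-order bits of the significand perturbs the value by a relative error below 2^−n. The following C sketch (our own check, not part of the verified binaries) applies such a mask and verifies the bound for a sample input; here n = 20 fraction bits are kept, matching the lemma's mask 1^(12+n) 0^(52−n).

    #include <stdint.h>
    #include <string.h>
    #include <math.h>
    #include <stdio.h>
    #include <assert.h>

    /* Keep the sign, the exponent, and the top n fraction bits (Lemma 1). */
    static double mask_low_bits(double c, int n) {
        uint64_t bits, mask = ~((UINT64_C(1) << (52 - n)) - 1); /* 1^(12+n) 0^(52-n) */
        memcpy(&bits, &c, sizeof bits);
        bits &= mask;
        memcpy(&c, &bits, sizeof c);
        return c;
    }

    int main(void) {
        int n = 20;
        double c = 0.7853981633974483;            /* an arbitrary normal double */
        double masked = mask_low_bits(c, n);
        double rel = fabs(masked - c) / fabs(c);
        printf("relative perturbation = %.3g (bound 2^-%d = %.3g)\n",
               rel, n, ldexp(1.0, -n));
        assert(rel < ldexp(1.0, -n));             /* |delta'| < 2^-n */
        return 0;
    }

The tan implementation in §5.3 uses exactly this kind of masking on an intermediate result, which is why the BAND rule is needed there.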

5. Case Studies

We evaluate our technique on Intel's implementations of the positive difference function and three widely used transcendental functions: the sine function, the tangent function, and the natural logarithm function. We prove formal bounds on how much each implementation (fdim, sin, tan, and log) deviates from the corresponding mathematical specification (fdim(·,·), sin(·), tan(·), and log(·)). These implementations are available as (undocumented) x86 binaries included in libimf, which is Intel's implementation of the math.h C numerics library. The library contains many different implementations for these functions that have differing performance on different processors and input ranges. We choose the implementations used in the benchmark set of [22] and perform the verification for these implementations.

Next, we manually modify these binaries to create implementations that have better performance at the cost of less precise results (Table 1). Using our technique, we are also able to prove formal bounds on the precision loss of these variants. The results are summarized in Table 2 and Figure 7.

    f     LoC of f   LoC of f_opt   Reduction in size   Speedup
    exp   38         28             10                  1.6×
    sin   66         42             24                  1.5×
    tan   107        89             18                  1.1×
    log   67         54             13                  1.3×

    Table 1. Important statistics of each implementation for the case studies (LoC is lines of x86 assembly). The manually/automatically created variants have a smaller number of instructions (column 4) and are faster (column 5) than their production counterparts.

We currently use Mathematica to solve Equation 5, though this is largely a matter of convenience. Mathematica provides functions to solve global optimization problems analytically, namely the MaxValue[·] and the MinValue[·] functions, which find exact global optima of a given optimization problem (using several different algorithms, e.g., cylindrical algebraic decompositions).⁵ The analytical optimization that we use is in contrast to typical numerical optimization that uses finite-precision arithmetic without providing formal guarantees. Other techniques, such as interval arithmetic or branch and bound optimization [23], can be used (instead of Mathematica) to solve Equation 5 soundly.

    ⁵ https://reference.wolfram.com/language/tutorial/ConstrainedOptimizationExact.html

To compute Θ_2 in Equation 6, for each x ∈ Ĥ, we compare the floating-point result E(e)(x), computed by an evaluation of e on x, with the exact result f(d2R(x)) computed by Mathematica. Even though Mathematica claims soundness guarantees, it does not produce a certificate of correctness with the solution. In the future we would like to replace Mathematica with a solver that produces machine-checkable certificates.

5.1 The fdim Implementation

As a warm up, consider the positive difference function of math.h: fdim(x, y) ≜ x − y if x > y, and 0 otherwise. Figure 6 shows Intel's implementation of fdim.

    1 movapd %xmm0, %xmm2
    2 cmpsd  $0x6, %xmm1, %xmm0
    3 andpd  %xmm0, %xmm1
    4 andpd  %xmm2, %xmm0
    5 subsd  %xmm1, %xmm0
    6 retq

    Figure 6. The x86 assembly code of fdim.

It first sets each of the lower 64 bits of the register xmm0 to be 0 if x ≤ y, and 1 otherwise (line 2). Next, it sets xmm0 and xmm1 to be both 0 if x ≤ y, and x and y otherwise (lines 3-4). The output is obtained by subtracting xmm1 from xmm0 (lines 5-6).

To obtain a symbolic abstraction (I, A_δ) of fdim, we apply the technique described in §4.3 with X = [−1, 1] × [−1, 1]. The instruction cmpsd⁶ in line 2 results in the intervals I = {I1, I2}, where I1 = {(x, y) ∈ X : x ≤ y} and I2 = {(x, y) ∈ X : x > y}. The final symbolic representation is A_δ(x, y) = 0 on I1 and A_δ(x, y) = (x − y)(1 + δ1) on I2. Computing a sound absolute error bound Θ_a for fdim and fdim(·,·) over X is straightforward:

    Θ_a = max_{(x,y) ∈ I2, |δ1| < ε} |(x − y)δ1| = 2ε

The relative error bound (Equation 7) is ε, so a sound ULP error bound is Θ_u = 2.

    ⁶ Due to space constraints, Figure 5 omits the rule for cmpsd.
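The branchless idiom of Figure 6 can be written in C as follows. This is our own transliteration for exposition, not Intel's source (which does not exist), and it ignores the NaN handling of the actual cmpsd predicate.

    #include <stdint.h>
    #include <string.h>

    /* Branchless positive difference, mirroring Figure 6: build an
       all-ones/all-zeros mask from the comparison x > y, use it to zero
       out both operands when x <= y, then subtract. */
    static double fdim_sketch(double x, double y) {
        uint64_t xb, yb;
        memcpy(&xb, &x, sizeof xb);
        memcpy(&yb, &y, sizeof yb);
        uint64_t mask = (x > y) ? ~UINT64_C(0) : 0;   /* cmpsd: 1...1 or 0...0 */
        xb &= mask;                                    /* andpd %xmm2, %xmm0   */
        yb &= mask;                                    /* andpd %xmm0, %xmm1   */
        double xm, ym;
        memcpy(&xm, &xb, sizeof xm);
        memcpy(&ym, &yb, sizeof ym);
        return xm - ym;                                /* subsd                */
    }

When x ≤ y both operands are masked to +0.0 and the subtraction returns 0, matching the piecewise symbolic representation derived above.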

    Implementation  Interval                     Θa          Θu          |Ĥ|   |δ|   |I|
    fdim            [−1, 1] × [−1, 1]            2 × 10^−16  2           0     1     2
    exp             [−4, 4]                      6 × 10^−14  14          36    29    13
    exp_opt         [−4, 4]                      1 × 10^−8   2 × 10^6    36    19    13
    sin             [−π/2, π/2]                  2 × 10^−16  9           110   53    33
    sin_opt         [−π/2, π/2]                  2 × 10^−11  3 × 10^5    110   26    33
    tan             [0, 17π/64]                  7 × 10^−16  12          89    84    10
                    [17π/64, 31π/64] *           2 × 10^−13  218         89    85    7
                    [31π/64, π/2] *              2 × 10^15   9 × 10^18   89    85    1
    tan_opt         [0, 17π/64]                  3 × 10^−12  3 × 10^4    89    69    10
                    [17π/64, 31π/64] *           4 × 10^−10  5 × 10^5    89    69    7
                    [31π/64, π/2] *              2 × 10^16   9 × 10^18   89    69    1
    log             (0, 4) \ [4095/4096, 1]      8 × 10^−14  21          0     25    2^21
                    [4095/4096, 1]               9 × 10^−19  1 × 10^14   0     25    1
    log_opt         (0, 4) \ [4095/4096, 1]      6 × 10^−11  5 × 10^5    0     12    2^21
                    [4095/4096, 1]               1 × 10^−18  1 × 10^14   0     12    1

    Table 2. Summary of results: For each implementation (column 1), for all inputs in the interval (column 2), Θa shows the bound on maximum absolute error and Θu shows the bound on maximum ULP error between the implementation and the exact mathematical result. The number of inputs that require testing (|Ĥ|), the number of δ variables (|δ|), and the number of intervals considered (|I|) in the symbolic abstraction are also shown. The values of Θa and Θu on the rows marked with * need not be sound.

5.2 The sin Implementation

Intel's sin implementation uses the following procedure to compute sin(x). It first computes an integer N and a reduced value d:

    N = round(x · 32/π),   d = x − N·π/32        (8)

Then it uses the following trigonometric identity:

    sin(x) = sin(d) cos(π/32·N) + cos(d) sin(π/32·N)

The terms sin(d) and cos(d) are computed by a Taylor series expansion:

    sin(d) ≈ d − d³/3! + d⁵/5! − d⁷/7! + d⁹/9!
    cos(d) ≈ 1 − d²/2! + d⁴/4! − d⁶/6! + d⁸/8!

The Taylor series expansion includes an infinite number of terms. However, since d is small in magnitude, the small number of Taylor series terms shown above is sufficient to provide a good approximation of sin(d) and cos(d). The remaining terms, sin(π/32·N) and cos(π/32·N), are obtained by table lookups. A table in memory stores precomputed values sin(π/32·i) and cos(π/32·i) for each i = 0, …, 63, and the index i = N AND 0x3f is used to retrieve the correct value. The sin implementation uses bit-level operations to compute the index i and the final memory address for the lookup.

We modify sin to obtain a more efficient but less precise implementation sin_opt. These modifications include removing all subcomputations that have only a small effect on the final output. In particular, sin_opt uses a Taylor series expansion of degree 5 (instead of 9). The compensation terms are also omitted. These are terms such as (c1 −_f (c1 +_f c2)) +_f c2 that are 0 in the absence of rounding errors but are important for maintaining the accuracy of floating-point computations (see the remark in §4.3). Moreover, sin_opt replaces some very small constants (e.g., 7.9 × 10^−18) by 0.

For X = [−π/2, π/2], we compute a symbolic abstraction (I, A_δ) of sin by applying the technique described in §4.3. In the final abstraction |I| = 33 and there are 53 δ's in A_δ (Table 2). The main step in the construction of a symbolic abstraction for sin (as well as sin_opt) is the application of the rule RND so that each of the resulting intervals contains inputs that map to the same N (Equation 8). A total of |Ĥ| = 110 inputs do not belong to any interval of I, which is easily a small enough set that we can compute Θ_2 (Equation 6) via testing on each of these inputs.

We then use the procedure described in §4.4 to compute Θ_a and Θ_u over the interval X for sin and sin_opt (Table 2 and Figure 7). Our main result for sin is a proof that for all inputs in [−π/2, π/2], the computed result differs from sin(x) only by 9 ULPs; that is, there are at most 9 floating-point numbers between the computed result and the mathematically exact result. For sin_opt, we have successfully traded a small loss in precision (bearable for many applications) for over 50% improvement in performance (Table 1).
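A portable sketch of this table-based scheme is shown below; the helper names, the runtime-initialized tables, and the factored Taylor polynomials are ours and stand in for the precomputed constants stored in the binary. init_tables() must be called once before sin_sketch is used.

    #include <stdint.h>
    #include <math.h>

    static const double PI = 3.141592653589793;
    static double SIN_TAB[64], COS_TAB[64];   /* sin(pi/32*i), cos(pi/32*i) */

    static void init_tables(void) {
        for (int i = 0; i < 64; i++) {
            SIN_TAB[i] = sin(PI / 32 * i);
            COS_TAB[i] = cos(PI / 32 * i);
        }
    }

    /* Equation 8 plus the angle-addition identity used by Intel's sin. */
    static double sin_sketch(double x) {
        double N = nearbyint(x * (32.0 / PI));
        double d = x - N * (PI / 32.0);
        uint64_t n = (uint64_t)(int64_t)N;          /* wraps mod 2^64 */
        int i = (int)(n & 63);                      /* index = N AND 0x3f */
        double d2 = d * d;
        double sin_d = d * (1 - d2/6 * (1 - d2/20 * (1 - d2/42 * (1 - d2/72))));
        double cos_d = 1 - d2/2 * (1 - d2/12 * (1 - d2/30 * (1 - d2/56)));
        return sin_d * COS_TAB[i] + cos_d * SIN_TAB[i];
    }

Because π/32 · 64 = 2π, reducing the index modulo 64 is exact for the table lookup, while the rounding errors of the multiplications and additions are the ones captured by the δ variables of the symbolic abstraction.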
The trates the power of our technique: We are able to automati- relative error is large on J as one of the optimization task is cally reduce a very complex problem to millions of tractable the following: subproblems, and the total time required compares favorably with the only alternative available today, which is an expert

1 using an interactive theorem prover to construct the proof. max π x∈J 2 ( 2 − x) tan(x) We present the most interesting results. Except for the interval J =  4095 , 1, the ULP error is small: 21 ULPs π 4096 For x close to 2 , the optimization objective is unbounded (Table 2). For the interval J, the absolute error is very small: so the obtained bounds on relative error are large. Because 9×10−19. This small absolute error is expected as log (x) is neither the absolute error nor the relative error provides good close to zero on this interval (log (1) = 0). However, due to bounds on ULP error, the bounds on ULP error are large the proximity to zero, even this small absolute error leads to for x ∈ J. The results for tanopt are similar (Table 2 and a large ULP error (see the remark in §3). For sin and tan, Figure 7). we are able to get good bounds on ULP error near zero by using bounds on relative error. However, the relative error 5.4 The log Implementation of log is large on the interval J as one of the optimization Intel’s log implementation uses the following procedure to tasks is the following: compute log (x) for x > 0. Let us denote the exponent of 255 x by p and f = f , f , . . . , f denotes the bits of the sig- x − 256 1 2 52 max nificand of x. The implementation first constructs a single x∈J log x precision floating-point number g = 1.f . . . f . Next, the 1 23 For x close to one, the optimization objective is unbounded result 1 obtained from a single precision reciprocal opera- g and the obtained bounds on relative error are large. There- tion is converted to a double d0. Using x = 1.f × 2p, we fore, the bounds on ULP error are large for x ∈ J. The re- have the following identity: sults for logopt are similar (Table 2 and Figure 7). log (x) ≈ log (2p) + log (g) + log (d0 × 1.f) 6. Prior Work  256  ≈ p log (2) + log + log (1 + d) , Due to their mathematical sophistication and practical im- i + 128 portance, transcendental functions are used as benchmarks where i = round (256d0 − 128) and d = d0 × 1.f − 1. The in many verification studies. However, prior studies have ei- quantity p is computed exactly by bit-level operations that ther focused on the verification of algorithms (and not im-   plementations) or the verification of comparatively simple extract the exponent of x. The quantity log 256 is com- i+128 implementations that contain no bit-level operations. puted by table lookups based on the index i. The lookup ta- In the absence of bit-level operations, a variety of tech-  256  ble stores the value log i+128 for i = 0, ··· , 128. Finally, niques can be used to bound rounding errors: GAPPA uses since d is small in magnitude, log (1 + d) is computed using interval arithmetic [6], FLUCTUAT uses affine arithmetic [7, a Taylor series of degree 7. The log implementation uses 9], MATHSAT is an SMT solver that uses interval arithmetic bit-level operations to compute p, g, and d0. for floating-point [10], and ROSA combines affine arithmetic We hand-optimize log to create an implementation with SMT solvers [5]. Interval arithmetic does not preserve logopt that uses a Taylor series of degree 4 (instead of dependencies between variables and affine arithmetic fits 7) and ignores some bit-manipulations of the significand. poorly with non-linearities. Hence, these approaches lead to 8.0E-14 20 1.6E-08 2.5E+06
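The blow-up of ULP error near a zero of the specification can be seen directly by counting the floating-point numbers between two values. The sketch below (our own utility, assuming finite inputs whose ordered difference fits in int64_t) counts them by mapping doubles onto a monotonically ordered integer scale; ulps_between(0.0, 0x1p-53) is about 4.4 × 10^18 even though the absolute difference is only the machine epsilon, matching the remark in §3.

    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    /* Map a double onto an integer scale where adjacent finite doubles
       differ by exactly 1; positive and negative ranges meet at 0. */
    static int64_t ordered(double x) {
        uint64_t u;
        memcpy(&u, &x, sizeof u);
        return (u >> 63) ? -(int64_t)(u & 0x7fffffffffffffffULL) : (int64_t)u;
    }

    /* Number of floating-point numbers between a and b. */
    static int64_t ulps_between(double a, double b) {
        int64_t d = ordered(a) - ordered(b);
        return d < 0 ? -d : d;
    }

    int main(void) {
        printf("%lld\n", (long long)ulps_between(0.0, 0x1p-53));          /* ~4.4e18 */
        printf("%lld\n", (long long)ulps_between(1.0, 1.0000000000000002)); /* 1 */
        return 0;
    }

This is why the ULP bounds for log on [4095/4096, 1] are large despite a tiny absolute error: near log(1) = 0 the floating-point numbers are extremely dense.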

This result also illustrates the power of our technique: we are able to automatically reduce a very complex problem to millions of tractable subproblems, and the total time required compares favorably with the only alternative available today, which is an expert using an interactive theorem prover to construct the proof.

We present the most interesting results. Except for the interval J = [4095/4096, 1], the ULP error is small: 21 ULPs (Table 2). For the interval J, the absolute error is very small: 9×10⁻¹⁹. This small absolute error is expected, as log(x) is close to zero on this interval (log(1) = 0). However, due to the proximity to zero, even this small absolute error leads to a large ULP error (see the remark in §3). For sin and tan, we are able to get good bounds on ULP error near zero by using bounds on relative error. However, the relative error of log is large on the interval J because one of the optimization tasks is the following:

max_{x ∈ J} (x − 255/256) / log(x)

For x close to one, the optimization objective is unbounded and the obtained bounds on relative error are large. Therefore, the bounds on ULP error are large for x ∈ J. The results for logopt are similar (Table 2 and Figure 7).
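To make the remark concrete, the following small C program (our own illustration, using the usual definition of ULP error as the absolute error divided by the ULP of the exact result) estimates how many ULPs an absolute error of 9×10⁻¹⁹ represents at the left endpoint of J. On an IEEE-754 double system it reports roughly 16 to 17 ULPs, and the count only grows as x approaches 1, because log(x) approaches 0 and its ULP shrinks further.

    #include <math.h>
    #include <stdio.h>

    /* Distance from |v| to the next larger representable double. */
    static double ulp(double v) {
        return nextafter(fabs(v), INFINITY) - fabs(v);
    }

    int main(void) {
        double ref     = log(4095.0 / 4096.0);  /* about -2.4417e-04         */
        double abs_err = 9e-19;                 /* absolute-error bound on J */

        printf("ulp(%.6e) = %.3e\n", ref, ulp(ref));
        printf("9e-19 absolute error is about %.1f ULPs here\n",
               abs_err / ulp(ref));
        return 0;
    }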

[Figure 7 plots omitted. Panels: (a) exp, Θa; (b) exp, Θu; (c) expopt, Θa; (d) expopt, Θu; (e) sin, Θa; (f) sin, Θu; (g) sinopt, Θa; (h) sinopt, Θu; (i) tan, Θa; (j) tan, Θu; (k) tanopt, Θa; (l) tanopt, Θu; (m) log, Θa; (n) log, Θu; (o) logopt, Θa; (p) logopt, Θu. Here Θa denotes the bound on absolute error and Θu the bound on ULP error.]

Figure 7. Each graph shows a bound on precision loss between an implementation and the mathematically exact result. For example, (a) plots a bound on absolute error between exp(x) and e^x as a function of x. The brown lines represent the bounds obtained from symbolic abstractions and the blue dots signify the errors observed during explicit testing on inputs in Hb.

6. Prior Work

Due to their mathematical sophistication and practical importance, transcendental functions are used as benchmarks in many verification studies. However, prior studies have either focused on the verification of algorithms (and not implementations) or on the verification of comparatively simple implementations that contain no bit-level operations.

In the absence of bit-level operations, a variety of techniques can be used to bound rounding errors: GAPPA uses interval arithmetic [6], FLUCTUAT uses affine arithmetic [7, 9], MATHSAT is an SMT solver that uses interval arithmetic for floating-point [10], and ROSA combines affine arithmetic with SMT solvers [5]. Interval arithmetic does not preserve dependencies between variables, and affine arithmetic fits poorly with non-linearities. Hence, these approaches lead to imprecise results (see the evaluation in [23]). Very recently, optimization has been used in FPTAYLOR to bound rounding errors [23].

The problem that our technique solves is slightly different from the problems that previous methods address. FPTAYLOR bounds the difference between interpretations of an expression over the reals and over the floating-point numbers. Formally speaking, given a function g : R → R, FPTAYLOR computes a bound on |fp(g)(x) − g(x)|, where fp(g) is a function (from floating-point numbers to floating-point numbers) obtained from g by replacing all real-valued operations with the corresponding floating-point operations. In contrast, our technique bounds the difference between the polynomials obtained from symbolic abstractions of binaries and the true mathematical transcendentals. Due to the relationship between fp(g) and g, fp(g) cannot be a polynomial while g is a transcendental, so FPTAYLOR and our technique solve different problems.
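As a toy illustration of this distinction (ours, not FPTAYLOR's analysis), the snippet below evaluates the same polynomial twice: once with every operation performed in single precision, standing in for fp(g), and once in double precision, standing in for the real-valued interpretation of g. The gap between the two is the rounding-error quantity that tools like FPTAYLOR bound; it is a different quantity from the gap between a polynomial and a transcendental.

    #include <stdio.h>

    int main(void) {
        /* g(x) = x*x - 1, evaluated near x = 1 where cancellation occurs. */
        double x = 1.0000001;

        float  fp_g = (float)x * (float)x - 1.0f;  /* every operation in float */
        double g    = x * x - 1.0;                 /* double as a stand-in for
                                                      the real-valued result   */

        printf("fp(g)(x) = %.10e\n", (double)fp_g);
        printf("    g(x) = %.10e\n", g);
        printf("     gap = %.3e\n", (double)fp_g - g);
        return 0;
    }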
Similarly, GAPPA, FLUCTUAT, and ROSA also aim to bound the difference between two interpretations of the same expression: over the reals and over the floating-point numbers. Moreover, they do not encode transcendentals, while our method requires encoding transcendentals.

Commercial tools such as FLUCTUAT and ASTRÉE [16] provide some support for mixed floating-point and bit-level code. These tools use abstract domains tailored to specific code patterns and reason soundly about low-level C implementations. In contrast, our approach is general and systematic.

FLUCTUAT supports subdivision of input ranges, but its subdivision strategy is generic and requires no static analysis (e.g., repeatedly halving an input interval until the desired error bound on each subdivision is reached). The paper [15] subdivides according to the exponent bits to improve precision. Our work provides a general algorithm to construct these subdivisions; the need for this automatic interval construction to be sound leads to a distinctive characteristic of our subdivisions, the presence of floating-point numbers that are not covered by any interval.

Intel's processors contain instructions (e.g., fsin, fcos, etc.) that evaluate transcendental functions. The manual for Pentium processors (Appendix G, http://www.intel.com/design/pentium/MANUALS/24143004.pdf) stated that a "rigorous mathematical analysis" had been performed and that the "worst case error is less than one ulp" for the output of these instructions. These claims were discovered to be incorrect, and the instructions actually have large errors even for small arguments (http://notabs.org/fpuaccuracy/). This incident indicates that obtaining accurate implementations of transcendental functions is non-trivial. Intel's latest manual (§8.3.1, http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html) acknowledges the mistake ("the ulp error will grow above these thresholds") and recommends that developers use the software implementations in Intel's Math Library. We are interested in the verification of these implementations. A description of implementations that are similar to the ones that we verify can be found in [13].

Harrison describes machine-checkable proofs of algorithms that compute 64-bit sin(·) [12] and 32-bit exp(·) [11]. The verified algorithms are fairly close to the ones employed by the implementations described in §5. Harrison's main result is a proof in HOL Light that the algorithms compute results within 0.57 ULPs of the mathematically exact result. The proof requires an extensive mathematical apparatus that seems necessary for such a precise bound. Harrison reports that the manual effort required for each such proof can vary from weeks to months [11].

In contrast, we have a general verification technique that can be readily applied to a variety of functions and their automatically or manually produced variants. This generality and better automation are achieved at the expense of analysis precision, and the bounds we infer, while sharp enough to be useful, are loose compared to Harrison's manual proofs. Moreover, we have evaluated our technique only on small input ranges, whereas Harrison proves bounds for large ranges.

Techniques that provide statistical (as opposed to formal) guarantees include [17, 19, 22]. Analyses described in [1, 2, 4, 14] do not have any formal or statistical guarantees.

7. Discussion

Our technique is directly applicable to trading precision for performance. Most programs do not require all bits of precision, and imprecise implementations can have better performance [21]. We believe that our verification technique significantly lowers the barrier to entry for developers who want to explore such trade-offs but are afraid of the subtleties associated with floating-point. Developers can create imprecise but efficient implementations, either manually or by using tools such as [21, 22], and prove their correctness formally using our technique. Conversely, our technique can also be used to formally prove the correctness of transformations that improve precision [20]. These transformations are currently validated via sampling.

While we believe our approach is promising, there are important limitations that would be desirable to improve or remove altogether. Ideally, we would like to achieve bounds within 1 ULP. We have taken a step in this direction by demonstrating the first technique that can prove sound bounds on the ULP error for these implementations, which, in some cases, are close to the desired result (e.g., 9 ULPs for sin). However, automatically proving that these routines are accurate to within 1 ULP requires further advances in verification techniques for floating-point that can address the imprecision introduced by the abstractions for rounding errors (see the remark in §4.3). Next, the inferred bounds are sound only for small ranges of inputs. Removing this restriction introduces new challenges: if the inputs belong to a large range, then the intermediate values could include NaNs or infinities. Moreover, the simplifications of the optimization problem described in §4.4 would no longer be sound. Orthogonally, our technique cannot handle all bit-level tricks. For example, a good approximation to the inverse square root of a floating-point number x is given by 0x5f3759df − (x >> 1), where the bits of x are treated as an integer. During verification of this approximation using our technique, the rule SPLIT of Figure 5 creates an intractable number of intervals. Therefore, this task requires a different technique (e.g., exhaustive enumeration).
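For reference, here is a C sketch of that trick in its widely known form (the magic constant comes from the published trick; the final Newton step is a common refinement that is not part of the expression quoted above). It shows why the technique struggles here: the initial guess is produced by integer arithmetic on the raw bit pattern of the input, which is exactly the kind of operation that forces SPLIT to enumerate a huge number of intervals.

    #include <math.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    /* Fast approximate 1/sqrt(x) via bit manipulation of a 32-bit float. */
    static float rsqrt_approx(float x) {
        uint32_t i;
        memcpy(&i, &x, sizeof i);             /* bits of x                   */
        i = 0x5f3759df - (i >> 1);            /* initial guess for 1/sqrt(x) */
        float y;
        memcpy(&y, &i, sizeof y);
        return y * (1.5f - 0.5f * x * y * y); /* one Newton-Raphson step     */
    }

    int main(void) {
        float x = 2.0f;
        printf("approx = %.7f, reference = %.7f\n",
               rsqrt_approx(x), 1.0 / sqrt((double)x));
        return 0;
    }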
In general, the number of intervals can grow quickly with the number of arguments on which bit-level operations are performed. However, the functions in math.h have at most three arguments, so this situation does not arise in our particular application. We expect the number of intervals to be tractable even for multivariate implementations; e.g., the dyadic function fdim requires only two divisions (§5.1). Finally, since the optimization problems are independent, parallelization can improve scalability significantly (§5.4).

8. Conclusion

We have presented the first systematic method for verifying the behavior of binaries that mix floating-point and bit-level operations. We have also demonstrated that the technique is applicable directly to binaries corresponding to Intel's highly optimized implementations of transcendental functions.

Acknowledgments

This material is based on research sponsored by the Air Force Research Laboratory, under agreement number FA8750-12-2-0020. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright notation thereon. This work was also supported by NSF grants CCF-1160904 and CCF-1409813, a Samsung Fellowship, and a Microsoft Fellowship.
References

[1] E. T. Barr, T. Vo, V. Le, and Z. Su. Automatic detection of floating-point exceptions. In POPL, pages 549–560, 2013.
[2] F. Benz, A. Hildebrandt, and S. Hack. A dynamic program analysis to find floating-point accuracy problems. In PLDI, pages 453–462, 2012.
[3] J. H. Chen, A. Choudhary, B. de Supinski, M. DeVries, E. R. Hawkes, S. Klasky, W. K. Liao, K. L. Ma, J. Mellor-Crummey, N. Podhorszki, R. Sankaran, S. Shende, and C. S. Yoo. Terascale direct numerical simulations of turbulent combustion using S3D. Computational Science and Discovery, page 015001, 2009.
[4] W.-F. Chiang, G. Gopalakrishnan, Z. Rakamaric, and A. Solovyev. Efficient search for inputs causing high floating-point errors. In PPOPP, pages 43–52, 2014.
[5] E. Darulova and V. Kuncak. Sound compilation of reals. In POPL, pages 235–248, 2014.
[6] M. Daumas and G. Melquiond. Certification of bounds on expressions involving rounded operators. ACM Transactions on Mathematical Software, 37(1):2:1–2:20, 2010.
[7] D. Delmas, E. Goubault, S. Putot, J. Souyris, K. Tekkal, and F. Védrine. Towards an industrial use of FLUCTUAT on safety-critical avionics software. In FMICS, pages 53–69, 2009.
[8] D. Goldberg. What every computer scientist should know about floating point arithmetic. ACM Computing Surveys, 23(1):5–48, 1991.
[9] E. Goubault, S. Putot, P. Baufreton, and J. Gassino. Static analysis of the accuracy in control systems: Principles and experiments. In FMICS, pages 3–20, 2007.
[10] L. Haller, A. Griggio, M. Brain, and D. Kroening. Deciding floating-point logic with systematic abstraction. In FMCAD, pages 131–140, 2012.
[11] J. Harrison. Floating-point verification in HOL Light: the exponential function. Formal Methods in System Design, 16(3):271–305, 2000.
[12] J. Harrison. Formal verification of floating point trigonometric functions. In FMCAD, pages 217–233, 2000.
[13] J. Harrison, T. Kubaska, S. Story, and P. T. P. Tang. The computation of transcendental functions on the IA-64 architecture. Intel Technology Journal, 4:234–251, 1999.
[14] K. Lakhotia, N. Tillmann, M. Harman, and J. de Halleux. FloPSy: search-based floating point constraint solving for symbolic execution. In ICTSS, pages 142–157, 2010.
[15] M. Leeser, S. Mukherjee, J. Ramachandran, and T. Wahl. Make it real: Effective floating-point reasoning via exact arithmetic. In DATE, pages 1–4, 2014.
[16] A. Miné. Abstract domains for bit-level machine integer and floating-point operations. In ATx/WInG, pages 55–70, 2012.
[17] S. Misailovic, D. M. Roy, and M. C. Rinard. Probabilistically accurate program transformations. In SAS, pages 316–333, 2011.
[18] D. Monniaux. The pitfalls of verifying floating-point computations. ACM Transactions on Programming Languages and Systems, 30(3):12:1–12:41, 2008.
[19] G. C. Necula and S. Gulwani. Randomized algorithms for program analysis and verification. In CAV, page 1, 2005.
[20] P. Panchekha, A. Sanchez-Stern, J. R. Wilcox, and Z. Tatlock. Automatically improving accuracy for floating point expressions. In PLDI, pages 1–11, 2015.
[21] C. Rubio-González, C. Nguyen, H. D. Nguyen, J. Demmel, W. Kahan, K. Sen, D. H. Bailey, C. Iancu, and D. Hough. Precimonious: Tuning assistant for floating-point precision. In SC, page 27, 2013.
[22] E. Schkufza, R. Sharma, and A. Aiken. Stochastic optimization of floating-point programs using tunable precision. In PLDI, pages 53–64, 2014.
[23] A. Solovyev, C. Jacobsen, Z. Rakamaric, and G. Gopalakrishnan. Rigorous estimation of floating-point round-off errors with symbolic Taylor expansions. In FM, pages 532–550, 2015.