THE MPFR LIBRARY: ALGORITHMS AND PROOFS

THE MPFR TEAM

Contents

1. Notations and Assumptions
2. Error calculus
2.1. Ulp calculus
2.2. Relative error analysis
2.3. Generic error of addition/subtraction
2.4. Generic error of multiplication
2.5. Generic error of inverse
2.6. Generic error of division
2.7. Generic error of square root
2.8. Generic error of the exponential
2.9. Generic error of the logarithm
2.10. Ulp calculus vs relative error
3. Low level functions
3.1. The mpfr_add function
3.2. The mpfr_cmp2 function
3.3. The mpfr_sub function
3.4. The mpfr_mul function
3.5. The mpfr_div function
3.6. The mpfr_sqrt function
3.7. The inverse square root
3.8. The mpfr_remainder and mpfr_remquo functions
4. High level functions
4.1. The cosine function
4.2. The sine function
4.3. The tangent function
4.4. The exponential function
4.5. The error function
4.6. The hyperbolic cosine function
4.7. The inverse hyperbolic cosine function
4.8. The hyperbolic sine function
4.9. The inverse hyperbolic sine function
4.10. The hyperbolic tangent function
4.11. The inverse hyperbolic tangent function
4.12. The arc-sine function
4.13. The arc-cosine function
4.14. The arc-tangent function
4.15. The Euclidean distance function
4.16. The floating multiply-add function
4.17. The expm1 function
4.18. The log1p function
4.19. The log2 or log10 function
4.20. The power function
4.21. The real cube root
4.22. The exponential integral
4.23. The gamma function
4.24. The Riemann Zeta function
4.25. The arithmetic-geometric mean
4.26. The Bessel functions
4.27. The Dilogarithm function
4.28. Summary
5. Mathematical constants
5.1. The constant π
5.2. Euler's constant
5.3. The log 2 constant
5.4. Catalan's constant
References

http://www.mpfr.org/

1. Notations and Assumptions

In the whole document, N(·) denotes rounding to nearest, Z(·) rounding toward zero, △(·) rounding toward plus infinity, ▽(·) rounding toward minus infinity, and ◦(·) any of those four rounding modes.

In the whole document, unless stated otherwise, all variables are assumed to have the same precision, usually denoted p.

2. Error calculus

Let n (the working precision) be a positive integer, considered fixed in the following. We write any nonzero real number x in the form x = m·2^e with 1/2 ≤ |m| < 1 and e := exp(x), and we define ulp(x) := 2^{exp(x)-n}.

2.1. Ulp calculus.

Rule 1. 2^{-n}|x| < ulp(x) ≤ 2^{-n+1}|x|.

Proof. Obvious from x = m·2^e with 1/2 ≤ |m| < 1. □

Rule 2. If a and b have the same precision n, and |a| ≤ |b|, then ulp(a) ≤ ulp(b).

Proof. Write a = m_a·2^{e_a} and b = m_b·2^{e_b}. Then |a| ≤ |b| implies e_a ≤ e_b, thus ulp(a) = 2^{e_a-n} ≤ 2^{e_b-n} = ulp(b). □

Rule 3. Let x be a real number, and y = ◦(x). Then |x - y| ≤ (1/2)·min(ulp(x), ulp(y)) in rounding to nearest, and |x - y| ≤ min(ulp(x), ulp(y)) for the other rounding modes.

Proof. First consider rounding to nearest. By definition, we have |x - y| ≤ (1/2)·ulp(y). If ulp(y) ≤ ulp(x), then |x - y| ≤ (1/2)·ulp(y) ≤ (1/2)·ulp(x). The only difficult case is when ulp(x) < ulp(y), but then necessarily y is a power of two; since in that case y - (1/2)·ulp(y) is exactly representable, the maximal possible difference between x and y is (1/4)·ulp(y) = (1/2)·ulp(x), which concludes the proof in the rounding to nearest case.

In rounding to zero, we always have ulp(y) ≤ ulp(x), so the rule holds. In rounding away from zero, the only difficult case is when ulp(x) < ulp(y), but then y is a power of two, and since y - (1/2)·ulp(y) is exactly representable, the maximal possible difference between x and y is (1/2)·ulp(y) = ulp(x), which concludes the proof. □
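The ulp rules above can be checked numerically. Below is a minimal Python sketch (an illustration, not MPFR code) using IEEE binary64 as the working format, where `math.ulp` agrees with the definition ulp(x) = 2^{exp(x)-n} with n = 53 for normal positive numbers.

```python
import math
from fractions import Fraction

n = 53  # precision of IEEE binary64, standing in for the working precision

def check_rule1(x):
    # Rule 1: 2^-n |x| < ulp(x) <= 2^(-n+1) |x|  (normal x only)
    u = math.ulp(x)
    return 2.0 ** -n * abs(x) < u <= 2.0 ** (-n + 1) * abs(x)

def check_rule3(exact):
    # Rule 3 (rounding to nearest): |x - y| <= 1/2 ulp(y) for y = o(x)
    y = float(exact)                      # rounds the exact value to nearest
    return abs(exact - Fraction(y)) <= Fraction(math.ulp(y)) / 2

assert check_rule1(1.0) and check_rule1(0.75) and check_rule1(1234.5678)
assert check_rule3(Fraction(1, 3)) and check_rule3(Fraction(22, 7))
```

Exact rationals (`fractions.Fraction`) make the error |x - y| computable without further rounding, which is why they are used here instead of floats.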

Rule 4. (1/2)·|a|·ulp(b) < ulp(ab) < 2·|a|·ulp(b).

Proof. Write a = m_a·2^{e_a}, b = m_b·2^{e_b}, and ab = m·2^e with 1/2 ≤ m_a, m_b, m < 1. Then (1/4)·2^{e_a+e_b} ≤ |ab| < 2^{e_a+e_b}, thus e = e_a+e_b or e = e_a+e_b-1, which implies 2^{e_a-1}·ulp(b) ≤ ulp(ab) ≤ 2^{e_a}·ulp(b) using 2^{e_b-n} = ulp(b), and the rule follows from the fact that |a| < 2^{e_a} ≤ 2|a| (equality on the right side can occur only if e = e_a+e_b and m_a = 1/2, which are incompatible). □

Rule 5. ulp(2^k·a) = 2^k·ulp(a).

Proof. Easy: if a = m_a·2^{e_a}, then 2^k·a = m_a·2^{e_a+k}. □

Rule 6. Let x > 0, ◦(·) be any rounding, and u := ◦(x); then (1/2)·u < x < 2u.

Proof. Assume x ≥ 2u; then 2u is another representable number which is closer to x than u, which leads to a contradiction. The same argument proves (1/2)·u < x. □

Rule 7. (1/2)·|a|·ulp(1) < ulp(a) ≤ |a|·ulp(1).

Proof. The left inequality comes from Rule 4 with b = 1, and the right one from |a|·ulp(1) ≥ 2^{e_a-1}·2^{1-n} = ulp(a). □

Rule 8. For any x ≠ 0 and any rounding mode ◦(·), we have ulp(x) ≤ ulp(◦(x)), and equality holds when rounding toward zero, toward -∞ for x > 0, or toward +∞ for x < 0.

Proof. Without loss of generality, assume x > 0. Let x = m·2^{e_x} with 1/2 ≤ m < 1. As (1/2)·2^{e_x} is a machine number, necessarily ◦(x) ≥ (1/2)·2^{e_x}, thus by Rule 2, ulp(◦(x)) ≥ 2^{e_x-n} = ulp(x). If we round toward zero, then ◦(x) ≤ x and by Rule 2 again, ulp(◦(x)) ≤ ulp(x). □

Rule 9. For error(u) ≤ k_u·ulp(u), we have u·c_u^- ≤ x ≤ u·c_u^+ with c_u^- = 1 - k_u·2^{1-p} and c_u^+ = 1 + k_u·2^{1-p}.

For u = ◦(x), u·c_u^- ≤ x ≤ u·c_u^+ with:
  if u = △(x), then c_u^+ = 1;
  if u = ▽(x), then c_u^- = 1;
  if x < 0 and u = Z(x), then c_u^+ = 1;
  if x > 0 and u = Z(x), then c_u^- = 1;
  else c_u^- = 1 - 2^{1-p} and c_u^+ = 1 + 2^{1-p}.

2.2. Relative error analysis. Another way to get a bound on the error is to bound the relative error. This is sometimes easier than using the "ulp calculus", especially when performing only multiplications or divisions.

Rule 10. If u := ◦(x), then we can write both:

  u = x(1 + θ_1),  x = u(1 + θ_2),

where |θ_i| ≤ 2^{-p} for rounding to nearest, and |θ_i| < 2^{1-p} for directed rounding.

Proof. This is a simple consequence of Rule 3. For rounding to nearest, we have |u - x| ≤ (1/2)·ulp(t) for t = u or t = x, hence by Rule 1 |u - x| ≤ 2^{-p}·|t|. □

Rule 11. Assume x_1, ..., x_n are n floating-point numbers in precision p, and we compute an approximation of their product with the following sequence of operations: u_1 = x_1, u_2 = ◦(u_1·x_2), ..., u_n = ◦(u_{n-1}·x_n). If rounding away from zero, the total rounding error is bounded by 2(n-1)·ulp(u_n).

Proof. We can write u_1·x_2 = u_2(1 - θ_2), ..., u_{n-1}·x_n = u_n(1 - θ_n), where 0 ≤ θ_i ≤ 2^{1-p}. We get x_1·x_2···x_n = u_n(1 - θ_2)···(1 - θ_n), which we can write u_n(1 - θ)^{n-1} for some 0 ≤ θ ≤ 2^{1-p} by the intermediate value theorem. Since 1 - nt ≤ (1 - t)^n ≤ 1, we get |x_1·x_2···x_n - u_n| ≤ (n-1)·2^{1-p}·|u_n| ≤ 2(n-1)·ulp(u_n) by Rule 1. □

2.3. Generic error of addition/subtraction. We want to compute the generic error of the subtraction; the following rules apply to addition too.
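Rule 11 can be replayed in a small simulation. The sketch below (an illustration with an arbitrarily chosen precision P, not MPFR code) implements P-bit rounding away from zero on exact rationals and checks the 2(n-1)·ulp(u_n) bound on a chained product.

```python
from fractions import Fraction

P = 24  # working precision in bits (chosen arbitrarily for the demo)

def fexp(x):
    """Exponent e such that 2^(e-1) <= |x| < 2^e."""
    x = abs(x)
    e = 0
    while Fraction(2) ** e <= x:
        e += 1
    while Fraction(2) ** (e - 1) > x:
        e -= 1
    return e

def ulp(x):
    return Fraction(2) ** (fexp(x) - P)

def round_away(x):
    """Round positive x to P bits, away from zero."""
    u = ulp(x)
    q, rem = divmod(x, u)
    return (q + (1 if rem else 0)) * u

xs = [round_away(Fraction(v)) for v in (Fraction(1, 3), Fraction(7, 5),
                                        Fraction(355, 113), Fraction(9, 7))]
u = xs[0]                      # u_1 = x_1, exact
for x in xs[1:]:
    u = round_away(u * x)      # u_i = o(u_{i-1} x_i), away from zero

exact = Fraction(1)
for x in xs:
    exact *= x

assert abs(exact - u) <= 2 * (len(xs) - 1) * ulp(u)
```

Rounding every partial product away from zero makes all the θ_i in the proof nonnegative, which is what the final assertion exercises.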

Note: error(u) ≤ k_u·ulp(u), error(v) ≤ k_v·ulp(v).

Note: ulp(w) = 2^{e_w-p}, ulp(u) = 2^{e_u-p}, ulp(v) = 2^{e_v-p} with p the precision, hence ulp(u) = 2^{d+e_w-p} and ulp(v) = 2^{d'+e_w-p}, with d = e_u - e_w and d' = e_v - e_w.

error(w) ≤ c_w·ulp(w) + k_u·ulp(u) + k_v·ulp(v)
         = (c_w + k_u·2^d + k_v·2^{d'})·ulp(w)

If (u ≥ 0 and v ≥ 0) or (u ≤ 0 and v ≤ 0):

error(w) ≤ (c_w + k_u + k_v)·ulp(w)

Note: if w = N(u + v) then c_w = 1/2, else c_w = 1.
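The addition bound above can be exercised with a toy p-bit arithmetic. The following Python sketch is illustrative only (the precision P and sample values are arbitrary); it checks both forms of the bound for positive u and v obtained by rounding exact values down (so k_u = k_v = 1).

```python
from fractions import Fraction

P = 16  # toy working precision

def fexp(x):
    x = abs(x); e = 0
    while Fraction(2) ** e <= x: e += 1
    while Fraction(2) ** (e - 1) > x: e -= 1
    return e

def ulp(x): return Fraction(2) ** (fexp(x) - P)

def rnd_down(x):                       # directed rounding toward zero (x > 0)
    u = ulp(x); return (x // u) * u

def rnd_near(x):                       # rounding to nearest (ties up)
    u = ulp(x); q = x / u
    fl = q.numerator // q.denominator
    return (fl + (1 if q - fl >= Fraction(1, 2) else 0)) * u

x, y = Fraction(355, 113), Fraction(1, 3)
u, v = rnd_down(x), rnd_down(y)        # k_u = k_v = 1
w = rnd_near(u + v)                    # c_w = 1/2
d, dp = fexp(u) - fexp(w), fexp(v) - fexp(w)
err = abs(w - (x + y))
assert err <= (Fraction(1, 2) + Fraction(2) ** d + Fraction(2) ** dp) * ulp(w)
assert err <= (Fraction(1, 2) + 1 + 1) * ulp(w)   # same-sign simplification
```

The second assertion uses the simplified bound, valid here because u and v have the same sign.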

2.4. Generic error of multiplication. We want to compute the generic error of the multiplication. We assume here that u, v > 0 are approximations of exact values x and y respectively, with |u - x| ≤ k_u·ulp(u) and |v - y| ≤ k_v·ulp(v).

w = ◦(uv)

error(w) = |w - xy|
  ≤ |w - uv| + |uv - xy|
  ≤ c_w·ulp(w) + (1/2)·[|uv - uy| + |uy - xy| + |uv - xv| + |xv - xy|]
  ≤ c_w·ulp(w) + ((u + x)/2)·k_v·ulp(v) + ((v + y)/2)·k_u·ulp(u)
  ≤ c_w·ulp(w) + (u(1 + c_u^+)/2)·k_v·ulp(v) + (v(1 + c_v^+)/2)·k_u·ulp(u)   [Rule 9]
  ≤ c_w·ulp(w) + (1 + c_u^+)·k_v·ulp(uv) + (1 + c_v^+)·k_u·ulp(uv)   [Rule 4]
  ≤ [c_w + (1 + c_u^+)·k_v + (1 + c_v^+)·k_u]·ulp(w)   [Rule 8]

Note: if w = N(uv) then c_w = 1/2, else c_w = 1.

2.5. Generic error of inverse. We want to compute the generic error of the inverse. We assume u > 0.

w = ◦(1/u)

Note: error(u) ≤ k_u·ulp(u).

error(w) = |w - 1/x|
  ≤ |w - 1/u| + |1/u - 1/x|
  ≤ c_w·ulp(w) + (1/(ux))·|u - x|
  ≤ c_w·ulp(w) + (k_u/(ux))·ulp(u)

Note: x ≥ u/c_u [Rule 6], with c_u = 1 for u = ▽(x), else c_u = 2; then 1/x ≤ c_u·(1/u).

error(w) ≤ c_w·ulp(w) + c_u·(k_u/u^2)·ulp(u)
  ≤ c_w·ulp(w) + 2·c_u·k_u·ulp(1/u)   [Rule 4]
  ≤ [c_w + 2·c_u·k_u]·ulp(w)   [Rule 8]

Note: if w = N(1/u) then c_w = 1/2, else c_w = 1.
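The final bound [c_w + 2·c_u·k_u]·ulp(w) can be checked on a concrete case. In this illustrative sketch (arbitrary precision P and sample value), u = ▽(x) so c_u = 1 and k_u = 1, and w = N(1/u) so c_w = 1/2, giving the bound (1/2 + 2)·ulp(w).

```python
from fractions import Fraction

P = 16  # toy working precision

def fexp(x):
    x = abs(x); e = 0
    while Fraction(2) ** e <= x: e += 1
    while Fraction(2) ** (e - 1) > x: e -= 1
    return e

def ulp(x): return Fraction(2) ** (fexp(x) - P)

def rnd_down(x):
    u = ulp(x); return (x // u) * u

def rnd_near(x):
    u = ulp(x); q = x / u
    fl = q.numerator // q.denominator
    return (fl + (1 if q - fl >= Fraction(1, 2) else 0)) * u

x = Fraction(355, 113)
u = rnd_down(x)          # u = rounded-down x, so k_u = 1 and c_u = 1
w = rnd_near(1 / u)      # c_w = 1/2
bound = (Fraction(1, 2) + 2 * 1 * 1) * ulp(w)
assert abs(w - 1 / x) <= bound
```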

2.6. Generic error of division. We want to compute the generic error of the division. Without loss of generality, we assume all variables are positive.

w = ◦(u/v)

Note: error(u) ≤ k_u·ulp(u), error(v) ≤ k_v·ulp(v).

error(w) = |w - x/y|
  ≤ |w - u/v| + |u/v - x/y|
  ≤ c_w·ulp(w) + (1/(vy))·|uy - vx|
  ≤ c_w·ulp(w) + (1/(vy))·[|uy - xy| + |xy - vx|]
  ≤ c_w·ulp(w) + (1/(vy))·[y·k_u·ulp(u) + x·k_v·ulp(v)]
  = c_w·ulp(w) + (k_u/v)·ulp(u) + (k_v·x/(vy))·ulp(v)

Note: (1/v)·ulp(u) ≤ 2·ulp(u/v)   [Rule 4]
      2·ulp(u/v) ≤ 2·ulp(w)   [Rule 8]

Note: x ≤ c_u·u and y ≥ v/c_v   [Rule 6]
with c_u = 1 for u = △(x), else c_u = 2,
and c_v = 1 for v = ▽(y), else c_v = 2;
then x/y ≤ c_u·c_v·(u/v).

error(w) ≤ c_w·ulp(w) + 2·k_u·ulp(w) + c_u·c_v·(k_v/v)·(u/v)·ulp(v)
  ≤ c_w·ulp(w) + 2·k_u·ulp(w) + 2·c_u·c_v·k_v·ulp(u/v)   [Rule 4]
  ≤ [c_w + 2·k_u + 2·c_u·c_v·k_v]·ulp(w)   [Rule 8]

Note: if w = N(u/v) then c_w = 1/2, else c_w = 1.

Note that we can obtain a slightly different result by writing uy - vx = (uy - uv) + (uv - vx) instead of (uy - xy) + (xy - vx).

Another result can be obtained using a relative error analysis. Assume x = u(1 + θ_u) and y = v(1 + θ_v). Then |u/v - x/y| ≤ (1/(vy))·|uy - uv| + (1/(vy))·|uv - xv| = (u/y)·(|θ_v| + |θ_u|). If v ≤ y and u/v ≤ w, this is bounded by w·(|θ_u| + |θ_v|).

2.7. Generic error of square root. We want to compute the generic error of the square root of a floating-point number u, itself an approximation to a real x, with |u - x| ≤ k_u·ulp(u). If v = ◦(√u), then:

error(v) := |v - √x| ≤ |v - √u| + |√u - √x|
  ≤ c_v·ulp(v) + (1/(√u + √x))·|u - x|
  ≤ c_v·ulp(v) + (k_u/(√u + √x))·ulp(u)

Since by Rule 9 we have u·c_u^- ≤ x, it follows that 1/(√x + √u) ≤ 1/(√u·(1 + √(c_u^-))):

error(v) ≤ c_v·ulp(v) + (k_u/(√u·(1 + √(c_u^-))))·ulp(u)
  ≤ c_v·ulp(v) + (2·k_u/(1 + √(c_u^-)))·ulp(√u)   [Rule 4]
  ≤ (c_v + 2·k_u/(1 + √(c_u^-)))·ulp(v).   [Rule 8]

If u is less than x, we have c_u^- = 1 and we get the simpler formula |v - √x| ≤ (c_v + k_u)·ulp(v).

2.8. Generic error of the exponential. We want to compute the generic error of the exponential.

v = ◦(e^u)

Note: error(u) ≤ k_u·ulp(u).

error(v) = |v - e^x|
  ≤ |v - e^u| + |e^u - e^x|
  ≤ c_v·ulp(v) + e^t·|u - x|, by the mean value theorem, for t ∈ [x, u] or t ∈ [u, x]

error(v) ≤ c_v·ulp(v) + c_u*·e^u·k_u·ulp(u)
  ≤ c_v·ulp(v) + 2·c_u*·u·k_u·ulp(e^u)   [Rule 4]
  ≤ (c_v + 2·c_u*·u·k_u)·ulp(v)   [Rule 8]
  ≤ (c_v + c_u*·2^{exp(u)+1}·k_u)·ulp(v)

Note: u = m_u·2^{e_u} and ulp(u) = 2^{e_u-p} with p the precision.

Case x ≤ u: c_u* = 1.
Case u ≤ x: x ≤ u + k_u·ulp(u), thus

  e^x ≤ e^u·e^{k_u·ulp(u)} ≤ e^u·e^{k_u·2^{e_u-p}},

and then c_u* = e^{k_u·2^{exp(u)-p}}.

2.9. Generic error of the logarithm. Assume x and u are positive values, with |u - x| ≤ k_u·ulp(u). We additionally assume u ≤ 4x. Let v = ◦(log u).

error(v) = |v - log x| ≤ |v - log u| + |log u - log x|
  ≤ c_v·ulp(v) + |log(x/u)| ≤ c_v·ulp(v) + 2·|x - u|/u
  ≤ c_v·ulp(v) + 2·k_u·ulp(u)/u ≤ c_v·ulp(v) + 2·k_u·ulp(1)   [Rule 7]
  ≤ c_v·ulp(v) + 2·k_u·2^{1-e_v}·ulp(v) ≤ (c_v + k_u·2^{2-e_v})·ulp(v).

We used at line 2 the inequality |log t| ≤ 2|t - 1|, which holds for t ≥ ρ, where ρ ≈ 0.203 satisfies log ρ = 2(ρ - 1). At line 4, e_v stands for the exponent of v, i.e., v = m·2^{e_v} with 1/2 ≤ |m| < 1.

2.10. Ulp calculus vs relative error. The error in ulp (ulp-error) and the relative error are related as follows. Let n be the working precision. Consider u = ◦(x); then the error on u is at most ulp(u) = 2^{exp(u)-n} ≤ |u|·2^{1-n}, thus the relative error is at most 2^{1-n}. Conversely, if the relative error is at most δ, then the error is at most δ·|u| ≤ δ·2^n·ulp(u). (Going from the ulp-error to the relative error and back, we lose a factor of two.)

It is sometimes more convenient to use the relative error instead of the error in ulp (ulp-error), in particular when only multiplications or divisions are made. In that case, Higham [11] proposes the following framework: we associate to each variable the cumulated number k of roundings that were made. The ith rounding introduces a relative error of δ_i, with |δ_i| ≤ 2^{1-n}, i.e., the computed result is 1 + δ_i times the exact result. Hence k successive roundings give an error factor of (1 + δ_1)(1 + δ_2)···(1 + δ_k), which is between (1 - ε)^k and (1 + ε)^k with ε = 2^{1-n}. In particular, if all roundings are away from zero, the final relative error is at most kε = k·2^{1-n}, thus at most 2k ulps.

Lemma 1. If a value is computed by k successive multiplications or divisions, each with rounding away from zero, and precision n, then the final error is bounded by 2k ulps. □
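The ulp-error/relative-error conversion of Section 2.10 is easy to observe in practice. This illustrative sketch uses binary64 (n = 53) and exact rationals to measure the relative error of a single rounding, which the text bounds by 2^{1-n}.

```python
from fractions import Fraction

n = 53  # binary64 precision

def rel_err(exact):
    """Relative error committed by one rounding to nearest binary64."""
    u = float(exact)
    return abs(exact - Fraction(u)) / abs(Fraction(u))

# one rounding has relative error at most 2^(1-n); to nearest it is even <= 2^-n
for val in (Fraction(1, 3), Fraction(22, 7), Fraction(355, 113)):
    assert rel_err(val) <= Fraction(1, 2 ** (n - 1))
```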
If the roundings are not away from zero, the following lemma is still useful [10]:

Lemma 2. Let δ_1, ..., δ_n be n real values such that |δ_i| ≤ ǫ, with nǫ < 1. Then we can write ∏_{i=1}^{n} (1 + δ_i) = 1 + θ with

  |θ| ≤ nǫ/(1 - nǫ).

Proof. The extreme values of θ are obtained when all the δ_i equal ǫ, or all equal -ǫ, thus it suffices to prove

  (1 + ǫ)^n ≤ 1 + nǫ/(1 - nǫ) = 1/(1 - nǫ)  and  (1 - ǫ)^n ≥ 1 - nǫ/(1 - nǫ) = (1 - 2nǫ)/(1 - nǫ).

For the first inequality, we have (1 + ǫ)^n = e^{n·log(1+ǫ)}, and since log(1 + x) ≤ x, it follows (1 + ǫ)^n ≤ e^{nǫ} = Σ_{k≥0} (nǫ)^k/k! ≤ Σ_{k≥0} (nǫ)^k = 1/(1 - nǫ).

For the second inequality, we first prove by induction that (1 - ǫ)^n ≥ 1 - nǫ for integer n ≥ 0. It follows (1 - ǫ)^n·(1 - nǫ) ≥ (1 - nǫ)^2 ≥ 1 - 2nǫ, which concludes the proof. □

3. Low level functions

3.1. The mpfr_add function.

mpfr_add (A, B, C, rnd)
    /* we assume B and C have the same sign, and EXP(B) >= EXP(C) */

0. d = EXP(B) - EXP(C)   /* d >= 0 by hypothesis */
1. Let B1 be the first prec(A) bits of B, and B0 the rest;
   let C1 be the bits of C corresponding to B1, and C0 the rest.
   /* B0, C1, C0 may be empty, but not B1 */

<------A ------> <------B1 ------><------B0 -----> <------C1 ------><------C0 ------>

2. A <- B1 + (C1 >> d)
3. q <- compute_carry (B0, C0, rnd)
4. A <- A + q

3.2. The mpfr_cmp2 function. This function computes the exponent shift when subtracting c > 0 from b ≥ c. In other terms, if exp(x) := ⌊log x/log 2⌋, it returns exp(b) - exp(b - c).

This function admits the following specification in terms of the binary representation of the mantissas of b and c: if b = u·1·0^n·r and c = u·0·1^n·s, where u is the longest common prefix to b and c, and (r, s) do not start with (0, 1), then mpfr_cmp2(b, c) returns |u| + n if r ≥ s, and |u| + n + 1 otherwise, where |u| is the number of bits of u.

As it is not very efficient to compare b and c bit-per-bit, we propose the following algorithm, which compares b and c word-per-word. Here b[n] represents the nth word of the mantissa of b, starting from the most significant word b[0], which has its most significant bit set. The values c[n] represent the words of c, after a possible shift if the exponent of c is smaller than that of b.

n = 0; res = 0;
while (b[n] == c[n])
  {
    n++;
    res += BITS_PER_MP_LIMB;
  }

/* now b[n] > c[n] and the first res bits coincide */

dif = b[n] - c[n];
while (dif == 1)
  {
    n++;
    dif = (dif << BITS_PER_MP_LIMB) + b[n] - c[n];
    res += BITS_PER_MP_LIMB;
  }

/* now dif > 1 */

res += BITS_PER_MP_LIMB - number_of_bits(dif);

if (!is_power_of_two(dif)) return res;

/* otherwise result is res + (low(b) < low(c)) */
do
  n++;
while (b[n] == c[n]);
return res + (b[n] < c[n]);

3.3. The mpfr_sub function. The algorithm used is as follows, where w denotes the number of bits per word. We assume that a, b and c denote different variables (if a := b or a := c, we first have to copy b or c), and that the rounding mode is either N (nearest), Z (toward zero), or ∞ (away from zero).

Algorithm mpfr_sub.
Input: b, c of same sign with b > c > 0, a rounding mode ◦ ∈ {N, Z, ∞}
Side effect: store in a the value of ◦(b - c)
Output: 0 if ◦(b - c) = b - c, 1 if ◦(b - c) > b - c, and -1 if ◦(b - c) < b - c

an ← ⌈prec(a)/w⌉, bn ← ⌈prec(b)/w⌉, cn ← ⌈prec(c)/w⌉
cancel ← mpfr_cmp2(b, c); diff_exp ← exp(b) - exp(c)
shiftb ← (-cancel) mod w; cancelb ← (cancel + shiftb)/w
if shiftb > 0 then b[0 ... bn] ← mpn_rshift(b[0 ... bn-1], shiftb); bn ← bn + 1
shiftc ← (diff_exp - cancel) mod w; cancelc ← (cancel + shiftc - diff_exp)/w
if shiftc > 0 then c[0 ... cn] ← mpn_rshift(c[0 ... cn-1], shiftc); cn ← cn + 1
exp(a) ← exp(b) - cancel; sign(a) ← sign(b)
a[0 ... an-1] ← b[bn - cancelb - an ... bn - cancelb - 1]
a[0 ... an-1] ← a[0 ... an-1] - c[cn - cancelc - an ... cn - cancelc - 1]
sh ← an·w - prec(a); r ← a[0] mod 2^sh; a[0] ← a[0] - r
if ◦ = N and sh > 0 then
    if r > 2^{sh-1} then a ← a + ulp(a); return 1
    elif 0 < r < 2^{sh-1} then return -1
elif ◦ ∈ {Z, ∞} and r > 0 then
    if ◦ = Z then return -1 else a ← a + ulp(a); return 1
bl ← bn - an - cancelb
cl ← cn - an - cancelc
for k = 0 while bl > 0 and cl > 0 do
    bl ← bl - 1; bp ← b[bl]
    cl ← cl - 1; cp ← c[cl]
    if ◦ = N and k = 0 and sh = 0 then
        if cp ≥ 2^{w-1} then return -1
        r ← bp - cp; cp ← cp + 2^{w-1}
    if bp < cp then
        if ◦ = Z then a ← a - ulp(a)
        if ◦ = ∞ then return 1 else return -1
    if bp > cp then
        if ◦ = Z then return -1 else a ← a + ulp(a); return 1
if ◦ = N and r > 0 then
    if a[0] div 2^sh is odd then a ← a + ulp(a); return 1 else return -1
Return 0.

where b[i] and c[i] are interpreted as 0 for negative i, and c[i] as 0 for i ≥ cn (cancelb ≥ 0 always holds, but cancelc may be negative).
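Algorithm mpfr_sub above relies on mpfr_cmp2 to predict the number of canceled bits. On positive integers, where exp(b) is just the bit length, the specification of mpfr_cmp2 can be checked directly; this Python sketch is an illustration, not the limb-based algorithm of Section 3.2.

```python
def exp2(x):
    """exp(x) = floor(log2 x) + 1 for a positive integer x."""
    return x.bit_length()

def cmp2(b, c):
    """Exponent shift exp(b) - exp(b - c) when subtracting c from b >= c > 0."""
    return exp2(b) - exp2(b - c)

# b = u 1 0^n r, c = u 0 1^n s with u = 110, n = 2, r = s empty:
# result is |u| + n = 5 since r >= s.
assert cmp2(0b110100, 0b110011) == 5
# u empty, b = 1 0^3, c = 0 1^3: result is 0 + 3 = 3.
assert cmp2(0b1000, 0b0111) == 3
```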

3.4. The mpfr_mul function. mpfr_mul uses two algorithms: if the precision of the operands is small enough, a plain multiplication using mpn_mul is used (there is no error, except in the final rounding); otherwise it uses mpfr_mulhigh_n. In this case, it truncates the two operands to m limbs: 1/2 ≤ b < 1 and 1/2 ≤ c < 1, b = bh + bl and c = ch + cl (B = 2^32 or 2^64). The error comes from:

• Truncation: |bl·ch + bh·cl + bl·cl| ≤ bl + cl ≤ 2·B^{-m}
• Mulders: assuming error(Mulders(n)) ≤ error(mulhigh_basecase(n)),

  error(mulhigh(n)) ≤ (n-1)(B-1)^2·B^{-n-2} + ··· + 1·(B-1)^2·B^{-2n}
    = Σ_{i=1}^{n-1} (n-i)(B-1)^2·B^{-n-1-i}
    = (B-1)^2·B^{-n-1}·(B^{1-n} - n + nB - B)/(1-B)^2
    ≤ n·B^{-n}.

Total error: ≤ (m + 2)·B^{-m}.

3.5. The mpfr_div function. The goals of the code of the mpfr_div function include the fact that the complexity should, while preserving correct rounding, depend on the precision required on the result rather than on the precision given on the operands. Let u be the dividend, v the divisor, and p the target precision for the quotient. We denote by q the real quotient u/v, with infinite precision, and by n ≥ p the working precision. The idea, as in the square root algorithm below, is to use GMP's integer division: dividing the most significant 2n or 2n - 1 bits of u by the most significant n bits of v will give a good approximation of the quotient's integer significand. The main difficulties arise when u and v have a larger precision than 2n and n respectively, since we have to truncate them. We distinguish two cases: whether the divisor is truncated or not.

3.5.1. Full divisor. This is the easy case. Write u = u1 + u0 where u0 is the truncated part, and v = v1. Without loss of generality we can assume that ulp(u1) = ulp(v1) = 1, thus u1 and v1 are integers, and 0 ≤ u0 < 1. Since v1 has n significant bits, we have 2^{n-1} ≤ v1 < 2^n. (We normalize u so that the integer quotient gives exactly n bits; this is easy by comparing the most significant bits of u and v, thus 2^{2n-2} ≤ u1 < 2^{2n}.) The integer division of u1 by v1 yields q1 and r such that u1 = q1·v1 + r, with 0 ≤ r < v1, and q1 having exactly n bits. In that case we have

  q1 ≤ q = u/v < q1 + 1.

Indeed, q = u/v = (u1 + u0)/v1 = q1 + (r + u0)/v1, and q1 ≤ q < q1 + 1 since 0 ≤ r + u0 < v1.

3.5.2. Truncated divisor. This is the hard case. Write u = u1 + u0, and v = v1 + v0, where 0 ≤ u0, v0 < 1 with the same conventions as above. We prove in that case that:

(1)  q1 - 2 < q = u/v < q1 + 1.

The upper bound holds as above. For the lower bound, we have u - (q1 - 2)v > u1 - (q1 - 2)(v1 + 1) ≥ q1·v1 - (q1 - 2)(v1 + 1) = 2(v1 + 1) - q1 ≥ 2^n + 2 - q1 > 0. This lower bound is the best possible, since q1 - 1 would be wrong; indeed, consider n = 3, v1 = 4, v0 = 7/8, u = 24: this gives q1 = 6, but u/v = 64/13 < q1 - 1 = 5.

As a consequence of Eq. (1), if the open interval (q1 - 2, q1 + 1) contains no rounding boundary for the target precision, we can deduce the correct rounding of u/v just from the value of q1. In other words, for directed rounding, the only two "bad cases" are when the last n - p bits of q1 are 00...00 or 00...01. We can even decide if rounding is correct in some further cases, since when q1 ends with 0010 the exact value cannot end with 0000, and similarly when q1 ends with 1111. Hence if n = p + k, i.e., if we use k extra bits with respect to the target precision p, the failure probability is 2^{1-k}.
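Eq. (1) and the worst case quoted above can be replayed exactly with rationals. This sketch is illustrative only; it uses the text's example n = 3, v1 = 4, v0 = 7/8, u = 24.

```python
from fractions import Fraction

def q1_bounds(u, v):
    """Integer division of the truncated operands: q1 = floor(u1 / v1) with
    u1 = floor(u), v1 = floor(v).  Eq. (1): q1 - 2 < u/v < q1 + 1."""
    q1 = int(u) // int(v)
    q = u / v
    assert Fraction(q1 - 2) < q < Fraction(q1 + 1)
    return q1, q

# Worst case from the text: q1 = 6 while u/v = 64/13 < q1 - 1.
q1, q = q1_bounds(Fraction(24), Fraction(4) + Fraction(7, 8))
assert q1 == 6 and q == Fraction(64, 13) and q < q1 - 1
```

The last assertion shows the interval in Eq. (1) cannot be narrowed to (q1 - 1, q1 + 1).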

3.5.3. Avoiding Ziv's strategy. In the failure case (q1 ending with 000...000x with directed rounding, or 100...000x with rounding to nearest), we could try again with a larger working precision. However, we would then need to perform a second division, and we are not sure this new computation would enable us to conclude. In fact, we can conclude directly. Recall that u1 = q1·v1 + r. Thus u = q1·v + (r + u0 - q1·v0). We have to decide which of the following five cases holds: (a) q1 - 2 < q < q1 - 1, (b) q = q1 - 1, (c) q1 - 1 < q < q1, (d) q = q1, (e) q1 < q < q1 + 1.

s ← q1·v0
if s < r + u0 then q ∈ (q1, q1 + 1), i.e., case (e)

Lemma 3. Let A, x > 0, and x′ = x + (x/2)(1 - Ax^2). Then

  0 ≤ A^{-1/2} - x′ = (3x^3/(2θ^4))·(A^{-1/2} - x)^2,

for some θ ∈ (x, A^{-1/2}).

We first describe the recursive iteration:

Algorithm ApproximateInverseSquareRoot.
Input: 1 ≤ a, A < 4 and 1/2 ≤ x < 1 with x - (1/2)·2^{-h} ≤ a^{-1/2} ≤ x + 2^{-h}
Output: X with X - (1/2)·2^{-n} ≤ A^{-1/2} ≤ X + 2^{-n}, where n ≤ 2h - 3

r ← x^2   [exact]
s ← A·r   [exact]
t ← 1 - s   [rounded at weight 2^{-2h} toward -∞]
u ← x·t   [exact]
X ← x + u/2   [rounded at weight 2^{-n} to nearest]

Lemma 4. If h ≥ 11 and 0 ≤ A - a < 2^{-h}, then the output X of Algorithm ApproximateInverseSquareRoot satisfies

(2)  X - (1/2)·2^{-n} ≤ A^{-1/2} ≤ X + 2^{-n}.

Proof. Firstly, a ≤ A < a + 2^{-h} yields a^{-1/2} - (1/2)·2^{-h} ≤ A^{-1/2} ≤ a^{-1/2}, thus x - 2^{-h} ≤ A^{-1/2} ≤ x + 2^{-h}.

Lemma 3 implies that the value x′ that Algorithm ApproximateInverseSquareRoot would return if there were no rounding error satisfies 0 ≤ A^{-1/2} - x′ = (3x^3/(2θ^4))·(A^{-1/2} - x)^2. Since θ lies between x and A^{-1/2}, and x - 2^{-h} ≤ A^{-1/2}, we have x ≤ θ + 2^{-h}, which yields x^3/θ^3 ≤ (1 + 2^{-h}/θ)^3 ≤ (1 + 2^{-10})^3 ≤ 1.003, since θ ≥ 1/2 and h ≥ 11. Thus 0 ≤ A^{-1/2} - x′ ≤ 3.01·2^{-2h}.

Finally the errors while rounding 1 - s and x + u/2 in the algorithm yield -(1/2)·2^{-n} ≤ x′ - X ≤ (1/2)·2^{-n} + (1/2)·2^{-2h}, thus the final inequality is:

  -(1/2)·2^{-n} ≤ A^{-1/2} - X ≤ (1/2)·2^{-n} + 3.51·2^{-2h}.

For 2h ≥ n + 3, we have 3.51·2^{-2h} ≤ (1/2)·2^{-n}, which concludes the proof. □

The initial approximation is obtained using a bipartite table for h = 11. More precisely, we split a 13-bit input a = a1 a0 . a-1 ... a-11 into three parts of 5, 4 and 4 bits respectively, say α, β, γ, and we deduce an 11-bit approximation x = 0.x-1 x-2 ... x-11 of the form T1[α, β] + T2[α, γ], where both tables have 384 entries each. Those tables satisfy:

  x + (1/4 - ε)·2^{-11} ≤ a^{-1/2} ≤ x + (1/4 + ε)·2^{-11},

with ε ≤ 1.061. Note that this does not fulfill the initial condition of Algorithm ApproximateInverseSquareRoot, since we have x - 0.811·2^{-h} ≤ a^{-1/2} ≤ x + 1.311·2^{-h}, which yields X - (1/2)·2^{-n} ≤ A^{-1/2} ≤ X + 1.21·2^{-n}, thus the right bound is not a priori fulfilled. However the only problematic case is n = 19, which gives exactly ⌈(n + 3)/2⌉ = 11, since for 12 ≤ n ≤ 18 the error terms in 2^{-2h} are halved. An exhaustive search of all possible inputs for h = 11 and n = 19 gives

  X - (1/2)·2^{-n} ≤ A^{-1/2} ≤ X + 0.998·2^{-n},

the worst case being A = 1990149, X = 269098 (scaled by 2^{19}). Thus as soon as n ≥ 2, Eq. (2) is fulfilled.

In summary, Algorithm ApproximateInverseSquareRoot provides an approximation X of A^{-1/2} with an error of at most one ulp. However if the input A was itself truncated at precision p from an input A0 (for example when the output precision p is less than the input precision), then we have |X - A^{-1/2}| ≤ ulp(X), and |A^{-1/2} - A0^{-1/2}| ≤ (1/2)·|A - A0|·A^{-3/2} ≤ (1/2)·(|A - A0|/A)·A^{-1/2} ≤ 2^{-p}·A^{-1/2} ≤ ulp(X), thus |X - A0^{-1/2}| ≤ 2·ulp(X).

3.8. The mpfr_remainder and mpfr_remquo functions. The mpfr_remainder and mpfr_remquo are useful functions for argument reduction. Given two floating-point numbers x and y, mpfr_remainder computes the correct rounding of x cmod y := x - qy, where q = ⌊x/y⌉, with ties rounded to the nearest even integer, as in the rounding to nearest mode.

Additionally, mpfr_remquo returns a value congruent to q modulo 2^n, where n is a small integer (say n ≤ 64, see the documentation), and having the same sign as q or being zero. This can be efficiently implemented by calling mpfr_remainder on x and 2^n·y. Indeed, if x = r′ cmod (2^n·y), and r′ = q′y + r with |r| ≤ y/2, then q ≡ q′ mod 2^n. No double-rounding problem can occur, since if x/(2^n·y) ∈ Z + 1/2, then r′ = ±2^{n-1}·y, thus q′ = ±2^{n-1} and r = 0.

Whatever the inputs x and y, it should be noted that if ulp(x) ≥ ulp(y), then x - qy is always exactly representable in the precision of y, unless its exponent is smaller than the minimum exponent. To see this, let ulp(y) = 2^{-k}; multiplying x and y by 2^k we get X = 2^k·x and Y = 2^k·y such that ulp(Y) = 1, and ulp(X) ≥ ulp(Y), thus both X and Y are integers. Now perform the division of X by Y, with quotient rounded to nearest: X = qY + R, with |R| ≤ Y/2. Since R is an integer, it is necessarily representable with the precision of Y, and thus of y.
The quotient q of x/y is the same as that of X/Y, and the remainder x - qy is 2^{-k}·R. We assume without loss of generality that x, y > 0, and that ulp(y) = 1, i.e., y is an integer.

Algorithm Remainder.
Input: x, y with ulp(y) = 1, a rounding mode ◦
Output: x cmod y, rounded according to ◦
1. If ulp(x) < 1, decompose x into xh + xl with ulp(xh) ≥ 1 and 0 ≤ xl < 1.
1a. r ← Remainder(xh, y)   [exact, -y/2 ≤ r ≤ y/2]
1b. if r < y/2 or xl = 0 then return ◦(r + xl)
1c. else return ◦(r + xl - y) = ◦(xl - r)
2. Write x = m·2^k with k ≥ 0
3. z ← 2^k mod y   [binary exponentiation]
4. Return ◦(mz cmod y).

Note: at step (1a) the auxiliary variable r has the precision of y; since xh and y are integers, so is r, and the result is exact by the above reasoning. At step (1c) we have r = y/2, thus r - y simplifies to -r.

4. High level functions

4.1. The cosine function. To evaluate cos x with a target precision of n bits, we use the following algorithm with working precision m, after an additive argument reduction which reduces x to the interval [-π, π], using the mpfr_remainder function:

k ← ⌊√n/2⌋
r ← x^2   [rounded up]
r ← r/2^{2k}   [exact]
s ← 1, t ← 1
for l from 1 while exp(t) ≥ -m do
    t ← t·r   [rounded up]
    t ← t/((2l - 1)(2l))   [rounded up]
    s ← s + (-1)^l·t   [rounded down]
do k times
    s ← 2s^2   [rounded up]
    s ← s - 1
return s
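The reduce-then-reconstruct scheme of the cosine algorithm can be sketched in Python, with exact rationals standing in for the working precision (so only the series truncation error remains); the values of k and the term count below are chosen ad hoc for the illustration.

```python
from fractions import Fraction
import math

def cos_via_double_angle(x, k, terms):
    """Taylor series for cos at x/2^k, then k uses of cos 2a = 2 cos^2 a - 1."""
    r = Fraction(x) ** 2 / 4 ** k       # r = (x / 2^k)^2
    s, t = Fraction(1), Fraction(1)
    for l in range(1, terms + 1):
        t = t * r / ((2 * l - 1) * (2 * l))
        s += (-1) ** l * t
    for _ in range(k):                  # undo the argument reduction
        s = 2 * s * s - 1
    return float(s)

assert abs(cos_via_double_angle(1.1, 4, 10) - math.cos(1.1)) < 1e-12
```

Each double-angle step roughly quadruples the absolute error, which is exactly the ǫ_{k+1} ≤ 4ǫ_k + 2^{-m} recurrence analyzed below.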

The error on r after r ← ◦(x^2) is at most 1 ulp(r), and remains 1 ulp(r) after r ← r/2^{2k}, since that division is just an exponent shift. By induction, the error on t after step l of the for-loop is at most 3l·ulp(t). Hence as long as 3l·ulp(t) remains less than 2^{-m} during that loop (this is possible as soon as r < 1/√2) and the loop goes to l0, the error on s after the for-loop is at most 2·l0·2^{-m} (for |r| < 1, it is easy to check that s will remain in the interval [1/2, 1[, thus ulp(s) = 2^{-m}). (An additional 2^{-m} term represents the truncation error, but for l = 1 the value of t is exact, giving (2·l0 - 1) + 1 = 2·l0.)

Denoting by ǫ_i the maximal error on s after the ith step in the do-loop, we have ǫ_0 = 2·l0·2^{-m} and ǫ_{k+1} ≤ 4ǫ_k + 2^{-m}, giving ǫ_k ≤ (2·l0 + 1/3)·2^{2k-m}.

4.2. The sine function. The sine function is computed from the cosine, with a working precision of m bits, after an additive argument reduction in [-π, π]:

c ← cos x   [rounded away]
t ← c^2   [rounded away]
u ← 1 - t   [rounded to zero]
s ← sign(x)·√u   [rounded to zero]

This algorithm ensures that the approximation s is between zero and sin x.

Since all variables are in [-1, 1], where ulp(·) ≤ 2^{-m}, all absolute errors are less than 2^{-m}. We denote by ǫ_i a generic error with 0 ≤ ǫ_i < 2^{-m}. We have c = cos x + ǫ_1; t = c^2 + ǫ_2 = cos^2 x + 4ǫ_3; u = 1 - t - ǫ_4 = 1 - cos^2 x - 5ǫ_5; |s| = √u - ǫ_6 ≥ |sin x| - 5ǫ_5/(2|s|) - ǫ_6 (by the mean value theorem, |√u - √u′| ≤ |u - u′|/(2√v) for some v ∈ [u, u′]; we apply it here with u = 1 - cos^2 x - 5ǫ_5 and u′ = 1 - cos^2 x).

Therefore, if 2^{e-1} ≤ |s| < 2^e, the absolute error on s is bounded by 2^{-m}·((5/2)·2^{1-e} + 1) ≤ 2^{3-m-e}.

4.2.1. An asymptotically fast algorithm for sin and cos. We extend here the algorithm proposed by Brent for the exponential function to the simultaneous computation of sin and cos. The idea is the following. We first reduce the input x to the range 0 < x < 1/2.
Then we decompose x as follows:

  x = Σ_{i=1}^{k} r_i/2^{2^i},

where r_i is an integer, 0 ≤ r_i < 2^{2^{i-1}}.

We define x_j = Σ_{i=j}^{k} r_i/2^{2^i}; then x = x_1, and we can write x_j = r_j/2^{2^j} + x_{j+1}. Thus with S_j := sin(r_j/2^{2^j}) and C_j := cos(r_j/2^{2^j}):

  sin x_j = S_j·cos x_{j+1} + C_j·sin x_{j+1},  cos x_j = C_j·cos x_{j+1} - S_j·sin x_{j+1}.

The 2k values S_j and C_j can be computed by a binary splitting algorithm, each one in O(M(n) log n). Then each pair (sin x_j, cos x_j) can be computed from (sin x_{j+1}, cos x_{j+1}) with four multiplications and two additions or subtractions.

Error analysis. We use here Higham's method. We assume that the values of S_j and C_j are approximated up to a multiplicative factor of the form (1 + u)^3, where |u| ≤ 2^{-p}, p ≥ 4 being the working precision. We also assume that cos x_{j+1} and sin x_{j+1} are approximated with a factor of the form (1 + u)^{k_j}. With rounding to nearest, the values of S_j·cos x_{j+1}, C_j·sin x_{j+1}, C_j·cos x_{j+1} and S_j·sin x_{j+1} are thus approximated with a factor (1 + u)^{k_j+4}. The value of sin x_j is approximated with a factor (1 + u)^{k_j+5}, since there all terms are nonnegative.

We now analyze the effect of the cancellation in C_j·cos x_{j+1} - S_j·sin x_{j+1}. We have r_j/2^{2^j} < 2^{-2^{j-1}}, and for simplicity we define l := 2^{j-1}; thus 0 ≤ S_j ≤ 2^{-l}, and 1 - 2^{-2l-1} ≤ C_j ≤ 1. Similarly we have x_{j+1} < 2^{-2l}, thus 0 ≤ sin x_{j+1} ≤ 2^{-2l}, and 1 - 2^{-4l-1} ≤ cos x_{j+1} ≤ 1. The error is multiplied by a maximal ratio of

  (C_j·cos x_{j+1} + S_j·sin x_{j+1})/(C_j·cos x_{j+1} - S_j·sin x_{j+1}) ≤ (1 + 2^{-l}·2^{-2l})/((1 - 2^{-2l-1})(1 - 2^{-4l-1}) - 2^{-l}·2^{-2l}),

which we can bound by

  (1 + 2^{-3l})/(1 - 2^{-2l}) ≤ 1/((1 - 2^{-2l})(1 - 2^{-3l})) ≤ 1/(1 - 2^{-2l+1}).

The product of all those factors for j ≥ 1 is bounded by 3 (remember l := 2^{j-1}).

In summary, the maximal error is of the form 3·[(1 + u)^{5k} - 1], where 2^{2^{k-1}} < p ≤ 2^{2^k}. For p ≥ 4, 5k·2^{-p} is bounded by 5/16, and (1 + 2^{-p})^{5k} - 1 ≤ e^{5k·2^{-p}} - 1 ≤ (6/5)·5k·2^{-p} = 6k·2^{-p}.
Thus the final relative error bound is 18k·2^{-p}. Since k ≤ 6 for p ≤ 2^{64}, this gives a uniform relative error bound of 2^{-p+7}.

4.3. The tangent function. The tangent function is computed from the mpfr_sin_cos function, which computes simultaneously sin x and cos x with a working precision of m bits:

s, c ← ◦(sin x), ◦(cos x)   [to nearest]
t ← ◦(s/c)   [to nearest]

We have s = sin(x)·(1 + θ_1) and c = cos(x)·(1 + θ_2) with |θ_1|, |θ_2| ≤ 2^{-m}, thus t = (tan x)·(1 + θ)^3 with |θ| ≤ 2^{-m}. For m ≥ 2, |θ| ≤ 1/4 and |(1 + θ)^3 - 1| ≤ 4|θ|, thus we can write t = (tan x)·(1 + 4θ′) with |θ′| ≤ 2^{-m}, thus |t - tan x| ≤ 4·ulp(t).

4.4. The exponential function. The mpfr_exp function implements three different algorithms. For very large precision, it uses an O(M(n)·log^2 n) algorithm based on binary splitting (see [13]). This algorithm is used only for precision greater than, for example, 10000 bits on an Athlon. For smaller precisions, it uses Brent's method: if r = (x - n·log 2)/2^k, where n is chosen such that 0 ≤ x - n·log 2 < log 2, then

  exp(x) = 2^n·exp(r)^{2^k},

and exp(r) is computed using the Taylor expansion:

  exp(r) = 1 + r + r^2/2! + r^3/3! + ···

As r < 2^{-k}, if the target precision is n bits, then only about l = n/k terms of the Taylor expansion are needed. This method thus requires the evaluation of the Taylor series to order n/k, and k squarings to compute exp(r)^{2^k}. If the Taylor series is evaluated in a naive way, the optimal value of k is about n^{1/2}, giving a complexity of O(n^{1/2}·M(n)). This is what is implemented in mpfr_exp2_aux. If we use a baby-step/giant-step approach, the Taylor series can be evaluated in O(l^{1/2}) nonscalar multiplications (i.e., with both operands of full n-bit size) as described in [15]; thus the evaluation requires (n/k)^{1/2} + k multiplications, and the optimal k is now about n^{1/3}, giving a total complexity of O(n^{1/3}·M(n)). This is implemented in the function mpfr_exp2_aux2. (Note: the algorithm from Paterson and Stockmeyer was rediscovered by Smith, who named it "concurrent series" in [20].)

4.5. The error function. Let n be the target precision, and x be the input value. For |x| ≥ √(n·log 2), we have erf x = 1 or 1 - 2^{-n} according to the rounding mode. Otherwise we use the Taylor expansion.

4.5.1. Taylor expansion.
erf z = (2/√π) Σ_{k=0}^∞ ((-1)^k / (k!(2k+1))) z^{2k+1}

Algorithm erf_0(z, n). Assumes z^2 ≤ n/e; the working precision is m.

y ← ◦(z^2) [rounded up]
s ← 1
t ← 1
for k from 1 do
    t ← ◦(y t) [rounded up]
    t ← ◦(t/k) [rounded up]
    u ← ◦(t/(2k+1)) [rounded up]
    s ← ◦(s + (-1)^k u) [nearest]
    if exp(u) < exp(s) - m and k ≥ z^2 then break
r ← 2 ◦(z s) [rounded up]
p ← ◦(π) [rounded down]
p ← ◦(√p) [rounded down]
r ← ◦(r/p) [nearest]

Let ε_k be the ulp-error on t (denoted t_k) after the loop with index k. According to Lemma 1, since t_k is computed after 2k roundings (t_0 = 1 is exact), we have ε_k ≤ 4k. The error on u at loop k is thus at most 1 + 2ε_k ≤ 1 + 8k.

Let σ_k and ν_k be the exponent shifts between the new value of s at step k and respectively the old value of s, and u. Writing s_k and u_k for the values of s and u at the end of step k, we have σ_k := exp(s_{k-1}) - exp(s_k) and ν_k := exp(u_k) - exp(s_k). The ulp-error τ_k on s_k satisfies

τ_k ≤ 1/2 + τ_{k-1} 2^{σ_k} + (1 + 8k) 2^{ν_k}.

The halting condition k ≥ z^2 ensures that u_j ≤ u_{j-1} for j ≥ k, thus the series Σ_{j=k}^∞ u_j is an alternating series, and the truncated part is bounded by its first term |u_k| < ulp(s_k). So the ulp-error between s_k and Σ_{k=0}^∞ ((-1)^k z^{2k})/(k!(2k+1)) is bounded by 1 + τ_k.

Now the error after r ← 2 ◦(z s) is bounded by 1 + 2(1 + τ_k) = 2τ_k + 3. That on p after p ← ◦(π) is 1 ulp, and after p ← ◦(√p) we get 2 ulps (since p ← ◦(π) was rounded down). The final error on r is thus at most 1 + 2(2τ_k + 3) + 4 = 4τ_k + 11 (since r is rounded up and p is rounded down).

4.5.2. Very large arguments. Since erfc x ≤ 1/(√π x e^{x^2}), we have for x^2 ≥ n log 2 (which implies x ≥ 1) that erfc x ≤ 2^{-n}, thus erf x = 1 or erf x = 1 - 2^{-n} according to the rounding mode. More precisely, [1, formulæ 7.1.23 and 7.1.24] gives:

√π x e^{x^2} erfc x ≈ 1 + Σ_{k=1}^n (-1)^k (1 × 3 × ⋯ × (2k-1))/(2x^2)^k,

with the error bounded in absolute value by the next term and of the same sign.
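As a quick cross-check of the series and of the loop structure of erf_0, the computation can be sketched in Python with exact rational arithmetic standing in for the directed roundings (erf_taylor is a hypothetical helper; only the final conversion to double rounds, and the halting test compares u against 2^{-n}|s| instead of comparing exponents):

```python
from fractions import Fraction
import math

def erf_taylor(z, n):
    # erf z = (2/sqrt(pi)) * sum_{k>=0} (-1)^k z^(2k+1) / (k!(2k+1)),
    # following the loop of erf_0 but with exact rationals (no rounding).
    y = Fraction(z) ** 2            # y = z^2
    s = Fraction(1)                 # partial sum of z^(2k) / (k!(2k+1))
    t = Fraction(1)                 # t = z^(2k) / k!
    k = 0
    while True:
        k += 1
        t = t * y / k
        u = t / (2 * k + 1)
        s = s + u if k % 2 == 0 else s - u
        # analogue of: exp(u) < exp(s) - m  and  k >= z^2
        if u < abs(s) * Fraction(1, 2 ** n) and k >= y:
            break
    r = 2 * Fraction(z) * s         # multiply by z, then by 2/sqrt(pi)
    return float(r) / math.sqrt(math.pi)
```

The alternating-series argument above guarantees that once k ≥ z^2 the neglected tail is smaller than the last term u.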

4.6. The hyperbolic cosine function. The mpfr_cosh function implements the hyperbolic cosine as:

cosh x = (1/2)(e^x + 1/e^x).

The algorithm used for the calculation of the hyperbolic cosine is as follows¹:

u ← ◦(e^x)
v ← ◦(u^{-1})
w ← ◦(u + v)
s ← (1/2) w
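The four steps can be sketched in double precision (an assumption: round-to-nearest doubles stand in for MPFR's mixture of directed roundings, so the ulp bounds derived below do not transfer literally):

```python
import math

def cosh_steps(x):
    # u, v, w, s: the four steps of the cosh algorithm above.
    u = math.exp(x)      # u ~ e^x
    v = 1.0 / u          # v ~ e^(-x)
    w = u + v            # w ~ e^x + e^(-x)
    s = 0.5 * w          # halving is exact in binary floating point
    return s
```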

Now, we have to bound the rounding error for each step of this algorithm. First, by the parity of the hyperbolic cosine (cosh(-x) = cosh(x)), the problem is reduced to computing cosh x with x ≥ 0. We can then deduce e^x ≥ 1 and 0 ≤ e^{-x} ≤ 1.

¹ ◦() represents the rounding, and error(u) the error associated with the calculation of u.

error(u), with u ← ▽(e^x):
    |u - e^x| ≤ ulp(u)    (⋆)
    We also need u ≤ e^x; this holds with a rounding of u toward -∞ (•).

error(v), with v ← △(u^{-1}):
    |v - e^{-x}| ≤ |v - u^{-1}| + |u^{-1} - e^{-x}|
                ≤ ulp(v) + (1/(u e^x)) |u - e^x|
                ≤ ulp(v) + (1/u^2) ulp(u)    (⋆)
                ≤ ulp(v) + 2 ulp(1/u)        [Rule 4: a · ulp(b) ≤ 2 · ulp(a · b), with a = 1/u^2, b = u]    (⋆⋆)
                ≤ 3 ulp(v)                   [if ulp(1/u) ≤ ulp(v); this holds with a rounding of v toward +∞ (••)]    (⋆ ⋆ ⋆)

error(w), with w ← ◦(u + v):
    |w - (e^x + e^{-x})| ≤ |w - (u + v)| + |u - e^x| + |v - e^{-x}|
                        ≤ ulp(w) + ulp(u) + 3 ulp(v)    (⋆)
                        ≤ ulp(w) + 4 ulp(u)             [with v ≤ 1 ≤ u, then ulp(v) ≤ ulp(u)]
                        ≤ 5 ulp(w)                      [with u ≤ w, then ulp(u) ≤ ulp(w)]    (⋆⋆)

error(s), with s ← ◦(w/2): the division by 2 is exact, so error(s) = error(w) ≤ 5 ulp(s).

This shows that the rounding error on the calculation of cosh x can be bounded by 5 ulp on the result. So, to compute the size of the intermediate variables, we have to add at least ⌈log_2 5⌉ = 3 bits to the wanted precision.

4.7. The inverse hyperbolic cosine function. The mpfr_acosh function implements the inverse hyperbolic cosine. For x < 1, it returns NaN; for x = 1, acosh x = 0; for x > 1, the formula acosh x = log(√(x^2 - 1) + x) is implemented using the following algorithm:

q ← ◦(x^2) [down]
r ← ◦(q - 1) [down]
s ← ◦(√r) [nearest]
t ← ◦(s + x) [nearest]
u ← ◦(log t) [nearest]

Let us first assume that r ≠ 0. The error on q is at most 1 ulp(q), thus that on r is at most ulp(r) + ulp(q) = (1 + E) ulp(r) with d = exp(q) - exp(r) and E = 2^d. Since r is smaller than x^2 - 1, we can use the simpler formula for the error on the square root, which gives a bound (3/2 + E) ulp(s) for the error on s, and (2 + E) ulp(t) for that on t. This gives a final bound of (1/2 + (2 + E) 2^{2-exp(u)}) ulp(u) for the error on u (§2.9). We have 2 + E ≤ 2^{1+max(1,d)}. Thus the rounding error on the calculation of acosh x can be bounded by (1/2 + 2^{3+max(1,d)-exp(u)}) ulp(u).

If we obtain r = 0, which means that x is near 1, we need another algorithm. One has x = 1 + z, with 0 < z < 2^{-p}, where p is the intermediate precision (which may be smaller than the precision of x). The formula can be rewritten: acosh x = log(1 + √(z(2 + z)) + z) = √(2z)(1 - ε(z)) where 0 < ε(z) < z/12. We use the following algorithm:

q ← ◦(x - 1) [down]
r ← 2q
s ← ◦(√r) [nearest]

The error on q is at most 1 ulp(q), thus the error on r is at most 1 ulp(r). Since r is smaller than 2z, we can use the simpler formula for the error on the square root, which gives a bound (3/2) ulp(s) for the error on s. The error on acosh x is bounded by the sum of the error bound on √(2z) and ε(z)√(2z) < (2^{-p}/12) 2^{1+exp(s)} = (1/6) ulp(s).
Thus the rounding error on the calculation of acosh x can be bounded by (3/2 + 1/6) ulp(s) < 2 ulp(s).
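The two branches of mpfr_acosh can be sketched in double precision (acosh_sketch is a hypothetical helper; with doubles the r = 0 branch is only reached at x = 1 exactly, but the control flow mirrors the algorithm above):

```python
import math

def acosh_sketch(x):
    # Generic branch: acosh x = log(sqrt(x^2 - 1) + x).
    q = x * x
    r = q - 1.0
    if r != 0.0:
        return math.log(math.sqrt(r) + x)
    # Near-1 branch: with x = 1 + z, acosh x = sqrt(2z)(1 - eps(z)),
    # where 0 < eps(z) < z/12.
    z = x - 1.0
    return math.sqrt(2.0 * z)
```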

4.8. The hyperbolic sine function. The mpfr_sinh function implements the hyperbolic sine as:

sinh x = (1/2)(e^x - 1/e^x).

The algorithm used for the calculation of the hyperbolic sine is as follows²:

u ← ◦(e^x)
v ← ◦(u^{-1})
w ← ◦(u - v)
s ← (1/2) w

Now, we have to bound the rounding error for each step of this algorithm. First, by the parity of the hyperbolic sine (sinh(-x) = -sinh(x)), the problem is reduced to computing sinh x with x ≥ 0. We can deduce e^x ≥ 1 and 0 ≤ e^{-x} ≤ 1.

² ◦() represents the rounding, and error(u) the error associated with the calculation of u.

error(u), with u ← ▽(e^x):
    |u - e^x| ≤ ulp(u)    (⋆)
    We also need u ≤ e^x; this holds with u = ▽(e^x) (•).

error(v), with v ← △(u^{-1}):
    |v - e^{-x}| ≤ |v - u^{-1}| + |u^{-1} - e^{-x}|
                ≤ ulp(v) + (1/(u e^x)) |u - e^x|
                ≤ ulp(v) + (1/u^2) ulp(u)    (⋆)
                ≤ ulp(v) + 2 ulp(1/u)        [Rule 4: a · ulp(b) ≤ 2 · ulp(a · b), with a = 1/u^2, b = u]    (⋆⋆)
                ≤ 3 ulp(v)                   [if ulp(1/u) ≤ ulp(v); this holds with v = △(u^{-1}) (••)]    (⋆ ⋆ ⋆)

error(w), with w ← ◦(u - v):
    |w - (e^x - e^{-x})| ≤ |w - (u - v)| + |u - e^x| + |v - e^{-x}|
                        ≤ ulp(w) + ulp(u) + 3 ulp(v)    (⋆)
                        ≤ ulp(w) + 4 ulp(u)             [with v ≤ 1 ≤ u, then ulp(v) ≤ ulp(u)]
                        ≤ (1 + 4 · 2^{exp(u)-exp(w)}) ulp(w)    [see subsection 2.3]    (⋆⋆)

error(s), with s ← ◦(w/2): the division by 2 is exact, so error(s) = error(w) ≤ (1 + 4 · 2^{exp(u)-exp(w)}) ulp(w).

This shows that the rounding error on the calculation of sinh x can be bounded by (1 + 4 · 2^{exp(u)-exp(w)}) ulp(w); the number of bits we need to add to the wanted accuracy to define the intermediate variables is thus:

N_t = ⌈log_2(1 + 4 · 2^{exp(u)-exp(w)})⌉
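The guard-bit count N_t can be estimated in double precision (sinh_extra_bits is a hypothetical helper using Python's frexp exponent as exp(·)); for small x the subtraction u - v cancels, exp(u) - exp(w) grows, and many guard bits are required:

```python
import math

def sinh_extra_bits(x):
    # N_t = ceil(log2(1 + 4 * 2^(exp(u) - exp(w)))) for x > 0.
    u = math.exp(x)
    v = 1.0 / u
    w = u - v
    exp_u = math.frexp(u)[1]     # binary exponent of u
    exp_w = math.frexp(w)[1]     # binary exponent of w
    return math.ceil(math.log2(1 + 4 * 2.0 ** (exp_u - exp_w)))
```

For x around 1 this gives only a few guard bits, while for x = 2^{-10} the cancellation costs about a dozen bits.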

4.9. The inverse hyperbolic sine function. The mpfr_asinh function implements the inverse hyperbolic sine as:

asinh x = log(√(x^2 + 1) + x).

The algorithm used for the calculation of the inverse hyperbolic sine is as follows:

s ← ◦(x^2)
t ← ◦(s + 1)
u ← ◦(√t)
v ← ◦(u + x)
w ← ◦(log v)

Now, we have to bound the rounding error for each step of this algorithm. First, by the parity of the inverse hyperbolic sine (asinh(-x) = -asinh(x)), the problem is reduced to computing asinh x with x ≥ 0.

error(s), with s ← ◦(x^2): |s - x^2| ≤ ulp(s)    (⋆)

error(t), with t ← ▽(s + 1): |t - (x^2 + 1)| ≤ 2 ulp(t)    (⋆) [see subsection 2.3] (•)

error(u), with u ← ◦(√t): |u - √(x^2 + 1)| ≤ 3 ulp(u)    (⋆) [see subsection 2.7, with (•)]

error(v), with v ← ◦(u + x): |v - (√(x^2 + 1) + x)| ≤ 5 ulp(v)    (⋆) [see subsection 2.3]

error(w), with w ← ◦(log v): |w - log(√(x^2 + 1) + x)| ≤ (1 + 5 · 2^{2-exp(w)}) ulp(w)    (⋆) [see subsection 2.9]

This shows that the rounding error on the calculation of asinh x can be bounded by (1 + 5 · 2^{2-exp(w)}) ulp on the result. So, to compute the size of the intermediate variables, we have to add at least ⌈log_2(1 + 5 · 2^{2-exp(w)})⌉ bits to the wanted precision.

4.10. The hyperbolic tangent function. The hyperbolic tangent (mpfr_tanh) is computed from the exponential:

tanh x = (e^{2x} - 1)/(e^{2x} + 1).

The algorithm used is as follows, with working precision p and rounding to nearest:

u ← ◦(2x)
v ← ◦(e^u)
w ← ◦(v + 1)
r ← ◦(v - 1)
s ← ◦(r/w)

Now, we have to bound the rounding error for each step of this algorithm. First, thanks to the parity of the hyperbolic tangent (tanh(-x) = -tanh(x)), we can consider without loss of generality that x ≥ 0.

We use Higham's notation, with θ_i denoting variables such that |θ_i| ≤ 2^{-p}. Firstly, u is exact. Then v = e^{2x}(1 + θ_1) and w = (e^{2x} + 1)(1 + θ_2)^2. The error on r is bounded by (1/2) ulp(v) + (1/2) ulp(r). Assume ulp(v) = 2^e ulp(r), with e ≥ 0; then the error on r is bounded by (1/2)(2^e + 1) ulp(r). We can thus write r = (e^{2x} - 1)(1 + θ_3)^{2^e+1}, and then s = tanh(x)(1 + θ_4)^{2^e+4}.

Lemma 5. For |x| ≤ 1/2, and |y| ≤ |x|^{-1/2}, we have:

|(1 + x)^y - 1| ≤ 2 |y| |x|.

Proof. We have (1 + x)^y = e^{y log(1+x)}, with |y log(1 + x)| ≤ |x|^{-1/2} |log(1 + x)|. The function |x|^{-1/2} log(1 + x) is increasing on [-1/2, 1/2], and takes the values ≈ -0.490 in x = -1/2 and ≈ 0.286 in x = 1/2, thus is bounded in absolute value by 1/2. This yields |y log(1 + x)| ≤ 1/2. Now it is easy to see that for |t| ≤ 1/2, we have |e^t - 1| ≤ 1.3 |t|. Thus |(1 + x)^y - 1| ≤ 1.3 |y| |log(1 + x)|. The result follows from |log(1 + x)| ≤ 1.4 |x| for |x| ≤ 1/2, and 1.3 × 1.4 ≤ 2. □
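As a quick numeric sanity check of the lemma (a spot check at sample points, not a proof; lemma5_holds is a hypothetical helper in double precision):

```python
def lemma5_holds(x, y):
    # Check |(1+x)^y - 1| <= 2|y||x| under |x| <= 1/2, |y| <= |x|^(-1/2).
    assert abs(x) <= 0.5 and abs(y) <= abs(x) ** -0.5
    return abs((1.0 + x) ** y - 1.0) <= 2.0 * abs(y) * abs(x)
```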

Applying the above lemma for x = θ_4 and y = 2^e + 4, assuming 2^e + 4 ≤ 2^{p/2}, we get s = tanh(x)[1 + 2(2^e + 4)θ_5]. Since 2^e + 4 ≤ 2^{max(3,e+1)}, the relative error on s is thus bounded by 2^{max(4,e+2)-p}.
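The five steps can be sketched in double precision (round-to-nearest doubles standing in for p-bit arithmetic):

```python
import math

def tanh_steps(x):
    # u, v, w, r, s: the five steps of the tanh algorithm above.
    u = 2.0 * x          # exact: multiplication by 2
    v = math.exp(u)
    w = v + 1.0
    r = v - 1.0          # the cancellation here drives the 2^e factor
    return r / w
```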

4.11. The inverse hyperbolic tangent function. The mpfr_atanh function implements the inverse hyperbolic tangent as:

atanh x = (1/2) log((1 + x)/(1 - x)).

The algorithm used for the calculation of the inverse hyperbolic tangent is as follows:

s ← ◦(1 + x)
t ← ◦(1 - x)
u ← ◦(s/t)
v ← ◦(log u)
w ← ◦(v/2)

Now, we have to bound the rounding error for each step of this algorithm. First, by the parity of the inverse hyperbolic tangent (atanh(-x) = -atanh(x)), the problem is reduced to computing atanh x with x ≥ 0.

error(s), with s ← △(1 + x): |s - (1 + x)| ≤ 2 ulp(s)    (⋆) [see subsection 2.3] (•)

error(t), with t ← ▽(1 - x): |t - (1 - x)| ≤ (1 + 2^{exp(x)-exp(t)}) ulp(t)    (⋆) [see subsection 2.3] (••)

error(u), with u ← ◦(s/t):
    |u - (1 + x)/(1 - x)| ≤ (1 + 2 × 2 + 2 (1 + 2^{exp(x)-exp(t)})) ulp(u)    (⋆) [see subsection 2.5, with (•) and (••)]
                         ≤ (7 + 2^{exp(x)-exp(t)+1}) ulp(u)

error(v), with v ← ◦(log u):
    |v - log((1 + x)/(1 - x))| ≤ (1 + (7 + 2^{exp(x)-exp(t)+1}) · 2^{2-exp(v)}) ulp(v)    (⋆) [see subsection 2.9]
                              ≤ (1 + 7 × 2^{2-exp(v)} + 2^{exp(x)-exp(t)-exp(v)+3}) ulp(v)

error(w), with w ← ◦(v/2) [exact]:
    |w - (1/2) log((1 + x)/(1 - x))| ≤ (1 + 7 × 2^{2-exp(v)} + 2^{exp(x)-exp(t)-exp(v)+3}) ulp(w)    (⋆)

This shows that the rounding error on the calculation of atanh x can be bounded by (1 + 7 × 2^{2-exp(v)} + 2^{exp(x)-exp(t)-exp(v)+3}) ulp on the result. So, to compute the size of the intermediate variables, we have to add at least ⌈log_2(1 + 7 × 2^{2-exp(v)} + 2^{exp(x)-exp(t)-exp(v)+3})⌉ bits to the wanted precision.

4.12. The arc-sine function.
(1) We use the formula arcsin x = arctan(x/√(1 - x^2)).
(2) When x is near 1 we will experience accuracy problems:
(3) If x = a(1 + δ), with δ being the relative error, then we will have

1 - x = 1 - a - aδ = (1 - a)[1 - (a/(1 - a)) δ].

So when using the arc-tangent programs we need to take into account that decrease in precision.

4.13. The arc-cosine function.
(1) Obviously, we use the formula

arccos x = π/2 - arcsin x.

(2) The problem with arccos is that it is 0 at 1, so we have a cancellation problem to treat at 1.
(3) (Suppose x ≥ 0; this is where the problem happens.) The derivative of arccos is -1/√(1 - x^2), and we have

1/(2√(1 - x)) ≤ |-1/√(1 - x^2)| = 1/√((1 - x)(1 + x)) ≤ 1/√(1 - x).

So, integrating the above inequalities on [x, 1], we get

√(1 - x) ≤ arccos x ≤ 2√(1 - x).

(4) The important part is the lower bound, which gives us an upper bound on the cancellation that will occur: the terms that are canceled are π/2 and arcsin x, and their order is 2. The number of canceled bits is so 1 - (1/2) MPFR_EXP(1 - x).
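The integrated inequality can be spot-checked numerically (a double-precision check, not part of the argument):

```python
import math

def arccos_bounds_hold(x):
    # Check sqrt(1-x) <= arccos(x) <= 2*sqrt(1-x) for 0 <= x <= 1.
    a = math.acos(x)
    lo = math.sqrt(1.0 - x)
    return lo <= a <= 2.0 * lo
```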

4.14. The arc-tangent function. The arc-tangent function admits the following argument reduction:

arctan x = 2 arctan(x/(1 + √(1 + x^2))).

If applied once, it reduces the argument to |x| < 1; then each successive application reduces x by a factor of at least 2.
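The reduction can be sketched in double precision: apply it k times, evaluate the arc-tangent of the small residual (here the library atan stands in for the series evaluation), and undo the k halvings by multiplying by 2^k:

```python
import math

def arctan_reduced(x, k):
    # arctan x = 2^k * arctan(t_k), where t_{j+1} = t_j/(1 + sqrt(1 + t_j^2)).
    t = x
    for _ in range(k):
        t = t / (1.0 + math.sqrt(1.0 + t * t))
    return (2.0 ** k) * math.atan(t)
```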

4.14.1. Binary splitting. The Taylor series for arctan is suitable for evaluation by binary splitting. This method is detailed for example in "Pi and The AGM", p. 334. It is efficient for rational numbers but not for non-rational numbers, so the efficiency of this method is quite limited. One can then wonder how to use it for non-rational numbers.

Using the formulas

arctan(-x) = -arctan x   and   arctan x + arctan(1/x) = sign(x) π/2,

we can restrict ourselves to 0 ≤ x ≤ 1. Writing

x = Σ_{i=1}^∞ u_i/2^i   with u_i ∈ {0, 1},

or

x = Σ_{i=1}^∞ u_i/2^{2^i}   with u_i ∈ {0, 1, ..., 2^{2^{i-1}}} if i > 1 and u_1 ∈ {0, 1},

we can compute cos, sin or exp using the formulas

cos(a + b) = cos a cos b - sin a sin b
sin(a + b) = sin a cos b + cos a sin b
exp(a + b) = (exp a)(exp b).

Unfortunately for arctan there is no similar formula. The only known formula is

arctan x + arctan y = arctan((x + y)/(1 - xy)) + kπ   with k ∈ ℤ;

we will use

arctan x = arctan y + arctan((x - y)/(1 + xy))

with x, y > 0 and y < x. Summarizing, we have the following facts:
(1) We can compute efficiently arctan(u/2^{2^k}) with k ≥ 0 and u ∈ {0, 1, ..., 2^{2^k} - 1}.
(2) We have a sort of addition formula for arctan, the term kπ being zero.

So I propose the following algorithm for x given in [0, 1]:
(1) Write v_k = 2^{2^k}.
(2) Define

s_{k+1} = (s_k - A_k)/(1 + s_k A_k)   and   s_0 = x.

(3) A_k is chosen such that

0 ≤ s_k - A_k < 1/v_k,

and A_k is of the form u_k/v_k with u_k ∈ ℕ.
(4) We have the formula

arctan x = arctan A_0 + arctan s_1
         = arctan A_0 + arctan A_1 + arctan s_2
         = arctan A_0 + ⋯ + arctan A_N + arctan s_{N+1};

the number s_N is decreasing toward 0, and we then have arctan x = Σ_{i=0}^∞ arctan A_i.

The drawbacks of this algorithm are:
(1) The complexity of the process is high, higher than the AGM. Nevertheless there is some hope that this can be more efficient than the AGM in the domain where the number of bits is high but not too large.
(2) There is the need for division, which is computationally expensive.
(3) We may have to compute arctan(1/2).

4.14.2. Estimate of the absolute error. By that analysis we mean that a and b have absolute error D if |a - b| ≤ D.

As a reminder, the algorithm is:
(1) Write v_k = 2^{2^k}.
(2) Define

s_{k+1} = (s_k - A_k)/(1 + s_k A_k)   and   s_0 = x.

(3) A_k is chosen such that

0 ≤ s_k - A_k < 1/v_k,

and A_k is of the form u_k/v_k with u_k ∈ ℕ.

arctan x = arctan A_0 + arctan s_1
         = arctan A_0 + arctan A_1 + arctan s_2
         = arctan A_0 + ⋯ + arctan A_N + arctan s_{N+1};

the number s_N is very rapidly decreasing toward 0, and we then have

arctan x = Σ_{i=0}^∞ arctan A_i.

(5) The approximate arc-tangent is then

Σ_{i=0}^{N_0} arctan_{m_i} A_i,

with arctan_{m_i} being the sum of the first 2^{m_i} terms of the Taylor series for arctan. We need to estimate all the quantities involved in the computation.
(1) We have the upper bound

0 ≤ s_{k+1} = (s_k - A_k)/(1 + s_k A_k) ≤ s_k - A_k ≤ 1/v_k.

(2) The remainder of the series giving arctan x is

Σ_{i=N_0+1}^∞ arctan A_i ≤ Σ_{i=N_0+1}^∞ A_i
                         ≤ Σ_{i=N_0+1}^∞ s_i
                         ≤ Σ_{i=N_0+1}^∞ 1/v_{i-1}
                         ≤ Σ_{i=N_0}^∞ 1/v_i
                         = Σ_{i=N_0}^∞ 1/2^{2^i} ≤ c_{N_0}/2^{2^{N_0}},

with c_{N_0} ≤ 1.64; if N_0 ≥ 1 then c_{N_0} ≤ 1.27, and if N_0 ≥ 2 then c_{N_0} ≤ 1.07. It remains to determine the right N_0.

(3) The partial sums of the Taylor series for arctan have derivative bounded by 1 and consequently do not increase the error.
(4) The error created by using the partial sum of the Taylor series of arctan is bounded by

A_i^{2·2^{m_i}+1} / (2 · 2^{m_i} + 1)

and is thus bounded by

(1/(2 · 2^{m_i} + 1)) [1/2^{2^{i-1}}]^{2·2^{m_i}+1} = (1/(2 · 2^{m_i} + 1)) [2^{-2^{i-1}}]^{2·2^{m_i}+1}
                                                  ≤ (1/(2 · 2^{m_i} + 1)) [2^{-2^{i-1}}]^{2·2^{m_i}}
                                                  = (1/(2 · 2^{m_i} + 1)) 2^{-2^{i+m_i}}.

The calculation of arctan A_i is done by using integer arithmetic and returning a fraction that is converted to mpfr type, so there is no error. But to compute arctan A_i = A_i [arctan A_i / A_i] we need to use real arithmetic, so there is 1 ulp of error. In total this is N_0 ulp.
(5) Each addition gives 1 ulp of error; there are N_0 - 1 additions, so we need to take (N_0 - 1) ulp.
(6) The divisions yield errors:
(a) Having errors in the computation of A_i is of no consequence: it changes the quantity being arc-tangented, and that is all. Errors concerning the computation of s_{N+1}, on the contrary, add to the error.
(b) The subtraction s_i - A_i has the effect of subtracting very near numbers. But A_i has exactly the same first 1s and 0s as s_i, so we can expect this operation to be nondestructive.
(c) Extrapolating from the previous result, we can expect that the error of the quantity (s_i - A_i)/(1 + s_i A_i) is err(s_i) + 1 ulp.
(7) The total sum of errors is then (if no errors are made in the counting of errors)

Err(arctan) = Σ_{i=0}^{N_0} (1/(2 · 2^{m_i} + 1)) 2^{-2^{i+m_i}} + c_{N_0}/2^{2^{N_0}} + (N_0 - 1) 2^{-Prec} + (N_0 - 1) 2^{-Prec} + N_0 2^{-Prec}   [m_i = N_0 - i]
            = Σ_{i=0}^{N_0} (1/(2 · 2^{N_0-i} + 1)) 2^{-2^{N_0}} + c_{N_0}/2^{2^{N_0}} + (3N_0 - 2) 2^{-Prec}
            = 2^{-2^{N_0}} Σ_{i=0}^{N_0} 1/(2 · 2^i + 1) + c_{N_0}/2^{2^{N_0}} + (3N_0 - 2) 2^{-Prec}
            ≤ {Σ_{i=0}^∞ 1/(2 · 2^i + 1)} 2^{-2^{N_0}} + c_{N_0}/2^{2^{N_0}} + (3N_0 - 2) 2^{-Prec}
            ≤ {0.77} 2^{-2^{N_0}} + 1.63/2^{2^{N_0}} + (3N_0 - 2) 2^{-Prec}
            = 2.4/2^{2^{N_0}} + (3N_0 - 2) 2^{-Prec}.

This is what we wish, thus Err(arctan) < 2^{-prec_arctan}, with prec_arctan the requested precision on the arc-tangent. We thus want:

2.4/2^{2^{N_0}} ≤ 2^{-prec_arctan-1}

and

(3N_0 - 2) × 2^{-Prec} ≤ 2^{-prec_arctan-1},

i.e.

N_0 ≥ ln(prec_arctan + 1 + ln 2.4/ln 2)/ln 2,

which we approximate by (since the logarithm is expensive):

N_0 = ceil(log(prec_arctan + 2.47) × 1.45),

and we finally have:

Prec = prec_arctan + 1 + ceil(ln(3N_0 - 2)/ln 2).

4.14.3. Estimate of the relative error. We say that a and b have relative error δ if

a = b(1 + ∆)   with |∆| ≤ δ.

This is the error definition used in MPFR, so we need to redo everything in order to have a consistent analysis.
(1) We can use all the previous estimates:
(a) Remainder estimate:

Σ_{i=N_0+1}^∞ arctan A_i ≤ c_{N_0}/2^{2^{N_0}},

so the relative error will be (1/arctan x) · c_{N_0}/2^{2^{N_0}}.
(b) The relative error created by using a partial sum of the Taylor series is bounded by (1/arctan A_i) · (1/(2 · 2^{m_i} + 1)) · 2^{-2^{i+m_i}}.
(c) The multiplication arctan A_i = A_i [arctan A_i / A_i] takes 1 ulp of relative error.
(d) Doing the subtraction s_i - A_i is a gradual underflow operation: it decreases the precision of s_i - A_i.
(e) The multiplication s_i A_i creates 1 ulp of error. This is not much, and this relative error is further reduced by adding 1.
(1) We have

arctan b(1 + ∆) = arctan(b + b∆) ∼ arctan b + (1/(1 + b^2))(b∆) = [arctan b][1 + (b/((1 + b^2) arctan b)) ∆].

A rapid analysis gives 0 ≤ b/((1 + b^2) arctan b) ≤ 1, and then we can say that the function arctan does not increase the relative error.
(2) So we have two possible solutions:
(a) Do a relative analysis of our algorithm.
(b) Use the previous analysis, since the absolute error D is obviously equal to |b| δ (δ being the relative error).
It is not hard to see that the second solution is certainly better: the formulas are additive, so our analysis will work without problems.
(3) It then suffices to replace 2^{-prec_arctan} in the previous section by 2^{-prec_arctan} arctan x.
(4) If |x| ≤ 1, then |arctan x| is bounded below by |x| π/4 ∼ |x|/1.27. So it suffices to have an absolute error bounded above by

2^{-prec_arctan} |x|/1.27.

In this case we will add 2 - MPFR_EXP(x) to prec_arctan.
(5) If |x| ≥ 1, then arctan x is bounded below by π/4. So it suffices to have an absolute error bounded above by

2^{-prec_arctan}/1.27;

we will add 1 to prec_arctan. In this case we need to take into account the error caused by the subtraction:

arctan x = ±π/2 - arctan(1/x).

4.14.4. Implementation defaults.
(1) The computation is quite slow; this should be improved.
(2) The precision should be decreased after the operation s_i - A_i. And several other improvements should be done.

4.15. The Euclidean distance function.
The mpfr_hypot function implements the Euclidean distance function:

hypot(x, y) = √(x^2 + y^2).

If one of the variables is zero, then hypot is computed using the absolute value of the other variable. Assume that 0 < y ≤ x. Using the first-degree Taylor polynomial, we have:

0 < √(x^2 + y^2) - x < y^2/(2x).

Let p_x, p_y be the precisions of the input variables x and y respectively, p_z the output precision, and z = ◦_{p_z}(√(x^2 + y^2)) the expected result. Let us assume, as is the case in MPFR, that the minimal and maximal acceptable exponents (respectively e_min and e_max) satisfy 2 < e_max and e_min = -e_max.

When rounding to nearest, if p_x ≤ p_z and (p_z + 1)/2 < exp(x) - exp(y), we have y^2/(2x) < (1/2) ulp_{p_z}(x); if p_z < p_x, the condition (p_x + 1)/2 < exp(x) - exp(y) ensures that y^2/(2x) < (1/2) ulp_{p_x}(x). In both cases, these inequalities show that z = N_{p_z}(x), except that the tie case is rounded toward plus infinity, since hypot(x, y) is strictly greater than x.

With the other rounding modes, the conditions p_z/2 < exp(x) - exp(y) if p_x ≤ p_z, and p_x/2 < exp(x) - exp(y) if p_z < p_x, mean in a similar way that z = ◦_{p_z}(x), except that we need to add one ulp to the result when rounding toward plus infinity and x is exactly representable with p_z bits of precision.

When none of the above conditions are satisfied and when exp(x) - exp(y) ≤ e_max - 2, we use the following algorithm:

Algorithm hypot_1
Input: x and y with |y| ≤ |x|, p the working precision with p ≥ p_z.
Output: √(x^2 + y^2) with p - 4 bits of precision if p < max(p_x, p_y), and p - 2 bits of precision if max(p_x, p_y) ≤ p.

s ← ⌊e_max/2⌋ - exp(x) - 1
x_s ← Z(x × 2^s)
y_s ← Z(y × 2^s)
u ← Z(x_s^2)
v ← Z(y_s^2)
w ← Z(u + v)
t ← Z(√w)
z ← Z(t/2^s)

In order to avoid undue overflow during the computation, we shift the inputs' exponents by s = ⌊e_max/2⌋ - 1 - exp(x) before computing the squares, and shift back the output's exponent by -s, using the fact that √((x·2^s)^2 + (y·2^s)^2)/2^s = √(x^2 + y^2). We show below that neither overflow nor underflow occurs.
We check first that the exponent shift does not cause overflow and, at the same time, that the squares of the shifted inputs never overflow. For x, we have exp(x) + s = ⌊e_max/2⌋ - 1, so exp(x_s) < e_max - 1 and neither x_s nor x_s^2 overflows. For y, note that we have y_s ≤ x_s because y ≤ x, thus y_s and y_s^2 do not overflow.

Secondly, let us see that the exponent shift does not cause underflow. For x, we know that 0 ≤ exp(x) + s, thus neither x_s nor x_s^2 underflows. For y, the condition exp(x) - exp(y) ≤ e_max - 2 ensures that -e_max/2 ≤ exp(y) + s, which shows that y_s and its square do not underflow.

Thirdly, the addition does not overflow because u + v < 2x_s^2, and it was shown above that x_s^2 < 2^{e_max-1}. It cannot underflow because both operands are positive.

Fourthly, as x_s < t, the square root does not underflow. Due to the exponent shift, we have 1 ≤ x_s; then w is greater than 1 and thus greater than its square root t, so the square root does not overflow.

Finally, let us show that the back shift raises neither underflow nor overflow unless the exact result is greater than or equal to 2^{e_max}. Because no underflow has occurred so far, exp(x) ≤ exp(t) - s, which shows that the shift does not underflow. And all roundings being toward zero, we have z ≤ √(x^2 + y^2), so if 2^{e_max} ≤ z, then the exact value is also greater than or equal to 2^{e_max}.

Let us now analyse the error of the algorithm hypot_1:

                       p < min(p_x, p_y)          max(p_x, p_y) ≤ p
x_s ← Z(x × 2^s)       error(x_s) ≤ 1 ulp(x_s)    exact
y_s ← Z(y × 2^s)       error(y_s) ≤ 1 ulp(y_s)    exact
u ← Z(x_s^2)           error(u) ≤ 6 ulp(u)        1 ulp(u)
v ← Z(y_s^2)           error(v) ≤ 6 ulp(v)        1 ulp(v)
w ← Z(u + v)           error(w) ≤ 13 ulp(w)       3 ulp(w)
t ← Z(√w)              error(t) ≤ 14 ulp(t)       4 ulp(t)
z ← Z(t/2^s)           exact.

And in the intermediate case, if min(p_x, p_y) ≤ p < max(p_x, p_y), we have

w ← Z(u + v)           error(w) ≤ 8 ulp(w)
t ← Z(√w)              error(t) ≤ 9 ulp(t).

Thus, 2 bits of precision are lost when max(p_x, p_y) ≤ p, and 4 bits when p does not satisfy this relation.
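The exponent-shift scheme of hypot_1 can be sketched in double precision (hypot_scaled is a hypothetical helper; emax = 1024 plays the role of e_max for doubles, and frexp/ldexp provide the binary exponent and the scaling by 2^s):

```python
import math

def hypot_scaled(x, y, emax=1024):
    # Scale by 2^s with s = floor(emax/2) - 1 - exp(x), square, sum,
    # take the square root, and shift back by -s.
    if abs(y) > abs(x):
        x, y = y, x
    s = emax // 2 - 1 - math.frexp(x)[1]
    xs = math.ldexp(x, s)
    ys = math.ldexp(y, s)
    w = xs * xs + ys * ys          # cannot overflow after the shift
    return math.ldexp(math.sqrt(w), -s)
```

With x = y = 10^300 the naive x^2 would overflow to infinity, while the shifted computation returns the correct result near 1.414 × 10^300.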

4.16. The floating multiply-add function. The mpfr_fma function implements the floating multiply-add function as:

fma(x, y, z) = z + x × y.

The algorithm used for this calculation is as follows:

u ← ◦(x × y)
v ← ◦(z + u)

Now, we have to bound the rounding error for each step of this algorithm.

error(u), with u ← ◦(x × y): |u - xy| ≤ ulp(u)

error(v), with v ← ◦(z + u):
    |v - (z + xy)| ≤ ulp(v) + |(z + u) - (z + xy)|
                  ≤ (1 + 2^{e_u - e_v}) ulp(v)    (⋆) [see subsection 2.3]

This shows that the rounding error on the calculation of fma(x, y, z) can be bounded by (1 + 2^{e_u - e_v}) ulp on the result. So, to compute the size of the intermediate variables, we have to add at least ⌈log_2(1 + 2^{e_u - e_v})⌉ bits to the wanted precision.
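The two steps and the exponent difference e_u - e_v can be sketched in double precision (fma_steps is a hypothetical helper; when z nearly cancels x · y, e_u - e_v is large and the bound above becomes weak):

```python
import math

def fma_steps(x, y, z):
    # u = round(x*y), v = round(z+u); return v and e_u - e_v.
    u = x * y
    v = z + u
    return v, math.frexp(u)[1] - math.frexp(v)[1]
```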

4.17. The expm1 function. The mpfr_expm1 function implements the expm1 function as:

expm1(x) = e^x - 1.

The algorithm used for this calculation is as follows:

u ← ◦(e^x)
v ← ◦(u - 1)

Now, we have to bound the rounding error for each step of this algorithm.

error(u), with u ← ◦(e^x): |u - e^x| ≤ ulp(u)

error(v), with v ← ◦(u - 1): |v - (e^x - 1)| ≤ (1 + 2^{e_u - e_v}) ulp(v)    (⋆) [see subsection 2.3]

This shows that the rounding error on the calculation of expm1(x) can be bounded by (1 + 2^{e_u - e_v}) ulp on the result. So, to compute the size of the intermediate variables, we have to add at least ⌈log_2(1 + 2^{e_u - e_v})⌉ bits to the wanted precision.

4.18. The log1p function. The mpfr_log1p function implements the log1p function as:

log1p(x) = log(1 + x).

We could use the argument reduction

log1p(x) = 2 log1p(x/(1 + √(1 + x))),

which reduces x to about √x when x ≫ 1, and in any case to less than x/2 when x > 0. However, if 1 + x can be computed exactly with the target precision, then it is more efficient to directly call the logarithm, which has its own argument reduction. If 1 + x cannot be computed exactly, this implies that x is either very small, in which case no argument reduction is needed, or very large, in which case log1p(x) ≈ log x.

The algorithm used for this calculation is as follows (with rounding to nearest):

v ← ◦(1 + x)
w ← ◦(log v)

Now, we have to bound the rounding error for each step of this algorithm.

error(v), with v ← ◦(1 + x): |v - (1 + x)| ≤ (1/2) ulp(v)

error(w), with w ← ◦(log v): |w - log(1 + x)| ≤ (1/2 + 2^{1-e_w}) ulp(w)    (⋆) [see subsection 2.9]

The 2^{1-e_w} factor in the error reflects the possible loss of accuracy in 1 + x when x is small. Note that if v = ◦(1 + x) is exact, then the error bound simplifies to (1/2) ulp(w), i.e., a relative error of at most 2^{-p}, where p is the working precision.

4.19. The log2 or log10 function. The mpfr_log2 or mpfr_log10 function implements the logarithm in base 2 or 10 as:

log2(x) = log x / log 2   or   log10(x) = log x / log 10.

The algorithm used for this calculation is the same for log2 and log10, and is described as follows for t = 2 or 10:

u ← ◦(log(x))
v ← ◦(log(t))
w ← ◦(u/v)

Now, we have to bound the rounding error for each step of this algorithm, with x ≥ 0.

error(u), with u ← △(log(x)): |u - log(x)| ≤ ulp(u) (•)

error(v), with v ← ▽(log t): |v - log t| ≤ ulp(v) (••)

error(w), with w ← ◦(u/v): |w - log x / log t| ≤ 5 ulp(w)    (⋆) [see subsection 2.6]

This shows that the rounding error on the calculation of log2 or log10 can be bounded by 5 ulp on the result. So, to compute the size of the intermediate variables, we have to add at least 3 bits to the wanted precision.
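The three steps can be sketched in double precision (two logarithm calls and one division, matching u, v, w above):

```python
import math

def log_base_steps(x, t):
    u = math.log(x)      # u ~ log x
    v = math.log(t)      # v ~ log t, with t = 2 or 10
    return u / v         # w ~ log x / log t
```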

4.20. The power function. The mpfr_pow function implements the power function as:

pow(x, y) = e^{y log(x)}.

The algorithm used for this calculation is as follows:

u ← ◦(log(x))
v ← ◦(y u)
w ← ◦(e^v)

Now, we have to bound the rounding error for each step of this algorithm, with x ≥ 0 and y a floating-point number.

error(u), with u ← △(log(x)): |u - log(x)| ≤ ulp(u)    (⋆) (•)

error(v), with v ← △(y u):
    |v - y log(x)| ≤ ulp(v) + |yu - y log(x)|
                  ≤ ulp(v) + y |u - log(x)|    (⋆)
                  ≤ ulp(v) + y ulp(u)          [with (•)]
                  ≤ ulp(v) + 2 ulp(y u)        [with Rule 4]
                  ≤ 3 ulp(v)                   [with Rule 8]    (⋆⋆)

error(w), with w ← ◦(e^v): |w - e^v| ≤ (1 + 3 · 2^{exp(v)+1}) ulp(w)    (⋆) [see subsection 2.8, with c_u* = 1 for (•)]

This shows that the rounding error on the calculation of x^y can be bounded by 1 + 3 · 2^{exp(v)+1} ulps on the result. So, to compute the size of the intermediate variables, we have to add at least ⌈log_2(1 + 3 · 2^{exp(v)+1})⌉ bits to the wanted precision.

Exact results. We have to detect cases where x^y is exact, otherwise the program will loop forever. The theorem of Gelfond/Schneider (1934) states that if α and β are algebraic numbers with α ≠ 0, α ≠ 1, and β ∉ ℚ, then α^β is transcendental. This is of little help for us, since β will always be a rational number here. Let x = a·2^b, y = c·2^d, and assume x^y = e·2^f, where a, b, c, d, e, f are integers. Without loss of generality, we can assume a, c, e odd integers.

If x is negative: either y is an integer, and then x^y is exact if and only if (-x)^y is exact; or y is rational, and then x^y is a complex number, and the answer is NaN (Not a Number). Thus we can assume a (and therefore e) positive.

If y is negative, then x^y = a^y 2^{by} can be exact only when a = 1, and in that case we also need that by is an integer.

We have a^{c2^d} 2^{bc2^d} = e·2^f with a, c, e odd integers, and a, e > 0. As a is an odd integer, necessarily we have a^{c2^d} = e and 2^{bc2^d} = 2^f, thus bc2^d = f. If d ≥ 0, then a^{c2^d} must be an integer: this is true if c ≥ 0, and false for c < 0, since a^{c2^d} = 1/a^{-c2^d} < 1 cannot be an integer. In addition, a^{c2^d} must be representable in the given precision.

Assume now d < 0; then a^{c2^d} = a^{c/2^{d'}} with d' = -d, thus we have a^c = e^{2^{d'}}, thus a^c must be a 2^{d'}-th power of an integer. Since c is odd, a itself must be a 2^{d'}-th power.

We therefore propose the following algorithm:

Algorithm CheckExactPower.
Input: x = a·2^b, y = c·2^d, with a, c odd integers
Output: true if x^y is an exact power e·2^f, false otherwise

if x < 0 then
    if y is an integer then return CheckExactPower(-x, y)
    else return false
if y < 0 then
    if a = 1 then return true else return false
if d < 0 then
    if a·2^b is not a 2^{-d}-th power then return false
return true

Detecting whether the result is exactly representable is not enough, since it may be exact, but with a precision larger than the target precision. Thus we propose the following: modify Algorithm CheckExactPower so that it returns an upper bound p for the number of significant bits of x^y when it is exactly representable, i.e. x^y = m·2^e with |m| < 2^p. Then if the relative error on the approximation of x^y is less than (1/2) ulp, rounding it to nearest will give x^y.

4.21. The real cube root. The mpfr_cbrt function computes the real cube root of x. Since for x < 0 we have ∛x = -∛(-x), we can focus on x > 0.

Let n be the number of wanted bits of the result. We write x = m·2^{3e} where m is a positive integer with m ≥ 2^{3n-3}. Then we compute the integer cube root of m: let m = s^3 + r with 0 ≤ r and m < (s + 1)^3. Let k be the number of bits of s: since m ≥ 2^{3n-3}, we have s ≥ 2^{n-1}, thus k ≥ n. If k > n, we replace s by ⌊s·2^{n-k}⌋, e by e + (k - n), and update r accordingly so that x = (s^3 + r)·2^{3e} still holds (be careful that r may no longer be an integer in that case). Then the correct rounding of ∛x is:

s·2^e        if r = 0, or rounding down, or rounding to nearest and r < (3/2)s^2 + (3/4)s + 1/8,
(s + 1)·2^e  otherwise.

Note: for rounding to nearest, one may consider m ≥ 2^{3n} instead of m ≥ 2^{3n-3}, i.e. taking n + 1 instead of n. In that case, there is no need to compare the remainder r to (3/2)s^2 + (3/4)s + 1/8: we just need to know whether r = 0 or not. The even-rounding rule is needed only when the input x has at least 3n + 1 bits, since the cube of an odd number of n + 1 bits has at least 3n + 1 bits.
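The integer cube-root step can be sketched in Python with exact integers (cbrt_floor is a hypothetical helper; a floating-point first guess is corrected with exact integer comparisons, as correct rounding requires):

```python
def cbrt_floor(m, n):
    # For m >= 2^(3n-3), return s with s^3 <= m < (s+1)^3;
    # s then has at least n bits.
    assert m >= 2 ** (3 * n - 3)
    s = round(m ** (1.0 / 3.0))     # floating-point first guess
    while s ** 3 > m:               # exact integer correction downward
        s -= 1
    while (s + 1) ** 3 <= m:        # exact integer correction upward
        s += 1
    return s
```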

4.22. The exponential integral. The exponential integral mpfr_eint is defined as in [1, formula 5.1.10]: for x > 0,

Ei(x) = γ + log x + Σ_{k=1}^∞ x^k/(k · k!),

and for x < 0 it gives NaN.

We use the following integer-based algorithm to evaluate Σ_{k=1}^∞ x^k/(k · k!), using working precision w. For any real v, we denote by trunc(v) the nearest integer toward zero.

Decompose x into m·2^e with m an integer [exact]
If necessary, truncate m to w bits and adjust e
s ← 0
t ← 2^w
for k := 1 do
    t ← trunc(t m 2^e/k)
    u ← trunc(t/k)
    s ← s + u
Return s·2^{-w}.

Note: in t ← trunc(t m 2^e/k), we first compute tm exactly; then if e is negative, we first divide by 2^{-e} and truncate, then divide by k and truncate. This gives the same answer as dividing once by k·2^{-e}, but it is more efficient. Let ǫ_k be the absolute difference between t and 2^w x^k/k! at step k. We have ǫ_0 = 0, and ǫ_k ≤ 1 + ǫ_{k-1} m 2^e/k + t_{k-1} m 2^{e+1-w}/k, since the error when approximating x by m·2^e is less than m·2^{e+1-w}. Similarly, the absolute error on u at step k is at most ν_k ≤ 1 + ǫ_k/k, and that on s at most τ_k ≤ τ_{k-1} + ν_k. We compute all these errors dynamically (using MPFR with a small precision), and we stop when |t| is smaller than the bound τ_k on the error on s made so far.

At that time, the truncation error when neglecting the terms of index k + 1 to ∞ can be bounded by ((|t| + ǫ_k)/k)(|x|/k + |x|^2/k^2 + ⋯) ≤ (|t| + ǫ_k)|x|/k/(k - |x|).

Asymptotic expansion. For x → ∞ we have the following non-converging expansion [1, formula 5.1.51]:

Ei(x) ∼ e^x (1/x + 1/x^2 + 2/x^3 + 6/x^4 + 24/x^5 + ⋯).

The kth term is of the form k! x^{-k-1}. The smallest value is obtained for k ≈ x, and is of the order of e^{-x}. Thus, assuming the error term is bounded by the first neglected term, we can use that expansion as long as e^{-x} ≤ 2^{-p}, where p is the target precision, i.e. when x ≥ p log 2.

4.23. The gamma function. The gamma function is computed by Spouge's method [22]:

Γ(z+1) ≈ (z+a)^{z+1/2} e^{−z−a} (√(2π) + Σ_{k=1}^{⌈a⌉−1} c_k(a)/(z+k)),

which is valid for ℜ(z+a) > 0, where

c_k(a) = ((−1)^{k−1}/(k−1)!) (a−k)^{k−1/2} e^{a−k}.

Here, we choose the free parameter a to be an integer.
According to [18, Section 2.6], the relative error is bounded by a^{−1/2}(2π)^{−a−1/2} for a ≥ 3 and ℜ(z) ≥ 0. See also [21].

4.24. The Riemann Zeta function. The algorithm for the Riemann Zeta function is due to Jean-Luc Rémy and Sapphorain Pétermann [16, 17]. For s < 1/2 we use the functional equation

ζ(s) = 2^s π^{s−1} sin(πs/2) Γ(1−s) ζ(1−s).

For s ≥ 1/2 we use the Euler-MacLaurin summation formula, applied to the real function f(x) = x^{−s} for s > 1:

ζ(s) = Σ_{k=1}^{N−1} 1/k^s + 1/(2N^s) + 1/((s−1)N^{s−1}) + Σ_{k=1}^{p} (B_{2k}/(2k)) · binom(s+2k−2, 2k−1) · 1/N^{s+2k−1} + R_{N,p}(s),

with |R_{N,p}(s)| < 2^{−d}, where B_k denotes the k-th Bernoulli number,

p = max(0, ⌈(d log 2 + 0.61 + s log(2π/s))/2⌉),

and N = ⌈2^{(d−1)/s}⌉ if p = 0, N = ⌈(s + 2p − 1)/(2π)⌉ if p > 0.
This computation is split into three parts:

A = Σ_{k=1}^{N−1} k^{−s} + (1/2) N^{−s},

B = Σ_{k=1}^{p} T_k = Σ_{k=1}^{p} N^{1−s−2k} C_k Π_k,

C = N^{1−s}/(s−1),

where C_k = B_{2k}/(2k)!, Π_k = s(s+1)···(s+2k−2), and T_k = N^{1−s−2k} C_k Π_k.
Rémy and Pétermann proved the following result:

Theorem 1. Let d be the target precision, so that |R_{N,p}(s)| < 2^{−d}. Assume Π = d − 2 ≥ 11, i.e. d ≥ 13. If the internal precisions for computing A, B, C satisfy respectively

D_A ≥ Π + ⌈(3/2) log N / log 2⌉ + 5,    D_B ≥ Π + 14,    D_C ≥ Π + ⌈(1/2) log N / log 2⌉ + 7,

then the relative round-off error is bounded by 2^{−Π}, i.e. if z is the approximation computed, we have

|ζ(s) − z| ≤ 2^{−Π} |ζ(s)|.

4.24.1. The integer argument case. In case of an integer argument s ≥ 2, the mpfr_zeta_ui function computes ζ(s) using the following formula from [3]:

ζ(s) = (1/(d_n(1 − 2^{1−s}))) Σ_{k=0}^{n−1} (−1)^k (d_n − d_k)/(k+1)^s + γ_n(s),

where

|γ_n(s)| ≤ 3/((3+√8)^n (1 − 2^{1−s}))    and    d_k = n Σ_{i=0}^{k} (n+i−1)! 4^i / ((n−i)! (2i)!).

It can be checked that the d_k are integers, and we compute them exactly, like the denominators (k+1)^s. We compute the integer

S = Σ_{k=0}^{n−1} (−1)^k ⌊(d_n − d_k)/(k+1)^s⌋.

The absolute error on S is at most n. We then perform the following iteration (still with integers):

T ← S
do
    T ← ⌊T 2^{1−s}⌋
    S ← S + T
while T ≠ 0.

Since 1/(1 − 2^{1−s}) = 1 + 2^{1−s} + 2^{2(1−s)} + ···, and at iteration i we have T = ⌊S 2^{i(1−s)}⌋, the error on S after this loop is bounded by n + l + 1, where l is the number of loops.
Finally we compute q = ⌊2^p S/d_n⌋, where p is the working precision, then we convert q to a p-bit floating-point value, with rounding to nearest, and divide by 2^p (this last operation is exact). Since S/d_n approximates ζ(s), it is larger than one, thus q ≥ 2^p, and the error on the division is less than (1/2) ulp_p(q). The error on S itself is bounded by (n+l+1)/d_n ≤ (n+l+1) 2^{1−p} (see the conjecture below). Since 2^{1−p} ≤ ulp_p(q), and taking into account the error when converting the integer q (which may have more than p bits), and the mathematical error which is bounded by 3/(3+√8)^n ≤ 3/2^p, the total error is bounded by n + l + 4 ulps.

Analysis of the sizes. To get an accuracy of around p bits, since ζ(s) ≥ 1, it suffices to have |γ_n(s)| ≤ 2^{−p}, i.e. (3+√8)^n ≥ 2^p, thus n ≥ αp with α = log 2/log(3+√8) ≈ 0.393. It can be easily seen that d_n ≥ 4^n, thus when n ≥ αp, d_n has at least 0.786p bits. In fact, we conjecture d_n ≥ 2^{p−1} when n ≥ αp; this conjecture was experimentally verified up to p = 1000.

Large argument case. When 3^{−s} < 2^{−p}, then ζ(s) ≈ 1 + 2^{−s} to a precision of p bits. More precisely, let r(s) := ζ(s) − (1 + 2^{−s}) = 3^{−s} + 4^{−s} + ···. The function 3^s r(s) = 1 + (3/4)^s + (3/5)^s + ··· decreases with s, thus for s ≥ 2, 3^s r(s) ≤ 3² r(2) < 4. This yields:

|ζ(s) − (1 + 2^{−s})| < 4·3^{−s}.

If the upper bound 4·3^{−s} is less than (1/2) ulp(1) = 2^{−p}, the correct rounding of ζ(s) is either 1 + 2^{−s} for rounding to zero, −∞ or nearest, and 1 + 2^{−s} + 2^{1−p} for rounding to +∞.

4.25.
The arithmetic-geometric mean. The arithmetic-geometric mean (AGM for short) of two positive numbers a ≤ b is defined as the common limit of the sequences (a_n) and (b_n) given by a_0 = a, b_0 = b, and for n ≥ 0:

a_{n+1} = √(a_n b_n),    b_{n+1} = (a_n + b_n)/2.

We approximate AGM(a, b) as follows, with working precision p:

s_1 = ◦(ab)
u_1 = ◦(√s_1)
v_1 = ◦(a + b)/2   [division by 2 is exact]
for n := 1 while exp(v_n) − exp(v_n − u_n) ≤ p − 2 do
    v_{n+1} = ◦(u_n + v_n)/2   [division by 2 is exact]
    if exp(v_n) − exp(v_n − u_n) ≤ p/4 then
        s = ◦(u_n v_n)
        u_{n+1} = ◦(√s)
    else
        s = ◦(v_n − u_n)
        t = ◦(s²)/16   [division by 16 is exact]
        w = ◦(t/v_{n+1})
        return r = ◦(v_{n+1} − w)
    endif
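Ignoring the precision bookkeeping and the final optimization, the underlying iteration can be sketched with plain doubles (the helper name `agm` is ours):

```python
import math

def agm(a, b, rel_tol=1e-15):
    """Common limit of a_{n+1} = sqrt(a_n*b_n), b_{n+1} = (a_n+b_n)/2."""
    while abs(a - b) > rel_tol * max(a, b):
        a, b = math.sqrt(a * b), (a + b) / 2
    return (a + b) / 2
```

Convergence is quadratic: the number of agreeing bits roughly doubles at each step, so only O(log p) iterations are needed for p-bit accuracy.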

The rationale behind the if-test is the following. When the relative error between a_n and b_n is less than 2^{−p/4}, we can write a_n = b_n(1+ǫ) with |ǫ| ≤ 2^{−p/4}. The next iteration will compute a_{n+1} = √(a_n b_n) = b_n √(1+ǫ), and b_{n+1} = (a_n + b_n)/2 = b_n(1 + ǫ/2). The second iteration will compute a_{n+2} = √(a_{n+1} b_{n+1}) = b_n √(√(1+ǫ)(1+ǫ/2)), and b_{n+2} = (a_{n+1} + b_{n+1})/2 = b_n(√(1+ǫ)/2 + 1/2 + ǫ/4). When ǫ goes to zero, the following expansions hold:

√(√(1+ǫ)(1+ǫ/2)) = 1 + ǫ/2 − ǫ²/16 + ǫ³/32 − (11/512) ǫ⁴ + O(ǫ⁵)
√(1+ǫ)/2 + 1/2 + ǫ/4 = 1 + ǫ/2 − ǫ²/16 + ǫ³/32 − (5/256) ǫ⁴ + O(ǫ⁵),

which shows that a_{n+2} and b_{n+2} agree to p bits. In the algorithm above, we have v_{n+1} ≈ b_n(1 + ǫ/2), s = b_n ǫ [exact thanks to Sterbenz's theorem], then t ≈ ǫ² b_n²/16, and w ≈ (b_n²/16) ǫ²/v_{n+1} ≈ b_n(ǫ²/16 − ǫ³/32), thus v_{n+1} − w gives us an approximation of order ǫ⁴. [Note that w (and therefore s, t) need to be computed to precision p/2 only.]
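The quartic agreement can be checked numerically with high-precision decimals: starting from a relative gap ǫ, two AGM steps leave a gap close to the ǫ⁴/512 predicted by subtracting the two expansions above (an illustrative check, not MPFR code):

```python
from decimal import Decimal, getcontext

getcontext().prec = 60
eps = Decimal("1e-9")
a, b = 1 + eps, Decimal(1)            # a_n = b_n (1 + eps)
for _ in range(2):                    # two AGM iterations
    a, b = (a * b).sqrt(), (a + b) / 2
gap = abs(a - b) / b                  # expansions predict ~ eps**4 / 512
```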

Lemma 6. Assuming u ≤ v are two p-bit floating-point numbers, then u′ = ◦(√(◦(uv))) and v′ = ◦(u+v)/2 satisfy:

u ≤ u′,    v′ ≤ v.

Proof. It is clear that 2u ≤ u+v ≤ 2v, and since 2u and 2v are representable numbers, 2u ≤ ◦(u+v) ≤ 2v, thus u ≤ v′ ≤ v.
The result for u′ is more difficult to obtain. We use the following result: if x is a p-bit number, and s = ◦(x²) and t = ◦(√s) are computed with precision p and rounding to nearest, then t = x.
Apply this result to x = u, and let s′ = ◦(uv). Then s = ◦(u²) ≤ s′, thus u = ◦(√s) ≤ ◦(√s′) = u′. We prove similarly that u′ ≤ v. □
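The auxiliary fact used in the proof (s = ◦(x²), t = ◦(√s) implies t = x, with rounding to nearest) can be spot-checked with IEEE doubles (p = 53), staying away from overflow and underflow of x²:

```python
import math
import random

random.seed(42)
xs = [random.uniform(0.5, 2.0) for _ in range(100_000)]
# round(sqrt(round(x*x))) should give back x exactly
ok = all(math.sqrt(x * x) == x for x in xs)
```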

Remark. We cannot assume that u′ ≤ v′. Take for example u = 9, v = 12, with precision p = 4. Then (u+v)/2 = 10.5 rounds to 10, whereas uv = 108 rounds to 112, and √112 rounds to 11.
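This counterexample can be replayed with a toy round-to-nearest-even routine at precision p = 4 (all helper names are ours):

```python
from fractions import Fraction
import math

def rnd(x, p=4):
    """Round a positive exact value to the nearest p-bit float (ties to even)."""
    x = Fraction(x)
    m, k = x, 0
    while m < 2 ** (p - 1):          # normalize mantissa into [2^(p-1), 2^p)
        m *= 2
        k += 1
    while m >= 2 ** p:
        m /= 2
        k -= 1
    lo = math.floor(m)
    frac = m - lo
    if frac > Fraction(1, 2) or (frac == Fraction(1, 2) and lo % 2 == 1):
        lo += 1                      # round up (or resolve the tie to even)
    return Fraction(lo) / Fraction(2) ** k

u, v = 9, 12
v2 = rnd(Fraction(u + v, 2))                      # 10.5 -> 10 (tie goes to even)
u2 = rnd(Fraction(math.sqrt(float(rnd(u * v)))))  # 108 -> 112, sqrt(112) -> 11
```

So u′ = 11 > v′ = 10, while u ≤ u′ and v′ ≤ v still hold, as Lemma 6 guarantees.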

p We use Higham error analysis method, where θ denotes a generic value such that θ 2− . | | ≤ We note an and bn the exact values we would obtain for un and vn respectively, without round- 3/2 off errors. We have s1 = ab(1 + θ), u1 = a1(1 + θ) , v1 = b1(1 + θ). Assume we can write α β un = an(1 + θ) and vn = bn(1 + θ) with α, β en. We thus can take e1 = 3/2. Then as ≤ en+1 en+3/2 long as the if-condition is not satisfied, vn+1 = bn+1(1+θ) , and un+1 = an+1(1+θ) , which proves that e 3n/2. n ≤ When the if-condition is satisfied, we have exp(vn un) < exp(vn) p/4, and since − − (3 p)/4 exponents are integers, thus exp(vn un) exp(vn) (p + 1)/4, i.e. vn un /vn < 2 − . Assume n 2p/4, which implies 3−n θ /2≤ 1, which− since n 1 implies| − in| turn θ 2/3. ≤ 3n/2 | | ≤ ≥ | | ≤ p Under that hypothesis, (1 + θ) can be written 1 + 3nθ (possibly with a different θ 2− | | ≤ as usual). Then b a = v (1 + 3nθ) u (1 + 3nθ′) v u + 3n θ v . | n − n| | n − n | ≤ | n − n| | | n For p 4, 3n θ 3/8, and 1/(1 + x) for x 3/8 can be written 1 + 5x′/3 for x′ in the same interval.≥ This| | ≤ yields: | | ≤ b a v u + 3n θ v v u v u 8 | n − n| = | n − n| | | n | n − n| + 5n θ | n − n| + (6nθ) bn vn(1 + 3nθ) ≤ vn | | vn 5

13 (3 p)/4 48 3p/4 p/4 2 − + 2− 5.2 2− . ≤ 8 · 5 · ≤ · p/4 Write an = bn(1 + ǫ) with ǫ 5.2 2− . We have an+1 = bn√1 + ǫ and bn+1 = bn(1 + ǫ/2). √ | |1 ≤2 · 1 2 Since 1 + ǫ = 1 + ǫ/2 8 ν with ν ǫ , we deduce that bn+1 an+1 8 ν bn p/2 − | | ≤ | | | − 1 | ≤ p/2 |2 | ≤ 3.38 2− bn. After one second iteration, we get similarly bn+2 an+2 8 (3.38 2− ) bn 3 p· | − | ≤ · ≤ 2 2− bn. Let q be the precision used to evaluate s, t and w in the else case. Since vn un (3 p)/4 p/4 p/2 | − | ≤ 2 − v , it follows s 1.8 2− v for q 4. Then t 0.22 2− v . Finally due to the n | | ≤ · n ≥ ≤ · n above Lemma, the difference between vn+1 and vn is less than that between un and vn, i.e. 2 vn 1 p/2 vn q p/2 (3−p)/4 2 for p 7. We deduce w 0.22 2− (1 + 2− ) 0.47 2− v vn+1 1 2 vn+1 n ≤ p/−2 ≤ ≥ ≤ · ≤ · ≤ 0.94 2− vn+1. The· total error is bounded by the sum of four terms: 3 p the difference between a and b , bounded by 2− b ; • n+2 n+2 2 n the difference between bn+2 and vn+2, if vn+2 was computed directly without the • 3(n+2)/2 p/4 final optimization; since vn+2 = bn+2(1 + θ) , if n + 2 2 , similarly as above, (1 + θ)3(n+2)/2 can be written 1 + 3(n + 2)θ, thus this difference≤ is bounded by p p 3(n + 2) 2− bn+2 3(n + 2) 2− bn; the difference· between≤ v computed· directly, and with the final optimization. We • n+2 can assume vn+2 is computed directly in infinite precision, since we already took into account the rounding errors above. Thus we want to compute the difference between

un+vn 2 √unvn + u + v (v u ) 2 and n n n − n . 2 2 − 8(un + vn) 41 Writing un = vn(1 + ǫ), this simplifies to: √1 + ǫ + 1 + ǫ/2 1 + ǫ/2 ǫ2 1 = − ǫ4 + O(ǫ5). 2 −  2 − 18 + 8ǫ 256 1 4 1 3 p For ǫ 1/2, the difference is bounded by ǫ v 2 − v . | | ≤ 100 n ≤ 100 n the round-off error on w, assuming un and vn are exact; we can write s = (vn 2 • 1 2 2 (vn un) 4 4 − u )(1 + θ), t = (v u ) (1 + θ) , w = − (1 + θ) . For q 4, (1 + θ) can be n 16 n n 16vn+1 − ≥ p/2 q written 1 + 5θ, thus the round-off error on w is bounded by 5θw 4.7 2− − vn+1. p ≤ · For q p/2, this gives a bound of 4.7 2− v . ≥ · n+1 Since bn = vn(1 + 3nθ), and we assumed 3n θ /2 1, we have bn 3vn, thus the first two p | | ≤ ≤ errors are less than (9n + 45/2)2− vn; together with the third one, this gives a bound of p (9n + 23)2− vn; finally since we proved above that vn 2vn+1, this gives a total bound of p ≤ (18n + 51)2− vn+1, which is less than (18n + 51)ulp(r), or twice this in the improbable case where there is an exponent loss in the final subtraction r = (v w). ◦ n+1 − 4.26. The Bessel functions.

4.26.1. Bessel function J_n(z) of the first kind. The Bessel function J_n(z) of the first kind and integer order n is defined as follows [1, Eq. (9.1.10)]:

∞ ( z2/4)k (7) J (z) = (z/2)n − . n k!(k + n)! Xk=0 It is real for all real z, tends to 0 by oscillating around 0 when z , and tends to 0 → ±∞ when z 0, except J0 which tends to 1. We use→ the following algorithm, with working precision w, and rounding to nearest: x (zn) y ← ◦(z2)/4 [division by 4 is exact] u ← ◦(n!) t ← ◦(x/u)/2n [division by 2n is exact] s ← ◦t for←k from 1 do t (ty) t ← −(t/k ◦ ) t ← ◦(t/(k + n)) s ← ◦(s + t) if ←t ◦< ulp(s) and z2 2k(k + n) then return s. | | ≤ The condition z2 2k(k + n) ensures that the next term of the expansion is smaller than t /2, thus the sum≤ of the remaining terms is smaller than t < ulp(s). Using Higham’s | | | |w method, with θ denoting a random variable of value θ 2− — different instances of θ denoting different values — we can write x = zn(1 +|θ|), ≤y = z2/4(1 + θ), and before the for-loop s = t = (z/2)n/n!(1 + θ)3. Now write t = (z/2)n( z2/4)k/(k!(k + n)!)(1 + θ)ek at the end of the for-loop with index k; each loop involves a− factor (1 + θ)4, thus we have ek = 4k + 3. Now let T be an upper bound on the values of t and s during the for-loop, and assume we exit at k = K. The roundoff error in the additions| | | (|s + t), including the 42 ◦ error in the series truncation, is bounded by (K/2 + 1)ulp(T ). The error in the value of t at 4k+3 w step k is bounded by ǫk := T (1 + θ) 1 ; if we assume (4k + 3)2− 1/2, Lemma 2 w | − | ≤ yields ǫk 2T (4k + 3)2− . Summing from k = 0 to K, this gives an absolute error bound on s at the≤ end of the for-loop of: 2 w 2 (K/2 + 1)ulp(T ) + 2(2K + 5K + 3)2− T (4K + 21/2K + 7)ulp(T ), ≤ w where we used 2− T ulp(T ). Large index n. For large≤ index n, formula 9.1.62 from [1] gives J (z) z/2 n/n!. Together | n | ≤ | | with n! √2πn(n/e)n, which follows from example from [1, Eq. 6.1.38], this gives: ≥ 1 ze n Jn(z) . | | ≤ √2πn 2n Large argument. For large argument z, formula (7) requires at least k z/2 terms before starting to converge. 
If k z/2, it is better to use formula 9.2.5 from [1],≈ which provides at least 2 bits per term: ≤ 2 J (z) = [P (n, z) cos χ Q(n, z) sin χ], n rπz − where χ = z (n/2 + 1/4)π, and P (n, z) and Q(n, z) are two diverging series: − ∞ Γ(1/2 + n + 2k)(2z) 2k ∞ Γ(1/2 + n + 2k + 1)(2z) 2k 1 P (n, z) ( 1)k − , Q(n, z) ( 1)k − − . ≈ − (2k)!Γ(1/2 + n 2k) ≈ − (2k + 1)!Γ(1/2 + n 2k 1) Xk=0 − Xk=0 − − If n is real and non-negative — which is the case here —, the remainder of P (n, z) after k terms does not exceed the (k + 1)th term and is of the same sign, provided k > n/2 1/4; the same holds for Q(n, z) as long as k > n/2 3/4 [1, 9.2.10]. − If we first approximate χ = z (n/2 + 1/4)π−with working precision w, and then approx- imate cos χ and sin χ, there will− be a huge relative error if z > 2w. Instead, we use the fact that for n even, 1 P (n, z) cos χ Q(n, z) sin χ = ( 1)n/2[P (sin z + cos z) + Q(cos z sin z)], − √2 − − and for n odd,

1 (n 1)/2 P (n, z) cos χ Q(n, z) sin χ = ( 1) − [P (sin z cos z) + Q(cos z + sin z)], − √2 − − where cos z and sin z are computed accurately with mpfr sin cos, which uses in turn mpfr remainder. If we consider P (n, z) and Q(n, z) together as one single series, its term of index k behaves like Γ(1/2 + n + k)/k!/Γ(1/2 + n k)/(2z)k. The ratio between the term of index k + 1 and that of index k is about k/(2z), thus− starts to diverge when k 2z. At that point, the kth 2z ≈ term is e− , thus if z > p/2 log 2, we can use the asymptotic expansion. ≈
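For moderate z and n, the series loop of 4.26.1 can be sketched with plain doubles; the stopping test mirrors the z² ≤ 2k(k+n) condition above (function name ours):

```python
import math

def bessel_j_series(n, z, max_terms=200):
    """J_n(z) from Eq. (7): (z/2)^n * sum_k (-z^2/4)^k / (k! (k+n)!)."""
    y = z * z / 4
    t = (z / 2) ** n / math.factorial(n)   # term for k = 0
    s = t
    for k in range(1, max_terms):
        t *= -y / (k * (k + n))            # ratio of consecutive terms
        s += t
        if abs(t) < 1e-17 * abs(s) and z * z <= 2 * k * (k + n):
            break
    return s
```

As noted above, this is only usable while the terms decrease reasonably fast; for large z one switches to the asymptotic expansion.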

4.26.2. Bessel function Y_n(z) of the second kind. Like J_n(z), Y_n(z) is a solution of the linear differential equation:

z² y″ + z y′ + (z² − n²) y = 0.

We have Y_{−n}(z) = (−1)^n Y_n(z) according to [1, Eq. (9.1.5)]; we now assume n ≥ 0. When z → 0⁺, Y_n(z) tends to −∞; when z → +∞, Y_n(z) tends to 0 by oscillating around 0 like J_n(z). We deduce from [23, Eq. (9.23)]:

Y_n(−z) = (−1)^n [Y_n(z) + 2i J_n(z)],

which shows that for z > 0, Y_n(−z) is real only when z is a zero of J_n(z); assuming those zeroes are irrational, Y_n(z) is thus NaN for z negative. Formula 9.1.11 from [1] gives:

n n 1 (z/2)− − (n k 1)! 2 Y (z) = − − (z2/4)k + log(z/2)J (z) n − π k! π n Xk=0 (z/2)n ∞ ( z2/4)k (ψ(k + 1) + ψ(n + k + 1)) − , − π k!(n + k)! Xk=0 where ψ(1) = γ, γ being Euler’s constant (see 5.2), and ψ(n + 1) = ψ(n) + 1/n for n 1. Rewriting the− above equation, we get § ≥ n n πY (z) = (z/2)− S + S (z/2) S , n − 1 2 − 3 n 1 (n k 1)! 2 k − − − where S1 = k=0 k! (z /4) is a finite sum, S2 = 2(log(z/2) + γ)Jn(z), and S3 = ( z2/4)k ∞ P − k=0(hk + hn+k) k!(n+k)! , where hk = 1 + 1/2 + + 1/k is the kth harmonic number. Once n ··· Pwe have estimated (z/2)− S1 +S2, we know to which relative precision we need to estimate − n the infinite sum S3. For example, if (z/2) is small, typically a small relative precision on S3 will be enough. We use the following algorithm to estimate S1, with working precision w and rounding to nearest: y (z2)/4 [division by 4 is exact] f ← ◦1 [as an exact integer] s ← 1 for←k from n 1 downto 0 do s (ys)− f ← ◦(n k)(k + 1)f [n!(n k)!k! as exact integer] s ← (s−+ f) − f ←√f ◦ [integer, exact] s ← (s/f) ← ◦ Let (1+θ)ǫj 1 be the maximum relative error on s after the look for k = n j, 1 j n, i.e., − ǫj − ≤ ≤ the computed value is sk(1 + θ) where sk would be the value computed with no roundoff w error, and θ 2− . Before the loop we have ǫ0 = 0. After the instruction s (ys) | | ≤ ǫ − +2 2 ← ◦ w the relative error can be written (1 + θ) j 1 1, since y = z /4(1 + θ′) with θ′ 2− , and the product involves another rounding error.− Since f is exact, the absolute| error| ≤ after ǫj−1+3 s (s+f) can be written smax (1+θ) 1 , where smax is a bound for all computed values← ◦ of s during the loop.| The|| absolute error− | at the| end| of the for-loop can thus be 3n 3n+1 written smax (1 + θ) 1 , and smax (1 + θ) 1 after the instruction s (s/f). | ||w − | | || −3n|+1 w ← ◦ If (3n + 1)2− 1/2, then using Lemma 2, (1 + θ) 1 2(3n + 1)2− . 
Let e be the exponent difference≤ between the maximum| value of s −during| ≤ the for-loop and the final 44 | | value of s, then the relative error on the final s is bounded by e+1 w (3n + 1)2 − . Assuming we compute (z/2)n with correct rounding — using for example the mpfr pow ui n function — and divide S1 by this approximation, the relative error on (z/2)− S1 will be at e+1 w most (3n + 3)2 − . The computation of S2 is easier, still with working precision w and rounding to nearest: t (log(z/2)) u← ◦ (γ) v ←2 ◦ (t + u) [multiplication by 2 is exact] ← ◦ x (Jn(z)) s ← ◦(vx) ← ◦ Since z/2 is exact, the error on t and u is at most one ulp, thus from 2.3 the ulp-error on v exp(t) exp(v) exp(u) exp(v) e+1 § is at most 1/2 + 2 − + 2 − 1/2 + 2 , where e = max(exp(t), exp(u)) e+1≤ w 1 − exp(v). Assuming e + 2 < w, then 1/2 + 2 2 − , thus the total error on v is bounded by v , thus we can take c+ = 2 for v in the product≤ s (vx) (cf 2.4); similarly c+ = 2 applies| | to x (J (z)), thus 2.4 yields the following bound← ◦ for the§ ulp-error on s: ← ◦ n § 1/2 + 3(1/2 + 2e+1) + 3(1/2) = 7/2 + 3 2e+1 2e+4. · ≤ (Indeed, the smallest possible value of e is 1.) − The computation of S3 mimics that of Jn(z). The only difference is that we have to compute the extra term hk + hn+k, that we maintain as an exact rational p/q, p and q being integers: x (zn) y ← ◦(z2)/4 [division by 4 is exact] u ← ◦(n!) t ← ◦(x/u)/2n [division by 2n is exact] ← ◦ p/q hn [exact rational] u ←(pt) s ← ◦(u/q) for←k ◦from 1 do t (ty) t ← −(t/k ◦ ) t ← ◦(t/(k + n)) p/q← ◦ p/q + 1/k + 1/(n + k) [exact] u ←(pt) u ← ◦(u/q) s ← ◦(s + u) if ←u ◦< ulp(s) and z2 2k(k + n) then return s. | | ≤

Using (h_{k+1} + h_{n+k+1}) k ≤ (h_k + h_{n+k})(k+1), which is true for k ≥ 1 and n ≥ 0, the condition z² ≤ 2k(k+n) ensures that the next term of the expansion is smaller than |t|/2, thus the sum of the remaining terms is smaller than |t| < ulp(s). The difference with the error analysis of J_n is that here e_k = 6k+5 instead of e_k = 4k+3. Denote by U an upper bound on the values of |u|, |s| during the for-loop (note that |u| ≥ |t| by construction), and assume we exit at k = K. The error in the value of u at step k is bounded by ǫ_k := U |(1+θ)^{6k+5} − 1|; assuming (6k+5) 2^{−w} ≤ 1/2, Lemma 2 yields ǫ_k ≤ 2U(6k+5) 2^{−w}, and the sum from k = 0 to K gives an absolute error bound on s at the end of the for-loop of:

(K/2 + 1) ulp(U) + 2(3K² + 8K + 5) 2^{−w} U ≤ (6K² + (33/2)K + 11) ulp(U),

where we used 2^{−w} U ≤ ulp(U).

4.27. The Dilogarithm function. The mpfr_li2 function computes the real part of the dilogarithm function, defined by:

Li₂(x) = −∫₀ˣ log(1−t)/t dt.

The above relation defines a multivalued function in the complex plane; we choose a branch so that Li₂(x) is real for x real, x < 1, and we compute only the real part for x real, x ≥ 1.
When x ∈ (0, 1/2], we use the series (see [24, Eq. (5)]):

Li₂(x) = Σ_{n=0}^{∞} B_n/(n+1)! (−log(1−x))^{n+1},

where B_n is the n-th Bernoulli number. Otherwise, we perform an argument reduction using the following identities (see [8]):

x ∈ [2, +∞)     ℜ(Li₂(x)) = π²/3 − (1/2) log²(x) − Li₂(1/x)
x ∈ (1, 2)      ℜ(Li₂(x)) = π²/6 − log(x)[log(x−1) − (1/2) log(x)] + Li₂(1 − 1/x)
x = 1           Li₂(1) = π²/6
x ∈ [1/2, 1)    Li₂(x) = π²/6 − log(x) log(1−x) − Li₂(1−x)
x = 0           Li₂(0) = 0
x ∈ [−1, 0)     Li₂(x) = −(1/2) log²(1−x) − Li₂(x/(x−1))
x ∈ (−∞, −1)    Li₂(x) = −π²/6 − (1/2) log(1−x)[2 log(−x) − log(1−x)] + Li₂(1/(1−x)).

Assume first 0 < x ≤ 1/2; the odd Bernoulli numbers being zero (except B₁ = −1/2), we can rewrite Li₂(x) in the form:

Li₂(x) = −log²(1−x)/4 + S(−log(1−x)),

where

S(z) = Σ_{k=0}^{∞} B_{2k}/(2k+1)! z^{2k+1}.
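For 0 < x ≤ 1/2 this rewriting can be sketched directly, with exact Bernoulli numbers obtained from the standard recurrence (all names are ours):

```python
from fractions import Fraction
import math

def bernoulli_list(nmax):
    """B_0 .. B_nmax from sum_{j=0}^{m} C(m+1, j) B_j = 0 for m >= 1."""
    B = [Fraction(1)]
    for m in range(1, nmax + 1):
        B.append(-sum(math.comb(m + 1, j) * B[j] for j in range(m)) / (m + 1))
    return B

def li2_small(x, K=25):
    """Li2(x) = -log^2(1-x)/4 + sum_{k>=0} B_{2k} z^{2k+1}/(2k+1)!, z = -log(1-x)."""
    z = -math.log1p(-x)                    # the series converges for |z| < 2*pi
    B = bernoulli_list(2 * K)
    s = -z * z / 4
    for k in range(K + 1):
        s += float(B[2 * k]) / math.factorial(2 * k + 1) * z ** (2 * k + 1)
    return s
```

At x = 1/2 the classical closed form Li₂(1/2) = π²/12 − (1/2) log² 2 provides a convenient cross-check.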

Let S_N(z) = Σ_{k≤N} B_{2k}/(2k+1)! z^{2k+1} be the N-th partial sum, and R_N(z) the truncation error. The even Bernoulli numbers satisfy the following inequality for all n ≥ 1 ([1, Inequalities 23.1.15]):

2(2n)!/(2π)^{2n} < |B_{2n}| < 2(2n)!/(2π)^{2n} · 1/(1 − 2^{1−2n}),

so we have for all N ≥ 0:

|B_{2N+2}|/(2N+3)! · |z|^{2N+3} < 2|z|/((1 − 2^{−2N−1})(2N+3)) · (|z|/(2π))^{2N+2},

showing that S(z) converges for |z| < 2π. As the series is alternating, we then have an upper bound for the truncation error R_N(z) when 0 < z:


which gives the following algorithm: Algorithm li2 asympt pos Input: x with x 38, the output precision n, and a rounding mode ≥ ◦n Output: n( (Li2(x))) if it can round exactly, a failure indicator if not u (log◦ xℜ) v ← N(u2) g ← N (v/2) p ← N (π) q ← N(p2) h ← N (q/3) s ← N(g h) ← N − if s cannot be exactly rounded according to the given mode n then return failed ◦ else return (n) ◦p Else, if x [2, 38[ or if x 38 but the above calculation cannot give exact rounding, we use the relation∈ ≥ 1 log2 1 1 log2 x π2 Li (x) = S log(1 ) + − x + , 2 − − − x  4  − 2 3 which is computed as follows: Algorithm li2 2..+ Input: x with x [2∞, + [, the output precision n, and a rounding mode ∈ ∞ ◦n Output: n( (Li2(x))) ◦ ℜ 1 y ( 1/x) error(y) 2 ulp(y) ← N − ≤ 1 exp(u) u ( log(1 + y)) error(u) (1 + 2 − )ulp(u) ← △ − ≤ exp(q) q ( S(u)) error(q) 2(k + 1)2− ulp(q) v ← N(u−2) ≤ ← △ 3 exp(u) v (v/4) error(v) (5 + 2 − )ulp(v) ← △ ≤ r (q + v) error(r) 2κr ulp(r) w← N (log x) ≤ w ← △(w2) w ← N (w/2) error(w) 9 ulp(w) ← N ≤ 2 s (r w) error(s) 2κs ulp(s) p ← N(π)− ≤ p ← △(p2) ← N 19 2 exp(p) p (p/3) error(p) ulp(p) 2 − ulp(p) ← N ≤ 2 ≤ t (s + p) error(t) 2κt ulp(t) if t←can N be exactly rounded according≤ to ◦n then return n(t) else increase◦ working precision and restart calculation with κ = 2 + max( 1, log (k + 1) + 1 exp(r), 3 + max(1, exp(u)) + exp(v) exp(r)) r − ⌈ 2 ⌉ − − − κ = 2 + max( 1,κ + exp(r) exp(s), 3 + exp(w) exp(s)) s − r − − κt = 2 + max( 1,κs + exp(s) exp(t), 2 exp(t)) − − 48− When x ]1, 2[, we use the relation ∈ log2 x π2 Li (x) = S(log x) + log x log(x 1) + 2 4 − − 6 which is implemented as follows Algorithm li2 1..2 Input: x with x ]1, 2[, the output precision n, and a rounding mode ∈ ◦n Output: n( (Li2(x))) l (log◦ xℜ) error(l) ulp(l) ← △ ≤ 1 exp(q) q (S(l)) error(q) (k + 1)2 − ulp(q) u ← N (l2) ≤ ← N 5 u (u/4) error(u) 2 ulp(u) ← N ≤ 1 exp(q) r (q + u) error(r) (3 + (k + 1)2 − )ulp(r) ← N ≤ 1 y (x 1) error(y) 2 ulp(y) ← N − ≤ exp(v) v (log y) error(v) (1 + 2− )ulp(v) ← △ 
≤ 15 1 exp(v) w (ul) error(w) ( 2 + 2 − )ulp(w) ← N ≤ 1 exp(q) 1 exp(v) s (r w) error(s) (11 + (k + 1)2 − + 2 − )ulp(s) p ← N(π)− ≤ p ← △(p2) ← N 19 p (p/6) error(p) 2 ulp(p) ← N ≤ 1 exp(q) 1 exp(v) t (s + p) error(t) (31 + (k + 1)2 − + 2 − )ulp(t) if t←can N be exactly rounded according≤ to ◦n then return n(t) else increase◦ working precision and restart calculation we use the fact that S(log x) 0 and u 0 for error(r), that r 0 and log x log(x 1) 0 for error(s), and that s 0 for≥ error(t).≥ ≥ − − ≥ ≥ π2 When x = 1, we have a simpler value Li2(1) = 6 whose computation is implemented as follows Algorithm li2 1 Input: the output precision p, and a rounding mode p 2 ◦ Output: ( π ) ◦p 6 u (π) u ← △(u2) ← N 19 u (u/6) error(u) 2 ulp(u) if u←can N be exactly rounded≤ according to ◦p then return p(u) else increase◦ working precision and restart calculation When x 1 , 1 , we use the relation ∈ 2   log2 x π2 Li (x) = S( log x) log x log(1 x) + + 2 − − − − 4 6 which is implemented as follows Algorithm li2 0.5..1 Input: x with x 1 , 1 , the output precision n, and a rounding mode ∈ 2 ◦n Output: n(Li2(x))  ◦ 49 l ( log x) error(l) ulp(l) ← △ − ≤ 1 exp(q) q ( S(l)) error(q) (k + 1)2 − ulp(q) ← N − ≤ 1 y (1 x) error(y) 2 ulp(y) ← N − ≤ exp(v) u (log y) error(u) (1 + 2− )ulp(u) ← △ ≤ 9 1 exp(v) v (ul) error(v) ( + 2 − )ulp(v) ← N ≤ 2 r (q + v) error(r) 2κr ulp(r) w← N (l2) ≤ w ← N (u/4) error(w) 5 ulp(w) ← N ≤ 2 s (r + w) error(s) 2κs ulp(s) p ← N(π) ≤ p ← △(p2) ← N 19 3 exp(p) p (p/6) error(p) ulp(p) 2 − ulp(p) ← N ≤ 2 ≤ t (s + p) error(t) 2κt ulp(t) if t←can N be exactly rounded according≤ to ◦n then return n(t) else increase◦ working precision and restart calculation where κ = 2 + max(3, log (k + 1) + 1 exp(q), 1 exp(u)) r ⌈ 2 ⌉ − − κ = 2 + max( 1,κ + exp(r) exp(s), 2 + exp(w) exp(s)) s − r − − κ = 2 + max( 1,κ + exp(s) exp(t), 3 exp(t)) t − s − − Near 0, we can use the relation

∞ xk Li (x) = 2 k2 Xn=0 x2 which is true for x 1 [FIXME: ref]. If x 0, we have 0 Li2(x) x 4 and if x is | | ≤ π2 2 2 ≤2exp(x)+1 ≤ − ≤ positive, 0 Li2(x) x ( 6 1)x x 2 ulp(x). When x ≤ [ 1, 0[,− we≤ use the− relation≤ ≤ ∈ − log2(1 x) Li (x) = S( log(1 x)) − 2 − − − − 4 which is implemented as follows Algorithm li2 -1..0 Input: x with x ] 1, 0[, the output precision n, and a rounding mode ∈ − ◦n Output: n(Li2(x)) ◦ 1 y (1 x) error(y) 2 ulp(y) ← N − ≤ exp(l) l (log y) error(l) (1 + 2− )ulp(l) ← △ ≤ 1 exp(r) r ( S(l)) error(r) (k + 1)2 − ulp(r) u ← N (−l2) ≤ ← N − 9 exp(l) u (u/4) error(u) ( + 2− )ulp(u) ← N ≤ 2 s (r + u) error(s) 2κs ulp(s) if s←can N be exactly rounded according≤ to ◦n then return n(s) else increase◦ working precision and restart calculation 50 with

κ_s = 2 + max(3, ⌈log₂(k+1)⌉ + 1 − exp(r), −exp(l))

When x is large and negative, we can use an asymptotic expansion near −∞:

|Li₂(x) + log²(−x)/2 + π²/6| ≤ 1/|x|,

which gives the following algorithm:

Algorithm li2 asympt neg Input: x with x 7, the output precision n, and a rounding mode ≤ − ◦n Output: n(Li2(x)) if it can round exactly, a failure indicator if not l (log(◦ x)) f← N (l2) − g ← N(f/2) p ← N (π) q ← N(p2) h ← N (q/3) s ← N(g h) ← N − if s cannot be exactly rounded according to the given mode n then return failed ◦ else return (s) ◦n When x ] 7, 1[ or if the above computation cannot give exact rounding, we use the relation ∈ − −

Li₂(x) = S(log(1 − 1/x)) − log²(−x)/4 − (log(−x) log(1−x))/2 + log²(1−x)/4 − π²/6

which is implemented as follows

Algorithm li2 - ..-1 Input: x with x ∞] , 1[, the output precision n, and a rounding mode ∈ − ∞ − ◦n Output: n(Li2(x)) ◦ 51 y ( 1/x) z ← N (1− + y) z ← N (log z) ← N 1 exp(o) o (S(z)) error(o) (k + 1)2 − ulp(o) y ← N (1 x) ≤ ← N − exp(u) u (log y) error(u) (1 + 2− )ulp(u) v ← △(log( x)) error(v) ≤ ulp(v) w← △ (uv)− ≤ w ← N (w/2) error(w) ( 9 + 1)ulp(w) ← N ≤ 2 q (o w) error(q) 2κq ulp(q) v ← N (v2−) ≤ v ← N (v/4) error(v) 9 ulp(v) ← N ≤ 2 r (q v) error(r) 2κr ulp(r) w← N (u−2) ≤ w ← N (w/4) error(w) 17 ulp(w) ← N ≤ 2 s (r + w) error(s) 2κs ulp(s) p ← N(π) ≤ p ← △(p2) ← N 19 3 exp(p) p (p/6) error(p) ulp(p) 2 − ulp(p) ← N ≤ 2 ≤ t (s p) error(t) 2κt ulp(t) if t←can N be− exactly rounded according≤ to ◦n then return n(t) else increase◦ working precision and restart calculation where κ = 1 + max(3, log (k + 1) + 1 exp(q)) q ⌈ 2 ⌉ − κ = 2 + max( 1,κ + exp(q) exp(r), 3 + exp(v) exp(r)) r − q − − κ = 2 + max( 1,κ + exp(r) exp(s), 3 + exp(w) exp(s)) s − r − − κ = 2 + max( 1,κ + exp(s) exp(t), 3 exp(t)) t − s − − 4.28. Summary. Table 1 presents the generic error for several operations, assuming all variables have a mantissa of p bits, and no overflow/underflow occurs. The inputs u and v are approximations of x and y with u x k ulp(u) and v y k ulp(v). The | − | ≤ u | − | ≤ v additional rounding error cw is 1/2 for rounding to nearest, and 1 otherwise. The value cu± 1 p equals 1 ku2 − . ± eu ew ev ew Remark : in the addition case, when uv > 0, necessarily one of 2 − and 2 − is less than 1/2, thus err(w)/ulp(w) c + max(k + k /2,k /2 + k ) c + 3 max(k ,k ). ≤ w u v u v ≤ w 2 u v 5. Mathematical constants 5.1. The constant π. The computation of π uses the AGM formula µ2 π = , D where µ = AGM( 1 , 1) is the common limit of the sequences a = 1, b = 1 , a = √2 0 0 √2 k+1 1 k 1 2 2 (a + b )/2, b = √a b , D = ∞ 2 − (a b ). This formula can be evaluated k k k+1 k k 4 − k=1 k − k P 52 w err(w)/ulp(w) c + ... 
w           err(w)/ulp(w) ≤ c_w + ···                  special case
◦(u + v)    k_u 2^{e_u−e_w} + k_v 2^{e_v−e_w}          k_u + k_v if uv ≥ 0
◦(u · v)    (1 + c_u⁺) k_u + (1 + c_v⁺) k_v            2k_u + 2k_v if u ≥ x, v ≥ y
◦(1/v)      4 k_v                                      2 k_v if v ≤ y
◦(u/v)      4 k_u + 4 k_v                              2k_u + 2k_v if v ≤ y
◦(√u)       2 k_u/(1 + c_u⁻)                           k_u if u ≤ x
◦(e^u)      e^{k_u 2^{e_u−p}} 2^{e_u+1} k_u            2^{e_u+1} k_u if u ≥ x
◦(log u)    k_u 2^{1−e_w}

Table 1. Generic error

efficiently as shown in [19], starting from a0 = A0 = 1,B0 = 1/2,D0 = 1/4, where Ak and 2 2 Bk represent respectively ak and bk:

S_{k+1} ← (A_k + B_k)/4
b_k ← √(B_k)
a_{k+1} ← (a_k + b_k)/2
A_{k+1} ← a_{k+1}²
B_{k+1} ← 2(A_{k+1} − S_{k+1})
D_{k+1} ← D_k − 2^k (A_{k+1} − B_{k+1})

For each variable x approximating a true value x̃, denote by ǫ(x) the exponent of the maximal error, i.e. x = x̃(1 ± δ)^e with |e| ≤ ǫ(x), and δ = 2^{−p} for precision p (we assume all roundings to nearest). We can prove by an easy induction that ǫ(a_k) = 3·2^{k−1} − 1 for k ≥ 1, ǫ(A_k) = 3·2^k − 1, and ǫ(B_k) = 6·2^k − 6. Thus the relative error on B_k is at most 12·2^{k−p} (assuming 12·2^{k−p} ≤ 1), which is at most 12·2^k ulp(B_k), since 1/2 ≤ B_k.
If we stop when |A_k − B_k| ≤ 2^{k−p}, where p is the working precision, then |µ² − B_k| ≤ 13·2^{k−p}. The error on D_k is bounded by Σ_{j=0}^{k} 2^j (6·2^{k−p} + 12·2^{k−p}) ≤ 6·2^{2k−p+2}, which gives a relative error less than 2^{2k−p+7} since D_k ≥ 3/16.
Thus we have π = B_k(1+ǫ)/(D_k(1+ǫ′)) with |ǫ| ≤ 13·2^{k−p} and |ǫ′| ≤ 2^{2k−p+7}. This gives π = (B_k/D_k)(1+ǫ″) with |ǫ″| ≤ 2|ǫ| + |ǫ′| ≤ (26 + 2^{k+7}) 2^{k−p} ≤ 2^{2k−p+8}, assuming |ǫ′| ≤ 1.

5.2. Euler's constant. Euler's constant is computed using the formula γ = S(n) − R(n) − log n where:

S(n) = Σ_{k=1}^{∞} (n^k (−1)^{k−1})/(k! k),    R(n) = ∫_n^∞ (e^{−u}/u) du ∼ (e^{−n}/n) Σ_{k=0}^{∞} k!/(−n)^k.

This identity is attributed to Sweeney by Brent [4]. (See also [21].) We have S(n) =2 F (1, 1; 2, 2; n) and R(n) = Ei(1, n). 2 − Evaluation of S(n). As in [4], let α 4.319136566 the positive root of α + 2 = α log α, and N = αn . We approximate S(n) by∼ ⌈ ⌉ N k k 1 n ( 1) − S′(n) = − . k!k Xk=1 53 The remainder term S(n) S′(n) is bounded by: − ∞ nk S(n) S′(n) . | − | ≤ k! k=XN+1 Since k! (k/e)k, and k N + 1 αn, we have: ≥ ≥ ≥ ∞ k ∞ k N ne e e 2n S(n) S′(n) 2 2e− | − | ≤ k ≤ α ≤ α ≤ k=XN+1   k=XN+1     α 2 since (e/α) = e− . To approximate S′(n), we use the binary splitting method, which computes integers T T and Q such that S′(n) = exactly, then we compute t = (T ), and s = (t/Q), both with Q ◦ ◦ rounding to nearest. If the working precision is w, we have t = T (1+θ1) and s = t/Q(1+θ2) w where θi 2− . If follows s = T/Q(1 + θ1)(1 + θ2), thus the error on s is less than 3 ulps, | | ≤ w 2 w since (1 + 2− ) 1 + 3 2− . ≤ · Evaluation of R(n). We estimate R(n) using the terms up to k = n 2, again as in [4]: − n n 2 e− − k! R′(n) = . n ( n)k Xk=0 − −u −n Let us introduce I = ∞ e du. We have R(n) = I and the recurrence I = e kI , k n uk 1 k nk − k+1 which gives R n 2 n − e− k! n 1 R(n) = + ( 1) − (n 1)!I . n ( n)k − − n Xk=0 − Bounding n! by (n/e)n 2π(n + 1) which holds3 for n 1, we have: ≥ p n n 1 n n! ∞ e− n − e− R(n) R′(n) = (n 1)!In du 2π(n + 1) | − | − ≤ n Z un ≤ en (n 1)nn 1 n p − − and since 2π(n + 1)/(n 1) 1 for n 9: − ≤ ≥ p 2n R(n) R′(n) e− for n 9. | − | ≤ ≥ Thus we have: 2n γ S′(n) R′(n) log n 3e− for n 9. | − − − | ≤ ≥ To approximate R′(n), we use the following: n m prec(x) log 2 a ←2m − ⌊ ⌋ s ← 1 for←k from 1 to n do a ka ← ⌊ n ⌋ s s + ( 1)ka t ←s/n − ← ⌊ ⌋ +1 2 − + θ 3 Formula 6.1.38 from [1] gives x! = √2πxx / e x 12x for x > 0 and 0 < θ < 1. Using it for x 1, we θ t 2 2x −2x 1 ≥ have 0 < 6x < 1, and e < 1 + 2t for 0 < t < 1, thus (x!) 2πx e (x + 3 ). 54 ≤ x t/2m ← n return r = e− x
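The S′(n) − R′(n) − log n computation can be sketched with exact rationals (a hypothetical helper, not MPFR's integer loop); the partial sums are summed exactly before any rounding, which sidesteps the cancellation in the alternating series:

```python
import math
from fractions import Fraction

def euler_gamma_sweeney(n=20):
    """gamma ~ S'(n) - R'(n) - log n; truncation error about 3*exp(-2n)."""
    N = math.ceil(4.319136566 * n)                  # N = ceil(alpha * n)
    S = sum(Fraction((-1) ** (k - 1) * n ** k, math.factorial(k) * k)
            for k in range(1, N + 1))
    T = sum(Fraction(math.factorial(k), (-n) ** k)  # terms k = 0 .. n-2
            for k in range(n - 1))
    R = float(T) * math.exp(-n) / n
    return float(S) - R - math.log(n)
```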

The absolute error ǫk on a at step k satisfies ǫk 1 + k/nǫk 1 with ǫ0 = 0. As k/n 1, we ≤ − ≤ have ǫk k, whence the error on s is bounded by n(n+1)/2, and that on t by 1+(n+1)/2 n + 1 since≤ n 1. The operation x t/2m is exact as soon as prec(x) is large enough, thus≤ ≥ en← n the error on x is at most (n + 1) 2prec(x) . If e− is computed with m bits, then the error on it n 1 m prec(x) 2 is at most e− 2 − . The error on r is then (n + 1 + 2/n)2− + ulp(r). Since x 3 n for prec(x) 3 ≥ n 2, and x2− < ulp(x), this gives an error bounded by ulp(r)+(n+1+2/n) 2n ulp(r) ≥ e−n 1 ≤ 4ulp(r) for n 2 — if prec(x) = prec(r). Now since r n 8 , that error is less than ulp(1/2). ≥ ≤ ≤ Final computation. We use the formula γ = S(n) R(n) log n with n such that 2n prec log 2 − − e− ulp(1/2) = 2− , i.e. n prec : ≤ ≥ 2 s S′(n) l ← log(n) ← r R′(n) return← (s l) r − − 1 e−n Since the final result is in [ , 1], and R′(n) , then S′(n) approximates log n. If the 2 ≤ n target precision is m, and we use m + log2(prec) bits to evaluate s and l, then the error on s l will be at most 3ulp(1/2), so the error⌈ on (s ⌉l) r is at most 5ulp(1/2), and adding the − 2n − − 3e− truncation error, we get a bound of 8ulp(1/2). [FIXME: check with new method to compute S] 5.2.1. A faster formula. Brent and McMillan give in [5] a faster algorithm (B2) using the modified Bessel functions, which was used by Gourdon and Demichel to compute 108,000,000 digits of γ in October 1999: S K γ = 0 − 0 log n, I0 − n2k 1 1 where S = ∞ H , H = 1 + + + being the k-th harmonic number, K = 0 k=1 (k!)2 k k 2 ··· k 0 π 2n k [(2k)!]2 n2k ∞P ∞ 4n e− k=0( 1) (k!)3(64n)k , and I0 = k=0 (k!)2 . − 2n pWe havePI e (see [5] page 306).P From the remark following formula 9.7.2 of [1], the 0 ≥ √4πn remainder in the truncated expansion for K0 up to k does not exceed the (k + 1)-th term, 2k π 2n K0 4n K 1 n whence K0 e− and πe− as in formula (5) of [5]. Let I′ = − 2 with ≤ 4n I0 ≤ 0 k=0 (k!) 
K = βn , andpβ is the root of β(log β 1) = 3 (β 4.971 ...) then P ⌈ ⌉ − ∼ 6n β e− I I′ . | 0 − 0| ≤ 2π(β2 1) n − π 2n 4n 1 k [(2k)!]2 Let K′ = e− − ( 1) , then bounding by the next term: 0 4n k=0 − (k!)3(64n)k p P 6n 6n (8n + 1) e− 1 e− K0 K′ . | − 0| ≤ 16√2n n ≤ 2 n We get from this 6n K0 K0′ 1 e− π 8n e− . I − I ≤ 2I n ≤ rn 0 0′ 0 55

Let $S_0' = \sum_{k=1}^{K-1} \frac{n^{2k}}{(k!)^2} H_k$; then, using $\frac{H_{k+1}}{H_k} \le \frac{k+1}{k}$ and the same bound $K$ as for $I_0'$ ($4n \le K \le 5n$), we get:
\[ |S_0 - S_0'| \le \frac{\beta \, e^{-6n}}{2\pi(\beta^2 - 1)\, n} \, H_K. \]
We deduce:
\[ \left| \frac{S_0}{I_0} - \frac{S_0'}{I_0'} \right| \le \frac{\sqrt{4\pi n}\, \beta}{\pi(\beta^2 - 1)} \, H_K \, \frac{e^{-8n}}{n} \le e^{-8n}. \]
Hence we have:
\[ \left| \gamma - \left( \frac{S_0'}{I_0'} - \frac{K_0'}{I_0'} - \log n \right) \right| \le \left( 1 + \sqrt{\frac{\pi}{n}} \right) e^{-8n} \le 3 e^{-8n}. \]
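As a sanity check of the B2 formula, the three truncated series can be summed directly in double precision (an illustrative sketch with hypothetical helper names; MPFR evaluates these series in fixed point with the rigorous truncation bounds above):

```python
from math import ceil, exp, factorial, log, pi, sqrt

BETA = 4.971  # root of beta*(log(beta) - 1) = 3

def euler_gamma_b2(n):
    """gamma ~ (S0' - K0')/I0' - log n, with error O(e^(-8n))."""
    K = ceil(BETA * n)
    H = 0.0                                  # harmonic number H_k
    I0 = S0 = 0.0
    for k in range(K):
        term = (n ** k / factorial(k)) ** 2  # n^(2k)/(k!)^2
        I0 += term
        S0 += term * H                       # H_0 = 0: S0 starts at k = 1
        H += 1.0 / (k + 1)
    # truncated asymptotic expansion of K0, 4n terms
    K0 = sqrt(pi / (4 * n)) * exp(-2 * n) * sum(
        (-1) ** k * factorial(2 * k) ** 2
        / (factorial(k) ** 3 * (64 * n) ** k)
        for k in range(4 * n))
    return (S0 - K0) / I0 - log(n)
```

Already at $n = 4$ the bound $3e^{-8n} \approx 4 \cdot 10^{-14}$ puts the truncation error below double-precision roundoff.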

5.3. The log 2 constant. This constant is used in the exponential function, and in the base-2 exponential and logarithm. We use the following formula (formula (30) from [9]):
\[ \log 2 = \frac{3}{4} \sum_{n \ge 0} (-1)^n \frac{n!^2}{2^n (2n+1)!}. \]
Let $w$ be the working precision. We take $N = \lfloor w/3 \rfloor + 1$, and evaluate exactly using binary splitting:
\[ \frac{T}{Q} = \frac{3}{4} \sum_{n=0}^{N-1} (-1)^n \frac{n!^2}{2^n (2n+1)!}, \]
where $T$ and $Q$ are integers. Since the series has alternating signs with decreasing absolute values, the truncation error is bounded by the first neglected term, which is less than $2^{-3N-1}$ for $N \ge 2$; since $N \ge (w+1)/3$, this error is bounded by $2^{-w-2}$.

We then use the following algorithm:

  t ← ◦(T) [rounded to nearest]
  q ← ◦(Q) [rounded to nearest]
  x ← ◦(t/q) [rounded to nearest]

Using Higham's notation, we write $t = T(1+\theta_1)$, $q = Q(1+\theta_2)$, $x = t/q\,(1+\theta_3)$, with $|\theta_i| \le 2^{-w}$. We thus have $x = T/Q\,(1+\theta)^3$ with $|\theta| \le 2^{-w}$. Since $T/Q \le 1$, the total absolute error on $x$ is thus bounded by $3|\theta| + 3\theta^2 + |\theta|^3 + 2^{-w-2} < 2^{-w+2}$ as long as $w \ge 3$.

5.4. Catalan's constant. Catalan's constant is defined by
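For illustration, the truncated series can be summed exactly with Python rationals standing in for the integer pair $T/Q$ that binary splitting would return (a sketch under that assumption; `log2_approx` is our name, not an MPFR function):

```python
from fractions import Fraction
from math import factorial, log

def log2_approx(w):
    """Exact rational T/Q ~ log 2, truncation error below 2^-(w+2)."""
    N = w // 3 + 1
    s = sum(Fraction((-1) ** n * factorial(n) ** 2,
                     2 ** n * factorial(2 * n + 1))
            for n in range(N))
    return Fraction(3, 4) * s  # exact; round to w bits only at the end
```

Rounding this exact rational to the working precision in one final step is what keeps the total error within the $2^{-w+2}$ bound derived above.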

\[ G = \sum_{k=0}^{\infty} \frac{(-1)^k}{(2k+1)^2}. \]
We compute it using formula (31) of Victor Adamchik's document "33 representations for Catalan's constant"⁴:
\[ G = \frac{\pi}{8} \log(2 + \sqrt{3}) + \frac{3}{8} \sum_{k=0}^{\infty} \frac{(k!)^2}{(2k)! \, (2k+1)^2}. \]

⁴ http://www-2.cs.cmu.edu/~adamchik/articles/catalan/catalan.htm

Let $f(k) = \frac{(k!)^2}{(2k)!\,(2k+1)^2}$, $S(0,K) = \sum_{k=0}^{K-1} f(k)$, and $S = S(0,\infty)$. We compute $S(0,K)$ exactly by binary splitting. Since $f(k)/f(k-1) = \frac{k(2k-1)}{2(2k+1)^2} \le 1/4$, the truncation error on $S$ is bounded by $\frac{4}{3} f(K) \le \frac{4}{3} \cdot 4^{-K}$. Since $S$ is multiplied by $3/8$, the corresponding contribution to the absolute error on $G$ is $2^{-2K-1}$. As long as $2K+1$ is greater than or equal to the working precision $w$, this truncation error is less than one ulp of the final result.

  K ← ⌈(w−1)/2⌉
  T/Q ← S(0,K) [exact, rational]
  T ← 3T [exact, integer]
  t ← ◦(T) [up]
  q ← ◦(Q) [down]
  s ← ◦(t/q) [nearest]
  x ← ◦(√3) [up]
  y ← ◦(2 + x) [up]
  z ← ◦(log y) [up]
  u ← ◦(π) [up]
  v ← ◦(uz) [nearest]
  g ← ◦(v + s) [nearest]
  return g/8 [exact]

The error on $t$ and $q$ is less than one ulp; using the generic error on the division, since $t \ge T$ and $q \le Q$, the error on $s$ is at most $9/2$ ulps. The error on $x$ is at most 1 ulp; since $1 < x < 2$ (assuming $w \ge 2$), $\mathrm{ulp}(x) = \frac{1}{2} \mathrm{ulp}(y)$, thus the error on $y$ is at most $\frac{3}{2} \mathrm{ulp}(y)$. The generic error on the logarithm (§2.9) gives an error bound of $1 + \frac{3}{2} \cdot 2^{2 - \exp(z)} = 4$ ulps for $z$ (since $3 \le y < 4$, we have $1 \le z < 2$, so $\exp(z) = 1$). The error on $u$ is at most 1 ulp; thus, using the generic error on the multiplication, since both $u$ and $z$ are upper approximations, the error on $v$ is at most 11 ulps. Finally, the error on $g$ is at most $11 + 9/2 = 31/2$ ulps. Taking into account the truncation error, this gives $33/2$ ulps.
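In double precision the scheme above reads as follows (a sketch only: it omits the directed roundings and the exact binary splitting that the real computation relies on, and `catalan_approx` is our name for it):

```python
from math import factorial, log, pi, sqrt

def catalan_approx(w):
    """G ~ (pi/8) log(2 + sqrt(3)) + (3/8) S(0,K), truncation 2^-(2K+1)."""
    K = w // 2  # = ceil((w-1)/2), so that 2K+1 >= w
    S = sum(factorial(k) ** 2 / (factorial(2 * k) * (2 * k + 1) ** 2)
            for k in range(K))
    return pi / 8 * log(2 + sqrt(3)) + 3 / 8 * S
```

Since consecutive terms shrink by a factor of at least 4, about $w/2$ terms suffice for $w$ bits, which is what makes the binary-splitting evaluation of $S(0,K)$ cheap.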

References

[1] Abramowitz, M., and Stegun, I. A. Handbook of Mathematical Functions. Dover, 1973. http://members.fortunecity.com/aands/toc.htm.
[2] Borwein, J. M., and Borwein, P. B. Pi and the AGM: A Study in Analytic and Computational Complexity. Wiley, 1998.
[3] Borwein, P. An efficient algorithm for the Riemann zeta function, Jan. 1995. 9 pages. Preprint available at http://eprints.cecm.sfu.ca/archive/00000107/.
[4] Brent, R. P. A Fortran multiple-precision arithmetic package. ACM Trans. Math. Softw. 4, 1 (1978), 57–70.
[5] Brent, R. P., and McMillan, E. M. Some new algorithms for high-precision computation of Euler's constant. Mathematics of Computation 34, 149 (1980), 305–312.
[6] Brent, R. P., and Zimmermann, P. Modern Computer Arithmetic. Version 0.1.1, 2006. http://www.loria.fr/~zimmerma/mca/pub226.html.
[7] Demmel, J., and Hida, Y. Accurate floating point summation. http://www.cs.berkeley.edu/~demmel/AccurateSummation.ps, May 2002.
[8] Ginsberg, E. S., and Zaborowski, D. The dilogarithm function of a real argument [S22]. Communications of the ACM 18, 4 (April 1975), 200–202.
[9] Gourdon, X., and Sebah, P. The logarithmic constant: log 2, Jan. 2004.
[10] Graillat, S. Fiabilité des algorithmes numériques : pseudosolutions structurées et précision. PhD thesis, Université de Perpignan Via Domitia, 2005.
[11] Higham, N. J. Accuracy and Stability of Numerical Algorithms, 2nd ed. SIAM, 2002.
[12] Hull, T. E., and Abrham, A. Variable precision exponential function. ACM Trans. Math. Softw. 12, 2 (1986), 79–91.
[13] Jeandel, E. Évaluation rapide de fonctions hypergéométriques. Rapport technique 242, Institut National de Recherche en Informatique et en Automatique, July 2000. 17 pages, http://www.inria.fr/rrrt/rt-0242.html.
[14] Jones, C. B. A significance rule for multiple-precision arithmetic. ACM Trans. Math. Softw. 10, 1 (1984), 97–107.
[15] Paterson, M. S., and Stockmeyer, L. J. On the number of nonscalar multiplications necessary to evaluate polynomials. SIAM J. Comput. 2, 1 (1973), 60–66.
[16] Pétermann, Y.-F. S., and Rémy, J.-L. Arbitrary precision error analysis for computing ζ(s) with the Cohen–Olivier algorithm: Complete description of the real case and preliminary report on the general case. Research Report 5852, INRIA, 2006.
[17] Pétermann, Y.-F. S., and Rémy, J.-L. On the Cohen–Olivier algorithm for computing ζ(s): Error analysis in the real case for an arbitrary precision. Advances in Applied Mathematics 38 (2007), 54–70.
[18] Pugh, G. R. An Analysis of the Lanczos Gamma Approximation. PhD thesis, University of British Columbia, 2004. http://oldmill.uchicago.edu/~wilder/Code/gamma/docs/Pugh.pdf.
[19] Schönhage, A., Grotefeld, A. F. W., and Vetter, E. Fast Algorithms: A Multitape Turing Machine Implementation. BI Wissenschaftsverlag, 1994.
[20] Smith, D. M. Algorithm 693: A Fortran package for floating-point multiple-precision arithmetic. ACM Trans. Math. Softw. 17, 2 (1991), 273–283.
[21] Smith, D. M. Algorithm 814: Fortran 90 software for floating-point multiple precision arithmetic, gamma and related functions. ACM Trans. Math. Softw. 27, 4 (2001), 377–387.
[22] Spouge, J. L. Computation of the gamma, digamma, and trigamma functions. SIAM Journal on Numerical Analysis 31, 3 (1994), 931–944.
[23] Temme, N. M. Special Functions: An Introduction to the Classical Functions of Mathematical Physics. John Wiley & Sons, Inc., 1996.
[24] Vollinga, J., and Weinzierl, S. Numerical evaluation of multiple polylogarithms. Computer Physics Communications 167 (2005), 177–194.
