When double rounding is odd

Sylvie Boldo and Guillaume Melquiond
Laboratoire de l'Informatique du Parallélisme, UMR 5668 CNRS, ENS Lyon, INRIA, UCBL
46 allée d'Italie, 69364 Lyon Cedex 07, France
E-mail: {Sylvie.Boldo, Guillaume.Melquiond}@ens-lyon.fr

Abstract — Many general purpose processors (including Intel's) may not always produce the correctly rounded result of a floating-point operation due to double rounding. Instead of rounding the value to the working precision, the value is first rounded in an intermediate extended precision and then rounded in the working precision; this often means a loss of accuracy. We suggest the use of rounding to odd as the first rounding in order to regain this accuracy: we prove that the double rounding then gives the correct rounding to the nearest value. To increase the trust in this result, as this rounding is unusual and this property is surprising, we formally proved it using the Coq automatic proof checker.

Keywords — Floating-point, double rounding, formal proof, Coq.

I. INTRODUCTION

Floating-point users expect their results to be correctly rounded: this means that the result of a floating-point operation is the same as if it were first computed with infinite precision and then rounded to the precision of the destination format. This is required by the IEEE-754 standard [STE 81], [STE 87] for atomic operations, and the standard is followed by modern general-purpose processors.

But this standard does not require the FPU to directly handle each floating-point format (single, double, double extended): some systems deliver results only to extended destinations. On such a system, the user shall be able to specify that a result be rounded instead to a smaller precision, though it may be stored in the extended format. This happens in practice with some processors such as the Intel x86, which computes with 80 bits before rounding to the IEEE double precision (53 bits), or the PowerPC, which provides IEEE single precision by double rounding from IEEE double precision.

Hence, double rounding may occur if the user did not explicitly specify the destination format beforehand. Double rounding consists in a first rounding in an extended precision and then a second rounding in the working precision. As described in Section II, this rounding may be erroneous: the final result is sometimes different from the correctly rounded result.

This would not be a problem if compilers did set the precision of each floating-point operation correctly on processors that only work in an extended format. To increase efficiency, they do not usually force the rounding to the wanted format, since it is a costly operation: the processor pipeline has to be flushed in order to set the new precision. If the precision had to be modified before each operation, it would defeat the whole purpose of the pipeline, and computationally intensive programs would get excessively slow.

As the precision is not explicitly set to match the destination format, there is a first rounding to the extended precision during the floating-point operation itself, and a second rounding when the result is stored to memory.

Therefore, double rounding is usually considered a dangerous feature leading to unexpected inaccuracy. Nevertheless, double rounding is not necessarily a threat: we give a double rounding algorithm that ensures the correctness of the result, meaning that the result is the same as if only one direct rounding had happened. The idea is to prevent the first rounding from approaching the tie-breaking value between the two possible floating-point results.

This algorithm only deals with two consecutive roundings. But we have also tried to extend this property to situations where an arithmetic operation is executed between the two roundings. This has led us to the other algorithm we present. This new algorithm allows us to correctly compute the rounding of a sum of floating-point numbers under mild assumptions.

II. DOUBLE ROUNDING

A. Floating-point definitions

Our formal proofs are based on the floating-point formalization of M. Daumas, L. Rideau and L. Théry, and on the corresponding library developed by L. Théry and one of the authors [DAU 01], [BOL 04a] in Coq [BER 04]. Floating-point numbers are represented by pairs (n, e) that stand for n × 2^e. We use both an integral signed mantissa n and an integral signed exponent e for the sake of simplicity.

A floating-point format is denoted by B and is a pair composed of the lowest available exponent −E and the precision p. We do not set an upper bound on the exponent, as overflows do not matter here (see below). A pair (n, e) is representable when |n| < 2^p and e ≥ −E. We denote by F the subset of real numbers represented by these pairs for a given format B. From now on, only the representable floating-point numbers will be referred to; they will simply be called floating-point numbers.

All the IEEE-754 rounding modes are also defined in the Coq library, especially the default rounding: rounding to nearest even, denoted by ◦. We have f = ◦(x) if f is the floating-point number closest to x; when x is halfway between two consecutive floating-point numbers, the one with an even mantissa is chosen.

A rounding mode is defined in the Coq library as a relation between a real number and a floating-point number, not as a function from real values to floats. Indeed, there may be several floats corresponding to the same real value. For a relation, a weaker property than being a rounding mode is being a faithful rounding. A floating-point number f is a faithful rounding of a real x if it is either the rounding up or the rounding down of x, as shown in Figure 1. When x is a floating-point number, it is its own and only faithful rounding. Otherwise, there always are two faithful roundings when no overflow occurs.

Fig. 1. Faithful roundings.
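To make these definitions concrete, here is a small Python model of them: numbers are exact rationals, a format is reduced to its precision p (exponent bounds are ignored, as overflow and underflow play no role in the following sketches), and rounding to nearest even is computed on the integral mantissa. This model is only meant for experimentation; all names in it are ours and nothing comes from the Coq development.

    from fractions import Fraction

    def round_nearest_even(x: Fraction, p: int) -> Fraction:
        # Round the rational x to a float n * 2^e with |n| < 2^p,
        # to nearest, ties broken to the even mantissa.
        if x == 0:
            return x
        s, x = (-1, -x) if x < 0 else (1, x)
        # Find e such that the mantissa x / 2^e lies in [2^(p-1), 2^p).
        e = x.numerator.bit_length() - x.denominator.bit_length() - p
        while x >= 2 ** p * Fraction(2) ** e:
            e += 1
        while x < 2 ** (p - 1) * Fraction(2) ** e:
            e -= 1
        n, r = divmod(x, Fraction(2) ** e)   # truncated mantissa, remainder
        if 2 * r > Fraction(2) ** e or (2 * r == Fraction(2) ** e and n % 2):
            n += 1                           # round up, ties to even
        return s * n * Fraction(2) ** e

    print(round_nearest_even(Fraction(10, 3), 4))   # -> 13/4, nearest 4-bit float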


B. Double rounding accuracy

As explained before, a floating-point computation may first be done in an extended precision, and later rounded to the working precision. The extended precision is denoted by Be = (p + k, Ee) and the working precision is denoted by Bw = (p, Ew). If the same rounding mode is used for both computations (usually to nearest even), it can lead to a less precise result than a single rounding.

For example, see Figure 2. When the real value x is in the neighborhood of the midpoint of two consecutive floating-point numbers g and h, it may first be rounded in one direction toward this middle t in extended precision, and then rounded in the same direction toward f in working precision. Although the result f is close to x, it is not the closest floating-point number to x; h is. When both roundings are to nearest, we formally proved that the distance between the given result f and the real value x may be as much as

  |f − x| ≤ (1/2 + 2^(−k−1)) ulp(f).

When there is only one rounding, the corresponding inequality is |f − x| ≤ 1/2 ulp(f). This is the expected result for an IEEE-754 compatible implementation.

Fig. 2. Bad case for double rounding.

C. Double rounding and faithfulness

Another interesting property of double rounding as defined previously is that it is a faithful rounding. We even have a more generic result.

Let us consider that the relations are not required to be rounding modes but only faithful roundings. We formally certified that the rounded result f of a double faithful rounding is faithful to the real initial value x, as shown in Figure 3. The requirements are k ≥ 0 and k ≤ Ee − Ew (any normal float in the working format is normal in the extended format).

Fig. 3. Double roundings are faithful.

This is a very powerful result, as faithfulness is the best we can expect as soon as we consider at least two roundings to nearest. And this result can be applied to any two successive IEEE-754 rounding modes (to zero, toward +∞, ...). This means that any sequence of successive roundings in decreasing precisions gives a faithful rounding of the initial value.

III. ALGORITHM

As seen in the previous sections, two roundings to nearest induce a bigger round-off error than one single rounding to nearest, and may then lead to unexpectedly incorrect results. We now present how to choose the roundings for the double rounding to give a correct rounding to nearest.
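To make the problem concrete, the following sketch, built on the rational model introduced after Section II-A and using toy precisions p = 3 and p + k = 5 of our own choosing, exhibits a real value whose naive double rounding to nearest differs from its direct rounding:

    from fractions import Fraction

    def round_nearest_even(x: Fraction, p: int) -> Fraction:
        # Rounding to nearest even at precision p (same model as before).
        if x == 0:
            return x
        s, x = (-1, -x) if x < 0 else (1, x)
        e = x.numerator.bit_length() - x.denominator.bit_length() - p
        while x >= 2 ** p * Fraction(2) ** e:
            e += 1
        while x < 2 ** (p - 1) * Fraction(2) ** e:
            e -= 1
        n, r = divmod(x, Fraction(2) ** e)
        if 2 * r > Fraction(2) ** e or (2 * r == Fraction(2) ** e and n % 2):
            n += 1
        return s * n * Fraction(2) ** e

    x = Fraction(46, 10)    # 4.6, slightly above 4.5, the midpoint of 4 and 5
    print(round_nearest_even(x, 3))                          # direct: 5
    print(round_nearest_even(round_nearest_even(x, 5), 3))   # double: 4 (wrong)

The value 4.6 is first pulled down to the 5-bit float 4.5, which then ties to the even mantissa 4, whereas a single direct rounding correctly gives 5.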

A. Odd rounding

This rounding belongs neither to the IEEE-754 rounding modes nor to the 754R ones (see http://www.validlab.com/754R/). Algorithm 1 below will justify the definition of this unusual rounding mode. It should not be mixed up with rounding to the nearest odd, which would be the counterpart of the default rounding to the nearest even.

We denote by △ the rounding toward +∞ and by ▽ the rounding toward −∞. The rounding to odd is defined by:

  odd(x) = x      if x ∈ F,
         = △(x)   if the mantissa of △(x) is odd,
         = ▽(x)   otherwise.

Note that the result of the odd rounding of x may be even only in the case where x is itself a representable float with an even mantissa.
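For illustration, this definition translates directly into the rational model of the previous sketches; the sign is handled through the symmetry property proved below, and the helper name is ours:

    from fractions import Fraction

    def round_odd(x: Fraction, p: int) -> Fraction:
        # Rounding to odd at precision p: keep x if representable, otherwise
        # return whichever of ▽(x) and △(x) has an odd mantissa.
        if x == 0:
            return x
        s, x = (-1, -x) if x < 0 else (1, x)
        e = x.numerator.bit_length() - x.denominator.bit_length() - p
        while x >= 2 ** p * Fraction(2) ** e:
            e += 1
        while x < 2 ** (p - 1) * Fraction(2) ** e:
            e -= 1
        n, r = divmod(x, Fraction(2) ** e)   # n is the mantissa of the truncation
        if r != 0 and n % 2 == 0:
            n += 1                           # inexact and even: take the odd neighbor
        return s * n * Fraction(2) ** e

    print(round_odd(Fraction(9, 2), 4))      # -> 9/2: 4.5 is a 4-bit float, kept
    print(round_odd(Fraction(51, 10), 4))    # -> 11/2: 5.1 goes to 5.5, odd mantissa 11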

The first proofs we formally checked guarantee that this operator is a rounding mode as defined in our formalization [DAU 01]. We then added a few other useful properties. We thus proved that odd rounding is a rounding mode, which includes the proofs of the following facts:

• Each real can be rounded to odd.
• Any odd rounding is a faithful rounding.
• Odd rounding is monotone.

We also certified that:

• Odd rounding is unique (meaning that it can be expressed as a function).
• Odd rounding is symmetric, meaning that if f = odd(x), then −f = odd(−x).

B. Correct double rounding algorithm

Algorithm 1 first computes the rounding to odd of the real value x in the extended format (with p + k bits). It then computes the rounding to the nearest even of the previous value in the working format (with p bits). We here consider a real value x, but an implementation does not need to really handle x: it can represent the abstract exact result of an operation between floating-point numbers.

Algorithm 1 Correct double rounding algorithm.
  t = odd_{p+k}(x)
  f = ◦_p(t)

Assuming p ≥ 2, k ≥ 2, and Ee ≥ 2 + Ew, then

  f = ◦_p(x).

Although there is a double rounding, we here guarantee that the computed result is correct. The explanation is in Figure 4 and is as follows. When x is exactly equal to the middle of two consecutive floating-point numbers g and h (case 1), then t is exactly x and f is the correct rounding of x. Otherwise, when x is slightly different from this mid-point (case 2), then t is different from this mid-point: it is the odd value just greater or just smaller than the mid-point, depending on the value of x. The reason is that, as k ≥ 2, the mid-point is even in the p + k precision, so t cannot be rounded into it if it is not exactly equal to it. This obtained t value will then be correctly rounded to f, which is the closest p-bit float from x. The other cases (case 3) are far away from the mid-point and are easy to handle.

Fig. 4. Different cases of Algorithm 1.

Note that the hypothesis Ee ≥ 2 + Ew is not a strong requirement: due to the definition of E, it does not mean that the exponent range (as defined in the IEEE-754 standard) must be greater by 2. As k ≥ 2, a sufficient condition is: any normal floating-point number with respect to Bw should be normal with respect to Be.
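The theorem can be explored experimentally in the same rational model. The following sketch, with toy parameters p = 4 and k = 2 of our own choosing, checks f = ◦_p(odd_{p+k}(x)) = ◦_p(x) on a grid dense enough to cross every mid-point of the working format in [1, 2):

    from fractions import Fraction

    def rnd(x: Fraction, p: int, mode: str) -> Fraction:
        # Round the rational x to p bits; mode 'ne' is to nearest, ties to
        # even, mode 'odd' is rounding to odd. Exponent bounds are ignored.
        if x == 0:
            return x
        s, x = (-1, -x) if x < 0 else (1, x)
        e = x.numerator.bit_length() - x.denominator.bit_length() - p
        while x >= 2 ** p * Fraction(2) ** e:
            e += 1
        while x < 2 ** (p - 1) * Fraction(2) ** e:
            e -= 1
        n, r = divmod(x, Fraction(2) ** e)   # truncated mantissa, remainder
        if mode == 'ne' and (2 * r > Fraction(2) ** e
                             or (2 * r == Fraction(2) ** e and n % 2)):
            n += 1                           # to nearest, ties to even
        if mode == 'odd' and r != 0 and n % 2 == 0:
            n += 1                           # inexact and even: make it odd
        return s * n * Fraction(2) ** e

    p, k = 4, 2
    for i in range(1, 4096):
        x = 1 + Fraction(i, 4096)            # sweeps across all mid-points in [1, 2)
        t = rnd(x, p + k, 'odd')             # t = odd_{p+k}(x)
        assert rnd(t, p, 'ne') == rnd(x, p, 'ne'), x
    print("double rounding through odd matched direct rounding on all points")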

C. Proof

The pen-and-paper proof is a bit technical but seems easy (see Figure 4). But it does not consider the special cases, especially the ones where v is a power of two, and subsequently where v is the smallest normal float. And we must look into all these special cases in order to be sure that the algorithm can always be applied, even when underflow occurs. We formally proved this result using the Coq proof assistant, in order to be sure not to forget any case or hypothesis and not to make mistakes in the numerous computations. The formal proof follows the path described below. There are many splittings into subcases that made the final proof rather long: 7 theorems and about one thousand lines of Coq, but we are now sure that every case (normal/subnormal, power of the radix or not) is correctly handled.

The general case, described by Figure 4, was done in various subcases. The first split is the positive/negative one, due to the fact that odd rounding and even nearest rounding are both symmetrical. Then, let v = ◦_p(x); we have to prove that f = v:

• When v is not a power of two:
  – when x = t (case 1), then f = ◦_p(t) = ◦_p(x) = v;
  – otherwise, we know that |x − v| ≤ 1/2 ulp(v). As odd rounding is monotone and k ≥ 2, it means that |t − v| ≤ 1/2 ulp(v). But we cannot have |t − v| = 1/2 ulp(v), as it would imply that t is even in the p + k precision, which is impossible. So |t − v| < 1/2 ulp(v) and v = ◦_p(t) = f.
• When v is a power of two:
  – when x = t (case 1), then f = ◦_p(t) = ◦_p(x) = v;
  – otherwise, we know that v − 1/4 ulp(v) ≤ x ≤ v + 1/2 ulp(v). As odd rounding is monotone and k ≥ 2, it means that v − 1/4 ulp(v) ≤ t ≤ v + 1/2 ulp(v). But we can have neither v − 1/4 ulp(v) = t nor t = v + 1/2 ulp(v), as it would imply that t is even in the p + k precision, which is impossible. So v − 1/4 ulp(v) < t < v + 1/2 ulp(v) and v = ◦_p(t) = f.
• The case where v is the smallest normal floating-point number is handled separately, as v − 1/4 ulp(v) should be replaced by v − 1/2 ulp(v) in the previous proof subcase.

Even if the formalization of floating-point numbers we used does not consider overflows, this does not restrict the scope of the proof. Indeed, in the proof, +∞ can be treated as any other float (even as an even one) without any problem. We only require that the overflow threshold of the extended format is not smaller than the working format's.

IV. APPLICATIONS

A. Rounding to odd is easy

Algorithm 1 is interesting because rounding to odd is quite easy to implement in hardware. Rounding to odd the real result x of a floating-point computation can be done in two steps. First round it to zero into the floating-point number Z(x) with respect to the IEEE-754 standard. Then perform a logical or between the inexact flag ι (or the sticky bit) of the first step and the last bit of the mantissa. We later found that Goldberg [GOL 91] used this algorithm for binary-decimal conversions.

If the mantissa of Z(x) is already odd, this floating-point number is the rounding to odd of x too; the logical or does not change it. If the floating-point computation is exact, Z(x) is equal to x and ι is not set; consequently odd(x) = Z(x) is correct. Otherwise the computation is inexact and the mantissa of Z(x) is even, but the final mantissa must be odd, hence the logical or with ι. In this last case, this odd float is the correct one, since the first rounding was toward zero.

Computing ι is not a problem per se, since the IEEE-754 standard requires this flag to be implemented, and hardware already uses sticky bits for the other rounding modes. Furthermore, the value of ι can directly be reused to flag the odd rounding of x as exact or inexact.

Another way to compute the rounding to odd is the following. We first round x toward zero with p + k − 1 bits. We then concatenate the inexact bit of the previous operation at the end of the mantissa in order to get a (p + k)-bit float. The justification is similar to the previous one.
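Both schemes boil down to a shift and a logical or on the significand. The following sketch (the helper name is ours, and it assumes the exact result is available as a non-negative integer significand carrying a known number of extra bits) illustrates the first scheme:

    def round_odd_significand(sig: int, extra: int) -> tuple[int, bool]:
        # Drop the `extra` low-order bits of the significand `sig`:
        # round toward zero, then or the inexact/sticky bit into the last bit.
        kept = sig >> extra                          # Z(x): truncation
        inexact = sig & ((1 << extra) - 1) != 0      # sticky bit ι
        return kept | inexact, inexact

    # An even 8-bit mantissa with non-zero discarded bits becomes odd:
    m, flag = round_odd_significand((0b10110010 << 10) | 0b1, 10)
    print(bin(m), flag)                              # 0b10110011 True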
B. Multi-precision operators

A possible application of our result is the implementation of multi-precision operators. Assume we want to get the correctly rounded value of an operator at various precisions (namely p1 < p2 < p3, for example). It is then enough to get the result with odd rounding on p3 + 2 bits (and a larger or equal exponent range) and some rounding-to-nearest operators from the precision p3 + 2 to the smaller precisions p1, p2 and p3, as shown in Figure 5. The correctness of Algorithm 1 ensures that the final result in precision p_i will be the correct even nearest rounding of the exact value.

Fig. 5. Multi-precision operator: a complex floating-point operator delivers a 66-bit result rounded to odd, from which rounding to nearest produces correct results at 16, 24, 32, 53 and 64 bits.
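The sharing depicted in Figure 5 can be checked in the rational model of the earlier sketches. With toy precisions of our own choosing, one odd-rounded value at p3 + 2 bits serves all three target precisions:

    from fractions import Fraction

    def rnd(x: Fraction, p: int, mode: str) -> Fraction:
        # Same helper as in the Algorithm 1 sketch, repeated to run standalone.
        if x == 0:
            return x
        s, x = (-1, -x) if x < 0 else (1, x)
        e = x.numerator.bit_length() - x.denominator.bit_length() - p
        while x >= 2 ** p * Fraction(2) ** e:
            e += 1
        while x < 2 ** (p - 1) * Fraction(2) ** e:
            e -= 1
        n, r = divmod(x, Fraction(2) ** e)
        if mode == 'ne' and (2 * r > Fraction(2) ** e
                             or (2 * r == Fraction(2) ** e and n % 2)):
            n += 1
        if mode == 'odd' and r != 0 and n % 2 == 0:
            n += 1
        return s * n * Fraction(2) ** e

    p1, p2, p3 = 5, 8, 11                    # target precisions, p1 < p2 < p3
    x = Fraction(355, 113)                   # stand-in for an exact operator result
    t = rnd(x, p3 + 2, 'odd')                # single stored result: odd, p3 + 2 bits
    for p in (p1, p2, p3):
        assert rnd(t, p, 'ne') == rnd(x, p, 'ne')
    print("one odd-rounded result served all three precisions correctly")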

Let us take the DSP processor TMS320C3x as an example [Tex04]. This processor provides floating-point operators. These operators are not IEEE-754 compliant, but the floating-point format is not that different from single precision: an exponent stored on 8 bits and a mantissa on 24 bits. Our result is still valid when two's complement is used, as the set of represented values is almost the same [BOL 04b].

Source operands of the floating-point multiplication are in this format. The multiplier first produces a 50-bit mantissa, and then disposes of the extra bits so that the mantissa fits on 32 bits. Indeed, the results of the operators are stored in an extended precision format: 32-bit mantissa and 8-bit exponent. The user can then explicitly round this result to single precision, before using it in subsequent operations.

Now, instead of simply discarding the extra bits when producing the extended precision result, the multiplier could round the mantissa to odd. The result would be as precise, the hardware cost would be negligible, and the explicit rounding to single precision could then be made to guarantee that the final result is correctly rounded.

C. Constants

Rounding to odd can also be used to ensure that a single set of constants can be used for various floating-point formats. For example, to get a constant C correctly rounded in 24, 53 and 64 bits, it is enough to store it (oddly rounded) with 66 bits. Another example is when a constant may be needed either in single or in double precision by a piece of software. Then, if the processor allows double-extended precision, it is sufficient to store the constant rounded to odd in double-extended precision and let the processor correctly round it to the required precision. The constant will then be available and correct for both formats.
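As an experimental check in the rational model of the earlier sketches: a single constant stored with 66 bits and rounding to odd rounds correctly to 24, 53 and 64 bits. The rational below is a stand-in for the exact value of π; any real constant would do:

    from fractions import Fraction

    def rnd(x: Fraction, p: int, mode: str) -> Fraction:
        # Same helper as in the Algorithm 1 sketch, repeated to run standalone.
        if x == 0:
            return x
        s, x = (-1, -x) if x < 0 else (1, x)
        e = x.numerator.bit_length() - x.denominator.bit_length() - p
        while x >= 2 ** p * Fraction(2) ** e:
            e += 1
        while x < 2 ** (p - 1) * Fraction(2) ** e:
            e -= 1
        n, r = divmod(x, Fraction(2) ** e)
        if mode == 'ne' and (2 * r > Fraction(2) ** e
                             or (2 * r == Fraction(2) ** e and n % 2)):
            n += 1
        if mode == 'odd' and r != 0 and n % 2 == 0:
            n += 1
        return s * n * Fraction(2) ** e

    pi = Fraction(31415926535897932384626433832795028842, 10 ** 37)
    stored = rnd(pi, 66, 'odd')              # the single constant kept in a table
    for p in (24, 53, 64):
        assert rnd(stored, p, 'ne') == rnd(pi, p, 'ne')
    print("the 66-bit odd constant rounds correctly to 24, 53 and 64 bits")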

V. ODD ADDITION

As another property of the rounding to odd, let us see how it can be used to get the correctly rounded result of the sum of several floating-point numbers. We have n floats f_1, ..., f_n and we would like to obtain

  s = ◦(f_1 + · · · + f_n).

This problem is not new: adding several floating-point numbers with good accuracy is an important problem of scientific computing [HIG 96]. Recently, Demmel and Hida proposed a simple algorithm that yields almost correct summation results [DEM 03]. We present an algorithm that gives the correctly rounded result by using rounding to odd, under mild assumptions. More precisely, we add the numbers from the smallest to the biggest one (in magnitude) and we use the following algorithm:

Algorithm 2 Correct addition algorithm when ∀i, |f_i| ≥ 2|g_{i−1}|.
  Sort the f_i by magnitude
  g_1 = f_1
  For i from 2 to n, g_i = odd_{p+k}(g_{i−1} + f_i)
  s = ◦_p(g_n)

Thanks to the correctness of Algorithm 1, we just have to prove that g_n = odd_{p+k}(f_1 + · · · + f_n). This property is trivially true for n = 1, since f_1 is a floating-point number of precision p. In order to use the induction principle, we just have to guarantee that

  odd_{p+k}(g_{i−1} + f_i) = odd_{p+k}(f_1 + · · · + f_i).

Here, let us denote □ = odd_{p+k}. Let us consider x ∈ R and a floating-point number f. It is enough to prove that

  □(x + f) = □(□(x) + f).

Let us assume that |f| ≥ 2|x| and let us note t = □(x) + f.

First, if x = □(x), then the result is trivial. We can now assume that x ≠ □(x); therefore □(x) is odd.

• If □(t) is even, then t = □(t). This case is impossible: as |f| ≥ 2|x|, we have |f| ≥ 2|□(x)|. Therefore, if t = □(x) + f were representable, □(x) would be even, as f has p bits and is at least twice bigger than □(x). But □(x) cannot be even by the previous assumption.

• Let us now assume that □(t) is odd. This means that □(t) cannot be a power of the radix. Then, with the current hypotheses on □(t), if we prove the inequality |x + f − □(t)| < ulp(□(t)), it implies the wanted result: □(x + f) = □(□(x) + f). And |x + f − □(t)| ≤ |x + f − t| + |t − □(t)| = |x − □(x)| + |t − □(t)|. We know that |x − □(x)| < ulp(□(x)). Let us look at |t − □(t)|. We of course know that |t − □(t)| < ulp(□(t)), but we also know that t − □(t) can exactly be represented by an unbounded float with the exponent e = min(e_{□(x)}, e_f, e_{□(t)}). Therefore |t − □(t)| ≤ ulp(□(t)) − 2^e and |x + f − □(t)| < ulp(□(t)) + ulp(□(x)) − 2^e. Due to the definition of e, we only have left to prove that e_{□(x)} ≤ e_f and that e_{□(x)} ≤ e_{□(t)}. We have assumed that |f| ≥ 2|x|, so we have |f| ≥ 2|□(x)| and thus e_{□(x)} ≤ e_f. Moreover, |t| = |□(x) + f| ≥ |f| − |□(x)| ≥ |□(x)|, so e_{□(x)} ≤ e_{□(t)}, which ends the proof.

Under the assumption that ∀i, |f_i| ≥ 2|g_{i−1}|, we have a linear algorithm to correctly round the sum of any set of floating-point numbers. This hypothesis is not easy to check; it can be replaced by ∀i, |f_i| ≥ 3|f_{i−1}|.

The proofs of this section are not yet formally checked. The authors do believe in their correctness and in their forthcoming formal translation. Note that, as for Algorithm 1, there will be a hypothesis linking the exponent ranges of the working and extended precisions. This is needed to guarantee that |f| ≥ 2|x| implies |f| ≥ 2|odd_{p+k}(x)|, for f in working precision.
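As a small experimental check of Algorithm 2 in the rational model of the earlier sketches, with toy precisions and inputs of our own choosing that satisfy the magnitude hypothesis:

    from fractions import Fraction

    def rnd(x: Fraction, p: int, mode: str) -> Fraction:
        # Same helper as in the Algorithm 1 sketch, repeated to run standalone.
        if x == 0:
            return x
        s, x = (-1, -x) if x < 0 else (1, x)
        e = x.numerator.bit_length() - x.denominator.bit_length() - p
        while x >= 2 ** p * Fraction(2) ** e:
            e += 1
        while x < 2 ** (p - 1) * Fraction(2) ** e:
            e -= 1
        n, r = divmod(x, Fraction(2) ** e)
        if mode == 'ne' and (2 * r > Fraction(2) ** e
                             or (2 * r == Fraction(2) ** e and n % 2)):
            n += 1
        if mode == 'odd' and r != 0 and n % 2 == 0:
            n += 1
        return s * n * Fraction(2) ** e

    def odd_sum(fs, p, k):
        # Algorithm 2: every partial sum is rounded to odd at p + k bits;
        # only the final result is rounded to nearest at p bits.
        g = fs[0]
        for f in fs[1:]:
            g = rnd(g + f, p + k, 'odd')
        return rnd(g, p, 'ne')

    p, k = 6, 2
    # 6-bit floats sorted by magnitude; each |f_i| is at least twice |g_{i-1}|.
    fs = [Fraction(33, 64), Fraction(-35, 16), Fraction(21, 4), Fraction(-45)]
    assert odd_sum(fs, p, k) == rnd(sum(fs), p, 'ne')   # -41 == RN(-2651/64)
    print(odd_sum(fs, p, k))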
VI. CONCLUSION

The first algorithm described here is very simple and can be used in many real-life applications. Nevertheless, due to the bad reputation of double rounding, it is difficult to believe that double rounding may lead to a correct result. It is therefore essential to guarantee its validity. We formally proved its correctness with Coq, even in the unusual cases: powers of two, subnormal floats, the normal/subnormal frontier. All these cases made the formal proof longer and more difficult than one may expect at first sight. It is nevertheless very useful to have formally certified this proof, as the inequality handling was sometimes tricky and as the special cases were numerous and difficult.

The second algorithm is also very simple: it simply adds the floating-point numbers in an extended precision with rounding to odd. We give hypotheses that are enough to ensure that the final number, once rounded to the working precision, is the correctly rounded value of the sum of these numbers. Although such a method works well for the addition, it cannot be extended to the fused multiply-and-add operation fma(a, x, y) = ◦(a · x + y). As a consequence, it cannot be directly applied to obtain a correctly rounded result of Horner's polynomial evaluation.

The two algorithms are even more general than what we presented here. They are not limited to a final result rounded to nearest even: they can also be applied to any other realistic rounding (meaning that the result of a computation is uniquely defined by the exact value of the real result and does not depend on the machine state). In particular, they handle all the IEEE-754 standard roundings and the new rounding to the nearest away from zero defined by the revision of this standard.

REFERENCES

[BER 04] Bertot Y., Castéran P., Interactive Theorem Proving and Program Development. Coq'Art: the Calculus of Inductive Constructions, Texts in Theoretical Computer Science, Springer Verlag, 2004.
[BOL 04a] Boldo S., Preuves formelles en arithmétique à virgule flottante, PhD thesis, École Normale Supérieure de Lyon, November 2004.
[BOL 04b] Boldo S., Daumas M., Properties of two's complement floating point notations, International Journal on Software Tools for Technology Transfer, vol. 5, n. 2-3, p. 237-246, 2004.
[DAU 01] Daumas M., Rideau L., Théry L., A generic library of floating-point numbers and its application to exact computing, 14th International Conference on Theorem Proving in Higher Order Logics, Edinburgh, Scotland, p. 169-184, 2001.
[DEM 03] Demmel J. W., Hida Y., Fast and accurate floating point summation with applications to computational geometry, Proceedings of the 10th GAMM-IMACS International Symposium on Scientific Computing, Computer Arithmetic, and Validated Numerics (SCAN 2002), January 2003.
[GOL 91] Goldberg D., What every computer scientist should know about floating point arithmetic, ACM Computing Surveys, vol. 23, n. 1, p. 5-47, 1991.
[HIG 96] Higham N. J., Accuracy and stability of numerical algorithms, SIAM, 1996.
[STE 81] Stevenson D. et al., A proposed standard for binary floating point arithmetic, IEEE Computer, vol. 14, n. 3, p. 51-62, 1981.
[STE 87] Stevenson D. et al., An American national standard: IEEE standard for binary floating point arithmetic, ACM SIGPLAN Notices, vol. 22, n. 2, p. 9-25, 1987.
[Tex04] Texas Instruments, TMS320C3x — User's guide, 2004.