Dedication ÅEÌ Professor Donald Knuth (Stanford) Extending TEX and Professor William Kahan (Berkeley) ÅEÌAFÇÆÌ with Floating-Point Arithmetic AF Nelson H. F. Beebe ÇÆÌ X and Department of Mathematics University of Utah E T Salt Lake City, UT 84112-0090 USA TEX Users Group Conference 2007 talk. – p. 1/30 TEX Users Group Conference 2007 talk. – p. 2/30 ÅEÌAFÇÆÌ Arithmetic in TEX and Arithmetic in ÅEÌAFÇÆÌ ÅEÌ ÅEÌ Binary integer arithmetic with 32 bits (T X \count ÅEÌAFÇÆÌ restricts input numbers to 12 integer bits: ≥ E registers) % mf expr Fixed-point arithmetic with sign bit, overflow bit, 14 gimme an expr: 4095 >> 4095 ≥ gimme an expr: 4096 integer bits, and 16 fractional bits (T X \dimen, E ! Enormous number has been reduced. \muskip, and \skip registers) AF >> 4095.99998 AF Overflow detected on division and multiplication but not gimme an expr: infinity >> 4095.99998 on addition (flaw (NHFB), feature (DEK)) gimme an expr: epsilon >> 0.00002 gimme an expr: 1/epsilon Gyrations sometimes needed in ÅEÌAFÇÆÌ to work ÇÆÌ ! Arithmetic overflow. ÇÆÌ Xwith and fixed-point numbers X and >> 32767.99998 Uh, oh.E A little while ago one of the quantities gimmeE an expr: 1/3 >> 0.33333 that I was computing got too large, so I’m afraid gimme an expr: 3*(1/3) >> 0.99998 T T your answers will be somewhat askew. You’ll gimme an expr: 1.2 • 2.3 >> •1.1 probably have to adopt different tactics next gimme an expr: 1.2 • 2.4 >> •1.2 time. But I shall try to carry on anyway. gimme an expr: 1.3 • 2.4 >> •1.09999 TEX Users Group Conference 2007 talk. – p. 3/30 TEX Users Group Conference 2007 talk. – p. 4/30 Historical remarks Historical remarks [cont] ÅEÌ ÅEÌ It is difficult today to appreciate that probably the biggest problem facing programmers in the early 1950s was Floating Point Arithmetic . The subject scaling numbers so as to achieve is not at all as trivial as most people think, acceptable precision from a fixed-point machine. AF and it involves a surprising amount of AF interesting information. Martin Campbell-Kelly Programming the Mark I: Donald E. Knuth ÇÆÌ The Art of Computer Programming: ÇÆÌ X and Early Programming Activity X and at the University of Manchester Seminumerical Algorithms, (1998) E Annals of the History of Computing E 2(2) 130–168 (1980) T T TEX Users Group Conference 2007 talk. – p. 5/30 TEX Users Group Conference 2007 talk. – p. 6/30 Historical remarks [cont] Why no floating-point arithmetic? ÅEÌ ÅEÌ Computer hardware designers can make their System dependence in precision, range, rounding, machines much more pleasant to use, underflow, overflow for example by providing Base varies: 2, 3 (Setun), 4 (Illiac II), 8 (Burroughs), floating-point arithmetic 10, 16 (IBM S/360), 256 (Illiac III), 10000 (Maple) which satisfies simple mathematical laws. AF Bizarre behavior when T X was developed: AF The facilities presently available on most E x y = y x (early Crays) machines make the job of rigorous error × 6 × analysis hopelessly difficult, but properly x = 1.0 x (Pr1me) designed operations would encourage 6 × ÇÆÌ x + x = 2 x (Pr1me) ÇÆÌ X and numerical analysts to provide better X and 6 × x = y but 1.0/(x y) gets zero-divide error subroutines which have certified accuracy. 6 − E E wrap between underflow and overflow (PDP-10) Donald E. Knuth T Computer Programming as an Art T job termination on overflow or zero-divide (most) ACM Turing Award Lecture (1973) No standardization: almost every vendor had unique floating-point system TEX Users Group Conference 2007 talk. – p. 7/30 TEX Users Group Conference 2007 talk. – p. 8/30 Why no floating-point ... [cont]? Why no floating-point ... [cont]? ÅEÌ ÅEÌ Language dependence: Input/output problem requires base conversion, and is Algol, Pascal, and SAIL (real) hard (e.g., conversion from 128-bit binary format can require more than 11500 decimal digits) Fortran (REAL, DOUBLE PRECISION, and sometimes REAL*16) DEK wrote A simple program whose proof isn’t C/C++ (double, float added in 1989, long double (1990) about TEX’s conversions between fixed-point in 1999) AF binary and decimal AF Java and C# (only float and double, but Most languages do not guarantee exact base arithmetic system is badly botched: see Kahan and conversion Darcy’s How Java’s Floating-Point Hurts ÇÆÌ ÇÆÌ X and XT and X guarantees identical line-breaking and Everyone Everywhere) E page-breaking across all platforms (floating-point CompilerE dependence: multiple precisions mapped to arithmeticE used only for interword glue calculations) T just one T ÅEÌAFÇÆÌ has no floating-point at all, and generates BSD compilers still provide no 80-bit format after 27 identical fonts on all systems years in hardware TEX Users Group Conference 2007 talk. – p. 9/30 TEX Users Group Conference 2007 talk. – p. 10/30 IEEE 754 binary standard (1985) IEEE 754 binary standard [cont] ÅEÌ ÅEÌ Preliminary version first implemented in Intel 8087 chip Nonstop computing model: sticky flags record (1980) exceptions Three formats defined: 32-bit, 64-bit, and 80-bit. Four rounding modes: 128-bit format available on some Alpha, IA-64, and to nearest with ties to even (default) SPARC systems. AF to + AF ∞ Nonzero normal numbers are rational: to x = ( 1)sf 2p, where f [1, 2) −∞ − × ∈ to zero (historical chopping) Signed zero ÇÆÌ generated from large/small and finite/0 ÇÆÌ X and X and Largest stored exponent represents Infinity when ±∞ NaN generated from 0/0, , / , and any fE = 0, quiet and signaling NaN (Not-a-Number) when E operation with a NaN operand∞−∞ ∞ ∞ f = 0 T 6 T NaN returned from functions when result is undefined Smallest stored exponent allows f to have leading in real arithmetic (e.g., √ 1) zeros with gradual underflow to subnormal values − TEX Users Group Conference 2007 talk. – p. 11/30 TEX Users Group Conference 2007 talk. – p. 12/30 IEEE 754R Precision and range Remark on floating-point arithmetic ÅEÌ ÅEÌ Binary Contrary to popular misconception, even in some books 32-bit 24b ( 7D) 1e-45 1e-38 3e+38 and compilers, floating-point arithmetic is not fuzzy. ≈ 64-bit 53b( 15D) 4e-324 2e-308 1e+308 Results are exact if they are representable ≈ 80-bit 64b( 19D) 3e-4951 3e-4932 1e+4932 ≈ Multiplication by power of base is always exact, in 128-bit 113b ( 34D) 6e-4966 3e-4932 1e+4932AF AF ≈ absence of underflow and overflow 256-bit 234b ( 70D) 2e-315723 5e-315653 3e+315652 ≈ Subtraction of numbers of like signs and exponents is Decimal exact 32-bit 7D 1e-101 1e-95 1e+96ÇÆÌ ÇÆÌ X and X and 64-bit 16D 1e-398 1e-383 1e+384 128-bitE 34D 1e-6176 1e-6143 1e+6144 E 256-bitT 70D 1e-1572932 1e-1572863 1e+15782864 T TEX Users Group Conference 2007 talk. – p. 13/30 TEX Users Group Conference 2007 talk. – p. 14/30 Binary versus decimal Binary versus decimal [cont] ÅEÌ ÅEÌ IEEE 854 Standard for Radix-Independent humans less uncomfortable with decimal arithmetic Floating-Point Arithmetic (1987, 1994) sales tax: 5% of 0.70 = 0.0349999 . in all binary older Cobol standards require 18D fixed-point precisions, instead of exact decimal 0.035. Thus, significant cumulative rounding errors in businesses Cobol 2002 requires 32D fixed-point and floating-point with many small transactions (food, telephone, . ) AF Proposals to add decimal arithmetic to C and C++ AF financial computations need fixed-point decimal (2005, 2006) arithmetic 25 years of Rexx and NetRexx scripting languages ÇÆÌ give valuable experience in arbitrary-precision decimalÇÆÌ Xhand and calculators use decimal arithmetic X and arithmetic additionalE decimal rounding rules (8 instead of 4) E excellent IBM decNumber library provides open source 9 T decimal arithmetic eliminates most base-conversion T decimal floating-point arithmetic with a billion (10 ) problems digits of precision and exponent magnitudes up to 999999999 TEX Users Group Conference 2007 talk. – p. 15/30 TEX Users Group Conference 2007 talk. – p. 16/30 Binary versus decimal [cont] Problems with IEEE 754 arithmetic ÅEÌ ÅEÌ Preliminary support in gcc for +, , , and / (late Language access to features slow: 27+ years and still − × 2006) based on subset of IBM decNumber library waiting! mathcw package provides C99-compliant run-time Programmer unfamiliarity, ignorance, and inexperience library for binary, and also for decimal, arithmetic Deficient educational system (NHFB 2005–2007) AF Partial implementations by some vendors (e.g., AF Three sizes defined for IEEE 754R: 32-bit (7D), 64-bit subnormals flush to zero, IA-32 has only one NaN, (16D), and 128-bit (34D) IA-32 and IA-64 have imperfect rounding, Java and C# IBM zSeries mainframes get IEEE 754 binary f.p. lack rounding modes and higher precisions) ÇÆÌ ÇÆÌ X(1999), and and decimal f.p. in firmware (2006) X and Long internal registers generally beneficial, but also IBME PowerPC chips add hardware decimal arithmetic produceE many computational surprises and double (2007) rounding, compromising portability T T Hardware support likely in future Intel IA-32 and Rounding behavior at underflow and overflow limits EM64T unspecified, and vendor dependent TEX Users Group Conference 2007 talk. – p. 17/30 TEX Users Group Conference 2007 talk. – p. 18/30 How decimal arithmetic is different Software floating-point arithmetic ÅEÌ ÅEÌ s p Nonzero normal numbers x = ( 1) f 2 , where f TEX and ÅEÌAFÇÆÌ must continue to guarantee is an integer: can simulate fixed-point− arithmetic× identical results across platforms Lack of normalization means multiple storage forms, Unspecified behavior of low-level arithmetic but 1., 1.0, 1.00, 1.000, ..
Details
-
File Typepdf
-
Upload Time-
-
Content LanguagesEnglish
-
Upload UserAnonymous/Not logged-in
-
File Pages8 Page
-
File Size-