Dedication ÅEÌ

Professor (Stanford) Extending TEX and Professor William Kahan (Berkeley)

ÅEÌ AF ÇÆÌ with Floating-Point Arithmetic AF Nelson H. F. Beebe ÇÆÌ X and Department of Mathematics University of Utah E T Salt Lake City, UT 84112-0090 USA

TEX Users Group Conference 2007 talk. . . – p. 1/30 TEX Users Group Conference 2007 talk. . . – p. 2/30

ÅEÌ AF ÇÆÌ Arithmetic in TEX and Arithmetic in ÅEÌ AF ÇÆÌ ÅEÌ ÅEÌ

Binary integer arithmetic with 32 bits (T X \count ÅEÌ AF ÇÆÌ restricts input numbers to 12 integer bits: ≥ E registers) % mf expr Fixed-point arithmetic with sign bit, overflow bit, 14 gimme an expr: 4095 >> 4095 ≥ gimme an expr: 4096 integer bits, and 16 fractional bits (T X \dimen, E ! Enormous number has been reduced. \muskip, and \skip registers)

AF >> 4095.99998 AF Overflow detected on division and multiplication but not gimme an expr: infinity >> 4095.99998 on addition (flaw (NHFB), feature (DEK)) gimme an expr: epsilon >> 0.00002 gimme an expr: 1/epsilon

Gyrations sometimes needed in ÅEÌ AF ÇÆÌ to work ÇÆÌ ! Arithmetic overflow. ÇÆÌ Xwith and fixed-point numbers X and >> 32767.99998

Uh, oh.E A little while ago one of the quantities gimmeE an expr: 1/3 >> 0.33333 that I was computing got too large, so I’m afraid gimme an expr: 3*(1/3) >> 0.99998 T T your answers will be somewhat askew. You’ll gimme an expr: 1.2 • 2.3 >> •1.1 probably have to adopt different tactics next gimme an expr: 1.2 • 2.4 >> •1.2 time. But I shall try to carry on anyway. gimme an expr: 1.3 • 2.4 >> •1.09999

TEX Users Group Conference 2007 talk. . . – p. 3/30 TEX Users Group Conference 2007 talk. . . – p. 4/30 Historical remarks Historical remarks [cont] ÅEÌ ÅEÌ

It is difficult today to appreciate that probably the biggest problem facing programmers in the early 1950s was Floating Point Arithmetic . . . The subject scaling numbers so as to achieve is not at all as trivial as most people think, acceptable precision from a fixed-point machine. AF and it involves a surprising amount of AF interesting information. Martin Campbell-Kelly Programming the Mark I: Donald E. Knuth

ÇÆÌ The Art of Computer Programming: ÇÆÌ

X and Early Programming Activity X and at the University of Manchester Seminumerical Algorithms, (1998)

E Annals of the History of Computing E 2(2) 130–168 (1980) T T

TEX Users Group Conference 2007 talk. . . – p. 5/30 TEX Users Group Conference 2007 talk. . . – p. 6/30 Historical remarks [cont] Why no floating-point arithmetic? ÅEÌ ÅEÌ Computer hardware designers can make their System dependence in precision, range, rounding, machines much more pleasant to use, underflow, overflow for example by providing Base varies: 2, 3 (Setun), 4 (Illiac II), 8 (Burroughs), floating-point arithmetic 10, 16 (IBM S/360), 256 (Illiac III), 10000 (Maple) which satisfies simple mathematical laws.

AF Bizarre behavior when T X was developed: AF The facilities presently available on most E x y = y x (early Crays) machines make the job of rigorous error × 6 × analysis hopelessly difficult, but properly x = 1.0 x (Pr1me) designed operations would encourage 6 ×

ÇÆÌ x + x = 2 x (Pr1me) ÇÆÌ X and numerical analysts to provide better X and 6 × x = y but 1.0/(x y) gets zero-divide error subroutines which have certified accuracy. 6 − E E wrap between underflow and overflow (PDP-10) Donald E. Knuth T Computer Programming as an Art T job termination on overflow or zero-divide (most) ACM Turing Award Lecture (1973) No standardization: almost every vendor had unique floating-point system TEX Users Group Conference 2007 talk. . . – p. 7/30 TEX Users Group Conference 2007 talk. . . – p. 8/30 Why no floating-point ... [cont]? Why no floating-point ... [cont]? ÅEÌ ÅEÌ Language dependence: Input/output problem requires base conversion, and is Algol, Pascal, and SAIL (real) hard (e.g., conversion from 128-bit binary format can require more than 11500 decimal digits) Fortran (REAL, DOUBLE PRECISION, and sometimes REAL*16) DEK wrote A simple program whose proof isn’t /C++ (double, float added in 1989, long double (1990) about TEX’s conversions between fixed-point in 1999) AF binary and decimal AF Java and C# (only float and double, but Most languages do not guarantee exact base arithmetic system is badly botched: see Kahan and conversion

Darcy’s How Java’s Floating-Point Hurts ÇÆÌ ÇÆÌ X and XT and X guarantees identical line-breaking and Everyone Everywhere) E page-breaking across all platforms (floating-point CompilerE dependence: multiple precisions mapped to arithmeticE used only for interword glue calculations)

T just one T

ÅEÌ AF ÇÆÌ has no floating-point at all, and generates BSD still provide no 80-bit format after 27 identical fonts on all systems years in hardware

TEX Users Group Conference 2007 talk. . . – p. 9/30 TEX Users Group Conference 2007 talk. . . – p. 10/30 IEEE 754 binary standard (1985) IEEE 754 binary standard [cont] ÅEÌ ÅEÌ Preliminary version first implemented in Intel 8087 chip Nonstop computing model: sticky flags record (1980) exceptions Three formats defined: 32-bit, 64-bit, and 80-bit. Four rounding modes: 128-bit format available on some Alpha, IA-64, and to nearest with ties to even (default) SPARC systems.

AF to + AF ∞ Nonzero normal numbers are rational: to x = ( 1)sf 2p, where f [1, 2) −∞ − × ∈ to zero (historical chopping) Signed zero

ÇÆÌ generated from large/small and finite/0 ÇÆÌ X and X and Largest stored exponent represents Infinity when ±∞ NaN generated from 0/0, , / , and any

fE = 0, quiet and signaling NaN (Not-a-Number) when E operation with a NaN operand∞−∞ ∞ ∞ f = 0 T 6 T NaN returned from functions when result is undefined Smallest stored exponent allows f to have leading in real arithmetic (e.g., √ 1) zeros with gradual underflow to subnormal values −

TEX Users Group Conference 2007 talk. . . – p. 11/30 TEX Users Group Conference 2007 talk. . . – p. 12/30 IEEE 754R Precision and range Remark on floating-point arithmetic ÅEÌ ÅEÌ

Binary Contrary to popular misconception, even in some books 32-bit 24b ( 7D) 1e-45 1e-38 3e+38 and compilers, floating-point arithmetic is not fuzzy. ≈ 64-bit 53b( 15D) 4e-324 2e-308 1e+308 Results are exact if they are representable ≈ 80-bit 64b( 19D) 3e-4951 3e-4932 1e+4932 ≈ Multiplication by power of base is always exact, in 128-bit 113b ( 34D) 6e-4966 3e-4932 1e+4932AF AF ≈ absence of underflow and overflow 256-bit 234b ( 70D) 2e-315723 5e-315653 3e+315652 ≈ Subtraction of numbers of like signs and exponents is Decimal exact

32-bit 7D 1e-101 1e-95 1e+96ÇÆÌ ÇÆÌ X and X and 64-bit 16D 1e-398 1e-383 1e+384 128-bitE 34D 1e-6176 1e-6143 1e+6144 E

256-bitT 70D 1e-1572932 1e-1572863 1e+15782864 T

TEX Users Group Conference 2007 talk. . . – p. 13/30 TEX Users Group Conference 2007 talk. . . – p. 14/30 Binary versus decimal Binary versus decimal [cont] ÅEÌ ÅEÌ IEEE 854 Standard for Radix-Independent humans less uncomfortable with decimal arithmetic Floating-Point Arithmetic (1987, 1994) sales tax: 5% of 0.70 = 0.0349999 . . . in all binary older Cobol standards require 18D fixed-point precisions, instead of exact decimal 0.035. Thus, significant cumulative rounding errors in businesses Cobol 2002 requires 32D fixed-point and floating-point

with many small transactions (food, telephone, . . . ) AF Proposals to add decimal arithmetic to C and C++ AF financial computations need fixed-point decimal (2005, 2006) arithmetic 25 years of Rexx and NetRexx scripting languages

ÇÆÌ give valuable experience in arbitrary-precision decimalÇÆÌ Xhand and calculators use decimal arithmetic X and arithmetic

additionalE decimal rounding rules (8 instead of 4) E excellent IBM decNumber library provides open source 9 T decimal arithmetic eliminates most base-conversion T decimal floating-point arithmetic with a billion (10 ) problems digits of precision and exponent magnitudes up to 999999999

TEX Users Group Conference 2007 talk. . . – p. 15/30 TEX Users Group Conference 2007 talk. . . – p. 16/30 Binary versus decimal [cont] Problems with IEEE 754 arithmetic ÅEÌ ÅEÌ Preliminary support in gcc for +, , , and / (late Language access to features slow: 27+ years and still − × 2006) based on subset of IBM decNumber library waiting! mathcw package provides C99-compliant run-time Programmer unfamiliarity, ignorance, and inexperience library for binary, and also for decimal, arithmetic Deficient educational system (NHFB 2005–2007) AF Partial implementations by some vendors (e.g., AF Three sizes defined for IEEE 754R: 32-bit (7D), 64-bit subnormals flush to zero, IA-32 has only one NaN, (16D), and 128-bit (34D) IA-32 and IA-64 have imperfect rounding, Java and C# IBM zSeries mainframes get IEEE 754 binary f.p. lack rounding modes and higher precisions) ÇÆÌ ÇÆÌ

X(1999), and and decimal f.p. in firmware (2006) X and Long internal registers generally beneficial, but also

IBME PowerPC chips add hardware decimal arithmetic produceE many computational surprises and double (2007) rounding, compromising portability T T Hardware support likely in future Intel IA-32 and Rounding behavior at underflow and overflow limits EM64T unspecified, and vendor dependent

TEX Users Group Conference 2007 talk. . . – p. 17/30 TEX Users Group Conference 2007 talk. . . – p. 18/30 How decimal arithmetic is different floating-point arithmetic ÅEÌ ÅEÌ s p Nonzero normal numbers x = ( 1) f 2 , where f TEX and ÅEÌ AF ÇÆÌ must continue to guarantee is an integer: can simulate fixed-point− arithmetic× identical results across platforms Lack of normalization means multiple storage forms, Unspecified behavior of low-level arithmetic but 1., 1.0, 1.00, 1.000, ... compare equal guarantees platform dependence

Quantization detectable (e.g., for financial AF Floating-point not associative, so instruction orderingAF computations, 1.00 differs from 1.000) (e.g., optimization) affects results Signed zero and Infinity, plus quiet and signaling NaNs Long internal registers alter precision, and results detectable from first byte (binary formats require ÇÆÌ Multiply-add computes x y + z with exact product ÇÆÌ Xexamination and of all bits) X and and single rounding, getting× different result from

EightE rounding modes (legal and tax mandates) separateE operations

T Compact storage formats — Densely-Packed Decimal T Conclusion: only a single software floating-point

(DPD) [IBM] and Binary-Integer Decimal (BID) [Intel] arithmetic system in TEX and ÅEÌ AF ÇÆÌ can — need fewer than BCD’s four bits per decimal digit guarantee platform-independent results

TEX Users Group Conference 2007 talk. . . – p. 19/30 TEX Users Group Conference 2007 talk. . . – p. 20/30 Side excursion Software floating-point [cont.] ÅEÌ ÅEÌ No need to modify TEX beyond what has already been done: LuaTEX interfaces TEX to a clean and well-designed scripting language — just need to change arithmetic and library inside lua What if you could provide a seamlessly Scripting languages usually offer a single floating-point integrated, fully dynamic language with a AF datatype, typically equivalent to IEEE 754 64-bit AF conventional syntax while increasing your double (that is all C used to have) application’s size by less than 200K on an x86? You can do it with Lua! qawk and dnawk: awk for 128-bit binary and decimal ÇÆÌ ÇÆÌ X and Keith Fieldhouse XMachines and are fast and memories are big: adopt 34D 128-bit format, or better, 70D 256-bit format, instead as E defaultE numeric type. T T mathcw has highly-portable open-source library support for ten floating-point precisions, including 256-bit binary and decimal.

TEX Users Group Conference 2007 talk. . . – p. 21/30 TEX Users Group Conference 2007 talk. . . – p. 22/30 How much work? Floating-point arithmetic and typesetting ÅEÌ ÅEÌ T X’s smallest dimension is 2 16pt = 1sp, while Changing scripting languages from binary to decimal E − wavelength of visible light is about 100sp (T Xbook, floating-point arithmetic took two to four hours each, with E p. 58): rounding errors are invisible relatively few modifications: 14 TEX’s largest dimension is 2 pt = 5.75m, not quite Program Lines Deleted Added billboard size dgawk 40717 109 165 AF AF macro notation painful (layout.): dlua 16882 25 94 dmawk 16275 73 386 % MARGINNOTEYA = 0.75 * TEXTHEIGHT + FOOTSKIP \T = \TEXTHEIGHT

dnawk 9478 182 296 ÇÆÌ ÇÆÌ X and X\multiply and \T by 75 % possible overflow!

ÅEÌ AF ÇÆÌ in C 30190 0 0 \divide \T by 100 E TEX in C 25215 0 0 \advanceE \T by \FOOTSKIP

T T \xdef \MARGINNOTEYA {\the \T} reduction 75/100 1/4 3 possible here, but not in general → ×

TEX Users Group Conference 2007 talk. . . – p. 23/30 TEX Users Group Conference 2007 talk. . . – p. 24/30 Floating-point arithmetic [cont] MMIX and NNIX ÅEÌ ÅEÌ Overflow detection unreliable in TEX (integer arithmetic in most programming languages is worse!)

No elementary functions available in TEX, not even square root Whenever anybody has asked if I will be writing a book about operating systems, my

ÅEÌ AF ÇÆÌ offers ++ (Pythagoras), abs, angle, AF reply has always been “Nix.” Therefore the AF ceiling, cosd, dir, floor, length, mexp, mlog, name of MMIX’s operating system, normaldeviate, round, sind, sqrt, and NNIX, should come as no surprise. uniformdeviate Donald E. Knuth ÇÆÌ ÇÆÌ XFloating-point and simplifies computation of fractions, X and MMIXware: scaling, and rotation: see LATEX calc package for A RISC Computer for the Third Millenium horrorsE of fixed-point arithmetic E T T Interface from TEX to scripting language allows conventional numeric programming

TEX Users Group Conference 2007 talk. . . – p. 25/30 TEX Users Group Conference 2007 talk. . . – p. 26/30 MMIX and NNIX [cont] MMIX and NNIX [cont] ÅEÌ ÅEÌ AF AF ÇÆÌ ÇÆÌ X and X and E E T T

TEX Users Group Conference 2007 talk. . . – p. 27/30 TEX Users Group Conference 2007 talk. . . – p. 28/30 MMIX and NNIX [cont] ÅEÌ ÅEÌ MMIX is a modern virtual machine used in recent volumes of DEK’s famous series The Art of Computer Programming MMIX is written as literate program in published books

arithmetic is IEEE 754 64-bit binary in software usingAF AF only 32-bit unsigned integers The End THE BEATLES 17 floating-point instructions: FADD, FCMP, FCMPE, FDIV, JULY/AUGUST 1969 FEQL, FEQLE, FINT, FIX, FIXU, FLOT, FLOTU, FMUL, FREM, [1969 = YEAR OF FIRST EDITION OF ÇÆÌ ÇÆÌ XFSQRT and , FSUB, FUN, and FUNE X and DONALD KNUTH’S Seminumerical Algorithms

gccE version 3.2 can be built for MMIX system E (TAOCP VOLUME 2)]

T NNIX is a (still virtual) Unix-like O/S for MMIX T mathcw library port to MMIX: 10 new + 12 header bug workaround, out of 250,000 lines

TEX Users Group Conference 2007 talk. . . – p. 29/30 TEX Users Group Conference 2007 talk. . . – p. 30/30