1.2 Round-Off Errors and Computer Arithmetic

1.2 Round-off Errors and Computer Arithmetic 1 • In a computer model, a memory storage unit – word is used to store a number. • A word has only a finite number of bits. • These facts imply: 1. Only a small set of real numbers (rational numbers) can be accurately represented on computers. 2. (Rounding) errors are inevitable when computer memory is used to represent real, infinite precision numbers. 3. Small rounding errors can be amplified with careless treatment. So, do not be surprised that (9.4)10= (1001.0110)2 can not be represented exactly on computers. • Round-off error: error that is produced when a computer is used to perform real number calculations. 2 Binary numbers and decimal numbers • Binary number system: A method of representing numbers that has 2 as its base and uses only the digits 0 and 1. Each successive digit represents a power of 2. (… . … ) where 0 1, for each = 2,1,0, 1, 2 … 3210 −1−2−3 2 • Binary≤ to decimal:≤ ⋯ − − (… . … ) (… 2 + 2 + 2 + 2 = 3210 −1−2−3 2 + 2 3 + 22 + 1 2 …0) 3 2 1 0 −1 −2 −3 −1 −2 −3 10 3 Binary machine numbers • IEEE (Institute for Electrical and Electronic Engineers) – Standards for binary and decimal floating point numbers • For example, “double” type in the “C” programming language uses a 64-bit (binary digit) representation – 1 sign bit (s), – 11 exponent bits – characteristic (c), – 52 binary fraction bits – mantissa (f) 1. 0 2 1 = 2047 11 4 ≤ ≤ − This 64-bit binary number gives a decimal floating-point number (Normalized IEEE floating point number): 1 2 (1 + ) where 1023 is called exponent − 1023bias. − • Smallest normalized positive number on machine has = 0, = 1, = 0: 2 (1 + 0) 0.22251 × 10 • Largest normalized− positive1022 number on machine has− 307= 0, = 2046 , = 1 2 � : 2 ≈(1 + 1 2 ) 0.17977 × 10 −52 1023 −52 • Underflow : 309 − < 2 �(1 + 0) − ≈ • Overflow: > 2 −1022(2 2 ) � • Machine epsilon 1023= 2 : this− is52 the difference between 1 and the smallest machine�−52 − floating point number greater than 1. 푚 5 • Positive zero: = 0, = 0, = 0. • Negative zero: = 1, = 0, = 0. • Inf: = 0, = 2047, = 0 • NaN: = 0, = 2047, 0 • Machine epsilon = 2 . ≠ – Difference between 1 and the− smallest52 floating 푚 point number greater than 1. 6 Example a. Convert the following binary machine number (P)2 to decimal number. = 0 10000000011 10111001000100 … 0 2 Example b. What’s the next largest machine number of (P)2 ? 7 Decimal machine numbers • Normalized decimal floating-point form: ±0. … × 10 where 1 9 and 0 9, for each = 2 … . 1 2 3 A. Chopping arithmetic: 1 1. Represent≤ ≤ a positive number≤ ≤ as 0. … … × 10 2. chop off digits … . This gives: 1 2 3 +1 +2 =0. … × 10 +1 +2 B. Rounding arithmetic: 123 1. Add 5 × 10 ( ) to 2. Chop off digits− +1 … . • Remark. represents normalized decimal +1+2 machine number. 8 Example 1.2.1. Compute 5-digit (a) chopping and (b) rounding values of = 3.14159265359 … • Definition. Suppose is an approximation to . The actual error is ∗ . The absolute error is . The relative error∗ is , provided that − ∗ 0.∗ − –Remark.− Relative error takes into consideration the size ≠of value. • Definition. The number is said to approximate to significant digits if is∗ the largest nonnegative integer for which 5 × 10 . ∗ − − 9 ≤ Example 1.1.2. Find absolute and relative errors, number of significant digits for: (a) = 0.3000 × 10 and = 0.3100 × 10 (b) = 0.3000 × 101 and ∗ = 0.3100 × 101 . −3 ∗ −3 Example c. Find a bound of relative error for k-digit chopping arithmetic. 10 Finite-Digit arithmetic • Arithmetic in a computer is not exact. • Let machine addition, subtraction, multiplication and division be , , , . = + ⊕=⊖ ⊗( ⊘ ( )) ⊕ = ( × ()) ⊖ = ( ÷− ()) ⊗ Example 1.1.3. ⊘= , = , = 0.714251 , = 98765.9. Use 5-digit5 chopping1 arithmetic to compute 7 3 , , ( ) . Compute relative error for . ⊕ ⊖ ⊖ ⊗ 11 ⊖ Calculations resulting in loss of accuracy 1. Subtracting nearly equal numbers gives fewer significant digits. 2. Dividing by a number with small magnitude or multiplying by a number with large magnitude will enlarge the error. Example d. Suppose is approximated by + . where error is introduced by previous calculation. Let = 10 , > 0. Estimate the absolute error of . − ⊘ 12 Technique to reduce round-off error • Reformulate the calculation. Example e. Compute the most accurate approximation to roots of + 62. + 1 = 0 with 4-digit rounding arithmetic. 2 10 • Nested arithmetic – Purpose is to reduce number of calculations. Example 1.2.5. evaluate = 6.1 + 3. + 1.5 at = 4.71 using 3-digit chopping3 arithmetic.2 − 2 13.

1.2 Round-Off Errors and Computer Arithmetic

Scilab Textbook Companion for Numerical Methods for Engineers by S

Extended Precision Floating Point Arithmetic

Introduction & Floating Point NMMC §1.4.1, Atkinson §1.2 1 Floating

Floating Point Representation (Unsigned) Fixed-Point Representation

Absolute and Relative Error All Computations on a Computer Are Approximate by Nature, Due to the Limited Precision on the Computer

Numerical Computations – with a View Towards R

CS 542G: Preliminaries, Floating-Point, Errors

Limitations of Digital Representations Machine Epsilon

Adaptive Precision Floating-Point Arithmetic and Fast Robust Geometric Predicates

Floating-Point Numbers and Rounding Errors

• 1E-8 Smaller Than Machine Epsilon (Float) • Forward Sum Fails Utterly

Video 1: Rounding Errors % a Number System Can Be Represented As = ±1