<<

1.2 Round-off Errors and Computer Arithmetic

1 • In a computer model, a memory storage unit – word is used to store a . • A word has only a finite number of bits. • These facts imply: 1. Only a small set of real (rational numbers) can be accurately represented on computers. 2. () errors are inevitable when computer memory is used to represent real, infinite precision numbers. 3. Small rounding errors can be amplified with careless treatment. So, do not be surprised that (9.4)10= (1001.0110)2 can not be represented exactly on computers. • Round-off error: error that is produced when a computer is used to perform calculations.

2 Binary numbers and decimal numbers • Binary number system: A method of representing numbers that has 2 as its base and uses only the digits 0 and 1. Each successive digit represents a power of 2. (… . … ) where 0 1, for each = 2,1,0, 1, 2 … 𝑏𝑏3𝑏𝑏2𝑏𝑏1𝑏𝑏0 𝑏𝑏−1𝑏𝑏−2𝑏𝑏−3 2 𝑖𝑖 • Binary≤ to𝑏𝑏 decimal:≤ 𝑖𝑖 ⋯ − − (… . … ) (… 2 + 2 + 2 + 2 = 𝑏𝑏3𝑏𝑏2𝑏𝑏1𝑏𝑏0 𝑏𝑏−1𝑏𝑏−2𝑏𝑏−3 2 + 2 3 + 22 + 1 2 …0) 3 2 1 0 𝑏𝑏 −1 𝑏𝑏 −2 𝑏𝑏 −𝑏𝑏3 𝑏𝑏−1 𝑏𝑏−2 𝑏𝑏−3 10 3 Binary machine numbers • IEEE (Institute for Electrical and Electronic Engineers) – Standards for binary and decimal floating point numbers • For example, “double” type in the “” programming language uses a 64-bit (binary digit) representation – 1 sign bit (s), – 11 exponent bits – characteristic (c), – 52 binary fraction bits – mantissa (f)

1. 0 2 1 = 2047 11 4 ≤ 𝑐𝑐 ≤ − This 64-bit binary number gives a decimal floating-point number (Normalized IEEE floating point number): 1 2 (1 + ) where 1023 is called exponent𝑠𝑠 𝑐𝑐− 1023bias. − 𝑓𝑓 • Smallest normalized positive number on machine has = 0, = 1, = 0: 2 (1 + 0) 0.22251 × 10 • Largest normalized− positive1022 number on machine has− 307𝑠𝑠= 0, 𝑐𝑐 = 2046𝑓𝑓 , = 1 2 � : 2 ≈(1 + 1 2 ) 0.17977 × 10 −52 1023 −52 𝑠𝑠 • Underflow𝑐𝑐 : 𝑓𝑓309 − < 2 �(1 + 0) − ≈ • Overflow: > 2 −1022(2 2 ) 𝑛𝑛𝑛𝑛𝑛𝑛𝑛𝑛𝑛𝑛𝑛𝑛𝑛𝑛 � • Machine epsilon 1023= 2 : this− is52 the difference between 1 𝑛𝑛𝑛𝑛𝑛𝑛and the𝑛𝑛𝑛𝑛𝑛𝑛 𝑛𝑛smallest machine�−52 − floating point number greater than 1. 𝜖𝜖𝑚𝑚𝑚𝑚𝑚𝑚푚 5 • Positive zero: = 0, = 0, = 0. • Negative zero: = 1, = 0, = 0. 𝑠𝑠 𝑐𝑐 𝑓𝑓 • Inf: = 0, = 2047, = 0 𝑠𝑠 𝑐𝑐 𝑓𝑓 • NaN: = 0, = 2047, 0 𝑠𝑠 𝑐𝑐 𝑓𝑓 • Machine epsilon = 2 . 𝑠𝑠 𝑐𝑐 𝑓𝑓 ≠ – Difference between 1 and the− smallest52 floating 𝑚𝑚𝑚𝑚𝑚𝑚푚 point number greater𝜖𝜖 than 1.

6 Example a. Convert the following binary machine number (P)2 to decimal number. = 0 10000000011 10111001000100 … 0

𝑃𝑃 2 Example b. What’s the next largest machine number of (P)2 ?

7 Decimal machine numbers • Normalized decimal floating-point form: ±0. … × 10 where 1 9 and 0 9, for each𝑛𝑛 = 2 … . 1 2 3 𝑘𝑘 A. Chopping arithmetic:𝑑𝑑 𝑑𝑑 𝑑𝑑 𝑑𝑑 1 𝑖𝑖 1. Represent≤ 𝑏𝑏 ≤ a positive number≤ 𝑏𝑏 ≤ as 𝑖𝑖 𝑘𝑘 0. … … × 10 2. chop off digits … . 𝑦𝑦This 𝑛𝑛gives: 1 2 3 𝑘𝑘 𝑘𝑘+1 𝑘𝑘+2 𝑑𝑑 𝑑𝑑 𝑑𝑑 𝑑𝑑=𝑑𝑑0. 𝑑𝑑 … × 10 𝑘𝑘+1 𝑘𝑘+2 B. Rounding arithmetic:𝑑𝑑 𝑑𝑑 𝑛𝑛 𝑓𝑓𝑓𝑓 𝑦𝑦 𝑑𝑑1𝑑𝑑2𝑑𝑑3 𝑑𝑑𝑘𝑘 1. Add 5 × 10 ( ) to 2. Chop off digits𝑛𝑛− 𝑘𝑘+1 … . 𝑦𝑦 • Remark. represents normalized decimal 𝑑𝑑𝑘𝑘+1𝑑𝑑𝑘𝑘+2 machine number. 8 𝑓𝑓𝑓𝑓 𝑦𝑦 Example 1.2.1. Compute 5-digit (a) chopping and (b) rounding values of = 3.14159265359 …

• Definition. Suppose𝜋𝜋 is an approximation to . The actual error is ∗ . The absolute error is . The relative𝑝𝑝 error∗ is , provided that𝑝𝑝 𝑝𝑝 − 𝑝𝑝 ∗ 0.∗ 𝑝𝑝−𝑝𝑝 𝑝𝑝 –𝑝𝑝Remark.− 𝑝𝑝 Relative error takes into consideration the size 𝑝𝑝 ≠of value. • Definition. The number is said to approximate to significant digits if is∗ the largest nonnegative integer for which 𝑝𝑝 5 × 10 . 𝑝𝑝 𝑡𝑡 ∗𝑡𝑡 𝑝𝑝−𝑝𝑝 −𝑡𝑡 9 𝑝𝑝 ≤ Example 1.1.2. Find absolute and relative errors, number of significant digits for: (a) = 0.3000 × 10 and = 0.3100 × 10 (b) = 0.3000 × 101 and ∗ = 0.3100 × 101 . 𝑝𝑝 𝑝𝑝 −3 ∗ −3 𝑝𝑝 𝑝𝑝 Example c. Find a bound of relative error for k-digit chopping arithmetic.

10 Finite-Digit arithmetic • Arithmetic in a computer is not exact. • Let machine addition, subtraction, multiplication and division be , , , . = + ⊕=⊖ ⊗( ⊘ ( )) 𝑥𝑥 ⊕ 𝑦𝑦 = 𝑓𝑓𝑓𝑓(𝑓𝑓𝑓𝑓 𝑥𝑥 × 𝑓𝑓𝑓𝑓(𝑦𝑦)) 𝑥𝑥 ⊖ 𝑦𝑦 = 𝑓𝑓𝑓𝑓(𝑓𝑓𝑓𝑓 𝑥𝑥 ÷− 𝑓𝑓𝑓𝑓(𝑦𝑦)) 𝑥𝑥 ⊗ 𝑦𝑦 𝑓𝑓𝑓𝑓 𝑓𝑓𝑓𝑓 𝑥𝑥 𝑓𝑓𝑓𝑓 𝑦𝑦 Example 1.1.3𝑥𝑥. ⊘=𝑦𝑦 , 𝑓𝑓𝑓𝑓=𝑓𝑓𝑓𝑓 ,𝑥𝑥 = 0𝑓𝑓𝑓𝑓.714251𝑦𝑦 , = 98765.9. Use 5-digit5 chopping1 arithmetic to compute 7 3 , , (𝑥𝑥 𝑦𝑦) . 𝑢𝑢Compute relative𝑣𝑣 error for . 𝑥𝑥 ⊕ 𝑦𝑦 𝑥𝑥 ⊖ 𝑢𝑢 𝑥𝑥 ⊖ 𝑢𝑢 ⊗ 𝑣𝑣 11 𝑥𝑥 ⊖ 𝑢𝑢 Calculations resulting in loss of accuracy 1. Subtracting nearly equal numbers gives fewer significant digits. 2. Dividing by a number with small magnitude or multiplying by a number with large magnitude will enlarge the error.

Example d. Suppose is approximated by + . where error is introduced by previous calculation. Let = 10 , > 0𝑧𝑧. Estimate the absolute𝑧𝑧 error𝛿𝛿 of . −𝛿𝛿𝑛𝑛 𝜀𝜀 𝑛𝑛 𝑧𝑧 ⊘ 𝜀𝜀 12 Technique to reduce round-off error • Reformulate the calculation. Example e. Compute the most accurate approximation to roots of + 62. + 1 = 0 with 4-digit rounding arithmetic. 2 𝑥𝑥 10𝑥𝑥 • Nested arithmetic – Purpose is to reduce number of calculations. Example 1.2.5. evaluate = 6.1 + 3. + 1.5 at = 4.71 using 3-digit chopping3 arithmetic.2 𝑓𝑓 𝑥𝑥 𝑥𝑥 − 𝑥𝑥 2𝑥𝑥 𝑥𝑥 13