Lecture 3 Taylor Series and Floating Point Representation
J. Chaudhry (Zeb)
Department of Mathematics and Statistics University of New Mexico
J. Chaudhry (Zeb) (UNM) Math/CS 375 1 / 30 What we’ll do:
Taylor Series Floating point representation
J. Chaudhry (Zeb) (UNM) Math/CS 375 2 / 30 Taylor
All we can ever do is add and multiply √ We can’t directly evaluate ex, cos(x), x What to do? Taylor Series approximation
Taylor The Taylor series expansion of f (x) at the point x = c is given by
(x − c)2 (x − c)n f (x) = f (c) + (x − c)f 0(c) + f 00(c) + ··· + f (n)(c) + ... 2! n! (x − c)k = f (k)(c) ∞ k! = Xk 0
J. Chaudhry (Zeb) (UNM) Math/CS 375 3 / 30 Taylor Example
Taylor The Taylor series expansion of f (x) about the point x = c is given by
(x − c)2 (x − c)n f (x) = f (c) + (x − c)f 0(c) + f 00(c) + ··· + f (n)(c) + ... 2! n! (x − c)k = f (k)(c) ∞ k! = Xk 0 Example (ex) We know e0 = 1, so expand about c = 0 to get
(x − 0)2 f (x) = ex = 1 + (x − 0) · 1 + · 1 + ... 2 x2 x3 = 1 + x + + + ... 2! 3!
J. Chaudhry (Zeb) (UNM) Math/CS 375 4 / 30 Taylor Approximation
So 22 23 e2 = 1 + 2 + + + ... 2! 3! But we can’t evaluate an infinite series, so we truncate...
Taylor Series Polynomial Approximation The Taylor Polynomial of degree n for the function f (x) about the point c is
n (x − c)k p (x) = f (k)(c) n k! = Xk 0 Example (ex) In the case of the exponential
x2 xn ex ≈ p (x) = 1 + x + + ··· + n 2! n!
J. Chaudhry (Zeb) (UNM) Math/CS 375 5 / 30 Taylor Approximation
Evaluate e2: Using 0th order Taylor series: ex ≈ 1 does not give a good fit. Using 1st order Taylor series: ex ≈ 1 + x gives a better fit. Using 2nd order Taylor series: ex ≈ 1 + x + x2/2 gives a a really good fit.
1 x=2;
2 pn=0;
3 for j=0:15
4 pn = pn + (xˆj)/factorial(j);
5 err = exp(2)-pn
6 pause
7 end
J. Chaudhry (Zeb) (UNM) Math/CS 375 6 / 30 Taylor Approximation (Important to know all three forms)
Infinite Taylor Series Expansion (exact) (x − c)2 (x − c)n f (x) = f (c) + (x − c)f 0(c) + f 00(c) + ··· + f (n)(c) + ... 2! n!
Finite Taylor Series Expansion (exact) (x − c)2 (x − c)n f (x) = f (c) + (x − c)f 0(c) + f 00(c) + ··· + f (n)(ξ), 2! n! where ξ lies between c and x but we don’t know its value.
Finite Taylor Series Approximation (x − c)2 (x − c)n f (x) ≈ f (c) + (x − c)f 0(c) + f 00(c) + ··· + f (n)(c). 2! n!
J. Chaudhry (Zeb) (UNM) Math/CS 375 7 / 30 Taylor Approximation Error
How accurate is the Taylor series polynomial approximation? The n terms of the approximation are simply the first n terms of the exact expansion: x2 x3 ex = 1 + x + + + ... (1) 2! 3! x p2 approximation to e truncation error So the function f (x) can be| written{z as} the Taylor| {z Series} approximation plus an error (truncation) term:
f (x) = pn(x) + En(x)
where (x − c)n+1 E (x) = f (n+1)(ξ) n (n + 1)! where ξ lies between c and x, that is ξ ∈ [c, x].
J. Chaudhry (Zeb) (UNM) Math/CS 375 8 / 30 Truncation Error
The Taylor series expansion of sin (x) is
x3 x5 x7 x9 sin (x) = x − + − + − ... 3! 5! 7! 9! If x 1, then the remaining terms are small. If we neglect these terms
x3 x5 x7 x9 sin (x) = x − + − + − ... 3! 5! 7! 9! approximation to sin truncation error | {z } | {z }
J. Chaudhry (Zeb) (UNM) Math/CS 375 9 / 30 Taylor Series: Computations
( ) = 1 How do we evaluate f x 1−x computationally? Taylor Series Expansion:
(x − c)2 (x − c)n f (x) = f (c) + (x − c)f 0(c) + f 00(c) + ··· + f (n)(ξ), 2! n! Thus with c = 0 1 = 1 + x + x2 + x3 + ... 1 − x Second order approximation:
1 ≈ 1 + x + x2 1 − x
J. Chaudhry (Zeb) (UNM) Math/CS 375 10 / 30 Taylor Errors
How many terms do I need to make sure my error is less than 2 × 10−8 for x = 1/2? 1 = 1 + x + x2 + ··· + xn + xk 1 − x ∞ = + k Xn 1 so the error at x = 1/2 is
1k (1/2)n+1 ex=1/2 = = ∞ 2 1 − 1/2 = + k Xn 1 = 2 · (1/2)n+1 < 2 × 10−8
then we need −8 n + 1 > ≈ 26.6 or log10(1/2) n > 26
J. Chaudhry (Zeb) (UNM) Math/CS 375 11 / 30 Taylor Series: Alternate Forms
The Taylor series expansion of f (x) at the point x = c is given by Taylor
(x − c)2 (x − c)n f (x) = f (c) + (x − c)f 0(c) + f 00(c) + ··· + f (n)(c) + ... 2! n! (x − c)k = f (k)(c) ∞ k! = Xk 0 Substitute x = x + h and c = x, Taylor: Alternate
h2 hn f (x + h) = f (x) + hf 0(x) + f 00(x) + ··· + f (n)(x) + ... 2! n! hk = f (k)(x) ∞ k! = Xk 0
J. Chaudhry (Zeb) (UNM) Math/CS 375 12 / 30 Floating Point Numbers
We’re familiar with base 10 representation of numbers:
1234 = 4 × 100 + 3 × 101 + 2 × 102 + 1 × 103
and .1234 = 1 × 10−1 + 2 × 10−2 + 3 × 10−3 + 4 × 10−4 we write 1234.1234 as an integer part and a fractional part:
a3a2a1a0.b1b2b3b4
For some (even simple) numbers, there may be an infinite number of digits to the right:
π = 3.14159 ... 1/9 = 0.11111 ... √ 2 = 1.41421 ...
J. Chaudhry (Zeb) (UNM) Math/CS 375 13 / 30 Other bases
So far, we have just base 10. What about base β? binary (β = 2), octal (β = 8), hexadecimal (β = 16), etc In the β-system we have
n k −k (an ... a2a1a0.b1b2b3b4 ... )β = akβ + bkβ ∞ = = Xk 0 Xk 1
J. Chaudhry (Zeb) (UNM) Math/CS 375 14 / 30 Conversion from base-2 to base-10
Convert a base-2 number to base-10:
(1001.101)2 = 1 × 23 + 0 × 22 + 0 × 21 + 1 × 20 + 1 × 2−1 + 0 × 2−2 + 1 × 2−3 = 8 + 0 + 0 + 1 + 0.5 + 0 + 0.125 = 9.62
1 = For other numbers, such as 5 0.2, an infinite length is needed.
0.2 → .0011 0011 0011 ...
So 0.2 is stored just fine in base-10, but needs infinite number of digits in base-2
!!! This is roundoff error in its basic form...
J. Chaudhry (Zeb) (UNM) Math/CS 375 15 / 30 Numerical ”bugs”
Obvious Software has bugs
Not-SO-Obvious Numerical software has two unique bugs: 1 roundoff error 2 truncation error
J. Chaudhry (Zeb) (UNM) Math/CS 375 16 / 30 Numerical Errors
Roundoff Roundoff occurs when digits in a decimal point (0.3333...) are lost (0.3333) due to a limit on the memory available for storing one numerical value.
Truncation Truncation error occurs when discrete values are used to approximate a mathematical expression (e.g. finite number of terms in Taylor series).
J. Chaudhry (Zeb) (UNM) Math/CS 375 17 / 30 Uncertainty: well- or ill-conditioned?
Errors in input data can cause uncertain results input data can be experimental or rounded. leads to a certain variation in the results well-conditioned: numerical results are insensitive to small variations in the input ill-conditioned: small variations lead to drastically different numerical calculations (a.k.a. poorly conditioned)
J. Chaudhry (Zeb) (UNM) Math/CS 375 18 / 30 Our Job
As numerical analysts, we need to 1 solve a problem so that the calculation is not susceptible to large roundoff error 2 solve a problem so that the approximation has a tolerable truncation error
How? incorporate roundoff-truncation knowledge into I the mathematical model I the method I the algorithm I the software design awareness → correct interpretation of results
J. Chaudhry (Zeb) (UNM) Math/CS 375 19 / 30 Floating Points
Normalized Floating-Point Representation Real numbers are stored as
e x = ±(0.d1d2d3 ... dm)β × β
d1d2d3 ... dm is the mantissa, e is the exponent e is negative, positive or zero
the general normalized form requires d1 , 0
J. Chaudhry (Zeb) (UNM) Math/CS 375 20 / 30 Floating Point
Example In base 10 1000.12345 can be written as
4 (0.100012345)10 × 10
0.000812345 can be written as
−3 (0.812345)10 × 10
J. Chaudhry (Zeb) (UNM) Math/CS 375 21 / 30 Floating Point
Suppose we have only 3 bits for a mantissa and a 2 bit exponent stored like
.d1 d2 d3 e1 e2
All possible combinations give:
0002 = 0 ... × 2−1,0,1
1112 = 7
1 2 7 1 2 7 1 2 7 So we get 0, 16 , 16 ,..., 16 , 0, 4 , 4 ,..., 4 , and 0, 8 , 8 ,..., 8 . On the real line:
J. Chaudhry (Zeb) (UNM) Math/CS 375 22 / 30 Overflow, Underflow
computations too close to zero may result in underflow computations too large may result in overflow overflow error is considered more severe underflow can just fall back to 0
J. Chaudhry (Zeb) (UNM) Math/CS 375 23 / 30 Normalizing
−1,0,1 If we use the normalized form in our 5-bit case, we lose 0.0012 × 2 . So we 1 1 3 cannot represent 16 , 8 , and 16 .
J. Chaudhry (Zeb) (UNM) Math/CS 375 24 / 30 IEEE-754
IEEE-754 is a widely used standard accepted by hardware/software makers I defines the floating point distribution for our computation I offer several rounding modes which effect accuracy Floating point arithmetic emerges in nearly every piece of code I even modest mathematical operation yield loss of significant bits I several pitfalls in common mathematical expressions
J. Chaudhry (Zeb) (UNM) Math/CS 375 25 / 30 IEEE Floating Point (IEEE-754)
How much storage do we actually use in practice? 32-bit word lengths are typical Single-precision floating-point numbers use 32 bits Double-precision floating-point numbers use 64 bits Long Double-precision floating-point numbers use 80 bits Goal: use the 32-bits to best represent the normalized floating point number
J. Chaudhry (Zeb) (UNM) Math/CS 375 26 / 30 IEEE Single Precision
x = ±q × 2m
Notes: 1-bit sign 8-bit exponent |m| 23-bit mantissa q The leading one in the mantissa q does not need to be represented: b1 = 1 is hidden bit IEEE 754: put x in 1.f normalized form 0 < m + 127 = c < 255 Largest exponent = 127, Smallest exponent = -126 Special cases: c = 0, 255 for ±0, ± inf, nan ··· use 1-bit sign for negatives (1) and positives (0)
J. Chaudhry (Zeb) (UNM) Math/CS 375 27 / 30 IEEE Single Precision
x = ±q × 2m
1 (−52.125)10 = −(110100.00100000000)2
2 5 Convert to 1.f form: x = − (1. 101 000 010 000 ... 0)2 × 2 1 23
3 Convert exponent 5 = c −|{z}127 ⇒| c = 132{z⇒ c = (10} 000 100)2 8 1 10 000 100 101 000 010 000 ...| 0 {z } 1 8 23 |{z} | {z } | {z }
J. Chaudhry (Zeb) (UNM) Math/CS 375 28 / 30 IEEE Double Precision
1-bit sign 11-bit exponent 52-bit mantissa single-precision: about 6 decimal digits of precision double-precision: about 15 decimal digits of precision m = c − 1023
J. Chaudhry (Zeb) (UNM) Math/CS 375 29 / 30 Precision vs. Range
type range approx range − −3.40 × 1038 6 x 6 −1.18 × 10 38 single 0 2−126 → 2128 − 1.18 × 10 38 6 x 6 3.40 × 1038 − −1.80 × 10318 6 x 6 −2.23 × 10 308 double 0 2−1022 → 21024 − 2.23 × 10 308 6 x 6 1.80 × 10308 Listing 1: Matlab Tools
1 >> realmax
2 ans = 1.7977e+308
3
4 >> realmin
5 ans = 2.2251e-308
J. Chaudhry (Zeb) (UNM) Math/CS 375 30 / 30