<<

Lecture 3 Taylor Series and Floating Point Representation

J. Chaudhry (Zeb)

Department of Mathematics and Statistics University of New Mexico

J. Chaudhry (Zeb) (UNM) Math/CS 375 1 / 30 What we’ll do:

Taylor Series Floating point representation

J. Chaudhry (Zeb) (UNM) Math/CS 375 2 / 30 Taylor

All we can ever do is add and multiply √ We can’t directly evaluate ex, cos(x), x What to do? Taylor Series approximation

Taylor The Taylor series expansion of f (x) at the point x = is given by

(x − c)2 (x − c)n f (x) = f (c) + (x − c)f 0(c) + f 00(c) + ··· + f (n)(c) + ... 2! n! (x − c)k = f (k)(c) ∞ k! = Xk 0

J. Chaudhry (Zeb) (UNM) Math/CS 375 3 / 30 Taylor Example

Taylor The Taylor series expansion of f (x) about the point x = c is given by

(x − c)2 (x − c)n f (x) = f (c) + (x − c)f 0(c) + f 00(c) + ··· + f (n)(c) + ... 2! n! (x − c)k = f (k)(c) ∞ k! = Xk 0 Example (ex) We know e0 = 1, so expand about c = 0 to get

(x − 0)2 f (x) = ex = 1 + (x − 0) · 1 + · 1 + ... 2 x2 x3 = 1 + x + + + ... 2! 3!

J. Chaudhry (Zeb) (UNM) Math/CS 375 4 / 30 Taylor Approximation

So 22 23 e2 = 1 + 2 + + + ... 2! 3! But we can’t evaluate an infinite series, so we truncate...

Taylor Series Polynomial Approximation The Taylor Polynomial of degree n for the function f (x) about the point c is

n (x − c)k p (x) = f (k)(c) n k! = Xk 0 Example (ex) In the case of the exponential

x2 xn ex ≈ p (x) = 1 + x + + ··· + n 2! n!

J. Chaudhry (Zeb) (UNM) Math/CS 375 5 / 30 Taylor Approximation

Evaluate e2: Using 0th order Taylor series: ex ≈ 1 does not give a good fit. Using 1st order Taylor series: ex ≈ 1 + x gives a better fit. Using 2nd order Taylor series: ex ≈ 1 + x + x2/2 gives a a really good fit.

1 x=2;

2 pn=0;

3 for j=0:15

4 pn = pn + (xˆj)/factorial(j);

5 err = exp(2)-pn

6 pause

7 end

J. Chaudhry (Zeb) (UNM) Math/CS 375 6 / 30 Taylor Approximation (Important to know all three forms)

Infinite Taylor Series Expansion (exact) (x − c)2 (x − c)n f (x) = f (c) + (x − c)f 0(c) + f 00(c) + ··· + f (n)(c) + ... 2! n!

Finite Taylor Series Expansion (exact) (x − c)2 (x − c)n f (x) = f (c) + (x − c)f 0(c) + f 00(c) + ··· + f (n)(ξ), 2! n! where ξ lies between c and x but we don’t know its value.

Finite Taylor Series Approximation (x − c)2 (x − c)n f (x) ≈ f (c) + (x − c)f 0(c) + f 00(c) + ··· + f (n)(c). 2! n!

J. Chaudhry (Zeb) (UNM) Math/CS 375 7 / 30 Taylor Approximation Error

How accurate is the Taylor series polynomial approximation? The n terms of the approximation are simply the first n terms of the exact expansion: x2 x3 ex = 1 + x + + + ... (1) 2! 3! x p2 approximation to e truncation error So the function f (x) can be| written{z as} the Taylor| {z Series} approximation plus an error (truncation) term:

f (x) = pn(x) + En(x)

where (x − c)n+1 E (x) = f (n+1)(ξ) n (n + 1)! where ξ lies between c and x, that is ξ ∈ [c, x].

J. Chaudhry (Zeb) (UNM) Math/CS 375 8 / 30 Truncation Error

The Taylor series expansion of sin (x) is

x3 x5 x7 x9 sin (x) = x − + − + − ... 3! 5! 7! 9! If x  1, then the remaining terms are small. If we neglect these terms

x3 x5 x7 x9 sin (x) = x − + − + − ... 3! 5! 7! 9! approximation to sin truncation error | {z } | {z }

J. Chaudhry (Zeb) (UNM) Math/CS 375 9 / 30 Taylor Series: Computations

( ) = 1 How do we evaluate f x 1−x computationally? Taylor Series Expansion:

(x − c)2 (x − c)n f (x) = f (c) + (x − c)f 0(c) + f 00(c) + ··· + f (n)(ξ), 2! n! Thus with c = 0 1 = 1 + x + x2 + x3 + ... 1 − x Second order approximation:

1 ≈ 1 + x + x2 1 − x

J. Chaudhry (Zeb) (UNM) Math/CS 375 10 / 30 Taylor Errors

How many terms do I need to make sure my error is less than 2 × 10−8 for x = 1/2? 1 = 1 + x + x2 + ··· + xn + xk 1 − x ∞ = + k Xn 1 so the error at x = 1/2 is

1k (1/2)n+1 ex=1/2 = = ∞ 2 1 − 1/2 = + k Xn 1 = 2 · (1/2)n+1 < 2 × 10−8

then we need −8 n + 1 > ≈ 26.6 or log10(1/2) n > 26

J. Chaudhry (Zeb) (UNM) Math/CS 375 11 / 30 Taylor Series: Alternate Forms

The Taylor series expansion of f (x) at the point x = c is given by Taylor

(x − c)2 (x − c)n f (x) = f (c) + (x − c)f 0(c) + f 00(c) + ··· + f (n)(c) + ... 2! n! (x − c)k = f (k)(c) ∞ k! = Xk 0 Substitute x = x + h and c = x, Taylor: Alternate

h2 hn f (x + h) = f (x) + hf 0(x) + f 00(x) + ··· + f (n)(x) + ... 2! n! hk = f (k)(x) ∞ k! = Xk 0

J. Chaudhry (Zeb) (UNM) Math/CS 375 12 / 30 Floating Point Numbers

We’re familiar with base 10 representation of numbers:

1234 = 4 × 100 + 3 × 101 + 2 × 102 + 1 × 103

and .1234 = 1 × 10−1 + 2 × 10−2 + 3 × 10−3 + 4 × 10−4 we write 1234.1234 as an integer part and a fractional part:

a3a2a1a0.b1b2b3b4

For some (even simple) numbers, there may be an infinite number of digits to the right:

π = 3.14159 ... 1/9 = 0.11111 ... √ 2 = 1.41421 ...

J. Chaudhry (Zeb) (UNM) Math/CS 375 13 / 30 Other bases

So far, we have just base 10. What about base β? binary (β = 2), octal (β = 8), hexadecimal (β = 16), etc In the β-system we have

n k −k (an ... a2a1a0.b1b2b3b4 ... )β = akβ + bkβ ∞ = = Xk 0 Xk 1

J. Chaudhry (Zeb) (UNM) Math/CS 375 14 / 30 Conversion from base-2 to base-10

Convert a base-2 number to base-10:

(1001.101)2 = 1 × 23 + 0 × 22 + 0 × 21 + 1 × 20 + 1 × 2−1 + 0 × 2−2 + 1 × 2−3 = 8 + 0 + 0 + 1 + 0.5 + 0 + 0.125 = 9.62

1 = For other numbers, such as 5 0.2, an infinite length is needed.

0.2 → .0011 0011 0011 ...

So 0.2 is stored just fine in base-10, but needs infinite number of digits in base-2

!!! This is roundoff error in its basic form...

J. Chaudhry (Zeb) (UNM) Math/CS 375 15 / 30 Numerical ”bugs”

Obvious Software has bugs

Not-SO-Obvious Numerical software has two unique bugs: 1 roundoff error 2 truncation error

J. Chaudhry (Zeb) (UNM) Math/CS 375 16 / 30 Numerical Errors

Roundoff Roundoff occurs when digits in a decimal point (0.3333...) are lost (0.3333) due to a limit on the memory available for storing one numerical value.

Truncation Truncation error occurs when discrete values are used to approximate a mathematical expression (e.g. finite number of terms in Taylor series).

J. Chaudhry (Zeb) (UNM) Math/CS 375 17 / 30 Uncertainty: well- or ill-conditioned?

Errors in input data can cause uncertain results input data can be experimental or rounded. leads to a certain variation in the results well-conditioned: numerical results are insensitive to small variations in the input ill-conditioned: small variations lead to drastically different numerical calculations (a.k.a. poorly conditioned)

J. Chaudhry (Zeb) (UNM) Math/CS 375 18 / 30 Our Job

As numerical analysts, we need to 1 solve a problem so that the calculation is not susceptible to large roundoff error 2 solve a problem so that the approximation has a tolerable truncation error

How? incorporate roundoff-truncation knowledge into I the mathematical model I the method I the algorithm I the software design awareness → correct interpretation of results

J. Chaudhry (Zeb) (UNM) Math/CS 375 19 / 30 Floating Points

Normalized Floating-Point Representation Real numbers are stored as

e x = ±(0.d1d2d3 ... dm)β × β

d1d2d3 ... dm is the mantissa, e is the exponent e is negative, positive or zero

the general normalized form requires d1 , 0

J. Chaudhry (Zeb) (UNM) Math/CS 375 20 / 30 Floating Point

Example In base 10 1000.12345 can be written as

4 (0.100012345)10 × 10

0.000812345 can be written as

−3 (0.812345)10 × 10

J. Chaudhry (Zeb) (UNM) Math/CS 375 21 / 30 Floating Point

Suppose we have only 3 for a mantissa and a 2 exponent stored like

.d1 d2 d3 e1 e2

All possible combinations give:

0002 = 0 ... × 2−1,0,1

1112 = 7

1 2 7 1 2 7 1 2 7 So we get 0, 16 , 16 ,..., 16 , 0, 4 , 4 ,..., 4 , and 0, 8 , 8 ,..., 8 . On the real line:

J. Chaudhry (Zeb) (UNM) Math/CS 375 22 / 30 Overflow, Underflow

computations too close to zero may result in underflow computations too large may result in overflow overflow error is considered more severe underflow can just fall back to 0

J. Chaudhry (Zeb) (UNM) Math/CS 375 23 / 30 Normalizing

−1,0,1 If we use the normalized form in our 5-bit case, we lose 0.0012 × 2 . So we 1 1 3 cannot represent 16 , 8 , and 16 .

J. Chaudhry (Zeb) (UNM) Math/CS 375 24 / 30 IEEE-754

IEEE-754 is a widely used standard accepted by hardware/software makers I defines the floating point distribution for our computation I offer several rounding modes which effect accuracy Floating point arithmetic emerges in nearly every piece of code I even modest mathematical operation yield loss of significant bits I several pitfalls in common mathematical expressions

J. Chaudhry (Zeb) (UNM) Math/CS 375 25 / 30 IEEE Floating Point (IEEE-754)

How much storage do we actually use in practice? 32-bit word lengths are typical Single-precision floating-point numbers use 32 bits Double-precision floating-point numbers use 64 bits Long Double-precision floating-point numbers use 80 bits Goal: use the 32-bits to best represent the normalized floating point number

J. Chaudhry (Zeb) (UNM) Math/CS 375 26 / 30 IEEE Single Precision

x = ±q × 2m

Notes: 1-bit sign 8-bit exponent |m| 23-bit mantissa q The leading one in the mantissa q does not need to be represented: b1 = 1 is hidden bit IEEE 754: put x in 1.f normalized form 0 < m + 127 = c < 255 Largest exponent = 127, Smallest exponent = -126 Special cases: c = 0, 255 for ±0, ± inf, nan ··· use 1-bit sign for negatives (1) and positives (0)

J. Chaudhry (Zeb) (UNM) Math/CS 375 27 / 30 IEEE Single Precision

x = ±q × 2m

1 (−52.125)10 = −(110100.00100000000)2

2 5 Convert to 1.f form: x = − (1. 101 000 010 000 ... 0)2 × 2 1 23

3 Convert exponent 5 = c −|{z}127 ⇒| c = 132{z⇒ c = (10} 000 100)2 8 1 10 000 100 101 000 010 000 ...| 0 {z } 1 8 23 |{z} | {z } | {z }

J. Chaudhry (Zeb) (UNM) Math/CS 375 28 / 30 IEEE Double Precision

1-bit sign 11-bit exponent 52-bit mantissa single-precision: about 6 decimal digits of precision double-precision: about 15 decimal digits of precision m = c − 1023

J. Chaudhry (Zeb) (UNM) Math/CS 375 29 / 30 Precision vs. Range

type range approx range − −3.40 × 1038 6 x 6 −1.18 × 10 38 single 0 2−126 → 2128 − 1.18 × 10 38 6 x 6 3.40 × 1038 − −1.80 × 10318 6 x 6 −2.23 × 10 308 double 0 2−1022 → 21024 − 2.23 × 10 308 6 x 6 1.80 × 10308 Listing 1: Matlab Tools

1 >> realmax

2 ans = 1.7977e+308

3

4 >> realmin

5 ans = 2.2251e-308

J. Chaudhry (Zeb) (UNM) Math/CS 375 30 / 30