Numerical computations – with a view towards R

Søren Højsgaard
Department of Mathematical Sciences, Aalborg University, Denmark
October 8, 2012

Contents

1 Computer arithmetic is not exact

2 Floating point arithmetic
  2.1 Addition and subtraction
  2.2 The Relative Machine Precision
  2.3 Floating-Point Precision
  2.4 Error in Floating-Point Computations

3 Example: A polynomial

4 Example: Numerical derivatives

5 Example: The Sample Variance

6 Example: Solving a quadratic equation

7 Example: Computing the exponential

8 Resources

1 Computer arithmetic is not exact

The following R statement appears to give the correct answer.

R> 0.3 - 0.2

[1] 0.1

But all is not as it seems.

R> 0.3 - 0.2 == 0.1

[1] FALSE

The difference between the values being compared is small, but important.

R> (0.3 - 0.2) - 0.1

[1] -2.775558e-17
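Comparisons of computed floating-point values should therefore allow for a small tolerance. A minimal illustration using R's all.equal(), which compares numbers up to a tolerance (about 1.5e-8 by default):

R> isTRUE(all.equal(0.3 - 0.2, 0.1))

[1] TRUE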

2 Floating point arithmetic

Real numbers “do not exist” in computers. Numbers in computers are represented in floating point form

$$s \times b^e$$

where $s$ is the significand, $b$ is the base and $e$ is the exponent. R has “numeric” (floating point) and “integer” numbers.

R> class(1)

[1] "numeric"

R> class(1L)

[1] "integer"

2.1 Addition and subtraction

Let $s$ be a 7-digit significand. A simple method to add floating-point numbers is to first represent them with the same exponent.

123456.7 = 1.234567 * 10^5
101.7654 = 1.017654 * 10^2 = 0.001017654 * 10^5

So the true result is

(1.234567 + 0.001017654) * 10^5 = 1.235584654 * 10^5

But the approximate result the computer would give is (the last digits (654) are lost)

1.235585 * 10^5 (final sum: 123558.5)

In extreme cases, the sum of two non-zero numbers may be equal to one of them.
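For example (assuming IEEE double precision, where the spacing between representable numbers near $10^{16}$ is 2):

R> 1e16 + 1 == 1e16

[1] TRUE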

Quiz: how can we sum a sequence of numbers $x_1, \dots, x_n$ to obtain high accuracy?
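A small illustration (a hint rather than a full answer): adding tiny terms one at a time to an already large sum loses them, while summing the smallest terms first keeps them. The helper addup() is ours:

R> addup <- function(x){
+    s <- 0
+    for (xi in x) s <- s + xi   # naive left-to-right summation
+    s
+  }
R> x <- c(1, rep(1e-16, 10000))
R> addup(x) == 1       # each tiny term is lost against the 1

[1] TRUE

R> addup(rev(x)) == 1  # smallest first: their sum (about 1e-12) survives

[1] FALSE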

2.2 The Relative Machine Precision

The accuracy of a floating-point system is measured by the relative machine precision or machine epsilon. This is the smallest positive value which can be added to 1 to produce a value different from 1. A machine epsilon of $10^{-7}$ indicates that there are roughly 7 decimal digits of precision in the numeric values stored and manipulated by the computer. It is easy to write a program to determine the relative machine precision.

R> .Machine$double.eps

[1] 2.220446e-16

R> macheps <- function(){
+    eps <- 1
+    while (1 + eps/2 != 1) eps <- eps/2
+    eps
+  }
R> macheps()

[1] 2.220446e-16
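On hardware with IEEE 754 double precision (which holds for essentially all machines R runs on), the significand carries 53 bits, so the machine epsilon is exactly $2^{-52}$:

R> .Machine$double.eps == 2^-52

[1] TRUE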

2.3 Floating-Point Precision

The preceding program shows that there are roughly 16 decimal digits of precision in R arithmetic. It is possible to see the effects of this limited precision directly.

R> a = 12345678901234567890
R> print(a, digits=20)

[1] 12345678901234567168

The effects of finite precision show up in the results of calculations.

2.4 Error in Floating-Point Computations

Numbers are accurate to about 15 significant digits. Subtraction of positive values is one place where the finite precision of floating-point arithmetic is a potential problem.

R> x = 1 + 1.234567890e-10
R> print(x, digits = 20)

[1] 1.0000000001234568003

R> y = x - 1
R> print(y, digits = 20)

[1] 1.234568003383174073e-10

There are 16 correct digits in x, but only 6 correct digits in y. Subtraction of nearly equal quantities (known as near cancellation) is a major source of inaccuracy in numerical calculations and requires special care.

3 Example: A polynomial

The function

$$f(x) = x^7 - 7x^6 + 21x^5 - 35x^4 + 35x^3 - 21x^2 + 7x - 1$$

is a 7th degree polynomial, and its graph should appear very smooth. To check this we can compute and graph the function over a range of values.

R> x = seq(.988, 1.012, by = 0.0001)
R> y = x^7 - 7*x^6 + 21*x^5 - 35*x^4 + 35*x^3 - 21*x^2 + 7*x - 1
R> plot(x, y, type = "l")

[Figure: plot of y against x for x between 0.988 and 1.012; the curve oscillates wildly at the level of about 4e-14.]

Not very smooth! To see where the cancellation error comes from, split the polynomial into individual terms and see what happens when we sum them.

R> x = .99
R> y = c(x^7, -7*x^6, +21*x^5, -35*x^4, 35*x^3, -21*x^2, +7*x, -1)
R> y

[1]   0.9320653  -6.5903610  19.9707910 -33.6208604  33.9604650 -20.5821000
[7]   6.9300000  -1.0000000

R> cumsum(y)

[1]  9.320653e-01 -5.658296e+00  1.431250e+01 -1.930837e+01  1.465210e+01
[6] -5.930000e+00  1.000000e+00 -1.521006e-14

It is the last subtraction (of 1) which causes the catastrophic cancellation and loss of accuracy. We can reformulate the problem by noticing that

$$f(x) = x^7 - 7x^6 + 21x^5 - 35x^4 + 35x^3 - 21x^2 + 7x - 1 = (x - 1)^7$$

Notice that although we are still getting cancellation, when 1 is subtracted from values close to 1, we are only losing a few digits of accuracy. The difference is apparent in the plot.

R> x = seq(.988, 1.012, by = 0.0001)
R> y = (x - 1)^7
R> plot(x, y, type = "l")

[Figure: plot of y = (x-1)^7 against x; the curve is now smooth, at the level of about 3e-14.]

4 Example: Numerical derivatives

The derivative $f'(x)$ may be approximated by

$$f'(x) \approx \frac{f(x + h/2) - f(x - h/2)}{h}, \qquad h \text{ small}$$

For small $h$ we get near-cancellation errors. A generic R function is

R> numDeriv <- function(f, x, h=1e-8){
+    (f(x + h/2) - f(x - h/2))/h
+  }

Find the derivative of the exponential at $x = 1$:

R> g <- function(x){ exp(x) }
R> print(numDeriv(g, 1), digits=20)

[1] 2.7182818218562943002

R> print(exp(1), digits=20)

[1] 2.7182818284590450908

Try a range of $h$ values:

R> hvec <- seq(5e-9, 1e-6, 1e-8)
R> dvec <- numDeriv(g, 1, hvec)
R> plot(hvec, (dvec - exp(1))/exp(1), type='l')
R> abline(h=0, col='red')

[Figure: relative error (dvec - exp(1))/exp(1) plotted against hvec; the error fluctuates at the level of about 1e-8.]
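The plot suggests a trade-off: $h$ must be small for the approximation error to be small, but not so small that cancellation in $f(x+h/2) - f(x-h/2)$ dominates. A common rule of thumb for central differences, stated here as a sketch rather than a guarantee, is to take $h$ of the order $\epsilon^{1/3}|x|$ where $\epsilon$ is the machine epsilon:

R> # with h far too small, the difference f(x+h/2)-f(x-h/2) is mostly noise:
R> numDeriv(g, 1, h=1e-15)
R> # rule-of-thumb h, about 6e-6 for x = 1:
R> numDeriv(g, 1, h=.Machine$double.eps^(1/3))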

5 Example: The Sample Variance

Consider the formula for the sample variance

$$\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n-1}\Big(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\Big)$$

The left-hand side of this equation provides a much better computational procedure for finding the sample variance than the right-hand side. If the mean of the $x_i$ is far from 0, then $\sum_i x_i^2$ and $n\bar{x}^2$ will be large and nearly equal to each other. The relative error which results from applying the right-hand side formula can be very large. There can, of course, be loss of accuracy using the formula on the left, but it is not nearly as severe.

R> sdval <- 5
R> x <- rnorm(10, mean=1000000, sd=sdval)
R> x.bar <- mean(x)
R> n <- length(x)
R> lhs <- sum((x - x.bar)^2)/(n-1)
R> rhs <- (sum(x^2) - n*x.bar^2)/(n-1)
R> print(lhs, digits=10)

[1] 9.199470311

R> print(rhs, digits=10)

[1] 9.19921875

R> print(rhs-lhs, digits=10)

[1] -0.0002515605223
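If a formula of the right-hand-side type is nonetheless wanted, a common remedy is to shift the data by any rough estimate of the mean before accumulating; the algebra is unchanged but the summands stay small. A sketch (the helper shifted_var() and the choice of x[1] as shift are ours):

R> shifted_var <- function(x){
+    k <- x[1]                # any value near mean(x) works as a shift
+    n <- length(x)
+    (sum((x - k)^2) - n*(mean(x) - k)^2) / (n - 1)
+  }
R> print(shifted_var(x), digits=10)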

6 Example: Solving a quadratic equation

Consider solving $ax^2 + bx + c = 0$.

Letting $D = b^2 - 4ac$, the roots are

$$r_1 = \frac{-b + \sqrt{D}}{2a}, \qquad r_2 = \frac{-b - \sqrt{D}}{2a}$$

If $b^2 \gg ac$ then $\sqrt{D} = \sqrt{b^2 - 4ac} \approx |b|$. If $b > 0$ then $-b + \sqrt{D}$ involves a near cancellation (same for $-b - \sqrt{D}$ if $b < 0$). Rewrite the problem: multiply the numerator and denominator of $r_1$ by $-b - \sqrt{D}$ (and of $r_2$ by $-b + \sqrt{D}$) to obtain

$$r_1 = \frac{2c}{-b - \sqrt{D}}, \qquad r_2 = \frac{2c}{-b + \sqrt{D}}$$
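A sketch of a solver combining the two sets of formulas, picking for each root the expression that avoids the cancelling combination (the function quadroots() is ours; it assumes real roots and $a, b \neq 0$):

R> quadroots <- function(a, b, c){
+    D <- b^2 - 4*a*c
+    q <- -(b + sign(b)*sqrt(D))/2   # -b and sign(b)*sqrt(D) never cancel
+    c(q/a, c/q)                     # the two roots
+  }
R> quadroots(1, 1e8, 1)   # roots near -1e8 and -1e-8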

7 Example: Computing the exponential

The exponential function is defined by the power series

$$\exp(x) = \sum_{n=0}^{\infty} \frac{x^n}{n!}$$

Letting $t_n = \frac{x^n}{n!}$ we have $t_{n+1} = t_n \frac{x}{n+1}$, so these terms must eventually become small.

One strategy for summing the series is to accumulate terms of the series until the terms become so small that they do not change the sum.

R> expf <- function(x){
+    n <- 0
+    term <- 1
+    ans <- 1
+    while (abs(term) > .Machine$double.eps) {
+      n <- n + 1
+      term <- term * x / n
+      ans <- ans + term
+    }
+    ans
+  }

Compare expf() with R's built-in function. For positive values, the results are good:

R> (expf(1) - exp(1))/exp(1)

[1] 1.633713e-16

R> (expf(20) - exp(20))/exp(20)

[1] -1.228543e-16

For negative values, less so:

R> (expf(-1) - exp(-1))/exp(-1)

[1] 3.017899e-16

R> (expf(-20) - exp(-20))/exp(-20)

[1] 1.727543

Why? When $x < 0$ the terms in

$$\exp(x) = \sum_{n=0}^{\infty} \frac{x^n}{n!}$$

alternate in sign. For large negative $x$ values, the value returned by the function is small while at least some of the terms are large. Hence, at some point, there has to be near-cancellation of the accumulated sum and the next term of the series. This is the source of the large relative error. Notice: when the argument to expf() is positive, all the terms of the series are positive and there is no cancellation.
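The size of the problem can be seen by comparing the largest term of the series with the size of the result; for $x = -20$ the largest term is about $4 \times 10^7$ while $\exp(-20) \approx 2 \times 10^{-9}$:

R> max(abs((-20)^(0:100) / factorial(0:100)))  # largest term, about 4.3e7
R> exp(-20)                                    # about 2.06e-9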

There is an easy remedy: since $\exp(-x) = 1/\exp(x)$, for negative $x$ it is better to compute the result as $\exp(x) = 1/\exp(-x)$.

R> expf <- function(x){
+    if (x < 0) {
+      1/expf(-x)
+    } else {
+      n <- 0; term <- 1; ans <- 1
+      while (abs(term) > .Machine$double.eps) {
+        n <- n + 1
+        term <- term * x / n
+        ans <- ans + term
+      }
+      ans
+    }
+  }

R> (expf(-1) - exp(-1))/exp(-1)

[1] -1.50895e-16

R> (expf(-20) - exp(-20))/exp(-20)

[1] 2.006596e-16

8 Resources

What Every Computer Scientist Should Know About Floating-Point Arithmetic: http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
