Numerical computations – with a view towards R

Søren Højsgaard
Department of Mathematical Sciences, Aalborg University, Denmark
October 8, 2012

Contents

1 Computer arithmetic is not exact

2 Floating point arithmetic
  2.1 Addition and subtraction
  2.2 The Relative Machine Precision
  2.3 Floating-Point Precision
  2.4 Error in Floating-Point Computations

3 Example: A polynomial

4 Example: Numerical derivatives

5 Example: The Sample Variance

6 Example: Solving a quadratic equation

7 Example: Computing the exponential

8 Resources

1 Computer arithmetic is not exact

The following R statement appears to give the correct answer.

R> 0.3 - 0.2

[1] 0.1

But all is not as it seems.

R> 0.3 - 0.2 == 0.1

[1] FALSE

The difference between the values being compared is small, but important.

R> (0.3 - 0.2) - 0.1

[1] -2.775558e-17
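Comparisons of computed floating-point values should therefore allow for a small tolerance. A minimal illustration using R's all.equal(), which compares numbers up to a tolerance (about 1.5e-8 by default):

R> isTRUE(all.equal(0.3 - 0.2, 0.1))

[1] TRUE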

2 Floating point arithmetic

Real numbers “do not exist” in computers. Numbers in computers are represented in floating point form

$$s \times b^e$$

where $s$ is the significand, $b$ is the base and $e$ is the exponent. R has “numeric” (floating point) and “integer” numbers.

R> class(1)

[1] "numeric"

R> class(1L)

[1] "integer"

2.1 Addition and subtraction

Let $s$ be a 7-digit significand. A simple method to add floating-point numbers is to first represent them with the same exponent.

123456.7 = 1.234567 * 10^5
101.7654 = 1.017654 * 10^2 = 0.001017654 * 10^5

So the true result is

(1.234567 + 0.001017654) * 10^5 = 1.235584654 * 10^5

But the approximate result the computer would give is (the last digits (654) are lost)

1.235585 * 10^5 (final sum: 123558.5)

In extreme cases, the sum of two non-zero numbers may be equal to one of them.
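For example (assuming IEEE double precision, where the spacing between representable numbers near $10^{16}$ is 2):

R> 1e16 + 1 == 1e16

[1] TRUE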

Quiz: how can we sum a sequence of numbers $x_1, \dots, x_n$ to obtain high accuracy?
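A small illustration (a hint rather than a full answer): adding tiny terms one at a time to an already large sum loses them, while summing the smallest terms first keeps them. The helper addup() is ours:

R> addup <- function(x){
+    s <- 0
+    for (xi in x) s <- s + xi   # naive left-to-right summation
+    s
+  }
R> x <- c(1, rep(1e-16, 10000))
R> addup(x) == 1       # each tiny term is lost against the 1

[1] TRUE

R> addup(rev(x)) == 1  # smallest first: their sum (about 1e-12) survives

[1] FALSE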

2.2 The Relative Machine Precision

The accuracy of a floating-point system is measured by the relative machine precision or machine epsilon. This is the smallest positive value which can be added to 1 to produce a value different from 1. A machine epsilon of $10^{-7}$ indicates that there are roughly 7 decimal digits of precision in the numeric values stored and manipulated by the computer. It is easy to write a program to determine the relative machine precision.

R> .Machine$double.eps

[1] 2.220446e-16

R> macheps <- function(){
+    eps <- 1
+    while (1 + eps/2 != 1) eps <- eps/2
+    eps
+  }
R> macheps()

[1] 2.220446e-16
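On hardware with IEEE 754 double precision (which holds for essentially all machines R runs on), the significand carries 53 bits, so the machine epsilon is exactly $2^{-52}$:

R> .Machine$double.eps == 2^-52

[1] TRUE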

2.3 Floating-Point Precision

The preceding program shows that there are roughly 16 decimal digits of precision in R arithmetic. It is possible to see the effects of this limited precision directly.

R> a = 12345678901234567890
R> print(a, digits=20)

[1] 12345678901234567168

The effects of finite precision show up in the results of calculations.

2.4 Error in Floating-Point Computations

Numbers are accurate to about 15 significant digits. Subtraction of positive values is one place where the finite precision of floating-point arithmetic is a potential problem.

R> x = 1 + 1.234567890e-10
R> print(x, digits = 20)

[1] 1.0000000001234568003

R> y = x - 1
R> print(y, digits = 20)

[1] 1.234568003383174073e-10

There are 16 correct digits in x, but only 6 correct digits in y. Subtraction of nearly equal quantities (known as near cancellation) is a major source of inaccuracy in numerical calculations and requires special care.

3 Example: A polynomial

The function

$$f(x) = x^7 - 7x^6 + 21x^5 - 35x^4 + 35x^3 - 21x^2 + 7x - 1$$

is a 7th degree polynomial, and its graph should appear very smooth. To check this we can compute and graph the function over a range of values.

R> x = seq(.988, 1.012, by = 0.0001)
R> y = x^7 - 7*x^6 + 21*x^5 - 35*x^4 + 35*x^3 - 21*x^2 + 7*x - 1
R> plot(x, y, type = "l")

[Figure: plot of y against x for x between 0.988 and 1.012; the curve oscillates wildly at the level of about 4e-14.]

Not very smooth! To see where the cancellation error comes from, split the polynomial into individual terms and see what happens when we sum them.

R> x = .99
R> y = c(x^7, -7*x^6, +21*x^5, -35*x^4, 35*x^3, -21*x^2, +7*x, -1)
R> y

[1]   0.9320653  -6.5903610  19.9707910 -33.6208604  33.9604650 -20.5821000
[7]   6.9300000  -1.0000000

R> cumsum(y)

[1]  9.320653e-01 -5.658296e+00  1.431250e+01 -1.930837e+01  1.465210e+01
[6] -5.930000e+00  1.000000e+00 -1.521006e-14

It is the last subtraction (of 1) which causes the catastrophic cancellation and loss of accuracy. We can reformulate the problem by noticing that

$$f(x) = x^7 - 7x^6 + 21x^5 - 35x^4 + 35x^3 - 21x^2 + 7x - 1 = (x - 1)^7$$

Notice that although we are still getting cancellation, when 1 is subtracted from values close to 1, we are only losing a few digits of accuracy. The difference is apparent in the plot.

R> x = seq(.988, 1.012, by = 0.0001)
R> y = (x - 1)^7
R> plot(x, y, type = "l")

[Figure: plot of y = (x-1)^7 against x; the curve is now smooth, at the level of about 3e-14.]

4 Example: Numerical derivatives

The derivative $f'(x)$ may be approximated by

$$f'(x) \approx \frac{f(x + h/2) - f(x - h/2)}{h}, \qquad h \text{ small}$$

For small $h$ we get near-cancellation errors. A generic R function is

R> numDeriv <- function(f, x, h=1e-8){
+    (f(x + h/2) - f(x - h/2))/h
+  }

Find the derivative of the exponential at $x = 1$:

R> g <- function(x){ exp(x) }
R> print(numDeriv(g, 1), digits=20)

[1] 2.7182818218562943002

R> print(exp(1), digits=20)

[1] 2.7182818284590450908

Try a range of $h$ values:

R> hvec <- seq(5e-9, 1e-6, 1e-8)
R> dvec <- numDeriv(g, 1, hvec)
R> plot(hvec, (dvec - exp(1))/exp(1), type='l')
R> abline(h=0, col='red')

[Figure: relative error (dvec - exp(1))/exp(1) plotted against hvec; the error fluctuates at the level of about 1e-8.]
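The plot suggests a trade-off: $h$ must be small for the approximation error to be small, but not so small that cancellation in $f(x+h/2) - f(x-h/2)$ dominates. A common rule of thumb for central differences, stated here as a sketch rather than a guarantee, is to take $h$ of the order $\epsilon^{1/3}|x|$ where $\epsilon$ is the machine epsilon:

R> # with h far too small, the difference f(x+h/2)-f(x-h/2) is mostly noise:
R> numDeriv(g, 1, h=1e-15)
R> # rule-of-thumb h, about 6e-6 for x = 1:
R> numDeriv(g, 1, h=.Machine$double.eps^(1/3))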

5 Example: The Sample Variance

Consider the formula for the sample variance

$$\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 = \frac{1}{n-1}\Big(\sum_{i=1}^{n} x_i^2 - n\bar{x}^2\Big)$$

The left-hand side of this equation provides a much better computational procedure for finding the sample variance than the right-hand side. If the mean of the $x_i$ is far from 0, then $\sum_i x_i^2$ and $n\bar{x}^2$ will be large and nearly equal to each other. The relative error which results from applying the right-hand side formula can be very large. There can, of course, be loss of accuracy using the formula on the left, but it is not nearly as severe.

R> sdval <- 5
R> x <- rnorm(10, mean=1000000, sd=sdval)
R> x.bar <- mean(x)
R> n <- length(x)
R> lhs <- sum((x - x.bar)^2)/(n-1)
R> rhs <- (sum(x^2) - n*x.bar^2)/(n-1)
R> print(lhs, digits=10)

[1] 9.199470311

R> print(rhs, digits=10)

[1] 9.19921875

R> print(rhs-lhs, digits=10)

[1] -0.0002515605223
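If a formula of the right-hand-side type is nonetheless wanted, a common remedy is to shift the data by any rough estimate of the mean before accumulating; the algebra is unchanged but the summands stay small. A sketch (the helper shifted_var() and the choice of x[1] as shift are ours):

R> shifted_var <- function(x){
+    k <- x[1]                # any value near mean(x) works as a shift
+    n <- length(x)
+    (sum((x - k)^2) - n*(mean(x) - k)^2) / (n - 1)
+  }
R> print(shifted_var(x), digits=10)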

6 Example: Solving a quadratic equation

Consider solving $ax^2 + bx + c = 0$.

Letting $D = b^2 - 4ac$, the roots are

$$r_1 = \frac{-b + \sqrt{D}}{2a}, \qquad r_2 = \frac{-b - \sqrt{D}}{2a}$$

If $b^2 \gg ac$ then $\sqrt{D} = \sqrt{b^2 - 4ac} \approx |b|$. If $b > 0$ then $-b + \sqrt{D}$ involves a near cancellation (same for $-b - \sqrt{D}$ if $b < 0$). Rewrite the problem: multiply the numerator and denominator of $r_1$ by $-b - \sqrt{D}$ (and of $r_2$ by $-b + \sqrt{D}$) to obtain

$$r_1 = \frac{2c}{-b - \sqrt{D}}, \qquad r_2 = \frac{2c}{-b + \sqrt{D}}$$
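A sketch of a solver combining the two sets of formulas, picking for each root the expression that avoids the cancelling combination (the function quadroots() is ours; it assumes real roots and $a, b \neq 0$):

R> quadroots <- function(a, b, c){
+    D <- b^2 - 4*a*c
+    q <- -(b + sign(b)*sqrt(D))/2   # -b and sign(b)*sqrt(D) never cancel
+    c(q/a, c/q)                     # the two roots
+  }
R> quadroots(1, 1e8, 1)   # roots near -1e8 and -1e-8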

7 Example: Computing the exponential

The exponential function is defined by the power series

$$\exp(x) = \sum_{n=0}^{\infty} \frac{x^n}{n!}$$

Letting $t_n = \frac{x^n}{n!}$ we have $t_{n+1} = t_n \frac{x}{n+1}$, so these terms must eventually become small.

One strategy for summing the series is to accumulate terms of the series until the terms become so small that they do not change the sum.

R> expf <- function(x){
+    n <- 0
+    term <- 1
+    ans <- 1
+    while (abs(term) > .Machine$double.eps) {
+      n <- n + 1
+      term <- term * x / n
+      ans <- ans + term
+    }
+    ans
+  }

Compare expf() with R's built-in function. For positive values, the results are good:

R> (expf(1) - exp(1))/exp(1)

[1] 1.633713e-16

R> (expf(20) - exp(20))/exp(20)

[1] -1.228543e-16

For negative values, less so:

R> (expf(-1) - exp(-1))/exp(-1)

[1] 3.017899e-16

R> (expf(-20) - exp(-20))/exp(-20)

[1] 1.727543

Why? When $x < 0$ the terms in

$$\exp(x) = \sum_{n=0}^{\infty} \frac{x^n}{n!}$$

alternate in sign. For large negative $x$ values, the value returned by the function is small while at least some of the terms are large. Hence, at some point, there has to be near-cancellation of the accumulated sum and the next term of the series. This is the source of the large relative error. Notice: when the argument to expf() is positive, all the terms of the series are positive and there is no cancellation.
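The size of the problem can be seen by comparing the largest term of the series with the size of the result; for $x = -20$ the largest term is about $4 \times 10^7$ while $\exp(-20) \approx 2 \times 10^{-9}$:

R> max(abs((-20)^(0:100) / factorial(0:100)))  # largest term, about 4.3e7
R> exp(-20)                                    # about 2.06e-9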

There is an easy remedy: since $\exp(-x) = 1/\exp(x)$, for negative $x$ it is better to compute the result as $\exp(x) = 1/\exp(-x)$.

R> expf <- function(x){
+    if (x < 0) {
+      1/expf(-x)
+    } else {
+      n <- 0; term <- 1; ans <- 1
+      while (abs(term) > .Machine$double.eps) {
+        n <- n + 1
+        term <- term * x / n
+        ans <- ans + term
+      }
+      ans
+    }
+  }

R> (expf(-1) - exp(-1))/exp(-1)

[1] -1.50895e-16

R> (expf(-20) - exp(-20))/exp(-20)

[1] 2.006596e-16

8 Resources

What Every Computer Scientist Should Know About Floating-Point Arithmetic: http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html
