AMS526: I (Numerical ) Lecture 08: Floating Point Arithmetic; Condition Numbers

Xiangmin Jiao

SUNY Stony Brook

Xiangmin Jiao Numerical Analysis I 1 / 11 Outline

1 Floating Point Arithmetic

2 Conditioning and Condition Numbers

Xiangmin Jiao Numerical Analysis I 2 / 11 Floating Point Representations Computers can only use finite number of bits to represent a

I Numbers cannot be arbitrarily large or small (associated risks of overflow and underflow) I There must be gaps between representable numbers (potential round-off errors) Commonly used computer-representations are floating point representations, which resemble scientific notation −1 −p+1 e ±(d0 + d1β + ··· + dp−1β )β , 0 ≤ di < β where β is base, p is digits of precision, and e is exponent between emin and emax Normalize if d0 6= 0 (except for 0) Gaps between adjacent numbers scale with size of numbers 1−p Relative resolution given by machine epsilon machine = 0.5β For all x, there exists a floating point x0 such that 0 |x − x | ≤ machine|x| Xiangmin Jiao Numerical Analysis I 3 / 11 IEEE Floating Point Representations

Single precision: 32 bit

I 1 sign bit (S), 8 exponent bits (E), 23 significant bits (M), (−1)S × 1.M × 2E−127 −24 I machine is 2 ≈ 6e − 8 Double precision: 64 bits

I 1 sign bit (S), 11 exponent bits (E), 52 significant bits (M), (−1)S × 1.M × 2E−1023 −53 I machine is 2 ≈ e − 16 Special quantities

I +∞ and −∞ when operation overflows; e.g., x/0 for nonzero x I NaN (Not a Number) is returned√ when an operation has no well-defined result; e.g., 0/0, −1, arcsin(2), NaN

Xiangmin Jiao Numerical Analysis I 4 / 11 Machine Epsilon Define fl(x) as closest floating point approximation to x By definition of machine, we have:

For all x ∈ R, there exists  with || ≤ machine such that fl(x) = x(1 + )

Given operation +, −, ×, and / (denoted by ∗), floating point numbers x and y, and corresponding floating point arithmetic (denoted by ~), we require that x ~ y = fl(x ∗ y) This is guaranteed by IEEE floating point arithmetic Fundamental axiom of floating point arithmetic:

For all x, y ∈ F, there exists  with || ≤ machine such that x ~ y = (x ∗ y)(1 + )

These properties will be the basis of error analysis with errors

Xiangmin Jiao Numerical Analysis I 5 / 11 Outline

1 Floating Point Arithmetic

2 Conditioning and Condition Numbers

Xiangmin Jiao Numerical Analysis I 6 / 11 I Round-off error (input, computation), truncation (approximation) error We would like the solution to be accurate, i.e., with small errors The error depends on property (conditioning) of the problem, property (stability) of the algorithm

Overview of Error Analysis

Error analysis is important subject of numerical analysis Given a problem f and an algorithm f˜ with an input x, the absolute error is kf˜(x) − f (x)k and relative error is kf˜(x) − f (x)k/kf (x)k What are possible sources of errors?

Xiangmin Jiao Numerical Analysis I 7 / 11 Overview of Error Analysis

Error analysis is important subject of numerical analysis Given a problem f and an algorithm f˜ with an input x, the absolute error is kf˜(x) − f (x)k and relative error is kf˜(x) − f (x)k/kf (x)k What are possible sources of errors?

I Round-off error (input, computation), truncation (approximation) error We would like the solution to be accurate, i.e., with small errors The error depends on property (conditioning) of the problem, property (stability) of the algorithm

Xiangmin Jiao Numerical Analysis I 7 / 11 Absolute Condition Number Condition number is a measure of sensitivity of a problem Absolute condition number of a problem f at x is

kδf k κˆ = lim sup ε→0 kδxk≤ε kδxk

where δf = f (x + δx) − f (x) kδf k Less formally, κˆ = supδx kδxk for infinitesimally small δx If f is differentiable, then

κˆ = kJ(x)k

where J is the Jacobian of f at x, with Jij = ∂fi /∂xj , and the is induced by vector norms on ∂f and ∂x Question: What is absolute condition number of f (x) = αx? Question: Is absolute condition number scale invariant?

Xiangmin Jiao Numerical Analysis I 8 / 11 Relative Condition Number Relative condition number of f at x is kδf k/kf (x)k κ = lim sup ε→0 kδxk≤ε kδxk/kxk

Less formally, κ = sup kδf k/kδxk for infinitesimally small δx δx kf (x)k/kxk Note: we can use different types of norms to get different condition numbers If f is differentiable, then kJ(x)k κ = kf (x)k/kxk Question: What is relative condition number of f (x) = αx? Question: Is relative condition number scale invariant? In numerical analysis, we in general use relative condition number A problem is well-conditioned if κ is small and is ill-conditioned if κ is large

Xiangmin Jiao Numerical Analysis I 9 / 11 Condition Numbers

Absolute condition number of a problem f at x is

kδf k κˆ = lim sup ε→0 kδxk≤ε kδxk

where δf = f (x + δx) − f (x) kδf k Less formally, κˆ = supδx kδxk for infinitesimally small δx Relative condition number of f at x is kδf k/kf (x)k κ = lim sup ε→0 kδxk≤ε kδxk/kxk

Less formally, κ = sup kδf k/kδxk for infinitesimally small δx δx kf (x)k/kxk

Xiangmin Jiao Numerical Analysis I 10 / 11 √ I Absolute condition number of f at x is κˆ = kJk = 1/(2 x)

F Note: We are talking about the condition number of the problem for a given x √ kJk 1/(2 x) I Relative condition number κ = = √ = 1/2 kf (x )k/kx k x/x T Example: f (x) = x1 − x2, where x = (x1, x2)

I Absolute condition number of f at x in ∞-norm is κˆ = kJk∞ = k(1, −1)k∞ = 2 kJk∞ 2 I Relative condition number κ = = kf (x )k∞/kx k∞ |x1−x2|/ max{|x1|,|x2|} I κ is arbitrarily large (f is ill-conditioned) if x1 ≈ x2 (hazard of cancellation error) Note: From now on, we will talk about only relative condition number

Examples

√ Example: Function f (x) = x

Xiangmin Jiao Numerical Analysis I 11 / 11 I Absolute condition number of f at x in ∞-norm is κˆ = kJk∞ = k(1, −1)k∞ = 2 kJk∞ 2 I Relative condition number κ = = kf (x )k∞/kx k∞ |x1−x2|/ max{|x1|,|x2|} I κ is arbitrarily large (f is ill-conditioned) if x1 ≈ x2 (hazard of cancellation error) Note: From now on, we will talk about only relative condition number

Examples

√ Example: Function f (x) = x √ I Absolute condition number of f at x is κˆ = kJk = 1/(2 x)

F Note: We are talking about the condition number of the problem for a given x √ kJk 1/(2 x) I Relative condition number κ = = √ = 1/2 kf (x )k/kx k x/x T Example: Function f (x) = x1 − x2, where x = (x1, x2)

Xiangmin Jiao Numerical Analysis I 11 / 11 Examples

√ Example: Function f (x) = x √ I Absolute condition number of f at x is κˆ = kJk = 1/(2 x)

F Note: We are talking about the condition number of the problem for a given x √ kJk 1/(2 x) I Relative condition number κ = = √ = 1/2 kf (x )k/kx k x/x T Example: Function f (x) = x1 − x2, where x = (x1, x2)

I Absolute condition number of f at x in ∞-norm is κˆ = kJk∞ = k(1, −1)k∞ = 2 kJk∞ 2 I Relative condition number κ = = kf (x )k∞/kx k∞ |x1−x2|/ max{|x1|,|x2|} I κ is arbitrarily large (f is ill-conditioned) if x1 ≈ x2 (hazard of cancellation error) Note: From now on, we will talk about only relative condition number

Xiangmin Jiao Numerical Analysis I 11 / 11