HowResearch and Why Matters to Estimate ConditionFebruary Numbers 25, for 2009 Functions

NickNick Higham Higham DirectorSchool of of Research Mathematics TheSchool University of Mathematics of Manchester http://www.maths.manchester.ac.uk/~higham @nhigham, nickhigham.wordpress.com

Joint work with Sam Relton.

Computational and Optimization

for the Digital Economy 1 / 6 Edinburgh, October 2013 Photocopying errors!

Sources of Error

f : Cn×n → Cn×n. Want to compute f (A) but given A + ∆A not A. ∆A may come from errors in storing A. Measurement errors. Errors from an earlier computation.

Nick Higham Matrix Condition Numbers 2 / 33 Sources of Error

f : Cn×n → Cn×n. Want to compute f (A) but given A + ∆A not A. ∆A may come from Rounding errors in storing A. Measurement errors. Errors from an earlier computation. Photocopying errors!

Nick Higham Matrix Function Condition Numbers 2 / 33 Photocopier

http://en.wikipedia.org/wiki/Photocopier

“Most current photocopiers use a technology called xerography, a dry process that uses electrostatic charges on a light sensitive photoreceptor to first attract and then transfer toner particles (a powder) onto paper in the form of an image”

“There is an increasing trend for new photocopiers to adopt digital technology, thus replacing the older analog technology. With digital copying, the copier effectively consists of an integrated scanner and laser printer.”

Nick Higham Matrix Function Condition Numbers 3 / 33 Xerox WorkCentre 7535, 7556

August 2013: German computer scientist David Kriesel discovered the machines often change “6” to “8”.

Jbig2 compression algorithm implicated.

Nick Higham Matrix Function Condition Numbers 4 / 33 Maxim

Every code for solving a numerical problem should return an error estimate or bound with the computed result.

Nick Higham Matrix Function Condition Numbers 5 / 33 Condition number of the condition number: assuming xf 0(x) > 0 and f (x) > 0,

0  00 0  [2] xc (x) f (x) f (x) c (x) := = 1 + x − . c(x) f 0(x) f (x)

Conditioning of f (x)

Scalar function f ∈ C2, and y = f (x), y + ∆y = f (x + ∆x). Then ∆y xf 0(x) ∆x = + O∆x 2. y f (x) x The relative condition number

0 xf (x) c(x) = . f (x)

Nick Higham Matrix Function Condition Numbers 6 / 33 Conditioning of f (x)

Scalar function f ∈ C2, and y = f (x), y + ∆y = f (x + ∆x). Then ∆y xf 0(x) ∆x = + O∆x 2. y f (x) x The relative condition number

0 xf (x) c(x) = . f (x)

Condition number of the condition number: assuming xf 0(x) > 0 and f (x) > 0,

0  00 0  [2] xc (x) f (x) f (x) c (x) := = 1 + x − . c(x) f 0(x) f (x)

Nick Higham Matrix Function Condition Numbers 6 / 33 Examples:

2 2 2 (X + E) − X = XE + EX + E ⇒ Lx2 (X, E) = XE + EX.

(X + E)−1 − X −1 = −X −1EX −1 + O(E 2)

−1 −1 ⇒ Lx−1 (X, E) = −X EX .

0 Scalar case: Lf (x, e) = f ( x)e.

Fréchet

Fréchet derivative of f : Cn×n → Cn×n at X ∈ Cn×n n×n n×n n×n A linear mapping Lf : C → C s.t. for all E ∈ C

f (X + E) − f (X) − Lf (X, E) = o(kEk).

Nick Higham Matrix Function Condition Numbers 7 / 33 0 Scalar case: Lf (x, e) = f ( x)e.

Fréchet Derivative

Fréchet derivative of f : Cn×n → Cn×n at X ∈ Cn×n n×n n×n n×n A linear mapping Lf : C → C s.t. for all E ∈ C

f (X + E) − f (X) − Lf (X, E) = o(kEk).

Examples:

2 2 2 (X + E) − X = XE + EX + E ⇒ Lx2 (X, E) = XE + EX.

(X + E)−1 − X −1 = −X −1EX −1 + O(E 2)

−1 −1 ⇒ Lx−1 (X, E) = −X EX .

Nick Higham Matrix Function Condition Numbers 7 / 33 Fréchet Derivative

Fréchet derivative of f : Cn×n → Cn×n at X ∈ Cn×n n×n n×n n×n A linear mapping Lf : C → C s.t. for all E ∈ C

f (X + E) − f (X) − Lf (X, E) = o(kEk).

Examples:

2 2 2 (X + E) − X = XE + EX + E ⇒ Lx2 (X, E) = XE + EX.

(X + E)−1 − X −1 = −X −1EX −1 + O(E 2)

−1 −1 ⇒ Lx−1 (X, E) = −X EX .

0 Scalar case: Lf (x, e) = f ( x)e.

Nick Higham Matrix Function Condition Numbers 7 / 33 Gâteaux Derivative

Gâteaux derivative of matrix function f at A in direction E: f (A + E) − f (A) Gf (A, E) = lim . →0  Weaker notion of differentiability.

Nick Higham Matrix Function Condition Numbers 8 / 33 Applications of Fréchet Derivative

Computation of correlated choice probabilities (2013),

registration of MRI images (2013),

Markov models applied to cancer data (2013),

matrix geometric mean computation (2012),

model reduction (2012),

first and second Fréchet derivative in Halley’s method on (2003).

Nick Higham Matrix Function Condition Numbers 9 / 33 Componentwise Sensitivity (1)

T Let E = ei ej . Then

( +  T ) − ( ) f X ei ej f X T lim = Lf (X, ei ej ). →0 

T (Lf (X, ei ej ))rs measures the sensitivity of f (X)rs to perturbations in aij .

Nick Higham Matrix Function Condition Numbers 10 / 33 Componentwise Sensitivity (2)

Since Lf is a linear operator,

vec(Lf (A, E)) = Kf (A)vec(E)

n2×n2 where Kf (A) ∈ C is the Kronecker form of the Fréchet derivative.

a11 a21 a31 ... ann f11 f21

f31 Kf (A) ≡ . T . Lf (A, ei ej ) fnn

Nick Higham Matrix Function Condition Numbers 11 / 33 0 Scalar case: condabs(f , x) = |f (x)|.

Condition Number

kf (A + E) − f (A)k condabs(f , A) = lim sup . →0 kEk≤ 

kLf (A, E)k kLf (A)k := max . E6=0 kEk

Lemma

condabs(f , A) = kLf (A)k.

Nick Higham Matrix Function Condition Numbers 12 / 33 Condition Number

kf (A + E) − f (A)k condabs(f , A) = lim sup . →0 kEk≤ 

kLf (A, E)k kLf (A)k := max . E6=0 kEk

Lemma

condabs(f , A) = kLf (A)k.

0 Scalar case: condabs(f , x) = |f (x)|.

Nick Higham Matrix Function Condition Numbers 12 / 33 Relative Condition Number

kf (A + E) − f (A)k condrel(f , A) := lim sup →0 kEk≤kAk kf (A)k kAk = cond (f , A) . abs kf (A)k

Nick Higham Matrix Function Condition Numbers 13 / 33 Computing Lf : Methods for Specific f

exponential Kenney & Laub (1998), Al-Mohy & H (2009) logarithm Al-Mohy, H & Relton (2013) fractional power H & Lin (2013)

Latter three methods obtained by differentiating alg for f .

Nick Higham Matrix Function Condition Numbers 14 / 33 Computing Lf : Via 2n × 2n Matrix

Theorem (Mathias, 1996) If f is 2n − 1 times ctsly diffble,

 AE   f (A) L (A, E)  f = f . 0 A 0 f (A)

Note that Lf (A, αE) = αLf (A, E), but α may effect alg used for the evaluation.

Nick Higham Matrix Function Condition Numbers 15 / 33 Computing Lf : Complex Step

Assume that f : Rn×n → Rn×n and A, E ∈ Rn×n. Then

f (A + i hE) − f (A) − i hLf (A, E) = o(h).

Thus (Al-Mohy & H, 2010)

f (A) ≈ Re f (A + i hE),

f (A + i hE) L (A, E) ≈ Im . f h Errors O(h2). h not restricted by fl pt arith considerations. Can take h = 10−100. f alg must not employ complex arith.

Nick Higham Matrix Function Condition Numbers 16 / 33 Condition Estimation (1)

Since Lf is a linear operator,

vec(Lf (A, E)) = Kf (A)vec(E)

n2×n2 where Kf (A) ∈ C is the Kronecker form of the Fréchet derivative.

Can show

kLf (A)kF = kKf (A)k2, kL (A)k f 1 ≤ kK (A)k ≤ nkL (A)k . n f 1 f 1

So problem reduces to matrix estimation.

Nick Higham Matrix Function Condition Numbers 17 / 33 Condition Estimation (2)

Use the block 1-norm estimator of H & Tisseur (2000). ∗ For kBk1 it needs Bx and B y for several x and y. Let vec(X) = x. Then Kf (A)x = vec(Lf (A, X)). Theorem (H & Lin, 2013) Let f ∈ C2n−1 and and ef (z) := f (z). If ef (A)∗ = ef (A∗) for all A ∈ Cn×n then ( )∗ = ( ( , ∗)∗) ( ) = Kf A x vec Lef A X , where vec X x.

ef = f for most functions of interest. ef = f ⇒ ef (A∗)∗ = ef (A∗).

Nick Higham Matrix Function Condition Numbers 18 / 33 Software for 1-Norm Estimator

MATLAB: normest1. NAG library: F04YD, F04ZD, Mark 24. Python:

Nick Higham Matrix Function Condition Numbers 19 / 33 Python: SciPy 0.13.0

linalg/_expm_frechet.py linalg/_expm_multiply.py linalg/_sqrtm.py (blocked)

Nick Higham Matrix Function Condition Numbers 20 / 33 kth Fréchet derivative defined by

(k−1) (k−1) Lf (A+Ek , E1,..., Ek−1) − Lf (A, E1,..., Ek−1) (k) − Lf (A, E1,..., Ek ) = o(kEk k).

Higher

Needed for level-2 condition number, understanding accuracy of algorithms for computing Lf (A, E). (2) Second Fréchet derivative Lf (A, E1, E2) is unique mutlilinear function of E1, E2 satisfying

(2) Lf (A+E2, E1) − Lf (A, E1) − Lf (A, E1, E2) = o(kE2k).

Nick Higham Matrix Function Condition Numbers 21 / 33 Higher Derivatives

Needed for level-2 condition number, understanding accuracy of algorithms for computing Lf (A, E). (2) Second Fréchet derivative Lf (A, E1, E2) is unique mutlilinear function of E1, E2 satisfying

(2) Lf (A+E2, E1) − Lf (A, E1) − Lf (A, E1, E2) = o(kE2k).

kth Fréchet derivative defined by

(k−1) (k−1) Lf (A+Ek , E1,..., Ek−1) − Lf (A, E1,..., Ek−1) (k) − Lf (A, E1,..., Ek ) = o(kEk k).

Nick Higham Matrix Function Condition Numbers 21 / 33 Existing Literature

Large literature on Fréchet derivatives in Banach space. Need specialized results for matrix functions.

Nick Higham Matrix Function Condition Numbers 22 / 33 Existence & Continuity of Fréchet Derivatives

D = open subset of C. Cn×n(D, p) = matrices with spectrum in D and largest Jordan block of size p.

Theorem (H & Relton, 2013) Let f be 2k p − 1 times continuously differentiable on D. n×n (k) Then for A ∈ C (D, p) the kth Fréchet derivative Lf (A) (k) exists and Lf (A, E1,..., Ek ) is continuous in A and n×n E1,..., Ek ∈ C .

Proof uses Gâteaux derivative. k = 1: Mathias (1996).

Nick Higham Matrix Function Condition Numbers 23 / 33 Properties

Assume from now on conditions of theorem satisfied. Then the Ei are interchangeable:

(2) (2) Lf (A, E1, E2) = Lf (A, E2, E1). Indeed

(k) ∂ Lf (A, E1,..., Ek ) = f (A + s1E1 + ··· + sk Ek ). ∂s1 ··· ∂sk s=0

Nick Higham Matrix Function Condition Numbers 24 / 33 The Setup

Wish to estimate norms and condition numbers. Knowing how to evaluate exact quantities helps develop and test the estimates. In practice we only ever work with n × n matrices.

Nick Higham Matrix Function Condition Numbers 25 / 33 (2) How to Compute Lf h i  AE1  f (A) Lf (A,E) X1 = 0 A . Know f (X1) = 0 f (A) . Let   AE1 E2 0   0 1  0 A 0 E2  X2 = I2 ⊗ X1 + ⊗ E2 =   . 0 0  0 0 AE1  0 0 0 A

Then

 (2)  f (A) Lf (A, E1) Lf (A, E2) Lf (A, E1, E2)  0 f (A) 0 Lf (A, E2)  f (X2) =   .  0 0 f (A) Lf (A, E1)  0 0 0 f (A)

Nick Higham Matrix Function Condition Numbers 26 / 33 (k) How to Compute Lf

2i n×2i n Define Xi ∈ C by 0 1 X = I ⊗ X + ⊗ I i−1 ⊗ E , X = A. i 2 i−1 0 0 2 i 0

Theorem (H & Relton, 2013) (k) The (1, n) block of f (Xk ) is Lf (A, E1,..., Ek ).

Nick Higham Matrix Function Condition Numbers 27 / 33 Level-2 Condition Number

“Condition number of the condition number”. Demmel (1987) showed that for matrix inversion (and D. J. Higham, 1995), the eigenproblem, zero-finding, pole assignment in linear control problems (relative) level-1 and level-2 cond no’s are equivalent. Cheung & Cucker (2005) show same holds when “condition number = 1 / distance to nearest ill-posed problem”.

Nick Higham Matrix Function Condition Numbers 28 / 33 Level-2 Condition Number

[2] |condabs(f , A + Z ) − condabs(f , A)| condabs(f , A) = lim sup . →0 kZk≤ 

Theorem (H & Relton, 2013)

[2] (2) condabs(f , A) ≤ kKf (A)k2, (2) n4×n2 for a Kronecker matrix Kf (A) ∈ C .

Nick Higham Matrix Function Condition Numbers 29 / 33 Theorem Let A ∈ Cn×n be normal. Then in the 2-norm

[2] 1 ≤ condrel(exp, A) ≤ 2condrel(exp, A) + 1.

Level-2 Condition Number: Exponential

Theorem Let A ∈ Cn×n be normal. Then in the 2-norm

[2] condabs(exp, A) = condabs(exp, A).

Nick Higham Matrix Function Condition Numbers 30 / 33 Level-2 Condition Number: Exponential

Theorem Let A ∈ Cn×n be normal. Then in the 2-norm

[2] condabs(exp, A) = condabs(exp, A).

Theorem Let A ∈ Cn×n be normal. Then in the 2-norm

[2] 1 ≤ condrel(exp, A) ≤ 2condrel(exp, A) + 1.

Nick Higham Matrix Function Condition Numbers 30 / 33 Level-2 Condition Number: Matrix Inverse

Theorem

For nonsingular A ∈ Cn×n,

[2] −1 −1 3/2 condabs(x , A) = 2condabs(x , A) .

Latter is what is expected from the scalar case: f (x) = x −1 ⇒ |f 00| = |2(f 0)3/2|. Have experimental evidence that matrix case can be similar to scalar case.

Nick Higham Matrix Function Condition Numbers 31 / 33 Condition Number of the Fréchet Derivative

kLf (A + ∆A, E + ∆E) − Lf (A, E)k condabs(Lf , A, E) = lim sup . →0 k∆Ak≤  k∆Ek≤

Theorem (H & Relton, 2013) We have

condabs(f , A) ≤ condabs(Lf , A, E) (2) ≤ max kLf (A, E, Z )k + condabs(f , A). kZ k=1

We have developed method to estimate the bounds.

Nick Higham Matrix Function Condition Numbers 32 / 33 Summary and Open Questions

Showed that given an O(n3) flop method for f :

(k) k 3 Computing Lf (A, E) costs O(8 n ) flops. (k) k 3+2k Computing Kf (A) costs O(8 n ) flops. Cost of estimates is O(n3) flops, given an O(n3) flop

method for Lf . Open questions about relation between level-1 and level-2 condition numbers. How can we exploit the symmetry of (k) Lf (E1, E2,..., Ek ) in the Ei ? Looking at componentwise condition numbers.

Nick Higham Matrix Function Condition Numbers 33 / 33 ReferencesI

S. D. Ahipasaoglu, X. Li, and K. Natarajan. A convex optimization approach for computing correlated choice probabilities with many alternatives. Preprint 4034, Optimization Online, 2013. 26 pp. A. H. Al-Mohy and N. J. Higham. Computing the Fréchet derivative of the matrix exponential, with an application to condition number estimation. SIAM J. Matrix Anal. Appl., 30(4):1639–1657, 2009.

Nick Higham Matrix Function Condition Numbers 1 / 8 ReferencesII

A. H. Al-Mohy and N. J. Higham. The complex step approximation to the Fréchet derivative of a matrix function. Numer. Algorithms, 53(1):133–148, 2010. A. H. Al-Mohy, N. J. Higham, and S. D. Relton. Computing the Fréchet derivative of the matrix logarithm and estimating the condition number. SIAM J. Sci. Comput., 35(4):C394–C410, 2013. S. Amat, S. Busquier, and J. M. Gutiérrez. Geometric constructions of iterative functions to solve nonlinear equations. J. Comput. Appl. Math., 157(1):197–205, 2003.

Nick Higham Matrix Function Condition Numbers 2 / 8 References III

J. Ashburner and G. R. Ridgway. Symmetric diffeomorphic modelling of longitudinal structural MRI. Frontiers in Neuroscience, 6, 2013. Article 197. J. W. Demmel. On condition numbers and the distance to the nearest ill-posed problem. Numer. Math., 51:251–289, 1987.

Nick Higham Matrix Function Condition Numbers 3 / 8 ReferencesIV

B. García-Mora, C. Santamaría, G. Rubio, and J. L. Pontones. Computing survival functions of the sum of two independent Markov processes. An application to bladder carcinoma treatment. Int. Journal of Computer Mathematics, 2013. D. J. Higham. Condition numbers and their condition numbers. Linear Algebra Appl., 214:193–213, 1995.

Nick Higham Matrix Function Condition Numbers 4 / 8 ReferencesV

N. J. Higham. Functions of Matrices: Theory and Computation. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2008. ISBN 978-0-898716-46-7. xx+425 pp. N. J. Higham and A. H. Al-Mohy. Computing matrix functions. Acta Numerica, 19:159–208, 2010. N. J. Higham and L. Lin. An improved Schur–Padé algorithm for fractional powers of a matrix and their Fréchet derivatives. SIAM J. Matrix Anal. Appl., 34(3):1341–1360, 2013.

Nick Higham Matrix Function Condition Numbers 5 / 8 ReferencesVI

N. J. Higham and S. D. Relton. The condition number of the Fréchet derivative of a matrix function. MIMS EPrint, Manchester Institute for Mathematical Sciences, The University of Manchester, UK, 2013. In preparation. N. J. Higham and S. D. Relton. Higher order Fréchet derivatives of matrix functions and the level-2 condition number. MIMS EPrint 2013.58, Manchester Institute for Mathematical Sciences, The University of Manchester, UK, Nov. 2013. 19 pp.

Nick Higham Matrix Function Condition Numbers 6 / 8 References VII

B. Jeuris, R. Vandebril, and B. Vandereycken. A survey and comparison of contemporary algorithms for computing the matrix geometric mean. Electron. Trans. Numer. Anal., 39:379–402, 2012. C. S. Kenney and A. J. Laub. A Schur–Fréchet algorithm for computing the logarithm and exponential of a matrix. SIAM J. Matrix Anal. Appl., 19(3):640–663, 1998. R. Mathias. A chain rule for matrix functions and applications. SIAM J. Matrix Anal. Appl., 17(3):610–620, 1996.

Nick Higham Matrix Function Condition Numbers 7 / 8 References VIII

D. Petersson. A Nonlinear Optimization Approach to H2-Optimal Modeling and Control. PhD thesis, Department of Electrical Engineering, Linköping University, Sweden, SE-581 83 Linköping, Sweden, 2013. Dissertation No. 1528. D. Petersson and J. Löfberg. Model reduction using a frequency-limited H2-cost. Technical report, Department of Electrical Engineering, Linköpings Universitet, SE-581 83, Linköping, Sweden, Mar. 2012. 13 pp.

Nick Higham Matrix Function Condition Numbers 8 / 8