
Norms

Tom Lyche

Centre of Mathematics for Applications, Department of Informatics, University of Oslo

September 28, 2009

Matrix Norms

We consider matrix norms on (C^{m,n}, C). All results hold for (R^{m,n}, R).

Definition (Matrix Norms)
A function ‖·‖ : C^{m,n} → R is called a matrix norm on C^{m,n} if for all A, B ∈ C^{m,n} and all α ∈ C:
1. ‖A‖ ≥ 0 with equality if and only if A = 0 (positivity),
2. ‖αA‖ = |α| ‖A‖ (homogeneity),
3. ‖A + B‖ ≤ ‖A‖ + ‖B‖ (subadditivity).

A matrix norm is simply a vector norm on the finite-dimensional vector space (C^{m,n}, C) of m × n matrices.

Equivalent norms

Adapting some general results on vector norms to matrix norms gives:

Theorem
1. All matrix norms are equivalent. Thus, if ‖·‖ and ‖·‖′ are two matrix norms on C^{m,n}, then there are positive constants µ and M such that µ‖A‖ ≤ ‖A‖′ ≤ M‖A‖ holds for all A ∈ C^{m,n}.
2. A matrix norm is a continuous function ‖·‖ : C^{m,n} → R.

Examples

- From any vector norm ‖·‖_V on C^{mn} we can define a matrix norm on C^{m,n} by ‖A‖ := ‖vec(A)‖_V, where vec(A) ∈ C^{mn} is the vector obtained by stacking the columns of A on top of each other.

- Three elementwise examples:

  ‖A‖_S := Σ_{i=1}^m Σ_{j=1}^n |a_{ij}|   (p = 1, sum norm),

  ‖A‖_F := (Σ_{i=1}^m Σ_{j=1}^n |a_{ij}|²)^{1/2}   (p = 2, Frobenius norm),

  ‖A‖_M := max_{i,j} |a_{ij}|   (p = ∞, max norm).   (1)
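These elementwise norms are straightforward to evaluate directly (a small sketch assuming NumPy; the matrix is an arbitrary example, not one from the slides):

```python
import numpy as np

A = np.array([[1.0, -2.0], [3.0, 4.0]])    # arbitrary example matrix

sum_norm  = np.abs(A).sum()                # ||A||_S, p = 1
frobenius = np.sqrt((np.abs(A)**2).sum())  # ||A||_F, p = 2
max_norm  = np.abs(A).max()                # ||A||_M, p = infinity

assert sum_norm == 10.0
assert np.isclose(frobenius, np.linalg.norm(A, 'fro'))  # matches numpy's built-in
assert max_norm == 4.0
```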

The Frobenius Matrix Norm 1.

1. ‖A^H‖_F² = Σ_{j=1}^n Σ_{i=1}^m |a_{ij}|² = Σ_{i=1}^m Σ_{j=1}^n |a_{ij}|² = ‖A‖_F².

The Frobenius Matrix Norm 2.

2. Σ_{i=1}^m Σ_{j=1}^n |a_{ij}|² = Σ_{i=1}^m ‖a_{i·}‖_2², and
   Σ_{i=1}^m Σ_{j=1}^n |a_{ij}|² = Σ_{j=1}^n Σ_{i=1}^m |a_{ij}|² = Σ_{j=1}^n ‖a_{·j}‖_2².

Unitary Invariance.

3. If A ∈ C^{m,n} and U ∈ C^{m,m}, V ∈ C^{n,n} are unitary, then
   ‖UA‖_F² = Σ_{j=1}^n ‖Ua_{·j}‖_2² = Σ_{j=1}^n ‖a_{·j}‖_2² = ‖A‖_F² (by 2.), and
   ‖AV‖_F = ‖V^H A^H‖_F = ‖A^H‖_F = ‖A‖_F (by 1.).

Submultiplicativity

- Suppose A, B are rectangular matrices so that the product AB is defined. By the Cauchy-Schwarz inequality,

  ‖AB‖_F² = Σ_i Σ_j |a_{i·}^T b_{·j}|² ≤ Σ_i Σ_j ‖a_{i·}‖_2² ‖b_{·j}‖_2² = ‖A‖_F² ‖B‖_F².
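The Cauchy-Schwarz argument above can be spot-checked numerically (assuming NumPy; the sizes are arbitrary, chosen only so that AB is defined):

```python
import numpy as np

rng = np.random.default_rng(0)
# Random rectangular factors so that the product AB is defined.
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 2))

lhs = np.linalg.norm(A @ B, 'fro')
rhs = np.linalg.norm(A, 'fro') * np.linalg.norm(B, 'fro')
assert lhs <= rhs + 1e-12   # submultiplicativity of the Frobenius norm
```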

Subordinance

- ‖Ax‖_2 ≤ ‖A‖_F ‖x‖_2 for all x ∈ C^n.
- Since ‖v‖_F = ‖v‖_2 for a vector, this follows from submultiplicativity.

Explicit Expression

- Let A ∈ C^{m,n} have singular values σ_1, …, σ_n and SVD A = UΣV^H. Then, using 3.,

  ‖A‖_F = ‖U^H A V‖_F = ‖Σ‖_F = √(σ_1² + ··· + σ_n²).
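This identity is easy to verify numerically (assuming NumPy; A is a random example):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))   # arbitrary example matrix

sigma = np.linalg.svd(A, compute_uv=False)  # singular values of A
frob_from_svd = np.sqrt((sigma**2).sum())   # sqrt(sigma_1^2 + ... + sigma_n^2)
assert np.isclose(frob_from_svd, np.linalg.norm(A, 'fro'))
```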

Consistency

- A matrix norm is called consistent on C^{n,n} if

  4. ‖AB‖ ≤ ‖A‖ ‖B‖   (submultiplicativity)

  holds for all A, B ∈ C^{n,n}.
- A matrix norm is consistent if it is defined on C^{m,n} for all m, n ∈ N, and 4. holds for all matrices A, B for which the product AB is defined.

- The Frobenius norm is consistent.

- The sum norm is consistent.

- The max norm is not consistent.
- The norm ‖A‖ := √(mn) ‖A‖_M, A ∈ C^{m,n}, is consistent.

Subordinate Matrix Norm

Definition
- Suppose m, n ∈ N are given.
- Let ‖·‖_α on C^m and ‖·‖_β on C^n be vector norms, and let ‖·‖ be a matrix norm on C^{m,n}.
- We say that the matrix norm ‖·‖ is subordinate to the vector norms ‖·‖_α and ‖·‖_β if ‖Ax‖_α ≤ ‖A‖ ‖x‖_β for all A ∈ C^{m,n} and all x ∈ C^n.
- If ‖·‖_α = ‖·‖_β then we say that ‖·‖ is subordinate to ‖·‖_α.

- The Frobenius norm is subordinate to the Euclidean vector norm.

- The sum norm is subordinate to the l1-norm.

- ‖Ax‖_∞ ≤ ‖A‖_M ‖x‖_1.

Definition
Suppose m, n ∈ N are given and let ‖·‖_α be a vector norm on C^m and ‖·‖_β a vector norm on C^n. For A ∈ C^{m,n} we define

  ‖A‖ := ‖A‖_{α,β} := max_{x≠0} ‖Ax‖_α / ‖x‖_β.   (2)

We call this the (α, β) operator norm, the (α, β)-norm, or simply the α-norm if α = β.

Observations

- ‖A‖_{α,β} = max_{x∉ker(A)} ‖Ax‖_α/‖x‖_β = max_{‖x‖_β=1} ‖Ax‖_α.

- ‖Ax‖_α ≤ ‖A‖ ‖x‖_β.
- ‖A‖_{α,β} = ‖Ax*‖_α for some x* ∈ C^n with ‖x*‖_β = 1.
- The operator norm is a matrix norm on C^{m,n}.
- The sum norm and the Frobenius norm are not α operator norms for any α.

Operator Norm Properties

- The operator norm is a matrix norm on C^{m,n}.
- The operator norm is consistent if the vector norm ‖·‖_α is defined for all m ∈ N and ‖·‖_β = ‖·‖_α.

Proof

In 2. and 3. below we take the max over the unit sphere S_β = {x ∈ C^n : ‖x‖_β = 1}.

1. Nonnegativity is obvious. If ‖A‖ = 0 then ‖Ay‖_α = 0 for each y ∈ C^n. In particular, each column Ae_j of A is zero. Hence A = 0.

2. ‖cA‖ = max_x ‖cAx‖_α = max_x |c| ‖Ax‖_α = |c| ‖A‖.

3. ‖A + B‖ = max_x ‖(A + B)x‖_α ≤ max_x ‖Ax‖_α + max_x ‖Bx‖_α = ‖A‖ + ‖B‖.
4. ‖AB‖ = max_{x≠0} ‖ABx‖_α/‖x‖_α = max_{Bx≠0} (‖ABx‖_α/‖Bx‖_α) · (‖Bx‖_α/‖x‖_α)
   ≤ max_{y≠0} ‖Ay‖_α/‖y‖_α · max_{x≠0} ‖Bx‖_α/‖x‖_α = ‖A‖ ‖B‖.

The p Matrix Norm

- The operator norms ‖·‖_p defined from the p-vector norms are of special interest.

- Recall ‖x‖_p := (Σ_{j=1}^n |x_j|^p)^{1/p} for p ≥ 1, and ‖x‖_∞ := max_{1≤j≤n} |x_j|.
- Used quite frequently for p = 1, 2, ∞.

- We define for any 1 ≤ p ≤ ∞

  ‖A‖_p := max_{x≠0} ‖Ax‖_p / ‖x‖_p = max_{‖y‖_p=1} ‖Ay‖_p.   (3)
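Definition (3) can be illustrated by sampling (assuming NumPy; the matrix and the trial vectors are arbitrary): the ratio ‖Ax‖_p/‖x‖_p never exceeds ‖A‖_p, so maximizing over random x gives a lower bound on the operator norm, which NumPy's exact value (available for p = 1, 2, ∞) must dominate.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))   # arbitrary example matrix
p = 1

# Each ratio ||Ax||_p / ||x||_p is a lower bound for ||A||_p.
xs = rng.standard_normal((4, 10000))
ratios = np.linalg.norm(A @ xs, ord=p, axis=0) / np.linalg.norm(xs, ord=p, axis=0)
estimate = ratios.max()

exact = np.linalg.norm(A, ord=p)  # the exact operator 1-norm (max column sum)
assert estimate <= exact + 1e-12
```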

- The p-norms are consistent matrix norms which are subordinate to the p-vector norm.

Explicit Expressions

Theorem
For A ∈ C^{m,n} we have

  ‖A‖_1 = max_{1≤j≤n} Σ_{k=1}^m |a_{kj}|   (max column sum),
  ‖A‖_2 = σ_1   (largest singular value of A),
  ‖A‖_∞ = max_{1≤k≤m} Σ_{j=1}^n |a_{kj}|   (max row sum).   (4)

The quantity ‖A‖_2 is called the two-norm or the spectral norm of A. The explicit expression follows from the minmax theorem for singular values.

Examples

For A := (1/15) [14 4 16; 2 22 13] we find
- ‖A‖_1 = 29/15,
- ‖A‖_2 = 2,
- ‖A‖_∞ = 37/15,
- ‖A‖_F = √5.

For A := [1 2; 3 4] we find
- ‖A‖_1 = 6,
- ‖A‖_2 ≈ 5.4650,
- ‖A‖_∞ = 7,
- ‖A‖_F = √30 ≈ 5.4772.
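Both examples can be checked with NumPy's built-in matrix norms (a quick verification sketch):

```python
import numpy as np

A = np.array([[14, 4, 16], [2, 22, 13]]) / 15
assert np.isclose(np.linalg.norm(A, 1), 29 / 15)        # max column sum
assert np.isclose(np.linalg.norm(A, 2), 2.0)            # largest singular value
assert np.isclose(np.linalg.norm(A, np.inf), 37 / 15)   # max row sum
assert np.isclose(np.linalg.norm(A, 'fro'), np.sqrt(5))

B = np.array([[1.0, 2.0], [3.0, 4.0]])
assert np.isclose(np.linalg.norm(B, 1), 6.0)
assert np.isclose(np.linalg.norm(B, 2), 5.4650, atol=1e-4)
assert np.isclose(np.linalg.norm(B, np.inf), 7.0)
assert np.isclose(np.linalg.norm(B, 'fro'), np.sqrt(30))
```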

The 2-Norm

Theorem
Suppose A ∈ C^{n,n} has singular values σ_1 ≥ σ_2 ≥ ··· ≥ σ_n and eigenvalues |λ_1| ≥ |λ_2| ≥ ··· ≥ |λ_n|. Then

  ‖A‖_2 = σ_1 and ‖A^{-1}‖_2 = 1/σ_n,   (5)
  ‖A‖_2 = λ_1 and ‖A^{-1}‖_2 = 1/λ_n, if A is symmetric positive definite,   (6)
  ‖A‖_2 = |λ_1| and ‖A^{-1}‖_2 = 1/|λ_n|, if A is normal.   (7)

For the norms of A^{-1} we assume, of course, that A is nonsingular.

Unitary Transformations

Definition
A matrix norm ‖·‖ on C^{m,n} is called unitary invariant if ‖UAV‖ = ‖A‖ for any A ∈ C^{m,n} and any unitary matrices U ∈ C^{m,m} and V ∈ C^{n,n}. If ‖·‖ is unitary invariant and U, V are unitary, then U(A + E)V = UAV + F with F := UEV, where ‖F‖ = ‖E‖.

Theorem
The Frobenius norm and the spectral norm are unitary invariant. Moreover, ‖A^H‖_F = ‖A‖_F and ‖A^H‖_2 = ‖A‖_2.

Proof for the 2-norm

- ‖UA‖_2 = max_{‖x‖_2=1} ‖UAx‖_2 = max_{‖x‖_2=1} ‖Ax‖_2 = ‖A‖_2.
- ‖A^H‖_2 = ‖A‖_2 (A and A^H have the same singular values).
- ‖AV‖_2 = ‖(AV)^H‖_2 = ‖V^H A^H‖_2 = ‖A^H‖_2 = ‖A‖_2.

Perturbation of Linear Systems

- Consider the system

  x_1 + x_2 = 20,
  x_1 + (1 − 10^{-16}) x_2 = 20 − 10^{-15}.

- The exact solution is x_1 = x_2 = 10.

- Suppose we replace the second equation by

  x_1 + (1 + 10^{-16}) x_2 = 20 − 10^{-15}.

- The exact solution changes to x_1 = 30, x_2 = −10.
- A small change in one of the coefficients, from 1 − 10^{-16} to 1 + 10^{-16}, changed the exact solution by a large amount.
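Both solutions can be reproduced exactly in rational arithmetic (a minimal sketch using Python's `fractions` module; the `solve2x2` helper is an illustrative assumption, not part of the lecture):

```python
from fractions import Fraction as F

def solve2x2(a11, a12, a21, a22, b1, b2):
    """Solve a 2x2 linear system exactly by Cramer's rule."""
    det = a11 * a22 - a12 * a21
    return (b1 * a22 - b2 * a12) / det, (a11 * b2 - a21 * b1) / det

eps = F(1, 10**16)   # the coefficient term 10^-16
d = F(1, 10**15)     # the right-hand-side term 10^-15

# Original system: x1 + x2 = 20, x1 + (1 - eps) x2 = 20 - d.
assert solve2x2(F(1), F(1), F(1), 1 - eps, F(20), 20 - d) == (10, 10)
# Perturbed second equation: the solution jumps to (30, -10).
assert solve2x2(F(1), F(1), F(1), 1 + eps, F(20), 20 - d) == (30, -10)
```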

Ill Conditioning

- A mathematical problem in which the solution is very sensitive to changes in the data is called ill-conditioned or sometimes ill-posed.

- Such problems are difficult to solve on a computer.

- If at all possible, the mathematical model should be changed to obtain a more well-conditioned or properly-posed problem.

Perturbations

- We consider what effect a small change (perturbation) in the data A, b has on the solution x of a linear system Ax = b.

- Suppose y solves (A + E)y = b + e, where E is a (small) n × n matrix and e a (small) vector.

- How large can y − x be?

- To measure this we use vector and matrix norms.

Conditions on the Norms

- ‖·‖ will denote a vector norm on C^n and also a submultiplicative matrix norm on C^{n,n} which in addition is subordinate to the vector norm.
- Thus for any A, B ∈ C^{n,n} and any x ∈ C^n we have ‖AB‖ ≤ ‖A‖ ‖B‖ and ‖Ax‖ ≤ ‖A‖ ‖x‖.

- This is satisfied if the matrix norm is the operator norm corresponding to the given vector norm, or the Frobenius norm.

Absolute and Relative Error

- The difference ‖y − x‖ measures the absolute error in y as an approximation to x.

- ‖y − x‖/‖x‖ or ‖y − x‖/‖y‖ is a measure of the relative error.

Perturbation in the Right-Hand Side

Theorem
Suppose A ∈ C^{n,n} is invertible, b, e ∈ C^n, b ≠ 0, and Ax = b, Ay = b + e. Then

  (1/K(A)) · ‖e‖/‖b‖ ≤ ‖y − x‖/‖x‖ ≤ K(A) · ‖e‖/‖b‖,  where K(A) = ‖A‖ ‖A^{-1}‖.   (8)
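A numerical sanity check of this bound (assuming NumPy; the matrix, right-hand side, and perturbation are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(3)
A = np.array([[1.0, 2.0], [3.0, 4.0]])   # arbitrary nonsingular example
b = np.array([1.0, 1.0])
e = 1e-6 * rng.standard_normal(2)        # small perturbation of b

x = np.linalg.solve(A, b)
y = np.linalg.solve(A, b + e)

K = np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2)
rel_err = np.linalg.norm(y - x) / np.linalg.norm(x)
rel_pert = np.linalg.norm(e) / np.linalg.norm(b)

# Both inequalities of (8), up to floating-point round-off.
assert rel_pert / K - 1e-12 <= rel_err <= K * rel_pert + 1e-12
```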

- Proof: y − x = A^{-1}e and b = Ax give ‖y − x‖ ≤ ‖A^{-1}‖ ‖e‖ and ‖b‖ ≤ ‖A‖ ‖x‖; combining the two yields the upper bound. The lower bound follows similarly from e = A(y − x) and x = A^{-1}b.

- Consider (8). ‖e‖/‖b‖ measures the size of the perturbation e relative to the size of b. In the worst case, the relative error ‖y − x‖/‖x‖ can be K(A) = ‖A‖ ‖A^{-1}‖ times as large as ‖e‖/‖b‖.

- K(A) is called the condition number with respect to inversion of a matrix, or just the condition number, if it is clear from the context that we are talking about solving linear systems.

- The condition number depends on the matrix A and on the norm used. If K(A) is large, A is called ill-conditioned (with respect to inversion).

- If K(A) is small, A is called well-conditioned (with respect to inversion).

Condition Number Properties

- Since ‖A‖ ‖A^{-1}‖ ≥ ‖AA^{-1}‖ = ‖I‖ ≥ 1, we always have K(A) ≥ 1.

- Since all matrix norms are equivalent, the dependence of K(A) on the norm chosen is less important than the dependence on A.

- Usually one chooses the spectral norm when discussing properties of the condition number, and the l_1 and l_∞ norms when one wishes to compute or estimate it.

The 2-Norm

- Suppose A has singular values σ_1 ≥ σ_2 ≥ ··· ≥ σ_n > 0 and, if A is square, eigenvalues |λ_1| ≥ |λ_2| ≥ ··· ≥ |λ_n|.
- K_2(A) = ‖A‖_2 ‖A^{-1}‖_2 = σ_1/σ_n.
- K_2(A) = ‖A‖_2 ‖A^{-1}‖_2 = |λ_1|/|λ_n|, if A is normal.
- K_2(A) = ‖A‖_2 ‖A^{-1}‖_2 = λ_1/λ_n, if A is positive definite.
- It follows that A is ill-conditioned with respect to inversion if and only if σ_1/σ_n is large, or |λ_1|/|λ_n| is large when A is normal.
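The singular value formula can be checked directly (assuming NumPy; A is the 2×2 matrix from the earlier examples):

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])
sigma = np.linalg.svd(A, compute_uv=False)   # sigma[0] >= sigma[1] > 0

K2 = sigma[0] / sigma[-1]                    # K_2(A) = sigma_1 / sigma_n
assert np.isclose(K2, np.linalg.norm(A, 2) * np.linalg.norm(np.linalg.inv(A), 2))
assert np.isclose(K2, np.linalg.cond(A, 2))  # numpy's built-in condition number
```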

The Residual

Suppose we have computed an approximate solution y to Ax = b. The vector r(y) := Ay − b is called the residual vector, or just the residual. We can bound x − y in terms of r(y).

Theorem
Suppose A ∈ C^{n,n}, b ∈ C^n, A is nonsingular and b ≠ 0. Let r(y) = Ay − b for each y ∈ C^n. If Ax = b, then

  (1/K(A)) · ‖r(y)‖/‖b‖ ≤ ‖y − x‖/‖x‖ ≤ K(A) · ‖r(y)‖/‖b‖.   (9)

Discussion

- If A is well-conditioned, (9) says that ‖y − x‖/‖x‖ ≈ ‖r(y)‖/‖b‖.

- In other words, the accuracy in y is of about the same order of magnitude as the residual, as long as ‖b‖ ≈ 1.

- If A is ill-conditioned, anything can happen.

- The solution can be inaccurate even if the residual is small.

- We can have an accurate solution even if the residual is large.

The Inverse of A + E

Theorem
Suppose A ∈ C^{n,n} is nonsingular and let ‖·‖ be a consistent matrix norm on C^{n,n}. If E ∈ C^{n,n} is so small that r := ‖A^{-1}E‖ < 1, then A + E is nonsingular and

  ‖(A + E)^{-1}‖ ≤ ‖A^{-1}‖ / (1 − r).   (10)

If r < 1/2, then

  ‖(A + E)^{-1} − A^{-1}‖ / ‖A^{-1}‖ ≤ 2 K(A) ‖E‖/‖A‖.   (11)

Proof

- We use that if B ∈ C^{n,n} and ‖B‖ < 1, then I − B is nonsingular and ‖(I − B)^{-1}‖ ≤ 1/(1 − ‖B‖).
- Since r < 1, the matrix I − B := I + A^{-1}E is nonsingular.
- Since (I − B)^{-1} A^{-1} (A + E) = I, we see that A + E is nonsingular with inverse (I − B)^{-1} A^{-1}.
- Hence ‖(A + E)^{-1}‖ ≤ ‖(I − B)^{-1}‖ ‖A^{-1}‖, and (10) follows.
- From the identity (A + E)^{-1} − A^{-1} = −A^{-1} E (A + E)^{-1} we obtain ‖(A + E)^{-1} − A^{-1}‖ ≤ ‖A^{-1}‖ ‖E‖ ‖(A + E)^{-1}‖ ≤ K(A) (‖E‖/‖A‖) ‖A^{-1}‖/(1 − r).
- Dividing by ‖A^{-1}‖ and using that 1/(1 − r) < 2 when r < 1/2 proves (11).

Perturbation in A

Theorem
Suppose A, E ∈ C^{n,n}, b ∈ C^n with A invertible and b ≠ 0. If r := ‖A^{-1}E‖ < 1/2 for some operator norm, then A + E is invertible. If Ax = b and (A + E)y = b, then

  ‖y − x‖/‖y‖ ≤ ‖A^{-1}E‖ ≤ K(A) ‖E‖/‖A‖,   (12)
  ‖y − x‖/‖x‖ ≤ 2 K(A) ‖E‖/‖A‖.   (13)

Proof

- A + E is invertible by the previous theorem.
- (12) follows easily by taking norms in the equation x − y = A^{-1}Ey and dividing by ‖y‖.
- For (13), (12) gives ‖y − x‖ ≤ r‖y‖ ≤ r(‖x‖ + ‖y − x‖); since r < 1/2, this yields ‖y − x‖/‖x‖ ≤ r/(1 − r) < 2r ≤ 2 K(A) ‖E‖/‖A‖.

Finding the Rank of a Matrix

- Gauss-Jordan elimination cannot be used to determine the rank numerically.

- Use the singular value decomposition instead.

- Due to round-off, one will normally find σ_n > 0 numerically, even when the exact rank is less than n.

- Determine a minimal r so that σ_{r+1}, …, σ_n are "close" to the round-off unit.

- Use this r as an estimate for the rank.
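The procedure above can be sketched as follows (assuming NumPy; the rank-2 test matrix and the tolerance rule are illustrative choices, mirroring common practice):

```python
import numpy as np

rng = np.random.default_rng(4)
# A 6x4 matrix of exact rank 2; round-off in the product makes the
# trailing singular values tiny but nonzero.
A = rng.standard_normal((6, 2)) @ rng.standard_normal((2, 4))

sigma = np.linalg.svd(A, compute_uv=False)
# Threshold near the round-off unit, scaled by the problem size and sigma_1.
tol = max(A.shape) * np.finfo(float).eps * sigma[0]
numerical_rank = int((sigma > tol).sum())
assert numerical_rank == 2
```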

Convergence in R^{m,n} and C^{m,n}

- Consider an infinite sequence of matrices {A_k} = A_0, A_1, A_2, … in C^{m,n}.
- {A_k} is said to converge to the limit A in C^{m,n} if each element sequence {A_k(ij)}_k converges to the corresponding element A(ij) for i = 1, …, m and j = 1, …, n.

- {A_k} is a Cauchy sequence if for each ε > 0 there is an integer N ∈ N such that for all k, l ≥ N and all i, j we have |A_k(ij) − A_l(ij)| ≤ ε.

- {A_k} is bounded if there is a constant M such that |A_k(ij)| ≤ M for all i, j, k.

More on Convergence

- By stacking the columns of each A_k into a vector in C^{mn} we obtain:
- A sequence {A_k} in C^{m,n} converges to a matrix A ∈ C^{m,n} if and only if lim_{k→∞} ‖A_k − A‖ = 0 for any matrix norm ‖·‖.
- A sequence {A_k} in C^{m,n} is convergent if and only if it is a Cauchy sequence.
- Every bounded sequence {A_k} in C^{m,n} has a convergent subsequence.

The Spectral Radius

- The spectral radius of A is ρ(A) := max_{λ∈σ(A)} |λ|.
- For any consistent matrix norm ‖·‖ on C^{n,n} and any A ∈ C^{n,n} we have ρ(A) ≤ ‖A‖.

- Proof: Let (λ, x) be an eigenpair for A and set X := [x, …, x] ∈ C^{n,n}.
- Then λX = AX, which implies |λ| ‖X‖ = ‖λX‖ = ‖AX‖ ≤ ‖A‖ ‖X‖.
- Since ‖X‖ ≠ 0, we obtain |λ| ≤ ‖A‖.
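The inequality ρ(A) ≤ ‖A‖ can be checked for several consistent norms at once (assuming NumPy; A is a random example):

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 5))   # arbitrary example matrix

rho = np.abs(np.linalg.eigvals(A)).max()   # spectral radius
for ord_ in [1, 2, np.inf, 'fro']:         # all of these norms are consistent
    assert rho <= np.linalg.norm(A, ord_) + 1e-12
```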

A Special Norm

Theorem
Let A ∈ C^{n,n} and ε > 0 be given. There is a consistent matrix norm ‖·‖′ on C^{n,n} such that ρ(A) ≤ ‖A‖′ ≤ ρ(A) + ε.

A Very Important Result

Theorem
For any A ∈ C^{n,n} we have

  lim_{k→∞} A^k = 0 ⇐⇒ ρ(A) < 1.

- Proof:

- Suppose ρ(A) < 1.
- By the previous theorem there is a consistent matrix norm ‖·‖ on C^{n,n} such that ‖A‖ < 1.
- But then ‖A^k‖ ≤ ‖A‖^k → 0 as k → ∞.
- Hence A^k → 0.

- The converse is easier: if A^k → 0 and (λ, x) is an eigenpair of A, then λ^k x = A^k x → 0, which forces |λ| < 1.

Convergence Can Be Slow

0.99 1 0  0.4 9.37 1849 100 I A =  0 0.99 1  , A =  0 0.4 37  , 0 0 0.99 0 0 0.4 10−9  0.004 2000 A =  0 10−9   0 0 10−9 More limits

More Limits

- For any submultiplicative matrix norm ‖·‖ on C^{n,n} and any A ∈ C^{n,n} we have

  lim_{k→∞} ‖A^k‖^{1/k} = ρ(A).   (14)
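For the slowly converging matrix above, (14) can be observed numerically (assuming NumPy; the chosen powers are arbitrary):

```python
import numpy as np

A = np.array([[0.99, 1.0, 0.0],
              [0.0, 0.99, 1.0],
              [0.0, 0.0, 0.99]])
rho = np.abs(np.linalg.eigvals(A)).max()   # spectral radius, here 0.99

for k in [10, 100, 1000]:
    est = np.linalg.norm(np.linalg.matrix_power(A, k), 2) ** (1.0 / k)
    print(k, est)   # decreases toward rho = 0.99, but only slowly

assert abs(est - rho) < 0.02   # est is the k = 1000 value
```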