
Vector and Matrix Norms

Tom Lyche

Centre of Mathematics for Applications, Department of Informatics, University of Oslo

October 4, 2010

Vector Norms

Definition
A norm in Rn (Cn) is a function k·k : Rn (Cn) → R that satisfies for all x, y in Rn (Cn) and all a in R (C):
1. kxk ≥ 0 with equality if and only if x = 0. (positivity)
2. kaxk = |a| kxk. (homogeneity)
3. kx + yk ≤ kxk + kyk. (subadditivity)
The triples (Rn, R, k·k) and (Cn, C, k·k) are examples of normed vector spaces, and 3. is called the triangle inequality.

Inverse Triangle Inequality

kx − yk ≥ | kxk − kyk |, x, y ∈ Cn.

I Need to show kx − yk ≥ kxk − kyk and

I kx − yk ≥ kyk − kxk

I Since kx − yk = ky − xk the last inequality follows from the first.

I The first: kxk = kx − y + yk ≤ kx − yk + kyk.
The inverse triangle inequality shows that a norm is a continuous function Cn → R.

p Norms

Define for p ≥ 1

kxkp := ( Σ_{j=1}^{n} |xj|^p )^{1/p}, (1)

kxk∞ := max_{1≤j≤n} |xj|. (2)

The most important cases are:
1. kxk1 = Σ_{j=1}^{n} |xj|, (the one-norm or l1-norm)
2. kxk2 = ( Σ_{j=1}^{n} |xj|^2 )^{1/2}, (the two-norm, l2-norm, or Euclidean norm)
3. kxk∞ = max_{1≤j≤n} |xj|, (the infinity-norm, l∞-norm, or max norm)

Why index ∞?

lim_{p→∞} kxkp = kxk∞ for all x ∈ Cn. (3)
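This limit is easy to observe numerically. A small sketch in Python/numpy (the vector and exponents are arbitrary examples; scaling by kxk∞ avoids overflow for large p, anticipating the argument below):

```python
import numpy as np

x = np.array([3.0, -4.0, 1.0])

def p_norm(x, p):
    # compute kxk_p as kxk_inf * ( sum (|x_j|/kxk_inf)^p )^(1/p)
    # to avoid overflow when p is large
    m = np.max(np.abs(x))
    return m * np.sum((np.abs(x) / m) ** p) ** (1.0 / p)

for p in [1, 2, 8, 64, 1024]:
    print(p, p_norm(x, p))  # values decrease toward kxk_inf = 4.0
```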

I For x ≠ 0 we write

kxkp := kxk∞ ( Σ_{j=1}^{n} ( |xj| / kxk∞ )^p )^{1/p}.

I Now each term in the sum is not greater than one and at least one term is equal to one

I kxk∞ ≤ kxkp ≤ n^{1/p} kxk∞, p ≥ 1.
I Since lim_{p→∞} n^{1/p} = 1 for any n ∈ N we see that (3) follows.

Hölder and Minkowski

I It is shown in an Appendix that the p-norms are norms in Rn and in Cn for any p with 1 ≤ p ≤ ∞.
I The triangle inequality kx + ykp ≤ kxkp + kykp is called Minkowski's inequality.

I To prove it one first establishes Hölder's inequality

Σ_{j=1}^{n} |xj yj| ≤ kxkp kykq, 1/p + 1/q = 1, x, y ∈ Cn. (4)
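Hölder's inequality is easy to spot-check numerically; a sketch with random example vectors (note that np.linalg.norm accepts a non-integer ord, used here for q = 1.5):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(6)
y = rng.standard_normal(6)

# pairs (p, q) with 1/p + 1/q = 1
for p, q in [(1.0, np.inf), (2.0, 2.0), (3.0, 1.5)]:
    lhs = np.sum(np.abs(x * y))
    rhs = np.linalg.norm(x, p) * np.linalg.norm(y, q)
    assert lhs <= rhs + 1e-12  # Hölder's inequality (4)
```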

I The relation 1/p + 1/q = 1 means for example that if p = 1 then q = ∞ and if p = 2 then q = 2.

Equivalent Norms

Definition
Two norms k·k and k·k0 on Cn are equivalent if there are positive constants m and M (depending only on n) such that for all vectors x ∈ Cn we have

mkxk ≤ kxk0 ≤ Mkxk. (5)

Example: kxk∞ ≤ kxkp ≤ n^{1/p} kxk∞, p ≥ 1.
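A numerical illustration of this pair of inequalities, and hence of the equivalence of the ∞-norm and the p-norms (a sketch; the vectors are random examples):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10
for p in [1, 2, 4]:
    for _ in range(100):
        x = rng.standard_normal(n)
        xp = np.linalg.norm(x, p)
        xinf = np.linalg.norm(x, np.inf)
        # kxk_inf <= kxk_p <= n^(1/p) kxk_inf
        assert xinf <= xp * (1 + 1e-12)
        assert xp <= n ** (1.0 / p) * xinf * (1 + 1e-12)
```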

The following result is proved in an Appendix.
Theorem
All vector norms in Cn are equivalent.

Matrix Norms

We consider matrix norms on (Cm,n, C). All results hold for (Rm,n, R).
Definition (Matrix Norms)
A function k·k : Cm,n → R is called a matrix norm on Cm,n if for all A, B ∈ Cm,n and all α ∈ C:
1. kAk ≥ 0 with equality if and only if A = 0. (positivity)
2. kαAk = |α| kAk. (homogeneity)
3. kA + Bk ≤ kAk + kBk. (subadditivity)
A matrix norm is simply a vector norm on the finite dimensional vector space (Cm,n, C) of m × n matrices.

Equivalent norms

Adapting some general results on vector norms to matrix norms gives
Theorem
1. All matrix norms are equivalent. Thus, if k·k and k·k0 are two matrix norms on Cm,n then there are positive constants µ and M such that µkAk ≤ kAk0 ≤ MkAk holds for all A ∈ Cm,n.
2. A matrix norm is a continuous function k·k : Cm,n → R.

The Frobenius Matrix Norm

I A ∈ Cm,n, U, V unitary.
I kAkF := ( Σ_{i=1}^{m} Σ_{j=1}^{n} |aij|^2 )^{1/2}.
I kABkF ≤ kAkF kBkF. (submultiplicativity)
I kAxk2 ≤ kAkF kxk2. (subordinance)
I kUAVkF = kAkF. (unitary invariance)
I kAkF = (σ1^2 + ··· + σn^2)^{1/2}, where σ1, . . . , σn are the singular values of A.

The p matrix norm, 1 ≤ p ≤ ∞, A ∈ Cm,n

kAkp := max_{x≠0} kAxkp / kxkp.

I kAkp = max_{kykp=1} kAykp.
I The p-norms are matrix norms.

I kABkp ≤ kAkp kBkp. (submultiplicativity)

I kAxkp ≤ kAkp kxkp. (subordinance)

I kUAVk2 = kAk2. (unitary invariance of the 2-norm)

Explicit expressions

Theorem
For A ∈ Cm,n we have

kAk1 := max_{1≤j≤n} Σ_{k=1}^{m} |ak,j|, (max column sum)
kAk2 := σ1, (largest singular value of A)
kAk∞ := max_{1≤k≤m} Σ_{j=1}^{n} |ak,j|, (max row sum). (6)

The expression kAk2 is called the two-norm or the spectral norm of A. The explicit expression for the spectral norm follows from the minmax theorem for singular values.

Examples

For A := (1/15) [ 14 4 16 ; 2 22 13 ] we find
I kAk1 = 29/15.
I kAk2 = 2.
I kAk∞ = 37/15.
I kAkF = √5.
For A := [ 1 2 ; 3 4 ] we find

I kAk1 = 6

I kAk2 = 5.465

I kAk∞ = 7.

I kAkF = 5.4772.

The spectral norm in special cases

Theorem
Suppose A ∈ Cn,n has singular values σ1 ≥ σ2 ≥ · · · ≥ σn and eigenvalues |λ1| ≥ |λ2| ≥ · · · ≥ |λn|. Then

kAk2 = σ1 and kA−1k2 = 1/σn, (7)
kAk2 = |λ1| and kA−1k2 = 1/|λn|, if A is normal. (8)

If A ∈ Rn,n is symmetric positive definite then kAk2 = λ1 and kA−1k2 = 1/λn. For the norms of A−1 we assume of course that A is nonsingular.

Consistency

I A matrix norm is called consistent on Cn,n if kABk ≤ kAk kBk (submultiplicativity) holds for all A, B ∈ Cn,n.
I A matrix norm is consistent if it is defined on Cm,n for all m, n ∈ N, and submultiplicativity holds for all matrices A, B for which the product AB is defined.

I The Frobenius norm is consistent.

I The p norm is consistent for 1 ≤ p ≤ ∞.

Subordinate Matrix Norm

Definition

I Suppose m, n ∈ N are given.
I Let k·kα on Cm and k·kβ on Cn be vector norms, and let k·k be a matrix norm on Cm,n.
I We say that the matrix norm k·k is subordinate to the vector norms k·kα and k·kβ if kAxkα ≤ kAk kxkβ for all A ∈ Cm,n and all x ∈ Cn.

I The Frobenius norm is subordinate to the Euclidean vector norm.

I The p matrix norm is subordinate to the p vector norm for 1 ≤ p ≤ ∞.

Operator Norm

Definition
Suppose m, n ∈ N are given and let k·kα be a vector norm on Cn and k·kβ a vector norm on Cm. For A ∈ Cm,n we define

kAk := kAkα,β := max_{x≠0} kAxkβ / kxkα. (9)

We call this the (α, β) operator norm, the (α, β)-norm, or simply the α-norm if α = β.

I The p norms are operator norms

I The Frobenius norm is not an operator norm.

Observations on Operator norms

I The operator norm is a matrix norm on Cm,n.
I It is consistent on Cn,n if α = β, and consistent if α = β and the α norm is defined for all m, n.

I kAxkβ ≤ kAkα,β kxkα. (subordinance)

I kAkα,β = max_{x∉ker(A)} kAxkβ / kxkα = max_{kxkα=1} kAxkβ.
I kAkα,β = kAx∗kβ for some x∗ ∈ Cn with kx∗kα = 1.

Perturbation of linear systems

I x1 + x2 = 20
x1 + (1 − 10^{−16})x2 = 20 − 10^{−15}

I The exact solution is x1 = x2 = 10.

I Suppose we replace the second equation by

x1 + (1 + 10^{−16})x2 = 20 − 10^{−15},

I the exact solution changes to x1 = 30, x2 = −10.
I A small change in one of the coefficients, from 1 − 10^{−16} to 1 + 10^{−16}, changed the exact solution by a large amount.

Ill Conditioning
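This sensitivity is easy to reproduce numerically (a sketch; 1 − 10^{−16} is not representable in double precision, so a milder, hypothetical perturbation ε = 10^{−8} is used here, with the right-hand side adjusted so the unperturbed solution is still x1 = x2 = 10):

```python
import numpy as np

eps = 1e-8  # hypothetical stand-in for the 1e-16 of the example above

A = np.array([[1.0, 1.0],
              [1.0, 1.0 - eps]])
b = np.array([20.0, 20.0 - 10 * eps])
x = np.linalg.solve(A, b)          # exact solution is [10, 10]

Ap = np.array([[1.0, 1.0],
               [1.0, 1.0 + eps]])  # flip the sign of the perturbation
y = np.linalg.solve(Ap, b)         # exact solution jumps to [30, -10]

print(x, y)
```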

I A mathematical problem in which the solution is very sensitive to changes in the data is called ill-conditioned or sometimes ill-posed.

I Such problems are difficult to solve on a computer.

I If at all possible, the mathematical model should be changed to obtain a more well-conditioned or properly-posed problem.

Perturbations

I We consider what effect a small change (perturbation) in the data A,b has on the solution x of a linear system Ax = b.

I Suppose y solves (A + E)y = b+e where E is a (small) n × n matrix and e a (small) vector.

I How large can y−x be?

I To measure this we use vector and matrix norms.

Conditions on the norms

I k·k will denote a vector norm on Cn and also a consistent matrix norm on Cn,n which in addition is subordinate to the vector norm.
I Thus for any A, B ∈ Cn,n and any x ∈ Cn we have kABk ≤ kAk kBk and kAxk ≤ kAk kxk.

I This is satisfied if the matrix norm is the p norm or the Frobenius norm.

Absolute and relative error

I The difference ky − xk measures the absolute error in y as an approximation to x,

I ky − xk/kxk or ky − xk/kyk is a measure for the relative error.

Perturbation in the right hand side

Theorem
Suppose A ∈ Cn,n is invertible, b, e ∈ Cn, b ≠ 0 and Ax = b, Ay = b + e. Then

(1/K(A)) · kek/kbk ≤ ky − xk/kxk ≤ K(A) · kek/kbk, K(A) = kAk kA−1k. (10)
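Both bounds in (10) can be checked numerically, here using the ∞-norm (a sketch; the matrix, right-hand side and perturbation are arbitrary examples):

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 1.0])
e = np.array([1e-3, -2e-3])

x = np.linalg.solve(A, b)
y = np.linalg.solve(A, b + e)

K = np.linalg.cond(A, np.inf)  # K(A) = kAk kA^{-1}k in the inf-norm
rel_err = np.linalg.norm(y - x, np.inf) / np.linalg.norm(x, np.inf)
rel_pert = np.linalg.norm(e, np.inf) / np.linalg.norm(b, np.inf)

assert rel_pert / K <= rel_err <= K * rel_pert  # the bounds in (10)
```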

I Consider (10). kek/kbk is a measure for the size of the perturbation e relative to the size of b. ky − xk/kxk can in the worst case be K(A) = kAk kA−1k times as large as kek/kbk.

Condition number

I K(A) is called the condition number with respect to inversion of a matrix, or just the condition number, if it is clear from the context that we are talking about solving linear systems.

I The condition number depends on the matrix A and on the norm used. If K(A) is large, A is called ill-conditioned (with respect to inversion).

I If K(A) is small, A is called well-conditioned (with respect to inversion).

Condition number properties

I Since kAk kA−1k ≥ kAA−1k = kIk ≥ 1 we always have K(A) ≥ 1.

I Since all matrix norms are equivalent, the dependence of K(A) on the norm chosen is less important than the dependence on A.

I Usually one chooses the spectral norm when discussing properties of the condition number, and the l1 and l∞ norms when one wishes to compute or estimate it.

The spectral norm condition number

I Suppose A has singular values σ1 ≥ σ2 ≥ · · · ≥ σn > 0 and eigenvalues |λ1| ≥ |λ2| ≥ · · · ≥ |λn| if A is square.
I K2(A) = kAk2 kA−1k2 = σ1/σn.
I K2(A) = kAk2 kA−1k2 = |λ1|/|λn|, A normal.
I It follows that A is ill-conditioned with respect to inversion if and only if σ1/σn is large, or |λ1|/|λn| is large when A is normal.
I K2(A) = kAk2 kA−1k2 = λ1/λn, A real, symmetric, positive definite.

The residual

Suppose we have computed an approximate solution y to Ax = b. The vector r(y) := Ay − b is called the residual vector, or just the residual. We can bound x − y in terms of r(y).
Theorem
Suppose A ∈ Cn,n, b ∈ Cn, A is nonsingular and b ≠ 0. Let r(y) = Ay − b for each y ∈ Cn. If Ax = b then

(1/K(A)) · kr(y)k/kbk ≤ ky − xk/kxk ≤ K(A) · kr(y)k/kbk. (11)

Discussion

I If A is well-conditioned, (11) says that ky − xk/kxk ≈ kr(y)k/kbk.

I In other words, the accuracy in y is about the same order of magnitude as the residual as long as kbk ≈ 1.

I If A is ill-conditioned, anything can happen.

I The solution can be inaccurate even if the residual is small

I We can have an accurate solution even if the residual is large.

The inverse of A + E

Theorem
Suppose A ∈ Cn,n is nonsingular and let k·k be a consistent matrix norm on Cn,n. If E ∈ Cn,n is so small that r := kA−1Ek < 1 then A + E is nonsingular and

k(A + E)−1k ≤ kA−1k / (1 − r). (12)

If r < 1/2 then

k(A + E)−1 − A−1k / kA−1k ≤ 2K(A) kEk/kAk. (13)

Proof

I We use that if B ∈ Cn,n and kBk < 1 then I − B is nonsingular and k(I − B)−1k ≤ 1/(1 − kBk).
I Since r < 1 the matrix I − B := I + A−1E is nonsingular.
I Since (I − B)−1A−1(A + E) = I we see that A + E is nonsingular with inverse (I − B)−1A−1.
I Hence, k(A + E)−1k ≤ k(I − B)−1k kA−1k and (12) follows.
I From the identity (A + E)−1 − A−1 = −A−1E(A + E)−1 we obtain k(A + E)−1 − A−1k ≤ kA−1k kEk k(A + E)−1k ≤ K(A) (kEk/kAk) kA−1k/(1 − r).
I Dividing by kA−1k and setting r = 1/2 proves (13).

Perturbation in A

Theorem
Suppose A, E ∈ Cn,n, b ∈ Cn with A invertible and b ≠ 0. If r := kA−1Ek < 1/2 for some operator norm then A + E is invertible. If Ax = b and (A + E)y = b then

ky − xk/kyk ≤ kA−1Ek ≤ K(A) kEk/kAk, (14)
ky − xk/kxk ≤ 2K(A) kEk/kAk. (15)

Proof

I A + E is invertible.

I (14) follows easily by taking norms in the equation x − y = A−1Ey and dividing by kyk.
I From the identity y − x = ((A + E)−1 − A−1)Ax we obtain ky − xk ≤ k(A + E)−1 − A−1k kAk kxk and (15) follows.

Convergence in Rm,n and Cm,n

I Consider an infinite sequence of matrices {Ak} = A0, A1, A2, . . . in Cm,n.
I {Ak} is said to converge to the limit A in Cm,n if each sequence {Ak(ij)}k converges to the corresponding element A(ij) for i = 1, . . . , m and j = 1, . . . , n.

I {Ak} is a Cauchy sequence if for each ε > 0 there is an integer N ∈ N such that for each k, l ≥ N and all i, j we have |Ak(ij) − Al(ij)| ≤ ε.

I {Ak} is bounded if there is a constant M such that |Ak(ij)| ≤ M for all i, j, k.

More on Convergence

I By stacking the columns of A into a vector in Cmn we obtain the following:
I A sequence {Ak} in Cm,n converges to a matrix A ∈ Cm,n if and only if limk→∞ kAk − Ak = 0 for any matrix norm k·k.
I A sequence {Ak} in Cm,n is convergent if and only if it is a Cauchy sequence.
I Every bounded sequence {Ak} in Cm,n has a convergent subsequence.

The Spectral Radius

I ρ(A) = max_{λ∈σ(A)} |λ|.
I For any consistent matrix norm k·k on Cn,n and any A ∈ Cn,n we have ρ(A) ≤ kAk.

I Proof: Let (λ, x) be an eigenpair for A and set X := [x, . . . , x] ∈ Cn,n.
I Then λX = AX, which implies |λ| kXk = kλXk = kAXk ≤ kAk kXk.

I Since kXk ≠ 0 we obtain |λ| ≤ kAk.

A Special Norm

Theorem
Let A ∈ Cn,n and ε > 0 be given. There is a consistent matrix norm k·k0 on Cn,n such that ρ(A) ≤ kAk0 ≤ ρ(A) + ε.

A Very Important Result

Theorem
For any A ∈ Cn,n we have

lim_{k→∞} A^k = 0 ⇐⇒ ρ(A) < 1.
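A small numerical illustration of the theorem (a sketch; the two matrices are arbitrary examples with ρ(A) < 1 < ρ(B)):

```python
import numpy as np

def spectral_radius(M):
    return np.max(np.abs(np.linalg.eigvals(M)))

A = np.array([[0.5, 1.0],
              [0.0, 0.5]])   # rho(A) = 0.5 < 1
B = np.array([[1.1, 0.0],
              [0.0, 0.2]])   # rho(B) = 1.1 > 1

Ak = np.linalg.matrix_power(A, 200)  # entries tend to 0
Bk = np.linalg.matrix_power(B, 200)  # entries blow up

print(spectral_radius(A), np.max(np.abs(Ak)))
print(spectral_radius(B), np.max(np.abs(Bk)))
```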

I Proof:

I Suppose ρ(A) < 1.
I There is a consistent matrix norm k·k on Cn,n such that kAk < 1.
I But then kA^kk ≤ kAk^k → 0 as k → ∞.
I Hence A^k → 0.

I The converse is easier.

Convergence can be slow

0.99 1 0  0.4 9.37 1849 100 I A =  0 0.99 1  , A =  0 0.4 37  , 0 0 0.99 0 0 0.4 10−9  0.004 2000 A =  0 10−9   0 0 10−9 More limits

I For any submultiplicative matrix norm k·k on Cn,n and any A ∈ Cn,n we have

lim_{k→∞} kA^kk^{1/k} = ρ(A). (16)
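Formula (16) can be observed numerically, although, as with the matrix powers above, the convergence can be very slow (a sketch using the Frobenius norm on an arbitrary example matrix):

```python
import numpy as np

A = np.array([[0.9, 1.0],
              [0.0, 0.8]])
rho = np.max(np.abs(np.linalg.eigvals(A)))  # rho(A) = 0.9

for k in [10, 100, 1000]:
    nk = np.linalg.norm(np.linalg.matrix_power(A, k), 'fro') ** (1.0 / k)
    print(k, nk)  # tends slowly to rho(A) = 0.9
```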