Chapter 6
Vector Norms and Matrix Norms
6.1 Normed Vector Spaces
In order to define how close two vectors or two matrices are, and in order to define the convergence of sequences of vectors or matrices, we can use the notion of a norm.
Recall that $\mathbb{R}_+ = \{x \in \mathbb{R} \mid x \geq 0\}$. Also recall that if $z = a + ib \in \mathbb{C}$ is a complex number, with $a, b \in \mathbb{R}$, then $\bar{z} = a - ib$ and $|z| = \sqrt{a^2 + b^2}$ ($|z|$ is the modulus of $z$).
Definition 6.1. Let $E$ be a vector space over a field $K$, where $K$ is either the field $\mathbb{R}$ of reals or the field $\mathbb{C}$ of complex numbers. A norm on $E$ is a function $\|\cdot\| : E \to \mathbb{R}_+$, assigning a nonnegative real number $\|u\|$ to any vector $u \in E$, and satisfying the following conditions for all $x, y \in E$ and all $\lambda \in K$:

(N1) $\|x\| \geq 0$, and $\|x\| = 0$ iff $x = 0$. (positivity)

(N2) $\|\lambda x\| = |\lambda| \, \|x\|$. (homogeneity, or scaling)

(N3) $\|x + y\| \leq \|x\| + \|y\|$. (triangle inequality)

A vector space $E$ together with a norm $\|\cdot\|$ is called a normed vector space.
From (N2) we get
$\|{-x}\| = \|x\|$,

and from (N3), we get
$|\,\|x\| - \|y\|\,| \leq \|x - y\|$.

Example 6.1.

1. Let $E = \mathbb{R}$, and $\|x\| = |x|$, the absolute value of $x$.

2. Let $E = \mathbb{C}$, and $\|z\| = |z|$, the modulus of $z$.

3. Let $E = \mathbb{R}^n$ (or $E = \mathbb{C}^n$). There are three standard norms.
For every $(x_1, \ldots, x_n) \in E$, we have the 1-norm $\|x\|_1$, defined such that

$\|x\|_1 = |x_1| + \cdots + |x_n|$,

we have the Euclidean norm $\|x\|_2$, defined such that

$\|x\|_2 = \left(|x_1|^2 + \cdots + |x_n|^2\right)^{1/2}$,

and the sup-norm $\|x\|_\infty$, defined such that

$\|x\|_\infty = \max\{|x_i| \mid 1 \leq i \leq n\}$.

More generally, we define the $\ell^p$-norm (for $p \geq 1$) by

$\|x\|_p = \left(|x_1|^p + \cdots + |x_n|^p\right)^{1/p}$.
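These definitions translate directly into code. The following is a minimal pure-Python sketch (the function names are ours, not from the text) of the three standard norms and the general $\ell^p$-norm:

```python
def norm_1(x):
    # 1-norm: sum of absolute values
    return sum(abs(xi) for xi in x)

def norm_2(x):
    # Euclidean norm: square root of the sum of squared moduli
    return sum(abs(xi) ** 2 for xi in x) ** 0.5

def norm_inf(x):
    # sup-norm: largest absolute value of a component
    return max(abs(xi) for xi in x)

def norm_p(x, p):
    # general l^p-norm, valid for p >= 1
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

x = [3, -4]
print(norm_1(x))    # 7
print(norm_2(x))    # 5.0
print(norm_inf(x))  # 4
```

Note that `norm_p(x, 1)` and `norm_p(x, 2)` recover the 1-norm and the Euclidean norm, while `norm_inf` is the limit of `norm_p(x, p)` as $p \to \infty$.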
There are other norms besides the $\ell^p$-norms; we urge the reader to find such norms. Some work is required to show the triangle inequality for the $\ell^p$-norm.
Proposition 6.1. If $E$ is a finite-dimensional vector space over $\mathbb{R}$ or $\mathbb{C}$, then for every real number $p \geq 1$, the $\ell^p$-norm is indeed a norm.
The proof uses the following facts:
If $q \geq 1$ is given by

$\dfrac{1}{p} + \dfrac{1}{q} = 1$,

then
(1) For all $\alpha, \beta \in \mathbb{R}$, if $\alpha, \beta \geq 0$, then

$\alpha\beta \leq \dfrac{\alpha^p}{p} + \dfrac{\beta^q}{q}$. $(\ast)$
(2) For any two vectors $u, v \in E$, we have

$\displaystyle\sum_{i=1}^n |u_i v_i| \leq \|u\|_p \, \|v\|_q$. $(\ast\ast)$

For $p > 1$ and $1/p + 1/q = 1$, the inequality

$\displaystyle\sum_{i=1}^n |u_i v_i| \leq \left(\sum_{i=1}^n |u_i|^p\right)^{1/p} \left(\sum_{i=1}^n |v_i|^q\right)^{1/q}$

is known as Hölder's inequality.
For $p = 2$, it is the Cauchy–Schwarz inequality.
Actually, if we define the Hermitian inner product $\langle -, - \rangle$ on $\mathbb{C}^n$ by

$\langle u, v \rangle = \displaystyle\sum_{i=1}^n u_i \bar{v}_i$,

where $u = (u_1, \ldots, u_n)$ and $v = (v_1, \ldots, v_n)$, then

$|\langle u, v \rangle| \leq \displaystyle\sum_{i=1}^n |u_i \bar{v}_i| = \sum_{i=1}^n |u_i v_i|$,

so Hölder's inequality implies the inequality

$|\langle u, v \rangle| \leq \|u\|_p \, \|v\|_q$,

also called Hölder's inequality, which, for $p = 2$, is the standard Cauchy–Schwarz inequality.
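Hölder's inequality is easy to spot-check numerically. The sketch below (pure Python, our own names) picks $p = 3$, computes the conjugate exponent $q = p/(p-1)$, and verifies the inequality on a pair of sample vectors; a numeric check is of course no substitute for the proof:

```python
def norm_p(x, p):
    # l^p-norm for p >= 1
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

u = [1.0, -2.0, 3.0]
v = [4.0, 0.5, -1.0]
p = 3.0
q = p / (p - 1.0)  # conjugate exponent, so that 1/p + 1/q = 1

lhs = sum(abs(ui * vi) for ui, vi in zip(u, v))  # sum |u_i v_i| = 8.0
rhs = norm_p(u, p) * norm_p(v, q)                # ||u||_p * ||v||_q
print(lhs <= rhs)  # True
```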
The triangle inequality for the $\ell^p$-norm,

$\left(\displaystyle\sum_{i=1}^n (|u_i| + |v_i|)^p\right)^{1/p} \leq \left(\sum_{i=1}^n |u_i|^p\right)^{1/p} + \left(\sum_{i=1}^n |v_i|^p\right)^{1/p}$,

is known as Minkowski's inequality.
When we restrict the Hermitian inner product to real vectors $u, v \in \mathbb{R}^n$, we get the Euclidean inner product

$\langle u, v \rangle = \displaystyle\sum_{i=1}^n u_i v_i$.
It is very useful to observe that if we represent (as usual) $u = (u_1, \ldots, u_n)$ and $v = (v_1, \ldots, v_n)$ (in $\mathbb{R}^n$) by column vectors, then their Euclidean inner product is given by
$\langle u, v \rangle = u^\top v = v^\top u$,

and when $u, v \in \mathbb{C}^n$, their Hermitian inner product is given by
$\langle u, v \rangle = v^* u = \overline{u^* v}$.

In particular, when $u = v$, in the complex case we get
$\|u\|_2^2 = u^* u$,

and in the real case, this becomes
$\|u\|_2^2 = u^\top u$.

As convenient as these notations are, we still recommend that you do not abuse them; the notation $\langle u, v \rangle$ is more intrinsic and still “works” when our vector space is infinite dimensional.
Proposition 6.2. The following inequalities hold for all $x \in \mathbb{R}^n$ (or $x \in \mathbb{C}^n$):

$\|x\|_\infty \leq \|x\|_1 \leq n \|x\|_\infty$,
$\|x\|_\infty \leq \|x\|_2 \leq \sqrt{n}\, \|x\|_\infty$,
$\|x\|_2 \leq \|x\|_1 \leq \sqrt{n}\, \|x\|_2$.

Proposition 6.2 is actually a special case of a very important result: in a finite-dimensional vector space, any two norms are equivalent.
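The inequalities of Proposition 6.2 can be spot-checked on random vectors; a sketch in pure Python (random testing does not replace the proof, and a small tolerance is added for floating-point roundoff):

```python
import math
import random

def norms(x):
    # returns the 1-norm, 2-norm, and sup-norm of x
    n1 = sum(abs(xi) for xi in x)
    n2 = math.sqrt(sum(xi * xi for xi in x))
    ninf = max(abs(xi) for xi in x)
    return n1, n2, ninf

random.seed(0)
n = 5
eps = 1e-9  # tolerance for floating-point roundoff
for _ in range(1000):
    x = [random.uniform(-10, 10) for _ in range(n)]
    n1, n2, ninf = norms(x)
    assert ninf <= n1 + eps and n1 <= n * ninf + eps
    assert ninf <= n2 + eps and n2 <= math.sqrt(n) * ninf + eps
    assert n2 <= n1 + eps and n1 <= math.sqrt(n) * n2 + eps
print("all inequalities hold")
```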
Definition 6.2. Given any (real or complex) vector space $E$, two norms $\|\cdot\|_a$ and $\|\cdot\|_b$ are equivalent iff there exist some positive reals $C_1, C_2 > 0$ such that

$\|u\|_a \leq C_1 \|u\|_b$ and $\|u\|_b \leq C_2 \|u\|_a$, for all $u \in E$.
Given any norm $\|\cdot\|$ on a vector space $E$ of dimension $n$, for any basis $(e_1, \ldots, e_n)$ of $E$, observe that for any vector $x = x_1 e_1 + \cdots + x_n e_n$, we have

$\|x\| = \|x_1 e_1 + \cdots + x_n e_n\| \leq C \|x\|_1$,

with $C = \max_{1 \leq i \leq n} \|e_i\|$ and

$\|x\|_1 = \|x_1 e_1 + \cdots + x_n e_n\|_1 = |x_1| + \cdots + |x_n|$.
The above implies that

$|\,\|u\| - \|v\|\,| \leq \|u - v\| \leq C \|u - v\|_1$,

which means that the map $u \mapsto \|u\|$ is continuous with respect to the norm $\|\cdot\|_1$.
Let $S_1^{n-1}$ be the unit sphere with respect to the norm $\|\cdot\|_1$, namely

$S_1^{n-1} = \{x \in E \mid \|x\|_1 = 1\}$.
Now, $S_1^{n-1}$ is a closed and bounded subset of a finite-dimensional vector space, so by Heine–Borel (or equivalently, by Bolzano–Weierstrass), $S_1^{n-1}$ is compact.
On the other hand, it is a well known result of analysis that any continuous real-valued function on a nonempty compact set has a minimum and a maximum, and that they are achieved.
Using these facts, we can prove the following important theorem:
Theorem 6.3. If E is any real or complex vector space of finite dimension, then any two norms on E are equivalent.
Next, we will consider norms on matrices.

6.2 Matrix Norms
For simplicity of exposition, we will consider the vector spaces $M_n(\mathbb{R})$ and $M_n(\mathbb{C})$ of square $n \times n$ matrices.
Most results also hold for the spaces $M_{m,n}(\mathbb{R})$ and $M_{m,n}(\mathbb{C})$ of rectangular $m \times n$ matrices.

Since $n \times n$ matrices can be multiplied, the idea behind matrix norms is that they should behave “well” with respect to matrix multiplication.
Definition 6.3. A matrix norm $\|\cdot\|$ on the space of square $n \times n$ matrices $M_n(K)$, with $K = \mathbb{R}$ or $K = \mathbb{C}$, is a norm on the vector space $M_n(K)$, with the additional property called submultiplicativity that

$\|AB\| \leq \|A\| \, \|B\|$,

for all $A, B \in M_n(K)$. A norm on matrices satisfying the above property is often called a submultiplicative matrix norm.
Since $I^2 = I$, from $\|I\| = \|I^2\| \leq \|I\|^2$, we get $\|I\| \geq 1$, for every matrix norm.

Before giving examples of matrix norms, we need to review some basic definitions about matrices.
Given any matrix $A = (a_{ij}) \in M_{m,n}(\mathbb{C})$, the conjugate $\bar{A}$ of $A$ is the matrix such that

$\bar{A}_{ij} = \overline{a_{ij}}$, $\quad 1 \leq i \leq m$, $1 \leq j \leq n$.

The transpose of $A$ is the $n \times m$ matrix $A^\top$ such that

$(A^\top)_{ij} = a_{ji}$, $\quad 1 \leq i \leq n$, $1 \leq j \leq m$.

The adjoint of $A$ is the $n \times m$ matrix $A^*$ such that

$A^* = \overline{(A^\top)} = (\bar{A})^\top$.
When $A$ is a real matrix, $A^* = A^\top$.
A matrix $A \in M_n(\mathbb{C})$ is Hermitian if

$A^* = A$.
If $A$ is a real matrix ($A \in M_n(\mathbb{R})$), we say that $A$ is symmetric if

$A^\top = A$.
A matrix $A \in M_n(\mathbb{C})$ is normal if

$AA^* = A^*A$,

and if $A$ is a real matrix, it is normal if
$AA^\top = A^\top A$.
A matrix $U \in M_n(\mathbb{C})$ is unitary if

$UU^* = U^*U = I$.
A real matrix $Q \in M_n(\mathbb{R})$ is orthogonal if

$QQ^\top = Q^\top Q = I$.
Given any matrix $A = (a_{ij}) \in M_n(\mathbb{C})$, the trace $\mathrm{tr}(A)$ of $A$ is the sum of its diagonal elements

$\mathrm{tr}(A) = a_{11} + \cdots + a_{nn}$.

It is easy to show that the trace is a linear map, so that

$\mathrm{tr}(\lambda A) = \lambda\, \mathrm{tr}(A)$ and $\mathrm{tr}(A + B) = \mathrm{tr}(A) + \mathrm{tr}(B)$.

Moreover, if $A$ is an $m \times n$ matrix and $B$ is an $n \times m$ matrix, it is not hard to show that

$\mathrm{tr}(AB) = \mathrm{tr}(BA)$.
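The identity $\mathrm{tr}(AB) = \mathrm{tr}(BA)$ for rectangular matrices can be checked on a small example; a sketch with our own helper functions:

```python
def matmul(A, B):
    # plain triple-loop matrix product, A is m x n, B is n x p
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def trace(A):
    # sum of the diagonal elements of a square matrix
    return sum(A[i][i] for i in range(len(A)))

A = [[1, 2, 3],
     [4, 5, 6]]   # 2 x 3
B = [[7, 8],
     [9, 10],
     [11, 12]]    # 3 x 2

# AB is 2 x 2 and BA is 3 x 3, yet the traces agree:
print(trace(matmul(A, B)), trace(matmul(B, A)))  # 212 212
```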
We also review eigenvalues and eigenvectors. We content ourselves with definitions involving matrices. A more general treatment will be given later on (see Chapter 7).
Definition 6.4. Given any square matrix $A \in M_n(\mathbb{C})$, a complex number $\lambda \in \mathbb{C}$ is an eigenvalue of $A$ if there is some nonzero vector $u \in \mathbb{C}^n$ such that

$Au = \lambda u$.

If $\lambda$ is an eigenvalue of $A$, then the nonzero vectors $u \in \mathbb{C}^n$ such that $Au = \lambda u$ are called eigenvectors of $A$ associated with $\lambda$; together with the zero vector, these eigenvectors form a subspace of $\mathbb{C}^n$ denoted by $E_\lambda(A)$, and called the eigenspace associated with $\lambda$.
Remark: Note that Definition 6.4 requires an eigenvector to be nonzero.

A somewhat unfortunate consequence of this requirement is that the set of eigenvectors is not a subspace, since the zero vector is missing!
On the positive side, whenever eigenvectors are involved, there is no need to say that they are nonzero.
If $A$ is a square real matrix $A \in M_n(\mathbb{R})$, then we restrict Definition 6.4 to real eigenvalues $\lambda \in \mathbb{R}$ and real eigenvectors.
However, it should be noted that although every complex matrix always has at least some complex eigenvalue, a real matrix may not have any real eigenvalues. For example, the matrix
$A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$

has the complex eigenvalues $i$ and $-i$, but no real eigenvalues.
Thus, typically, even for real matrices, we consider complex eigenvalues.

Observe that $\lambda \in \mathbb{C}$ is an eigenvalue of $A$

iff $Au = \lambda u$ for some nonzero vector $u \in \mathbb{C}^n$

iff $(\lambda I - A)u = 0$

iff the matrix $\lambda I - A$ defines a linear map which has a nonzero kernel, that is,

iff $\lambda I - A$ is not invertible.

However, from Proposition 5.10, $\lambda I - A$ is not invertible iff

$\det(\lambda I - A) = 0$.
Now, $\det(\lambda I - A)$ is a polynomial of degree $n$ in the indeterminate $\lambda$, in fact, of the form
$\lambda^n - \mathrm{tr}(A)\lambda^{n-1} + \cdots + (-1)^n \det(A)$.
Thus, we see that the eigenvalues of $A$ are the zeros (also called roots) of the above polynomial.
Since every complex polynomial of degree $n$ has exactly $n$ roots, counted with their multiplicity, we have the following definition:

Definition 6.5. Given any square $n \times n$ matrix $A \in M_n(\mathbb{C})$, the polynomial
$\det(\lambda I - A) = \lambda^n - \mathrm{tr}(A)\lambda^{n-1} + \cdots + (-1)^n \det(A)$

is called the characteristic polynomial of $A$. The $n$ (not necessarily distinct) roots $\lambda_1, \ldots, \lambda_n$ of the characteristic polynomial are all the eigenvalues of $A$ and constitute the spectrum of $A$.
We let
$\rho(A) = \max_{1 \leq i \leq n} |\lambda_i|$

be the largest modulus of the eigenvalues of $A$, called the spectral radius of $A$.
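For a $2 \times 2$ matrix the characteristic polynomial is $\lambda^2 - \mathrm{tr}(A)\lambda + \det(A)$, so the eigenvalues, and hence the spectral radius, follow from the quadratic formula. A sketch for the $2 \times 2$ case only (the function name is ours):

```python
import cmath

def spectral_radius_2x2(A):
    # roots of lambda^2 - tr(A)*lambda + det(A) via the quadratic formula
    (a, b), (c, d) = A
    tr, det = a + d, a * d - b * c
    disc = cmath.sqrt(tr * tr - 4 * det)  # works for complex discriminants
    eig1 = (tr + disc) / 2
    eig2 = (tr - disc) / 2
    return max(abs(eig1), abs(eig2))

# The rotation matrix from the example above, with eigenvalues i and -i:
A = [[0, -1],
     [1, 0]]
print(spectral_radius_2x2(A))  # 1.0
```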
Proposition 6.4. For any matrix norm $\|\cdot\|$ on $M_n(\mathbb{C})$ and for any square $n \times n$ matrix $A$, we have

$\rho(A) \leq \|A\|$.

Remark: Proposition 6.4 still holds for real matrices $A \in M_n(\mathbb{R})$, but a different proof is needed since in the above proof the eigenvector $u$ may be complex.
We use Theorem 6.3 and a trick based on the fact that $\rho(A^k) = (\rho(A))^k$ for all $k \geq 1$.
Now, it turns out that if $A$ is a real $n \times n$ symmetric matrix, then the eigenvalues of $A$ are all real and there is some orthogonal matrix $Q$ such that
$A = Q\,\mathrm{diag}(\lambda_1, \ldots, \lambda_n)\,Q^\top$,

where $\mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ denotes the matrix whose only nonzero entries (if any) are its diagonal entries, which are the (real) eigenvalues of $A$.

Similarly, if $A$ is a complex $n \times n$ Hermitian matrix, then the eigenvalues of $A$ are all real and there is some unitary matrix $U$ such that

$A = U\,\mathrm{diag}(\lambda_1, \ldots, \lambda_n)\,U^*$,

where $\mathrm{diag}(\lambda_1, \ldots, \lambda_n)$ denotes the matrix whose only nonzero entries (if any) are its diagonal entries, which are the (real) eigenvalues of $A$.
We now return to matrix norms. We begin with the so-called Frobenius norm, which is just the norm $\|\cdot\|_2$ on $\mathbb{C}^{n^2}$, where the $n \times n$ matrix $A$ is viewed as the vector obtained by concatenating together the rows (or the columns) of $A$.
The reader should check that for any $n \times n$ complex matrix $A = (a_{ij})$,
$\left(\displaystyle\sum_{i,j=1}^n |a_{ij}|^2\right)^{1/2} = \sqrt{\mathrm{tr}(A^*A)} = \sqrt{\mathrm{tr}(AA^*)}$.

Definition 6.6. The Frobenius norm $\|\cdot\|_F$ is defined so that for every square $n \times n$ matrix $A \in M_n(\mathbb{C})$,
$\|A\|_F = \left(\displaystyle\sum_{i,j=1}^n |a_{ij}|^2\right)^{1/2} = \sqrt{\mathrm{tr}(AA^*)} = \sqrt{\mathrm{tr}(A^*A)}$.
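For a real matrix, $A^* = A^\top$, and the entrywise formula agrees with $\sqrt{\mathrm{tr}(AA^\top)}$; a quick sketch (the helper names are ours):

```python
import math

def frobenius(A):
    # entrywise formula: sqrt of the sum of squared moduli
    return math.sqrt(sum(abs(a) ** 2 for row in A for a in row))

def trace_AAt(A):
    # tr(A A^T) for a real matrix A: sum over i of sum over k of A[i][k]^2
    return sum(sum(a * a for a in row) for row in A)

A = [[1, 2],
     [3, 4]]
print(frobenius(A))                                   # sqrt(30) = 5.477...
print(math.isclose(frobenius(A) ** 2, trace_AAt(A)))  # True
```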
The following proposition shows that the Frobenius norm is a matrix norm satisfying other nice properties.
Proposition 6.5. The Frobenius norm $\|\cdot\|_F$ on $M_n(\mathbb{C})$ satisfies the following properties:

(1) It is a matrix norm; that is, $\|AB\|_F \leq \|A\|_F \|B\|_F$, for all $A, B \in M_n(\mathbb{C})$.

(2) It is unitarily invariant, which means that for all unitary matrices $U, V$, we have

$\|A\|_F = \|UA\|_F = \|AV\|_F = \|UAV\|_F$.
(3) $\sqrt{\rho(A^*A)} \leq \|A\|_F \leq \sqrt{n}\,\sqrt{\rho(A^*A)}$, for all $A \in M_n(\mathbb{C})$.

Remark: The Frobenius norm is also known as the Hilbert–Schmidt norm or the Schur norm. So many famous names associated with such a simple thing!
We now give another method for obtaining matrix norms using subordinate norms.
First, we need a proposition that shows that in a finite- dimensional space, the linear map induced by a matrix is bounded, and thus continuous.
Proposition 6.6. For every norm $\|\cdot\|$ on $\mathbb{C}^n$ (or $\mathbb{R}^n$), for every matrix $A \in M_n(\mathbb{C})$ (or $A \in M_n(\mathbb{R})$), there is a real constant $C_A \geq 0$ such that

$\|Au\| \leq C_A \|u\|$,

for every vector $u \in \mathbb{C}^n$ (or $u \in \mathbb{R}^n$ if $A$ is real).
Proposition 6.6 says that every linear map on a finite-dimensional space is bounded.

This implies that every linear map on a finite-dimensional space is continuous.
Actually, it is not hard to show that a linear map on a normed vector space E is bounded i↵it is continuous, regardless of the dimension of E.
Proposition 6.6 implies that for every matrix $A \in M_n(\mathbb{C})$ (or $A \in M_n(\mathbb{R})$),

$\displaystyle\sup_{\substack{x \in \mathbb{C}^n \\ x \neq 0}} \frac{\|Ax\|}{\|x\|} \leq C_A$.
Now, since $\|\lambda u\| = |\lambda| \, \|u\|$, it is easy to show that

$\displaystyle\sup_{\substack{x \in \mathbb{C}^n \\ x \neq 0}} \frac{\|Ax\|}{\|x\|} = \sup_{\substack{x \in \mathbb{C}^n \\ \|x\| = 1}} \|Ax\|$.
Similarly,
$\displaystyle\sup_{\substack{x \in \mathbb{R}^n \\ x \neq 0}} \frac{\|Ax\|}{\|x\|} = \sup_{\substack{x \in \mathbb{R}^n \\ \|x\| = 1}} \|Ax\|$.

Definition 6.7. If $\|\cdot\|$ is any norm on $\mathbb{C}^n$, we define the function $\|\cdot\|$ on $M_n(\mathbb{C})$ by

$\|A\| = \displaystyle\sup_{\substack{x \in \mathbb{C}^n \\ x \neq 0}} \frac{\|Ax\|}{\|x\|} = \sup_{\substack{x \in \mathbb{C}^n \\ \|x\| = 1}} \|Ax\|$.
The function $A \mapsto \|A\|$ is called the subordinate matrix norm or operator norm induced by the norm $\|\cdot\|$.
It is easy to check that the function $A \mapsto \|A\|$ is indeed a norm, and by definition, it satisfies the property

$\|Ax\| \leq \|A\| \, \|x\|$, for all $x \in \mathbb{C}^n$.
A norm $\|\cdot\|$ on $M_n(\mathbb{C})$ satisfying the above property is said to be subordinate to the vector norm $\|\cdot\|$ on $\mathbb{C}^n$. This implies that
$\|AB\| \leq \|A\| \, \|B\|$ for all $A, B \in M_n(\mathbb{C})$,

showing that $A \mapsto \|A\|$ is a matrix norm (it is submultiplicative).

Observe that the operator norm is also defined by
$\|A\| = \inf\{\lambda \in \mathbb{R} \mid \|Ax\| \leq \lambda \|x\|, \text{ for all } x \in \mathbb{C}^n\}$.
The definition also implies that $\|I\| = 1$.

The above shows that the Frobenius norm is not a subordinate matrix norm (why?).
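One way to answer the “why?”: every subordinate matrix norm satisfies $\|I\| = 1$, while the Frobenius norm of the $n \times n$ identity matrix is $\sqrt{n}$. A quick check in pure Python (the helper names are ours):

```python
import math

def frobenius(A):
    # entrywise 2-norm of the matrix, viewed as a vector
    return math.sqrt(sum(abs(a) ** 2 for row in A for a in row))

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

# A subordinate norm would give ||I|| = 1, but here ||I||_F = sqrt(9):
print(frobenius(identity(9)))  # 3.0
```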
The notion of subordinate norm can be slightly generalized.

Definition 6.8. If $K = \mathbb{R}$ or $K = \mathbb{C}$, for any norm $\|\cdot\|$ on $M_{m,n}(K)$, and for any two norms $\|\cdot\|_a$ on $K^n$ and $\|\cdot\|_b$ on $K^m$, we say that the norm $\|\cdot\|$ is subordinate to the norms $\|\cdot\|_a$ and $\|\cdot\|_b$ if

$\|Ax\|_b \leq \|A\| \, \|x\|_a$ for all $A \in M_{m,n}(K)$ and all $x \in K^n$.

Remark: For any norm $\|\cdot\|$ on $\mathbb{C}^n$, we can define the function $\|\cdot\|_{\mathbb{R}}$ on $M_n(\mathbb{R})$ by

$\|A\|_{\mathbb{R}} = \displaystyle\sup_{\substack{x \in \mathbb{R}^n \\ x \neq 0}} \frac{\|Ax\|}{\|x\|} = \sup_{\substack{x \in \mathbb{R}^n \\ \|x\| = 1}} \|Ax\|$.
The function $A \mapsto \|A\|_{\mathbb{R}}$ is a matrix norm on $M_n(\mathbb{R})$, and

$\|A\|_{\mathbb{R}} \leq \|A\|$,

for all real matrices $A \in M_n(\mathbb{R})$.

However, it is possible to construct vector norms $\|\cdot\|$ on $\mathbb{C}^n$ and real matrices $A$ such that

$\|A\|_{\mathbb{R}} < \|A\|$.

In order to avoid this kind of difficulty, we define subordinate matrix norms over $M_n(\mathbb{C})$.
Luckily, it turns out that $\|A\|_{\mathbb{R}} = \|A\|$ for the vector norms $\|\cdot\|_1$, $\|\cdot\|_2$, and $\|\cdot\|_\infty$.

Proposition 6.4 also holds for real matrix norms.
Proposition 6.7. For any matrix norm $\|\cdot\|$ on $M_n(\mathbb{R})$ and for any square $n \times n$ matrix $A \in M_n(\mathbb{R})$, we have

$\rho(A) \leq \|A\|$.
Proposition 6.8. For every square matrix $A = (a_{ij}) \in M_n(\mathbb{C})$, we have

$\|A\|_1 = \displaystyle\sup_{\substack{x \in \mathbb{C}^n \\ \|x\|_1 = 1}} \|Ax\|_1 = \max_j \sum_{i=1}^n |a_{ij}|$,

$\|A\|_\infty = \displaystyle\sup_{\substack{x \in \mathbb{C}^n \\ \|x\|_\infty = 1}} \|Ax\|_\infty = \max_i \sum_{j=1}^n |a_{ij}|$,

$\|A\|_2 = \displaystyle\sup_{\substack{x \in \mathbb{C}^n \\ \|x\|_2 = 1}} \|Ax\|_2 = \sqrt{\rho(A^*A)} = \sqrt{\rho(AA^*)}$.
Furthermore, $\|A^*\|_2 = \|A\|_2$, the norm $\|\cdot\|_2$ is unitarily invariant, which means that

$\|A\|_2 = \|UAV\|_2$

for all unitary matrices $U, V$, and if $A$ is a normal matrix, then $\|A\|_2 = \rho(A)$.

The norm $\|A\|_2$ is often called the spectral norm.

Observe that property (3) of Proposition 6.5 says that

$\|A\|_2 \leq \|A\|_F \leq \sqrt{n}\, \|A\|_2$,

which shows that the Frobenius norm is an upper bound on the spectral norm. The Frobenius norm is much easier to compute than the spectral norm.
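The column-sum and row-sum formulas of Proposition 6.8 for $\|A\|_1$ and $\|A\|_\infty$ are straightforward to implement; a sketch in pure Python (the function names are ours):

```python
def matrix_norm_1(A):
    # max over columns j of sum over i of |a_ij|  (column-sum norm)
    n = len(A)
    return max(sum(abs(A[i][j]) for i in range(n)) for j in range(len(A[0])))

def matrix_norm_inf(A):
    # max over rows i of sum over j of |a_ij|  (row-sum norm)
    return max(sum(abs(a) for a in row) for row in A)

A = [[1, -2],
     [3, 4]]
print(matrix_norm_1(A))    # 6  (column sums are 4 and 6)
print(matrix_norm_inf(A))  # 7  (row sums are 3 and 7)
```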
The reader will check that the above proof still holds if the matrix $A$ is real, confirming the fact that $\|A\|_{\mathbb{R}} = \|A\|$ for the vector norms $\|\cdot\|_1$, $\|\cdot\|_2$, and $\|\cdot\|_\infty$.

It is also easy to verify that the proof goes through for rectangular matrices, with the same formulae.
Similarly, the Frobenius norm is also a norm on rectangular matrices. For these norms, whenever $AB$ makes sense, we have

$\|AB\| \leq \|A\| \, \|B\|$.

The following proposition will be needed when we deal with the condition number of a matrix.
Proposition 6.9. Let $\|\cdot\|$ be any matrix norm and let $B$ be a matrix such that $\|B\| < 1$.

(1) If $\|\cdot\|$ is a subordinate matrix norm, then the matrix $I + B$ is invertible and
$\|(I + B)^{-1}\| \leq \dfrac{1}{1 - \|B\|}$.

(2) If a matrix of the form $I + B$ is singular, then $\|B\| \geq 1$ for every matrix norm (not necessarily subordinate).

The following result is needed to deal with the convergence of sequences of powers of matrices.
Proposition 6.10. For every matrix $A \in M_n(\mathbb{C})$ and for every $\epsilon > 0$, there is some subordinate matrix norm $\|\cdot\|$ such that

$\|A\| \leq \rho(A) + \epsilon$.
The proof uses Theorem 7.4, which says that there exists some invertible matrix $U$ and some upper triangular matrix $T$ such that

$A = UTU^{-1}$.
Note that equality is generally not possible; consider the matrix

$A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}$,

for which $\rho(A) = 0 < \|A\|$, since $A \neq 0$.

6.3 Condition Numbers of Matrices
Unfortunately, there exist linear systems Ax = b whose solutions are not stable under small perturbations of either b or A.
For example, consider the system
$\begin{pmatrix} 10 & 7 & 8 & 7 \\ 7 & 5 & 6 & 5 \\ 8 & 6 & 10 & 9 \\ 7 & 5 & 9 & 10 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} 32 \\ 23 \\ 33 \\ 31 \end{pmatrix}$.

The reader should check that it has the solution $x = (1, 1, 1, 1)$. If we perturb slightly the right-hand side, obtaining the new system
$\begin{pmatrix} 10 & 7 & 8 & 7 \\ 7 & 5 & 6 & 5 \\ 8 & 6 & 10 & 9 \\ 7 & 5 & 9 & 10 \end{pmatrix} \begin{pmatrix} x_1 + \delta x_1 \\ x_2 + \delta x_2 \\ x_3 + \delta x_3 \\ x_4 + \delta x_4 \end{pmatrix} = \begin{pmatrix} 32.1 \\ 22.9 \\ 33.1 \\ 30.9 \end{pmatrix}$,

the new solution turns out to be $x + \delta x = (9.2, -12.6, 4.5, -1.1)$.

In other words, a relative error of the order $1/200$ in the data (here, $b$) produces a relative error of the order $10/1$ in the solution, which represents an amplification of the relative error of the order $2000$.
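Both claims can be verified by plain matrix–vector multiplication, with no linear solver needed; a sketch in pure Python:

```python
def matvec(A, x):
    # A x for a matrix stored as a list of rows
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[10, 7, 8, 7],
     [7, 5, 6, 5],
     [8, 6, 10, 9],
     [7, 5, 9, 10]]

# The exact solution of the original system:
print(matvec(A, [1, 1, 1, 1]))  # [32, 23, 33, 31]

# The perturbed solution reproduces the perturbed right-hand side:
x_pert = [9.2, -12.6, 4.5, -1.1]
print([round(v, 6) for v in matvec(A, x_pert)])  # [32.1, 22.9, 33.1, 30.9]
```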
Now, let us perturb the matrix slightly, obtaining the new system
$\begin{pmatrix} 10 & 7 & 8.1 & 7.2 \\ 7.08 & 5.04 & 6 & 5 \\ 8 & 5.98 & 9.89 & 9 \\ 6.99 & 4.99 & 9 & 9.98 \end{pmatrix} \begin{pmatrix} x_1 + \delta x_1 \\ x_2 + \delta x_2 \\ x_3 + \delta x_3 \\ x_4 + \delta x_4 \end{pmatrix} = \begin{pmatrix} 32 \\ 23 \\ 33 \\ 31 \end{pmatrix}$.

This time, the solution is $x + \delta x = (-81, 137, -34, 22)$.

Again, a small change in the data alters the result rather drastically.
Yet, the original system is symmetric, has determinant 1, and has integer entries.

The problem is that the matrix of the system is badly conditioned, a concept that we will now explain.
Given an invertible matrix $A$, first, assume that we perturb $b$ to $b + \delta b$, and let us analyze the change between the two exact solutions $x$ and $x + \delta x$ of the two systems

$Ax = b$
$A(x + \delta x) = b + \delta b$.

We also assume that we have some norm $\|\cdot\|$ and we use the subordinate matrix norm on matrices. From

$Ax = b$
$Ax + A\,\delta x = b + \delta b$,

we get

$\delta x = A^{-1}\,\delta b$,

and we conclude that

$\|\delta x\| \leq \|A^{-1}\| \, \|\delta b\|$
$\|b\| \leq \|A\| \, \|x\|$.

Consequently, the relative error in the result $\|\delta x\|/\|x\|$ is bounded in terms of the relative error $\|\delta b\|/\|b\|$ in the data as follows: