
5 Norms on Linear Spaces

5.1 Introduction

In this chapter we study norms on real or complex linear spaces.

Definition 5.1. Let F = (F, {0, 1, +, −, ·}) be the real or the complex field. A semi-norm on an F-linear space V is a mapping ν : V −→ R that satisfies the following conditions:
(i) ν(x + y) ≤ ν(x) + ν(y) (the subadditivity property of semi-norms), and
(ii) ν(ax) = |a|ν(x),
for x, y ∈ V and a ∈ F.

By taking a = 0 in the second condition of the definition we have ν(0) = 0 for every semi-norm on a real or complex space.

A semi-norm can be defined on every linear space. Indeed, if B = {v_i | i ∈ I} is a basis of V, J is a finite subset of I, and x = Σ_{i∈I} x_i v_i, define ν_J(x) as

\[ \nu_J(x) = \begin{cases} 0 & \text{if } x = 0, \\ \sum_{j \in J} |x_j| & \text{otherwise,} \end{cases} \]

for x ∈ V. We leave to the reader the verification of the fact that ν_J is indeed a semi-norm.

Theorem 5.2. If V is a real or complex linear space and ν : V −→ R is a semi-norm on V , then

ν(x − y) ≥ |ν(x) − ν(y)|, for x, y ∈ V .

Proof. We have ν(x) ≤ ν(x − y)+ ν(y), so

ν(x) − ν(y) ≤ ν(x − y). (5.1)

Since ν(x − y) = |−1|ν(y − x) ≥ ν(y) − ν(x), we have

−(ν(x) − ν(y)) ≤ ν(x − y). (5.2)

Inequalities (5.1) and (5.2) give the desired inequality. ⊓⊔

Corollary 5.3. If p : V −→ R is a semi-norm on V , then p(x) ≥ 0 for x ∈ V .

Proof. Choosing y = 0 in the inequality of Theorem 5.2, we have p(x) = p(x − 0) ≥ |p(x) − p(0)| = |p(x)| ≥ 0. ⊓⊔

Definition 5.4. Let F = (F, {0, 1, +, −, ·}) be the real or the complex field. A norm on an F-linear space V is a semi-norm ν : V −→ R such that ν(x) = 0 implies x = 0 for x ∈ V. The pair (V, ν) is referred to as a normed linear space.

5.2 Vector Norms on Rn

Lemma 5.5. Let p, q ∈ R − {0, 1} be such that 1/p + 1/q = 1. Then we have p > 1 if and only if q > 1. Furthermore, one of the numbers p, q belongs to the interval (0, 1) if and only if the other number is negative.

Proof. We leave to the reader the simple proof of this statement. ⊓⊔

Lemma 5.6. Let p, q ∈ R − {0, 1} be two numbers such that 1/p + 1/q = 1 and p > 1. Then, for every a, b ∈ R≥0, we have

\[ ab \le \frac{a^p}{p} + \frac{b^q}{q}, \]

where the equality holds if and only if a = b^{−1/(1−p)}.

Proof. By Lemma 5.5, we have q > 1. Consider the function f(x) = x^p/p + 1/q − x for x ≥ 0. We have f′(x) = x^{p−1} − 1, so the minimum is achieved when x = 1 and f(1) = 0. Thus, for b > 0 (the case b = 0 is immediate),

\[ f\left(a b^{-\frac{1}{p-1}}\right) \ge f(1) = 0, \]

which amounts to

\[ \frac{a^p b^{-\frac{p}{p-1}}}{p} + \frac{1}{q} - a b^{-\frac{1}{p-1}} \ge 0. \]

By multiplying both sides of this inequality by b^{p/(p−1)} = b^q, we obtain the desired inequality. ⊓⊔

Observe that if 1/p + 1/q = 1 and 0 < p < 1, then q < 0. In this case, for positive a and b, we have the reverse inequality

\[ ab \ge \frac{a^p}{p} + \frac{b^q}{q}, \tag{5.3} \]

which can be shown by observing that the function f has a maximum at x = 1. The same inequality holds when 0 < q < 1 and therefore p < 0.
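As an informal numerical illustration of Lemma 5.6 and of Inequality (5.3), the following Python sketch evaluates the difference between the two sides for a few sample values. The function name `young_gap` and the sample values are hypothetical choices made for this illustration only.

```python
def young_gap(a, b, p):
    """Return a**p/p + b**q/q - a*b, where q is defined by 1/p + 1/q = 1."""
    q = p / (p - 1)
    return a**p / p + b**q / q - a * b

# p > 1: the gap is nonnegative (Lemma 5.6).
print(young_gap(2.0, 3.0, 3.0) >= 0)
# 0 < p < 1, hence q < 0: the gap is nonpositive (Inequality (5.3)).
print(young_gap(2.0, 3.0, 0.5) <= 0)
# Equality case for p > 1: a = b**(1/(p - 1)).
print(abs(young_gap(3.0 ** 0.5, 3.0, 3.0)) < 1e-9)
```

A check of this kind does not replace the proof; it only confirms the direction of the two inequalities on sample data.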

Theorem 5.7 (The Hölder Inequality). Let a1,...,an and b1,...,bn be 2n nonnegative numbers, and let p and q be two numbers such that 1/p + 1/q = 1 and p > 1. We have

\[ \sum_{i=1}^n a_i b_i \le \left(\sum_{i=1}^n a_i^p\right)^{\frac{1}{p}} \cdot \left(\sum_{i=1}^n b_i^q\right)^{\frac{1}{q}}. \]

Proof. Define the numbers

\[ x_i = \frac{a_i}{\left(\sum_{i=1}^n a_i^p\right)^{\frac{1}{p}}} \quad\text{and}\quad y_i = \frac{b_i}{\left(\sum_{i=1}^n b_i^q\right)^{\frac{1}{q}}} \]

for 1 ≤ i ≤ n. Lemma 5.6 applied to x_i, y_i yields

\[ \frac{a_i b_i}{\left(\sum_{i=1}^n a_i^p\right)^{\frac{1}{p}} \left(\sum_{i=1}^n b_i^q\right)^{\frac{1}{q}}} \le \frac{1}{p}\,\frac{a_i^p}{\sum_{i=1}^n a_i^p} + \frac{1}{q}\,\frac{b_i^q}{\sum_{i=1}^n b_i^q}. \]

Adding these inequalities, we obtain

\[ \sum_{i=1}^n a_i b_i \le \left(\sum_{i=1}^n a_i^p\right)^{\frac{1}{p}} \left(\sum_{i=1}^n b_i^q\right)^{\frac{1}{q}} \]

because 1/p + 1/q = 1. ⊓⊔

The nonnegativity of the numbers a1,...,an, b1,...,bn can be relaxed by using absolute values. Indeed, we can easily prove the following variant of Theorem 5.7.

Theorem 5.8. Let a1,...,an and b1,...,bn be 2n numbers and let p and q be two numbers such that 1/p + 1/q = 1 and p > 1. We have

\[ \left|\sum_{i=1}^n a_i b_i\right| \le \left(\sum_{i=1}^n |a_i|^p\right)^{\frac{1}{p}} \cdot \left(\sum_{i=1}^n |b_i|^q\right)^{\frac{1}{q}}. \]

Proof. By Theorem 5.7, we have

\[ \sum_{i=1}^n |a_i||b_i| \le \left(\sum_{i=1}^n |a_i|^p\right)^{\frac{1}{p}} \cdot \left(\sum_{i=1}^n |b_i|^q\right)^{\frac{1}{q}}. \]

The needed inequality follows from the fact that

\[ \left|\sum_{i=1}^n a_i b_i\right| \le \sum_{i=1}^n |a_i||b_i|. \]

⊓⊔
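The two sides of the inequality of Theorem 5.8 can be compared numerically. The sketch below is only an illustration; the helper name `holder_sides` and the sample vectors are assumptions introduced here.

```python
def holder_sides(a, b, p):
    """Return (lhs, rhs) of the Hölder inequality for the conjugate pair p, q."""
    q = p / (p - 1)  # conjugate exponent, so that 1/p + 1/q = 1
    lhs = abs(sum(x * y for x, y in zip(a, b)))
    rhs = (sum(abs(x) ** p for x in a) ** (1 / p)
           * sum(abs(y) ** q for y in b) ** (1 / q))
    return lhs, rhs

lhs, rhs = holder_sides([1.0, -2.0, 3.0], [4.0, 0.0, -1.0], 3.0)
print(lhs <= rhs)

# For p = q = 2 and proportional vectors the two sides coincide
# (the equality case of the Cauchy-Schwarz inequality).
lhs2, rhs2 = holder_sides([1.0, 2.0], [2.0, 4.0], 2.0)
print(abs(lhs2 - rhs2) < 1e-9)
```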

We obtain as a special case the Cauchy-Schwarz inequality that was proven in Theorem 7.3.

Corollary 5.9 (The Cauchy-Schwarz Inequality for R^n). Let a1,...,an and b1,...,bn be 2n numbers. We have

\[ \left|\sum_{i=1}^n a_i b_i\right| \le \sqrt{\sum_{i=1}^n |a_i|^2} \cdot \sqrt{\sum_{i=1}^n |b_i|^2}. \]

Proof. The inequality follows immediately from Theorem 5.8 by taking p = q = 2. ⊓⊔

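Theorem 5.10 (Minkowski's Inequality). Let a1,...,an and b1,...,bn be 2n nonnegative numbers. If p ≥ 1, we have

\[ \left(\sum_{i=1}^n (a_i + b_i)^p\right)^{\frac{1}{p}} \le \left(\sum_{i=1}^n a_i^p\right)^{\frac{1}{p}} + \left(\sum_{i=1}^n b_i^p\right)^{\frac{1}{p}}. \]

If p < 1, the inequality sign is reversed.

Proof. For p = 1, the inequality is immediate. Therefore, we can assume that p > 1. Note that

\[ \sum_{i=1}^n (a_i + b_i)^p = \sum_{i=1}^n a_i (a_i + b_i)^{p-1} + \sum_{i=1}^n b_i (a_i + b_i)^{p-1}. \]

By Hölder's inequality for p, q such that p > 1 and 1/p + 1/q = 1, we have

\[ \sum_{i=1}^n a_i (a_i + b_i)^{p-1} \le \left(\sum_{i=1}^n a_i^p\right)^{\frac{1}{p}} \left(\sum_{i=1}^n (a_i + b_i)^{(p-1)q}\right)^{\frac{1}{q}} = \left(\sum_{i=1}^n a_i^p\right)^{\frac{1}{p}} \left(\sum_{i=1}^n (a_i + b_i)^p\right)^{\frac{1}{q}}, \]

because (p − 1)q = p. Similarly, we can write

\[ \sum_{i=1}^n b_i (a_i + b_i)^{p-1} \le \left(\sum_{i=1}^n b_i^p\right)^{\frac{1}{p}} \left(\sum_{i=1}^n (a_i + b_i)^p\right)^{\frac{1}{q}}. \]

Adding the last two inequalities yields

\[ \sum_{i=1}^n (a_i + b_i)^p \le \left[\left(\sum_{i=1}^n a_i^p\right)^{\frac{1}{p}} + \left(\sum_{i=1}^n b_i^p\right)^{\frac{1}{p}}\right] \left(\sum_{i=1}^n (a_i + b_i)^p\right)^{\frac{1}{q}}, \]

which, since 1 − 1/q = 1/p, is equivalent to the inequality

\[ \left(\sum_{i=1}^n (a_i + b_i)^p\right)^{\frac{1}{p}} \le \left(\sum_{i=1}^n a_i^p\right)^{\frac{1}{p}} + \left(\sum_{i=1}^n b_i^p\right)^{\frac{1}{p}}. \]

⊓⊔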
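Both directions of Minkowski's inequality can be observed on small examples. The following sketch is an informal check under the assumption that the inputs are nonnegative; `p_norm` and `minkowski_gap` are hypothetical helper names.

```python
def p_norm(x, p):
    """Return (sum |x_i|^p)^(1/p); this is a norm only for p >= 1."""
    return sum(abs(t) ** p for t in x) ** (1 / p)

def minkowski_gap(a, b, p):
    """Right-hand side minus left-hand side of Minkowski's inequality."""
    s = [x + y for x, y in zip(a, b)]
    return p_norm(a, p) + p_norm(b, p) - p_norm(s, p)

# p >= 1: the gap is nonnegative.
print(minkowski_gap([1.0, 2.0, 3.0], [4.0, 0.0, 1.0], 3.0) >= 0)
# 0 < p < 1: the inequality sign is reversed, so the gap is nonpositive.
print(minkowski_gap([1.0, 4.0], [4.0, 1.0], 0.5) <= 0)
```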

Corollary 5.11. For p ≥ 1, the function ν_p : R^n −→ R≥0 defined by

\[ \nu_p(x_1, \ldots, x_n) = \left(\sum_{i=1}^n |x_i|^p\right)^{\frac{1}{p}}, \]

where x = (x1,...,xn) ∈ R^n, is a norm on the linear space (R^n, +, ·).

Proof. We must prove that ν_p satisfies the conditions of Definition 5.1 and that ν_p(x) = 0 implies x = 0.

Let x = (x1,...,xn), y = (y1,...,yn) ∈ R^n. Minkowski's inequality applied to the nonnegative numbers a_i = |x_i| and b_i = |y_i| amounts to

\[ \left(\sum_{i=1}^n (|x_i| + |y_i|)^p\right)^{\frac{1}{p}} \le \left(\sum_{i=1}^n |x_i|^p\right)^{\frac{1}{p}} + \left(\sum_{i=1}^n |y_i|^p\right)^{\frac{1}{p}}. \]

Since |x_i + y_i| ≤ |x_i| + |y_i| for every i, we have

\[ \left(\sum_{i=1}^n |x_i + y_i|^p\right)^{\frac{1}{p}} \le \left(\sum_{i=1}^n |x_i|^p\right)^{\frac{1}{p}} + \left(\sum_{i=1}^n |y_i|^p\right)^{\frac{1}{p}}, \]

that is, ν_p(x + y) ≤ ν_p(x) + ν_p(y). We leave to the reader the verification of the remaining conditions. Thus, ν_p is a norm on R^n. ⊓⊔

Example 5.12. Consider the mappings ν1, ν∞ : R^n −→ R given by

ν1(x) = |x1| + |x2| + · · · + |xn|,

ν∞(x) = max{|x1|, |x2|,..., |xn|},

for every x = (x1,...,xn) ∈ R^n. Both ν1 and ν∞ are norms on R^n.

We will frequently use the alternative notation ‖x‖_p for ν_p(x).

A special norm on R^n is the function ν∞ : R^n −→ R≥0 given by

ν∞(x) = max{|xi| | 1 ≤ i ≤ n} (5.4)

for x = (x1,...,xn) ∈ R^n. We verify here that ν∞ satisfies the first condition of Definition 5.1. We start from the inequality

|xi + yi| ≤ |xi| + |yi|≤ ν∞(x)+ ν∞(y)

for every i, 1 ≤ i ≤ n. This in turn implies

ν∞(x + y) = max{|xi + yi| | 1 ≤ i ≤ n}≤ ν∞(x)+ ν∞(y),

which gives the desired inequality.

This norm can be regarded as a limit case of the norms ν_p. Indeed, let x ∈ R^n − {0} and let M = max{|x_i| | 1 ≤ i ≤ n} = |x_{ℓ1}| = · · · = |x_{ℓk}| for some ℓ1,...,ℓk, where 1 ≤ ℓ1,...,ℓk ≤ n. Here x_{ℓ1},...,x_{ℓk} are the components of x that have the maximal absolute value and k ≥ 1. We can write

\[ \lim_{p \to \infty} \nu_p(x) = \lim_{p \to \infty} M \left(\sum_{i=1}^n \left(\frac{|x_i|}{M}\right)^p\right)^{\frac{1}{p}} = \lim_{p \to \infty} M k^{\frac{1}{p}} = M, \]

which justifies the notation ν∞.
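The limit above can be observed numerically. The sketch below evaluates ν_p(x) through the factored form M (Σ (|x_i|/M)^p)^{1/p} used in the derivation, which also avoids floating-point overflow for large p. The function name `p_norm_scaled` and the sample vector are assumptions made for this illustration.

```python
def p_norm_scaled(x, p):
    """Evaluate nu_p(x) as M * (sum (|x_i|/M)^p)^(1/p), with M = max |x_i|."""
    m = max(abs(t) for t in x)
    if m == 0:
        return 0.0
    return m * sum((abs(t) / m) ** p for t in x) ** (1 / p)

x = [3.0, -4.0, 4.0, 1.0]  # M = 4, attained by k = 2 components
for p in (1, 2, 8, 64, 512):
    print(p, p_norm_scaled(x, p))
```

The printed values decrease toward M = 4 as p grows, in agreement with the factor k^{1/p} tending to 1.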

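Example 5.13. For p ≥ 1, let ℓp be the set that consists of the sequences of real numbers x = (x0, x1,...) such that the series Σ_{i=0}^∞ |x_i|^p is convergent. We can show that ℓp is a linear space.

Let x, y ∈ ℓp be two sequences in ℓp. Using Minkowski's inequality we have, for every n,

\[ \left(\sum_{i=0}^n |x_i + y_i|^p\right)^{\frac{1}{p}} \le \left(\sum_{i=0}^n (|x_i| + |y_i|)^p\right)^{\frac{1}{p}} \le \left(\sum_{i=0}^n |x_i|^p\right)^{\frac{1}{p}} + \left(\sum_{i=0}^n |y_i|^p\right)^{\frac{1}{p}}, \]

so the partial sums of the series Σ_{i=0}^∞ |x_i + y_i|^p are bounded, which shows that x + y ∈ ℓp. It is immediate that x ∈ ℓp implies ax ∈ ℓp for every a ∈ R.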

Recall that a metric on a set M is a function d : M × M −→ R≥0 that satisfies the following conditions:
(i) d(x, y) = 0 if and only if x = y;
(ii) d(x, y) = d(y, x);
(iii) d(x, y) ≤ d(x, z) + d(z, y),
for all x, y, z ∈ M. If the first condition is replaced by d(x, x) = 0 for every x ∈ M, then d is a semi-metric on M. The following statement shows that any norm defined on a linear space generates a metric on the space.

Theorem 5.14. Each norm ν : V −→ R≥0 on a real linear space V generates a metric on the set V defined by d_ν(x, y) = ν(x − y) = ‖x − y‖ for x, y ∈ V.

Proof. Note that if d_ν(x, y) = ‖x − y‖ = 0, it follows that x − y = 0; that is, x = y. The symmetry of d_ν is obvious and so we need to verify only the triangular axiom. Let x, y, z ∈ V. Applying the subadditivity property of the norm (Definition 5.1), we have

ν(x − z) = ν(x − y + y − z) ≤ ν(x − y) + ν(y − z)

or, equivalently, d_ν(x, z) ≤ d_ν(x, y) + d_ν(y, z), for every x, y, z ∈ V, which concludes the argument. ⊓⊔

We refer to d_ν as the metric induced by the norm ν on the linear space V.

For p ≥ 1, d_p denotes the metric d_{ν_p} induced by the norm ν_p on the linear space R^n, known as the Minkowski metric on R^n. If p = 2, we have the Euclidean metric on R^n given by

\[ d_2(x, y) = \sqrt{\sum_{i=1}^n |x_i - y_i|^2} = \sqrt{\sum_{i=1}^n (x_i - y_i)^2}. \]

For p = 1, we have

\[ d_1(x, y) = \sum_{i=1}^n |x_i - y_i|. \]

A representation of these metrics can be seen in Figure 5.1 for the special case of R^2. If x = (x0, x1) and y = (y0, y1), then d2(x, y) is the length of the hypotenuse of the right triangle and d1(x, y) is the sum of the lengths of the two legs of the triangle.

[Figure: the right triangle with vertices x = (x0, x1), (y0, x1), and y = (y0, y1).]

Fig. 5.1. The distances d1(x, y) and d2(x, y).
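For a concrete instance of the triangle of Figure 5.1, the metrics can be evaluated directly. This is a small illustrative sketch; the function names and the chosen points are assumptions.

```python
def minkowski_distance(x, y, p):
    """Return d_p(x, y) = (sum |x_i - y_i|^p)^(1/p) for p >= 1."""
    return sum(abs(s - t) ** p for s, t in zip(x, y)) ** (1 / p)

def chebyshev_distance(x, y):
    """Return d_infinity(x, y) = max |x_i - y_i|."""
    return max(abs(s - t) for s, t in zip(x, y))

x, y = (0.0, 0.0), (3.0, 4.0)
print(minkowski_distance(x, y, 1))  # sum of the two legs
print(minkowski_distance(x, y, 2))  # hypotenuse
print(chebyshev_distance(x, y))     # longer leg
```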

Theorem 5.16 to follow allows us to compare the norms ν_p (and the metrics of the form d_p) that were introduced on R^n. We begin with a preliminary result.

Lemma 5.15. Let a1,...,an be n positive numbers. If p and q are two positive numbers such that p ≤ q, then

\[ (a_1^p + \cdots + a_n^p)^{\frac{1}{p}} \ge (a_1^q + \cdots + a_n^q)^{\frac{1}{q}}. \]

Proof. Let f : R>0 −→ R be the function defined by

\[ f(r) = (a_1^r + \cdots + a_n^r)^{\frac{1}{r}}. \]

Since

\[ \ln f(r) = \frac{\ln(a_1^r + \cdots + a_n^r)}{r}, \]

it follows that

\[ \frac{f'(r)}{f(r)} = -\frac{1}{r^2} \ln(a_1^r + \cdots + a_n^r) + \frac{1}{r} \cdot \frac{a_1^r \ln a_1 + \cdots + a_n^r \ln a_n}{a_1^r + \cdots + a_n^r}. \]

To prove that f′(r) ≤ 0, so that f is a nonincreasing function, it suffices to show that

\[ \frac{a_1^r \ln a_1 + \cdots + a_n^r \ln a_n}{a_1^r + \cdots + a_n^r} \le \frac{\ln(a_1^r + \cdots + a_n^r)}{r}. \]

This last inequality is easily seen to be equivalent to

\[ \sum_{i=1}^n \frac{a_i^r}{a_1^r + \cdots + a_n^r} \ln \frac{a_i^r}{a_1^r + \cdots + a_n^r} \le 0, \]

which holds because

\[ \frac{a_i^r}{a_1^r + \cdots + a_n^r} \le 1 \]

for 1 ≤ i ≤ n. ⊓⊔

Theorem 5.16. Let p and q be two positive numbers such that p ≤ q. For every u ∈ R^n, we have ‖u‖_p ≥ ‖u‖_q.

Proof. This statement follows immediately from Lemma 5.15. ⊓⊔

Corollary 5.17. Let p, q be two positive numbers such that p ≤ q. For every x, y ∈ R^n, we have d_p(x, y) ≥ d_q(x, y).

Proof. This statement follows immediately from Theorem 5.16. ⊓⊔

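Example 5.18. For p = 1 and q = 2 the inequality of Theorem 5.16 becomes

\[ \sqrt{\sum_{i=1}^n |u_i|^2} \le \sum_{i=1}^n |u_i|. \]

In the opposite direction, the Cauchy-Schwarz inequality (Corollary 5.9) applied to a_i = |u_i| and b_i = 1 gives Σ_{i=1}^n |u_i| ≤ √n · (Σ_{i=1}^n |u_i|^2)^{1/2}, which is equivalent to

\[ \frac{\sum_{i=1}^n |u_i|}{n} \le \sqrt{\frac{\sum_{i=1}^n |u_i|^2}{n}}. \tag{5.5} \]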

Theorem 5.19. Let p ≥ 1. For every x ∈ R^n we have

‖x‖_∞ ≤ ‖x‖_p ≤ n‖x‖_∞.

Proof. The first inequality is an immediate consequence of Theorem 5.16. The second inequality follows by observing that

\[ \|x\|_p = \left(\sum_{i=1}^n |x_i|^p\right)^{\frac{1}{p}} \le n \max_{1 \le i \le n} |x_i| = n \|x\|_\infty. \]

⊓⊔
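The chain of inequalities obtained from Theorems 5.16 and 5.19 can be checked on a sample vector. The helper names below are hypothetical, and the check is an illustration rather than a proof.

```python
def p_norm(x, p):
    """Return the norm nu_p(x) for p >= 1."""
    return sum(abs(t) ** p for t in x) ** (1 / p)

def inf_norm(x):
    """Return the norm nu_infinity(x) = max |x_i|."""
    return max(abs(t) for t in x)

x = [1.0, -2.0, 2.0]
n = len(x)
# For 1 <= p <= q:  ||x||_inf <= ||x||_q <= ||x||_p <= n * ||x||_inf.
chain = [inf_norm(x), p_norm(x, 2), p_norm(x, 1), n * inf_norm(x)]
print(chain)
print(all(chain[i] <= chain[i + 1] for i in range(len(chain) - 1)))
```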

Corollary 5.20. Let p and q be two numbers such that p, q ≥ 1. There exist two constants c, d ∈ R>0 such that

c‖x‖_q ≤ ‖x‖_p ≤ d‖x‖_q

for x ∈ R^n.

Proof. Since ‖x‖_∞ ≤ ‖x‖_p and ‖x‖_q ≤ n‖x‖_∞, it follows that ‖x‖_q ≤ n‖x‖_p. Exchanging the roles of p and q, we have ‖x‖_p ≤ n‖x‖_q, so

\[ \frac{1}{n} \|x\|_q \le \|x\|_p \le n \|x\|_q \]

for every x ∈ R^n. ⊓⊔

Corollary 5.21. For every x, y ∈ R^n and p ≥ 1, we have d_∞(x, y) ≤ d_p(x, y) ≤ n d_∞(x, y). Further, for p, q > 1, there exist c, d ∈ R>0 such that

c · d_q(x, y) ≤ d_p(x, y) ≤ d · d_q(x, y)

for x, y ∈ R^n.

Proof. This follows from Theorem 5.19 and from Corollary 5.20. ⊓⊔

Corollary 5.17 implies that if p ≤ q, then the closed sphere B_{d_p}(x, r) is included in the closed sphere B_{d_q}(x, r). For example, we have

B_{d_1}(0, 1) ⊆ B_{d_2}(0, 1) ⊆ B_{d_∞}(0, 1).

In Figures 5.2 (a) - (c) we represent the closed spheres B_{d_1}(0, 1), B_{d_2}(0, 1), and B_{d_∞}(0, 1).

A useful consequence of Theorem 5.7 is the following statement:

Theorem 5.22. Let x1,...,xm and y1,...,ym be 2m nonnegative numbers such that Σ_{i=1}^m x_i = Σ_{i=1}^m y_i = 1 and let p and q be two positive numbers such that 1/p + 1/q = 1. We have

\[ \sum_{j=1}^m x_j^{\frac{1}{p}} y_j^{\frac{1}{q}} \le 1. \]

[Figure: panels (a), (b), (c).]

Fig. 5.2. Spheres B_{d_p}(0, 1) for p = 1, 2, ∞.

Proof. The Hölder inequality applied to x_1^{1/p},...,x_m^{1/p} and y_1^{1/q},...,y_m^{1/q} yields the needed inequality

\[ \sum_{j=1}^m x_j^{\frac{1}{p}} y_j^{\frac{1}{q}} \le \left(\sum_{j=1}^m x_j\right)^{\frac{1}{p}} \left(\sum_{j=1}^m y_j\right)^{\frac{1}{q}} = 1. \]

⊓⊔

Theorem 5.22 allows the formulation of a generalization of the Hölder Inequality.

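Theorem 5.23. Let A be an n × m matrix, A = (a_ij), having positive entries such that Σ_{j=1}^m a_ij = 1 for 1 ≤ i ≤ n. If p = (p1,...,pn) is an n-tuple of positive numbers such that Σ_{i=1}^n p_i = 1, then

\[ \sum_{j=1}^m \prod_{i=1}^n a_{ij}^{p_i} \le 1. \]

Proof. The argument is by induction on n ≥ 2. The basis case, n = 2, follows immediately from Theorem 5.22 by choosing p = 1/p1, q = 1/p2, x_j = a_1j, and y_j = a_2j for 1 ≤ j ≤ m.

Suppose that the statement holds for n, let A be an (n + 1) × m matrix having positive entries such that Σ_{j=1}^m a_ij = 1 for 1 ≤ i ≤ n + 1, and let p = (p1,...,pn, p_{n+1}) be such that p1 + · · · + pn + p_{n+1} = 1. Let s = p_n/(p_n + p_{n+1}). Lemma 5.6, applied with the exponents 1/s and 1/(1 − s), gives a_{nj}^s a_{n+1,j}^{1−s} ≤ s a_{nj} + (1 − s) a_{n+1,j}, so

\[ \sum_{j=1}^m \prod_{i=1}^{n+1} a_{ij}^{p_i} \le \sum_{j=1}^m a_{1j}^{p_1} \cdots a_{n-1\,j}^{p_{n-1}} \left(s\,a_{nj} + (1 - s)\,a_{n+1\,j}\right)^{p_n + p_{n+1}}. \]

The numbers s a_{nj} + (1 − s) a_{n+1,j}, for 1 ≤ j ≤ m, are positive and their sum is 1. By applying the inductive hypothesis to the n × m matrix obtained in this manner and to the exponents p1,...,p_{n−1}, p_n + p_{n+1}, we have

\[ \sum_{j=1}^m \prod_{i=1}^{n+1} a_{ij}^{p_i} \le 1. \]

⊓⊔

A more general form of Theorem 5.23 is given next.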

Theorem 5.24. Let A be an n × m matrix, A = (a_ij), having positive entries. If p = (p1,...,pn) is an n-tuple of positive numbers such that Σ_{i=1}^n p_i = 1, then

\[ \sum_{j=1}^m \prod_{i=1}^n a_{ij}^{p_i} \le \prod_{i=1}^n \left(\sum_{j=1}^m a_{ij}\right)^{p_i}. \]

Proof. Let B = (b_ij) be the matrix defined by

\[ b_{ij} = \frac{a_{ij}}{\sum_{k=1}^m a_{ik}} \]

for 1 ≤ i ≤ n and 1 ≤ j ≤ m. Since Σ_{j=1}^m b_ij = 1, we can apply Theorem 5.23 to this matrix. Thus, we can write

\[ \sum_{j=1}^m \prod_{i=1}^n b_{ij}^{p_i} = \sum_{j=1}^m \prod_{i=1}^n \left(\frac{a_{ij}}{\sum_{k=1}^m a_{ik}}\right)^{p_i} = \sum_{j=1}^m \prod_{i=1}^n \frac{a_{ij}^{p_i}}{\left(\sum_{k=1}^m a_{ik}\right)^{p_i}} = \frac{\sum_{j=1}^m \prod_{i=1}^n a_{ij}^{p_i}}{\prod_{i=1}^n \left(\sum_{k=1}^m a_{ik}\right)^{p_i}} \le 1. \]

⊓⊔

We now give a generalization of Minkowski's inequality (Theorem 5.10). First, we need a preliminary result.

Lemma 5.25. If a1,...,an and b1,...,bn are positive numbers and r < 0, then

\[ \sum_{i=1}^n a_i^r b_i^{1-r} \ge \left(\sum_{i=1}^n a_i\right)^r \cdot \left(\sum_{i=1}^n b_i\right)^{1-r}. \]

Proof. Let c1,...,cn, d1,...,dn be 2n positive numbers such that Σ_{i=1}^n c_i = Σ_{i=1}^n d_i = 1. Inequality (5.3) applied to the numbers a = c_i^{1/p} and b = d_i^{1/q}, where p < 0 and 1/p + 1/q = 1, yields:

\[ c_i^{\frac{1}{p}} d_i^{\frac{1}{q}} \ge \frac{c_i}{p} + \frac{d_i}{q}. \]

Summing these inequalities produces the inequality

\[ \sum_{i=1}^n c_i^{\frac{1}{p}} d_i^{\frac{1}{q}} \ge 1, \]

or

\[ \sum_{i=1}^n c_i^r d_i^{1-r} \ge 1, \]

where r = 1/p < 0. Choosing c_i = a_i / Σ_{i=1}^n a_i and d_i = b_i / Σ_{i=1}^n b_i, we obtain the desired inequality. ⊓⊔

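Theorem 5.26. Let A be an n × m matrix, A = (a_ij), having positive entries, and let p and q be two numbers such that p > q and p ≠ 0, q ≠ 0. We have

\[ \left(\sum_{j=1}^m \left(\sum_{i=1}^n a_{ij}^p\right)^{\frac{q}{p}}\right)^{\frac{1}{q}} \ge \left(\sum_{i=1}^n \left(\sum_{j=1}^m a_{ij}^q\right)^{\frac{p}{q}}\right)^{\frac{1}{p}}. \]

Proof. Define

\[ E = \left(\sum_{j=1}^m \left(\sum_{i=1}^n a_{ij}^p\right)^{\frac{q}{p}}\right)^{\frac{1}{q}}, \qquad F = \left(\sum_{i=1}^n \left(\sum_{j=1}^m a_{ij}^q\right)^{\frac{p}{q}}\right)^{\frac{1}{p}}, \]

and u_i = Σ_{j=1}^m a_ij^q for 1 ≤ i ≤ n. There are three distinct cases to consider related to the position of 0 relative to p and q.

Suppose initially that p > q > 0. We have

\[ F^p = \sum_{i=1}^n u_i^{\frac{p}{q}} = \sum_{i=1}^n u_i u_i^{\frac{p}{q}-1} = \sum_{i=1}^n \sum_{j=1}^m a_{ij}^q u_i^{\frac{p}{q}-1} = \sum_{j=1}^m \sum_{i=1}^n a_{ij}^q u_i^{\frac{p}{q}-1}. \]

By applying the Hölder inequality with the exponents p/q and p/(p − q), we have

\[ \sum_{i=1}^n a_{ij}^q u_i^{\frac{p}{q}-1} \le \left(\sum_{i=1}^n (a_{ij}^q)^{\frac{p}{q}}\right)^{\frac{q}{p}} \cdot \left(\sum_{i=1}^n \left(u_i^{\frac{p}{q}-1}\right)^{\frac{p}{p-q}}\right)^{1-\frac{q}{p}} = \left(\sum_{i=1}^n a_{ij}^p\right)^{\frac{q}{p}} \cdot \left(\sum_{i=1}^n u_i^{\frac{p}{q}}\right)^{1-\frac{q}{p}}. \tag{5.6} \]

Summing over j, this implies F^p ≤ E^q F^{p−q}. This, in turn, gives F^q ≤ E^q, which implies the generalized Minkowski inequality.

Suppose now that 0 > p > q, so 0 < −p < −q. Applying the generalized Minkowski inequality to the positive numbers b_ij = 1/a_ij gives the inequality

\[ \left(\sum_{j=1}^m \left(\sum_{i=1}^n b_{ij}^{-q}\right)^{\frac{p}{q}}\right)^{-\frac{1}{p}} \ge \left(\sum_{i=1}^n \left(\sum_{j=1}^m b_{ij}^{-p}\right)^{\frac{q}{p}}\right)^{-\frac{1}{q}}, \]

which is equivalent to

\[ \left(\sum_{j=1}^m \left(\sum_{i=1}^n a_{ij}^{q}\right)^{\frac{p}{q}}\right)^{-\frac{1}{p}} \ge \left(\sum_{i=1}^n \left(\sum_{j=1}^m a_{ij}^{p}\right)^{\frac{q}{p}}\right)^{-\frac{1}{q}}. \]

A last transformation gives

\[ \left(\sum_{j=1}^m \left(\sum_{i=1}^n a_{ij}^{q}\right)^{\frac{p}{q}}\right)^{\frac{1}{p}} \le \left(\sum_{i=1}^n \left(\sum_{j=1}^m a_{ij}^{p}\right)^{\frac{q}{p}}\right)^{\frac{1}{q}}, \]

which is the inequality to be proven, written for the transpose of A; since A is an arbitrary matrix with positive entries, the statement follows.

Finally, suppose that p > 0 > q. Since q/p < 0, Inequality (5.6) is replaced by the opposite inequality through the application of Lemma 5.25:

\[ \sum_{i=1}^n a_{ij}^q u_i^{\frac{p}{q}-1} \ge \left(\sum_{i=1}^n a_{ij}^p\right)^{\frac{q}{p}} \cdot \left(\sum_{i=1}^n u_i^{\frac{p}{q}}\right)^{1-\frac{q}{p}}. \]

This leads to F^p ≥ E^q F^{p−q} or F^q ≥ E^q. Since q < 0, this implies F ≤ E. ⊓⊔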

5.3 Matrix Norms

In Chapter 3 we saw that the set F^{m×n} is a linear space. Therefore, it is natural to consider norms defined on matrices. In the case of matrices we need to consider a supplementary condition that defines the interaction between norms and matrix multiplication.

Definition 5.27. A matrix norm is a family of functions µ^{(m,n)} : R^{m×n} −→ R≥0, where m, n ∈ P, that satisfies the following conditions:
(i) µ^{(m,n)}(A) = 0 if and only if A = O_{m,n};
(ii) µ^{(m,n)}(A + B) ≤ µ^{(m,n)}(A) + µ^{(m,n)}(B) (the subadditivity property);
(iii) µ^{(m,n)}(aA) = |a|µ^{(m,n)}(A), and
(iv) µ^{(m,p)}(AB) ≤ µ^{(m,n)}(A)µ^{(n,p)}(B) for every matrix A ∈ R^{m×n} and B ∈ R^{n×p} (the submultiplicative property).

If the format of the matrix A is clear from context or is irrelevant, then we shall write µ(A) instead of µ^{(m,n)}(A) or just |||A|||. If the last condition of Definition 5.27 is dropped we obtain generalized matrix norms which are, essentially, vector norms obtained by transforming matrices into vectors and, then, using vector norms.

Definition 5.28. The (m × n)-vectorization mapping is the mapping vec : R^{m×n} −→ R^{mn} defined by

vec(A) = (a11,...,am1, a12,...,am2,..., a1n,...,amn),

obtained by reading A column-wise.

The vectorization mapping vec is an isomorphism between the linear space F^{m×n} and the linear space F^{mn}, as the reader can easily verify.

Definition 5.29. Let ν be a vector norm on the space R^{mn}. The generalized matrix norm µ^{(m,n)} on R^{m×n} is the mapping µ^{(m,n)} : R^{m×n} −→ R≥0 defined by µ^{(m,n)}(A) = ν(vec(A)), for A ∈ R^{m×n}.

Some generalized matrix norms turn out to be actual matrix norms; others fail to be matrix norms. This point is illustrated by the next two examples.

Example 5.30. Consider the generalized matrix norm µ1 induced by the vector norm ν1. We have µ1(A) = Σ_{i=1}^m Σ_{j=1}^n |a_ij| for A ∈ R^{m×n}. Actually, this is a matrix norm. To prove this fact consider the matrices A ∈ R^{m×p} and B ∈ R^{p×n}. We have

\[ \begin{aligned} \mu_1(AB) &= \sum_{i=1}^m \sum_{j=1}^n \left|\sum_{k=1}^p a_{ik} b_{kj}\right| \le \sum_{i=1}^m \sum_{j=1}^n \sum_{k=1}^p |a_{ik} b_{kj}| \\ &\le \sum_{i=1}^m \sum_{j=1}^n \sum_{k'=1}^p \sum_{k''=1}^p |a_{ik'}||b_{k''j}| \quad \text{(because we added extra non-negative terms to the sums)} \\ &= \left(\sum_{i=1}^m \sum_{k'=1}^p |a_{ik'}|\right) \cdot \left(\sum_{j=1}^n \sum_{k''=1}^p |b_{k''j}|\right) = \mu_1(A)\mu_1(B). \end{aligned} \]

We will denote this matrix norm by the same notation as the corresponding vector norm, that is, by ‖A‖_1.

The generalized norm µ2 induced by the vector norm ν2 is also a matrix norm. Indeed, using the notations as above we have:

\[ \begin{aligned} (\mu_2(AB))^2 &= \sum_{i=1}^m \sum_{j=1}^n \left|\sum_{k=1}^p a_{ik} b_{kj}\right|^2 \\ &\le \sum_{i=1}^m \sum_{j=1}^n \left(\sum_{k=1}^p |a_{ik}|^2\right) \left(\sum_{l=1}^p |b_{lj}|^2\right) \quad \text{(by the Cauchy-Schwarz inequality)} \\ &= (\mu_2(A))^2 (\mu_2(B))^2. \end{aligned} \]

The matrix norm µ2, denoted by ‖A‖_2, is known as the Frobenius norm. It is easy to see that

\[ \|A\|_2 = \sqrt{\operatorname{trace}(A'A)}. \tag{5.7} \]
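The claims of this example can be checked on small matrices. The following sketch, written with plain Python lists, computes the generalized norms through the vectorization mapping and compares the Frobenius norm with the trace formula; the helper names `vec`, `mu`, `matmul`, and `transpose` are hypothetical and are introduced only for this illustration.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def vec(A):
    """Read A column-wise, as in Definition 5.28."""
    return [entry for col in zip(*A) for entry in col]

def mu(A, p):
    """Generalized matrix norm induced by the vector norm nu_p."""
    return sum(abs(t) ** p for t in vec(A)) ** (1 / p)

A = [[1.0, 2.0], [3.0, 4.0]]
B = [[0.0, 1.0], [1.0, 1.0]]
AB = matmul(A, B)

# Submultiplicativity of mu_1 and mu_2 on this pair of matrices.
print(mu(AB, 1) <= mu(A, 1) * mu(B, 1))
print(mu(AB, 2) <= mu(A, 2) * mu(B, 2))

# Frobenius norm versus the square root of trace(A'A).
gram = matmul(transpose(A), A)
trace = sum(gram[i][i] for i in range(len(gram)))
print(abs(mu(A, 2) - trace ** 0.5) < 1e-12)
```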

Example 5.31. The generalized norm µ∞ induced by the vector norm ν∞, denoted by ‖A‖_∞, is not a matrix norm. To see that, let a, b be two positive numbers and consider the matrices

\[ A = \begin{pmatrix} a & a \\ a & a \end{pmatrix} \quad\text{and}\quad B = \begin{pmatrix} b & b \\ b & b \end{pmatrix}. \]

Clearly, we have ‖A‖_∞ = a and ‖B‖_∞ = b. However, since

\[ AB = \begin{pmatrix} 2ab & 2ab \\ 2ab & 2ab \end{pmatrix}, \]

we have ‖AB‖_∞ = 2ab > ab = ‖A‖_∞‖B‖_∞, and the submultiplicative property of matrix norms is violated.
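The counterexample can be reproduced for specific values of a and b. The sketch below uses a = 2 and b = 3, which are arbitrary choices, and a hypothetical helper `max_abs_entry` for the generalized norm µ∞.

```python
def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]

def max_abs_entry(A):
    """Generalized norm induced by nu_infinity: the largest |a_ij|."""
    return max(abs(t) for row in A for t in row)

a, b = 2.0, 3.0
A = [[a, a], [a, a]]
B = [[b, b], [b, b]]
AB = matmul(A, B)

print(max_abs_entry(AB))                     # 2ab
print(max_abs_entry(A) * max_abs_entry(B))   # ab
```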

A technique that always produces matrix norms starting from vector norms is introduced in the next theorem.

Theorem 5.32. Let ν be a vector norm on R^n. Then, the function µ : R^{m×n} −→ R≥0 defined as

µ(A) = sup{ν(Ax) | ν(x) ≤ 1},

for A ∈ R^{m×n}, is a matrix norm.

Proof. We need to verify that µ satisfies the conditions of Definition 5.27. It is easy to see that if we exclude the zero vector from the definition of µ(A) the value of µ(A) remains the same; in other words, µ(A) = sup{ν(Ax) | ν(x) ≤ 1, x ∈ R^n − {0}}.

If µ(A) = 0, this means that ν(Ax) = 0 for every x such that ν(x) ≤ 1. In other words, we have Ax = 0 for every x ∈ R^n, and this implies A = O_{m,n}. Since µ(O_{m,n}) = 0, the first condition is satisfied.

If A, B ∈ R^{m×n}, then

µ(A + B) = sup{ν((A + B)x) | ν(x) ≤ 1}
= sup{ν(Ax + Bx) | ν(x) ≤ 1}
≤ sup{ν(Ax) + ν(Bx) | ν(x) ≤ 1}
≤ sup{ν(Ax) | ν(x) ≤ 1} + sup{ν(Bx) | ν(x) ≤ 1}
= µ(A) + µ(B),

which shows that the second condition of Definition 5.27 is satisfied.

Let a ∈ R. We can write

µ(aA) = sup{ν((aA)x) | ν(x) ≤ 1} = sup{|a|ν(Ax) | ν(x) ≤ 1} = |a| sup{ν(Ax) | ν(x) ≤ 1} = |a|µ(A),

which means that the third condition is satisfied.

Finally, by the first part of Supplement 4 we have:

\[ \begin{aligned} \mu(AB) &= \sup\{\nu((AB)x) \mid \nu(x) = 1\} = \sup\{\nu(A(Bx)) \mid \nu(x) = 1\} \\ &= \sup\left\{\nu\left(A \frac{Bx}{\nu(Bx)}\right) \nu(Bx) \;\middle|\; \nu(x) = 1\right\} \\ &\le \mu(A) \sup\{\nu(Bx) \mid \nu(x) = 1\} = \mu(A)\mu(B), \end{aligned} \]

because

\[ \nu\left(\frac{Bx}{\nu(Bx)}\right) = 1. \]

⊓⊔

By Corollary 2.153, we can replace sup by max in the definition of the matrix norm induced by a vector norm, µ(A) = sup{ν(Ax) | ν(x) ≤ 1}, because the closed sphere B(0, 1) is a compact set in R^n. Thus, we have

µ(A) = max{ν(Ax) | ν(x) ≤ 1}. (5.8)

An equivalent definition of the matrix norm induced by a vector norm is given next.

Theorem 5.33. Let ν be a vector norm on R^n and let µ be the matrix norm induced by ν, as in Theorem 5.32. Then,

µ(A) = max{ν(Ax) | ν(x) = 1},

for A ∈ R^{m×n}.

Proof. Since {x | ν(x) = 1} ⊆ {x | ν(x) ≤ 1} it follows that

max{ν(Ax) | ν(x) = 1} ≤ max{ν(Ax) | ν(x) ≤ 1} = µ(A).

There exists z ∈ R^n − {0} such that ν(z) ≤ 1 and ν(Az) = µ(A), which implies

\[ \mu(A) = \nu(z)\,\nu\left(A \frac{z}{\nu(z)}\right) \le \nu\left(A \frac{z}{\nu(z)}\right). \]

Since ν(z/ν(z)) = 1 it follows that µ(A) ≤ max{ν(Ax) | ν(x) = 1}. This yields the desired conclusion. ⊓⊔

If µ is a matrix norm induced by a vector norm on R^n, then µ(I_n) = sup{ν(I_n x) | ν(x) ≤ 1} = 1. This necessary condition can be used for identifying matrix norms that are not induced by vector norms. The matrix norm induced by the vector norm ‖·‖_p will be denoted by |||·|||_p.

Example 5.34. To compute the matrix norm |||A|||_1 = sup{‖Ax‖_1 | ‖x‖_1 ≤ 1}, where A ∈ R^{n×n}, suppose that the columns of A are the vectors a_1,...,a_n. Let x ∈ R^n be a vector whose components are x1,...,xn. Then, Ax = x1 a_1 + · · · + xn a_n, so

\[ \|Ax\|_1 = \|x_1 a_1 + \cdots + x_n a_n\|_1 \le \sum_{i=1}^n |x_i| \|a_i\|_1 \le \max_i \|a_i\|_1 \sum_{i=1}^n |x_i| = \max_i \|a_i\|_1 \cdot \|x\|_1. \]

Thus, |||A|||_1 ≤ max_i ‖a_i‖_1.

Let e_i be the vector whose components are 0 with the exception of its i-th component that is equal to 1. Clearly, we have ‖e_i‖_1 = 1 and a_i = Ae_i. This, in turn, implies ‖a_i‖_1 = ‖Ae_i‖_1 ≤ |||A|||_1 for 1 ≤ i ≤ n. Therefore, max_i ‖a_i‖_1 ≤ |||A|||_1, so

\[ |||A|||_1 = \max_i \|a_i\|_1 = \max_i \sum_{j=1}^n |a_{ji}|. \]

In other words, |||A|||1 equals the maximum column sum of the absolute values.
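The two halves of the argument can be traced on a small matrix: the bound ‖Ax‖_1 ≤ max_i ‖a_i‖_1 · ‖x‖_1 for a sample x, and the fact that a unit vector e_i attains the maximum column sum. The helper names in this sketch are hypothetical.

```python
def one_norm(x):
    return sum(abs(t) for t in x)

def apply(A, x):
    """Return the product Ax for a matrix A given as a list of rows."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def max_column_sum(A):
    """Maximum over the columns of the sum of absolute values."""
    return max(sum(abs(row[j]) for row in A) for j in range(len(A[0])))

A = [[1.0, -2.0], [3.0, 4.0]]
bound = max_column_sum(A)

x = [0.5, -0.5]                                   # ||x||_1 = 1
print(one_norm(apply(A, x)) <= bound * one_norm(x))

e2 = [0.0, 1.0]                                   # selects the second column
print(one_norm(apply(A, e2)) == bound)
```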

Example 5.35. Consider now a matrix A ∈ R^{n×n} and the norm

|||A|||_∞ = sup{‖Ax‖_∞ | ‖x‖_∞ ≤ 1}.

We have

\[ \|Ax\|_\infty = \max_{1 \le i \le n} \left|\sum_{j=1}^n a_{ij} x_j\right| \le \max_{1 \le i \le n} \sum_{j=1}^n |a_{ij} x_j| \le \max_{1 \le i \le n} \|x\|_\infty \sum_{j=1}^n |a_{ij}|. \]

Consequently, if ‖x‖_∞ ≤ 1 we have ‖Ax‖_∞ ≤ max_{1≤i≤n} Σ_{j=1}^n |a_ij|. Thus, |||A|||_∞ ≤ max_{1≤i≤n} Σ_{j=1}^n |a_ij|.

The converse inequality is immediate if A = O_{n,n}. Therefore, assume that A ≠ O_{n,n}, and let (a_p1,...,a_pn) be any row of A that has at least one element distinct from 0. Define the vector z ∈ R^n by

\[ z_j = \begin{cases} \dfrac{|a_{pj}|}{a_{pj}} & \text{if } a_{pj} \ne 0, \\ 1 & \text{otherwise,} \end{cases} \]

for 1 ≤ j ≤ n. It is clear that z_j ∈ {−1, 1} for every j, 1 ≤ j ≤ n and, therefore, ‖z‖_∞ = 1. Moreover, we have |a_pj| = a_pj z_j for 1 ≤ j ≤ n. Therefore, we can write:

\[ \sum_{j=1}^n |a_{pj}| = \sum_{j=1}^n a_{pj} z_j = \left|\sum_{j=1}^n a_{pj} z_j\right| \le \max_{1 \le i \le n} \left|\sum_{j=1}^n a_{ij} z_j\right| = \|Az\|_\infty \le \sup\{\|Ax\|_\infty \mid \|x\|_\infty \le 1\} = |||A|||_\infty. \]

Since this holds for every row of A, it follows that max_{1≤i≤n} Σ_{j=1}^n |a_ij| ≤ |||A|||_∞, which proves that

\[ |||A|||_\infty = \max_{1 \le i \le n} \sum_{j=1}^n |a_{ij}|. \]

In other words, |||A|||∞ equals the maximum row sum of the absolute values.
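The sign vector z used in the argument can be constructed explicitly. The sketch below builds z for the row with the largest absolute row sum and checks that ‖Az‖_∞ reaches that sum; the function names and the sample matrix are assumptions for illustration.

```python
def inf_norm(x):
    return max(abs(t) for t in x)

def apply(A, x):
    """Return the product Ax for a matrix A given as a list of rows."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]

def max_row_sum(A):
    """Maximum over the rows of the sum of absolute values."""
    return max(sum(abs(t) for t in row) for row in A)

def sign_vector(row):
    """z_j = |a_pj| / a_pj if a_pj != 0, and 1 otherwise."""
    return [abs(t) / t if t != 0 else 1.0 for t in row]

A = [[1.0, -2.0], [-3.0, 4.0]]
p = max(range(len(A)), key=lambda i: sum(abs(t) for t in A[i]))
z = sign_vector(A[p])

print(z, inf_norm(z))
print(inf_norm(apply(A, z)), max_row_sum(A))
```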

5.4 The Topology of Normed Linear Spaces

We saw that every norm ‖·‖ defined on a linear space V generates a metric d : V^2 −→ R≥0 given by d(x, y) = ‖x − y‖. Therefore, by Theorem 2.93, any normed space can be equipped with the topology of a metric space, using the metric defined by the norm. Since this topology is induced by a metric, any normed space is a Hausdorff space by Theorem 2.112. Further, if v ∈ V, then the collection of subsets {C(v, r) | r > 0} is a fundamental system of neighborhoods for v. A sequence (x0, x1,...) of elements of V converges to x if for every ǫ > 0 there exists n_ǫ ∈ N such that n ≥ n_ǫ implies ‖x_n − x‖ < ǫ.

Theorem 5.36. In a normed linear space (V, ‖·‖), the norm, the multiplication by scalars and the vector addition are continuous functions.

Proof. By Theorem 5.2, we have ‖x − y‖ ≥ |‖x‖ − ‖y‖| for every x, y ∈ V. Therefore, if lim_{n→∞} x_n = x, we have ‖x_n − x‖ ≥ |‖x_n‖ − ‖x‖|, which implies lim_{n→∞} ‖x_n‖ = ‖x‖. Thus, by Theorem 2.128, the norm is continuous.

Suppose now that lim_{n→∞} a_n = a and lim_{n→∞} x_n = x, where (a_n) is a sequence of scalars. Since the sequence (a_n) is convergent, it is bounded, and we have

‖ax − a_n x_n‖ ≤ ‖ax − a_n x‖ + ‖a_n x − a_n x_n‖ ≤ |a − a_n|‖x‖ + |a_n|‖x − x_n‖,

which implies that lim_{n→∞} a_n x_n = ax. This shows that the multiplication by scalars is a continuous function.

To prove that the vector addition is continuous, let (x_n) and (y_n) be two sequences in V such that lim_{n→∞} x_n = x and lim_{n→∞} y_n = y. Note that

‖(x + y) − (x_n + y_n)‖ ≤ ‖x − x_n‖ + ‖y − y_n‖,

which implies that lim_{n→∞}(x_n + y_n) = x + y. Thus, the vector addition is continuous. ⊓⊔

Definition 5.37. Two norms ν0 and ν1 on a linear space V are equivalent if they generate the same topology.

Theorem 5.38. Let V be a linear space and let ν0 : V −→ R≥0 and ν1 : V −→ R≥0 be two norms on V that generate the topologies O0 and O1 on V , respectively. The topology O1 is finer than the topology O0 (that is, O0 ⊆ O1) if and only if there exists c ∈ R>0 such that ν0(v) ≤ cν1(v) for every v ∈ V .

Proof. Suppose that O0 ⊆ O1. Then, any open sphere C0(0, r0) = {x ∈ V | ν0(x) < r0} (in O0) must be an open set in O1. Therefore, there exists an open sphere C1(0, r1) such that C1(0, r1) ⊆ C0(0, r0). This means that for every r0 ∈ R>0 there exists r1 ∈ R>0 such that ν1(v) < r1 implies ν0(v) < r0 for every v ∈ V. In particular, for r0 = 1, there is k > 0 such that ν1(v) < k implies ν0(v) < 1, which is equivalent to

cν1(v) < 1 implies ν0(v) < 1,

for every v ∈ V and c = 1/k. For v ≠ 0 and w = (1/(c + ǫ)) · v/ν1(v), where ǫ > 0, it follows that

\[ c\nu_1(w) = c\nu_1\left(\frac{1}{c + \epsilon} \frac{v}{\nu_1(v)}\right) = \frac{c}{c + \epsilon} < 1, \]

so

\[ \nu_0(w) = \nu_0\left(\frac{1}{c + \epsilon} \frac{v}{\nu_1(v)}\right) = \frac{1}{c + \epsilon} \frac{\nu_0(v)}{\nu_1(v)} < 1. \]

Since this inequality holds for every ǫ > 0 it follows that ν0(v) ≤ cν1(v).

Conversely, suppose that there exists c ∈ R>0 such that ν0(v) ≤ cν1(v) for every v ∈ V. Since

\[ \left\{v \;\middle|\; \nu_1(v) \le \frac{r}{c}\right\} \subseteq \{v \mid \nu_0(v) \le r\}, \]

for v ∈ V and r > 0 it follows that O0 ⊆ O1. ⊓⊔

Corollary 5.39. Let V be a linear space and let ν0 : V −→ R≥0 and ν1 : V −→ R≥0 be two norms on V. Then, ν0 and ν1 are equivalent norms if and only if there exist a, b ∈ R>0 such that aν0(v) ≤ ν1(v) ≤ bν0(v).

Proof. This statement follows directly from Theorem 5.38. ⊓⊔

Example 5.40. By Corollary 5.20 any two norms ν_p and ν_q on R^n (with p, q ≥ 1) are equivalent.

Continuous linear functions between normed spaces have a simple characterization.

Theorem 5.41. Let (V1, ν1) and (V2, ν2) be two normed F -linear spaces where F is either R or C. A linear function f : V1 −→ V2 is continuous if and only if there exists M ∈ R>0 such that ν2(f(x)) ≤ Mν1(x) for every x ∈ V1.

Proof. Suppose that f : V1 −→ V2 satisfies the condition of the theorem. Then,

\[ f\left(C\left(0_1, \frac{r}{M}\right)\right) \subseteq C(0_2, r), \]

for every r > 0, which means that f is continuous in 0_1 and, therefore, it is continuous everywhere (by Theorem 2.161).

Conversely, suppose that f is continuous. Then, there exists δ > 0 such that f(C(0_1, δ)) ⊆ C(0_2, 1), which is equivalent to saying that ν1(x) < δ implies ν2(f(x)) < 1. Let ǫ > 0 and let z ∈ V1 be defined by

\[ z = \frac{\delta}{\nu_1(x) + \epsilon}\, x. \]

We have ν1(z) = δν1(x)/(ν1(x) + ǫ) < δ. This implies ν2(f(z)) < 1, which is equivalent to

\[ \frac{\delta}{\nu_1(x) + \epsilon}\, \nu_2(f(x)) < 1 \]

because of the linearity of f. This means that

\[ \nu_2(f(x)) < \frac{\nu_1(x) + \epsilon}{\delta} \]

for every ǫ > 0, so ν2(f(x)) ≤ (1/δ) ν1(x). ⊓⊔

Definition 5.42. A Banach space is a normed linear space that is a complete metric space with respect to the metric induced by the norm.

5.5 Matrix Sequences and Matrix Series

The set of matrices C^{m×p} is a C-linear space, and the set of matrices R^{m×p} is an R-linear space. Using vector norms or matrix norms, these spaces can be equipped with a topological structure, as we indicated above. We focus now on the normed linear space (R^{p×p}, |||·|||), where |||·||| is a matrix norm. Let A ∈ R^{p×p}. We can show by induction on n ≥ 1 that

|||A^n||| ≤ (|||A|||)^n. (5.9)

The base step, n = 1, is immediate. Suppose that the inequality holds for n. We have

|||A^{n+1}||| = |||A^n A|||
≤ |||A^n||| |||A||| (because |||·||| is a matrix norm)
≤ (|||A|||)^n |||A||| (by the inductive hypothesis)
= (|||A|||)^{n+1},

which concludes our argument.

If |||A||| < 1 the sequence of matrices (A, A^2,...,A^n,...) converges towards the zero matrix O_{p,p}. Indeed, lim_{n→∞} |||A^n − O_{p,p}||| = lim_{n→∞} |||A^n||| ≤ lim_{n→∞} (|||A|||)^n = 0, which shows that lim_{n→∞} A^n = O_{p,p}.

Definition 5.43. Let A = (A0, A1,...,An,...) be a sequence of matrices in R^{p×p}. A matrix series having A as its sequence of terms is the sequence of matrices (S0, S1,...,Sn,...), where S_i = Σ_{k=0}^i A_k. The series (S0, S1,...,Sn,...) will be denoted also by A0 + A1 + · · · + An + · · · .

We say that the series A0 + A1 + · · · + An + · · · converges to a matrix S if lim_{n→∞} S_n = S. This is also denoted by A0 + A1 + · · · + An + · · · = S.

The subadditivity property of the norm can be generalized to series of matrices. Namely, if the series A0 + A1 + · · · + An + · · · converges to S, then

\[ |||S||| \le \sum_{i=0}^\infty |||A_i|||. \]

Indeed, by the usual subadditivity property

\[ |||A_0 + A_1 + \cdots + A_n||| \le \sum_{i=0}^n |||A_i|||. \]

This implies

\[ |||A_0 + A_1 + \cdots + A_n||| \le \sum_{i=0}^\infty |||A_i|||, \]

for every n ∈ N, so |||S||| ≤ Σ_{i=0}^∞ |||A_i|||, due to the continuity of the matrix norm.

Example 5.44. Let A ∈ R^{p×p} be a matrix such that |||A||| < 1. We claim that the matrix I − A is invertible and A^0 + A^1 + · · · + A^n + · · · = (I − A)^{−1}.

Suppose that I − A is not invertible. Then, the system (I − A)x = 0 has a non-trivial solution. This implies x = Ax, so |||A||| ≥ 1, which contradicts the hypothesis. Thus, I − A is an invertible matrix. Observe that

(A^0 + A^1 + · · · + A^n)(I − A) = I − A^{n+1}.

Therefore,

\[ \left(\lim_{n \to \infty} (A^0 + A^1 + \cdots + A^n)\right)(I - A) = \lim_{n \to \infty} (I - A^{n+1}) = I, \]

since lim_{n→∞} A^{n+1} = O. This shows that the series A^0 + A^1 + · · · + A^n + · · · converges to the inverse of the matrix I − A, so

\[ (I - A)^{-1} = \sum_{i=0}^\infty A^i. \]

Moreover, we have

\[ |||(I - A)^{-1}||| = \left|\left|\left|\sum_{i=0}^\infty A^i\right|\right|\right| \le \sum_{i=0}^\infty |||A^i||| \le \sum_{i=0}^\infty (|||A|||)^i = \frac{1}{1 - |||A|||}, \]

because |||A||| < 1.
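The convergence of the series I + A + A^2 + · · · can be observed numerically. The sketch below uses the maximum row sum norm of Example 5.35, for which |||I||| = 1, and a 2 × 2 matrix chosen arbitrarily with norm 0.7; all helper names are hypothetical.

```python
def matmul(A, B):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*B)] for row in A]

def mat_add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def row_sum_norm(A):
    """Matrix norm induced by nu_infinity (maximum absolute row sum)."""
    return max(sum(abs(t) for t in row) for row in A)

def neumann_partial_sum(A, terms):
    """Return I + A + A^2 + ... + A^terms."""
    total = identity(len(A))
    power = identity(len(A))
    for _ in range(terms):
        power = matmul(power, A)
        total = mat_add(total, power)
    return total

A = [[0.2, 0.1], [0.3, 0.4]]
S = neumann_partial_sum(A, 200)
I_minus_A = [[1.0 - 0.2, -0.1], [-0.3, 1.0 - 0.4]]

print(matmul(S, I_minus_A))                          # close to the identity
print(row_sum_norm(S), 1 / (1 - row_sum_norm(A)))    # norm of S and its bound
```

With 200 terms the remainder A^{201} is negligible in floating-point arithmetic, so the product S(I − A) agrees with I up to rounding.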

Exercises and Supplements

1. Prove that if x, y, z are three vectors in a real linear space V and ν is a norm on V, then ν(x − y) ≤ ν(x − z) + ν(z − y).

2. Let A ∈ R^{n×n} (or in C^{n×n}) be a non-singular matrix. Prove that if ν is a norm on R^n (on C^n, respectively), then ν_A defined by ν_A(x) = ν(Ax) is a norm on R^n (on C^n).

3. Two vector norms ν_p and ν_q on R^n are conjugate if 1/p + 1/q = 1. Prove that |x′y| ≤ ν_p(x) · ν_q(y) for x, y ∈ R^n.

4. Let ν be a vector norm on R^n and let µ be the matrix norm induced by ν. Prove that if A ∈ R^{m×n}, then we have the following equalities:

\[ \begin{aligned} \mu(A) &= \sup\{\nu(Ax) \mid \nu(x) = 1\} \\ &= \sup\left\{\frac{\nu(Ax)}{\nu(x)} \;\middle|\; x \in \mathbb{R}^n - \{0\}\right\} \\ &= \inf\{k \mid \nu(Ax) \le k\nu(x) \text{ for every } x \in \mathbb{R}^n\}. \end{aligned} \]

Solution: To prove the first equality note that {ν(Ax) | ν(x) = 1} ⊆ {ν(Ax) | ν(x) ≤ 1}. This implies sup{ν(Ax) | ν(x) = 1} ≤ sup{ν(Ax) | ν(x) ≤ 1} = µ(A).

On the other hand, let x be a vector such that ν(x) ≤ 1. We have x = 0 if and only if ν(x) = 0 because ν is a norm. Otherwise, x ≠ 0, and for y = (1/ν(x)) x we have ν(y) = 1 and ν(Ax) = ν(A(ν(x)y)) = ν(x)ν(Ay) ≤ ν(Ay). Therefore, in either case we have ν(Ax) ≤ sup{ν(Ay) | ν(y) = 1} for ν(x) ≤ 1. Thus, we have the reverse inequality, sup{ν(Ax) | ν(x) ≤ 1} ≤ sup{ν(Ay) | ν(y) = 1}, so µ(A) = sup{ν(Ax) | ν(x) = 1}.

To prove the second equality observe that

\[ \sup\left\{\frac{\nu(Ax)}{\nu(x)} \;\middle|\; x \ne 0\right\} = \sup\left\{\nu\left(A \frac{x}{\nu(x)}\right) \;\middle|\; x \ne 0\right\} = \sup\{\nu(Ay) \mid \nu(y) = 1\} = \mu(A), \]

because ν(x/ν(x)) = 1 and every vector y with ν(y) = 1 can be written as y = (1/ν(x)) x for some x ≠ 0. We leave the third equality to the reader.

Bibliographical Comments