
Notes on Vectors and Norms

Robert A. van de Geijn, Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, [email protected]

September 15, 2014

1 Absolute Value

Recall that if α ∈ C, then |α| denotes its absolute value. In other words, if α = α_r + iα_c, then

|α| = √(α_r² + α_c²) = √(ᾱα).

The absolute value has the following properties:

• α ≠ 0 ⇒ |α| > 0 (| · | is positive definite),

• |αβ| = |α||β| (| · | is homogeneous), and

• |α + β| ≤ |α| + |β| (| · | obeys the triangle inequality).

2 Vector Norms

A (vector) norm extends the notion of an absolute value (length or size) to vectors:

Definition 1. Let ν : Cⁿ → R. Then ν is a (vector) norm if for all x, y ∈ Cⁿ and all α ∈ C

• x ≠ 0 ⇒ ν(x) > 0 (ν is positive definite),

• ν(αx) = |α|ν(x) (ν is homogeneous), and

• ν(x + y) ≤ ν(x) + ν(y) (ν obeys the triangle inequality).

Exercise 2. Prove that if ν : Cⁿ → R is a norm, then ν(0) = 0 (where the first 0 denotes the zero vector in Cⁿ).

Answer: Let x ∈ Cⁿ, let 0⃗ denote the zero vector of size n, and let 0 denote the scalar zero. Then

ν(0⃗) = ν(0 · x)    (0 · x = 0⃗)
     = |0|ν(x)     (ν(·) is homogeneous)
     = 0           (algebra).

End of Answer

Note: often we will use ‖ · ‖ to denote a vector norm.

2.1 Vector 2-norm (length)

Definition 3. The vector 2-norm ‖ · ‖₂ : Cⁿ → R is defined by

‖x‖₂ = √(xᴴx) = √(χ̄₀χ₀ + ··· + χ̄ₙ₋₁χₙ₋₁) = √(|χ₀|² + ··· + |χₙ₋₁|²).
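Definition 3 translates directly into code. Below is a minimal sketch in Python (the helper name `two_norm` is ours, not from the notes); `abs` returns the modulus of a complex number, so complex components are handled correctly:

```python
import math

def two_norm(x):
    # ||x||_2 = sqrt(|chi_0|^2 + ... + |chi_{n-1}|^2); abs() gives the
    # modulus of a complex number, matching the definition above.
    return math.sqrt(sum(abs(chi) ** 2 for chi in x))

x = [3 + 4j, 0, 12]
print(two_norm(x))  # sqrt(5^2 + 0^2 + 12^2) = 13.0
```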

To show that the vector 2-norm is a norm, we will need the following theorem:

Theorem 4 (Cauchy–Schwarz inequality). Let x, y ∈ Cⁿ. Then |xᴴy| ≤ ‖x‖₂‖y‖₂.

Proof: Assume that x ≠ 0 and y ≠ 0, since otherwise the inequality is trivially true. We can then choose x̂ = x/‖x‖₂ and ŷ = y/‖y‖₂. This leaves us to prove that |x̂ᴴŷ| ≤ 1, with ‖x̂‖₂ = ‖ŷ‖₂ = 1.

Pick α ∈ C with |α| = 1 so that αx̂ᴴŷ is real and nonnegative. Note that since it is real, αx̂ᴴŷ = ᾱŷᴴx̂ (a real number equals its own conjugate). Now,

0 ≤ ‖x̂ − αŷ‖₂²
  = (x̂ − αŷ)ᴴ(x̂ − αŷ)                  (‖z‖₂² = zᴴz)
  = x̂ᴴx̂ − ᾱŷᴴx̂ − αx̂ᴴŷ + ᾱαŷᴴŷ          (multiplying out)
  = 1 − 2αx̂ᴴŷ + |α|²                    (‖x̂‖₂ = ‖ŷ‖₂ = 1 and ᾱŷᴴx̂ = αx̂ᴴŷ)
  = 2 − 2αx̂ᴴŷ                           (|α| = 1).

Thus 1 ≥ αx̂ᴴŷ and, taking the absolute value of both sides,

1 ≥ |αx̂ᴴŷ| = |α||x̂ᴴŷ| = |x̂ᴴŷ|,

which is the desired result. QED

Theorem 5. The vector 2-norm is a norm.
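The Cauchy–Schwarz inequality is easy to spot-check numerically on random complex vectors. This is only a sanity check of the theorem, not a proof; the helper names are ours:

```python
import math
import random

def two_norm(x):
    return math.sqrt(sum(abs(chi) ** 2 for chi in x))

def inner(x, y):
    # x^H y = conj(chi_0)*psi_0 + ... + conj(chi_{n-1})*psi_{n-1}
    return sum(chi.conjugate() * psi for chi, psi in zip(x, y))

random.seed(0)
for _ in range(1000):
    x = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(5)]
    y = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(5)]
    # |x^H y| <= ||x||_2 ||y||_2 (small tolerance for rounding)
    assert abs(inner(x, y)) <= two_norm(x) * two_norm(y) + 1e-12
print("Cauchy-Schwarz held on all 1000 samples")
```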

Proof: To prove this, we merely check that the three conditions are met: Let x, y ∈ Cⁿ and α ∈ C be arbitrarily chosen. Then

• x ≠ 0 ⇒ ‖x‖₂ > 0 (‖ · ‖₂ is positive definite): Notice that x ≠ 0 means that at least one of its components is nonzero. Let's assume that χⱼ ≠ 0. Then

‖x‖₂ = √(|χ₀|² + ··· + |χₙ₋₁|²) ≥ √(|χⱼ|²) = |χⱼ| > 0.

• ‖αx‖₂ = |α|‖x‖₂ (‖ · ‖₂ is homogeneous):

‖αx‖₂ = √(|αχ₀|² + ··· + |αχₙ₋₁|²) = √(|α|²|χ₀|² + ··· + |α|²|χₙ₋₁|²) = √(|α|²(|χ₀|² + ··· + |χₙ₋₁|²)) = |α|√(|χ₀|² + ··· + |χₙ₋₁|²) = |α|‖x‖₂.

• ‖x + y‖₂ ≤ ‖x‖₂ + ‖y‖₂ (‖ · ‖₂ obeys the triangle inequality):

‖x + y‖₂² = (x + y)ᴴ(x + y)
         = xᴴx + yᴴx + xᴴy + yᴴy
         ≤ ‖x‖₂² + 2‖x‖₂‖y‖₂ + ‖y‖₂²    (yᴴx + xᴴy = 2Re(xᴴy) ≤ 2|xᴴy|, then Cauchy–Schwarz)
         = (‖x‖₂ + ‖y‖₂)².

Taking the square root of both sides yields the desired result.

QED

2.2 Vector 1-norm

Definition 6. The vector 1-norm ‖ · ‖₁ : Cⁿ → R is defined by

‖x‖₁ = |χ₀| + |χ₁| + ··· + |χₙ₋₁|.

Exercise 7. Show that the vector 1-norm is a norm.

Answer: We show that the three conditions are met: Let x, y ∈ Cⁿ and α ∈ C be arbitrarily chosen. Then

• x ≠ 0 ⇒ ‖x‖₁ > 0 (‖ · ‖₁ is positive definite): Notice that x ≠ 0 means that at least one of its components is nonzero. Let's assume that χⱼ ≠ 0. Then ‖x‖₁ = |χ₀| + ··· + |χₙ₋₁| ≥ |χⱼ| > 0.

• ‖αx‖₁ = |α|‖x‖₁ (‖ · ‖₁ is homogeneous):

‖αx‖₁ = |αχ₀| + ··· + |αχₙ₋₁| = |α||χ₀| + ··· + |α||χₙ₋₁| = |α|(|χ₀| + ··· + |χₙ₋₁|) = |α|‖x‖₁.

• ‖x + y‖₁ ≤ ‖x‖₁ + ‖y‖₁ (‖ · ‖₁ obeys the triangle inequality):

‖x + y‖₁ = |χ₀ + ψ₀| + |χ₁ + ψ₁| + ··· + |χₙ₋₁ + ψₙ₋₁|
        ≤ |χ₀| + |ψ₀| + |χ₁| + |ψ₁| + ··· + |χₙ₋₁| + |ψₙ₋₁|
        = |χ₀| + |χ₁| + ··· + |χₙ₋₁| + |ψ₀| + |ψ₁| + ··· + |ψₙ₋₁|
        = ‖x‖₁ + ‖y‖₁.

End of Answer

The vector 1-norm is sometimes referred to as the "taxi-cab norm": it is the distance a taxi travels along the streets of a city that has square blocks.

2.3 Vector ∞-norm (infinity norm)

Definition 8. The vector ∞-norm ‖ · ‖∞ : Cⁿ → R is defined by ‖x‖∞ = maxᵢ |χᵢ|.

Exercise 9. Show that the vector ∞-norm is a norm.

Answer: We show that the three conditions are met: Let x, y ∈ Cⁿ and α ∈ C be arbitrarily chosen. Then

• x ≠ 0 ⇒ ‖x‖∞ > 0 (‖ · ‖∞ is positive definite): Notice that x ≠ 0 means that at least one of its components is nonzero. Let's assume that χⱼ ≠ 0. Then ‖x‖∞ = maxᵢ |χᵢ| ≥ |χⱼ| > 0.

• ‖αx‖∞ = |α|‖x‖∞ (‖ · ‖∞ is homogeneous):

‖αx‖∞ = maxᵢ |αχᵢ| = maxᵢ |α||χᵢ| = |α| maxᵢ |χᵢ| = |α|‖x‖∞.

• ‖x + y‖∞ ≤ ‖x‖∞ + ‖y‖∞ (‖ · ‖∞ obeys the triangle inequality):

‖x + y‖∞ = maxᵢ |χᵢ + ψᵢ|
        ≤ maxᵢ (|χᵢ| + |ψᵢ|)
        ≤ maxᵢ (|χᵢ| + maxⱼ |ψⱼ|)
        = maxᵢ |χᵢ| + maxⱼ |ψⱼ| = ‖x‖∞ + ‖y‖∞.

End of Answer

2.4 Vector p-norm

Definition 10. The vector p-norm ‖ · ‖ₚ : Cⁿ → R is defined by

‖x‖ₚ = (|χ₀|ᵖ + |χ₁|ᵖ + ··· + |χₙ₋₁|ᵖ)^{1/p}.

Proving that the p-norm is a norm is a little tricky and not particularly relevant to this course. To prove the triangle inequality requires the following classical result:

Theorem 11 (Hölder inequality). Let x, y ∈ Cⁿ and let 1/p + 1/q = 1 with 1 ≤ p, q ≤ ∞. Then |xᴴy| ≤ ‖x‖ₚ‖y‖_q.

Clearly, the 1-norm and 2-norm are special cases of the p-norm. Also, ‖x‖∞ = lim_{p→∞} ‖x‖ₚ.
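The limit ‖x‖∞ = lim_{p→∞} ‖x‖ₚ can be observed numerically. A small sketch (the helper name is ours; note that computing |χᵢ|ᵖ directly overflows for large p, so robust implementations factor out ‖x‖∞ first):

```python
def p_norm(x, p):
    # ||x||_p = (|chi_0|^p + ... + |chi_{n-1}|^p)^(1/p)
    return sum(abs(chi) ** p for chi in x) ** (1.0 / p)

x = [1.0, -3.0, 2.0]
print(max(abs(chi) for chi in x))   # ||x||_inf = 3.0
for p in (1, 2, 10, 100):
    print(p, p_norm(x, p))          # decreases toward 3.0 as p grows
```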

3 Matrix Norms

It is not hard to see that vector norms are all measures of how "big" vectors are. Similarly, we want measures of how "big" matrices are. We will start with one that is somewhat artificial and then move on to the important class of induced matrix norms.

3.1 Frobenius norm

Definition 12. The Frobenius norm ‖ · ‖_F : Cᵐˣⁿ → R is defined by

‖A‖_F = √( Σ_{i=0}^{m−1} Σ_{j=0}^{n−1} |αᵢ,ⱼ|² ).

Notice that one can think of the Frobenius norm as taking the columns of the matrix, stacking them on top of each other to create a vector of size m · n, and then taking the vector 2-norm of the result.

Exercise 13. Show that the Frobenius norm is a norm.

Answer: The key is to realize that if A = (a₀ a₁ ··· aₙ₋₁) then

‖A‖_F = √( Σ_{i=0}^{m−1} Σ_{j=0}^{n−1} |αᵢ,ⱼ|² ) = √( Σ_{j=0}^{n−1} Σ_{i=0}^{m−1} |αᵢ,ⱼ|² ) = √( Σ_{j=0}^{n−1} ‖aⱼ‖₂² ).

In other words, it equals the vector 2-norm of the vector that is created by stacking the columns of A on top of each other. The fact that the Frobenius norm is a norm then comes from realizing this connection and exploiting it. Alternatively, just grind through the three conditions!

End of Answer

Similarly, other matrix norms can be created from vector norms by viewing the matrix as a vector. It turns out that, other than the Frobenius norm, these are not particularly interesting in practice.
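The column-stacking interpretation can be verified numerically; a small sketch (helper names are ours, matrices stored as lists of rows):

```python
import math

def frobenius(A):
    # sum of |alpha_ij|^2 over all entries, then square root
    return math.sqrt(sum(abs(a) ** 2 for row in A for a in row))

def two_norm(x):
    return math.sqrt(sum(abs(chi) ** 2 for chi in x))

A = [[1, 2], [3, 4]]
# stack the columns of A on top of each other
stacked = [A[i][j] for j in range(2) for i in range(2)]  # [1, 3, 2, 4]
print(frobenius(A), two_norm(stacked))  # both equal sqrt(30)
```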

3.2 Induced matrix norms

Definition 14. Let ‖ · ‖_µ : Cᵐ → R and ‖ · ‖_ν : Cⁿ → R be vector norms. Define ‖ · ‖_{µ,ν} : Cᵐˣⁿ → R by

‖A‖_{µ,ν} = sup_{x ∈ Cⁿ, x ≠ 0} ‖Ax‖_µ / ‖x‖_ν.

Let us start by interpreting this. How "big" A is, as measured by ‖A‖_{µ,ν}, is defined as the most that A magnifies the length of nonzero vectors, where the length of the vector x is measured with norm ‖ · ‖_ν and the length of the transformed vector Ax is measured with norm ‖ · ‖_µ. Two comments are in order. First,

sup_{x ≠ 0} ‖Ax‖_µ / ‖x‖_ν = sup_{‖x‖_ν = 1} ‖Ax‖_µ.

This follows immediately from the following sequence of equivalences:

sup_{x ≠ 0} ‖Ax‖_µ / ‖x‖_ν = sup_{x ≠ 0} ‖ Ax/‖x‖_ν ‖_µ = sup_{x ≠ 0} ‖ A(x/‖x‖_ν) ‖_µ = sup_{y = x/‖x‖_ν, x ≠ 0} ‖Ay‖_µ = sup_{‖y‖_ν = 1} ‖Ay‖_µ = sup_{‖x‖_ν = 1} ‖Ax‖_µ.

Also, the "sup" (which stands for supremum) is used because we cannot yet claim that there is a vector x with ‖x‖_ν = 1 for which ‖A‖_{µ,ν} = ‖Ax‖_µ. The fact is that there is always such a vector x. The proof depends on a result from real analysis (sometimes called "advanced calculus") that states that sup_{x∈S} f(x) is attained for some vector x ∈ S as long as f is continuous and S is a compact set. Since real analysis is not a prerequisite for this course, the reader may have to take this on faith! We conclude that the following two definitions are equivalent to the one we already gave:

Definition 15. Let ‖ · ‖_µ : Cᵐ → R and ‖ · ‖_ν : Cⁿ → R be vector norms. Define ‖ · ‖_{µ,ν} : Cᵐˣⁿ → R by

‖A‖_{µ,ν} = max_{x ∈ Cⁿ, x ≠ 0} ‖Ax‖_µ / ‖x‖_ν

and

Definition 16. Let ‖ · ‖_µ : Cᵐ → R and ‖ · ‖_ν : Cⁿ → R be vector norms. Define ‖ · ‖_{µ,ν} : Cᵐˣⁿ → R by

‖A‖_{µ,ν} = max_{‖x‖_ν = 1} ‖Ax‖_µ.
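Definition 16 suggests a crude way to approximate an induced norm: evaluate ‖Ax‖_µ over many vectors with ‖x‖_ν = 1 and keep the largest value. Sampling only yields a lower bound on the true maximum. A sketch for real matrices with µ = ν = the 2-norm (all names are ours):

```python
import math
import random

def two_norm(x):
    # real vectors only, for simplicity
    return math.sqrt(sum(chi * chi for chi in x))

def matvec(A, x):
    return [sum(a * chi for a, chi in zip(row, x)) for row in A]

def estimate_induced_2norm(A, samples=20000):
    n = len(A[0])
    random.seed(1)
    best = 0.0
    for _ in range(samples):
        x = [random.gauss(0, 1) for _ in range(n)]
        nx = two_norm(x)
        x = [chi / nx for chi in x]      # normalize so ||x||_2 = 1
        best = max(best, two_norm(matvec(A, x)))
    return best

A = [[2.0, 0.0], [0.0, 1.0]]
print(estimate_induced_2norm(A))  # approaches the exact value 2.0
```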

Theorem 17. ‖ · ‖_{µ,ν} : Cᵐˣⁿ → R is a norm.

Proof: To prove this, we merely check that the three conditions are met: Let A, B ∈ Cᵐˣⁿ and α ∈ C be arbitrarily chosen. Then

• A ≠ 0 ⇒ ‖A‖_{µ,ν} > 0 (‖ · ‖_{µ,ν} is positive definite): Notice that A ≠ 0 means that at least one of its columns is not a zero vector (since at least one of its elements is nonzero). Let us assume it is the jth column, aⱼ, that is nonzero, and let eⱼ denote the jth standard basis vector. Then

‖A‖_{µ,ν} = max_{x ≠ 0} ‖Ax‖_µ / ‖x‖_ν ≥ ‖Aeⱼ‖_µ / ‖eⱼ‖_ν = ‖aⱼ‖_µ / ‖eⱼ‖_ν > 0.

• ‖αA‖_{µ,ν} = |α|‖A‖_{µ,ν} (‖ · ‖_{µ,ν} is homogeneous):

‖αA‖_{µ,ν} = max_{x ≠ 0} ‖αAx‖_µ / ‖x‖_ν = max_{x ≠ 0} |α| ‖Ax‖_µ / ‖x‖_ν = |α| max_{x ≠ 0} ‖Ax‖_µ / ‖x‖_ν = |α|‖A‖_{µ,ν}.

• ‖A + B‖_{µ,ν} ≤ ‖A‖_{µ,ν} + ‖B‖_{µ,ν} (‖ · ‖_{µ,ν} obeys the triangle inequality):

‖A + B‖_{µ,ν} = max_{x ≠ 0} ‖(A + B)x‖_µ / ‖x‖_ν = max_{x ≠ 0} ‖Ax + Bx‖_µ / ‖x‖_ν ≤ max_{x ≠ 0} (‖Ax‖_µ + ‖Bx‖_µ) / ‖x‖_ν
  ≤ max_{x ≠ 0} ( ‖Ax‖_µ/‖x‖_ν + ‖Bx‖_µ/‖x‖_ν ) ≤ max_{x ≠ 0} ‖Ax‖_µ/‖x‖_ν + max_{x ≠ 0} ‖Bx‖_µ/‖x‖_ν = ‖A‖_{µ,ν} + ‖B‖_{µ,ν}.

QED

3.3 Special cases used in practice

The most important case of ‖ · ‖_{µ,ν} : Cᵐˣⁿ → R uses the same norm for ‖ · ‖_µ and ‖ · ‖_ν (except that m may not equal n).

Definition 18. Define ‖ · ‖ₚ : Cᵐˣⁿ → R by

‖A‖ₚ = max_{x ≠ 0} ‖Ax‖ₚ / ‖x‖ₚ = max_{‖x‖ₚ = 1} ‖Ax‖ₚ.

Theorem 19. Let A ∈ Cᵐˣⁿ and partition A = (a₀ a₁ ··· aₙ₋₁). Then

‖A‖₁ = max_{0 ≤ j < n} ‖aⱼ‖₁.

Proof: Let ¯be chosen so that max0≤j

     χ1  max kAxk1 = max a0 a1 ··· an−1  .  kxk1=1 kxk1=1  .   .  χ n−1 1 = max kχ0a0 + χ1a1 + ··· + χn−1an−1k1 kxk1=1

≤ max (kχ0a0k1 + kχ1a1k1 + ··· + kχn−1an−1k1) kxk1=1

= max (|χ0|ka0k1 + |χ1|ka1k1 + ··· + |χn−1|kan−1k1) kxk1=1

≤ max (|χ0|ka¯k1 + |χ1|ka¯k1 + ··· + |χn−1|ka¯k1) kxk1=1

= max (|χ0| + |χ1| + ··· + |χn−1|) ka¯k1 kxk1=1

= ka¯k1.

Also,

ka¯k1 = kAe¯k1 ≤ max kAxk1. kxk1=1

Hence ka¯k1 ≤ max kAxk1 ≤ ka¯k1 kxk1=1 which implies that max kAxk1 = ka¯k1 = max kajk. kxk1=1 0≤j

 T  ba0  aT   b1  Exercise 20. Let A ∈ Cm×n and partition A =  . . Show that  .   .  T bam−1

kAk∞ = max kaik1 = max (|αi,0| + |αi,1| + ··· + |αi,n−1|) 0≤i

7  T  ba0  aT   b1  Answer: Partition A =  . . Then  .   .  T bam−1

 T   T  a a x b0 b0  aT   aT x   b1   b1  kAk∞ = max kAxk∞ = max  .  x = max  .  kxk∞=1 kxk∞=1  .  kxk∞=1  .   .   . 

aT aT x bm−1 ∞ bm−1 ∞ n−1 n−1  T  X X = max max |ai x| = max max | αi,pχp| ≤ max max |αi,pχp| kxk =1 i kxk =1 i kxk =1 i ∞ ∞ p=0 ∞ p=0 n−1 n−1 n−1 X X X = max max (|αi,p||χp|) ≤ max max (|αi,p|(max |χk|)) ≤ max max (|αi,p|kxk∞) kxk =1 i kxk =1 i k kxk =1 i ∞ p=0 ∞ p=0 ∞ p=0 n−1 X = max (|αi,p| = kaik1 i b p=0 so that kAk∞ ≤ maxi kbaik1. We also want to show that kAk∞ ≥ maxi kbaik1. Let k be such that maxi kbaik1 = kbakk1 and pick   ψ0

 ψ1    T y =  .  so that a y = |αk,0| + |αk,1| + ··· + |αk,n−1| = kakk1. (This is a matter of picking ψi so that  .  bk b  .  ψn−1 |ψi| = 1 and ψiαk,i = |αk,i|.) Then

 T   T  a a b0 b0  aT   aT   b1   b1  kAk∞ = max kAxk∞ = max  .  x ≥  .  y kxk1=1 kxk1=1  .   .   .   . 

aT aT bm−1 ∞ bm−1 ∞  T  a y b0  aT y   b1  T T =  .  ≥ |ak y| = ak y = kakk1. = max kaik1  .  b b b i b  . 

aT y bm−1 ∞ End of Answer T T T Notice that in the above exercise bai is really (bai ) since bai is the label for the ith row of matrix A.
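By the same token, Exercise 20 says that ‖A‖∞ is the largest absolute row sum; a sketch (names are ours):

```python
def inf_norm_matrix(A):
    # ||A||_inf = max over rows i of sum_j |alpha_ij|
    return max(sum(abs(a) for a in row) for row in A)

A = [[1, -7], [2, 3]]
print(inf_norm_matrix(A))  # row sums are 8 and 5, so ||A||_inf = 8
```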

3.4 Discussion

While ‖ · ‖₂ is a very important matrix norm, it is in practice difficult to compute. The matrix norms ‖ · ‖_F, ‖ · ‖₁, and ‖ · ‖∞ are more easily computed and hence more practical in many instances.

3.5 Submultiplicative norms

Definition 21. A matrix norm ‖ · ‖_ν : Cᵐˣⁿ → R is said to be submultiplicative (consistent) if it also satisfies

‖AB‖_ν ≤ ‖A‖_ν ‖B‖_ν.

Theorem 22. Let ‖ · ‖_ν : Cⁿ → R be a vector norm and, for any matrix C ∈ Cᵐˣⁿ, define the corresponding induced matrix norm by

‖C‖_ν = max_{x ≠ 0} ‖Cx‖_ν / ‖x‖_ν = max_{‖x‖_ν=1} ‖Cx‖_ν.

Then for any A ∈ Cᵐˣᵏ and B ∈ Cᵏˣⁿ the inequality ‖AB‖_ν ≤ ‖A‖_ν ‖B‖_ν holds. In other words, induced matrix norms are submultiplicative.

To prove the above, it helps to first prove a simpler result:

Lemma 23. Let ‖ · ‖_ν : Cⁿ → R be a vector norm and, for any matrix C ∈ Cᵐˣⁿ, define the induced matrix norm as above. Then for any A ∈ Cᵐˣⁿ and y ∈ Cⁿ the inequality ‖Ay‖_ν ≤ ‖A‖_ν ‖y‖_ν holds.

Proof: If y = 0, the result obviously holds since then ‖Ay‖_ν = 0 and ‖y‖_ν = 0. Let y ≠ 0. Then

‖A‖_ν = max_{x ≠ 0} ‖Ax‖_ν / ‖x‖_ν ≥ ‖Ay‖_ν / ‖y‖_ν.

Rearranging this yields ‖Ay‖_ν ≤ ‖A‖_ν ‖y‖_ν. QED

We can now prove the theorem:

Proof:

‖AB‖_ν = max_{‖x‖_ν=1} ‖ABx‖_ν = max_{‖x‖_ν=1} ‖A(Bx)‖_ν ≤ max_{‖x‖_ν=1} ‖A‖_ν ‖Bx‖_ν ≤ max_{‖x‖_ν=1} ‖A‖_ν ‖B‖_ν ‖x‖_ν = ‖A‖_ν ‖B‖_ν.

QED
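Submultiplicativity can be spot-checked on a small example; here the induced ∞-norm (maximum absolute row sum) is used, and the helper names are ours:

```python
def inf_norm(A):
    # induced infinity-norm: maximum absolute row sum
    return max(sum(abs(a) for a in row) for row in A)

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

A = [[1, 2], [3, -1]]
B = [[0, 1], [4, 2]]
AB = matmul(A, B)  # [[8, 5], [-4, 1]]
print(inf_norm(AB), inf_norm(A) * inf_norm(B))  # 13 <= 24
```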

Exercise 24. Show that ‖Ax‖_µ ≤ ‖A‖_{µ,ν} ‖x‖_ν.

Answer: W.l.o.g. let x ≠ 0. Then

‖A‖_{µ,ν} = max_{y ≠ 0} ‖Ay‖_µ / ‖y‖_ν ≥ ‖Ax‖_µ / ‖x‖_ν.

Rearranging this establishes the result. End of Answer

Exercise 25. Show that ‖AB‖_{µ,ν} ≤ ‖A‖_{µ,ν} ‖B‖_ν.

Answer:

‖AB‖_{µ,ν} = max_{‖x‖_ν=1} ‖ABx‖_µ ≤ max_{‖x‖_ν=1} ‖A‖_{µ,ν} ‖Bx‖_ν = ‖A‖_{µ,ν} max_{‖x‖_ν=1} ‖Bx‖_ν = ‖A‖_{µ,ν} ‖B‖_ν.

End of Answer

Exercise 26. Show that the Frobenius norm, k · kF , is submultiplicative.

Answer: Let â₀ᴴ, â₁ᴴ, …, âₘ₋₁ᴴ denote the rows of A and b₀, b₁, …, bₙ₋₁ the columns of B, so that the (i, j) element of AB is âᵢᴴbⱼ. Then

‖AB‖_F² = Σᵢ Σⱼ |âᵢᴴbⱼ|²
  ≤ Σᵢ Σⱼ ‖âᵢ‖₂² ‖bⱼ‖₂²      (Cauchy–Schwarz)
  = ( Σᵢ ‖âᵢ‖₂² ) ( Σⱼ ‖bⱼ‖₂² )
  = ‖A‖_F² ‖B‖_F²,

so that ‖AB‖_F² ≤ ‖A‖_F² ‖B‖_F². Taking the square root of both sides establishes the desired result. End of Answer

4 An Application to Conditioning of Linear Systems

A question we will run into later in the course asks how accurate we can expect the solution of a linear system to be if the right-hand side of the system has error in it.

Formally, this can be stated as follows: We wish to solve Ax = b, where A ∈ Cᵐˣᵐ, but the right-hand side has been perturbed by a small vector so that it becomes b + δb. (Notice how the δ touches the b. This is meant to convey that δb is a symbol that represents a vector rather than the vector b multiplied by a scalar δ.) The question now is how a relative error in b propagates into a potential error in the solution x. This is summarized as follows:

Ax = b                 (exact equation)
A(x + δx) = b + δb     (perturbed equation)

We would like to determine a formula, κ(A, b, δb), that tells us how much a relative error in b is potentially amplified into a relative error in the solution x:

‖δx‖/‖x‖ ≤ κ(A, b, δb) ‖δb‖/‖b‖.

We will assume that A has an inverse. To find an expression for κ(A, b, δb), we notice that

Ax + Aδx = b + δb. Since Ax = b, it follows that Aδx = δb, and from this

δx = A⁻¹δb.

If we now use a vector norm ‖ · ‖ and its induced matrix norm ‖ · ‖, then

‖b‖ = ‖Ax‖ ≤ ‖A‖‖x‖   and   ‖δx‖ = ‖A⁻¹δb‖ ≤ ‖A⁻¹‖‖δb‖.

From this we conclude that

1/‖x‖ ≤ ‖A‖/‖b‖   and   ‖δx‖ ≤ ‖A⁻¹‖‖δb‖,

so that

‖δx‖/‖x‖ ≤ ‖A‖‖A⁻¹‖ ‖δb‖/‖b‖.

Thus, the desired expression κ(A, b, δb) doesn't depend on anything but the matrix A:

‖δx‖/‖x‖ ≤ ‖A‖‖A⁻¹‖ ‖δb‖/‖b‖,   where the factor ‖A‖‖A⁻¹‖ is denoted κ(A).

The quantity κ(A) = ‖A‖‖A⁻¹‖ is called the condition number of matrix A. A question becomes whether this is a pessimistic result or whether there are examples of b and δb for which the relative error in b is amplified by exactly κ(A). The answer is, unfortunately, "yes!", as we will show next. Notice that

• There is an x̂ for which

‖A‖ = max_{‖x‖=1} ‖Ax‖ = ‖Ax̂‖,

namely the x for which the maximum is attained. Pick b = Ax̂.

• There is a δb̂ for which

‖A⁻¹‖ = max_{x ≠ 0} ‖A⁻¹x‖/‖x‖ = ‖A⁻¹δb̂‖/‖δb̂‖,

again, the x for which the maximum is attained.

It is when solving the perturbed system A(x + δx) = b + δb̂ that the maximal magnification by κ(A) is attained.

Exercise 27. Let ‖ · ‖ be a matrix norm induced by a vector norm ‖ · ‖. Show that κ(A) = ‖A‖‖A⁻¹‖ ≥ 1.

Answer: ‖I‖ = ‖AA⁻¹‖ ≤ ‖A‖‖A⁻¹‖. But

‖I‖ = max_{‖x‖=1} ‖Ix‖ = max_{‖x‖=1} ‖x‖ = 1.

Hence 1 ≤ ‖A‖‖A⁻¹‖. End of Answer

This last exercise shows that a relative error in b is, at best (when κ(A) = 1), translated into an equal relative error in the solution.
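For a concrete feel for κ(A), consider a nearly singular 2×2 matrix; this sketch computes κ(A) in the ∞-norm (the matrix and the helper names are ours, chosen purely for illustration):

```python
def inf_norm(A):
    # induced infinity-norm: maximum absolute row sum
    return max(sum(abs(a) for a in row) for row in A)

def inv2(A):
    # explicit inverse of a 2x2 matrix
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

A = [[1.0, 0.0], [0.0, 0.001]]   # nearly singular -> ill conditioned
kappa = inf_norm(A) * inf_norm(inv2(A))
print(kappa)  # 1.0 * 1000.0 = 1000.0: a relative error in b can be
              # amplified by a factor of up to 1000 in x
```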

5 Equivalence of Norms

Many results we encounter show that the norm of a particular vector or matrix is small. Obviously, it would be unfortunate if a vector or matrix is large in one norm and small in another norm. The following result shows that, modulo a constant, all norms are equivalent. Thus, if the vector is small in one norm, it is small in other norms as well.

Theorem 28. Let ‖ · ‖_µ : Cⁿ → R and ‖ · ‖_ν : Cⁿ → R be vector norms. Then there exist positive constants α_{µ,ν} and β_{µ,ν} such that for all x ∈ Cⁿ

α_{µ,ν} ‖x‖_µ ≤ ‖x‖_ν ≤ β_{µ,ν} ‖x‖_µ.

A similar result holds for matrix norms:

Theorem 29. Let ‖ · ‖_µ : Cᵐˣⁿ → R and ‖ · ‖_ν : Cᵐˣⁿ → R be matrix norms. Then there exist positive constants α_{µ,ν} and β_{µ,ν} such that for all A ∈ Cᵐˣⁿ

α_{µ,ν} ‖A‖_µ ≤ ‖A‖_ν ≤ β_{µ,ν} ‖A‖_µ.
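For the specific vector norms in these notes the equivalence constants are known explicitly; for example, ‖x‖∞ ≤ ‖x‖₂ ≤ √n ‖x‖∞ and ‖x‖₂ ≤ ‖x‖₁ ≤ √n ‖x‖₂. The sketch below spot-checks these standard bounds on random real vectors (a sanity check, not a proof; helper names are ours):

```python
import math
import random

def norms(x):
    n1 = sum(abs(chi) for chi in x)
    n2 = math.sqrt(sum(chi * chi for chi in x))
    ninf = max(abs(chi) for chi in x)
    return n1, n2, ninf

random.seed(2)
n = 6
for _ in range(1000):
    x = [random.uniform(-1.0, 1.0) for _ in range(n)]
    n1, n2, ninf = norms(x)
    assert ninf <= n2 + 1e-12 and n2 <= math.sqrt(n) * ninf + 1e-12
    assert n2 <= n1 + 1e-12 and n1 <= math.sqrt(n) * n2 + 1e-12
print("equivalence bounds held on all samples")
```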
