
Lecture 6: Norms and Spectral Radii

After a reminder on norms and inner products, this lecture introduces the notions of matrix norm and induced matrix norm. Then the relation between matrix norms and spectral radii is studied, culminating with Gelfand's formula for the spectral radius.

1 Inner products and vector norms

Definition 1. Let $V$ be a vector space over a field $\mathbb{K}$ ($\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$). A function $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{K}$ is called an inner product if

(IP1) $\langle x, x\rangle \ge 0$ for all $x \in V$, with equality iff $x = 0$; [positivity]

(IP2) $\langle \lambda x + \mu y, z\rangle = \lambda\langle x, z\rangle + \mu\langle y, z\rangle$ for all $\lambda, \mu \in \mathbb{K}$ and $x, y, z \in V$; [linearity]

(IP3) $\langle x, y\rangle = \overline{\langle y, x\rangle}$ for all $x, y \in V$. [Hermitian symmetry]

Observation: If $\mathbb{K} = \mathbb{R}$, then (IP3) simply says that $\langle x, y\rangle = \langle y, x\rangle$. This is not the case in the complex setting, where one has to be careful about complex conjugation. For instance, (IP2) and (IP3) combine to give, for all $\lambda, \mu \in \mathbb{C}$ and $x, y, z \in V$,

$$\langle x, \lambda y + \mu z\rangle = \overline{\lambda}\,\langle x, y\rangle + \overline{\mu}\,\langle x, z\rangle.$$

Observation: On $V = \mathbb{C}^n$, there is the classical inner product defined by

$$\langle x, y\rangle := y^* x = \sum_{j=1}^n x_j \overline{y_j}, \qquad x, y \in \mathbb{C}^n. \tag{1}$$

On $V = M_n$, there is the Frobenius inner product defined by

$$\langle A, B\rangle_F := \operatorname{tr}(B^* A) = \sum_{k=1}^n \sum_{\ell=1}^n a_{k,\ell}\, \overline{b_{k,\ell}}, \qquad A, B \in M_n.$$

The Cauchy–Schwarz inequality is a fundamental inequality valid in any inner product space. At this point, we state it in the following form in order to prove that any inner product generates a normed space.

Theorem 2. If $\langle \cdot, \cdot\rangle$ is an inner product on a vector space $V$, then, for all $x, y \in V$,

$$|\langle x, y\rangle|^2 \le \langle x, x\rangle\, \langle y, y\rangle.$$

Proof. For $x, y \in V$, choose $\theta \in (-\pi, \pi]$ such that $\operatorname{Re}\langle e^{i\theta} x, y\rangle = |\langle x, y\rangle|$. Consider the function defined for $t \in \mathbb{R}$ by

$$q(t) := \langle e^{i\theta} x + t y,\, e^{i\theta} x + t y\rangle = \langle e^{i\theta} x, e^{i\theta} x\rangle + 2t \operatorname{Re}\langle e^{i\theta} x, y\rangle + t^2 \langle y, y\rangle = \langle x, x\rangle + 2t\, |\langle x, y\rangle| + t^2 \langle y, y\rangle.$$

This is a quadratic polynomial with $q(t) \ge 0$ for all $t \in \mathbb{R}$, so its discriminant satisfies

$$\Delta = \big(2|\langle x, y\rangle|\big)^2 - 4\langle x, x\rangle \langle y, y\rangle \le 0,$$

which directly translates into $|\langle x, y\rangle|^2 \le \langle x, x\rangle \langle y, y\rangle$, as desired.
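To see the inequality in action, here is a small numerical sanity check in plain Python (no external libraries); the vectors $x$ and $y$ below are arbitrary choices, and the inner product is the classical one from (1):

```python
def inner(x, y):
    # classical inner product on C^n from (1): <x, y> = sum_j x_j * conj(y_j)
    return sum(xj * complex(yj).conjugate() for xj, yj in zip(x, y))

x = [1 + 2j, -1j, 3]
y = [2 - 1j, 1 + 1j, 1j]

lhs = abs(inner(x, y)) ** 2             # |<x, y>|^2
rhs = (inner(x, x) * inner(y, y)).real  # <x, x><y, y>, a nonnegative real number
assert lhs <= rhs + 1e-12               # Cauchy-Schwarz holds
```

Here $|\langle x, y\rangle|^2 = 2$ while $\langle x, x\rangle\langle y, y\rangle = 120$, so the inequality holds with plenty of room; it becomes an equality exactly when $x$ and $y$ are linearly dependent.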

The general definition of a norm is given below. A normed space is simply a vector space endowed with a norm.

Definition 3. Let $V$ be a vector space over a field $\mathbb{K}$ ($\mathbb{K} = \mathbb{R}$ or $\mathbb{C}$). A function $\|\cdot\| : V \to \mathbb{R}$ is called a (vector) norm if

(N1) $\|x\| \ge 0$ for all $x \in V$, with equality iff $x = 0$; [positivity]

(N2) $\|\lambda x\| = |\lambda|\, \|x\|$ for all $\lambda \in \mathbb{K}$ and $x \in V$; [homogeneity]

(N3) $\|x + y\| \le \|x\| + \|y\|$ for all $x, y \in V$. [triangle inequality]

As examples, we observe that the expression

$$\|x\|_\infty := \max_{j \in [1:n]} |x_j|$$

defines a norm on $\mathbb{K}^n$. The corresponding vector space is denoted as $\ell_\infty^n$. The expression

$$\|x\|_1 := |x_1| + |x_2| + \cdots + |x_n|$$

defines a norm on $\mathbb{K}^n$. The corresponding vector space is denoted as $\ell_1^n$. More generally, for $p \ge 1$, the expression

$$\|x\|_p := \big(|x_1|^p + |x_2|^p + \cdots + |x_n|^p\big)^{1/p}$$

defines a norm on $\mathbb{K}^n$. The corresponding vector space is denoted as $\ell_p^n$. In the case $p = 2$, note that $\ell_2^n$ is the vector space $\mathbb{K}^n$ endowed with the inner product (1).

Proposition 4. If $V$ is a vector space endowed with an inner product $\langle \cdot, \cdot\rangle$, then the expression $\|x\| := \sqrt{\langle x, x\rangle}$ defines a norm on $V$.

Proof. The properties (N1) and (N2) are readily checked. As for (N3), consider $x, y \in V$, and use Theorem 2 to obtain

$$\langle x + y, x + y\rangle = \langle x, x\rangle + 2\operatorname{Re}\langle x, y\rangle + \langle y, y\rangle \le \langle x, x\rangle + 2|\langle x, y\rangle| + \langle y, y\rangle \le \langle x, x\rangle + 2\sqrt{\langle x, x\rangle}\sqrt{\langle y, y\rangle} + \langle y, y\rangle = \big(\sqrt{\langle x, x\rangle} + \sqrt{\langle y, y\rangle}\big)^2,$$

so that $\sqrt{\langle x + y, x + y\rangle} \le \sqrt{\langle x, x\rangle} + \sqrt{\langle y, y\rangle}$, i.e., $\|x + y\| \le \|x\| + \|y\|$, as expected.

Now that we know that $\|x\| = \sqrt{\langle x, x\rangle}$ defines a norm on an inner product space, we can state the Cauchy–Schwarz inequality in the more familiar form

$$|\langle x, y\rangle| \le \|x\|\, \|y\|, \qquad x, y \in V.$$
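As a quick illustration, the $\ell_p$-norms defined earlier can be computed with a few lines of plain Python (a sketch; the vector `x` is an arbitrary example):

```python
import math

def norm_p(x, p):
    # l_p norm on K^n; p = math.inf gives the max-norm ||x||_inf
    if p == math.inf:
        return max(abs(xj) for xj in x)
    return sum(abs(xj) ** p for xj in x) ** (1 / p)

x = [3, -4, 0]
print(norm_p(x, 1))         # 7.0
print(norm_p(x, 2))         # 5.0
print(norm_p(x, math.inf))  # 4
```

Note that the three values already display the ordering $\|x\|_\infty \le \|x\|_2 \le \|x\|_1$ proved below in Proposition 6.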

If $V$ is a finite-dimensional space, then all norms on $V$ are equivalent in the following sense (in fact, this property characterizes finite-dimensional spaces).

Theorem 5. If $\|\cdot\|$ and $\|\cdot\|'$ are two norms on a finite-dimensional vector space $V$, then there exist constants $c, C > 0$ such that

$$c\|x\| \le \|x\|' \le C\|x\| \qquad \text{for all } x \in V.$$

Proof. Fixing a basis $(v_1, \ldots, v_n)$ of $V$, we are going to show that any arbitrary norm $\|\cdot\|'$ is equivalent to the norm $\|\cdot\|$ defined by

$$\|x\| = \max_{j \in [1:n]} |x_j|, \qquad \text{where } x = \sum_{j=1}^n x_j v_j.$$

On the one hand, for any $x \in V$, we have

$$\|x\|' = \Big\| \sum_{j=1}^n x_j v_j \Big\|' \le \sum_{j=1}^n \|x_j v_j\|' = \sum_{j=1}^n |x_j|\, \|v_j\|' \le \max_{j \in [1:n]} |x_j| \sum_{j=1}^n \|v_j\|' = C\|x\|,$$

where we have set $C := \sum_{j=1}^n \|v_j\|'$. On the other hand, let us assume that there is no $c > 0$ such that $\|x\| \le (1/c)\|x\|'$ for all $x \in V$, so that, for each integer $k \ge 1$, we can find $x^{(k)} \in V$ with $\|x^{(k)}\| > k\|x^{(k)}\|'$. Since we can assume without loss of generality that $\|x^{(k)}\| = 1$, we have $\|x^{(k)}\|' < 1/k$. The sequence $(x_1^{(k)})_{k \ge 1}$ of complex numbers is bounded, so we can extract a subsequence $(x_1^{(\varphi_1(k))})_{k \ge 1}$ converging to some $x_1 \in \mathbb{C}$; next, the sequence $(x_2^{(\varphi_1(k))})_{k \ge 1}$ of complex numbers is bounded, so we can extract a subsequence $(x_2^{(\varphi_1(\varphi_2(k)))})_{k \ge 1}$ converging to some $x_2 \in \mathbb{C}$; etc. Setting $\varphi = \varphi_1 \circ \cdots \circ \varphi_n$, we end up with subsequences $(x_1^{(\varphi(k))})_{k \ge 1}, \ldots, (x_n^{(\varphi(k))})_{k \ge 1}$ such that $x_j^{(\varphi(k))} \underset{k \to \infty}{\longrightarrow} x_j$ for each $j \in [1:n]$. Note that the vectors $x^{(\varphi(k))} = \sum_{j=1}^n x_j^{(\varphi(k))} v_j$ and $x := \sum_{j=1}^n x_j v_j$ satisfy $\|x - x^{(\varphi(k))}\| = \max_{j \in [1:n]} |x_j - x_j^{(\varphi(k))}| \underset{k \to \infty}{\longrightarrow} 0$, therefore

$$\|x\|' \le \|x - x^{(\varphi(k))}\|' + \|x^{(\varphi(k))}\|' \le C\|x - x^{(\varphi(k))}\| + 1/\varphi(k).$$

Taking the limit as $k \to \infty$ yields $\|x\|' = 0$, hence $x = 0$, which contradicts

$$\|x\| = \max_{j \in [1:n]} |x_j| = \max_{j \in [1:n]} \lim_{k \to \infty} |x_j^{(\varphi(k))}| = \lim_{k \to \infty} \max_{j \in [1:n]} |x_j^{(\varphi(k))}| = \lim_{k \to \infty} \|x^{(\varphi(k))}\| = \lim_{k \to \infty} 1 = 1.$$

This proves the existence of the desired constant c > 0.

For instance, we can use the Cauchy–Schwarz inequality to derive, for any $x \in \mathbb{C}^n$,

$$\|x\|_1 = \sum_{j=1}^n |x_j| = \sum_{j=1}^n 1 \times |x_j| \le \Big[\sum_{j=1}^n 1\Big]^{1/2} \Big[\sum_{j=1}^n |x_j|^2\Big]^{1/2} = \sqrt{n}\, \|x\|_2,$$

and this inequality is best possible because it turns into an equality for $x = [1, 1, \ldots, 1]^\top$. On the other hand, for any $x \in \mathbb{C}^n$, we have $\|x\|_2 \le \|x\|_1$, since

$$\|x\|_2^2 = \sum_{j=1}^n |x_j|^2 \le \Big[\sum_{j=1}^n |x_j|\Big]^2 = \|x\|_1^2,$$

and this inequality is best possible because it turns into an equality for $x = [1, 0, \ldots, 0]^\top$. We

can more generally compare any $\ell_p$-norm with any $\ell_q$-norm. The proof is left as an exercise.

Proposition 6. Given $1 \le p < q \le \infty$, for all $x \in \mathbb{K}^n$,

$$\|x\|_q \le \|x\|_p \le n^{1/p - 1/q}\, \|x\|_q,$$

and these inequalities are best possible.
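Proposition 6 can be checked numerically; here is a sketch in plain Python with $p = 1$, $q = 2$ and an arbitrarily chosen vector:

```python
x = [2.0, -1.0, 0.5, 3.0]             # arbitrary test vector
n = len(x)
norm1 = sum(abs(t) for t in x)        # ||x||_1
norm2 = sum(t * t for t in x) ** 0.5  # ||x||_2
# Proposition 6 with p = 1, q = 2: ||x||_2 <= ||x||_1 <= n^(1/1 - 1/2) ||x||_2
assert norm2 <= norm1 <= n ** 0.5 * norm2
```

With this vector, $\|x\|_2 \approx 3.77$, $\|x\|_1 = 6.5$, and $\sqrt{n}\,\|x\|_2 \approx 7.55$, so both inequalities hold strictly; the extremal vectors $[1, \ldots, 1]^\top$ and $[1, 0, \ldots, 0]^\top$ discussed above turn them into equalities.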

2 Matrix norms

Since $M_n$ is a vector space, it can be endowed with a vector norm. One more ingredient, namely submultiplicativity, turns such a vector norm into a matrix norm.

Definition 7. A function $|||\cdot||| : M_n \to \mathbb{R}$ is called a matrix norm if

(MN1) $|||A||| \ge 0$ for all $A \in M_n$, with equality iff $A = 0$; [positivity]

(MN2) $|||\lambda A||| = |\lambda|\, |||A|||$ for all $\lambda \in \mathbb{C}$ and $A \in M_n$; [homogeneity]

(MN3) $|||A + B||| \le |||A||| + |||B|||$ for all $A, B \in M_n$; [triangle inequality]

(MN4) $|||AB||| \le |||A|||\, |||B|||$ for all $A, B \in M_n$. [submultiplicativity]

As an example, we notice that the Frobenius norm defined by

$$\|A\|_F := \sqrt{\langle A, A\rangle_F} = \sqrt{\operatorname{tr}(A^* A)}, \qquad A \in M_n,$$

is a matrix norm. Indeed, for $A, B \in M_n$,

$$\|AB\|_F^2 = \operatorname{tr}\big((AB)^*(AB)\big) = \operatorname{tr}(B^* A^* A B) = \operatorname{tr}(B B^* A^* A) = \langle A^* A, B B^*\rangle_F \le \|A^* A\|_F\, \|B B^*\|_F.$$

Now notice that, with $\lambda_1 \ge \cdots \ge \lambda_n \ge 0$ denoting the eigenvalues of the positive semidefinite matrix $M := A^* A$, we have

$$\|A^* A\|_F^2 = \|M\|_F^2 = \operatorname{tr}(M^2) = \sum_{j=1}^n \lambda_j^2 \le \Big[\sum_{j=1}^n \lambda_j\Big]^2 = \operatorname{tr}(M)^2 = \|A\|_F^4.$$

Likewise, we have $\|B B^*\|_F \le \|B\|_F^2$. We deduce $\|AB\|_F^2 \le \|A\|_F^2\, \|B\|_F^2$, i.e., $\|AB\|_F \le \|A\|_F\, \|B\|_F$, as desired.
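Submultiplicativity of the Frobenius norm is easy to test on a small example (plain Python; the matrices `A` and `B` are arbitrary choices):

```python
def frob(M):
    # Frobenius norm: square root of the sum of squared entry moduli
    return sum(abs(e) ** 2 for row in M for e in row) ** 0.5

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [-1, 2]]
# submultiplicativity: ||AB||_F <= ||A||_F ||B||_F
assert frob(matmul(A, B)) <= frob(A) * frob(B) + 1e-12
```

Here $\|AB\|_F = \sqrt{166} \approx 12.88$ while $\|A\|_F \|B\|_F = \sqrt{30}\sqrt{6} \approx 13.42$.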

Another important example of matrix norms is given by the norm induced by a vector norm.

Definition 8. If $\|\cdot\|$ is a vector norm on $\mathbb{C}^n$, then the induced norm on $M_n$ defined by

$$|||A||| := \max_{\|x\| = 1} \|Ax\|$$

is a matrix norm on $M_n$.

A consequence of the definition of the induced norm is that $\|Ax\| \le |||A|||\, \|x\|$ for any $x \in \mathbb{C}^n$. Let us now verify (MN4) for the induced norm. Given $A, B \in M_n$, we have, for any $x \in \mathbb{C}^n$ with $\|x\| = 1$,

$$\|ABx\| \le |||A|||\, \|Bx\| \le |||A|||\, |||B|||\, \|x\| = |||A|||\, |||B|||,$$

so taking the maximum over $x$ gives the desired inequality $|||AB||| \le |||A|||\, |||B|||$.

There are situations where one can give an explicit expression for induced norms, for instance

$$|||A|||_{\infty \to \infty} := \max_{\|x\|_\infty = 1} \|Ax\|_\infty = \max_{k \in [1:n]} \sum_{\ell=1}^n |a_{k,\ell}|, \tag{2}$$

$$|||A|||_{1 \to 1} := \max_{\|x\|_1 = 1} \|Ax\|_1 = \max_{\ell \in [1:n]} \sum_{k=1}^n |a_{k,\ell}|. \tag{3}$$
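Formulas (2) and (3) say that these induced norms are the maximal absolute row sum and the maximal absolute column sum, respectively; a sketch in plain Python on an arbitrary 3x3 matrix:

```python
A = [[1, -2,  3],
     [0,  4,  1],
     [2,  2, -5]]

# (2): induced infinity-norm = maximal absolute row sum
norm_inf = max(sum(abs(e) for e in row) for row in A)
# (3): induced 1-norm = maximal absolute column sum
norm_one = max(sum(abs(row[j]) for row in A) for j in range(len(A)))

print(norm_inf)  # 9  (row [2, 2, -5] gives 2 + 2 + 5 = 9)
print(norm_one)  # 9  (column [3, 1, -5] gives 3 + 1 + 5 = 9)
```

That the two values coincide here is a coincidence of this particular matrix; in general the two norms differ (they are exchanged by transposition).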

3 Spectral radius

There is also a (somewhat) explicit expression for the matrix norm induced by the Euclidean norm. It involves the spectral radius of a matrix $M \in M_n$, defined as

$$\rho(M) := \max\big\{ |\lambda| : \lambda \text{ eigenvalue of } M \big\}.$$

Proposition 9. For any $A \in M_n$,

$$|||A|||_{2 \to 2} := \max_{\|x\|_2 = 1} \|Ax\|_2 = \sqrt{\rho(A^* A)}.$$

Moreover, if $A \in M_n$ is Hermitian, then

$$|||A|||_{2 \to 2} = \rho(A).$$

Proof. Note that $\rho(A^* A) = \lambda_1^\downarrow$, where $\lambda_1^\downarrow \ge \cdots \ge \lambda_n^\downarrow \ge 0$ denote the eigenvalues of $A^* A$. Then

$$|||A|||_{2 \to 2}^2 = \max_{\|x\|_2 = 1} \|Ax\|_2^2 = \max_{\|x\|_2 = 1} \langle Ax, Ax\rangle = \max_{\|x\|_2 = 1} \langle A^* A x, x\rangle = \lambda_1^\downarrow,$$

where the last equality is a characterization of the largest eigenvalue given in Lecture 5. This implies the first result. For the second result, if $A$ is Hermitian, then $\lambda_1^\downarrow, \ldots, \lambda_n^\downarrow$ are the eigenvalues of $A^* A = A^2$, that is, $\mu_1^2, \ldots, \mu_n^2$ where $\mu_1, \ldots, \mu_n$ are the eigenvalues of $A$. In particular, $\sqrt{\lambda_1^\downarrow}$ is the largest value among the $|\mu_j|$, i.e., the spectral radius of $A$.
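For a Hermitian matrix, Proposition 9 thus reduces the computation of $|||A|||_{2 \to 2}$ to an eigenvalue problem. A sketch for a real symmetric 2x2 matrix, whose eigenvalues admit a closed form (the matrix is an arbitrary example):

```python
import math

# For a real symmetric A = [[a, b], [b, d]], the eigenvalues are
# ((a + d) +- sqrt((a - d)^2 + 4 b^2)) / 2, and |||A|||_{2->2} = rho(A).
a, b, d = 2.0, 1.0, 2.0
disc = math.sqrt((a - d) ** 2 + 4 * b * b)
eigs = [(a + d + disc) / 2, (a + d - disc) / 2]
rho = max(abs(l) for l in eigs)
print(rho)  # 3.0, so |||A|||_{2->2} = 3.0
```

Indeed, $A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}$ has eigenvalues $3$ and $1$, so its spectral norm is $3$.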

We now examine the relation between spectral radius and the other matrix norms. We start with the following observations.

Lemma 10. If $|||\cdot|||$ is a matrix norm on $M_n$, then, for any $A \in M_n$,

$$\rho(A) \le |||A|||.$$

Proof. Let $\lambda$ be an eigenvalue of $A$, and let $x \ne 0$ be a corresponding eigenvector. From $Ax = \lambda x$, we have

$$AX = \lambda X, \qquad \text{where } X := \big[\, x \mid \cdots \mid x \,\big] \in M_n \setminus \{0\}$$

is the matrix whose $n$ columns all equal $x$. It follows that $|\lambda|\, |||X||| = |||\lambda X||| = |||AX||| \le |||A|||\, |||X|||$, and simplifying by $|||X||| > 0$ gives $|\lambda| \le |||A|||$. Taking the maximum over all eigenvalues $\lambda$ gives the result.
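Lemma 10 can be checked against, for instance, the induced infinity-norm from (2); a sketch on a 2x2 matrix whose eigenvalues ($1$ and $2$) are computed by hand from the characteristic polynomial $t^2 - 3t + 2$:

```python
A = [[0, 1], [-2, 3]]  # characteristic polynomial t^2 - 3t + 2, eigenvalues 1 and 2
rho = 2.0              # largest |eigenvalue|, computed by hand
# induced infinity-norm from (2): maximal absolute row sum
norm_inf = max(sum(abs(e) for e in row) for row in A)
assert rho <= norm_inf  # Lemma 10: 2.0 <= 5
```

The gap between $\rho(A) = 2$ and $|||A|||_{\infty \to \infty} = 5$ is exactly what Lemma 11 below shows can be made arbitrarily small by choosing the matrix norm appropriately.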

Lemma 11. Given A ∈ Mn and ε > 0, there exists a matrix norm |||·||| such that

|||A||| ≤ ρ(A) + ε.

Proof. The Jordan canonical form of $A$ is

$$A = S \begin{bmatrix} J_{n_1}(\lambda_1) & 0 & \cdots & 0 \\ 0 & J_{n_2}(\lambda_2) & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & J_{n_k}(\lambda_k) \end{bmatrix} S^{-1},$$

where $S \in M_n$ is an invertible matrix, $\lambda_1, \ldots, \lambda_k$ are the eigenvalues of $A$, and $n_1 + \cdots + n_k = n$. Setting

$$D(\eta) = \begin{bmatrix} D_{n_1}(\eta) & 0 & \cdots & 0 \\ 0 & D_{n_2}(\eta) & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & D_{n_k}(\eta) \end{bmatrix}, \qquad \text{where } D_m(\eta) = \begin{bmatrix} \eta & 0 & \cdots & 0 \\ 0 & \eta^2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & \eta^m \end{bmatrix},$$

we calculate (since the multiplication on the left by $D_m(1/\varepsilon)$ multiplies the $i$th row by $1/\varepsilon^i$ and the multiplication on the right by $D_m(\varepsilon)$ multiplies the $j$th column by $\varepsilon^j$)

$$D(1/\varepsilon)\, S^{-1} A S\, D(\varepsilon) = \begin{bmatrix} B_{n_1}(\lambda_1, \varepsilon) & 0 & \cdots & 0 \\ 0 & B_{n_2}(\lambda_2, \varepsilon) & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & B_{n_k}(\lambda_k, \varepsilon) \end{bmatrix},$$

where

$$B_m(\lambda, \varepsilon) = D_m(1/\varepsilon)\, J_m(\lambda)\, D_m(\varepsilon) = \begin{bmatrix} \lambda & \varepsilon & 0 & \cdots & 0 \\ 0 & \lambda & \varepsilon & \ddots & \vdots \\ \vdots & \ddots & \ddots & \ddots & 0 \\ \vdots & & \ddots & \lambda & \varepsilon \\ 0 & \cdots & \cdots & 0 & \lambda \end{bmatrix}.$$

Let us now define a matrix norm (the necessary verifications are left as an exercise) by

$$|||M||| := |||D(1/\varepsilon)\, S^{-1} M S\, D(\varepsilon)|||_{1 \to 1}, \qquad M \in M_n. \tag{4}$$

According to (3), $|||A|||$ is the maximal absolute column sum of the block-diagonal matrix above, which is at most $\max_{\ell \in [1:k]} |\lambda_\ell| + \varepsilon$, so we conclude that $|||A||| \le \rho(A) + \varepsilon$.
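The effect of the scaling matrices $D_m(\varepsilon)$ can be seen concretely on a single 2x2 Jordan block (a sketch in plain Python; the values of $\lambda$ and $\varepsilon$ are arbitrary):

```python
lam, eps = 0.5, 0.01          # arbitrary eigenvalue and epsilon
J = [[lam, 1.0], [0.0, lam]]  # Jordan block J_2(lam)
d = [eps, eps ** 2]           # diagonal of D_2(eps)
# B = D_2(1/eps) J D_2(eps): entry (i, j) is scaled by d[j] / d[i],
# which turns the superdiagonal 1 into eps
B = [[J[i][j] * d[j] / d[i] for j in range(2)] for i in range(2)]
col_sums = [abs(B[0][j]) + abs(B[1][j]) for j in range(2)]
# by (3), the 1->1 norm of B is |lam| + eps, within eps of the spectral radius
assert abs(max(col_sums) - (abs(lam) + eps)) < 1e-12
```

Taking $\varepsilon$ smaller pushes the $1 \to 1$ norm of the conjugated block as close to $|\lambda| = \rho(J_2(\lambda))$ as desired, which is precisely the mechanism behind Lemma 11.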

Lemmas 10 and 11 can be combined to give the following expression for the spectral radius of a matrix $A \in M_n$:

$$\rho(A) = \inf\big\{ |||A||| : |||\cdot||| \text{ is a matrix norm on } M_n \big\}.$$

The spectral radius can also be expressed via Gelfand's formula below.

Theorem 12. Given any matrix norm $|||\cdot|||$ on $M_n$,

$$\rho(A) = \lim_{k \to \infty} |||A^k|||^{1/k}, \qquad A \in M_n.$$

Proof. Given $k \ge 1$, we use Lemma 10 to write $\rho(A)^k = \rho(A^k) \le |||A^k|||$, i.e., $\rho(A) \le |||A^k|||^{1/k}$.

Taking the limit as $k \to \infty$ gives $\rho(A) \le \lim_{k \to \infty} |||A^k|||^{1/k}$. To establish the reverse inequality, we need to prove that, for any $\varepsilon > 0$, there is $K \ge 1$ such that $|||A^k|||^{1/k} \le \rho(A) + \varepsilon$ for all $k \ge K$. From Lemma 11, we know that there exists a matrix norm $\|\cdot\|$ on $M_n$ such that $\|A\| \le \rho(A) + \varepsilon/2$. Moreover, by the equivalence of the norms on $M_n$, we know that there exists some constant $C > 0$ such that $|||M||| \le C\|M\|$ for all $M \in M_n$. Then, for any $k \ge 1$,

$$|||A^k||| \le C\|A^k\| \le C\|A\|^k \le C(\rho(A) + \varepsilon/2)^k, \qquad \text{hence} \qquad |||A^k|||^{1/k} \le C^{1/k}(\rho(A) + \varepsilon/2) \underset{k \to \infty}{\longrightarrow} \rho(A) + \varepsilon/2.$$

The latter implies the existence of $K \ge 1$ such that $|||A^k|||^{1/k} \le \rho(A) + \varepsilon$ for all $k \ge K$, as desired.
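Gelfand's formula can be observed numerically; here is a sketch in plain Python using the induced infinity-norm from (2) and a 2x2 matrix with eigenvalues $1$ and $2$, so that $\rho(A) = 2$:

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def norm_inf(M):
    # induced infinity-norm from (2): maximal absolute row sum
    return max(sum(abs(e) for e in row) for row in M)

A = [[0, 1], [-2, 3]]  # eigenvalues 1 and 2, so rho(A) = 2
P = A
for k in range(1, 31):                 # P equals A^k when approx is computed
    approx = norm_inf(P) ** (1 / k)    # |||A^k|||^(1/k)
    P = matmul(P, A)
# the sequence decreases toward rho(A) = 2 from above
assert 2.0 < approx < 2.1
```

At $k = 1$ the approximation is $5$ (the norm itself, as in Lemma 10); by $k = 30$ it is about $2.09$, illustrating the slow but steady convergence $|||A^k|||^{1/k} \to \rho(A)$.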

4 Exercises

Ex.1: Prove that $\|\cdot\|_p$ indeed defines a norm for $p \ge 1$, and prove Proposition 6. You will need Hölder's inequality (which generalizes the Cauchy–Schwarz inequality): given $u_1, \ldots, u_n \ge 0$, $v_1, \ldots, v_n \ge 0$, and $p, q \in [1, \infty]$ with $1/p + 1/q = 1$,

$$\sum_{j=1}^n u_j v_j \le \Big[\sum_{j=1}^n u_j^p\Big]^{1/p} \Big[\sum_{j=1}^n v_j^q\Big]^{1/q}.$$

Ex.2: Exercise 1 p. 262

Ex.3: Exercise 6 p. 263

Ex.4: If $V$ is a real inner product space, prove the polarization formula

$$\langle x, y\rangle = \frac{1}{4}\big(\|x + y\|^2 - \|x - y\|^2\big), \qquad x, y \in V.$$

If $V$ is a complex inner product space, prove the polarization formula

$$\langle x, y\rangle = \frac{1}{4}\big(\|x + y\|^2 - \|x - y\|^2 + i\|x + iy\|^2 - i\|x - iy\|^2\big), \qquad x, y \in V.$$

Ex.5: Exercise 7 p. 263

Ex.6: Exercise 8 p. 263

Ex.7: Exercises 4 and 10 p. 263

Ex.8: Exercise 1 p. 267

Ex.9: Prove that $\|x\|_p \underset{p \to \infty}{\longrightarrow} \|x\|_\infty$ for any $x \in \mathbb{K}^n$.

Ex.10: Exercise 2 p. 267

Ex.11: Exercise 4 p. 267

Ex.12: Exercise 3 p. 311

Ex.13: Exercise 5 p. 311

Ex.14: Exercise 7 p. 311 (hence verify (4))

Ex.15: Exercises 10 and 16 p. 312-313

Ex.16: Exercise 11 p. 312

Ex.17: Exercise 15 p. 313

Ex.18: Exercise 19 p. 313

Ex.19: Exercise 20 p. 313

Ex.20: Exercise 21 p. 313
