
STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2018 LECTURE 4

Date: October 10, 2018, version 1.0. Comments, bug reports: [email protected].

1. spectral radius

• the 2-norm is also known as the spectral norm
• the name is connected to the fact that the norm is given by the square root of the largest eigenvalue of $A^*A$, i.e., the largest singular value of $A$ (more on this later)
• in general, the spectral radius $\rho(A)$ of a matrix $A \in \mathbb{C}^{n\times n}$ is defined in terms of its largest eigenvalue,
$$\rho(A) = \max\{|\lambda_i| : Ax_i = \lambda_i x_i,\ x_i \neq 0\}$$
• note that the spectral radius does not define a norm on $\mathbb{C}^{n\times n}$
• for example the non-zero matrix
$$J = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}$$
has $\rho(J) = 0$ since both its eigenvalues are $0$
• there are some relationships between the norm of a matrix and its spectral radius
• the easiest one is that
$$\rho(A) \le \|A\|$$
for any norm that satisfies the inequality $\|Ax\| \le \|A\|\|x\|$ for all $x \in \mathbb{C}^n$, i.e., a consistent norm
  – here is a proof: $Ax_i = \lambda_i x_i$; taking norms,
$$\|Ax_i\| = \|\lambda_i x_i\| = |\lambda_i|\,\|x_i\|,$$
and therefore
$$|\lambda_i| = \frac{\|Ax_i\|}{\|x_i\|} \le \|A\|;$$
since this holds for every eigenvalue of $A$, it follows that

$$\max_i |\lambda_i| = \rho(A) \le \|A\|$$
  – in particular this is true for any operator norm
  – this is in general not true for norms that do not satisfy the consistency inequality $\|Ax\| \le \|A\|\|x\|$ (thanks to Likai Chen for pointing out); for example the matrix

" √1 √1 # A = 2 2 √1 − √1 2 2 √ is orthogonal and therefore ρ(A) = 1 but kAkH,∞ = 1/ 2 and so ρ(A) > kAkH,∞ – exercise: show that any eigenvalue of a unitary or an orthogonal matrix must have 1

• on the other hand, the following characterization is true for any matrix norm, even the inconsistent ones:
$$\rho(A) = \lim_{m\to\infty} \|A^m\|^{1/m}$$
• we can also get an upper bound for any particular matrix (but not for all matrices)

Theorem 1. Let $A \in \mathbb{C}^{n\times n}$ and $\varepsilon > 0$. There exists an operator norm $\|\cdot\|_\alpha$ of the form

$$\|A\|_\alpha = \max_{x\neq 0} \frac{\|Ax\|_\alpha}{\|x\|_\alpha},$$
where $\|\cdot\|_\alpha$ is a norm on $\mathbb{C}^n$, such that

$$\|A\|_\alpha \le \rho(A) + \varepsilon.$$

The norm $\|\cdot\|_\alpha$ is dependent on $A$ and $\varepsilon$.

• this result suggests that the largest eigenvalue of a matrix can be easily approximated
• here is an example: let
$$A = \begin{bmatrix} 2 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 2 \end{bmatrix}$$
  – the eigenvalues of this matrix, which arises frequently in numerical methods for solving differential equations, are known to be
$$\lambda_j = 2 + 2\cos\frac{j\pi}{n+1}, \qquad j = 1, 2, \dots, n;$$
the largest eigenvalue is
$$|\lambda_1| = 2 + 2\cos\frac{\pi}{n+1} \le 4$$

and $\|A\|_\infty = 4$, so in this case the $\infty$-norm provides an excellent approximation
  – on the other hand, suppose
$$A = \begin{bmatrix} 1 & 10^6 \\ 0 & 1 \end{bmatrix};$$

we have $\|A\|_\infty = 10^6 + 1$ but $\rho(A) = 1$, so in this case the norm yields a poor approximation; however, suppose
$$D = \begin{bmatrix} \varepsilon & 0 \\ 0 & 1 \end{bmatrix};$$
then
$$DAD^{-1} = \begin{bmatrix} 1 & 10^6\varepsilon \\ 0 & 1 \end{bmatrix}$$
and $\|DAD^{-1}\|_\infty = 1 + 10^6\varepsilon$, which for sufficiently small $\varepsilon$ yields a much better approximation to $\rho(DAD^{-1}) = \rho(A)$
• if $\|A\| < 1$ for some submultiplicative norm, then $\|A^m\| \le \|A\|^m \to 0$ as $m \to \infty$
• since $\|A\|$ is a continuous function of the entries of $A$, it follows that $A^m \to O$, i.e., every entry of $A^m$ goes to $0$
• however, if $\|A\| > 1$, it does not follow that $\|A^m\| \to \infty$
• for example, suppose
$$A = \begin{bmatrix} 0.99 & 10^6 \\ 0 & 0.99 \end{bmatrix};$$
in this case $\|A\|_\infty > 1$, but we claim that because $\rho(A) < 1$, $A^m \to O$ and so $\|A^m\| \to 0$ (see the numerical sketch at the end of this section)
• let us prove this more generally; in fact we claim the following

Lemma 1. $\lim_{m\to\infty} A^m = O$ if and only if $\rho(A) < 1$.

Proof. ($\Rightarrow$) Let $Ax = \lambda x$ with $x \neq 0$. Then $A^m x = \lambda^m x$. Taking limits,
$$\Bigl(\lim_{m\to\infty} \lambda^m\Bigr) x = \lim_{m\to\infty} \lambda^m x = \lim_{m\to\infty} A^m x = \Bigl(\lim_{m\to\infty} A^m\Bigr) x = Ox = 0.$$
Since $x \neq 0$, we must have $\lim_{m\to\infty} \lambda^m = 0$ and thus $|\lambda| < 1$. Hence $\rho(A) < 1$.
($\Leftarrow$) Since $\rho(A) < 1$, there exists some operator norm $\|\cdot\|_\alpha$ such that $\|A\|_\alpha < 1$ by Theorem 1. So $\|A^m\|_\alpha \le \|A\|_\alpha^m \to 0$ and so $A^m \to O$. □

• alternatively, the second part above may also be proved directly via the Jordan form of $A$ and the expression
$$J_r^k = \begin{bmatrix} \lambda_r^k & k\lambda_r^{k-1} & \binom{k}{2}\lambda_r^{k-2} & \cdots & \binom{k}{n_r-1}\lambda_r^{k-(n_r-1)} \\ & \ddots & \ddots & \ddots & \vdots \\ & & \ddots & \ddots & \binom{k}{2}\lambda_r^{k-2} \\ & & & \ddots & k\lambda_r^{k-1} \\ & & & & \lambda_r^k \end{bmatrix}$$
for sufficiently large $k$ (see the previous lecture), without using Theorem 1
• in Homework 1 we will see that if $\|A\| < 1$ for some operator norm, then $I - A$ is nonsingular
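The following numerical sketch (not from the notes; a minimal numpy illustration) checks the three points above: $A^m \to O$ even though $\|A\|_\infty > 1$, the limit $\|A^m\|^{1/m} \to \rho(A)$, and the diagonal scaling trick $DAD^{-1}$:

```python
import numpy as np

A = np.array([[0.99, 1e6],
              [0.0,  0.99]])

print(np.linalg.norm(A, np.inf))          # 1e6 + 0.99, much larger than 1
rho = max(abs(np.linalg.eigvals(A)))      # rho(A) = 0.99 < 1

# powers of A still go to the zero matrix because rho(A) < 1
Am = np.linalg.matrix_power(A, 5000)
print(np.abs(Am).max())                   # tiny (~1e-12)

# Gelfand's formula: ||A^m||^(1/m) -> rho(A), slowly for this example
for m in (10, 100, 1000):
    print(np.linalg.norm(np.linalg.matrix_power(A, m), np.inf) ** (1.0 / m))

# diagonal scaling D A D^{-1} brings the infinity-norm close to rho(A)
eps = 1e-9
D = np.diag([eps, 1.0])
B = D @ A @ np.linalg.inv(D)
print(np.linalg.norm(B, np.inf))          # 0.99 + 1e6 * eps = 0.991
```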

2. Gerschgorin's theorem

• for $A = [a_{ij}] \in \mathbb{C}^{n\times n}$ and $i = 1, \dots, n$, we define the Gerschgorin discs
$$G_i := \{z \in \mathbb{C} : |z - a_{ii}| \le r_i\} \qquad \text{where} \qquad r_i := \sum_{j\neq i} |a_{ij}|$$
• Gerschgorin's theorem says that the $n$ eigenvalues of $A$ are all contained in the union of the $G_i$'s
• before we prove this, we need a result that is by itself useful
• a matrix $A \in \mathbb{C}^{n\times n}$ is called strictly diagonally dominant if
$$|a_{ii}| > \sum_{j\neq i} |a_{ij}|, \qquad i = 1, \dots, n$$
• it is called weakly diagonally dominant if the '>' is replaced by '≥'

Lemma 2. A strictly diagonally dominant matrix is nonsingular.

Proof. Let $A$ be strictly diagonally dominant. Suppose $Ax = 0$ for some $x \neq 0$. Let $k \in \{1, \dots, n\}$ be such that $|x_k| = \max_{i=1,\dots,n} |x_i|$. Since $x \neq 0$, we must have $|x_k| > 0$. Now observe that
$$a_{kk}x_k = -\sum_{j\neq k} a_{kj}x_j$$
and so by the triangle inequality,
$$|a_{kk}||x_k| = \Bigl|\sum_{j\neq k} a_{kj}x_j\Bigr| \le \sum_{j\neq k} |a_{kj}||x_j| \le \Bigl(\sum_{j\neq k} |a_{kj}|\Bigr)|x_k| < |a_{kk}||x_k|,$$
where the last inequality is by strict diagonal dominance. This is a contradiction. In other words, there is no non-zero vector $x$ with $Ax = 0$. So $\ker(A) = \{0\}$ and $A$ is nonsingular. □

• we are going to use this to prove the first part of Gerschgorin's theorem
• the second part requires a bit of topology

Theorem 2 (Gerschgorin). The spectrum of $A$ is contained in the union of its Gerschgorin discs, i.e.,
$$\lambda(A) \subseteq \bigcup_{i=1}^n G_i.$$
Furthermore, the number of eigenvalues (counted with multiplicity) in each connected component of $\bigcup_{i=1}^n G_i$ is equal to the number of Gerschgorin discs that constitute that component.

Proof. Suppose $z \notin \bigcup_{i=1}^n G_i$. Then $A - zI$ is a strictly diagonally dominant matrix (check!) and therefore nonsingular by the above lemma. Hence $\det(A - zI) \neq 0$ and so $z$ is not an eigenvalue of $A$. This proves the first part.
For the second part, consider the matrix
$$A(t) := \begin{bmatrix} a_{11} & ta_{12} & \cdots & ta_{1n} \\ ta_{21} & a_{22} & \cdots & ta_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ ta_{n1} & ta_{n2} & \cdots & a_{nn} \end{bmatrix}$$
for $t \in [0, 1]$. Note that $A(0) = \operatorname{diag}(a_{11}, \dots, a_{nn})$ and $A(1) = A$. We will let $G_i(t)$ be the $i$th Gerschgorin disc of $A(t)$, so
$$G_i(t) = \{z \in \mathbb{C} : |z - a_{ii}| \le tr_i\}.$$
Clearly $G_i(t) \subseteq G_i$ for any $t \in [0, 1]$. By the first part, all eigenvalues of all the matrices $A(t)$ are contained in $\bigcup_{i=1}^n G_i$. Since the set of eigenvalues of $A(t)$ depends continuously on the parameter $t$, $A(0)$ must have the same number of eigenvalues as $A(1)$ in each connected component of $\bigcup_{i=1}^n G_i$. Now just observe that the eigenvalues of $A(0)$ are simply the centers $a_{kk}$ of the discs in each connected component. □
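A small numerical check of the first part of the theorem on an arbitrary test matrix (not from the notes; a minimal sketch):

```python
import numpy as np

rng = np.random.default_rng(0)
# a generic 5 x 5 test matrix with somewhat dominant diagonal
A = rng.standard_normal((5, 5)) + 5 * np.diag(rng.standard_normal(5))

centers = np.diag(A)                                # a_ii
radii = np.abs(A).sum(axis=1) - np.abs(centers)     # r_i = sum_{j != i} |a_ij|

# every eigenvalue must lie in at least one Gerschgorin disc
for lam in np.linalg.eigvals(A):
    assert (np.abs(lam - centers) <= radii + 1e-12).any()
```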

3. schur decomposition

• suppose we want a decomposition for arbitrary matrices $A \in \mathbb{C}^{n\times n}$ like the evd for normal matrices $A = Q\Lambda Q^*$, i.e., diagonalizing with unitary matrices
• the way to obtain such a decomposition is to relax the requirement of having a diagonal matrix $\Lambda$ in $A = Q\Lambda Q^*$ and instead allow it to be upper triangular
• this gives the Schur decomposition:
$$A = QRQ^* \qquad (3.1)$$
• as in the evd for normal matrices, $Q$ is a unitary matrix, but $\Lambda$ is replaced by an upper-triangular matrix
$$R = \begin{bmatrix} r_{11} & r_{12} & \cdots & r_{1n} \\ 0 & r_{22} & \cdots & r_{2n} \\ \vdots & & \ddots & \vdots \\ 0 & \cdots & 0 & r_{nn} \end{bmatrix}$$

• note that the eigenvalues of $A$ are precisely the diagonal entries of $R$: $r_{11}, \dots, r_{nn}$
• unlike the Jordan canonical form, the Schur decomposition is readily computable in finite precision via the QR algorithm
• the QR algorithm is based on the QR decomposition, which we will discuss later in this course
• in its most basic form, the QR algorithm does the following (a minimal numerical sketch appears at the end of this section):
  input: $A_0 = A$;
  step $k$: $A_k = Q_kR_k$;  perform QR decomposition
  step $k+1$: $A_{k+1} = R_kQ_k$;  multiply the QR factors in reverse order

• under suitable conditions, one may show that $Q_k \to Q$ and $R_k \to R$, where $Q$ and $R$ are the requisite factors in (3.1)
• in most undergraduate linear algebra classes, one is taught to find eigenvalues by solving for the roots of the characteristic polynomial

$$p_A(x) = \det(xI - A) = 0$$
• this is almost never done in practice
• for one, there is no finite algorithm for finding the roots of a polynomial when the degree exceeds four, by the famous impossibility result of Abel–Galois
• in fact what happens is the opposite: the roots of a univariate polynomial (divide by the coefficient of the highest-degree term first so that it becomes a monic polynomial)
$$p(x) = c_0 + c_1x + \cdots + c_{n-1}x^{n-1} + x^n$$
are usually obtained as the eigenvalues of its companion matrix
$$C_p = \begin{bmatrix} 0 & 0 & \cdots & 0 & -c_0 \\ 1 & 0 & \cdots & 0 & -c_1 \\ 0 & 1 & \cdots & 0 & -c_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & -c_{n-1} \end{bmatrix}$$
using the QR algorithm
• exercise: show that $\det(xI - C_p) = p(x)$
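The sketch below illustrates both ideas at once: the basic (unshifted) QR iteration from above is run on the companion matrix of a polynomial, and the diagonal of the iterates approaches the roots. This is not from the notes; it ignores shifts, deflation, and complex eigenvalues, and the polynomial is an arbitrary example:

```python
import numpy as np

def companion(c):
    """Companion matrix of the monic polynomial x^n + c[n-1] x^(n-1) + ... + c[0]."""
    n = len(c)
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)      # ones on the subdiagonal
    C[:, -1] = -np.asarray(c)       # last column holds -c_0, ..., -c_{n-1}
    return C

def qr_algorithm(A, iters=500):
    """Unshifted QR iteration: A_{k+1} = R_k Q_k where A_k = Q_k R_k."""
    Ak = A.copy()
    for _ in range(iters):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q
    return Ak

# p(x) = (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6
C = companion([-6.0, 11.0, -6.0])
T = qr_algorithm(C)
print(np.sort(np.diag(T)))          # approximately [1, 2, 3]
```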

4. singular value decomposition

• let $A \in \mathbb{C}^{m\times n}$; we can always write
$$A = U\Sigma V^* \qquad (4.1)$$
  – $U \in \mathbb{C}^{m\times m}$ and $V \in \mathbb{C}^{n\times n}$ are both unitary matrices,
$$U^*U = I_m = UU^*, \qquad V^*V = I_n = VV^*$$
  – $\Sigma \in \mathbb{C}^{m\times n}$ is a diagonal matrix in the sense that $\sigma_{ij} = 0$ if $i \neq j$
  – if $m > n$, then $\Sigma$ looks like

$$\Sigma = \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \\ 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & \cdots & 0 \end{bmatrix}$$
  – if $m < n$, then $\Sigma$ looks like
$$\Sigma = \begin{bmatrix} \sigma_1 & & & 0 & \cdots & 0 \\ & \ddots & & \vdots & & \vdots \\ & & \sigma_m & 0 & \cdots & 0 \end{bmatrix}$$
  – if $m = n$, then $\Sigma$ looks like
$$\Sigma = \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_m \end{bmatrix}$$

  – the diagonal elements of $\Sigma$, denoted $\sigma_i$, $i = 1, \dots, \min(m,n)$, are all nonnegative and can be ordered such that

$$\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r > 0, \qquad \sigma_{r+1} = \cdots = \sigma_{\min(m,n)} = 0$$
  – $r$ is the rank of $A$
• this decomposition of $A$ is called the singular value decomposition, or svd
  – the values $\sigma_i$, for $i = 1, 2, \dots, \min(m,n)$, are the singular values of $A$
  – the columns of $U$ are the left singular vectors
  – the columns of $V$ are the right singular vectors
• an alternative decomposition of $A$ omits the singular values that are equal to zero:
$$A = \tilde U\tilde\Sigma\tilde V^*$$
  – $\tilde U \in \mathbb{C}^{m\times r}$ is a matrix with orthonormal columns, i.e., satisfying $\tilde U^*\tilde U = I_r$ (but not $\tilde U\tilde U^* = I_m$!)
  – $\tilde V \in \mathbb{C}^{n\times r}$ is also a matrix with orthonormal columns, i.e., satisfying $\tilde V^*\tilde V = I_r$ (but again not $\tilde V\tilde V^* = I_n$!)
  – $\tilde\Sigma$ is an $r \times r$ diagonal matrix with diagonal elements $\sigma_1, \dots, \sigma_r$
  – again $r = \operatorname{rank}(A)$
  – the columns of $\tilde U$ are the left singular vectors corresponding to the nonzero singular values of $A$, and form an orthonormal basis for the image of $A$
  – the columns of $\tilde V$ are the right singular vectors corresponding to the nonzero singular values of $A$, and form an orthonormal basis for the coimage of $A$
  – this is called the condensed or compact or reduced svd
  – note that in this case $\tilde\Sigma$ is a square matrix
• the form in (4.1) is sometimes called the full svd
• we may also write the reduced svd as a sum of rank-1 matrices
$$A = \sigma_1u_1v_1^* + \sigma_2u_2v_2^* + \cdots + \sigma_ru_rv_r^*$$
  – $\tilde U = [u_1, \dots, u_r]$, i.e., $u_1, \dots, u_r \in \mathbb{C}^m$ are the left singular vectors of $A$
  – $\tilde V = [v_1, \dots, v_r]$, i.e., $v_1, \dots, v_r \in \mathbb{C}^n$ are the right singular vectors of $A$
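A brief numpy illustration of the full versus reduced svd and of the rank-1 expansion (not from the notes; a minimal sketch on an arbitrary rank-2 test matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 6, 4, 2
A = rng.standard_normal((m, r)) @ rng.standard_normal((r, n))   # rank-2 matrix

# full svd: U is m x m, Vh = V^* is n x n, s holds the min(m, n) singular values
U, s, Vh = np.linalg.svd(A, full_matrices=True)
print(np.round(s, 10))            # two nonzero singular values, the rest ~ 0

# reduced ("compact") svd: keep only the columns for the nonzero singular values
Ur, sr, Vhr = U[:, :r], s[:r], Vh[:r, :]
assert np.allclose(A, Ur @ np.diag(sr) @ Vhr)

# rank-1 expansion A = sum_i sigma_i u_i v_i^*
A_sum = sum(sr[i] * np.outer(Ur[:, i], Vhr[i, :]) for i in range(r))
assert np.allclose(A, A_sum)
```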
