STAT 309: MATHEMATICAL COMPUTATIONS I FALL 2016 LECTURE 5

1. Gerschgorin’s theorem

• for A = [aij] ∈ C^{n×n} and i = 1, . . . , n, we define the Gerschgorin discs

      Gi := {z ∈ C : |z − aii| ≤ ri},   where   ri := Σ_{j≠i} |aij|

• Gerschgorin’s theorem says that the n eigenvalues of A are all contained in the union of the Gi’s
• before we prove this, we need a result that is useful in its own right
• a matrix A ∈ C^{n×n} is called strictly diagonally dominant if
      |aii| > Σ_{j≠i} |aij|   for every i = 1, . . . , n
• it is called weakly diagonally dominant if the ‘>’ is replaced by ‘≥’

Lemma 1. A strictly diagonally dominant matrix is nonsingular.

Proof. Let A be strictly diagonally dominant. Suppose Ax = 0 for some x ≠ 0. Let k ∈ {1, . . . , n} be such that |xk| = max_{i=1,...,n} |xi|. Since x ≠ 0, we must have |xk| > 0. Now observe that
      akk xk = −Σ_{j≠k} akj xj
and so by the triangle inequality,
      |akk| |xk| = |Σ_{j≠k} akj xj| ≤ Σ_{j≠k} |akj| |xj| ≤ (Σ_{j≠k} |akj|) |xk| < |akk| |xk|,
where the last inequality is by strict diagonal dominance. This is a contradiction. In other words, there is no nonzero vector x with Ax = 0. So ker(A) = {0} and A is nonsingular. □

• we are going to use this to prove the first part of Gerschgorin’s theorem
• the second part requires a bit of topology

Theorem 1 (Gerschgorin). The spectrum of A is contained in the union of its Gerschgorin discs, i.e.,
      λ(A) ⊆ G1 ∪ G2 ∪ · · · ∪ Gn.
Furthermore, the number of eigenvalues (counted with multiplicity) in each connected component of G1 ∪ · · · ∪ Gn is equal to the number of Gerschgorin discs that constitute that component.

Date: October 16, 2016, version 1.1. Comments, bug reports: [email protected].

Proof. Suppose z ∉ G1 ∪ · · · ∪ Gn. Then A − zI is a strictly diagonally dominant matrix (check!) and therefore nonsingular by the above lemma. Hence det(A − zI) ≠ 0 and so z is not an eigenvalue of A. This proves the first part.

For the second part, consider the matrix

              [ a11    ta12   · · ·   ta1n ]
              [ ta21   a22    · · ·   ta2n ]
    A(t) :=   [  .      .       .       .  ]
              [ tan1   tan2   · · ·   ann  ]

for t ∈ [0, 1]. Note that A(0) = diag(a11, . . . , ann) and A(1) = A. We will let Gi(t) be the ith Gerschgorin disc of A(t), so
      Gi(t) = {z ∈ C : |z − aii| ≤ t·ri}.

Clearly Gi(t) ⊆ Gi for any t ∈ [0, 1]. By the first part, all eigenvalues of all the matrices A(t) are contained in G1 ∪ · · · ∪ Gn. Since the set of eigenvalues of A(t) depends continuously on the parameter t, A(0) must have the same number of eigenvalues as A(1) in each connected component of G1 ∪ · · · ∪ Gn. Now just observe that the eigenvalues of A(0) are simply the centers akk of the discs in each connected component. □
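• below is a minimal NumPy illustration of the theorem (not part of the original notes; the 3 × 3 matrix is a made-up example): compute the centers aii and radii ri, then check that every eigenvalue of A lies in at least one Gerschgorin disc

    import numpy as np

    # a made-up example matrix, chosen only for illustration
    A = np.array([[ 4.0,  1.0, 0.5],
                  [ 0.3, -2.0, 0.1],
                  [ 0.2,  0.4, 1.0]])

    centers = np.diag(A)                               # a_ii
    radii = np.abs(A).sum(axis=1) - np.abs(centers)    # r_i = sum_{j != i} |a_ij|

    # Gerschgorin: every eigenvalue lies in some disc {z : |z - a_ii| <= r_i}
    for lam in np.linalg.eigvals(A):
        print(lam, np.any(np.abs(lam - centers) <= radii))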

2. Schur decomposition

• suppose we want a decomposition for arbitrary matrices A ∈ C^{n×n} like the evd for normal matrices, A = QΛQ*, i.e., diagonalizing with unitary matrices
• the way to obtain such a decomposition is to relax the requirement of having a diagonal Λ in A = QΛQ* and instead allow an upper-triangular factor
• this gives the Schur decomposition:
      A = QRQ*                                        (2.1)
• as in the evd for normal matrices, Q is a unitary matrix, but now

          [ r11   r12   · · ·   r1n ]
      R = [  0    r22   · · ·   r2n ]
          [  .           .       .  ]
          [  0    · · ·   0     rnn ]

  is merely upper-triangular rather than diagonal

• note that the eigenvalues of A are precisely the diagonal entries of R: r11, . . . , rnn
• unlike the Jordan canonical form, the Schur decomposition is readily computable in finite precision via the QR algorithm
• in its most basic form, the QR algorithm does the following:

      input:      A0 = A
      step k:     Ak = Qk Rk        (perform QR decomposition)
      step k+1:   Ak+1 = Rk Qk      (multiply the QR factors in reverse order)
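• a minimal NumPy sketch of this iteration (not part of the original notes; it assumes a real matrix whose eigenvalues have distinct absolute values, so the basic unshifted iteration converges; practical implementations add shifts and deflation, and the 3 × 3 test matrix is made up):

    import numpy as np

    def basic_qr_algorithm(A, iters=200):
        Ak = np.array(A, dtype=float)
        for _ in range(iters):
            Q, R = np.linalg.qr(Ak)   # step k:   QR decomposition of Ak
            Ak = R @ Q                # step k+1: multiply the factors in reverse order
        return Ak                     # approximately upper-triangular, as in (2.1)

    A = np.array([[2.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 4.0]])
    T = basic_qr_algorithm(A)
    print(np.sort(np.diag(T)))             # approximate eigenvalues of A
    print(np.sort(np.linalg.eigvals(A)))   # compare with LAPACK's eigenvalues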

• under suitable conditions, one may show that Ak → R and the accumulated products Q1 Q2 · · · Qk → Q, where Q and R are the requisite factors in (2.1)
• in most undergraduate classes, one is taught to find eigenvalues by solving for the roots of the characteristic polynomial

      pA(x) = det(xI − A) = 0

• this is almost never done in practice
• for one, there is no finite algorithm for finding the roots of a polynomial when the degree exceeds four, by the famous impossibility result of Abel–Galois
• in fact what happens is the opposite: the roots of a univariate polynomial (divide by the coefficient of the highest-degree term first so that it becomes a monic polynomial)
      p(x) = c0 + c1 x + · · · + cn−1 x^{n−1} + x^n
  are usually obtained as the eigenvalues of its companion matrix

           [ 0   0   · · ·   0   −c0   ]
           [ 1   0   · · ·   0   −c1   ]
      Cp = [ 0   1   · · ·   0   −c2   ]
           [ .   .     .     .     .   ]
           [ 0   0   · · ·   1   −cn−1 ]

  using the QR algorithm
• exercise: show that det(xI − Cp) = p(x)
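• a minimal NumPy sketch of this root-finding idea (not part of the original notes; the helper companion() and the cubic below are made-up examples): build the companion matrix of a monic polynomial and read off its eigenvalues as the roots

    import numpy as np

    def companion(c):
        # companion matrix of p(x) = c[0] + c[1] x + ... + c[n-1] x^(n-1) + x^n
        n = len(c)
        C = np.zeros((n, n))
        C[1:, :n-1] = np.eye(n - 1)    # ones on the subdiagonal
        C[:, -1] = -np.asarray(c)      # last column is -c0, ..., -c_{n-1}
        return C

    # p(x) = (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6
    c = [-6.0, 11.0, -6.0]
    roots = np.linalg.eigvals(companion(c))   # computed internally via the QR algorithm
    print(np.sort(roots))                     # approximately [1, 2, 3]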

3. singular value decomposition

• let A ∈ C^{m×n}; we can always write
      A = UΣV*                                        (3.1)
  – U ∈ C^{m×m} and V ∈ C^{n×n} are both unitary matrices,
        U*U = Im = UU*,    V*V = In = VV*
  – Σ ∈ C^{m×n} is a diagonal matrix in the sense that σij = 0 if i ≠ j
  – if m > n, then Σ looks like

           [ σ1                  ]
           [      .              ]
           [        .            ]
      Σ =  [           σn        ]
           [  0    · · ·    0    ]
           [  .             .    ]
           [  0    · · ·    0    ]

  – if m < n, then Σ looks like

      Σ =  [ σ1                0   · · ·   0 ]
           [      .            .           . ]
           [           σm      0   · · ·   0 ]

  – if m = n, then Σ looks like

      Σ =  [ σ1              ]
           [      .          ]
           [           σm    ]

– the diagonal elements of Σ, denoted σi, i = 1, . . . , min(m, n), are all nonnegative, and can be ordered such that

      σ1 ≥ σ2 ≥ · · · ≥ σr > 0,   σr+1 = · · · = σmin(m,n) = 0

  – r is the rank of A
• this decomposition of A is called the singular value decomposition, or svd
  – the values σi, for i = 1, 2, . . . , min(m, n), are the singular values of A
  – the columns of U are the left singular vectors
  – the columns of V are the right singular vectors
• an alternative decomposition of A omits the singular values that are equal to zero:
      A = ŨΣ̃Ṽ*
  – Ũ ∈ C^{m×r} is a matrix with orthonormal columns, i.e., satisfying Ũ*Ũ = Ir (but not ŨŨ* = Im!)
  – Ṽ ∈ C^{n×r} is also a matrix with orthonormal columns, i.e., satisfying Ṽ*Ṽ = Ir (but again not ṼṼ* = In!)
  – Σ̃ is an r × r diagonal matrix with diagonal elements σ1, . . . , σr
  – again r = rank(A)
  – the columns of Ũ are the left singular vectors corresponding to the nonzero singular values of A, and form an orthonormal basis for the image of A
  – the columns of Ṽ are the right singular vectors corresponding to the nonzero singular values of A, and form an orthonormal basis for the coimage of A
  – this is called the condensed or compact or reduced svd
  – note that in this case, Σ̃ is nonsingular, since its diagonal entries are all positive
• the form in (3.1) is sometimes called the full svd
• we may also write the reduced svd as a sum of rank-1 matrices
      A = σ1 u1 v1* + σ2 u2 v2* + · · · + σr ur vr*
  – Ũ = [u1, . . . , ur], i.e., u1, . . . , ur ∈ C^m are the left singular vectors of A
  – Ṽ = [v1, . . . , vr], i.e., v1, . . . , vr ∈ C^n are the right singular vectors of A

4. existence of svd

Theorem 2 (Existence of SVD). Every matrix has a singular value decomposition (condensed version).

Proof. Let A ∈ C^{m×n} and for simplicity assume that all its nonzero singular values are distinct. We define the matrix

      W = [ 0    A ]  ∈ C^{(m+n)×(m+n)}.
          [ A*   0 ]

It is easy to verify that W = W* (this matrix is named after Wielandt, who was the first to consider it) and so, by the spectral theorem for Hermitian matrices, W has an evd,
      W = ZΛZ*,
where Z ∈ C^{(m+n)×(m+n)} is a unitary matrix and Λ ∈ R^{(m+n)×(m+n)} is a diagonal matrix with real diagonal elements. If z is an eigenvector of W, then we can write

      Wz = σz,    z = [ x ],
                      [ y ]

and therefore

      [ 0    A ] [ x ]  =  σ [ x ]
      [ A*   0 ] [ y ]       [ y ]

or, equivalently,
      Ay = σx,    A*x = σy.
Now, suppose that we apply W to the vector obtained from z by negating y. Then we have

      [ 0    A ] [  x ]  =  [ −Ay ]  =  [ −σx ]  =  −σ [  x ].
      [ A*   0 ] [ −y ]     [ A*x ]     [  σy ]        [ −y ]

In other words, if σ ≠ 0 is an eigenvalue, then −σ is also an eigenvalue. So we may assume without loss of generality that the nonnegative eigenvalues satisfy
      σ1 ≥ σ2 ≥ · · · ≥ σr > 0 = · · · = 0,
where r = rank(A). So the diagonal matrix Λ of eigenvalues of W may be written as

      Λ = diag(σ1, σ2, . . . , σr, −σ1, −σ2, . . . , −σr, 0, . . . , 0) ∈ C^{(m+n)×(m+n)}.

Observe that there is a zero block of size (m + n − 2r) × (m + n − 2r) in the bottom right corner of Λ. We scale each eigenvector z of W so that z*z = 2. Since W is Hermitian, eigenvectors corresponding to the distinct eigenvalues σ and −σ are orthogonal, so it follows that

      [ x*  y* ] [  x ]  =  0.
                 [ −y ]

These yield the system of equations
      x*x + y*y = 2,    x*x − y*y = 0,
which has the unique solution x*x = 1, y*y = 1. Now note that we can represent the matrix of normalized eigenvectors of W corresponding to nonzero eigenvalues (note that there are exactly 2r of these) as

      Z̃ = (1/√2) [ X    X ]  ∈ C^{(m+n)×2r}.
                  [ Y   −Y ]

Note that the factor 1/√2 appears because of the way we have chosen the norm of z. We also let
      Λ̃ = diag(σ1, σ2, . . . , σr, −σ1, −σ2, . . . , −σr) ∈ C^{2r×2r},
which we may write in block form as Λ̃ = [ Σr 0; 0 −Σr ] with Σr := diag(σ1, . . . , σr). It is easy to see that ZΛZ* = Z̃Λ̃Z̃* just by multiplying out the zero block in Λ. So we have

      [ 0    A ]  =  W  =  ZΛZ*  =  Z̃Λ̃Z̃*
      [ A*   0 ]
                  =  (1/2) [ X    X ] [ Σr    0  ] [ X*    Y* ]
                           [ Y   −Y ] [ 0   −Σr ] [ X*   −Y* ]
                  =  (1/2) [ XΣr   −XΣr ] [ X*    Y* ]
                           [ YΣr    YΣr ] [ X*   −Y* ]
                  =  (1/2) [    0       2XΣrY* ]
                           [ 2YΣrX*        0   ]
                  =  [    0      XΣrY* ]
                     [ YΣrX*        0  ]

and therefore
      A = XΣrY*,    A* = YΣrX*,
where X is an m × r matrix, Σr is r × r, Y is n × r, and r is the rank of A. We have obtained the condensed svd of A. The last missing bit is the orthonormality of the columns of X and Y. This follows from the fact that distinct columns of
      [ X    X ]
      [ Y   −Y ]
are mutually orthogonal since they correspond to distinct eigenvalues; so if we pick [xi; yi], [xj; yj], and [xj; −yj] for i ≠ j, and take the inner products of the first vector with each of the other two, we get
      xi*xj + yi*yj = 0,
      xi*xj − yi*yj = 0.
Adding them gives xi*xj = 0 and subtracting them gives yi*yj = 0 for all i ≠ j, as required. □

• see Theorem 4.1 in Trefethen and Bau for an alternative non-constructive proof that does not require the use of the spectral theorem
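• a minimal NumPy check of the construction used in the proof (not part of the original notes; the random test matrix is made up): form the Wielandt matrix W and verify that its nonzero eigenvalues are ±σi, and that the condensed svd reconstructs A

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 5, 3
    A = rng.standard_normal((m, n))          # made-up real test matrix, so A* = A^T

    # Wielandt matrix W = [[0, A], [A^T, 0]]
    W = np.block([[np.zeros((m, m)), A],
                  [A.T,              np.zeros((n, n))]])

    sigma = np.linalg.svd(A, compute_uv=False)   # singular values sigma_1 >= ... >= sigma_n
    eig_W = np.linalg.eigvalsh(W)                # eigenvalues of the Hermitian W, ascending
    print(sigma)
    print(eig_W)                                 # +/- sigma_i, plus m + n - 2r zeros

    # thin svd; here it equals the condensed svd since A has full column rank
    U, s, Vh = np.linalg.svd(A, full_matrices=False)
    print(np.allclose(A, U @ np.diag(s) @ Vh))   # True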
