Eigenvalue Problems

Last Time …

• Social Network Graphs

• Betweenness

• Girvan-Newman Algorithm

• Graph Laplacian

• Spectral Bisection

• $\lambda_2$, $w_2$

Today …

• A small detour into eigenvalue problems …

Formulation

• Standard eigenvalue problem: Given an $n \times n$ matrix $A$, find a scalar $\lambda$ and a nonzero vector $x$ such that

$$A x = \lambda x$$

• $\lambda$ is an eigenvalue, and $x$ is the corresponding eigenvector

• Spectrum = $\lambda(A)$ = set of eigenvalues of $A$

• Spectral radius = $\rho(A) = \max\{\, |\lambda| : \lambda \in \lambda(A) \,\}$

Characteristic Polynomial

• The equation $A x = \lambda x$ is equivalent to $(A - \lambda I) x = 0$

• Eigenvalues of $A$ are roots of the characteristic polynomial $\det(A - \lambda I) = 0$

• The characteristic polynomial is a powerful theoretical tool, but usually not useful computationally
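As a quick illustration of that last point, the sketch below (assuming NumPy; the matrix and its size are arbitrary choices) forms the coefficients of the characteristic polynomial of a symmetric matrix and then recovers eigenvalues by polynomial root finding, which is typically noticeably less accurate than a direct eigensolver:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                            # symmetric, so all eigenvalues are real

coeffs = np.poly(A)                          # coefficients of det(A - lambda*I)
lam_roots = np.sort(np.roots(coeffs).real)   # eigenvalues via polynomial roots
lam_eig = np.sort(np.linalg.eigvalsh(A))     # eigenvalues via a standard eigensolver

# Identical in exact arithmetic; in floating point the root-finding route loses accuracy
print(np.max(np.abs(lam_roots - lam_eig)))
```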

Considerations

• Properties of the eigenvalue problem affecting the choice of algorithm:
  • Are all eigenvalues needed, or only a few?
  • Are only eigenvalues needed, or are corresponding eigenvectors also needed?
  • Is the matrix real or complex?
  • Is the matrix relatively small and dense, or large and sparse?
  • Does the matrix have any special properties, such as symmetry?

Problem Transformations

• Shift: If $A x = \lambda x$ and $\sigma$ is any scalar, then $(A - \sigma I) x = (\lambda - \sigma) x$

• Inversion: If $A$ is nonsingular and $A x = \lambda x$, then $\lambda \ne 0$ and $A^{-1} x = (1/\lambda)\, x$

• Powers: If $A x = \lambda x$, then $A^k x = \lambda^k x$

• Polynomial: If $A x = \lambda x$ and $p(t)$ is a polynomial, then $p(A)\, x = p(\lambda)\, x$
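A quick numerical check of these four identities for a single eigenpair, assuming NumPy (the matrix, shift, and polynomial are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
A = (A + A.T) / 2                        # symmetric, so the eigenpair below is real
lam, V = np.linalg.eigh(A)
l, x = lam[-1], V[:, -1]                 # one eigenpair: A x = l x
sigma = 0.7                              # arbitrary shift
I = np.eye(5)

print(np.allclose((A - sigma * I) @ x, (l - sigma) * x))          # shift
print(np.allclose(np.linalg.solve(A, x), x / l))                  # inversion
print(np.allclose(np.linalg.matrix_power(A, 3) @ x, l**3 * x))    # powers
print(np.allclose((A @ A + 2*A + 3*I) @ x, (l**2 + 2*l + 3)*x))   # p(t) = t^2 + 2t + 3
```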

Similarity Transforms

• $B$ is similar to $A$ if there is a nonsingular matrix $T$ such that

$$B = T^{-1} A T$$

• Then, $B y = \lambda y \;\Rightarrow\; T^{-1} A T y = \lambda y \;\Rightarrow\; A (T y) = \lambda (T y)$

• Similarity transformations preserve eigenvalues, and eigenvectors are easily recovered

Diagonal form

• The eigenvalues of a diagonal matrix are its diagonal entries, and the eigenvectors are the columns of the identity matrix

• The diagonal form is highly desirable for simplifying eigenvalue problems for general matrices by similarity transformations

• But not all matrices are diagonalizable by similarity transformations

Triangular form

• Any matrix can be transformed into triangular form by a similarity transformation

• The eigenvalues of a triangular matrix are simply its diagonal entries

• Eigenvectors are not as obvious, but are still easy to compute

Power iteration

• Simplest method for computing one eigenvalue-eigenvector pair:

$$x_k = A x_{k-1}$$

• Converges to a multiple of the eigenvector corresponding to the dominant eigenvalue

• We have seen this before while computing the PageRank

• Proof of convergence?
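Before the convergence argument, a minimal power iteration sketch in Python (assuming NumPy; the function name, stopping test, and example matrix are my own):

```python
import numpy as np

def power_iteration(A, x0, tol=1e-10, max_iter=1000):
    """Approximate the dominant eigenpair of A by repeated multiplication."""
    x = x0 / np.linalg.norm(x0, np.inf)
    for _ in range(max_iter):
        y = A @ x                                  # x_k = A x_{k-1}
        y = y / np.linalg.norm(y, np.inf)          # rescale to avoid overflow/underflow
        # stop when the direction stops changing (allow for a sign flip each step)
        if min(np.linalg.norm(y - x, np.inf), np.linalg.norm(y + x, np.inf)) < tol:
            x = y
            break
        x = y
    lam = (x @ A @ x) / (x @ x)                    # Rayleigh quotient estimate of the eigenvalue
    return lam, x

# Example: dominant eigenpair of a small symmetric matrix
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
print(power_iteration(A, np.ones(2)))
```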

Convergence of Power iteration

Express the starting vector $x_0$ in terms of the eigenvectors of $A$:

$$x_0 = \sum_{i=1}^{n} \alpha_i v_i$$

Then,

$$x_k = A x_{k-1} = A^2 x_{k-2} = \cdots = A^k x_0 = \sum_{i=1}^{n} \lambda_i^k \alpha_i v_i = \lambda_n^k \left( \alpha_n v_n + \sum_{i=1}^{n-1} \left( \frac{\lambda_i}{\lambda_n} \right)^{k} \alpha_i v_i \right)$$

Since $\left| \lambda_i / \lambda_n \right| < 1$ for $i < n$, successively higher powers go to zero

Power iteration with shift

• The convergence rate of power iteration depends on the ratio $\left| \lambda_{n-1} / \lambda_n \right|$

• It is possible to choose a shift $\sigma$, applying the iteration to $A - \sigma I$, such that

$$\left| \frac{\lambda_{n-1} - \sigma}{\lambda_n - \sigma} \right| < \left| \frac{\lambda_{n-1}}{\lambda_n} \right|$$

so convergence is accelerated

• The shift must be added back to the result to obtain the eigenvalue of the original matrix

Inverse iteration

• If the smallest eigenvalues are required rather than the largest, we can make use of the fact that the eigenvalues of $A^{-1}$ are the reciprocals of those of $A$, so the smallest eigenvalue of $A$ is the reciprocal of the largest eigenvalue of $A^{-1}$

• This leads to the inverse iteration scheme

$$A y_k = x_{k-1}, \qquad x_k = y_k / \| y_k \|_{\infty}$$

• The inverse of $A$ is not computed explicitly; instead, some factorization of $A$ is used to solve the linear system at each iteration

Shifted inverse iteration

• As before, a shifting strategy using a scalar $\sigma$ can greatly improve convergence: iterate with $A - \sigma I$ in place of $A$

• It is particularly useful for computing the eigenvector corresponding to an approximate eigenvalue

• Inverse iteration is also useful for computing the eigenvalue closest to a given value $\beta$: if $\beta$ is used as the shift, then the desired eigenvalue corresponds to the smallest eigenvalue of the shifted matrix (see the sketch below)
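A sketch of shifted inverse iteration, assuming SciPy for the LU factorization; with sigma = 0 it reduces to plain inverse iteration, and with sigma near a target it converges to the eigenpair whose eigenvalue is closest to sigma (the function name, stopping test, and example matrix are my own):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def shifted_inverse_iteration(A, sigma=0.0, tol=1e-10, max_iter=200):
    """Approximate the eigenpair of A whose eigenvalue is closest to sigma."""
    n = A.shape[0]
    lu, piv = lu_factor(A - sigma * np.eye(n))     # factor once; never form the inverse
    x = np.ones(n)
    for _ in range(max_iter):
        y = lu_solve((lu, piv), x)                 # solve (A - sigma*I) y_k = x_{k-1}
        y = y / np.linalg.norm(y, np.inf)          # x_k = y_k / ||y_k||_inf
        if min(np.linalg.norm(y - x, np.inf), np.linalg.norm(y + x, np.inf)) < tol:
            x = y
            break
        x = y
    lam = (x @ A @ x) / (x @ x)                    # Rayleigh quotient w.r.t. the original A
    return lam, x

# Example: eigenvalue of A closest to 2.5
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
print(shifted_inverse_iteration(A, sigma=2.5))
```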

Deflation

• Once the dominant eigenvalue and eigenvector $(\lambda_n, w_n)$ have been computed, the remaining eigenvalues can be computed using deflation, which effectively removes the known eigenvalue

• Let $H$ be any nonsingular matrix such that $H w_n = \alpha e_1$. Then the similarity transformation determined by $H$ transforms $A$ into

$$H A H^{-1} = \begin{bmatrix} \lambda_n & b^T \\ 0 & B \end{bmatrix}$$

• We can now work with $B$ to compute the next eigenvalue

• The process can be repeated to find additional eigenvalues and eigenvectors

Deflation

푇  Alternate approach: let 푢푛 be any vector such that 푢푛푤푛 = 휆푛 푇  Then 퐴 − 푤푛푢푛 has eigenvalues 휆푛−1, … , 휆1, 0

 Possible choices for 푢푛

 푢푛 = 휆푛푤푛, if 퐴 is symmetric and 푤푛 is normalized so that 푤푛 2 = 1 푇  푢푛 = 휆푛푦푛, where 푦푛 is the corresponding left eigenvector (퐴 푦푛 = 휆푛푦푛) 푇 푡ℎ  푢푛 = 퐴 푒푘, if 푤푛 is normalized such that 푤푛 ∞ = 1 and the 푘 component of 푤푛 is 1 QR Iteration
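A small sketch of that first deflation choice for a symmetric matrix, assuming NumPy; here the dominant eigenpair comes from a library eigensolver purely as a stand-in for the pair that power iteration would supply:

```python
import numpy as np

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])           # symmetric test matrix (arbitrary)

lam, V = np.linalg.eigh(A)
lam_n, w_n = lam[-1], V[:, -1]            # dominant eigenpair, with ||w_n||_2 = 1

u_n = lam_n * w_n                         # u_n = lambda_n * w_n (symmetric case)
A_deflated = A - np.outer(w_n, u_n)       # A - w_n u_n^T

print(np.sort(np.linalg.eigvalsh(A_deflated)))   # ~ {0, lambda_1, ..., lambda_{n-1}}
print(np.sort(lam))                              # eigenvalues of the original A
```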

QR Iteration

• Iteratively converges to a triangular or block-triangular form, yielding all eigenvalues of $A$

• Starting with $A_0 = A$, at iteration $k$ compute the QR factorization $Q_k R_k = A_{k-1}$ and form the reverse product $A_k = R_k Q_k$

• The product of the orthogonal matrices $Q_k$ converges to the matrix of corresponding eigenvectors

• If $A$ is symmetric, then symmetry is preserved by the QR iteration, so $A_k$ converges to a matrix that is both symmetric and triangular, i.e., diagonal
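An unshifted QR iteration sketch, assuming NumPy; practical implementations add shifts and the preliminary reduction described next, but the basic loop is just factor and reverse-multiply:

```python
import numpy as np

def qr_iteration(A, num_iter=200):
    """Unshifted QR iteration: A_k = R_k Q_k, where Q_k R_k = A_{k-1}."""
    Ak = A.copy()
    Qprod = np.eye(A.shape[0])           # accumulated product of the Q_k
    for _ in range(num_iter):
        Q, R = np.linalg.qr(Ak)          # Q_k R_k = A_{k-1}
        Ak = R @ Q                       # A_k = R_k Q_k  (similar to A_{k-1})
        Qprod = Qprod @ Q
    return Ak, Qprod

# Symmetric example: A_k approaches a diagonal matrix of eigenvalues
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
Ak, Qprod = qr_iteration(A)
print(np.round(Ak, 6))                   # nearly diagonal
print(np.sort(np.diag(Ak)))              # approximate eigenvalues
print(np.sort(np.linalg.eigvalsh(A)))    # reference values
```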

Preliminary reductions

• The efficiency of QR iteration can be enhanced by first transforming the matrix to be as close to triangular form as possible

• A Hessenberg matrix is triangular except for one additional nonzero diagonal immediately adjacent to the main diagonal

• A symmetric Hessenberg matrix is tridiagonal

• Any matrix can be reduced to Hessenberg form in a finite number of steps using Householder transformations

• Work per QR iteration is then reduced from $\mathcal{O}(n^3)$ to $\mathcal{O}(n^2)$ for general matrices, and to $\mathcal{O}(n)$ for symmetric matrices
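A quick look at the reduction using SciPy's scipy.linalg.hessenberg, which performs the Householder reduction; for a symmetric input the result is tridiagonal:

```python
import numpy as np
from scipy.linalg import hessenberg

rng = np.random.default_rng(2)
A = rng.standard_normal((5, 5))

H, Q = hessenberg(A, calc_q=True)        # A = Q H Q^T with H upper Hessenberg
print(np.allclose(Q.T @ A @ Q, H))       # similarity transformation check
print(np.round(H, 2))                    # zeros below the first subdiagonal

S = (A + A.T) / 2                        # symmetric case
print(np.round(hessenberg(S), 2))        # tridiagonal
```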

Krylov subspace methods

• Reduce the matrix to Hessenberg or tridiagonal form using only matrix-vector products

• For an arbitrary starting vector $x_0$, if

$$K_k = \begin{bmatrix} x_0 & A x_0 & \cdots & A^{k-1} x_0 \end{bmatrix}$$

then

$$K_n^{-1} A K_n = C_n$$

where $C_n$ is upper Hessenberg

• To obtain a better conditioned basis for $\mathrm{span}(K_n)$, compute the QR factorization $Q_n R_n = K_n$, so that

$$Q_n^H A Q_n = R_n C_n R_n^{-1} \equiv H$$

with $H$ upper Hessenberg

Krylov subspace methods

• Equating the $k$th columns on each side of the equation $A Q_n = Q_n H$ yields

$$A q_k = h_{1k} q_1 + \cdots + h_{kk} q_k + h_{k+1,k} q_{k+1}$$

relating $q_{k+1}$ to the preceding vectors $q_1, \ldots, q_k$

• Premultiplying by $q_j^H$ and using orthonormality,

$$h_{jk} = q_j^H A q_k, \qquad j = 1, \ldots, k$$

• These relationships yield the Arnoldi iteration

$x_0$ = arbitrary nonzero starting vector
$q_1 = x_0 / \| x_0 \|_2$
for $k = 1, 2, \ldots$
    $u_k = A q_k$
    for $j = 1$ to $k$
        $h_{jk} = q_j^H u_k$
        $u_k = u_k - h_{jk} q_j$
    $h_{k+1,k} = \| u_k \|_2$
    if $h_{k+1,k} = 0$ then stop
    $q_{k+1} = u_k / h_{k+1,k}$
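The same loop as a runnable Python sketch, assuming NumPy (a practical implementation would also handle breakdown and restarting more carefully):

```python
import numpy as np

def arnoldi(A, x0, m):
    """Run m Arnoldi steps: returns Q (n x (m+1)) with orthonormal columns and
    H ((m+1) x m) upper Hessenberg such that A @ Q[:, :m] = Q @ H."""
    n = A.shape[0]
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = x0 / np.linalg.norm(x0)
    for k in range(m):
        u = A @ Q[:, k]                      # u_k = A q_k
        for j in range(k + 1):               # orthogonalize against q_1, ..., q_k
            H[j, k] = Q[:, j] @ u
            u = u - H[j, k] * Q[:, j]
        H[k + 1, k] = np.linalg.norm(u)
        if H[k + 1, k] == 0:                 # breakdown: an invariant subspace was found
            return Q[:, :k + 1], H[:k + 1, :k + 1]
        Q[:, k + 1] = u / H[k + 1, k]
    return Q, H

# A few Arnoldi steps already give good approximations to extreme eigenvalues
rng = np.random.default_rng(3)
A = rng.standard_normal((200, 200))
A = (A + A.T) / 2                            # symmetric, to make the comparison easy
Q, H = arnoldi(A, rng.standard_normal(200), m=30)
ritz = np.linalg.eigvals(H[:30, :30])        # Ritz values = eigenvalues of H_k
print(np.sort(ritz.real)[-3:])               # largest Ritz values
print(np.sort(np.linalg.eigvalsh(A))[-3:])   # largest eigenvalues of A
```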

Arnoldi iteration

• If $Q_k = [\, q_1 \; \cdots \; q_k \,]$, then $H_k = Q_k^H A Q_k$ is an upper Hessenberg matrix

• The eigenvalues of $H_k$, called Ritz values, are approximate eigenvalues of $A$, and the Ritz vectors, given by $Q_k y$ where $y$ is an eigenvector of $H_k$, are the corresponding approximate eigenvectors of $A$

• The eigenvalues of $H_k$ must be computed by another method, such as QR iteration, but this is an easier problem since $k \ll n$

Arnoldi iteration

• Arnoldi iteration is fairly expensive in both work and storage, because each new vector $q_k$ must be orthogonalized against all previous columns of $Q_k$, and all of them must be stored for that purpose

• It is usually restarted periodically with a carefully chosen starting vector

• The Ritz values and vectors produced are often good approximations to the eigenvalues and eigenvectors of $A$ after relatively few iterations

Lanczos iteration

Work and storage costs drop dramatically if the matrix is symmetric or Hermitian, since the recurrence has only three terms and $H_k$ is tridiagonal:

$q_0 = 0$, $\beta_0 = 0$, and $x_0$ = arbitrary nonzero starting vector
$q_1 = x_0 / \| x_0 \|_2$
for $k = 1, 2, \ldots$
    $u_k = A q_k$
    $\alpha_k = q_k^H u_k$
    $u_k = u_k - \beta_{k-1} q_{k-1} - \alpha_k q_k$
    $\beta_k = \| u_k \|_2$
    if $\beta_k = 0$ then stop
    $q_{k+1} = u_k / \beta_k$
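A Lanczos sketch in the same style for a symmetric matrix, assuming NumPy; it performs no reorthogonalization, which is the issue discussed on the next slide:

```python
import numpy as np

def lanczos(A, x0, m):
    """Run m Lanczos steps on symmetric A: returns the tridiagonal T_k (alpha on
    the diagonal, beta on the off-diagonals) and the basis vectors q_1, ..., q_m."""
    n = A.shape[0]
    alpha = np.zeros(m)
    beta = np.zeros(m)
    Q = np.zeros((n, m + 1))
    Q[:, 0] = x0 / np.linalg.norm(x0)
    for k in range(m):
        u = A @ Q[:, k]                      # u_k = A q_k
        alpha[k] = Q[:, k] @ u               # alpha_k = q_k^T u_k
        u -= alpha[k] * Q[:, k]
        if k > 0:
            u -= beta[k - 1] * Q[:, k - 1]   # three-term recurrence
        beta[k] = np.linalg.norm(u)          # beta_k = ||u_k||_2
        if beta[k] == 0:                     # invariant subspace found; Ritz values exact
            m = k + 1
            break
        Q[:, k + 1] = u / beta[k]
    T = np.diag(alpha[:m]) + np.diag(beta[:m - 1], 1) + np.diag(beta[:m - 1], -1)
    return T, Q[:, :m]

rng = np.random.default_rng(4)
A = rng.standard_normal((300, 300))
A = (A + A.T) / 2
T, Q = lanczos(A, rng.standard_normal(300), m=40)
print(np.sort(np.linalg.eigvalsh(T))[-3:])   # largest Ritz values
print(np.sort(np.linalg.eigvalsh(A))[-3:])   # largest eigenvalues of A
```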

Lanczos iteration

• $\alpha_k$ and $\beta_k$ are the diagonal and subdiagonal entries of a symmetric tridiagonal matrix $T_k$

• As with Arnoldi, Lanczos does not produce eigenvalues and eigenvectors directly, but only the tridiagonal matrix $T_k$, whose eigenvalues and eigenvectors must be computed by another method to obtain the Ritz values and vectors

• If $\beta_k = 0$, then an invariant subspace has already been identified, i.e., the Ritz values and vectors are already exact at that point

Lanczos iteration

• In principle, if we let Lanczos run until $k = n$, the resulting tridiagonal matrix would be orthogonally similar to $A$

• In practice, rounding errors cause loss of orthogonality among the computed vectors $q_k$

• The problem can be overcome by reorthogonalizing the vectors

• In practice, this is usually ignored; the resulting approximations are still good

Krylov subspace methods

• The great advantage of Arnoldi and Lanczos is their ability to produce good approximations to extreme eigenvalues for $k \ll n$

• They require only one matrix-vector product per step and little auxiliary storage, so they are ideally suited to large sparse matrices

• If eigenvalues are needed in the middle of the spectrum, say near $\sigma$, the algorithms can instead be applied to $(A - \sigma I)^{-1}$, assuming it is practical to solve systems of the form $(A - \sigma I) x = y$
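For instance, SciPy's sparse eigensolvers (which wrap ARPACK's implicitly restarted Lanczos/Arnoldi) expose exactly this shift-invert mode; the tridiagonal matrix below is an arbitrary stand-in for a large sparse symmetric matrix such as a graph Laplacian:

```python
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 2000
L = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")

# A few of the algebraically largest eigenvalues (Lanczos applied to L itself)
print(eigsh(L, k=3, which="LA", return_eigenvectors=False))

# A few eigenvalues closest to sigma = 1.0: Lanczos is applied to (L - sigma*I)^{-1},
# with the shifted systems solved via a sparse factorization (shift-invert mode)
print(eigsh(L, k=3, sigma=1.0, return_eigenvectors=False))
```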