Outline

1 Eigenvalue Problems

2 Existence, Uniqueness, and Conditioning

3 Computing Eigenvalues and Eigenvectors

Michael T. Heath, Scientific Computing

Eigenvalue Problems

Eigenvalue problems occur in many areas of science and engineering, such as structural analysis

Eigenvalues are also important in analyzing numerical methods

Theory and algorithms apply to complex matrices as well as real matrices

With complex matrices, we use the conjugate transpose, $A^H$, instead of the usual transpose, $A^T$

Eigenvalues and Eigenvectors

Standard eigenvalue problem: Given an $n \times n$ matrix $A$, find a scalar $\lambda$ and a nonzero vector $x$ such that

$$Ax = \lambda x$$

$\lambda$ is an eigenvalue, and $x$ is the corresponding eigenvector

$\lambda$ may be complex even if $A$ is real

Spectrum $= \lambda(A) =$ set of eigenvalues of $A$

Spectral radius $= \rho(A) = \max\{|\lambda| : \lambda \in \lambda(A)\}$
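As a quick numerical check, the spectrum and spectral radius can be computed with an off-the-shelf eigensolver. This is a sketch assuming NumPy is available; the matrix is the illustrative example used later in these notes.

```python
import numpy as np

# Illustrative matrix (it reappears in the model problem below)
A = np.array([[4.0, -5.0],
              [2.0, -3.0]])

spectrum = np.linalg.eigvals(A)   # lambda(A), the set of eigenvalues
rho = np.max(np.abs(spectrum))    # rho(A) = max |lambda| over the spectrum

print(np.sort(spectrum.real))     # eigenvalues of this A are -1 and 2
print(rho)                        # spectral radius is 2
```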

Classic Eigenvalue Problem

Consider the coupled pair of differential equations:

$$\frac{dv}{dt} = 4v - 5w, \qquad v = 8 \text{ at } t = 0,$$
$$\frac{dw}{dt} = 2v - 3w, \qquad w = 5 \text{ at } t = 0.$$

This is an initial-value problem. With the coefficient matrix

$$A = \begin{pmatrix} 4 & -5 \\ 2 & -3 \end{pmatrix},$$

we can write this as

$$\frac{d}{dt} \begin{pmatrix} v(t) \\ w(t) \end{pmatrix} = \begin{pmatrix} 4 & -5 \\ 2 & -3 \end{pmatrix} \begin{pmatrix} v(t) \\ w(t) \end{pmatrix}.$$

Introducing the vector unknown $u(t) := [v(t)\; w(t)]^T$ with $u(0) = [8\; 5]^T$, we can write the system in vector form,

$$\frac{du}{dt} = Au, \qquad \text{with } u = u(0) \text{ at } t = 0.$$

How do we find $u(t)$? If we had a $1 \times 1$ matrix $A = a$, we would have a scalar equation:

$$\frac{du}{dt} = au \qquad \text{with } u = u(0) \text{ at } t = 0.$$

The solution to this equation is a pure exponential:

$$u(t) = e^{at}\, u(0),$$

which satisfies the initial condition because $e^0 = 1$. The derivative with respect to $t$ is $a e^{at} u(0) = au$, so it satisfies the scalar initial-value problem. The constant $a$ is critical to how this system behaves:

- If $a > 0$ then the solution grows in time.
- If $a < 0$ then the solution decays.
- If $a$ is imaginary then the solution is oscillatory.

(More on this later...) Coming back to our system, suppose we again look for solutions that are pure exponentials in time, e.g.,

$$v(t) = e^{\lambda t}\, y, \qquad w(t) = e^{\lambda t}\, z.$$

If this is to be a solution to our initial-value problem, we require

$$\frac{dv}{dt} = \lambda e^{\lambda t} y = 4 e^{\lambda t} y - 5 e^{\lambda t} z$$
$$\frac{dw}{dt} = \lambda e^{\lambda t} z = 2 e^{\lambda t} y - 3 e^{\lambda t} z.$$

The $e^{\lambda t}$ cancels out from each side, leaving:

$$\lambda y = 4y - 5z$$
$$\lambda z = 2y - 3z,$$

which is the eigenvalue problem. In vector form, $u(t) = e^{\lambda t} x$ yields

$$\frac{du}{dt} = Au \;\Longleftrightarrow\; \lambda e^{\lambda t} x = A\,(e^{\lambda t} x),$$

which gives the eigenvalue problem in matrix form:

$$\lambda x = Ax \qquad \text{or} \qquad Ax = \lambda x.$$

As in the scalar case, the solution behavior depends on whether $\lambda$ has

- positive real part $\to$ a growing solution,
- negative real part $\to$ a decaying solution,
- an imaginary part $\to$ an oscillating solution.

Note that here we have two unknowns: $\lambda$ and $x$. We refer to $(\lambda, x)$ as an eigenpair, with eigenvalue $\lambda$ and eigenvector $x$.

Solving the Eigenvalue Problem

The eigenpair satisfies

$$(A - \lambda I)\, x = 0,$$

which is to say,

- $x$ is in the null space of $A - \lambda I$,
- $\lambda$ is chosen so that $A - \lambda I$ has a nontrivial null space.

We thus seek $\lambda$ such that $A - \lambda I$ is singular. Singularity implies $\det(A - \lambda I) = 0$. For our example:

$$0 = \begin{vmatrix} 4-\lambda & -5 \\ 2 & -3-\lambda \end{vmatrix} = (4-\lambda)(-3-\lambda) - (-5)(2),$$

or $\lambda^2 - \lambda - 2 = 0$, which has roots $\lambda = -1$ or $\lambda = 2$.

Finding the Eigenvectors

For the case $\lambda = \lambda_1 = -1$, $(A - \lambda_1 I)\,x = 0$ gives

$$\begin{pmatrix} 5 & -5 \\ 2 & -2 \end{pmatrix} \begin{pmatrix} y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},$$

which gives us the eigenvector

$$x_1 = \begin{pmatrix} y \\ z \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}.$$

Note that any nonzero multiple of $x_1$ is also an eigenvector. Thus, $x_1$ defines a subspace that is invariant under multiplication by $A$.

For the case $\lambda = \lambda_2 = 2$, $(A - \lambda_2 I)\,x = 0$ gives

$$\begin{pmatrix} 2 & -5 \\ 2 & -5 \end{pmatrix} \begin{pmatrix} y \\ z \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},$$

which gives us the second eigenvector as any multiple of

$$x_2 = \begin{pmatrix} y \\ z \end{pmatrix} = \begin{pmatrix} 5 \\ 2 \end{pmatrix}.$$

Return to Model Problem

Note that our model problem $\frac{du}{dt} = Au$ is linear in the unknown $u$. Thus, if we have two solutions $u_1(t)$ and $u_2(t)$ satisfying the differential equation, their sum $u := u_1 + u_2$ also satisfies the equation:

$$\frac{du_1}{dt} = Au_1$$
$$\frac{du_2}{dt} = Au_2$$
$$\frac{d}{dt}(u_1 + u_2) = A\,(u_1 + u_2)$$
$$\frac{du}{dt} = Au$$

Take $u_1 = c_1 e^{\lambda_1 t} x_1$, so that $\frac{du_1}{dt} = c_1 \lambda_1 e^{\lambda_1 t} x_1$. Then

$$A u_1 = A\, c_1 e^{\lambda_1 t} x_1 = c_1 e^{\lambda_1 t} A x_1 = c_1 e^{\lambda_1 t} \lambda_1 x_1 = \frac{du_1}{dt}.$$

Similarly, for $u_2 = c_2 e^{\lambda_2 t} x_2$: $\;\frac{du_2}{dt} = A u_2$.

Thus, $\frac{du}{dt} = \frac{d}{dt}(u_1 + u_2) = A\,(u_1 + u_2)$ is satisfied by

$$u = c_1 e^{\lambda_1 t} x_1 + c_2 e^{\lambda_2 t} x_2.$$

The only remaining part is to find the coefficients $c_1$ and $c_2$ such that $u = u(0)$ at time $t = 0$. This initial condition yields a $2 \times 2$ system,

$$\begin{pmatrix} x_1 & x_2 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} 1 & 5 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} 8 \\ 5 \end{pmatrix}.$$

Solving for $c_1$ and $c_2$ via Gaussian elimination:

$$\begin{pmatrix} 1 & 5 \\ 1 & 2 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} 8 \\ 5 \end{pmatrix}
\;\Longrightarrow\;
\begin{pmatrix} 1 & 5 \\ 0 & -3 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = \begin{pmatrix} 8 \\ -3 \end{pmatrix}$$

$$c_2 = 1, \qquad c_1 = 8 - 5c_2 = 3.$$

So, our solution is

$$u(t) = x_1 c_1 e^{\lambda_1 t} + x_2 c_2 e^{\lambda_2 t} = 3 e^{-t} \begin{pmatrix} 1 \\ 1 \end{pmatrix} + e^{2t} \begin{pmatrix} 5 \\ 2 \end{pmatrix}.$$
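The closed-form solution above is easy to check numerically. A small sketch (assuming NumPy) verifies the initial condition and that $du/dt = Au$ holds at a sample time:

```python
import numpy as np

A = np.array([[4.0, -5.0],
              [2.0, -3.0]])

def u(t):
    # closed-form solution derived above: u(t) = 3 e^{-t} x1 + e^{2t} x2
    x1 = np.array([1.0, 1.0])
    x2 = np.array([5.0, 2.0])
    return 3.0 * np.exp(-t) * x1 + np.exp(2.0 * t) * x2

# initial condition u(0) = [8, 5]
assert np.allclose(u(0.0), [8.0, 5.0])

# check du/dt = A u at a sample time via a centered difference
t, h = 0.7, 1e-6
dudt = (u(t + h) - u(t - h)) / (2.0 * h)
assert np.allclose(dudt, A @ u(t), rtol=1e-6)
print("closed-form solution satisfies the ODE and initial condition")
```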

Clearly, after a long time, the solution is going to look like a multiple of $x_2 = [5\; 2]^T$ because the component of the solution parallel to $x_1$ will decay. (More precisely, the component parallel to $x_1$ will not grow as fast as the component parallel to $x_2$.)

Growing / Decaying Modes

Our model problem,

$$\frac{du}{dt} = Au \;\longrightarrow\; u(t) = x_1 c_1 e^{\lambda_1 t} + x_2 c_2 e^{\lambda_2 t},$$

leads to growth/decay of components.

We also get growth/decay through matrix-vector products. Consider $u = c_1 x_1 + c_2 x_2$. Then

$$Au = c_1 A x_1 + c_2 A x_2 = c_1 \lambda_1 x_1 + c_2 \lambda_2 x_2$$

$$A^k u = c_1 \lambda_1^k x_1 + c_2 \lambda_2^k x_2 = \lambda_2^k \left[ c_1 \left( \frac{\lambda_1}{\lambda_2} \right)^k x_1 + c_2 x_2 \right].$$

$$\lim_{k \to \infty} A^k u = \lambda_2^k \left[ c_1 \cdot 0 \cdot x_1 + c_2 x_2 \right] = c_2 \lambda_2^k x_2.$$

So, repeated matrix-vector products lead to emergence of the eigenvector associated with the eigenvalue that has largest modulus. This is the main idea behind the power method, which is a common way to find the eigenvector associated with $\max |\lambda|$.

Geometric Interpretation
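The emergence of the dominant eigenvector under repeated products can be seen directly. A sketch (assuming NumPy) using the model problem's matrix, rescaling each iterate so it stays finite:

```python
import numpy as np

A = np.array([[4.0, -5.0],
              [2.0, -3.0]])          # eigenvalues -1 and 2; dominant x2 = [5, 2]

rng = np.random.default_rng(0)
v = rng.standard_normal(2)           # generic start: c2 nonzero almost surely

for _ in range(50):
    v = A @ v
    v = v / np.linalg.norm(v)        # rescale so repeated products stay finite

x2 = np.array([5.0, 2.0]) / np.linalg.norm([5.0, 2.0])
print(abs(v @ x2))                   # |cos(angle to x2)| approaches 1
```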

Matrix expands or shrinks any vector lying in direction of eigenvector by scalar factor

Expansion or contraction factor is given by corresponding eigenvalue

Eigenvalues and eigenvectors decompose complicated behavior of general linear transformation into simpler actions

Examples: Eigenvalues and Eigenvectors

$$A = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}: \quad \lambda_1 = 1,\; x_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad \lambda_2 = 2,\; x_2 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$$

$$A = \begin{pmatrix} 1 & 1 \\ 0 & 2 \end{pmatrix}: \quad \lambda_1 = 1,\; x_1 = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad \lambda_2 = 2,\; x_2 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}$$

$$A = \begin{pmatrix} 3 & -1 \\ -1 & 3 \end{pmatrix}: \quad \lambda_1 = 2,\; x_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad \lambda_2 = 4,\; x_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$$

$$A = \begin{pmatrix} 1.5 & 0.5 \\ 0.5 & 1.5 \end{pmatrix}: \quad \lambda_1 = 2,\; x_1 = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad \lambda_2 = 1,\; x_2 = \begin{pmatrix} 1 \\ -1 \end{pmatrix}$$

$$A = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}: \quad \lambda_1 = i,\; x_1 = \begin{pmatrix} 1 \\ i \end{pmatrix}, \quad \lambda_2 = -i,\; x_2 = \begin{pmatrix} i \\ 1 \end{pmatrix}$$

where $i = \sqrt{-1}$

Characteristic Polynomial

Equation $Ax = \lambda x$ is equivalent to $(A - \lambda I)\,x = 0$, which has a nonzero solution $x$ if, and only if, its matrix is singular

Eigenvalues of $A$ are roots $\lambda_i$ of the characteristic polynomial $\det(A - \lambda I) = 0$, of degree $n$ in $\lambda$

Fundamental Theorem of Algebra implies that an $n \times n$ matrix $A$ always has $n$ eigenvalues, but they may be neither real nor distinct

Complex eigenvalues of a real matrix occur in complex conjugate pairs: if $\alpha + i\beta$ is an eigenvalue of a real matrix, then so is $\alpha - i\beta$, where $i = \sqrt{-1}$

Example: Characteristic Polynomial

Characteristic polynomial of previous example matrix is

$$\det\left( \begin{pmatrix} 3 & -1 \\ -1 & 3 \end{pmatrix} - \lambda \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix} \right) = \det \begin{pmatrix} 3-\lambda & -1 \\ -1 & 3-\lambda \end{pmatrix} = (3-\lambda)(3-\lambda) - (-1)(-1) = \lambda^2 - 6\lambda + 8 = 0$$

so the eigenvalues are given by

$$\lambda = \frac{6 \pm \sqrt{36 - 32}}{2}, \qquad \text{or} \qquad \lambda_1 = 2, \;\; \lambda_2 = 4$$

Companion Matrix

Monic polynomial

$$p(\lambda) = c_0 + c_1 \lambda + \cdots + c_{n-1} \lambda^{n-1} + \lambda^n$$

is the characteristic polynomial of the companion matrix

$$C_n = \begin{pmatrix} 0 & 0 & \cdots & 0 & -c_0 \\ 1 & 0 & \cdots & 0 & -c_1 \\ 0 & 1 & \cdots & 0 & -c_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & -c_{n-1} \end{pmatrix}$$

Roots of a polynomial of degree $> 4$ cannot always be computed in a finite number of steps

So in general, computation of eigenvalues of matrices of order $> 4$ requires a (theoretically infinite) iterative process

Companion Matrix, n = 3

❑ Look at the determinant of $C - \lambda I$: $\;|C - \lambda I| = 0$

$$C := \begin{pmatrix} 0 & 0 & -c_0 \\ 1 & 0 & -c_1 \\ 0 & 1 & -c_2 \end{pmatrix}$$

Eigenvalues:

$$|C - \lambda I| = \begin{vmatrix} -\lambda & 0 & -c_0 \\ 1 & -\lambda & -c_1 \\ 0 & 1 & -(c_2 + \lambda) \end{vmatrix}
= -\lambda \begin{vmatrix} -\lambda & -c_1 \\ 1 & -(c_2+\lambda) \end{vmatrix} - c_0 \begin{vmatrix} 1 & -\lambda \\ 0 & 1 \end{vmatrix}
= -c_0 - c_1 \lambda - c_2 \lambda^2 - \lambda^3 = 0$$

$$\Longrightarrow\quad p(\lambda) = c_0 + c_1 \lambda + c_2 \lambda^2 + \lambda^3 = 0$$
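To illustrate, a sketch (assuming NumPy) that builds the companion matrix for a hypothetical cubic and confirms its eigenvalues are the polynomial's roots:

```python
import numpy as np

def companion(c):
    """Companion matrix of the monic polynomial
    p(lam) = c[0] + c[1]*lam + ... + c[n-1]*lam^(n-1) + lam^n."""
    n = len(c)
    C = np.zeros((n, n))
    C[1:, :-1] = np.eye(n - 1)       # ones on the subdiagonal
    C[:, -1] = -np.asarray(c)        # last column holds -c_0, ..., -c_{n-1}
    return C

# hypothetical example: p(lam) = (lam-1)(lam-2)(lam-3) = -6 + 11 lam - 6 lam^2 + lam^3
c = [-6.0, 11.0, -6.0]
ev = np.sort(np.linalg.eigvals(companion(c)).real)
print(ev)                            # approximately the roots 1, 2, 3
```

This is exactly how root-finders such as `numpy.roots` work internally: they hand the companion matrix to an eigensolver.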

Characteristic Polynomial, continued

Computing eigenvalues using the characteristic polynomial is not recommended because of

- work in computing the coefficients of the characteristic polynomial
- sensitivity of the coefficients of the characteristic polynomial
- work in solving for the roots of the characteristic polynomial

Characteristic polynomial is powerful theoretical tool but usually not useful computationally

Example: Characteristic Polynomial

Consider

$$A = \begin{pmatrix} 1 & \epsilon \\ \epsilon & 1 \end{pmatrix}$$

where $\epsilon$ is a positive number slightly smaller than $\sqrt{\epsilon_{\mathrm{mach}}}$

Exact eigenvalues of $A$ are $1 + \epsilon$ and $1 - \epsilon$

Computing the characteristic polynomial in floating-point arithmetic, we obtain

$$\det(A - \lambda I) = \lambda^2 - 2\lambda + (1 - \epsilon^2) = \lambda^2 - 2\lambda + 1,$$

which has $\lambda = 1$ as a double root

Thus, eigenvalues cannot be resolved by this method even though they are distinct in working precision
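A sketch of this failure mode (assuming NumPy; the factor 0.4 is just one choice of $\epsilon$ a bit below $\sqrt{\epsilon_{\mathrm{mach}}}$ so that $\epsilon^2$ is lost in rounding):

```python
import numpy as np

eps_mach = np.finfo(float).eps
eps = 0.4 * np.sqrt(eps_mach)        # positive, slightly smaller than sqrt(eps_mach)
A = np.array([[1.0, eps],
              [eps, 1.0]])

# Constant term of det(A - lam I) = lam^2 - 2 lam + (1 - eps^2):
c0 = 1.0 - eps**2                    # eps^2 is below rounding level, so this is 1.0
disc = (-2.0)**2 - 4.0 * c0          # discriminant is exactly 0 in floating point
root = (2.0 - np.sqrt(disc)) / 2.0   # both quadratic-formula roots collapse to 1.0

ev = np.linalg.eigvals(A)            # an eigensolver still separates 1 +/- eps
print(c0 == 1.0, root, np.sort(ev))
```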

Multiplicity and Diagonalizability

Algebraic multiplicity is the number of times a root appears when the characteristic polynomial is written as a product of linear factors

Eigenvalue of multiplicity 1 is simple

A defective matrix has an eigenvalue of algebraic multiplicity $k > 1$ with fewer than $k$ linearly independent corresponding eigenvectors; the number of independent eigenvectors is the geometric multiplicity

Nondefective matrix A has n linearly independent eigenvectors, so it is diagonalizable

$$X^{-1} A X = D$$

where X is nonsingular matrix of eigenvectors

Note: Every matrix is $\epsilon$ away from being diagonalizable.

Diagonalization

The real merit of the eigenvalue decomposition is that it simplifies powers of a matrix. Consider $X^{-1} A X = D$, diagonal. Then $AX = XD$ and

$$A = X D X^{-1}$$

$$A^2 = X D X^{-1}\, X D X^{-1} = X D^2 X^{-1}$$

$$A^k = X D X^{-1}\, X D X^{-1} \cdots X D X^{-1} = X D^k X^{-1} = X \begin{pmatrix} \lambda_1^k & & & \\ & \lambda_2^k & & \\ & & \ddots & \\ & & & \lambda_n^k \end{pmatrix} X^{-1}$$

High powers of $A$ tend to be dominated by the largest eigenpair $(\lambda_1, x_1)$, assuming $|\lambda_1| \ge |\lambda_2| \ge \cdots \ge |\lambda_n|$.
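A quick sketch (assuming NumPy) verifying $A^k = X D^k X^{-1}$ numerically on the model problem's matrix:

```python
import numpy as np

A = np.array([[4.0, -5.0],
              [2.0, -3.0]])
lam, X = np.linalg.eig(A)            # columns of X are eigenvectors, A X = X D

k = 8
Ak = X @ np.diag(lam**k) @ np.linalg.inv(X)   # X D^k X^{-1}

assert np.allclose(Ak, np.linalg.matrix_power(A, k))
print("A^k equals X D^k X^{-1} for k =", k)
```

Powers of the diagonal factor are just elementwise powers of the eigenvalues, which is what makes the decomposition so convenient.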

Matrix Powers Example

Consider our 1D finite difference example introduced earlier:

$$-\frac{d^2u}{dx^2} = f(x) \quad\longrightarrow\quad \frac{-u_{i-1} + 2u_i - u_{i+1}}{\Delta x^2} \approx f(x_i),$$

where $u(0) = u(1) = 0$ and $\Delta x = 1/(n+1)$. In matrix form,

$$Au = \frac{1}{\Delta x^2} \begin{pmatrix} 2 & -1 & & \\ -1 & 2 & -1 & \\ & \ddots & \ddots & \ddots \\ & & -1 & 2 \end{pmatrix} \begin{pmatrix} u_1 \\ u_2 \\ \vdots \\ u_m \end{pmatrix} = \begin{pmatrix} f_1 \\ f_2 \\ \vdots \\ f_m \end{pmatrix}$$

Eigenvectors and eigenvalues have closed-form expressions:

$$(z_k)_i = \sin k\pi x_i = \sin k\pi i \Delta x, \qquad \lambda_k = \frac{2}{\Delta x^2}\,(1 - \cos k\pi \Delta x)$$

Eigenvalues lie in the interval $\sim [\pi^2,\; 4(n+1)^2]$.

Matlab Example: heat_demo.m

❑ Repeatedly applying A to a random input vector reveals the eigenvalue of maximum modulus.

❑ This idea leads to one of the most common (but not most efficient) ways of finding an eigenvalue/vector pair, called the power method.

Diagonalization

Note that if we define $A^0 = I$, we have any polynomial of $A$ defined as

$$p_k(A)\,x = X \begin{pmatrix} p_k(\lambda_1) & & & \\ & p_k(\lambda_2) & & \\ & & \ddots & \\ & & & p_k(\lambda_n) \end{pmatrix} X^{-1}\, x.$$

We can further extend this to other functions,

$$f(A)\,x = X \begin{pmatrix} f(\lambda_1) & & & \\ & f(\lambda_2) & & \\ & & \ddots & \\ & & & f(\lambda_n) \end{pmatrix} X^{-1}\, x.$$

For example, the solution to $f(A)\,x = b$ would be

$$x = X\, [f(D)]^{-1}\, X^{-1}\, b.$$

The diagonalization concept is very powerful because it transforms systems • of equations into scalar equations.

Eigenspaces and Invariant Subspaces

Eigenvectors can be scaled arbitrarily: if $Ax = \lambda x$, then $A(\gamma x) = \lambda(\gamma x)$ for any scalar $\gamma$, so $\gamma x$ is also an eigenvector corresponding to $\lambda$

Eigenvectors are usually normalized by requiring some norm of eigenvector to be 1

Eigenspace $= S_\lambda = \{x : Ax = \lambda x\}$

Subspace $S$ of $\mathbb{R}^n$ (or $\mathbb{C}^n$) is invariant if $AS \subseteq S$

For eigenvectors $x_1 \cdots x_p$, $\operatorname{span}([x_1 \cdots x_p])$ is an invariant subspace. In theory…

Relevant Properties of Matrices

Properties of matrix $A$ relevant to eigenvalue problems:

- diagonal: $a_{ij} = 0$ for $i \ne j$
- tridiagonal: $a_{ij} = 0$ for $|i - j| > 1$
- triangular: $a_{ij} = 0$ for $i > j$ (upper), $a_{ij} = 0$ for $i < j$ (lower)
- Hessenberg: $a_{ij} = 0$ for $i > j + 1$ (upper), $a_{ij} = 0$ for $i < j - 1$ (lower)


Examples: Matrix Properties

Transpose: $\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}^T = \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix}$

Conjugate transpose: $\begin{pmatrix} 1+i & 1+2i \\ 2-i & 2-2i \end{pmatrix}^H = \begin{pmatrix} 1-i & 2+i \\ 1-2i & 2+2i \end{pmatrix}$

Symmetric: $\begin{pmatrix} 1 & 2 \\ 2 & 3 \end{pmatrix}$

Skew-symmetric: $\begin{pmatrix} 0 & 2 \\ -2 & 0 \end{pmatrix}$

Nonsymmetric: $\begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix}$

Hermitian: $\begin{pmatrix} 1 & 1+i \\ 1-i & 2 \end{pmatrix}$

NonHermitian: $\begin{pmatrix} 1 & 1+i \\ 1+i & 2 \end{pmatrix}$


Examples, continued

Orthogonal: $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$, $\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}$, $\begin{pmatrix} \sqrt{2}/2 & \sqrt{2}/2 \\ -\sqrt{2}/2 & \sqrt{2}/2 \end{pmatrix}$

Unitary: $\begin{pmatrix} i\sqrt{2}/2 & \sqrt{2}/2 \\ -\sqrt{2}/2 & -i\sqrt{2}/2 \end{pmatrix}$

Nonorthogonal: $\begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$

Normal: $\begin{pmatrix} 1 & 2 & 0 \\ 0 & 1 & 2 \\ 2 & 0 & 1 \end{pmatrix}$

Nonnormal: $\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}$

Normal Matrices

Normal matrices have orthogonal eigenvectors, so $x_i^H x_j = \delta_{ij}$

$$X^{-1} = X^H \quad\Longrightarrow\quad A = X D X^H$$

Normal matrices include

- symmetric ($A = A^T$)
- skew-symmetric ($A = -A^T$)
- unitary ($U^H U = I$)
- circulant (periodic + Toeplitz)
- others ...

Properties of Eigenvalue Problems

Properties of eigenvalue problem affecting choice of algorithm and software

Are all eigenvalues needed, or only a few?

Are only eigenvalues needed, or are corresponding eigenvectors also needed?

Is matrix real or complex?

Is matrix relatively small and dense, or large and sparse?

Does matrix have any special properties, such as symmetry, or is it a general matrix?

Sparsity

❑ Sparsity, either direct or implied, is a big driver in choice of eigenvalue solvers.

❑ Typically, only $O(n)$ entries in the entire matrix, where $n \sim 10^9$ to $10^{18}$ might be anticipated.

❑ Examples include Big Data (e.g., google page rank) and physics simulations (fluid, heat transfer, electromagnetics, fusion, etc.).

❑ Usually, need only a few eigenvectors / eigenvalues, k << n.

❑ Often, there are special properties of A that make it difficult (or unnecessary) to form A explicitly. Instead, work strictly with matrix-vector products $y = Ax$.

Conditioning of Eigenvalue Problems

Condition of eigenvalue problem is sensitivity of eigenvalues and eigenvectors to changes in matrix

Conditioning of eigenvalue problem is not same as conditioning of solution to linear system for same matrix

Different eigenvalues and eigenvectors are not necessarily equally sensitive to perturbations in matrix

Conditioning of Eigenvalues

If µ is eigenvalue of perturbation A + E of nondefective matrix A, then

$$|\mu - \lambda_k| \;\le\; \mathrm{cond}_2(X)\, \|E\|_2$$

where $\lambda_k$ is the closest eigenvalue of $A$ to $\mu$ and $X$ is the nonsingular matrix of eigenvectors of $A$

Absolute condition number of eigenvalues is the condition number of the matrix of eigenvectors with respect to solving linear equations

Eigenvalues may be sensitive if eigenvectors are nearly linearly dependent (i.e., matrix is nearly defective)

For a normal matrix ($A^H A = A A^H$), eigenvectors are orthogonal, so eigenvalues are well-conditioned

Conditioning of Eigenvalues

If $(A + E)(x + \Delta x) = (\lambda + \Delta\lambda)(x + \Delta x)$, where $\lambda$ is a simple eigenvalue of $A$, then

$$|\Delta\lambda| \;\lessapprox\; \frac{\|y\|_2 \cdot \|x\|_2}{|y^H x|}\, \|E\|_2 = \frac{1}{\cos(\theta)}\, \|E\|_2$$

where $x$ and $y$ are corresponding right and left eigenvectors and $\theta$ is the angle between them

For a symmetric or Hermitian matrix, right and left eigenvectors are the same, so $\cos(\theta) = 1$ and eigenvalues are inherently well-conditioned

Eigenvalues of nonnormal matrices may be sensitive

For multiple or closely clustered eigenvalues, corresponding eigenvectors may be sensitive

Problem Transformations

Shift: If $Ax = \lambda x$ and $\sigma$ is any scalar, then $(A - \sigma I)\,x = (\lambda - \sigma)\,x$, so eigenvalues of shifted matrix are shifted eigenvalues of original matrix

Inversion: If $A$ is nonsingular and $Ax = \lambda x$ with $x \ne 0$, then $\lambda \ne 0$ and $A^{-1}x = (1/\lambda)\,x$, so eigenvalues of inverse are reciprocals of eigenvalues of original matrix

Powers: If $Ax = \lambda x$, then $A^k x = \lambda^k x$, so eigenvalues of power of matrix are same power of eigenvalues of original matrix

Polynomial: If $Ax = \lambda x$ and $p(t)$ is polynomial, then $p(A)\,x = p(\lambda)\,x$, so eigenvalues of polynomial in matrix are values of polynomial evaluated at eigenvalues of original matrix

Similarity Transformation

B is similar to A if there is nonsingular matrix T such that

$$B = T^{-1} A T$$

Then

$$By = \lambda y \;\Longrightarrow\; T^{-1} A T\, y = \lambda y \;\Longrightarrow\; A\,(Ty) = \lambda\,(Ty)$$

so $A$ and $B$ have the same eigenvalues, and if $y$ is an eigenvector of $B$, then $x = Ty$ is an eigenvector of $A$

Similarity transformations preserve eigenvalues and eigenvectors are easily recovered

Example: Similarity Transformation

From eigenvalues and eigenvectors for previous example,

$$\begin{pmatrix} 3 & -1 \\ -1 & 3 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 2 & 0 \\ 0 & 4 \end{pmatrix}$$

and hence

$$\begin{pmatrix} 0.5 & 0.5 \\ 0.5 & -0.5 \end{pmatrix} \begin{pmatrix} 3 & -1 \\ -1 & 3 \end{pmatrix} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} = \begin{pmatrix} 2 & 0 \\ 0 & 4 \end{pmatrix}$$

So the original matrix is similar to a diagonal matrix, and the eigenvectors form columns of the similarity transformation matrix

Diagonal Form

Eigenvalues of diagonal matrix are diagonal entries, and eigenvectors are columns of identity matrix

Diagonal form is desirable in simplifying eigenvalue problems for general matrices by similarity transformations

But not all matrices are diagonalizable by similarity transformation

Closest one can get, in general, is Jordan form, which is nearly diagonal but may have some nonzero entries on first superdiagonal, corresponding to one or more multiple eigenvalues


Simple non-diagonalizable example, $2 \times 2$ Jordan block:

$$\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \lambda \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}$$

$$\begin{vmatrix} 1-\lambda & 1 \\ 0 & 1-\lambda \end{vmatrix} = (1 - \lambda)^2 = 0$$

Only one eigenvector: $x = \begin{pmatrix} 1 \\ 0 \end{pmatrix}$, since

$$\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = 1 \cdot \begin{pmatrix} 1 \\ 0 \end{pmatrix}$$

3 × 3 Non-Diagonalizable Example

$$A = \begin{pmatrix} 2 & & \\ & 2 & \\ & & 2 \end{pmatrix}, \qquad B = \begin{pmatrix} 2 & 1 & \\ & 2 & 1 \\ & & 2 \end{pmatrix}$$

Characteristic polynomial is $(\lambda - 2)^3$ for both $A$ and $B$, so the algebraic multiplicity is 3.

For $A$, there are three eigenvectors; say $e_1$, $e_2$, and $e_3$.

For $B$, there is only one eigenvector ($\alpha e_1$), so the geometric multiplicity of $B$'s eigenvalue is 1.

Triangular Form

Any matrix can be transformed into triangular (Schur) form by similarity, and eigenvalues of a triangular matrix are its diagonal entries

Eigenvectors of a triangular matrix are less obvious, but still straightforward to compute. If

$$A - \lambda I = \begin{pmatrix} U_{11} & u & U_{13} \\ 0 & 0 & v^T \\ O & 0 & U_{33} \end{pmatrix}$$

is triangular, then $U_{11}\, y = u$ can be solved for $y$, so that

$$x = \begin{pmatrix} y \\ -1 \\ 0 \end{pmatrix}$$

is the corresponding eigenvector

Eigenvectors / Eigenvalues of Upper Triangular Matrix

Suppose A is upper triangular

$$A = \begin{pmatrix} A_{11} & u & U_{13} \\ 0 & \lambda & v^T \\ O & 0 & A_{33} \end{pmatrix}$$

Then

$$0 = (A - \lambda I)\,x = \begin{pmatrix} U_{11} & u & U_{13} \\ 0 & 0 & v^T \\ O & 0 & U_{33} \end{pmatrix} \begin{pmatrix} y \\ -1 \\ 0 \end{pmatrix} = \begin{pmatrix} U_{11}\,y - u \\ 0 \\ 0 \end{pmatrix}$$

Because U11 is nonsingular, can solve U11y = u to find eigenvector x.

Block Triangular Form

If

$$A = \begin{pmatrix} A_{11} & A_{12} & \cdots & A_{1p} \\ & A_{22} & \cdots & A_{2p} \\ & & \ddots & \vdots \\ & & & A_{pp} \end{pmatrix}$$

with square diagonal blocks, then

$$\lambda(A) = \bigcup_{j=1}^{p} \lambda(A_{jj})$$

so the eigenvalue problem breaks into $p$ smaller eigenvalue problems

Real Schur form has $1 \times 1$ diagonal blocks corresponding to real eigenvalues and $2 \times 2$ diagonal blocks corresponding to pairs of complex conjugate eigenvalues


Eigenvalue-Revealing Factorizations

- Diagonalization: $A = X \Lambda X^{-1}$ if $A$ is nondefective.
- Unitary diagonalization: $A = Q \Lambda Q^*$ if $A$ is normal.
- Unitary triangularization: $A = Q T Q^*$ always exists ($T$ upper triangular).

Forms Attainable by Similarity

Given matrix $A$ with indicated property, matrices $B$ and $T$ exist with indicated properties such that $B = T^{-1} A T$:

    A                      T              B
    distinct eigenvalues   nonsingular    diagonal
    real symmetric         orthogonal     real diagonal
    complex Hermitian      unitary        real diagonal
    normal                 unitary        diagonal
    arbitrary real         orthogonal     real block triangular (real Schur)
    arbitrary              unitary        upper triangular (Schur)
    arbitrary              nonsingular    almost diagonal (Jordan)

If $B$ is diagonal or triangular, eigenvalues are its diagonal entries. If $B$ is diagonal, eigenvectors are columns of $T$.

Similarity Transformations

Given

$$B = T^{-1} A T \qquad\Longleftrightarrow\qquad A = T B T^{-1}$$

If $A$ is normal ($A A^H = A^H A$), then

$$A = Q\, \Lambda\, Q^H$$

i.e., $B$ is diagonal and $T$ is unitary ($T^{-1} = T^H$). If $A$ is real symmetric, then

$$A = Q\, \Lambda\, Q^T$$

i.e., $B$ is diagonal and $T$ is orthogonal ($T^{-1} = T^T$). If $B$ is diagonal, $T$ is the matrix of eigenvectors.

Power Iteration

Simplest method for computing one eigenvalue- eigenvector pair is power iteration, which repeatedly multiplies matrix times initial starting vector

Assume $A$ has a unique eigenvalue of maximum modulus, say $\lambda_1$, with corresponding eigenvector $v_1$

Then, starting from nonzero vector x0, iteration scheme

$$x_k = A x_{k-1}$$

converges to a multiple of the eigenvector $v_1$ corresponding to the dominant eigenvalue $\lambda_1$

Convergence of Power Iteration

To see why power iteration converges to the dominant eigenvector, express starting vector $x_0$ as a linear combination

$$x_0 = \sum_{i=1}^{n} \alpha_i v_i$$

where $v_i$ are eigenvectors of $A$. Then

$$x_k = A x_{k-1} = A^2 x_{k-2} = \cdots = A^k x_0 = \sum_{i=1}^{n} \lambda_i^k \alpha_i v_i = \lambda_1^k \left( \alpha_1 v_1 + \sum_{i=2}^{n} (\lambda_i/\lambda_1)^k \alpha_i v_i \right)$$

Since $|\lambda_i / \lambda_1| < 1$ for $i > 1$, successively higher powers go to zero, leaving only the component corresponding to $v_1$

2 × 2 Example

$$A = \begin{pmatrix} 1.5 & 0.5 \\ 0.5 & 1.5 \end{pmatrix}$$

$$D = \begin{pmatrix} 1 & \\ & 2 \end{pmatrix}, \qquad X = \begin{pmatrix} 1/\sqrt{2} & 1/\sqrt{2} \\ -1/\sqrt{2} & 1/\sqrt{2} \end{pmatrix}$$

Example: Power Iteration

Ratio of values of a given component of $x_k$ from one iteration to the next converges to the dominant eigenvalue $\lambda_1$.

For example, if $A = \begin{pmatrix} 1.5 & 0.5 \\ 0.5 & 1.5 \end{pmatrix}$ and $x_0 = \begin{pmatrix} 0 \\ 1 \end{pmatrix}$, the iteration $x_k = A x_{k-1}$ gives

    k      x_k^T              ratio
    0      0.0     1.0
    1      0.5     1.5        1.500
    2      1.5     2.5        1.667
    3      3.5     4.5        1.800
    4      7.5     8.5        1.889
    5     15.5    16.5        1.941
    6     31.5    32.5        1.970
    7     63.5    64.5        1.985
    8    127.5   128.5        1.992

Ratio is converging to the dominant eigenvalue, which is 2


Limitations of Power Iteration

Power iteration can fail for various reasons

Starting vector may have no component in dominant eigenvector $v_1$ (i.e., $\alpha_1 = 0$); not a problem in practice because rounding error usually introduces such a component in any case

There may be more than one eigenvalue having the same (maximum) modulus, in which case iteration may converge to a linear combination of the corresponding eigenvectors; this issue is corrected by simultaneous iteration. For a real matrix and starting vector, iteration can never converge to a complex vector

Normalized Power Iteration

Geometric growth of components at each iteration risks eventual overflow (or underflow if $|\lambda_1| < 1$)

Approximate eigenvector should be normalized at each iteration, say, by requiring its largest component to be 1 in modulus, giving iteration scheme

$$y_k = A x_{k-1}, \qquad x_k = y_k / \|y_k\|_\infty$$

With normalization, $\|y_k\|_\infty \to |\lambda_1|$, and $x_k \to v_1 / \|v_1\|_\infty$

Example: Normalized Power Iteration

Repeating previous example with normalized scheme,

    k      x_k^T            ||y_k||_inf
    0    0.000   1.0
    1    0.333   1.0        1.500
    2    0.600   1.0        1.667
    3    0.778   1.0        1.800
    4    0.882   1.0        1.889
    5    0.939   1.0        1.941
    6    0.969   1.0        1.970
    7    0.984   1.0        1.985
    8    0.992   1.0        1.992

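The normalized power iteration table above can be reproduced in a few lines; a sketch assuming NumPy:

```python
import numpy as np

A = np.array([[1.5, 0.5],
              [0.5, 1.5]])
x = np.array([0.0, 1.0])

for k in range(1, 9):
    y = A @ x
    norm = np.max(np.abs(y))         # infinity norm ||y_k||
    x = y / norm                     # normalized iterate x_k
    print(k, np.round(x, 3), round(norm, 3))
```

The final row shows $x_8 \approx [0.992,\; 1.0]$ and $\|y_8\|_\infty \approx 1.992$, matching the table.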

Geometric Interpretation

Behavior of power iteration depicted geometrically

Initial vector x0 = v1 + v2 contains equal components in eigenvectors v1 and v2 (dashed arrows)

Repeated multiplication by $A$ causes the component in $v_1$ (corresponding to the larger eigenvalue, $\lambda = 2$) to dominate, so the sequence of vectors $x_k$ converges to $v_1$

Convergence Rate of Power Iteration

Convergence rate of power iteration depends on the relative separation of $\lambda_1$ and $\lambda_2$. Assuming $c_1 \ne 0$ and $|\lambda_1| > |\lambda_j|$, $j > 1$, we have

$$A^k x = \sum_{j=1}^{n} x_j \lambda_j^k c_j = \lambda_1^k \left[ c_1 x_1 + \sum_{j=2}^{n} x_j \left( \frac{\lambda_j}{\lambda_1} \right)^k c_j \right] \;\sim\; \lambda_1^k\, c_1 x_1 \quad \text{as } k \to \infty$$

For finite $k$ the error is dominated by the second eigenpair:

$$A^k x \;\sim\; \lambda_1^k \left[ c_1 x_1 + x_2 \left( \frac{\lambda_2}{\lambda_1} \right)^k c_2 \right]$$

Power Iteration with Shift

Convergence rate of power iteration depends on the ratio $|\lambda_2 / \lambda_1|$, where $\lambda_2$ is the eigenvalue having second largest modulus

May be possible to choose shift, $A - \sigma I$, such that

$$\left| \frac{\lambda_2 - \sigma}{\lambda_1 - \sigma} \right| < \left| \frac{\lambda_2}{\lambda_1} \right|$$

so convergence is accelerated

Shift must then be added to result to obtain eigenvalue of original matrix

Note: in general, need $\max_{j \ne 1} |\lambda_j - \sigma| \,/\, |\lambda_1 - \sigma| < |\lambda_2| / |\lambda_1|$.

Example: Power Iteration with Shift

In earlier example, for instance, if we pick shift of $\sigma = 1$ (which is equal to the other eigenvalue), then the ratio becomes zero and the method converges in one iteration

In general, we would not be able to make such fortuitous choice, but shifts can still be extremely useful in some contexts, as we will see later

Idea: pick shifts at each iteration as estimate of spectrum emerges.

Power Iteration with Shift

$$(A - \sigma I)\,x = \lambda x - \sigma x = (\lambda - \sigma)\,x = \mu\, x$$

If $\lambda_k \in \{1,\; .9,\; \ldots,\; .1\}$, then

$$\frac{\lambda_2}{\lambda_1} = 0.9$$

If $\sigma = 0.5$, then $\mu_k \in \{.5,\; .4,\; \ldots,\; -.4\}$ and

$$\frac{\mu_2}{\mu_1} = 0.8,$$

so about twice the convergence rate. Shifted power iteration, however, is somewhat limited. The real power derives from inverse power iterations with shifts.

Inverse Iteration

If the smallest eigenvalue of a matrix is required rather than the largest, we can make use of the fact that the eigenvalues of $A^{-1}$ are reciprocals of those of $A$, so the smallest eigenvalue of $A$ is the reciprocal of the largest eigenvalue of $A^{-1}$

This leads to inverse iteration scheme

$$A y_k = x_{k-1}, \qquad x_k = y_k / \|y_k\|_\infty$$

which is equivalent to power iteration applied to $A^{-1}$

Inverse of A not computed explicitly, but factorization of A used to solve system of linear equations at each iteration
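A sketch of inverse iteration (assuming NumPy). For brevity it calls `solve` each pass; a real implementation would factor $A$ once and reuse the factorization at every iteration, as noted above.

```python
import numpy as np

A = np.array([[1.5, 0.5],
              [0.5, 1.5]])            # eigenvalues 2 and 1; smallest is 1
x = np.array([0.0, 1.0])

for _ in range(30):
    y = np.linalg.solve(A, x)         # solve A y_k = x_{k-1}: no explicit inverse
    x = y / np.max(np.abs(y))         # normalize in the infinity norm

lam_min = (x @ (A @ x)) / (x @ x)     # eigenvalue estimate (Rayleigh quotient)
print(round(lam_min, 6))              # converges to the smallest eigenvalue, 1
```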

Inverse Iteration, continued

Inverse iteration converges to eigenvector corresponding to smallest eigenvalue of A

Eigenvalue obtained is the dominant eigenvalue of $A^{-1}$, and hence its reciprocal is the smallest eigenvalue of $A$ in modulus

Example: Inverse Iteration

Applying inverse iteration to the previous example to compute the smallest eigenvalue yields the sequence

    k      x_k^T            ||y_k||_inf
    0    0.000   1.0
    1    0.333   1.0        0.750
    2    0.600   1.0        0.833
    3    0.778   1.0        0.900
    4    0.882   1.0        0.944
    5    0.939   1.0        0.971
    6    0.969   1.0        0.985

which is indeed converging to 1 (which is its own reciprocal in this case)


Inverse Iteration with Shift

As before, shifting strategy, working with A I for some scalar , can greatly improve convergence

Inverse iteration is particularly useful for computing eigenvector corresponding to approximate eigenvalue, since it converges rapidly when applied to shifted matrix A I, where is approximate eigenvalue Inverse iteration is also useful for computing eigenvalue closest to given value , since if is used as shift, then desired eigenvalue corresponds to smallest eigenvalue of shifted matrix
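A sketch of the shifted variant (matrix, shift, and starting vector are illustrative assumptions). Shifting by a σ near a target eigenvalue makes the corresponding eigenvalue of (A − σI)⁻¹ strongly dominant, so convergence can be very fast:

```python
def solve2(M, b):
    # 2x2 solve by Cramer's rule; a real code would factor M once and reuse it
    det = M[0][0]*M[1][1] - M[0][1]*M[1][0]
    return [(b[0]*M[1][1] - b[1]*M[0][1]) / det,
            (M[0][0]*b[1] - M[1][0]*b[0]) / det]

def shifted_inverse_iteration(A, sigma, x, iters=10):
    # Work with M = A - sigma*I; the eigenvalue of A nearest sigma dominates M^{-1}
    M = [[A[0][0] - sigma, A[0][1]],
         [A[1][0], A[1][1] - sigma]]
    for _ in range(iters):
        y = solve2(M, x)
        nrm = max(abs(c) for c in y)
        x = [c / nrm for c in y]
    # Rayleigh quotient of the ORIGINAL A gives the eigenvalue estimate
    Ax = [A[0][0]*x[0] + A[0][1]*x[1], A[1][0]*x[0] + A[1][1]*x[1]]
    lam = (x[0]*Ax[0] + x[1]*Ax[1]) / (x[0]*x[0] + x[1]*x[1])
    return lam, x

A = [[2.0, 1.0], [1.0, 2.0]]                 # assumed test matrix, eigenvalues 3 and 1
lam, x = shifted_inverse_iteration(A, 0.9, [1.0, 0.0])   # sigma = 0.9 targets eigenvalue 1
```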

Summary of variants:

• Power Iteration:  x = Aᵏx → c x₁

• Normalized Power Iteration:
      y = A x,       ‖y‖ → |λ₁|
      x = y/‖y‖  →  ±x₁

• Inverse Iteration:
      y = A⁻¹ x,     ‖y‖ → 1/|λₙ|
      x = y/‖y‖  →  ±xₙ

• Inverse Iteration with shift:  with M = A − σI,
      y = M⁻¹ x,     μₖ = max|y| → 1/|λₖ − σ|
      x = y/‖y‖  →  ±xₖ

Inverse iteration with shift can be arbitrarily fast, since the separation ratio can be near 0.

• Inverse Iteration:

    xᵏ = A⁻ᵏ x⁰ = Σⱼ₌₁ⁿ cⱼ (1/λⱼ)ᵏ xⱼ
       = (1/λₙ)ᵏ [ cₙ xₙ + Σⱼ₌₁ⁿ⁻¹ (λₙ/λⱼ)ᵏ cⱼ xⱼ ]

Matlab demo: eig_inv_demo.m (Inverse Iteration Illustration)

❑ With shift-and-invert, we can obtain a significant separation ratio for the dominant eigenvalue.

[Figure: eigenvalues of A (ratio ≈ 0.9) vs. eigenvalues of A⁻¹ (ratio = 0.5)]

Inverse Iteration with Shift: Let

    M := A − σI,    μⱼ := λⱼ − σ,

and l such that |μₗ| < |μⱼ| for j ≠ l. Then,

    xᵏ = M⁻ᵏ x⁰ = Σⱼ₌₁ⁿ cⱼ (1/μⱼ)ᵏ xⱼ
       = (1/μₗ)ᵏ [ cₗ xₗ + Σ_{j≠l} (μₗ/μⱼ)ᵏ cⱼ xⱼ ]

• Using a current approximation σ ≈ λₗ, we can make |μₗ| = |λₗ − σ| small. (Cannot do the same with shifted power iteration.)
• Blow-up is contained by normalizing after each iteration.

Matlab demo: eig_shift_invert.m

Rayleigh Quotient

Given an approximate eigenvector x for a real matrix A, determining the best estimate for the corresponding eigenvalue λ can be considered as an n × 1 linear least squares approximation problem

    x λ ≅ A x

From the normal equation xᵀx λ = xᵀAx, the least squares solution is given by

    λ = xᵀA x / xᵀx

This quantity, known as the Rayleigh quotient, has many useful properties.
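The least squares solution above is one line of code. A sketch (the example matrix and vectors are assumptions):

```python
def rayleigh_quotient(A, x):
    # lambda = x^T A x / x^T x, the least squares eigenvalue estimate for x
    Ax = [sum(a*xi for a, xi in zip(row, x)) for row in A]
    return sum(xi*yi for xi, yi in zip(x, Ax)) / sum(xi*xi for xi in x)

A = [[2.0, 1.0], [1.0, 2.0]]                 # eigenvalues 3 and 1
exact = rayleigh_quotient(A, [1.0, 1.0])     # exact eigenvector: gives 3.0
near = rayleigh_quotient(A, [1.0, 0.9])      # nearby vector: close to 3 (error O(eps^2))
```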

Rayleigh Quotient Convergence

• Assume A is symmetric and x is close to an eigenvector:

    x = c₁x₁ + Σⱼ₌₂ⁿ cⱼxⱼ,    |cⱼ| = εⱼ ≤ ε for j ≥ 2

• If ‖x‖ = 1, then the Rayleigh quotient is

    r(x) = xᵀA x / xᵀx = xᵀA x
         = Σᵢ Σⱼ cᵢ cⱼ λᵢ xᵢᵀxⱼ
         = Σᵢ cᵢ² λᵢ
         ≈ λ₁(1 − nε²) + Σᵢ₌₂ⁿ εᵢ² λᵢ

so the error in the eigenvalue estimate is O(ε²).

❑ The Rayleigh quotient is stationary at eigenvectors.
❑ It takes on its min/max when j = n or 1.

Example: Rayleigh Quotient

The Rayleigh quotient can accelerate the convergence of iterative methods such as power iteration, since the Rayleigh quotient xₖᵀAxₖ/xₖᵀxₖ gives a better approximation to the eigenvalue at iteration k than the basic method alone.

For the previous example using power iteration, the value of the Rayleigh quotient at each iteration is shown below:

    k      xₖᵀ           ‖yₖ‖∞    xₖᵀAxₖ/xₖᵀxₖ
    0   0.000  1.0
    1   0.333  1.0      1.500      1.500
    2   0.600  1.0      1.667      1.800
    3   0.778  1.0      1.800      1.941
    4   0.882  1.0      1.889      1.985
    5   0.939  1.0      1.941      1.996
    6   0.969  1.0      1.970      1.999

The Rayleigh quotient puts more emphasis on x₁, and its convergence is quadratic.

Rayleigh Quotient Iteration

Given an approximate eigenvector, the Rayleigh quotient yields a good estimate for the corresponding eigenvalue.

Conversely, inverse iteration converges rapidly to an eigenvector if an approximate eigenvalue is used as shift, with one iteration often sufficing.

These two ideas are combined in Rayleigh quotient iteration:

    λₖ = xₖᵀA xₖ / xₖᵀxₖ
    (A − λₖI) yₖ₊₁ = xₖ        (solve a system!)
    xₖ₊₁ = yₖ₊₁ / ‖yₖ₊₁‖∞

starting from a given nonzero vector x₀.
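A compact sketch of Rayleigh quotient iteration (the example matrix and starting vector are assumptions). A real code would guard more carefully against the shifted matrix becoming numerically singular, which here simply signals convergence:

```python
def rqi(A, x, iters=5):
    lam = 0.0
    for _ in range(iters):
        # Rayleigh quotient of the current iterate
        Ax = [A[0][0]*x[0] + A[0][1]*x[1], A[1][0]*x[0] + A[1][1]*x[1]]
        lam = (x[0]*Ax[0] + x[1]*Ax[1]) / (x[0]*x[0] + x[1]*x[1])
        # Solve (A - lam*I) y = x  (2x2 Cramer's rule)
        M = [[A[0][0] - lam, A[0][1]], [A[1][0], A[1][1] - lam]]
        det = M[0][0]*M[1][1] - M[0][1]*M[1][0]
        if abs(det) < 1e-14:          # shift hit an eigenvalue: converged
            break
        y = [(x[0]*M[1][1] - x[1]*M[0][1]) / det,
             (M[0][0]*x[1] - M[1][0]*x[0]) / det]
        nrm = max(abs(c) for c in y)
        x = [c / nrm for c in y]
    return lam, x

A = [[2.0, 1.0], [1.0, 2.0]]          # assumed test matrix, eigenvalues 3 and 1
lam, x = rqi(A, [1.0, 0.3])           # converges (cubically) to eigenvalue 3
```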

Rayleigh Quotient Iteration, continued

Rayleigh quotient iteration is especially effective for symmetric matrices and usually converges very rapidly

Using different shift at each iteration means matrix must be refactored each time to solve linear system, so cost per iteration is high unless matrix has special form that makes factorization easy

Same idea also works for complex matrices, for which transpose is replaced by conjugate transpose, so Rayleigh quotient becomes xH Ax/xH x

Matlab Demo: eig_shifted_rq.m

Example: Rayleigh Quotient Iteration

Using the same matrix as in previous examples and a randomly chosen starting vector x₀, Rayleigh quotient iteration converges in two iterations:

    k      xₖᵀ            λₖ
    0   0.807  0.397    1.896
    1   0.924  1.000    1.998
    2   1.000  1.000    2.000

Deflation – Finding the Second Eigenpair

Deflation: chasing the second eigenpair (x₂, λ₂).

Choose H to be an elementary Householder matrix such that Hx₁ = αe₁. Consider:

    A x₁ = λ₁ x₁
    A H⁻¹ H x₁ = λ₁ H⁻¹ H x₁
    A H⁻¹ αe₁ = λ₁ H⁻¹ αe₁
    H A H⁻¹ e₁ = λ₁ e₁

Thus, the first column of A₂ := HAH⁻¹ is λ₁e₁:

    A₂ = HAH⁻¹ [ e₁ e₂ ... eₙ ] = [ λ₁  bᵀ ]
                                  [ 0   B  ]

Apply the method of choice to B to find λ₂.

Deflation

After eigenvalue λ₁ and corresponding eigenvector x₁ have been computed, additional eigenvalues λ₂, ..., λₙ of A can be computed by deflation, which effectively removes the known eigenvalue.

Let H be any nonsingular matrix such that Hx₁ = αe₁, a scalar multiple of the first column of the identity matrix (a Householder transformation is a good choice for H).

Then the similarity transformation determined by H transforms A into the form

    HAH⁻¹ = [ λ₁  bᵀ ]
            [ 0   B  ]

where B is a matrix of order n−1 having eigenvalues λ₂, ..., λₙ.

Deflation, continued

Thus, we can work with B to compute the next eigenvalue λ₂.

Moreover, if y₂ is an eigenvector of B corresponding to λ₂, then

    x₂ = H⁻¹ [ α  ]      where  α = bᵀy₂ / (λ₂ − λ₁),
             [ y₂ ]

is the eigenvector corresponding to λ₂ for the original matrix A, provided λ₁ ≠ λ₂.

The process can be repeated to find additional eigenvalues and eigenvectors.

Matrix-Free Deflation

Deflation via Projection:

    for k = 1, 2, ...
        y = A x
        y = y − x₁ (x₁ᵀ y)      (project out the converged eigenvector x₁)
        x = y / ‖y‖
    end

• Does not require explicit knowledge of A.
• Only needs a routine that provides y ← Ax.

Deflation on the Fly – Subspace Iteration
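A sketch of the projected power iteration just described (pure Python; the matrix, its known unit dominant eigenvector x₁, and the iteration count are assumptions). Projecting out x₁ after each matrix-vector product drives the iteration to the second eigenpair:

```python
def matvec(A, x):
    return [sum(a*xi for a, xi in zip(row, x)) for row in A]

def deflated_power(A, x1, x, iters=100):
    # x1: already-computed dominant eigenvector, normalized to unit 2-norm
    for _ in range(iters):
        y = matvec(A, x)
        c = sum(a*b for a, b in zip(x1, y))      # x1^T y
        y = [yi - c*q for yi, q in zip(y, x1)]   # project out x1
        nrm = sum(v*v for v in y) ** 0.5
        x = [yi / nrm for yi in y]
    return x

s = 2 ** 0.5
A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]   # eigenvalues 4+s, 4, 4-s
x1 = [0.5, s/2, 0.5]                                      # exact dominant eigenvector
x2 = deflated_power(A, x1, [1.0, 0.0, 0.0])               # -> second eigenvector [1,0,-1]/s
```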

We can effect deflation on the fly (Subspace Iteration).

• Take two independent vectors Y = (y₁ y₂):

    for k = 1, 2, ...
        Z = A Y
        y₁ = z₁ / ‖z₁‖
        y₂ = z₂ − y₁ (y₁ᵀ z₂)
        y₂ = y₂ / ‖y₂‖
    end

• (y₁ y₂) converge to the first two eigenvectors (x₁ x₂).
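A Python sketch of this two-vector iteration (the named demo is MATLAB; the 3×3 symmetric test matrix and starting vectors are assumptions):

```python
def matvec(A, x):
    return [sum(a*xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(ui*vi for ui, vi in zip(u, v))

def normalize(v):
    n = dot(v, v) ** 0.5
    return [vi / n for vi in v]

def two_vector_iteration(A, y1, y2, iters=100):
    for _ in range(iters):
        z1, z2 = matvec(A, y1), matvec(A, y2)
        y1 = normalize(z1)
        c = dot(y1, z2)                          # project z2 against y1 on the fly
        y2 = normalize([z - c*q for z, q in zip(z2, y1)])
    return y1, y2

# Symmetric tridiagonal test matrix with eigenvalues 4+sqrt(2), 4, 4-sqrt(2)
A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
y1, y2 = two_vector_iteration(A, [1.0, 0.0, 0.0], [0.0, 0.0, 1.0])
```

The Rayleigh quotients of y₁ and y₂ then approximate the two largest eigenvalues.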

Matlab Demo: eig_simultaneous

Simultaneous Iteration

The simplest method for computing many eigenvalue-eigenvector pairs is simultaneous iteration, which repeatedly multiplies the matrix times a matrix of initial starting vectors.

Starting from an n × p matrix X₀ of rank p, the iteration scheme is

    Xₖ = A Xₖ₋₁

span(Xₖ) converges to the invariant subspace determined by the p largest eigenvalues of A, provided |λₚ| > |λₚ₊₁|.

Also called subspace iteration.

Subspace Iteration Variants

Let X₀ ∈ ℝⁿˣᵖ be a matrix of rank p.

• Alg. 1:
    for k = 1, 2, ...
        Xₖ = A Xₖ₋₁
    end

• Alg. 2:
    for k = 1, 2, ...
        Q R = X        (Q ∈ ℝⁿˣᵖ orthogonal)
        X = A Q
    end

Note: in power iteration notation, Alg. 2 would be:

    for k = 1, 2, ...
        Y = A X
        Q R = Y        (Q ∈ ℝⁿˣᵖ orthogonal)
        X = Y R⁻¹      (orthonormalize)
    end

and Xₖ → (x₁ x₂ ... xₚ). So, for Alg. 2:

    Qₖ = (q₁ q₂ ... qₚ) → (x₁ x₂ ... xₚ),
    Xₖ → (λ₁x₁ λ₂x₂ ... λₚxₚ).

Orthogonal Iteration

As with power iteration, normalization is needed with simultaneous iteration.

Each column of Xₖ converges to the dominant eigenvector, so the columns of Xₖ become an increasingly ill-conditioned basis for span(Xₖ).

Both issues can be addressed by computing a QR factorization at each iteration:

    Q̂ₖ Rₖ = Xₖ₋₁
    Xₖ = A Q̂ₖ

where Q̂ₖRₖ is the reduced QR factorization of Xₖ₋₁.

This orthogonal iteration converges to block triangular form, and the leading block is triangular if the moduli of consecutive eigenvalues are distinct.
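A sketch of orthogonal iteration for p = 2 (modified Gram-Schmidt stands in for a library QR factorization; the test matrix and starting columns are assumptions):

```python
def mgs_qr_columns(cols):
    # Modified Gram-Schmidt on a list of column vectors; returns orthonormal columns
    q = [list(c) for c in cols]
    for j in range(len(q)):
        for i in range(j):
            r = sum(a*b for a, b in zip(q[i], q[j]))
            q[j] = [a - r*b for a, b in zip(q[j], q[i])]
        nrm = sum(a*a for a in q[j]) ** 0.5
        q[j] = [a / nrm for a in q[j]]
    return q

def orthogonal_iteration(A, cols, iters=200):
    for _ in range(iters):
        q = mgs_qr_columns(cols)      # Qhat_k R_k = X_{k-1}
        cols = [[sum(a*xi for a, xi in zip(row, c)) for row in A] for c in q]  # X_k = A Qhat_k
    return mgs_qr_columns(cols)

A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]  # eigenvalues 4+sqrt(2), 4, 4-sqrt(2)
Q = orthogonal_iteration(A, [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
```

The columns of Q approximate the two dominant eigenvectors; their Rayleigh quotients approximate 4 + √2 and 4.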

QR Iteration

For p = n and X₀ = I, the matrices

    Aₖ = Q̂ₖᴴ A Q̂ₖ

generated by orthogonal iteration converge to triangular or block triangular form, yielding all eigenvalues of A.

QR iteration computes the successive matrices Aₖ without forming the above product explicitly.

Starting with A₀ = A, at iteration k compute the QR factorization

    Qₖ Rₖ = Aₖ₋₁

and form the reverse product

    Aₖ = Rₖ Qₖ

QR Iteration, continued

Successive matrices Aₖ are unitarily similar to each other:

    Aₖ = Rₖ Qₖ = Qₖᴴ Aₖ₋₁ Qₖ

Diagonal entries (or eigenvalues of diagonal blocks) of Ak converge to eigenvalues of A

Product of orthogonal matrices Qk converges to matrix of corresponding eigenvectors

If A is symmetric, then symmetry is preserved by QR iteration, so Ak converge to matrix that is both triangular and symmetric, hence diagonal

QR Iteration

Recall the similarity transformation:

    A ∼ B := T⁻¹AT        (same eigenvalues)

With Bz = λz, we have:

    Bz = T⁻¹ATz = λz   ⟹   A(Tz) = λ(Tz)   ⟹   Ax = λx,  with x := Tz

Starting with A₀ = A, we consider a sequence of similarity transformations:

    Aₖ = Qₖᵀ Aₖ₋₁ Qₖ = Qₖ⁻¹ Aₖ₋₁ Qₖ

QR Iteration

Start with A₀ = A.

• Alg. QR:
    for k = 1, 2, ...
        Qₖ Rₖ = Aₖ₋₁       (so Rₖ = Qₖᵀ Aₖ₋₁ and Qₖ = Aₖ₋₁ Rₖ⁻¹)
        Aₖ = Rₖ Qₖ         (so Aₖ = Qₖᵀ Aₖ₋₁ Qₖ = Rₖ Aₖ₋₁ Rₖ⁻¹)
    end

• Net result is a similarity transformation:

    Aₖ = (Qₖᵀ Qₖ₋₁ᵀ ··· Q₁ᵀ) A (Q₁ Q₂ ··· Qₖ)

• Eigenvalues and symmetry are preserved.
• Can start with a different orthogonal matrix: A₀ := Q₀ᵀ A Q₀.

First, define the following quantities:

    Q̂ₖ := Q₁ Q₂ ··· Qₖ
    Aₖ := Q̂ₖᵀ A Q̂ₖ
    R̂ₖ := Rₖ Rₖ₋₁ ··· R₁,

where Q̂ₖ and Qₖ are orthogonal matrices. Consider the QR subspace iteration with p = n:

    for k = 1, 2, ...
        Q̂ₖ Rₖ = Xₖ₋₁
        Xₖ = A Q̂ₖ
    end

We have the following results:

    Q̂ₖ₊₁ Rₖ₊₁ = A Q̂ₖ = Q̂ₖ (Q̂ₖᵀ A Q̂ₖ) = Q̂ₖ Aₖ

so, with Qₖ₊₁ := Q̂ₖᵀ Q̂ₖ₊₁ (i.e., Q̂ₖ₊₁ = Q̂ₖ Qₖ₊₁), we get Aₖ = Qₖ₊₁ Rₖ₊₁ and

    Q̂ₖ₊₁ Rₖ₊₁ Qₖ₊₁ = Q̂ₖ Qₖ₊₁ (Rₖ₊₁ Qₖ₊₁) = (Q̂ₖ Qₖ₊₁) Aₖ₊₁ = Q̂ₖ₊₁ Aₖ₊₁,

with

    Rₖ₊₁ Qₖ₊₁ = Aₖ₊₁.

Comparison of QR and Subspace Iteration

Thus, we have a way of generating Q̂ₖ := Q₁Q₂···Qₖ through the following QR iteration:

    for k = 1, 2, ...
        Qₖ Rₖ = Aₖ₋₁
        Aₖ = Rₖ Qₖ
        Q̂ₖ = Q̂ₖ₋₁ Qₖ
    end

• If A is normal, the columns of Q̂ₖ approach eigenvectors.

QR Iteration

Note that Q₁R₁ = A₀ = A. Then

    A² = Q₁ R₁ Q₁ R₁
       = Q₁ (Q₂ R₂) R₁ =: Q̂₂ R̂₂

    A³ = Q₁ R₁ Q₁ R₁ Q₁ R₁
       = Q₁ (Q₂ R₂ Q₂ R₂) R₁
       = Q₁ Q₂ (Q₃ R₃) R₂ R₁ =: Q̂₃ R̂₃

    ⋮

    Aᵏ = Q̂ₖ R̂ₖ

• The algorithm produces successive powers of A.

Example: QR Iteration

Let

    A = [ 7  2 ]
        [ 2  4 ]

Compute the QR factorization

    A₀ = Q₁R₁ = [ .962  −.275 ] [ 7.28  3.02 ]
                [ .275   .962 ] [ 0     3.30 ]

and form the reverse product

    A₁ = R₁Q₁ = [ 7.83  .906 ]
                [ .906  3.17 ]

Off-diagonal entries are now smaller, and diagonal entries are closer to the eigenvalues, 8 and 3.

The process continues until the matrix is within tolerance of being diagonal; the diagonal entries then closely approximate the eigenvalues.
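A sketch of unshifted QR iteration on this 2×2 example (a single Givens rotation provides the 2×2 QR factorization; the iteration count is an arbitrary choice):

```python
import math

def qr_step(A):
    # QR factorization of a 2x2 matrix via one Givens rotation,
    # followed by the reverse product A_k = R Q
    a, b = A[0][0], A[1][0]
    r = math.hypot(a, b)
    c, s = a / r, b / r                     # G = [[c, s], [-s, c]] zeroes A[1][0]
    R = [[c*A[0][0] + s*A[1][0], c*A[0][1] + s*A[1][1]],
         [-s*A[0][0] + c*A[1][0], -s*A[0][1] + c*A[1][1]]]
    # Q = G^T; form R @ Q
    return [[R[0][0]*c + R[0][1]*s, -R[0][0]*s + R[0][1]*c],
            [R[1][0]*c + R[1][1]*s, -R[1][0]*s + R[1][1]*c]]

A = [[7.0, 2.0], [2.0, 4.0]]                # eigenvalues 8 and 3
for _ in range(20):
    A = qr_step(A)
```

After 20 steps the off-diagonal entries have decayed like (3/8)ᵏ and the diagonal holds the eigenvalues.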

QR Iteration with Shifts

The convergence rate of QR iteration can be accelerated by incorporating shifts:

    Qₖ Rₖ = Aₖ₋₁ − σₖI
    Aₖ = Rₖ Qₖ + σₖI

where σₖ is a rough approximation to an eigenvalue.

A good shift can be determined by computing the eigenvalues of the 2 × 2 submatrix in the lower right corner of the matrix.

Example: QR Iteration with Shifts

Repeat the previous example, but with shift σ₁ = 4, the lower right corner entry of the matrix. We compute the QR factorization

    A₀ − σ₁I = Q₁R₁ = [ .832   .555 ] [ 3.61  1.66 ]
                      [ .555  −.832 ] [ 0     1.11 ]

and form the reverse product, adding back the shift, to obtain

    A₁ = R₁Q₁ + σ₁I = [ 7.92  .615 ]
                      [ .615  3.08 ]

After one iteration, the off-diagonal entries are smaller than with the unshifted algorithm, and the diagonal entries are closer approximations to the eigenvalues.

Matlab demo: eig_qr_w_shifts.m
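A sketch of the shifted step (the named demo is MATLAB; shifting by the lower-right entry, as in the example, is essentially a Rayleigh quotient shift and converges in very few steps here):

```python
import math

def shifted_qr_step(A, sigma):
    # One step of shifted QR: factor A - sigma*I = QR (Givens), form R*Q + sigma*I
    B = [[A[0][0] - sigma, A[0][1]], [A[1][0], A[1][1] - sigma]]
    r = math.hypot(B[0][0], B[1][0])
    c, s = B[0][0] / r, B[1][0] / r
    R = [[c*B[0][0] + s*B[1][0], c*B[0][1] + s*B[1][1]],
         [0.0, -s*B[0][1] + c*B[1][1]]]
    RQ = [[R[0][0]*c + R[0][1]*s, -R[0][0]*s + R[0][1]*c],
          [R[1][1]*s, R[1][1]*c]]
    return [[RQ[0][0] + sigma, RQ[0][1]], [RQ[1][0], RQ[1][1] + sigma]]

A = [[7.0, 2.0], [2.0, 4.0]]              # eigenvalues 8 and 3
for _ in range(5):
    A = shifted_qr_step(A, A[1][1])       # shift by the lower-right entry
```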

Preliminary Reduction

The efficiency of QR iteration can be enhanced by first transforming the matrix as close to triangular form as possible before beginning iterations.

A Hessenberg matrix is triangular except for one additional nonzero diagonal immediately adjacent to the main diagonal.

Any matrix can be reduced to Hessenberg form in a finite number of steps by an orthogonal similarity transformation, for example using Householder transformations.

A symmetric Hessenberg matrix is tridiagonal.

Hessenberg or tridiagonal form is preserved during successive QR iterations.

Upper Hessenberg × Upper Triangular is Upper Hessenberg

    [ x x x x x ]   [ x x x x x ]   [ x x x x x ]
    [ x x x x x ]   [   x x x x ]   [ x x x x x ]
    [   x x x x ] × [     x x x ] = [   x x x x ]
    [     x x x ]   [       x x ]   [     x x x ]
    [       x x ]   [         x ]   [       x x ]

• The same holds for upper triangular × upper Hessenberg.

• Start QR iteration with A₀ := Q₀ᵀAQ₀ such that A₀ is upper Hessenberg.

• Then each Aₖ is upper Hessenberg, and QₖRₖ can be computed with Givens rotations in O(n²) operations, instead of O(n³).

• With shifted QR, only O(n) iterations are needed, so the total cost is O(n³).

QR iteration:

    for k = 1, 2, ...
        Qₖ Rₖ = Aₖ₋₁
        Aₖ = Rₖ Qₖ
    end

Preliminary Reduction, continued

Advantages of initial reduction to upper Hessenberg or tridiagonal form:

Work per QR iteration is reduced from O(n³) to O(n²) for a general matrix, or O(n) for a symmetric matrix.

Fewer QR iterations are required because the matrix is nearly triangular (or diagonal) already.

If there are any zero entries on the first subdiagonal, then the matrix is block triangular and the problem can be broken into two or more smaller subproblems.

Preliminary Reduction, continued

QR iteration is implemented in two stages:

    symmetric → tridiagonal → diagonal

or

    general → Hessenberg → triangular

Preliminary reduction requires a definite number of steps, whereas the subsequent iterative stage continues until convergence.

In practice only a modest number of iterations is usually required, so much of the work is in the preliminary reduction.

The cost of accumulating eigenvectors, if needed, dominates the total cost.

Cost of QR Iteration

Approximate overall cost of preliminary reduction and QR iteration, counting both additions and multiplications:

Symmetric matrices:
    (4/3)n³ for eigenvalues only
    9n³ for eigenvalues and eigenvectors

General matrices:
    10n³ for eigenvalues only
    25n³ for eigenvalues and eigenvectors

Krylov Subspace Methods

• Assume in the following that A is a symmetric positive definite matrix with eigenvalues λ₁ ≥ λ₂ ≥ ··· ≥ λₙ.

• Suppose we take k iterations of the power method to approximate λ₁:

    Algorithm:  yₖ = Aᵏx       Code:  y = x
                                      for j = 1:k
                                          y = A*y
                                      end

    λ₁ ≈ yₖᵀAyₖ / yₖᵀyₖ = yᵀAy / yᵀy

• This approach uses no information from the preceding iterates yₖ₋₁, yₖ₋₂, ..., y₁.

Krylov Subspace Methods

• Suppose instead we seek (for y ≠ 0)

    max_{y ∈ 𝒦ₖ₊₁} yᵀAy / yᵀy = max_{y ∈ ℙₖ(A)x} yᵀAy / yᵀy ≤ max_{y ∈ ℝⁿ} yᵀAy / yᵀy = λ₁.

• Here, the Krylov subspace

    𝒦ₖ₊₁ = 𝒦ₖ₊₁(A, x) = span{ x, Ax, A²x, ..., Aᵏx }

  is the space of all polynomials of degree ≤ k in A times x:

    𝒦ₖ₊₁(A, x) = ℙₖ(A)x.

Krylov Subspace Methods

• Consider the n × (k+1) matrix

    Vₖ₊₁ = [ x  Ax  A²x  ···  Aᵏx ]

• Assuming Vₖ₊₁ has full rank (k+1 linearly independent columns), any y ∈ 𝒦ₖ₊₁ has the form

    y = Vₖ₊₁ z,    z ∈ ℝᵏ⁺¹.

• Our optimal Rayleigh quotient amounts to

    max_{y ∈ 𝒦ₖ₊₁} yᵀAy / yᵀy = max_{z ∈ ℝᵏ⁺¹} zᵀVₖ₊₁ᵀAVₖ₊₁z / zᵀVₖ₊₁ᵀVₖ₊₁z

Krylov Subspace Methods

• If we had columns vⱼ that were orthonormal (vᵢᵀvⱼ = δᵢⱼ), then we'd have Vₖ₊₁ᵀVₖ₊₁ = I and

    max_{y ∈ ℛ(Vₖ₊₁)} yᵀAy / yᵀy = max_{z ∈ ℝᵏ⁺¹} zᵀVₖ₊₁ᵀAVₖ₊₁z / zᵀVₖ₊₁ᵀVₖ₊₁z
                                 = max_{z ∈ ℝᵏ⁺¹} zᵀTₖ₊₁z / zᵀz
                                 = μ₁(Tₖ₊₁) ≤ λ₁(A).

• Here, μ₁ is the maximum eigenvalue of Tₖ₊₁ := Vₖ₊₁ᵀAVₖ₊₁.

Krylov Subspace Methods

• This is the idea behind Arnoldi / Lanczos iteration.
• We use information from the entire Krylov subspace to generate optimal eigenpair approximations.
• They require only matrix-vector products (unlike QR iteration, which requires all of A).
• The approximation to λ₁ is given by the eigenvalue μ₁ of the much smaller (k+1) × (k+1) matrix Tₖ₊₁.
• It is the closest approximation to λ₁ out of all possible polynomials of degree ≤ k in A, and therefore superior (or equal) to the power method.
• Similarly, μₖ₊₁ is the closest approximation to λₙ.
• The methods produce the best possible approximations (in this subspace) to the extreme eigenvalue/eigenvector pairs.
• Middle eigenpairs are more challenging; one must use shift & invert.

Krylov Subspace Methods

• Note, for ‖z‖ = 1,

    μ₁ = max_{‖z‖=1} zᵀTₖ₊₁z

  corresponds to z = z₁, so

    μ₁ = z₁ᵀTₖ₊₁z₁ = z₁ᵀVₖ₊₁ᵀAVₖ₊₁z₁ ≈ λ₁.

• So, the corresponding eigenvector approximation for Ax₁ = λ₁x₁ is

    x₁ ≈ Vₖ₊₁z₁.

Krylov Subspace Methods

• Remark: Shifting does not improve Lanczos / Arnoldi. WHY?

  – If p(x) ∈ ℙₖ, so is p(x+1), p(ax+b), etc.
  – So: 𝒦ₖ(A, x) ≡ 𝒦ₖ(A + αI, x).
  – The spaces are the same, and the Krylov subspace projections will find the same optimal solutions.

• Shifting may help with conditioning, however, in certain circumstances.

• The essential step of the algorithms is to construct, step by step, an orthogonal basis for 𝒦ₖ.

• We turn to this for the symmetric (Lanczos) case.

Krylov Subspace Methods

• We start with the symmetric case, known as Lanczos iteration.
• The essence of the method is to construct, step by step, an orthogonal basis for 𝒦ₖ:

    q₁ = x/‖x‖
    for k = 1, 2, ...
        u = Aqₖ,    u₀ = ‖u‖
        u = u − Pₖu
        βₖ = ‖u‖
        if βₖ/u₀ < ε, break
        qₖ₊₁ = u/βₖ
    end

• Here, Pₖ := QₖQₖᵀ is the orthogonal projector onto ℛ(Qₖ), so

    u − Pₖu = u − Σⱼ₌₁ᵏ qⱼ (qⱼᵀu),

  which is implemented as modified Gram–Schmidt.

• Q: Why is u₀ useful?

Krylov Subspace Generation

• Notice how the orthogonal subspace is constructed:

    q₁ ∈ span{x}
    q₂ ∈ span{x, Ax}
    qₖ ∈ span{x, Ax, ..., Aᵏ⁻¹x} = 𝒦ₖ

• In the algorithm, we have u = Aqₖ, and

    qⱼᵀu = qⱼᵀAqₖ = qₖᵀAᵀqⱼ = qₖᵀAqⱼ = qₖᵀwⱼ₊₁,    wⱼ₊₁ := Aqⱼ ∈ 𝒦ⱼ₊₁

• However, qₖ ⊥ 𝒦ⱼ₊₁ whenever j + 1 < k, so only the j = k−1 and j = k Gram–Schmidt coefficients are nonzero: the recurrence has just three terms.

• The Lanczos iteration is:

    q₁ = x/‖x‖
    for k = 1, 2, ...
        u = Aqₖ,    u₀ = ‖u‖
        αₖ = qₖᵀu
        u = u − αₖqₖ − βₖ₋₁qₖ₋₁
        βₖ = ‖u‖
        if βₖ/u₀ < ε, break
        qₖ₊₁ = u/βₖ
    end

Lanczos Iteration (A Symmetric)

• In matrix form,

    A Qₖ = Qₖ Tₖ + βₖ₊₁ qₖ₊₁ eₖᵀ

• Or,

    A [q₁ q₂ ··· qₖ] = [q₁ q₂ ··· qₖ] [ α₁  β₁              ]
                                      [ β₁  α₂   ⋱          ]  +  βₖ₊₁ qₖ₊₁ eₖᵀ.
                                      [      ⋱   ⋱    βₖ₋₁  ]
                                      [          βₖ₋₁  αₖ   ]

Arnoldi Iteration (A Nonsymmetric)

• Arnoldi iteration is essentially the same as Lanczos, save that we do not get the short-term recurrence:

    q₁ = x/‖x‖
    for k = 1, 2, ...
        u = Aqₖ,    u₀ = ‖u‖
        u = u − Pₖu
        βₖ = ‖u‖
        if βₖ/u₀ < ε, break
        qₖ₊₁ = u/βₖ
    end

Krylov Subspace Methods

Krylov subspace methods reduce matrix to Hessenberg (or tridiagonal) form using only matrix-vector multiplication

For an arbitrary starting vector x₀, if

    Kₖ = [ x₀  Ax₀  ···  Aᵏ⁻¹x₀ ]

then

    Kₙ⁻¹ A Kₙ = Cₙ

where Cₙ is upper Hessenberg (in fact, a companion matrix).

To obtain a better conditioned basis for span(Kₙ), compute the QR factorization

    Qₙ Rₙ = Kₙ

so that

    Qₙᴴ A Qₙ = Rₙ Cₙ Rₙ⁻¹ ≡ H

with H upper Hessenberg.

Krylov Subspace Methods

Equating kth columns on each side of equation AQn = QnH yields recurrence

    Aqₖ = h₁ₖq₁ + ··· + hₖₖqₖ + hₖ₊₁,ₖqₖ₊₁

relating qₖ₊₁ to the preceding vectors q₁, ..., qₖ.

Premultiplying by qⱼᴴ and using orthonormality,

    hⱼₖ = qⱼᴴAqₖ,    j = 1, ..., k

These relationships yield Arnoldi iteration, which produces unitary matrix Qn and upper Hessenberg matrix Hn column by column using only matrix-vector multiplication by A and inner products of vectors

Krylov Subspace Projections

[Figure: Aqₖ splits into its projection onto ℛ(Kₖ) and the orthogonal remainder hₖ₊₁,ₖ qₖ₊₁]

• Notice that Aqⱼ will be in ℛ(Kₖ) for all j < k.
• qₖ₊₁ ⊥ ℛ(AKₖ₋₁) ⊂ ℛ(Kₖ)

Krylov Subspace and Similarity Transformation

• Consider the rank-k matrix

    Kₖ := [ x₀  Ax₀  A²x₀  ···  Aᵏ⁻¹x₀ ],

  and the associated Krylov subspace 𝒦ₖ := ℛ(Kₖ).

• Krylov subspace methods work with the orthogonal vectors qₖ ∈ 𝒦ₖ, k = 1, 2, ..., satisfying QR = Kₖ.

• The similarity transformation

    Q⁻¹AQ = QᵀAQ = H

  with entries hᵢⱼ = qᵢᵀAqⱼ is upper Hessenberg.

• Proof:
  – qᵢᵀu = 0 for any u ∈ 𝒦ₖ if i > k.
  – u = Aqⱼ ∈ 𝒦ⱼ₊₁.
  – Hence qᵢᵀAqⱼ = 0 for i > j + 1.

Arnoldi Iteration

    x₀ = arbitrary nonzero starting vector
    q₁ = x₀/‖x₀‖₂
    for k = 1, 2, ...
        uₖ = Aqₖ
        for j = 1 to k
            hⱼₖ = qⱼᴴuₖ
            uₖ = uₖ − hⱼₖqⱼ
        end
        hₖ₊₁,ₖ = ‖uₖ‖₂
        if hₖ₊₁,ₖ = 0 then stop
        qₖ₊₁ = uₖ/hₖ₊₁,ₖ
    end
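The pseudocode above translates almost line for line into Python (pure lists, no libraries; the symmetric test matrix is an assumption). For a symmetric matrix H comes out tridiagonal, which makes an easy correctness check:

```python
def matvec(A, x):
    return [sum(a*xi for a, xi in zip(row, x)) for row in A]

def arnoldi(A, x0, m):
    # Build up to m orthonormal Krylov basis vectors Q and the (m+1) x m Hessenberg H
    nrm = sum(c*c for c in x0) ** 0.5
    Q = [[c / nrm for c in x0]]
    H = [[0.0] * m for _ in range(m + 1)]
    for k in range(m):
        u = matvec(A, Q[k])
        for j in range(k + 1):                 # modified Gram-Schmidt
            H[j][k] = sum(a*b for a, b in zip(Q[j], u))
            u = [a - H[j][k]*b for a, b in zip(u, Q[j])]
        H[k+1][k] = sum(c*c for c in u) ** 0.5
        if H[k+1][k] == 0.0:                   # invariant subspace found
            break
        Q.append([c / H[k+1][k] for c in u])
    return Q, H

A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
Q, H = arnoldi(A, [1.0, 0.0, 0.0], 3)
```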

The j loop is modified Gram-Schmidt; in exact arithmetic it computes

    uₖ ← (I − QQᵀ) uₖ = uₖ − QQᵀuₖ = uₖ − [ h₁ₖq₁ + h₂ₖq₂ + ··· + hₖₖqₖ ]

Q: There are two projectors in the lines above. Where are they?

Arnoldi Iteration, continued

If Qₖ = [q₁ ··· qₖ], then

    Hₖ = Qₖᴴ A Qₖ

is an upper Hessenberg matrix.

Eigenvalues of Hₖ, called Ritz values, are approximate eigenvalues of A, and Ritz vectors given by Qₖy, where y is an eigenvector of Hₖ, are the corresponding approximate eigenvectors of A.

Eigenvalues of Hₖ must be computed by another method, such as QR iteration, but this is an easier problem if k ≪ n.

Arnoldi Iteration, continued

Arnoldi iteration is fairly expensive in work and storage because each new vector qₖ must be orthogonalized against all previous columns of Qₖ, and all must be stored for that purpose.

So Arnoldi process usually restarted periodically with carefully chosen starting vector

Ritz values and vectors produced are often good approximations to eigenvalues and eigenvectors of A after relatively few iterations

A reasonable restart choice: current approximate eigenvector

Lanczos Iteration

Work and storage costs drop dramatically if the matrix is symmetric or Hermitian, since the recurrence then has only three terms and Hₖ is tridiagonal (so usually denoted Tₖ).

    q₀ = 0
    β₀ = 0
    x₀ = arbitrary nonzero starting vector
    q₁ = x₀/‖x₀‖₂
    for k = 1, 2, ...
        uₖ = Aqₖ
        αₖ = qₖᴴuₖ
        uₖ = uₖ − βₖ₋₁qₖ₋₁ − αₖqₖ
        βₖ = ‖uₖ‖₂
        if βₖ = 0 then stop
        qₖ₊₁ = uₖ/βₖ
    end
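A direct transcription of the Lanczos pseudocode (pure Python; the symmetric test matrix is an assumption). The α's and β's collected are the diagonal and subdiagonal of Tₖ:

```python
def matvec(A, x):
    return [sum(a*xi for a, xi in zip(row, x)) for row in A]

def lanczos(A, x0, m):
    n = len(x0)
    nrm = sum(c*c for c in x0) ** 0.5
    q_prev = [0.0] * n                  # q_0 = 0
    q = [c / nrm for c in x0]           # q_1
    alphas, betas = [], [0.0]           # beta_0 = 0
    for _ in range(m):
        u = matvec(A, q)
        alpha = sum(a*b for a, b in zip(q, u))
        u = [ui - betas[-1]*qp - alpha*qi for ui, qp, qi in zip(u, q_prev, q)]
        beta = sum(c*c for c in u) ** 0.5
        alphas.append(alpha)
        if beta == 0.0:                 # invariant subspace already identified
            break
        betas.append(beta)
        q_prev, q = q, [c / beta for c in u]
    return alphas, betas[1:]            # diagonal and subdiagonal of T_k

A = [[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]]
alphas, betas = lanczos(A, [1.0, 0.0, 0.0], 3)
```

For this matrix the exact tridiagonalization is recovered: alphas = [4, 4, 4] and betas = [1, 1], with β₃ = 0 terminating the loop.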

Lanczos Iteration, continued

αₖ and βₖ are the diagonal and subdiagonal entries of the symmetric tridiagonal matrix Tₖ.

As with Arnoldi, Lanczos iteration does not produce eigenvalues and eigenvectors directly, but only the tridiagonal matrix Tₖ, whose eigenvalues and eigenvectors must be computed by another method to obtain Ritz values and vectors.

If βₖ = 0, then the algorithm appears to break down, but in that case an invariant subspace has already been identified (i.e., the Ritz values and vectors are already exact at that point).

Lanczos Iteration, continued

In principle, if Lanczos iteration were run until k = n, the resulting tridiagonal matrix would be orthogonally similar to A.

In practice, rounding error causes loss of orthogonality, invalidating this expectation.

The problem can be overcome by reorthogonalizing the vectors as needed, but the expense can be substantial.

Alternatively, one can ignore the problem, in which case the algorithm still produces good eigenvalue approximations, but multiple copies of some eigenvalues may be generated.

Krylov Subspace Methods, continued

A great virtue of the Arnoldi and Lanczos iterations is their ability to produce good approximations to extreme eigenvalues for k ≪ n.

Moreover, they require only one matrix-vector multiplication by A per step and little auxiliary storage, so they are ideally suited to large sparse matrices.

If eigenvalues are needed in the middle of the spectrum, say near σ, then the algorithm can be applied to the matrix (A − σI)⁻¹, assuming it is practical to solve systems of the form (A − σI)x = y.

Optimality of Lanczos, Case A is SPD

• Recall, if A is SPD, then xᵀAx > 0 ∀ x ≠ 0.

• If Q is full rank, then T := QᵀAQ is SPD:

    (Qy)ᵀA(Qy) = yᵀQᵀAQy > 0   ∀ y ≠ 0.

• If A is SPD then ‖A‖₂ = λ₁ (the max eigenvalue). Thus,

    λ₁ = max_{x≠0} xᵀAx / xᵀx = max_{‖x‖=1} xᵀAx.

• Let Ty = μy. (T is k × k tridiagonal, k ≪ n.)

    μ₁ = max_{‖y‖=1} yᵀTy = max_{‖y‖=1} yᵀQᵀAQy ≥ xᵀAx   for all x = Qy ∈ 𝒦ₖ, ‖x‖ = 1

• Therefore, μ₁ is the closest Rayleigh quotient estimate of λ₁ over all x ∈ 𝒦ₖ, ‖x‖ = 1.

• Lanczos is as good as (or much better than) the power method for the same number of matrix-vector products with A.

Matlab Demo

❑ Lanczos vs Power Iteration

❑ Lanczos does a reasonable job of converging to extreme eigenvalues.

Q: What is happening here?

Example: Lanczos Iteration

For a 29 × 29 symmetric matrix with eigenvalues 1, ..., 29, the behavior of Lanczos iteration is shown below.

[Figure: Ritz values vs. number of Lanczos iterations]