Basic methods for eigenproblems. Basic iterative methods
Gerard Sleijpen and Martin van Gijzen
National Master Course, Delft University of Technology
October 5, 2016

Program Lecture 4
• Power method
• Shift-and-invert Power method
• QR algorithm
• Basic iterative methods for linear systems
• Richardson's method
• Jacobi, Gauss-Seidel and SOR
• Iterative refinement
• Steepest descent and the Minimal residual method



Basic methods for eigenproblems

The eigenvalue problem

The eigenvalue problem

   Av = λv

cannot be solved in a direct way for problems of order > 4, since the eigenvalues are the roots of the characteristic equation

   det(A − λI) = 0.

Today we will discuss two iterative methods for solving the eigenproblem.

The Power method

The Power method is the classical method to compute the in modulus largest eigenvalue and an associated eigenvector of a matrix.

Multiplying with the matrix most strongly amplifies the eigendirections corresponding to the in modulus largest eigenvalues.

Successively multiplying and scaling (to avoid overflow or underflow) yields a vector in which the direction of the eigenvector belonging to the largest eigenvalue becomes more and more dominant.


Algorithm

The Power method for an n × n matrix A.

   u_0 ∈ C^n is given
   for k = 1, 2, ...
       ũ_k = A u_{k-1}
       u_k = ũ_k / ‖ũ_k‖_2
       λ^(k) = u_{k-1}^* ũ_k
   end for

If u_k is an eigenvector corresponding to λ_j, then

   λ^(k+1) = u_k^* A u_k = λ_j u_k^* u_k = λ_j ‖u_k‖_2^2 = λ_j.

Convergence (1)

Let the n eigenvalues λ_i with eigenvectors v_i, A v_i = λ_i v_i, be ordered such that |λ_1| ≥ |λ_2| ≥ ... ≥ |λ_n|.

• Assume the eigenvectors v_1, ..., v_n form a basis.
• Assume |λ_1| > |λ_2|.

Each arbitrary starting vector u_0 can be written as

   u_0 = α_1 v_1 + α_2 v_2 + ... + α_n v_n,

and if α_1 ≠ 0, then it follows that

   A^k u_0 = α_1 λ_1^k ( v_1 + Σ_{j=2}^{n} (α_j/α_1) (λ_j/λ_1)^k v_j ).
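Below is a minimal NumPy sketch of this algorithm (not part of the original slides); the test matrix, starting vector and tolerance are made-up examples.

```python
import numpy as np

def power_method(A, u0, max_iter=200, tol=1e-10):
    """Power method as on the slide: multiply, normalise, estimate the eigenvalue."""
    u = u0 / np.linalg.norm(u0)
    lam = 0.0
    for _ in range(max_iter):
        w = A @ u                       # ũ_k = A u_{k-1}
        lam_new = np.vdot(u, w)         # λ^(k) = u_{k-1}^* ũ_k
        u = w / np.linalg.norm(w)       # u_k = ũ_k / ||ũ_k||_2
        if abs(lam_new - lam) < tol * abs(lam_new):
            lam = lam_new
            break
        lam = lam_new
    return lam, u

# Made-up example; the eigenvalues are 1, 2 and 5.
A = np.diag([5.0, 2.0, 1.0])
rng = np.random.default_rng(0)
lam, v = power_method(A, rng.standard_normal(3))
print(lam)   # ≈ 5, the in modulus largest eigenvalue
```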


Convergence (2)

• If the component α_1 v_1 in u_0 is small as compared to, say, α_2 v_2, i.e. |α_1| ≪ |α_2|, then initially convergence may seem to be dominated by λ_2 (until |α_2 λ_2^k| < |α_1 λ_1^k|).

• If the basis of eigenvectors is ill-conditioned, then some eigenvector components in u_0 may be large even if u_0 is modest, and initially convergence may seem to be dominated by the non-dominant eigenvalues.

• u_0 = V a, where V ≡ [v_1, ..., v_n], a ≡ (α_1, ..., α_n)^T. Hence, ‖u_0‖ ≤ ‖V‖ ‖a‖ and ‖a‖ = ‖V^{-1} u_0‖ ≤ ‖V^{-1}‖ ‖u_0‖: the constant in the 'O-term' may depend on the conditioning of the basis of eigenvectors.

Convergence (3)

Using this equality we conclude that

   |λ_1 − λ^(k)| = O( |λ_2/λ_1|^k )   (k → ∞),

and also that u_k directionally converges to v_1: the angle between u_k and v_1 is of order |λ_2/λ_1|^k.

If |λ_1| > |λ_j| for all j > 1, then we call λ_1 the dominant eigenvalue and v_1 the dominant eigenvector.


Convergence (4)

Clearly, the (asymptotic) convergence depends on |λ_2/λ_1|.

Note that there is a problem if |λ_1| = |λ_2|, which is the case for instance if λ_1 = −λ_2.

A vector u_0 can be written as

   u_0 = α_1 v_1 + α_2 v_2 + Σ_{j=3}^{n} α_j v_j.

The components in the directions of v_3, ..., v_n will vanish in the Power method if |λ_2| > |λ_3|, but u_k will not tend to a limit if u_0 has nonzero components in v_1 and v_2 and |λ_1| = |λ_2|, λ_1 ≠ λ_2.

Shifting

To speed up convergence, the Power method can also be applied to the shifted problem

   (A − σI)v = (λ − σ)v.

The asymptotic rate of convergence now becomes

   |λ_2 − σ| / |λ_1 − σ|.

Moreover, by choosing a suitable shift σ (how?), convergence can be forced towards the smallest eigenvalue of A.


Shift-and-invert

Another way to speed up convergence is to apply the Power method to the shifted and inverted problem

   (A − σI)^{-1} v = μv,   λ = 1/μ + σ.

This technique allows us to compute eigenvalues near the shift. However, for this the solution of a linear system is required in every iteration!

Assignment. Show that the shifted and inverted problem and the original problem share the same eigenvectors.

QR-factorisation, power method

Consider the QR-decomposition A = QR with Q = [q_1, ..., q_n] unitary and R = (r_ij) upper triangular.

Observations.
1) A e_1 = r_11 q_1.
2) Since A^* Q = R^*, we also have A^* q_n = r_nn e_n.

The QR-decomposition incorporates one step of the power method with A in the first column of Q and with (A^*)^{-1} in the last column of Q (without inverting A!).

To continue, 'rotate' the basis: instead of e_1, ..., e_n, take q_1, ..., q_n as new basis in domain and image space of A.

A_1 ≡ Q^* A Q = RQ is the matrix of A w.r.t. this rotated basis. In the new basis, q_1 and q_n are represented by e_1 and e_n, respectively.
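A sketch of shift-and-invert iteration along these lines, assuming SciPy's lu_factor/lu_solve so that A − σI is factored once and reused; the matrix and the shift are made-up examples.

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def shift_invert_power(A, sigma, u0, max_iter=100, tol=1e-12):
    """Power method applied to (A - sigma I)^{-1}; returns the eigenvalue of A nearest sigma."""
    n = A.shape[0]
    lu_piv = lu_factor(A - sigma * np.eye(n))   # factor once, reuse every iteration
    u = u0 / np.linalg.norm(u0)
    mu = 0.0
    for _ in range(max_iter):
        w = lu_solve(lu_piv, u)                 # w = (A - sigma I)^{-1} u: one system solve
        mu_new = np.vdot(u, w)                  # estimate of mu
        u = w / np.linalg.norm(w)
        if abs(mu_new - mu) < tol * abs(mu_new):
            mu = mu_new
            break
        mu = mu_new
    return 1.0 / mu + sigma, u                  # λ = 1/μ + σ

# Made-up matrix with eigenvalues 1, 2, 5; the shift 1.8 picks out the eigenvalue 2.
A = np.diag([5.0, 2.0, 1.0])
lam, v = shift_invert_power(A, 1.8, np.ones(3))
print(lam)   # ≈ 2
```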


The QR method (1)

This leads to the QR method, a popular technique in particular to solve small or dense eigenvalue problems. The method repeatedly uses QR-factorisation.

The method starts with the matrix A_0 ≡ A, factors it into A_0 = Q_0 R_0, and then reverses the factors: A_1 = R_0 Q_0.

Assignment. Show that A_0 and A_1 are similar (share the same eigenvalues).

The QR method (2)

And repeats these steps:

   factor A_k = Q_k R_k,   multiply A_{k+1} = R_k Q_k.

Hence, A_k Q_k = Q_k A_{k+1} and

   A_0 (Q_0 Q_1 · · · Q_{k-1}) = (Q_0 Q_1 · · · Q_{k-1}) A_k.

U_k ≡ Q_0 Q_1 · · · Q_{k-1} is unitary, A_0 U_k = U_k A_k: A_0 and A_k are similar.
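An illustrative NumPy sketch (an assumed example, not the slides' own code) of this factor-and-reverse iteration; it also checks the similarity relation A U_k = U_k A_k.

```python
import numpy as np

def qr_iteration(A, n_steps=100):
    """Unshifted QR method: factor A_k = Q_k R_k, multiply A_{k+1} = R_k Q_k."""
    Ak = np.array(A, dtype=float)
    U = np.eye(A.shape[0])                 # accumulates U_k = Q_0 Q_1 ... Q_{k-1}
    for _ in range(n_steps):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q                         # A_{k+1} = Q_k^* A_k Q_k, similar to A
        U = U @ Q
    return Ak, U

# Made-up symmetric test matrix.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
Ak, U = qr_iteration(A)
print(np.diag(Ak))                         # approximations to the eigenvalues of A
print(np.linalg.norm(A @ U - U @ Ak))      # A U_k = U_k A_k, so this is ≈ 0
```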


The QR method (2), continued

Hence, with U_k ≡ Q_0 Q_1 · · · Q_{k-1},

   A U_k = U_k A_k = U_k Q_k R_k = U_{k+1} R_k.

In particular, A u_1^(k) = τ u_1^(k+1) with τ the (1,1)-entry of R_k: the first columns u_1^(k) of the U_k represent the power method. Here, we used that R_k is upper triangular.

Similarly, A^* U_{k+1} = U_k R_k^*. In particular, A^* u_n^(k+1) = τ u_n^(k), now with τ the (n,n)-entry of R_k^*: the last columns u_n^(k) of the U_k incorporate the inverse power method. Here, we used that R_k^* is lower triangular.


The QR method (2), continued

U_k converges to a unitary matrix U, R_k converges to an upper triangular matrix S, and

   A U_k = U_{k+1} R_k  →  A U = U S,

which is a so-called Schur decomposition of A. The eigenvalues of A are on the main diagonal of S (they appear on the main diagonal of A_k and of R_k).

The QR method (3)

Normally the algorithm is used with shifts:

• A_k − σ_k I = Q_k R_k,   A_{k+1} = R_k Q_k + σ_k I.
  Check that A_{k+1} is similar to A_k.

• The process incorporates shift-and-invert iteration (in the last column of U_k ≡ Q_0 · · · Q_{k-1}).

The shifted algorithm (with a proper shift) converges quadratically. An eigenvalue of the 2 × 2 right lower block of A_k is such a proper shift (Wilkinson's shift).


The QR method (4)

Other ingredients of an effective algorithm of the QR method:

• Deflation is used, that is, converged columns and rows (the last ones) are removed.

• Select A_0 = U_0^* A U_0 to be upper Hessenberg.
  Theorem. A_{k+1} is Hessenberg if A_k is Hessenberg.

Costs ≈ 12n^3 flop. Discarding one of the ingredients leads to costs of O(n^4) or higher.
Stability is optimal (of the order of n·u times the stability of the eigenvalue problem of A).

The QR method for eigenvalues

Ingredients (summary):

1) Bring A to upper Hessenberg form
2) Select an appropriate shift strategy
3) Repeat: shift, factor, reverse factors & multiply, de-shift
4) Deflate upon convergence

Find all eigenvalues λ_j on the diagonal of S.

Costs to compute all eigenvalues (without computing U) of the n × n matrix A to full accuracy: ≈ 12n^3 flop.

n = 10^3: Matlab needs a few seconds. What about n = 10^4?
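A rough sketch combining these ingredients (Hessenberg form via SciPy's hessenberg, a Wilkinson-style shift, deflation); this is an assumed illustration, not a production code, which would work on the Hessenberg form implicitly.

```python
import numpy as np
from scipy.linalg import hessenberg

def qr_eig(A, tol=1e-12, max_steps=10000):
    """Shifted QR method with deflation (illustrative sketch)."""
    H = hessenberg(np.array(A, dtype=complex))    # 1) bring A to upper Hessenberg form
    n = H.shape[0]
    eigs = []
    k = n                                          # size of the still-active block
    for _ in range(max_steps):
        if k == 1:
            eigs.append(H[0, 0])
            break
        # 2) shift: eigenvalue of the trailing 2x2 block closest to H[k-1,k-1] (Wilkinson-style)
        mu = np.linalg.eigvals(H[k-2:k, k-2:k])
        sigma = mu[np.argmin(np.abs(mu - H[k-1, k-1]))]
        # 3) shift, factor, reverse factors & multiply, de-shift
        Q, R = np.linalg.qr(H[:k, :k] - sigma * np.eye(k))
        H[:k, :k] = R @ Q + sigma * np.eye(k)
        # 4) deflate when the last subdiagonal entry has become negligible
        if abs(H[k-1, k-2]) < tol * (abs(H[k-2, k-2]) + abs(H[k-1, k-1])):
            eigs.append(H[k-1, k-1])
            k -= 1
    return np.array(eigs)

# Made-up symmetric test matrix.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
print(np.sort(qr_eig(A)).real)
print(np.sort(np.linalg.eigvals(A)).real)          # agrees
```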


The QR method: concluding remarks

• The QR method is the method of choice for dense systems of size n with n up to a few thousand.

• Usually, for large values of n, one is only interested in a few eigenpairs or a part of the spectrum. The QR method computes all eigenvalues. The order in which the method detects the eigenvalues cannot be prescribed. Therefore, all eigenvalues are computed and the wanted ones are selected.

• For larger values of n, methods are used (to be discussed in following lectures) that project the eigenvalue problem onto low-dimensional spaces, where the QR method is then used.

• The method of choice for computing zeros of polynomials is also the QR method (applied to the companion matrix).

Iterative methods for linear systems

Iterative methods construct successive approximations x_k to the solution of the linear system Ax = b. Here k is the iteration number, and the approximation x_k is also called the iterate.

The vector e_k ≡ x_k − x is the error, r_k ≡ b − A x_k (= −A e_k) is the residual.

The iterative methods are composed of only a few different basic operations:

• Products with the matrix A
• Vector operations (updates and inner product operations)
• Preconditioning operations


Preconditioning

Usually iterative methods are applied not to the original system

   Ax = b,

but to the preconditioned system

   M^{-1} A x = M^{-1} b,

where the preconditioner M is chosen such that:

• Preconditioning operations (operations with M^{-1}, i.e., solves Mw = r for w) are cheap;
• The iterative method converges much faster for the preconditioned system (with an appropriate preconditioner).

Basic iterative methods

The first iterative methods we will discuss are the basic iterative methods. Basic iterative methods only use information from the previous iteration.

Until the 70's they were quite popular. Some are still used, but in combination with an acceleration technique.

They also still play a role in multigrid techniques, where they are used as smoothers.


Basic iterative methods (2)

Basic iterative methods are usually constructed using a splitting of A:

   A = M − R.

Successive approximations are then computed using the iterative process

   M x_{k+1} = R x_k + b,

which is equivalent to

   x_{k+1} = x_k + M^{-1}(b − A x_k) = x_k + M^{-1} r_k.

In the next few slides we look at M = I.

Richardson's method

The choice M = I, R = I − A gives Richardson's method, which is the simplest iterative method possible.

The iterative process becomes

   x_{k+1} = x_k + (b − A x_k) = b + (I − A) x_k.
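A minimal sketch of Richardson's method; the 2 × 2 test matrix is a made-up example whose eigenvalues lie in (0, 2), so the method converges (see the convergence condition on the following slides).

```python
import numpy as np

def richardson(A, b, x0=None, max_iter=500, tol=1e-10):
    """Richardson's method: x_{k+1} = x_k + (b - A x_k), i.e. the splitting M = I, R = I - A."""
    x = np.zeros_like(b) if x0 is None else x0.copy()
    for k in range(max_iter):
        r = b - A @ x                                      # residual r_k
        if np.linalg.norm(r) < tol * np.linalg.norm(b):    # stop on ||r_k||/||b||
            break
        x = x + r
    return x, k

# Made-up example; the eigenvalues of A are about 0.64 and 1.17, so |1 - λ| < 1.
A = np.array([[1.0, 0.3],
              [0.2, 0.8]])
b = np.array([1.0, 2.0])
x, its = richardson(A, b)
print(its, np.linalg.norm(b - A @ x))
```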


Richardson's method (2)

This process yields the following iterates:

   Initial guess x_0 = 0
   x_1 = b
   x_2 = b + (I − A) x_1 = b + (I − A) b
   x_3 = b + (I − A) x_2 = b + (I − A) b + (I − A)^2 b

Repeating this gives

   x_{k+1} = Σ_{i=0}^{k} (I − A)^i b.

Richardson's method (3)

So Richardson's method 'generates' the series expansion for (I − Z)^{-1} with Z = I − A. If this series converges we have

   Σ_{i=0}^{∞} (I − A)^i = A^{-1}.

The series expansion for 1/(1 − z) (z ∈ C) converges if |z| < 1.

The series Σ_i (I − A)^i converges if

   |1 − λ| < 1   for all eigenvalues λ of A.


In other words, the series converges if λ ∈ {ζ ∈ C : |1 − ζ| < 1} for all eigenvalues λ of A. For λ real this means that 0 < λ < 2.

Richardson's method (4)

In order to increase the radius of convergence and to speed up the convergence, one can introduce a parameter α:

   x_{k+1} = x_k + α(b − A x_k) = αb + (I − αA) x_k.

It is easy to verify that if all eigenvalues are real and positive, the optimal α is given by

   α_opt = 2 / (λ_max + λ_min).

If all eigenvalues are in the right half of the complex plane, i.e., Re(λ) > 0 for all eigenvalues λ of A, then, for some α,

   |1 − αλ| < 1   for all eigenvalues λ of A.
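A sketch of the damped iteration with the optimal α for a made-up symmetric positive definite example, compared with a poorly chosen α.

```python
import numpy as np

def richardson_alpha(A, b, alpha, max_iter=2000, tol=1e-10):
    """Damped Richardson: x_{k+1} = x_k + alpha (b - A x_k)."""
    x = np.zeros_like(b)
    for k in range(max_iter):
        r = b - A @ x
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        x = x + alpha * r
    return x, k

# Made-up SPD example: all eigenvalues real and positive, so alpha_opt = 2/(lmax + lmin).
A = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])
lam = np.linalg.eigvalsh(A)
alpha_opt = 2.0 / (lam[-1] + lam[0])
print(richardson_alpha(A, b, alpha_opt)[1])   # few iterations
print(richardson_alpha(A, b, 0.1)[1])         # a poor alpha needs many more
```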


Richardson's method (4), continued

[Figure: the complex plane with the eigenvalues λ of A and the scaled values αλ for α = 0.64; for convergence the values αλ must lie in the disc |1 − ζ| < 1.]

Initial guess

Before, we assumed the initial guess x_0 = 0. Starting with another initial guess x_0 only means that we have to solve a "shifted" system

   A(y + x_0) = b  ⇔  Ay = b − A x_0 = r_0.

So the results obtained before remain valid, irrespective of the initial guess.


Stopping criterion

We want to stop once the error ‖x − x_k‖ < ε, with ε some prescribed tolerance. Unfortunately we do not know x, so this criterion does not work in practice.

Alternatives are:

• ‖r_k‖ = ‖b − A x_k‖ = ‖A x − A x_k‖ < ε
  Disadvantage: the criterion is not scaling invariant.

• ‖r_k‖ / ‖r_0‖ < ε
  Disadvantage: a good initial guess does not reduce the number of iterations.

• ‖r_k‖ / ‖b‖ < ε
  Seems best (fits the idea of a small backward error).

Convergence

To investigate the convergence of basic iterative methods in general, we look again at the formula

   M x_{k+1} = R x_k + b.

Remember that A = M − R. If we subtract M x = R x + b from this equation, we get a recursion for the error e_k = x_k − x:

   M e_{k+1} = R e_k.


Convergence (2)

We can also write this as

   e_{k+1} = M^{-1} R e_k.

This is a power iteration and hence the error will ultimately point in the direction of the dominant eigenvector of M^{-1}R.

The rate of convergence is determined by the spectral radius ρ(M^{-1}R) of M^{-1}R (= I − M^{-1}A):

   ρ(M^{-1}R) ≡ max{|λ| : λ eigenvalue of M^{-1}R}
             = max{|1 − λ| : λ eigenvalue of M^{-1}A}.

For convergence we must have that

   ρ(M^{-1}R) < 1.

Linear convergence

Ultimately, we have ‖e_{k+1}‖ ≈ ρ(M^{-1}R) ‖e_k‖, which means that we have linear convergence.

[Figure: convergence history of a basic iterative method; the vertical axis displays the size ‖e_k‖_2 of the error on log-scale, the horizontal axis the iteration number (0 to 10000).]
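A small sketch that checks the criterion ρ(M^{-1}R) < 1 and the resulting error contraction; as an assumed example it takes M = diag(A), i.e. the Jacobi splitting discussed on the next slides.

```python
import numpy as np

# Made-up diagonally dominant test matrix and the splitting A = M - R with M = diag(A).
A = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  4.0, -1.0],
              [ 0.0, -1.0,  4.0]])
M = np.diag(np.diag(A))
R = M - A
G = np.linalg.solve(M, R)                  # iteration matrix M^{-1}R = I - M^{-1}A
rho = max(abs(np.linalg.eigvals(G)))
print("rho(M^-1 R) =", rho)                # about 0.35 < 1, so the method converges

# The error recursion e_{k+1} = M^{-1}R e_k contracts by roughly rho per step.
x = np.array([1.0, 2.0, 3.0])              # made-up exact solution
b = A @ x
xk = np.zeros(3)
for k in range(6):
    xk = xk + np.linalg.solve(M, b - A @ xk)
    print(k, np.linalg.norm(xk - x))       # errors shrink linearly
```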


Classical Basic Iterative Methods

We will now briefly discuss the three best known basic iterative methods:

• Jacobi's method
• The method of Gauss-Seidel
• Successive overrelaxation

These methods can be seen as Richardson's method applied to the preconditioned system

   M^{-1} A x = M^{-1} b.

Jacobi's method

We first write A = L + D + U, with
L the strictly lower triangular part of A,
D the main diagonal, and
U the strictly upper triangular part.

Jacobi's method is defined by the choice

   M ≡ D   ⇒   R = −L − U.

The process is given by

   D x_{k+1} = (−L − U) x_k + b,

or, equivalently, by

   x_{k+1} = x_k + D^{-1}(b − A x_k).
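A minimal sketch of Jacobi's method; the diagonally dominant test matrix is a made-up example for which the method converges.

```python
import numpy as np

def jacobi(A, b, max_iter=200, tol=1e-10):
    """Jacobi's method: M = D, i.e. x_{k+1} = x_k + D^{-1}(b - A x_k)."""
    d = np.diag(A)                   # the main diagonal D
    x = np.zeros_like(b)
    for k in range(max_iter):
        r = b - A @ x
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        x = x + r / d                # solving D w = r is just a division by the diagonal
    return x, k

A = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  4.0, -1.0],
              [ 0.0, -1.0,  4.0]])
b = np.array([1.0, 2.0, 3.0])
x, its = jacobi(A, b)
print(its, np.linalg.norm(b - A @ x))
```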


The Gauss-Seidel method

We write again A = L + D + U.

The Gauss-Seidel method is defined by the choice

   M ≡ L + D   ⇒   R = −U.

The process is given by

   (L + D) x_{k+1} = −U x_k + b,

or, equivalently, by

   x_{k+1} = x_k + (L + D)^{-1}(b − A x_k).

Successive overrelaxation (SOR)

We write again A = L + D + U.

The SOR method is defined by the choice

   M ≡ (1/ω) D + L   ⇒   R = ((1/ω) − 1) D − U.

The parameter ω is called the relaxation parameter.

The process is given by

   ((1/ω) D + L) x_{k+1} = (((1/ω) − 1) D − U) x_k + b,

or as

   x_{k+1} = x_k + ((1/ω) D + L)^{-1}(b − A x_k).

With ω = 1 we get the method of Gauss-Seidel back. In general the optimal value of ω is not known.
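A sketch covering both methods: since M = D/ω + L is lower triangular, the update Mw = r is a cheap forward solve, and ω = 1 reproduces Gauss-Seidel. The test matrix and the value ω = 1.05 are made-up examples.

```python
import numpy as np
from scipy.linalg import solve_triangular

def sor(A, b, omega=1.0, max_iter=500, tol=1e-10):
    """SOR: M = D/omega + L (L strictly lower triangular); omega = 1 gives Gauss-Seidel."""
    D = np.diag(np.diag(A))
    L = np.tril(A, -1)
    M = D / omega + L                     # lower triangular preconditioner
    x = np.zeros_like(b)
    for k in range(max_iter):
        r = b - A @ x
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        x = x + solve_triangular(M, r, lower=True)   # cheap forward substitution
    return x, k

A = np.array([[ 4.0, -1.0,  0.0],
              [-1.0,  4.0, -1.0],
              [ 0.0, -1.0,  4.0]])
b = np.array([1.0, 2.0, 3.0])
print(sor(A, b, omega=1.0)[1])    # Gauss-Seidel
print(sor(A, b, omega=1.05)[1])   # slight over-relaxation
```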


Iterative refinement

Two weeks ago, we saw direct methods. For stability it is necessary to perform partial pivoting. However, this goes at the expense of the efficiency.

If the LU-factors are inaccurate, such that A = LU − ΔA, they might still be usable as a preconditioner for the process

   x_{k+1} = x_k + (LU)^{-1}(b − A x_k).

This is called iterative refinement and is used to improve the accuracy of the direct solution.

One-step projection methods

The convergence of Richardson's method is not guaranteed and, if the method converges, convergence is often very slow.

We now introduce two methods that are guaranteed to converge for wide classes of matrices. The two methods take special linear combinations of the vectors r_k and A r_k to construct a new iterate x_{k+1} that satisfies a local optimality property.
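A sketch of iterative refinement using SciPy's LU factorization; the Hilbert-like test matrix is a made-up, ill-conditioned example. (In practice the residual is often computed in higher precision than the rest of the computation.)

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def iterative_refinement(A, b, n_steps=3):
    """Refine a direct LU solution: x_{k+1} = x_k + (LU)^{-1}(b - A x_k)."""
    lu_piv = lu_factor(A)                 # (possibly inaccurate) LU factors of A
    x = lu_solve(lu_piv, b)               # direct solution
    for _ in range(n_steps):
        r = b - A @ x                     # residual; ideally evaluated in higher precision
        x = x + lu_solve(lu_piv, r)       # correction reuses the same factors
    return x

# Made-up ill-conditioned example: a Hilbert matrix of order 8.
n = 8
A = 1.0 / (np.arange(n)[:, None] + np.arange(n)[None, :] + 1.0)
x_exact = np.ones(n)
b = A @ x_exact
x = iterative_refinement(A, b)
print(np.linalg.norm(b - A @ x))          # residual after refinement
```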


Steepest descent

Let A be Hermitian positive definite. Define the function

   f(x_k) ≡ ‖x_k − x‖_A^2 = (x_k − x)^* A (x_k − x).

Let x_{k+1} = x_k + α_k r_k. Then the choice

   α_k = (r_k^* r_k) / (r_k^* A r_k)

minimizes f(x_{k+1}) (given x_k and r_k).

Theorem. Steepest descent converges if A is Hermitian positive definite.

Convergence is still usually very slow.

(Local) Minimal residual

Let A be a general square matrix. Define the function

   g(x_k) ≡ ‖b − A x_k‖_2^2 = r_k^* r_k.

Let x_{k+1} = x_k + α_k r_k. Then the choice

   α_k = (r_k^* A r_k) / (r_k^* A^* A r_k)

minimizes g(x_{k+1}) (given x_k and r_k).

Theorem. The local minimal residual method converges if Re(λ) > 0 for all eigenvalues λ of A.

Convergence is still usually very slow.
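Minimal sketches of both one-step methods for a made-up real SPD example (in the real case the conjugations in the formulas for α_k drop out).

```python
import numpy as np

def steepest_descent(A, b, max_iter=5000, tol=1e-8):
    """Steepest descent for Hermitian positive definite A: alpha_k = (r'r)/(r'Ar)."""
    x = np.zeros_like(b)
    for k in range(max_iter):
        r = b - A @ x
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        Ar = A @ r
        alpha = (r @ r) / (r @ Ar)
        x = x + alpha * r
    return x, k

def local_minres(A, b, max_iter=5000, tol=1e-8):
    """Local minimal residual: alpha_k chosen so that ||b - A x_{k+1}||_2 is minimal."""
    x = np.zeros_like(b)
    for k in range(max_iter):
        r = b - A @ x
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        Ar = A @ r
        alpha = (Ar @ r) / (Ar @ Ar)      # real case; this makes r_{k+1} orthogonal to A r_k
        x = x + alpha * r
    return x, k

# Made-up SPD example; for badly conditioned matrices convergence is much slower.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
print(steepest_descent(A, b)[1], local_minres(A, b)[1])
```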


Orthogonality properties

The optimality properties of the steepest descent method and the minimal residual method are equivalent with the following orthogonality properties.

For steepest descent:

   α_k = (r_k^* r_k) / (r_k^* A r_k)   ⇔   r_{k+1} ⊥ r_k.

For the (local) minimal residual method:

   α_k = (r_k^* A r_k) / (r_k^* A^* A r_k)   ⇔   r_{k+1} ⊥ A r_k.

Concluding remarks

During the next lessons, the steepest descent method and the minimal residual method will be generalised.

This will ultimately give rise to a class of optimal iterative methods.

Moreover, we will see that these methods are closely linked to eigenvalue methods (as the simple iterative methods are to the Power method).
