Course Notes: Week 1
Math 270C: Applied Numerical Methods

1 Lecture 1: Introduction

We will focus on iterative methods for solving linear systems of equations (and some discussion of eigenvalue problems):

Ax = b,    Ax = λx,    A ∈ R^{n×n},

where n is very large (e.g. n = 10^9 on your (powerful) desktop PC) and typically A is sparse. In other words, we want to obtain x from A and b (and λ for eigenvalue problems) but not necessarily A^{-1}. We will also be focusing a great deal on the types of matrices that arise in numerical methods for partial differential equations (e.g. finite difference, finite volume methods, etc.). However, I am assuming no familiarity with such techniques. Mainly they will provide examples for testing the algorithms we explore, but those algorithms will of course be applicable to all matrices (or at least to a broad class such as symmetric matrices or symmetric positive definite matrices, depending on the method we focus on).

Iterative methods are potentially much faster than direct methods (e.g. Gaussian elimination, QR factorization, etc.; see Math 270B for this class of methods). Notably, when the matrix is sparse we may be able to solve in linear time, O(n) (and in quadratic time, O(n^2), in general). This is a very appealing possibility considering that direct methods are typically O(n^2) and O(n^3) for sparse and non-sparse matrices, respectively. Furthermore, direct methods typically incur O(n^2) storage in both the sparse and non-sparse cases, whereas iterative methods will typically only require storage of A (i.e. O(n) storage for sparse and O(n^2) for dense). The analysis of iterative methods is in general much more involved. Also, it can require great care to achieve linear performance in practice. Generally, the method must be designed with some knowledge of the matrices it is intended for. This aspect is less appealing than with direct methods, but the potential for linear performance more than justifies the effort.

Sparsity means that each row in the matrix A has at most c non-zero entries where c ≪ n. E.g. c = 3 for the matrix:

A = \begin{pmatrix}
 2 & -1 &  0 &  0 & \cdots & 0 \\
-1 &  2 & -1 &  0 & \cdots & 0 \\
 0 & -1 &  2 & -1 & \cdots & 0 \\
\vdots & & \ddots & \ddots & \ddots & \vdots \\
 0 & \cdots & & 0 & -1 & 2
\end{pmatrix}

Basically, the matrix is mostly zero, so we can evaluate matrix-vector multiplies in linear time: each row can be dotted with a vector in R^n in at most 2c - 1 flops, so the full product costs at most (2c - 1)n flops. The idea is that with iterative methods we should be able to get a very good approximation to x = A^{-1}b with a low number of matrix-vector multiplies being the dominant cost in the algorithm. This may sound too good to be true (and sometimes it is). However, this is the promise of iterative methods for sparse matrices and it is very often possible to achieve this performance.
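To make this concrete, here is a minimal sketch (NumPy only; the function name tridiag_matvec is just for this illustration) of a matrix-vector multiply with the tridiagonal matrix above. It uses O(n) work and O(n) storage and never forms the dense matrix:

import numpy as np

def tridiag_matvec(x):
    # y = A x for the model matrix with 2 on the diagonal and -1 on the
    # sub- and super-diagonals; O(n) flops and O(n) storage.
    y = 2.0 * x
    y[:-1] -= x[1:]    # superdiagonal entries (-1) acting on x_{i+1}
    y[1:] -= x[:-1]    # subdiagonal entries (-1) acting on x_{i-1}
    return y

# A single matrix-vector multiply with n = 10^6 unknowns is cheap:
x = np.random.rand(10**6)
y = tridiag_matvec(x)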

1.1 Geometric Convergence

Trefethen and Bau refer to "geometric convergence" as the optimal case for an iterative method (but exponential convergence is also often possible). In these scenarios, a constant number of iterations is required in practice to reduce the residual of the system by a given factor (independent of the size of the problem n). In practice, my criterion for optimal convergence is that the residual is reduced by a constant factor each iteration. Let x^k be the kth iterate in our method. The residual at this step is then r^k = b - Ax^k. In general, the residual will be our only means of estimating the accuracy of the kth iterate. Also, in general we hope that ||r^k||_2 > ||r^{k+1}||_2 > ||r^{k+2}||_2 > .... When I am estimating the efficacy of a given iterative method I usually use the following formula:

||r^{k+1}||_2 = ρ ||r^k||_2

and estimate ρ asymptotically, i.e. as k → ∞ I estimate ρ (numerically). In this case our asymptotic performance is ||r^k||_2 = C e^{γk} with ρ = e^γ and γ < 0. In practice you can plot log(||r^k||_2) vs. k and estimate the slope of the line.
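For example, here is a minimal sketch of that estimate (resids is assumed to be a Python list of recorded residual norms ||r^k||_2 produced by whatever solver is being tested; it is an assumption of this example, not something defined in the notes):

import numpy as np

def estimate_rho(resids, tail=20):
    # Fit a line to log(||r^k||_2) vs. k over the last `tail` iterations;
    # the slope is gamma and rho = e^gamma.
    r = np.asarray(resids, dtype=float)
    k = np.arange(len(r))
    gamma, _ = np.polyfit(k[-tail:], np.log(r[-tail:]), 1)
    return np.exp(gamma)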

1.2 Examples

The two most simplistic iterative methods are probably Jacobi and Gauss-Seidel. These methods are very simple to describe, but typically do not yield optimal performance. I.e. they usually require many, many iterations and the asymptotic ρ discussed above is close to 1. I.e. the method nearly stagnates as you run it longer and longer (typically without bringing you close enough to the solution). These methods are still very useful, just not typically on their own. For example, both of these techniques can be shown to "smooth" the error very efficiently for many matrices. That is, they reduce the error in the high frequency modes very quickly but reduce the error in the low frequency modes very slowly. This fact is leveraged in multigrid methods for discrete elliptic equations.

Note that in both examples below, all operations can be made much more efficient in the case of sparse matrices. Matrix-vector multiplies and residual updates can all be done with sparsity in mind to make each iteration of both methods linear in computational cost (linear in the size of the problem n).

1.2.1 Jacobi

Jacobi iteration is a very simple algorithm for approximating x = A^{-1}b. In practice, I would describe Jacobi iteration as:

for k = 1 to max_it do
    // At this point, x is x^k
    r = b - Ax
    for i = 1 to n do
        δ ← r_i / A_ii
        x_i ← x_i + δ    // Note that each x_i update can be done independently
    end for
    // At this point, x is now equal to x^{k+1}
end for

This can also be described as:

M_J x^{k+1} = N_J x^k + b

where M_J is the diagonal part of A (i.e. M_J = D) and N_J = -(L + U), where L and U are the strictly lower and strictly upper triangular parts of the matrix A. E.g. for

 2 −1 0 0 ... 0   −1 2 −1 0 ... 0     0 −1 2 −1 ... 0  A =    ......   . . .  0 ... 0 −1 2

 2 0 0 0 ... 0   0 0 0 0 ... 0   0 −1 0 0 ... 0   0 2 0 0 ... 0   −1 0 0 0 ... 0   0 0 −1 0 ... 0        J  0 0 2 0 ... 0   0 −1 0 0 ... 0   0 0 0 −1 ... 0  M =   , L =   , U =   .  ......   ......   ......   . . .   . . .   . . .  0 ... 0 0 2 0 ... 0 −1 0 0 ... 0 0 0

1.2.2 Gauss-Seidel

Gauss-Seidel usually performs better than Jacobi, but is also a little more expensive. Also, the order in which you apply the corrections matters for Gauss-Seidel and the individual updates are not independent. Jacobi is usually considered more parallel-friendly than Gauss-Seidel for this reason.

The Gauss-Seidel iteration can be described as follows:

for k = 1 to max_it do
    // At this point, x is x^k
    r = b - Ax
    for i = 1 to n do
        δ ← r_i / A_ii
        x_i ← x_i + δ
        r = b - Ax    // Update the residual to reflect our change to x_i
    end for
    // At this point, x is now equal to x^{k+1}
end for

This can also be described as:

M_G x^{k+1} = N_G x^k + b

where M_G = D + L and N_G = -U.
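A corresponding rough sketch of the Gauss-Seidel sweep (dense NumPy again for readability; here only the ith residual component is recomputed instead of the full residual r = b - Ax after every update, which yields the same iterates as the pseudocode above in exact arithmetic):

import numpy as np

def gauss_seidel(A, b, max_it=10000, tol=1e-8):
    # Gauss-Seidel sweep: each corrected x_i is immediately visible to the
    # updates of x_{i+1}, ..., x_n within the same sweep.
    x = np.zeros_like(b, dtype=float)
    n = len(b)
    for k in range(max_it):
        for i in range(n):
            r_i = b[i] - A[i, :] @ x    # residual component i with the current x
            x[i] += r_i / A[i, i]
        r = b - A @ x
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            break
    return x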

2 Lecture 2: Krylov Subspaces and Arnoldi Iteration

Krylov subspace methods are very powerful techniques that often achieve near optimal performance. That is, they often have the potential of requiring just a few iterations, independent of the size of the problem n. Furthermore, they may be optimized for different classes of matrices. E.g. if the matrix is symmetric positive definite, the Krylov method is conjugate gradients (CG). If it is just symmetric, the method is MINRES. In general it is GMRES. CG is the most efficient in terms of memory and computation, MINRES the second most so, and GMRES the least so. We will initially discuss the general case before we cover the specializations.

These methods are based on the Krylov subspaces

K^k = span{b, Ab, A^2 b, ..., A^{k-1} b}.

These spaces have the property that K^k ⊂ K^{k̂} for all k̂ > k, and that there exists a k with A^{-1}b ∈ K^k. The first property is trivial; the second property follows because (assuming A is full rank) we can write:

A^{-1} = \sum_{i=0}^{k-1} c_i A^i.

To see this, we can construct a polynomial for which A is a root. For example, consider the case of a diagonalizable A. Let A = QΛQ^{-1} where Λ is a diagonal matrix with eigenvalues λ_1, λ_2, ..., λ_n. Let λ_1, λ_2, ..., λ_k be all of the eigenvalues in Λ except for any repeats (i.e. just the distinct eigenvalues). Then A is a root of the following kth degree polynomial:

p(λ) = (λ - λ_1)(λ - λ_2) \cdots (λ - λ_k) = \sum_{i=0}^{k-1} b_i λ^i + λ^k

where k is then the number of distinct eigenvalues. Since A is a root of this polynomial,

A^{-1} = -\frac{1}{b_0} \left( \sum_{i=1}^{k-1} b_i A^{i-1} + A^{k-1} \right).

Here b_0 = (-1)^k λ_1 λ_2 \cdots λ_k is, up to sign, the product of the distinct eigenvalues and thus will not be zero in the full rank case. Therefore, when A is full rank, the solution x = A^{-1}b will be in the kth Krylov subspace K^k where k is the number of distinct eigenvalues.

In the general case, we can create a polynomial

p(λ) = (λ - λ_1)^{d_1} (λ - λ_2)^{d_2} \cdots (λ - λ_k)^{d_k} = \sum_{i=0}^{k̂-1} b_i λ^i + λ^{k̂}

where d_i is the size of the largest Jordan block in which the eigenvalue λ_i appears and k̂ = \sum_{i=1}^{k} d_i. In that case, the solution x = A^{-1}b will be in the k̂th Krylov subspace K^{k̂}.
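As a quick numerical illustration of this claim (a small experiment only; the choice of three eigenvalues, the size n = 50, and the random construction are all arbitrary), one can check that x = A^{-1}b lies in K^3 when A is diagonalizable with three distinct eigenvalues:

import numpy as np

rng = np.random.default_rng(0)
n = 50
Q = np.linalg.qr(rng.standard_normal((n, n)))[0]   # random orthogonal matrix
lam = rng.choice([1.0, 2.0, 5.0], size=n)          # only three distinct eigenvalues
A = Q @ np.diag(lam) @ Q.T                         # diagonalizable (symmetric) A
b = rng.standard_normal(n)

x = np.linalg.solve(A, b)
K3 = np.column_stack([b, A @ b, A @ A @ b])        # spans K^3
w = np.linalg.lstsq(K3, x, rcond=None)[0]          # best approximation to x in K^3
print(np.linalg.norm(K3 @ w - x))                  # ~ 1e-12: x is (numerically) in K^3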

2.1 Krylov Subspace Iterative Methods

Krylov methods generate the kth iterate in our method (x^k) by choosing the best (in the least squares sense) vector in K^k at solving Ax = b. Letting Q^k = [q_1, q_2, ..., q_k] ∈ R^{n×k} be an orthonormal basis for K^k, we can write the kth iterate in the Krylov method as:

w^k = argmin_{w ∈ R^k} ||b - A Q^k w||_2,    x^k = Q^k w^k.

Given that we know the solution will be in some Krylov space, we know that this iterative process will terminate. Thus, you might not want to call this an iterative method at all. However, we will see that in general we will not solve this least squares problem exactly, and furthermore we hope that the error decreases rapidly enough as we move from each Krylov space to the next that we do not need to continue until we have reached a number of iterations equal to the degree of the polynomials just discussed. Note that in general we will need to generate an orthonormal basis Q^k for K^k because the vectors A^k b are very ill-conditioned in the sense that they are very nearly linearly dependent as k becomes large (namely, they all start to resemble the dominant eigenvector). The process that we use to generate this basis is called Arnoldi iteration. It is effectively Gram-Schmidt run on an informed choice of vectors and can even be used to estimate the eigenvalues of the matrix A.
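The near linear dependence of the vectors A^k b is easy to observe numerically (an illustrative experiment; the sizes and the random matrix are arbitrary choices). Even after normalizing every vector, the matrix with columns b, Ab, ..., A^{k-1} b becomes violently ill-conditioned:

import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 15
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

V = np.zeros((n, k))
V[:, 0] = b / np.linalg.norm(b)
for j in range(1, k):
    v = A @ V[:, j - 1]
    V[:, j] = v / np.linalg.norm(v)    # unit length, so scaling is not the issue

# The columns all start to point in the direction of the dominant eigenvector,
# so the condition number explodes with k:
print(np.linalg.cond(V))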

2.2 Arnoldi Iteration

This iterative approach to defining the basis Q^k for K^k leverages the fact that we have done most of the work generating a basis in the previous iteration. That is, K^k ⊂ K^{k+1} ⊂ K^{k+2} and so on, thus Q^{k+1} = [Q^k, q_{k+1}]. I.e. at each iteration, we only need to generate one new vector (q_{k+1}) when we are constructing the next basis Q^{k+1}. Since the most obvious vector in K^{k+1} that is not in K^k is A^k b, it would seem that adding in A^k b and then "Gram-Schmidting" away the components that are not orthogonal to the previous q_i's would be a good idea. However, we want to avoid computing A^k b since it is numerically not very stable (i.e. roundoff will make it hard to distinguish between A^k b and A^{k+1} b for large k). Instead, Arnoldi will use A q_k. That is, Arnoldi iteration starts with q_1 = b / ||b||_2 and then, in general,

q_{k+1} = v_{k+1} / ||v_{k+1}||_2,    v_{k+1} = A q_k - h_{1k} q_1 - h_{2k} q_2 - ... - h_{kk} q_k

where the h_{ik} are chosen to guarantee that q_{k+1} is orthogonal to q_i for all i < k + 1 (this is the Gram-Schmidt part). In other words, we have

h_{ik} = q_i · (A q_k).

Basically, we take the previous vectors q_i, we throw in the new vector A q_k, and then we make a new basis for the span of those k + 1 vectors by adding in q_{k+1}.
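A minimal NumPy sketch of this process (written with classical Gram-Schmidt exactly as in the formula above; practical codes typically use the modified Gram-Schmidt variant or reorthogonalization for better roundoff behavior, and the breakdown handling here is only schematic):

import numpy as np

def arnoldi(A, b, m):
    # Build an orthonormal basis q_1, ..., q_{m+1} of the Krylov subspaces and
    # the (m+1) x m upper Hessenberg matrix H with A Q_m = Q_{m+1} H.
    n = len(b)
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    Q[:, 0] = b / np.linalg.norm(b)
    for k in range(m):
        w = A @ Q[:, k]                      # the new candidate vector A q_k
        H[:k + 1, k] = Q[:, :k + 1].T @ w    # h_{ik} = q_i . (A q_k)
        v = w - Q[:, :k + 1] @ H[:k + 1, k]  # remove the components along q_1, ..., q_k
        H[k + 1, k] = np.linalg.norm(v)
        if H[k + 1, k] == 0.0:               # exact breakdown: K^{k+1} = K^k
            return Q[:, :k + 1], H[:k + 1, :k]
        Q[:, k + 1] = v / H[k + 1, k]
    return Q, H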

A few things are not completely clear here. First, why does it suffice to add A q_k to yield a basis Q^{k+1} that will span K^{k+1}? Second, how do we know v_{k+1} is not zero?

3 Lecture 3: Properties of Arnoldi Iteration

The Arnoldi iteration is convenient in that it provides a basis for K^{k+1} given a basis for K^k with only the addition of one new search direction q_{k+1}, and it avoids the need to use the extremely ill-conditioned A^k b. We will see that this iteration also reduces the least squares problem to one that can be naturally treated with Givens rotation updates to the solution at the previous iteration. In other words, the Arnoldi iteration lets us avoid solving a least squares problem "from scratch", just like it let us avoid running Gram-Schmidt "from scratch". First, I will discuss the ability of Q^{k+1} to span K^{k+1}, and second I will discuss why the vector v_{k+1} used to generate Q^{k+1} is never zero in practice.

3.1 Why v_{k+1} ≠ 0 in Practice

Assume Q^k spans K^k and that x = A^{-1}b is not in K^k (otherwise the Krylov iteration would have terminated in the previous iteration). It is trivial to see that Q^1 spans K^1, so it will be enough to show that v_{k+1} ≠ 0 assuming Q^k spans K^k and that x = A^{-1}b is not in K^k.

The formula for v_{k+1} is

v_{k+1} = A q_k - h_{1k} q_1 - h_{2k} q_2 - ... - h_{kk} q_k

If v_{k+1} were zero then A q_k ∈ K^k. Now,

A^k b = A (A^{k-1} b) = A \left( \sum_{i=1}^{k} α_i q_i \right)

since we are assuming here that Q^k spans K^k. Also, A q_i = \sum_{j=1}^{k} β_{ij} q_j for i = 1, 2, ..., k - 1, since q_i ∈ K^{k-1} for these i and A K^{k-1} ⊂ K^k. Therefore,

A^k b = α_k A q_k + \sum_{i=1}^{k} γ_i q_i.

Therefore, if A q_k ∈ K^k then A^k b ∈ K^k, which would then imply that K^{k̂} = K^k for all k̂ > k, and this can't happen because we've assumed x = A^{-1}b is not in K^k and we know it is in some Krylov space. Therefore, v_{k+1} ≠ 0 in practice because we would have stopped the iteration at the previous step if it were.

3.2 Q^k Spans K^k

We'll look at this with a quick inductive argument as well. It is trivially true for k = 1. Again, assume Q^k spans K^k and that x = A^{-1}b is not in K^k (because again otherwise we would have stopped). As pointed out above,

A^k b = α_k A q_k + \sum_{i=1}^{k} γ_i q_i.

Now, if α_k were zero then we would have stopped at the previous iteration because that would have implied that K^{k̂} = K^k for all k̂ > k, as above. Therefore,

A q_k = \frac{1}{α_k} \left( A^k b - \sum_{i=1}^{k} γ_i q_i \right)

and therefore q_{k+1} is a linear combination of vectors in K^k and one other vector: A^k b. Therefore, Q^{k+1} spans K^{k+1}.

3.3 Upper Hessenberg Least Squares

The Arnoldi iteration yields the following recursive relationship between the search directions:

A q_k = \sum_{i=1}^{k+1} h_{ik} q_i,    k = 1, 2, ..., n - 1.

In other words, A Q^k = Q^{k+1} H^k for the upper Hessenberg matrix H^k ∈ R^{(k+1)×k},

H^k = \begin{pmatrix}
h_{11} & h_{12} & \cdots & h_{1k}   \\
h_{21} & h_{22} & \cdots & h_{2k}   \\
 0     & h_{32} & \cdots & h_{3k}   \\
\vdots & \ddots & \ddots & \vdots   \\
 0     & \cdots & 0      & h_{k+1,k}
\end{pmatrix}.
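A quick numerical check of this relation (illustrative only; the sizes and the random A are arbitrary, and the loop is a compact restatement of the Arnoldi sketch from Section 2.2):

import numpy as np

rng = np.random.default_rng(2)
n, k = 60, 10
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)

Q = np.zeros((n, k + 1))
H = np.zeros((k + 1, k))
Q[:, 0] = b / np.linalg.norm(b)
for j in range(k):
    w = A @ Q[:, j]
    H[:j + 1, j] = Q[:, :j + 1].T @ w
    w = w - Q[:, :j + 1] @ H[:j + 1, j]
    H[j + 1, j] = np.linalg.norm(w)
    Q[:, j + 1] = w / H[j + 1, j]

print(np.allclose(A @ Q[:, :k], Q @ H))   # True (up to roundoff): A Q^k = Q^{k+1} H^k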
