Iterative Methods in Linear Algebra

Azzam Haidar

Innovative Computing Laboratory Computer Science Department The University of Tennessee

Wednesday April 18, 2018

Outline

Part I: Discussion, motivation for iterative methods
Part II: Stationary Iterative Methods
Part III: Iterative Refinement Methods
Part IV: Nonstationary Iterative Methods

Part I

Discussion

Discussion

Original matrix:
>> load matrix.output
>> S = spconvert(matrix);
>> spy(S)

Reordered matrix:
>> load A.data
>> S = spconvert(A);
>> spy(S)

Discussion

Original sub-matrix:
>> load matrix.output
>> S = spconvert(matrix);
>> spy(S(1:8660,1:8660))

Reordered sub-matrix:
>> load A.data
>> S = spconvert(A);
>> spy(S(8602:8700,8602:8700))
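For readers working in Python rather than MATLAB, the same visualization can be done with SciPy and Matplotlib. This is only a sketch: it assumes the files hold (row, column, value) triplets with 1-based indices, which is the format MATLAB's spconvert expects.

# Sketch: plot the sparsity pattern of a matrix stored as (row, col, value) triplets,
# the same format MATLAB's spconvert() reads. Assumes 1-based indices as in MATLAB.
import numpy as np
import scipy.sparse as sp
import matplotlib.pyplot as plt

triplets = np.loadtxt("matrix.output")          # columns: row index, col index, value
rows = triplets[:, 0].astype(int) - 1           # convert to 0-based indexing
cols = triplets[:, 1].astype(int) - 1
vals = triplets[:, 2]
S = sp.coo_matrix((vals, (rows, cols))).tocsr() # equivalent of S = spconvert(matrix)

plt.spy(S, markersize=0.2)                      # equivalent of spy(S)
plt.show()

# Zoom into a sub-matrix, as in the spy(S(8602:8700,8602:8700)) call above
# (MATLAB indices are 1-based, Python's are 0-based):
# plt.spy(S[8601:8700, 8601:8700].toarray()); plt.show()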

Discussion

Results on torc0: Pentium III 933 MHz, 16 KB L1 and 256 KB L2 cache
Results on woodstock: Intel Xeon 5160 Woodcrest, 3.0 GHz, 32 KB L1, 4 MB L2 cache

torc0:     Original matrix: 42.4568 Mflop/s   Reordered matrix: 45.3954 Mflop/s   BCRS: 72.17374 Mflop/s
woodstock: Original matrix: 386.8 Mflop/s     Reordered matrix: 387.6 Mflop/s     BCRS: 894.2 Mflop/s
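The BCRS numbers above come from storing the matrix in a block-compressed format, so that the inner loops of the matrix-vector product work on small dense blocks. A rough SciPy sketch of that kind of comparison follows; the matrix size, density, and 4x4 block size are made-up illustration values, not the ones from the experiment.

# Sketch: compare sparse matrix-vector product speed in CSR vs block CSR (BSR) storage.
# The matrix is built to have genuine 4x4 dense blocks so the block format pays off.
import time
import numpy as np
import scipy.sparse as sp

block = 4
pattern = sp.random(2000, 2000, density=0.002, format="csr", random_state=0)
A_csr = sp.kron(pattern, np.ones((block, block)), format="csr")   # 8000x8000, block-sparse
A_bsr = A_csr.tobsr(blocksize=(block, block))
x = np.random.default_rng(0).standard_normal(A_csr.shape[1])

def mflops(A, x, reps=50):
    t0 = time.perf_counter()
    for _ in range(reps):
        y = A @ x
    dt = (time.perf_counter() - t0) / reps
    return 2 * A.nnz / dt / 1e6          # 2 flops per stored nonzero

print("CSR:", round(mflops(A_csr, x)), "Mflop/s")
print("BSR:", round(mflops(A_bsr, x)), "Mflop/s")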

Iterative Methods

Motivation

So far we have discussed and have seen:
- Many engineering and physics simulations lead to sparse matrices, e.g. PDE-based simulations and their discretizations based on FEM/FVM, finite differences, Petrov-Galerkin type conditions, etc.
- How to optimize the performance of the computations
- Some software for solving sparse matrix problems (e.g. PETSc)

The question is: can we solve sparse matrix problems faster than with sparse direct methods? In certain cases yes: using iterative methods.

Sparse direct methods

Sparse direct methods:
- These are based on LU factorization
- Performed in sparse format

Are there limitations to the approach? Yes:
- They produce fill-ins, which lead to more memory requirements and more flops being performed
- Fill-ins can become prohibitively high

Sparse direct methods

Consider LU for the matrix below. The first step of the LU factorization introduces fill-ins: a nonzero is represented by a *, a fill-in is marked by an F.

Sparse direct methods

Fill-in can be reduced by reordering. Remember: we talked about reordering in a slightly different context (for speed and parallelization). Consider the following reordering:
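To make the effect concrete, here is a small SciPy experiment (my own illustration, not from the slides): an "arrowhead" matrix factored in its natural order fills in almost completely, while a fill-reducing ordering such as COLAMD keeps the LU factors about as sparse as the matrix itself.

# Sketch: count fill-in of sparse LU with and without a fill-reducing ordering.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import splu

n = 200
# Arrowhead matrix: dense first row/column plus a strong diagonal (keeps it nonsingular).
A = sp.lil_matrix((n, n))
A[0, :] = 1.0
A[:, 0] = 1.0
A.setdiag(n + 1.0)
A = A.tocsc()
print("nnz(A) =", A.nnz)

for ordering in ["NATURAL", "COLAMD"]:
    lu = splu(A, permc_spec=ordering)
    print(f"{ordering:8s}: nnz(L) + nnz(U) = {lu.L.nnz + lu.U.nnz}")
# With the natural ordering, eliminating the dense first row/column couples every
# remaining row and column, so L and U fill in almost completely. COLAMD picks an
# ordering that effectively eliminates the dense row/column last, so the factors
# stay close to nnz(A) in size.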

These were extreme cases, but the problem still exists.

Sparse methods

Sparse direct vs dense methods:
- Dense: O(n²) storage, O(n³) flops, runs near peak performance
- Sparse direct: can take O(#nonz) storage and O(#nonz) flops, but both can grow due to fill-ins, and performance is poor
- With #nonz ≪ n² and a 'proper' ordering, it pays off to do sparse direct

Software (table from Julien Langou): http://www.netlib.org/utk/people/JackDongarra/la-sw.html

Package | Support | Type: Real, Complex | Language: f77, c, c++ | Mode: Seq, Dist | SPD | Gen
DSCPACK   yes   X X X M X
HSL       yes   X X X X X X
MFACT     yes   X X X X
MUMPS     yes   X X X X X M X X
PSPASES   yes   X X X M X
SPARSE    ?     X X X X X X
SPOOLES   ?     X X X X M X
SuperLU   yes   X X X X X M X
TAUCS     yes   X X X X X X
UMFPACK   yes   X X X X X
Y12M      ?     X X X X X

Sparse methods

What about iterative methods? Think, for example, of the iteration

x_{i+1} = x_i + P(b − A x_i)

Pluses:
- Storage is only O(#nonz) (for the matrix and a few working vectors)
- Can take only a few iterations to converge (e.g. ≪ n); for P = A^{-1} it takes 1 iteration (check!)
- In general easy to parallelize

Sparse methods

What about iterative methods? Think, for example, of the iteration

x_{i+1} = x_i + P(b − A x_i)

Minuses:
- Performance can be bad (as we saw previously)
- Convergence can be slow or even stagnate, but can be improved with preconditioning
- Think of P as a preconditioner, an operator/matrix P ≈ A^{-1}
- Optimal preconditioners (e.g. multigrid can be optimal) lead to convergence in O(1) iterations
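A minimal NumPy sketch of the iteration x_{i+1} = x_i + P(b − A x_i), with P supplied as a function. The test matrix and the two example choices of P (the exact inverse, and the inverse of the diagonal) are my own illustrations, not from the slides.

# Sketch: the generic (preconditioned Richardson) iteration x_{i+1} = x_i + P(b - A x_i).
import numpy as np

def simple_iteration(A, b, apply_P, tol=1e-10, maxiter=1000):
    x = np.zeros_like(b)
    for it in range(1, maxiter + 1):
        r = b - A @ x                 # residual
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it - 1
        x = x + apply_P(r)            # apply the preconditioner P to the residual
    return x, maxiter

rng = np.random.default_rng(0)
n = 100
A = rng.standard_normal((n, n)) + n * np.eye(n)   # diagonally dominant test matrix
b = rng.standard_normal(n)

Ainv = np.linalg.inv(A)
x1, it1 = simple_iteration(A, b, lambda r: Ainv @ r)        # P = A^{-1}: 1 iteration
x2, it2 = simple_iteration(A, b, lambda r: r / np.diag(A))  # P = D^{-1}: Jacobi-like
print(it1, it2, np.linalg.norm(A @ x2 - b))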

Part II

Stationary Iterative Methods

Stationary Iterative Methods

Can be expressed in the form

x_{i+1} = B x_i + c

where B and c do not depend on i

- older, simpler, easy to implement, but usually not as effective (as nonstationary methods)
- examples: Richardson, Jacobi, Gauss-Seidel, SOR, etc. (next)

Richardson Method

Richardson iteration

x_{i+1} = x_i + (b − A x_i) = (I − A) x_i + b    (1)

i.e. B from the definition above is B = I − A.

Denote e_i = x − x_i and rewrite (1):

x − x_{i+1} = x − x_i − (A x − A x_i)
e_{i+1} = e_i − A e_i = (I − A) e_i

‖e_{i+1}‖ = ‖(I − A) e_i‖ ≤ ‖I − A‖ ‖e_i‖ ≤ ‖I − A‖² ‖e_{i−1}‖ ≤ ··· ≤ ‖I − A‖^{i+1} ‖e_0‖

i.e. for convergence (e_i → 0) we need ‖I − A‖ < 1 for some norm ‖·‖, e.g. when ρ(B) < 1.

Jacobi Method

x_{i+1} = x_i + D^{-1}(b − A x_i) = (I − D^{-1}A) x_i + D^{-1} b    (2)

where D is the diagonal of A (assumed nonzero); B = I − D^{-1}A from the definition.

Denote e_i = x − x_i and rewrite (2):

x − x_{i+1} = x − x_i − D^{-1}(A x − A x_i)
e_{i+1} = e_i − D^{-1} A e_i = (I − D^{-1}A) e_i

‖e_{i+1}‖ ≤ ‖I − D^{-1}A‖ ‖e_i‖ ≤ ‖I − D^{-1}A‖² ‖e_{i−1}‖ ≤ ··· ≤ ‖I − D^{-1}A‖^{i+1} ‖e_0‖

i.e. for convergence (e_i → 0) we need ‖I − D^{-1}A‖ < 1 for some norm ‖·‖, e.g. when ρ(I − D^{-1}A) < 1.
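A short NumPy sketch of iteration (2); the diagonally dominant test matrix is my own choice, so that ρ(I − D^{-1}A) < 1 and the iteration converges.

# Sketch: Jacobi iteration x_{i+1} = x_i + D^{-1}(b - A x_i), with D = diag(A).
import numpy as np

def jacobi(A, b, tol=1e-10, maxiter=500):
    d = np.diag(A)                        # assumed nonzero
    x = np.zeros_like(b)
    for it in range(maxiter):
        r = b - A @ x
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it
        x = x + r / d
    return x, maxiter

rng = np.random.default_rng(1)
n = 50
A = rng.standard_normal((n, n)) + n * np.eye(n)   # diagonally dominant => convergence
b = rng.standard_normal(n)
x, iters = jacobi(A, b)
print(iters, np.linalg.norm(A @ x - b))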

Gauss-Seidel Method

x_{i+1} = (D − L)^{-1}(U x_i + b)    (3)

where A = D − L − U, with D the diagonal of A, −L its strictly lower triangular part, and −U its strictly upper triangular part.

Equivalently:

x_1^{(i+1)} = (b_1 − a_{12} x_2^{(i)} − a_{13} x_3^{(i)} − a_{14} x_4^{(i)} − ... − a_{1n} x_n^{(i)}) / a_{11}
x_2^{(i+1)} = (b_2 − a_{21} x_1^{(i+1)} − a_{23} x_3^{(i)} − a_{24} x_4^{(i)} − ... − a_{2n} x_n^{(i)}) / a_{22}
x_3^{(i+1)} = (b_3 − a_{31} x_1^{(i+1)} − a_{32} x_2^{(i+1)} − a_{34} x_4^{(i)} − ... − a_{3n} x_n^{(i)}) / a_{33}
...
x_n^{(i+1)} = (b_n − a_{n1} x_1^{(i+1)} − a_{n2} x_2^{(i+1)} − a_{n3} x_3^{(i+1)} − ... − a_{n,n−1} x_{n−1}^{(i+1)}) / a_{nn}
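The component-wise updates above translate directly into code. A minimal NumPy sketch (again with a diagonally dominant test matrix of my own, so the sweep converges):

# Sketch: one Gauss-Seidel sweep updates each component in place, so the newly
# computed entries x_1, ..., x_{k-1} are already used when x_k is formed (unlike Jacobi).
import numpy as np

def gauss_seidel(A, b, tol=1e-10, maxiter=500):
    n = len(b)
    x = np.zeros_like(b)
    for it in range(maxiter):
        if np.linalg.norm(b - A @ x) <= tol * np.linalg.norm(b):
            return x, it
        for k in range(n):
            # sum over all j != k, using the freshest available values of x
            s = A[k, :] @ x - A[k, k] * x[k]
            x[k] = (b[k] - s) / A[k, k]
    return x, maxiter

rng = np.random.default_rng(2)
n = 50
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)
x, iters = gauss_seidel(A, b)
print(iters, np.linalg.norm(A @ x - b))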

A comparison (from Julien Langou's slides)

Gauss-Seidel method

A = [ 10   1   2   0   1          b = [ 23
       0  12   1   3   1                44
       1   2   9   1   0                36
       0   3   1  10   0                49
       1   2   0   0  15 ]              80 ]

‖b − A x^{(i)}‖_2 / ‖b‖_2, as a function of i, the number of iterations

      x^(0)    x^(1)    x^(2)    x^(3)    x^(4)    ...
x_1   0.0000   2.3000   0.8783   0.9790   0.9978   1.0000
x_2   0.0000   3.6667   2.1548   2.0107   2.0005   2.0000
x_3   0.0000   2.9296   3.0339   3.0055   3.0006   3.0000
x_4   0.0000   3.5070   3.9502   3.9962   3.9998   4.0000
x_5   0.0000   4.6911   4.9875   5.0000   5.0001   5.0000

A comparison (from Julien Langou's slides)

Jacobi Method:

x_1^{(i+1)} = (b_1 − a_{12} x_2^{(i)} − a_{13} x_3^{(i)} − a_{14} x_4^{(i)} − ... − a_{1n} x_n^{(i)}) / a_{11}
x_2^{(i+1)} = (b_2 − a_{21} x_1^{(i)} − a_{23} x_3^{(i)} − a_{24} x_4^{(i)} − ... − a_{2n} x_n^{(i)}) / a_{22}
x_3^{(i+1)} = (b_3 − a_{31} x_1^{(i)} − a_{32} x_2^{(i)} − a_{34} x_4^{(i)} − ... − a_{3n} x_n^{(i)}) / a_{33}
...
x_n^{(i+1)} = (b_n − a_{n1} x_1^{(i)} − a_{n2} x_2^{(i)} − a_{n3} x_3^{(i)} − ... − a_{n,n−1} x_{n−1}^{(i)}) / a_{nn}

Gauss-Seidel method:

x_1^{(i+1)} = (b_1 − a_{12} x_2^{(i)} − a_{13} x_3^{(i)} − a_{14} x_4^{(i)} − ... − a_{1n} x_n^{(i)}) / a_{11}
x_2^{(i+1)} = (b_2 − a_{21} x_1^{(i+1)} − a_{23} x_3^{(i)} − a_{24} x_4^{(i)} − ... − a_{2n} x_n^{(i)}) / a_{22}
x_3^{(i+1)} = (b_3 − a_{31} x_1^{(i+1)} − a_{32} x_2^{(i+1)} − a_{34} x_4^{(i)} − ... − a_{3n} x_n^{(i)}) / a_{33}
...
x_n^{(i+1)} = (b_n − a_{n1} x_1^{(i+1)} − a_{n2} x_2^{(i+1)} − a_{n3} x_3^{(i+1)} − ... − a_{n,n−1} x_{n−1}^{(i+1)}) / a_{nn}

Gauss-Seidel is the method with better numerical properties (fewer iterations to convergence). Which method is more efficient to implement on a sequential or parallel computer? Why? In Gauss-Seidel, the computation of x_{k+1}^{(i+1)} requires knowledge of x_k^{(i+1)}, so the sweep is inherently sequential and parallelization of the update is impossible.
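Running both methods on the 5x5 example above (matrix and right-hand side as reconstructed from the slide) shows the difference in iteration counts; the stopping tolerance on the relative residual is my own choice.

# Sketch: Jacobi vs Gauss-Seidel on the 5x5 example, counting iterations
# until ||b - A x|| / ||b|| <= 1e-8.
import numpy as np

A = np.array([[10.,  1.,  2.,  0.,  1.],
              [ 0., 12.,  1.,  3.,  1.],
              [ 1.,  2.,  9.,  1.,  0.],
              [ 0.,  3.,  1., 10.,  0.],
              [ 1.,  2.,  0.,  0., 15.]])
b = np.array([23., 44., 36., 49., 80.])    # exact solution is (1, 2, 3, 4, 5)

D = np.diag(np.diag(A))
L = -np.tril(A, -1)                        # splitting A = D - L - U
U = -np.triu(A, 1)

def run(M, N):                             # iterate x <- M^{-1}(N x + b)
    x = np.zeros_like(b)
    for it in range(1, 200):
        x = np.linalg.solve(M, N @ x + b)
        if np.linalg.norm(b - A @ x) <= 1e-8 * np.linalg.norm(b):
            return x, it
    return x, 200

print("Jacobi      :", run(D, L + U)[1], "iterations")
print("Gauss-Seidel:", run(D - L, U)[1], "iterations")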

Convergence

Convergence can be very slow. Consider a modified Richardson method:

x_{i+1} = x_i + τ(b − A x_i)    (4)

Convergence is linear; similarly to Richardson we get

‖e_{i+1}‖ ≤ ‖I − τA‖^{i+1} ‖e_0‖

but it can be very slow (if ‖I − τA‖ is close to 1). E.g. let A be symmetric and positive definite (SPD), and let the matrix norm in ‖I − τA‖ be the one induced by ‖·‖_2. Then the best choice for τ (τ = 2/(λ_1 + λ_N)) would give

‖I − τA‖ = (k(A) − 1) / (k(A) + 1)

where k(A) = λ_N / λ_1 is the condition number of A.
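A quick numerical check of this statement, on an SPD test matrix of my own:

# Sketch: for SPD A, the optimal tau = 2/(lambda_min + lambda_max) gives
# ||I - tau*A||_2 = (k(A) - 1)/(k(A) + 1).
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((50, 50))
A = M @ M.T + 50 * np.eye(50)              # SPD test matrix

lam = np.linalg.eigvalsh(A)
lam_min, lam_max = lam[0], lam[-1]
tau = 2.0 / (lam_min + lam_max)
kappa = lam_max / lam_min                  # k(A) in the 2-norm

norm = np.linalg.norm(np.eye(50) - tau * A, 2)
print(norm, (kappa - 1) / (kappa + 1))     # the two numbers agree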

Convergence

Note: the rate of convergence depends on the condition number k(A). Even for the best τ, the rate of convergence

(k(A) − 1) / (k(A) + 1)

is slow (i.e. close to 1) for large k(A). We will see that the convergence of nonstationary methods also depends on k(A), but is better; e.g. compare with the CG rate

(√k(A) − 1) / (√k(A) + 1)

Part III

Iterative Refinement Methods

Mixed-Precision Solvers

Dense Linear Algebra (DLA) is needed in a wide variety of science and engineering applications (the routines below are provided in MAGMA 2.3):
- Linear systems: solve Ax = b. Computational electromagnetics, material science, applications using boundary integral equations, airflow past wings, fluid flow around ships and other offshore constructions, and many more.
- Least squares: find x to minimize ||Ax − b||. Computational statistics (e.g., linear least squares or ordinary least squares), econometrics, control theory, signal processing, curve fitting, and many more.
- Eigenproblems: solve Ax = λx. Computational chemistry, quantum mechanics, material science, face recognition, PCA, data mining, marketing, Google PageRank, spectral clustering, vibrational analysis, compression, and many more.
- SVD: A = U Σ V* (Av = σu and A*u = σv). Information retrieval, web search, signal processing, big data analytics, low-rank matrix approximation, total least squares minimization, pseudo-inverse, and many more.
- Many variations depending on the structure of A: A can be symmetric, positive definite, tridiagonal, Hessenberg, banded, sparse with dense blocks, etc.
- DLA is crucial to the development of sparse solvers.
http://icl.cs.utk.edu/magma
https://bitbucket.org/icl/magma

Leveraging Half Precision in HPC on V100: solving the linear system Ax = b
LU factorization is used to solve a linear system Ax = b:
- factor A = LU, so that LUx = b
- solve Ly = b (forward substitution)
- then solve Ux = y (backward substitution)

LU factorization requires O(n³) operations, and most of them are spent in GEMM (the trailing-matrix update). The factorization proceeds block by block:

For s = 0, nb, 2nb, ..., N:
  1. panel factorize (the current block of nb columns)
  2. update the trailing matrix (TRSM for the block row of U, then a large GEMM)
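A schematic NumPy version of this loop, without the pivoting that real getrf routines perform inside the panel; it only shows where the panel factorization, the triangular solves (TRSM, via scipy.linalg.solve_triangular), and the large GEMM update appear.

# Sketch: right-looking blocked LU without pivoting. L and U overwrite A in place.
import numpy as np
from scipy.linalg import solve_triangular

def blocked_lu(A, nb=64):
    A = A.copy()
    n = A.shape[0]
    for s in range(0, n, nb):
        e = min(s + nb, n)
        # 1. panel factorization: unblocked LU of the diagonal block
        for k in range(s, e - 1):
            A[k + 1:e, k] /= A[k, k]
            A[k + 1:e, k + 1:e] -= np.outer(A[k + 1:e, k], A[k, k + 1:e])
        if e < n:
            # TRSM: block column of L (A21 * U11^{-1}) and block row of U (L11^{-1} * A12)
            A[e:, s:e] = solve_triangular(A[s:e, s:e], A[e:, s:e].T, lower=False, trans='T').T
            A[s:e, e:] = solve_triangular(A[s:e, s:e], A[s:e, e:], lower=True, unit_diagonal=True)
            # 2. GEMM update of the trailing matrix: this is where most flops go
            A[e:, e:] -= A[e:, s:e] @ A[s:e, e:]
    return A   # strictly lower part holds L (unit diagonal), upper part holds U

# quick check
rng = np.random.default_rng(0)
n = 300
A = rng.standard_normal((n, n)) + n * np.eye(n)   # safely factorizable without pivoting
LU = blocked_lu(A)
L = np.tril(LU, -1) + np.eye(n)
U = np.triu(LU)
print(np.linalg.norm(L @ U - A) / np.linalg.norm(A))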

[Figure: right-looking blocked LU. At each step a panel of nb columns is factored (Panel), a TRSM produces the corresponding block row of U, and a GEMM updates the trailing matrix; steps 1-4 of the sweep are shown, with the L and U factors accumulating on the left and top.]

Leveraging Half Precision in HPC on V100, Motivation: study of the matrix-multiplication (GEMM) kernel on an Nvidia V100
- dgemm achieves about 6.4 Tflop/s
- sgemm achieves about 14 Tflop/s
- hgemm achieves about 27 Tflop/s
- Tensor-core GEMM reaches about 85 Tflop/s
- the rank-k GEMM needed by LU does not perform as well as the square GEMM, but is still OK
[Figure: GEMM performance in Tflop/s vs m = n, for FP64, FP32, FP16, and FP16-TC, square and k = 256.]

Study of the LU factorization algorithm on an Nvidia V100:
- LU factorization is used to solve a linear system Ax = b (factor A = LU, solve Ly = b, then Ux = y)
- the FP16-TC (Tensor Cores) hgetrf LU runs about 3-4X faster than the FP64 dgetrf LU
[Figure: LU factorization performance in Tflop/s vs matrix size for FP16-TC, FP16, FP32, and FP64 getrf.]

Use mixed-precision algorithms. Idea: use lower precision to compute the expensive flops (the O(n³) LU) and then iteratively refine the solution in order to achieve FP64 accuracy.

- Achieve higher performance: faster time to solution
- Reduce power consumption by decreasing the execution time: energy savings

Leveraging Half Precision in HPC on V100: Iterative Refinement
Idea: use lower precision to compute the expensive flops (the O(n³) LU) and then iteratively refine the solution in order to achieve FP64 accuracy.

Iterative refinement for dense systems, Ax = b, can work this way:

LU = lu(A)       lower precision    O(n³)
x = U\(L\b)      lower precision    O(n²)
r = b − Ax       FP64 precision     O(n²)

WHILE ||r|| not small enough
  1. find a correction z to adjust x that satisfies Az = r;
     solving Az = r could be done by either:
       z = U\(L\r)                            classical iterative refinement      lower precision   O(n²)
       GMRes preconditioned by the LU         iterative refinement using GMRes    lower precision   O(n²)
  2. x = x + z     FP64 precision   O(n)
  3. r = b − Ax    FP64 precision   O(n²)
END
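A minimal NumPy/SciPy sketch of the classical variant, with the factorization in FP32 and the residual and update in FP64 (FP16 would follow the same pattern, but NumPy has no half-precision LU):

# Sketch: mixed-precision iterative refinement. The O(n^3) LU is done in float32,
# the O(n^2) residual and update are done in float64.
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def mixed_precision_solve(A, b, tol=1e-12, maxiter=50):
    A32 = A.astype(np.float32)
    lu, piv = lu_factor(A32)                      # lower precision, O(n^3)
    x = lu_solve((lu, piv), b.astype(np.float32)).astype(np.float64)
    for it in range(maxiter):
        r = b - A @ x                             # FP64 residual, O(n^2)
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, it
        # classical IR: correction z from the low-precision factors, O(n^2)
        z = lu_solve((lu, piv), r.astype(np.float32)).astype(np.float64)
        x = x + z                                 # FP64 update
        # (the GMRes variant would instead solve Az = r with GMRes,
        #  preconditioned by these same low-precision LU factors)
    return x, maxiter

rng = np.random.default_rng(0)
n = 500
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)
x, iters = mixed_precision_solve(A, b)
print(iters, np.linalg.norm(A @ x - b) / np.linalg.norm(b))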

- Wilkinson, Moler, Stewart, and Higham provide error bounds for single-precision floating-point results when using double-precision floating point.
- It can be shown that using this approach we can compute the solution to 64-bit floating-point accuracy.

(The results below are from: Investigating Half Precision Arithmetic to Accelerate Dense Linear System Solvers, ScalA17, November 12-17, 2017, Denver, CO, USA.)

[Figure 4 panels: performance in Tflop/s and required number of iterations vs matrix size for FP64 dgesv, FP32->64 dsgesv, and FP16->64 dhgesv, for six matrix types:
(a) diagonally dominant matrix;
(b) positive eigenvalues, singular values σ_i random in [1/cond, 1] such that their logarithms are uniformly distributed;
(c) positive eigenvalues, clustered singular values σ = (1, ..., 1, 1/cond);
(d) clustered singular values σ = (1, ..., 1, 1/cond);
(e) positive eigenvalues, arithmetic distribution of the singular values σ_i = 1 − ((i−1)/(n−1))(1 − 1/cond);
(f) arithmetic distribution of the singular values σ_i = 1 − ((i−1)/(n−1))(1 − 1/cond).]

Leveraging FP16 in HPC on V100: Numerical Behavior
- Convergence history of the iterative refinement solver to achieve FP64 solution accuracy; the left graph shows the classical iterative refinement method, the right graph shows iterative refinement using GMRes.
- Problem generated with an arithmetic distribution of the singular values and positive eigenvalues.
- The FP32->64 algorithm converges as expected and is able to achieve FP64 solution accuracy in about 3-5 iterations.
- The FP16->64 algorithm requires more iterations than FP32->64 because of the lower-precision factorization.
- The FP16-TC->64 (Tensor Cores) variant outperforms FP16->64 because the accumulation during the FP16-TC GEMM (Schur update) uses 32-bit arithmetic.

Figure 4: Performance in Tflop/s for the three linear solvers (dgesv, dsgesv, dhgesv) and the required number of iterations to achieve an FP64-accurate solution using either the dsgesv (orange numbers) or the dhgesv (green numbers) solver, for different matrix sizes and different types of matrices. Note that cond = 10² for these experiments.

Leveraging Half Precision in HPC on V100: Performance Behavior
Performance of solving Ax = b using FP64, or using iterative refinement with GMRes to achieve FP64 accuracy. Tflop/s are computed as 2n³/(3·time), meaning twice higher is twice faster. Four solvers are compared:
- solving Ax = b using FP64 LU (dgesv)
- solving Ax = b using FP32 LU and iterative refinement to achieve FP64 accuracy (FP32->64 dsgesv)
- solving Ax = b using FP16 LU and iterative refinement to achieve FP64 accuracy (FP16->64 dhgesv)
- solving Ax = b using FP16 Tensor Cores LU and iterative refinement to achieve FP64 accuracy (FP16-TC->64 dhgesv)
For the matrix with an arithmetic distribution of the singular values and positive eigenvalues, the FP16-TC variant reaches about a 4X speedup over the FP64 solver.
[Figure: performance in Tflop/s vs matrix size for the four solvers, with the condition number κ∞(A) of the test matrices indicated on a secondary axis.]


Fig. 2: Performance of the three arithmetic precisions obtained on an Nvidia V100 GPU. (a) Performance of the rank-k update (Xgemm function) used in Xgetrf. (b) Performance of the Xgetrf routine.

Leveraging FP16 in HPC on V100: Numerical Behavior
- Convergence history of the iterative refinement solver to achieve FP64 solution accuracy; the left graph shows the classical iterative refinement method, the right graph shows iterative refinement using GMRes.
- Problem generated with a clustered distribution of the singular values.
- The FP32->64 algorithm converges as expected and is able to achieve FP64 solution accuracy in about 3-5 iterations.
- The FP16->64 algorithm requires too many iterations and might be a performance bottleneck.
- The FP16-TC->64 (Tensor Cores) variant outperforms FP16->64 because the accumulation during the FP16-TC GEMM (Schur update) uses 32-bit arithmetic.

Table II: description of the test matrices (cond = 100):
1. Random numbers with the diagonal modified to be dominant.
2. Positive eigenvalues; singular values random in [1/cond, 1] such that their logarithms are uniformly distributed.
3. Positive eigenvalues; clustered singular values, σ = (1, ..., 1, 1/cond).
4. Clustered singular values, σ = (1, ..., 1, 1/cond).
5. Positive eigenvalues; arithmetically distributed singular values, σ_i = 1 − ((i−1)/(n−1))(1 − 1/cond), i = 1..n.
6. Arithmetically distributed singular values, σ_i = 1 − ((i−1)/(n−1))(1 − 1/cond), i = 1..n.

Figure 4a shows results on matrices where all methods behave well. Convergence was achieved in 3 iterations for FP32, 4 iterations for FP16-TC, and 7 iterations for FP16. Figure 4b has the same singular value distribution as Figure 4a but not necessarily positive eigenvalues. This type is the most difficult, and the FP16 variants using either IR or IRGM do not converge. Note that the FP16 IRGM can converge when allowing more than 2,000 iterations, but for our experiment we limited the max iterations to 400, since we have seen a large performance drop when iterations are around 200, where the iterative refinement becomes a drawback. The FP32 variants are not influenced by the matrix type and always converge in about 3-4 iterations. Surprisingly, the FP16-TC behaves very well and converges in about 18 iterations, while the FP16 did not converge.

Lesson: for all the matrices considered, the FP16-TC variant is the most robust and fastest in convergence among the FP16 arithmetic-based methods. The FP32 refinement variants emphasize a stable behavior regardless of the matrix types. [NJH: I suggest changing "stable" to "consistent", since "stable" implies numerical stability, which is not the issue here.]

This observation suggests the surprising effectiveness of the FP16 arithmetic, which might be robust enough to be used in HPC dense linear system solvers.

VII. Experimental results discussion. This section presents the performance results of our three iterative refinement methods (dhgesv-TC, dhgesv, and dsgesv), using either IR or IRGM, compared to the reference dgesv solver. We also depict the number of iterations required by each method to reach FP64 accuracy. The Tflop/s are computed based on the same formula (P = 2n³/(3·time)), which means performance reflects the time to solution: if a method has 2X higher performance, it is twice faster. The performance results are presented in Figure 5 and Figure 6 for the six representative types of matrices studied in Section VI. In each figure there are four performance curves that refer to the reference dgesv and the three iterative refinement algorithms dhgesv-TC, dhgesv, and dsgesv. In Figure 5a, the matrix is diagonally dominant and, as shown in Section VI, all variants require three to five iterations to converge. Thus, one can expect that the low-precision iterative refinement algorithms will bring a large speedup compared to dgesv. Since the number of iterations is small, we imagine that the speedup ratio will be similar to the one observed in Figure 2b for the LU factorization. We confirm ...

Part IV

Nonstationary Iterative Methods

Nonstationary Iterative Methods

The methods involve information that changes at every iteration, e.g. inner products with residuals or other vectors.

The methods are newer, harder to understand, but in general more effective:
- based on the idea of orthogonal vectors and subspace projections
- examples: Krylov iterative methods (CG/PCG, GMRES, CGNE, QMR, BiCG, etc.)

Krylov Iterative Methods

Krylov Iterative Methods

Krylov subspaces: these are the spaces

K_i(A, r) = span{ r, Ar, A²r, ..., A^{i−1} r }

Krylov iterative methods find an approximation x_i to x, where

Ax = b,

as a minimization on K_i(A, r). We have seen how this can be done, for example, by projection, i.e. by the Petrov-Galerkin conditions:

(A x_i, φ) = (b, φ)    for all φ ∈ K_i(A, r)
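A small sketch of this projection idea: build an orthonormal basis of K_i(A, r) with an Arnoldi-style loop, then impose the Galerkin condition by solving the projected i × i system. This is essentially the FOM approach; the test matrix is my own, and x_0 = 0 is assumed so that r = b.

# Sketch: Galerkin projection onto the Krylov subspace K_i(A, b) (x_0 = 0).
# Columns of V form an orthonormal basis of K_i; the condition (A x_i, v) = (b, v)
# for all v in K_i becomes the small system (V^T A V) y = V^T b, with x_i = V y.
import numpy as np

def krylov_galerkin(A, b, i):
    n = len(b)
    V = np.zeros((n, i))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(1, i):                      # expand by one matvec, then orthogonalize
        w = A @ V[:, j - 1]
        w -= V[:, :j] @ (V[:, :j].T @ w)       # Gram-Schmidt against previous basis vectors
        V[:, j] = w / np.linalg.norm(w)
    y = np.linalg.solve(V.T @ A @ V, V.T @ b)  # projected (Galerkin) system
    return V @ y

rng = np.random.default_rng(0)
n = 200
A = rng.standard_normal((n, n)) + n * np.eye(n)
b = rng.standard_normal(n)
for i in (5, 10, 20, 40):
    x_i = krylov_galerkin(A, b, i)
    print(i, np.linalg.norm(b - A @ x_i) / np.linalg.norm(b))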

Krylov Iterative Methods

In general we expand the Krylov subspace by a matrix-vector product, and do a minimization/projection in it. Various methods result from specific choices of expansion and minimization/projection.

Krylov Iterative Methods

An example: The Conjugate Gradient Method (CG)

The method is for SPD matrices

There is a way at iteration i to construct a new 'search direction' p_i s.t.

span{ p_0, p_1, ..., p_i } ≡ K_{i+1}(A, r_0)    and    (A p_i, p_j) = 0 for i ≠ j.

Note: A is SPD ⇒ (A p_i, p_j) ≡ (p_i, p_j)_A can be used as an inner product, i.e. p_0, ..., p_i is an (·, ·)_A-orthogonal basis for K_{i+1}(A, r_0) ⇒ we can easily find x_{i+1} ≈ x as

x_{i+1} = x_0 + α_0 p_0 + ··· + α_i p_i    s.t.    (A x_{i+1}, p_j) = (b, p_j) for j = 0, ..., i

Namely, because of the (·, ·)_A-orthogonality of p_0, ..., p_i, at iteration i + 1 we have to find only α_i:

(A x_{i+1}, p_i) = (A(x_i + α_i p_i), p_i) = (b, p_i)    ⇒    α_i = (r_i, p_i) / (A p_i, p_i)

Note: x_i above can actually be replaced by any x_0 + v, v ∈ K_i(A, r_0). (Why?)
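For completeness, a compact NumPy version of CG built on exactly these ingredients (A-orthogonal search directions and the α_i above); the SPD test matrix is my own.

# Sketch: the Conjugate Gradient method for SPD A.
import numpy as np

def cg(A, b, tol=1e-10, maxiter=1000):
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()                          # first search direction
    rs = r @ r
    for it in range(1, maxiter + 1):
        Ap = A @ p
        alpha = rs / (p @ Ap)             # alpha_i = (r_i, p_i)/(A p_i, p_i); here (r_i, p_i) = (r_i, r_i)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) <= tol * np.linalg.norm(b):
            return x, it
        p = r + (rs_new / rs) * p         # new direction, A-orthogonal to the previous ones
        rs = rs_new
    return x, maxiter

rng = np.random.default_rng(0)
M = rng.standard_normal((300, 300))
A = M @ M.T + 300 * np.eye(300)           # SPD test matrix
b = rng.standard_normal(300)
x, iters = cg(A, b)
print(iters, np.linalg.norm(A @ x - b) / np.linalg.norm(b))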

Learning Goals

A brief introduction to iterative methods:
- Motivation for iterative methods and links to previous lectures, namely PDE discretization and sparse matrices
- Optimized implementations
- Stationary iterative methods
- Nonstationary iterative methods and links to building blocks that we have already covered: projection/minimization, orthogonalization
- Krylov methods; an example with CG; more examples in the next lecture
