4. Linear Equations with Sparse Matrices
4.1 General Properties of Sparse Matrices

Full n x n matrix: storage O(n^2), solution O(n^3) ⇒ too costly for most applications, especially for fine discretizations (large n).

Idea: Formulate the given problem in a clever way that leads to a linear system
- that is sparse: storage O(n), solution O(n)?
- that is structured: storage O(n), solution O(n log(n)), e.g. FFT,
- that is dense, but reduced e.g. from 3D to 2D.

Examples: tridiagonal matrix, banded matrix, block band matrix.

Sparse example matrix, n = 5, nnz = 12:

        [ 1   0   0   2   0 ]
        [ 3   4   0   5   0 ]
    A = [ 6   0   7   8   9 ]
        [ 0   0  10  11   0 ]
        [ 0   0   0   0  12 ]

4.1.1 Storage in Coordinate Form

    values  AA:  12.  9.  7.  5.  1.  2.  11.  3.  6.  4.  8.  10.
    row     JR:   5   3   3   2   1   1    4   2   3   2   3    4
    column  JC:   5   5   3   4   1   4    4   1   1   2   4    3

so that AA(j) = a_{JR(j),JC(j)}.

To store: n, nnz, 2*nnz integers for the row and column indices in JR and JC, and nnz floats in AA. No sorting required, but the information is partly redundant.

Code for computing c = A*b:

    for j = 1 : nnz
        c(JR(j)) = c(JR(j)) + AA(j) * b(JC(j));
    end

Disadvantage: indirect addressing (indexing) in the vectors c and b ⇒ jumps in memory.
Advantage: no difference between columns and rows (A and A^T); simple.

4.1.2 Compressed Sparse Row Format: CSR

The entries are stored row by row:

         |row 1|row 2 |row 3   |row 4|row 5|
    AA:  | 2  1| 5 3 4| 6 7 8 9|10 11| 12  |   values
    JA:  | 4  1| 4 1 2| 1 3 4 5| 3  4|  5  |   column indices
    IA:    1  3  6  10  12  13                 pointer to row i

Storage: n, nnz, n + nnz + 1 integers, nnz floats.

Code for computing c = A*b:

    for i = 1 : n
        for j = IA(i) : IA(i+1)-1
            c(i) = c(i) + AA(j) * b(JA(j));
        end
    end

Indirect addressing only in b.
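The two mat-vec loops above can be made executable. The notes use MATLAB-style pseudocode; the following Python version is an illustrative translation that keeps the 1-based index arrays (JR, JC, JA, IA) from the text and shifts them by one for Python's 0-based lists.

```python
# Sketch: COO and CSR matrix-vector products for the n = 5, nnz = 12
# example matrix; the index arrays are kept 1-based as in the notes.

def coo_matvec(n, AA, JR, JC, b):
    """c = A*b for A in coordinate form: c(JR(j)) += AA(j)*b(JC(j))."""
    c = [0.0] * n
    for j in range(len(AA)):                       # for j = 1 : nnz
        c[JR[j] - 1] += AA[j] * b[JC[j] - 1]
    return c

def csr_matvec(n, AA, JA, IA, b):
    """c = A*b for A in compressed sparse row form."""
    c = [0.0] * n
    for i in range(n):                             # for i = 1 : n
        for j in range(IA[i] - 1, IA[i + 1] - 1):  # j = IA(i) : IA(i+1)-1
            c[i] += AA[j] * b[JA[j] - 1]
    return c

# Coordinate form of the example matrix (unsorted, as in the text)
AA_coo = [12., 9., 7., 5., 1., 2., 11., 3., 6., 4., 8., 10.]
JR = [5, 3, 3, 2, 1, 1, 4, 2, 3, 2, 3, 4]
JC = [5, 5, 3, 4, 1, 4, 4, 1, 1, 2, 4, 3]

# CSR form of the same matrix
AA_csr = [2., 1., 5., 3., 4., 6., 7., 8., 9., 10., 11., 12.]
JA = [4, 1, 4, 1, 2, 1, 3, 4, 5, 3, 4, 5]
IA = [1, 3, 6, 10, 12, 13]

b = [1.0] * 5                    # b = ones, so c contains the row sums of A
c1 = coo_matvec(5, AA_coo, JR, JC, b)
c2 = csr_matvec(5, AA_csr, JA, IA, b)
# both yield [3.0, 12.0, 30.0, 21.0, 12.0]
```

Running both loops on the same matrix shows the formats are interchangeable for the mat-vec; only the access pattern (scattered writes vs. row-wise accumulation) differs.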
Columnwise ⇒ compressed sparse column format (CSC).

4.1.3 CSR with Extracted Main Diagonal

    main diagonal entries       nondiagonal entries in CSR
    AA:  1  4  7  11  12  *  |  2  3  5  6  8  9  10        values
    JA:  7  8  10 13  14  14 |  4  1  4  1  4  5  3         column indices
         pointer to begin of i-th row

Storage: n, nnz, nnz + 1 integers, nnz + 1 floats.

Code for computing c = A*b:

    for i = 1 : n
        c(i) = AA(i) * b(i);
        for j = JA(i) : JA(i+1)-1
            c(i) = c(i) + AA(j) * b(JA(j));
        end
    end

4.1.4 Diagonalwise Storage

        [ 1   0   2   0   0 ]
        [ 3   4   0   5   0 ]
    A = [ 0   6   7   0   8 ]
        [ 0   0   9  10   0 ]
        [ 0   0   0  11  12 ]

Diagonal numbers: -1, 0, 2. Values in:

           [  *   1   2 ]
           [  3   4   5 ]
    DIAG = [  6   7   8 ] ,   IOFF = ( -1  0  2 )
           [  9  10   * ]
           [ 11  12   * ]

Storage: n, nd = number of diagonals, nd integers for IOFF, n*nd floats.

4.1.5 Rectangular Storage Scheme by Pressing from the Right

    [ 1   0   2   0   0 ]                [ 1   2   0 ]            [ 1   3   * ]
    [ 3   4   0   5   0 ]                [ 3   4   5 ]            [ 1   2   4 ]
    [ 0   6   7   0   8 ]  gives  COEF = [ 6   7   8 ] ,  JCOEF = [ 2   3   5 ]
    [ 0   0   9  10   0 ]                [ 9  10   0 ]            [ 3   4   * ]
    [ 0   0   0  11  12 ]                [11  12   0 ]            [ 4   5   * ]

nl := nnz of the longest row.
Storage: n, n*nl integers and floats.

Code for c = A*b:

    for i = 1 : n
        for j = 1 : nl
            c(i) = c(i) + COEF(i,j) * b(JCOEF(i,j));
        end
    end

This format was used in ELLPACK (a package of subroutines for elliptic PDEs). The coordinate form is used by MATLAB.

4.1.6 Jagged Diagonal Form

Prestep: Sort the rows by their length, long rows first.

        [ 1   0   2   0   0 ]          [ 3   4   0   5   0 ]  length 3
        [ 3   4   0   5   0 ]          [ 0   6   7   0   8 ]  length 3
    A = [ 0   6   7   0   8 ]  ⇒  PA = [ 1   0   2   0   0 ]  length 2
        [ 0   0   9  10   0 ]          [ 0   0   9  10   0 ]  length 2
        [ 0   0   0  11  12 ]          [ 0   0   0  11  12 ]  length 2

Values of PA (first jagged diagonal | second jagged diagonal | third):

    DJ    = ( 3  6  1  9  11 | 4  7  2  10  12 | 5  8 )

Column indices:

    JDIAG = ( 1  2  1  3   4 | 2  3  3   4   5 | 4  5 )

Pointer to the beginning of the j-th jagged diagonal:

    IDIAG = ( 1  6  11  13 ),   NDIAG = number of jagged diagonals = 3

Storage: n, NDIAG, nnz floats, nnz + NDIAG + 1 integers.

Code for computing c = PA*b:

    for j = 1 : NDIAG
        for i = 1 : IDIAG(j+1) - IDIAG(j)      % length of j-th jagged diagonal
            k = IDIAG(j) + i - 1;
            c(i) = c(i) + DJ(k) * b(JDIAG(k));
        end
    end

Advantages: the loops always start with row 1, and more operations act on neighboring data. The prepermutation changes only rows and can be done implicitly.
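The jagged diagonal loop can likewise be made executable. A minimal Python sketch, keeping the 1-based arrays DJ, JDIAG, IDIAG from the text; the row permutation is assumed to be handled implicitly, so the result is c = (PA)*b:

```python
# Sketch: jagged diagonal (JAD) mat-vec; computes c = (P*A)*b for the
# row-sorted matrix PA of the text (NDIAG = 3 jagged diagonals).

def jad_matvec(n, ndiag, DJ, JDIAG, IDIAG, b):
    c = [0.0] * n
    for j in range(ndiag):                    # for j = 1 : NDIAG
        length = IDIAG[j + 1] - IDIAG[j]      # length of j-th jagged diagonal
        for i in range(length):               # rows 1 .. length
            k = IDIAG[j] + i - 1              # 0-based position in DJ/JDIAG
            c[i] += DJ[k] * b[JDIAG[k] - 1]
    return c

DJ    = [3., 6., 1., 9., 11., 4., 7., 2., 10., 12., 5., 8.]
JDIAG = [1, 2, 1, 3, 4, 2, 3, 3, 4, 5, 4, 5]
IDIAG = [1, 6, 11, 13]

c = jad_matvec(5, 3, DJ, JDIAG, IDIAG, [1.0] * 5)
# row sums of PA: [12.0, 21.0, 3.0, 19.0, 23.0]
```

Note that the inner loop runs over consecutive positions of DJ and JDIAG, which is exactly the "operations on neighboring data" advantage stated above.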
4.2 Sparse Matrices and Graphs

4.2.1 Graph G(A) for Symmetric Positive Definite A

For a symmetric positive definite (spd) n x n matrix A = A^T > 0: vertices e_1, ..., e_n with edges (e_i, e_k) for a_ik ≠ 0; the graph is undirected.

        [ *   *   0   * ]
        [ *   *   *   0 ]
    A = [ 0   *   *   * ]      G(A):  e1 - e2 - e3 - e4 - e1   (a 4-cycle)
        [ *   0   *   * ]

G(A) can also be viewed as a directed graph with an edge in each direction.

Adjacency matrix for G(A) or A:

              [ 1   1   0   1 ]
              [ 1   1   1   0 ]
    A(G(A)) = [ 0   1   1   1 ]
              [ 1   0   1   1 ]

It can be obtained directly by replacing each nonzero entry in A by 1.

Symmetric permutations of A of the form P A P^T change the ordering of the rows and columns of A simultaneously. Therefore, the graph of P A P^T can be obtained from the graph of A by renumbering the vertices.

Example: P the permutation that exchanges 3 ↔ 4:

    G(A):       e1 - e2 - e3 - e4 - e1
    G(P A P^T): the same graph with the vertices e3 and e4 renumbered.

4.2.2 Matrix A Nonsymmetric, G(A) Directed

        [ *   *   0   0 ]                 [ 1   1   0   0 ]
        [ 0   *   *   0 ]                 [ 0   1   1   0 ]
    A = [ 0   *   *   * ]   ⇒   A(G(A)) = [ 0   1   1   1 ]
        [ *   0   0   * ]                 [ 1   0   0   1 ]

    G(A):  directed edges  e1 → e2,  e2 → e3,  e3 → e2,  e3 → e4,  e4 → e1

How can we characterize "good" sparsity patterns? "Good" means: Gaussian elimination can be reduced to smaller subproblems or produces no (or small) fill-in.

Block Diagonal Pattern

        [ *   0   *   0 ]                 [ 1   0   1   0 ]
        [ 0   *   0   * ]                 [ 0   1   0   1 ]
    A = [ *   0   *   0 ]   ⇒   A(G(A)) = [ 1   0   1   0 ]
        [ 0   *   0   * ]                 [ 0   1   0   1 ]

    G(A):  e1 - e3 ,  e2 - e4   (two disconnected components)

Exchanging 2 ↔ 3 gives

              [ *   *   0   0 ]
              [ *   *   0   0 ]   [ A1   0  ]
    P A P^T = [ 0   0   *   * ] = [  0   A2 ]
              [ 0   0   *   * ]

By this permutation, A can be transformed into block diagonal form ⇒ easy to solve:

    [ A1   0  ]^-1   [ A1^-1    0    ]
    [  0   A2 ]    = [   0    A2^-1  ]

Banded Pattern

A = (a_ik) with upper bandwidth p and lower bandwidth q: the first row contains a_11, ..., a_1p, the first column a_11, ..., a_q1, the last column ends with a_{n-p+1,n}, and the last row with a_{n,n-q+1}, ..., a_nn; all entries outside this band are zero.

Gaussian elimination without pivoting preserves the sparsity pattern: L is lower triangular with lower bandwidth q (first column l_11, ..., l_q1, last row l_{n,n-q+1}, ..., l_nn), and U is upper triangular with upper bandwidth p (first row u_11, ..., u_1p, last column u_{n-p+1,n}, ..., u_nn). With pivoting the bandwidth in U grows, but remains ≤ p + q.
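The effect of a symmetric permutation P A P^T on the pattern can be checked mechanically. A small Python sketch, applied to the block diagonal example; the list-based convention "p[i] = old index renumbered to new index i" is an assumption for illustration:

```python
# Sketch: symmetric permutation of a 0/1 adjacency pattern.
# p[i] gives the old (0-based) index that becomes new index i.

def sym_permute(A, p):
    """Return P*A*P^T as a dense 0/1 pattern."""
    n = len(A)
    return [[A[p[i]][p[j]] for j in range(n)] for i in range(n)]

# Adjacency pattern of the block diagonal example
A = [[1, 0, 1, 0],
     [0, 1, 0, 1],
     [1, 0, 1, 0],
     [0, 1, 0, 1]]

# Exchange vertices 2 <-> 3 (0-based: indices 1 and 2)
P_A_Pt = sym_permute(A, [0, 2, 1, 3])
# result: [[1,1,0,0],[1,1,0,0],[0,0,1,1],[0,0,1,1]]  (block diagonal)
```

Rows and columns are permuted by the same p, which is exactly the vertex renumbering described above: the edge set is unchanged, only the labels move.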
Overlapping Block Diagonal

A consists of diagonal blocks that overlap slightly along the diagonal (the original figure shows this pattern). The pattern is preserved by Gaussian elimination (with restricted pivoting).

Dissection Form

Nested (recursive) dissection: two independent diagonal blocks are coupled only through a small separator block, and the same structure is repeated recursively inside the blocks (the original figure shows the resulting zero pattern). The pattern is preserved during GE without pivoting: no fill-in.

Schur Complement Reduction

Idea: Write the matrix B in terms of smaller submatrices and look for a block factorization:

    [ B1  B2 ]   [ B1^-1    D   ]   !   [ I   0 ]
    [ B3  B4 ] · [   0     S^-1 ]   =   [ *   I ]

To satisfy this equation we have to set:

    B1 D + B2 S^-1 = 0   ⇒   D = -B1^-1 B2 S^-1
    B3 D + B4 S^-1 = I   ⇒   I = -B3 B1^-1 B2 S^-1 + B4 S^-1
                         ⇒   S = B4 - B3 B1^-1 B2    (the Schur complement)

Equivalently,

    B = [ B1  B2 ] = [    I       0 ] · [ B1  B2 ]
        [ B3  B4 ]   [ B3 B1^-1   I ]   [  0   S ]

Therefore, solving a linear system in B is reduced to solving two smaller linear systems, one in B1 and the other in the Schur complement S. B sparse ⇒ B1 also sparse, but S is usually dense!

Example: Schur complement and dissection form:

        [ A1   0   F1 ]
    A = [  0   A2  F2 ]
        [ G1  G2   A3 ]

Schur complement:

    S = A3 - ( G1  G2 ) · [ A1^-1    0   ] · [ F1 ]
                          [   0    A2^-1 ]   [ F2 ]

      = A3 - G1 A1^-1 F1 - G2 A2^-1 F2

Direct derivation of the Schur complement:

    [ A1   0   F1 ]   [ x1 ]   [ b1 ]         A1 x1 + F1 x3 = b1
    [  0   A2  F2 ] · [ x2 ] = [ b2 ]   ⇒    A2 x2 + F2 x3 = b2
    [ G1  G2   A3 ]   [ x3 ]   [ b3 ]         G1 x1 + G2 x2 + A3 x3 = b3

    ⇒   x1 = A1^-1 b1 - A1^-1 F1 x3
        x2 = A2^-1 b2 - A2^-1 F2 x3

Inserting x1 and x2 into the third equation:

    ⇒   (G1 A1^-1 b1 - G1 A1^-1 F1 x3) + (G2 A2^-1 b2 - G2 A2^-1 F2 x3) + A3 x3 = b3

    ⇒   (A3 - G1 A1^-1 F1 - G2 A2^-1 F2) x3 = b3 - G1 A1^-1 b1 - G2 A2^-1 b2

    ⇒   S x3 = b~3

Algorithm for solving Ax = b based on the Schur complement:
1. Compute S by using inv(A1) and inv(A2).
2. Solve S x3 = b~3.
3. Compute x1 and x2 by using inv(A1) and inv(A2).

The explicit computation of S can be avoided by solving the linear system in S iteratively, e.g. with Jacobi, pcg, ....
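The three algorithm steps can be traced on the smallest possible instance, where each block A1, A2, A3, F1, F2, G1, G2 is a 1 x 1 "matrix" (a scalar), so every inverse is a division. The concrete numbers below are illustrative, not from the notes:

```python
# Sketch: Schur complement solve for the dissection system
#   [A1 0 F1; 0 A2 F2; G1 G2 A3] [x1; x2; x3] = [b1; b2; b3]
# with scalar blocks.

def schur_solve(A1, A2, A3, F1, F2, G1, G2, b1, b2, b3):
    S   = A3 - G1 * F1 / A1 - G2 * F2 / A2   # step 1: S = A3 - G1 A1^-1 F1 - G2 A2^-1 F2
    b3t = b3 - G1 * b1 / A1 - G2 * b2 / A2   # reduced right-hand side b~3
    x3  = b3t / S                            # step 2: solve S x3 = b~3
    x1  = (b1 - F1 * x3) / A1                # step 3: back-substitute for x1, x2
    x2  = (b2 - F2 * x3) / A2
    return x1, x2, x3

# Example system A = [[2,0,1],[0,3,1],[1,1,4]] with exact solution (1, 1, 1):
x1, x2, x3 = schur_solve(2., 3., 4., 1., 1., 1., 1., 3., 4., 6.)
# x1, x2, x3 are all 1.0 up to rounding
```

With genuine matrix blocks the divisions become solves with A1, A2 and S, but the order of the three steps is the same.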
Then S is never needed explicitly: in every iteration step we only have to compute S times an intermediate vector. To achieve fast convergence, a preconditioner (an approximation of S) has to be used! Iterative methods and preconditioning will be the subject of later chapters.
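The matrix-free application of S that such an iterative solver needs can be sketched as follows. To keep the inner solves trivial, A1 and A2 are assumed diagonal here; this is an illustrative assumption, since in general one would use sparse factorizations of A1 and A2:

```python
# Sketch: apply S*v = A3 v - G1 A1^-1 (F1 v) - G2 A2^-1 (F2 v)
# without ever forming S. Here A1 = diag(d1), A2 = diag(d2).

def matvec(M, v):
    """Dense mat-vec on lists of lists."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in M]

def apply_S(v, A3, G1, F1, G2, F2, d1, d2):
    t1 = [x / d for x, d in zip(matvec(F1, v), d1)]   # A1^-1 (F1 v): diagonal solve
    t2 = [x / d for x, d in zip(matvec(F2, v), d2)]   # A2^-1 (F2 v)
    w  = matvec(A3, v)
    u  = matvec(G1, t1)
    z  = matvec(G2, t2)
    return [w[i] - u[i] - z[i] for i in range(len(v))]

# Tiny 1x1-block check: S = 4 - 1/2 - 1/3 = 19/6, so S * 6 = 19
Sv = apply_S([6.0], [[4.0]], [[1.0]], [[1.0]], [[1.0]], [[1.0]], [2.0], [3.0])
# Sv == [19.0]
```

Each call costs a few sparse mat-vecs plus one solve with A1 and one with A2, which is exactly the per-iteration work mentioned above.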