4. Linear Equations with Sparse Matrices


4.1 General Properties of Sparse Matrices

Full n x n matrix: storage O(n^2), solution O(n^3) -> too costly for most applications, especially for fine discretizations (large n).

Idea: Formulate the given problem in a clever way that leads to a linear system
- that is sparse: storage O(n), solution O(n)?
- or that is structured: storage O(n), solution O(n log(n)) (e.g. FFT),
- or that is dense, but reduced from e.g. 3D to 2D.

Examples: tridiagonal matrix, banded matrix, block band matrix.

Sparse example matrix (n = 5, nnz = 12):

    A = [ 1   0   0   2   0
          3   4   0   5   0
          6   0   7   8   9
          0   0  10  11   0
          0   0   0   0  12 ]

4.1.1 Storage in Coordinate Form

    values   AA:  12   9   7   5   1   2  11   3   6   4   8  10
    row      JR:   5   3   3   2   1   1   4   2   3   2   3   4
    column   JC:   5   5   3   4   1   4   4   1   1   2   4   3

To store: n, nnz, 2*nnz integers for the row and column indices in JR and JC, and nnz floats in AA. No ordering of the entries is assumed. The index information is redundant.

Code for computing c = A*b:

    for j = 1 : nnz(A)
        c(JR(j)) = c(JR(j)) + AA(j) * b(JC(j));    % AA(j) = a(JR(j),JC(j))
    end

Disadvantage: indirect addressing (indexing) in the vectors c and b -> jumps in memory.
Advantage: no difference between columns and rows (A and A^T), simple.

4.1.2 Compressed Sparse Row Format (CSR)

                    row 1 | row 2   | row 3      | row 4 | row 5
    values  AA:     2  1  | 5  3  4 | 6  7  8  9 | 10 11 | 12
    columns JA:     4  1  | 4  1  2 | 1  3  4  5 |  3  4 |  5
    row ptr IA:     1  3  6  10  12  13             (IA(i) = start of row i)

Storage: n, nnz, n + nnz + 1 integers, nnz floats.

Code for computing c = A*b:

    for i = 1 : n
        for j = IA(i) : IA(i+1)-1
            c(i) = c(i) + AA(j) * b(JA(j));
        end
    end

Indirect addressing only in b. The columnwise analogue is the compressed sparse column format (CSC).

4.1.3 CSR with Extracted Main Diagonal

    AA:  1  4   7  11  12   *  |  2  3  5  6  8  9  10     (main diagonal, one unused slot, off-diagonal values rowwise)
    JA:  7  8  10  13  14  14  |  4  1  4  1  4  5   3     (row pointers JA(1..n+1), then column indices)

JA(1), ..., JA(n+1) point to the beginning of the off-diagonal part of row i; JA(n+2), ..., JA(nnz+1) hold the column indices of the off-diagonal entries.

Storage: n, nnz, nnz + 1 integers, nnz + 1 floats.

    for i = 1 : n
        c(i) = AA(i) * b(i);
        for j = JA(i) : JA(i+1)-1
            c(i) = c(i) + AA(j) * b(JA(j));
        end
    end

4.1.4 Diagonalwise Storage

    A = [ 1   0   2   0   0
          3   4   0   5   0
          0   6   7   0   8
          0   0   9  10   0
          0   0   0  11  12 ]

The nonzeros of this A lie on the diagonals with offsets -1, 0 and 2:

    DIAG = [  *   1   2            IOFF = ( -1  0  2 )
              3   4   5
              6   7   8
              9  10   *
             11  12   * ]

DIAG(i,k) holds a(i, i+IOFF(k)); * marks positions outside the matrix.

Storage: n, nd = number of diagonals, nd integers for IOFF, n*nd floats.

4.1.5 Rectangular Storage Scheme by Pressing from the Right

    [ 1   0   2   0   0                    [  1   2   0               [ 1  3  *
      3   4   0   5   0                       3   4   5                 1  2  4
      0   6   7   0   8    gives   COEF =     6   7   8      JCOEF =    2  3  5
      0   0   9  10   0                       9  10   0                 3  4  *
      0   0   0  11  12 ]                    11  12   0 ]               4  5  * ]

nl := nnz of the longest row (here nl = 3). (The * positions in JCOEF must hold some valid dummy column index, e.g. the row index, with value 0 in COEF.)

Storage: n, n*nl integers and n*nl floats.

Code for c = A*b:

    for i = 1 : n
        for j = 1 : nl
            c(i) = c(i) + COEF(i,j) * b(JCOEF(i,j));
        end
    end

This format was used in ELLPACK (a package of subroutines for elliptic PDEs). The coordinate form is used by MATLAB.

4.1.6 Jagged Diagonal Form

Prestep: sort the rows by their length, long rows first.

    A = [ 1   0   2   0   0              [ 3   4   0   5   0     length 3
          3   4   0   5   0                0   6   7   0   8     length 3
          0   6   7   0   8    =>  PA =    1   0   2   0   0     length 2
          0   0   9  10   0                0   0   9  10   0     length 2
          0   0   0  11  12 ]              0   0   0  11  12 ]   length 2

Values of PA (first, second, third jagged diagonal) and their column indices:

    DJ    = ( 3  6  1  9  11 |  4  7  2  10  12 |  5  8 )
    JDIAG = ( 1  2  1  3   4 |  2  3  3   4   5 |  4  5 )

Pointer to the beginning of the j-th jagged diagonal: IDIAG = ( 1  6  11  13 ); NDIAG = number of jagged diagonals (here 3).

Storage: n, NDIAG, nnz floats, nnz + NDIAG + 1 integers.

Code for c = A*b (c is obtained in the permuted row order):

    for j = 1 : NDIAG
        for i = 1 : IDIAG(j+1) - IDIAG(j)      % length of the j-th jagged diagonal
            k = IDIAG(j) + i - 1;
            c(i) = c(i) + DJ(k) * b(JDIAG(k));
        end
    end

Advantage: every jagged diagonal starts with row 1; more operations on neighboring data (long inner loops). The pre-permutation changes only rows and can be done implicitly.
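For a concrete check of the two basic formats, here is a minimal Python/NumPy sketch (an illustration added here, not part of the original slides): it stores the 5 x 5 example matrix in coordinate and CSR form and evaluates c = A*b with the same loops as the pseudocode above, only with 0-based indices; the variable names AA, JR, JC, JA, IA follow the slides.

    import numpy as np

    # 5 x 5 example matrix from Section 4.1 (n = 5, nnz = 12)
    A = np.array([[ 1,  0,  0,  2,  0],
                  [ 3,  4,  0,  5,  0],
                  [ 6,  0,  7,  8,  9],
                  [ 0,  0, 10, 11,  0],
                  [ 0,  0,  0,  0, 12]], dtype=float)
    b = np.arange(1.0, 6.0)                       # some right-hand side

    # coordinate form: values AA, row indices JR, column indices JC (0-based)
    JR, JC = np.nonzero(A)                        # returned in row-major order
    AA = A[JR, JC]
    c_coo = np.zeros(5)
    for j in range(len(AA)):                      # c(JR(j)) += AA(j) * b(JC(j))
        c_coo[JR[j]] += AA[j] * b[JC[j]]

    # CSR: row pointers IA; values AA and column indices JA reuse the
    # coordinate arrays, which np.nonzero already returned grouped by row
    IA = np.zeros(6, dtype=int)
    IA[1:] = np.cumsum(np.count_nonzero(A, axis=1))
    JA = JC
    c_csr = np.zeros(5)
    for i in range(5):
        for j in range(IA[i], IA[i + 1]):         # entries of row i
            c_csr[i] += AA[j] * b[JA[j]]

    print(np.allclose(c_coo, A @ b), np.allclose(c_csr, A @ b))   # prints: True True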
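Similarly, a small sketch (again an added illustration, using the 0-based equivalents of DJ, JDIAG, IDIAG from Section 4.1.6) verifies the jagged diagonal product against the permuted matrix PA; note that the result comes out in the permuted row order.

    import numpy as np

    # permuted example matrix PA (rows sorted by length, longest first)
    PA = np.array([[ 3,  4,  0,  5,  0],
                   [ 0,  6,  7,  0,  8],
                   [ 1,  0,  2,  0,  0],
                   [ 0,  0,  9, 10,  0],
                   [ 0,  0,  0, 11, 12]], dtype=float)
    b = np.arange(1.0, 6.0)

    # jagged diagonal data from Section 4.1.6, converted to 0-based indexing
    DJ    = np.array([3, 6, 1, 9, 11,  4, 7, 2, 10, 12,  5, 8], dtype=float)
    JDIAG = np.array([0, 1, 0, 2, 3,   1, 2, 2, 3, 4,    3, 4])
    IDIAG = np.array([0, 5, 10, 12])
    NDIAG = 3

    c = np.zeros(5)                               # result in permuted row order
    for j in range(NDIAG):
        for i in range(IDIAG[j + 1] - IDIAG[j]):  # length of the j-th jagged diagonal
            k = IDIAG[j] + i
            c[i] += DJ[k] * b[JDIAG[k]]

    print(np.allclose(c, PA @ b))                 # prints: True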
4.2 Sparse Matrices and Graphs

4.2.1 Graph G(A) for a symmetric positive definite (spd) matrix A = A^T > 0

For an n x n matrix: vertices e1, ..., en with an edge (ei, ek) for every a(i,k) != 0; the graph is undirected.

    A = [ *   *   0   *
          *   *   *   0
          0   *   *   *
          *   0   *   * ]

G(A): edges (e1,e2), (e2,e3), (e3,e4), (e1,e4), i.e. a cycle through e1, e2, e3, e4. Interpreted as a directed graph, each undirected edge corresponds to a pair of opposite directed edges.

Adjacency matrix of G(A) (or of A):

    A(G(A)) = [ 1   1   0   1
                1   1   1   0
                0   1   1   1
                1   0   1   1 ]

It is obtained directly from A by replacing each nonzero entry by 1.

Symmetric permutations of A of the form P A P^T change the ordering of the rows and columns of A simultaneously. Therefore the graph of P A P^T is obtained from the graph of A by renumbering the vertices.

Example: for the permutation P that exchanges 3 <-> 4, G(P A P^T) is G(A) with the labels e3 and e4 interchanged.

4.2.2 Matrix A Nonsymmetric, G(A) Directed

    A = [ *   *   0   0                 A(G(A)) = [ 1   1   0   0
          0   *   *   0                             0   1   1   0
          0   *   *   *                             0   1   1   1
          *   0   0   * ]                           1   0   0   1 ]

G(A): directed edges e1 -> e2, e2 -> e3, e3 -> e2, e3 -> e4, e4 -> e1.

How can we characterize "good" sparsity patterns? "Good" means: Gaussian elimination can be reduced to smaller subproblems or produces no (or small) fill-in.

Block Diagonal Pattern

    A = [ *   0   *   0                 A(G(A)) = [ 1   0   1   0
          0   *   0   *                             0   1   0   1
          *   0   *   0                             1   0   1   0
          0   *   0   * ]                           0   1   0   1 ]

G(A): two separate components, e1 - e3 and e2 - e4.

Renumbering 2 <-> 3 (vertex order e1, e3, e2, e4) gives

    P A P^T = [ *   *   0   0       =  [ A1   0
                *   *   0   0             0   A2 ]
                0   0   *   *
                0   0   *   * ]

By this permutation, A can be transformed into block diagonal form -> easy to solve:

    [ A1   0  ]^(-1)     [ A1^(-1)      0
    [  0   A2 ]       =  [    0     A2^(-1) ]

Banded Pattern

A is a band matrix with upper bandwidth p and lower bandwidth q: the first row contains a(1,1), ..., a(1,p), the first column contains a(1,1), ..., a(q,1), the last column ends with a(n-p+1,n), the last row begins with a(n,n-q+1), and all entries outside this band are zero.

Gaussian elimination without pivoting preserves this sparsity pattern: in A = L U the factor L is lower triangular with the same lower bandwidth q, and U is upper triangular with the same upper bandwidth p. With pivoting the bandwidth in U grows, but remains <= p + q.

Overlapping Block Diagonal

[Figure: sparsity pattern consisting of diagonal blocks that overlap along the diagonal.]

The pattern is preserved by Gaussian elimination (with restricted pivoting).

Dissection Form

Nested (recursive) dissection:

[Figure: nested dissection sparsity pattern with large zero off-diagonal blocks.]

The pattern is preserved during Gaussian elimination without pivoting: no fill-in.

Schur Complement Reduction

Idea: Write the matrix B in terms of smaller submatrices,

    B = [ B1   B2
          B3   B4 ]

multiply B by a block upper triangular matrix,

    [ B1   B2 ]   [ B1^(-1)    D      ]     [ I              B1 D + B2 S^(-1) ]
    [ B3   B4 ] . [    0       S^(-1) ]  =  [ B3 B1^(-1)     B3 D + B4 S^(-1) ]

and require the product to have the form

    [ I   0 ]
    [ *   I ]

To satisfy this equation we have to set

    B1 D + B2 S^(-1) = 0   =>   D = - B1^(-1) B2 S^(-1)
    B3 D + B4 S^(-1) = I   =>   I = - B3 B1^(-1) B2 S^(-1) + B4 S^(-1)

    =>   S = B4 - B3 B1^(-1) B2   (the Schur complement)   and   D = - B1^(-1) B2 S^(-1)

Hence

    B = [ B1   B2 ]  =  [ I             0 ]   [ B1   B2 ]
        [ B3   B4 ]     [ B3 B1^(-1)    I ] . [  0    S  ]

Therefore, solving a linear system with B is reduced to solving two smaller linear systems, one with B1 and the other with the Schur complement S. If B is sparse, then B1 is also sparse, but S is usually dense!
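To make the block formulas tangible, the following NumPy sketch (random blocks with B1 made symmetric positive definite so it is invertible; an added illustration, not from the original notes) forms the Schur complement S = B4 - B3 B1^(-1) B2 and checks the block factorization B = [I 0; B3 B1^(-1) I] * [B1 B2; 0 S] numerically.

    import numpy as np

    rng = np.random.default_rng(0)
    n1, n2 = 4, 3
    X1 = rng.standard_normal((n1, n1))
    B1 = X1 @ X1.T + np.eye(n1)                   # SPD, hence invertible
    B2 = rng.standard_normal((n1, n2))
    B3 = rng.standard_normal((n2, n1))
    B4 = rng.standard_normal((n2, n2))
    B  = np.block([[B1, B2], [B3, B4]])

    S = B4 - B3 @ np.linalg.solve(B1, B2)         # Schur complement of B1 in B

    L_blk = np.block([[np.eye(n1),              np.zeros((n1, n2))],
                      [B3 @ np.linalg.inv(B1),  np.eye(n2)        ]])
    U_blk = np.block([[B1,                      B2                ],
                      [np.zeros((n2, n1)),      S                 ]])
    print(np.allclose(B, L_blk @ U_blk))          # should print True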
Example: Schur complement and dissection form:

    A = [ A1   0    F1
          0    A2   F2
          G1   G2   A3 ]

Schur complement:

    S = A3 - ( G1  G2 ) . [ A1^(-1)      0     ] . [ F1 ]
                          [    0     A2^(-1)   ]   [ F2 ]

      = A3 - G1 A1^(-1) F1 - G2 A2^(-1) F2

Direct derivation of the Schur complement:

    [ A1   0    F1 ]   [ x1 ]   [ b1 ]         A1 x1 + F1 x3 = b1
    [ 0    A2   F2 ] . [ x2 ] = [ b2 ]   =>    A2 x2 + F2 x3 = b2
    [ G1   G2   A3 ]   [ x3 ]   [ b3 ]         G1 x1 + G2 x2 + A3 x3 = b3

    =>  x1 = A1^(-1) b1 - A1^(-1) F1 x3
        x2 = A2^(-1) b2 - A2^(-1) F2 x3

    =>  ( G1 A1^(-1) b1 - G1 A1^(-1) F1 x3 ) + ( G2 A2^(-1) b2 - G2 A2^(-1) F2 x3 ) + A3 x3 = b3

    =>  ( A3 - G1 A1^(-1) F1 - G2 A2^(-1) F2 ) x3 = b3 - G1 A1^(-1) b1 - G2 A2^(-1) b2

    =>  S x3 = b3~ ,   where b3~ := b3 - G1 A1^(-1) b1 - G2 A2^(-1) b2.

Algorithm for solving A x = b based on the Schur complement:
1. Compute S by using inv(A1) and inv(A2).
2. Solve S x3 = b3~.
3. Compute x1 and x2 by using inv(A1) and inv(A2).

The explicit computation of S can be avoided by solving the linear system in S iteratively, e.g. with Jacobi, PCG, .... Then we need only the action of S: in every iteration step we have to compute S times an intermediate vector. To achieve fast convergence, a preconditioner (an approximation of S) has to be used. Iterative methods and preconditioning will be the subject of later chapters.
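The three-step algorithm can be sketched directly for the dissection structure above. The code below is an added, hedged illustration with small random blocks (block sizes and the names A1, A2, A3, F1, F2, G1, G2 chosen to match the text; the diagonal blocks are made symmetric positive definite so they are invertible) and compares the Schur complement solution with a direct solve.

    import numpy as np

    rng = np.random.default_rng(1)
    n1, n2, n3 = 4, 4, 2

    def spd(n):                                   # helper: random SPD block
        X = rng.standard_normal((n, n))
        return X @ X.T + np.eye(n)

    A1, A2, A3 = spd(n1), spd(n2), spd(n3)
    F1, F2 = rng.standard_normal((n1, n3)), rng.standard_normal((n2, n3))
    G1, G2 = rng.standard_normal((n3, n1)), rng.standard_normal((n3, n2))
    A = np.block([[A1, np.zeros((n1, n2)), F1],
                  [np.zeros((n2, n1)), A2, F2],
                  [G1, G2, A3]])
    b = rng.standard_normal(n1 + n2 + n3)
    b1, b2, b3 = b[:n1], b[n1:n1 + n2], b[n1 + n2:]

    # 1. Schur complement S = A3 - G1*inv(A1)*F1 - G2*inv(A2)*F2
    S = A3 - G1 @ np.linalg.solve(A1, F1) - G2 @ np.linalg.solve(A2, F2)
    # 2. Solve S x3 = b3 - G1*inv(A1)*b1 - G2*inv(A2)*b2
    b3t = b3 - G1 @ np.linalg.solve(A1, b1) - G2 @ np.linalg.solve(A2, b2)
    x3 = np.linalg.solve(S, b3t)
    # 3. Back-substitute for x1 and x2
    x1 = np.linalg.solve(A1, b1 - F1 @ x3)
    x2 = np.linalg.solve(A2, b2 - F2 @ x3)

    print(np.allclose(np.concatenate([x1, x2, x3]), np.linalg.solve(A, b)))   # should print True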