A Supernodal Approach to Sparse Partial Pivoting

James W. Demmel*   Stanley C. Eisenstat†   John R. Gilbert‡

Xiaoye S. Li*   Joseph W. H. Liu§

July 10, 1995

Abstract

We investigate several ways to improve the performance of sparse LU factorization with partial pivoting, as used to solve unsymmetric linear systems. To perform most of the numerical computation in dense kernels, we introduce the notion of unsymmetric supernodes. To better exploit the memory hierarchy, we introduce unsymmetric supernode-panel updates and two-dimensional data partitioning. To speed up symbolic factorization, we use Gilbert and Peierls's depth-first search with Eisenstat and Liu's symmetric structural reductions. We have implemented a sparse LU code using all these ideas. We present experiments demonstrating that it is significantly faster than earlier partial pivoting codes. We also compare performance with Umfpack, which uses a multifrontal approach; our code is usually faster.

Keywords: sparse matrix algorithms; unsymmetric linear systems; supernodes; column elimination tree; partial pivoting.

AMS(MOS) subject classifications: 65F05, 65F50.

Computing Reviews descriptors: G.1.3 [Numerical Analysis]: Numerical Linear Algebra - Linear systems (direct and iterative methods), Sparse and very large systems.



* Computer Science Division, University of California, Berkeley, CA 94720 ({demmel,xiaoye}@cs.berkeley.edu). The research of these authors was supported in part by NSF grant ASC-9313958, DOE grant DE-FG03-94ER25219, UT Subcontract No. ORA4466 from ARPA Contract No. DAAL03-91-C0047, DOE grant DE-FG03-94ER25206, and NSF Infrastructure grants CDA-8722788 and CDA-9401156.

† Department of Computer Science, Yale University, P.O. Box 208285, New Haven, CT 06520-8285 ([email protected]). The research of this author was supported in part by NSF grant CCR-9400921.

‡ Xerox Palo Alto Research Center, 3333 Coyote Hill Road, Palo Alto, California 94304 ([email protected]). The research of this author was supported in part by the Institute for Mathematics and Its Applications at the University of Minnesota. Copyright 1994, 1995 by Xerox Corporation. All rights reserved.

§ Department of Computer Science, York University, North York, Ontario, Canada M3J 1P3 ([email protected]). The research of this author was supported in part by the Natural Sciences and Engineering Research Council of Canada under grant A5509.

for column j = 1 to n do
    f = A(:, j);
    Symbolic factor: determine which columns of L will update f;
    for each updating column r
        Col-col update: f = f - f(r) * L(:, r);
    end for;
    Pivot: interchange f(j) and f(k), where |f(k)| = max|f(j:n)|;
    Separate L and U: U(1:j, j) = f(1:j); L(j:n, j) = f(j:n);
    Divide: L(:, j) = L(:, j) / L(j, j);
    Prune symbolic structure based on column j;
end for;

Figure 1: LU factorization with column-column updates.

1 Introduction

The problem of solving sparse symmetric positive definite systems of linear equations on sequential and vector processors is fairly well understood. Normally, the solution process is broken into two phases:

- First, symbolic factorization to determine the nonzero structure of the Cholesky factor;
- Second, numeric factorization and solution.

Elimination trees [24] and compressed subscripts [30] reduce the time and space for symbolic factorization to a low-order term. Supernodal [5] and multifrontal [11] elimination allow the use of dense vector operations for nearly all of the floating-point computation, thus reducing the symbolic overhead in numeric factorization to a low-order term. Overall, the megaflop rates of modern sparse Cholesky codes are nearly comparable to those of dense solvers [26].

For unsymmetric systems, where pivoting is required to maintain numerical stability, progress has been less satisfactory. Recent research has concentrated on two basic approaches: submatrix-based methods and column-based (or row-based) methods. Submatrix methods typically use some form of Markowitz pivoting, in which each stage's pivot element is chosen from the uneliminated submatrix by criteria that attempt to balance numerical quality and preservation of sparsity. Recent submatrix codes include Ma48 from the Harwell subroutine library [10], and Davis and Duff's unsymmetric multifrontal code Umfpack [6].

Column methods,¹ by contrast, typically use ordinary partial pivoting. The pivot is chosen from the current column according to numerical considerations alone; the columns may be preordered before factorization to preserve sparsity. Figure 1 sketches a generic left-looking column LU factorization. Notice that the bulk of the numeric computation occurs in column-column updates, or, to use Blas terminology [8], in sparse Axpys.
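To make the sparse Axpy concrete, here is a small C sketch (ours, not taken from any of the codes discussed; it assumes column r of L is stored in compressed form and that column j has been scattered into a dense work vector f of length n):

    /* One column-column update: f = f - f(r) * L(:, r).
     * Column r of L is stored as row indices li[lp[r] .. lp[r+1]-1]
     * and values lx[...]; f is a dense work image of column j. */
    void col_col_update(int r, const int *lp, const int *li,
                        const double *lx, double *f)
    {
        double frj = f[r];                 /* the multiplier u(r, j) */
        if (frj == 0.0) return;
        for (int p = lp[r]; p < lp[r + 1]; p++)
            f[li[p]] -= frj * lx[p];       /* one indirect store per nonzero */
    }

Every iteration of the inner loop performs an indirect store; removing this indirection is precisely what the supernodal kernels of Section 3 accomplish.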

Column methods have the advantage that the preordering for sparsity is completely separate from the factorization, just as in the symmetric case. However, symbolic factorization cannot be separated from numeric factorization, because the nonzero structures of the factors depend on the numerical pivoting choices. Thus column codes must do some symbolic factorization at each stage; typically this amounts to predicting the structure of each column of the factors immediately before computing it. George and Ng [14, 15] described ways to obtain upper bounds on the structure of the factors based only on the nonzero structure of the original matrix.

¹ Row methods are exactly analogous to column methods, and codes of both sorts exist. We will use column terminology; those who prefer rows may interchange the terms throughout the paper.

An early example of such a code is Sherman's Nspiv [31] (which is actually a row code). Gilbert and Peierls [20] showed how to use depth-first search and topological ordering to get the structure of each factor column. This gives a column code that runs in total time proportional to the number of nonzero floating-point operations, unlike the other partial pivoting codes. Eisenstat and Liu [13] designed a pruning technique to reduce the amount of structural information required for the symbolic factorization, as we describe further in Section 4. The result was that the time and space for symbolic factorization were in practice reduced to a low-order term.

In view of the success of supernodal techniques for symmetric matrices, it is natural to consider the use of supernodes to enhance the performance of unsymmetric solvers. One difficulty is that, unlike the symmetric case, supernodal structure cannot be determined in advance, but rather emerges depending on pivoting choices during the factorization.

In this paper, we generalize supernodes to unsymmetric matrices, and we give efficient algorithms for locating and using unsymmetric supernodes during a column-based LU factorization. We describe a new code called SuperLU that uses depth-first search and symmetric pruning to speed up symbolic factorization, and uses unsymmetric supernodes to speed up numeric factorization.

The rest of the paper is organized as follows. Section 2 introduces the tools we use: unsymmetric supernodes, panels, and the column elimination tree. Section 3 describes the supernodal numeric factorization. Section 4 describes the supernodal symbolic factorization. In Section 5, we present experimental results: we benchmark our code on several test matrices, we compare its performance to other column and submatrix codes, and we investigate its cache behavior in some detail. Finally, Section 6 presents conclusions and open questions.

2 Unsymmetric supernodes

The idea of a supernode is to group together columns with the same nonzero structure, so they can be treated as a dense matrix for storage and computation. Supernodes were originally used for symmetric sparse Cholesky factorization; the first published results are by Ashcraft, Grimes, Lewis, Peyton, and Simon [5]. In the factorization A = LLᵀ, a supernode is a range (r:s) of columns of L with the same nonzero structure below the diagonal; that is, L(r:s, r:s) is full lower triangular and every row of L(s:n, r:s) is either full or zero.² Columns of Cholesky supernodes need not be contiguous, but we will consider only contiguous supernodes.

Ng and Peyton [26] analyzed the effect of supernodes in Cholesky factorization on modern uniprocessor machines with memory hierarchies and vector or superscalar hardware. All the updates from columns of a supernode are summed into a dense vector before the sparse update is performed. This reduces indirect addressing, and allows the inner loops to be unrolled. In effect, a sequence of column-column updates is replaced by a supernode-column update. The sup-col update can be implemented using a call to a standard dense Blas-2 matrix-vector multiplication kernel. This idea can be further extended to supernode-supernode updates, which can be implemented using a Blas-3 dense matrix-matrix kernel. This can reduce memory traffic by an order of magnitude, because a supernode in the cache can participate in multiple column updates. Ng and Peyton reported that a sparse Cholesky algorithm based on sup-sup updates typically runs 2.5 to 4.5 times as fast as a col-col algorithm. Indeed, supernodes have become a standard tool in sparse Cholesky factorization [5, 26, 27, 32].

² We use Matlab notation for integer ranges and submatrices: r:s or (r:s) is the vector of integers (r, r+1, ..., s). If I and J are vectors of integers, then A(I, J) is the submatrix of A with rows whose indices are from I and with columns whose indices are from J. A(:, J) abbreviates A(1:n, J). nnz(A) denotes the number of nonzeros in A.

[Figure 2 is a schematic of the four supernode types; its legend distinguishes columns with the same structure, rows with the same structure, and dense blocks.]

Figure 2: Four possible types of unsymmetric supernodes.

To sum up, supernodes as the source of updates help because:

1. The inner loop (over rows) has no indirect addressing. (Sparse Blas-1 is replaced by dense Blas-1.)

2. The outer loop (over columns in the supernode) can be unrolled to save memory references. (Blas-1 is replaced by Blas-2.)

Supernodes as the destination of updates help because:

3. Elements of the source supernode can be reused in multiple columns of the destination supernode to reduce cache misses. (Blas-2 is replaced by Blas-3.)

Supernodes in sparse Cholesky can be determined during symbolic factorization, before the numeric factorization begins. However, in sparse LU, the nonzero structure cannot be predicted before numeric factorization, so we must identify supernodes on the fly. Furthermore, since the factors L and U are no longer transposes of each other, we must generalize the definition of a supernode.

2.1 Definition of unsymmetric supernodes

We considered several possible ways to generalize the symmetric definition of supernodes to unsymmetric factorization. We define F = L + U - I to be the filled matrix containing both L and U.

T1. Same row and column structures: A supernode is a range (r:s) of columns of L and rows of U, such that the diagonal block F(r:s, r:s) is full, and outside that block all the columns of L in the range have the same structure and all the rows of U in the range have the same structure. T1 supernodes make it possible to do sup-sup updates, realizing all three benefits.

             T1      T2      T3      T4
  median   0.236   0.345   0.326   0.006
  mean     0.284   0.365   0.342   0.052

Table 1: Fraction of nonzeros not in first column of supernode.

T2. Same column structure in L: A supernode is a range (r:s) of columns of L with full triangular diagonal block and the same structure below the diagonal block. T2 supernodes allow sup-col updates, realizing the first two benefits.

T3. Same column structure in L, full diagonal block in U: A supernode is a range (r:s) of columns of L and U, such that the diagonal block F(r:s, r:s) is full, and below the diagonal block the columns of L have the same structure. T3 supernodes allow sup-col updates, like T2. In addition, if the storage for a supernode is laid out as for a two-dimensional array (for Blas-2 or Blas-3 calls), T3 supernodes do not waste any space in the diagonal block of U.

T4. Same column structure in L and U: A supernode is a range (r:s) of columns of L and U with identical structure. (Since the diagonal is nonzero, the diagonal block must be full.) T4 supernodes allow sup-col updates, and also simplify storage of L and U.

T5. Supernodes of AᵀA: A supernode is a range (r:s) of columns of L corresponding to a Cholesky supernode of AᵀA. T5 supernodes are motivated by George and Ng's observation [14] that (with suitable representations) the structures of L and U in the unsymmetric factorization PA = LU are contained in the structure of the Cholesky factor of AᵀA. In unsymmetric LU, these supernodes themselves are sparse, so we would waste time and space operating on them. Thus we do not consider them further.

Figure 2 is a schematic of definitions T1 through T4.

Supernodes are only useful if they actually occur in practice. The occurrence of symmetric supernodes is related to the clique structure of the chordal graph of the Cholesky factor, which arises because of fill during the factorization. Unsymmetric supernodes seem harder to characterize, but they also are related to dense submatrices arising from fill. We measured the supernodes according to each definition for 126 unsymmetric matrices from the Harwell-Boeing sparse matrix library [9] under various column orderings. Table 1 shows, for each definition, the fraction of nonzeros of L that are not in the first column of a supernode; this measures how much row index storage is saved by using supernodes. Corresponding values for symmetric supernodes for the symmetric Harwell-Boeing structural analysis problems usually range from about 0.5 to 0.9. Larger numbers are better, indicating larger supernodes. We reject T4 supernodes as being too rare to make up for the simplicity of their storage scheme. T1 supernodes allow Blas-3 updates, but as we will see in Section 3.2, we can get most of their cache advantage with the more common T2 or T3 supernodes by using sup-panel updates. Thus we conclude that either T2 or T3 is the definition of choice. Our code uses T2, which gives slightly larger supernodes than T3 at a small extra cost in storage.

Figure 3 shows a sample matrix and the nonzero structure of its factors with no pivoting. Using definition T2, this matrix has four supernodes: {1, 2}, {3}, {4, 5, 6}, and {7, 8, 9, 10}. For example, in columns 4, 5, and 6 the diagonal blocks of L and U are full, and the columns of L all have nonzeros in rows 8 and 9. By definition T3, the matrix has five supernodes: {1, 2}, {3}, {4, 5, 6}, {7}, and {8, 9, 10}. Column 7 fails to join {8, 9, 10} as a T3 supernode because u_78 is zero.

[Two matrix schematics, not reproduced here: the original matrix A and the filled matrix F = L + U - I; the supernodal structure of F appears in Figure 4.]

Figure 3: A sample matrix and its LU factors. Diagonal elements a_55 and a_88 are zero.

2.2 Storage of supernodes

A standard way to lay out storage for a sparse matrix is a one-dimensional array of nonzero values in column-major order, plus integer arrays giving row numbers and column starting positions. We use this layout for both L and U, but with a slight modification: we store the entire square diagonal block of each supernode as part of L, including both the strict lower triangle of values from L and the upper triangle of values from U. We store this square block as if it were completely full (it is full in T3 supernodes, but its upper triangle may contain zeros in T2 supernodes). This allows us to address each supernode as a two-dimensional array in calls to Blas routines. In other words, if columns (r:s) form a supernode, then all the nonzeros in F(r:n, r:s) are stored as a single dense two-dimensional array. This also lets us save some storage for row indices: only the indices of nonzero rows outside the diagonal block need be stored, and the structures of all columns within a supernode can be described by one set of row indices. This is similar to the effect of compressed subscripts in the symmetric case [30].

We represent the part of U outside the supernodal blocks with a standard sparse structure: the values are stored by columns, with a companion integer array the same size to store row indices; another array of n integers indicates the start of each column.
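As a concrete (and hypothetical) illustration, the layout just described could be declared in C as follows; the field names are ours and are not taken from the SuperLU source:

    struct LU_factors {
        int n;            /* matrix dimension */
        int nsuper;       /* number of supernodes */
        /* L, including each supernode's full square diagonal block: */
        int    *xsup;     /* first column of each supernode (length nsuper+1) */
        double *lval;     /* dense columns of the supernodes, column-major */
        int    *xlval;    /* start of each column of L in lval */
        int    *lrow;     /* row indices, one shared set per supernode */
        int    *xlrow;    /* start of each supernode's row-index set */
        /* U outside the supernodal blocks (standard sparse columns): */
        double *uval;     /* nonzero values of U, by column */
        int    *urow;     /* companion row indices, same length */
        int    *xuval;    /* start of each column of U (n+1 integers) */
    };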

Figure 4 shows the structure of the factors in the example from Figure 3, with s_k denoting a nonzero in the k-th supernode and u_k denoting a nonzero in the k-th column of U outside the supernodal block. Figure 5 shows the storage layout. We omit the indexing vectors that point to the beginning of each supernode and the beginning of each column of U.

         1   2   3   4   5   6   7   8   9  10
   1  ( s1  s1  u3   .   .  u6   .   .   .   . )
   2  ( s1  s1  u3  u4   .  u6   .  u8   .   . )
   3  (  .   .  s2   .   .   .  u7  u8   .   . )
   4  (  .   .   .  s3  s3  s3   .   .  u9   . )
   5  (  .   .   .  s3  s3  s3  u7   .  u9   . )
   6  ( s1  s1  s2  s3  s3  s3  u7  u8  u9   . )
   7  (  .   .   .   .   .   .  s4   .  s4  s4 )
   8  ( s1  s1  s2  s3  s3  s3  s4  s4  s4  s4 )
   9  (  .   .   .  s3  s3  s3  s4  s4  s4  s4 )
  10  (  .   .  s2   .   .   .  s4  s4  s4  s4 )

Figure 4: Supernodal structure (by definition T2) of the factors of the sample matrix.

[Figure 5, not reproduced in detail here, shows the same factors laid out in memory: the four supernodal blocks (for columns 1:2, 3, 4:6, and 7:10) stored as dense column-major arrays, the row subscripts shared by the columns of each supernode, and the off-supernodal nonzeros u_3, ..., u_9 stored by columns of U.]

Figure 5: Storage layout for factors of the sample matrix, using T2 supernodes.

2.3 The column elimination tree

Since our definition requires the columns of a supernode to be contiguous, we should get larger supernodes if we bring together columns of L with the same nonzero structure. But the column ordering is fixed (for sparsity) before numeric factorization; what can we do?

In symmetric Cholesky factorization, the so-called fundamental supernodes can be made contiguous by permuting the matrix symmetrically according to a postorder on its elimination tree [4]. This postorder is an example of what Liu calls an equivalent reordering [24], which does not change the sparsity of the factor. The postordered elimination tree can also be used to locate the supernodes before the numeric factorization.

We proceed similarly for the unsymmetric case. Here the appropriate analog of the symmetric elimination tree is the column elimination tree, or column etree for short. The vertices of this tree are the integers 1 through n, representing the columns of A. The column etree of A is the symmetric elimination tree of the column intersection graph of A, or equivalently the elimination tree of AᵀA provided there is no cancellation in computing AᵀA. See Gilbert and Ng [19] for complete definitions. The column etree can be computed from A in time almost linear in the number of nonzeros of A by a variation of an algorithm of Liu [24].
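For illustration, a compact way to compute the column etree is the disjoint-set formulation below (a sketch of ours, following the style of the cs_etree routine in Davis's CSparse rather than reproducing the paper's own variant of Liu's algorithm). Ap/Ai hold the columns of A; prev[i] records the last column seen with a nonzero in row i, which links columns that intersect:

    #include <stdlib.h>

    /* Column etree of an m-by-n matrix A in compressed-column form.
     * Returns parent[], with -1 marking roots. */
    int *col_etree(int m, int n, const int *Ap, const int *Ai)
    {
        int *parent   = malloc(n * sizeof(int));
        int *ancestor = malloc(n * sizeof(int));  /* path-compressed */
        int *prev     = malloc(m * sizeof(int));  /* last column per row */
        for (int i = 0; i < m; i++) prev[i] = -1;

        for (int k = 0; k < n; k++) {
            parent[k] = ancestor[k] = -1;
            for (int p = Ap[k]; p < Ap[k + 1]; p++) {
                /* a nonzero in row Ai[p] links column k to the previous
                 * column with a nonzero in that row */
                for (int i = prev[Ai[p]]; i != -1 && i < k; ) {
                    int inext = ancestor[i];
                    ancestor[i] = k;              /* compress the path */
                    if (inext == -1) parent[i] = k;
                    i = inext;
                }
                prev[Ai[p]] = k;
            }
        }
        free(ancestor); free(prev);
        return parent;
    }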

The following theorem says that the column etree represents potential dependencies among columns in LU factorization, and that for strong Hall matrices no stronger information is obtainable from the nonzero structure of A. Note that column i updates column j in LU factorization if and only if u_ij ≠ 0.

Theorem 1 (Column Elimination Tree [19]) Let A be a square, nonsingular, possibly unsymmetric matrix, and let PA = LU be any factorization of A with pivoting by row interchanges. Let T be the column elimination tree of A.

1. If vertex i is an ancestor of vertex j in T, then i ≥ j.

2. If l_ij ≠ 0, then vertex i is an ancestor of vertex j in T.

3. If u_ij ≠ 0, then vertex j is an ancestor of vertex i in T.

4. Suppose in addition that A is strong Hall (that is, A cannot be permuted to a nontrivial block triangular form). If vertex j is the parent of vertex i in T, then there is some choice of values for the nonzeros of A that makes u_ij ≠ 0 when the factorization PA = LU is computed with partial pivoting.

Just as a postorder on the symmetric elimination tree brings together symmetric supernodes, we expect a postorder on the column etree to bring together unsymmetric supernodes. Thus, before we factor the matrix, we compute its column etree and permute the matrix columns according to a postorder on the tree. We now show that this does not change the factorization in any essential way.

Theorem 2 Let A be a matrix with column etree T. Let π be a permutation such that whenever π(i) is an ancestor of π(j) in T, we have i ≥ j. Let P be the permutation matrix such that πᵀ = P · (1:n)ᵀ, and let Â = PAPᵀ.

1. Â = A(π, π).

2. The column etree T̂ of Â is isomorphic to T; in particular, relabeling each node i of T̂ as π(i) yields T.

3. Suppose in addition that Â has an LU factorization without pivoting, Â = L̂Û. Then PᵀL̂P and PᵀÛP are respectively unit lower triangular and upper triangular, so A = (PᵀL̂P)(PᵀÛP) is also an LU factorization.

Remark: Liu [24] attributes to F. Peters a result similar to part 3 for the symmetric positive definite case, concerning the Cholesky factor and the usual, symmetric elimination tree.

Proof: Part 1 is immediate from the definition of P. Part 2 follows from Corollary 6.2 in Liu [24], with the symmetric structure of the column intersection graph of our matrix A taking the place of Liu's symmetric matrix A. Liu exhibits the isomorphism explicitly in the proof of his Theorem 6.1.

Now we prove part 3. We have â_ij = a_{π(i),π(j)} for all i and j. Write L = PᵀL̂P and U = PᵀÛP, so that l̂_ij = l_{π(i),π(j)} and û_ij = u_{π(i),π(j)}. Then A = LU; we need only show that L and U are triangular.

Consider a nonzero u_{π(i),π(j)} of U. In the triangular factorization Â = L̂Û, element û_ij is equal to u_{π(i),π(j)} and is therefore nonzero. By part 3 of Theorem 1, then, j is an ancestor of i in T̂. By the isomorphism between T̂ and T, this implies that π(j) is an ancestor of π(i) in T. Then it follows from part 1 of Theorem 1 that π(j) ≥ π(i). Thus every nonzero of U is on or above the diagonal, so U is upper triangular. A similar argument shows that every nonzero of L is on or below the diagonal, so L is lower triangular. The diagonal elements of L are a permutation of those of L̂, so they are all equal to 1. □

Since the triangular factors of A are just permutations of the triangular factors of PAPᵀ, they have the same sparsity. Indeed, they require the same arithmetic to compute; the only possible difference is the order of updates. If addition for updates is commutative and associative, this implies that with partial pivoting (i, j) is a legal pivot in Â if and only if (π(i), π(j)) is a legal pivot in A. In floating-point arithmetic, the different order of updates could conceivably change the pivot sequence. Thus we have the following corollary.

Corollary 1 Let π be a postorder on the column elimination tree of A, let P₁ be any permutation matrix, and let P₂ be the permutation matrix with πᵀ = P₂ · (1:n)ᵀ. If P₁AP₂ᵀ = LU is an LU factorization, then so is (P₂ᵀP₁)A = (P₂ᵀLP₂)(P₂ᵀUP₂). In exact arithmetic, the former is an LU factorization with partial pivoting of AP₂ᵀ if and only if the latter is an LU factorization with partial pivoting of A.

This corollary says that an LU code can permute the columns of its input matrix by postorder on the column etree, and then fold the column permutation into the row permutation on output. Thus our SuperLU code has the option of returning either four matrices P₁, P₂, L, and U with P₁AP₂ᵀ = LU, or just the three matrices P₂ᵀP₁, P₂ᵀLP₂, and P₂ᵀUP₂, which are a row permutation and two triangular matrices. The advantage of returning all four matrices is that the columns of each supernode are contiguous in L, which permits the use of a Blas-2 supernodal triangular solve for the forward-substitution phase of a linear system solver. The supernodes are not contiguous in P₂ᵀLP₂.

2.4 Relaxed supernodes

We have explored various ways of relaxing the denseness of a supernode. We experimented with both T2 and T3 supernodes, and found that T2 supernodes (those with only nested column structures in L) give slightly better performance.

We observe that, for most matrices, the average size of a supernode is only about 2 to 3 columns (though a few supernodes are much larger). A large percentage of supernodes consist of only a single column, many of which are leaves of the column etree. Therefore we have devised a scheme to merge groups of columns at the fringe of the etree into artificial supernodes regardless of their row structures. A parameter r controls the granularity of the merge. Our merge rule is: node i is merged with its parent node j when the subtree rooted at j has at most r nodes. In practice, the best values of r are generally between 4 and 8, and yield improvements in running time of 5% to 15%.

1.  for column j = 1 to n do
2.      f = A(:, j);
3.      Symbolic factor: determine which supernodes of L will update f;
4.      Determine whether j belongs to the same supernode as j-1;
5.      for each updating supernode (r:s)
6.          Apply supernode-column update to column j:
7.              f(r:s) = L(r:s, r:s)^(-1) * f(r:s);
8.              f(s+1:n) = f(s+1:n) - L(s+1:n, r:s) * f(r:s);
9.      end for;
10.     Pivot: interchange f(j) and f(k), where |f(k)| = max|f(j:n)|;
11.     Separate L and U: U(1:j, j) = f(1:j); L(j:n, j) = f(j:n);
12.     Divide: L(:, j) = L(:, j) / L(j, j);
13.     Prune symbolic structure based on column j;
14. end for;

Figure 6: LU factorization with supernode-column updates.

Artificial supernodes are a special case of relaxed supernodes, which Ashcraft and Grimes have used in the context of multifrontal methods for symmetric systems [4]. Ashcraft and Grimes allow a small number of nonzeros in the structure of any supernode, thus relaxing the condition that the columns must have strictly nested structures. It would be possible to use this idea in the unsymmetric code as well, though we have not experimented with it. Relaxed supernodes could be constructed either on the fly, by relaxing the nonzero count condition described in Section 4.3 below, or by preprocessing the column etree to identify small subtrees that we would merge into supernodes.
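A minimal sketch of the merge rule (ours; it assumes the columns have already been permuted by a postorder on the column etree, so that parent[i] > i, with parent[root] = n):

    #include <stdlib.h>

    /* Mark columns that join an artificial supernode: node i is merged
     * with its parent j = parent[i] when the subtree rooted at j has
     * at most r nodes.  Subtree sizes are accumulated in one pass,
     * which is valid because a postordered etree lists children before
     * parents. */
    void mark_artificial(int n, int r, const int *parent, int *merged)
    {
        int *size = malloc(n * sizeof(int));
        for (int i = 0; i < n; i++) size[i] = 1;
        for (int i = 0; i < n; i++)
            if (parent[i] < n) size[parent[i]] += size[i];
        for (int i = 0; i < n; i++)
            merged[i] = (parent[i] < n && size[parent[i]] <= r);
        free(size);
    }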

3 Supernodal numeric factorization

Now we show how to modify the column-column algorithm to use supernode-column updates and supernode-panel updates. This section describes the numerical computation involved in the updates. Section 4 describes the symbolic factorization that determines which supernodes update which columns, and also the detection of boundaries between supernodes.

3.1 Supernode-column updates

Figure 6 sketches the supernode-column algorithm. The only difference from the column-column algorithm is that all the updates to a column from a single supernode are done together. Consider a supernode (r:s) that updates column j. The coefficients of the updates are the values from a segment of column j of U, namely U(r:s, j). The nonzero structure of such a segment is particularly simple: all the nonzeros are contiguous, and follow all the zeros (as proved in Corollary 2 below). Thus, if k is the index of the first nonzero row in U(r:s, j), the updates to column j from supernode (r:s) come from columns k through s. Since the supernode is stored as a dense matrix, these updates can be performed by a dense lower triangular solve (with the matrix L(k:s, k:s)) and a dense matrix-vector multiplication (with the matrix L(s+1:n, k:s)). As described in Section 4, the symbolic phase determines the value of k, that is, the position of the first nonzero in the segment U(r:s, j).

The advantages of using supernode-column updates are similar to those in the symmetric case [26]. Efficient Blas-2 matrix-vector kernels can be used for the triangular solve and matrix-vector multiply. Furthermore, all the updates from the supernodal columns can be collected in a dense vector before doing a single scatter into the target vector. This reduces the amount of indirect addressing.
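In outline, one supernode-column update then reduces to two dense calls and a scatter, as in this C sketch (ours, written against the CBLAS interface rather than the ESSL routines used in the actual code; lusup is assumed to point at the dense block of the supernode restricted to columns k:s, stored column-major with leading dimension lda):

    #include <stdlib.h>
    #include <cblas.h>

    /* Update the dense image `dense` of column j by supernode columns
     * k:s.  nsupc = s-k+1; nrow is the number of rows below the
     * diagonal block, whose row indices are listed in rows[].  ukj
     * holds the dense segment U(k:s, j). */
    void sup_col_update(int nsupc, int nrow, const double *lusup, int lda,
                        const int *rows, double *ukj, double *dense)
    {
        double *tmp = malloc(nrow * sizeof(double));

        /* Dense triangular solve with the unit lower triangle L(k:s, k:s). */
        cblas_dtrsv(CblasColMajor, CblasLower, CblasNoTrans, CblasUnit,
                    nsupc, lusup, lda, ukj, 1);

        /* Dense matrix-vector multiply with L(s+1:n, k:s). */
        cblas_dgemv(CblasColMajor, CblasNoTrans, nrow, nsupc,
                    1.0, lusup + nsupc, lda, ukj, 1, 0.0, tmp, 1);

        /* Collect all updates, then do a single scatter into the target. */
        for (int i = 0; i < nrow; i++)
            dense[rows[i]] -= tmp[i];

        free(tmp);
    }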

3.2 Supernode-panel updates

We can improve the supernode-column algorithm further on machines with a memory hierarchy by changing the data access pattern. The data we are accessing in the inner loop (lines 5-9 of Figure 6) include the destination column j and all the updating supernodes (r:s) to the left of column j. Column j is accessed many times, while each supernode (r:s) is used only once. In practice, the number of nonzero elements in column j is much less than that in the updating supernodes. Therefore, the access pattern given by this loop provides little opportunity to reuse cached data. In particular, the same supernode (r:s) may be needed to update both columns j and j+1. But when we factor the (j+1)-st column (in the next iteration of the outer loop), we will have to fetch supernode (r:s) again from memory, instead of from cache (unless the supernodes are small compared to the cache).

To exploit memory locality, we factor several columns (say w of them) at a time in the outer loop, so that one updating supernode (r:s) can be used to update as many of the w columns as possible. We refer to these w consecutive columns as a panel, to differentiate them from a supernode; the row structures of these columns may not be correlated in any fashion, and the boundaries between panels may be different from those between supernodes. The new method requires rewriting the doubly nested loop as the triple loop shown in Figure 7. This is analogous to loop tiling techniques used in optimizing compilers to improve cache behavior for two-dimensional arrays with regular stride. It is also somewhat analogous to the supernode-supernode updates that Ng and Peyton [26], and Rothberg and Gupta [27], have used in symmetric Cholesky factorization.

The structure of each supernode-column update is the same as in the supernode-column algorithm. For each supernode (r:s) to the left of column j, if u_kj ≠ 0 for some r ≤ k ≤ s, then u_ij ≠ 0 for all k ≤ i ≤ s. Therefore, the nonzero structure of the panel of U consists of dense column segments that are row-wise separated by supernodal boundaries, as in Figure 7. Thus, it is sufficient for the symbolic factorization algorithm to record only the first nonzero position of each column segment. As detailed in Section 4.4, symbolic factorization is applied to all the columns in a panel at once, outside the numeric-factorization loop over updating supernodes.

In dense factorization, the entire supernode-panel update in lines 3-7 of Figure 7 would be implemented as two Blas-3 calls: a dense triangular solve with w right-hand sides, followed by a dense matrix-matrix multiply. In the sparse case, this is not possible, because the different supernode-column updates begin at different positions k within the supernode, and the submatrix U(r:s, j:j+w-1) is not dense. Thus the sparse supernode-panel algorithm still calls the level-2 Blas. However, we get similar cache benefits to the level-3 Blas, at the cost of doing the loop reorganization ourselves. Thus we sometimes call the kernel of this algorithm a "Blas-2½" method.

1.  for column j = 1 to n step w do
2.      Symbolic factor: determine which supernodes will update any of L(:, j:j+w-1);
3.      for each updating supernode (r:s)
4.          for column jj = j to j+w-1 do
5.              Apply supernode-column update to column jj;
6.          end for;
7.      end for;
8.      Inner factorization: apply the sup-col algorithm to the panel;
9.  end for;

[Diagram, not reproduced here: the panel columns j to j+w-1, an updating supernode (r:s), and the blocks U(J, J), L(J, J), and L(j:n, J).]

Figure 7: The supernode-panel algorithm, with column-wise blocking. J = 1:j-1.

In the double loop nest (3-7), the ideal circumstance is that all w columns in the panel require updates from supernode (r:s). Then this supernode will be used w times before it is forced out of the cache. There is a trade-off between the value of w and the size of the cache. For this scheme to work efficiently, we need to ensure that the nonzeros in the w columns do not cause cache thrashing. That is, we must keep w small enough that all the data accessed in this doubly nested loop fit in cache. Otherwise, the cache cross-interference between the source supernode and the destination panel can offset the benefit of the new algorithm.

3.2.1 Outer and inner factorization

At the end of the supernode-panel update (line 7), columns j through j+w-1 of L and U have received all their updates from columns to the left of j. We call this the outer factorization. What remains is to apply updates that come from columns within the panel. This amounts to forming the LU factorization of the panel itself (in columns j:j+w-1, and rows j:n). This inner factorization is performed by the supernode-column algorithm, almost exactly as shown in Figure 6. The inner factorization includes a columnwise symbolic factorization, just as in the supernode-column algorithm. The inner factorization also includes the supernode identification, partial pivoting, and symmetric structure reduction for the entire algorithm.

3.2.2 Reducing cache misses by row-wise blocking

Our first experiments with the supernode-panel algorithm showed speedups for some medium-sized problems of around 20-30%. However, the improvement for large matrices was often only a few percent. We now study the reasons and remedies for this.

To implement loops 3-7 of Figure 7, we first expand the nonzeros of the panel columns of A into an n by w full working array, called the SPA (for sparse accumulator) [18]. This allows random access to the entries of the active panel. A temporary array stores the results of the Blas operations, and the updates are scattered into the SPA. At the end of panel factorization, the data in the SPA are copied into storage for L and U. Although increasing the panel size w gives more opportunity for data reuse, it also increases the size of the active data set that must fit into cache.
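The following C fragment sketches the SPA idea in isolation (a sketch of ours; the real data structure also carries bookkeeping the paper does not detail): scatter a packed column into one dense SPA column, update it by random access, then gather the nonzeros back to packed storage.

    /* Scatter column `col` of a compressed matrix (ptr/ind/val) into
     * one dense column of the SPA, giving random access by row index. */
    void spa_scatter(const int *ptr, const int *ind, const double *val,
                     int col, double *spa_col)
    {
        for (int p = ptr[col]; p < ptr[col + 1]; p++)
            spa_col[ind[p]] = val[p];
    }

    /* Gather the nonzeros of a dense SPA column back into packed form,
     * resetting the SPA for the next panel; returns the nonzero count. */
    int spa_gather(int n, double *spa_col, int *out_ind, double *out_val)
    {
        int nnz = 0;
        for (int i = 0; i < n; i++)
            if (spa_col[i] != 0.0) {
                out_ind[nnz] = i;
                out_val[nnz] = spa_col[i];
                nnz++;
                spa_col[i] = 0.0;
            }
        return nnz;
    }

A production code would track the occupied rows explicitly, so the gather need not sweep all n positions.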

The supernode-panel update loop accesses the following data:

- the nonzeros in the updating supernode L(r:n, r:s);
- the SPA data structure, consisting of an n by w full array and a temporary store of size n.

By instrumenting the code, we found that the working sets of large matrices are much larger than the cache size. Hence, cache thrashing limits performance.

We experimented with a scheme suggested by Rothberg [28], in which the SPA has only as many rows as the number of nonzero rows in the panel (as predicted by symbolic factorization), and an extra indirection array of size n is used to address the SPA. Unfortunately, the cost incurred by double indirection is not negligible, and this scheme was not as effective as the two-dimensional blocking method we now describe.

We implemented a row-wise blocking scheme on top of the column-wise blocking in the supernode-panel update. The 2-D blocking adds another level of looping between the two loops in lines 3 and 4 of Figure 7. This partitions the supernodes and the SPA structure into block rows. Then each block row of the updating supernode is used for up to w partial matrix-vector multiplies, which are pushed all the way through into the SPA before the next block row of the supernode is accessed. The active data set accessed in the inner loops is thus much smaller than in the 1-D scheme. The 2-D blocking algorithm is organized as in Figure 8.

1.  for j = 1 to n step w do
2.      ...
3.      for each updating supernode (r:s)
4.          Apply triangular solves to A(r:s, j:j+w-1) using L(r:s, r:s);
5.          for each row block B in L(s+1:n, r:s) do
6.              for jj = j to j+w-1 do
7.                  Multiply B * U(r:s, jj), and scatter into SPA(:, jj);
8.              end for;
9.          end for;
10.     end for;
11.     ...
12. end for;

Figure 8: The supernode-panel algorithm, with 2-D blocking.

The key performance gains come from the loops 5-9, where each row block is reused as much as possible before the next row block is brought into the cache. The innermost loop is still a dense matrix-vector multiply, performed by a Blas-2 kernel.

3.2.3 Combining 1-D and 2-D blocking

The 2-D blocking works well when the rectangular supernodal matrix L(r:n, r:s) is large in both dimensions. If all of L(r:n, r:s) can fit into cache, then the row-wise blocking gives no benefit, but still incurs overhead for setting up loop variables, skipping the empty loop body, and so on. This overhead can be nearly 10% for some of the sparser problems in our test suite. Thus we have devised a hybrid update algorithm that uses either the 1-D or 2-D partitioning scheme, depending on the size of each updating supernode. The decision is made at run-time, as shown in Figure 9. The overhead is limited to the test at line 4 of Figure 9. It turns out that this hybrid scheme works better than either the 1-D or 2-D code for many problems. Therefore, this is the algorithm that we use in the performance analysis in Section 5. In Section 5.5.3 we will discuss in more detail what we mean by "large" in the test on line 4.

4 Symbolic factorization

Symbolic factorization is the process that determines the nonzero structure of the triangular factors L and U from the nonzero structure of the matrix A. This in turn determines which columns of L update each column j of the factors (namely, those columns r for which u_rj ≠ 0), and also which columns of L can be combined into supernodes.

Without numeric pivoting, the complete symbolic factorization can be performed before any numeric factorization. Partial pivoting, however, requires that the numeric and symbolic factorizations be interleaved. The supernode-column algorithm performs symbolic factorization for each column just before it is computed, as described in Section 4.1. The supernode-panel algorithm performs symbolic factorization for an entire panel at once, as described in Section 4.4.

1.  for j = 1 to n step w do
2.      ...
3.      for each updating supernode (r:s)
4.          if (r:s) is large then /* use 2-D blocking */
5.              Apply triangular solves to A(r:s, j:j+w-1) using L(r:s, r:s);
6.              for each row block B in L(s+1:n, r:s) do
7.                  for jj = j to j+w-1 do
8.                      Multiply B * U(r:s, jj), and scatter into SPA(:, jj);
9.                  end for;
10.             end for;
11.         else /* use 1-D blocking */
12.             for jj = j to j+w-1 do
13.                 Apply triangular solve to A(r:s, jj);
14.                 Multiply L(s+1:n, r:s) * U(r:s, jj), and scatter into SPA(:, jj);
15.             end for;
16.         end if;
17.     end for;
18.     ...
19. end for;

Figure 9: The supernode-panel algorithm, with both 1-D and 2-D blocking.

4.1 Column depth-first search

From the numeric factorization algorithm, it is clear that the structure of column F(:, j) depends on the structure of column A(:, j) of the original matrix and on the structure of L(:, J), where J = 1:j-1. Indeed, F(:, j) has the same structure as the solution vector for the following triangular system [20]:

    ( L(:, J)  I ) · F(:, j) = A(:, j),

where the coefficient matrix is the n-by-n unit lower triangular matrix whose first j-1 columns are those of L(:, J) and whose remaining columns are those of the identity.

A straightforward way to compute the structure of F(:, j) from the structures of L(:, J) and A(:, j) is to simulate the numerical computation. A less expensive way is to use the following characterization in terms of paths in the directed graph of L(:, J).

For any matrix M, the notation i →(M) j means that there is an edge from i to j in the directed graph of M, that is, m_ij ≠ 0. The notation i ⇒(M) j means that there is a directed path from i to j in the directed graph of M. Such a path may have length zero; that is, i ⇒(M) i always holds.

Theorem 3 ([16]) f_ij is nonzero (equivalently, i →(F) j) if and only if i ⇒(L(:,J)) k →(A) j for some k ≤ i.

This result implies that the symbolic factorization of column j can be obtained as follows. Consider the nonzeros in A(:, j) as a subset of the vertices of the directed graph G = G(L(:, J)ᵀ), which is the reverse of the directed graph of L(:, J). The nonzero positions of F(:, j) are then given by the vertices reachable by paths from this set in G. (We use the graph of Lᵀ here because of the convention that edges are directed from rows to columns.) Since L is actually stored by columns, our data structure gives precisely the adjacency information for G. Therefore, we can determine the structure of column j of L and U by traversing G from the set of starting nodes given by the structure of A(:, j).

The traversal of G determines the structure of U(:, j), which in turn determines the columns of L that will participate in updates to column j in the numerical factorization. These updates must be applied in an order consistent with a topological ordering of G. We use depth-first search to perform the traversal, which makes it possible to generate a topological order (specifically, reverse postorder) on the nonzeros of U(:, j) as they are located [20].
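The traversal itself is a standard depth-first search, as in this C sketch (ours; it uses recursion where the actual code would manage an explicit stack, and it ignores supernodes and pruning for clarity):

    /* Visit every vertex of G reachable from v; lp/li store L by
     * columns, which is exactly the adjacency structure of G.
     * Vertices are appended to postorder[] as the search backtracks,
     * so reading the array backwards (reverse postorder) gives a
     * topological order. */
    void dfs(int v, const int *lp, const int *li,
             int *marked, int *postorder, int *count)
    {
        marked[v] = 1;
        for (int p = lp[v]; p < lp[v + 1]; p++)
            if (!marked[li[p]])
                dfs(li[p], lp, li, marked, postorder, count);
        postorder[(*count)++] = v;
    }

    /* Structure of column j: search from every nonzero row of A(:, j). */
    void column_dfs(int j, const int *ap, const int *ai, const int *lp,
                    const int *li, int *marked, int *postorder, int *count)
    {
        for (int p = ap[j]; p < ap[j + 1]; p++)
            if (!marked[ai[p]])
                dfs(ai[p], lp, li, marked, postorder, count);
    }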

Another consequence of the path theorem is the following corollary. It says that if we divide each column of U into segments, one per supernode, then within each segment the column of U just consists of a consecutive sequence of nonzeros. Thus we need only keep track of the position of the first nonzero in each segment.

Corollary 2 Let (r:s) be a supernode of either type T2 or T3 in the factorization PA = LU. Suppose u_kj is nonzero for some j with r ≤ k ≤ s. Then u_ij ≠ 0 for all i with k ≤ i ≤ s.

Proof: Let k ≤ i ≤ s. Since u_kj ≠ 0, we have k ⇒(L(:,J)) m →(A) j for some m, by Theorem 3. Now l_ik is in the diagonal block of the supernode, and hence is nonzero. Thus i →(L(:,J)) k, so i ⇒(L(:,J)) m →(A) j, whence u_ij is nonzero by Theorem 3. □

4.2 Pruning the symbolic structure

We can speed up the depth-first search traversals by using a reduced graph in place of G, the reverse of the graph of L(:, J). We have explored this idea in a series of papers [12, 13, 17]. Any graph H can be substituted for G, provided that i ⇒(H) j if and only if i ⇒(G) j. The traversals are more efficient if H has fewer edges; but any gain in efficiency must be traded off against the cost of computing H.

An extreme choice of H is the elimination dag [17], which is the transitive reduction of G, or the minimal subgraph of G that preserves paths. However, the elimination dag is expensive to compute. The symmetric reduction [12] is a subgraph that has in general fewer edges than G but more edges than the elimination dag, and that is much less expensive to compute. The symmetric reduction takes advantage of symmetry in the structure of the filled matrix F; if F is completely symmetric, it is just the symmetric elimination tree. The symmetric reduction of L(:, J) is obtained by removing all nonzeros l_rs for which l_ts · u_st ≠ 0 for some t < r. Eisenstat and Liu [12] give an efficient method to compute the symmetric reduction during symbolic factorization, and demonstrate experimentally that it significantly reduces the total factorization time with an algorithm that does column-column updates.

Our supernodal code uses symmetric reduction to speed up its symbolic factorization. The example in Figure 10 illustrates symmetric reduction in the presence of supernodes. We use S to represent the supernodal structure of L(:, J)ᵀ, and R to represent the symmetric reduction of S. It is this R that we use in our code. Note that the edges of the graph of R are directed from columns of L to rows of L.
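In code, pruning one column might look like the following sketch (ours, simplified to a single column with sorted row indices; u_nonzero flags the nonzero rows of U(:, s) found by the current symbolic step):

    /* Symmetric pruning of column s of L.  As soon as some row t > s
     * with l_ts != 0 and u_st != 0 is found, entries below t are
     * redundant for reachability, and later depth-first searches can
     * stop reading the column's adjacency list at the returned length. */
    int prune_column(int s, const int *lstart, const int *lrow,
                     const char *u_nonzero)
    {
        for (int p = lstart[s]; p < lstart[s + 1]; p++) {
            int t = lrow[p];
            if (t > s && u_nonzero[t])
                return p - lstart[s] + 1;   /* keep entries up to t inclusive */
        }
        return lstart[s + 1] - lstart[s];    /* no symmetric pair: keep all */
    }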

[Four matrix schematics, not reproduced here: the filled matrix F(:, J), the lower triangle G, the supernodal structure S, and the symmetrically reduced structure R.]

Figure 10: Supernodal and symmetrically reduced structures.

[Matrix schematics, not reproduced here: the reduced supernodal structure before and after the symbolic step for column 8.]

Figure 11: One step of symbolic factorization in the reduced structure.

In the figure, a small dot indicates an entry in S that was pruned from R by symmetric reduction. The (8, 2) entry was pruned due to the symmetric nonzero pair (6, 2) and (2, 6). The figure shows the current state of the reduced structure based on the first seven columns of the filled matrix.

It is instructive to follow this example through one more column, to see how symbolic factorization is carried out in the reduced supernodal structure. Consider the symbolic step for column 8. Suppose a_28 and a_38 are nonzero. The other nonzeros in column 8 of the factor are generated by paths in the reduced supernodal structure (we just show one possible path for each nonzero):

    8 →(Aᵀ) 2 →(R) 6;
    8 →(Aᵀ) 3 →(R) 8;
    8 →(Aᵀ) 2 →(R) 6 →(R) 9;
    8 →(Aᵀ) 3 →(R) 10.

Figure 11 shows the reduced supernodal structure before and after column 8; in column 8 of A, the figure distinguishes the original nonzeros from the fill nonzeros. Once the structure of column 8 of U is known, more symmetric reduction is possible. The entry l_{10,3} is pruned due to the symmetric nonzeros l_83 and u_38. Also, l_96 is pruned by l_86 and u_68. Figure 11 shows the new structure.

The supernodal symbolic factorization relies on the path characterization in Theorem 3 and on the path-preserving property of symmetric reduction. In effect, we use the modified path condition

    i →(Aᵀ) · ⇒(R) j

(an edge of Aᵀ followed by a path in R) on the symmetrically-reduced supernodal structure R of L(:, J)ᵀ.

4.3 Detecting supernodes

Since supernodes consist of contiguous columns of L, we can decide at the end of each symbolic factorization step whether the new column j belongs to the same supernode as column j-1.

For T2 supernodes, the test is straightforward. During symbolic factorization, we test whether L(:, j) ⊆ L(:, j-1), where the containment applies to the sets of nonzero indices. At the end of the symbolic factorization step, we test whether nnz(L(:, j)) = nnz(L(:, j-1)) - 1. Column j joins column j-1's supernode if and only if both tests are passed.

T3 supernodes also require the diagonal block of U to be full. To check this, it suffices to check whether the single element u_rj is nonzero, where r is the first column index of the supernode. This follows from Corollary 2: u_rj ≠ 0 implies that u_ij ≠ 0 for all r ≤ i ≤ j. Indeed, we can even omit the test L(:, j) ⊆ L(:, j-1) for T3 supernodes. If u_rj ≠ 0, then u_{j-1,j} ≠ 0, which means that column j-1 updates column j, which implies L(:, j) ⊆ L(:, j-1). Thus we have proved the following.

Theorem 4 Suppose a T3 supernode begins with column r and extends at least through column j-1. Column j belongs to this supernode if and only if u_rj ≠ 0 and nnz(L(:, j)) = nnz(L(:, j-1)) - 1.

For either T2 or T3 supernodes, it is straightforward to implement the relaxed versions discussed in Section 2.4. Also, since the main benefits of supernodes come when they fit into the cache, we impose a maximum size for a supernode.
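In code, the two tests combine into a constant-time check per column once the symbolic step has produced the containment flag and the nonzero counts, as in this sketch (ours; the names are hypothetical):

    /* T2 test: column j joins the supernode ending at column j-1 iff
     * Struct(L(:,j)) is contained in Struct(L(:,j-1)) and
     * nnz(L(:,j)) == nnz(L(:,j-1)) - 1.  A cap on the supernode width
     * keeps supernodes cache-sized. */
    int joins_supernode_t2(int j, const int *nnzL, int contained,
                           int snode_cols, int max_snode_cols)
    {
        if (snode_cols >= max_snode_cols) return 0;
        return contained && nnzL[j] == nnzL[j - 1] - 1;
    }

For T3 supernodes, Theorem 4 lets the containment flag be replaced by the single test u_rj ≠ 0, where r is the supernode's first column.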

4.4 Panel depth-first search

The supernode-panel algorithm consists of an outer factorization (applying updates from supernodes to the active panel) and an inner factorization (applying supernode-column updates within the active panel). Each has its own symbolic factorization. The outer symbolic factorization happens once per panel, and determines two things: (1) a single column structure, which is the union of the structures of the panel columns, and (2) which supernodes update each column of the panel, and in what order. This is the information that the supernode-panel update loop in Figure 7 needs.

The inner symbolic factorization happens once for each column of the panel, interleaved column by column with the inner numeric factorization. In addition to determining the nonzero structure of the active column and which supernodes within the panel will update the active column, the inner symbolic factorization is also responsible for forming supernodes (that is, for deciding whether the active column will start a new supernode) and for symmetric structural pruning. The inner symbolic factorization is, therefore, exactly the supernode-column symbolic factorization described above.

[Two panels, not reproduced here: the reduced supernodal structure R alongside the panel columns A(:, 8:9), and the supernodal directed graph whose vertices are the supernodes [1,2], [3], [4,5,6], and [7].]

Figure 12: The supernodal directed graph corresponding to L(1:7, 1:7)ᵀ.

The outer symbolic factorization must determine the structures of columns j to j+w-1 (i.e., the structure of the whole panel) and also a topological order for U(1:j, j:j+w-1) en masse. To this end, we developed an efficient panel depth-first search scheme, which is a slight modification of the column DFS. The panel depth-first search algorithm maintains a single postorder DFS list for all w columns of the panel. Let us call this the PO list; it is initially empty. The algorithm invokes the column depth-first search procedure for each column from j to j+w-1. In the column DFS, each time the search backtracks from a vertex, that vertex is appended to the PO list. In the panel DFS, however, the vertex is appended to the PO list only if it is not already on the list. This gives a single list that includes every position that is nonzero in any panel column, just once; and the entire list is in (reverse) topological order. Thus the order of updates specified by the list is acceptable for each of the w individual panel columns.
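Only the backtrack step changes relative to the column DFS, as this C sketch shows (ours; on_list marks vertices already placed on the panel's shared PO list):

    /* Backtrack step of the panel depth-first search: a vertex is
     * appended to the shared postorder list at most once, no matter
     * how many of the w panel columns reach it. */
    void panel_backtrack(int v, int *on_list, int *po_list, int *po_len)
    {
        if (!on_list[v]) {
            on_list[v] = 1;
            po_list[(*po_len)++] = v;
        }
    }

Reading po_list backwards then gives one topological order that is valid for every column of the panel.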

We illustrate the idea in Figure 12, using the sample matrix from Figure 11 and a panel of width two. The first seven columns have been factored, and the current panel consists of columns 8 and 9; in the panel, the figure distinguishes the nonzeros of A from the fill in F. The depth-first search for column 8 starts from vertices 2 and 3. After that search is finished, the panel postorder list is PO = (6, 2, 3). Now the depth-first search for column 9 starts from vertices 6 and 7 (not 4, since 6 is the representative vertex for the supernode containing column 4). This DFS only appends 7 to the PO list, because 6 is already on the list. Thus, the final list for this panel is PO = (6, 2, 3, 7). The postorder list of column 8 is (6, 2, 3) and the postorder list of column 9 is (6, 7), which are simply two sublists of the panel PO list. The topological order is the reverse of PO, or (7, 3, 2, 6). In the loop of line 3 of Figure 7, we follow this topological order to schedule the updating supernodes and perform numeric updates to columns of the panel.

5 Evaluation

In this section, we apply our algorithms to practical matrices from various sources. We compare the performance of SuperLU, our supernode-panel code, with its predecessors, and with other approaches to sparse LU factorization.

Matrix   Name         s      Rows    Nonzeros   Nonzeros/row
  1      Memplus     .983    17758      99147        5.6
  2      Gemat11     .002     4929      33185        6.7
  3      Rdist1      .062     4134       9408        2.3
  4      Mcfe        .709      765      24382       31.8
  5      Sherman5    .780     3312      20793        6.3
  6      Lnsp3937    .869     3937      25407        6.5
  7      Lns3937     .869     3937      25407        6.5
  8      Sherman3   1.000     5005      20033        4.0
  9      Jpwh991     .947      991       6027        6.1
 10      Orani678    .073     2529      90158       35.6
 11      Orsreg1    1.000     2205      14133        6.4
 12      Saylr4     1.000     3564      22316        6.3
 13      Shyy161     .769    76480     329762        4.3
 14      Venkat01   1.000    62424    1717792       27.5
 15      Goodwin     .642     7320     324772       44.4
 16      Inaccura   1.000    16146    1015156       62.9
 17      Dense1000  1.000     1000    1000000     1000
 18      Raefsky3   1.000    21200    1488768       70.2
 19      Wang3      1.000    26064     177168        6.8
 20      Raefsky4   1.000    19779    1316789       66.6
 21      Vavasis3    .001    41092    1683902       41.0

Table 2: Benchmark matrices. Structural symmetry s is defined to be the fraction of the nonzeros matched by nonzeros in symmetric locations. None of the matrices are numerically symmetric.

5.1 Experimental setup

Table 2 lists 21 matrices with some characteristics of their nonzero structures. The matrices are sorted in increasing order of flops/nnz(F), the ratio of the number of floating-point operations to the number of nonzeros nnz(F) in the factored matrix F = U + L - I. The reason for this order will be described in more detail in Section 5.5. Some of the matrices are from the Harwell-Boeing collection [9]. Most of the larger matrices are from the ftp site maintained by Tim Davis of the University of Florida.³ Those matrices are as follows. Goodwin is a finite element matrix in a nonlinear solver for a fluid mechanics problem, provided by Ralph Goodwin of the University of Illinois at Urbana-Champaign. Memplus is a circuit simulation matrix from Steve Hamm of Motorola. Inaccura, Raefsky3/4, and Venkat01 were provided by Horst Simon of Silicon Graphics. Raefsky3 is from a fluid structure interaction turbulence problem. Raefsky4 is from a buckling problem for a container model. Venkat01 comes from an implicit 2-D Euler solver for an unstructured grid in a flow simulation. Wang3 is from solving a coupled nonlinear PDE system on a 3-D (30 × 30 × 30) uniform mesh in a semiconductor device simulation, as provided by Song Wang of the University of New South Wales, Sydney. Shyy161 is derived from a direct, fully-coupled method for solving the Navier-Stokes equations for viscous flow calculations, provided by Wei Shyy of the University of Florida. Vavasis3 is an unsymmetric augmented matrix for a 2-D PDE with highly varying coefficients [33]. Dense1000 is a dense 1000 × 1000 matrix.

In this paper, we do not address the performance of preordering for sparsity. Matrices 1, 14, and 19 were symmetrically permuted by Matlab's symmetric minimum degree ordering on A + Aᵀ. For all other matrices, the columns were permuted by Matlab's minimum degree ordering of AᵀA [18].

³ ftp.cis.ufl.edu, in pub/umfpack/matrices

IBM RS/6000-590

Matrix    nnz(F)   nnz(F)/nnz(A)   flops (10^6)   Seconds   Mflops   % of time in numeric   % of flops in Dgemv
  1       140388        1.4             1.8         0.59      2.98           19                     78
  2        93370        2.8             1.6         0.27      5.97           34                     81
  3       338624       36.0            12.9         0.98     13.04           41                     85
  4       118717        4.8            11.7         0.44     24.91           58                     95
  5       281876       13.6            30.4         0.94     32.30           55                     92
  6       534229       21.0            59.6         2.01     28.77           55                     95
  7       559043       22.0            66.4         2.19     30.04           55                     96
  8       433376       21.6            61.7         1.41     44.41           51                     87
  9       161334       26.7            23.5         0.62     37.34           56                     94
 10       623806        6.9           103.9         4.47     22.97           63                     97
 11       453112       32.1            76.6         1.46     52.81           57                     89
 12       774310       34.7           144.7         2.78     51.67           56                     90
 13      7635773       23.2          1578.5        28.89     54.58           52                     91
 14     12785320        7.4          2730.9        40.06     66.62           59                     92
 15      3109585        9.6           665.6        12.66     51.75           65                     92
 16     10016266        9.9          4126.1        68.65     60.17           62                     97
 17      1000000        1.0           666.7         5.74    116.06           70                     95
 18     17631651       11.8         12128.7       112.27    109.03           74                     96
 19     13287108       74.9         14559.4       116.94    124.50           78                     98
 20     26714111       20.3         31307.1       257.88    120.47           79                     98
 21     49687658       29.5         89865.1       789.65    113.81           80                     98

Table 3: Performance of SuperLU on an IBM RS/6000-590.

We performed the numerical experiments on an IBM RS/6000-590. The CPU clock rate is 66.5 MHz. The processor has two branch units, two fixed-point units, and two floating-point units, which can all operate in parallel if there are no dependencies. In particular, each FPU can perform two operations (a multiply and an add or subtract) at every cycle. Thus, the peak floating-point performance is 266 Mflops. The data cache is of size 256 KB with 256-byte lines, and is 4-way set-associative with an LRU replacement policy. There is a separate 32 KB instruction cache. The size of the main memory is 768 MB. The SuperLU algorithm is implemented in C; we use the AIX xlc compiler with -O3 optimization.

5.2 Performance of the code

Table 3 presents the performance of the SuperLU code on this system. All floating-point computations are in double precision.

In the inner loops of our sparse code, we call the two dense Blas-2 routines Dtrsv (triangular solve) and Dgemv (matrix-vector multiply) provided in the IBM ESSL library [23], whose Blas-3 matrix-matrix multiply routine Dgemm achieves about 250 Mflops when the dimension of the matrix is larger than 60 [1]. In our sparse algorithm, we find that Dgemv typically accounts for more than 80% of the floating-point operations. As shown in the last column of Table 3, this percentage is higher than 95% for many matrices. Our measurements reveal that for typical dimensions arising from the benchmark matrices, Dgemv achieves roughly 235 Mflops if the data are from cache. If the data are fetched from main memory, this rate can drop by a factor of 2 or 3.

IBM RS/6000-590

GP GP-Mod SupCol-F SupCol-C SuperLU

Matrix xlf -O3 xlf -O3 xlf -O3 xlc -O3 xlc -O3

1 1.00 1.48 1.18 1.05 .68

2 1.00 1.69 1.23 1.29 1.00

3 1.00 2.75 2.60 2.24 1.94

4 1.00 3.44 4.31 3.52 3.52

5 1.00 3.43 5.04 4.57 4.23

6 1.00 3.39 4.21 3.86 3.54

7 1.00 3.39 4.29 3.85 3.55

8 1.00 3.54 6.19 5.99 5.27

9 1.00 3.61 4.71 4.21 4.48

10 1.00 3.55 3.88 2.98 3.10

11 1.00 3.64 5.98 5.86 5.98

12 1.00 3.67 6.39 5.99 6.30

13 1.00 3.65 6.71 6.46 5.67

14 1.00 3.86 8.49 8.33 8.87

15 1.00 3.84 6.91 6.46 7.16

16 1.00 4.17 7.55 7.24 7.94

17 1.00 4.21 10.22 9.78 14.54

18 1.00 4.30 11.70 11.54 14.00

19 1.00 4.34 12.32 12.23 15.75

20 1.00 4.35 12.18 11.89 15.39

21 1.00 4.79 13.11 13.12 15.63

Geometric

Mean 1.00 3.46 5.56 5.19 5.29

Table 4: Speedups achieved by each enhancement over the GP column-column code, on the RS/6000.


The Blas speed is clearly an upper bound on the overall factorization rate. However, because symbolic manipulation usually takes a nontrivial amount of time, this bound could be much higher than the actual sparse code performance. Table 3 also presents the percentage of the total execution time spent in numeric computation. For matrices 1 and 2, the program spent less than 35% of its time in the numeric part. Compared to the others, these two matrices are sparser, have less fill, and have smaller supernodes, so our supernodal techniques are less applicable. Matrix 2 is also highly unsymmetric, which makes the symmetric structural reduction technique less effective. However, it is important to note that the execution times for these two matrices are small.

For larger and denser matrices such as 17–21, we achieve between 110 and 125 Mflops, which is about half of the machine peak. These matrices take much longer to factor, which could be a serious bottleneck in an iterative simulation process. Our techniques are successful in reducing the solution times for this type of problem.

For a dense 1000 × 1000 matrix, our code achieves 116 Mflops. This compares with 168 Mflops reported in the Lapack manual [2] on a matrix of this size, and 236 Mflops reported in the online Linpack benchmark files [25].

Sparc 20

GP GP-Mod SupCol-F SupCol-C SuperLU

Matrix f77 -O3 f77 -O3 f77 -O3 cc -xO3 cc -xO3

1 1.00 1.19 .98 1.25 .75

2 1.00 1.32 1.26 1.71 1.09

3 1.00 1.64 1.75 1.65 1.58

4 1.00 1.80 2.36 2.16 2.32

5 1.00 1.82 2.74 2.74 2.81

6 1.00 1.82 2.49 2.36 2.33

7 1.00 1.84 2.58 2.47 2.27

8 1.00 1.90 3.11 3.19 3.09

9 1.00 1.85 2.41 2.39 2.58

10 1.00 1.86 2.08 1.78 1.81

11 1.00 1.89 3.02 3.09 3.20

12 1.00 1.95 3.03 3.09 3.32

13 1.00 2.08 3.48 3.47 3.55

15 1.00 1.89 3.05 3.02 3.91

17 1.00 1.96 3.56 3.13 4.89

Geometric

Mean 1.00 1.77 2.38 2.40 2.39

Table 5: Speedups achieved by each enhancement over the GP column-column code, on a Sparc 20. The CPU is rated at 60.0 MHz, and there is a 1 MB external cache. This system does not provide a Blas library, so we use our own C Blas routines. Some large problems could not be tested because of physical memory constraints.

5.3 Comparison with previous row or column LU algorithms

In this section, we compare the performance of SuperLU with several of its predecessors, including the partial pivoting code by Gilbert and Peierls [20] (referred to as GP), Eisenstat and Liu's improved GP code that incorporates symmetric reduction [13] (referred to as GP-Mod), and two versions of our supernode-column code (referred to as SupCol-F and SupCol-C). GP, GP-Mod, and SupCol-F are written in Fortran; SupCol-C and SuperLU are written in C. We translated SupCol-F literally into C to produce SupCol-C; no changes in algorithms or data structures were made. SupCol-F, SupCol-C, and SuperLU use the ESSL Blas. Matlab contains a C implementation of GP [18], which we did not test here.

Tables 4 through 6 present the speedups achieved by the various enhancements over the original GP column-column code on high-end workstations from three vendors. Thus, for example, a speedup of 2 means that the running time was half that of GP. The numbers in the last row of each table are obtained by taking the geometric mean of the speedups in the corresponding column. We make the following observations about the results on the IBM RS/6000:

- Symmetric structure pruning (GP-Mod) is very effective in reducing the graph search time, which significantly decreases the symbolic time in the GP code. It achieves a speedup on all problems.

- Supernodes (SupCol) restrict the search to the supernodal graph, and allow the numeric kernels to employ dense Blas-2 operations. The effects are not as dramatic as those of the pruning technique. For some matrices, such as 1–3, the results are not as good as GP-Mod, because the supernodes are often small, especially in sparser problems.

DEC Alpha

GP GP-Mod SupCol-F SupCol-C SuperLU

Matrix f77 -O2 f77 -O2 f77 -O2 cc -O2 cc -O2

1 1.00 1.19 .96 .98 .55

2 1.00 1.31 1.10 1.10 .78

3 1.00 1.65 1.76 1.37 1.27

4 1.00 1.90 2.20 2.12 2.09

5 1.00 1.81 2.60 2.63 2.79

6 1.00 1.84 2.38 2.25 2.35

7 1.00 1.81 2.32 2.24 2.33

8 1.00 1.84 2.33 3.23 3.54

9 1.00 1.92 2.65 2.46 2.79

10 1.00 1.80 2.05 1.61 1.85

11 1.00 1.82 3.04 3.09 2.64

12 1.00 1.78 3.15 3.13 4.19

13 1.00 1.80 3.33 3.43 3.84

15 1.00 1.77 2.84 2.82 4.19

17 1.00 1.83 4.47 3.60 6.45

Geometric

Mean 1.00 1.72 2.38 2.24 2.40

Table 6: Speedups achieved by each enhancement over the GP column-column code, on a DEC Alpha. The CPU is rated at 200 MHz, and there is a 512 KB external cache. We use the Blas routines from DEC's DXML library. Some large problems could not be tested because of physical memory constraints.

- Supernode-panel updates (SuperLU) reduce the cache miss rate and exploit dense substructures in the factor F. For problems without much structure, the gain is often offset by various overheads. However, the benefits become evident for larger or denser problems.

- The Fortran compiler produces slightly faster code than the C compiler for SupCol on both the IBM and DEC machines. The difference is about 5% to 15% for small problems, and is less for large problems, where most of the time is spent in the Blas routines. We have seen smaller runtime differences between the codes generated by the SunOS 5.4 Fortran and C compilers on the Sun Sparc 20. Matrix 10 is an exception: the Fortran code is about 25% faster on the IBM RS/6000, 17% faster on the Sun Sparc 20, and 27% faster on the DEC Alpha.

As more and more sophisticated techniques are introduced, the overhead of the code also increases to some extent. This overhead can show up prominently on small problems. For example, on the IBM RS/6000, GP-Mod works better than any of the subsequent codes for problems 1–3.

5.4 Comparison with other approaches to LU factorization

In this section we compare our supernode-panel algorithm with other popular algorithms for solving the unsymmetric linear system Ax = b, using the matrices from our benchmark suite. The right-hand side vector b is constructed so that the solution is x_i = 1 + i/n.

Figure 13: Comparison of the SuperLU algorithm with the unsymmetric multifrontal algorithm implemented in Umfpack on an IBM RS/6000-590, with Blas routines from the ESSL library: (a) time(UMF)/time(LU); (b) fill(UMF)/fill(LU); (c) mem(UMF)/mem(LU); (d) condition number and error. In (d), we plot the estimated condition number κ∞(A) (labeled with "x"), err(Umfpack)/(κ∞(A)·ε) (labeled with "o"), and err(SuperLU)/(κ∞(A)·ε) (labeled with "+"). These plots do not show Matrices 8 and 13, because their condition numbers are larger than 1/ε.

In particular, we consider the unsymmetric multifrontal algorithm implemented in Umfpack version 1.0 [7]. Umfpack is implemented in Fortran, so we use the AIX xlf compiler with -O3 optimization, and link with the IBM ESSL library for Blas calls. We use the parameter settings for Umfpack suggested by its authors [6].

Figure 13 compares several aspects of the two implementations. We plot the ratio of each individual measure from Umfpack to that of our SuperLU code. The memory requirement counts only the amount of memory actually used, excluding any external fragmentation.

Neither code is always faster than the other. For six problems, Umfpack is faster than SuperLU, by at most a factor of two. For the rest of the problems, SuperLU is faster than Umfpack; for seven out of the 21 matrices, SuperLU runs more than twice as fast as Umfpack.

Figure 13(d) compares the solution accuracy and stability of the two approaches, without using iterative refinement. SuperLU consistently delivers more accurate solutions, because Umfpack uses a threshold pivoting strategy that trades off stability against sparsity. In particular, for stable algorithms we expect the normalized error to be

    err = (‖x − x̃‖∞ / ‖x̃‖∞) · (1 / (κ∞(A) · ε)) ≈ 1.

But very frequently, this error from Umfpack is much larger than 1. For Matrix 19, the backward error is as large as 10^-5. To mitigate the instability, we experimented with working-precision iterative refinement in Umfpack, and found that in many cases one iteration can reduce the backward error to a level comparable to that of SuperLU. The refinement process normally takes less than 2% of the factorization time. We recommend that Umfpack incorporate this inexpensive technique to guarantee backward stability. The iteration can be terminated when the scaled residual ‖r̃‖∞ / (‖A‖∞ ‖x̃‖∞) is acceptable, say less than the machine precision ε.

In our SuperLU software, equilibration and refinement are options for the user. We compute the componentwise relative backward error

    ω = max_i |r̃|_i / (|A| · |x̃| + |b|)_i ,

and stop refining when ω ≤ ε or when ω does not decrease by at least a factor of two. See Arioli, Demmel, and Duff [3] for details. In practice, most of the test matrices take only one or two refinement steps to meet this criterion.
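
To make this stopping criterion concrete, here is a minimal sketch in C of one possible refinement loop built around it. It is an illustration under assumed interfaces, not the SuperLU code: the compressed-sparse-column arrays (colptr, rowind, val) and the routine lu_solve, which applies the precomputed L and U factors to a right-hand side, are hypothetical names.

    #include <stdlib.h>
    #include <math.h>
    #include <float.h>

    /* Hypothetical solver: overwrite rhs with the solution of A*x = rhs
       using precomputed L and U factors (not the SuperLU interface).   */
    extern void lu_solve(int n, double *rhs);

    /* y = A*x and ax = |A|*|x|, for A in compressed sparse column form. */
    static void matvec(int n, const int *colptr, const int *rowind,
                       const double *val, const double *x,
                       double *y, double *ax)
    {
        int i, j, p;
        for (i = 0; i < n; i++) { y[i] = 0.0; ax[i] = 0.0; }
        for (j = 0; j < n; j++)
            for (p = colptr[j]; p < colptr[j+1]; p++) {
                y[rowind[p]]  += val[p] * x[j];
                ax[rowind[p]] += fabs(val[p] * x[j]);
            }
    }

    /* Iterative refinement with the componentwise stopping criterion:
       stop when omega <= eps, or when omega fails to halve.            */
    void refine(int n, const int *colptr, const int *rowind,
                const double *val, const double *b, double *x)
    {
        double *r  = malloc(n * sizeof(double));
        double *ax = malloc(n * sizeof(double));
        double omega, omega_prev = DBL_MAX;
        int i;

        for (;;) {
            matvec(n, colptr, rowind, val, x, r, ax);
            omega = 0.0;
            for (i = 0; i < n; i++) {
                double denom = ax[i] + fabs(b[i]);  /* (|A||x| + |b|)_i   */
                r[i] = b[i] - r[i];                 /* residual r = b - Ax */
                if (denom > 0.0 && fabs(r[i]) / denom > omega)
                    omega = fabs(r[i]) / denom;
            }
            if (omega <= DBL_EPSILON || omega > 0.5 * omega_prev)
                break;                              /* converged or stalled */
            omega_prev = omega;
            lu_solve(n, r);                         /* solve A*d = r        */
            for (i = 0; i < n; i++) x[i] += r[i];   /* x := x + d           */
        }
        free(r); free(ax);
    }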

The time for SuperLU does not include the time to reorder the columns. Umfpack is less sensitive to the initial column ordering, because it dynamically permutes the columns for sparsity. Surprisingly, Figure 13(b) seems to suggest that for large matrices the dynamic fill-reducing approach used in Umfpack is less effective than the minimum degree ordering algorithms.

5.5 Understanding cache behavior and parameters

In this subsection, we analyze the behavior of SuperLU in detail. We wish to understand when our algorithm is significantly faster than other algorithms. We would like performance-predicting variables that depend on "intrinsic" properties of the problem, such as the sparsity structure, rather than on algorithmic details and machine characteristics. We begin by analyzing the speedups of our enhanced codes over the base GP implementation. Figures 14, 15, and 16 depict the speedups and the characteristics of the matrices, with panel size w = 8.

5.5.1 How much cache reuse can we expect?

As discussed in Section 3.2, the supernode-panel algorithm gets its primary gains from improved data locality, by reusing a cached supernode several times. To understand how much cache reuse we can hope for, we computed two statistics: ops-per-nz and ops-per-ref. After defining these statistics carefully, we discuss which one more successfully measures reuse.

Figure 14: Speedups of each enhancement (GP-Mod, SupCol-C, SuperLU) over the GP code, on the RS/6000.

Ops-per-nz is calculated simply as flops/nnz(F), the total number of floating-point operations per nonzero in the filled matrix F. If there were perfect cache behavior, i.e., one cache miss per data item (ignoring the effect of cache line size), then ops-per-nz would exactly measure the amount of work per cache miss. In reality, ops-per-nz is an upper bound on the reuse. Note that ops-per-nz depends only on the fact that we are performing LU factorization with partial pivoting, not on algorithmic or machine details. Ops-per-nz is a natural measure of potential data reuse, because this kind of ratio has long been used to distinguish among the different levels of Blas.

In contrast, ops-per-ref provides a lower bound on cache reuse, and does depend on the details of the SuperLU algorithm. Ops-per-ref looks at each supernode-panel update separately, and assumes that all the associated data is outside the cache before the update begins. This pessimistic assumption limits the potential reuse to twice the panel size, 2w.

Now we define ops-per-ref more carefully. Consider a single update from supernode (r : s) to panel (j : j + w − 1). Depending on the panel's nonzero structure, each entry in the updating supernode is used to update from 0 to w panel columns. Thus each entry in the updating supernode participates in between 0 and 2w floating-point operations during a sup-panel update. We assume that the supernode entry is brought into cache from main memory exactly once for the entire sup-panel update, if it is used at all. Thus, during a single sup-panel update, each entry accessed in the updating supernode accounts for between 2 and 2w operations per reference. The ops-per-ref statistic is the average of this number over all entries in all sup-panel updates. It measures how many times the average supernode entry is used each time it is brought into cache from main memory. Ops-per-ref ranges from 2 to 2w, with larger values indicating better cache use. If there is little correlation between the row structures of the columns in each panel, ops-per-ref will be small;

little correlation b etween the row structures of the columns in each panel, ops-per-ref will b e small;

if there is p erfect correlation, as in a dense matrix, it will b e close to 2w . 27 (a) ops−per−nz(F) (b) ops−per−ref 104 15

103 10 102

5 101

100 0 1 3 5 7 9 11 13 15 17 19 21 1 3 5 7 9 11 13 15 17 19 21 Matrix number Matrix number

(c) Flop count (d) nnz(F) 105 108

104 107

103 106 102

105 101

100 104 1 3 5 7 9 11 13 15 17 19 21 1 3 5 7 9 11 13 15 17 19 21

Matrix number Matrix number

Figure 15: Some characteristics of the matrices. 28 (a) Mean supernode size (b) Structural symmetry 15 100

10 10−1

5 10−2

0 10−3 1 3 5 7 9 11 13 15 17 19 21 1 3 5 7 9 11 13 15 17 19 21 Matrix number Matrix number

(c) Dimension n (d) Density 105 100

10−1

−2 4 10 10

10−3

10−4 103 10−5 1 3 5 7 9 11 13 15 17 19 21 1 3 5 7 9 11 13 15 17 19 21

Matrix number Matrix number

Figure 16: Some intrinsic prop erties of the matrices. 29

Now we describe how we compute the average ops-per-ref for the entire factorization. For each updating supernode (r : s) and each panel (j : j + w − 1) (see Figure 7), define

    ksmin = min over j ≤ jj ≤ j + w − 1 of ks(jj),   where ks(jj) = min { i : A(i, jj) ≠ 0, r ≤ i ≤ s }.

Then nnz(L(r : n, ksmin : s)) entries of the supernode are referenced in the sup-panel update. The dense triangular solve in column jj of the update takes (s − ks(jj) + 1)(s − ks(jj)) flops, and the matrix-vector multiply uses 2 (s − ks(jj) + 1) · nnz(L(s + 1 : n, s)) flops. We count both additions and multiplications. Over all panel updates, we sum the memory-reference counts and the flop counts, then divide the second sum by the first to arrive at an average ops-per-ref.
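
The following C fragment sketches this bookkeeping for a single sup-panel update. The helper nnz_L(i0, k0, k1), standing for nnz(L(i0 : n, k0 : k1)), and the array ks[], holding the first nonzero row of each panel column within the supernode, are hypothetical names for the quantities defined above, not part of any actual code.

    extern long nnz_L(int i0, int k0, int k1);   /* nnz(L(i0:n, k0:k1)) */

    /* Accumulate flop and memory-reference counts for one update from
       supernode (r:s) to panel (j : j+w-1), following the formulas above.
       ks[jj] = min{ i : A(i,jj) != 0, r <= i <= s }, or s+1 if column jj
       receives no update from this supernode.                           */
    void count_sup_panel(int r, int s, int j, int w, const int *ks,
                         long *refs, long *flops)
    {
        int jj, ksmin = s + 1;
        for (jj = j; jj < j + w; jj++)
            if (ks[jj] < ksmin) ksmin = ks[jj];
        if (ksmin > s) return;         /* supernode does not touch this panel */

        /* each supernode entry used is brought into cache once */
        *refs += nnz_L(r, ksmin, s);

        for (jj = j; jj < j + w; jj++) {
            long cols;
            if (ks[jj] > s) continue;                  /* no update here       */
            cols = s - ks[jj] + 1;
            *flops += cols * (cols - 1);               /* triangular solve     */
            *flops += 2 * cols * nnz_L(s + 1, s, s);   /* matrix-vector mult.  */
        }
    }

Summing refs and flops over all sup-panel updates and dividing the flop total by the reference total gives the average ops-per-ref.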

Now we compare the power of ops-per-nz (Figure 15(a)) and ops-per-ref (Figure 15(b)) in predicting speedup (Figure 14). The superiority of ops-per-nz is evident; it is much more strongly correlated with the speedup of SuperLU than ops-per-ref is. This is good news, because ops-per-nz measures the best-case reuse, and ops-per-ref the worst case. But neither statistic captures all the variation in the performance. In future work, we hope to use a hardware monitor to measure the exact cache reuse rate. These data could also be obtained from a simulator, but the matrices we are interested in are much too large for a simulator to be viable.

5.5.2 How large are the supernodes?

The supernode size determines the size of the matrix passed to matrix-vector multiply and the other Blas-2 routines in our algorithm. Figure 16(a) shows the average number of columns in the supernodes of the matrices, after amalgamating the relaxed supernodes at the bottom of the column etree. The average size is usually quite small.

More important than the average size is the distribution of supernode sizes. In sparse Gaussian elimination, more fill tends to occur in the later stages. Usually a large percentage of the supernodes are small ones corresponding to the fringe of the column etree, even after amalgamation; larger supernodes appear at the higher levels of the tree. In Figure 17 we plot histograms of the supernode size for four matrices chosen to exhibit a wide range of behavior. Matrix 1 has 16378 supernodes, all but one of which have fewer than 12 columns; the single large supernode, with 115 columns, is the dense submatrix at the bottom right corner of F. Matrix 15 has more supernodes distributed over a wider spectrum; it has 13 supernodes with 54 to 59 columns. This matrix obtains greater speedups over the non-supernodal codes.

Figure 16 also plots three other properties of each matrix: structural symmetry, dimension, and density. None of them has any significant correlation with the performance. The effectiveness of symmetric reduction depends on F being structurally symmetric, which depends on the choice of pivots; so the structural symmetry of A does not give any useful information.

We note that the dense 1000 × 1000 problem (matrix 17) shows the best performance gains, because this matrix has large supernodes and exhibits ideal data reuse. It achieves speedups of 4.9 to 14.5 over GP on the three platforms (see Tables 4–6). The gains for any sparse matrix should be smaller than this.

Figure 17: Distribution of supernode size for four matrices: (a) Matrix 1, 17758 rows, 16378 supernodes; (b) Matrix 2, 4929 rows, 2002 supernodes; (c) Matrix 3, 4134 rows, 2099 supernodes; (d) Matrix 15, 7320 rows, 893 supernodes. The number at the bottom of each bar is the smallest supernode size in that bin; the mark "o" at the bottom of a bin indicates zero occurrences. Artificial supernodes of granularity r = 4 are used (see Section 2.4).

5.5.3 Blocking parameters

In our hybrid blocking algorithm (Figure 9), we need to select appropriate values for the parameters that describe the two-dimensional data blocking: the panel width w, the maximum supernode size t, and the row block size b. The key considerations are that the active data we access in the inner loop (the working set) should fit into the cache, and that the matrices presented to the Blas-2 routine Dgemv should be of the sizes and shapes for which that routine is optimized. Here we describe in detail the methodology we used to choose parameters for the IBM RS/6000.

Figure 18: (a) Contour plot of Dgemv performance in the (m, n) plane; each contour is a constant Mflops rate, and the dashed curve marks mn = 32K doubles, the 256 KB cache capacity. (b) Contour plot of the working-set size in the (t, b) plane, for w = 8 and w = 16, with a 256 KB cache; the point (t, b) = (120, 200) is marked.

- Dgemv optimization. As indicated in the last column of Table 3, the majority of the floating-point operations are in the matrix-vector multiply. The dimensions (m, n) of the matrices in calls to Dgemv vary greatly depending on the supernode dimensions. Very often, the supernode is a tall and skinny matrix, that is, m ≫ n. We measured the Dgemv Mflops rate for various m and n, and present a contour plot in the (m, n) plane in Figure 18(a). Each contour represents a constant Mflops rate. The dashed curve represents mn = 32K double floats, i.e., the 256 KB cache capacity. In the optimum region, we achieve more than 200 Mflops; outside this region, performance drops either because the matrices exceed the cache capacity, or because the column dimension n is too small.

- Working set. By studying the data access pattern in the inner loop of the 2-D algorithm (lines 7–9 in Figure 9), we find that the working set size is the following function of w, t, and b, as shown in Figure 19:

      WS = b·t + (t + b)·w + b·w,

  where the three terms are, respectively, the row block from the supernode, the part of the SPA structure that is touched, and the vectors in the matrix-vector multiply; a sketch of this computation appears after this list. In Figure 18(b), we fix two values of w, and plot the contour lines for WS = 32K in the (t, b) plane. If the point (t, b) is below the contour curve, then the working set fits in a cache of 32K double floats, or 256 kilobytes.
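
Here is the working-set check as a minimal C sketch under the formula above; the function names are ours, and the cache size is the RS/6000-590's 256 KB data cache.

    #include <stddef.h>

    #define CACHE_BYTES (256 * 1024)         /* RS/6000-590 data cache */

    /* Working-set size, in doubles, of the 2-D algorithm's inner loop:
       b*t for the row block from the supernode, (t+b)*w for the part of
       the SPA structure, and b*w for the matrix-vector multiply vectors. */
    static size_t working_set(size_t w, size_t t, size_t b)
    {
        return b*t + (t + b)*w + b*w;
    }

    /* Does (w, t, b) fit?  For example, w = 8, t = 120, b = 200 gives a
       working set of 28160 doubles, under the 32K-double (256 KB) cache. */
    static int fits_in_cache(size_t w, size_t t, size_t b)
    {
        return working_set(w, t, b) * sizeof(double) <= CACHE_BYTES;
    }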

Based on this analysis, we use the following rules to set the parameters.

Figure 19: Parameters of the working set in the 2-D algorithm.

First we choose w, the width of the panel in columns. Larger panels mean more reuse of cached data in the outer factorization, but also mean that the inner factorization by the sup-col algorithm must be applied to larger matrices. We find empirically that the best choice for w is between 8 and 16; performance tends to degrade for larger w.

Next we choose b, the number of rows per block, and t, the maximum number of columns in a supernode. Recall that b and t are upper bounds on the row and column dimensions of the call to Dgemv. We choose t = 120 and b = 200, which guarantees that the working set fits in cache (per Figure 18(b)), and that we can hope to be near the optimum region of Dgemv performance (per Figure 18(a)).

Recall that b is relevant only when we use row-wise blocking, that is, when the test "if (r : s) is large" succeeds at line 4 of Figure 9. This implies that the 2-D scheme adds overhead only if the updating supernode is small. In the actual code, the test for a large supernode is

    if ncol > 40 and nrow > b then the supernode is large,

where nrow is the number of dense rows below the diagonal block of the supernode, and ncol is the number of actual dense columns of the supernode updating the panel. In practice, this choice usually gives the best performance.

The best choice of the parameters w, t, and b depends on the machine architecture and on the Blas implementation, but it is largely independent of the matrix structure. Thus we do not expect each user of SuperLU to choose values for these parameters. Instead, our library code provides an inquiry function that returns the parameter values, much in the spirit of the Lapack environment routine Ilaenv. The machine-independent defaults will often give satisfactory performance. The methodology we have described here for the RS/6000 can serve as a guide for users who want to modify the inquiry function to give optimal performance for particular computer systems.
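
As a sketch of what such an inquiry function might look like, consider the following C fragment; the function name, the enumeration, and the idea of compiling in per-machine defaults are our illustration, with the returned values being the RS/6000 choices from this section.

    /* A hypothetical inquiry function in the spirit of Lapack's Ilaenv.
       A user tuning for another machine would substitute values found by
       the same methodology: Dgemv contours plus the working-set formula. */
    enum blocking_param { PANEL_WIDTH_W, MAX_SUPER_T, ROW_BLOCK_B };

    int blocking_default(enum blocking_param which)
    {
        switch (which) {
        case PANEL_WIDTH_W: return 8;    /* panel width w, in columns  */
        case MAX_SUPER_T:   return 120;  /* max supernode columns t    */
        case ROW_BLOCK_B:   return 200;  /* row block size b, in rows  */
        }
        return -1;                       /* unknown parameter          */
    }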

6 Remarks

6.1 The rest of the package

In addition to the LU factorization kernel described in this paper, we have developed a suite of supporting routines to solve linear systems. The complete SuperLU package includes condition number estimation, iterative refinement of solutions, and componentwise error bounds for the refined solutions. These are all based on the dense matrix routines in Lapack [2]. In addition, SuperLU includes a Matlab mex-file interface, so that our factor and solve routines can be called as alternatives to those built into Matlab.

6.2 Effect of the matrix on performance

The supernodal approach reduces both symbolic and numeric computation time. But unsymmetric supernodes tend to be smaller than the supernodes in symmetric matrices. The supernode-panel method is most effective for large problems with enough dense submatrices to use dense block operations and exploit data locality; in this regard, the dense 1000 × 1000 example illustrates the largest possible gains. Dense blocks are necessary for top performance in all modern factorization algorithms, whether left-looking, right-looking, multifrontal, or any other style.

Our goal has been to develop sparse LU software that works well for problems with a wide range of characteristics. It is harder to achieve high flop rates on problems that are very sparse and have no structure to exploit; it is easier on problems that are denser or become so during elimination. Fortunately, the "hard" matrices by this definition generally take many fewer floating-point operations than the "easy" ones, and hence take much less time to factor. Our combination of 1-D and 2-D blocking techniques gives a good performance compromise for all the problems we have studied, with particularly good performance on the largest problems.

6.3 Effect of the computer system on performance

We have studied several characteristics of the computing platform that affect the overall performance, including the Blas-2 speed and the cache size. Based on these factors, we can systematically make a good choice of the blocking parameters in the code so as to maximize the speed of the numeric kernel. Although we have empirical evidence only for the IBM RS/6000, we expect this methodology to be applicable to other systems and Blas implementations as well.

6.4 Possible enhancements

We are considering several possible enhancements to the SuperLU code. One is to switch to a dense LU code at a late stage of the factorization. It would be difficult to implement this in a supernode-column code, because that code is strictly left-looking, and only one column of the matrix is factored at a time. However, it would be much easier in the supernode-panel code. At the time we decide to switch, we simply treat the remaining matrix columns (say d of them) as one panel, and perform the panel update to A(1 : n, n − d + 1 : n). (One might want to split this panel up for better cache behavior.) Then the reduced matrix at the bottom right corner can be factored by calling an efficient dense code, for example, from Lapack [2]. The dense code does not spend time on symbolic structure prediction and pruning, thus streamlining the numeric computation. We believe that, for large problems, the final dense submatrix will be big enough to make the switch beneficial. For example, for a 2-D k × k square grid problem ordered by nested dissection, the dimension of the final dense submatrix is (3/2)k × (3/2)k; for a 3-D k × k × k cubic grid, it is (3/2)k^2 × (3/2)k^2, if pivots come from the diagonal.
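
A minimal sketch of the proposed switch, assuming the remaining d columns have already been updated as one panel and the reduced d × d corner block has been gathered into a dense column-major buffer by the caller; dgetrf_ is Lapack's dense LU factorization with partial pivoting.

    /* Lapack dense LU with partial pivoting (Fortran calling convention). */
    extern void dgetrf_(const int *m, const int *n, double *a,
                        const int *lda, int *ipiv, int *info);

    /* Factor the trailing dense block A(n-d+1:n, n-d+1:n) after the last
       panel update; `a` holds the block in column-major order, lda = d.  */
    int factor_trailing_dense(int d, double *a, int *ipiv)
    {
        int info;
        dgetrf_(&d, &d, a, &d, ipiv, &info);
        return info;    /* info > 0 signals an exactly zero pivot */
    }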

To enhance SuperLU's performance on small problems, it would be possible to choose at runtime whether to use supernode-panel, supernode-column, or column-column updates. The choice would depend on the size of the matrix A and the expected properties of its supernodes; it might be based on an efficient symbolic computation of the density and supernode distribution of the Cholesky factor of A^T A [21].

Could we make supernode-panel updates more effective by improving the similarity between the row structures of the columns in a panel? We believe this could be accomplished with a more sophisticated column permutation strategy. We could partition the nodes of the column etree into connected subtrees, grouping together nodes that have common descendants and therefore the potential for updates from the same supernodes. Then the overall column order would be a two-level postorder: first within the subtrees (panels), and then among them. Again, it might be possible to use information about the Cholesky supernodes of A^T A to guide this grouping.

We are also developing a parallel sparse LU algorithm based on SuperLU. In this context, we target large problems, especially those too big to be solved on a uniprocessor system. Therefore, we plan to parallelize the 2-D blocked supernode-panel algorithm, which has very good asymptotic behavior for large problems. The 2-D block-oriented layout has been shown to scale well for parallel sparse Cholesky factorization [22, 29].

Acknowledgements

We thank Rob Schreiber and Ed Rothberg for very helpful discussions during the development of SuperLU. Rob pushed us to find a way to do the supernode-panel update; Ed suggested the ops-per-ref statistic. We are also grateful to Tim Davis and Steve Vavasis for making their test problems available to us. The Institute for Mathematics and Its Applications at the University of Minnesota provided the fertile environment in which this work began.

References

[1] R. C. Agarwal, F. G. Gustavson, P. Palkar, and M. Zubair. A performance analysis of the subroutines in the ESSL/LAPACK call conversion interface (CCI). IBM T. J. Watson Research Center, Yorktown Heights, 1994.

[2] E. Anderson et al. LAPACK Users' Guide, Second Edition. SIAM, Philadelphia, 1995.

[3] M. Arioli, J. W. Demmel, and I. S. Duff. Solving sparse linear systems with sparse backward error. SIAM J. Matrix Anal. Appl., 10(2):165–190, April 1989.

[4] C. Ashcraft and R. Grimes. The influence of relaxed supernode partitions on the multifrontal method. ACM Trans. Mathematical Software, 15:291–309, 1989.

[5] C. Ashcraft, R. Grimes, J. Lewis, B. Peyton, and H. Simon. Progress in sparse matrix methods for large sparse linear systems on vector supercomputers. Intern. J. of Supercomputer Applications, 1:10–30, 1987.

[6] T. A. Davis. User's guide for the unsymmetric-pattern multifrontal package (UMFPACK). Technical Report TR-93-020, Computer and Information Sciences Department, University of Florida, June 1993.

[7] T. A. Davis and I. S. Duff. An unsymmetric-pattern multifrontal method for sparse LU factorization. Technical Report RAL 93-036, Rutherford Appleton Laboratory, Chilton, Didcot, Oxfordshire, 1994.

[8] J. J. Dongarra, J. Du Croz, S. Hammarling, and R. J. Hanson. An extended set of basic linear algebra subroutines. ACM Trans. Mathematical Software, 14:1–17, 18–32, 1988.

[9] I. S. Duff, R. Grimes, and J. Lewis. Sparse matrix test problems. ACM Trans. Mathematical Software, 15:1–14, 1989.

[10] I. S. Duff and J. K. Reid. MA48, a Fortran code for direct solution of sparse unsymmetric linear systems of equations. Technical Report RAL-93-072, Rutherford Appleton Laboratory, Oxon, UK, 1993.

[11] I. S. Duff and J. K. Reid. The multifrontal solution of indefinite sparse symmetric linear equations. ACM Trans. Mathematical Software, 9:302–325, 1983.

[12] S. C. Eisenstat and J. W. H. Liu. Exploiting structural symmetry in sparse unsymmetric symbolic factorization. SIAM J. Matrix Analysis and Applications, 13:202–211, 1992.

[13] S. C. Eisenstat and J. W. H. Liu. Exploiting structural symmetry in a sparse partial pivoting code. SIAM J. Scientific and Statistical Computing, 14:253–257, 1993.

[14] J. A. George and E. Ng. An implementation of Gaussian elimination with partial pivoting for sparse systems. SIAM J. Scientific and Statistical Computing, 6:390–409, 1985.

[15] J. A. George and E. Ng. Symbolic factorization for sparse Gaussian elimination with partial pivoting. SIAM J. Scientific and Statistical Computing, 8:877–898, 1987.

[16] J. R. Gilbert. Predicting structure in sparse matrix computations. SIAM J. Matrix Analysis and Applications, 15:62–79, 1994.

[17] J. R. Gilbert and J. W. H. Liu. Elimination structures for unsymmetric sparse LU factors. SIAM J. Matrix Analysis and Applications, 14:334–352, 1993.

[18] J. R. Gilbert, C. Moler, and R. Schreiber. Sparse matrices in Matlab: Design and implementation. SIAM J. Matrix Analysis and Applications, 13:333–356, 1992.

[19] J. R. Gilbert and E. Ng. Predicting structure in nonsymmetric sparse matrix factorizations. In Alan George, John R. Gilbert, and Joseph W. H. Liu, editors, Graph Theory and Sparse Matrix Computation. Springer-Verlag, 1993.

[20] J. R. Gilbert and T. Peierls. Sparse partial pivoting in time proportional to arithmetic operations. SIAM J. Scientific and Statistical Computing, 9:862–874, 1988.

[21] John R. Gilbert, Esmond G. Ng, and Barry W. Peyton. An efficient algorithm to compute row and column counts for sparse Cholesky factorization. SIAM J. Matrix Analysis and Applications, 15:1075–1091, 1994.

[22] A. Gupta and V. Kumar. Optimally scalable parallel sparse Cholesky factorization. In The 7th SIAM Conference on Parallel Processing for Scientific Computing, pages 442–447, 1995.

[23] International Business Machines Corporation. Engineering and Scientific Subroutine Library, Guide and Reference. Version 2 Release 2, Order No. SC23-0526-01, 1994.

[24] J. W. H. Liu. The role of elimination trees in sparse factorization. SIAM J. Matrix Analysis and Applications, 11:134–172, 1990.

[25] PDS: The performance database server. http://performance.netlib.org/performance/, May 1995.

[26] E. G. Ng and B. W. Peyton. Block sparse Cholesky algorithms on advanced uniprocessor computers. SIAM J. Scientific and Statistical Computing, 14:1034–1056, 1993.

[27] E. Rothberg and A. Gupta. Efficient sparse matrix factorization on high-performance workstations – exploiting the memory hierarchy. ACM Trans. Mathematical Software, 17:313–334, 1991.

[28] E. Rothberg and A. Gupta. An evaluation of left-looking, right-looking and multifrontal approaches to sparse Cholesky factorization on hierarchical-memory machines. Int. J. High Speed Computing, 5:537–593, 1993.

[29] E. E. Rothberg and A. Gupta. An efficient block-oriented approach to parallel sparse Cholesky factorization. In Supercomputing, pages 503–512, November 1993.

[30] A. H. Sherman. On the efficient solution of sparse systems of linear and nonlinear equations. PhD thesis, Yale University, 1975.

[31] A. H. Sherman. Algorithm 533: NSPIV, a FORTRAN subroutine for sparse Gaussian elimination with partial pivoting. ACM Trans. Mathematical Software, 4:391–398, 1978.

[32] H. Simon, P. Vu, and C. Yang. Performance of a supernodal general sparse solver on the CRAY Y-MP: 1.68 GFLOPS with autotasking. Technical Report TR SCA-TR-117, Boeing Computer Services, 1989.

[33] S. A. Vavasis. Stable finite elements for problems with wild coefficients. Technical Report 93-1364, Department of Computer Science, Cornell University, Ithaca, NY, 1993. To appear in SIAM J. Numerical Analysis.