A Survey of Direct Methods for Sparse Linear Systems
Timothy A. Davis, Sivasankaran Rajamanickam, and Wissam M. Sid-Lakhdar
Technical Report, Department of Computer Science and Engineering, Texas A&M University, April 2016. http://faculty.cse.tamu.edu/davis/publications.html
To appear in Acta Numerica

Wilkinson defined a sparse matrix as one with enough zeros that it pays to take advantage of them.¹ This informal yet practical definition captures the essence of the goal of direct methods for solving sparse matrix problems. They exploit the sparsity of a matrix to solve problems economically: much faster and using far less memory than if all the entries of a matrix were stored and took part in explicit computations. These methods form the backbone of a wide range of problems in computational science. A glimpse of the breadth of applications relying on sparse solvers can be seen in the origins of matrices in published matrix benchmark collections (Duff and Reid 1979a; Duff, Grimes and Lewis 1989a; Davis and Hu 2011). The goal of this survey article is to impart a working knowledge of the underlying theory and practice of sparse direct methods for solving linear systems and least-squares problems, and to provide an overview of the algorithms, data structures, and software available to solve these problems, so that the reader can both understand the methods and know how best to use them.

¹ Wilkinson actually defined it in the negation: "The matrix may be sparse, either with the non-zero elements concentrated ... or distributed in a less systematic manner. We shall refer to a matrix as dense if the percentage of zero elements or its distribution is such as to make it uneconomic to take advantage of their presence." (Wilkinson and Reinsch 1971), page 191, emphasis in the original.

CONTENTS
1 Introduction
2 Basic algorithms
3 Solving triangular systems
4 Symbolic analysis
5 Cholesky factorization
6 LU factorization
7 QR factorization and least-squares problems
8 Fill-reducing orderings
9 Supernodal methods
10 Frontal methods
11 Multifrontal methods
12 Other topics
13 Available Software
References

1. Introduction

This survey article presents an overview of the fundamentals of direct methods for sparse matrix problems, from theory to algorithms and data structures to working pseudocode. It gives an in-depth presentation of the many algorithms and software available for solving sparse matrix problems, both sequential and parallel. The focus is on direct methods for solving systems of linear equations, including LU, QR, Cholesky, and other factorizations, forward/backsolve, and related matrix operations. Iterative methods, solvers for eigenvalue and singular-value problems, and sparse optimization problems are beyond the scope of this article.
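To make the shape of a sparse direct solve concrete before diving into the details, here is a minimal MATLAB sketch of the factor-then-solve pipeline for a small symmetric positive definite system. The matrix, the right-hand side, and the variable names are ours, chosen purely for illustration; the three steps correspond to fill-reducing ordering (Section 8), Cholesky factorization (Section 5), and triangular solves (Section 3).

    % Assemble a small symmetric positive definite matrix in triplet form
    % (this particular matrix is an illustrative example, not from the survey).
    i = [1 2 3 2 1];          % row indices
    j = [1 2 3 1 2];          % column indices
    v = [4 4 4 1 1];          % numerical values (duplicate entries are summed)
    A = sparse(i, j, v, 3, 3);
    b = ones(3, 1);

    p = amd(A);               % fill-reducing ordering (Section 8)
    R = chol(A(p,p));         % sparse Cholesky: A(p,p) = R'*R (Section 5)
    x(p,1) = R \ (R' \ b(p)); % forward solve, then backsolve (Section 3)
    norm(A*x - b)             % residual; near machine precision

On a 3-by-3 example the ordering is immaterial, but on large problems the permutation p can reduce the work and memory of the factorization by orders of magnitude.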
1.1. Outline

• In Section 1.2, below, we first provide a list of books and prior survey articles that the reader may consult to further explore this topic.
• Section 2 presents a few basic data structures and some basic algorithms that operate on them: transpose, matrix-vector and matrix-matrix multiply, and permutations. The reader will often need to understand, and perhaps implement, these operations in order to provide matrices as input to a package for solving sparse linear systems.
• Section 3 describes the sparse triangular solve, with both sparse and dense right-hand sides (a small sketch of this kernel appears after this outline). This kernel provides useful background for understanding the nonzero pattern of a sparse matrix factorization, which is discussed in Section 4, and it also forms the basis of the basic Cholesky and LU factorizations presented in Sections 5 and 6.
• Section 4 covers the symbolic analysis phase that occurs prior to numeric factorization: finding the elimination tree, the number of nonzeros in each row and column of the factors, and the nonzero patterns of the factors themselves (or upper bounds on them, in case numerical pivoting changes things later on). The focus is on Cholesky factorization, but the discussion has implications for the other kinds of factorizations as well (LDL^T, LU, and QR).
• Section 5 presents the many variants of sparse Cholesky factorization for symmetric positive definite matrices, including early methods of historical interest (envelope and skyline methods), and up-looking, left-looking, and right-looking methods. Multifrontal and supernodal methods (for Cholesky, LU, LDL^T, and QR) are presented later on.
• Section 6 considers LU factorization, where numerical pivoting becomes a concern. After describing how the symbolic analysis differs from the Cholesky case, this section covers left-looking and right-looking methods, and how numerical pivoting affects these algorithms.
• Section 7 presents QR factorization and its symbolic analysis, for both row-oriented (Givens) and column-oriented (Householder) variants. Also considered are alternative methods for solving sparse least-squares problems, based on augmented systems or on LU factorization.
• Section 8 is on the ordering problem: finding a good permutation that reduces fill-in, work, or memory usage. This is a difficult problem (NP-hard, to be precise), so this section presents the many heuristics that have been created to find a decent solution. These heuristics are typically applied first, prior to the symbolic analysis and numeric factorization, but the section appears out of order in this article because understanding matrix factorizations (Sections 4 through 7) is a prerequisite to understanding how best to permute the matrix.
• Section 9 presents the supernodal method for Cholesky and LU factorization, in which adjacent columns of the factors with identical (or nearly identical) nonzero patterns are "glued" together to form supernodes. Exploiting this common substructure greatly improves memory traffic and allows computations to be done in dense submatrices, each representing a supernodal column.
• Section 10 discusses the frontal method, another way of organizing the factorization in a right-looking method, in which a single dense submatrix holds the part of the sparse matrix actively being factorized, and rows and columns come and go during factorization. Historically, this method precedes both the supernodal and multifrontal methods.
• Section 11 covers the many variants of the multifrontal method for Cholesky, LDL^T, LU, and QR factorizations. In this method, the matrix is represented not by one frontal matrix but by many of them, all related to one another via the assembly tree (a variant of the elimination tree). As in the supernodal and frontal methods, dense matrix operations can be exploited within each dense frontal matrix.
• Section 12 considers several topics that do not fit neatly into the above outline, yet are critical to the domain of direct methods for sparse linear systems. Many of them are also active areas of research: update/downdate methods, parallel triangular solves, GPU acceleration, and low-rank approximations.
• Reliable peer-reviewed software has long been a hallmark of research in computational science, and in sparse direct methods in particular. No survey on sparse direct methods would thus be complete without a discussion of software, which we present in Section 13.
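The sketch promised in the Section 3 bullet above: a column-oriented forward solve Lx = b in MATLAB, for a sparse lower triangular L and a dense column vector b. The function name lsolve is ours, and this is only an illustrative sketch of the idea, not the survey's own pseudocode. The point of the kernel is that each step touches only the nonzero entries of one column of L, so the work is proportional to the number of nonzeros in L rather than to n^2.

    function x = lsolve(L, b)
    % LSOLVE: solve L*x = b, where L is sparse lower triangular and
    % b is a dense column vector, visiting only the nonzeros of L.
    % (Illustrative sketch; the name and code are ours.)
    n = size(L, 1);
    x = full(b);
    for k = 1:n
        x(k) = x(k) / L(k,k);             % divide by the diagonal entry
        [i, ~, v] = find(L(k+1:n, k));    % nonzeros below the diagonal
        x(k+i) = x(k+i) - v * x(k);       % update the remaining entries
    end
    end

The sparse right-hand-side case, covered in Section 3, first determines which entries of x can become nonzero via a graph traversal, so that the solve itself need only visit those entries.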
1.2. Resources

Sparse direct methods are a tightly coupled combination of techniques from numerical linear algebra, graph theory, graph algorithms, permutations, and other topics in discrete mathematics. We assume that the reader is familiar with the basics of this background. For further reading, Golub and Van Loan (2012) provide in-depth coverage of numerical linear algebra and matrix computations for dense and structured matrices. Cormen, Leiserson and Rivest (1990) discuss algorithms and data structures and their analysis, including graph algorithms. MATLAB notation is used in this article (see Davis (2011b) for a tutorial; a brief illustrative example appears at the end of this section).

Books dedicated to the topic of direct methods for sparse linear systems include those by Tewarson (1973), George and Liu (1981), Pissanetsky (1984), Duff, Erisman and Reid (1986), Zlatev (1991), Björck (1996), and Davis (2006). Portions of Sections 2 through 8 of this article are condensed from Davis' (2006) book. Demmel (1997) interleaves a discussion of numerical linear algebra with a description of related software for sparse and dense problems. Chapter 6 of Dongarra, Duff, Sorensen and Van der Vorst (1998) provides an overview of direct methods for sparse linear systems. Several of the early conference proceedings on sparse matrix problems and algorithms from the 1970s and 1980s have been published in book form, including Reid (1971), Rose and Willoughby (1972), Duff (1981e), and Evans (1985).

Survey and overview papers such as this one have appeared in the literature and provide a useful bird's-eye view of the topic and its historical development. The first was a survey by Tewarson (1970), which, even early in the formation of the field, covers many of the same topics as this survey article: LU factorization, Householder and Givens transformations, Markowitz ordering, minimum degree ordering, and bandwidth/profile reduction and other special forms. It also presents both the Product Form of the Inverse and the Elimination Form; the former arises in a Gauss-Jordan elimination that is no longer used in sparse direct solvers. Reid's survey (1974) focuses on right-looking Gaussian elimination, and in two related papers (1977a, 1977b) he also considers graphs, the block triangular form, Cholesky factorization, and least-squares problems. Duff (1977b) gave an extensive survey of sparse matrix methods and their applications, with over 600 references. This paper does not attempt to cite the many papers that rely on sparse direct methods, since the count is now surely in the thousands: a Google Scholar search in September 2015 for the term "sparse matrix" lists 1.4 million results, although many of those are unrelated to sparse direct methods.
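Finally, for readers new to MATLAB's sparse notation, the brief example promised above follows. The particular matrix is ours, for illustration only; Davis (2011b) gives a full tutorial.

    A = sparse([1 2 2 3], [1 1 2 3], [4 1 5 6]);  % 3-by-3 matrix from triplets
    nnz(A)                % number of nonzero entries (here, 4)
    A(2, :)               % colon notation selects an entire row or column
    [i, j, v] = find(A);  % recover the triplet form
    x = A \ ones(3, 1);   % backslash solves the linear system A*x = b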