A Fast Implementation of the Minimum Degree Algorithm Using Quotient Graphs
Total Page:16
File Type:pdf, Size:1020Kb
A Fast Implementation of the Minimum Degree Algorithm Using Quotient Graphs ALAN GEORGE University of Waterloo and JOSEPH W. H. LIU Systems Dimensions Ltd. This paper describes a very fast implementation of the m~n~mum degree algorithm, which is an effective heuristm scheme for fmding low-fill ordermgs for sparse positive definite matrices. This implementation has two important features: first, in terms of speed, it is competitive with other unplementations known to the authors, and, second, its storage requirements are independent of the amount of fill suffered by the matrix during its symbolic factorization. Some numerical experiments which compare the performance of this new scheme to some existing minimum degree programs are provided. Key Words and Phrases: sparse linear equations, quotient graphs, ordering algoritl~ms, graph algo- rithms, mathematical software CR Categories: 4.0, 5.14 1. INTRODUCTION Consider the symmetric positive definite system of linear equations Ax = b, (1.1) where A is N by N and sparse. It is well known that ifA is factored by Cholesky's method, it normally suffers some fill-in. Thus, if we intend to solve eq. (1.1) by this method, it is usual first to find a permutation matrix P and solve the reordered system (pApT)(px) = Pb, (1.2) where P is chosen such that PAP T suffers low fill-in when it is factored into LL T. There are four distinct and independent phases that can be identified in the entire computational process: Permmsion to copy without fee all or part of this material m granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the titleof the publication and its date appear, and notice is given that copying ts by permission of the Association for Computing Machinery. To copy otherwise, or to repubhsh, requires a fee and/or specific permission. Thin research was supported in part by the Canadian Natural Sciences and Engineering Research Council under Grant A-8111. Authors' addresses: A. George, Department of Computer Science, University of Waterloo, Waterloo, Ontario, Canada, N2L 3G1; J. W. H. Liu, Systems Dimensions Ltd., 111 Avenue Road, Toronto, Ontario, Canada. © 1980 ACM 0098-3500/80/0900-0337 $0.75 ACM Transactions on Mathematical Software, Vol 6, No. 3, September 1980, Pages 337-358. 338 Alan George and Joseph W. H. Liu (a) finding an appropriate ordering P, (b) storage allocation for the factor L of PAP T, (c) numerical factorization of PAP T into LL T, and (d) triangular solution of (LL T) (Px) ffi Pb. In this paper we study the efficient implementation of the ordering in step (a). A heuristic algorithm which experience suggests is very effective in finding such low-fill permutations is the so-called minimum degree algorithm [8]. This requires some form of simulation of the factorization (either explicit or implicit), since ordering decisions are made on the basis of the structure of the partially factored (reduced) matrices which normally appear in the usual implementation of Gaussian elimination. Specifically, at each step in the elimination the algorithm permutes the part of the matrix remaining to be factored, so that a column with the fewest nonzeros is in the pivot position. Some previous implementations known to the authors store an explicit repre- sentation of the partially factored matrix. The best implementation using this approach that we are aware of is contained in the initial version of the Yale Sparse Matrix Package [2] (it has since been replaced by an improved implemen- tation). In the sequel we shall refer to this code as MDE (minimum degree explicit). Implementations of this type require quite flexible data structures, since the matrix structure changes as the (symbolic) factorization proceeds, and the data structures must accommodate these changes. A more serious practical difficulty with using an explicit representation is that the maximum storage requirement is unpredictable and may exceed the storage required for the resulting factor L. In a recent paper the authors described a "minimal storage" implementation of the minimum degree algorithm [5]. The implementation requires only a few arrays of length N in addition to that needed for the graph of A. Furthermore, the graph of A is preserved during execution of the algorithm. In the sequel we denote this implementation by MDI (minimum degree implicit). Although MDI has obvious advantages in terms of storage when compared with MDE, the comparisons of execution times have been mixed. As the experiments in Section 5 show, for some problems the execution time of MDI is comparable to or even better than that of MDE, but for others the penalty paid by MDI for low storage requirements can be a substantial increase in execution time over that of MDE. The implementation we describe in this paper is endowed with most of the advantages of both MDE and MDI. Its storage requirement is comparable to that of MDI, and for some problems its execution time is substantially lower than that of either MDE or MDI. Its single disadvantage, shared by MDE and absent in MDI, is that the original graph of A is destroyed. Thus, if the graph of A is to be preserved for future use, a copy of it must be made before the algorithm is executed. Our algorithm makes heavy use of the notion of reachable sets described in [5] and also relies upon an efficient implementation of the quotient graph model of symmetric factorization, introduced and analyzed in [6]. We briefly review these topics in Sections 2 and 3.1, respectively, but for more details the reader is referred to the references cited. Since the algorithm is based on quotient graphs, ACM Transactmns on Mathematmal Software, VoL 6, No 3, September 1980 Fast Implementation of the Minimum Degree Algorfll~m • 339 we will refer to our implementation of the minimum degree algorithm as MDQ (minimum degree quotient). We pointed out earlier that ordering the equations is only one of the four main phases in the overall solution. It is well recognized that theoretically the major cost of the entire process, both in terms of storage and execution time, is in the numerical factorization and solution steps. However, our study here to improve the implementation of the minimum degree algorithm is still justified for the following reasons. 1. Practical experiments show that the minimum degree ordering for linear systems with 1000 to 2000 variables typically takes 25 to 50 percent of the overall execution time. 2. Since the four phases of the process are independent and thus can be executed separately, a reduction in the storage requirement in the ordering step is important. For the same reason, a predictable storage requirement for ordering is also a practical advantage. 3. We are describing a more efficient implementation of an existing, widely used ordering algorithm. 2. PRELIMINARIES ON MODELS OF SYMMETRIC FACTORIZATION As pointed out in the introduction, the minimum degree algorithm requires some form of simulation of the factorization process. In this section we discuss the various ways in which this process can be simulated. The approach is graph- theoretical, and the reader is assumed to be familiar with standard graph-theory notions and terminology, reference to which can be found in [1]. In subsequent sections various graph-theory notions are applied to different graphs. When the graph under consideration is not clear from the context, we add the appropriate subscript to the definition. Thus the notation AdjG(y) is used to denote the adjacent set of the node y in the graph G. 2.1 Elimination Graph Model In this section we describe the graph-theory approach to symmetric elimination as given by Patter [7] and Rose [8]. Let A be an N × N symmetric matrix. The labeled undirected graph of A, denoted by G A = {X A, EA), is one for which the node set X a is defined as X A ffi {xl, x2, . .., XN} and (x,, xj} ~ E A if and only ifA,j ~ 0. Note that there is an implicit labeling on the node set X A. For any N × N permutation matrix P, the unlabeled graphs of A and PAP T are the same, but the associated labelings differ. Thus the graph structure of A is a convenient vehicle for studying the permutations for A, since no particular ordering is implied by the graph. Now consider the symmetric factorization of A into LL T. The sparsity changes (fill-in) can be modeled by a sequence of graph transformations on G A. Let G = (X, E ) be a graph, and y ~ X. The elimination graph of G by y, denoted by Gy, is the graph obtained from G by deleting y and its incident edges and then adding ACM Transactions on Mathematmal Software, VoL 6, No. 3, September 1980 340 Alan George and Joseph W. H. Liu edges to the remaining graph, so that the set Adj(y) is a clique. This recipe is due to Patter [7]. With this definition the process of symmetric elimination on A can be modeled as a sequence of elimination graphs Go, G1, G~, .... GN-1, where Go = (Xo, Eo) = G A, G, = (G,-Dx, = (X,, E,), i ffi 1, 2, ..., N - 1. Here X, ffi {x,+l, x,+2, ..., XN}, and it is straightforward to verify that G,- G H' where H, is the part of the matrix remaining to be factored after step i of the factorization has been completed. Thus, this model is quite explicit; the structure of G~ corresponds directly to the matrix H,. 2.2. Reachable-Set Characterization A useful notion in the study of symmetric factorization in sparse linear systems is the reachable set [5].