A Fast Implementation of the Minimum Degree Using Quotient Graphs

ALAN GEORGE University of Waterloo and JOSEPH W. H. LIU Systems Dimensions Ltd.

This paper describes a very fast implementation of the m~n~mum degree algorithm, which is an effective heuristm scheme for fmding low-fill ordermgs for sparse positive definite matrices. This implementation has two important features: first, in terms of speed, it is competitive with other unplementations known to the authors, and, second, its storage requirements are independent of the amount of fill suffered by the matrix during its symbolic factorization. Some numerical experiments which compare the performance of this new scheme to some existing minimum degree programs are provided. Key Words and Phrases: sparse linear equations, quotient graphs, ordering algoritl~ms, graph algo- rithms, mathematical software CR Categories: 4.0, 5.14

1. INTRODUCTION Consider the symmetric positive definite system of linear equations

Ax = b, (1.1) where A is N by N and sparse. It is well known that ifA is factored by Cholesky's method, it normally suffers some fill-in. Thus, if we intend to solve eq. (1.1) by this method, it is usual first to find a permutation matrix P and solve the reordered system (pApT)(px) = Pb, (1.2) where P is chosen such that PAP T suffers low fill-in when it is factored into LL T. There are four distinct and independent phases that can be identified in the entire computational process:

Permmsion to copy without fee all or part of this material m granted provided that the copies are not made or distributed for direct commercial advantage, the ACM copyright notice and the titleof the publication and its date appear, and notice is given that copying ts by permission of the Association for Computing Machinery. To copy otherwise, or to repubhsh, requires a fee and/or specific permission. Thin research was supported in part by the Canadian Natural Sciences and Engineering Research Council under Grant A-8111. Authors' addresses: A. George, Department of Computer Science, University of Waterloo, Waterloo, Ontario, Canada, N2L 3G1; J. W. H. Liu, Systems Dimensions Ltd., 111 Avenue Road, Toronto, Ontario, Canada. © 1980 ACM 0098-3500/80/0900-0337 $0.75 ACM Transactions on Mathematical Software, Vol 6, No. 3, September 1980, Pages 337-358. 338 Alan George and Joseph W. H. Liu

(a) finding an appropriate ordering P, (b) storage allocation for the factor L of PAP T, (c) numerical factorization of PAP T into LL T, and (d) triangular solution of (LL T) (Px) ffi Pb. In this paper we study the efficient implementation of the ordering in step (a). A heuristic algorithm which experience suggests is very effective in finding such low-fill permutations is the so-called minimum degree algorithm [8]. This requires some form of simulation of the factorization (either explicit or implicit), since ordering decisions are made on the basis of the structure of the partially factored (reduced) matrices which normally appear in the usual implementation of . Specifically, at each step in the elimination the algorithm permutes the part of the matrix remaining to be factored, so that a column with the fewest nonzeros is in the pivot position. Some previous implementations known to the authors store an explicit repre- sentation of the partially factored matrix. The best implementation using this approach that we are aware of is contained in the initial version of the Yale Package [2] (it has since been replaced by an improved implemen- tation). In the sequel we shall refer to this code as MDE (minimum degree explicit). Implementations of this type require quite flexible data structures, since the matrix structure changes as the (symbolic) factorization proceeds, and the data structures must accommodate these changes. A more serious practical difficulty with using an explicit representation is that the maximum storage requirement is unpredictable and may exceed the storage required for the resulting factor L. In a recent paper the authors described a "minimal storage" implementation of the minimum degree algorithm [5]. The implementation requires only a few arrays of length N in addition to that needed for the graph of A. Furthermore, the graph of A is preserved during execution of the algorithm. In the sequel we denote this implementation by MDI (minimum degree implicit). Although MDI has obvious advantages in terms of storage when compared with MDE, the comparisons of execution times have been mixed. As the experiments in Section 5 show, for some problems the execution time of MDI is comparable to or even better than that of MDE, but for others the penalty paid by MDI for low storage requirements can be a substantial increase in execution time over that of MDE. The implementation we describe in this paper is endowed with most of the advantages of both MDE and MDI. Its storage requirement is comparable to that of MDI, and for some problems its execution time is substantially lower than that of either MDE or MDI. Its single disadvantage, shared by MDE and absent in MDI, is that the original graph of A is destroyed. Thus, if the graph of A is to be preserved for future use, a copy of it must be made before the algorithm is executed. Our algorithm makes heavy use of the notion of reachable sets described in [5] and also relies upon an efficient implementation of the quotient graph model of symmetric factorization, introduced and analyzed in [6]. We briefly review these topics in Sections 2 and 3.1, respectively, but for more details the reader is referred to the references cited. Since the algorithm is based on quotient graphs, ACM Transactmns on Mathematmal Software, VoL 6, No 3, September 1980 Fast Implementation of the Minimum Degree Algorfll~m • 339 we will refer to our implementation of the minimum degree algorithm as MDQ (minimum degree quotient). We pointed out earlier that ordering the equations is only one of the four main phases in the overall solution. It is well recognized that theoretically the major cost of the entire process, both in terms of storage and execution time, is in the numerical factorization and solution steps. However, our study here to improve the implementation of the minimum degree algorithm is still justified for the following reasons. 1. Practical experiments show that the minimum degree ordering for linear systems with 1000 to 2000 variables typically takes 25 to 50 percent of the overall execution time. 2. Since the four phases of the process are independent and thus can be executed separately, a reduction in the storage requirement in the ordering step is important. For the same reason, a predictable storage requirement for ordering is also a practical advantage. 3. We are describing a more efficient implementation of an existing, widely used ordering algorithm.

2. PRELIMINARIES ON MODELS OF SYMMETRIC FACTORIZATION As pointed out in the introduction, the minimum degree algorithm requires some form of simulation of the factorization process. In this section we discuss the various ways in which this process can be simulated. The approach is graph- theoretical, and the reader is assumed to be familiar with standard graph-theory notions and terminology, reference to which can be found in [1]. In subsequent sections various graph-theory notions are applied to different graphs. When the graph under consideration is not clear from the context, we add the appropriate subscript to the definition. Thus the notation AdjG(y) is used to denote the adjacent set of the node y in the graph G.

2.1 Elimination Graph Model In this section we describe the graph-theory approach to symmetric elimination as given by Patter [7] and Rose [8]. Let A be an N × N symmetric matrix. The labeled undirected graph of A, denoted by G A = {X A, EA), is one for which the node set X a is defined as

X A ffi {xl, x2, . .., XN} and (x,, xj} ~ E A if and only ifA,j ~ 0. Note that there is an implicit labeling on the node set X A. For any N × N permutation matrix P, the unlabeled graphs of A and PAP T are the same, but the associated labelings differ. Thus the graph structure of A is a convenient vehicle for studying the permutations for A, since no particular ordering is implied by the graph. Now consider the symmetric factorization of A into LL T. The sparsity changes (fill-in) can be modeled by a sequence of graph transformations on G A. Let G = (X, E ) be a graph, and y ~ X. The elimination graph of G by y, denoted by Gy, is the graph obtained from G by deleting y and its incident edges and then adding ACM Transactions on Mathematmal Software, VoL 6, No. 3, September 1980 340 Alan George and Joseph W. H. Liu edges to the remaining graph, so that the set Adj(y) is a clique. This recipe is due to Patter [7]. With this definition the process of symmetric elimination on A can be modeled as a sequence of elimination graphs Go, G1, G~, .... GN-1, where Go = (Xo, Eo) = G A, G, = (G,-Dx, = (X,, E,), i ffi 1, 2, ..., N - 1.

Here X, ffi {x,+l, x,+2, ..., XN}, and it is straightforward to verify that G,- G H' where H, is the part of the matrix remaining to be factored after step i of the factorization has been completed. Thus, this model is quite explicit; the structure of G~ corresponds directly to the matrix H,.

2.2. Reachable-Set Characterization A useful notion in the study of symmetric factorization in sparse linear systems is the reachable set [5]. It can be used to model the factorization process in terms of the original graph structure. Let G = (X, E) be a given undirected graph. Consider a subset S C X and a node y ~ S. A node x is said to be reachable from the node y through S if there exists a path (y, sl .... , st, x) such that s, E S for 1 _< i ~ t. Since t can be zero, any node in Adj(y) - S is reachable from y through S. The reachable set ofy through S is then defined as Reach(y, S) ffi {x ~X- Slx is reachable from y through S}. The reachable set notion can be used to characterize the adjacency structure of elimination graphs. Let G A -~ (X A, E A) be the graph of A, with X A ffi {Xl, x2, . .., XN}. Let Go, G1 ..... GN-I be the sequence of elimination graphs as defined by the nodes xl, x2 ..... XN. Define S, = {Xl .... , x, } for i = 1..... N with So = O. The following result relates the structures of the elimination graph G, and the original graph G A through reachable sets. It is essentially a restatement of Lemma 4 in [9], which was discovered independently by the authors in [5]. THEOREM 2.1 [5, 9]. Let y be a node in the elimination graph G, = (X,, E,). The set of nodes adjacent to y in the graph G, is given by ReachGA ( y, S, ). The result means that the elimination process can be modeled by the subsets S, and the Reach operator on the original graph G A, since they represent implicitly the sequence of elimination graphs.

2.3 Quotient Graph Model In [6] the authors have introduced the quotient graph model for the study of the Gaussian elimination process. Its primary advantage is that it lends itself to very efficient computer implementation in simulating the factorization. We now intro- duce notations and d~finitions for the model. ACM Transactions on Mathematical Software, Vol. 6, No 3, September 1980. Fast Implementation of the Minimum Degree'AlgOri~ • .34t )

( ) Groph G Ouotient 9raph G/,~

Fig. 1. A quotient graph.

Let G = (X, E) be a graph with X the set of nodes and E the set of edges. For a subset S C X, G(S) will be used to refer to the subgraph (S, E(S)) of G, where E(S) = ( (u, v} e EI u, v e S}. The central notion in the new model is that of a quotient graph [3]. Let ~ be a partitioning of the node set X:

~= { Yl, Y2 ..... Yp}. That is, U~=~Yk = X and Y, gl Yj ffi ~ for i ~ j. The quotient graph of G with respect to ~is defined to be the graph (~, 8), where { Y~, Yj} E gif and only if Y, A Adj(Yj) # ~. This graph will be denoted by G/~. Consider the graph in Figure 1. If~= {{a, b, c}, {d, e}, {f}, (g,h, i}} is the partitioning, the quotient graph G/~is given as shown. A double circle is used to denote members with more than one node. An important type of partitioning is that defined by connected components. Let S be a subset of the node set X. The component partitioning W(S) of S is defined as ~¢(S) = {Y C SI G(Y) is a connected component in the subgraph G(S)}. A closely related type of component partitioning turns out to be relevant in the modeling of Gaussian elimination. The partitioning on X induced by S is defined to be -- u {{x} Ix s}. That is, the partitioning ~(S) consists of the component partitioning of S and the remaining nodes of the graph G. Consider the graph in Figure 1. Let S be the subset {a, b, d, f, h}. The component partitioning cd(S) is given by cd(S) - {{a, b}, {d, f}, {h}}, so that -~(S) has seven members, and the quotient graph G/-~(S) is given in Figure 2. Here crosshatched nodes are used to indicate partition members in ~d(S); recall that double circles denote members with more than one node. We now study the relevance of quotient graphs in symmetric factorization. Consider the factorization of a matrix A into LL w. As described in Section 2.1, the process may be interpreted as a sequence of elimination graphs Go, G1 ..... GN-1, ACM Transactmns on Mathematmal Software, Vol. 6, No. 3, September 1980. 342 Alan George and Joseph W. H. Liu

)

)---@

)

Fig. 2. The induced quotient graph G/~ {S). where G, precisely reflects the structure of the matrix remaining to be factored after the ith step of the Gaussian elimination. Alternatively, the process can also be represented as a sequence of quotient graphs, which may be regarded as implicit representations of the elimination graphs {G,}. Let {xl, x2, ..., XN} be the sequence of node elimination in the graph G -- G A. As in Section2.2, let S, ffi {xl .... , x,} for 1 _ i - N and So ffi O. Consider the partitioning ~(S,) induced by S, and the corresponding quotient graph G/~(S,) -- (~(S,), ~). We shall denote this quotient graph by if,. In this way we obtain a sequence of quotient graphs ~1, ~2, ..., ~N. The following result shows that, indeed, the quotient graph if, --- G/Cd (S,) can be regarded as an implicit representation of the elimination graph G,. TH~OR~.M 2.2. For y ~ S, Reach~,({y), ~(S,)) ffi {{x} Ix E ReachG(y, S,)). PROOF. Consider x E ReachG(y, S,). If x and y are adjacent in G, so are {x) and {y} in if,. Otherwise, there exists a path y, s~ .... , st, x in G where {sl,..., st) C S,. Let C be the component in G(S,) containing {s~,..., st}. Then we have a path { y}, C, {x} in if, so that {x} E Reach~,({y}, ~(S,)).

Conversely, consider any (x} E Reach ~.~,((y}, cd(S,)). There exists a path {y}, C~,..., C,, {x} in if, where each Cj E ~(S,). If t = 0, then x and y are adjacent in the original graph G. If t > 0, by the definition of ~¢(S,), t cannot be greater than 1; that is, the path must be (y}, c, (x}. Since G(C) is a connected subgraph, we can obtain a path from y to x through C C S, in the graph G. Hence x ~ ReachG(y, S,). [] ACM Transactions on Mathematical Software, Vol 6, No 3, September 1980 Fast Implementation of the Minimum Degi'ee Algofithrn - 343 ® ® "~ ® ® ~ ® ~

~2 ® Q ®~ ® < ® ®

O-o ~3 < ® .) ®

~5 ~6 } @

~8

Fig. 3. A sequence of quotient graphs.

Figure 3 contains an example of a quotient graph sequence. Partition members in ~f(S,) are crosshatched for clarity. As a theoretical tool for studying and understanding the symmetric factoriza- tion, the use of quotient graphs does not appear to be better than the elimination graphs or the reachable-set notion. However, the new model can be implemented very efficiently in terms of both time and space. In what follows we show that it can be implemented in place. ACM Transactionson MathematicalSoftware, Vol. 6, No. 3, September 1980. 344 • Alan George and Joseph W. H. Liu

LEMMA 2.3. Let G ffi (X, E) be a given graph, and let S C X, where G(S) is a connected subgraph. Then lAdy(x) I >- lAdy(S) [ + 2( J S] - 1). xES

PROOF. Let G(S) ffi iS, E(S)) and let V ffi {{u, v} E EI u ~ S, v ~ X - S}. Counting the edges incident to nodes in S, we have

Y. I Adj(x)l = 2IE(S)[ + I vl. x~S Obviously J VJ -> J Adj(S)l, and since G(S) is connected, we have ] E(S) I >- [ S [ - 1. By using these inequalities in the equation above we prove the lemma. [] We now show that the edge set sizes of the quotient graphs N, cannot increase with increasing i. Let x~, x2 ..... XN be the node sequence, let G,f(X,E,), O<_t<_N, be the corresponding elimination graph sequence, and let N, -- (

I -~(s,÷l)l -- I ~(s,+,)l + N - i - 1 - I v(S,)] + N - i = I ~(S,)l. For the inequality on the edge size, consider (d(S,+~). If {x,+~} E ~d(S,+~), clearly [ 8,+1 [ = [ ~, 1. Otherwise, the node x,+~ is merged with some components in ~d(S,) to form a new component in

PROOF. The first inequality follows from Theorem 2.4, arid the fac~ that ] Eol -- I E I implies the second one. [] The advantage of the quotient graph model over the reachable-set approach is, in fact, implicit in the proof of Theorem 2.2. To determine the set ReachG(y, SJ, paths emanating from the node y through the subset S~ have to be traced. These paths can be of arbitrary lengths (less than N), and many may be traversed unnecessarily. For example, in the graph of Figure 3, to find ReaehG(xT, $6), the three paths, (x7, xl, xs), (XT, x,, x~, x~), (X?, X4, X3, XS), of which the last one is unnecessary, have to be examined. However, the determination of Reach~,({y}, ¢E(S,)) in the quotient graph involves paths of length at most two (see proof of Theorem 2.2). Indeed, the time required is usually proportional to the size IReach~, ((y}, ~d(S,))I.

3. DESCRIPTION OF THE MINIMUM DEGREE ALGORITHM 3.1 Using Elimination Graphs The minimum degree algorithm is probably the most widely used heuristic algorithm for finding low-fill orderings. In terms of elimination graphs,'it can be described as follows. Let Go = (X, E) be an unlabeled graph. Step I (Initialization). Set i (--- 1. Step 2 (Minimum-degree selection). In the elimination graph G~-I choose x, to be a node such that [ Adja,_~(x,) I = min~ex,_~ [ Adjc,_~(x)[ where G,-x = (X~-l, E,-]). Step 3 (Graph transformation). Form the new elimination graph G, (G,-a)x . Step 4 (Loop or stop), i ~-- i + 1. If i > I X l, stop. Otherwise go to step 2. In effect, the algorithm orders a given graph by selecting nodes of minimum degree from elimination graphs (hence its name). The formation of the elimination graph in step 3 is necessary to facilitate the selection of the next node to be ordered. Different versions of the algorithm can be formulated for different ways of computing degrees of nodes in the elimination graph. An extremely good implementation of this formulation can be found in the Yale Sparse Matrix Package [2]. The sequence of elimination graphs is repre- sented explicitly in a linked list structure. Interested readers are referred to the report for details.

3.2 Using Reachable Sets Theorem 2.2 relates the adjacency structure of the elimination graph to that of the original graph. With this simple connection, the minimum degree algorithm can be restated in terms of reachable sets. : - ACM Transactions on MathemaUcal Software, VOL 6, No. 3, September 1980. 346 • Alan George and Joseph W. H. Llu

Step I (Initialization). S (-- 0. Step 2 (Selection). Pick a node y ~ X - S such that I Reach(y, S) I = minxex-slReach(x, S)l. Number the node y next. Step 3 (Implicit transformation). Set S (-- S O (y). Step 4 (Loop or stop). If S -- X, stop. Otherwise go to step 2. It should be noted that in the actual implementation the numbers I Reach(x, S) I for x E X - S should be stored to avoid recomputation. This means, in the implicit transformation step, some of these numbers have to be updated to reflect the change in the subset S. The following result is quoted from [5]. THEOREM 3.1. For u ~ X - S, Reach(u, S U (y}) = Reach(u, S) if u ~ Reach(y, S). This result justifies keeping the numbers I Reach(x, S)I, since many of them remain unchanged after the node y is numbered. Indeed, the numbering of node y only affects the degrees of the nodes in Reach(y, S). We have implemented this approach with some refinements, to speed up the execution of the algorithm [5]. The main advantage of this implementation is its small and predictable storage requirements. Moreover, it preserves the adjacency structure of the given graph.

3.3 Using Quotient Graphs The selection step in the minimum degree algorithm depends on the degrees of the nodes in the elimination graph. The reformulation of the algorithm m Section 3.2 makes use of the connection between the degree of a node in the elimination graph and the reachable set. In view of Theorem 2.2, we can describe the algorithm in terms of quotient graphs. Step 1 (Initialization). Set S ~-- Q, (~ -- G/C# (~). Step 2 (Selection). Pick a node y ~ X - S such that I Reach~¢((y), cd(S)) I = minxex-s I Reach,,,,((x}, cd(S)) I. Number the node y next. Step 3 (Quotient graph transformation). S (--- S U {y}, and put ff (-- G/~(S). Step 4 {Loop or stop). If S -- X, stop. Otherwise go to step 2. Basically, there is not much difference in the formulation of the algorithm in terms of reachable sets and quotient graphs. They both employ the notion of reachable sets, one on the original graph and the other on quotient graphs. Their major difference lies in their implementation. The important features of the quotient graph implementation are described in Section 5.

3.4. An Enhancement In [5] we have discussed various enhancements in the implementation of the algorithm by the reachable-set approach. For this section we describe one of ACM Transactions on Mathematical Software, Vol 6, No 3, September 1980 Fast Implementation of the Mintmum Degree Algodti'att ~'" 347

-:)

Fig. 4. A graph example showing indistinguishable nodes.

those enhancements in different terminologies; it is also applicable in the quotient graph approach. We begin by introducing an equivalence relation. Consider a stage in the elimination process, where S is the set of eliminated nodes. Two nodes x, y ~ X - S are said to be indtstinguishable with respect to elimination 1 if Reach(x, S) U (x} = Reach(y, S) U (y). Consider the graph in Figure 4. The subset S contains 36 shaded nodes. (This is an actual stage that occurs when the minimum degree algorithm is applied to this graph.) We note that the nodes a, b, and c are indistinguishable with respect to elimination, since Reach(a, S) U (a}, Reach(b, S) U (b}, and Reach(c, S) U (c) are all equal to (a,b,c,d,e,f,g,h,j,k). Two more groups can be identified as indistinguishable. They axe (j,k} and (f,g). We now study the implication of this equivalence relation and its role in the minimum degree algorithm. As we shall see later, this notion can be used to speed up the execution of the minimum degree algorithm. Tnv, oRv, M 3.2. Let x, y ~ X - S. If Reach(x, S) U (x} = Reach(y, S) U (y}, then, for all X - (x, y} 3 T D S, Reach(x, T) U {x) = Reach(y, T) U (y). PROOF. Obviously, x ~ Reach(y, S) C Reach(y, T) U T, so that x Reach (y, T) because x ~ T. We now want to show that Reach(x, T) C Reach(y,

Henceforth ~t should be understood that nodes referred to as "mdistingmshable" are indistinguish- able wtth respect to ehmmatton. ACM Transactmnson MathematwalSoftware, Vol. 6, No. 3, September1980. 348 Alan George and Joseph W. H. Llu

T) U (y}. Consider z ~ Reach(x, T). There exists a path (x, sl, . . ., st, z) where (sl, ..., st} C T. If all s, E S, there is nothing to prove. Otherwise, let s, be the Krst node in (sl ..... st} not in S; that is, s, E Reach(x, S) N T. This implies s, ~ Reach(y, S) and hence z ~ Reach(y, T). Together, we have Reach(x, T) U (x} C Reach(y, T) U {y}. The inclusion in the other direction can be proved in the same way by exchanging the roles of x and y. []

COROLLARY 3.3. Let x, y be indistinguishable with respect to the subset S. Then for T D S, I Reach(x, T)I--]Reach(y, T)I. In other words, if two nodes become indistinguishable at some stage of the elimination, they remain indistinguishable until one of them is ehminated. More- over, the following theorem shows that they can be eliminated together in the minimum degree algorithm. THEOREM 3.4. If tWO nodes become indistinguishable at some stage in the minimum degree algorithm, then they can be eliminated together in the algo- rithm. PROOF. Let x, y be indistinguishable after the elimination of the subset S. Assume that x becomes a node of minimum degree after the set T 2 S has been eliminated; that is, I Reach(x, T)I -< I Reach(z, T)I for all z ~ X - T. Then, by Corollary 3.3, ]Reach(y, T U {x})l -- I Reach(y, T) - (x}l ffi I Reach(y, T)I - 1 --I Reach(x, T)I - 1. Therefore, for all z ~ X - iT U {x}), I Reach(y, T U {x}) I <_]Reach(z, T)I - 1 _< ]Reach(z, T U {x}) I. In other words, after the elimination of the node x, the node y becomes a node of minimum degree. [] These observations can be exploited in the implementation of the minimum degree algorithm. After carrying out a minimum degree search to determine the next node y ~ X - S to eliminate, we can number immediately after y the set of nodes indistinguishable from y. In addition, in the degree update step, by virtue of Corollary 3.3, work can be ACM Transactions on Mathematmal Software, Vol 6, No 3, September 1980 Fast Implementation of the Minimum Degree Algo(ttt~. • 34,.~

Fig. 5. Indistinguishable nodes in two stages of elimination for the example in Figure 4 reduced, since indistinguishable nodes have the same degree in the elimination graphs. Once nodes are identified as being indistinguishable, they can be "glued" together and treated as a single supernode thereafter. For example, Figure 5 shows two stages in the eliminations, where supernodes are formed from indistinguishable nodes. For simplicity, the eliminated nodes are not shown. After the elimination of the indistinguishable set (a, b, c}, all the nodes have identical reachable sets so that they can be merged into one.

4. THE ALGORITHM AND SOME DETAILS ABOUT ITS IMPLEMENTATION 4.1 The Algorithm In view of the discussion in Section 3, the notion of indis.tinguishable nodes can be incorporated into the implementation of the minimum degree algorithm. Nodes identified as indistinguishable are merged together to form a supernode. They will be treated essentially as one node in the remainder of the algorithm. They share the same adjacent set, have the same degree, and can be eliminated together in the algorithm. In the implementation this supernode will be referenced by a representative of the set. Moreover, the determination of reachable sets for degree update can be done via the quotient graph model to improve the overall efficiency. In effect, elimi- nated connected nodes are merged together, and the computer representation of the sequence of quotient graphs is utilized. It should be emphasized that the idea of quotient (or merging nodes into supernodes) is applied here in two different contexts: (a) eliminated connected nodes, to facilitate the determination of reachable sets, and (b) uneliminated indistinguishable nodes, to speed up elimination. This is illustrated in Figure 6. It shows how the graph of Figure 4 is stored conceptually in this implementation by the two forms of quotient. The crosshatched double-circled nodes denote supernodes that have been eliminated, while blank double-circled supernodes represent those formed from indistinguishable nodes. The overall enhanced algorithm may be stated as follows: Step I (Initialization). Set S *- ~, f~ -- G/~(~). Step 2 (Selection). Pick a node y ~ X - S such that I Reach~{{y), ~(S)) I = ~emi~nslReach~({x}, ~(S))I.

ACM Transactions on Mathematical Software, Vol. 6, No. 3, September 1980 350 Alan George and Joseph W. H. Llu

(

Fig. 6. A quotient graph formed from two types of supernodes.

Step 3 (Elimination). Number the nodes in Y, which contains the node y and all those that have been identified as indistinguishable from y. Step 4 (Degree update). Let U ffi {u[ {u} E Reach~({y}, ~(S))} - Y. Compute the new degrees of the nodes in U and identify indistinguishable nodes in this set. Step 5 (Quotient graph transformation). S ~-- S U Y and set ~ ~-- G/~ (S). Step 6 (Loop or stop). If S ffi X, stop. Otherwise go to step 2. In what follows we describe some essential details about the computer imple- mentation. 4.2. Computer Representation of Quotient Graphs A graph is completely determined by its adjacency structure, that is, the set of adjacency lists for every node in the graph. We represent such a structure by storing the adjacency lists sequentially in a one-dimensional array along with an index array of length N containing pointers to the beginning of each adjacency list in the storage array. An example is shown in Figure 7, where the lists are separated for illustrative purposes. Given a graph G ffi (X, E) and a subset S, how can we represent the quotient graph

ffi G/~(S), using the space provided by the adjacency structure of the original graph? Consider a connected component Y in the subgraph G(S). By Lemma 2.3 we have IAdj(y) [ - IAdj(Y) I + 2(I YI - 1), yEY ACM Transactions on Mathematical Software, Vol 6, No 3, September 1980 Fast Implementation of the Minimum Degree Algorithm • 351

2' I-6"! 3 : FT'-l-~l 4:13 16 17"l 5 : I-I-I

6 : r~-l-~ 7 : I-T-T~ 8:12 13 16 191

9 : I-6-1 Fig. 7. A graph and its representation.

i . FTI~

5'7~ s : I-~"[~-I ;' : Ci"]'3-1 8:121316191 9 : I-F1 Fig. 8 A quotient graph representation. or [Adj(Y)[-- Y, [Adj(y)[-2([Y[- 1). yeY Therefore, there are always enough locations for Adj(Y) from those for Adj(y), y E Y. Moreover, for Y ~ O there is always a surplus of 2([ Y [ - 1) locations which can be utilized for links or pointers. In the graph example in Figure 8, let the subset S -- (1, 2, 3, 4, 5}. Figure 8 shows the way the quotient graph ~ = G/~d(S) is stored. The components in G(S) are given by ~(S) -- ((1, 5}, (2), {3, 4}). Consider finding the adjacent set of the component {3, 4} in the quotient graph representation. It can be retrieved by first exhausting the current list for node 3; the link to node 4 then gives the remaining neighbors of the component. Another point that is worth noting in the example is the use of representatives for component members. The components (1, 5) and {3, 4} are now represented by the nodes 1 and 3, respectively. In other words, the nodes 5 and 4 will be ACM Transactmns on Mathematmal Software, Vol. 6, No. 3, September 1980. 352 Alan George and Joseph W. H. Liu ignored in future processing, although the storage for their original adjacent lists may be used for the quotient graph representation.

4.3 Determination of Reachable Sets The determination of reachable sets in the quotient graph ~ = G/~(S) is straightforward. For a given node y ~ X - S, the following algorithm returns the set Reach~({ y}, cd(S)). Step 1 (Initialization). R (--- •. Step 2 (Find reachable nodes) For K ~ Adj~((y)) if K ~ ~(S) then R (-- R U Adj.(K) else R (-- R U K.

Step 3 (Exit). The reachable set is given in R. In step 2 reachable nodes are obtained via paths of length of at most two. This is possible because connected eliminated nodes have been coalesced in the quotient graph.

4.4 Identification of Indistinguishable Nodes In general, to identify indistinguishable nodes via the definition in Section 3.4 is time-consuming. Since the enhancement does not require the merging of all possible indistinguishable nodes, we look for some simple and easily implemented condition. In what follows a condition is presented which experience has shown to be very effective. In most cases it identifies all indistinguishable nodes. Let G =- (X, E) and S be the set of eliminated nodes. Let K1 and K2 be two connected components in the subgraph G(S); that is,

K1, K2 ~ ~(S). LEMMA 4.1. Let R1 =- Adj(KI) and R2 = Adj(K2). If y ~ RIO Rs and Adj(y) C RIO R2 U K~ U K2, then Reach(y, S) U (y} -- R1 U Rs.

PROOF. Let x ~ R1 U Rs. Assume x ~ R~ -- Adj(K1). Since K1 is a connected component in G(S), we can find a path from y to x through K1 C S. Therefore, x ~ Reach(y, S) U {y}. On the other hand, y ~ R~ U R2 by definition. Moreover, if x E Reach(y, S), there exists a path from y to x through S:

y, Sl, ss, . . . , st, X. If t ---- 0, then x ~ Adj (y) - S C R1 U Rs. Otherwise, if t > 0, s~ ~ Adj (y) n S c/£1 o K2. This means that (Sl ..... st} is a subset of either K~ or Ks, so that x ~ R~ U Rs. Hence Reach(y, S) U (y} C R1 U R2. [] TH~.OREM 4.2. Let K1, Ks and R~, Re be as in Lemma 4.1. Then the nodes in Y = ( y ~ RI n R21Adj(y) c R1 0 Rs U KI O Ks } are indistinguishable with respect to the eliminated subset S (see Figure 9). ACM Transactions on Mathematical Software, Vol 6, No 3, September 1980 Fast Implementation of the Minimum Degree Algorithm - 353

Fig. 9. Indistinguishable nodes m Y.

The proof follows directly from Lemma 4.1 and is omitted. COROLLARY 4.3. For y E Y, I Reach(y, S) [ = I R1 U R21 - 1. Theorem 4.2 can be used to merge indistinguishable nodes in the intersection of the two reachable sets R1 and R2. The test can be simply done by inspecting the adjacent set of nodes in the intersection R1 ¢3 R2. The results of Theorem 4.2 and Corollary 4.3 can be applied to the degree update step of the minimum degree algorithm. We may take the set U = {u[ {u} E Reach~((y}, ~(S))} - Y in the algorithm as the set R1 in Theorem 4.2. By considering each neighboring eliminated component as Ke, .we can identify the indistinguishable nodes in U.

4.5 Quotient Graph Transformation In the minimum degree algorithm the selection of node y puts it into the subset S, so it is necessary to do the transformation ~#,-1 -- G/~(S) --~ G/~(S U {y}) -- 9~,. The following algorithm performs this transformation. Step I (Preparation). Determine the sets T = Adj~,_i((y}) N ~d(S), R = Reach~,_,({y}, ~(S)). Step 2 (Form new supernode and partitioning). Form K={y}UT, ~d(S U (y}) = (~(S) - T) U (K}. Step 3 (Update adjacency) Adj~,(K) = R. For {x} ~ R, Adj~,({x}) -- {K} U Adj~,_l({y}) - (T U {y}). ACM Transactions on Mathematical Software, Vol. 6, No. 3, September 1980. 354 Alan George and Joseph W. H. Liu I GENQMD1 f' Fig. 10. Control relationship among the subroutines which Lrnplement the minh'num degree algorithm.

4.6 Some Implementation Details In this section we describe the basic features of the Fortran implementation of the algorithm described in Section 4.1. The implementation consists of a set of five subroutines, whose control relationship is shown in Figure 10. In step 1, f# = G, and the explicit representation of G is provided by means of a pair of arrays, as described in Section 4.2. The sequence of quotient graphs generated by the algorithm is represented in the same arrays, using the linking procedure also described in Section 4.2. Step 2 of the algorithm is facilitated by storing the current degrees of the nodes in the elimination graphs (implicitly represented) in an array DEG. Step 3 consists of updating the output permutation for those nodes y E Y which form the supernode being eliminated. Step 4 of the algorithm involves the most work and is implemented by QMDUPD. First the reachable set of the node or supernode that has been selected in step 3 is determined, using QMDRCH. This subroutine implements the algorithm of Section 4.3 and as a by-product also supplies the set of eliminated supernodes adjacent to the node being eliminated. Then the groups of indistin- guishable nodes in this reachable set are identified using the results of Section 4.4. The sets are identified and merged into supernodes using the subroutine QMDMRG, and their degrees are updated. (Nodes not identified as supernodes have their degrees computed by QMDRCH.) Finally, step 5 is accomplished using the algorithm of Section 4.5, which is implemented by the subroutine QMDQT. Note that Figure 6 shows how the adjacency structure of f# appears after several quotient graph updates. In the representation of the sequence of quotient graphs, connected eliminated nodes are merged to form a supernode. As mentioned in Section 4.2, for the purpose of reference, it is sufficient to pick a representative from the supernode. If C is such a connected component, we always choose the node x E C last eliminated to represent C. This implies that the remaining nodes in C can be ignored in subsequent quotient graphs. The same remark applies to indistinguish- able groups of uneliminated nodes. For each group only the representative will be considered in the current quotient structure. The working vector MARKER is used to mark those nodes that can be ignored in the adjacency structure. The MARKER values for such nodes are set to -1. ACM Transactions on Mathemat]cal Software, Vol 6, No 3, September 1980 Fast Implementation of the M,n,mum Degree Algorithm • 355

QLINK QSIZE MARKER

m I 2 - 3 o! -I! 4 -Ii O 5 0 6 -I' @ 7 8 0 ,2 -I 9 o

Fig. 11. Illustration of the role of QLINK, QSIZE, and MARKER working vectors. This vector is also used temporarily to facilitate the generation of reachable sets. Two more arrays, QSIZE and QLINK, are used for completely specifying indistinguishable supernodes. If node I is the representative, the number of nodes in this superuode is given by QSIZE(I), and the nodes are given by I, QLINK(I), QLINK(QLINK(I)) ..... Figure 11 illustrates the use of the vectors QSIZE, QLINK, and MARKER. The nodes {2, 5, 8} form an indistinguishable supernode represented by node 2. Thus, the MARKER values of 5 and 8 are -1. On the other hand, {3, 6, 9} forms an eliminated supernode. Its representative is node 9, so that MARKER(3) and MARKER(6) are -1.

5. NUMERICAL EXPERIMENTS AND CONCLUDING REMARKS In this section we report on some numerical experiments showing the performance of our implementation of the minimum degree algorithm (MDQ), based on the use of quotient graphs. As a basis for comparison we have used the two other implementations mentioned in the introduction, one from the (old) Yale Sparse Matrix Package [2], which we denote by MDE, and one from [5], which we denote by MDI. 2 The MDE subroutine is a conventional implementation, in the sense that it represents the elimination graphs Go, G1 .... , GN-1 explicitly in a linked list structure. The MDI implementation, on the other hand, uses only the original graph and does not change it during the execution of the algorithm. Information that is required about the G, is obtained through generation of the appropriate reachable sets, as described in detail in [5]. As mentioned earlier, the expressions MDQ, MDE, and MDI denote quotient graph, explicit, and implicit methods of representing the elimination graphs. Our first set of test problems was chosen to obtain evidence bearing on the asymptotic behavior of our program. We used the set of "graded-L" mesh problems from [4, p. 1062] that consists of a sequence of similar problems typical of those arising in finite-element applications. The results of these runs are

2 The implementation of the minimum degree algorithm in the recent version of the Yale Sparse Matrix Package uses a so-called element model, which embodies many of the ideas used in this paper, such as mass elmnnation, mass degree update, etc. Its execution tune is competitwe with or superior to, but its storage requirement approximately double, that of the MDQ. ACM Transactions on Mathematical Software, Vol. 6, No. 3, September 1980. 356 Alan George and Joseph W. H. Liu

x

x

x

ACM Transactions on Mathematical Software, Vol. 6, No. 3, September 1980 Fast Implementation of the Minimum Degree Algorithm 367

Table II. Comparison of the Performance of the MDE, MDI, and MDQ Implementatig~ of the Minimum Degree Apphed to 16 Test Problems Execution t~me Storage requirements

N MDE MDI MDQ MDE MDI MDQ 1176 3.98 2 95 6.63 5.91 2.91 4.30 663 0.28 0.13 0.25 0.56 0.71 0.55 822 1 00 6.08 2.54 1 20 1.06 1.04 822 4.00 19.32 7.48 3.41 1.22 1 37 1250 1 04 8.03 3.41 1 90 1.63 1.64 796 0 71 4.90 2.11 1.21 1,04 1.04 700 5 40 18.82 8 54 5.15 0.98 1.05 936 2.37 2.46 1.44 2 77 1.47 1.72 1009 2.92 2.97 1.55 3.25 1.59 1.88 1089 2.95 2.86 1.75 3.32 1.72 2.02 1440 3 12 3.86 2 35 3 85 2.25 2.62 1180 2 22 2.71 1 90 2.99 1.84 2.14 1377 2 38 3.12 2 06 3.52 2.14 2.49 1138 2.53 3.17 1 95 2 97 1.78 . 2.06 1141 2.59 3.27 2 05 2 99 1.77 2.06 1349 2.53 3.73 2.11 4.09 2.12 2.49 Average 2 50 5.52 3.01 3.07 1.64 1.90' (xl04) (Xl04) (Xl04) summarized in Table I. Execution times are in seconds on an IBM 3031 computer. The programs are all written in Fortran and were compiled using the optimizing version of the IBM H-level compiler. In terms of execution time the MDQ implementation enjoys a significant advantage over the other two implementations. Both MDI and MDQ appear to execute in times very close to proportional to I E[ for these problems, but the MDE execution time appears to be superlinear in I E [. Because of differences in constants of proportionality, however, MDE is still quite competitive with MDI until N is several thousand. Turning now to storage requirements, it is evident that the MDI and MDQ subroutines require storage proportional to [EI. On the other hand, MDE explicitly generates a sequence of graphs Go, G~, ..., G~-I, and the maximum storage required may greatly exceed [E[. The entries in the storage column for MDE were obtained by monitoring the maximum storage used by the working storage arrays. In general, for implementations which store the elimination graphs explicitly, the user must estimate the maximum storage requirement and provide at least that much to allow the program to execute. This is a major disadvantage not shared by MDI and MDQ, since they use a fLxed, predictable amount of storage. It is interesting to note that the storage required by the subroutine MDQ is actually less than that required by MDI. However, MDI does not change the input graph structure, since it generates reachable sets via the original graph Go. On the other hand, MDQ generates a sequence of graphs starting with the original and does not preserve its input graph. Thus, to be fair in the comparison, we have added space required to retain the input graph to the MDQ storage requirements. Alternatively, one could preserve the input by writing it on auxiliary storage and ACM Transactions on Mathematmal Software, Vol. 6, No. 3, September 1980 358 Alan George and Joseph W. H. Llu then reading it after the ordering had been found. The apparent storage advantage of MDI over MDQ would then be removed. Although MDI and MDQ will always enjoy a storage advantage over MDE, the magnitude of this advantage is problem-dependent. Moreover, it turns out that the execution-time advantage enjoyed by MDQ over MDE in the foregoing problems is not indicative of the results for other problems. Sometimes the lower storage requirement of MDQ and MDI over MDE is paid for through substan- tially increased execution time. To illustrate this, we applied the three subroutines to a collection of problems from various applications, including finite-element problems and normal equations of sparse least-square problems arising in survey- ing and linear programming. The results of these experiments are summarized in Table II. The results in Table II show that none of the methods is uniformly faster than the others, but the "average" row in the table suggests to us that the MDQ implementation is the method of choice. Specifically, if we make the reasonable assumption that the cost of running a program is proportional to the product of execution time and storage used, the average costs of MDE, MDI, and MDQ are 7.68, 9.05, and 5.71, respectively. (In many installations the storage requirements are weighted more heavily, making MDI and MDQ more favorable.) We feel that these costs, combined with the predictability of storage requirements, put MDQ in a very strong position.

REFERENCES 1. BERGE, C. The Theory of Graphs and Its Applicatmns. Wiley, New York, 1962. 2. EISENSTAT, S.C., GURSKY, M.C., SCHULTZ, M.H., AND SHERMAN, A.H. Yale sparse matrLx package. I: The symmetric codes. Res. Rep. 112. Computer Science Dep., Yale Univ., New Haven, Conn. 3. GEORGE, A., AND LIU, J.W.H. Algorithms for matrix partitioning and the numerical solution of finite element systems SIAM J. Numer. Anal. 15 (1978), 293-327. 4. GEORGE, A., AND LIU, J.W.H. An automatic nested dissection algorithm for Lrregular finite element problems. SIAM J. Numer. Anal. 15 (1978), 1053-1069. 5. GEORGE, A., AND LIU, J.W.H. A mimmal storage implementation of the mmh-num degree algorithm SIAM J. Numer Anal. 17 (1980), 283-299. 6. GEORGE, A., AND LIU, J.W.H. A quotient graph model for symmetric faetorization. Sparse Matrix Proceedings 1978, I.S. Duff and G.W. Stewart, Eds., SIAM Pubhcations, 1978, pp. 154- 175. 7. PARTER, S.V., The use of linear graphs in Gauss elimination. SIAM Rev. 3 (1961), 364-369. 8. RosE, D.J. A graph theoretic study of the numerical solution of sparse positive definite systems. In Graph Theory and Computing, R.C. Read, Ed. Academic Press, New York, 1972. 9. ROSE, D.J., TARJAN, R.E., AND LUEKER, G.S. Algorithmic aspects of vertex ehmmation on graphs. SIAM J. Comput. 5 (1975), 266-283.

Received November 1978; revised September 1979; accepted January 1980.

ACM Transactions on MathematmalSoftware, Vol 6, No 3, September 1980