Overview of Iterative Linear System Solver Packages
Victor Eijkhout

July, 1998

Abstract

Description and comparison of several packages for the iterative solution of linear systems of equations.

1 Introduction

There are several freely available packages for the iterative solution of linear systems of equations, typically derived from partial differential equation problems. In this report I will give a brief description of a number of packages, and give an inventory of their features and defining characteristics.

The most important features of the packages are which iterative methods and preconditioners they supply; the most relevant defining characteristics are the interface they present to the user's data structures, and their implementation language.

2 Discussion

Iterative methods are subject to several design decisions that affect ease of use of the software and the resulting performance. In this section I will give a global discussion of the issues involved, and of how certain points are addressed in the packages under review.

2.1 Preconditioners

A good preconditioner is necessary for the convergence of iterative methods as the problem to be solved becomes more difficult. Good preconditioners are hard to design, and this especially holds true in the case of parallel processing. Here is a short inventory of the various kinds of preconditioners found in the packages reviewed.

2.1.1 About incomplete factorisation preconditioners

Incomplete factorisations are among the most successful preconditioners developed for single-processor computers. Unfortunately, since they are implicit in nature, they cannot immediately be used on parallel architectures. Most packages therefore concentrate on more parallel methods such as Block Jacobi or Additive Schwarz. There are a few implementations of multicolour incomplete factorisations.

BlockSolve95 is the only package catching breakdown of the ILU or IC factorisation. The manual outlines code that successively shifts the diagonal until the factorisation succeeds.
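As an illustration of that shift-and-retry strategy, here is a minimal, self-contained sketch. It is not BlockSolve95's actual code or interface: to keep the example compilable, the factorisation attempted is a plain dense Cholesky on a tiny matrix, where a real package would call its sparse IC or ILU routine and its own breakdown test.

    /* Shift-and-retry factorisation: if the factorisation of A breaks
       down (a non-positive pivot), retry on A + shift*I with a growing
       shift.  Dense Cholesky stands in for the package's sparse IC/ILU. */
    #include <math.h>
    #include <stdio.h>

    #define N 3

    /* Try to factor A + shift*I; return 0 on success, 1 on breakdown. */
    static int try_cholesky(double A[N][N], double shift)
    {
        double L[N][N] = {{0}};
        for (int j = 0; j < N; j++) {
            double d = A[j][j] + shift;
            for (int k = 0; k < j; k++) d -= L[j][k] * L[j][k];
            if (d <= 0.0) return 1;      /* breakdown: pivot not positive */
            L[j][j] = sqrt(d);
            for (int i = j + 1; i < N; i++) {
                double s = A[i][j];
                for (int k = 0; k < j; k++) s -= L[i][k] * L[j][k];
                L[i][j] = s / L[j][j];
            }
        }
        return 0;
    }

    int main(void)
    {
        /* Symmetric but indefinite: unshifted Cholesky breaks down. */
        double A[N][N] = {{1, 2, 0}, {2, 1, 0}, {0, 0, 1}};
        double shift = 0.0;
        while (try_cholesky(A, shift))
            shift = (shift == 0.0) ? 0.125 : 2.0 * shift; /* grow the shift */
        printf("factorisation succeeded with diagonal shift %g\n", shift);
        return 0;
    }

On this test matrix the attempt fails until the shift has grown to 4, at which point A + 4I is positive definite and the factorisation completes.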
2.1.2 Preconditioners for sequential processing

On sequential architectures, the BPKIT package provides sophisticated block factorisations, and LASpack contains multigrid solvers. Most other packages supply the user with variants of ILU (Incomplete LU) or IC (Incomplete Cholesky).

2.1.3 Preconditioners for parallel iterative methods

These are the approaches towards parallel preconditioning taken in the packages under review here.

Direct approximations of the inverse. SPAI (section 3.13) is the only package that provides a direct approximation method to the inverse of the coefficient matrix. Since such an approximation is applied directly, using a matrix-vector product, it is trivially parallel. The SPAI preconditioner is in addition also generated fully in parallel.

Block Jacobi. Each processor solves its own subsystem, and there is no communication between the processors. Since this strategy neglects the global/implicit properties of the linear system, only a limited improvement in the number of iterations can result. On the other hand, this type of method is very parallel. All parallel preconditioner packages provide some form of Block Jacobi method.

Additive Schwarz. As in the Block Jacobi method, each processor solves a local subsystem. However, the local system is now augmented to include bordering variables, which belong to other processors. A certain amount of communication is now necessary, and the convergence improvement can be much higher. This method is available in Aztec (3.1), Petsc (3.10), ParPre (3.8), and PSparselib (3.12).

Multicolour factorisations. It is possible to perform a global incomplete factorisation if the domain is not only split over the processors, but is also augmented with a multicolour structure. Under the reasonable assumption that each processor has variables of every colour, both the factorisation and the solution of a system thus ordered are parallel. The number of synchronisation points is equal to the number of colours. This is the method supplied in BlockSolve95 (3.2); it is also available in ParPre (3.8).

Block factorisation. It is possible to implement block SSOR or ILU methods, with the subblocks corresponding to the local systems (with overlap, this gives the Multiplicative Schwarz method). Such factorisations are necessarily more sequential than Block Jacobi or Additive Schwarz methods, but they are also more accurate. With an appropriately chosen processor ordering (e.g., multicolour instead of sequential) the solution time is only a small multiple of that of the Block Jacobi and Additive Schwarz methods. Such block factorisations are available in ParPre (3.8); PSparselib (3.12) has the Multiplicative Schwarz method, under the name `multicolour SOR'.

Multilevel methods. Petsc (3.10) and ParPre (3.8) are the only packages supplying variants of algebraic multigrid methods in parallel.

Schur complement methods. ParPre (3.8) and PSparselib (3.12) contain Schur complement domain decomposition methods.

2.2 Data structure issues

Every iterative method contains a matrix-vector product and a preconditioner solve. Since these are inextricably tied to the data structures used for the matrix and the preconditioner (and possibly even to the data structure used for vectors), perhaps the most important design decision for an iterative methods package is the choice of the data structure.

2.2.1 The interface to user data

The more complicated the data structure, the more urgent the question becomes of how the user is to supply the structures to the package. This question is particularly important in a parallel context. The packages reviewed here have taken the following approaches to this problem.

Fully internal, not directly accessible, data structures (basically object-oriented programming). This approach is taken in Petsc (3.10), ParPre (3.8), and BlockSolve (3.2). The data structures here are only accessible through function calls. This means that the package has to supply routines for constructing the structures, for inspecting them, and for applying them.

Prescribed, user-supplied, data structures. This approach is taken in Aztec (3.1); it is available in PCG (3.9). Here the user has to construct the data structures, but the package supplies the product and solve routines.

User-supplied, arbitrary, data structures, with user-supplied product and solve routines. This approach is available in PCG (3.9) and Petsc (3.10); the object-oriented package IML++ (3.5) can also be considered to use this approach.

User-defined data structures, not passed to the iterative method at all; product and solve operations are requested through a reverse communication interface. This approach is taken in PIM (3.11); it is also available in PCG (3.9).
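To make this last approach concrete, here is a minimal sketch of a reverse-communication driver loop. The routine name, request codes, and state structure are invented for the illustration and do not reproduce PIM's or PCG's actual calling sequences; the method inside is a simple Richardson iteration, chosen only to keep the example self-contained.

    /* Reverse communication: the solver never sees the matrix.  Whenever
       it needs a matrix-vector product it returns REQ_MATVEC; the caller
       computes y = A*x with its own data structure and calls again. */
    #include <math.h>
    #include <stdio.h>

    #define N 4
    enum { REQ_DONE = 0, REQ_MATVEC = 1 };

    typedef struct { int matvecs; double omega; } rc_state;

    /* One step of the Richardson iteration x <- x + omega*(b - A*x),
       driven through the reverse-communication protocol. */
    static int rc_solve(rc_state *s, double x[N], const double b[N],
                        const double y[N])
    {
        if (s->matvecs > 0) {        /* y holds A*x from the last request */
            double rnorm = 0.0;
            for (int i = 0; i < N; i++) {
                double r = b[i] - y[i];
                x[i] += s->omega * r;
                rnorm += r * r;
            }
            if (sqrt(rnorm) < 1e-10 || s->matvecs > 1000) return REQ_DONE;
        }
        s->matvecs++;
        return REQ_MATVEC;           /* please compute y = A*x */
    }

    int main(void)
    {
        /* The caller owns the matrix in any format it likes; here it is
           a diagonal matrix stored as a plain vector. */
        double diag[N] = {2, 3, 4, 5}, b[N] = {2, 6, 12, 20};
        double x[N] = {0}, y[N] = {0};
        rc_state s = {0, 0.3};

        while (rc_solve(&s, x, b, y) == REQ_MATVEC)
            for (int i = 0; i < N; i++)  /* user-side matrix-vector product */
                y[i] = diag[i] * x[i];

        printf("solution: %g %g %g %g\n", x[0], x[1], x[2], x[3]);
        return 0;
    }

The essence of the pattern is the while loop: since the solver only ever requests products, the matrix can live in any user-defined format, and the package never has to prescribe one.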
2.2.2 Parallel data layout

There are several ways of partitioning a sparse matrix over multiple processors. The scheme supported by all packages is partitioning by block rows.

Each processor receives a contiguous series of rows of the matrix. This is the approach taken in Petsc (3.10); it is available in BlockSolve95 (3.2). Under this approach a processor can determine the ownership of any variable by keeping a short list of the first and last rows of each processor.

Each processor receives an arbitrary set of rows. This is the approach taken in Aztec (3.1); it is available in BlockSolve95 (3.2). Under this scheme, each processor needs to maintain the mapping of its local rows to the global numbering. It is now no longer trivial to determine ownership of a variable.

When a sparse matrix is distributed, the local matrix on each processor needs to have its column indices mapped from the global to the local numbering. Various packages offer support for this renumbering.

2.3 High performance computing

Sparse matrix problems are notoriously low performers. Most of this is related to the fact that there is little or no data reuse, thereby preventing the use of BLAS kernels. See for example the tests on vector architectures in [4].

Other performance problems stem from parallel aspects of the solution methods. Here, then, are a few of the approaches taken to alleviate performance problems.

2.3.1 Quotient graph computation

A matrix defines a graph by introducing an edge $(i,j)$ for every nonzero element $a_{ij}$. A dense matrix in this manner defines a graph in which each node is connected to every other node; this is also called a `clique'. Sparse matrices, on the other hand, induce a graph where, no matter the number of nodes, each node is connected to only a small number of other nodes.

However, sparse matrices from problems with multiple physical variables will have dense subblocks, for instance corresponding to the variables on any given node. It is possible to increase the efficiency of linear algebra operations by imposing on the matrix the block structure induced by these small dense subblocks. For instance, the scalar multiplication/division $a_{ik} a_{kk}^{-1} a_{ki}$ that appears in Gaussian elimination now becomes a matrix inversion and a matrix-matrix multiplication, in other words, BLAS3 kernels.

Identifying cliques in the matrix graph, and deriving a quotient graph by `factoring them out', is the approach taken in the BlockSolve package; see section 3.2.
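As a small illustration of this payoff, here is a sketch of the block pivot update mentioned above: with scalars replaced by small dense blocks, the elimination update $a_{ij} \leftarrow a_{ij} - a_{ik} a_{kk}^{-1} a_{kj}$ turns into a block solve followed by a matrix-matrix product. The block size and the data are made up for the example, and the plain triple loops stand in for tuned BLAS3 kernels (e.g., dgemm).

    /* Block variant of the Gaussian elimination update
       a_ij <- a_ij - a_ik * a_kk^{-1} * a_kj, with scalar entries
       replaced by small dense blocks. */
    #include <stdio.h>

    #define BS 2  /* block size, e.g. physical variables per grid node */

    /* Solve Akk * X = Akj for X by Gaussian elimination (no pivoting,
       so this toy version assumes nonzero leading pivots). */
    static void block_solve(double Akk[BS][BS], double X[BS][BS],
                            double Akj[BS][BS])
    {
        double A[BS][BS];
        for (int i = 0; i < BS; i++)
            for (int j = 0; j < BS; j++) {
                A[i][j] = Akk[i][j];
                X[i][j] = Akj[i][j];
            }
        for (int k = 0; k < BS; k++)             /* forward elimination */
            for (int i = k + 1; i < BS; i++) {
                double m = A[i][k] / A[k][k];
                for (int j = k; j < BS; j++) A[i][j] -= m * A[k][j];
                for (int j = 0; j < BS; j++) X[i][j] -= m * X[k][j];
            }
        for (int k = BS - 1; k >= 0; k--)        /* back substitution */
            for (int j = 0; j < BS; j++) {
                for (int i = k + 1; i < BS; i++) X[k][j] -= A[k][i] * X[i][j];
                X[k][j] /= A[k][k];
            }
    }

    /* The block update itself: Aij -= Aik * (Akk^{-1} * Akj). */
    static void block_update(double Aij[BS][BS], double Aik[BS][BS],
                             double Akk[BS][BS], double Akj[BS][BS])
    {
        double X[BS][BS];
        block_solve(Akk, X, Akj);                /* block "division"      */
        for (int i = 0; i < BS; i++)             /* block multiplication, */
            for (int j = 0; j < BS; j++)         /* i.e. a BLAS3 product  */
                for (int k = 0; k < BS; k++)
                    Aij[i][j] -= Aik[i][k] * X[k][j];
    }

    int main(void)
    {
        double Aij[BS][BS] = {{4, 1}, {1, 4}};
        double Aik[BS][BS] = {{1, 0}, {0, 1}};
        double Akk[BS][BS] = {{2, 0}, {0, 2}};
        double Akj[BS][BS] = {{1, 1}, {1, 1}};
        block_update(Aij, Aik, Akk, Akj);
        printf("updated block:\n%g %g\n%g %g\n",
               Aij[0][0], Aij[0][1], Aij[1][0], Aij[1][1]);
        return 0;
    }

For realistic block sizes these same steps map onto dense LAPACK/BLAS routines, which is where the data reuse, and hence the performance gain over scalar sparse elimination, comes from.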