
A COMPARISON OF PARALLEL SOLVERS FOR DIAGONALLY DOMINANT AND GENERAL NARROW-BANDED LINEAR SYSTEMS

PETER ARBENZ†, ANDREW CLEARY‡, JACK DONGARRA§, AND MARKUS HEGLAND¶

† Institute of Scientific Computing, Swiss Federal Institute of Technology (ETH), 8092 Zurich, Switzerland ([email protected])
‡ Center for Applied Scientific Computing, Lawrence Livermore National Laboratory, P.O. Box 808, L-561, Livermore CA 94551, U.S.A. ([email protected])
§ Department of Computer Science, University of Tennessee, Knoxville TN 37996-1301, U.S.A. ([email protected])
¶ Computer Sciences Laboratory, RSISE, Australian National University, Canberra ACT 0200, Australia ([email protected])

Abstract. We investigate and compare stable parallel algorithms for solving diagonally dominant and general narrow-banded linear systems of equations. Narrow-banded means that the bandwidth is very small compared with the matrix order and is typically between 1 and 100. The solvers compared are the banded system solvers of ScaLAPACK [11] and those investigated by Arbenz and Hegland [3, 6]. For the diagonally dominant case, the algorithms are analogs of the well-known tridiagonal cyclic reduction algorithm, while the inspiration for the general case is the lesser-known bidiagonal cyclic reduction, which allows a clean parallel implementation of partial pivoting. These divide-and-conquer type algorithms complement fine-grained algorithms which perform well only for wide-banded matrices, with each family of algorithms having a range of problem sizes for which it is superior. We present theoretical analyses as well as numerical experiments conducted on the Intel Paragon.

Key words. narrow-banded linear systems, stable factorization, parallel solution, cyclic reduction, ScaLAPACK

1. Introduction. In this paper we compare implementations of direct parallel methods for solving banded systems of linear equations

$$Ax = b. \tag{1.1}$$

The $n$-by-$n$ matrix $A$ is assumed to have lower half-bandwidth $k_l$ and upper half-bandwidth $k_u$, meaning that $k_l$ and $k_u$ are the smallest integers such that

$$a_{ij} \neq 0 \implies -k_l \le j - i \le k_u.$$

We assume that the matrix $A$ has a narrow band, such that $k_l + k_u \ll n$. Linear systems with wide band can be solved efficiently by methods similar to full system solvers. In particular, parallel algorithms using two-dimensional mappings such as the torus-wrap mapping and Gaussian elimination with partial pivoting have achieved reasonable success [16, 10, 18]. The parallelism of these algorithms is the same as that of dense matrix algorithms applied to matrices of size $\min\{k_l, k_u\}$, independent of $n$, from which it is obvious that small bandwidths severely limit the usefulness of these algorithms, even for large $n$.
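To make the half-bandwidth definition concrete, the following is a minimal sketch (the function name `half_bandwidths` and the dense NumPy representation are our own, for illustration only) that recovers $k_l$ and $k_u$ from the nonzero pattern of a matrix:

```python
import numpy as np

def half_bandwidths(A):
    """Smallest kl, ku with A[i, j] != 0  =>  -kl <= j - i <= ku."""
    i, j = np.nonzero(A)
    kl = int(np.max(i - j, initial=0))  # deepest nonzero below the diagonal
    ku = int(np.max(j - i, initial=0))  # furthest nonzero above the diagonal
    return kl, ku

# A tridiagonal matrix has kl = ku = 1, so kl + ku = 2 << n.
n = 8
A = (np.diag(np.full(n, 4.0))
     + np.diag(np.ones(n - 1), 1)
     + np.diag(np.ones(n - 1), -1))
print(half_bandwidths(A))  # -> (1, 1)
```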
Parallel algorithms for the solution of banded linear systems with small bandwidth have been considered by many authors, both because such systems serve as a canonical form of recursive equations and because they have direct applications. The latter include the solution of eigenvalue problems with inverse iteration [17], spline interpolation and smoothing [9], and the solution of boundary value problems for ordinary differential equations using finite difference or finite element methods [27]. For these one-dimensional applications, bandwidths typically vary between 2 and 30. The discretisation of partial differential equations leads to applications with slightly larger bandwidths, for example, the computation of fluid flow in a long narrow pipe. In this case, the number of grid points orthogonal to the flow direction is much smaller than the number of grid points along the flow, and this results in a matrix with bandwidth relatively small compared to the total size of the problem. For these types of problems there is a tradeoff between band solvers and general sparse techniques: the band solver assumes that all of the entries within the band are nonzero, which they are not, and thus performs unnecessary computation, but its data structures are much simpler and there is no indirect addressing as in general sparse methods.

In section 2 we review an algorithm for the class of nonsymmetric narrow-banded matrices that can be factored stably without pivoting, such as diagonally dominant matrices or M-matrices. This algorithm has been discussed in detail in [3, 11], where the performance of implementations on distributed memory multicomputers like the Intel Paragon [3] or the IBM SP/2 [11] is analyzed as well. Johnsson [23] considered the same algorithm and its implementation on the Thinking Machines CM-2, which required a different model for the complexity of the interprocessor communication. Related algorithms have been presented in [26, 15, 14, 7, 12, 28] for shared memory multiprocessors with a small number of processors. The algorithm that we consider here can be interpreted as a generalization of cyclic reduction (CR) or, more usefully, as Gaussian elimination applied to a symmetrically permuted system of equations $PAP^T(Px) = Pb$. The latter interpretation has important consequences; for example, it implies that the algorithm is backward stable [5]. It can also be used to show that the permutation necessarily causes Gaussian elimination to generate fill-in, which in turn increases the computational complexity as well as the memory requirements of the algorithm.
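The fill-in caused by such a permutation is easy to observe numerically. The sketch below is illustrative and hypothetical in its details: the even block split and the reordering (interior unknowns of each block first, the $k$ separator unknowns last) are schematic stand-ins for the permutation $P$ discussed above, not the exact ordering used by the algorithms compared in this paper. It factors a diagonally dominant banded matrix before and after a symmetric reordering and counts the nonzeros in the triangular factors.

```python
import numpy as np
from scipy.linalg import lu

n, k, p = 24, 1, 3                      # order, half-bandwidth, "processor" count
rng = np.random.default_rng(0)
A = sum(np.diag(rng.uniform(1.0, 2.0, n - abs(d)), d) for d in range(-k, k + 1))
A += np.diag(np.full(n, 4.0 * (2 * k + 1)))   # enforce diagonal dominance

# Reorder: interior unknowns of each block first, separator unknowns last.
blocks = np.array_split(np.arange(n), p)
interior = np.concatenate([blk[:-k] for blk in blocks[:-1]] + [blocks[-1]])
separators = np.concatenate([blk[-k:] for blk in blocks[:-1]])
perm = np.concatenate([interior, separators])
B = A[np.ix_(perm, perm)]                     # B = P A P^T

for M, name in ((A, "A"), (B, "P A P^T")):
    _, L, U = lu(M)
    nnz = np.count_nonzero(abs(L) > 1e-12) + np.count_nonzero(abs(U) > 1e-12)
    print(f"nnz(L) + nnz(U) for {name}: {nnz}")
```

On this small example the factors of the reordered matrix carry noticeably more nonzeros than those of the banded original: the rows and columns coupling the interior unknowns to the separators fill in during elimination, which is the overhead referred to above.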
In section 3 we consider algorithms for solving (1.1) for arbitrary narrow-banded matrices $A$ that may require pivoting for stability reasons. This algorithm was proposed and thoroughly discussed in [6]. It can be interpreted as a generalization of the well-known block tridiagonal cyclic reduction to block bidiagonal matrices and, again, it is also equivalent to Gaussian elimination applied to a permuted (non-symmetrically permuted, in this general case) system of equations $PAQ^T(Qx) = Pb$. Block bidiagonal cyclic reduction for the solution of banded linear systems was introduced by Hegland [19].

In section 4 we compare the ScaLAPACK implementations [11] of the two algorithms above with the implementations by Arbenz [3] and Arbenz and Hegland [6], respectively, by means of numerical experiments conducted on the Intel Paragon. ScaLAPACK is a software package with a diverse user community. Each subroutine should have an easily intelligible calling sequence (interface) and work with easily manageable data distributions. These constraints may reduce the performance of the code. The other two codes are experimental. They have been optimized for low communication overhead: the number of messages sent among processors and the marshaling process have been minimized for the task of solving a single system of equations. The code does, for instance, not split the LU factorization from the forward elimination, which prohibits solving a sequence of systems of equations with the same system matrix without factoring that matrix over and over again. Our comparisons shall answer the question of how much performance the ScaLAPACK algorithms may have lost through the constraint of being user friendly. We further continue a discussion started in [5] on the overhead introduced by partial pivoting: is it necessary to have a pivoting as well as a non-pivoting algorithm for nonsymmetric band systems in ScaLAPACK? In LAPACK [2], for instance, there are only pivoting subroutines for solving dense and banded systems of equations, respectively.

2. Parallel Gaussian elimination for the diagonally dominant case. In this section we assume that the matrix $A = [a_{ij}]_{i,j=1,\dots,n}$ in (1.1) is diagonally dominant, i.e., that

$$|a_{ii}| > \sum_{\substack{j=1 \\ j \neq i}}^{n} |a_{ij}|, \qquad i = 1,\dots,n.$$

Then the system of equations can be solved by Gaussian elimination without pivoting in the following three steps:
1. Factorization into $A = LU$.
2. Solution of $Lz = b$ (forward elimination).
3. Solution of $Ux = z$ (backward substitution).

The lower and upper triangular Gauss factors $L$ and $U$ are banded with bandwidths $k_l$ and $k_u$, respectively, where $k_l$ and $k_u$ are the half-bandwidths of $A$. The number of floating point operations $\varphi_n$ for solving the banded system (1.1) with $r$ right-hand sides by Gaussian elimination is (see also, e.g., [17])

$$\varphi_n = 2(k_u+1)k_l n + (2k_l+2k_u+1)rn + O\big((k+r)k^2\big), \qquad k := \max\{k_l,k_u\}. \tag{2.1}$$

For solving (1.1) in parallel on a $p$-processor multicomputer we partition the matrix $A$, the solution vector $x$, and the right-hand side $b$ according to

$$\begin{pmatrix}
A_1 & B_1 & & & & \\
D_1 & C_1 & U_1 & & & \\
& L_1 & A_2 & B_2 & & \\
& & D_2 & C_2 & \ddots & \\
& & & \ddots & \ddots & U_{p-1} \\
& & & & L_{p-1} & A_p
\end{pmatrix}
\begin{pmatrix} x_1 \\ \xi_1 \\ x_2 \\ \xi_2 \\ \vdots \\ x_p \end{pmatrix}
=
\begin{pmatrix} b_1 \\ \beta_1 \\ b_2 \\ \beta_2 \\ \vdots \\ b_p \end{pmatrix},
\tag{2.2}$$

where $A_i \in \mathbb{R}^{n_i \times n_i}$, $C_i \in \mathbb{R}^{k \times k}$, $x_i, b_i \in \mathbb{R}^{n_i}$, $\xi_i, \beta_i \in \mathbb{R}^k$, and $\sum_{i=1}^{p} n_i + (p-1)k = n$.
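To fix ideas, here is a minimal sequential sketch of the three elimination steps above, with dense storage and names of our own choosing (production band solvers such as those compared here use packed band storage instead). It exploits the fact that, without pivoting, $L$ and $U$ retain the half-bandwidths $k_l$ and $k_u$; this is essentially the local building block that a parallel solver applies to the diagonal blocks $A_i$ of the partition (2.2).

```python
import numpy as np

def band_lu_solve(A, b, kl, ku):
    """Gaussian elimination without pivoting for a diagonally dominant
    banded matrix, following the three steps above. Dense storage is
    used for clarity; the loops never leave the band."""
    n = A.shape[0]
    LU = A.astype(float)
    # Step 1: factorization A = L U; L and U keep bandwidths kl and ku.
    for j in range(n - 1):
        for i in range(j + 1, min(j + kl, n - 1) + 1):
            LU[i, j] /= LU[j, j]                    # multiplier l_ij
            LU[i, j + 1:j + ku + 1] -= LU[i, j] * LU[j, j + 1:j + ku + 1]
    # Step 2: forward elimination L z = b (L is unit lower triangular).
    z = b.astype(float)
    for i in range(1, n):
        lo = max(0, i - kl)
        z[i] -= LU[i, lo:i] @ z[lo:i]
    # Step 3: back substitution U x = z.
    x = z.copy()
    for i in range(n - 1, -1, -1):
        hi = min(i + ku, n - 1)
        x[i] = (x[i] - LU[i, i + 1:hi + 1] @ x[i + 1:hi + 1]) / LU[i, i]
    return x

# Quick check on a diagonally dominant test matrix:
rng = np.random.default_rng(1)
n, kl, ku = 12, 2, 1
A = sum(np.diag(rng.uniform(1, 2, n - abs(d)), d) for d in range(-kl, ku + 1))
A += np.diag(np.full(n, 10.0))
b = rng.standard_normal(n)
print(np.allclose(band_lu_solve(A, b, kl, ku), np.linalg.solve(A, b)))  # True
```

Counting the operations in the factorization loop reproduces the leading term of (2.1): each of the roughly $n$ pivot columns costs $k_l$ divisions and $k_l k_u$ multiply-add pairs.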