A Sparse Matrix Library in C++ for High Performance Architectures

Jack Dongarra, Andrew Lumsdaine, Xinhui Niu, Roldan Pozo, Karin Remington

Oak Ridge National Laboratory, Mathematical Sciences Section
University of Tennessee, Dept. of Computer Science
University of Notre Dame, Dept. of Computer Science & Engineering
Abstract

We describe an object oriented sparse matrix library in C++ built upon the Level 3 Sparse BLAS proposal [5] for portability and performance across a wide class of machine architectures. The C++ library includes algorithms for various iterative methods and supports the most common sparse data storage formats used in practice. Besides simplifying the subroutine interface, the object oriented design allows the same driving code to be used for various sparse matrix formats, thus addressing many of the difficulties encountered with the typical approach to sparse matrix libraries. We emphasize the fundamental design issues of the C++ sparse matrix classes and illustrate their usage with the preconditioned conjugate gradient (PCG) method as an example. Performance results illustrate that these codes are competitive with optimized Fortran 77. We discuss the impact of our design on elegance, performance, maintainability, portability, and robustness.

(This project was supported in part by the Defense Advanced Research Projects Agency under contract DAAL03-91-C-0047, administered by the Army Research Office, the Applied Mathematical Sciences subprogram of the Office of Energy Research, U.S. Department of Energy, under Contract DE-AC05-84OR21400, and by the National Science Foundation Science and Technology Center Cooperative Agreement No. CCR-8809615, and NSF grant No. CCR-9209815.)

1 Introduction

Sparse matrices are pervasive in application codes which use finite difference, finite element or finite volume discretizations of PDEs for problems in computational fluid dynamics, structural mechanics, semiconductor simulation and other scientific and engineering applications. Nevertheless, comprehensive libraries for sparse matrix computations have not been developed and integrated to the same degree as those for dense matrices. Several factors contribute to the difficulty of designing such a comprehensive library. Different computer architectures, as well as different applications, call for different sparse matrix data formats in order to best exploit registers, data locality, pipelining, and parallel processing. Furthermore, code involving sparse matrices tends to be very complicated, and not easily portable, because the details of the underlying data formats are invariably entangled within the application code.

To address these difficulties, it is essential to develop codes which are as "data format free" as possible, thus providing the greatest flexibility for using the given algorithm (library routine) in various architecture/application combinations. In fact, the selection of an appropriate data structure can typically be deferred until link or run time. We describe an object oriented C++ library for sparse matrix computations which provides a unified interface for various iterative solution techniques across a variety of sparse data formats.

The design of the library is based on the following principles:

Clarity: Implementations of numerical algorithms should resemble the mathematical algorithms on which they are based. This is in contrast to Fortran 77, which can require complicated subroutine calls, often with parameter lists that stretch over several lines.

Reuse: A particular algorithm should only need to be coded once, with identical code used for all matrix representations.
Portability: Implementations of numerical algorithms should be directly portable across machine platforms.

High Performance: The object oriented library code should perform as well as optimized data-format-specific code written in C or Fortran.

To achieve these goals the sparse matrix library has to be more than just an unrelated collection of matrix objects. Adhering to true object oriented design philosophies, inheritance and polymorphism are used to build a matrix hierarchy in which the same codes can be used for various dense and sparse linear algebra computations across sequential and parallel architectures.
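As a minimal sketch of this idea (the class and member names below are illustrative assumptions, not the library's actual interface), a driver routine written against an abstract matrix base class runs unchanged over any storage format that implements the virtual matrix-vector product:

    // Minimal sketch of the "one driver, many formats" idea using an
    // abstract matrix base class.  Names are illustrative only.
    #include <vector>
    #include <cstddef>

    struct SparseMatrix {                       // abstract base: any sparse format
        virtual ~SparseMatrix() {}
        // y = A*x, implemented differently by each storage format
        virtual std::vector<double> matvec(const std::vector<double>& x) const = 0;
    };

    struct CoordMatrix : SparseMatrix {         // coordinate (COOR) storage
        std::vector<std::size_t> row, col;
        std::vector<double> val;
        std::size_t nrows;
        std::vector<double> matvec(const std::vector<double>& x) const {
            std::vector<double> y(nrows, 0.0);
            for (std::size_t k = 0; k < val.size(); ++k)
                y[row[k]] += val[k] * x[col[k]];
            return y;
        }
    };

    // A driver written only in terms of the base class works unchanged for
    // coordinate, compressed row, compressed column, ... representations.
    std::vector<double> residual(const SparseMatrix& A,
                                 const std::vector<double>& x,
                                 const std::vector<double>& b) {
        std::vector<double> r = A.matvec(x);
        for (std::size_t i = 0; i < r.size(); ++i) r[i] = b[i] - r[i];
        return r;
    }

The library's own hierarchy follows the same principle, with the format-specific work delegated to the underlying Sparse BLAS kernels.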
2 Iterative Solvers

The library provides algorithms for various iterative methods as described in Barrett et al. [1], together with their preconditioned counterparts:
- Jacobi SOR (SOR)
- Conjugate Gradient (CG)
- Conjugate Gradient on Normal Equations (CGNE, CGNR)
- Generalized Minimal Residual (GMRES)
- Minimum Residual (MINRES)
- Quasi-Minimal Residual (QMR)
- Chebyshev Iteration (Cheb)
- Conjugate Gradient Squared (CGS)
- Biconjugate Gradient (BiCG)
- Biconjugate Gradient Stabilized (Bi-CGSTAB)
Although iterative methods have provided much of the motivation for this library, many of the same operations and design issues are addressed for direct methods as well. In particular, some of the most popular preconditioners, such as Incomplete LU Factorization (ILU) [1], have components quite similar to direct methods.

One motivation for this work is that the high level algorithms found in [1] can be easily implemented in C++. For example, take a preconditioned conjugate gradient algorithm, used to solve Ax = b with preconditioner M. The comparison between the pseudocode and the C++ listing appears in figure 1. Here the operators such as * and += have been overloaded to work with the matrix and vector formats. This code fragment works for all of the supported sparse storage classes and makes use of the Level 3 Sparse BLAS in the matrix-vector multiply A*p.

    Pseudocode:

        initial r(0) = b - A x(0)
        for i = 1, 2, ...
            solve M z(i-1) = r(i-1)
            rho(i-1) = r(i-1)^T z(i-1)
            if i = 1
                p(1) = z(0)
            else
                beta(i-1) = rho(i-1) / rho(i-2)
                p(i) = z(i-1) + beta(i-1) p(i-1)
            endif
            q(i) = A p(i)
            alpha(i) = rho(i-1) / (p(i)^T q(i))
            x(i) = x(i-1) + alpha(i) p(i)
            r(i) = r(i-1) - alpha(i) q(i)
            check convergence
        end

    C++:

        r = b - A*x;
        for (int i = 1; i < maxiter; i++) {
            z = M.solve(r);
            rho1 = r * z;
            if (i == 1)
                p = z;
            else {
                beta = rho1 / rho0;
                p = z + p * beta;
            }
            q = A * p;
            alpha = rho1 / (p * q);
            x += alpha * p;
            r -= alpha * q;
            rho0 = rho1;
            if (norm(r) / norm(b) < tol) break;
        }

Figure 1: Pseudocode and C++ comparison of a preconditioned conjugate gradient method in the IML++ package.
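In figure 1 the preconditioning step is written simply as z = M.solve(r). As a minimal sketch (the class and member names below are illustrative assumptions rather than the library's actual interface), a diagonal preconditioner of the kind used for the timings in Section 5 could be supplied as:

    // Illustrative diagonal (Jacobi) preconditioner: solve() applies M^{-1}
    // by dividing each residual entry by the corresponding diagonal of A.
    #include <vector>
    #include <cstddef>

    class DiagPreconditioner {
        std::vector<double> diag;                 // diagonal entries of A
    public:
        explicit DiagPreconditioner(const std::vector<double>& d) : diag(d) {}
        // z = M^{-1} r  (element-wise division by the diagonal)
        std::vector<double> solve(const std::vector<double>& r) const {
            std::vector<double> z(r.size());
            for (std::size_t i = 0; i < r.size(); ++i)
                z[i] = r[i] / diag[i];
            return z;
        }
    };

Any object providing such a solve() member can play the role of the preconditioner M in the loop of figure 1.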
3 Sparse Matrix Types

We have concentrated on the most commonly used data structures, which account for a large portion of application codes. The library can be arbitrarily extended to user-specific structures and will eventually grow. We hope to incorporate user-contributed extensions in future versions of the software. Matrix classes supported in the initial version of the library include:

Sparse Vector: List of nonzero elements with their index locations. It assumes no particular ordering of elements.

COOR Matrix: Coordinate Storage Matrix. List of nonzero elements with their respective row and column indices. This is the most general sparse matrix format, but it is not very space or computationally efficient. It assumes no ordering of nonzero matrix values.

CRS Matrix: Compressed Row Storage Matrix. Subsequent nonzeros of the matrix rows are stored in contiguous memory locations, and an additional integer array specifies where each row begins. It assumes no ordering among nonzero values within each row, but rows are stored in consecutive order. (A small worked example is given below.)

CCS Matrix: Compressed Column Storage, also commonly known as the Harwell-Boeing sparse matrix format [4]. This is a variant of CRS storage where columns, rather than rows, are stored contiguously. Note that the CCS ordering of A is the same as the CRS ordering of A^T.

CDS Matrix: Compressed Diagonal Storage. Designed primarily for matrices with relatively constant bandwidth; the sub- and super-diagonals are stored in contiguous memory locations.

JDS Matrix: Jagged Diagonal Storage. Also known as ITPACK storage. More space efficient than CDS matrices at the cost of a gather/scatter operation.

BCRS Matrix: Block Compressed Row Storage. Useful when the sparse matrix is comprised of square dense blocks of nonzeros in some regular pattern. The savings in storage and reduced indirect addressing over CRS can be significant for matrices with large block sizes.

SKS Matrix: Skyline Storage. Also for variable band or profile matrices. Mainly used in direct solvers, but can also be used for handling the diagonal blocks in block matrix factorizations.

In addition, symmetric and Hermitian versions of most of these sparse formats will also be supported. In such cases only an upper or lower triangular portion of the matrix is stored. The trade-off is a more complicated algorithm with a somewhat different pattern of data access. Further details of each data storage format are given in [1] and [6].
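To make the CRS layout concrete, consider the following hypothetical 4x4 example (0-based indices; the array names val, col_ind, and row_ptr follow the conventions of [1] rather than naming the library's internal members):

    // Hypothetical 4x4 example of Compressed Row Storage (0-based indices).
    //
    //      | 10  0  0 -2 |
    //  A = |  3  9  0  0 |
    //      |  0  7  8  7 |
    //      |  0  0  5  1 |
    double val[]     = {10, -2, 3, 9, 7, 8, 7, 5, 1};   // nonzeros, row by row
    int    col_ind[] = { 0,  3, 0, 1, 1, 2, 3, 2, 3};   // column of each nonzero
    int    row_ptr[] = { 0,  2, 4, 7, 9};               // start of each row in val
    // Row i occupies positions row_ptr[i] .. row_ptr[i+1]-1 of val and col_ind.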
Our library contains the common computational kernels required for solving linear systems by many direct and iterative methods. The internal data structures of these kernels are compatible with the proposed Level 3 Sparse BLAS, thus providing the user with a large software base of Fortran 77 modules and application libraries. Just as the dense Level 3 BLAS [3] have allowed for higher performance kernels on hierarchical memory architectures, the Sparse BLAS allow vendors to provide optimized routines taking advantage of indirect addressing hardware, registers, pipelining, caches, memory management, and parallelism on their particular architecture. Standardizing the Sparse BLAS will not only provide efficient codes, but will also ensure portable computational kernels with a common interface.

There are two types of C++ interfaces to the basic kernels. The first utilizes simple binary operators for multiplication and addition, and the second is a functional interface which can group triad and more complex operations. The binary operators provide for a simpler interface, e.g. y = A * x denotes a sparse matrix-vector multiply, but may produce less efficient code. The computational kernels include:

- sparse matrix products, C <- alpha op(A) B + beta C
- solution of triangular systems, C <- alpha D op(A)^{-1} B + beta C
- reordering of a sparse matrix (permutations), A <- A op(P)
- conversion of one data format to another, A' <- A

where alpha and beta are scalars, B and C are rectangular matrices, D is a (block) diagonal matrix, A and A' are sparse matrices, and op(A) is either A or A^T.
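As a simplified illustration of the first kernel, with B and C reduced to single vectors, a reference CRS implementation of y <- alpha*A*x + beta*y might look like the following plain loop; it is meant only as an illustration, not as the tuned Sparse BLAS routine itself:

    // Reference CRS kernel for y <- alpha*A*x + beta*y (B and C taken to be
    // single vectors).  Illustrative only; not the optimized Sparse BLAS code.
    #include <cstddef>

    void crs_matvec(std::size_t nrows,
                    const double* val, const int* col_ind, const int* row_ptr,
                    double alpha, const double* x,
                    double beta, double* y) {
        for (std::size_t i = 0; i < nrows; ++i) {
            double sum = 0.0;
            for (int k = row_ptr[i]; k < row_ptr[i + 1]; ++k)
                sum += val[k] * x[col_ind[k]];       // gather along row i
            y[i] = alpha * sum + beta * y[i];
        }
    }

The arrays val, col_ind, and row_ptr are the same three arrays shown in the CRS example of Section 3.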
4 Sparse Matrix Construction and I/O

In dealing with issues of I/O, the C++ library is presently designed to support reading and writing of Harwell-Boeing format sparse matrix files [4]. These files are inherently in compressed column storage; however, since sparse matrices in the library can be transformed between various data formats, this is not a severe limitation. File input is embedded as another form of a sparse matrix constructor; a file can be read and transformed into another format using conversions and the iostream operators. In the future, the library will also support other matrix file formats, such as a Matlab-compatible format, and IEEE binary formats. Sparse matrices can also be initialized from conventional data and index vectors, thus allowing for a universal interface to import data from C or Fortran modules.

5 Performance

To compare the efficiency of our C++ class designs, we tested the performance of our library modules against optimized Fortran sparse matrix packages. Figures 2 and 3 illustrate the performance of the PCG method with diagonal preconditioning on various common example problems (2D and 3D Laplacian operators) for our C++ library and the f77 Sparskit [7] package. In all cases we utilized full optimization of each compiler.

[Figure 2: Performance comparison of C++ (g++) vs. optimized Fortran codes on a Sun SPARC 10. Bar chart of execution time in seconds for the sparse PCG test problems la2d64, la2d128, la3d16, and la3d32, comparing SPARSKIT (f77) and SparseLib++ (C++).]

The C++ modules are slightly more efficient since the low-level Sparse BLAS have been fine-tuned for each platform. This shows that it is possible to have an elegant coding interface (as shown in figure 1) and still maintain competitive, if not better, performance compared with conventional Fortran modules. It should be pointed out that our intent is not to compare ourselves with a specific library. Sparskit is an excellent Fortran library whose focus is more on portability than ultimate performance. Our goal is simply to demonstrate that the C++ codes can also achieve good performance compared to Fortran. If both libraries relied on vendor-supplied Sparse BLAS, then their performance difference would be undetectable.

[Figure 3: Performance comparison of C++ (xlC) vs. optimized Fortran codes (xlf -O) on an IBM RS/6000 Model 580. Bar chart of execution time in seconds for the sparse PCG test problems la2d64, la2d128, la3d16, and la3d32, comparing SPARSKIT (f77) and SparseLib++ (C++).]

6 Conclusion

Using C++ for numerical computation can greatly enhance clarity, reuse, and portability. In our C++ sparse matrix library, SparseLib++, the details of the underlying sparse matrix data format are completely hidden at the algorithm level. This results in iterative algorithm codes which closely resemble their mathematical denotation. Also, since the library is built upon the Level 3 Sparse BLAS, it provides performance comparable to optimized Fortran.

References

[1] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, H. van der Vorst, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, SIAM Press, 1994.

[2] J. Dongarra, R. Pozo, D. Walker, "LAPACK++: A Design Overview of Object-Oriented Extensions for High Performance Linear Algebra," Proceedings of Supercomputing '93, IEEE Press, 1993, pp. 162-171.

[3] J. Dongarra, J. Du Croz, I. S. Duff, S. Hammarling, "A Set of Level 3 Basic Linear Algebra Subprograms," ACM Trans. Math. Soft., Vol. 16, 1990, pp. 1-17.

[4] I. Duff, R. Grimes, J. Lewis, "Sparse Matrix Test Problems," ACM Trans. Math. Soft., Vol. 15, 1989, pp. 1-14.

[5] I. Duff, M. Marrone, G. Radicati, A Proposal for User Level Sparse BLAS, CERFACS Technical Report TR/PA/92/85, 1992.

[6] M. A. Heroux, A Proposal for a Sparse BLAS Toolkit, CERFACS Technical Report TR/PA/92/90, 1992.

[7] Y. Saad, Sparskit: A Basic Toolkit for Sparse Matrix Computations, May 1990.