Implementation of a Preconditioned Eigensolver Using Hypre

Total Page:16

File Type:pdf, Size:1020Kb

Implementation of a Preconditioned Eigensolver Using Hypre Implementation of a preconditioned eigensolver using Hypre Andrew V. Knyazev 1 ,¤and Merico E. Argentati 1 1 Department of Mathematics, University of Colorado at Denver, USA SUMMARY This paper describes a parallel preconditioned algorithm for solution of partial eigenvalue problems for large sparse symmetric matrices on massively parallel computers, taking advantage of advances in the scalable linear solvers, in particular in multigrid technology and in incomplete factorizations, developed under the Hypre project, at the Lawrence Livermore National Laboratory (LLNL), Center for Applied Scientific Computing. The algorithm implements a ”matrix free” locally optimal block preconditioned conjugate gradient method (LOBPCG), suggested earlier by the first author, to compute one or more of the smallest eigenvalues and the corresponding eigenvectors of a symmetric matrix. We discuss our Hypre specific implementation approach for a flexible parallel algorithm, and the capabilities of the developed software. We demonstrate parallel performance on a set of test problems, using Hypre algebraic multigrid and other preconditioners. The code is written in MPI based C-language and uses Hypre and LAPACK libraries. It has been included in Hypre starting with revision 1.8.0b, which is publicly available on the Internet at LLNL web site. Copyright c 2005 John Wiley & Sons, Ltd. KEY WORDS: Eigenvalue; eigenvector; symmetric eigenvalue problem; preconditioning; sparse matrices; parallel computing; conjugate gradients; algebraic multigrid; additive Schwarz; incomplete factorization; Hypre. AMS(MOS) subject classifications. 65F15, 65N25, 65Y05. 1. Introduction We implement a parallel preconditioned algorithm of the Locally Optimal Block Preconditioned Conjugate Gradient Method (LOBPCG) [3, 4] for the solution of eigenvalue problems Ax = λx for large sparse symmetric matrices A on massively parallel computers. We take advantage of advances in the scalable linear solvers, in particular in multigrid technology and in incomplete factorizations, developed under the High Performance Preconditioners (Hypre) project [5], at the Lawrence Livermore ¤Correspondence to: Department of Mathematics, University of Colorado at Denver, P.O. Box 173364, Campus Box 170, Denver, CO 80217-3364. E-mail: [email protected] Contract/grant sponsor: Center for Applied Scientific Computing, Lawrence Livermore National Laboratory Contract/grant sponsor: National Science Foundation; contract/grant number: DMS 0208773 PRECONDITIONED EIGENSOLVER IN HYPRE 1 National Laboratory (LLNL), Center for Applied Scientific Computing. The solver allows for the utilization of a set of Hypre preconditioners for solution of an eigenvalue problem. We discuss the implementation approach for a flexible “matrix-free” parallel algorithm, and the capabilities of the developed software, see [1] for details. We test the performance of the software on a few simple test problems. The LOBPCG Hypre software has been integrated into the Hypre software at LLNL and has been included in the recent Beta Release: Hypre–1.8.0b. Hypre is publicly available on the Internet at LLNL web site to download. Software for the Preconditioned Eigensolvers is available through the Internet page http:// www-math.cudenver.edu/ ˜aknyazev/software/CG/ which contains, in particular, the MATLAB code of LOBPCG. The rest of the paper is organized as follows. Section 2 contains a complete and detailed description of the LOBPCG algorithm as implemented in Hypre. We discuss our Hypre specific implementation approach for a flexible parallel algorithm and the capabilities of the developed software in Sections 3-4. We demonstrate parallel performance on a set of test problems, using Hypre supported preconditioners in Section 5. The final Section 6 contain the conclusions. 2. The Detailed Description of the LOBPCG Algorithm Below is a detailed description of the LOBPCG algorithm that has been implemented in the Hypre code. The earlier published LOBPCG description in [4] is not nearly as complete as the description below. LOBPCG Algorithm in Hypre Input: m starting linearly independent vectors in X 2 Rn£m, devices to compute A ¤ x and T ¤ x. 1. Allocate memory for the six matrices X;AX;W;AW;P;AP 2 Rn£m. 2. Initial settings for loop: [Q;R] = qr(X);X = Q; AX = A ¤ X; k = 0. 3. Compute initial Ritz vectors and initial residual: [TMP;Λ] = eig(X T ¤ AX);X = X ¤ TMP;AX = AX ¤ TMP; W = AX ¡ X ¤ Λ. 4. Get norm of individual residuals and store in normR 2 Rm. 5. do while (max(normR) > tol) 6. Find index set I of normR for normR > tol. The index set I defines those residual vectors for which the norm is greater than the tolerance. Refer to the column vectors defined by I, that are a subset of the columns of W and P, by WI and PI. 7. Apply preconditioner: WI = T ¤WI. 8. Orthonormalize WI: [Q;R] = qr(WI);WI = Q. 9. Update AWI: AWI = A ¤WI. 10. if k > 0 11. Orthonormalize PI: [Q;R] = qr(PI);PI = Q; ¡1 12. Update API: API = API ¤ R . 13. end if 14. Complete Rayleigh Ritz Procedure: 15. if k > 0 Copyright c 2005 John Wiley & Sons, Ltd. Numer. Linear Algebra Appl. 2005; 00:0–0 Prepared using nlaauth.cls 2 ANDREW KNYAZEV AND MERICO ARGENTATI T T Λ X ¤ AWI X ¤ API T T T 16. Compute G = 2 WI ¤ AX WI ¤ AWI WI ¤ API 3. T T T PI ¤ AX PI ¤ AWI PI ¤ API 4 T T 5 I X ¤WI X ¤ PI T T 17. Compute M = 2 WI ¤ X I WI ¤ PI 3. T T P ¤ X P ¤WI I 4 I I 5 18. else T Λ X ¤ AWI 19. Compute G = T T . · WI ¤ AX WI ¤ AWI ¸ T I X ¤WI 20. Compute M = T . · WI ¤ X I ¸ 21. end if 22. Solve the generalized eigenvalue problem: G ¤ y = λM ¤ y and store the first m eigenvalues in increasing order in the diagonal matrix Λ and store the corresponding m eigenvectors in Y. 23. if k > 0 Y1 24. Partition Y = 2 Y2 3 according to the number of columns in Y 4 3 5 X; WI, and PI, respectively. 25. Compute X = X ¤Y1 +WI ¤Y2 + PI ¤Y3. 26. Compute AX = AX ¤Y1 + AWI ¤Y2 + API ¤Y3. 27. Compute PI = WI ¤Y2 + PI ¤Y3. 28. Compute API = AWI ¤Y2 + API ¤Y3. 29. else Y 30. Partition Y = 1 according to the number of columns in · Y2 ¸ X and WI respectively. 31. Compute X = X ¤Y1 +WI ¤Y2. 32. Compute AX = AX ¤Y1 + AWI ¤Y2. 33. Compute PI = WI. 34. Compute API = AWI. 35. end if 36. Compute new residuals: WI = AXI ¡ XI ¤ ΛI. 37. Update residual normR for columns of W in WI. 38. k = k + 1 39. end do Output: Eigenvectors X and eigenvalues Λ. 3. LOBPCG Hypre Implementation Strategy The Hypre eigensolver code is written in MPI, e.g., [2], based C–language and uses Hypre and LAPACK libraries. It has been originally tested with Hypre version 1.6.0 and is being incorporated Copyright c 2005 John Wiley & Sons, Ltd. Numer. Linear Algebra Appl. 2005; 00:0–0 Prepared using nlaauth.cls PRECONDITIONED EIGENSOLVER IN HYPRE 3 into Hypre starting with the 1.8.0b release. The user interface or Applications Program Interface (API) to the solver is implemented using Hypre style function calls. The matrix–vector multiply and the preconditioned solve are done through user supplied functions. This approach provides significant flexibility, and the implementation illustrates that the LOBPCG algorithm can successfully and efficiently use parallel libraries. Partition of vector data across the processors is determined by user input consisting of an initial array of parallel vectors. The same partitioning that is used for these vectors, is then used to setup and run the eigensolver. Partitioning of the matrix A and the preconditioner (also determined by the user) must be compatible with the partitioning of these initial vectors. Preconditioning is implemented through calls to the Hypre preconditioned conjugate gradient method (PCG). Specifically, the action x = Tb of the preconditioner T on a given vector b is performed by calling one of Hypre iterative linear solvers for the following linear algebraic system: Ax = b. Thus, we do not attempt to use shift-and-invert strategy, but instead simply take T to be a preconditioner for A. Since it is assumed in LOBPCG that the matrix A and the preconditioner T are both symmetric positive definite matrices, we only considered the Hypre PCG iterative method for inner iterations. Specifically, LOBPCG has been tested with Hypre PCG solvers with the following preconditioning: ² AMG–PCG: algebraic multigrid ² DS–PCG: diagonal scaling ² ParaSails–PCG: approximate inverse of A is generated by attempting to minimize kI ¡ AMkF ² Schwarz–PCG: additive Schwarz ² Euclid–PCG: incomplete LU 4. Hypre LOBPCG Software Implementation Details Hypre supports four conceptual interfaces: Struct, SStruct, FEM and IJ. At present, LOBPCG is only implemented using the IJ interface. This algebraic interface supports applications with general sparse matrices. A block diagram of the high-level software modules is given in Figure 1. The test driver ij es.c is a modified version of the Hypre driver ij.c and retains all the Hypre functionality and capabilities of ij.c. We anticipate that drives ij es.c and ij.c will be merged in the future Hypre releases. The ij es.c driver is used to run the eigensolver software with control through command line parameters. Various test matrices can be internally generated that consist of different types of Laplacians, or a Matrix Market format file may be read as input. The user can also choose the preconditioner. An initial set of vectors can be generated randomly, as an identity matrix of vectors, or a Matrix Market file may be input, with it's columns used as initial vectors. Various other input parameters can be specified such as the convergence tolerance, the maximum number of iterations and the maximum number of inner iterations for the Hypre solver.
Recommended publications
  • Recursive Approach in Sparse Matrix LU Factorization
    51 Recursive approach in sparse matrix LU factorization Jack Dongarra, Victor Eijkhout and the resulting matrix is often guaranteed to be positive Piotr Łuszczek∗ definite or close to it. However, when the linear sys- University of Tennessee, Department of Computer tem matrix is strongly unsymmetric or indefinite, as Science, Knoxville, TN 37996-3450, USA is the case with matrices originating from systems of Tel.: +865 974 8295; Fax: +865 974 8296 ordinary differential equations or the indefinite matri- ces arising from shift-invert techniques in eigenvalue methods, one has to revert to direct methods which are This paper describes a recursive method for the LU factoriza- the focus of this paper. tion of sparse matrices. The recursive formulation of com- In direct methods, Gaussian elimination with partial mon linear algebra codes has been proven very successful in pivoting is performed to find a solution of Eq. (1). Most dense matrix computations. An extension of the recursive commonly, the factored form of A is given by means technique for sparse matrices is presented. Performance re- L U P Q sults given here show that the recursive approach may per- of matrices , , and such that: form comparable to leading software packages for sparse ma- LU = PAQ, (2) trix factorization in terms of execution time, memory usage, and error estimates of the solution. where: – L is a lower triangular matrix with unitary diago- nal, 1. Introduction – U is an upper triangular matrix with arbitrary di- agonal, Typically, a system of linear equations has the form: – P and Q are row and column permutation matri- Ax = b, (1) ces, respectively (each row and column of these matrices contains single a non-zero entry which is A n n A ∈ n×n x where is by real matrix ( R ), and 1, and the following holds: PPT = QQT = I, b n b, x ∈ n and are -dimensional real vectors ( R ).
    [Show full text]
  • Arxiv:1911.09220V2 [Cs.MS] 13 Jul 2020
    MFEM: A MODULAR FINITE ELEMENT METHODS LIBRARY ROBERT ANDERSON, JULIAN ANDREJ, ANDREW BARKER, JAMIE BRAMWELL, JEAN- SYLVAIN CAMIER, JAKUB CERVENY, VESELIN DOBREV, YOHANN DUDOUIT, AARON FISHER, TZANIO KOLEV, WILL PAZNER, MARK STOWELL, VLADIMIR TOMOV Lawrence Livermore National Laboratory, Livermore, USA IDO AKKERMAN Delft University of Technology, Netherlands JOHANN DAHM IBM Research { Almaden, Almaden, USA DAVID MEDINA Occalytics, LLC, Houston, USA STEFANO ZAMPINI King Abdullah University of Science and Technology, Thuwal, Saudi Arabia Abstract. MFEM is an open-source, lightweight, flexible and scalable C++ library for modular finite element methods that features arbitrary high-order finite element meshes and spaces, support for a wide variety of dis- cretization approaches and emphasis on usability, portability, and high-performance computing efficiency. MFEM's goal is to provide application scientists with access to cutting-edge algorithms for high-order finite element mesh- ing, discretizations and linear solvers, while enabling researchers to quickly and easily develop and test new algorithms in very general, fully unstructured, high-order, parallel and GPU-accelerated settings. In this paper we describe the underlying algorithms and finite element abstractions provided by MFEM, discuss the software implementation, and illustrate various applications of the library. arXiv:1911.09220v2 [cs.MS] 13 Jul 2020 1. Introduction The Finite Element Method (FEM) is a powerful discretization technique that uses general unstructured grids to approximate the solutions of many partial differential equations (PDEs). It has been exhaustively studied, both theoretically and in practice, in the past several decades [1, 2, 3, 4, 5, 6, 7, 8]. MFEM is an open-source, lightweight, modular and scalable software library for finite elements, featuring arbitrary high-order finite element meshes and spaces, support for a wide variety of discretization approaches and emphasis on usability, portability, and high-performance computing (HPC) efficiency [9].
    [Show full text]
  • Abstract 1 Introduction
    Implementation in ScaLAPACK of Divide-and-Conquer Algorithms for Banded and Tridiagonal Linear Systems A. Cleary Department of Computer Science University of Tennessee J. Dongarra Department of Computer Science University of Tennessee Mathematical Sciences Section Oak Ridge National Laboratory Abstract Described hereare the design and implementation of a family of algorithms for a variety of classes of narrow ly banded linear systems. The classes of matrices include symmetric and positive de - nite, nonsymmetric but diagonal ly dominant, and general nonsymmetric; and, al l these types are addressed for both general band and tridiagonal matrices. The family of algorithms captures the general avor of existing divide-and-conquer algorithms for banded matrices in that they have three distinct phases, the rst and last of which arecompletely paral lel, and the second of which is the par- al lel bottleneck. The algorithms have been modi ed so that they have the desirable property that they are the same mathematical ly as existing factorizations Cholesky, Gaussian elimination of suitably reordered matrices. This approach represents a departure in the nonsymmetric case from existing methods, but has the practical bene ts of a smal ler and more easily hand led reduced system. All codes implement a block odd-even reduction for the reduced system that al lows the algorithm to scale far better than existing codes that use variants of sequential solution methods for the reduced system. A cross section of results is displayed that supports the predicted performance results for the algo- rithms. Comparison with existing dense-type methods shows that for areas of the problem parameter space with low bandwidth and/or high number of processors, the family of algorithms described here is superior.
    [Show full text]
  • Overview of Iterative Linear System Solver Packages
    Overview of Iterative Linear System Solver Packages Victor Eijkhout July, 1998 Abstract Description and comparison of several packages for the iterative solu- tion of linear systems of equations. 1 1 Intro duction There are several freely available packages for the iterative solution of linear systems of equations, typically derived from partial di erential equation prob- lems. In this rep ort I will give a brief description of a numberofpackages, and giveaninventory of their features and de ning characteristics. The most imp ortant features of the packages are which iterative metho ds and preconditioners supply; the most relevant de ning characteristics are the interface they present to the user's data structures, and their implementation language. 2 2 Discussion Iterative metho ds are sub ject to several design decisions that a ect ease of use of the software and the resulting p erformance. In this section I will give a global discussion of the issues involved, and how certain p oints are addressed in the packages under review. 2.1 Preconditioners A go o d preconditioner is necessary for the convergence of iterative metho ds as the problem to b e solved b ecomes more dicult. Go o d preconditioners are hard to design, and this esp ecially holds true in the case of parallel pro cessing. Here is a short inventory of the various kinds of preconditioners found in the packages reviewed. 2.1.1 Ab out incomplete factorisation preconditioners Incomplete factorisations are among the most successful preconditioners devel- op ed for single-pro cessor computers. Unfortunately, since they are implicit in nature, they cannot immediately b e used on parallel architectures.
    [Show full text]
  • Accelerating the LOBPCG Method on Gpus Using a Blocked Sparse Matrix Vector Product
    Accelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product Hartwig Anzt and Stanimire Tomov and Jack Dongarra Innovative Computing Lab University of Tennessee Knoxville, USA Email: [email protected], [email protected], [email protected] Abstract— the computing power of today’s supercomputers, often accel- erated by coprocessors like graphics processing units (GPUs), This paper presents a heterogeneous CPU-GPU algorithm design and optimized implementation for an entire sparse iter- becomes challenging. ative eigensolver – the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) – starting from low-level GPU While there exist numerous efforts to adapt iterative lin- data structures and kernels to the higher-level algorithmic choices ear solvers like Krylov subspace methods to coprocessor and overall heterogeneous design. Most notably, the eigensolver technology, sparse eigensolvers have so far remained out- leverages the high-performance of a new GPU kernel developed side the main focus. A possible explanation is that many for the simultaneous multiplication of a sparse matrix and a of those combine sparse and dense linear algebra routines, set of vectors (SpMM). This is a building block that serves which makes porting them to accelerators more difficult. Aside as a backbone for not only block-Krylov, but also for other from the power method, algorithms based on the Krylov methods relying on blocking for acceleration in general. The subspace idea are among the most commonly used general heterogeneous LOBPCG developed here reveals the potential of eigensolvers [1]. When targeting symmetric positive definite this type of eigensolver by highly optimizing all of its components, eigenvalue problems, the recently developed Locally Optimal and can be viewed as a benchmark for other SpMM-dependent applications.
    [Show full text]
  • MFEM: a Modular Finite Element Methods Library
    MFEM: A Modular Finite Element Methods Library Robert Anderson1, Andrew Barker1, Jamie Bramwell1, Jakub Cerveny2, Johann Dahm3, Veselin Dobrev1,YohannDudouit1, Aaron Fisher1,TzanioKolev1,MarkStowell1,and Vladimir Tomov1 1Lawrence Livermore National Laboratory 2University of West Bohemia 3IBM Research July 2, 2018 Abstract MFEM is a free, lightweight, flexible and scalable C++ library for modular finite element methods that features arbitrary high-order finite element meshes and spaces, support for a wide variety of discretization approaches and emphasis on usability, portability, and high-performance computing efficiency. Its mission is to provide application scientists with access to cutting-edge algorithms for high-order finite element meshing, discretizations and linear solvers. MFEM also enables researchers to quickly and easily develop and test new algorithms in very general, fully unstructured, high-order, parallel settings. In this paper we describe the underlying algorithms and finite element abstractions provided by MFEM, discuss the software implementation, and illustrate various applications of the library. Contents 1 Introduction 3 2 Overview of the Finite Element Method 4 3Meshes 9 3.1 Conforming Meshes . 10 3.2 Non-Conforming Meshes . 11 3.3 NURBS Meshes . 12 3.4 Parallel Meshes . 12 3.5 Supported Input and Output Formats . 13 1 4 Finite Element Spaces 13 4.1 FiniteElements....................................... 14 4.2 DiscretedeRhamComplex ................................ 16 4.3 High-OrderSpaces ..................................... 17 4.4 Visualization . 18 5 Finite Element Operators 18 5.1 DiscretizationMethods................................... 18 5.2 FiniteElementLinearSystems . 19 5.3 Operator Decomposition . 23 5.4 High-Order Partial Assembly . 25 6 High-Performance Computing 27 6.1 Parallel Meshes, Spaces, and Operators . 27 6.2 Scalable Linear Solvers .
    [Show full text]
  • Over-Scalapack.Pdf
    1 2 The Situation: Parallel scienti c applications are typically Overview of ScaLAPACK written from scratch, or manually adapted from sequen- tial programs Jack Dongarra using simple SPMD mo del in Fortran or C Computer Science Department UniversityofTennessee with explicit message passing primitives and Mathematical Sciences Section Which means Oak Ridge National Lab oratory similar comp onents are co ded over and over again, co des are dicult to develop and maintain http://w ww .ne tl ib. org /ut k/p e opl e/J ackDo ngarra.html debugging particularly unpleasant... dicult to reuse comp onents not really designed for long-term software solution 3 4 What should math software lo ok like? The NetSolve Client Many more p ossibilities than shared memory Virtual Software Library Multiplicityofinterfaces ! Ease of use Possible approaches { Minimum change to status quo Programming interfaces Message passing and ob ject oriented { Reusable templates { requires sophisticated user Cinterface FORTRAN interface { Problem solving environments Integrated systems a la Matlab, Mathematica use with heterogeneous platform Interactiveinterfaces { Computational server, like netlib/xnetlib f2c Non-graphic interfaces Some users want high p erf., access to details {MATLAB interface Others sacri ce sp eed to hide details { UNIX-Shell interface Not enough p enetration of libraries into hardest appli- Graphic interfaces cations on vector machines { TK/TCL interface Need to lo ok at applications and rethink mathematical software { HotJava Browser
    [Show full text]
  • A Geometric Theory for Preconditioned Inverse Iteration. III: a Short and Sharp Convergence Estimate for Generalized Eigenvalue Problems
    A geometric theory for preconditioned inverse iteration. III: A short and sharp convergence estimate for generalized eigenvalue problems. Andrew V. Knyazev Department of Mathematics, University of Colorado at Denver, P.O. Box 173364, Campus Box 170, Denver, CO 80217-3364 1 Klaus Neymeyr Mathematisches Institut der Universit¨atT¨ubingen,Auf der Morgenstelle 10, 72076 T¨ubingen,Germany 2 Abstract In two previous papers by Neymeyr: A geometric theory for preconditioned inverse iteration I: Extrema of the Rayleigh quotient, LAA 322: (1-3), 61-85, 2001, and A geometric theory for preconditioned inverse iteration II: Convergence estimates, LAA 322: (1-3), 87-104, 2001, a sharp, but cumbersome, convergence rate estimate was proved for a simple preconditioned eigensolver, which computes the smallest eigenvalue together with the corresponding eigenvector of a symmetric positive def- inite matrix, using a preconditioned gradient minimization of the Rayleigh quotient. In the present paper, we discover and prove a much shorter and more elegant, but still sharp in decisive quantities, convergence rate estimate of the same method that also holds for a generalized symmetric definite eigenvalue problem. The new estimate is simple enough to stimulate a search for a more straightforward proof technique that could be helpful to investigate such practically important method as the locally optimal block preconditioned conjugate gradient eigensolver. We demon- strate practical effectiveness of the latter for a model problem, where it compares favorably with two
    [Show full text]
  • 16 Preconditioning
    16 Preconditioning The general idea underlying any preconditioning procedure for iterative solvers is to modify the (ill-conditioned) system Ax = b in such a way that we obtain an equivalent system Aˆxˆ = bˆ for which the iterative method converges faster. A standard approach is to use a nonsingular matrix M, and rewrite the system as M −1Ax = M −1b. The preconditioner M needs to be chosen such that the matrix Aˆ = M −1A is better conditioned for the conjugate gradient method, or has better clustered eigenvalues for the GMRES method. 16.1 Preconditioned Conjugate Gradients We mentioned earlier that the number of iterations required for the conjugate gradient algorithm to converge is proportional to pκ(A). Thus, for poorly conditioned matrices, convergence will be very slow. Thus, clearly we will want to choose M such that κ(Aˆ) < κ(A). This should result in faster convergence. How do we find Aˆ, xˆ, and bˆ? In order to ensure symmetry and positive definiteness of Aˆ we let M −1 = LLT (44) with a nonsingular m × m matrix L. Then we can rewrite Ax = b ⇐⇒ M −1Ax = M −1b ⇐⇒ LT Ax = LT b ⇐⇒ LT AL L−1x = LT b . | {z } | {z } |{z} =Aˆ =xˆ =bˆ The symmetric positive definite matrix M is called splitting matrix or preconditioner, and it can easily be verified that Aˆ is symmetric positive definite, also. One could now formally write down the standard CG algorithm with the new “hat- ted” quantities. However, the algorithm is more efficient if the preconditioning is incorporated directly into the iteration. To see what this means we need to examine every single line in the CG algorithm.
    [Show full text]
  • XAMG: a Library for Solving Linear Systems with Multiple Right-Hand
    XAMG: A library for solving linear systems with multiple right-hand side vectors B. Krasnopolsky∗, A. Medvedev∗∗ Institute of Mechanics, Lomonosov Moscow State University, 119192 Moscow, Michurinsky ave. 1, Russia Abstract This paper presents the XAMG library for solving large sparse systems of linear algebraic equations with multiple right-hand side vectors. The library specializes but is not limited to the solution of linear systems obtained from the discretization of elliptic differential equations. A corresponding set of numerical methods includes Krylov subspace, algebraic multigrid, Jacobi, Gauss-Seidel, and Chebyshev iterative methods. The parallelization is im- plemented with MPI+POSIX shared memory hybrid programming model, which introduces a three-level hierarchical decomposition using the corre- sponding per-level synchronization and communication primitives. The code contains a number of optimizations, including the multilevel data segmen- tation, compression of indices, mixed-precision floating-point calculations, arXiv:2103.07329v1 [cs.MS] 12 Mar 2021 vector status flags, and others. The XAMG library uses the program code of the well-known hypre library to construct the multigrid matrix hierar- chy. The XAMG’s own implementation for the solve phase of the iterative methods provides up to a twofold speedup compared to hypre for the tests ∗E-mail address: [email protected] ∗∗E-mail address: [email protected] Preprint submitted to Elsevier March 15, 2021 performed. Additionally, XAMG provides extended functionality to solve systems with multiple right-hand side vectors. Keywords: systems of linear algebraic equations, Krylov subspace iterative methods, algebraic multigrid method, multiple right-hand sides, hybrid programming model, MPI+POSIX shared memory Nr.
    [Show full text]
  • Numerical Linear Algebra Solving Ax = B , an Overview Introduction
    Solving Ax = b, an overview Numerical Linear Algebra ∗ A = A no Good precond. yes flex. precond. yes GCR Improving iterative solvers: ⇒ ⇒ ⇒ yes no no ⇓ ⇓ ⇓ preconditioning, deflation, numerical yes GMRES A > 0 CG software and parallelisation ⇒ ⇓ no ⇓ ⇓ ill cond. yes SYMMLQ str indef no large im eig no Bi-CGSTAB ⇒ ⇒ ⇒ no yes yes ⇓ ⇓ ⇓ MINRES IDR no large im eig BiCGstab(ℓ) ⇐ yes Gerard Sleijpen and Martin van Gijzen a good precond itioner is available ⇓ the precond itioner is flex ible IDRstab November 29, 2017 A + A∗ is strongly indefinite A 1 has large imaginary eigenvalues November 29, 2017 2 National Master Course National Master Course Delft University of Technology Introduction Program We already saw that the performance of iterative methods can Preconditioning be improved by applying a preconditioner. Preconditioners (and • Diagonal scaling, Gauss-Seidel, SOR and SSOR deflation techniques) are a key to successful iterative methods. • Incomplete Choleski and Incomplete LU In general they are very problem dependent. • Deflation Today we will discuss some standard preconditioners and we will • Numerical software explain the idea behind deflation. • Parallelisation We will also discuss some efforts to standardise numerical • software. Shared memory versus distributed memory • Domain decomposition Finally we will discuss how to perform scientific computations on • a parallel computer. November 29, 2017 3 November 29, 2017 4 National Master Course National Master Course Preconditioning Why preconditioners? A preconditioned iterative solver solves the system − − M 1Ax = M 1b. 1 The matrix M is called the preconditioner. 0.8 The preconditioner should satisfy certain requirements: 0.6 Convergence should be (much) faster (in time) for the 0.4 • preconditioned system than for the original system.
    [Show full text]
  • Solving Symmetric Semi-Definite (Ill-Conditioned)
    Solving Symmetric Semi-definite (ill-conditioned) Generalized Eigenvalue Problems Zhaojun Bai University of California, Davis Berkeley, August 19, 2016 Symmetric definite generalized eigenvalue problem I Symmetric definite generalized eigenvalue problem Ax = λBx where AT = A and BT = B > 0 I Eigen-decomposition AX = BXΛ where Λ = diag(λ1; λ2; : : : ; λn) X = (x1; x2; : : : ; xn) XT BX = I: I Assume λ1 ≤ λ2 ≤ · · · ≤ λn LAPACK solvers I LAPACK routines xSYGV, xSYGVD, xSYGVX are based on the following algorithm (Wilkinson'65): 1. compute the Cholesky factorization B = GGT 2. compute C = G−1AG−T 3. compute symmetric eigen-decomposition QT CQ = Λ 4. set X = G−T Q I xSYGV[D,X] could be numerically unstable if B is ill-conditioned: −1 jλbi − λij . p(n)(kB k2kAk2 + cond(B)jλbij) · and −1 1=2 kB k2kAk2(cond(B)) + cond(B)jλbij θ(xbi; xi) . p(n) · specgapi I User's choice between the inversion of ill-conditioned Cholesky decomposition and the QZ algorithm that destroys symmetry Algorithms to address the ill-conditioning 1. Fix-Heiberger'72 (Parlett'80): explicit reduction 2. Chang-Chung Chang'74: SQZ method (QZ by Moler and Stewart'73) 3. Bunse-Gerstner'84: MDR method 4. Chandrasekaran'00: \proper pivoting scheme" 5. Davies-Higham-Tisseur'01: Cholesky+Jacobi 6. Working notes by Kahan'11 and Moler'14 This talk Three approaches: 1. A LAPACK-style implementation of Fix-Heiberger algorithm 2. An algebraic reformulation 3. Locally accelerated block preconditioned steepest descent (LABPSD) This talk Three approaches: 1. A LAPACK-style implementation of Fix-Heiberger algorithm Status: beta-release 2.
    [Show full text]