Preconditioned Eigensolver LOBPCG in Hypre and Petsc
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Conjugate Gradient Method (Part 4) Pre-Conditioning Nonlinear Conjugate Gradient Method
18-660: Numerical Methods for Engineering Design and Optimization Xin Li Department of ECE Carnegie Mellon University Pittsburgh, PA 15213 Slide 1 Overview Conjugate Gradient Method (Part 4) Pre-conditioning Nonlinear conjugate gradient method Slide 2 Conjugate Gradient Method Step 1: start from an initial guess X(0), and set k = 0 Step 2: calculate ( ) ( ) ( ) D 0 = R 0 = B − AX 0 Step 3: update solution D(k )T R(k ) X (k +1) = X (k ) + µ (k )D(k ) where µ (k ) = D(k )T AD(k ) Step 4: calculate residual ( + ) ( ) ( ) ( ) R k 1 = R k − µ k AD k Step 5: determine search direction R(k +1)T R(k +1) (k +1) = (k +1) + β (k ) β = D R k +1,k D where k +1,k (k )T (k ) D R Step 6: set k = k + 1 and go to Step 3 Slide 3 Convergence Rate k ( + ) κ(A) −1 ( ) X k 1 − X ≤ ⋅ X 0 − X κ(A) +1 Conjugate gradient method has slow convergence if κ(A) is large I.e., AX = B is ill-conditioned In this case, we want to improve convergence rate by pre- conditioning Slide 4 Pre-Conditioning Key idea Convert AX = B to another equivalent equation ÃX̃ = B̃ Solve ÃX̃ = B̃ by conjugate gradient method Important constraints to construct ÃX̃ = B̃ à is symmetric and positive definite – so that we can solve it by conjugate gradient method à has a small condition number – so that we can achieve fast convergence Slide 5 Pre-Conditioning AX = B −1 −1 L A⋅ X = L B L−1 AL−T ⋅ LT X = L−1B A X B ̃ ̃ ̃ L−1AL−T is symmetric and positive definite, if A is symmetric and positive definite T (L−1 AL−T ) = L−1 AL−T T X T L−1 AL−T X = (L−T X ) ⋅ A⋅(L−T X )> 0 Slide -
Recursive Approach in Sparse Matrix LU Factorization
51 Recursive approach in sparse matrix LU factorization Jack Dongarra, Victor Eijkhout and the resulting matrix is often guaranteed to be positive Piotr Łuszczek∗ definite or close to it. However, when the linear sys- University of Tennessee, Department of Computer tem matrix is strongly unsymmetric or indefinite, as Science, Knoxville, TN 37996-3450, USA is the case with matrices originating from systems of Tel.: +865 974 8295; Fax: +865 974 8296 ordinary differential equations or the indefinite matri- ces arising from shift-invert techniques in eigenvalue methods, one has to revert to direct methods which are This paper describes a recursive method for the LU factoriza- the focus of this paper. tion of sparse matrices. The recursive formulation of com- In direct methods, Gaussian elimination with partial mon linear algebra codes has been proven very successful in pivoting is performed to find a solution of Eq. (1). Most dense matrix computations. An extension of the recursive commonly, the factored form of A is given by means technique for sparse matrices is presented. Performance re- L U P Q sults given here show that the recursive approach may per- of matrices , , and such that: form comparable to leading software packages for sparse ma- LU = PAQ, (2) trix factorization in terms of execution time, memory usage, and error estimates of the solution. where: – L is a lower triangular matrix with unitary diago- nal, 1. Introduction – U is an upper triangular matrix with arbitrary di- agonal, Typically, a system of linear equations has the form: – P and Q are row and column permutation matri- Ax = b, (1) ces, respectively (each row and column of these matrices contains single a non-zero entry which is A n n A ∈ n×n x where is by real matrix ( R ), and 1, and the following holds: PPT = QQT = I, b n b, x ∈ n and are -dimensional real vectors ( R ). -
Arxiv:1911.09220V2 [Cs.MS] 13 Jul 2020
MFEM: A MODULAR FINITE ELEMENT METHODS LIBRARY ROBERT ANDERSON, JULIAN ANDREJ, ANDREW BARKER, JAMIE BRAMWELL, JEAN- SYLVAIN CAMIER, JAKUB CERVENY, VESELIN DOBREV, YOHANN DUDOUIT, AARON FISHER, TZANIO KOLEV, WILL PAZNER, MARK STOWELL, VLADIMIR TOMOV Lawrence Livermore National Laboratory, Livermore, USA IDO AKKERMAN Delft University of Technology, Netherlands JOHANN DAHM IBM Research { Almaden, Almaden, USA DAVID MEDINA Occalytics, LLC, Houston, USA STEFANO ZAMPINI King Abdullah University of Science and Technology, Thuwal, Saudi Arabia Abstract. MFEM is an open-source, lightweight, flexible and scalable C++ library for modular finite element methods that features arbitrary high-order finite element meshes and spaces, support for a wide variety of dis- cretization approaches and emphasis on usability, portability, and high-performance computing efficiency. MFEM's goal is to provide application scientists with access to cutting-edge algorithms for high-order finite element mesh- ing, discretizations and linear solvers, while enabling researchers to quickly and easily develop and test new algorithms in very general, fully unstructured, high-order, parallel and GPU-accelerated settings. In this paper we describe the underlying algorithms and finite element abstractions provided by MFEM, discuss the software implementation, and illustrate various applications of the library. arXiv:1911.09220v2 [cs.MS] 13 Jul 2020 1. Introduction The Finite Element Method (FEM) is a powerful discretization technique that uses general unstructured grids to approximate the solutions of many partial differential equations (PDEs). It has been exhaustively studied, both theoretically and in practice, in the past several decades [1, 2, 3, 4, 5, 6, 7, 8]. MFEM is an open-source, lightweight, modular and scalable software library for finite elements, featuring arbitrary high-order finite element meshes and spaces, support for a wide variety of discretization approaches and emphasis on usability, portability, and high-performance computing (HPC) efficiency [9]. -
Abstract 1 Introduction
Implementation in ScaLAPACK of Divide-and-Conquer Algorithms for Banded and Tridiagonal Linear Systems A. Cleary Department of Computer Science University of Tennessee J. Dongarra Department of Computer Science University of Tennessee Mathematical Sciences Section Oak Ridge National Laboratory Abstract Described hereare the design and implementation of a family of algorithms for a variety of classes of narrow ly banded linear systems. The classes of matrices include symmetric and positive de - nite, nonsymmetric but diagonal ly dominant, and general nonsymmetric; and, al l these types are addressed for both general band and tridiagonal matrices. The family of algorithms captures the general avor of existing divide-and-conquer algorithms for banded matrices in that they have three distinct phases, the rst and last of which arecompletely paral lel, and the second of which is the par- al lel bottleneck. The algorithms have been modi ed so that they have the desirable property that they are the same mathematical ly as existing factorizations Cholesky, Gaussian elimination of suitably reordered matrices. This approach represents a departure in the nonsymmetric case from existing methods, but has the practical bene ts of a smal ler and more easily hand led reduced system. All codes implement a block odd-even reduction for the reduced system that al lows the algorithm to scale far better than existing codes that use variants of sequential solution methods for the reduced system. A cross section of results is displayed that supports the predicted performance results for the algo- rithms. Comparison with existing dense-type methods shows that for areas of the problem parameter space with low bandwidth and/or high number of processors, the family of algorithms described here is superior. -
Overview of Iterative Linear System Solver Packages
Overview of Iterative Linear System Solver Packages Victor Eijkhout July, 1998 Abstract Description and comparison of several packages for the iterative solu- tion of linear systems of equations. 1 1 Intro duction There are several freely available packages for the iterative solution of linear systems of equations, typically derived from partial di erential equation prob- lems. In this rep ort I will give a brief description of a numberofpackages, and giveaninventory of their features and de ning characteristics. The most imp ortant features of the packages are which iterative metho ds and preconditioners supply; the most relevant de ning characteristics are the interface they present to the user's data structures, and their implementation language. 2 2 Discussion Iterative metho ds are sub ject to several design decisions that a ect ease of use of the software and the resulting p erformance. In this section I will give a global discussion of the issues involved, and how certain p oints are addressed in the packages under review. 2.1 Preconditioners A go o d preconditioner is necessary for the convergence of iterative metho ds as the problem to b e solved b ecomes more dicult. Go o d preconditioners are hard to design, and this esp ecially holds true in the case of parallel pro cessing. Here is a short inventory of the various kinds of preconditioners found in the packages reviewed. 2.1.1 Ab out incomplete factorisation preconditioners Incomplete factorisations are among the most successful preconditioners devel- op ed for single-pro cessor computers. Unfortunately, since they are implicit in nature, they cannot immediately b e used on parallel architectures. -
Accelerating the LOBPCG Method on Gpus Using a Blocked Sparse Matrix Vector Product
Accelerating the LOBPCG method on GPUs using a blocked Sparse Matrix Vector Product Hartwig Anzt and Stanimire Tomov and Jack Dongarra Innovative Computing Lab University of Tennessee Knoxville, USA Email: [email protected], [email protected], [email protected] Abstract— the computing power of today’s supercomputers, often accel- erated by coprocessors like graphics processing units (GPUs), This paper presents a heterogeneous CPU-GPU algorithm design and optimized implementation for an entire sparse iter- becomes challenging. ative eigensolver – the Locally Optimal Block Preconditioned Conjugate Gradient (LOBPCG) – starting from low-level GPU While there exist numerous efforts to adapt iterative lin- data structures and kernels to the higher-level algorithmic choices ear solvers like Krylov subspace methods to coprocessor and overall heterogeneous design. Most notably, the eigensolver technology, sparse eigensolvers have so far remained out- leverages the high-performance of a new GPU kernel developed side the main focus. A possible explanation is that many for the simultaneous multiplication of a sparse matrix and a of those combine sparse and dense linear algebra routines, set of vectors (SpMM). This is a building block that serves which makes porting them to accelerators more difficult. Aside as a backbone for not only block-Krylov, but also for other from the power method, algorithms based on the Krylov methods relying on blocking for acceleration in general. The subspace idea are among the most commonly used general heterogeneous LOBPCG developed here reveals the potential of eigensolvers [1]. When targeting symmetric positive definite this type of eigensolver by highly optimizing all of its components, eigenvalue problems, the recently developed Locally Optimal and can be viewed as a benchmark for other SpMM-dependent applications. -
Hybrid Algorithms for Efficient Cholesky Decomposition And
Hybrid Algorithms for Efficient Cholesky Decomposition and Matrix Inverse using Multicore CPUs with GPU Accelerators Gary Macindoe A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy of UCL. 2013 ii I, Gary Macindoe, confirm that the work presented in this thesis is my own. Where information has been derived from other sources, I confirm that this has been indicated in the thesis. Signature : Abstract The use of linear algebra routines is fundamental to many areas of computational science, yet their implementation in software still forms the main computational bottleneck in many widely used algorithms. In machine learning and computational statistics, for example, the use of Gaussian distributions is ubiquitous, and routines for calculating the Cholesky decomposition, matrix inverse and matrix determinant must often be called many thousands of times for com- mon algorithms, such as Markov chain Monte Carlo. These linear algebra routines consume most of the total computational time of a wide range of statistical methods, and any improve- ments in this area will therefore greatly increase the overall efficiency of algorithms used in many scientific application areas. The importance of linear algebra algorithms is clear from the substantial effort that has been invested over the last 25 years in producing low-level software libraries such as LAPACK, which generally optimise these linear algebra routines by breaking up a large problem into smaller problems that may be computed independently. The performance of such libraries is however strongly dependent on the specific hardware available. LAPACK was originally de- veloped for single core processors with a memory hierarchy, whereas modern day computers often consist of mixed architectures, with large numbers of parallel cores and graphics process- ing units (GPU) being used alongside traditional CPUs. -
Matrix Inversion Using Cholesky Decomposition
Matrix Inversion Using Cholesky Decomposition Aravindh Krishnamoorthy, Deepak Menon ST-Ericsson India Private Limited, Bangalore [email protected], [email protected] Abstract—In this paper we present a method for matrix inversion Upper triangular elements, i.e. : based on Cholesky decomposition with reduced number of operations by avoiding computation of intermediate results; further, we use fixed point simulations to compare the numerical ( ∑ ) accuracy of the method. … (4) Keywords-matrix, inversion, Cholesky, LDL. Note that since older values of aii aren’t required for computing newer elements, they may be overwritten by the I. INTRODUCTION value of rii, hence, the algorithm may be performed in-place Matrix inversion techniques based on Cholesky using the same memory for matrices A and R. decomposition and the related LDL decomposition are efficient Cholesky decomposition is of order and requires techniques widely used for inversion of positive- operations. Matrix inversion based on Cholesky definite/symmetric matrices across multiple fields. decomposition is numerically stable for well conditioned Existing matrix inversion algorithms based on Cholesky matrices. decomposition use either equation solving [3] or triangular matrix operations [4] with most efficient implementation If , with is the linear system with requiring operations. variables, and satisfies the requirement for Cholesky decomposition, we can rewrite the linear system as In this paper we propose an inversion algorithm which … (5) reduces the number of operations by 16-17% compared to the existing algorithms by avoiding computation of some known By letting , we have intermediate results. … (6) In section 2 of this paper we review the Cholesky and LDL decomposition techniques, and discuss solutions to linear and systems based on them. -
MFEM: a Modular Finite Element Methods Library
MFEM: A Modular Finite Element Methods Library Robert Anderson1, Andrew Barker1, Jamie Bramwell1, Jakub Cerveny2, Johann Dahm3, Veselin Dobrev1,YohannDudouit1, Aaron Fisher1,TzanioKolev1,MarkStowell1,and Vladimir Tomov1 1Lawrence Livermore National Laboratory 2University of West Bohemia 3IBM Research July 2, 2018 Abstract MFEM is a free, lightweight, flexible and scalable C++ library for modular finite element methods that features arbitrary high-order finite element meshes and spaces, support for a wide variety of discretization approaches and emphasis on usability, portability, and high-performance computing efficiency. Its mission is to provide application scientists with access to cutting-edge algorithms for high-order finite element meshing, discretizations and linear solvers. MFEM also enables researchers to quickly and easily develop and test new algorithms in very general, fully unstructured, high-order, parallel settings. In this paper we describe the underlying algorithms and finite element abstractions provided by MFEM, discuss the software implementation, and illustrate various applications of the library. Contents 1 Introduction 3 2 Overview of the Finite Element Method 4 3Meshes 9 3.1 Conforming Meshes . 10 3.2 Non-Conforming Meshes . 11 3.3 NURBS Meshes . 12 3.4 Parallel Meshes . 12 3.5 Supported Input and Output Formats . 13 1 4 Finite Element Spaces 13 4.1 FiniteElements....................................... 14 4.2 DiscretedeRhamComplex ................................ 16 4.3 High-OrderSpaces ..................................... 17 4.4 Visualization . 18 5 Finite Element Operators 18 5.1 DiscretizationMethods................................... 18 5.2 FiniteElementLinearSystems . 19 5.3 Operator Decomposition . 23 5.4 High-Order Partial Assembly . 25 6 High-Performance Computing 27 6.1 Parallel Meshes, Spaces, and Operators . 27 6.2 Scalable Linear Solvers . -
AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 3: Positive-Definite Systems; Cholesky Factorization
AMS526: Numerical Analysis I (Numerical Linear Algebra) Lecture 3: Positive-Definite Systems; Cholesky Factorization Xiangmin Jiao Stony Brook University Xiangmin Jiao Numerical Analysis I 1 / 11 Symmetric Positive-Definite Matrices n×n Symmetric matrix A 2 R is symmetric positive definite (SPD) if T n x Ax > 0 for x 2 R nf0g n×n Hermitian matrix A 2 C is Hermitian positive definite (HPD) if ∗ n x Ax > 0 for x 2 C nf0g SPD matrices have positive real eigenvalues and orthogonal eigenvectors Note: Most textbooks only talk about SPD or HPD matrices, but a positive-definite matrix does not need to be symmetric or Hermitian! A real matrix A is positive definite iff A + AT is SPD T n If x Ax ≥ 0 for x 2 R nf0g, then A is said to be positive semidefinite Xiangmin Jiao Numerical Analysis I 2 / 11 Properties of Symmetric Positive-Definite Matrices SPD matrix often arises as Hessian matrix of some convex functional I E.g., least squares problems; partial differential equations If A is SPD, then A is nonsingular Let X be any n × m matrix with full rank and n ≥ m. Then T I X X is symmetric positive definite, and T I XX is symmetric positive semidefinite n×m If A is n × n SPD and X 2 R has full rank and n ≥ m, then X T AX is SPD Any principal submatrix (picking some rows and corresponding columns) of A is SPD and aii > 0 Xiangmin Jiao Numerical Analysis I 3 / 11 Cholesky Factorization If A is symmetric positive definite, then there is factorization of A A = RT R where R is upper triangular, and all its diagonal entries are positive Note: Textbook calls is “Cholesky -
A Geometric Theory for Preconditioned Inverse Iteration. III: a Short and Sharp Convergence Estimate for Generalized Eigenvalue Problems
A geometric theory for preconditioned inverse iteration. III: A short and sharp convergence estimate for generalized eigenvalue problems. Andrew V. Knyazev Department of Mathematics, University of Colorado at Denver, P.O. Box 173364, Campus Box 170, Denver, CO 80217-3364 1 Klaus Neymeyr Mathematisches Institut der Universit¨atT¨ubingen,Auf der Morgenstelle 10, 72076 T¨ubingen,Germany 2 Abstract In two previous papers by Neymeyr: A geometric theory for preconditioned inverse iteration I: Extrema of the Rayleigh quotient, LAA 322: (1-3), 61-85, 2001, and A geometric theory for preconditioned inverse iteration II: Convergence estimates, LAA 322: (1-3), 87-104, 2001, a sharp, but cumbersome, convergence rate estimate was proved for a simple preconditioned eigensolver, which computes the smallest eigenvalue together with the corresponding eigenvector of a symmetric positive def- inite matrix, using a preconditioned gradient minimization of the Rayleigh quotient. In the present paper, we discover and prove a much shorter and more elegant, but still sharp in decisive quantities, convergence rate estimate of the same method that also holds for a generalized symmetric definite eigenvalue problem. The new estimate is simple enough to stimulate a search for a more straightforward proof technique that could be helpful to investigate such practically important method as the locally optimal block preconditioned conjugate gradient eigensolver. We demon- strate practical effectiveness of the latter for a model problem, where it compares favorably with two -
16 Preconditioning
16 Preconditioning The general idea underlying any preconditioning procedure for iterative solvers is to modify the (ill-conditioned) system Ax = b in such a way that we obtain an equivalent system Aˆxˆ = bˆ for which the iterative method converges faster. A standard approach is to use a nonsingular matrix M, and rewrite the system as M −1Ax = M −1b. The preconditioner M needs to be chosen such that the matrix Aˆ = M −1A is better conditioned for the conjugate gradient method, or has better clustered eigenvalues for the GMRES method. 16.1 Preconditioned Conjugate Gradients We mentioned earlier that the number of iterations required for the conjugate gradient algorithm to converge is proportional to pκ(A). Thus, for poorly conditioned matrices, convergence will be very slow. Thus, clearly we will want to choose M such that κ(Aˆ) < κ(A). This should result in faster convergence. How do we find Aˆ, xˆ, and bˆ? In order to ensure symmetry and positive definiteness of Aˆ we let M −1 = LLT (44) with a nonsingular m × m matrix L. Then we can rewrite Ax = b ⇐⇒ M −1Ax = M −1b ⇐⇒ LT Ax = LT b ⇐⇒ LT AL L−1x = LT b . | {z } | {z } |{z} =Aˆ =xˆ =bˆ The symmetric positive definite matrix M is called splitting matrix or preconditioner, and it can easily be verified that Aˆ is symmetric positive definite, also. One could now formally write down the standard CG algorithm with the new “hat- ted” quantities. However, the algorithm is more efficient if the preconditioning is incorporated directly into the iteration. To see what this means we need to examine every single line in the CG algorithm.