LARGE-SCALE COMPUTATION of PSEUDOSPECTRA USING ARPACK and EIGS∗ 1. Introduction. the Matrices in Many Eigenvalue Problems
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
CUDA 6 and Beyond
MUMPS USERS DAYS JUNE 1ST / 2ND 2017 Programming heterogeneous architectures with libraries: A survey of NVIDIA linear algebra libraries François Courteille |Principal Solutions Architect, NVIDIA |[email protected] ACKNOWLEDGEMENTS Joe Eaton , NVIDIA Ty McKercher , NVIDIA Lung Sheng Chien , NVIDIA Nikolay Markovskiy , NVIDIA Stan Posey , NVIDIA Steve Rennich , NVIDIA Dave Miles , NVIDIA Peng Wang, NVIDIA Questions: [email protected] 2 AGENDA Prolegomena NVIDIA Solutions for Accelerated Linear Algebra Libraries performance on Pascal Rapid software development for heterogeneous architecture 3 PROLEGOMENA 124 NVIDIA Gaming VR AI & HPC Self-Driving Cars GPU Computing 5 ONE ARCHITECTURE BUILT FOR BOTH DATA SCIENCE & COMPUTATIONAL SCIENCE AlexNet Training Performance 70x Pascal [CELLR ANGE] 60x 16nm FinFET 50x 40x CoWoS HBM2 30x 20x [CELLR ANGE] NVLink 10x [CELLR [CELLR ANGE] ANGE] 0x 2013 2014 2015 2016 cuDNN Pascal & Volta NVIDIA DGX-1 NVIDIA DGX SATURNV 65x in 3 Years 6 7 8 8 9 NVLINK TO CPU IBM Power Systems Server S822LC (codename “Minsky”) DDR4 DDR4 2x IBM Power8+ CPUs and 4x P100 GPUs 115GB/s 80 GB/s per GPU bidirectional for peer traffic IB P8+ CPU P8+ CPU IB 80 GB/s per GPU bidirectional to CPU P100 P100 P100 P100 115 GB/s CPU Memory Bandwidth Direct Load/store access to CPU Memory High Speed Copy Engines for bulk data movement 1615 UNIFIED MEMORY ON PASCAL Large datasets, Simple programming, High performance CUDA 8 Enable Large Oversubscribe GPU memory Pascal Data Models Allocate up to system memory size CPU GPU Higher Demand -
A Distributed and Parallel Asynchronous Unite and Conquer Method to Solve Large Scale Non-Hermitian Linear Systems Xinzhe Wu, Serge Petiton
A Distributed and Parallel Asynchronous Unite and Conquer Method to Solve Large Scale Non-Hermitian Linear Systems Xinzhe Wu, Serge Petiton To cite this version: Xinzhe Wu, Serge Petiton. A Distributed and Parallel Asynchronous Unite and Conquer Method to Solve Large Scale Non-Hermitian Linear Systems. HPC Asia 2018 - International Conference on High Performance Computing in Asia-Pacific Region, Jan 2018, Tokyo, Japan. 10.1145/3149457.3154481. hal-01677110 HAL Id: hal-01677110 https://hal.archives-ouvertes.fr/hal-01677110 Submitted on 8 Jan 2018 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. A Distributed and Parallel Asynchronous Unite and Conquer Method to Solve Large Scale Non-Hermitian Linear Systems Xinzhe WU Serge G. Petiton Maison de la Simulation, USR 3441 Maison de la Simulation, USR 3441 CRIStAL, UMR CNRS 9189 CRIStAL, UMR CNRS 9189 Université de Lille 1, Sciences et Technologies Université de Lille 1, Sciences et Technologies F-91191, Gif-sur-Yvette, France F-91191, Gif-sur-Yvette, France [email protected] [email protected] ABSTRACT the iteration number. These methods are already well implemented Parallel Krylov Subspace Methods are commonly used for solv- in parallel to profit from the great number of computation cores ing large-scale sparse linear systems. -
Implicitly Restarted Arnoldi/Lanczos Methods for Large Scale Eigenvalue Calculations
NASA Contractor Report 198342 /" ICASE Report No. 96-40 J ICA IMPLICITLY RESTARTED ARNOLDI/LANCZOS METHODS FOR LARGE SCALE EIGENVALUE CALCULATIONS Danny C. Sorensen NASA Contract No. NASI-19480 May 1996 Institute for Computer Applications in Science and Engineering NASA Langley Research Center Hampton, VA 23681-0001 Operated by Universities Space Research Association National Aeronautics and Space Administration Langley Research Center Hampton, Virginia 23681-0001 IMPLICITLY RESTARTED ARNOLDI/LANCZOS METHODS FOR LARGE SCALE EIGENVALUE CALCULATIONS Danny C. Sorensen 1 Department of Computational and Applied Mathematics Rice University Houston, TX 77251 sorensen@rice, edu ABSTRACT Eigenvalues and eigenfunctions of linear operators are important to many areas of ap- plied mathematics. The ability to approximate these quantities numerically is becoming increasingly important in a wide variety of applications. This increasing demand has fu- eled interest in the development of new methods and software for the numerical solution of large-scale algebraic eigenvalue problems. In turn, the existence of these new methods and software, along with the dramatically increased computational capabilities now avail- able, has enabled the solution of problems that would not even have been posed five or ten years ago. Until very recently, software for large-scale nonsymmetric problems was virtually non-existent. Fortunately, the situation is improving rapidly. The purpose of this article is to provide an overview of the numerical solution of large- scale algebraic eigenvalue problems. The focus will be on a class of methods called Krylov subspace projection methods. The well-known Lanczos method is the premier member of this class. The Arnoldi method generalizes the Lanczos method to the nonsymmetric case. -
Krylov Subspaces
Lab 1 Krylov Subspaces Lab Objective: Discuss simple Krylov Subspace Methods for finding eigenvalues and show some interesting applications. One of the biggest difficulties in computational linear algebra is the amount of memory needed to store a large matrix and the amount of time needed to read its entries. Methods using Krylov subspaces avoid this difficulty by studying how a matrix acts on vectors, making it unnecessary in many cases to create the matrix itself. More specifically, we can construct a Krylov subspace just by knowing how a linear transformation acts on vectors, and with these subspaces we can closely approximate eigenvalues of the transformation and solutions to associated linear systems. The Arnoldi iteration is an algorithm for finding an orthonormal basis of a Krylov subspace. Its outputs can also be used to approximate the eigenvalues of the original matrix. The Arnoldi Iteration The order-N Krylov subspace of A generated by x is 2 n−1 Kn(A; x) = spanfx;Ax;A x;:::;A xg: If the vectors fx;Ax;A2x;:::;An−1xg are linearly independent, then they form a basis for Kn(A; x). However, this basis is usually far from orthogonal, and hence computations using this basis will likely be ill-conditioned. One way to find an orthonormal basis for Kn(A; x) would be to use the modi- fied Gram-Schmidt algorithm from Lab TODO on the set fx;Ax;A2x;:::;An−1xg. More efficiently, the Arnold iteration integrates the creation of fx;Ax;A2x;:::;An−1xg with the modified Gram-Schmidt algorithm, returning an orthonormal basis for Kn(A; x). -
Iterative Methods for Linear System
Iterative methods for Linear System JASS 2009 Student: Rishi Patil Advisor: Prof. Thomas Huckle Outline ●Basics: Matrices and their properties Eigenvalues, Condition Number ●Iterative Methods Direct and Indirect Methods ●Krylov Subspace Methods Ritz Galerkin: CG Minimum Residual Approach : GMRES/MINRES Petrov-Gaelerkin Method: BiCG, QMR, CGS 1st April JASS 2009 2 Basics ●Linear system of equations Ax = b ●A Hermitian matrix (or self-adjoint matrix ) is a square matrix with complex entries which is equal to its own conjugate transpose, 3 2 + i that is, the element in the ith row and jth column is equal to the A = complex conjugate of the element in the jth row and ith column, for 2 − i 1 all indices i and j • Symmetric if a ij = a ji • Positive definite if, for every nonzero vector x XTAx > 0 • Quadratic form: 1 f( x) === xTAx −−− bTx +++ c 2 ∂ f( x) • Gradient of Quadratic form: ∂x 1 1 1 f' (x) = ⋮ = ATx + Ax − b ∂ 2 2 f( x) ∂xn 1st April JASS 2009 3 Various quadratic forms Positive-definite Negative-definite matrix matrix Singular Indefinite positive-indefinite matrix matrix 1st April JASS 2009 4 Various quadratic forms 1st April JASS 2009 5 Eigenvalues and Eigenvectors For any n×n matrix A, a scalar λ and a nonzero vector v that satisfy the equation Av =λv are said to be the eigenvalue and the eigenvector of A. ●If the matrix is symmetric , then the following properties hold: (a) the eigenvalues of A are real (b) eigenvectors associated with distinct eigenvalues are orthogonal ●The matrix A is positive definite (or positive semidefinite) if and only if all eigenvalues of A are positive (or nonnegative). -
Implicitly Restarted Arnoldi/Lanczos Methods for Large Scale Eigenvalue Calculations
https://ntrs.nasa.gov/search.jsp?R=19960048075 2020-06-16T03:31:45+00:00Z NASA Contractor Report 198342 /" ICASE Report No. 96-40 J ICA IMPLICITLY RESTARTED ARNOLDI/LANCZOS METHODS FOR LARGE SCALE EIGENVALUE CALCULATIONS Danny C. Sorensen NASA Contract No. NASI-19480 May 1996 Institute for Computer Applications in Science and Engineering NASA Langley Research Center Hampton, VA 23681-0001 Operated by Universities Space Research Association National Aeronautics and Space Administration Langley Research Center Hampton, Virginia 23681-0001 IMPLICITLY RESTARTED ARNOLDI/LANCZOS METHODS FOR LARGE SCALE EIGENVALUE CALCULATIONS Danny C. Sorensen 1 Department of Computational and Applied Mathematics Rice University Houston, TX 77251 sorensen@rice, edu ABSTRACT Eigenvalues and eigenfunctions of linear operators are important to many areas of ap- plied mathematics. The ability to approximate these quantities numerically is becoming increasingly important in a wide variety of applications. This increasing demand has fu- eled interest in the development of new methods and software for the numerical solution of large-scale algebraic eigenvalue problems. In turn, the existence of these new methods and software, along with the dramatically increased computational capabilities now avail- able, has enabled the solution of problems that would not even have been posed five or ten years ago. Until very recently, software for large-scale nonsymmetric problems was virtually non-existent. Fortunately, the situation is improving rapidly. The purpose of this article is to provide an overview of the numerical solution of large- scale algebraic eigenvalue problems. The focus will be on a class of methods called Krylov subspace projection methods. The well-known Lanczos method is the premier member of this class. -
The Influence of Orthogonality on the Arnoldi Method
View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Elsevier - Publisher Connector Linear Algebra and its Applications 309 (2000) 307±323 www.elsevier.com/locate/laa The in¯uence of orthogonality on the Arnoldi method T. Braconnier *, P. Langlois, J.C. Rioual IREMIA, Universite de la Reunion, DeÂpartement de Mathematiques et Informatique, BP 7151, 15 Avenue Rene Cassin, 97715 Saint-Denis Messag Cedex 9, La Reunion, France Received 1 February 1999; accepted 22 May 1999 Submitted by J.L. Barlow Abstract Many algorithms for solving eigenproblems need to compute an orthonormal basis. The computation is commonly performed using a QR factorization computed using the classical or the modi®ed Gram±Schmidt algorithm, the Householder algorithm, the Givens algorithm or the Gram±Schmidt algorithm with iterative reorthogonalization. For the eigenproblem, although textbooks warn users about the possible instability of eigensolvers due to loss of orthonormality, few theoretical results exist. In this paper we prove that the loss of orthonormality of the computed basis can aect the reliability of the computed eigenpair when we use the Arnoldi method. We also show that the stopping criterion based on the backward error and the value computed using the Arnoldi method can dier because of the loss of orthonormality of the computed basis of the Krylov subspace. We also give a bound which quanti®es this dierence in terms of the loss of orthonormality. Ó 2000 Elsevier Science Inc. All rights reserved. Keywords: QR factorization; Backward error; Eigenvalues; Arnoldi method 1. Introduction The aim of the most commonly used eigensolvers is to approximate a basis of an invariant subspace associated with some eigenvalues. -
Power Method and Krylov Subspaces
Power method and Krylov subspaces Introduction When calculating an eigenvalue of a matrix A using the power method, one starts with an initial random vector b and then computes iteratively the sequence Ab, A2b,...,An−1b normalising and storing the result in b on each iteration. The sequence converges to the eigenvector of the largest eigenvalue of A. The set of vectors 2 n−1 Kn = b, Ab, A b,...,A b , (1) where n < rank(A), is called the order-n Krylov matrix, and the subspace spanned by these vectors is called the order-n Krylov subspace. The vectors are not orthogonal but can be made so e.g. by Gram-Schmidt orthogonalisation. For the same reason that An−1b approximates the dominant eigenvector one can expect that the other orthogonalised vectors approximate the eigenvectors of the n largest eigenvalues. Krylov subspaces are the basis of several successful iterative methods in numerical linear algebra, in particular: Arnoldi and Lanczos methods for finding one (or a few) eigenvalues of a matrix; and GMRES (Generalised Minimum RESidual) method for solving systems of linear equations. These methods are particularly suitable for large sparse matrices as they avoid matrix-matrix opera- tions but rather multiply vectors by matrices and work with the resulting vectors and matrices in Krylov subspaces of modest sizes. Arnoldi iteration Arnoldi iteration is an algorithm where the order-n orthogonalised Krylov matrix Qn of a matrix A is built using stabilised Gram-Schmidt process: • start with a set Q = {q1} of one random normalised vector q1 • repeat for k =2 to n : – make a new vector qk = Aqk−1 † – orthogonalise qk to all vectors qi ∈ Q storing qi qk → hi,k−1 – normalise qk storing qk → hk,k−1 – add qk to the set Q By construction the matrix Hn made of the elements hjk is an upper Hessenberg matrix, h1,1 h1,2 h1,3 ··· h1,n h2,1 h2,2 h2,3 ··· h2,n 0 h3,2 h3,3 ··· h3,n Hn = , (2) . -
Finding Eigenvalues: Arnoldi Iteration and the QR Algorithm
Finding Eigenvalues: Arnoldi Iteration and the QR Algorithm Will Wheeler Algorithms Interest Group Jul 13, 2016 Outline Finding the largest eigenvalue I Largest eigenvalue power method I So much work for one eigenvector. What about others? I More eigenvectors orthogonalize Krylov matrix I build orthogonal list Arnoldi iteration Solving the eigenvalue problem I Eigenvalues diagonal of triangular matrix I Triangular matrix QR algorithm I QR algorithm QR decomposition I Better QR decomposition Hessenberg matrix I Hessenberg matrix Arnoldi algorithm Power Method How do we find the largest eigenvalue of an m × m matrix A? I Start with a vector b and make a power sequence: b; Ab; A2b;::: I Higher eigenvalues dominate: n n n n A (v1 + v2) = λ1v1 + λ2v2 ≈ λ1v1 Vector sequence (normalized) converged? I Yes: Eigenvector I No: Iterate some more Krylov Matrix Power method throws away information along the way. I Put the sequence into a matrix K = [b; Ab; A2b;:::; An−1b] (1) = [xn; xn−1; xn−2;:::; x1] (2) I Gram-Schmidt approximates first n eigenvectors v1 = x1 (3) x2 · x1 v2 = x2 − x1 (4) x1 · x1 x3 · x1 x3 · x2 v3 = x3 − x1 − x2 (5) x1 · x1 x2 · x2 ::: (6) I Why not orthogonalize as we build the list? Arnoldi Iteration I Start with a normalized vector q1 xk = Aqk−1 (next power) (7) k−1 X yk = xk − (qj · xk ) qj (Gram-Schmidt) (8) j=1 qk = yk = jyk j (normalize) (9) I Orthonormal vectors fq1;:::; qng span Krylov subspace n−1 (Kn = spanfq1; Aq1;:::; A q1g) I These are not the eigenvectors of A, but make a similarity y (unitary) transformation Hn = QnAQn Arnoldi Iteration I Start with a normalized vector q1 xk = Aqk−1 (next power) (10) k−1 X yk = xk − (qj · xk ) qj (Gram-Schmidt) (11) j=1 qk = yk =jyk j (normalize) (12) I Orthonormal vectors fq1;:::; qng span Krylov subspace n−1 (Kn = spanfq1; Aq1;:::; A q1g) I These are not the eigenvectors of A, but make a similarity y (unitary) transformation Hn = QnAQn Arnoldi Iteration - Construct Hn We can construct the matrix Hn along the way. -
A High Performance Implementation of Spectral Clustering on CPU-GPU Platforms
A High Performance Implementation of Spectral Clustering on CPU-GPU Platforms Yu Jin Joseph F. JaJa Institute for Advanced Computer Studies Institute for Advanced Computer Studies Department of Electrical and Computer Engineering Department of Electrical and Computer Engineering University of Maryland, College Park, USA University of Maryland, College Park, USA Email: [email protected] Email: [email protected] Abstract—Spectral clustering is one of the most popular graph CPUs, further boost the overall performance and are able clustering algorithms, which achieves the best performance for to achieve very high performance on problems whose sizes many scientific and engineering applications. However, existing grow up to the capacity of CPU memory [6, 7, 8, 9, 10, implementations in commonly used software platforms such as Matlab and Python do not scale well for many of the emerging 11]. In this paper, we present a hybrid implementation of the Big Data applications. In this paper, we present a fast imple- spectral clustering algorithm which significantly outperforms mentation of the spectral clustering algorithm on a CPU-GPU the known implementations, most of which are purely based heterogeneous platform. Our implementation takes advantage on multi-core CPUs. of the computational power of the multi-core CPU and the There have been reported efforts on parallelizing the spec- massive multithreading and SIMD capabilities of GPUs. Given the input as data points in high dimensional space, we propose tral clustering algorithm. Zheng et al. [12] presented both a parallel scheme to build a sparse similarity graph represented CUDA and OpenMP implementations of spectral clustering. in a standard sparse representation format. -
A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations Hasan Metin Aktulga, Md
1 A High Performance Block Eigensolver for Nuclear Configuration Interaction Calculations Hasan Metin Aktulga, Md. Afibuzzaman, Samuel Williams, Aydın Buluc¸, Meiyue Shao, Chao Yang, Esmond G. Ng, Pieter Maris, James P. Vary Abstract—As on-node parallelism increases and the performance gap between the processor and the memory system widens, achieving high performance in large-scale scientific applications requires an architecture-aware design of algorithms and solvers. We focus on the eigenvalue problem arising in nuclear Configuration Interaction (CI) calculations, where a few extreme eigenpairs of a sparse symmetric matrix are needed. We consider a block iterative eigensolver whose main computational kernels are the multiplication of a sparse matrix with multiple vectors (SpMM), and tall-skinny matrix operations. We present techniques to significantly improve the SpMM and the transpose operation SpMMT by using the compressed sparse blocks (CSB) format. We achieve 3–4× speedup on the requisite operations over good implementations with the commonly used compressed sparse row (CSR) format. We develop a performance model that allows us to correctly estimate the performance of our SpMM kernel implementations, and we identify cache bandwidth as a potential performance bottleneck beyond DRAM. We also analyze and optimize the performance of LOBPCG kernels (inner product and linear combinations on multiple vectors) and show up to 15× speedup over using high performance BLAS libraries for these operations. The resulting high performance LOBPCG solver achieves 1.4× to 1.8× speedup over the existing Lanczos solver on a series of CI computations on high-end multicore architectures (Intel Xeons). We also analyze the performance of our techniques on an Intel Xeon Phi Knights Corner (KNC) processor. -
Slepc Users Manual Scalable Library for Eigenvalue Problem Computations
Departamento de Sistemas Inform´aticos y Computaci´on Technical Report DSIC-II/24/02 SLEPc Users Manual Scalable Library for Eigenvalue Problem Computations https://slepc.upv.es Jose E. Roman Carmen Campos Lisandro Dalcin Eloy Romero Andr´es Tom´as To be used with slepc 3.15 March, 2021 Abstract This document describes slepc, the Scalable Library for Eigenvalue Problem Computations, a software package for the solution of large sparse eigenproblems on parallel computers. It can be used for the solution of various types of eigenvalue problems, including linear and nonlinear, as well as other related problems such as the singular value decomposition (see a summary of supported problem classes on page iii). slepc is a general library in the sense that it covers both Hermitian and non-Hermitian problems, with either real or complex arithmetic. The emphasis of the software is on methods and techniques appropriate for problems in which the associated matrices are large and sparse, for example, those arising after the discretization of partial differential equations. Thus, most of the methods offered by the library are projection methods, including different variants of Krylov and Davidson iterations. In addition to its own solvers, slepc provides transparent access to some external software packages such as arpack. These packages are optional and their installation is not required to use slepc, see x8.7 for details. Apart from the solvers, slepc also provides built-in support for some operations commonly used in the context of eigenvalue computations, such as preconditioning or the shift- and-invert spectral transformation. slepc is built on top of petsc, the Portable, Extensible Toolkit for Scientific Computation [Balay et al., 2021].