PFEAST: a High Performance Sparse Eigenvalue Solver Using Distributed-Memory Linear Solvers
Total Page:16
File Type:pdf, Size:1020Kb
PFEAST: A High Performance Sparse Eigenvalue Solver Using Distributed-Memory Linear Solvers James Kestyn∗, Vasileios Kalantzisy, Eric Polizzi∗, Yousef Saady ∗Electrical and Computer Engineering Department, University of Massachusetts, Amherst, MA, U.S.A. yComputer Science and Engineering Department, University of Minnesota, Minneapolis, MN, U.S.A. Abstract—The FEAST algorithm and eigensolver for interior computing interior eigenpairs that makes use of a rational eigenvalue problems naturally possesses three distinct levels filter obtained from an approximation of the spectral pro- of parallelism. The solver is then suited to exploit modern jector. FEAST can be applied for solving both standard and computer architectures containing many interconnected proces- sors. This paper highlights a recent development within the generalized forms of Hermitian or non-Hermitian problems, software package that allows the dominant computational task, and belongs to the family of contour integration eigensolvers solving a set of complex linear systems, to be performed with a [32], [33], [3], [14], [15], [4]. Once a given search interval distributed memory solver. The software, written with a reverse- is selected, FEAST’s main computational task consists of communication-interface, can now be interfaced with any generic a numerical quadrature computation that involves solving MPI linear-system solver using a customized data distribution for the eigenvector solutions. This work utilizes two common independent linear systems along a complex contour. The “black-box” distributed memory linear-systems solvers (Cluster- algorithm can exploit natural parallelism at three different MKL-Pardiso and MUMPS), as well as our own application- levels: (i) search intervals can be treated separately (no over- specific domain-decomposition MPI solver, for a collection of 3- lap), (ii) linear systems can be solved independently across dimensional finite-element systems. We discuss and analyze how the quadrature nodes of the complex contour, and (iii) each parallel resources can be placed at all three levels simultaneously in order to achieve good scalability and optimal use of the complex linear system with multiple right-hand-sides can be computing platform. solved in parallel. Within a parallel environment, the algorithm complexity becomes then directly dependent on solving a I. INTRODUCTION single linear system. Eigenvalue problems are widely used across a diverse range The FEAST numerical library offers ‘black-box’ reverse of high performance computing applications. A generalized communication interfaces (RCI) which are both matrix format problem is defined by two n × n matrices A and B, with the and linear system solver independent, and can then be fully set (A, B) known as the matrix pencil. The eigenvalues Λ and customized by the end users to allow maximum flexibility for eigenvectors X are non-trivial solutions to their applications. In addition, FEAST offers the following set AX = BXΛ: (1) of desirable features: (i) high-robustness and a well defined The problem is called ‘standard’ if B reduces to the identity convergence rate, (ii) all multiplicities naturally captured, (iii) matrix or ‘generalized’ otherwise. In spite of the enormous no explicit orthogonalization procedure on eigenvectors, and progress that has been made over the last few decades in (iv) a reusable subspace when solving a series of related algorithms and software packages that compute the solution eigenproblems. Consequently, the software package has been to large sparse eigenvalue problems, the current state-of-the- very well received by application developers, especially in art methods are facing new challenges for achieving ever the electronic structure community. A common technique to higher levels of efficiency, accuracy and performance on calculate the electronic structure and ground-state properties modern parallel architectures. In particular, traditional methods of molecules is Density Functional Theory (DFT) [20], where suffer from the orthogonalization of a very large basis when many eigenvalue problems must be solved within a self many eigenpairs are computed. In this case, a divide-and- consistent loop. The FEAST algorithm is an ideal candidate conquer approach that can compute wanted eigenpairs by to parallelize this step, speeding up the time-to-solution for parts, becomes mandatory since ‘windows’ or ‘slices’ of the these calculations and allowing for the investigation of very spectrum can be computed independently of one another and large molecules containing thousands of atoms. orthogonalization between eigenvectors in different slices is So far, the FEAST software has been limited to the use no longer necessary. In this framework, all the resulting sub- of shared-memory system solvers (at the third level of par- intervals are called interior eigenvalue problems, in the sense allelism). In this paper we extend the software to operate on that they involve large blocks of eigenpairs located anywhere distributed memory platforms and interface the fully parallel inside the spectrum. version of FEAST (PFEAST) with three different MPI linear- The FEAST algorithm [27] and associated software package system solvers: (i) Cluster-MKL-Pardiso [17], (ii) MUMPS [28], [9] is an accelerated subspace iterative technique for [1], and (iii) our own custom domain-decomposition solver for SC16; Salt Lake City, Utah, USA; November 2016 978-1-4673-8815-3/16/$31.00 c 2016 IEEE electronic structure calculations [24]. Three separate levels of Algorithm 1 The FEAST Algorithm ~ communication must be managed within the software kernel 1: input: A, B, Xm0 , f(zj ;!j )g1;:::;ne , and the multilevel parallel capabilities of the eigensolver result 2: while ( jjAX~m − BX~mΛmjj > ) do in a trade-off between memory and performance. Parallel re- 3: Qm0 = 0 sources can be placed at all three levels simultaneously in order 4: for ( j = 0; j < ne; j = j + 1 ) do (j) −1 ~ to achieve good scalability and optimal use of the computing 5: Qm0 (zj B − A) BXm0 (j) platform. This paper aims to highlight the flexibility of the 6: Qm0 Qm0 + !j Qm0 eigensolver and describe how the new PFEAST kernel can 7: end for H H be interfaced with different MPI linear-system solvers using 8: Aq = Q AQ Bq = Q BQ specific data distributions for input matrices and right-hand- 9: Solve AqWq = BqWqΛq ~ ~ side vectors. We then benchmark the eigensolvers performance 10: Xm0 = Qm0 Wq Λ = Λq with the three different solvers, showcase its scalability at each 11: end while level of parallelism, and discuss how to optimally distribute 12: output: X~m, Λ~ m parallel resources. II. THE FEAST ALGORITHM In practice, the spectral projector must be approximated using a quadrature rule using ne integration nodes and weights The FEAST algorithm utilizes spectral projection and f(zj;!j)gj=1;:::;n i.e. subspace iteration to obtain selected interior eigenpairs. A e n Rayleigh-Ritz procedure is used to project matrices A and Xe Q = ! (z B − A)−1BX~ ; (6) B onto a reduced search subspace to form matrices m0 j j m0 j=1 H H Aq = Q AQ and Bq = Q BQ: (2) where we also consider a search subspace of size m0 ≥ m. The computation of Q amounts to solving a set of n ~ ~ m0 e Approximate eigenvalues Λ and eigenvectors X of the orig- complex shifted linear-systems inal system (i.e. Ritz-values and Ritz-vectors) can then be n Xe recovered from the solutions of the much smaller eigenvalue (z B − A)Q(j) = BX~ Q = ! Q(j) : j m0 m0 with m0 j m0 (7) problem j=1 AqWq = BqWqΛq (3) This matrix Qm0 is then used as the Rayleigh-Ritz projec- A B as tion operator to form reduced matrices q and q of (2). If the exact spectral projector was known, solving the reduced eigen- X~ = QWq and Λ~ = Λq: (4) problem in (3) will produce the exact eigenvalues Λ~ = Λq = Λ and eigenvectors X~ = QW = X. However, since it is only Initializing X~ as a set of m random vectors and obtaining q m approximated, the Ritz-values Λ and updated Ritz-vectors X~ Q after QR factorization of X~ , results in a standard q m m are only an approximation to the true eigenpairs. Subspace subspace iteration (i.e. power method) that converges linearly iteration will then, in effect, tilt the subspace spanned by toward the dominant eigenpairs [31]. Many other sophisticated columns of X~ toward the desire eigenspace. At convergence Krylov-based methods have also been developed to improve we will obtain X~ = Q = X and Λ~ = Λ. The general outline the convergence rate for the calculation of selected smallest, can be seen in Algorithm 1 for computing m eigenpairs in a largest or interior eigenpairs [22], [19], [6], [36]. The subspace given search interval. The input X~ can be chosen as a set of iteration technique, in turn, can be efficiently used for solving m random vectors or a previously calculated solution to a the interior eigenvalue problem when it is combined with 0 closely related problem. filtering which aims to improve the convergence by increasing The purpose of this section is not to provide a thorough the gap between wanted eigenvalues and unwanted ones. It is understanding of the FEAST algorithm, but to give a general well known that Ritz-pairs (X~ , Λ~) converge toward the true m idea of the algorithmic steps involved. A detailed numer- eigenpairs (X , Λ) at a rate determined by the filter [31], m ical analysis can be found in [38]. Additional information [26], [38]. In theory, the ideal filter for the Hermitian problem regarding the application of FEAST to non-symmetric and would act as a projection operator X XH B onto the subspace m m non-Hermitian systems is available in [18], [37], [21], [40]. spanned by the eigenvector basis, which can be expressed via In particular, we would like to emphasize that the main the Cauchy integral formula: computational procedure within the algorithm is solving the I set of complex linear-systems in (7).