A Testing Infrastructure for Lapack's
Total Page:16
File Type:pdf, Size:1020Kb
Load more
										Recommended publications
									
								- 
												  High-Performance Parallel Computations Using Python As High-Level LanguageAachen Institute for Advanced Study in Computational Engineering Science Preprint: AICES-2010/08-01 23/August/2010 High-Performance Parallel Computations using Python as High-Level Language Stefano Masini and Paolo Bientinesi Financial support from the Deutsche Forschungsgemeinschaft (German Research Association) through grant GSC 111 is gratefully acknowledged. ©Stefano Masini and Paolo Bientinesi 2010. All rights reserved List of AICES technical reports: http://www.aices.rwth-aachen.de/preprints High-Performance Parallel Computations using Python as High-Level Language Stefano Masini⋆ and Paolo Bientinesi⋆⋆ RWTH Aachen, AICES, Aachen, Germany Abstract. High-performance and parallel computations have always rep- resented a challenge in terms of code optimization and memory usage, and have typically been tackled with languages that allow a low-level management of resources, like Fortran, C and C++. Nowadays, most of the implementation effort goes into constructing the bookkeeping logic that binds together functionalities taken from standard libraries. Because of the increasing complexity of this kind of codes, it becomes more and more necessary to keep it well organized through proper software en- gineering practices. Indeed, in the presence of chaotic implementations, reasoning about correctness is difficult, even when limited to specific aspects like concurrency; moreover, due to the lack in flexibility of the code, making substantial changes for experimentation becomes a grand challenge. Since the bookkeeping logic only accounts for a tiny fraction of the total execution time, we believe that for such a task it can be afforded to introduce an overhead due to a high-level language. We consider Python as a preliminary candidate with the intent of improving code readability, flexibility and, in turn, the level of confidence with respect to correctness.
- 
												  Iterative Refinement for Symmetric Eigenvalue Decomposition IIJapan Journal of Industrial and Applied Mathematics (2019) 36:435–459 https://doi.org/10.1007/s13160-019-00348-4 ORIGINAL PAPER Iterative refinement for symmetric eigenvalue decomposition II: clustered eigenvalues Takeshi Ogita1 · Kensuke Aishima2 Received: 22 October 2018 / Revised: 5 February 2019 / Published online: 22 February 2019 © The Author(s) 2019 Abstract We are concerned with accurate eigenvalue decomposition of a real symmetric matrix A. In the previous paper (Ogita and Aishima in Jpn J Ind Appl Math 35(3): 1007–1035, 2018), we proposed an efficient refinement algorithm for improving the accuracy of all eigenvectors, which converges quadratically if a sufficiently accurate initial guess is given. However, since the accuracy of eigenvectors depends on the eigenvalue gap, it is difficult to provide such an initial guess to the algorithm in the case where A has clustered eigenvalues. To overcome this problem, we propose a novel algorithm that can refine approximate eigenvectors corresponding to clustered eigenvalues on the basis of the algorithm proposed in the previous paper. Numerical results are presented showing excellent performance of the proposed algorithm in terms of convergence rate and overall computational cost and illustrating an application to a quantum materials simulation. Keywords Accurate numerical algorithm · Iterative refinement · Symmetric eigenvalue decomposition · Clustered eigenvalues Mathematics Subject Classification 65F15 · 15A18 · 15A23 This study was partially supported by CREST, JST and JSPS KAKENHI Grant numbers 16H03917, 25790096. B Takeshi Ogita [email protected] Kensuke Aishima [email protected] 1 Division of Mathematical Sciences, School of Arts and Sciences, Tokyo Woman’s Christian University, 2-6-1 Zempukuji, Suginami-ku, Tokyo 167-8585, Japan 2 Faculty of Computer and Information Sciences, Hosei University, 3-7-2 Kajino-cho, Koganei-shi, Tokyo 184-8584, Japan 123 436 T.
- 
												  LAPACK WORKING NOTE 195: SCALAPACK's MRRR ALGORITHM 1. Introduction. Since 2005, the National Science Foundation Has Been FundLAPACK WORKING NOTE 195: SCALAPACK’S MRRR ALGORITHM CHRISTOF VOMEL¨ ∗ Abstract. The sequential algorithm of Multiple Relatively Robust Representations, MRRR, can compute numerically orthogonal eigenvectors of an unreduced symmetric tridiagonal matrix T ∈ Rn×n with O(n2) cost. This paper describes the design of ScaLAPACK’s parallel MRRR algorithm. One emphasis is on the critical role of the representation tree in achieving both numerical accuracy and parallel scalability. A second point concerns the favorable properties of this code: subset computation, the use of static memory, and scalability. Unlike ScaLAPACK’s Divide & Conquer and QR, MRRR can compute subsets of eigenpairs at reduced cost. And in contrast to inverse iteration which can fail, it is guaranteed to produce a numerically satisfactory answer while maintaining memory scalability. ParEig, the parallel MRRR algorithm for PLAPACK, uses dynamic memory allocation. This is avoided by our code at marginal additional cost. We also use a different representation tree criterion that allows for more accurate computation of the eigenvectors but can make parallelization more difficult. AMS subject classifications. 65F15, 65Y15. Key words. Symmetric eigenproblem, multiple relatively robust representations, ScaLAPACK, numerical software. 1. Introduction. Since 2005, the National Science Foundation has been funding an initiative [19] to improve the LAPACK [3] and ScaLAPACK [14] libraries for Numerical Linear Algebra computations. One of the ambitious goals of this initiative is to put more of LAPACK into ScaLAPACK, recognizing the gap between available sequential and parallel software support. In particular, parallelization of the MRRR algorithm [24, 50, 51, 26, 27, 28, 29], the topic of this paper, is identified as one key improvement to ScaLAPACK.
- 
												  Numerical Linear Algebra Texts in Applied Mathematics 55Grégoire Allaire 55 Sidi Mahmoud Kaber TEXTS IN APPLIED MATHEMATICS Numerical Linear Algebra Texts in Applied Mathematics 55 Editors J.E. Marsden L. Sirovich S.S. Antman Advisors G. Iooss P. Holmes D. Barkley M. Dellnitz P. Newton Texts in Applied Mathematics 1. Sirovich: Introduction to Applied Mathematics. 2. Wiggins: Introduction to Applied Nonlinear Dynamical Systems and Chaos. 3. Hale/Koc¸ak: Dynamics and Bifurcations. 4. Chorin/Marsden: A Mathematical Introduction to Fluid Mechanics, 3rd ed. 5. Hubbard/West: Differential Equations: A Dynamical Systems Approach: Ordinary Differential Equations. 6. Sontag: Mathematical Control Theory: Deterministic Finite Dimensional Systems, 2nd ed. 7. Perko: Differential Equations and Dynamical Systems, 3rd ed. 8. Seaborn: Hypergeometric Functions and Their Applications. 9. Pipkin: A Course on Integral Equations. 10. Hoppensteadt/Peskin: Modeling and Simulation in Medicine and the Life Sciences, 2nd ed. 11. Braun: Differential Equations and Their Applications, 4th ed. 12. Stoer/Bulirsch: Introduction to Numerical Analysis, 3rd ed. 13. Renardy/Rogers: An Introduction to Partial Differential Equations. 14. Banks: Growth and Diffusion Phenomena: Mathematical Frameworks and Applications. 15. Brenner/Scott: The Mathematical Theory of Finite Element Methods, 2nd ed. 16. Van de Velde: Concurrent Scientific Computing. 17. Marsden/Ratiu: Introduction to Mechanics and Symmetry, 2nd ed. 18. Hubbard/West: Differential Equations: A Dynamical Systems Approach: Higher-Dimensional Systems. 19. Kaplan/Glass: Understanding Nonlinear Dynamics. 20. Holmes: Introduction to Perturbation Methods. 21. Curtain/Zwart: An Introduction to Infinite-Dimensional Linear Systems Theory. 22. Thomas: Numerical Partial Differential Equations: Finite Difference Methods. 23. Taylor: Partial Differential Equations: Basic Theory. 24. Merkin: Introduction to the Theory of Stability of Motion.
- 
												  An Implementation of the MRRR Algorithm on a Data-Parallel CoprocessorAn Implementation of the MRRR Algorithm on a Data-Parallel Coprocessor Christian Lessig∗ Abstract The Algorithm of Multiple Relatively Robust Representations (MRRRR) is one of the most efficient and most accurate solvers for the symmetric tridiagonal eigenvalue problem. We present an implementation of the MRRR algorithm on a data-parallel coprocessor using the CUDA pro- gramming environment. We obtain up to 50-fold speedups over LA- PACK’s MRRR implementation and demonstrate that the algorithm can be mapped efficiently onto a data-parallel architecture. The accuracy of the obtained results is currently inferior to LAPACK’s but we believe that the problem can be overcome in the near future. 1 Introduction 1 n×n The eigenvalue problem Aui = λiui for a real symmetric matrix A ∈ R with eigen- n values λi ∈ R and (right) eigenvectors ui ∈ R is important in many disciplines. This lead to the development of a variety of algorithms for the symmetric eigenvalue prob- lem [33, 51]. The efficacy of these eigen-solver depends on the required computational resources and the accuracy of the obtained results. With the advent of mainstream parallel architectures such as multi-core CPUs and data-parallel coprocessors also the amenability to parallelization is becoming an important property. Many eigen-solver are however difficult to parallelize. Notable exceptions are the Divide and Conquer algorithm [14] and the Algorithm of Multiple Relatively Robust Represen- tations (MRRR) [21] for which efficient parallel implementations exist. Both algorithms provide additionally highly accurate results [19]. The computational costs of the Divide and Conquer algorithm are with O(n3) however significantly higher than the O(n2) op- erations required by the MRRR algorithm.
- 
												  A Survey of Recent Developments in Parallel Implementations of Gaussian EliminationCONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE Concurrency Computat.: Pract. Exper. (2014) Published online in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/cpe.3306 A survey of recent developments in parallel implementations of Gaussian elimination Simplice Donfack1, Jack Dongarra1, Mathieu Faverge2, Mark Gates1, Jakub Kurzak1,*,†, Piotr Luszczek1 and Ichitaro Yamazaki1 1EECS Department, University of Tennessee, Knoxville, TN, USA 2IPB ENSEIRB-Matmeca, Inria Bordeaux Sud-Ouest, Bordeaux, France SUMMARY Gaussian elimination is a canonical linear algebra procedure for solving linear systems of equations. In the last few years, the algorithm has received a lot of attention in an attempt to improve its parallel performance. This article surveys recent developments in parallel implementations of Gaussian elimination for shared memory architecture. Five different flavors are investigated. Three of them are based on different strategies for pivoting: partial pivoting, incremental pivoting, and tournament pivoting. The fourth one replaces pivoting with the Partial Random Butterfly Transformation, and finally, an implementation without pivoting is used as a performance baseline. The technique of iterative refinement is applied to recover numerical accuracy when necessary. All parallel implementations are produced using dynamic, superscalar, runtime scheduling and tile matrix layout. Results on two multisocket multicore systems are presented. Performance and numerical accuracy is analyzed. Copyright © 2014 John Wiley & Sons, Ltd. Received 15 July 2013; Revised 26 April 2014; Accepted 2 May 2014 KEY WORDS: Gaussian elimination; LU factorization; parallel; shared memory; multicore 1. INTRODUCTION Gaussian elimination has a long history that can be traced back some 2000 years [1]. Today, dense systems of linear equations have become a critical cornerstone for some of the most compute intensive applications.
- 
												  GLUED MATRICES and the MRRR ALGORITHM 1. IntroductionSIAM J. SCI. COMPUT. c 2005 Society for Industrial and Applied Mathematics Vol. 27, No. 2, pp. 496–510 GLUED MATRICES AND THE MRRR ALGORITHM∗ † ‡ § INDERJIT S. DHILLON , BERESFORD N. PARLETT , AND CHRISTOF VOMEL¨ Abstract. During the last ten years, Dhillon and Parlett devised a new algorithm (multiple relatively robust representations (MRRR)) for computing numerically orthogonal eigenvectors of a symmetric tridiagonal matrix T with O(n2) cost. It has been incorporated into LAPACK version 3.0 as routine stegr. We have discovered that the MRRR algorithm can fail in extreme cases. Sometimes eigenvalues agree to working accuracy and MRRR cannot compute orthogonal eigenvectors for them. In this paper, we describe and analyze these failures and various remedies. Key words. multiple relatively robust representations, numerically orthogonal eigenvectors, symmetric tridiagonal matrix, tight clusters of eigenvalues, glued matrices, Wilkinson matrices AMS subject classifications. 15A18, 15A23 DOI. 10.1137/040620746 1. Introduction. Starting in the mid-90s, Dhillon and Parlett developed the algorithm of multiple relatively robust representations (MRRR) that computes nu- merically orthogonal eigenvectors of a symmetric tridiagonal matrix T with O(n2) cost [13, 15, 16, 17, 7]. This algorithm was tested on a large and challenging set of matrices and has been incorporated into LAPACK version 3.0 as routine stegr. The algorithm is described, in the detail we need here, in section 2. For a more detailed description see also [8]. An example of MRRR in action is given in section 3. In 2003, one of us (V¨omel) came to Berkeley to assist in the modification of stegr to compute a subset of k eigenpairs with O(kn) operations.
- 
												  Fortran Optimizing Compiler 6Numerical Libraries for General Interest Seminars 2015 Scientific Computing April 1, 2015 Ge Baolai SHARCNET Faculty of Science Western University Topics . Numerical libraries for scientific computing . Numerical libraries at SHARCNET . Examples – Linear algebra – FFTW – ODE solvers – Optimization – GSL – Multiprecision with MPFUN – PETSc (PDE solvers) . Q&A Overview Numerical Libraries SEMINARS 2015 Numerical Computing . Linear algebra . Nonlinear equations . Optimization . Interpolation/Approximation . Integration and differentiation . Solving ODEs . Solving PDEs . FFT . Random numbers and stochastic simulations . Special functions Copyright © 2001-2015 Western University Seminar Series on Scientific and High Performance Computing, London, Ontario, 2015 Numerical Libraries SEMINARS 2015 More fundamental problems: . Linear algebra . Nonlinear equations . Numerical integration . ODE . FFT . Random numbers . Special functions Copyright © 2001-2015 Western University Seminar Series on Scientific and High Performance Computing, London, Ontario, 2015 Numerical Libraries SEMINARS 2015 Top Ten Algorithms for Science (Jack Dongarra 2000) 1. Metropolis Algorithm for Monte Carlo 2. Simplex Method for Linear Programming 3. Krylov Subspace Iteration Methods 4. The Decompositional Approach to Matrix Computations 5. The Fortran Optimizing Compiler 6. QR Algorithm for Computing Eigenvalues 7. Quicksort Algorithm for Sorting 8. Fast Fourier Transform 9. Integer Relation Detection 10. Fast Multipole Method Copyright © 2001-2015 Western University Seminar
- 
												  A − Λb Be an N-By-N Regular Matrix Pencil with NElectronic Journal of Linear Algebra ISSN 1081-3810 A publication of the International Linear Algebra Society Volume 30, pp. 632-648, October 2015 ELA ON WILKINSON’S PROBLEM FOR MATRIX PENCILS∗ SK. SAFIQUE AHMAD† AND RAFIKUL ALAM‡ Abstract. Suppose that an n-by-n regular matrix pencil A − λB has n distinct eigenvalues. Then determining a defective pencil E−λF which is nearest to A−λB is widely known as Wilkinson’s problem. It is shown that the pencil E − λF can be constructed from eigenvalues and eigenvectors of A − λB when A − λB is unitarily equivalent to a diagonal pencil. Further, in such a case, it is proved that the distance from A − λB to E − λF is the minimum “gap” between the eigenvalues of A − λB. As a consequence, lower and upper bounds for the “Wilkinson distance” d(L) from a regular pencil L(λ) with distinct eigenvalues to the nearest non-diagonalizable pencil are derived. Furthermore, it is shown that d(L) is almost inversely proportional to the condition number of the most ill-conditioned eigenvalue of L(λ). Key words. Matrix pencil, Pseudospectrum, Backward error, Multiple eigenvalue, Defective pencil, Wilkinson’s problem. AMS subject classifications. 65F15, 15A22, 15A60, 15A18, 65F35. 1. Introduction. Let L(λ) := A λB be an n-by-n regular matrix pencil with − n distinct eigenvalues. Then L(λ) is diagonalizable, that is, there are nonsingular matrices Y and X such that Y ∗AX and Y ∗BX are diagonal matrices. The columns of Y and X are left and right eigenvectors of L(λ), respectively.
- 
												  A New O(N2) Algorithm for the Symmetric Tridiagonal Eigenvalue/Eigenvector ProblemA New O(n2) Algorithm for the Symmetric Tridiagonal Eigenvalue/Eigenvector Problem by Inderjit Singh Dhillon B.Tech. (Indian Institute of Technology, Bombay) 1989 A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the GRADUATE DIVISION of the UNIVERSITY of CALIFORNIA, BERKELEY Committee in charge: Professor James W. Demmel, Chair Professor Beresford N. Parlett Professor Phil Colella 1997 The dissertation of Inderjit Singh Dhillon is approved: Chair Date Date Date University of California, Berkeley 1997 A New O(n2) Algorithm for the Symmetric Tridiagonal Eigenvalue/Eigenvector Problem Copyright 1997 by Inderjit Singh Dhillon 1 Abstract A New O(n2) Algorithm for the Symmetric Tridiagonal Eigenvalue/Eigenvector Problem by Inderjit Singh Dhillon Doctor of Philosophy in Computer Science University of California, Berkeley Professor James W. Demmel, Chair Computing the eigenvalues and orthogonal eigenvectors of an n × n symmetric tridiagonal matrix is an important task that arises while solving any symmetric eigenproblem. All practical software requires O(n3) time to compute all the eigenvectors and ensure their orthogonality when eigenvalues are close. In the first part of this thesis we review earlier work and show how some existing implementations of inverse iteration can fail in surprising ways. The main contribution of this thesis is a new O(n2), easily parallelizable algorithm for solving the tridiagonal eigenproblem. Three main advances lead to our new algorithm. A tridiagonal matrix is traditionally represented by its diagonal and off-diagonal elements. Our most important advance is in recognizing that its bidiagonal factors are “better” for computational purposes.
- 
												  Conference Abstracts Index Abderraman´ Marrero, Jesus,´ 72 Beckermann, Bernhard, 18 Absil, Pierre-Antoine, 49, 78 Beitia, MConference Abstracts Index Abderraman´ Marrero, Jesus,´ 72 Beckermann, Bernhard, 18 Absil, Pierre-Antoine, 49, 78 Beitia, M. Asuncion,´ 74 Acar, Evrim, 40 Belhaj, Skander, 89 Agullo, Emmanuel, 39, 51 Ben-Ari, Iddo, 24 Ahuja, Kapil, 41, 45 Bendito, Enrique, 19, 66 Aidoo, Anthony, 81 Benner, Peter, 22, 41, 42, 76, 81 Aishima, Kensuke, 50 Benning, Martin, 59 Akian, Marianne, 53 Bergamaschi, Luca, 58 Al-Ammari, Maha, 79 Bergqvist, Goran,¨ 87 Al-Mohy, Awad, 96 Berman, Abraham, 33 Alastair Spence, Alastair, 56 Bernasconi, Michele, 81 Albera, Laurent, 86 Beyer, Lucas, 37 Ali, Murtaza, 37 Bientinesi, Paolo, 37 Allamigeon, Xavier, 59 Bing, Zheng, 82 Alter, O., 73 Bini, Dario A., 12 Amat, Sergio, 67, 76, 96 Bioucas-Dias, Jose, 55 Amestoy, Patrick, 65 Birken, Philipp, 92 Ammar, Amine, 26 Blank, Luise, 32 Amodei, Luca, 69 Boiteau, Olivier, 65 Amsallem, David, 15 Boito, Paola, 63 Andrianov, Alexander, 91 Bolten, Matthias, 14 Antoine, Xavier, 24 Boman, Erik G., 39 Antoulas, A.C., 43 Borges, Anabela, 83 Araujo,´ Claudia Mendes, 95 Borm,¨ Steffen, 56 Arbenz, Peter, 86 Borobia, Alberto, 80 Arioli, Mario, 36, 45, 58 Borwein, Jonathan M., 65 Armentia, Gorka, 84 Bouhamidi, Abderrahman, 28, 79 Ashcraft, Cleve, 65 Boumal, Nicolas, 78 Asif, M. Salman, 53 Boutsidis, Christos, 18 Aurentz, Jared L., 64 Boyanova, Petia, 17 Avron, Haim, 21 Bozkurt, Durmus¸, 71, 72 Axelsson, Owe, 17 Bozoviˇ c,´ Nemanja, 44 Bras,´ Isabel, 81 Baboulin, Marc, 17, 21 Braman, Karen, 72 Bai, Zhong-Zhi, 25, 86, 92 Brannick, James, 30 Bailey, David H., 65 Brauchart, Johann S., 69 Baker, Christopher G., 77 Breiten, Tobias, 22 Ballard, Grey, 27, 34 Bro, Rasmus, 40 Baragana,˜ Itziar, 74 Bru, Rafael, 52 Barioli, Francesco, 32 Brualdi, Richard A., 26 Barlow, Jesse, 37 Buchot, Jean-Marie, 69 Barreras, Alvaro,´ 14 Budko, Neil, 24 Barrett, Wayne, 32 Bueno, Maria I., 64 Barrio, Roberto, 65, 66, 71 Buluc¸, Aydın, 34 Basermann, Achim, 75 Busquier, S., 67 Batselier, Kim, 71, 80 Butkovic,ˇ Peter, 60 Baum, Ann-Kristin, 82 Buttari, Alfredo, 65 Baykara, N.
- 
												  Matrix Depot Documentation Release 0.1.0Matrix Depot Documentation Release 0.1.0 Weijian Zhang Nov 30, 2018 Contents 1 Installation 3 1.1 Usage...................................................3 1.2 Matrices.................................................5 1.3 Random Graphs............................................. 46 1.4 Test Problems for Regularization Methods............................... 47 1.5 Groups.................................................. 51 1.6 Interface to Test Collections....................................... 52 1.7 Adding New Matrix Generators..................................... 55 1.8 Examples................................................. 58 Bibliography 65 i ii Matrix Depot Documentation, Release 0.1.0 Matrix Depot is an extensible test matrix collection for Julia. It provides a diverse collection of test matrices, including parametrized matrices and real-life matrices. • Source at Github • Release Notes Contents 1 Matrix Depot Documentation, Release 0.1.0 2 Contents CHAPTER 1 Installation To install the release version, type: julia>] pkg> add MatrixDepot 1.1 Usage Every matrix in the collection is represented by a string "matrix_name", for example, the Cauchy matrix is repre- sented by "cauchy" and the Hilbert matrix is represented by "hilb". The matrix groups are noted as symbols. For example, the class of the symmetric matrices is symbolized by :symmetric. mdinfo() Return a list of all the matrices in the collection: julia> matrixdepot() Matrices: 1) baart2) binomial3) blur4) cauchy 5) chebspec6) chow7) circul8) clement 9) companion 10) deriv2