
Communication-Avoiding Krylov Subspace Methods

Mark Frederick Hoemmen
Electrical Engineering and Computer Sciences
University of California at Berkeley

Technical Report No. UCB/EECS-2010-37
http://www.eecs.berkeley.edu/Pubs/TechRpts/2010/EECS-2010-37.html

April 2, 2010

Copyright © 2010, by the author(s). All rights reserved.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate Division of the University of California, Berkeley

Committee in charge:
Professor James W. Demmel, Chair
Professor Katherine A. Yelick
Professor Ming Gu

Spring 2010

Contents

Acknowledgments  vi
Notation  viii
List of terms  x
Abstract  1

1 Introduction  1
  1.1 Krylov subspace methods  2
    1.1.1 What are Krylov methods?  2
    1.1.2 Kernels in Krylov methods  4
    1.1.3 Structures of Krylov methods  6
  1.2 Communication and performance  13
    1.2.1 What is communication?  13
    1.2.2 Performance model  14
    1.2.3 Simplified parallel and sequential models  17
    1.2.4 Communication is expensive  17
    1.2.5 Avoiding communication  23
  1.3 Kernels in standard Krylov methods  25
    1.3.1 Sparse matrix-vector products  25
    1.3.2 Preconditioning  30
    1.3.3 AXPYs and dot products  31
  1.4 New communication-avoiding kernels  32
    1.4.1 Matrix powers kernel  32
    1.4.2 Tall Skinny QR  33
    1.4.3 Block Gram-Schmidt  33
  1.5 Related work: s-step methods  34
  1.6 Related work on avoiding communication  37
    1.6.1 Arnoldi with Delayed Reorthogonalization  39
    1.6.2 Block Krylov methods  41
    1.6.3 Chebyshev iteration  43
    1.6.4 Avoiding communication in multigrid  44
    1.6.5 Asynchronous iterations  45
    1.6.6 Summary  47
  1.7 Summary and contributions  48

2 Computational kernels  52
  2.1 Matrix powers kernel  53
    2.1.1 Introduction  53
    2.1.2 Model problems  54
    2.1.3 Parallel algorithms  55
    2.1.4 Asymptotic performance  59
    2.1.5 Parallel algorithms for general sparse matrices  59
    2.1.6 Sequential algorithms for general sparse matrices  63
    2.1.7 Hybrid algorithms  65
    2.1.8 Optimizations  67
    2.1.9 Performance results  70
    2.1.10 Variations  72
    2.1.11 Related kernels  75
    2.1.12 Related work  76
  2.2 Preconditioned matrix powers kernel  79
    2.2.1 Preconditioning  79
    2.2.2 New kernels  81
    2.2.3 Exploiting sparsity  81
    2.2.4 Exploiting low-rank off-diagonal blocks  82
    2.2.5 A simple example  82
    2.2.6 Problems and solutions  85
    2.2.7 Related work  86
  2.3 Tall Skinny QR  87
    2.3.1 Motivation for TSQR  88
    2.3.2 TSQR algorithms  89
    2.3.3 Accuracy and stability  96
    2.3.4 TSQR performance  97
  2.4 Block Gram-Schmidt  100
    2.4.1 Introduction  101
    2.4.2 Notation  102
    2.4.3 Algorithmic skeletons  102
    2.4.4 Block CGS with TSQR  104
    2.4.5 Performance models  104
    2.4.6 Accuracy in the unblocked case  107
    2.4.7 Naïve block reorthogonalization may fail  110
    2.4.8 Rank-revealing TSQR and BGS  111
  2.5 Block orthogonalization in the M inner product  114
    2.5.1 Review: CholeskyQR  115
    2.5.2 M-CholeskyQR  115
    2.5.3 M-CholBGS  116

3 Communication-avoiding Arnoldi and GMRES  120
  3.1 Arnoldi iteration  121
    3.1.1 Notation  121
    3.1.2 Restarting  123
    3.1.3 Avoiding communication in Arnoldi  124
  3.2 Arnoldi(s)  124
    3.2.1 Ansatz  124
    3.2.2 A different basis  125
    3.2.3 The Arnoldi(s) algorithm  127
    3.2.4 Unitary scaling of the basis vectors  128
    3.2.5 Properties of basis conversion matrix  129
  3.3 CA-Arnoldi or "Arnoldi(s, t)"  130
    3.3.1 Ansatz  131
    3.3.2 Notation  134
    3.3.3 QR factorization update  134
    3.3.4 Updating the upper Hessenberg matrix  139
    3.3.5 CA-Arnoldi algorithm  141
    3.3.6 Generalized eigenvalue problems  142
    3.3.7 Summary  143
  3.4 CA-GMRES  143
    3.4.1 Scaling of the first basis vector  144
    3.4.2 Convergence metrics  146
    3.4.3 Preconditioning  146
  3.5 CA-GMRES numerical experiments  147
    3.5.1 Key for convergence plots  148
    3.5.2 Diagonal matrices  149
    3.5.3 Convection-diffusion PDE discretization  153
    3.5.4 Sparse matrices from applications  157
    3.5.5 WATT1 test problem  176
    3.5.6 Summary  178
  3.6 CA-GMRES performance experiments  179
    3.6.1 Implementation details  179
    3.6.2 Results for various sparse matrices from applications  180
    3.6.3 Results for WATT1 matrix  181
    3.6.4 Implementation challenges  183
    3.6.5 Future work  185

4 Communication-avoiding symmetric Lanczos  188
  4.1 Symmetric Lanczos  189
    4.1.1 The standard Lanczos algorithm  189
    4.1.2 Communication in Lanczos iteration  191
    4.1.3 Communication-avoiding reorthogonalization  191
  4.2 CA-Lanczos  192
    4.2.1 CA-Lanczos update formula  193
    4.2.2 Updating the tridiagonal matrix  197
    4.2.3 The CA-Lanczos algorithm  200
  4.3 Preconditioned CA-Lanczos  200
    4.3.1 Lanczos and orthogonal polynomials  202
    4.3.2 Left preconditioning excludes QR  205
    4.3.3 Left-preconditioned CA-Lanczos with MGS  ...