Randomized methods for computing low-rank approximations of matrices
by
Nathan P. Halko
B.S., University of California, Santa Barbara, 2007
M.S., University of Colorado, Boulder, 2010
A thesis submitted to the
Faculty of the Graduate School of the
University of Colorado in partial fulfillment
of the requirements for the degree of
Doctor of Philosophy
Department of Applied Mathematics
2012

This thesis entitled:
Randomized methods for computing low-rank approximations of matrices
written by Nathan P. Halko
has been approved for the Department of Applied Mathematics
Per-Gunnar Martinsson
Keith Julien
David M. Bortz
Francois G. Meyer
Date
The final copy of this thesis has been examined by the signatories, and we find that both the content and the form meet acceptable presentation standards of scholarly work in the above-mentioned discipline.
Halko, Nathan P. (Ph. D., Applied Mathematics)
Randomized methods for computing low-rank approximations of matrices
Thesis directed by Professor Per-Gunnar Martinsson
Randomized sampling techniques have recently proved capable of efficiently solving many standard problems in linear algebra, and of enabling computations at scales far larger than were previously possible. The new algorithms are designed from the bottom up to perform well in modern computing environments where the expense of communication is the primary constraint. In extreme cases, the algorithms can even be made to work in a streaming environment where the matrix is not stored at all, and each element can be seen only once.
The dissertation describes a set of randomized techniques for rapidly constructing a low-rank approximation to a matrix. The algorithms are presented in a modular framework that first computes an approximation to the range of the matrix via randomized sampling. Second, the matrix is projected onto the approximate range, and a factorization (SVD, QR, LU, etc.) of the resulting low-rank matrix is computed via variations of classical deterministic methods. Theoretical performance bounds are provided.
Particular attention is given to very large scale computations where the matrix does not fit in
RAM on a single workstation. Algorithms are developed for the case where the original matrix must be stored out-of-core but where the factors of the approximation fit in RAM. Numerical examples are provided that perform Principal Component Analysis of a data set that is so large that less than one hundredth of it can fit in the RAM of a standard laptop computer. Furthermore, the dissertation presents a parallelized randomized scheme for computing a reduced rank Singular
Value Decomposition. By parallelizing and distributing both the randomized sampling stage and the processing of the factors in the approximate factorization, the method requires an amount of memory per node which is independent of both dimensions of the input matrix. Numerical experiments are performed on Hadoop clusters of computers in Amazon’s Elastic Compute Cloud with up to 64 total cores. Finally, we directly compare the performance and accuracy of the randomized algorithm with the classical Lanczos method on extremely large, sparse matrices and substantiate the claim that randomized methods are superior in this environment.
Contents
Chapter
1 Introduction 1
1.1 Approximation by low rank matrices ...... 1
1.2 Existing methods for computing low-rank approximations ...... 3
1.2.1 Truncated Factorizations ...... 3
1.2.2 Direct Access ...... 4
1.2.3 Iterative methods ...... 4
1.3 Changes in computer architecture, and the need for new algorithms ...... 5
1.4 Computational framework and problem formulation ...... 5
1.5 Randomized algorithms for approximating the range of a matrix ...... 7
1.5.1 Intuition ...... 7
1.5.2 Basic Algorithm ...... 8
1.5.3 Probabilistic error bounds ...... 9
1.5.4 Computational cost ...... 11
1.6 Variations of the basic theme ...... 11
1.6.1 Structured Random Matrices ...... 12
1.6.2 Increasing accuracy ...... 12
1.6.3 Numerical Stability ...... 13
1.6.4 Adaptive methods ...... 14
1.7 Numerical linear algebra in a distributed computing environment ...... 15
1.7.1 Distributed computing environment ...... 15
1.7.2 Distributed operations ...... 16
1.7.3 Stochastic Singular Value Decomposition in MapReduce ...... 17
1.7.4 Lanczos comparison ...... 17
1.7.5 Numerical Experiments ...... 17
1.8 Survey of prior work on randomized algorithms in linear algebra ...... 18
1.9 Structure of dissertation and overview of principal contributions ...... 18
Bibliography 22
2 Finding structure with randomness:
Probabilistic algorithms for constructing approximate matrix decompositions 25
2.1 Overview ...... 26
2.1.1 Approximation by low-rank matrices ...... 27
2.1.2 Matrix approximation framework ...... 28
2.1.3 Randomized algorithms ...... 29
2.1.4 A comparison between randomized and traditional techniques ...... 32
2.1.5 Performance analysis ...... 34
2.1.6 Example: Randomized SVD ...... 35
2.1.7 Outline of paper ...... 37
2.2 Related work and historical context ...... 37
2.2.1 Randomized matrix approximation ...... 37
2.2.2 Origins ...... 42
2.3 Linear algebraic preliminaries ...... 45
2.3.1 Basic definitions ...... 46
2.3.2 Standard matrix factorizations ...... 46
2.3.3 Techniques for computing standard factorizations ...... 48
2.4 Stage A: Randomized schemes for approximating the range ...... 51
2.4.1 The proto-algorithm revisited ...... 51
2.4.2 The number of samples required ...... 52
2.4.3 A posteriori error estimation ...... 53
2.4.4 Error estimation (almost) for free ...... 54
2.4.5 A modified scheme for matrices whose singular values decay slowly ...... 55
2.4.6 An accelerated technique for general dense matrices ...... 56
2.5 Stage B: Construction of standard factorizations ...... 58
2.5.1 Factorizations based on forming Q∗A directly ...... 58
2.5.2 Postprocessing via row extraction ...... 59
2.5.3 Postprocessing an Hermitian matrix ...... 61
2.5.4 Postprocessing a positive semidefinite matrix ...... 61
2.5.5 Single-pass algorithms ...... 62
2.6 Computational costs ...... 63
2.6.1 General matrices that fit in core memory ...... 63
2.6.2 Matrices for which matrix–vector products can be rapidly evaluated . . . . . 64
2.6.3 General matrices stored in slow memory or streamed ...... 66
2.6.4 Gains from parallelization ...... 66
2.7 Numerical examples ...... 67
2.7.1 Two matrices with rapidly decaying singular values ...... 67
2.7.2 A large, sparse, noisy matrix arising in image processing ...... 68
2.7.3 Eigenfaces ...... 69
2.7.4 Performance of structured random matrices ...... 70
2.8 Theoretical preliminaries ...... 83
2.8.1 Positive semidefinite matrices ...... 83
2.8.2 Orthogonal projectors ...... 85
2.9 Error bounds via linear algebra ...... 86
2.9.1 Setup ...... 87
2.9.2 A deterministic error bound for the proto-algorithm ...... 87
2.9.3 Analysis of the power scheme ...... 90
2.9.4 Analysis of truncated SVD ...... 91
2.10 Gaussian test matrices ...... 92
2.10.1 Technical background ...... 92
2.10.2 Average-case analysis of Algorithm 4.1 ...... 94
2.10.3 Probabilistic error bounds for Algorithm 4.1 ...... 96
2.10.4 Analysis of the power scheme ...... 98
2.11 SRFT test matrices ...... 99
2.11.1 Construction and Properties ...... 100
2.11.2 Performance guarantees ...... 101
2.12 On Gaussian matrices ...... 103
2.12.1 Expectation of norms ...... 103
2.12.2 Spectral norm of pseudoinverse ...... 104
2.12.3 Frobenius norm of pseudoinverse ...... 104
Bibliography 109
3 An algorithm for the principal component analysis of large data sets 118
3.1 Introduction ...... 118
3.2 Informal description of the algorithm ...... 119
3.3 Summary of the algorithm ...... 121
3.4 Out-of-core computations ...... 123
3.4.1 Computations with on-the-fly evaluation of matrix entries ...... 123
3.4.2 Computations with storage on disk ...... 123
3.5 Computational costs ...... 123
3.5.1 Costs with on-the-fly evaluation of matrix entries ...... 124
3.5.2 Costs with storage on disk ...... 124
3.6 Numerical examples ...... 125
3.6.1 Synthetic data ...... 125
3.6.2 Measured data ...... 128
3.7 An application ...... 130
3.8 Conclusion ...... 135
Bibliography 137
4 A randomized MapReduce algorithm for computing the singular value decomposition of
large matrices 139
4.1 Introduction ...... 139
4.2 Problem Formulation ...... 139
4.3 Algorithm overview ...... 140
4.4 Large scale distributed computing environment ...... 141
4.4.1 MapReduce ...... 141
4.4.2 Hadoop ...... 142
4.4.3 Mahout ...... 144
4.5 A MapReduce matrix multiplication algorithm ...... 144
4.6 Distributed Orthogonalization ...... 146
4.6.1 Givens Rotations ...... 146
4.6.2 Overview ...... 148
4.6.3 Streaming QR ...... 149
4.6.4 Merging factorizations ...... 150
4.6.5 Analysis ...... 152
4.6.6 Q-less optimizations ...... 154
4.7 SSVD MapReduce Implementation ...... 154
4.7.1 Q-job ...... 155
4.7.2 Bt-job ...... 156
4.7.3 ABt-job ...... 157
4.7.4 U-job ...... 158
4.7.5 V-job ...... 158
4.8 Lanczos ...... 159
4.8.1 Description of method ...... 159
4.8.2 Implementation ...... 160
4.8.3 Scalability ...... 161
4.9 SSVD parameters ...... 162
4.10 The data ...... 162
4.11 Machines ...... 165
4.11.1 Amazon EC2 and EMR ...... 165
4.11.2 Instance Types ...... 165
4.11.3 Virtualization and shared resources...... 166
4.11.4 Choosing a cluster ...... 166
4.11.5 Monitoring ...... 168
4.12 Computational parameters ...... 169
4.13 Scaling the cluster ...... 172
4.14 Power Iterations ...... 173
4.15 Varying k ...... 174
4.16 Lanczos on the cluster ...... 176
4.17 Wikipedia-all ...... 177
4.18 Wikipedia-MAX ...... 180
4.19 Lanczos comparison with SSVD ...... 181
4.20 Conclusion ...... 183
Bibliography 184

Chapter 1
Introduction
Techniques based on randomized sampling have recently proved capable of solving many standard problems in linear algebra far more efficiently than classical techniques. This dissertation describes a set of such randomized techniques for rapidly constructing low-rank approximations to matrices.
The primary focus of the work presented has been to develop methods for numerical linear algebra that perform well in a modern computing environment where floating point operations are becoming ever cheaper, and communication costs are emerging as the real bottleneck. The dissertation demonstrates that randomized methods can dramatically reduce the amount of communication required to complete a computation when compared to classical deterministic methods. Particular attention is paid to out-of-core computing, and computing in a distributed environment.
1.1 Approximation by low rank matrices
A standard task in scientific computing is to determine, for a given m × n matrix A, an approximate factorization

A ≈ B C, (1.1)

where B is m × k, C is k × n, and the inner dimension k is called the numerical rank of the matrix. When the numerical rank is much smaller than either m or n, a factorization such as (1.1) allows the matrix to be stored inexpensively, and to be applied rapidly to vectors or other matrices. Factorizations such as (1.1) can also be used to analyze and synthesize the data.
Matrices for which the numerical rank k is much smaller than either m or n abound in applications. Examples include:
• Principal Component Analysis (PCA) is a basic tool in statistics and data mining. By projecting the data onto the orthogonal directions of maximal variance (principal components) we can visualize or explain the data in far fewer degrees of freedom than the ambient dimension. This amounts to computing a truncated Singular Value Decomposition [19].
• A particular application of PCA is a technique in facial recognition called Eigenfaces [33]. Representing face images as vectors, the eigenfaces are the significant features (principal components) among the known faces in the database. Just a small number of eigenfaces is needed to span the significant variations, so faces can be described accurately by a weighted sum of the eigenfaces. The task of matching a face against the entire database reduces to comparing only this handful of weights, which greatly reduces the complexity of the problem.
• PCA is popular in the analysis of population genetic variation [26]. Though problems exist with its interpretation and more advanced nonlinear methods have been developed, PCA still remains an indispensable data analysis tool in the sciences.
• Estimating parameters via least squares in biology, engineering, and physics often leads to large overdetermined linear systems. Low-rank factorization of the coefficient matrix leads to efficient solutions of the problem [29].
• The fast multipole method for rapidly evaluating potential fields relies on low rank approx- imations of continuum operators with exponentially decaying spectra [16].
• Laplacian Eigenmaps arise in image processing. A few eigenvectors of a graph derived from the image provide a non-linear embedding [5]. This is also an example of how linear approximations are used to solve non-linear problems.
• Low cost sensor networks provide a wealth of data. Their low cost enables many sensors to be densely deployed while limiting their on board data processing capability [4]. This inevitably results in lots of redundant data to be processed by the user.
As the list indicates, in most applications, the task that needs to be performed is not just to compute any factorization satisfying (1.1), but also to enforce additional constraints on the factors B and C. We will describe techniques for constructing several different specific factorizations, including:
The pivoted QR factorization: Each m × n matrix A of rank k admits a decomposition
A = QR, (1.2) where Q is an m × k orthonormal matrix, and R is a k × n weakly upper-triangular matrix. That is, there exists a permutation J of the numbers {1, 2, . . . , n} such that R(: ,J) is upper triangular. Moreover, the diagonal entries of R(: ,J) are weakly decreasing. See [15, §5.4.1] for details.
The singular value decomposition (SVD): Each m × n matrix A of rank k admits a factorization
A = UΣV∗, (1.3) where U is an m × k orthonormal matrix, V is an n × k orthonormal matrix, and Σ is a k × k nonnegative, diagonal matrix

Σ = diag(σ1, σ2, . . . , σk). (1.4)
The numbers σj are called the singular values of A. They are arranged in weakly decreasing order: σ1 ≥ σ2 ≥ · · · ≥ σk ≥ 0. (1.5) The columns of U and V are called left singular vectors and right singular vectors, respectively.
The interpolative decomposition (ID): The ID identifies a collection of k columns from a rank-k matrix A that span the range of A. To be precise, there exists an index set J = [j1, . . . , jk] such that A = A(: ,J) X, (1.6) where X is a k × n matrix that contains the k × k identity matrix Ik. Specifically, X(: ,J) = Ik. Furthermore, no entry of X has magnitude larger than one.
1.2 Existing methods for computing low-rank approximations
The best method to choose for computing factorizations depends on several factors and is not the same for every problem. Current state of the art procedures are very well suited and optimized for matrices that fit in RAM. That is, they achieve excellent accuracy and robustness at the cost of much random access to the elements of, or many passes over, the matrix A.
1.2.1 Truncated Factorizations
The singular value decomposition, when truncated to the leading k terms, provides the optimal approximation in terms of ℓ2 error. This is often accomplished by bidiagonalizing the matrix via orthogonal transformations from the left and the right, and then using a quickly converging iterative process to compute the SVD of the resulting bidiagonal matrix [15]. It is not possible to extract the first k components without completing the entire factorization.
A = UΣVᵀ ≈ U(: , 1 : k) Σ(1 : k, 1 : k) V(: , 1 : k)ᵀ, (1.7)

where the full factors U, Σ, and Vᵀ are m × m, m × n, and n × n, and the truncated factors are m × k, k × k, and k × n, respectively.
At a cost of O(mn · min{m, n}) this method provides the minimal error ‖A − A_approx‖ = σ_{k+1} and has an advantage when the matrix is not too large and most of the singular values/vectors are needed.
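As a point of reference, the following minimal sketch computes such a truncated SVD and verifies the optimal-error property stated above. Python/NumPy is used here, and in the other sketches in this chapter, purely for illustration; this is an assumption of the presentation, not the computing environment used in the thesis.

    import numpy as np

    def truncated_svd(A, k):
        # Full SVD at O(mn * min(m, n)) cost, then keep the leading k terms.
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        return U[:, :k], s[:k], Vt[:k, :]

    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 100))
    k = 10
    Uk, sk, Vtk = truncated_svd(A, k)
    Ak = Uk @ np.diag(sk) @ Vtk
    s = np.linalg.svd(A, compute_uv=False)
    # Spectral-norm error of the best rank-k approximation equals sigma_{k+1}.
    print(np.linalg.norm(A - Ak, 2), s[k])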
1.2.2 Direct Access
The thin QR factorization is an O(kmn) method that uses a subset of columns of A to build a basis for the column space or range of A. Variations differ based on how columns are selected as described below. These methods require extensive random access to the matrix elements but work very well when the matrix is small enough to fit into RAM.
The classic Golub-Businger algorithm [7] computes a pivoted QR decomposition using House- holder reflections. Pivoting is done greedily by choosing the column with maximal norm at each iteration. In other words, the column least similar to the columns already selected is added to the basis. This approach is fast and easy to implement, but can fail in some pathological cases.
Gu and Eisenstat’s strong rank-revealing QR [17] seeks to maximize the determinant of the upper-left k × k block of R by interchanging columns of A. The jth diagonal entry of R represents the norm of the jth basis vector projected away from basis vectors 1, . . . , j − 1. Maximizing the determinant, i.e., the product of R’s diagonal entries, seeks to minimize the similarity among basis vectors. This approach is more robust and accurate than Golub-Businger, but can fail to work quickly, costing as much as a full QR decomposition in worst-case scenarios.
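A greedy column-pivoted QR of the Golub-Businger type is available in standard libraries; the sketch below builds a rank-k basis from the first k pivot columns. It is an illustration under the stated assumptions, not the strong rank-revealing QR of Gu and Eisenstat.

    import numpy as np
    from scipy.linalg import qr

    def pivoted_qr_rank_k(A, k):
        # Column-pivoted QR: A[:, piv] = Q @ R, with greedy max-norm pivoting.
        Q, R, piv = qr(A, mode="economic", pivoting=True)
        return Q[:, :k], piv[:k]      # orthonormal basis and the k selected columns

    rng = np.random.default_rng(1)
    A = rng.standard_normal((200, 30)) @ rng.standard_normal((30, 100))   # rank 30
    Qk, cols = pivoted_qr_rank_k(A, 30)
    print(np.linalg.norm(A - Qk @ (Qk.T @ A), 2))    # near machine precision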
1.2.3 Iterative methods
Iterative methods are appealing when random access to the matrix is unavailable or there is a fast matrix vector multiply as when A is sparse. They operate by only requiring the action of the matrix on a vector rather than random access to its elements. In large sparse systems, these methods avoid fill-in common to the methods of §1.2.2. Typically O(k) passes over the matrix are needed.
The Lanczos method [21] uses the Krylov subspace generated by an initial starting vector ω,

K(A, ω) = [ω, Aω, A²ω, . . .], (1.8)

for a symmetric n × n matrix. The coordinates of this basis map A to a tridiagonal matrix whose eigenvalues can be found very efficiently. The method can also be used on non-symmetric matrices by substituting AᵀA or AAᵀ. Additionally, Arnoldi and non-symmetric Lanczos methods generalize to working directly with a non-symmetric matrix A. In practice, stability issues are a major concern and the method is difficult to implement robustly. The method is also a serial process requiring many passes over the matrix A.
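In practice one rarely codes Lanczos from scratch; library eigensolvers built on implicitly restarted Lanczos/Arnoldi iterations (ARPACK) need only the action of the matrix on a vector. The sketch below is an illustration of that usage pattern, assuming SciPy's sparse solvers; it is not the Lanczos implementation compared against in Chapter 4.

    import numpy as np
    import scipy.sparse as sp
    from scipy.sparse.linalg import svds, eigsh, LinearOperator

    A = sp.random(5000, 5000, density=1e-3, format="csr", random_state=2)
    k = 6

    # Leading k singular triplets of a sparse matrix via matrix-vector products only.
    U, s, Vt = svds(A, k=k)

    # Equivalently, leading eigenpairs of A^T A, supplied only as an action x -> A^T(Ax).
    AtA = LinearOperator((5000, 5000), matvec=lambda x: A.T @ (A @ x), dtype=float)
    vals, vecs = eigsh(AtA, k=k)
    print(np.sort(s ** 2)[::-1], np.sort(vals)[::-1])   # agree up to convergence tolerance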
Simultaneous iteration, also known as orthogonal iteration, is a blocked generalization of the power method [30]. For a symmetric matrix A, an O(k)-dimensional subspace is constructed by repeatedly multiplying an initial subspace by A. Each iteration amplifies the components of the leading k eigenspace and continues until the iterated subspace converges to the leading k eigenvectors of A. To avoid collapse of all vectors onto the leading eigenvector (or eigenspace associated with the eigenvalue of largest magnitude) the subspace is orthogonalized between iterations. In addition, solution of an intermediate eigenproblem is done as an acceleration step [32] to improve the error analysis.
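A minimal sketch of simultaneous iteration for a symmetric matrix, written under the assumptions above (dense storage, a fixed iteration count rather than a convergence test):

    import numpy as np

    def orthogonal_iteration(A, k, n_iter=50, seed=0):
        # Blocked power method: amplify the leading k eigendirections of a symmetric A,
        # re-orthonormalizing the block between iterations to prevent collapse.
        n = A.shape[0]
        Q, _ = np.linalg.qr(np.random.default_rng(seed).standard_normal((n, k)))
        for _ in range(n_iter):
            Q, _ = np.linalg.qr(A @ Q)
        # Small k x k eigenproblem as an acceleration / extraction step.
        evals, W = np.linalg.eigh(Q.T @ A @ Q)
        return evals, Q @ W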
These types of iterative methods are similar in spirit to the randomized techniques. The main contribution of this research was to consider carefully the properties of the initial starting subspace. This approach generated tight error bounds that required only a constant number of passes over the data, improving greatly upon previous O(k)-pass methods. In some situations, only a single pass is required for high accuracy.
1.3 Changes in computer architecture, and the need for new algorithms
With the current shift towards a heterogeneous computing environment, the algorithms of §§1.2.2–1.2.3 leave room for improvement. They are optimized to minimize flops and are not always well suited for computer architectures that are constrained by communication and data flow. In this environment, reducing complexity and reorganizing computation become very important. For example:
Multi-core and multi-processor CPU speeds are no longer improving at Moore’s pace, leading to changes in architecture to keep up. Multi-core chips are standard in nearly all new computers. Access to parallel computing clusters is widely available to utilize parallelized data analysis techniques.
Communication Data transfer between processors, from disk, or through networks is slow and typically dominates the cost of parallel computation.
Amount of data Disk storage is very cheap, leading to massive amounts of stored data. Data acquisition outpaces the speedup in computing power. An additional problem with large data sets is noise and the propagation of rounding errors.
It is necessary to develop new algorithms optimized for this changing environment.
1.4 Computational framework and problem formulation
The dissertation describes a set of techniques for computing approximate low rank factorizations to a given matrix A. In order to obtain a flexible set of algorithms that perform well in a broad range of different environments, we suggest that it is convenient to split the computational task into two stages:
Stage A Construct a low dimensional subspace that approximates the range of A. In other words, given an ε > 0, compute a matrix Q with orthonormal columns so that ‖A − QQ∗A‖ < ε.

Stage B Given an approximate basis Q, use Q to help compute a factorization of A.
For example, consider computing an SVD via the two-stage procedure. Stage A computes an approximate basis and Stage B utilizes the basis as follows (a code sketch of the full procedure is given after the list):
• Form B = QᵀA.
• Compute the SVD of the rectangular matrix B = ŨΣVᵀ.
• Set U = QŨ so that A ≈ UΣVᵀ.
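The promised sketch, in Python/NumPy and hedged as an illustration rather than the thesis's implementation, makes the division of labor explicit:

    import numpy as np

    def randomized_svd(A, k, p=10, seed=0):
        m, n = A.shape
        # Stage A: sample the range of A with a Gaussian test matrix and orthonormalize.
        Omega = np.random.default_rng(seed).standard_normal((n, k + p))
        Q, _ = np.linalg.qr(A @ Omega)            # m x (k+p), ||A - Q Q^T A|| small w.h.p.
        # Stage B: project onto the basis and factor the small matrix deterministically.
        B = Q.T @ A                               # (k+p) x n
        U_tilde, s, Vt = np.linalg.svd(B, full_matrices=False)
        U = Q @ U_tilde                           # lift the left factor back to m dimensions
        return U[:, :k], s[:k], Vt[:k, :]

Only Stage A touches the full matrix A through large products; everything in Stage B involves matrices with a small dimension k + p.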
The randomized methods that form the focus of the current work are aimed primarily at solving Stage A above. It turns out that randomization allows this task to be executed in a very robust manner that permits minimal interaction with the matrix A.
Stage B is well suited for deterministic techniques whenever the matrices B and Q fit in RAM. We demonstrate in Section 5 of Chapter 2 that with slight modifications, the classical methods of numerical linear algebra can produce the QR and SVD factorizations (see equations (1.2) and (1.3)) identified as targets in Section 1.1. For the interpolative decomposition (1.6) which is important in data mining, Section 5 of Chapter 2 provides some substantially new techniques.
Chapter 3 develops methods for cases where the matrix A is too large to fit in the RAM of a single workstation. In this situation, interaction with A must be minimized since reading A from disk is prohibitively slow. The matrices B and Q are still assumed to fit in memory. This is often the case with large dense matrices. Chapter 4 extends the methods even further to accommodate cases where not only does A not fit in RAM, but the matrices B and Q are also too large for RAM. We develop a distributed variant of the algorithm that utilizes multiple processors and out-of-core techniques to construct B and Q. The SVD, for example, is constructed via
• BBᵀ = ŨΣ²Ũᵀ
• U = QŨ
• Vᵀ = Σ⁻¹ŨᵀB
The small square matrix BBᵀ fits comfortably in RAM and its factorization is computed with classical methods. The factors U and Vᵀ are the same size as Q and B, respectively, and are computed out-of-core.
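A serial, in-memory sketch of the arithmetic behind these three steps is given below; it is only meant to show that the formulas are consistent, while the distributed, out-of-core realization is the subject of Chapter 4.

    import numpy as np

    def svd_from_gram(Q, B):
        # Recover A ~ U Sigma V^T from A ~ Q B using only the small Gram matrix B B^T.
        # Q is m x l with orthonormal columns, B = Q^T A is l x n.
        G = B @ B.T                               # l x l, fits comfortably in RAM
        evals, U_tilde = np.linalg.eigh(G)        # G = U~ Sigma^2 U~^T
        order = np.argsort(evals)[::-1]
        sigma = np.sqrt(np.maximum(evals[order], 0.0))
        U_tilde = U_tilde[:, order]
        U = Q @ U_tilde                           # same shape as Q; block-by-block in practice
        Vt = (U_tilde.T @ B) / sigma[:, None]     # V^T = Sigma^{-1} U~^T B (nonzero sigma assumed)
        return U, sigma, Vt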
Remark 1. The division of the computation into two stages is convenient in many ways: It allows us to present the proposed algorithms in a clear manner, it allows for a unified error analysis, and — perhaps most importantly — it allows for the development of a set of different software modules that can be put together in different configurations to suit any particular computational environment. However, it should be noted that it is occasionally beneficial to deviate from the strict 2-stage template. For instance, we demonstrate in Section 5.5 of Chapter 2 an algorithm that executes Stages A and B simultaneously in order to obtain an algorithm that works in a streaming environment where you get to look at each element of the matrix only once (we call such algorithms “single-pass” in the text).
1.5 Randomized algorithms for approximating the range of a matrix
In this section, we describe a set of randomized methods for executing the task we introduced as “Stage A” in Section 2.1.2, namely how to construct a basis for an approximation to the range of a given matrix. Early versions of these techniques were described in [24, 34], and are based on randomized sampling. As a consequence, they have in theory a non-zero risk of “failing” in the sense that the factorizations computed may not satisfy the requested tolerance. However, this risk can be controlled by the user, and can for a very low cost be rendered negligible. Failure risks of less than 10⁻¹⁵ are entirely standard.
1.5.1 Intuition
To get a feel for the randomized sampling technique let us consider an n × n matrix A of exact rank k < n. For simplicity assume that A is symmetric so we can easily discuss a set of real, orthogonal eigenvectors. Note that these restrictions are purely for convenience and the randomized sampling technique is effective on any matrix regardless of dimension or distribution of entries. It is most insightful to view the range of matrix A as a linear combination of its columns.
ran(A) = { Σ_{j=1}^{n} α_j A(: , j) : α_j ∈ R }. (1.9)
For any vector ω ∈ Rⁿ, the vector formed by multiplying by A is a vector in the range of A,

y = Aω = Σ_j ω_j A(: , j). (1.10)

We call y a sample vector of ran(A). If we can collect k sample vectors that are linearly independent, then we can find a basis for ran(A) by orthogonalizing the k samples. If ω is a random vector in Rⁿ then we call y = Aω a random sample of ran(A). For the purpose of illustration and analysis it is convenient to choose each entry of ω from the standard normal distribution,

ω ∈ Rⁿ : ω_i ∼ N(0, 1). (1.11)
Vectors chosen in this manner, known as Gaussian vectors, have important properties that help clarify analysis of the randomized sampling technique. First, k independently drawn Gaussian vectors {ω^(i)}_{i=1}^{k} are almost surely in general position. This means that no linear combination of the ω^(i) will fall in the null space of A, and implies that the sample vectors {y^(i) = Aω^(i)}_{i=1}^{k} are linearly independent. Below we use this fact to show that k sample vectors span the range of A. Second, Gaussian vectors are rotationally invariant, meaning that for any orthogonal matrix U and Gaussian vector ω, the vector Uω is also Gaussian.
Consider the eigenexpansion of the sample vector y,

y = Σ_j λ_j ⟨ω, x_j⟩ x_j, (1.12)

where {x_j}_{j=1}^{n} are the eigenvectors of A. Pick an eigenvector x_j and then pick an orthogonal rotation matrix U such that Ux_j = e_1. The quantity ⟨ω, x_j⟩ in equation (1.12) is itself a Gaussian random variable:

⟨ω, x_j⟩ = ⟨Uω, Ux_j⟩ = ⟨ω̃, e_1⟩ = ω̃_1, the first element of ω̃ ∼ N(0, 1). (1.13)

The probability of this quantity being identically zero is zero, so we get a non-zero contribution of the eigenvector in the sample vector y. We can easily repeat this argument for each eigenvector to show that the sample vector has a non-zero contribution from every eigendirection. Therefore we do not miss any of the range in our sampling process. Putting this all together we have k linearly independent vectors in the k dimensional space ran(A) and we can orthonormalize the samples to create a basis for ran(A).
The situation is not so clear if A is not exactly rank k. It is possible that the last n − k singular values reflect noise in our data or we simply want a low rank approximation of the matrix. Consider A to be the sum of an exact rank k matrix and a perturbation.
A = Σ_{j=1}^{k} λ_j x_j x_jᵀ + Σ_{j=k+1}^{n} λ_j x_j x_jᵀ = B + E. (1.14)
We have reasoned that if E ≡ 0, as in the exact rank case, then k sample vectors will capture every direction in ran(A). When the rank of A is greater than k, E ≠ 0 and the perturbation shifts some of the direction of each sample vector outside the range of B. We already saw that the quantity ⟨ω, x_j⟩ is identically distributed over j, implying that we are actually more likely to sample the range of E than the range of B (assuming k ≪ n). However, the quantity

λ_j ⟨ω, x_j⟩ (1.15)

is weighted to favor the components of B, the dominant rank-k subspace we are after. Since the eigenvalues of E are generally much smaller than those of B, the sample vectors are naturally more heavily weighted with components of ran(B) and the perturbation only adds a slight shift outside the desired range. In the event of an unlucky (and unlikely) sample, though the perturbation is small, it can also be the case that λ_j ⟨ω, x_j⟩ is even smaller for j = 1, . . . , k and we miss the target range space. We can view this as a wasted sample and simply take another one to augment the sample pool. Since it is not known whether or not a sample is good at this point, the remedy is to assume that some samples are unlucky and take an extra p samples to account for them. The number of extra samples p is small and adds a great deal of accuracy for a small computational cost. Typically p = 10 is sufficient, though we can take up to p = k and not change the asymptotic qualities of the algorithm. Further analysis will provide insight into choosing p and its effect on the accuracy of the method.
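A small numerical illustration of this intuition, with a synthetic matrix chosen only for the demonstration: for an exactly rank-k matrix, k Gaussian samples already span the range, and a few extra samples absorb the effect of a small perturbation E.

    import numpy as np

    rng = np.random.default_rng(3)
    n, k, p = 500, 20, 10
    # Exactly rank-k symmetric matrix with orthonormal eigenvectors X and eigenvalues lam.
    X, _ = np.linalg.qr(rng.standard_normal((n, k)))
    lam = np.linspace(1.0, 2.0, k)
    A = X @ np.diag(lam) @ X.T

    Y = A @ rng.standard_normal((n, k))            # k random samples of ran(A)
    print(np.linalg.matrix_rank(Y))                # k: samples are almost surely independent

    A_noisy = A + 1e-6 * rng.standard_normal((n, n))   # add a small perturbation E
    Q, _ = np.linalg.qr(A_noisy @ rng.standard_normal((n, k + p)))
    print(np.linalg.norm(A - Q @ (Q.T @ A), 2))        # small: on the order of ||E||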
1.5.2 Basic Algorithm
Given an m × n matrix A, a target rank k, and an oversampling parameter p, the following algorithm computes an m × (k + p) matrix Q whose columns are orthonormal and form an approximate basis for the range of A.
INPUT: A (m × n), k, p

(1) Draw a Gaussian n × (k + p) test matrix Ω.
(2) Form the product Y = AΩ.
(3) Orthogonalize the columns of Y ↦ Q.

OUTPUT: Q
This algorithm can also easily be summarized in Matlab notation
Q = orth(A*randn(n,k+p)) (1.16) and we see that this is the first step of an orthogonal iteration. The novelty of the algorithm lies in Theorem 1.5.1 of Chapter 2 that depending on properties of A, this is the only step of the iteration we need to take.
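A NumPy analogue of the Matlab line (1.16), together with an empirical check of the error bound below, might look as follows; the test matrix with prescribed singular values is an assumption made purely for the demonstration.

    import numpy as np

    rng = np.random.default_rng(4)
    m, n, k, p = 1000, 600, 10, 10
    # Build a test matrix with rapidly decaying singular values sigma_j = 2^{-(j-1)}.
    U, _ = np.linalg.qr(rng.standard_normal((m, n)))
    V, _ = np.linalg.qr(rng.standard_normal((n, n)))
    sigma = 2.0 ** -np.arange(n, dtype=float)
    A = (U * sigma) @ V.T

    # One application of A to a random block, then orthonormalization: Q = orth(A*randn(n,k+p)).
    Q, _ = np.linalg.qr(A @ rng.standard_normal((n, k + p)))
    err = np.linalg.norm(A - Q @ (Q.T @ A), 2)
    print(err, sigma[k])   # err is a small multiple of sigma_{k+1}, as Theorem 1.5.1 predicts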
Theorem 1.5.1. Let A be a real-valued m × n matrix. Choose a target rank k and an oversampling parameter p so that p ≥ 4 and k + p ≤ min{m, n}. Execute the algorithm with a standard Gaussian test matrix Ω. Then the approximate basis Q of size m × (k + p) satisfies

E ‖A − QQᵀA‖ ≤ ( 1 + √( k/(p − 1) ) ) σ_{k+1} + ( e √(k + p) / p ) ( Σ_{j>k} σ_j² )^{1/2}
            ≤ ( 1 + ( 4 √(k + p) / (p − 1) ) √( min{m, n} ) ) σ_{k+1}
            = C · σ_{k+1}.
Traditional analysis of orthogonal iteration [15], [32] relies on a parameter q which defines the number of times the method needs to access the elements of A. The analysis shows that by increasing q, the approximate basis Q converges to the dominant, invariant subspace of A. With q = 0 as in the basic algorithm, the existing theory would not provide any certainty of an accurate basis after one step. Notice that with a small oversampling parameter, say p = 10, Theorem 1.5.1 puts us within a small polynomial factor of the minimum possible error. The error bound is also slightly sharper than bounds for the rank revealing QR algorithms [17].
1.5.3 Probabilistic error bounds
Theoretical results that guarantee the effectiveness of randomized methods such as the one described in Section 1.5.2 must first establish that in the “typical” outcome, they provide close to optimal results. In the context described in Section 1.5.2, the best possible basis for the column space is given by the left singular vectors of A [15]. To be precise, if Q consists of the first k left singular vectors, then

‖A − QQ∗A‖ = inf_{rank(B)=k} ‖A − B‖ = σ_{k+1},

where σ_j denotes the jth singular value of A. We consequently seek to prove that, for a matrix Q constructed via randomized sampling, the quantity ‖A − QQ∗A‖ is typically close to σ_{k+1}. An upper bound on the expected value of the error as in Theorem 1.5.1 is an example of a result of this kind.
Once the behavior in the typical case has been established, the second task of the analysis is to assert that the probability of a non-typical outcome is small. The probability is controlled by the oversampling parameter p: by moderately increasing p, the probability of a larger-than-typical error becomes vanishingly small (less than 10⁻¹⁵ is entirely typical).
To illustrate this type of result, we give here a simplified version of a result that first appeared in [24] and that was sharpened and reproved in Chapter 2. For details, see Section 2.10 of Chapter 2.
Theorem 1.5.2. Let m, n, and k be positive integers such that m ≥ n ≥ k. Let p be a positive integer. Let A be an m × n matrix with singular values {σ_j}_{j=1}^{n}. Let Ω be an n × (k + p) Gaussian random matrix, and let Q have k + p orthonormal columns such that QR = AΩ. Then the inequality

‖A − QQᵀA‖ ≤ 10 √( (k + p)(n − k) ) σ_{k+1} (1.17)

holds with probability at least 1 − ϕ(p), where ϕ is a universal function that decreases monotonically and satisfies, e.g.,

ϕ(5) < 3 · 10⁻⁶, ϕ(10) < 3 · 10⁻¹³, ϕ(15) < 8 · 10⁻²¹, ϕ(20) < 6 · 10⁻²⁷.
The very rapid decay of the “failure probability” is a key feature of randomized methods. For the method to go wrong, it is necessary for a large number of random variables to be drawn not only from their individual tail probabilities, but in a way that conspires to produce the error.
Theorem 1.5.2 shows that slightly loosening the upper bound dramatically improves the probability of success. Setting p = 5 already renders the probability of failure negligible. Theorem 1.5.1 gives a different picture of the oversampling parameter. Here we can see that choosing an increasingly larger p in fact diminishes the error. In practice, a value of p = 10 offers a good balance between accuracy, assurance of success, and computational efficiency.
There are two other quantities of interest in the error bounds: the singular values of A and the factor √(nk). If the singular values decay rapidly, the factor √(nk) becomes irrelevant. In the case of Theorem 1.5.1, the √n factor is a worst-case approximation to the Frobenius factor

σ_{k+1} ≤ ( Σ_{j>k} σ_j² )^{1/2} ≤ √(n − k) · σ_{k+1}, (1.18)

where equality holds on the right only if the singular values exhibit no decay at all. If the singular values decay sufficiently, the quantity is closer to the lower bound on the left. For matrices with slowly decaying singular values, the factor √(nk) can be improved upon via the power iteration type methods discussed in §1.6.2.
Finally, the a priori error bounds of Theorems 1.5.1 and 1.5.2 are not the only assurance we have of constructing an accurate basis Q. A posteriori estimates provide a computationally cheap and effective way of assessing the quantity ‖A − QQᵀA‖. In addition, the techniques provide a way to adaptively determine Q for cases where the rank is not known in advance but rather a tolerance is given, as in the fixed precision problem. In other words, given a tolerance ε, construct a matrix Q of minimal rank such that

‖A − QQᵀA‖ < ε. (1.19)

It is unlikely that the randomized methods will initially produce a minimal rank matrix Q; however, post-processing will reveal the true rank. For example, from an m × (k + p) matrix Q satisfying equation (1.19) we can compute a rank k + p approximate singular value decomposition satisfying the same tolerance ε. Truncating the factorization to the leading k terms produces rank-k factors that satisfy

‖A − UΣVᵀ‖ < σ_{k+1} + ε. (1.20)

Since k is not known in this situation, we can determine it as accurately as we like by examining the entries of Σ. See Theorem 2.9.3 page 91 of Chapter 2 for details.
1.5.4 Computational cost
The algorithm described in Section 1.5.2 is particularly efficient in situations where the matrix A can rapidly be applied to a vector. In this situation, its computational cost T_total satisfies

T_total ∼ k n · T_rand + k · T_mult + k² m · T_flop, (1.21)

where T_rand is the cost of generating a pseudo-random number, T_mult is the cost of applying A to a vector, and T_flop is the cost of a floating point operation.
For a dense matrix, T_mult = mn, and the estimate (1.21) reduces to the O(mnk) complexity of standard dense matrix techniques. However, it was shown in [34] that the scheme can be modified by using a random matrix Ω that has some internal structure that allows the matrix-vector product Y = AΩ to be evaluated in O(mn log(k)) operations. It turns out that the constants involved are sufficiently small that this algorithm beats standard techniques for very moderate problem sizes (say m, n ∼ 10³ and k ∼ 10²).
In order to implement the algorithms described in this section, a number of practical issues must be addressed: What random matrix Ω should be used? How is the matrix Q computed from the columns of Y? How do you solve the “fixed precision” problem where the computational tolerance is given and the numerical rank is to be determined? How do you compute the singular value decomposition once you have constructed the basis Q? Answering questions of this type was a principal motivation behind the paper [18] that forms Chapter 2 of this dissertation.
1.6 Variations of the basic theme
The basic algorithm leaves room for modifications depending on the computational setting and performance requirements. We describe a technique to speed up the matrix product AΩ that lowers the complexity of the algorithm below that of Krylov methods. We also present techniques that achieve arbitrarily high accuracy and ensure numerical stability.
1.6.1 Structured Random Matrices
The bottleneck in the basic algorithm, when applied to dense matrices stored in fast memory, is the matrix product AΩ, costing O(mnℓ) flops for ℓ = k + p. A dense Gaussian test matrix Ω is appropriate in certain situations and tightens the analysis of the methods, but is expensive to compute with. We can lift the restriction that the test matrix Ω be Gaussian and instead use a random test matrix that is structured, reducing the complexity of the matrix product to O(mn log(ℓ)). One such matrix, the Fast Johnson Lindenstrauss Transform (FJLT), is described in [3] to solve the approximate nearest neighbor problem. The FJLT uses a sparse projection matrix preconditioned with a randomized Fourier transform to achieve a low distortion embedding at reduced cost. Adaptation of this idea to the problem of randomized sampling is given in [34], and is known as the subsampled randomized Fourier transform (SRFT):
Ω_{n×ℓ} = √(n/ℓ) · D F R (1.22)

where
• D is an n × n diagonal matrix with d_ii drawn randomly from the complex unit circle.
• F is the n × n unitary discrete Fourier transform.
• R is an operator that randomly selects ℓ of the n columns.
For a real variant of the SRFT, consider d_ii = ±1 and F a Hadamard matrix. Other matrices that exhibit structure and randomness have been proposed in [22] and [1]. While the analysis of these structured matrices calls for more oversampling in the worst-case scenarios, in practice they often require no more oversampling than Gaussian test matrices.
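The complex SRFT of (1.22) can be applied with the FFT rather than by forming Ω; the sketch below is an illustration under that assumption, with names chosen here for exposition.

    import numpy as np

    def srft_sample(A, ell, seed=5):
        # Y = A @ Omega with Omega = sqrt(n/ell) * D F R, applied in O(m n log n) work.
        rng = np.random.default_rng(seed)
        m, n = A.shape
        d = np.exp(2j * np.pi * rng.random(n))          # D: random points on the unit circle
        ADF = np.fft.fft(A * d, axis=1, norm="ortho")   # right-multiply A D by the unitary DFT
        cols = rng.choice(n, size=ell, replace=False)   # R: keep ell random columns
        return np.sqrt(n / ell) * ADF[:, cols]          # complex m x ell sample matrix

The sample matrix Y = srft_sample(A, k + p) plays the same role as the Gaussian sample AΩ and is orthonormalized in exactly the same way.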
1.6.2 Increasing accuracy
As we alluded to in §1.5.3, the decay of the singular values plays a large role in the accuracy of the randomized methods. If we have sufficient decay in the singular values then the basic algorithm will produce excellent results. However, as indicated by inequality (1.18), if the tail singular values {σ_j}_{j>k} are significant, we can be penalized with a √n factor in the error. In order to reduce the contribution of these singular values we can sample a similar matrix B = (AAᵀ)^q A that has the same singular vectors as A and related singular values σ_j(B) = σ_j(A)^{2q+1}. The adaptation exponentially shrinks the constant in the error bound [28]:
E ‖(I − P_Z) A‖₂ ≤ ( E ‖(I − P_Z) B‖₂ )^{1/(2q+1)}
              ≤ ( 1 + ( 4 √(k + p) / (p − 1) ) √( min{m, n} ) )^{1/(2q+1)} σ_{k+1}
              = C^{1/(2q+1)} σ_{k+1},

where P_Z is the orthogonal projector onto the space spanned by Z = BΩ. With a small q we can get a nearly optimal approximation since C^{1/(2q+1)} → 1 rapidly as q moderately increases. Arbitrarily high precision can be achieved by increasing the parameter q. Numerical experiments confirm that q ≤ 3 in almost all cases produces excellent results. In fact, often results from q = 2 are indistinguishable from q = 3, which implies that the computed spectrum has fully converged after just two iterations.
Sampling matrix B is more expensive than the basic scheme and requires multiplication by A and Aᵀ at each iteration. However, the cost is controlled by the fact that q does not depend on k, and that the method is able to compute any size k factorization in a constant number of passes. Contrast this with iterative Krylov methods that require O(k) passes over the input matrix. Of course, for matrices that admit a fast matrix multiply, such as sparse matrices, the cost is even further reduced. In addition, the matrix product AᵀAx can be computed in a single pass through A. Assuming A is stored row-wise so that we have access to A one row at a time, compute the sum

AᵀAx = Σ_{i=1}^{m} ⟨A(i, :), x⟩ · (A(i, :))ᵀ. (1.23)

If we view the matrix B as A(AᵀA)^q, then the scheme needs only q + 1 passes over matrix A.
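A minimal sketch of the power scheme under these assumptions (dense arrays, a modest q):

    import numpy as np

    def power_scheme_sample(A, ell, q, seed=6):
        # Z = B @ Omega with B = (A A^T)^q A, applied as alternating products.
        # This uses 2q + 1 applications of A or A^T; grouping each A^T(A .) pair
        # with the row-wise trick in (1.23) brings it down to q + 1 passes over A.
        rng = np.random.default_rng(seed)
        m, n = A.shape
        Z = A @ rng.standard_normal((n, ell))
        for _ in range(q):
            Z = A @ (A.T @ Z)
        return Z            # orthonormalize Z (e.g. with a QR factorization) to obtain Q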
1.6.3 Numerical Stability
The basic randomized sampling scheme is numerically very stable. Both major components of the algorithm, matrix multiplication and orthogonalization, are well understood and can be stably implemented. However, there is a possibility that we can lose some accuracy in the smaller eigenvalues of our approximation. Each sample vector can be decomposed as (assuming A is symmetric)

y = Σ_j λ_j^{2q+1} ⟨ω, x_j⟩ x_j, (1.24)

where q is the parameter we introduced in §1.6.2. For q = 0, as in the basic scheme, any eigenvalue of interest is larger than machine precision, for example λ_j > ε. For q ≥ 1 it can be the case that λ_j^{2q+1} < ε, so that small eigenvalues are pushed below machine precision and appear as noise. If this happens we are unable to detect the eigenvectors associated with the small eigenvalues. Note however that this is not detrimental. Often it is the case that the largest eigenvalues are of interest, in which case the method still retains excellent accuracy for their associated eigenvectors.
One solution to the problem is inspired by a block Lanczos procedure and explored in Chapter 3. We keep intermediate iterates in the sample matrix as follows,

Y = [ Y^(0) Y^(1) . . . Y^(q) ], (1.25)

where the matrices Y^(i) are defined recursively as

Y^(0) = AΩ,
Y^(i) = AAᵀ Y^(i−1) for i = 1, 2, . . . , q. (1.26)
The test matrix Ω has ℓ columns, but typically ℓ is very close to k, e.g. ℓ = k + 2. The need for oversampling is greatly reduced with this method. Small eigenvalues are preserved in Y^(0) and thus do not get buried by exponentiation with q. This technique results in a larger m × (q + 1)ℓ size basis. There is extra work in orthogonalizing the larger basis and increased memory requirements. However, it can be effective when there is sufficient RAM to hold the basis.
A more space efficient solution is to orthogonalize the basis between iterations [25],

Q^(0) R^(0) = AΩ,
Q^(i) R^(i) = A Q^(i−1) for i = 1, 2, . . . , q. (1.27)

The orthogonalization preserves linear independence among the sample vectors.

Remark 2. Both methods are slightly paranoid in the sense that, for most cases, not every intermediate sample matrix needs to be kept, nor does orthogonalization need to be done between every iteration.
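A sketch in the spirit of (1.27), written for a general (not necessarily symmetric) matrix and therefore orthonormalizing after each application of Aᵀ and of A; this generalization is an assumption of the sketch, not a prescription from the text.

    import numpy as np

    def power_scheme_orthonormalized(A, ell, q, seed=7):
        # Power scheme with interleaved QR factorizations: the extra orthogonalizations
        # keep the sample vectors linearly independent, so small singular values are not
        # lost to round-off when raised to the power 2q + 1.
        rng = np.random.default_rng(seed)
        m, n = A.shape
        Q, _ = np.linalg.qr(A @ rng.standard_normal((n, ell)))
        for _ in range(q):
            W, _ = np.linalg.qr(A.T @ Q)
            Q, _ = np.linalg.qr(A @ W)
        return Q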
1.6.4 Adaptive methods
Thus far we have assumed that k is specified in advance. In other words, we have been solving the fixed rank problem. If instead a tolerance is specified and the rank is not provided, we need to solve the fixed precision problem. That is, find a matrix Q with orthonormal columns such that
‖A − QQᵀA‖ < ε. (1.28)
To assess the quantity in (1.28), the following theorem of [34] and [10] provides a cheap way to estimate an upper bound by examining the action of the matrix on random vectors. Substitute B = (I − QQᵀ)A in the following.

Theorem 1.6.1. For any matrix B, positive integer r, and i.i.d. standard Gaussian vectors {ω^(i)}_{i=1}^{r},

‖B‖ ≤ 10 √(2/π) max_{i=1,...,r} ‖Bω^(i)‖

except with probability 10⁻ʳ.
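A sketch of the resulting a posteriori estimator, applying B = (I − QQᵀ)A implicitly so that each sample costs one pass over A; the factor 10 √(2/π) and the failure probability 10⁻ʳ come from Theorem 1.6.1.

    import numpy as np

    def posterior_error_estimate(A, Q, r=10, seed=8):
        # Estimated upper bound on ||(I - Q Q^T) A|| from its action on r Gaussian vectors.
        rng = np.random.default_rng(seed)
        n = A.shape[1]
        worst = 0.0
        for _ in range(r):
            w = rng.standard_normal(n)
            y = A @ w                                  # one matrix-vector product with A
            worst = max(worst, np.linalg.norm(y - Q @ (Q.T @ y)))
        return 10.0 * np.sqrt(2.0 / np.pi) * worst

In an adaptive (fixed precision) setting, one draws additional sample vectors and enlarges Q until this estimate falls below the requested tolerance ε.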