Randomized Numerical Linear Algebra: Foundations & Algorithms

Per-Gunnar Martinsson, University of Texas at Austin
Joel A. Tropp, California Institute of Technology

Abstract: This survey describes probabilistic algorithms for linear algebra computations, such as factorizing matrices and solving linear systems. It focuses on techniques that have a proven track record for real-world problems. The paper treats both the theoretical foundations of the subject and practical computational issues. Topics include norm estimation; matrix approximation by sampling; structured and unstructured random embeddings; linear regression problems; low-rank approximation; subspace iteration and Krylov methods; error estimation and adaptivity; interpolatory and CUR factorizations; Nyström approximation of positive semidefinite matrices; single-view (“streaming”) algorithms; full rank-revealing factorizations; solvers for linear systems; and approximation of kernel matrices that arise in machine learning and in scientific computing.

CONTENTS
1. Introduction
2. Linear algebra preliminaries
3. Probability preliminaries
4. Trace estimation by sampling
5. Schatten p-norm estimation by sampling
6. Maximum eigenvalues and trace functions
7. Matrix approximation by sampling
8. Randomized embeddings
9. Structured random embeddings
10. How to use random embeddings
11. The randomized rangefinder
12. Error estimation and adaptivity
13. Finding natural bases: QR, ID, and CUR
14. Nyström approximation
15. Single-view algorithms
16. Factoring matrices of full or nearly full rank
17. General linear solvers
18. Linear solvers for graph Laplacians
19. Kernel matrices in machine learning
20. High-accuracy approximation of kernel matrices
References

1. INTRODUCTION

Numerical linear algebra (NLA) is one of the great achievements of scientific computing. On most computational platforms, we can now routinely and automatically solve small- and medium-scale linear algebra problems to high precision. The purpose of this survey is to describe a set of probabilistic techniques that have joined the mainstream of NLA over the last decade. These new techniques have accelerated everyday computations for small- and medium-size problems, and they have enabled large-scale computations that were beyond the reach of classical methods.

1.1. Classical numerical linear algebra. NLA definitively treats several major classes of problems, including

• solution of dense and sparse linear systems;
• orthogonalization, least-squares, and Tikhonov regularization;
• determination of eigenvalues, eigenvectors, and invariant subspaces;
• singular value decomposition (SVD) and total least-squares.

In spite of this catalog of successes, important challenges remain. The sheer scale of certain datasets (terabytes and beyond) makes them impervious to classical NLA algorithms. Modern computing architectures (GPUs, multi-core CPUs, massively distributed systems) are powerful, but this power can only be unleashed by algorithms that minimize data movement and that are designed ab initio with parallel computation in mind. New ways to organize and present data (out-of-core, distributed, streaming) also demand alternative techniques. Randomization offers novel tools for addressing all of these challenges.
This paper surveys these new ideas, provides detailed descriptions of algorithms with a proven track record, and outlines the mathematical techniques used to analyze these methods.

1.2. Randomized algorithms emerge. Probabilistic algorithms have held a central place in scientific computing ever since Ulam and von Neumann’s groundbreaking work on Monte Carlo methods in the 1940s. For instance, Monte Carlo algorithms are essential for high-dimensional integration and for solving PDEs set in high-dimensional spaces. They also play a major role in modern machine learning and uncertainty quantification.

For many decades, however, numerical analysts regarded randomized algorithms as a method of last resort, to be invoked only in the absence of an effective deterministic alternative. Indeed, probabilistic techniques have several undesirable features. First, Monte Carlo methods often produce output with low accuracy. This is a consequence of the central limit theorem, and in many situations it cannot be avoided. Second, many computational scientists have a strong attachment to the engineering principle that two successive runs of the same algorithm should produce identical results. This requirement aids with debugging, and it can be critical for applications where safety is paramount, for example simulation of infrastructure or control of aircraft. Randomized methods do not generally offer this guarantee. (Controlling the seed of the random number generator can provide a partial work-around, but this necessarily involves additional complexity.)

Nevertheless, in the 1980s, randomized algorithms started to make inroads into NLA. Some of the early work concerns spectral computations, where it was already traditional to use random initialization. Dixon (1983) recognized that (a variant of) the power method with a random start provably approximates the largest eigenvalue of a positive semidefinite (PSD) matrix, even without a gap between the first and second eigenvalue. Kuczyński and Woźniakowski (1992) provided a sharp analysis of this phenomenon for both the power method and the Lanczos algorithm. Around the same time, Girard (1989) and Hutchinson (1990) proposed Monte Carlo methods for estimating the trace of a large PSD matrix. Soon after, Parker (1995) demonstrated that randomized transformations can be used to avoid pivoting steps in Gaussian elimination.

Starting in the late 1990s, researchers in theoretical computer science identified other ways to apply probabilistic algorithms in NLA. Alon, Matias and Szegedy (1999) and Alon, Gibbons, Matias and Szegedy (2002) showed that randomized embeddings allow for computations on streaming data with limited storage. Papadimitriou, Raghavan, Tamaki and Vempala (2000) and Frieze, Kannan and Vempala (2004) proposed Monte Carlo methods for low-rank matrix approximation. Drineas, Kannan and Mahoney (2006a, 2006b, 2006c) wrote the first statement of theoretical principles for randomized NLA. Sarlós (2006) showed how subspace embeddings support linear algebra computations.

In the mid-2000s, numerical analysts introduced practical randomized algorithms for low-rank matrix approximation and least-squares problems. This work includes the first computational evidence that randomized algorithms outperform classical NLA algorithms for particular classes of problems.
Early contributions include Martinsson, Rokhlin and Tygert (2006a), Liberty, Woolfe, Martinsson, Rokhlin and Tygert (2007), Rokhlin and Tygert (2008), and Woolfe, Liberty, Rokhlin and Tygert (2008). These papers inspired later work, such as Avron, Maymounkov and Toledo (2010), Halko, Martinsson and Tropp (2011a), and Halko, Martinsson, Shkolnisky and Tygert (2011b), that has made a direct impact in applications.

In parallel with the advances in numerical analysis, a tide of enthusiasm for randomized algorithms has flooded into cognate fields. In particular, stochastic gradient descent (Bottou 2010) has become a standard algorithm for solving large optimization problems in machine learning.

At the time of writing, in late 2019, randomized algorithms have joined the mainstream of NLA. They now appear in major reference works and textbooks (Golub and Van Loan 2013, Strang 2019). Key methods are being incorporated into standard software libraries (NAG 2019, Xiao, Gu and Langou 2017, Ghysels, Li, Gorman and Rouet 2017).

1.3. What does randomness accomplish? Over the course of this survey we will explore a number of different ways that randomization can be used to design effective NLA algorithms. For the moment, let us just summarize the most important benefits.

Randomized methods can handle certain NLA problems faster than any classical algorithm. In Section 10, we describe a randomized algorithm that can solve a dense m × n least-squares problem with m ≫ n using about O(mn + n³) arithmetic operations (Rokhlin and Tygert 2008). Meanwhile, classical methods require O(mn²) operations. In Section 18, we present an algorithm called SPARSECHOLESKY that can solve the Poisson problem on a dense undirected graph in time that is roughly quadratic in the number of vertices (Kyng and Sachdeva 2016). Standard methods have cost that is cubic in the number of vertices. The improvements can be even larger for sparse graphs.

Randomization allows us to tackle problems that otherwise seem impossible. Section 15 contains an algorithm that can compute a rank-r truncated SVD of an m × n matrix in a single pass over the data using working storage O(r(m + n)). The first reference for this kind of algorithm is Woolfe et al. (2008). We know of no classical method with this computational profile.

From an engineering point of view, randomization has another crucial advantage: it allows us to restructure NLA computations in a fundamentally different way. In Section 11, we will introduce the randomized SVD algorithm (Martinsson et al. 2006a, Halko et al. 2011a). Essentially all the arithmetic in this procedure takes place in a short sequence of matrix–matrix multiplications. Matrix multiplication is a highly optimized primitive on modern computing platforms, so algorithms organized around it parallelize well and communicate data efficiently.
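To make this restructuring concrete, here is a minimal sketch of the two-stage shape of the randomized SVD in NumPy. It is an illustration under simplifying assumptions, not the algorithm as developed in Section 11: the function name, the Gaussian test matrix, and the oversampling default p = 10 are our own illustrative choices.

```python
import numpy as np

def randomized_svd(A, r, p=10):
    """Sketch of a rank-r truncated SVD via a randomized rangefinder.

    A : (m, n) dense array to be approximated
    r : target rank
    p : oversampling parameter (illustrative default; a small
        constant is typical)
    Returns U, s, Vt with A approximately U @ np.diag(s) @ Vt.
    """
    m, n = A.shape
    k = min(r + p, n)
    # Stage A: sample the range of A with a Gaussian test matrix,
    # then orthonormalize the sample. The cost is dominated by one
    # matrix-matrix product.
    Omega = np.random.standard_normal((n, k))
    Y = A @ Omega
    Q, _ = np.linalg.qr(Y)
    # Stage B: project A onto the captured subspace and factorize
    # the small k x n matrix with a classical dense SVD.
    B = Q.T @ A
    U_hat, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_hat
    return U[:, :r], s[:r], Vt[:r, :]
```

Note that the dominant costs are the two products A @ Omega and Q.T @ A; everything else operates on small matrices, which is why the procedure maps so well onto modern hardware.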