Approximating matrix eigenvalues by randomized subspace iteration

Samuel M. Greene,1 Robert J. Webber,2 Timothy C. Berkelbach,1,3,a) and Jonathan Weare2,b)
1) Department of Chemistry, Columbia University, New York, New York 10027, United States
2) Courant Institute of Mathematical Sciences, New York University, New York, New York 10012, United States
3) Center for Computational Quantum Physics, Flatiron Institute, New York, New York 10010, United States
a) Electronic mail: [email protected]
b) Electronic mail: [email protected]

arXiv:2103.12109v1 [math.NA] 22 Mar 2021

Traditional numerical methods for calculating matrix eigenvalues are prohibitively expensive for high-dimensional problems. Randomized iterative methods allow for the estimation of a single dominant eigenvalue at reduced cost by leveraging repeated random sampling and averaging. We present a general approach to extending such methods for the estimation of multiple eigenvalues and demonstrate its performance for problems in quantum chemistry with matrices as large as 28 million by 28 million.

I. INTRODUCTION

Many scientific problems require matrix eigenvectors and eigenvalues, but methods for calculating them based on dense, in-place factorizations are intractably expensive for large matrices.1,2 Iterative methods involving repeated matrix multiplications3–5 offer reduced computational and memory costs, particularly for sparse matrices. However, even these methods are too expensive for the extremely large matrices increasingly encountered in modern applications.

Randomized iterative numerical linear algebra methods6–11 can enable significant further reductions in computational and memory costs by stochastically imposing sparsity in vectors and matrices at each iteration, thus facilitating the use of efficient computational techniques that leverage sparsity. While randomized iterative methods offer fewer established theoretical guarantees compared to the better studied matrix sketching approaches,12–20 their memory and computational costs can be significantly lower.11,21–24 When used to calculate the ground state energy, or smallest eigenvalue, of the quantum mechanical Hamiltonian operator, randomized iterative methods are known as projector quantum Monte Carlo approaches.7,22,25–28 Applying these methods to calculate multiple eigenvalues poses additional challenges due to the need to maintain orthogonality among eigenvectors as iteration proceeds.10,29–31

This paper presents a randomized subspace iteration approach to addressing these challenges. The method is general and can be used to extend all of the above referenced randomized iterative approaches for dominant eigenvalues (including continuous-space methods such as diffusion Monte Carlo32) to the multiple dominant eigenvalue problem. For concreteness, we focus on a particular randomization technique, namely one from the fast randomized iteration framework.11 We test our method on quantum mechanical problems; in this context, it can be understood as a natural generalization of projector Monte Carlo methods to excited states.

Among previous randomized iterative approaches to the multiple eigenvalue problem, ours is perhaps most closely related to several "replica" schemes that use multiple independent iterations to build subspaces within which the target matrix is subsequently diagonalized.30,33–35 In comparison, our method avoids the high-variance inner products of sparse, random vectors that can hinder replica approaches34 and results in a stable stochastic iteration that can be averaged to further reduce statistical error. As a notable consequence and unlike replica methods, our approach is applicable to iterative techniques in continuous space.32

II. A NON-STANDARD DETERMINISTIC SUBSPACE ITERATION

Our goal is to find the k dominant eigenvalues (counting multiplicity) of an n × n matrix A. Starting from an initial n × k matrix X(0), classical subspace iteration techniques construct a sequence of "matrix iterates" according to the iteration X(i+1) = AX(i)[G(i)]−1. For k = 1, this corresponds to power iteration, on which many single-eigenvalue randomized iterative methods are based. Multiplying by [G(i)]−1 enforces orthonormality among the columns of matrix iterates. The column span of the matrix iterates converges to the span of the k dominant eigenvectors if the overlap of the initial iterate with this subspace is nonzero. Eigenvalues and eigenvectors can be estimated after each iteration using the Rayleigh-Ritz method.36–38 In standard implementations of subspace iteration, both the orthogonalization and eigenvalue estimation steps involve nonlinear operations on X(i), which lead to statistical biases once randomness is introduced into the iterates by stochastic sparsification.
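For readers who want a concrete reference point, the following minimal NumPy sketch implements the standard procedure just described: orthonormalization of each iterate by a QR factorization followed by a Rayleigh-Ritz estimate. The function name, test matrix, and iteration count are illustrative choices of ours; this is the conventional baseline that the non-standard modifications below depart from, not the method developed in this paper.

    import numpy as np

    def classical_subspace_iteration(A, k, num_iters=1000, seed=0):
        # Standard subspace iteration X(i+1) = A X(i) [G(i)]^-1, with [G(i)]^-1
        # realized by a QR factorization (orthonormalization in the full space),
        # followed by a Rayleigh-Ritz eigenvalue estimate.
        rng = np.random.default_rng(seed)
        n = A.shape[0]
        X, _ = np.linalg.qr(rng.standard_normal((n, k)))    # initial iterate X(0)
        for _ in range(num_iters):
            Y = A @ X                                       # form A X(i)
            X, _ = np.linalg.qr(Y)                          # orthonormalize the columns
        ritz_vals, W = np.linalg.eig(X.conj().T @ A @ X)    # Rayleigh-Ritz on the k x k projection
        return ritz_vals, X @ W                             # Ritz values and Ritz vectors

    # Example: the three dominant eigenvalues of a small diagonal test matrix
    A_test = np.diag(np.arange(1.0, 101.0))
    vals, _ = classical_subspace_iteration(A_test, k=3)
    print(np.sort(vals.real))                               # approximately [98, 99, 100]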
In order to reduce these errors in our randomized algorithm, we make two non-standard choices. First, we estimate eigenvalues by solving the generalized eigenvalue problem

U∗AX(i)W(i) = U∗X(i)W(i)Λ(i)    (1)

for the unknown diagonal matrix Λ(i) of Ritz values, where U is a constant deterministic matrix with columns chosen to approximate the dominant eigenvectors of A. This approach represents a multi-eigenvalue generalization of the "projected estimator" commonly used in single-eigenvalue randomized methods.39 Eigenvalue estimates are exact for any eigenvector exactly contained within the column span of U regardless of the quality of the iterate X(i), a feature that will provide an additional means of reducing statistical error in our randomized algorithm.

Second, we construct the matrices G(i) by a non-standard approach. We only enforce orthogonality of the columns of X(i) within the column span of U instead of in the entire vector space. Multiplication by [G(i)]−1 typically also enforces a normalization constraint, which we relax by normalizing by a modified running geometric average of iterate norms. Specific procedures for constructing the matrices G(i) are described below.

The principle motivating these choices is that nonlinear operations in our iteration should only be applied to products between the iterates X(i) and the constant matrices U and A∗U. This leads to a suboptimal deterministic algorithm in the sense that eigenvalue estimates converge at only half the rate obtained if (1) is replaced by the standard quadratic Rayleigh quotient.36 However, when iterates are randomized as described below, their products with deterministic matrices typically have very low variance, even when the variance in the iterates is significant.11

III. STOCHASTIC COMPRESSION

If sparsity is leveraged, the cost of forming the matrix products AX(i) in the above algorithm scales as O(m_a m_x k), where m_a and m_x are the number of nonzero elements in each column of A and X(i), respectively. (X(i) has k columns.) Stochastic compression allows one to reduce this cost by zeroing nonzero elements at randomly selected positions. We define a stochastic compression operator Φ which, when applied to a generic vector x, returns a random compressed vector Φ(x) with (1) at most a user-specified number m of nonzero elements, and (2) elements equal to those of the input vector x in expectation, i.e., E[Φ(x)_i] = x_i. Applying Φ to a matrix X = [x_1 x_2 ...] involves compressing each of its columns independently to m nonzero elements, i.e. Φ(X) = [Φ(x_1) Φ(x_2) ...]. Many possible compression algorithms correspond to this generic definition.

error is zero. The variance in a compressed vector x′ = Φ(x) can be systematically reduced by increasing m, albeit at increased computational cost. In the context of randomized iterative methods, m can often be chosen to be significantly less than the dimension of x. The statistical variance in the dot product u∗x′ is often low even in high dimensions.11 In contrast, dot products between pairs of uncorrelated, sparse, random vectors, as are used in replica methods, can have high variance, particularly when the vectors are high-dimensional.34 And taking the dot product of a compressed vector with itself (e.g. x′∗x′) introduces a significant bias in high dimensions.27
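The text above specifies Φ only through its two defining properties; the particular compression algorithm used in the paper is not reproduced in this excerpt. As one concrete illustration, the NumPy sketch below implements a simple operator based on multinomial resampling with probabilities proportional to |x_i|, which returns at most m nonzero entries and is unbiased. The function name and test vector are ours.

    import numpy as np

    def compress(x, m, rng):
        # One possible compression operator Phi: sample m indices with probability
        # proportional to |x_i|, then reweight so that E[Phi(x)_i] = x_i.
        x = np.asarray(x, dtype=float)
        norm1 = np.abs(x).sum()
        if norm1 == 0.0:
            return np.zeros_like(x)
        probs = np.abs(x) / norm1
        counts = rng.multinomial(m, probs)           # at most m distinct nonzero positions
        return np.sign(x) * norm1 * counts / m       # unbiased reweighting

    # Averaging many independent compressions recovers x, illustrating E[Phi(x)] = x
    rng = np.random.default_rng(1)
    x = np.array([5.0, -3.0, 0.5, 0.1, -0.05, 0.02])
    print(np.mean([compress(x, m=3, rng=rng) for _ in range(20000)], axis=0))

Consistent with the discussion above, increasing m in this sketch lowers the variance of each compressed vector at proportionally higher sampling cost.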
IV. RANDOMIZED SUBSPACE ITERATION

Applying stochastic compression to our subspace iteration yields the iteration X(i+1) = AΦ(X(i))[G(i)]−1, where the compression operation Φ is performed independently at each iteration. We refer to the sequence of iterates X(i) so generated as a "trajectory." As emphasized above, the inner products of X(i) with deterministic matrices exhibit low variance. Nonetheless, even this low variance could result in significant biases if the "instantaneous" Ritz values corresponding to U∗AΦ(X(i)) and U∗X(i) [Eq. (1)] from each iteration were averaged, due to the nonlinear eigensolve operation. This will be demonstrated in our numerical experiments below. For this reason, we average these matrices in order to further reduce their variance before solving the eigenvalue problem

⟨U∗AΦ(X(i))⟩_i W = ⟨U∗X(i)⟩_i W Λ    (2)

where Λ is a diagonal matrix of Ritz values and ⟨·⟩_i represents an average over multiple iterations i from a single long trajectory. This formulation allows for the estimation of eigenvalues with low bias and low variance while also avoiding intractable memory costs.

In principle, we could first average the iterates to obtain ⟨X(i)⟩_i, an accurate representation of the dominant eigenspace, and then calculate its associated Ritz values. However, this would require O(kn) memory, rendering it impractical for large matrices. Instead, the k × k matrices U∗AΦ(X(i)) and U∗X(i) in (2) can be stored and averaged at little memory cost and, due to their linear dependence on iterates, without statistical bias. In fact, estimates from (2) are equal (in expectation value) to those obtained by applying the Rayleigh-Ritz estimator in (1) to ⟨X(i)⟩_i.
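To show how stochastic compression, the projection onto the column span of U, and the averaging in (2) fit together, the following sketch combines them, reusing the compress() operator defined earlier. Several details are illustrative guesses rather than the authors' procedure: here G(i) is taken from a QR factorization of U∗AΦ(X(i)), so that each new iterate is orthonormal only within the column span of U; the running geometric-average normalization mentioned above is omitted; and the burn-in and iteration counts are arbitrary.

    import numpy as np

    def randomized_subspace_iteration(A, U, m, num_iters=1000, burn_in=100, seed=0):
        # Iterate X(i+1) = A Phi(X(i)) [G(i)]^-1 while accumulating the k x k
        # products U* A Phi(X(i)) and U* X(i), then solve Eq. (2) once at the end.
        rng = np.random.default_rng(seed)
        k = U.shape[1]
        X = U.copy()                                   # initial iterate overlapping span(U)
        sum_UAX = np.zeros((k, k))
        sum_UX = np.zeros((k, k))
        for i in range(num_iters):
            PhiX = np.column_stack([compress(X[:, j], m, rng) for j in range(k)])
            Y = A @ PhiX                               # A Phi(X(i))
            if i >= burn_in:                           # average the low-variance k x k products
                sum_UAX += U.T @ Y
                sum_UX += U.T @ PhiX
            _, G = np.linalg.qr(U.T @ Y)               # G(i): orthogonality within span(U) only
            X = Y @ np.linalg.inv(G)                   # X(i+1) = A Phi(X(i)) [G(i)]^-1
        # Generalized eigenvalue problem (2): <U* A Phi(X)> W = <U* X> W Lambda
        ritz_vals, _ = np.linalg.eig(np.linalg.solve(sum_UX, sum_UAX))
        return ritz_vals

    # Example: a symmetric matrix with a known, well-separated dominant spectrum.
    # U holds three exact dominant eigenvectors, so by the projected-estimator
    # property the eigenvalue estimates are exact regardless of iterate quality.
    rng = np.random.default_rng(2)
    n = 500
    Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
    spectrum = np.concatenate([np.linspace(0.05, 0.3, n - 3), [0.9, 0.95, 1.0]])
    A_test = Q @ np.diag(spectrum) @ Q.T
    U_test = Q[:, -3:]
    print(np.sort(randomized_subspace_iteration(A_test, U_test, m=100).real))  # ~ [0.9, 0.95, 1.0]

In a realistic application U would only approximate the dominant eigenvectors, and the accuracy of the estimates would then also depend on the quality of the iterates and the length of the averaging.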