PRIMME SVDS: A HIGH-PERFORMANCE PRECONDITIONED SVD SOLVER FOR ACCURATE LARGE-SCALE COMPUTATIONS

LINGFEI WU,* ELOY ROMERO,* ANDREAS STATHOPOULOS*

Abstract. The increasing number of applications requiring the solution of large-scale singular value problems has rekindled interest in iterative methods for the SVD. Some promising recent advances in large-scale iterative methods are still plagued by slow convergence and accuracy limitations when computing the smallest singular triplets. Furthermore, their current implementations in MATLAB cannot address the required large problems. Recently, we presented a preconditioned, two-stage method to effectively and accurately compute a small number of extreme singular triplets. In this research, we present a high-performance library, PRIMME SVDS, that implements our hybrid method based on the state-of-the-art eigensolver package PRIMME for both largest and smallest singular values. PRIMME SVDS fills a gap in production-level software for computing the partial SVD, especially with preconditioning. The numerical experiments demonstrate its superior performance compared to other state-of-the-art software and its good parallel performance under strong and weak scaling.

1. Introduction. We consider the problem of finding a small number, $k \ll n$, of extreme singular values and corresponding left and right singular vectors of a large sparse matrix $A \in \mathbb{R}^{m \times n}$ ($m \geq n$). These singular values and vectors satisfy

(1.1)   $A v_i = \sigma_i u_i$,

where the singular values are labeled in ascending order, $\sigma_1 \leq \sigma_2 \leq \cdots \leq \sigma_n$. The matrices $U = [u_1, \ldots, u_n] \in \mathbb{R}^{m \times n}$ and $V = [v_1, \ldots, v_n] \in \mathbb{R}^{n \times n}$ have orthonormal columns. $(\sigma_i, u_i, v_i)$ is called a singular triplet of $A$. The singular triplets corresponding to the largest (smallest) singular values are referred to as the largest (smallest) singular triplets. The Singular Value Decomposition (SVD) can always be computed [12].
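To make the definitions concrete, the triplet relation (1.1), the ascending ordering, and the orthonormality of $U$ and $V$ can be checked on a small dense matrix. The following NumPy sketch is our illustration for the reader and is not part of the paper's software:

```python
import numpy as np

# Small dense illustration of the thin SVD with m >= n.
rng = np.random.default_rng(0)
m, n = 8, 5
A = rng.standard_normal((m, n))

# NumPy returns singular values in descending order; flip everything to
# the paper's ascending convention sigma_1 <= ... <= sigma_n.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U, s, V = U[:, ::-1], s[::-1], Vt[::-1].T

# Each singular triplet (sigma_i, u_i, v_i) satisfies A v_i = sigma_i u_i
# (and, for the transpose, A^T u_i = sigma_i v_i).
assert np.allclose(A @ V, U * s)
assert np.allclose(A.T @ U, V * s)

# U and V have orthonormal columns.
assert np.allclose(U.T @ U, np.eye(n))
assert np.allclose(V.T @ V, np.eye(n))
```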
The SVD is one of the most widely used computational kernels in various scientific and engineering areas. The computation of a few of the largest singular triplets plays a critical role in machine learning for computing a low-rank matrix approximation [43, 46, 8] and the nuclear norm [37], in data mining for latent semantic indexing [10], in statistics for principal component analysis [25], and in signal processing and pattern recognition as an important filtering tool. Calculating a few of the smallest singular triplets plays an important role in linear algebra applications such as computing pseudospectra and determining the range, null space, and rank of a matrix [14, 42], in machine learning for total least squares problems [13], and as a general deflation mechanism, e.g., for computing the trace of the matrix inverse [44].

For large-scale problems, memory demands and computational complexity necessitate the use of iterative methods. Over the last two decades, iterative methods for the SVD have been developed [12, 36, 26, 9, 11, 3, 2, 23, 24, 20, 45] to effectively compute a few singular triplets under limited memory. In particular, the computation of the smallest singular triplets presents challenges both to the speed of convergence and to the accuracy of iterative methods, even for problems with moderate conditioning. Many recent research efforts attempt to address this challenge with new or modified iterative methods [3, 2, 23, 24, 20]. These methods still lack the necessary robustness and display irregular or slow convergence in cases of clustered spectra. Most importantly, they are only available in MATLAB research implementations that cannot solve large-scale problems.

*Department of Computer Science, College of William and Mary, Williamsburg, Virginia 23187-8795, U.S.A. ([email protected], [email protected], [email protected])

arXiv:1607.01404v2 [cs.MS] 24 Jan 2017
In our previous work [45], we presented a hybrid, two-stage method that achieves both efficiency and accuracy for both largest and smallest singular values under limited memory. In addition, the method can take full advantage of preconditioning to significantly accelerate the computation, which is the key to large-scale SVD computation for certain real-world problems. Our previous results showed substantial improvements in efficiency and robustness over other methods.

With all this algorithmic activity, it is surprising that there is a lack of good quality software for computing the partial SVD, especially with preconditioning. SVDPACK [7] and PROPACK [27] can efficiently compute the largest singular triplets, but they are either single-threaded or only support shared-memory parallel computing. In addition, they are slow and unreliable for computing the smallest singular triplets and cannot directly take advantage of preconditioning. SLEPc offers a more robust implementation of the thick-restarted Lanczos bidiagonalization method (LBD) [18], which can be used in both distributed and shared memory environments. SLEPc also offers some limited functionality for computing the partial SVD through its eigensolvers [19]. However, neither approach is tuned for accurate computation of the smallest singular triplets, and only the eigensolver approach can use preconditioning, which limits the desired accuracy or the practical efficiency of the package. There is a clear need for high quality, high performance SVD software that addresses the above shortcomings of current software and that provides a flexible interface suitable for both black-box and advanced usage. In this work we address this need by developing a high quality SVD library, PRIMME SVDS, based on the state-of-the-art package PRIMME (PReconditioned Iterative MultiMethod Eigensolver [40]).
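For a sense of what black-box usage of a partial SVD solver looks like from user code, the sketch below calls SciPy's `scipy.sparse.linalg.svds` on a sparse matrix. SciPy is used here purely as a stand-in for illustration; it is not the PRIMME SVDS interface:

```python
import numpy as np
from scipy.sparse import random as sparse_random
from scipy.sparse.linalg import svds

# Black-box partial SVD: request the k largest singular triplets of a
# large sparse matrix through a simple one-call interface.
rng = np.random.default_rng(1)
A = sparse_random(1000, 300, density=0.01, format='csr', random_state=rng)

k = 5
u, s, vt = svds(A, k=k, which='LM')   # k largest singular triplets

order = np.argsort(s)                 # sort ascending for consistency
s, u, vt = s[order], u[:, order], vt[order]

# Check the residuals ||A v_i - sigma_i u_i|| of the computed triplets.
res = np.linalg.norm(A @ vt.T - u * s, axis=0)
assert np.all(res < 1e-8 * s.max())
```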
We highlight three main contributions:

• We present the modifications and functionality extensions of PRIMME that are required to implement our two-stage PHSVDS method [45] so that it achieves full accuracy and high performance, for both smallest and largest singular values and with or without preconditioning.

• We provide intuitive user interfaces in C, MATLAB, Python, and R for both ordinary and advanced users to fully exploit the power of PRIMME SVDS. We also discuss distributed and shared memory interfaces.

• We demonstrate with numerical experiments on large-scale matrices that PRIMME SVDS is more efficient and significantly more robust than the two most widely used software packages, PROPACK and SLEPc, even without a preconditioner. We also demonstrate its good parallel scalability.

Since this is a software paper, the presentation of our method is given in a more abstract way that relates to the implementation. Further theoretical and algorithmic details can be found in [45].

2. Related work. The partial SVD problem can be solved as a Hermitian eigenvalue problem or directly using bidiagonalization methods [14]. We introduce these approaches and discuss the resulting state-of-the-art SVD methods. Next, we review the state-of-practice software, both eigensolver and dedicated SVD software.

2.1. Approaches and Methods for Computing SVD Problems. The first approach computes eigenpairs of the normal equations matrix $C = A^T A \in \mathbb{C}^{n \times n}$ (or $A A^T \in \mathbb{C}^{m \times m}$ if $m < n$). The eigenpairs of $C$ correspond to the squares of the singular values and the right singular vectors of $A$, $(A^T A)V = V\,\mathrm{diag}(\sigma_1^2, \ldots, \sigma_n^2)$. If $\sigma_i \neq 0$, the corresponding left singular vectors are obtained as $u_i = \frac{1}{\sigma_i} A v_i$. Although extreme Hermitian eigenvalue problems can be solved very efficiently, $C$ can be ill-conditioned. Assume $(\tilde\sigma_i, \tilde u_i, \tilde v_i)$ is a computed singular triplet corresponding to the exact triplet $(\sigma_i, u_i, v_i)$ with residual satisfying $\|C \tilde v_i - \tilde v_i \tilde\sigma_i^2\| \leq \|A\|^2 \epsilon_{mach}$.
Then the relative errors satisfy the relations [41],

(2.1)   $\displaystyle \left|\frac{\tilde\sigma_i - \sigma_i}{\sigma_i}\right| \leq \frac{\sigma_n^2}{\sigma_i^2}\,\epsilon_{mach} \equiv \kappa_i^2\,\epsilon_{mach}$,

(2.2)   $\displaystyle \sin\angle(\tilde v_i, v_i) \lesssim \frac{\sqrt{2}\,\epsilon_{mach}}{|\kappa_i^{-1} - \kappa_j^{-1}|\,|\kappa_i^{-1} + \kappa_j^{-1}|}$,

(2.3)   $\displaystyle \frac{\|\tilde u_i - u_i\|_2}{\|u_i\|_2} \lesssim \kappa_i\left(\|\tilde v_i - v_i\|_2 + \gamma\,\epsilon_{mach}\right)$,

where $\sigma_j$ is the closest singular value to $\sigma_i$, $\gamma$ is a constant that depends on the dimensions of $A$, and $\epsilon_{mach}$ is the machine precision. Since all singular values (except the largest, $\sigma_n$) will lose accuracy, this approach is typically followed by a second stage of iterative refinement. The singular triplets can be refined one by one [36, 11, 7] or collectively as an initial subspace for an iterative eigensolver [45]. This is especially needed when seeking the smallest singular values.

The second approach seeks eigenpairs of the augmented matrix $B = \begin{bmatrix} 0 & A^T \\ A & 0 \end{bmatrix} \in \mathbb{C}^{(m+n) \times (m+n)}$. If $U^\perp \in \mathbb{C}^{m \times (m-n)}$ is a basis for the orthogonal complement subspace of $U$, and we define the orthonormal matrix

(2.4)   $\displaystyle Y = \frac{1}{\sqrt{2}} \begin{bmatrix} V & -V & 0 \\ U & U & \sqrt{2}\,U^\perp \end{bmatrix}$,

then the eigenvalue decomposition of $B$ corresponds to [12, 14]

(2.5)   $\displaystyle B Y = Y\,\mathrm{diag}(\sigma_1, \ldots, \sigma_n, -\sigma_1, \ldots, -\sigma_n, \underbrace{0, \ldots, 0}_{m-n})$.

This approach can compute all singular values accurately, i.e., with residual norm close to $\|A\| \epsilon_{mach}$, or relative accuracy $O(\kappa_i \epsilon_{mach})$. The main disadvantage of this approach is that when seeking the smallest singular values, the eigenvalue problem is a maximally interior one, and thus Krylov iterative methods converge slowly and unreliably. Let $K_l(C, v_1)$ be the Krylov subspace of dimension $l$ of matrix $C$ starting with initial vector $v_1$. It is easy to see that $K_{2l}(B, \bar v_1) = \begin{bmatrix} K_l(C, v_1) \\ 0 \end{bmatrix} \oplus \begin{bmatrix} 0 \\ A K_l(C, v_1) \end{bmatrix}$, where $\bar v_1$ is the vector $[v_1; 0]$. Therefore, an unrestarted Krylov method on $B$ with the proper initial vector and extraction method converges twice as slowly as one on $C$ [45].
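The accuracy limitation in (2.1) is easy to observe numerically. The sketch below is our own NumPy experiment (with singular values fixed by construction, not data from the paper): squaring in $C = A^T A$ pushes $\sigma_{min}^2$ below machine precision and destroys its relative accuracy, while the augmented matrix $B$ recovers $\sigma_{min}$ to machine-level absolute accuracy.

```python
import numpy as np

# Build a matrix with known singular values, including a tiny one.
n = 4
sigmas = np.array([1e-9, 1e-3, 1.0, 2.0])          # ascending by construction
rng = np.random.default_rng(2)
Q1, _ = np.linalg.qr(rng.standard_normal((n, n)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = Q1 @ np.diag(sigmas) @ Q2.T

# Normal equations: eigenvalues of C are sigma_i^2; sigma_min^2 = 1e-18
# is below eps*||C||, so the recovered sigma_min has no correct digits.
C = A.T @ A
w = np.linalg.eigvalsh(C)                          # ascending eigenvalues
sig_from_C = np.sqrt(np.abs(w))
err_C = abs(sig_from_C[0] - sigmas[0]) / sigmas[0]

# Augmented matrix: eigenvalues are {+-sigma_i}, computed to accuracy
# O(||A||*eps), so sigma_min is recovered with tiny absolute error.
B = np.block([[np.zeros((n, n)), A.T], [A, np.zeros((n, n))]])
wB = np.linalg.eigvalsh(B)
sig_from_B = wB[n:]                                # the n nonnegative ones
err_B = abs(sig_from_B[0] - sigmas[0])

assert err_C > 1e-2          # relative error ruined by squaring
assert err_B < 1e-12         # absolute error at machine-precision level
```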
In practice, restarted methods on $C$ can converge much faster than on $B$, partly because the Rayleigh-Ritz projection for interior eigenvalues is less efficient due to the presence of spurious Ritz values [35]. Harmonic projection [33, 34] and refined projection [22] have been proposed to overcome this difficulty. However, near-optimal restarted methods on $C$ still converge significantly faster [45].
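The Krylov subspace identity relating $B$ and $C$ in Section 2.1 can also be verified directly on a small example. In the sketch below (our illustration; the helper `krylov` is a hypothetical name), the monomial Krylov bases are built explicitly and their column spaces compared:

```python
import numpy as np

# Verify K_{2l}(B, [v1; 0]) = [K_l(C, v1); 0] (+) [0; A K_l(C, v1)].
rng = np.random.default_rng(3)
m, n, l = 7, 5, 3
A = rng.standard_normal((m, n))
C = A.T @ A
B = np.block([[np.zeros((n, n)), A.T], [A, np.zeros((m, m))]])

v1 = rng.standard_normal(n)
vbar = np.concatenate([v1, np.zeros(m)])           # [v1; 0]

def krylov(M, v, k):
    # Columns v, Mv, ..., M^{k-1} v (monomial Krylov basis).
    cols = [v]
    for _ in range(k - 1):
        cols.append(M @ cols[-1])
    return np.column_stack(cols)

K2l_B = krylov(B, vbar, 2 * l)
Kl_C = krylov(C, v1, l)
right = np.block([[Kl_C, np.zeros((n, l))],
                  [np.zeros((m, l)), A @ Kl_C]])

# Same column space: ranks of each basis and of their union agree.
r = np.linalg.matrix_rank
assert r(K2l_B) == r(right) == r(np.hstack([K2l_B, right]))
```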