PRIMME SVDS: A High-Performance Preconditioned SVD Solver for Accurate Large-Scale Computations


PRIMME SVDS: A HIGH-PERFORMANCE PRECONDITIONED SVD SOLVER FOR ACCURATE LARGE-SCALE COMPUTATIONS

LINGFEI WU,∗ ELOY ROMERO,∗ ANDREAS STATHOPOULOS∗

∗Department of Computer Science, College of William and Mary, Williamsburg, Virginia 23187-8795, U.S.A. ([email protected], [email protected], [email protected])

arXiv:1607.01404v2 [cs.MS] 24 Jan 2017

Abstract. The increasing number of applications requiring the solution of large scale singular value problems has rekindled an interest in iterative methods for the SVD. Some promising recent advances in large scale iterative methods are still plagued by slow convergence and accuracy limitations when computing the smallest singular triplets. Furthermore, their current implementations in MATLAB cannot address the required large problems. Recently, we presented a preconditioned, two-stage method to effectively and accurately compute a small number of extreme singular triplets. In this research, we present a high-performance library, PRIMME SVDS, that implements our hybrid method based on the state-of-the-art eigensolver package PRIMME for both largest and smallest singular values. PRIMME SVDS fills a gap in production level software for computing the partial SVD, especially with preconditioning. The numerical experiments demonstrate its superior performance compared to other state-of-the-art software and its good parallel performance under strong and weak scaling.

1. Introduction. We consider the problem of finding a small number, k ≪ n, of extreme singular values and corresponding left and right singular vectors of a large sparse matrix A ∈ R^{m×n} (m ≥ n). These singular values and vectors satisfy

(1.1)  A v_i = σ_i u_i,

where the singular values are labeled in ascending order, σ_1 ≤ σ_2 ≤ ... ≤ σ_n. The matrices U = [u_1, ..., u_n] ∈ R^{m×n} and V = [v_1, ..., v_n] ∈ R^{n×n} have orthonormal columns. (σ_i, u_i, v_i) is called a singular triplet of A. The singular triplets corresponding to the largest (smallest) singular values are referred to as the largest (smallest) singular triplets. The Singular Value Decomposition (SVD) can always be computed [12].

The SVD is one of the most widely used computational kernels in various scientific and engineering areas. The computation of a few of the largest singular triplets plays a critical role in machine learning for computing a low-rank matrix approximation [43, 46, 8] and the nuclear norm [37], in data mining for latent semantic indexing [10], in statistics for principal component analysis [25], and in signal processing and pattern recognition as an important filtering tool. Calculating a few of the smallest singular triplets plays an important role in linear algebra applications such as computing pseudospectra and determining the range, null space, and rank of a matrix [14, 42], in machine learning for total least squares problems [13], and as a general deflation mechanism, e.g., for computing the trace of the matrix inverse [44].

For large scale problems, memory demands and computational complexity necessitate the use of iterative methods. Over the last two decades, iterative methods for the SVD have been developed [12, 36, 26, 9, 11, 3, 2, 23, 24, 20, 45] to effectively compute a few singular triplets under limited memory. In particular, the computation of the smallest singular triplets presents challenges both to the speed of convergence and to the accuracy of iterative methods, even for problems with moderate conditioning. Many recent research efforts attempt to address this challenge with new or modified iterative methods [3, 2, 23, 24, 20].
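To make the problem setting of (1.1) concrete, the following minimal sketch computes a few of the largest singular triplets of a sparse test matrix with SciPy's iterative svds routine and checks the residuals ‖A v_i − σ_i u_i‖. It is only an illustration of the task, not of PRIMME SVDS; the matrix, its size, and the value of k are arbitrary assumptions.

```python
# Minimal sketch (not PRIMME SVDS): compute a few extreme singular triplets of a
# sparse matrix iteratively and verify A v_i = sigma_i u_i.  The test matrix and
# k are arbitrary choices for illustration.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds

m, n, k = 500, 300, 5
A = sp.random(m, n, density=0.02, format="csr", random_state=0) \
    + 0.1 * sp.eye(m, n, format="csr")

# k largest singular triplets; which="SM" would request the smallest ones,
# which is the much harder case this paper targets.
u, s, vt = svds(A, k=k, which="LM")

residuals = np.linalg.norm(A @ vt.T - u * s, axis=0)   # ||A v_i - sigma_i u_i||
print("largest singular values :", np.sort(s)[::-1])
print("residual norms          :", residuals)
print("V orthonormal?          :", np.allclose(vt @ vt.T, np.eye(k)))
```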
These methods still lack the necessary robustness and display irregular or slow convergence in cases of clustered spectra. Most importantly, they are only available in MATLAB research implementations that cannot solve large-scale problems. In our previous work [45], we presented a hybrid, two-stage method that achieves both efficiency and accuracy for both largest and smallest singular values under limited memory. In addition, the method can take full advantage of preconditioning to significantly accelerate the computation, which is the key towards large-scale SVD computation for certain real-world problems. Our previous results showed substantial improvements in efficiency and robustness over other methods.

With all this algorithmic activity it is surprising that there is a lack of good quality software for computing the partial SVD, especially with preconditioning. SVDPACK [7] and PROPACK [27] can efficiently compute the largest singular triplets, but they are either single-threaded or only support shared-memory parallel computing. In addition, they are slow and unreliable for computing the smallest singular triplets and cannot directly take advantage of preconditioning. SLEPc offers a more robust implementation of the thick restarted Lanczos bidiagonalization method (LBD) [18], which can be used in both distributed and shared memory environments. SLEPc also offers some limited functionality for computing the partial SVD through its eigensolvers [19]. However, neither approach is tuned for accurate computation of the smallest singular triplets, and only the eigensolver approach can use preconditioning, which limits the desired accuracy or the practical efficiency of the package.

There is a clear need for high quality, high performance SVD software that addresses the above shortcomings of current software and that provides a flexible interface suitable for both black-box and advanced usage. In this work we address this need by developing a high quality SVD library, PRIMME SVDS, based on the state-of-the-art package PRIMME (PReconditioned Iterative MultiMethod Eigensolver [40]). We highlight three main contributions:

• We present the modifications and functionality extensions of PRIMME that are required to implement our two stage PHSVDS method [45] so that it achieves full accuracy and high performance, for both smallest and largest singular values and with or without preconditioning.

• We provide intuitive user interfaces in C, MATLAB, Python, and R for both ordinary and advanced users to fully exploit the power of PRIMME SVDS (a black-box usage sketch follows below). We also discuss distributed and shared memory interfaces.

• We demonstrate with numerical experiments on large scale matrices that PRIMME SVDS is more efficient and significantly more robust than the two most widely used software packages, PROPACK and SLEPc, even without a preconditioner. We also demonstrate its good parallel scalability.

Since this is a software paper, the presentation of our method is given in a more abstract way that relates to the implementation. Further theoretical and algorithmic details can be found in [45].
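As a taste of the black-box usage mentioned in the second contribution, the sketch below shows how a partial-SVD call through PRIMME's Python binding might look. The module name primme and the SciPy-style svds call are modeled on the interface style described above, but the exact argument names used here (k, which, tol) are assumptions for illustration rather than the documented API.

```python
# Hypothetical black-box usage sketch of PRIMME SVDS from Python.  The
# svds-like call and its argument names are assumptions modeled on a
# SciPy-style interface; consult the PRIMME documentation for the actual API.
import scipy.sparse as sp

try:
    import primme  # Python binding distributed with PRIMME (if installed)
except ImportError:
    primme = None  # binding not available; the call below is then skipped

A = sp.random(2000, 1000, density=0.01, format="csr", random_state=1) \
    + sp.eye(2000, 1000, format="csr")

if primme is not None:
    # Request a few of the smallest singular triplets; the advanced interface
    # additionally accepts a user preconditioner (not shown here).
    u, s, vt = primme.svds(A, k=3, which="SM", tol=1e-10)
    print("smallest singular values:", s)
```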
2. Related work. The partial SVD problem can be solved as a Hermitian eigenvalue problem or directly using bidiagonalization methods [14]. We introduce these approaches and discuss the resulting state-of-the-art SVD methods. Next, we review the state-of-practice software, both eigensolver and dedicated SVD software.

2.1. Approaches and Methods for Computing SVD Problems. The first approach computes eigenpairs of the normal equations matrix C = A^T A ∈ C^{n×n} (or A A^T ∈ C^{m×m} if m < n). The eigenpairs of C correspond to the squares of the singular values and to the right singular vectors of A, (A^T A) V = V diag(σ_1^2, ..., σ_n^2). If σ_i ≠ 0, the corresponding left singular vectors are obtained as u_i = (1/σ_i) A v_i. Although extreme Hermitian eigenvalue problems can be solved very efficiently, C can be ill-conditioned. Assume (σ̃_i, ũ_i, ṽ_i) is a computed singular triplet corresponding to the exact triplet (σ_i, u_i, v_i) with residual satisfying ‖C ṽ_i − ṽ_i σ̃_i^2‖ ≤ ‖A‖^2 ε_mach. Then the relative errors satisfy the relations [41],

(2.1)  |σ̃_i − σ_i| / σ_i ≤ (σ_n^2 / σ_i^2) ε_mach ≡ κ_i^2 ε_mach,

(2.2)  sin ∠(ṽ_i, v_i) ≲ √2 ε_mach / (|κ_i^{−1} − κ_j^{−1}| |κ_i^{−1} + κ_j^{−1}|),

(2.3)  ‖ũ_i − u_i‖_2 / ‖u_i‖_2 ≲ κ_i (‖ṽ_i − v_i‖_2 + γ ε_mach),

where σ_j is the closest singular value to σ_i, γ is a constant that depends on the dimensions of A, and ε_mach is the machine precision. Since all singular values (except the largest, σ_n) will lose accuracy, this approach is typically followed by a second stage of iterative refinement. The singular triplets can be refined one by one [36, 11, 7] or collectively as an initial subspace for an iterative eigensolver [45]. This is especially needed when seeking the smallest singular values.

The second approach seeks eigenpairs of the augmented matrix B = [0, A^T; A, 0] ∈ C^{(m+n)×(m+n)}. If U^⊥ ∈ C^{m×(m−n)} is a basis for the orthogonal complement subspace of U, and we define the orthonormal matrix

(2.4)  Y = (1/√2) [V, −V, 0; U, U, √2 U^⊥],

then the eigenvalue decomposition of B corresponds to [12, 14]

(2.5)  B Y = Y diag(σ_1, ..., σ_n, −σ_1, ..., −σ_n, 0, ..., 0),

with m − n trailing zeros. This approach can compute all singular values accurately, i.e., with residual norm close to ‖A‖ ε_mach or relative accuracy O(κ_i ε_mach). The main disadvantage of this approach is that when seeking the smallest singular values, the eigenvalue problem is a maximally interior one, and thus Krylov iterative methods converge slowly and unreliably. Let K_l(C, v_1) be the Krylov subspace of dimension l of the matrix C starting with initial vector v_1. It is easy to see that K_{2l}(B, v̄_1) = [K_l(C, v_1); 0] ⊕ [0; A K_l(C, v_1)], where v̄_1 is the vector [v_1; 0]. Therefore, an unrestarted Krylov method on B with the proper initial vector and extraction method converges twice as slowly as on C [45]. In practice, restarted methods on C can converge much faster than on B, partly because the Rayleigh–Ritz projection for interior eigenvalues is less efficient due to the presence of spurious Ritz values [35]. Harmonic projection [33, 34] and refined projection [22] have been proposed to overcome this difficulty. However, near-optimal restarted methods on C still converge significantly faster [45].
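The small numerical sketch below contrasts the two formulations on a toy matrix: the eigenvalues of C = A^T A are the squared singular values (so small σ_i suffer the κ_i^2 amplification of (2.1)), while the eigenvalues of B = [0, A^T; A, 0] are ±σ_i but lie deep in the interior of B's spectrum. The square, nonsingular test matrix and the use of shift-invert (a factorization) to make the tiny example converge are assumptions of this illustration; the setting of the paper is matrix-free or preconditioned, where such factorizations are unavailable.

```python
# Toy comparison of the two eigenvalue formulations of the SVD discussed above.
# The matrix is square and (with overwhelming probability) nonsingular by
# construction, an assumption made so that shift-invert about 0 is well defined.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

n = 300
A = sp.eye(n, format="csr") \
    + 0.1 * sp.random(n, n, density=0.02, format="csr", random_state=0)

# Reference singular values from a dense SVD (affordable only at this toy size).
sigma = np.linalg.svd(A.toarray(), compute_uv=False)

# (a) Normal equations C = A^T A: eigenvalues are the squared singular values.
#     The squaring is what drives the kappa_i^2 error amplification in (2.1).
C = (A.T @ A).tocsc()
theta_max = eigsh(C, k=1, which="LA", return_eigenvectors=False)[0]
theta_min = eigsh(C, k=1, sigma=0.0, which="LM", return_eigenvectors=False)[0]  # shift-invert
print("sigma_max via C:", np.sqrt(theta_max), " exact:", sigma.max())
print("sigma_min via C:", np.sqrt(theta_min), " exact:", sigma.min())

# (b) Augmented matrix B = [0 A^T; A 0]: eigenvalues are +/- sigma_i, so the
#     smallest singular values are accurate but maximally interior.  Shift-invert
#     is used here only to make the toy example converge quickly.
B = sp.bmat([[None, A.T], [A, None]], format="csc")
lam = eigsh(B, k=2, sigma=0.0, which="LM", return_eigenvectors=False)
print("eigenvalues of B nearest 0:", np.sort(lam), " (compare +/-", sigma.min(), ")")
```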