Restructuring the Tridiagonal and Bidiagonal QR Algorithms for Performance

FIELD G. VAN ZEE, The University of Texas at Austin
ROBERT A. VAN DE GEIJN, The University of Texas at Austin
GREGORIO QUINTANA-ORTÍ, Universitat Jaume I

We show how both the tridiagonal and bidiagonal QR algorithms can be restructured so that they become rich in operations that can achieve near-peak performance on a modern processor. The key is a novel, cache-friendly algorithm for applying multiple sets of Givens rotations to the eigenvector/singular vector matrix. This algorithm is then implemented with optimizations that (1) leverage vector instruction units to increase floating-point throughput, and (2) fuse multiple rotations to decrease the total number of memory operations. We demonstrate the merits of these new QR algorithms for computing the Hermitian eigenvalue decomposition (EVD) and singular value decomposition (SVD) of dense matrices when all eigenvectors/singular vectors are computed. The approach yields vastly improved performance relative to the traditional QR algorithms for these problems and is competitive with two commonly used alternatives (Cuppen's Divide and Conquer algorithm and the Method of Multiple Relatively Robust Representations) while inheriting the more modest $O(n)$ workspace requirements of the original QR algorithms. Since the computations performed by the restructured algorithms remain essentially identical to those performed by the original methods, robust numerical properties are preserved.

Categories and Subject Descriptors: G.4 [Mathematical Software]: Efficiency
General Terms: Algorithms; Performance
Additional Key Words and Phrases: eigenvalues, singular values, tridiagonal, bidiagonal, EVD, SVD, QR algorithm, Givens rotations, linear algebra, libraries, high-performance

Authors' addresses: Field G. Van Zee and Robert A. van de Geijn, Institute for Computational Engineering and Sciences, Department of Computer Science, The University of Texas at Austin, Austin, TX 78712, {field,rvdg}@cs.utexas.edu. Gregorio Quintana-Ortí, Departamento de Ingeniería y Ciencia de Computadores, Universitat Jaume I, Campus Riu Sec, 12.071, Castellón, Spain, [email protected].

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or [email protected].
© 2012 ACM 0098-3500/2012/01-ART00 $10.00 DOI 10.1145/0000000.0000000 http://doi.acm.org/10.1145/0000000.0000000
ACM Transactions on Mathematical Software, Vol. 0, No. 0, Article 00, Publication date: January 2012.

1. INTRODUCTION

The tridiagonal (and/or bidiagonal) QR algorithm is taught in a typical graduate-level numerical linear algebra course, and despite being among the most accurate¹ methods for performing eigenvalue and singular value decompositions (EVD and SVD, respectively), it is not used much in practice because its performance is not competitive [Watkins 1982; Golub and Loan 1996; Stewart 2001; Dhillon and Parlett 2003]. The reason for this is twofold: First, classic QR algorithm implementations, such as those in LAPACK, cast most of their computation (the application of Givens rotations) in terms of a routine that is absent from the BLAS, and thus is typically not available in optimized form. Second, even if such an optimized implementation existed, it would not matter because the QR algorithm is currently structured to apply Givens rotations via repeated instances of $O(n^2)$ computation on $O(n^2)$ data, effectively making it rich in level-2 BLAS-like operations, which inherently cannot achieve high performance because there is little opportunity for reuse of cached data.

¹ Notable algorithms which exceed the accuracy of the QR algorithm include the dqds algorithm (a variant of the QR algorithm) [Fernando and Parlett 1994; Parlett and Marques 1999] and the Jacobi-SVD algorithm by [Drmač and Veselić 2008a; 2008b].

Many in the numerical linear algebra community have long speculated that the QR algorithm's performance could be improved by saving up many sets of Givens rotations before applying them to the matrix in which eigenvectors or singular vectors are being accumulated. In this paper, we show that, when computing all eigenvectors of a dense Hermitian matrix or all singular vectors of a dense general matrix, dramatic improvements in performance can indeed be achieved. This work makes a number of contributions to this subject:

— It describes how the traditional QR algorithm can be restructured so that computation is cast in terms of an operation that applies many sets of Givens rotations to the matrix in which the eigen-/singular vectors are accumulated. This restructuring preserves a key feature of the original QR algorithm: namely, the approach requires only linear ($O(n)$) workspace. An optional optimization to the restructured algorithm that requires $O(n^2)$ workspace is also discussed and tested.
— It proposes an algorithm for applying many sets of Givens rotations that, in theory, exhibits greatly improved reuse of data in the cache. It then shows that an implementation of this algorithm can achieve near-peak performance by (1) utilizing vector instruction units to increase floating-point operation throughput, and (2) fusing multiple rotations so that data can be reused in-register, which decreases costly memory operations.
— It exposes and leverages the fact that lower computational costs of both the method of Multiple Relatively Robust Representations (MRRR) [Dhillon and Parlett 2004; Dhillon et al. 2006] and Cuppen's Divide-and-Conquer method (D&C) [Cuppen 1980] are partially offset by an $O(n^3)$ difference in cost between the former methods' back-transformations and the corresponding step in the QR algorithm.
— It demonstrates performance of EVD via the QR algorithm that is competitive with that of D&C- and MRRR-based EVD, and QR-based SVD performance that is comparable to D&C-based SVD.
— It makes the resulting implementations available as part of the open source libflame library.

The paper primarily focuses on the complex case; the mathematics then trivially simplify to the real case.

We consider these results to be significant in part because we place a premium on simplicity and reliability. The restructured QR algorithm presented in this paper is simple, gives performance that is almost as good as that of more intricate algorithms (such as D&C and MRRR) and does so using only $O(n)$ workspace, and without the worry of what might happen to numerical accuracy in pathological cases such as tightly clustered eigen-/singular values.

It should be emphasized that the improved performance we report is not so pronounced that the QR algorithm becomes competitive with D&C and MRRR when performing standalone tridiagonal EVD or bidiagonal SVD (that is, when the input matrix is already reduced to condensed form).
Rather, we show that in the context of the dense decompositions, which include two other stages of $O(n^3)$ computation, the restructured QR algorithm provides enough speedup over the more traditional method to facilitate competitive overall performance.

2. COMPUTING THE SPECTRAL DECOMPOSITION OF A HERMITIAN MATRIX

Given a Hermitian matrix $A \in \mathbb{C}^{n \times n}$, its eigenvalue decomposition (EVD) is given by $A = Q D Q^H$, where $Q \in \mathbb{C}^{n \times n}$ is unitary ($Q^H Q = I$) and $D \in \mathbb{R}^{n \times n}$ is diagonal. The eigenvalues of matrix $A$ can then be found on the diagonal of $D$ while the corresponding eigenvectors are the columns of $Q$. The standard approach to computing the EVD proceeds in three steps [Stewart 2001]: reduce matrix $A$ to real tridiagonal form $T$ via unitary similarity transformations: $A = Q_A T Q_A^H$; compute the EVD of $T$: $T = Q_T D Q_T^T$; and back-transform the eigenvectors of $T$: $Q = Q_A Q_T$ so that $A = Q D Q^H$. Let us discuss these in more detail. Note that we will use the general term "workspace" to refer to any significant space needed beyond the $n \times n$ space that holds the input matrix $A$ and the $n$-length space that holds output eigenvalues (i.e., the diagonal of $D$).

2.1. Reduction to real tridiagonal form

The reduction to tridiagonal form proceeds as the computation and application of a sequence of Householder transformations. When the transformations are defined as reflectors [Golub and Loan 1996; Van Zee et al. 2012; Stewart 2001], the tridiagonal reduction takes the form $H_{n-2} \cdots H_1 H_0 A H_0 H_1 \cdots H_{n-2} = Q_A^H A Q_A = T$, a real-valued, tridiagonal matrix.²

The cost of the reduction to tridiagonal form is $\frac{4}{3} n^3$ floating-point operations (flops) if $A$ is real and $4 \times \frac{4}{3} n^3$ flops if it is complex-valued.
About half of those computations are in symmetric (or Hermitian) matrix-vector multiplications (a level-2 BLAS operation [Dongarra et al. 1988]), which are inherently slow since they perform $O(n^2)$ computations on $O(n^2)$ data. Most of the remaining flops are in Hermitian rank-$k$ updates (a level-3 BLAS operation) which can attain near-peak performance on modern architectures. If a flop that is executed as part of a level-2 BLAS operation is $K_2$ times slower than one executed as part of a level-3 BLAS operation, we will give the effective cost of this operation as $(K_2 + 1) \frac{2}{3} n^3$, where $K_2 \ge 1$ and typically $K_2 \gg 1$.
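The effective-cost model above can be made concrete with a small sketch. It assumes the real case, in which the nominal $\frac{4}{3} n^3$ flops split evenly into a level-2 half and a level-3 half. The function names and the sample values of `n` and `k2` are illustrative assumptions, not part of the paper or of any library.

```python
def tridiag_reduction_flops(n, is_complex=False):
    """Nominal flop count of the reduction to tridiagonal form.

    (4/3) n^3 flops for a real matrix, 4 x (4/3) n^3 for a complex one.
    """
    base = 4 * n**3 / 3
    return 4 * base if is_complex else base


def effective_reduction_cost(n, k2):
    """Effective cost (K2 + 1) * (2/3) n^3, real case.

    k2 is the assumed slowdown of a level-2 BLAS flop relative to a
    level-3 BLAS flop; it is a modelling parameter, not a measured value.
    """
    if k2 < 1:
        raise ValueError("k2 is a slowdown factor and should be >= 1")
    level2_half = 2 * n**3 / 3   # matrix-vector multiplications
    level3_half = 2 * n**3 / 3   # rank-k updates
    return k2 * level2_half + level3_half


if __name__ == "__main__":
    n = 1000
    for k2 in (1, 5, 10):  # hypothetical slowdown factors
        ratio = effective_reduction_cost(n, k2) / tridiag_reduction_flops(n)
        print(f"K2 = {k2:2d}: effective cost is {ratio:.1f}x the nominal flop count")
```

With $K_2 = 1$ the effective cost equals the nominal count. As $K_2$ grows, the level-2 half dominates, so under this model the level-2 operations account for most of the time spent in the reduction.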