9.4 The QR Algorithm

The QR factorization of a matrix: For a matrix $A = (a_{ij})$, let $x_1 = (a_{11}, \dots, a_{n1})^t$ and $y_1 = (\|x_1\|_2, 0, \dots, 0)^t$. Then there is a Householder matrix $P_1$ such that $P_1 x_1 = y_1$ and
\[
P_1 A = \begin{bmatrix}
\|x_1\|_2 & a^{(1)}_{12} & \cdots & a^{(1)}_{1n} \\
0 & a^{(1)}_{22} & \cdots & a^{(1)}_{2n} \\
\vdots & \vdots & & \vdots \\
0 & a^{(1)}_{n2} & \cdots & a^{(1)}_{nn}
\end{bmatrix}.
\]
Let $x_2 = (0, a^{(1)}_{22}, \dots, a^{(1)}_{n2})^t$ and $y_2 = (0, \|x_2\|_2, 0, \dots, 0)^t$. Then there is a Householder matrix $P_2$ such that $P_2 x_2 = y_2$ and
\[
P_2 P_1 A = \begin{bmatrix}
\|x_1\|_2 & a^{(1)}_{12} & a^{(1)}_{13} & \cdots & a^{(1)}_{1n} \\
0 & \|x_2\|_2 & a^{(2)}_{23} & \cdots & a^{(2)}_{2n} \\
0 & 0 & a^{(2)}_{33} & \cdots & a^{(2)}_{3n} \\
\vdots & \vdots & \vdots & & \vdots \\
0 & 0 & a^{(2)}_{n3} & \cdots & a^{(2)}_{nn}
\end{bmatrix}.
\]
If we continue this procedure, we obtain Householder matrices $P_3, \dots, P_{n-1}$ such that
\[
P_{n-1} \cdots P_2 P_1 A = \begin{bmatrix}
r_{11} & r_{12} & r_{13} & \cdots & r_{1n} \\
0 & r_{22} & r_{23} & \cdots & r_{2n} \\
0 & 0 & r_{33} & \cdots & r_{3n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & r_{nn}
\end{bmatrix} = R,
\]
an upper triangular matrix. Since all the $P_i$'s are symmetric and orthogonal, $P_i^{-1} = P_i^t = P_i$. Let $Q = P_1 P_2 \cdots P_{n-1}$. Then
\[
A = QR.
\]
Note that $Q$ is still orthogonal (but not symmetric). The operation count for the QR factorization is $O(n^3)$.

Remark: For a system $Ax = b$, if we apply the QR factorization to the augmented matrix $[A \mid b]$, say $P_{n-1} \cdots P_1 [A \mid b] = [R \mid b']$, i.e.,
\[
\begin{bmatrix}
r_{11} & r_{12} & r_{13} & \cdots & r_{1n} & b'_1 \\
0 & r_{22} & r_{23} & \cdots & r_{2n} & b'_2 \\
0 & 0 & r_{33} & \cdots & r_{3n} & b'_3 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & 0 & 0 & \cdots & r_{nn} & b'_n
\end{bmatrix},
\]
then solving $Ax = b$ is equivalent to solving $Rx = b'$. You can use backward substitution to solve $Rx = b'$, as in the last step of the Gaussian elimination method. This is the QR method for solving linear systems, and it is much more stable than Gaussian elimination.

The QR method for eigenvalues of tridiagonal matrices:

The basics:

(1) If $A$ is upper Hessenberg and symmetric, then $A$ is tridiagonal.

(2) The product of an upper Hessenberg matrix and an upper triangular matrix is still upper Hessenberg; so is the product $P_1 P_2 \cdots P_{n-1}$ of Householder matrices of the form in (3), taken in that order (in your homework).

(3) If $w = (0, \dots, 0, w_k, w_{k+1}, 0, \dots, 0)^t$, then $P = (p_{ij}) = I - 2ww^t$ is tridiagonal (only the off-diagonal elements $p_{k,k+1}$ and $p_{k+1,k}$ may be nonzero), and thus upper Hessenberg.

(4) The eigenvalues of a diagonal matrix are the diagonal elements of the matrix.

(5) If $A$ is a tridiagonal matrix with one of its off-diagonal elements, say $a_{k+1,k}$, zero or smaller than the given tolerance, then we can split $A$ into two smaller matrices (one from the first $k$ rows and first $k$ columns of $A$, the other from the last $n-k$ rows and last $n-k$ columns of $A$), and the eigenvalues of these two smaller matrices are the eigenvalues of $A$.

Let $A$ be a symmetric tridiagonal matrix with no off-diagonal element zero or small. We want to find its eigenvalues. If we apply the method of QR factorization above, there are Householder matrices $P_1, P_2, \dots, P_{n-1}$ and an upper triangular matrix $R^{(1)}$ such that
\[
P_{n-1} \cdots P_2 P_1 A = R^{(1)}.
\]
Note that the $P_i$'s are all tridiagonal by (3) in the basics, and so are the $P_i^t$'s. So
\[
A = (P_1^t P_2^t \cdots P_{n-1}^t) R^{(1)} = Q^{(1)} R^{(1)}
\]
since the $P_i$'s are all orthogonal ($P_i^t = P_i^{-1}$), where
\[
Q^{(1)} = P_1^t P_2^t \cdots P_{n-1}^t.
\]
Let $A^{(1)} = R^{(1)} Q^{(1)}$. Then

(a) $A^{(1)} = R^{(1)} Q^{(1)} = (Q^{(1)})^t A Q^{(1)}$. So $A^{(1)}$ is symmetric and similar to $A$ (this requires that $Q^{(1)}$ is orthogonal!).

(b) $A^{(1)}$ is still tridiagonal: $R^{(1)}$ is upper triangular and $Q^{(1)}$ is upper Hessenberg, so $A^{(1)} = R^{(1)} Q^{(1)}$ is upper Hessenberg by (2), and since $A^{(1)}$ is symmetric it is tridiagonal by (1).

A result: The absolute values of the off-diagonal elements of $A^{(1)}$ are usually smaller than those of $A$. (We cannot prove this in this class.)
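Before turning to the eigenvalue iteration, here is a minimal NumPy sketch of the QR factorization by Householder matrices and of the QR method for $Ax = b$ (back substitution on $Rx = b'$). The function names are illustrative, and the reflector uses the standard sign choice (reflecting onto $-\operatorname{sign}(x_1)\|x\|_2\, e_1$) to avoid cancellation, which the notes do not discuss; the diagonal of $R$ may therefore carry different signs than in the construction above.

```python
import numpy as np

def householder_qr(A):
    """QR factorization by Householder matrices: returns Q, R with A = Q R.

    Step k builds a reflector P_k = I - 2 w w^t that maps column k (from the
    diagonal down) onto a multiple of the k-th unit vector, as in the notes.
    """
    R = np.array(A, dtype=float)
    n = R.shape[0]
    Q = np.eye(n)
    for k in range(n - 1):
        x = R[k:, k].copy()
        # Reflect x onto (-sign(x_1) * ||x||_2) e_1 to avoid cancellation.
        alpha = -np.sign(x[0]) * np.linalg.norm(x) if x[0] != 0 else -np.linalg.norm(x)
        v = x
        v[0] -= alpha
        norm_v = np.linalg.norm(v)
        if norm_v == 0.0:          # column is already in the desired form
            continue
        w = v / norm_v
        R[k:, :] -= 2.0 * np.outer(w, w @ R[k:, :])    # apply P_k from the left
        Q[:, k:] -= 2.0 * np.outer(Q[:, k:] @ w, w)    # accumulate Q = P_1 ... P_{n-1}
    return Q, R

def qr_solve(A, b):
    """QR method for A x = b: form R x = Q^t b = b', then back-substitute."""
    Q, R = householder_qr(A)
    bp = Q.T @ b
    n = len(bp)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (bp[i] - R[i, i + 1:] @ x[i + 1:]) / R[i, i]
    return x

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.standard_normal((5, 5))
    b = rng.standard_normal(5)
    Q, R = householder_qr(A)
    print(np.allclose(Q @ R, A), np.allclose(Q.T @ Q, np.eye(5)))  # expect: True True
    print(np.allclose(qr_solve(A, b), np.linalg.solve(A, b)))      # expect: True
```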
If we continue this procedure, that is, perform the following iterations: for $k = 1, 2, 3, \dots$, do
\[
A^{(k)} = Q^{(k+1)} R^{(k+1)}, \qquad A^{(k+1)} = R^{(k+1)} Q^{(k+1)},
\]
then $A^{(k)}$ converges to a diagonal matrix $D$ as $k$ increases. The iterations can be stopped when the off-diagonal elements are smaller than the given tolerance. The diagonal elements of $D$ are the eigenvalues of $A$, since the $A^{(k)}$'s are similar to $A$. This is the basic idea of the QR method for finding eigenvalues of matrices.

Two simple ways to improve the method:

(a) Replace the Householder matrices by rotation matrices to reduce the operation count.

(b) Use a shifting technique to speed up the convergence of the off-diagonal elements to 0.

For (a): the basic operation in the Householder method for tridiagonal matrices is to change
\[
x = (x_1, \dots, x_{k-1}, x_k, x_{k+1}, 0, \dots, 0)^t
\]
to
\[
y = (x_1, \dots, x_{k-1}, *, 0, 0, \dots, 0)^t.
\]
It can be done if we simply know how to find an orthogonal matrix that changes $a = (a_1, a_2)^t$ to $(\|a\|_2, 0)^t$. Let $\theta$ be the angle between these two vectors in the plane. Then
\[
\cos\theta = \frac{a_1}{\|a\|_2}, \qquad \sin\theta = \frac{a_2}{\|a\|_2}
\]
and
\[
\begin{bmatrix} \cos\theta & \sin\theta \\ -\sin\theta & \cos\theta \end{bmatrix}
\begin{bmatrix} a_1 \\ a_2 \end{bmatrix}
= \begin{bmatrix} \|a\|_2 \\ 0 \end{bmatrix}.
\]
To change
\[
x = (x_1, \dots, x_{k-1}, x_k, x_{k+1}, 0, \dots, 0)^t
\]
to
\[
y = (x_1, \dots, x_{k-1}, *, 0, 0, \dots, 0)^t,
\]
we simply define
\[
\cos\theta = \frac{x_k}{\|(x_k, x_{k+1})\|_2}, \qquad \sin\theta = \frac{x_{k+1}}{\|(x_k, x_{k+1})\|_2}
\]
and
\[
P = \begin{bmatrix}
1 & & & & & & \\
& \ddots & & & & & \\
& & 1 & & & & \\
& & & \cos\theta & \sin\theta & & \\
& & & -\sin\theta & \cos\theta & & \\
& & & & & \ddots & \\
& & & & & & 1
\end{bmatrix},
\]
with the only nonzero off-diagonal elements $p_{k,k+1}$ and $p_{k+1,k}$. Then $Px = y$. Matrices of this kind are called rotation matrices. They are orthogonal (we need this since we can only perform "similarity" transformations on $A$!), but not symmetric.

Remark: These matrices are a little easier to construct than Householder matrices. The saving is small (about four $\times/\div$ operations per matrix), but it becomes significant if you need to construct this kind of matrix millions of times, which is the case in the QR method for eigenvalues.

For (b): If $A$ has distinct eigenvalues $\lambda_i$ with $|\lambda_1| > |\lambda_2| > \cdots > |\lambda_n|$, the rate of convergence of the off-diagonal element $a^{(k)}_{i+1,i}$ depends on the ratio $\lambda_{i+1}/\lambda_i$. The idea is to introduce a shift to make one of these ratios very small, usually the last ratio $\lambda_n/\lambda_{n-1}$. In this way, $a^{(k)}_{n,n-1}$ goes to zero much faster than the other off-diagonal elements. So "choose" a number $s_k$ close to the eigenvalue to which $a^{(k)}_{nn}$ will converge, say $\lambda_n$; then the matrix $A^{(k)} - s_k I$ has eigenvalues $\lambda_1 - s_k, \dots, \lambda_n - s_k$, and the ratio
\[
\frac{\lambda_n - s_k}{\lambda_{n-1} - s_k}
\]
will be very small. Then continue the iterations on the shifted matrix $A^{(k)} - s_k I$. Let
\[
A^{(k)} - s_k I = Q^{(k)} R^{(k)}
\]
and define
\[
A^{(k+1)} = R^{(k)} Q^{(k)} + s_k I.
\]
The same shifting technique can be used on $A^{(k+1)}$. Note that $A^{(k)}$ and $A^{(k+1)}$ have the same eigenvalues.

At step $k$, $s_k$ is chosen to be the eigenvalue of
\[
\begin{bmatrix}
a^{(k)}_{n-1,n-1} & a^{(k)}_{n,n-1} \\
a^{(k)}_{n,n-1} & a^{(k)}_{n,n}
\end{bmatrix}
\]
that is closest to $a^{(k)}_{n,n}$.

When $|a^{(k)}_{n,n-1}|$ is less than the given tolerance, $a^{(k)}_{n,n}$ is a numerical eigenvalue of $A^{(k)}$, and hence of $A$, since the $A^{(k)}$'s are similar to $A$. (If instead the shift is not added back at each step, so that $A^{(k+1)} = R^{(k)} Q^{(k)}$, then the accumulated shift values must be added back to $a^{(k)}_{n,n}$ to obtain a numerical eigenvalue of $A$.) At the same time, we can split the shifted matrix $A^{(k)}$ into two smaller matrices (one of them $1 \times 1$) as explained in (5) of the basics, and then continue the iterations on the smaller matrix.

The method is stopped when it reaches a $2 \times 2$ matrix, whose eigenvalues are easily obtained by solving its characteristic polynomial.
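As a rough illustration of (a) and (b) together, the following NumPy sketch runs the shifted QR iteration on a symmetric tridiagonal matrix: each QR step is carried out with rotation matrices, $s_k$ is the eigenvalue of the trailing $2 \times 2$ block closest to $a^{(k)}_{n,n}$, and deflation splits off a $1 \times 1$ block as in (5) of the basics. Because the shift is added back at every step ($A^{(k+1)} = R^{(k)} Q^{(k)} + s_k I$), the deflated diagonal entries are already eigenvalues of the original matrix. The sketch works on a full array rather than the two tridiagonal vectors an implementation such as Alg. 9.6 would use; the tolerance, iteration cap, and function names are illustrative assumptions.

```python
import numpy as np

def rotation(a, b):
    """Return (cos t, sin t) so that [[c, s], [-s, c]] @ (a, b)^t = (||(a, b)||_2, 0)^t."""
    r = np.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0
    return a / r, b / r

def tridiag_qr_eigenvalues(T, tol=1e-12, max_iter=10000):
    """Shifted QR iteration for a symmetric tridiagonal matrix (full-array sketch)."""
    A = np.array(T, dtype=float)
    n = A.shape[0]
    eigenvalues = []
    for _ in range(max_iter):
        # Deflation: if the last subdiagonal entry is (nearly) zero, split off a 1x1 block.
        while n > 1 and abs(A[n - 1, n - 2]) < tol:
            eigenvalues.append(A[n - 1, n - 1])
            n -= 1
        if n == 1:
            eigenvalues.append(A[0, 0])
            return np.sort(np.array(eigenvalues))
        # Shift s_k: eigenvalue of the trailing 2x2 block closest to a_nn.
        a, b, c = A[n - 2, n - 2], A[n - 1, n - 2], A[n - 1, n - 1]
        d = (a - c) / 2.0
        s = c - np.sign(d if d != 0.0 else 1.0) * b * b / (abs(d) + np.hypot(d, b))
        B = A[:n, :n] - s * np.eye(n)
        # One QR step with rotation matrices: P_{n-1} ... P_1 B = R ...
        rotations = []
        for k in range(n - 1):
            cs, sn = rotation(B[k, k], B[k + 1, k])
            G = np.array([[cs, sn], [-sn, cs]])
            B[k:k + 2, :] = G @ B[k:k + 2, :]
            rotations.append((k, cs, sn))
        # ... then form R Q by applying the transposed rotations from the right,
        # and add the shift back: A <- R Q + s_k I.
        for k, cs, sn in rotations:
            G = np.array([[cs, sn], [-sn, cs]])
            B[:, k:k + 2] = B[:, k:k + 2] @ G.T
        A[:n, :n] = B + s * np.eye(n)
    raise RuntimeError("QR iteration did not converge within max_iter steps")

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d, e = rng.standard_normal(6), rng.standard_normal(5)
    T = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)
    print(np.allclose(tridiag_qr_eigenvalues(T), np.sort(np.linalg.eigvalsh(T))))  # expect: True
```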
An example is given on page 581 (you need the definitions of the constants in Alg. 9.6 for the example). There is much, much more to the QR method for calculating eigenvalues!