9. QR Algorithm

L. Vandenberghe, ECE133B (Spring 2020)

• tridiagonal symmetric matrices
• basic QR algorithm
• QR algorithm with shifts

QR algorithm 9.1

QR algorithm

• the standard method for computing eigenvalues and eigenvectors
• we discuss the algorithm for the symmetric eigendecomposition

      A = QΛQ^T = ∑_{i=1}^n λ_i q_i q_i^T

there are two stages:

1. reduce A to tridiagonal form by an orthogonal similarity transformation

      Q_1^T A Q_1 = T,   T tridiagonal, Q_1 orthogonal

2. compute the eigendecomposition T = Q_2 Λ Q_2^T by a fast iterative algorithm

QR algorithm 9.2

Reflector: Q = I − vv^T

• v is a vector with norm ‖v‖ = √2
• Q is symmetric and orthogonal
• the product Qx = x − (v^T x) v requires 4m flops if x and v are m-vectors

Reflection to a multiple of the first unit vector (133A lecture 6)

• an easily constructed reflector maps a given y to a multiple of e_1 = (1, 0, …, 0)
• if y ≠ 0, choose the reflector defined by

      v = (√2 / ‖w‖) w,   w = y + sign(y_1) ‖y‖ e_1

  this reflector maps y to Qy = −sign(y_1) ‖y‖ e_1

QR algorithm 9.3

Reduction to tridiagonal form

given an n × n symmetric matrix A, find an orthogonal Q such that

      Q^T A Q = [ a_1  b_1   0   ···    0        0        0      ]
                [ b_1  a_2  b_2  ···    0        0        0      ]
                [  0   b_2  a_3  ···    0        0        0      ]
                [  ⋮    ⋮    ⋮          ⋮        ⋮        ⋮      ]
                [  0    0    0   ···  a_{n−2}  b_{n−2}    0      ]
                [  0    0    0   ···  b_{n−2}  a_{n−1}  b_{n−1}  ]
                [  0    0    0   ···    0      b_{n−1}   a_n     ]

• this can be achieved by a product of reflectors Q = Q_1 Q_2 ⋯ Q_{n−2}
• complexity is order n³

QR algorithm 9.4

First step

partition A as

      A = [ a_1  c_1^T ]      c_1 is an (n−1)-vector, B_1 is (n−1) × (n−1)
          [ c_1  B_1   ]

• find a reflector I − v_1 v_1^T that maps c_1 to b_1 e_1 and define

      Q_1 = [ 1       0        ]
            [ 0  I − v_1 v_1^T ]

• multiply A with Q_1 to introduce zeros in positions 3, …, n of the first column and row:

      Q_1 A Q_1 = [ a_1                  c_1^T (I − v_1 v_1^T)               ]
                  [ (I − v_1 v_1^T) c_1  (I − v_1 v_1^T) B_1 (I − v_1 v_1^T) ]

                = [ a_1      b_1 e_1^T                   ]
                  [ b_1 e_1  B_1 − v_1 w_1^T − w_1 v_1^T ]

  where w_1 = B_1 v_1 − ((v_1^T B_1 v_1) / 2) v_1

• computation of the 2,2 block requires order 4n² flops

QR algorithm 9.5

General step

after k − 1 steps,

      Q_{k−1} ⋯ Q_1 A Q_1 ⋯ Q_{k−1} =
          [ a_1  b_1   0   ···    0        0        0       0    ]
          [ b_1  a_2  b_2  ···    0        0        0       0    ]
          [  0   b_2  a_3  ···    0        0        0       0    ]
          [  ⋮    ⋮    ⋮          ⋮        ⋮        ⋮       ⋮    ]
          [  0    0    0   ···  a_{k−2}  b_{k−2}    0       0    ]
          [  0    0    0   ···  b_{k−2}  a_{k−1}  b_{k−1}   0    ]
          [  0    0    0   ···    0      b_{k−1}   a_k    c_k^T  ]
          [  0    0    0   ···    0        0       c_k     B_k   ]

• find a reflector I − v_k v_k^T that maps the (n − k)-vector c_k to b_k e_1 and define

      Q_k = [ I_{k−1}  0       0        ]
            [ 0        1       0        ]
            [ 0        0  I − v_k v_k^T ]

  so that

      (I − v_k v_k^T) B_k (I − v_k v_k^T) = [ a_{k+1}  c_{k+1}^T ]
                                            [ c_{k+1}  B_{k+1}   ]

• complexity of step k is 4(n − k)² plus lower order terms

QR algorithm 9.6

Complexity

• complexity of the complete algorithm is dominated by

      ∑_{k=1}^{n−2} 4(n − k)² ≈ (4/3) n³

• Q is stored in factored form (the vectors v_k are stored)
• if needed, assembling the matrix Q adds another order n³ term

QR algorithm 9.7
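To make the reduction concrete, here is a minimal NumPy sketch of pages 9.3–9.6; the function names, the zero-column guard, and the final eigenvalue check are ours (not part of the lecture), and a production code would store the reflectors v_k and work only on one triangle of the matrix:

```python
import numpy as np

def householder(y):
    """Reflector vector v with ||v|| = sqrt(2) such that
    (I - v v^T) y = -sign(y_1) ||y|| e_1  (page 9.3)."""
    s = 1.0 if y[0] >= 0 else -1.0
    w = np.array(y, dtype=float)
    w[0] += s * np.linalg.norm(y)
    v = np.sqrt(2.0) * w / np.linalg.norm(w)
    return v, -s * np.linalg.norm(y)   # v and the value b with Qy = b e_1

def tridiagonalize(A):
    """Reduce symmetric A to tridiagonal T = Q^T A Q by reflectors (9.4-9.6)."""
    T = np.array(A, dtype=float)
    n = T.shape[0]
    for k in range(n - 2):
        c = T[k+1:, k]
        if np.linalg.norm(c[1:]) == 0.0:   # column already tridiagonal: skip
            continue
        v, b = householder(c)
        B = T[k+1:, k+1:]
        Bv = B @ v
        w = Bv - 0.5 * (v @ Bv) * v        # w_k = B_k v_k - (v_k^T B_k v_k / 2) v_k
        T[k+1:, k+1:] = B - np.outer(v, w) - np.outer(w, v)
        T[k+1:, k] = T[k, k+1:] = 0.0      # zeros in positions k+2, ..., n
        T[k+1, k] = T[k, k+1] = b          # b_k
    return T

# quick check (our test matrix): similarity preserves the eigenvalues
A = np.random.randn(6, 6); A = A + A.T
print(np.allclose(np.linalg.eigvalsh(tridiagonalize(A)), np.linalg.eigvalsh(A)))
```

The two rank-one updates inside the loop are where the 4(n − k)² flops per step are spent, matching the complexity count above.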
QR factorization of tridiagonal matrix

suppose A is n × n and tridiagonal, with QR factorization A = QR; then Q and R have a special structure (dots indicate possible nonzero elements):

      [ • •         ]   [ • • • • • • ] [ • • •       ]
      [ • • •       ]   [ • • • • • • ] [   • • •     ]
      [   • • •     ] = [   • • • • • ] [     • • •   ]
      [     • • •   ]   [     • • • • ] [       • • • ]
      [       • • • ]   [       • • • ] [         • • ]
      [         • • ]   [         • • ] [           • ]

• Q is zero below the first subdiagonal (Q_ij = 0 if i > j + 1): column k of Q is column k of A orthogonalized with respect to the previous columns
• R is zero above the second superdiagonal (R_ij = 0 if j > i + 2): this follows from considering R = Q^T A and the property of Q

QR algorithm 9.8

Computing tridiagonal QR factorization

the QR factorization Q^T A = R of an n × n tridiagonal A takes order n operations; for example, in the Householder algorithm (133A lecture 6):

• Q^T is a product of reflectors H_k = I − v_k v_k^T that make A upper triangular,

      H_{n−1} ⋯ H_1 [ A_11  A_12   0    ···     0           0         ]
                    [ A_21  A_22  A_23  ···     0           0         ]
                    [  0    A_32  A_33  ···     0           0         ]  = R
                    [  ⋮     ⋮     ⋮            ⋮           ⋮         ]
                    [  0     0     0    ···  A_{n−1,n−1}  A_{n−1,n}   ]
                    [  0     0     0    ···  A_{n,n−1}    A_{nn}      ]

  if A is tridiagonal, each vector v_k has only two nonzero elements
• Q is stored in factored form (as the reflectors v_k)
• we can allow zeros on the diagonal of R, to extend the QR factorization to singular A

QR algorithm 9.9

QR algorithm

suppose A is a symmetric n × n matrix

Basic QR iteration: start at A_1 = A and repeat for k = 1, 2, …:

• compute the QR factorization A_k = Q_k R_k
• define A_{k+1} = R_k Q_k

for most matrices,

• A_k converges to a diagonal matrix of eigenvalues of A
• U_k = Q_1 Q_2 ⋯ Q_k converges to a matrix of eigenvectors

QR algorithm 9.10
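In code, the basic iteration is only a few lines; this NumPy sketch adds a stopping rule and test matrix of our own (and recall the caveat above: convergence is guaranteed only "for most matrices"):

```python
import numpy as np

def basic_qr_iteration(A, tol=1e-10, max_iter=1000):
    """Basic QR iteration: A_{k+1} = R_k Q_k (page 9.10).
    Returns approximate eigenvalues and the accumulated matrix U_k."""
    Ak = np.array(A, dtype=float)
    U = np.eye(Ak.shape[0])
    for _ in range(max_iter):
        Q, R = np.linalg.qr(Ak)
        Ak = R @ Q                    # same eigenvalues: A_{k+1} = Q_k^T A_k Q_k
        U = U @ Q                     # U_k = Q_1 Q_2 ... Q_k
        if np.linalg.norm(Ak - np.diag(np.diag(Ak))) < tol:
            break                     # off-diagonal part is negligible
    return np.diag(Ak), U

# example (ours): a small symmetric matrix
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
lam, U = basic_qr_iteration(A)
# columns of U approximate eigenvectors: A @ U ≈ U @ np.diag(lam)
```

Running this drives the off-diagonal entries of A_k to zero; the diagonal then holds the eigenvalues and the columns of U the corresponding eigenvectors.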
Some immediate properties

      A_1 = A,   A_k = Q_k R_k,   A_{k+1} = R_k Q_k   (k ≥ 1)

• the matrices A_k are symmetric: the first matrix A_1 = A is symmetric and

      A_{k+1} = R_k Q_k = Q_k^T A_k Q_k

• continuing recursively, we see that an orthogonal similarity relates A_k and A:

      A_{k+1} = (Q_1 Q_2 ⋯ Q_k)^T A (Q_1 Q_2 ⋯ Q_k) = U_k^T A U_k

  therefore the matrices A_k all have the same eigenvalues as A
• the orthogonal matrices U_k = Q_1 Q_2 ⋯ Q_k and R_k satisfy

      A U_{k−1} = U_{k−1} A_k = U_{k−1} Q_k R_k = U_k R_k

QR algorithm 9.11

Equivalent form

a related algorithm ("simultaneous iteration") uses the last property to generate U_k:

      A U_{k−1} = U_k R_k

we note that the right-hand side is a QR factorization

Simultaneous iteration: start at U_0 = I and repeat for k = 1, 2, …:

• multiply with A: compute V_k = A U_{k−1}
• compute the QR factorization V_k = U_k R_k

if the matrices U_k converge to U, then R_k converges to a diagonal matrix, since

      R_k = U_k^T V_k = U_k^T A U_{k−1}

so the limit of R_k is both symmetric (U^T A U) and triangular, hence diagonal

QR algorithm 9.12

Interpretation

simultaneous iteration is a matrix extension of the power iteration

Power iteration: start at an n-vector u_0 with ‖u_0‖ = 1, and repeat for k = 1, 2, …:

• multiply with A: compute v_k = A u_{k−1}
• normalize: u_k = v_k / ‖v_k‖

this is a simple iteration for computing an eigenvector with the largest eigenvalue

• suppose the eigenvalues of A satisfy |λ_1| > |λ_2| ≥ ⋯ ≥ |λ_n|
• suppose u_0 = α_1 q_1 + ⋯ + α_n q_n, where q_i is a normalized eigenvector for λ_i
• after k power iterations, u_k is the normalized vector

      A^k u_0 = λ_1^k (α_1 q_1 + α_2 (λ_2/λ_1)^k q_2 + ⋯ + α_n (λ_n/λ_1)^k q_n)

• if α_1 ≠ 0, the vector u_k converges to ±q_1, and u_k^T A u_k converges to λ_1

QR algorithm 9.13

QR iteration with tridiagonal A

now suppose the matrix A in the basic QR iteration on page 9.10 is tridiagonal and symmetric

• we already noted that the matrices A_k are symmetric if A is symmetric (page 9.11):

      A_{k+1} = R_k Q_k = Q_k^T A_k Q_k

• the Q-factor of a tridiagonal matrix is zero below the first subdiagonal (page 9.8)
• this implies that the product A_{k+1} = R_k Q_k is zero below the first subdiagonal:

      [ • • •       ] [ • • • • • • ]   [ • • • • • • ]
      [   • • •     ] [ • • • • • • ]   [ • • • • • • ]
      [     • • •   ] [   • • • • • ] = [   • • • • • ]
      [       • • • ] [     • • • • ]   [     • • • • ]
      [         • • ] [       • • • ]   [       • • • ]
      [           • ] [         • • ]   [         • • ]

• since A_{k+1} is also symmetric, it is tridiagonal

hence, for tridiagonal symmetric A, the complexity of one QR iteration is order n

QR algorithm 9.14

QR algorithm with shifts

in practice, a multiple of the identity is subtracted from A_k before factoring

QR iteration with shifts: start at A_1 = A and repeat for k = 1, 2, …:

• choose a shift μ_k
• compute the QR factorization A_k − μ_k I = Q_k R_k
• define A_{k+1} = R_k Q_k + μ_k I

• the iteration still preserves symmetry and tridiagonal structure in A_k
• with properly chosen shifts, the iteration always converges
• with properly chosen shifts, convergence is fast (usually cubic)

QR algorithm 9.15

Complexity

overall complexity of the QR method for the symmetric eigendecomposition A = QΛQ^T

Eigenvalues: if eigenvectors are not needed, we can leave Q in factored form

• reduction of A to tridiagonal form costs (4/3) n³
• for the tridiagonal matrix, the complexity of one QR iteration is linear in n
• on average, the number of QR iterations is a small multiple of n

hence, the cost is dominated by (4/3) n³ for the initial reduction to tridiagonal form

Eigenvalues and eigenvectors: if Q is needed, order n³ terms are added

• reduction to tridiagonal form and accumulating the orthogonal matrix costs (8/3) n³
• finding the eigenvalues and eigenvectors of the tridiagonal matrix costs 6n³

hence, the total cost is (26/3) n³ plus lower order terms

QR algorithm 9.16
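The slides leave the shift strategy open; a standard choice in the symmetric case is the Wilkinson shift, the eigenvalue of the trailing 2 × 2 block closest to the last diagonal entry. The sketch below assumes that shift and adds deflation (peeling off an eigenvalue once the last off-diagonal entry is negligible). For simplicity it stores T densely and calls a general QR routine, so one iteration costs order n² here rather than the order n attainable by exploiting the tridiagonal structure with Givens rotations:

```python
import numpy as np

def qr_with_shifts(T, tol=1e-12):
    """Shifted QR iteration with deflation (page 9.15).
    T: symmetric tridiagonal matrix, passed as a dense array in this sketch.
    Shift: Wilkinson shift (our assumption; the lecture leaves it unspecified)."""
    T = np.array(T, dtype=float)
    n = T.shape[0]
    eigs = []
    while n > 1:
        # deflate when the last off-diagonal entry is negligible
        if abs(T[n-1, n-2]) <= tol * (abs(T[n-2, n-2]) + abs(T[n-1, n-1])):
            eigs.append(T[n-1, n-1])
            n -= 1
            T = T[:n, :n]
            continue
        # Wilkinson shift: eigenvalue of trailing 2x2 block nearest T[n-1, n-1]
        d = 0.5 * (T[n-2, n-2] - T[n-1, n-1])
        b = T[n-1, n-2]
        s = 1.0 if d >= 0 else -1.0
        mu = T[n-1, n-1] - s * b * b / (abs(d) + np.hypot(d, b))
        Q, R = np.linalg.qr(T - mu * np.eye(n))
        T = R @ Q + mu * np.eye(n)      # preserves symmetric tridiagonal form
    eigs.append(T[0, 0])
    return np.sort(np.array(eigs))
```

Typically only a handful of iterations per eigenvalue are needed, consistent with the fast (usually cubic) convergence noted on page 9.15; np.linalg.eigvalsh provides an easy cross-check of the computed eigenvalues.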