Communication Avoiding Rank Revealing Qr Factorization with Column Pivoting
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
QR Decomposition: History and Its Applications
Mathematics & Statistics Auburn University, Alabama, USA QR History Dec 17, 2010 Asymptotic result QR iteration QR decomposition: History and its EE Applications Home Page Title Page Tin-Yau Tam èèèUUUÎÎÎ JJ II J I Page 1 of 37 Æâ§w Go Back fÆêÆÆÆ Full Screen Close email: [email protected] Website: www.auburn.edu/∼tamtiny Quit 1. QR decomposition Recall the QR decomposition of A ∈ GLn(C): QR A = QR History Asymptotic result QR iteration where Q ∈ GLn(C) is unitary and R ∈ GLn(C) is upper ∆ with positive EE diagonal entries. Such decomposition is unique. Set Home Page a(A) := diag (r11, . , rnn) Title Page where A is written in column form JJ II J I A = (a1| · · · |an) Page 2 of 37 Go Back Geometric interpretation of a(A): Full Screen rii is the distance (w.r.t. 2-norm) between ai and span {a1, . , ai−1}, Close i = 2, . , n. Quit Example: 12 −51 4 6/7 −69/175 −58/175 14 21 −14 6 167 −68 = 3/7 158/175 6/175 0 175 −70 . QR −4 24 −41 −2/7 6/35 −33/35 0 0 35 History Asymptotic result QR iteration EE • QR decomposition is the matrix version of the Gram-Schmidt orthonor- Home Page malization process. Title Page JJ II • QR decomposition can be extended to rectangular matrices, i.e., if A ∈ J I m×n with m ≥ n (tall matrix) and full rank, then C Page 3 of 37 A = QR Go Back Full Screen where Q ∈ Cm×n has orthonormal columns and R ∈ Cn×n is upper ∆ Close with positive “diagonal” entries. -
Lecture 4: Applications of Orthogonality: QR Decompositions Week 4 UCSB 2014
Math 108B Professor: Padraic Bartlett Lecture 4: Applications of Orthogonality: QR Decompositions Week 4 UCSB 2014 In our last class, we described the following method for creating orthonormal bases, known as the Gram-Schmidt method: ~ ~ Theorem. Suppose that V is a k-dimensional space with a basis B = fb1;::: bkg. The following process (called the Gram-Schmidt process) creates an orthonormal basis for V : 1. First, create the following vectors f ~v1; : : : ~vkg: • ~u1 = b~1: • ~u2 = b~2 − proj(b~2 onto ~u1). • ~u3 = b~3 − proj(b~3 onto ~u1) − proj(b~3 onto ~u2). • ~u4 = b~4 − proj(b~4 onto ~u1) − proj(b~4 onto ~u2) − proj(b~4 onto ~u3). ~ ~ ~ • ~uk = bk − proj(bk onto ~u1) − ::: − proj(bk onto k~u −1). 2. Now, take each of the vectors ~ui, and rescale them so that they are unit length: i.e. ~ui redefine each ~ui as the rescaled vector . jj ~uijj In this class, I want to talk about a useful application of the Gram-Schmidt method: the QR decomposition! We define this here: 1 The QR decomposition Definition. We say that an n×n matrix Q is orthogonal if its columns form an orthonor- n mal basis for R . As a side note: we've studied these matrices before! In class on Friday, we proved that for any such matrix, the relation QT · Q = I held; to see this, we just looked at the (i; j)-th entry of the product QT · Q, which by definition was the i-th row of QT dotted with the j-th column of A. -
Numerical Linear Algebra Revised February 15, 2010 4.1 the LU
Numerical Linear Algebra Revised February 15, 2010 4.1 The LU Decomposition The Elementary Matrices and badgauss In the previous chapters we talked a bit about the solving systems of the form Lx = b and Ux = b where L is lower triangular and U is upper triangular. In the exercises for Chapter 2 you were asked to write a program x=lusolver(L,U,b) which solves LUx = b using forsub and backsub. We now address the problem of representing a matrix A as a product of a lower triangular L and an upper triangular U: Recall our old friend badgauss. function B=badgauss(A) m=size(A,1); B=A; for i=1:m-1 for j=i+1:m a=-B(j,i)/B(i,i); B(j,:)=a*B(i,:)+B(j,:); end end The heart of badgauss is the elementary row operation of type 3: B(j,:)=a*B(i,:)+B(j,:); where a=-B(j,i)/B(i,i); Note also that the index j is greater than i since the loop is for j=i+1:m As we know from linear algebra, an elementary opreration of type 3 can be viewed as matrix multipli- cation EA where E is an elementary matrix of type 3. E looks just like the identity matrix, except that E(j; i) = a where j; i and a are as in the MATLAB code above. In particular, E is a lower triangular matrix, moreover the entries on the diagonal are 1's. We call such a matrix unit lower triangular. -
A Fast and Accurate Matrix Completion Method Based on QR Decomposition and L 2,1-Norm Minimization Qing Liu, Franck Davoine, Jian Yang, Ying Cui, Jin Zhong, Fei Han
A Fast and Accurate Matrix Completion Method based on QR Decomposition and L 2,1-Norm Minimization Qing Liu, Franck Davoine, Jian Yang, Ying Cui, Jin Zhong, Fei Han To cite this version: Qing Liu, Franck Davoine, Jian Yang, Ying Cui, Jin Zhong, et al.. A Fast and Accu- rate Matrix Completion Method based on QR Decomposition and L 2,1-Norm Minimization. IEEE Transactions on Neural Networks and Learning Systems, IEEE, 2019, 30 (3), pp.803-817. 10.1109/TNNLS.2018.2851957. hal-01927616 HAL Id: hal-01927616 https://hal.archives-ouvertes.fr/hal-01927616 Submitted on 20 Nov 2018 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Page 1 of 20 1 1 2 3 A Fast and Accurate Matrix Completion Method 4 5 based on QR Decomposition and L -Norm 6 2,1 7 8 Minimization 9 Qing Liu, Franck Davoine, Jian Yang, Member, IEEE, Ying Cui, Zhong Jin, and Fei Han 10 11 12 13 14 Abstract—Low-rank matrix completion aims to recover ma- I. INTRODUCTION 15 trices with missing entries and has attracted considerable at- tention from machine learning researchers. -
Notes on Orthogonal Projections, Operators, QR Factorizations, Etc. These Notes Are Largely Based on Parts of Trefethen and Bau’S Numerical Linear Algebra
Notes on orthogonal projections, operators, QR factorizations, etc. These notes are largely based on parts of Trefethen and Bau’s Numerical Linear Algebra. Please notify me of any typos as soon as possible. 1. Basic lemmas and definitions Throughout these notes, V is a n-dimensional vector space over C with a fixed Her- mitian product. Unless otherwise stated, it is assumed that bases are chosen so that the Hermitian product is given by the conjugate transpose, i.e. that (x, y) = y∗x where y∗ is the conjugate transpose of y. Recall that a projection is a linear operator P : V → V such that P 2 = P . Its complementary projection is I − P . In class we proved the elementary result that: Lemma 1. Range(P ) = Ker(I − P ), and Range(I − P ) = Ker(P ). Another fact worth noting is that Range(P ) ∩ Ker(P ) = {0} so the vector space V = Range(P ) ⊕ Ker(P ). Definition 1. A projection is orthogonal if Range(P ) is orthogonal to Ker(P ). Since it is convenient to use orthonormal bases, but adapt them to different situations, we often need to change basis in an orthogonality-preserving way. I.e. we need the following type of operator: Definition 2. A nonsingular matrix Q is called an unitary matrix if (x, y) = (Qx, Qy) for every x, y ∈ V . If Q is real, it is called an orthogonal matrix. An orthogonal matrix Q can be thought of as an orthogonal change of basis matrix, in which each column is a new basis vector. Lemma 2. -
GRAM SCHMIDT and QR FACTORIZATION Math 21B, O. Knill
GRAM SCHMIDT AND QR FACTORIZATION Math 21b, O. Knill GRAM-SCHMIDT PROCESS. Let ~v1; : : : ;~vn be a basis in V . Let ~w1 = ~v1 and ~u1 = ~w1=jj~w1jj. The Gram-Schmidt process recursively constructs from the already constructed orthonormal set ~u1; : : : ; ~ui−1 which spans a linear space Vi−1 the new vector ~wi = (~vi − projVi−1 (~vi)) which is orthogonal to Vi−1, and then normalizes ~wi to get ~ui = ~wi=jj~wijj. Each vector ~wi is orthonormal to the linear space Vi−1. The vectors f~u1; : : : ; ~un g form then an orthonormal basis in V . EXAMPLE. 2 2 3 2 1 3 2 1 3 Find an orthonormal basis for ~v1 = 4 0 5, ~v2 = 4 3 5 and ~v3 = 4 2 5. 0 0 5 SOLUTION. 2 1 3 1. ~u1 = ~v1=jj~v1jj = 4 0 5. 0 2 0 3 2 0 3 3 1 2. ~w2 = (~v2 − projV1 (~v2)) = ~v2 − (~u1 · ~v2)~u1 = 4 5. ~u2 = ~w2=jj~w2jj = 4 5. 0 0 2 0 3 2 0 3 0 0 3. ~w3 = (~v3 − projV2 (~v3)) = ~v3 − (~u1 · ~v3)~u1 − (~u2 · ~v3)~u2 = 4 5, ~u3 = ~w3=jj~w3jj = 4 5. 5 1 QR FACTORIZATION. The formulas can be written as ~v1 = jj~v1jj~u1 = r11~u1 ··· ~vi = (~u1 · ~vi)~u1 + ··· + (~ui−1 · ~vi)~ui−1 + jj~wijj~ui = r1i~u1 + ··· + rii~ui ··· ~vn = (~u1 · ~vn)~u1 + ··· + (~un−1 · ~vn)~un−1 + jj~wnjj~un = r1n~u1 + ··· + rnn~un which means in matrix form 2 3 2 3 2 3 j j · j j j · j r11 r12 · r1n A = 4 ~v1 ···· ~vn 5 = 4 ~u1 ···· ~un 5 4 0 r22 · r2n 5 = QR ; j j · j j j · j 0 0 · rnn where A and Q are m × n matrices and R is a n × n matrix which has rij = ~vj · ~ui, for i < j and vii = j~vij THE GRAM-SCHMIDT PROCESS PROVES: Any matrix A with linearly independent columns ~vi can be decomposed as A = QR, where Q has orthonormal column vectors and where R is an upper triangular square matrix with the same number of columns than A. -
QR Decomposition on Gpus
QR Decomposition on GPUs Andrew Kerr, Dan Campbell, Mark Richards Georgia Institute of Technology, Georgia Tech Research Institute {andrew.kerr, dan.campbell}@gtri.gatech.edu, [email protected] ABSTRACT LU, and QR decomposition, however, typically require ¯ne- QR decomposition is a computationally intensive linear al- grain synchronization between processors and contain short gebra operation that factors a matrix A into the product of serial routines as well as massively parallel operations. Achiev- a unitary matrix Q and upper triangular matrix R. Adap- ing good utilization on a GPU requires a careful implementa- tive systems commonly employ QR decomposition to solve tion informed by detailed understanding of the performance overdetermined least squares problems. Performance of QR characteristics of the underlying architecture. decomposition is typically the crucial factor limiting problem sizes. In this paper, we focus on QR decomposition in particular and discuss the suitability of several algorithms for imple- Graphics Processing Units (GPUs) are high-performance pro- mentation on GPUs. Then, we provide a detailed discussion cessors capable of executing hundreds of floating point oper- and analysis of how blocked Householder reflections may be ations in parallel. As commodity accelerators for 3D graph- used to implement QR on CUDA-compatible GPUs sup- ics, GPUs o®er tremendous computational performance at ported by performance measurements. Our real-valued QR relatively low costs. While GPUs are favorable to applica- implementation achieves more than 10x speedup over the tions with much inherent parallelism requiring coarse-grain native QR algorithm in MATLAB and over 4x speedup be- synchronization between processors, methods for e±ciently yond the Intel Math Kernel Library executing on a multi- utilizing GPUs for algorithms computing QR decomposition core CPU, all in single-precision floating-point. -
How Unique Is QR? Full Rank, M = N
LECTUREAPPENDIX · purdue university cs 51500 Kyle Kloster matrix computations October 12, 2014 How unique is QR? Full rank, m = n In class we looked at the special case of full rank, n × n matrices, and showed that the QR decomposition is unique up to a factor of a diagonal matrix with entries ±1. Here we'll see that the other full rank cases follow the m = n case somewhat closely. Any full rank QR decomposition involves a square, upper- triangular partition R within the larger (possibly rectangular) m × n matrix. The gist of these uniqueness theorems is that R is unique, up to multiplication by a diagonal matrix of ±1s; the extent to which the orthogonal matrix is unique depends on its dimensions. Theorem (m = n) If A = Q1R1 = Q2R2 are two QR decompositions of full rank, square A, then Q2 = Q1S R2 = SR1 for some square diagonal S with entries ±1. If we require the diagonal entries of R to be positive, then the decomposition is unique. Theorem (m < n) If A = Q1 R1 N 1 = Q2 R2 N 2 are two QR decom- positions of a full rank, m × n matrix A with m < n, then Q2 = Q1S; R2 = SR1; and N 2 = SN 1 for square diagonal S with entries ±1. If we require the diagonal entries of R to be positive, then the decomposition is unique. R R Theorem (m > n) If A = Q U 1 = Q U 2 are two QR 1 1 0 2 2 0 decompositions of a full rank, m × n matrix A with m > n, then Q2 = Q1S; R2 = SR1; and U 2 = U 1T for square diagonal S with entries ±1, and square orthogonal T . -
A Survey of Direct Methods for Sparse Linear Systems
A survey of direct methods for sparse linear systems Timothy A. Davis, Sivasankaran Rajamanickam, and Wissam M. Sid-Lakhdar Technical Report, Department of Computer Science and Engineering, Texas A&M Univ, April 2016, http://faculty.cse.tamu.edu/davis/publications.html To appear in Acta Numerica Wilkinson defined a sparse matrix as one with enough zeros that it pays to take advantage of them.1 This informal yet practical definition captures the essence of the goal of direct methods for solving sparse matrix problems. They exploit the sparsity of a matrix to solve problems economically: much faster and using far less memory than if all the entries of a matrix were stored and took part in explicit computations. These methods form the backbone of a wide range of problems in computational science. A glimpse of the breadth of applications relying on sparse solvers can be seen in the origins of matrices in published matrix benchmark collections (Duff and Reid 1979a) (Duff, Grimes and Lewis 1989a) (Davis and Hu 2011). The goal of this survey article is to impart a working knowledge of the underlying theory and practice of sparse direct methods for solving linear systems and least-squares problems, and to provide an overview of the algorithms, data structures, and software available to solve these problems, so that the reader can both understand the methods and know how best to use them. 1 Wilkinson actually defined it in the negation: \The matrix may be sparse, either with the non-zero elements concentrated ... or distributed in a less systematic manner. We shall refer to a matrix as dense if the percentage of zero elements or its distribution is such as to make it uneconomic to take advantage of their presence." (Wilkinson and Reinsch 1971), page 191, emphasis in the original. -
Parallel Codes for Computing the Numerical Rank '
View metadata, citation and similar papers at core.ac.uk brought to you by CORE provided by Elsevier - Publisher Connector LINEAR ALGEBRA AND ITS APPLICATIONS ELSEIVIER Linear Algebra and its Applications 275~ 276 (1998) 451 470 Parallel codes for computing the numerical rank ’ Gregorio Quintana-Orti ***, Enrique S. Quintana-Orti ’ Abstract In this paper we present an experimental comparison of several numerical tools for computing the numerical rank of dense matrices. The study includes the well-known SVD. the URV decomposition, and several rank-revealing QR factorizations: the QR factorization with column pivoting, and two QR factorizations with restricted column pivoting. Two different parallel programming methodologies are analyzed in our paper. First, we present block-partitioned algorithms for the URV decomposition and rank-re- vealing QR factorizations which provide efficient implementations on shared memory environments. Furthermore, we also present parallel distributed algorithms, based on the message-passing paradigm, for computing rank-revealing QR factorizations on mul- ticomputers. 0 1998 Elsevier Science Inc. All rights reserved. 1. Introduction Consider the matrix A E KY’“” (w.1.o.g. m 3 n), and define its rank as 1’ := rank(il) = dim(range(A)). The rank of a matrix is related to its singular values a(A) = {crt, 02,. , o,,}, p = min(m, H), since *Corresponding author. E-mail: [email protected]. ’ All authors were partially supported by the Spanish CICYT project TIC 96-1062~CO3-03. ’ This work was partially carried out while the author was visiting the Mathematics and Computer Science Division at Argonne National Laboratory. ’ E-mail: [email protected]. -
Coded QR Decomposition
Coded QR Decomposition Quang Minh Nguyen1, Haewon Jeong2 and Pulkit Grover2 1 Department of Computer Science, National University of Singapore 2 Department of Electrical and Computer Engineering, Carnegie Mellon University Abstract—QR decomposition of a matrix is one of the essential • Previous works in ABFT for QR decomposition studied operations that is used for solving linear equations and finding applying coding on the Householder algorithm or the least-squares solutions. We propose a coded computing strategy Givens Rotation algorithm. Our work is the first to for parallel QR decomposition with applications to solving a full- rank square system of linear equations in a high-performance consider applying coding on the Gram-Schmidt algo- computing system. Our strategy is applied to the parallel Gram- rithm (and its variants). We show that using our strategy Schmidt algorithm, which is one of the three commonly used throughout the Gram-Schmidt algorithm1, vertical/hori- algorithms for QR decomposition. Conventional coding strategies zontal checksum structures are preserved (Section III). cannot preserve the orthogonality of Q. We prove a condition for • Simply applying linear coding cannot protect the Q-factor a checksum-generator matrix to restore the degraded orthogo- nality of the decoded Q through low-cost post-processing, and as the orthogonality is not preserved after linear trans- construct a checksum-generator matrix for single-node failures. forms. To circumvent this issue, we propose an innovative We obtain the minimal number of checksums required for single- post-processing technique to restore the orthogonality of node failures under the “in-node checksum storage setting”, where the Q-factor after coding. -
Efficient Scalable Algorithms for Hierarchically Semiseparable Matrices
EFFICIENT SCALABLE ALGORITHMS FOR HIERARCHICALLY SEMISEPARABLE MATRICES SHEN WANG¤, XIAOYE S. LIy , JIANLIN XIAz , YINGCHONG SITUx , AND MAARTEN V. DE HOOP{ Abstract. Hierarchically semiseparable (HSS) matrix structures are very useful in the develop- ment of fast direct solvers for both dense and sparse linear systems. For example, they can built into parallel sparse solvers for Helmholtz equations arising from frequency domain wave equation mod- eling prevailing in the oil and gas industry. Here, we study HSS techniques in large-scale scienti¯c computing by considering the scalability of HSS structures and by developing massively parallel HSS algorithms. These algorithms include parallel rank-revealing QR factorizations, HSS constructions with hierarchical compression, ULV HSS factorizations, and HSS solutions. BLACS and ScaLAPACK are used as our portable libraries. We construct contexts (sub-communicators) on each node of the HSS tree and exploit the governing 2D-block-cyclic data distribution scheme widely used in ScaLAPACK. Some tree techniques for symmetric positive de¯nite HSS matrices are generalized to nonsymmetric problems in parallel. These parallel HSS methods are very useful for sparse solutions when they are used as kernels for intermediate dense matrices. Numerical examples (including 3D ones) con¯rm the performance of our implementations, and demonstrate the potential of structured methods in parallel computing, especially parallel direction solutions of large linear systems. Related techniques are also useful for the parallelization of other rank structured methods. Key words. HSS matrix, parallel HSS algorithm, low-rank property, compression, HSS con- struction, direct solver AMS subject classi¯cations. 15A23, 65F05, 65F30, 65F50 1. Introduction. In recent years, rank structured matrices have attracted much attention and have been widely used in the fast solutions of various partial di®erential equations, integral equations, and eigenvalue problems.