Sparse Givens QR Factorization on a Multiprocessor J. Tourino R. Doallo E.L. Zapata

May 1996 Technical Report No: UMA-DAC-96/08

Published in: 2nd Int’l. Conf. on Massively Parallel Computing Systems Ischia, Italy, May 1996, pp. 551-556

University of Malaga Department of Computer Architecture C. Tecnologico • PO Box 4114 • E-29080 Malaga • Spain

Sparse Givens QR Factorization on a Multiprocessor

Juan Tourino, Ramon Doallo (Dept. Electronica y Sistemas, University of La Coruna, Campus de Elvina s/n, La Coruna, Spain)
Emilio L. Zapata (Dept. Arquitectura de Computadores, University of Malaga, Campus de Teatinos, Malaga, Spain)
{juan,doallo}@udc.es, ezapata@ac.uma.es

Abstract

We present a parallel algorithm for the QR factorization with column pivoting of a sparse matrix by means of Givens rotations. Nonzero elements of the matrix M to be decomposed are stored in a one-dimensional doubly linked list data structure. We will discuss a strategy to reduce fill-in in order to gain memory savings and decrease the computation times. As an application of QR factorization we will describe the least squares problem. This algorithm has been designed for a message-passing multiprocessor, and we have evaluated it on the Cray T3D supercomputer using sparse matrices from the Harwell-Boeing collection.

* This work was supported by the Spanish CICYT under contract TIC..., and by the TRACS Programme under the Human Capital and Mobility Programme of the European Union, grant number ERBCHGECT...

1 Introduction

QR factorization is a direct method in matrix algebra which involves the decomposition of a matrix M of dimensions A x B (A >= B) into the product of an orthogonal matrix Q (Q^T = Q^-1) and an upper triangular matrix R. QR factorization has many applications: to solve linear systems of equations, least squares problems, linear programs, eigenvalue problems, coordinate transformations, projections and optimization problems. It is necessary to solve these problems in many scientific fields, such as fluid dynamics, molecular chemistry and aeronautic simulation.

This factorization can be computed in several possible ways [..., Chapter ...]: using plane rotations (the Givens method), the Modified Gram-Schmidt procedure or Householder reflections. Since these sequential algorithms have a high arithmetic complexity, the development of parallel algorithms is of considerable interest. Several parallel orthogonal factorization algorithms have been designed for various machines. We cite just a few: [...] for the Intel iPSC, [...] for the nCUBE, [...] for a network of transputers, [...] for the nCUBE and [...] for the CM, all of them for dense matrices; and [...] for the CM, the Fujitsu AP and the Cray T3D for sparse matrices.

We have implemented the Givens method with column pivoting for sparse matrices on the Cray T3D, a MIMD distributed-memory computer. Although a sparse problem could be treated with a parallel program for dense factorization, the storage and time cost of ignoring sparsity would cancel the benefit of parallel processing.

This paper is organized as follows. In §2 we describe the sequential and parallel Givens algorithm as well as a strategy to reduce fill-in. The least squares problem, an application of QR factorization, is shown in §3, and experimental results are discussed in §4.

2 QR through Givens rotations

We obtain matrices Q and R, of dimensions A x A and B x B respectively, using Givens rotations. We do not calculate matrix Q because it is not explicitly necessary in order to solve the least squares problem described in §3. Matrix M is overwritten by matrix R (in-place algorithm). We present the sequential algorithm with column pivoting in order to consider those cases in which the rank of matrix M is not maximum. The use of orthogonal transformations is numerically stable, and in practice the rank of the matrix can be determined accurately when column permutations are performed during factorization.
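The elementary operation used throughout this section, a single Givens rotation chosen to zero one element of a row pair, can be illustrated with a small self-contained Python sketch (for intuition only; the paper's implementation is C code over linked lists):

```python
import math

def givens(a, b):
    """Return (c, s) = (cos theta, sin theta) so that the rotation
    [[c, s], [-s, c]] applied to the pair (a, b) zeroes b."""
    r = math.hypot(a, b)
    if r == 0.0:
        return 1.0, 0.0              # nothing to zero
    return a / r, b / r

def rotate_rows(row_i, row_k, c, s):
    """Apply the 2x2 rotation to two equal-length rows, in place."""
    for j in range(len(row_i)):
        t = c * row_i[j] + s * row_k[j]
        row_k[j] = -s * row_i[j] + c * row_k[j]
        row_i[j] = t

row_i = [3.0, 1.0]
row_k = [4.0, 2.0]
c, s = givens(row_i[0], row_k[0])    # c = 3/5, s = 4/5
rotate_rows(row_i, row_k, c, s)
print(round(row_i[0], 9), round(abs(row_k[0]), 9))  # 5.0 0.0
```

No inverse trigonometric functions are needed: c and s come directly from the two leading elements, which is the property the paper exploits.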

rank = B
for (j = 0; j < B; j++)
    norm2_j = sum_{i=0}^{A-1} (m_{ij})^2                              (1)
for (cx = 0; cx < B; cx++) {
    obtain px (cx <= px < B) such that
        norm2_px = max_{cx <= j < B} norm2_j                          (2)
    if (norm2_px < eps) {
        rank = cx
        cx = B                     /* factorization ends */
    } else {
        swap norm2_cx and column cx of M with
             norm2_px and column px of M
        for (i = A-1; i > cx; i--)
            apply Givens rotation to subrows i-1 and i,
                from column cx to column B-1                          (3)
        for (j = cx+1; j < B; j++)
            norm2_j = norm2_j - (m_{cx,j})^2                          (4)
    }
}

In (1) the squares of the euclidean norms of the columns of matrix M are calculated and stored in vector norm2. Then a procedure of B iterations (if the rank of M is maximum) is performed. It consists of the following actions: the pivot column px and the pivot element norm2_px are selected in (2). The pivot element is the maximum of the norms of the columns whose index is at least cx. If the pivot is close to 0 (eps being the required precision), the rank of the matrix is given by the value cx and the factorization ends. Otherwise, a swap of column cx with the pivot column of matrix M, as well as a swap of their norms, is performed.

Givens rotations are applied in (3) to zero the subdiagonal elements of column cx. A Givens rotation involving two rows i-1 and i of matrix M consists of calculating the following product:

    (  cos θ  sin θ ) ( m_{i-1,cx}  m_{i-1,cx+1} ... m_{i-1,B-1} )
    ( -sin θ  cos θ ) ( m_{i,cx}    m_{i,cx+1}   ... m_{i,B-1}   )

        = ( m'_{i-1,cx}  m'_{i-1,cx+1} ... m'_{i-1,B-1} )
          ( 0            m'_{i,cx+1}   ... m'_{i,B-1}   )                  (3)

where

    cos θ = m_{i-1,cx} / sqrt((m_{i-1,cx})^2 + (m_{i,cx})^2)
    sin θ = m_{i,cx}   / sqrt((m_{i-1,cx})^2 + (m_{i,cx})^2)

Givens rotations are clearly orthogonal, and it is not necessary to perform inverse trigonometric functions. Figure 1 shows the sequential QR factorization by means of Givens rotations: it can be observed how zeros are introduced in matrix M to achieve the upper triangular matrix R. Finally, the norms of the updated columns are calculated in (4).

Figure 1. Sequential Givens rotations. (The panels show the sparsity pattern of M after successive steps cx until the upper triangular matrix R is reached; the counts B[2A-B-1]/4 and B[2A-B-1]/2 refer to the rotations involved.)

Once the algorithm has ended, what we really get is a factorization M Π = Q R, where Π is a permutation matrix (B x B) made up by the product of rank elementary permutations:

    Π = Π_0 Π_1 ... Π_{rank-1}

each Π_j (0 <= j < rank) being the identity matrix or a matrix resulting from swapping two of its columns. This is due to the pivoting carried out in (2).

2.1 Fill-in control

When working with sparse matrices an additional problem arises, which is the fill-in. The factors that influence the fill-in of a sparse matrix are the following: the dimensions and rank of the matrix, the sparsity degree (number of null elements) and another factor of great importance but difficult to model, which is the matrix pattern, or the location of the nonzero elements. Thus fill-in may vary significantly for two matrices with the same dimensions, rank and sparsity degree, depending on how nonzero elements are placed.

A high fill-in is an undesirable situation due to the increase in the storage cost and computation time. It would be of interest to implement a simple method to reduce fill-in. The most common heuristic strategy employed in LU factorization to maintain the sparsity degree is the Markowitz criterion [..., Chapter ...]. Moreover, numerical stability must be ensured in the LU factorization by avoiding the selection of pivots with a low absolute value.
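The sequential Givens QR with column pivoting described above can be sketched in Python on a dense array (an illustration only: the paper stores M in one-dimensional linked lists and implements the algorithm in C; `eps` plays the role of the precision threshold):

```python
import math

def givens_qr_pivot(M, eps=1e-12):
    """In-place Givens QR with column pivoting on a dense A x B array
    (A >= B).  On return the upper triangle of M holds R; perm records
    the pivot column chosen at each step.  Returns (rank, perm)."""
    A, B = len(M), len(M[0])
    norm2 = [sum(M[i][j] ** 2 for i in range(A)) for j in range(B)]   # (1)
    perm, rank = [], B
    for cx in range(B):
        px = max(range(cx, B), key=lambda j: norm2[j])                # (2)
        if norm2[px] < eps:                 # remaining columns ~ 0
            rank = cx
            break
        perm.append(px)
        if px != cx:                        # column pivoting
            norm2[cx], norm2[px] = norm2[px], norm2[cx]
            for i in range(A):
                M[i][cx], M[i][px] = M[i][px], M[i][cx]
        for i in range(A - 1, cx, -1):      # (3): zero subdiagonal of col cx
            a, b = M[i - 1][cx], M[i][cx]
            if b == 0.0:
                continue                    # zero rows need no rotation
            r = math.hypot(a, b)
            c, s = a / r, b / r
            for j in range(cx, B):
                t = c * M[i - 1][j] + s * M[i][j]
                M[i][j] = -s * M[i - 1][j] + c * M[i][j]
                M[i - 1][j] = t
        for j in range(cx + 1, B):          # (4): downdate column norms
            norm2[j] -= M[cx][j] ** 2
    return rank, perm

M = [[2.0, 1.0], [1.0, 3.0], [2.0, 0.0]]
rank, perm = givens_qr_pivot(M)
print(rank)        # 2
```

The `b == 0.0` skip mirrors the paper's observation that rows whose leading element is zero need not be rotated.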

Matrix     Origin                      A x B    Elem(M)   Elem%(M)
JPWH991    Circuit physics modelling   ...      ...       ...
BCSSTK14   Structural engineering      ...      ...       ...
SHERMAN5   Oil reservoir modelling     ...      ...       ...

Table 1. Harwell-Boeing sparse matrices

Matrix     Elem(R)   Elem%(R)   Elem(R)   Elem%(R)   Reduc
JPWH991    ...       ...        ...       ...        ...
BCSSTK14   ...       ...        ...       ...        ...
SHERMAN5   ...       ...        ...       ...        ...

Table 2. Fill-in reduction
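The reductions reported in table 2 come from biasing the pivot choice toward sparser subcolumns. A minimal Python sketch of one such weighted selection rule follows (an illustration only: the exact weighting of expression (5) may differ, and the names `alpha`, `zero` and the normalization are assumptions):

```python
def select_pivot(M, cx, norm2, alpha=0.5, eps=1e-12):
    """Pick a pivot column px in [cx, B): weight the sparsity of the
    subcolumn (rows cx..A-1) against the squared column norm.
    alpha = 0 reduces to the pure max-norm rule; columns with norm
    below eps are discarded for numerical stability."""
    A, B = len(M), len(M[0])
    zero = [sum(1 for i in range(cx, A) if M[i][j] == 0.0) for j in range(B)]
    max_zero = max(zero[cx:]) or 1          # avoid division by zero
    max_norm = max(norm2[cx:]) or 1.0
    best, px = -1.0, -1
    for j in range(cx, B):
        if norm2[j] < eps:
            continue
        score = alpha * zero[j] / max_zero + (1 - alpha) * norm2[j] / max_norm
        if score > best:
            best, px = score, j
    return px                               # -1 if every norm is ~ 0

M = [[1.0, 0.0, 2.0],
     [0.0, 0.0, 1.0],
     [3.0, 1.0, 0.0]]
norm2 = [sum(M[i][j] ** 2 for i in range(3)) for j in range(3)]
print(select_pivot(M, 0, norm2, alpha=1.0))   # 1: the sparsest usable column
print(select_pivot(M, 0, norm2, alpha=0.0))   # 0: the largest-norm column
```

Note how the stability filter rejects the middle column when its norm is negligible, no matter how sparse it is.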

A row ordering strategy for Givens rotations, based on pairing rows to minimize fill-in, is presented in [...].

We have implemented a method to reduce fill-in in the QR factorization by taking advantage of column pivoting. Instead of expression (2), we use a new criterion to select the pivot column:

    obtain px (cx <= px < B) such that norm2_px >= eps and
        alpha * zero_px / max_{cx <= j < B} zero_j
          + (1 - alpha) * norm2_px / max_{cx <= j < B} norm2_j   is maximum   (5)

where zero_j is the number of zero elements in subcolumn j (from row cx to row A-1) and alpha (0 <= alpha <= 1) is a prefixed parameter. The first term refers to fill-in reduction, while the second one relates to numerical stability. Obviously, for alpha = 0 the pivot column selection criterion is equivalent to the one described in (2). The strategy is to choose a pivot column with many zeroes in the subcolumn from row cx to A-1, in order to perform fewer rotations. Although we try to reduce fill-in as much as possible, we shall always keep a minimum degree of numerical stability in the algorithm by discarding as pivot columns those with norm close to zero.

With the aim of testing this strategy we have chosen three matrices from the Harwell-Boeing sparse matrix collection [...]. A description of these matrices is presented in table 1, where A x B are the dimensions of the matrix, Elem(M) is the number of nonzero elements of M and Elem%(M) is the percentage of these elements. Table 2 shows the fill-in in matrix R after the factorization: Elem(R) and Elem%(R) are the number and percentage of nonzero elements for the two pivot selection criteria of expression (5), and Reduc is the percentage of reduction in the number of nonzero elements obtained with the fill-in control criterion. As we can see, fill-in has decreased on the average for this set of sparse matrices, which is a substantial reduction.

2.2 Parallel Givens Rotations

The parallel algorithm developed has been generalized for any number of processing elements (PEs) and any dimension of matrix M. We find parallel Givens algorithms in [...].

Matrix M is distributed onto a mesh with m x n PEs. Each PE is identified by coordinates (idx, idy), with 0 <= idx < n and 0 <= idy < m. Nonzero elements of M are mapped over the PEs using a Block Column Scatter (BCS) scheme [...], but these elements are stored in doubly linked lists instead of vectors. This distribution provides data and load balancing. The algorithm requires access both by rows and by columns; a data structure such as a two-dimensional doubly linked list, used in [...] for a LU factorization, would be suitable. But this structure is costly to manage and a great amount of memory is required. Therefore we use one-dimensional doubly linked lists: each one represents one column of the matrix, and each item of the list stores the row index, the matrix element and two pointers. These lists are arranged in growing order of the row index and provide efficient access by columns. Row access is achieved using an auxiliary pointer vector with as many components as columns in the matrix: we go with this pointer vector through the linked lists corresponding to the columns of the matrix, from bottom to top, to get row access.

Let us consider the sequential algorithm to see how it can be executed in parallel. First, each PE obtains the local norms corresponding to the column segments (local lists) it contains. By means of a reduction instruction (sum by columns), the vector norm2 of each column of PEs will contain the norms of the corresponding global columns. Then the local maximum norm of each PE is obtained; this value is the same for each column of PEs. The global maximum is obtained by means of a reduction instruction which finds the maximum norm by rows of PEs. As a result, the pivot element, as well as px, the index of the pivot column, will be contained in all the PEs. The parallelization of the strategy (5) to reduce fill-in requires more communications than (2), but it is not very costly from the computational point of view. In order to perform the pivoting described in (2), if columns cx and px are located in different PEs, we use a packed vector [..., Chapter ...] that acts as a buffer for exchanging data. As figure 2 shows, the information of the lists is placed in consecutive memory positions; the corresponding column of M and the square of its norm are sent in a single message.

Figure 2. Buffer for sending local lists. (A doubly linked column list, with its header and element count, is packed into consecutive (row index, value) pairs together with its squared norm norm2.)

If the rotations are parallelized according to the sequential algorithm, neither the outer loop (which goes through the columns of the matrix) nor the inner loop (which annihilates each element of the column) could be executed in parallel, due to data dependencies; thus the parallel algorithm would need B[2A-B-1]/2 iterations. Nevertheless, a Givens rotation can be applied to any two rows, not necessarily consecutive; therefore each row of PEs can independently and in parallel apply the rotations to the M rows which they store, and thus reduce the execution times. Figure 3(a) illustrates how this is done for a matrix distributed on a 2 x 2 mesh. In this example the four PEs apply Givens rotations in parallel to the first column of the matrix, but the first row of each PE is not rotated. This is solved as shown in figure 3(b): once all the rows are rotated, row cx is updated (vii), and the final result is shown in figure 3(c). We have chosen this approach because it is easy to generalize for any dimension of the mesh.

Rows are rotated to zero the corresponding elements. Clearly, it is not necessary to rotate those rows whose first element is zero. As our matrix is sparse and null elements are not stored in the lists, we go through the elements of list (column) cx and rotate the rows of those elements; fill-in may appear at this stage.

Figure 3. Parallel Givens Rotations. (Panels (a)-(c) show the sparsity patterns on the 2 x 2 mesh; steps (i)-(vii) trace row cx being passed down a column of PEs.)

In figure 3(b), the PE containing the row cx (in this example, row 0) sends it to the southern PE (i); then a Givens rotation is applied to row cx and to the non-rotated row of this PE, in order to zero the corresponding element (ii). The new row cx is sent again to the southern PE (iii), and we proceed in the same way as in (ii).

3 The least squares problem

The least squares problem calculates a vector x of length B that minimizes ||Mx - z||_2, where z is a vector of length A. If the rank of M is maximum (B), the least squares problem has one unique solution, x_LS. Otherwise it has an infinite number of solutions x_SOL, one of which has a minimum norm and which we will also denote as x_LS (x_LS is the x_SOL such that ||x_SOL||_2 is minimum). If A = B, the least squares problem is equivalent to solving a linear equation system Mx = z, since min ||Mx - z||_2 = 0.

This problem can be solved by adapting the parallel algorithm that carries out the Givens QR factorization of matrix M. In particular, the least squares problem is equivalent to solving the upper triangular system R x = Q^T z. This approach is adequate due to the good numerical stability of the QR factorization. If rank(M) = B, this algorithm calculates the one unique solution to the least squares problem. If rank(M) < B, only one of the infinite solutions is obtained, the one called basic solution, which has a maximum of rank nonzero elements and which in general will not coincide with the minimum norm solution x_LS.

3.1 Calculation of Q^T z

Product Q^T z is obtained at the same time that the QR factorization is performed (we assume vector z is dense). First, vector z is stored in a vector called qtz, which is distributed in each column of PEs so that the global component I of qtz is replicated in the row of PEs with idy = I mod m. Then Givens rotations are applied to the corresponding elements of qtz at the same time we apply the rotations to the rows of matrix M. For instance, if rows i-1 and i are being processed, this product is calculated (it is similar to (3)):

    (  cos θ  sin θ ) ( qtz_{i-1} )   ( qtz'_{i-1} )
    ( -sin θ  cos θ ) ( qtz_i     ) = ( qtz'_i     )

Once all the rotations of the factorization have ended, vector qtz stores the product Q^T z from index 0 to the global index B-1. As we can see, matrix Q is not necessary.

3.2 Backsubstitution and permutation

The upper triangular system R x = Q^T z is solved by means of a backsubstitution. The corresponding sequential algorithm is as follows:

    for (i = rank-1; i >= 0; i--)
        x_i = ( qtz_i - sum_{j=i+1}^{rank-1} r_{ij} x_j ) / r_{ii}

This loop has data dependencies, and thus it must be maintained in the parallel code without any possibility of being distributed among the PEs. In addition, it is necessary to access the elements of matrix R by rows (matrix R is stored by columns). This is solved by using an auxiliary pointer vector, as we saw in §2.2. Another option is to apply the column version of backsubstitution [..., Chapter ...]. Once the backsubstitution is carried out, we get the solution vector x of length B distributed in each row of PEs, so that the global component J of vector x is replicated in the column of PEs with idx = J mod n.

Due to the column pivoting carried out in the QR factorization, permutation Π must be applied to the components of vector x, so that x is overwritten with vector Πx. All the PEs contain a vector called permut, of length B. It is the only vector whose components are not distributed among the PEs. This vector stores the index of the column swapped in each iteration (px), and by applying these swaps starting from the end, the elements of vector x are obtained in the correct order.

4 Experimental results and conclusions

The algorithm has been implemented on a Cray T3D supercomputer, with a Cray Y-MP host and DEC Alpha processors connected by a three-dimensional torus topology, using C language and PVM routines for message passing. The parallel code is SPMD (Single Program Multiple Data). We have used low latency communication functions such as pvm_fastsend and pvm_fastrecv (non-standard PVM functions for short messages). We have also developed reduction instructions suitable for our algorithm, to reduce communications.

We have tested the performance of our parallel algorithm using the Harwell-Boeing matrices described in table 1. Table 3 shows the execution times in seconds for different numbers of PEs, both without taking into account our strategy to reduce fill-in and applying the fill-in control approach (5). These times include the QR factorization as well as the resolution of the least squares problem. The time required for data distribution and collection of results is not included, because we assume that this program is a possible subproblem within a wider program.

Execution times are substantially reduced with the fill-in control approach, because fewer nonzero elements appear and therefore computation savings are achieved; this holds for matrices JPWH991, BCSSTK14 and SHERMAN5 alike. These times decrease even further as the number of PEs increases.

Matrix     Execution times (s), 1 to 128 PEs, for the two pivot selection criteria
JPWH991    ...
BCSSTK14   ...
SHERMAN5   ...

Table 3. Execution times (in seconds)

This is due to the fact that, when data are distributed among more PEs, the number of computations that each PE carries out is lower, whereas the number of communications tends to increase. Since many communications are required to update the non-rotated row of each PE in each step of the algorithm (see figure 3(b)), for matrix JPWH991 the running time even increases at the largest machine sizes.

Figure 4 shows the efficiencies attained for each matrix. As we can see, the algorithm scales rather well. It is clear that efficiencies will be lower with the fill-in reduction criterion, because the execution of the algorithm with fill-in reduction has lower running times, and therefore the communication term is a relatively more significant fraction of the running time. Nevertheless, better efficiencies could be achieved with larger matrices.

Figure 4. Efficiencies for matrices JPWH991, BCSSTK14 and SHERMAN5 as a function of the number of PEs (1, 4, 16, 64, 128).

References

[1] B. Hendrickson. Parallel QR Factorization using the Torus-wrap Mapping. Parallel Computing.
[2] C. Bendtsen, P.C. Hansen, K. Madsen, H.B. Nielsen and M. Pinar. Implementation of QR Up- and Downdating on a Massively Parallel Computer. Parallel Computing.
[3] C.H. Bischof. Adaptive Blocking in the QR Factorization. The Journal of Supercomputing.
[4] Cray Research Inc. Cray T3D Technical Summary, September.
[5] E.L. Zapata, J.A. Lamas, F.F. Rivera and O.G. Plata. Modified Gram-Schmidt QR Factorization on Hypercube SIMD Computers. Journal of Parallel and Distributed Computing.
[6] G.H. Golub and C.F. Van Loan. Matrix Computations. The Johns Hopkins University Press, second edition.
[7] I.S. Duff, A.M. Erisman and J.K. Reid. Direct Methods for Sparse Matrices. Clarendon Press.
[8] I.S. Duff, R.G. Grimes and J.G. Lewis. Users' Guide for the Harwell-Boeing Sparse Matrix Collection. Technical Report TR/PA/..., CERFACS, October.
[9] L.C. Waring and M. Clint. Parallel Gram-Schmidt Orthogonalisation on a Network of Transputers. Parallel Computing.
[10] L.F. Romero and E.L. Zapata. Data Distributions for Sparse Matrix Vector Multiplication. Parallel Computing.
[11] R. Asenjo and E.L. Zapata. Sparse LU Factorization on the Cray T3D. In Int'l Conference on High-Performance Computing and Networking, Milan, Italy. Springer-Verlag, LNCS, May.
[12] R. Doallo, B.B. Fraguela, J. Tourino and E.L. Zapata. Parallel Sparse Modified Gram-Schmidt QR Decomposition. In Int'l Conference on High-Performance Computing and Networking (accepted), Brussels, Belgium, April.
[13] R. Doallo, J. Tourino and E.L. Zapata. Sparse Householder QR Factorization on a Mesh. In Fourth Euromicro Workshop on Parallel and Distributed Processing, Braga, Portugal. IEEE Computer Society Press, January.
[14] S.G. Kratzer. Sparse QR Factorization on a Massively Parallel Computer. The Journal of Supercomputing.
[15] T.H. Robey and D.L. Sulsky. Row Ordering for a Sparse QR Decomposition. SIAM J. Matrix Anal. Appl., October.