Parallelizing the Divide and Conquer Algorithm for the Symmetric Tridiagonal Eigenvalue Problem on Distributed Memory Architectures*

Françoise Tisseur†    Jack Dongarra‡

February 24, 1998

Abstract

We present a new parallel implementation of a divide and conquer algorithm for computing the spectral decomposition of a symmetric tridiagonal matrix on distributed memory architectures. The implementation we develop differs from other implementations in that we use a two-dimensional block cyclic distribution of the data, we use the Löwner theorem approach to compute orthogonal eigenvectors, and we introduce permutations before the back transformation of each rank-one update in order to make good use of deflation. This algorithm yields the first scalable, portable and numerically stable parallel divide and conquer eigensolver. Numerical results confirm the effectiveness of our algorithm. We compare performance of the algorithm with that of the QR algorithm and of bisection followed by inverse iteration on an IBM SP2 and a cluster of Pentium II's.

* This work was supported in part by Oak Ridge National Laboratory, managed by Lockheed Martin Energy Research Corp. for the U.S. Department of Energy under contract number DE-AC05-96OR2246, and by the Defense Advanced Research Projects Agency under contract DAAL03-91-C-0047, administered by the Army Research Office.

† Department of Mathematics, University of Manchester, Manchester, M13 9PL, England ([email protected], http://www.ma.man.ac.uk/~ftisseur/).

‡ Department of Computer Science, University of Tennessee, Knoxville, TN 37996-1301, USA, and Mathematical Sciences Section, Oak Ridge National Laboratory, Oak Ridge, TN 37831 ([email protected], http://www.cs.utk.edu/~dongarra).

Key words. Divide and conquer, symmetric eigenvalue problem, tridiagonal matrix, rank-one modification, parallel algorithm, ScaLAPACK, LAPACK, distributed memory architecture

AMS subject classifications. 65F15, 68C25

1 Introduction

The divide and conquer algorithm is an important recent development for solving the tridiagonal symmetric eigenvalue problem. The algorithm was first developed by Cuppen [8], based on previous ideas of Golub [17] and of Bunch, Nielsen and Sorensen [5] for the solution of the secular equation, and was made popular as a practical parallel method by Dongarra and Sorensen [14]. This simple and attractive algorithm was considered unstable for a while because of a lack of orthogonality in the computed eigenvectors. It was thought that extended precision arithmetic was needed in the solution of the secular equation to guarantee that sufficiently orthogonal eigenvectors are produced when there are close eigenvalues. Recently, however, Gu and Eisenstat [20] have found a new approach that does not require extended precision, and we have used it in our implementation.

The divide and conquer algorithm has natural parallelism, as the initial problem is partitioned into several subproblems that can be solved independently. Early parallel implementations had mixed success. Dongarra and Sorensen [14] and later Darbyshire [9] wrote implementations for the shared memory machines Alliant FX/8 and KSR1. They concluded that divide and conquer algorithms, when properly implemented, can be many times faster than traditional ones such as bisection followed by inverse iteration or the QR algorithm, even on serial computers. Hence, a version has been incorporated in LAPACK [1]. Ipsen and Jessup [23] compared their parallel implementations of the divide and conquer algorithm and the bisection algorithm on the Intel iPSC-1 hypercube. They found that their bisection implementation was more efficient than their divide and conquer implementation because of the excessive amount of data transferred between processors and also because of the unbalanced work load after the deflation process. More recently, Gates and Arbenz [16], with an implementation for the Intel Paragon, and Fachin [15], with an implementation on a network of T800 transputers, showed that good speed-up can be achieved by distributed memory parallel implementations. However, their implementations are not as efficient as they could have been. They did not use the techniques described in [20] that guarantee the orthogonality of the eigenvectors and that make good use of deflation in order to speed up the computation.

In this paper, we describe an efficient, scalable, and portable parallel implementation for distributed memory machines of a divide and conquer algorithm for the symmetric tridiagonal eigenvalue problem.

Divide and conquer methods consist of an initial partition of the problem into subproblems; then, after some appropriate computations on these individual subproblems, the results are joined together using rank-r updates (r ≥ 1). We chose to implement the rank-one update of Cuppen [8] rather than the rank-two update used in [16], [20]. A priori, we see no reason why one update should be more accurate than the other or faster in general, but Cuppen's method, as reviewed in Section 2, appears to be easier to implement.

In Section 3 we discuss several important issues to consider for parallel implementations of a divide and conquer algorithm. Then, in Section 4, we derive our algorithm. We have implemented our algorithm in Fortran 77 as production quality software in the ScaLAPACK model [4]. Our algorithm is well suited to computing all the eigenvalues and eigenvectors of large matrices with clusters of eigenvalues. For these problems, bisection followed by inverse iteration as implemented in ScaLAPACK [4], [11] is limited by the size of the largest cluster that fits on one processor. The QR algorithm is less sensitive to the eigenvalue distribution but is more expensive in computation and communication, and thus does not perform as well as the divide and conquer method. Examples that demonstrate the efficiency and numerical performance are presented in Section 6.

2 Cuppen's Method

Solving the symmetric eigenvalue problem consists, in general, of three steps: the symmetric matrix A is reduced to tridiagonal form T, then one computes the eigenvalues and eigenvectors of T, and finally one computes the eigenvectors of A from the eigenvectors of T.

In this section we consider the problem of determining the spectral decomposition

\[
T = W \Lambda W^T
\]

of a symmetric tridiagonal matrix T ∈ R^{n×n}, where Λ is diagonal and W is orthogonal.

Cuppen [8] divides the original problem into subproblems of smaller size by introducing the decomposition

\[
T = \begin{pmatrix} T_1 & \beta_k e_k e_1^T \\ \beta_k e_1 e_k^T & T_2 \end{pmatrix}
  = \begin{pmatrix} \hat T_1 & 0 \\ 0 & \hat T_2 \end{pmatrix}
  + \beta_k \theta \begin{pmatrix} e_k \\ \theta^{-1} e_1 \end{pmatrix}
    \begin{pmatrix} e_k^T & \theta^{-1} e_1^T \end{pmatrix},
\]

where 1 ≤ k ≤ n, e_j represents the jth canonical vector of appropriate dimension, β_k is the kth off-diagonal element of T, and the k × k matrix T̂_1 and the (n − k) × (n − k) matrix T̂_2 differ from the corresponding submatrices of T only by their last and first diagonal coefficients, respectively. This is the divide phase. Dongarra and Sorensen [14] introduced the factor θ to avoid cancellation when forming the new diagonal elements of diag(T̂_1, T̂_2). We now have two independent symmetric tridiagonal eigenvalue problems of order k and n − k. Let

\[
\hat T_1 = Q_1 D_1 Q_1^T, \qquad \hat T_2 = Q_2 D_2 Q_2^T \tag{2.1}
\]

be their spectral decompositions and let

\[
z = \operatorname{diag}(Q_1, Q_2)^T \begin{pmatrix} e_k \\ \theta^{-1} e_1 \end{pmatrix}. \tag{2.2}
\]

Then we get

\[
T = \begin{pmatrix} Q_1 & 0 \\ 0 & Q_2 \end{pmatrix}
    \left( \begin{pmatrix} D_1 & 0 \\ 0 & D_2 \end{pmatrix} + \rho z z^T \right)
    \begin{pmatrix} Q_1 & 0 \\ 0 & Q_2 \end{pmatrix}^T, \qquad \rho = \beta_k \theta, \tag{2.3}
\]

\[
  = Q \,(D + \rho z z^T)\, Q^T,
\]

and the eigenvalues of T are therefore those of D + ρzz^T. Finding the spectral decomposition

\[
D + \rho z z^T = U \Lambda U^T
\]

of a rank-one update is the heart of the divide and conquer algorithm. This is the conquer phase. It then follows that

\[
T = W \Lambda W^T \quad \text{with} \quad W = Q U. \tag{2.4}
\]

A recursive application of the strategy described above to the two tridiagonal matrices in (2.1) leads to the divide and conquer method for the symmetric tridiagonal eigenvalue problem.

2.1 Computing the Spectral Decomposition of a Rank-One Perturbed Matrix

An updating technique as described in [5], [8], [17] can be used to compute the spectral decomposition of a rank-one perturbed matrix

\[
D + \rho z z^T = U \Lambda U^T, \tag{2.5}
\]

where D = diag(d_1, d_2, ..., d_n), z = (z_1, z_2, ..., z_n)^T and ρ is a nonzero scalar. By setting the characteristic polynomial of D + ρzz^T equal to zero, we find that the eigenvalues {λ_i}_{i=1}^n of D + ρzz^T are the roots of

\[
f(\lambda) = 1 + \rho \sum_{i=1}^{n} \frac{z_i^2}{d_i - \lambda}, \tag{2.6}
\]

which is called the secular equation. Many methods have been suggested for solving the secular equation (see Melman [27] for a survey). Each eigenvalue is computed in O(n) flops. A corresponding normalized eigenvector u can be computed from the formula

\[
u = \frac{(D - \lambda I)^{-1} z}{\|(D - \lambda I)^{-1} z\|_2}
  = \left( \frac{z_1}{d_1 - \lambda}, \dots, \frac{z_n}{d_n - \lambda} \right)^T
    \bigg/ \sqrt{\sum_{j=1}^{n} \frac{z_j^2}{(d_j - \lambda)^2}} \tag{2.7}
\]

in only O(n) flops. Thus, the spectral decomposition of a rank-one perturbed matrix can be computed in O(n^2) flops. Unfortunately, calculation of the eigenvectors using (2.7) can lead to a loss of orthogonality for close eigenvalues. We discuss solutions to this problem in Section 4.2.
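To make (2.5)-(2.7) concrete, the short NumPy sketch below computes the spectral decomposition of D + ρzz^T by locating each root of the secular equation with plain bisection and then evaluating (2.7). It is only an illustration under simplifying assumptions (ρ > 0, strictly increasing d_i, no deflation, no careful stopping criterion); the function name secular_eigen and the bisection root finder are ours, not the LAPACK routine xLAED4 discussed later.

    import numpy as np

    def secular_eigen(d, z, rho):
        """Spectral decomposition of diag(d) + rho*z*z^T for rho > 0 and strictly
        increasing d.  Returns (lam, U) with the matrix ~ U @ diag(lam) @ U.T.

        Minimal sketch: each root of the secular function (2.6) is located by
        plain bisection in the interval known to contain it, and the eigenvectors
        are formed with formula (2.7).  No deflation, no careful stopping test.
        """
        n = d.size
        f = lambda x: 1.0 + rho * np.sum(z**2 / (d - x))   # secular function
        lam = np.empty(n)
        for i in range(n):
            lo = d[i]                                      # i-th root lies in (d_i, d_{i+1});
            hi = d[i + 1] if i + 1 < n else d[-1] + rho * (z @ z)  # last root below d_n + rho*z'z
            for _ in range(200):                           # plain bisection
                mid = 0.5 * (lo + hi)
                if f(mid) > 0.0:
                    hi = mid
                else:
                    lo = mid
            lam[i] = 0.5 * (lo + hi)
        U = z[:, None] / (d[:, None] - lam[None, :])       # formula (2.7), unnormalized
        U /= np.linalg.norm(U, axis=0)
        return lam, U

    # Small check against numpy's symmetric eigensolver.
    rng = np.random.default_rng(0)
    d = np.sort(rng.standard_normal(6))
    z = rng.standard_normal(6)
    z /= np.linalg.norm(z)
    lam, U = secular_eigen(d, z, 1.0)
    print(np.allclose(np.linalg.eigvalsh(np.diag(d) + np.outer(z, z)), lam))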

2.2 Deflation

Dongarra and Sorensen showed [14] that the problem (2.5) can potentially be reduced in size. If z_i = 0 for some i, we see from (2.3) that d_i is an eigenvalue of D + ρzz^T with eigenvector e_i. Moreover, if D has an eigenvalue d_i of multiplicity m > 1, we can rotate the eigenvector basis in order to zero out the components of z corresponding to the repeated eigenvalues. Then, we can remove the rows and columns of D + ρzz^T corresponding to zero components of z.

When working in finite precision arithmetic, we must deal with components z_i that are nearly equal to zero and with nearly equal d_i's. Then, in order to describe precisely when we can deflate, we need to define a tolerance τ. Let τ = ε‖D + ρzz^T‖_2, where ε is the machine precision.

We say that the first type of deflation arises when |ρ z_i| ≤ τ. Now assume that ‖z‖_2 = 1. We have

\[
\| (D + \rho z z^T) e_i - d_i e_i \|_2 = |\rho z_i| \, \|z\|_2 \le \tau.
\]

Then (d_i, e_i) may be considered as an approximate eigenpair for D + ρzz^T and z_i is set to zero.

i

The second typ e of de ation comes from a applied to

T

D + z z in order to set a comp onentofz equal to zero. Supp ose that

q

2 2

jz z jjd d j= z + z  . Let G b e the Givens rotation de ned by

i j i j ij

i j

 !

c s

T

[e ;e ] G [e ;e ]=

i j ij i j

s c

q

2 2

2 2

z + z with c = z =r;s= z =r;r= and c + s = 1. Then

i j

i j

T T T

e

= D + z~z~ + E ; kE k  ; G D + z z G

ij ij 2 i

i

2 2 2 2 2

~ ~

wherez ~ = r ; z~ =0; d = d c + d s ; d = d s + d c and E =d d cs.

i j i i j j i j ij i j

The result of recognizing all these deflations is to replace the rank-one update problem D + ρzz^T with one of smaller size. Hence, if G is the product of all the rotations used to zero out certain components of z and if P is the accumulation of permutations used to translate the zero components of z to the bottom of z, the result is

\[
P G (D + \rho z z^T) G^T P^T =
\begin{pmatrix} \tilde D + \rho \tilde z \tilde z^T & 0 \\ 0 & \Omega \end{pmatrix} + E,
\qquad \|E\|_2 \le c\,\tau,
\]

with c a constant of order unity and Ω a diagonal matrix containing the deflated eigenvalues. The matrix D̃ + ρz̃z̃^T has only simple eigenvalues and all the elements of z̃ are nonzero.

The deflation process is essential for the success of the divide and conquer algorithm. In practice, the dimension of D̃ + ρz̃z̃^T is usually considerably smaller than the dimension of D + ρzz^T, reducing the number of flops needed to compute the eigenvector matrix of T in (2.4). Cuppen [8] showed that deflations are more likely to take place when the matrix is diagonally dominant.
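The two deflation tests just described can be summarized in a small sketch. The following NumPy function (an illustration with our own names, not the library routine) zeroes the components z_i with |ρ z_i| ≤ τ and then applies Givens rotations to pairs whose criterion falls below τ; the permutation of the deflated entries to the bottom and the accumulation of G and P are omitted.

    import numpy as np

    def deflate(d, z, rho, tol):
        """One sweep of the two deflation tests for diag(d) + rho*z*z^T (illustration).

        Returns (d, z, deflated) where deflated[i] is True when (d[i], e_i) can be
        accepted as an approximate eigenpair.  A simplified sketch of Section 2.2:
        the permutation of the deflated entries to the bottom and the accumulation
        of the rotations G and the permutation P are omitted.
        """
        d, z = d.copy(), z.copy()
        n = d.size
        deflated = np.zeros(n, dtype=bool)

        # First test: a negligible component of z.
        for i in range(n):
            if abs(rho * z[i]) <= tol:
                z[i] = 0.0
                deflated[i] = True

        # Second test: nearly equal diagonal entries; rotate so one component of z vanishes.
        for i in range(n):
            if deflated[i]:
                continue
            for j in range(i + 1, n):
                if deflated[j]:
                    continue
                r2 = z[i]**2 + z[j]**2
                if r2 > 0 and abs(z[i] * z[j]) * abs(d[i] - d[j]) / r2 <= tol:
                    r = np.sqrt(r2)
                    c, s = z[i] / r, z[j] / r
                    di, dj = d[i], d[j]
                    d[i] = di * c**2 + dj * s**2         # rotated diagonal entries
                    d[j] = di * s**2 + dj * c**2
                    z[i], z[j] = r, 0.0
                    deflated[j] = True
        return d, z, deflated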

2.3 Algorithm and Complexity

The divide and conquer algorithm is naturally expressed in recursive form as follows.

Procedure dc_eigendecomposition(T, W, Λ)
{ From input T compute output (W, Λ) such that T = W Λ W^T. }
if T is 1-by-1
    return W = 1, Λ = T
else
    Express T = ( T_1  0 ; 0  T_2 ) + ρ v v^T
    call dc_eigendecomposition(T_1, W_1, Λ_1)
    call dc_eigendecomposition(T_2, W_2, Λ_2)
    Form D + ρ z z^T from Λ_1, W_1, Λ_2, W_2
    Find the eigenvalues Λ and eigenvectors U of D + ρ z z^T
    Form W = ( W_1  0 ; 0  W_2 ) U
    return W, Λ
end
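For concreteness, the recursion can be written in a few lines of NumPy. The sketch below mirrors dc_eigendecomposition: it tears the tridiagonal matrix, recurses on the two halves, and solves the rank-one update, here simply with numpy.linalg.eigh to keep the example self-contained (θ = 1 and deflation are omitted).

    import numpy as np

    def dc_eig(alpha, beta):
        """Eigendecomposition of the symmetric tridiagonal matrix with diagonal
        alpha (length n) and off-diagonal beta (length n-1).

        Returns (lam, W) with T = W @ diag(lam) @ W.T.  A serial sketch of the
        recursion above (theta = 1, no deflation); the rank-one update is solved
        with numpy.linalg.eigh instead of the secular equation for brevity.
        """
        n = alpha.size
        if n == 1:
            return alpha.copy(), np.ones((1, 1))
        k = n // 2
        b = beta[k - 1]                                  # torn off-diagonal element
        a1, a2 = alpha[:k].copy(), alpha[k:].copy()
        a1[-1] -= b                                      # last diagonal entry of T1_hat
        a2[0] -= b                                       # first diagonal entry of T2_hat
        lam1, Q1 = dc_eig(a1, beta[:k - 1])
        lam2, Q2 = dc_eig(a2, beta[k:])
        d = np.concatenate([lam1, lam2])
        z = np.concatenate([Q1[-1, :], Q2[0, :]])        # z = diag(Q1, Q2)^T (e_k; e_1)
        lam, U = np.linalg.eigh(np.diag(d) + b * np.outer(z, z))
        W = np.zeros((n, n))
        W[:k, :k], W[k:, k:] = Q1, Q2
        return lam, W @ U                                # back transformation W = Q U

    # Quick check on a random tridiagonal matrix.
    rng = np.random.default_rng(1)
    a, b = rng.standard_normal(8), rng.standard_normal(7)
    lam, W = dc_eig(a, b)
    T = np.diag(a) + np.diag(b, 1) + np.diag(b, -1)
    print(np.allclose(W @ np.diag(lam) @ W.T, T))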

The recursion can be carried on until we reach a 2 × 2 or 1 × 1 eigenvalue problem, or it can be terminated with an n_0 × n_0 problem, for which we can use the QR algorithm or some other method to solve the tridiagonal problem. Note that we have not specified the dimensions of T_1 and T_2. We refer to Section 4.1 for details of how we choose the size of the subproblems in our parallel implementation.

Figure 3.1: A divide and conquer tree (levels 0 to 3, with subproblems Pb0 to Pb7 at the leaves).

We count as one flop an elementary floating point operation (+, −, ×, /) or a square root. Assuming no deflation and ignoring terms in O(n^2), the number of flops t(n) needed to run dc_eigendecomposition for an n × n matrix T satisfies the recursion

\[
t(n) = n^3 + 2\,t(n/2),
\]

which has the solution

\[
t(n) \approx \frac{4}{3} n^3 + O(n^2).
\]

In practice, because of deflation, it appears that the algorithm takes only O(n^{2.3}) flops on average, and the cost can even be as low as O(n^2) for some special cases (see [10]).
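The constant 4/3 follows from unrolling the recursion, each halving dividing the cubic term by four:

\[
t(n) = n^3 + 2\,t(n/2)
     = n^3 + 2\Bigl(\tfrac{n}{2}\Bigr)^3 + 4\Bigl(\tfrac{n}{4}\Bigr)^3 + \cdots
     = n^3 \sum_{j \ge 0} 4^{-j} + O(n^2)
     = \tfrac{4}{3}\,n^3 + O(n^2).
\]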

3 Parallelization Issues

Divide and conquer algorithms have been successfully implemented on shared memory multiprocessors for solving the symmetric tridiagonal eigenvalue problem and for computing the singular value decomposition of bidiagonal matrices [14], [24]. By contrast, the implementation of these algorithms on distributed memory machines poses difficulties. Several issues need to be addressed and several implementations are possible.

The first issue is how to split the work among the processors. As shown in Figure 3.1, the recursive matrix splitting leads to a hierarchy of subproblems with a data dependency graph in the form of a binary tree. This structure suggests a natural way to split the work among the processes. If P is the number of processes, the smallest subproblems are chosen to be of dimension n/P, lying at the leaves of the tree. At the top of the tree, all processes cooperate. At each branch of the tree, the task is naturally split between two sets of processes and, within each set, the processes cooperate. At the leaves of the tree, each process solves its subproblem independently. This is the way previous implementations have been done [15], [16], [23]. This approach offers a natural parallelism for the update of the subproblems.

Ipsen and Jessup [23] report an unbalanced work load among the processes when the deflations are not evenly distributed across the sets of processes involved at the branches of the tree. In this case, the faster set of processes (those that experience deflation) will have to wait for the other set of processes before beginning the next merge. This reduces the speedup gained through the use of the tree. Gates and Arbenz [16] showed that if we assume that the work corresponding to any node in the tree is well balanced among the processors, then the implementation should still have 85% efficiency even in the worst case of badly distributed deflations. However, it is worth considering this problem for our implementation. A possible approach is dynamic splitting versus static splitting [6], where a task list is used to keep track of the various parts of the matrix during the decomposition process and to make use of both data and task parallelism. This approach has been investigated for the parallel implementation of the spectral divide and conquer algorithm for the unsymmetric eigenvalue problem using the matrix sign function [2].¹ We did not choose this approach because in the symmetric case the partitioning of the matrix can be done arbitrarily, and we prefer to take advantage of this opportunity. By contrast with previous implementations, we use a splitting different from the natural splitting associated with the binary tree. We explain our approach in Section 4.1.

¹ A ScaLAPACK prototype code is available at http://www.netlib.org/scalapack/prototype/.

The second issue is how to maintain orthogonality between eigenvectors in the presence of close eigenvalues. There are two approaches: the extra precision approach of Sorensen and Tang [30], used by Gates and Arbenz [16] in their implementation, and the Löwner theorem approach proposed by Gu and Eisenstat [19] and adopted for LAPACK [1], [29]. There are trade-offs between these two approaches that we shall discuss in Section 4.2.

The third issue is the back transformation process, which is of great importance for the success of divide and conquer algorithms because it reduces the cost of forming the eigenvector matrix of the tridiagonal form. We explain in Section 4.3 the idea of Gu and Eisenstat for reorganizing the data structure of the orthogonal matrices before the back transformation, and we propose a parallel implementation of this approach. While used in the serial LAPACK divide and conquer code, this idea has never been considered in any current parallel implementation of the divide and conquer algorithm.

The last issue, and perhaps the most critical step when writing a parallel program, is how to distribute the data. Previous implementations used a one-dimensional distribution [16], [23]. Gates and Arbenz [16] used a one-dimensional row block distribution for Q, the matrix of eigenvectors, and a one-dimensional column block distribution for U, the eigenvector matrix of the rank-one updates. This distribution simplifies the parallel matrix-matrix multiplication used for the back transformation QU. However, their matrix multiplication routine grows in communication with the number of processes, making it not scalable. By contrast, our implementation uses a two-dimensional block cyclic distribution of the data in the style of ScaLAPACK. We use the PBLAS (Parallel BLAS) routine PxGEMM to perform our parallel matrix multiplications. This routine has a communication cost that grows with the square root of the number of processes, leading to good efficiency and scalability.

4 Implementation Details

We now describe a parallel implementation of the divide and conquer algorithm that addresses the issues discussed in the previous section. Our goal was to write an efficient, reliable, portable and scalable code that follows the conventions of the ScaLAPACK software [4].

On shared-memory concurrent computers, LAPACK seeks to make efficient use of the memory hierarchy by maximizing data reuse. Specifically, LAPACK casts linear algebra computations in terms of block-oriented matrix-matrix operations whenever possible, enabling maximal use of Level 3 BLAS matrix-matrix operations. An analogous approach has been taken by ScaLAPACK for distributed-memory machines. ScaLAPACK uses block partitioned algorithms in order to reduce the frequency with which data must be transferred between processes, and thereby to reduce the fixed startup cost incurred each time a message is sent.

4.1 Data Distribution

In contrast to all previous parallel implementations for distributed memory machines [15], [16], [23], we use a two-dimensional block cyclic distribution for U, the eigenvector matrix of the rank-one update, and for Q, the matrix of the back transformation. For linear algebra routines and matrix multiplications, the two-dimensional block cyclic distribution has been shown to be efficient and scalable [7], [21], [28]. The ScaLAPACK software has adopted this data layout.

The block cyclic distribution is a generalization of the block and the cyclic distributions. The processes of the parallel computer are first mapped onto a two-dimensional rectangular grid of size P × Q. Any general m × n dense matrix is decomposed into m_b × n_b blocks starting at its upper left corner. These blocks are then uniformly distributed in each dimension of the P × Q process grid, as illustrated in Figure 4.1. In our implementation, we impose the block size to be equal in each direction: m_b = n_b.
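To make the layout concrete, the small Python helper below (our own, not part of ScaLAPACK) returns the process coordinates that own a given matrix entry under a two-dimensional block cyclic distribution with square blocks of order n_b on a P × Q grid; with P = 2 and Q = 3 it reproduces the kind of assignment shown in Figure 4.1.

    def owner(i, j, nb, P, Q):
        """Process coordinates (p, q) owning global entry (i, j) under a 2-D block
        cyclic distribution with nb-by-nb blocks on a P-by-Q process grid."""
        return (i // nb) % P, (j // nb) % Q

    # With P = 2, Q = 3: block row k is stored on process row k mod 2 and block
    # column l on process column l mod 3, as illustrated in Figure 4.1.
    print([[owner(i, j, 1, 2, 3) for j in range(6)] for i in range(4)])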

Figure 4.1: Global (left) and distributed (right) views of the matrix (P = 2, Q = 3).

Previous parallel implementations gave a subproblem of size n/(P × Q) to each processor. As we use the two-dimensional block cyclic distribution, it is now natural to partition the original problem into subproblems of size n_b. At the leaves of the tree, processes that hold a diagonal block solve their own subproblems of size n_b × n_b using the QR algorithm or the serial divide and conquer algorithm. Some processes may hold several subproblems and some of them none. As the computational cost of this first step is negligible compared with the computational cost of the whole algorithm, it does not matter if the work is not well distributed there. However, good load balancing of the work is assured when the P × Q grid of processes is such that lcm(P, Q) = PQ, that is, when P and Q have no common factor. In this case, at the leaves of the tree, all the processes hold a subproblem [28]. The worst case happens when lcm(P, Q) = P or lcm(P, Q) = Q.

For a given rank-one update Q(D + ρzz^T)Q^T, the processes that collaborate are those that hold a part of the global matrix Q. By contrast with previous implementations, with the two-dimensional block cyclic distribution all the processes collaborate before reaching the top of the tree. We illustrate this in Figures 4.2 and 4.3, where the eigenvector matrix is distributed over 4 processes, using first a one-dimensional block distribution and second a two-dimensional block cyclic distribution. Suppose that, at level 1, all the deflations occur in the first submatrix, which is held by processes P0 and P1 for the one-dimensional block distribution and by processes P0, P1, P2, P3 for the two-dimensional block cyclic distribution. With the one-dimensional block distribution, processes P0 and P1 will have little computation to perform and will then have to wait for processes P2 and P3 before beginning the last rank-one update (level 0). With the two-dimensional block cyclic distribution, the computation is distributed among the 4 processes. This solves some of the load balancing problems that may appear when deflations are not evenly distributed among the processes.

Figure 4.2: Active part of the matrix Q held by each process at levels 2, 1 and 0 (one-dimensional block distribution).

Figure 4.3: Active part of the matrix Q held by each process at levels 2, 1 and 0 (two-dimensional block cyclic distribution).

Moreover, the two-dimensional block cyclic distribution is particularly well adapted to efficient and scalable parallel matrix-matrix multiplications. These operations are the main computational cost of this algorithm. In our parallel implementation we use PxGEMM, as included in ScaLAPACK.

4.2 Orthogonality

Let λ̂ be an approximate root of the secular equation (2.6). When we approximate the eigenvector u by replacing λ in (2.7) by its approximation λ̂, then when d_j ≈ λ, even if λ̂ is close to λ, the ratio z_j/(d_j − λ̂) can be very far from the exact one z_j/(d_j − λ). As a consequence, the computed eigenvector is very far from the true one and the resulting eigenvector matrix is far from being orthogonal.

Sorensen and Tang [30] proposed using extended precision to compute the differences d_j − λ̂_i. However, this approach is hard to implement portably across all the usual architectures. There are many machine-dependent tricks to make the implementation of extended precision go faster, but on some machines, such as Crays, these tricks do not help and performance suffers.

The Gu and Eisenstat approach, based on the Löwner theorem, can easily be implemented portably on IEEE machines and Crays using only working precision arithmetic throughout, with a trivial bit of extra work in one place to compensate for the lack of a guard digit in Cray add/subtract. By contrast, the Löwner approach may require more communication than the extra precision approach, depending on how the parallelization is done. The reason is that the Löwner approach uses a formula that requires information about all the eigenvalues, requiring a broadcast, whereas the extra precision approach is "embarrassingly" parallel, with each eigenvalue and eigenvector computed without communication. However, the extra communication the Löwner approach uses is trivial compared with the communication of eigenvectors elsewhere in the computation.

The Löwner approach [19], [26] considers that the computed eigenvalues are the exact eigenvalues of a new rank-one modification D + ρz̃z̃^T. By definition we have that

\[
\det(D + \rho \tilde z \tilde z^T - \lambda I) = \prod_{j=1}^{n} (\hat\lambda_j - \lambda) \tag{4.1}
\]

and also

\[
\det(D + \rho \tilde z \tilde z^T - \lambda I)
  = \prod_{j=1}^{n} (d_j - \lambda)
    \left( 1 + \rho \sum_{j=1}^{n} \frac{\tilde z_j^2}{d_j - \lambda} \right). \tag{4.2}
\]

Then, combining (4.1) and (4.2) and setting λ = d_i leads to

\[
\tilde z_i = \sqrt{ \frac{1}{\rho} \, (\hat\lambda_i - d_i)
  \prod_{\substack{j=1 \\ j \ne i}}^{n} \frac{\hat\lambda_j - d_i}{d_j - d_i} },
  \qquad i = 1, \dots, n.
\]

If all the quantities λ̂_j − d_i are computed to high relative accuracy, then z̃_i can be computed to high relative accuracy. Substituting the exact eigenvalues {λ̂_i}_{i=1}^n and the computed z̃ into (2.7) gives

\[
\hat u_i = \left( \frac{\tilde z_1}{d_1 - \hat\lambda_i}, \dots,
                 \frac{\tilde z_n}{d_n - \hat\lambda_i} \right)^T
           \bigg/ \sqrt{ \sum_{j=1}^{n} \frac{\tilde z_j^2}{(d_j - \hat\lambda_i)^2} }.
\]

Evaluating this formula, we obtain approximations of high componentwise accuracy to the eigenvectors û_i of D + ρz̃z̃^T. Provided that the eigenvalues λ̂_i are sufficiently accurate, which is assured by the use of a suitable stopping criterion when solving the secular equation, it can be shown (see [19]) that we obtain a numerical eigendecomposition T ≈ Û Λ̂ Û^T, that is, a decomposition with a relative residual of order ε and with Û orthogonal to working precision.
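The construction can be illustrated in a few lines of NumPy. The sketch below (our own helper, for illustration only) recovers z̃ from the product formula above, given d, the computed roots and ρ, and then forms the eigenvectors with (2.7); the library code instead carries the differences λ̂_j − d_i themselves to high relative accuracy.

    import numpy as np

    def loewner_vectors(d, lam_hat, rho):
        """Recover z_tilde from the Loewner formula, given d (strictly increasing),
        the computed eigenvalues lam_hat of diag(d) + rho*z*z^T and rho, and return
        (z_tilde, U_hat) with the eigenvectors formed as in (2.7).

        Illustration only: it works from lam_hat and d directly, whereas the
        library code carries the differences lam_hat[j] - d[i] to high relative
        accuracy.
        """
        n = d.size
        z2 = np.empty(n)
        for i in range(n):
            num = np.prod(lam_hat - d[i])                # prod_j (lam_hat_j - d_i)
            den = np.prod(np.delete(d, i) - d[i])        # prod_{j != i} (d_j - d_i)
            z2[i] = num / (rho * den)
        z_tilde = np.sqrt(np.maximum(z2, 0.0))
        U = z_tilde[:, None] / (d[:, None] - lam_hat[None, :])   # formula (2.7)
        U /= np.linalg.norm(U, axis=0)
        return z_tilde, U

    # The eigenvector matrix built this way is orthogonal to working precision.
    rng = np.random.default_rng(2)
    d = np.sort(rng.standard_normal(6))
    z = rng.standard_normal(6)
    lam = np.linalg.eigvalsh(np.diag(d) + 0.5 * np.outer(z, z))
    zt, U = loewner_vectors(d, lam, 0.5)
    print(np.linalg.norm(U.T @ U - np.eye(6)))           # small, near machine precision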

There are several ways to parallelize this approach. Either we consider it as an O(n^2) operation, which means we can justify redundant computation in order to avoid communication, or we consider that the size of the data to communicate is negligible compared with what is sent for the back transformation, in which case we can justify communication to distribute the work among the processes. We chose the latter approach.

Let S denote the set of processes that cooperate for a given rank-one update and let k be the number of roots to approximate. Each process of S computes k/|S| roots, labeled {λ̂_i}_{i=i_0}^{k_s}, the corresponding quantities λ̂_i − d_j for i_0 ≤ i ≤ k_s and j = 1, ..., k, and a part of each component of z̃:

\[
z_j = \prod_{i=i_0}^{k_s} (\hat\lambda_i - d_j)
      \prod_{\substack{i=i_0 \\ i \ne j}}^{k_s} (d_i - d_j)^{-1}.
\]

The results are broadcast over S and the processes update their z̃:

\[
\tilde z_j = \prod_{S} z_j.
\]

Each process of S then holds the necessary information to compute its local part of Û and no more communication is needed.
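The following sketch simulates this distribution of work on a single process: the members of S are represented by a loop, each one forms its partial products for every component of z̃, and the contributions are multiplied together as they would be after the broadcast. The way the roots are blocked and all names are our own simplification.

    import numpy as np

    def ztilde_distributed(d, lam_hat, rho, nprocs):
        """Serial simulation of the distributed computation of z_tilde: the roots
        are split among nprocs processes, each process forms its partial products
        for every component j, and the partial results are multiplied together
        (the step that follows the broadcast over S).  Illustrative names only."""
        n = d.size
        chunks = np.array_split(np.arange(n), nprocs)       # roots owned by each process
        partial = np.ones((nprocs, n))
        for p, idx in enumerate(chunks):
            for j in range(n):
                num = np.prod(lam_hat[idx] - d[j])          # prod over owned roots
                den = np.prod(d[idx[idx != j]] - d[j])      # prod over owned d_i, i != j
                partial[p, j] = num / den
        z2 = np.prod(partial, axis=0) / rho                 # combine all contributions
        return np.sqrt(np.maximum(z2, 0.0))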

To compute the approximate eigenvalues and the quantities λ̂_j − d_i stably and efficiently, we use the hybrid scheme for the rational interpolation of f(x) developed by Li [25]. The hybrid scheme keeps the peak number of iterations for solving the secular equation relatively small. For our parallel implementation this is helpful, because the execution time for this part is determined by whichever eigenvalue takes the largest number of iterations.

4.3 Back Transformation

The main cost of the divide and conquer algorithm is in computing the product QU (see (2.4)). The efficiency of the whole implementation relies on a proper implementation of this back transformation. The goal is to reduce the size of the matrix-matrix multiplication when transforming the eigenvectors of the rank-one perturbed matrix into the eigenvectors of the tridiagonal matrix.

In this section, we explain a permutation strategy originally suggested by Gu [18] and used in the serial LAPACK divide and conquer code. Then we derive a permutation strategy more suitable for our parallel implementation. This new strategy is one of the major contributions of our work.

After the deflation process, we denote by G the product of all the Givens rotations used to set to zero the components of z corresponding to nearly equal diagonal elements of D, and by P the accumulation of permutations used to translate the zero components of z to the bottom of z:

\[
P G (D + \rho z z^T) G^T P^T =
\begin{pmatrix} \tilde D + \rho \tilde z \tilde z^T & 0 \\ 0 & \Omega \end{pmatrix}. \tag{4.3}
\]

Let (Ũ, Λ̃) be the spectral decomposition of D̃ + ρz̃z̃^T. Then

\[
\begin{pmatrix} \tilde D + \rho \tilde z \tilde z^T & 0 \\ 0 & \Omega \end{pmatrix}
= \begin{pmatrix} \tilde U & 0 \\ 0 & I \end{pmatrix}
  \begin{pmatrix} \tilde \Lambda & 0 \\ 0 & \Omega \end{pmatrix}
  \begin{pmatrix} \tilde U & 0 \\ 0 & I \end{pmatrix}^T
= U \Lambda U^T,
\]

and the spectral decomposition of the tridiagonal matrix T = Q(D + ρzz^T)Q^T is obtained from

\[
T = Q (PG)^T \, PG (D + \rho z z^T) G^T P^T \, PG \, Q^T
  = Q (PG)^T U \Lambda U^T \, PG \, Q^T
  = W \Lambda W^T
\]

with W = Q(PG)^T U.

When not properly implemented, the computation of W can be very expensive. To simplify the explanation, we illustrate it with a 4 × 4 example: we suppose that d_1 = d_3 and that G is the Givens rotation used to set to zero the third component of z. The matrix P is a permutation that moves z_3 to the bottom of z. We indicate by a "•" that a value has changed. There are two ways of applying the transformation (PG)^T: either on the left, that is, to Q, or on the right, that is, to U. Note that Q = diag(Q_1, Q_2) is block diagonal (see (2.3)), and so we would like to take advantage of this structure; it would halve the cost of the matrix multiplication if Q_1 and Q_2 are of the same size. To preserve the block diagonal form of Q, we need to apply (PG)^T on the right:

\[
Q \cdot (PG)^T U =
\begin{pmatrix} \times & \times & 0 & 0 \\ \times & \times & 0 & 0 \\ 0 & 0 & \times & \times \\ 0 & 0 & \times & \times \end{pmatrix}
G^T
\begin{pmatrix} \times & \times & \times & 0 \\ \times & \times & \times & 0 \\ 0 & 0 & 0 & 1 \\ \times & \times & \times & 0 \end{pmatrix}
=
\begin{pmatrix} \times & \times & 0 & 0 \\ \times & \times & 0 & 0 \\ 0 & 0 & \times & \times \\ 0 & 0 & \times & \times \end{pmatrix}
\begin{pmatrix} \bullet & \bullet & \bullet & \bullet \\ \times & \times & \times & 0 \\ \bullet & \bullet & \bullet & \bullet \\ \times & \times & \times & 0 \end{pmatrix}.
\]

The product of the two last matrices is performed with 64 flops instead of the 2n^3 = 128 flops of a full matrix product. However, if we apply (PG)^T on the left, we can reduce the number of flops further. Consider again the 4 × 4 example:

\[
Q (PG)^T \cdot U = (Q G^T) P^T \cdot U =
\begin{pmatrix} \bullet & \times & 0 & \bullet \\ \bullet & \times & 0 & \bullet \\ \bullet & 0 & \times & \bullet \\ \bullet & 0 & \times & \bullet \end{pmatrix}
\begin{pmatrix} \times & \times & \times & 0 \\ \times & \times & \times & 0 \\ \times & \times & \times & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
= \tilde Q U.
\]

At this step, a permutation P^* is used to group the columns of Q̃ according to their sparsity structure:

\[
\tilde Q (P^*)^T \cdot P^* U =
\begin{pmatrix} \times & 0 & \bullet & \bullet \\ \times & 0 & \bullet & \bullet \\ 0 & \times & \bullet & \bullet \\ 0 & \times & \bullet & \bullet \end{pmatrix}
\begin{pmatrix} \times & \times & \times & 0 \\ \times & \times & \times & 0 \\ \times & \times & \times & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}
= Q^* U^*.
\]

Then three matrix multiplications are performed, with 48 flops in total, involving the matrices Q^*(1:2, 1), Q^*(3:4, 2), Q^*(1:4, 3) and U^*(1:3, 1:3). This organization allows the BLAS to perform three matrix multiplies of minimal size.
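In serial NumPy terms, the grouped product amounts to multiplying only the nonzero blocks. The sketch below (our own notation) assumes the columns of Q(PG)^T have already been grouped into columns that are zero below row n_1, dense columns, columns that are zero above row n_1, and deflated columns; the three small products correspond to the three BLAS calls mentioned above.

    import numpy as np

    def back_transform(q_up, q_full, q_lo, q_defl, u_tilde, n1):
        """Form W = Q_star @ U_star using three small matrix products.

        q_up:    n x k1 columns, nonzero only in rows :n1
        q_full:  n x k2 dense columns
        q_lo:    n x k3 columns, nonzero only in rows n1:
        q_defl:  n x k4 deflated columns (already eigenvectors of T)
        u_tilde: (k1+k2+k3) x (k1+k2+k3) eigenvector matrix of the deflated update
        """
        k1, k2, k3 = q_up.shape[1], q_full.shape[1], q_lo.shape[1]
        n, m = q_up.shape[0], k1 + k2 + k3
        W = np.zeros((n, m + q_defl.shape[1]))
        rows = np.split(u_tilde, [k1, k1 + k2], axis=0)   # conformal row blocks of U~
        W[:n1, :m] += q_up[:n1] @ rows[0]                 # product 1 (upper block)
        W[:,  :m] += q_full @ rows[1]                     # product 2 (dense block)
        W[n1:, :m] += q_lo[n1:] @ rows[2]                 # product 3 (lower block)
        W[:, m:] = q_defl                                 # deflated eigenvectors copied
        return W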

In parallel, this strategy is hard to implement efficiently. One needs to redefine the permutation P^* in order to avoid communication between process columns. In our parallel implementation, P^* groups the columns of Q according to their local sparsity structure, that is, P^* permutes only columns of Q belonging to the same process column. More precisely, locally, P^* puts together first the columns of Q with zero components in the lower part, then the columns of Q without zero components, then the columns of Q with zero components in the upper part, and finally the columns of Q that are already eigenvectors. The resulting matrix Q^* has the following global structure:

\[
Q^* = \begin{pmatrix} Q_{11}^* & Q_{12}^* & 0 & Q_{13}^* \\ 0 & Q_{21}^* & Q_{22}^* & Q_{23}^* \end{pmatrix}.
\]

Here Q_{11}^* contains the n_1 columns of Q_1 that have not been affected by deflation; Q_{22}^* contains the n_2 columns of Q_2 that have not been affected by deflation; the block column containing Q_{13}^* and Q_{23}^* holds k' columns of Q that correspond to deflated eigenvalues (they are already eigenvectors of T); and the block column containing Q_{12}^* and Q_{21}^* holds the n − (n_1 + n_2 + k') remaining columns of Q. The matrix U^* has the structure

\[
U^* = \begin{pmatrix} U_1^* & 0 \\ 0 & I \end{pmatrix},
\]

where U_1^* is of order n − k' and the identity block is of order k'.

Then, for the computation of the product Q^*U^*, we use two calls to the parallel BLAS routine PxGEMM involving parts of U_1^* and the matrices Q_{11}^*, Q_{12}^*, Q_{21}^*, Q_{22}^*. Unlike in the serial implementation, we cannot assume that k = k', that is, that the block column containing Q_{13}^* and Q_{23}^* contains all the columns corresponding to deflated eigenvalues. This is due to the fact that P^* acts only on columns of Q that belong to the same process column. Let k(q) be the number of deflated eigenvalues held by the process column q, 0 ≤ q ≤ Q − 1. Then k' = min_{0 ≤ q ≤ Q−1} k(q). So, even if we cannot perform matrix multiplies of minimal size as in the serial case, we still get good speed-up on many matrices.

5 The Divide and Conquer Code

The code is composed of 7 parallel routines (PxSTEDC, PxLAED0, PxLAED1, PxLAEDZ, PxLAED2, PxLAED3, PxLASRT) and it uses LAPACK's serial routines whenever possible.

PxSTEDC scales the tridiagonal matrix, calls PxLAED0 to solve the tridiagonal eigenvalue problem, scales back when finished, and sorts the eigenvalues and corresponding eigenvectors in ascending order by calling PxLASRT.

PxLAED0 is the driver of the divide and conquer algorithm. It splits the tridiagonal matrix T into TSUBPBS = (N-1)/NB + 1 submatrices using rank-one modifications, where NB is the size of the block used for the two-dimensional block cyclic distribution. It calls the serial divide and conquer code xSTEDC to solve each eigenvalue problem at the leaves of the tree. Then, each rank-one modification is merged by a call to PxLAED1:

    TSUBPBS = (N-1)/NB + 1
    while ( TSUBPBS > 1 )
        for i = 1:TSUBPBS/2
            call PxLAED1( i, TSUBPBS, D, Q, ... )
        end
        TSUBPBS = TSUBPBS/2
    end

PxLAED1 is the routine that combines the eigensystems of adjacent submatrices into an eigensystem for the corresponding larger matrix. It calls PxLAEDZ to form z as in (2.2), then calls PxLAED2 to deflate eigenvalues and to group the columns of Q according to their sparsity structure as described in Section 4.3. Then it calls PxLAED3, which distributes the work among the processes in order to compute the roots of the secular equation, solve the Löwner inverse eigenvalue problem, and compute the eigenvectors of the rank-one update D̃ + ρz̃z̃^T (see (4.3)). Each root of the secular equation is computed by the serial LAPACK routine xLAED4. Finally, the eigenvector matrix of the rank-one update is multiplied into the larger matrix that holds the collective results of all the previous eigenvector calculations, via two calls to the parallel matrix multiplication routine PxGEMM.

6 Numerical Experiments

This section concerns accuracy tests, execution times and performance results. We compare our parallel implementation of the divide and conquer algorithm with the two parallel algorithms for solving the symmetric tridiagonal eigenvalue problem available in ScaLAPACK [4]:

• B/II: bisection followed by inverse iteration (subroutines PxSTEBZ and PxSTEIN). The inverse iteration algorithm can be used with two options:
  II-1: inverse iteration without a reorthogonalization process;
  II-2: inverse iteration with a reorthogonalization process when the eigenvalues are separated by less than 10^{-3} in absolute value.

• QR: the QR algorithm (subroutine PxSTEQR2).

PxSYEVX is the name of the expert driver associated with B/II and PxSYEV is the simple driver associated with QR.² We have written a driver called PxSYEVD that computes all the eigenvalues and eigenvectors of a symmetric matrix using our parallel divide and conquer routine PxSTEDC.

² Here "driver" refers to a routine that solves the eigenproblem for a full symmetric matrix by reducing the matrix to tridiagonal form, solving the tridiagonal eigenvalue problem, and transforming the eigenvectors back to those of the original matrix.

We use two types of test matrices. The first are symmetric matrices with random entries from a uniform distribution on [−1, 1]. The second type are generated by the ScaLAPACK subroutine PxLATMS as A = U^T D U, where U is orthogonal and D = diag(s_i t_i) with t_i ≥ 0 and s_i = ±1 chosen randomly with equal probability. Matrix 1 has equally spaced entries from ε to 1, matrix 2 has geometrically spaced entries from ε to 1,

\[
d_i = \varepsilon^{(i-1)/(n-1)}, \qquad i = 1{:}n,
\]

and matrix 3 has clustered entries

\[
d_i = \varepsilon, \quad i = 1{:}n-1, \qquad d_n = 1.
\]

The type 3 matrices are designed to illustrate how B/II can fail to compute orthogonal eigenvectors.
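The three eigenvalue distributions can be generated directly from their definitions; the helper below (ours, used only for illustration) builds the entries d_i passed to PxLATMS for each matrix type.

    import numpy as np

    def test_eigenvalues(n, kind, eps=np.finfo(float).eps):
        """Eigenvalue distributions for the type 1-3 test matrices (illustrative)."""
        if kind == 1:                           # equally spaced entries from eps to 1
            return np.linspace(eps, 1.0, n)
        if kind == 2:                           # geometrically spaced: eps**((i-1)/(n-1))
            return eps ** (np.arange(n) / (n - 1))
        if kind == 3:                           # clustered at eps, with one eigenvalue at 1
            d = np.full(n, eps)
            d[-1] = 1.0
            return d
        raise ValueError("kind must be 1, 2 or 3")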

Let Q̂Λ̂Q̂^T be the computed spectral decomposition of A. To determine the accuracy of our results, we measure the scaled residual error and the scaled departure from orthogonality, defined by

\[
R = \frac{\| A\hat Q - \hat Q \hat\Lambda \|_1}{n \varepsilon \|A\|_1}
\qquad\text{and}\qquad
O = \frac{\| I - \hat Q^T \hat Q \|_1}{n \varepsilon}.
\]

When both quantities are small, the computed spectral decomposition is the exact spectral decomposition of a slight perturbation of the original problem.
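Both accuracy measures are straightforward to evaluate; the helper below (a plain NumPy check, independent of the parallel code) computes R and O as defined above from a computed decomposition.

    import numpy as np

    def accuracy(A, Q, lam):
        """Scaled residual R and scaled departure from orthogonality O for a
        computed spectral decomposition A ~ Q @ diag(lam) @ Q.T."""
        n = A.shape[0]
        eps = np.finfo(A.dtype).eps
        R = np.linalg.norm(A @ Q - Q @ np.diag(lam), 1) / (n * eps * np.linalg.norm(A, 1))
        O = np.linalg.norm(np.eye(n) - Q.T @ Q, 1) / (n * eps)
        return R, O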

The tests were run on an IBM SP2 in double precision arithmetic. On this machine, ε = 2^{-53} ≈ 1.1 × 10^{-16}.

Table 6.1 shows the greatest residual and departure from orthogonality measured for matrices of type 1, 2 and 3 solved by B/II-1, B/II-2, QR and divide and conquer. The matrices are of order n = 1500 with block size nb = 60 on a 2 × 4 processor grid.

Table 6.1: Normalized residual R, normalized eigenvector orthogonality O, and timing (Time) for matrices of size n = 1500 on an IBM SP2 (2 × 4 processor grid), for bisection/inverse iteration without (B/II-1) and with (B/II-2) reorthogonalization, the QR algorithm, and the divide and conquer algorithm (D&C).

    Matrix type                        B/II-1       B/II-2       QR           D&C
    Uniform distribution      R        3 × 10^-4    3 × 10^-4    3 × 10^-4    2 × 10^-4
    [-1, 1]                   O        0.20         0.17         0.55         0.27
                              Time     52           52           120          58
    Geometrical distribution  R        3 × 10^-4    3 × 10^-4    5 × 10^-4    4 × 10^-4
    [ε, 1]                    O        ≥ 1,000      88.03        0.23         0.20
                              Time     53           137          95           51
    Clustered at ε            R        4 × 10^-4    4 × 10^-4    4 × 10^-4    4 × 10^-4
                              O        ≥ 1,000      ≥ 1,000      0.50         0.16
                              Time     52           139          120          47

For eigenvalues with equally spaced modulus, bisection followed by inverse iteration gives good numerical results and is slightly faster than the divide and conquer algorithm. This is due to the absence of communication when computing the eigenvectors, both for B/II-1 and B/II-2. However, as illustrated by the matrices of type 2, if no reorthogonalization is used, numerical orthogonality can be lost with inverse iteration when the eigenvalues are poorly separated. It is clear that the reorthogonalization process greatly increases the execution time of the inverse iteration algorithm. For large clusters, the reorthogonalization process in PxSTEIN is limited by the size of the largest cluster that fits on one processor. Unfortunately, in this case, orthogonality is not guaranteed. This phenomenon is illustrated by the matrices of type 3. In the remaining experiments, we always use B/II with reorthogonalization.

We compared the relative performance of B/II, QR and divide and conquer. In our figures, the horizontal axis is the matrix dimension and the vertical axis is the time divided by the time for divide and conquer, so that the divide and conquer curve is constant at 1. It is clear from Figures 6.1 and 6.2, which correspond to the spectral decomposition of the tridiagonal matrix T and of the symmetric matrix A, respectively, that divide and conquer competes with bisection followed by inverse iteration when the eigenvalues of the matrices in question are well separated. For inverse iteration this situation is good, since no reorthogonalization of eigenvectors is required. For divide and conquer it is bad, since it means there is little deflation within the intermediate problems. Note that the execution times of QR are much larger. This distinction in speed between QR or B/II and divide and conquer is more noticeable in Figure 6.1 (speed-up of up to 6.5) than in Figure 6.2 (speed-up of up to 2), because Figure 6.2 includes the overhead due to the tridiagonalization and back transformation processes.

As illustrated in Figure 6.3, divide and conquer runs faster than B/II as soon as the eigenvalues are poorly separated or clustered.

Figure 6.1: Execution times of PDSTEBZ+PDSTEIN (B/II) and PDSTEQR2 (QR) relative to PDSTEDC (D&C), on an IBM SP2 using 8 nodes. Tridiagonal matrices, eigenvalues of equally spaced modulus.

Figure 6.2: Execution times of PDSYEVX (B/II), PDSYEV (QR) and PDSYTRD (tridiagonalization) relative to PDSYEVD (D&C), on an IBM SP2 using 8 nodes. Full matrices, eigenvalues of equally spaced modulus.

Figure 6.3: Execution times of PDSYEVX (B/II), PDSYEV (QR) and PDSYTRD (tridiagonalization) relative to PDSYEVD (D&C), on an IBM SP2 using 8 nodes. Full matrices, eigenvalues of geometrically spaced modulus.

Figure 6.4: Execution times of PDSTEQR2 (QR), PDSYTRD (tridiagonalization) and PDORMTR (back transformation) relative to PDSTEDC (D&C), measured on an IBM SP2 using 8 nodes. Tridiagonal matrices, eigenvalues of geometrically spaced modulus.

Figure 6.5: Speedups of PDSYEVD (D&C) over PDSYEV (QR) for several types of full matrices (evenly spaced, geometrically spaced, and random entries). Tests done on an IBM SP2 using 12 nodes.

We also compare the execution times of the tridiagonalization, QR, B/II-2 and the back transformation relative to the execution time of divide and conquer. From Figure 6.4, it appears that, when using the QR algorithm

for computing all the eigenvalues and eigenvectors of a symmetric matrix, the bottleneck is the spectral decomposition of the tridiagonal matrix. This is no longer true when using our parallel divide and conquer algorithm: the spectral decomposition of the tridiagonal matrix is now faster than the tridiagonalization and the back transformation of the eigenvectors. Effort, as in [22], should be made to improve the tridiagonalization and back transformation.

Figure 6.6: Performance of PDSTEDC on an IBM SP2, with constant memory per process (8 MBytes and 18 MBytes).

Figure 6.7: Performance of PDSTEDC on a cluster of 300 MHz Intel PII processors using a 100 Mbit switched Ethernet connection, with constant memory per process (8, 18 and 32 MBytes).

We measured the performance of PDSTEDC on an IBM SP2 (Figure 6.6) and on a cluster of 300 MHz Intel PII processors using a 100 Mbit switched Ethernet connection (Figure 6.7). In these figures, the horizontal axis is the number of processors and the vertical axis is the number of flops per second obtained when the size of the problem is kept constant on each process. The performance increases with the number of processors, which illustrates the scalability of our parallel implementation. These measurements were made using the Level 3 BLAS of ATLAS (Automatically Tuned Linear Algebra Software) [32], which runs at a peak of 440 Mflop/s on the SP2 and 190 Mflop/s on the PII. On the SP2 our code runs at 50% of the peak performance of matrix multiplication, and at 40% on the cluster of PII's. Note that these percentages take into account the time spent at the end of the computation to sort the eigenvalues and corresponding eigenvectors into increasing order.

7 Conclusions

For serial and shared memory machines, divide and conquer is one of the fastest available algorithms for finding all the eigenvalues and eigenvectors of a large dense symmetric matrix. By contrast, implementations of this algorithm on distributed memory machines have in the past posed difficulties.

In this paper, we showed that divide and conquer can be efficiently parallelized on distributed memory machines. By using the Löwner theorem approach, good numerical eigendecompositions are obtained in all situations. From the point of view of execution time, our results seem to be better in most cases when compared with the parallel execution times of QR and of bisection followed by inverse iteration available in the ScaLAPACK library.

Performance results on the IBM SP2 and on a cluster of PII PCs demonstrate the scalability and portability of our algorithm. Good efficiency is mainly obtained by exploiting the data parallelism inherent in this algorithm rather than its task parallelism. For this, we concentrated our efforts on a good implementation of the back transformation process in order to reach maximum speed-up for the matrix multiplications. Unlike in previous implementations, the number of processes is not required to be a power of two. This implementation will be incorporated in the ScaLAPACK library.

Recent work [12] has been done on an algorithm based on inverse iteration which may provide a faster and more accurate algorithm and should also yield an embarrassingly parallel algorithm. Unfortunately, no parallel implementation is available at this time, so we could not compare this new method with divide and conquer.

We showed that, in contrast to the ScaLAPACK QR algorithm implementation, the spectral decomposition of the tridiagonal matrix is no longer the bottleneck. Efforts should be made to improve the tridiagonalization and the back transformation of the eigenvector matrix of the tridiagonal form to that of the original matrix.

The main limitation of the proposed parallel algorithm is the amount of storage needed. Compared with the ScaLAPACK QR implementation, 2n^2 extra storage locations are required to perform the back transformation in the last step of the divide and conquer algorithm. This is the price we pay for using Level 3 BLAS operations. It is worth noting that in most cases not all of this storage is used, because of deflation. Unfortunately, ideas such as those developed in [31] for the sequential divide and conquer algorithm seem hard to implement efficiently in parallel, as they require a lot of communication. As in many algorithms, there is a trade-off between good efficiency and workspace [3], [13]. Such a trade-off also appears in parallel implementations of inverse iteration when reorthogonalization of the eigenvectors is performed.

In future work, the authors plan to use the ideas developed in this paper to develop a parallel implementation of the divide and conquer algorithm for the singular value decomposition.

References

[1] E. Anderson, Z. Bai, C. H. Bischof, J. W. Demmel, J. J. Dongarra, J. J. Du Croz, A. Greenbaum, S. J. Hammarling, A. McKenney, S. Ostrouchov, and D. C. Sorensen. LAPACK Users' Guide, Release 2.0. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, second edition, 1995.

[2] Zhaojun Bai, James W. Demmel, Jack J. Dongarra, Antoine Petitet, Howard Robinson, and K. Stanley. The spectral decomposition of nonsymmetric matrices on distributed memory parallel computers. SIAM J. Sci. Comput., 18(5):1446-1461, 1997.

[3] Christian Bischof, Steven Huss-Lederman, Xiaobai Sun, Anna Tsao, and Thomas Turnbull. Parallel performance of a symmetric eigensolver based on the invariant subspace decomposition approach. In Proceedings of the Scalable High Performance Computing Conference '94, Knoxville, Tennessee, pages 32-39, May 1994. Also PRISM Working Note 15.

[4] L. S. Blackford, J. Choi, A. Cleary, E. D'Azevedo, J. Demmel, I. Dhillon, J. Dongarra, S. Hammarling, G. Henry, A. Petitet, K. Stanley, D. Walker, and R. C. Whaley. ScaLAPACK Users' Guide. SIAM, Philadelphia, PA, 1997.

[5] James R. Bunch, Christopher P. Nielsen, and Danny C. Sorensen. Rank-one modification of the symmetric eigenproblem. Numer. Math., 31:31-48, 1978.

[6] Soumen Chakrabarti, James Demmel, and Katherine Yelick. Modeling the benefits of mixed data and task parallelism. Technical Report CS-95-289, Department of Computer Science, University of Tennessee, Knoxville, TN, USA, May 1995. LAPACK Working Note 97.

[7] Jaeyoung Choi, Jack J. Dongarra, Roldan Pozo, and David W. Walker. ScaLAPACK: A scalable linear algebra library for distributed memory concurrent computers. Technical Report CS-92-181, Department of Computer Science, University of Tennessee, Knoxville, TN, USA, November 1992. LAPACK Working Note 55.

[8] J. J. M. Cuppen. A divide and conquer method for the symmetric tridiagonal eigenproblem. Numer. Math., 36:177-195, 1981.

[9] Kirsty F. F. Darbyshire. A parallel algorithm for the symmetric tridiagonal eigenproblem. Master's thesis, University of Manchester, 1994.

[10] James W. Demmel. Applied Numerical Linear Algebra. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 1997.

[11] James W. Demmel and K. Stanley. The performance of finding eigenvalues and eigenvectors of dense symmetric matrices on distributed memory computers. Technical Report CS-94-254, Department of Computer Science, University of Tennessee, Knoxville, TN, USA, September 1994. LAPACK Working Note 86.

[12] Inderjit Dhillon. A New O(n^2) Algorithm for the Symmetric Tridiagonal Eigenvalue/Eigenvector Problem. PhD thesis, University of California, Berkeley, USA, 1997.

[13] Stéphane Domas and Françoise Tisseur. Parallel implementation of a symmetric eigensolver based on the Yau and Lu method. In Vector and Parallel Processing - VECPAR'96, Lecture Notes in Computer Science 1215, pages 140-153. Springer-Verlag, Berlin, 1997.

[14] J. Dongarra and D. Sorensen. A fully parallel algorithm for the symmetric eigenvalue problem. SIAM J. Sci. Stat. Comput., 8:139-154, 1987.

[15] Maria Paula Goncalves Fachin. The Divide-and-Conquer Method for the Solution of the Symmetric Tridiagonal Eigenproblem and Transputer Implementations. PhD thesis, University of Kent at Canterbury, 1994.

[16] K. Gates and P. Arbenz. Parallel divide and conquer algorithms for the symmetric tridiagonal eigenproblem. Technical report, Institute for Scientific Computing, ETH Zurich, 1994.

[17] Gene H. Golub. Some modified matrix eigenvalue problems. SIAM Review, 15(2):318-334, 1973.

[18] Ming Gu. Studies in Numerical Linear Algebra. PhD thesis, Yale University, 1993.

[19] Ming Gu and Stanley C. Eisenstat. A stable and efficient algorithm for the rank-one modification of the symmetric eigenproblem. SIAM J. Matrix Anal. Appl., 15(4):1266-1276, 1994.

[20] Ming Gu and Stanley C. Eisenstat. A divide-and-conquer algorithm for the symmetric tridiagonal eigenproblem. SIAM J. Matrix Anal. Appl., 16(1):172-191, 1995.

[21] B. Hendrickson and D. Womble. The torus-wrap mapping for dense matrix calculations on massively parallel computers. SIAM J. Sci. Comput., 15:1201-1226, 1994.

[22] Bruce Hendrickson, Elizabeth Jessup, and Christopher Smith. A parallel eigensolver for dense symmetric matrices. Submitted to SIAM J. Sci. Comput., 1996.

[23] I. C. F. Ipsen and E. R. Jessup. Solving the symmetric tridiagonal eigenvalue problem on the hypercube. SIAM J. Sci. Stat. Comput., 11(2):203-229, 1990.

[24] E. R. Jessup and D. C. Sorensen. A parallel algorithm for computing the singular value decomposition of a matrix. SIAM J. Matrix Anal. Appl., 15(2):530-548, 1994.

[25] Ren-Cang Li. Solving the secular equation stably and efficiently. Technical report, Department of Mathematics, University of California, Berkeley, CA, USA, April 1993. LAPACK Working Note 89.

[26] K. Löwner. Über monotone Matrixfunktionen. Math. Z., 38:177-216, 1934.

[27] A. Melman. A numerical comparison of methods for solving secular equations. J. Comp. Appl. Math., 86(1):237-249, 1997.

[28] A. Petitet. Algorithmic Redistribution Methods for Block Cyclic Decompositions. PhD thesis, University of Tennessee, Knoxville, TN, 1996.

[29] J. Rutter. A serial implementation of Cuppen's divide and conquer algorithm for the symmetric eigenvalue problem. Technical Report CS-94-225, Department of Computer Science, University of Tennessee, Knoxville, TN, USA, March 1994. LAPACK Working Note 69.

[30] D. C. Sorensen and P. T. P. Tang. On the orthogonality of eigenvectors computed by divide-and-conquer techniques. SIAM J. Numer. Anal., 28:1752-1775, 1991.

[31] Françoise Tisseur. Workspace reduction for the divide and conquer algorithm. Manuscript, November 1997.

[32] R. Clint Whaley and Jack Dongarra. Automatically tuned linear algebra software. Technical Report CS-97-366, Department of Computer Science, University of Tennessee, Knoxville, TN, USA, 1997. LAPACK Working Note 131.