A New O(N2) Algorithm for the Symmetric Tridiagonal Eigenvalue/Eigenvector Problem

A New O(n2) Algorithm for the Symmetric Tridiagonal Eigenvalue/Eigenvector Problem by Inderjit Singh Dhillon B.Tech. (Indian Institute of Technology, Bombay) 1989 A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the GRADUATE DIVISION of the UNIVERSITY of CALIFORNIA, BERKELEY Committee in charge: Professor James W. Demmel, Chair Professor Beresford N. Parlett Professor Phil Colella 1997 The dissertation of Inderjit Singh Dhillon is approved: Chair Date Date Date University of California, Berkeley 1997 A New O(n2) Algorithm for the Symmetric Tridiagonal Eigenvalue/Eigenvector Problem Copyright 1997 by Inderjit Singh Dhillon 1 Abstract A New O(n2) Algorithm for the Symmetric Tridiagonal Eigenvalue/Eigenvector Problem by Inderjit Singh Dhillon Doctor of Philosophy in Computer Science University of California, Berkeley Professor James W. Demmel, Chair Computing the eigenvalues and orthogonal eigenvectors of an n × n symmetric tridiagonal matrix is an important task that arises while solving any symmetric eigenproblem. All practical software requires O(n3) time to compute all the eigenvectors and ensure their orthogonality when eigenvalues are close. In the first part of this thesis we review earlier work and show how some existing implementations of inverse iteration can fail in surprising ways. The main contribution of this thesis is a new O(n2), easily parallelizable algorithm for solving the tridiagonal eigenproblem. Three main advances lead to our new algorithm. A tridiagonal matrix is traditionally represented by its diagonal and off-diagonal elements. Our most important advance is in recognizing that its bidiagonal factors are “better” for computational purposes. The use of bidiagonals enables us to invoke a relative criterion to judge when eigenvalues are “close”. The second advance comes by using multiple bidiagonal factorizations in order to compute different eigenvectors independently of each other. Thirdly, we use carefully chosen dqds-like transformations as inner loops to compute eigen- pairs that are highly accurate and “faithful” to the various bidiagonal representations. Orthogonality of the eigenvectors is a consequence of this accuracy. Only O(n) work per eigenpair is needed by our new algorithm. Conventional wisdom is that there is usually a trade-off between speed and accuracy in numerical procedures, i.e., higher accuracy can be achieved only at the expense of greater computing time. An interesting aspect of our work is that increased accuracy in 2 the eigenvalues and eigenvectors obviates the need for explicit orthogonalization and leads to greater speed. We present timing and accuracy results comparing a computer implementation of our new algorithm with four existing EISPACK and LAPACK software routines. Our test-bed contains a variety of tridiagonal matrices, some coming from quantum chemistry applications. The numerical results demonstrate the superiority of our new algorithm. For example, on a matrix of order 966 that occurs in the modeling of a biphenyl molecule our method is about 10 times faster than LAPACK’s inverse iteration on a serial IBM RS/6000 processor and nearly 100 times faster on a 128 processor IBM SP2 parallel machine. Professor James W. Demmel Dissertation Committee Chair iii To my parents, for their constant love and support. iv Contents List of Figures vi List of Tables viii 1 Setting the Scene 1 1.1 Our Goals . 2 1.2 Outline of Thesis . 3 1.3 Notation . 5 2 Existing Algorithms & their Drawbacks 7 2.1 Background . 7 2.2 The QR Algorithm . 9 2.3 Bisection and Inverse Iteration . 11 2.4 Divide and Conquer Methods . 12 2.5 Other Methods . 13 2.6 Comparison of Existing Methods . 14 2.7 Issues in Inverse Iteration . 16 2.8 Existing Implementations . 20 2.8.1 EISPACK and LAPACK Inverse Iteration . 22 2.9 Our Approach . 32 3 Computing the eigenvector of an isolated eigenvalue 34 3.1 Twisted Factorizations . 38 3.2 The Eigenvector Connection . 44 3.3 Zero Pivots . 49 3.4 Avoiding Divisions . 55 3.4.1 Heuristics for choosing r ......................... 56 3.5 Twisted Q Factorizations — A Digression∗ . 57 3.5.1 “Perfect” Shifts are perfect . 61 3.6 Rank Revealing Factorizations∗ ......................... 64 4 Computing orthogonal eigenvectors when relative gaps are large 68 4.1 Benefits of High Accuracy . 69 4.2 Tridiagonals Are Inadequate . 70 v 4.3 Relative Perturbation Theory for Bidiagonals . 72 4.4 Using Products of Bidiagonals . 76 4.4.1 qd-like Recurrences . 78 4.4.2 Roundoff Error Analysis . 83 4.4.3 Algorithm X — orthogonality for large relative gaps . 93 4.5 Proof of Orthogonality . 94 4.5.1 A Requirement on r ........................... 95 4.5.2 Outline of Argument . 97 4.5.3 Formal Proof . 99 4.5.4 Discussion of Error Bounds . 105 4.5.5 Orthogonality in Extended Precision Arithmetic . 107 4.6 Numerical Results . 108 5 Multiple Representations 111 5.1 Multiple Representations . 112 5.2 Relatively Robust Representations (RRRs) . 115 5.2.1 Relative Condition Numbers . 116 5.2.2 Examples . 119 5.2.3 Factorizations of Nearly Singular Tridiagonals . 123 5.2.4 Other RRRs . 126 5.3 Orthogonality using Multiple Representations . 126 5.3.1 Representation Trees . 127 5.4 Algorithm Y — orthogonality even when relative gaps are small . 132 6 A Computer Implementation 134 6.1 Forming an RRR . 135 6.2 Computing the Locally Small Eigenvalues . 136 6.3 An Enhancement using Submatrices . 138 6.4 Numerical Results . 139 6.4.1 Test Matrices . 140 6.4.2 Timing and Accuracy Results . 143 6.5 Future Enhancements to Algorithm Y . 152 7 Conclusions 155 7.1 Future Work . 156 Bibliography 158 A The need for accurate eigenvalues 169 B Bidiagonals are Better 173 C Multiple representations lead to orthogonality 177 vi List of Figures 2.1 A typical implementation of Inverse Iteration to compute the jth eigenvector 20 2.2 Eigenvalue distribution in Example . 21 2.3 Tinvit — EISPACK’s implementation of Inverse Iteration . 23 2.4 STEIN — LAPACK’s implementation of Inverse Iteration . 24 3.1 Twisted Triangular Factorization at k = 3 (the next elements to be annihilated are circled) . 40 3.2 Twisted Q Factorization at k = 3 (the next elements to be annihilated are circled): forming Nk................................ 58 3.3 Twisted Triangular Factorization of a Hessenberg matrix at k = 3 (the next elements to be annihilated are circled) . 65 3.4 Twisted Orthogonal Factorization of a Hessenberg matrix at k = 3 (the next elements to be annihilated are circled) . 65 3.5 The above may be thought of as a twisted triangular or orthogonal factorization of a dense matrix at k = 3 (the next elements to be annihilated are circled) . 66 4.1 Effects of roundoff — dstqds transformation . 85 4.2 Effects of roundoff — dqds transformation . 87 4.3 Effects of roundoff — dtwqds transformation . 88 4.4 Effects of roundoff — LDLT decomposition . 90 4.5 dtwqds transformation applied to compute an eigenvector . 97 4.6 An eigenvector of a tridiagonal : most of its entries are negligible . 106 5.1 Representation Tree — Forming an extra RRR based at 1 . 129 5.2 Representation Tree — Only using the RRR based at 1 . 130 5.3 Representation Tree — An extra RRR based at 1 is essential . 131 6.1 Eigenvalue distribution for Biphenyl . 142 6.2 Eigenvalue distribution for SiOSi6 . 142 6.3 Eigenvalue distribution for Zeolite ZSM-5 . 142 6.4 Absolute and Relative Eigenvalue Gaps for Biphenyl . 152 6.5 Absolute and Relative Eigenvalue Gaps for SiOSi6 . 152 6.6 Absolute and Relative Eigenvalue Gaps for Zeolite ZSM-5 . 153 vii C.1 The strong correlation between element growth and relative robustness . 181 C.2 Blow up of earlier figure : X-axis ranges from 1 + 10−8 to 1 + 3.5 × 10−8 . 181 C.3 Relative condition numbers (in the figure to the right, X-axis ranges from 1 + 10−8 to 1 + 3.5 × 10−8) ........................... 182 viii List of Tables 2.1 Summary of xStein’s iterations to compute the second eigenvector of T . 31 4.1 Timing results on matrices of type 1 . 109 4.2 Accuracy results on matrices of type 1 . 109 4.3 Accuracy results on matrices of type 2 . 109 6.1 Timing Results . 144 6.2 Timing Results . 145 6.3 Maximum Residual Norms ≡ maxi kT vî − λîvîk/nεkT k . 146 6.4 Maximum Residual Norms ≡ maxi kT vî − λîvîk/nεkT k . 147 T 6.5 Maximum Dot Products ≡ maxi6=j |vî vˆj|/nε . 148 T 6.6 Maximum Dot Products ≡ maxi6=j |vî vˆj|/nε . 149 ix Acknowledgements The author would like to express his extreme gratitude to Professor Beresford Parlett for making worthwhile the whole experience of working on this thesis. The author has bene- fited greatly from Professor Parlett’s expertise, vision and clarity of thought in conducting research as well as in its presentation. Moreover, helpful discussions over many a lunch has greatly shaped this work and sparked off future ideas. The author is grateful to Professor Demmel for introducing him to the field of numerical linear algebra. Professor Demmel’s constant support, numerous suggestions and careful reading have played a great role in the development of this thesis. The author also wishes to thank Professor Kahan for much well-received advice, and Dr. George Fann for many helpful discussions. The author is also grateful to many friends who have made life in Berkeley an enriching experience. This research was supported in part by the Defense Advanced Research Projects Agency, Contract No. DAAL03-91-C-0047 through a subcontract with the University of Tennessee, the Department of Energy, Contract No. DOE-W-31-109-Eng-38 through a subcontract with Argonne National Laboratory, and by the Department of Energy, Grant No.

A New O(N2) Algorithm for the Symmetric Tridiagonal Eigenvalue/Eigenvector Problem

High-Performance Parallel Computations Using Python As High-Level Language

Iterative Refinement for Symmetric Eigenvalue Decomposition II

LAPACK WORKING NOTE 195: SCALAPACK's MRRR ALGORITHM 1. Introduction. Since 2005, the National Science Foundation Has Been Fund

Numerical Linear Algebra Texts in Applied Mathematics 55

An Implementation of the MRRR Algorithm on a Data-Parallel Coprocessor

A Survey of Recent Developments in Parallel Implementations of Gaussian Elimination

GLUED MATRICES and the MRRR ALGORITHM 1. Introduction

Fortran Optimizing Compiler 6

A − Λb Be an N-By-N Regular Matrix Pencil with N

Conference Abstracts Index Abderraman´ Marrero, Jesus,´ 72 Beckermann, Bernhard, 18 Absil, Pierre-Antoine, 49, 78 Beitia, M

Matrix Depot Documentation Release 0.1.0

Algorithm 880: a Testing Infrastructure for Symmetric Tridiagonal Eigensolvers