Solving Symmetric Semi-Definite (Ill-Conditioned)

Solving Symmetric Semi-definite (ill-conditioned) Generalized Eigenvalue Problems Zhaojun Bai University of California, Davis Berkeley, August 19, 2016 Symmetric definite generalized eigenvalue problem I Symmetric definite generalized eigenvalue problem Ax = λBx where AT = A and BT = B > 0 I Eigen-decomposition AX = BXΛ where Λ = diag(λ1; λ2; : : : ; λn) X = (x1; x2; : : : ; xn) XT BX = I: I Assume λ1 ≤ λ2 ≤ · · · ≤ λn LAPACK solvers I LAPACK routines xSYGV, xSYGVD, xSYGVX are based on the following algorithm (Wilkinson'65): 1. compute the Cholesky factorization B = GGT 2. compute C = G−1AG−T 3. compute symmetric eigen-decomposition QT CQ = Λ 4. set X = G−T Q I xSYGV[D,X] could be numerically unstable if B is ill-conditioned: −1 jλbi − λij . p(n)(kB k2kAk2 + cond(B)jλbij) · and −1 1=2 kB k2kAk2(cond(B)) + cond(B)jλbij θ(xbi; xi) . p(n) · specgapi I User's choice between the inversion of ill-conditioned Cholesky decomposition and the QZ algorithm that destroys symmetry Algorithms to address the ill-conditioning 1. Fix-Heiberger'72 (Parlett'80): explicit reduction 2. Chang-Chung Chang'74: SQZ method (QZ by Moler and Stewart'73) 3. Bunse-Gerstner'84: MDR method 4. Chandrasekaran'00: \proper pivoting scheme" 5. Davies-Higham-Tisseur'01: Cholesky+Jacobi 6. Working notes by Kahan'11 and Moler'14 This talk Three approaches: 1. A LAPACK-style implementation of Fix-Heiberger algorithm 2. An algebraic reformulation 3. Locally accelerated block preconditioned steepest descent (LABPSD) This talk Three approaches: 1. A LAPACK-style implementation of Fix-Heiberger algorithm Status: beta-release 2. An algebraic reformulation Status: completed basic theory and proof-of-concept 3. Locally accelerated block preconditioned steepest descent (LABPSD) Status: published two manuscripts, software to be released A LAPACK-style implementation of Fix-Heiberger algorithm (with C. Jiang) A LAPACK-style solver T I xSYGVIC: computes "-stable eigenpairs when B = B ≥ 0 wrt a prescribed threshold ". I Implementation is based on Fix-Heiberger's algorithm, and organized in three phases. I Given the threshold ", xSYGVIC determines: 1. A − λB is regular and has k (0 ≤ k ≤ n) "-stable eigenvalues or 2. A − λB is singular. I The new routine xSYGVIC has the following calling sequence: xSYGVIC( ITYPE, JOBZ, UPLO, N, A, LDA, B, LDB, ETOL,& K, W, WORK, LDWORK, WORK2, LWORK2, IWORK, INFO ) xSYGVIC { Phase I 1. Compute the eigenvalue decomposition of B (xSYEV): n1 n2 (0) (0) T n1 D B = Q1 BQ1 = D = (0) ; n2 E where diagonal entries of D: d11 ≥ d22 ≥ ::: ≥ dnn, and elements of (0) (0) E are smaller than "d11 . 2. Set E(0) = 0, and update A and B(0): n1 n2 " (1) (1) # (1) T T n1 A11 A12 A = R1 Q1 AQ1R1 = (1)T (1) n2 A12 A22 and n1 n2 (1) T (0) n1 I B = R1 B R1 = ; n2 0 (0) −1=2 where R1 = diag((D ) ;I) xSYGVIC { Phase I 3. Early exit B is "-well-conditioned. A − λB is regular and has n "-stable eigenpairs (Λ, X): (1) I A U = UΛ (xSYEV). I X = Q1R1U xSYGVIC { Phase I: performance profile T T I Test matrices A = QADAQA and B = QBDBQB where I QA;QB are random orthogonal matrices; I DA is diagonal with −1 < DA(i; i) < 1; i = 1; : : : ; n; I DB is diagonal with 0 < " < DB (i; i) < 1; i = 1; : : : ; n; I 12-core on an Intel "Ivy Bridge" processor (Edison@NERSC) Runtime of xSYGV/xSYGVIC with B>0 Percentage of sections in xSYGVIC 30 1 DSYGV DSYGVIC 25 0.8 20 0.6 15 2.05 0.4 Percentage 10 0.2 Runtime (seconds) 2.03 5 2.22 0 2.73 2000 3000 4000 5000 Matrix order n 0 2000 3000 4000 5000 xSYEV of B xSYEV of A Others Matrix order n xSYGVIC { Phase II (1) (1) 1. Compute the eigendecomposition of (2,2)-block A22 of A (xSYEV): n3 n4 (2) (2) (2)T (1) (2) n3 D A22 = Q22 A22 Q22 = (2) n4 E (2) (2) (2) where eigenvalues are ordered such that jd11 j ≥ jd22 j ≥ · · · ≥ jdn2n2 j, (2) (2) and elements of E are smaller than "jd11 j. 2. Set E(2) = 0, and update A(1) and B(1): (2) T (1) (2) T (1) A = Q2 A Q2;B = Q2 B Q2 (2) where Q2 = diag(I;Q22 ). (1) 3. Early exit When A22 is a "-well-conditioned matrix. A − λB is regular and has n1 "-stable eigenpairs (Λ, X): (2) (2) I A U = B UΛ (Schur complement and xSYEV) I X = Q1R1Q2U. xSYGVIC { Phase II (backup) A(2)U = B(2)UΛ (1) where n n 1 2 n1 n2 " (2) (2) # (2) n1 A11 A12 (2) n1 I A = (2)T and B = (2) n2 0 n2 A12 D . Let n1 n U U = 1 1 n2 U2 The eigenvalue problem (1) becomes (2) (2) (2) (2) −1 (2)T F U1 = A11 − A12 (D ) A12 U1 = U1Λ (xSYEV) (2) −1 (2) T U2 = −(D ) (A12 ) U1 xSYGVIC { Phase II: performance profile Accuracy: 1. If B ≥ 0 has n2 zero eigenvalues: I xSYGV stops, the Cholesky factorization of B could not be completed. I xSYGVIC successfully computes n − n2 "-stable eigenpairs. 2. If B has n2 small eigenvalues about δ, both xSYGV and xSYGVIC \work", but produce different quality numerically.1 −13 −12 I n = 1000; n2 = 100;δ = 10 and " = 10 . Res1 Res2 DSYGV 3.5e-8 1.7e-11 DSYGVIC 9.5e-15 7.1e-12 −15 −12 I n = 1000; n2 = 100;δ = 10 and " = 10 . Res1 Res2 DSYGV 3.6e-6 1.8e-10 DSYGVIC 1.3e-16 6.8e-14 1 Res1 = kAXb − BXbΛbkF =(nkAkF kXbkF ) and T 2 Res2 = kXb BXb − IkF =(kBkF kXbkF ). xSYGVIC { Phase II: performance profile Timing: T T I Test matrices A = QADAQA and B = QBDBQB where I QA;QB are random orthogonal matrices; I DA is diagonal with −1 < DA(i; i) < 1; i = 1; : : : ; n; I DB is diagonal with 0 < DB (i; i) < 1; i = 1; : : : ; n and n2=n DB (i; i) < ". I 12-core on an Intel "Ivy Bridge" processor (Edison@NERSC) Runtime of xSYGV/xSYGVIC Percentage of sections in xSYGVIC 25 1 xSYGV xSYGVIC 0.8 20 1.25 0.6 15 1.25 0.4 Percentage 10 0.2 Runtime (seconds) 5 2.47 0 2000 3000 4000 5000 3.01 Matrix order n 0 (2) 2000 3000 4000 5000 xSYEV(B) xSYEV(F ) Others Matrix order n xSYGVIC { Phase II: performance profile Why is the overhead ratio of xSYGVIC lower? I Performance of xSYGV varies depending on the percentage of \zero" eigenvalues of B. I For example, for n = 4000 on a 12-core processor execution: Runtime of DSYGV for n=4000 12 DSYEV Others 10 8 6 4 Runtime (seconds) 2 0 0 10% 20% 33% 50% Percentage of "zero" eigenvalues of B xSYGVIC { Phase III 1. A(2) and B(2) can be written as 3 by 3 blocks: n1 n3 n4 n1 n3 n4 2 (2) (2) (2) 3 2 3 n1 A11 A12 A13 n1 I (2) (2)T (2) (2) A n 6 7 and B n3 0 = 3 4 A12 D 5 = 4 5 (2)T n4 0 n4 A13 0 where n3 + n4 = n2. (2) 2. Reveal the rank of A13 by QR decomposition with pivoting: (2) (3) (3) (3) A13 P13 = Q13 R13 where n4 (3) (3) n4 A14 R13 = n5 0 xSYGVIC { Phase III (2) 2 3. Final exit When n1 > n4 and A13 is full rank, then A − λB is regular and has n1 − n4 "-stable eigenpairs (Λ, X): (3) (3) I A U = B UΛ I X = Q1R1Q2Q3U. 2All the other cases either lead A − λB to be \singular" or \regular but no finite eigenvalues". xSYGVIC { Phase III (backup) A(3)U = B(3)UΛ (2) I Update (3) T (2) (3) T (2) A = Q3 A Q3 and B = Q3 B Q3 where n1 n3 n4 2 (3) 3 n1 Q13 Q3 = n3 6 I 7 4 (3) 5 n4 P13 (3) (3) I Write A and B as 4 × 4 blocks: n4 n5 n3 n4 2 3 (3) (3) (3) (3) n4 n5 n3 n4 n4 A A A A 2 3 6 11 12 13 14 7 n I 6 (3) T (3) (3) 7 4 (3) n5 6 (A ) A A 0 7 (3) n 6 I 7 A = 6 12 22 23 7;B = 5 6 7; 6 (3) T (3) T (2) 7 n3 4 0 5 n3 6 (A ) (A ) D 0 7 4 13 23 5 n 0 (3) T 4 n4 (A14 ) 0 0 0 where n1 = n4 + n5 and n2 = n3 + n4. xSYGVIC { Phase III (backup) I Let n5 2 3 n4 U1 n5 6 U2 7 U = 6 7 n3 4 U3 5 n4 U4 then the eigenvalue problem (2) becomes: U1 = 0 (3) (3) (3) −1 (3)T A22 − A23 (D ) A23 U2 = U2Λ (xSYEV) (2) −1 (3)T U3 = −(D ) A23 U2 (3) −1 (3) (3) U4 = −(A14 ) A12 U2 + A13 U3 xSYGVIC { Phase III: performance profile Test case (Fix-Heiberger'72) 1. Consider 8 × 8 matrices: A = QT HQ and B = QT SQ; where 2 6 0 0 0 0 0 1 0 3 6 0 5 0 0 0 0 0 1 7 6 7 6 0 0 4 0 0 0 0 0 7 6 7 6 0 0 0 3 0 0 0 0 7 H = 6 7 6 0 0 0 0 2 0 0 0 7 6 7 6 0 0 0 0 0 1 0 0 7 6 7 4 1 0 0 0 0 0 0 0 5 0 1 0 0 0 0 0 0 and S = diag[1; 1; 1; 1; δ; δ; δ; δ] As δ ! 0, λ = 3; 4 are the only stable eigenvalues of A − λB.

Solving Symmetric Semi-Definite (Ill-Conditioned)

Accelerating the LOBPCG Method on Gpus Using a Blocked Sparse Matrix Vector Product

On the Performance and Energy Efficiency of Sparse Linear Algebra on Gpus

High Efficiency Spectral Analysis and BLAS-3

A High Performance Block Eigensolver for Nuclear Conﬁguration Interaction Calculations Hasan Metin Aktulga, Md

Computing Singular Values of Large Matrices with an Inverse-Free Preconditioned Krylov Subspace Method∗

Recent Implementations, Applications, and Extensions of The

Block Locally Optimal Preconditioned Eigenvalue Xolvers BLOPEX

Robust and Efficient Computation of Eigenvectors in a Generalized

Limited Memory Block Krylov Subspace Optimization for Computing Dominant Singular Value Decompositions

High-Performance SVD for Big Data

Locally Optimal Block Preconditioned Conjugate Gradient Method for Hierarchical Matrices

Lawrence Berkeley National Laboratory Recent Work