Trace-Penalty Minimization for Large-Scale Eigenspace Computation

Yin Zhang
Department of Computational and Applied Mathematics
Rice University, Houston, Texas, USA
(Co-authors: Zaiwen Wen, Chao Yang and Xin Liu [1])
CAAM VIGRE Seminar, January 31, 2013
[1] SJTU (Shanghai), LBL (Berkeley) and CAS (Beijing)

Yin Zhang (RICE) EIGPEN February, 2013

Outline

1 Introduction
  - Problem Description
  - Existing Methods
2 Motivation
  - Large Scale Redefined
  - Avoid the Bottleneck
3 Trace-Penalty Minimization
  - Basic Idea
  - Model Analysis
  - Algorithm Framework
4 Numerical Results and Conclusion
  - Numerical Experiments and Results
  - Concluding Remarks

Section I. Eigenvalue/vector Computation: Fundamental, yet still Challenging

Problem Description and Applications

Given a symmetric $n \times n$ real matrix $A$.

Eigenvalue Decomposition

    $A Q = Q \Lambda$    (1.1)

  - $Q \in \mathbb{R}^{n \times n}$ is orthogonal.
  - $\Lambda \in \mathbb{R}^{n \times n}$ is diagonal (with $\lambda_1 \le \lambda_2 \le \dots \le \lambda_n$ on the diagonal).

k-truncated decomposition (k largest/smallest eigenvalues)

    $A Q_k = Q_k \Lambda_k$    (1.2)

  - $Q_k \in \mathbb{R}^{n \times k}$ with orthonormal columns; $k \ll n$.
  - $\Lambda_k \in \mathbb{R}^{k \times k}$ is diagonal with the largest/smallest $k$ eigenvalues.
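As a small illustration of the k-truncated decomposition (1.2), the following sketch uses a dense symmetric solver on a tiny random matrix. The helper name truncated_eig and the test matrix are assumptions made for this example only; a dense solver is not how a large sparse problem would be handled.

```python
import numpy as np

def truncated_eig(A, k, which="smallest"):
    """Return (Q_k, Lambda_k) with A @ Q_k = Q_k @ Lambda_k for symmetric A.

    Hypothetical helper for illustration; uses a dense solver, so it is
    only suitable for small matrices.
    """
    lam, Q = np.linalg.eigh(A)  # eigenvalues come back in ascending order
    n = len(lam)
    idx = np.arange(k) if which == "smallest" else np.arange(n - k, n)
    return Q[:, idx], np.diag(lam[idx])

rng = np.random.default_rng(0)
B = rng.standard_normal((8, 8))
A = (B + B.T) / 2               # symmetric test matrix (assumption)
Qk, Lk = truncated_eig(A, k=3)

# Q_k has orthonormal columns and satisfies (1.2) up to rounding error.
residual = np.linalg.norm(A @ Qk - Qk @ Lk)
```

Here residual should be at the level of rounding error, and Qk.T @ Qk should be close to the 3-by-3 identity.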
Applications

  - Basic Problem in Numerical Linear Algebra
  - Various scientific and engineering applications
      - Lowest-energy states (Materials, Physics, Chemistry)
      - Density functional theory for Electron Structures
      - Nonlinear eigenvalue problems
  - Singular Value Decomposition
      - Data analysis, e.g., PCA
      - Ill-posed problems
      - Matrix rank minimization

  * Increasingly large-scale sparse matrices
  * Increasingly large portions of the spectrum

Some Existing Methods

Books and Surveys
  - Saad, 1992, "Numerical Methods for Large Eigenvalue Problems"
  - Sorensen, 2002, "Numerical Methods for Large Eigenvalue Problems"
  - Hernández et al, 2009, "A Survey of Software for Sparse Eigenvalue Problems"

Krylov Subspace Techniques
  - Arnoldi methods, Lanczos methods: ARPACK (eigs in Matlab)
    Sorensen, 1996, "Implicitly Restarted Arnoldi/Lanczos Methods for ......"
  - Krylov-Schur, ......

Optimization based, e.g., LOBPCG

Subspace Iteration, Jacobi-Davidson, polynomial filtering, ...

  - Keep orthogonality: $X^T X = I$ at each iteration
  - Rayleigh-Ritz (RR): [V, D] = eig($X^T A X$); X = X * V;

Section II. Motivation: A Method for Larger Eigenspaces with Richer Parallelism

What is Large Scale?

Ordinarily Large Scale:
  - A large and sparse matrix, say n = 1M
  - A small number of eigen-pairs, say k = 100

Doubly Large Scale:
  - A large and sparse matrix, say n = 1M
  - A large number of eigen-pairs, say k = 1% * n
  - A sequence of doubly large scale problems

Change of characters as k jumps ($X \in \mathbb{R}^{n \times k}$):
  - The cost of RR/orth(X) comes to dominate the cost of AX
  - Parallelism becomes a critical factor
  - Low parallelism in RR/Orth $\Longrightarrow$ Opportunity for new methods?

Example: DFT, Materials Science

Kohn-Sham Total Energy Minimization

    $\min E_{total}(X) \quad \text{s.t.} \quad X^T X = I$,    (2.1)

where, for $\rho(X) := \mathrm{diag}(X X^T)$,

    $E_{total}(X) := \mathrm{tr}\left( X^T \left( \tfrac{1}{2} L + V_{ion} \right) X \right) + \tfrac{1}{2} \rho^T L^{\dagger} \rho + \rho^T \epsilon_{xc}(\rho) + E_{rep}$.

Nonlinear eigenvalue problem: up to 10% smallest eigen-pairs.
A Main Approach: SCF, which solves a sequence of linear eigenvalue problems.

Avoid the Bottleneck

Two Types of Computation: AX and RR/orth
  - As k becomes large, AX is dominated by RR/orth, which becomes the bottleneck

Parallelism
  - $AX \to Ax_1 \cup Ax_2 \cup \dots \cup Ax_k$. Higher.
  - RR/orth contains sequentiality. Lower.

Avoid bottleneck? Do fewer RR/orth
No free lunch? Do more BLAS3 (higher parallelism than AX)

Section III. Trace-Penalty Minimization: Free of Orthogonalization, BLAS3-Dominated Computation

Basic Idea

Trace Minimization

    $\min_{X \in \mathbb{R}^{n \times k}} \{ \mathrm{tr}(X^T A X) : X^T X = I \}$    (3.1)

Trace-penalty Minimization

    $\min_{X \in \mathbb{R}^{n \times k}} f(X) := \tfrac{1}{2} \mathrm{tr}(X^T A X) + \tfrac{\mu}{4} \| X^T X - I \|_F^2$.    (3.2)

It is well known that as $\mu \to \infty$, (3.2) $\Longrightarrow$ (3.1).
Quadratic Penalty Function (Courant 1940's)

This idea appears old and unsophisticated. However, ......

"Exact" Penalty

However, $\mu \to \infty$ is unnecessary.

Theorem (Equivalence in Eigenspace)
Problem (3.2) is equivalent to (3.1) if and only if

    $\mu > \lambda_k$.    (3.3)

Under (3.3), all minimizers of (3.2) have the SVD form:

    $\hat{X} = Q_k (I - \Lambda_k / \mu)^{1/2} V^T$,    (3.4)

where $Q_k$ consists of $k$ eigenvectors associated with a set of $k$ smallest eigenvalues that form the diagonal matrix $\Lambda_k$, and $V \in \mathbb{R}^{k \times k}$ is any orthogonal matrix.

Fewer Saddle Points

Original Model: $\min \{ \mathrm{tr}(X^T A X) : X^T X = I, \ X \in \mathbb{R}^{n \times k} \}$
  - One minimum/maximum subspace (discounting multiplicity).
  - All k-dimensional eigenspaces are saddle points.

However, for the penalty model:

Theorem
Let $f(X)$ be the penalty function associated with parameter $\mu > 0$.
  1. For $\mu \in (\lambda_k, \lambda_n)$, $f(X)$ has a unique minimum, no maximum.
  2. For $\mu \in (\lambda_k, \lambda_{k+p})$, where $\lambda_{k+p}$ is the smallest eigenvalue $> \lambda_k$, a rank-k stationary point must be a minimizer, as defined in (3.4).

In a sense, the penalty model is much stronger.

Error Bounds between Optimality Conditions

First order condition
  - Our penalty model: $0 = \nabla f(X) \triangleq AX + \mu X (X^T X - I)$;
  - Original model: $0 = R(X) \triangleq A Y(X) - Y(X) \left( Y(X)^T A Y(X) \right)$,
    where $Y(X)$ is an orthonormal basis of $\mathrm{span}\{X\}$.

Lemma
Let $\nabla f(X)$ (with $\mu > \lambda_k$) and $R(X)$ be defined as above, then

    $\| R(X) \|_F \le \sigma_{\min}^{-1}(X) \, \| \nabla f(X) \|_F$,    (3.5)

where $\sigma_{\min}(X)$ is the smallest singular value of $X$. Moreover, for any global minimizer $\hat{X}$ and any $\epsilon > 0$, there exists $\delta > 0$ such that whenever $\| X - \hat{X} \|_F \le \delta$,

    $\| R(X) \|_F \le \frac{1 + \epsilon}{\sqrt{1 - \lambda_k / \mu}} \, \| \nabla f(X) \|_F$.    (3.6)

Condition Number

Condition Number of the Hessian at Solution

    $\kappa(\nabla^2 f(\hat{X})) \triangleq \lambda_{\max}(\nabla^2 f(\hat{X})) / \lambda_{\min}(\nabla^2 f(\hat{X}))$

Determining factor for asymptotic convergence rate of gradient methods

Lemma
Let $\hat{X}$ be a global minimizer of (3.2) with $\mu > \lambda_k$. The condition number of the Hessian at $\hat{X}$ satisfies

    $\kappa\left( \nabla^2 f(\hat{X}) \right) \ge \frac{\max\left( 2(\mu - \lambda_1), \ (\lambda_n - \lambda_1) \right)}{\min\left( 2(\mu - \lambda_k), \ (\lambda_{k+1} - \lambda_k) \right)}$.    (3.7)

In particular, the above holds as an equality for k = 1.

Gradient methods may encounter slow convergence at the end.

Generalizations

  - Generalized eigenvalue problems: $X^T X = I \to X^T B X = I$
  - Keep out undesired subspace: $U^T X = 0$ (with $U^T U = I$)

Trace Minimization with Subspace Constraint

    $\min_{X \in \mathbb{R}^{n \times k}} \{ \mathrm{tr}(X^T A X) : X^T B X = I, \ U^T X = 0 \}$

Trace-Penalty Formulation

    $\min_X \ \tfrac{1}{2} \mathrm{tr}(X^T Q^T A Q X) + \tfrac{\mu}{4} \| X^T Q^T B Q X - I \|_F^2$

where $Q = I - U U^T$ (so that $QX = X - U (U^T X)$).

With changes of variables, all results still hold.
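The closed form (3.4) and the error bound (3.5) can be checked numerically on a small dense problem. The sketch below is an illustration under assumptions: the function names grad_f and residual_R, the random positive definite test matrix, and the choice mu = lambda_k + 1 are all made up for this example and are not taken from the slides.

```python
import numpy as np

def grad_f(A, X, mu):
    """Gradient of f(X) = 0.5*tr(X'AX) + (mu/4)*||X'X - I||_F^2."""
    k = X.shape[1]
    return A @ X + mu * X @ (X.T @ X - np.eye(k))

def residual_R(A, X):
    """R(X) = A Y - Y (Y'AY), with Y an orthonormal basis of span{X}."""
    Y, _ = np.linalg.qr(X)
    return A @ Y - Y @ (Y.T @ A @ Y)

rng = np.random.default_rng(1)
n, k = 40, 4
B = rng.standard_normal((n, n))
A = B @ B.T / n + np.eye(n)     # symmetric positive definite (assumption)
lam, Q = np.linalg.eigh(A)      # ascending eigenvalues
mu = lam[k - 1] + 1.0           # any mu > lambda_k satisfies (3.3)

# Closed-form minimizer (3.4), using a random orthogonal V.
V, _ = np.linalg.qr(rng.standard_normal((k, k)))
X_hat = Q[:, :k] @ np.diag(np.sqrt(1.0 - lam[:k] / mu)) @ V.T
grad_at_min = np.linalg.norm(grad_f(A, X_hat, mu))

# Bound (3.5) evaluated at a random full-rank X (Frobenius norms).
X = rng.standard_normal((n, k))
sigma_min = np.linalg.svd(X, compute_uv=False).min()
lhs = np.linalg.norm(residual_R(A, X))
rhs = np.linalg.norm(grad_f(A, X, mu)) / sigma_min
```

In this setup grad_at_min should vanish up to rounding, which is consistent with (3.4) being a stationary point, and lhs should not exceed rhs, as (3.5) states.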
Algorithms for Trace-Penalty Minimization

Gradient Methods:

    $X \leftarrow X - \alpha \nabla f(X)$,    $\nabla f(X) = AX + \mu X (X^T X - I)$

First Order Condition:

    $\nabla f(X) = 0 \iff AX = \mu X (I - X^T X)$

2 Types of Computations for $\nabla f(X)$:
  1. $AX$: $O(k \, \mathrm{nnz}(A))$
  2. $X (X^T X)$: $O(k^2 n)$, BLAS3

(2) dominates (1) whenever $k \gg \mathrm{nnz}(A) / n$.

Gradient methods require NO RR/Orth.

Gradient Method Preserves Full Rank

Lemma
Let $X^{j+1}$ be generated by $X^{j+1} = X^j - \alpha^j \nabla f(X^j)$ from a full rank $X^j$. Then $X^{j+1}$ is rank deficient only if $1/\alpha^j$ is one of the $k$ generalized eigenvalues of the problem:

    $\left[ (X^j)^T \nabla f(X^j) \right] u = \lambda \left[ (X^j)^T (X^j) \right] u$.

On the other hand, if $\alpha^j < \sigma_{\min}(X^j) / \| \nabla f(X^j) \|_2$, then $X^{j+1}$ is full rank.

Combined with previous results, there is a high probability of getting a global minimizer by using gradient type methods.

Gradient Methods (Cont'd)

    $X^{j+1} = X^j - \alpha^j \nabla f(X^j)$

Step Size $\alpha$
  - Non-monotone line search (Grippo 1986, Zhang-Hager 2004)
  - Initial BB step:

        $\alpha^j = \arg\min_{\alpha} \| S^j - \alpha Y^j \|_F^2 = \frac{\mathrm{tr}\left( (S^j)^T Y^j \right)}{\| Y^j \|_F^2}$

    where $S^j = X^j - X^{j-1}$ and $Y^j = \nabla f(X^j) - \nabla f(X^{j-1})$.
  - Many other choices

Current Algorithm

Framework:
  1. Pre-process: scaling, shifting, preconditioning
  2. Penalty parameter $\mu$: dynamically adjusted
  3. Gradient iterations: main operations are $X (X^T X)$ and $AX$
  4. RR Restart: computing Ritz-pairs and restarting

(Further steps possible, but NOT used in comparison)
  5. Deflation: working on desired subspaces only
  6. Chebyshev Filter: improving accuracy

Enhancement: RR Restarting

RR Steps return Ritz-pairs for given subspaces
  1. Orthogonalization: $Q \in \mathrm{orth}(X)$
  2. Eigenvalue decomposition: $Q^T A Q = V \Sigma V^T$
  3. Ritz-pairs: $QV$ and $\mathrm{diag}(\Sigma)$

  - RR Steps ensure accurate terminations
  - RR Steps can accelerate convergence
  - Very few RR Steps are used

Section IV. Numerical Results and Conclusion

Pilot Tests in Matlab

Matrix: delsq(numgrid('S',102)); size: n = 10000; tol = 1e-3

[Figure: CPU time in seconds versus number of eigenvalues (50 to 500) for eigs, lobpcg and eigpen. Panel (a): with "-singleCompThread". Panel (b): without "-singleCompThread".]

Experiment Environment

Running Platform
  - A single node of a Cray XE6 supercomputer (NERSC)
  - Two 12-core AMD 'MagnyCours' 2.1-GHz processors
  - 32 GB shared memory
  - System and language:
      - Cray Linux Environment version 3
      - Fortran + OpenMP
  - All 24 cores are used unless otherwise specified

Solvers Tested
  - ARPACK
  - LOBPCG
  - EIGPEN

Relative Error Measurements

Let $x_1, x_2, \dots, x_k$ be computed Ritz vectors, and $\theta_i$ Ritz values.
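To make concrete how Ritz vectors and Ritz values of this kind can arise from the framework of Section III, here is a minimal dense sketch: gradient iterations on the trace-penalty function with no orthogonalization, followed by a single Rayleigh-Ritz step. It is not the EIGPEN code. As simplifying assumptions it uses a fixed step size instead of the BB step with non-monotone line search, a fixed $\mu$ instead of a dynamically adjusted one, and a made-up test matrix with known spectrum; the names eigpen_sketch and rr_step are hypothetical.

```python
import numpy as np

def grad_f(A, X, mu):
    """Gradient of the trace-penalty function: A X + mu X (X'X - I)."""
    return A @ X + mu * (X @ (X.T @ X - np.eye(X.shape[1])))

def rr_step(A, X):
    """Rayleigh-Ritz step on span{X}."""
    Q, _ = np.linalg.qr(X)                  # 1. Q = orth(X)
    theta, V = np.linalg.eigh(Q.T @ A @ Q)  # 2. Q'AQ = V diag(theta) V'
    return Q @ V, theta                     # 3. Ritz vectors, Ritz values

def eigpen_sketch(A, k, mu, alpha=0.02, iters=3000, seed=0):
    """Fixed-step gradient descent on f, then one RR step (illustrative)."""
    rng = np.random.default_rng(seed)
    X, _ = np.linalg.qr(rng.standard_normal((A.shape[0], k)))
    for _ in range(iters):
        X = X - alpha * grad_f(A, X, mu)    # no orthogonalization in the loop
    return rr_step(A, X)

# Test matrix with prescribed spectrum 1, 2, 3, then 5..10 (assumption).
rng = np.random.default_rng(42)
n, k = 30, 3
spectrum = np.concatenate(([1.0, 2.0, 3.0], np.linspace(5.0, 10.0, n - 3)))
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
A = U @ np.diag(spectrum) @ U.T
A = (A + A.T) / 2                           # remove rounding asymmetry

# mu = 4 lies between lambda_k = 3 and lambda_{k+1} = 5, so (3.3) holds.
X_ritz, theta = eigpen_sketch(A, k, mu=4.0)
ritz_residual = np.linalg.norm(A @ X_ritz - X_ritz * theta)
```

The step size 0.02 and the iteration count are conservative choices for this particular spectrum; for a different matrix they would need to be re-tuned or replaced by the line-search strategies listed on the slides.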
