NUMERICAL ALGEBRA, CONTROL AND OPTIMIZATION
Volume 8, Number 4, December 2018, pp. 389-412
doi:10.3934/naco.2018025

A PROJECTED PRECONDITIONED CONJUGATE GRADIENT METHOD FOR THE LINEAR RESPONSE EIGENVALUE PROBLEM

Xing Li
School of Mathematics, Shanghai University of Finance and Economics
777 Guoding Road, Yangpu District, Shanghai 200433, People's Republic of China

Chungen Shen
College of Science, University of Shanghai for Science and Technology
334 Jungong Road, Yangpu District, Shanghai 200093, China

Lei-Hong Zhang*
School of Mathematics and Shanghai Key Laboratory of Financial Information Technology, Shanghai University of Finance and Economics
777 Guoding Road, Yangpu District, Shanghai 200433, People's Republic of China

(Communicated by Wenyu Sun)

Abstract. The linear response eigenvalue problem aims at computing a few smallest positive eigenvalues, together with the associated eigenvectors, of a special Hamiltonian, and plays an important role in estimating the excited states of physical systems. A subspace version of the Thouless minimization principle was established by Bai and Li (SIAM J. Matrix Anal. Appl., 33:1075-1100, 2012), which characterizes the desired eigenpairs as its solution. In this paper, we propose a Projected Preconditioned Conjugate Gradient (PPCG_lrep) method to solve this subspace version of Thouless's minimization directly. We show that PPCG_lrep is an efficient implementation of the inverse power iteration and can be performed in parallel. It also enjoys several properties, including monotonicity and constraint preservation in the Thouless minimization principle. Convergence of both eigenvalues and eigenvectors is established, and numerical experiments on various problems are reported.

1. Introduction. In computational quantum chemistry and physics, the so-called random phase approximation (RPA) describes the excited states of many-particle physical systems [1, 3, 10, 11, 15, 16, 17, 18], with applications to silicon nanoparticles, nanoscale materials, and the analysis of interstellar clouds [1, 2]. One

2010 Mathematics Subject Classification. Primary: 65F15, 65F30; Secondary: 15A18.
Key words and phrases. Linear response eigenvalue problem, conjugate gradient method, inverse power iteration, minimization principle.
The third author is supported by NSFC grants 11671246, 91730303 and 11371102.
* Corresponding author: Lei-Hong Zhang.

important question in RPA is to compute a few, say $k$, eigenpairs associated with the smallest positive eigenvalues of the following eigenvalue problem:

$$Hw = \begin{bmatrix} A & B \\ -B & -A \end{bmatrix}\begin{bmatrix} u \\ v \end{bmatrix} = \lambda \begin{bmatrix} u \\ v \end{bmatrix}, \tag{1.1}$$
where $A, B \in \mathbb{R}^{n\times n}$ are both symmetric matrices and
$$\begin{bmatrix} A & B \\ B & A \end{bmatrix} \text{ is positive definite.} \tag{1.2}$$
The eigenvalue problem (1.1) results from computing the excitation energies and absorption spectrum of a time-dependent Kohn-Sham (KS) system in density functional theory [3, 11, 12]. The well-known time-independent (one-particle) KS equation can give properties of the corresponding many-particle system in the ground state, as long as an accurate approximation of the exchange-correlation potential is provided. When the underlying many-particle system is perturbed, the excitation energies and absorption spectrum are usually of interest. In this situation, the associated KS equation is time-dependent, and in the absence of external perturbations, its linearized system in the frequency domain can be expressed as an eigenvalue problem [11, eq. (14)] associated with the Liouvillian superoperator. The eigensystem of the Liouvillian superoperator provides the excited states of the underlying system: the eigenvalues represent the excitation energies, and the eigenpairs can be used to estimate the absorption spectrum. The matrix $H$ in (1.1) is the finite-dimensional batch representation [11] of the action of the Liouvillian superoperator on the one-particle time-dependent KS density matrix in the frequency domain. Note that $H$ is a special Hamiltonian matrix, and its eigenvalues are all real and appear in pairs $\pm\lambda$ with $\lambda \ge 0$. In quantum chemistry, the first few eigenpairs $\left(\lambda_j, \begin{bmatrix} u_j \\ v_j \end{bmatrix}\right)$ for $j = 1, 2, \dots, k$, corresponding to the smallest $k$ positive eigenvalues

$$0 \le \lambda_1 \le \lambda_2 \le \cdots \le \lambda_k$$
of $H$, are usually desired, where $\lambda_j \ge 0$ (resp. $-\lambda_j \le 0$) is called the $j$th excitation energy (resp. de-excitation energy). The first $k$ eigenpairs $\left(\lambda_j, \begin{bmatrix} u_j \\ v_j \end{bmatrix}\right)$ for $j = 1, 2, \dots, k$ are used to compute the absorption spectrum [3, eq. (13)] associated with the perturbed time-dependent KS system; we refer to [3, 11, 12] for a more detailed discussion of Time-Dependent Density Functional Theory (TDDFT) for excited-state calculations. The eigenvalue problem (1.1) is referred to as the Linear Response Eigenvalue Problem in the literature, and several minimization principles and algorithms [2, 3, 10, 11, 13, 14] have been proposed to obtain its first few eigenpairs. Among them, Thouless's minimization principle [15] describes the smallest positive eigenvalue $\lambda_1$ of $H$ as the minimum of the following optimization problem:

$$\lambda_1 = \min_{\|u\|_2 \ne \|v\|_2} \varrho(u, v), \quad \text{where } \varrho(u, v) = \frac{\begin{bmatrix} u \\ v \end{bmatrix}^T \begin{bmatrix} A & B \\ B & A \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix}}{\big|\, \|u\|_2^2 - \|v\|_2^2 \,\big|}. \tag{1.3}$$

Note that we can introduce the matrix
$$J = \frac{1}{\sqrt{2}}\begin{bmatrix} I_n & I_n \\ I_n & -I_n \end{bmatrix},$$
satisfying $J^T J = J^2 = I_{2n}$, to similarly transform [1, 2] $H$ into
$$J^T H J = \begin{bmatrix} 0 & A - B \\ A + B & 0 \end{bmatrix} =: \begin{bmatrix} 0 & K \\ M & 0 \end{bmatrix} =: \boldsymbol{H}; \tag{1.4}$$
thus, by the relation
$$z := \begin{bmatrix} y \\ x \end{bmatrix} = J^T \begin{bmatrix} u \\ v \end{bmatrix}, \qquad \begin{bmatrix} u \\ v \end{bmatrix} = J \begin{bmatrix} y \\ x \end{bmatrix},$$
we can rewrite (1.1) equivalently as
$$\boldsymbol{H} z := \begin{bmatrix} 0 & K \\ M & 0 \end{bmatrix}\begin{bmatrix} y \\ x \end{bmatrix} = \lambda \begin{bmatrix} y \\ x \end{bmatrix}. \tag{1.5}$$
Consequently, Thouless's minimization principle (1.3) can be equivalently expressed as
$$\lambda_1 = \min_{x^T y \ne 0} \rho(x, y), \quad \text{where } \rho(x, y) = \frac{x^T K x + y^T M y}{2|x^T y|}, \tag{1.6}$$
or equivalently,

$$\lambda_1 = \min_{x^T y = 1} \frac{1}{2}\left(x^T K x + y^T M y\right).$$
A further step towards Thouless's minimization principle (1.6) is made in [1], where a subspace version is developed, namely,
$$\sum_{i=1}^k \lambda_i = \inf_{U^T V = I_k} \left\{ \chi(U, V) := \frac{1}{2}\operatorname{tr}(U^T K U + V^T M V) \right\}, \tag{1.7}$$
where $U, V \in \mathbb{R}^{n\times k}$. (When $K$ and $M$ are both positive definite, the "inf" in (1.7) can be replaced by "min".)
The condition (1.2) implies that both $K$ and $M$ in (1.4) are symmetric and positive definite [1], and computing the first $k$ eigenpairs of (1.5) associated with $\lambda_j$ for $j = 1, 2, \dots, k$ is also referred to as the Linear Response Eigenvalue Problem (LREP) in this paper. Apart from some direct methods [5, 8] for obtaining the full eigenvalue decomposition of the matrix $H$ in (1.1), several projection methods [2, 3, 10, 13, 14] have been proposed, based on the extension (1.7) of Thouless's minimization principle, for finding the desired eigenpairs. The notable work of Bai and Li [2] proposes the LOBP4dCG method, an efficient realization of the locally optimal block preconditioned CG that makes full use of the special structure of (1.5); it improves upon the block 4-D steepest descent method introduced in [10]. Single-vector Lanczos-type methods are discussed in [3, 13], and [14] considers a block Chebyshev-Davidson iteration to compute the desired eigenpairs of (1.5). All these algorithms follow a Rayleigh-Ritz projection technique similar to that used in traditional eigenvalue computations: each algorithm introduces its own subspace expansion procedure and projects the original LREP onto the resulting subspace to form a smaller LREP of the form (1.5) for the Ritz pairs.
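To make the reduction (1.1)-(1.5) concrete, the following minimal MATLAB sketch (our illustration, not part of the paper) builds a small LREP with $K$ and $M$ symmetric positive definite and checks that the eigenvalues of $H$ come in $\pm\lambda$ pairs matching the square roots of the eigenvalues of the product $KM$ (cf. (2.2) below):

```matlab
% Build K, M SPD and recover A, B from K = A-B, M = A+B.
n = 6;
K = randn(n); K = K'*K + n*eye(n);      % SPD
M = randn(n); M = M'*M + n*eye(n);      % SPD
A = (M + K)/2;  B = (M - K)/2;          % so that K = A-B, M = A+B
H = [A B; -B -A];                       % the Hamiltonian in (1.1)
lam = sqrt(sort(real(eig(K*M))));       % positive eigenvalues of the LREP
mu  = sort(real(eig(H)));               % eigenvalues of H, in +/- pairs
disp(norm(mu - [-flip(lam); lam]))      % ~ 0 up to rounding
```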


In this paper, we propose a Projected Preconditioned Conjugate Gradient (PPCG_lrep) method to solve the subspace version (1.7) of Thouless's minimization principle directly. The proposed PPCG_lrep is in fact an efficient implementation of an alternating variables iteration for the minimization principle (1.7). PPCG_lrep iteratively computes a sequence of pairs $(U_j, V_j)$ for $j = 1, 2, \dots$, preserving the bi-orthogonality $U_j^T V_j = I_k$ and ensuring the monotonicity $\chi(U_{j+1}, V_{j+1}) \le \chi(U_j, V_j)$. At iteration $j$, the $k$ columns of either $U_j$ or $V_j$ can be computed as a whole by a projected preconditioned CG in a block manner, or computed separately in a parallel scheme. Moreover, if appropriate preconditioners for $K$ and/or $M$ are available, they can be incorporated to speed up PPCG_lrep. The linear convergence of $(U_j, V_j)$ to the minimizer of the Thouless minimization principle (1.7) is proved under the gap assumption $\lambda_k < \lambda_{k+1}$, and the $k$ approximate eigenpairs can finally be obtained by solving an LREP of size $k$. Numerical experiments with PPCG_lrep, coded both in MATLAB and in C, on several problems with various choices of preconditioners are reported.
We organize the paper as follows. Relevant preliminary results are stated in Section 2. In Section 3, we first give the framework of an alternating variables iteration, and then describe its implementation details, i.e., the PPCG_lrep method. The convergence analysis of PPCG_lrep is carried out in Section 4. Our numerical evaluation of PPCG_lrep on both randomly generated problems and two practical problems from computational quantum chemistry is conducted in Section 5, and conclusions are drawn in Section 6.

2. Preliminary results. To facilitate our discussion, we first collect several necessary properties of the LREP in Theorem 2.1; the reader can refer to [1, Section 2] and [15, 16, 17, 18] for further properties.

2.1. Basic results about LREP.

Theorem 2.1. Suppose that $M$ is definite. Then the following statements are true:
1. There exists a nonsingular $Y \in \mathbb{R}^{n\times n}$ such that
$$K = Y\Lambda^2 Y^T, \quad M = XX^T, \tag{2.1}$$
where $\Lambda = \operatorname{diag}\{\lambda_1, \lambda_2, \dots, \lambda_n\}$ and $X = Y^{-T}$.
2. The eigen-decompositions of $KM$ and $MK$ are
$$(KM)Y = Y\Lambda^2, \quad (MK)X = X\Lambda^2, \tag{2.2}$$
respectively.
3. If $K$ is also definite, then all $\lambda_i > 0$ and $\boldsymbol{H}$ is diagonalizable:
$$\boldsymbol{H}\begin{bmatrix} Y\Lambda & Y\Lambda \\ X & -X \end{bmatrix} = \begin{bmatrix} Y\Lambda & Y\Lambda \\ X & -X \end{bmatrix}\begin{bmatrix} \Lambda & \\ & -\Lambda \end{bmatrix}.$$
4. $\boldsymbol{H}$ is not diagonalizable if and only if $\lambda_1 = 0$, which happens when and only when $K$ is singular.
5. The $i$th column of $Z = \begin{bmatrix} Y\Lambda \\ X \end{bmatrix}$ is the eigenvector corresponding to $\lambda_i$, and it is unique if (a) $\lambda_i$ is a simple eigenvalue of $\boldsymbol{H}$, or (b) $i = 1$ and $\lambda_1 = 0 < \lambda_2$; in the latter case, $0$ is a double eigenvalue of $\boldsymbol{H}$ but there is only one eigenvector associated with it.

The property (2.2) follows directly from (2.1), which implies that we can alternatively solve the LREP by solving either product eigenvalue problem in (2.2). It is worth noting that only the $k$ smallest positive eigenvalues, together with the associated eigenvectors, are of interest, so we are concerned with the extreme eigenpairs of the product eigenvalue problems in (2.2).
By the Lagrangian multiplier theory, for a KKT pair $(U, V)$ of the Thouless-type minimization principle (1.7), we know that there are two matrices $\Xi \in \mathbb{R}^{k\times k}$ and $\Upsilon \in \mathbb{R}^{k\times k}$ such that $KU = V\Xi$ and $MV = U\Upsilon$. The deflating subspace pair $\{\mathcal{U}, \mathcal{V}\}$ of $\{K, M\}$ in [1] is essentially the one spanned by a KKT pair $(U, V)$, satisfying $K\mathcal{U} \subseteq \mathcal{V}$ and $M\mathcal{V} \subseteq \mathcal{U}$. For any deflating subspace pair $\{\mathcal{U}, \mathcal{V}\}$ with
$$\dim(\mathcal{U}) = \dim(\mathcal{V}) = k \quad \text{and} \quad \mathcal{U} \oplus \mathcal{V}^{\perp} = \mathbb{R}^n, \tag{2.3}$$
there is a KKT pair $(U, V)$ such that $\mathcal{R}(U) = \mathcal{U}$ and $\mathcal{R}(V) = \mathcal{V}$, and vice versa. The solution of the Thouless-type minimization principle (1.7) corresponds exactly to the extreme positive deflating subspace pair for which the objective value $\chi(U, V)$ achieves the minimum.
Let $U \in \mathbb{R}^{n\times k}$ and $V \in \mathbb{R}^{n\times k}$ be basis matrices of the subspaces $\mathcal{U}$ and $\mathcal{V}$ satisfying (2.3); then $W = U^T V$ is nonsingular. Factorize $W$ as $W = W_1^T W_2$ with two nonsingular $W_1$ and $W_2$ to obtain a structure-preserving projection matrix $H_{\rm SR}$:
$$\boldsymbol{H}\begin{bmatrix} VW_2^{-1} & \\ & UW_1^{-1} \end{bmatrix} = \begin{bmatrix} VW_2^{-1} & \\ & UW_1^{-1} \end{bmatrix} H_{\rm SR}, \tag{2.4}$$
where
$$H_{\rm SR} = \begin{bmatrix} & W_1^{-T}U^T K U W_1^{-1} \\ W_2^{-T}V^T M V W_2^{-1} & \end{bmatrix}. \tag{2.5}$$
Consequently, any eigenpair $\left(\lambda, \begin{bmatrix} \hat{y} \\ \hat{x} \end{bmatrix}\right)$ of $H_{\rm SR}$ yields an eigenvalue $\lambda$ of $\boldsymbol{H}$ and the corresponding eigenvector of (1.5) with $x = UW_1^{-1}\hat{x}$ and $y = VW_2^{-1}\hat{y}$.

2.2. Canonical angles. In our discussion of the convergence, we will use canonical angles and angles in the $M$-inner product. For two subspaces $\mathcal{A}$ and $\mathcal{B}$ of $\mathbb{R}^n$ with $k = \dim(\mathcal{A}) \le \dim(\mathcal{B}) = \ell$, the angles $\theta_i(\mathcal{A}, \mathcal{B})$ are defined recursively for $i = 1, 2, \dots, k$ by [7]
$$\cos\theta_i(\mathcal{A}, \mathcal{B}) = \max_{x\in\mathcal{A}}\max_{y\in\mathcal{B}} x^T y = x_i^T y_i \tag{2.6}$$
subject to
$$\|x\|_2 = \|y\|_2 = 1, \quad x^T x_j = y^T y_j = 0, \quad j = 1, 2, \dots, i-1. \tag{2.7}$$
If $A \in \mathbb{R}^{n\times k}$ and $B \in \mathbb{R}^{n\times \ell}$ are orthonormal basis matrices of $\mathcal{A}$ and $\mathcal{B}$, respectively, and $\sigma_1 \le \cdots \le \sigma_k$ are the singular values of $B^T A$, then the $k$ canonical angles $\theta_j(\mathcal{A}, \mathcal{B})$ from $\mathcal{A}$ to $\mathcal{B}$ are
$$0 \le \theta_j(\mathcal{A}, \mathcal{B}) := \arccos\sigma_j \le \frac{\pi}{2} \quad \text{for } 1 \le j \le k.$$
Set
$$\Theta(\mathcal{A}, \mathcal{B}) = \operatorname{diag}\{\theta_1(\mathcal{A}, \mathcal{B}), \dots, \theta_k(\mathcal{A}, \mathcal{B})\}. \tag{2.8}$$

Note that $\theta_1(\mathcal{A}, \mathcal{B}) \ge \cdots \ge \theta_k(\mathcal{A}, \mathcal{B})$, and the angles are independent of the particular orthonormal basis matrices $A$ and $B$. Therefore, no confusion arises from writing $\Theta(A, B)$ instead of $\Theta(\mathcal{A}, \mathcal{B})$.
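As a small aside, the canonical angles in (2.8) are straightforward to compute from the singular values of $B^T A$; the following MATLAB sketch (ours, not from the paper) does so and also verifies the identity $\|\sin\Theta(\mathcal{A}, \mathcal{B})\|_2 = \|A^T B_{\perp}\|_2$ noted below:

```matlab
% Canonical angles between two random subspaces via SVD.
n = 20; k = 3; l = 5;
[A,~] = qr(randn(n,k), 0);                 % orthonormal basis of a k-dim subspace
[B,~] = qr(randn(n,l), 0);                 % orthonormal basis of an l-dim subspace
sigma = min(1, svd(B'*A));                 % cosines, guarded against rounding > 1
theta = sort(acos(sigma), 'descend');      % theta_1 >= ... >= theta_k
Bperp = null(B');                          % orthonormal basis of ran(B)^perp
disp(abs(norm(A'*Bperp) - sin(theta(1))))  % ~ 0: ||sin Theta||_2 = ||A' B_perp||_2
```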

The value $\|\sin\Theta(\mathcal{A}, \mathcal{B})\|_2$ defines a distance metric between $\mathcal{A}$ and $\mathcal{B}$, and
$$\|\sin\Theta(\mathcal{A}, \mathcal{B})\|_2 = \sin\theta_1(\mathcal{A}, \mathcal{B}) = \|A^T B_{\perp}\|_2,$$
where $B_{\perp}$ is an orthonormal basis matrix of the orthogonal complement of $\mathcal{B}$. Now, for the symmetric and positive definite matrix $M$, we denote the $M$-inner product by $\langle x, y\rangle_M = x^T M y$. Generalizing the canonical angles given by (2.6) and (2.7) leads to the angles in the $M$-inner product, which we denote by $\theta_i(\mathcal{A}, \mathcal{B})_M$ and $\Theta(\mathcal{A}, \mathcal{B})_M = \operatorname{diag}\{\theta_1(\mathcal{A}, \mathcal{B})_M, \dots, \theta_k(\mathcal{A}, \mathcal{B})_M\}$. The canonical angles $\theta_i(\mathcal{A}, \mathcal{B})$ and the angles $\theta_i(\mathcal{A}, \mathcal{B})_M$ in the $M$-inner product are related as follows [7, Theorem 4.2]: if $M = XX^T$, then the angles between the subspaces $\mathcal{A}$ and $\mathcal{B}$ relative to the $M$-inner product coincide with the canonical angles between the subspaces $X^T\mathcal{A}$ and $X^T\mathcal{B}$. Based on this connection, instead of the distance metric $\|\sin\Theta(\mathcal{A}, \mathcal{B})\|_2$ between $\mathcal{A}$ and $\mathcal{B}$, we have the distance $\|\sin\Theta(\mathcal{A}, \mathcal{B})_M\|_2$ with
$$\|\sin\Theta(\mathcal{A}, \mathcal{B})_M\|_2 = \|\sin\Theta(X^T\mathcal{A}, X^T\mathcal{B})\|_2. \tag{2.9}$$

3. The Projected Preconditioned Conjugate Gradient Method. In this section, we first present the framework of an alternating variables iteration and then discuss an efficient Projected Preconditioned Conjugate Gradient (PPCG_lrep) method for its implementation. We assume from now on that $K$ and $M$ are symmetric positive definite.

3.1. The framework of the alternating variables iteration. Relying on the minimization principle (1.7), a natural idea is to alternately iterate over $U \in \mathbb{R}^{n\times k}$ and $V \in \mathbb{R}^{n\times k}$ to improve the objective value $\chi(U, V)$, each with the other variable fixed. In particular, starting from an initial pair $[U_0, V_0] \in \mathbb{R}^{n\times 2k}$ satisfying $U_0^T V_0 = I_k$, Algorithm 3.1 provides the basic iteration step.

Algorithm 3.1 The framework of the alternating variables iteration for LREP
Given a pair $[U_0, V_0] \in \mathbb{R}^{n\times 2k}$ with $U_0^T V_0 = I_k$, the following iteration computes an approximate minimizer for the optimization problem (1.7).
1: $j \leftarrow 0$;
2: while not converged do
3:
$$U_{j+1} := \arg\min_{U^T V_j = I_k} \frac{1}{2}\operatorname{tr}(U^T K U) = \arg\min_{U^T V_j = I_k} \chi(U, V_j); \tag{3.1a}$$
$$V_{j+1} := \arg\min_{U_{j+1}^T V = I_k} \frac{1}{2}\operatorname{tr}(V^T M V) = \arg\min_{U_{j+1}^T V = I_k} \chi(U_{j+1}, V); \tag{3.1b}$$
4: $j \leftarrow j + 1$;
5: end while

The initial pair $[U_0, V_0] \in \mathbb{R}^{n\times 2k}$ can simply be $U_0 = V_0 = [e_1, \dots, e_k]$, or more generally
$$U_0 = U(V^T U)^{-1}, \quad V_0 = V$$
for any full column rank $U$ and $V$ with $\operatorname{rank}(U^T V) = k$. It is thus seen that the main computational burden in each outer iteration of Algorithm 3.1 is solving the two subproblems (3.1a) and (3.1b). Taking (3.1a) as an example, the Lagrangian multiplier theory says that there exists a matrix $\Xi \in \mathbb{R}^{k\times k}$ satisfying
$$KU - V_j\Xi = 0 \quad \text{and} \quad U^T V_j = I_k, \tag{3.2}$$
which leads to the solution of (3.1a):
$$U_{j+1} = K^{-1}V_j(V_j^T K^{-1} V_j)^{-1}. \tag{3.3}$$
By a similar argument, the solution of (3.1b) is
$$V_{j+1} = M^{-1}U_{j+1}(U_{j+1}^T M^{-1} U_{j+1})^{-1}. \tag{3.4}$$
Now, by substituting (3.3) into (3.4), we see that the iteration from $V_j$ to the next $V_{j+1}$ is given by
$$V_{j+1} = (KM)^{-1}V_j\underbrace{[V_j^T(KMK)^{-1}V_j]^{-1}(V_j^T K^{-1} V_j)}_{=:\Omega_k}. \tag{3.5}$$

One easily realizes that exchanging the subproblems (3.1a) and (3.1b) in Algorithm 3.1 results in the iteration formula for the sequence $\{U_j\}$:
$$U_{j+1} = (MK)^{-1}U_j[U_j^T(MKM)^{-1}U_j]^{-1}(U_j^T M^{-1} U_j). \tag{3.6}$$
An interesting observation from (3.5) or (3.6) is that our alternating variables iteration is essentially a special type of inverse power iteration for solving the product eigenvalue problems (2.2). However, we remark that the explicit iteration formula (3.5) is of little use from the computational point of view, as it would be too expensive to compute the matrix-matrix products and inverses; moreover, we will not rely on (3.5) to get an approximate solution (using, for example, an iterative solver to obtain an inexact $V_{j+1}$) either, because the decrease of the objective function $\chi(U, V)$ would then not be guaranteed. Instead, in the next subsection, we propose an implementation for obtaining the solution, or an inexact solution, which guarantees that
(1) the objective function $\chi(U, V)$ is non-increasing; that is,

$$\chi(U_j, V_j) \ge \chi(U_{j+1}, V_{j+1});$$
(2) the constraint is preserved, i.e.,
$$U_{j+1}^T V_j = U_{j+1}^T V_{j+1} = I_k; \tag{3.7}$$

(3) the columns of $U_{j+1}$ and $V_{j+1}$ in (3.1a) and (3.1b), respectively, can be computed in parallel;
(4) each column of $U_{j+1}$ and $V_{j+1}$ can be computed by an efficient preconditioned CG iteration, so that only matrix-vector products are involved.
We conclude this subsection by suggesting a stopping criterion for line 2 of Algorithm 3.1. Let $(U_j, V_j)$ be the current iterate, which is an approximate KKT pair for the Thouless minimization principle (1.7). According to (2.3), $(U_j, V_j)$ spans an approximate deflating subspace pair of $\{K, M\}$, which, by $U_j^T V_j = I_k$, (2.4) and (2.5), implies that the $k$ positive eigenvalues (the Ritz values) of
$$H_j = \begin{bmatrix} & U_j^T K U_j \\ V_j^T M V_j & \end{bmatrix} \tag{3.8}$$
serve as approximations to $\lambda_1, \dots, \lambda_k$. Based upon this observation, we can terminate the iteration when the relative residual error satisfies
$$\mathrm{Res} = \frac{\|KU_j - V_j(U_j^T K U_j)\|_1}{\|KU_j\|_1 + \|V_j(U_j^T K U_j)\|_1} + \frac{\|MV_j - U_j(V_j^T M V_j)\|_1}{\|MV_j\|_1 + \|U_j(V_j^T M V_j)\|_1} \le \epsilon_r. \tag{3.9}$$
Accordingly, to measure the accuracy of the Ritz values, we can choose

$$\chi_r = \frac{\chi(U_{j-1}, V_{j-1}) - \chi(U_j, V_j)}{\chi(U_j, V_j)}, \tag{3.10}$$
and stop the iteration if $\chi_r \le \epsilon_\chi$ for a given tolerance $\epsilon_\chi$. In our numerical testing, the two measures are combined as the stopping criterion. The Ritz pairs corresponding to the $k$ positive Ritz values of $H_j$ in (3.8) at the terminating iteration $j$ are used as the approximations for the LREP (1.5).
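To fix ideas, the following MATLAB sketch (our illustration, assuming SPD matrices `K`, `M` and a current pair `Uj`, `Vj` with `Uj'*Vj = eye(k)` already in the workspace) performs one outer step of Algorithm 3.1 with the subproblems solved exactly via (3.3)-(3.4), then extracts the Ritz values of $H_j$ in (3.8) and evaluates the residual (3.9):

```matlab
U = K \ Vj;  U = U / (Vj'*U);            % (3.3): U_{j+1} = K^{-1}Vj (Vj'K^{-1}Vj)^{-1}
V = M \ U;   V = V / (U'*V);             % (3.4): V_{j+1} = M^{-1}U (U'M^{-1}U)^{-1}
k  = size(U, 2);
Hj = [zeros(k) U'*K*U; V'*M*V zeros(k)]; % the projected matrix (3.8)
ritz = sort(real(eig(Hj)));
ritz = ritz(k+1:end);                    % the k positive Ritz values
RK  = K*U - V*(U'*K*U);  RM = M*V - U*(V'*M*V);
Res = norm(RK,1)/(norm(K*U,1) + norm(V*(U'*K*U),1)) + ...
      norm(RM,1)/(norm(M*V,1) + norm(U*(V'*M*V),1));   % (3.9)
```

In the actual PPCG_lrep, the exact solves `K \ Vj` and `M \ U` are replaced by the projected preconditioned CG iterations developed next.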

3.2. Solving the subproblems. We now discuss how to solve the subproblems (3.1a) and (3.1b) efficiently; the implementation results in our Projected Preconditioned Conjugate Gradient (PPCG_lrep) method. Take the former as an example. Using the condition $U_j^T V_j = I_k$, we first express the constraint $U^T V_j = I_k$ as

$$U = P_v Z + U_j, \tag{3.11}$$

where $P_v \in \mathbb{R}^{n\times(n-k)}$ is a basis of $\mathrm{Null}(V_j^T)$. It should be pointed out that, in our computation, $P_v$ need not be formed explicitly. Substituting (3.11) into (3.1a) yields
$$\chi_1(U) := \frac{1}{2}\operatorname{tr}(U^T K U) = \frac{1}{2}\operatorname{tr}(Z^T P_v^T K P_v Z) + \operatorname{tr}(Z^T P_v^T K U_j) + \frac{1}{2}\operatorname{tr}(U_j^T K U_j).$$
Now, denote $Z = [z_1, \dots, z_k]$ and $P_v^T K U_j = [w_1, \dots, w_k]$; then (3.1a) is equivalent to

$$\min_{z_i\in\mathbb{R}^{n-k}} \sum_{i=1}^k \left(\frac{z_i^T P_v^T K P_v z_i}{2} + z_i^T w_i\right) = \sum_{i=1}^k \min_{z_i\in\mathbb{R}^{n-k}} \left\{ \varphi(z_i) := \frac{z_i^T P_v^T K P_v z_i}{2} + z_i^T w_i \right\}.$$
Note that
$$\min_{U^T V_j = I_k} \chi(U, V_j) = \sum_{i=1}^k \min_{z_i\in\mathbb{R}^{n-k}} \varphi(z_i) + \chi(U_j, V_j). \tag{3.12}$$
Therefore, we only need to consider the solutions of
$$\min_{z_i\in\mathbb{R}^{n-k}} \left\{ \frac{z_i^T P_v^T K P_v z_i}{2} + z_i^T w_i \right\}, \quad i = 1, \dots, k, \tag{3.13}$$
each of which can be computed by the (preconditioned) CG iteration [9, Section 16.3]. To be precise, if $\widetilde{W} = P_v^T \hat{K} P_v$ is a preconditioner such that $\hat{K} \approx K$ is symmetric positive definite and $\widetilde{W}^{-1/2}(P_v^T K P_v)\widetilde{W}^{-1/2} \approx I_{n-k}$, then the preconditioned CG iteration [9, Algorithm 16.1] for (3.13) proceeds as Algorithm 3.2. The (preconditioned) CG iteration has many nice properties (e.g., [9, Chapter 5]); among them, the following proposition [9, Theorem 5.2] implies the monotonic decrease of the objective value.

Proposition 1. Let $\{\tilde{z}_i^{(\ell)}\}$ be the sequence generated by Algorithm 3.2. Then $\tilde{z}_i^{(\ell+1)}$ is the minimizer of $\varphi(z_i)$ over $\tilde{z}_i^{(0)} + \mathcal{K}_{\ell+1}(\widetilde{W}^{-1}P_v^T K P_v, \widetilde{W}^{-1}\tilde{r}_i^{(0)})$, where $\mathcal{K}_{\ell}(\widetilde{W}^{-1}P_v^T K P_v, \widetilde{W}^{-1}\tilde{r}_i^{(0)})$ denotes the Krylov subspace. Therefore, $\varphi(\tilde{z}_i^{(\ell)}) \ge \varphi(\tilde{z}_i^{(\ell+1)})$.

Algorithm 3.2 Preconditioned CG for the reduced systems w.r.t. $z_i$
Starting from an initial point $\tilde{z}_i^{(0)} \in \mathbb{R}^{n-k}$ and a preconditioner $\widetilde{W} = P_v^T \hat{K} P_v$, this preconditioned CG iteration solves (3.13) inexactly.
1: $\tilde{r}_i^{(0)} = P_v^T K P_v \tilde{z}_i^{(0)} + w_i$, $\tilde{g}_i^{(0)} = \widetilde{W}^{-1}\tilde{r}_i^{(0)}$, $\tilde{d}_i^{(0)} = -\tilde{g}_i^{(0)}$, $\ell = 0$;
2: while $\|\tilde{r}_i^{(\ell)}\|_2 > \epsilon_c$ do
3: $\tilde{z}_i^{(\ell+1)} = \tilde{z}_i^{(\ell)} + \alpha_i \tilde{d}_i^{(\ell)}$, with $\alpha_i = \dfrac{(\tilde{r}_i^{(\ell)})^T \tilde{g}_i^{(\ell)}}{(\tilde{d}_i^{(\ell)})^T P_v^T K P_v \tilde{d}_i^{(\ell)}}$;
4: $\tilde{r}_i^{(\ell+1)} = \tilde{r}_i^{(\ell)} + \alpha_i P_v^T K P_v \tilde{d}_i^{(\ell)}$;
5: $\tilde{g}_i^{(\ell+1)} = \widetilde{W}^{-1}\tilde{r}_i^{(\ell+1)}$;
6: $\tilde{d}_i^{(\ell+1)} = -\tilde{g}_i^{(\ell+1)} + \beta_i \tilde{d}_i^{(\ell)}$, with $\beta_i = \dfrac{(\tilde{r}_i^{(\ell+1)})^T \tilde{g}_i^{(\ell+1)}}{(\tilde{r}_i^{(\ell)})^T \tilde{g}_i^{(\ell)}}$;
7: $\ell = \ell + 1$;
8: end while

Indeed, Algorithm 3.2 can be taken one step further to yield a projected CG iteration (see, e.g., [9, Algorithm 16.2]), which works directly on systems with respect to $u_i = P_v \tilde{z}_i + u_i^{(0)}$, where $U = [u_1, \dots, u_k]$ and $U_j = [u_1^{(0)}, \dots, u_k^{(0)}]$. This can be realized by introducing new variables $r_i$, $g_i$ and $d_i$ via
$$P_v^T r_i = \tilde{r}_i, \quad g_i = P_v \tilde{g}_i \quad \text{and} \quad d_i = P_v \tilde{d}_i,$$
respectively. With these new variables, we can work on the variable $u_i$ with the corresponding preconditioner $F_K = P_v \widetilde{W}^{-1} P_v^T$ and with the special initial point $u_i^{(0)}$ (i.e., corresponding to $\tilde{z}_i^{(0)} = 0$ in Algorithm 3.2); the iteration is summarized in Algorithm 3.3, in which the stopping criterion uses $(r_i^{(\ell)})^T g_i^{(\ell)} = (r_i^{(\ell)})^T P_v \tilde{g}_i^{(\ell)} = (\tilde{r}_i^{(\ell)})^T \tilde{g}_i^{(\ell)}$.

Algorithm 3.3 Projected preconditioned CG for the systems w.r.t. $u_i$
Given a preconditioner $F_K = P_v(P_v^T \hat{K} P_v)^{-1}P_v^T$, this preconditioned CG iteration obtains a vector approximating the $i$th column of the solution $U$ of (3.1a).
1: $r_i^{(0)} = K u_i^{(0)}$, $g_i^{(0)} = F_K r_i^{(0)}$, $d_i^{(0)} = -g_i^{(0)}$, $\ell = 0$;
2: while $(r_i^{(\ell)})^T g_i^{(\ell)} > \epsilon_c$ do
3: $u_i^{(\ell+1)} = u_i^{(\ell)} + \alpha_i d_i^{(\ell)}$, with $\alpha_i = \dfrac{(r_i^{(\ell)})^T g_i^{(\ell)}}{(d_i^{(\ell)})^T K d_i^{(\ell)}}$;
4: $r_i^{(\ell+1)} = r_i^{(\ell)} + \alpha_i K d_i^{(\ell)}$;
5: $g_i^{(\ell+1)} = F_K r_i^{(\ell+1)}$;
6: $d_i^{(\ell+1)} = -g_i^{(\ell+1)} + \beta_i d_i^{(\ell)}$, with $\beta_i = \dfrac{(r_i^{(\ell+1)})^T g_i^{(\ell+1)}}{(r_i^{(\ell)})^T g_i^{(\ell)}}$;
7: $\ell = \ell + 1$;
8: end while

Remark 1. It can be verified that if Algorithm 3.3 generates the sequences $\{u_i^{(\ell)}\}$, $\{r_i^{(\ell)}\}$, $\{g_i^{(\ell)}\}$ and $\{d_i^{(\ell)}\}$, they uniquely determine the corresponding sequences $\{\tilde{z}_i^{(\ell)}\}$, $\{\tilde{r}_i^{(\ell)}\}$, $\{\tilde{g}_i^{(\ell)}\}$ and $\{\tilde{d}_i^{(\ell)}\}$, which are exactly the sequences generated by Algorithm 3.2 starting from the initial point $\tilde{z}_i^{(0)} = 0$, and it is true that
$$u_i^{(\ell)} = P_v \tilde{z}_i^{(\ell)} + u_i^{(0)}. \tag{3.14}$$

Moreover, with the initial point $u_i^{(0)}$ and by the relation (3.14), we know that if $U_{j+1} = [\tilde{u}_1, \dots, \tilde{u}_k]$, where $\tilde{u}_i = P_v \tilde{z}_i + u_i^{(0)}$ is an inexact solution obtained from Algorithm 3.3, then
$$U_{j+1} = P_v[\tilde{z}_1, \dots, \tilde{z}_k] + U_j \implies U_{j+1}^T V_j = I_k, \tag{3.15}$$
which means that the constraint $U^T V_j = I_k$ is preserved. In addition, by Proposition 1 and (3.14), we know that
$$\chi_1(U_{j+1}) = \sum_{i=1}^k \varphi(\tilde{z}_i) + \chi_1(U_j) \le \sum_{i=1}^k \varphi(0) + \chi_1(U_j) = \chi_1(U_j).$$
As a result, we have
$$\chi(U_{j+1}, V_j) \le \chi(U_j, V_j). \tag{3.16}$$
One can easily see that the vectors $u_i^{(\ell)}$ for $i = 1, 2, \dots, k$ generated by Algorithm 3.3 can also be obtained simultaneously by updating the corresponding matrices
$$R^{(\ell)} := [r_1^{(\ell)}, \dots, r_k^{(\ell)}], \quad G^{(\ell)} := [g_1^{(\ell)}, \dots, g_k^{(\ell)}], \quad D^{(\ell)} := [d_1^{(\ell)}, \dots, d_k^{(\ell)}], \quad U^{(\ell)} := [u_1^{(\ell)}, \dots, u_k^{(\ell)}]$$
as in Algorithm 3.4.

Algorithm 3.4 Projected preconditioned CG for (3.1a)
Given a preconditioner $F_K = P_v(P_v^T \hat{K} P_v)^{-1}P_v^T$, this preconditioned CG iteration obtains an (inexact) solution $U_{j+1} = U^{(\ell)}$ of (3.1a).
1: $R^{(0)} = K U_j$, $G^{(0)} = F_K R^{(0)}$, $D^{(0)} = -G^{(0)}$, $\ell = 0$;
2: while $\max\{(r_1^{(\ell)})^T g_1^{(\ell)}, \dots, (r_k^{(\ell)})^T g_k^{(\ell)}\} > \epsilon_c$ do
3: $U^{(\ell+1)} = U^{(\ell)} + D^{(\ell)}\operatorname{diag}\{\alpha_1, \dots, \alpha_k\}$, with $\alpha_i = \dfrac{(r_i^{(\ell)})^T g_i^{(\ell)}}{(d_i^{(\ell)})^T K d_i^{(\ell)}}$;
4: $R^{(\ell+1)} = R^{(\ell)} + K D^{(\ell)}\operatorname{diag}\{\alpha_1, \dots, \alpha_k\}$;
5: $G^{(\ell+1)} = F_K R^{(\ell+1)}$;
6: $D^{(\ell+1)} = -G^{(\ell+1)} + D^{(\ell)}\operatorname{diag}\{\beta_1, \dots, \beta_k\}$, with $\beta_i = \dfrac{(r_i^{(\ell+1)})^T g_i^{(\ell+1)}}{(r_i^{(\ell)})^T g_i^{(\ell)}}$;
7: $\ell = \ell + 1$;
8: end while
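For concreteness, here is a direct MATLAB transcription of Algorithm 3.4 (our sketch, not the authors' code); the preconditioner $F_K$ is supplied as a function handle `FK`, for instance built from (3.17) below, and `tol` plays the role of $\epsilon_c$:

```matlab
function U = ppcg_3_1a(K, Uj, FK, tol)
% Projected preconditioned CG for subproblem (3.1a), block version.
% K: SPD n-by-n; Uj: current n-by-k iterate; FK: handle applying F_K.
U = Uj;  R = K*Uj;  G = FK(R);  D = -G;
rho = sum(R.*G, 1);                    % rho(i) = r_i' * g_i
while max(rho) > tol
    KD    = K*D;
    alpha = rho ./ sum(D.*KD, 1);      % one steplength per column
    U     = U + D .* alpha;            % implicit expansion (R2016b+)
    R     = R + KD .* alpha;
    G     = FK(R);
    rhoNew = sum(R.*G, 1);
    D     = -G + D .* (rhoNew ./ rho);
    rho   = rhoNew;
end
end
```

Algorithm 3.5 for (3.1b) is obtained by replacing `K` with `M` and `FK` with the corresponding `FM`.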

The other subproblem (3.1b) can be solved similarly. In fact, if we denote by $P_u$ a basis of $\mathrm{Null}(U_{j+1}^T)$ and by $F_M = P_u(P_u^T \hat{M} P_u)^{-1}P_u^T$ a preconditioner, where $\hat{M} \approx M$ is symmetric positive definite, then the subproblem (3.1b) can be solved by Algorithm 3.5.

Theorem 3.1. Suppose $\{U_j\}$ and $\{V_j\}$ are the sequences generated by Algorithm 3.1, where each $U_j$ and $V_j$ ($j \ge 1$) is computed (inexactly) by Algorithm 3.4 and Algorithm 3.5, respectively. Then for $j = 0, 1, \dots$,
a) $U_j^T V_j = U_{j+1}^T V_j = I_k$, and
b) $\chi(U_j, V_j) \ge \chi(U_{j+1}, V_{j+1})$.
Proof. These conclusions follow by induction. Since $U_0^T V_0 = I_k$, by (3.15) and (3.16) we have $U_1^T V_0 = I_k$ and $\chi(U_1, V_0) \le \chi(U_0, V_0)$. By the same argument, the next iterate $V_1$ generated by Algorithm 3.5 satisfies $U_1^T V_1 = I_k$ and $\chi(U_1, V_1) \le \chi(U_1, V_0)$.

Algorithm 3.5 Projected preconditioned CG for (3.1b)
Given a preconditioner $F_M = P_u(P_u^T \hat{M} P_u)^{-1}P_u^T$, this preconditioned CG iteration obtains an (inexact) solution $V_{j+1} = V^{(\ell)}$ of (3.1b).
1: $R^{(0)} = M V_j$, $G^{(0)} = F_M R^{(0)}$, $D^{(0)} = -G^{(0)}$, $\ell = 0$;
2: while $\max\{(r_1^{(\ell)})^T g_1^{(\ell)}, \dots, (r_k^{(\ell)})^T g_k^{(\ell)}\} > \epsilon_c$ do
3: $V^{(\ell+1)} = V^{(\ell)} + D^{(\ell)}\operatorname{diag}\{\alpha_1, \dots, \alpha_k\}$, with $\alpha_i = \dfrac{(r_i^{(\ell)})^T g_i^{(\ell)}}{(d_i^{(\ell)})^T M d_i^{(\ell)}}$;
4: $R^{(\ell+1)} = R^{(\ell)} + M D^{(\ell)}\operatorname{diag}\{\alpha_1, \dots, \alpha_k\}$;
5: $G^{(\ell+1)} = F_M R^{(\ell+1)}$;
6: $D^{(\ell+1)} = -G^{(\ell+1)} + D^{(\ell)}\operatorname{diag}\{\beta_1, \dots, \beta_k\}$, with $\beta_i = \dfrac{(r_i^{(\ell+1)})^T g_i^{(\ell+1)}}{(r_i^{(\ell)})^T g_i^{(\ell)}}$;
7: $\ell = \ell + 1$;
8: end while

For a detailed implementation of Algorithms 3.4 and 3.5, we have the following additional remarks:

1. There are many simple choices of the preconditioner matrices $F_K$ and $F_M$. For instance, we can take $[\hat{K}, \hat{M}] = [I_n, I_n]$, $[\hat{K}, \hat{M}] = [\operatorname{diag}\{|K_{ii}|\}, \operatorname{diag}\{|M_{ii}|\}]$, or block diagonal forms.
2. It should be pointed out that in computing $f_K = F_K t$ and $f_M = F_M t$ for a given vector or matrix $t$, we do not need to form the basis matrices $P_v$ and $P_u$ explicitly. Instead, they can be computed as follows (see [9, Section 16.3], and the sketch following these remarks):
$$f_K = \hat{K}^{-1}[I_n - V_j(V_j^T \hat{K}^{-1} V_j)^{-1}V_j^T \hat{K}^{-1}]t = \hat{K}^{-1}t - \hat{K}^{-1}V_j(V_j^T \hat{K}^{-1} V_j)^{-1}V_j^T \hat{K}^{-1}t, \tag{3.17}$$
$$f_M = \hat{M}^{-1}[I_n - U_{j+1}(U_{j+1}^T \hat{M}^{-1} U_{j+1})^{-1}U_{j+1}^T \hat{M}^{-1}]t = \hat{M}^{-1}t - \hat{M}^{-1}U_{j+1}(U_{j+1}^T \hat{M}^{-1} U_{j+1})^{-1}U_{j+1}^T \hat{M}^{-1}t. \tag{3.18}$$

3. The columns of $U_{j+1}$ and $V_{j+1}$ can be computed in parallel, in which case the preconditioners $F_K$ and $F_M$ can be reused for all $i = 1, 2, \dots, k$.
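The following MATLAB sketch (ours) applies $F_K$ via (3.17) without forming $P_v$; `Khat` denotes a cheap SPD approximation $\hat{K}$ of $K$ (e.g., its diagonal). For a fixed $V_j$, the matrix $\hat{K}^{-1}V_j$ and the small $k\times k$ matrix $V_j^T\hat{K}^{-1}V_j$ can be computed once and reused for every application:

```matlab
KiV = Khat \ Vj;                           % Khat^{-1} V_j, precomputable
S   = Vj' * KiV;                           % V_j' Khat^{-1} V_j, k-by-k SPD
FK  = @(t) Khat\t - KiV*(S \ (KiV'*t));    % f_K = F_K t as in (3.17)
```

This handle `FK` can be passed directly to the `ppcg_3_1a` sketch above; $F_M$ in (3.18) is built analogously from `Mhat` and $U_{j+1}$.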

4. Convergence analysis. Assuming that the subproblems (3.1a) and (3.1b) are solved exactly, we investigate in this section the convergence of the alternating variables iteration, Algorithm 3.1.

4.1. Accuracy of the deflating subspace. Relying upon the explicit updating formula (3.5), we first estimate how fast $\mathcal{R}(V_j)$ and $\mathcal{R}(U_j)$ approach $\mathcal{R}(Y_1)$ and $\mathcal{R}(X_1)$, respectively, where $\mathcal{R}(Y_1)$ and $\mathcal{R}(X_1)$ are the eigenspaces associated with the $k$ smallest eigenvalues of $KM$ and $MK$, respectively, with $Y = [Y_1, Y_2]$, $X = [X_1, X_2]$ and
$$KMY_1 = Y_1\Lambda_1^2 = Y_1\operatorname{diag}\{\lambda_1^2, \dots, \lambda_k^2\}, \quad MKX_1 = X_1\Lambda_1^2 = X_1\operatorname{diag}\{\lambda_1^2, \dots, \lambda_k^2\}.$$
Taking advantage of the equal roles of $K$ and $M$ in the Thouless minimization principle (1.7), we can focus on the accuracy of $\mathcal{R}(V_j)$ in approximating $\mathcal{R}(Y_1)$ first, and then apply the results to $\mathcal{R}(U_j)$. To this end, we use the distance $\|\sin\Theta(V_j, Y_1)_M\|_2$. For convenience, in the following discussion the subscripts are omitted, so that $V_j$ and $V_{j+1}$ are denoted simply by $V$ and $\widetilde{V}$, respectively.

By (2.9) and (2.1), for $X = [X_1, X_2]$ with $X_1 \in \mathbb{R}^{n\times k}$ and $X_2 \in \mathbb{R}^{n\times(n-k)}$, we have
$$\mu := \|\sin\Theta(V, Y_1)_M\|_2 = \|\sin\Theta(X^T V, X^T Y_1)\|_2 = \left\|\sin\Theta\left(X^T V, \begin{bmatrix} I_k \\ 0 \end{bmatrix}\right)\right\|_2 = \left\|(V^T M V)^{-1/2}V^T X\begin{bmatrix} 0 \\ I_{n-k} \end{bmatrix}\right\|_2 = \|(V^T M V)^{-1/2}V^T X_2\|_2 = \|(S^T S)^{-1/2}S_2^T\|_2, \tag{4.1}$$
where we have defined
$$S := X^T V = \begin{bmatrix} X_1^T V \\ X_2^T V \end{bmatrix} =: \begin{bmatrix} S_1 \\ S_2 \end{bmatrix}.$$
On the other hand, it follows that
$$\tilde{\mu} := \|\sin\Theta(\widetilde{V}, Y_1)_M\|_2 = \|\sin\Theta(X^T\widetilde{V}, X^T Y_1)\|_2 = \|(\widetilde{V}^T M \widetilde{V})^{-1/2}\widetilde{V}^T X_2\|_2 = \|(\widetilde{S}^T\widetilde{S})^{-1/2}\widetilde{V}^T X_2\|_2, \tag{4.2}$$
with $\widetilde{S} = X^T\widetilde{V}$. By (3.5) and (2.1), we have
$$\widetilde{V} = (KM)^{-1}V\Omega = X^{-T}\Lambda^{-2}X^T V(V^T X\Lambda^{-4}X^T V)^{-1}(V^T X\Lambda^{-2}X^T V) = X^{-T}\Lambda^{-2}S\underbrace{(S^T\Lambda^{-4}S)^{-1}(S^T\Lambda^{-2}S)}_{=:\Omega}, \tag{4.3}$$
and thus
$$\widetilde{V}^T X_2 = (S^T\Lambda^{-2}S)(S^T\Lambda^{-4}S)^{-1}S^T\Lambda^{-2}X^{-1}X_2 = (S^T\Lambda^{-2}S)(S^T\Lambda^{-4}S)^{-1}S^T\Lambda^{-2}\begin{bmatrix} 0 \\ I_{n-k} \end{bmatrix} = (S^T\Lambda^{-2}S)(S^T\Lambda^{-4}S)^{-1}V^T X_2\Lambda_2^{-2}, \tag{4.4}$$
and by (4.3),
$$\widetilde{S}^T\widetilde{S} = \widetilde{V}^T M \widetilde{V} = \Omega^T S^T\Lambda^{-2}X^{-1}MX^{-T}\Lambda^{-2}S\Omega = \Omega^T S^T\Lambda^{-4}S\Omega = (S^T\Lambda^{-2}S)(S^T\Lambda^{-4}S)^{-1}(S^T\Lambda^{-2}S). \tag{4.5}$$
Consequently, we have
$$\tilde{\mu} = \left\|[(S^T\Lambda^{-2}S)(S^T\Lambda^{-4}S)^{-1}(S^T\Lambda^{-2}S)]^{-1/2}(S^T\Lambda^{-2}S)(S^T\Lambda^{-4}S)^{-1}V^T X_2\Lambda_2^{-2}\right\|_2 = \left\|[(S^T\Lambda^{-2}S)(S^T\Lambda^{-4}S)^{-1}(S^T\Lambda^{-2}S)]^{1/2}(S^T\Lambda^{-2}S)^{-1}S_2^T\Lambda_2^{-2}\right\|_2. \tag{4.6}$$
The values $\mu$ and $\tilde{\mu}$ in (4.1) and (4.6), respectively, can be further simplified, as the following lemma shows.

Lemma 4.1. Let $\mu$ and $\tilde{\mu}$ be defined in (4.1) and (4.6), respectively. Then
$$\mu = \sqrt{\lambda_{\max}\left((S_2^T S_2)(S^T S)^{-1}\right)}, \tag{4.7}$$
$$\tilde{\mu} = \sqrt{\lambda_{\max}\left((S_2^T\Lambda_2^{-4}S_2)(S^T\Lambda^{-4}S)^{-1}\right)}, \tag{4.8}$$
where $\lambda_{\max}(\cdot)$ stands for the largest eigenvalue of its argument.
Proof. (4.7) is obvious from (4.1). For (4.8), we have from (4.6) that
$$\tilde{\mu}^2 = \left\|\Lambda_2^{-2}S_2(S^T\Lambda^{-2}S)^{-1}[(S^T\Lambda^{-2}S)(S^T\Lambda^{-4}S)^{-1}(S^T\Lambda^{-2}S)]^{1/2}\right\|_2^2$$
$$= \lambda_{\max}\left([(S^T\Lambda^{-2}S)(S^T\Lambda^{-4}S)^{-1}(S^T\Lambda^{-2}S)]^{1/2}(S^T\Lambda^{-2}S)^{-1}(S_2^T\Lambda_2^{-4}S_2)(S^T\Lambda^{-2}S)^{-1}[(S^T\Lambda^{-2}S)(S^T\Lambda^{-4}S)^{-1}(S^T\Lambda^{-2}S)]^{1/2}\right)$$
$$= \lambda_{\max}\left((S^T\Lambda^{-2}S)^{-1}(S_2^T\Lambda_2^{-4}S_2)(S^T\Lambda^{-2}S)^{-1}[(S^T\Lambda^{-2}S)(S^T\Lambda^{-4}S)^{-1}(S^T\Lambda^{-2}S)]\right)$$
$$= \lambda_{\max}\left((S^T\Lambda^{-2}S)^{-1}(S_2^T\Lambda_2^{-4}S_2)(S^T\Lambda^{-4}S)^{-1}(S^T\Lambda^{-2}S)\right)$$
$$= \lambda_{\max}\left((S_2^T\Lambda_2^{-4}S_2)(S^T\Lambda^{-4}S)^{-1}\right),$$
as asserted, where the last two equalities use the similarity invariance of the eigenvalues.
Using Lemma 4.1, we are able to give a relation between $\mu$ and $\tilde{\mu}$ in Lemma 4.2.
Lemma 4.2. Under the assumptions of Lemma 4.1, we have
$$\tilde{\mu} \le \frac{\mu}{\sqrt{\mu^2 + \zeta^4(1 - \mu^2)}}, \tag{4.9}$$
where $\zeta := \dfrac{\lambda_{k+1}}{\lambda_k} \ge 1$.
Proof. By (4.7) in Lemma 4.1 and $S^T S = S_1^T S_1 + S_2^T S_2$, we have
$$\mu^2 = \lambda_{\max}\left((S_2^T S_2)(S^T S)^{-1}\right) = \lambda_{\max}\left((S_2^T S_2)^{1/2}(S_1^T S_1 + S_2^T S_2)^{-1}(S_2^T S_2)^{1/2}\right) = \lambda_{\max}\left([I_k + (S_2^T S_2)^{-1/2}S_1^T S_1(S_2^T S_2)^{-1/2}]^{-1}\right),$$
and therefore,
$$\frac{1}{\mu^2} = \lambda_{\min}\left(I_k + (S_2^T S_2)^{-1/2}(S_1^T S_1)(S_2^T S_2)^{-1/2}\right) = 1 + \lambda_{\min}\left((S_2^T S_2)^{-1/2}(S_1^T S_1)(S_2^T S_2)^{-1/2}\right). \tag{4.10}$$
Similarly, by
$$S^T\Lambda^{-4}S = S_1^T\Lambda_1^{-4}S_1 + S_2^T\Lambda_2^{-4}S_2,$$
one has
$$\frac{1}{\tilde{\mu}^2} = \lambda_{\min}\left(I_k + (S_2^T\Lambda_2^{-4}S_2)^{-1/2}(S_1^T\Lambda_1^{-4}S_1)(S_2^T\Lambda_2^{-4}S_2)^{-1/2}\right) = 1 + \lambda_{\min}\left((S_2^T\Lambda_2^{-4}S_2)^{-1/2}(S_1^T\Lambda_1^{-4}S_1)(S_2^T\Lambda_2^{-4}S_2)^{-1/2}\right). \tag{4.11}$$
On the other hand, by the ascending order of the $\lambda_i > 0$, we have
$$(S_2^T\Lambda_2^{-4}S_2)^{-1/2}(S_1^T\Lambda_1^{-4}S_1)(S_2^T\Lambda_2^{-4}S_2)^{-1/2} \succeq \left(\frac{\lambda_{k+1}}{\lambda_k}\right)^4 (S_2^T S_2)^{-1/2}(S_1^T S_1)(S_2^T S_2)^{-1/2},$$
where $A \succeq B$ means that $A - B$ is positive semi-definite. Therefore, by (4.11) and (4.10), we have
$$\frac{1}{\tilde{\mu}^2} \ge 1 + \zeta^4\lambda_{\min}\left((S_2^T S_2)^{-1/2}(S_1^T S_1)(S_2^T S_2)^{-1/2}\right) = 1 + \zeta^4\left(\frac{1}{\mu^2} - 1\right), \tag{4.12}$$
which consequently leads to (4.9), and the proof is completed.
We can now present our convergence result for the alternating variables iteration, Algorithm 3.1, which also reveals the numerical behavior of the PPCG_lrep method.

Theorem 4.3. Suppose $\{U_j\}$ and $\{V_j\}$ are the sequences generated by Algorithm 3.1, where $U_j$ and $V_j$ for $j = 1, 2, \dots$ are the exact solutions of the subproblems (3.1a) and (3.1b), respectively. Assume $\zeta = \dfrac{\lambda_{k+1}}{\lambda_k} > 1$, and let $\sigma$ be arbitrary with $\dfrac{1}{\zeta^4} \le \sigma < 1$. Then:
(i) if $V_0$ satisfies
$$\|\sin\Theta(V_0, Y_1)_M\|_2 \le \sqrt{\frac{\zeta^4\sigma - 1}{\zeta^4\sigma - \sigma}}, \tag{4.13}$$
the sequence $\{\mathcal{R}(V_j)\}$ converges to $\mathcal{R}(Y_1)$ at least linearly, and moreover,
$$\|\sin\Theta(V_{j+1}, Y_1)_M\|_2 \le \sqrt{\sigma}\,\|\sin\Theta(V_j, Y_1)_M\|_2; \tag{4.14}$$
(ii) if $U_0$ satisfies
$$\|\sin\Theta(U_0, X_1)_K\|_2 \le \sqrt{\frac{\zeta^4\sigma - 1}{\zeta^4\sigma - \sigma}}, \tag{4.15}$$
the sequence $\{\mathcal{R}(U_j)\}$ converges to $\mathcal{R}(X_1)$ at least linearly, and moreover,
$$\|\sin\Theta(U_{j+1}, X_1)_K\|_2 \le \sqrt{\sigma}\,\|\sin\Theta(U_j, X_1)_K\|_2. \tag{4.16}$$
Proof. For (i), by (4.9) in Lemma 4.2 and for $0 < \sigma < 1$, the contraction $\tilde{\mu}^2 \le \sigma\mu^2$ holds whenever $1 \le \sigma\mu^2(1 - \zeta^4) + \sigma\zeta^4$, i.e., $1 - \sigma\zeta^4 \le \sigma\mu^2(1 - \zeta^4)$. Because $\mu^2(1 - \zeta^4) < 0$, we require $\dfrac{1}{\zeta^4} \le \sigma < 1$, and therefore we require
$$\mu \le \sqrt{\frac{\zeta^4\sigma - 1}{\zeta^4\sigma - \sigma}} < 1.$$
Now, by induction, if this condition (4.13) is fulfilled at $j = 0$, the sequence $\{\mathcal{R}(V_j)\}$ converges to $\mathcal{R}(Y_1)$ and (4.14) holds, too. Part (ii) can be proved analogously, by the symmetric structure of (1.7).
Remark 2. Theorem 4.3 implies that the larger $\zeta$ is, the smaller $\sigma$ can be chosen, and therefore the faster the convergence. Moreover, it can be seen from the initial condition (4.13) that as $\zeta$ gets large (i.e., $\sigma$ can be taken small), the basin of attraction for the local convergence becomes larger, too.

Remark 3. Note that for an arbitrary basis $(U, V)$ of $(\mathcal{R}(X_1), \mathcal{R}(Y_1))$ with $U^T V = I_k$, it is true [1, Appendix A] that $\chi(U, V) \ge \sum_{i=1}^k \lambda_i$, and strict inequality is possible. But as in [1, Appendix A], we can choose another basis $(\check{U}, \check{V})$ of $(\mathcal{R}(X_1), \mathcal{R}(Y_1))$,
$$\check{U} = U\Psi_1^{-T}\Lambda_k^{-1/2} \quad \text{and} \quad \check{V} = V\Psi_2^{-T}\Lambda_k^{1/2},$$
to have $\chi(\check{U}, \check{V}) = \sum_{i=1}^k \lambda_i$, where $\Psi_1, \Psi_2, \Lambda_k \in \mathbb{R}^{k\times k}$ are from the decompositions (by Theorem 2.1) $U^T K U = \Psi_1\Lambda_k^2\Psi_1^T$ and $V^T M V = \Psi_2\Psi_2^T$ with $\Psi_2 = \Psi_1^{-T}$ and $\Lambda_k = \operatorname{diag}\{\lambda_1, \dots, \lambda_k\}$. In other words, an arbitrary basis $(U, V)$ of the deflating subspace pair $(\mathcal{R}(X_1), \mathcal{R}(Y_1))$ with $U^T V = I_k$ may not achieve the minimum $\sum_{i=1}^k \lambda_i$ of the Thouless minimization principle (1.7); the solution to (1.7) is only a special basis of $(\mathcal{R}(X_1), \mathcal{R}(Y_1))$. Now, let $(\hat{U}, \hat{V})$ be a limit point of the sequence $\{(U_j, V_j)\}_{j=0}^{\infty}$, which by Theorem 4.3 satisfies $(\mathcal{R}(\hat{U}), \mathcal{R}(\hat{V})) = (\mathcal{R}(X_1), \mathcal{R}(Y_1))$. It might still be true that $\chi(\hat{U}, \hat{V}) > \sum_{i=1}^k \lambda_i$, but this has no impact on our final task of solving the LREP (1.5), as the projected matrix $\begin{bmatrix} & \hat{U}^T K \hat{U} \\ \hat{V}^T M \hat{V} & \end{bmatrix}$ contains the desired $k$ eigenpairs.

4.2. Accuracy of the Ritz values. With the help of Theorem 4.3, we finally consider the local convergence of the Ritz values
$$0 < \breve{\lambda}_1 \le \cdots \le \breve{\lambda}_k, \tag{4.17}$$
which are the $k$ positive eigenvalues of the matrix $H_j$ given in (3.8).
First, it is known [1, Theorem 4.1] that $0 \le \lambda_i \le \breve{\lambda}_i$ for $i = 1, 2, \dots, k$, and hence $\sum_{i=1}^k \lambda_i$ is always a lower bound for $\sum_{i=1}^k \breve{\lambda}_i$. Thus, we are interested in an upper bound for $\sum_{i=1}^k \breve{\lambda}_i$. The following Theorem 4.5 is a Rayleigh-Ritz type perturbation theory for the LREP (see [18] for other types of Rayleigh-Ritz approximations for the LREP), which is of interest in its own right and also quantifies the accuracy of $\sum_{i=1}^k \breve{\lambda}_i$ in approximating the Thouless minimum $\sum_{i=1}^k \lambda_i$ in (1.7).
Lemma 4.4 ([6, Corollary 7.7.4]). If $A, B \in \mathbb{R}^{n\times n}$ are symmetric and positive definite, then (i) $A \succeq B$ if and only if $B^{-1} \succeq A^{-1}$; (ii) if $A \succeq B$, then $\sigma_i(A) \ge \sigma_i(B)$ for all $i = 1, 2, \dots, n$, where $\sigma_i(A)$, $i = 1, 2, \dots, n$, are the eigenvalues of $A$ in increasing order.

Theorem 4.5. Let $K$ and $M$ be symmetric and positive definite, and let $(U, V)$ be an approximate basis for $(\mathcal{R}(X_1), \mathcal{R}(Y_1))$ satisfying $U^T V = I_k$ and
$$\|\sin\Theta(V, Y_1)_M\|_2 \le \epsilon < 1 \quad \text{and} \quad \|\sin\Theta(U, X_1)_K\|_2 \le \epsilon < 1. \tag{4.18}$$
Let $0 < \breve{\lambda}_1 \le \cdots \le \breve{\lambda}_k$ be the $k$ positive eigenvalues of
$$H_{\rm SR} = \begin{bmatrix} & U^T K U \\ V^T M V & \end{bmatrix} \in \mathbb{R}^{2k\times 2k}. \tag{4.19}$$
Then for any sufficiently small $\epsilon \in [0, 1)$, we have
$$0 \le \sum_{i=1}^k \breve{\lambda}_i - \sum_{i=1}^k \lambda_i \le \epsilon^2\sum_{i=1}^k\left(\lambda_i + \frac{\lambda_i^3}{\lambda_1\lambda_{k+1}}\right) + O(\epsilon^4). \tag{4.20}$$

Proof. Note from $M = XX^T$, $K = (Y\Lambda)(Y\Lambda)^T$, $Y^T X = I_n$ and $\mathcal{R}\left(\begin{bmatrix}\Lambda_1 \\ 0\end{bmatrix}\right) = \mathcal{R}\left(\begin{bmatrix}I_k \\ 0\end{bmatrix}\right)$ that
$$\sin\Theta(V, Y_1)_M = \sin\Theta(X^T V, X^T Y_1) = \sin\Theta\left(X^T V, \begin{bmatrix}I_k \\ 0\end{bmatrix}\right)$$
and
$$\sin\Theta(U, X_1)_K = \sin\Theta((Y\Lambda)^T U, (Y\Lambda)^T X_1) = \sin\Theta\left(\Lambda Y^T U, \begin{bmatrix}\Lambda_1 \\ 0\end{bmatrix}\right) = \sin\Theta\left(\Lambda Y^T U, \begin{bmatrix}I_k \\ 0\end{bmatrix}\right).$$
Thus, the assumptions (4.18) imply that
$$X^T V(V^T M V)^{-1/2} = \begin{bmatrix}I_k \\ 0\end{bmatrix}C_1 + \begin{bmatrix}0 \\ I_{n-k}\end{bmatrix}C_2, \quad \|C_2\|_2 = \|\sin\Theta(V, Y_1)_M\|_2 \le \epsilon,$$
$$\Lambda Y^T U(U^T K U)^{-1/2} = \begin{bmatrix}I_k \\ 0\end{bmatrix}C_3 + \begin{bmatrix}0 \\ I_{n-k}\end{bmatrix}C_4, \quad \|C_4\|_2 = \|\sin\Theta(U, X_1)_K\|_2 \le \epsilon,$$
with $C_1^T C_1 + C_2^T C_2 = C_3^T C_3 + C_4^T C_4 = I_k$. This, by $Y^T X = I_n$, $X = [X_1, X_2]$ and $Y = [Y_1, Y_2]$, leads to
$$V = Y_1 C_1(V^T M V)^{1/2} + Y_2 C_2(V^T M V)^{1/2} = Y\hat{C}(V^T M V)^{1/2}, \tag{4.21}$$
$$U = X_1\Lambda_1^{-1}C_3(U^T K U)^{1/2} + X_2\Lambda_2^{-1}C_4(U^T K U)^{1/2} = X\Lambda^{-1}\tilde{C}(U^T K U)^{1/2}, \tag{4.22}$$
where $\hat{C} = [C_1^T, C_2^T]^T \in \mathbb{R}^{n\times k}$ and $\tilde{C} = [C_3^T, C_4^T]^T \in \mathbb{R}^{n\times k}$, both with orthonormal columns.
Since $U^T V = I_k$, by the Thouless minimization principle (1.7), we have
$$\sum_{i=1}^k \breve{\lambda}_i = \min_{P_1^T P_2 = I_k,\ P_1, P_2 \in \mathbb{R}^{k\times k}} \frac{1}{2}\operatorname{tr}(P_1^T U^T K U P_1 + P_2^T V^T M V P_2); \tag{4.23}$$
on the other hand, $\sum_{i=1}^k \breve{\lambda}_i$ is the sum of the $k$ positive eigenvalues of $H_{\rm SR}$ given in (4.19). It has been shown [1, Theorem 2.9] that the Ritz values $\pm\breve{\lambda}_i$ for $i = 1, 2, \dots, k$ are invariant with respect to the choice of basis $(\breve{U}, \breve{V})$ of $(\mathcal{R}(U), \mathcal{R}(V))$ as long as $\breve{U}^T\breve{V} = I_k$. Thus, we choose a new and special basis $(\breve{U}, \breve{V})$ for $(\mathcal{R}(U), \mathcal{R}(V))$ given by
$$\breve{V} = Y\hat{C}C_1^{-1}\Lambda_1^{1/2}, \quad \breve{U} = X\Lambda^{-1}\tilde{C}(\hat{C}^T\Lambda^{-1}\tilde{C})^{-1}C_1^T\Lambda_1^{-1/2}, \tag{4.24}$$
which satisfies $\breve{U}^T\breve{V} = I_k$ and $(\mathcal{R}(\breve{U}), \mathcal{R}(\breve{V})) = (\mathcal{R}(U), \mathcal{R}(V))$. The nonsingularity of the matrix $\hat{C}^T\Lambda^{-1}\tilde{C}$ follows from (4.21), (4.22) and $U^T V = I_k$. With this choice of basis, it follows from (4.23) that
$$\sum_{i=1}^k \lambda_i \le \sum_{i=1}^k \breve{\lambda}_i \le \frac{1}{2}\operatorname{tr}(\breve{U}^T K\breve{U} + \breve{V}^T M\breve{V}), \tag{4.25}$$
and the conclusion (4.20) follows by establishing an upper bound for $\frac{1}{2}\operatorname{tr}(\breve{U}^T K\breve{U} + \breve{V}^T M\breve{V})$.
To this end, we first notice, by using $\hat{C}^T\hat{C} = I_k$ and $\|C_2\|_2 \le \epsilon$, that
$$\|(C_1 C_1^T)^{-1}\|_2 = \|(C_1^T C_1)^{-1}\|_2 = \|(I_k - C_2^T C_2)^{-1}\|_2 \le \frac{1}{1 - \|C_2\|_2^2} \le \frac{1}{1 - \epsilon^2},$$
and thus
$$\breve{V}^T M\breve{V} = \Lambda_1^{1/2}C_1^{-T}\hat{C}^T Y^T M Y\hat{C}C_1^{-1}\Lambda_1^{1/2} = \Lambda_1^{1/2}(C_1 C_1^T)^{-1}\Lambda_1^{1/2} \preceq \frac{1}{1 - \epsilon^2}\Lambda_1.$$
Therefore, by Lemma 4.4,
$$\operatorname{tr}(\breve{V}^T M\breve{V}) \le \frac{\sum_{i=1}^k \lambda_i}{1 - \epsilon^2} = \sum_{i=1}^k \lambda_i + \epsilon^2\sum_{i=1}^k \lambda_i + O(\epsilon^4). \tag{4.26}$$
For $\operatorname{tr}(\breve{U}^T K\breve{U})$, we have
$$\breve{U}^T K\breve{U} = \Lambda_1^{-1/2}\left(C_1^{-T}\hat{C}^T\Lambda^{-1}\tilde{C}\tilde{C}^T\Lambda^{-1}\hat{C}C_1^{-1}\right)^{-1}\Lambda_1^{-1/2}. \tag{4.27}$$
Consider the middle matrix in (4.27); we have
$$C_1^{-T}\hat{C}^T\Lambda^{-1}\tilde{C}\tilde{C}^T\Lambda^{-1}\hat{C}C_1^{-1} = \Lambda_1^{-1}C_3 C_3^T\Lambda_1^{-1} + C_1^{-T}C_2^T\Lambda_2^{-1}C_4 C_4^T\Lambda_2^{-1}C_2 C_1^{-1} + C_1^{-T}C_2^T\Lambda_2^{-1}C_4 C_3^T\Lambda_1^{-1} + \Lambda_1^{-1}C_3 C_4^T\Lambda_2^{-1}C_2 C_1^{-1}.$$
Note that
$$(1 - \epsilon^2)\Lambda_1^{-2} \preceq \Lambda_1^{-1}C_3 C_3^T\Lambda_1^{-1} \preceq \Lambda_1^{-2}, \qquad 0 \preceq C_1^{-T}C_2^T\Lambda_2^{-1}C_4 C_4^T\Lambda_2^{-1}C_2 C_1^{-1},$$
and, by $\|C_1^{-1}\|_2 \le \frac{1}{\sqrt{1-\epsilon^2}}$ and $\|C_3\|_2 \le 1$,
$$\left\|C_1^{-T}C_2^T\Lambda_2^{-1}C_4 C_3^T\Lambda_1^{-1} + \Lambda_1^{-1}C_3 C_4^T\Lambda_2^{-1}C_2 C_1^{-1}\right\|_2 \le \frac{2\epsilon^2}{\lambda_1\lambda_{k+1}}\cdot\frac{\sqrt{1+\epsilon^2}}{\sqrt{1-\epsilon^2}},$$
implying
$$-\frac{2\epsilon^2}{\lambda_1\lambda_{k+1}}\cdot\frac{\sqrt{1+\epsilon^2}}{\sqrt{1-\epsilon^2}}\,I_k \preceq C_1^{-T}C_2^T\Lambda_2^{-1}C_4 C_3^T\Lambda_1^{-1} + \Lambda_1^{-1}C_3 C_4^T\Lambda_2^{-1}C_2 C_1^{-1}.$$
Therefore, for any sufficiently small $\epsilon$,
$$C_1^{-T}\hat{C}^T\Lambda^{-1}\tilde{C}\tilde{C}^T\Lambda^{-1}\hat{C}C_1^{-1} \succeq (1 - \epsilon^2)\Lambda_1^{-2} - \frac{2\epsilon^2}{\lambda_1\lambda_{k+1}}\cdot\frac{\sqrt{1+\epsilon^2}}{\sqrt{1-\epsilon^2}}\,I_k \succ 0,$$
which by Lemma 4.4 and (4.27) yields
$$0 \preceq \breve{U}^T K\breve{U} \preceq \Lambda_1^{-1/2}\left((1 - \epsilon^2)\Lambda_1^{-2} - \frac{2\epsilon^2}{\lambda_1\lambda_{k+1}}\cdot\frac{\sqrt{1+\epsilon^2}}{\sqrt{1-\epsilon^2}}\,I_k\right)^{-1}\Lambda_1^{-1/2}. \tag{4.28}$$
Note that for any sufficiently small $\epsilon$,
$$\frac{\sqrt{1+\epsilon^2}}{\sqrt{1-\epsilon^2}} = 1 + \epsilon^2 + O(\epsilon^4).$$
Then, for any $i = 1, 2, \dots, k$,
$$\lambda_i^{-1/2}\left((1 - \epsilon^2)\lambda_i^{-2} - \frac{2\epsilon^2}{\lambda_1\lambda_{k+1}}\cdot\frac{\sqrt{1+\epsilon^2}}{\sqrt{1-\epsilon^2}}\right)^{-1}\lambda_i^{-1/2} \tag{4.29}$$
$$= \lambda_i\cdot\left(1 - \epsilon^2\left(1 + \frac{2\lambda_i^2}{\lambda_1\lambda_{k+1}}\cdot\frac{\sqrt{1+\epsilon^2}}{\sqrt{1-\epsilon^2}}\right)\right)^{-1} \tag{4.30}$$
$$= \lambda_i\cdot\left(1 + \epsilon^2\left(1 + \frac{2\lambda_i^2}{\lambda_1\lambda_{k+1}}\cdot\frac{\sqrt{1+\epsilon^2}}{\sqrt{1-\epsilon^2}}\right) + O(\epsilon^4)\right) \tag{4.31}$$
$$= \lambda_i\cdot\left(1 + \epsilon^2\left(1 + \frac{2\lambda_i^2}{\lambda_1\lambda_{k+1}}(1 + \epsilon^2)\right)\right) + O(\epsilon^4), \tag{4.32}$$
and, according to Lemma 4.4 again, (4.28) gives
$$\operatorname{tr}(\breve{U}^T K\breve{U}) \le \sum_{i=1}^k \lambda_i + \epsilon^2\sum_{i=1}^k\left(\lambda_i + \frac{2\lambda_i^3}{\lambda_1\lambda_{k+1}}\right) + O(\epsilon^4),$$
which, combined with (4.26), leads to (4.20).
Theorem 4.5 reveals that, as for the Hermitian eigenvalue problem, the accuracy of the sum of the $k$ positive Ritz values is proportional to the square of the accuracy of the deflating subspaces. Since $\|\sin\Theta(V, Y_1)_M\|_2$ and $\|\sin\Theta(U, X_1)_K\|_2$ converge to zero linearly with factor $\sqrt{\sigma}$ (by Theorem 4.3), it follows that $\sum_{i=1}^k \breve{\lambda}_i$ converges to $\sum_{i=1}^k \lambda_i$ with factor $\sigma$.

Corollary 1. Suppose $\{U_j\}$ and $\{V_j\}$ are the sequences generated by Algorithm 3.1 with $U_0$ and $V_0$ satisfying (4.15) and (4.13), respectively. Assume $\zeta = \dfrac{\lambda_{k+1}}{\lambda_k} > 1$, and let $\sigma$ be arbitrary with $\dfrac{1}{\zeta^4} \le \sigma < 1$. Then for sufficiently large $j$, the $k$ positive eigenvalues $\breve{\lambda}_i$, $i = 1, 2, \dots, k$, of $H_j$ defined by (3.8) satisfy
$$0 \le \sum_{i=1}^k \breve{\lambda}_i - \sum_{i=1}^k \lambda_i \le \sigma^j\sum_{i=1}^k\left(\lambda_i + \frac{\lambda_i^3}{\lambda_1\lambda_{k+1}}\right) + O(\sigma^{2j}). \tag{4.33}$$
Proof. According to Theorem 4.3, we know that
$$\|\sin\Theta(V_j, Y_1)_M\|_2 \le \sqrt{\sigma}^{\,j}\,\|\sin\Theta(V_0, Y_1)_M\|_2 \le \sqrt{\sigma}^{\,j},$$
$$\|\sin\Theta(U_j, X_1)_K\|_2 \le \sqrt{\sigma}^{\,j}\,\|\sin\Theta(U_0, X_1)_K\|_2 \le \sqrt{\sigma}^{\,j}.$$
Now use Theorem 4.5 to conclude (4.33) for any sufficiently large $j$.
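The linear decay predicted by Theorem 4.3 is easy to observe numerically. The following MATLAB sketch (our illustration, under the assumption of exact subproblem solves) runs the alternating iteration (3.3)-(3.4) on a small random LREP and tracks $\|\sin\Theta(V_j, Y_1)_M\|_2$ via (2.9), with $X$ taken as the Cholesky factor of $M$:

```matlab
n = 100; k = 3;
K = randn(n); K = K'*K + eye(n);  M = randn(n); M = M'*M + eye(n);
X = chol(M)';                              % M = X*X', for the M-angles (2.9)
[Yf, D] = eig(K*M); [~, p] = sort(real(diag(D)));
Y1 = real(Yf(:, p(1:k)));                  % eigenspace of k smallest eig of KM
B  = orth(X'*Y1);
V  = eye(n, k); U = V; mu = zeros(20, 1);
for j = 1:20
    U = K \ V;  U = U / (V'*U);            % (3.3)
    V = M \ U;  V = V / (U'*V);            % (3.4)
    A = orth(X'*V);
    mu(j) = sqrt(1 - min(1, min(svd(B'*A)))^2);  % ||sin Theta(V_j,Y_1)_M||_2
end
semilogy(mu)   % roughly a straight line: linear decay, slope governed by sqrt(sigma)
```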

5. Numerical experiments. In this section, we evaluate our proposed PPCG_lrep method from several aspects. In the next subsection, we first show that PPCG_lrep is a much more efficient implementation of the inverse power iteration (3.5)-(3.6) than the straightforward one, in which the linear systems in (3.5) and (3.6) are solved by the preconditioned CG (PCG); our next goal is to test several types of preconditioners for $K$ and $M$. Both of these tasks are carried out in MATLAB (R2011b) on made-up LREPs with randomly generated dense matrices and with sparse matrices from the University of Florida Collection [4]. Our final goal is to evaluate the parallelization capability of PPCG_lrep when the columns of $U_j$ and $V_j$ are computed in parallel. For that purpose, we code PPCG_lrep in the C language with the inner Algorithms 3.2 and 3.3 parallelized by OpenMP, and test two practical problems arising from TDDFT. All of our tests are conducted on a PC under Linux with an Intel Core i5-3230M CPU (2.6 GHz) and 4 GB memory.
5.1. Numerical tests for dense random and sparse made-up problems. Our reported numerical experiments with PPCG_lrep (using the block implementation via Algorithms 3.4 and 3.5 to compute $(U_j, V_j)$) are based on several types of preconditioners, namely Iden, Diag, Chol, IChol and SSOR. Specifically, Iden and Diag are diagonal preconditioners using the identity $I_n$ and the diagonal matrices of $K$ and $M$, respectively; Chol and IChol are triangular preconditioners resulting from the Cholesky factorization in the dense case and the incomplete Cholesky factorization in the sparse case, respectively; SSOR uses the symmetric successive over-relaxation method as preconditioner. For the stopping criterion, we terminate Algorithm 3.1 whenever either (3.9) or (3.10) is satisfied, where both $\epsilon_r$ and $\epsilon_\chi$ are set to $10^{-6}$. The initial $U_0$ and $V_0$ are simply $[e_1, \dots, e_k]$, and the maximum numbers of outer and inner iterations are chosen to be 100 and 500, respectively. In addition, we vary the tolerance (cg_tol) in the inner CG from $10^{-5}$ to $10^{-10}$ in order to investigate the overall performance of PPCG_lrep with approximate solutions of (3.1a) and (3.1b) of various accuracies.
For our first test example, we use a pair of matrices $(K, M)$ of order $n = 500$ generated randomly by
K = randn(n); K = K'*K; M = randn(n); M = M'*M;     (5.1)
in MATLAB. Our goal is to compute the first 4 smallest positive eigenvalues $0 < \lambda_1 \le \lambda_2 \le \lambda_3 \le \lambda_4$. In Table 5.1 below, we compare the performance of PPCG_lrep with the straightforward approach of solving (3.3) and (3.4) by PCG.

Table 5.1. The CPU time (s) for the case n = 500 and k = 4

            PCG for (3.3) and (3.4)   |            PPCG_lrep
cg_tol      Iden     Diag     Chol    |  Iden   Diag   Chol   SSOR
10^-10      78.31    350.15   3.06    |  6.60   5.92   0.28   27.21
10^-9       74.53    342.30   2.61    |  5.99   4.74   0.22   22.09
10^-8       74.22    345.08   3.06    |  5.31   3.94   0.21   17.72
10^-7       74.38    339.19   3.48    |  4.55   3.12   0.20   13.35
10^-6       73.48    338.52   3.62    |  3.84   2.14   0.19   9.12
10^-5       72.76    334.73   2.94    |  3.08   1.49   0.16   3.53

The numerical results listed in Table 5.1 clearly indicate that PPCG_lrep is a much more efficient implementation of (3.3) and (3.4) than the direct way via PCG. For more information about the numerical results, in Table 5.2 we report the relative eigenvalue errors (RESeig), the total number of inner CG iterations (inner iter.) and the number of outer loop iterations (outer iter.) of PPCG_lrep. The relative error RESeig measures the accuracy of the sum of the eigenvalues associated with the approximate eigenvalues (Ritz values) $\breve{\lambda}_i^{(j)}$ for $i = 1, \dots, k$ at the $j$th iteration and is defined as
$$\mathrm{RESeig} = \frac{\left|\sum_{i=1}^k \lambda_i^{\rm exact} - \sum_{i=1}^k \breve{\lambda}_i^{(j)}\right|}{\sum_{i=1}^k \breve{\lambda}_i^{(j)}},$$
where the $\lambda_i^{\rm exact}$ for $i = 1, \dots, k$ are computed by the MATLAB function eig applied to the matrix $\boldsymbol{H}$ given in (1.4). We observe from Table 5.2 that
(1) the number of outer loop iterations remains roughly the same (about 25) for the different preconditioners, but the efficiency in terms of CPU time and the number of inner CG steps depends strongly on the specific preconditioner, and
(2) the relative eigenvalue error RESeig (i.e., the accuracy of the outer loop) decreases when the relevant systems are solved more accurately by the

Table 5.2. Numerical results of PPCG_lrep for n = 500 and k = 4

cg_tol                 Iden        Diag        Chol        SSOR
10^-10   RESeig        4.29e-09    2.33e-07    1.72e-08    2.43e-07
         outer iter.   25          25          25          25
         inner iter.   14831       11484       50          8232
10^-9    RESeig        5.77e-09    2.55e-06    1.72e-08    2.37e-06
         outer iter.   25          21          25          25
         inner iter.   14321       9500        50          6651
10^-8    RESeig        5.30e-08    2.37e-05    5.39e-08    1.93e-05
         outer iter.   25          21          25          21
         inner iter.   12699       7913        47          5337
10^-7    RESeig        5.33e-07    2.68e-04    5.27e-07    1.91e-04
         outer iter.   25          17          25          21
         inner iter.   10906       6151        41          4004
10^-6    RESeig        5.38e-06    3.60e-03    5.16e-06    2.25e-03
         outer iter.   21          13          25          17
         inner iter.   9114        4222        35          2783
10^-5    RESeig        5.87e-05    1.87e-02    5.05e-05    1.11e-01
         outer iter.   21          13          21          9
         inner iter.   7370        2930        29          1057

inner loop CG; in particular, for Iden and Chol, the relative errors RESeig are roughly of the same order as the tolerance cg_tol of the inner CG.
In Figure 5.1, we further demonstrate the convergence history of the relative error for each individual eigenvalue $\breve{\lambda}_i^{(j)}$, $i = 1, 2, 3, 4$, with respect to the outer loop iteration $j$. It is observed that the various preconditioners perform differently for $\breve{\lambda}_1^{(j)}$ and $\breve{\lambda}_2^{(j)}$, which usually converge faster than $\breve{\lambda}_3^{(j)}$ and $\breve{\lambda}_4^{(j)}$.
Next, we test PPCG_lrep on two sparse problems, where the matrices $K$ and $M$ are from the University of Florida Sparse Matrix Collection [4]. The features of these matrices are presented in Table 5.3; when the two matrices from the collection have different dimensions, we extract the leading principal submatrix of the larger one so that $K$ and $M$ have equal size. Moreover, in this test we set the inner CG tolerance cg_tol $= 10^{-8}$ and compute the first 4 smallest positive eigenvalues. The computation outputs of PPCG_lrep are listed in Table 5.4, and we observe that IChol is a good preconditioner for PPCG_lrep on these test problems.

Table 5.3. The sparse matrices K and M

Problem          n        K          M
SPARSE TEST 1    10001    bloweybq   ted_B
SPARSE TEST 2    9604     fv1        finan512

As the final part of this subsection, we extend our numerical evaluation of PPCG_lrep by comparing it with another gradient-type method, namely the block 4-D steepest descent algorithm (B4dSD) proposed in [10]. In this test, we set the tolerances cg_tol $= 10^{-6}$ and $\epsilon_r = \epsilon_\chi = 10^{-8}$ for PPCG_lrep, while we terminate B4dSD whenever the number of iterations exceeds 5000, or each of the relative residuals associated with (1.5) at the approximate solution $(\breve{\lambda}_i, \breve{z}_i)$ satisfies
$$\frac{\|\boldsymbol{H}\breve{z}_i - \breve{\lambda}_i\breve{z}_i\|_1}{\|\boldsymbol{H}\|_1\|\breve{z}_i\|_1 + |\breve{\lambda}_i|\,\|\breve{z}_i\|_1} \le 10^{-8}, \quad i = 1, 2, \dots, k.$$

[Figure 5.1. Relative error of $\breve{\lambda}_i^{(j)}$ w.r.t. the outer loop iteration $j$ for the case $n = 500$ and $k = 4$: four semilogarithmic panels, one per eigenvalue $\lambda_1, \dots, \lambda_4$, plotting the relative eigenvalue error against the outer iteration count for the preconditioners Iden, Diag, Chol and SSOR.]

Table 5.4. Numerical results of PPCG_lrep for sparse problems

Problem                        Iden      Diag      IChol    SSOR
SPARSE TEST 1   CPU time(s)    157.10    120.77    0.86     75.80
                outer iter.    69        49        9        13
                inner iter.    18926     12769     22       2120
SPARSE TEST 2   CPU time(s)    73.79     52.19     55.46    78.38
                outer iter.    473       477       473      473
                inner iter.    8012      4599      1076     2432

To demonstrate the performance, besides the test matrices of the form (5.1), we use another kind of random symmetric and positive definite matrices,

[Q,~] = qr(randn(n)); D = diag(rand(n,1) + 0.01); K = Q'*D*Q;     (5.2)
for $K$, and similarly for $M$. Tables 5.5 and 5.6 summarize parts of our experiments for (5.1) and (5.2), respectively. Note that for B4dSD, various types of preconditioners, including the identity (Iden), the Cholesky (Chol) and the CG (CG) preconditioners, are used for testing, and we refer to [10] for the details of B4dSD as well as of the preconditioning technique. One can see that PPCG_lrep can be more efficient on these randomly generated problems. In particular, for the type (5.1), the condition numbers of $K$ and $M$ are generally large, and PPCG_lrep achieves more accurate solutions (with smaller RESeig) than B4dSD.

Table 5.5. Numerical results of PPCG_lrep and B4dSD for the case (5.1)

n      cond(K)     k                  PPCG_lrep                        B4dSD
                                  Iden      Diag      Chol      Iden      Chol      CG
1000   1.57e+07    4  CPU time(s) 11.07     7.23      0.32      44.35     0.12      76.33
                      RESeig      3.49e-06  8.02e-03  1.21e-06  8.90e-01  7.34e-04  1.98e-04
                      outer iter. 9         9         9         5000      3         504
                      inner iter. 6027      3526      7         -         -         -
                   10 CPU time(s) 57.48     18.94     1.23      102.43    0.15      68.64
                      RESeig      5.67e-07  8.80e-04  5.12e-06  5.29e-01  1.22e-02  2.54e-02
                      outer iter. 33        17        25        5000      3         189
                      inner iter. 19932     5735      41        -         -         -
1500   4.98e+09    4  CPU time(s) 31.55     17.26     0.78      86.22     0.26      256.26
                      RESeig      9.65e-06  9.40e-03  4.62e-06  9.45e-01  1.82e-03  3.94e-02
                      outer iter. 9         9         9         5000      3         791
                      inner iter. 8207      4333      9         -         -         -
                   10 CPU time(s) 175.69    65.56     2.44      143.45    0.30      146.67
                      RESeig      8.59e-07  1.65e-03  3.53e-06  7.36e-01  1.53e-02  1.56e-02
                      outer iter. 29        17        25        5000      3         186
                      inner iter. 30099     10075     37        -         -         -
2000   3.87e+07    4  CPU time(s) 141.18    53.60     1.84      145.82    0.57      766.27
                      RESeig      4.42e-05  2.67e-01  1.08e-05  9.63e-01  1.95e-02  8.04e-02
                      outer iter. 17        13        13        5000      3         1361
                      inner iter. 21875     7925      13        -         -         -
                   10 CPU time(s) 507.23    150.00    8.05      243.64    0.59      968.64
                      RESeig      3.13e-06  7.88e-03  7.65e-06  8.32e-01  1.39e-02  3.39e-03
                      outer iter. 37        17        41        5000      3         430
                      inner iter. 51725     14066     67        -         -         -

Table 5.6. Numerical results of PPCG_lrep and B4dSD for the case (5.2)

n      cond(K)    k                  PPCG_lrep                        B4dSD
                                 Iden      Diag      Chol      Iden      Chol      CG
1000   98.98      4  CPU time(s) 1.07      1.30      1.46      15.44     1.13      9.76
                     RESeig      9.49e-04  5.35e-04  3.48e-05  8.97e-05  7.88e-05  6.95e-05
                     outer iter. 29        29        37        1820      66        66
                     inner iter. 514       595       61        -         -         -
                  10 CPU time(s) 1.10      1.45      1.22      12.06     0.86      11.90
                     RESeig      6.55e-04  2.57e-04  2.11e-05  5.40e-05  3.47e-04  2.35e-04
                     outer iter. 21        21        25        846       33        33
                     inner iter. 326       385       41        -         -         -
1500   99.47      4  CPU time(s) 1.41      1.93      3.89      27.96     2.45      20.91
                     RESeig      2.75e-03  1.43e-03  1.12e-04  4.23e-05  4.92e-05  4.16e-05
                     outer iter. 21        25        45        1626      64        65
                     inner iter. 302       404       77        -         -         -
                  10 CPU time(s) 2.27      2.76      3.43      24.54     2.11      30.83
                     RESeig      1.00e-03  5.70e-04  5.79e-05  1.75e-03  5.43e-03  2.69e-03
                     outer iter. 21        21        33        852       40        39
                     inner iter. 321       376       55        -         -         -
2000   84.74      4  CPU time(s) 2.96      3.54      12.78     62.44     5.00      44.36
                     RESeig      7.96e-03  6.55e-03  1.55e-03  2.28e-03  1.85e-03  1.97e-03
                     outer iter. 25        25        85        2128      78        80
                     inner iter. 378       460       157       -         -         -
                  10 CPU time(s) 4.54      5.91      7.50      46.85     3.93      57.14
                     RESeig      9.35e-04  4.74e-04  4.61e-05  4.33e-04  1.42e-04  1.43e-04
                     outer iter. 25        29        41        960       42        42
                     inner iter. 396       488       69        -         -         -

5.2. Numerical tests for two practical LREPs in parallel computation. In this subsection, we evaluate the efficiency of the parallel implementation of PPCG_lrep, where the columns of $U_j$ and $V_j$ are computed in parallel; this corresponds to implementing PPCG_lrep with Algorithms 3.2 and 3.3 instead of their block versions, Algorithms 3.4 and 3.5, respectively. For that purpose, we coded the two versions (i.e., the block version PPCGb_lrep based on Algorithms 3.4 and 3.5, and the parallel version PPCGp_lrep based on Algorithms 3.2 and 3.3) in the C language. For comparison purposes, we only utilize the generic reference BLAS (Reference BLAS, version 3.5.0) instead of an optimized implementation such as MKL or ATLAS. The block version PPCGb_lrep mainly uses level-3 BLAS operations. For PPCGp_lrep, we parallelize (using 4 cores) the $k$ calls of Algorithms 3.2 and 3.3 by OpenMP in each outer loop iteration.
Both algorithms are applied to solve two LREPs for computing the optical spectra of the Na2 sodium cluster and of silane (SiH4). The test matrices are obtained from the plane-wave pseudopotential turbo_TDDFT code, which is part of the Quantum ESPRESSO (QE) package. In particular, the dimensions of the Na2 and SiH4 problems are 1864 and 5660, respectively, and we set the stopping tolerances $\epsilon_r = \epsilon_\chi = 10^{-8}$ for both versions to compute the first $k = 4$ smallest positive eigenvalues. The numerical results are displayed in Table 5.7. Although these are preliminary numerical tests of PPCGp_lrep, they show improvements over the block version PPCGb_lrep, which is based on level-3 BLAS operations.

Table 5.7. Numerical results of PPCGp_lrep and PPCGb_lrep

Problem                    PPCGp_lrep            PPCGb_lrep
                        Iden       Diag       Iden       Diag
Na2    CPU time(s)      17.53      15.90      29.48      27.64
       RESeig           1.64e-06   1.72e-06   1.64e-06   1.67e-06
       outer iter.      73         73         73         73
       inner iter.      1818       1533       2411       2048
SiH4   CPU time(s)      105.94     77.08      142.84     103.59
       RESeig           3.20e-06   3.10e-06   3.11e-06   3.04e-06
       outer iter.      121        121        121        121
       inner iter.      1113       722        1055       677

6. Conclusions. Relying upon the subspace version of the Thouless minimization principle for the linear response eigenvalue problem, in this paper we have introduced an alternating block (between $U$ and $V$) minimization scheme to compute the desired eigenpairs characterized by the Thouless minimization principle. The connection of this alternating scheme with the inverse power iteration facilitates the convergence analysis. To make the scheme numerically efficient, we further formulated the computation of each iteration as a projected preconditioned CG step, which can be implemented either in a block version rich in level-3 BLAS operations or in parallel. Preliminary numerical experiments are reported, demonstrating the behavior of the method on made-up LREPs as well as on two practical problems from TDDFT.

Acknowledgements. The authors would like to thank Dr. Junqi Hu at Shanghai University of Finance and Economics for discussions on time-dependent density functional theory for excited-state calculations. They also thank the anonymous referees for their comments and suggestions.

REFERENCES
[1] Z. Bai and R.-C. Li, Minimization principles for the linear response eigenvalue problem I: Theory, SIAM J. Matrix Anal. Appl., 33 (2012), 1075-1100.
[2] Z. Bai and R.-C. Li, Minimization principles for the linear response eigenvalue problem II: Computation, SIAM J. Matrix Anal. Appl., 34 (2013), 392-416.
[3] J. Brabec, L. Lin, M. Shao, N. Govind, C. Yang, Y. Saad and E. G. Ng, Efficient algorithms for estimating the absorption spectrum within linear response TDDFT, J. Chem. Theory Comput., 11 (2015), 5197-5208.
[4] T. Davis and Y. Hu, The University of Florida sparse matrix collection, ACM Trans. Math. Softw., 38 (2011), 1-25.
[5] U. Flaschka, W.-W. Lin and J.-L. Wu, A KQZ algorithm for solving linear-response eigenvalue equations, Linear Algebra Appl., 165 (1992), 93-123.
[6] R. A. Horn and C. R. Johnson, Matrix Analysis, Cambridge University Press, Cambridge, 1985.
[7] A. V. Knyazev and M. E. Argentati, Principal angles between subspaces in an A-based scalar product: Algorithms and perturbation estimates, SIAM J. Matrix Anal. Appl., 23 (2002), 2008-2040.
[8] T. Li, R.-C. Li and W.-W. Lin, A symmetric structure-preserving ΓQR algorithm for linear response eigenvalue problems, Linear Algebra Appl., 520 (2017), 191-214.
[9] J. Nocedal and S. J. Wright, Numerical Optimization, 2nd edition, Springer, 2006.
[10] D. Rocca, Z. Bai, R.-C. Li and G. Galli, A block variational procedure for the iterative diagonalization of non-Hermitian random-phase approximation matrices, J. Chem. Phys., 136 (2012), 034111.
[11] D. Rocca, R. Gebauer, Y. Saad and S. Baroni, Turbo charging time-dependent density-functional theory with Lanczos chains, J. Chem. Phys., 128 (2008), 928-934.
[12] E. Runge and E. K. U. Gross, Density-functional theory for time-dependent systems, Phys. Rev. Lett., 52 (1984), 997-1000.
[13] Z. Teng and R.-C. Li, Convergence analysis of Lanczos-type methods for the linear response eigenvalue problem, J. Comput. Appl. Math., 247 (2013), 17-33.
[14] Z. Teng, Y. Zhou and R.-C. Li, A block Chebyshev-Davidson method for linear response eigenvalue problems, Adv. Comput. Math., 42 (2016), 1103-1128.
[15] D. J. Thouless, Vibrational states of nuclei in the random phase approximation, Nuclear Physics, 22 (1961), 78-95.
[16] D. J. Thouless, The Quantum Mechanics of Many-Body Systems, Academic Press, 1972.
[17] L.-H. Zhang, W.-W. Lin and R.-C. Li, Backward perturbation analysis and residual-based error bounds for the linear response eigenvalue problem, BIT, 55 (2015), 869-896.
[18] L.-H. Zhang, J. Xue and R.-C. Li, Rayleigh-Ritz approximation for the linear response eigenvalue problem, SIAM J. Matrix Anal. Appl., 35 (2014), 765-782.

Received February 2017; 1st revision May 2018; final revision August 2018.

E-mail address: [email protected]
E-mail address: [email protected]
E-mail address: [email protected]