arXiv:1605.08771v1 [math.NA] 27 May 2016

Enhancing the Performance and Robustness of the FEAST Eigensolver

Brendan Gavin and Eric Polizzi
Department of Electrical and Computer Engineering,
University of Massachusetts - Amherst, MA 01003, USA
E-mail: [email protected], [email protected]

Abstract—The FEAST algorithm is a subspace iteration method that uses a spectral projector as a rational filter in order to efficiently solve interior eigenvalue problems in parallel. Although the solutions converge rapidly in many cases, the convergence of the FEAST algorithm can be slow when the eigenvalues of a matrix are densely populated near the edges of the search interval, which can be detrimental in situations of interest for parallel load balancing. This work introduces two modifications of the FEAST algorithm that allow one to improve the convergence and robustness of the method in these situations without having to increase the amount of computation. Selected numerical examples are presented and discussed.

I. INTRODUCTION

FEAST [1], [2] is a subspace iteration algorithm for solving eigenvalue problems

Ax = λBx,

by finding the eigenvectors x whose eigenvalues λ lie in some interval I = (λmin, λmax) of the user's choosing. In this paper we consider the Hermitian standard eigenvalue problem for simplicity (i.e. A = A^H and B = I), but the FEAST algorithm can be extended straightforwardly to the generalized and non-Hermitian eigenvalue problems as well [3].

FEAST belongs to the broader family of contour integration eigensolvers [4], [5], [6], [7], but it can also be accurately described as an optimal subspace iteration procedure. A conventional subspace iteration consists of multiplying a trial subspace by the matrix A and then orthogonalizing it with the Rayleigh-Ritz procedure; this process is repeated iteratively until the subspace converges. The FEAST algorithm operates similarly, but rather than multiplying the trial subspace by A, one instead multiplies the trial subspace by the spectral projector ρ(A),

ρ(A) = 1/(2πi) ∮_C (zI − A)^(−1) dz,   (1)

where C is a closed contour in the complex plane that exactly encloses the interval I. Applied to a real number λ, the function ρ(λ) is a filter function that returns 1 when λ ∈ I and 0 otherwise. As a result, the matrix ρ(A) ∈ R^(n×n) is a spectral projector whose image is the subspace that is spanned by only those eigenvectors of A whose eigenvalues lie in I; multiplication of a vector by ρ(A) projects it onto that subspace, and in this way the FEAST algorithm finds only the eigenvectors whose eigenvalues lie in I. The algorithm is outlined in the Appendix.

The contour integral

ρ(A)X = 1/(2πi) ∮_C (zI − A)^(−1) X dz   (2)

has no general analytical expression, so in practice the multiplication of a matrix X by ρ(A) is approximated by using some numerical integration quadrature rule,

ρ(A)X ≈ Σ_{i=1}^{nc} ωi (zi I − A)^(−1) X,   (3)

where nc is the number of quadrature points, and each term (zi I − A)^(−1) X is found by using a linear system solver with the column vectors of the matrix X as the right hand sides of the linear system. A variety of quadrature rules are possible; in this work we use Gauss quadrature.

The benefits of using FEAST over a traditional subspace iteration technique are twofold. The first benefit is that, by finding only the eigenvectors whose eigenvalues lie in a certain interval, eigenvalue problems can be solved in parallel by solving for the eigenvector/eigenvalue pairs that lie in different intervals independently.

The second benefit has to do with the rate of convergence. In a subspace iteration algorithm operating on a subspace of dimension m0, the eigenvector with the i-th largest eigenvalue magnitude converges at a rate of |λm0+1/λi| [2], where λi is the eigenvalue with the i-th largest magnitude. For a typical subspace iteration this means that the rate of convergence depends strongly on the eigenspectrum of the matrix at hand. By using FEAST, the rate of convergence becomes ρ(λm0+1)/ρ(λi) [2], where λi is now the eigenvalue with the i-th largest value of ρ(λ), and λm0+1 is the eigenvalue with the (m0+1)-th largest value of ρ(λ). The ratio ρ(λi)/ρ(λm0+1) can be made arbitrarily large by either increasing the accuracy of the quadrature rule (3) through increasing nc, or by increasing the size of the subspace m0; eigenpairs anywhere in the spectrum can thus be found rapidly. It is not uncommon to be able to achieve a convergence rate of 10^(−4) with nc = 8 and a subspace size of m0 ≈ 1.5m, where m is the exact number of eigenvalues that lie in the interval I.

Because of these remarkable convergence properties, as well as its robustness and its ability to exploit parallelism at multiple levels, the FEAST algorithm and associated software package (www.feast-solver.org) have been very well received by the HPC community. The FEAST algorithm is currently featured
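As a concrete illustration, the rational filter of the quadrature rule (3) can be sketched in a few lines of numpy. This is a toy dense-matrix sketch, not the FEAST package's implementation: it places Gauss-Legendre nodes on the upper half of a circular contour around I and exploits the conjugate symmetry of the real symmetric problem. All names here (`filter_subspace`, etc.) are hypothetical.

```python
import numpy as np

def filter_subspace(A, X, lmin, lmax, nc=8):
    """Approximate rho(A) @ X for real symmetric A and real X, as in Eq. (3).
    One multi-right-hand-side linear solve is performed per quadrature node."""
    r = 0.5 * (lmax - lmin)                      # contour radius
    c = 0.5 * (lmax + lmin)                      # contour center
    n = A.shape[0]
    t, w = np.polynomial.legendre.leggauss(nc)   # nodes/weights on [-1, 1]
    Y = np.zeros_like(X, dtype=float)
    for ti, wi in zip(t, w):
        theta = 0.5 * np.pi * (ti + 1.0)         # map node to (0, pi)
        z = c + r * np.exp(1j * theta)           # quadrature point z_i on C
        Q = np.linalg.solve(z * np.eye(n) - A, X.astype(complex))
        # conjugate symmetry: the lower half contour contributes the conjugate
        Y += 0.5 * wi * np.real((z - c) * Q)
    return Y
```

Applied to a diagonal test matrix, the diagonal of the output approximates the filter values ρ(λk): close to 1 for eigenvalues inside I and small outside.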

as the principal HPC eigenvalue solver in the Intel Math Kernel Library (MKL).

The convergence rate of FEAST is not entirely insensitive to the spectrum of A, however. In situations where the eigenvalues of A are packed many times more closely together immediately outside of I than they are inside of I, the rate of convergence can be very slow. This is illustrated in Figure 1. The top plots in Figure 1 illustrate the situation where the density of the eigenvalue spectrum is the same both inside and outside the interval I, and the bottom plots in Figure 1 illustrate the situation where the density is much larger outside of the interval I than it is inside of the interval I. The error at each FEAST subspace iteration is plotted for several values of nc and m0, and the corresponding values of λm0+1 and ρ(λm0+1) are indicated with horizontal dotted lines in the plots on the left in order to illustrate the effects of these parameters on convergence for both the dense spectrum and the sparse spectrum.

[Figure 1: panels "Sparse Spectrum" and "Dense Spectrum", showing ρ(λ) versus λ and eigenvector error versus subspace iteration for several values of m0 and nc.]

Fig. 1. Two test cases illustrating the difference in the convergence rate of FEAST for a matrix with a sparsely packed eigenvalue spectrum outside the contour interval (top plots) and a matrix with a densely packed eigenvalue spectrum outside the contour interval (bottom plots). Each matrix is dimension 545 with 50 eigenvalues inside of the contour interval and 495 eigenvalues outside of the contour interval. The plots on the right show the convergence of the maximum eigenvector error for various values of the parameters m0 and nc. The plots on the left show the value of ρ(λ) for nc = 3 and nc = 8 plotted with solid and dashed curves, with the locations of the eigenvalues of the matrix indicated by plot markers. The locations of λm0+1 and the values of ρ(λm0+1) are indicated with dotted horizontal lines for the same several values of m0 and nc. The matrix with the sparsely packed spectrum converges well, whereas the matrix with the densely packed spectrum barely converges at all.

Even in situations that are less pathological than the one illustrated in the bottom plot of Figure 1, the varying density of the spectrum of A can have negative implications for parallel load balancing. We can find the eigenpairs of A in parallel by dividing the spectrum of A into several non-intersecting intervals and then solving the eigenvalue problem for each interval separately and in parallel. When we do this, some intervals may converge more quickly than others due to the varying density of the spectrum, even if every interval contains the same number of eigenvalues. It is possible to speed up convergence in a given interval by increasing nc or m0, but this does not reduce the amount of computation required; increasing nc or m0 increases the number of linear systems that need to be solved with each iteration, and the solution of the linear systems for the quadrature rule in equation (3) is where most of the computation in the FEAST algorithm occurs.

We would ideally like to be able to use parallel resources as efficiently as possible, performing the same amount of computation for each interval in which we solve the eigenvalue problem. We therefore would like to improve the convergence rate of FEAST in situations where the spectrum of A results in slow or varying convergence rates, but without having to solve additional linear systems in order to do so.

In Ref. [8], this problem is addressed with the introduction of the Zolotarev quadrature, which produces a very steep slope for the filter at the interval endpoints and thereby leads to the same convergence rate between different contours. The Zolotarev approach presents, however, a few limitations. The first limitation is that the convergence rate is fixed and cannot be improved by increasing m0, and it will thus underperform in comparison with Gauss quadrature, for example, in situations where the spectrum is sparsely packed or uniformly distributed (e.g. top plot of Figure 1). The second limitation is that the Zolotarev approach cannot be extended to the non-Hermitian problem, where the eigenvalues are located in the complex plane.

In this work we propose a more general set of alternatives that use "accelerated subspace approach" strategies in order to improve the convergence robustness of FEAST regardless of which quadrature rule is being used.

II. ACCELERATING THE FEAST SUBSPACE ITERATION

Previous research [9] and the observation that larger subspace sizes m0 increase the rate of convergence for FEAST suggest that we may be able to improve convergence by finding ways to increase the size of the subspace that is used in the Rayleigh-Ritz procedure. If we can do this without having to solve additional linear system right hand sides when performing the numerical quadrature in equation (3), then we may improve the convergence rate of FEAST without having to do too much additional computation. In the following subsections we discuss two ways of expanding the FEAST subspace size without solving additional linear systems.

A. Method 1: Expand Subspace Using Previous Subspaces

In a typical FEAST subspace iteration, the trial subspace Xi from the previous iteration is discarded and replaced with the filtered subspace ρ(A)Xi (Step 1, FEAST Algorithm). Rather than discarding the previous subspace Xi, we might instead append the new, filtered subspace to the old one before performing the Rayleigh-Ritz procedure; by doing this we can increase the dimension of the subspace by m0 without having to solve additional linear systems. Step 1 of FEAST might then look like this:

1. Filter the trial subspace and append it to the columns of the old one: X′ = [Xi ρ(A)Xi]

where we form X′ by appending the column vectors of ρ(A)Xi to the matrix for the previous subspace Xi. We could repeat this process several times in order to build up a total subspace size of s × m0, after which we could keep the subspace size constant by removing old subspaces before adding new ones at each subspace iteration.

It would not be surprising if this modification of FEAST were to improve its convergence rate; by expanding the subspace in this way, we are essentially building a Krylov subspace technique wherein we multiply our prospective subspace by powers of ρ(A) rather than by powers of A.

If we modify step 1 of FEAST in this way then we have to make a few other modifications as well. Step 2(i) of FEAST requires the solution of the reduced eigenvalue problem A′q = λB′q, so we need to ensure that B′ = X′^T X′ is symmetric positive definite; because we append the filtered subspace to the old one, this is no longer guaranteed. We therefore need to add another step to FEAST: orthogonalize the matrix X′ before doing the Rayleigh-Ritz procedure. This can be done by using the QR decomposition or the singular value decomposition (SVD) of X′. For the research presented here, we orthogonalize X′ by taking its SVD and setting X′ equal to the left singular vectors:

X′ = UΣV^T → X′ = U.   (4)

In particular, we do this by diagonalizing X′^T X′,

X′^T X′ = V Σ^2 V^T → U = X′V Σ^(−1),   (5)

and then retaining the first m0 columns of U. Although this is less numerically stable than QR, we have found that it offers performance benefits in terms of the speed of the orthogonalization.

We call this algorithm 'expanding subspace FEAST'; see the XFEAST algorithm in the Appendix.

The implementation of XFEAST we use in this paper also involves expanding the subspace to its full size before doing the first Rayleigh-Ritz procedure. The subspace size can be increased incrementally, with the Rayleigh-Ritz procedure being done in between each subspace expansion, but there is no reason to do this unless one expects that the algorithm might converge before the subspace size has reached its limit.

The part of step 2(ii) of XFEAST that specifies that one must select the desired eigenvectors is required because the subspace X is expanded beyond just the filtered subspace. In conventional FEAST iterations the Rayleigh-Ritz procedure will find all of the m eigenvectors whose eigenvalues are in the interval I, plus the m0 − m eigenvectors whose eigenvalues are closest to, but still outside of, I. In XFEAST, because the subspace is expanded beyond the size m0, the Rayleigh-Ritz procedure will find all of those m0 eigenpairs plus many more. Due to numerical errors it may even find eigenpairs for the Rayleigh-Ritz matrix A′ that do not exist for the original matrix A.

Since step 4(i) of XFEAST requires a subspace of dimension m0 to filter with ρ(A) for the next iteration, we must select m0 of the s × m0 eigenpairs that are produced by step 2(i). Here, we use a few steps of sorting. First, we calculate the error residuals for all s × m0 eigenpairs from 2(i). We then select all of the eigenpairs whose eigenvalues lie inside the interval I = (λmin, λmax). If fewer than m0 eigenpairs are found whose eigenvalues lie inside the contour interval, we then select additional eigenpairs from outside the contour as well, preferentially selecting those eigenpairs with the lowest residuals.

B. Method 2: Expand Subspace Using Eigenvector Residuals

The other piece of information that the typical FEAST iteration generates (and which is otherwise discarded) is the eigenvector error residuals. Step 3 of the normal FEAST algorithm computes the eigenvector error residuals rk = Axk − λkxk, and uses the one with the largest norm as a measure of the accuracy of the current subspace estimate.

Because the current eigenpair estimates provided at each iteration of FEAST come from the Rayleigh-Ritz procedure, the inner product of any of the estimated eigenvectors with any of the residual vectors is zero: xj^T rk = 0, ∀ 1 ≤ j, k ≤ m0. One can show this by using the fact that xk = X′qk:

xj^T rk = xj^T A xk − λk xj^T xk
        = qj^T X′^T A X′ qk − λk qj^T X′^T X′ qk   (6)
        = δjk λk − λk δjk = 0.

If R is the matrix of column vectors rk, then its column vectors span a subspace that is orthogonal to the current estimated solution subspace X. We can therefore perform another Rayleigh-Ritz procedure in the subspace spanned by the combined columns of X and R without having to orthogonalize the column vectors of R with respect to those of X in order to ensure that X′^T X′ is symmetric positive definite. This allows us to improve the estimated subspace without having to solve any additional linear systems and without having to do any orthogonalization procedure. A modified FEAST algorithm using this approach is given in the RFEAST algorithm in the Appendix.

Again, it would not be surprising if adding eigenvector residuals to the subspace were to help improve the convergence rate; the eigenvalue algorithm LOBPCG [10] also works by including an eigenvector residual block in the search subspace.

Step 2(ii) of RFEAST again requires that we select the desired eigenpairs from amongst the ones produced by the Rayleigh-Ritz procedure. This is done in the same way as for XFEAST.

Measuring the error on the estimated subspace for RFEAST generally requires more care than in XFEAST or FEAST. The Rayleigh-Ritz procedure for RFEAST tends to produce many eigenpairs that do not exist in the spectrum of the full size matrix A due to numerical error, and many of these spurious eigenpairs have eigenvalues that fall inside the interval I. In order to return the correct estimated eigenpairs and estimate the error on them, we must select only the eigenpairs inside I that are not spurious. We do this by determining how many eigenpairs we should expect to find in that interval, and then taking that number of eigenpairs inside I with the lowest residuals to be the eigenpairs of interest. We determine the number of eigenpairs to expect by counting the number of eigenpairs found during the first Rayleigh-Ritz procedure of each subspace iteration (i.e. during iteration j = 1 in step 2 of RFEAST); the subspace used for the first Rayleigh-Ritz
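The Method 1 expansion step can be sketched as follows: append the filtered block to the previous subspace, orthonormalize by SVD, then perform Rayleigh-Ritz. This sketch applies Eq. (4) directly via `np.linalg.svd` rather than diagonalizing X′^T X′ as in Eq. (5); `PXi` stands for an already-computed filtered block ρ(A)Xi, and all names are illustrative rather than taken from the FEAST package.

```python
import numpy as np

def expanded_rayleigh_ritz(A, Xi, PXi):
    """One expanding-subspace step: X' = [Xi  rho(A)Xi], orthonormalize the
    columns by SVD (Eq. (4)), then solve the reduced eigenvalue problem."""
    Xp = np.hstack([Xi, PXi])                   # X' = [Xi  rho(A)Xi]
    # SVD orthonormalization: replace X' with its left singular vectors
    U, s, Vt = np.linalg.svd(Xp, full_matrices=False)
    rank = int(np.sum(s > s[0] * 1e-12))        # drop numerically null directions
    U = U[:, :rank]
    # Rayleigh-Ritz: with orthonormal columns, B' = U^T U = I
    lam, Q = np.linalg.eigh(U.T @ A @ U)
    return lam, U @ Q                           # Ritz values and Ritz vectors
```

Because the appended block can be nearly linearly dependent on the previous subspace, the rank cutoff on the singular values is what keeps B′ effectively positive definite.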

procedure is just the conventional FEAST subspace, and so we will not yet have produced the proliferation of spurious eigenpairs that comes from expanding the subspace by using the eigenvector residuals.

III. RESULTS AND COMPARISONS

We demonstrate the convergence properties of these modified FEAST algorithms with several example matrices.

Figure 2 shows the eigenvector error residual at each subspace iteration of the FEAST, XFEAST, and RFEAST algorithms as applied to two different real symmetric matrices, for several different subspace sizes. Both matrices are dimension 545 and have the same eigenvectors, with 50 eigenvalues inside the FEAST interval I = [−1, 1]. One matrix, labeled "Sparse" in Figure 2, has the other 495 eigenvalues in the interval [1.01, 20.81], whereas the one labeled "Dense" has those 495 eigenvalues in the interval [1.01, 1.1]. That is, the "Sparse" matrix has sparsely-packed eigenvalues outside of I, and the "Dense" matrix has densely-packed eigenvalues outside of I.

For the FEAST iterations the value of m0 is the same as the size of the subspace being used. For the XFEAST and RFEAST iterations m0 is always set at 51, and the full subspace size is generated by one or the other subspace expansion method. XFEAST and RFEAST thus solve the same number of linear systems for each contour integration, regardless of the subspace size, whereas FEAST solves more linear systems for larger subspace sizes.

[Figure 2: "FEAST Iterations for Sparse and Dense Eigenspectra"; panels for subspace sizes 51, 102, and 153, plotting eigenvector error versus number of contour integrations for FEAST, XFEAST, and RFEAST.]

Fig. 2. Plots showing the eigenvector error residual versus number of contour integrations for each of the three FEAST variations for various subspace dimensions. The top row of plots shows the results when using a matrix with a densely packed eigenspectrum outside of the FEAST interval, and the bottom row of plots shows the results when using a matrix with a sparsely packed eigenspectrum outside of the FEAST interval. Both matrices are dimension 545, and we search for 50 eigenvalues. The number above each plot indicates the subspace size. The "Dense" results are for nc = 8 and the "Sparse" results are for nc = 3; convergence is too fast for good illustration with nc = 8 for the "Sparse" matrix.

Despite solving many fewer linear system right hand sides per iteration (the number of linear system right hand sides per iteration is nc × m0), XFEAST and RFEAST outperform FEAST for the "Dense" matrix on a per-subspace-iteration basis. This is not the case for the "Sparse" matrix. Nonetheless, even for the "Sparse" matrix, XFEAST and RFEAST do a similar amount of total computation for a given level of convergence.

Figure 3 shows the amount of eigenvector error per number of linear system right hand sides solved for the same two matrices, for various values of m0 and nc. The advantages of XFEAST and RFEAST are especially clear here; the rate of convergence per linear system right hand side solved, which is the majority of the computation in the FEAST algorithm, depends primarily on which algorithm is used in the case of the "Dense" matrix, with XFEAST and RFEAST clearly outperforming FEAST. XFEAST and RFEAST also outperform FEAST for the "Sparse" spectrum matrix, but here the difference is less dramatic.

[Figure 3: "Eigenvector Residual vs. # Linear RHS Solved"; panels "Dense" and "Sparse", plotting eigenvector error versus number of linear system right hand sides for FEAST, XFEAST, and RFEAST at several (m0, nc) combinations.]

Fig. 3. Plots showing eigenvector residual versus the number of linear system right hand sides solved to reach that level of convergence, for both a matrix with a dense eigenspectrum outside the interval of interest and a matrix with a sparse eigenspectrum outside the interval of interest, using FEAST, XFEAST, and RFEAST for various subspace sizes m0 and numbers of quadrature points nc. XFEAST and RFEAST consistently require fewer linear system solutions than regular FEAST does in order to reach the same level of accuracy, with the difference being fairly dramatic in the dense eigenspectrum case.

Figure 4 illustrates the results of solving an eigenvalue problem whose spectrum derives from electronic structure calculations [8], [11]. Our work here has been motivated by applications of this kind.

The left plot of Figure 4 shows the density of the eigenspectrum of the Hamiltonian matrix for the ground state of a Caffeine molecule. The density of the eigenspectrum shows several distinct peaks, and one potential partitioning of the spectrum into two intervals is shown with red and blue lines. The right plot of Figure 4 shows the convergence trajectory when FEAST is used on each of these intervals separately. For the interval encompassed by Contour 2 (shown in red), the rightmost edge of which passes through a very dense region in the eigenspectrum, we also show the result of using XFEAST and RFEAST in order to try to achieve better convergence than is possible with FEAST.

The first interval (shown in blue), which has no eigenvalues immediately near its edges, converges rapidly, much like the second and third columns of the sparse example in Figure 2. The second interval, which has its upper limit passing through the middle of a dense group of eigenvalues, converges very slowly when using FEAST. This is the sort of problem that we seek to address.

[Figure 4: "FEAST Applied to Electronic Structure Spectrum"; left panel: density of eigenvalues versus λ with the "Contour 1 Interval" and "Contour 2 Interval" marked; right panel: eigenvector error versus number of contour integrations for FEAST, XFEAST, and RFEAST on the two intervals.]

Fig. 4. Plots showing the application of the FEAST variations to a matrix derived from electronic structure theory. The left plot shows the density of the eigenspectrum of the matrix, divided into two intervals, and the right plot shows the convergence of the eigenvector error for the various FEAST algorithms applied to the two intervals. The 14 eigenpairs in the "Contour 1 Interval" were calculated using a base subspace size of m0 = 17, and the 43 eigenpairs in the "Contour 2 Interval" were calculated by using a base subspace size of m0 = 46. Both the XFEAST and the RFEAST runs for "Contour Interval 2" use a total subspace size of 3m0, with the subspace having been expanded twice by using either the previous FEAST iteration solutions or the eigenvector residuals.

Using XFEAST and RFEAST, we can improve the final eigenvector error residual for the second, more challenging interval by more than four orders of magnitude. Still, this does not achieve ideal load balancing because the first interval has both of its edges in regions that are completely empty of eigenvalues, and so it converges very quickly. Better load balancing can only be achieved by dividing the spectrum in a less arbitrary way, which will require that we estimate the spectrum of a matrix before diagonalizing it. This is a subject of continuing research.

ACKNOWLEDGMENTS

The authors wish to acknowledge helpful discussions with Dr. Ping Tak Peter Tang and Dr. Yousef Saad. This material is supported by NSF under Grant #CCF-1510010.

REFERENCES

[1] E. Polizzi, "Density-matrix-based algorithm for solving eigenvalue problems," Phys. Rev. B, vol. 79, p. 115112, 2009.
[2] P. T. P. Tang and E. Polizzi, "FEAST as a subspace iteration eigensolver accelerated by approximate spectral projection," SIAM Journal on Matrix Analysis and Applications, vol. 35, pp. 354–390, 2014.
[3] J. Kestyn, E. Polizzi, and P. T. P. Tang, "FEAST Eigensolver for non-Hermitian Problems," ArXiv e-prints 1506.04463, Jun. 2015.
[4] T. Sakurai and H. Sugiura, "A projection method for generalized eigenvalue problems using numerical integration," in Proceedings of the 6th Japan-China Joint Seminar on Numerical Mathematics (Tsukuba, 2002), vol. 159, no. 1, 2003, pp. 119–128.
[5] T. Sakurai and H. Tadano, "CIRR: a Rayleigh-Ritz type method with contour integral for generalized eigenvalue problems," Hokkaido Math. J., vol. 36, no. 4, pp. 745–757, 2007.
[6] A. Imakura, L. Du, and T. Sakurai, "A block Arnoldi-type contour integral spectral projection method for solving generalized eigenvalue problems," Appl. Math. Lett., vol. 32, pp. 22–27, 2014.
[7] A. P. Austin and L. N. Trefethen, "Computing eigenvalues of real symmetric matrices with rational filters in real arithmetic," SIAM J. Sci. Comput., vol. 37, no. 3, pp. A1365–A1387, 2015.
[8] S. Güttel, E. Polizzi, P. T. P. Tang, and G. Viaud, "Zolotarev quadrature rules and load balancing for the FEAST eigensolver," SIAM Journal on Scientific Computing, vol. 37, no. 4, pp. A2100–A2122, 2015.
[9] B. Gavin and E. Polizzi, "Non-linear eigensolver-based alternative to traditional SCF methods," J. Chem. Phys., vol. 138, p. 194101, 2013.
[10] A. V. Knyazev, "Toward the optimal preconditioned eigensolver: Locally optimal block preconditioned conjugate gradient method," SIAM Journal on Scientific Computing, vol. 23, no. 2, pp. 517–541, 2001.
[11] A. R. Levin, D. Zhang, and E. Polizzi, "FEAST fundamental framework for electronic structure calculations: Reformulation and solution of the muffin-tin problem," Computer Physics Communications, vol. 183, no. 11, pp. 2370–2375, 2012.
[12] E. Di Napoli, E. Polizzi, and Y. Saad, "Efficient estimation of eigenvalue counts in an interval," Numerical Linear Algebra with Applications, 2016, nla.2048.
[13] S. Kajpust, "Variations of the FEAST eigenvalue algorithm," Master's thesis, Michigan Technological University, 2014.

IV. CONCLUSION

The results in Section III show that we can indeed improve the convergence rate of FEAST without solving additional linear systems by expanding the FEAST subspace through other means. This is particularly helpful in situations where the spectrum of the matrix at hand makes convergence difficult. Doing so comes at the price of having to use additional memory to store the expanded subspace; when using enough parallelism (and therefore a large enough number of intervals), however, we expect that memory will not be a constraint because the initial size of the subspace for each interval can be made almost arbitrarily small. As the results in Figure 4 show, though, this alone is not yet a fully satisfactory solution for achieving load balancing. Future work will consist of using this research to build on the efforts of others in order to estimate the eigenvalue distribution of a matrix [12] and efficiently divide the eigenvalue interval of interest [13]. We expect that, by combining our work here with these techniques for measuring and dividing the eigenvalue spectrum of a matrix, we can achieve ideal load balancing in an automated way for arbitrary matrices.

APPENDIX

FEAST Algorithm

Start with: Matrix A ∈ R^(n×n) to be diagonalized, interval I = (λmin, λmax) wherein fewer than m0 eigenvalues are expected to be found, initial guess X0 ∈ R^(n×m0) for the subspace spanned by the solution to the eigenvalue problem Ax = λx, λ ∈ I.

1. Filter the subspace Xi to remove eigenvectors whose eigenvalues do not lie in the interval I: X′ = ρ(A)Xi
2. Perform the Rayleigh-Ritz procedure to find a new estimate for eigenvalues and eigenvectors:
   i. Solve the reduced eigenvalue problem A′q = λB′q, with A′ = X′^T AX′ and B′ = X′^T X′
   ii. Get the new estimate for the subspace X: Xi+1 = X′Q
3. Check the eigenvector error r = max ||Axk − λkxk||, 1 ≤ k ≤ m0, λk ∈ I. If r is above a given tolerance, GOTO 1.
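The FEAST loop above can be sketched for a small dense real symmetric matrix as follows. This is an illustrative toy, not the FEAST package: the filter is a Gauss quadrature over the upper half of a circular contour (as in Eq. (3)), and for numerical stability the sketch orthonormalizes the filtered basis with QR, so that B′ = I and a standard reduced eigenproblem suffices, rather than solving A′q = λB′q directly as in the pseudocode. All names are hypothetical.

```python
import numpy as np

def feast_sketch(A, lmin, lmax, m0, nc=8, tol=1e-9, maxit=25):
    """Toy FEAST iteration for real symmetric A (dense solves throughout)."""
    n = A.shape[0]
    X = np.random.default_rng(1).standard_normal((n, m0))   # initial guess X0
    r0, c0 = 0.5 * (lmax - lmin), 0.5 * (lmax + lmin)       # contour radius/center
    t, w = np.polynomial.legendre.leggauss(nc)
    for _ in range(maxit):
        # Step 1: apply the rational filter, one multi-RHS solve per node
        Y = np.zeros((n, m0))
        for ti, wi in zip(t, w):
            z = c0 + r0 * np.exp(1j * 0.5 * np.pi * (ti + 1.0))
            Q = np.linalg.solve(z * np.eye(n) - A, X.astype(complex))
            Y += 0.5 * wi * np.real((z - c0) * Q)
        # Step 2: Rayleigh-Ritz on an orthonormal basis of the filtered subspace
        Qb, _ = np.linalg.qr(Y)
        lam, S = np.linalg.eigh(Qb.T @ A @ Qb)
        X = Qb @ S
        # Step 3: residual check for the Ritz pairs inside the interval
        inside = (lam > lmin) & (lam < lmax)
        if inside.any():
            R = A @ X[:, inside] - X[:, inside] * lam[inside]
            if np.linalg.norm(R, axis=0).max() < tol:
                break
    return lam[inside], X[:, inside]
```

On a small diagonal test matrix with three eigenvalues inside the interval, a few iterations suffice to resolve them.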

XFEAST Algorithm

Start with: Matrix A ∈ R^(n×n) to be diagonalized, interval I = (λmin, λmax) where fewer than m0 eigenvalues are expected to be found, initial guess X0 ∈ R^(n×m0), and maximum number of subspaces to store s.

0. Repeatedly apply the filter procedure and append the resulting subspaces in order to expand the subspace to the predetermined size: X′ = [X0 X1 X2 ... Xs−1], with Xi = (ρ(A))^i X0.
1. Orthogonalize the columns of X′.
2. Perform the Rayleigh-Ritz procedure to find a new estimate for eigenvalues and eigenvectors:
   i. Solve the reduced eigenvalue problem A′q = λB′q, with A′ = X′^T AX′ and B′ = X′^T X′
   ii. Select the desired m0 eigenpairs and get the new estimate for X: Xi+1 = X′Q
3. Check the eigenvector error r = max ||Axk − λkxk||, 1 ≤ k ≤ m0, λk ∈ I. If r is below a given tolerance, STOP.
4. Update the subspace:
   i. Apply the filter to the new subspace estimate: Xi+1 = ρ(A)Xi+1
   ii. Update the subspace by removing the oldest subspace and appending the newest update: X′ = [Xi−s+1 Xi−s+2 ... Xi+1]
5. GOTO 1.

RFEAST Algorithm

Start with: Matrix A ∈ R^(n×n) to be diagonalized, interval I = (λmin, λmax) where fewer than m0 eigenvalues are expected to be found, initial guess X0 ∈ R^(n×m0), maximum number of Rayleigh-Ritz iterations s.

1. Filter the subspace Xi to remove eigenvectors whose eigenvalues do not lie in the interval I: X′ = ρ(A)Xi
2. Perform the Rayleigh-Ritz procedure to find a new estimate for eigenvalues and eigenvectors:
   For j = 1 to s:
     i. Solve the reduced eigenvalue problem A′q = λB′q, with A′ = X′^T AX′ and B′ = X′^T X′
     ii. Select the desired m0 eigenpairs and get the new estimate for X: Xi+1 = X′Q
     iii. Compute the residual vectors and expand the subspace: R = AXi+1 − Xi+1Λ → X′ = [X′ R]
   end for
3. Check the eigenvector error r = max ||Axk − λkxk||, 1 ≤ k ≤ m, λk ∈ I. If r is above a given tolerance, GOTO 1.
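The property that RFEAST relies on in step 2(iii) — Eq. (6): the Ritz residual vectors are orthogonal to the Ritz vectors, so R can be appended to X′ without any re-orthogonalization — is easy to verify numerically. The matrices below are arbitrary illustrative data, and the trial basis is taken orthonormal so that B′ = I.

```python
import numpy as np

# Build a random symmetric test matrix and an orthonormal trial basis X'.
rng = np.random.default_rng(2)
M = rng.standard_normal((8, 8))
A = 0.5 * (M + M.T)                                  # symmetric test matrix
Xp, _ = np.linalg.qr(rng.standard_normal((8, 3)))    # orthonormal trial basis X'

# Rayleigh-Ritz: reduced problem A'q = lambda q (B' = I for orthonormal X').
lam, q = np.linalg.eigh(Xp.T @ A @ Xp)
X = Xp @ q                                           # Ritz vectors x_k = X' q_k
R = A @ X - X * lam                                  # residuals r_k = A x_k - lam_k x_k

# Eq. (6): every x_j^T r_k vanishes to machine precision.
print(np.abs(X.T @ R).max())
```

Because X^T R is zero up to rounding, [X R] has a block-diagonal Gram matrix, which is why RFEAST's expanded reduced problem stays well posed without an explicit orthogonalization step.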