arXiv:0704.3486v2 [hep-lat] 19 Oct 2007

An iterative method to compute the sign function of a non-Hermitian matrix and its application to the overlap Dirac operator at nonzero chemical potential

J. Bloch (a), A. Frommer (b), B. Lang (b), T. Wettig (a)

(a) Institute for Theoretical Physics, University of Regensburg, 93040 Regensburg, Germany
(b) Department of Mathematics, University of Wuppertal, 42097 Wuppertal, Germany

Preprint submitted to Elsevier, 26 April 2007

Abstract

The overlap Dirac operator in lattice QCD requires the computation of the sign function of a matrix. While this matrix is usually Hermitian, it becomes non-Hermitian in the presence of a quark chemical potential. We show how the action of the sign function of a non-Hermitian matrix on an arbitrary vector can be computed efficiently on large lattices by an iterative method. A Krylov subspace approximation based on the Arnoldi algorithm is described for the evaluation of a generic matrix function. The efficiency of the method is spoiled when the matrix has eigenvalues close to a function discontinuity. This is cured by adding a small number of critical eigenvectors to the Krylov subspace, for which we propose two different deflation schemes. The ensuing modified Arnoldi method is then applied to the sign function, which has a discontinuity along the imaginary axis. The numerical results clearly show the improved efficiency of the method. Our modification is particularly effective when the action of the sign function of the same matrix has to be computed many times on different vectors, e.g., if the overlap Dirac operator is inverted using an iterative method.

Key words: overlap Dirac operator, quark chemical potential, sign function, non-Hermitian matrix, iterative methods
PACS: 02.60.Dc, 11.15.Ha, 12.38.Gc

1. Introduction

The only systematic nonperturbative approach to quantum chromodynamics (QCD) is the numerical simulation of the theory on a finite space-time lattice. For a long time, the implementation of chiral symmetry on the lattice posed serious problems [1], but these problems have recently been solved in a number of complementary ways. Perhaps the most prominent solution is the overlap Dirac operator [2], which provides an exact solution of the Ginsparg-Wilson relation [3]. However, the price one has to pay for this solution is the numerical computation of the sign function of a sparse matrix. Here and in the following, computing a function f(A) of a matrix A of dimension N is a short-hand for computing f(A)·x, where x ∈ C^N, i.e., determining the action of f(A) on the vector x. Typically A is Hermitian, and efficient methods to compute f(A)x have been developed for this case [4,5].

The phase diagram of QCD is currently being explored experimentally in relativistic heavy ion collisions and theoretically in lattice simulations and model calculations [6]. To describe QCD at nonzero density, a quark chemical potential is introduced in the QCD Lagrangian. If this chemical potential is implemented in the overlap operator [7], the operator loses its Hermiticity, and one is faced with the problem of computing the sign function of a sparse non-Hermitian matrix. On a small lattice, this can be done by performing a full diagonalization and using the spectral definition of the matrix function (see Eq. (4) below), but on larger lattices one needs to resort to iterative methods to keep computation time and memory requirements under control.

The purpose of this paper is to introduce such an iterative method. In the next section we describe the non-Hermitian problem in more detail and briefly discuss the sign function for non-Hermitian matrices. In Sec. 4 we propose a Krylov subspace approximation of a generic matrix function. The poor efficiency of this Arnoldi-based method when computing the sign function of a matrix having eigenvalues with small absolute real parts is caused by the discontinuity of the sign function along the imaginary axis. In Sec. 5 we enhance the method by taking into account exact information about these critical eigenvalues. We use the method to compute the sign function occurring in the overlap Dirac operator of QCD, and in Sec. 6 we discuss the results obtained for two different lattice sizes.

2. The overlap operator and the sign function

The overlap formulation of the Dirac operator is a rigorous method to preserve chiral symmetry at finite lattice spacing in a vector-like gauge theory. Its construction is based on successive developments described in seminal papers by Kaplan, Shamir, Furman, Narayanan and Neuberger [8-10,2]. In the presence of a non-zero quark chemical potential µ, the massless overlap Dirac operator is given by [7]

    D_ov(µ) = 1 + γ5 sgn(H_w(µ)) ,    (1)

where sgn stands for the sign function, H_w(µ) = γ5 D_w(µ), D_w(µ) is the Wilson-Dirac operator at nonzero chemical potential [11,12] with negative Wilson mass m_w ∈ (−2, 0), γ5 = γ1γ2γ3γ4, and γ_ν with ν = 1, ..., 4 are the Dirac gamma matrices in Euclidean space. The Wilson-Dirac operator is a discretized version of the continuum Dirac operator that avoids the replication of fermion species which occurs when a naive discretization of the derivative operator is used. It is given by

    [D_w(µ)]_{nm} = δ_{n,m}
        − κ Σ_{j=1}^{3} [ (1+γ_j) U_{n,j} δ_{n+ĵ,m} + (1−γ_j) U†_{n−ĵ,j} δ_{n−ĵ,m} ]
        − κ (1+γ_4) e^{µ} U_{n,4} δ_{n+4̂,m} − κ (1−γ_4) e^{−µ} U†_{n−4̂,4} δ_{n−4̂,m} ,    (2)

where κ = 1/(8 + 2m_w) and U_{n,ν} is the SU(3) matrix associated with the link connecting the lattice site n to n + ν̂. The exponential factors e^{±µ} are responsible for the non-Hermiticity of the operator. The quark field at each lattice site corresponds to 12 variables: 3 SU(3) color components × 4 Dirac spinor components.

For µ ≠ 0 the argument H_w(µ) of the sign function becomes non-Hermitian, and we need to define the sign function for this case. Consider first a given matrix A with no particular symmetry properties and a function f. Let Γ be a collection of closed contours in C such that f is analytic inside and on Γ and such that Γ encloses the spectrum of A. Then the function f(A) of the matrix A can be defined by [13]

    f(A) = (1/2πi) ∮_Γ f(z) (zI − A)^{−1} dz ,    (3)

where the integral is defined component-wise and I denotes the identity matrix.

From this definition it is easy to derive a spectral definition, even if the matrix is non-Hermitian. If the matrix A is diagonalizable, i.e., A = UΛU^{−1} with a diagonal eigenvalue matrix Λ = diag(λ_i) and U ∈ Gl(N, C), then

    f(A) = U f(Λ) U^{−1}    (4)

with

    f(Λ) = diag(f(λ_i)) .    (5)

If A cannot be diagonalized, a more general spectral definition of f(A) can be derived from Eq. (3) using the Jordan decomposition A = U (⊕_i J_i) U^{−1} with Jordan blocks

    J_i = \begin{pmatrix} λ_i & 1 & & 0 \\ & λ_i & ⋱ & \\ & & ⋱ & 1 \\ 0 & & & λ_i \end{pmatrix} .    (6)

Then,

    f(A) = U ( ⊕_i f(J_i) ) U^{−1} ,    (7)

where

    f(J_i) = \begin{pmatrix} f(λ_i) & f^{(1)}(λ_i) & ⋯ & f^{(m_i−1)}(λ_i)/(m_i−1)! \\ & f(λ_i) & ⋱ & ⋮ \\ & & ⋱ & f^{(1)}(λ_i) \\ & & & f(λ_i) \end{pmatrix}    (8)

with m_i the size of the Jordan block, see [14]. The superscripts denote derivatives of the function with respect to its argument.

Non-Hermitian matrices typically have complex eigenvalues, and applying Eq. (4) or (7) to the sign function in Eq. (1) requires the evaluation of the sign of a complex number.
The sign function needs to satisfy [sgn(z)]² = 1 and, for real x, sgn(x) = ±1 if x ≷ 0. These properties are satisfied if one defines

    sgn(z) ≡ z/√(z²) = sgn(Re(z)) ,    (9)

where in the last equality the cut of the square root is chosen along the negative real axis. Using the definition (9) in the spectral definition (7) yields a matrix sign function in agreement with that used in [15-18]. Indeed, based on the general Jordan decomposition

    A = U \begin{pmatrix} J_+ & \\ & J_- \end{pmatrix} U^{−1} ,    (10)

where J_+ represents the Jordan blocks corresponding to the eigenvalues with positive real part and J_− those corresponding to the eigenvalues with negative real part, the spectral definition (7) for the sign function becomes

    sgn(A) = U \begin{pmatrix} +I & \\ & −I \end{pmatrix} U^{−1} ,    (11)

see also [19]. This definition agrees with the result one obtains when deriving Eq. (1) from the domain-wall fermion formalism at µ ≠ 0 in the limit in which the extent of the fifth dimension goes to infinity [20].

For any square matrix A we have sgn(A)² = I, and a short calculation [7] shows that for this reason the overlap operator D_ov(µ) as defined in Eq. (1) satisfies the Ginsparg-Wilson relation

    {D_ov, γ5} = D_ov γ5 D_ov .    (12)

If A is Hermitian, the polar factor pol(A) = A(A†A)^{−1/2} of A coincides with sgn(A), and this fact has been used successfully to develop efficient iterative methods for computing the action of the matrix sign function on a vector [21]. However, if A is non-Hermitian, then in general sgn(A) ≠ pol(A) and pol(A)² ≠ I. Thus, for µ ≠ 0, replacing sgn(H_w) by pol(H_w) in the definition of the overlap operator in Eq. (1) not only changes the operator but also violates the Ginsparg-Wilson relation, as can also be seen in numerical experiments. We conclude that the definition given in Eq. (1) is the correct formulation of the overlap operator for µ ≠ 0.
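For small test problems the definitions above can be checked directly. The following minimal NumPy sketch (all names are ours, not from the authors' code) evaluates sgn(A) for a diagonalizable non-Hermitian matrix from the spectral definition (4), applying the scalar sign (9) to each eigenvalue:

    import numpy as np

    def sign_matrix(A):
        """sgn(A) for a diagonalizable matrix via the spectral definition (4),
        with sgn(z) = sgn(Re z) from Eq. (9)."""
        lam, U = np.linalg.eig(A)                # A = U diag(lam) U^{-1}
        if np.any(np.isclose(lam.real, 0.0)):
            raise ValueError("eigenvalue on the imaginary axis: sgn undefined")
        s = np.where(lam.real > 0, 1.0, -1.0)    # Eq. (9) applied to each eigenvalue
        return U @ np.diag(s) @ np.linalg.inv(U)

    # consistency check: sgn(A)^2 = I, cf. the Ginsparg-Wilson argument above
    rng = np.random.default_rng(0)
    A = rng.standard_normal((8, 8)) + 1j * rng.standard_normal((8, 8))
    S = sign_matrix(A)
    assert np.allclose(S @ S, np.eye(8), atol=1e-8)

Such a direct diagonalization is, of course, exactly what becomes prohibitive on large lattices, which motivates the iterative methods discussed next.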
3. Direct and iterative methods

A numerical implementation of the sign function using the spectral definition (4) is only possible for small matrices, as a full diagonalization becomes too expensive as the matrix grows. As an alternative, matrix-based iterative algorithms for the computation of the matrix sign function have been around for many years [15,17,22,23]. Although these algorithms are much faster than the direct implementation of the spectral definition for medium-sized problems, they still require the storage and manipulation (i.e., inversions and/or multiplications) of the entire matrix. This is feasible for medium-sized matrices, but becomes too expensive for the very large matrices occurring in typical lattice QCD simulations. E.g., even for an 8³ × 8 lattice, which is the minimum lattice size required for a physically relevant problem, the matrix dimension is already 12 · 8³ · 8 ≈ 50 000. Even though these QCD matrices are sparse, the iterative procedure computing the sign function fills the matrix as the iterations proceed, and the method eventually becomes prohibitively expensive.

Therefore, another iterative method is required which does not produce an approximation to the full sign matrix itself, but rather an approximation to the vector y = sgn(A)x, i.e., to the operation of the sign matrix on an arbitrary vector. Many QCD applications only require the knowledge of this product for a number of selected source vectors x. For instance, some low-energy properties of QCD can be described by the lowest-lying eigenvalues of the Dirac operator. These eigenvalues can efficiently be found by an iterative eigenvalue solver like ARPACK [24], which only requires the computation of matrix-vector multiplications. Analogously, the propagation of fermions can be well approximated by inverting the Dirac operator on a selected number of source vectors b_k, i.e., by solving the systems D_ov x = b_k. These inversions are also performed using iterative linear solvers requiring only matrix-vector multiplications.

Such iterative methods, mostly from the class of Krylov subspace methods, are already extensively used for the solution of eigenvalue problems, linear systems, and for function evaluations [25,26] with Hermitian matrices. There, the ancestor of all methods is the Lanczos method, of which many variants and improvements have been built over the years. The Lanczos method makes use of short recurrences to build an orthonormal basis in the Krylov subspace.

Krylov subspace methods are also used for non-Hermitian matrices in the context of eigenvalue problems (a popular example being the restarted Arnoldi method of ARPACK), for the solution of linear systems, and even for the evaluation of the exponential function [27,28]. The two most widely used methods are the Arnoldi method and the two-sided Lanczos method. In contrast to the Hermitian case, the Arnoldi method requires long recurrences to construct an orthonormal basis for the Krylov subspace, while the two-sided Lanczos method uses two short recurrence relations, but at the cost of losing orthogonality.

In the next section we present an Arnoldi-based method to compute a generic function of a non-Hermitian matrix. The application of the two-sided Lanczos method to this problem will be investigated in a separate publication.
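To illustrate the "matrix-vector multiplications only" setting described above, a matrix-free inversion of the overlap operator could be set up as in the following sketch. Here `apply_Dov` is an assumed user-supplied routine applying Eq. (1) to a vector (a hypothetical name, not from the paper), and the choice of GMRES is ours, not prescribed by the authors:

    import numpy as np
    from scipy.sparse.linalg import LinearOperator, gmres

    def solve_overlap(apply_Dov, b):
        """Solve D_ov x = b without ever forming D_ov: the Krylov solver
        only calls the matrix-vector routine apply_Dov(v)."""
        n = b.shape[0]
        D_ov = LinearOperator((n, n), matvec=apply_Dov, dtype=complex)
        x, info = gmres(D_ov, b)
        if info != 0:
            raise RuntimeError(f"GMRES did not converge (info={info})")
        return x

Note that each call to apply_Dov itself requires the evaluation of sgn(H_w(µ)) on a vector, which is precisely the problem addressed in the remainder of the paper.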
4. Arnoldi function approximation for a non-Hermitian matrix

From the spectral definition (4) it follows that f(A) is identical to some polynomial P_{K−1}(A) of degree K−1 < N. Indeed, the unique interpolation polynomial P_{K−1}(z), which interpolates f(z) at the different eigenvalues λ_i of A, satisfies f(A) = P_{K−1}(A), as follows immediately from Eq. (4). If A has non-trivial Jordan blocks, this is still true, but interpolation is now in the Hermite sense, where at each eigenvalue P_{K−1} interpolates f and its derivatives up to order one less than the size of the largest corresponding Jordan block, see [19].

Hence, for an arbitrary vector x,

    y ≡ f(A)x = P_{K−1}(A)x = Σ_{i=0}^{K−1} c_i A^i x .    (13)

The idea is to construct an approximation to y using a polynomial of degree k−1 with k ≪ N. One possibility is to construct a good polynomial approximation P_{k−1}(A) such that P_{k−1}(λ_i) ≈ f(λ_i). This would yield a single polynomial approximation operating on any vector x to compute f(A)x ≈ P_{k−1}(A)x. One such example is the expansion in terms of Chebyshev polynomials up to degree k−1. Although Chebyshev polynomials are very close to the minimax polynomial for a function approximation over an appropriate ellipse in the complex plane, one can do better for matrix function approximations. First of all, one only needs a good approximation to f at the eigenvalues of A, and secondly, one can use information about the source vector x to improve the polynomial approximation. The vector x can be decomposed using the (generalized) eigenvectors of A as a complete basis, and clearly some eigendirections will be more relevant than others in the function approximation. Using a fixed interpolation polynomial does not use any information about the vector x.

Furthermore, the only feature of A usually taken into account by such polynomial approximations is the extent of its spectrum.

Indeed, there exists a best polynomial approximation ŷ = P_{k−1}(A)x of degree at most k−1, which is readily defined as the orthogonal projection of y on the Krylov subspace K_k(A, x) = span(x, Ax, A²x, ..., A^{k−1}x). If V_k = (v_1, ..., v_k) is an N × k matrix whose columns form an orthonormal basis in K_k, then V_k V_k† is a projector on the Krylov space, and the projection is ŷ = V_k V_k† y. The operation of any polynomial of A of degree smaller than k on x is also an element of K_k, and the projection ŷ corresponds to the polynomial which minimizes ∆ = f(A)x − P_{k−1}(A)x, as ∆ ⊥ K_k(A, x). Clearly, this approximation explicitly takes into account information about A and x.

The problem, however, is that to find this best polynomial approximation of degree k−1 one already has to know the answer y = f(A)x. Therefore, we need a method to approximate the projected vector ŷ. This can be done by one of the Krylov subspace methods mentioned in Sec. 3. Here, we use the Arnoldi algorithm to construct an orthonormal basis for the Krylov subspace K_k(A, x) using the long recurrence

    A V_k = V_k H_k + β_k v_{k+1} e_k^T ,    (14)

where v_1 = x/β, β = |x|, H_k is an upper Hessenberg matrix (upper triangular + one subdiagonal), β_k = H_{k+1,k}, and e_k is the k-th basis vector in C^k.

The projection ŷ of f(A)x on K_k(A, x) can be written as

    ŷ = V_k V_k† f(A) x .    (15)

Making use of x = βv_1 = βV_k e_1, Eq. (15) becomes

    ŷ = β V_k V_k† f(A) V_k e_1 .    (16)

From Eq. (14) it is easy to see that

    H_k = V_k† A V_k ,    (17)

as V_k† V_k = I_k and v_{k+1} ⊥ V_k. Therefore it seems natural to introduce the approximation [27]

    f(H_k) ≈ V_k† f(A) V_k    (18)

in Eq. (16), which finally yields

    ŷ ≈ β V_k f(H_k) e_1 .    (19)

This expression is just a linear combination of the k Arnoldi vectors v_i, where the k coefficients are given by β times the first column of f(H_k). Saad [29] showed that the approximation (19) corresponds to replacing the polynomial interpolating f at the eigenvalues of A by the lower-degree polynomial which interpolates f at the eigenvalues of H_k, which are also called the Ritz values of A in the Krylov space. The hope is that for k not too large the approximation (19) will be a suitable approximation for y. The approximation of f(A)x by Eq. (19) replaces the computation of f(A) by that of f(H_k), where H_k is of much smaller size than A. The evaluation of (the first column of) f(H_k) can be implemented using a spectral decomposition, or another suitable evaluation method [15,17,22,23].

The long recurrences of the Arnoldi method make the method much slower than the Lanczos method used for Hermitian matrices. Nevertheless, our first results showed consistent convergence properties when computing the matrix sign function: as the size of the Krylov space increases, the method converges to within machine accuracy. More precisely, for the sign function the method shows a see-saw profile corresponding to even-odd numbers of Krylov vectors, as was previously noticed for the Hermitian case as well [5]. This even-odd pattern is related to the sign function being odd in its argument. The see-saw effect is completely removed when using the harmonic Ritz values as described in Ref. [5] for Hermitian matrices and extended to non-Hermitian matrices in Ref. [28], and convergence becomes smooth. Alternatively, one can just as well restrict the calculations to even-sized Krylov subspaces.

Unfortunately, in the case of the sign function the Arnoldi method described above has a very poor efficiency when some of the eigenvalues are small. In the case considered in Fig. 3 (see Sec. 6 below), the size of the Krylov space has to be taken very large (k ≈ N/2) to reach accuracies of the order of 10^{−8} (see the m = 0 curve in the top pane of Fig. 3). A discussion and resolution of this problem are given in the next section.
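A compact sketch of Eqs. (14)-(19) in our own NumPy/SciPy notation could look as follows; `scipy.linalg.signm` stands in as a dense evaluator of f(H_k) for f = sgn and is an assumption of this sketch, not necessarily the evaluator used by the authors:

    import numpy as np
    from scipy.linalg import signm

    def arnoldi_fA_x(A, x, k, dense_f=signm):
        """Approximate f(A)x by Eq. (19): run the Arnoldi recurrence (14)
        to build V_k and H_k, then return beta * V_k f(H_k) e_1."""
        n = x.shape[0]
        V = np.zeros((n, k + 1), dtype=complex)
        H = np.zeros((k + 1, k), dtype=complex)
        beta = np.linalg.norm(x)
        V[:, 0] = x / beta
        for j in range(k):
            w = A @ V[:, j]
            for i in range(j + 1):                # long recurrence: orthogonalize
                H[i, j] = np.vdot(V[:, i], w)     # against all previous vectors
                w = w - H[i, j] * V[:, i]
            H[j + 1, j] = np.linalg.norm(w)
            if H[j + 1, j] < 1e-14 * beta:        # happy breakdown: space exhausted
                k = j + 1
                break
            V[:, j + 1] = w / H[j + 1, j]
        fH = dense_f(H[:k, :k])                   # dense k x k problem, k << N
        return beta * (V[:, :k] @ fH[:, 0])       # only the first column is needed

In line with the see-saw discussion above, for f = sgn one would in practice restrict k to even values or use harmonic Ritz values.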
5. Deflation

5.1. Introduction

For Hermitian matrices, it is well known that the computation of the sign function can be improved by deflating the eigenvalues smallest in absolute value [5]. The reason why this is crucial and specific to the sign function is the discontinuity of the sign function at zero. For non-Hermitian matrices, the situation is analogous since the sign function now has a discontinuity along the imaginary axis. A necessary condition for the method described in Sec. 4 to be efficient is the ability to approximate f well at the eigenvalues of A by a low-order polynomial. If the gap between the eigenvalues of A to the left and to the right of the imaginary axis is small, no low-order polynomial will exist which will be accurate enough for all eigenvalues.

The idea is to resolve this problem by treating these critical eigenvalues exactly, and performing the Krylov subspace approximation on a deflated space.

In the Hermitian case, deflation is straightforward. The function f(A) of a Hermitian matrix A with eigenvalues λ_i and orthonormal eigenvectors u_i (i = 1, ..., N) can be written as

    f(A) = Σ_{i=1}^{N} f(λ_i) u_i u_i† ,    (20)

and its operation on an arbitrary vector x as

    f(A)x = Σ_{i=1}^{N} f(λ_i) (u_i† x) u_i .    (21)

If m critical eigenvalues of A and their corresponding eigenvectors have been computed, one can split the vector space into two orthogonal subspaces and write an arbitrary vector as x = x_∥ + x_⊥, where x_∥ = Σ_{i=1}^{m} (u_i† x) u_i and x_⊥ = x − x_∥. Eq. (21) can then be rewritten as

    f(A)x = Σ_{i=1}^{m} f(λ_i) (u_i† x) u_i + f(A) x_⊥ .    (22)

The first term on the right-hand side of Eq. (22) can be computed exactly, and the second term can be approximated by applying the Arnoldi method of Sec. 4 to x_⊥ (a schematic code fragment implementing this splitting is given at the end of this subsection). As the vector x_⊥ does not contain any contribution in the eigenvector directions corresponding to the critical eigenvalues, the polynomial approximation no longer needs to interpolate f at the eigenvalues closest to the function discontinuity to approximate f(A)x_⊥ well. Therefore, after deflation, lower-degree polynomials will yield the required accuracy, and a smaller-sized Krylov subspace can be used in the approximation. In theory, the orthonormality of the eigenvectors guarantees that the Krylov subspace will be perpendicular to x_∥, but in practice numerical inaccuracies could require us to reorthogonalize the subspaces during the construction of the Krylov subspace.

For non-Hermitian matrices the (generalized) eigenvectors are no longer orthonormal, and it is not immediately clear how to deflate a critical subspace. The matrix functions as defined in (4) or (7) involve the inverse of the matrix of basis vectors, U, and no simple decomposition into orthogonal subspaces can be performed.

In the remainder of this section, we will develop two alternative deflation schemes for the non-Hermitian case, using a composite subspace generated by adding a small number of critical eigenvectors to the Krylov subspace. This idea of an augmented Krylov subspace method has been used in the iterative solution of linear systems for some time, see, e.g., Ref. [30]. Since in computational practice one will never encounter non-trivial Jordan blocks, we assume in the following, for simplicity, that the matrix is diagonalizable.
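For the Hermitian case, the splitting (22) takes only a few lines. A schematic fragment, assuming the m critical eigenpairs `lam_m`, `U_m` were precomputed (e.g., with an ARPACK-based eigensolver) and that `arnoldi_sgn` approximates sgn(A) x_⊥ as in Eq. (19); all names are ours:

    import numpy as np

    def deflated_sign_hermitian(lam_m, U_m, x, arnoldi_sgn):
        """Eq. (22): exact sign on the m critical eigendirections,
        Krylov approximation on the orthogonal remainder."""
        c = U_m.conj().T @ x                  # coefficients u_i^dag x
        x_perp = x - U_m @ c                  # remove the critical directions
        exact = U_m @ (np.sign(lam_m) * c)    # sum_i sgn(lam_i) (u_i^dag x) u_i
        return exact + arnoldi_sgn(x_perp)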
5.2. Schur deflation

We construct the subspace Ω_m + K_k(A, x), which is the sum of the subspace Ω_m spanned by the eigenvectors corresponding to m critical eigenvalues of A and the Krylov subspace K_k(A, x). The aim is to make an approximation similar to that of Eq. (19), but to treat the contribution of the critical eigenvalues to the sign function explicitly, so that the size of the Krylov subspace can be kept small.

Assume that m critical eigenvalues and right eigenvectors of A are determined using an iterative eigenvalue solver like the one implemented in ARPACK. From this one can easily construct m Schur vectors and the corresponding m × m upper triangular matrix T_m satisfying

    A S_m = S_m T_m ,    (23)

where S_m = (s_1, ..., s_m) is the N × m matrix formed by the orthonormal Schur vectors and the diagonal elements of T_m are the eigenvalues corresponding to the Schur vectors. These Schur vectors form an orthonormal basis of the eigenvector subspace Ω_m, which is invariant under operation of A.

After constructing the m-dimensional subspace Ω_m we run a modified Arnoldi method to construct an orthogonal basis of the composite subspace Ω_m + K_k(A, x). That is, each Arnoldi vector is orthogonalized not only against the previous ones, but also against the Schur vectors s_i. In analogy to (14), this process can be summarized as

    A (S_m V_k) = (S_m V_k) \begin{pmatrix} T_m & S_m† A V_k \\ 0 & H_k \end{pmatrix} + β_k v_{k+1} e_{m+k}^T .    (24)

Here, the columns v_1, ..., v_k of V_k form an orthonormal basis of the space K_k^⊥(A, x), which is the projection of the Krylov subspace K_k(A, x) onto the orthogonal complement Ω_m^⊥ of Ω_m. In particular, v_1 = x_⊥/β, where x_⊥ = (1 − S_m S_m†)x is the projection of x onto Ω_m^⊥ and β = |x_⊥|. Again, H_k is an upper Hessenberg matrix.

Note that the orthogonality of K_k^⊥ with respect to Ω_m has to be enforced explicitly during the Arnoldi iterations, as the operation of A on a vector in the projected Krylov subspace K_k^⊥ in general gets a contribution belonging to Ω_m, i.e., K_k^⊥ is not invariant under the operation of A. This is a consequence of the non-orthogonality of the eigenvectors of A.

Defining

    Q = (S_m V_k) and H = \begin{pmatrix} T_m & S_m† A V_k \\ 0 & H_k \end{pmatrix} ,    (25)

H satisfies a relation similar to Eq. (17), namely

    H = Q† A Q ,    (26)

and the function approximation derived in Sec. 4 can be used here as well. We briefly repeat the steps of Sec. 4. The operation of the matrix function f(A) on the vector x can be approximated by its projection on the composite subspace,

    f(A)x ≈ QQ† f(A) x ,    (27)

and because x lies in the subspace,

    f(A)x ≈ QQ† f(A) QQ† x .    (28)

As H satisfies Eq. (26), we can introduce the same approximation as in Eq. (18),

    f(H) ≈ Q† f(A) Q ,    (29)

and substituting Eq. (29) in Eq. (28) we construct the function approximation

    f(A)x ≈ Q f(H) Q† x .    (30)

Because of the block structure (25) of the composite Hessenberg matrix H, the matrix f(H) can be written as

    f(H) = \begin{pmatrix} f(T_m) & Y \\ 0 & f(H_k) \end{pmatrix} .    (31)

The upper left corner is the function of the triangular Schur matrix T_m (which is again triangular), and the lower right corner is the (dense) matrix function of the Arnoldi Hessenberg matrix H_k. The upper right corner reflects the coupling between both subspaces and is given by the solution of the Sylvester equation

    T_m Y − Y H_k = f(T_m) X − X f(H_k)    (32)

with X = S_m† A V_k, which follows from the identity f(H)H = Hf(H). Combining (30) and (31), we obtain

    f(A)x ≈ (S_m V_k) \begin{pmatrix} f(T_m) & Y \\ 0 & f(H_k) \end{pmatrix} \begin{pmatrix} S_m† \\ V_k† \end{pmatrix} x
          = S_m f(T_m) S_m† x + (S_m V_k) \begin{pmatrix} Y \\ f(H_k) \end{pmatrix} V_k† x .    (33)

Note that V_k† x = β e_1. Therefore only f(T_m) and the first column of Y and f(H_k), i.e., the first m+1 columns of f(H), are needed to evaluate (33). This information can be computed using the spectral definition (4) or some other suitable method [15,17,22,23]. In the case of the sign function we chose to use Roberts' iterative method [16]

    S^{n+1} = (1/2) ( S^n + (S^n)^{−1} )    (34)

with S^0 = A, which converges quadratically to sgn(A). Roberts' method is applied to compute sgn(T_m) and sgn(H_k). The matrix Y is computed by solving the Sylvester equation (32) using the method described in Appendix A.

In the implementation one has to be careful to compute the deflated eigenvectors to high enough accuracy, as this will limit the overall accuracy of the function approximation. When computing f(A)x for several x, the partial Schur decomposition (23) and the triangular matrix f(T_m) need to be computed only once. Only the modified Arnoldi method must be repeated for each new vector x.

We summarize our algorithm for approximating f(A)x:
(i) Determine the eigenvectors for m critical eigenvalues of A using ARPACK. Construct and store the corresponding Schur matrix S_m and the upper triangular matrix T_m = S_m† A S_m. The columns of S_m form an orthonormal basis of a subspace Ω_m.
(ii) Compute the triangular matrix f(T_m) using (34).
(iii) Compute x_⊥ = (1 − S_m S_m†)x.
(iv) Construct an orthonormal basis for the projected Krylov subspace K_k^⊥(A, x_⊥) using a modified Arnoldi method. The basis is constructed iteratively by orthogonalizing each new Krylov vector with respect to Ω_m and to all previous Arnoldi vectors, and is stored as columns of a matrix V_k. Also build the upper Hessenberg matrix H = Q† A Q with Q = (S_m, V_k).
(v) Compute (column m+1 of) f(H) using Roberts' iterative method (34) on H_k and solving the Sylvester equation (32) for Y as described in Appendix A.
(vi) Compute the approximation f(A)x ≈ Q f(H) Q† x using formula (33).
If f(A)x has to be computed for several x, only steps (iii)-(vi) need to be repeated for each vector x.
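Roberts' iteration (34) is straightforward to implement for the small dense matrices T_m and H_k; a minimal sketch (our names, with an ad hoc Frobenius-norm stopping test):

    import numpy as np

    def roberts_sign(A, tol=1e-14, maxiter=60):
        """Newton iteration (34): S_{n+1} = (S_n + S_n^{-1}) / 2, S_0 = A.
        Converges quadratically to sgn(A) provided no eigenvalue of A
        lies on the imaginary axis."""
        S = np.asarray(A, dtype=complex)
        for _ in range(maxiter):
            S_next = 0.5 * (S + np.linalg.inv(S))
            if np.linalg.norm(S_next - S) <= tol * np.linalg.norm(S_next):
                return S_next
            S = S_next
        raise RuntimeError("Roberts iteration did not converge")

Note that the iteration is applied only to the small k × k and m × m matrices, never to the large sparse operator itself.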
5.3. LR-deflation

An alternative deflation in the same composite subspace Ω_m + K_k(A, x) can be constructed using both the left and right eigenvectors corresponding to the critical eigenvalues. This deflation algorithm is a natural extension of the method described in Sec. 5.1 from the Hermitian to the non-Hermitian case. A similar idea has been used for the iterative solution of linear systems [31,32].

Assume that m critical eigenvalues of A have been computed together with their corresponding left and right eigenvectors by some iterative method like the one provided by ARPACK. The right eigenvectors satisfy

    A R_m = R_m Λ_m    (35)

with Λ_m the diagonal eigenvalue matrix for the m critical eigenvalues and R_m = (r_1, ..., r_m) the matrix of right eigenvectors (stored as columns). Similarly, the left eigenvectors obey

    L_m† A = Λ_m L_m† ,    (36)

where L_m = (ℓ_1, ..., ℓ_m) is the matrix containing the left eigenvectors (also stored as columns). For a non-Hermitian matrix, the left and right eigenvectors corresponding to different eigenvalues are orthogonal (for degenerate eigenvalues linear combinations of the eigenvectors can be formed such that this orthogonality property remains valid in general). Furthermore, if the eigenvectors are normalized such that ℓ_i† r_i = 1, then clearly L_m† R_m = I_m, and R_m L_m† is an oblique projector on the subspace Ω_m spanned by the right eigenvectors.

Let us now decompose x as

    x = x_∥ + x_⊖ ,    (37)

where x_∥ = R_m L_m† x is an oblique projection of x on Ω_m and x_⊖ = x − x_∥.

Applying f(A) to the decomposition (37) yields

    f(A)x = f(A) R_m L_m† x + f(A) x_⊖ .    (38)

The first term on the right-hand side can be evaluated exactly using

    f(A) R_m L_m† x = R_m f(Λ_m) L_m† x ,    (39)

which follows from Eq. (35), while the second term can be approximated by applying the Arnoldi method described in Sec. 4 to the vector x_⊖. An orthonormal basis is constructed in the Krylov subspace K_k(A, x_⊖) using the recurrence

    A V_k = V_k H_k + β_k v_{k+1} e_k^T ,    (40)

where v_1 = x_⊖/β and β = |x_⊖|. By construction, x_⊖ has no components along the m critical (right) eigendirections, and successive operations of A will yield no contributions along these directions either, hence K_k(A, x_⊖) does not mix with Ω_m. In principle, numerical inaccuracies accumulated during the Arnoldi iterations might make it necessary to occasionally re-extract spurious components along the critical eigendirections. However, this turned out not to be necessary in our numerical calculations.

Applying the Arnoldi approximation (19) to Eq. (38) yields the final approximation

    f(A)x ≈ R_m f(Λ_m) L_m† x + β V_k f(H_k) e_1 .    (41)

Note that again only the first column of f(H_k) is needed to evaluate Eq. (41). As before, f(H_k) has to be computed using some suitable method. The function sgn(H_k) can be efficiently computed using Roberts' algorithm (34).

We summarize our algorithm for approximating f(A)x in the LR-deflation scheme:
(i) Determine the left and right eigenvectors for m critical eigenvalues of A using ARPACK. Store the corresponding eigenvector matrices L_m and R_m.
(ii) Compute f(λ_i) (i = 1, ..., m) for the critical eigenvalues.
(iii) Compute x_⊖ = (1 − R_m L_m†)x.
(iv) Construct an orthonormal basis for the Krylov subspace K_k(A, x_⊖) using the Arnoldi recurrence. The basis is constructed iteratively by orthogonalizing each new Krylov vector with respect to all previous Arnoldi vectors, and is stored as columns of a matrix V_k. Also build the upper Hessenberg matrix H_k = V_k† A V_k.
(v) Compute (the first column of) f(H_k) using Roberts' iterative method (34).
(vi) Compute the approximation to f(A)x using Eq. (41).
If f(A)x has to be computed for several x, only steps (iii)-(vi) need to be repeated for each vector x.
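Steps (iii)-(vi) of the LR scheme condense into a few lines. A schematic sketch under our own naming: `arnoldi_fA_x` is a routine such as the one sketched after Sec. 4, and `f_scalar` acts elementwise on the critical eigenvalues (for the sign function, e.g., lam -> sign(Re lam)):

    import numpy as np

    def lr_deflated_fA_x(A, x, Lm, Rm, lam_m, k, f_scalar, arnoldi_fA_x):
        """Eq. (41): exact treatment of the m critical eigentriplets,
        normalized so that Lm^dag Rm = I, plus the Krylov approximation
        on the obliquely projected remainder x_minus."""
        c = Lm.conj().T @ x                   # oblique expansion coefficients
        x_minus = x - Rm @ c                  # Eq. (37): remove critical directions
        exact = Rm @ (f_scalar(lam_m) * c)    # R_m f(Lambda_m) L_m^dag x
        return exact + arnoldi_fA_x(A, x_minus, k)

Because the two subspaces do not mix, no extra orthogonalization against the deflated directions is needed inside the Arnoldi loop, in contrast to the Schur scheme.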

5.4. Discussion

We briefly compare both deflation schemes. Although both schemes use the same composite subspace Ω_m + K_k(A, x), they yield different function approximations resulting from a different subspace decomposition. In the Schur deflation, the composite subspace is decomposed in a sum of two orthogonal subspaces which are coupled by A, while in the LR-deflation the subspaces no longer mix, at the expense of losing orthogonality of the two subspaces.

Accordingly, both schemes introduce different approximations to f(A)x: the Schur deflation approximates the orthogonal projection of the solution vector on the total composite space using Eq. (29), while the LR-deflation first extracts the critical eigendirections and only approximates the orthogonal projection of the residual vector on the Krylov subspace using Eq. (19). Therefore, in the Schur deflation the components of f(A)x along the Schur vectors become more accurate as the Krylov subspace becomes larger, while in the LR-deflation the components of f(A)x along the critical eigendirections can be computed exactly, independently of the size of the Krylov subspace. This is probably the reason for the observation that, for fixed m and k, the LR-deflation is slightly more accurate than the Schur deflation, see Fig. 4 below.

Numerically, the LR-deflation has two advantages over the Schur deflation. First, its Arnoldi method does not require the deflated directions to be (obliquely) projected out of the Krylov subspace because the subspaces do not mix. Second, this absence of coupling between the subspaces means that the LR-scheme has no analog of the Sylvester equation (32).

A downside of the LR-deflation is that both left and right eigenvectors need to be computed in the initialization phase, whereas the Schur deflation only needs the right eigenvectors. Hence, the Schur deflation will have a shorter initialization phase, unless one needs to operate with both f(A) and its adjoint f(A)†, in which case both sets of eigenvectors are needed anyway (for the latter, the roles of left and right eigenvectors are interchanged).

In the next section, we will present numerical results obtained with our modified Arnoldi method.

6. Results

We implemented the modified Arnoldi method proposed in the previous section to compute the sign function occurring in the overlap Dirac operator (1) of lattice QCD.

First we discuss the critical eigenvalues of the γ5-Wilson-Dirac operator H_w(µ), which are needed for deflation and have to be computed once for any given SU(3) gauge configuration. Deflation is needed because of the existence of eigenvalues close to the imaginary axis. In Fig. 1 we show the spectrum of H_w(µ) for a 4^4 lattice and a 6^4 lattice, using the same parameters as in Ref. [7], i.e., m_w = −2 and gauge coupling β_g = 5.1. These complete spectra were computed with LAPACK [33]. This is a very costly calculation, especially for the 6^4 lattice, which was done for the sole purpose of this numerical investigation but cannot be performed routinely during production runs.
Although the eigenvalues of interest for deflation in the case of the sign function are those with smallest absolute real parts, we decided to deflate the eigenvalues with smallest magnitude instead. Numerically the latter are more easily determined, and both choices yield almost identical deflations for the γ5-Wilson operator at nonzero chemical potential. The reason for this is that, as long as the chemical potential does not grow too large, the spectrum looks like a very narrow bow-tie shaped strip along the real axis (see Fig. 1), and the sets of eigenvalues with smallest absolute real parts and smallest magnitudes will coincide.

In practice we compute the eigenvalues of H_w with smallest magnitude with ARPACK. This package has an option to retrieve the eigenvalues with smallest magnitude without performing an explicit inversion, which would be very expensive in this case. However, the use of this option requires the eigenvalues to be near the boundary of the convex hull of the spectrum. From Fig. 1 it is clear that the eigenvalues closest to the origin are deep inside the interior of the convex hull and do not satisfy this requirement. Therefore we opted to compute the eigenvalues with smallest magnitude of the squared operator H_w². Clearly the eigenvalues with smallest magnitude will be the same for both operators, as |λ²| = |λ|². The eigenvalues of H_w² are given by z² = x² − y² + 2ixy, where x and y are the real and imaginary parts of the eigenvalues z of H_w. The spectra of H_w² for the 4^4 and 6^4 lattices are shown in Fig. 2, and clearly the eigenvalues with smallest magnitudes are now close to the boundary of the convex hull of the spectrum, so that ARPACK can find them more easily. Since in this approach we square the matrix, there is the potential danger that small eigenvalues get spoiled just by the additional numerical round-off introduced when applying A twice. However, this should be noticeable only if these eigenvalues are comparable in size to the round-off error. Our calculations are not affected by this problem.

Fig. 1. Spectrum of H_w(µ) in Eq. (1) for a 4^4 lattice (top pane) and a 6^4 lattice (bottom pane), with µ = 0.3 and m_w = −2. Note the difference in scale between real and imaginary axes. The gauge fields were generated using the Wilson plaquette action with gauge coupling β_g = 5.1 [7].

Fig. 2. Spectrum of H_w²(µ) for a 4^4 lattice (top pane) and a 6^4 lattice (bottom pane), with µ = 0.3 and β_g = 5.1.

Obviously there is a trade-off between the number of deflated eigenvalues and the size of the Krylov subspace. A useful piece of information in this context is the ratio of the magnitude of the largest deflated eigenvalue over the largest overall eigenvalue, which is given in Table 1 for different numbers of deflated eigenvalues.

Table 1
Ratio of largest deflated eigenvalue over largest eigenvalue, max|λ_defl|/max|λ_all|, for various numbers of deflated eigenvalues m for a 4^4 and a 6^4 lattice.

     m    4^4 lattice   6^4 lattice
     2       0.0040        0.0016
     4       0.0056        0.0026
     8       0.0119        0.0035
    16       0.0183        0.0052
    32       0.0351        0.0084
    64       0.0631        0.0155
   128         —           0.0286

A comparison of the values for both lattice sizes indicates that the number of eigenvalues with a magnitude smaller than a given ratio increases proportionally with the lattice volume. This is consistent with a spectral density of the small eigenvalues proportional to the lattice volume. This property is also apparent in Figs. 1 and 2, as the contours enclosing the spectra remain unchanged when the volume is increased. As a first guess we therefore expect that scaling m with the volume should yield comparable convergence properties of the method for various lattice sizes.

The convergence of the method is illustrated in Fig. 3, where the accuracy of the approximation is shown as a function of the Krylov subspace size for two different lattice sizes. The various curves correspond to different numbers of deflated eigenvalues. The results in the figure were computed using the LR-deflation scheme. The Schur deflation yields similar results.

Without deflation (m = 0) the Krylov subspace method would be numerically unusable because of the need of large Krylov subspaces. Clearly, deflation highly improves the efficiency of the numerical method: as more eigenvalues are deflated, smaller Krylov subspaces are sufficient to achieve a given accuracy.

Fig. 3. Accuracy of the modified Arnoldi method for y = sgn(A)x. The non-Hermitian matrix is A = γ5 D_w(µ), where D_w(µ) is the Wilson-Dirac operator (2) at chemical potential µ = 0.3, and x = (1, 1, ..., 1). Top pane: 4^4 lattice with dim(A) = 3072, bottom pane: 6^4 lattice with dim(A) = 15552. The relative error ε = ‖ỹ − y‖/‖y‖ is shown as a function of the Krylov subspace size k for various numbers of deflated eigenvalues m using the LR-deflation. In order to compute the error ε, the exact solution y was first determined using the spectral definition (4) and the full spectral decomposition computed with LAPACK.

Fig. 4. Comparison of the accuracies achieved with both deflation schemes and the exact projection of y on the composite space Ω_m + K_k(A, x). The relative error ε = ‖ỹ − y‖/‖y‖ is shown as a function of the Krylov subspace size k. For both lattice sizes the accuracy of the LR-deflation is slightly better than that of the Schur deflation. Furthermore, the accuracy of the modified Arnoldi approximation is very close to the best possible approximation in the composite subspace.

Furthermore, the deflation efficiency grows with increasing lattice volume. To reach an accuracy of 10^{−8} for the 4^4 lattice with 25 (≈ 0.0081N) deflated eigenvalues, one requires a Krylov subspace size k ≈ 570 (≈ 0.19N). However, to reach the same accuracy for the 6^4 lattice with a comparable deflation of 128 (≈ 0.0082N) critical eigenvalues, one only requires k ≈ 700 (≈ 0.045N). Although the matrix size N is more than 5 times larger for the 6^4 lattice, the Krylov subspace only has to be expanded by a factor of 1.2 to achieve the same accuracy (when m is scaled proportionally to N so that the ratio of Table 1 remains approximately constant for both lattice sizes).

In Fig. 4 we compare the accuracy of the two deflation schemes described in Secs. 5.2 and 5.3. For an equal number of deflated eigenvalues and equal Krylov subspace size, the LR-deflation seems systematically slightly more accurate than the Schur deflation.

To assess the quality of the modified Arnoldi approximation, it is interesting to compare the approximations (33) and (41) for f(A)x with the best approximation in the composite subspace, which corresponds to the orthogonal projection (27) of f(A)x on Ω_m + K_k(A, x),

    y_proj = QQ†y = Σ_{i=1}^{m} (s_i† y) s_i + Σ_{i=1}^{k} (v_i† y) v_i .    (42)

The relative accuracy of this projection is also shown in Fig. 4. It is encouraging to note that the modified Arnoldi approximation is quite close to the exact projection y_proj.

In Table 2 we show the CPU time used by the modified Arnoldi method for the Schur and LR-deflation schemes. The times needed to construct the orthonormal basis in the Krylov subspaces according to Eqs. (24) and (40) and to compute the sign function of the Arnoldi Hessenberg matrices are tabulated separately. The tabulated times were measured for an m = 32 deflation for the 4^4 lattice and m = 128 for the 6^4 lattice.

The larger CPU times required by the Schur deflation mainly reflect the additional orthogonalization of the Arnoldi vectors with respect to the Schur vectors. The time needed to compute the sign of the Hessenberg matrix is also slightly larger for the Schur deflation, as it involves the additional solution of the Sylvester equation (32). For the same reasons, varying m for a given lattice size will only significantly change the timings for the Schur deflation (this m-dependence is not shown in the table).

To summarize, the LR-deflation scheme has a somewhat better accuracy and requires less CPU time per iteration than the Schur deflation. The one advantage of the Schur deflation is that it only requires the initial computation of the right eigenvectors, while the LR-deflation requires the computation of both left and right eigenvectors. Hence, the Schur deflation will have a shorter initialization phase. The time needed to compute the critical eigenvectors of H_w(µ) is given in the headers of the four blocks in Table 2. The choice of deflation scheme thus depends on the number of vectors x for which sgn(H_w)x needs to be computed. This will be the topic of future work on nested iterative methods for non-Hermitian matrices occurring during the inversion of the overlap operator. Of course, as mentioned in Sec. 5.4, if one needs to apply both sgn(H_w) and its adjoint, then the LR-deflation will be the better choice.

Table 2
CPU time (in seconds) for varying Krylov subspace size k. Top block: 4^4 lattice with m = 32, bottom block: 6^4 lattice with m = 128. Left: Schur deflation, right: LR-deflation. The time required by the initial calculation of the critical eigenvectors is given in the header of each block. The time needed to construct the Arnoldi basis in the Krylov subspace (column "Arnoldi") is approximately proportional to Nk(k+2m) for the Schur deflation and Nk² for the LR-deflation. The time used by Roberts' method (34) to compute the sign function of the Hessenberg matrix (column "sgn") is O(k³). The total time also includes the evaluation of Eq. (33) for the Schur deflation and Eq. (41) for the LR-deflation. These timings were measured on an Intel Core 2 Duo 2.33 GHz computer using optimized ATLAS BLAS routines [34].

4^4 lattice, m = 32
Schur deflation (initialization: 14.1 s)    LR-deflation (initialization: 27.5 s)
     k   Arnoldi   sgn(H)   total              Arnoldi   sgn(H_k)   total
   100      0.18     0.03    0.23                 0.12       0.03    0.15
   200      0.59     0.21    0.81                 0.45       0.20    0.66
   300      1.22     0.52    1.75                 1.01       0.49    1.51
   400      2.05     1.08    3.16                 1.77       1.02    2.82
   500      3.12     1.79    4.93                 2.76       1.69    4.47
   600      4.37     2.90    7.31                 3.94       2.77    6.74
   700      5.88     4.57   10.49                 5.36       4.40    9.79
   800      7.56     6.69   14.28                 6.96       6.44   13.44
   900      9.50     9.38   18.92                 8.84       9.10   17.98
  1000     11.63    12.68   24.36                10.84      12.33   23.21

6^4 lattice, m = 128
Schur deflation (initialization: 884 s)     LR-deflation (initialization: 1713 s)
     k   Arnoldi   sgn(H)   total              Arnoldi   sgn(H_k)   total
   100      2.03     0.05    2.13                 0.66       0.03    0.75
   200      5.16     0.22    5.45                 2.39       0.15    2.62
   300      9.27     0.56    9.91                 5.16       0.42    5.69
   400     14.59     1.15   15.85                 9.01       0.94   10.06
   500     20.95     2.09   23.17                13.96       1.73   15.84
   600     28.12     3.35   31.61                20.03       2.80   22.98
   700     36.81     5.17   42.15                27.09       4.44   31.70
   800     46.32     7.39   53.88                35.09       6.49   41.78
   900     56.83    10.37   67.39                44.38       9.10   53.70
  1000     68.29    13.88   82.39                54.74      12.36   67.34

7. Conclusion

In this paper we have proposed an algorithm to approximate the action of a function of a non-Hermitian matrix on an arbitrary vector, when some of the eigenvalues of the matrix lie in a region of the complex plane close to a discontinuity of the function.

The method approximates the solution vector in a composite subspace, i.e., a Krylov subspace augmented by the eigenvectors corresponding to a small number of critical eigenvalues. In this composite subspace two deflation variants are presented based on different subspace decompositions: the Schur deflation uses two coupled orthogonal subspaces, while the LR-deflation uses two decoupled but non-orthogonal subspaces.

The subspace decompositions are then used to compute Arnoldi-based function approximations in which the contribution of the critical eigenvalues is taken into account explicitly. This deflation of critical eigenvalues allows for a smaller size of the Krylov subspace and is crucial for the efficiency of the method.

For the sign function, deflation is particularly important because of its discontinuity along the imaginary axis. The method was applied to the overlap Dirac operator of lattice QCD at nonzero chemical potential, where deflation was shown to clearly enhance the efficiency of the method. If the overlap Dirac operator has to be inverted using some iterative method, each iteration will require the computation of sgn(H_w)x for some vector x. In such a situation the cost for computing the critical eigenvectors, which is done just once, is by far outbalanced by the smaller costs for each evaluation of the sign function. However, an important question that deserves further study is how the optimal number m of deflated eigenvectors depends on the volume and how this influences the initialization time. This question could become performance relevant for large volumes.

As mentioned above, our next steps include the application of the two-sided Lanczos method to the problem of approximating f(A)x for a non-Hermitian matrix, and the investigation of nested iterative methods for non-Hermitian matrices. Work in these directions is in progress.

Appendix A. Sylvester equation

In this appendix we describe a particularly simple algorithm to solve the special Sylvester equation

    T Y − Y H = C ,    (A.1)

where T is an m × m upper triangular matrix, H is an n × n upper Hessenberg matrix, and the right-hand side C and the unknown matrix Y are m × n matrices. Classical methods to solve the Sylvester equation when T and H are full matrices are formulated in [35,36]. For triangular H and T the Sylvester equation can easily be solved by direct substitution, see, e.g., [14]. In principle, this algorithm could also be applied to Eq. (A.1) if the upper Hessenberg matrix H is first transformed into triangular form using a Schur decomposition. Here we present a more efficient approach, which can be regarded as a natural extension of the algorithm for the triangular Sylvester equation when one of the matrices is upper Hessenberg instead of triangular. Blocking would also be possible (cf., e.g., [37]), but since the solution of the Sylvester equation accounts only for a small portion of the overall time we did not pursue this issue further.

Written out explicitly, the element (i, j) of the matrix equation (A.1) is

    Σ_{k=i}^{m} T_ik y_kj − Σ_{k=1}^{min(j+1,n)} y_ik H_kj = c_ij    (A.2)

for i = 1, ..., m and j = 1, ..., n.

This matrix equation can be solved row by row from bottom to top, since Eq. (A.2) can be solved for row i once rows i+1, ..., m are known,

    T_ii y_ij − Σ_{k=1}^{min(j+1,n)} y_ik H_kj = c̃_ij    (A.3)

with c̃_ij = c_ij − Σ_{k=i+1}^{m} T_ik y_kj.

Inside row i one can solve for the element y_{i,j+1} as a function of the elements to its left,

    y_{i,j+1} = −(1/H_{j+1,j}) [ c̃_ij − T_ii y_ij + Σ_{k=1}^{j} y_ik H_kj ]    (A.4)

for columns j = 1, ..., n−1. From Eq. (A.4) it follows that all elements of row i can be written as

    y_ij = a_j y_i1 + b_j ,    (A.5)

where the coefficients a_j and b_j can be computed explicitly from the recurrence relations

    a_{j+1} = −(1/H_{j+1,j}) [ −T_ii a_j + Σ_{k=1}^{j} a_k H_kj ] ,
    b_{j+1} = −(1/H_{j+1,j}) [ c̃_ij − T_ii b_j + Σ_{k=1}^{j} b_k H_kj ] ,    (A.6)

starting from a_1 = 1, b_1 = 0.

After substituting Eq. (A.5) with the known coefficients (A.6), element (i, n) of Eq. (A.3) can be solved for y_i1,

    y_i1 = ( c̃_in − T_ii b_n + Σ_{k=1}^{n} b_k H_kn ) / ( T_ii a_n − Σ_{k=1}^{n} a_k H_kn ) .    (A.7)

Once y_i1 is known, all other elements y_ij of row i can be computed using Eq. (A.5) with coefficients (A.6).
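The recurrences (A.4)-(A.7) translate directly into code. A sketch in NumPy (our names; it assumes the Hessenberg matrix is unreduced, i.e., H_{j+1,j} ≠ 0, and that the spectra of T and H are disjoint so that the denominator in (A.7) does not vanish):

    import numpy as np

    def sylvester_tri_hess(T, H, C):
        """Solve T Y - Y H = C, Eq. (A.1), with T upper triangular (m x m)
        and H upper Hessenberg (n x n), row by row from bottom to top."""
        m, n = T.shape[0], H.shape[0]
        Y = np.zeros((m, n), dtype=complex)
        for i in range(m - 1, -1, -1):
            # right-hand side of Eq. (A.3): rows i+1..m already folded in
            ct = C[i, :] - T[i, i + 1:] @ Y[i + 1:, :]
            a = np.zeros(n, dtype=complex)
            b = np.zeros(n, dtype=complex)
            a[0] = 1.0                        # a_1 = 1, b_1 = 0
            for j in range(n - 1):            # recurrences (A.6)
                a[j + 1] = -(-T[i, i] * a[j] + H[:j + 1, j] @ a[:j + 1]) / H[j + 1, j]
                b[j + 1] = -(ct[j] - T[i, i] * b[j] + H[:j + 1, j] @ b[:j + 1]) / H[j + 1, j]
            # Eq. (A.7): solve element (i, n) for y_{i1}, then fill row by (A.5)
            denom = T[i, i] * a[-1] - H[:, -1] @ a
            Y[i, 0] = (ct[-1] - T[i, i] * b[-1] + H[:, -1] @ b) / denom
            Y[i, :] = a * Y[i, 0] + b
        return Y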
Acknowledgments

This work was supported in part by DFG grants FOR465-WE2332/4-2 and Fr755/15-1. JB would like to thank Thomas Kaltenbrunner for useful discussions.

References

[1] H. B. Nielsen, M. Ninomiya, Nucl. Phys. B 185 (1981) 20; Nucl. Phys. B 193 (1981) 173; Phys. Lett. B 105 (1981) 219.
[2] R. Narayanan, H. Neuberger, Nucl. Phys. B 412 (1994) 574; Nucl. Phys. B 443 (1995) 305; H. Neuberger, Phys. Lett. B 417 (1998) 141.
[3] P. H. Ginsparg, K. G. Wilson, Phys. Rev. D 25 (1982) 2649.
[4] H. Neuberger, Phys. Rev. Lett. 81 (1998) 4060.
[5] J. van den Eshof, A. Frommer, T. Lippert, K. Schilling, H. A. van der Vorst, Comput. Phys. Commun. 146 (2002) 203.
[6] For a recent review, see M. Stephanov, Proc. of Science LAT2006 (2006) 024.
[7] J. Bloch, T. Wettig, Phys. Rev. Lett. 97 (2006) 012003.
[8] D. B. Kaplan, Phys. Lett. B 288 (1992) 342.
[9] Y. Shamir, Nucl. Phys. B 406 (1993) 90.
[10] V. Furman, Y. Shamir, Nucl. Phys. B 439 (1995) 54.
[11] P. Hasenfratz, F. Karsch, Phys. Lett. B 125 (1983) 308; Phys. Rept. 103 (1984) 219.
[12] J. B. Kogut, et al., Nucl. Phys. B 225 (1983) 93.
[13] N. Dunford, J. Schwartz, Linear Operators, Part I: General Theory, Interscience Publishers, 1958.
[14] G. H. Golub, C. F. Van Loan, Matrix Computations, The Johns Hopkins University Press, 1989.
[15] E. D. Denman, A. N. Beavers, Appl. Math. Comput. 2 (1976) 63.
[16] J. Roberts, Internat. J. Control 32 (1980) 677.
[17] C. Kenney, A. Laub, SIAM J. Matrix Anal. Appl. 12 (1991) 273.
[18] N. Higham, Linear Algebra Appl. 212/213 (1994) 3.
[19] R. A. Horn, C. R. Johnson, Topics in Matrix Analysis, Cambridge University Press, Cambridge, 1994.
[20] J. Bloch, T. Wettig, arXiv:0709.4630 [hep-lat].
[21] A. Boriçi, Phys. Lett. B 453 (1999) 46; J. Comput. Phys. 162 (2000) 123.
[22] C. Kenney, A. Laub, IEEE Trans. Autom. Control 40 (1995) 1330.
[23] N. J. Higham, Num. Algorithms 15 (1997) 227.
[24] http://www.caam.rice.edu/software/ARPACK.
[25] H. A. van der Vorst, J. Comput. Appl. Math. 18 (1987) 249.
[26] V. Druskin, A. Greenbaum, L. Knizhnerman, SIAM J. Sci. Comput. 19 (1998) 38.

[27] E. Gallopoulos, Y. Saad, On the parallel solution of parabolic equations, in: R. D. Groot (Ed.), Proceedings of the International Conference on Supercomputing 1989, Heraklion, Crete, June 5-9, 1989, ACM Press, 1989.
[28] M. Hochbruck, M. E. Hochstenbach, Subspace extraction for matrix functions, preprint (2005).
[29] Y. Saad, SIAM J. Numer. Anal. 29 (1992) 209.
[30] Y. Saad, SIAM J. Matrix Anal. Appl. 18 (1997) 435.
[31] R. Morgan, W. Wilcox, Deflated iterative methods for linear equations with multiple right-hand sides, Tech. rep., Baylor University (2004).
[32] A. Hasenfratz, P. Hasenfratz, F. Niedermayer, Phys. Rev. D 72 (2005) 114508.
[33] http://www.netlib.org/lapack.
[34] http://math-atlas.sourceforge.net.
[35] R. H. Bartels, G. W. Stewart, Commun. ACM 15 (1972) 820.
[36] G. H. Golub, S. Nash, C. F. Van Loan, IEEE Trans. Autom. Control 24 (1979) 909.
[37] E. S. Quintana-Ortí, R. A. van de Geijn, ACM Trans. Math. Softw. 29 (2) (2003) 218.
