ARNOLDI METHODS FOR THE EIGENVALUE PROBLEM, GENERALIZED OR NOT

MARKO HUHTANEN∗

Abstract. Arnoldi methods are devised for polynomially computing either a Hessenberg-triangular form or a triangular-triangular+rank-one form to numerically solve large eigenvalue problems, generalized or not. If generalized, then no transformations into a standard form take place. If standard, then a new Arnoldi method arises. The equivalence transformations involved are unitary, or almost, and polynomially generated. The normal generalized eigenvalue structure is identified for which separate Arnoldi methods of moderate complexity are derived. Averaging elements of the Grassmannian Gr_k(C^n) is suggested to optimally generate subspaces in the codomain. Then the two canonical forms described get averaged, giving rise to an optimal Arnoldi method. For k = 1 this replaces the Rayleigh quotient by yielding the field of directed norms instead of the field of values.

Key words. generalized eigenvalue problem, Arnoldi method, Krylov subspace, polynomial method, normal eigenvalue problem, optimality, Rayleigh quotient, Grassmannian average

AMS subject classifications. 65F15, 65F50

1. Introduction. Consider a large eigenvalue problem

Mx = λNx (1.1)

of computing a few eigenvalues λ and corresponding eigenvectors x ∈ C^n. Here the matrices M, N ∈ C^{n×n} are possibly sparse, as is often the case in applications [18, 20, 24]. Using the generalized Schur decomposition as a starting point, an algorithm for the task consists of first computing Qk ∈ C^{n×k} with orthonormal columns, or almost, for k ≪ n. Depending on the cost, a number of suggestions have been made to this end; see [7] and [2, Chapter 8] as well as [22]. (There is a vast literature on iterative methods for eigenvalue problems; for a concise review on the generalized eigenvalue problem, see, e.g., [23, Section 7].) Thereafter Zk ∈ C^{n×k} with orthonormal columns is generated. The eigenvalue problem then gets reduced in dimension once some linear combinations A and (nonsingular) B of M and N are compressed by forming the partial equivalence transformation

Zk^*AQk and Zk^*BQk. (1.2)

In this paper, Arnoldi methods are devised for polynomially computing Qk and Zk in the domain and codomain. Without any transformations into a standard form taking place, classical iterations get covered in a natural way. A new Arnoldi method for the standard eigenvalue problem arises. The normal generalized eigenvalue structure is identified, admitting Arnoldi methods of moderate computational complexity to be devised for interior eigenvalues. For a given Qk, a criterion for optimally computing Zk is devised based on averaging two elements of the Grassmannian Gr_k(C^n). An optimal Arnoldi method for the eigenvalue problem arises. Altogether, it is shown that the standard and generalized eigenvalue problems should not be treated separately, and that the classical Arnoldi method is not optimal. To derive Arnoldi methods for (1.1), for appropriate Krylov subspaces inspect the associated resolvent operator

λ ↦ R(A, B, λ) = (λB − A)^{-1}.

∗ Division of Mathematics, Department of Electrical and Information Engineering, University of Oulu, 90570 Oulu 57, Finland ([email protected]).

Analogously to the "shift-and-invert" paradigm, here the choice of the linear combinations A and B is a delicate issue in view of convergence. The Neumann series expansion of the resolvent operator reveals a Krylov space structure for computing Qk in the domain. For computing Zk in the codomain, two Krylov subspaces surface in a natural way, both giving rise to finitely computable canonical forms.
With the first choice for a Krylov subspace to compute Zk, the matrix compressions attain the well-known Hessenberg-triangular form. Recall that the classical Arnoldi method is an iterative alternative to using elementary unitary transformations to convert a single matrix into a Hessenberg form. For the generalized eigenvalue problem, elementary unitary transformations can be used to bring a pair of matrices into a Hessenberg-triangular form [16].^1 Here an Arnoldi method for iteratively computing this form is devised. Even though we are dealing with the generalized eigenvalue problem, polynomially the Ritz values result from a standard Arnoldi minimization problem. (In [22, Section 3] the same partial form is derived purely algebraically, without establishing a polynomial connection.) Polynomials are important by the fact that they link iterative methods with other fields of mathematics, offering a large pool of tools accordingly; see [15, 11] and references therein.
With the second choice for a Krylov subspace to compute Zk, the matrix compressions attain a triangular-triangular+rank-one form. We have encountered such a structure neither in connection with the generalized eigenvalue problem nor with the classical Arnoldi method. Because of the symmetric roles of the matrices A and B, it can be argued that it is equally natural as the Hessenberg-triangular form by the fact that the Ritz values arise polynomially as well, although somewhat surprisingly, through a GMRES (generalized minimal residual) minimization problem. This hence provides an explicit link between the GMRES method and eigenvalue approximations. In the case of a standard eigenvalue problem, a triangular-companion matrix form arises.
There are Krylov subspace methods for computing Ritz values for normal matrices [11]. These methods can be applied to normal generalized eigenvalue problems. A generalized eigenvalue problem is said to be normal if the generalized Schur decomposition involves diagonal matrices. This leads to a natural extension of the classical notion of normality, providing computationally by far the most attractive setting. That is, for interior eigenvalues the normal case turns out to be strikingly different by not requiring applying shift-and-invert techniques. In this context, the so-called folded spectrum method can be treated as a special case.
Partly motivated by a possibly inaccurate computation of Qk in forming the partial equivalence transformation (1.2), a criterion for optimally computing Zk for a given Qk is devised. Formulated in terms of Grassmannians, it consists of comparing two k-dimensional subspaces of C^n so as to simultaneously compute a maximal projection onto them in terms of applying Zk. For the two Arnoldi methods just described, the Zk's suggested get replaced with their average such that the resulting Arnoldi method can be regarded as optimal. In particular, for k = 1 solving this for the standard eigenvalue problem replaces the Rayleigh quotient q^*Aq with

(q^*Aq/|q^*Aq|) ‖Aq‖

such that the field of values accordingly becomes the field of directed norms.
Consequently, the corresponding power method converges more aggressively towards an

^1 Serves as a "front end" decomposition before executing the actual QZ iteration yielding the generalized Schur decomposition [8, p. 380].

exterior eigenvalue. This is not insignificant since the simple power method effect is the reason behind the success of more complex iterative methods for eigenvalues. For k ≥ 2, optimally computing Zk for a given Qk is not costly and thereby provides a noteworthy option for any method relying on the use of a partial equivalence transformation.^2
In Section 2, Krylov subspaces and two Arnoldi methods for the generalized eigenvalue problem are described. Although both are polynomial methods, they give rise to very different canonical forms. Section 3 deals with the normal generalized eigenvalue problem and how shift-and-invert can then be avoided. In Section 4 a way to perform the partial equivalence transformation optimally is derived. An optimal Arnoldi method arises. The field of directed norms is introduced. In Section 5 numerical experiments are conducted.

2. Arnoldi methods for the eigenvalue problem. In the generalized eigenvalue problem (1.1), the task is to locate the singular elements of the nonsingular^3 two dimensional matrix subspace

V = span{M, N}. (2.1)

In practice this is solved by computing invertible matrices X, Y ∈ C^{n×n} so as to perform an equivalence transformation

W = X V Y (2.2)

whose singular elements are readily identifiable. For problems of moderate size, a numerically reliable way [16] to achieve this is based on fixing a basis of V and then computing the generalized Schur decomposition of the following theorem.
Theorem 2.1. Suppose A, B ∈ C^{n×n}. Then there exist unitary matrices Q and Z such that Z^*AQ = T and Z^*BQ = S are both upper triangular.
If the unitary matrices Q and Z can be chosen in such a way that T and S are diagonal, then there are good reasons to call the matrix subspace V normal. Correspondingly, the generalized eigenvalue problem (1.1) is then said to be normal. This yields a very natural extension of the notion of normal matrix; see Section 3.
For large problems, computing the generalized Schur decomposition is typically not realistic. This paper is concerned with devising Arnoldi methods to construct a partial equivalence transformation of V, i.e., with iteratively computing matrices X^*, Y ∈ C^{n×k} with orthonormal columns for k ≪ n, to accordingly perform a dimension reduction in (2.2).^4 Bearing in mind that the generalized Schur decomposition carries two unitary matrices, i.e., two different orthonormal bases, a natural Arnoldi method to this end should involve two Krylov subspaces.

2.1. Krylov subspaces of the eigenvalue problem. To compute Krylov subspaces, fix a basis of V by setting

A = aM + bN and B = cM + dN (2.3)

for some scalars a, b, c, d ∈ C such that det [ a b ; c d ] ≠ 0. The choice of these parameters is a delicate issue determined by which eigenvalues are being searched. Expressed by

^2 Practically all the methods for the eigenvalue problem can be recast in such a way that they rely on the use of a partial equivalence transformation.
^3 Nonsingular means that V contains invertible elements.
^4 Assuming biorthogonality, oblique projections are used to describe similar approaches for eigenvalue problems; see [20] and [2, Chapter 3].

using the "shift-and-invert" paradigm, it is unavoidable in any iterative computation of a small number of specific eigenvalues. Once this is done, solving the eigenvalue problem

Ax = λBx (2.4)

is equivalent to solving the original formulation (1.1) in the following sense.
Proposition 2.2. Let M, N ∈ C^{n×n} and suppose (2.3) holds with det [ a b ; c d ] ≠ 0. Then the matrix αA + βB is singular if and only if δM + γN is singular, where [ δ ; γ ] = [ a b ; c d ]^T [ α ; β ].
Investigating the inverses of invertible elements of a matrix subspace can be very useful for devising numerical methods [1, 10]. For the eigenvalue problem (2.4) this means inspecting the resolvent operator

λ ↦ R(A, B, λ) = (λB − A)^{-1}. (2.5)

Analytically, finding the poles of the resolvent operator is equivalent to solving the generalized eigenvalue problem (1.1). Assume B (equivalently A) is invertible. For algebraic information, invoking the Neumann series yields

R(A, B, λ) = Σ_{j=0}^{∞} (B^{-1}A)^j B^{-1} / λ^{j+1},

which is valid for λ large enough in absolute value. Bearing in mind the classical Arnoldi method, the truncated Neumann series for the resolvent operator hints at canonical Krylov subspaces.^5 For the generalized eigenvalue problem this means taking the appearing powers and expressing their linear combination in terms of a polynomial p to have

(λB − A) p(B^{-1}A) B^{-1} = p(AB^{-1}) (λB − A) B^{-1} (2.6)

for any λ ∈ C. This reveals what type of polynomial methods and Krylov subspaces are naturally associated with the generalized eigenvalue problem. In particular, there is not much room for other polynomial methods in the following sense.
Proposition 2.3. Let A, B ∈ C^{n×n} with B invertible and assume

(λB − A) p(X) = p(Y) (λB − A)

holds for any polynomial p and any λ ∈ C. Then X commutes with B^{-1}A and Y = BXB^{-1}.^6
Proof. By taking λ = 0 and p(λ) = λ we have AX = YA. Varying λ with p(λ) = λ then gives BX = YB, so that Y = BXB^{-1} follows. Combining this with AX = YA forces X to commute with B^{-1}A.
Recall the generalized Schur decomposition of Theorem 2.1. To iteratively construct a partial equivalence transformation (1.2), the identity (2.6) suggests how to generate Krylov subspaces. By taking a starting vector b ∈ C^n, in the domain first form

K_k(B^{-1}A; B^{-1}b) = span{ B^{-1}b, B^{-1}AB^{-1}b, ..., (B^{-1}A)^{k-1}B^{-1}b } (2.7)

^5 For a linear system Ax = b with A ∈ C^{n×n} and b ∈ C^n, the resolvent operator λ ↦ (λI − A)^{-1} of A has the Neumann series expansion involving powers of A for |λ| large enough. Modern iterative methods, such as GMRES, rely on forming optimal linear combinations of these powers applied to b.
^6 This means that X is, generically, a polynomial in B^{-1}A.

for k = 1, 2, .... Then we have the images

B K_k(B^{-1}A; B^{-1}b) = K_k(AB^{-1}; b) (2.8)

A K_k(B^{-1}A; B^{-1}b) = K_k(AB^{-1}; AB^{-1}b) (2.9)

which are Krylov subspaces as well. Plainly, it appears natural to construct Qk based on the Krylov subspace (2.7). However, having now two Krylov subspaces (2.8) and (2.9) for constructing Zk in the codomain, we are forced to make some choices. From the outset, because of the symmetric roles of A and B, neither of them appears to be preferable over the other. It turns out that to both of these extremes there corresponds a natural Arnoldi method with very different canonical forms for (2.4). These alternatives will be discussed in the two subsections that follow. Then in Section 4, an average of (2.8) and (2.9) is computed for an optimal Arnoldi method.
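As a quick sanity check of the identity (2.6) and of the subspace relations (2.8) and (2.9), the following minimal NumPy sketch verifies them on a small dense example; the matrices, the linear combinations and the polynomial below are arbitrary choices made only for illustration and are not taken from the paper.

```python
# Numerical check of (2.6) and of the image relations (2.8)-(2.9)
# on a small, dense random example.
import numpy as np

rng = np.random.default_rng(0)
n, k = 8, 4
M = rng.standard_normal((n, n))
N = rng.standard_normal((n, n))
A, B = M + 2 * N, 3 * M - N          # some basis (2.3) of span{M, N}
b = rng.standard_normal(n)
Binv = np.linalg.inv(B)

def krylov(C, v, k):
    """Return a matrix whose columns are v, Cv, ..., C^{k-1}v."""
    cols = [v]
    for _ in range(k - 1):
        cols.append(C @ cols[-1])
    return np.column_stack(cols)

# Identity (2.6) for an arbitrary polynomial, here p(t) = t^2 + 5t + 1.
p = lambda X: X @ X + 5 * X + np.eye(n)
lam = 1.7
lhs = (lam * B - A) @ p(Binv @ A) @ Binv
rhs = p(A @ Binv) @ (lam * B - A) @ Binv
print(np.allclose(lhs, rhs))                      # True

# Relations (2.8) and (2.9): the column spaces coincide.
K = krylov(Binv @ A, Binv @ b, k)                 # basis of (2.7)
for image, target in [(B @ K, krylov(A @ Binv, b, k)),
                      (A @ K, krylov(A @ Binv, A @ Binv @ b, k))]:
    Q1, _ = np.linalg.qr(image)
    Q2, _ = np.linalg.qr(target)
    print(np.allclose(Q1 @ Q1.T, Q2 @ Q2.T))      # equal projectors: True
```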

2.2. Arnoldi method for the Hessenberg-triangular form. Take Qk having orthonormal columns spanning the Krylov subspace (2.7), orthogonalized by invoking the Arnoldi method. Bearing in mind the identity (2.6), a natural option is to take Zk having orthonormal columns spanning (2.8). (For any λ, the image of λB − A applied to (2.7) is contained in K_{k+1}(AB^{-1}; b), i.e., in (2.8) augmented with just the vector (AB^{-1})^k b.) In terms of factorizations, Zk can be found by computing the QR factorization of BQk as

BQk = ZkRk, (2.10)

where Rk ∈ C^{k×k} is upper triangular. Then, by using the relationship between (2.8) and (2.9), we have

AQk = Zk+1 H̃k, (2.11)

where H̃k ∈ C^{(k+1)×k} is of Hessenberg type. Constructing the subspaces in the prescribed manner hence results in a Hessenberg matrix

Hk = Zk^*AQk, while Rk = Zk^*BQk (2.12)

is upper triangular. Observe that, as opposed to the original matrices M and N in (1.1), it is the chosen basis matrices A and B which are brought into Hessenberg and upper triangular forms, respectively. A direct algorithm to this end based on using elementary orthogonal transformations was devised in [16]. (See also [14] for recent developments.) It extends the direct algorithm transforming a single matrix into Hessenberg form under a unitary similarity.
It is noteworthy that the scheme can be regarded as extending the classical polynomial iterations in a natural way. That is, in the derivation above, absolutely no difference was made between standard and generalized eigenvalue problems. This makes it conceptually simpler to cover such problems at one stroke. For an illustration, the classical Arnoldi method is realized with the following choices.
Example 1. In the standard eigenvalue problem we have N = I. The classical Arnoldi method corresponds to choosing A = M and B = N. Namely, then Qk = Zk coincide and (2.7) is simply K_k(A; b) = span{b, Ab, ..., A^{k-1}b}.
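To make the construction of this subsection concrete, here is a minimal dense-matrix sketch (not an efficient implementation, and with no breakdown handling): Qk is built by the Arnoldi process for B^{-1}A started from B^{-1}b, Zk and Rk come from the QR factorization (2.10) of BQk, and Hk = Zk^*AQk then exhibits the Hessenberg form of (2.12); the Ritz values are the eigenvalues of the pencil (Hk, Rk). The helper name and the test data are assumptions made only for illustration.

```python
# Sketch of the Arnoldi method for the Hessenberg-triangular form
# (Section 2.2), assuming small dense real matrices and exact solves with B.
import numpy as np
import scipy.linalg as sla

def hessenberg_triangular_arnoldi(A, B, b, k):
    n = len(b)
    lu, piv = sla.lu_factor(B)                   # exact solves with B
    solveB = lambda v: sla.lu_solve((lu, piv), v)
    Q = np.zeros((n, k + 1))
    q = solveB(b)
    Q[:, 0] = q / np.linalg.norm(q)
    for j in range(k):                           # Arnoldi for B^{-1}A
        w = solveB(A @ Q[:, j])
        for i in range(j + 1):                   # modified Gram-Schmidt
            w = w - (Q[:, i] @ w) * Q[:, i]
        Q[:, j + 1] = w / np.linalg.norm(w)      # no breakdown handling
    Qk = Q[:, :k]
    Zk, Rk = np.linalg.qr(B @ Qk)                # (2.10)
    Hk = Zk.T @ (A @ Qk)                         # (2.12), Hessenberg
    return Qk, Zk, Hk, Rk

rng = np.random.default_rng(1)
n, k = 200, 20
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + n * np.eye(n)  # keep B safely invertible
b = rng.standard_normal(n)
Qk, Zk, Hk, Rk = hessenberg_triangular_arnoldi(A, B, b, k)

print(np.max(np.abs(np.tril(Hk, -2))))           # ~0: Hessenberg structure
print(np.max(np.abs(np.tril(Rk, -1))))           # ~0: Rk upper triangular
print(sla.eigvals(Hk, Rk))                       # Ritz values of the pencil
```

With A = M and B = N = I the scheme reduces to the classical Arnoldi method of Example 1.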

Changing the roles of A and B in Example 1 results in another classical iteration, the inverse iteration, with the minor modification that (1.2) then becomes a generalized eigenvalue problem.
Example 2. Consider again the standard eigenvalue problem, i.e., N = I. Applying the inverse iteration corresponds to choosing A = N and B = M. Then (2.7) is K_k(B^{-1}; B^{-1}b) and (2.8) is K_k(B^{-1}; b). In particular, then we have Qk ≠ Zk.
Also the following underscores that we are dealing with a natural extension of the classical Arnoldi method.
Example 3. Suppose dim K_{k+1}(B^{-1}A; B^{-1}b) = dim K_k(B^{-1}A; B^{-1}b). Let X = [ B^{-1}b  B^{-1}AB^{-1}b  ···  (B^{-1}A)^{k-1}B^{-1}b ] and Y = [ b  AB^{-1}b  ···  (AB^{-1})^{k-1}b ]. Then

(λB − A)X = Y(λI − C), (2.13)

where C is a companion matrix.
The subspace (2.9) equals (2.8) multiplied by AB^{-1}. This is the relationship in the classical Arnoldi method for the matrix AB^{-1} using b as the starting vector. The matrix AB^{-1} is a linear fractional transformation of M and N; see [13] and references there for linear fractional transformations of operators.^7 In particular, the amount of lack of invariance (caused by A) gets expressed in terms of the standard Chebyshev minimization problem

min_{p ∈ P_k(∞)} ‖p(AB^{-1}) b‖, (2.14)

where P_k(∞) denotes the set of monic polynomials of degree k, as follows.
Theorem 2.4. With Rk and Hk defined in (2.12), let pk(λ) = det(λRk − Hk). Then pk realizes the minimum (2.14).
Proof. Denote by Pk the orthogonal projection onto (2.8). Then the pencil λRk − Hk is equivalent to λI − Ck, where Ck is the companion matrix whose last column is obtained by expanding Pk(AB^{-1})^k b in the (power) basis of (2.8) as

Pk(AB^{-1})^k b = c_0 b + c_1 AB^{-1}b + ··· + c_{k-1} (AB^{-1})^{k-1} b. (2.15)

(That is, the columns of X and Y of Example 3 are used as bases.) This yields the claim, after expanding the determinant to have pk(λ) = λ^k − Σ_{j=0}^{k-1} c_j λ^j which, because of (2.15), also realizes (2.14).
The basis (2.3) chosen entirely determines Qk and Zk. One should be aware that (2.7) and (2.8) do not genuinely involve a linear fractional transformation; only shift-and-invert takes place, as follows.
Proposition 2.5. Let C = αA + βB with α ≠ 0. Then

K_k(B^{-1}C; B^{-1}b) = K_k(B^{-1}A; B^{-1}b) and K_k(AB^{-1}; b) = K_k(CB^{-1}; b).

Proof. Krylov subspaces generated with a matrix are translation and dilation invariant. Therefore the identities αB^{-1}A + βI = B^{-1}C and αAB^{-1} + βI = CB^{-1} yield the claim.

^7 Among numerical linear algebraists, a linear fractional transformation of matrices is sometimes called a spectral transformation.

Hence, the way Qk and Zk were chosen above, only the parameters c and d were critical. Thereby, to have the sparsest possible A, one should take either a = 0 or b = 0, provided det [ a b ; c d ] ≠ 0.
To implement the scheme, there are many options. For instance, compute Qk+1 first. Then compute the QR factors Zk+1 and Rk+1 of BQk+1 by overwriting Qk+1 simultaneously. To compute Hk of (2.11), one can always recover a column of Qk by applying B^{-1} to the respective column of Zk+1Rk+1. Once Rk and Hk are available, Ritz values can be computed. If Ritz vectors are needed, then return to Qk+1 by computing B^{-1}Zk+1Rk+1 and overwriting Zk+1. In particular, like with the classical Arnoldi method, this implementation requires storing only a single "tall and skinny" matrix, i.e., either Qk+1 or Zk+1.

2.3. Arnoldi method for the triangular-triangular+rank-one form. Another natural option is to take Zk having orthonormal columns spanning (2.9), orthogonalized by the Arnoldi method. (For any λ, the image of λB − A applied to (2.7) is contained in (2.9) augmented with the vector b.) In constructing a basis of (2.7), it turns out worthwhile to make a minor modification by setting B^{-1}b to be the first basis vector. Thereafter the remaining vectors are obtained by computing an orthonormal basis of

K_{k-1}(B^{-1}A; B^{-1}AB^{-1}b)

by the Arnoldi process.^8 (That is, aside from the first vector, the basis is orthonormal.) This minuscule change causes no numerical problems while leading to a very structured form. Denote by Q̂k ∈ C^{n×k} the matrix having these vectors as columns. In terms of factorizations, Zk can be found by computing the QR factorization of AQ̂k as

AQ̂k = ZkRk, (2.16)

where Rk ∈ C^{k×k} is upper triangular. Then, because of the way the basis of (2.7) was constructed, we have

BQ̂k = [ b   Zk-1 T̂k-1 ], (2.17)

where T̂k-1 ∈ C^{(k−1)×(k−1)} is upper triangular.
These manipulations can readily be modified to have an orthonormal basis of (2.7), yielding a finitely computable "structure approximation" to the generalized Schur decomposition through an Arnoldi process as follows.
Theorem 2.6. Let A, B ∈ C^{n×n} and b ∈ C^n satisfy dim K_n(B^{-1}A; B^{-1}b) = n. Then the prescribed Arnoldi process yields unitary matrices Q and Z such that Z^*AQ = T is upper triangular and Z^*BQ − bv^* = S is strictly upper triangular with v ∈ C^n.
Proof. Consider (2.16). Compute the QR factorization QkR̂k of Q̂k and then multiply with the inverse of R̂k from the right. Then

AQk = ZkR̃k (2.18)

with an upper triangular R̃k, and

BQk = Zk-1 T̃k-1 + b v_{k-1}^* (2.19)

^8 Generically, the starting vector leads to an orthonormal basis for the column space of B^{-1}A. This should be compared with the QR factorization.

with an upper triangular T̃k-1 ∈ C^{(k−1)×k} and v_{k-1} ∈ C^k. Of course, if the dimension growth of (2.7) stops for some k < n, such a structure has been obtained exactly already at that step. For the compressions

Rk = Zk^*AQ̂k and Tk = Zk^*BQ̂k, (2.20)

consider (2.17) for the latter. In applying Zk^* from the left one obtains a rank-one perturbation of a strictly upper triangular matrix by the fact that the first column is nonzero. This column simply consists of the Fourier coefficients of b with respect to the columns of Zk.
The arising rank-one structure fundamentally differs from Hessenberg matrices which are typically associated with Krylov subspace methods for eigenvalue computations. This is intriguing by the fact that rank-one perturbed eigenvalue problems have been an object of study for some time; see [19] and references therein. Thereby it is of interest to see how the classical Arnoldi method of Example 1 works out with these choices.
Example 4. In the standard eigenvalue problem N = I. Like in Example 1, let us take A = M and B = N. Then (2.7) equals K_k(A; b) = span{b, Ab, ..., A^{k-1}b} while (2.9) equals K_k(A; Ab) = span{Ab, A^2 b, ..., A^k b}. (Because K_k(A; Ab) = A K_k(A; b), this resembles the structure appearing in connection with the harmonic Ritz values.) This leads to a very particular structure by the fact that then Rk is upper triangular and Tk is a companion matrix. (The first column of Tk is nonzero while the other entries are zero except for the ones on the first superdiagonal.^9) Of course, these computations require no more storage than the classical Arnoldi method.
Again, no difference was made between standard and generalized eigenvalue problems. This is underscored by Example 4 where a standard eigenvalue problem got converted into a very structured generalized eigenvalue problem. That is, it is not at all clear that in a numerical approximation one should avoid transforming a standard eigenvalue problem into a generalized eigenvalue problem. Here this happens naturally. (Of course, it has long been common that the opposite takes place [5].) In Section 4 this viewpoint is further supported both qualitatively and quantitatively when the choice of Zk is made in an optimal way.
The norm of b has not been constrained. It has no effect on the approximations generated.
Proposition 2.7. Compute (2.16) and (2.17) with the starting vector tb for a nonzero t ∈ C. Then the eigenvalue approximations coincide.
As opposed to the Chebyshev minimization problem (2.14), now the amount of lack of invariance (caused now by B) gets expressed in terms of the GMRES minimization problem

min_{p ∈ P_k(0)} ‖p(AB^{-1}) b‖, (2.21)

where P_k(0) denotes the set of polynomials of degree k at most satisfying the normalization p(0) = 1.

^9 Typically a companion matrix is represented as P^T Tk P where P is the permutation matrix having ones on the anti-diagonal joining the lower-left corner with the upper-right corner.

Theorem 2.8. With Rk and Tk defined in (2.20), let

pk(λ) = 1 + λ( det(λRk − Tk) − λ^k ).

Then pk realizes the minimum (2.21).^{10}
Proof. Denote by Pk the orthogonal projection onto (2.9). Then the pencil λRk − Tk is equivalent to λI − Ck, where Ck is the companion matrix whose last column is obtained by expanding Pk b in the (power) basis of (2.9) as

Pk b = c_1 AB^{-1}b + ··· + c_k (AB^{-1})^k b.

This yields the claim, after expanding the determinant and reordering the characteristic polynomial.
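As a small illustration of the construction of this subsection, the following sketch realizes the standard-problem setting of Example 4 (A = M, B = N = I) with dense NumPy matrices and displays the triangular-triangular+rank-one (here triangular-companion) structure of the compressions (2.20); the function name and the test data are illustrative assumptions only.

```python
# Sketch of the Arnoldi method for the triangular-triangular+rank-one
# form (Section 2.3) in the standard case B = I of Example 4.
import numpy as np

def tri_tri_rank_one(A, b, k):
    n = len(b)
    # First basis vector of the modified (2.7) is B^{-1}b = b itself; the
    # remaining ones form an orthonormal basis of K_{k-1}(A; Ab).
    Qhat = np.zeros((n, k))
    Qhat[:, 0] = b
    W = np.zeros((n, k - 1))
    w = A @ b
    W[:, 0] = w / np.linalg.norm(w)
    for j in range(1, k - 1):                   # Arnoldi for K_{k-1}(A; Ab)
        w = A @ W[:, j - 1]
        for i in range(j):
            w = w - (W[:, i] @ w) * W[:, i]
        W[:, j] = w / np.linalg.norm(w)
    Qhat[:, 1:] = W
    Zk, Rk = np.linalg.qr(A @ Qhat)             # (2.16): A Qhat = Zk Rk
    Tk = Zk.T @ Qhat                            # (2.20) with B = I
    return Qhat, Zk, Rk, Tk

rng = np.random.default_rng(2)
n, k = 100, 8
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
Qhat, Zk, Rk, Tk = tri_tri_rank_one(A, b, k)

print(np.allclose(np.tril(Rk, -1), 0))          # Rk upper triangular: True
# Tk: nonzero first column, ones (up to QR signs) on the first
# superdiagonal, zeros elsewhere.
print(np.allclose(np.abs(Tk[:, 1:]), np.eye(k)[:, :k - 1]))   # True
```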

2.4. Preconditioning and related iterations. If the inverse of B is applied, there arises the obvious possibility to non-unitarily transform the problem into a standard eigenvalue problem, as suggested already in [5]. We do not regard it as entirely natural. This approach may run into serious problems when B is very ill-conditioned. As is well-known, this is not an issue that can be neglected by the fact that these schemes require accurate solving of the linear systems involving B; see [2, Chapter 11]. This, of course, is not a trivial matter.
Our aim has been at finding a natural Krylov subspace method to generate a unitarily equivalent eigenvalue problem to the original one, whose truncation is then computed in practice. Although the methods require inversion (while iterating), it is only for producing orthonormal bases. If the inversion is not done accurately, we still obtain a unitary equivalence. Only (2.6) then gets lost. We will address how to deal with this problem in Section 4 in terms of optimally choosing Zk, no matter how Qk has been generated.
With a similar main argument, also [7, 22] are concerned with devising preconditioned iterative methods for solving the original eigenvalue problem with the help of a (truncated) unitary equivalence aiming at the generalized Schur decomposition. In our approach, it is B which is the subject of preconditioning, for the purpose of achieving speed-ups in solving linear systems involving B. The construction of a good preconditioner is likely by far the most time consuming part of the scheme. Thereby the linear combination of M and N yielding B should be carefully chosen; changing it later can be very costly if a preconditioner needs to be regenerated.^{11} Instead of regenerating the preconditioner, restarting is often the most realistic alternative to adaptively use information during the iteration. See [22, 3] for restarting. See [6] for polynomial filtering to compute a large number of eigenvalues.
Let us emphasize that performing an equivalence transformation (2.2) and thereafter iterating cannot be interpreted as performing preconditioning. That simply turns into a similarity transformation in computing the Krylov subspaces (2.7), (2.8) and (2.9). (The power method effect remains unchanged under a similarity since the spectrum does not change.) Instead, preconditioning is aimed at a clever construction of a (partial) equivalence transformation. In this the effect of the inverse of B must somehow be present. In the section that follows, a generalized eigenvalue structure which admits avoiding the use of the inverse is described.
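For illustration of the preceding remarks on preconditioning B, the following sketch applies B^{-1} only approximately, through GMRES preconditioned with an incomplete LU factorization of B, using SciPy's sparse machinery; the test matrix and all parameters are assumptions made only for illustration.

```python
# Inexact application of B^{-1}: solve a linear system with B by
# preconditioned GMRES, with an incomplete LU factorization of B as
# the preconditioner.
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 2000
rng = np.random.default_rng(3)
B = sp.random(n, n, density=5.0 / n, random_state=rng, format="csc")
B = B + 10.0 * sp.identity(n, format="csc")         # stands in for B in (2.3)

ilu = spla.spilu(B, drop_tol=1e-4, fill_factor=10)  # the preconditioner
M = spla.LinearOperator((n, n), matvec=ilu.solve)

v = rng.standard_normal(n)
x, info = spla.gmres(B, v, M=M)                     # approximate B^{-1} v
print(info, np.linalg.norm(B @ x - v) / np.linalg.norm(v))  # 0, small residual
```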

^{10} Observe that λRk − Tk is the partial equivalence transformation of λA − B.
^{11} Most efficient preconditioners are "approximate factorizations". Such preconditioners do not admit simple translations, i.e., they must be entirely regenerated.

3. The normal generalized eigenvalue problem. Assume the matrix subspace V is normal and consider (1.1). In what follows it is shown that there are many strikingly inexpensive options for Arnoldi methods then. By this we mean that applying the inverse of B can be completely avoided. Aside perhaps from the standard Hermitian eigenvalue problem, we do not claim that the normal generalized eigenvalue problem is of great practical importance per se. However, it provides an ideal structure with properties that otherwise can be strived for. And, of course, there exist ways to utilize normality in problems which are not normal [9]. The Gershgorin circle theorem provides a prime example of this. The following is a more involved, although standard, trick [18, Chapter 15].
Example 5. For a generalized eigenvalue problem appearing often in applications, suppose A is Hermitian and B positive definite. Then, if not too costly, it is customary to form the Cholesky factorization B = LL^* of B. It can be used to transform span{A, B} into span{L^{-1}AL^{-*}, I}, which is normal.
It is instructive to observe that the classical notion of normality is de facto a two dimensional notion in C^{n×n}, as follows.
Proposition 3.1. A nonscalar^{12} matrix A ∈ C^{n×n} is normal if and only if the matrix subspace span{I, A} is normal.
Proof. If A ∉ CI is normal, then span{I, A} is normal. For the converse, if span{I, A} is normal, then Z^*Q = e^{iΘ} for a real diagonal matrix Θ and Z^*AQ = Λ for a diagonal matrix Λ. From these it follows that we may choose Z = Q and hence A is a normal matrix.
^{12} That is, A ≠ αI for α ∈ C. This guarantees that the dimension of span{I, A} is two.
Example 6. Proposition 3.1 underscores how normality is also revealed in practice. That is, in applying and studying the resolvent operator, it does not suffice to inspect the matrix alone. Instead, one is concerned with the inverses of the elements of the matrix subspace span{I, A}. For normal matrices the growth of the norm of the resolvent operator is given by the reciprocal of the distance to the spectrum.
Because of Proposition 3.1 and Example 6, the suggested two dimensional extension of normality appears very natural.
In the Arnoldi methods described, in the domain we are dealing with B^{-1}A while in the codomain with AB^{-1}. These are both (similar) normal matrices, although in general unitarily diagonalizable with respect to entirely different orthonormal bases, obtained from the generalized Schur decomposition. There are Krylov subspace methods for iteratively producing orthonormal bases, as well as computing Ritz values, for normal matrices [11]. Algorithms which benefit from the structure provide, as a rule, the method of choice for solving problems.
Theorem 3.2. A non-singular two dimensional matrix subspace V is normal if and only if B^{-1}A and AB^{-1} are normal matrices for any linearly independent A, B ∈ V with B invertible.
Proof. It is clear that if V is normal, then B^{-1}A and AB^{-1} are normal matrices for any A, B ∈ V with B invertible. For the converse, assume B^{-1}A and AB^{-1} are normal for two linearly independent A, B ∈ V with B invertible. We may assume that, after applying the generalized Schur decomposition, both A and B are upper triangular. Then, by being normal, B^{-1}A = D1 and AB^{-1} = D2 are diagonal. Thus we have BD1 = D2B. Since B is upper triangular and invertible, from the diagonal entries we may conclude that D1 = D2. Thus B commutes with a diagonal matrix. If all the eigenvalues of D1

differ, the claim follows since B is then necessarily diagonal and A = BD1. If D1 has multiple eigenvalues, the condition D1B = BD1 forces B to have a conforming block structure. Compute the singular value decomposition of B in each of these blocks. In each block corresponding to nonzero entries of D1, the corresponding block of A is a scalar multiple of the corresponding block of B. Hence we may use the singular value decomposition of B for the corresponding block of A. Collecting these yields the unitary equivalence diagonalizing A and B. Since A and B span V, we may conclude that V is normal.
Similarly one can prove the following result.
Theorem 3.3. A non-singular two dimensional matrix subspace V is normal if and only if B^*A and AB^* are normal matrices for any linearly independent A, B ∈ V with B invertible.^{13}
Because of computational complexity, it is certainly desirable to avoid applying the inverse. In terms of the Hermitian transpose, for the eigenvalues we have

Λ(B^*A) = Λ(S^*T), (3.1)

where T and S are the matrices of Theorem 2.1. Because of normality, they are now diagonal, yielding in this case a spectral mapping identity. It can be used to obtain inclusion regions for the eigenvalues of (1.1).
Example 7. Consider (3.1) and suppose you know S^*T = D where D is a given diagonal matrix and T and S are unknown diagonal matrices. Then you know on which rays through the origin the eigenvalues of (2.4) are located.
In the normal generalized eigenvalue problem it suffices to approximate the unitary matrix Q of Theorem 2.1. Thereafter Z can be recovered by applying a linear combination of A and B to the columns of Q. (This should be compared with the SVD (singular value decomposition), i.e., how the left singular vectors of a matrix become available once the right singular vectors have been computed.) Computing Q this way does not require applying the inverse of B, by the fact that B^*A is a normal matrix diagonalized by Q. To iteratively generate an approximation to a part of Q, a natural choice is to execute the Arnoldi method for normal matrices of [11]. In particular, by making the following choices, the algorithm reduces to the Hermitian Lanczos method such that any shift-and-invert can be completely avoided.
Example 8. It is quite remarkable that in the normal case, applying the inverse is not necessary in searching for interior eigenvalues. Namely, consider (1.1) and suppose one is interested in eigenvalues λ near a prescribed point µ ∈ C. Form the linear combinations A = M − µN and B = A. (Observe that for approximating a part of Q, the matrices A and B can be linearly dependent.) Then B^*A is a positive semidefinite matrix such that its left-most, i.e., extreme eigenvalues are associated with the generalized eigenvalues near µ.^{14} These get approximated in a fairly well-known manner with the Hermitian Lanczos method for approximating the SVD.^{15}
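A small dense sketch of Example 8 (no inverses are applied): with A = M − µN, the Gram matrix A^*A is Hermitian positive semidefinite and its smallest eigenvalues single out the generalized eigenvalues closest to µ, up to the weighting visible below. Dense eigensolvers stand in here for the Hermitian Lanczos/SVD iteration mentioned above; the normal test pencil is built arbitrarily for illustration.

```python
# Example 8 in miniature: an interior eigenvalue of a normal pencil near a
# target mu, located without shift-and-invert via the Gram matrix A^*A,
# A = M - mu*N. Dense eigensolvers replace the Lanczos/SVD iteration here.
import numpy as np

rng = np.random.default_rng(4)
n = 300
# build a normal pencil: M = Q T Z^*, N = Q S Z^* with diagonal T, S
Q, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
Z, _ = np.linalg.qr(rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n)))
T = np.diag(rng.standard_normal(n) + 1j * rng.standard_normal(n))
S = np.diag(rng.standard_normal(n) + 1j * rng.standard_normal(n))
M, N = Q @ T @ Z.conj().T, Q @ S @ Z.conj().T
eigs = np.diag(T) / np.diag(S)                     # exact eigenvalues

mu = 0.3 + 0.2j                                    # target point
A = M - mu * N
G = A.conj().T @ A                                 # Hermitian, PSD
w, V = np.linalg.eigh(G)                           # ascending eigenvalues
q = V[:, 0]                                        # vector for the smallest one
z = A @ q                                          # recover the codomain vector
lam = (z.conj() @ M @ q) / (z.conj() @ N @ q)      # one-dimensional (1.2)

idx = np.argmin(np.abs(np.diag(T) - mu * np.diag(S)))   # index singled out
print(np.allclose(lam, eigs[idx]))                       # True (generically)
```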

^{13} In practice, normality should be checked by performing matrix-vector products only, i.e., N ∈ C^{n×n} is normal with probability one if NN^*b − N^*Nb = 0 for a randomly chosen b ∈ C^n.
^{14} For the standard Hermitian eigenvalue problem the trick is well-known. It is called, among physicists at least, the folded spectrum method. This terminology is not entirely satisfactory since the success of the idea relies on the singular vectors of the SVD coinciding with the eigenvectors.
^{15} There exists the PROPACK software for the SVD of large and sparse matrices; see http://sun.stanford.edu/~rmunk/PROPACK/.

4. Optimal partial equivalence transformation and optimal Arnoldi method. When applying the inverse of B cannot be done to a high accuracy, Qk gets inaccurately computed in the sense that the identity (2.6) no longer holds. Then Zk cannot be computed with columns spanning either the Krylov subspace (2.8) or (2.9). Assume thus that Qk with orthonormal columns has been computed, possibly without satisfying these relationships. It has been suggested that Zk be then computed based on how a linear combination of A and B maps Qk [7]. See also [21]. The reasoning for this is somewhat unclear by the fact that the underlying mathematical structure is the Grassmannian Gr_k(C^n) of k dimensional subspaces of C^n. Within this structure, a linear combination of A and B applied to Qk does not satisfy any obvious optimality criteria.
For an optimality condition, identify Qk with an element of Gr_k(C^n) by taking the span of its columns. Consequently, Qk and QkV are indistinguishable elements of Gr_k(C^n) if and only if V ∈ C^{k×k} is unitary. Denote by Ẑ and Z̃ matrices having orthonormal columns spanning the column spaces of

AQk and BQk. (4.1)

If the elements of Gr_k(C^n) corresponding to Ẑ and Z̃ are equal (which is an unrealistic assumption), choose Zk to be either of them.^{16} Then the eigenvalue problem (1.1) gets partially exactly solved once the corresponding reduced k-by-k eigenvalue problem is solved.
If the elements of Gr_k(C^n) corresponding to Ẑ and Z̃ differ, then the problem arises how to choose Zk. It is clear that if Zk is poorly chosen, the respective partial equivalence transformation is of little use for eigenvalue approximations. It appears natural to aim at finding an average of them in some sense. By using the Frobenius norm, an optimality criterion to this end can be formulated as

max_{Z ∈ Gr_k(C^n)} ( ‖Z^*Ẑ‖_F^2 + ‖Z^*Z̃‖_F^2 ). (4.2)

For the eigenvalue problem this means finding a partial equivalence transformation which maximizes the attainable projection onto the column spaces of (4.1). The task appears approximation theoretically natural and is well-defined in Gr_k(C^n), i.e., it does not depend on the choice of the representing matrices Ẑ, Z̃ and Z. It is immediate that the value of (4.2) is bounded above by 2k such that the equality is attained in the ideal case of Ẑ and Z̃ being equal.
The optimality criterion is best illustrated by studying the case k = 1. Then the "average" is formed as follows.
Proposition 4.1. Let ẑ, z̃ ∈ C^n be unit vectors. Then

z = (e^{iθ}ẑ + z̃) / √(2 + 2|ẑ^*z̃|)

with θ = arg ẑ^*z̃ is the unique solution to (4.2) whenever ẑ^*z̃ ≠ 0. If ẑ^*z̃ = 0, then z can be any unit linear combination of ẑ and z̃.
Proof. The problem becomes that of finding the largest singular value of the matrix [ ẑ^* ; z̃^* ], which is √(1 + |ẑ^*z̃|), and the corresponding right singular vector, which is as claimed.
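A quick numerical check of Proposition 4.1 (with arbitrary random unit vectors): the averaged vector attains the optimal value 1 + |ẑ^*z̃| of the k = 1 objective in (4.2), which neither ẑ nor z̃ alone attains in general.

```python
# Numerical check of Proposition 4.1 for k = 1.
import numpy as np

rng = np.random.default_rng(5)
n = 50
zhat = rng.standard_normal(n) + 1j * rng.standard_normal(n)
ztil = rng.standard_normal(n) + 1j * rng.standard_normal(n)
zhat /= np.linalg.norm(zhat)
ztil /= np.linalg.norm(ztil)

c = np.vdot(zhat, ztil)                       # \hat z^* \tilde z
theta = np.angle(c)
z = (np.exp(1j * theta) * zhat + ztil) / np.sqrt(2 + 2 * abs(c))

objective = lambda w: abs(np.vdot(w, zhat))**2 + abs(np.vdot(w, ztil))**2
print(np.isclose(objective(z), 1 + abs(c)))   # optimal value: True
print(objective(z) >= objective(zhat), objective(z) >= objective(ztil))
```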

^{16} Hence, suppose Ẑ and Z̃ both have k columns.

Consider the eigenvalue problem (2.4). In the one dimensional case the partial equivalence transformation then reads

z^*Aq = λ z^*Bq (4.3)

with unit vectors z, q ∈ C^n. Hence λ must equal

z^*Aq / z^*Bq. (4.4)

With q fixed, choosing z according to Proposition 4.1 yields

λ = e^{iθ} ‖Aq‖ / ‖Bq‖ (4.5)

with θ = arg q^*B^*Aq, assuming q^*B^*Aq ≠ 0. In particular, the standard eigenvalue problem corresponds to A = M and B = N = I, yielding the following connection with the Rayleigh quotient q^*Aq.^{17}
Theorem 4.2. Let A ∈ C^{n×n} and suppose q^*Aq ≠ 0 for a unit vector q ∈ C^n. Then choosing z to satisfy (4.2) with ẑ = q and z̃ = Aq/‖Aq‖ yields

z^*Aq / z^*q = (q^*Aq / |q^*Aq|) ‖Aq‖. (4.6)

Proof. Construct z according to Proposition 4.1. Then solve λ from the identity

(e^{iθ}q + Aq/‖Aq‖)^* Aq = λ (e^{iθ}q + Aq/‖Aq‖)^* q

to have the claim.
This yields eigenvalue approximations moving more aggressively towards the boundary eigenvalues than the points q^*Aq in the field of values.
Corollary 4.3. Suppose A ∈ C^{n×n} is Hermitian. Then (4.4) equals ‖Aq‖ for q^*Aq > 0 and −‖Aq‖ for q^*Aq < 0.
Collecting these numbers yields a certain type of field of values.
Definition 4.4. Let A ∈ C^{n×n}. The set

F_k(A) = { (q^*Aq/|q^*Aq|) ‖Aq‖ : ‖q‖ = 1 such that if Aq ≠ 0, then require q^*Aq ≠ 0 }

is said to be the field of directed norms of A.
The field of directed norms contains the spectrum. Clearly, F_k(A) is contained in an annulus (centered at the origin) having the outer radius ‖A‖ and the inner radius min_{‖x‖=1} ‖Ax‖. The notion is unitarily invariant in the sense that

F_k(UAU^*) = F_k(A)

for any unitary U ∈ C^{n×n}. It can yield very accurate (nonconvex) information.
Example 9. Let A ∈ C^{n×n} be Hermitian and unitary. Then the field of directed norms equals the spectrum of A.
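Anticipating Example 10 below, the following sketch runs a plain power iteration on a random Hermitian matrix and reports, at each step, the error of the Rayleigh quotient q^*Aq and of the directed-norm value (4.6); by Corollary 4.3 the latter equals ±‖Aq‖ and its error is never larger. Matrix, size and step count are arbitrary illustrative choices.

```python
# Power iteration on a Hermitian matrix: Rayleigh quotient vs. the
# directed-norm value (4.6) as the eigenvalue approximation.
import numpy as np

rng = np.random.default_rng(6)
n = 400
A = rng.standard_normal((n, n))
A = (A + A.T) / 2                                  # Hermitian test matrix
lam_max = np.max(np.abs(np.linalg.eigvalsh(A)))    # |dominant eigenvalue|

q = rng.standard_normal(n)
q /= np.linalg.norm(q)
for step in range(1, 11):
    Aq = A @ q
    rayleigh = q @ Aq                              # q^* A q
    directed = (rayleigh / abs(rayleigh)) * np.linalg.norm(Aq)   # (4.6)
    print(step, abs(abs(rayleigh) - lam_max), abs(abs(directed) - lam_max))
    q = Aq / np.linalg.norm(Aq)                    # power step
```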

^{17} The Rayleigh quotient consists of forming the partial equivalence transformation (4.3) by choosing z = q. As opposed to optimally computing z, this choice can be argued for by the fact that the corresponding λ solves min_λ ‖λq − Aq‖; see [25, pp. 203–209]. See also [17].

Proposition 4.5. Assume A ∈ C^{n×n} is unitary. Then F_k(A) = { e^{i arg q^*Aq} : q^*Aq ≠ 0 }.
Numerical experiments can be devised to conclude that F_k(A) is not translation invariant, though.^{18} Hence, for a tighter spectral inclusion, we may take an intersection of the sets

F_k(A − λI) + λ

for any finite collection of points λ. This can be used with the power method for a dominating eigenvalue (or with its generalizations based on the use of the Rayleigh quotient; see, e.g., [25, pp. 203–209] or [20, Section 4.1]). Simply replace the Rayleigh quotient with (4.6) accordingly for the eigenvalue approximation. If q^*Aq = 0, then (4.6) is not defined in an obvious, unique way. (We can always put z = q.) This is not a serious issue simply by the fact that then the eigenvalue approximation is extremely poor in any case.
Example 10. Consider executing the power method and using (4.6) in approximating the dominating eigenvalue. By Corollary 4.3, for Hermitian matrices the approximation is better than the one given by the Rayleigh quotient. This is somewhat striking since originally the Rayleigh quotient was particularly designed for Hermitian matrices, or compact Hermitian operators; see [17, pp. 679–680] for a concise history. This improvement is readily seen to extend to normal matrices as well. (As is well-known, with nonnormal matrices very nasty examples for Ritz values can be generated.)
The classical Arnoldi method can be regarded as being an extension of the Rayleigh quotient to subspaces generated with the power method. For a similar extension involving (4.6) instead, let us assume that Ẑ and Z̃ in (4.2) have been chosen in such a way that Ẑ^*Z̃ = Σ is diagonal with nonnegative entries. More precisely, compute the SVD

Ẑ^*Z̃ = UΣV^* (4.7)

of Ẑ^*Z̃ with unitary U, V ∈ C^{k×k} and a diagonal matrix Σ with nonnegative entries. Then take ẐU and Z̃V to replace the original Ẑ and Z̃. Denote now by ẑ_j and z̃_j the columns of Ẑ and Z̃. With these vectors, form the columns of Z according to Proposition 4.1. This yields

‖Z^*Ẑ‖_F^2 + ‖Z^*Z̃‖_F^2 = Σ_{j=1}^{k} (1 + σ_j), (4.8)

where σ_j are the diagonal entries of Σ.
Theorem 4.6. Assume Ẑ, Z̃ ∈ C^{n×k} have orthonormal columns. Then a solution Z to (4.2) satisfies (4.8).
Proof. Consider the linear map

Z ↦ [ Ẑ^* ; Z̃^* ] Z = [ Ẑ^*Z ; Z̃^*Z ] (4.9)

from C^{n×k} to C^{2k×k}. By the construction in connection with (4.7), we may assume that Ẑ and Z̃ are such that Ẑ^*Z̃ = Σ is diagonal with nonnegative entries. By

18 C In general, Fk(A − λI) does not equal Fk(A) − λ for λ ∈ . ARNOLDI METHODS FOR EIGENVALUES 15

Algorithm 1 to compute Zk satisfying (4.2) for (4.1)
1: Read n-by-n matrices A and B and the n-by-k matrix Qk with orthonormal columns
2: Compute the QR factorizations AQk = Q1R1 and BQk = Q2R2
3: Compute the SVD Q1^*Q2 = UΣV^*
4: Set Ẑ = Q1U and Z̃ = Q2V
5: With the columns ẑ_j and z̃_j of Ẑ and Z̃
6: for j = 1, ..., k do
7:   Compute a = ẑ_j^* z̃_j and θ = arg a
8:   Set z_j = (e^{iθ} ẑ_j + z̃_j) / √(2 + 2|a|)
9: end for
10: Set Zk = [ z_1 ··· z_k ]

regarding (4.9) as acting columnwise on Z, in terms of the Kronecker product we may consider

vect(Z) ↦ M vect(Z) = [ I ⊗ Ẑ^* ; I ⊗ Z̃^* ] vect(Z) (4.10)

with the identity I being of size k-by-k, so that M is of size 2k^2-by-nk. We have

M M^* = [ I ⊗ Ẑ^* ; I ⊗ Z̃^* ] [ I ⊗ Ẑ   I ⊗ Z̃ ] = [ I ⊗ I   I ⊗ Σ ; I ⊗ Σ   I ⊗ I ].

Hence, the nonzero singular values of this map are determined by taking, k times each, the positive square roots of the eigenvalues of [ I   Σ ; Σ   I ]. These eigenvalues are 1 ± σ_j for j = 1, ..., k. Because of (4.10), the problem separates and becomes that of how to position k orthonormal vectors with respect to the 2k singular values √(1 ± σ_j), j = 1, ..., k, of the matrix [ Ẑ^* ; Z̃^* ]. This is the dual problem of approximating with the singular value decomposition, with the optimal solution satisfying (4.8).
Consider (4.1). The solution provided by Theorem 4.6 can be used with any method based on the construction of a partial equivalence transformation (such as the Jacobi-Davidson method) for a given Qk. Algorithm 1 yields a routine to this end.
For an optimal Arnoldi method, let us now regard (2.8) and (2.9) as elements of Gr_k(C^n). (Hence, assume they are both of dimension k.) Suppose Ẑk is generated with the Arnoldi method with its columns spanning (2.8). Then Z̃k is obtained by orthonormalizing the columns of Ẑk+1H̃k, where

AB^{-1} Ẑk = Ẑk+1 H̃k (4.11)

with H̃k ∈ C^{(k+1)×k} being of upper-Hessenberg type. By using Ẑk and Z̃k, compute Zk to satisfy (4.8). Because of the construction, the columns of Zk are linear combinations of the columns of Ẑk+1 and thereby we are dealing with a Krylov subspace method. In particular, neither the method of Section 2.2 nor that of Section 2.3 can be regarded as being optimal with respect to any obvious criterion. An optimal method corresponds to Zk being an average of (2.8) and (2.9). In the following we assume that the Arnoldi method has not broken down so as to have (4.11), i.e., (2.8) and (2.9) are both of dimension k.

Definition 4.7. Suppose Qk is generated with the Arnoldi method for (2.7). Then the partial equivalence transformation with Zk satisfying (4.8) with respect to (2.8) and (2.9) is said to be an optimal Arnoldi method for Ritz values.
Example 11. It is noteworthy that the classical Arnoldi method (i.e., A = M and B = N = I) differs from the optimal Arnoldi method. Recall that in the classical Arnoldi method Zk = Qk is always chosen so as to keep the compressed problem standard. It is hard to argue why this should be the case unless the problem is Hermitian, or more generally, normal and one wants to preserve the structure. That is, by taking the optimal choice Zk satisfying (4.8), the compressed problem is in general not a standard but a generalized eigenvalue problem.
Proposition 4.8. Suppose A, B ∈ R^{n×n} and b ∈ R^n. Then the partial equivalence transformation with Zk satisfying (4.8) involves real matrices.
Proposition 4.9. Let Ẑk and Z̃k be generated with the optimal Arnoldi method for Ritz values such that (2.8) and (2.9) are both of dimension k. Then all the singular values of Ẑk^*Z̃k ∈ C^{k×k}, except the kth, are ones.
Proof. From (4.11) we obtain Z̃k by forming the QR factorization Q̃kR̃k of H̃k and then setting Z̃k = Qk+1Q̃k. Since Ẑk = Qk, we have

Ẑk^*Z̃k = ĨkQ̃k, (4.12)

where Ĩk is k-by-(k+1) with ones at the positions (j, j) for j = 1, ..., k while the other entries are zeros. Since H̃k is a Hessenberg type matrix, ĨkQ̃k is obtained by removing the last row of Q̃k, where there is just the (k+1, k) entry which is possibly nonzero. Since the classical Arnoldi method has not broken down, the entry is nonzero and therefore only the kth singular value is less than one. Since the columns of Q̃k are orthonormal, the columns of ĨkQ̃k are orthogonal.
To end this section, observe that in the optimality condition (4.2) the matrices Ẑ and Z̃ depend on how A and B have been chosen when (4.1) is formed.^{19}
Example 12. Consider the standard eigenvalue problem of having N = I. Let us set B = N = I, so that no preconditioning is going to be used. Suppose Qk has been generated with the classical Arnoldi method, i.e., the columns of Qk yield an orthonormal basis of K_k(A; b) = span{b, Ab, ..., A^{k-1}b}. Then it is not obvious how the linear combination A in (2.3) should be formed. If we take a = 1 and b = −trace(M)/n, then A and B are orthogonal and maximally linearly independent.
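A direct NumPy transcription of Algorithm 1 may read as follows (a sketch assuming dense matrices; the function name optimal_Z is not from the paper). Used with a Qk generated by the Arnoldi process of Section 2.2, the compressions Zk^*AQk and Zk^*BQk then realize, in exact arithmetic, the optimal Arnoldi method for Ritz values of Definition 4.7.

```python
# Algorithm 1: given Qk with orthonormal columns, compute Zk satisfying
# the optimality criterion (4.2) for the column spaces of AQk and BQk.
import numpy as np

def optimal_Z(A, B, Qk):
    Q1, _ = np.linalg.qr(A @ Qk)                  # orthonormal basis of A Qk
    Q2, _ = np.linalg.qr(B @ Qk)                  # orthonormal basis of B Qk
    U, _, Vh = np.linalg.svd(Q1.conj().T @ Q2)
    Zhat, Ztil = Q1 @ U, Q2 @ Vh.conj().T
    Z = np.empty(Zhat.shape, dtype=complex)
    for j in range(Zhat.shape[1]):
        a = np.vdot(Zhat[:, j], Ztil[:, j])       # \hat z_j^* \tilde z_j
        theta = np.angle(a)
        Z[:, j] = (np.exp(1j * theta) * Zhat[:, j] + Ztil[:, j]) \
                  / np.sqrt(2 + 2 * abs(a))
    return Z

# smoke test: the value of (4.2) equals sum(1 + sigma_j) as in (4.8)
rng = np.random.default_rng(7)
n, k = 60, 5
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
B = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
Qk, _ = np.linalg.qr(rng.standard_normal((n, k)) + 1j * rng.standard_normal((n, k)))
Z = optimal_Z(A, B, Qk)
Zhat, _ = np.linalg.qr(A @ Qk)
Ztil, _ = np.linalg.qr(B @ Qk)
value = np.linalg.norm(Z.conj().T @ Zhat)**2 + np.linalg.norm(Z.conj().T @ Ztil)**2
sigma = np.linalg.svd(Zhat.conj().T @ Ztil, compute_uv=False)
print(np.isclose(value, np.sum(1 + sigma)))       # True, cf. (4.8)
```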

5. Numerical experiments. In what follows, by using Matlab, three small but illustrative numerical experiments are conducted to demonstrate different aspects of the methods proposed. Many of the experiments are well-documented benchmark eigenvalue problems. Example 13 is a standard Hermitian eigenvalue problem from [4]. It is taken here because of the common belief that the Hermitian Lanczos method is the method of choice then. Here it is shown that this may not be the case. (Recall also Example 10.) Example 14 is a well-known and carefully documented nonnormal eigenvalue problem from [20]. Example 15 is concerned with the generalized eigenvalue problem BFWAVE from the Matrix Market, with which the effect of inaccurate applications of B^{-1} is studied.

^{19} This should be compared with the shifted power method for the standard eigenvalue problem Ax = λx. When applied with A − µI, the dominating eigenvalue can be different from that of A.


Fig. 5.1. For Example 13, the convergence of the Hermitian Lanczos method (depicted with ’o’) and the optimal Arnoldi method (depicted with ’+’), drawn vertically. The first 10 steps are shown. Exact eigenvalues of A are on the right.

step j              1        2        3       4       5       6       7       8       9       10
Rayleigh quotient   -0.0326  -0.0031  0.2633  0.6609  1.0712  1.4502  1.7905  2.0915  2.3531  2.5761
Using (4.6)         -0.9990  -1.8060  2.3869  2.7896  3.0476  3.2096  3.3135  3.3825  3.4298  3.4632

Table 5.1. For Example 13, the power method used first with the Rayleigh quotient and then with (4.6).

Example 13. Let A ∈ C^{n×n} be Hermitian with its eigenvalues drawn randomly from a normal Gaussian distribution. Take B = I in (2.4) so that we have a standard eigenvalue problem. If A is diagonal, let the starting vector have equal entries. The purpose is to mimic the carefully documented test for Hermitian eigensolvers; see [4, Example 7.1]. Therefore we also take n = 10^3. In Figure 5.1 we have compared the Hermitian Lanczos method (with full orthogonalization) against the optimal Arnoldi method, drawn vertically while the iteration number runs horizontally. In Table 5.1 the power method is applied by using first the Rayleigh quotient and then (4.6) for the dominating eigenvalue approximation. Ten steps were taken. Altogether, the optimal Arnoldi method performed better than the Hermitian Lanczos method, although not by a wide margin. The differences were largest at the early steps. In applying the power method, the Rayleigh quotient clearly loses against using (4.6).
Example 14. A Markov model of a random walk on a triangular grid [20, Section 2.5.1] is a well-documented test for basic iterative eigensolvers; see [20, Example 4.1] and [20, Example 6.1].^{20} The problem is standard such that the eigenvalues at the right end of the spectrum are of interest. Here the Matlab script of [20, p. 44] is used to generate A ∈ R^{n×n} while B = I in (2.4). We took n = 5050. The starting vector was randn(n, 1) divided by its norm. In Figure 5.2 we have compared the classical Arnoldi method against the optimal Arnoldi method, drawn vertically while the iteration number runs horizontally. Since now A is a nonnormal matrix, the problem is tougher. Although A is real, there are complex eigenvalues and complex Ritz values. Whenever

^{20} Also to be found at the Matrix Market, http://math.nist.gov/MatrixMarket/.


Fig. 5.2. For Example 14, the convergence of the classical Arnoldi method (depicted with ’o’) and the optimal Arnoldi method (depicted with ’+’), drawn vertically. The first 12 steps are shown. Numerically computed eigenvalues of A are on the right.

the optimal Arnoldi method yields an extreme Ritz value appearing as a pair (i.e., two genuinely complex extreme Ritz values), the approximation is of the same order as that given by the classical Arnoldi method. Whenever the optimal Arnoldi method yields a single real extreme Ritz value, the approximation is better than that given by the classical Arnoldi method.
Example 15. This is the generalized eigenvalue problem BFWAVE: Bounded Finline Dielectric Waveguide from the Matrix Market. We used n = 782. Then A ∈ R^{n×n} is not Hermitian while B ∈ R^{n×n} is Hermitian, although indefinite. Without aiming at any particular eigenvalues, now the inversion must be performed in any case. We assume an inaccurate application of the inverse. To this end B^{-1} is replaced with

B^{-1} ∗ (1 + E ∗ ‖B^{-1}‖ ∗ 10^{-6}),

where E = R/‖R‖ with R = randn(n, n), so as to model inaccuracies in the computation of Qk. In Figure 5.3 we have compared the Arnoldi method (using Zk = Qk in the partial equivalence) and the optimal Arnoldi method. At the left end of the spectrum the optimal Arnoldi method behaves better whereas at the right end the Arnoldi method slightly wins.

REFERENCES

[1] E. Asplund, Inverses of matrices {a_ij} which satisfy a_ij = 0 for j > i + p, Math. Scand., 7 (1959), pp. 57–60.
[2] Z. Bai, J. Demmel, J. J. Dongarra, A. Ruhe, and H. A. van der Vorst, eds., Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide, SIAM, Philadelphia, PA, USA, 2000.
[3] C. A. Beattie, M. Embree, and D. C. Sorensen, Convergence of polynomial restart Krylov methods for eigenvalue computations, SIAM Rev., 47 (2005), pp. 492–515.
[4] J. W. Demmel, Applied Numerical Linear Algebra, SIAM, Philadelphia, 1997.
[5] T. Ericsson and A. Ruhe, The spectral transformation Lanczos method for the numerical solution of large sparse generalized symmetric eigenvalue problems, Math. Comp., 35 (1980), pp. 1251–1268.


Fig. 5.3. For Example 15, the convergence of the classical Arnoldi method (depicted with 'o') and the optimal Arnoldi method (depicted with '+'), drawn vertically. The first 11 steps are shown. The real parts of the numerically computed eigenvalues of B^{-1}A are on the right.

[6] H.-r. Fang and Y. Saad, A filtered Lanczos procedure for extreme and interior eigenvalue problems, SIAM J. Sci. Comput., 34 (2012), pp. A2220–A2246.
[7] D. Fokkema, G. Sleijpen, and H. van der Vorst, Jacobi-Davidson style QR and QZ algorithms for the reduction of matrix pencils, SIAM J. Sci. Comput., 20 (1998), pp. 94–125.
[8] G. H. Golub and C. F. Van Loan, Matrix Computations, The Johns Hopkins University Press, 3rd ed., 1996.
[9] M. Huhtanen, A matrix nearness problem related to iterative methods, SIAM J. Numer. Anal., 39 (2001), pp. 407–422.
[10] M. Huhtanen, Differential geometry of matrix inversion, Math. Scand., 107 (2010), pp. 267–284.
[11] M. Huhtanen and R. M. Larsen, Exclusion and inclusion regions for the eigenvalues of a normal matrix, SIAM J. Matrix Anal. Appl., 23 (2002), pp. 1070–1091.
[12] M. Huhtanen and A. Perämäki, Factoring matrices into the product of circulant and diagonal matrices, submitted manuscript, 2013.
[13] V. A. Khatskevich, M. I. Ostrovskii, and V. S. Shulman, Linear fractional relations for operators, Math. Nachr., 279 (2006), pp. 875–890.
[14] B. Kågström, D. Kressner, E. S. Quintana-Ortí, and G. Quintana-Ortí, Blocked algorithms for the reduction to Hessenberg-triangular form revisited, BIT, 48 (2008), pp. 563–584.
[15] A. Kuijlaars, Convergence analysis of Krylov subspace iterations with methods from potential theory, SIAM Rev., 48 (2006), pp. 3–40.
[16] C. B. Moler and G. W. Stewart, An algorithm for generalized matrix eigenvalue problems, SIAM J. Numer. Anal., 10 (1973), pp. 241–256.
[17] B. Parlett, The Rayleigh quotient iteration and some generalizations for nonnormal matrices, Math. Comp., 28 (1974), pp. 679–693.
[18] B. Parlett, The Symmetric Eigenvalue Problem, Classics in Applied Mathematics 20, SIAM, Philadelphia, 1997.
[19] A. Ran and M. Wojtylak, Eigenvalues of rank one perturbations of unstructured matrices, Linear Algebra Appl., 437 (2012), pp. 589–600.
[20] Y. Saad, Numerical Methods for Large Eigenvalue Problems, 2nd ed., SIAM, Philadelphia, 2011.
[21] G. Sleijpen, A. Booten, D. Fokkema, and H. van der Vorst, Jacobi-Davidson type methods for generalized eigenproblems and polynomial eigenproblems, BIT, 36 (1996), pp. 595–633.
[22] D. C. Sorensen, Truncated QZ methods for large scale generalized eigenvalue problems, ETNA, 7 (1998), pp. 141–162.
[23] D. C. Sorensen, Numerical methods for large eigenvalue problems, in Acta Numerica, Cambridge University Press, Cambridge, UK, 2002, pp. 519–584.
[24] F. Tisseur and K. Meerbergen, The quadratic eigenvalue problem, SIAM Rev., 43 (2001),

pp. 235–286. [25] L.N. Trefethen and D. Bau, III, Numerical Linear Algebra, SIAM, Philadelphia, 1997.