Several properties of invariant pairs of nonlinear algebraic eigenvalue problems

Fei Xue and Daniel B. Szyld

Report 12-02-09 February 2012. Revised February 2013

This report is available on the World Wide Web at http://www.math.temple.edu/~szyld


Abstract. We analyze several important properties of invariant pairs of nonlinear algebraic eigenvalue problems of the form T(λ)v = 0. Invariant pairs generalize invariant subspaces and the associated block Rayleigh quotients of square matrices to a nonlinear matrix-valued function T(·). They play an important role in the analysis of nonlinear eigenvalue problems and algorithms. In this paper, we first show that the algebraic, partial, and geometric multiplicities, together with the Jordan chains, corresponding to an eigenvalue of T(λ)v = 0 are completely represented by the Jordan canonical form of a simple invariant pair that captures this eigenvalue. We then investigate approximation errors and perturbations of a simple invariant pair. We also show that second order accuracy in eigenvalue approximation can be achieved by the two-sided block Rayleigh functional for non-defective eigenvalues. Finally, we study the matrix representation of the Fréchet derivative of the eigenproblem, and we discuss the norm estimate of the inverse derivative, which measures the conditioning and sensitivity of simple invariant pairs.

Key words. nonlinear eigenvalue problems, Jordan chains, invariant pairs, approximation error, perturbation analysis, Rayleigh functional, Newton-Kantorovich theorem

AMS subject classifications. 65F15, 65F10, 65F50, 15A18, 15A22.

1. Introduction. The development of theory and numerical methods for nonlinear algebraic eigenvalue problems has attracted considerable attention in recent years. These problems arise from a large variety of applications, such as dynamic analysis of structures, study of singularities in elastic materials, optimization of the acoustic emissions of high speed trains, and minimization of the cost functional for optimal control problems; see, e.g., [1], [16], [17], [31]. Polynomial and rational eigenvalue problems, and especially quadratic problems, are of particular interest. These problems can be equivalently transformed into a linear eigenvalue problem of larger size by an appropriate linearization; see, e.g., [13], [14], [29]. Nonlinear eigenvalue problems are generally much more challenging than their linear counterparts. For example, these problems are highly structured in many cases, and the structure often needs to be taken into full account in the development of analysis and algorithms; in addition, deflation of converged eigenpairs is not straightforward, because eigenvectors corresponding to distinct eigenvalues can be linearly dependent.

This paper concerns the study of several important algebraic and analytical properties of invariant pairs for general nonlinear algebraic eigenvalue problems of the form T(λ)v = 0. Invariant pairs, introduced by Kressner [12], are generalizations of invariant subspaces and associated subspace projections of square matrices. The computation of invariant pairs is considered a practical and effective approach to provide approximations to a set of eigenpairs simultaneously, especially for degenerate (semi-simple or defective) eigenvalues, or eigenpairs with linearly dependent eigenvectors. In particular, a block version of Newton's method studied in [12] is shown to exhibit locally quadratic convergence. A major result developed therein is that the

∗This version dated 15 February 2013. This work was supported by the U.S. Department of Energy under grant DE-FG02-05ER25672 and the U.S. National Science Foundation under grant DMS-1115520. †Department of Mathematics, Temple University, 1805 N. Broad Street, Philadelphia, PA 19122, USA ([email protected]). ‡Department of Mathematics, University of Louisiana at Lafayette, P.O. Box 41010, Lafayette, LA 70504, USA ([email protected]).

Fréchet derivative at the desired invariant pair is nonsingular if and only if this invariant pair is simple. This conclusion is a generalization of the well-known fact that for linear eigenvalue problems, the Fréchet derivative at a single eigenpair is nonsingular if and only if this eigenvalue is simple. The nonsingularity of the Fréchet derivative is critical for the success of Newton's method. In Section 2, we briefly review some basic properties of invariant pairs.

In this paper, we study invariant pairs and present several new and important algebraic and analytical properties. We first show that the spectral structure, including the algebraic, partial, and geometric multiplicities together with all Jordan chains, of an eigenvalue of T(λ)v = 0 is completely resolved by the Jordan canonical form of a simple invariant pair that captures this eigenvalue. In other words, a nonlinear algebraic eigenvalue problem can be locally represented by a small linear matrix eigenvalue problem characterized by simple invariant pairs. These algebraic properties of invariant pairs are discussed in Section 3.

The second part of the paper concerns several analytical properties of invariant pairs, including approximation errors and perturbations of simple invariant pairs, and subspace projections (restrictions) of the nonlinear eigenproblem T(·) onto an approximate eigenspace. The first two issues are fundamental for the understanding of conditioning and sensitivity of simple invariant pairs. Given a simple invariant pair approximation with a small eigenresidual, we study conditions for the existence and uniqueness of a nearby exact invariant pair, and we give an error estimate of the approximate pair. The study of approximation errors is closely related to a perturbation analysis of exact invariant pairs.
The result on subspace projections of a nonlinear eigenproblem shows that good eigenvalue approximations can be obtained provided that T(·) is projected onto accurate eigenspace approximations. In particular, second order accuracy in eigenvalue approximation can be achieved by the two-sided block Rayleigh functional for non-defective eigenvalues. These analyses are presented in Section 4.

In Section 5, we study the matrix representation of the Fréchet derivative of the nonlinear eigenproblem formulated for invariant pairs, and we give a norm estimate of the inverse derivative. This estimate provides a quantitative description of the conditioning and sensitivity of the involved invariant pair. The structure of the Fréchet derivative is explored by studying the dimension of the kernel of a linear operator associated with T(·). We also briefly discuss upper bounds of the norm of the perturbed inverse Fréchet derivative involved in the analysis of approximation errors and perturbations of invariant pairs. We give numerical examples in Section 6 to illustrate the approximation errors and perturbations of simple invariant pairs, and the eigenvalue approximation accuracy achieved by block Rayleigh functionals. Finally, we summarize the paper in Section 7.

2. Preliminaries. Consider the general nonlinear algebraic eigenvalue problem

(2.1) T (λ)v ≡ (f1(λ)A1 + f2(λ)A2 + ··· + fm(λ)Am) v = 0,

where f1, . . . , fm : Ω → C are holomorphic functions defined on an open set Ω ⊂ C, and A1, . . . , Am ∈ C^{n×n} are constant matrices. Here λ ∈ C is an eigenvalue, and v ∈ C^n \ {0} is the corresponding right eigenvector. To specify the scaling of v, a normalization condition u^H v = 1 with some fixed vector u is often imposed. The set of all eigenvalues of T(·) is called the spectrum of the nonlinear eigenproblem.

Let λ0 be an eigenvalue of (2.1). The geometric multiplicity of λ0, denoted by geoT(λ0), is defined as dim(ker(T(λ0))); the algebraic multiplicity of λ0, denoted by

algT(λ0), is the smallest integer i such that (∂^i/∂µ^i) det(T(µ))|_{µ=λ0} ≠ 0; see, for example, [12, Definition 7], [11, Definition A.3.4]. These definitions of multiplicities are identical to those for an eigenvalue of a square matrix.

Let (λ0, v0) be an eigenpair of the problem (2.1). An ordered collection of nonzero vectors {v0, v1, . . . , vk−1} is called a Jordan chain of (2.1), or of the nonlinear eigenproblem T(·), corresponding to λ0 if they satisfy

(2.2)    Σ_{i=0}^{s} (1/i!) T^{(i)}(λ0) v_{s−i} = 0  for s = 1, . . . , k − 1,  where T^{(i)}(µ) = (d^i/dµ^i) T(µ).

Here, v1, . . . , vk−1 are also called generalized eigenvectors; see [11, Definition A.3.5]. Suppose that (2.2) is satisfied for some k = k∗, and no more vectors can be introduced such that (2.2) is satisfied for k = k∗ + 1; then k∗ is the length of this Jordan chain, or a partial multiplicity of λ0. If all Jordan chains corresponding to λ0 are of length 1, then λ0 is a semi-simple eigenvalue; otherwise it is defective. Semi-simple eigenvalues whose algebraic multiplicity is 1 are called simple eigenvalues. In this paper, we only consider eigenvalues of finite algebraic multiplicity. In this case, [11, Corollary A.6.5] shows that in a neighborhood of λ0,

(2.3)    T(µ) = N(µ) diag((µ − λ0)^{k1}, (µ − λ0)^{k2}, . . . , (µ − λ0)^{kg}, 1, . . . , 1) M(µ),

where g = geoT(λ0), the ki are integers satisfying 1 ≤ kg ≤ · · · ≤ k1 < ∞, N(µ) and M(µ) are holomorphic matrix functions in the neighborhood of λ0, and N(λ0) and M(λ0) are nonsingular matrices. The diagonal matrix in (2.3) is called the (local) Smith form of T(µ). If the algebraic multiplicity of λ0 is finite, it follows that λ0 is the only point in the neighborhood of λ0 at which T(·) is a singular matrix.

We note that the algebraic multiplicity of λ0 can also be defined as the sum of all partial multiplicities of λ0 (the total length of all Jordan chains); see, e.g., [10] for matrices, and [11, Appendix A] for nonlinear eigenproblems T(·). It can be shown that the two definitions are equivalent; see [11, Proposition A.6.4]. Both definitions are useful for the derivation of many fundamental theoretical results. For example, we can use the first definition of algT(λ0) to prove the following result:

Proposition 2.1. geoT(λ0) ≤ algT(λ0).

Proof. The conclusion holds trivially if algT(λ0) ≥ n. Assume that algT(λ0) < n, and geoT(λ0) = g. Let {ϕ1, ϕ2, . . . , ϕg} be a basis of ker(T(λ0)), and choose linearly independent vectors {ξ1, ξ2, . . . , ξn−g} such that B = [ϕ1, ϕ2, . . . , ϕg, ξ1, ξ2, . . . , ξn−g] is nonsingular. Because T(µ) is holomorphic in a neighborhood of λ0, we have T(µ) = T(λ0) + Σ_{k=1}^{∞} T^{(k)}(λ0)(µ − λ0)^k/k!, and therefore

det(T(µ)) = det(T(µ)B) · det(B^{−1})
          = det([T(µ)ϕ1, . . . , T(µ)ϕg, T(µ)ξ1, . . . , T(µ)ξn−g]) · det(B^{−1})
          = det([(µ − λ0)ψ1, . . . , (µ − λ0)ψg, T(µ)ξ1, . . . , T(µ)ξn−g]) · det(B^{−1})
          = (µ − λ0)^g det([ψ1, . . . , ψg, T(µ)ξ1, . . . , T(µ)ξn−g]) · det(B^{−1}),

where ψi = Σ_{k=1}^{∞} (T^{(k)}(λ0)/k!)(µ − λ0)^{k−1} ϕi (1 ≤ i ≤ g) and T(µ)ξi (1 ≤ i ≤ n − g) are holomorphic vector functions of µ in a neighborhood of λ0, and ψi → T′(λ0)ϕi as µ → λ0. It follows that (d^j/dµ^j) det(T(µ))|_{µ=λ0} = 0 if j < g, and thus algT(λ0) ≥ g = geoT(λ0).

To the best of our knowledge, the proof based on the first definition of algT(λ0) has not been given in the literature. Of course, the proposition holds trivially if

algT(λ0) is defined as the total length of Jordan chains corresponding to λ0, because geoT(λ0) is the number of Jordan chains.

The definition of an eigenpair (λ, v) can be naturally extended to allow treatment of several eigenpairs simultaneously. The extension, called an invariant pair, provides a convenient way to analyze multiple eigenpairs as an entity and to develop numerical algorithms to compute them, especially for semi-simple and defective eigenvalues. The definition and preliminary properties of invariant pairs of the problem (2.1) were first discussed in [12]. In this paper, we study several important properties of invariant pairs, and we compare our results with those established since the 1970s for invariant subspaces of the standard eigenvalue problem Av = λv. To prepare for the study, we briefly review the definitions and basic properties of invariant pairs.

A pair (X, G) ∈ C^{n×k} × C^{k×k} is called minimal if there exists an integer ℓ ≥ 1 such that

(2.4)    Uℓ(X, G) ≡ [X; XG; . . . ; XG^{ℓ−1}]  (the ℓn × k matrix formed by stacking the blocks X, XG, . . . , XG^{ℓ−1} vertically)

has full column rank k. The smallest such ℓ is called the minimality index of (X, G). We can show that the minimality index of a minimal pair (X, G) ∈ C^{n×k} × C^{k×k} is no greater than k.

A nonminimal pair can be replaced with a minimal pair. If (X, G) ∈ C^{n×k} × C^{k×k} is nonminimal, then for a given ℓ, there exists a minimal pair (X̂, Ĝ) ∈ C^{n×k̂} × C^{k̂×k̂}, where k̂ < k, such that span(X) = span(X̂) and span(Uℓ(X, G)) = span(Uℓ(X̂, Ĝ)); see [2, Theorem 3]. This property allows us to restrict our discussion to minimal pairs.

Definition 2.2. Consider (V, L) ∈ C^{n×k} × C^{k×k}, where the eigenvalues of L are contained in Ω ⊂ C. Then (V, L) is called an invariant pair of the nonlinear eigenvalue problem (2.1) if

(2.5) A1V f1(L) + A2V f2(L) + ··· + AmV fm(L) = 0.

Note that in the case of matrix polynomials, i.e., when T(·) is a polynomial of degree ℓ, the pair (V, L) ∈ C^{n×ℓn} × C^{ℓn×ℓn} satisfying (2.5) contains the complete spectral information for T(·), and it is called a standard pair of the matrix polynomial; see [6, Introduction]. In the general nonlinear case, we assume that the matrix functions f1(L), . . . , fm(L) are well-defined; see, e.g., [9] and [3, Definition 2.2]. For example, let (λ1, v1), . . . , (λk, vk) be k eigenpairs of (2.1) where λi ≠ λj for i ≠ j. It can be shown that (V, L) with V = [v1, . . . , vk] and L = diag(λ1, . . . , λk) is a minimal invariant pair. Given a minimal invariant pair (V, L), one can show that for any nonsingular Z ∈ C^{k×k}, (VZ, Z^{−1}LZ) is also a minimal invariant pair, and the eigenvalues of L are eigenvalues of the problem (2.1); that is, Λ(L) ⊂ Λ(T(·)), where Λ stands for the spectrum of a matrix or nonlinear eigenproblem. Note that for the standard eigenvalue problem, where T(µ) = µI − A, it is easy to see that an invariant pair (V, L) with AV = VL is minimal, and Λ(L) ⊂ Λ(A). Since Λ(L) ⊂ Λ(T(·)), we can state the following definition.

Definition 2.3. An invariant pair (V, L) for problem (2.1) is called simple if (V, L) is minimal and the algebraic multiplicity of each eigenvalue of L is identical to the algebraic multiplicity of this eigenvalue of the problem (2.1).
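As a concrete illustration of Definition 2.2 and minimality (our numerical sketch, not part of the paper), the following fragment builds a small quadratic problem T(λ) = A0 + λA1 + λ²A2, extracts k eigenpairs through a companion linearization, and checks the invariant-pair relation (2.5); all matrix names and sizes below are ours:

```python
import numpy as np
from scipy.linalg import eig

rng = np.random.default_rng(0)
n, k = 6, 3
# Quadratic problem T(lam) v = (A0 + lam*A1 + lam^2*A2) v = 0,
# a special case of (2.1) with f1(lam) = 1, f2(lam) = lam, f3(lam) = lam^2.
A0, A1, A2 = (rng.standard_normal((n, n)) for _ in range(3))

# Companion linearization: (Abig - lam*Bbig) [v; lam*v] = 0.
Z, I = np.zeros((n, n)), np.eye(n)
Abig = np.block([[Z, I], [-A0, -A1]])
Bbig = np.block([[I, Z], [Z, A2]])
lam, vecs = eig(Abig, Bbig)

idx = np.argsort(np.abs(lam))[:k]     # k eigenvalues of smallest modulus
L = np.diag(lam[idx])
V = vecs[:n, idx]                     # top block carries the eigenvector v
V = V / np.linalg.norm(V, axis=0)

# Invariant-pair relation (2.5): A0 V f1(L) + A1 V f2(L) + A2 V f3(L) = 0,
# where f1(L) = I, f2(L) = L, f3(L) = L^2 for this diagonal L.
res = np.linalg.norm(A0 @ V + A1 @ V @ L + A2 @ V @ (L @ L))
assert res < 1e-8

# Minimality: for distinct eigenvalues, U_1(V, L) = V already has rank k.
assert np.linalg.matrix_rank(V) == k
```

Because L is diagonal here, the matrix functions fj(L) reduce to I, L, and L²; for a general nonnormal L one would evaluate fj(L) explicitly, e.g., with scipy.linalg.funm.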

To specify the scaling of V for a given invariant pair (V, L), we choose an integer ℓ no less than the minimality index of (V, L) such that Uℓ(V, L) defined in (2.4) has full column rank k. Define

(2.6)    W = [W0; W1; . . . ; W_{ℓ−1}] = Uℓ(V, L) (Uℓ(V, L)^H Uℓ(V, L))^{−1} ∈ C^{ℓn×k},

such that W^H Uℓ(V, L) = Ik. Let CΩ^{k×k} be the set of k × k complex matrices whose spectrum lies in Ω. Kressner [12] observed that the invariant pair (V, L) is a root of the following matrix operators T : C^{n×k} × CΩ^{k×k} → C^{n×k} and V : C^{n×k} × CΩ^{k×k} → C^{k×k}:

(2.7)    T : (X, G) → A1 X f1(G) + A2 X f2(G) + · · · + Am X fm(G),
         V : (X, G) → W^H Uℓ(X, G) − Ik.
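The normalization (2.6) is straightforward to reproduce numerically; a small sketch (ours) with an arbitrary full-rank pair, checking the identity W^H Uℓ(V, L) = Ik:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, ell = 5, 2, 2
V = rng.standard_normal((n, k))   # any full-rank V and any L will do:
L = rng.standard_normal((k, k))   # the identity below is purely algebraic

def U(X, G, ell):
    """Stack X, XG, ..., XG^(ell-1) vertically, as in (2.4)."""
    blocks, B = [], X
    for _ in range(ell):
        blocks.append(B)
        B = B @ G
    return np.vstack(blocks)

UVL = U(V, L, ell)
# (2.6): W = U_ell(V, L) (U_ell(V, L)^H U_ell(V, L))^{-1}
W = UVL @ np.linalg.inv(UVL.conj().T @ UVL)
assert np.allclose(W.conj().T @ UVL, np.eye(k))
```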

The Fréchet derivatives of T and V at (X, G) are the operators defined as follows:

(2.8)    DT(X,G) : (∆X, ∆G) → T(∆X, G) + Σ_{j=1}^{m} Aj X Dfj(G)(∆G),
         DV(X,G) : (∆X, ∆G) → W0^H ∆X + Σ_{j=1}^{ℓ−1} Wj^H (∆X G^j + X [DG^j](∆G)).

Here, Dfj(G) is the Fréchet derivative of the matrix function G → fj(G), and DG^j is the Fréchet derivative of the matrix function G → G^j. The following major result given in [12] shows that a simple invariant pair (V, L) is a simple root of the nonlinear matrix equation (T(X, G), V(X, G)) = (0, 0).

Theorem 2.4. [12, Theorem 10] Let (V, L) be a minimal invariant pair of the problem (2.1). Then (V, L) is simple if and only if the associated Fréchet derivative

(2.9)    L(V,L) : (∆X, ∆G) → (DT(V,L)(∆X, ∆G), DV(V,L)(∆X, ∆G))

is invertible, where DT and DV are defined in (2.8).

Simple invariant pairs of the nonlinear eigenproblem T(·) are natural generalizations of simple invariant subspaces and corresponding projections of square matrices. The major characteristic of both entities is that their corresponding spectra are disjoint from the rest of the spectrum of the original eigenvalue problem. As a result, they change analytically under analytic perturbation of the eigenvalue problem. This property guarantees that the numerical computation of simple invariant pairs is a well-posed problem. In particular, iterative algorithms for computing simple invariant subspaces and simple invariant pairs provide reliable approximations to degenerate (semi-simple or defective) eigenvalues and associated eigenspaces. In the following sections, we investigate several important properties of invariant pairs.

3. Some algebraic properties of simple invariant pairs. This section concerns a few fundamental algebraic properties of simple invariant pairs (V, L) of (2.1). We show that the spectral structure (algebraic, partial, and geometric multiplicities, and Jordan chains) of an eigenvalue λ0 of the eigenvalue problem T(λ)v = 0 is completely resolved by a simple invariant pair that captures this eigenvalue.
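Theorem 2.4 can be observed numerically in the simplest setting, the linear problem T(µ) = µI − A with ℓ = 1 (our sketch; the small test matrices are ours). Here T(X, G) = XG − AX, V(X, G) = W^H X − Ik with W = V(V^H V)^{−1}, and the derivative (2.9) has an explicit Kronecker-product matrix representation:

```python
import numpy as np

def frechet_matrix(A, V, L):
    """Matrix representation (column-major vec) of the derivative (2.9) for
    T(mu) = mu*I - A with ell = 1, so T(X,G) = XG - AX, V(X,G) = W^H X - I_k.
    DT: (dX, dG) -> dX G - A dX + X dG;  DV: (dX, dG) -> W^H dX."""
    n, k = V.shape
    W = V @ np.linalg.inv(V.conj().T @ V)
    In, Ik = np.eye(n), np.eye(k)
    top = np.hstack([np.kron(L.T, In) - np.kron(Ik, A), np.kron(Ik, V)])
    bot = np.hstack([np.kron(Ik, W.conj().T), np.zeros((k * k, k * k))])
    return np.vstack([top, bot])

# Simple pair: eigenvalue 1 of A = diag(1, 2) has matching multiplicities.
A = np.diag([1.0, 2.0])
V = np.array([[1.0], [0.0]]); L = np.array([[1.0]])
s_simple = np.linalg.svd(frechet_matrix(A, V, L), compute_uv=False)

# Minimal but NOT simple: A is defective and the pair captures only one
# copy of the double eigenvalue 1, so Theorem 2.4 predicts singularity.
A2 = np.array([[1.0, 1.0], [0.0, 1.0]])
s_defect = np.linalg.svd(frechet_matrix(A2, V, L), compute_uv=False)

assert s_simple.min() > 1e-8      # invertible: the pair is simple
assert s_defect.min() < 1e-12     # singular: the pair is not simple
```

For A = diag(1, 2) the pair (e1, [1]) is simple and the derivative matrix is invertible; for the defective A2 the same pair is minimal but algL(1) = 1 < algT(1) = 2, and the derivative matrix is singular, as Theorem 2.4 predicts.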

Theorem 3.1. Suppose that (V, L) is a simple invariant pair of (2.1), λ0 ∈ Λ(L), and JL = Z^{−1}LZ is the Jordan canonical form of L. Assume that JL has g Jordan blocks corresponding to λ0, each of size ki × ki (1 ≤ i ≤ g). Then there are exactly g Jordan chains of T(·) corresponding to λ0, the length of each is ki, and geoT(λ0) = g.

Proof. Without loss of generality, let

JL = diag(Jk1(λ0), Jk2(λ0), . . . , Jkg(λ0), J(λother)),

where Jki(λ0) is the ki × ki Jordan block with λ0 on the diagonal and 1 on the superdiagonal, and J(λother) is the block diagonal matrix of Jordan blocks corresponding to the other eigenvalues of L. Let algL(λ0) be the algebraic multiplicity of λ0 as an eigenvalue of L. Then, by the definition of simple invariant pairs and the structure of JL, we have algT(λ0) = algL(λ0) = k1 + k2 + · · · + kg.

Since (V, L) is an invariant pair, we know from Section 2 that (VZ, Z^{−1}LZ) = (VZ, JL) is an invariant pair, and Uℓ(VZ, JL) = Uℓ(V, L)Z has full column rank k with minimality index ℓ. For the sake of brevity of notation, let

VZ = [ϕ1,0, ϕ1,1, . . . , ϕ1,k1−1, ϕ2,0, . . . , ϕ2,k2−1, . . . , ϕg,0, . . . , ϕg,kg−1, Φother];

that is, we partition VZ according to the order and size of the Jordan blocks {Jki(λ0)}. Consider the collection of vectors ϕ1,0, ϕ2,0, . . . , ϕg,0. These vectors must be linearly independent; otherwise the 1st, (k1+1)st, . . . , and (k1+· · ·+kg−1+1)st column vectors of Uℓ(VZ, JL) (corresponding to the first columns of Jk1(λ0), Jk2(λ0), . . . , Jkg(λ0)), namely,

[ϕi,0; λ0 ϕi,0; . . . ; λ0^{ℓ−1} ϕi,0]  for i = 1, 2, . . . , g,

would be linearly dependent, contradicting the fact that Uℓ(VZ, JL) has full column rank.

Now note that the matrix function f(JL) is defined as (see, e.g., [3], [9])

(3.1)    f(JL) = diag(f(Jk1(λ0)), f(Jk2(λ0)), . . . , f(Jkg(λ0)), f(J(λother))),

where

(3.2)    f(Jki(λ0)) = the ki × ki upper triangular Toeplitz matrix whose (r, r + j) entry is f^{(j)}(λ0)/j!, i.e., whose first row is [f(λ0), f′(λ0), f″(λ0)/2!, . . . , f^{(ki−1)}(λ0)/(ki − 1)!]  (1 ≤ i ≤ g).
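As a quick numerical sanity check of (3.2) (ours, outside the proof), take f = exp, all of whose derivatives are exp itself; the entries of exp(Jk(λ0)) then reduce to e^{λ0}/j! along the j-th superdiagonal:

```python
import numpy as np
from scipy.linalg import expm
from math import factorial

lam0, k = 0.3, 4
J = lam0 * np.eye(k) + np.diag(np.ones(k - 1), 1)   # Jordan block J_4(lam0)

# Build the upper triangular Toeplitz matrix of (3.2) for f = exp.
F = np.zeros((k, k))
for j in range(k):
    F += np.diag(np.full(k - j, np.exp(lam0) / factorial(j)), j)

assert np.allclose(expm(J), F)
```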

Since (VZ, JL) is an invariant pair of the problem (2.1), we can substitute (VZ, JL) for (V, L) in (2.5). Considering the 1st, (k1+1)st, . . . , and (k1+· · ·+kg−1+1)st columns, we have from (3.1) and (3.2) that

Σ_{j=1}^{m} Aj ϕi,0 fj(λ0) = T(λ0) ϕi,0 = 0  for 1 ≤ i ≤ g.

Similarly, we see from other columns that

(3.3)    Σ_{j=1}^{m} Aj Σ_{r=0}^{si} (fj^{(r)}(λ0)/r!) ϕi,si−r = Σ_{r=0}^{si} (1/r!) T^{(r)}(λ0) ϕi,si−r = 0,

for i = 1, 2, . . . , g, and si = 1, 2, . . . , ki − 1. That is, ϕ1,0, ϕ2,0, . . . , ϕg,0 ∈ ker(T (λ0)),

and therefore geoT (λ0) ≥ g. Also, {ϕ1,0, . . . , ϕ1,k1−1}, {ϕ2,0, . . . , ϕ2,k2−1}, ..., and

{ϕg,0, . . . , ϕg,kg−1} form g Jordan chains corresponding to λ0.

Note that the algebraic multiplicity of λ0 as an eigenvalue of L is algL(λ0) = Σ_{i=1}^{g} ki. From (3.3) and the definition (2.2), k1, k2, . . . , kg are lower bounds of the lengths of the g Jordan chains, because there may exist additional generalized eigenvectors such that the length of one of these Jordan chains is greater than ki. Since algT(λ0) equals the total length of all (possibly more than g) Jordan chains corresponding to λ0, we have algL(λ0) ≤ algT(λ0). For a simple invariant pair (V, L), since algL(λ0) = algT(λ0), we see that there do not exist any additional Jordan chains corresponding to λ0, and there do not exist additional generalized eigenvectors such that the length of any of these g Jordan chains is greater than ki. Therefore geoT(λ0) = g, and the length of each Jordan chain is ki (1 ≤ i ≤ g). This completes the proof.

Theorem 3.1 shows that through simple invariant pairs, the nonlinear algebraic eigenvalue problem can be locally represented by a small matrix eigenvalue problem. This insight is obtained by exploring the Jordan canonical form of the matrix L, its connection to matrix functions, and the definition of Jordan chains.

4. Some analytical properties of invariant pairs. In this section, we study several analytical properties of invariant pairs. First, given an approximate invariant pair (X, G), we discuss the conditions for the existence and uniqueness of a nearby exact invariant pair (V, L), and we explore the error of (X, G) as an approximation of (V, L). The study of approximation error leads to a perturbation analysis of an exact invariant pair (V, L). In addition, we consider the projection (restriction) of T(·) onto an approximate eigenspace. This projection is a block generalization of the nonlinear Rayleigh functional [23], and it provides eigenvalue approximations associated with the approximate eigenspace.
The major mathematical tool we will use to study the analytical properties of invariant pairs is the Newton-Kantorovich theorem; see, e.g., [8], [19]. For a nonlinear functional F defined on Banach spaces whose Fréchet derivative is Lipschitz continuous, the theorem states that, given a vector u0 for which F′(u0) is nonsingular, its inverse is bounded, and ‖F(u0)‖ is sufficiently small, there exists a unique root of F nearby.

Theorem 4.1 (Newton-Kantorovich). Let Y and Z be two Banach spaces, and D an open convex subset of Y. Let F : D → Z be Fréchet differentiable on D. Suppose that there exists γ > 0 such that ‖F′(ua) − F′(ub)‖ ≤ γ‖ua − ub‖ for all ua, ub ∈ D. Assume that u0 ∈ D is such that [F′(u0)]^{−1} : Z → Y exists, and there exist κ > 0, δ > 0 such that ‖[F′(u0)]^{−1}‖ ≤ κ and ‖[F′(u0)]^{−1}F(u0)‖ ≤ δ. Suppose that B(u0, ds) = {u : ‖u − u0‖ ≤ ds} ⊂ D, where ds = 2(1 − √(1 − h))δ/h, and let h ≡ 2γκδ. If h ≤ 1, then
1. the Newton iterates un+1 = un − [F′(un)]^{−1}F(un) exist and un ∈ B(u0, ds) for all n;
2. u* = lim un exists, F(u*) = 0, and u* ∈ B(u0, ds) ⊂ D;
3. u* is the only solution of F(u) = 0 in B(u0, dl) ∩ D if h < 1, and in B(u0, dl) if h = 1, where dl = 2(1 + √(1 − h))δ/h.
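To make the constants in Theorem 4.1 concrete, consider the scalar example F(u) = u² − 2 (ours, not from the paper), for which F′ has Lipschitz constant γ = 2 and all quantities can be evaluated in closed form:

```python
import math

# F(u) = u^2 - 2, F'(u) = 2u, so |F'(a) - F'(b)| = 2|a - b| gives gamma = 2.
F = lambda u: u * u - 2.0
dF = lambda u: 2.0 * u

u0, gamma = 1.5, 2.0
kappa = 1.0 / abs(dF(u0))          # ||F'(u0)^{-1}||
delta = abs(F(u0) / dF(u0))        # ||F'(u0)^{-1} F(u0)||
h = 2.0 * gamma * kappa * delta
assert h <= 1.0                    # the Kantorovich condition holds

ds = 2.0 * (1.0 - math.sqrt(1.0 - h)) * delta / h
root = math.sqrt(2.0)
# The exact root lies in the existence ball B(u0, ds) ...
assert abs(root - u0) <= ds + 1e-12

# ... and the Newton iterates stay in the ball and converge to it.
u = u0
for _ in range(6):
    u = u - F(u) / dF(u)
    assert abs(u - u0) <= ds + 1e-12
assert abs(u - root) < 1e-12
```

For quadratic F the Kantorovich bound is tight: with u0 = 1.5, the radius ds coincides (up to rounding) with the actual distance |u0 − √2|.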

Remark 4.2. In Theorem 4.1, it is worth noting that we can let κ ≡ ‖[F′(u0)]^{−1}‖ and δ ≡ ‖[F′(u0)]^{−1}F(u0)‖ to obtain the optimal bounds. In fact, with such definitions of κ and δ, h ≡ 2κγδ attains its minimum value, so that there is a greater possibility to satisfy h ≤ 1; also, ds = 2δ/(1 + √(1 − 2κγδ)) is minimized as the bound on ‖u0 − u*‖, and dl = (1 + √(1 − 2κγδ))/(κγ) is maximized as the radius of the ball in which u* is the unique root of F. In the following analysis, we will use such definitions for κ and δ.

To prepare for the analysis of approximation errors and perturbations of invariant pairs, consider the Banach space B of matrix pairs (X, G) ∈ C^{n×k} × C^{k×k}. Define

(4.1)    ‖(X, G)‖ ≡ ‖[X; G]‖F = ‖[vec(X); vec(G)]‖2,  ‖X‖ ≡ ‖X‖F = ‖vec(X)‖2,  and  ‖G‖ ≡ ‖G‖F = ‖vec(G)‖2,

where vec(·) is the standard vectorization of a matrix. It follows that ‖X‖ ≤ ‖(X, G)‖ and ‖G‖ ≤ ‖(X, G)‖. Suppose that a function F : B → B is Fréchet differentiable. Its Fréchet derivative DF : B → B is a linear transformation whose norm is defined in the usual manner:

(4.2)    ‖DF‖ ≡ max_{‖(∆X,∆G)‖≠0} ‖DF(∆X, ∆G)‖ / ‖(∆X, ∆G)‖.
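Under the norms (4.1) and (4.2), the operator norm of a linear map DF on B equals the 2-norm of its matrix representation with respect to vec; a numerical sketch (ours, with an arbitrary linear operator shaped like (2.8)):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 4, 2
A = rng.standard_normal((n, n))
G0 = rng.standard_normal((k, k))
X0 = rng.standard_normal((n, k))
W = rng.standard_normal((n, k))

def DF(dX, dG):
    # a sample linear operator on pairs, with the shape of DT, DV in (2.8)
    return (dX @ G0 - A @ dX + X0 @ dG, W.T @ dX)

def pair_norm(X, G):
    # the norm (4.1): the 2-norm of the stacked vectorization
    return np.sqrt(np.linalg.norm(X, 'fro')**2 + np.linalg.norm(G, 'fro')**2)

# Matrix representation of DF with respect to column-major vec
In, Ik = np.eye(n), np.eye(k)
M = np.vstack([
    np.hstack([np.kron(G0.T, In) - np.kron(Ik, A), np.kron(Ik, X0)]),
    np.hstack([np.kron(Ik, W.T), np.zeros((k * k, k * k))]),
])
smax = np.linalg.norm(M, 2)

# Every quotient in (4.2) is bounded by the matrix 2-norm ...
for _ in range(100):
    dX, dG = rng.standard_normal((n, k)), rng.standard_normal((k, k))
    rX, rG = DF(dX, dG)
    assert pair_norm(rX, rG) <= smax * pair_norm(dX, dG) + 1e-10

# ... and the bound is attained at the leading right singular vector.
v = np.linalg.svd(M)[2][0]
dX = v[:n * k].reshape(n, k, order='F')
dG = v[n * k:].reshape(k, k, order='F')
rX, rG = DF(dX, dG)
assert np.isclose(pair_norm(rX, rG), smax)
```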

Since B is a Banach space of dimension nk + k² = (n + k)k, the linear transformation DF defined on B can be represented by an (n + k)k × (n + k)k matrix. Given the definition of the "vector" norm ‖(X, G)‖ in (4.1), we can see that ‖DF‖ defined in (4.2) equals the 2-norm of the matrix representation of DF. In fact, any consistent matrix norm can be used to define ‖(X, G)‖, ‖X‖ and ‖G‖.

The treatment of the "eigenvector part" and "eigenvalue part" as an entity seems most natural in our setting. However, one needs to realize that this treatment has certain shortcomings. For example, ‖(X − V, G − L)‖ may not capture widely varying magnitudes in the eigenvector and the eigenvalue parts; for example, if ‖X − V‖ ≫ ‖G − L‖, an accurate estimate of ‖(X − V, G − L)‖ is a serious overestimate of ‖G − L‖. In addition, for two equivalent invariant pairs (V1, L1) and (V2, L2) such that V2 = V1S and L2 = S^{−1}L1S for a nonsingular S, ‖(V1 − V2, L1 − L2)‖ does not reveal the equivalence of the two pairs. Another issue is the loss of scale-invariance. An invariant pair (V, L) of T(λ)v = 0 corresponds to (V, (1/α)L) of the scaled problem T(αλ)v = 0, and thus the bounds on approximation errors and perturbations of invariant pairs of the original problem cannot be converted to bounds for the scaled problem by simple scaling. Nevertheless, to the best of our knowledge, our results based on invariant pairs and their norms provide new insights into the properties of multiple eigenpairs as an entity that are not well understood by alternative approaches.

4.1. Error estimate of approximate invariant pairs. In this section, we consider an approximate invariant pair (X, G) of the problem (2.1) for which the eigenresidual norm ‖T(X, G)‖ is small. We give conditions for the existence and uniqueness of a nearby exact invariant pair (V, L) and the approximation error of (X, G).
We will see later that our result is very similar to that for eigenpairs of matrices developed by Stewart [24], [25], which is restated in the following proposition.

Proposition 4.3. [27, Chapter 4, Theorem 2.12] Let X be a matrix of orthonormal columns that span an approximate invariant subspace of A ∈ C^{n×n}, and let [X X⊥] be a

unitary matrix. Let G = X^H AX and M = X⊥^H AX⊥ be the Rayleigh quotients associated with X and X⊥, respectively. Define S = X^H A − GX^H and the eigenresidual R = AX − XG. Suppose that 4‖R‖‖S‖ ≤ sep²(G, M) in the 2-norm, where

(4.3)    sep(G, M) = inf_{‖Q‖≠0} ‖S(Q)‖/‖Q‖ = (sup_{‖Q‖≠0} ‖S^{−1}(Q)‖/‖Q‖)^{−1} = ‖S^{−1}‖^{−1}, and

(4.4)    S : Q → QG − MQ

is the linear Sylvester operator. Then there exists a simple eigenpair (V, L) of A, i.e., AV = VL and Λ(L) ∩ (Λ(A) \ Λ(L)) = ∅, such that

(4.5)    ‖G − L‖ ≤ 2‖R‖‖S‖/sep(G, M),  and  tan ∠(X, V) ≤ 2‖R‖/sep(G, M),

where ∠(X, V) stands for the largest canonical angle between span{X} and span{V}; see, e.g., [7, Chapter 2.6], [26, Chapter 1.4.5].

To prepare for the study of approximation errors of invariant pairs, we first review the Mean Value Theorem in a Banach space (see, e.g., [20, Section 3.2]), and we discuss a sufficient condition for the Lipschitz continuity of the Fréchet derivative L.

Lemma 4.4 (Mean Value Theorem). Let Y and Z be two Banach spaces, and D an open convex subset of Y containing ua and ub. Define

(ua, ub) = {u : u = αua + (1 − α)ub, 0 < α < 1}, and

[ua, ub] = {u : u = αua + (1 − α)ub, 0 ≤ α ≤ 1}.

Let F : D → Z be a continuous functional on [ua, ub] whose Fréchet derivative DF exists for all u ∈ (ua, ub). Then

kF (ub) − F (ua)k ≤ sup kDF(u)kkub − uak. u∈(ua,ub)

The Lipschitz continuity of the Fréchet derivative L (see (2.8)) can be established by the following lemma.

Lemma 4.5. Suppose that the Fréchet derivatives of the functions {fj} in (2.1) are Lipschitz continuous, i.e., there exists γj > 0 such that ‖Dfj(G1) − Dfj(G2)‖ ≤ γj‖G1 − G2‖ for all G1, G2 ∈ CΩ^{k×k}. Let CΓ^{n×k} ⊂ C^{n×k} be a convex and bounded set. If Ω is bounded and CΩ^{k×k} is convex, then the Fréchet derivative L(X,G) = (DT(X,G), DV(X,G)) is Lipschitz continuous in CΓ^{n×k} × CΩ^{k×k}.

Proof. Note that L(X,G) is Lipschitz continuous if DT and DV defined in (2.8) are Lipschitz continuous. We first show the Lipschitz continuity of DT. From (2.8), for

any (X1, G1), (X2, G2) ∈ CΓ^{n×k} × CΩ^{k×k}, we have

‖DT(X1,G1)(∆X, ∆G) − DT(X2,G2)(∆X, ∆G)‖
= ‖T(∆X, G1) − T(∆X, G2) + Σ_{j=1}^{m} Aj (X1 Dfj(G1)(∆G) − X2 Dfj(G2)(∆G))‖
≤ ‖Σ_{j=1}^{m} Aj ∆X (fj(G1) − fj(G2))‖
  + ‖Σ_{j=1}^{m} Aj ((X1 − X2) Dfj(G2)(∆G) + X1 (Dfj(G1)(∆G) − Dfj(G2)(∆G)))‖
≤ Σ_{j=1}^{m} ‖Aj‖ sup_{G∈(G1,G2)} ‖Dfj(G)‖ ‖G1 − G2‖ ‖∆X‖   (using Lemma 4.4)
  + Σ_{j=1}^{m} ‖Aj‖ (‖X1 − X2‖ ‖Dfj(G2)‖ ‖∆G‖ + ‖X1‖ γj ‖G1 − G2‖ ‖∆G‖)
≤ Σ_{j=1}^{m} ‖Aj‖ sup_{G∈(G1,G2)} ‖Dfj(G)‖ ‖(X1 − X2, G1 − G2)‖ ‖(∆X, ∆G)‖
  + Σ_{j=1}^{m} ‖Aj‖ (‖Dfj(G2)‖ + γj ‖X1‖) ‖(X1 − X2, G1 − G2)‖ ‖(∆X, ∆G)‖.

It then follows that

‖DT(X1,G1) − DT(X2,G2)‖
= sup_{(∆X,∆G)≠0} ‖DT(X1,G1)(∆X, ∆G) − DT(X2,G2)(∆X, ∆G)‖ / ‖(∆X, ∆G)‖
≤ Σ_{j=1}^{m} ‖Aj‖ (sup_{G∈(G1,G2)} ‖Dfj(G)‖ + ‖Dfj(G2)‖ + γj‖X1‖) ‖(X1 − X2, G1 − G2)‖
≡ γDT ‖(X1 − X2, G1 − G2)‖.

Here, γDT = Σ_{j=1}^{m} ‖Aj‖ (sup_{G∈(G1,G2)} ‖Dfj(G)‖ + ‖Dfj(G2)‖ + γj‖X1‖) is bounded, because both ‖Dfj‖ and ‖X1‖ are bounded (Dfj is Lipschitz continuous on the bounded set CΩ^{k×k}, and X1 belongs to the bounded set CΓ^{n×k}). The proof of the Lipschitz continuity of DV follows the lines of that for DT, and is therefore omitted. The lemma is thus established.

Let (X, G) be an approximate invariant pair such that W^H Uℓ(X, G) is nonsingular. Define X̃ = X(W^H Uℓ(X, G))^{−1} and G̃ = (W^H Uℓ(X, G)) G (W^H Uℓ(X, G))^{−1}. We see that T(X̃, G̃) = T(X, G)(W^H Uℓ(X, G))^{−1}, and W^H Uℓ(X̃, G̃) − Ik = 0. Because X̃ and X span the same subspace, and G̃ and G have identical spectra, (X̃, G̃) and (X, G) are "equivalent" approximate invariant pairs. Therefore, from now on, it suffices to consider approximate invariant pairs (X, G) satisfying W^H Uℓ(X, G) − Ik = 0.

We are now ready to present a theorem giving an error estimate of an approximate invariant pair (X, G) of the nonlinear eigenvalue problem (2.1).

Theorem 4.6. Let (X, G) ∈ CΓ^{n×k} × CΩ^{k×k} satisfying W^H Uℓ(X, G) − Ik = 0 be an approximate minimal invariant pair of (2.1), where CΓ^{n×k} is bounded and convex,

and CΩ^{k×k} is open and convex with bounded Ω. Suppose that the Fréchet derivative L(X,G) is nonsingular, that κ = ‖L(X,G)^{−1}‖ is finite, and that there exists γ > 0 such that ‖L(X1,G1) − L(X2,G2)‖ ≤ γ‖(X1 − X2, G1 − G2)‖ for all (X1, G1), (X2, G2) ∈ CΓ^{n×k} × CΩ^{k×k}. Suppose that T(X, G) is small in norm, such that δ = ‖L(X,G)^{−1}(T(X, G), 0)‖ satisfies h ≡ 2κγδ ≤ 1, and that B((X, G), 2δ/(1 + √(1 − h))) ⊂ CΓ^{n×k} × CΩ^{k×k}. Then there is a unique invariant pair (V, L) of (2.1) in B((X, G), (1 + √(1 − h))/(κγ)) ∩ (CΓ^{n×k} × CΩ^{k×k}) (if h < 1), or in B((X, G), 1/(κγ)) (if h = 1), with ‖(X − V, G − L)‖ ≤ 2δ/(1 + √(1 − h)).

Proof. Consider the Banach spaces D = CΓ^{n×k} × CΩ^{k×k} and Z = C^{n×k} × C^{k×k}, and the functionals F : (X, G) → (T(X, G), V(X, G)) and F′ : (∆X, ∆G) → L(X,G)(∆X, ∆G). Let u0 = (X, G). The theorem can be established by applying Theorem 4.1.

Theorem 4.6 gives an explicit bound on the approximation error of (X, G), namely, ‖(X − V, G − L)‖ ≤ ds ≡ 2δ/(1 + √(1 − h)). Since 0 ≤ h ≤ 1, we can see that

(4.6)    δ ≤ ds ≤ 2δ = 2‖L(X,G)^{−1}(T(X, G), 0)‖ ≤ 2‖L(X,G)^{−1}‖ ‖T(X, G)‖,

which is very similar to the error estimate (4.5) for approximate invariant subspaces. Both error estimates are proportional to the eigenresidual norm of the approximate invariant pair, and to the norm of an inverse linear operator; see (2.8) and (2.9), and (4.3) and (4.4), respectively, for the two linear operators involved. In both cases, by analogy with the singular values of a nonsingular matrix, we see that the reciprocal of the norm of the inverse operator is essentially the smallest singular value of the original operator. This quantity measures the separation between the spectrum approximation G and the rest of the spectrum of the eigenvalue problem. The smaller this separation is, the larger the approximation error could be. In Section 5, we give an estimate of this separation for the case where a semi-simple eigenvalue is the only distinct eigenvalue present in an invariant pair.

Note that if ‖X‖ is considerably smaller or larger than ‖V‖, then the upper bound on ‖(X − V, G − L)‖ given in Theorem 4.6 can be a significant overestimate of ∠(X, V). This overestimate can be avoided for linear eigenvalue problems by considering a unitary block triangularization of a matrix or a matrix pair and an orthogonal projection of the exact invariant subspace onto the approximate invariant subspace; see, e.g., [24], [25]. Similarly, for nonlinear eigenvalue problems, one may consider the use of an orthogonal projection of X − V onto V, but this approach does not yield an explicitly sharper error estimate; see, e.g., [2]. If X and V are appropriately normalized such that ‖X‖ ≈ ‖V‖, it is natural to expect ds to be a reasonable upper bound of ∠(X, V), provided that ‖X − V‖ is not much smaller than ‖G − L‖.
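The separation sep(G, M) and the other quantities in Stewart's estimate (4.5) can be computed explicitly from the Kronecker-product matrix representation of the Sylvester operator (4.4); a numerical sketch (ours; the symmetric test matrix and the slack factors are our choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 6, 2
# Symmetric test matrix with a well-separated leading eigenspace
# (eigenvalues {1, 2} versus {10, 11, 12, 13}).
Q0 = np.linalg.qr(rng.standard_normal((n, n)))[0]
A = Q0 @ np.diag([1.0, 2.0, 10.0, 11.0, 12.0, 13.0]) @ Q0.T

# Perturb the exact invariant subspace slightly, then re-orthonormalize.
X = np.linalg.qr(Q0[:, :k] + 1e-6 * rng.standard_normal((n, k)))[0]
Xp = np.linalg.qr(X, mode='complete')[0][:, k:]   # orthonormal complement

G, M = X.T @ A @ X, Xp.T @ A @ Xp
R = A @ X - X @ G                                 # eigenresidual
S = X.T @ A - G @ X.T

# sep(G, M) of (4.3) as the smallest singular value of the Kronecker
# matrix representation of the Sylvester operator Q -> QG - MQ in (4.4).
K = np.kron(G.T, np.eye(n - k)) - np.kron(np.eye(k), M)
sep = np.linalg.svd(K, compute_uv=False).min()

# For symmetric A this equals the spectral gap between G and M.
gap = np.min(np.abs(np.subtract.outer(np.linalg.eigvalsh(G),
                                      np.linalg.eigvalsh(M))))
assert np.isclose(sep, gap)

# Hypothesis and conclusion of Proposition 4.3 (with generous slack).
nR, nS = np.linalg.norm(R, 2), np.linalg.norm(S, 2)
assert 4 * nR * nS <= sep**2
bound = 2 * nR * nS / sep
for mu in np.linalg.eigvals(G):
    assert np.min(np.abs(np.linalg.eigvals(A) - mu)) <= 100 * bound
```

For symmetric A the Kronecker matrix K is symmetric with eigenvalues λi(G) − λj(M), so sep(G, M) reduces to the gap between the two Rayleigh quotients; for nonnormal matrices sep can be much smaller than the eigenvalue gap.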

4.2. Perturbation analysis of simple invariant pairs. In this section, we develop a perturbation analysis of simple invariant pairs of the nonlinear eigenvalue problem (2.1), with a focus on their condition number, and we compare our result with the perturbation analysis of simple eigenpairs of a matrix given by Stewart [25]. We first introduce a block diagonalization of a matrix and review Stewart's analysis.

Lemma 4.7. Suppose that $(V_1,L)$ is an invariant pair of $A \in \mathbb{C}^{n\times n}$, that is, $AV_1 - V_1L = 0$, where $V_1$ has orthonormal columns and $L = V_1^H A V_1 \in \mathbb{C}^{k\times k}$. Choose $W_2 \in \mathbb{C}^{n\times(n-k)}$ such that $[V_1\; W_2]$ is unitary, and let $M = W_2^H A W_2$. If $(V_1,L)$ is simple, i.e., $\Lambda(L)\cap\Lambda(M)=\emptyset$, then there exist matrices $V_2\in\mathbb{C}^{n\times(n-k)}$ and

$W_1 \in \mathbb{C}^{n\times k}$ such that $[W_1\; W_2]^H = [V_1\; V_2]^{-1}$, and $A$ can be block diagonalized as

$$(4.7)\qquad A = [V_1\; V_2]\begin{bmatrix} L & \\ & M \end{bmatrix}\begin{bmatrix} W_1^H \\ W_2^H \end{bmatrix}.$$

Proof. Suppose that $A$ has the Jordan canonical form

$$A = PJP^{-1} = [P_1\; P_2]\begin{bmatrix} J_1 & \\ & J_2 \end{bmatrix}\begin{bmatrix} Q_1^H \\ Q_2^H \end{bmatrix}.$$

It is then easy to see that A can be decomposed as (4.7), where

$$\begin{aligned}
V_1 &= P_1(P_1^H P_1)^{-1/2}, & V_2 &= P_2(Q_2^H Q_2)^{1/2},\\
L &= (P_1^H P_1)^{1/2} J_1 (P_1^H P_1)^{-1/2}, & M &= (Q_2^H Q_2)^{-1/2} J_2 (Q_2^H Q_2)^{1/2},\\
W_1 &= Q_1(P_1^H P_1)^{1/2}, & W_2 &= Q_2(Q_2^H Q_2)^{-1/2},
\end{aligned}$$
and both $V_1$ and $W_2$ have orthonormal columns.

Proposition 4.8. [27, Chapter 4, Theorem 2.13] Let $A \in \mathbb{C}^{n\times n}$ have the decomposition (4.7), where $\Lambda(L)\cap\Lambda(M)=\emptyset$. Letting $\tilde A = A + E$, we have

$$\begin{bmatrix} W_1^H \\ W_2^H \end{bmatrix}(A+E)\,[V_1\; V_2] = \begin{bmatrix} L+F_{11} & F_{12} \\ F_{21} & M+F_{22} \end{bmatrix},$$

where $F_{11}=W_1^H E V_1$, $F_{12}=W_1^H E V_2$, $F_{21}=W_2^H E V_1$, and $F_{22}=W_2^H E V_2$. Suppose that $4\|F_{12}\|\,\|F_{21}\| \le \mathrm{sep}^2(L+F_{11},\,M+F_{22})$; then there is $S\in\mathbb{C}^{(n-k)\times k}$ satisfying $\|S\| \le \frac{2\|F_{21}\|}{\mathrm{sep}(L+F_{11},\,M+F_{22})}$ such that

$$(\tilde L,\,\tilde V_1) = (L + F_{11} + F_{12}S,\; V_1 + V_2 S)$$

is a simple right eigenpair of $\tilde A$, i.e., $\tilde A\tilde V_1 = \tilde V_1\tilde L$ and $\Lambda(\tilde L)\cap(\Lambda(\tilde A)\setminus\Lambda(\tilde L)) = \emptyset$.
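The block diagonalization of Lemma 4.7 can be carried out numerically: in the unitary basis $[V_1\; W_2]$ the matrix is block upper triangular, and solving one Sylvester equation removes the coupling block. The sketch below (with a hypothetical random complex $A$; a Kronecker-product solve stands in for a proper Sylvester solver, which is fine at this size) illustrates the construction:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 6, 2
A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))

# An invariant pair (V1, L): orthonormal basis of the span of k eigenvectors.
_, Pvecs = np.linalg.eig(A)
Q, _ = np.linalg.qr(Pvecs[:, :k], mode='complete')   # unitary [V1 W2]
V1, W2 = Q[:, :k], Q[:, k:]
L = V1.conj().T @ A @ V1
M = W2.conj().T @ A @ W2
X = V1.conj().T @ A @ W2                             # coupling block

# In the basis [V1 W2], A is block upper triangular: [[L, X], [0, M]].
# Solve the Sylvester equation L Z - Z M = -X via Kronecker vectorization.
S = np.kron(np.eye(n - k), L) - np.kron(M.T, np.eye(k))
Z = np.linalg.solve(S, -X.flatten(order='F')).reshape((k, n - k), order='F')

# Then V2 = V1 Z + W2 satisfies A V2 = V2 M, completing decomposition (4.7).
V2 = V1 @ Z + W2
```

The Sylvester equation is solvable because $\Lambda(L)\cap\Lambda(M)=\emptyset$ holds for a generic random matrix; for large problems one would use a dedicated Sylvester solver instead of the dense Kronecker system.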

We see from Proposition 4.8 that the perturbation of (V1,L) is bounded by

$$(4.8)\qquad \begin{cases}
\|\tilde V_1 - V_1\| = \|V_2 S\| \le \dfrac{2\|V_2\|\,\|E\|}{\mathrm{sep}(L+F_{11},\,M+F_{22})} = 2\|V_2\|\,\|\tilde{\mathcal S}^{-1}\|\,\|E\|, \text{ and}\\[4pt]
\|\tilde L - L\| = \|F_{11}+F_{12}S\| \le \|W_1^H E\| + 2\,\|\tilde{\mathcal S}^{-1}\|\,\|W_1^H E V_2\|\,\|E\|,
\end{cases}$$

where $\tilde{\mathcal S}\colon Q \mapsto Q(L+F_{11}) - (M+F_{22})Q$ is a perturbed variant of the Sylvester operator $\mathcal S\colon Q \mapsto QL - MQ$, and $\mathrm{sep}(L+F_{11},M+F_{22}) = \|\tilde{\mathcal S}^{-1}\|^{-1}$. We assume that the perturbation is small enough that $\tilde{\mathcal S}$ is nonsingular. Later we will compare our perturbation results for the nonlinear case with these bounds.

To begin the perturbation analysis, first note that the error estimate of the approximate invariant pair $(X,G)$ given in Theorem 4.6 can be used for this purpose. In fact, the invariant pair $(V,L)$ of the original problem (2.1) is an approximate invariant pair of the slightly perturbed nonlinear eigenproblem

$$(4.9)\qquad \tilde T(\lambda) \equiv \sum_{j=1}^{m} f_j(\lambda)(A_j + \delta A_j) = T(\lambda) + E(\lambda).$$

Here, we assume that small perturbations arise in the matrices $\{A_j\}$, which in many applications come from the discretization of differential equations; the $\{f_j\}$ are often

fixed functions, e.g., standard elementary or special functions, that are not subject to perturbations. To derive the perturbation analysis, consider the operators

$$(4.10)\qquad \tilde{\mathbb T}\colon (X,G) \mapsto \sum_{j=1}^m (A_j+\delta A_j)\,X f_j(G) = \mathbb{T}(X,G) + \mathbb{E}(X,G), \qquad \mathbb{V}\colon (X,G) \mapsto W^H U_\ell(X,G) - I_k \quad\text{(see (2.7))},$$
where $\mathbb{E}(X,G) = \sum_{j=1}^m \delta A_j X f_j(G)$, and the Fréchet derivative of $\tilde{\mathbb T}$

$$(4.11)\qquad D\tilde{\mathbb T}_{(X,G)}(\Delta X,\Delta G) = D\mathbb{T}_{(X,G)}(\Delta X,\Delta G) + D\mathbb{E}_{(X,G)}(\Delta X,\Delta G) = D\mathbb{T}_{(X,G)}(\Delta X,\Delta G) + \mathbb{E}(\Delta X,G) + \sum_{j=1}^m \delta A_j X\,Df_j(G)(\Delta G).$$

One can see that $D\tilde{\mathbb T}$ is close to $D\mathbb{T}$ if the $\{\|\delta A_j\|\}$ are small. In fact, we have

$$(4.12)\qquad \begin{aligned}
\big\|D\tilde{\mathbb T}_{(X,G)}(\Delta X,\Delta G) - D\mathbb{T}_{(X,G)}(\Delta X,\Delta G)\big\|
&= \Big\|\mathbb{E}(\Delta X,G) + \sum_{j=1}^m \delta A_j X\,Df_j(G)(\Delta G)\Big\|\\
&\le \sum_{j=1}^m \|\delta A_j\|\,\big\|\Delta X f_j(G) + X\,Df_j(G)(\Delta G)\big\|\\
&\le \sum_{j=1}^m \|\delta A_j\|\big(\|f_j(G)\|\,\|\Delta X\| + \|X\|\,\|Df_j(G)\|\,\|\Delta G\|\big)\\
&\le \sum_{j=1}^m \|\delta A_j\|\big(\|f_j(G)\| + \|X\|\,\|Df_j(G)\|\big)\,\|(\Delta X,\Delta G)\|.
\end{aligned}$$

The Fréchet derivative of the perturbed problem is therefore
$$(4.13)\qquad \tilde{\mathbb L}_{(X,G)}\colon (\Delta X,\Delta G) \mapsto \big(D\tilde{\mathbb T}_{(X,G)}(\Delta X,\Delta G),\; D\mathbb{V}_{(X,G)}(\Delta X,\Delta G)\big),$$
which, from (4.12), satisfies

$$(4.14)\qquad \big\|\tilde{\mathbb L}_{(X,G)} - \mathbb{L}_{(X,G)}\big\| \le \sum_{j=1}^m \|\delta A_j\|\big(\|f_j(G)\| + \|X\|\,\|Df_j(G)\|\big).$$

To develop the perturbation analysis of a simple invariant pair $(V,L)$, it suffices to show that 1) $\tilde{\mathbb L}_{(V,L)}$ is nonsingular, 2) $\tilde{\mathbb L}$ is Lipschitz continuous, and 3) $\|\tilde{\mathbb T}(V,L)\|$ is small enough. The existence and uniqueness of a nearby invariant pair $(\tilde V,\tilde L)$ of the perturbed problem can then be established by the Newton–Kantorovich theorem. Conditions 1) and 3) hold if the perturbations $\{\|\delta A_j\|\}$ $(j=1,2,\ldots,m)$ are small. Specifically, note that $\mathbb{L}_{(V,L)}$ is nonsingular (see Theorem 2.4), and the eigenvalues of linear operators on a finite-dimensional space change continuously under perturbation; see, e.g., [28], [30]. Therefore, if the $\{\|\delta A_j\|\}$ are small enough, $\|\tilde{\mathbb L}_{(V,L)} - \mathbb{L}_{(V,L)}\|$ is small, and the nonsingularity of $\tilde{\mathbb L}_{(V,L)}$ is guaranteed; in addition,

$$\|\tilde{\mathbb T}(V,L)\| = \|\mathbb{E}(V,L)\| \le \sum_{j=1}^m \|\delta A_j\|\,\|V f_j(L)\|$$

is also small. The Lipschitz continuity of $\tilde{\mathbb L}$ can be shown in the same way as for $\mathbb{L}$ in Lemma 4.5. In summary, the perturbation analysis of a simple invariant pair $(V,L)$ of the nonlinear eigenvalue problem (2.1) is given in the following theorem.

Theorem 4.9. Let $(V,L)\in C_\Gamma^{n\times k}\times C_\Omega^{k\times k}$ be a simple invariant pair of (2.1), where $C_\Gamma^{n\times k}$ is bounded and convex, and $C_\Omega^{k\times k}$ is open and convex with bounded $\Omega$. For the perturbed nonlinear eigenproblem (4.9), suppose that the Fréchet derivative $\tilde{\mathbb L}$ defined in (4.13) is nonsingular at $(V,L)$, that $\kappa = \|\tilde{\mathbb L}_{(V,L)}^{-1}\|$ is finite, and that there exists $\gamma > 0$ such that $\|\tilde{\mathbb L}_{(X_1,G_1)} - \tilde{\mathbb L}_{(X_2,G_2)}\| \le \gamma\,\|(X_1-X_2,\,G_1-G_2)\|$ for all $(X_1,G_1),(X_2,G_2)\in C_\Gamma^{n\times k}\times C_\Omega^{k\times k}$. Suppose that the $\{\|\delta A_j\|\}$ are small, such that $\delta = \|\tilde{\mathbb L}_{(V,L)}^{-1}(\tilde{\mathbb T}(V,L),0)\|$ satisfies $h\equiv 2\kappa\gamma\delta\le 1$, and that $B\big((V,L), \tfrac{2\delta}{1+\sqrt{1-h}}\big)\subset C_\Gamma^{n\times k}\times C_\Omega^{k\times k}$. Then there is a unique invariant pair $(\tilde V,\tilde L)$ of (4.9) in $B\big((V,L), \tfrac{1+\sqrt{1-h}}{\kappa\gamma}\big)\cap\big(C_\Gamma^{n\times k}\times C_\Omega^{k\times k}\big)$ (if $h<1$) or in $B\big((V,L),\tfrac{1}{\kappa\gamma}\big)$ (if $h=1$), such that $\|(\tilde V - V,\,\tilde L - L)\| \le \tfrac{2\delta}{1+\sqrt{1-h}}$.

Similar to the approximation error analysis given in the previous section, the perturbation of the invariant pair is bounded by $d_s \equiv \tfrac{2\delta}{1+\sqrt{1-h}}$, which satisfies

$$(4.15)\qquad \delta \le d_s \le 2\delta = 2\,\big\|\tilde{\mathbb L}_{(V,L)}^{-1}(\tilde{\mathbb T}(V,L),0)\big\| \le 2\,\big\|\tilde{\mathbb L}_{(V,L)}^{-1}\big\|\,\|\mathbb{E}(V,L)\|.$$

This bound shows that $2\|\tilde{\mathbb L}_{(V,L)}^{-1}\|$ is the condition number of a simple invariant pair under perturbation of the $\{A_j\}$ of problem (2.1). From (4.8), we see that the condition number of the simple eigenspace $V_1$ of the matrix $A$ is $2\|V_2\|\,\|\tilde{\mathcal S}^{-1}\|$. Both condition numbers are proportional to the norm of an inverse perturbed linear operator. Recall that the reciprocal of this norm describes the separation between the desired spectrum $\Lambda(\tilde L)$ and the rest of the spectrum of the perturbed eigenvalue problem: the smaller the separation, the larger the condition number.

It is worth noting that the bound in Theorem 4.9 is very similar to the one presented in [2, Theorem 8] for polynomial eigenproblems. There are, nevertheless, several minor differences. Specifically, the bound in [2]: 1) includes an orthogonal projector which projects out the component of the perturbation that lies in the manifold of all invariant pairs equivalent to $(V,L)$; 2) uses $\|\mathbb{L}_{(V,L)}^{-1}\|$ instead of the perturbed variant $\|\tilde{\mathbb L}_{(V,L)}^{-1}\|$; and 3) includes an $O(\|\tilde T(\cdot)-T(\cdot)\|^2)$ term, because it is based on a first-order perturbation analysis. In spite of these differences, both bounds convey the same idea: the perturbation of a simple invariant pair is proportional to the perturbation of the problem data, and the condition number is the norm of the inverse Fréchet derivative $\mathbb{L}_{(V,L)}^{-1}$ or its perturbed variant.

4.3. Subspace projection of $T(\cdot)$ and the block Rayleigh functional. In this section, we investigate the projection (restriction) of the nonlinear eigenproblem $T(\cdot)$ defined in (2.1) onto an approximate eigenspace. This projection is the block version of the nonlinear Rayleigh functional [23]. Rayleigh functionals are generalizations of Rayleigh quotients of linear eigenvalue problems to the nonlinear case. Consider the generalized linear eigenvalue problem $Av = \lambda Bv$, i.e., $T(\lambda)v = (\lambda B - A)v = 0$, with a given approximate eigenvector $x \approx v$.
One can choose an auxiliary vector $y$ such that $y^H Bx \ne 0$, and define the Rayleigh quotient as $\rho(y,x) = (y^H Ax)/(y^H Bx)$, which is in fact the solution of $y^H T(\rho)x = 0$. Similarly, for the nonlinear eigenproblem $T(\cdot)$ defined in (2.1), the Rayleigh functional $\rho(y,x)$ is defined as the solution of $y^H T(\rho)x = 0$ that satisfies $y^H T'(\rho)x \ne 0$; see [23].
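For a concrete nonlinear $T$, the scalar equation $y^H T(\rho)x = 0$ can be solved by Newton's method in one variable. A sketch for a hypothetical dense quadratic eigenproblem $T(\lambda) = A_0 + \lambda A_1 + \lambda^2 A_2$ (a stand-in, not the paper's test problem); the true eigenpair used as a reference is obtained from a companion linearization:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
A0, A1, A2 = (rng.standard_normal((n, n)) for _ in range(3))
T  = lambda lam: A0 + lam * A1 + lam ** 2 * A2
dT = lambda lam: A1 + 2 * lam * A2               # T'(lambda)

def rayleigh_functional(y, x, rho0, tol=1e-13, maxit=100):
    """Solve y^H T(rho) x = 0 for the scalar rho by Newton's method,
    starting from rho0; requires y^H T'(rho) x != 0 along the iteration."""
    rho = rho0
    for _ in range(maxit):
        step = (y.conj() @ T(rho) @ x) / (y.conj() @ dT(rho) @ x)
        rho = rho - step
        if abs(step) < tol * (1 + abs(rho)):
            break
    return rho

# Reference eigenpair (lam, v) via companion linearization of the quadratic,
# and a left null vector w of T(lam) from its SVD (w^H T(lam) ~ 0).
C = np.block([[np.zeros((n, n)), np.eye(n)],
              [-np.linalg.solve(A2, A0), -np.linalg.solve(A2, A1)]])
lams, Z = np.linalg.eig(C)
lam, v = lams[0], Z[:n, 0] / np.linalg.norm(Z[:n, 0])
U, s, Vh = np.linalg.svd(T(lam))
w = U[:, -1]

x = v + 1e-4 * rng.standard_normal(n)
y = w + 1e-4 * rng.standard_normal(n)
rho = rayleigh_functional(y, x, lam + 1e-3)
```

With $x$ and $y$ this close to the right and left eigenvectors, $|\rho-\lambda|$ is far smaller than the perturbation itself, reflecting the higher-order accuracy analyzed in this section.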

The Rayleigh quotient and the Rayleigh functional can be considered as the projection of the eigenproblem $T(\cdot)$ onto the one-dimensional right space $\mathrm{span}\{x\}$ and left space $\mathrm{span}\{y\}$. In particular, if $x = v$, then $\rho(y,v) = \lambda$ is the corresponding eigenvalue; see [23, Theorem 5]. This is consistent with the fact that the projection of a square matrix $A$ onto an invariant subspace $V$ of dimension $k$ is a matrix $L\in\mathbb{C}^{k\times k}$ whose eigenvalues are identical to those of $A$ corresponding to $V$. The projection of $A$ onto an approximate invariant subspace is called the Rayleigh–Ritz projection and is often used to obtain eigenvalue approximations called Ritz values.

Consider an exact invariant pair $(V,L)$ and a corresponding approximate pair $(X,G)$ of the nonlinear eigenproblem $T(\cdot)$. The idea discussed above can be used to study the projection of $T(\cdot)$ onto the right space $\mathrm{span}\{X\}$ and left space $\mathrm{span}\{Y\}$. For given $Y,X\in\mathbb{C}^{n\times k}$, define the operator
$$\mathbb{P}^{(Y,X)}\colon G \mapsto \sum_{j=1}^m Y^H A_j X f_j(G).$$

Note that P(Y,V )(L) = Y H T(V,L) = 0, because (V,L) is an invariant pair. Therefore, if X is sufficiently close to V , then

$$\|\mathbb{P}^{(Y,X)}(L)\| = \Big\|\sum_{j=1}^m Y^H A_j X f_j(L)\Big\| = \Big\|\sum_{j=1}^m Y^H A_j (X-V) f_j(L)\Big\| \le \|X-V\|\,\sum_{j=1}^m \|Y^H A_j\|\,\|f_j(L)\|$$
is also small.

Following the lines of the proof of Lemma 4.5, we can show that $D\mathbb{P}^{(Y,X)}_{(L)}$ is Lipschitz continuous if the Fréchet derivatives $\{Df_j\}$ are Lipschitz continuous. Assume that for an approximate invariant pair $(X,G)$, $Y$ is chosen such that the Fréchet derivative
$$(4.16)\qquad D\mathbb{P}^{(Y,X)}_{(L)}\colon \Delta G \mapsto \sum_{j=1}^m Y^H A_j X\,Df_j(L)(\Delta G)$$
is nonsingular. This assumption is a block generalization of $y^H T'(\lambda)x \ne 0$, which is needed to ensure that the Rayleigh functional $\rho(y,x)$ is well defined; see [23]. This condition holds in general for the two-sided scalar Rayleigh functional approximating non-defective eigenvalues $\lambda$, if $x$ and $y$ are close to appropriate right and left eigenvectors, respectively. Specifically, as we will discuss shortly, one can always choose a right eigenvector $v$ and a left eigenvector $w$ corresponding to a non-defective $\lambda$ such that $w^H T'(\lambda)v = 1$. Therefore $y^H T'(\lambda)x \ne 0$ if $x$ and $y$ are sufficiently close to $v$ and $w$, respectively. For the block case, however, we do not have a complete understanding of the conditions under which the Fréchet derivative $D\mathbb{P}^{(Y,X)}_{(L)}$ is nonsingular.

In the following theorem, we summarize some properties of a block Rayleigh functional, which are obtained by applying the Newton–Kantorovich theorem to $\mathbb{P}^{(Y,X)}$.

Theorem 4.10. Let $(V,L),(X,G)\in C_\Gamma^{n\times k}\times C_\Omega^{k\times k}$ be an exact and a corresponding approximate minimal invariant pair of (2.1), respectively, where $C_\Gamma^{n\times k}$ is bounded and convex, and $C_\Omega^{k\times k}$ is open and convex with bounded $\Omega$. Suppose $Y\in\mathbb{C}^{n\times k}$ is such that the Fréchet derivative $D\mathbb{P}^{(Y,X)}_{(L)}$ defined in (4.16) is nonsingular, that $\kappa = \|(D\mathbb{P}^{(Y,X)}_{(L)})^{-1}\|$

is finite, and that there exists $\gamma > 0$ such that $\|D\mathbb{P}^{(Y,X)}_{(L_1)} - D\mathbb{P}^{(Y,X)}_{(L_2)}\| \le \gamma\,\|L_1-L_2\|$ for all $L_1,L_2\in C_\Omega^{k\times k}$. Suppose that $X$ is close to $V$, such that $\delta = \|(D\mathbb{P}^{(Y,X)}_{(L)})^{-1}\mathbb{P}^{(Y,X)}(L)\|$ satisfies $h\equiv 2\kappa\gamma\delta\le 1$, and that $B\big(L, \tfrac{2\delta}{1+\sqrt{1-h}}\big)\subset C_\Omega^{k\times k}$. Then there is a unique block Rayleigh functional $Q$ satisfying $\mathbb{P}^{(Y,X)}(Q) = Y^H\,\mathbb{T}(X,Q) = 0$ in $B\big(L, \tfrac{1+\sqrt{1-h}}{\kappa\gamma}\big)\cap C_\Omega^{k\times k}$ (if $h<1$) or in $B\big(L,\tfrac{1}{\kappa\gamma}\big)$ (if $h=1$), such that $\|Q-L\| \le \tfrac{2\delta}{1+\sqrt{1-h}}$.

Theorem 4.10 gives an approximation error estimate for the block Rayleigh functional $Q$, namely, $\|Q-L\| \le d_s = \tfrac{2\delta}{1+\sqrt{1-h}}$, which satisfies

$$(4.17)\qquad \delta \le d_s \le 2\delta = 2\,\big\|(D\mathbb{P}^{(Y,X)}_{(L)})^{-1}\mathbb{P}^{(Y,X)}(L)\big\| \le 2\,\big\|(D\mathbb{P}^{(Y,X)}_{(L)})^{-1}\big\|\,\|\mathbb{P}^{(Y,X)}(L)\|.$$

From (4.17), we can show that for the special case where both the left and the right spaces are one-dimensional ($k=1$), the scalar Rayleigh functional studied in [23] may provide an eigenvalue approximation of higher order accuracy. In fact, let $(\lambda,v)$ and $(\lambda,w)$ be a right and a left eigenpair of problem (2.1), respectively, such that $w^H T(\lambda) = 0$, $T(\lambda)v = 0$, and $\|v\| = \|w\| = 1$. Consider $x = \cos\alpha\,v + \sin\alpha\,v_\perp$ and $y = \cos\beta\,w + \sin\beta\,w_\perp$, where $\alpha,\beta < \frac{\pi}{2}$, and $v_\perp\perp v$ and $w_\perp\perp w$ are unit vectors, so that $\|x\| = \|y\| = 1$. Then, noting that $X = x$, $Y = y$ and $L = \lambda$, we have
$$(4.18)\qquad \|\mathbb{P}^{(Y,X)}(L)\| = \Big|\sum_{j=1}^m y^H A_j x\,f_j(\lambda)\Big|$$

$$\begin{aligned}
&= \Big|(\cos\beta\,w + \sin\beta\,w_\perp)^H\Big(\sum_{j=1}^m A_j f_j(\lambda)\Big)(\cos\alpha\,v + \sin\alpha\,v_\perp)\Big|\\
&= \big|\sin\beta\,w_\perp^H T(\lambda)(\cos\alpha\,v + \sin\alpha\,v_\perp)\big| \qquad\text{(since } w^H T(\lambda) = 0\text{)}\\
&= \big|\sin\beta\big(\cos\alpha\,w_\perp^H T(\lambda)v + \sin\alpha\,w_\perp^H T(\lambda)v_\perp\big)\big|\\
&= |w_\perp^H T(\lambda)v_\perp|\,|\sin\alpha\sin\beta| \;\le\; \|T(\lambda)\|\,|\sin\alpha\sin\beta| \qquad\text{(since } T(\lambda)v = 0\text{)},
\end{aligned}$$
and for small $\alpha$ and $\beta$

$$\big\|(D\mathbb{P}^{(Y,X)}_{(L)})^{-1}\big\| = \Big|\sum_{j=1}^m (y^H A_j x)\,f_j'(\lambda)\Big|^{-1} = \big|y^H T'(\lambda)x\big|^{-1}$$

$$\begin{aligned}
&= \big|\cos\alpha\cos\beta\,w^H T'(\lambda)v + \cos\beta\sin\alpha\,w^H T'(\lambda)v_\perp + \sin\beta\cos\alpha\,w_\perp^H T'(\lambda)v + \sin\beta\sin\alpha\,w_\perp^H T'(\lambda)v_\perp\big|^{-1}\\
&\le \Big(|\cos\alpha\cos\beta|\,|w^H T'(\lambda)v| - \big(|\sin\alpha\cos\beta| + |\cos\alpha\sin\beta| + |\sin\alpha\sin\beta|\big)\,\|T'(\lambda)\|\Big)^{-1}.
\end{aligned}$$

From (4.17), for small $\alpha$ and $\beta$, the error estimate of the scalar Rayleigh functional as an approximation of $\lambda$ is
$$(4.19)\qquad |q(y,x)-\lambda| \le \frac{2\,\|T(\lambda)\|\,|\sin\alpha\sin\beta|}{|y^H T'(\lambda)x|} \le \frac{2\,\|T(\lambda)\|\,|\tan\alpha\tan\beta|}{|w^H T'(\lambda)v| - \big(|\tan\alpha| + |\tan\beta| + |\tan\alpha\tan\beta|\big)\,\|T'(\lambda)\|}.$$
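The quadratic scaling in (4.19) is easy to observe numerically. A self-contained check for the linear symmetric case, where $T(\lambda) = A - \lambda I$, $w = v$, and the Rayleigh functional reduces to the two-sided Rayleigh quotient $q(y,x) = (y^H Ax)/(y^H x)$ (the matrix `A` is a hypothetical random example):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
A = rng.standard_normal((n, n)); A = (A + A.T) / 2   # symmetric, so w = v
lams, V = np.linalg.eigh(A)
lam, v = lams[0], V[:, 0]

# Unit vectors orthogonal to v, used to tilt the trial vectors x and y.
g = rng.standard_normal(n); vp = g - (v @ g) * v; vp /= np.linalg.norm(vp)
g = rng.standard_normal(n); wp = g - (v @ g) * v; wp /= np.linalg.norm(wp)

errs = []
for a in (1e-2, 1e-3, 1e-4):                 # alpha = beta = a
    x = np.cos(a) * v + np.sin(a) * vp
    y = np.cos(a) * v + np.sin(a) * wp
    q = (y @ A @ x) / (y @ x)                # two-sided Rayleigh quotient
    errs.append(abs(q - lam))
```

Each tenfold decrease of the angle reduces the error by roughly a factor of one hundred, i.e., $|q-\lambda| = O(\sin\alpha\sin\beta)$.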

For the linear eigenproblem $Av - v\lambda = 0$, exactly this bound has been derived in [22, Proposition 5]. For nonlinear eigenproblems, it is shown in [23, Remark 3.9] that the bound can be improved to
$$|q(y,x)-\lambda| \le \frac{\|T(\lambda)\|\,|\tan\alpha\tan\beta|}{|w^H T'(\lambda)v|\,\big(1 - O(\tan\alpha + \tan\beta)\big)}.$$
Therefore, if $w^H T'(\lambda)v \ne 0$, the two-sided scalar Rayleigh functional is an eigenvalue approximation of second order accuracy for small $\alpha$ and $\beta$; if $w^H T'(\lambda)v = 0$, then the last expression in (4.19) indicates that only first order accuracy can be achieved. In fact, whether $w^H T'(\lambda)v = 0$ depends essentially on whether $\lambda$ is a defective eigenvalue. Specifically, assume that the geometric multiplicity of $\lambda$ is $g$, and $\{\varphi_1,\varphi_2,\ldots,\varphi_g\}$ and $\{\psi_1,\psi_2,\ldots,\psi_g\}$ are corresponding right and left eigenvectors, respectively. It is shown in [11, Theorem A.10.2] that the two sets of eigenvectors can be chosen such that $\psi_i^H T'(\lambda)\varphi_j = \delta_{ij}$ $(i,j = 1,2,\ldots,g)$ for non-defective (simple or semi-simple) $\lambda$. For defective $\lambda$, however, we always have $\psi_i^H T'(\lambda)\varphi_j = 0$. It follows that second order accuracy of the two-sided scalar Rayleigh functional can only be achieved for non-defective eigenvalues.

We find that the above result on second order accuracy in eigenvalue approximation in the scalar case can be extended to the two-sided block Rayleigh functional for non-defective eigenvalues. The result is summarized in the following theorem.

Theorem 4.11. Let $(V,L)$ and $(X,G)$ be an exact and a corresponding approximate invariant pair of (2.1), respectively. Let $Y\in\mathbb{C}^{n\times k}$ be such that $\mathrm{range}(Y)$ is close to the span of the corresponding left eigenvectors. Assume that the conditions of Theorem 4.10 are satisfied, so that there exists a unique block Rayleigh functional $Q$ satisfying $\|Q-L\| \le \tfrac{2\delta}{1+\sqrt{1-h}} \le 2\delta$. Let $\lambda$ be an eigenvalue of $T(\cdot)$ and $L$, and let $v$ and $w$ be a corresponding right and a left eigenvector.
Suppose that there exist $x\in\mathrm{range}(X)$ and $y\in\mathrm{range}(Y)$ such that $\alpha = \angle(x,v)$ and $\beta = \angle(y,w)$ are sufficiently small. If $\lambda$ is a simple eigenvalue of $T(\cdot)$, or if $\lambda$ is semi-simple and $w^H T'(\lambda)v \ne 0$, then the block Rayleigh functional $Q$ has an eigenvalue $\tilde\lambda$ such that $|\tilde\lambda-\lambda| \le O(\sin\alpha\sin\beta)$. If $\lambda$ is a defective eigenvalue of $T(\cdot)$, then $Q$ has an eigenvalue $\tilde\lambda$ such that $|\tilde\lambda-\lambda| \le O(\min(\alpha,\beta))$.

The proof of Theorem 4.11 is given in Appendix A.

5. Structure and norm estimate of Fréchet derivatives. Let $(V,L)$ be a simple invariant pair of (2.1), and $(X,G)$ a corresponding approximate invariant pair. In Section 4, we showed that the norms of the inverse Fréchet derivatives $\|\mathbb{L}_{(X,G)}^{-1}\|$ and $\|\tilde{\mathbb L}_{(V,L)}^{-1}\|$ play an important role in the analysis of approximation errors and perturbations of invariant pairs; see (4.6) and (4.15). In this section, we analyze the algebraic structure of the Fréchet derivative $\mathbb{L}_{(V,L)}$, and we briefly discuss a norm estimate of the inverse derivative $\|\mathbb{L}_{(V,L)}^{-1}\|$. Since both $\mathbb{L}_{(X,G)}^{-1}$ and $\tilde{\mathbb L}_{(V,L)}^{-1}$ are perturbed variants of $\mathbb{L}_{(V,L)}^{-1}$, a norm estimate of the latter can be used as an estimate of the former, and thus it may give a quantitative description of the conditioning and sensitivity of invariant pairs.

Before we present the norm estimate of $\|\mathbb{L}_{(V,L)}^{-1}\|$, recall that the expression of the Fréchet derivative from (2.8) and (2.9) is
$$(5.1)\qquad \mathbb{L}_{(V,L)}\colon (\Delta X,\Delta G) \mapsto \big(D\mathbb{T}_{(V,L)}(\Delta X,\Delta G),\; D\mathbb{V}_{(V,L)}(\Delta X,\Delta G)\big) = \Big(\mathbb{T}(\Delta X,L) + \sum_{j=1}^m A_j V\,Df_j(L)(\Delta G),\;\; \sum_{j=0}^{\ell-1} W_j^H\,\Delta X\,L^j + \sum_{j=1}^{\ell-1} W_j^H V\,[DL^j](\Delta G)\Big),$$
which is a linear mapping on $\mathbb{C}^{n\times k}\times\mathbb{C}^{k\times k}$. Since a linear mapping on a finite-dimensional linear space can be represented by a matrix, the structure of $\mathbb{L}_{(V,L)}$ can be understood by studying the matrix representation of the Fréchet derivative.

Theorem 5.1. Let $(V,L)$ be a simple invariant pair of (2.1) and $\mathbb{L}_{(V,L)}$ the corresponding Fréchet derivative defined in (5.1). Let $\begin{bmatrix} H & M \\ N & K \end{bmatrix}$ be the matrix representation of $\mathbb{L}_{(V,L)}$, where $H\in\mathbb{C}^{nk\times nk}$, $M\in\mathbb{C}^{nk\times k^2}$, $N\in\mathbb{C}^{k^2\times nk}$ and $K\in\mathbb{C}^{k^2\times k^2}$, such that
$$\mathrm{vec}\big(\mathbb{L}_{(V,L)}(\Delta X,\Delta G)\big) = \begin{bmatrix} H & M \\ N & K \end{bmatrix}\begin{bmatrix} \mathrm{vec}(\Delta X) \\ \mathrm{vec}(\Delta G) \end{bmatrix}$$
for all $(\Delta X,\Delta G)\in\mathbb{C}^{n\times k}\times\mathbb{C}^{k\times k}$. Then

$$1 \le r \equiv \dim(\ker(H)) \le k^2.$$
Here, $r = 1$ if and only if $L$ has only one distinct eigenvalue and is similar to a single Jordan block of order $k$; $r = k$ if $L$ has $k$ distinct simple eigenvalues; $r = k^2$ if and only if $L$ has only one distinct eigenvalue that is semi-simple. Let $H = Y_a\begin{bmatrix} \Sigma_{11} & 0 \\ 0 & 0 \end{bmatrix}Y_b^H$ be the singular value decomposition of $H$, where $\Sigma_{11}\in\mathbb{C}^{(nk-r)\times(nk-r)}$ contains all the nonzero singular values of $H$. Define
$$F_* \equiv \begin{bmatrix} Y_a^H & 0 \\ 0 & I_{k^2} \end{bmatrix}\begin{bmatrix} H & M \\ N & K \end{bmatrix}\begin{bmatrix} Y_b & 0 \\ 0 & I_{k^2} \end{bmatrix} = \begin{bmatrix} Y_a^H H Y_b & Y_a^H M \\ N Y_b & K \end{bmatrix} = \begin{bmatrix} \Sigma_{11} & 0 & M_{13} \\ 0 & 0 & M_{23} \\ N_{31}^H & N_{32}^H & K \end{bmatrix},$$

where $M_{13}\in\mathbb{C}^{(nk-r)\times k^2}$, $M_{23}\in\mathbb{C}^{r\times k^2}$, $N_{31}^H\in\mathbb{C}^{k^2\times(nk-r)}$, $N_{32}^H\in\mathbb{C}^{k^2\times r}$. If $F_*$ is nonsingular, then $F_*^{-1}$ has the following $3\times3$ block form

$$\begin{bmatrix}
\Sigma_{11}^{-1} - \Sigma_{11}^{-1}M_{13}Q_3^{-1}Q_0 & \Sigma_{11}^{-1}M_{13}Q_3^{-1}N_{32}^H & -\Sigma_{11}^{-1}M_{13}Q_3^{-1}\big(I - N_{32}^H(N_{32}^H)_L\big)\\[2pt]
-(N_{32}^H)_L N_{31}^H\Sigma_{11}^{-1} - Q_2Q_3^{-1}Q_0 & I + Q_2Q_3^{-1}N_{32}^H & (N_{32}^H)_L - Q_2Q_3^{-1}\big(I - N_{32}^H(N_{32}^H)_L\big)\\[2pt]
Q_3^{-1}Q_0 & -Q_3^{-1}N_{32}^H & Q_3^{-1}\big(I - N_{32}^H(N_{32}^H)_L\big)
\end{bmatrix},$$

where $(N_{32}^H)_L = (N_{32}N_{32}^H)^{-1}N_{32}\in\mathbb{C}^{r\times k^2}$ is the left inverse of $N_{32}^H$, and

$$\begin{aligned}
Q_0 &= -\big(I - N_{32}^H(N_{32}^H)_L\big)N_{31}^H\Sigma_{11}^{-1} \in\mathbb{C}^{k^2\times(nk-r)},\\
Q_1 &= K - N_{31}^H\Sigma_{11}^{-1}M_{13} \in\mathbb{C}^{k^2\times k^2},\\
Q_2 &= M_{23} + (N_{32}^H)_L\,Q_1 \in\mathbb{C}^{r\times k^2},\\
Q_3 &= Q_1 - N_{32}^H Q_2 = \big(I - N_{32}^H(N_{32}^H)_L\big)Q_1 - N_{32}^H M_{23} \in\mathbb{C}^{k^2\times k^2}.
\end{aligned}$$
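The $3\times3$ block formula can be checked against a direct inverse. The sketch below builds a random real $F_*$ with the required zero pattern (the dimensions `p` $= nk-r$, `r`, `q` $= k^2$ are placeholders, and transposes play the role of conjugate transposes) and assembles the inverse from $Q_0,\ldots,Q_3$:

```python
import numpy as np

rng = np.random.default_rng(3)
p, r, q = 4, 2, 3
Sig = np.diag(rng.uniform(1.0, 2.0, p))
M13 = rng.standard_normal((p, q)); M23 = rng.standard_normal((r, q))
N1  = rng.standard_normal((q, p))          # plays the role of N31^H
N2  = rng.standard_normal((q, r))          # plays the role of N32^H
K   = rng.standard_normal((q, q))

F = np.block([[Sig,              np.zeros((p, r)), M13],
              [np.zeros((r, p)), np.zeros((r, r)), M23],
              [N1,               N2,               K ]])

Si  = np.diag(1.0 / np.diag(Sig))
N2L = np.linalg.solve(N2.T @ N2, N2.T)     # left inverse of N2
P   = np.eye(q) - N2 @ N2L                 # I - N32^H (N32^H)_L
Q0  = -P @ N1 @ Si
Q1  = K - N1 @ Si @ M13
Q2  = M23 + N2L @ Q1
Q3  = Q1 - N2 @ Q2
Q3i = np.linalg.inv(Q3)

Finv = np.block([
    [Si - Si @ M13 @ Q3i @ Q0,       Si @ M13 @ Q3i @ N2,       -Si @ M13 @ Q3i @ P],
    [-N2L @ N1 @ Si - Q2 @ Q3i @ Q0, np.eye(r) + Q2 @ Q3i @ N2, N2L - Q2 @ Q3i @ P],
    [Q3i @ Q0,                       -Q3i @ N2,                 Q3i @ P],
])
```

Multiplying `Finv` by `F` reproduces the identity, confirming the block formula (for a generic `F` with nonsingular $Q_3$).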

The proof of this theorem is given in Appendix B.

Remark 5.2. It follows from Theorem 5.1 that, given the block matrix representation of $\mathbb{L}_{(V,L)}$, we may define the norm of the inverse derivative $\mathbb{L}_{(V,L)}^{-1}$ as, for example,
$$\big\|\mathbb{L}_{(V,L)}^{-1}\big\| \equiv \bigg\|\begin{bmatrix} H & M \\ N & K \end{bmatrix}^{-1}\bigg\|_2 = \|F_*^{-1}\|_2,$$
and thus an upper bound on this norm can be obtained by summing up the norms of the blocks of the matrix form of $F_*^{-1}$. In particular, we see from this block matrix that a large $\|\Sigma_{11}^{-1}\|_2$ indicates that $\|F_*^{-1}\|_2$ is large. For example, in the special case where $L$ has only one distinct semi-simple eigenvalue, so that $\dim(\ker(H)) = k^2$, this block matrix can be simplified, and one can follow [21, Lemma 4.5] and show that

$$(5.2)\qquad \|F_*^{-1}\|_2 \le \|\Sigma_{11}^{-1}\|_2\big(1 + \|(N_{32}^H)^{-1}N_{31}^H\|_2\big)\big(1 + \|M_{13}M_{23}^{-1}\|_2\big) + \|(N_{32}^H)^{-1}KM_{23}^{-1}\|_2 + \|M_{23}^{-1}\|_2 + \|(N_{32}^H)^{-1}\|_2.$$

Here, the upper bound depends linearly on $\|\Sigma_{11}^{-1}\|_2$. In the scalar case ($k = r = 1$), $\lambda$ is simple, and
$$\begin{bmatrix} H & M \\ N & K \end{bmatrix} = \begin{bmatrix} T(\lambda) & T'(\lambda)v \\ v^H/\|v\|_2^2 & 0 \end{bmatrix},$$
$M_{23} = e_n^H Y_a^H T'(\lambda)v = \dfrac{w^H T'(\lambda)v}{\|w\|_2} \ne 0$, and $N_{32}^H = \dfrac{v^H}{\|v\|_2^2}\,Y_b e_n = \dfrac{v^H v}{\|v\|_2^2\,\|v\|_2} = \dfrac{1}{\|v\|_2} \ne 0$. Therefore the last three terms in the upper bound (5.2) are finite, and they bear no connection to the separation between $\lambda$, the eigenvalue of interest, and the rest of the spectrum. In the block case where $\lambda$ is semi-simple, we have no proof, but we conjecture that the same observation still holds for these terms.

Note that $\|\Sigma_{11}^{-1}\|_2$ is the reciprocal of the smallest nonzero singular value of $H$, i.e., of the matrix representation of the functional $\Delta X \mapsto \mathbb{T}(\Delta X,L)$. In the special case when $k = 1$ and $L = \lambda$, this functional degenerates to the matrix $T(\lambda)$, and its smallest nonzero singular value is relevant for estimating the separation between $\lambda$ and the rest of the spectrum of $T(\cdot)$. A very small singular value indicates that the separation is small, though the converse need not hold, for example, in certain cases where $\lambda$ is highly ill-conditioned. By analogy, the smallest nonzero singular value of $H$ may also be useful for estimating the separation between $\Lambda(L)\subset\Lambda(T(\cdot))$ and $\Lambda(T(\cdot))\setminus\Lambda(L)$. If the estimate of $\|\mathbb{L}_{(V,L)}^{-1}\|$ is a reasonable estimate of $\|\mathbb{L}_{(X,G)}^{-1}\|$ and $\|\tilde{\mathbb L}_{(V,L)}^{-1}\|$, then a large separation between the desired spectrum and the rest of the spectrum is desirable for numerical algorithms that compute the wanted invariant pairs. In this case, an approximate invariant pair with a small eigenresidual norm is indeed close to the exact invariant pair, and the exact invariant pair is not very sensitive to perturbations.

We conclude this section by discussing an estimate of $\|\mathbb{L}_{(X,G)}^{-1}\|$ and $\|\tilde{\mathbb L}_{(V,L)}^{-1}\|$, i.e., of the condition numbers in the quantification of approximation errors and perturbations of invariant pairs; see Theorems 4.6 and 4.9.
For example, following the treatment of $\mathbb{L}_{(V,L)}$ in Appendix B, we obtain $\begin{bmatrix} \tilde H & \tilde M \\ \tilde N & \tilde K \end{bmatrix}$ as the matrix representation of $\tilde{\mathbb L}_{(V,L)}$. Define
$$\tilde F = \begin{bmatrix} Y_a^H & 0 \\ 0 & I_{k^2} \end{bmatrix}\begin{bmatrix} \tilde H & \tilde M \\ \tilde N & \tilde K \end{bmatrix}\begin{bmatrix} Y_b & 0 \\ 0 & I_{k^2} \end{bmatrix},$$
as $F_*$ is defined in (B.3). Then $\tilde F \to F_*$ as the perturbations $\{\delta A_j\} \to 0$. Suppose that the perturbation is sufficiently small, such that $\tilde F$ is close enough to $F_*$. Then there is $\eta < 1$ such that

$$(5.3)\qquad \big\|F_*^{-1}(\tilde F - F_*)\big\| = \big\|I - F_*^{-1}\tilde F\big\| \le \eta.$$

Therefore $\lim_{k\to\infty}\|I - F_*^{-1}\tilde F\|^k = 0$, $I - (I - F_*^{-1}\tilde F) = F_*^{-1}\tilde F$ is nonsingular (see, e.g., [26, Theorem 4.20]), and hence $\tilde F$ is nonsingular. From (5.3), we have in the 2-norm

$$\begin{aligned}
\|\tilde F^{-1}F_*\| - 1 &\le \|\tilde F^{-1}F_* - I\| = \big\|\big(I - F_*^{-1}\tilde F\big)\tilde F^{-1}F_*\big\| \le \big\|I - F_*^{-1}\tilde F\big\|\,\|\tilde F^{-1}F_*\| \le \eta\,\|\tilde F^{-1}F_*\|\\
&\Rightarrow\quad \|\tilde F^{-1}F_*\| \le (1-\eta)^{-1}\\
&\Rightarrow\quad \|\tilde F^{-1}\| = \|\tilde F^{-1}F_*F_*^{-1}\| \le \|\tilde F^{-1}F_*\|\,\|F_*^{-1}\| \le (1-\eta)^{-1}\,\|F_*^{-1}\|.
\end{aligned}$$
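The chain of inequalities above is the standard Neumann-series argument, and it is easy to check numerically with hypothetical matrices standing in for $F_*$ and $\tilde F$:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20
F  = np.eye(n) + 0.05 * rng.standard_normal((n, n))   # stand-in for F_*
Ft = F + 1e-3 * rng.standard_normal((n, n))           # perturbed variant

Fi  = np.linalg.inv(F)
eta = np.linalg.norm(np.eye(n) - Fi @ Ft, 2)          # must satisfy eta < 1
bound  = np.linalg.norm(Fi, 2) / (1 - eta)            # (1 - eta)^{-1} ||F*^{-1}||
actual = np.linalg.norm(np.linalg.inv(Ft), 2)         # ||F~^{-1}||
```

Whenever `eta < 1`, the computed `actual` never exceeds `bound`, as the derivation guarantees.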

Assume that the perturbations are small enough that $\eta$ is not very close to 1. Then $\|\tilde{\mathbb L}_{(V,L)}^{-1}\| = \|\tilde F^{-1}\|$ is bounded above by a modest multiple of $\|\mathbb{L}_{(V,L)}^{-1}\| = \|F_*^{-1}\|$. The analysis applies verbatim to the estimate of $\|\mathbb{L}_{(X,G)}^{-1}\|$.

6. Numerical Examples. In this section, we give numerical examples to illustrate the properties of invariant pairs: 1) given a simple invariant pair $(V,L)$ and a corresponding approximate pair $(X,G)$, we give evidence that $\|(X-V,\,G-L)\|$ is proportional to the eigenresidual norm $\|\mathbb{T}(X,G)\|$; 2) for simple invariant pairs $(V,L)$ and $(\tilde V,\tilde L)$ of the original and the perturbed problems, we show that $\|(\tilde V-V,\,\tilde L-L)\|$ is proportional to the perturbation of $T(\cdot)$; and 3) second order accuracy in eigenvalue approximation can be achieved by the two-sided block Rayleigh functional $Q$ that approximates the non-defective eigenvalues of $L$.

We start with the description of the example eigenvalue problem. Consider the following nonlinear eigenproblem:

 3  µ−λ1 µ 2  µ−λ2 µ     µ−λ3 µ   2   (µ−λ4)   2 3  T (µ) =  5µ 100µ 0 µ µ+92  .  2 3   6µ 99µ 1 µ µ+90     ......   . . . . .  100µ 5µ2 95 µ3 µ−98 100×100

One sees from the structure of $T(\mu)$ that if $\mu = \lambda_i$ $(i=1,2,3,4)$, then the top-left $4\times4$ submatrix is singular, and so is $T(\mu)$. Therefore the $\{\lambda_i\}$ are eigenvalues of this nonlinear eigenproblem. This is a cubic eigenproblem, and it can be solved by MATLAB's polyeig function.

6.1. Approximation errors. We begin the discussion with invariant pair approximation errors. First, consider a simple invariant pair involving three distinct simple eigenvalues $\lambda_1 = 3.02+0.01i$, $\lambda_2 = 3+0.02i$, and $\lambda_3 = 2.99-0.01i$. For this problem, the corresponding eigenvectors $\{v_i\}$ $(i=1,2,3)$ are linearly independent. Let the invariant pair be $(V,L) = \big([v_1\,v_2\,v_3]S,\; S^{-1}\mathrm{diag}(\lambda_1,\lambda_2,\lambda_3)S\big)$, where the entries of $S$ obey the standard normal distribution. To construct a corresponding approximate invariant pair $(X,G)$, we first introduce perturbations $dV$ and $dL$, whose entries also obey the standard normal distribution. We then scale $dV$ and $dL$ such that $\|dV\| = \|V\|$ and $\|dL\| = \|L\|$. The approximate pair $(X,G)$ is generated by the following commands

err = 10^(-kk); X = V + err*dV; G = L + err*dL;

where kk = 1, 2, ..., 12 specifies the magnitude of the invariant pair approximation error. Then we assign $(X,G) \leftarrow \big(X(X^HX)^{-1/2},\, (X^HX)^{1/2}G(X^HX)^{-1/2}\big)$, so that the new $X$ has orthonormal columns. This normalization is slightly different from imposing $W^H U_\ell(X,G) = I_k$ ($\ell = 1$ due to the linear independence of $\{v_1,v_2,v_3\}$) with $W$ defined in (2.6), but it is used in the block Newton method for computing simple invariant pairs [12]. Our experience is that the choice of normalization has minimal impact on the numerical results.

In the left part of Figure 6.1, $\|(X-V,\,G-L)\|$, the invariant pair approximation error, is plotted against the eigenresidual norm $\|\mathbb{T}(X,G)\|$. We see from the solid line with square markers that $\|(X-V,\,G-L)\|$ is proportional to $\|\mathbb{T}(X,G)\|$. In addition, we found that $\|X-V\|$ and $\|G-L\|$ are also proportional to $\|\mathbb{T}(X,G)\|$, with the same proportionality factor.
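The normalization step $(X,G) \leftarrow \big(X(X^HX)^{-1/2},\,(X^HX)^{1/2}G(X^HX)^{-1/2}\big)$ orthonormalizes the columns of $X$ while leaving the eigenvalues of $G$ unchanged. A NumPy sketch of this step (matrix square roots via the Hermitian eigendecomposition; the sizes are hypothetical):

```python
import numpy as np

def normalize_pair(X, G):
    """Return (X S^{-1/2}, S^{1/2} G S^{-1/2}) with S = X^H X, so that the
    new X has orthonormal columns; G undergoes a similarity transformation."""
    S = X.conj().T @ X                               # Hermitian positive definite
    w, U = np.linalg.eigh(S)
    Sh  = U @ np.diag(np.sqrt(w))       @ U.conj().T   # S^{ 1/2}
    Shi = U @ np.diag(1.0 / np.sqrt(w)) @ U.conj().T   # S^{-1/2}
    return X @ Shi, Sh @ G @ Shi

rng = np.random.default_rng(5)
X = rng.standard_normal((10, 3))
G = rng.standard_normal((3, 3))
Xn, Gn = normalize_pair(X, G)
```

Since $f_j(S^{1/2}GS^{-1/2}) = S^{1/2}f_j(G)S^{-1/2}$, the eigenresidual of the pair is only multiplied on the right by $S^{-1/2}$, so invariant pairs are mapped to invariant pairs.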
Theorem 4.6 shows that this factor should be inversely proportional to the separation between $\Lambda(L)$ and $\Lambda(T)\setminus\Lambda(L)$.

To see the approximation errors for invariant pairs involving degenerate eigenvalues, consider a semi-simple $\lambda_1 = \lambda_2 = \lambda_3 = 3+0.1i$, and a defective $\lambda_4 = 3$ with a single Jordan chain of length 2. For the semi-simple eigenvalue, we let $L = \lambda_1 I_3$ and obtain $V = [v_1\,v_2\,v_3]$ by directly computing $\ker T(\lambda_1)$. For the defective case, define $L = \begin{bmatrix} \lambda_4 & 1 \\ 0 & \lambda_4 \end{bmatrix}$, and let $v_1\in\ker T(\lambda_4)$ and $v_2 = -T(\lambda_4)^\dagger T'(\lambda_4)v_1$ be the standard and the generalized eigenvectors, respectively. In both cases, $(V,L)$ is a simple invariant pair. We then perform the same procedure as for simple eigenvalues. In the left part of Figure 6.1, we see from the dash-dot line with $\diamond$ markers and the dashed line with $\triangle$ markers that $\|(X-V,\,G-L)\|$ is proportional to $\|\mathbb{T}(X,G)\|$ in the semi-simple and defective cases as well.

Remark 6.1. To make the understanding of the invariant pair approximation errors more complete, we briefly discuss the intuition for the inverse proportionality between $\|(X-V,\,G-L)\|$ and the separation between the desired spectrum and the rest of the spectrum (characterized by $\|\mathbb{L}_{(X,G)}^{-1}\|^{-1}$ in Theorem 4.6). Consider the simplest case, where the desired invariant pair is a simple eigenpair $(\lambda_*, v_*)$. Assume that there is a nearby simple eigenpair $(\lambda_\alpha, v_\alpha)$ such that $\|v_\alpha - v_*\| > \gamma_0 > 0$. Then $(\lambda_*, v_\alpha)$ is an approximate invariant pair with eigenresidual
$$\|T(\lambda_*)v_\alpha\| = \big\|T(\lambda_\alpha)v_\alpha + (\lambda_*-\lambda_\alpha)T'(\lambda_\alpha)v_\alpha + O\big((\lambda_*-\lambda_\alpha)^2\big)\big\| = O(\lambda_*-\lambda_\alpha).$$

Therefore, if $\lambda_\alpha$ is very close to $\lambda_*$, then $\|T(\lambda_*)v_\alpha\|$ is small, but the separation between $\lambda_*$ and the rest of the spectrum is also small. Indeed, since the distance between $\lambda_*$ and $\Lambda(T(\cdot))\setminus\{\lambda_*\}$ is no greater than $|\lambda_*-\lambda_\alpha|$, we may make the reasonable assumption that this separation is bounded above by $O(\lambda_\alpha-\lambda_*)$. As a result, the upper bound for $\|v_\alpha - v_*\|$ given in Theorem 4.6, which is proportional to $\|\mathbb{L}_{(v_\alpha,\lambda_*)}^{-1}\|\,\|T(\lambda_*)v_\alpha\|$, is bounded below by a positive constant. This observation is consistent with the fact that $\|v_\alpha - v_*\| > \gamma_0 > 0$. In conclusion, if the desired spectrum is not well separated from the rest of the spectrum, an approximate invariant pair with a small eigenresidual norm does not guarantee an accurate eigenspace approximation.

6.2. Perturbations. To study the perturbation of simple invariant pairs, we generate a set of random matrices $\{\Delta A_j\}$, where each $\Delta A_j$ has the same sparsity pattern as $A_j$, and the entries of $\Delta A_j$ obey the standard normal distribution. Then the $\{\Delta A_j\}$ are scaled so that $\|\Delta A_j\| = \|A_j\|$. The perturbed eigenproblem is thus defined as $\tilde T(\mu) = \sum_{j=1}^m f_j(\mu)(A_j + \epsilon\Delta A_j)$, where $\epsilon$ is a small scaling factor controlling the magnitude of the perturbation.

We found that the three simple eigenvalues specified in Section 6.1 remain simple after such a perturbation. Let $\{(\tilde\lambda_i, \tilde v_i)\}$ be the perturbed eigenpairs, $\tilde V = [\tilde v_1\,\tilde v_2\,\tilde v_3]$, and $\tilde L = \mathrm{diag}(\tilde\lambda_1,\tilde\lambda_2,\tilde\lambda_3)$. Then we let $(\tilde V,\tilde L) \leftarrow (\tilde V\tilde S,\, \tilde S^{-1}\tilde L\tilde S)$, where $\tilde S$ solves the least squares problem

$$\tilde S = \arg\min_{\tilde S\in\mathbb{C}^{3\times3}} \|V - \tilde V\tilde S\|_F.$$
Finally, we normalize $\tilde V$ by letting $(\tilde V,\tilde L) \leftarrow \big(\tilde V(\tilde V^H\tilde V)^{-1/2},\, (\tilde V^H\tilde V)^{1/2}\tilde L(\tilde V^H\tilde V)^{-1/2}\big)$.

Fig. 6.1. Approximation errors and perturbations of simple invariant pairs. Left: eigenresidual norm $\|\mathbb{T}(X,G)\|$ (x-axis) vs. invariant pair approximation errors $\|(X-V,\,G-L)\|$ (y-axis). Right: perturbations of $T(\cdot)$ (x-axis) vs. invariant pair perturbations $\|(\tilde V-V,\,\tilde L-L)\|$ (y-axis).

In the right part of Figure 6.1, we plotted $\|(\tilde V-V,\,\tilde L-L)\|$ against the magnitude $\epsilon$ of the eigenproblem perturbation. Clearly, the perturbation of $(V,L)$ is proportional to
the perturbation of the problem data. In addition, we also computed $\|\tilde V - V\|$ and $\|\tilde L - L\|$, and we found that both quantities are also proportional to $\epsilon$.

We repeated the experiment for simple invariant pairs involving the semi-simple eigenvalue and the defective eigenvalue. Specifically, the semi-simple eigenvalue $\lambda_1 = \lambda_2 = \lambda_3 = 3+0.1i$ corresponds to three distinct perturbed simple eigenvalues $\tilde\lambda_1$, $\tilde\lambda_2$ and $\tilde\lambda_3$; the simple invariant pair involving the defective eigenvalue is perturbed to a new invariant pair $(\tilde V,\tilde L)$ involving two distinct simple eigenpairs $(\tilde\lambda_{4a}, \tilde v_{4a})$ and $(\tilde\lambda_{4b}, \tilde v_{4b})$. Our results again show that the perturbation of simple invariant pairs is proportional to the perturbation of the eigenproblem data.

Here, it is worth pointing out that the defective eigenpair itself, $(\lambda_4, v_4)$, is much more sensitive to perturbation than the simple invariant pair involving $\lambda_4$. We found that $\|(v_4-\tilde v_{4a},\,\lambda_4-\tilde\lambda_{4a})\|$ and $\|(v_4-\tilde v_{4b},\,\lambda_4-\tilde\lambda_{4b})\|$ are both proportional to $\sqrt{\epsilon}$; see the dotted line with $\circ$ markers in the right part of Figure 6.1. This result is consistent with the well-known fact that the perturbation of a defective eigenpair is proportional to $\epsilon^{1/m}$, where $m$ is the size of the corresponding largest Jordan block [18].

6.3. Block Rayleigh functionals. Given a minimal invariant pair $(V,L)$ and a right eigenspace approximation $X\approx V$, Theorem 4.10 shows that the block Rayleigh functional $Q$ satisfying $Y^H\mathbb{T}(X,Q) = \sum_{j=1}^m Y^H A_j X f_j(Q) = 0$ is a good approximation to $L$; in addition, it is shown in Theorem 4.11 that the eigenvalues of $Q$ are approximations of second order accuracy to the non-defective eigenvalues of $L$, if $Y$ contains good approximations to the corresponding left eigenvectors. We perform numerical experiments to illustrate these results.

Consider again the simple eigenpairs $(\lambda_i, v_i)$ $(i=1,2,3)$ used in Sections 6.1 and 6.2.
Let $\{w_i\}$ be the corresponding unit left eigenvectors, $V = [v_1\,v_2\,v_3]S$, $W = [w_1\,w_2\,w_3]$, and $L = S^{-1}\mathrm{diag}(\lambda_1,\lambda_2,\lambda_3)S$, where $S$ is a random $3\times3$ matrix. We generate random perturbations $dV$ and $dW$, whose entries obey the standard normal distribution, and we normalize them such that $\|dV\| = \|V\|$ and $\|dW\| = \|W\|$. Then the right and left eigenspace approximations are constructed by the MATLAB commands

X = V + err*dV; Y = W + err*dW;

such that $\|X-V\|$ is proportional to $\|Y-W\|$. There is no need to normalize $X$ and $Y$, because the normalization does not change the eigenvalues of the block Rayleigh functional, as shown in (A.2). The block Rayleigh functional $Q$ is computed by Newton's method with initial iterate $L$.

Fig. 6.2. Two-sided block Rayleigh functional approximations. Left: eigenspace approximation errors $\|X-V\|$ and $\|Y-W\|$ (x-axis) vs. block Rayleigh functional approximation errors $\|Q-L\|$ (y-axis). Right: eigenspace approximation errors $\|X-V\|$ and $\|Y-W\|$ (x-axis) vs. optimal matching distance $\mathrm{md}(Q,L)$ (y-axis).

The left part of Figure 6.2 shows that $Q$ is an approximation to $L$ of first order accuracy, that is, $\|Q - L\| = O(\|X - V\|) = O(\|Y - W\|)$; see the solid line. In the right part of Figure 6.2, we plotted the optimal matching distance [28, Chapter 4.1] between $Q$ and $L$ against $\|X - V\|$. This distance is defined as

\mathrm{md}(Q, L) = \min_{\pi} \max_{1 \le i \le 3} |\lambda_{\pi(i)}(Q) - \lambda_i(L)|,

where $\pi$ is taken over all permutations of $\{1, 2, 3\}$; it is a reliable estimate of the approximation of the spectrum of $Q$ to that of $L$. We see that the eigenvalue approximations are of second order accuracy, i.e., $\mathrm{md}(Q, L) = O(\|X - V\|\,\|Y - W\|)$. We repeated the experiments for semi-simple and defective eigenvalues. Note that for the semi-simple eigenvalue, both $\|Q - L\|$ and $\mathrm{md}(Q, L)$ are proportional to $O(\|X - V\|\,\|Y - W\|)$; see the dash-dot lines with ♦ markers. The accuracy of $Q$ as an approximation of $L$ is higher than that presented in Theorem 4.10. We have no proof of this observation, but we believe this is due to the fact that $L = S^{-1}\mathrm{diag}(\lambda_1, \lambda_2, \lambda_3)S = \lambda_1 I_3$ is a diagonal matrix. For the defective eigenvalue, what we observed for simple eigenvalues holds verbatim; that is, $\|Q - L\|$ and $\mathrm{md}(Q, L)$ are proportional to $O(\|X - V\|)$ and $O(\|X - V\|\,\|Y - W\|)$, respectively. Note that in the defective case, the highest accuracy of $\|Q - L\|$ and $\mathrm{md}(Q, L)$ is roughly on the order of the square root of machine precision. As we remarked in Section 6.2, this is due to the serious ill-conditioning of the defective eigenpair $(\lambda_4, v_4)$; see [18].

7. Conclusion. We investigated a few important properties of invariant pairs of nonlinear algebraic eigenvalue problems. We showed that the algebraic, partial, and geometric multiplicities of an eigenvalue of a simple invariant pair are identical to those of this eigenvalue of the nonlinear eigenvalue problem. We then studied the approximation errors and perturbations of simple invariant pairs, and the accuracy of eigenvalue approximations achieved by the block Rayleigh functional. We analyzed the structure of the inverse Fréchet derivative arising in the block version of Newton's method, and we discussed the norm estimate of the inverse derivative that describes the conditioning and sensitivity of simple invariant pairs. Numerical examples are given to illustrate the analysis.
Acknowledgement. We greatly appreciate the referees' comments, which were very helpful in improving our manuscript.

Appendix A: Proof of Theorem 4.11. The proof consists of three steps. In the first step, we consider a special case where the exact minimal pair $(V,L) \in \mathbb{C}^{n\times k} \times \mathbb{C}^{k\times k}$ is such that $L$ is diagonal and has only one distinct eigenvalue $\lambda$; that is, $\lambda$ is a semi-simple eigenvalue of $L$ of multiplicity $k$, and it is also a semi-simple eigenvalue of $T(\cdot)$ of multiplicity $g \ge k$. In the second step, we consider a more general case where $L$ is diagonal and has at least two distinct eigenvalues. Finally, we show that the result developed in the first two steps can be extended to a minimal invariant pair $(V,L)$ with non-diagonal $L$.

We begin with the first step. Let $\lambda$ be a semi-simple eigenvalue of $T(\cdot)$ whose multiplicity is $g \ge k$, and let $\{\varphi_1, \varphi_2, \ldots, \varphi_g\}$ and $\{\psi_1, \psi_2, \ldots, \psi_g\}$ be the corresponding right and left eigenvectors that span $\ker T(\lambda)$ and $\ker(T(\lambda))^H$, respectively, such that $\psi_i^H T'(\lambda)\varphi_j = \delta_{ij}$. Without loss of generality, let $V = [\varphi_1\ \varphi_2\ \ldots\ \varphi_k]$, $W = [\psi_1\ \psi_2\ \ldots\ \psi_k]$, and $L = \lambda I_k$. Let $(X,G)$ be an approximation to $(V,L)$, and $Y$ an approximation to $W$. Assume that $X$ is sufficiently close to $V$ in norm, and that the conditions of Theorem 4.10 are satisfied. It then follows that there exists a unique block Rayleigh functional $Q$ such that $Y^H T(X,Q) = 0$, and $\|Q - L\| \le \frac{2\delta}{1+\sqrt{1-h}} \le 2\delta$. Assume that $Q$ is diagonalizable, such that $Q = S_Q \Lambda_Q S_Q^{-1}$, where $\Lambda_Q = \mathrm{diag}(\tilde\lambda_1, \tilde\lambda_2, \ldots, \tilde\lambda_k)$. We know from Gerschgorin theory that for all $i = 1, 2, \ldots, k$,

|\tilde\lambda_i - \lambda| \le |Q_{ii} - \lambda| + \sum_{1\le j\le k,\ j\ne i} |Q_{ij}| \le \|Q - L\|_\infty \le \sqrt{k}\,\|Q - L\|_F \le 2\sqrt{k}\,\delta.

This means that every eigenvalue $\tilde\lambda_i$ of $Q$ satisfies $|\tilde\lambda_i - \lambda| \le O(\delta)$. In fact, there are more accurate eigenvalue approximations, as we show in the following derivation. Note that since $Y^H T(X,Q) = 0$, we have

Y^H T(X,Q) = \sum_{i=1}^{m} Y^H A_i X f_i(Q) = \sum_{i=1}^{m} Y^H A_i X S_Q f_i(\Lambda_Q) S_Q^{-1}
  = \sum_{i=1}^{m} Y^H A_i (V S_Q + (X - V) S_Q)\,\mathrm{diag}(f_i(\tilde\lambda_1), f_i(\tilde\lambda_2), \ldots
, f_i(\tilde\lambda_k))\, S_Q^{-1} = 0,

from which it follows that

(A.1)    \sum_{i=1}^{m} y_j^H A_i (V S_Q e_j + (X - V) S_Q e_j)\, f_i(\tilde\lambda_j) = y_j^H T(\tilde\lambda_j)\,(V S_Q e_j + (X - V) S_Q e_j) = 0.

Obviously, if $\|X - V\|$ is sufficiently small, then the right eigenvector approximation $V S_Q e_j + (X - V) S_Q e_j$ is sufficiently close to the exact right eigenvector $V S_Q e_j$. Assume that the $(j, j)$ entry of $S_Q$ is nonzero, so that $V S_Q e_j$ has a nonzero component of $\varphi_j$ and thus $\psi_j^H T'(\lambda) V S_Q e_j \ne 0$. It follows that the result shown on the second order accuracy in eigenvalue approximation of the scalar Rayleigh functional can be directly applied to (A.1), such that $|\tilde\lambda_j - \lambda| \le O(\sin\alpha \sin\beta)$, where $\alpha = \angle(\psi_j, y_j)$ and $\beta = \angle(V S_Q e_j,\ V S_Q e_j + (X - V) S_Q e_j)$. This completes the first step of our proof.

To begin the second step, we need to review an important perturbation theorem for block matrices.

Proposition A.1. [27, Chapter 4, Theorem 2.10] Let $\|\cdot\|$ be a consistent matrix norm. Let $B = \begin{bmatrix} N & H \\ K & M \end{bmatrix}$, and define $\mathrm{sep}(N,M) = \inf_{\|P\|=1} \|PN - MP\|$. If

$4\|K\|\,\|H\| \le \mathrm{sep}^2(N,M)$, then there exists a unique matrix $P$ satisfying $PN - MP = K - PHP$ and $\|P\| \le \frac{2\|K\|}{\mathrm{sep}(N,M)}$ such that

\tilde B = \begin{bmatrix} I & 0 \\ -P & I \end{bmatrix} B \begin{bmatrix} I & 0 \\ P & I \end{bmatrix} = \begin{bmatrix} N + HP & H \\ 0 & M - PH \end{bmatrix},

where $N + HP$ and $M - PH$ have disjoint spectra.

Let $\lambda_1, \ldots, \lambda_\ell, \lambda_{\ell+1}, \ldots, \lambda_k$ be non-defective eigenvalues of $T(\cdot)$, where $\lambda_1 = \lambda_2 = \ldots = \lambda_\ell$, and they are not equal to any value in the set $\{\lambda_{\ell+1}, \ldots, \lambda_k\}$. Let $\{\varphi_i\}$ and $\{\psi_i\}$ be corresponding right and left eigenvectors, respectively, such that $\psi_i^H T'(\lambda_1)\varphi_j = \delta_{ij}$ for $1 \le i, j \le \ell$ and $\psi_i^H T'(\lambda_i)\varphi_i = 1$ for $\ell+1 \le i \le k$. Let $L = \mathrm{diag}(\lambda_1, \ldots, \lambda_\ell, \lambda_{\ell+1}, \ldots, \lambda_k)$, $V = [V_1\ V_2] = [\varphi_1 \cdots \varphi_\ell\ \varphi_{\ell+1} \cdots \varphi_k]$, and $W = [W_1\ W_2] = [\psi_1 \cdots \psi_\ell\ \psi_{\ell+1} \cdots \psi_k]$. Let $(X,G)$ be an approximation to $(V,L)$, and $Y$ an approximation to $W$, such that $\|X - V\|$ and $\|Y - W\|$ are sufficiently small. Assume that the conditions in Theorem 4.10 are satisfied, so that there exists a unique block Rayleigh functional $Q$ such that $Y^H T(X,Q) = 0$, and $\|Q - L\| \le \frac{2\delta}{1+\sqrt{1-h}} \le 2\delta$. We see that

Q = L + E_Q = \begin{bmatrix} L_1 + E_Q^{11} & E_Q^{12} \\ E_Q^{21} & L_2 + E_Q^{22} \end{bmatrix}, \quad \text{where}

L1 = diag(λ1, . . . , λ`),L2 = diag(λ`+1, . . . , λk), and ij kEQ k ≤ 2δ (i, j = 1, 2).

Here, note that $L_1$ and $L_2$ have disjoint spectra. Since $\|X - V\|$ is sufficiently small, so are $\delta$ (see Theorem 4.10) and $\|E_Q^{ij}\|$. Therefore the $(1,1)$ and $(2,2)$ blocks of $Q$, namely, $L_1 + E_Q^{11}$ and $L_2 + E_Q^{22}$, also have disjoint spectra. By Proposition A.1, there is a unique matrix $P \in \mathbb{C}^{(k-\ell)\times\ell}$ satisfying $\|P\| \le \frac{2\|E_Q^{21}\|}{\mathrm{sep}(L_1 + E_Q^{11},\ L_2 + E_Q^{22})}$ such that

\tilde Q = \begin{bmatrix} I & 0 \\ -P & I \end{bmatrix} Q \begin{bmatrix} I & 0 \\ P & I \end{bmatrix} = \begin{bmatrix} L_1 + E_Q^{11} + E_Q^{12}P & E_Q^{12} \\ 0 & L_2 + E_Q^{22} - PE_Q^{12} \end{bmatrix},

and the spectra of $L_1 + E_Q^{11} + E_Q^{12}P$ and $L_2 + E_Q^{22} - PE_Q^{12}$ are disjoint. Let $Z \in \mathbb{C}^{\ell\times(k-\ell)}$ be the unique solution of the Sylvester equation

Z\left(L_2 + E_Q^{22} - PE_Q^{12}\right) - \left(L_1 + E_Q^{11} + E_Q^{12}P\right)Z = E_Q^{12}.
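This decoupling step is easy to check numerically. The following sketch uses small random blocks with well-separated spectra as stand-ins for $L_1 + E_Q^{11} + E_Q^{12}P$ and $L_2 + E_Q^{22} - PE_Q^{12}$, solves the corresponding Sylvester equation with SciPy's `solve_sylvester`, and verifies that the resulting similarity transformation block-diagonalizes the block upper-triangular matrix:

```python
# Verify the Sylvester-equation step: for [[A, E], [0, B]] with disjoint
# spectra of A and B, the unique solution Z of  Z B - A Z = E  yields
# [[I, Z], [0, I]] diag(A, B) [[I, -Z], [0, I]] = [[A, E], [0, B]].
# A, B, E here are random stand-ins for the blocks in the proof.
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(1)
l, m = 2, 3
A = np.diag([1.0, 2.0]) + 0.01 * rng.standard_normal((l, l))      # spectrum near {1, 2}
B = np.diag([5.0, 6.0, 7.0]) + 0.01 * rng.standard_normal((m, m)) # spectrum near {5, 6, 7}
E = rng.standard_normal((l, m))

# solve_sylvester(a, b, q) solves a X + X b = q; here (-A) Z + Z B = E.
Z = solve_sylvester(-A, B, E)

T_upper = np.block([[A, E], [np.zeros((m, l)), B]])
U = np.block([[np.eye(l), Z], [np.zeros((m, l)), np.eye(m)]])
Uinv = np.block([[np.eye(l), -Z], [np.zeros((m, l)), np.eye(m)]])
D = np.block([[A, np.zeros((l, m))], [np.zeros((m, l)), B]])

# The similarity reproduces the block upper-triangular matrix exactly.
assert np.allclose(U @ D @ Uinv, T_upper)
```

The disjointness of the spectra of the two diagonal blocks is what guarantees the Sylvester equation has a unique solution.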

Then the block upper-triangular $\tilde Q$ can be written as

\tilde Q = \begin{bmatrix} I & Z \\ 0 & I \end{bmatrix} \begin{bmatrix} L_1 + E_Q^{11} + E_Q^{12}P & 0 \\ 0 & L_2 + E_Q^{22} - PE_Q^{12} \end{bmatrix} \begin{bmatrix} I & -Z \\ 0 & I \end{bmatrix},

such that the $(1,1)$ block of the matrix function $f(\tilde Q)$ is $f(L_1 + E_Q^{11} + E_Q^{12}P)$.

With the above derivation, we can complete the second step of the proof as follows. Let $X = [X_1\ X_2]$ and $Y = [Y_1\ Y_2]$, where $X_1, Y_1 \in \mathbb{C}^{n\times\ell}$ and $X_2, Y_2 \in \mathbb{C}^{n\times(k-\ell)}$. Since the block Rayleigh functional $Q$ satisfies $Y^H T(X,Q) = 0$, we have

Y^H T(X,Q) = \sum_{j=1}^{m} Y^H A_j X f_j(Q) = \sum_{j=1}^{m} Y^H A_j X \begin{bmatrix} I & 0 \\ P & I \end{bmatrix} f_j(\tilde Q) \begin{bmatrix} I & 0 \\ -P & I \end{bmatrix}
  = \sum_{j=1}^{m} [Y_1\ Y_2]^H A_j [X_1\ X_2] \begin{bmatrix} I & 0 \\ P & I \end{bmatrix} f_j(\tilde Q) \begin{bmatrix} I & 0 \\ -P & I \end{bmatrix} = 0.

The (1, 1) block of the above matrix equation is

\sum_{j=1}^{m} Y_1^H A_j (X_1 + X_2 P)\, f_j(L_1 + E_Q^{11} + E_Q^{12}P) = 0.

Here, note that $L_1 + E_Q^{11} + E_Q^{12}P$ is a good approximation to $L_1 = \mathrm{diag}(\lambda_1, \ldots, \lambda_\ell)$, since $\|E_Q^{11}\|, \|E_Q^{12}\| \le O(\delta)$ and $\|P\| \le O(\|E_Q^{21}\|) \le O(\delta)$; $Y_1$ is a good approximation to the left eigenvectors $W_1 = [\psi_1, \ldots, \psi_\ell]$; and $X_1$ is a good approximation to the right eigenvectors $V_1 = [\varphi_1, \ldots, \varphi_\ell]$, and so is $X_1 + X_2P$. Therefore, the result we developed in the first step can be applied, indicating that the eigenvalues of $L_1 + E_Q^{11} + E_Q^{12}P$ are approximations to $\lambda_1$ of second order accuracy. Note that the spectrum of $L_1 + E_Q^{11} + E_Q^{12}P$ is a subset of the spectrum of $\tilde Q$, which is identical to that of $Q$. This completes the second step of the proof.

Finally, for any minimal invariant pair $(V', L')$ involving non-defective eigenvalues of $T(\cdot)$ where $L'$ is a non-diagonal matrix, we can first transform $(V', L')$ to $(V,L)$ with diagonal $L$ and apply the same transformation to the approximate pair $(X,G)$. Specifically, let $L' = S_L L S_L^{-1}$, $V' = V S_L^{-1}$, and consider the transformed approximate pair $(X S_L,\ S_L^{-1} G S_L)$. If $\|X - V'\|$ is sufficiently small, then

\|X S_L - V\| \le \|X - V'\|\,\|S_L\|

is also small as long as $\|S_L\|$ is not too large. Then the above two steps of the proof can be applied to the transformed pairs. Note that if the conditions of Theorem 4.10 are satisfied such that there exists a unique block Rayleigh functional in our setting, then the two block Rayleigh functionals $Q_a$ satisfying $Y^H T(X, Q_a) = 0$ and $Q_b$ satisfying $(YF_1)^H T(XF_2, Q_b) = 0$ ($F_1$ and $F_2$ are any nonsingular $k \times k$ matrices) have identical spectra, because

(A.2)    (YF_1)^H T(XF_2, Q_b) = \sum_{j=1}^{m} F_1^H Y^H A_j X F_2 f_j(Q_b) = \sum_{j=1}^{m} F_1^H Y^H A_j X f_j(F_2 Q_b F_2^{-1}) F_2 = 0 \iff Y^H T(X,\ F_2 Q_b F_2^{-1}) = 0,

from which $Q_a = F_2 Q_b F_2^{-1}$ follows. Therefore, the second order accuracy in eigenvalue approximation of the two-sided block Rayleigh functional holds for a pair $(X,G)$ that is sufficiently close to any given minimal invariant pair $(V,L)$ involving only non-defective eigenvalues of $T(\cdot)$.

The above analysis of the two-sided block Rayleigh functional for approximating non-defective eigenvalues can be applied verbatim to the defective eigenvalues. Since the two-sided scalar Rayleigh functional can only achieve first order accuracy in approximating defective eigenvalues, the same conclusion holds for the two-sided block Rayleigh functionals. This completes the proof.

Appendix B: Proof of Theorem 5.1. The derivation of the norm estimate of $\mathbb{L}_{(V,L)}^{-1}$ consists of two steps. In step 1, we study the structure of $\begin{bmatrix} H & M \\ N & K \end{bmatrix}$, i.e., the matrix representation of $\mathbb{L}_{(V,L)}$. We note that the $(1,1)$ block matrix $H$ is in fact the matrix representation of the functional $\Delta X \to T(\Delta X, L)$ (this functional is a block generalization of the matrix $T(\lambda_0)$). We have $1 \le \dim(\ker(H)) \le k^2$, and

$\dim(\ker(H)) = k^2$ if and only if $L$ has a unique eigenvalue that is semi-simple. In step 2, using the singular value decomposition of $H$, we give a $3 \times 3$ block matrix representation of $\mathbb{L}_{(V,L)}$ together with its inverse.

We begin with step 1 to study the structure of the matrix representation of $\mathbb{L}_{(V,L)}$ and the dimension of $\ker(H)$. Suppose that $L$ has $s$ distinct eigenvalues $\lambda_1, \ldots, \lambda_s$ with algebraic multiplicities $\{\mathrm{alg}_L(\lambda_i)\}$ $(1 \le i \le s)$. It is easy to see that any $2 \times 2$ block matrix of the form $\begin{bmatrix} L & \Delta G \\ 0 & L \end{bmatrix}$ with arbitrary $\Delta G \in \mathbb{C}^{k\times k}$ has $s$ distinct eigenvalues $\lambda_1, \ldots, \lambda_s$ with algebraic multiplicities $\{2\,\mathrm{alg}_L(\lambda_i)\}$. Note that the size of the largest Jordan block of this block matrix corresponding to $\lambda_i$ is $2\,\mathrm{alg}_L(\lambda_i) \times 2\,\mathrm{alg}_L(\lambda_i)$ at most. By the Hermite interpolation theory (see, e.g., [4, Section 4.3.1]), there are polynomials $p_j$ $(1 \le j \le m)$ of degree $d = 2\sum_{i=1}^{s} \mathrm{alg}_L(\lambda_i) - 1 = 2k - 1$ such that

p_j^{(d_i)}(\lambda_i) = f_j^{(d_i)}(\lambda_i),

where $1 \le i \le s$ and $0 \le d_i \le 2\,\mathrm{alg}_L(\lambda_i) - 1$, for each $f_j$ in (2.1). It then follows from the proof of [12, Theorem 10] that $p_j(L) = f_j(L)$ and $Dp_j(L)(\Delta G) = Df_j(L)(\Delta G)$ for any $\Delta G$, and therefore $f_j$ can be equivalently replaced with $p_j$ for the Fréchet derivative $\mathbb{L}_{(V,L)}$. Let $p_j(t) = \sum_{r=0}^{d} c_{jr} t^r$, and note that the Fréchet derivative of $L^r$ for $r \ge 1$ is $[DL^r](\Delta G) = \sum_{i=0}^{r-1} L^i \Delta G L^{r-i-1}$.

We transform $\mathbb{L}_{(V,L)}$ into a matrix form to analyze the norm of its inverse. For this purpose, we define the vectorization of an ordered pair $(X,G)$ as $\mathrm{vec}((X,G)) \equiv \begin{bmatrix} \mathrm{vec}(X) \\ \mathrm{vec}(G) \end{bmatrix}$. Then, noting that $\mathrm{vec}(AXB) = (B^T \otimes A)\,\mathrm{vec}(X)$, we have

(B.1)    \mathrm{vec}\!\left(\mathbb{L}_{(V,L)}(\Delta X, \Delta G)\right) = \begin{bmatrix} \mathrm{vec}\!\left(\sum_{j=1}^{m} A_j \Delta X\, p_j(L) + \sum_{j=1}^{m} A_j V\, Dp_j(L)(\Delta G)\right) \\ \mathrm{vec}\!\left(\sum_{j=0}^{\ell-1} W_j^H \Delta X L^j + \sum_{j=1}^{\ell-1} W_j^H V\, [DL^j](\Delta G)\right) \end{bmatrix}

= \begin{bmatrix} \mathrm{vec}\!\left(\sum_{j=1}^{m} A_j \Delta X\, p_j(L)\right) + \mathrm{vec}\!\left(\sum_{j=1}^{m} A_j V \sum_{r=1}^{d} c_{jr} \sum_{i=0}^{r-1} L^i \Delta G L^{r-i-1}\right) \\ \mathrm{vec}\!\left(\sum_{j=0}^{\ell-1} W_j^H \Delta X L^j\right) + \mathrm{vec}\!\left(\sum_{j=1}^{\ell-1} W_j^H V \sum_{i=0}^{j-1} L^i \Delta G L^{j-i-1}\right) \end{bmatrix}

= \begin{bmatrix} \sum_{j=1}^{m} p_j(L)^T \otimes A_j & \sum_{j=1}^{m} \sum_{r=1}^{d} \sum_{i=0}^{r-1} (L^{r-i-1})^T \otimes (c_{jr} A_j V L^i) \\ \sum_{j=0}^{\ell-1} (L^j)^T \otimes W_j^H & \sum_{j=1}^{\ell-1} \sum_{i=0}^{j-1} (L^{j-i-1})^T \otimes (W_j^H V L^i) \end{bmatrix} \begin{bmatrix} \mathrm{vec}(\Delta X) \\ \mathrm{vec}(\Delta G) \end{bmatrix}
\equiv \begin{bmatrix} H_{nk\times nk} & M_{nk\times k^2} \\ N_{k^2\times nk} & K_{k^2\times k^2} \end{bmatrix} \begin{bmatrix} \mathrm{vec}(\Delta X) \\ \mathrm{vec}(\Delta G) \end{bmatrix}.

To reveal the structure of the $2 \times 2$ block matrix just defined, we need to study the above $(1,1)$ block $H$ and $\dim(\ker(H))$. Let $J_L = Z^{-1}LZ$ be the Jordan canonical form of $L$. First, suppose that $J_L$ has only one distinct eigenvalue $\lambda_0$, such that $J_L = \mathrm{diag}\!\left(J_{k_1}(\lambda_0), J_{k_2}(\lambda_0), \ldots, J_{k_g}(\lambda_0)\right)$ with $k_1 \ge k_2 \ge \ldots \ge k_g$, where $J_{k_i}$ is a Jordan block of size $k_i \times k_i$. Then we have

H = \sum_{j=1}^{m} p_j(L)^T \otimes A_j = \sum_{j=1}^{m} \left(Z p_j(J_L) Z^{-1}\right)^T \otimes A_j = \sum_{j=1}^{m} Z^{-T} p_j(J_L)^T Z^T \otimes A_j = (Z^{-T} \otimes I_n)\left(\sum_{j=1}^{m} p_j(J_L)^T \otimes A_j\right)(Z^T \otimes I_n).

Let $\widehat H = \sum_{j=1}^{m} p_j(J_L)^T \otimes A_j$. Since $Z^{-T} \otimes I_n$ and $Z^T \otimes I_n$ are nonsingular, we have $\dim(\ker(H)) = \dim(\ker(\widehat H))$.
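The vectorization identity underlying (B.1), $\mathrm{vec}(AXB) = (B^T \otimes A)\,\mathrm{vec}(X)$ with column-major $\mathrm{vec}$, is easy to confirm numerically (a minimal sketch; the dimensions are arbitrary):

```python
# Check vec(A X B) = (B^T kron A) vec(X), where vec stacks columns
# (column-major / Fortran order), as used in (B.1).
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((4, 3))
X = rng.standard_normal((3, 5))
B = rng.standard_normal((5, 2))

vec = lambda M: M.flatten(order="F")        # column-major vectorization
lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
assert np.allclose(lhs, rhs)
```

Note that the identity requires the column-major convention; with row-major stacking the roles of $A$ and $B^T$ would be swapped.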

Now consider the top left $nk_1 \times nk_1$ submatrix of $\widehat H$,

\widehat H_1 = \sum_{j=1}^{m} p_j(J_{k_1}(\lambda_0))^T \otimes A_j = \sum_{j=1}^{m} \begin{bmatrix} p_j(\lambda_0)A_j & & & \\ p_j'(\lambda_0)A_j & p_j(\lambda_0)A_j & & \\ \frac{p_j''(\lambda_0)}{2}A_j & p_j'(\lambda_0)A_j & \ddots & \\ \vdots & \ddots & \ddots & \\ \frac{p_j^{(k_1-1)}(\lambda_0)}{(k_1-1)!}A_j & \cdots & p_j'(\lambda_0)A_j & p_j(\lambda_0)A_j \end{bmatrix} = \begin{bmatrix} T(\lambda_0) & & & \\ T'(\lambda_0) & T(\lambda_0) & & \\ \frac{T''(\lambda_0)}{2} & T'(\lambda_0) & \ddots & \\ \vdots & \ddots & \ddots & \\ \frac{T^{(k_1-1)}(\lambda_0)}{(k_1-1)!} & \cdots & T'(\lambda_0) & T(\lambda_0) \end{bmatrix}.

Recall from Section 3 that for a simple invariant pair $(V,L)$, the Jordan block $J_{k_1}(\lambda_0)$

of $L$ corresponds to a Jordan chain $\{\varphi_{1,0}, \varphi_{1,1}, \ldots, \varphi_{1,k_1-1}\}$ associated with $\lambda_0$ of the nonlinear eigenvalue problem (2.1). By the definition of Jordan chains (2.2), we have that $\varphi_1 \equiv \left[\varphi_{1,0}^T, \varphi_{1,1}^T, \ldots, \varphi_{1,k_1-1}^T\right]^T \in \ker(\widehat H_1)$.

Consider the second Jordan block $J_{k_2}(\lambda_0)$ and the corresponding matrix $\widehat H_2 = \sum_{j=1}^{m} p_j(J_{k_2}(\lambda_0))^T \otimes A_j$, the second block diagonal submatrix of $\widehat H$. Let the associated Jordan chain be $\{\varphi_{2,0}, \ldots, \varphi_{2,k_2-1}\}$. Then $\varphi_2 \equiv \left[\varphi_{2,0}^T, \ldots, \varphi_{2,k_2-1}^T\right]^T \in \ker(\widehat H_2)$.

Here, if $k_2 = k_1$, then $\widehat H_1 = \widehat H_2$, and $\varphi_1, \varphi_2 \in \ker(\widehat H_1) = \ker(\widehat H_2)$. That is, Jordan chains of the same length corresponding to $\lambda_0$ can be exchanged in this setting to construct the null space vectors. Similarly, suppose that $k_1 = k_2 = \ldots = k_p > k_{p+1} \ge \ldots \ge k_g$; then $\varphi_1, \varphi_2, \ldots, \varphi_p \in \ker(\widehat H_i)$ $(1 \le i \le p)$, and

(B.2)    \begin{bmatrix} \varphi_1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \ldots, \begin{bmatrix} \varphi_p \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ \varphi_1 \\ \vdots \\ 0 \end{bmatrix}, \ldots, \begin{bmatrix} 0 \\ \varphi_p \\ \vdots \\ 0 \end{bmatrix}, \ldots, \begin{bmatrix} 0 \\ 0 \\ \vdots \\ \varphi_1 \end{bmatrix}, \ldots, \begin{bmatrix} 0 \\ 0 \\ \vdots \\ \varphi_p \end{bmatrix} \in \ker\!\left(\mathrm{diag}(\widehat H_1, \widehat H_2, \cdots, \widehat H_p)\right).

Note that the leading vectors in the $p$ Jordan chains, namely, $\{\varphi_{1,0}, \varphi_{2,0}, \ldots, \varphi_{p,0}\}$, are linearly independent (see Theorem 3.1). Therefore $\{\varphi_1, \varphi_2, \ldots, \varphi_p\}$ are linearly independent, and so are the $p^2$ vectors in (B.2).

Therefore, if the $g$ Jordan blocks can be grouped into $s$ collections such that each collection contains $p_i$ $(1 \le i \le s)$ Jordan blocks of the same size, and the Jordan blocks in two collections are of different size, then we have $\dim(\ker(\widehat H)) = \sum_{i=1}^{s} p_i^2$. If $L$ has two or more distinct eigenvalues, the same calculation can be applied to each distinct eigenvalue, and we can see easily that $\dim(\ker(H)) = \dim(\ker(\widehat H))$ is the sum of the dimensions of the null spaces for each distinct eigenvalue.

For example, suppose that $L$ has 2 distinct eigenvalues $\lambda_0$ and $\lambda_1$. Assume that for $\lambda_0$, there are 3 Jordan blocks of size $4 \times 4$, 1 Jordan block of size $3 \times 3$, and 2 Jordan blocks of size $1 \times 1$; for $\lambda_1$, there are 1 Jordan block of size $2 \times 2$ and 4 Jordan blocks of size $1 \times 1$. Then $\dim(\ker(H)) = (3^2 + 1^2 + 2^2) + (1^2 + 4^2) = 31$. In particular, consider the following three scenarios:
1. $L$ has only one distinct eigenvalue, and there is only 1 Jordan block of size $k \times k$. Then $\dim(\ker(H)) = 1$, the minimum value;
2. $L$ has $k$ distinct simple eigenvalues. Then $\dim(\ker(H)) = k$;
3. $L$ has only one distinct eigenvalue and the eigenvalue is semi-simple ($k$ Jordan blocks of size $1 \times 1$). Then $\dim(\ker(H)) = k^2$.
From the above discussion, it is not hard to see that $k^2$ is the maximum value of $\dim(\ker(H))$, and it can be achieved if and only if $\Lambda(L) = \{\lambda_0\}$ and $\lambda_0$ is semi-simple. This concludes step 1.

In step 2, we give a $3 \times 3$ block matrix representation of $\mathbb{L}_{(V,L)}^{-1}$. Assume that $\dim(\ker(H)) = r$ with $1 \le r \le k^2$, and the singular value decomposition of $H$ is $H = Y_a \begin{bmatrix} \Sigma_{11} & 0 \\ 0 & 0 \end{bmatrix} Y_b^H$, where $\Sigma_{11} \in \mathbb{C}^{(nk-r)\times(nk-r)}$ has all the nonzero singular values of $H$ on its diagonal, say, in decreasing magnitude. Define

(B.3)    F_* \equiv \begin{bmatrix} Y_a & 0 \\ 0 & I_{k^2} \end{bmatrix}^H \begin{bmatrix} H & M \\ N & K \end{bmatrix} \begin{bmatrix} Y_b & 0 \\ 0 & I_{k^2} \end{bmatrix} = \begin{bmatrix} Y_a^H H Y_b & Y_a^H M \\ N Y_b & K \end{bmatrix} = \begin{bmatrix} \Sigma_{11} & 0 & M_{13} \\ 0 & 0 & M_{23} \\ N_{31}^H & N_{32}^H & K \end{bmatrix},

where $M_{13} \in \mathbb{C}^{(nk-r)\times k^2}$, $M_{23} \in \mathbb{C}^{r\times k^2}$, $N_{31}^H \in \mathbb{C}^{k^2\times(nk-r)}$, $N_{32}^H \in \mathbb{C}^{k^2\times r}$, $Y_a^H M =$

$\begin{bmatrix} M_{13} \\ M_{23} \end{bmatrix}$ and $N Y_b = \begin{bmatrix} N_{31}^H & N_{32}^H \end{bmatrix}$. If $F_*$ is nonsingular, then $\|F_*^{-1}\| = \left\|\begin{bmatrix} H & M \\ N & K \end{bmatrix}^{-1}\right\|$ in any unitarily invariant norm.

For a simple invariant pair $(V,L)$, Theorem 2.4 shows that the Fréchet derivative $\mathbb{L}_{(V,L)}$ is invertible, and therefore, its matrix representation $\begin{bmatrix} H & M \\ N & K \end{bmatrix}$ is nonsingular. It follows that $F_*$ is nonsingular, and so is the Schur complement of $\Sigma_{11}$, namely,

\begin{bmatrix} 0 & M_{23} \\ N_{32}^H & K \end{bmatrix} - \begin{bmatrix} 0 \\ N_{31}^H \end{bmatrix} \Sigma_{11}^{-1} \begin{bmatrix} 0 & M_{13} \end{bmatrix} = \begin{bmatrix} 0 & M_{23} \\ N_{32}^H & K - N_{31}^H \Sigma_{11}^{-1} M_{13} \end{bmatrix}.

As a result, $N_{32}^H$ and $M_{23}$ must be of full rank. In the special case where $L$ has exactly one distinct eigenvalue that is semi-simple, then $r = k^2$, and $F_*$ is nonsingular if and only if the square matrices $N_{32}^H$ and $M_{23}$ are nonsingular. If $F_*$ is nonsingular, we can obtain $F_*^{-1}$ by block elimination as follows:

 −1 −1 −1 −1 −1 H −1 −1 H H −1  Σ11 −Σ11 M13Q3 Q0 Σ11 M13Q3 N32 −Σ11 M13Q3 I −N32(N32)L H −1 H −1 −1 −1 H H −1 −1 H H −1 −(N32)L N31Σ11 −Q2Q3 Q0 I + Q2Q3 N32 (N32)L −Q2Q3 I −N32(N32)L , −1 −1 H −1 H H −1 Q3 Q0 −Q3 N32 Q3 I −N32(N32)L

where $(N_{32}^H)_L = (N_{32}N_{32}^H)^{-1}N_{32} \in \mathbb{C}^{r\times k^2}$ is the left inverse of $N_{32}^H$, and

Q_0 = -\left(I - N_{32}^H(N_{32}^H)_L\right)N_{31}^H\Sigma_{11}^{-1} \in \mathbb{C}^{k^2\times(nk-r)},
Q_1 = K - N_{31}^H\Sigma_{11}^{-1}M_{13} \in \mathbb{C}^{k^2\times k^2},
Q_2 = M_{23} + (N_{32}^H)_L Q_1 \in \mathbb{C}^{r\times k^2},
Q_3 = Q_1 - N_{32}^H Q_2 = \left(I - N_{32}^H(N_{32}^H)_L\right)Q_1 - N_{32}^H M_{23} \in \mathbb{C}^{k^2\times k^2}.

This completes step 2, where we obtained a block matrix form of $\mathbb{L}_{(V,L)}^{-1}$. Theorem 5.1 is thus established.

Remark B.1. For linear eigenproblems, let $(V,L)$ be an exact minimal invariant pair of $T(\lambda) = \lambda I - A$. Then the functional $\Delta X \to T(\Delta X, L)$, which has the matrix representation $H$ as discussed in Theorem 5.1, has the null space

\ker(T(\cdot, L)) = \{X = VK : K \in \mathbb{C}^{k\times k} \text{ solves } LK - KL = 0\}.

The dimension of this null space equals the dimension of the null space of the linear map $K \to LK - KL$, and this dimension can be explicitly determined from the Jordan canonical form of $L$; see [5] for details.
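For the linear case of Remark B.1, the claimed null space is easy to verify numerically. In the sketch below (with a synthetic diagonalizable $A$ and illustrative sizes), $AV = VL$ and $LK = KL$ together give $T(VK, L) = (VK)L - A(VK) = V(KL - LK) = 0$:

```python
# Remark B.1, linear case T(lambda) = lambda I - A: if A V = V L and
# L K = K L, then X = V K satisfies T(X, L) = X L - A X = 0.
# A, V, L, K below are synthetic illustrative data.
import numpy as np

rng = np.random.default_rng(3)
n, k = 5, 2
W = rng.standard_normal((n, n))
A = W @ np.diag(np.arange(1.0, n + 1)) @ np.linalg.inv(W)  # diagonalizable A
V = W[:, :k]                                               # basis of an invariant subspace
L = np.diag(np.arange(1.0, k + 1))                         # A V = V L

K = np.diag(rng.standard_normal(k))    # any diagonal K commutes with diagonal L
X = V @ K
assert np.allclose(X @ L - A @ X, 0.0)
```

Here $L$ is diagonal with distinct eigenvalues, so its commutant consists exactly of the diagonal matrices, and the null space has dimension $k$, matching scenario 2 above.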

REFERENCES

[1] T. Betcke, N. J. Higham, V. Mehrmann, C. Schröder and F. Tisseur, NLEVP: a collection of nonlinear eigenvalue problems, MIMS EPrint 2010.98, Manchester Institute for Mathematical Sciences, University of Manchester, 2010.
[2] T. Betcke and D. Kressner, Perturbation, extraction and refinement of invariant pairs for matrix polynomials, Linear Algebra and its Applications, Vol. 435 (2011), pp. 514–536.
[3] A. Frommer and V. Simoncini, Matrix functions, in "Model Order Reduction: Theory, Research Aspects and Applications", Mathematics in Industry, Wil H. A. Schilders and Henk A. van der Vorst, eds., Springer, Heidelberg, 2008, pp. 275–304.
[4] G. Dahlquist and Å. Björck, Numerical Methods in Scientific Computing: Volume 1, SIAM, Philadelphia, 2008.
[5] F. R. Gantmacher, The Theory of Matrices, Vols. I, II, AMS Chelsea Publishing Company, Providence, 1959.
[6] I. Gohberg, P. Lancaster and L. Rodman, Matrix Polynomials, Academic Press, New York, 1982. Republished by SIAM, Philadelphia, 2009.
[7] G. H. Golub and C. F. van Loan, Matrix Computations, 3rd edition, The Johns Hopkins University Press, Baltimore, 1996.
[8] W. B. Gragg and R. A. Tapia, Optimal error bounds for the Newton-Kantorovich theorem, SIAM Journal on Numerical Analysis, Vol. 11 (1974), pp. 10–13.
[9] N. J. Higham, Functions of Matrices: Theory and Computation, SIAM, Philadelphia, 2008.
[10] M. A. Kaashoek and S. M. Verduyn Lunel, Characteristic matrices and spectral properties of evolutionary systems, Transactions of the American Mathematical Society, Vol. 334 (1992), pp. 479–517.
[11] V. Kozlov and V. Maz'ya, Differential Equations with Operator Coefficients, Springer, Berlin-Heidelberg, 1999.
[12] D. Kressner, A block Newton method for nonlinear eigenvalue problems, Numerische Mathematik, Vol. 114 (2009), pp. 355–372.
[13] D. S. Mackey, N. Mackey, C. Mehl and V. Mehrmann, Vector spaces of linearizations for matrix polynomials, SIAM Journal on Matrix Analysis and Applications, Vol. 28 (2006), pp. 971–1004.
[14] D. S. Mackey, N. Mackey, C. Mehl and V. Mehrmann, Structured polynomial eigenvalue problems: good vibrations from good linearizations, SIAM Journal on Matrix Analysis and Applications, Vol. 28 (2006), pp. 1029–1051.
[15] R. Mathias, A chain rule for matrix functions and applications, SIAM Journal on Matrix Analysis and Applications, Vol. 17 (1996), pp. 610–620.
[16] V. Mehrmann and C. Schröder, Nonlinear eigenvalue and frequency response problems in industrial practice, Journal of Mathematics in Industry, 1:7 (2011).
[17] V. Mehrmann and H. Voss, Nonlinear eigenvalue problems: a challenge for modern eigenvalue methods, Mitteilungen der Gesellschaft für Angewandte Mathematik und Mechanik, Vol. 27 (2005), pp. 121–151.
[18] J. Moro, J. V. Burke and M. L. Overton, On the Lidskii-Vishik-Lyusternik perturbation theory for eigenvalues of matrices with arbitrary Jordan structure, SIAM Journal on Matrix Analysis and Applications, Vol. 18 (1997), pp. 793–817.
[19] J. M. Ortega, The Newton-Kantorovich theorem, The American Mathematical Monthly, Vol. 75 (1968), pp. 658–660.

[20] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, London, 1970. Republished by SIAM, Philadelphia, 2000.
[21] K. Schreiber, Nonlinear Eigenvalue Problems: Newton-type Methods and Nonlinear Rayleigh Functionals, Ph.D. thesis, Institut für Mathematik, TU Berlin, 2008.
[22] H. Schwetlick and R. Lösche, A generalized Rayleigh quotient iteration for computing simple eigenvalues of nonnormal matrices, Zeitschrift für Angewandte Mathematik und Mechanik (ZAMM - Journal of Applied Mathematics and Mechanics), Vol. 80 (2000), pp. 9–25.
[23] H. Schwetlick and K. Schreiber, Nonlinear Rayleigh functionals, Linear Algebra and its Applications, Vol. 436 (2012), pp. 3991–4016.
[24] G. W. Stewart, Error bounds for approximate invariant subspaces of closed linear operators, SIAM Journal on Numerical Analysis, Vol. 8 (1971), pp. 796–808.
[25] G. W. Stewart, Error and perturbation bounds for subspaces associated with certain eigenvalue problems, SIAM Review, Vol. 15 (1973), pp. 727–764.
[26] G. W. Stewart, Matrix Algorithms, Volume I: Basic Decompositions, SIAM, Philadelphia, 1998.
[27] G. W. Stewart, Matrix Algorithms, Volume II: Eigensystems, SIAM, Philadelphia, 2001.
[28] G. W. Stewart and J.-G. Sun, Matrix Perturbation Theory, Academic Press, 1990.
[29] Y. Su and Z. Bai, Solving rational eigenvalue problems via linearization, SIAM Journal on Matrix Analysis and Applications, Vol. 32 (2011), pp. 201–216.
[30] J.-G. Sun, Matrix Perturbation Analysis (in Chinese), 2nd edition, Science Press, Beijing, 2001.
[31] F. Tisseur and K. Meerbergen, The quadratic eigenvalue problem, SIAM Review, Vol. 43 (2001), pp. 234–286.