Future Generation Computer Systems 19 (2003) 1231–1242

New algorithms for the iterative refinement of estimates of invariant subspaces

K. Hüper a,∗, P. Van Dooren b

a Department of Mathematics, Würzburg University, Am Hubland, D-97074 Würzburg, Germany
b Department of Mathematical Engineering, Université Catholique de Louvain, B-1348 Louvain-la-Neuve, Belgium

∗ Corresponding author.

Abstract

New methods for refining estimates of invariant subspaces of a non-symmetric matrix are presented. We use global analysis to show local quadratic convergence of our method under mild conditions on the spectrum of the matrix. © 2003 Elsevier B.V. All rights reserved.

Keywords: Non-symmetric eigenvalue problem; Riccati equations; Sylvester equations; Global analysis; Local quadratic convergence

1. Refining estimates of invariant subspaces

The computation of invariant subspaces has received a lot of attention in the numerical linear algebra literature. In this paper we present a new algorithm that borrows ideas of two well established methods: the Riccati-/Sylvester-like iteration, and the Jacobi-like iteration. Even though these techniques are traditionally associated with very different eigenvalue algorithms, the algorithm that we derive in this paper makes a nice link between them.

Our method refines estimates of invariant subspaces of real non-symmetric matrices which are already "nearly" upper block triangular, but not in condensed form. It uses the Lie group of real unipotent lower block triangular (n × n)-matrices as similarities on a nearly upper block triangular matrix. We develop a class of algorithms based on such similarity transformations, which display local quadratic convergence to an upper block triangular form. The formulation of the algorithms and their convergence analysis are valid for different diagonal block sizes, as long as these blocks have disjoint spectrum, i.e., as long as the corresponding invariant subspaces are well defined. An important special case is the computation of the real Schur form, which groups complex conjugate eigenvalues in 2 × 2 diagonal blocks. It can be obtained by our method, provided these eigenvalue pairs are disjoint. We always work over R, but the generalization to C is immediate and we state without proof that all the results from this paper directly apply to the complex case.

The outline of this paper is as follows. After introducing some notation we will focus on an algorithm consisting of similarity transformations by unipotent lower block triangular matrices. In order to improve numerical accuracy, we then use orthogonal transformations instead. The convergence properties of the orthogonal algorithm are shown to be an immediate consequence of those of the former one.


2. Lower unipotent block triangular transformations

Let V ⊂ R^{n×n} denote the vector space of real upper block triangular (n × n)-matrices,

V := {X ∈ R^{n×n} | X_ij = 0_{n_i×n_j} ∀ 1 ≤ j < i ≤ r}.

For l ∈ l_n and arbitrary a ∈ V,

Dσ(I, A) · (l, a) = lA − Al + a.   (6)

We show that for any h ∈ R^{n×n} the linear system

lA − Al + a = h   (7)

P^{(α)} := [ p^{(α+1,α)}
              ⋮
             p^{(r,α)} ] ∈ R^{n̄_α×n_α},   n̄_α = n_{α+1} + ··· + n_r.   (14)

We refer to [5] for a comparison of the approaches of the former three papers and to [16] for quantitative results concerning Newton-type iterations to solve Riccati equations; see also the recent work [6,7].

A rather natural idea to solve (18) approximately is to ignore the second order term, −P X12 P, and solve instead the Sylvester equation

P X11 + X21 − X22 P = 0,   (20)

which by Assumption 2.1 has a unique solution.

We now return to the general case where the number r of invariant subspaces to be computed is larger than 2. With Jacobi-like sweep algorithms in mind, it is natural to formulate an algorithm which solves an equation like (20) for P^{(1)}, say, along the lines of (13)–(16), then transforms X according to X → L^{(1)} X (L^{(1)})^{−1}, does the same for P^{(2)}, and so forth. One can show that such an algorithm would be a differentiable map around A. Moreover, local quadratic convergence could be proved by means of analysis. Instead of solving a Sylvester equation for P^{(α)}, i.e., solving for the corresponding block of (15), one can reduce the complexity by solving Sylvester equations of lower dimension in a cyclic manner, i.e., perform the algorithm blockwise on each p^{(ij)} ∈ R^{n_i×n_j}. The smaller the block sizes n_i, the smaller the Riccati equations one has to solve and the simpler the transformations L^{(l)}. The smallest possible sizes n_i one can choose for the diagonal blocks A_ii should group the multiple eigenvalues of A into the diagonal blocks, since this is needed by Assumption 2.1. An algorithm for block sizes 1 × 1 (implying distinct (real) eigenvalues of A) would lead to scalar algebraic Riccati equations which are solvable in closed form. Such an approach would come very close to [2,3,18], where the authors studied Jacobi-type methods for solving the non-symmetric (generalized) eigenvalue problem.

3.1. Formulation of the algorithm

The following algorithm will be analyzed for a matrix A satisfying Assumption 2.1 and an initial matrix X ∈ M_{L_n} that is sufficiently close to A. Consider the index set I := {(ij)}_{i=2,…,r; j=1,…,r−1} and fix an ordering, i.e., a surjective map β : I → {1, …, r(r−1)/2}. For convenience we rename double indices in the description of the algorithm by simple ones by means of X_ij → X_{β((ij))}, respecting the ordering β.

Algorithm 3.1 (Sylvester sweep). Given an X_k ∈ M_{L_n}, define

X_k^{(1)} := L^{(1)} X_k (L^{(1)})^{−1},
X_k^{(2)} := L^{(2)} X_k^{(1)} (L^{(2)})^{−1},
⋮
X_k^{(r(r−1)/2)} := L^{(r(r−1)/2)} X_k^{(r(r−1)/2−1)} (L^{(r(r−1)/2)})^{−1},

where for l = 1, …, r(r−1)/2 the transformation matrix L^{(l)} ∈ L_n differs from the identity matrix I_n only by the (ij)th block, say p^{(l)}. Here β((ij)) = l and p^{(l)} ∈ R^{n_i×n_j} solves the Sylvester equation

p^{(l)} (X_k^{(l−1)})_jj − (X_k^{(l−1)})_ii p^{(l)} + (X_k^{(l−1)})_ij = 0.

The overall algorithm then consists of the following iteration of sweeps.

Algorithm 3.2 (Refinement of estimates of subspaces).
• Let X_0, …, X_k ∈ M_{L_n} be given for k ∈ N_0.
• Define the recursive sequence X_k^{(1)}, …, X_k^{(r(r−1)/2)} as above (sweep).
• Set X_{k+1} := X_k^{(r(r−1)/2)}. Proceed with the next sweep.

For the index set I := {(ij)}_{i=2,…,r; j=1,…,r−1} we propose two particular orderings, β_col : I → {1, …, r(r−1)/2} and β_row : I → {1, …, r(r−1)/2}, that are best illustrated by the two diagrams in Fig. 1. Obviously, the two orderings are mapped into each other by just transposing the diagrams with respect to the antidiagonal.

Fig. 1. The two orderings β_col and β_row.
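As a minimal illustration of Algorithms 3.1 and 3.2, the following sketch performs one sweep, visiting the sub-diagonal blocks bottom block row first, left to right (an assumption on our part; whether this matches β_row exactly depends on the diagrams of Fig. 1). The helper names are ad hoc, and scipy.linalg.solve_sylvester merely stands in for a dedicated small-block solver.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def block_slices(sizes):
    """Index ranges of the diagonal blocks of sizes n_1, ..., n_r."""
    edges = np.cumsum([0] + list(sizes))
    return [slice(edges[k], edges[k + 1]) for k in range(len(sizes))]

def sylvester_sweep(X, sizes):
    """One sweep of Algorithm 3.1, bottom block row first, left to right.

    For each pair i > j, solve  p X_jj - X_ii p + X_ij = 0  and apply the
    unipotent similarity X -> L X L^{-1}, where L = I_n + (p in block (i, j)).
    """
    X = X.copy()
    n = X.shape[0]
    S = block_slices(sizes)
    for i in reversed(range(1, len(sizes))):
        for j in range(i):
            # solve_sylvester solves A Z + Z B = Q; here A = -X_ii, B = X_jj, Q = -X_ij
            p = solve_sylvester(-X[S[i], S[i]], X[S[j], S[j]], -X[S[i], S[j]])
            L, Linv = np.eye(n), np.eye(n)
            L[S[i], S[j]] = p        # unipotent lower block triangular
            Linv[S[i], S[j]] = -p    # exact inverse, since the update is nilpotent
            X = L @ X @ Linv
    return X
```

On an X that is nearly upper block triangular with diagonal blocks satisfying Assumption 2.1, repeated sweeps drive the sub-diagonal blocks to zero, and their norm should decay quadratically from sweep to sweep.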

3.2. Local convergence analysis

The next result shows that our algorithm is locally a smooth map.

Theorem 3.1. Algorithm 3.2, i.e., the mapping s : M_{L_n} → M_{L_n}, is smooth locally around A.

Proof. The algorithm is a composition of partial algorithmic steps r_i : M_{L_n} → M_{L_n}, with r_i(A) = A for all i. It therefore suffices to show smoothness for each r_i around the fixed point A. Typically, for one partial iteration step one has to compute the sub-block p of the unipotent lower block triangular matrix

L = [ I 0
      p I ]

satisfying the equation

L X L^{−1} = [ I 0    [ X11 X12    [  I 0
               p I ] ·  X21 X22 ] ·  −p I ]  =  [    ∗      ∗
                                                 −p X12 p   ∗ ],

i.e., p has to solve the Sylvester equation p X11 + X21 − X22 p = 0. By Assumption 2.1 and since X is close to A, the spectra of X11 and X22 will be disjoint and the solution of this Sylvester equation exists and is unique. Moreover, applying the implicit function theorem to the function (X, p) → f(X, p), defined by f(X, p) = p X11 + X21 − X22 p = 0, implies that X → p(X) is smooth around A. Hence all partial iteration steps are smooth and the result follows. □

Theorem 3.1 justifies the use of calculus for proving higher order convergence of our algorithm. We show next that the first derivative of our algorithm at the fixed point A vanishes identically, implying quadratic convergence if the chosen ordering is either β_row or β_col.

Theorem 3.2. Algorithm 3.2 converges locally quadratically fast if ordering β_row or β_col is chosen.

Proof. We will show that the first derivative Ds(A) of the algorithm at the fixed point vanishes identically if β_col or β_row is chosen. By the chain rule we therefore have to compute Dr_ij(A) for all i > j with 2 ≤ i ≤ r and 1 ≤ j ≤ r − 1. To be more precise, we have to study the effect of applying Dr_ij(A) : T_A M_{L_n} → T_A M_{L_n} to those tangent vectors [l, A] ∈ T_A M_{L_n} onto which the "earlier" linear maps Dr_pq(A) have already been applied:

Ds(A) · [l, A] = Dr_last(A) ··· Dr_first(A) · [l, A],   l ∈ l_n.

Notice that A is not only a fixed point of s but also one of each individual r_ij. Without loss of generality we make the simplifying assumption that the partitioning consists of 5 × 5 blocks. Typically, an r_ij(X) = L_ij X L_ij^{−1} looks like

r_ij(X) = [ I 0    0 0 0
            0 I    0 0 0
            0 0    I 0 0
            0 p_ij 0 I 0
            0 0    0 0 I ] · X · [ I  0    0 0 0
                                   0  I    0 0 0
                                   0  0    I 0 0
                                   0 −p_ij 0 I 0
                                   0  0    0 0 I ].   (21)

Therefore, Dr_ij(A) · [l, A] = D(L_ij X L_ij^{−1}) · [l, X]|_{X=A} = [L'_ij, A] + [l, A], where L'_ij := DL_ij(A) · [l, A], and typically

L'_ij = [ 0 0     0 0 0
          0 0     0 0 0
          0 0     0 0 0
          0 p'_ij 0 0 0
          0 0     0 0 0 ]

with p'_ij := Dp_ij(X) · [l, X]|_{X=A}. We already know that p_ij solves a Sylvester equation, namely p_ij(X) X_jj + X_ij − X_ii p_ij(X) = 0, with p_ij(X)|_{X=A} = 0. Taking the derivative of this Sylvester equation acting on [l, X], evaluated at X = A, gives

p'_ij(A) A_jj + [l, A]_ij − A_ii p'_ij(A) = 0.   (22)

An easy computation verifies that the commutator [L'_ij, A] is of the following form:

[L'_ij, A] = [ 0 ∗                      0 0 0
               0 ∗                      0 0 0
               0 ∗                      0 0 0
               0 p'_ij A_jj − A_ii p'_ij ∗ ∗ ∗
               0 0                      0 0 0 ],

i.e., it differs from zero only by the (ij)th block as well as by the blocks right to it and above it. By (22), we therefore obtain for the derivative of the (ij)th partial step r_ij:

Dr_ij(A) · [l, A] = [ 0 ∗                      0 0 0
                      0 ∗                      0 0 0
                      0 ∗                      0 0 0
                      0 p'_ij A_jj − A_ii p'_ij ∗ ∗ ∗
                      0 0                      0 0 0 ]  ([L'_ij, A])
                  + [ ∗ ∗         ∗ ∗ ∗
                      ∗ ∗         ∗ ∗ ∗
                      ∗ ∗         ∗ ∗ ∗
                      ∗ [l, A]_ij ∗ ∗ ∗
                      ∗ ∗         ∗ ∗ ∗ ]  ([l, A]).

That is, by (22) the first derivative annihilates the (ij)th block, altering those blocks which are above or to the right of this (ij)th block, but it leaves invariant all the remaining blocks. Both ordering strategies then imply that after a whole iteration step all those blocks of the tangent vector [l, A] lying below the main block diagonal are eliminated. We therefore can conclude that Ds(A) · [l, A] is upper block triangular. Moreover, it follows from the proof of Lemma 2.2 that Ds(A) · [l, A] = 0 as well. Essentially, Assumption 2.1 implies that the only Lie algebra element of l_n commuting with A into an upper block triangular matrix like A itself is the zero matrix. The theorem then follows from a Taylor-type argument [14]:

‖X_{k+1} − A‖ ≤ sup_{Z∈Ū} ‖D²s(Z)‖ · ‖X_k − A‖².   □

Quite naturally one might ask if the two orderings β_row and β_col are the only possible ones ensuring quadratic convergence. The answer is no, because "mixtures" of both strategies also suffice. As a corollary of Theorem 3.2 we obtain the following result.

Corollary 3.1. Algorithm 3.2 is quadratically convergent if the ordering is specified by the following two rules. The integers 1, …, r(r−1)/2 to be filled in are: (i) strictly increasing across each row, and (ii) strictly increasing up each column.

Remark 3.1. These possible orderings are related to Young tableaux, or to be more precise, to standard tableaux; see [9] for the connections between geometry of flag manifolds, representation theory of GL_n, and calculus of tableaux.

It easily follows that for r = 3, β_col and β_row are the only two possible orderings ensuring quadratic convergence. For r = 4 there are already eight possible orderings together with their "conjugate" counterparts.
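These counts are small enough to check by exhaustive search. The sketch below (our own illustration, with ad hoc names) enumerates all orderings of the sub-diagonal blocks obeying rules (i) and (ii) of Corollary 3.1.

```python
from itertools import permutations

def admissible_orderings(r):
    """All orderings of the sub-diagonal blocks (i, j), i > j, that obey
    Corollary 3.1: values strictly increase across each row and strictly
    increase up each column."""
    cells = [(i, j) for i in range(2, r + 1) for j in range(1, i)]
    m = len(cells)                       # r(r-1)/2 sub-diagonal blocks
    orderings = []
    for values in permutations(range(1, m + 1)):
        beta = dict(zip(cells, values))
        rows = all(beta[(i, j)] < beta[(i, j + 1)]
                   for (i, j) in cells if (i, j + 1) in beta)
        cols = all(beta[(i, j)] < beta[(i - 1, j)]
                   for (i, j) in cells if (i - 1, j) in beta)
        if rows and cols:
            orderings.append(beta)
    return orderings

print(len(admissible_orderings(3)))      # 2  (just beta_col and beta_row)
print(len(admissible_orderings(4)))      # 16 (eight orderings and their conjugates)
```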
We did not comment yet on orderings which definitely do not lead to quadratic convergence. Generically, this is the case for any ordering which does not respect Corollary 3.1, but when the fixed point matrix A has some specific zero blocks above the diagonal, quadratic convergence may be recovered. For a more detailed discussion of this and for illustrative examples, we refer to [14,15].

4. Orthogonal transformations

For numerical reasons it makes more sense to use orthogonal transformations instead of unipotent lower triangular ones. We therefore reformulate Algorithm 3.2 in terms of orthogonal transformations. The convergence analysis for this new algorithm will greatly benefit from the calculations we already did.

For convenience we assume for a while that r = 5. Given

L = [ I 0 0 0 0
      0 I 0 0 0
      0 0 I 0 0
      0 p 0 I 0
      0 0 0 0 I ],

a quite natural idea is to use instead of L the orthogonal Q-factor from L after performing Gram–Schmidt, i.e., applied to the rows of sub-blocks of L. With N_l = (I + p^T p)^{−1/2} and N_r = (I + p p^T)^{−1/2} we have the factorization

L = R · Q = [ I 0   0 0        0
              0 N_l 0 p^T N_r  0
              0 0   I 0        0
              0 0   0 N_r^{−1} 0
              0 0   0 0        I ] · [ I 0     0 0        0
                                       0 N_l   0 −N_l p^T 0
                                       0 0     I 0        0
                                       0 N_r p 0 N_r      0
                                       0 0     0 0        I ].   (23)
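The factorization (23) is easy to verify numerically. The following sketch (our own, restricted to the two block rows and columns that L actually touches) forms N_l and N_r by an eigendecomposition-based inverse square root and checks that Q is orthogonal and that RQ reproduces L.

```python
import numpy as np

def inv_sqrt(M):
    """(symmetric positive definite M)^(-1/2) via an eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V / np.sqrt(w)) @ V.T

rng = np.random.default_rng(0)
ni, nj = 2, 3                                   # p sits in block (i, j) of size ni x nj
p = rng.standard_normal((ni, nj))

Nl = inv_sqrt(np.eye(nj) + p.T @ p)             # (I + p^T p)^{-1/2}
Nr = inv_sqrt(np.eye(ni) + p @ p.T)             # (I + p p^T)^{-1/2}

# Restriction of (23) to the two non-trivial block rows/columns:
L = np.block([[np.eye(nj), np.zeros((nj, ni))], [p, np.eye(ni)]])
R = np.block([[Nl, p.T @ Nr], [np.zeros((ni, nj)), np.linalg.inv(Nr)]])
Q = np.block([[Nl, -Nl @ p.T], [Nr @ p, Nr]])

assert np.allclose(Q.T @ Q, np.eye(ni + nj))    # Q is orthogonal
assert np.allclose(R @ Q, L)                    # L = R Q as in (23)
```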
Before we proceed to formulate the orthogonal version of Algorithm 3.2 we need some preliminaries. Namely, we have to fix the manifold such an algorithm is "living" on. Consider an "Iwasawa decomposition" [13] of the Lie group L_n. The set of orthogonal matrices Q coming from an RQ-decomposition as in (23) does in general not generate an orthogonal group with the ordinary matrix product as group operation. To see this we look at the simple 2 × 2 case

[ 1 0    [ (1 + p²)^{−1/2} p(1 + p²)^{−1/2}    [ (1 + p²)^{−1/2}   −(1 + p²)^{−1/2} p
  p 1 ] =  0               (1 + p²)^{1/2}   ] ·  (1 + p²)^{−1/2} p  (1 + p²)^{−1/2}   ].

Obviously, the set of orthogonal Q-matrices does include

Q := (1/√2) [ 1 −1
              1  1 ],

but does not include

[ 0 −1
  1  0 ] = Q².

Note that lim_{p→±∞} L ∉ L_2. Nevertheless, we are able to construct at least locally the space on which an orthogonal version of Algorithm 3.2 can be defined. This construction will allow us to use a similar analysis to prove quadratic convergence. Consider an arbitrary element L ∈ L_n in a sufficiently small neighborhood U_{L_n}(I_n) of the identity I_n in L_n, such that L can be parameterized by exponential coordinates of the second kind, cf. [19, p. 86]. Let

L = L_{r(r−1)/2} ··· L_1 = R_{r(r−1)/2} Q_{r(r−1)/2} ··· R_1 Q_1.   (24)

Here the L_i are defined as in (21). Each L_i, for i = 1, …, r(r−1)/2, is represented as L_i = e^{l_i} with, e.g., using β_row as an ordering,

l_1 = [ 0   ··· ··· 0
        ⋮    ⋱      ⋮
        p_1  0  ··· 0 ],   l_2 = [ 0 ···  ···  0
                                   ⋮   ⋱       ⋮
                                   0 p_2  0 ·· 0 ],   …   (25)

We can therefore study the map

σ : L_n ⊃ U_{L_n}(I_n) → SO_n,   L → Q_{r(r−1)/2}(L) ····· Q_1(L).   (26)

Note that Q_i(I_n) = I_n for all i = 1, …, r(r−1)/2. The following series of lemmata characterizes the mapping σ.

Lemma 4.1. The mapping σ defined by (26) is smooth.

Proof. See the explicit form of the Q_i given as in (23). □

Lemma 4.2. The mapping σ defined by (26) is an immersion at I_n.

Proof. We have to show that the derivative Dσ(I_n) : l_n → so_n, l → l − l^T, is injective. For arbitrary l = Σ_{i=1}^{r(r−1)/2} l_i ∈ l_n the following holds true:

Dσ(I_n) · l = Σ_{i=1}^{r(r−1)/2} DQ_i(I_n) · l_i = Σ_{i=1}^{r(r−1)/2} (l_i − l_i^T) = l − l^T,   (27)

where we have used (d/dε)(I + ε² p^T p)^{−1/2}|_{ε=0} = 0 and (d/dε)(I + ε² p p^T)^{−1/2}|_{ε=0} = 0. Eq. (27) implies injectivity in an obvious manner. □

We can now apply the immersion theorem, cf. [1, p. 199].

Lemma 4.3. The mapping σ as defined by (26) is a diffeomorphism of U_{L_n}(I_n) onto the image σ(U_{L_n}(I_n)).

Consider the isospectral manifold

M_{SO_n} := {X ∈ R^{n×n} | X = Q^T A Q, Q ∈ SO_n}   (28)

with A ∈ V as above fulfilling Assumption 2.1. Define

α : σ(U_{L_n}(I_n)) → M_{SO_n},   Q → Q^T A Q.   (29)

Lemma 4.4. The mapping α defined as in (29) is smooth.

Proof. The result follows by the explicit construction of an arbitrary Q by using exponential coordinates of the second kind. □

Lemma 4.5. The mapping α defined as in (29) is an immersion at I_n.

Proof. We have to show that the derivative Dα(I_n) : T_{I_n} σ(U_{L_n}(I_n)) → T_A M_{SO_n} is injective. Arbitrary elements of the tangent space T_{I_n} σ(U_{L_n}(I_n)) have the form

Σ_{i=1}^{r(r−1)/2} (l_i − l_i^T) = l − l^T,

whereas those of the tangent space T_A M_{SO_n} look like [l − l^T, A]. To show injectivity of Dα(I_n) : l − l^T → [l − l^T, A], we partition l − l^T conformably with A, i.e.,

A = [ A_11 ··· A_1r
            ⋱   ⋮
                A_rr ],

l − l^T = [ 0     −p_21^T  ···      −p_r1^T
            p_21     0      ⋱          ⋮
             ⋮        ⋱      0    −p_{r,r−1}^T
            p_r1    ···  p_{r,r−1}     0      ].

Note that [l − l^T, A]_{r1} = p_r1 A_11 − A_rr p_r1. Assume, to the contrary, that

[l − l^T, A] = [l̃ − l̃^T, A]   (30)

holds for some l̃ ≠ l with

l̃ = [ 0
      p̃_21   0
       ⋮       ⋱
      p̃_r1  ···  p̃_{r,r−1}  0 ] ∈ l_n.

Looking at the (r,1)-block of (30) implies (p_r1 − p̃_r1) A_11 − A_rr (p_r1 − p̃_r1) = 0, which by Assumption 2.1 implies in turn that p_r1 = p̃_r1. Now we use induction on the sub-diagonal blocks, going from the lower left corner block of (30) up to the first sub-diagonal blocks. Applying recursively the same arguments to the (r−1,1)-block of (30), as well as to the (r,2)-block of (30), then implies p_{r−1,1} = p̃_{r−1,1} and p_r2 = p̃_r2. Finally, we get [l − l^T, A] = [l̃ − l̃^T, A] ⇒ l = l̃, a contradiction. Therefore, Dα(I_n) is injective, hence α is an immersion at I_n. □

Consequently, we have the following lemma.

Lemma 4.6. The composition mapping α ∘ σ : U_{L_n}(I_n) → M_{SO_n} is a diffeomorphism of U_{L_n}(I_n) onto the image (α ∘ σ)(U_{L_n}(I_n)).

4.1. The algorithm

The following algorithm will be analyzed for A satisfying Assumption 2.1. We denote in the sequel M := (α ∘ σ)(U_{L_n}(I_n)). We are given an

X ∈ M close to A and we consider the index set I := {(ij)}_{i=2,…,r; j=1,…,r−1} and fix an ordering β. For convenience we again rename double indices in the description of the algorithm by simple ones by means of X_ij → X_{β((ij))}, respecting the ordering β.

Algorithm 4.1 (Orthogonal Sylvester sweep). Given an X_k ∈ (α ∘ σ)(U_{L_n}(I_n)) = M, define

X_k^{(1)} := Q_1^T X_k Q_1,
X_k^{(2)} := Q_2^T X_k^{(1)} Q_2,
⋮
X_k^{(r(r−1)/2)} := Q_{r(r−1)/2}^T X_k^{(r(r−1)/2−1)} Q_{r(r−1)/2},

where for l = 1, …, r(r−1)/2 the transformation matrix Q_l ∈ SO_n differs from the identity matrix I_n only by four sub-blocks. Namely, the

(jj)th block equals (I + p^T p)^{−1/2},
(ji)th block equals −(I + p^T p)^{−1/2} p^T,
(ij)th block equals (I + p p^T)^{−1/2} p,
(ii)th block equals (I + p p^T)^{−1/2}.   (31)

Here β((ij)) = l and p = p^{(l)} ∈ R^{n_i×n_j} solves the Sylvester equation

p^{(l)} (X_k^{(l−1)})_jj − (X_k^{(l−1)})_ii p^{(l)} + (X_k^{(l−1)})_ij = 0.

The overall algorithm consists of the following iteration of orthogonal sweeps.

Algorithm 4.2 (Orthogonal refinement of subspace estimates).
• Let X_0, …, X_k ∈ M be given for k ∈ N_0.
• Define the recursive sequence X_k^{(1)}, …, X_k^{(r(r−1)/2)} as above (sweep).
• Set X_{k+1} := X_k^{(r(r−1)/2)}. Proceed with the next sweep.
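One partial step r_ij of this sweep can be sketched directly from the four non-trivial blocks in (31); the helper names below are ad hoc, and scipy.linalg.solve_sylvester again stands in for a dedicated solver.

```python
import numpy as np
from scipy.linalg import solve_sylvester

def inv_sqrt(M):
    """(symmetric positive definite M)^(-1/2) via an eigendecomposition."""
    w, V = np.linalg.eigh(M)
    return (V / np.sqrt(w)) @ V.T

def orthogonal_step(X, sizes, i, j):
    """One partial step of Algorithm 4.1 for block pair (i, j), i > j (0-based):
    solve p X_jj - X_ii p + X_ij = 0, assemble Q_l from its four non-trivial
    blocks as in (31), and return Q_l^T X Q_l."""
    edges = np.cumsum([0] + list(sizes))
    S = [slice(edges[k], edges[k + 1]) for k in range(len(sizes))]
    p = solve_sylvester(-X[S[i], S[i]], X[S[j], S[j]], -X[S[i], S[j]])
    Q = np.eye(X.shape[0])
    Q[S[j], S[j]] = inv_sqrt(np.eye(p.shape[1]) + p.T @ p)   # (jj) block
    Q[S[j], S[i]] = -Q[S[j], S[j]] @ p.T                     # (ji) block
    Q[S[i], S[i]] = inv_sqrt(np.eye(p.shape[0]) + p @ p.T)   # (ii) block
    Q[S[i], S[j]] = Q[S[i], S[i]] @ p                        # (ij) block
    return Q.T @ X @ Q
```

A full orthogonal sweep applies this step over all pairs (i, j) in an ordering respecting Corollary 3.1.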
4.2. Local convergence analysis

Analogous to Theorem 3.1 we have the following theorem.

Theorem 4.1. Algorithm 4.2, i.e., the mapping s : M → M, is smooth locally around A.

Proof. The algorithm is a composition of partial algorithmic steps r_i. Smoothness of these partial algorithmic steps follows from the smoothness of each p_i already shown. □

Theorem 4.2. Algorithm 4.2 converges locally quadratically fast if for working off the partial algorithmic steps an ordering is chosen which respects Corollary 3.1.

Proof. We will compute Dr_ij(A) for all i > j with 2 ≤ i ≤ r and 1 ≤ j ≤ r − 1. Without loss of generality we may assume that the partitioning consists of 5 × 5 blocks. Typically, a transformation matrix Q_ij for r_ij(X) = Q_ij^T X Q_ij looks like

Q_ij(X) = [ I 0                0 0                  0
            0 S_ij(X)          0 −S_ij(X) p_ij^T(X) 0
            0 0                I 0                  0
            0 T_ij(X) p_ij(X)  0 T_ij(X)            0
            0 0                0 0                  I ],

where S_ij(X) := (I + p(X)^T p(X))^{−1/2} and T_ij(X) := (I + p(X) p(X)^T)^{−1/2}. Moreover, S_ij(A) = I_{n_j} and T_ij(A) = I_{n_i}. An arbitrary Ω ∈ so_n/(so_{n_1} ⊕ ··· ⊕ so_{n_r}) looks like

Ω = [ 0      −Ω_21^T  ···      −Ω_r1^T
      Ω_21      0      ⋱          ⋮
       ⋮         ⋱      0    −Ω_{r,r−1}^T
      Ω_r1     ···  Ω_{r,r−1}     0      ].

The derivative of one partial algorithmic step acting on [Ω, A] ∈ T_A M is Dr_ij(A) · [Ω, A] = [Q'_ij, A] + [Ω, A], where Q'_ij := DQ_ij(A) · [Ω, A], and typically

Q'_ij = [ 0 0        0 0             0
          0 S'_ij(A) 0 −(p'_ij)^T(A) 0
          0 0        0 0             0
          0 p'_ij(A) 0 T'_ij(A)      0
          0 0        0 0             0 ]

with p'_ij(A) := Dp_ij(X) · [Ω, X]|_{X=A}. We already know that p_ij solves a Sylvester equation, namely p_ij(X) X_jj + X_ij − X_ii p_ij(X) = 0, with p_ij(X)|_{X=A} = 0. Taking the derivative of this Sylvester equation acting on [Ω, A] gives

p'_ij(A) A_jj + [Ω, A]_ij − A_ii p'_ij(A) = 0.   (32)

An easy computation verifies that the commutator [Q'_ij, A] is of the following form:

[Q'_ij, A] = [ 0 ∗                      ∗ ∗ ∗
               0 ∗                      ∗ ∗ ∗
               0 ∗                      ∗ ∗ ∗
               0 p'_ij A_jj − A_ii p'_ij ∗ ∗ ∗
               0 0                      0 0 0 ],

i.e., the (ij)th block equals p'_ij A_jj − A_ii p'_ij, and columns of blocks to the left as well as rows of blocks below are zero. By (32), we therefore obtain for the derivative of the (ij)th partial step r_ij:

Dr_ij(A) · [Ω, A] = [ 0 ∗                      ∗ ∗ ∗
                      0 ∗                      ∗ ∗ ∗
                      0 ∗                      ∗ ∗ ∗
                      0 p'_ij A_jj − A_ii p'_ij ∗ ∗ ∗
                      0 0                      0 0 0 ]  ([Q'_ij, A])
                  + [ ∗ ∗         ∗ ∗ ∗
                      ∗ ∗         ∗ ∗ ∗
                      ∗ ∗         ∗ ∗ ∗
                      ∗ [Ω, A]_ij ∗ ∗ ∗
                      ∗ ∗         ∗ ∗ ∗ ]  ([Ω, A]).

That is, by (32) the first derivative annihilates the (ij)th block, eventually altering those blocks which are above, to the right, or a combination of both, of this (ij)th block, but it leaves invariant all the remaining blocks. Apparently, all ordering strategies respecting Corollary 3.1 ensure that after a whole iteration step all those blocks lying below the main block diagonal are eliminated. We therefore can conclude that Ds(A) · [Ω, A] is strictly upper block triangular. Again we can even conclude more, namely Ds(A) · [Ω, A] = 0. Following the argumentation in the proof of Lemma 2.2, essentially, Assumption 2.1 ensures that the only element of so_n/(so_{n_1} ⊕ ··· ⊕ so_{n_r}) which commutes with A into an upper block triangular matrix is the zero matrix. This can also be seen from the fact that the above Ω is of the type l − l^T, where l ∈ l_n. The result then follows. □

5. Computational aspects

In this section we look at computational aspects of the basic recurrences described in Section 3. We first focus on the complexity of the solution of the Sylvester equation needed both in the Sylvester sweep and in the orthogonal Sylvester sweep algorithm. Using the notation of those sections, we want to solve the equation

P X11 + X21 − X22 P = 0,   (33)

where we will assume X_ij ∈ R^{n_i×n_j} and n_2 ≥ n_1. The recommended method here is to use the Hessenberg–Schur method described in [11]. In this method one computes orthogonal similarity transformations U_i, i = 1, 2, such that S := U_1^T X11 U_1 is in real Schur form (i.e. an upper block triangular matrix with 1 × 1 or 2 × 2 blocks on the diagonal) and H := U_2^T X22 U_2 is in Hessenberg form (i.e. a matrix with zeros below the first sub-diagonal). Defining also F := U_2^T X21 U_1 and Z := U_2^T P U_1, we obtain an equivalent equation in Z:

Z S + F − H Z = 0,   (34)

which is easier to solve. The special forms of H and S indeed yield simple recurrences for the columns z_k, k = 1, …, n_1 of Z. If s_{k+1,k} = 0 then z_k is obtained from previously computed columns using

(H − s_{k,k} I) z_k = f_k + Σ_{j=1}^{k−1} s_{j,k} z_j   (35)

(which can be solved in n_2² flops) and if s_{k+1,k} ≠ 0 then columns z_k and z_{k+1} are similarly computed from

H [z_k | z_{k+1}] − [z_k | z_{k+1}] [ s_{k,k}   s_{k,k+1}
                                      s_{k+1,k} s_{k+1,k+1} ] = [f_k | f_{k+1}] + Σ_{j=1}^{k−1} [s_{j,k} z_j | s_{j,k+1} z_j]

(which can be solved in 6 n_2² flops).
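For experiments, (33) can also be handed to a library routine: scipy.linalg.solve_sylvester solves A Z + Z B = Q by a Bartels–Stewart-type Schur approach, a close relative of the Hessenberg–Schur method of [11]. A minimal sketch, with made-up data:

```python
import numpy as np
from scipy.linalg import solve_sylvester

rng = np.random.default_rng(1)
n1, n2 = 4, 6
X11 = rng.standard_normal((n1, n1))
X22 = rng.standard_normal((n2, n2)) + 5.0 * np.eye(n2)   # shift keeps the spectra disjoint
X21 = rng.standard_normal((n2, n1))

# P X11 + X21 - X22 P = 0   <=>   (-X22) P + P X11 = -X21
P = solve_sylvester(-X22, X11, -X21)
assert np.allclose(P @ X11 + X21 - X22 @ P, 0.0)
```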

It is shown in [11] that the overall algorithm, including the back transformation P = U_2 Z U_1^T, requires (5/3) n_2³ + 10 n_1³ + 5 n_2² n_1 + (5/2) n_2 n_1² flops. It is also shown in that paper that this method is weakly stable, which essentially guarantees that the computed solution is as accurate as what could be expected from any algorithm solving (33).

Let us point out here that the computation of the solutions of each Sylvester equation of the Sylvester sweep is negligible in comparison to the application of the similarity transformations, provided the sizes n_j of the blocks are small compared to n, the dimension of the matrix they operate on. Indeed, one similarity transformation involving a matrix P of dimension n_i × n_j requires 4n · n_i · n_j flops (for multiplying P with an n_j × n matrix from the right and with an n × n_i matrix from the left). A total Sylvester sweep therefore requires

Σ_i Σ_j 4n (n_i · n_j) ≤ 4n (Σ_i n_i)(Σ_j n_j) = 4n³

flops. For the more concrete case where all n_i = n_j = 2 (i.e. when we calculate a Schur-like form) the complexity can be calculated more precisely and is 2n³ flops. Clearly, the computation of the solutions of the n²/8 Sylvester equations of size 2 × 2 is of the order of n² flops and hence an order of magnitude less.

For the orthogonal Sylvester sweep one has first to construct the orthogonal transformation Q described in (23). A simple and reliable procedure for reasonably sized matrices P is to use its singular value decomposition. Assuming P ∈ R^{n_i×n_j} with n_i ≤ n_j, we have P = U [T 0] V^T, where U ∈ R^{n_i×n_i}, V ∈ R^{n_j×n_j} and T = diag{tan θ_1, …, tan θ_{n_i}} is diagonal and positive. It then follows that the four non-trivial blocks of Q are given by:

N_l = V [ C 0
          0 I ] V^T,     N_l P^T = V [ S
                                       0 ] U^T,

N_r P = U [ S 0 ] V^T,   N_r = U C U^T,

where

C = diag{cos θ_1, …, cos θ_{n_i}},   S = diag{sin θ_1, …, sin θ_{n_i}}.
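These formulas are immediate to reproduce from a numerical SVD; the sketch below (our addition, with ad hoc names and random data) checks them against the closed forms N_l = (I + P^T P)^{−1/2} and N_r = (I + P P^T)^{−1/2}.

```python
import numpy as np

rng = np.random.default_rng(2)
ni, nj = 2, 4                                  # assume n_i <= n_j as in the text
P = rng.standard_normal((ni, nj))

U, s, Vt = np.linalg.svd(P)                    # P = U [T 0] V^T, T = diag(tan theta_k)
V = Vt.T
theta = np.arctan(s)
C, Sd = np.diag(np.cos(theta)), np.diag(np.sin(theta))

Nl = V @ np.block([[C, np.zeros((ni, nj - ni))],
                   [np.zeros((nj - ni, ni)), np.eye(nj - ni)]]) @ V.T
NlPt = V @ np.vstack([Sd, np.zeros((nj - ni, ni))]) @ U.T     # N_l P^T
NrP = U @ np.hstack([Sd, np.zeros((ni, nj - ni))]) @ V.T      # N_r P
Nr = U @ C @ U.T

assert np.allclose(Nl @ Nl @ (np.eye(nj) + P.T @ P), np.eye(nj))  # N_l^2 = (I + P^T P)^{-1}
assert np.allclose(Nr @ Nr @ (np.eye(ni) + P @ P.T), np.eye(ni))  # N_r^2 = (I + P P^T)^{-1}
assert np.allclose(NlPt, Nl @ P.T) and np.allclose(NrP, Nr @ P)
```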

The computational cost of constructing these transformations is again negligible with respect to their application. For n_i = n_j = 2 each one essentially amounts to a 4 × 4 row transformation on a 4 × n block row of A and a 4 × 4 column transformation on an n × 4 block column of A. This requires 32n flops, and since there are n²/8 of those per sweep, the orthogonal sweep requires about 4n³ flops, i.e., about double that of a standard Sylvester sweep. Since both algorithms are quadratically convergent, one can say that these algorithms require O(n³) flops to converge (when starting from a form that is almost block diagonal). We point out that the standard QR algorithm is of the same order of complexity.

We should mention that the methods described in this paper extend to the generalized eigenvalue problem in a straightforward manner. Instead of one Riccati or one Sylvester equation one has then to solve a system of two coupled ones. Everything is similar under an equivalent assumption on the spectra of the sub-blocks.

If the considered matrix is symmetric, our method is related to [12]. There, the so-called approximate Givens (or Jacobi) transformations are developed, which essentially approximate an exact rotation to zero out a matrix entry. It is not clear though that our algorithm has an interpretation as a Jacobi-type method in the general non-symmetric case.

The appeal of our method is to be found in time-varying eigenvalue problems (as occurring, e.g., in tracking problems of slowly varying phenomena). At each step one then has a nearly block triangular matrix to start with, which needs to be updated in order to reduce it again to block triangular form via similarity transformations. One should also point out that in terms of efficient implementation on parallel architectures, our method should have much the same properties as Jacobi-like methods.

References

[1] R. Abraham, J.E. Marsden, T. Ratiu, Manifolds, Tensor Analysis, and Applications, 2nd ed., Springer, New York, 1988.
[2] A. Bunse-Gerstner, H. Fassbender, On the generalized Schur decomposition of a matrix pencil for parallel computation, SIAM J. Sci. Statist. Comput. 12 (4) (1991) 911–939.
[3] J.-P. Charlier, P. Van Dooren, A Jacobi-like algorithm for computing the generalized Schur form of a regular pencil, J. Comput. Appl. Math. 27 (1989) 17–36.
[4] F. Chatelin, Simultaneous Newton's iteration for the eigenproblem, Computing (Suppl. 5, Defect Correction Methods, Oberwolfach, 1983) (1984) 67–74.
[5] J. Demmel, Three methods for refining estimates of invariant subspaces, Computing 38 (1987) 43–57.
[6] L. Dieci, A. Papini, Continuation of eigendecompositions, J. Comput. Appl. Math. 19 (7) (2003) 1225–1237.
[7] L. Dieci, M. Friedman, Continuation of invariant subspaces, Preprint, 2001.
[8] J.J. Dongarra, C.B. Moler, J.H. Wilkinson, Improving the accuracy of computed eigenvalues and eigenvectors, SIAM J. Numer. Anal. 20 (1) (1983) 23–45.
[9] W. Fulton, Young Tableaux, LMS Student Texts, vol. 35, Cambridge University Press, Cambridge, 1997.
[10] C.G. Gibson, Singular Points of Smooth Mappings, Pitman, Boston, 1979.
[11] G. Golub, S. Nash, C. Van Loan, A Hessenberg–Schur method for the problem AX + XB = C, IEEE Trans. Automat. Contr. AC-24 (1979) 909–913.
[12] J. Götze, Orthogonale Matrixtransformationen, Parallele Algorithmen, Architekturen und Anwendungen, Oldenbourg, München, 1995 (in German).
[13] S. Helgason, Differential Geometry, Lie Groups, and Symmetric Spaces, Academic Press, Boston, 1978.
[14] K. Hüper, A calculus approach to matrix eigenvalue algorithms, Habilitation Address, Würzburg University, Germany, 2002.
[15] K. Hüper, A dynamical system approach to matrix eigenvalue algorithms, in: J. Rosenthal, D.S. Gilliam (Eds.), Mathematical Systems Theory in Biology, Communication, and Finance, IMA, vol. 134, Springer, New York, 2003, pp. 257–274.
[16] M.T. Nair, Computable error estimates for Newton's iterations for refining invariant subspaces, Indian J. Pure Appl. Math. 21 (12) (1990) 1049–1054.
[17] G.W. Stewart, Error and perturbation bounds for subspaces associated with certain eigenvalue problems, SIAM Rev. 15 (4) (1973) 727–764.
[18] G.W. Stewart, A Jacobi-like algorithm for computing the Schur decomposition of a non-Hermitian matrix, SIAM J. Sci. Statist. Comput. 6 (4) (1986) 853–864.
[19] V.S. Varadarajan, Lie Groups, Lie Algebras, and their Representations, Springer, New York, 1984.

K. Hüper was born in Karlsruhe, Germany, in 1962. He received his Diploma in Physics and his Doctor rer.nat. degree from Munich University of Technology, Germany, in 1990 and 1996, respectively. In 2002, he finished his Habilitation thesis in Mathematics. From 1990 to 1995, he was a teaching and research assistant at the Institute for Circuit Theory and Signal Processing, Munich University of Technology. In 1995, he was a teaching and research assistant at the Institute for Algorithms and Cognitive Systems, University of Karlsruhe, Germany. In 1996, he spent 2 months as a visiting associate at the Research School of Information Sciences and Engineering in the Institute of Advanced Studies, Australian National University, Canberra. In 2000, he was a postdoctoral researcher at the Department of Mathematics and the Institute for Systems and Robotics, University of Coimbra, Portugal. From 1996 to 2002, he was a teaching and research assistant at the Institute of Mathematics, Würzburg University, Germany. Since 2003, he has been Privatdozent (Senior Lecturer) at the Institute of Mathematics, Würzburg University, Germany. His research interests are in the area of Lie theory and geometric optimization on manifolds. His most recent interests are the application of dynamical systems methods to matrix algorithms, interpolation problems on manifolds, and the mathematics behind nuclear-magnetic-resonance experiments. He was a Co-guest Editor of a special issue of Systems and Control Letters. Dr. Hüper is a member of the AMS and the German GAMM.

P. Van Dooren received his engineering degree in computer science and the doctoral degree in applied sciences, both from the Katholieke Universiteit te Leuven, Belgium, in 1974 and 1979, respectively. He held research and teaching positions at the Katholieke Universiteit te Leuven (1974–1979), the University of Southern California (1978–1979), Stanford University (1979–1980), the Australian National University (1984), Philips Research Laboratory Belgium (1980–1991), the University of Illinois at Urbana-Champaign (1991–1994), and the Université Catholique de Louvain (1980–1991, 1994 to now), where he is currently a professor of Mathematical Engineering. Dr. Van Dooren received the Householder Award in 1981 and the Wilkinson Prize of Numerical Analysis and Scientific Computing in 1989. He is an Associate Editor of Journal of Computational and Applied Mathematics, Numerische Mathematik, SIAM Journal on Control and Optimization, Applied Mathematics Letters, Linear Algebra and its Applications, Journal of Numerical Algorithms, the Electronic Transactions on Numerical Analysis, Applied and Computational Control, Signals and Systems, Mathematics of Control, Signals and Systems, and the SIAM Journal on Matrix Analysis and Applications. His main interests lie in the areas of numerical linear algebra, systems and control theory, digital signal processing, and parallel algorithms.
Van Dooren received sentations, Springer, New York, 1984. his Householder Award in 1981 and the Wilkinson Prize of Nu- merical Analysis and Scientific Computing in 1989. He is an Associate Editor of Journal of Computational and Applied Math- ematics, Numerische Mathematik, SIAM Journal on Control and Optimization, Applied Mathematics Letters, and K. Hüper was born in Karlsruhe, Ger- its Applications, Journal of Numerical Algorithms, the Electronic many, in 1962. He received his Diploma Transactions of Numerical Analysis, Applied and Computational Physicist and the Doctor rer.nat. degrees Control, Signals and Systems, Mathematics of Control, Signals from Munich University of Technology, and the SIAM Journal on Matrix Analysis and Applications. His Germany, in 1990 and 1996, respectively. main interests lie in the areas of numerical linear algebra, sys- In 2002, he finished his Habilitation the- tems and control theory, digital signal processing, and parallel sis in Mathematics. From 1990 to 1995, algorithms. he was a teaching and research assistant at the Institute for Circuit Theory and Signal Processing, Munich University of Tech- nology. In 1995, he was a teaching and research assistant at the Institute for Algorithms and Cognitive Systems, University of