Stable Factorization of Hankel and Hankel-like Matrices∗

Vadim Olshevsky                              Michael Stewart
Department of Mathematics                    Computer Sciences Laboratory
Georgia State University                     RSISE, Australian National University
Atlanta, GA 30303                            Canberra ACT 0200, Australia
[email protected]                             [email protected]
www.cs.gsu.edu/~matvro

Keywords: Structured matrices, Hankel matrices, fast algorithms, look-ahead, numerical stability.

Abstract

This paper gives fast O(n^2) algorithms for the factorization of positive definite and indefinite Hankel matrices. The algorithms are based on the concept of displacement structure and are valid for the more general class of Hankel-like matrices. The positive definite algorithm is proven to be backward stable. The indefinite algorithm uses a look-ahead step that is naturally suggested by the displacement approach. Our error analysis suggests a new criterion for the size of the look-ahead step, and our numerical experiments suggest that the use of the new criterion allows us to ensure numerical stability.

1 Introduction

For h_i with 0 ≤ i ≤ 2n−2 we define the Hankel matrix

        [ h_0      h_1   h_2   · · ·  h_{n−1}  ]
        [ h_1      .                  h_n      ]
    H = [ h_2          .                .      ]                        (1)
        [   .              .            .      ]
        [ h_{n−1}  h_n    · · ·      h_{2n−2}  ]

A problem of computing recursive triangular factorizations for H arises in several applications, including coding theory; in the latter context the elements h_i are taken from a finite field. Such factorizations could be easily computed by general methods like Gaussian elimination or the bordering method. However, the disadvantage of these general methods is their high computational cost of O(n^3) arithmetic operations. In contrast, the fast Berlekamp-Massey algorithm computes the factorization of H^{−1} in only O(n^2) operations. This algorithm was developed in [1] in the context of decoding BCH codes.

Hankel matrices with real h_i often arise in several other applications, e.g., in system theory. The fast O(n^2) Berlekamp-Massey algorithm applies to the real case as well. Several authors have developed many other fast O(n^2) algorithms for the factorization or inversion of Hankel matrices, see, e.g., [20, 21, 22] among others. A further speed-up of the computations is possible, and two superfast O(n log^2(n)) algorithms based on Padé approximation were given in [4].

For positive definite Hankel matrices the major attention was paid to the speed-up of computations, and the study of numerical properties has received much less attention. In fact, there are no results on the numerical stability of any of the known fast algorithms for positive definite Hankel matrices. Moreover, many of them contain computations, like computing the factors of H^{−1}, that seem to be potentially numerically unstable. For some algorithms we know from practical experience that they are not stable, cf., e.g., [23]. In fact, the design of fast and accurate algorithms for positive definite Hankel matrices is quite a delicate problem. Indeed, it was shown in [23] that κ_2(H) = ‖H‖_2 ‖H^{−1}‖_2 ≥ 3 · 2^{n−6} if H is positive definite. There are even more discouraging estimates which suggest that for positive definite Hankel matrices the condition number grows asymptotically as 4^n. Thus we do not expect to find forward accurate solutions, and we focus on backward stability.

Building on the earlier work [9] (see also a survey [16] and the references therein) we propose in §3 a new algorithm for positive definite Hankel matrices that is perhaps the first provably backward stable algorithm. Specifically, we show in §4 that if C is the computed Cholesky factor of H then

    ‖C^T C − H‖ ≤ ε f(n) ‖H‖

where ε is the machine precision and f(n) is a polynomial in n.

Various applications give rise to more general classes of Hankel-related structure. One such class are the Hankel-like matrices, which are defined as those having a small displacement rank. The latter is defined here as

    rank(ZH − HZ^T)

where Z is the downshift matrix defined by (Z)_{ij} = 1 when i − j = 1 and (Z)_{ij} = 0 otherwise.

∗This work was supported by NSF grant CCR 9732355.
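As a quick check of these definitions, the following sketch (our illustration, not part of the paper; it assumes NumPy) builds H from the parameters h_i and verifies that ZH − HZ^T is zero outside its first row and column, so that the displacement rank of a Hankel matrix is at most two:

```python
import numpy as np

def hankel(h, n):
    """The n x n Hankel matrix (1) built from h_0, ..., h_{2n-2}."""
    return np.array([[h[i + j] for j in range(n)] for i in range(n)])

def downshift(n):
    """The downshift matrix Z: (Z)_{ij} = 1 when i - j = 1, else 0."""
    return np.diag(np.ones(n - 1), -1)

n = 5
rng = np.random.default_rng(0)
h = rng.standard_normal(2 * n - 1)
H = hankel(h, n)
Z = downshift(n)
displacement = Z @ H - H @ Z.T
# Z H shifts rows down and H Z^T shifts columns right, so everything outside
# the first row and column cancels for a Hankel matrix.
rank = np.linalg.matrix_rank(displacement)
```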
Clearly, the displacement rank of any Hankel matrix does not exceed two:

                [ 0        −h_0  −h_1  · · ·  −h_{n−2} ]
                [ h_0                                  ]
    ∆_Z(H) :=   [ h_1                  0               ]               (2)
                [   .                                  ]
                [ h_{n−2}                              ]

so that the usual Hankel matrices are Hankel-like.¹ We generalize our algorithm to Hankel-like matrices of higher displacement rank in §6. In §6, we also give a condensed error analysis of the general algorithm using results from §4, and establish its backward stability for positive definite Hankel-like matrices.

Having established favorable results for positive definite matrices we address the indefinite case. Perhaps the first algorithms for the (inverse) factorization of (strongly nonsingular) Hankel-like matrices appeared in [11]. However, the major problem causing instability of fast recursive algorithms is the possible presence of ill-conditioned leading submatrices. To cope with this problem look-ahead algorithms² have been proposed, e.g., in [3, 19]. The algorithm of [10] is also a look-ahead algorithm. These algorithms were derived for the usual Hankel matrices, and in this paper we study the more general class of Hankel-like matrices. The two major problems appearing in the design of a look-ahead algorithm are as follows. First, one needs to derive formulas for the block step of the algorithm. Secondly, one needs a practical criterion to estimate the size of the block step. A displacement structure approach leads to surprisingly simple answers to these questions. First, a block factorization step can be derived in matrix notation in a few lines, cf. [9, 8, 16]. We describe the basic block step for Hankel-like matrices in §3 and look-ahead refinements in §5. Secondly, our error analysis allows us to suggest estimating the quantity ‖H_21 H_11^{−1}‖ (i.e., not just ‖H_11^{−1}‖) as a new criterion for the size of the look-ahead step. We also derive a fast algorithm to estimate this quantity. Our numerical experiments suggest that (a) controlling the size of ‖H_21 H_11^{−1}‖ seems to be sufficient to get stability and (b) it is also likely to be necessary for stability. We plan to address this issue and to report our numerical results elsewhere.

When it is convenient, we use MATLAB notation to indicate submatrices. Thus the matrix H(i : j, k : l) is the (j − i + 1) × (l − k + 1) block of H formed by taking only the elements that are in rows i through j and columns k through l. The notation H(:, j) indicates column j of H.

¹The concept of displacement was introduced in [14] in connection with the Toeplitz pattern of structure, and it was shown in [11] to be useful for studying other classes as well, including Hankel-like, Vandermonde-like and Cauchy-like matrices. We refer to [11, 15, 16] for results on other classes of structured matrices as well as for more references. Here we focus only on numerical issues related to Hankel-like matrices.

²We would like to briefly mention a seemingly overlooked fact that the Berlekamp-Massey algorithm also could be ascribed to the group of look-ahead algorithms. Indeed, the Berlekamp-Massey algorithm was developed for coding theory applications, where the h_k are taken from a finite field. In this case the probability for H to have an exactly singular leading submatrix is clearly non-zero. To overcome this disadvantage, Berlekamp developed what is now called a look-ahead step, i.e., a clever way to jump over exactly singular leading submatrices.

2 The Displacement

The Hankel matrix H is determined by 2n − 1 parameters. None of the results in this paper are strictly limited to Hankel matrices; everything will apply to the more general class of Hankel-like matrices. To define this class, we borrow and adapt an idea used in the study of Toeplitz matrices by introducing the Hankel displacement rank of a matrix. A symmetric matrix H is Hankel-like with Hankel displacement rank 2 whenever the displacement

    ∆_Z(H) := ZH − HZ^T

has rank 2. The matrix Z is the downshift matrix defined by (Z)_{ij} = 1 when i − j = 1 and (Z)_{ij} = 0 otherwise. The notion of a Hankel-like matrix can be generalized to larger ranks. We will consider such generalizations in Sect. 6. For the moment, we deal exclusively with displacement rank 2 Hankel-like matrices.

For any symmetric matrix, the displacement is skew-symmetric. For a Hankel matrix we have (2). We will represent a Hankel-like matrix by means of its generators, the columns of a matrix A such that

    ∆_Z(H) = A J A^T   where   J := [ 0  −1 ]                          (3)
                                    [ 1   0 ]

A real skew-symmetric matrix of rank 2 can always be decomposed as (3). Given ∆_Z(H), the decomposition can be computed in a stable manner using a single 2 × 2 pivot step of a skew-symmetric elimination procedure [5]. Further, a real skew-symmetric matrix always has even rank. Thus if the displacement is nonzero then A has full rank.

The generators are not unique. For a Hankel matrix it is possible to choose generators with the simple form

    A^T = [ √h_0   0      0          · · ·  0             ]            (4)
          [ 0      √h_0   h_1/√h_0   · · ·  h_{n−2}/√h_0  ]

A generator matrix A for which the (1, 2) element equals zero is said to be in proper form.

The displacement operator ∆_Z(·) is different from the better known Toeplitz-like displacements in three algorithmically significant respects.

1. The Hankel displacement is skew-symmetric. This fact determines the classes of transformations that we can use in implementing a Schur-type factorization algorithm.

2. The proper form generators are not unique, even ignoring sign changes or scaling. For a Toeplitz matrix, the proper form generators are unique up to sign changes. This difference has implications for the numerical stability of factorizations: for a given Hankel-like matrix, there exists a generator matrix in proper form with arbitrarily large norm. Hence to get a stable algorithm, we must be very careful in the choice of generators. An algorithm that is forced to work with generators that are much larger than the matrix to be factored is likely to be unstable.

3. The linear displacement operator ∆_Z(·) has a non-trivial null space. This means that it is not possible to reconstruct H given only ∆_Z(H). A factorization algorithm cannot work solely with the generators of H; it must incorporate some additional information. We will show that H can be reconstructed from ∆_Z(H) and from the last column of H.

We will elaborate on each of these points in turn.

A matrix S that satisfies SJS^T = J is a 2 × 2 symplectic matrix. General r × r symplectic matrices are defined by the relation

    S [ 0        −I_{r/2} ] S^T = [ 0        −I_{r/2} ]
      [ I_{r/2}   0       ]       [ I_{r/2}   0       ]

We will make use of these more general symplectic matrices in Sect. 6. We will use 2 × 2 symplectic matrices in the factorization of displacement rank 2 Hankel-like matrices. If S is symplectic then (AS)J(AS)^T = AJA^T and AS is an alternate set of generators for ∆_Z(H) = AJA^T. In devising a factorization algorithm we are free to apply any symplectic transformation to the generator matrix to compute an alternate set of generators for the same displacement.

Note that

    [ a  b ] [ 0  −1 ] [ a  c ] = [ 0           −(ad − bc) ]
    [ c  d ] [ 1   0 ] [ b  d ]   [ (ad − bc)    0         ]

Thus in the 2 × 2 case, if det(S) = 1 then S is symplectic. Symplectic transformations that will later be of particular algorithmic interest are the plane rotations Q = [ c s ; −s c ] with c^2 + s^2 = 1 and the diagonal scaling matrices D = [ d 0 ; 0 1/d ] for d ≠ 0.

If A ≠ 0, so that the displacement rank is exactly 2 and A must have full rank, then AJA^T = BJB^T implies that A^T = (J^T A† B J) B^T and J = (J^T A† B J) J (J^T A† B J)^T, so that J^T A† B J is symplectic. It follows that for a particular fixed choice of generator matrix A the set of equivalent generator matrices representing ∆_Z(H) can be characterized by B = AS where S is symplectic.

Consider the proper form generators A = [ a11 0 ; a21 a22 ] where a11 is a scalar and a21 and a22 are vectors. In general, if A is in proper form and a11 ≠ 0, then the set of equivalent proper form generators representing the displacement has the form

    B = [ b11  0   ] = [ a11  0   ] [ d  0   ]                         (5)
        [ b21  b22 ]   [ a21  a22 ] [ l  1/d ]

for any l and any d ≠ 0. The assumption that a11 ≠ 0 can be justified in the positive definite case by the fact that if a11 = 0 then the (1, 1) element of H must be zero. This is not possible for a positive definite matrix.

To see that (5) captures all proper form generators note that any equivalent generator matrix can be represented as B = AS for det(S) = 1. Since B is assumed to be in proper form, [ b11 0 ] = [ a11 0 ] S. Since a11 ≠ 0, S(1, 1) = b11/a11 = d and S(1, 2) = 0. The complete form of S follows from the fact that to get det(S) = 1 for a triangular matrix, we must have S(2, 2) = 1/S(1, 1) = 1/d. If d, 1/d or l are very large, then ‖B‖ will be much larger than ‖A‖. Putting the generators in proper form does not directly guarantee a bound on their size. This is in contrast to the generators of a positive definite Toeplitz matrix, for which the proper form generators are unique up to sign changes and can be bounded in a way that is useful for a numerical stability analysis [2].

We now consider the null space of ∆_Z(·). Define the permutation P so that its action on a vector is to reverse the order of the elements. Thus P(i, j) = 1 if i + j = n + 1 and P(i, j) = 0 otherwise. Then PZ^T = ZP, so that if N = Σ_{i=0}^{n−1} n_i Z^i P, where Z^0 = I, then

    ∆_Z(N) = Σ_{i=0}^{n−1} n_i (Z Z^i P − Z^i P Z^T)
           = Σ_{i=0}^{n−1} n_i (Z Z^i P − Z^i Z P) = 0.

Considering the form of N, we see that any Hankel matrix of the form (1) for which h_0 = h_1 = · · · = h_{n−2} = 0 is in the null space of ∆_Z(·).

To see that this is a complete characterization of the null space, note that the relation ZN − NZ^T = 0 implies

    N(i, j − 1) = N(i − 1, j)

for 2 ≤ i, j ≤ n. Given the first row and last column of N, this recurrence gives the other elements. If ZN − NZ^T = 0 then the first row of N must be zero except possibly for its last element. Thus the recurrence implies that if ∆_Z(N) = 0 then

        n−1              [ 0     · · ·   0    n_1 ]
    N =  Σ  n_i Z^i P =  [   .         n_1    n_2 ]
        i=0              [ 0       .            . ]
                         [ n_1   n_2   · · ·  n_n ]

Consequently the null space of ∆_Z(·) is the set of Hankel matrices that are lower triangular with respect to the cross-diagonal.

Although H cannot be recovered from ∆_Z(H), it is determined uniquely by ∆_Z(H) together with H(:, n).³ Since the first row of ZH is zero,

    H(1, 1 : n − 1) = −[∆_Z(H)](1, 2 : n).                             (6)

Thus we have the first row and the last column of H. By symmetry we also have the first column and last row.

³In [17] we called such matrices H partially reconstructible and developed algorithms for them in the context of boundary rational interpolation problems.
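The generator representation and the reconstruction formula (6) can be illustrated as follows. This is our own sketch, using J = [ 0 −1 ; 1 0 ] as in (3) and the proper-form generators of (4):

```python
import numpy as np

J = np.array([[0., -1.], [1., 0.]])   # sign convention consistent with (2) and (4)

n = 4
h = np.array([2., 3., 5., 7., 11., 13., 17.])     # h_0, ..., h_{2n-2}
H = np.array([[h[i + j] for j in range(n)] for i in range(n)])
Z = np.diag(np.ones(n - 1), -1)
delta = Z @ H - H @ Z.T

# proper-form generators (4): the (1,2) element of A is zero
s = np.sqrt(h[0])
A = np.zeros((n, 2))
A[0, 0] = s
A[1, 1] = s
A[2:, 1] = h[1:n - 1] / s

# (6): the first row of H is recoverable from the displacement alone
first_row = -(A @ J @ A.T)[0, 1:]
```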

Given these elements, it is easy to see that the other elements can be obtained uniquely from the relation

    [∆_Z(H)]_{i,j} = H(i − 1, j) − H(i, j − 1),   2 ≤ i, j ≤ n.        (7)

This is simply an indexed form of the definition ∆_Z(H) = ZH − HZ^T.

Finally, we note that (6) is algorithmically important. It provides a natural way to obtain most of the first row of H. This is precisely what is needed to obtain a row of the Cholesky factor in a factorization algorithm. If the generators are in proper form then we get the particularly simple formula

    H(1, 1 : n − 1) = a11 a22^T.                                       (8)

If C is the Cholesky factor of H then

    C(1, 1 : n − 1) = √(a11/a22(1)) a22^T.                             (9)

3 Schur Type Algorithms

As with the Toeplitz displacement, the Schur complement of an arbitrary matrix has a displacement rank with respect to ∆_Z(·) that is no larger than the displacement rank of the original matrix. Thus Hankel-like structure is preserved under the process of Schur complementation. This observation forms the basis of a class of fast Schur-type algorithms. At each stage of the triangular factorization the algorithms work with the generators of the current Schur complement together with the last column instead of with the full matrix.

Since any symplectic S can be used to transform the generators, many distinct algorithms are possible. In this section we will explain the basic ideas behind algorithms for both the positive definite and the indefinite cases. The algorithm for the positive definite case is chosen to be provably stable. More efficient variations are possible and it seems that some of these algorithms are likely to be stable in practice. Nevertheless, some possible algorithms are clearly unstable; we will illustrate the problem with one such algorithm in this section.

Suppose that H is real, symmetric and Hankel-like with ∆_Z(H) = AJA^T. Partition H as

    H = [ H11  H21^T ]                                                 (10)
        [ H21  H22   ]

A as

    A = [ A1 ]                                                         (11)
        [ A2 ]

and Z = [ Z11 0 ; Z21 Z22 ]. We consider the Schur complement H_S = H22 − H21 H11^{−1} H21^T and the elimination matrix

    L = [ I               0 ]
        [ −H21 H11^{−1}   I ]

Clearly

    (LZL^{−1})(LHL^T) − (LHL^T)(L^{−T} Z^T L^T) = L A J A^T L^T       (12)

and LHL^T = [ H11 0 ; 0 H_S ]. The matrix LZL^{−1} is lower triangular and its (2, 2) block is just Z22. It follows that

    Z22 H_S − H_S Z22^T = (A2 − H21 H11^{−1} A1) J (A2 − H21 H11^{−1} A1)^T.   (13)

Since Z22 is a shift matrix, this shows that H_S has the same sort of displacement structure as H and

    A_S = A2 − H21 H11^{−1} A1.                                        (14)

The formula (14) was derived in [9] (see also [8]), and it will be the starting point in deriving a stable algorithm for positive definite matrices. It also forms a direct basis for a look-ahead algorithm for indefinite matrices. The look-ahead algorithm is recursive and, neglecting details about the computation of the look-ahead step size that we will cover in Sect. 5, it can be described as follows.

Algorithm 1. Let n_S = n and ∆_Z(H) = AJA^T. Let r be the last column of H. Start with L = I and D = 0.

1. While n_S > 0:
2.   Let H_S be the current n_S × n_S Schur complement with generators A and last column r. Find an appropriate look-ahead step size, m, and let H_S and A be partitioned as in (10) and (11) where H11 is m × m.
3.   Compute H11 and H21 from A and r using (7).
4.   Let

         L(n − n_S + m + 1 : n, n − n_S + 1 : n − n_S + m) ← H21 H11^{−1}

     and

         D(n − n_S + 1 : n − n_S + m, n − n_S + 1 : n − n_S + m) ← H11.

5.   Update A and r by

         A ← A2 − H21 H11^{−1} A1,
         r ← r(m + 1 : n_S) − H21 H11^{−1} r(1 : m)

     and let n_S ← n_S − m.

Neglecting numerical errors, the algorithm computes a lower triangular L and a block diagonal D such that H = LDL^T.
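The block step (14) can be checked numerically. The following sketch is ours; the generator choice A = [ e_1, ∆(:,1) ] is an easy ad hoc factorization of the Hankel displacement (not the proper form (4)), used here only to verify that the updated generators represent the displacement of the Schur complement as in (13):

```python
import numpy as np

J = np.array([[0., -1.], [1., 0.]])

n, m = 6, 2
h = np.arange(1., 2. * n)                 # h_0, ..., h_{2n-2}; any Hankel data will do
H = np.array([[h[i + j] for j in range(n)] for i in range(n)])
Z = np.diag(np.ones(n - 1), -1)

# Delta = c e1^T - e1 c^T with c the first column of the displacement,
# which equals A J A^T for A = [e1, c]
c = (Z @ H - H @ Z.T)[:, 0]
A = np.zeros((n, 2))
A[0, 0] = 1.0
A[:, 1] = c

H11, H21 = H[:m, :m], H[m:, :m]
A1, A2 = A[:m, :], A[m:, :]
AS = A2 - H21 @ np.linalg.solve(H11, A1)                  # the update (14)
HS = H[m:, m:] - H21 @ np.linalg.solve(H11, H21.T)        # Schur complement
Z22 = np.diag(np.ones(n - m - 1), -1)
lhs = Z22 @ HS - HS @ Z22.T                               # left side of (13)
rhs = AS @ J @ AS.T                                       # right side of (13)
```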
The sizes of the diagonal blocks in D are the look-ahead step sizes m chosen in each iteration of the algorithm. Given (13), it is trivial to prove that the algorithm correctly computes the decomposition. Consider the first step of the algorithm, for which H_S = H. Step 4 is a block elimination step with an m × m pivot. Step 5 uses (13) and a direct implementation of the formula for a Schur complement to get the generators and last column of the Schur complement of H. By returning to step 1, the algorithm recursively computes L_S and D_S such that H22 − H21 H11^{−1} H21^T = L_S D_S L_S^T. Thus

    LDL^T = [ I              0   ] [ H11  0   ] [ I  H11^{−1} H21^T ]
            [ H21 H11^{−1}  L_S ] [ 0    D_S ] [ 0  L_S^T          ]

          = [ H11  H21^T                               ] = H.          (15)
            [ H21  L_S D_S L_S^T + H21 H11^{−1} H21^T  ]

This is a familiar and natural recursive description of block Gaussian elimination. It is easy to check from the indices in step 4 that the algorithm fills H21 H11^{−1} and H11 into the blocks of L and D in a manner consistent with (15).

For the indefinite case, we will choose a look-ahead step size with the intent of keeping ‖H21 H11^{−1}‖ small, to prevent any significant growth in the size of the generators when applying (13) directly. For reasons of stability, it is necessary to keep this quantity from becoming too large in the application of any type of block elimination algorithm [13]. While we do not give a proof, experimental results suggest that this criterion is also sufficient to give a stable algorithm. We give further details on the implementation of the algorithm and its performance in Sect. 5.

For the case in which H is positive definite and we wish to compute its Cholesky factor we consider the partitioning

    H = [ h11  h21^T ] = [ h_0  h21^T ]                                (16)
        [ h21  H22   ]   [ h21  H22   ]

where h11 = h_0 is a scalar. This corresponds to the case m = 1 and Algorithm 1 will use the simple update

    Â_2 = A2 − (1/h_0) h21 A1                                          (17)

where A1 is a row vector. The scalar h_0 and the vector h21 can be obtained by using the relation

    [ h_0             ] = A2 J A1^T
    [ h21(1 : n − 2)  ]

or by using (8) if the generator matrix is in proper form. Thus step 3 of Algorithm 1 is trivial and we get a simple O(n^2) algorithm to generate the rows of the Cholesky factor of H in sequence.

However, if ‖h21/h_0‖ is large then (17) could result in significant generator growth. The following example shows that ‖h21/h_0‖ can be large for a positive definite Hankel-like matrix and that this can negatively impact the stability of the 1 × 1 pivot version of Algorithm 1.

Example 1. We construct a 5 × 5 Hankel matrix H = K^T K from a Krylov matrix

    K = [ b  Bb  B^2 b  B^3 b  B^4 b ]

formed from B = 3 · diag(1, 2, 3, 4, 5) and b = 10^{−5} · [ 1 1 1 1 1 ]^T. It is easily verified that this Hankel matrix has generators

    A^T = [ 1  0             0              0              0              ]
          [ 1  5 × 10^{−10}  4.5 × 10^{−9}  4.95 × 10^{−8}  6.075 × 10^{−7} ]

This matrix was formed from the elements of H by multiplying the generator matrix A^T shown in (4) from the left by

    [ 1/√h_0   0    ]
    [ 1/√h_0   √h_0 ]

We would hope that a stable algorithm would prevent generator growth and introduce errors not much larger than the size of the generators. Since ‖H‖^{1/2} ≈ .55 is not too different from ‖A‖, this would give a small relative backward error. However, factoring this matrix using a direct implementation of the scalar version of Algorithm 1 results in substantial generator growth. We get a computed Cholesky factor for which ‖C^T C − H‖/‖H‖ = 3.9 × 10^{−5}.

Although the example might seem somewhat contrived, it is worth noting that even starting with (4), the generators of the Schur complements would not have any obvious special structure when computed by (17). We have not found a Hankel matrix for which initial generators of the form (4) lead to instability in (17).

To deal with this problem we present a more sophisticated algorithm for the positive definite case. It will make use of transformations of the generators of the form AS with S satisfying SJS^T = J. As with Algorithm 1, it can be viewed as a recursive process in which we obtain the first row of the Cholesky factor of H from the generators and from the stored last column of H and then compute the generators and the last column of the Schur complement of H. The main difference is that rather than applying (17) directly, we will use transformations of the generators of the form AS to get a proper form in which the row of the Cholesky factor can be retrieved from (9) and in which (17) is guaranteed not to produce any significant generator growth.

The result of these modifications is the following positive definite Schur-type algorithm.

Algorithm 2. Let n_S = n and let ∆_Z(H) = AJA^T. Let r be the last column of H. Start with C = 0.

1. While n_S > 0:
2.   Partition the current n_S × 2 generator matrix A as

         A = [ a11  a12 ]                                              (18)
             [ a21  a22 ]

     where a11 and a12 are scalars.
3.   Scale A ← A [ d 0 ; 0 1/d ] so that ‖A(:, 1)‖_2 = ‖A(:, 2)‖_2.
4.   Put the generators into proper form using an orthogonal transformation

         A ← A [ c   s ]
               [ −s  c ]

     so that after the transformation a12 = 0 and a11 > 0.

5.   Let

         C(n − n_S + 1, n − n_S + 1 : n − 1) ← √(a11/a22(1)) a22^T

     and

         C(n − n_S + 1, n) ← r(1)/√(a11 a22(1)).

6.   Update A and r by

         A ← [ a21 − a11 v   a22 ],   r ← r(2 : n_S) − r(1) v,

     where

         v = [ a22(2 : n_S − 1)/a22(1) ]
             [ r(1)/(a22(1) a11)       ]

     and let n_S ← n_S − 1.

The fact that the algorithm computes C such that H = C^T C is easy to establish. Steps 3 and 4 apply transformations AS such that det(S) = 1 and SJS^T = J. After step 4, we have proper form generators for the current Schur complement. Thus (9) holds. Step 5 uses this relation and the first element of r in an obvious way to get the complete first row of the Cholesky factor of the current Schur complement. The update for A in step 6 is just (17) applied to generators in proper form. The update for r is a direct application of the expression for the last column of the Schur complement of H. The process proceeds recursively on the Schur complement of H in the usual manner.

4 Error Analysis

In this section we will outline a proof of the numerical stability of Algorithm 2. While we will only highlight a few key issues, the analysis can be made rigorous. The result is that if C is the computed Cholesky factor of H then

    ‖C^T C − H‖ ≤ ε f(n) ‖H‖

where ε is the machine precision and f(n) is a polynomial in n. Given the well known stability of plane rotations it should not be surprising that Step 5 of Algorithm 2 is stable so long as the generators do not satisfy ‖A‖^2 ≫ ‖H‖. In particular, it is straightforward to show that if Â is the transformed generator matrix after Step 5 and Ĥ is the Hankel-like matrix corresponding to the unmodified last row of H and the modified Â, then

    ‖H − Ĥ‖ ≤ ε f_1(n) ‖A‖^2                                           (19)

for some polynomial f_1(n). It follows that if ‖A‖^2 is not too large relative to ‖H‖ then the new generators correspond to a nearby matrix and step 5 will not lead to large relative backward errors.

The rest of the analysis breaks into three parts: showing that the generators do not become too large under Steps 3 and 6, that Step 5 is a stable way to recover the row of the Cholesky factor, and that Step 6 produces Schur complement generators satisfying a relation similar to (19) where H is an exact Schur complement. Going from last to first, we note that Step 6 can be viewed as an elimination step applied to the generator matrix, a step that also occurs in fast algorithms for the factorization of Cauchy-like matrices. Methods developed for the analysis of fast Cauchy factorization can be applied to show that this part of the algorithm is stable. Since Step 5 is just a scaling, the second part of the analysis is relatively straightforward. By far the most interesting part of the analysis is the first: we must show that the generators never grow so large as to destroy the stability of the algorithm.

In fact, Step 3 of the algorithm was introduced with the sole intention of provably avoiding generator growth. Whether or not Step 3 is actually used, it can be shown that the generators A^{(k)} after k iterations of the algorithm always satisfy

    ‖A^{(k)}(:, 1)‖_2 ‖A^{(k)}(:, 2)‖_2 ≤ ‖A^{(0)}(:, 1)‖_2 ‖A^{(0)}(:, 2)‖_2 + ‖C(1 : k, :)‖_F^2.

Since this product of column norms is bounded by a quantity of the same order of magnitude as ‖H‖, the only concern with generator growth is that there might be poor scaling of the generator columns, in which one column is very large and one is very small. The application of a plane rotation to an improperly scaled generator matrix would incur errors proportional to the larger column and could well introduce large backward errors. In contrast, even with poor scaling, the rescaling of Step 3 is always stable. Thus if Step 6 produces a poorly scaled generator matrix, Step 3 always fixes the problem before a plane rotation is applied. The rescaling is sufficient to prove stability, but it is not clear that it is necessary. An example of poor generator scaling destroying stability when the rescaling is not used has not been found; it is possible that Step 3 could be left out.

In practice, the complexity of the algorithm can be reduced by replacing the plane rotation with

    [ 1   0 ]    or    [ 0  −1 ] [ 1   0 ]
    [ −l  1 ]          [ 1   0 ] [ −l  1 ]

where the pivoting is done as needed to ensure that |l| < 1. A careful analysis suggests the potential for an imbalance in the scaling of the columns of the generators, thus making the normalization in step 3 necessary to ensure the stability of the plane rotation. However, in practice this does not seem to occur very often, if at all. The scaling does not seem to be necessary for stability in most cases.

As presented, Algorithm 2 requires about 8.5n^2 flops. Neglecting the norm computation of Step 3, it requires approximately 6.5n^2 flops.
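The complete recursion of Algorithm 2 can be sketched in code as follows. This is our transcription, not the authors' implementation: variable names follow the text, the termination of the recursion at n_S = 1 is our own handling, and the test matrix is a small positive definite Hankel matrix built as a Krylov product:

```python
import numpy as np

J = np.array([[0., -1.], [1., 0.]])

def hankel_psd_cholesky(h):
    """Sketch of Algorithm 2: generator recursion producing an upper
    triangular C with H = C^T C for a positive definite Hankel matrix."""
    h = np.asarray(h, dtype=float)
    n = (len(h) + 1) // 2
    H = np.array([[h[i + j] for j in range(n)] for i in range(n)])
    r = H[:, -1].copy()                      # last column of current Schur complement
    A = np.zeros((n, 2))                     # proper-form generators (4)
    A[0, 0] = np.sqrt(h[0])
    A[1, 1] = np.sqrt(h[0])
    A[2:, 1] = h[1:n - 1] / np.sqrt(h[0])
    C = np.zeros((n, n))
    for row in range(n):                     # row = n - n_S
        nS = n - row
        if nS == 1:                          # trailing 1x1 Schur complement
            C[row, n - 1] = np.sqrt(r[0])
            break
        # step 3: balance the generator columns (a symplectic diagonal scaling)
        d = np.sqrt(np.linalg.norm(A[:, 1]) / np.linalg.norm(A[:, 0]))
        A[:, 0] *= d
        A[:, 1] /= d
        # step 4: symplectic plane rotation restoring proper form: a12 = 0, a11 > 0
        a, b = A[0, 0], A[0, 1]
        rho = np.hypot(a, b)
        c, s = a / rho, b / rho
        A = A @ np.array([[c, -s], [s, c]])
        a11, a22 = A[0, 0], A[1:, 1]
        # step 5: read off a row of the Cholesky factor from (9)
        C[row, row:n - 1] = np.sqrt(a11 / a22[0]) * a22
        C[row, n - 1] = r[0] / np.sqrt(a11 * a22[0])
        # step 6: update of the generators (17) and of the last column
        v = np.concatenate([a22[1:] / a22[0], [r[0] / (a22[0] * a11)]])  # h21/h0
        A = np.column_stack([A[1:, 0] - a11 * v, a22])
        r = r[1:] - r[0] * v
    return C, H

h = np.array([3., 6., 14., 36., 98.])        # h_k = sum_{j=1..3} j^k, so H = K^T K > 0
C, H = hankel_psd_cholesky(h)
```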
If ‖·‖_∞ is used rather than ‖·‖_2 for the normalization, then the algorithm really can be implemented in 6.5n^2 flops, although searches are required to find the largest element in each column of A. Replacing the plane rotation with a pivoted lower triangular transformation reduces the computation further to 4.5n^2 flops. Eliminating step 3 altogether gives an algorithm that runs in 3.5n^2 flops. While all of these variations seem to have robust stability properties, none of them suggest a means for deriving the simple and natural backward error bounds that can be found for Algorithm 2.

5 More on the Look-Ahead Algorithm

When implemented with a stable method for inverting H11, Algorithm 1 forms the basis of a look-ahead algorithm. However, it is not complete. Further refinements are necessary for efficient implementation. To see why, let H represent a Schur complement at some generic iteration of Algorithm 1 and consider the block partitioning (10). To determine an appropriate look-ahead step we try to choose H11 to be m × m with ‖H11^{−1} H21^T‖ of moderate size. Estimating this norm for each possible size of H11 until an appropriate look-ahead step size is found can be costly. If the search were implemented in a naive manner the algorithm would have to invert H11 for each tested step size. If the algorithm required look-ahead step sizes that were some significant fraction of n, the resulting algorithm could require O(n^4) flops.

Fortunately, it is possible to guarantee that even in the worst case the number of required computations is O(n^3), with O(n^2) being more typical if the look-ahead step size can be bounded independent of n. The method uses the fact that for a displacement rank 2 Hankel-like matrix, H11^{−1} H21^T is Toeplitz-like with displacement rank 3. To demonstrate and exploit this fact, we use a general theorem describing a single step of a Schur algorithm for a rectangular matrix containing both Toeplitz-like and Hankel-like blocks.
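The naive search just described can be sketched as follows. The threshold tau and the singularity guard are our choices, purely for illustration; this dense version costs O(n^4) in the worst case, which is exactly what the fast estimation scheme of this section avoids:

```python
import numpy as np

def lookahead_step(H, tau=10.0):
    """Pick the smallest block size m with ||H11^{-1} H21^T|| <= tau
    (dense, naive version of the criterion of Sect. 5)."""
    n = H.shape[0]
    for m in range(1, n):
        H11 = H[:m, :m]
        H21 = H[m:, :m]
        if abs(np.linalg.det(H11)) < 1e-12:
            continue  # (nearly) singular leading block: look further ahead
        X = np.linalg.solve(H11, H21.T)     # H11^{-1} H21^T
        if np.linalg.norm(X) <= tau:
            return m
    return n  # no acceptable block: eliminate the whole matrix at once

# an indefinite Hankel matrix whose 1x1 leading block is tiny,
# so a scalar pivot would cause severe growth
h = np.array([1e-8, 1., 0., 1., 0., 1., 0.])
n = 4
H = np.array([[h[i + j] for j in range(n)] for i in range(n)])
m = lookahead_step(H)
```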

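The displacement rank fact just stated can be checked numerically. This sketch (ours) verifies that X = H11^{−1} H21^T has Toeplitz displacement rank at most 3 for a random Hankel matrix:

```python
import numpy as np

n, m = 9, 4
rng = np.random.default_rng(1)
h = rng.standard_normal(2 * n - 1)
H = np.array([[h[i + j] for j in range(n)] for i in range(n)])
H11, H21 = H[:m, :m], H[m:, :m]
X = np.linalg.solve(H11, H21.T)          # m x (n - m)
Zm = np.diag(np.ones(m - 1), -1)
Znm = np.diag(np.ones(n - m - 1), -1)
disp = X - Zm @ X @ Znm.T                # Toeplitz displacement of X
rank = np.linalg.matrix_rank(disp, tol=1e-8 * np.linalg.norm(disp))
```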
Theorem 1. For

    K = [ H ]
        [ T ]

consider the mixed Toeplitz-Hankel displacement equation

    [ Z_n  0   ] K − [ I_n  0   ] K Z_n^T = [ B ] J_{HT} G^T
    [ 0    I_m ]     [ 0    Z_m ]           [ D ]

where all the matrices are real, H is n × n and symmetric, T is m × n, Z_n and Z_m are n × n and m × m shift matrices, and J_{HT} is a general matrix that may include both a symmetric and a skew-symmetric component. Assume that H(1, 1) ≠ 0. Let

    H = [ h11  h21^T ],  T = [ t1  T2 ],  B = [ b1 ],  G = [ g1 ]
        [ h21  H22   ]                        [ B2 ]       [ G2 ]

where b1 and g1 are row vectors. Also let

    K_S = [ H22 ] − (1/h11) [ h21 ] h21^T
          [ T2  ]           [ t1  ]

so that K_S is the Schur complement of K. If

    B_S = B2 − (1/h11) h21 b1,                                         (20)

    D_S = D − (1/h11) Z_m t1 b1                                        (21)

and

    G_S = G2 − (1/h11) h21 g1,                                         (22)

then

    [ Z_{n−1}  0   ] K_S − [ I_{n−1}  0   ] K_S Z_{n−1}^T = [ B_S ] J_{HT} G_S^T
    [ 0        I_m ]       [ 0        Z_m ]                 [ D_S ]

where Z_{n−1} is a (n − 1) × (n − 1) shift matrix.

Proof: The proof is by direct verification that the displacement of K_S has the required factorization. Note that H is Hankel-like. If we start with the factorization Z_n H − H Z_n^T = B J_{HT} G^T instead of Z_n H − H Z_n^T = AJA^T, then, as before, (12) shows that

    Z_{n−1} (H22 − (1/h11) h21 h21^T) − (H22 − (1/h11) h21 h21^T) Z_{n−1}^T = B_S J_{HT} G_S^T

whatever the choice of J_{HT}. Thus if we define

    T_S = T2 − (1/h11) t1 h21^T,

it is sufficient to verify that T_S − Z_m T_S Z_{n−1}^T = D_S J_{HT} G_S^T. Since

    D J_{HT} G^T = T − Z_m T Z_n^T
                 = [ t1  T2 ] − Z_m [ t1  T2 ] [ 0  e1^T      ]
                                               [ 0  Z_{n−1}^T ]
                 = [ t1   T2 − Z_m T2 Z_{n−1}^T − Z_m t1 e1^T ],

we have T2 − Z_m T2 Z_{n−1}^T = D J_{HT} G2^T + Z_m t1 e1^T. Consequently

    T_S − Z_m T_S Z_{n−1}^T
        = D J_{HT} G2^T + Z_m t1 e1^T + (1/h11) Z_m t1 h21^T Z_{n−1}^T − (1/h11) t1 h21^T
        = D J_{HT} G2^T + (1/h11) Z_m t1 [ h11  h21(1 : n − 2)^T ] − (1/h11) t1 h21^T.

From the displacement equation, t1 = D J_{HT} g1^T. Since the first row of Z_n H is zero, the displacement also gives

    b1 J_{HT} G2^T = −[ h11  h21(1 : n − 2)^T ].

Thus

    T_S − Z_m T_S Z_{n−1}^T
        = D J_{HT} G2^T − (1/h11) Z_m t1 b1 J_{HT} G2^T − (1/h11) D J_{HT} g1^T h21^T
        = (D − (1/h11) Z_m t1 b1) J_{HT} (G2 − (1/h11) h21 g1)^T

where the last equality follows because the skew-symmetry of Z_n H − H Z_n^T implies that b1 J_{HT} g1^T = 0.

Given a Hankel-like matrix H obtained as a Schur complement at some iteration in Algorithm 1 and partitioned as in (10), Theorem 1 gives a Schur algorithm for the matrix

         [ H11  H21^T ]
    K := [ H21  H22   ] = [ H       ]
         [ I_m  0     ]   [ I_m  0  ]

If H11 is m × m then after m steps of elimination on this matrix we have a Schur complement

    K_S = [ H22 − H21 H11^{−1} H21^T ]                                 (23)
          [ −H11^{−1} H21^T          ]

Suppose that Z_n H − H Z_n^T = AJA^T for A = [ A1 ; A2 ] where A1 has m rows. Then the matrix K satisfies

    [ Z_n  0   ] K − [ I_n  0   ] K Z_n^T = [ AJA^T   ] = [ B ] J_{HT} G^T
    [ 0    I_m ]     [ 0    Z_m ]           [ e1 e1^T ]   [ D ]

where

    B = [ A  0_{n,1} ],  D = [ 0_{m,2}  e1 ],  G = [ A  e1 ]

and

    J_{HT} = [ 0  −1  0 ]
             [ 1   0  0 ]
             [ 0   0  1 ]

Although the K_S shown in (23) is not the same as the one in Theorem 1, in which H11 is assumed to be 1 × 1, we can obtain generators for (23) by recursive application of (20), (21) and (22). This gives generators B_S, D_S and G_S for the displacement of K_S. Thus

    −H11^{−1} H21^T − Z_m (−H11^{−1} H21^T) Z_{n−m}^T = D_S J_{HT} G_S^T        (24)

and clearly H11^{−1} H21^T is a Toeplitz-like matrix with displacement rank 3. The displacement operator is invertible, with the inverse given by the formula

                        min(m−1, n−m−1)
    H11^{−1} H21^T = −        Σ         (Z_m)^j D_S J_{HT} G_S^T (Z_{n−m}^T)^j.   (25)
                             j=0

The following algorithm computes generators for H11^{−1} H21^T.

Algorithm 3. For an n × n H satisfying Z_n H − H Z_n^T = AJA^T, start with

    B = [ A  0_{n,1} ],  D = [ 0_{m,2}  e1 ],  G = [ A  e1 ]

and let r be the last column of H. For j = 1, 2, . . . , m:

1. For B, D and G partitioned as in Theorem 1, compute h11 and h21 from

       h11 = −b1 J_{HT} G(2, :)^T,
       h21(1 : n − j − 1)^T = −b1 J_{HT} G(3 : n − j + 1, :)^T,
       h21(n − j) = r(1).

   Compute t1 from t1 = D J_{HT} g1^T.

2. Update B, D and G using (20), (21) and (22).

3. Update r using

       r ← r(2 : n − j + 1) − (1/h11) h21 r(1).

4. Let j ← j + 1 and go to 1.

At the end of this process we have D and G satisfying (24). The matrix H11^{−1} H21^T can be obtained from (25). To estimate a step size for which ‖H11^{−1} H21^T‖ is suitably small, we can apply Algorithm 3 until a suitable step size is found. Clearly each element of ∆ := −D_S J_{HT} G_S^T can be computed in a number of flops that depends on the displacement rank and not on n or m. At each iteration of Algorithm 3, the matrix T := H11^{−1} H21^T can be computed in O(mn) flops using the relation T_{i,j} = ∆_{i,j} + T_{i−1,j−1}, which is satisfied by any matrix T such that T − Z_m T Z_{n−m}^T = ∆. Thus, even if the final look-ahead step size is m = n − 1, the extra computation involved in computing T over all iterations of Algorithm 3 is O(n^3).

6 General Hankel-like Matrices

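The constructions above all rest on the basic fact that the displacement Z_n H - H Z_n^T of a Hankel matrix has rank 2 and is skew-symmetric. The following small numpy check (an illustrative sketch, not part of the paper's algorithms) verifies this for a generic Hankel matrix:

```python
import numpy as np

def shift_matrix(n):
    """Lower shift matrix Z_n: ones on the first subdiagonal."""
    return np.eye(n, k=-1)

def displacement(H):
    """The displacement operator used throughout: Z H - H Z^T."""
    Z = shift_matrix(H.shape[0])
    return Z @ H - H @ Z.T

# Build a generic n x n Hankel matrix H[i, j] = h[i + j].
n = 6
rng = np.random.default_rng(0)
h = rng.standard_normal(2 * n - 1)
H = np.array([[h[i + j] for j in range(n)] for i in range(n)])

Delta = displacement(H)
print(np.linalg.matrix_rank(Delta))   # 2: only the first row and column survive
print(np.allclose(Delta, -Delta.T))   # True: the displacement is skew-symmetric
```

The interior entries cancel because H(i-1, j) = H(i, j-1) for a Hankel matrix, so only the first row and first column of the displacement are nonzero.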
Thus

    - H11^{-1} H21^T + Z_m H11^{-1} H21^T Z_{n-m}^T = D_S J_HT G_S^T,        (24)

and clearly H11^{-1} H21^T is a Toeplitz-like matrix with displacement rank 3. The displacement operator is invertible, with the inverse given by the formula

    H11^{-1} H21^T = - sum_{j=0}^{min(m-1, n-m-1)} Z_m^j D_S J_HT G_S^T (Z_{n-m}^T)^j.        (25)

The following algorithm computes generators for H11^{-1} H21^T.

Algorithm 3. For n x n H satisfying Z_n H - H Z_n^T = A J A^T, start with

    B = [ A  0_{n,1} ],    D = [ 0_{m,2}  e_1 ],    G = [ A  e_1 ]

and let r be the last column of H. For j = 1, 2, ..., m:

1. For B, D and G partitioned as in Theorem 1, compute h11 and h21 from

       h11 = - b1^T J_HT G(2,:)^T,
       h21(1:n-j-1)^T = - b1^T J_HT G(3:n-j+1,:)^T,
       h21(n-j) = r(1).

   Thus h21 is a vector of length n-j. Compute t1 from

       t1 = D J_HT g1.

2. Update B, D and G using (20), (21) and (22).

3. Update r using

       r <- r(2:n-j+1) - (1/h11) r(1) h21.

4. Let j <- j + 1 and go to 1.

At the end of this process we have D_S and G_S satisfying (24), and the matrix H11^{-1} H21^T can be obtained from (25).

To estimate a step size for which ||H11^{-1} H21^T|| is suitably small, we can apply Algorithm 3 until a suitable step size is found. Clearly each element of Delta := - D_S J_HT G_S^T can be computed in a number of flops that depends on the displacement rank and not on n or m. At each iteration of Algorithm 3, the matrix T := H11^{-1} H21^T can be computed in O(mn) flops using the relation T_{i,j} = Delta_{i,j} + T_{i-1,j-1}, which is satisfied by any matrix T such that T - Z_m T Z_{n-m}^T = Delta. Thus, even if the final look-ahead step size is m = n-1, the extra computation involved in computing T over all iterations of Algorithm 3 is O(n^3).

6 General Hankel-like Matrices

Consider the case in which the displacement Delta_Z(H) = Z H - H Z^T has higher rank. In Sect. 2 we verified that if Delta_Z(H) has rank equal to 2 then it has a decomposition of the form (3). The following theorem shows that a similar decomposition is possible in the more general case.

Theorem 2. Let Delta be a real, skew-symmetric n x n matrix with r = rank(Delta) > 0. Then Delta has the factorization

    Delta = [ B1  B2  ...  B_{r/2} ] diag( J, J, ..., J ) [ B1  B2  ...  B_{r/2} ]^T,        (26)

where each diagonal block is J = [ 0 1 ; -1 0 ] and the B_i are real n x 2 matrices.

Proof: The decomposition can be constructed using a skew-symmetric Gaussian elimination procedure with 2 x 2 pivots [5] of the form

    [  0     delta_1 ]
    [ -delta_1  0    ].

If the displacement Delta_Z(H) is available, it is possible to use r/2 steps of a skew-symmetric elimination procedure [5] to get a low rank factorization of this form. However, because of potential sensitivity in the Schur complement to be truncated [12], it is important to use a complete pivoting strategy.

It is easy to show that there exists a permutation P such that

    P diag( J, J, ..., J ) P^T = J_r := [ 0         I_{r/2} ]
                                        [ -I_{r/2}  0       ].

Thus Theorem 2 implies that for any real skew-symmetric Delta_Z(H) of rank r there exists a matrix A := [ A1  A2 ], where A1 and A2 are n x r/2 matrices, satisfying Delta_Z(H) = A J_r A^T. Note that this notation is different from the notation of (11).

We introduce two types of elementary orthogonal symplectic matrices [18]. The first is the Householder symplectic matrix of the form

    R_H = [ R1  0  ]        where  R1 = I_{r/2} - 2 u u^T / (u^T u).
          [ 0   R1 ]

The second is the Jacobi symplectic rotation of the form

    R_J = [ C1   S1 ]
          [ -S1  C1 ]

where C1 and S1 are diagonal matrices of the form

    C1 = diag(1, ..., 1, c, 1, ..., 1),    S1 = diag(0, ..., 0, s, 0, ..., 0),

with c and s in the k-th position and real c and s satisfying c^2 + s^2 = 1.

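A minimal numerical sketch of these two families of transformations (a hypothetical numpy illustration; the function names are ours). R_H applies one reflector to both halves of the generator, while R_J mixes the k-th entries of the two halves; both are orthogonal and preserve J_r:

```python
import numpy as np

def householder_symplectic(u, m):
    """R_H = diag(R1, R1) with R1 = I - 2 u u^T / (u^T u)."""
    u = u.reshape(-1, 1)
    R1 = np.eye(m) - 2.0 * (u @ u.T) / float(u.T @ u)
    Z = np.zeros((m, m))
    return np.block([[R1, Z], [Z, R1]])

def jacobi_symplectic(c, s, k, m):
    """R_J = [[C1, S1], [-S1, C1]] with c, s in position k, c^2 + s^2 = 1."""
    C1 = np.eye(m); C1[k, k] = c
    S1 = np.zeros((m, m)); S1[k, k] = s
    return np.block([[C1, S1], [-S1, C1]])

m = 4  # m plays the role of r/2
Jr = np.block([[np.zeros((m, m)), np.eye(m)],
               [-np.eye(m), np.zeros((m, m))]])

rng = np.random.default_rng(1)
RH = householder_symplectic(rng.standard_normal(m), m)
theta = 0.3
RJ = jacobi_symplectic(np.cos(theta), np.sin(theta), 2, m)

for Q in (RH, RJ):
    print(np.allclose(Q.T @ Q, np.eye(2 * m)))  # orthogonal
    print(np.allclose(Q @ Jr @ Q.T, Jr))        # symplectic: Q Jr Q^T = Jr
```

Because the same reflector acts on both halves (and the rotation mixes matching diagonal positions), symplecticity Q J_r Q^T = J_r is automatic, which is what prevents generator growth.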
These elementary transformations may be used to zero elements in a vector. In particular,

    [ x^T  y^T ] [ R1  0  ] [ C1   S1 ]  =  [ xhat^T  0 ]
                 [ 0   R1 ] [ -S1  C1 ]

if y^T R1 = ||y|| e_1^T and

    [ x^T R1 e_1   ||y|| ] [  c  s ]  =  [ sqrt( (x^T R1 e_1)^2 + ||y||^2 )   0 ].
                           [ -s  c ]

If xhat^T R2 = ||xhat|| e_1^T, then

    [ x^T  y^T ] [ R1  0  ] [ C1   S1 ] [ R2  0  ]  =  [ sqrt( ||x||^2 + ||y||^2 ) e_1^T   0 ].
                 [ 0   R1 ] [ -S1  C1 ] [ 0   R2 ]

The usual methods for computing plane rotations and Householder transformations [7] ensure that these relations hold and that the transformations are numerically stable.

6.1 Factorization Algorithms

We partition H as in (10) and let

    [ A1  A2 ] = [ A11  A12 ]
                 [ A21  A22 ].

If H_S = H22 - H21 H11^{-1} H21^T, then the arguments from Sect. 3 show that

    Delta_Z(H_S) = A_S J_r A_S^T,

where

    A_S = [ A21 - H21 H11^{-1} A11    A22 - H21 H11^{-1} A12 ].        (27)

It is possible to compute H11 and H21 from A using (7). Thus Algorithm 1 can be adapted to an arbitrary displacement rank by using (27) in step 5 instead of (14). As before, this is potentially unstable; a small perturbation to the matrix from Example 1 can be chosen to give a displacement rank 4 Hankel-like matrix for which the modified version of Algorithm 1 fails. All of the results of Sect. 5 apply to the larger displacement rank case by adding extra J blocks to J_HT. Consequently, a completely general look-ahead algorithm is not significantly different from the displacement rank 2 version.

As before, it is possible to get a provably stable algorithm for the positive definite case. In extending Algorithm 2, we note that any transformation S for which S J_r S^T = J_r may be applied to A to get an equivalent set of generators Ahat = A S such that Delta_Z(H) = Ahat J_r Ahat^T. The matrices satisfying S J_r S^T = J_r are known as symplectic matrices. In order to prevent any possibility of generator growth, we will make use of the group of real orthogonal symplectic transformations

    { Q = [ Q11   Q12 ] in R^{r x r}  :  Q^T Q = I_r,  Q11, Q12 in R^{r/2 x r/2} }.
          [ -Q12  Q11 ]

It is trivial to verify that any matrix of the specified form is both orthogonal and symplectic. Such matrices have applications in the study of Hamiltonian eigenvalue problems [18, 6]. By substituting this combination of transformations for the single rotation of Algorithm 2, we get the following positive definite Hankel-like Schur algorithm.

Algorithm 4. Start with A1 and A2 such that

    Delta_Z(H) = [ A1  A2 ] J_r [ A1  A2 ]^T.

Let r be the last column of H, and let n_S = n and C = 0.

1. While n_S > 0:

2. Let the current n_S x r generator matrix A be partitioned as

       [ A1  A2 ] = [ a11  a12  a13  a14 ]
                    [ a21  A22  a23  A24 ],

   where a11 and a13 are scalars and a12 and a14 are both length r/2 - 1 row vectors.

3. Scale A1(:,1) <- d A1(:,1) and A2(:,1) <- (1/d) A2(:,1), where d is chosen so that ||A1(:,1)||_2 = ||A2(:,1)||_2.

4. Put the generators into proper form using a symplectic orthogonal transformation of the form

       A <- A [ P1  0  ] [ C   S ] [ P2  0  ]
              [ 0   P1 ] [ -S  C ] [ 0   P2 ]

   so that after the transformation

       A = [ a11  0    0    0   ]
           [ a21  A22  a23  A24 ]

   with a11 > 0.

5. Let

       C(n - n_S + 1, n - n_S + 1 : n - 1) <- sqrt( a11 / a23(1) ) a23^T

   and

       C(n - n_S + 1, n) <- - r(1) / sqrt( a11 a23(1) ).

6. Update A and r by

       A <- [ a21 - a11 u    A22  a23  A24 ],
       r <- r(2 : n_S) - r(1) u,

   where

       u = [ a23(2 : n_S - 1) / a23(1) ]
           [ r(1) / ( a23(1) a11 )     ],

   and let n_S <- n_S - 1.

Verifying that Algorithm 4 computes C such that C^T C = H is not substantially different from the verification of Algorithm 2. Steps 3 and 4 apply orthogonal symplectic transformations S such that S J_r S^T = J_r. After step 4, the generators are in proper form. Using the obvious generalization of (8),

    H(1, 1:n-1) = - a11 a23^T,

and the first element of r, we get the expressions for the first row of the Cholesky factor of the current Schur complement. The update for A in step 6 is just (27) applied to generators in proper form. The update for r is the same as in Algorithm 2. The process then proceeds recursively on the Schur complement of H in the usual manner. The analysis outlined in Sect. 4 applies to the more general algorithm with only minor modification.

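The reduction to proper form in step 4 can be sketched as follows. This is an illustrative reconstruction (the function names and the choice of rotation position are ours, not the paper's): a reflector maps y to a multiple of e_1, one rotation folds that entry into the x half, and a second reflector condenses the x half, so a row pair [x^T y^T] becomes [ sqrt(||x||^2 + ||y||^2) e_1^T  0 ].

```python
import numpy as np

def householder_to_e1(v):
    """Symmetric orthogonal R with v^T R = ||v|| e_1^T."""
    m = v.size
    w = v.astype(float).copy()
    w[0] -= np.linalg.norm(v)
    nw = np.linalg.norm(w)
    if nw < 1e-15:            # v is already a positive multiple of e_1
        return np.eye(m)
    w /= nw
    return np.eye(m) - 2.0 * np.outer(w, w)

def proper_form_rowpair(x, y):
    """Reduce the row [x^T y^T] using diag(R,R) factors and one Jacobi
    symplectic rotation acting on the leading entry of each half."""
    R1 = householder_to_e1(y)
    xt, yt = x @ R1, y @ R1            # yt = ||y|| e_1^T
    rho = np.hypot(xt[0], yt[0])
    c, s = ((xt[0] / rho, -yt[0] / rho) if rho > 0 else (1.0, 0.0))
    x2, y2 = xt.copy(), yt.copy()
    x2[0] = c * xt[0] - s * yt[0]      # becomes rho >= 0
    y2[0] = s * xt[0] + c * yt[0]      # becomes 0
    R2 = householder_to_e1(x2)
    return x2 @ R2, y2

rng = np.random.default_rng(2)
x, y = rng.standard_normal(5), rng.standard_normal(5)
xe, ye = proper_form_rowpair(x, y)
print(np.allclose(ye, 0.0))                     # second half zeroed
print(np.allclose(xe[1:], 0.0))                 # first half condensed to e_1
```

Since every factor is orthogonal, the leading entry of the result equals sqrt(||x||^2 + ||y||^2); no generator growth is possible.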
7 Conclusions

We have described displacement structure algorithms for the factorization of Hankel-like matrices. The algorithms for positive definite matrices use orthogonal symplectic transformations and admit a reasonably straightforward error analysis when a scaling step is used to prevent a potential imbalance in the scaling of the generator column vectors. We have not been able to show that this scaling step is necessary for stability; in practice it can often be skipped without introducing excessive backward errors.

The algorithm for indefinite matrices uses a very simple look-ahead step that follows easily from the displacement formulation. The approach is much simpler than others that have been proposed in the literature. We have also suggested the size of ||H11^{-1} H21^T|| as a criterion for choosing a look-ahead step size and shown how to estimate this quantity efficiently using a Schur algorithm for a mixed Hankel-Toeplitz displacement. Experiments suggest that controlling the size of this quantity is sufficient for numerical stability. Results on block Gaussian elimination suggest that this is also likely to be necessary for stability.

References

[1] E. R. Berlekamp. Algebraic Coding Theory. McGraw-Hill, New York, 1968.

[2] A. W. Bojanczyk, R. P. Brent, F. R. De Hoog, and D. R. Sweet. On the stability of the Bareiss and related Toeplitz factorization algorithms. SIAM J. Matrix Anal. Appl., 16:40–57, 1995.

[3] A. W. Bojanczyk and G. Heinig. A multi-step algorithm for Hankel matrices. Journal of Complexity, 10:142–164, 1994.

[4] R. P. Brent, F. G. Gustavson, and D. Y. Y. Yun. Fast solution of Toeplitz systems of equations and computation of Padé approximants. Journal of Algorithms, 1:259–295, 1980.

[5] J. R. Bunch. A note on the stable decomposition of skew symmetric matrices. Mathematics of Computation, 158:475–480, 1982.

[6] R. Byers. A Hamiltonian QR algorithm. SIAM Journal on Scientific and Statistical Computing, 7:212–229, 1986.

[7] G. H. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, Maryland, 3rd edition, 1996.

[8] I. Gohberg, T. Kailath, and V. Olshevsky. Fast Gaussian elimination with partial pivoting for matrices with displacement structure. Mathematics of Computation, 64:1557–1576, 1995.

[9] I. Gohberg and V. Olshevsky. Fast state space algorithms for matrix Nehari and Nehari-Takagi interpolation problems. Integral Equations and Operator Theory, 20(1):44–83, 1994.

[10] M. H. Gutknecht. Stable row recurrences for the Padé table and generically superfast look-ahead solvers for non-hermitian Toeplitz systems. Research Report 92-14, Interdisciplinary Project Center for Supercomputing, Eidgenössische Technische Hochschule Zürich, 1992.

[11] G. Heinig and K. Rost. Algebraic Methods for Toeplitz-like Matrices and Operators. Operator Theory, 13, Birkhäuser, Basel, 1984.

[12] N. J. Higham. Analysis of the Cholesky decomposition of a semi-definite matrix. In M. G. Cox and S. J. Hammarling, editors, Reliable Numerical Computation. Oxford University Press, 1989.

[13] N. J. Higham. Accuracy and Stability of Numerical Algorithms. SIAM, Philadelphia, 1996.

[14] T. Kailath, S. Kung, and M. Morf. Displacement ranks of matrices and linear equations. J. Math. Anal. Appl., 68:395–407, 1979.

[15] T. Kailath and A. H. Sayed. Displacement structure: theory and applications. SIAM Review, 37(3):297–386, 1995.

[16] V. Olshevsky. Pivoting for structured matrices with applications. Preprint, 1997. To appear in Linear Algebra and Its Applications.

[17] T. Kailath and V. Olshevsky. Diagonal pivoting for partially reconstructible Cauchy-like matrices, with applications to Toeplitz-like linear equations and to boundary rational matrix interpolation problems. Linear Algebra and Its Applications, 254:251–302, 1997.

[18] C. C. Paige and C. Van Loan. A Schur decomposition for Hamiltonian matrices. Linear Algebra and its Ap- plications, 41:11–32, 1981.

[19] D. Pal and T. Kailath. Fast triangular factorization and inversion of Hankel and related matrices with arbitrary rank profile. SIAM Journal on Matrix Analysis and Ap- plications, 15:451–478, 1994.

[20] J. L. Phillips. The triangular decomposition of Hankel matrices. Mathematics of Computation, 25:599–602, 1971.

[21] J. Rissanen. Algorithms for triangular decomposition of block Hankel and Toeplitz matrices with application to factoring positive matrix polynomials. Mathematics of Computation, 27:147–154, 1973.

[22] W. F. Trench. An algorithm for inversion of finite Han- kel matrices. J. SIAM, 13:1102–1107, 1965.

[23] E. Tyrtyshnikov. How bad are Hankel matrices? Nu- merische Mathematik, 67:261–269, 1994.

[24] J. H. Wilkinson. The Algebraic Eigenvalue Problem. Clarendon Press, Oxford, England, 1965.