MM Research Preprints, No. 24, pp. 170–182. KLMM, AMSS, Academia Sinica, December 2005

Fast Implementation of Low Rank Approximation of a Sylvester Matrix 1)

Bingyu Li, Zhuojun Liu and Lihong Zhi
Key Laboratory of Mathematics Mechanization, AMSS, Beijing 100080, China
{liby,zliu,lzhi}@mmrc.iss.ac.cn

Abstract. In [16], the authors described an algorithm based on Structured Total Least Norm (STLN) for constructing a Sylvester matrix of given lower rank and obtaining the nearest perturbed polynomials with an exact GCD of given degree. For their algorithm, the overall computation time depends on solving a sequence of least squares (LS) problems. In this paper, a fast implementation for solving these LS problems is proposed. The increased efficiency is obtained by exploiting the low displacement rank of the involved coefficient matrices.

1. Introduction

The authors of [16] described an algorithm based on STLN [23, 21] for constructing a Sylvester matrix of given lower rank and obtaining the nearest perturbed polynomials with an exact GCD of given degree. For their algorithm, the overall computation time depends on solving a sequence of least squares (LS) problems, and the algorithm has a complexity of $O(st^2)$, where $s = 2m+2n-k+3$ and $t = 2m+2n-2k+3$. Motivated by previous work [8, 27, 17] on the displacement structure of the Sylvester matrix, in this paper a fast implementation for solving these LS problems is proposed; it has quadratic complexity $O(s^2)$. The increased efficiency is obtained by exploiting the low displacement rank of the involved coefficient matrices. The fast implementation is more efficient especially when $k$ is not large.

The organization of this paper is as follows. In Section 2, we recall some results on the computation of the nearest perturbed polynomials described in [16]. In the following two sections, we propose a fast algorithm for solving the involved LS problems efficiently and derive a forward error analysis, respectively. In Section 5 we give numerical tests showing the performance of the fast algorithm. We give some concluding remarks in the last section.

2. Preliminaries

Given two polynomials $a, b \in \mathbb{R}[x]$ with $a = a_m x^m + \cdots + a_1 x + a_0$ and $b = b_n x^n + \cdots + b_1 x + b_0$, $a_m \neq 0$, $b_n \neq 0$, let $S$ be the Sylvester matrix of $a$ and $b$. The perturbations of $a$ and $b$ are denoted by
$$\Delta a = \Delta a_m x^m + \cdots + \Delta a_1 x + \Delta a_0, \qquad \Delta b = \Delta b_n x^n + \cdots + \Delta b_1 x + \Delta b_0,$$
respectively. We consider the minimal perturbation problem: minimize $\|\Delta a\|_2^2 + \|\Delta b\|_2^2$ subject to the constraint that $a + \Delta a$ and $b + \Delta b$ have an exact GCD of a given degree.

1) The work is partially supported by a National Key Basic Research Project of China 2004CB318000 and the Chinese National Science Foundation under Grants 10371127 and 10401035.

Denote by $S_k = [\mathbf{a}\;\; A_k] \in \mathbb{R}^{(m+n-k+1)\times(m+n-2k+2)}$ the $k$-th Sylvester matrix,
$$S_k = \begin{bmatrix}
a_m & 0 & \cdots & 0 & 0 & b_n & 0 & \cdots & 0 & 0 \\
a_{m-1} & a_m & \cdots & 0 & 0 & b_{n-1} & b_n & \cdots & 0 & 0 \\
\vdots & \vdots & \cdots & \vdots & \vdots & \vdots & \vdots & \cdots & \vdots & \vdots \\
0 & 0 & \cdots & a_0 & a_1 & 0 & 0 & \cdots & b_0 & b_1 \\
0 & 0 & \cdots & 0 & a_0 & 0 & 0 & \cdots & 0 & b_0
\end{bmatrix}, \qquad (2.1)$$
where the first block consists of $n-k+1$ columns built from $a$ and the second block of $m-k+1$ columns built from $b$; $\mathbf{a}$ is the first column of $S_k$ and $A_k$ consists of the last $m+n-2k+1$ columns of $S_k$. The perturbations $\Delta a$ and $\Delta b$ are expressed by an $(m+n+2)$-dimensional vector $d$,

$$d = [d_1, d_2, \ldots, d_{m+n+1}, d_{m+n+2}]^T.$$
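For concreteness, the block pattern of (2.1) is easy to assemble. Below is a minimal numpy sketch (the helper name sylvester_k is ours, not from [16]), with coefficient vectors given highest degree first:

```python
import numpy as np

def sylvester_k(a, b, k):
    """Assemble the k-th Sylvester matrix S_k of (2.1).

    a = [a_m, ..., a_1, a_0], b = [b_n, ..., b_1, b_0] (highest degree
    first).  S_k stacks n-k+1 shifted copies of a next to m-k+1 shifted
    copies of b and has shape (m+n-k+1) x (m+n-2k+2).
    """
    m, n = len(a) - 1, len(b) - 1
    S = np.zeros((m + n - k + 1, m + n - 2 * k + 2))
    for j in range(n - k + 1):            # columns built from a
        S[j:j + m + 1, j] = a
    for j in range(m - k + 1):            # columns built from b
        S[j:j + n + 1, n - k + 1 + j] = b
    return S

# k = 1 recovers the classical (m+n) x (m+n) Sylvester matrix S(a, b),
# e.g. for a = x^2 - 1 and b = x^2 + 2x + 1:
S1 = sylvester_k(np.array([1., 0., -1.]), np.array([1., 2., 1.]), 1)
```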

The $k$-th Sylvester structured perturbation of $S_k$ is represented as $[\Delta\mathbf{a}\;\; D_k]$.

Theorem 1. Given univariate polynomials $a(x), b(x) \in \mathbb{R}[x]$ with $\deg(a) = m$ and $\deg(b) = n$, let $S(a,b)$ be the Sylvester matrix of $a(x)$ and $b(x)$, and let $S_k$ be the $k$-th Sylvester matrix, $1 \le k \le \min(m,n)$. Then the following statements are equivalent:

(1) $\mathrm{rank}(S) \le m+n-k$.

(2) The rank deficiency of $S_k$ is greater than or equal to one.

Theorem 2 [16]. Given univariate polynomials $a(x), b(x) \in \mathbb{R}[x]$ with $\deg(a) = m$, $\deg(b) = n$, and a positive integer $k \le \min(m,n)$, suppose $S_k$ is the $k$-th Sylvester matrix of $a$ and $b$. Partition $S_k = [\mathbf{a}\;\; A_k]$, where $\mathbf{a}$ is the first column of $S_k$ and $A_k$ consists of the last $m+n-2k+1$ columns of $S_k$. Then we have
$$\dim \mathrm{Nullspace}(S_k) \ge 1 \iff A_k x = \mathbf{a} \text{ has a solution.}$$

Theorem 3 [16]. Given integers $m$, $n$ and $k$, $k \le \min(m,n)$, there exists a Sylvester matrix $S \in \mathbb{R}^{(m+n)\times(m+n)}$ with rank $m+n-k$.

Based on the above theorems, for a given degree $k$ it is always possible to find a $k$-th Sylvester structured perturbation matrix $[\Delta\mathbf{a}\;\; D_k]$ such that $\mathbf{a} + \Delta\mathbf{a} \in \mathrm{Range}(A_k + D_k)$. The minimal perturbation problem can be formulated as the following equality-constrained least squares problem:

$$\min_x \|d\|_2, \quad \text{subject to } r = 0, \qquad (2.2)$$
where the structured residual $r$ is given by

$$r = \mathbf{a} + \Delta\mathbf{a} - (A_k + D_k)x.$$

Applying the penalty method [1], we transform (2.2) into
$$\min_{d,x}\left\|\begin{bmatrix} wr \\ d \end{bmatrix}\right\|_2, \qquad (2.3)$$
where $w$ is a large penalty value. The vectors $\Delta\mathbf{a}$ and $D_k x$ can be expressed as

$$\Delta\mathbf{a} = P_k d, \qquad D_k x = X_k d.$$

Here
$$P_k = \begin{bmatrix} I_{m+1} & 0 \\ 0 & 0 \end{bmatrix},$$
where $I_{m+1}$ is an identity matrix of order $m+1$, and
$$X_k = \begin{bmatrix}
0 & & & x_{n+1-k} & & \\
x_1 & \ddots & & x_{n+2-k} & \ddots & \\
\vdots & \ddots & 0 & \vdots & \ddots & x_{n+1-k} \\
x_{n-k} & & x_1 & x_{m+n+1-2k} & & x_{n+2-k} \\
& \ddots & \vdots & & \ddots & \vdots \\
& & x_{n-k} & & & x_{m+n+1-2k}
\end{bmatrix},$$
where the first block consists of $m+1$ columns and the second block of $n+1$ columns.
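Rather than transcribing the entry pattern of $X_k$ directly, one can exploit its defining property $D_k x = X_k d$: the structure map $d \mapsto [\Delta\mathbf{a}\;\; D_k]$ is linear in $d$, so the $j$-th column of $X_k$ equals $D_k(e_j)\,x$. A hypothetical sketch (reusing sylvester_k from above, and assuming $d$ stacks the coefficients of $\Delta a$ and $\Delta b$ highest degree first):

```python
import numpy as np

def Xk_from_x(x, m, n, k):
    """Assemble X_k of D_k x = X_k d, column by column.

    x is the current solution vector of length m+n-2k+1.  Column j of
    X_k is D_k(e_j) x, where [Delta-a  D_k] is the k-th Sylvester
    structured matrix built from the unit vector e_j.
    """
    rows, cols = m + n - k + 1, m + n + 2
    Xk = np.zeros((rows, cols))
    for j in range(cols):
        e = np.zeros(cols)
        e[j] = 1.0
        # structured matrix [Delta-a  D_k] from e_j; drop first column
        Dk = sylvester_k(e[:m + 1], e[m + 1:], k)[:, 1:]
        Xk[:, j] = Dk @ x
    return Xk
```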

Then (2.3) becomes the following least squares problem:
$$\min_{\Delta d,\,\Delta x}\left\|\begin{bmatrix} w(X_k - P_k) & w(A_k + D_k) \\ I_{m+n+2} & 0 \end{bmatrix}\begin{bmatrix} \Delta d \\ \Delta x \end{bmatrix} + \begin{bmatrix} -wr \\ d \end{bmatrix}\right\|_2, \qquad (2.4)$$
where $I_{m+n+2}$ is an identity matrix of order $m+n+2$. Thus, an iterative algorithm was derived (see [16]) for solving the minimal perturbation problem proposed above. Let us denote the coefficient matrix of the system in (2.4) by $M$,
$$M = \begin{bmatrix} w(X_k - P_k) & w(A_k + D_k) \\ I_{m+n+2} & 0 \end{bmatrix},$$
and denote $y = \begin{bmatrix} \Delta d \\ \Delta x \end{bmatrix}$, $z = \begin{bmatrix} wr \\ -d \end{bmatrix}$; the least squares problem (2.4) can then be rewritten as

$$\min_y \|My - z\|_2. \qquad (2.5)$$
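Before turning to the fast solver, note that (2.5) can always be solved densely in $O(st^2)$ flops; this is the baseline that the structured implementation of Section 3 replaces, and a useful cross-check. A sketch (the function name is ours; the inputs are the quantities of the current iteration):

```python
import numpy as np

def ls_step_dense(Xk, Pk, Ak, Dk, r, d, w):
    """Dense O(s t^2) solve of (2.5), min_y ||M y - z||_2.

    Returns y = [Delta-d; Delta-x] together with M and z, so the
    structured solver can be checked against it.
    """
    p = Pk.shape[1]                                   # p = m + n + 2
    M = np.block([[w * (Xk - Pk), w * (Ak + Dk)],
                  [np.eye(p), np.zeros((p, Ak.shape[1]))]])
    z = np.concatenate([w * r, -d])
    y = np.linalg.lstsq(M, z, rcond=None)[0]
    return y, M, z
```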

3. Fast implementation of the LS problem (2.5)

In this section we propose a fast implementation for solving the LS problem (2.5). The increased efficiency is obtained by exploiting the low displacement rank of the involved coefficient matrices.

3.1. Displacement structure

Hereafter, $I_i$ and $Z_i$ denote the $i \times i$ identity matrix and the $i \times i$ lower shift matrix, respectively, where $i$ is an integer. The displacement of an $n \times n$ symmetric matrix $R$ is defined as:

$$\nabla R = R - FRF^T,$$
where $F$ is an $n \times n$ lower-triangular matrix. The choice of $F$ depends on the matrix $R$; e.g., if $R$ is a Toeplitz matrix, $F$ is chosen equal to the lower shift matrix $Z_n$. If $\nabla R$ has a low rank $r$ ($\ll n$) independent of $n$, the size of $R$, then $r$ is referred to as the displacement rank of $R$.
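As a quick illustration of the definition, a generic Toeplitz matrix has displacement rank 2 with $F = Z_n$; the following numpy check (ours, not from the paper) makes this concrete:

```python
import numpy as np
from scipy.linalg import toeplitz

n = 8
Zn = np.diag(np.ones(n - 1), -1)          # lower shift matrix Z_n
R = toeplitz(np.random.randn(n), np.random.randn(n))
nabla_R = R - Zn @ R @ Zn.T               # displacement of R
print(np.linalg.matrix_rank(nabla_R))     # 2 for a generic Toeplitz matrix
```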

It follows that $\nabla R$ can be factored as
$$\nabla R = GJG^T, \qquad (3.1)$$
where $G$ is an $n \times r$ matrix and $J$ is a signature matrix of the form
$$J = \begin{bmatrix} I_p & 0 \\ 0 & -I_q \end{bmatrix}, \qquad p + q = r.$$
The integers $p, q$ denote the numbers of positive and negative eigenvalues of $\nabla R$, respectively. The factorization (3.1) is nonunique: if $G$ satisfies (3.1), then $G\Theta$ also satisfies (3.1) for any $J$-unitary matrix $\Theta$, i.e., for any $\Theta$ such that $\Theta J \Theta^T = J$. The pair $(G, J)$ is called a generator pair.

The $LDL^T$ factorization of a strongly regular matrix $R$ (i.e., all its leading principal submatrices are nonsingular) can be carried out efficiently by a generalized Schur algorithm [15], which operates on the generator pair $(G, J)$ directly and costs $O(rn^2)$. The number of iterations of the generalized Schur algorithm equals the size of $R$. Let $G_1 = G$ and denote by $G_i$ the generator matrix at the beginning of the $i$-th iteration. A $J$-unitary matrix $\Theta_i$ is chosen such that $G_i' = G_i\Theta_i$ is in proper form, i.e., the top row of $G_i'$ has a single nonzero entry. More precisely, denote by $f_i$ the top row of $G_i$. The nonzero entry is in the first column if $f_i J f_i^T > 0$ (positive step); it is in the last column if $f_i J f_i^T < 0$ (negative step). The case $f_i J f_i^T = 0$ is ruled out by the strong regularity of $R$. Denote an index by $j$,
$$j = \begin{cases} 1 & \text{for a positive step;} \\ r & \text{for a negative step.} \end{cases}$$
Denote by $l_i = \begin{bmatrix} 0 \\ \bar l_i \end{bmatrix}$ the $i$-th column of $L$; then $\bar l_i$ is the $j$-th column of $G_i'$. If $f_i J f_i^T > 0$ we set $D(i,i) = 1$; if $f_i J f_i^T < 0$ we set $D(i,i) = -1$. The next generator matrix $G_{i+1}$ is the nonzero part of $\begin{bmatrix} 0 \\ G_{i+1} \end{bmatrix}$, which is formed by multiplying the $j$-th column of $G_i'$ by $F_i$, the submatrix obtained by deleting the first $i-1$ rows and columns of $F$, and keeping the other columns of $G_i'$ unchanged. To reduce the generator matrix $G_i$ to proper form, the $J$-unitary $\Theta_i$ can be formed as a product of a sequence of orthogonal transformations and a hyperbolic rotation. The implementation of the hyperbolic rotation needs care; it can be done along the lines of [5, 4]. A more extensive description of the generalized Schur algorithm can be found in [15].

The notion of displacement rank can also be extended to nonsymmetric matrices. The following result was given in [18]:

Theorem 4. $M$ is a structured matrix of displacement rank at most 4. With respect to the shift-block matrices
$$F_1 = \mathrm{diag}(Z_{m+n-k+1}, Z_{m+n+2}), \qquad F_2 = \mathrm{diag}(Z_{m+1}, Z_{n+1}, Z_{n-k}, Z_{m-k+1}),$$
we have
$$M - F_1 M F_2^T = [u_1, u_2, u_3, u_4]\,[e_1, e_{m+2}, e_{m+n+3}, e_{2n+m-k+3}]^T,$$
where $e_i$ denotes the $i$-th column of the identity matrix $I_{2m+2n-2k+3}$, and

$$u_1 = M(:,1), \quad u_2 = M(:, m+2), \quad u_3 = M(:, m+n+3), \quad u_4 = M(:, 2n+m-k+3).$$
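Theorem 4 is easy to confirm numerically. The sketch below assumes $m$, $n$, $k$ and a matrix M assembled as in Section 2 (e.g. by ls_step_dense above), with $k < n$ so every shift block is nonempty; the helper names are ours:

```python
import numpy as np
from scipy.linalg import block_diag

def Z(i):
    """i x i lower shift matrix Z_i (i >= 1)."""
    return np.diag(np.ones(i - 1), -1)

def block_shift(sizes):
    """Block-diagonal matrix diag(Z_{i_1}, Z_{i_2}, ...)."""
    return block_diag(*[Z(i) for i in sizes])

# m, n, k and M assumed bound from the setup above.
F1 = block_shift([m + n - k + 1, m + n + 2])
F2 = block_shift([m + 1, n + 1, n - k, m - k + 1])
print(np.linalg.matrix_rank(M - F1 @ M @ F2.T))   # at most 4
```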

3.2. Description of a fast solver for the LS problem (2.5)

Fast algorithms based on QR decomposition for solving least squares problems whose coefficient matrices are Toeplitz have been developed in many works, such as [9, 19, 7, 3, 10, 22, 24]. The stability properties of these algorithms are still not well understood, and most of them may suffer from loss of accuracy when applied to ill-conditioned problems. Based on the method of corrected semi-normal equations, the algorithm derived in [20] can produce a more accurate R factor in the QR decomposition of a Toeplitz matrix, even for certain ill-conditioned matrices. Hence this algorithm can be used to solve LS problems over a significantly extended range, from well-conditioned to certain ill-conditioned ones. However, for least squares problems (e.g. (2.5)) whose coefficient matrices have three or more very small singular values, this method may not work as well [2, 20]. A new fast and efficient algorithm was developed by Gu in [13], based on a new fast algorithm for solving Cauchy-like least squares problems.

In the present paper, we solve the least squares problem (2.5) by extending a fast solver for systems of linear equations described by Chandrasekaran et al. in [6]. For the least squares system (2.5), denote by $y_{LS}$ the least squares solution and by $r_{LS} = M y_{LS} - z$ the minimum residual vector; then $y_{LS}$ solves the following linear system:
$$\begin{bmatrix} M^T M & M^T \\ M & 0 \end{bmatrix}\begin{bmatrix} y \\ -z \end{bmatrix} = \begin{bmatrix} 0 \\ z + r_{LS} \end{bmatrix}. \qquad (3.2)$$

Denote by $Q$ the orthogonal matrix from the QR decomposition of $M$, and partition $Q$ as $Q = [Q_1, Q_2]$, where $Q_2 \in \mathbb{R}^{(2m+2n-k+3)\times k}$; then
$$\|r_{LS}\|_2 = \|Q_2^T z\|_2 = \left\|Q_2^T\begin{bmatrix} wr \\ -d \end{bmatrix}\right\|_2. \qquad (3.3)$$
Due to the heavy weight of the upper block $M(1..m+n-k+1, :)$ of $M$, the entries of the block $Q_2(m+n-k+2..2m+2n-k+3, :)$ are of $O(1)$ while the block $Q_2(1..m+n-k+1, :)$ consists of near-zero elements. Therefore, by (3.3), $\|r_{LS}\|_2$ is much smaller than $\|z\|_2$, i.e.

$$\|r_{LS}\|_2 \ll \|z\|_2.$$
This motivates us to compute an approximate LS solution $\hat y$ by means of the following augmented system:
$$\begin{bmatrix} M^T M & M^T \\ M & 0 \end{bmatrix}\begin{bmatrix} y \\ -z \end{bmatrix} \approx \begin{bmatrix} 0 \\ z \end{bmatrix}. \qquad (3.4)$$

We normalize the matrix M and the vector z as:

$$M := M/\|M\|_F, \quad \text{so that } \|M\|_2 \le 1, \qquad (3.5)$$

$$z := z/\|M\|_F, \qquad (3.6)$$
where $\|M\|_F$ is the Frobenius norm of $M$. Due to the large penalty value $w$, after normalization the lower left corner of $M$ has very small diagonal elements. This causes numerical rank deficiency of $M^T M$. Moreover, since $M$ is not a square matrix, the coefficient matrix of the linear system (3.4) is rank deficient. In order to complete the generalized Schur algorithm successfully, we construct $T$ as:
$$T = \begin{bmatrix} M^T M + \alpha I^{(1)} & M^T \\ M & -\beta I^{(2)} \end{bmatrix}, \qquad (3.7)$$
where $\alpha I^{(1)}, \beta I^{(2)}$ are small multiples of identity matrices such that $M^T M + \alpha I^{(1)}$ is positive definite, which ensures that the positive steps complete successfully, and the Schur complement of $T$ with respect to $M^T M + \alpha I^{(1)}$ is negative definite, which ensures that the negative steps complete successfully.
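A sketch of the construction (3.7); the function name and the sample shifts are ours, with the roles of $\alpha$ and $\beta$ noted in the comments:

```python
import numpy as np

def build_T(M, alpha, beta):
    """Augmented regularized matrix T of (3.7).

    alpha keeps M^T M + alpha*I positive definite, so the positive
    steps of the generalized Schur algorithm succeed; beta keeps the
    Schur complement negative definite, so the negative steps succeed.
    M is assumed already normalized as in (3.5).
    """
    t, s = M.shape[1], M.shape[0]
    return np.block([[M.T @ M + alpha * np.eye(t), M.T],
                     [M, -beta * np.eye(s)]])
```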

We explore the displacement structure of $T$ as follows.

Theorem 5 [18]. $T$ is a structured matrix with displacement rank at most 10. We construct a generator pair $(G, J)$ for $T$ such that
$$T - FTF^T = GJG^T,$$
where
$$F = \mathrm{diag}(Z_{m+1}, Z_{n+1}, Z_{n-k}, Z_{m-k+1}, Z_{m+n-k+1}, Z_{m+n+2}),$$

$$J = \mathrm{diag}(I_4, -I_6),$$
and $G$ is a matrix with 10 columns. In [18] we expressed $G$ by columns of $T$, which are of order $1/w^2$, where $w$ is the large penalty value; this is undesirable for numerical stability. Here we introduce a new generator matrix with columns $g_i$, $i = 1, \ldots, 10$, which are of order $1/w$. Define $t_1, \ldots, t_4$ as:

$$t_1 = \|M(:,1)\|_2^2 + \alpha, \quad t_2 = \|M(:, m+2)\|_2^2 + \alpha, \quad t_3 = \|M(:, m+n+3)\|_2^2 + \alpha, \quad t_4 = \|M(:, 2n+m-k+3)\|_2^2 + \alpha.$$
Let

$$
\begin{aligned}
c_1 &= \left[M^T(1,:)M,\; M^T(1,:)\right]^T + \alpha I(:,1),\\
c_2 &= \left[M^T(m+2,:)M,\; M^T(m+2,:)\right]^T + \alpha I(:,m+2),\\
c_3 &= \left[M^T(m+n+3,:)M,\; M^T(m+n+3,:)\right]^T + \alpha I(:,m+n+3),\\
c_4 &= \left[M^T(2n+m-k+3,:)M,\; M^T(2n+m-k+3,:)\right]^T + \alpha I(:,2n+m-k+3),
\end{aligned}
$$
where $I$ denotes the identity matrix of order $4m+4n-3k+6$. Then
$$
\begin{aligned}
g_1 &= c_1/\sqrt{t_1},\\
g_2 &= c_2/\sqrt{t_2}, \text{ except that } g_2[1] = 0,\\
g_3 &= c_3/\sqrt{t_3}, \text{ except that } g_3[1] = 0,\ g_3[m+2] = 0,\\
g_4 &= c_4/\sqrt{t_4}, \text{ except that } g_4[1] = 0,\ g_4[m+2] = 0,\ g_4[m+n+3] = 0,\\
g_5 &= \left[0,\; g_1^T(2 : 4m+4n-3k+6)\right]^T,\\
g_6 &= \left[g_2^T(1 : m+1),\; 0,\; g_2^T(m+3 : 4m+4n-3k+6)\right]^T,\\
g_7 &= \left[g_3^T(1 : m+n+2),\; 0,\; g_3^T(m+n+4 : 4m+4n-3k+6)\right]^T,\\
g_8 &= \left[g_4^T(1 : 2n+m-k+2),\; 0,\; g_4^T(2n+m-k+4 : 4m+4n-3k+6)\right]^T,\\
g_9 &= [\underbrace{0, \ldots, 0}_{2m+2n-2k+3},\ \sqrt{\beta},\ 0, \ldots, 0]^T,\\
g_{10} &= [\underbrace{0, \ldots, 0}_{3m+3n-3k+4},\ \sqrt{\beta},\ 0, \ldots, 0]^T.
\end{aligned}
$$
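Theorem 5 can likewise be checked numerically before constructing the generators explicitly; a sketch reusing block_shift and build_T from the sketches above, with $m$, $n$, $k$ and the normalized M assumed bound:

```python
import numpy as np

# Displacement of T with the shift sizes of Theorem 5; the rank
# should be at most 10.  alpha = beta = 1e-14 as in Section 5.
F = block_shift([m + 1, n + 1, n - k, m - k + 1,
                 m + n - k + 1, m + n + 2])
T = build_T(M, 1e-14, 1e-14)
print(np.linalg.matrix_rank(T - F @ T @ F.T))
```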

Expanding the proof in [6] and combining it with the singular value decomposition (SVD) of $M$, we prove that after applying $2m+2n-2k+3$ positive steps and $2m+2n-k+3$ negative steps of the generalized Schur algorithm, operating on the generator pair $(G, J)$, we obtain a backward stable factorization of $T$:
$$\begin{bmatrix} \hat R^T & 0 \\ \hat Q & \hat D \end{bmatrix}\begin{bmatrix} \hat R & \hat Q^T \\ 0 & -\hat D^T \end{bmatrix}, \qquad (3.8)$$
where $\hat R$ is upper triangular and $\hat D$ is lower triangular. Using the backward stable factorization (3.8) we can solve the augmented system
$$T\begin{bmatrix} y \\ \xi \end{bmatrix} = \begin{bmatrix} 0 \\ z \end{bmatrix}$$
to get
$$(T + H)\begin{bmatrix} \hat y \\ \hat\xi \end{bmatrix} = \begin{bmatrix} 0 \\ z \end{bmatrix}, \qquad (3.9)$$

with $\|H\|_2 = O(\epsilon)$, where $\epsilon$ is the machine precision. Here
$$\begin{bmatrix} \hat y \\ \hat\xi \end{bmatrix} = \begin{bmatrix} \hat R & \hat Q^T \\ 0 & -\hat D^T \end{bmatrix}^{-1}\begin{bmatrix} \hat R^T & 0 \\ \hat Q & \hat D \end{bmatrix}^{-1}\begin{bmatrix} 0 \\ z \end{bmatrix},$$
and $\hat y$ is computed by the expression

$$\hat R^{-1}\hat Q^T\hat D^{-T}\hat D^{-1}z. \qquad (3.10)$$

$\hat y$ is regarded as an approximate solution to the least squares system (2.5). In the phase of completing the triangular factorization of $T$ by the generalized Schur algorithm, assuming that in each iteration the reduction of the generator matrix to proper form is implemented via two Householder transformations and an elementary hyperbolic rotation in OD form [5], the number of flops needed is
$$\frac{47}{2}(4m+4n-3k+6)^2 + \frac{121}{2}(4m+4n-3k+6).$$
In the phase of obtaining the approximate solution $\hat y$ via (3.10), i.e., two backward substitutions, one forward substitution and one matrix-vector multiplication, the number of flops needed is [11]:

$$2(2m+2n-k+3)^2 + (2m+2n-2k+3)^2 + 2(2m+2n-k+3)(2m+2n-2k+3).$$
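The substitution phase (3.10) is straightforward once the triangular factors are available; a sketch using scipy (names are ours; $\hat R$ is $t \times t$ upper triangular, $\hat D$ is $s \times s$ lower triangular and $\hat Q$ is $s \times t$, following (3.8)):

```python
import numpy as np
from scipy.linalg import solve_triangular

def y_from_factors(R_hat, Q_hat, D_hat, z):
    """Recover y-hat via (3.10): R^{-1} Q^T D^{-T} D^{-1} z.

    One forward substitution, two backward substitutions and one
    matrix-vector product, matching the flop count quoted above.
    """
    v = solve_triangular(D_hat, z, lower=True)        # D^{-1} z
    v = solve_triangular(D_hat.T, v, lower=False)     # D^{-T} (.)
    return solve_triangular(R_hat, Q_hat.T @ v, lower=False)
```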

4. Error Analysis

We derive a relative forward error bound for the approximate LS solution to (2.5). Hereafter, we use $\kappa(\cdot)$ to denote the 2-norm condition number of its argument. We define $u = m+n-k+1$, $l = m+n-k+2$. For any integer $i$, $1 \le i \le u+l$, we denote by $\sigma_i$ the $i$-th largest singular value of $M$.

Lemma 6. Denote by $f$ a vector which satisfies the following linear system:
$$\begin{bmatrix} M^T M + \alpha I^{(1)} & M^T \\ M & -\beta I^{(2)} \end{bmatrix}\begin{bmatrix} \hat y \\ \hat\xi \end{bmatrix} = \begin{bmatrix} 0 \\ z \end{bmatrix} + f. \qquad (4.1)$$
Then an upper bound for $\frac{\|\hat y - y_{LS}\|_2}{\|y_{LS}\|_2}$ is given, roughly, by

$$\frac{\beta + \alpha\beta\kappa^2(M)}{\|M\|_2^2 - \alpha\beta\kappa^2(M)} + \frac{\kappa(M) + \beta\kappa^2(M)}{\|M\|_2^2 - \alpha\beta\kappa^2(M)}\cdot\frac{\|f\|_2}{\|y_{LS}\|_2}. \qquad (4.2)$$

Proof. Partition $f$ as $f = \left[f_1^T, f_2^T\right]^T$, $f_1 \in \mathbb{R}^{(2m+2n-2k+3)\times 1}$, $f_2 \in \mathbb{R}^{(2m+2n-k+3)\times 1}$; then from (4.1) we have
$$\begin{cases} \left(M^T M + \alpha I^{(1)}\right)\hat y + M^T\hat\xi = f_1, \\ M\hat y - \beta\hat\xi = z + f_2. \end{cases}$$
Eliminating $\hat\xi$, we get
$$\beta\left(M^T M + \alpha I^{(1)}\right)\hat y + M^T M\hat y = M^T z + M^T f_2 + \beta f_1. \qquad (4.3)$$

Noting that $M^T z = M^T M y_{LS}$ and using an elementary calculation, we get
$$\left(\beta M^T M + \alpha\beta I^{(1)} + M^T M\right)(y_{LS} - \hat y) = \left(\beta M^T M + \alpha\beta I^{(1)}\right)y_{LS} - M^T f_2 - \beta f_1, \qquad (4.4)$$
and

$$y_{LS} - \hat y = \left(\beta I^{(1)} + \alpha\beta(M^T M)^{-1} + I\right)^{-1}\left(\beta I^{(1)} + \alpha\beta(M^T M)^{-1}\right)y_{LS} - \left(\beta I^{(1)} + \alpha\beta(M^T M)^{-1} + I\right)^{-1}\left(M^\dagger f_2 + \beta(M^T M)^{-1}f_1\right).$$

So
$$\frac{\|y_{LS} - \hat y\|_2}{\|y_{LS}\|_2} \le \frac{\beta + \alpha\beta\left\|(M^T M)^{-1}\right\|_2}{1 - \beta - \alpha\beta\left\|(M^T M)^{-1}\right\|_2} + \frac{\left\|M^\dagger\right\|_2 + \beta\left\|(M^T M)^{-1}\right\|_2}{1 - \beta - \alpha\beta\left\|(M^T M)^{-1}\right\|_2}\cdot\frac{\|f\|_2}{\|y_{LS}\|_2} = \frac{\beta\|M\|_2^2 + \alpha\beta\kappa^2(M)}{(1-\beta)\|M\|_2^2 - \alpha\beta\kappa^2(M)} + \frac{\kappa(M)\|M\|_2 + \beta\kappa^2(M)}{(1-\beta)\|M\|_2^2 - \alpha\beta\kappa^2(M)}\cdot\frac{\|f\|_2}{\|y_{LS}\|_2},$$
where we use the identities
$$\left\|M^\dagger\right\|_2\|M\|_2 = \kappa(M), \qquad \left\|(M^T M)^{-1}\right\|_2\|M\|_2^2 = \kappa^2(M).$$
Noting that

$\|M\|_2 \le 1$ and $\beta \ll 1$, we get the final form of the inequality. $\Box$

Lemma 7. Partition $z$ as $z = \left[z_u^T, z_{l+k}^T\right]^T$, $z_u \in \mathbb{R}^{u\times 1}$, $z_{l+k} \in \mathbb{R}^{(l+k)\times 1}$; then for $T$ defined in (3.7) we have
$$\left\|T^{-1}\begin{bmatrix} 0 \\ z \end{bmatrix}\right\|_2 \le \left(\frac{4}{\sigma_u} + \frac{2}{w\beta} + \frac{1}{w\sigma_{u+l}}\right)\|z_u\|_2 + \left(\frac{2}{\beta} + \frac{3}{\sigma_{u+l}}\right)\|z_{l+k}\|_2.$$

Proof. Denote
$$T^{-1} = \begin{bmatrix} B_{11} & B_{12} \\ B_{12}^T & B_{22} \end{bmatrix};$$
then
$$
\begin{aligned}
B_{11} &= \beta M^\dagger\left[M\left(M^T M + \alpha I^{(1)}\right)^{-1}M^T + \beta I^{(2)}\right]^{-1}M\left(M^T M + \alpha I^{(1)}\right)^{-T},\\
B_{12} &= \left(M^T M + \alpha I^{(1)}\right)^{-1}M^T\left[M\left(M^T M + \alpha I^{(1)}\right)^{-1}M^T + \beta I^{(2)}\right]^{-1},\\
B_{22} &= -\left[M\left(M^T M + \alpha I^{(1)}\right)^{-1}M^T + \beta I^{(2)}\right]^{-1}.
\end{aligned}
$$
Let $M = U\begin{bmatrix} \Sigma \\ 0_{k\times(u+l)} \end{bmatrix}V^T$ be the SVD of $M$, where $U, V$ are orthogonal matrices and $\Sigma$ is a diagonal matrix consisting of the singular values of $M$. Write $\Sigma$ as
$$\Sigma = \begin{bmatrix} \Sigma_u & \\ & \Sigma_l \end{bmatrix},$$
where $\Sigma_u = \mathrm{diag}(\sigma_1, \cdots, \sigma_u)$, $\Sigma_l = \mathrm{diag}(\sigma_{u+1}, \cdots, \sigma_{u+l})$.

Partition $U, V$ as
$$U = \begin{bmatrix} U_{11} & U_{12} \\ U_{21} & U_{22} \end{bmatrix}, \qquad V = [V_1, V_2],$$

where $U_{11} \in \mathbb{R}^{u\times u}$, $V_1 \in \mathbb{R}^{(u+l)\times u}$. Define the diagonal matrices
$$\Lambda_u = \left(\left[I_u + \alpha\Sigma_u^{-2}\right]^{-1} + \beta I_u\right)^{-1},$$

$$
\begin{aligned}
\Lambda_u' &= \left(\Sigma_u + \alpha\Sigma_u^{-1}\right)^{-1}\Lambda_u,\\
\Lambda_l &= \left(\Sigma_l + \alpha\Sigma_l^{-1}\right)^{-1}\left(\left[I_l + \alpha\Sigma_l^{-2}\right]^{-1} + \beta I_l\right)^{-1},\\
\Lambda_{l+k} &= \begin{bmatrix} \left(\left[I_l + \alpha\Sigma_l^{-2}\right]^{-1} + \beta I_l\right)^{-1} & \\ & \frac{1}{\beta}I_k \end{bmatrix},
\end{aligned}
$$
where $I_u, I_l, I_k$ denote identity matrices of dimension $u$, $l$, $k$ respectively. Then

$$B_{12} = \left[V_1\Lambda_u'U_{11}^T + V_2[\Lambda_l\ 0]U_{12}^T,\;\; V_1\Lambda_u'U_{21}^T + V_2[\Lambda_l\ 0]U_{22}^T\right],$$
$$B_{22} = -\begin{bmatrix} U_{11}\Lambda_uU_{11}^T + U_{12}\Lambda_{l+k}U_{12}^T & U_{11}\Lambda_uU_{21}^T + U_{12}\Lambda_{l+k}U_{22}^T \\ U_{21}\Lambda_uU_{11}^T + U_{22}\Lambda_{l+k}U_{12}^T & U_{21}\Lambda_uU_{21}^T + U_{22}\Lambda_{l+k}U_{22}^T \end{bmatrix}.$$
Note that the following inequalities hold:

$$\|\Lambda_u\|_2 \le 1 + 1/\sigma_u, \quad \|\Lambda_u'\|_2 \le 1/\sigma_u, \quad \|\Lambda_l\|_2 \le 1/\sigma_{u+l}, \quad \|\Lambda_{l+k}\|_2 \le 1 + 1/\beta. \qquad (4.5)$$
For the last inequality the following assumption is used:

$$\alpha\beta < \sigma_{u+l}^2.$$
Meanwhile, due to the heavy weight (of order $O(w)$) of the submatrix $M(1..u, :)$, the following inequalities hold:

$$\|U_{12}\|_2,\ \|U_{21}\|_2 \le 1/w.$$
Hence, we have
$$\left\|T^{-1}\begin{bmatrix} 0 \\ z \end{bmatrix}\right\|_2 \le \|B_{12}z\|_2 + \|B_{22}z\|_2 \le \left(\frac{4}{\sigma_u} + \frac{2}{w\beta} + \frac{1}{w\sigma_{u+l}}\right)\|z_u\|_2 + \left(\frac{2}{\beta} + \frac{3}{\sigma_{u+l}}\right)\|z_{l+k}\|_2. \qquad \Box$$

Lemma 8. As previously defined, let $H$ be a backward error matrix which satisfies (3.9); then for the vector $f$ defined in Lemma 6 we have
$$\|f\|_2 \le \frac{\|H\|_2}{1 - \gamma\|H\|_2}\left[\left(\frac{4}{\sigma_u} + \frac{2}{w\beta} + \frac{1}{w\sigma_{u+l}}\right)\|z_u\|_2 + \left(\frac{2}{\beta} + \frac{3}{\sigma_{u+l}}\right)\|z_{l+k}\|_2\right],$$
where

$$\gamma = \beta/\sigma_{u+l}^2 + 2/\sigma_{u+l} + 1/\beta + 1.$$

Proof. By the definitions of $f$ and $H$, we get
$$f = -H\begin{bmatrix} \hat y \\ \hat\xi \end{bmatrix} = -H(T+H)^{-1}\begin{bmatrix} 0 \\ z \end{bmatrix} = -H\left(I + T^{-1}H\right)^{-1}T^{-1}\begin{bmatrix} 0 \\ z \end{bmatrix}.$$

So
$$\|f\|_2 \le \frac{\|H\|_2}{1 - \|T^{-1}H\|_2}\left\|T^{-1}\begin{bmatrix} 0 \\ z \end{bmatrix}\right\|_2 \le \frac{\|H\|_2}{1 - \|T^{-1}\|_2\|H\|_2}\left\|T^{-1}\begin{bmatrix} 0 \\ z \end{bmatrix}\right\|_2.$$
Note that
$$B_{11} = V\begin{bmatrix} \beta\Sigma_u^{-1}\Lambda_u' & \\ & \beta\Sigma_l^{-1}\Lambda_l \end{bmatrix}V^T, \qquad B_{12} = V\begin{bmatrix} \Lambda_u' & & 0_{u\times k} \\ & \Lambda_l & 0_{l\times k} \end{bmatrix}U^T, \qquad B_{22} = -U\begin{bmatrix} \Lambda_u & \\ & \Lambda_{l+k} \end{bmatrix}U^T;$$
from (4.5) we have
$$\left\|T^{-1}\right\|_2 \le \|B_{11}\|_2 + \|B_{12}\|_2 + \|B_{22}\|_2 \le \beta/\sigma_{u+l}^2 + 2/\sigma_{u+l} + 1/\beta + 1.$$
According to Lemma 7, we get the final form of the inequality. $\Box$

Theorem 9. Suppose $\hat y$ is a vector computed from the linear system
$$(T+H)\begin{bmatrix} \hat y \\ \hat\xi \end{bmatrix} = \begin{bmatrix} 0 \\ z \end{bmatrix},$$

with $\|H\|_2 = O(\epsilon)$, where $\epsilon$ is the machine precision and $T$ is defined in (3.7). Then $\hat y$ is an approximate solution to the least squares problem (2.5). Under the assumption $\alpha\beta < \sigma_{u+l}^2$, the relative forward error $\frac{\|\hat y - y_{LS}\|_2}{\|y_{LS}\|_2}$ admits an upper bound of the form
$$\frac{\beta + \alpha\beta\kappa^2(M)}{\|M\|_2^2 - \alpha\beta\kappa^2(M)} + \frac{\kappa(M) + \beta\kappa^2(M)}{\|M\|_2^2 - \alpha\beta\kappa^2(M)}\cdot\frac{\delta\|H\|_2}{(1 - \gamma\|H\|_2)\|y_{LS}\|_2}, \qquad (4.6)$$
where
$$\gamma = \beta/\sigma_{u+l}^2 + 2/\sigma_{u+l} + 1/\beta + 1, \qquad \delta = \left(\frac{4}{\sigma_u} + \frac{2}{w\beta} + \frac{1}{w\sigma_{u+l}}\right)\|z_u\|_2 + \left(\frac{2}{\beta} + \frac{3}{\sigma_{u+l}}\right)\|z_{l+k}\|_2.$$

Proof. It follows immediately from Lemmas 6, 7 and 8. $\Box$

Remark 10. In order to complete the generalized Schur algorithm stably, we take $\alpha, \beta$ satisfying the lower bound derived in [6]. In essence, this means that $\alpha, \beta$ can be taken somewhat larger than the machine precision. Besides this, based on the error analysis in this section, $\alpha, \beta$ should be taken such that the upper bound (4.6) is as small as possible. Hence the assumption $\alpha\beta < \sigma_{u+l}^2$ used in Theorem 9 is natural: it is approximately equivalent to $\alpha\beta\kappa^2(M) < 1$, which is a necessary condition for the upper bound (4.6) to be smaller than 1.

Remark 11. In practice, after the normalization (3.5), the following inequalities hold:

$$\|z_u\|_2 < 1, \qquad \|z_{l+k}\|_2 < 1/w.$$
Furthermore, when the given integer $k$ in (2.4) is taken as an upper bound on the degree of the approximate GCD, we have $\delta = O(1)$.
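All quantities entering the bound (4.6) are computable from an SVD of the normalized $M$, so the bound can be evaluated alongside the fast solve. A sketch (the function name is ours) with $u = m+n-k+1$:

```python
import numpy as np

def forward_error_bound(M, z, alpha, beta, w, u, y_ls_norm, H_norm):
    """Evaluate the relative forward error bound (4.6).

    M, z are assumed normalized as in (3.5)-(3.6); u = m + n - k + 1
    splits z into z_u and z_{l+k}.
    """
    sig = np.linalg.svd(M, compute_uv=False)
    s_u, s_ul = sig[u - 1], sig[-1]          # sigma_u, sigma_{u+l}
    kappa = sig[0] / s_ul                    # kappa(M)
    zu, zlk = np.linalg.norm(z[:u]), np.linalg.norm(z[u:])
    gamma = beta / s_ul**2 + 2 / s_ul + 1 / beta + 1
    delta = (4 / s_u + 2 / (w * beta) + 1 / (w * s_ul)) * zu \
          + (2 / beta + 3 / s_ul) * zlk
    den = sig[0]**2 - alpha * beta * kappa**2    # ||M||_2^2 - a*b*kappa^2
    return (beta + alpha * beta * kappa**2) / den \
         + (kappa + beta * kappa**2) / den \
           * delta * H_norm / ((1 - gamma * H_norm) * y_ls_norm)
```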

5. Experiments

In the following table we show the performance of the fast version of the algorithm AppSylv-k in [16], in which we apply the fast implementation described in this paper when solving the LS problem (2.5) in each iteration. In the numerical tests, we compute the minimal perturbations needed for univariate polynomials to have an exact GCD of given degree. Here we set Digits = 15 and $\alpha = \beta = 10^{-14}$.

Ex    m, n    k    error        error        error        cond.num     for.err.     ε̂(ŷ)
                   (Zeng)       (classic)    (new fast)   (LS Prob.)   (LS Prob.)   (LS Prob.)
1     2, 2    1    5.98099e-3   5.59933e-3   5.59933e-3   0.109e13     0.461e-3     0.238e-12
2     3, 3    2    1.08172e-2   1.07129e-2   1.07129e-2   0.148e13     0.434e-3     0.176e-13
3     5, 4    3    1.56146e-6   1.56146e-6   1.56146e-6   0.209e13     0.143e-2     0.143e-13
4     5, 5    3    1.34138e-8   1.34138e-8   1.34318e-8   0.232e13     0.664e-3     0.452e-13
5     6, 6    4    1.96333e-10  1.96333e-10  1.96333e-10  0.239e13     0.182e-3     0.448e-13
6     8, 7    4    1.98415e-16  1.98415e-16  1.98416e-16  0.312e13     0.322e-3     0.896e-14
7     10, 10  5    1.65332e-12  1.51551e-12  1.51552e-12  0.545e13     0.272e-2     0.598e-13
8     14, 13  7    2.61850e-4   2.61818e-4   2.61819e-4   0.697e13     0.163e-1     0.112e-12
9     28, 28  10   3.55567e-4   2.54575e-4   3.54600e-4   0.151e14     0.992e-1     0.512e-14
10    50, 50  30   9.35269e-6   9.35252e-6   9.40237e-6   0.302e14     0.134        0.168e-13

The sample polynomials are generated in the same way as those in [16]: for every example we use 50 random cases for each $(m, n)$ and report the average over all results. For each example, the prime parts and the GCD of the two polynomials are constructed by choosing polynomials with random integer coefficients in the range $-10 \le c \le 10$, and then adding a perturbation. For the noise we choose a relative tolerance $10^{-e}$, then randomly choose a polynomial that has the same degree as the product and coefficients in $[-10^e, 10^e]$. Finally, we scale the perturbation so that the relative error is $10^{-e}$.

In the table, $m, n$ denote the total degrees of the polynomials $a$ and $b$; $k$ is the degree of the approximate GCD of $a$ and $b$; error (Zeng), error (classic) and error (new fast) denote the minimal perturbations computed by the algorithms given in [26], [16] and this report, respectively; cond.num denotes the 2-norm condition number of the LS problem (2.5), and for.err. denotes the relative forward error (relative to the LS solution given in [16]) of the approximate LS solution. In the last column we show the backward perturbations $\hat\varepsilon(\hat y)$ that the approximate LS solutions satisfy. Here $\hat\varepsilon(\hat y)$ is an alternative F-norm bound on $\delta M$ such that $\hat y$ is an exact solution to the LS problem below:
$$\min_y \|(M + \delta M)y - z\|_2. \qquad (5.1)$$

$\hat\varepsilon(\hat y)$ differs from the smallest possible backward perturbation $\varepsilon(\hat y)$ derived in [25, 14] by at most a factor less than 2, and it is computed as follows [12]. As previously defined, $M = U\begin{bmatrix} \Sigma \\ 0 \end{bmatrix}V^T$ is the SVD of $M$. Assume that $\hat y \neq 0$ and let
$$\hat r = z - M\hat y = U\begin{pmatrix} \hat r_1 \\ \hat r_2 \end{pmatrix}, \quad \hat r_1 \in \mathbb{R}^{(2m+2n-2k+3)\times 1},\ \hat r_2 \in \mathbb{R}^{k\times 1}, \qquad \eta = \frac{\|\hat r\|_2}{\|\hat y\|_2}.$$

Then
$$\hat\varepsilon(\hat y) = \min(\eta, \tilde\sigma), \quad \text{where } \tilde\sigma = \sqrt{\frac{\hat r_1^T\Sigma^2\left(\Sigma^2 + \eta^2 I\right)^{-1}\hat r_1}{\|\hat r_2\|_2^2/\eta^2 + \eta^2\,\hat r_1^T\left(\Sigma^2 + \eta^2 I\right)^{-2}\hat r_1}}.$$
As shown in the table, the small backward perturbations $\hat\varepsilon(\hat y)$ imply that the approximate LS solutions are backward stable. The computed minimal polynomial perturbations have the same magnitudes as the results computed by the algorithms given in [26] and [16]; in particular, the new results in the first eight examples even have several significant digits identical to those of the classic results [16]. As the last two examples suggest, the accuracy of the new results is affected by the large condition number of $M$.
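The formula for $\hat\varepsilon(\hat y)$ above translates directly into code; a sketch (the function name is ours) of how the last column of the table could be produced:

```python
import numpy as np

def backward_eps(M, y_hat, z):
    """Backward perturbation bound eps-hat(y-hat) for (5.1), after [12].

    Within a factor less than 2 of the optimal bound of [25, 14].
    """
    U, sig, Vt = np.linalg.svd(M)
    t = M.shape[1]                           # number of singular values
    res = z - M @ y_hat
    r = U.T @ res                            # rotated residual [r1; r2]
    r1, r2 = r[:t], r[t:]
    eta = np.linalg.norm(res) / np.linalg.norm(y_hat)
    s2 = sig**2
    num = np.sum(s2 * r1**2 / (s2 + eta**2))
    den = r2 @ r2 / eta**2 + eta**2 * np.sum(r1**2 / (s2 + eta**2)**2)
    return min(eta, np.sqrt(num / den))
```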

6. Conclusions

In this paper, we describe a fast implementation of low rank approximation of a Sylvester matrix. An iterative algorithm for constructing a Sylvester matrix with given lower rank and obtaining the nearest perturbed polynomials was described in [16]. For this algorithm, the problem that needs to be solved in each iteration is a least squares problem. By exploiting the low displacement rank of the involved coefficient matrix, we obtain an implementation that requires
$$\frac{51}{2}s^2 + 49st + \frac{49}{2}t^2$$
flops after discarding the low order terms, where $s = 2m+2n-k+3$, $t = 2m+2n-2k+3$. In comparison, the straightforward implementation in [16] requires $2st^2 - \frac{2}{3}t^3$ flops. So the implementation here is more efficient, especially when $k$ is not large. Our fast algorithm can be generalized to several polynomials and also to polynomials with complex coefficients.

Acknowledgment The authors would like to thank Prof. Zhongzhi Bai for useful discussions.

References

[1] A. A. Anda and H. Park, Fast plane rotations with dynamic scaling, SIAM J. Matrix Anal. Applic., 15 (1994), pp. 162–174.
[2] Å. Björck, Stability analysis of the method of semi-normal equations for least squares problems, Lin. Alg. Appl., 88/89 (1987), pp. 31–48.
[3] A. W. Bojanczyk, R. P. Brent, and F. R. de Hoog, QR factorization of Toeplitz matrices, Numer. Math., 49 (1986), pp. 81–94.
[4] A. W. Bojanczyk, R. P. Brent, and F. R. de Hoog, A note on downdating the Cholesky factorization, SIAM J. Sci. Stat. Comput., 8 (1987), pp. 210–221.
[5] S. Chandrasekaran and A. H. Sayed, Stabilizing the generalized Schur algorithm, SIAM J. Matrix Anal. Applic., 17 (1996), pp. 950–983.
[6] S. Chandrasekaran and A. H. Sayed, A fast stable solver for nonsymmetric Toeplitz and quasi-Toeplitz systems of linear equations, SIAM J. Matrix Anal. Applic., 19 (1998), pp. 107–139.
[7] J. Chun, T. Kailath, and H. Lev-Ari, Fast parallel algorithms for QR and triangular factorization, SIAM J. Sci. Stat. Comput., 8 (1987), pp. 899–913.
[8] R. M. Corless, S. M. Watt, and L. Zhi, QR factoring to compute the GCD of univariate approximate polynomials, IEEE Transactions on Signal Processing, 52 (2004), pp. 3394–3402.
[9] G. Cybenko, A general orthogonalization technique with applications to time series analysis and signal processing, Math. Comp., 40 (1983), pp. 323–336.
[10] G. Cybenko, Fast Toeplitz orthogonalization using inner products, SIAM J. Sci. Stat. Comput., 8 (1987), pp. 734–740.
[11] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd edition, Johns Hopkins University Press, 1996.
[12] M. Gu, Backward perturbation bounds for linear least squares problems, SIAM J. Matrix Anal. Applic., 20 (1998), pp. 363–372.
[13] M. Gu, New fast algorithms for structured linear least squares problems, SIAM J. Matrix Anal. Applic., 20 (1998), pp. 244–269.
[14] N. J. Higham, Accuracy and Stability of Numerical Algorithms, Society for Industrial and Applied Mathematics, Philadelphia, 1996.
[15] T. Kailath and A. H. Sayed, Displacement structure: theory and applications, SIAM Review, 37 (1995), pp. 297–386.
[16] E. Kaltofen, Z. Yang, and L. Zhi, Structured low rank approximation of a Sylvester matrix. Submitted to SNC 2005, 2005.
[17] B. Li, Z. Liu, and L. Zhi, A structured rank-revealing method for a Sylvester matrix. Submitted to Journal of Computational and Applied Mathematics, 2005.
[18] B. Li, Z. Yang, and L. Zhi, Fast low rank approximation of a Sylvester matrix by Structured Total Least Norm, J. JSSAC (Japan Society for Symbolic and Algebraic Computation), 11 (2005), pp. 165–174.

[19] J. G. Nagy, Fast inverse QR factorization for Toeplitz matrices, SIAM J. Sci. Stat. Comput., 14 (1993), pp. 1174–1183.
[20] H. Park and L. Eldén, Stability analysis and fast algorithms for triangularization of Toeplitz matrices, Numer. Math., 76 (1997), pp. 383–402.
[21] H. Park, L. Zhang, and J. B. Rosen, Low rank approximation of a Hankel matrix by Structured Total Least Norm, BIT, 39 (1999), pp. 757–779.
[22] S. Qiao, Hybrid algorithm for fast Toeplitz orthogonalization, Numer. Math., 53 (1988), pp. 351–366.
[23] J. B. Rosen, H. Park, and J. Glick, Total least norm formulation and solution for structured problems, SIAM J. Matrix Anal. Applic., 17 (1996), pp. 110–128.
[24] D. R. Sweet, Fast Toeplitz orthogonalization, Numer. Math., 43 (1984), pp. 1–21.
[25] B. Waldén, R. Karlson, and J.-G. Sun, Optimal backward perturbation bounds for the linear least squares problem, Numer. Lin. Alg. Appl., 2 (1995), pp. 271–286.
[26] Z. Zeng, The approximate GCD of inexact polynomials. Part I: a univariate algorithm. Manuscript, 2004.
[27] L. Zhi, Displacement structure in computing approximate GCD of univariate polynomials, in Proc. Sixth Asian Symposium on Computer Mathematics (ASCM 2003), 2003, pp. 288–298.