
Feature Article

All you should know about your “Companion”

Robert E. Hartwig

Department of Mathematics, North Carolina State University
http://www4.ncsu.edu/~hartwig

Every mathematician will meet a good dose of linear algebra in his/her battle to reach nirvana, be it as a stepping stone to infinite dimensions, or as an entry into more abstract algebra, or as a computational tool in numerical analysis.

In this article we examine the story of the “companion matrix”

$$L[f(x)] = \begin{bmatrix} 0 & \cdots & 0 & -f_0\\ 1 & & & -f_1\\ & \ddots & & \vdots\\ 0 & & 1 & -f_{n-1} \end{bmatrix},$$

which is associated with the monic polynomial $f(x) = f_0 + f_1x + \cdots + f_nx^n$ with $f_n = 1$, over a field $\mathbb{F}$.

These matrices are the “molecules” that lie at the heart of the whole field of linear algebra and its many applications, ranging from Canonical Forms to Systems Theory, Digital Image Processing and Numerical Analysis. Their membership includes the “least periodic” matrix, in the form of a nilpotent Jordan block, as well as the “most periodic” matrix, which undoubtedly is the circulant matrix.

Companion matrices really represent polynomials and appear whenever polynomials are involved. Add to this that polynomials are one of the most important concepts in applicable mathematics, and so there is some ground for examining them.

They are in fact a realization of “finiteness”. Indeed, finiteness implies periodicity, periodicity implies the existence of annihilating polynomials, and these in turn imply the existence of companion matrices.

Companion matrices and annihilating polynomials are two concepts which, like “foot soldiers”, are always “there” in the background! In fact, as in real life, it is the behavior of these “molecules” that dictates the behavior of matrices in general.

There are numerous reasons why these matrices play such a dominant role, and we shall not attempt to give a complete “all you should know about your companion” presentation. In this note we shall demonstrate that many of the useful results involving companion matrices are a consequence of the companion shift property. We shall go through several of these and develop the notation as we go along. Since there are so many areas of application, some secrets of our companion will be left to the literature.

Before we present the shift condition, we mention some of its well known properties.

• Its transpose $L^T$ is similar to $L$, which translates into the fact that any matrix is similar to its transpose.

• $f(x)$ is a (left) annihilating polynomial for $L(f)$. Indeed $f_\ell(L_f) = L^n + L^{n-1}f_{n-1} + \cdots + If_0 = 0$, which ensures that any matrix has an annihilating polynomial (see the sketch following this list).

• Companion matrices are irreducible Hessenberg matrices that represent multiplication by $x$ modulo $f(x)$.

• They are the simplest matrices that one can write down with a prescribed characteristic polynomial. As such they are crucial building blocks in the construction of Canonical Forms.

• They are non-derogatory, i.e. the characteristic and minimal polynomials are equal. (Blame Sylvester!)

• They are sparse (= most of their entries are zero).

• They are closely associated with cyclic subspaces, which are the most important subspaces of linear algebra, module theory, etc. Indeed, their use ranges from Canonical Forms to chain computations, as found in coding and numerical analysis (GMRES).

• The link to the cyclic subspaces is provided by the cyclic chains, which play a key role in the question of minimal polynomials and basis changes.

• Companion matrices induce a companion shift, as for example in a chain of matrices, which can be manipulated in numerous ways. These shifts may be compensated for when the terms in the string satisfy suitable recurrence relations. It is this cancelation phenomenon that is at the bottom of some of the deeper theorems.

• Cyclic subspaces parallel cyclic groups, which in turn are fundamental building blocks in all of group theory.

• The companion matrix $L(x^n - 1)$ is special. It presents itself when dealing with permutation matrices and is the linchpin in the whole field of the Discrete Fourier Transform. Indeed, the roots of its polynomial generate the “mother of all groups”, i.e. the group of $n$-th roots of unity.
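To make the definition concrete, the following small sketch (in Python with NumPy; the helper name `companion` and the sample cubic are our own illustrations, not from the text) builds $L[f]$ and confirms two of the properties above: $f$ annihilates $L(f)$, and the spectrum of $L(f)$ consists of the roots of $f$.

```python
import numpy as np

def companion(f):
    """Companion matrix L[f] of a monic f(x) = f0 + f1*x + ... + x^n,
    given its lower-order coefficients [f0, f1, ..., f_{n-1}]."""
    n = len(f)
    L = np.zeros((n, n))
    L[1:, :-1] = np.eye(n - 1)      # ones on the subdiagonal
    L[:, -1] = -np.asarray(f)       # last column is -[f0, ..., f_{n-1}]
    return L

# f(x) = (x - 1)(x - 2)(x - 3) = x^3 - 6x^2 + 11x - 6
L = companion([-6.0, 11.0, -6.0])
fL = L @ L @ L - 6 * L @ L + 11 * L - 6 * np.eye(3)
assert np.allclose(fL, 0)                                    # f(L) = 0
assert np.allclose(np.sort(np.linalg.eigvals(L).real), [1, 2, 3])
```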

Before we enter the realm of the companion matrix let us first clear up some more of the needed definitions and notations. Throughout this article all matrices will be over a field $\mathbb{F}$, but many results extend to the non-commutative (block) case.

Let $A \in \mathbb{F}^{n\times n}$ and $x \in V = \mathbb{F}^n$. The characteristic and minimal polynomials of $A$ are denoted by $\Delta_A(x)$ and $\psi_A(x) = \psi_V(x)$, respectively. The minimal annihilating polynomial (m.a.p.) of $x$ relative to $A$ is the monic polynomial $\psi_x(\lambda)$, of least degree, such that $\psi_x(A)x = 0$. A polynomial will be denoted by $p(x)$ or by $p(\lambda)$, when there is no ambiguity, and we use $\partial(.)$ for degree. The reciprocal polynomial of $f(x)$ is given by $\tilde f(x) = x^nf(1/x)$. We shall freely interchange $L_f$ and $L(f)$ and will suppress the subscript where convenient. We use $\mathrm{rk}(.)$ and $\nu(.)$ for rank and nullity and denote the Kronecker product by $\otimes$. $A$ is regular if $AA^-A = A$ for some $A^-$, and $\mathrm{col}[x_1, \ldots, x_n]$ stands for $[x_1^T, \ldots, x_n^T]^T$.

Besides $L(f)$, there are several other matrices that are also determined by $f(x)$. In particular we need its $n\times n$ symmetric Hankel matrix $G = G[f(x)]$, and the $(n+t)\times t$ basic shift matrix $S_t(f)$, generated by $f(x)$, which are given by

$$G_f = \begin{bmatrix} f_1 & f_2 & \cdots & f_{n-1} & f_n\\ f_2 & f_3 & \cdots & f_n & 0\\ \vdots & \vdots & \iddots & & \vdots\\ f_{n-1} & f_n & & \cdots & 0\\ f_n & 0 & \cdots & & 0 \end{bmatrix}, \qquad S_t(f) = \begin{bmatrix} f_0 & & & 0\\ f_1 & f_0 & & \\ \vdots & f_1 & \ddots & \\ f_n & \vdots & \ddots & f_0\\ & f_n & & f_1\\ & & \ddots & \vdots\\ 0 & & & f_n \end{bmatrix}.$$

For later use, we denote the “flip matrix” $G(x^n)$ by $F$.

In many settings it is essential to use the transposed shift, on $L^T$ rather than $L$ – as for example with eigenvectors – because left and right multiplication are different. The fundamental relation between $L$ and $L^T$ is given by

$$L_fG_f = (L_fG_f)^T = G_fL_f^T \qquad\text{or}\qquad G^{-1}LG = L^T,$$

and it even holds in the non-commutative case. This identity is the reason why $G$ is often referred to as the “intertwining” matrix or symmetrizer.

Consider $L_f$, where $f(x) = f_0 + f_1x + \cdots + x^n$. We associate the coefficient vectors $f = [f_0, \ldots, f_{n-1}]^T$ and $\hat f = [f_0, \ldots, f_n]^T$, and the chain matrices $X_n' = [1, x, \ldots, x^{n-1}]$, $X_n = X_n'^T$ and $Y_n = [1, y, \ldots, y^{n-1}]^T$. The companion shift takes the form

$$\text{(i)}\quad xX_n' - X_n'L_f = f(x)e_n^T \qquad\text{or}\qquad \text{(ii)}\quad L_f^TX_n - xX_n = -f(x)e_n, \qquad (0.1)$$

which is trivial to verify and yet is the most important property of the companion matrix!

The basic shift matrix $S_m(f)$ has been introduced to take care of polynomial multiplication, i.e. of convolution. Again, let $f(x) = f_0 + f_1x + \cdots + f_nx^n$, $g(x) = g_0 + g_1x + \cdots + g_mx^m$, and $h(x) = f(x)g(x) = h_0 + h_1x + \cdots + h_{m+n}x^{m+n}$, with coordinate columns $\bar f$, $\bar g$ and $\bar h$. The two key results that we need are

$$f(x)X_n' = X_{2n}'S_n(f) \qquad (0.2)$$

and

$$S_{m+1}(f)\bar g = \bar h.$$

The latter reflects the convolution product $h_k = f_kg_0 + f_{k-1}g_1 + \cdots + f_0g_k$.

There are numerous types of operations that we can now perform on this shift equation.

(i) We can multiply through by a suitable matrix.

(ii) We can embed this shift into a unimodular polynomial matrix – which will enable us to show the equivalence of $xI - L$ to its Smith Normal Form $\mathrm{diag}(f(x), I)$.

(iii) We can consider it as a matrix equation of the form $AX - XB = C$, and use the telescoping trick – i.e. repeatedly pre-multiplying by $A$ and post-multiplying by $B$, and adding – to arrive at

$$A^kX - XB^k = \Gamma_k = A^{k-1}C + A^{k-2}CB + \cdots + CB^{k-1}. \qquad (0.3)$$

(iv) We may (formally) differentiate to give

$$X_n'^{(k)}(L_f - xI) = kX_n'^{(k-1)} - f^{(k)}(x)e_n^T.$$

(v) We may evaluate the identity at $x = a$ or replace $x$ by a matrix $B$, giving us useful block identities.

(vi) We may combine any of the above, such as the companion shift with the basic shift $S_n(g)$, or differentiation followed by multiplication.
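The shift identity (0.1)(i) and the intertwining property of $G$ are easy to verify numerically. Here is a hedged sketch (NumPy; the helpers `companion` and `hankel_G` and the sample quartic are ours), checking (0.1)(i) at one sample point together with $L_fG_f = G_fL_f^T$:

```python
import numpy as np

def companion(f):
    n = len(f)
    L = np.zeros((n, n))
    L[1:, :-1] = np.eye(n - 1)
    L[:, -1] = -np.asarray(f)
    return L

def hankel_G(f):
    """Symmetrizer G[f]: entry (i, j) is f_{i+j+1}, with f_n = 1."""
    n = len(f)
    c = list(f) + [1.0]
    return np.array([[c[i + j + 1] if i + j + 1 <= n else 0.0
                      for j in range(n)] for i in range(n)])

f = [2.0, -3.0, 5.0, 1.0]                  # f(x) = 2 - 3x + 5x^2 + x^3 + x^4
L, G = companion(f), hankel_G(f)

x = 1.7                                    # an arbitrary sample point
X = x ** np.arange(4)                      # the row chain X'_4 = [1, x, x^2, x^3]
fx = 2 - 3*x + 5*x**2 + x**3 + x**4
assert np.allclose(x * X - X @ L, fx * np.eye(4)[3])   # (0.1)(i)
assert np.allclose(L @ G, G @ L.T)                     # L G = G L^T
assert np.allclose(L @ G, (L @ G).T)                   # hence L G is symmetric
```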

Let us now present each of the above and see how each application can be used.

1 Multiplication

If we multiply (0.1) by $G$, we meet the family of adjoint polynomials $f_i(x)$ given by

$$F_n' = [f_0(x), f_1(x), \ldots, f_{n-1}(x)] = [1, x, x^2, \ldots, x^{n-1}]\,G_f,$$

or in detail

$$f_k(x) = f_{k+1} + f_{k+2}x + \cdots + f_nx^{n-k-1}, \qquad k = 0, 1, \ldots, n-1.$$

It should be noted that $f_{-1}(\lambda) = f(\lambda)$ and that $f_{n-1}(\lambda) = f_n = 1$.

If we multiply (0.1)-(i) on the right by $G$ we obtain the adjoint shift condition

$$xF_n' - F_n'L_f^T = f(x)e_1^T, \qquad (1.4)$$

which is equivalent to the recurrence relation $f_{k-1}(x) = f_k + xf_k(x)$, $k = 0, 1, \ldots, n-1$.

If we right multiply (0.1) by $\mathrm{adj}(xI - L)$ then we generate $f(x)e_n^T\,\mathrm{adj}(xI - L) = X_n'(xI - L)\,\mathrm{adj}(xI - L) = X_n'f(x)I$, from which we may cancel $f(x)$ to give

$$X_n' = [1, x, \ldots, x^{n-1}] = e_n^T[\mathrm{adj}(xI - L_f)].$$

2 Substitution

Given a matrix $A \in \mathbb{F}^{n\times n}$, if we replace $x$ by $A$ in (0.1) and (1.4) we see that

$$A[I, A, \ldots, A^{m-1}] = [I, A, \ldots, A^{m-1}](L_f \otimes I) + [0, \ldots, 0, f(A)] \qquad (2.5)$$

and

$$A[A_0, A_1, \ldots, A_{m-1}] = [A_0, A_1, \ldots, A_{m-1}](L_f^T \otimes I) + [f(A), 0, \ldots, 0].$$

If in addition $f(x) = \Delta_A(x)$ and $f(A) = 0$, then we obtain the coefficients $A_i = f_i(A)$ in the expansion $\mathrm{adj}(xI - A) = \sum_{i=0}^{n-1}A_ix^i$. Not surprisingly we can use the companion shift to actually characterize a companion matrix. Indeed,

Proposition 2.1. Given a matrix $B$ with minimal polynomial $p(x)$ of degree $n$. An $n\times n$ matrix $X$ equals $L(p)$ iff

$$B[I, B, \ldots, B^{n-1}] = [I, B, \ldots, B^{n-1}](X \otimes I). \qquad (2.6)$$

Proof. If $X = L(p)$ then take $f = p$ in (2.5). Conversely, if (2.6) holds we select $f = p$ in (2.5). Subtracting shows that $[I, B, \ldots, B^{n-1}][(L(p) - X) \otimes I] = 0$. Since the powers in the chain are independent it follows that $X = L(p)$.

We may now introduce a second matrix $B$, and multiply (2.5) through by $(I \otimes B)$ to give, for any monic polynomial $f(x)$,

$$A[B, AB, \ldots, A^{r-1}B] = [B, AB, \ldots, A^{r-1}B](L_f \otimes I) + [0, \ldots, 0, f(A)B].$$

Chains of the form $[B, AB, \ldots, A^{r-1}B]$ are of considerable importance in linear control and systems theory. We shall mainly focus on the case where $B$ is a column $x$, in which case the chain matrix $K_r(x, A) = [x, Ax, A^2x, \ldots, A^{r-1}x]$ is referred to as a Krylov matrix. There are now two cases of interest.

(i) Suppose that the m.a.p. $\psi_x$ has degree $\partial(\psi_x) = r$. Then $A^rx$ is the smallest (= first) power that is a linear combination of the previous powers, and as such the $r$ links in the chain matrix $K_r(x, A) = [x, Ax, A^2x, \ldots, A^{r-1}x]$ are linearly independent and

$$AK_r(x, A) = K_r(x, A)\,L[\psi_x(\lambda)].$$

Completing the matrix $K_r(x, A)$ to an invertible matrix $Q = [K_r, B]$, we then arrive at

$$AQ = Q\begin{bmatrix}L(\psi_x) & E\\ 0 & D\end{bmatrix} \qquad\text{i.e.}\qquad Q^{-1}AQ = \begin{bmatrix}L(\psi_x) & E\\ 0 & D\end{bmatrix}.$$

This will shortly be used as the first step in the derivation of the Cyclic Decomposition Theorem.

(ii) If, on the other hand, we take the first $n$ links in the chain we obtain $K_n(x, A) = [x, Ax, A^2x, \ldots, A^{n-1}x]$. Now $A^nx$ must be a linear combination of lower powers, say $A^nx = -[f_0x + f_1Ax + \cdots + f_{n-1}A^{n-1}x]$. We see that

$$AK_n = K_nL(f),$$

where $f(x) = f_0 + f_1x + \cdots + x^n$. It is clear that $K_n(x, A)$ will be non-singular iff $\partial(\psi_x) = n$, in which case $A$ is non-derogatory and $A = K_nL(f)K_n^{-1}$.

Associated with the above chain is the cyclic subspace $Z_x(A) = \langle x, Ax, A^2x, \ldots\rangle$ generated by $x$. It is also referred to as the Krylov subspace generated by $x$, and is used, for example, in the GMRES method of numerical analysis.

A vectorspace $V$ is called cyclic if $V = Z_u(A)$ for some vector $u$ in $V$. For the case where $V = \mathbb{F}^n$ this means that $V = \langle u, Au, \ldots, A^{n-1}u\rangle = R(K_n(u, A))$.
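Case (ii) is easy to demonstrate numerically: a random $A$ is generically non-derogatory, so the Krylov matrix of a random vector is invertible and exhibits the similarity $A = K_nL(f)K_n^{-1}$. A sketch under that genericity assumption (all names and data ours):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))            # generically non-derogatory
x = rng.standard_normal(n)

K = np.column_stack([np.linalg.matrix_power(A, i) @ x for i in range(n)])
c = np.linalg.solve(K, A @ K[:, -1])       # A^n x = K c, so c = -[f0, ..., f_{n-1}]
Lf = np.zeros((n, n))
Lf[1:, :-1] = np.eye(n - 1)
Lf[:, -1] = c                              # last column of L(f) is -f = c
assert np.allclose(A @ K, K @ Lf)          # A K_n = K_n L(f)
assert np.allclose(A, K @ Lf @ np.linalg.inv(K))
```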

Cyclic subspaces parallel the concept of cyclic groups, which are by far the most important type of group. The following is, for example, the analog to the fundamental theorem of finite cyclic groups.

Proposition 2.2. If $V = Z_u(A)$ and $W$ is an $A$-invariant subspace then

(i) $W$ is also a cyclic (sub)space.

(ii) $W = Z_{g(A)u}$ for some polynomial $g(\lambda)$.

It is easily verified that the vector $y = g(A)u$ indeed has a m.a.p. equal to $\psi_y = \psi_A/g$.

As such, it should come as no surprise that cyclic subspaces also play an important role in several areas of applied linear algebra, such as in coding and in linear control and pole placement. Indeed, cyclic codes contain the BCH codes, which are one of the most important families of error-correcting codes. The key question: When is a vectorspace $V$ cyclic? is really a question about companion matrices.

3 Embedding

Next, we embed $X_n'$ into the unimodular polynomial matrix

$$K(x) = \begin{bmatrix}1 & x & x^2 & \cdots & x^{n-1}\\ 0 & 1 & x & \cdots & x^{n-2}\\ & & 1 & \ddots & \vdots\\ & & & \ddots & x\\ 0 & & & & 1\end{bmatrix},$$

and then compute $K(x)[xI - L(f)] = \begin{bmatrix}0^T & f(x)\\ -I & u\end{bmatrix}$, where $u = [f_0(x), f_1(x), \ldots, f_{n-2}(x)]^T$. Selecting $R(x) = \begin{bmatrix}u & -I\\ 1 & 0^T\end{bmatrix}$, we see that

$$K(x)[xI - L(f)]R(x) = \begin{bmatrix}f(x) & 0\\ 0 & I\end{bmatrix}.$$

Since $K(x)$ and $R(x)$ are unimodular with $K(x)^{-1} = I - xJ_n(0)$, we have obtained the Smith Normal Form $\mathrm{diag}(f(x), I)$ of $xI - L(f)$. Taking determinants shows further that $\Delta_L(x) = f(x)$, so that $L_f$ is indeed non-derogatory.

4 Some basic Identities

Before we progress, we shall need several basic identities that illustrate how companion matrices deal with polynomial properties. The key shift property of $L$ comes from the following.

Proposition 4.1. If $L = L_f$ and $f = [f_0, \ldots, f_{n-1}]^T$, then

(i) $L^ie_1 = e_{i+1}$ for $i = 0, 1, \ldots, n-1$ and $L^ne_1 = -f$.

(ii) $I = [e_1, Le_1, \ldots, L^{n-1}e_1]$.

(iii) $L^k = [e_{k+1}, Le_{k+1}, \ldots, L^{n-1}e_{k+1}]$, for $k = 1, 2, \ldots$

From these we obtain the curious by-product that

$$\mathrm{col}([I, L, L^2, \ldots, L^{n-1}]) = \mathrm{col}\left(\begin{bmatrix}I\\ L\\ \vdots\\ L^{n-1}\end{bmatrix}\right).$$

We next show that polynomials in $L$ and $L^T$ generate Krylov chains and that they are completely determined by their first and last columns respectively.

Lemma 4.2. Let $L = L(f)$, $g(x) = \sum_{i=0}^n g_ix^i$ and $h(x) = \sum_{i=0}^k h_ix^i$, with $k < n$, $h_k \ne 0$ and associated vectors $g = [g_0, g_1, \ldots, g_{n-1}]^T$, $h = [h_0, h_1, \ldots, h_{n-1}]^T$ and $\gamma = g - g_nf$. Then

(i) $g[L(f)] = [\gamma, L\gamma, \ldots, L^{n-1}\gamma]$.

(ii) $\mathrm{rk}[h(L)] \ge n - k$, with equality if $h\,|\,f$.

(iii) $g(L)G_f = [f_0(L_f)\gamma, \ldots, f_{n-1}(L_f)\gamma]$.

(iv) $g[L(f)]h = h[L(f)]\gamma$.

(v) $g(L^T)e_j = f_{j-1}(L^T)g(L^T)e_n$.

Proof. (i) $g(L)e_1 = \sum_{i=0}^{n-1}(g_i - g_nf_i)L^ie_1 = \sum_{i=0}^{n-1}(g_i - g_nf_i)e_{i+1} = \gamma$.

(ii) The matrix $[h, Lh, \ldots, L^{n-k-1}h] = S_{n-k}(h)$ has rank $n-k$. If $f = hq$, then $\partial(q) = n-k$ and $\mathrm{rk}[q(L)] \ge n - (n-k)$. Also $0 = h(L)q(L)$, which shows that $\nu[h(L)] \ge k$.

(iii) $[f_0(L_f)\gamma, \ldots, f_{n-1}(L_f)\gamma] = [I, L, \ldots, L^{n-1}](G_f \otimes I)(I \otimes \gamma) = [I, L, \ldots, L^{n-1}](I \otimes \gamma)G_f = [\gamma, L\gamma, \ldots, L^{n-1}\gamma]G_f = g(L)G_f$.

(iv) $[\gamma, L\gamma, \ldots, L^{n-1}\gamma]h = h(L)\gamma$.

(v) $g(L^T)e_j = G^{-1}[g(L)G]e_j = G^{-1}f_{j-1}(L)\gamma = [G^{-1}f_{j-1}(L)]g(L)e_1 = [G^{-1}f_{j-1}(L)]g(L)Ge_n = f_{j-1}(L^T)g(L^T)e_n$.

Part one shows that the m.a.p. must have degree $n$. Part two shows that if $d = (f, g)$ then $\mathrm{rk}[g(L_f)] = \mathrm{rk}[d(L_f)] = n - \partial(d)$, which illustrates the close connection between gcds and companion matrices, and is crucial in systems theory.

Next we observe that $G$ does symmetrize all powers of $L$, i.e. $L_f^kG_f$ is symmetric. In fact it follows by induction that, for $k = 1, 2, \ldots$,

$$L_f^kG_f = \mathrm{diag}\left(-\begin{bmatrix}0 & & f_0\\ & \iddots & \vdots\\ f_0 & \cdots & f_{k-1}\end{bmatrix},\ \begin{bmatrix}f_{k+1} & \cdots & f_n\\ \vdots & \iddots & \\ f_n & & 0\end{bmatrix}\right).$$
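Lemma 4.2(i) says that a polynomial in $L$ is the Krylov chain of its own first column $\gamma$. A quick numerical confirmation (NumPy; the sample polynomials are our own):

```python
import numpy as np

def companion(f):
    n = len(f)
    L = np.zeros((n, n))
    L[1:, :-1] = np.eye(n - 1)
    L[:, -1] = -np.asarray(f)
    return L

f = np.array([1.0, 4.0, -2.0, 0.5])        # monic quartic f
L = companion(f)
g = np.array([3.0, -1.0, 2.0, 1.0, 2.0])   # g(x) = 3 - x + 2x^2 + x^3 + 2x^4
gL = sum(g[i] * np.linalg.matrix_power(L, i) for i in range(5))
gamma = g[:4] - g[4] * f                   # γ = g - g_n f
chain = np.column_stack([np.linalg.matrix_power(L, i) @ gamma for i in range(4)])
assert np.allclose(gL, chain)              # g(L) = [γ, Lγ, L²γ, L³γ]
```

In particular the first column of $g(L)$ determines the whole matrix, as claimed.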

We now come to a couple of useful inverses. The inverse of $L(f)$ exists exactly when $f_0$ is invertible, and it has the form of a flipped companion matrix, i.e.

$$L(f)^{-1} = -[I(f_1f_0^{-1}) + L(f_2f_0^{-1}) + \cdots + L^{n-1}(f_0^{-1})].$$

The inverse of $G$, on the other hand, is again a Hankel matrix. Indeed,

$$G[e_n, L^Te_n, \ldots, (L^T)^{n-1}e_n] = [Ge_n, GL^Te_n, \ldots, G(L^T)^{n-1}e_n] = [e_1, LGe_n, \ldots, L^{n-1}Ge_n] = [e_1, Le_1, \ldots, L^{n-1}e_1] = I.$$

Transposing now gives

$$G^{-1} = [e_n, L^Te_n, \ldots, (L^T)^{n-1}e_n] = \begin{bmatrix}e_n^T\\ e_n^TL\\ e_n^TL^2\\ \vdots\\ e_n^TL^{n-1}\end{bmatrix}.$$

Using this in turn yields

$$I = GG^{-1} = G\begin{bmatrix}e_n^T\\ e_n^TL\\ \vdots\\ e_n^TL^{n-1}\end{bmatrix} = \begin{bmatrix}e_n^Tf_0(L)\\ \vdots\\ e_n^Tf_{n-1}(L)\end{bmatrix},$$

establishing that $e_n^Tf_i(L) = e_{i+1}^T$. We conclude this section with some of the interaction between $L(f)$ and the matrix $N = E_{1,n}$; throughout we assume that $\partial(g) < n$.

Proposition 4.3. The following hold:

(i) $L[f(x) + g(x)] = L(f) - ge_n^T$, where $g$ is the coefficient vector of $g(x)$.

(ii) $L[f(x) - 1] = L + N$ and $f(L + N) = I$.

(iii) $L[f(x) - x^k] = L + E_{k+1,n}$ and $f(L + E_{k+1,n}) = (L + E_{k+1,n})^k$.

(iv) $NL_f^kN = 0$ for $k = 0, 1, \ldots, n-2$ and $NL^{n-1}N = N$.

(v) $g(L + N)N = g(L_f)N + g_{n-1}N$.

(vi) $f(L_f + N)N = N = Nf(L_f + N)$.

(vii) $(L + N)^r - L^r = \Gamma_r(L, N, L) = L^{r-1}N + L^{r-2}NL + \cdots + NL^{r-1}$, $r = 1, \ldots, n-2$.

Proof. (ii)-(iii) $f(x) - x^k$ is an a.p. for $L + E_{k+1,n}$. (iv) Follows from Proposition 4.1(i). (v) $g(L+N)e_1 = g[L(f-1)]e_1 = L(f-1)g = (L+N)g = Lg + Ng = g(L)e_1 + g_{n-1}e_1$. (vi) $N\Gamma_r = 0$ for $r = 0, \ldots, n-1$.

5 Corner matrices

Besides multiplication or evaluation there are two other operations that we can apply to the companion shift, and these are telescoping and differentiation. Actually the corner matrix acts very much like “differentiation”. For example, for any polynomial $g(x)$,

$$g\left(\begin{bmatrix}x & 1\\ 0 & x\end{bmatrix}\right) = \begin{bmatrix}g(x) & g'(x)\\ 0 & g(x)\end{bmatrix} = g\left(\begin{bmatrix}x & 0\\ 0 & x\end{bmatrix}\right) + g'(x)\begin{bmatrix}0 & 1\\ 0 & 0\end{bmatrix}.$$

More generally, the corner matrices $\Gamma_k$ appear in the powers of $\begin{bmatrix}A & C\\ 0 & D\end{bmatrix}$. The difference form now becomes

$$\begin{bmatrix}A & C\\ 0 & D\end{bmatrix}^k - \begin{bmatrix}A & 0\\ 0 & D\end{bmatrix}^k = \begin{bmatrix}0 & \Gamma_k\\ 0 & 0\end{bmatrix},$$

where $\Gamma_k$ satisfies the down-shift recurrence relation

$$\Gamma_{k+1}(A, C, D) = A\Gamma_k + CD^k = A^kC + \Gamma_kD.$$

If we now have a second polynomial $g(x) = g_0 + g_1x + \cdots + x^N$ then $g\left(\begin{bmatrix}A & C\\ 0 & D\end{bmatrix}\right) = \begin{bmatrix}g(A) & \Gamma_g(A, C, D)\\ 0 & g(D)\end{bmatrix}$, where

$$\Gamma_g = \sum_{i=1}^N g_i\Gamma_i = \sum_{i=0}^{N-1}g_i(A)CD^i = [I, A, \ldots, A^{N-1}][G_g \otimes C]\begin{bmatrix}I\\ D\\ \vdots\\ D^{N-1}\end{bmatrix} \qquad (5.7)$$

and the $g_i(x)$ are the adjoint polynomials affiliated with $g(x)$. The difference now becomes

$$g\left(\begin{bmatrix}A & C\\ 0 & D\end{bmatrix}\right) - g\left(\begin{bmatrix}A & 0\\ 0 & D\end{bmatrix}\right) = \sum_i \begin{bmatrix}g_i(A) & 0\\ 0 & 0\end{bmatrix}\begin{bmatrix}0 & C\\ 0 & 0\end{bmatrix}\begin{bmatrix}0 & 0\\ 0 & D\end{bmatrix}^i.$$
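The corner-block formula (5.7) is easy to test numerically. A sketch (NumPy; the random data and helper name `g_adj` are ours) comparing the top-right block of $g\begin{bmatrix}A & C\\ 0 & D\end{bmatrix}$ with $\sum_i g_i(A)CD^i$:

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 3, 2
A, C, D = (rng.standard_normal(s) for s in [(p, p), (p, q), (q, q)])
g = np.array([2.0, -1.0, 3.0, 1.0])        # g(x) = 2 - x + 3x^2 + x^3, N = 3

M = np.block([[A, C], [np.zeros((q, p)), D]])
gM = sum(g[k] * np.linalg.matrix_power(M, k) for k in range(4))

def g_adj(i):                              # adjoint g_i(A) = g_{i+1} I + g_{i+2} A + ...
    return sum(g[k] * np.linalg.matrix_power(A, k - i - 1)
               for k in range(i + 1, 4))

corner = sum(g_adj(i) @ C @ np.linalg.matrix_power(D, i) for i in range(3))
assert np.allclose(gM[:p, p:], corner)     # Γ_g = Σ g_i(A) C D^i, as in (5.7)
```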

6 Companion matrices, Resultants and Bezoutians

We now present an example in which we telescope the companion shift equation, resulting in a simple relation between companion matrices, resultants and Bezoutians. The latter enter the realm of root location, stability analysis, and gcd-degree computation.

Given two monic polynomials $f(x)$ and $g(x)$ of degree $n$. The bilinear form associated with $f(x)$ is the difference quotient

$$\frac{f(x) - f(y)}{x - y} = X_n'G_fY_n = \sum_{i=0}^{n}f_i\,\Gamma_i(x, y).$$

With the polynomial pair $f(x), g(x)$ we may associate the form

$$B(f, g) = \frac{f(x)g(y) - f(y)g(x)}{x - y} = \sum_{i,j=0}^{n-1}b_{ij}\,x^iy^j = X_n'BY_n.$$

It is bilinear and anti-symmetric, i.e. $B(g, f) = -B(f, g)$. The $n\times n$ matrix $B = B(f, g)$ is called the Bezoutian (of Hankel type), and is symmetric. It is clear that

$$B(f, g) = g(x)\frac{f(x) - f(y)}{x - y} - f(x)\frac{g(x) - g(y)}{x - y} = g(x)[X_n'G_fY_n] - f(x)[X_n'G_gY_n].$$

In order to tackle the bilinear form $[g(x)X_n']G_fY_n$ we use the basic shift on $g(x)X_n'$ to simplify the result. Since $t = n$, it is convenient to split $S_n(f) = \begin{bmatrix}S_n^-(f)\\ S_n^+(f)\end{bmatrix}$, where

$$S_n^-(f) = \begin{bmatrix}f_0 & & 0\\ \vdots & \ddots & \\ f_{n-1} & \cdots & f_0\end{bmatrix}, \qquad S_n^+(f) = \begin{bmatrix}f_n & \cdots & f_1\\ & \ddots & \vdots\\ 0 & & f_n\end{bmatrix}.$$

Because of their Toeplitz structure, representing polynomial multiplication, we know that $S_n^-(f)$ and $S_n^-(g)$ commute, as do their “plus” counterparts. We can now split the basic shift (0.2) into

$$f(x)X_n' = X_{2n}'S_n(f) = X_n'S_n^-(f) + x^nX_n'S_n^+(f). \qquad (6.8)$$

Next we recall the companion shift (0.1), which has the form $AX - XB = C$, where $A = x$, $B = L(f)$, $X = X_n'$ and $C = f(x)e_n^T$. Telescoping as in (0.3) we obtain

$$x^kX_n' - X_n'L^k = \Gamma_k(xI, f(x)e_n^T, L) = f(x)e_n^T\,\Gamma_k(xI, L). \qquad (6.9)$$

If the second polynomial is given by $g(x) = g_0 + g_1x + \cdots + x^n$, then we pre-multiply (6.9) by $g_k$ and sum, giving

$$g(x)X_n' - X_n'g(L_f) = f(x)e_n^T\,\Gamma_g(xI, L) = f(x)q^T(x). \qquad (6.10)$$

We next use the adjoint polynomials in

$$q^T = e_n^T\Gamma_g = \sum_{i=0}^{n-1}g_i(x)e_n^TL_f^i = [g_0(x), \ldots, g_{n-1}(x)]\begin{bmatrix}e_n^T\\ e_n^TL_f\\ \vdots\\ e_n^TL_f^{n-1}\end{bmatrix} = X_n'G_gG_f^{-1} = X_n'Q,$$

where $Q = G_gG_f^{-1}$. Applying the basic shift in (6.10) now gives

$$X_{2n}'S_n(g) - X_{2n}'\begin{bmatrix}g(L_f)\\ 0\end{bmatrix} = X_{2n}'S_n(f)Q,$$

from which we obtain the identity

$$S_n(g)G_f - \begin{bmatrix}g(L_f)\\ 0\end{bmatrix}G_f = S_n(f)G_g.$$

We could also have telescoped the adjoint shift equation. Splitting this then produces

$$\begin{bmatrix}S_n^-(g)G_f\\ S_n^+(g)G_f\end{bmatrix} - \begin{bmatrix}S_n^-(f)G_g\\ S_n^+(f)G_g\end{bmatrix} = \begin{bmatrix}g(L_f)G_f\\ 0\end{bmatrix},$$

in which we equate blocks to yield

$$S_n^-(g)G_f - S_n^-(f)G_g = g(L_f)G_f \qquad\text{and}\qquad S_n^+(g)G_f = S_n^+(f)G_g. \qquad (6.11)$$

The latter follows from the fact that $S_n^+(f)$ and $S_n^+(g)$ commute and

$$S_n^+(f)F = G_f, \qquad FG_f = S_n^-(\tilde f), \qquad S_n^-(f)^T = S_n^+(\tilde f), \qquad (6.12)$$

where $\tilde f$ is the reciprocal polynomial. Using the split basic shift (6.8) we see that

$$[g(x)X_n']G_fY_n = X_n'S_n^-(g)G_fY_n + x^nX_n'S_n^+(g)G_fY_n,$$

while

$$[f(x)X_n']G_gY_n = X_n'S_n^-(f)G_gY_n + x^nX_n'S_n^+(f)G_gY_n.$$

Subtracting them, we obtain

$$B(f, g) = X_n'[S_n^-(g)G_f - S_n^-(f)G_g]Y_n + x^nX_n'[S_n^+(g)G_f - S_n^+(f)G_g]Y_n,$$

in which the second term vanishes because of (6.11). Extracting the matrix we see that $B(f, g) = S_n^-(g)G_f - S_n^-(f)G_g$, which on account of (6.11) gives Barnett's formula

$$B(f, g) = S_n^-(g)G_f - S_n^-(f)G_g = g(L_f)G_f.$$

Lastly we introduce the resultant matrix

$$M(g, f) = \begin{bmatrix}S_n^-(g) & S_n^-(f)\\ S_n^+(g) & S_n^+(f)\end{bmatrix}$$

and the two matrices

$$T = \begin{bmatrix}I & -S_n^-(f)[S_n^+(f)]^{-1}\\ 0 & I\end{bmatrix} \qquad\text{and}\qquad U = \begin{bmatrix}I & 0\\ -Q & I\end{bmatrix}.$$

We subsequently compute the triplet $TM(g, f)U$ in two ways, establishing that

$$\begin{bmatrix}\zeta & 0\\ S_n^+(g) & S_n^+(f)\end{bmatrix}\begin{bmatrix}I & 0\\ -Q & I\end{bmatrix} = \begin{bmatrix}I & -S_n^-(f)[S_n^+(f)]^{-1}\\ 0 & I\end{bmatrix}\begin{bmatrix}g(L_f) & S_n^-(f)\\ 0 & S_n^+(f)\end{bmatrix} = \begin{bmatrix}g(L_f) & 0\\ 0 & S_n^+(f)\end{bmatrix}.$$

As such we see that

$$g(L_f) = \zeta = S_n^-(g) - S_n^-(f)[S_n^+(f)]^{-1}S_n^+(g),$$

which is the Schur complement of the (2,2) block of $M(g, f)$. In conclusion we may use (6.12) to compute

$$M(f, g)\,M^T(\tilde g, -\tilde f) = \begin{bmatrix}S_n^-(f) & S_n^-(g)\\ S_n^+(f) & S_n^+(g)\end{bmatrix}\begin{bmatrix}S_n^-(\tilde g)^T & S_n^+(\tilde g)^T\\ -S_n^-(\tilde f)^T & -S_n^+(\tilde f)^T\end{bmatrix},$$

which reduces to $\mathrm{diag}(K, N)$, where

$$K = -B(f, g)F \qquad\text{and}\qquad N = -FB(\tilde f, \tilde g).$$

Needless to say, there are numerous generalizations of this concept to multivariate and non-commutative settings.
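Barnett's formula lends itself to a direct numerical check: build $B = g(L_f)G_f$ and compare the bilinear form $X_n'(x)BY_n(y)$ with the defining difference quotient at sample points. A sketch (NumPy; the helper names and the sample quartics are ours):

```python
import numpy as np

def companion(f):
    n = len(f)
    L = np.zeros((n, n))
    L[1:, :-1] = np.eye(n - 1)
    L[:, -1] = -np.asarray(f)
    return L

def hankel_G(f):
    n = len(f)
    c = list(f) + [1.0]
    return np.array([[c[i + j + 1] if i + j + 1 <= n else 0.0
                      for j in range(n)] for i in range(n)])

f = np.array([1.0, -2.0, 0.0, 3.0])        # f(x) = 1 - 2x + 3x^3 + x^4
g = np.array([2.0, 1.0, -1.0, 0.0])        # g(x) = 2 + x - x^2 + x^4
fc, gc = np.append(f, 1.0), np.append(g, 1.0)
pf, pg = np.polynomial.Polynomial(fc), np.polynomial.Polynomial(gc)

L, G = companion(f), hankel_G(f)
B = sum(gc[k] * np.linalg.matrix_power(L, k) for k in range(5)) @ G

x, y = 0.9, -1.3                           # arbitrary sample points
lhs = (pf(x) * pg(y) - pf(y) * pg(x)) / (x - y)
rhs = (x ** np.arange(4)) @ B @ (y ** np.arange(4))
assert np.isclose(lhs, rhs)                # B(f, g) = g(L_f) G_f
assert np.allclose(B, B.T)                 # the Bezoutian is symmetric
```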

7 Generalized Adjoint chains

The idea of an adjoint chain may be extended by using a block Hankel matrix $G$. Indeed, suppose we are given the $m\times n$ matrix polynomial $F(x) = F_0 + F_1x + \cdots + F_Nx^N$. The adjoint polynomials associated with this master polynomial are defined by

$$F_k(x) = F_{k+1} + F_{k+2}x + \cdots + F_Nx^{N-k-1}, \qquad k = 0, \ldots, N-1,$$

with in addition $F_N(x) = 0$, $F_{-1}(x) = F(x)$ and $F_{N-1}(x) = F_N$. As in the scalar case they can be expressed in matrix form as

$$[F_0(x), F_1(x), \cdots, F_{N-1}(x)] = [1, x, \cdots, x^{N-1}]\,G(F),$$

where $G = G(F)$ is the block Hankel matrix

$$G = \begin{bmatrix}F_1 & F_2 & \cdots & & F_N\\ F_2 & F_3 & \cdots & F_N & 0\\ \vdots & & \iddots & & \\ F_{N-1} & F_N & 0 & \cdots & \\ F_N & 0 & \cdots & & 0\end{bmatrix}.$$

The key feature of these polynomials is that they satisfy the down-shift recurrence relation

$$x\,F_k(x) = F_{k-1}(x) - F_k, \qquad k = 0, 1, \cdots, N-1,$$

in which we may replace $x$ by a suitable square matrix $A$ on the left, or $D$ on the right. Using the block recurrence we may write

$$\begin{bmatrix}F_1(x)\\ \vdots\\ F_N(x)\end{bmatrix}x = \begin{bmatrix}F_0(x)\\ \vdots\\ F_{N-1}(x)\end{bmatrix} - \begin{bmatrix}F_1\\ \vdots\\ F_N\end{bmatrix},$$

and by using the companion structure we also have

$$(L_f \otimes I)\begin{bmatrix}F_1(x)\\ \vdots\\ F_N(x)\end{bmatrix} = \begin{bmatrix}0\\ F_1(x)\\ \vdots\\ F_{N-1}(x)\end{bmatrix} - \begin{bmatrix}f_0I\\ \vdots\\ f_{N-1}I\end{bmatrix}F_N(x).$$

Subtracting these we arrive at the generalized adjoint-companion shift identity

$$(L_f \otimes I)\begin{bmatrix}F_1(x)\\ \vdots\\ F_N(x)\end{bmatrix} - \begin{bmatrix}F_1(x)\\ \vdots\\ F_N(x)\end{bmatrix}x = \begin{bmatrix}F_1\\ \vdots\\ F_N\end{bmatrix} - \begin{bmatrix}F_0(x)\\ 0\\ \vdots\\ 0\end{bmatrix}, \qquad (7.13)$$

together with a row analog. It should be noted that the indices differ by one from those in (1.4), and it goes without saying that we may now again replace $x$ by a suitable matrix $D$.
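The identity (7.13) can be verified numerically for random block coefficients; note that since $F_N(x) = 0$, any monic $f$ of degree $N$ may drive the companion shift. A sketch (NumPy; all data and the helper name `Fk` are ours):

```python
import numpy as np

rng = np.random.default_rng(2)
N, m = 3, 2
F = [rng.standard_normal((m, m)) for _ in range(N + 1)]   # F0, ..., F3

def Fk(k, x):                        # adjoint chain F_k(x) = F_{k+1} + F_{k+2} x + ...
    out = np.zeros((m, m))
    for j in range(k + 1, N + 1):
        out += F[j] * x ** (j - k - 1)
    return out

f = np.array([1.0, -2.0, 4.0])       # any monic cubic
L = np.zeros((N, N)); L[1:, :-1] = np.eye(N - 1); L[:, -1] = -f

x = 0.8
stack = np.vstack([Fk(k, x) for k in range(1, N + 1)])    # [F_1(x); F_2(x); F_3(x)]
lhs = np.kron(L, np.eye(m)) @ stack - stack * x
rhs = np.vstack(F[1:]) - np.vstack([Fk(0, x)] + [np.zeros((m, m))] * (N - 1))
assert np.allclose(lhs, rhs)         # the generalized adjoint-companion shift (7.13)
```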

8 The Cyclic Decomposition Theorem

This theorem is a statement about the periodicity of finite dimensional objects, and is realized in terms of companion matrices.

There are essentially two versions of this theorem: a weak version, which is easier to prove, and a strong version, which requires much more firepower. The weak version says that any matrix $A$ in $V = \mathbb{F}^{n\times n}$ is similar to a direct sum of companion matrices. Or equivalently, that any vectorspace over a field $\mathbb{F}$ can be decomposed as a direct sum of “cyclic” subspaces. It may be considered as a special case of the fundamental theorem of abelian groups.

The best example is that of a permutation, which is a product of distinct cycles. In terms of matrices this says that any permutation matrix is similar to a direct sum of matrices of the form $L(x^r - 1)$.

We shall use the adjoint-companion shift (7.13) to derive the “strong” version of this theorem – which is often called the Rational Canonical Form – in which, in addition, the minimal polynomials of the companion matrices interlace. The proof is short and does not use quotient spaces. When this theorem is combined with the Primary Decomposition Theorem, they will spawn the Jacobson and Jordan Canonical Forms.

Given a matrix $A$ in $V = \mathbb{F}^{n\times n}$, with minimal polynomial $\psi_A$. Select a maximal vector $x$, for which $\psi_x = \psi_A = f(\lambda) = f_0 + f_1\lambda + \cdots + \lambda^m$. The existence of such a vector follows as for finite abelian groups $G$, when we replace the order $O(.)$ of an element by the m.a.p. of a vector. Indeed in $G$, if $O(a) \nmid O(b)$ then there exists $z$ in $G$ such that $O(b)\,|\,O(z)$ but $O(b) \ne O(z)$, and if $O(y)$ is maximal then $O(a)\,|\,O(y)$ for all $a$ in $G$.

We then form the chain matrix $K = [x, Ax, \ldots, A^{m-1}x]$, which has rank $m$, and complete it to a basis $Q = [K, B]$. Then $AQ = Q\begin{bmatrix}L & C\\ 0 & D\end{bmatrix}$, for some $C$ and $D$. Since $Q$ is invertible, $Q^{-1}AQ = \begin{bmatrix}L & C\\ 0 & D\end{bmatrix} = M$, and thus $\psi_M = \psi_A = f$. It now follows that $0 = f(M) = \begin{bmatrix}f(L) & \Gamma_f\\ 0 & f(D)\end{bmatrix}$, in which the corner block takes the form

$$\Gamma_f = \sum_{k=0}^m f_k\sum_{j=0}^{k-1}L^{k-j-1}CD^j = \sum_{i=0}^{m-1}f_i(L)CD^i = 0 \qquad (8.14)$$

and the $f_k(\lambda)$ are the usual adjoint polynomials of $f(x)$. Suppose now that $C = \begin{bmatrix}\gamma_1^T\\ \vdots\\ \gamma_m^T\end{bmatrix}$ and set $F_k(x) = \sum_{i=k}^{m-1}\gamma_{i+1}^Tx^{i-k}$ and $F_m(x) = 0$. It is easily seen that $F_k(x)$ satisfies $F_k(x)\,x = F_{k-1}(x) - \gamma_k^T$, so that we can use the adjoint-shift identity (7.13) to obtain

$$L(f)\begin{bmatrix}F_1(D)\\ \vdots\\ F_m(D)\end{bmatrix} - \begin{bmatrix}F_1(D)\\ \vdots\\ F_m(D)\end{bmatrix}D = C - \begin{bmatrix}F_0(D)\\ 0\\ \vdots\\ 0\end{bmatrix},$$

in which $F_0(D) = \sum_{i=0}^{m-1}\gamma_{i+1}^TD^i = \sum_{i=0}^{m-1}e_{i+1}^TCD^i$. Using the fact that $e_m^Tf_i(L) = e_{i+1}^T$, this reduces to $F_0(D) = \sum_{i=0}^{m-1}e_m^Tf_i(L)CD^i = e_m^T\sum_{i=0}^{m-1}f_i(L)CD^i = e_m^T\Gamma_f$, and thus, by (8.14), vanishes. In other words, we have constructed a solution $X$ to the matrix equation $L(f)X - XD = C$. This means that

$$\begin{bmatrix}I & X\\ 0 & I\end{bmatrix}\begin{bmatrix}L(f) & C\\ 0 & D\end{bmatrix}\begin{bmatrix}I & -X\\ 0 & I\end{bmatrix} = \begin{bmatrix}L(f) & 0\\ 0 & D\end{bmatrix},$$

and consequently $A \approx M \approx \begin{bmatrix}L(f) & 0\\ 0 & D\end{bmatrix}$, in which $\psi_D\,|\,\psi_M = f$. It goes without saying that we may repeat the above steps with $D$ to obtain a direct sum decomposition

$$A \approx \mathrm{diag}[L(\psi_1), L(\psi_2), \ldots, L(\psi_t)],$$

where $\psi_t \,|\, \psi_{t-1} \,|\, \cdots \,|\, \psi_2 \,|\, \psi_1$.

The polynomials $J_A = (\psi_1, \ldots, \psi_t)$ are unique, and are called the invariant factors of $A$. They completely characterize similarity, they do not depend on any possible factorization of polynomials, and they were obtained by only using “rational operations”. Hence the alternative name of “Rational Canonical Form”.

The uniqueness of this Canonical Form follows at once if we recall that $\psi_1 = \psi_A$ is unique and then apply the following elementary result.

Lemma 8.1. If $M = \begin{bmatrix}A & 0\\ 0 & B\end{bmatrix} \approx \begin{bmatrix}A & 0\\ 0 & C\end{bmatrix} = N$ then $\psi_B = \psi_C$.

Proof. $\psi_B(M) = \begin{bmatrix}\psi_B(A) & 0\\ 0 & 0\end{bmatrix} \approx \begin{bmatrix}\psi_B(A) & 0\\ 0 & \psi_B(C)\end{bmatrix}$. Taking ranks shows that $\mathrm{rk}[\psi_B(C)] = 0$, and thus $\psi_B(C) = 0$ and $\psi_C \,|\, \psi_B$. By symmetry, it also follows that $\psi_B \,|\, \psi_C$, ensuring equality.

Now if $A \approx \mathrm{diag}[L(\psi), D] \approx \mathrm{diag}[L(\psi), E]$ then, by applying Lemma (8.1), we see that $\psi_D = \psi_E$, so that we can indeed continue the reduction process with $D$ or with $E$. The same polynomials will be obtained.

It is of interest to note that the above method can actually also be used to prove that similarity of $\begin{bmatrix}A & C\\ 0 & D\end{bmatrix}$ and $\begin{bmatrix}A & 0\\ 0 & D\end{bmatrix}$ ensures that $AX - XD = C$ has a solution.
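The proof above is constructive: the stack $X = \mathrm{col}(F_1(D), \ldots, F_m(D))$ built from the rows of $C$ solves $L(f)X - XD = C$ up to a residual $-F_0(D)$ in its first row, and that residual is $e_m^T\Gamma_f$, which vanishes in the theorem's setting. A sketch of the unconditional identity for random data (NumPy; names ours):

```python
import numpy as np

rng = np.random.default_rng(3)
m, q = 3, 2
f = np.array([0.5, -1.0, 2.0])             # monic f of degree m = 3
L = np.zeros((m, m)); L[1:, :-1] = np.eye(m - 1); L[:, -1] = -f
C = rng.standard_normal((m, q))
D = rng.standard_normal((q, q))

def Fk(k):                                 # F_k(D) = sum_{i >= k} (row i+1 of C) D^{i-k}
    out = np.zeros((1, q))
    for i in range(k, m):
        out += C[i:i+1, :] @ np.linalg.matrix_power(D, i - k)
    return out

X = np.vstack([Fk(k) for k in range(1, m + 1)])   # [F_1(D); ...; F_m(D)], F_m = 0
resid = L @ X - X @ D - C
assert np.allclose(resid[1:], 0)           # rows 2..m are matched exactly
assert np.allclose(resid[0], -Fk(0)[0])    # row 1 is off by -F_0(D) = -e_m^T Γ_f,
                                           # which vanishes when f(M) = 0
```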

9 Differentiation

If the field is algebraically closed we may use the companion shift to obtain the Jordan form for $L(f)$, but since we use right eigenvectors, it is more convenient to use $L^T$. The transposed companion shift takes the form

$$L_f^TX_n(x) - X_n(x)\,x = -f(x)e_n.$$

Now $f(a) = 0$ iff $X_n(a)$ is an eigenvector for $L^T$ associated with the eigenvalue $a$. The corresponding eigenvector for $L_f$ will be $F(a) = G_fX_n(a) = \begin{bmatrix}f_0(a)\\ \vdots\\ f_{n-1}(a)\end{bmatrix}$. Now because $\mathrm{rk}[L^T - aI] = n - 1$, it follows that there can only be one independent eigenvector for $a$, and thus there is exactly one Jordan block per eigenvalue, as expected for a non-derogatory matrix.

In the simplest case $f(x) = \prod_{i=1}^n(x - \lambda_i)$ has $n$ distinct roots and $L^T(f)$ has $n$ distinct eigenvalues, and as such is diagonalizable via its eigenvector matrix $V = [X_n(\lambda_1), \ldots, X_n(\lambda_n)]$. Needless to say this is the celebrated Vandermonde matrix

$$V = \begin{bmatrix}1 & 1 & \cdots & 1\\ \lambda_1 & \lambda_2 & \cdots & \lambda_n\\ \lambda_1^2 & \lambda_2^2 & \cdots & \lambda_n^2\\ \vdots & & & \vdots\\ \lambda_1^{n-1} & \lambda_2^{n-1} & \cdots & \lambda_n^{n-1}\end{bmatrix}.$$

We may conclude that if $f(x)$ has distinct roots then

$$V^{-1}L^T(f)V = \mathrm{diag}(\lambda_1, \ldots, \lambda_n) = \Lambda.$$

Likewise $L(f)$ is diagonalized by the basis change $GV$, i.e. $L(GV) = (GV)\Lambda$, where $(GV)_{ij} = f_i(\lambda_j)$, $i, j = 0, 1, \ldots, n-1$.

We next note that, because $\frac{f(x) - f(y)}{x - y} = X_n'G_fY_n$, if $\alpha$ and $\beta$ are two distinct roots of $f(x)$ then the quotient vanishes and so $X_n'(\alpha)G_fY_n(\beta) = 0$. Also, letting $x$ approach $y$, or by summing directly, we see that $X_n'(x)G_fY_n(x) = f'(x)$. This means that $V^TG_fV = \mathrm{diag}(f'(\lambda_1), \ldots, f'(\lambda_n)) = D$, or $V^{-1} = D^{-1}V^TG$, which can then be used to establish that

$$V^TB(f, g)V = V^Tg(L_f)G_fV = V^TG_f\,g(L_f^T)V = (V^TGV)\,g(\Lambda) = D\,g(\Lambda).$$

The companion matrix $\Omega = L(x^n - 1) = [e_2, \ldots, e_n, e_1]$ is called the basic circulant, and any polynomial $p(\Omega)$ in $\Omega$ is a circulant matrix. For example if $p(x) = p_0 + p_1x + \cdots + p_{n-1}x^{n-1}$ then

$$p(\Omega) = \begin{bmatrix}p_0 & p_{n-1} & \cdots & p_2 & p_1\\ p_1 & p_0 & p_{n-1} & \cdots & p_2\\ p_2 & p_1 & p_0 & & \vdots\\ \vdots & & & \ddots & p_{n-1}\\ p_{n-1} & \cdots & & p_1 & p_0\end{bmatrix}.$$
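A quick numerical look at the basic circulant (NumPy; the sample coefficients are ours): the columns of $p(\Omega)$ are the cyclic down-shifts of the coefficient vector, and the DFT matrix diagonalizes every circulant.

```python
import numpy as np

n = 4
Omega = np.roll(np.eye(n), 1, axis=0)      # basic circulant L(x^n - 1) = [e2, e3, e4, e1]
p = np.array([1.0, 2.0, 3.0, 4.0])         # p(x) = 1 + 2x + 3x^2 + 4x^3
P = sum(p[k] * np.linalg.matrix_power(Omega, k) for k in range(n))
assert all(np.allclose(P[:, j], np.roll(p, j)) for j in range(n))   # circulant

w = np.exp(2j * np.pi / n)
V = np.array([[w ** (i * j) for j in range(n)] for i in range(n)])
assert np.allclose(Omega.T @ V, V @ np.diag(w ** np.arange(n)))     # Ω^T V = V D
evals = np.conj(V) @ p                     # p(ω^{-j}), the eigenvalues of p(Ω)
assert np.allclose(P @ V, V @ np.diag(evals))
```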

The matrix $\Omega$ is one of the most important matrices in all of applied mathematics. Indeed, since $\Delta_\Omega(\lambda) = \lambda^n - 1$, its eigenvalues are the $n$ distinct $n$-th roots of unity, $\sigma = \{1, \omega, \omega^2, \ldots, \omega^{n-1}\}$, where $\omega = \exp(\frac{2\pi i}{n})$. Consequently it also has $n$ independent eigenvectors $v_n(\omega^k)$, $k = 0, 1, \ldots, n-1$ (called phasers), and as such $\Omega$ can be diagonalized via

$$\Omega^TV = VD \qquad\text{and}\qquad \Omega V = VD^{-1},$$

where $D = \mathrm{diag}(1, \omega, \ldots, \omega^{n-1})$ and $V$ is the Vandermonde matrix

$$V = \begin{bmatrix}1 & 1 & 1 & \cdots & 1\\ 1 & \omega & \omega^2 & \cdots & \omega^{n-1}\\ 1 & \omega^2 & \omega^4 & \cdots & \omega^{2(n-1)}\\ \vdots & & & & \vdots\\ 1 & \omega^{n-1} & \omega^{2(n-1)} & \cdots & \omega^{(n-1)(n-1)}\end{bmatrix} = V^T.$$

Now $V$ is not only symmetric but its columns $v(\omega^k)$ are also pairwise orthogonal! Indeed,

$$v(\omega^k)^*v(\omega^r) = \sum_{s=0}^{n-1}\omega^{s(r-k)} = \begin{cases}n & \text{if } r = k,\\ 0 & \text{if } r \ne k.\end{cases}$$

Consequently we may normalize the eigenvectors and use the unitary matrix $W = \frac{1}{\sqrt n}V$, for which $W^{-1} = \bar W^T = \bar W$. The linear map

$$y = Wx$$

is referred to as the Discrete Fourier Transform. It is of cardinal importance in the theory of filtering.

We shall now examine the case of repeated roots of $f(\lambda)$. Like Janus, differentiation is a “two-faced” personality. On the one hand it is used to compute tangents and tangent planes, and as such is all important in optimization, while on the other hand it also serves as the ultimate counting machine. This makes it indispensable in combinatorics, and in fact anywhere where polynomials are used. Recall that the term $\lambda^k$ is after all just a place holder, and that its coefficient can be “counted” by differentiating $k$ times. As such we have two counting tools, matrix multiplication and differentiation, and our main trick is to convert differential identities into matrix identities.

Differentiating the transposed companion shift $k$ times gives $L^TX_n^{(k)} = xX_n^{(k)} + kX_n^{(k-1)} - f^{(k)}(x)e_n$. Dividing by $k!$ and setting $M_k = X_n^{(k)}/k!$ yields $L^TM_k = xM_k + M_{k-1} - (f^{(k)}/k!)e_n$, which is a step-down recurrence relation. Stacking $r$ of these columns in $W_{r,n}(x) = [M_0, \ldots, M_{r-1}]$ shows that

$$L^TW_{r,n}(x) = W_{r,n}(x)J_r(x) - e_nF_r(x),$$

where $F_r(x) = \left[f(x), \frac{f'(x)}{1!}, \ldots, \frac{f^{(r-1)}(x)}{(r-1)!}\right]$. If we substitute $\lambda_i$ for $x$ and take $r = m_i$, then $F_r(\lambda_i) = 0$, leaving

$$L^TW_{m_i,n}(\lambda_i) = W_{m_i,n}(\lambda_i)J_{m_i}(\lambda_i),$$

where

$$W_{m,n+1}(\lambda)^T = \begin{bmatrix}1 & \lambda & \lambda^2 & \cdots & \lambda^n\\ 0 & 1 & 2\lambda & \cdots & \binom{n}{1}\lambda^{n-1}\\ 0 & 0 & 1 & \cdots & \binom{n}{2}\lambda^{n-2}\\ \vdots & & & \ddots & \vdots\\ 0 & 0 & \cdots & 1 & \cdots\ \binom{n}{n-m+1}\lambda^{n-m+1}\end{bmatrix}$$

is an $m\times(n+1)$ confluent Vandermonde block. Stacking these blocks for each of the distinct eigenvalues we arrive at

$$L^T[W_{\lambda_1}, \ldots, W_{\lambda_s}] = [W_{\lambda_1}, \ldots, W_{\lambda_s}]\,\mathrm{diag}(J_{m_1}(\lambda_1), \ldots, J_{m_s}(\lambda_s)),$$

which is the desired Jordan form. Several points should now be noted:

(i) $W_{m,n}(\alpha)$ precisely equals the chain matrix $K_n[e_1, J_m(\alpha)]$.

(ii) It relates the derivatives to the coefficients of a polynomial in the stacked form

$$\begin{bmatrix}f(\lambda)\\ f'(\lambda)/1!\\ \vdots\\ f^{(m-1)}(\lambda)/(m-1)!\end{bmatrix} = W_{m,n+1}(\lambda)^T\begin{bmatrix}f_0\\ f_1\\ \vdots\\ f_n\end{bmatrix}.$$

(iii) The matrix $W = [W_{\lambda_1}, \ldots, W_{\lambda_s}]$ is the generalized Vandermonde matrix. It is also known as the Caratheodory matrix, which appears in the study of moment problems.

(iv) $W$ also equals the Wronskian matrix of the set of functions $\{t^je^{\lambda_kt}\}$, $k = 1, \ldots, s$ and $j = 0, \ldots, m_k - 1$, and appears in the study of differential equations.

(v) If $[f_0, \ldots, f_n]W = 0$ then $f^{(j)}(\lambda_k) = 0$ for $k = 1, \ldots, s$ and $j = 0, \ldots, m_k - 1$, with $m_1 + \cdots + m_s = n + 1$. But then $\pi(x) = \prod_{i=1}^s(\lambda - \lambda_i)^{m_i}\,|\,f(x)$, in which $\partial(\pi) = n + 1$ while $\partial(f) = n$, thus forcing $f(x) = 0$ and ensuring that $W$ is non-singular. As a check we can compute

(vi) $\det(W) = \prod_{1\le j<i\le s}(\lambda_i - \lambda_j)^{m_im_j}$.
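To see the confluent block in action, consider a cubic with a double root: the columns $X_3(\lambda)$ and $X_3'(\lambda)/1!$ form a Jordan chain for $L^T$. A sketch (NumPy; the sample polynomial is ours):

```python
import numpy as np

# f(x) = (x - 2)^2 (x - 3) = x^3 - 7x^2 + 16x - 12, double root at λ = 2
f = np.array([-12.0, 16.0, -7.0])
L = np.zeros((3, 3)); L[1:, :-1] = np.eye(2); L[:, -1] = -f

lam = 2.0
M0 = lam ** np.arange(3)                   # X_3(λ) = [1, λ, λ^2]
M1 = np.array([0.0, 1.0, 2 * lam])         # X_3'(λ)/1!
W = np.column_stack([M0, M1])              # confluent Vandermonde block W_{2,3}(λ)
J = np.array([[lam, 1.0], [0.0, lam]])     # Jordan block J_2(λ)
assert np.allclose(L.T @ W, W @ J)         # L^T W = W J_2(2): a Jordan chain
```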

Now $W^TGW$ is made up of blocks $W_\alpha^TG_fW_\beta$, where $\alpha$ and $\beta$ are eigenvalues of $L$. When $\alpha \ne \beta$ we see from the difference quotient that this vanishes. On the other hand, when $\alpha = \beta$, we shall need more care. First recall that the adjoint polynomials satisfy $F_n' = X_n'G$, and hence that $F_n'^{(k)} = X_n'^{(k)}G$. Now since the rows of $W_\alpha^T$ are the derivatives of $X_n$ at $\alpha$ we obtain

$$W_\alpha^TG_f = \begin{bmatrix}X_n'(\alpha)\\ \vdots\\ \dfrac{X_n'^{(k-1)}(\alpha)}{(k-1)!}\end{bmatrix}G_f = \begin{bmatrix}F_n'(\alpha)\\ \vdots\\ \dfrac{F_n'^{(k-1)}(\alpha)}{(k-1)!}\end{bmatrix},$$

which is the weighted Wronskian of the adjoint polynomials at $\alpha$. It is now clear that $(W_\alpha^TG_fW_\alpha)_{pq} = \dfrac{F_n'^{(p)}X_n^{(q)}}{p!\,q!}$. To compute this we first differentiate the adjoint shift equation (1.4), which gives

$$xF_n'^{(k)} + kF_n'^{(k-1)} - F_n'^{(k)}L^T = f^{(k)}(x)e_1^T.$$

Now post-multiply by $X_n$, and substitute the column companion shift (0.1). This gives

$$kF_n'^{(k-1)}(x)X_n(x) = f^{(k)}(x),$$

where we used the fact that $F_n'^{(k)}e_n = 0$. It now follows by induction that

$$F_n'^{(k)}X_n^{(r)} = \frac{f^{(k+r+1)}(x)\,r!\,k!}{(r+k+1)!},$$

and thus the matrix $W_\alpha^TG_fW_\alpha$ can now be identified as $F\varphi(J_{m_i}(\alpha))$.

10 The group Inverse of a Companion Matrix

We have seen that the inverse, if any, of a companion matrix $L(f)$ again has companion structure. When $L$ is singular we require in some settings the group inverse $L^\#$ (over a ring $R$ with 1), which satisfies

$$LXL = L, \qquad XLX = X \qquad\text{and}\qquad LX = XL.$$

It exists iff $p_0$ is regular and $w = p_0 - (1 - p_0p_0^-)p_1$ is invertible, in which case it has the form $L^\# = [x, y, B]$, where

$$B = \begin{bmatrix}0^T\\ I_{n-2}\\ 0^T\end{bmatrix}, \qquad x = \begin{bmatrix}u\\ x\end{bmatrix}, \qquad y = \begin{bmatrix}v\\ y\end{bmatrix}, \qquad v = e_1 + \hat fy, \qquad u = \begin{bmatrix}v_2\\ \vdots\\ v_{n-1}\\ y\end{bmatrix} + \hat fx, \qquad \hat f = [p_1, \ldots, p_{n-1}]^T.$$

The parameters $x$ and $y$ can be expressed in terms of $w^{-1}$, $p_0$, $p_1$ and $p_2$. Its structure is again sparse, but is closer to that of a perturbed companion matrix. The expression for the Drazin inverse, however, is still unknown.

11 Conclusions

We have seen that the companion shift equation is central to many of the applications involving $L$. It is in combination with other shift conditions that cancellation can occur and the best results materialize. There are numerous generalizations of a companion matrix, such as the comrade and congenial matrices, which use other bases besides the powers of $x$. Many of the chain relations generalize to the block case and provide an inexhaustible supply of challenging problems.

Bibliography

[1] S. Barnett, Division of generalized polynomials using the comrade matrix, Lin. Alg. Appl. 60 (1984), pp. 159-175.

[2] —, A matrix circle in linear control, Bull. Inst. Math. Appl. 12 (1976), pp. 173-177.

[3] L. Brand, The companion matrix and its properties, Amer. Math. Monthly 71 (1964), pp. 629-634.

[4] G. Chen and Z. Yang, Bezoutian representation via Vandermonde matrices, Lin. Alg. Appl. 186 (1993), pp. 37-44.

[5] R. E. Hartwig, Resultants, companion matrices and Jacobson chains, Lin. Alg. Appl. 94 (1987), pp. 1-34.

[6] —, AX − XB = C, resultants and generalized inverses, SIAM J. Appl. Math. 28 (1975), pp. 154-183.

[7] G. Heinig, Matrix representations of Bezoutians, Lin. Alg. Appl. 223/224 (1995), pp. 337-354.

[8] U. Helmke and P. A. Fuhrmann, Bezoutians, Lin. Alg. Appl. 122-124 (1989), pp. 1039-1097.

[9] R. Puystjens and R. E. Hartwig, The group inverse of a companion matrix, Lin. Mult. Lin. Alg. 43 (1997), pp. 137-150.

[10] N. J. Young, Linear fractional transforms of companion matrices, Glasgow Math. J. 20 (1979), pp. 129-132.
