Similarity

Isak Johansson and Jonas Bederoff Eriksson

Supervisor: Tilman Bauer

Department of Mathematics

2016

Abstract

This thesis will deal with similar matrices, also referred to as matrix conjugation. The first problem we will attack is whether or not two given matrices are similar over some field. To solve this problem we will introduce the Rational Canonical Form, RCF. From this normal form, also called the Frobenius normal form, we can determine whether or not the given matrices are similar over any field. We can also, given some field F, see whether they are similar over F or not. To be able to understand and prove the existence and uniqueness of the RCF we will introduce some additional module theory. The theory in this part will build up to finally prove the theorems regarding the RCF that can be used to solve our problem. The next problem we will investigate concerns simultaneous conjugation, i.e. conjugation by the same matrix on a pair of matrices. When are two pairs of similar matrices simultaneously conjugated? Can we find any necessary or even sufficient conditions on the matrices? We will address this more complicated issue with the theory assembled in the first part.

Contents

1 Introduction

2 Modules
2.1 Structure theorem for finitely generated modules over a P.I.D., existence
2.2 Structure theorem for finitely generated modules over a P.I.D., uniqueness

3 Similar matrices and normal forms
3.1 Linear transformations
3.2 Rational Canonical Form
3.3 Similar Matrices
3.4 Smith Normal Form
3.5 Finding a conjugating matrix

4 Simultaneous Conjugation

1 Introduction

Similarity between matrices is a well studied problem in the area of linear algebra. Most of the theory regarding matrix similarity has been known for a very long time; Ferdinand Georg Frobenius, after whom the normal form we will use in this report is named, lived over a hundred years ago. Besides the purely mathematical reasons for this report to be interesting, it is also interesting because it can be used to detect similarity between objects where the similarity is not immediately visible. If we have two linear transformations in different bases, how can we know if they are in fact the same? Or, given a linear transformation, what is the "nicest" way of representing it as a matrix? Is it always possible to write it in this "nice" form? These questions can be answered after reading this report. We will assume that the reader is familiar with some basic abstract algebra. This includes the theory of groups, rings, principal ideal domains, fields and modules. The structure of the report will be to define new concepts and then use them to prove useful theorems. We will also give some examples so the reader will have the opportunity to apply the theory and understand it in a better way. Our main source for this paper was the book Abstract Algebra by David S. Dummit and Richard M. Foote [1], but we have also read a couple of reports similar to this one. From now on we let R denote a commutative unitary ring.

2 Modules

Definition 1. Let M be an R-module. The torsion of M is the set of all elements of M for which there exists an r ∈ R − {0} such that their product is zero, i.e. Tor(M) = {m ∈ M | ∃r ∈ R − {0}, rm = 0}. If M = Tor(M), M is said to be a torsion module.
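For instance, over R = Z every element of Z/6Z is torsion, since 6m = 0 for all m ∈ Z/6Z, so Z/6Z is a torsion Z-module; by contrast, Tor(Z) = {0}.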

Definition 2. Let M be an R-module and let A be a subset of M. If for every non-zero element x ∈ M there exist unique r1, r2, ..., rn ∈ R − {0} and a1, a2, ..., an ∈ A, ai ≠ aj when i ≠ j, for some n ∈ Z≥0, such that x = r1a1 + r2a2 + ... + rnan, then M is a free module on the set A. The set A is called the basis for M, and the rank of M is the number of basis elements, i.e. the cardinality of A.

Definition 3. Let M be an R-module. The annihilator of M is the set of all elements of R whose product with every element in M is zero, i.e.

Ann(M) = {r ∈ R | rm = 0 ∀m ∈ M}.

Definition 4. An R-module M is said to be finitely generated if it is generated by some finite subset, i.e. if there is some finite subset A = {a1, a2, ..., an} of M such that

M = RA = Ra1+Ra2+...+Ran = {r1a1+r2a2+...+rnan | ri ∈ R, 1 ≤ i ≤ n}.

Definition 5. An R-module M is said to be cyclic if it is generated by a single element, i.e. if there is some element x ∈ M such that

M = Rx = {rx | r ∈ R}.
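As a quick illustration of Definitions 3-5: the Z-module M = Z/6Z is cyclic, since M = Z·1, hence in particular finitely generated, and its annihilator is Ann(M) = (6) = 6Z. More generally, Ann(R/(a)) = (a) for any a ∈ R.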

Theorem 6. Let R be a P.I.D., M be a finitely generated free R-module of rank n and N be a submodule of M. Then the following statements are true.
i) N is free of rank m, m ≤ n.
ii) There exists a basis {y1, y2, . . . , yn} for M such that {a1y1, a2y2, . . . , amym} is a basis for N, where ai ∈ R − {0} and ai | ai+1.

Proof. The theorem holds for N = 0, so assume N is not equal to zero. Let Φ be any homomorphism from M into R. Then Φ(N) will be an ideal in R.

Since R is a P.I.D. there exists an element aΦ in R such that Φ(N) = (aΦ). Now we want to collect all the ideals in R obtained from mapping N into R in this way:

Σ = {(aΦ) | Φ ∈ HomR(M, R)}.

Since R is a P.I.D. there exists an element in Σ which is not properly contained in any other element of Σ [1]. This implies that there exists a φ ∈ HomR(M, R) such that (aφ) = φ(N) is maximal in Σ. Let a1 = aφ and let y be an element of N such that φ(y) = a1.

Next we show that a1 divides Φ(y) for every Φ ∈ HomR(M, R). Let d be a greatest common divisor of a1 and Φ(y); since R is a P.I.D. we can write d = r1a1 + r2Φ(y) for some r1, r2 ∈ R, and then (a1) ⊆ (d) and (Φ(y)) ⊆ (d). Now define a homomorphism ν = r1φ + r2Φ; then ν(y) = r1a1 + r2Φ(y) = d, so (a1) ⊆ (ν(y)) = (d). But by the maximality in the choice of a1 we must have the equality (a1) = (ν(y)) = (d), and in particular (Φ(y)) ⊆ (a1).

Applying this knowledge to the natural projection homomorphisms πi relative to the basis {x1, x2, . . . , xn} for M, we obtain

πi(y) = a1bi, bi ∈ R, 1 ≤ i ≤ n.

Now define y1 ∈ M as

y1 = b1x1 + b2x2 + · · · + bnxn.

Then the equality a1y1 = y holds, and φ(y1) = 1, since a1 = φ(y) = φ(a1y1) = a1φ(y1). We will now prove that y1 can be taken as an element in a basis for M and a1y1 can be taken as an element in a basis for N, i.e.

a) M = Ry1 ⊕ ker(φ)
b) N = Ra1y1 ⊕ (N ∩ ker(φ))

For a) we can take any element x ∈ M and add and subtract φ(x)y1 to it,

x = φ(x)y1 + (x − φ(x)y1).

Note that x − φ(x)y1 is an element of ker(φ) since

φ(x − φ(x)y1) = φ(x) − φ(x)φ(y1) = 0

Therefore every x ∈ M can be written in this way, i.e. M = Ry1 + ker(φ). To see that we have a direct sum, suppose ry1 is an element of ker(φ); then

0 = φ(ry1) = rφ(y1) = r =⇒ r = 0.

To prove b) we first recall that a1 divides φ(z) for all z ∈ N. In a similar way as before we rewrite z as

z = φ(z)y1 + (z − φ(z)y1),

and writing φ(z) = ba1 for some b ∈ R gives

z = ba1y1 + (z − ba1y1).

So clearly N = Ra1y1 + (N ∩ ker(φ)). This is just a special case of a), so the sum is direct.

We will prove the first part of the theorem by induction. If m = 0, N must be a torsion module, but since N is also a free module, N = 0. The theorem holds for N = 0 as noted earlier. By b) we see that

m = 1 + rank(N ∩ ker φ) =⇒ rank(N ∩ ker φ) = m − 1.

So by induction N ∩ ker φ is a free R-module of rank m − 1, and N is a free module of rank m.

We will again use induction and the identities a) and b) to prove the second part of the theorem. Applying a) to M we get that ker(φ) is a free R-module of rank n − 1. If we then continue by replacing M in a) with ker(φ) and N in b) with N ∩ ker φ, we get that {y2, y3, . . . , yn} is a basis for ker(φ) and that {a2y2, a3y3, . . . , amym} is a basis for N ∩ ker φ. Here ai ∈ R, 2 ≤ i ≤ m, and we have the divisibility relation a2 | a3 | · · · | am. If we again use the identities we see that {y1, y2, . . . , yn} is a basis for M and {a1y1, a2y2, . . . , amym} is a basis for N, since the sums are direct.

If we can show that a1 divides a2, we are done. Let ψ be the projection homomorphism from M to R such that ψ(y1) = ψ(y2) = 1 and ψ(yi) = 0 for all 3 ≤ i ≤ n. Then ψ(a1y1) = a1ψ(y1) = a1, so (a1) ⊆ ψ(N), but since (a1) cannot be properly contained in Φ(N) for any homomorphism Φ from M into R, we must have equality (a1) = ψ(N). In a similar way, for a2y2 we obtain ψ(a2y2) = a2ψ(y2) = a2, hence (a2) ⊆ ψ(N) = (a1), i.e. a1 | a2.
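To make Theorem 6 concrete, consider the following small example. Take R = Z, M = Z², and let N be the submodule generated by (2, 2) and (2, −2). With the basis y1 = (1, 1), y2 = (0, 1) for M, we have 2y1 = (2, 2) ∈ N and 4y2 = (0, 4) = (2, 2) − (2, −2) ∈ N, and conversely (2, −2) = 2y1 − 4y2, so

N = Z·2y1 ⊕ Z·4y2, with a1 = 2, a2 = 4 and a1 | a2.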

2.1 Structure theorem for finitely generated modules over a P.I.D., existence

Theorem 7. Let R be a P.I.D. and M be a finitely generated R-module. Then M is isomorphic to the direct sum of a free part and cyclic modules of the form R/(ai),

M ≅ R^r ⊕ R/(a1) ⊕ R/(a2) ⊕ · · · ⊕ R/(am),

where the elements ai above satisfy the relation a1 | a2 | · · · | am.

Proof. Let n ∈ Z≥0. Take any basis {b1, . . . , bn} for R^n and generators {c1, . . . , cn} for M, and define a homomorphism Φ as

Φ(bi) = ci for 1 ≤ i ≤ n.

Then by the first isomorphism theorem we have

R^n/ker(Φ) ≅ M.

By Theorem 6 we can choose another basis {y1, . . . , yn} for R^n, where {a1y1, . . . , amym} is a basis for the kernel of Φ. Here ai ∈ R for 1 ≤ i ≤ m, with the divisibility relation a1 | a2 | · · · | am. The last expression can now be written as

M ≅ (Ry1 ⊕ · · · ⊕ Ryn)/(Ra1y1 ⊕ · · · ⊕ Ramym).

We want to rewrite this expression using the map

Ry1 ⊕ · · · ⊕ Ryn → R/(a1) ⊕ · · · ⊕ R/(am) ⊕ R^{n−m},

which clearly has the kernel Ra1y1 ⊕ · · · ⊕ Ramym. Hence,

M ≅ R^r ⊕ R/(a1) ⊕ R/(a2) ⊕ · · · ⊕ R/(am).

Note: r = n − m.

Definition 8. In Theorem 7, r is called the rank of M, and the elements a1, . . . , am with the divisibility relation a1 | · · · | am are called invariant factors.
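For example, the finitely generated Z-module M = Z² ⊕ Z/2Z ⊕ Z/6Z is already on the form given in Theorem 7: its rank is r = 2 and its invariant factors are a1 = 2 and a2 = 6, with 2 | 6.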

Theorem 9. Let R, M and r be as in Theorem 7. Then M is isomorphic to a direct sum

M ≅ R^r ⊕ R/(p1^{α1}) ⊕ R/(p2^{α2}) ⊕ · · · ⊕ R/(pm^{αm}),

where the pi are primes and αi ∈ N. Note that the cyclic factors are not necessarily distinct.

Proof. This follows from applying the Chinese Remainder Theorem on each cyclic factor in Theorem 7.

Definition 10. The prime powers p1^{α1}, p2^{α2}, . . . , pm^{αm} in Theorem 9 are called elementary divisors.
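Continuing the example above: by the Chinese Remainder Theorem, Z/6Z ≅ Z/2Z ⊕ Z/3Z, so the module Z² ⊕ Z/2Z ⊕ Z/6Z with invariant factors 2, 6 has elementary divisors 2, 2 and 3.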

Theorem 11. Let R be a P.I.D., p be a prime in R and F be the field R/(p). Then R^r/pR^r ≅ F^r.

Proof. Consider the natural map

Φ : R^r → F^r, defined by

Φ(x1, x2, ..., xr) = (x1 (mod p), x2 (mod p), ..., xr (mod p)), where Im(Φ) = F^r and Ker(Φ) = pR^r. Since this is an R-module homomorphism it gives us the desired isomorphism

R^r/pR^r ≅ F^r

by the First Isomorphism Theorem for modules.

Lemma 12. Let R be a P.I.D., p ∈ R be a prime and M be the R-module M = R/(a1) ⊕ R/(a2) ⊕ · · · ⊕ R/(am), where p divides all ai. Then M/pM is isomorphic to F^m, where F is the field R/(p).

Proof. We will prove that if M = R/(a), then M/pM ≅ R/(p); applying this to each summand proves our lemma.

The module pM = [(p) + (a)]/(a) is the image of (p) under the canonical homomorphism from R to R/(a). If p divides a, then pM = (p)/(a) and

M/pM = (R/(a))/((p)/(a)) ≅ R/(p) = F.

Lemma 13. Given a list of elementary divisors, the corresponding invariant factors are uniquely determined.

Proof. Suppose we have two lists of invariant factors a1 | a2 | · · · | an and b1 | b2 | · · · | bm with the same corresponding list of elementary divisors. Due to the divisibility relation, an and bm are both the product of the greatest power of each prime among the elementary divisors, which implies that an = bm. Removing these prime powers, an−1 and bm−1 are now the products of the greatest powers of each prime remaining among the elementary divisors. This implies an−1 = bm−1. Continuing this procedure we see that aj = bj for all j, and we are done.
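This greedy procedure is easy to mechanize. The following Python sketch illustrates it, under the convention (chosen purely for illustration) that each elementary divisor is given as a (prime, exponent) pair:

```python
from collections import defaultdict

def invariant_factors(elementary_divisors):
    """Recover the invariant factors a1 | a2 | ... | an from a list of
    elementary divisors, each given as a (prime, exponent) pair, using
    the greedy procedure from the proof of Lemma 13."""
    exponents = defaultdict(list)
    for p, alpha in elementary_divisors:
        exponents[p].append(alpha)
    for alphas in exponents.values():
        alphas.sort(reverse=True)          # greatest power of each prime first
    n = max(len(alphas) for alphas in exponents.values())
    factors = []
    for i in range(n):                     # build a_n first, then a_(n-1), ...
        a = 1
        for p, alphas in exponents.items():
            if i < len(alphas):
                a *= p ** alphas[i]
        factors.append(a)
    return factors[::-1]                   # divisibility order a1 | a2 | ... | an

# Elementary divisors 2^2, 2, 3^2, 3 give the invariant factors 6 | 36:
print(invariant_factors([(2, 2), (2, 1), (3, 2), (3, 1)]))  # [6, 36]
```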

2.2 Structure theorem for finitely generated modules over a P.I.D., uniqueness

Theorem 14. Two finitely generated R-modules M1 and M2 are isomorphic if and only if they have the same free rank and the same list of invariant factors.

Proof. (⇐=). Assume M1 and M2 have the same free rank, r, and the same list of invariant factors, a1, a2, ..., am. Then they are trivially isomorphic since

M1 ≅ R^r ⊕ R/(a1) ⊕ R/(a2) ⊕ · · · ⊕ R/(am) ≅ M2,

so M1 ≅ M2.

(=⇒). Assume now M1 ≅ M2. Then also Tor(M1) ≅ Tor(M2), and hence R^{r1} ≅ M1/Tor(M1) ≅ M2/Tor(M2) ≅ R^{r2}, where r1 and r2 are the free ranks of M1 and M2 respectively. If p is any non-zero prime in R we obtain R^{r1}/pR^{r1} ≅ R^{r2}/pR^{r2}. By Theorem 11 we then get that

F^{r1} ≅ F^{r2},

where F = R/(p). Since these are vector spaces we must have that r1 = r2. Since the invariant factors are uniquely determined by the elementary divisors (Lemma 13), it is sufficient to prove that the modules have the same list of elementary divisors. Given any prime p we want to show that the list of elementary divisors which are powers of p is the same for both M1 and M2.

If M1 and M2 are isomorphic, then the submodules of M1 and M2 spanned by the cyclic factors whose elementary divisors are powers of p must also be isomorphic, since M1 and M2 share the same annihilator. It is therefore sufficient to deal with the case where the elementary divisors of M1 and M2 are of the following form:

for M1: p, p, . . . , p (n1 copies), p^{α1}, p^{α2}, . . . , p^{αs}

for M2: p, p, . . . , p (n2 copies), p^{β1}, p^{β2}, . . . , p^{βt}

But then the elementary divisors for the pMi must be

for pM1: p^{α1−1}, p^{α2−1}, . . . , p^{αs−1}

for pM2: p^{β1−1}, p^{β2−1}, . . . , p^{βt−1}.

If M1 ≅ M2 then pM1 ≅ pM2, and in passing to the latter modules the power of each elementary divisor has decreased by one. The lists of elementary divisors for pM1 and pM2 reveal that αs − 1 = βt − 1, so αs = βt. By induction we obtain that αi = βi and s = t.

Using Lemma 12 we obtain M1/pM1 ≅ F^{n1+s} and M2/pM2 ≅ F^{n2+t}, which gives F^{n1+s} ≅ F^{n2+t}, and hence n1 = n2. This proves that M1 and M2 have the same list of elementary divisors, hence the same list of invariant factors.

3 Similar matrices and normal forms

In this section we will introduce the concept of similarity. Using the theory of modules, most importantly the structure theorem, we can now construct the normal forms useful to determine matrix similarity.

3.1 Linear transformations

Let F be a field. For every pair of a vector space V and a linear transformation T : V → V, there exists an F[x]-module isomorphic to the vector space V. Without loss of generality we can consider V to be this F[x]-module.

T.α = xα ∀α ∈ V.

T acts by multiplication with x on V. In this chapter we will apply the theory of modules to the special case where the module is an F-vector space viewed as an F[x]-module.
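As a small example: let V = F² and T(a, b) = (b, 0). Then x(0, 1) = T(0, 1) = (1, 0) and x²(0, 1) = T(1, 0) = (0, 0), so V = F[x]·(0, 1) ≅ F[x]/(x²) as an F[x]-module.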

3.2 Rational Canonical Form

Let V be a finite dimensional vector space considered as an F[x]-module. Let T : V → V be the linear transformation which acts by multiplication with x on V. Let (a(x)) be the annihilator of V. Since a(x) is unique up to a unit, one can let a(x) be a monic polynomial:

a(x) = x^n + b_{n−1}x^{n−1} + · · · + b0.

Now let T act on the generators 1, x, x², . . . , x^{n−1} of V = F[x]/(a(x)):

1 ↦ x
x ↦ x²
⋮
x^{n−1} ↦ −b_{n−1}x^{n−1} − b_{n−2}x^{n−2} − · · · − b0.

Writing this in matrix notation, in the basis {1, x, x², ..., x^{n−1}}, we get a matrix with ones down the subdiagonal:

T = Ca(x) =
[ 0  0  ···  0  −b0      ]
[ 1  0  ···  0  −b1      ]
[ 0  1  ···  0  −b2      ]
[ ⋮  ⋮       ⋮  ⋮        ]
[ 0  0  ···  1  −b_{n−1} ]

Definition 15. The matrix Ca(x) above is said to be the companion matrix of the monic polynomial a(x) = x^n + b_{n−1}x^{n−1} + · · · + b0.

Applying Theorem 7 to the vector space V, which has zero free rank, we get

V ≅ F[x]/(a1(x)) ⊕ F[x]/(a2(x)) ⊕ · · · ⊕ F[x]/(am(x)).

Let Vi ≅ F[x]/(ai(x)); then V ≅ ⊕Vi as F[x]-modules.
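For instance, for a(x) = x² − 2x − 3 (the polynomial that reappears in Example 27 below), we have b1 = −2 and b0 = −3, so

Ca(x) =
[ 0  3 ]
[ 1  2 ]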

Let T be the linear transformation acting by componentwise (on each Vi) multiplication with x.

These subspaces are clearly T-invariant, i.e. T(v) ∈ Vi for all v ∈ Vi. Now one can choose a basis {x1, x2, . . . , xn} for V such that T(v) = Cai(x)v for all v ∈ Vi, where Cai(x) is the companion matrix of the monic polynomial ai(x). T can now be written as a block diagonal matrix A,

A = ⊕ Cai(x),

A =
[ Ca1(x)  0       ···  0      ]
[ 0       Ca2(x)  ···  0      ]
[ ⋮       ⋮       ⋱    ⋮      ]
[ 0       0       ···  Cam(x) ]

Definition 16. The matrix above is the Rational Canonical Form of a linear transformation T . Here ai(x) | ai+1(x).

3.3 Similar Matrices

Definition 17. Two n × n matrices A and B are said to be similar or conjugated if they are the same linear transformation in possibly different bases, i.e. A = PBP^{−1} where P is an invertible matrix. A similar to B can be written as A ∼ B.

Theorem 18. Given any n × n matrix A with entries in F there exists a unique matrix B with entries in F such that B is in Rational Canonical Form and A ∼ B.

Proof. This is just a special case of Theorem 7 and Theorem 14. Let A : V → V. Here V is a vector space which is isomorphic to a direct sum of the form

V ≅ F[x]/(a1(x)) ⊕ F[x]/(a2(x)) ⊕ · · · ⊕ F[x]/(am(x)).

One can now produce a matrix with the companion matrices of the invariant factors on the diagonal and zeros elsewhere. Thus the Rational Canonical Form for A exists. By Theorem 14, two F[x]-modules are isomorphic if and only if they have the same invariant factors, hence the Rational Canonical Form must be unique.

Lemma 19.
i) The characteristic polynomial of the companion matrix of a(x) is a(x).
ii) The characteristic polynomial of a matrix with companion matrices on the diagonal is the product of the characteristic polynomials of the companion matrices,

χA(x) = ∏ᵢ χCi(x).

Theorem 20. The characteristic polynomial of an n × n matrix A is the product of all invariant factors of the F[x]-module (F^n, A).

Proof. Let B be the matrix in Rational Canonical Form which A is similar to. Then the following is true.

χA(x) = Det(xI − A) = Det(xI − PBP^{−1}) = Det(P(xI − B)P^{−1})
= Det(P)Det(xI − B)Det(P^{−1}) = Det(P)Det(xI − B)·(1/Det(P)) = χB(x).

Applying the previous lemma now finishes the proof.

Theorem 21. Let A and B be two n × n matrices with entries in some field F. Then the following statements are equivalent.
i) A and B are similar over F.
ii) A and B have the same invariant factors.
iii) A and B have the same Rational Canonical Form.

Proof. i) =⇒ ii). Let A = PBP^{−1}. We know that P is an isomorphism from a vector space V to V since it is invertible. Below we show that P is an F[x]-module isomorphism from VB to VA, where VB is the F[x]-module for V and B, and VA is the F[x]-module for V and A:

P.(xα) = P.(B.α) = PB.(α) = AP.(α) = A.(P.(α)) = x(P.(α)) for all α ∈ VB.

Since the F[x]-modules are isomorphic they must have the same list of invariant factors by Theorem 14. ii) =⇒ iii). Since the matrices have the same list of invariant factors they have the same list of companion matrices and hence the same Rational Canonical Form. iii) =⇒ i). Since the matrices have the same Rational Canonical Form they have the same representation but in a possibly different basis. So they are the same linear transformation up to a change of basis, hence similar.

Theorem 22. If A ∼ B are two n × n matrices then the following is true:
i) Tr(A) = Tr(B).
ii) Det(A) = Det(B).

Proof. Since A and B are similar they have the same invariant factors by Theorem 21. By Theorem 20, they also have the same characteristic polynomial. In particular, they share trace and determinant, since these appear (up to sign) among the coefficients of the characteristic polynomial.

Theorem 23. Let A and B be two n × n matrices with entries in some field F. Let K be an extension field of F. Then A and B are similar over K (elements of P in K) if and only if A and B are similar over F (elements of P in F).

Proof. (⇐=). Assume that A ∼ B over F . But then also A ∼ B over K since F is embedded in K. (=⇒). Assume that A ∼ B over K. Then A and B have the same Rational Canonical Form. By Theorem 18, this RCF has entries in F . Theorem 21 then implies that A and B are also similar over F .

3.4 Smith Normal Form

Up to this point we have shown the existence and uniqueness of the RCF. With this in mind, we will in this section show how to find this unique RCF. We will introduce another normal form, the Smith Normal Form, which can be found using simple matrix operations. This Smith Normal Form can be used to extract the RCF of any matrix.

Definition 24. Let A be any n × n matrix over some field F. Then the matrix B = xI − A has entries in F[x], and the following three operations on B are called the elementary row and column operations:
1) Interchanging any two rows or columns.
2) Adding a multiple in F[x] of one row or column to another row or column.
3) Multiplying any row or column by a unit in F[x], i.e. any non-zero element in the field F.
These operations are often referred to as the ERCOs. The notation used for the ERCOs is the following.

Ri ⇐⇒ Rj whenever the ith and jth row are interchanged.

Ci + p(x)Cj whenever p(x) times the jth column is added to the ith column.

Ri · u

whenever the ith row is multiplied by a unit u. If two operations are done in one step, they are listed in the order in which they are executed.

Definition 25. An n × n matrix A is said to be in Smith Normal Form, SNF, if it is diagonal and the elements 1, 1, ..., a1, a2, ..., am on the diagonal satisfy the divisibility relation a1 | a2 |...| am.

1 0 0 ··· 0 0   .. ..  0 . 0 . 0 0  . . .  . 0 1 .. 0 .  A =   0 0 0 a 0 0   1   . .  0 0 0 .. .. 0  0 0 0 ··· 0 am

Theorem 26. Given any n×n matrix A over some field K the matrix xI −A can be put in Smith Normal Form using ERCOs. This normal form is unique and the diagonal elements a1(x), a2(x), ..., am(x) are the invariant factors of A.

Proof. The proof of this theorem is omitted due to its length [1].

What we want to do now is to give an intuition for why this algorithm works, i.e. why, when using ERCOs on the matrix xI − A to put it in SNF, we get the invariant factors of A on the diagonal. First off we can see that using the ERCOs on xI − A does not change the determinant (up to a unit in F[x]). Since the characteristic polynomial of A is defined as Det(xI − A), it is therefore invariant under ERCOs up to a unit. Since this polynomial is monic we know that

χA(x) = Det(SNF),

where SNF is the unique Smith Normal Form we get from using ERCOs on xI − A. The SNF is diagonal, so its determinant is the product of the diagonal elements, which satisfy the divisibility criteria a1(x) | a2(x) | ... | am(x). So now we know that the product of the diagonal elements of the SNF is the characteristic polynomial. We also know that they divide each other. These properties are shared with the invariant factors of A.
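For readers who want to experiment, this computation can be handed to a computer algebra system. The sketch below applies SymPy's smith_normal_form to the matrix A of Example 27 in the next section; it assumes that the installed SymPy version accepts a univariate polynomial domain such as QQ[x] (the documented requirement is only that the domain be a P.I.D.), and the diagonal entries it returns may differ from a hand computation by unit factors:

```python
from sympy import Matrix, eye, symbols, QQ
from sympy.matrices.normalforms import smith_normal_form

x = symbols('x')
A = Matrix([[0, 3, 0, 3],
            [1, 1, -1, 0],
            [-1, -1, 1, 3],
            [0, 1, 1, 2]])   # the matrix A of Example 27

# Smith Normal Form of xI - A over Q[x]; assumes the domain QQ[x] is
# accepted here. The non-unit diagonal entries are, up to unit factors,
# the invariant factors of A.
print(smith_normal_form(x * eye(4) - A, domain=QQ[x]))
```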

3.5 Finding a conjugating matrix

Finding a matrix P conjugating any matrix A into its RCF (i.e. P^{−1}AP is in RCF) is done by keeping track of the row operations used on xI − A to put it in SNF.

First off we denote the order of the i-th invariant factor of A by di, so that d1 + d2 + · · · + dm = n, where n is the dimension of the matrix A. This is also the dimension of the vector space corresponding to the i-th cyclic factor in the invariant factor decomposition. Starting with the matrix P′ = I, for every row operation used on xI − A, we change the matrix P′ according to the following rules.

1) If Ri ⇐⇒ Rj, then Ci ⇐⇒ Cj for P′.
2) If Ri + p(x)Rj, then Cj − p(A)Ci for P′.
3) If Ri · u, then Ci · u^{−1} for P′.

This can be seen as finding a generator for each invariant factor by taking F[x]-linear combinations of the generators of the standard basis, i.e. the standard basis vectors ei with a one in the i-th position and zeros elsewhere. When this is done we will have a matrix with n − m zero-columns in the beginning, where m denotes the number of invariant factors of A. These columns are now removed, and with the remaining m non-zero columns (note that these columns correspond precisely to the m non-unit diagonal elements in the Smith Normal Form) we do the following. From the first (non-zero) column, now denoted C1, we will extract d1 columns, since the vector space corresponding to this cyclic factor has dimension d1. These columns will form a basis for this vector space and are extracted in the following way.

C1, AC1, A²C1, ..., A^{d1−1}C1.

With respect to this order we now have the first d1 columns of P, i.e. the basis for the above mentioned vector space. Then we do the same thing with C2, from which we will extract d2 columns to put in P in the same way as above. They will in the same manner form a basis for the vector space corresponding to the second cyclic factor of A. These basis vectors will become columns d1 + 1, d1 + 2, ..., d1 + d2 of P. We continue this process for the remaining (non-zero) columns of P′ to construct n linearly independent columns of P, i.e. n basis vectors in which basis the linear transformation A is in Rational Canonical Form.

Example 27. Assume

A =
[ 0   3   0   3 ]
[ 1   1  −1   0 ]
[−1  −1   1   3 ]
[ 0   1   1   2 ]

and

B =
[ 0   3   0  −3 ]
[ 1   2   0  −3 ]
[ 0   0   3   3 ]
[ 0   0   0  −1 ]

Are A and B similar, i.e. is there some matrix P, say over Q, such that A = PBP^{−1}?

Solution. We will do this by finding the RCF for both A and B, and seeing whether they are the same or not. To find the RCF we must first find the invariant factors, which we will do by using elementary row and column operations on xI − A and xI − B to put them in Smith Normal Form. We start with A, where we from now on define a(x) := x² − 2x − 3.

To keep these computations compact, we write each matrix on one line, row by row, with rows separated by semicolons.

xI − A = [x, −3, 0, −3; −1, x−1, 1, 0; 1, 1, x−1, −3; 0, −1, −1, x−2]

C2 − C4: [x, 0, 0, −3; −1, x−1, 1, 0; 1, 4, x−1, −3; 0, 1−x, −1, x−2]

R2 + R3: [x, 0, 0, −3; 0, x+3, x, −3; 1, 4, x−1, −3; 0, 1−x, −1, x−2]

R3 − R1: [x, 0, 0, −3; 0, x+3, x, −3; 1−x, 4, x−1, 0; 0, 1−x, −1, x−2]

R2 + x·R4: [x, 0, 0, −3; 0, −a(x), 0, a(x); 1−x, 4, x−1, 0; 0, 1−x, −1, x−2]

C1 + C3: [x, 0, 0, −3; 0, −a(x), 0, a(x); 0, 4, x−1, 0; −1, 1−x, −1, x−2]

C2 + C4: [x, −3, 0, −3; 0, 0, 0, a(x); 0, 4, x−1, 0; −1, −1, −1, x−2]

R2 ⇐⇒ R4: [x, −3, 0, −3; −1, −1, −1, x−2; 0, 4, x−1, 0; 0, 0, 0, a(x)]

R1 + x·R2: [0, −x−3, −x, a(x); −1, −1, −1, x−2; 0, 4, x−1, 0; 0, 0, 0, a(x)]

C2 − C3, then R1 ⇐⇒ R2: [−1, 0, −1, x−2; 0, −3, −x, a(x); 0, 5−x, x−1, 0; 0, 0, 0, a(x)]

C3 − C1, then C1·(−1): [1, 0, 0, x−2; 0, −3, −x, a(x); 0, 5−x, x−1, 0; 0, 0, 0, a(x)]

C4 + (2−x)·C1: [1, 0, 0, 0; 0, −3, −x, a(x); 0, 5−x, x−1, 0; 0, 0, 0, a(x)]

R2 − R4: [1, 0, 0, 0; 0, −3, −x, 0; 0, 5−x, x−1, 0; 0, 0, 0, a(x)]

R3 + 2·R2: [1, 0, 0, 0; 0, −3, −x, 0; 0, −1−x, −1−x, 0; 0, 0, 0, a(x)]

C2 − C3, then C3·(−1): [1, 0, 0, 0; 0, x−3, x, 0; 0, 0, x+1, 0; 0, 0, 0, a(x)]

C3 − C2: [1, 0, 0, 0; 0, x−3, 3, 0; 0, 0, x+1, 0; 0, 0, 0, a(x)]

R3 − ((x+1)/3)·R2: [1, 0, 0, 0; 0, x−3, 3, 0; 0, −a(x)/3, 0, 0; 0, 0, 0, a(x)]

C2 + ((3−x)/3)·C3: [1, 0, 0, 0; 0, 0, 3, 0; 0, −a(x)/3, 0, 0; 0, 0, 0, a(x)]

C2 ⇐⇒ C3: [1, 0, 0, 0; 0, 3, 0, 0; 0, 0, −a(x)/3, 0; 0, 0, 0, a(x)]

C3·(−3), then C2·(1/3): [1, 0, 0, 0; 0, 1, 0, 0; 0, 0, a(x), 0; 0, 0, 0, a(x)]

This is now the Smith Normal Form of xI − A. We see that we have two equal invariant factors, a(x). Each of them has the corresponding companion matrix

Ca(x) = [0, 3; 1, 2].

This gives the Rational Canonical Form of A,

RCF(A) = [0, 3, 0, 0; 1, 2, 0, 0; 0, 0, 0, 3; 0, 0, 1, 2].

If we now do the same procedure for B we get that

xI − B = [x, −3, 0, 3; −1, x−2, 0, 3; 0, 0, x−3, −3; 0, 0, 0, x+1]

C2 + C1: [x, x−3, 0, 3; −1, x−3, 0, 3; 0, 0, x−3, −3; 0, 0, 0, x+1]

R1 − R2: [x+1, 0, 0, 0; −1, x−3, 0, 3; 0, 0, x−3, −3; 0, 0, 0, x+1]

R2 + R3: [x+1, 0, 0, 0; −1, x−3, x−3, 0; 0, 0, x−3, −3; 0, 0, 0, x+1]

C3 − C2: [x+1, 0, 0, 0; −1, x−3, 0, 0; 0, 0, x−3, −3; 0, 0, 0, x+1]

R4 + ((x+1)/3)·R3, then R4·3: [x+1, 0, 0, 0; −1, x−3, 0, 0; 0, 0, x−3, −3; 0, 0, a(x), 0]

R3 ⇐⇒ R4: [x+1, 0, 0, 0; −1, x−3, 0, 0; 0, 0, a(x), 0; 0, 0, x−3, −3]

R1 + (x+1)·R2: [0, a(x), 0, 0; −1, x−3, 0, 0; 0, 0, a(x), 0; 0, 0, x−3, −3]

C2 ⇐⇒ C4, then R1 ⇐⇒ R4: [0, −3, x−3, 0; −1, 0, 0, x−3; 0, 0, a(x), 0; 0, 0, 0, a(x)]

R1 ⇐⇒ R2: [−1, 0, 0, x−3; 0, −3, x−3, 0; 0, 0, a(x), 0; 0, 0, 0, a(x)]

C4 + (x−3)·C1: [−1, 0, 0, 0; 0, −3, x−3, 0; 0, 0, a(x), 0; 0, 0, 0, a(x)]

C3 + ((x−3)/3)·C2: [−1, 0, 0, 0; 0, −3, 0, 0; 0, 0, a(x), 0; 0, 0, 0, a(x)]

C2·(−1/3), then C1·(−1): [1, 0, 0, 0; 0, 1, 0, 0; 0, 0, a(x), 0; 0, 0, 0, a(x)]

This is the Smith Normal Form for xI − B. Since A and B have the same invariant factors, they have the same RCF, i.e.

RCF(B) = [0, 3, 0, 0; 1, 2, 0, 0; 0, 0, 0, 3; 0, 0, 1, 2]

and RCF(A) = RCF(B). Theorem 21 gives us that A ∼ B, i.e. A = PBP^{−1} for some P. By following the algorithm above we will find matrices PA and PB such that M = PA^{−1}APA = PB^{−1}BPB, where M is the RCF of A and B. Then we have that A = PAPB^{−1}BPBPA^{−1} = (PAPB^{−1})B(PAPB^{−1})^{−1}, so that P = PAPB^{−1}. We start with PA.

Starting from P′ = I and mirroring, in order, the row operations that were used on xI − A:

C3 − C2: [1, 0, 0, 0; 0, 1, −1, 0; 0, 0, 1, 0; 0, 0, 0, 1]

C1 + C3: [1, 0, 0, 0; −1, 1, −1, 0; 1, 0, 1, 0; 0, 0, 0, 1]

C4 − A·C2: [1, 0, 0, −3; −1, 1, −1, −1; 1, 0, 1, 1; 0, 0, 0, 0]

C2 ⇐⇒ C4: [1, −3, 0, 0; −1, −1, −1, 1; 1, 1, 1, 0; 0, 0, 0, 0]

C2 − A·C1: [1, 0, 0, 0; −1, 0, −1, 1; 1, 0, 1, 0; 0, 0, 0, 0]

C1 ⇐⇒ C2: [0, 1, 0, 0; 0, −1, −1, 1; 0, 1, 1, 0; 0, 0, 0, 0]

C4 + C2: [0, 1, 0, 1; 0, −1, −1, 0; 0, 1, 1, 1; 0, 0, 0, 0]

C2 − 2·C3: [0, 1, 0, 1; 0, 1, −1, 0; 0, −1, 1, 1; 0, 0, 0, 0]

C2 + ((A+I)/3)·C3: [0, 0, 0, 1; 0, 0, −1, 0; 0, 0, 1, 1; 0, 0, 0, 0] = P′A

For A we have that d1 = d2 = 2, so P′A has the right number of zero-columns. The first two columns of PA are

(0, −1, 1, 0)ᵀ and A(0, −1, 1, 0)ᵀ = (−3, −2, 2, 0)ᵀ.

The last two columns are

(1, 0, 1, 0)ᵀ and A(1, 0, 1, 0)ᵀ = (0, 0, 0, 1)ᵀ.

This gives

PA = [0, −3, 1, 0; −1, −2, 0, 0; 1, 2, 1, 0; 0, 0, 0, 1].

Now we do the same thing for the matrix B, again starting from P′ = I:

C2 + C1: [1, 1, 0, 0; 0, 1, 0, 0; 0, 0, 1, 0; 0, 0, 0, 1]

C3 − C2: [1, 1, −1, 0; 0, 1, −1, 0; 0, 0, 1, 0; 0, 0, 0, 1]

C3 − ((B+I)/3)·C4: [1, 1, 0, 0; 0, 1, 0, 0; 0, 0, 0, 0; 0, 0, 0, 1]

C4·(1/3): [1, 1, 0, 0; 0, 1, 0, 0; 0, 0, 0, 0; 0, 0, 0, 1/3]

C3 ⇐⇒ C4: [1, 1, 0, 0; 0, 1, 0, 0; 0, 0, 0, 0; 0, 0, 1/3, 0]

C2 − (B+I)·C1: [1, 0, 0, 0; 0, 0, 0, 0; 0, 0, 0, 0; 0, 0, 1/3, 0]

C1 ⇐⇒ C4, then C1 ⇐⇒ C2 (the last swap only exchanges two zero columns): [0, 0, 0, 1; 0, 0, 0, 0; 0, 0, 0, 0; 0, 0, 1/3, 0] = P′B

For the matrix B we of course also have d1 = d2 = 2, which gives the first two columns of PB:

(0, 0, 0, 1/3)ᵀ and B(0, 0, 0, 1/3)ᵀ = (−1, −1, 1, −1/3)ᵀ.

The last two columns of PB are given by

(1, 0, 0, 0)ᵀ and B(1, 0, 0, 0)ᵀ = (0, 1, 0, 0)ᵀ.

This gives the matrix

PB = [0, −1, 1, 0; 0, −1, 0, 1; 0, 1, 0, 0; 1/3, −1/3, 0, 0],

with inverse

PB^{−1} = [0, 0, 1, 3; 0, 0, 1, 0; 1, 0, 1, 0; 0, 1, 1, 0].

This finally gives, by the above,

P = PAPB^{−1} = [1, 0, −2, 0; 0, 0, −3, −3; 1, 0, 4, 3; 0, 1, 1, 0],

which conjugates B into A by A = PBP^{−1}. Note however that this is not the only matrix conjugating B into A. For example, the matrix

P2 = [1, 0, 1, 0; 0, 1, 0, −1; 0, −1, 1, 1; 0, 0, 1, 1]

also conjugates B into A in the same manner.

4 Simultaneous Conjugation

Definition 28. Two pairs of matrices (A1,A2) and (B1,B2) are said to be simultaneously conjugated if there exists an invertible matrix P such that

P.(A1, A2) := (PA1P^{−1}, PA2P^{−1}) = (B1, B2).

The notation used for this pairwise similarity will be (A1, A2) ∼ (B1, B2). When we started investigating this pairwise similarity we wanted to somehow use our prior knowledge of simple similarity to attack the problem of when two pairs of matrices are pairwise similar.

First of all we see that we need A1 ∼ B1 and A2 ∼ B2. This is the most obvious necessary condition. It is clear that this is not sufficient. Then we realised that we could construct the products A1A2 and A2A1 and see that another necessary condition is that A1A2 ∼ B1B2 and A2A1 ∼ B2B1. This follows straight from the definition, since

A1 = P^{−1}B1P and A2 = P^{−1}B2P. This gives A1A2 = P^{−1}B1B2P and A2A1 = P^{−1}B2B1P.

We could also show that this, together with the simple similarity (A1 ∼ B1 and A2 ∼ B2), was not a sufficient condition, by finding a counterexample. Another approach we thought about was finding the explicit matrices P for each conjugation and from there seeing if they could coincide. The problem here is that in general there are a lot of matrices P that can conjugate a matrix into one in the same similarity class. In special cases, e.g. for small matrices, it is possible to find some stronger restrictions on the set of these matrices P that make it possible to find all of them. But in the general case this is very hard, and they can look very different. This can be seen at the end of Example 27.

We continued the process by seeing if the simple similarity condition together with the condition that

Tr(A1^{i1} A2^{i2} A1^{i3} · · · A2^{ik}) = Tr(B1^{i1} B2^{i2} B1^{i3} · · · B2^{ik}), ij, k ∈ N ∪ {0}, 1 ≤ j ≤ k,

was sufficient. We could also here construct a counterexample to show that it was not sufficient, by looking at upper triangular matrices. Finally we looked at the strongest necessary simple conjugation condition we could find. By constructing finite products of powers of A1 and A2 we could see that a necessary condition was that

A1^{i1} A2^{i2} A1^{i3} · · · A2^{ik} ∼ B1^{i1} B2^{i2} B1^{i3} · · · B2^{ik}.

The proof that this is not a sufficient condition follows below. Note that this condition is a generalization of all the above mentioned conditions. Hence we omitted the proofs for these special cases, since they follow from Theorem 29.

Theorem 29. A1^{i1} A2^{i2} A1^{i3} · · · A2^{ik} ∼ B1^{i1} B2^{i2} B1^{i3} · · · B2^{ik} for all ij, k ∈ N ∪ {0}, 1 ≤ j ≤ k, does not imply (A1, A2) ∼ (B1, B2).

Proof. We will prove this by finding a counterexample. Consider the following matrices.

A1 = [1, 3; 0, 2], B1 = [1, 1; 0, 2],
A2 = [2, 5; 0, 3], B2 = [2, 1; 0, 3]

(as before, rows are separated by semicolons). One can see that

A1^{i1} A2^{i2} A1^{i3} · · · A2^{ik} = [2^J, ∗; 0, 2^I 3^J]

and

B1^{i1} B2^{i2} B1^{i3} · · · B2^{ik} = [2^J, ∗∗; 0, 2^I 3^J],

where I = i1 + i3 + · · · + i_{k−1} is the sum of the odd-indexed exponents and J = i2 + i4 + · · · + i_k is the sum of the even-indexed exponents.

The only thing we know about the upper right elements is that they are non-negative, but since they do not affect the invariant factors, and hence similarity, we denote them with stars. Clearly A1^{i1} A2^{i2} A1^{i3} · · · A2^{ik} and B1^{i1} B2^{i2} B1^{i3} · · · B2^{ik} have the same invariant factors, so they must be similar.

Since A1 and B1 are similar there exists an invertible matrix P such that

A1 = PB1P^{−1} ⇐⇒ A1P = PB1.

LH = [1, 3; 0, 2][p11, p12; p21, p22] = [p11 + 3p21, p12 + 3p22; 2p21, 2p22].

RH = [p11, p12; p21, p22][1, 1; 0, 2] = [p11, p11 + 2p12; p21, p21 + 2p22].

This implies that p21 = 0, and replacing p11 and p22 with the parameters s and t gives

P = [s, 3t − s; 0, t].

Now we apply the same procedure for the matrices A2 and B2 with the invertible change of basis matrix Q.

LH = [2, 5; 0, 3][q11, q12; q21, q22] = [2q11 + 5q21, 2q12 + 5q22; 3q21, 3q22].

RH = [q11, q12; q21, q22][2, 1; 0, 3] = [2q11, q11 + 3q12; 2q21, q21 + 3q22].

This implies that q21 = 0, and replacing q11 and q22 with the parameters s′ and t′ gives

Q = [s′, 5t′ − s′; 0, t′].

Assume now (A1, A2) ∼ (B1, B2). Then there exists a choice of s, t, s′, t′ such that P = Q,

[s, 3t − s; 0, t] = [s′, 5t′ − s′; 0, t′].

This implies that s = s′, t = t′ and

3t − s = 5t′ − s′ =⇒ 3t = 5t =⇒ t = t′ = 0.

Thus we have reached a contradiction since P and Q are supposed to be invertible matrices and we are done.
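The two families P and Q found above are easy to double check symbolically (again assuming SymPy is available); both differences below should expand to the zero matrix:

```python
from sympy import Matrix, symbols

s, t = symbols('s t')

A1 = Matrix([[1, 3], [0, 2]]); B1 = Matrix([[1, 1], [0, 2]])
A2 = Matrix([[2, 5], [0, 3]]); B2 = Matrix([[2, 1], [0, 3]])

# Conjugators are invertible only when det = s*t != 0.
P = Matrix([[s, 3*t - s], [0, t]])   # solutions of A1 P = P B1
Q = Matrix([[s, 5*t - s], [0, t]])   # solutions of A2 Q = Q B2

print((A1 * P - P * B1).expand())    # Matrix([[0, 0], [0, 0]])
print((A2 * Q - Q * B2).expand())    # Matrix([[0, 0], [0, 0]])
```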

References

[1] David S. Dummit and Richard M. Foote, Abstract Algebra, third edition, Wiley, 2004.
